We used to learn reading, writing and arithmetic. Now we need to learn about data, too.
It’s also not just understanding the data that you’re putting out into the world and how it’s being used. It’s also important for all of us to be data literate so that we can make use of the data ourselves, and understand and participate in the decisions that are being made by others.

— Distinguished Professor Kerrie Mengersen (2020)

The Challenge

The availability of data across government, business and research has increased dramatically in recent years. As a result, almost every member of society now needs a skill set that allows them to think critically about which inferences can validly be drawn from data to improve decisions. Rapid advancements in technology have led researchers to conduct sophisticated experiments that collect extremely large and complex measurement data. These researchers need access to cutting-edge statistical and machine learning models to help them draw valid inferences from their experimental data and determine the most expeditious path for their future endeavours. Additionally, for ethical and supply reasons, there will always be scientific research that is propelled by small to moderate-sized data. These experimenters also need access to modern statistical methods that can extract the most robust inferences from their limited samples. Businesses need access to these models to help them make data-based decisions that will optimise their operations and returns. It is the conversion of data into inferences on which to base decisions that can enable industry to thrive on a global scale. Access to sophisticated and robust models that can be readily deployed within an organisation is required to ensure all businesses can benefit from the big data era. Our data-rich world also presents a major challenge for society that educators need to meet: producing a data-literate population that can comprehend and critically assess the ever-increasing volume of data presented to them. Interpreting numbers and graphs is a learnt skill that will lead to better decisions and improved outcomes for all members of society.


The Approach

Researchers often have limited access to statistical support and too little time to invest in learning to code in order to analyse their experimental data. An intuitively designed GUI that allows them to quickly compare multiple models and visualise trial data, relationships and results will enable them to progress their research at a faster pace, testing for many more effects than may be considered using traditional software. The traditional Data Science Team model for business potentially exposes many small to medium-sized organisations to making ineffective data-based decisions, as the overheads of building a team or continually outsourcing to consulting firms are beyond a reasonable budget. In addition, having personnel construct code from scratch is both time-consuming and error-prone, with a high level of code review and organisation required to build validated models. GUI-driven software that provides access to sophisticated algorithms, and that can be deployed to any system without the need to code, would greatly enhance the use of data in industry. In terms of education, one way of improving statistical literacy and thinking is through the identification and use of appropriate statistical software that gives students, from a very early age, access to modern statistical modelling techniques on a platform that allows them to focus on outcomes. Rather than treating Bayesian and Frequentist paradigms as disparate entities, or statistics and machine learning as separate disciplines, our approach is to switch seamlessly between them. This enables both students and researchers to better understand the similarities and differences between various approaches and underlying philosophies, and allows instructors to teach concepts and thinking, as opposed to coding.


The Solution

The collective experience of PAG, and their collaborative partners, has led us to begin the development of AutoStat®, statistical software that is designed to be accessible to a range of users. We believe software that is intuitive to use is an essential element of overhauling the current educational approach to statistical learning. AutoStat® aims to embed state-of-the-art statistical and machine learning tools in an accessible, modelling-focused interface. It is designed to meet the needs of businesses, researchers and students who would otherwise be excluded from the process of modern statistical modelling, whether through a lack of coding acumen or an academic inclination that does not encompass coding. It also addresses the need for appropriate, reproducible statistical approaches by removing many of the error-prone steps associated with coding, choice of algorithm and presentation of results. It is designed to facilitate confidence in modelling and further exploration of statistical paradigms and potential. Graphics are a very powerful tool for understanding data, and to maximise exposure we have designed a drag-and-drop GUI that allows any relationship of interest to be displayed in seconds. Businesses will benefit by rapidly prototyping models to compare predictive power, then deploying the chosen modelling suite onto their systems.


The Impact

AutoStat® has the potential to free students' intellectual resources so that they can focus on understanding the ideas, concepts and inferences in statistical modelling, and will enable them to apply modern and innovative statistical methods to their own research into the future. Businesses will be able to unlock the power of the big data era, regardless of their personnel size and turnover. Access to modern AI algorithms will help them to thrive in the market of their choosing, with pivotal business decisions being determined using available data. The prevalence of data in modern society has also fuelled the need for data literacy, with a broad range of people requiring an education that allows them to make decisions by thinking critically about available data sources. While many people attempting to make decisions based on data have the ability to understand statistical concepts and think in these terms, performing such tasks in the real world will be contingent upon using software that is appropriate for their frequency of use.