Analysis – Explore, Visualise, Analyse
Since modern data analysis is carried out using software, the Analysis stage of PPDAC should emphasise a conceptual understanding of analytical techniques and the ability to interpret the output of software packages, rather than the ability to calculate formulae by hand. Learners must know how to make sense of what the data means in the context of the problem they are trying to solve.
The first step in data analysis is to get to know the data, often by “eyeballing” it in a graph or table. For learners within Broad General Education (BGE), this means knowing how to read and reason about tables, charts, graphs and infographics. Learners will encounter a wide range of engaging, interactive and informative techniques for showing data in the media or on the internet. These representations are often considerably more complex and informative than traditional pie or bar charts, and learners will need some support as they learn to understand them. They should also know how to create tables, charts and diagrams, either by hand (particularly in the early levels) or by using software packages or programming languages (in the senior phase).
The next stage in data analysis is to summarise the data with descriptive statistics covering averages and spread (mean, median, mode, minimum, maximum, interquartile range and, later, standard deviation).
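As an illustration of the summary step described above, the following is a minimal sketch using only Python's standard `statistics` module. The dataset (`heights`) and the helper name `summarise` are hypothetical and chosen purely for illustration; they do not come from any particular curriculum resource.

```python
import statistics

# Hypothetical data: heights (cm) of ten learners in a class.
heights = [142, 150, 150, 153, 155, 158, 160, 161, 165, 172]

def summarise(values):
    """Return common descriptive statistics for a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "minimum": min(values),
        "maximum": max(values),
        "interquartile_range": q3 - q1,
        # stdev() is the sample standard deviation (divides by n - 1).
        "standard_deviation": statistics.stdev(values),
    }

summary = summarise(heights)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Different tools use slightly different conventions for quartiles, so the interquartile range reported by a spreadsheet or by CODAP may differ a little from the value computed here. This is itself a useful point for learners interpreting software output.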
By the end of Broad General Education learners should know how to:
- Read and create tables, charts, graphs or infographics of data;
- Summarise a dataset using descriptive statistics such as mean and spread;
- Calculate and interpret effect sizes to answer the questions: how much of a difference is there between these two groups and does it matter?
Senior phase learners should know how to:
- Calculate and interpret effect sizes using standardised measures such as Cohen’s d;
- Understand in principle how inferential statistical tests such as the t-test and correlation work and be able to interpret the output;
- Understand in principle how machine learning algorithms work and be able to interpret the output;
- Evaluate the quality of analysis to decide whether it was conducted fairly.
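To make the idea of an effect size concrete, the sketch below compares two groups in two ways: the raw difference in means ("how much of a difference is there?") and Cohen's d, which divides that difference by the pooled standard deviation so that it can be judged against the spread of the data. The score lists and the function name `cohens_d` are hypothetical, and the pooled-standard-deviation form shown is one common definition; other variants exist.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical test scores for two classes (illustrative values only).
class_a = [12, 14, 15, 15, 16, 18]
class_b = [10, 11, 13, 13, 14, 17]

raw_difference = statistics.mean(class_a) - statistics.mean(class_b)
d = cohens_d(class_a, class_b)

print(f"Difference in means: {raw_difference:.1f}")
print(f"Cohen's d: {d:.2f}")
```

Cohen suggested rough benchmarks of 0.2, 0.5 and 0.8 for small, medium and large effects. These are conventions rather than rules, so whether a difference "matters" still has to be judged in the context of the problem being investigated.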
Software considerations
Learners need access to a range of software packages for collecting, analysing and presenting data, including spreadsheets, point-and-click data analysis tools, commercial visualisation tools, and programming environments for data science.
In the BGE, access to a simple spreadsheet package is likely to be most useful, alongside simple point and click analysis tools such as CODAP (Common Online Data Analysis Platform). Learners in the Senior Phase (SP), particularly those who choose to study the National Progression Award in Data Science, will additionally benefit from commercially available visualisation tools and specialist programming environments.