This unit of study is designed to introduce students to the analysis of data they may face in their future careers, in particular data that are not well behaved. The data may be non-normal, there may be missing observations, they may be correlated in space and time or too numerous to analyse with standard models. The unit is presented in an applied context with an emphasis on correctly analysing authentic datasets, and interpreting the output. It begins with the analysis and design experiments based on the general linear model. In the second part, students will learn about the generalisation of the general linear model to accommodate non-normal data with a particular emphasis on the binomial and Poisson distributions. In the third part linear mixed models will be introduced which provide the means to analyse datasets that do not meet the assumptions of independent and equal errors, for example data that is correlated in space and time. The units ends with an introduction to machine learning and predictive modelling. A key feature of the unit is using R to develop coding skills that are become essential in science for processing and analysing datasets of ever increasing size.
One 2-hour workshop per week, one 3-hour computer practical per week
One computer-based exam during the exam period (50%), assessment tasks focusing on analysing and interpreting real datasets (50%)
ENVX2001 or BIOM2001 or STAT2X12 or BIOL2X22 or DATA2002 or QBIO2001