Data Management & Statistical Computing (BSTA5004)


The aim of this unit is to provide students with the knowledge and skills required to undertake moderate- to high-level data manipulation and management in preparation for statistical analysis of data typically arising in health and medical research. Students will: gain experience in data manipulation and management using two major statistical software packages (Stata and SAS); learn how to check and clean data, display and summarise data using statistical software, and link files through use of unique and non-unique identifiers; acquire fundamental programming skills for efficient use of software packages; and learn key principles of confidentiality and privacy in data storage, management and analysis. The topics covered are: Module 1 - Stata and SAS: The basics (importing and exporting data, recoding data, formatting data, labelling variable names and data values, using dates, data display and summary presentation); Module 2 - Stata and SAS: graphs, data management and statistical quality assurance methods (including advanced graphics to produce publication-quality graphs); Module 3 - Data management using Stata and SAS (using functions to generate new variables, appending, merging and transposing longitudinal data, programming skills for efficient and reproducible use of these packages, including loops, arguments and programs/macros).

8-12 hours total study time per week, distance learning


3x written assignments (30%, 35%, 35%)


Recommended if you have not used SAS or Stata before: Cody R, Smith J. Applied Statistics & the SAS Programming Language. 5th edition. Prentice Hall 2005. Hills M & De Stavola BL. A Short Introduction to Stata for Biostatistics (Updated to Stata 12. Timbe

