Data comes in many and varied formats: it can be tall or wide, big or small, structured or unstructured. Regardless of where you get your data, it will almost always require some wrangling. Data wrangling is the conversion, alignment and preparation of data before use. This unit provides an overview of best practices for organising your research data from the point of discovery through to its use in scientific applications. You will learn the principles of data handling and how to maintain the rigour and integrity of your data throughout your research, including how to document data provenance, access major databases, and understand data licensing. After calculating summary statistics to help identify outliers and missing values, you will learn how to clean and wrangle your research data reproducibly in R at a variety of scales.
3 x 2-3-hr 'live labs'
2 x online quizzes (15% each, total 30%), oral presentation (30%), written report (40%)
Data Wrangling with R (Boehmke, B., 2016)
Basic exploratory data analysis, basic coding in R