Data comes in many and varied formats: it can be tall or wide, big or small, structured or unstructured. Regardless of where you get your data from, it will almost always require some wrangling. Data wrangling is the convolution, alignment and preparation of data before use. This unit provides an overview of best practices for organising your research data from the point of discovery through to its use in scientific applications. You will learn the principles of data handling and how to maintain the rigour and integrity of your data throughout your research, including documenting data provenance, accessing major databases, and understanding data licensing. After calculating summary statistics to help identify outliers and missing values, you will learn how to clean and wrangle data reproducibly in R at a variety of scales. You will "wrangle" your research data using R, identifying outliers and missing values and ensuring provenance.
Unit details and rules
Academic unit | Mathematics and Statistics Academic Operations |
---|---|
Credit points | 2 |
Prerequisites | None |
Corequisites | None |
Prohibitions | None |
Assumed knowledge | Basic exploratory data analysis, basic coding in R |
Available to study abroad and exchange students | Yes |
Teaching staff
Coordinator | Jie Yen Yen Fan, jieyen.fan@sydney.edu.au |
---|---|