This unit of study covers the data engineering challenges of building robust and scalable data processing pipelines. Although data engineers may not perform data analysis directly, they need the technical knowledge and skills to give data analysts suitable analytics architectures and reliable, well-formed data that is ready to be analysed. Topics range from data ingestion from sources such as databases, text files and web services, through data cleaning and transformation approaches, to the system architectures that allow a pipeline to run efficiently and automatically. Special consideration is given to building scalable data analysis solutions using a blend of Big Data processing techniques, including data stream processing and distributed data processing platforms such as Apache Spark.
Unit details and rules
Academic unit | Computer Science |
---|---|
Credit points | 6 |
Prerequisites | COMP5310 |
Corequisites | None |
Prohibitions | OCMP5339 |
Assumed knowledge | Proficiency in programming, especially Python, and in database querying with SQL; basic Unix scripting |
Available to study abroad and exchange students | Yes |
Teaching staff
Coordinator | Lijun Chang, lijun.chang@sydney.edu.au |
---|---|
Lecturer(s) | Elliot Zhu, jie.zhu@sydney.edu.au |