Unit of study

OCMP5310: Principles of Data Science

2025 unit information

The focus of this unit is on understanding and applying relevant concepts, techniques, algorithms, and tools for the analysis, management and visualisation of data, with the goal of enabling discovery of information and knowledge to guide effective decision making and to gain new insights from large data sets. To this end, this unit of study provides a broad introduction to data management, analysis, modelling and visualisation using the Python programming language. Topics covered include:

  • development of custom software using the powerful, general-purpose Python scripting language
  • data collection, cleaning, pre-processing, and storage using various databases
  • exploratory data analysis to understand and profile complex data sets
  • mining unlabelled data to identify relationships, patterns, and trends
  • machine learning from labelled data to predict future outcomes
  • communicating findings to varied audiences, including through effective data visualisations

Core data science content will be taught in the normal lecture and tutorial delivery mode. Python programming will be taught through an online learning platform in addition to the weekly face-to-face lectures and tutorials. The unit of study will include hands-on exercises covering the range of data science skills above.

Unit details and rules

Managing faculty or University school:

Engineering

Study level Postgraduate
Academic unit Computer Science
Credit points 6
Prerequisites:
None
Corequisites:
None
Prohibitions:
COMP5310 or INFO3406
Assumed knowledge:
Good understanding of the relational data model and database technologies as covered in ISYS2120 or COMP9120 (or an equivalent unit of study from another institution)

At the completion of this unit, you should be able to:

  • LO1. select statistical techniques appropriate for evaluation of a predictive model that is based on data analysis, and justify this choice
  • LO2. select statistical techniques appropriate for summarisation and analysis of a data set, and justify this choice
  • LO3. apply concepts and terms from social science to describe and analyse the role of a data analysis task in its organisational context
  • LO4. understand the role of data science in decision-making
  • LO5. understand the technical issues that are present in the stages of a data analysis task and the properties of different technologies and tools that can be used to deal with the issues
  • LO6. process large data sets using appropriate technologies
  • LO7. carry out (in guided stages) the whole design and implementation cycle for creating a pipeline to analyse a large heterogeneous dataset
  • LO8. seek details of how to use a method or tool in the data analytic process
  • LO9. communicate the results produced by an analysis pipeline, in oral and written form, including meaningful diagrams
  • LO10. communicate the process used to analyse a large data set, and justify the methods used

Unit availability

This section lists the session, attendance modes and locations the unit is available in. There is a unit outline for each of the unit availabilities, which gives you information about the unit including assessment details and a schedule of weekly activities.

The outline is published 2 weeks before the first day of teaching. You can look at previous outlines for a guide to the details of a unit.

Session                    MoA     Location        Outline
Semester 1a 2024           Online  Online Program
Semester 2a 2024           Online  Online Program
PG Online Session 1A 2025  Online  Online Program  Outline unavailable
PG Online Session 2A 2025  Online  Online Program  Outline unavailable
Semester 1b 2023           Online  Online Program
Semester 2a 2023           Online  Online Program

Find your current year census dates

Modes of attendance (MoA)

This refers to the Mode of attendance (MoA) for the unit as it appears when you’re selecting your units in Sydney Student. Find more information about modes of attendance on our website.