Unit outline

STAT3888: Statistical Machine Learning

Semester 2, 2023 [Normal day] - Camperdown/Darlington, Sydney

Data Science is an emerging and inherently interdisciplinary field. A key set of skills in this area falls under the umbrella of Statistical Machine Learning methods. This unit presents the opportunity to bring together the concepts and skills you have learnt from a Statistics or Data Science major, and apply them to a joint project with NUTM3888 Metabolic Cybernetics, where Statistics and Data Science students will form teams with Nutrition students to solve a real-world problem using Statistical Machine Learning methods. The unit will cover a wide breadth of cutting-edge supervised and unsupervised learning methods, including principal component analysis, multivariate tests, discrimination analysis, Gaussian graphical models, log-linear models, classification trees, k-nearest neighbours, k-means clustering, hierarchical clustering, and logistic regression. In this unit, you will continue to understand and explore disciplinary knowledge, while also meeting and collaborating through project-based learning: identifying and solving problems, analysing data, and communicating your findings to a diverse audience. All such skills are highly valued by employers. This unit will foster the ability to work in an interdisciplinary team, which is essential for both professional and research pathways in the future.

Unit details and rules

Academic unit: Mathematics and Statistics Academic Operations
Credit points: 6
Prerequisites: STAT2X11 and (DATA2X02 or STAT2X12)
Corequisites: None
Prohibitions: STAT3914 or STAT3014
Assumed knowledge: STAT3012 or STAT3912 or STAT3022 or STAT3922
Available to study abroad and exchange students: Yes

Teaching staff

Coordinator: John Ormerod, john.ormerod@sydney.edu.au

| Type | Description | Weight | Due | Length | Outcomes assessed |
| --- | --- | --- | --- | --- | --- |
| Assignment | Disciplinary Assignment: Exploratory data analysis of data set used in Major project. | 10% | Week 05. Due date: 01 Sep 2023 at 23:59. Closing date: 08 Sep 2023 | 4 weeks | LO1 LO3 LO4 LO7 LO8 |
| Presentation (group assignment) | Project pitch: Present the group's pitch for the major project question. | 10% | Week 06. Due date: 07 Sep 2023 at 02:00. Closing date: 07 Sep 2023 | 5 min + 2 min Q&A | LO1 LO2 LO3 LO6 LO7 LO8 |
| Tutorial quiz | Disciplinary quiz: Quiz on lecture material | 20% | Week 09. Due date: 04 Oct 2023 at 14:00. Closing date: 04 Oct 2023 | 1 hour | LO4 LO5 LO8 |
| Presentation (group assignment) | Major project - presentation: Group presentation of results (slides submitted at given due date) | 15% | Week 12. Due date: 26 Oct 2023 at 13:00. Closing date: 26 Oct 2023 | 5 min + 2 min for questions | LO1 LO2 LO3 LO4 LO5 LO6 LO7 LO8 |
| Assignment (group assignment) | Major project - Manuscript: Statistical analysis of nutrition data set | 35% | Week 13. Due date: 03 Nov 2023 at 23:59. Closing date: 10 Nov 2023 | 4000 words | LO1 LO2 LO3 LO4 LO5 LO6 LO7 LO8 |
| Assignment | Major project - reflection/viva/minutes: Used to assess individual contributions | 10% | Week 13. Due date: 03 Nov 2023 at 23:59. Closing date: 03 Nov 2023 | 500 words/5 min | LO1 LO2 LO3 LO4 LO5 LO6 LO7 LO8 |
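
The six assessment weights listed above add up to 100%. As a rough illustration of how they could combine into a single unit mark, here is a minimal Python sketch. It assumes every component is scored out of 100 and that components are combined by a simple weighted sum; the outline does not state the exact aggregation rule, and the dictionary keys, the function name `final_mark`, and the example scores are all hypothetical.

```python
# Assessment weights (in percent) as listed in this outline.
# The key names are illustrative labels, not official identifiers.
WEIGHTS = {
    "disciplinary_assignment": 10,
    "project_pitch": 10,
    "disciplinary_quiz": 20,
    "major_project_presentation": 15,
    "major_project_manuscript": 35,
    "major_project_reflection": 10,
}


def final_mark(scores):
    """Return the weighted sum of component scores (each out of 100).

    Assumes a plain linear combination; the real aggregation rule
    may differ (for example, after late penalties or moderation).
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the assessed components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student, each out of 100.
example_scores = {
    "disciplinary_assignment": 80,
    "project_pitch": 70,
    "disciplinary_quiz": 65,
    "major_project_presentation": 75,
    "major_project_manuscript": 72,
    "major_project_reflection": 90,
}

print(final_mark(example_scores))
```

Because the manuscript carries 35% of the weight, a ten-point change in that component moves the combined mark by 3.5 points under this assumption, more than any other single item.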

Assessment summary

  • Examination: This exam will test the learning outcomes attained in lectures, and tutorials/computer labs. University-approved non-programmable calculators may be used.
  • Computer lab reports: There are 2 computer lab reports, which must be submitted electronically in Turnitin, via the Learning Management System (Canvas) website, by the deadline. Note that a submission will not be marked if it is illegible, sideways or upside down. It is your responsibility to check your submission receipt (which will be automatically emailed to you) to ensure that your report has been submitted correctly.
  • Major project: The major project is broken up into several assessment items: major report, multimedia item, presentation, meeting minutes, peer-to-peer review, group work attendance, and short reflection.

Detailed information for each assessment can be found on Canvas.

Assessment criteria

The University awards common result grades, set out in the Coursework Policy 2014 (Schedule 1).

As a general guide, a high distinction indicates work of an exceptional standard, a distinction a very high standard, a credit a good standard, and a pass an acceptable standard.

For more information see guide to grades.

Late submission

In accordance with University policy, these penalties apply when written work is submitted after 11:59pm on the due date:

  • Deduction of 5% of the maximum mark for each calendar day after the due date.
  • After ten calendar days late, a mark of zero will be awarded.
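
The two rules above can be sketched as a small Python function. This is an illustration under stated assumptions, not the University's official calculation: it assumes lateness is counted in whole calendar days, that a submission exactly ten days late still receives the 50% deduction rather than zero, and that a penalised mark cannot drop below zero. The function name `late_penalised_mark` is hypothetical.

```python
def late_penalised_mark(raw_mark, max_mark, days_late):
    """Apply the late penalty described in this outline.

    Assumptions (not stated in the outline): days_late is a whole
    number of calendar days, exactly ten days late is still marked,
    and the result is floored at zero.
    """
    if days_late <= 0:
        return raw_mark          # on time: no penalty
    if days_late > 10:
        return 0                 # more than ten calendar days late
    # 5% of the maximum mark for each calendar day late.
    deduction = max_mark * 5 * days_late / 100
    return max(0, raw_mark - deduction)


# A submission worth 80/100 handed in two days late loses 10 marks.
print(late_penalised_mark(80, 100, 2))
```

Note that the deduction is taken from the maximum mark, not from the mark earned, so a weaker submission can reach zero well before the ten-day cutoff.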

Academic integrity

The Current Student website provides information on academic integrity and the resources available to all students. The University expects students and staff to act ethically and honestly and will treat all allegations of academic integrity breaches seriously.

We use similarity detection software to detect potential instances of plagiarism or other forms of academic integrity breach. If such matches indicate evidence of plagiarism or other forms of academic integrity breaches, your teacher is required to report your work for further investigation.

Use of generative artificial intelligence (AI) and automated writing tools

You may only use generative AI and automated writing tools in assessment tasks if you are permitted to by your unit coordinator. If you do use these tools, you must acknowledge this in your work, either in a footnote or an acknowledgement section. The assessment instructions or unit outline will give guidance of the types of tools that are permitted and how the tools should be used.

Your final submitted work must be your own, original work. You must acknowledge any use of generative AI tools that have been used in the assessment, and any material that forms part of your submission must be appropriately referenced. For guidance on how to acknowledge the use of AI, please refer to the AI in Education Canvas site.

The unapproved use of these tools or unacknowledged use will be considered a breach of the Academic Integrity Policy and penalties may apply.

Studiosity is permitted unless otherwise indicated by the unit coordinator. The use of this service must be acknowledged in your submission as detailed on the Learning Hub’s Canvas page.

Outside assessment tasks, generative AI tools may be used to support your learning. The AI in Education Canvas site contains a number of productive ways that students are using AI to improve their learning.

Simple extensions

If you encounter a problem submitting your work on time, you may be able to apply for an extension of five calendar days through a simple extension. The application process differs depending on the type of assessment, and extensions cannot be granted for some assessment types, such as exams.

Special consideration

If exceptional circumstances mean you can’t complete an assessment, you need consideration for a longer period of time, or if you have essential commitments which impact your performance in an assessment, you may be eligible for special consideration or special arrangements.

Special consideration applications will not be affected by a simple extension application.

Using AI responsibly

Co-created with students, AI in Education includes lots of helpful examples of how students use generative AI tools to support their learning. It explains how generative AI works, the different tools available and how to use them responsibly and productively.

| Week | Topic | Learning activity | Learning outcomes |
| --- | --- | --- | --- |
| Week 01 | Introduction, administration and motivation | Lecture (1 hr) | |
| Week 01 | Data cleaning | Lecture (1 hr) | |
| Week 01 | Unsupervised learning - Introduction to clustering | Lecture (1 hr) | |
| Week 01 | Introduction to the project | Workshop (1 hr) | LO2 LO3 LO6 |
| Week 02 | Unsupervised learning - K-means | Lecture (1 hr) | |
| Week 02 | Unsupervised learning - Model based clustering | Lecture (1 hr) | |
| Week 02 | Unsupervised learning - Hierarchical clustering | Lecture (1 hr) | |
| Week 02 | Tutorial/Lab - Clustering | Tutorial (1 hr) | |
| Week 02 | Workshop on cultural competency | Workshop (2 hr) | LO6 |
| Week 03 | Unsupervised learning - PCA background | Lecture (1 hr) | |
| Week 03 | Unsupervised learning - Principal component analysis | Lecture (1 hr) | |
| Week 03 | Unsupervised learning - Dimension reduction | Lecture (1 hr) | |
| Week 03 | Group meeting - Guidance on choosing a research topic | Workshop (2 hr) | |
| Week 03 | Tutorial/Lab - Dimension reduction | Tutorial (1 hr) | |
| Week 04 | Supervised learning - Introduction to supervised learning | Lecture (1 hr) | |
| Week 04 | Supervised learning - Logistic regression | Lecture (1 hr) | |
| Week 04 | Supervised learning - Penalized logistic regression | Lecture (1 hr) | |
| Week 04 | Group work - Group formation | Workshop (2 hr) | |
| Week 04 | Tutorial/Lab - Logistic regression | Workshop (1 hr) | |
| Week 05 | Supervised learning - Discrimination analysis | Lecture (1 hr) | |
| Week 05 | Supervised learning - Regression and classification trees | Lecture (1 hr) | |
| Week 05 | Supervised learning - Random forests | Lecture (1 hr) | |
| Week 05 | Tutorial/Lab - Discrimination analysis and classification trees | Tutorial (1 hr) | |
| Week 05 | Group work - Choosing a research topic for the group | Workshop (2 hr) | |
| Week 06 | Neural networks | Lecture (1 hr) | LO8 |
| Week 06 | Neural networks | Lecture (1 hr) | LO8 |
| Week 06 | Neural networks | Tutorial (1 hr) | LO8 |
| Week 06 | Group work - Finalising a research topic | Workshop (2 hr) | |
| Week 07 | Support vector machines | Lecture (1 hr) | LO8 |
| Week 07 | Support vector machines | Lecture (1 hr) | LO8 |
| Week 07 | Support vector machines | Tutorial (1 hr) | LO8 |
| Week 07 | Assessment: Project pitch | Workshop (2 hr) | |
| Week 08 | How to write a manuscript | Workshop (2 hr) | |
| Week 09 | Quiz (worth 20%) | Lecture (1 hr) | |
| Week 09 | Group work | Workshop (2 hr) | |
| Week 10 | Group work | Workshop (2 hr) | |
| Week 11 | Group work | Workshop (2 hr) | |
| Week 12 | Final presentation | Workshop (2 hr) | |

Study commitment

Typically, there is a minimum expectation of 1.5-2 hours of student effort per week per credit point for units of study offered over a full semester. For a 6 credit point unit, this equates to roughly 120-150 hours of student effort in total.
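
The arithmetic behind this estimate can be sketched in a few lines of Python. The outline does not say how many weeks are counted, so the 13-week semester used here is an assumption (the assessment schedule runs to Week 13), and the function name `total_study_hours` is hypothetical.

```python
def total_study_hours(credit_points, hours_per_point_per_week, weeks):
    """Total expected student effort over a semester, in hours."""
    return credit_points * hours_per_point_per_week * weeks


# Assumption: a 13-week semester; the outline does not state the count.
ASSUMED_WEEKS = 13

low = total_study_hours(6, 1.5, ASSUMED_WEEKS)   # lower bound
high = total_study_hours(6, 2.0, ASSUMED_WEEKS)  # upper bound
print(low, high)
```

Under that assumption the range works out to 117 to 156 hours, which is consistent with the rounded figure of roughly 120-150 hours quoted above, or about 9 to 12 hours per week for this unit.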

Learning outcomes are what students know, understand and are able to do on completion of a unit of study. They are aligned with the University's graduate qualities and are assessed as part of the curriculum.

At the completion of this unit, you should be able to:

  • LO1. apply disciplinary knowledge in statistics and data science to solve problems in an interdisciplinary context (nutrition)
  • LO2. find, define, and delimit authentic problems in order to address them
  • LO3. create an investigation strategy, explore solutions, discuss approaches, and predict outcomes
  • LO4. apply, formulate, interpret, and compare statistical machine learning methods including (wherever relevant) evaluation of model appropriateness
  • LO5. demonstrate integrity, confidence, personal resilience, and the capacity to manage challenges, both individually and in teams
  • LO6. collaborate with diverse groups across cultural and disciplinary boundaries to develop solution(s) to the project problems
  • LO7. communicate project outcomes effectively to a broad audience
  • LO8. identify appropriate machine learning methods for a particular problem, and judge the appropriateness of model evaluation procedures.

Graduate qualities

The graduate qualities are the qualities and skills that all University of Sydney graduates must demonstrate on successful completion of an award course. The set of qualities has been designed to equip you, as a future Sydney graduate, for the contemporary world.

GQ1 Depth of disciplinary expertise

Deep disciplinary expertise is the ability to integrate and rigorously apply knowledge, understanding and skills of a recognised discipline defined by scholarly activity, as well as familiarity with evolving practice of the discipline.

GQ2 Critical thinking and problem solving

Critical thinking and problem solving are the questioning of ideas, evidence and assumptions in order to propose and evaluate hypotheses or alternative arguments before formulating a conclusion or a solution to an identified problem.

GQ3 Oral and written communication

Effective communication, in both oral and written form, is the clear exchange of meaning in a manner that is appropriate to audience and context.

GQ4 Information and digital literacy

Information and digital literacy is the ability to locate, interpret, evaluate, manage, adapt, integrate, create and convey information using appropriate resources, tools and strategies.

GQ5 Inventiveness

Generating novel ideas and solutions.

GQ6 Cultural competence

Cultural Competence is the ability to actively, ethically, respectfully, and successfully engage across and between cultures. In the Australian context, this includes and celebrates Aboriginal and Torres Strait Islander cultures, knowledge systems, and a mature understanding of contemporary issues.

GQ7 Interdisciplinary effectiveness

Interdisciplinary effectiveness is the integration and synthesis of multiple viewpoints and practices, working effectively across disciplinary boundaries.

GQ8 Integrated professional, ethical, and personal identity

An integrated professional, ethical and personal identity is understanding the interaction between one’s personal and professional selves in an ethical context.

GQ9 Influence

Engaging others in a process, idea or vision.


This section outlines changes made to this unit following staff and student reviews.

No changes have been made since this unit was last offered.

Where to go for help: For help with statistics, you can post a question on the Ed forum (anonymously from other students if you prefer), ask your tutor during a tutorial or computer lab (or workshop, for STAT3888), consult the lecturer in their consultation time (see above), or email john.ormerod@sydney.edu.au. For administrative questions, first check carefully whether the answers are on this information sheet or on the STAT3888 webpage; if not, ask on the Ed forum or (if the question is specific to your situation) ask at the Student Services Office (Carslaw 520) or email STAT3888@sydney.edu.au or STAT3914@sydney.edu.au as appropriate. Ensure that any emails you send contain your name and SID, because anonymous emails will be ignored. If your email includes questions whose answers would benefit other students, you may be asked to post them on the Ed forum so that they can be answered there.

Disclaimer

The University reserves the right to amend units of study or no longer offer certain units, including where there are low enrolment numbers.

To help you understand common terms that we use at the University, we offer an online glossary.