# Data Science

## Data Science major

A major in Data Science requires 48 credit points from this table including:

(i) 6 credit points of 1000-level core units

(ii) 6 credit points of 1000-level units according to the following rules*:

(a) 6 credit points of selective units OR

(b) 3 credit points of statistics units and 3 credit points of computation units OR

(c) 3 credit points of advanced statistics units and 3 credit points of mathematics units OR

(d) 3 credit points of advanced statistics units and 3 credit points of linear algebra units for students in the Mathematical Sciences program^

(iii) 12 credit points of 2000-level core units

(iv) 6 credit points of 2000-level selective units

(v) 6 credit points of 3000-level core interdisciplinary project units

(vi) 6 credit points of 3000-level methodology units

(vii) 6 credit points of 3000-level methodology or application or interdisciplinary project selective units

*Students not enrolled in the BSc may substitute ECMT1010 or BUSS1020

^If elective space allows, students may substitute DATA1001/1901 for the advanced statistics unit

## Data Science minor

A minor in Data Science requires 36 credit points from this table including:

(i) 6 credit points of 1000-level core units

(ii) 6 credit points of 1000-level units according to the following rules*:

(a) 6 credit points of selective units OR

(b) 3 credit points of statistics units and 3 credit points of computation units OR

(c) 3 credit points of advanced statistics units and 3 credit points of calculus and linear algebra units

(iii) 12 credit points of 2000-level core units

(iv) 6 credit points of 2000-level selective units

(v) 6 credit points of 3000-level methodology units
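
As a quick consistency check, the components listed for the major and the minor can be summed to confirm that they match the stated totals of 48 and 36 credit points. The sketch below is illustrative only; the dictionary keys are informal labels chosen for this example, not official rule names.

```python
# Credit-point components copied from the major and minor rules above.
# The labels are informal shorthand (an assumption), not official names.
MAJOR = {
    "1000-level core": 6,
    "1000-level rule (ii)": 6,
    "2000-level core": 12,
    "2000-level selective": 6,
    "3000-level interdisciplinary project": 6,
    "3000-level methodology": 6,
    "3000-level selective": 6,
}

MINOR = {
    "1000-level core": 6,
    "1000-level rule (ii)": 6,
    "2000-level core": 12,
    "2000-level selective": 6,
    "3000-level methodology": 6,
}


def total_credit_points(components):
    """Sum the credit points across all listed components."""
    return sum(components.values())


major_total = total_credit_points(MAJOR)
minor_total = total_credit_points(MINOR)
print(major_total, minor_total)
```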

### Units of study

The units of study are listed below.

#### 1000-level units of study

###### Core

**DATA1002 Informatics: Data and Computation**

Credit points: 6 Teacher/Coordinator: Prof Alan Fekete Session: Semester 2 Classes: Lectures, Laboratories, Project Work - own time Prohibitions: INFO1903 OR DATA1902 Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit covers computation and data handling, integrating sophisticated use of existing productivity software, e.g. spreadsheets, with the development of custom software using the general-purpose Python language. It will focus on skills directly applicable to data-driven decision-making. Students will see examples from many domains, and be able to write code to automate the common processes of data science, such as data ingestion, format conversion, cleaning, summarization, creation and application of a predictive model.

**DATA1902 Informatics: Data and Computation (Advanced)**

Credit points: 6 Teacher/Coordinator: Prof Alan Fekete Session: Semester 2 Classes: lectures, laboratories Prohibitions: INFO1903 OR DATA1002 Assumed knowledge: This unit is intended for students with an ATAR at least sufficient for entry to the BSc/BAdvStudies(Advanced) stream, or for those who gained Distinction results or better in some unit in Data Science, Mathematics, or Computer Science. Students with a portfolio of high-quality relevant prior work can also be admitted. Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

This unit covers computation and data handling, integrating sophisticated use of existing productivity software, e.g. spreadsheets, with the development of custom software using the general-purpose Python language. It will focus on skills directly applicable to data-driven decision-making. Students will see examples from many domains, and be able to write code to automate the common processes of data science, such as data ingestion, format conversion, cleaning, summarization, creation and application of a predictive model. This unit includes the content of DATA1002, along with additional topics that are more sophisticated, suited for students with high academic achievement.

###### Selective

**DATA1001 Foundations of Data Science**

Credit points: 6 Teacher/Coordinator: Prof Qiying Wang Session: Semester 1,Semester 2 Classes: 3x1-hr lectures; 1x2-hr lab/wk Prohibitions: DATA1901 or MATH1005 or MATH1905 or MATH1015 or MATH1115 or ENVX1001 or ENVX1002 or ECMT1010 or BUSS1020 or STAT1021 Assessment: RQuizzes (10%); 3 x projects (30%); final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day

DATA1001 is a foundational unit in the Data Science major. The unit focuses on developing critical and statistical thinking skills for all students. Does mobile phone usage increase the incidence of brain tumours? What is the public's attitude to shark baiting following a fatal attack? Statistics is the science of decision making, essential in every industry and undergirds all research which relies on data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem solving skills in a team setting. Taught interactively with embedded technology, DATA1001 develops critical thinking and skills to problem-solve with data. It is the prerequisite for DATA2002.

Textbooks

Statistics, (4th Edition), Freedman Pisani Purves (2007)

**DATA1901 Foundations of Data Science (Adv)**

Credit points: 6 Teacher/Coordinator: Prof Qiying Wang Session: Semester 1,Semester 2 Classes: Lecture 3 hrs/week + Computer lab 2 hr/week Prohibitions: MATH1905 or ECMT1010 or ENVX1002 or BUSS1020 or DATA1001 or MATH1115 or MATH1015 Assumed knowledge: An ATAR of 95 or more Assessment: RQuizzes (10%), Projects (30%), Final Exam (60%). Mode of delivery: Normal (lecture/lab/tutorial) day

DATA1901 is an advanced level unit (matching DATA1001) that is foundational to the new major in Data Science. The unit focuses on developing critical and statistical thinking skills for all students. Does mobile phone usage increase the incidence of brain tumours? What is the public's attitude to shark baiting following a fatal attack? Statistics is the science of decision making, essential in every industry and undergirds all research which relies on data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem solving skills in a team setting. Taught interactively with embedded technology and masterclasses, DATA1901 develops critical thinking and skills to problem-solve with data at an advanced level. By completing this unit you will have an excellent foundation for pursuing data science, whether directly through the data science major, or indirectly in whatever field you major in. The advanced unit has the same overall concepts as the regular unit but material is discussed in a manner that offers a greater level of challenge and academic rigour.

Textbooks

All learning materials will be on Canvas. In addition, the textbook is Statistics (4th Edition), Freedman, Pisani, and Purves (2007), which is available in 3 forms: 1) E-text $65 (www.wileydirect.com.au/buy/statistics-4th-international-student-edition/), 2) hard copy (Co-op Bookshop), and 3) the Library.

**ENVX1002 Introduction to Statistical Methods**

Credit points: 6 Teacher/Coordinator: A/Prof Thomas Bishop Session: Semester 1 Classes: 3 hours per week of lectures; 2 hours per week of computer tutorials Prohibitions: ENVX1001 or MATH1005 or MATH1905 or MATH1015 or MATH1115 or DATA1001 or DATA1901 or BUSS1020 or STAT1021 or ECMT1010 Assessment: Assignments, quizzes, presentation, exam Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Available as a degree core unit only in the Agriculture, Animal and Veterinary Bioscience, Food and Agribusiness, and Taronga Wildlife Conservation streams

This is an introductory data science unit for students in the agricultural, life and environmental sciences. It provides the foundation for statistics and data science skills that are needed for a career in science and for further study in applied statistics and data science. The unit focuses on developing critical and statistical thinking skills for all students. It has 4 modules: exploring data, modelling data, sampling data and making decisions with data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem solving skills in a team setting. Taught interactively with embedded technology, ENVX1002 develops critical thinking and skills to problem-solve with data.

Textbooks

Statistics, Fourth Edition, Freedman Pisani Purves

###### Statistics

**MATH1005 Statistical Thinking with Data**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Intensive January,Semester 1,Semester 2 Classes: 2x1-hr lectures; 1x1-hr lab/wk Prohibitions: MATH1015 or MATH1905 or STAT1021 or ECMT1010 or ENVX1001 or ENVX1002 or BUSS1020 or DATA1001 or DATA1901 Assumed knowledge: HSC Mathematics. Students who have not completed HSC Mathematics (or equivalent) are strongly advised to take the Mathematics Bridging Course (offered in February). Assessment: quizzes (10%), project 1 (10%), project 2 (15%), final exam (65%) Mode of delivery: Normal (lecture/lab/tutorial) day

In a data-rich world, global citizens need to problem solve with data, and evidence-based decision-making is essential in every field of research and work.

This unit equips you with the foundational statistical thinking to become a critical consumer of data. You will learn to think analytically about data and to evaluate the validity and accuracy of any conclusions drawn. Focusing on statistical literacy, the unit covers foundational statistical concepts, including the design of experiments, exploratory data analysis, sampling and tests of significance.

Textbooks

Statistics, (4th Edition), Freedman Pisani Purves (2007)

###### Computation

**MATH1115 Interrogating Data**

Credit points: 3 Teacher/Coordinator: Prof Qiying Wang Session: Intensive January,Semester 1,Semester 2 Classes: 2-hr lab/wk Prerequisites: MATH1005 or MATH1015 Prohibitions: STAT1021 or ENVX1001 or ENVX1002 or BUSS1020 or ECMT1010 or DATA1001 or DATA1901 Assessment: LQuizzes (5%); projects (30%); final exam (65%) Mode of delivery: Normal (lecture/lab/tutorial) day

In a data-rich world, global citizens need to problem solve with data, and evidence-based decision-making is essential in every field of research and work. This unit equips you with foundational statistical thinking to interrogate data. Focusing on statistical literacy, the unit covers foundational statistical concepts such as visualising data, the linear regression model, and testing significance using the t and chi-square tests. Based on a flipped learning approach, you will experience most of your learning in weekly collaborative 2-hour labs, supplemented by readings and lectures. Working in teams, you will explore three real data stories across different domains, with associated literature. The combination of MATH1005/1015 and MATH1115 is equivalent to DATA1001, providing a pathway to the Data Science, Statistics, or Quantitative Life Sciences majors.

Textbooks

Statistics, (4th edition), Freedman, Pisani and Purves, (2007)

###### Advanced Statistics

**MATH1905 Statistical Thinking with Data (Advanced)**

Credit points: 3 Teacher/Coordinator: Prof Qiying Wang Session: Semester 2 Classes: 2x1-hr lectures; 1x1-hr tutorial/wk Prohibitions: MATH1005 or MATH1015 or STAT1021 or ECMT1010 or ENVX1001 or ENVX1002 or BUSS1020 or DATA1001 or DATA1901 Assumed knowledge: (HSC Mathematics Extension 2) OR (90 or above in HSC Mathematics Extension 1) or equivalent Assessment: 2 x quizzes (20%); 2 x assignments (10%); final exam (70%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

This unit is designed to provide a thorough preparation for further study in mathematics and statistics. It is a core unit of study providing three of the twelve credit points required by the Faculty of Science as well as a Junior level requirement in the Faculty of Engineering. This Advanced level unit of study parallels the normal unit MATH1005 but goes more deeply into the subject matter and requires more mathematical sophistication.

Textbooks

A Primer of Statistics (4th edition), M C Phipps and M P Quine, Prentice Hall, Australia (2001)

###### Mathematics

**MATH1021 Calculus Of One Variable**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Intensive January,Semester 1,Semester 2 Classes: 2x1-hr lectures; 1x1-hr tutorial/wk Prohibitions: MATH1011 or MATH1901 or MATH1906 or ENVX1001 or MATH1001 or MATH1921 or MATH1931 Assumed knowledge: HSC Mathematics Extension 1 or equivalent. Assessment: 2 x quizzes (30%), 2 x assignments (5%), online quizzes (10%), final exam (55%) Mode of delivery: Normal (lecture/lab/tutorial) day

Calculus is a discipline of mathematics that finds profound applications in science, engineering, and economics. This unit investigates differential calculus and integral calculus of one variable and the diverse applications of this theory. Emphasis is given both to the theoretical and foundational aspects of the subject, as well as developing the valuable skill of applying the mathematical theory to solve practical problems. Topics covered in this unit of study include complex numbers, functions of a single variable, limits and continuity, differentiation, optimisation, Taylor polynomials, Taylor's Theorem, Taylor series, Riemann sums, and Riemann integrals.

Textbooks

Calculus of One Variable (Course Notes for MATH1021)

**MATH1921 Calculus Of One Variable (Advanced)**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Semester 1 Classes: 2x1-hr lectures; 1x1-hr tutorial/wk Prohibitions: MATH1001 or MATH1011 or MATH1906 or ENVX1001 or MATH1901 or MATH1021 or MATH1931 Assumed knowledge: (HSC Mathematics Extension 2) OR (Band E4 in HSC Mathematics Extension 1) or equivalent. Assessment: 2 x quizzes (20%); 2 x assignments (10%); final exam (70%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

Calculus is a discipline of mathematics that finds profound applications in science, engineering, and economics. This unit investigates differential calculus and integral calculus of one variable and the diverse applications of this theory. Emphasis is given both to the theoretical and foundational aspects of the subject, as well as developing the valuable skill of applying the mathematical theory to solve practical problems. Topics covered in this unit of study include complex numbers, functions of a single variable, limits and continuity, differentiation, optimisation, Taylor polynomials, Taylor's Theorem, Taylor series, Riemann sums, and Riemann integrals. Additional theoretical topics included in this advanced unit include the Intermediate Value Theorem, Rolle's Theorem, and the Mean Value Theorem.

Textbooks

As set out in the Junior Mathematics Handbook

**MATH1931 Calculus Of One Variable (SSP)**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Semester 1 Classes: 2x1-hr lectures; and 1x1-hr tutorial/wk Prohibitions: MATH1001 or MATH1011 or MATH1901 or ENVX1001 or MATH1906 or MATH1021 or MATH1921 Assumed knowledge: (HSC Mathematics Extension 2) OR (Band E4 in HSC Mathematics Extension 1) or equivalent. Assessment: Seminar participation (10%); 3 x special assignments (10%); 2 x quizzes (16%); 2 x assignments (8%); final exam (56%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

Note: Enrolment is by invitation only

The Mathematics Special Studies Program is for students with exceptional mathematical aptitude, and requires outstanding performance in past mathematical studies. Students will cover the material of MATH1921 Calculus of One Variable (Adv), and attend a weekly seminar covering special topics not available elsewhere in the Mathematics and Statistics program.

**MATH1023 Multivariable Calculus and Modelling**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Intensive January,Semester 1,Semester 2 Classes: 2x1-hr lectures; 1x1-hr tutorial/wk Prohibitions: MATH1013 or MATH1903 or MATH1907 or MATH1003 or MATH1923 or MATH1933 Assumed knowledge: Knowledge of complex numbers and methods of differential and integral calculus including integration by partial fractions and integration by parts as for example in MATH1021 or MATH1921 or MATH1931 or HSC Mathematics Extension 2 Assessment: 2 x quizzes (30%), 2 x assignments (5%), online quizzes (10%), final exam (55%) Mode of delivery: Normal (lecture/lab/tutorial) day

Calculus is a discipline of mathematics that finds profound applications in science, engineering, and economics. This unit investigates multivariable differential calculus and modelling. Emphasis is given both to the theoretical and foundational aspects of the subject, as well as developing the valuable skill of applying the mathematical theory to solve practical problems. Topics covered in this unit of study include mathematical modelling, first order differential equations, second order differential equations, systems of linear equations, visualisation in 2 and 3 dimensions, partial derivatives, directional derivatives, the gradient vector, and optimisation for functions of more than one variable.

Textbooks

Multivariable Calculus and Modelling (Course Notes for MATH1023)

**MATH1923 Multivariable Calculus and Modelling (Adv)**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Semester 2 Classes: 2x1-hr lectures; and 1x1-hr tutorial/wk Prohibitions: MATH1003 or MATH1013 or MATH1907 or MATH1903 or MATH1023 or MATH1933 Assumed knowledge: (HSC Mathematics Extension 2) OR (Band E4 in HSC Mathematics Extension 1) or equivalent. Assessment: 2 x quizzes (20%); 2 x assignments (10%); final exam (70%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

Calculus is a discipline of mathematics that finds profound applications in science, engineering, and economics. This unit investigates multivariable differential calculus and modelling. Emphasis is given both to the theoretical and foundational aspects of the subject, as well as developing the valuable skill of applying the mathematical theory to solve practical problems. Topics covered in this unit of study include mathematical modelling, first order differential equations, second order differential equations, systems of linear equations, visualisation in 2 and 3 dimensions, partial derivatives, directional derivatives, the gradient vector, and optimisation for functions of more than one variable. Additional topics covered in this advanced unit of study include the use of diagonalisation of matrices to study systems of linear equations and optimisation problems, limits of functions of two or more variables, and the derivative of a function of two or more variables.

Textbooks

As set out in the Junior Mathematics Handbook

**MATH1933 Multivariable Calculus and Modelling (SSP)**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Semester 2 Classes: 2x1-hr lectures; and 1x1-hr tutorial/wk Prohibitions: MATH1003 or MATH1903 or MATH1013 or MATH1907 or MATH1023 or MATH1923 Assumed knowledge: (HSC Mathematics Extension 2) OR (Band E4 in HSC Mathematics Extension 1) or equivalent. Assessment: Seminar participation (10%); 3 x special assignments (10%); 2 x quizzes (16%); 2 x assignments (8%); final exam (56%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

Note: Enrolment is by invitation only.

The Mathematics Special Studies Program is for students with exceptional mathematical aptitude, and requires outstanding performance in past mathematical studies. Students will cover the material of MATH1923 Multivariable Calculus and Modelling (Adv), and attend a weekly seminar covering special topics not available elsewhere in the Mathematics and Statistics program.

**MATH1002 Linear Algebra**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Intensive January,Semester 1 Classes: 2x1-hr lectures; 1x1-hr tutorial/wk Prohibitions: MATH1012 or MATH1014 or MATH1902 Assumed knowledge: HSC Mathematics or MATH1111. Students who have not completed HSC Mathematics (or equivalent) are strongly advised to take the Mathematics Bridging Course (offered in February). Assessment: online quizzes (10%), quiz (15%), assignments (10%), final exam (65%) Mode of delivery: Normal (lecture/lab/tutorial) day

MATH1002 is designed to provide a thorough preparation for further study in mathematics and statistics. It is a core unit of study providing three of the twelve credit points required by the Faculty of Science as well as a Junior level requirement in the Faculty of Engineering.

This unit of study introduces vectors and vector algebra, linear algebra including solutions of linear systems, matrices, determinants, eigenvalues and eigenvectors.

Textbooks

Linear Algebra: A Modern Introduction, (4th edition), David Poole

**MATH1902 Linear Algebra (Advanced)**

Credit points: 3 Teacher/Coordinator: A/Prof Sharon Stephen Session: Semester 1 Classes: 2x1-hr lectures; 1x1-hr tutorial/wk Prohibitions: MATH1002 or MATH1014 Assumed knowledge: (HSC Mathematics Extension 2) OR (90 or above in HSC Mathematics Extension 1) or equivalent Assessment: Online quizzes (10%); 4 x assignments (20%); final exam (70%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

This unit is designed to provide a thorough preparation for further study in mathematics and statistics. It is a core unit of study providing three of the twelve credit points required by the Faculty of Science as well as a Junior level requirement in the Faculty of Engineering. It parallels the normal unit MATH1002 but goes more deeply into the subject matter and requires more mathematical sophistication.

Textbooks

As set out in the Junior Mathematics Handbook

#### 2000-level units of study

###### Core

**DATA2001 Data Science: Big Data and Data Diversity**

Credit points: 6 Teacher/Coordinator: A/Prof Uwe Roehm Session: Semester 1 Classes: Lectures, Laboratories, Project Work - own time Prerequisites: DATA1002 OR DATA1902 OR INFO1110 OR INFO1910 OR INFO1903 OR INFO1103 Prohibitions: DATA2901 Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry.

Students will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image, and text. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will also introduce students to the concept of declarative data processing with SQL, and to analysing data in relational databases. Students will be given data sets from, e.g., social media, transport, health and social sciences, and be taught basic explorative data analysis and mining techniques in the context of small use cases. The course will further give students an understanding of the challenges involved with analysing large data volumes, such as the idea of partitioning and distributing data and computation among multiple computers for processing of 'Big Data'.
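
To illustrate what "declarative data processing with SQL" can look like from Python, the following is a minimal sketch using the standard-library `sqlite3` module. The table name, column names, and records are invented for illustration and are not taken from the unit's materials.

```python
import sqlite3

# Hypothetical accident records as (area, year) pairs, invented for illustration.
rows = [
    ("A", 2019), ("A", 2019), ("B", 2019),
    ("A", 2020), ("B", 2020), ("C", 2020),
]

# An in-memory relational database holding one small table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accidents (area TEXT, year INTEGER)")
conn.executemany("INSERT INTO accidents VALUES (?, ?)", rows)

# Declarative style: the query states what result is wanted
# (a count per area, largest first), not how to loop over the rows.
hot_spots = conn.execute(
    "SELECT area, COUNT(*) AS n FROM accidents "
    "GROUP BY area ORDER BY n DESC, area ASC"
).fetchall()
conn.close()

print(hot_spots)
```

The same summary could be written as an explicit loop with a counter; the SQL form leaves the choice of execution strategy to the database engine.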

**DATA2901 Big Data and Data Diversity (Advanced)**

Credit points: 6 Teacher/Coordinator: A/Prof Uwe Roehm Session: Semester 1 Classes: lectures, laboratories Prerequisites: DATA1002 OR DATA1902 OR INFO1110 OR INFO1903 OR INFO1103. Students need Distinction or better in one of the prerequisite units. Prohibitions: DATA2001 Assessment: through semester assessment (60%), final exam (40%) Mode of delivery: Normal (lecture/lab/tutorial) day

This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry. Students will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image, and text. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will also introduce students to the concept of declarative data processing with SQL, and to analysing data in relational databases. Students will be given data sets from, e.g., social media, transport, health and social sciences, and be taught basic explorative data analysis and mining techniques in the context of small use cases. The course will further give students an understanding of the challenges involved with analysing large data volumes, such as the idea of partitioning and distributing data and computation among multiple computers for processing of 'Big Data'. This unit is an alternative to DATA2001, providing coverage of some additional, more sophisticated topics, suited for students with high academic achievement.

**DATA2002 Data Analytics: Learning from Data**

Credit points: 6 Teacher/Coordinator: A/Prof Jennifer Chan Session: Semester 2 Classes: Lecture 3 hrs/week + workshop 2 hr/week Prerequisites: [DATA1001 or ENVX1001 or ENVX1002] or [MATH10X5 and MATH1115] or [MATH10X5 and STAT2X11] or [MATH1905 and MATH1XXX (except MATH1XX5)] or [BUSS1020 or ECMT1010 or STAT1021] Prohibitions: STAT2012 or STAT2912 or DATA2902 Assumed knowledge: Basic linear algebra and some coding for example MATH1014 or MATH1002 or MATH1902 and DATA1001 or DATA1901 Assessment: Model reports (15%), online quizzes (15%), group work assignment and presentation (20%) and final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

Technological advances in science, business and engineering have given rise to a proliferation of data from all aspects of our life. Understanding the information presented in these data is critical as it enables informed decision making in many areas including market intelligence and science. DATA2002 is an intermediate unit in statistics and data sciences, focusing on learning data analytic skills for a wide range of problems and data. How should the Australian government measure and report employment and unemployment? Can we tell the difference between decaffeinated and regular coffee? In this unit, you will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects as well as reinforcing your programming skills through experience with a statistical programming language. You will also be exposed to the concept of statistical machine learning and develop the skill to analyse various types of data in order to answer a scientific question. From this unit, you will develop knowledge and skills that will enable you to embrace data analytic challenges stemming from everyday problems.

**DATA2902 Data Analytics: Learning from Data (Adv)**

Credit points: 6 Teacher/Coordinator: A/Prof Jennifer Chan Session: Semester 2 Classes: Lecture 3 hrs/week + workshop 2 hr/week Prerequisites: A mark of 65 or above in any of the following (DATA1001 or DATA1901 or ENVX1001 or ENVX1002) or (MATH10X5 and MATH1115) or (MATH10X5 and STAT2011) or STAT2911 or (MATH1905 and MATH1XXX [except MATH1XX5]) or (BUSS1020 or ECMT1010 or STAT1021) Prohibitions: STAT2012 or STAT2912 or DATA2002 Assumed knowledge: Basic linear algebra and some coding for example MATH1014 or MATH1002 or MATH1902 and DATA1001 or DATA1901 Assessment: Model reports (15%), online quizzes (15%), group work assignment and presentation (20%) and final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

Technological advances in science, business, and engineering have given rise to a proliferation of data from all aspects of our life. Understanding the information presented in these data is critical as it enables informed decision making in many areas including market intelligence and science. DATA2902 is an intermediate unit in statistics and data sciences, focusing on learning advanced data analytic skills for a wide range of problems and data. How should the Australian government measure and report employment and unemployment? Can we tell the difference between decaffeinated and regular coffee? In this unit, you will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects as well as reinforcing your programming skills through experience with a statistical programming language. You will also be exposed to the concept of statistical machine learning and develop the skill to analyse various types of data in order to answer a scientific question. From this unit, you will develop knowledge and skills that will enable you to embrace data analytic challenges stemming from everyday problems.

###### Selective

**COMP2123 Data Structures and Algorithms**

Credit points: 6 Teacher/Coordinator: Andreas Van Renssen Session: Semester 1 Classes: Lectures, Tutorials Prerequisites: INFO1110 OR INFO1910 OR INFO1113 OR DATA1002 OR DATA1902 OR INFO1103 OR INFO1903 Prohibitions: INFO1105 OR INFO1905 OR COMP2823 Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will teach some powerful ideas that are central to solving algorithmic problems in ways that are more efficient than naive approaches. In particular, students will learn how data collections can support efficient access, for example, how a dictionary or map can allow key-based lookup that does not slow down linearly as the collection grows in size. The data structures covered in this unit include lists, stacks, queues, priority queues, search trees, hash tables, and graphs. Students will also learn efficient techniques for classic tasks such as sorting a collection. The concept of asymptotic notation will be introduced, and used to describe the costs of various data access operations and algorithms.
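
As a rough illustration of the key-based lookup idea mentioned above, the sketch below compares scanning a list of pairs with looking up the same key in a Python `dict`, which is implemented as a hash table. The helper `linear_lookup` and the sample keys are hypothetical names made up for this example; it counts comparisons rather than measuring time so that the result is deterministic.

```python
def linear_lookup(pairs, key):
    """Scan a list of (key, value) pairs; return (value, comparisons made)."""
    comparisons = 0
    for k, v in pairs:
        comparisons += 1
        if k == key:
            return v, comparisons
    return None, comparisons


n = 10_000
pairs = [(f"id{i}", i) for i in range(n)]  # list-based collection
table = dict(pairs)                        # hash-table-based collection

# Worst case for the scan: the key sits at the very end of the list,
# so the number of comparisons grows linearly with the collection size.
value_list, steps = linear_lookup(pairs, f"id{n - 1}")

# The dict hashes the key and jumps to its slot; on average the cost
# does not grow with the number of stored entries.
value_dict = table[f"id{n - 1}"]

print(value_list, value_dict, steps)
```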

**COMP2823 Data Structures and Algorithms (Adv)**

Credit points: 6 Teacher/Coordinator: Dr Julian Mestre Session: Semester 1 Classes: lectures, tutorials Prerequisites: INFO1110 OR INFO1910 OR INFO1113 OR DATA1002 OR DATA1902 OR INFO1103 OR INFO1903 Prohibitions: INFO1105 OR INFO1905 OR COMP2123 Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will teach some powerful ideas that are central to solving algorithmic problems in ways that are more efficient than naive approaches. In particular, students will learn how data collections can support efficient access, for example, how a dictionary or map can allow key-based lookup that does not slow down linearly as the collection grows in size. The data structures covered in this unit include lists, stacks, queues, priority queues, search trees, hash tables, and graphs. Students will also learn efficient techniques for classic tasks such as sorting a collection. The concept of asymptotic notation will be introduced, and used to describe the costs of various data access operations and algorithms.

**COSC2002 Computational Modelling**

Credit points: 6 Teacher/Coordinator: Dr Tristram Alexander Session: Semester 1 Classes: lecture 2x1 hr/week; labs 1x1 hr/wk + 1x2 hrs/wk Prohibitions: COSC1003 or COSC1903 or COSC2902 Assumed knowledge: HSC Mathematics; DATA1002, or equivalent programming experience, ideally in Python. Assessment: In-lab checkpoints (10%), Assignment (10%), Class test 1 (20%), Class test 2 (20%), Final exam (40%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will introduce a wide range of modelling and simulation techniques for tackling real-world problems using a computer. Data is often expensive to obtain, so by harnessing the enormous computational processing power now available to us we can answer "what if" questions based on data we already have. You will learn how to break a problem down into its key components, identifying necessary assumptions for the purposes of simulation. You will learn how to develop suitable metrics within computational models, to allow comparison of simulation data with real-world data. You will learn how to iteratively improve simulations as you validate them against real results, and you will gain experience in identifying the types of exploratory questions that computational modelling opens up. Programming will be in Python. You will learn how to generate probabilistic data, solve systems of differential equations numerically, and tackle complex adaptive systems using agent-based models. Dynamical systems ranging from traffic flow to social segregation will be considered. By doing this unit you will develop the skills to go behind your data, understand why the data you observe might be as it is, and test scenarios which might otherwise be inaccessible.

**COSC2902 Computational Modelling (Advanced)**

Credit points: 6 Teacher/Coordinator: Dr Tristram Alexander Session: Semester 1 Classes: Lectures 2x1 hr/wk; Labs 1x1 hr/wk + 1x2 hr/wk Prerequisites: 48 credit points of 1000 level units with an average of 65 Prohibitions: COSC1003 or COSC1903 or COSC2002 Assumed knowledge: HSC Mathematics; DATA1002, or equivalent programming experience, ideally in Python. Assessment: In-lab checkpoints (10%), Assignment (10%), Class test 1 (20%), Class test 2 (20%), Final exam (40%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

This unit will introduce a wide range of modelling and simulation techniques for tackling real-world problems using a computer. Data is often expensive to obtain, so by harnessing the enormous computational processing power now available to us we can answer "what if" questions based on data we already have. You will learn how to break a problem down into its key components, identifying necessary assumptions for the purposes of simulation. You will learn how to develop suitable metrics within computational models, to allow comparison of simulation data with real-world data. You will learn how to iteratively improve simulations as you validate them against real results, and you will gain experience in identifying the types of exploratory questions that computational modelling opens up. Programming will be in Python. You will learn how to generate probabilistic data, solve systems of differential equations numerically, and tackle complex adaptive systems using agent-based models. Dynamical systems ranging from traffic flow to social segregation will be considered. By doing this unit you will develop the skills to go behind your data, understand why the data you observe might be as it is, and test scenarios which might otherwise be inaccessible. This is an advanced unit. It runs jointly with the associated mainstream unit; however, the lab work and assessment require a greater level of academic rigour. You will be required to engage in more challenging real-world computational modelling problems than the mainstream unit, and explore more deeply the reasons behind simulation results.

**STAT2011 Probability and Estimation Theory**

Credit points: 6 Teacher/Coordinator: A/Prof Jennifer Chan Session: Semester 1 Classes: 3x1-hr lectures; 1x1-hr tutorial; and 1x1-hr computer lab/wk Prerequisites: (MATH1X21 or MATH1931 or MATH1X01 or MATH1906 or MATH1011) and (DATA1X01 or MATH10X5 or MATH1905 or STAT1021 or ECMT1010 or BUSS1020) Prohibitions: STAT2911 Assessment: 2 x quizzes (30%); weekly computer practical reports (5%); a 1-hr computer exam in week 13 (15%); and a final 2-hr exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit provides an introduction to probability, the concept of random variables, special distributions (including the Binomial, Hypergeometric, Poisson, Normal, Geometric and Gamma) and statistical estimation. This unit will investigate univariate techniques in data analysis and the most common statistical distributions that are used to model patterns of variability. You will learn the method of moments and maximum likelihood techniques for fitting statistical distributions to data. The unit will have weekly computer classes where you will learn to use a statistical computing package to perform simulations and carry out computer-intensive estimation techniques like the bootstrap method. By doing this unit you will develop your statistical modeling skills and it will prepare you to learn more complicated statistical models.

Textbooks

An Introduction to Mathematical Statistics and Its Applications (5th edition), Chapters 1-5, Larsen and Marx (2012)

**STAT2911 Probability and Statistical Models (Adv)**

Credit points: 6 Teacher/Coordinator: A/Prof Jennifer Chan Session: Semester 1 Classes: 3x1-hr lectures; 1x1-hr tutorial; and 1x1-hr computer lab/wk Prerequisites: (MATH1X21 or MATH1931 or MATH1X01 or MATH1906 or MATH1011) and a mark of 65 or greater in (DATA1X01 or MATH10X5 or MATH1905 or STAT1021 or ECMT1010 or BUSS1020) Prohibitions: STAT2011 Assessment: 2 x quizzes (10%); 2 x assignments (5%); computer work (5%); weekly computer lab reports (5%); a computer lab exam (10%) and a final 2-hr exam (70%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit is essentially an advanced version of STAT2011, with an emphasis on the mathematical techniques used to manipulate random variables and probability models. Common distributions including the Poisson, normal, beta and gamma families as well as the bivariate normal are introduced. Moment generating functions and convolution methods are used to understand the behaviour of sums of random variables. The method of moments and maximum likelihood techniques for fitting statistical distributions to data will be explored. The notions of conditional expectation and prediction will be covered, as will distributions related to the normal: chi^2, t and F. The unit has weekly computer classes where you will learn to use a statistical computing package to perform simulations and carry out computer-intensive estimation techniques like the bootstrap method.

Textbooks

Mathematical Statistics and Data Analysis (3rd edition), J A Rice

**QBUS2830 Actuarial Data Analytics**

*This unit of study is not available in 2020*

Credit points: 6 Session: Semester 1 Classes: 1x 2hr lecture per week and 1x 1hr tutorial per week Prerequisites: QBUS2810 or DATA2002 or ECMT2110 Assumed knowledge: BUSS1020 or ECMT1010 or ENVX1001 or ENVX1002 or ((MATH1005 or MATH1015) and MATH1115) or 6 credit points in MATH 1000-level units including MATH1905. Assessment: assignments (30%), mid-semester exam (20%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

The unit covers a range of statistical models and methods for analysing quantitative actuarial data in general insurance. Both maximum likelihood estimation and Bayesian estimation methods are adopted for statistical inferences with the use of modern software tools such as the R and OpenBUGS packages. Topics covered include probability distributions for actuarial modelling, claim size modelling, claim frequency modelling, loss reserve forecasting, pure premium calculation, premium rates reviewing and revising (credibility theory), linear and generalised linear models, Poisson process and Markov process in actuarial modelling. Upon the completion of this unit and other relevant business analytics units, students may undertake professional examinations for actuaries or may get exemptions in some professional examination papers.

**GEGE2001 Genetics and Genomics**

Credit points: 6 Teacher/Coordinator: Dr Jenny Saleeba Session: Semester 1,Semester 2 Classes: Two lectures per week; one 3-hour practical session per week; and one tutorial per fortnight Prohibitions: GENE2002 or MBLG2972 or GEGE2901 or MBLG2072 Assumed knowledge: Mendelian genetics; mechanisms of evolution; molecular and chromosomal bases of inheritance; and gene regulation and expression. Assessment: Assignments, quizzes and presentation (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

The era of genomics has revolutionised our approach to biology. Recent breakthroughs in genetics and genomic technologies have led to improvements in human and animal health, in breeding and selection of economically important organisms and in the curation and care of wild species and complex ecosystems. In this unit, students will investigate/describe ways in which modern biology uses genetics and genomics to study life, from the unicellular through to complex multicellular organisms and their interactions in communities and ecosystems. This unit includes a solid foundation in classical Mendelian genetics and its extensions into quantitative and population genetics. It also examines how our ability to sequence whole genomes has changed our capacities and our understanding of biology. Links between DNA, phenotype and the performance of organisms and ecosystems will be highlighted. The unit will examine the profound insights that modern molecular techniques have enabled in the fields of developmental biology, gene regulation, population genetics and molecular evolution.

**GEGE2901 Genetics and Genomics (Advanced)**

Credit points: 6 Teacher/Coordinator: Dr Jenny Saleeba Session: Semester 1,Semester 2 Classes: Two lectures per week; one 3-hour practical session per week; and one tutorial per fortnight Prerequisites: Annual average mark of at least 70 Prohibitions: GENE2002 or MBLG2072 or GEGE2001 or MBLG2972 Assumed knowledge: Mendelian genetics, mechanisms of evolution, molecular and chromosomal bases of inheritance, and gene regulation and expression. Assessment: Assignments, quizzes, presentation, final exam Mode of delivery: Normal (lecture/lab/tutorial) day

The era of genomics has revolutionised our approach to biology. Recent breakthroughs in genetics and genomic technologies have led to improvements in human and animal health, in breeding and selection of economically important organisms and in the curation and care of wild species and complex ecosystems. In this unit, students will investigate/describe ways in which modern biology uses genetics and genomics to study life, from the unicellular through to complex multicellular organisms and their interactions in communities and ecosystems. This unit includes a solid foundation in classical Mendelian genetics and its extensions into quantitative and population genetics. It also examines how our ability to sequence whole genomes has changed our capacities and our understanding of biology. Links between DNA, phenotype and the performance of organisms and ecosystems will be highlighted. The unit will examine the profound insights that modern molecular techniques have enabled in the fields of developmental biology, gene regulation, population genetics and molecular evolution. The Advanced mode of Genetics and Genomics will provide you with a challenge and a higher level of academic rigour. You will have the opportunity to plan a project that will develop your skills in contemporary genetics/molecular biology techniques and will provide you with a greater depth of disciplinary understanding. The Advanced mode will culminate in a written report and/or in an oral presentation where you will discuss a recent breakthrough that has been enabled by the use of modern genetics and genomics technologies. This is a unit for anyone wanting to better understand how genetics has shaped the earth and how it will shape the future.

**QBIO2001 Molecular Systems Biology**

Credit points: 6 Teacher/Coordinator: Dr Edward Hancock Session: Semester 1 Classes: Two 1-hour lectures; one 3-hour practical session on a weekly basis Assumed knowledge: Basic concepts in metabolism; protein synthesis; gene regulation; quantitative and statistical skills Assessment: One 3-hour final exam (50%), three 45-minute quizzes (20%), one 5-minute presentation (10%), laboratory assessment and practical book (20%) Mode of delivery: Normal (lecture/lab/tutorial) day

Experimental approaches to the study of biological systems are shifting from hypothesis driven to hypothesis generating research. Large scale experiments at the molecular scale are producing enormous quantities of data ("Big Data") that need to be analysed to derive significant biological meaning. For example, monitoring the abundance of tens of thousands of proteins simultaneously promises ground-breaking discoveries. In this unit, you will develop specific analytical skills required to work with data obtained in the biological and medical sciences. The unit covers quantitative analysis of biological systems at the molecular scale including modelling and visualizing patterns using differential equations, experimental design and data types to understand disease aetiology. You will also use methods to model cellular systems including metabolism, gene regulation and signalling. The practical program will enable you to generate data analysis workflows, and gain a deep understanding of the statistical, informatics and modelling tools currently being used in the field. To leverage multiple types of expertise, the computer lab-based practical component of this unit will be predominantly a team-based collaborative learning environment. Upon completion of this unit, you will have gained skills to find meaningful solutions to difficult biological and disease-related problems with the potential to change our lives.

Textbooks

An Introduction to Systems Biology: Design Principles of Biological Circuits, Uri Alon, (Chapman and Hall/CRC, 2007). Systems Biology, Edda Klipp, Wolfram Liebermeister, Christoph Wierling, Axel Kowald, Hans Lehrach, and Ralf Herwig, (Wiley-Blackhall, 2009). Molecular biology of the cell, Alberts B et al (6th edition, Garland Science, 2015) Discovering Statistics Using R, Andy Field (2012, SAGE Publications Ltd). Computational and Statistical Methods for Protein Quantitation by Mass Spectrometry, Martens L et al (Wiley, 2013)

#### 3000-level units of study

###### Core interdisciplinary project

**DATA3888 Data Science Capstone**

Credit points: 6 Teacher/Coordinator: Prof Jean Yang Session: Semester 1 Prerequisites: DATA2001 or DATA2901 or DATA2002 or DATA2902 or STAT2912 or STAT2012 Assessment: Disciplinary component: Online quiz (10%), Student led lecture (10% report, 20% presentations, 10% peer review). Interdisciplinary component: Reflective task (5%), Team work process (10%), Report and presentation (35%) Mode of delivery: Normal (lecture/lab/tutorial) day

In our ever-changing world, we are facing a new data-driven era where the capability to efficiently combine and analyse large data collections is essential for informed decision making in business and government, and for scientific research. Data science is an emerging interdisciplinary field with its focus on high performance computation and quantitative expression of the confidence in conclusions, and the clear communication of those conclusions in different discipline contexts. This unit is our capstone project that presents the opportunity to create a public data product that can illustrate the concepts and skills you have learnt in this discipline. In this unit, you will have an opportunity to explore deeper disciplinary knowledge, while also meeting and collaborating through project-based learning. The capstone project in this unit will allow you to identify and place the data-driven problem into an analytical framework, solve the problem through computational means, interpret the results and communicate your findings to a diverse audience. All such skills are highly valued by employers. This unit will foster the ability to work in an interdisciplinary team and to translate problems between two or more disciplines, which is essential for both professional and research pathways in the future.

###### Methodology

**DATA3404 Data Science Platforms**

Credit points: 6 Teacher/Coordinator: A/Prof Uwe Roehm Session: Semester 1 Classes: lectures, tutorials Prerequisites: DATA2001 OR DATA2901 OR ISYS2120 OR INFO2120 OR INFO2820 Prohibitions: INFO3504 OR INFO3404 Assumed knowledge: This unit of study assumes that students have previous knowledge of database structures and of SQL. The prerequisite material is covered in DATA2001 or ISYS2120. Familiarity with a programming language (e.g. Java or C) is also expected. Assessment: through semester assessment (40%), final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit of study provides a comprehensive overview of the internal mechanisms of data science platforms and of the systems that manage large data collections. These skills are needed for successful performance tuning and to understand the scalability challenges faced when processing Big Data. This unit builds upon the second-year unit DATA2001 'Data Science - Big Data and Data Diversity' and correspondingly assumes a sound understanding of SQL and data analysis tasks.

The first part of this subject focuses on mechanisms for large-scale data management. It provides a deep understanding of the internal components of a data management platform. Topics include: physical data organization and disk-based index structures, query processing and optimisation, and database tuning.

The second part focuses on the large-scale management of big data in a distributed architecture. Topics include: distributed and replicated databases, information retrieval, data stream processing, and web-scale data processing.

The unit will be of interest to students seeking an introduction to data management tuning, disk-based data structures and algorithms, and information retrieval. It will be valuable to those pursuing such careers as Software Engineers, Data Engineers, Database Administrators, and Big Data Platform specialists.

**DATA3406 Human-in-the-Loop Data Analytics**

Credit points: 6 Teacher/Coordinator: Prof Judith Kay Session: Semester 2 Classes: lectures, laboratories, project work Assumed knowledge: Basic statistics, database management, and programming. Assessment: through semester assessment (40%), final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit focuses on methods and techniques to take into consideration the human elements in data science. Humans can act as both sources of data and its interpreters, introducing a range of complexities with regards to analysis. How do we account for the unreliability in data collected from humans? What can be done to address the subjects' concerns about their data? How can we create visualisations that facilitate understanding of the main findings? What are the limitations of any predictions? The ability to consider human factors is essential in any loop that involves people gathering, storing, or interpreting data for decision making.

On completion of this unit, students will be able to identify and analyse the human factors in the data analytics loop, and will be able to derive solutions for the challenges that arise.

**COMP3308 Introduction to Artificial Intelligence**

Credit points: 6 Teacher/Coordinator: A/Prof Irena Koprinska Session: Semester 1 Classes: Tutorials, Lectures Prohibitions: COMP3608 Assumed knowledge: Algorithms. Programming skills (e.g. Java, Python, C, C++, Matlab) Assessment: Through semester assessment (45%) and Final Exam (55%) Mode of delivery: Normal (lecture/lab/tutorial) day

Artificial Intelligence (AI) is all about programming computers to perform tasks normally associated with intelligent behaviour. Classical AI programs have played games, proved theorems, discovered patterns in data, planned complex assembly sequences and so on. This unit of study will introduce representations, techniques and architectures used to build intelligent systems. It will explore selected topics such as heuristic search, game playing, machine learning, neural networks and probabilistic reasoning. Students who complete it will have an understanding of some of the fundamental methods and algorithms of AI, and an appreciation of how they can be applied to interesting problems. The unit will involve a practical component in which some simple problems are solved using AI techniques.

**COMP3608 Introduction to Artificial Intelligence (Adv)**

Credit points: 6 Teacher/Coordinator: A/Prof Irena Koprinska Session: Semester 1 Classes: Lectures, Tutorials Prerequisites: Distinction-level results in at least one 2000 level COMP or MATH or SOFT unit Prohibitions: COMP3308 Assumed knowledge: Algorithms. Programming skills (e.g. Java, Python, C, C++, Matlab) Assessment: Through semester assessment (45%) and Final Exam (55%) Mode of delivery: Normal (lecture/lab/tutorial) day

Note: COMP3308 and COMP3608 share the same lectures, but have different tutorials and assessment (the same type but more challenging).

An advanced alternative to COMP3308; covers material at an advanced and challenging level.

**COMP3027 Algorithm Design**

Credit points: 6 Teacher/Coordinator: Seeun Umboh Session: Semester 1 Classes: lectures, tutorials Prerequisites: COMP2123 OR COMP2823 OR INFO1105 OR INFO1905 Prohibitions: COMP2007 OR COMP2907 OR COMP3927 Assumed knowledge: MATH1004 OR MATH1904 OR MATH1064 Assessment: through semester assessment (40%), final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit provides an introduction to the design techniques that are used to find efficient algorithmic solutions for given problems. The techniques covered include greedy, divide-and-conquer, dynamic programming, and adjusting flows in networks. Students will extend their skills in algorithm analysis. The unit also provides an introduction to the concepts of computational complexity and reductions between problems.

**COMP3927 Algorithm Design (Adv)**

Credit points: 6 Teacher/Coordinator: Seeun Umboh Session: Semester 1 Classes: lectures, tutorials Prerequisites: COMP2123 OR COMP2823 OR INFO1105 OR INFO1905 Prohibitions: COMP2007 OR COMP2907 OR COMP3027 Assumed knowledge: MATH1004 OR MATH1904 OR MATH1064 Assessment: through semester assessment (40%), final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit provides an introduction to the design techniques that are used to find efficient algorithmic solutions for given problems. The techniques covered include greedy, divide-and-conquer, dynamic programming, and adjusting flows in networks. Students will extend their skills in algorithm analysis. The unit also provides an introduction to the concepts of computational complexity and reductions between problems.

**STAT3021 Stochastic Processes**

Credit points: 6 Teacher/Coordinator: Dr John Ormerod Session: Semester 1 Classes: 3 lectures per week, tutorial 1hr per week. Prerequisites: STAT2X11 and (MATH1003 or MATH1903 or MATH1907 or MATH1023 or MATH1923 or MATH1933) Prohibitions: STAT3911 or STAT3011 Assessment: 2 x Quiz (2 x 15%), 2 x Assignment (2 x 5%), Final Exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day

A stochastic process is a mathematical model of time-dependent random phenomena and is employed in numerous fields of application, including economics, finance, insurance, physics, biology, chemistry and computer science. After setting up basic elements of stochastic processes, such as time, state, increments, stationarity and the Markovian property, this unit develops important properties and limit theorems of discrete-time Markov chains and branching processes. You will then establish key results for the Poisson process and continuous-time Markov chains, such as the memoryless property, superposition, thinning, Kolmogorov's equations and limiting probabilities. Various illustrative examples are provided throughout the unit to demonstrate how stochastic processes can be applied in modeling and analyzing problems of practical interest. By completing this unit, you will develop the essential basis for further studies, such as stochastic calculus, stochastic differential equations, stochastic control and financial mathematics.

**STAT3022 Applied Linear Models**

Credit points: 6 Teacher/Coordinator: Dr John Ormerod Session: Semester 1 Classes: Three 1 hour lectures, one 1 hour tutorial and one 1 hour computer laboratory per week. Prerequisites: STAT2X11 and (DATA2X02 or STAT2X12) Prohibitions: STAT3912 or STAT3012 or STAT3922 Assessment: 2 x assignment (15%), 3 x quizzes (30%), final exam (55%) Mode of delivery: Normal (lecture/lab/tutorial) day

In today's data-rich world more and more people from diverse fields are needing to perform statistical analyses and indeed more and more tools for doing so are becoming available; it is relatively easy to point and click and obtain some statistical analysis of your data. But how do you know if any particular analysis is indeed appropriate? Is there another procedure or workflow which would be more suitable? Is there such a thing as the best possible approach in a given situation? All of these questions (and more) are addressed in this unit. You will study the foundational core of modern statistical inference, including classical and cutting-edge theory and methods of mathematical statistics with a particular focus on various notions of optimality. The first part of the unit covers various aspects of distribution theory which are necessary for the second part which deals with optimal procedures in estimation and testing. The framework of statistical decision theory is used to unify many of the concepts. You will apply the theory to various real-world problems using statistical software in laboratory sessions. By completing this unit you will develop the necessary skills to confidently choose the best statistical analysis to use in many situations.

**STAT3922 Applied Linear Models (Advanced)**

Credit points: 6 Teacher/Coordinator: Dr John Ormerod Session: Semester 1 Classes: Three 1 hour lectures, one 1 hour tutorial and one 1 hour computer laboratory per week. Prerequisites: STAT2X11 and [a mark of 65 or greater in (STAT2X12 or DATA2X02)] Prohibitions: STAT3912 or STAT3012 or STAT3022 Assessment: 2 x assignment (10%), 3 x quizzes (35%), final exam (55%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will introduce the fundamental concepts of analysis of data from both observational studies and experimental designs using classical linear methods, together with concepts of collection of data and design of experiments. You will first consider linear models and regression methods with diagnostics for checking appropriateness of models, looking briefly at robust regression methods. Then you will consider the design and analysis of experiments considering notions of replication, randomization and ideas of factorial designs. Throughout the course you will use the R statistical package to give analyses and graphical displays. This unit is essentially an Advanced version of STAT3012, with additional emphasis on the mathematical techniques underlying applied linear models together with proofs of distribution theory based on vector space methods.

**STAT3023 Statistical Inference**

Credit points: 6 Teacher/Coordinator: Dr John Ormerod Session: Semester 2 Classes: Three 1 hour lectures, one 1 hour tutorial and one 1 hour computer laboratory per week. Prerequisites: STAT2X11 Prohibitions: STAT3913 or STAT3013 or STAT3923 Assumed knowledge: DATA2X02 or STAT2X12 Assessment: 2 x Quizzes (25%), Computer Lab Report (10%), Computer Exam (10%), Final Exam (55%) Mode of delivery: Normal (lecture/lab/tutorial) day

In today's data-rich world more and more people from diverse fields are needing to perform statistical analyses and indeed more and more tools for doing so are becoming available; it is relatively easy to point and click and obtain some statistical analysis of your data. But how do you know if any particular analysis is indeed appropriate? Is there another procedure or workflow which would be more suitable? Is there such a thing as the best possible approach in a given situation? All of these questions (and more) are addressed in this unit. You will study the foundational core of modern statistical inference, including classical and cutting-edge theory and methods of mathematical statistics with a particular focus on various notions of optimality. The first part of the unit covers various aspects of distribution theory which are necessary for the second part which deals with optimal procedures in estimation and testing. The framework of statistical decision theory is used to unify many of the concepts. You will apply the methods learnt to real-world problems in laboratory sessions. By completing this unit you will develop the necessary skills to confidently choose the best statistical analysis to use in many situations.

**STAT3923 Statistical Inference (Advanced)**

Credit points: 6 Teacher/Coordinator: Dr John Ormerod Session: Semester 2 Classes: Three 1 hour lectures, one 1 hour tutorial and one 2 hour advanced workshop. Prerequisites: STAT2X11 and a mark of 65 or greater in (DATA2X02 or STAT2X12) Prohibitions: STAT3913 or STAT3013 or STAT3023 Assessment: 2 x Quizzes (20%), weekly homework (5%), Computer Lab Reports (10%), Computer Exam (10%), Final Exam (55%) Mode of delivery: Normal (lecture/lab/tutorial) day

In today's data-rich world more and more people from diverse fields are needing to perform statistical analyses and indeed more and more tools for doing so are becoming available; it is relatively easy to point and click and obtain some statistical analysis of your data. But how do you know if any particular analysis is indeed appropriate? Is there another procedure or workflow which would be more suitable? Is there such a thing as a best possible approach in a given situation? All of these questions (and more) are addressed in this unit. You will study the foundational core of modern statistical inference, including classical and cutting-edge theory and methods of mathematical statistics with a particular focus on various notions of optimality. The first part of the unit covers various aspects of distribution theory which are necessary for the second part which deals with optimal procedures in estimation and testing. The framework of statistical decision theory is used to unify many of the concepts. You will rigorously prove key results and apply these to real-world problems in laboratory sessions. By completing this unit you will develop the necessary skills to confidently choose the best statistical analysis to use in many situations.

**STAT4025 Time Series**

Credit points: 6 Teacher/Coordinator: Dr John Ormerod Session: Semester 1 Classes: 3 lectures, one tutorial and one computer class per week. Prerequisites: STAT2X11 and (MATH1X03 or MATH1907 or MATH1X23 or MATH1933) Prohibitions: STAT3925 Assessment: 2 x Quiz (20%), Computer lab participation / task completion (10%), Computer Exam (10%), Final Exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will study basic concepts and methods of time series analysis applicable to many real-world problems in numerous fields, including economics, finance, insurance, physics, ecology, chemistry, computer science and engineering. This unit will investigate the basic methods of modelling and analysing time series data (i.e. data with a serial dependence structure). This can be achieved through learning standard time series procedures on identification of components, autocorrelations, partial autocorrelations and their sampling properties. After setting up these basics, students will learn the theory of stationary univariate time series models including ARMA, ARIMA and SARIMA and their properties. Then the identification, estimation, diagnostic model checking, decision making and forecasting methods based on these models will be developed with applications. The spectral theory of time series, estimation of spectra using the periodogram and consistent estimation of spectra using lag-windows will be studied in detail. Further, the methods of analysing long memory time series and heteroscedastic time series models including ARCH, GARCH, ACD, SCD and SV models from financial econometrics and the analysis of vector ARIMA models will be developed with applications. By completing this unit, students will develop the essential basis for further studies, such as financial econometrics and financial time series. The skills gained through this unit of study will form a strong foundation for work in the financial industry or in a related research organisation.

**STAT4026 Statistical Consulting**

Credit points: 6 Teacher/Coordinator: Dr John Ormerod Session: Semester 1 Classes: lecture 1 hr/week; workshop 2hrs/week Prerequisites: At least 12cp from STAT2X11 or STAT2X12 or DATA2X02 or STAT3XXX Prohibitions: STAT3926 Assessment: 4 x reports (40%), take-home exam report (40%), oral presentation (20%) Practical field work: Face to face client consultation: approximately 1 - 1.5 hrs/week Mode of delivery: Normal (lecture/lab/tutorial) day

In our ever-changing world, we are facing a new data-driven era where the capability to efficiently combine and analyse large data collections is essential for informed decision making in business and government, and for scientific research. Statistics and data analytics consulting provide an important framework for many individuals to seek assistance with statistics and data-driven problems. This unit of study will provide students with an opportunity to gain real-life experience in statistical consulting or work in collaborative (interdisciplinary) research. In this unit, you will have an opportunity to gain practical experience in a consultation setting with real clients. You will also apply your statistical knowledge in a diverse collection of consulting projects while learning project and time management skills. In this unit you will need to identify and place the client's problem into an analytical framework, provide a solution within a given time frame and communicate your findings back to the client. All such skills are highly valued by employers. This unit will foster the expertise needed to work in a statistical consulting firm or data analytics team, which will be essential for data-driven professional and research pathways in the future.

###### Application

**ENVX3001 Environmental GIS**

Credit points: 6 Teacher/Coordinator: Dr Aaron Greenville Session: Semester 2 Classes: Three-day field trip, (two lectures and two practicals per week) Prerequisites: 6cp from (ENVI1003 or AGEN1002) or 6cp from GEOS1XXX or 6cp from BIOL1XXX or GEOS2X11 Assessment: 15-minute presentation (10%), 3500 word prac report (35%), 1500 word report on trip excursion (15%), 2-hour exam (40%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit is designed to impart knowledge and skills in spatial analysis and geographical information science (GISc) for decision-making in an environmental context. The lecture material will present several themes: principles of GISc, geospatial data sources and acquisition methods, processing of geospatial data and spatial statistics. Practical exercises will focus on learning geographical information systems (GIS) and how to apply them to land resource assessment, including digital terrain modelling, land-cover assessment, sub-catchment modelling, ecological applications, and soil quality assessment for decisions regarding sustainable land use and management. A three-day field excursion during the mid-semester break will involve visiting Canberra to hear from various government agencies which research and maintain GIS coverages for Australia. By the end of this unit, students should be able to: differentiate between spatial data and spatial information; source geospatial data from government and private agencies; apply conceptual models of spatial phenomena for practical decision-making in an environmental context; critically analyse situations and apply the concepts of spatial analysis to solving environmental and land resource problems; communicate the results of GIS investigations effectively through various means (oral, written and essay formats); and use a major GIS software package such as ArcGIS.

Textbooks

Burrough, P.A. and McDonnell, R.A. 1998. Principles of Geographic Information Systems. Oxford University Press: Oxford.

**ENVX3002 Statistics in the Natural Sciences**

Credit points: 6 Teacher/Coordinator: Dr Floris van Ogtrop Session: Semester 1 Classes: One 2-hour workshop per week, one 3-hour computer practical per week Prerequisites: ENVX2001 or STAT2X12 or BIOL2X22 or DATA2X02 or QBIO2001 Assessment: One computer-based exam during the exam period (50%), assessment tasks focusing on analysing and interpreting real datasets (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

This unit of study is designed to introduce students to the analysis of data they may face in their future careers, in particular data that are not well behaved. The data may be non-normal, there may be missing observations, they may be correlated in space and time or too numerous to analyse with standard models. The unit is presented in an applied context with an emphasis on correctly analysing authentic datasets, and interpreting the output. It begins with the analysis and design of experiments based on the general linear model. In the second part, students will learn about the generalisation of the general linear model to accommodate non-normal data with a particular emphasis on the binomial and Poisson distributions. In the third part linear mixed models will be introduced which provide the means to analyse datasets that do not meet the assumptions of independent and equal errors, for example data that are correlated in space and time. The unit ends with an introduction to machine learning and predictive modelling. A key feature of the unit is using R to develop coding skills that are becoming essential in science for processing and analysing datasets of ever increasing size.

**AMED3002 Interrogating Biomedical and Health Data**

Credit points: 6 Teacher/Coordinator: Dr Ellis Patrick Session: Semester 1 Classes: face to face 5 hrs/week; online 2 hrs/week; individual and/or group work 3-6 hrs/week Assumed knowledge: Exploratory data analysis, sampling, simple linear regression, t-tests, confidence intervals and chi-squared goodness of fit tests, familiar with basic coding, basic linear algebra. Assessment: Exam, assignments, quiz, presentation Mode of delivery: Normal (lecture/lab/tutorial) day

Biotechnological advances have given rise to an explosion of original and shared public data relevant to human health. These data, including the monitoring of expression levels for thousands of genes and proteins simultaneously, together with multiple databases on biological systems, now promise exciting, ground-breaking discoveries in complex diseases. Critical to these discoveries will be our ability to unravel and extract information from these data. In this unit, you will develop analytical skills required to work with data obtained in the medical and diagnostic sciences. You will explore clinical data using powerful, state-of-the-art methods and tools. Using real data sets, you will be guided in the application of modern data science techniques to interrogate, analyse and represent the data, both graphically and numerically. By analysing your own real data, as well as data from large public resources, you will learn and apply the methods needed to find information on the relationship between genes and disease. Leveraging expertise from multiple sources by working in team-based collaborative learning environments, you will develop knowledge and skills that will enable you to play an active role in finding meaningful solutions to difficult problems, creating an important impact on our lives.

**QBUS3810 Actuarial Risk Analytics**

*This unit of study is not available in 2020*

Credit points: 6 Session: Semester 1 Classes: 1x 2hr lecture and 1x 1hr tutorial per week Prerequisites: QBUS2810 or DATA2002 or ECMT2110 Prohibitions: ECMT3180 Assessment: assignment 1 (10%), assignment 2 (10%), assignment 3 (10%), mid-semester exam (15%), group assignment (15%), final exam (40%) Mode of delivery: Normal (lecture/lab/tutorial) day

Everyone working in business needs to understand and manage risk. This unit provides the basic knowledge and tools needed to do this. It includes material on the risk management strategies that every business needs, as well as specific quantitative and statistical techniques for evaluating risk. Through this unit students learn how different aspects of risk management fit together (like Value-at-Risk (VaR) and tail-VaR calculations, Monte-Carlo simulation, extreme value theory, individual and collective risk models, credibility theory and credit scoring).

**GEGE3004 Applied Genomics**

Credit points: 6 Teacher/Coordinator: Prof Claire Wade Session: Semester 2 Classes: Workshop 4 hours per week during standard semester. Prerequisites: 6cp of (GEGE2X01 or QBIO2XXX or DATA2X01 or GENE2XXX or MBLG2X72 or ENVX2001 or DATA2X02) Prohibitions: ANSC3107 Assumed knowledge: Genetics at 2000 level, Biology at 1000 level, algebra Assessment: The assessment will consist of one intra-semester examination (20%), group work assignment (30%) [including assessment of both a project report (20%) and the team process (10%)], individual assignment (10%) and final examination (40%). Mode of delivery: Normal (lecture/lab/tutorial) day

Note: This unit must be taken by all students in the Genetics and Genomics major.

The average mammalian genome is 3 billion nucleotides long and some other organisms have genomes that are even larger. Working with DNA at the nucleotide level on an organismal scale is impossible without the assistance of high performance computing. This unit will investigate strategies to manipulate genomic data on a whole organism scale. You will learn how scientists use high performance computing and web-based resources to compare and assemble genomes, map genes that cause specific phenotypes, and uncover mutations that cause phenotypic changes in organisms that influence health, external characteristics, production and disease. By doing this unit you will develop skills in the analysis of big data, you will gain familiarity with high performance computing worktop environments and learn to use bioinformatics tools that are commonly applied in research.

**BCMB3004 Beyond The Genome**

Credit points: 6 Teacher/Coordinator: Prof Stuart Cordwell Session: Semester 2 Classes: lectures 2 hrs/week, practicals 3 hrs/week Prerequisites: 12 credit points from (AMED3001 or BCHM2X71 or BCHM2X72 or BCHM3XXX or BCMB2X01 or BCMB2X02 or BCMB3XXX or BIOL2X29 or BMED2401 or BMED2405 or GEGE2X01 or MBLG2X01 or MEDS2002 or MEDS2003 or PCOL2X21 or QBIO2001) Prohibitions: BCHM3X81 or BCMB3002 Assumed knowledge: Intermediate protein chemistry and biochemistry concepts Assessment: 4 x in-practical reports (10%), take-home computational practical (5%), 1000-1500wd scientific report (10%), mid-semester quiz (10%), 1500-2000wd data analysis and interpretation scientific report (15%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

The sequencing of the human genome was a landmark achievement in science and medicine, marking the 'Age of Genomics'. Now we can access the blueprints for life, but need to uncover how those blueprints work, allowing organisms to respond to internal and external environmental changes, and how we can utilise this plethora of DNA sequence information to improve human and planetary health. This unit will investigate the function of the genome by examining the proteome, metabolome and beyond. You will investigate links between the central dogma of molecular biology and the complexities of living genomes - from modifications that massively increase diversity to the dynamic metabolome. You will explore fundamental cellular processes and discover how they are shaped by the proteome via gene expression, post-translational modification and protein complex formation. These processes will be examined in the context of human health and cardiovascular and metabolic disorders (e.g. type 2 diabetes) to demonstrate how global approaches can define, diagnose and help develop treatments for disease. You will practise methods employed in the post-genome era, including the 'Multi-omics' approaches that provide a global view of living systems, and discover how they are applied to solve problems in biology, biomedicine and agriculture. By the end of the unit students will understand why global 'omics approaches are needed in the post-genome era and know how best to apply such tools to given biological and biomedical problems.

**BCMB3904 Beyond The Genome (Advanced)**

Credit points: 6 Teacher/Coordinator: Prof Stuart Cordwell Session: Semester 2 Classes: lectures 2 hrs/week, practicals 3 hrs/week, 4 x 1 hr advanced tutorials, 8 x 1 hr advanced practicals Prerequisites: An average mark of 75 or above in 12 credit points from (AMED3001 or BCHM2X71 or BCHM2X72 or BCHM3XXX or BCMB2X01 or BCMB2X02 or BCMB3XXX or BIOL2X29 or BMED2401 or BMED2405 or GEGE2X01 or MBLG2X01 or MEDS2002 or MEDS2003 or PCOL2X21 or QBIO2001) Prohibitions: BCHM3X92 or BCMB3004 Assumed knowledge: Students should understand basic concepts in human, mammalian, plant and/or prokaryotic biology. Students should have a basic understanding of the 'genome' and of the central dogma of molecular biology (gene transcription and protein translation). Additional knowledge of basic chemistry and protein biochemistry will be helpful. Assessment: 4 x in-practical reports (10%), take-home computational practical (5%), 1000-1500wd scientific report (10%), mid-semester quiz (10%), 1500-2000wd data analysis and interpretation scientific report (15%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day

The sequencing of the human genome was a landmark achievement in science and medicine, marking the 'Age of Genomics'. Now we can access the blueprints for life, but need to uncover how those blueprints work, allowing organisms to respond to internal and external environmental changes, and how we can utilise this plethora of DNA sequence information to improve human and planetary health. This unit will investigate the function of the genome by examining the proteome, metabolome and beyond. You will investigate links between the central dogma of molecular biology and the complexities of living genomes - from modifications that massively increase diversity to the dynamic metabolome. You will explore fundamental cellular processes and discover how they are shaped by the proteome via gene expression, post-translational modification and protein complex formation. These processes will be examined in the context of human health and cardiovascular and metabolic disorders (e.g. type 2 diabetes) to demonstrate how global approaches can define, diagnose and help develop treatments for disease. You will practise methods employed in the post-genome era, including the 'Multi-omics' approaches that provide a global view of living systems, and discover how they are applied to solve problems in biology, biomedicine and agriculture. Beyond the Genome (Advanced) has the same overall structure as BCMB3004 but focuses on a more advanced level of practical work, data analysis and interpretation, using cutting-edge technologies. By the end of the unit students will understand why global 'omics approaches are needed in the post-genome era and know how best to apply such tools to given biological and biomedical problems.

###### Selective Interdisciplinary Project

**SCPU3001 Science Interdisciplinary Project**

Credit points: 6 Teacher/Coordinator: Prof Pauline Ross Session: Intensive February, Intensive July, Semester 1, Semester 2 Classes: The unit consists of one seminar/workshop per week with accompanying online materials and a project to be determined in consultation with the partner organisation and completed as part of a team with academic supervision. Prerequisites: Completion of 2000-level units required for at least one Science major. Assessment: group plan, group presentation, reflective journal, group project Mode of delivery: Normal (lecture/lab/tutorial) day

This unit is designed for students who are concurrently enrolled in at least one 3000-level Science Table A unit of study to undertake a project that allows them to work with one of the University's industry and community partners. Students will work in teams on a real-world problem provided by the partner. This experience will allow students to apply their academic skills and disciplinary knowledge to a real-world issue in an authentic and meaningful way. Participation in this unit will require students to submit an application to the Faculty of Science.

**STAT3888 Statistical Machine Learning**

Credit points: 6 Teacher/Coordinator: Dr John Ormerod Session: Semester 2 Classes: Three 1 hour lectures, one 1 hour tutorial and one 1 hour computer laboratory per week. Prerequisites: STAT2X11 and (DATA2X02 or STAT2X12) Prohibitions: STAT3914 or STAT3014 Assumed knowledge: STAT3012 or STAT3912 or STAT3022 or STAT3922 Assessment: Written exam (40%), major project (50%), computer labs (10%) Mode of delivery: Normal (lecture/lab/tutorial) day

Data Science is an emerging and inherently interdisciplinary field. A key set of skills in this area falls under the umbrella of Statistical Machine Learning methods. This unit presents the opportunity to bring together the concepts and skills you have learnt from a Statistics or Data Science major, and apply them to a joint project with NUTM3888 where Statistics and Data Science students will form teams with Nutrition students to solve a real-world problem using Statistical Machine Learning methods. The unit will cover a wide breadth of cutting-edge supervised and unsupervised learning methods, including principal component analysis, multivariate tests, discriminant analysis, Gaussian graphical models, log-linear models, classification trees, k-nearest neighbors, k-means clustering, hierarchical clustering, and logistic regression. In this unit, you will continue to understand and explore disciplinary knowledge, while also meeting and collaborating through project-based learning; identifying and solving problems, analysing data and communicating your findings to a diverse audience. All such skills are highly valued by employers. This unit will foster the ability to work in an interdisciplinary team, which is essential for both professional and research pathways in the future.