MEAFA Professional Development Workshop on Machine Learning using Python 19-23 February 2018

Machine Learning

Machine learning describes a set of data analysis methods that automatically detect patterns in data and use them to predict future data and guide decision making, often in real time. Machine learning is inherently iterative: as a method is exposed to new data, it learns from the patterns in that data and adapts independently, without needing to be explicitly programmed. The method thus learns from previous computations to produce reliable, repeatable results. This workshop will provide a comprehensive introduction to machine learning for predictive problems (known as supervised learning), with a focus on practical implementation and applications. Starting with basic methods such as linear regression, naïve Bayes and logistic regression, the workshop will build towards popular algorithms that can learn complex patterns from data, such as random forests, gradient boosting, and neural networks.

The workshop will be based on the powerful ecosystem of free and open source machine learning libraries for Python. The main tool will be Scikit-Learn, a production-ready framework that provides a wide range of machine learning algorithms through implementations that are simple to use. The workshop will also leverage state-of-the-art libraries such as XGBoost and TensorFlow for specialised tasks. By participating in this workshop, you will acquire a wide range of machine learning tools for applications in business and economics. The workshop will include case studies that demonstrate full machine learning project workflows, providing participants with templates that they can build on to develop their own machine learning projects after the workshop.

Machine Learning presenter

The presenter for Days 3-5 on Machine Learning is Marcel Scharth, Business Analytics, The University of Sydney. Marcel specialises in Bayesian statistics, Monte Carlo methods, statistical (machine) learning, and time series analysis. Marcel's current interests include Bayesian machine learning, high-dimensional estimation, dynamic factor models, and statistical learning methods for time series. Marcel teaches Statistical Learning and Data Mining, and Predictive Analytics.

Python presenter

The presenter for Days 1-2 on Python is Dmytro Matsypura, Business Analytics, The University of Sydney. Dmytro uses Python in his day-to-day work for applications of convex and combinatorial optimisation in forecasting, graph theory, finance, transportation, and ecology. Dmytro also teaches Python as the main computing tool in his courses on Management Science and Optimisation.

Content description

You may attend any one day or any combination of the following days.

Days 1-2 (Monday-Tuesday 19-20 Feb 2018): Introduction to Python and Data Management by Dmytro Matsypura, Business Analytics, The University of Sydney

These two days assume no previous knowledge of Python or any other programming language. They introduce Python, its strengths and limitations, and its core syntactic features. They discuss key principles and present notable features, including data types and looping techniques. They also cover complex data structures, the combination of datasets, and the extraction of information from data, including the management of textual data. These days are of interest to those who are new to Python or have no prior programming experience. They are also useful to those with limited programming experience who wish to attain a more structured understanding of Python.

Days 3-5 (Wednesday-Friday 21-23 Feb 2018): Machine Learning by Marcel Scharth, Business Analytics, The University of Sydney

The three-day workshop assumes prior knowledge of Python as covered in Days 1-2. During these three days you will learn practical tools for Machine Learning, specifically supervised learning, including the widely used methods of ridge regression, the Lasso, the naïve Bayes classifier, decision trees, random forests, boosting, support vector machines, and neural networks. The last part of the workshop introduces advanced tools for neural networks and the concept of deep learning. The focus is applied, but background material will be provided covering technical descriptions of the algorithms, theory, and further reading.

Enrolment and Fees

You may attend Days 1-2 to learn Python, Days 3-5 to learn Machine Learning if you already know Python, or all days of the workshop if you do not know Python and also want to learn Machine Learning. The cost of attending is $600 per day. Prices include GST. If you are paying with a University of Sydney credit card or through an internal journal entry, deduct the cost of GST.

Fees include extensive course material, code, data sets, use of computing facilities, and full catering throughout the days. To express your interest in attending you must complete the online form:

Expression of Interest

Numbers are limited and places are reserved on a first-come, first-served basis upon submission of the online EOI form. Successful applicants will be notified shortly afterwards and invoices will be issued accordingly. Due to limited places, MEAFA maintains a no-refund policy. For more information on enrolment and fees, contact business.meafa@sydney.edu.au.

Net proceeds from the workshop go to funding MEAFA PhD scholarships.

Discounts

You may qualify for one of the following discounts:

  • 25% discount for a limited number of non-employed full-time research students.
  • 10% discount for additional attendees from the same business organisation, governmental department or academic unit.

Venue and computing facilities

The workshop takes place at The University of Sydney Business School Codrington Building H69, Computer Lab 5. The H69 Codrington Building is adjacent to the new Business School building. For directions, go to Campus Maps and search for the keyword 'H69'.

Desktop PCs are provided onsite. You can also work on your own laptop, but you cannot access the web using the University of Sydney server. If you plan to work on your own laptop, please install the XGBoost, LightGBM, TensorFlow, Keras, and Natural Language Toolkit (including data) packages prior to the workshop. The WordCloud and Basemap packages will also be needed to reproduce some of the visualisations shown in the workshop.
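
One way to confirm that a laptop is ready is to check that Python can locate each of the packages listed above. The following is a minimal sketch under stated assumptions, not an official setup script: the import names (`xgboost`, `lightgbm`, `tensorflow`, `keras`, `nltk`, `wordcloud`, `mpl_toolkits.basemap`) are the usual ones for these packages rather than names taken from the workshop material, and the helpers `is_installed` and `missing_packages` are hypothetical names introduced for illustration.

```python
import importlib.util

# Usual import names for the packages listed above.
# These are assumptions, not taken from the workshop material.
REQUIRED = {
    "XGBoost": "xgboost",
    "LightGBM": "lightgbm",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "Natural Language Toolkit": "nltk",
    "WordCloud": "wordcloud",
    "Basemap": "mpl_toolkits.basemap",
}


def is_installed(module_name):
    """Return True if Python can locate the module.

    The module itself is not imported, although parent packages of a
    dotted name (e.g. mpl_toolkits) are.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name is absent.
        return False


def missing_packages(required=REQUIRED):
    """Return the display names of the packages that cannot be located."""
    return [label for label, module in required.items()
            if not is_installed(module)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

This check only confirms that the packages can be found; it does not verify versions. The Natural Language Toolkit data is downloaded separately from the package itself, for example by calling `nltk.download()` from a Python session.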

Accommodation

MEAFA does not engage in the administration of temporary accommodation. It is up to you to find suitable living arrangements.

Timetable

All days have the same time schedule:

  • 08:40-09:00 - Welcome tea and coffee
  • 09:00-10:30 - Session 1
  • 10:30-10:45 - Morning break
  • 10:45-12:15 - Session 2
  • 12:15-13:15 - Lunch
  • 13:15-14:45 - Session 3
  • 14:45-15:00 - Afternoon break
  • 15:00-16:30 - Session 4
  • 16:30-17:00 - Buffer time and user-specific questions

The computer labs will be accessible from 8am to 8pm every day.

Detailed Programme

Day 1 (Monday 19 Feb 2018): Introduction to Python

Session 1: Python basics
Python versions; Python distributions; installation; updates; help resources; interactive development environments.
Session 2: Python key features
Data types; type conversions; files and loops; Boolean statements; if statements; sets and set operations.
Session 3: Lists and dictionaries
Lists and list operations; list comprehensions; dictionaries; creating, iterating over, sorting, accessing keys and values; fast extraction of lists from dictionaries.
Session 4: Programming features
Introduction to functions; variable scopes; loops; modules and classes; debugging and error handling.

Day 2 (Tuesday 20 Feb 2018): Data Management using Python

Session 1: Data fundamentals
Structured and unstructured data; delimited data; file management; reading, writing and appending; dialects and formatting parameters.
Session 2: Numerical data
Data structures; storage and precision; summaries; tabulations; data combinations.
Session 3: Text data
Regular expressions; parsing text; ASCII and Unicode text; text mining; text functions.
Session 4: Output management
Output management; output formatting; tables and graphs; working with large data.

Day 3 (Wednesday 21 Feb 2018): Machine Learning fundamentals

Session 1: Introduction to Machine Learning
Project workflow with Scikit-Learn and Pandas; linear regression; k-nearest neighbours regression; cross-validation; model evaluation for regression.
Session 2: Regularised linear methods
Standardising predictors; pipelines; ridge regression; the Lasso; elastic net; Bayesian ridge regression.
Session 3: Naïve Bayes
Case study on sentiment analysis; text processing with the Natural Language Toolkit; tokenization; removing punctuation and stopwords; stemming and lemmatization; naïve Bayes classifier; evaluating binary classification models.
Session 4: Logistic regression and optimal decisions
Case study on consumer credit risk modelling; feature engineering; logistic regression; regularised logistic regression; Gaussian discriminant analysis; loss matrices and decision thresholds; imbalanced classes; implementing optimal decisions.

Day 4 (Thursday 22 Feb 2018): Trees and Ensembles

Session 1: Decision trees and random forests
Decision trees; visualising decision trees; bagging; random forests; tuning random forests.
Session 2: Boosting
AdaBoost; gradient boosting; boosting with Scikit-Learn; specialised libraries: XGBoost and LightGBM; tuning boosting algorithms.
Session 3: Ensemble learning
Boosting (continued); voting classifier; ensemble learning; model stacking.
Session 4: Application
Case study on a Kaggle competition.

Day 5 (Friday 23 Feb 2018): Advanced methods

Session 1: Support vector machines
Linear SVM classification; soft margin classification; nonlinear SVM classification; polynomial kernel; Gaussian RBF kernel; SVM regression.
Session 2: Getting started with neural networks
Deep learning frameworks: TensorFlow and Keras; creating and managing graphs; linear regression with TensorFlow or Keras; implementing gradient descent; feeding data to the training algorithm.
Session 3: Artificial neural networks
Introduction to artificial neural networks; multi-layer perceptron and backpropagation; tuning neural networks.
Session 4: Introduction to deep learning
Introduction to training deep neural networks.

N.B. The precise content per session is subject to reshuffling and fine-tuning.

Expression of Interest

Numbers are limited and places are reserved on a first-come, first-served basis following completion of the online form:

Expression of Interest