Statistical Natural Language Processing (COMP5046)


This unit deals with techniques for the automatic processing of natural languages (such as English, French, etc.) and the engineering of such software systems. Engineering processes will be described in the context of methods for creating effective tools for information retrieval and extraction, question answering, and the classification and clustering of documents in large corpora. Processing sub-systems for tasks such as tokenisation, lexical verification, part-of-speech tagging, parsing and word sense disambiguation will be described. Particular emphasis is given to methods that analyse the meaning of texts and to the general application of machine learning methods to these topics. Applications of these methods to research on health texts and in other areas being pursued at the University of Sydney will be explored.


Further unit of study information


Classes: One 2-hour scheduled small-group class per week.


Assessment: Through-semester assessment (60%), final exam (40%).

Faculty/department permission required?


Unit of study rules

Prerequisites and assumed knowledge

Assumed knowledge: Knowledge of an object-oriented (OO) programming language.




Study this unit outside a degree

Non-award/non-degree study

If you wish to undertake one or more units of study (subjects) for your own interest but not towards a degree, you may enrol in single units as a non-award student.


Cross-institutional study

If you are from another Australian tertiary institution, you may be permitted to undertake cross-institutional study in one or more units of study at the University of Sydney.
