Natural Language Processing (COMP5046)


This unit introduces computational linguistics and the statistical techniques and algorithms used to automatically process natural languages (such as English or Chinese). It will review the core statistics, information theory, and basic linguistics required to understand statistical natural language processing (NLP). Statistical NLP is used in a wide range of applications, including information retrieval and extraction, question answering, machine translation, and document classification and clustering. This unit will explore state-of-the-art approaches to the key NLP sub-tasks, including tokenisation, morphological analysis, word sense disambiguation, part-of-speech tagging, named entity recognition, text categorisation, phrase structure parsing, and Combinatory Categorial Grammar parsing. Students will implement many of these sub-tasks in labs and assignments. The unit will also investigate the annotation process that is central to creating training data for statistical NLP systems. Students will annotate data as part of completing a real-world NLP task.


Further unit of study information


Classes: Lectures, Laboratory


Assessment: Through-semester assessment (50%) and final exam (50%)



Unit of study rules

Assumed knowledge

Knowledge of an object-oriented (OO) programming language

Study this unit outside a degree

Non-award/non-degree study

If you wish to undertake one or more units of study (subjects) for your own interest but not towards a degree, you may enrol in single units as a non-award student.

Cross-institutional study

If you are from another Australian tertiary institution, you may be permitted to undertake cross-institutional study in one or more units of study at the University of Sydney.