Statistical Natural Language Processing (COMP5046)
UNIT OF STUDY
This unit introduces computational linguistics and the statistical techniques and algorithms used to automatically process natural languages (such as English or Chinese). It will review the core statistics and information theory, and the basic linguistics, required to understand statistical natural language processing (NLP).
Statistical NLP is used in a wide range of applications, including information retrieval and extraction, question answering, machine translation, and document classification and clustering. This unit will explore state-of-the-art approaches to the key NLP sub-tasks, including tokenisation, morphological analysis, word sense disambiguation, part-of-speech tagging, named entity recognition, text categorisation, and both phrase structure and Combinatory Categorial Grammar parsing.
Students will implement many of these sub-tasks in labs and assignments. The unit will also investigate the annotation process that is central to creating training data for statistical NLP systems. Students will annotate data as part of completing a real-world NLP task.
Further unit of study information
Lecture 2 hrs/week; Laboratory 1 hr/week.
Through-semester assessment (50%); Final Exam (50%).
Faculty/department permission required?
Unit of study rules
Prerequisites and assumed knowledge
Knowledge of an object-oriented (OO) programming language
Study this unit outside a degree
If you wish to undertake one or more units of study (subjects) for your own interest but not towards a degree, you may enrol in single units as a non-award student.
If you are from another Australian tertiary institution, you may be permitted to undertake cross-institutional study in one or more units of study at the University of Sydney.