Honours Projects 2010
Projects supervised by Jon Patrick
A study of prepositions in the English Language
Prepositions are an important part of the glue of meaning in language. No semantic analyser of language will be truly successful without effectively detecting the meaning of each preposition as it is used. The aim of this project is to build a system that detects the meaning of every preposition instance in a text. This involves developing a scheme for describing the various meanings of prepositions and the formal roles they can serve, and then developing algorithms for detecting those meanings and roles in a sentence.
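As a very small illustration of the detection step, the sketch below assigns a coarse sense to each preposition by looking at its object. The sense labels (temporal, locative, other), the word lists and the function name `label_prepositions` are hypothetical placeholders, not a proposed sense inventory; a real system would use a much richer description of meanings and roles and would learn its decisions from annotated text.

```python
# Hypothetical, deliberately tiny word lists for illustration only.
PREPOSITIONS = {"in", "on", "at", "of", "with", "for", "by"}
DETERMINERS = {"the", "a", "an", "this", "that"}
TIME_WORDS = {"monday", "morning", "night", "week", "january", "minutes"}
SPATIAL_PREPOSITIONS = {"in", "on", "at"}


def label_prepositions(tokens):
    """Return (index, preposition, object, sense) for each preposition in tokens.

    The object is crudely taken to be the first token after the preposition
    that is not a determiner; the sense is a coarse, hypothetical label.
    """
    words = [t.lower() for t in tokens]
    results = []
    for i, word in enumerate(words):
        if word not in PREPOSITIONS:
            continue
        # Skip determiners to approximate the head of the prepositional object.
        j = i + 1
        while j < len(words) and words[j] in DETERMINERS:
            j += 1
        obj = words[j] if j < len(words) else None
        if obj in TIME_WORDS:
            sense = "temporal"
        elif word in SPATIAL_PREPOSITIONS and obj is not None:
            sense = "locative"
        else:
            sense = "other"
        results.append((i, word, obj, sense))
    return results


print(label_prepositions("The meeting is on Monday in the hall".split()))
# [(3, 'on', 'monday', 'temporal'), (5, 'in', 'hall', 'locative')]
```

Rules this shallow break quickly: in "in the early morning" the first non-determiner token is "early", so the phrase is wrongly labelled locative. That gap is what a proper description of prepositional roles and learned detection algorithms would have to close.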
Canonicalisation of medical texts
This research is a study of the pre-processing functionality needed for the natural language processing of medical texts. Medical texts and clinical notes contain a great deal of specialised information that needs to be identified before more conventional methods of tagging and parsing can operate effectively. Certain genres of text, such as newspaper or journal articles, or formal case reports written up by doctors about their patients, are generally expected to conform to conventions of structure and readability. Other types of text, however, such as notes on a patient written by a GP for later review, or Emergency Department (ED) reports entered by triage nurses as patients arrive at the ED, are marked by problems that are likely to degrade the quality of any text processing we may wish to perform. Examples of such problems are variation in the representation of core medical concepts, whether unconscious (such as typographical errors) or conscious (such as abbreviations), and the use of different notations to signify the same concept (for example, the many ways in which a doctor might record a blood pressure reading). Still other problems are common to nearly all texts in the medical domain, such as the recognition of multiword expressions (for example, we would wish “arterial blood pressure” to be treated as a single unit rather than as three distinct and unrelated words).
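The aim of this project is to identify methods for performing these pre-processing functions so as to enhance the usability of the text in language technology systems such as document classifiers. This work will use a large corpus of texts obtained from the Intensive Care information system of the Royal Prince Alfred Hospital. The types of processing needed include the correction of typographical errors, the expansion of abbreviations, the normalisation of different notations for the same concept, and the recognition of multiword expressions.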
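A minimal sketch of three of these steps, applied in sequence to a single note, is shown below: it rewrites several blood pressure notations into one canonical token, expands abbreviations, and joins multiword expressions using a greedy longest-match lookup. The abbreviation table, the multiword lexicon, the regular expression and the canonical `BP=…/…` form are all hypothetical choices made for illustration; a real system would draw on clinical terminology resources and would need far broader notation coverage.

```python
import re

# Hypothetical abbreviation table; a real system would draw on a clinical lexicon.
ABBREVIATIONS = {"pt": "patient", "hx": "history", "sob": "shortness of breath"}

# Hypothetical multiword-expression lexicon, stored as tuples of tokens.
MWES = {("arterial", "blood", "pressure"), ("blood", "pressure")}
MAX_MWE_LEN = max(len(m) for m in MWES)

# Several ways of writing a blood pressure reading, e.g. "BP 120/80",
# "b.p. 120 / 80 mmHg" or "bp:120/80".
BP_PATTERN = re.compile(
    r"\bb\.?p\.?\s*:?\s*(\d{2,3})\s*/\s*(\d{2,3})(?:\s*mm\s*hg)?",
    re.IGNORECASE,
)
TOKEN_PATTERN = re.compile(r"bp=\d{2,3}/\d{2,3}|[a-z0-9]+")


def normalise_bp(text):
    """Rewrite each recognised blood pressure notation as a single canonical token."""
    return BP_PATTERN.sub(lambda m: f"BP={m.group(1)}/{m.group(2)}", text)


def merge_mwes(tokens):
    """Greedily join the longest known multiword expression starting at each position."""
    merged, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_MWE_LEN, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in MWES:
                merged.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no expression of length >= 2 starts here
            merged.append(tokens[i])
            i += 1
    return merged


def canonicalise(text):
    """Normalise notations, lower-case, expand abbreviations, then merge MWEs."""
    tokens = []
    for tok in TOKEN_PATTERN.findall(normalise_bp(text).lower()):
        tokens.extend(ABBREVIATIONS.get(tok, tok).split())
    return merge_mwes(tokens)


print(canonicalise("Pt hx of SOB. Arterial blood pressure b.p. 120 / 80 mmHg"))
# ['patient', 'history', 'of', 'shortness', 'of', 'breath',
#  'arterial_blood_pressure', 'bp=120/80']
```

Trying the longest expression first is what keeps "arterial blood pressure" as one unit instead of leaving "arterial" behind and merging only "blood pressure".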
General Purpose Information Systems for Hospitals
A number of projects aim to advance research into generating operational information systems from a master system. The idea is to have an information system generator that creates particular systems for medical specialities such as Emergency, Intensive Care, Cardiology, Oncology and Obstetrics. The projects below address different facets of this general problem.
Project Objectives
Current Emergency Department (ED) information systems are considered antiquated technology in need of redevelopment. A key limitation is that they do not interact with other information systems, especially the medical imaging and pathology systems. This project aims to extend our research into ways of generating hospital information systems, and to produce a working prototype ED system from a generic solution for generating application-specific information systems.
Project Resources
Previous work has produced a systems analysis of the ED at Westmead Hospital. There is also a historical set of specifications for ED systems, which will be made available by a group of doctors specialising in this area of medicine. A methodology has been created for generating information systems (ISs) from a master system, whose major functions are:
- Incorporation of medical terminology systems
- Real-time capture of point-of-care information, e.g., text descriptions and bedside measurements (pulse, blood pressure, temperature, etc.)
- Interaction with other information systems for data requests and storage
- Storage and retrieval of standardised Electronic Medical Records (EMR)
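The idea of generating a specialty system from a master system can be illustrated in miniature. In the hypothetical sketch below, a master catalogue defines bedside measurements once, each specialty specification selects the measurements it needs, and `generate_system` produces a capture schema plus a validator for incoming records. The field names, plausibility ranges and specialty lists are invented for illustration; they are not drawn from the Westmead analysis or the ED specifications, and they are not clinical reference ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    unit: str
    low: float   # lowest plausible value (hypothetical)
    high: float  # highest plausible value (hypothetical)


# Hypothetical master catalogue of point-of-care measurements shared by all specialties.
MASTER_FIELDS = {
    "pulse": Field("pulse", "beats/min", 20, 250),
    "systolic_bp": Field("systolic_bp", "mmHg", 40, 300),
    "temperature": Field("temperature", "degC", 30, 45),
    "spo2": Field("spo2", "%", 50, 100),
}

# Hypothetical specialty specifications: each names the master fields it needs.
SPECIALTY_SPECS = {
    "emergency": ["pulse", "systolic_bp", "temperature"],
    "intensive_care": ["pulse", "systolic_bp", "temperature", "spo2"],
}


def generate_system(specialty):
    """Build a specialty-specific capture schema and validator from the master catalogue."""
    fields = {name: MASTER_FIELDS[name] for name in SPECIALTY_SPECS[specialty]}

    def validate(record):
        """Return a list of problems found in one bedside observation record."""
        problems = []
        for name, field in fields.items():
            if name not in record:
                problems.append(f"missing {name}")
            elif not field.low <= record[name] <= field.high:
                problems.append(f"{name} out of range ({field.unit})")
        for name in record:
            if name not in fields:
                problems.append(f"unexpected {name}")
        return problems

    return fields, validate


fields, validate = generate_system("emergency")
print(validate({"pulse": 300, "systolic_bp": 120}))
# ['pulse out of range (beats/min)', 'missing temperature']
```

Keeping the measurement definitions in one master catalogue means a correction to a unit or range propagates to every generated specialty system.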
The Representation of Clinical Care Plans for a point-of-care monitoring system
Royal Prince Alfred Hospital, Intensive Care Unit
A range of clinical decisions are made on the basis of reports returned from other departments (e.g. pathology), together with bedside instrumentation and manual bedside observations, all of which are entered into the ICU Information System (ICU-IS). These decisions are usually entered into the patient notes, but there is no mechanism to ensure that they are actually recorded or acted on. Alerts for a range of clinical criteria could be provided at this point of care. It is therefore proposed that the set of care plan protocols currently stored on site be scrutinised for usable knowledge. The task is to extract information from the Care Plan Protocols, convert it into “Computable Guidelines”, and adapt these for use with the ICU-IS to better support clinical work processes and flow.
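A computable guideline can be represented as data plus a condition that is evaluated against the latest observations for a patient. The sketch below is a hypothetical representation, not the format of the ICU-IS or of the site's Care Plan Protocols; the two rules and their thresholds are placeholders chosen only to make the example run and must not be read as clinical guidance. It also reports rules that cannot be evaluated because a required observation is missing, reflecting the concern that decisions may not be recorded.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class GuidelineRule:
    """One computable guideline: the observations it reads and when it fires."""
    name: str
    needs: tuple  # names of the observations the rule depends on
    condition: Callable[[Mapping[str, float]], bool]
    message: str


# Illustrative rules with placeholder thresholds; NOT taken from any real protocol.
RULES = [
    GuidelineRule("low_potassium", ("potassium",),
                  lambda obs: obs["potassium"] < 3.5,
                  "potassium below care plan threshold"),
    GuidelineRule("low_map", ("mean_arterial_pressure",),
                  lambda obs: obs["mean_arterial_pressure"] < 65,
                  "mean arterial pressure below care plan target"),
]


def evaluate(rules, observations):
    """Return alert strings for rules that fire or cannot be checked."""
    alerts = []
    for rule in rules:
        missing = [name for name in rule.needs if name not in observations]
        if missing:
            # Flag missing data, since the guideline cannot be applied without it.
            alerts.append(f"{rule.name}: not recorded: {', '.join(missing)}")
        elif rule.condition(observations):
            alerts.append(f"{rule.name}: {rule.message}")
    return alerts


print(evaluate(RULES, {"potassium": 3.1, "mean_arterial_pressure": 70}))
# ['low_potassium: potassium below care plan threshold']
```

Holding the rules as data rather than hard-coding them in the ICU-IS would let clinicians review and update the extracted guidelines independently of the system code.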
Clinical Data Analytics Language (CliniDAL)
A prototype Clinical Data Analytics Language has previously been developed for use in Intensive Care Units. Its purpose is to allow doctors to ask a range of questions of the clinical information system using restricted natural language. This project will extend the capabilities of the original CliniDAL to support a wider range of questions and to allow it to be bolted on to other information systems in the hospital. The project will require working with a number of hospitals to establish the types of questions they want to ask and the nature of the answers they expect.
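To make "restricted natural language" concrete, here is a hypothetical sketch of a single question template translated into a computation over stored observations. The template, the in-memory observation list and the function `answer` are assumptions for illustration; they do not describe the actual CliniDAL grammar or its connection to any hospital system.

```python
import re
from statistics import mean

# Hypothetical observation store: (patient_id, variable, value).
OBSERVATIONS = [
    (123, "heart rate", 80), (123, "heart rate", 100),
    (123, "temperature", 37.5), (456, "heart rate", 60),
]

AGGREGATES = {"average": mean, "maximum": max, "minimum": min}

# One restricted question form, e.g. "What is the average heart rate of patient 123?"
QUESTION = re.compile(
    r"what is the (average|maximum|minimum) (.+) of patient (\d+)\??",
    re.IGNORECASE,
)


def answer(question, observations=OBSERVATIONS):
    """Translate a question in the restricted form into an aggregate over observations."""
    match = QUESTION.fullmatch(question.strip())
    if match is None:
        raise ValueError("question is outside the restricted language")
    aggregate = AGGREGATES[match.group(1).lower()]
    variable = match.group(2).lower()
    patient = int(match.group(3))
    values = [v for p, var, v in observations if p == patient and var == variable]
    return aggregate(values) if values else None


print(answer("What is the average heart rate of patient 123?"))
```

Extending the language then means adding templates and aggregates, and bolting it on to another system means replacing the observation store with that system's data access.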
Adaptable Finite State Machines
Finite State Machines (FSMs) are well known for their capacity to accurately model well-understood processing tasks. In the field of machine learning, however, they are little used because they are considered too rigid.
This is an unfair view, as there are techniques for making them very adaptable. The aim of this project is to study methods of making FSMs adaptable and to show how useful they are for lower-level language processing tasks.
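One simple sense in which an FSM can be adaptable is that its transition table grows as new examples arrive. The sketch below is a deterministic acceptor that adds states and transitions for each new string it is shown, which amounts to incrementally building a prefix-tree acceptor, a common starting point in grammatical inference. This is an assumed illustration, not one of the specific techniques the project will study; richer methods would also merge states so that the machine generalises beyond its examples.

```python
class AdaptableFSM:
    """Deterministic acceptor whose transition table can grow at run time."""

    def __init__(self):
        self.transitions = {}  # (state, symbol) -> next state
        self.accepting = set()
        self.n_states = 1      # state 0 is the start state

    def accepts(self, s):
        """Run the machine over s and report whether it ends in an accepting state."""
        state = 0
        for ch in s:
            state = self.transitions.get((state, ch))
            if state is None:
                return False
        return state in self.accepting

    def adapt(self, s):
        """Add whatever states and transitions are needed to accept s."""
        state = 0
        for ch in s:
            nxt = self.transitions.get((state, ch))
            if nxt is None:
                nxt = self.n_states
                self.n_states += 1
                self.transitions[(state, ch)] = nxt
            state = nxt
        self.accepting.add(state)


# Usage: teach the machine a few unit abbreviations found in clinical notes.
fsm = AdaptableFSM()
for unit in ("mg", "mmol", "ml"):
    fsm.adapt(unit)
print(fsm.accepts("mmol"), fsm.accepts("mm"))  # True False
```

Because each adaptation only adds transitions, strings the machine already accepted remain accepted, which keeps its behaviour predictable as it grows.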
Software Engineering Framework for Computational Modelling and Machine Learning
Computational modelling has machine learning processes at its kernel. However, getting a machine learner into a working industrial environment involves many steps. The aim of this project is to design and engineer a workflow process that automates the creation of the industrial process in which a trained machine learner will operate.
Spelling Corrector for Medical Texts
Producing a spelling checker is relatively straightforward, as one only has to check that each word is present in a list of known words. This method breaks down when words outside the list appear, such as people's names. It also fails with specialist terminology, such as that of medicine, because software providers do not have the specialist dictionaries needed to check such words; moreover, new words are coined frequently in technical disciplines, and no static list can keep up with them.
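The word-list approach, and the way it fails on specialist terms, can be seen in a few lines. The lexicon below is a tiny hypothetical stand-in for a general-purpose dictionary.

```python
import re

# Tiny hypothetical general-English word list.
GENERAL_LEXICON = {"the", "patient", "was", "given", "for", "pain",
                   "and", "seen", "by", "doctor"}


def find_unknown_words(text, lexicon):
    """Return the words in text (lower-cased) that are not in the lexicon."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in lexicon]


print(find_unknown_words("The patient was given paracetamol for pian", GENERAL_LEXICON))
# ['paracetamol', 'pian']
```

Here "paracetamol" is a correctly spelled specialist term flagged as an error, while "pian" is a genuine typing mistake; the checker has no way to tell them apart.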
Spelling correction, on the other hand, is a much more difficult task, as most typing mistakes have a number of plausible corrections. The problem is now tackled with machine learning strategies in which the corrector is trained on a set of examples.
Sophisticated solutions use multiple methods such as a machine learner, rules for the most common mistakes, and grammatical rules for certain technical language.
We have a collection of 85,000 words that have been manually spell-corrected, taken from a collection of 60 million words of clinical notes written at the Royal Prince Alfred Hospital over 6 years. The aim of this project is to define methods for building a highly knowledgeable spell corrector using this training data.
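A minimal sketch of how manually corrected data could drive a corrector, combining two of the methods mentioned above: a rule table of the most common mistakes, learned from (misspelling, correction) pairs, and a frequency-ranked choice among candidates one edit away from the unknown word, in the style of Peter Norvig's well-known simple spelling corrector. The class name, the training inputs and the single-edit limit are assumptions; the frequency ranking stands in for a trained machine learner and ignores context entirely.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


class SpellCorrector:
    """Hypothetical corrector trained on manually corrected clinical text."""

    def __init__(self, corrected_pairs, corrected_words):
        # Rule table: each observed misspelling maps to its most frequent correction.
        self.rules = {}
        for (wrong, right), _ in Counter(corrected_pairs).most_common():
            self.rules.setdefault(wrong, right)
        # Word frequencies from the corrected text act as a simple language model.
        self.freq = Counter(corrected_words)

    def correct(self, word):
        if word in self.freq:          # known word: leave it alone
            return word
        if word in self.rules:         # a common mistake seen in training
            return self.rules[word]
        candidates = [w for w in edits1(word) if w in self.freq]
        if candidates:                 # most frequent known word one edit away
            return max(candidates, key=lambda w: (self.freq[w], w))
        return word                    # no confident correction


corrector = SpellCorrector(
    [("pian", "pain"), ("abdo", "abdomen")],
    ["pain", "patient", "abdomen", "fever"],
)
print(corrector.correct("abdo"), corrector.correct("paitent"))  # abdomen patient
```

The rule table captures habitual abbreviations and slips that no edit-distance model would find ("abdo" is three insertions from "abdomen"), while the candidate search handles one-off typing errors.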
Data Mining of Ambulance Data
Co-supervisor: Irena Koprinska
The NSW Ambulance Service is the third largest in the world, responding to more than 1 million calls per annum. The Service collects a variety of data to understand its processes and to improve the way it does its work. The data fall into two categories: data about calls for assistance and the Service's response to them, and data about patients and the assistance and care provided to them while travelling in ambulances. There is a need to mine these stores of information to reveal previously unseen correlations in how the Service responds to calls for assistance and in the effectiveness of the assistance it provides.
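One basic step in mining the two kinds of data together is to link call records to patient records and measure how a response variable relates to a patient variable. The sketch below joins the two on an incident identifier and computes a Pearson correlation. The field names (`incident_id`, `response_minutes`, `outcome_score`) and the values in the usage example are invented for illustration and say nothing about the Service's actual data.

```python
import math


def link_records(calls, patients, key="incident_id"):
    """Inner-join call records and patient records on a shared identifier."""
    by_id = {p[key]: p for p in patients}
    return [{**c, **by_id[c[key]]} for c in calls if c[key] in by_id]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up records, for illustration only.
calls = [
    {"incident_id": 1, "response_minutes": 5},
    {"incident_id": 2, "response_minutes": 10},
    {"incident_id": 3, "response_minutes": 15},
    {"incident_id": 4, "response_minutes": 20},  # no matching patient record
]
patients = [
    {"incident_id": 1, "outcome_score": 3},
    {"incident_id": 2, "outcome_score": 2},
    {"incident_id": 3, "outcome_score": 1},
]

linked = link_records(calls, patients)
r = pearson([rec["response_minutes"] for rec in linked],
            [rec["outcome_score"] for rec in linked])
print(len(linked), r)  # 3 -1.0
```

An inner join silently drops calls without a patient record (incident 4 here), so in practice the number of unmatched records is itself worth reporting before any correlation is interpreted.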