I am currently a PhD student in the School of Information Technologies at the University of Sydney. In 2008, I completed a Bachelor of Arts (Languages) at the University of Sydney, majoring in Linguistics, German and Digital Cultures. This was followed by a Graduate Certificate in Computing in 2009.
My PhD is in the field of Natural Language Processing, specifically on integrating the structure of Named Entities into Combinatory Categorial Grammar.
I am currently working on integrating Named Entity information into Combinatory Categorial Grammar.
Named Entity Recognition (NER) is the task of identifying and classifying mentions of people, organisations, locations, and other Named Entities (NEs) within text, and is usually performed by statistical models trained on large amounts of annotated data. NER systems frequently form part of a pipeline of language analysis tools along with tokenisation, part-of-speech tagging, and parsing, with errors produced in one system propagating down the pipeline. By instead integrating NE structure directly into the grammar formalism and performing NER and parsing concurrently, we hope to improve both parsing and NER accuracy by increasing the amount of linguistic information, both semantic and syntactic, available to both tasks.
- Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, and James R. Curran (2011).
Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence (submitted). Elsevier. [abstract][resources]
We automatically create enormous, free and multilingual “silver”-standard training annotations for named entity recognition (NER) by exploiting the text and structure of Wikipedia. Most NER systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck our work overcomes. We first classify each Wikipedia article into named entity (NE) types, training and evaluating on 7,200 manually-labelled Wikipedia articles across nine languages. Our cross-lingual approach achieves up to 95% accuracy. We transform the links between articles into NE annotations by projecting the target article’s classifications onto the anchor text. This approach yields reasonable annotations, but does not immediately compete with existing gold-standard data. By inferring additional links and heuristically tweaking the Wikipedia corpora, we better align our automatic annotations to gold standards. We annotate millions of words in nine languages, evaluating English, German, Spanish, Dutch and Russian Wikipedia-trained models against CoNLL Shared Task data and other gold-standard corpora. Our approach outperforms other approaches to automatic NE annotation (Richman08, Mika08); competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually-annotated Wikipedia text.
- Dominic Balasuriya, Nicky Ringland, Joel Nothman, Tara Murphy, and James R. Curran (2009).
Named entity recognition in Wikipedia. In Proceedings of the Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (PeoplesWeb), pages 10–18. [abstract][www][resources]
Named entity recognition (NER) is used in many domains beyond the newswire text that comprises current gold-standard corpora. Recent work has used Wikipedia’s link structure to automatically generate near gold-standard annotations. Until now, these resources have only been evaluated on newswire corpora or themselves. We present the first NER evaluation on a Wikipedia gold standard (WG) corpus. Our analysis of cross-corpus performance on WG shows that Wikipedia text is a harder NER domain than newswire. We find that an automatic annotation of Wikipedia has high agreement with WG and, when used as training data, outperforms newswire models by up to 7.7%.
- Nicky Ringland, Joel Nothman, Tara Murphy, and James R. Curran (2009).
Classifying articles in English and German Wikipedia. In Proceedings of the Australasian Language Technology Association Workshop (ALTW), pages 20–28. [abstract][www][acl][resources]
Named Entity (NE) information is critical for Information Extraction (IE) tasks. However, the cost of manually annotating sufficient data for training purposes, especially for multiple languages, is prohibitive, meaning automated methods for developing resources are crucial. We investigate the automatic generation of NE annotated data in German from Wikipedia. By incorporating structural features of Wikipedia, we can develop a German corpus which accurately classifies Wikipedia articles into NE categories to within 1% F-score of the state-of-the-art process in English.
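The link-projection idea from the first abstract above (classify each article into an NE type, then project the target article's type onto the anchor text of links pointing to it) can be illustrated with a minimal sketch. This is not the code used in the papers: the `ARTICLE_TYPES` table, the `project_links` function, the simplified `[[Target|anchor]]` link syntax, whitespace tokenisation and the B-/I-/O tag scheme are all assumptions made for illustration.

```python
import re

# Hypothetical article-to-type table. In the published work this mapping
# comes from a trained article classifier, not a hand-written dictionary.
ARTICLE_TYPES = {
    "University of Sydney": "ORG",
    "Sydney": "LOC",
    "James Cook": "PER",
}

# Simplified wiki link syntax: [[Target]] or [[Target|anchor text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def project_links(wikitext, article_types):
    """Return (token, tag) pairs, tagging anchor tokens with the target's NE type."""
    tagged = []
    pos = 0
    for match in LINK.finditer(wikitext):
        # Text between links is treated as outside any entity.
        tagged += [(tok, "O") for tok in wikitext[pos:match.start()].split()]
        target = match.group(1).strip()
        anchor = (match.group(2) or match.group(1)).strip()
        ne_type = article_types.get(target)
        for i, tok in enumerate(anchor.split()):
            if ne_type is None:
                # Unclassified target: leave the anchor text untagged.
                tagged.append((tok, "O"))
            else:
                prefix = "B-" if i == 0 else "I-"
                tagged.append((tok, prefix + ne_type))
        pos = match.end()
    tagged += [(tok, "O") for tok in wikitext[pos:].split()]
    return tagged
```

Calling `project_links("[[James Cook|Cook]] reached [[Sydney]] .", ARTICLE_TYPES)` tags "Cook" as a person and "Sydney" as a location. The sketch leaves out the later steps the abstract mentions, such as inferring additional links and adjusting the corpora.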
The Girls' Programming Network (GPN) is an extra-curricular program for high school girls in years 9-12 who are interested in computers and technology. GPN is run by the School of Information Technologies at the University of Sydney. It explores different topics in the world of computers and IT, and teaches students to program in Python.
GPN is an excellent opportunity for students to move beyond being users of technology and learn to be the designers and builders of technology, to move ahead of the typical user and employ both advanced technical skills and creative expression. It lets girls explore computer science in an encouraging, fast-paced atmosphere, and gives the girls the chance to meet new friends with similar interests, find female university students as mentors, and find out about university life.
The ability not only to use computers but to be a true creator of technology is quickly becoming a valuable skill in any career, including science, engineering, arts and medicine. For a number of reasons, girls tend to become expert users of technology, but rarely take independent steps to explore creating and developing technology in their own time. This leads to an imbalance in confidence at university, as boys frequently have substantial previous experience in programming and IT. Coupled with the gender imbalance in technology-based university enrolments, this means some girls find choosing Information Technology degrees daunting. The purpose of GPN is to give girls an opportunity to develop their technical skills and social network so they can pursue their interest in IT with confidence.
I am particularly motivated to offer young girls the help, guidance and support I would have benefited from myself, to share my passion, and to show that girls have a strong place in IT.
The NCSS summer school is an intensive ten-day residential camp for outstanding Year 11 and 12 students and teachers from around Australia. The participants learn computer science and software engineering, and complete a major team project in either the social networking or the embedded systems stream. It is hosted by the University of Sydney and has been offered in various formats since 1996. In that time, NCSS has taught over 1400 students and over 120 teachers.
The summer school has an impressive record of engaging a range of students from around Australia. Students are selected on academic excellence, computing interest and experience, recommendations from teachers and their performance in the NCSS Challenge. It is designed as a finishing school for the elite of the elite from the Challenge, and also takes some academically talented students who are yet to experience the joys of programming.
During NCSS, students intensively learn more advanced programming skills in either web programming or embedded systems. The web programming teams build Facebook-like social network applications using Python (with the Tornado web framework), HTML5, jQuery, SQLite, and a range of other modern tools. The embedded systems teams have programmed iRobot Create robots (built on the Roomba base) in Arduino's C/C++-like language to solve problems collaboratively, such as getting out of a maze and rescuing other robots, and more recently have worked with Arduinos to create GPS tracking systems to be launched in weather balloons. Watch our video to learn more about the program.
The NCSS Challenge is a 5-week online programming competition and learning resource, designed to improve the quality of computer science and software engineering taught in Australian high schools. The Challenge is unique because, unlike existing competitions, we don't expect students to know how to program in advance, but instead teach them about computer programming over a 5-week course. Each week we provide a short set of notes and video lectures covering one or more programming concepts, followed by a series of fun and challenging programming problems that require those concepts to solve. Our intelligent auto-marker runs a series of test cases live against each submission, identifying errors and giving students hints about how to correct their program. We also have online forums and messaging with over 30 volunteer tutors who provide technical support and encouragement.
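As a rough illustration of the auto-marking idea (run each test case against a submission, and attach a hint to any failure), here is a minimal sketch. It is not the Challenge's actual marker: the `mark` function, the `(args, expected, hint)` test-case format, and the choice to call a submitted function directly rather than run a whole program are all assumptions made for this example.

```python
def mark(solution, test_cases):
    """Run `solution` on each hypothetical (args, expected, hint) case.

    Returns a list of feedback strings, one per failing case; an empty
    list means every test case passed.
    """
    feedback = []
    for args, expected, hint in test_cases:
        try:
            actual = solution(*args)
        except Exception as exc:
            # A crash is reported with the exception type and the hint.
            feedback.append(f"{args}: raised {type(exc).__name__}. Hint: {hint}")
            continue
        if actual != expected:
            feedback.append(
                f"{args}: expected {expected!r}, got {actual!r}. Hint: {hint}"
            )
    return feedback


# Hypothetical problem: write a function that doubles its argument.
DOUBLE_CASES = [
    ((2,), 4, "multiply the input by two"),
    ((0,), 0, "doubling zero should still give zero"),
]
```

For instance, `mark(lambda x: x + 2, DOUBLE_CASES)` passes the first case by coincidence and fails the second, so the student sees only the hint about zero. A production marker would also need sandboxing and time limits, which this sketch omits.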
The Challenge can be used as an in-class activity for the 5 weeks, or as an extension activity for students to do in their own time. A number of teachers have also used it for their own professional development. In 2012 we had 3 different streams for differing levels of Python, ranging from very straightforward programming drills up to problems that would make the best of our undergraduates think hard for a long time about the solution. We have also previously run an embedded systems stream in the Challenge, where hardware was sent out to students, subsidised by industry sponsorship.
In 2012 we had over 4200 high school students (and some primary school students) and 350 teachers compete in the Challenge from right across Australia, New Zealand and Singapore. We received over 190,000 submissions to the Challenge system in 5 weeks. We taught students how to estimate the speed of light using their microwaves and marshmallows, how to use Twitter to make poetry, and how to use machine learning methods to learn which mushrooms are safe to eat.
CS4HS at Sydney is a free two-day workshop for high school maths, science and computing teachers, run by the University of Sydney and sponsored by Google.
Our workshops aim to provide teachers with support, materials and real world examples of computation in action for use in a classroom setting, including text materials and data for experiments. We present recent developments in the field of computer science and the exciting problems that students can work towards.
CS4HS workshops at Sydney include short talks by some of Australia's leading academics, with each talk followed by a practical lab with materials and activities teachers can use in a classroom setting.
I am an Outreach Officer for the National Computer Science School, which runs the NCSS Summer School and the NCSS Challenge.