James Curran

BSc Adv (Hons) Sydney PhD Edinburgh

Dr
Associate Professor and ARC Australian Research Fellow
Schwa Lab

SIT Building J12, Room 4E-449
(Cnr Cleveland St and City Rd)
P: +61 2 9036 6037
F: +61 2 9351 3838
E: james dot r dot curran AT sydney DOT edu DOT au
URL http://www.sydney.edu.au/it/~james

Research interests

My research is in computational linguistics, focusing on robust statistical approaches to broad-coverage large-scale natural language processing (NLP). My interests range from the design of fundamental NLP components, including text processing and tagging tools, through to statistical parsers and high-level systems for financial modelling using text, question answering and information extraction.

My background is in computer science. Computational linguistics poses challenges that require both algorithmic and implementation techniques of interest to computer scientists. Meanwhile, computational linguists are developing complex formalisms and statistical models that enable increasingly detailed linguistic analyses. Unfortunately, greater fidelity usually brings significant efficiency penalties. Providing even a superficial analysis of the rapidly growing volume of text now available is a prodigious task.

I am excited by the challenges of developing large-scale and robust deep-linguistic processing techniques that are feasible for tera-scale datasets. I believe that statistical parsing with lexicalised grammar formalisms such as Combinatory Categorial Grammar (CCG), combined with supertagging, provides the best trade-off between linguistic fidelity and efficiency. Efficient, accurate parsing will enable us to create and exploit unprecedented quantities of automatically analysed text using semi-supervised knowledge acquisition. This will be crucial to overcoming the knowledge bottleneck that hampers real-world applications of NLP.

Selected publications

The following list is a selection from recent publications.

  • J Nothman, N Ringland, W Radford, T Murphy, and J R Curran (2012). Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, Elsevier.
  • B Hachey, W Radford, J Nothman, M Honnibal, and J R Curran (2012). Evaluating Entity Linking with Wikipedia. Artificial Intelligence, Elsevier.
  • D Vadas and J R Curran (2011). Parsing noun phrases in the Penn Treebank. Computational Linguistics, 37(4). MIT Press.
  • S Clark and J R Curran (2007). Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. Computational Linguistics 33(4):493–552.
  • J R Curran, T Murphy, and B Scholz (2007). Minimising semantic drift with Mutual Exclusion Bootstrapping. In Proc. of the Conference of the Pacific Association for Computational Linguistics, pp 172–180. Melbourne, Australia. Best Paper Award.
  • J Gorman and J R Curran (2006). Scaling Distributional Similarity to Large Corpora. In Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp 361–368, Sydney, Australia.
  • S Clark and J R Curran (2004). Parsing the WSJ using CCG and Log-Linear Models. In Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pp 104–111, Barcelona, Spain.
  • S Clark and J R Curran (2004). The Importance of Supertagging for Wide-Coverage CCG Parsing. In Proc. of the 20th International Conference on Computational Linguistics (COLING), pp 282–288, Geneva, Switzerland.
  • J R Curran and S Clark (2003). Investigating GIS and Smoothing for Maximum Entropy Taggers. In Proc. of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp 91–98, Budapest, Hungary.
  • J R Curran and S Clark (2003). Language Independent NER using a Maximum Entropy Tagger. In Proc. of the 7th Conference of Natural Language Learning (CoNLL), pp 164–167, Edmonton, Canada.

Teaching interests

Introductory and advanced programming, data structures, algorithms, software engineering, artificial intelligence, machine learning, computational linguistics.
Courses taught:

INFO1903: Informatics (Advanced)
ENGG1801: Engineering Computing
COMP5046: Statistical Natural Language Processing