Endangered Languages

A discussion on endangered languages and the role of information technology in attempts to preserve them by Elizabeth Brownlow, 2005.

Papuan painting


As with many other knowledge areas in the humanities, the future of linguistics and linguistic research is inextricably linked to information technologies and the digitisation and digital storage of linguistic knowledge. In particular, digital technologies are especially useful in the contemporary preservation and dissemination of information on endangered and dying languages. Accompanying these new capabilities, however, is a new series of debates over the ethics of both language death and language preservation. Language diversity is yet another casualty of the modern world, which raises one question that most linguists feel obliged to attempt to answer: what should we do?

Current Global Language Profile

Contemporary globalisation has had a dramatic impact on the diversity of languages worldwide. Historically, colonisation, imperialism and population migration have always led to the demise of languages as one language becomes more economically and socially advantageous to speak than another; for example, Tok Pisin came to replace many New Guinean languages as its use in trade increased. In the twentieth and twenty-first centuries, however, this process has accelerated dramatically. Out of the estimated six thousand languages currently spoken worldwide, it is conjectured that - according to current trends - probably half will die during this century (Crystal 2000 and 2004; Tsunoda 2005; Wurm 1991); "the extent and rate of the ongoing loss in the world's linguistic diversity is currently so cataclysmic that it makes the word 'revolution' look like an understatement" (Crystal 2004:47-8). This acceleration can be attributed to aggressive colonisation by various European governments of such regions as Australia, New Zealand, North America and South Africa; to the increasing globalisation of trade and the spread of a 'language of trade'; to enforced nationalism wherein regional languages, such as Basque[1], are legislated (almost) to extinction; and to economic pressures (Crystal 2004; Bavin 1989; Tsunoda 2005; Wurm 1991). Just as prestige and hierarchies are a fundamental part of the formation of human society, so too are they present in the global system of language usage and death; in all the conditions mentioned above, one language is attributed higher status at the expense of all others. Status is therefore a vital factor in determining the 'health' and common usage of a language (Steele 11/09/03); fluency in a high-prestige language can provide speakers with far greater social and employment opportunities, whilst low-prestige languages are lost to the younger generations.

Global regions
According to David Crystal (2004), a language dies when its last fluent speaker dies, or when its second-last speaker dies. This latter argument is based on the definition of language purely as a communication system (Malik 2000; Wurm 1991), and it evokes the debate over whether or not linguists, communities and governments should attempt to preserve, document or revive dying languages. There are many arguments in favour of language preservation, the most comprehensive of which accounts for the context of language endangerment as well as the practicalities of the modern world. According to Tasaku Tsunoda (2005), language is inextricably linked with culture and with cultural identity. From this perspective, therefore, allowing or causing language death is a loss of identity for the community involved. To evaluate this, it is essential to consider the environment and history of the language's demise; a large proportion of languages are endangered through colonisation and the destruction of native culture. In Australia, the systematic slaughter or removal - such as in Tasmania - of native people, combined with assimilation policies, institutionalised racism and the forced removal of mixed-race children, produced extensive resentment of the invading culture and therefore lent greater significance to retaining the indigenous culture. In this and many other situations, the preservation of language is the preservation of unique culture and hence identity, a connection between land, the people and their language that serves to maintain community solidarity and self-esteem (Tsunoda 2005). The problem is that, while an estimated two hundred languages were spoken in Australia before European colonisation[2], fewer than fifty are considered healthy - that is, taught to children and still used for meaningful communication (Bavin 1989).
Furthermore, while "no language exists in isolation" (Crystal 2004:42) and it is natural for languages to affect each other in terms of syntax, phonology and lexicon, modern endangered languages are subsumed by the dominant language and culture at a rapid rate, so that often the language has already been altered by the time it is recognised as endangered, let alone recorded (Campbell and Muntzel 1989). In terms of cultural identity this loss can greatly impact the community, a factor that is often invisible to mainstream politics and society (Tsunoda 2005), and so from this perspective there is value in preserving linguistic diversity and uniqueness.

The implication of published work on endangered languages is that many linguists think that this decline in global language diversity should be prevented by rejuvenating or documenting 'endangered' languages. Not all academics, however, share this opinion. Kenan Malik, an academic specialising in, amongst numerous other fields, race and cultural issues, describes such efforts as nostalgic, "backward-looking visions" (Malik 2000) that ignore the basic function of language - to communicate - and argues that in a globalised world the ability of the majority of the population to communicate in the same language is beneficial, not detrimental, to culture. Additionally, Malik opposes the idea that each language uniquely shapes the way that its native speakers think and perceive the world, equating the contemporary arguments in favour of the preservation of linguistic diversity with "the same philosophy that gave rise to ideas of racial difference". This view is diametrically opposed to that of Tsunoda; it completely ignores issues of individual and cultural identity, especially in the face of colonialism, let alone language uniqueness. Malik rejects Crystal's (2004) equation of endangered languages with endangered flora and fauna (both of which are internationally protected), arguing that universal communication is more important than minority languages which act as "barriers" to cultural interaction. While this has some currency from the perspective of globalisation and the 'global village', it is extremely parochial to suggest that the current accelerated assimilation of minority languages - especially where the assimilation has historically been forced, for example in Australia - is beneficial to those cultures which equate language with identity, and which are being subsumed by another language.
Additionally, the argument that speakers of minority languages should adopt a dominant, more economically advantageous language completely misses one of the fundamental reasons underlying the original discrimination (intentional or otherwise); Malik's contention that adopting a dominant language, and hence Western education, is beneficial essentially blames the minority group for the cultural bias against them. This line of argument also assumes that discrimination will end with the removal of the language barrier; in contemporary Britain, racism against non-Anglo-Saxons still exists on the basis of ethnicity, even where the English language is used by all.

Tsunoda (2005) notes that, in the past, conflicts over the direction and purpose of linguistic research have arisen. In contemporary research, greater efforts are being made to address such ethical issues and transform the primary motivation from self-interest - where this is the case - to community-directed, community-benefiting research into linguistic heritage.

IPA & Information Technology

A major problem with language preservation is the method of recording information. To overcome ambiguities of non-phonetic spellings, the International Phonetic Alphabet (IPA) notation was developed[3] to represent all phonetic constituents of speech - consonants, vowels, intonation, stress, syllabification, segment length, and other distinctive features (see Ladefoged for more detailed information). Appendix One contains the complete charts of these symbols. Although largely based on the Latin and Greek alphabets, IPA also uses many original symbols, making the notation complex and demanding to apply properly. Theoretically, this comprehensive system should facilitate the unimpeded exchange of phonetic information between linguists; the universality of IPA, however, is a myth. Although the alphabet provided by the International Phonetic Association is itself standard, variations often appear in usage (e.g. for diacritics, suprasegmentals, and vowels), as well as in 'improvements' made by other linguists. In 1995 Clive Upton reworked the existing IPA system, and although his changes were logical improvements based on phonological changes in vowel pronunciations that have occurred since the 1930s (Wells 2001), the presence of a second standard can be confusing. While the difference between /e/ (IPA) and /ɛ/ (Upton) is self-evident, the character /a/ is used by both systems to represent different sounds. This is not insurmountable - a cursory examination of the transcript should allow the reader to determine which standard is being applied, the differences being analogous to differences in spelling between Australian and American English - but this further complication can obscure the actual value of IPA. Since the central concern of linguistics is the accurate, reliable documentation of phonological systems, any confusion occurring in the documentation of dying languages may lead to inaccurate interpretations of research.
For community and linguist alike, accurate preservation is ethically fundamental.

While IPA is a very effective tool to use by hand, it poses several problems digitally, since it is based only in part on the ASCII character set; many other symbols are derived from Greek, Latin or Cyrillic letters, or are entirely original (see Appendix One for complete IPA character sets). The majority of characters used are therefore not contained within the standard fonts found on computers. This is a major problem, since it impedes the conversion of transcripts into accessible, digital files. To accommodate this, SIL (the Summer Institute of Linguistics) has created specialised IPA font packages for both Windows and Macintosh[4], comprising three fonts containing all the elements of IPA as scalable[5], TrueType[6] characters based on Times, Helvetica and Prestige styles. In making this and other pieces of linguistic software freeware, SIL asserts its support of "the spirit of academic community" and not-for-profit knowledge development. The package is a vital tool in the digitisation of linguistic information, allowing the easy creation of text documents containing IPA symbols within normal word processing programs. More importantly, as freeware it facilitates the ready dissemination of such information, which is fundamental to the preservation of knowledge.
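The gap between IPA and the ASCII character set can be illustrated with a minimal sketch. It assumes the transcription is stored as Unicode text, which is not the approach of the SIL font packages described above, and both the function name `non_ascii_symbols` and the sample transcription are hypothetical, chosen only for illustration.

```python
import unicodedata

def non_ascii_symbols(transcription):
    """Return (character, Unicode name) pairs for each distinct
    non-ASCII character, in order of first appearance."""
    seen = []
    for ch in transcription:
        # ASCII covers code points 0-127; anything above needs more than ASCII.
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in seen]

# Assumed sample: a broad transcription of the English word "think".
sample = "θɪŋk"
for ch, name in non_ascii_symbols(sample):
    print(f"U+{ord(ch):04X} {ch} {name}")
```

In this sample only the final /k/ is an ASCII character; the other three symbols, one of them borrowed from Greek, fall outside it.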

IPA is also the most problematic element of linguistic data to publish online. To display phonetic information on webpages, designers have two options: to transform IPA text - created using SIL fonts - into images, or to employ an altered version of IPA that uses ASCII characters. For webpages aimed at non-academic audiences, or not containing complex phonological analysis, the latter alternative is acceptable and commonly occurs. William Foley's webpage '"The Flood": A Story in the Yimas Language of New Guinea', for example, uses 'N' and 'ny' to denote the IPA characters /ŋ/ and /nʲ/ respectively in the phonetic transcript. As this webpage is part of a showcase of research aimed at prospective students, and therefore at a general audience, this method of presentation is acceptable. Additionally, this method is more attractive to academics with little time or IT expertise. The redevelopment of this research website is currently underway; in the new version, by contrast, this information will be presented in embedded graphics. As such, realistic rather than adapted transcripts can be used, which is in the best interest of linguistics in general. Appendix Two contains extracts of Foley's page content in both its original adapted IPA and proper IPA; the difference between the two is marked.
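The ASCII-adaptation approach can be sketched as a simple substitution table. The following is a minimal illustration only: the table `ASCII_TO_IPA`, the function `to_ipa`, the IPA values assigned to 'N' and 'ny', and the sample string are all assumptions made for this example, and are not taken from Foley's page or from any real Yimas data.

```python
# Hypothetical table of ASCII stand-ins and the IPA symbols they denote.
ASCII_TO_IPA = {
    "ny": "nʲ",  # assumed value for the digraph
    "N": "ŋ",    # assumed value for capital N
}

def to_ipa(text, table=ASCII_TO_IPA):
    """Replace ASCII stand-ins with IPA symbols, trying longer keys first
    so that a digraph such as 'ny' is not split into 'n' + 'y'."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No stand-in starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

# Invented string, not a real word in any language.
print(to_ipa("aNanya"))
```

A table of this kind works in one direction only if the stand-ins are unambiguous; a plain 'n' followed by a plain 'y' could not be distinguished from the digraph, which is one reason adapted transcripts are less reliable than proper IPA.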

Digital Preservation

Digital technology has played a large role in the preservation process, most notably because it readily allows the storage and dissemination of linguistic information such as is not possible with paper-based archiving. This is especially important since language documentation - if done for its own sake - is a purely academic exercise without practical application. To this end, the internet, computer databases and specialised software are used to contain and revitalise endangered languages.

There are issues of accuracy and comprehensiveness in the preservation process. Firstly, where a language is certain to become extinct without intervention, then any material recorded is better than nothing. Secondly, 'live' languages are constantly evolving and not uniform. The best example of this is the current diversity of English; accents and colloquialisms are an essential aspect of the common usage of any language, and are usually regionally-specific. Therefore any material recorded is simply a 'snapshot' of how a specific individual/group speaks at one point in time. In terms of language preservation, it must therefore be understood that what is retained is not the living 'language' but elements of the original. Nonetheless, this is no reason to not preserve a dying language. Crystal (2000:162) summarises the situation: "The revived language is not the same as the original language, of course; most obviously, it lacks the breadth of functions which it originally had, and large amounts of old vocabulary are missing. But, as it continues in present-day use, it will develop new functions and new vocabulary, just as any other living language would, and as long as people value it as a true marker of their identity, and are prepared to keep using it, there is no reason to think of it as anything other than a valid system of communication".

Paradisec at work

A digital recorder at Paradisec

Digital technologies provide a new avenue for the preservation of written and oral linguistic material, music and images of aspects of culture. One organisation that utilises such technology is PARADISEC, the Pacific and Regional Archive for Digital Sources in Endangered Languages. Funded by the Universities of Sydney, Melbourne, New England, ANU and the Australian Research Council, its purpose is to restore, digitise and archive linguistic and cultural material of the Pacific, Oceania and Southeast Asia. With such academic affiliations, and also offering commercial services, PARADISEC has created a research base in a previously unoccupied niche. Since the Asia-Pacific region contains approximately one third of the world's estimated six thousand languages (Crystal 2004; PARADISEC), the majority of which it is predicted will become extinct within this century (Tsunoda 2005; PARADISEC), from a linguistic perspective this is a vital service both for academia and for the communities involved. Technologically, the storage, widespread communication and physical preservation of information are more feasible and reliable digitally than in any previous format - the now obsolete reel-to-reel magnetic tapes are prone to mould and degradation - and therefore the work of PARADISEC is advantageous to future research. There are also many other digital archives concerned with the preservation of endangered languages. OLAC, the Open Language Archives Community, "is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources."
AILLA (The Archive of the Indigenous Languages of Latin America) and ASEDA (The Aboriginal Studies Electronic Data Archive) also perform similar functions, but with a more culturally specific focus. It is important to note, too, that these organisations complement each other, occupying separate niches within the broad field of linguistic research; indeed the list of endangered language archives is considerable.

Websites are an important aspect of contemporary language preservation efforts. Used for promoting organisations, as bulletin boards and as archive portals, they are a vital tool for communication and the advertisement of an organisation's activities. By raising public awareness, websites can publicise the situation and be used to organise information and activities to document and preserve languages. Due to the large scope of language preservation efforts worldwide, this essay will focus on those concerning endangered Aboriginal Australian languages. As previously stated, there were only approximately fifty healthy Aboriginal languages remaining in Australia fifteen years ago (Bavin 1989), and today that figure is approximately twenty (Tsunoda 2005). Furthermore, according to Crystal (2004), a 1999 study by SIL Ethnologue[7] found that of the fifty-one languages in the world that had only one fluent speaker remaining, twenty-eight were in Australia; this figure has probably also decreased. For reasons of cultural identity, efforts are being made by many communities to document and preserve Aboriginal language and culture, often using a website. Ara Irititja is one such site; centring on a database and message board, its intention is to raise awareness within the general population, to mobilise support for the preservation of several specific Aboriginal languages and to allow universal access to digitised cultural and linguistic information. The focus of such community websites differs greatly from that of their academic counterparts; the motivation is personal rather than academic, and the contents are presented from an inside rather than outside perspective.

The other major digital tool for language preservation is specialised software that stores and presents sounds, images, and syntactic and phonetic information. One piece of software is 'Kirrkirr', a database designed to display word-networks. The sample dictionary provided contains data on Warlpiri, an Aboriginal Australian language. This program is important for three reasons. Firstly, it creates a dictionary list searchable in both English and the specific language, along with sound recordings of pronunciations, that visually illustrates the network nature of a lexicon; synonyms, antonyms, related words, alternate forms, collocations etc for every entry are displayed as networks (see Appendix Three for example screenshots). Secondly it can be used with "almost any dictionary in XML format", making it more versatile than a single-language dictionary database. Thirdly, it is freeware, thus increasing its availability and usage, supporting any preservation efforts. While Kirrkirr cannot document the full complexity of a language, and is not a stand-alone teaching course, it is an effective tool that can preserve, organise and reteach aspects of a language.
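The idea of deriving a word network from an XML dictionary can be sketched as follows. This is a minimal illustration and not Kirrkirr's actual implementation or file format: the element names (`entry`, `headword`, `synonym`, `antonym`, `related`), the function `build_network` and the placeholder entries are all assumptions made for this example, and no real Warlpiri data is used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical dictionary in an assumed XML layout with placeholder words.
SAMPLE_XML = """
<dictionary>
  <entry>
    <headword>word_a</headword>
    <gloss>first gloss</gloss>
    <synonym>word_b</synonym>
    <related>word_c</related>
  </entry>
  <entry>
    <headword>word_b</headword>
    <gloss>second gloss</gloss>
    <antonym>word_c</antonym>
  </entry>
</dictionary>
"""

# Child elements treated as links between words (assumed names).
LINK_TAGS = ("synonym", "antonym", "related")

def build_network(xml_text):
    """Map each word to a set of (relation, other word) pairs,
    recording every link in both directions."""
    root = ET.fromstring(xml_text.strip())
    network = defaultdict(set)
    for entry in root.iter("entry"):
        head = entry.findtext("headword")
        for tag in LINK_TAGS:
            for link in entry.findall(tag):
                network[head].add((tag, link.text))
                network[link.text].add((tag, head))
    return network

network = build_network(SAMPLE_XML)
for word in sorted(network):
    print(word, sorted(network[word]))
```

Recording each link in both directions means a word that has no entry of its own, such as `word_c` here, still appears in the network through the entries that refer to it.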


The world's languages are dying at an unprecedented rate. The debate over whether or not to attempt to preserve dying languages, either through documentation or revival, is contentious, sparking many questions: does preservation benefit the community involved? How can accurate preservation be achieved? Are linguists who document languages acting in the interests of academia or for a 'greater good'? The greatest impediment to those in favour of preservation, however, is not resistance but apathy and ignorance: language preservation is not a 'popular' issue, and the large majority of the global population is unaware of the rate of language death (Tsunoda 2005).

Currently, various linguists, governments and communities are attempting to preserve and revive endangered languages through, amongst other techniques, the development of IT solutions. Such applications, especially freeware, give the wider community access to information and issues that would otherwise remain unseen. As Tsunoda (2005) states, raising awareness of language death is the first, essential step that, for the majority of endangered languages, has not yet occurred.


[1] The twentieth century Spanish dictator Francisco Franco, in the name of a unified Catholic Spain, prohibited the use of regional languages, one of which was Basque. The language survived due to active efforts to maintain it during the dictatorship, and it has since gained official status. It is, however, still endangered.

[2] Due to the lack of records, the ensuing destruction of Aboriginal culture and the difficulty in establishing boundaries between distinct languages and related dialects, the exact figure is impossible to gauge.

[3] IPA was created by the International Phonetic Association, which was established in Paris in 1886. The first official version of IPA was published by Paul Passy, the Association's leader, in 1888, and was based on the work of previous phoneticians. Subsequent amendments have been made over the years, most notably at the IPA Kiel Convention (1989) and in revisions in 1993 and 1996. Source: Wikipedia.com.

[4] Although a few other font systems do exist, the SIL font package is the most comprehensive and widely used, and is endorsed by the linguistic community. Therefore it is the only font software that will be discussed here.

[5] According to Webopedia, a scalable font is one in which the shape of each letter is defined and not the absolute size, and hence it can be scaled to any size.

[6] According to Webopedia, TrueType is an outline font technology that allows the application of different typefaces to a font. SIL IPA93 allows regular, bold, italic and bold italic variants.

[7] SIL maintains an online community journal, titled Ethnologue, which is concerned with all aspects of language death and preservation.



