18 April 2011

What Are The Causes of Phonemic Diversity?

Atkinson's Serial Founder Effect Hypothesis

Quentin Atkinson ("Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa", Science 4/15/2011) has made headlines with his bold assertion that "the number of phonemes used in a global sample of 504 languages is also clinal and fits a serial founder–effect model of expansion from an inferred origin in Africa. This result, which is not explained by more recent demographic history, local language diversity, or statistical non-independence within language families, points to parallel mechanisms shaping genetic and linguistic diversity and supports an African origin of modern human languages."

The Underlying Data

He reaches this conclusion by massaging data from three maps prepared by Ian Maddieson from secondary sources in the World Atlas of Language Structures.  The maps clearly show that phonemic diversity isn't random, but don't necessarily support Atkinson's hypothesis either.

The maps show the number of consonants, with bins of

Small 6-14 (91 languages)
Moderately small 15-18 (121 languages)
Average 19-25 (182 languages)
Moderately large 26-33 (116 languages)
Large 34+ (53 languages),

vowels, with bins of

Small 2-4 (93 languages)
Average 5-6 (288 languages)
Large 7-14 (183 languages),

and tone systems, with bins of

No tones (307 languages)
Simple tone system (132 languages)
Complex tone system (88 languages),

in 504 languages around the world.
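To make the binning concrete, here is a minimal Python sketch of how a language's raw counts map onto the map categories listed above.  The function names and the sample counts are my own illustrations, not anything taken from the atlas or from Atkinson's paper, and the sketch assumes that the "Large" consonant bin starts at 34 so that the bins don't overlap.

```python
# Bin edges follow the consonant and vowel categories listed above.
# Assumption: "Large" consonant inventories start at 34 so bins do not overlap.
CONSONANT_BINS = [
    (6, 14, "Small"),
    (15, 18, "Moderately small"),
    (19, 25, "Average"),
    (26, 33, "Moderately large"),
    (34, None, "Large"),
]
VOWEL_BINS = [
    (2, 4, "Small"),
    (5, 6, "Average"),
    (7, 14, "Large"),
]


def classify(count, bins):
    """Return the label of the bin containing count, or None if out of range."""
    for low, high, label in bins:
        if count >= low and (high is None or count <= high):
            return label
    return None


def profile(consonants, vowels, tone_system):
    """Summarize one language as a (consonant, vowel, tone) category triple."""
    return (
        classify(consonants, CONSONANT_BINS),
        classify(vowels, VOWEL_BINS),
        tone_system,
    )


# Hypothetical counts, used only to show the mapping.
print(profile(22, 5, "No tones"))
```

The point of the sketch is that each map reduces a count to a coarse category, so two languages in the same bin can still differ by several phonemes.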

The Problems With Atkinson's Hypothesis

Atkinson's hypotheses about the time-depth of these relationships and their mechanism aren't very persuasive. 

Regional Trends Are Present

Clearly, each of these features exhibits strong regional trends. 

The Amazon is vowel rich, consonant poor and often has a tone system.  The Caucasian languages are short on vowels, rich in consonants and usually lack a tone system.  Indigenous Australian languages without exception are vowel poor, lack tone, and have few to a moderately large number of consonants.  Papuan languages tend to be tonal and consonant poor.

Many West African, East African, and Southeast Asian languages have complex tone systems, large numbers of vowels, and moderately large inventories of consonants.  Many indigenous languages of the Pacific Coast in the Americas are vowel poor, have simple tone systems and tend to have high numbers of consonants.

Europe, South Asia, and North Asia tend to lack tone, have moderate numbers of consonants, and have moderate to high numbers of vowels.

The Data Are Too Noisy To Represent A Serial Founder Effect

Gleaning a relatively clear serial founder effect from these relationships seems far-fetched.  The data sets are simply too noisy for that.  One can see clusters where the phonemic features seem to coincide.  One can fairly infer that regional clusters of similar phonemic features have a common origin.  One can even fairly infer that some common reason may account for the fact that non-adjacent regional clusters are phonemically similar.

The Data Don't Closely Track Language Families

Certainly it is true that the features don't seem to track language families. 

For example, the Uralic Saami language is consonant rich, while the nearby Uralic Finnish language is consonant poor.

The only New World languages definitively tied to the Old World are the Na-Dene languages and the Yeniseian Ket language.  Central Siberian Ket is vowel rich, consonant poor, and lacks a tone system; Na-Dene languages tend to be vowel poor and consonant rich, and some of the Na-Dene languages have a tone system.  Whatever the total number of phonemes is in these related languages, the claim that Na-Dene's phoneme set arose from phoneme loss from its parent language is implausible.  Its phoneme losses and gains seem to have counterbalanced each other.

Some Niger-Congo languages have complex tone systems, while others have no tone system at all.  Some Niger-Congo languages (particularly toward Nigeria) are consonant rich, others (particularly towards Ivory Coast and Liberia) are consonant poor.

Areal Effects Are Inconsistent

Indeed, a linguistic trait that follows strong regional patterns while not closely tracking language family relationships is pretty much the definition of an areal effect in linguistics. 

But the source of these areal effects is obscure and inconsistent.

Sometimes, the areal effects cross immense linguistic relationship divides.  The phoneme set of Basque, which is more distantly related to the other languages of Europe than any other language for many hundreds of miles around, is quite similar to that of many nearby Indo-European languages.  Yet the phoneme sets of the Caucasian languages are dramatically different from those of any of their neighbors.  The languages of the Caucasus mountains do seem, in some cases, to have phoneme set sizes similar to each other despite not being very closely related linguistically, but together they are very different from the neighboring languages.

The Data Aren't A Close Fit To Pre-Historic Migration Patterns

The regions of phonemic similarity don't obviously track paths of deep pre-historic migration.  For example, if they did, we would expect continuity of phonemic trends from Africa to South Asia to Southeast Asia reflecting a Southern route migration.  Instead, we see similarities between Africa and Southeast Asia that are interrupted in South Asia.

Most Papuan languages have few consonants, but a couple of them have large inventories of consonants.  These outliers are not particularly closely related to each other and are at odds with very nearby members of their respective Trans-Papuan language genera, and no Trans-Papuan languages have an intermediate number of consonants.  One imagines a scenario here in which one group intentionally develops a different phoneme set from their close linguistic neighbors in Papua New Guinea for no reason other than to make their language harder for their neighbors to acquire, thereby cementing group identity.

The Data Don't, and Shouldn't, Reflect Deep Time Depth

Nor do they seem to obviously reflect linguistic time depth on the scale of tens of thousands of years.  All languages of the New World should have similar linguistic time depth, but show very distinct intra-American regional clusters in phonemic diversity.  Papua New Guinea and Australia were settled by modern humans at about the same time, but have dramatically different phonemic profiles. 

Semitic Hebrew is consonant poor, while Ethio-Semitic Tigre is moderately consonant rich, despite the fact that the languages diverged from each other within the last four thousand years.  The Semitic Arabic dialects spoken closest to Israel, which are some of the closest linguistic relatives of Hebrew, are in between.

Irish and Hindi have large numbers of consonants and average numbers of vowels, while most Indo-European languages have average numbers of consonants and large numbers of vowels.  Yet, Irish and Hindi each probably branched off from related Indo-European languages with more typical Indo-European phoneme sets within the last four thousand years.  The Indo-European Hittite language, attested at about the same time that Hindi and Irish diverged from their most closely related but phonemically distinct Indo-European relatives, had a particularly low number of vowels (just four), possibly as a substrate influence from vowel poor non-Indo-European languages in Anatolia.

The number of phonemes in English has changed between Old English and contemporary spoken English, and isn't the same even for all dialects of American English, which started to diverge from British English and from each other only in the last few hundred years.

Overall, the evidence seems to indicate that phoneme change can take place in time frames of several centuries, while language family relationships can be pretty clear for periods of several millennia, at least.

The Overall Patterns Couldn't Have Been Produced By Serial Founder Effects

Serial founder effects are hard pressed to explain why the Amazon is consonant poor and vowel rich, while the North Pacific Coast of North America has languages that are rich in consonants and short on vowels, even though both have a mix of simple tone systems and languages without tone systems.  Both groups of languages, presumably, have similar time depth and derive from populations that would have been close in space to each other and genetically similar about 17,000 years ago.

The trends also have an immense amount of noise in them.  The Tibeto-Burman Naxi language of Southwest China, for example, is consonant rich, vowel rich and has a complex tone system, giving it the greatest amount of phonemic diversity possible in the data set.  So do the Yulu language of Sudan (a Nilo-Saharan language) and the Nilo-Saharan Ngiti language of the Congo.  A serial founder effect shouldn't put any languages with maximal phoneme sets so distant from each other.

Simply put, the cline that Atkinson proposes isn't the kind of cline one would associate with a serial founder effect.  It is a vague general trend with tenuous foundations that is inconsistent with a serial founder effect.   Whatever accounts for variation between languages in phonemic diversity, it is not a serial founder effect.
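To see why widely separated maximal inventories are awkward for a pure serial founder effect, consider a toy simulation.  This is only a sketch under assumptions of my own (a single chain of founding events, a fixed per-phoneme loss probability, no phoneme gain at all); it is not Atkinson's statistical model, and the starting size and loss rate are made-up numbers.

```python
import random


def simulate_founder_chain(start_size, steps, loss_prob, seed=0):
    """Toy serial founder chain: each founding event may drop phonemes, never add.

    start_size: phoneme count at the origin (hypothetical)
    steps: number of successive founding events
    loss_prob: chance that any given phoneme is lost at one founding event
    """
    rng = random.Random(seed)
    sizes = [start_size]
    for _step in range(steps):
        current = sizes[-1]
        # Each phoneme independently survives the bottleneck with prob 1 - loss_prob.
        survivors = sum(1 for _phoneme in range(current) if rng.random() >= loss_prob)
        sizes.append(survivors)
    return sizes


def is_non_increasing(values):
    """True when no entry exceeds the one before it."""
    return all(b <= a for a, b in zip(values, values[1:]))


chain = simulate_founder_chain(start_size=60, steps=10, loss_prob=0.05)
print(chain)
print(is_non_increasing(chain))  # True by construction: phonemes are only ever lost
```

In a loss-only chain like this one, the largest inventory always sits at the origin, so a maximal inventory far from the origin has to come from some other process, such as phoneme gain through contact or independent invention.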

There Are Patterns To Phonemic Diversity

The problem is a fascinating puzzle, but Atkinson seems to be largely barking up the wrong tree to explain it. 

Climate Zone And Phoneme Mix

For example, without going into the matter with any particular preconceptions, one would notice that tropical areas tend to have high levels of tonality and rich vowel sets, while temperate and desert areas have low levels of tonality.  One might explain that by concluding that tropical areas have a rich array of bird sounds that locals have an evolutionary fitness reason to learn (culturally) or have a genetic ability to distinguish (or both), and that a well developed capacity to distinguish sounds of these kinds for non-linguistic purposes makes it more natural for these societies to use these kinds of sounds linguistically among themselves.  Consonant frequency might be a factor that allows a language to have a sufficient number of words by compensating for a lack of ability to make different tones and vowel sounds.

Is this the correct hypothesis?  I surely don't know.  But, it is a better statistical match to the data than Atkinson's hypothesis and is plausible enough to make sense.

Contact and Isolation

Unusually high phoneme inventories might be a marker of high levels of inter-linguistic contact.  The Nilo-Saharans who have the highest phoneme inventories in Africa are situated betwixt Afro-Asiatic and Niger-Congo language speakers and may be borrowing phonemes from the areal influences of both.  The Naxi who have the highest phoneme inventories in Asia are situated between Tibeto-Burman speakers who have one set of phonemes and Southeast Asian language speakers who have another one.  Likewise, the presence of click consonants in some Bantu languages, presumably because the population previously spoke a Khoisan language before shifting to a Bantu superstrate language, reflects the kind of phoneme gain that can take place from linguistic contact.

Unusually low phoneme inventories might be a marker of prolonged population isolation with low population densities in places like Australia.  Phoneme inventories can also shrink when language learners in a new superstrate language can't accurately reproduce all of the sounds in the superstrate language.  This may be what happened in the case of Hittite and other Indo-European Anatolian languages.  One also sees it when Japanese speakers borrow words into Japanese from languages with more phonemes than their own.

The number of phonemes in a language is really a matrix of a smaller number of options for sound making. One might, for example, appropriately think of tone systems as multipliers of the number of phonemes in a language, rather than mere additional phonemes. Perhaps the appropriate way to count is to look at the number of combined tone-vowel/consonant sounds in a language (also perhaps modulated by rhythm or otherwise), rather than examining phoneme differences in isolation. Perhaps there is a phoneme set size towards which languages trend when not artificially reduced by small isolated populations (Hawaiian is another famous example of phoneme loss in addition to Australia, for possibly similar reasons) and when not artificially increased by phoneme sharing at the boundaries of phoneme regions.
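The difference between treating tones as extra phonemes and treating them as multipliers is easy to show with arithmetic.  The sketch below is illustrative only: the inventories are hypothetical, and it assumes that every consonant can combine with every vowel and every tone, which real languages do not strictly allow.

```python
def additive_count(consonants, vowels, tones):
    """Count tones as if each were just one more phoneme."""
    return consonants + vowels + tones


def combined_units(consonants, vowels, tones):
    """Count distinct consonant + vowel + tone combinations.

    Assumption: every consonant can pair with every vowel and every tone,
    and a language without tone is treated as having a single neutral tone.
    """
    return consonants * vowels * max(tones, 1)


# Hypothetical inventories, chosen only to show the arithmetic.
toneless = {"consonants": 30, "vowels": 5, "tones": 0}
tonal = {"consonants": 20, "vowels": 5, "tones": 4}

print(additive_count(**toneless), combined_units(**toneless))  # 35 150
print(additive_count(**tonal), combined_units(**tonal))        # 29 400
```

On the additive count the toneless language looks richer, while on the combined count the tonal language has far more distinct sound units to build words from, which is why the choice of counting method matters for any claim about a cline in phonemic diversity.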

Classes of Phonemes May Be The Most Suitable Level Of Analysis

Consonant and vowel count may be a poor measure in any case.  Many of the outlier cases in consonant number involve entire classes of consonants or vowels found in one area, but not another, rather than individual phonemes. 

A lack of fricative consonants is a major factor in the low number of consonants found in Australia, Papuan languages, and low consonant languages of the Americas. 

High consonant count African languages are often notable for including labial-velar consonants (found elsewhere only in a small number of Papuan languages), and high vowel count African languages are often notable for including a distinction between nasalized and non-nasalized vowels (otherwise found mostly in the Americas and sporadically elsewhere). The high consonant count in the Caucasian languages is largely attributable to the pharyngeal consonants used there, in some Afro-Asiatic languages, and in few other places.  Click consonants are also found only in one small, regional subset of African languages.

In contrast, distribution of the "th" sound is sporadic (and a poor fit for a serial founder effect).

Most of the cases of the loss of click sounds, labial-velar consonants, fricative consonants and perhaps even pharyngeal consonants and nasalized vowels, from parent languages that included them, might very well fit a serial founder effect model, since the distribution of these phonemic divergences is consistent with that model, while the distribution of the presence or absence of a "th" sound, linguistic tone systems and some kinds of vowel number variation may not. 

But, at any rate, the analysis probably needs to be conducted on a class of phoneme by class of phoneme basis, rather than on a total consonant set basis to make much sense as a phylogenetic tree.  Clearly, there were at least as many cases of independent phonemic invention in particular regions as there were cases of independent invention of plant and animal domestication complexes, and some kinds of phonemic invention (and linguistic invention generally) seem more prone to independent invention than others.
