13 April 2011

How Old Are The Indo-European Languages?

In their 2003 article in Nature, Russell Gray and Quentin Atkinson looked at the number of lexical differences between the various living and extinct Indo-European languages and concluded that the language family was about 8,000 years old. Robin J. Ryder and Geoff K. Nicholls, in an article in the Journal of the Royal Statistical Society: Series C (Applied Statistics) for January 2011, reproduced the same result with similar data and a more refined version of the same model that addressed some of the methodological objections to the Gray and Atkinson study.

These dates are wrong. We can conclude that these dates are wrong because we have more sources of data than the models rely upon that strongly support a contrary result. Their models produce the wrong estimates of the age of the Indo-European language family because the models that they use make simplifying assumptions that have a material effect on the result. This post explains why these two studies (they aren't really independent as the later study uses data and methods that overlap the first) are wrong and why "the Kurgan theory [of Marija Gimbutas] that the spread started between 6000 and 6500BP" is a more accurate estimate.

Both studies use essentially the same inputs: a list of common words for particular things from as many Indo-European languages as possible, living and dead, and some estimates of the time period during which those word lists were valid. They then estimate the rate at which languages change their vocabulary based on examples of languages that are known to have changed to certain degrees in known time periods.

The degree of similarity between languages in lexicon produces presumed relationships between the languages that closely mirror what we believe to be true by other means. The branches of the tree are essentially the same.

The number of years before present that any given two languages have a presumed common ancestor in these models is essentially a highly massaged way of describing the degree to which they are lexically similar. The fewer words a pair of languages have in common, the more distant the model presumes their common ancestor to be.
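The link between shared vocabulary and estimated age can be illustrated with the classical glottochronology formula. This is a much simpler calculation than the statistical models used in the two studies and is shown here only as a sketch. The function name and the retention rate of 0.86 per millennium are assumptions chosen for illustration, not values taken from either paper.

```python
import math


def divergence_millennia(shared_fraction, retention_per_millennium=0.86):
    """Estimate time since two languages split, in millennia.

    shared_fraction: share of a fixed word list that is still cognate
        in both languages (0 < shared_fraction <= 1).
    retention_per_millennium: assumed share of the word list that one
        language keeps after 1000 years (an illustrative constant).
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_per_millennium < 1.0:
        raise ValueError("retention_per_millennium must be in (0, 1)")
    # Each language loses words independently, so the shared fraction
    # after t millennia is retention ** (2 * t). Solve that for t.
    return math.log(shared_fraction) / (2.0 * math.log(retention_per_millennium))


if __name__ == "__main__":
    for shared in (0.8, 0.6, 0.4, 0.2):
        print(shared, round(divergence_millennia(shared), 2))
```

Under a fixed retention rate, the estimated age depends on nothing except the shared fraction, which is why the fewer words two languages share, the older their common ancestor is presumed to be.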

In some circumstances this assumption is reasonable: specifically, when languages are already differentiated from each other and the only neighboring languages influencing their evolution come from related language families and don't interact heavily with them. But, in critical situations it is not. In real life, languages exhibit punctuated evolution. They change rapidly when they first differentiate and when they co-exist with or are in close regular proximity to very different languages, while changing slowly when isolated.

The model is wrong because it fails to account for these two facts, both of which produce more rapid language change than would result from the prolonged random drift of a language in isolation or near isolation. Put another way, the age estimates produced by these models are the oldest ages that could be expected in an environment in which there is no real linguistic competition from other language families and one starts with a single common language. That is a good description of the groups of languages used to estimate the rates of language change used in the model. But the Indo-European languages whose lexical differences from each other are driving the old estimate for proto-Indo-European have histories that would have sped up the rate of language change at critical junctures. Those circumstances, in fact, look quite similar to the circumstances where the model fails to produce accurate age estimates for languages with known ages.
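The way bursts of rapid change inflate a uniform-rate estimate can be sketched with a toy calculation. The phases, rates, and function name below are hypothetical numbers chosen for illustration and are not drawn from either study.

```python
def inferred_age_under_uniform_rate(phases, baseline_rate=1.0):
    """Age a uniform-rate model would infer from accumulated change.

    phases: list of (relative_rate, years) pairs describing the real
        history, where relative_rate is change per year in arbitrary units.
    baseline_rate: the single rate the model assumes throughout,
        calibrated on slowly drifting languages.
    """
    total_change = sum(rate * years for rate, years in phases)
    return total_change / baseline_rate


# Hypothetical history: 500 years of formation or contact at three
# times the drift rate, followed by 5500 years of ordinary drift.
history = [(3.0, 500), (1.0, 5500)]

true_age = sum(years for _, years in history)
model_age = inferred_age_under_uniform_rate(history)

if __name__ == "__main__":
    print("true age:", true_age)
    print("age inferred at uniform rate:", model_age)
```

In this toy history the true age is 6000 years but the uniform-rate estimate is 7000, because the model reads the extra change from the fast phase as extra elapsed time. Any phase faster than the baseline pushes the estimate older, never younger.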

Atkinson recognized this was a problem, and in a 2008 study fit the data to a more complex model that attempted to determine how much language change was attributable to change in the formative period of a language, and how much language change was due to random drift. In the Indo-European languages, Atkinson's effort determined that about 21% of language change was due to language formation effects, a result that neatly produces an estimated age of proto-Indo-European of about 6600 years BP, a match within the margin of error with the most widely accepted Kurgan hypothesis of Indo-European language origins.

What makes the evidence for the Kurgan hypothesis so much more plausible than the estimates produced by the two statistical studies?

The Kurgan hypothesis looks at the earliest known cultures to speak Indo-European languages: the Indo-Aryans of India who composed the Vedic Sanskrit epics from which Hindi and other Indo-Aryan languages are known to descend, the ancient Persian authors of the early Indo-Iranian Avesta, the Mycenaeans who replaced the Minoans and the Pelasgians who spoke non-Indo-European languages, the Hittites in Anatolia whose language led to the extinction of the Hattic and Hurrian languages there, the Mitanni who spoke an Indo-European language with Indo-Aryan affinities, the Tocharians who spoke an Indo-European language in the Tarim basin until well into the 1st millennium CE, the Celts, and the earliest Italic language speakers in Italy. Archaeology and physical anthropology (i.e. similarity of bones) and more recently genetics (of populations and ancient DNA) are then used to identify which archaeological cultures appear to be in continuity with each other, and which appear to represent a break from prior archaeological cultures. Periods in which there is continuity in archaeological culture are presumed to represent periods of relative linguistic continuity. Periods in which there is discontinuity in archaeological culture or a historical record of a language transition are deemed to be periods of relative linguistic change. By this means predecessor cultures are assigned to Indo-European and non-Indo-European categories. The cultural continuity of Indo-European and non-Indo-European languages is traced back until the lines converge on a source culture in which words that have a common origin in distant branches of the language family would have been present. The cultures themselves are dated using carbon dating, tree rings, and other methods of dating archaeological strata.

Boundary Dates For Indo-European Cultures

This method provides some minimum ages of Indo-European languages.

Cultural practices like cremation, which the Vedic epics attest were adopted around the time that the Indo-Aryans arrived in India, appear around 3911 BP. A contemporaneous Akkadian historical record establishes that the Hittites conquered their second city state in 3765 BP and that before that date the Hittites ruled a single city (they would go on to rule almost all of Anatolia and the Northern Levant). Cremation starts to appear on the Pannonian plain and at some locations in the mid-Danube River basin around 4000 BP. From 3561 BP to 3471 BP a Sanskrit speaking, Indo-Aryan god worshipping elite establishes itself in the Mitanni empire in Northeast Mesopotamia.

Tarim basin mummies of a type that is physically and genetically similar to peoples of the same time period on the European steppe start to appear around 3811 BP and continue to have a similar appearance until about 1200 BP. J. P. Mallory and Victor H. Mair, in "The Tarim Mummies: Ancient China and the Mystery of the Earliest Peoples from the West" (2000), suggest, in a step critical to fixing an earliest possible proto-Indo-European date, that, as Wikipedia paraphrases them:

[T]he Tocharian languages were introduced to the Tarim and Turpan basins from the Afanasevo culture to their immediate north. The Afanasevo culture (c. 5500–4500 BP) displays cultural and genetic connections with the Indo-European-associated cultures of the Central Asian steppe yet predates the specifically Indo-Iranian-associated Andronovo culture (c. 4000–2900 BP) enough to isolate the Tocharian languages from Indo-Iranian linguistic innovations like satemization.

This provides an archaeologically supported date for a possible split of the Tocharian language from the other Indo-European languages of no earlier than 5500 BP, and supports the Kurgan hypothesis because, per the Wikipedia article on the Afanasevo culture, this is an early and extreme eastern outlier culture whose "burials bear a remarkable resemblance to those much further west in the Yamna culture [5200 BP to 4200 BP in the Pontic Steppe], the Sredny Stog culture [6500-5500 BP "just north of the Sea of Azov between the Dnieper and the Don."], the Catacomb culture [4800-4200 BP in the Pontic Steppe], Poltavka culture [4700—4100 BP of the middle Volga River], and the Corded Ware Culture [5000–4350 BP aka the Battle Axe culture or Single Grave culture]."

The oldest Corded Ware sites are in Poland, and it is found in much of Russia and the Baltic sea area. It reaches the "Danubian and Nordic areas of western Germany" around 4400 BP and might be Indo-European or might be one of the last pre-Indo-European layers in part because, "In places a continuity between Funnel Beaker and Corded Ware can be demonstrated, whereas in other areas Corded Ware heralds a new culture and physical type." Also, specific origins much later in time have been identified for the Germanic, Balto-Slavic and Celtic languages. For example, the beginning of late Nordic Bronze Age in Southern Scandinavia around 2110 BP is a moment of major cultural upheaval when cremation begins to appear in Southern Scandinavia, many metal objects related to horses are found and there seems to be a strong coincidence with the appearance of the Germanic languages. The main expansion of the Slavic languages takes place around 1400 BP, after the fall of the Roman empire from the Southwest edge of the Slavic language speaking areas today.

So, if the Corded Ware culture was Indo-European, it may not have left Indo-European linguistic traces, in much the same way that Indo-European Celtic languages left very little trace in France, Spain and Portugal, where Celtic was replaced by Latin, which was in turn replaced by Romance languages.

Given the timing of the Afanasevo culture whose Indo-European identity can be inferred reasonably reliably despite a lack of direct linguistic evidence, the Tocharian branch of Indo-European languages must break off from the Sredny Stog culture, since all of the others are too recent to be a source for it, and the Sredny Stog culture becomes a strong candidate for the proto-Indo-European culture. (Gray estimates this date to be 6900 BP, which is about 1400 years too early.)

There is cultural continuity between the earliest known speakers of the Indo-Iranian languages and the Sintashta-Petrovka-Arkaim culture in the Southern Urals and northern Kazakhstan, which flourished from roughly 4211 BP to 3611 BP. That culture conducted chariot burials, engaged in copper mining and metallurgy, and was, according to ancient DNA, genetically at least 90% West Eurasian on both the maternal and paternal sides, with at least 60% of the individuals overall having light hair and blue or green eyes. There is also cultural continuity between the earliest known speakers of the Indo-Iranian languages and the Andronovo culture, which invented the spoke-wheeled chariot that is strongly associated with the Indo-Iranians, around 4010 BP. Some genetic traits that are more common in high caste Indo-Aryan language speakers are believed to have West Eurasian origins.

The Urnfield culture emerges around the time of several collapses and upheavals in the Eastern Mediterranean, Anatolia and the Levant:

* end of the Mycenaean culture with a conventional date of ca. 3200 BP
* destruction of Troy VI ca. 3200 BP
* Battles of Ramses III against the Sea Peoples, 3206-3211 BP
* end of the Hittite empire 3190 BP
* settlement of the Philistines in Palestine ca. 3180 BP

The Urnfield culture is found in Central Europe, Eastern France and Northern Italy and is commonly seen as an Indo-European cultural successor to the Bronze Age Indo-European cultures and a cultural predecessor to the Hallstatt culture of the 8th to 6th centuries BC (European Early Iron Age), which was followed in much of Central Europe by the La Tène culture associated with the Celts. The late Urnfield culture probably gave rise to the Italic languages, which are closely related to Celtic as well, and may even have also been a source for the Germanic languages. The late Urnfield culture at its greatest extent reached far Northeastern Iberia. (Indo-European) Greek colonies were established in Iberia ca. 2710 BP.

Boundary Dates For Non-Indo-European Cultures

There are also ancient cultures we know did not speak Indo-European languages. Sumerian was a literary non-Indo-European language attested in writing in Mesopotamia (Iraq) from 5500 BP to 4200 BP, and it gave way by about 4015 BP to the Semitic Akkadian language. After that, the Kassite empire, from the mountains to the East, imposed a Hurrian language (a non-Indo-European, non-Afro-Asiatic language related to the languages of the North Caucasus) until it was defeated in the North by the Mitanni and later by the Hittites.

We also have continuous literary records of the Egyptians in Coptic from about 5000 BP until well past 3000 BP, at times through the South Levant, and of Semitic languages being spoken in the North Levant prior to their arrival in Sumeria. We also know that the Semitic languages and Coptic are part of the same language family.

Thus, no Indo-European languages were spoken South of Anatolia prior to the Mitanni, over a prolonged period during which Indo-European languages older than those first known to the historical record, and the cultures that spoke them, were already in existence. Prior to the arrival of the Mitanni, the people of that region in the mountains to the East of Mesopotamia spoke a non-Indo-European language.

We know that from roughly 4110 BP to 3465 BP the Linear A script was used by speakers of the non-Indo-European Minoan language, which was probably related to the pre-Indo-European Etruscan (Tuscany), Rhaetic (roughly speaking Venice), and Lemnian (a Greek island) languages. We also know that the Pelasgians who inhabited mainland Greece immediately prior to the arrival of the Mycenaeans (who spoke ancient Greek) spoke a non-Indo-European language.

The non-Indo-European Elamite language is attested in writing in Persia from 3511 BP to 2342 BP, and a system of proto-writing that appears to be in cultural continuity with it was used from 5100 BP or so.

The Hattic language was spoken for an indeterminate period prior to the emergence of the Hittite empire as the predominant language of North Central Anatolia and shows some similarities to the languages of the Caucasus Mountains. The non-Indo-European Kaskians regularly invade Hittite territory from the East coast of the Sea of Marmara and the adjacent Anatolian plains, which the Hittites never manage to control, from ca. 3661 BP until they are defeated by an Assyrian king ca. 3160 BP.

There is cultural continuity between the Harappans of the Indus River Valley and the Sumerians. The Harappans used Sumerian crops very shortly after they were found in Sumeria, engaged in regular trade with Sumeria for thousands of years and had a linguistic minority district in Sumerian port cities, and used a system of symbolic seals similar to those used by the Vinca culture of the Balkans and the Sumerians. The Indus River Valley culture shows remarkable continuity and unity for about 4000 years until its collapse not long before the appearance of the Indo-Aryans ca. 3900 BP.

Since we know very little about what the Harappan language was like, and it was relatively isolated from the rest of the food producing world for so long, it would be possible to see it as the proto-Indo-European homeland rather than Central Asia. But, in that hypothesis it is hard to explain why so many genetic markers typical of Pakistanis (e.g. Y-DNA haplogroup L) are not found in many other Indo-Europeans at any frequency.

Prior to the Urnfield and Greek influences arriving in Iberia, it appears that non-Indo-European language speakers, some of whom spoke Vasconic languages related to Basque, lived in Iberia. These peoples (or at least some of them), based on place names and archaeological continuity, appear to have been part of a common Atlantic megalithic culture that coincided with the Neolithic revolution in the Atlantic region.


In the period immediately following 4000 BP, which also coincides with the worst short term drought since agriculture was invented in the Middle East that also appears to have extended to the Indus River Valley, Indo-European peoples appear on the scene from Greece to Anatolia to the Balkans to Central Asia to North India to the Tarim Basin. There is rough consensus on the timing, ethnic identity and location of Indo-European language speakers in all periods after then outside of Northern and Eastern Europe where there is some dispute over whether particular cultures were or were not Indo-European language speakers.

Before that point in time, the Indo-Europeans were pretty much confined to the Eastern European and Central Asian steppe where they would have been distinguished by the horse culture and would have been predominantly herders in juxtaposition to their Old European farmer neighbors to the West.

Lexical analysis, based on a counterfactual assumption of gradual random linguistic drift, suggests an age for Proto-Indo-European of 8000 BP, which coincides with the earliest LBK Neolithic expansion into Eastern Europe out of someplace in the vicinity of Anatolia and the Balkans. The Kurgan hypothesis sees cultural connections between the earliest likely Indo-European language speakers and the Early Kurgan (burial mound) culture in Eastern Europe and Central Asia of ca. 6500 to 6000 BP, while placing the LBK farmers and Cucuteni-Trypillian culture [7500 BP to 4750 BP] of "Old Europe" in cultural continuity with the non-Indo-European language speaking people of pre-Hittite Anatolia and of Sumeria and the Levant. In an Anatolian hypothesis, the Cucuteni-Trypillian culture would have to be Indo-European and would have to have somehow extended its influence all of the way to India with technologies like chariots that it did not possess, and the non-Indo-European language cultures that the Indo-Europeans displaced are hard to explain.

The uniform rate of linguistic change hypothesis would suggest that it took about 4000 years for the Indo-European languages to diversify to the extent observed in 4000 BP. The Kurgan hypothesis suggests that this happened much more rapidly, over about 2000 to 2500 years, probably due to an increased rate of language change as daughter languages distinguished themselves from each other, due to founder effects, and as a result of coming into intimate, intense contact with competing non-Indo-European languages as they expanded.
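The arithmetic behind these two durations, and the average speed-up the Kurgan dates imply, can be written out as a short sketch. It uses only the round dates quoted in this post (8000 BP, 6500 to 6000 BP, and 4000 BP); the function and variable names are illustrative.

```python
def years_to_diversify(origin_bp, observed_bp):
    """Years between a proposed origin date and the date diversity is observed."""
    return origin_bp - observed_bp


OBSERVED_BP = 4000  # when the diversified languages appear on the scene

uniform_years = years_to_diversify(8000, OBSERVED_BP)       # lexical models
kurgan_years_long = years_to_diversify(6500, OBSERVED_BP)   # early Kurgan bound
kurgan_years_short = years_to_diversify(6000, OBSERVED_BP)  # late Kurgan bound

# The same amount of change packed into fewer years means a faster
# average rate, by the ratio of the two durations.
speedup_low = uniform_years / kurgan_years_long
speedup_high = uniform_years / kurgan_years_short

if __name__ == "__main__":
    print(uniform_years, kurgan_years_long, kurgan_years_short)
    print(speedup_low, speedup_high)
```

On these figures the Kurgan timeline requires vocabulary to have changed, on average, between 1.6 and 2 times as fast as the uniform-rate models assume during the period before 4000 BP.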

What the Kurgan hypothesis provides, and a mere statistical model based on word comparisons between languages does not, is a step by step chain of cultures that could bring Indo-European languages to the places where they ultimately end up, along with reasons (including domestication of the horse and development of the chariot and, in the case of the Hittites, a monopoly on iron production coinciding with climate based weakness in neighboring cultures) that they were able to expand their linguistic reach.

