01 November 2010

Out Of Africa And Into India (Updated)

Mitochondrial DNA (mtDNA) and Y-DNA are uniparental genetic markers (inherited in the female and male lines respectively), and their diversity is central to understanding the prehistoric adventures of modern humans. The distribution of lineages of these markers, the distribution of Neanderthal DNA in modern Eurasians, and archaeological evidence together suggest that the ancestors of modern Eurasians may have spent an extended period of time becoming a distinct genetic community somewhere in greater India.

The Maternal Line Story

Almost all mtDNA types believed to be indigenous to Eurasia are absent in Africa, and there is rather convincing evidence that the few exceptions (e.g. mtDNA haplotypes M1 and U6, and mtDNA haplotype H1 in the North African Tuareg) are the product of "back migration" to Africa rather than of an African origin. Likewise, all mtDNA types believed to be indigenous to Eurasia are genetic descendants of just one subtype of African mtDNA (L3*, sometimes described in the older literature as L3a). Even "relict" populations like Aboriginal Australians and indigenous Andamanese Islanders have distinctive Eurasian mtDNA descended from one of the three main mtDNA lines.

All mtDNA types found in Eurasia derive from three types not believed to be native to Africa, called haplotypes M, N and R (with R being a descendant of N). M is found indigenously only in Eastern Eurasia, while N and R are found in both Western and Eastern Eurasia.

Basically, all Eurasian women are descended from three women who lived something like 70,000 years ago (not necessarily contemporaries of each other; one of them, the founder of R, was a descendant of another, the founder of N). All three of those women were, in turn, descended from a single African woman who probably lived a few thousand years earlier somewhere around Ethiopia, where the mtDNA type from which all Eurasians descend, L3*, is most common today.
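The descent relationships described above form a small tree. As a minimal sketch, the tree below is trimmed to just the haplogroups named in this post (real mtDNA phylogenies are far larger), and the dictionary `MT_PARENT` and function `lineage` are hypothetical names for illustration. It traces each Eurasian founding lineage back to L3:

```python
# Simplified mtDNA tree: each haplogroup maps to its parent.
# Only the lineages named in the text are included.
MT_PARENT = {
    "L3": None,  # African root for the purposes of this sketch
    "M": "L3",
    "N": "L3",
    "R": "N",    # R is a descendant of N
}

def lineage(haplogroup):
    """Return the chain of haplogroups from `haplogroup` back to the root."""
    path = []
    node = haplogroup
    while node is not None:
        path.append(node)
        node = MT_PARENT[node]
    return path

for hg in ("M", "N", "R"):
    print(hg, "->", " -> ".join(lineage(hg)[1:]))
```

Walking parent links like this makes the point of the paragraph explicit: every path, including R's two-step path through N, ends at the same African L3 lineage.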

The Paternal Line Story

There is a parallel development on the Y-DNA side. Eurasian Y-DNA is descended almost entirely from Y-DNA haplotypes C, D, E and F (C and F have one common ancestor, while D and E have another, and all descend from a single "African Adam" who would have been in the earliest modern human populations).

African Y-DNA is descended almost entirely from Y-DNA haplotypes A, B, and E (all of which share a common "African Adam" ancestor with C, D, and F). Only some subhaplotypes of Y-DNA haplotype E are found in both Africa and Europe, and none is indigenous to East Eurasia.

Moreover, Y-DNA haplotypes C and D are indigenous to East Eurasia only, while Y-DNA haplotype F is found in both West Eurasia and East Eurasia, and the Y-DNA haplotype E subtypes found in Eurasia are found indigenously only in West Eurasia.
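The paternal relationships can be modeled the same way, with a "most recent common ancestor" lookup showing which splits are shallow (D with E, C with F) and which are deep (D with F). This is a deliberately simplified sketch: the intermediate nodes BT, CT, DE and CF follow widely used Y-DNA nomenclature, A is treated as a single branch even though it is not one in detailed trees, and the root label `"Y-Adam"` and the function names are hypothetical.

```python
# Simplified Y-DNA tree: each haplogroup maps to its parent.
Y_PARENT = {
    "Y-Adam": None,  # illustrative label for the common paternal ancestor
    "A": "Y-Adam",   # simplified to one branch
    "BT": "Y-Adam",
    "B": "BT",
    "CT": "BT",
    "DE": "CT",
    "CF": "CT",
    "D": "DE",
    "E": "DE",
    "C": "CF",
    "F": "CF",
}

def ancestors(hg):
    """List hg and all of its ancestors, nearest first."""
    chain = []
    while hg is not None:
        chain.append(hg)
        hg = Y_PARENT[hg]
    return chain

def common_ancestor(a, b):
    """Most recent common ancestor of two haplogroups in this tree."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("no common ancestor")

for pair in (("D", "E"), ("C", "F"), ("D", "F"), ("E", "B")):
    print(pair, common_ancestor(*pair))
```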

Just eight individuals, out of tens of thousands whose Y-DNA has been tested worldwide with tests discriminating enough to detect it, have been identified as carrying the Y-DNA type ancestral to Y-DNA haplotypes D and E, called DE*: five in Nigeria, one in Guinea-Bissau and two in Tibet.

New data from a major whole genome genotyping study shows that the divide between the Y-DNA DE and CT lineages is even deeper than previous analysis had suspected (source): "14.4. Y chromosome Haplogroups . . . It . . . shows evidence for a deep division between haplogroups DE and CT, previously identified only by a single marker (P143); Karafet, Mendez et al. (2008)."

Y-DNA haplotype D in detail

The distribution of Y-DNA haplotype D and its subtypes is idiosyncratic.

The unmodified D* variety is found in Tibet, in non-Chinese Tibeto-Burmese language speaking populations outside Tibet, in speakers of languages that belong to the same family as Thai and Laotian (Tai-Kadai), in Andamanese Islanders, and in small numbers in North Asians and Pacific Ocean Islanders; the most basal forms of Y-DNA subhaplotype D* are found in Tibet.

Haplotype D1 (which is distinguished from D* by mutations different from those that define D2 and D3) is found among Tibetans, the Han Chinese, non-Chinese Tibeto-Burmese language speaking populations outside Tibet, speakers of languages in the same family as Thai and Laotian, and speakers of the Hmong-Mien languages of Southeast Asia. Per the reference above, "The southern ethnic populations (Daic and Hmong-Mien) form a relatively separate cluster from Tibetan and Tibeto-Burman populations in the D1-M15 sub-haplogroup." There are a few Northeast Asian subtypes among the Tibeto-Burmese subtypes of subhaplotype D1, but there are no Andamanese or Japanese peoples in subhaplotype D1.

Haplotype D3 is found almost entirely among Tibetans, Han Chinese, and other Tibeto-Burmese language speakers, with the most basal subtypes of D3 found in Tibet.

Y-DNA haplotype D2, which is distinguished from D* by mutations different from those that define D1 and D3, is found exclusively among the Japanese, where the other D subhaplotypes are absent. Haplotype D2 is generally associated with the more ancient (30,000 years before present or more) Jomon population of Japan, the earliest known civilization to make pottery, which subsisted mostly on fishing. The Jomon are believed to be best represented in modern Japan by the Ainu, who are 85% Y-DNA haplotype D2 (with the balance being Y-DNA haplotype C3, which is common in an adjacent population with whom the Ainu have long had trade relations). The ethnic mix of Japan we see today developed when East Asian warriors on horseback, with influences from the Han Chinese and Northeast Asia, entered the islands and started to rule them around 2,500 years ago, give or take a few hundred years.

The estimated divergence times of the Y-DNA D subhaplogroups (based on a mutation rate clock) are as follows:

D* 66,392 ± 1,466 years

D1 51,640 ± 2,563 years

D3 52,103 ± 1,327 years

D2 37,678 ± 2,216 years
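The post does not say exactly which clock method produced these figures. One widely used approach is the rho statistic: average the number of mutations separating each sampled lineage from the inferred founder, divide by a mutation rate per generation, and convert generations to years. The sketch below uses made-up mutation counts, a hypothetical rate and an assumed 25-year generation time; it illustrates the arithmetic only and does not reproduce the values in the table.

```python
def rho_age(mutation_counts, rate_per_generation, years_per_generation=25):
    """Estimate time since a founder using the rho statistic.

    mutation_counts: mutations separating each sampled lineage from the founder.
    rate_per_generation: expected mutations per lineage per generation, summed
        over the markers scored (hypothetical here; real studies calibrate it).
    years_per_generation: assumed generation length in years.
    """
    if not mutation_counts:
        raise ValueError("need at least one sampled lineage")
    rho = sum(mutation_counts) / len(mutation_counts)  # mean mutations per lineage
    generations = rho / rate_per_generation
    return generations * years_per_generation

# Hypothetical data: five sampled lineages and an illustrative rate.
counts = [4, 6, 5, 7, 3]
print(round(rho_age(counts, rate_per_generation=0.002)))
```

The ± figures in published estimates come from the statistical uncertainty in quantities like rho; uncertainty in the mutation rate itself usually matters even more.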

The estimated divergence dates for D*, D1 and D3 pre-date the arrival of humans (or for that matter hominins) in the Americas, Australia or Papua New Guinea.

In both the Ainu and Andaman Island cases, the evidence is suggestive of the idea that the populations were entirely Y-DNA haplotype D and mtDNA haplotype M in a localized group of subtypes. All Great Andaman Islanders belong to one of their own distinct mtDNA subhaplogroups of mtDNA haplogroup M. The proportion of non-M mtDNA in the Ainu is comparable to the proportion of non-D Y-DNA in that population and similarly shows affinities with neighboring populations. There is also suggestive evidence that the Paleo-Tibetans may have had a modal Y-DNA haplogroup D and a modal mtDNA subhaplotype M16.

Y-DNA haplotype E in detail

Within Y-DNA haplotype E, subtypes E1b1a, E1b1b* and E2 are indigenous to Africa. E1a is most common in West Africa, but it is also found at low frequencies on the Southern European coast (1% to 3% in parts of Italy and Portugal that interacted with North Africa in historic times), probably as a result of an expansion first from West Africa to North Africa and then onward at some point in the historic era.

Y-DNA subhaplotype E1b1b, generally speaking, seems to track the historical distribution of Afro-Asiatic language speakers (a language family that includes Hebrew, Arabic, ancient Egyptian, many languages of Ethiopia, and many of the languages of the herder peoples of North Africa, East Africa and the Sahel). It was also probably a component of the early colonization of Europe, either before or after the Last Glacial Maximum. There are multiple theories regarding the demographic waves in which this Y-DNA subhaplotype spread, some placing its arrival in Europe and the Near East around the dawn of agriculture and herding, or a few thousand years earlier, when proto-agriculture that did not involve truly domesticated species of plants and animals had begun.

The absence of Y-DNA haplotype E in East Asia, and the greater overlap of E haplotypes across the divide from Africa to Europe, suggest that it may have been added to the West Eurasian mix sometime after the formative period of mtDNA M, N and R and Y-DNA haplotypes C, D and F, whether that formative period took place in a hypothetical community somewhere in Africa distinct from other African populations or not long after the Out of Africa migration.

Where Did Eurasian and African Uniparental Markers Evolve?

The question this presents is "where did the distinctively Eurasian uniparental genetic markers evolve?"

Increasing evidence appears to support South Asia (i.e. Pakistan, India and Bangladesh) as the Eurasian Eden where the Out of Africa population became genetically distinctive before spreading across Eurasia. There is evidence to suggest that the mtDNA haplotypes M and N and R all arose in South Asia and then dispersed to different regions of Eurasia where they evolved further.

On the Y-DNA side, Europe appears to be "a “receiver” of intercontinental signals primarily from Asia". The CF lineages of Y-DNA (also sometimes called the CT lineages) could also have arisen in this Eurasian Eden, although the Y-DNA haplotype D lineages, which are not found indigenously in India (outside Tibeto-Burman speaking populations) seem unlikely to have derived from the same pool of people.

There is also evidence from the genetic diversity of South Asia to "suggest that between approximately 45 and 20 kya most of humanity lived in Southern Asia." This interpretation also has support in the archeology of South Asia. Population size is key to the pace of evolution, because the number of mutations found in the DNA of any given child from a parent is believed to be roughly constant, and population size therefore is the main determinant of the number of mutations that can take place in any one generation in a population. Evolution takes place when one of those mutations makes its way into the gene pool in any significant and enduring frequency.
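The population-size argument is simple multiplication: if each child carries a roughly fixed number of new mutations, the number of new mutations entering a population each generation scales with the number of births. A minimal sketch, where the figure of 60 new mutations per child is an assumed round number and both population sizes are hypothetical:

```python
# Assumed round figure, for illustration only.
NEW_MUTATIONS_PER_CHILD = 60

def new_mutations_per_generation(births, per_child=NEW_MUTATIONS_PER_CHILD):
    """Expected count of brand-new mutations arising in one generation."""
    return births * per_child

# Hypothetical populations: a small founding band versus a large regional one.
for births in (500, 50_000):
    print(births, new_mutations_per_generation(births))
```

This counts only the raw supply of new mutations; whether any of them reaches a significant and enduring frequency is a separate matter of drift and selection.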

Archeology is also pointing to the presence of modern human populations in South Asia earlier than traditionally assumed, probably at least 70,000 years ago, and possibly significantly earlier. One recent find of teeth and a jaw in a cave about 60 miles from the Chinese coast near the Vietnam border suggests that modern humans, or archaic humans with some modern dental features, were in East Asia 100,000 years ago, around the same time that modern humans were first present in the Levant (a presence from which they appear to have retreated for tens of thousands of years, probably for climate-related reasons). These dates are still well after the first evidence of modern human populations in Africa, but are earlier than previous estimates placing an Out of Africa event around 60,000 years ago.

It also appears that the basic regional outlines of the genetic makeup of broad subcontinental regions of the world were set in the initial dispersal of Upper Paleolithic modern humans from the place where Eurasian genetics became distinctive from African genetics in uniparental markers. The split of South Asian populations from populations in Europe and further East in Asia appears to date from around 50,000 years ago, and the evidence is "highly suggestive that India, Trans-Caucasus and the regions between them were the birthplace of the mitochondrial DNA haplogroups which are now widely spread throughout Europe." (Specifically, the mtDNA lineages derived from R1* and HV.) The regions in between correspond to Eastern Turkey, Iran, Afghanistan and Pakistan on political maps today.

Another suggestive bit of evidence as to the location of the population that split between Europe and South Asia is that the Northern Israeli Druze, who claim an ancestral homeland somewhere in the mountains of Anatolia or Iran, are the only population with high frequencies of both the European and the Asian varieties of mtDNA haplotype X, as well as the basal form of that haplotype. The fact that this location coincides with the location suggested by the other evidence quoted above corroborates this theory.

Neanderthals, Humans and a Secondary Eurasian Homeland in India

There is also evidence from comparisons of Neanderthal and modern human whole genomes that all modern Eurasians today have roughly the same amount of Neanderthal DNA (1%-4%) relative to Africans, which would suggest that any admixture of modern humans and Neanderthals happened in the formative period of the Eurasian genome.
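Comparisons of this kind are commonly made with Patterson's D statistic (the "ABBA-BABA" test). The sketch below is an illustrative toy, not the analysis behind the published result: each site lists the alleles of an African genome (P1), a Eurasian genome (P2), a Neanderthal (P3) and an outgroup such as the chimpanzee, and the allele matching the outgroup is treated as ancestral. A positive D means the Eurasian shares more derived alleles with the Neanderthal than the African does. The four sites are invented.

```python
def d_statistic(sites):
    """Patterson's D from (P1, P2, P3, outgroup) allele tuples.

    ABBA: P1 ancestral, P2 and P3 share a derived allele.
    BABA: P2 ancestral, P1 and P3 share a derived allele.
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        ancestral = out          # outgroup allele taken as ancestral
        if p3 == ancestral:
            continue             # only sites where P3 carries a derived allele count
        if p1 == ancestral and p2 == p3:
            abba += 1
        elif p2 == ancestral and p1 == p3:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)

# Invented toy sites: (African, Eurasian, Neanderthal, chimpanzee)
toy_sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("C", "T", "T", "C"),  # ABBA
    ("G", "A", "G", "A"),  # BABA
    ("A", "A", "G", "A"),  # uninformative: only P3 is derived
]
print(d_statistic(toy_sites))
```

Without admixture, ABBA and BABA sites should be roughly equally common and D near zero; an excess of ABBA sites is the signal that Eurasians carry some Neanderthal ancestry.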

This was a surprising result. It contradicted evidence from mtDNA markers (also here) and Neanderthal Y-DNA (and here) that showed very distant links between Neanderthals and modern humans with profiles different from all modern humans whose uniparental markers have been sequenced, and that Neanderthals contributed less than 0.1% of modern human mtDNA.

Of course, one doesn't have to be very creative to come up with scenarios where uniparental marker estimates of genetic contributions are far lower than overall genetic contributions. For example, if admixture overwhelmingly involved isolated instances of Neanderthal and modern human intercourse rather than stable relationships, the children stayed with their mothers, and the offspring of Neanderthal mothers went extinct along with all other Neanderthals, then none of the offspring raised in modern human communities would have had Neanderthal mtDNA, and only half would have had Neanderthal Y-DNA even in the first generation. If the original modern human population with which Neanderthals might have admixed was very small, very few instances of admixture may have been involved (for example, a couple of instances in a band of modern humans of just a few dozen individuals), and random chance alone could easily have caused the Neanderthal Y-DNA line to die out. And it wouldn't be too surprising if male Neanderthal-modern human hybrids were less successful in producing offspring than female hybrids. For example, female Neanderthals looked more like modern humans than male Neanderthals did, something that would presumably also have been true of the hybrid children, so the girls may have had an easier time blending into the modern human community than the boys.
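The point that chance alone can end a paternal line can be made concrete with a textbook branching-process (Galton-Watson) calculation. Assuming, purely for illustration, that each man has a Poisson-distributed number of sons with mean 1 (a population neither growing nor shrinking), the probability that a single male line has died out within n generations follows the recursion q(n+1) = exp(m * (q(n) - 1)), starting from q(0) = 0, where m is the mean number of sons.

```python
import math

def extinction_probability(generations, mean_sons=1.0):
    """P(a single male line has no male descendants after `generations`).

    Assumes a Poisson number of sons per man (a Galton-Watson process):
    q(n+1) = exp(mean_sons * (q(n) - 1)), starting from q(0) = 0.
    """
    q = 0.0
    for _ in range(generations):
        q = math.exp(mean_sons * (q - 1.0))
    return q

for n in (1, 5, 20, 100):
    print(n, round(extinction_probability(n), 3))
```

Under these assumptions a lone line has better than a one-in-three chance (e^-1, about 0.37) of vanishing in the very first generation, and the probability keeps climbing toward certainty over time, which is why a handful of hybrid Y chromosomes could easily have disappeared.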

An admixture rate of 1%-4% implies that, at the time the genetic proportion became fixed in the modern human population, the average person had roughly one Neanderthal among 32 great-great-great grandparents (five generations back), or one among 64 great-great-great-great grandparents (six generations back). Fewer Neanderthal ancestors would have been needed if natural selection favored the traits they contributed, which seems likely given that the Neanderthals had longer to adapt to the local environment.
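Counting ancestors is doubling arithmetic: n generations back (parents being one generation) there are 2^n ancestral slots, so a single Neanderthal among them contributes on average 1/2^n of a person's ancestry. A short sketch that also names each generation (the function names are illustrative):

```python
def ancestors_in_generation(n):
    """Number of ancestor slots n generations back (parents are n = 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2 ** n

def ancestor_label(n):
    """English name for the ancestors n generations back."""
    if n == 1:
        return "parents"
    if n == 2:
        return "grandparents"
    return "-".join(["great"] * (n - 2)) + " grandparents"

for n in (5, 6):
    share = 1 / ancestors_in_generation(n)
    print(f"{ancestor_label(n)}: 1 of {ancestors_in_generation(n)} = {share:.1%}")
```

One ancestor in 32 works out to about 3.1% and one in 64 to about 1.6%, both inside the 1%-4% range.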

This result was also different from the expectation of those who believed that admixture had taken place, because they expected that the Neanderthal contribution would be greater in Europe, where Neanderthals co-existed with early modern humans for what may have been thousands of years, than in Asia, where they did not. There are several cases, for example, where skeletal remains seem to show hybrid individuals in Europe. Also, Upper Paleolithic modern humans showed greater apparent bodily similarity to Neanderthals than their successors: "Upper Paleolithic humans (about 30,000 years ago) are about 20 to 30% more robust than the modern condition in Europe and Asia." This could have been convergent evolution, but if there was admixture of Neanderthal and modern human populations, admixture could also account for those traits.

It could be that there was greater admixture of Upper Paleolithic Europeans and Neanderthals, but that the genetic traces of the more admixed Upper Paleolithic Europeans were greatly diluted, after Neanderthals went extinct, by later population waves from less admixed populations in the Near East or Southern Europe. One time period when this could have happened would be the repopulation of Europe from Southern refugia after the Last Glacial Maximum around 20,000 years ago. Another, supported by the great difference in genetic makeup between early hunter-gatherer populations and early farmer populations in Europe based on ancient DNA samples, would be the population replacements that took place when farming populations from the Near East replaced many of the hunter-gatherer populations of Europe. A third could be later population replacements, possibly in connection with the rise of Indo-European languages in Europe. Ancient DNA, at any rate, provides strong evidence of at least one major upheaval in European population genetics at some point after the initial spread of agriculture there, and does not rule out the possibility that there was more than one. Even if no single wave completely replaced the preceding population, the cumulative effect of multiple waves of migration into Europe could have diluted the traces of Neanderthal admixture suggested by early skeletal remains to the point that they are unrecognizable relative to random variation in ancestry between individuals.

For example, if early Upper Paleolithic humans were 25% Neanderthal and made up 20% of the gene pool of post-LGM Upper Paleolithic humans, who in turn made up 20% of the gene pool of Neolithic Europeans, who in turn made up 80% of the gene pool of contemporary Europeans, then Europeans today would carry only 0.8% more Neanderthal DNA from that source than their non-European Eurasian counterparts, a difference indistinguishable from random variation in a pool that is 1% to 4% Neanderthal overall. And European populations with a larger Paleolithic hunter-gatherer ancestral contribution (e.g. Estonians or Saami) might have a larger Neanderthal genetic component than the general population, having escaped one or more of the waves of dilution of Upper Paleolithic modern human genes, something the initial comparison of just two European genomes to Neanderthal genomes would not have captured.
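The dilution example is a product of successive fractions, one per population wave. A minimal sketch using the hypothetical shares from the worked example above (they are illustrative numbers, not measurements), assuming each later population inherits ancestry in simple proportion to its sources:

```python
from math import prod

# Hypothetical shares taken from the worked example in the text.
steps = [
    ("Neanderthal share of early Upper Paleolithic humans", 0.25),
    ("early Upper Paleolithic share of the post-LGM gene pool", 0.20),
    ("post-LGM share of the Neolithic gene pool", 0.20),
    ("Neolithic share of the contemporary gene pool", 0.80),
]

def diluted_share(fractions):
    """Multiply successive ancestry fractions through each replacement wave."""
    return prod(fractions)

extra = diluted_share(f for _, f in steps)
print(f"extra Neanderthal ancestry today: {extra:.1%}")
```

Adding a further wave simply multiplies in another fraction, which is why several partial replacements can shrink even a large early contribution below the noise.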

In any case, the implication of the most recent Neanderthal DNA findings is that there was probably only a single instance of admixture between modern humans in Eurasia and Neanderthals, that it took place sometime before modern Eurasian humans dispersed from a common homeland, and that homeland must have been at least at the fringe of the Neanderthal range during the period when the two types of hominins co-existed.

What About Y-DNA haplotype D?

Y-DNA haplotype D, however, may not have been part of this formative South Asian population. It is not currently found in the indigenous populations of India, West Asia, Africa or Europe (with the possible exception of some populations admixed with historically Northeast Asian populations in prehistoric or historic times). Y-DNA haplotype D is also not found in places where a wide mix of other haplotypes is present. It is not found in the indigenous populations of Australia, Papua New Guinea, or the Americas.

Some of the populations it is found in were isolated on islands for more than ten thousand years before the coldest part of the last ice age, and are believed to have remained genetically isolated until the last few thousand years (in the case of the Japanese), or even until the present (in the case of the Andamanese). The estimated mutation clock ages of divergence of these subhaplogroups from their parent types are consistent with this timeline.

The center of gravity of this haplotype appears to lie in Tibet, where the greatest diversity of subhaplotypes and the most basal examples of the haplotype are found, including the deep root DE*, which is otherwise found only in West Africa.

But the fact that haplogroup D populations are found in Tibet today doesn't necessarily mean that this was always so. The populations in Japan and the Andaman Islands are both suggestive of a Great Southern Migration coastal route population, probably dating back to a period before 50,000 years ago. Remote mountains and islands, where we find much of the Y-DNA haplogroup D population, are classic refugia for relict populations that go extinct elsewhere when a population that drives them out of their homelands appears on the scene. And the only way that genetically related people could have gone from one part of Asia to another was on foot (or in some cases, possibly by short range coastal boats). No storm could possibly have carried the Jomon by boat from the Andaman Islands or Tibet to Japan; indeed, the great typhoons of the region generally travel from East to West. Presumably, then, Y-DNA haplogroup D once had a much more continuous distribution than it does today.

The rarity of Y-DNA D haplogroups in historically Northeast Asian populations, likewise, may be a function of the D haplogroups being common prior to the Last Glacial Maximum (ca. 20,000 years ago), and other haplogroups being more common in the populations that repopulated Siberia after the Last Glacial Maximum (a point in history that made islands more easily reached due to a lower sea level, while making Northern and high elevation areas less habitable, with the timing of this illustrated, for example, here).

The mtDNA of aboriginal Southeast Asian populations suggests "at least 4 detectable colonization events . . . respectively dated to over 50,000 years ago, ∼10,000 years ago, the middle Holocene, and the late Holocene." Basal mtDNA subhaplogroups of the M, N and R lineages are all present in the earliest strata of aboriginal Southeast Asians, with the oldest dates coinciding with the estimated divergence times of the Y-DNA haplogroups D1 and D3 found in Southeast Asian populations today.

One can imagine, for example, that the ancestors of people in Y-DNA haplogroup D inhabited coastal regions of Sundaland around 52,000 years ago, with one group branching off to Japan around 35,000 years ago. As rising seas inundated their homelands, the rest may have retreated inland (with the exception of the trapped Andamanese and Jomon), and then three successive waves of outsiders, appearing from around 10,000 years ago, may have driven some populations into the mountains of Tibet.

If Tibet is a refugium, however, where did its people seek refuge from?

The story turns out to be anything but simple.

The genetic adaptations of modern Tibetans to high altitude, relative to a presumed ancestral and non-adapted population, are dated to a mere 2,750 years ago. But archaeology indicates that there have been people living in Tibet for more than 20,000 years, and the genetic evidence suggests not one, but two pre-LGM waves of human colonization into Tibet.

Both of these pre-LGM waves came mostly from Northeast Asia, brought by peoples whose DNA haplotypes evolved further to the South and who moved counterclockwise around East Asia; the first was dominated by Y-DNA haplogroup D and the second, about twenty thousand years later, by Y-DNA haplogroup O. The second wave, the Y-DNA haplogroup O expansion, may have been the one that pushed the people with Y-DNA haplogroup D to the North, perhaps from Sundaland to Northeast Asia and Japan, and then from there to Tibet. Both pre-LGM populations would then have had to retreat to Southern Tibet's valleys as climate conditions grew harsher, and they probably would have merged into a single ethnicity at that point, if they had not done so already.

The oldest mtDNA haplogroups found in Tibet belong to both the M (which includes C) and N (which includes A) lineages.

Analysis of the mtDNA of Tibet suggests a predominantly Northeast Asian origin for the current Tibetan population (full paper here):

[T]he timing and routes of entry of modern humans into the Tibetan Plateau is still unclear. To make these problems clear, we carried out high-resolution mitochondrial-DNA (mtDNA) analyses on 562 Tibeto-Burman inhabitants from nine different regions across the plateau. By examining the mtDNA haplogroup distributions and their principal components, we demonstrated that maternal diversity on the plateau reflects mostly a northern East Asian ancestry. Furthermore, phylogeographic analysis of plateau-specific sublineages based on 31 complete mtDNA sequences revealed two primary components: pre-last glacial maximum (LGM) inhabitants and post-LGM immigrants. Also, the analysis of one major pre-LGM sublineage A10 showed a strong signal of post-LGM population expansion (about 15,000 years ago) and greater diversity in the southern part of the Tibetan Plateau, indicating the southern plateau as a refuge place when climate dramatically changed during LGM.

The body of the paper notes:

[E]xtremely low frequencies of the South Asian lineages on the plateau reflected the strong “barrier effect” of the Himalayas between the Indian subcontinent and the Tibetan Plateau (Gayden et al., 2007; Kang et al., 2010).

Archeological sites of more than 20,000 years old (Huang, 1994; Zhang and Li, 2002; Yuan et al., 2007) supported a pre-LGM human activity on the Tibetan Plateau. Notably, some of these pre-LGM Paleolithic sites were found in the northern part of the plateau, where there are very few current inhabitants living with harsh climate, indicating that most regions of the plateau was developed into greater glacier and permafrost (Lehmkuhl and Haselein, 2000; Zheng et al., 2003), which might have “closed” most of the plateau, resulting in a large decrease of the local human population.

Recently, Zhao et al. (2009) claimed that an infrequent haplogroup (M16) in the Tibetan populations may represent the genetic relics of the Late Paleolithic inhabitants on the plateau. In our study, we also found some old lineages (M62, A10, and C4d) which may be the remains of pre-LGM inhabitants, providing evidences for LGM survivals. However, the age of a single haplogroup may not be the age of a population demographic event. If the people who colonized the area carried part of the original diversity not just one founder, the TMRCA of a haplogroup can be much older than the colonization event, and this haplogroup can in some cases have reached much lower frequency in the source population because of drift. Conversely if the founding group had a small effective population size, the TMRCA of a haplogroup can be much younger than the colonization time. In this study, time estimates of several haplogroups resulted in similar old ages of around 22,000 years, making it less possible to be an over estimate of the colonization time. Giving the inhospitable environment of the Tibetan Plateau and the population decreasing during LGM, the effective population size on the plateau might be very small at least in part of the history; therefore, the colonization time of the plateau could be older than our estimates. This consequence mostly supports the pre-LGM colonization of the plateau suggested by archeology findings.

The exclusive distribution of mtDNA sublineages on the plateau enabled us to estimate the colonization time of the plateau. Some of these plateau-specific lineages (A10, C4d, and M62) seem to have evolved on the plateau for a long time, indicating that the first entry of modern human into the Tibetan Plateau might happened before LGM deteriorated the plateau climate substantially (Aldenderfer and Yinong, 2004). In this respect, subclade A10 provided further information about the pre-LGM settlers. Subhaplogroup A10 showed the highest frequency and diversity in Shannan and Shigatse of south Tibet. Considering the present suitable environment of south Tibet, especially that of Shannan, (Aldenderfer and Yinong, 2004), the warm valleys in south Tibet might have been the refugium for the pre-LGM settlers during LGM. In addition, this haplogroup showed a signal of population expansion, dating back to 15 KYA, as a consequence of environment change after the LGM period. Phylogeographic analysis of another major haplogroup on the plateau, M9, revealed a considerable genetic component of post-LGM migrants. However, the ancestral haplotypes of M9 was not found in the Tibetan populations, but in the populations of Southeast Asia, Japan, and coastal China (Jiangsu and Shandong province in China, author’s unpublished data). Moreover, another lineage nested in haplogroup M9, subclade E, has a Pleistocene origination in east Sunda of Southeast Asia (Derenko et al., 2007; Soares et al., 2008). Combining the geographic distribution information of the M9 diversity, we supported the hypothesis that haplogroup M9 originated in Southeast Asia more than 50,000 years ago and evolved into the subhaplogroup E and M9a-M9d, and M9a-M9d subsequently migrated northward probably during Late Pleistocene (Soares et al., 2008). Totally, the presence of only derived sublineages in the Tibetan Plateau indicates a counter-clockwise dispersal in mainland East Asia as previously proposed (Chaix et al., 2008).
The high frequencies of sublineages M9a and M9d on the plateau may result from either demographic effect or selective effect (Gu et al., 2008), which requires further investigation.

The high frequency of Y chromosome polymorphic Alu insertion (YAP) [Ed. the mutation that distinguishes Y-DNA haplogroups D and E from C and F and their derived Y-DNA haplogroups] in Tibet makes the origins of the Tibetans intriguing and controversial. Although multiple origins of the Tibetan people have been proposed by different studies in the last decade (Qian et al., 2000; Su et al., 2000), the sources and routes presented in such claims are highly debated (Qian et al., 2000; Su et al., 2000; Thangaraj et al., 2005; Shi et al., 2008). Central Asia has been considered as the one of the main sources of YAP (Qian et al., 2000; Su et al., 2000; Gayden et al., 2007). In this study, however, considering the extremely low frequency of western or central Asian mtDNA haplogroups (2.3%), it is less likely that Central Asian is a major contributor, at least in the maternal aspect. Recently, Shi et al. (2008) demonstrated two independent Paleolithic dispersal events of modern human into East Asia of 50 KYA (marked by Y haplogroup D-M174) and 30 KYA (marked by Y haplogroup O-M175 and its derivatives) (Shi et al., 2005, 2008). Haplogroup D and O are both dominant Y haplogroups in Tibet, indicating multiple migrations into Tibet. The routes Shi et al. proposed of the two ancient human dispersals are consistent with our findings of pre- and post-LGM migrations to the Tibetan Plateau. This concurrence, however, needs to be investigated in our further study.

The papers by Shi, et al. (2005, 2008) referenced above are:

Shi H, Dong YL, Wen B, Xiao CJ, Underhill PA, Shen PD, Chakraborty R, Jin L, Su B. 2005. Y-chromosome evidence of southern origin of the East Asian-specific haplogroup O3-M122. Am J Hum Genet 77:408–419.

Shi H, Zhong H, Peng Y, Dong YL, Qi XB, Zhang F, Liu LF, Tan SJ, Ma RL, Xiao CJ, Wells RS, Jin L, Su B. 2008. Y chromosome evidence of earliest modern human settlement in East Asia and multiple origins of Tibetan and Japanese populations. BMC Biol 6:45.

Later Developments

Most of the exceptions to this general trend are attributable to migrations in the historic era, such as the Indo-Aryan invasion of South Asia, the Austronesian colonization of Oceania and Madagascar, the Jewish and Romani dispersals to Europe, the Southern expansion of Chinese culture, the Bantu expansion in Africa, Semitic conquests in East Africa, the Neolithic colonization of Europe from the Near East, and the expansion of Turkish and Mongolian populations from Northeast Asia into Central Asia, Korea and Japan.

There are a few apparent migrations that seem to post-date the initial dispersal of Eurasian modern humans but are still prehistoric. These include the back migration of certain populations from the Near East into East Africa and North Africa, and apparent genetic links between the Berbers of North Africa and the Saami of polar Europe. Also obscure, but probably representing some later migration, is the origin of the dispersal pattern of Y-DNA haplotype T (a distant descendant of Y-DNA haplotype F), which is found in Africa, Europe and South Asia.


There are other ways one can fit the data. For example, one can imagine a small modern human community entering the Levant, admixing with Neanderthals there, retreating to Africa only briefly, leaving Africa again for South Asia, and then dispersing from there. One can also imagine a distinct proto-Eurasian population evolving in Ethiopia, and then expanding from Africa only once it was well defined there. The evidence for an Out of Africa and Into India theory is more suggestive than it is definitive. But it isn't clearly ruled out by the data, and it does seem to fit most of the facts.


India as a Eurasian Eden is somewhat ironic, because there are good arguments that South Asia has borrowed much of its way of life from outsiders, and that most of India's food crops, early technologies, all of its surviving languages (except Andamanese and one or two nearly moribund relict languages) and its Hindu religion are cultural imports. Viewed from the perspective of India as a Eurasian Eden, however, these imports are basically back migrations of peoples to their Eurasian point of origin.

Footnote On Lake Mungo 3 Man

The rare exception that appears to prove the rule that all mtDNA in Eurasia belongs to haplotypes M, N or R or a descendant of them is a single ancient DNA sample from a set of very old remains in Australia, Lake Mungo 3, arguably dated to approximately 60 thousand years (ka) before present, an age that rules out the possibility of modern admixture. As the abstract of the research paper describing the findings explains:

Lake Mungo 3 is the oldest (Pleistocene) "anatomically modern" human from whom DNA has been recovered. His mtDNA belonged to a lineage that only survives as a segment inserted into chromosome 11 of the nuclear genome, which is now widespread among human populations. This lineage probably diverged before the most recent common ancestor of contemporary human mitochondrial genomes. This timing of divergence implies that the deepest known mtDNA lineage from an anatomically modern human occurred in Australia; analysis restricted to living humans places the deepest branches in East Africa. The other ancient Australian individuals we examined have mtDNA sequences descended from the most recent common ancestor of living humans. Our results indicate that anatomically modern humans were present in Australia before the complete fixation of the mtDNA lineage now found in all living people.

Almost all other evidence dates a modern human presence in Australia to 10,000 to 15,000 years after the putative age of the Lake Mungo 3 individual given in this research paper. The controversy is described in detail here. Critics argue that the skeleton is actually younger than 40,000 years old and consider it highly likely that the anomalous mtDNA results are typical of postmortem damage in ancient human mitochondrial DNA samples.

Maju, in the comments, suggests that there is a single Malaysian Negrito individual who was found to have mtDNA haplotype L2. But none of the literature accessible through Google references the find, and even if it exists, it could easily reflect a matrilineal ancestor from Africa in the historic era whose African identity is not known to anyone now living, as Malaysia has been on the trade routes of peoples who also traded with Africa for at least a thousand years. In an isolated case, this is the more plausible interpretation, and such unexpected ancestral discoveries are not unprecedented in DNA testing.

Updated and revised to reflect comments received and correct grammatical issues on November 2, 2010.


Maju said...

A very extensive review of Human genetic history. I broadly agree but not with every detail. Here are some of my main differences:

1. I am of the opinion that several other "African" lineages made it to Eurasia, but essentially only to Arabia. These are in the L3'4'6 and the L0 clades but not in the L2 nor the L1 ones (however, one Malaysian Negrito was reported with L2, and there was also a controversial aDNA lineage in an early Australian individual that could be L(xM,N) as well).

2. I don't think South Asia alone is enough to explain the Eurasian expansion. It was central without doubt but I understand that SE Asia (including South China) must be accounted for when explaining mtDNA N and Y-DNA C and D. These lineages quite clearly ask for a SE Asian center of spread, as most diversity is located in East Asia and Near Oceania.

3. Neanderthal admixture:

"For example, if early Upper Paleolithic humans were 25% Neanderthal, and made up 20% of the gene pool of post-LGM Upper Paleolithic humans, who in turn made up 20% of the gene pool of Neolithic Europeans, who in turn made up 80% of the gene pool of contemporary Europeans"...

This is a very unlikely scenario, IMO, and anyhow you have to consider that Neanderthals were spread across all of West Eurasia (except the Arabian Peninsula but including Central Asia up to the Altai), so the issue of Neanderthal admixture in that period is not a European-only issue but a general West Eurasian one. And there have been no significant inputs into West Eurasia since the early UP colonization.

Also, even in the Ice Age, Europe was a major contributor to overall West Eurasian demography. However, it's clear that founder effects and homogenization processes are concentrated in Europe (the latter probably because of the LGM "bottleneck"). There is no evidence so far, neither in the Paabo team's comparisons nor in the X-DNA (where a particularly old non-African lineage might be of Neanderthal origin - ??), that would suggest a specifically European or West Eurasian admixture with Neanderthals. If anything, it's something generically pan-Eurasian, and hence most likely a minor admixture episode during the Out of Africa migration.
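The scenario quoted above compounds its percentages multiplicatively. A minimal sketch of that arithmetic, assuming each population's Neanderthal share simply passes through at the stated proportion (no selection, and no Neanderthal ancestry arriving by any other route); the four fractions are the illustrative figures from the quote, not estimates:

```python
from math import prod

# Illustrative fractions from the quoted scenario (not measured estimates).
# Each value is the share of the later gene pool contributed by the earlier group.
steps = {
    "Neanderthal share of early Upper Paleolithic humans": 0.25,
    "early UP share of the post-LGM Upper Paleolithic gene pool": 0.20,
    "post-LGM UP share of the Neolithic European gene pool": 0.20,
    "Neolithic share of the contemporary European gene pool": 0.80,
}

def neanderthal_fraction(fractions):
    """Multiply successive contributions, assuming no selection and no
    additional Neanderthal input at any later step."""
    return prod(fractions)

share = neanderthal_fraction(steps.values())
print(f"{share:.1%}")  # prints 0.8%
```

Because every step multiplies, early admixture is diluted quickly. The same arithmetic also ignores any Neanderthal ancestry the Neolithic newcomers carried themselves, which is the reply's point about admixture being a West Eurasian rather than a European-only issue.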

Andrew Oh-Willeke said...

1. I considered mentioning the one early Australian individual, but left that for another post, as the details would take up a large part of the entire story and get me off track. I wasn't aware of the Malaysian Negrito individual.

2. I'd be curious to see the data relevant to the SE Asian/South China involvement in mtDNA N and Y-DNA C and D, I'm not familiar with it.

3. I'm not aware of Neanderthal signs further East than about the Urals, or maybe just beyond there. Also, modern humans reached Central Asia and Siberia late relative to Europe and East Asia, from which they went there. There is also evidence of replacement of most Siberian populations post-LGM, with only minor relicts remaining.

Maju said...

"I wasn't aware of the Malaysian Negrito individual".

Sorry, I can't provide a source for this right now, but it's part of the data in one of the few genetic papers on the Orang Asli. However, I just checked Hill 2006 and it's not there. It's always possible it's an error or a modern erratic arrival. One individual does not say much, does it?

"I'm not aware of Neanderthal signs further East than about the Urals or maybe just beyond there".

The existence of the Mousterian further east has been known for some time (see here for instance), and it was complemented by bone and tooth remains that could be Neanderthal; however, until Paabo's team got their hands on them it was not 100% certain (ref.).

Other Neanderthal Asian sites are in the Levant (Syria and Palestine), in Iran and in Uzbekistan. So it's clear that Neanderthals dominated most of West Eurasia some 70-50 Ka ago, the exceptions being the Arabian Peninsula and the Iranian south (what are now the coasts of the Persian Gulf).

Iran rather than Palestine was maybe where the hybridization happened - but this is of course uncertain.

"I'd be curious to see the data relevant to the SE Asian/South China involvement in mtDNA N and Y-DNA C and D, I'm not familiar with it".

I have discussed this matter several times but I don't think I have ever dealt with it in a specific article. The case for N was discussed when dealing with overall Eurasian mtDNA here (though it's more than one year old: some details have changed but not for top-level N I believe).

The case for Y-DNA D is well addressed by Hong Shi 2008.

I'm not sure if there's a paper on C, but it's easy to explain. Essentially there are six subclades (C6 is only very vaguely reported from New Guinea but may be a private lineage, and won't make much difference):

- C*: scattered across Eastern Eurasia and South Asia
- C1: Japan
- C2: Wallacea and Melanesia (and Polynesia)
- C3: NE Asia (and North America)
- C4: Australian Aborigines
- C5: first reported in South Asia but now seems to be found also in West Asia, Central Asia and even East Asia (scattered and minor)

Now connect the dots: regardless of where you place the less clear haplogroups/paragroups (C* and C5), you always get a SE Asian center of gravity for the whole haplogroup (my estimate was near Macao). This is a pretty effective way of estimating the origins of haplogroups in general; just make sure you are using top-level subclades and that each one is weighted the same (regardless of frequency). A lesser problem may come with the "asterisk" paragroup, which may well be hiding a number of minor top-tier sub-lineages, but that's something we have to live with until it is further clarified.
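The equal-weight "center of gravity" rule described above can be sketched as follows. This is one possible implementation, not necessarily how the estimate near Macao was made: each top-level subclade counts once, and the points are averaged on a sphere (via 3D unit vectors) rather than by averaging raw latitudes and longitudes. The coordinates are rough, hypothetical placements for the regions named in the list, not data from any study; C* is left out because it has no single home, and C5 is placed in South Asia only because that is where it was first reported.

```python
import math

def spherical_centroid(points):
    """Equal-weight centroid of (lat, lon) points given in degrees.

    Each point becomes a 3D unit vector; the vectors are averaged and the
    mean vector is converted back to latitude/longitude. This avoids the
    distortion of averaging raw longitudes.
    """
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

# Hypothetical, rough placements (illustrative guesses only).
# Each subclade is weighted once, regardless of its frequency.
subclade_locations = {
    "C1": (36.0, 138.0),   # Japan
    "C2": (-5.0, 140.0),   # Wallacea and Melanesia
    "C3": (50.0, 110.0),   # NE Asia
    "C4": (-25.0, 134.0),  # Australia
    "C5": (22.0, 79.0),    # South Asia (placement uncertain)
}

lat, lon = spherical_centroid(list(subclade_locations.values()))
print(f"center of gravity: {lat:.1f}, {lon:.1f}")
```

Moving C5 elsewhere, or adding a guessed point for C*, shows how sensitive the result is to the uncertain paragroups, which is the "lesser problem" raised above.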

Overall it's clear that there was an expansion also in SEA, that these three lineages particularly demand it and that this fact calls for the "rapid migration" model, coastal or not.

Andrew Oh-Willeke said...

I went over one of the main papers on the Orang Asli (linked in the updated version of the post) without finding that, but did find language in it describing whether mtDNA lines were more or less basal relative to the L3 positions, so this could have been a source of confusion. It also references a paper on L2, but only with respect to its methodology.

Thanks for the Neanderthal range references. Those are new to me.

I've located a lot of the work on Y-DNA haplotype D and included it in the post, along with a reference to the Shi paper, and I see that you have a point there, although its origins are muddy and complicated at best.

I don't think that I agree with you on Y-DNA C. As I read the information that you refer to, a more plausible scenario to me seems to be its origin with C* in South Asia, with one part (C5) branching mostly to the West, and the others (C1, C2, C3, C4) making a rapid coastal expansion to the East, either bypassing the mainland or being squeezed out of it by subsequent peoples, and then differentiating locally. It is hard to see a SE Asian center of gravity for a population found almost exclusively outside SE Asia.

I'll look at the mtDNA N link when I get a chance.

Andrew Oh-Willeke said...

Can't say that I agree with you on mtDNA N either. The really striking thing about N is that you have branches of both N(xR) and R that are found in West Eurasia, South Asia and East Asia (including Australia).

You also have the U split between South Asia and West Eurasia (and I think that there are some South Asian specific U haplotypes missing from the chart), suggestive of an origin for U that would be in South Asia or Iran, and X, which goes both East and West, with the split taking place early enough to end up in the indigenous American gene pool (although this was probably via Siberia, which was demographically West Eurasian, except for the far NE, throughout the Paleolithic and until about 2,000 years ago).

The implication is that at least until R becomes distinct from N, both are confined to Eurasian Eden, and that the population of Eurasian Eden evolved from a subset of the African gene pool into a distinctively Eurasian gene pool with M, N and R all present no later than the colonization of Papua New Guinea and Australia by modern humans. This requires a lot of mutational clicks that take place in a single genetic community between L3 and 50,000 years ago.

The absence of any obvious intermediaries between the Australian and Papuan mtDNA M, N and R types in Sahul and Y-DNA C types is suggestive of a long period of evolution in Eurasian Eden followed by a rapid expansion into these territories and subsequent differentiation. The scatter of the Y-DNA F lineages at multiple levels of phylogeny depth is also indicative of lots of evolution in a Eurasian Eden before splitting up in all directions.

Since Y-DNA C is found in Australian and Papuan non-overlapping haplogroups, one imagines that at least Y-DNA C and mtDNA M, N, and R are all part of the rapid Eurasian population explosion across Asia from Eurasian Eden.

I think one could make a case for an mtDNA M, Y-DNA C and D first set of expansion waves, and an mtDNA N/R, Y-DNA F second expansion wave scenario.

Indeed, Shi's paper argues for a Y-DNA D wave followed by a Y-DNA O wave (which is a subset of Y-DNA F), and an mtDNA M first, mtDNA N and R second (or mtDNA M, N and R second) scenario, would be a natural parallel to that and fit the evidence of the oldest Y-DNA D layers probably being accompanied only by mtDNA M.

Andrew Oh-Willeke said...

A couple more things on Y-DNA C. I think that the data from rare haplogroups of C does convincingly make the case for it expanding via a Southern route rather than a Northern one.

This article argues for Y-DNA C dispersal around 40,000 years ago, which would be after Shi's Y-DNA D wave (ca. 50,000 years ago) and before Shi's Y-DNA O wave (ca. 30,000 years ago). The 40,000-year-old date seems young for C4, which is high frequency in Australia, and for C6, which is found in the Papuan Highlands. And the big time span between 40,000 years ago and 15,000 years ago for C3 to wind up in the Americas is hard to explain. Why wouldn't C3 be a descendant of C5 then, rather than a sister haplogroup?

The fit seems more to C* going all the way from South Asia to Siberia and mutating only upon arrival, rather than one with serial founder populations.

Maju said...

In truth, sometimes I'm not sure. Or rather, I'm sure that the L2 individual exists and that I have read the paper, but I lost track of it months ago and can't recall it. So I understand your doubts, because I would doubt myself too, except that I'm sure it's not a dream. Anyhow...

As for C, I'm referencing wider ranges for C* and C5 following a reader of my blog and usual visitor of genetic/anthropology sites named Ebizur. He's like the walking encyclopedia of Y-DNA and has provided me and others with loads of data, sometimes even overwhelmingly so. I trust him, so I do not always check his sources (which in some cases are behind paywall anyhow).

I'd have to review too many comments in too many random discussions to find the sources but it was mentioned in the last few months anyhow.

It doesn't really matter, because I got to about the same conclusions when I thought like you (and I did in the past). As soon as I realized that C* is extremely rare in India and also sufficiently 'common' elsewhere, especially in Eastern Eurasia, I had to acknowledge that C looks oriental in its spread, because all or nearly all basal sublineages are essentially centered along the coasts of the Pacific Ocean, both north and south of SEA.

So SEA must be the answer. I was more uncertain about D until I read the Hong Shi paper. Nowadays I conceive of the Eurasian expansion as two sequential explosions: one (probably first) in South Asia (Y-DNA F and mtDNA M) and the other in SEA (Y-DNA C and D and mtDNA N). However, mtDNA M and Y-DNA K (MNOPS) were in the area soon after and everything got messed up; then a branch of MNOPS (P) probably back-migrated westward with branches of N (especially mtDNA R).

It's complex anyhow because we could well say that love (and sociality in general) knows no haplogroups. Eurasians of all lineages got mixed once and again, yet you can spot some patterns in all that mix anyhow. At least I think I have finally achieved that "genetic illumination" (took some time but now it mostly makes good sense after all).

Maju said...

"Can't say that I agree with you on mtDNA N either".

If you don't agree about Y-DNA C, then you won't agree re. mtDNA N either. I follow the same logic in both cases.

"The really striking thing about N is that you have branches of both N(xR) and R that are found in West Eurasia, South Asia and East Asia (including Australia)".

True. I can understand that perfectly: I was confused by the complex patterns of N for a long time as well. But the question is where most of those sublineages are centered. And most are (by 2 to 1) in Eastern Eurasia (including Near Oceania). Actually, since R also seems SE-original now, the ratio is now 3 to 1 (9 and 3). And two of those three are shared 50/50 between West Eurasia and South Asia (N1'5 and N2); R is also more diverse in South Asia than in West Eurasia (though now it seems less diverse than in Eastern Eurasia anyhow).

"You also have the U split between South Asia and West Eurasia (and I think that there are some South Asian specific U haplotypes missing from the chart), suggestive of an origin for U that would be in South Asia or Iran".

West Asia for me. Only U2 and U7 are in South Asia and they both belong to U2'3'4'7'8'9, which looks West Asian by origin anyhow. All other basal U lineages: U1, U5 and U6 are "western".

"and X which goes both East and West"

But not symmetrically: X has two basal sublineages, X1 and X2. X1 looks North African (or Palestinian at most) by origin and X2 also looks West Asian by origin. So for me X is the only N sub-lineage clearly West Asian by origin. Now, X2 did expand eastwards: to the Altai and, somehow, also to North America.

"The implication is that at least until R becomes distinct from N, that both are confined to Eurasian Eden"...

What is "Eurasian Eden"? If you mean a single region where all Eurasian lineages coalesced together, I don't think that exists, except considering all Tropical Asia as such, from Arabia to Guangzhou and Borneo. And that tropical arch is diverse enough to allow for several founder effects: at least two: one in South Asia and the other in the Far East (SEA most probably).

I used to flirt with that idea, where everything would have spawned from South Asia, but it does not make sense to me anymore. Reality has forced me to accept that SE Asia played a role similar to that of South Asia, though a secondary one. It's just logical anyhow. After all, the people who crossed India to the East eventually found themselves in another "empty" (??) region beyond the Ganges Delta where they could leave their mark for the future... massively.

More strange is the quite apparent back-migration westward of several such Eastern branches (soon after their expansion in the East or even simultaneously). These lineages are: mtDNA N and R and Y-DNA MNOPS (and some C). But whichever the reason and the how, it happened.

The simplest explanation is that there were flows forth and back through Tropical Asia for some time until the homeostasis (demographic pressure) closed the routes. When all containers were "full", flows effectively stopped, except to some extent through Siberia/Central Asia. But Siberia is another story altogether and its overall impact rather minor, except in America (again an "empty" container which was easily filled).


Maju said...


"The absence of any obvious intermediaries between the Australian and Papuan mtDNA M, N and R types in Sahul and Y-DNA C types is suggestive of a long period of evolution in Eurasian Eden followed by a rapid expansion into these territories and subsequent differentiation".

I don't know what you mean here: there are a number of major haplogroups in Sahul that are just one mutation downstream of M, N and R (more if you consider two mutations or so). These are: M29'Q (M), S (N) and P (R). O is also just two mutations downstream of N and there's still some other N, I'd have to check.

So people were pouring into Sahul right away (within a few millennia) after they arrived in South Asia and SE Asia. So there is no "Eden" there. The place of latency was surely in Arabia (but it was a rather dry "Eden", and that's why there were no or few expansions there). Arabia also surely acted as a filter, because not all apparently old lineages found there are found elsewhere in Eurasia-plus. These lineages have been attributed to the "slave trade" or to "Indian Ocean trade" in general, but I think these interpretations must be wrong, because they are not your usual L2 and L3 lineages but lots of very deep rare clades, especially in L0 and L3'4'6 (but few L3 as such and no L2).

"Since Y-DNA C is found in Australian and Papuan non-overlapping haplogroups, one imagines that at least Y-DNA C and mtDNA M, N, and R are all part of the rapid Eurasian population explosion across Asia from Eurasian Eden".

If "Eurasian Eden" is Arabia or the Persian Gulf (then a marshy oasis), fair enough. But I do not see any reason to imagine L3n or L3m (pre-N and pre-M) latent in India or whatever. Once they arrived there they boomed because it was prime and vast land. That's why I think M arrived first, while N went around M somehow once and again (first as pre-N and then as post-N, of course), maybe by exploiting coastal niches and boating around.

I think Y-DNA C and D, and probably K too, made their way to the East with N and the Eastern M subclades (many, but not as many as in South Asia, nor as expansive as N). This was probably a general flow followed by various groups, which then remixed in SE Asia. Then some N and (importantly) some R made their way back to the Far West (West Asia), I think with Y-DNA P, which seems to have left a good track in South Asia from Bengal towards the West. Maybe some C was also part of that journey westward (especially C5, and probably some C* too), but it had a much lesser impact. Instead, other relatives of MNOPS (IJ and G) were co-opted into that flow westward.

"Indeed, Shi's paper argues for a Y-DNA D wave followed by a Y-DNA O wave (which is a subset of Y-DNA F), and an mtDNA M first, mtDNA N and R second (or mtDNA M, N and R second) scenario".

I'm not in agreement with him in the mtDNA aspect. I think he fails to integrate this properly because of his focus on Y-DNA D. For some reason he totally ignored Y-DNA C instead. There was no knowledge of MNOPS yet in 2008, so this part is understandable.

The scenario is a complex puzzle but I believe it can be solved. In fact I believe I already have done that (with some help from some of my readers) but I have only explained it fragmentarily so far.

I will try to post something before the end of the year but it is a monumental task. I have the draft very clear but the many details to account for are scary.

And then the discussions in the comments section... :D

Maju said...

"A couple more things on Y-DNA C. I think that the data from rare haplogroups of C does convincingly make the case for it expanding via a Southern route rather than a Northern one".

Sahul specially clarifies much the matter for all haplogroups. This is true for Y-DNA C but also for K, and it is also true for mtDNA M, N and R.

I'm totally for the southern route, for these and other reasons (climate, for instance). The steppe route was only taken later, after 50 Ka ago at the earliest (Y-DNA N and Q, mtDNA CZ, D, G, A, X, maybe some U and the ubiquitous Y-DNA J...). There's no indication of that route playing any role in the early Eurasian colonization. Terry is just being extremely stubborn.

The delicate question is not so much the route but where each major haplogroup coalesced.

As for your link, it says:

"a general south-to-north and east-to-west cline of Y-STR diversity is observed with the highest diversity in Southeast Asia".

It couldn't be clearer.

"... argues for Y-DNA C dispersal around 40,000 years ago, which would be after Shi's Y-DNA D wave (ca. 50,000 years ago)"...

I do not pay any attention to MC (molecular clock) estimates anymore. Not only are they probably all wrong, but each author surely uses a slightly different method. I only read the patterns, without any dates on them.

"Why wouldn't C3 be a descendant of C5 then, rather than a sister haplogroup?"

That's the phylogeny: C3 cannot be a son (Y-DNA) of C5 because the defining mutations of C5 are not found in the genome of C3 individuals. C3 is not part of the C5 set but both are part of the C superset.

Each haplogroup is a set, which in turn is a subset ('son', 'daughter') of a larger haplogroup. You can only belong to C5 if you fit the requirements. C3 Y chromosomes do not fit there.

"The fit seems more to C* going all the way from South Asia to Siberia and mutating only upon arrival"...

I want to make a final precision here: C* is not C-root but a series of unclassified C descendants, say C7, C8, C9, C10, etc. Maybe they all belong to a single undiscovered haplogroup or maybe there are hundreds of tiny sublineages within C* but in any case it is not the same as C-root: C-root or C-zero disappeared long ago, as soon as all patrilineal descendants accumulated any other mutation in their Y-DNA. It is something that only existed in the distant past.

Beware of para-haplogroups: they are tricky.
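The point that haplogroups are nested sets, and that C* is a residual bin rather than the ancestral C root, can be illustrated with a toy classifier. The tree below is flattened to the subclades listed earlier in the thread, and the mutation labels (mut_C, mut_C1, ...) are hypothetical placeholders, not the real defining SNPs. A chromosome belongs to a haplogroup only if it carries that haplogroup's defining mutations plus those of every ancestor.

```python
# Toy Y-DNA tree: haplogroup -> (parent, defining mutations).
# Mutation names are hypothetical placeholders, not real SNP identifiers.
TREE = {
    "C":  (None, {"mut_C"}),
    "C1": ("C", {"mut_C1"}),
    "C2": ("C", {"mut_C2"}),
    "C3": ("C", {"mut_C3"}),
    "C4": ("C", {"mut_C4"}),
    "C5": ("C", {"mut_C5"}),
}

def required_mutations(hap):
    """All mutations a chromosome must carry to belong to `hap`:
    its own defining mutations plus those of every ancestor."""
    needed = set()
    while hap is not None:
        parent, muts = TREE[hap]
        needed |= muts
        hap = parent
    return needed

def belongs_to(sample, hap):
    """True if the sample carries every mutation that `hap` requires."""
    return required_mutations(hap) <= sample

def classify(sample):
    """Return the deepest named haplogroup the sample fits. A sample that
    fits C but none of its named subclades is reported as the paragroup C*."""
    matches = [h for h in TREE if belongs_to(sample, h)]
    if not matches:
        return None
    # Matches lie along one lineage, so more required mutations = deeper node.
    deepest = max(matches, key=lambda h: len(required_mutations(h)))
    has_children = any(parent == deepest for parent, _ in TREE.values())
    return deepest + "*" if has_children else deepest

c3_sample = {"mut_C", "mut_C3"}
unplaced = {"mut_C", "mut_unknown"}  # carries C's marker, no named subclade marker
print(belongs_to(c3_sample, "C"), belongs_to(c3_sample, "C5"))  # True False
print(classify(c3_sample), classify(unplaced))                   # C3 C*
```

The C3 chromosome is inside the C set but outside the C5 set, because it lacks C5's defining mutation. The unplaced chromosome lands in C* even though it carries its own later mutation (the placeholder mut_unknown), which is why C* is a bin for unclassified descendants and not the extinct C-root.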