Genetics researchers who combined 770,000 genomes with large amounts of ancestry information from Ancestry.com have reproduced the broad outlines of the regional cultural zones described in Albion's Seed, which relied on social history data.
The results aren't surprising to someone sufficiently familiar with regional American culture, history and demographics, but they still look very impressive in this form.
From Nature Communications (Figure 3). The article is Han, E. et al. "Clustering of 770,000 genomes reveals post-colonial population structure of North America." 8 Nat. Commun. 14238 (2017). The Supplemental Figures referenced in the excerpts below can be found here.
This figure is discussed in these excerpts from the article:
Taken together with the IBD network clustering results, the visualizations of the genealogical data in North America (Fig. 3) highlight broad-scale demographic trends, as well as patterns specific to individual populations. . . . we divide the clusters into four broad categories, and present examples of each.
The first grouping, which we label as intact immigrant clusters, are likely driven by population structure present before immigration that may have been maintained post immigration. . . . We label the second grouping as continental admixed groups, the majority of which represent Hispanic/Latino populations. . . . The vast majority of our samples are contained in the third set of clusters, which we label as assimilated immigrant groups. . . . Finally, the fourth set of clusters we label as post-migration isolated groups; these groups have historically resided in small or geographically isolated communities within the United States. . . .
The first grouping, intact immigrant clusters, can be attributed to population structure existing prior to immigration to the United States. Despite subsequent admixture following immigration, we found clusters corresponding to Finnish, Scandinavian, Jewish and Irish ancestries—all groups who immigrated to the United States in large numbers within the past 150–200 years—as well as African Americans and individuals with Polynesian ancestry (labelled Hawaiian). . . .
The majority of these groups also show evident geographic localization within the United States (Fig. 3), corresponding to known migration patterns; for example, the Scandinavian and Finnish clusters are concentrated in the Midwest, while the African American cluster closely coincides with regions of high self-reported African ancestry. Reinforcing the connection between IBD clustering and global population structure, we observe that the degree of disconnectedness in the IBD network often correlates strongly with amount of admixture (Supplementary Fig. 22; Jewish r²=0.97, Finnish r²=0.67). . . .
We highlight two additional immigrant clusters with clear geographic concentrations both within and outside the United States: Acadians and French Canadians. During the mid 18th century, Acadian residents (modern-day Atlantic Canada) were expelled by the British and took refuge in various colonies, eventually including Louisiana, then under Spanish control. On the other hand, in the late 19th century, large numbers of French Canadians left rural Quebec in search of economic opportunities in New England and the northern United States. We identified two clusters in the IBD network likely corresponding to these distinct descendant groups. . . . As a final point, the low genetic differentiation (FST=0.001) between these groups, and their nearly indistinguishable admixture proportions, illustrates that standard methods may have difficulty separating them as we do here.
Next, we identified continentally admixed clusters, including Colombians and groups in Central America and the Caribbean, labels which are primarily inferred from the ancestral birth locations of cluster members. . . . It should be noted that, in some cases, the clusters we identify using IBD could be more reflective of US immigration patterns than inherent structure within source locations. For example, two of the Mexican clusters we identified are annotated with birth locations most concentrated in Jalisco and Monterrey, the predominant traditional sources of emigration to the United States. The over-representation of West Mexican birth locations in southwestern United States and Northeast Mexican birth locations in Texas, particularly South Texas in recent generations (Supplementary Figs 20,23), confirms known patterns of migration from eastern versus western Mexico to the United States . . . .
The five largest clusters, [(1) Northeast and Utah, (2) Pennsylvania, (3) Lower Midwest, (4) Upland South, (5) Lowland South] which we describe as assimilated immigrant clusters, account for a large portion (60%) of the IBD network and exhibit a markedly different profile. Lacking distinctive affiliations to non-US populations, they show almost no differentiation in allele frequencies (FST at most 0.001) and high levels of IBD to non-cluster members, suggestive of high gene flow between these clusters. Moreover, few members of these clusters could be assigned to a stable subset, indicating that this clustering is largely driven by continuous variation in IBD. Genealogical data reveal a north-to-south trend (Fig. 5), most consistently east of the Mississippi River (Fig. 3). These findings imply greater east-west than north-south gene flow, which is broadly consistent with recent westward expansion of European settlers in the United States, and possibly somewhat limited north-south migration due to cultural differences. . . .
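To give a sense of how small an FST of 0.001 is, here is a minimal sketch of the textbook heterozygosity-based definition, FST = (H_T - H_S) / H_T, for a single two-allele variant in two equally weighted populations. The excerpt does not say which estimator the authors used, so the function name `fst_two_pops` and the allele frequencies below are illustrative assumptions, not values from the paper.

```python
def fst_two_pops(p1, p2):
    """Wright's FST at one biallelic locus for two equally weighted populations.

    p1, p2: frequency of the same allele in each population (between 0 and 1).
    This is the textbook definition, not necessarily the paper's estimator.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if both populations were pooled into one.
    h_total = 2 * p_bar * (1 - p_bar)
    # Average expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        # Allele is fixed or absent in both populations: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, chosen only for illustration.
nearly_identical = fst_two_pops(0.50, 0.52)
clearly_different = fst_two_pops(0.50, 0.90)
print(f"{nearly_identical:.4f} {clearly_different:.4f}")
```

Under this definition, two populations whose allele frequencies differ by only two percentage points near 50% come out at an FST of roughly 0.0004, which is the order of magnitude the article reports for the five largest clusters.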
Finally, we identified several clusters corresponding to post-migration isolated groups—historical groups who, despite possibly maintaining high levels of diversity and gene flow, likely experienced some geographic or cultural isolation during or following migration to the United States. One such cluster represents the Amish, a distinct ethno-religious minority that first arrived to the United States from Europe in the 18th century; the genealogical data associated with the Amish cluster pinpoint individual counties in Midwestern states and Pennsylvania with present-day Amish communities (Fig. 3; Supplementary Fig. 20). The clustering of IBD in Utah is most likely attributed to population growth of descendants of Mormons, who settled in Utah in the mid-1800s (Supplementary Figs 20,24). In addition, we identified a cluster concentrated near the Cumberland Mountain range that is suggestive of residents of Appalachia, people who experienced delayed economic development and regional isolation up until the 20th century. . . .
An unresolved issue common to . . . hierarchical clustering . . . is that stopping criteria are not well established: when to stop subdividing clusters . . . . additional clusters informative of population structure emerged when we proceeded to the third level of the hierarchical clustering (Supplementary Figs 25–27). For example, additional clustering discriminated Italians, Scottish, Norwegians and Eastern Europeans, and yielded fine-scale geographic structure in Ireland (Supplementary Fig. 26), the southern United States (Supplementary Fig. 27), and on the Island of Puerto Rico (Supplementary Fig. 21). . . .
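The stopping problem described in that excerpt can be illustrated with a toy sketch. It is not the authors' method: as a stand-in for their network clustering, it simply splits a weighted graph into connected components after dropping weak links, using a stricter cutoff at each level. The names `components` and `leaf_clusters`, the people `a` through `f`, and the link weights are all hypothetical.

```python
from collections import defaultdict


def components(nodes, edges, threshold):
    """Connected components of `nodes`, keeping only edges with weight >= threshold."""
    adj = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= threshold and a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    seen, found = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        found.append(comp)
    return found


def leaf_clusters(nodes, edges, thresholds, min_size=2, level=0):
    """Recursively subdivide; one threshold per level of the hierarchy."""
    nodes = set(nodes)
    # Stopping rules: no levels left, or the cluster is too small to split.
    if level >= len(thresholds) or len(nodes) < min_size:
        return [sorted(nodes)]
    parts = components(nodes, edges, thresholds[level])
    if len(parts) == 1:
        # Assumed rule for this sketch: stop when a level finds no split.
        return [sorted(nodes)]
    leaves = []
    for part in parts:
        leaves.extend(leaf_clusters(part, edges, thresholds, min_size, level + 1))
    return leaves


# Hypothetical people and shared-segment weights, for illustration only.
PEOPLE = ["a", "b", "c", "d", "e", "f"]
EDGES = {("a", "b"): 30, ("b", "c"): 12, ("c", "d"): 3,
         ("d", "e"): 25, ("e", "f"): 11}

one_level = leaf_clusters(PEOPLE, EDGES, thresholds=[5])
two_levels = leaf_clusters(PEOPLE, EDGES, thresholds=[5, 15])
print(one_level)
print(two_levels)
```

Stopping after one level yields two clusters, while allowing a second level yields four. Nothing in the procedure itself says which answer is "right", which is the difficulty the authors flag.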
[T]he genetic separation of other groups of historical importance—such as regions of Mexico corresponding to different sources of US immigration, and the New Mexican cluster corresponding to the Nuevomexicanos, European colonial settlers from New Spain—is a major contribution of this work. . . . we do not identify known structure in the United States among some present-day immigrant and other groups that are poorly represented in our sample, such as Southeast Asians and Chinese. . . .
[W]e find clear examples in our data where disease-risk variants are present at higher frequencies in identified clusters; these include a risk allele for prostate cancer that has a frequency of 5.6% in the African American cluster but is very rare (0.1%) outside the cluster, and a protective allele for squamous cell lung carcinoma that is 10 times more common in the Finnish cluster.
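The prostate cancer figures in that last excerpt can be put in more concrete terms with a little arithmetic. The sketch below uses the two allele frequencies quoted above; the conversion from allele frequency to the share of people carrying at least one copy assumes Hardy-Weinberg proportions, which is my simplifying assumption rather than anything stated in the article, and the function names are illustrative.

```python
def fold_enrichment(freq_in, freq_out):
    """How many times more common the allele is inside the cluster."""
    return freq_in / freq_out


def carrier_fraction(allele_freq):
    """Share of people with at least one copy, ASSUMING Hardy-Weinberg proportions."""
    return 1 - (1 - allele_freq) ** 2


# Allele frequencies quoted in the excerpt: 5.6% inside the cluster, 0.1% outside.
inside, outside = 0.056, 0.001

print(round(fold_enrichment(inside, outside)))
print(f"{carrier_fraction(inside):.3f} {carrier_fraction(outside):.3f}")
```

On these assumptions the allele is about 56 times more common inside the cluster, and roughly 11% of cluster members would carry at least one copy, compared with about 0.2% of everyone else.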