Ethnicity is relevant for interpreting genetic data in the context of genetically informed medical treatment of conditions such as cancer. In this context, ethnicity is a meaningful enough concept that it can be determined with 99%+ precision from 1000 Genomes data.
However, this probably overstates the precision of the information, because the 1000 Genomes data involves people with very well established, distinct identities and has few instances where people with more similar ethnicities, or who are admixed, must be distinguished from each other. The experience of commercial genetic ancestry testing trained on the same data suggests that more realistic data sets would yield less precision unless the ethnic categories being identified are very coarse.
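The point about well separated versus similar groups can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it uses made-up two-dimensional coordinates rather than real genotypes, a simple nearest-centroid rule rather than the method used by EthSEQ, and a hypothetical function name (`simulate_accuracy`). It shows only that the same classifier looks far more precise when the groups are far apart than when they overlap.

```python
import random


def simulate_accuracy(separation, n_per_group=500, seed=0):
    """Return nearest-centroid accuracy for two synthetic groups.

    `separation` is the distance between the two group centers, measured
    in units of the within-group standard deviation. All data here are
    synthetic; nothing is drawn from 1000 Genomes or TCGA.
    """
    rng = random.Random(seed)
    centers = [(-separation / 2.0, 0.0), (separation / 2.0, 0.0)]

    def draw(center):
        # One synthetic individual: unit-variance Gaussian noise around the center.
        return (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))

    # "Reference panel": estimate one centroid per group from labeled samples.
    centroids = []
    for center in centers:
        points = [draw(center) for _ in range(n_per_group)]
        mean_x = sum(p[0] for p in points) / len(points)
        mean_y = sum(p[1] for p in points) / len(points)
        centroids.append((mean_x, mean_y))

    # "Test set": assign fresh samples to the closest centroid and score them.
    correct = 0
    total = 0
    for label, center in enumerate(centers):
        for _ in range(n_per_group):
            x, y = draw(center)
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            predicted = distances.index(min(distances))
            correct += int(predicted == label)
            total += 1
    return correct / total


if __name__ == "__main__":
    # Widely separated groups versus heavily overlapping groups.
    print("well separated:", simulate_accuracy(8.0))
    print("overlapping:   ", simulate_accuracy(1.0))
```

In this toy setting the widely separated case is classified almost perfectly, while the overlapping case is not, which is the sense in which a benchmark made up of very distinct populations may flatter a classifier.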
Whole exome sequencing (WES) is widely utilized both in translational cancer genomics studies and in the setting of precision medicine. Stratification of individual's ethnicity is fundamental for the correct interpretation of personal genomic variation impact. We implemented EthSEQ to provide reliable and rapid ethnicity annotation from whole exome sequencing individual's data and validated it on 1,000 Genome Project and TCGA data demonstrating high precision (>99%). EthSEQ can be integrated into any WES based processing pipeline and exploits multi-core capabilities. Source code, manual and other data is available at http://demichelislab.unitn.it/EthSEQ.
Alessandro Romanel, Tuo Zhang, Olivier Elemento, Francesca Demichelis, "EthSEQ: ethnicity annotation from whole exome sequencing data" (pre-print published November 10, 2016). doi: http://dx.doi.org/10.1101/085837