General considerations

The population groups are based on a previous study of global variability (Rosenberg et al. 2002), which used the program STRUCTURE to arrange populations into groups based on patterns of variability and found a close match between the geographic distribution of populations and their genetic clustering. Using these groups allows summaries to be pre-processed for faster results. However, populations can be grouped in any other way in the advanced search section.

Only unrelated individuals are used to build the statistical indexes provided, so the number of samples and genotypes stored in the data mart is slightly lower than the total number of samples and genotypes present in each database.

The world map

From the clickable map on the front page you can select any population (individual dots on the map) or group of populations (coloured according to the listed geographic groupings) to activate the quick search function. The same information can be obtained from an advanced search.

African American samples were collected throughout the US but are plotted on the map at Washington DC (collected by NIST, Gaithersburg, MD; see contributors).

The advanced search

Multiple population selection is permitted, up to a maximum of 5 populations or groupings. Selecting any combination of populations builds a custom query, and the statistical summaries are then generated from the merged genotype data.

The results page

The frequencies tab contains each frequency set, the corresponding bar charts, the pie charts for the complete dataset group, and summary pie charts arranged by major population group. If selected, the corresponding global information from the other datasets that hold data for the SNP is also provided.

The statistics tab contains each SNP's information in a population-per-row arrangement, showing sample size (N), all the alleles found in the population set queried (alleles), minor allele (MA), minor allele frequency (MAF), observed and expected heterozygosities (HOBS and HEXP), and three further statistical indexes: local inbreeding (FS) for single populations, genetic differentiation (FST) for groups of populations, and the informativeness for group assignment (In). As a visual aid, FST values are written in yellow when they begin to indicate differentiation (>0.05), in orange when differentiation is substantial (>0.15) and in red when the populations are highly differentiated (>0.25).

In addition, descriptive SNP information is extracted from dbSNP build 132: chromosome, chromosome position, validation status, gene, reference allele on the same strand as the SNP (obtained from the current genome reference, hg19) and ancestral allele (obtained from the chimpanzee genome).
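As an illustration of how the per-population columns relate to each other, the following Python sketch derives them from a list of genotypes for one SNP. It is not the browser's own code: the function names, the two-letter genotype encoding and the formula FS = 1 - HOBS/HEXP are assumptions made for this example. Only the three colour thresholds are taken from the description above.

```python
from collections import Counter


def summarize_snp(genotypes):
    """Summarise one SNP in one population.

    `genotypes` is a list of two-letter strings such as "AG"
    (an encoding assumed for this sketch).
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())  # 2N allele copies
    freqs = {allele: c / total for allele, c in counts.items()}

    if len(freqs) > 1:
        # Least frequent allele; ties are broken alphabetically.
        minor = min(sorted(freqs), key=lambda a: freqs[a])
        maf = freqs[minor]
    else:
        # Monomorphic in this sample: no minor allele observed.
        minor, maf = None, 0.0

    h_obs = sum(1 for g in genotypes if g[0] != g[1]) / n
    h_exp = 1.0 - sum(p * p for p in freqs.values())
    # Assumed definition of local inbreeding for this sketch.
    f_s = 1.0 - h_obs / h_exp if h_exp > 0 else 0.0

    return {"N": n, "alleles": sorted(freqs), "MA": minor, "MAF": maf,
            "HOBS": h_obs, "HEXP": h_exp, "FS": f_s}


def fst_colour(fst):
    """Map an FST value to the highlight colour described in the text."""
    if fst > 0.25:
        return "red"
    if fst > 0.15:
        return "orange"
    if fst > 0.05:
        return "yellow"
    return None  # no highlight


summary = summarize_snp(["AA", "AG", "AG", "GG", "AA", "AA", "AG", "AA"])
```

Here `summary["MAF"]` is the frequency of G (5 of 16 allele copies) and `summary["HOBS"]` is the share of heterozygous genotypes (3 of 8).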

When available, the downloads tab allows the user to retrieve all the statistical information in a single CSV-formatted file with fields separated by ";". Genotypes from alternative datasets are also available for download, using the population filters defined by the user's queries.
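Because the file uses ";" rather than a comma as separator, most CSV readers need to be told the delimiter explicitly. A minimal Python sketch follows; the column names and the sample row are invented for illustration and may not match the headers of the real download.

```python
import csv
import io


def read_stats(text):
    """Parse semicolon-separated text into a list of dicts keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


# Hypothetical content; the real file's columns may differ.
sample = "SNP;POPULATION;N;MAF\nrs1357617;Example population;100;0.25\n"
rows = read_stats(sample)
```

All values are read as strings, so numeric columns need converting, for example `float(rows[0]["MAF"])`.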

Symmetrical bases

There are 10 SNPs in the 52plex identification set that show symmetrical substitutions (A/T or C/G). Reporting and interpreting results for these SNPs requires particular care to ensure that the alleles detected on any one platform are consistent with those reported in online SNP databases. The following 6 SNPs produce the inverse base in the 52plex SNPforID SNaPshot assay compared to the reference allele (e.g. an A allele detected in SNaPshot corresponds to the T allele reported in HapMap):

rs1357617 (A03), rs2046361 (A04), rs1015250 (A09),
rs1979255 (A34), rs907100 (A38) and rs1335873 (A52).

For these SNPs, the SNPforID browser reports frequencies and pie charts for the alleles detected by the SNaPshot assay. In each case the pie charts are adjusted to match those of HapMap, with the allele segments labelled to indicate the inversion.
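The relationship between a SNaPshot call and the reference-strand allele for these six SNPs is a simple base complement. The Python sketch below shows that lookup for cross-checking labels against HapMap only. The function name is hypothetical, and the sketch is not meant for converting calls before they are used with the browser, which works with the SNaPshot designations.

```python
# Watson-Crick complements.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# The six 52plex SNPs listed above as giving the inverse base in SNaPshot.
INVERTED_52PLEX = {
    "rs1357617", "rs2046361", "rs1015250",
    "rs1979255", "rs907100", "rs1335873",
}


def reference_strand_label(rsid, snapshot_base):
    """Return the reference-strand base matching a SNaPshot call.

    For SNPs outside the inverted set the call is returned unchanged.
    """
    if rsid in INVERTED_52PLEX:
        return COMPLEMENT[snapshot_base]
    return snapshot_base
```

For example, `reference_strand_label("rs1357617", "A")` gives the T label used in HapMap for the same allele.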

In addition, the 34plex ancestry set contains 4 symmetrical SNPs, all of which produce the inverse base in the SNaPshot assay compared to the reference allele reported in HapMap:

rs773658 (P10), rs10141763 (P11), rs1335873 (A52) and rs16891982 (P25a)

Both the SNPforID browser and the partnering SNP classification system run by the Mathematics Dept., University of Santiago de Compostela (http://mathgene.usc.es/snipper/) use the SNaPshot allele designations, so any attempt to convert base calls to the reference alleles will lead to interpretation errors. The following table shows, for each of these SNPs, how the same pattern of variability is labelled by the 34plex SNaPshot base call and by the reference allele designated in HapMap:

SNP          Pattern of variability:          Pattern of variability:
             SNaPshot base call               reference allele
rs773658     G = AFR specific allele          C = AFR specific allele
rs10141763   A = EUR near-fixed allele        T = EUR near-fixed allele
rs1335873    T = AFR near-fixed allele        A = AFR near-fixed allele
rs16891982   C = EUR fixed allele             G = EUR fixed allele

The statistical references

The algorithms used for calculating the expected heterozygosities and the F-statistics have been taken from "Principles of Population Genetics" (Hartl 1997, Sinauer Associates) and from "Population Genetics, a concise guide" (Gillespie 1998, Johns Hopkins University Press), following Dr. David McDonald's advice from his worked example of calculating F-statistics from genotypic data.
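For readers unfamiliar with these quantities, the sketch below shows one textbook way to obtain expected heterozygosity and FST from allele frequencies, using FST = (HT - HS) / HT. It assumes equal weighting of populations and no sample-size correction. The browser's exact implementation is not documented here and may differ, and the function names are chosen for this example.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p^2), from a list of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """FST = (HT - HS) / HT for one SNP.

    `pop_freqs` holds one allele-frequency list per population, with the
    alleles in the same order in every list. Populations are weighted
    equally (an assumption of this sketch).
    """
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # HS: mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    # HT: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(f[j] for f in pop_freqs) / k for j in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

With two populations at allele frequencies (0.8, 0.2) and (0.2, 0.8), HS is 0.32 and HT is 0.5, giving an FST of 0.36.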

The In values are computed using equation (4) from "Informativeness of Genetic Markers for Inference of Ancestry" (Rosenberg et al. 2003, Am. J. Hum. Genet. 73:1402-1422).
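As a rough guide to what this index measures, the Python sketch below implements the informativeness for assignment in the form in which that equation is usually written: for K populations and allele frequencies p_ij, In is the sum over alleles j of -p_j log p_j + sum_i (p_ij / K) log p_ij, where p_j is the mean frequency of allele j across populations. Natural logarithms and the convention 0 log 0 = 0 are assumed, and the function name is chosen for this example.

```python
import math


def informativeness(pop_freqs):
    """Informativeness for assignment (In) of one marker.

    `pop_freqs` holds one allele-frequency list per population, with the
    alleles in the same order in every list.
    """
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    total = 0.0
    for j in range(n_alleles):
        # Mean frequency of allele j across the K populations.
        p_j = sum(f[j] for f in pop_freqs) / k
        if p_j > 0:
            total -= p_j * math.log(p_j)
        for f in pop_freqs:
            if f[j] > 0:  # 0 * log(0) is treated as 0
                total += (f[j] / k) * math.log(f[j])
    return total
```

A SNP fixed for different alleles in two populations reaches the maximum for K = 2, log 2, while a SNP with identical frequencies in every population scores 0.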