General considerations
The population groups are based on a previous study of global variability that found a close match between the geographic distribution of populations and the genetic clustering inferred with STRUCTURE, which arranges populations into groups based on patterns of variability (Rosenberg et al. 2002). Using these predefined groups allows them to be pre-processed for faster results; however, any set of populations can be grouped as desired in the advanced search section. Only unrelated individuals are used to build the statistical indexes provided, so the numbers of samples and genotypes stored in the data mart are slightly lower than the total numbers present in each database. The dataset presented in this browser includes genotypes for the HGDP-CEPH diversity panel subset H952 defined by Rosenberg et al. (2006). As described by Pereira et al. (2012), three individuals were not included in the study (HGDP01219, HGDP01339 and HGDP01344), and individual HGDP01042 was used in place of HGDP01041.
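The exclusion and substitution steps described above can be sketched as a small filter over a list of sample IDs. This is a hypothetical illustration, not the browser's actual pipeline: the function `prepare_samples` and the list-of-IDs input are assumptions, and only the sample IDs themselves come from the text. The filtering of related individuals is not modelled here.

```python
from typing import List

# Sample IDs taken from the text; the data layout is illustrative only.
EXCLUDED = {"HGDP01219", "HGDP01339", "HGDP01344"}
SUBSTITUTIONS = {"HGDP01041": "HGDP01042"}  # HGDP01042 is used in place of HGDP01041


def prepare_samples(h952_ids: List[str]) -> List[str]:
    """Hypothetical helper: apply the substitution, then drop excluded individuals."""
    prepared: List[str] = []
    for sample_id in h952_ids:
        sample_id = SUBSTITUTIONS.get(sample_id, sample_id)
        if sample_id not in EXCLUDED:
            prepared.append(sample_id)
    return prepared
```

For example, `prepare_samples(["HGDP00001", "HGDP01041", "HGDP01219"])` keeps the first ID, swaps in HGDP01042 for the second, and drops the third.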
The world map
From the clickable map on the front page you can select any population (individual dots on the map) or group of populations (coloured according to the listed geographic groupings) to activate the quick search function. The same information can also be obtained through an advanced search.
The advanced search
Multiple populations can be selected, up to a maximum of five populations or groupings. Selecting any combination of populations builds a custom query, and the statistical summaries are generated accordingly from the merged genotype data of the selected populations.
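A minimal sketch of how a custom query might pool genotypes, assuming genotypes are stored per population as lists of allele pairs and each grouping maps to its member populations. The names `merge_selection`, `genotypes_by_population` and `groups` are hypothetical; only the five-selection limit comes from the text. Counting a population only once when it is selected both individually and through a grouping is a design choice of this sketch, not documented browser behaviour.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # one individual's two alleles at an INDEL
MAX_SELECTIONS = 5  # limit on populations or groupings per query


def merge_selection(
    genotypes_by_population: Dict[str, List[Genotype]],
    groups: Dict[str, List[str]],
    selection: List[str],
) -> List[Genotype]:
    """Pool the genotypes of every population covered by the selection."""
    if not 1 <= len(selection) <= MAX_SELECTIONS:
        raise ValueError(f"select between 1 and {MAX_SELECTIONS} populations or groupings")
    populations: List[str] = []
    for name in selection:
        # A grouping expands to its member populations; a population stands for itself.
        for population in groups.get(name, [name]):
            if population not in populations:  # count each population only once
                populations.append(population)
    merged: List[Genotype] = []
    for population in populations:
        merged.extend(genotypes_by_population[population])
    return merged
```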
The results page
The frequencies tab contains each frequency set together with its bar-chart visualization, bar charts for the groups of the complete dataset, and summary bar charts arranged by major population group. The statistics tab presents each INDEL's information with one population per row, showing the sample size (N), all alleles found in the queried population set (alleles), the minor allele (MA), the minor allele frequency (MAF), the observed and expected heterozygosities (HOBS and HEXP), and several relevant statistical indexes: local inbreeding (FS) for single populations, genetic differentiation (FST) for groups of populations, and the informativeness for group assignment (In). As a visual aid, FST values are shown in yellow when they begin to indicate moderate differentiation (>0.05), in orange when differentiation is great (>0.15), and in red when it is very great (>0.25). When available, the downloads tab lets the user retrieve all the statistical information in a single CSV file with fields separated by semicolons (";"). Genotypes from alternative datasets can also be downloaded, filtered by the populations defined in the user's queries.
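The per-population columns of the statistics tab, the FST colour thresholds and the semicolon-separated download can be sketched as follows. This assumes a biallelic INDEL whose two alleles are known in advance, expected heterozygosity computed as 1 - Σp² without small-sample correction, and FS computed as 1 - HOBS/HEXP. All function and field names are hypothetical, and the browser's own implementation may differ.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Optional, Tuple

Genotype = Tuple[str, str]  # one individual's two alleles at an INDEL
FIELDS = ["population", "N", "alleles", "MA", "MAF", "HOBS", "HEXP", "FS"]


def summarize(population: str, genotypes: List[Genotype],
              locus_alleles: Tuple[str, str]) -> Dict[str, object]:
    """Build one statistics-tab row for a population (assumes at least one genotype)."""
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {a: counts[a] / (2 * n) for a in locus_alleles}  # absent alleles get 0.0
    minor = min(locus_alleles, key=lambda a: freqs[a])  # ties -> first listed allele
    hobs = sum(1 for a, b in genotypes if a != b) / n   # share of heterozygotes
    hexp = 1.0 - sum(p * p for p in freqs.values())      # 1 - sum of squared frequencies
    fs = (hexp - hobs) / hexp if hexp > 0 else None      # undefined if monomorphic
    return {
        "population": population,
        "N": n,
        "alleles": "/".join(a for a in locus_alleles if counts[a] > 0),
        "MA": minor,
        "MAF": freqs[minor],
        "HOBS": hobs,
        "HEXP": hexp,
        "FS": fs,
    }


def fst_colour(fst: float) -> Optional[str]:
    """Colour code used as a visual aid for FST values."""
    if fst > 0.25:
        return "red"
    if fst > 0.15:
        return "orange"
    if fst > 0.05:
        return "yellow"
    return None


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize rows as one semicolon-separated CSV string (None becomes empty)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter=";")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```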
The statistical references
The algorithms used to calculate the expected heterozygosities and the F-statistics were taken from "Principles of Population Genetics" (Hartl 1997, Sinauer Associates) and "Population Genetics: A Concise Guide" (Gillespie 1998, Johns Hopkins University Press), following Dr. David McDonald's worked example of calculating F-statistics from genotypic data. The In values are computed using equation (4) from "Informativeness of Genetic Markers for Inference of Ancestry" (Rosenberg et al. 2003, Am. J. Hum. Genet. 73:1402-1422).
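The heterozygosity-based F-statistics and the In index can be sketched for a single locus as below. HI, HS and HT are the observed, within-population expected and total expected heterozygosities, with FIS = (HS - HI)/HS, FST = (HT - HS)/HT and FIT = (HT - HI)/HT. Weighting populations by sample size is an assumption of this sketch and may not match the worked example exactly. The `informativeness` function is our reading of equation (4), In = Σ_j [-p̄_j log p̄_j + Σ_i (p_ij/K) log p_ij] with p̄_j the unweighted mean frequency of allele j over K populations, using natural logarithms (an assumption) and the convention 0 log 0 = 0. Function names are hypothetical.

```python
import math
from typing import Dict, List

Freqs = Dict[str, float]  # allele -> frequency within one population


def f_statistics(sizes: List[int], hobs: List[float], freqs: List[Freqs]) -> Dict[str, float]:
    """Heterozygosity-based FIS, FST and FIT for one locus across populations."""
    total = sum(sizes)
    alleles = {a for f in freqs for a in f}
    # HI: observed heterozygosity, averaged with sample-size weights (assumption)
    h_i = sum(n * h for n, h in zip(sizes, hobs)) / total
    # HS: expected heterozygosity within each population, same weighting
    h_s = sum(n * (1.0 - sum(p * p for p in f.values())) for n, f in zip(sizes, freqs)) / total
    # HT: expected heterozygosity from the pooled (weighted mean) allele frequencies
    p_bar = [sum(n * f.get(a, 0.0) for n, f in zip(sizes, freqs)) / total for a in alleles]
    h_t = 1.0 - sum(p * p for p in p_bar)
    nan = float("nan")
    return {
        "FIS": (h_s - h_i) / h_s if h_s > 0 else nan,
        "FST": (h_t - h_s) / h_t if h_t > 0 else nan,
        "FIT": (h_t - h_i) / h_t if h_t > 0 else nan,
    }


def _p_log_p(p: float) -> float:
    return p * math.log(p) if p > 0 else 0.0  # convention: 0 log 0 = 0


def informativeness(freqs: List[Freqs]) -> float:
    """In over K equally weighted populations, natural logarithms."""
    k = len(freqs)
    alleles = {a for f in freqs for a in f}
    total = 0.0
    for allele in alleles:
        p = [f.get(allele, 0.0) for f in freqs]
        p_bar = sum(p) / k
        total += -_p_log_p(p_bar) + sum(_p_log_p(x) for x in p) / k
    return total
```

As a sanity check, two populations fixed for different alleles give FST = 1 and In = log 2, the maximum log K for K = 2, while two populations with identical frequencies give 0 for both.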