The **population groups** are based on a previous study
of global variability that found a close match between the geographic distribution
of populations and the genetic clustering obtained with *structure*, which was used to arrange
populations into groups based on patterns of variability (Rosenberg et al. 2002).
Defining these groups in advance allows them to be pre-processed for faster results; however,
any set of populations can be grouped as desired in the advanced search section.

Only **unrelated individuals** are used to
build the statistical indexes provided, so the numbers
of samples and genotypes stored in the data mart are slightly lower than
the total numbers of samples and genotypes in each database.

The dataset presented in this browser includes genotypes for the HGDP-CEPH diversity panel subset H952 defined
by Rosenberg et al. (2006). As described by Pereira et al. (2012), three individuals were not included in the study
(HGDP01219, HGDP01339 and HGDP01344), and individual HGDP01042 was used in place of HGDP01041.

From the clickable map on the front page you can select any population
(individual dots on the map) or group of populations (coloured according
to the listed geographic groupings) to activate the quick search function.
The same information can also be obtained from an advanced search.

Multiple populations can be selected, up to a maximum of 5 populations or groupings. Selecting any combination of populations builds
a custom query, and the statistical summaries are generated
from the merged genotype data accordingly.
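The pooling step can be sketched as follows. This is a minimal illustration, not the browser's implementation. The genotype representation (one `(allele, allele)` tuple per individual), the dictionaries passed in and the function name `merge_selection` are all hypothetical. Only the limit of 5 selections comes from the text. In this sketch a grouping expands to its member populations, and a population that is selected twice is counted only once.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one diploid INDEL genotype per individual.
Genotype = Tuple[str, str]

MAX_SELECTION = 5  # up to 5 populations or groupings per query


def merge_selection(selection: List[str],
                    genotypes_by_pop: Dict[str, List[Genotype]],
                    groups: Dict[str, List[str]]) -> List[Genotype]:
    """Pool the genotypes of the selected populations and/or groupings."""
    if not 1 <= len(selection) <= MAX_SELECTION:
        raise ValueError(f"select between 1 and {MAX_SELECTION} populations or groupings")
    pops: List[str] = []
    for name in selection:
        # A grouping expands to its member populations; a population maps to itself.
        for pop in groups.get(name, [name]):
            if pop not in pops:  # never count the same population twice
                pops.append(pop)
    merged: List[Genotype] = []
    for pop in pops:
        merged.extend(genotypes_by_pop[pop])
    return merged
```

For example, with toy data `{"PopA": [("I", "D"), ("I", "I")], "PopB": [("D", "D")], "PopC": [("I", "D")]}` and a hypothetical grouping `{"GroupAB": ["PopA", "PopB"]}`, selecting `["GroupAB", "PopC"]` pools all four genotypes into one list. The statistics are then computed on that list.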

The **frequencies tab** contains each set of frequencies with its bar-chart
representation, bar charts for the groups of the complete dataset, and summary
bar charts arranged by major population group.

The **statistics tab** contains the information for each INDEL,
arranged with one population per row: sample size (N), all the alleles found
in the queried population set (alleles), minor allele (MA),
minor allele frequency (MAF), observed and expected heterozygosities
(H_{OBS} and H_{EXP}), and the following statistical indexes:
local inbreeding (F_{S}) for single populations, genetic differentiation
(F_{ST}) for groups of populations, and the informativeness for group
assignment (I_{n}). As a visual aid, F_{ST} values are shown
in yellow when differentiation starts to be notable (>0.05), in orange when it is marked
(>0.15) and in red when it is very high (>0.25).
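The per-population columns and the F_{ST} colour coding can be sketched as follows. This is an illustrative reading of the text, not the browser's code. The function names and the genotype tuples are hypothetical. The sketch also makes several assumptions of its own. H_{EXP} is taken as 1 − Σp² with no small-sample correction. F_{S} is taken as 1 − H_{OBS}/H_{EXP}. Ties for the minor allele are broken alphabetically. A monomorphic locus gets no minor allele and a MAF of 0.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: one diploid INDEL genotype per individual.
Genotype = Tuple[str, str]


def population_stats(genotypes: List[Genotype]) -> Dict[str, object]:
    """Per-population summary for one INDEL (assumes at least one individual)."""
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = {a: c / (2 * n) for a, c in counts.items()}
    alleles = sorted(freqs)
    ma: Optional[str] = None
    maf = 0.0  # monomorphic: no minor allele
    if len(alleles) > 1:
        # Minor allele = least frequent allele; ties broken alphabetically (assumption).
        ma = min(alleles, key=lambda a: (freqs[a], a))
        maf = freqs[ma]
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under Hardy-Weinberg, without small-sample correction.
    h_exp = 1.0 - sum(p * p for p in freqs.values())
    # Inbreeding coefficient; undefined when H_EXP is zero.
    f_s = 1.0 - h_obs / h_exp if h_exp > 0 else None
    return {"N": n, "alleles": alleles, "MA": ma, "MAF": maf,
            "H_OBS": h_obs, "H_EXP": h_exp, "F_S": f_s}


def fst_colour(fst: float) -> Optional[str]:
    """Highlight colour for an F_ST value, using the thresholds given above."""
    if fst > 0.25:
        return "red"
    if fst > 0.15:
        return "orange"
    if fst > 0.05:
        return "yellow"
    return None  # 0.05 or below: no highlight
```

Take the toy genotypes `[("I", "I"), ("I", "I"), ("D", "D"), ("I", "D")]`. For these, the sketch reports MA = `"D"`, MAF = 0.375, H_{OBS} = 0.25, H_{EXP} = 0.46875 and F_{S} ≈ 0.467.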

When available, the **downloads tab** allows the user to retrieve
all the statistical information in a single CSV file, with fields separated by semicolons (";").
Genotypes from alternative datasets can also be downloaded, filtered by
the populations defined in the user's query.
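A semicolon-separated export like this can be produced with Python's standard `csv` module. The sketch below is only an illustration. The column layout in `FIELDS` and the function name `stats_to_csv` are hypothetical, and the browser's actual file may order or name its columns differently.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout; the browser's actual export may differ.
FIELDS = ["population", "N", "alleles", "MA", "MAF", "H_OBS", "H_EXP"]


def stats_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialise per-population statistics as semicolon-separated CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter=";",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # values containing ";" are quoted automatically
    return buf.getvalue()
```

With the default minimal quoting, a value that itself contains a semicolon is wrapped in double quotes. The file therefore still parses correctly when read back with `delimiter=";"`.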

The algorithms used to calculate the expected **heterozygosities**
and the *F-statistics* are taken from
"Principles of Population Genetics" (Hartl 1997, Sinauer Associates) and
"Population Genetics: A Concise Guide" (Gillespie 1998, Johns Hopkins University Press),
following Dr. David McDonald's
worked example of calculating *F-statistics* from genotypic data.
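The textbook heterozygosity-based F-statistics can be sketched as follows. This is a sketch of the standard approach, not necessarily the browser's exact procedure. H_I is the mean observed heterozygosity. H_S is the mean expected heterozygosity within subpopulations. H_T is the expected heterozygosity computed from allele frequencies averaged over subpopulations. From these, F_IS = (H_S − H_I)/H_S, F_ST = (H_T − H_S)/H_T and F_IT = (H_T − H_I)/H_T. Using unweighted means across subpopulations is an assumption; a sample-size-weighted variant is also common. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: one diploid genotype per individual.
Genotype = Tuple[str, str]


def allele_freqs(genotypes: List[Genotype]) -> Dict[str, float]:
    counts = Counter(a for g in genotypes for a in g)
    return {a: c / (2 * len(genotypes)) for a, c in counts.items()}


def ratio(num: float, den: float) -> float:
    # An F-statistic is undefined when its reference heterozygosity is zero.
    return num / den if den > 0 else float("nan")


def f_statistics(pops: List[List[Genotype]]) -> Dict[str, float]:
    """F_IS, F_ST and F_IT for one locus over k subpopulations (unweighted means)."""
    k = len(pops)
    freqs = [allele_freqs(g) for g in pops]
    alleles = set().union(*freqs)
    # H_I: mean observed heterozygosity across subpopulations
    h_i = sum(sum(a != b for a, b in g) / len(g) for g in pops) / k
    # H_S: mean expected heterozygosity within subpopulations
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in freqs) / k
    # H_T: expected heterozygosity from the averaged allele frequencies
    p_bar = {a: sum(f.get(a, 0.0) for f in freqs) / k for a in alleles}
    h_t = 1.0 - sum(p * p for p in p_bar.values())
    return {"F_IS": ratio(h_s - h_i, h_s),
            "F_ST": ratio(h_t - h_s, h_t),
            "F_IT": ratio(h_t - h_i, h_t)}
```

With these definitions, (1 − F_IT) = (1 − F_IS)(1 − F_ST) whenever all three are defined. Two subpopulations fixed for different alleles give F_ST = 1, while identical subpopulations give F_ST = 0.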

The I_{n} values are computed using equation (4) from
"Informativeness of Genetic Markers for Inference of Ancestry"
(Rosenberg et al. 2003, Am. J. Hum. Genet. 73:1402-1422).
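As we read that equation, consider a locus with N alleles and K populations, where p_ij is the frequency of allele j in population i and p̄_j is its mean over the K populations. Then I_n = Σ_j [ −p̄_j log p̄_j + Σ_i (p_ij / K) log p_ij ], with 0 log 0 taken as 0. The sketch below implements that formula. It assumes natural logarithms; the browser may use a different base. The function name `informativeness` is hypothetical.

```python
import math
from typing import List


def informativeness(freqs: List[List[float]]) -> float:
    """I_n for one locus; freqs[i][j] is the frequency of allele j in population i.

    I_n = sum_j [ -pbar_j ln pbar_j + sum_i (p_ij / K) ln p_ij ], with 0 ln 0 = 0.
    Natural logarithms are an assumption of this sketch.
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    total = 0.0
    for j in range(n_alleles):
        column = [freqs[i][j] for i in range(k)]
        p_bar = sum(column) / k
        if p_bar > 0:
            total -= p_bar * math.log(p_bar)
        # Average of p ln p over populations, skipping zero frequencies.
        total += sum(p * math.log(p) for p in column if p > 0) / k
    return total
```

For two populations with identical allele frequencies the sketch returns 0. For two populations fixed for different alleles it returns ln 2, which is the maximum for K = 2 under this log base.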