International HapMap Project
 

Home | About the Project | Data | Publications | Tutorial

 

Generic Genome Browser Help

Retrieving genotype data

For a text report of the genotypes in the region you are browsing:
  1. Select "Download SNP genotype data" under "Reports & Analysis".


  2. Press "Configure..."

  3. Select a population from the drop-down menu and the format in which to display your results.
The underlying files are currently space-delimited text tables. For details on the format, see the genotype download directory README.

Note: If you install Haploview, the output file can be loaded into this Java application for further data analysis.
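Because the report is a space-delimited text table, it can be read with a few lines of code. The Python sketch below is only an illustration: the column names in the sample (rs#, alleles, chrom, pos, and the per-sample columns) are assumptions made for the example, and the README remains the authority on the actual layout.

```python
import io

def read_genotype_table(handle):
    """Parse a space-delimited table whose first line is a header row."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        rows.append(dict(zip(header, fields)))
    return rows

# Hypothetical sample data; the real column set is described in the README.
SAMPLE = """rs# alleles chrom pos SAMPLE1 SAMPLE2 SAMPLE3
rs0000001 A/G chr1 1000 AA AG GG
rs0000002 C/T chr1 2500 CC CC NN
"""

rows = read_genotype_table(io.StringIO(SAMPLE))
for row in rows:
    print(row["rs#"], row["pos"], row["SAMPLE2"])
```

To read a saved report instead of the inline sample, pass an open file handle to read_genotype_table.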

Retrieving frequency data

For a text report of HapMap allele or genotype frequency data, proceed as above, but select "Download SNP Allele Frequency Data" or "Download SNP Genotype Frequency Data" instead.


The underlying files are currently space-delimited text tables. For details on the format, see the frequency download directory README.
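To make the relationship between genotype and allele frequencies concrete, the following Python sketch derives both from a list of genotype calls for one SNP. It assumes, for illustration only, that each call is a two-letter string such as "AG" and that missing calls are written "NN"; the actual encoding is defined in the README.

```python
from collections import Counter

def genotype_and_allele_frequencies(genotypes, missing="NN"):
    """Return (genotype_freq, allele_freq) dicts for one SNP."""
    called = [g for g in genotypes if g != missing]
    if not called:
        return {}, {}
    # Sort the two alleles so that "AG" and "GA" count as the same genotype.
    geno_counts = Counter("".join(sorted(g)) for g in called)
    allele_counts = Counter(a for g in called for a in g)
    n_geno = len(called)
    n_alleles = sum(allele_counts.values())
    geno_freq = {g: c / n_geno for g, c in geno_counts.items()}
    allele_freq = {a: c / n_alleles for a, c in allele_counts.items()}
    return geno_freq, allele_freq

# Hypothetical calls for five individuals, one of them missing.
geno_freq, allele_freq = genotype_and_allele_frequencies(
    ["AA", "AG", "GA", "GG", "NN"]
)
print(geno_freq)
print(allele_freq)
```

Missing calls are excluded from both denominators, so the frequencies are computed over the successfully genotyped individuals only.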

Viewing LD Tracks

Linkage disequilibrium (LD) is the tendency of alleles that are close together in the genome to be inherited together.

The HapMap GBrowser allows you to display three LD measures for a given set of markers as separate tracks: D', r², and LOD score. For example, you may want to compare the LD patterns of different populations or limit the SNPs displayed to those meeting certain criteria. To do this, you can select different colors to represent different LD measures (e.g., red for D' and blue for r²), set thresholds for LD measures (e.g., D' = 1 or r² > 0.8), and invert the orientation of the plot for different populations (e.g., normal for YRI and inverted for CEU).
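As a reminder of how two of these measures relate, here is a minimal Python sketch that computes D' and r² for a pair of biallelic SNPs using the standard textbook definitions. It assumes the four haplotype counts are already known (that is, the data are phased); the function name and the counts are made up for the example, and the LOD score is not covered.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r^2) from counts of the four two-SNP haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / total   # frequency of allele B at the second SNP
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_AB - p_A * p_B          # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max      # reported as an absolute value
    r2 = d * d / denom
    return d_prime, r2

# Only two haplotypes present: both measures equal 1.
print(ld_measures(50, 0, 0, 50))
# Three haplotypes present: D' is still 1 but r^2 is well below 0.8.
print(ld_measures(40, 0, 20, 40))
```

The second call shows why the two thresholds select different SNP pairs: D' = 1 only requires that at most three of the four haplotypes are observed, whereas a high r² additionally requires similar allele frequencies at the two SNPs.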

  1. To view LD patterns, select the plugin: LD Plot track.

  2. To configure the LD tracks, select the "Annotate LD Plot" option from the Reports & Analysis drop-down list, and click the "Configure" button.

  3. In the configuration page (see image below), set up the parameters for each LD property, such as color of the pairwise plot, threshold for SNPs to be displayed, populations to be displayed, etc. Below is a description of each track and configuration options.

  4. For each HapMap data release, the LD data is calculated for markers up to 250 kb apart. Precalculated LD values can be downloaded by selecting the "Download HapMap LD Data" option from the Reports & Analysis drop-down list. Click the "Configure" button and then "Go".


LD Plots



Configuring LD Plots

The LD plot plugin generates a pairwise plot of marker-to-marker LD values, where the genotyped SNPs are denoted as ticks and the pairwise marker information is plotted as boxes between these ticks.

These are some parameters you may change in the configuration of an LD plot:

Note: LD data is generated from the Haploview software's .DPRIME and .CHECK output files. For each HapMap data release, the LD data is calculated for markers up to 500 kb apart. A non-redundant set of genotyped SNPs is used to calculate the LD data. For each chromosome, the non-redundant set is created by picking the first genotyped SNP that has a MAF > 0. The Genotyped SNP track shows the complete set of markers genotyped (including duplicate SNPs genotyped on different platforms by different centers). For this reason, you might not find a one-to-one match between the markers in the two tracks.
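The selection of a non-redundant set can be pictured with the short Python sketch below. It is an interpretation rather than the project's actual procedure: it assumes duplicates are recognized by a shared SNP ID and that "first" means first in file order, and the record layout (snp_id, source, maf) is hypothetical.

```python
def non_redundant_snps(records):
    """Keep, for each SNP ID, the first record whose MAF is greater than 0.

    records: iterable of (snp_id, source, maf) tuples in file order.
    """
    chosen = {}
    for snp_id, source, maf in records:
        if snp_id not in chosen and maf > 0:
            chosen[snp_id] = (snp_id, source, maf)
    # dicts preserve insertion order, so the result follows file order
    return list(chosen.values())

# Hypothetical records: rs1 and rs2 were each genotyped by two centers.
records = [
    ("rs1", "centerA", 0.0),
    ("rs1", "centerB", 0.2),
    ("rs2", "centerA", 0.1),
    ("rs2", "centerB", 0.15),
    ("rs3", "centerA", 0.0),
]
kept = non_redundant_snps(records)
print(kept)
```

In this example rs3 is dropped entirely because it is monomorphic, which is one way a marker can appear in the Genotyped SNP track but not in the LD track.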

Viewing phased haplotypes

Phased haplotypes were generated using the program PHASE version 2.0 (see the article by Stephens and Donnelly 2003). During phasing, each allele in a genotype is assigned to one or the other parental chromosome, using a maximum likelihood algorithm that uses trio (lineage) information in the HapMap population groups, or, if trio information is not available, by fitting the data to a model that minimizes the number of implied historical crossovers in the population. The phased haplotypes are displayed as a graphic in which each chromosome of the individuals sampled by the project is represented as a line one pixel high and each SNP allele is arbitrarily colored blue or yellow. A region of high LD will appear as a region in which there are long runs of SNPs whose alleles are the same color, indicating that there is little recombination among them. A region of low LD will appear as an area where the runs are shorter and more fragmentary.

  1. To view phased haplotypes, select the plugin: Phased Haplotype Display track.

  2. Select the "Annotate Phased Haplotype Display" option from the Reports & Analysis drop-down list, and click the "Configure" button.

  3. Select the population(s) for which to display haplotype information, and click the "Configure" button to return to the main display.

    The phased haplotypes for each population selected will appear in a separate track following the two-color scheme described earlier (see image below, track CEU Phased Haplotypes). The order of chromosomes is determined by a fast hierarchical clustering methodology, which places chromosomes that share similar haplotypes together.

  4. To retrieve the detailed phased genotypes, click on the track of the desired population. This will take you to a page that provides the haplotype information in tabular form, where each row represents one chromosome and each column is an individual SNP. The background of each table entry is set to a color corresponding to what is seen in the graphical track.

    The advantage of this display over the pairwise LD triangle display described earlier is that it is more compact and therefore better suited for the display of large regions. This makes it easy to correlate the position of long common haplotypes with SNPs chosen by the tag-SNP picker. The disadvantage of this display is that it conceals much of the fine structure of LD in the region, in particular strong linkage among SNPs that are not adjacent to one another.
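The two-color display and the grouping of similar chromosomes can be imitated with the Python sketch below. It is a simplified stand-in, not the browser's code: alleles are colored relative to the first chromosome, and chromosomes are ordered with a greedy nearest-neighbour pass on Hamming distance instead of the hierarchical clustering used by the browser. It assumes biallelic SNPs, equal-length haplotype strings, and no missing alleles.

```python
def color_haplotypes(haplotypes):
    """Recode each allele as 'B' (blue) or 'Y' (yellow), SNP by SNP."""
    # The allele carried by the first chromosome is arbitrarily called blue.
    reference = haplotypes[0]
    return [
        "".join("B" if allele == ref else "Y"
                for allele, ref in zip(hap, reference))
        for hap in haplotypes
    ]

def hamming(a, b):
    """Number of SNP positions at which two rows differ."""
    return sum(x != y for x, y in zip(a, b))

def order_by_similarity(rows):
    """Greedy ordering: always append the row closest to the last one."""
    remaining = list(rows)
    ordered = [remaining.pop(0)]
    while remaining:
        nearest = min(remaining, key=lambda r: hamming(ordered[-1], r))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered

# Hypothetical phased chromosomes typed at four SNPs.
haplotypes = ["ACGT", "GTAC", "ACGT", "GTAC", "ACGC"]
colored = color_haplotypes(haplotypes)
for row in order_by_similarity(colored):
    print(row)
```

After ordering, identical and near-identical rows sit next to each other, so long shared haplotypes show up as solid blocks of one color, much as they do in the track.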

Picking tag SNPs

Tag-SNPs are a reduced set of SNPs that capture most of the genetic variation in a region, and can be used in association studies to reduce the number of SNPs needed to detect LD-based association between a trait of interest and a region of the genome.

There is no single set of tag-SNPs that will satisfy the diverse requirements of every association study design. For example, you may wish to select SNPs that work well with a particular genotyping system (such as those that have been included on a particular SNP chip) and may be willing to accept different tradeoffs between the cost of genotyping a study population and the strength of the association that can be detected. For this reason, the HapMap website does not offer a static set of pre-selected tag-SNPs, but instead offers researchers a tool for interactively selecting tag-SNPs based on user-provided criteria. This tag-SNP picking tool is Tagger, an algorithm that chooses tag-SNPs by formally maximizing the number of linked SNPs captured by a given set of tags.

  1. To choose tag-SNPs, select the plugin: tag SNP picker track.

  2. Select "Annotate tag SNP picker" under "Reports & Analysis", and press the "Configure" button.

    Options for tag-SNP selection include selecting a population and a tagging method (pairwise or multi-marker), uploading a list of SNP IDs to be included in the set of tag-SNPs, uploading a list of SNP IDs to be excluded from the set of tag-SNPs, uploading a list of design scores (priorities) for each SNP, and selecting cutoffs for minimum acceptable LD value and allele frequency for SNPs to be included in the set. More information on these options is available on the Tagger website.

  3. After setting the desired options, click the "Configure" button to run the analysis and return to the main display. Results are shown on a new feature track labeled tSNPs_TaggerMethod_population.

  4. To generate a text list of tag-SNPs, select "Download tag SNP data" under "Reports & Analysis", and press the "Configure" button and then "Go".

    The generated report contains a tab-delimited list of tag-SNP names, chromosome, position, and allele frequency in the region. This is followed by a section that lists the LD tests performed between all SNP pairs to select the tag-SNPs, based on the r² cutoff set in the configuration page. The last section lists the non-tag SNPs that each tag-SNP captures.
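The idea behind pairwise tagging with an r² cutoff can be sketched in Python as follows. This is a simplified greedy selection written for illustration, not Tagger's implementation: it ignores include and exclude lists, design scores, allele frequency cutoffs, and multi-marker tests, and the SNP names and r² values are invented.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8):
    """Pick tags until every SNP is captured at r^2 >= threshold.

    snps: list of SNP IDs; r2: dict mapping (snp_a, snp_b) to r^2.
    Returns {tag: [SNPs captured by that tag, excluding the tag itself]}.
    """
    def linked(a, b):
        if a == b:
            return True  # a SNP always captures itself
        return r2.get((a, b), r2.get((b, a), 0.0)) >= threshold

    uncaptured = set(snps)
    tags = {}
    while uncaptured:
        # Choose the SNP that captures the most still-uncaptured SNPs;
        # ties go to the SNP listed first.
        best = max(snps, key=lambda s: sum(linked(s, t) for t in uncaptured))
        captured = {t for t in uncaptured if linked(best, t)}
        tags[best] = sorted(captured - {best})
        uncaptured -= captured
    return tags

# Hypothetical region with four SNPs; unlisted pairs are treated as r^2 = 0.
snps = ["rs1", "rs2", "rs3", "rs4"]
r2 = {
    ("rs1", "rs2"): 0.95,
    ("rs2", "rs3"): 0.85,
    ("rs1", "rs3"): 0.60,
    ("rs3", "rs4"): 0.20,
}
tags = greedy_pairwise_tags(snps, r2)
print(tags)
```

The returned mapping mirrors the last section of the report, which lists the non-tag SNPs captured by each tag-SNP. Raising the threshold generally increases the number of tags needed.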

Symbols and colours used

The following table describes the symbols used at different magnifications and their meaning in the Generic Genome Browser. Note that all alleles are reported with respect to the plus strand.


Last updated : gbrowse_help.html,v 1.8 2005/10/27 19:38:19 krishnan Exp

Please send questions and comments on the website to hapmap-help@ncbi.nlm.nih.gov