What is a Haplotype?
Figure 1: Meiotic recombination to produce gametes
Diagram of meiotic recombination

Each individual carries two sets of chromosomes, one derived from his or her father and the other from his or her mother. During generation of the gametes (sperm and egg), the maternal and paternal copies of the chromosomes exchange material in a process known as "meiotic recombination," leading to mosaic chromosomes that contain parts of the original paternal chromosome and parts of the original maternal chromosome. The chromosomes that are passed on to the individual's children are therefore not identical copies of the chromosomes that he or she inherited, but reshuffled versions of them (Figure 1). On average, one reshuffling event occurs per chromosome per generation.
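
The reshuffling can be pictured with a minimal Python sketch. It assumes a chromosome can be written as a string of equal-length positions and that the crossover positions are supplied by hand; the function name `recombine` and the letters P and M (paternal, maternal) are illustrative choices, not part of the HapMap Project.

```python
def recombine(paternal: str, maternal: str, crossovers: list[int]) -> str:
    """Build one mosaic chromosome, switching source copy at each crossover."""
    if len(paternal) != len(maternal):
        raise ValueError("chromosome copies must have equal length")
    sources = (paternal, maternal)
    current = 0  # assumption: copying starts from the paternal copy
    breakpoints = set(crossovers)
    mosaic = []
    for position in range(len(paternal)):
        if position in breakpoints:
            current = 1 - current  # switch to the other parental copy
        mosaic.append(sources[current][position])
    return "".join(mosaic)


# One crossover, matching the average of one event per chromosome.
gamete = recombine("PPPPPPPPPP", "MMMMMMMMMM", [4])
print(gamete)  # PPPPMMMMMM
```

With a single crossover the gamete carries a paternal stretch followed by a maternal stretch; supplying more crossover positions produces a finer mosaic.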

Figure 2: Genetic Mapping
Genetic mapping

Geneticists routinely take advantage of this reshuffling to identify the probable locations of genes involved in diseases, adverse drug reactions, or other medically important traits using a family of techniques called "genetic mapping". To understand how genetic mapping works, consider Figure 2, where there is a hypothetical gene, named X, which is responsible for a medically important trait. We assume that the gene occurs in two different forms. One form of the gene, uppercase X, increases the risk that a person who carries it will develop a disease. The other form of the gene, lowercase x, does not increase this risk. We assume that a recent ancestor carried both the X and the x forms in his two copies of the chromosome. We distinguish the two original chromosomes by color: one is red, the other blue. The X form of the gene resides on the red chromosome.

Over the course of a few generations, the chromosomes become scrambled by meiotic recombination, and the descendants have each inherited a slightly different scrambled version. However, no matter what scrambling occurs, the uppercase X form of the gene still resides on a red segment, while lowercase x resides on a blue segment, reflecting their ancestral chromosomes of origin. (In the diagram we've shown each individual as if they had only one copy of the chromosome, but in reality they have another copy derived from the "other side of the family." For simplicity, we don't show this second set of chromosomes.)

If chromosomes were really color-coded and geneticists could look through a microscope and see the alternating bands of color, then genetic mapping would be easy. First the geneticist would examine the family members to determine which ones had a genetic trait, for example heightened susceptibility to a disease. Then the geneticist would examine the alternating bands of color to see whether any ancestral chromosome segment was shared by all the susceptible individuals. This shared segment would then be a likely region in which to start searching for the causative gene. Figure 3 shows this process at work. Susceptible individuals are indicated with a checkmark, and there is indeed a region of the original red chromosome that all of them share. As it happens, the X form of the gene resides here.

Figure 3
Finding a causative location
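
The search for a segment shared by all susceptible individuals can be sketched in Python. This is an illustration under simplifying assumptions: each individual is shown with one chromosome copy, written as a string of "R" (red) and "B" (blue) positions, and the family data and the name `shared_ancestral_segments` are invented for the example.

```python
def shared_ancestral_segments(
    chromosomes: dict[str, str], affected: set[str], color: str = "R"
) -> list[tuple[int, int]]:
    """Return (start, end) ranges, end exclusive, where every affected
    individual carries the given ancestral color."""
    length = len(next(iter(chromosomes.values())))
    shared = [
        all(chromosomes[name][i] == color for name in affected)
        for i in range(length)
    ]
    segments = []
    start = None
    for i, is_shared in enumerate(shared):
        if is_shared and start is None:
            start = i                      # a shared run begins here
        elif not is_shared and start is not None:
            segments.append((start, i))    # the run ended just before i
            start = None
    if start is not None:
        segments.append((start, length))   # run extends to the chromosome end
    return segments


# Hypothetical family: three susceptible members and one who is not.
family = {
    "child_1": "RRRRRRBBBB",
    "child_2": "BBRRRRRBBB",
    "child_3": "BBBRRRRRBB",
    "child_4": "BBBBBBBRRR",
}
susceptible = {"child_1", "child_2", "child_3"}
print(shared_ancestral_segments(family, susceptible))  # [(3, 6)]
```

Positions 3 through 5 are red in every susceptible individual, so that stretch is where the search for the causative gene would begin.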

Of course, nothing in nature is as easy as this, and chromosomes don't come color-coded. However, nature provides the next best thing in the form of "molecular polymorphisms". Polymorphisms are regions of the genome that vary between individuals, appearing in one version in one copy of a chromosome and in another version in a different copy. The most common type of polymorphism is the single nucleotide polymorphism, or SNP. A SNP is a position in the genome that holds one nucleotide (e.g. "A") in some copies of the chromosome and a different nucleotide (e.g. "G") in others. On either side of the SNP the sequence is the same in all individuals (Figure 4). Over the past few years, several million SNPs have been identified and their precise positions on the chromosomes located. This information has been augmented by technologies that allow molecular biologists to rapidly determine which form of a particular SNP is carried by an individual. Each SNP version is in effect a color code, allowing one version of a chromosome to be distinguished from another.

Figure 4
Genetic mapping

The different versions of a SNP are known as "alleles." The process of determining which SNP alleles are carried by individuals is known as "genotyping." When the alleles of a whole set of SNPs are determined (conceptually painting the chromosome with distinguishing red and blue bands), the set of genotypes is known as a "haplotype." By genotyping individuals with SNPs, geneticists can identify chromosomal regions that are shared among the affected individuals, thereby gaining a fix on the probable location of the causative gene. This is known as an "association" study, because the geneticists are searching for associations between a particular version of a SNP and a disease or other medically important trait. The SNP itself need not cause the disease; it is associated with the disease because it happens to reside on the same segment of the chromosome as the disease gene.
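
The idea of an association can be illustrated by comparing allele frequencies at a single SNP between affected and unaffected individuals. The sketch below is a toy comparison, not a statistical test; the genotypes, the individual names, and the function `allele_frequencies` are all hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes: dict[str, str], group: set[str]) -> dict[str, float]:
    """Frequency of each allele among the chromosome copies carried by `group`."""
    counts = Counter()
    for name in group:
        # A genotype such as "AG" contributes one A allele and one G allele.
        counts.update(genotypes[name])
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes at one SNP (two alleles per person).
genotypes = {
    "case_1": "AA", "case_2": "AG", "case_3": "AA",
    "control_1": "GG", "control_2": "AG", "control_3": "GG",
}
cases = {"case_1", "case_2", "case_3"}
controls = {"control_1", "control_2", "control_3"}

case_freq = allele_frequencies(genotypes, cases)
control_freq = allele_frequencies(genotypes, controls)
print(round(case_freq["A"], 2), round(control_freq["A"], 2))
```

In this invented data the "A" allele is far more common among affected individuals, which is the kind of signal an association study looks for. A real study would use many more individuals and a formal significance test.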

SNPs can be, and have been, used to identify candidate genes responsible for diseases with genetic components. Examples include inflammatory bowel disease, asthma, and type II diabetes.

In the example shown in Figure 2, we looked at just two generations of meiotic recombination. Because so few generations have occurred, the shared segments are relatively large. In a SNP mapping study, one would only need to characterize a few SNPs, just one per color-coded region. The relatively large size of the segments is a major drawback of family studies, because even after the segment associated with the medically relevant trait has been identified, it is still a large region of the genome to examine.

Figure 5
Genetic mapping

Now consider Figure 5, where a great many generations have occurred. The ancestral chromosomes have become extremely scrambled, and the shared regions are quite small. This is the situation that arises when one tries to perform an association study in the general population. Each of us shares common segments of the genome with others, but these segments are very small because they derive from a common ancestor who lived a considerable number of generations ago. If one could perform SNP association studies in the general population, one could rapidly zoom in on the gene of interest, because the segments are small. Of course, one needs to collect information on a lot of SNPs in order to do this, because at least one SNP per segment must be characterized in order to "color code" it.

Until recently, it was thought that this type of general population association study would be prohibitively expensive because of the large number of SNPs that would have to be characterized (millions). However, a breakthrough occurred roughly two years ago when researchers discovered that the ancestral shared regions are much larger than expected. Instead of being a few thousand base pairs on average, as suggested by mathematical models of the number of generations that have occurred in the human population, shared segments can range up to hundreds of thousands of base pairs. The reason for this is not understood for certain, but is thought to be the result of human population history, in which there may have been a number of times when the population was significantly reduced in number by migration or disease.
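
The link between segment size and the number of SNPs required is simple arithmetic, sketched below. The genome length of roughly 3 billion base pairs is an approximation, the mean segment lengths are illustrative values rather than HapMap measurements, and the function name `min_snps_needed` is hypothetical.

```python
def min_snps_needed(genome_length_bp: int, mean_segment_bp: int,
                    snps_per_segment: int = 1) -> int:
    """Lower bound on SNPs to genotype if every segment needs its own marker(s)."""
    # Ceiling division: a partial segment at the end still needs a marker.
    n_segments = -(-genome_length_bp // mean_segment_bp)
    return n_segments * snps_per_segment


GENOME_LENGTH_BP = 3_000_000_000  # approximate human genome length (assumption)

# Segments of a few thousand base pairs: about a million SNPs.
print(min_snps_needed(GENOME_LENGTH_BP, 3_000))   # 1000000

# Illustrative larger mean segment of 10,000 base pairs.
print(min_snps_needed(GENOME_LENGTH_BP, 10_000))  # 300000
```

The count falls in direct proportion to the mean segment length, which is why the discovery of larger shared regions made population-wide studies look feasible.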

Whatever the reason, researchers can take advantage of the presence of large shared regions. Instead of genotyping all the SNPs in a shared region, they need only genotype one or two of the SNPs in order to determine what version ("color") a particular segment has. Instead of genotyping millions of SNPs in the course of an association study, it is now thought that, with a proper understanding of the shared segment structure, studies can be performed with a few hundred thousand SNPs, at a great saving in cost and time.
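
Choosing the one or two SNPs that identify a segment's version can be sketched as a small search. The code below is a brute-force illustration, assuming each observed version of a segment is written as a string of alleles at the same SNP positions. It is not the method used by the HapMap Project, and `find_tag_snps` is a hypothetical name.

```python
from itertools import combinations


def find_tag_snps(haplotypes: list[str]) -> tuple[int, ...]:
    """Smallest set of SNP indices whose alleles tell all distinct versions apart."""
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    for size in range(n_snps + 1):
        for indices in combinations(range(n_snps), size):
            # Alleles seen at the chosen SNPs, one signature per version.
            signatures = {tuple(h[i] for i in indices) for h in distinct}
            if len(signatures) == len(distinct):
                return indices
    return tuple(range(n_snps))  # fallback: use every SNP


# Two versions of a five-SNP segment: one SNP is enough to tell them apart.
print(find_tag_snps(["ACGTA", "ACGTA", "GTACA", "GTACA"]))  # (0,)

# Three versions: two SNPs are needed.
print(find_tag_snps(["AAAA", "AATT", "TTTT"]))  # (0, 2)
```

The exhaustive search is only practical for short segments; it is used here because it makes the goal, the fewest SNPs that still distinguish every version, easy to see.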

This, then, is the goal of the International HapMap Project: to genotype several million SNPs across three human population groups; to determine from this haplotype information the location and nature of the common ancestral segments carried by each of these populations, thereby creating a "haplotype map"; to derive from this map a select set of 600,000 SNPs which can identify each of the segments; and finally, to make all the information derived from the Project freely available to the world research community in order to accelerate the search for treatments for genetically-related ills.

A more technical discussion of haplotypes...

Last updated : haplotype.html.en,v 1.10 2003/11/04 21:14:35 fiona Exp