Guidelines for Referring to the HapMap Populations in Publications and Presentations

It is important to exercise care when labeling the populations whose samples were used to develop the HapMap in any publications and presentations that describe the Project or use Project data. This document provides guidelines on how to refer to the populations and includes other relevant background information about each population.


The way that a population is named in studies of genetic variation, such as the HapMap, has important ramifications scientifically, culturally, and ethically. From a scientific standpoint, precision in describing the population from which the samples were collected is an essential component of sound study design; the source of the data must be accurately described in order for the data to be interpreted correctly. From a cultural standpoint, precision in labeling reflects acknowledgement of and respect for the local norms of the communities that have agreed to participate in the research. From an ethical standpoint, precision is part of the obligation of researchers to participants, and helps to ensure that the research findings are neither under-generalized nor over-generalized inappropriately. The use of careless and inconsistent terminology when describing the populations represents a failure in all three of these areas.

The populations included in the HapMap should not be named in such a way that they single out small, discrete communities of individuals and imply that those communities are somehow genetically unique, of special interest, or very different from their close neighbors. Labels that are too specific could also invade the privacy interests of communities (or even, conceivably, of individual sample donors).

On the other hand, describing the populations in terms that are too broad could result in inappropriate over-generalization. This could erroneously lead those who interpret HapMap data to equate geography (the basis on which populations were defined for the HapMap) with race (an imprecise and mostly socially constructed category). This, in turn, could reinforce social and historical stereotypes, and lead to group stigmatization and discrimination in places where members of the named populations or of closely related populations are minorities.

The guidelines in this document take the above considerations into account. They also incorporate input, obtained during extensive community consultations in some of the communities, about how the samples collected in those communities should be named, at least the first time that they are described in a publication or presentation.

Recommended Descriptors

The complete recommended language for naming the populations included in the HapMap (which reflects both the ancestral geography of each population and the geographic location where the samples from that population were collected) is:

After the complete descriptor for a population has been provided, it is acceptable to use a shorthand label for that population (e.g., "Yoruba," "Japanese," "Han Chinese," "CEPH") or the abbreviation for that population (e.g., "YRI," "JPT," "CHB," "CEU") in the remainder of the article or presentation. However, the full descriptor for each population should be provided before such shorthand labels are used. This will help to avoid the risks associated with over-generalization of findings.

The sample sets should not be described as having come from "normal controls." Because no phenotypic information was collected with the samples, there is no way of knowing what medical conditions the sample donors may have.

Recommended Language for Describing Criteria for Population Assignments

In addition to providing the complete descriptor for each population when first describing the populations, the criteria used to assign membership in each population should be noted. Appropriate language for doing this is:

Additional Background about the Populations

