In Part I of the “You and the $1000 Genome” series we examined the Archon X PRIZE for Genomics, a $10 million purse for the group that can sequence 100 genomes in 10 days for no more than $10,000/genome with an error rate below 0.001%. With today’s technology this goal is still a few years away.
But do we need an entire genomic sequence to obtain all the relevant medical information that our DNA contains? After all, 99.9% of my DNA is exactly the same as everyone else’s! Why sequence that 99.9% over and over and over if the results are the same every time? Wouldn’t it be cheaper to just sequence and then decode the 0.1%?
Cataloging the common variation within that 0.1% is the goal of the International HapMap Project. HapMap stands for “Haplotype Map”, and those of you who are genetic genealogists will instantly recognize the importance of the word haplotype. The goal of the HapMap Project, begun in 2002, is to identify SNP groups (haplotypes) in a total of 270 individuals representing the Yoruba people of Nigeria, the Han Chinese in Beijing, the Japanese, and U.S. residents with northern and western European ancestry. The HapMap is essentially a catalog of all the common genetic variants in human beings.
Phase I of the HapMap Project, which is complete, identified 1 million SNPs in the human genome. SNPs are “single nucleotide polymorphisms”: positions in the genetic code where a single nucleotide varies from person to person. According to some scientists, 1 million SNPs is about 10% of the total SNPs in the human genome. Interestingly, the results of Phase I of the HapMap suggested that SNPs tend to cluster together at certain locations and may be passed on to the next generation in groups. For many regions of our DNA there are only a few different haplotypes in most humans, and researchers can identify these haplotypes using just a few SNPs. As a result, a single person’s genotype (collection of haplotypes) can be determined by testing as few as 300,000 to 600,000 SNPs. For a recent review of Phase I of the HapMap Project, read this 2005 article in PLoS Genetics (open access).
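To make the idea of identifying a haplotype from just a few SNPs concrete, here is a minimal Python sketch. The four haplotypes and their alleles are invented for illustration, the function names are hypothetical, and the brute-force search is only one simple way to pick distinguishing SNPs. It is not the method the HapMap researchers used.

```python
from itertools import combinations

# Hypothetical haplotypes for one region of DNA (made-up data).
# Each string lists the allele seen at six SNP sites, in order.
HAPLOTYPES = {
    "hap_A": "ACGTAC",
    "hap_B": "ACGTGT",
    "hap_C": "GTGCAC",
    "hap_D": "GTACGT",
}


def find_tag_snps(haplotypes):
    """Return the smallest set of SNP sites that tells every haplotype apart.

    Assumes all haplotypes are distinct and have the same length.
    """
    seqs = list(haplotypes.values())
    n_sites = len(seqs[0])
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            # The "signature" of a haplotype is its alleles at the chosen sites.
            signatures = {tuple(seq[i] for i in sites) for seq in seqs}
            if len(signatures) == len(seqs):
                return sites
    return tuple(range(n_sites))


def identify(haplotypes, observed):
    """Name the haplotype matching the observed alleles (site index -> allele)."""
    for name, seq in haplotypes.items():
        if all(seq[site] == allele for site, allele in observed.items()):
            return name
    return None


tag_sites = find_tag_snps(HAPLOTYPES)
print("Sites to test:", tag_sites)
print("Haplotype:", identify(HAPLOTYPES, {0: "G", 4: "A"}))
```

In this toy data, testing two of the six sites is enough to tell all four haplotypes apart, which is the same economy that lets a few hundred thousand SNPs stand in for millions.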
Phase II of the HapMap Project identified close to 2.5 million SNPs using the same 270 samples. Although data acquisition for Phase II has been completed, analysis is still continuing. As the HapMap data becomes available, researchers can use it to identify genes and SNPs that are involved in disease. If most people with colon cancer share a certain haplotype, researchers can use that information to identify the genes involved and doctors can use that information to predict who might be susceptible to colon cancer long before the disease develops. I’ve previously written about two studies using information from the HapMap to identify a locus associated with diabetes and prostate cancer.
So with the huge success of the HapMap Project, do we really need genome sequencing? Some would argue that haplotyping is not sufficient, especially when a genetic disease is found at very low frequencies in the population. According to Jonathan Rothberg, the founder and chairman of 454 Life Sciences, “genotyping rests on the hypothesis that common alleles contribute to common diseases. What if very uncommon alleles contribute to common diseases? Only deep sequencing would be able to answer this question. The deeper the sequencing, the less frequent variant you can find. You need deep coverage to ensure the statistical likelihood of finding rare mutations.” Indeed, some mutations are so rare that they are only found within specific families or populations. If these families aren’t part of the HapMap Project, there is the potential that their personal SNPs won’t be identified.
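A rough calculation shows why rare variants can slip through a sample of 270 people. The Python sketch below estimates the chance that an allele of a given population frequency turns up at least once in the sample. It assumes every individual contributes two independent chromosomes, which is a simplification (it ignores relatedness among sampled individuals and differences between populations), and the function name and example frequencies are hypothetical.

```python
def prob_allele_sampled(freq, n_individuals):
    """Chance that an allele with population frequency `freq` appears
    at least once among `n_individuals`, assuming two independent
    chromosomes per person (a simplifying assumption)."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chromosomes


# Illustrative frequencies: common, uncommon, rare, very rare.
for freq in (0.05, 0.01, 0.001, 0.0001):
    p = prob_allele_sampled(freq, 270)
    print(f"allele frequency {freq:.4%}: chance of appearing in sample = {p:.1%}")
```

Under these assumptions, a common allele is almost certain to be captured, while an allele carried on one chromosome in ten thousand would usually be missed entirely.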
Despite the concerns, there is little doubt that the HapMap Project is a valuable contribution to the field of personalized medicine. It has already produced results that will further our understanding of the genetic component of diseases such as diabetes and prostate cancer. While HapMap-based genotyping has limitations that differentiate it from whole-genome sequencing, it is a much cheaper and immensely useful tool for scientists and medical specialists.