Genomics 92 (2008) 41–51


HapMap tagSNP transferability in multiple populations: General guidelines

Jinchuan Xing, David J. Witherspoon, W. Scott Watkins, Yuhua Zhang, Whitney Tolpinrud 1, Lynn B. Jorde ⁎

Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA

ARTICLE INFO

Article history: Received 20 February 2008; Accepted 28 March 2008; Available online 14 May 2008

Keywords: TagSNPs; Transferability; Single nucleotide polymorphism; Linkage disequilibrium; Genome-wide association study

ABSTRACT

Linkage disequilibrium (LD) has received much attention recently because of its value in localizing disease-causing genes. Due to the extensive LD between neighboring loci in the human genome, it is believed that a subset of the single nucleotide polymorphisms in a region (tagSNPs) can be selected to capture most of the remaining SNP variants. In this study, we examined LD patterns and HapMap tagSNP transferability in more than 300 individuals. A South Indian sample and an African Mbuti Pygmy population sample were included to evaluate the performance of HapMap tagSNPs in geographically distinct and genetically isolated populations. Our results show that HapMap tagSNPs selected with r2 ≥ 0.8 can capture more than 85% of the SNPs in populations that are from the same continental group. Combined tagSNPs from HapMap CEU and CHB+JPT serve as the best reference for the Indian sample. The HapMap YRI are a sufficient reference for tagSNP selection in the Pygmy sample. In addition to our findings, we reviewed over 25 recent studies of tagSNP transferability and propose a general guideline for selecting tagSNPs from HapMap populations. © 2008 Elsevier Inc. All rights reserved.

⁎ Corresponding author. Fax: +1 801 585 9148. E-mail address: [email protected] (L.B. Jorde).
1 Current address: Yale School of Medicine, New Haven, CT 06510.
0888-7543/$ – see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2008.03.011

Introduction

Linkage disequilibrium (LD) has been instrumental in localizing many Mendelian disease-causing genes [1–3], and it holds great promise for mapping genes related to complex disease [4–6]. In addition, LD plays a crucial role in other areas of human genetics, including studies of human population structure and migration history [7]. Since portions of the human genome are in extensive LD, certain single nucleotide polymorphisms (SNPs) can be selected to represent other nearby SNPs that are in strong LD with them and therefore largely redundant. A set of such SNPs (i.e., tagSNPs) can be used to capture the vast majority of SNP variation in a region, thereby reducing the genotyping cost significantly [8].

The International HapMap Project is an effort to identify and catalog common genetic variants (mostly SNPs) in the human genome [9]. It is believed that tagSNPs selected from HapMap populations will be useful for association studies performed in other populations [9,10]. With the completion of phase II of the HapMap project [11], more than three million SNPs have been genotyped in 270 individuals from the four HapMap populations: Yoruba from Ibadan, Nigeria (YRI), Japanese from Tokyo, Japan (JPT), Han Chinese from Beijing, China (CHB), and Utah residents with northern and western European ancestry (CEU). These data give researchers an unprecedented opportunity to select tagSNPs to cut genotyping costs while maintaining sufficient power to detect disease-causing mutations. Nevertheless, it is known that LD patterns and haplotype blocks can vary

across populations due to their unique histories [12–14]. Several earlier studies suggested that tagSNPs should be assessed in each individual population [15–17]. To evaluate the usefulness of tagSNPs selected from HapMap populations, it is critical to evaluate the similarity of haplotypes in different populations (especially isolated ones) and whether tagSNPs can capture most of the variants in these populations.

To assess LD and haplotype variation among populations and to examine the transferability of HapMap tagSNPs, we genotyped 141 SNPs in more than 300 individuals from 20 populations around the world, including a South Indian population sample composed of two tribal groups and a genetically distinct African Mbuti Pygmy population sample that has not been previously evaluated for LD.

Results

Populations

A total of 325 individuals from 20 worldwide populations are included in the analysis, with geographic information and sample sizes shown in Fig. 1. The HapMap populations represent three major continental groups: CEU for Europe, YRI for sub-Saharan Africa, and CHB+JPT for East Asia. For direct comparison with HapMap populations, three continental population groups were constructed from our samples based on individual ancestry: 104 unrelated individuals of northern European descent (EUR), 145 unrelated individuals from sub-Saharan Africa (AFR, including the Mbuti Pygmy group), and 59 unrelated individuals from East Asia (EAS). These groups can be compared with the HapMap population groups CEU, YRI, and CHB+JPT, respectively. Two populations were analyzed as examples of more challenging populations for tagSNP transfer: 17 unrelated individuals


J. Xing et al. / Genomics 92 (2008) 41–51

Fig. 1. Populations examined. Number of individuals in each population sample is given in parentheses.

from two tribal noncaste populations (Irula and Khonda Dora) in South India (IND), which do not correspond to any HapMap continental group; and 37 unrelated individuals from an African Mbuti Pygmy group (PYG), which is genetically distinct from other African populations [18,19].

To examine the degree of population differentiation, we calculated pairwise Fst estimates between HapMap populations and our populations (Table 1). The AFR, EAS, and EUR samples show almost no differentiation from the corresponding HapMap YRI, CHB+JPT, and CEU samples (Fst values of 0.010, 0, and 0.003, respectively). The Indian sample is more divergent from the HapMap CHB+JPT and CEU groups (Fst values of 0.055 and 0.074, respectively), consistent with India's intermediate geographic location between Europe and East Asia. Mbuti Pygmies show substantial differentiation from all HapMap populations, including HapMap YRI (Fst = 0.043).

Allele frequencies and pairwise LD patterns

A total of 141 SNPs from 14 genomic regions on eight different chromosomes were genotyped. Each region is about 50 kb in length and contains 10 SNPs on average (Table 2). SNP genotype data for

Table 1 Pairwise Fst distances between HapMap populations and those of the present study

        YRI      CHB+JPT  CEU
AFR     0.010    0.201    0.153
EAS     0.191    0.000    0.075
EUR     0.123    0.082    0.003
IND     0.136    0.055    0.074
PYG     0.043    0.231    0.186
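The text does not say which Fst estimator produced the values in Table 1. As a rough illustration only, the sketch below computes Wright's Fst for a pair of populations from per-SNP allele frequencies, using a ratio-of-averages form with equal population weights. The function name `pairwise_fst` and the example frequencies are hypothetical and are not data from this study.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Wright's Fst between two populations from per-SNP allele frequencies.

    Assumption: equal population weights and a ratio-of-averages estimator,
    Fst = sum(H_T - H_S) / sum(H_T). The estimator used in the paper is not
    specified, so this is only one plausible choice.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must have the same length")
    num = 0.0  # running sum of (H_T - H_S) over SNPs
    den = 0.0  # running sum of H_T over SNPs
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2.0
        h_t = 2.0 * p_bar * (1.0 - p_bar)  # heterozygosity of the pooled sample
        h_s = (2.0 * pa * (1.0 - pa) + 2.0 * pb * (1.0 - pb)) / 2.0  # mean within-population
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


# Hypothetical frequencies for a single SNP in two populations.
print(round(pairwise_fst([0.6], [0.4]), 3))
```

Identical frequency vectors give 0 and fixed differences give 1, which brackets the range of values reported in Table 1.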

HapMap populations were obtained from the HapMap project website. We first compared allele frequencies between HapMap populations and our three continental groups. Fig. 2A shows that each of our continental groups has the highest allele frequency correlation with its corresponding HapMap population. Spearman's correlation coefficients (rho) are 0.95, 0.96, and 0.95 for AFR vs YRI, EAS vs CHB+JPT, and EUR vs CEU, respectively. In comparisons between population samples from different continents, the correlations range from as low as 0.30 for AFR vs CHB+JPT to a maximum of 0.70 for EAS vs CEU.

A comparison of pairwise LD (measured as r2) for all pairs of SNPs in each region shows similar patterns across populations (Fig. 2B). The Spearman's rho values for the pairwise r2 values are 0.84, 0.94, and 0.95 for AFR vs YRI, EAS vs CHB+JPT, and EUR vs CEU, respectively. For between-continent comparisons, the correlations range from 0.63 for AFR vs CEU to 0.75 for EUR vs CHB+JPT. Similar analyses were performed using D′ as a measure of LD, although all correlations for D′ are lower than those for r2 (not shown). The lower correlation of D′ values may be largely caused by a ceiling effect of this measurement [20].

We then compared allele frequencies and LD patterns of HapMap populations with the Indian and Pygmy population samples. Allele frequencies in these two populations are less correlated with the corresponding frequencies in the HapMap populations than is the case for our continental groups (Fig. 3). Allele frequencies for Indians show the highest correlation with the HapMap CHB+JPT (rho = 0.71), and Mbuti Pygmies correlate best with the HapMap YRI (rho = 0.87). Pairwise LD (r2) values also show a weaker correlation with HapMap populations, relative to the results of our continental groups.
LD patterns in Indians are correlated with LD in the HapMap CHB+JPT and CEU populations to a similar degree (rho = 0.76 and 0.71, respectively) and to a lesser degree with YRI (rho = 0.62). The LD pattern in Mbuti


Table 2 Fourteen genomic regions genotyped in this study

Region    SNPs  Chromosomal position (NCBI build 36)  Gene content a    Distance to telomere/centromere b  Recombination hotspots c
01_chr4   10    chr4:118570829–118604338              Geneless          –                                  1
02_chr2   12    chr2:118396837–118446760              CCDC93            –                                  None
03_chr2   10    chr2:51812762–51860087                Geneless          –                                  1
04_chr4   8     chr4:118704627–118751776              Geneless          –                                  2
05_chr4   10    chr4:118511074–118549903              Geneless          –                                  None
06_chr4   10    chr4:74981921–75037270                Geneless          –                                  None
07_chr6   10    chr6:165635865–165694591              C6orf118, PDE10A  –                                  2
08_chr7   11    chr7:116635430–116686530              ST7               –                                  None
09_chr11  12    chr11:1997573–2054530                 Geneless          2 Mb from telomere                 2
10_chr12  9     chr12:38942446–38976973               LRRK2             2.4 Mb from centromere             None
11_chr16  11    chr16:61666033–61707014               Geneless          –                                  None
12_chr18  9     chr18:23749694–23794966               CDH2              –                                  3
13_chr18  9     chr18:24074314–24115028               Geneless          –                                  1
14_chr18  10    chr18:24120336–24160471               Geneless          –                                  None

a Gene content is determined based on UCSC Gene Predictions track in the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway).
b Only distances less than 5 Mb are shown.
c Positions of recombination hotspots are obtained from HapMap project (http://www.hapmap.org/).

Pygmies is most similar to that in the HapMap YRI population (rho = 0.60; Fig. 4), although the correlation is less than the correlation between AFR and YRI (rho = 0.84; Fig. 2B).

HapMap tagSNP transferability in comparable continental groups

To examine the transferability and tagging efficiency of HapMap tagSNPs in major continental groups, tagSNPs in each genomic region were selected from each HapMap population so that 100% of the known polymorphic SNPs in each region would be captured with r2 ≥ 0.8 in that population. These sets of tagSNPs were then evaluated in each of our continental groups to determine the SNP capture rate: the percentage of SNPs captured at r2 ≥ 0.8 when using a pairwise tagging algorithm. These SNP capture rates show how well the chosen tagSNPs represent haplotype variation in other populations. The tagging efficiency is evaluated by the total number of captured SNPs divided by the number of tagSNPs used, i.e., the number of SNPs captured per tagSNP. By calculating the per-tagSNP capture rate, we effectively normalize for the different numbers of tagSNPs selected from each HapMap population. The more SNPs captured per tagSNP, the more efficient the tagSNP strategy will be.

Fig. 5A shows the SNP capture rate averaged over all 14 regions. TagSNPs selected from HapMap CEU, CHB+JPT, and YRI captured 93, 86, and 94% of SNPs in the corresponding continental groups in our

Fig. 2. Correlations of allele frequencies (A) and LD measures (r2) for all SNP pairs (B) between HapMap populations and corresponding continental groups. Spearman's correlations (rho) are shown.
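The Spearman correlations reported for Fig. 2 compare two vectors measured on the same SNPs (or SNP pairs) in two populations. A minimal pure-Python sketch of that statistic is given below, assuming ties are handled with average ranks; the function names and the example frequencies are hypothetical, and the software used in the study is not stated in this section.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return sxy / (sxx * syy) ** 0.5


# Hypothetical allele frequencies for four SNPs in two populations.
rho = spearman_rho([0.10, 0.40, 0.30, 0.90], [0.20, 0.50, 0.35, 0.80])
```

The same function applies unchanged to vectors of pairwise r2 values, which is how the LD comparison in panel B could be reproduced.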


Fig. 3. Correlation of allele frequency between HapMap populations and (A) Indians; (B) Mbuti Pygmies. Spearman's correlations (rho) are shown.

Fig. 4. Correlation of pairwise LD (r2) between HapMap populations and (A) Indians; (B) Mbuti Pygmies. Spearman's correlations (rho) are shown.
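For readers unfamiliar with the two LD measures compared in the text, the following sketch computes r2 and D′ for one pair of biallelic SNPs. It assumes phased haplotypes coded 0/1, which is a simplification: the study's genotype data are unphased, and the haplotype frequency estimation step is not shown. The function name `ld_stats` and the toy haplotypes are hypothetical.

```python
def ld_stats(hap_a, hap_b):
    """Return (r2, d_prime) for two biallelic SNPs.

    hap_a and hap_b list the allele (0 or 1) carried at each SNP on the
    same set of phased haplotypes. Phased input is an assumption of this
    sketch, not a description of the study's pipeline.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two non-empty haplotype vectors of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


# Toy example: the rarer allele at SNP A only occurs with allele 1 at SNP B.
r2, d_prime = ld_stats([1, 0, 0, 0], [1, 1, 0, 0])
```

In this toy case D′ reaches its maximum of 1 while r2 is only about 0.33, which illustrates the kind of ceiling effect mentioned for D′.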


Fig. 5. HapMap tagSNP transferability and tagging efficiency. (A) HapMap tagSNP transferability in three continental groups (AFR, EAS, and EUR) and two populations (IND and PYG). The average transferability across all 14 regions is shown as bars, and the transferability for each individual region is shown as black dots. For example, the first blue bar in the “AFR” section indicates that tagSNPs selected from the HapMap CEU population captured ~ 60% of the SNPs with r2 ≥ 0.8 in our Africans, on average. (B) HapMap tagSNP tagging efficiency. The average tagging efficiency across all 14 regions is shown as bars, and the tagging efficiency for each region is shown as black dots. For example, the last brown bar in the “PYG” section indicates that, on average, each HapMap YRI tagSNP captured 1.21 SNPs in our African Pygmy samples.
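The two quantities plotted in Fig. 5 can be sketched as follows, assuming a symmetric matrix of pairwise r2 values estimated in the test population and a list of tagSNP indices chosen in a reference population. TagSNPs are counted as captured, which is consistent with the per-tagSNP figures in the text. The function name `capture_stats` and the toy matrix are hypothetical.

```python
def capture_stats(r2, tag_indices, threshold=0.8):
    """Return (capture_rate, snps_per_tag) for one region.

    r2          -- square list of lists with pairwise r2 in the test population
    tag_indices -- indices of the tagSNPs (chosen elsewhere, e.g. in HapMap)
    A SNP counts as captured if it is a tagSNP itself or has
    r2 >= threshold with at least one tagSNP.
    """
    tags = set(tag_indices)
    if not tags:
        raise ValueError("at least one tagSNP is required")
    captured = 0
    for i in range(len(r2)):
        if i in tags or any(r2[i][t] >= threshold for t in tags):
            captured += 1
    return captured / len(r2), captured / len(tags)


# Toy 4-SNP region: SNPs 0 and 1 are in strong LD, SNPs 2 and 3 are not.
toy_r2 = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.5, 1.0],
]
rate, per_tag = capture_stats(toy_r2, [0, 2])
```

As a check on the definition, the reported YRI figures fit this reading: 94.1% of 135 SNPs is about 127 captured SNPs, and 127 divided by 102 tagSNPs is roughly 1.25.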

dataset EUR, EAS, and AFR, respectively. It may seem curious that YRI has the highest total capture rate among three HapMap groups. However, Fig. 5B shows that YRI has the lowest per-tagSNP capture rate among the three tests (1.69, 1.73, and 1.25 in CEU, CHB+JPT, and YRI, respectively). Therefore, the high capture rate in YRI is a result of the larger number of tagSNPs (102 out of 135 of total SNPs) selected in this population, and lower tagging efficiency. When applied to data from continental groups other than the ones from which they were chosen, most HapMap tagSNP sets still captured more than 80% of SNPs, with the exception of the tagSNPs selected from CEU or CHB+JPT, which only captured 66 and 62% of SNPs in AFR, respectively. Interestingly, tagSNPs from CEU show a higher capture rate (90%) in EAS compared to those from CHB+JPT. Closer examination revealed that the CHB+JPT tagSNP set has a higher tagging efficiency (1.73) compared to CEU (1.65), as the CHB+JPT tagSNP set captured more “untyped” SNPs (SNPs that are not selected as tagSNPs) in EAS (Fig. 5B). When each region was examined individually, we found that tagSNP transferability varies considerably among different chromosomal regions (Supplemental Fig. 1). For example, in regions 2 and 10, ~ 30% of the SNPs were selected as tagSNPs in all HapMap populations. In region 2, all three tagSNP sets capture more than 90% of the SNPs in EAS. In contrast, in region 10, they only capture ~ 50% of the SNPs in EAS, reflecting very different LD patterns among populations in this region. To examine variation in tagSNP transferability among regions, we calculated the average SNP capture rate in each of the 14 regions for

each continental group (i.e., capture rate of AFR by HapMap YRI tagSNPs, EAS by CHB+JPT tagSNPs, and EUR by CEU tagSNPs). Regions 5 and 10 have the lowest average SNP capture rates (73% in each region), while the rates in the other 12 regions ranged from 83 to 100%. The low capture rates show no apparent correlation with recombination hotspots, since neither region contains a known recombination hotspot (Table 2). While seven of the other 12 regions do contain known hotspots, they showed no apparent decrease in the capture rate (region 12, for example, contains three hotspots but has an average capture rate of 96%). Distance to the centromere or telomere represents another factor that may influence the LD pattern. Region 10 resides within 3 Mb of the centromere of chromosome 12, and region 5 is not located within 5 Mb of a telomere or centromere. Other factors, such as gene content and GC content, can also influence LD patterns [21]. In our case, region 5 contains no genes, while region 10 is located within the LRRK2 gene (Table 2). Since no apparent genomic pattern can be identified in the two regions with the lowest SNP capture rate, and most of the factors noted above have been shown to account for only a small proportion of the variance in LD [21], much of the variation observed in our regions may be attributed simply to the high level of stochastic variation inherent in the evolutionary process [22].

HapMap tagSNP transferability in Indian and Pygmy population samples

We next evaluated the transferability of HapMap tagSNPs to our tribal Indian and Pygmy samples. As shown in Fig. 5A, YRI, CEU, and


Table 3 HapMap tagSNP transferability and efficiency

Testing Pop.  Reference HapMap Pop.  Total SNPs  % of SNPs captured with r2 > 0.8  No. of SNPs captured by each tagSNP  Mean maximum r2

Pairwise tagging
AFR           CEU                    135         65.9                              1.20                                 0.98
AFR           CHB+JPT                135         62.2                              1.25                                 0.98
AFR           YRI                    135         94.1                              1.25                                 0.98
EAS           CEU                    135         90.4                              1.65                                 0.98
EAS           CHB+JPT                135         85.9                              1.73                                 0.97
EAS           YRI                    135         96.3                              1.27                                 0.99
EUR           CEU                    135         92.6                              1.69                                 0.97
EUR           CHB+JPT                135         81.5                              1.64                                 0.97
EUR           YRI                    135         97.0                              1.28                                 0.99
IND           CEU                    135         92.6                              1.67                                 0.99
IND           CHB+JPT                135         83.0                              1.65                                 0.99
IND           YRI                    135         98.5                              1.29                                 1
IND           CEU+CHB+JPT            135         97.0                              1.38                                 1
PYG           CEU                    134         70.2                              1.27                                 0.98
PYG           CHB+JPT                134         67.2                              1.32                                 0.98
PYG           YRI                    134         93.3                              1.21                                 0.99

Aggressive tagging
AFR           CEU                    135         60.7                              1.17                                 0.99
AFR           CHB+JPT                135         57.0                              1.20                                 0.98
AFR           YRI                    135         85.9                              1.21                                 0.99
EAS           CEU                    135         87.4                              1.69                                 0.98
EAS           CHB+JPT                135         85.2                              1.80                                 0.96
EAS           YRI                    135         97.8                              1.38                                 0.99
EUR           CEU                    135         90.4                              1.74                                 0.96
EUR           CHB+JPT                135         83.0                              1.75                                 0.97
EUR           YRI                    135         97.0                              1.36                                 0.98
IND           CEU                    135         90.4                              1.69                                 0.99
IND           CHB+JPT                135         83.0                              1.72                                 0.99
IND           YRI                    135         97.0                              1.35                                 1
IND           CEU+CHB+JPT            135         96.3                              1.38                                 1
PYG           CEU                    134         65.7                              1.26                                 0.99
PYG           CHB+JPT                134         61.2                              1.26                                 0.98
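Table 3 contrasts pairwise and aggressive tagging. As a rough illustration of the pairwise idea only, the sketch below selects tagSNPs greedily: it repeatedly picks the SNP that covers the most still-untagged SNPs at the chosen r2 threshold. This is a simplified stand-in and not the Haploview implementation; the multimarker step of aggressive tagging is not shown. The function name `greedy_pairwise_tags` and the toy matrix are hypothetical.

```python
def greedy_pairwise_tags(r2, threshold=0.8):
    """Pick tagSNP indices so every SNP has r2 >= threshold with some tag.

    Greedy heuristic (an assumption of this sketch): at each step choose the
    SNP covering the most untagged SNPs; ties go to the lowest index.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []

    def coverage(s):
        # Number of still-untagged SNPs that SNP s would capture (itself included).
        return sum(1 for u in untagged if u == s or r2[s][u] >= threshold)

    while untagged:
        best = max(range(n), key=lambda s: (coverage(s), -s))
        covered = {u for u in untagged if u == best or r2[best][u] >= threshold}
        tags.append(best)
        untagged -= covered
    return tags


# Toy 4-SNP region: only SNPs 0 and 1 are in strong LD with each other.
toy_ld = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.5, 1.0],
]
tags = greedy_pairwise_tags(toy_ld)
```

In the toy region three tags are needed for four SNPs, because only one pair exceeds the threshold; regions with little LD drive the number of tags toward the number of SNPs in the same way.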

CHB+JPT tagSNPs capture 99, 93, and 83% of the total SNPs in our Indian sample, respectively. Because Indian populations are both geographically and genetically intermediate between European and East Asian populations [18,19,23], we combined tagSNPs previously identified in CEU and CHB+JPT into a single set and examined its performance. We found that the combined set had a 97% capture rate with a per-tagSNP capture rate of 1.38 (Table 3). Therefore, the combined set represents a better reference for the Indian sample with less genotyping cost (per-tagSNP capture rate of 1.38 as compared to 1.29 for YRI) and minimum loss of information (97% capture rate compared to 93% for CEU) compared to a single HapMap population. When genotyping savings is the primary concern, the CEU set provides good coverage (93%) with an extra 21% per-tagSNP capture rate (1.67 vs 1.38) compared to the combined set (Table 3).

For the Mbuti Pygmy sample, YRI tagSNPs capture more than 93% of total SNPs, while tagSNPs from CEU and CHB+JPT only capture 70 and 67%, respectively. Therefore, despite the fact that the YRI set has the lowest per-tagSNP capture rate (1.21), it represents the best reference population in terms of maximizing the information gained (Fig. 5B).

Performance of pairwise and aggressive tagging algorithms

Finally, we compared the performance of the pairwise tagging algorithm to the aggressive tagging algorithm provided in Haploview. In addition to the pairwise tagging steps in which the algorithm selects a set of markers to capture all SNPs in a dataset with pairwise r2 larger than a preset threshold [16], the aggressive tagging algorithm also searches for combinations of multiple markers as predictors for certain alleles and removes the redundant individual tagSNPs during the process. Therefore, higher tagging efficiency can be achieved by this algorithm [24]. The two tagging algorithms performed similarly in our dataset (Table 3). This may be due to the relatively small regions (~ 50 kb) in this study, which prevented the multimarker approach from making use of long-range LD.

Discussion

Linkage disequilibrium patterns, and thus tagSNP transferability rates, can be influenced both by the demographic histories of populations and by genomic factors. In accord with other studies, our data show less LD in African than in non-African populations [9], and we find that geographically isolated populations have somewhat lower tagSNP transferability rates. We also observed variation in tagSNP transferability rates among different genomic regions. This may reflect the inherent stochasticity in evolution and the influence of factors that can alter the LD pattern in a region, such as the presence of recombination hotspots, gene content, GC content, and distance relative to centromeres and telomeres.

To date, more than 25 studies have assessed tagSNP transferability in a range of worldwide populations (detailed in Table 4). In the following section, we combine the results of our study with those of other recent studies to compose general guidelines for tagSNP selection based on HapMap populations. Fig. 6 summarizes the guidelines in a flowchart.

If the population under consideration belongs to the same continental group (i.e., sub-Saharan Africa, Europe, and East Asia) as one of the HapMap populations, it is intuitive to choose tagSNPs from that HapMap population. Results from this study (Fig. 5) and other studies analyzing a number of worldwide populations support this approach [13,25–31]. In a study using the CEPH Human Genome Diversity Panel (HGDP-CEPH) [28], tagSNPs were picked from HapMap samples to capture all SNPs at r2 > 0.85. The HapMap population located geographically closest to the population to be tagged yielded the best results for most populations except for Mayans (best results from CEU set) and Mozabites (best results from YRI set). This result may reflect recent European admixture in Mayans and African ancestry in Mozabites.
Populations from another worldwide collection, the ALlele FREquency Database (ALFRED) with ~ 2000 individuals from 38 populations, have also been evaluated [31]. Instead of looking at the portability of the tagSNPs, the authors developed an algorithm to utilize tagSNPs to reconstruct untyped SNPs in other populations. Their results indicate that, proceeding eastward from Africa, the western population in two adjacent populations can generally be used as a reference for its eastern neighbor. The exceptions are populations that are known to have been isolated for many years, such as Samaritans or Pacific Islanders. Interestingly, Paschou et al. [31] found that due to its high genetic diversity, the African-American population is the only one that can be used to predict untyped SNPs in almost all other populations in the sample. In addition to studies that treat populations from multiple continental groups, several studies have focused on specific continental groups or populations [32–38]. These results, summarized in Table 4, suggest that in most cases, tagSNPs selected from the HapMap CEU and CHB+JPT populations can capture more than 80% of SNP variation in European and East Asian populations, respectively. TagSNPs selected from YRI usually capture more SNPs in sub-Saharan populations than tagSNPs from CEU or CHB+JPT. Nevertheless, due to the higher genetic diversity and lower LD in African populations [7,39–41], fewer SNPs can be tagged in sub-Saharan African populations compared to European and Asian groups, given the same number of tagSNPs. As a general rule, if the population under consideration belongs to the same continental group as one of the HapMap populations, tagSNPs chosen from that HapMap population will work well (Table 5). In some cases, study samples do not correspond well to a HapMap continental group, such as populations in the Middle East or America.


Table 4 A summary of tagSNP transferability studies Year

No. of No. of Populations populations individuals

2003

3

96 trios

2004

5

1635

2004

3

242

2005 44

1262

2005

~ 1200

9

Chinese, Malaysian, Utah CEPH Gambian, British, Norwegian, Finnish, Romanian UK Caucasian, African-American, CEPH European CEPH Human Genome Diversity Panel (HGDP-CEPH) 9 European populations

Regions

No. of SNPs

Conclusion

Reference

SCN1A gene

31

[17]

VDR gene region, 94 kb

55

Chr20, 10 Mb

2139

TagSNPs chosen from CEPH work poorly in Malay or Chinese. TagSNPs should be chosen from closely related populations. TagSNPs chosen from each European population can capture most SNPs in other European populations, but performed poorly in Gambians. TagSNPs selected from UK Caucasians can capture 96 and 84% of haplotypes in CEPH Europeans and African Americans, respectively.

CTLA4 gene, 14 kb

17

With 2 to 4 tagSNPs, tagSNP sets work well within continental groups, but work poorly across continental groups.

[53]

4 genes, 749 kb

100

TagSNPs selected from HapMap CEU captured more than 70% of SNPs in three genes for most populations (except two in LMNA gene), but only two populations in the PLAU gene. The geographically nearest HapMap population usually yields the best tagSNPs for target populations. Populations with low LD, especially African populations, require higher tagSNP density. TagSNPs are highly informative in populations within the same continental group and often efficient for more distant and differentiated populations. TagSNPs transfer better from “older” and more diverse populations to “younger” populations. TagSNPs selected from HapMap populations capture the majority of common haplotypes in many other populations and provide good power for association study involving common variants.

[32]

[26]

~ 1400

TagSNPs selected from East Asia populations are portable within the group. Fst between populations can be used to evaluate the portability of tagSNPs. Using tagSNPs from CEU, ~ 80% or more of SNPs were captured in non-African populations, but only 50% in African Americans. TagSNPs selected from the four populations have similar power among these populations in simulated association studies. HapMap CEU samples provide an adequate basis for tagSNPs selection in Finnish individuals. TagSNPs selected from HapMap CEU tagged more than 70% of SNPs in 64 genes in a Spanish population (> 80% in 58 genes). HapMap CEU tagSNPs capture more than 90% of SNPs in Estonians.

~ 800 792

TagSNPs from HapMap CEU data captured 98% of SNPs in the cohort. ~ 90% transferability from HapMap CHB+JPT to Korean.

[36] [38]

HapMap CEU will be useful for tagSNP selection in Australians with European ancestry. ~ 110,000 Over 98% of Kosraen haplotypes are present in the HapMap CEU, JPT and CHB populations. 166 drug-related 861 TagSNPs chosen from HapMap CHB+JPT captured 98% genes of Thai SNPs. 3 ENCODE regions, 886 TagSNPs chosen from HapMap CHB+JPT captured more 500 kb each than 80% of Korean SNPs in all three regions. 6 regions, ~ 2.6 Mb 248 Moving out of Africa, the western populations can be used as references to reconstruct “untyped” SNPs in their eastern neighbors, with the exception of isolated populations. 3 genes, 12 kb ~ 60 HapMap CEU works well for some Indian groups, not for others. Chr21, 3.3 Mb 3188 46% of SNPs in Sami are not present in HapMap dataset, and 43% of the Sami-unique SNPs are not tagged by HapMap CEU tagSNPs. 40 kb central regions ~ 627 TagSNPs chosen from HapMap CHB or JPT captured more than 80% of 10 ENCODE regions of Cebu Filipinos SNPs. Chr22, 8 Mb 771 HapMap CEU is sufficient for tagSNP selection in Sardinians.

[37]

2006 52

927

HGDP-CEPH

36 regions, ~ 12 Mb

2834

2006 38

1055

HGDP-CEPH

Chr22, 1 Mb

144

2006 38

~ 2000

15

869

10 regions, 338 kb 25 genes, 2.6 Mb

134

2006

2006

7

318

ALlele FREquency Database (ALFRED) 4 HapMap populations, 3 HGDP populations, 6 Multiethnic Cohort (MEC) populations, Finnish and African-American European, African, 5 East Asian populations

entire Chr21

19060

2006

7

396

61 genes, 5.7 Mb

2783

2006

4

185

Chr20, 10 Mb

2006

1

1425

CEU, 5 MEC populations, Chinese Caucasian, CEPH, Han Chinese, Japanese Finnish

1012– 2100 956

2006

1

845

Spanish

2006

1

1054

Estonian

2006 2006

1 1

44 90

European Korean

2006

1

359

Australian

2006

1

30

Kosraen trios

2006

1

280

Thai

2006

1

90

Korean

2007 38

1979

ALFRED

2007 2007

10 1

320 22

Indian Sami

2007

1

80

Filipino

2008

1

101

Sardinian

Chr14, 17.9 Mb 66 cancer-associated genes, ~ 7 Mb Two ENCODE regions, 500 kb each 4 regions, 14.4 Mb Chr7 ENCODE region, 500 kb Chr 6, 3.7 Mb; Chr10, 1.3 Mb Whole genome

1679

491

633

To test the HapMap tagSNP transferability in these populations, we examined a South Indian tribal population sample as a representative. Our results indicate that a combination of tagSNPs selected from CEU and CHB+JPT captures more than 95% of SNPs in the Indian population. This supports the use of HapMap populations as references for populations whose geographic regions are not represented in the HapMap samples, albeit with higher genotyping cost. A number of other studies showed that using the geographically nearest reference population or a combination of adjacent populations as a reference usually gives the best results for these populations (Table 4)

[30]

[25]

[28]

[13]

[14] [27]

[29] [54] [33] [34] [35]

[42] [55] [56] [31]

[47] [46] [57] [58]

[13,14,28,29,42]. Specifically, HapMap YRI and/or CEU provide good portability for Middle East populations. TagSNPs selected from CEU have a better capture rate for populations from Central and South Asian regions than CHB+JPT. In Oceania, the HapMap CHB+JPT population can serve as a good reference for Papuans, Melanesians, Micronesians, and Native Hawaiians. The HapMap CHB+JPT population can also be used as a reference for many Native American populations. It is noteworthy that, due in part to recent admixture between Native American and European populations, HapMap CEU sometimes serves as a better reference than CHB+JPT for Native American populations [28,29] (Table 5).
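The combined-reference strategy discussed above (for example, pooling CEU and CHB+JPT tagSNPs for the Indian sample) can be sketched as a comparison of capture rate and per-tagSNP efficiency for each single reference set and for their union. The sketch assumes a pairwise r2 matrix for the target population and tagSNP index lists chosen in each reference; all names and the toy data are hypothetical.

```python
def capture_summary(r2, tags, threshold=0.8):
    """Capture rate and SNPs captured per tagSNP for one tagSNP set."""
    tags = sorted(set(tags))
    if not tags:
        raise ValueError("at least one tagSNP is required")
    captured = sum(
        1 for i in range(len(r2))
        if i in tags or any(r2[i][t] >= threshold for t in tags)
    )
    return {
        "n_tags": len(tags),
        "capture_rate": captured / len(r2),
        "per_tag": captured / len(tags),
    }


def compare_references(r2_target, tag_sets, threshold=0.8):
    """Summarize each reference tag set plus the union of all of them.

    tag_sets maps a reference label to the tagSNP indices selected there.
    The union is reported under the key "combined".
    """
    results = {
        name: capture_summary(r2_target, tags, threshold)
        for name, tags in tag_sets.items()
    }
    union = set()
    for tags in tag_sets.values():
        union.update(tags)
    results["combined"] = capture_summary(r2_target, union, threshold)
    return results


# Toy target population: two LD pairs (0-1 and 2-3) and one singleton SNP (4).
toy_target = [
    [1.0, 0.9, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.85, 0.1],
    [0.1, 0.1, 0.85, 1.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 1.0],
]
summary = compare_references(toy_target, {"ref_A": [0, 4], "ref_B": [2, 4]})
```

The toy output shows the trade-off described in the text: the union captures more SNPs than either set alone but uses more tagSNPs, so coverage and genotyping cost have to be weighed together.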


Fig. 6. A flow chart for tagSNP selection using HapMap populations.
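The region-level recommendations stated in the text can be written as a small lookup. This is a paraphrase of the prose and of Table 5 at the level of continental regions, not a transcription of the flow chart itself, whose contents are not reproduced here. The region labels, the dictionary, and the function name are hypothetical, and individual populations can deviate from their region (the tribal Indian sample, for instance, was best served by the combined CEU and CHB+JPT set).

```python
# Assumption: this mapping condenses the text's region-level guidance.
# Where two references are listed, the text reports that either or both
# may work, so the choice should be checked against pilot data.
REFERENCE_BY_REGION = {
    "sub-saharan africa": ("YRI",),
    "europe": ("CEU",),
    "east asia": ("CHB+JPT",),
    "middle east": ("CEU", "YRI"),       # "YRI and/or CEU" in the text
    "central/south asia": ("CEU",),      # CEU reported to outperform CHB+JPT
    "oceania": ("CHB+JPT",),
    "america": ("CHB+JPT", "CEU"),       # CEU sometimes better (admixture)
}


def suggest_reference(region):
    """Return candidate HapMap reference populations for a region label."""
    key = region.strip().lower()
    if key not in REFERENCE_BY_REGION:
        raise ValueError(f"no guideline recorded for region: {region!r}")
    return REFERENCE_BY_REGION[key]


candidates = suggest_reference("Middle East")
```

A lookup like this is only a starting point; for isolated or admixed populations the text recommends evaluating transferability directly.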

For comparison with the HapMap populations, we have focused here on continental population groups. However, because genetic variation is often distributed in a clinal fashion, continents are not always the optimal units for grouping populations [43]. For example, West Asian populations may be genetically more similar to the HapMap CEU than the CHB+JPT samples. Recently, the International HapMap Consortium has proposed to extensively genotype and sequence samples from seven additional populations of diverse origins [11]. The additional information in these populations will improve tagSNP performance in populations that are not well represented by the three HapMap groups. Because they may exhibit reduced genetic and environmental heterogeneity, isolated populations are thought to have a number of advantages when searching for genes related to complex diseases [44]. To gauge the portability and tagging efficiency of HapMap tagSNPs to isolated populations, we evaluated the tagSNP transferability in African Mbuti Pygmies. Genetically, Mbuti Pygmies are distinct from other African populations [45] and are often identified as a separate population from other Africans in genetic structure analyses [18,19]. Previous analyses have shown that the Mbuti Pygmy sample used here is genetically similar to the much smaller Mbuti Pygmy sample included in the CEPH Diversity Panel [18,19]. The Fst value of 4% between YRI and Mbuti Pygmies, obtained in this

study, confirms a substantial genetic difference between these populations. Nevertheless, YRI still serves as a sufficient reference population in terms of tagSNP selection, yielding a capture rate of more than 90%, albeit with a low tagging efficiency (1.21 per-tagSNP capture rate).

Other studies of isolated populations have shown varying degrees of transferability. Paschou et al. [31] found that in populations isolated for many years, like Samaritans or Pacific Islanders, genotypes cannot be reconstructed faithfully from tagSNPs selected from populations within the same continent. However, tagSNPs selected from African-Americans can better predict untyped SNPs in these populations [31]. Johansson et al. [46] investigated the transferability of HapMap tagSNPs in the Sami population of northern Europe. When tagSNPs were selected from CEU with r2 > 0.8, only about 70% of the Sami SNPs were tagged, a percentage similar to the capture rate realized with the same number of randomly selected SNPs in the Sami. The low capture rate in this study may be caused by the difference in allele-frequency distributions in the two populations, since the untagged SNPs in Sami have significantly lower heterozygosity and minor allele frequencies compared to the tagged SNPs. Roy et al. [47] showed that tagSNPs selected with r2 > 0.8 from every population (including Europeans) can capture 70 to 100% of haplotype diversity in other populations, with the exception of Manipuri Brahmin. However, the small data set

J. Xing et al. / Genomics 92 (2008) 41–51 Table 5 General guideline for tagSNP reference population selection Continental regions Target population

Reference Reference population

Sub-Saharan Africa

African-American Bantu Speaker Biaka Pygmy Mandenka Mbuti Pygmy

YRI YRI YRI YRI YRI

San Yoruba Ibo Ethiopian Jews Bedouin Druze Mozabite Palestinian Adygei Australian with European ancestry Basque British Italian Estonian Finn French German Norwegian Orcadian Romanian Russian Sami Sardinian Spanish Balochi Brahui Burusho Hazara Indian

YRI YRI YRI YRI CEU/YRI CEU/YRI YRI CEU/YRI CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU CEU+CHB+ JPT CEU CEU CEU CEU CEU CHB+JPT CHB+JPT CHB+JPT

Middle East

Europe

Central/South Asia

East/Southeast Asia

Oceania

America

Kalash Makrani Pathan Sindhi Uyghur Cambodian Han Chinese Northern Chinese (Daur, Hezhen, Mongola, Oroquen, Tu, Xibo) Southern Chinese (Ami, Atayal, Dai, Lahu, Miao, Naxi, Taiwanese, Tujia, She, Wa, Yi, Zhang) Hakka Japanese Korean Yakut Filipino Thai Melanesian Papuan Native Hawaiian Micronesians Colombian Karitiana Latino Maya Pima Surui

[14,27,29] [13,14,28] [13,14,28] [13,28] [This study, 13,14,28] [13,28] [13,14,27,28] [14] [14] [13,28] [13,14,28] [13,28] [13,28] [13,14,28] [37] [13,28] [25,30] [13,28,32] [32,35] [14,27,30,33] [13,28] [32] [30] [13,28] [30] [13,14,28] [46] [13,28,58] [34] [13,28] [13,28] [13,28] [13,28] [This study, 47] [13,28] [13,28] [13,28] [13,28] [26,28] [13,14,28] [13,14,26,27,28] [28]

CHB+JPT

[26,28]

CHB+JPT CHB+JPT CHB+JPT CHB+JPT CHB/JPT CHB+JPT CHB+JPT CHB+JPT CEU CEU/CHB/ JPT CHB+JPT CHB+JPT CEU CEU CHB+JPT CHB+JPT

[14] [13,14,27,28] [38,56] [13,14,28] [57] [55] [13,28] [13,28] [29] [14,42] [13,28] [13,14,28] [29] [13,14,28] [13,14,28] [13,14,28]

size (a single region of ~ 20 kb containing ~ 20 SNPs) and sample size (e.g., 11 Manipuri Brahmin individuals) in this study do not permit generalization of their results. Collectively, these results indicate that the portability of tagSNPs for isolated populations varies among populations and regions. In

49

some cases, only half of the variation in a target population can be captured. In such situations, several strategies have been proposed to improve tagSNP performance. For example, a combined set of tagSNPs (“cosmopolitan tagSNPs”) from multiple populations can be used to increase tag capture rates in distinct populations [27,48]. Another approach is to increase the tagSNP selection stringency (e.g., selecting tagSNPs using r2 = 0.9 instead of 0.8 as the threshold). A drawback of these approaches is that more tagSNPs have to be genotyped, lowering the tagging efficiency. Another strategy is to use populations other than the HapMap samples. Because closely related populations generally yield better tagging efficiency, Fst can be calculated among populations to determine which known population should serve as the best reference population [14,26]. In some cases, using a genetically diverse population (e.g., African-Americans) as a reference may improve the performance of tagSNPs [31]. Lastly, if no appropriate reference population has been surveyed, a small number of individuals from the target population can be sequenced in the regions of interest, and tagSNPs can be selected specifically for that population [12–14]. There are several potential pitfalls when using HapMap populations as references. First, the HapMap project is designed for the optimal capture of common variants in populations [9]. As a result, the allele frequency distribution of HapMap SNPs is skewed toward intermediate frequencies. Rare variants are poorly represented and may not be tagged by tagSNPs selected from HapMap populations [31,48]. Also, tagSNPs are likely to miss other types of variants, including insertion/deletion polymorphisms and structural variants, which are not included in the HapMap project [9]. 
An investigation of a ~30 kb deletion polymorphism in the APOBEC gene region [49] showed that, despite the presence of the deletion in nearly 40% of the world's population, no suitable tagSNPs could be selected for this variant from the HapMap Phase I data. Therefore, if the SNP allele frequency distribution in the target population differs markedly from that of the HapMap populations, or a study involves indels or rare variants, caution is needed when using HapMap tagSNPs.

In addition, tagSNP transferability is expected to vary across genomic regions. The stochastic nature of genome evolution and a number of genomic factors can influence variation in LD patterns, and thus tagSNP transferability. In any case, a good understanding of the genetic background, migration history, and allele frequency distribution of the target population will help in the tagSNP selection process.

With the rapid development of sequencing and genotyping technologies and ever-decreasing costs, more and more researchers are using microarray-based whole-genome SNP genotyping or even resequencing of target regions for association studies. Nevertheless, the whole-genome approach is still expensive, particularly when many thousands of cases and controls are needed to detect alleles with small effects [6]. Therefore, a detailed understanding of population history and the transferability of tagSNPs will remain an important component of human genetic studies for years to come.

Materials and methods

Genomic regions and SNPs

Fourteen genomic regions on eight chromosomes were genotyped. Each region is about 50 kb in length, and noncoding SNPs were selected to cover each region at an average density of one SNP per 5 kb. Table 2 describes the position and properties (e.g., gene content) of the 14 regions. These regions were initially selected to examine the effect of recently fixed Alu elements on homologous recombination. Extensive analyses revealed that the Alu elements had little or no effect on the local recombination rate (D.J. Witherspoon et al., unpublished data).

The SNPs were genotyped in a total of 351 individuals. The human population samples used for this study have been described previously [19,50]. After genotyping, 26 individuals lacking genotypes at more than 50% of the typed loci were excluded from the subsequent analysis. The final dataset was composed of genotypes from 325 individuals with a missing data rate of 2.8%. All SNPs were genotyped using the ABI SNaPshot multiplex system (Applied Biosystems, Foster City, CA). The SNP rs numbers and genotypes in each individual are shown in Supplemental Table 1. SNP loci that deviated strongly from Hardy-Weinberg equilibrium (rs508897; chi-square test, P < 0.000001 in Africa), that had missing genotypes in one HapMap population (rs2311717), or that were fixed in any population (nine total) were removed before the analysis. The final number of SNPs used in each analysis is shown in Table 3.

HapMap genotypes for all of our selected SNPs were obtained from the HapMap website (release 16c.1 of phase I, June 2005). These SNPs were genotyped in 209 unrelated individuals (60 Yoruba, 60 Utah residents with northern and western European ancestry, and 89 East Asians of Chinese and Japanese descent).

Data analysis

Fst estimates between populations were calculated by the method described by Weir and Cockerham [51]. When population differentiation is weak, this method can produce negative Fst values because of sampling error. In such cases, the Fst value was set to zero. Measures of LD between pairs of SNP loci (r² and D′) were calculated by Haploview (http://www.broad.mit.edu/mpg/haploview), using the confidence-interval method, which accepts unphased genotypes as input [52].

TagSNPs were selected from each HapMap population using the Tagger program [24] in Haploview with the pairwise and aggressive tagging options. We selected the most commonly used standard (r² ≥ 0.8 between tag and tagged SNPs as both the selection and the evaluation threshold) to evaluate tagSNP transferability. That is, tagSNPs were selected from each HapMap population so that 100% of the polymorphic SNPs that we genotyped in each region would be captured with r² ≥ 0.8 in that population. These sets of tagSNPs were then evaluated in each of our continental groups to determine the SNP capture rate: the percentage of SNPs captured at r² ≥ 0.8 when using a pairwise tagging algorithm.
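The select-then-evaluate procedure described above can be illustrated with a minimal Python sketch. It is not the Haploview/Tagger implementation: the function names (pairwise_r2, greedy_tags, capture_rate) and the toy genotype matrices are hypothetical, r² is approximated here as the squared Pearson correlation between unphased 0/1/2 genotype counts, and the greedy selection is only a simple stand-in for a pairwise tagging algorithm. All SNPs are assumed to be polymorphic in both samples, as in the filtered data set.

```python
import numpy as np

def pairwise_r2(genotypes):
    """Approximate r^2 between SNPs as the squared Pearson correlation
    of 0/1/2 genotype counts (rows = individuals, columns = SNPs).
    Assumption: every SNP is polymorphic, so no column is constant."""
    g = np.asarray(genotypes, dtype=float)
    return np.corrcoef(g, rowvar=False) ** 2

def greedy_tags(r2, threshold=0.8):
    """Greedy pairwise tagging (illustrative stand-in, not Tagger):
    repeatedly pick the SNP that captures the most still-untagged SNPs."""
    covers = r2 >= threshold
    np.fill_diagonal(covers, True)          # every SNP captures itself
    untagged = np.ones(r2.shape[0], dtype=bool)
    tags = []
    while untagged.any():
        gain = (covers & untagged).sum(axis=1)
        best = int(np.argmax(gain))
        tags.append(best)
        untagged &= ~covers[best]
    return tags

def capture_rate(r2, tags, threshold=0.8):
    """Fraction of SNPs whose best r^2 with any tag reaches the threshold."""
    best = r2[:, tags].max(axis=1)
    return float((best >= threshold).mean())

# Toy reference sample: SNPs 0/1 and SNPs 2/3 are in perfect LD.
reference = np.array([
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [2, 2, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 2, 2],
    [2, 2, 2, 2],
])
# Toy target sample: SNP 3 is no longer in strong LD with SNP 2.
target = np.array([
    [0, 0, 0, 2],
    [1, 1, 0, 0],
    [2, 2, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 2, 1],
    [2, 2, 2, 2],
])

tags = greedy_tags(pairwise_r2(reference))
rate_reference = capture_rate(pairwise_r2(reference), tags)
rate_target = capture_rate(pairwise_r2(target), tags)
print(tags, rate_reference, rate_target)
```

In this toy example the two tags chosen in the reference sample capture all four SNPs there, but only three of the four in the target sample, because the fourth SNP is no longer in strong LD with its tag.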

Acknowledgments

The authors thank the two anonymous reviewers for their constructive and valuable comments. We also thank Elizabeth Marchani for her useful comments during the preparation of this manuscript. This work was supported by grants from the National Science Foundation (BCS-0218370) and the National Institutes of Health (GM-59290 and HL-070048).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ygeno.2008.03.011.

References

[1] J. Hästbacka, A. de la Chapelle, M.M. Mahtani, G. Clines, M.P. Reeve-Daly, M. Daly, B.A. Hamilton, et al., The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping, Cell 78 (1994) 1073–1087.
[2] E.G. Puffenberger, E.R. Kauffman, S. Bolk, T.C. Matise, S.S. Washington, M. Angrist, J. Weissenbach, et al., Identity-by-descent and association mapping of a recessive gene for Hirschsprung disease on human chromosome 13q22, Hum. Mol. Genet. 8 (1994) 1217–1225.
[3] J.N. Feder, A. Gnirke, W. Thomas, Z. Tsuchihashi, D.A. Ruddy, A. Basava, F. Dormishian, R. Domingo Jr., M.C. Ellis, A. Fullan, L.M. Hinton, N.L. Jones, B.E. Kimmel, G.S. Kronmal, P. Lauer, V.K. Lee, D.B. Loeb, F.A. Mapa, E. McClelland, N.C. Meyer, G.A. Mintier, N. Moeller, T. Moore, E. Morikang, R.K. Wolff, et al., A novel MHC class I-like gene is mutated in patients with hereditary haemochromatosis, Nat. Genet. 13 (1996) 399–408.
[4] L.B. Jorde, Linkage disequilibrium and the search for complex disease genes, Genome Res. 10 (2000) 1435–1444.
[5] R.J. Klein, C. Zeiss, E.Y. Chew, J.-Y. Tsai, R.S. Sackler, C. Haynes, A.K. Henning, J.P. SanGiovanni, S.M. Mane, S.T. Mayne, M.B. Bracken, F.L. Ferris, J. Ott, C. Barnstable, J. Hoh, Complement factor H polymorphism in age-related macular degeneration, Science 308 (2005) 385–389.
[6] Wellcome Trust Case Control Consortium, Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls, Nature 447 (2007) 661–678.
[7] D.E. Reich, M. Cargill, S. Bolk, J. Ireland, P.C. Sabeti, D.J. Richter, T. Lavery, R. Kouyoumjian, S.F. Farhadian, R. Ward, E.S. Lander, Linkage disequilibrium in the human genome, Nature 411 (2001) 199–204.
[8] G.C. Johnson, L. Esposito, B.J. Barratt, A.N. Smith, J. Heward, G. Di Genova, H. Ueda, H.J. Cordell, I.A. Eaves, F. Dudbridge, R.C. Twells, F. Payne, W. Hughes, S. Nutland, H. Stevens, P. Carr, E. Tuomilehto-Wolf, J. Tuomilehto, S.C. Gough, D.G. Clayton, J.A. Todd, Haplotype tagging for the identification of common disease genes, Nat. Genet. 29 (2001) 233–237.
[9] D. Altshuler, L.D. Brooks, A. Chakravarti, F.S. Collins, M.J. Daly, P. Donnelly, A haplotype map of the human genome, Nature 437 (2005) 1299–1320.
[10] International HapMap Consortium, The International HapMap Project, Nature 426 (2003) 789–796.
[11] K.A. Frazer, D.G. Ballinger, D.R. Cox, D.A. Hinds, L.L. Stuve, R.A. Gibbs, J.W. Belmont, A. Boudreau, P. Hardenbol, S.M. Leal, S. Pasternak, D.A. Wheeler, T.D. Willis, F. Yu, H. Yang, C. Zeng, Y. Gao, H. Hu, W. Hu, C. Li, W. Lin, S. Liu, H. Pan, X. Tang, J. Wang, W. Wang, J. Yu, B. Zhang, Q. Zhang, H. Zhao, H. Zhao, J. Zhou, S.B. Gabriel, R. Barry, B. Blumenstiel, A. Camargo, M. Defelice, M. Faggart, M. Goyette, S. Gupta, J. Moore, H. Nguyen, R.C. Onofrio, M. Parkin, J. Roy, E. Stahl, E. Winchester, L. Ziaugra, D. Altshuler, Y. Shen, Z. Yao, W. Huang, X. Chu, Y. He, L. Jin, Y. Liu, Y. Shen, W. Sun, H. Wang, Y. Wang, Y. Wang, X. Xiong, L. Xu, M.M. Waye, S.K. Tsui, H. Xue, J.T. Wong, L.M. Galver, J.B. Fan, K. Gunderson, S.S. Murray, A.R. Oliphant, M.S. Chee, A. Montpetit, F. Chagnon, V. Ferretti, M. Leboeuf, J.F. Olivier, M.S. Phillips, S. Roumy, C. Sallee, A. Verner, T.J. Hudson, P.Y. Kwok, D. Cai, D.C. Koboldt, R.D. Miller, L. Pawlikowska, P. Taillon-Miller, M. Xiao, L.C. Tsui, W. Mak, Y.Q. Song, P.K. Tam, Y. Nakamura, T. Kawaguchi, T. Kitamoto, T. Morizono, A. Nagashima, Y. Ohnishi, et al., A second generation human haplotype map of over 3.1 million SNPs, Nature 449 (2007) 851–861.
[12] J.K. Pritchard, M. Przeworski, Linkage disequilibrium in humans: models and data, Am. J. Hum. Genet. 69 (2001) 1–14.
[13] A. Gonzalez-Neira, X. Ke, O. Lao, F. Calafell, A. Navarro, D. Comas, H. Cann, S. Bumpstead, J. Ghori, S. Hunt, P. Deloukas, I. Dunham, L.R. Cardon, J. Bertranpetit, The portability of tagSNPs across populations: a worldwide survey, Genome Res. 16 (2006) 323–330.
[14] S. Gu, A.J. Pakstis, H. Li, W.C. Speed, J.R. Kidd, K.K. Kidd, Significant variation in haplotype block structure but conservation in tagSNP patterns among global populations, Eur. J. Hum. Genet. 15 (2007) 302–312.
[15] D. Thompson, D. Stram, D. Goldgar, J.S. Witte, Haplotype tagging single nucleotide polymorphisms and association studies, Hum. Hered. 56 (2003) 48–55.
[16] C.S. Carlson, M.A. Eberle, M.J. Rieder, Q. Yi, L. Kruglyak, D.A. Nickerson, Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium, Am. J. Hum. Genet. 74 (2004) 106–120.
[17] M.E. Weale, C. Depondt, S.J. Macdonald, A. Smith, P.S. Lai, S.D. Shorvon, N.W. Wood, D.B. Goldstein, Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping, Am. J. Hum. Genet. 73 (2003) 551–565.
[18] M.J. Bamshad, S. Wooding, W.S. Watkins, C.T. Ostler, M.A. Batzer, L.B. Jorde, Human population genetic structure and inference of group membership, Am. J. Hum. Genet. 72 (2003) 578–589.
[19] W.S. Watkins, A.R. Rogers, C.T. Ostler, M.J. Bamshad, A.E. Brassington, M.L. Carroll, S.V. Nguyen, J.A. Walker, M.A. Batzer, L.B. Jorde, Genetic variation among world populations: inferences from 100 Alu insertion polymorphisms, Genome Res. 13 (2003) 1607–1618.
[20] D.M. Evans, L.R. Cardon, A comparison of linkage disequilibrium patterns and estimated population recombination rates across multiple populations, Am. J. Hum. Genet. 76 (2005) 681–687.
[21] A.V. Smith, D.J. Thomas, H.M. Munro, G.R. Abecasis, Sequence features in regions of weak and strong linkage disequilibrium, Genome Res. 15 (2005) 1519–1534.
[22] M. Nordborg, S. Tavare, Linkage disequilibrium: what history has to tell us, Trends Genet. 18 (2002) 83–90.
[23] H. Vishwanathan, E. Deepa, R. Cordaux, M. Stoneking, M.V. Usha Rani, P.P. Majumder, Genetic structure and affinities among tribal populations of southern India: a study of 24 autosomal DNA markers, Ann. Hum. Genet. 68 (2004) 128–138.
[24] P.I. de Bakker, R. Yelensky, I. Pe'er, S.B. Gabriel, M.J. Daly, D. Altshuler, Efficiency and power in genetic association studies, Nat. Genet. 37 (2005) 1217–1223.
[25] X. Ke, C. Durrant, A.P. Morris, S. Hunt, D.R. Bentley, P. Deloukas, L.R. Cardon, Efficiency and consistency of haplotype tagging of dense SNP maps in multiple samples, Hum. Mol. Genet. 13 (2004) 2557–2565.
[26] W. Huang, Y. He, H. Wang, Y. Wang, Y. Liu, Y. Wang, X. Chu, Y. Wang, L. Xu, Y. Shen, X. Xiong, H. Li, B. Wen, J. Qian, W. Yuan, C. Zhang, Y. Wang, H. Jiang, G. Zhao, Z. Chen, L. Jin, Linkage disequilibrium sharing and haplotype-tagged SNP portability between populations, Proc. Natl. Acad. Sci. U. S. A. 103 (2006) 1418–1421.
[27] P.I. de Bakker, N.P. Burtt, R.R. Graham, C. Guiducci, R. Yelensky, J.A. Drake, T. Bersaglieri, K.L. Penney, J. Butler, S. Young, R.C. Onofrio, H.N. Lyon, D.O. Stram, C.A. Haiman, M.L. Freedman, X. Zhu, R. Cooper, L. Groop, L.N. Kolonel, B.E. Henderson, M.J. Daly, J.N. Hirschhorn, D. Altshuler, Transferability of tag SNPs in genetic association studies in multiple populations, Nat. Genet. 38 (2006) 1298–1303.
[28] D.F. Conrad, M. Jakobsson, G. Coop, X. Wen, J.D. Wall, N.A. Rosenberg, J.K. Pritchard, A worldwide survey of haplotype variation and linkage disequilibrium in the human genome, Nat. Genet. 38 (2006) 1251–1260.
[29] P.I. de Bakker, R.R. Graham, D. Altshuler, B.E. Henderson, C.A. Haiman, Transferability of tag SNPs to capture common genetic variation in DNA repair genes across multiple populations, Pac. Symp. Biocomput. (2006) 478–486.
[30] S. Nejentsev, L. Godfrey, H. Snook, H. Rance, S. Nutland, N.M. Walker, A.C. Lam, C. Guja, C. Ionescu-Tirgoviste, D.E. Undlien, K.S. Ronningen, E. Tuomilehto-Wolf, J. Tuomilehto, M.J. Newport, D.G. Clayton, J.A. Todd, Comparative high-resolution analysis of linkage disequilibrium and tag single nucleotide polymorphisms between populations in the vitamin D receptor gene, Hum. Mol. Genet. 13 (2004) 1633–1639.
[31] P. Paschou, M.W. Mahoney, A. Javed, J.R. Kidd, A.J. Pakstis, S. Gu, K.K. Kidd, P. Drineas, Intra- and interpopulation genotype reconstruction from tagging SNPs, Genome Res. 17 (2007) 96–107.
[32] J.C. Mueller, E. Lohmussaar, R. Magi, M. Remm, T. Bettecken, P. Lichtner, S. Biskup, T. Illig, A. Pfeufer, J. Luedemann, S. Schreiber, P. Pramstaller, I. Pichler, G. Romeo, A. Gaddi, A. Testa, H.E. Wichmann, A. Metspalu, T. Meitinger, Linkage disequilibrium patterns and tagSNP transferability among European populations, Am. J. Hum. Genet. 76 (2005) 387–398.
[33] C.J. Willer, L.J. Scott, L.L. Bonnycastle, A.U. Jackson, P. Chines, R. Pruim, C.W. Bark, Y.Y. Tsai, E.W. Pugh, K.F. Doheny, L. Kinnunen, K.L. Mohlke, T.T. Valle, R.N. Bergman, J. Tuomilehto, F.S. Collins, M. Boehnke, Tag SNP selection for Finnish individuals based on the CEPH Utah HapMap database, Genet. Epidemiol. 30 (2006) 180–190.
[34] G. Ribas, A. Gonzalez-Neira, A. Salas, R.L. Milne, A. Vega, B. Carracedo, E. Gonzalez, E. Barroso, L.P. Fernandez, P. Yankilevich, M. Robledo, A. Carracedo, J. Benitez, Evaluating HapMap SNP data transferability in a large-scale genotyping project involving 175 cancer-associated genes, Hum. Genet. 118 (2006) 669–679.
[35] A. Montpetit, M. Nelis, P. Laflamme, R. Magi, X. Ke, M. Remm, L. Cardon, T.J. Hudson, A. Metspalu, An evaluation of the performance of tag SNPs derived from HapMap in a Caucasian population, PLoS Genet. 2 (2006) e27.
[36] E.M. Smith, X. Wang, J. Littrell, J. Eckert, R. Cole, A.H. Kissebah, M. Olivier, Comparison of linkage disequilibrium patterns between the HapMap CEPH samples and a family-based cohort of Northern European descent, Genomics 88 (2006) 407–414.
[37] J. Stankovich, C.J. Cox, R.B. Tan, D.S. Montgomery, S.J. Huxtable, J.P. Rubio, M.G. Ehm, L. Johnson, H. Butzkueven, T.J. Kilpatrick, T.P. Speed, A.D. Roses, M. Bahlo, S.J. Foote, On the utility of data from the International HapMap Project for Australian association studies, Hum. Genet. 119 (2006) 220–222.
[38] J. Lim, Y.J. Kim, Y. Yoon, S.O. Kim, H. Kang, J. Park, A.R. Han, B. Han, B. Oh, K. Kimm, B. Yoon, K. Song, Comparative study of the linkage disequilibrium of an ENCODE region, chromosome 7p15, in Korean, Japanese, and Han Chinese samples, Genomics 87 (2006) 392–398.
[39] S.B. Gabriel, S.F. Schaffner, H. Nguyen, J.M. Moore, J. Roy, B. Blumenstiel, J. Higgins, M. DeFelice, A. Lochner, M. Faggart, S.N. Liu-Cordero, C. Rotimi, A. Adeyemo, R. Cooper, R. Ward, E.S. Lander, M.J. Daly, D. Altshuler, The structure of haplotype blocks in the human genome, Science 296 (2002) 2225–2229.
[40] J.D. Wall, J.K. Pritchard, Haplotype blocks and linkage disequilibrium in the human genome, Nat. Rev. Genet. 4 (2003) 587–597.
[41] S.A. Tishkoff, K.K. Kidd, Implications of biogeography of human populations for 'race' and medicine, Nat. Genet. 36 (2004) S21–S27.
[42] P.E. Bonnen, I. Pe'er, R.M. Plenge, J. Salit, J.K. Lowe, M.H. Shapero, R.P. Lifton, J.L. Breslow, M.J. Daly, D.E. Reich, K.W. Jones, M. Stoffel, D. Altshuler, J.M. Friedman, Evaluating potential for whole-genome studies in Kosrae, an isolated population in Micronesia, Nat. Genet. 38 (2006) 214–217.
[43] N.A. Rosenberg, S. Mahajan, S. Ramachandran, C. Zhao, J.K. Pritchard, M.W. Feldman, Clines, clusters, and the effect of study design on the inference of human population structure, PLoS Genet. 1 (2005) e70.
[44] C. Bourgain, E. Genin, Complex trait mapping in isolated populations: are specific statistical methods required? Eur. J. Hum. Genet. 13 (2005) 698–706.
[45] N.A. Rosenberg, J.K. Pritchard, J.L. Weber, H.M. Cann, K.K. Kidd, L.A. Zhivotovsky, M.W. Feldman, Genetic structure of human populations, Science 298 (2002) 2381–2385.
[46] A. Johansson, V. Vavruch-Nilsson, D.R. Cox, K.A. Frazer, U. Gyllensten, Evaluation of the SNP tagging approach in an independent population sample - array-based SNP discovery in Sami, Hum. Genet. 122 (2007) 141–150.
[47] N.S. Roy, S. Farheen, N. Roy, S. Sengupta, P.P. Majumder, Portability of Tag SNPs across isolated population groups: an example from India, Ann. Hum. Genet. 72 (2008) 82–89.
[48] Z. Xu, N.L. Kaplan, J.A. Taylor, Tag SNP selection for candidate gene association studies using HapMap and gene resequencing data, Eur. J. Hum. Genet. 15 (2007) 1063–1070.
[49] J.M. Kidd, T.L. Newman, E. Tuzun, R. Kaul, E.E. Eichler, Population stratification of a common APOBEC gene deletion polymorphism, PLoS Genet. 3 (2007) e63.
[50] D.J. Witherspoon, E.E. Marchani, W.S. Watkins, C.T. Ostler, S.P. Wooding, B.A. Anders, J.D. Fowlkes, S. Boissinot, A.V. Furano, D.A. Ray, A.R. Rogers, M.A. Batzer, L.B. Jorde, Human population genetic structure and diversity inferred from polymorphic L1 (LINE-1) and Alu insertions, Hum. Hered. 62 (2006) 30–46.
[51] B.S. Weir, C.C. Cockerham, Estimating F-statistics for the analysis of population structure, Evolution 38 (1984) 1358–1370.
[52] J.C. Barrett, B. Fry, J. Maller, M.J. Daly, Haploview: analysis and visualization of LD and haplotype maps, Bioinformatics 21 (2005) 263–265.
[53] A. Ramirez-Soriano, O. Lao, M. Soldevila, F. Calafell, J. Bertranpetit, D. Comas, Haplotype tagging efficiency in worldwide populations in CTLA4 gene, Genes Immun. 6 (2005) 646–657.
[54] A. Tenesa, M.G. Dunlop, Validity of tagging SNPs across populations for association studies, Eur. J. Hum. Genet. 14 (2006) 357–363.
[55] S. Mahasirimongkol, W. Chantratita, S. Promso, E. Pasomsab, N. Jinawath, W. Jongjaroenprasert, V. Lulitanond, P. Krittayapoositpot, S. Tongsima, P. Sawanpanyalert, N. Kamatani, Y. Nakamura, T. Sura, Similarity of the allele frequency and linkage disequilibrium pattern of single nucleotide polymorphisms in drug-related gene loci between Thai and northern East Asian populations: implications for tagging SNP selection in Thais, J. Hum. Genet. 51 (2006) 896–904.
[56] Y.K. Yoo, X. Ke, S. Hong, H.Y. Jang, K. Park, S. Kim, T. Ahn, Y.D. Lee, O. Song, N.Y. Rho, M.S. Lee, Y.S. Lee, J. Kim, Y.J. Kim, J.M. Yang, K. Song, K. Kimm, B. Weir, L.R. Cardon, J.E. Lee, J.J. Hwang, Fine-scale map of encyclopedia of DNA elements regions in the Korean population, Genetics 174 (2006) 491–497.
[57] A.F. Marvelle, L.A. Lange, L. Qin, Y. Wang, E.M. Lange, L.S. Adair, K.L. Mohlke, Comparison of ENCODE region SNPs between Cebu Filipino and Asian HapMap samples, J. Hum. Genet. 52 (2007) 729–737.
[58] A. Angius, F.C. Hyland, I. Persico, N. Pirastu, T. Woodage, M. Pirastu, F.M. De la Vega, Patterns of linkage disequilibrium between SNPs in a Sardinian population isolate and the selection of markers for association studies, Hum. Hered. 65 (2008) 9–22.
