
Molecular Ecology Resources (2010)


doi: 10.1111/j.1755-0998.2010.02885.x

COMPUTER PROGRAM NOTE

COANCESTRY: a program for simulating, estimating and analysing relatedness and inbreeding coefficients

JINLIANG WANG

Institute of Zoology, Zoological Society of London, London NW1 4RY, UK

Abstract

The software package COANCESTRY implements seven relatedness estimators and three inbreeding estimators to estimate relatedness and inbreeding coefficients from multilocus genotype data. Two likelihood estimators that allow for inbred individuals and account for genotyping errors are included for the first time in a user-friendly program for PCs running the Windows operating system. A simulation module is built into the program to simulate multilocus genotype data of individuals with a predefined relationship, and to compare the estimates with the simulated relatedness values, which facilitates the selection of the best estimator in a particular situation. Bootstrapping and permutations are used to obtain the 95% confidence intervals of each relatedness or inbreeding estimate, and to test the difference in averages between groups.

Keywords: genetic markers, inbreeding coefficient, maximum likelihood, relatedness, simulations

Received 22 March 2010; revision received 5 May 2010; accepted 13 May 2010

Genetic marker data are widely used to estimate the relatedness between individuals in a population in which pedigree records are either lacking or unreliable. Such marker-based relatedness is valuable in many areas of research in behaviour, evolution and conservation in natural populations (Blouin 2003). Example applications include estimating heritabilities (e.g. Ritland 2000; Visscher et al. 2006), minimizing inbreeding in captive populations (e.g. Jones et al. 2002; Sekino et al. 2004), studying spatial structure and isolation by distance (e.g. Hardy & Vekemans 2002; Vekemans & Hardy 2004), examining social structures and kin selection (e.g. Girman et al. 1997; Peters et al. 1999), inferring sex-based migration (e.g. Piertney et al. 1998; Knight et al. 1999) and estimating population sizes (e.g. Nomura 2008).
Quite a few relatedness estimators have been proposed, which use the individual multilocus genotype information differently and thus could yield different estimates (see below). Currently, there are several software packages implementing some of these estimators, such as Relatedness (Queller & Goodnight 1989), IDENTIX (Belkhir et al. 2002), SPAGeDi (Hardy & Vekemans 2002), ML_RELATE (Kalinowski et al. 2006) and GENALEX (Peakall & Smouse 2005). In this study, I describe a new computer program that complements previous ones in several important aspects, as outlined below.

Correspondence: Jinliang Wang, Fax: 0044 20 75862870; E-mail: [email protected]

© 2010 Blackwell Publishing Ltd

Relatedness between inbred individuals

All relatedness estimators implemented in current computer programs assume non-inbred individuals. In this simple case, two diploid individuals, X and Y, can share 2, 1 or 0 pairs of genes identical by descent (IBD) at a locus, as shown by IBD modes S7, S8 and S9 in Fig. 1. The genes within an individual, either X or Y, are always non-IBD. Denoting the probabilities of IBD modes S7, S8 and S9 by Δ7, Δ8 and Δ9, respectively, the coancestry coefficient between X and Y is

$$\theta_{XY} = \tfrac{1}{2}\Delta_7 + \tfrac{1}{4}\Delta_8 .$$

θXY represents the probability that a gene chosen at random from individual X is IBD to a homologous gene chosen at random from individual Y. In the literature, both θXY (e.g. Ritland 1996) and 2θXY (e.g. Lynch & Ritland 1999; Wang 2002) are used to characterize the genetic relatedness between X and Y, rXY, because of their common coancestry. Herein I define rXY = θXY because, in the presence of inbreeding, rXY can be larger than 1 if it is defined as rXY = 2θXY. For some common relationships, the values of Δ7, Δ8, Δ9 and rXY are listed in Table 1.


Fig. 1 Identity by descent modes (panels S1 to S9) of the four genes at a locus of two diploid individuals. Each group of four dots represents an identical by descent (IBD) mode between two individuals. The top pair of dots represents the two genes in individual X and the bottom pair of dots represents the two genes in individual Y. Genes connected by lines are IBD.

Table 1 Probabilities of identical by descent and relatedness for some common relationships

Relationship*                                 Δ1    Δ2    Δ3    Δ4    Δ5    Δ6    Δ7    Δ8    Δ9    rXY   FX    FY
Monozygotic twins, clonemates                 0     0     0     0     0     0     1     0     0     1/2   0     0
Parent-offspring                              0     0     0     0     0     0     0     1     0     1/4   0     0
Full sibs                                     0     0     0     0     0     0     1/4   1/2   1/4   1/4   0     0
Half sibs, Avuncular, Grandparent-grandchild  0     0     0     0     0     0     0     1/2   1/2   1/8   0     0
Double first cousins                          0     0     0     0     0     0     1/16  6/16  9/16  1/8   0     0
First cousins                                 0     0     0     0     0     0     0     1/4   3/4   1/16  0     0
Unrelated                                     0     0     0     0     0     0     0     0     1     0     0     0
Two selfed sibs                               1/8   1/8   1/4   0     1/4   0     1/4   0     0     1/2   1/2   1/2
Selfed-outbred sibs                           0     0     1/4   1/4   0     0     0     1/2   0     1/4   1/2   0
Parent-selfed offspring                       0     0     0     0     1/2   0     1/2   0     0     1/2   0     1/2

*Avuncular: any of the four combinations of aunt-nephew, aunt-niece, uncle-nephew, uncle-niece. For selfed-outbred and parent-selfed offspring dyads, the first individual is denoted by X and the second by Y.

The assumption of a large outbreeding population is at best an approximation to reality. Because of population subdivision (fragmentation), small population size and nonrandom mating (e.g. selfing), individuals can be inbred, and the relatedness between such individuals should ideally take inbreeding into account. Otherwise, a biased estimate of relatedness might result. The relatedness between siblings from a first-cousin mating, for example, would be underestimated by 1/32 (1/4 rather than 9/32) if inbreeding of the siblings (or relatedness between parents) is ignored. In general, a set of nine IBD modes, as depicted in Fig. 1, is required to give a full description of the possible IBD relationships among the four homologous genes possessed by two individuals (Jacquard 1972). Given the probability of each of the nine IBD modes for individuals X and Y, the inbreeding coefficients of and relatedness between X and Y are

$$F_X = \Delta_1 + \Delta_2 + \Delta_3 + \Delta_4, \qquad F_Y = \Delta_1 + \Delta_2 + \Delta_5 + \Delta_6, \qquad r_{XY} = \Delta_1 + \tfrac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \tfrac{1}{4}\Delta_8,$$

where Δi is the probability of IBD mode i (i = 1 to 9) with $\sum_{i=1}^{9} \Delta_i = 1$, and FX and FY are the inbreeding coefficients of X and Y. The nine IBD probabilities, relatedness and inbreeding coefficients for some relationships in selfing populations are listed in Table 1.
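The summary statistics defined above (FX, FY and rXY as functions of the nine IBD-mode probabilities) can be checked numerically against the rows of Table 1. The following is a minimal sketch rather than code from COANCESTRY: the function name `summarise_ibd` and the use of exact fractions are illustrative assumptions.

```python
from fractions import Fraction as Fr


def summarise_ibd(delta):
    """Return (r_XY, F_X, F_Y) from nine IBD-mode probabilities.

    `delta` lists the probabilities of modes S1..S9 in that order.
    (Hypothetical helper for illustration; not part of COANCESTRY.)
    """
    if len(delta) != 9:
        raise ValueError("expected nine IBD-mode probabilities")
    if abs(sum(delta) - 1) > 1e-9:
        raise ValueError("IBD-mode probabilities must sum to 1")
    d1, d2, d3, d4, d5, d6, d7, d8, d9 = delta
    f_x = d1 + d2 + d3 + d4          # modes in which X's two genes are IBD
    f_y = d1 + d2 + d5 + d6          # modes in which Y's two genes are IBD
    r_xy = d1 + Fr(1, 2) * (d3 + d5 + d7) + Fr(1, 4) * d8
    return r_xy, f_x, f_y


# Two rows copied from Table 1.
full_sibs = [0, 0, 0, 0, 0, 0, Fr(1, 4), Fr(1, 2), Fr(1, 4)]
two_selfed_sibs = [Fr(1, 8), Fr(1, 8), Fr(1, 4), 0, Fr(1, 4), 0, Fr(1, 4), 0, 0]

print(summarise_ibd(full_sibs))
print(summarise_ibd(two_selfed_sibs))
```

For non-inbred dyads the first six probabilities are zero, so the same function reduces to the three-mode expression for the coancestry coefficient.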

Currently, there is no moment relatedness estimator that could account for inbreeding. There are, however, two likelihood methods that allow for inbred individuals. Both attempt to estimate the nine IBD coefficients, and thus their summary statistics (e.g. rXY, FX and FY), for a pair of individuals using their multilocus genotypes. One is described by Milligan (2003), but only the reduced three-parameter model of non-inbreeding was implemented (as in ML_RELATE, Kalinowski et al. 2006). Later, the full nine-parameter model was implemented by Wang (2007) and Anderson & Weir (2007). Wang (2007) also described and implemented a new likelihood method that estimates the nine IBD coefficients using the multilocus genotypes of the focal dyad as well as a reference individual. At present, there are no user-friendly computer programs available that calculate the two nine-parameter likelihood estimators. In COANCESTRY, both methods are implemented, and users can choose whether to run the full nine-parameter or the reduced three-parameter models. When the full models are selected, COANCESTRY gives relatedness estimates from the two likelihood estimators with inbreeding taken into account, together with inbreeding coefficients. Additionally, five moment estimators, proposed by Queller & Goodnight (1989), Li et al. (1993), Ritland (1996), Lynch & Ritland (1999) and Wang (2002), are calculated.

Bootstrapping over loci is adopted to obtain the 95% confidence interval of each estimator for each dyad.

Genetic marker data used in relatedness analyses usually contain genotyping errors and mutations (Pompanon et al. 2005). Relatedness estimates could be biased when obtained from markers that have a substantial mistyping rate, as demonstrated by simulations (Wang 2007). Both likelihood methods implemented in COANCESTRY allow the user to specify a locus-specific mistyping rate and estimate relatedness while accounting for genotyping errors.
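The bootstrap over loci mentioned above can be illustrated with a short sketch. The text states only that loci are bootstrapped to obtain 95% confidence intervals; the percentile method, the number of replicates and the names `bootstrap_ci` and `combine` below are assumptions made for illustration, not a description of COANCESTRY's internals.

```python
import random


def bootstrap_ci(per_locus_values, combine, n_boot=1000, level=0.95, seed=None):
    """Percentile bootstrap interval from resampling loci with replacement.

    per_locus_values: one entry per locus (anything `combine` understands).
    combine: function turning a list of per-locus entries into one
             multilocus estimate (hypothetical, supplied by the caller).
    """
    n_loci = len(per_locus_values)
    if n_loci == 0:
        raise ValueError("at least one locus is required")
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw n_loci loci with replacement and re-estimate.
        resample = [per_locus_values[rng.randrange(n_loci)] for _ in range(n_loci)]
        estimates.append(combine(resample))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * n_boot)]
    upper = estimates[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lower, upper


# Usage with made-up single-locus estimates combined by a plain average.
locus_estimates = [0.10, 0.30, 0.25, 0.40, 0.05, 0.20, 0.35, 0.15]
print(bootstrap_ci(locus_estimates, lambda v: sum(v) / len(v), seed=1))
```

Passing the combining function as an argument keeps the resampling step independent of the estimator, so the same routine could wrap a weighted moment estimator or a likelihood estimator.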

Inbreeding coefficients

Individual inbreeding coefficients can be estimated from the same marker data used for estimating relatedness. Such estimates are useful in many applications, such as investigating mating systems and inbreeding depression. Unlike previous relatedness software, COANCESTRY implements one moment and two likelihood methods (see above) for inferring inbreeding coefficients. Moment estimators are typically obtained by equating a certain summary statistic of the data to its expected value given the parameter and solving for the parameter, and are so called because they are usually expressed in terms of the first few moments of allele frequency distributions. In contrast, likelihood estimates are obtained by maximizing the likelihood of the parameter given the data. The moment estimator of inbreeding implemented in COANCESTRY was proposed by Li & Horvitz (1953), as described in Ritland (1996). For a locus with k alleles of frequency pi (i = 1 to k), the estimator is defined as

$$F = \sum_{i=1}^{k} \frac{S_i - p_i^2}{(k-1)\,p_i},$$

where the indicator variable Si = 1 if the two genes within an individual are both allele i, and Si = 0 otherwise. The multilocus estimate is obtained by weighting the single-locus estimates by k – 1 (Ritland 1996). The inbreeding coefficient estimates from the likelihood methods are calculated from the estimates of the nine IBD coefficients as shown above (Wang 2007). When an individual is involved in N dyads, an estimate of its inbreeding coefficient is obtained as the average across the N estimates.
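The moment estimator of inbreeding and its multilocus weighting by k – 1 can be written out in a few lines. This is a minimal sketch under stated assumptions: allele frequencies are taken as known, genotypes are pairs of allele labels, and the function names are hypothetical rather than taken from COANCESTRY.

```python
def locus_inbreeding(genotype, freqs):
    """Single-locus moment estimate of F for one individual.

    genotype: (a, b) pair of allele labels.
    freqs: dict mapping each of the k alleles at the locus to its frequency
           (assumed known and non-zero).
    """
    k = len(freqs)
    if k < 2:
        raise ValueError("a locus needs at least two alleles")
    a, b = genotype
    total = 0.0
    for allele, p in freqs.items():
        s = 1.0 if (a == allele and b == allele) else 0.0  # indicator S_i
        total += (s - p * p) / ((k - 1) * p)
    return total


def multilocus_inbreeding(genotypes, freqs_by_locus):
    """Average the single-locus estimates, weighting each locus by k - 1."""
    num = 0.0
    den = 0.0
    for genotype, freqs in zip(genotypes, freqs_by_locus):
        weight = len(freqs) - 1
        num += weight * locus_inbreeding(genotype, freqs)
        den += weight
    return num / den


# A homozygote at a biallelic locus and a heterozygote at a triallelic locus.
freqs_by_locus = [{"A": 0.5, "B": 0.5}, {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}]
print(multilocus_inbreeding([("A", "A"), ("A", "B")], freqs_by_locus))
```

Single-locus values can fall well outside the interval from 0 to 1 (a heterozygote always scores -1/(k - 1)), which is why estimates are combined across many loci.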

Simulations

Quite a few relatedness estimators have been proposed, and their accuracy has been compared extensively using both simulated and empirical data sets (e.g. Lynch & Ritland 1999; Van de Casteele et al. 2001; Wang 2002; Milligan 2003; Csillery et al. 2006). However, there seems to be no single estimator that performs best in all situations. The rank order of these estimators depends on the marker data and also on the true relatedness being estimated or the population’s relatedness structure. For example, for biallelic or effectively biallelic markers (where two of the alleles are common, with frequencies summing to a value close to one), the Queller & Goodnight (1989), Ritland (1996) and Lynch & Ritland (1999) estimators behave badly, frequently yielding either undefined or erratically large estimates (Wang 2002). Likelihood methods are better than moment estimators when a large number of polymorphic markers are available; otherwise, they are less accurate. Usually the Ritland (1996) and Lynch & Ritland (1999) estimators are more accurate for unrelated or loosely related dyads but less accurate for highly related dyads than the Queller & Goodnight (1989), Li et al. (1993) and Wang (2002) estimators.

Because of these complexities, it is difficult for a user to decide which estimator is the most appropriate for his or her data analysis. It is therefore recommended that simulated data mimicking the real data (in markers and samples) are generated and comparatively analysed by different estimators to decide on the best estimator to use (e.g. Van de Casteele et al. 2001). If different estimators perform best for different population compositions on which one has no a priori knowledge, as is most often the case, hypotheses predicting relations with relatedness are preferably tested with these different ‘best estimators’ separately (Van de Casteele et al. 2001). Additionally, a simulation study provides the distribution (and variance) of relatedness estimates for a given simulated value, which can be used to gauge how reliable the estimates are and how the reliability changes with true relatedness and marker information. A variance that is too high indicates that the marker information is insufficient and the estimates should be used with caution.
COANCESTRY allows a user to define various relationships of interest by using the nine IBD coefficients (see some examples in Table 1). It also allows the user to use a set of markers with either known or unknown allele frequencies in simulating the multilocus genotypes of pairs of individuals with a defined relationship. In the latter case, allele frequencies at each locus are simulated following a distribution defined by the user. After analysing the simulated data set, COANCESTRY calculates a matrix of correlation coefficients among the seven relatedness estimators and the true simulated values. The best estimator is simply the one that has the highest correlation with the true values. COANCESTRY also compares the different estimators and true values in various graphs (see Fig. 2 for a screenshot of COANCESTRY).
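The simulation and selection steps described above can be sketched as follows. This is an illustration under explicit assumptions, not COANCESTRY's algorithm: loci are simulated independently, allele frequencies are known, the nine IBD modes follow the usual Jacquard ordering (chosen so that it agrees with the formulas for FX, FY and rXY), and all function names are hypothetical.

```python
import random

# Identity classes of the four genes (X1, X2, Y1, Y2) under modes S1..S9.
# Genes sharing a label are IBD. ASSUMPTION: standard Jacquard ordering.
MODE_CLASSES = [
    (0, 0, 0, 0),  # S1
    (0, 0, 1, 1),  # S2
    (0, 0, 0, 1),  # S3
    (0, 0, 1, 2),  # S4
    (0, 1, 0, 0),  # S5
    (0, 1, 2, 2),  # S6
    (0, 1, 0, 1),  # S7
    (0, 1, 0, 2),  # S8
    (0, 1, 2, 3),  # S9
]


def simulate_dyad(delta, freqs_by_locus, rng):
    """Simulate the genotypes of X and Y at each locus.

    delta: probabilities of modes S1..S9; freqs_by_locus: one dict of
    allele -> frequency per locus; rng: a random.Random instance.
    """
    genotypes = []
    for freqs in freqs_by_locus:
        alleles = list(freqs)
        weights = [freqs[a] for a in alleles]
        # Pick the IBD mode for this locus, then one allele per IBD class.
        classes = rng.choices(MODE_CLASSES, weights=delta)[0]
        drawn = {c: rng.choices(alleles, weights=weights)[0] for c in set(classes)}
        x1, x2, y1, y2 = (drawn[c] for c in classes)
        genotypes.append(((x1, x2), (y1, y2)))
    return genotypes


def pearson(xs, ys):
    """Pearson correlation; both inputs must have non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_estimator(estimates_by_name, true_values):
    """Name of the estimator most highly correlated with the true values."""
    return max(estimates_by_name,
               key=lambda name: pearson(estimates_by_name[name], true_values))


rng = random.Random(2010)
loci = [{"A": 0.5, "B": 0.3, "C": 0.2}] * 5
print(simulate_dyad([0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25], loci, rng))  # full sibs
```

In practice the simulated dyads would be passed through each relatedness estimator, and `best_estimator` would then be applied to the resulting estimates and the simulated true values.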


Fig. 2 Screenshot of the COANCESTRY software interface, showing the relatedness estimates from seven estimators, the scatter graph for different estimators, and the permutation test of average relatedness between two groups.

Test of difference in average relatedness or inbreeding between groups

Frequently, when relatedness estimates for pairs of individuals are obtained, one wishes to determine whether the average relatedness differs significantly between two groups of individuals. For example, one may want to find out whether relatedness among males (or adults) is on average higher than that among females (or juveniles). The results would indicate whether there is sex-based (or age-specific) migration among populations or social groups. Comparing the average relatedness within and between populations (or social groups) would indicate whether populations (groups) are genetically differentiated. COANCESTRY allows the user to partition dyads into groups, and to test the difference in average relatedness between groups by a permutation procedure. The test results are presented in a graph, in which the observed difference is marked on the distribution of simulated differences (see Fig. 2). Similarly, the difference in average inbreeding coefficients between groups of individuals defined by the user can be tested.
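A permutation test of the difference in group averages can be sketched as below. The text does not say which units COANCESTRY permutes or how the P-value is computed, so the choices here (shuffling the pooled estimates between the two groups, a two-sided test, and the add-one P-value) are assumptions for illustration only, as is the function name.

```python
import random


def permutation_test(group_a, group_b, n_perm=10000, seed=None):
    """Two-sided permutation test for a difference in means.

    group_a, group_b: lists of estimates (e.g. dyad relatedness values or
    individual inbreeding coefficients). Returns (observed difference, P).
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign estimates to groups at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated P-value above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Usage with made-up relatedness estimates for two groups of dyads.
males = [0.24, 0.31, 0.18, 0.27, 0.22, 0.29]
females = [0.05, 0.11, 0.02, 0.09, 0.07, 0.04]
print(permutation_test(males, females, n_perm=2000, seed=1))
```

Because dyads that share an individual are not independent, permuting individuals rather than dyad estimates may be more appropriate for relatedness data; the sketch ignores this for brevity.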

The software package COANCESTRY includes an executable that calculates relatedness and inbreeding estimators (source code in Fortran 90/95), an executable of the graphical user interface for Windows XP or later (source code in Visual Basic), a user’s manual and some example data sets. The program runs on PCs with Windows operating systems. It can be downloaded from http://www.zsl.org/science/research/software/coancestry,1360,AR.html

References

Anderson AD, Weir BS (2007) A maximum-likelihood method for the estimation of pairwise relatedness in structured populations. Genetics, 176, 421–440.
Belkhir K, Castric V, Bonhomme F (2002) Identix, a software to test for relatedness in a population using permutation methods. Molecular Ecology Notes, 2, 611–614.
Blouin MS (2003) DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends in Ecology and Evolution, 18, 503–511.
Csillery K, Johnson T, Beraldi D et al. (2006) Performance of marker-based relatedness estimators in natural populations of outbred vertebrates. Genetics, 173, 2091–2101.
Girman D, Mills M, Geffen E, Wayne R (1997) A molecular genetic analysis of social structure, dispersal, and interpack relationships of the African wild dog (Lycaon pictus). Behavioral Ecology and Sociobiology, 40, 187–198.
Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Molecular Ecology Notes, 2, 618–620.
Jacquard A (1972) Genetic information given by a relative. Biometrics, 28, 1101–1114.
Jones KL, Glenn TC, Lacy RC et al. (2002) Refining the whooping crane studbook by incorporating microsatellite DNA and leg-banding analyses. Conservation Biology, 16, 789–799.
Kalinowski ST, Wagner AP, Taper ML (2006) ML-Relate: a computer program for maximum likelihood estimation of relatedness and relationship. Molecular Ecology Notes, 6, 576–579.
Knight ME, Van Oppen MJH, Smith HL et al. (1999) Evidence for male-biased dispersal in Lake Malawi cichlids from microsatellites. Molecular Ecology, 8, 1521–1527.
Li CC, Horvitz DG (1953) Some methods of estimating the inbreeding coefficient. American Journal of Human Genetics, 5, 107–117.
Li CC, Weeks DE, Chakravarti A (1993) Similarity of DNA fingerprints due to chance and relatedness. Human Heredity, 43, 45–52.
Lynch M, Ritland K (1999) Estimation of pairwise relatedness with molecular markers. Genetics, 152, 1753–1766.
Milligan BG (2003) Maximum-likelihood estimation of relatedness. Genetics, 163, 1153–1167.
Nomura T (2008) Estimation of effective number of breeders from molecular coancestry of single cohort sample. Evolutionary Applications, 1, 462–474.
Peakall R, Smouse PE (2005) GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes, 6, 288–295.
Peters JM, Queller DC, Imperatriz-Fonseca VL, Roubik DW, Strassmann JE (1999) Mate number, kin selection and social conflicts in stingless bees and honeybees. Proceedings of the Royal Society of London B, 266, 379–384.
Piertney SB, MacColl ADC, Bacon PJ, Dallas JF (1998) Local genetic structure in red grouse (Lagopus lagopus scoticus): evidence from microsatellite DNA markers. Molecular Ecology, 7, 1645–1654.
Pompanon F, Bonin A, Bellemain E, Taberlet P (2005) Genotyping errors: causes, consequences, and solutions. Nature Reviews Genetics, 6, 847–859.
Queller DC, Goodnight KF (1989) Estimating relatedness using molecular markers. Evolution, 43, 258–275.
Ritland K (1996) Estimators for pairwise relatedness and inbreeding coefficients. Genetical Research, 67, 175–186.
Ritland K (2000) Marker-inferred relatedness as a tool for detecting heritability in nature. Molecular Ecology, 9, 1195–1204.
Sekino M, Sugaya T, Hara M, Taniguchi N (2004) Relatedness inferred from microsatellite genotypes as a tool for broodstock management of Japanese flounder Paralichthys olivaceus. Aquaculture, 233, 163–172.
Van de Casteele T, Galbusera P, Matthysen E (2001) A comparison of microsatellite-based pairwise relatedness estimators. Molecular Ecology, 10, 1539–1549.
Vekemans X, Hardy OJ (2004) New insights from fine-scale spatial genetic structure analyses in plant populations. Molecular Ecology, 13, 921–935.
Visscher PM, Medland SE, Ferreira MAR et al. (2006) Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genetics, 2, e41.
Wang J (2002) An estimator for pairwise relatedness using molecular markers. Genetics, 160, 1203–1215.
Wang J (2007) Triadic IBD coefficients and applications to estimating pairwise relatedness. Genetical Research, 89, 135–153.
