STRUCTURE and Problem #2

Nora Mitchell February 7, 2017

Goals for Today’s Lab

- Review F-statistics conceptually
- Install and learn how to use STRUCTURE
- Introduce Problem #2

Hierarchical F-statistics

Toy Example: Individuals in subpopulations

Fis

Fis is the variation of individuals within subpopulations (f, inbreeding). Fis is a measure of departure from Hardy-Weinberg (H-W) proportions.

Socrative Is this figure an example of high or low inbreeding (f, Fis)?

Fis Is this figure an example of high or low inbreeding (f, Fis)?

Fst

Fst is the variation among subpopulations within the total (θ).

Socrative Is this figure an example of high or low population differentiation (θ, Fst)?

Fst Is this figure an example of high or low population differentiation (θ, Fst)?
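The two definitions above can be made concrete with a small calculation. The sketch below is only an illustration: it uses the simple heterozygosity-based definitions (Fis = 1 - Ho/Hs, Fst = 1 - Hs/Ht) for one biallelic locus, assumes equally sized subpopulations, and runs on made-up toy numbers. It is not the Weir and Cockerham estimator used later in Project 2, and the function names are hypothetical.

```python
def expected_het(p):
    """Expected heterozygosity (2pq) at a biallelic locus under Hardy-Weinberg."""
    return 2.0 * p * (1.0 - p)


def f_statistics(allele_freqs, observed_hets):
    """Return (Fis, Fst, Fit) for equally sized subpopulations.

    allele_freqs  -- frequency of one allele in each subpopulation
    observed_hets -- observed heterozygote frequency in each subpopulation
    """
    n = len(allele_freqs)
    # Mean observed heterozygosity across subpopulations (Ho)
    h_obs = sum(observed_hets) / n
    # Mean expected heterozygosity within subpopulations (Hs)
    h_sub = sum(expected_het(p) for p in allele_freqs) / n
    # Expected heterozygosity in the pooled total (Ht)
    h_tot = expected_het(sum(allele_freqs) / n)
    f_is = 1.0 - h_obs / h_sub
    f_st = 1.0 - h_sub / h_tot
    f_it = 1.0 - h_obs / h_tot
    return f_is, f_st, f_it


# Hypothetical toy data: two subpopulations with very different allele
# frequencies and fewer heterozygotes than Hardy-Weinberg predicts.
f_is, f_st, f_it = f_statistics([0.8, 0.2], [0.16, 0.16])
print(f"Fis = {f_is:.2f}, Fst = {f_st:.2f}, Fit = {f_it:.2f}")
```

With these toy numbers the heterozygote deficit within subpopulations gives a high Fis (0.50), and the divergent allele frequencies give a high Fst (0.36). The three values satisfy (1 - Fit) = (1 - Fis)(1 - Fst).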

Human Example

Rosenberg et al. (2002, Science) looked at diversity in humans!
- 377 autosomal microsatellite loci
- 1056 individuals
- 52 populations
- 8 regions

Human Example

AMOVA (to look at variance components; slightly different from, but analogous to, F-statistics). Where is most of the variation?

Variance components (% of total):

Sample               Regions  Pops  Within pops  Among pops within regions  Among regions
World                5        52    93.2         2.5                        4.3
Africa               1        6     96.9         3.1
Eurasia              3        21    98.3         1.2                        0.5
Europe               1        8     99.3         0.7
Middle East          1        4     98.6         1.3
Central/South Asia   1        9     98.6         1.4
East Asia            1        17    98.7         1.3
Oceania              1        2     93.6         6.4
America              1        5     88.4         11.6

From Rosenberg et al. (2002)
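To see how AMOVA variance components relate to F-statistics, the percentages can be turned into the standard hierarchical Φ-statistics. The sketch below uses the World row (5 regions) from the table: 93.2% within populations, 2.5% among populations within regions, and 4.3% among regions. The function name is hypothetical, and the formulas are the usual AMOVA ratios of variance components, not something computed in the original slides.

```python
def phi_statistics(among_regions, among_pops, within_pops):
    """Return (Phi_CT, Phi_SC, Phi_ST) from three AMOVA variance components.

    Phi_CT -- regions relative to the total
    Phi_SC -- populations relative to their region
    Phi_ST -- populations relative to the total (the analogue of Fst)
    """
    total = among_regions + among_pops + within_pops
    phi_ct = among_regions / total
    phi_sc = among_pops / (among_pops + within_pops)
    phi_st = (among_regions + among_pops) / total
    return phi_ct, phi_sc, phi_st


# World sample, 5 regions (values in percent, from Rosenberg et al. 2002)
phi_ct, phi_sc, phi_st = phi_statistics(4.3, 2.5, 93.2)
print(f"Phi_CT = {phi_ct:.3f}, Phi_SC = {phi_sc:.3f}, Phi_ST = {phi_st:.3f}")
```

The Fst-like value for the worldwide sample is small (about 0.07), which matches the message of the table: most of the variation is within populations.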

Individual Assignment

How many distinct groups are there? What groups do individuals belong to?

STRUCTURE

STRUCTURE is a free software package from Pritchard et al. (2000)
- Uses multi-locus genotype data to investigate population structure
- Assigns individuals to “K” clusters
- Can be used to identify distinct populations, hybrids, migrants, etc.
- Can use different genetic markers (microsats, SNPs, RFLPs, AFLPs)
- Takes an MCMC approach
- http://pritchardlab.stanford.edu/structure.html

STRUCTURE

Install Structure now and we will walk through an example of how to use it!
http://pritchardlab.stanford.edu/structure_software/release_versions/v2.3.4/html/structure.html
If you are having trouble on a Mac, see pages 4 & 5 of Project 2.

Human Example

From Rosenberg et al. (2002)

Interpreting Structure Output

After you run Structure for K = 1 to N, there are two ways to choose the “right K”:
1. Look at the DeltaK output from Structure Harvester (measures the rate of change of the probability of the data given that K value). Choose the K with the highest DeltaK.
2. Look at the mean log posterior probability of the data, LnP(D), also known as L(K). Choose a value where this seems to level off.
There may be more than one “correct” answer regarding the K chosen! Justify your choice!

Structure Harvester

Structure Harvester is a web-based program that takes the output from multiple runs of Structure (in zip file format) and calculates DeltaK following Evanno et al. (2005). DeltaK is a measure of the rate of change in the log probability of the data between successive K values.
http://taylor0.biology.ucla.edu/structureHarvester/
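For intuition about what DeltaK measures, here is a minimal sketch of the Evanno-style calculation: the absolute second difference of the mean L(K) across successive K values, divided by the standard deviation of L(K) across replicate runs. The LnP(D) values are invented for illustration, the function name is hypothetical, and the use of the sample standard deviation is an assumption. Structure Harvester itself may differ in details, so use its output for the project.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: DeltaK} from replicate LnP(D) values keyed by K.

    DeltaK(K) = |L(K+1) - 2*L(K) + L(K-1)| / sd(L(K)), where L(K) is the
    mean over replicate runs. Each K needs at least two replicate runs.
    DeltaK is undefined for the smallest and largest K tested.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # need both neighbours for the second difference
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined when all replicate runs are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical LnP(D) values: three replicate runs for each K from 1 to 5
lnp = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-600, -602, -598],
    4: [-590, -595, -585],
    5: [-585, -592, -578],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)
print(dk, "-> highest DeltaK at K =", best_k)
```

In this made-up example L(K) rises steeply up to K = 3 and then levels off, so both criteria point to K = 3. Note that DeltaK cannot be computed for the smallest or largest K tested, so it can never select K = 1.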

Structure Harvester LnK

From Evanno et al. (2005)

Structure Harvester DeltaK

From Evanno et al. (2005)

Socrative What is a reasonable estimate of K given these plots?

From Prunier and Holsinger (2010)

White proteas Look at Structure barplots for different Ks

From Prunier and Holsinger (2010)

TeraStructure

What about large datasets? TeraStructure is a scalable shortcut for giant datasets. For instance: 10^12 observed genotypes (1 million individuals at 1 million SNPs).

From Gopalan et al. (2016)

Project 2 Protea repens is a widespread South African shrub

Project 2 Samples from 19 populations across its range Originally 2006 polymorphic loci

From Carlson et al. (2015)

From Prunier et al. Accepted

Project 2

For this project, we are analyzing Fst outlier loci:
- 662 individuals
- 19 populations
- 173 SNP loci

From Prunier et al. Accepted

Project 2

Questions
- What are estimates of Fis and Fst using Weir and Cockerham’s approach?
- What are estimates using Kent’s Bayesian approach? How do they compare with the above?
- Is there evidence for inbreeding in Protea repens?
- How similar or different is the genetic structure for these loci compared with the publication based on individual assignment?

From Prunier et al. Accepted

Project 2

Methods Hints
- Use adegenet in R to estimate Weir and Cockerham’s F-stats.
- Use Kent’s code for Bayesian estimates of theta and f. Means and credible intervals!
- Compare models using DIC to see if there is evidence for inbreeding! (Set DIC to TRUE in code!)
- Is a higher or lower DIC indicative of a “better” model?
- In Structure, run for K = 2 to K = 19. Follow instructions in tutorial.
- Bayesian code and Structure will take a chunk of time to run!

Project 2

Write-up Hints
- What are Fst outliers? Why might they be different? (Outside source...?)
- Write-up should include appropriate figures
- Answer questions as if they were main questions/hypotheses in introduction of a paper. Your write-up is a condensed results and/or discussion section.

Project 2

IMPORTANT
Send me a zip file with your Structure results for K = 2 to K = 19 by Thursday at midnight! I will compile class data and run it through Structure Harvester and send you the results!
Write-up due to me via e-mail next Tuesday Feb 13th, 9:30am

Works Cited

- Carlson, J.E., C.A. Adams, and K.E. Holsinger. 2015. Intraspecific variation in stomatal traits, leaf traits and physiology reflects adaptation along aridity gradients in a South African shrub. Annals of Botany 117(1):195-207.
- Earl, D.A., and B.M. vonHoldt. 2012. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources 4(2):359-361.
- Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611-2620.
- Gopalan, P., W. Hao, D.M. Blei, and J.D. Storey. 2016. Scaling probabilistic models of genetic variation to millions of humans. Nature Genetics 48:1587-1590.
- Pritchard, J.K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-959.
- Prunier, R., and K.E. Holsinger. 2010. Was it an explosion? Using population genetics to explore the dynamics of a recent radiation within Protea (Proteaceae L.). Molecular Ecology 19(18):3968-3980.
- Prunier, R., M. Akman, N. Aitken, C. Kremer, A. Chuah, J. Borevitz, and K.E. Holsinger. Accepted. Isolation by distance and isolation by environment contribute to population differentiation in Protea repens (Proteaceae L.), a widespread South African species. American Journal of Botany.
- Rosenberg, N.A., J.K. Pritchard, J.L. Weber, H.M. Cann, K.K. Kidd, L.A. Zhivotovsky, and M.W. Feldman. 2002. Genetic structure of human populations. Science 298(5602):2381-2385.
