Modeling Dependent Gene Expression
Donatello Telesca
University of Texas, M.D. Anderson Cancer Center
Department of Biostatistics

Joint work with

Peter Müller (M.D. Anderson, Biostatistics)
Giovanni Parmigiani (Johns Hopkins, Biostatistics)

Modeling Dependent Gene Expression – p. 1/22

Motivation: Epithelial Ovarian Cancer (EOC)

• Epithelial tumors start from the cells that cover the outer surface of the ovary. Most ovarian tumors are epithelial cell tumors.
• Poor outcome in EOC patients is associated with metastases to the peritoneum and stroma.
• Evidence is mounting that an inflammatory process contributes to tumor growth and metastasis to the peritoneum in EOC.

EOC (Complement and Coagulation Cascade Pathway)

Outline

• From Pathways to Conditional Independence Priors
  ◦ Non-recursive graphs and Markov Random Fields
• Probability of Expression (Parmigiani and Garrett, 2002)
  ◦ Modeling gene expression with Normal-Uniform mixtures
• Dependent Probability of Expression
  ◦ Conditional dependence and tetrachoric correlation
• Posterior Inferences and Computations
  ◦ Model determination via RJ–MCMC
• Applications
  ◦ A simple simulation
  ◦ EOC study

From Pathways to Conditional Independence Priors

◦ We represent a pathway as a graph $G = \{V, E\}$, where $V = V(G)$ is a set of genes involved in the pathway, and $E = E(G)$ is a set of directed or undirected edges.
◦ Pathways usually involve loops and reciprocal ($a \leftrightarrows b$) edges.
◦ We assume that pathways can be encoded in the structure of a reciprocal graph (Koster, 1996).

[Figure: pairs of graphs on nodes 1, 2, 3, 4, related by $\stackrel{M}{=}$ in the first builds of the slide and by $\stackrel{M}{\neq}$ in the last build.]
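As a rough illustration of the graph representation on this slide, the sketch below stores a pathway as lists of directed and undirected edges and derives a neighbour set for each gene. The four-gene pathway, the function name `build_neighbors`, and the choice to treat neighbours simply as adjacent genes (ignoring edge direction) are assumptions made for the example; the neighbourhood used in the talk may come from a different construction on the reciprocal graph.

```python
from collections import defaultdict


def build_neighbors(directed, undirected):
    """Return a dict mapping each gene to the set of genes it shares an edge with.

    Direction is ignored here: this is a simplifying assumption for illustration.
    """
    ne = defaultdict(set)
    for a, b in list(directed) + list(undirected):
        ne[a].add(b)
        ne[b].add(a)
    return dict(ne)


# Hypothetical pathway on genes 1..4:
# a loop 1 -> 2 -> 3 -> 1, a reciprocal pair 3 <-> 4 (two directed edges),
# and one undirected edge between 1 and 4.
directed = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
undirected = [(1, 4)]

ne = build_neighbors(directed, undirected)
for gene in sorted(ne):
    print(gene, sorted(ne[gene]))
```

Storing a reciprocal edge as two directed edges keeps the loop and the reciprocal pair explicit, which is the feature of pathways the slide singles out.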

POE: Probability of Expression (Parmigiani and Garrett, 2002)

• $y_{gt}$: expression of gene $g$ in sample $t$, with $g = 1, \dots, N$ and $t = 1, \dots, n$.
• $\tilde{y}_{gt} = y_{gt} - (\alpha_t + m_g)$

[Figure: histogram of Observed mRNA Intensity (x-axis, −5 to 10) against Frequency (y-axis, 0.00 to 0.35).]

$$p(\tilde{y}_{gt} \mid e_{gt}) = f_{e_{gt}}(\tilde{y}_{gt} \mid \kappa_g, s_g), \quad \text{with} \quad \begin{cases} f_{g,-1} = U(-\kappa_g^-, 0) \\ f_{g,0} = N(0, s_g) \\ f_{g,1} = U(0, \kappa_g^+) \end{cases}$$

POE: Probability of Expression

◦ Trinary indicators of over/underexpression:

$$e_{gt} = \begin{cases} 1 & \text{if over-expression} \\ 0 & \text{if normal expression} \\ -1 & \text{if under-expression} \end{cases}$$

◦ The overall proportion of DE genes is characterized by $\pi_g^- = P(e_{gt} = -1)$ and $\pi_g^+ = P(e_{gt} = 1)$.

◦ Specifically, for each data point:

$$P(e_{gt} = 1 \mid y_{gt}, \pi_g^+, \pi_g^-, f_{1,g}, f_{0,g}) = \frac{\pi_g^+ f_{1,g}(y_{gt})}{\pi_g^+ f_{1,g}(y_{gt}) + (1 - \pi_g^+ - \pi_g^-) f_{0,g}(y_{gt})}$$

$$P(e_{gt} = -1 \mid y_{gt}, \pi_g^+, \pi_g^-, f_{-1,g}, f_{0,g}) = \frac{\pi_g^- f_{-1,g}(y_{gt})}{\pi_g^- f_{-1,g}(y_{gt}) + (1 - \pi_g^+ - \pi_g^-) f_{0,g}(y_{gt})}$$
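The two posterior probabilities above can be evaluated directly once the mixture parameters are fixed. The sketch below does this for a single gene with NumPy. It assumes the data are already centered, reads $s_g$ as a standard deviation (the slide does not say whether it is a standard deviation or a variance), and uses illustrative parameter values; the function name `poe_posterior` is hypothetical.

```python
import numpy as np


def poe_posterior(y, pi_plus, pi_minus, kappa_plus, kappa_minus, s):
    """Return (P(e = 1 | y), P(e = -1 | y)) for centered expression values y."""
    y = np.asarray(y, dtype=float)
    # Normal component for "normal expression"; s is treated as a standard deviation.
    f0 = np.exp(-0.5 * (y / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    # Uniform components: U(0, kappa_plus) and U(-kappa_minus, 0).
    f_plus = np.where((y > 0) & (y < kappa_plus), 1.0 / kappa_plus, 0.0)
    f_minus = np.where((y < 0) & (y > -kappa_minus), 1.0 / kappa_minus, 0.0)
    pi0 = 1.0 - pi_plus - pi_minus
    p_plus = pi_plus * f_plus / (pi_plus * f_plus + pi0 * f0)
    p_minus = pi_minus * f_minus / (pi_minus * f_minus + pi0 * f0)
    return p_plus, p_minus


# Illustrative values only.
y = np.array([-3.0, 0.0, 4.0])
p_plus, p_minus = poe_posterior(y, pi_plus=0.2, pi_minus=0.1,
                                kappa_plus=10.0, kappa_minus=10.0, s=1.0)
print(p_plus, p_minus)
```

Because the two uniform components have disjoint supports, at most one of the two probabilities is non-zero for any given observation, which is why each denominator only needs two terms.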

POE: Probability of Expression

• The POE framework converts abundance measurements into probabilities of DE, providing an interpretable scale for tumor classification and stabilizing the abundance measurements.

Key Assumptions:
1) $e_{gt}$ independent given $\pi_g^+$, $\pi_g^-$ and the $f_g$'s
2) $y_{gt}$ independent given $e_{gt}$, $\alpha_t$ and $m_g$

◦ We will relax assumption (1) by integrating known pathway interactions in the form of a conditional independence prior.

DepPOE: Dependent Probability of Expression

[Diagram: model layers for gene $g$ and its neighbours $ne(g)$ in sample $t$]

$y_{gt}$, $y_{ne(g)t}$ ⇛ mRNA Abundance
$e_{gt}$, $e_{ne(g)t}$ ⇛ Trinary indicators of DE
$z_{gt}$, $z_{ne(g)t}$ ⇛ Latent Probit scores
$\Omega_z \mid G = \{V, E\}$ ⇛ Polychoric Concentration

DepPOE: Dependent Probability of Expression

◦ Trinary indicators of over/underexpression (Probit formulation):

$$e_{gt} = \begin{cases} 1 & \text{if } z_{gt} > \phi_g \quad \text{(over-expression)} \\ 0 & \text{if } -1 < z_{gt} \le \phi_g \quad \text{(normal expression)} \\ -1 & \text{if } z_{gt} \le -1 \quad \text{(under-expression)} \end{cases}$$

where $z_{gt} \sim N(\mu_{gt}, 1)$.

◦ We introduce a dependence prior via tetrachoric correlations.

$$\mu_{gt} = x_{gt}' b_g + z_{ne(g)t}' c_{ne(g)} \;\Rightarrow\; \underbrace{Z}_{N \times n} \sim MN(\underbrace{\mu}_{N \times n}, \underbrace{\Omega_z^{-1}}_{N \times N}, \underbrace{I_n}_{n \times n})$$

The $(i, j)$th element of $\Omega_z$ is $-c_{ij}$, and $c_{ij} = 0$ iff $i \notin ne(j)$ $\longrightarrow$ conditional independence.
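A small numerical sketch may help connect the cutpoints and the concentration matrix. It maps latent scores to trinary indicators with cutpoints $-1$ and $\phi_g$, and builds a concentration matrix with off-diagonal entries $-c_{ij}$ for a hypothetical three-gene chain. The unit diagonal (matching the conditional variance of 1 for $z_{gt}$), the coefficient 0.4, and the function names are assumptions for illustration.

```python
import numpy as np


def trinary_indicator(z, phi):
    """Map latent probit scores z to e in {-1, 0, 1} using cutpoints -1 and phi."""
    z = np.asarray(z, dtype=float)
    return np.where(z > phi, 1, np.where(z <= -1.0, -1, 0))


def precision_from_edges(n_genes, coef):
    """Build Omega_z with unit diagonal and off-diagonal entries -c_ij.

    coef maps a gene pair (i, j) to c_ij; pairs that are absent get c_ij = 0.
    """
    omega = np.eye(n_genes)
    for (i, j), c in coef.items():
        omega[i, j] = omega[j, i] = -c
    return omega


# Hypothetical chain 0 - 1 - 2: genes 0 and 2 are not neighbours.
omega = precision_from_edges(3, {(0, 1): 0.4, (1, 2): 0.4})
sigma = np.linalg.inv(omega)

# omega[0, 2] is zero (conditional independence given gene 1),
# while the marginal covariance sigma[0, 2] is not.
print(omega[0, 2], sigma[0, 2])

e = trinary_indicator([-1.5, 0.2, 2.5], phi=1.0)
print(e)
```

The zero in the concentration matrix encodes the missing edge even though genes 0 and 2 remain marginally correlated through gene 1, which is the sense in which the Gaussian layer makes conditional independence easy to represent.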

Posterior Inference and Computation

• The availability of closed-form conditional posterior distributions allows for straightforward Gibbs sampling, given a specific graph $G = \{V, E\}$.
• Recognizing that the prior pathway represents knowledge of genetic interactions in a non-pathological state, we allow for deviation from the prior dependence structure encoded in $G = \{V, E\}$.
• We consider the prior path diagram $G = \{V, E\}$ as the saturated model and allow for random deletion/insertion of edges compatible with the original pathway.
• If we define $\nu \in \{G\}_\nu$ as a compatible reconfiguration of the original pathway, we are now interested in the following distribution:

$$P(\theta, \nu \mid Y) \propto P(Y \mid \theta, \nu)\, P(\theta \mid \nu)\, P(\nu \in \{G\}_\nu)$$
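The idea of randomly deleting and re-inserting edges of the saturated pathway can be sketched with a plain Metropolis-Hastings sampler over edge-inclusion vectors. This is a simplified, fixed-dimension stand-in and not the reversible-jump scheme of the talk: it assumes the parameters $\theta$ have been integrated out or are handled elsewhere, and the function `sample_edge_sets` and the toy log-posterior are hypothetical.

```python
import numpy as np


def sample_edge_sets(log_post, n_edges, n_iter, seed=0):
    """Metropolis-Hastings over inclusion vectors nu in {0, 1}^n_edges.

    Each step flips one edge of the prior pathway, chosen uniformly, so the
    proposal is symmetric and the acceptance ratio is the posterior ratio.
    """
    rng = np.random.default_rng(seed)
    nu = np.ones(n_edges, dtype=int)  # start from the saturated model
    current = log_post(nu)
    draws = np.empty((n_iter, n_edges), dtype=int)
    for it in range(n_iter):
        k = rng.integers(n_edges)
        prop = nu.copy()
        prop[k] = 1 - prop[k]  # delete edge k, or re-insert it
        cand = log_post(prop)
        if np.log(rng.random()) < cand - current:
            nu, current = prop, cand
        draws[it] = nu
    return draws


# Toy target (hypothetical): independent edges with known inclusion probabilities.
target = np.array([0.9, 0.5, 0.1])


def toy_log_post(nu):
    return float(np.sum(nu * np.log(target) + (1 - nu) * np.log(1 - target)))


draws = sample_edge_sets(toy_log_post, n_edges=3, n_iter=20000, seed=1)
freq = draws.mean(axis=0)  # estimated posterior inclusion frequency per edge
print(freq)
```

Restricting flips to edges of the prior pathway mirrors the slide's requirement that reconfigurations stay compatible with the original graph; a full sampler would also update $\theta$, whose dimension changes with $\nu$.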

Posterior Inference and Computation: (RJ-MCMC Scheme)

◦ We consider trans-dimensional moves that operate seamlessly between the space of pathways and the corresponding conditional independence structures.

[Figure: pair of graphs on nodes 1, 2, 3, 4 related by $\stackrel{M}{=}$, shown across successive builds of the slide.]

Simulation Study

◦ We define latent expression scores as:

$$w_{gt} = z_{gt} + X_{gt}' b_g, \quad \text{where} \quad \underbrace{Z}_{N \times n} \sim MN(0, \Omega_z^{-1}, I_n)$$

◦ The mRNA abundance is then defined as ($N = 200$, $n = 60$):

$$y_{gt} \mid w_{gt} \le -1 \sim N(-4, 2^2), \qquad y_{gt} \mid w_{gt} > 3 \sim N(4, 2^2), \qquad y_{gt} \mid -1 < w_{gt} \le 3 \sim N(0, 1)$$

◦ We will consider two conditional dependence schemes, a cluster scheme and a banded scheme, and fit the model with a misspecified prior pathway.
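The data-generating steps above can be sketched for the banded scheme as follows. The band width (first off-diagonals only), the coefficient $c = 0.4$, the unit diagonal of $\Omega_z$, and the simplification $X_{gt}' b_g = 0$ are all assumptions made for the example; $N(\cdot, 2^2)$ is read as a standard deviation of 2.

```python
import numpy as np


def simulate_banded(n_genes=200, n_samples=60, c=0.4, seed=0):
    """Simulate latent scores w and mRNA abundance y under a banded Omega_z."""
    rng = np.random.default_rng(seed)
    # Banded concentration matrix: unit diagonal, -c on the first off-diagonals.
    # |c| < 0.5 keeps this tridiagonal matrix positive definite.
    omega = np.eye(n_genes) - c * (np.eye(n_genes, k=1) + np.eye(n_genes, k=-1))
    sigma = np.linalg.inv(omega)
    sigma = (sigma + sigma.T) / 2.0  # remove numerical asymmetry
    # Columns of Z are independent N(0, sigma) draws: Z ~ MN(0, sigma, I_n).
    z = np.linalg.cholesky(sigma) @ rng.standard_normal((n_genes, n_samples))
    w = z  # assumes the covariate term X'b is zero
    y = rng.normal(0.0, 1.0, size=w.shape)  # normal expression
    low, high = w <= -1.0, w > 3.0
    y[low] = rng.normal(-4.0, 2.0, size=int(low.sum()))   # under-expression
    y[high] = rng.normal(4.0, 2.0, size=int(high.sum()))  # over-expression
    return omega, w, y


omega, w, y = simulate_banded(seed=0)
print(y.shape, (w <= -1.0).mean(), (w > 3.0).mean())
```

A cluster scheme could be obtained the same way by replacing the tridiagonal pattern with a block-diagonal one, provided the resulting matrix stays positive definite.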

Simulation Study: (Banded Structure)

[Figure panels: Signal; $P(C_{ij} \neq 0 \mid Y)$; $E(C_{ij} \mid Y)$]

Simulation Study: (Banded Structure)

[Figure panels, gene $g$ (1 to 200) against sample $t$ (1 to 60): Signal; mRNA Abundance; $p^* = (p^+ - p^-)$]

Simulation Study: (Cluster Structure)

[Figure panels: Signal; $P(C_{ij} \neq 0 \mid Y)$; $E(C_{ij} \mid Y)$]

Simulation Study: (Cluster Structure)

[Figure panels, gene $g$ (1 to 200) against sample $t$ (1 to 60): Signal; mRNA Abundance; $p^* = (p^+ - p^-)$]

EOC Study (Complement and Coagulation Pathway)

• We focus on the comparison of 10 peritoneal samples from patients with benign ovarian pathology (bPT) versus 14 samples from patients with malignant ovarian pathology (mPT).
• Wang et al. (2005) report a study of epithelial ovarian cancer (EOC). The goal of the study is to characterize the role of the tumor microenvironment in favoring the intra-peritoneal spread of EOC.
• One subset of genes reported on the NIH custom microarray are genes in the coagulation and complement pathway (http://www.genome.ad.jp). The arcs in the pathway are interpreted as prior judgement about (approximate) conditional dependence.

EOC Study (Complement and Coagulation Pathway)

[Figure, two panels. "No. of edges": Number of Edges against RJ-MCMC Iteration (0 to 5000). "E[FDR | Y]": $E(FDR \mid Y)$ against Number of Significant Edges (0 to 150).]

EOC Study (Complement and Coagulation Pathway)

[Figure: selected network among the genes C4A, CR2, C2, C5R1, C5, C3conv, CR1, CCL13, PROS1, IL8, SERPINE1, C3AR1, CXCL14, CXCL6, VWF, PLAU, F8, F10, PROC, F2, F5, THBD, F2R, F9.]

◦ 10 benign samples
◦ 14 tumor samples
◦ 179 Genes
◦ Edges selected so that $E(FDR \mid Y) \le 0.05$
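The slides do not say how the edge set meeting $E(FDR \mid Y) \le 0.05$ is chosen. One common rule, assumed here, ranks edges by posterior inclusion probability and keeps the largest top-ranked set whose average posterior probability of being a false discovery stays under the bound. The function name and the probabilities below are hypothetical.

```python
import numpy as np


def select_edges_by_fdr(incl_prob, alpha=0.05):
    """Return indices of the largest top-ranked edge set with expected FDR <= alpha.

    The posterior expected FDR of a set is the mean of (1 - inclusion probability)
    over the edges in that set.
    """
    p = np.asarray(incl_prob, dtype=float)
    order = np.argsort(-p)  # most probable edges first
    running_fdr = np.cumsum(1.0 - p[order]) / np.arange(1, p.size + 1)
    # running_fdr is non-decreasing, so counting entries <= alpha gives the cut.
    n_keep = int(np.sum(running_fdr <= alpha))
    return np.sort(order[:n_keep])


# Hypothetical posterior inclusion probabilities P(C_ij != 0 | Y) for five edges.
probs = [0.99, 0.40, 0.97, 0.90, 0.10]
selected = select_edges_by_fdr(probs)
print(selected)
```

With these illustrative values the three most probable edges are kept, since adding the fourth would push the expected FDR well above 0.05.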

Summary

• We provide a coherent probabilistic framework that integrates prior information about genetic interaction into the analysis of expression data.
• Prior information is formally introduced into the POE model for molecular classification in cancer via conditional independence priors.
• Dependence between genes is formalized in terms of polychoric correlations between trinary indicators of over-, under-, or normal expression.
• The limitations associated with the multivariate probit formulation are counterbalanced by the ease of representing conditional independence in the Gaussian framework.
• Preliminary results on simulated data and on data from an EOC study show that our model validates patterns and strength of dependence between genes.

Acknowledgments

- Peter Müller (MDACC)
- Giovanni Parmigiani (Johns Hopkins)

• Contact / Preprints:
  ◦ e-mail: [email protected]
  ◦ web: donatello.telesca.googolpages.com/home
