Modeling Dependent Gene Expression

Donatello Telesca
University of Texas, M.D. Anderson Cancer Center
Department of Biostatistics

Joint work with

Peter Müller (M.D. Anderson, Biostatistics)
Giovanni Parmigiani (Johns Hopkins, Biostatistics)

Modeling Dependent Gene Expression – p. 1/22

Motivation: Epithelial Ovarian Cancer (EOC)

• Epithelial tumors start from the cells that cover the outer surface of the ovary. Most ovarian tumors are epithelial cell tumors.
• Poor outcome in EOC patients is associated with metastases to the peritoneum and stroma.
• Evidence is mounting that an inflammatory process contributes to tumor growth and metastasis to the peritoneum in EOC.

EOC (Complement and Coagulation Cascade Pathway)

Outline

• From Pathways to Conditional Independence Priors
  ◦ Non-recursive graphs and Markov Random Fields
• Probability of Expression (Parmigiani and Garrett, 2002)
  ◦ Modeling gene expression with Normal-Uniform mixtures
• Dependent Probability of Expression
  ◦ Conditional dependence and tetrachoric correlation
• Posterior Inferences and Computations
  ◦ Model determination via RJ-MCMC
• Applications
  ◦ A simple simulation
  ◦ EOC study

From Pathways to Conditional Independence Priors

◦ We represent a pathway as a graph G = {V, E}, where V = V(G) is the set of genes involved in the pathway and E = E(G) is a set of directed or undirected edges.
◦ Pathways usually involve loops and reciprocal (a ⇆ b) edges.
◦ We assume that pathways can be encoded in the structure of a reciprocal graph (Koster, 1996).

[Figure: graphs on the four vertices 1, 2, 3, 4. A first graph carrying the superscript M is compared with a second graph; in this build the two are marked as not equal (≠).]

POE: Probability of Expression (Parmigiani and Garrett, 2002)

• $y_{gt}$: expression of gene $g$ in sample $t$, with $g = 1, \ldots, N$ and $t = 1, \ldots, n$.
• $\tilde{y}_{gt} = y_{gt} - (\alpha_t + m_g)$

[Figure: histogram of Observed mRNA Intensity (horizontal axis, −5 to 10) against Frequency (vertical axis, 0.00 to 0.35).]

$$p(\tilde{y}_{gt} \mid e_{gt}) = f_{e_{gt}}(\tilde{y}_{gt} \mid \kappa_g, s_g), \quad \text{with} \quad f_{g,-1} = U(-\kappa_g^-, 0), \quad f_{g,0} = N(0, s_g), \quad f_{g,1} = U(0, \kappa_g^+)$$

POE: Probability of Expression

◦ Trinary indicators of over/underexpression:

$$e_{gt} = \begin{cases} 1 & \text{if over expression} \\ 0 & \text{if normal expression} \\ -1 & \text{if under expression} \end{cases}$$

◦ The overall proportion of differentially expressed (DE) genes is characterized by:

$$\pi_g^- = P(e_{gt} = -1) \quad \text{and} \quad \pi_g^+ = P(e_{gt} = 1)$$

◦ Specifically, for each data point:

$$P(e_{gt} = 1 \mid y_{gt}, \pi_g^+, \pi_g^-, f_{1,g}, f_{0,g}) = \frac{\pi_g^+ f_{1,g}(y_{gt})}{\pi_g^+ f_{1,g}(y_{gt}) + (1 - \pi_g^+ - \pi_g^-) f_{0,g}(y_{gt})}$$

$$P(e_{gt} = -1 \mid y_{gt}, \pi_g^+, \pi_g^-, f_{-1,g}, f_{0,g}) = \frac{\pi_g^- f_{-1,g}(y_{gt})}{\pi_g^- f_{-1,g}(y_{gt}) + (1 - \pi_g^+ - \pi_g^-) f_{0,g}(y_{gt})}$$
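As an illustration of the two ratios above, the following minimal sketch evaluates them for a single gene with a Normal-Uniform mixture. The function name `poe_posteriors`, the parameter values, and the reading of s_g as a standard deviation are assumptions made for this example; they are not taken from the talk.

```python
import numpy as np

def poe_posteriors(y, pi_plus, pi_minus, s, kappa_plus, kappa_minus):
    """Probabilities of over- and under-expression for centred values y.

    Each probability compares one uniform component against the normal
    component only, mirroring the two ratios stated on the slide.
    """
    y = np.asarray(y, dtype=float)
    # Normal component f_0 = N(0, s); s is treated as a standard deviation (assumption).
    f0 = np.exp(-0.5 * (y / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    # Uniform components: f_1 = U(0, kappa_plus) and f_{-1} = U(-kappa_minus, 0).
    f1 = np.where((y >= 0) & (y <= kappa_plus), 1.0 / kappa_plus, 0.0)
    fm1 = np.where((y >= -kappa_minus) & (y <= 0), 1.0 / kappa_minus, 0.0)
    # Weight of the "normal expression" component.
    w0 = (1.0 - pi_plus - pi_minus) * f0
    p_plus = pi_plus * f1 / (pi_plus * f1 + w0)
    p_minus = pi_minus * fm1 / (pi_minus * fm1 + w0)
    return p_plus, p_minus

# Hypothetical centred intensities and mixture parameters.
y_tilde = np.array([-6.0, -0.5, 0.0, 0.5, 6.0])
p_plus, p_minus = poe_posteriors(
    y_tilde, pi_plus=0.1, pi_minus=0.1, s=1.0, kappa_plus=10.0, kappa_minus=10.0
)
```

Values far in the right tail receive a probability of over-expression close to 1, while values near zero are dominated by the normal component.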

POE: Probability of Expression

• The POE framework converts abundance measurements into probabilities of DE, providing an interpretable scale for tumor classification and stabilizing the abundance measurements.

Key Assumptions:
1) $e_{gt}$ independent given $\pi_g^+$, $\pi_g^-$ and the $f_g$'s
2) $y_{gt}$ independent given $e_{gt}$, $\alpha_t$ and $m_g$

◦ We will relax assumption (1) by integrating known pathway interactions in the form of a conditional independence prior.

DepPOE: Dependent Probability of Expression

$y_{gt}$, $y_{ne(g)t}$ ⇛ mRNA Abundance
$e_{gt}$, $e_{ne(g)t}$ ⇛ Trinary indicators of DE
$z_{gt}$, $z_{ne(g)t}$ ⇛ Latent Probit scores
$\Omega_z \mid G = \{V, E\}$ ⇛ Polychoric Concentration

DepPOE: Dependent Probability of Expression

◦ Trinary indicators of over/underexpression (probit formulation):

$$e_{gt} = \begin{cases} 1 & \text{if } z_{gt} > \phi_g \quad \text{(over expression)} \\ 0 & \text{if } -1 < z_{gt} \le \phi_g \quad \text{(normal expression)} \\ -1 & \text{if } z_{gt} \le -1 \quad \text{(under expression)} \end{cases} \qquad \text{where } z_{gt} \sim N(\mu_{gt}, 1)$$

◦ We introduce a dependence prior via tetrachoric correlations:

$$\mu_{gt} = x_{gt}' b_g + z_{ne(g)t}' c_{ne(g)} \;\Rightarrow\; \underbrace{Z}_{N \times n} \sim MN(\underbrace{\mu}_{N \times n}, \underbrace{\Omega_z^{-1}}_{N \times N}, \underbrace{I_n}_{n \times n})$$

The $(i, j)$th element of $\Omega_z$ is $-c_{ij}$, and $c_{ij} = 0$ iff $i \notin ne(j)$ → conditional independence.
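The link between missing edges and zeros in the concentration matrix can be sketched numerically. The example below assumes a four-gene chain graph, a unit diagonal for Ω_z (matching the unit conditional variance of the probit scores), symmetric coefficients c_ij = 0.3, a zero mean, and a common threshold φ = 1; all of these values and the helper names are hypothetical.

```python
import numpy as np

def precision_from_graph(n_genes, edges, c):
    """Build Omega_z with unit diagonal and -c_ij on every edge (i, j)."""
    omega = np.eye(n_genes)
    for (i, j) in edges:
        omega[i, j] = -c[(i, j)]
        omega[j, i] = -c[(i, j)]  # symmetric coefficients (assumption)
    return omega

def trinary_indicator(z, phi):
    """Map probit scores to e in {-1, 0, 1} using the thresholds -1 and phi."""
    z = np.asarray(z, dtype=float)
    return np.where(z > phi, 1, np.where(z <= -1.0, -1, 0))

# Hypothetical chain graph 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
c = {edge: 0.3 for edge in edges}
omega = precision_from_graph(4, edges, c)
sigma = np.linalg.inv(omega)

# Draw n = 60 independent columns, each N(0, sigma), giving Z of shape N x n.
rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(4), sigma, size=60).T
E = trinary_indicator(Z, phi=1.0)
```

Genes 0 and 2 are not neighbours, so `omega[0, 2]` is zero even though `sigma[0, 2]` is not: they are marginally correlated but conditionally independent given the remaining genes.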

Posterior Inference and Computation

• The availability of closed-form conditional posterior distributions allows for straightforward Gibbs sampling, given a specific graph G = {V, E}.
• Recognizing that the prior pathway represents knowledge of genetic interactions in a non-pathological state, we allow for deviation from the prior dependence structure encoded in G = {V, E}.
• We consider the prior path diagram G = {V, E} as the saturated model and allow for random deletion/insertion of edges compatible with the original pathway.
• If we define ν ∈ {G}_ν as a compatible reconfiguration of the original pathway, we are now interested in the following distribution:

$$P(\theta, \nu \mid Y) \propto P(Y \mid \theta, \nu)\, P(\theta \mid \nu)\, P(\nu \in \{G\}_\nu)$$
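The idea of moving among compatible reconfigurations can be illustrated with a toy proposal step. The sketch below only shows how a candidate ν might be generated by deleting or re-inserting one edge of the saturated prior graph; it omits the accept/reject step and the parameter updates, and the function `propose_edge_flip` and the four-edge graph are hypothetical, not the scheme used in the talk.

```python
import random

def propose_edge_flip(prior_edges, current_edges, rng):
    """Toggle one edge of the prior pathway: delete it if present, insert it if absent."""
    edge = rng.choice(sorted(prior_edges))
    proposal = set(current_edges)
    if edge in proposal:
        proposal.remove(edge)
    else:
        proposal.add(edge)
    return frozenset(proposal), edge

# Hypothetical saturated model: a four-gene cycle.
prior_edges = frozenset({(1, 2), (2, 3), (3, 4), (1, 4)})
rng = random.Random(1)

state = prior_edges
visited = [state]
for _ in range(20):
    state, _flipped = propose_edge_flip(prior_edges, state, rng)
    visited.append(state)
```

Because the edge is chosen uniformly from the prior edge set, this proposal is symmetric, and every visited configuration is a subgraph of the prior pathway.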

Posterior Inference and Computation: (RJ-MCMC Scheme)

◦ We consider trans-dimensional moves that operate seamlessly between the space of pathways and the corresponding conditional independence structures.

[Figure: graphs on the four vertices 1, 2, 3, 4. A first graph carrying the superscript M is set equal (=) to a second graph.]

Simulation Study

◦ We define latent expression scores as:

$$w_{gt} = z_{gt} + X_{gt}' b_g, \qquad \text{where} \quad \underbrace{Z}_{N \times n} \sim MN(0, \Omega_z^{-1}, I_n)$$

◦ The mRNA abundance is then defined as (N = 200, n = 60):

$$y_{gt} \mid w_{gt} \le -1 \sim N(-4, 2^2), \qquad y_{gt} \mid w_{gt} > 3 \sim N(4, 2^2), \qquad y_{gt} \mid -1 < w_{gt} \le 3 \sim N(0, 1)$$

◦ We consider two conditional dependence schemes, a cluster scheme and a banded scheme, and fit the model with a misspecified prior pathway.
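The data-generating mechanism described above can be sketched as follows for the banded scheme. The tridiagonal form of Ω_z, the off-diagonal strength 0.3, the seed, and setting the regression term X'b to zero are assumptions chosen for illustration; the talk does not state these values.

```python
import numpy as np

rng = np.random.default_rng(42)
N, n = 200, 60  # genes and samples, as on the slide

# Banded (tridiagonal) concentration matrix; the strength 0.3 is an assumption.
omega = np.eye(N)
idx = np.arange(N - 1)
omega[idx, idx + 1] = -0.3
omega[idx + 1, idx] = -0.3

# Z ~ MN(0, omega^{-1}, I_n): columns are independent N(0, omega^{-1}) draws.
sigma = np.linalg.inv(omega)
chol = np.linalg.cholesky(sigma)
Z = chol @ rng.standard_normal((N, n))

# Latent expression scores; the regression term X'b is set to zero (assumption).
W = Z

# Abundance drawn from the component selected by the thresholds -1 and 3.
under = W <= -1.0
over = W > 3.0
Y = rng.normal(0.0, 1.0, size=(N, n))
Y[under] = rng.normal(-4.0, 2.0, size=under.sum())
Y[over] = rng.normal(4.0, 2.0, size=over.sum())
```

A cluster scheme could be obtained in the same way by replacing the tridiagonal pattern with a block-diagonal one.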

Simulation Study: (Banded Structure)

[Figure: three panels titled Signal, P(C_ij ≠ 0 | Y), and E(C_ij | Y).]

Simulation Study: (Banded Structure)

[Figure: three panels titled p* = (p+ − p−), mRNA Abundance, and Signal, each with axes t (10 to 60) and g (50 to 200).]

Simulation Study: (Cluster Structure)

[Figure: three panels titled Signal, P(C_ij ≠ 0 | Y), and E(C_ij | Y).]

Simulation Study: (Cluster Structure)

[Figure: three panels titled p* = (p+ − p−), mRNA Abundance, and Signal, each with axes t (10 to 60) and g (50 to 200).]

EOC Study (Complement and Coagulation Pathway)

• We focus on the comparison of 10 peritoneal samples from patients with benign ovarian pathology (bPT) versus 14 samples from patients with malignant ovarian pathology (mPT).
• Wang et al. (2005) report a study of epithelial ovarian cancer (EOC). The goal of the study is to characterize the role of the tumor microenvironment in favoring the intra-peritoneal spread of EOC.
• One subset of genes reported on the NIH custom microarray consists of genes in the coagulation and complement pathway (http://www.genome.ad.jp). The arcs in the pathway are interpreted as prior judgement about (approximate) conditional dependence.

EOC Study (Complement and Coagulation Pathway)

[Figure: two panels. "No. of edges": Number of Edges (0 to 120) against RJ-MCMC Iteration (0 to 5000). "E[FDR | Y]": E(FDR | Y) (0.00 to 0.20) against Number of Significant Edges (0 to 150).]

EOC Study (Complement and Coagulation Pathway)

[Figure: network of selected edges among pathway genes. Node labels: C4A, CR2, C2, C5R1, C5, C3conv, CR1, CCL13, PROS1, IL8, SERPINE1, C3AR1, CXCL14, CXCL6, VWF, PLAU, F8, F10, PROC, F2, F5, THBD, F2R, F9.]

◦ 10 benign samples
◦ 14 tumor samples
◦ 179 genes
◦ Edges selected so that E(FDR | Y) ≤ 0.05
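One common way to meet a bound such as E(FDR | Y) ≤ 0.05 is to rank edges by posterior inclusion probability and keep the largest top-ranked set whose average posterior probability of being a false discovery stays below the bound. The sketch below assumes that rule; the talk states only the bound, and the function name and probabilities are hypothetical.

```python
import numpy as np

def select_edges_by_fdr(incl_prob, alpha=0.05):
    """Return indices of the largest top-ranked edge set whose posterior
    expected FDR, the mean of (1 - p) over the set, is at most alpha."""
    p = np.asarray(incl_prob, dtype=float)
    order = np.argsort(-p)  # most probable edges first
    running_fdr = np.cumsum(1.0 - p[order]) / np.arange(1, p.size + 1)
    ok = np.nonzero(running_fdr <= alpha)[0]
    k = ok[-1] + 1 if ok.size else 0
    return order[:k], (running_fdr[k - 1] if k else 0.0)

# Hypothetical posterior inclusion probabilities P(C_ij != 0 | Y) for six edges.
probs = np.array([0.99, 0.98, 0.97, 0.90, 0.60, 0.20])
selected, fdr = select_edges_by_fdr(probs)
```

With these illustrative values the first four edges are retained, since adding the fifth would push the expected FDR above 0.05.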

Summary

• We provide a coherent probabilistic framework that integrates prior information about genetic interaction into the analysis of expression data.
• Prior information is formally introduced into the POE model for molecular classification in cancer via conditional independence priors.
• Dependence between genes is formalized in terms of polychoric correlations between trinary indicators of over, under, or normal expression.
• The limitations associated with the multivariate probit formulation are counterbalanced by the ease of representing conditional independence in the Gaussian framework.
• Preliminary results on simulated data and on data from an EOC study show that our model validates patterns and strength of dependence between genes.

Acknowledgments

- Peter Müller (MDACC)
- Giovanni Parmigiani (Johns Hopkins)

• Contact / Preprints:
  ◦ e-mail: [email protected]
  ◦ web: donatello.telesca.googolpages.com/home
