Modeling Dependent Gene Expression
Donatello Telesca
University of Texas, M.D. Anderson Cancer Center, Department of Biostatistics
Joint work with Peter Müller (M.D. Anderson, Biostatistics) and Giovanni Parmigiani (Johns Hopkins, Biostatistics)
Modeling Dependent Gene Expression – p. 1/22
Motivation: Epithelial Ovarian Cancer (EOC)
• Epithelial tumors start from the cells that cover the outer surface of the ovary. Most ovarian tumors are epithelial cell tumors.
• Poor outcome in EOC patients is associated with metastases to the peritoneum and stroma.
• Evidence is mounting that an inflammatory process contributes to tumor growth and metastasis to the peritoneum in EOC.
EOC (Complement and Coagulation Cascade Pathway)
[Figure: diagram of the complement and coagulation cascade pathway.]
Outline
• From Pathways to Conditional Independence Priors
  ◦ Non-recursive graphs and Markov Random Fields
• Probability of Expression (Parmigiani and Garrett 2002)
  ◦ Modeling gene expression with Normal Uniform mixtures
• Dependent Probability of Expression
  ◦ Conditional dependence and tetrachoric correlation
• Posterior Inferences and Computations
  ◦ Model determination via RJ–MCMC
• Applications
  ◦ A simple simulation
  ◦ EOC study
From Pathways to Conditional Independence Priors
◦ We represent a pathway as a graph G = {V, E}, where V = V(G) is a set of genes involved in the pathway, and E = E(G) is a set of directed or undirected edges.
◦ Pathways usually involve loops and reciprocal (a ⇆ b) edges.
◦ We assume that pathways can be encoded in the structure of a reciprocal graph (Koster, 1996).
[Figure: example graphs on nodes 1, 2, 3, 4 with a map labeled M between them; successive builds mark the comparison as = and, in the last build, ≠.]
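As a rough illustration of the graph representation G = {V, E}, the sketch below stores directed and undirected edges and derives a neighbor set ne(g) for each gene. The toy pathway and the function name `neighbor_sets` are hypothetical, and reading neighbors off as plain adjacency (ignoring edge direction) is a simplifying assumption; the Markov properties of reciprocal graphs in Koster (1996) are more subtle than this.

```python
def neighbor_sets(vertices, directed, undirected):
    """Return ne(g) for each gene: every gene joined to g by some edge.

    Edge direction is ignored here, which is an illustrative
    simplification and not the reciprocal-graph Markov property.
    """
    ne = {v: set() for v in vertices}
    for a, b in list(directed) + list(undirected):
        ne[a].add(b)
        ne[b].add(a)
    return ne


# Hypothetical toy pathway on four genes (not taken from the talk).
V = [1, 2, 3, 4]
directed = [(1, 2), (2, 1), (2, 3)]  # (1, 2) and (2, 1) form a reciprocal pair
undirected = [(3, 4)]

ne = neighbor_sets(V, directed, undirected)
for gene in V:
    print(gene, sorted(ne[gene]))
```

Genes with no edge between them, such as 1 and 4 in this toy example, are the pairs a conditional independence prior would treat as conditionally independent given the rest.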
POE: Probability of Expression (Parmigiani and Garrett, 2002)
• $y_{gt}$: expression, gene g, sample t, with g = 1, ..., N and t = 1, ..., n.
• $\tilde{y}_{gt} = y_{gt} - (\alpha_t + m_g)$
• $p(\tilde{y}_{gt} \mid e_{gt}) = f_{e_{gt}}(\tilde{y}_{gt} \mid \kappa_g, s_g)$ with
  $f_{g,-1} = U(-\kappa_g^-, 0)$, $f_{g,0} = N(0, s_g)$, $f_{g,1} = U(0, \kappa_g^+)$
[Figure: histogram of observed mRNA intensity; x-axis: Observed mRNA Intensity (−5 to 10), y-axis: Frequency (0.00 to 0.35).]
POE: Probability of Expression
◦ Trinary indicators of over/underexpression:
$$e_{gt} = \begin{cases} 1 & \text{if over expression} \\ 0 & \text{if normal expression} \\ -1 & \text{if under expression} \end{cases}$$
◦ The overall proportion of DE genes is characterized by:
$$\pi_g^- = P(e_{gt} = -1) \quad \text{and} \quad \pi_g^+ = P(e_{gt} = 1)$$
◦ Specifically, for each data point:
$$P(e_{gt} = 1 \mid y_{gt}, \pi_g^+, \pi_g^-, f_{g,1}, f_{g,0}) = \frac{\pi_g^+ f_{g,1}(y_{gt})}{\pi_g^+ f_{g,1}(y_{gt}) + (1 - \pi_g^+ - \pi_g^-) f_{g,0}(y_{gt})}$$
$$P(e_{gt} = -1 \mid y_{gt}, \pi_g^+, \pi_g^-, f_{g,-1}, f_{g,0}) = \frac{\pi_g^- f_{g,-1}(y_{gt})}{\pi_g^- f_{g,-1}(y_{gt}) + (1 - \pi_g^+ - \pi_g^-) f_{g,0}(y_{gt})}$$
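The two posterior probabilities above can be evaluated directly once the mixture components are fixed. The following is a minimal sketch for a single centered observation, assuming the normal component is parameterized by its standard deviation (the slides do not say whether $s_g$ is a variance or a standard deviation) and that all parameters are known; the function names and the numeric values are illustrative only.

```python
import math


def normal_pdf(y, sd):
    """Density of N(0, sd^2) at y; sd is assumed to be a standard deviation."""
    return math.exp(-0.5 * (y / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def uniform_pdf(y, lo, hi):
    """Density of U(lo, hi) at y, taken as zero outside the open interval."""
    return 1.0 / (hi - lo) if lo < y < hi else 0.0


def poe_probabilities(y, pi_plus, pi_minus, kappa_plus, kappa_minus, sd):
    """Return (P(e = 1 | y), P(e = -1 | y)) under the normal-uniform mixture."""
    pi_zero = 1.0 - pi_plus - pi_minus
    f_plus = uniform_pdf(y, 0.0, kappa_plus)
    f_minus = uniform_pdf(y, -kappa_minus, 0.0)
    f_zero = normal_pdf(y, sd)

    def ratio(num, den):
        # Guard against 0/0 when every density underflows to zero.
        return num / den if den > 0.0 else 0.0

    p_plus = ratio(pi_plus * f_plus, pi_plus * f_plus + pi_zero * f_zero)
    p_minus = ratio(pi_minus * f_minus, pi_minus * f_minus + pi_zero * f_zero)
    return p_plus, p_minus


# Illustrative values only: a large positive centered intensity.
p_plus, p_minus = poe_probabilities(8.0, 0.2, 0.2, 10.0, 10.0, 1.0)
print(p_plus, p_minus)
```

Because the two uniform components have disjoint supports, at most one of the two probabilities is nonzero for any given observation.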
POE: Probability of Expression
• The POE framework converts abundance measurements into probabilities of DE, providing an interpretable scale for tumor classification and stabilizing the abundance measurements.
Key Assumptions:
1) $e_{gt}$ independent given $\pi_g^+$, $\pi_g^-$ and the $f_g$'s
2) $y_{gt}$ independent given $e_{gt}$, $\alpha_t$ and $m_g$
◦ We will relax assumption (1) by integrating known pathway interactions in the form of a conditional independence prior.
DepPOE: Dependent Probability of Expression
[Diagram: model layers for gene g and its neighbors ne(g)]
$y_{gt}$, $y_{ne(g)t}$ ⇛ mRNA Abundance
$e_{gt}$, $e_{ne(g)t}$ ⇛ Trinary indicators of DE
$z_{gt}$, $z_{ne(g)t}$ ⇛ Latent Probit scores
$\Omega_z \mid G = \{V, E\}$ ⇛ Polychoric Concentration
DepPOE: Dependent Probability of Expression
◦ Trinary indicators of over/underexpression (Probit formulation):
$$e_{gt} = \begin{cases} 1 & \text{if } z_{gt} > \phi_g \quad \text{(over expression)} \\ 0 & \text{if } -1 < z_{gt} \le \phi_g \quad \text{(normal expression)} \\ -1 & \text{if } z_{gt} \le -1 \quad \text{(under expression)} \end{cases}$$
where $z_{gt} \sim N(\mu_{gt}, 1)$.
◦ We introduce a dependence prior via tetrachoric correlations:
$$\mu_{gt} = x_{gt}' b_g + z_{ne(g)t}' c_{ne(g)}$$
$$\Rightarrow \underbrace{Z}_{N \times n} \sim MN(\underbrace{\mu}_{N \times n}, \underbrace{\Omega_z^{-1}}_{N \times N}, \underbrace{I_n}_{n \times n})$$
The (i, j)th element of $\Omega_z$ is $-c_{ij}$, and $c_{ij} = 0$ iff $i \notin ne(j)$ → conditional independence.
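To make the link between the neighbor coefficients and the concentration matrix concrete, the sketch below fills in $\Omega_z$ from a table of coefficients $c_{ij}$ and applies the probit thresholds to a latent score. It assumes symmetric coefficients ($c_{ij} = c_{ji}$) and a unit diagonal, the latter following from the unit conditional variance of $z_{gt}$; the function names and the three-gene example are hypothetical.

```python
def precision_from_neighbors(genes, coef):
    """Build Omega_z with off-diagonal entries -c_ij and a unit diagonal.

    coef maps a pair (i, j) of neighboring genes to c_ij; pairs that are
    absent are non-neighbors and keep a zero entry.
    """
    idx = {g: k for k, g in enumerate(genes)}
    size = len(genes)
    omega = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for (i, j), c_ij in coef.items():
        omega[idx[i]][idx[j]] = -c_ij
        omega[idx[j]][idx[i]] = -c_ij  # symmetry assumed: c_ij = c_ji
    return omega


def trinary_indicator(z, phi):
    """Map a latent probit score to 1 (over), 0 (normal), or -1 (under)."""
    if z > phi:
        return 1
    if z <= -1.0:
        return -1
    return 0


# Hypothetical chain 1 - 2 - 3: genes 1 and 3 are not neighbors.
genes = [1, 2, 3]
omega = precision_from_neighbors(genes, {(1, 2): 0.3, (2, 3): -0.2})
for row in omega:
    print(row)
print(trinary_indicator(3.5, 3.0), trinary_indicator(0.0, 3.0), trinary_indicator(-1.0, 3.0))
```

The zero in position (1, 3) of the resulting matrix is what encodes the conditional independence of genes 1 and 3 given gene 2.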
Posterior Inference and Computation
• The availability of closed-form conditional posterior distributions allows for straightforward Gibbs sampling, given a specific graph G = {V, E}.
• Recognizing that the prior pathway represents knowledge of genetic interactions in a non-pathological state, we allow for deviation from the prior dependence structure encoded in G = {V, E}.
• We consider the prior path diagram G = {V, E} as the saturated model and allow for random deletion/insertion of edges compatible with the original pathway.
• If we define $\nu \in \{G\}_\nu$ as a compatible reconfiguration of the original pathway, we are now interested in the following distribution:
$$P(\theta, \nu \mid Y) \propto P(Y \mid \theta, \nu)\, P(\theta \mid \nu)\, P(\nu \in \{G\}_\nu)$$
Posterior Inference and Computation: (RJ-MCMC Scheme)
◦ We consider trans–dimensional moves that operate seamlessly between the space of pathways and the corresponding conditional independence structures.
[Figure: example graphs on nodes 1, 2, 3, 4 with a map labeled M between them, shown over successive builds.]
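The idea of deleting and reinserting edges of the saturated pathway can be illustrated with a much simpler stand-in for the sampler described here. The sketch below is not the RJ-MCMC scheme of the talk: it is a fixed-dimension Metropolis-Hastings walk over edge-inclusion indicators only, with the model parameters replaced by a user-supplied log score. The function `edge_search`, the toy score, and the edge list are all hypothetical.

```python
import math
import random


def edge_search(saturated_edges, log_score, n_iter, seed=0):
    """Metropolis-Hastings over subsets of the saturated edge list.

    Each step toggles one edge chosen uniformly at random (a symmetric
    proposal), so the acceptance ratio reduces to a score difference.
    Returns the fraction of iterations in which each edge was present.
    """
    rng = random.Random(seed)
    current = set(saturated_edges)  # start from the prior pathway
    current_score = log_score(current)
    visits = {e: 0 for e in saturated_edges}
    for _ in range(n_iter):
        edge = rng.choice(saturated_edges)
        proposal = current ^ {edge}  # delete the edge if present, else insert it
        proposal_score = log_score(proposal)
        accept_prob = math.exp(min(0.0, proposal_score - current_score))
        if rng.random() < accept_prob:
            current, current_score = proposal, proposal_score
        for e in current:
            visits[e] += 1
    return {e: visits[e] / n_iter for e in saturated_edges}


# Hypothetical score: two edges are favored by the data, one is not.
edges = [(1, 2), (2, 3), (3, 4)]
supported = {(1, 2), (2, 3)}


def toy_log_score(edge_set):
    return sum(2.0 if e in supported else -2.0 for e in edge_set)


freq = edge_search(edges, toy_log_score, n_iter=5000)
print(freq)
```

Restricting the proposals to edges of the saturated list mirrors the requirement that reconfigurations stay compatible with the original pathway; a full trans-dimensional sampler would also propose values for the coefficients attached to inserted edges.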
Simulation Study
◦ We define latent expression scores as:
$$w_{gt} = z_{gt} + X_{gt}' b_g, \quad \text{where } \underbrace{Z}_{N \times n} \sim MN(0, \Omega_z^{-1}, I_n)$$
◦ The mRNA abundance is then defined as (N = 200, n = 60):
$$y_{gt} \mid w_{gt} \le -1 \sim N(-4, 2^2), \quad y_{gt} \mid w_{gt} > 3 \sim N(4, 2^2), \quad y_{gt} \mid -1 < w_{gt} \le 3 \sim N(0, 1)$$
◦ We will consider two conditional dependence schemes, a cluster scheme and a banded scheme, and fit the model with a misspecified prior pathway.
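A data set of this shape can be generated with a few lines of code. The sketch below is one possible reading of the banded scheme, under stated assumptions: the latent scores follow a first-order autoregression across the gene index within each sample (which gives a tridiagonal, hence banded, concentration matrix), the covariate term $X_{gt}' b_g$ is set to zero, and the autoregression coefficient `rho` is an arbitrary choice. None of these specifics come from the slides.

```python
import random


def simulate(N=200, n=60, rho=0.5, seed=1):
    """Simulate latent scores w and abundances y as N x n nested lists."""
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(N)]
    y = [[0.0] * n for _ in range(N)]
    for t in range(n):  # samples are independent of each other
        z = rng.gauss(0.0, 1.0)
        for g in range(N):
            if g > 0:
                # AR(1) across genes: each gene depends only on the previous one.
                z = rho * z + rng.gauss(0.0, 1.0)
            w[g][t] = z  # covariate term assumed to be zero
            if z <= -1.0:
                y[g][t] = rng.gauss(-4.0, 2.0)  # under expression
            elif z > 3.0:
                y[g][t] = rng.gauss(4.0, 2.0)   # over expression
            else:
                y[g][t] = rng.gauss(0.0, 1.0)   # normal expression
    return y, w


y, w = simulate()
print(len(y), len(y[0]))
```

Note that `random.gauss` takes a standard deviation, so the variance $2^2$ in the abundance model corresponds to the argument 2.0.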
Simulation Study: (Banded Structure)
[Figure: three panels titled Signal, $P(C_{ij} \ne 0 \mid Y)$, and $E(C_{ij} \mid Y)$.]
[Figure: three panels titled $p^* = (p^+ - p^-)$, mRNA Abundance, and Signal; vertical axis g (ticks 50 to 200), horizontal axis t (ticks 10 to 60).]
Simulation Study: (Cluster Structure)
[Figure: three panels titled Signal, $P(C_{ij} \ne 0 \mid Y)$, and $E(C_{ij} \mid Y)$.]
[Figure: three panels titled $p^* = (p^+ - p^-)$, mRNA Abundance, and Signal; vertical axis g (ticks 50 to 200), horizontal axis t (ticks 10 to 60).]
EOC Study (Complement and Coagulation Pathway)
• We focus on the comparison of 10 peritoneal samples from patients with benign ovarian pathology (bPT) versus 14 samples from patients with malignant ovarian pathology (mPT).
• Wang et al. (2005) report a study of epithelial ovarian cancer (EOC). The goal of the study is to characterize the role of the tumor microenvironment in favoring the intra–peritoneal spread of EOC.
• One subset of genes reported on the NIH custom microarray are genes in the coagulation and complement pathway (http://www.genome.ad.jp). The arcs in the pathway are interpreted as prior judgement about (approximate) conditional dependence.
EOC Study (Complement and Coagulation Pathway)
[Figure, panel "No. of edges": Number of Edges (0 to 120) against RJ-MCMC Iteration (0 to 5000).]
[Figure, panel "E[FDR | Y]": E(FDR | Y) (0.00 to 0.20) against Number of Significant Edges (0 to 150).]
EOC Study (Complement and Coagulation Pathway)
[Figure: network of selected edges; node labels: C4A, CR2, C2, C5R1, C5, C3conv, CR1, CCL13, PROS1, IL8, SERPINE1, C3AR1, CXCL14, CXCL6, VWF, PLAU, F8, F10, PROC, F2, F5, THBD, F2R, F9.]
◦ 10 benign samples
◦ 14 tumor samples
◦ 179 Genes
◦ Edges selected so that E(FDR | Y) ≤ 0.05
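The slides do not state how E(FDR | Y) was computed. One common definition, assumed here, takes the posterior expected FDR of a selected set to be the average of one minus the posterior inclusion probability over that set. The sketch below selects the largest set of top-ranked edges meeting the 0.05 bound; the function name and the inclusion probabilities are made up for illustration, and the gene pairs are arbitrary pairings of labels from the figure, not reported results.

```python
def select_edges(incl_prob, alpha=0.05):
    """Pick top-ranked edges while the posterior expected FDR stays <= alpha.

    incl_prob maps an edge to its posterior inclusion probability.
    Returns the selected edges and their expected FDR.
    """
    ranked = sorted(incl_prob.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    error_sum = 0.0  # expected number of false discoveries so far
    for edge, p in ranked:
        if (error_sum + (1.0 - p)) / (len(selected) + 1) > alpha:
            break  # edges are sorted, so later edges can only raise the average
        error_sum += 1.0 - p
        selected.append(edge)
    expected_fdr = error_sum / len(selected) if selected else 0.0
    return selected, expected_fdr


# Made-up inclusion probabilities for four hypothetical edges.
probs = {
    ("F2", "F5"): 0.99,
    ("C2", "C5"): 0.97,
    ("F8", "F9"): 0.90,
    ("IL8", "PLAU"): 0.40,
}
selected, efdr = select_edges(probs)
print(selected, efdr)
```

In practice the inclusion probabilities would be estimated from the sampler output, for example as the fraction of iterations in which each edge is present.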
Summary
• We provide a coherent probabilistic framework that integrates prior information about genetic interaction into the analysis of expression data.
• Prior information is formally introduced into the POE model for molecular classification in cancer via conditional independence priors.
• Dependence between genes is formalized in terms of polychoric correlations between trinary indicators of over, under, or normal expression.
• The limitations associated with the multivariate probit formulation are counterbalanced by the ease of representing conditional independence in the Gaussian framework.
• Preliminary results on simulated data and on data from an EOC study show that our model validates patterns and strength of dependence between genes.
Acknowledgments
- Peter Müller (MDACC)
- Giovanni Parmigiani (Johns Hopkins)
• Contact / Preprints:
  ◦ e-mail: [email protected]
  ◦ web: donatello.telesca.googolpages.com/home