Studying and improving current approaches for using a Bayesian framework for modeling gene regulatory networks
Milad Behbahaninia¹ and Michael Verdicchio²
¹Harrington Department of Bioengineering, ²School of Computing and Informatics
Research Goals:
1) Study current approaches for using the Bayesian framework in biology
2) Improve usage of Bayesian networks in modeling gene regulation
Abstract
Gene regulatory networks are networks of genomic interaction, lending insight into various functions within a cell. Learning new ways in which genes interact can facilitate the discovery of triggering mechanisms and treatments for high-profile diseases such as cancer. There are many ways by which gene regulatory networks can be modeled, one of which is a Bayesian network (BN). Even though Bayesian networks have been widely used, there is definite room for improvement. We will incorporate prior knowledge learned from biomedical knowledge bases into the Bayesian network learning process in order to improve the chances of discovering novel gene interactions from gene expression microarray data.
Why a Bayesian Network?
• A Bayesian network is a probabilistic graphical model representing a set of variables and the probabilistic dependencies among them.
• BNs reduce the number of parameters needed to specify the probability distribution by taking advantage of the conditional independencies between genes. These conditional independencies are encoded in the graph, which aids visualization and reasoning [6].
• BNs have proven useful in biomedical applications, including clinical decision support systems and information retrieval. They have been researched for more than a decade, resulting in practical algorithms and tools.
• Learning the network from a data set involves a search-and-score strategy, which attempts to identify the most probable network given the data. Algorithms search the space of all possible networks for the one that maximizes the score, using greedy, local, or other search strategies [1].
Challenges of BN Learning in the Biomedical Domain:
1. Data sets are extremely large
2. Data sets have few samples but many variables
3. There are redundant and concurrent processes in the cellular system, but a BN model assumes all data come from a single homogeneous distribution
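The search-and-score strategy described above can be illustrated with a minimal sketch. The poster does not say which score or search procedure is used, so everything here is an assumption made for illustration: binary variables, a BIC score, an add-only greedy search, and the hypothetical helper names `family_bic`, `is_ancestor`, and `greedy_search`.

```python
import math
from collections import Counter
from itertools import permutations


def family_bic(data, child, parents):
    """BIC contribution of one node given a parent tuple (binary variables assumed)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood of the child given its parents.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    n_params = 2 ** len(parents)  # one free parameter per parent configuration
    return loglik - 0.5 * n_params * math.log(n)


def is_ancestor(parents, anc, node):
    """True if `anc` can be reached from `node` by following parent links."""
    stack, seen = [node], set()
    while stack:
        cur = stack.pop()
        for p in parents[cur]:
            if p == anc:
                return True
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


def greedy_search(data, variables):
    """Add-only greedy search: repeatedly add the edge with the largest score gain."""
    parents = {v: set() for v in variables}
    score = {v: family_bic(data, v, ()) for v in variables}
    while True:
        best_gain, best_edge = 0.0, None
        for u, v in permutations(variables, 2):
            # Skip existing edges and edges u -> v that would close a cycle.
            if u in parents[v] or is_ancestor(parents, v, u):
                continue
            candidate = tuple(sorted(parents[v] | {u}))
            gain = family_bic(data, v, candidate) - score[v]
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            return parents  # no edge improves the score any further
        u, v = best_edge
        parents[v].add(u)
        score[v] += best_gain


# Toy data (made up): B always equals A, while C is independent of both.
rows = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(40)]
learned = greedy_search(rows, ["A", "B", "C"])
print(learned)
```

On this toy data the search links A and B and leaves C unconnected. The direction of the A and B edge is not identifiable from the score alone, which is one reason additional knowledge can help the learning process.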
Example Bayesian Network
Feature  Value  Expression
S        s1     Smoker
S        s2     Non-smoker
B        b1     Bronchitis
B        b2     No bronchitis
L        l1     Lung cancer
L        l2     No lung cancer
F        f1     Fatigue
F        f2     No fatigue
C        c1     Positive chest X-ray
C        c2     Negative chest X-ray
Network structure (as implied by the conditional probabilities): S → B, S → L, B → F, L → F, L → C
P(s1) = 20%
P(b1 | s1) = 25%      P(b1 | s2) = 5%
P(l1 | s1) = 0.3%     P(l1 | s2) = 0.005%
P(f1 | b1, l1) = 75%  P(f1 | b1, l2) = 10%
P(f1 | b2, l1) = 50%  P(f1 | b2, l2) = 5%
P(c1 | l1) = 60%      P(c1 | l2) = 2%
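As a worked illustration, the probabilities listed for the example network can be combined into a joint distribution and queried by brute-force enumeration. This is a minimal sketch: the probability values are taken from the poster, while the variable and function names (`pick`, `joint`, `probability`) are hypothetical, and the factorization assumes the structure S → B, S → L, B → F, L → F, L → C implied by the conditional probabilities.

```python
from itertools import product

# Conditional probabilities from the example network, as fractions.
P_S1 = 0.20
P_B1 = {"s1": 0.25, "s2": 0.05}
P_L1 = {"s1": 0.003, "s2": 0.00005}
P_F1 = {("b1", "l1"): 0.75, ("b1", "l2"): 0.10,
        ("b2", "l1"): 0.50, ("b2", "l2"): 0.05}
P_C1 = {"l1": 0.60, "l2": 0.02}

NAMES = "SBLFC"
STATES = [("s1", "s2"), ("b1", "b2"), ("l1", "l2"), ("f1", "f2"), ("c1", "c2")]


def pick(p_first, value):
    """Return p_first for the '1' state of a variable, otherwise 1 - p_first."""
    return p_first if value.endswith("1") else 1.0 - p_first


def joint(s, b, l, f, c):
    """P(s, b, l, f, c) = P(s) P(b|s) P(l|s) P(f|b,l) P(c|l)."""
    return (pick(P_S1, s) * pick(P_B1[s], b) * pick(P_L1[s], l)
            * pick(P_F1[(b, l)], f) * pick(P_C1[l], c))


def probability(**evidence):
    """Sum the joint over every assignment consistent with the evidence."""
    total = 0.0
    for assignment in product(*STATES):
        world = dict(zip(NAMES, assignment))
        if all(world[name] == value for name, value in evidence.items()):
            total += joint(*assignment)
    return total


p_l1 = probability(L="l1")  # marginal probability of lung cancer
p_l1_given_c1 = probability(L="l1", C="c1") / probability(C="c1")

# Free parameters: one per parent configuration of each binary node,
# versus one per joint state (minus one) without any independencies.
bn_parameters = 1 + 2 + 2 + 4 + 2
full_joint_parameters = 2 ** 5 - 1

print(p_l1, p_l1_given_c1, bn_parameters, full_joint_parameters)
```

The parameter counts (11 for the network versus 31 for an unrestricted joint distribution over five binary variables) show the saving that comes from encoding conditional independencies in the graph.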
Bayesian Network Learning with Knowledge Integration
[Figure: Bayesian network learning with knowledge integration. Labels: Unknown Gene Network; Micro-Array Data (Genes by Tissues/Samples); Biological Knowledge; Bayesian Learning Algorithm; Learned Network over genes A, B, C, D.]
Knowledge Resources Available for Integration with BNs:
• Gene Ontology (GO): Annotates genes with their processes and functions. Edges learned by the BN can be validated using GO terms.
• Interactions: BN edges can be validated by comparing the connected genes to known interactions from biomedical literature.
• Pathways: Edges learned by the BN can be validated by finding biological signaling pathways shared by the genes.
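The edge-validation idea can be sketched as a simple lookup against a set of known gene pairs. The poster does not specify how knowledge is scored or combined with the data, so the per-edge log prior below is only one possible assumption. The function names, the probabilities `p_known` and `p_unknown`, and the gene pairs are all hypothetical.

```python
import math


def supported_edges(learned_edges, known_pairs):
    """Return the learned edges whose two genes appear as a known pair.

    Direction is ignored, since a literature interaction or shared pathway
    is treated here as an undirected relationship (an assumption).
    """
    known = {frozenset(pair) for pair in known_pairs}
    return [edge for edge in learned_edges if frozenset(edge) in known]


def log_structure_prior(edges, known_pairs, p_known=0.8, p_unknown=0.2):
    """Unnormalized log prior that favors edges backed by prior knowledge.

    Only edges present in the network contribute; p_known and p_unknown
    are illustrative values, not taken from the poster.
    """
    known = {frozenset(pair) for pair in known_pairs}
    return sum(math.log(p_known if frozenset(edge) in known else p_unknown)
               for edge in edges)


# Hypothetical learned network over genes A-D and hypothetical known pairs.
learned_edges = [("A", "B"), ("A", "D"), ("C", "D")]
known_pairs = [("B", "A"), ("C", "D")]

validated = supported_edges(learned_edges, known_pairs)
prior = log_structure_prior(learned_edges, known_pairs)
print(validated, prior)
```

Adding a term like `log_structure_prior` to a data-based network score is one way a search-and-score learner could prefer structures consistent with existing knowledge, while still allowing novel edges when the data support them strongly enough.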
What's Next?
The immediate goal is to incorporate biological knowledge obtained from knowledge repositories to improve the Bayesian network learning technique, which will be validated by comparing the improved method to the traditional method. The improved technique will be used in a human cancer study. The future goal is to apply similar techniques, learned from Bayesian networks, to causal networks. Causal networks are a special case of Bayesian networks in which each connection carries with it the implication of a causal relationship. In the end, the modified techniques for Bayesian and causal networks will be evaluated and assessed.
Works Cited
1. E. Almasri, P. Larsen, G. Chen, and Y. Dai, "Incorporating literature knowledge in Bayesian network for inferring gene networks with gene expression data," 2008, pp. 184-195. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-79450-9_18
2. Ashburner et al., "Gene Ontology: tool for the unification of biology," 2000. [Online]. Available: http://www.nature.com.ezproxy1.lib.asu.edu/ng/journal/v25/n1/abs/ng0500_25.html
3. Harris et al., "The Gene Ontology (GO) database and informatics resource," 2003. [Online]. Available: http://www.nature.com.ezproxy1.lib.asu.edu/ng/journal/v25/n1/abs/ng0500_25.html
4. M. Kanehisa, S. Goto, S. Kawashima, and A. Nakaya, "The KEGG databases at GenomeNet," 2001. [Online]. Available: http://nar.oxfordjournals.org.ezproxy1.lib.asu.edu/cgi/content/full/30/1/42
5. P. Larsen, E. Almasri, G. Chen, and Y. Dai, "A statistical method to incorporate biological knowledge for generating testable novel gene regulatory interactions from microarray experiments," BMC Bioinformatics, vol. 8, pp. 317+, August 2007. [Online]. Available: http://dx.doi.org/10.1186/1471-2105-8-317
6. J. Pena, J. Bjorkegren, and J. Tegner, "Growing Bayesian Network Models of Gene Networks from Seed Genes," 2005. [Online]. Available: http://bioinformatics.oxfordjournals.org.ezproxy1.lib.asu.edu/cgi/reprint/21/suppl_2/ii224
Acknowledgements
FURI: Research Program
Seungchan Kim, Ph.D., Assistant Professor, School of Computing and Informatics
Ina Sen, Graduate Research Associate, School of Computing and Informatics
Archana Ramesh, Graduate Research Associate, School of Computing and Informatics