
Evaluation of immune response against Plasmodium falciparum malaria based on a hidden Markov model

Jimmy Alexander Cifuentes, Luis Fernando Niño
Universidad Nacional de Colombia, Bogotá, Colombia
jiacifuentesro, [email protected]

Abstract—A hidden Markov model (HMM) is presented that evaluates the immune response against Plasmodium falciparum malaria based on experimentally obtained epitope responses. The HMM is trained and refined with the experimental results and tested with new data. The model assigns each sequence a generation probability, which is lower when the epitope does not generate an adequate immune response.

Index Terms—Hidden Markov model, peptides, epitopes, scoring.

I. INTRODUCTION

The most lethal form of malaria is caused by the parasite Plasmodium falciparum, which kills 2 million people annually, most of them African children under 5 years old, and affects more than 500 million people worldwide each year [1]. There is currently a great deal of research aimed at finding a synthetic vaccine that would reduce the number of infections and, with it, the number of deaths from this disease. Highlighting areas of similarity between two amino acid sequences can reveal functional relationships that allow proteins of interest (epitopes or amino acid sequences) to be classified, for example those involved in the infection of red blood cells by Plasmodium falciparum. A hidden Markov model can be used to evaluate the immune response against Plasmodium falciparum malaria, based on recent experimental results.

Section II gives a short review of hidden Markov model theory. Section III presents the immune evaluation problem, which is the main problem addressed by this approach. Section IV describes the model design, the data used, and their distribution. Section V describes the model implementation and how the experiments were performed. Section VI shows the results obtained, and Section VII presents the conclusions and future work.

II. HIDDEN MARKOV MODELS

A hidden Markov model (HMM) is a stochastic method that uses probability measures to model data sequences represented as observation vectors [14]. An HMM consists of a stochastic process that is not observable (hidden) and another set of stochastic processes that produce a sequence of observable symbols [15].

HMMs have been used primarily in speech recognition [16] and time-series clustering [17], but their use has spread to other areas such as handwriting recognition [11] and visual pattern recognition [18], and even to bioinformatics, for example in modeling protein superfamilies [19] and coding regions in DNA [20].

Formally, an HMM can be defined by a 5-tuple

$\lambda = (\Sigma, Q, A, B, \pi)$,

where
• $\Sigma = \{v_1, v_2, \ldots, v_M\}$ is a finite alphabet of $M$ symbols;
• $Q = \{1, 2, \ldots, N\}$ is a finite set of $N$ states;
• $A = (a_{ij})_{N \times N}$ is the transition probability matrix, where $a_{ij}$ is the probability of a transition from state $i$ to state $j$, for all $i, j \in Q$;
• $B = (b_j(o_t))_{N \times M}$ is the matrix of symbol emission probabilities, with one vector $b_j = (b_{j1}, b_{j2}, \ldots, b_{jM})$ per state, where $b_{jk}$ is the probability of emitting symbol $v_k$ in state $j$;
• $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ is the probability vector of the initial state $q_0 \in Q$.
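As an illustration of the 5-tuple, the following minimal Python sketch stores the components and checks that every row of A, every row of B, and π are probability distributions. The class name `HMM`, the field names, and the toy values are assumptions made for this example; they are not taken from the prototype described in this paper.

```python
from dataclasses import dataclass
from math import isclose
from typing import List, Sequence


@dataclass
class HMM:
    """Container for the 5-tuple lambda = (Sigma, Q, A, B, pi)."""
    alphabet: Sequence[str]   # Sigma: the M observable symbols
    n_states: int             # |Q| = N; states are indexed 0..N-1 here
    A: List[List[float]]      # N x N transition probabilities a_ij
    B: List[List[float]]      # N x M emission probabilities b_jk
    pi: List[float]           # initial state distribution

    def validate(self) -> None:
        """Raise ValueError if shapes or row sums are inconsistent."""
        n, m = self.n_states, len(self.alphabet)
        if len(self.A) != n or any(len(row) != n for row in self.A):
            raise ValueError("A must be N x N")
        if len(self.B) != n or any(len(row) != m for row in self.B):
            raise ValueError("B must be N x M")
        if len(self.pi) != n:
            raise ValueError("pi must have N entries")
        for name, rows in (("A", self.A), ("B", self.B), ("pi", [self.pi])):
            for row in rows:
                if any(p < 0 for p in row) or not isclose(sum(row), 1.0, abs_tol=1e-9):
                    raise ValueError(f"each row of {name} must sum to 1")


# Toy model with N = 2 states and M = 2 symbols (illustrative values only).
toy = HMM(
    alphabet=("K", "C"),
    n_states=2,
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.9, 0.1], [0.2, 0.8]],
    pi=[0.5, 0.5],
)
toy.validate()  # passes silently for a well-formed model
```

Models with an explicit final state, such as the architecture with S and E states, would need a relaxed check for the absorbing row; that case is left out of this sketch.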

Fig. 1 shows the usual HMM structure, with its different states and the associated transition and emission probabilities.

Fig. 1: HMM architecture. S = initial state, E = final state; $d_i$, $m_i$ and $i_i$ are the delete, main and insert states.

III. IMMUNE EVALUATION PROBLEM

One approach to finding an effective synthetic vaccine consists of modifying the structure of certain HABPs (high-activity binding peptides) [2] so that they induce antibodies that protect against the disease. When unmodified HABPs were used to immunize monkeys, none of the animals was protected against Plasmodium falciparum malaria, but when some of these HABPs were modified an immunogenic response was generated. Because of this, a new methodology was developed to identify modified HABPs with a high immune response.

To select the amino acids to change, the critical residues are identified, that is, the amino acids that are vital for binding. These are exchanged for other amino acids whose mass, volume and surface are similar but whose polarity is opposite. These criteria lead to the following exchanges: F ↔ R; W ↔ Y; L ↔ H; I ↔ N; M ↔ K; P ↔ D; Q ↔ E. Because S, A and G are very small and have no counterpart of opposite polarity, changes involving them must be considered carefully.

The process of modifying the peptide structure and assessing antibody generation is as follows:
• First, the peptides with high binding capacity are verified, together with the critical residues of these peptides.
• Next, the candidate peptides and their possible modifications are synthesized; for a single sequence there can be more than 50 of them.
• Then Aotus monkeys are treated with the modified peptides. To determine whether there is an immune response, an immunofluorescence antibody test (IFA) is performed 15 and 20 days after treatment.
• Finally, the degree to which the monkeys were protected against malaria is examined.

The problem with this approach is its high cost in time and money, and there is no model that allows an a priori assessment of which peptides are potential candidates to produce an immune response. As a consequence, every possibility has to be treated as a candidate, which increases the time and complexity of finding the desired ones. Moreover, most of the modifications gave totally negative results; only about 10% induced the production of long-lived antibodies and hence the desired immune response. Since this methodology began, 10 years ago, only about 20 HABPs and their modifications, around 500 amino acid sequences, have been explored.

Given that the number of possible modifications is very high, the manual process is too slow to find the expected results. Determining the immune response is not the only process involved in this methodology, but the time it takes can be greatly reduced with an in silico model. In short, there is no model that allows a proper a priori assessment of an amino acid sequence according to the immune response it generates against malaria.

IV. HMM DESIGN

To solve the problem described above, an HMM was designed to represent a model that generates the amino acid sequences that produce an immune response against Plasmodium falciparum malaria. A classification model would reduce the costs of verifying the peptides under analysis by reducing the number of in vitro experiments to only those peptides that the model classifies in silico as good candidates to produce an immune response.

The approach used here is to model the peptide in its primary structure representation. The peptide is represented as a sequence of probabilities. The states of the designed model are the specific positions of the amino acids; that is, each state contains the probability that an amino acid occurs at a particular position of the sequence. The observable symbols are the one-letter amino acid codes, and the transition probabilities are represented as the connectors between states. A reduced example of the design is shown in Fig. 2.
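The candidates that such a model would have to screen come from the residue exchange rules of Section III. As a rough sketch of that rule set, the Python fragment below generates the analogues obtained by exchanging one critical residue at a time. The function name `single_exchanges`, the one-at-a-time policy, and the example peptide and positions are assumptions for illustration; the paper does not state how exchanges are combined.

```python
from typing import Dict, Iterable, List

# Exchange pairs from Section III (similar mass, volume and surface; opposite polarity).
_PAIRS = [("F", "R"), ("W", "Y"), ("L", "H"), ("I", "N"),
          ("M", "K"), ("P", "D"), ("Q", "E")]
EXCHANGE: Dict[str, str] = {}
for a, b in _PAIRS:
    EXCHANGE[a] = b
    EXCHANGE[b] = a


def single_exchanges(peptide: str, critical_positions: Iterable[int]) -> List[str]:
    """Return the analogues obtained by exchanging one critical residue at a time.

    Positions are 0-based. Residues without a counterpart (for example S, A, G)
    are left untouched, so they yield no analogue here.
    """
    analogues = []
    for pos in sorted(set(critical_positions)):
        residue = peptide[pos]
        if residue in EXCHANGE:
            analogues.append(peptide[:pos] + EXCHANGE[residue] + peptide[pos + 1:])
    return analogues


# Hypothetical peptide and critical positions, for illustration only.
print(single_exchanges("KFLSIM", [1, 3, 4]))  # ['KRLSIM', 'KFLSNM']
```

Allowing several exchanges at once multiplies the number of analogues, which is one way a single sequence can reach the more than 50 modifications mentioned in Section III.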

Fig. 2: Proposed HMM (reduced). The states represent the position of each amino acid in the sequence; the observable symbols are the one-letter amino acid codes.
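To make the design concrete, the following Python sketch scores a peptide under a position-based model of this kind. It assumes, for simplicity, that each state moves to the next position with probability 1, so only the emission terms contribute; the paper's model may weight transitions differently. The emission values and the function name `log_score` are invented for the example.

```python
from math import log
from typing import Dict, List

# One emission table per position (state); values are made up for illustration.
emissions: List[Dict[str, float]] = [
    {"K": 0.6, "L": 0.3, "C": 0.1},   # position 1
    {"K": 0.2, "L": 0.7, "C": 0.1},   # position 2
    {"K": 0.5, "L": 0.4, "C": 0.1},   # position 3
]


def log_score(peptide: str, emissions: List[Dict[str, float]]) -> float:
    """Log-probability of a peptide when state t emits the residue at position t.

    Transitions are assumed to go from position t to t + 1 with probability 1.
    A residue that a position never emits gives a score of -inf.
    """
    if len(peptide) > len(emissions):
        raise ValueError("peptide is longer than the model")
    total = 0.0
    for table, residue in zip(emissions, peptide):
        p = table.get(residue, 0.0)
        if p == 0.0:
            return float("-inf")
        total += log(p)
    return total


# A peptide built from likely residues scores higher than one built from rare ones.
print(log_score("KLK", emissions) > log_score("CCC", emissions))  # True
```

Working with logarithms avoids numerical underflow when many small probabilities are multiplied.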


A. Data description

Based on experimental results obtained manually over the last 10 years, 463 peptides (about 20 amino acids each) were evaluated. They can be separated into three classes according to their immune response. In the first class (A), the peptides produce antibodies and an effective immune response. In the second class (B), the peptides induce the body to produce antibodies, but not enough to be considered an effective immune response. In the third class (C), the peptides do not produce antibodies and therefore no immune response is generated. Only 324 sequences were used to design the model; the others were used to validate it.

B. Peptides data

The sequences evaluated belong to the proteins involved in red blood cell invasion, which is the main step in malaria infection [2]. These are the Plasmodium falciparum merozoite surface proteins and some of the Plasmodium falciparum microneme proteins.

Fig. 3: Amino acid frequency

The amino acid frequencies have a particular distribution. As Fig. 3 shows, K (lysine) is the most frequent amino acid. This is due to the codon bias of Plasmodium falciparum: its genome is one of the most (A + T)-rich genomes sequenced to date [12], and lysine is encoded by the A-rich codons AAA and AAG, so the chances of encoding it are very high. On the other hand, C (cysteine) is not very common, because most of the data come from synthesized peptides and synthesizing a sequence that contains cysteine is very complex; in most cases cysteine was replaced with G (glycine), because glycine is the simplest amino acid.

C. Probability of amino acid changes

Another measure taken into account is the transition probability of each amino acid, that is, the probability of each amino acid given its predecessor. Table I shows these data. Again, the highest probabilities are those in which the previous amino acid is lysine and the lowest are those in which the previous amino acid is cysteine. These two measures do not contribute information to the model, because the proportion of lysine and cysteine is the same in all cases: when the sequence generates an immune response, when it only generates antibodies, and when it is totally negative. The other measures, however, are very important, because in any amino acid sequence, regardless of context, the probability of occurrence of each amino acid depends to a large extent on the preceding amino acids.

D. Probability according to amino acid position

This measure is calculated according to the position of each amino acid in the sequence; in these experiments the maximum length of a peptide is 20 amino acids. Table II shows these data. The position of each amino acid in the sequence is how the states are represented in the proposed HMM, and it is also very important in the process of synthesizing the peptides. The modifications are determined by position: the amino acid changes are always made at the positions where the critical residues were previously identified.

V. HMM IMPLEMENTATION

The HMM was implemented in a software prototype and configured with the data shown above. These data represent only the sequences that belong to class A; that is, only the sequences with an immune response were taken into account to design the model. The other sequences were used to test the model, in order to identify the degree of difference between the classes. The real test, however, was made with other data that had previously been set aside for this purpose. In these test data the class proportions are the same as in the training data.

The main goal is for the probability of a sequence with an immune response, evaluated in the HMM, to be high. Conversely, if the probability is close to 0, that sequence probably does not produce an immune response. Evaluating the probability of a sequence in an HMM by brute force is very costly, because the number of possible state paths grows exponentially with the sequence length; the best options are the Viterbi algorithm and the forward algorithm.

VI. RESULTS

The experimental results obtained with the forward algorithm and with the Viterbi algorithm, when plotted, both show that the probabilities obtained for the sequences of class A are higher than the probabilities of the sequences of class B or C.
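For reference, the two scoring algorithms named in Section V can be sketched as follows. This is a generic textbook formulation in plain Python, not the code of the prototype; the toy matrices `A`, `B` and `pi` and the integer encoding of the symbols are assumptions for the example.

```python
from typing import List, Sequence


def forward(obs: Sequence[int], A: List[List[float]],
            B: List[List[float]], pi: List[float]) -> float:
    """P(obs | model), summed over all state paths in O(T * N^2) time.

    obs is a non-empty sequence of symbol indices into the columns of B.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def viterbi(obs: Sequence[int], A: List[List[float]],
            B: List[List[float]], pi: List[float]) -> float:
    """Probability of the single most likely state path for obs."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        delta = [max(delta[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return max(delta)


# Toy two-state model over two symbols (illustrative values only).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

print(forward([0, 1], A, B, pi))
print(viterbi([0, 1], A, B, pi))
```

The Viterbi value never exceeds the forward value, since the best path is only one of the paths that the forward algorithm sums over. For long sequences a log-space or scaled version would be preferable to avoid underflow.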


TABLE I: Transition probabilities. The first row gives each amino acid and the first column gives the amino acid that precedes it.
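A table of this kind can be estimated by counting adjacent residue pairs. The Python sketch below shows one way to do so; the function name `transition_probabilities` and the two training peptides are hypothetical, and the paper does not say whether any smoothing was applied.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def transition_probabilities(peptides: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(current | previous) from adjacent residue pairs.

    The outer key is the previous amino acid (the first column of the table),
    the inner key is the current amino acid (the first row of the table).
    """
    counts = defaultdict(Counter)
    for seq in peptides:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return {
        prev: {cur: c / sum(row.values()) for cur, c in row.items()}
        for prev, row in counts.items()
    }


# Hypothetical training peptides, for illustration only.
table = transition_probabilities(["KKL", "KLK"])
print(table["K"])  # K is followed by K once and by L twice
```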

TABLE II: State probabilities. The first column gives the possible positions in a sequence.
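Position-dependent probabilities of this kind can be estimated by counting, for each position, how often each amino acid appears in the training peptides. The following Python sketch does this with a pseudocount so that unseen residues do not receive probability zero. The pseudocount, the function name `position_probabilities`, and the example peptides are assumptions; they are not described in the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def position_probabilities(peptides: Iterable[str], max_len: int = 20,
                           pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Estimate P(residue | position) for positions 0..max_len-1.

    The pseudocount must be positive so that positions with no data
    still produce a valid (uniform) distribution.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    counts = [Counter() for _ in range(max_len)]
    for seq in peptides:
        for pos, residue in enumerate(seq[:max_len]):
            counts[pos][residue] += 1
    tables = []
    for counter in counts:
        total = sum(counter[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        tables.append({a: (counter[a] + pseudocount) / total for a in AMINO_ACIDS})
    return tables


# Hypothetical class A peptides, for illustration only.
tables = position_probabilities(["KL", "KC"], max_len=2)
print(tables[0]["K"], tables[1]["L"])
```

Each entry of the returned list corresponds to one row of the table, that is, one state of the model.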

VII. CONCLUSIONS

The HMM is a good way to represent amino acid sequences in order to classify them according to some feature, in this case the immune response against malaria. The limitation of this approach is that it provides no way to represent higher-order features or relationships between amino acids. In this approach the sequences were modeled only in their primary structure representation; nevertheless, the results are appropriate for evaluating the immune response problem. As future work, the HMM should be designed so that the transition probability takes into account not only the previous amino acid but also a group of two or three previous and following amino acids.

REFERENCES

[1] R. W. Snow, C. A. Guerra, A. M. Noor, H. Y. Myint, and S. I. Hay, The global distribution of clinical episodes of Plasmodium falciparum malaria, Nature, vol. 434, 2005, pp. 214-217.
[2] M.E. Patarroyo and M.A. Patarroyo, Emerging Rules for Subunit-Based, Multiantigenic, Multistage Chemically Synthesized Vaccines, Acc. Chem. Res., vol. 41, 2008, pp. 377-386.
[3] M.E. Patarroyo et al., Strategies for developing multi-epitope, subunit-based, chemically-synthesized antimalarial vaccines, Journal of Cellular and Molecular Medicine, 2007.
[4] A. Sacan and I. Toroslu, Amino Acid Substitution Matrices Based on 4-Body Delaunay Contact Profiles, Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE 2007), 2007, pp. 796-801.


[5] B. Vanschoenwinkel and B. Manderick, Substitution matrix based kernel functions for protein secondary structure prediction, Proceedings of the 2004 International Conference on Machine Learning and Applications, 2004, pp. 388-396.
[6] M. Nei, Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions, SMBE, 1986.
[7] Y.K. Yu, J.C. Wootton, and S.F. Altschul, The compositional adjustment of amino acid substitution matrices, Proceedings of the National Academy of Sciences, vol. 100, 2003, pp. 15688-15693.
[8] S.F. Altschul, Amino Acid Substitution Matrices from an Information Theoretic Perspective, J. Mol. Biol., vol. 219, 1991, pp. 555-565.
[9] P.C. Ng and S. Henikoff, Predicting Deleterious Amino Acid Substitutions, Cold Spring Harbor Lab, 2001.
[10] F. Fabris, A. Sgarro, and A. Tossi, Splitting the BLOSUM Score into Numbers of Biological Significance, EURASIP Journal on Bioinformatics and Systems Biology, vol. 2007, 2007, p. 31450.
[11] R. Nag, K. Wong, and F. Fallside, Script recognition using hidden Markov models, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86), 1986, pp. 2071-2074.
[12] M.J. Gardner, N. Hall, E. Fung, O. White, M. Berriman, R.W. Hyman, J.M. Carlton, A. Pain, K.E. Nelson, S. Bowman, I.T. Paulsen, K. James, J.A. Eisen, K. Rutherford, S.L. Salzberg, A. Craig, S. Kyes, M. Chan, V. Nene, S.J. Shallom, B. Suh, J. Peterson, S. Angiuoli, M. Pertea, J. Allen, J. Selengut, D. Haft, M.W. Mather, A.B. Vaidya, D.M.A. Martin, A.H. Fairlamb, M.J. Fraunholz, D.S. Roos, S.A. Ralph, G.I. McFadden, L.M. Cummings, G.M. Subramanian, C. Mungall, J.C. Venter, D.J. Carucci, S.L. Hoffman, C. Newbold, R.W. Davis, C.M. Fraser, and B. Barrell, Genome sequence of the human malaria parasite Plasmodium falciparum, Nature, vol. 419, Oct. 2002, pp. 498-511.
[13] Y.-K. Yu and S.F. Altschul, The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions, Oxford University Press, 2004.
[14] M. Mohamed and P. Gader, Generalized hidden Markov models. I. Theoretical frameworks, IEEE Transactions on Fuzzy Systems, vol. 8, 2000, pp. 67-81.
[15] L. Rabiner and B. Juang, An introduction to hidden Markov models, IEEE ASSP Magazine, vol. 3, 1986, pp. 4-16.
[16] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol. 77, 1989, pp. 257-286.
[17] T. Oates, L. Firoiu, and P.R. Cohen, Clustering time series with hidden Markov models and dynamic time warping, Proceedings of the IJCAI-99 Workshop on Neural, Symbolic and Reinforcement Learning Methods for Sequence Learning, 1999, pp. 17-21.
[18] T. Starner and A. Pentland, Real-time American sign language recognition from video using hidden Markov models, Computational Imaging and Vision, vol. 9, 1997, pp. 227-244.
[19] J.V. White, C.M. Stultz, and T.F. Smith, Protein classification by stochastic modeling and optimal filtering of amino acid sequences, Mathem. Biosci., 1994.
[20] G.A. Churchill, Stochastic models for heterogeneous DNA sequences, Bull. Mathem. Biol., 1989.
[21] P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, 2001.
