Prediction of subplastidial localization of chloroplast proteins from spectral count data - Comparison of machine learning algorithms

Thomas Burger(1), Samuel Wieczorek(1), Christophe Masselon(1), Daniel Salvi(2), Norbert Rolland(2), Myriam Ferro(1)

(1) CEA/Grenoble, iRTSV, Biologie à Grande Echelle (équipe EDyP), CNRS FR 3425, INSERM U1038, Université Joseph Fourier, F-38054 Grenoble, France.
(2) CEA/Grenoble, iRTSV, Physiologie Cellulaire Végétale, CNRS UMR5168, INRA UMR1200, Université Joseph Fourier, F-38054, Grenoble, France.

Context: To study chloroplast metabolism and functions, the subplastidial localization of proteins must be known, because it is a prerequisite for their functional characterization. The precise localization of many chloroplast proteins remains hypothetical. We therefore set up a proteomics strategy to assign chloroplast proteins to their subplastidial compartments.

State-of-the-art: Our group carried out a comprehensive study of the Arabidopsis thaliana chloroplast proteome [1], involving high-performance mass spectrometry analyses of highly fractionated chloroplasts. In particular, spectral count data were acquired for the three major chloroplast sub-fractions (stroma, thylakoids and envelope) obtained by sucrose gradient purification. The distribution of spectral counts over compartments is a fair predictor of the relative abundance of proteins [2]. This justified a first statistical model [1] relating spectral counts to subplastidial localization. That predictive model was based on logistic regression and achieved an accuracy rate of 84% for chloroplast proteins.

Contribution and results: In the present work, we compared various machine learning techniques for building a predictive model of the subplastidial localization of chloroplast proteins from spectral count data. We trained the following classification algorithms on the same dataset, which contains spectral count information for 555 proteins:
1. Support Vector Machines (SVM) and Random Forest: the state of the art in terms of performance.
2. k-nearest neighbors: a baseline reference. Comparing its performance with the state of the art provides useful clues about the computational complexity of the problem.
3. PerTurbo: a new classification algorithm [3] based on kernel tricks and matrix perturbation theory. It provides results similar to SVM and offers several qualitative advantages: fewer parameters to tune, efficiency with a high number of classes, no risk of over-fitting, etc.

This comparison shows that the most efficient predictive models correctly localize 91% of the proteins. Compared with the original logistic regression model, this is a 7-point improvement in accuracy rate. The error rate falls from 16% to 9%, so ~40% of the misclassifications are avoided. We also examined more qualitative aspects: the coverage of the training set, the handling of mislabeled data, and the influence of each algorithm's parameters. These elements will be of prime importance in subsequent work. That work aims to develop more accurate models from comprehensive datasets and to predict localization accurately even for multi-localized proteins.

References:
1. Ferro, M., et al. (2010). AT_CHLORO: comprehensive chloroplast proteome database with subplastidial localization and information for functional genomics using quantitative label-free analyses. Mol. Cell. Proteomics, 9(6): 1063-1084.
2. Gilchrist, A., et al. (2006). Quantitative proteomics analysis of the secretory pathway. Cell, 127: 1265-1281.
3. Courty, N., Burger, T., Laurent, J. (2011). PerTurbo: a new classification algorithm based on the spectrum perturbations of the Laplace-Beltrami operator. ECML PKDD 2011.
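The comparison described in this abstract can be illustrated with a minimal Python sketch of its simplest member, the k-nearest-neighbors baseline. The sketch makes several assumptions that do not come from the authors' setup or data. Each protein is described by its spectral counts in the three sub-fractions (stroma, thylakoids, envelope), converted to proportions so that only the distribution over compartments matters. Distance is Euclidean, k = 3, and accuracy is measured with leave-one-out evaluation. The toy counts are invented, and all function names are hypothetical. The `relative_error_reduction` helper quantifies the share of misclassifications avoided when accuracy rises from 84% to 91%.

```python
import math
from collections import Counter


def profile(counts):
    """Convert raw spectral counts per sub-fraction into proportions summing to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("a protein needs at least one spectral count")
    return [c / total for c in counts]


def knn_predict(train, query, k=3):
    """Majority vote among the k training profiles nearest to `query`.

    `train` is a list of (profile, label) pairs; distance is Euclidean.
    Ties go to the label met first, i.e. the closest one, because
    Counter.most_common keeps first-encountered order on equal counts.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(dataset, k=3):
    """Leave-one-out accuracy: predict each protein from all the others."""
    correct = 0
    for i, (prof, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        if knn_predict(rest, prof, k) == label:
            correct += 1
    return correct / len(dataset)


def relative_error_reduction(old_acc, new_acc):
    """Share of the old model's misclassifications avoided by the new model."""
    return (new_acc - old_acc) / (1 - old_acc)


# Hypothetical toy counts (stroma, thylakoids, envelope), not AT_CHLORO data.
toy = [
    ([40, 2, 1], "stroma"), ([35, 5, 0], "stroma"), ([50, 1, 3], "stroma"),
    ([3, 30, 2], "thylakoids"), ([1, 45, 4], "thylakoids"), ([2, 25, 1], "thylakoids"),
    ([0, 3, 20], "envelope"), ([2, 1, 18], "envelope"), ([1, 4, 30], "envelope"),
]
dataset = [(profile(counts), label) for counts, label in toy]

print(loo_accuracy(dataset, k=3))                      # prints 1.0
print(round(relative_error_reduction(0.84, 0.91), 4))  # prints 0.4375
```

Because the evaluation loop only calls `knn_predict`, the same loop could score an SVM, a random forest or PerTurbo in its place. That is what makes a side-by-side comparison on a fixed dataset straightforward. Going from 84% to 91% accuracy removes 7 of every 16 errors (about 44%), in line with the "~40%" reported above.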
