MSeasy: an R package for pre-processing of GC/LC-MS data adapted to chemical ecology

Yann Guitton (a), Florence Nicolè (a), Elodie Courtois (b), Jérôme Mardon (c), Martine Hossaert-McKey (c), Laurent Legendre (a)

Contact: [email protected]

(a) Université de Lyon, F-69003, Lyon, France; Université de Saint-Etienne, F-42000, Saint-Etienne, France; BVpam, EA 3061; 23 rue du Dr Paul Michelon, F-42000, Saint-Etienne, France.
(b) Laboratoire Evolution & Diversité Biologique, UMR CNRS 5174, Bâtiment 4R3, Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse cedex 4, France.
(c) Behavioural Ecology Group, Centre d’Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, 1919 route de Mende, F-34293 Montpellier cedex 5, France.

Abstract

The democratization of metabolic analyses has extended the scope of metabolomics to ecological questions. Chemical ecology interprets the variation and diversity of the chemical signals of non-model organisms in the light of species interactions. Elucidating the biological information within such complex signals with robust statistical analyses requires a large number of replicates. We developed an unsupervised pre-processing method that generates a fingerprinting peak list from large GC/LC-MS datasets. The method is based on the clustering of mass spectra and does not require any profile correction, retention time alignment or normalization. It is robust to the use of different types of columns and to shifts in retention time, which are particularly common in large or long-term experiments. On the two datasets used for validation, the best clustering method for grouping similar mass spectra was hierarchical clustering with the Euclidean distance and Ward linkage. Other clustering algorithms may, however, be better suited to other datasets. For that reason, we developed a function that identifies the best clustering algorithm for each dataset.

Availability: The package “MSeasy” implementing our pre-processing method is freely available. For non-R users, a graphical interface was created.

Process description

Objective: unsupervised pre-processing of GC/LC-MS data in chemical ecology investigations

Workflow of the MSeasy package

Inputs: GC/MS data or LC/MS data, produced by external software

Lavenders, tropical trees, birds, corals: data from nice species, but non-model species!

Step 1: Collect and compile data from chromatograms and spectra

Output: compiled data matrix (sample / RT / abundance of each m/z)
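As an illustration of the compiled data matrix, the following minimal Python sketch builds one row per peak holding the sample name, the retention time and the abundance of each m/z. The peak records, the `compile_matrix` helper and the `mz_` column names are hypothetical and chosen for illustration only; this is not the MSeasy implementation, which is written in R.

```python
# Hypothetical peak records: (sample, retention time in minutes, {m/z: abundance}).
peaks = [
    ("sample_1", 5.21, {41: 120.0, 93: 999.0, 136: 80.0}),
    ("sample_1", 7.80, {43: 999.0, 71: 410.0}),
    ("sample_2", 5.25, {41: 110.0, 93: 999.0, 136: 95.0}),
]


def compile_matrix(peaks):
    """Return (header, rows): one row per peak, one column per m/z value."""
    mz_values = sorted({mz for _, _, spectrum in peaks for mz in spectrum})
    header = ["sample", "RT"] + [f"mz_{mz}" for mz in mz_values]
    rows = []
    for sample, rt, spectrum in peaks:
        # An m/z value absent from a spectrum gets an abundance of 0.
        rows.append([sample, rt] + [spectrum.get(mz, 0.0) for mz in mz_values])
    return header, rows


header, rows = compile_matrix(peaks)
for line in [header] + rows:
    print("\t".join(str(value) for value in line))
```

Every spectrum is expanded onto the same set of m/z columns, so each row is a fixed-length vector that a distance function can compare directly.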

What is the best clustering method?

Test of the best clustering algorithm with a sub-dataset made of known peaks taken from the main dataset.

Graphic result: the best distance and the best linkage algorithm are easily seen on the graphic. Euclidean distance and Ward linkage gave the best results in all evaluated datasets.
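One way to score a grouping of the known-peak sub-dataset is the mean silhouette width (Rousseeuw, 1987). The Python sketch below computes it for a given set of labels under two distances. It is only an illustration under stated assumptions: the spectra and labels are invented, the helper names are hypothetical, and the sketch scores labels that are already given rather than running the clustering algorithms themselves. It is not the MSeasy code.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_silhouette_width(spectra, labels, distance):
    """Mean silhouette width (Rousseeuw, 1987) of a labelled set of spectra."""
    widths = []
    for i, (spec_i, lab_i) in enumerate(zip(spectra, labels)):
        # Distances from spectrum i to every other spectrum, grouped by label.
        by_label = {}
        for j, (spec_j, lab_j) in enumerate(zip(spectra, labels)):
            if i != j:
                by_label.setdefault(lab_j, []).append(distance(spec_i, spec_j))
        own = by_label.pop(lab_i, [])
        if not own or not by_label:
            # Singleton group (or only one group overall): width set to 0.
            widths.append(0.0)
            continue
        a = sum(own) / len(own)                              # mean distance within own group
        b = min(sum(d) / len(d) for d in by_label.values())  # mean distance to nearest other group
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical known peaks: abundances at four m/z values for two molecules.
known_spectra = [
    [100, 20, 0, 5], [95, 25, 0, 4], [98, 22, 1, 6],     # molecule A
    [0, 10, 100, 40], [2, 12, 97, 42], [1, 9, 99, 38],   # molecule B
]
known_labels = ["A", "A", "A", "B", "B", "B"]

scores = {
    name: mean_silhouette_width(known_spectra, known_labels, fn)
    for name, fn in (("euclidean", euclidean), ("manhattan", manhattan))
}
for name, score in scores.items():
    print(f"{name}: mean silhouette width = {score:.3f}")
```

A width close to 1 means that spectra sit much closer to their own group than to any other group; values near 0 or below indicate groups that overlap or are mixed up.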

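[Figure: homogeneity (matching coefficient) for each clustering method (PAM, diana, centroid, ward, complete, single, average) and each distance (cor, mink_1/2, mink_1/3, manhattan, euclidean).]

Answer: the best clustering method and distance, used as parameters for Step 2.

Option: alternatively, any matrix in the right format, prepared by the user by their own means, can be supplied.

Step 2: Group similar spectra into molecules, produce the peak list and output files for quality control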
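The peak list produced by Step 2 can be pictured as a sample-by-cluster table of 0/1 values, as in the poster's example with Sample 1 to 3 and clu1 to clu3. The Python sketch below builds such a table from cluster assignments. It assumes presence/absence coding and uses a hypothetical `build_peak_list` helper with invented assignments; the actual MSeasy output may differ.

```python
def build_peak_list(assignments):
    """Turn (sample, cluster) pairs into a 0/1 sample-by-cluster table."""
    samples = sorted({sample for sample, _ in assignments})
    clusters = sorted({cluster for _, cluster in assignments})
    present = set(assignments)
    table = [[1 if (s, c) in present else 0 for c in clusters] for s in samples]
    return samples, clusters, table


# Hypothetical Step 2 result: the cluster each detected peak was assigned to.
assignments = [
    ("Sample 1", "clu2"),
    ("Sample 2", "clu2"), ("Sample 2", "clu3"),
    ("Sample 3", "clu1"), ("Sample 3", "clu2"),
]

samples, clusters, table = build_peak_list(assignments)
print("\t".join(["sample"] + clusters))
for sample, row in zip(samples, table):
    print("\t".join([sample] + [str(value) for value in row]))
```

Because each column is a cluster of similar spectra and not a retention-time window, the table can be built without aligning retention times across runs.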

Output peak list (samples × clusters):

          clu1  clu2  clu3
Sample 1     0     1     0
Sample 2     0     1     1
Sample 3     1     1     0

Silhouette test

[Figure: mean silhouette width (ref. 2) for each clustering method (PAM, diana, centroid, ward, complete, single, average) and each distance (cor, mink_1/2, mink_1/3, manhattan, euclidean).]

① How many clusters? [Figure: silhouette width against the number of clusters.] Answer: 137.

② How good are the clusters? High silhouette width:
☺ 1 cluster = 1 molecule
OR
☺ 1 cluster = 2 molecules with similar spectra
Cases shown: small RT/RI shift; high RT/RI shift (1 molecule in two different columns, polar/apolar).

Results:
Tropical trees: 55 species, 390 GC-MS runs, 250 clusters = 294 molecules (ref. 1).
Lavenders: 30 species, 600 GC-MS runs, 137 clusters = 194 molecules.
Both datasets contained ≈ 14 000 peaks.

Conclusion

Rapid and efficient. Open source. Does not write papers for you, but saves you time!

Metabolomics 2010 congress, Amsterdam, 27 June – 1 July

Bibliography:
1. Courtois, E., C. Paine, et al. (2009). "Diversity of the Volatile Organic Compounds Emitted by 55 Species of Tropical Trees: a Survey in French Guiana." Journal of Chemical Ecology 35(11): 1349-1362.
2. Rousseeuw, P. J. (1987). "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis." Journal of Computational and Applied Mathematics 20(1): 53-65.
