Tools for the Validation of Genomes and Transcriptomes with Proteomics data

Chi Nam Ignatius Pang (1), Carlos Aya (2), Aidan Tay (1), Nandan P. Deshpande (1), Nadeem O. Kaakoush (1), Hazel Mitchell (1), Natalie A. Twine (1), Moustapha Kassem (3), Marc R. Wilkins (1)

1. Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
2. Intersect Australia Limited, Sydney, Australia
3. Center for Experimental Bioinformatics, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark

Aims

Despite the large amount of genomics and proteomics data currently available, there remains a lack of tools to integrate data from these two fields. This project aims to provide a ‘nexus’ for integrating genomics and transcriptomics data generated from next-generation sequencing with proteomics data generated from protein mass spectrometry. We are developing a set of tools which allow users to:

• Co-visualise genomics, transcriptomics, and proteomics data using the Integrative Genomics Viewer (IGV) [1].

• Validate the existence of genes and mRNAs using peptides identified from mass spectrometry experiments.

• Validate alternatively spliced mRNA isoforms by searching for peptides that span exon-exon junctions.

Analysis of Novel Bacterial Proteomes: under development

• Virtual protein generator: A tool which generates Mascot sequence databases based on genes predicted by tools such as Glimmer [3]. Novel open reading frames are accounted for by creating a database of ‘virtual proteins’, in which the genome is sliced into overlapping, fixed-size regions and translated in all six frames [4].

• Virtual protein merger: This tool takes a list of peptides that match ‘virtual proteins’ and recalculates the position of the open reading frames by searching for flanking start and end codons.

Figure 3. The virtual protein generator and virtual protein merger. The bacterial genome is sliced into overlapping, fixed-size regions and translated in all six frames to create a database of ‘virtual proteins’. Peptides that match ‘virtual proteins’ are merged together into putative open reading frames based on flanking start and end codons.
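The virtual protein generator and merger can be illustrated with a minimal Python sketch. It is not the project's own code: the function names, the window and step sizes, and the use of the standard genetic code are assumptions made for illustration. The merger part is simplified further: it handles the forward strand only, treats ATG as the only start codon (bacteria also use GTG and TTG), and picks the most upstream in-frame start codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # assumption: alternative bacterial starts are ignored


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def virtual_proteins(genome, window=300, step=150):
    """Yield (start, strand, frame, protein) for overlapping fixed-size regions.

    window and step are hypothetical values; start is the 0-based offset of
    the region, and frames on the '-' strand are counted from the region's end.
    """
    last = max(len(genome) - window, 0)
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # make sure the tail of the genome is covered
    for start in starts:
        region = genome[start:start + window]
        for strand, dna in (("+", region), ("-", reverse_complement(region))):
            for frame in range(3):
                yield start, strand, frame, translate(dna[frame:])


def extend_to_orf(genome, hit_start, hit_end):
    """Grow an in-frame, forward-strand peptide match out to its flanking codons.

    hit_start and hit_end are 0-based, half-open nucleotide coordinates.
    Returns (orf_start, orf_end); either value is None if no codon was found.
    """
    end = hit_end
    while end + 3 <= len(genome) and genome[end:end + 3] not in STOP_CODONS:
        end += 3
    orf_end = end + 3 if genome[end:end + 3] in STOP_CODONS else None

    orf_start, pos = None, hit_start
    while pos >= 0 and genome[pos:pos + 3] not in STOP_CODONS:
        if genome[pos:pos + 3] in START_CODONS:
            orf_start = pos  # keep the most upstream start codon seen so far
        pos -= 3
    return orf_start, orf_end
```

Peptide matches whose extension yields the same (orf_start, orf_end) pair would belong to the same putative open reading frame, which is one simple way to merge them. In a real database each virtual protein would also need a unique FASTA header that records its region, strand and frame, so that matches can be mapped back to the genome.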

Analytical Pipeline

The pipeline consists of several tools and requires several input files. Its workflow is summarised in Figure 1.

Applications – Proof of Concept

• The Results Analyzer was used to verify proteins encoded in the Campylobacter concisus and Saccharomyces cerevisiae genomes. Proteins were verified on the basis of two or more peptide ‘hits’ with Mascot scores exceeding an identity threshold.

• Campylobacter concisus (emergent gut pathogen): 66% (1320/2002) of proteins in UniProt [2] were verified with peptides identified from mass spectrometry experiments.

• Saccharomyces cerevisiae (baker’s yeast): 14% (895/6621) of the proteins in UniProt, as well as 9% (29/313) of all splice junctions in the yeast proteome, were verified with peptide evidence.
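The verification rule described above (two or more peptide hits scoring above an identity threshold) can be sketched as follows. This is an illustration, not the Results Analyzer itself. The function names and the input layout are hypothetical, a single threshold is assumed for all peptides, and 'two or more hits' is read as two or more distinct peptide sequences.

```python
from collections import defaultdict


def verified_proteins(peptide_hits, identity_threshold, min_peptides=2):
    """Return the set of protein IDs supported by enough confident peptides.

    peptide_hits: iterable of (protein_id, peptide_sequence, mascot_score).
    Only peptides scoring strictly above identity_threshold are counted, and
    repeated identifications of the same sequence count once (an assumption).
    """
    confident = defaultdict(set)
    for protein_id, peptide, score in peptide_hits:
        if score > identity_threshold:
            confident[protein_id].add(peptide)
    return {pid for pid, peptides in confident.items() if len(peptides) >= min_peptides}


def percent_verified(n_verified, n_total):
    """Percentage of database entries that were verified."""
    return 100.0 * n_verified / n_total


# Made-up example data: P1 has two confident peptides, P2 only one.
hits = [
    ("P1", "LVNELTEFAK", 61.0),
    ("P1", "YLYEIAR", 48.5),
    ("P2", "AEFVEVTK", 55.0),
    ("P2", "QTALVELLK", 12.0),
]
print(verified_proteins(hits, identity_threshold=30.0))
```

With the counts reported above, percent_verified(1320, 2002) is about 65.9, percent_verified(895, 6621) is about 13.5 and percent_verified(29, 313) is about 9.3. These match the quoted 66%, 14% and 9% after rounding.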

Figure 1. The analytical pipeline allows genomics and transcriptomics data generated from next-generation sequencing platforms to be used in custom sequence databases for Mascot searches. This allows the verification of novel genes or novel alternatively spliced mRNA isoforms using proteomics data.

Downloads

The software is available via the GitHub code repository:

https://github.com/IntersectAustralia/ap11_samifier

Project Blog

http://intersectaustralia.github.com/ap11/

Contact

Prof. Marc Wilkins - [email protected]

Integration and Visualisation of Genomics and Proteomics Data

• Samifier: A tool which converts results from protein tandem mass spectrometry into SAM format. This enables co-visualization of genomics, transcriptomics, and proteomics data using the Integrative Genomics Viewer (IGV), which displays SAM files.

• Results analyzer: This tool reports the number and types of peptides and proteins, and their corresponding Mascot scores, based on customizable filters. Peptides that span exon-exon junctions are also highlighted, which can be used to validate alternatively spliced isoforms of proteins.

Figure 2. The Integrative Genomics Viewer was used to visualize experimental peptides for the yeast 40S ribosomal protein S7-B (YNL096C). Labelled tracks: genomic location, peptide matches from Mascot, and gene architecture (exons and introns). A peptide which spans an exon-exon junction is highlighted in the red box. This analysis has also been done on a genome / proteome scale (see Applications).
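To show how a peptide match could be expressed in SAM format, and how a peptide spanning an exon-exon junction can be recognised, here is a small Python sketch. It does not reproduce Samifier's actual output. The function names are hypothetical, only forward-strand genes are handled, and the nucleotide sequence and qualities are left as '*' (which the SAM format permits). A junction-spanning peptide maps to more than one genomic block, and the intron between blocks is written with the CIGAR 'N' (skipped region) operation.

```python
def peptide_blocks(exons, aa_start, aa_len):
    """Map a peptide onto genomic blocks.

    exons: coding exons as sorted 1-based inclusive (start, end) pairs on the
    forward strand (an assumption). aa_start is the 0-based index of the
    peptide's first residue in the protein; aa_len is its length in residues.
    """
    offset, remaining = 3 * aa_start, 3 * aa_len
    blocks = []
    for exon_start, exon_end in exons:
        exon_len = exon_end - exon_start + 1
        if offset >= exon_len:
            offset -= exon_len  # peptide starts in a later exon
            continue
        block_start = exon_start + offset
        block_len = min(remaining, exon_len - offset)
        blocks.append((block_start, block_start + block_len - 1))
        remaining -= block_len
        offset = 0
        if remaining == 0:
            break
    if remaining:
        raise ValueError("peptide extends past the end of the coding sequence")
    return blocks


def spans_junction(blocks):
    """A peptide spans an exon-exon junction if it maps to more than one block."""
    return len(blocks) > 1


def sam_line(qname, rname, blocks):
    """Build one SAM alignment line (11 mandatory tab-separated fields)."""
    cigar = []
    for i, (start, end) in enumerate(blocks):
        if i:
            cigar.append(f"{start - blocks[i - 1][1] - 1}N")  # intron length
        cigar.append(f"{end - start + 1}M")
    # FLAG 0 = forward strand, MAPQ 255 = mapping quality unavailable.
    fields = [qname, "0", rname, str(blocks[0][0]), "255", "".join(cigar),
              "*", "0", "0", "*", "*"]
    return "\t".join(fields)


# Hypothetical gene with two coding exons separated by a 90 bp intron.
exons = [(101, 110), (201, 230)]
blocks = peptide_blocks(exons, aa_start=2, aa_len=3)
print(sam_line("peptide_1", "chrI", blocks), spans_junction(blocks))
```

A gene on the reverse strand, such as YNL096C, would need its exon order and coordinates reversed before this mapping is applied. That case is left out of the sketch.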


Acknowledgements

This project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS) Program and the Education Investment Fund (EIF) Super Science Initiative. The software is developed in conjunction with Intersect Australia Limited, a not-for-profit eResearch company. We thank the Australian Proteomics Computational Facility (APCF) for providing access to the Mascot server and Simon Michnowicz for technical support. We also thank Dr. Gene Hart-Smith for access to the Wilkins Lab yeast proteomics data.

References

1. Robinson, J. T.; Thorvaldsdottir, H.; Winckler, W.; Guttman, M.; Lander, E. S.; Getz, G.; Mesirov, J. P., Integrative genomics viewer. Nat Biotechnol 2011, 29, (1), 24-6.
2. Deshpande, N. P.; Kaakoush, N. O.; Mitchell, H.; Janitz, K.; Raftery, M. J.; Li, S. S.; Wilkins, M. R., Sequencing and validation of the genome of a Campylobacter concisus reveals intra-species diversity. PLoS One 2011, 6, (7), e22170.
3. Delcher, A. L.; Bratke, K. A.; Powers, E. C.; Salzberg, S. L., Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 2007, 23, (6), 673-9.
4. Arthur, J. W.; Wilkins, M. R., Using proteomics to mine genome sequences. Journal of Proteome Research 2004, 3, (3), 393-402.
