Predicting CDSs from RNA-seq data with TransDecoder

• STEP 1: De novo assembly of a transcriptome with Trinity (Acyrthosiphon svalbardicum, an aphid species) from 40 million paired-end Illumina reads of 100 bp (not strand-specific). Only contigs > 200 bp are retained.
• STEP 2: Prediction of CDSs with TransDecoder, aided by searches for Pfam-A motifs and blastp hits against the proteins of a related genome (pea aphid, A. pisum). Run with option -m 50 (peptides of at least 50 residues).

=> Aims: evaluating the CDS prediction (how many CDSs per transcript, as a function of transcript sequence length). Writing a program that rapidly calculates a histogram showing how many transcripts have 0, 1, 2, … or more predicted CDSs, for different transcript length bins. Using a filter to consider only CDSs of a minimum sequence length in bp.

The program in short

STEPS:
1. Calculates the dimensions of the histogram by determining the maximum transcript sequence length and the maximum number of CDSs for any transcript
2. Initializes the histogram
3. Fills the histogram (counts how many transcripts fall in each category)
4. Displays the results in a two-dimensional table
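The four steps above can be sketched in Python as follows. This is a minimal illustration, not the original program: the function names are hypothetical, the mapping from a CDS identifier to its parent transcript assumes identifiers of the form "<transcript_id>.p<N>" (header conventions differ between TransDecoder versions, so adapt parent_transcript to the actual files), and the minimum-length filter is assumed to be inclusive.

```python
from collections import Counter


def read_fasta_lengths(handle):
    """Return {sequence_id: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited word of the header.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def parent_transcript(cds_id):
    # ASSUMPTION: CDS ids look like "<transcript_id>.p<N>".
    # Adapt this to the header format of your TransDecoder version.
    return cds_id.rsplit(".p", 1)[0]


def cds_histogram(transcript_lengths, cds_lengths, min_cds=0, bin_size=50):
    """Return table[n_cds][length_bin] = number of transcripts."""
    # Count the CDSs that pass the length filter for each transcript
    # (ASSUMPTION: the filter is inclusive, i.e. length >= min_cds).
    per_transcript = Counter()
    for cds_id, length in cds_lengths.items():
        if length >= min_cds:
            per_transcript[parent_transcript(cds_id)] += 1

    # Step 1: dimensions of the histogram.
    max_len = max(transcript_lengths.values())
    max_cds = max(per_transcript.values(), default=0)
    n_bins = max_len // bin_size + 1

    # Step 2: initialize the histogram with zeros.
    table = [[0] * n_bins for _ in range(max_cds + 1)]

    # Step 3: fill it, one transcript at a time.
    for tid, length in transcript_lengths.items():
        table[per_transcript[tid]][length // bin_size] += 1
    return table


def format_table(table, bin_size=50):
    """Step 4: render the histogram as tab-separated text."""
    n_bins = len(table[0])
    lines = [
        "low\t" + "\t".join(str(b * bin_size) for b in range(n_bins)),
        "high\t" + "\t".join(str((b + 1) * bin_size - 1) for b in range(n_bins)),
    ]
    for n_cds, row in enumerate(table):
        lines.append(f"{n_cds}\t" + "\t".join(map(str, row)))
    return "\n".join(lines)
```

With two open FASTA files, read_fasta_lengths would be called once per file and the two dictionaries passed to cds_histogram. Both files are read in a single pass and only sequence lengths are kept in memory, which keeps the calculation fast on large assemblies.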

OPTIONS USED:
-t    transcript sequence file (FASTA file used for the TransDecoder prediction)
-c    CDS sequence file (FASTA file produced by TransDecoder.Predict)
-min  minimum size of CDS (e.g. can be set to 150 bp, 300 bp…; default = 0)
-bin  bin size for the transcript sequence length (default = 50 bp)
-o    output file name
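As an illustration of how these options might be parsed, here is a small sketch using Python's standard argparse module. The option names and the defaults for -min and -bin come from the list above; the destination names, the choice to make -t and -c mandatory, and the treatment of a missing -o are assumptions.

```python
import argparse


def build_parser():
    """Command-line parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(
        description="Histogram of the number of CDSs per transcript, "
                    "by transcript length bin."
    )
    parser.add_argument("-t", dest="transcripts", required=True,
                        help="transcript sequence file (FASTA)")
    parser.add_argument("-c", dest="cds", required=True,
                        help="CDS sequence file (FASTA)")
    parser.add_argument("-min", dest="min_cds", type=int, default=0,
                        help="minimum CDS size in bp (default: 0)")
    parser.add_argument("-bin", dest="bin_size", type=int, default=50,
                        help="transcript length bin size in bp (default: 50)")
    # ASSUMPTION: when -o is omitted, the caller decides where to write.
    parser.add_argument("-o", dest="output", default=None,
                        help="output file name")
    return parser
```

For example, build_parser().parse_args(["-t", "transcripts.fasta", "-c", "cds.fasta", "-min", "450"]) yields min_cds equal to 450 and leaves bin_size at its default of 50 (the file names here are placeholders).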

Output text file:

transcript file= svalbard_assembly_CPU8hypermem.Trinity.fasta
cds file= svalbard_assembly_CPU8hypermem.Trinity.fasta.transdecoder.cds
minimum size to consider cds= 450
bin size= 50
maximum transcript size= 11585
maximum number of cds for any transcript= 5

Rows: number of CDSs. Columns: transcript sequence size range (bp).

low     0     50    100   150   200   250   300   350   400   450   500   550
high    49    99    149   199   249   299   349   399   449   499   549   599
0       0     0     0     0     8781  5512  3570  2620  1966  1140  817   675
1       0     0     0     0     0     0     0     0     0     342   261   276
2       0     0     0     0     0     0     0     0     0     18    23    20
3       0     0     0     0     0     0     0     0     0     2     0     1
4       0     0     0     0     0     0     0     0     0     0     0     0
5       0     0     0     0     0     0     0     0     0     0     0     0

E.g., the number of transcripts between 550 and 599 bp long with exactly 1 predicted CDS (> 450 bp) is 276.

[Figure: Number of CDSs per transcript for transcripts of different sequence length (cumulated percentages), with -min 150 (counting only CDSs > 150 bp). Axes: transcript sequence size range; #CDSs per transcript.]

[Figure: Number of CDSs per transcript for transcripts of different sequence length, with -min 300. Axes: transcript sequence size range; #CDSs per transcript.]

[Figure: Number of CDSs per transcript for transcripts of different sequence length, with -min 450. Axes: transcript sequence size range; #CDSs per transcript.]
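The slides do not say how the cumulated percentages in these plots were computed. One plausible reading, given here only as an assumption, is that each length bin (one column of the count table) is normalised by the number of transcripts in that bin and then accumulated over the number of CDSs. The helper name below is hypothetical.

```python
from itertools import accumulate


def cumulated_percentages(column):
    """Cumulated percentages for one length bin.

    `column` lists the transcript counts for 0, 1, 2, ... CDSs in that bin.
    ASSUMPTION: percentages are relative to the total number of
    transcripts in the bin; an empty bin yields zeros.
    """
    total = sum(column)
    if total == 0:
        return [0.0] * len(column)
    return [100.0 * running / total for running in accumulate(column)]
```

Applied to the 550-599 bp column of the -min 450 table above (counts 675, 276, 20, 1, 0, 0), the first value is the share of transcripts with no CDS above the threshold, and the last value is 100 by construction.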
