BIOINFORMATICS APPLICATIONS NOTE

Vol. 25 no. 8 2009, pages 1080–1081 doi:10.1093/bioinformatics/btp095

Genome analysis

RNATOPS-W: a web server for RNA structure searches of genomes

Yingfeng Wang1, Zhibin Huang1, Yong Wu1, Russell L. Malmberg2,3 and Liming Cai1,3,∗

1Department of Computer Science, 2Department of Plant Biology and 3Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA

Received on November 5, 2008; revised on February 15, 2009; accepted on February 16, 2009

Advance Access publication March 5, 2009

Associate Editor: Ivo Hofacker

ABSTRACT

Summary: RNATOPS-W is a web server to search sequences for RNA secondary structures including pseudoknots. The server accepts an annotated RNA multiple structural alignment as a structural profile and genomic or other sequences to search. It is built upon RNATOPS, a command line C++ software package for the same purpose, in which filters to speed up search are manually selected. RNATOPS-W improves upon RNATOPS by adding the function of automatic selection of a hidden Markov model (HMM) filter and also a friendly user interface for selection of a substructure filter by the user. In addition, RNATOPS-W complements existing RNA secondary structure search web servers that either use built-in structure profiles or are not able to detect pseudoknots. RNATOPS-W inherits the efficiency of RNATOPS in detecting large, complex RNA structures.

Availability: The web server RNATOPS-W is available at the web site www.uga.edu/RNA-Informatics/?f=software&p=RNATOPS-w. The underlying search program RNATOPS can be downloaded at www.uga.edu/RNA-Informatics/?f=software&p=RNATOPS.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Searching genomes using computational methods has become important for prediction and annotation of non-coding RNAs (Lowe and Eddy, 1997; Hofacker, 2006; Griffiths-Jones, 2007; Rivas and Eddy, 2001; Rivas et al., 2001; Washietl et al., 2005). Profile-based RNA structure search is an often used approach for this purpose. However, for large, complex RNA molecules such as those containing pseudoknots, the search task has proven difficult. Typically, some existing web servers for RNA structure search consider pseudoknots whose profiles are predefined and fixed with the search program (Zhang et al., 2005); other available programs allow user-defined profiles but are limited to pseudoknot-free structures (Griffiths-Jones et al., 2003; Klein and Eddy, 2003; Nawrocki and Eddy, 2007). Web servers with the capability to accept user-defined profiles for arbitrary pseudoknot structure searches are not available. This is due to the lack of appropriate RNA pseudoknot models that can permit efficient algorithms for structure–sequence alignment, a bottleneck task. Search programs can usually be sped up with filtering methods that quickly remove genome segments unlikely to contain the desired pattern in the profile (Bafna and Zhang, 2004; Lowe and Eddy, 1997; Weinberg and Ruzzo, 2006; Zhang et al., 2005), but even with a significant speed-up (e.g. with a 99% genome reduction), searching for a complex RNA structure with a pseudoknot may still take hours, if not days, on a typical bacterial or yeast genome.

Our previous work (Song, Y. et al., 2005) introduced a graph-theoretic modeling method for profiling RNA secondary structures including pseudoknots. With this model, we were able to design a very efficient structure–sequence alignment algorithm, ideal for RNA pseudoknot search on genomes, and implemented it in an RNA structure search program called RNATOPS (Huang et al., 2008). One advantage of RNATOPS is its high efficiency in searching for large RNAs or complex structures including pseudoknots, while maintaining accuracy comparable with other search programs that are only capable of detecting pseudoknot-free structures. To further speed up searches, RNATOPS runs the whole-structure search on the results of filtering rather than on the full genome. However, filters (i.e. subsequence or substructure profiles) can only be manually selected.

This article presents a web server version of RNATOPS, called RNATOPS-W, with a new built-in function for automatic hidden Markov model (HMM) filter selection. The web server also allows an interactive selection of any substructure as a filter through a user-friendly interface.

∗To whom correspondence should be addressed.

2 PROGRAM FEATURES

This section presents the filtering functions of the web server RNATOPS-W and its interface features. We refer the reader to our previous work (Huang et al., 2008; Song, Y. et al., 2005) for detailed discussions on the search methods and algorithms used by RNATOPS.

2.1 Filtering method

RNATOPS-W incorporates a function of automatic HMM filter selection; the selected filter is used to speed up the search program. The filter selection chooses a conserved region as an HMM filter from the given RNA structural profile (a set of structurally aligned RNA sequences). Our filter selection method was built from two previous approaches used to identify conserved amino acids in protein sequences (Capra and Singh, 2007; Song, B. et al., 2005); it replaces the overall amino acid distribution in the BLOSUM62 alignments with the nucleotide distribution in the given RNA alignment. In addition, our method ignores columns containing more than 50% gaps instead of the 30% used in the first method (Capra and Singh, 2007). Scores are assigned to columns based on their degree of conservation, with higher scores for more conserved columns. Based on these scores, an automatic peak detection algorithm (Song, B. et al., 2005) is then applied to find a conserved region. In selecting such a region, an ‘ignored’ column is also re-considered if both its neighboring columns are considered for the conserved region. The selected conserved region is then used to produce a profile HMM filter.

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

We conducted two types of experiments to test the performance of our filter selection method. On synthetic genomes generated by embedding real RNA sequences taken from the profile into randomly generated nucleotides, RNATOPS-W with the automatically selected HMM filter never missed a real RNA sequence. In the search time test on real genomes, automatically selected HMM filters sped up the whole structure search drastically (by at least three orders of magnitude) in contrast to randomly generated HMM filters, which found too many false positive filter hits to yield an efficient whole structure search.

We have also conducted tests on the HMM filters constructed directly from the full-length alignment of structure profiles and compared their performance with our automatically generated filters. The experiments indicate that with (sequentially) conserved RNA profiles, HMM filters generated from the full-length alignment have a lower false positive rate than automatically generated HMM filters. On sequentially less conserved RNA profiles, the automatically generated filters are more accurate. Both kinds of filter are sensitive. However, for either kind of RNA profile, searching with the filters selected by our algorithm is about one order of magnitude faster than searching with a filter from the full-length alignment.

These test results also indicate that HMM filters automatically generated by RNATOPS-W can maintain both efficiency and accuracy. Test results and comparisons for automatically generated filters, random filters, full-length alignment filters and manually selected filters are shown in the Supplementary Material.
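The column scoring and region selection described in Section 2.1 can be illustrated with a minimal Python sketch. It is not the RNATOPS-W implementation. It assumes a Jensen-Shannon divergence score against the alignment's own nucleotide distribution, in the spirit of Capra and Singh (2007), and it substitutes a simple rule (keep columns scoring at or above the mean, then take the longest run) for the peak detection algorithm of Song, B. et al. (2005). All function names, the treatment of 'T' as 'U' and the mean threshold are assumptions made for illustration; only the 50% gap cut-off and the re-consideration of an ignored column between two selected neighbors come from the text.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGU"


def _normalise(row):
    # Treat DNA 'T' as RNA 'U'; any other symbol is counted as a gap below.
    return row.upper().replace("T", "U")


def background_distribution(alignment):
    """Nucleotide frequencies over the whole alignment, gaps excluded."""
    counts = Counter(c for row in alignment for c in _normalise(row)
                     if c in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("alignment contains no nucleotides")
    return {n: counts[n] / total for n in NUCLEOTIDES}


def _kl(p, q):
    # Kullback-Leibler divergence in bits; terms with p == 0 contribute 0.
    return sum(p[n] * math.log2(p[n] / q[n]) for n in NUCLEOTIDES if p[n] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two nucleotide distributions."""
    m = {n: 0.5 * (p[n] + q[n]) for n in NUCLEOTIDES}
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def column_scores(alignment, max_gap_fraction=0.5):
    """Score each column; columns with too many gaps get None (ignored)."""
    background = background_distribution(alignment)
    rows = [_normalise(r) for r in alignment]
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c in NUCLEOTIDES]
        gap_fraction = 1 - len(residues) / len(column)
        if gap_fraction > max_gap_fraction:
            scores.append(None)
            continue
        counts = Counter(residues)
        freqs = {n: counts[n] / len(residues) for n in NUCLEOTIDES}
        scores.append(js_divergence(freqs, background))
    return scores


def select_conserved_region(scores, threshold=None):
    """Return the longest run of selected columns as a half-open (start, end).

    Stand-in for peak detection: a column is selected when its score is at
    least the threshold (default: mean of the scored columns).
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        return None
    if threshold is None:
        threshold = sum(valid) / len(valid)
    keep = [s is not None and s >= threshold for s in scores]
    # Re-consider an ignored column when both of its neighbors are selected.
    rescued = [i for i in range(1, len(scores) - 1)
               if scores[i] is None and keep[i - 1] and keep[i + 1]]
    for i in rescued:
        keep[i] = True
    best, start = None, None
    for i, flag in enumerate(keep + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best
```

Slicing every aligned sequence with the returned interval gives the sub-alignment from which a profile HMM filter could then be trained with any HMM toolkit.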

2.2 Interface features

To use RNATOPS-W for RNA structure search, the user is asked to submit an RNA structure profile (i.e. a set of structurally aligned training RNAs) in pasta format (Huang et al., 2008) and target genomes in FASTA format. These data can be uploaded on the start page either as a file or through an input text box. By default, RNATOPS-W automatically selects an HMM filter for the given structure profile. The user can also opt to manually select his/her own filter, by specifying the beginning and ending regions of any consecutive substructure from the given structure profile. After the submission of the input and a filter option, the server searches the target genomes with the filter and then searches the filtered hits for whole-structure matches.

Each search request is given a ticket number with which the user can later retrieve a search result file from a provided link or from the start page. For each search request, the result file contains information for all search hits that ‘match’ the structure profile. For each hit, the following information is produced: the name of the genome containing the hit, the hit sequence, its position in the genome, the score of the hit sequence, the fold conforming to the structure profile and the structural alignment between the hit sequence and the structure profile. The output also contains the parameter settings for each whole search request and the total time used in the search.

Additional options are provided for the user to redefine parameters pertinent to the search algorithm to achieve a desired search accuracy. The user, instead of choosing the ‘All default’ option, can select ‘Adjust parameters’. These parameters mostly concern setting priors for stochastic modeling of individual stems and loops in the structure profile and improving the qualities of candidates found for individual stems.

RNATOPS-W provides users a friendly web interface to perform searches of genomes for RNAs on the basis of their structural profile, including pseudoknots. It adds functionality by automated selection of filters to speed up the search.
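The two-stage search described in Section 2.2 (filter the genome first, then run the whole-structure search only on what passes) can be sketched as follows. This is a schematic illustration, not RNATOPS-W code: the `Hit` record, the function names, the flank parameters and both scoring callbacks are hypothetical, and the record omits the fold and the structural alignment that the server also reports for each hit.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Hit:
    genome: str      # name of the genome containing the hit
    start: int       # 0-based start of the hit in the genome
    end: int         # end of the hit (exclusive)
    sequence: str    # the hit sequence itself
    score: float     # whole-structure score of the hit


def filter_hits(genome: str, filter_score: Callable[[str], float],
                filter_len: int, threshold: float) -> Iterator[int]:
    """Yield start positions of windows that pass the cheap filter."""
    for pos in range(len(genome) - filter_len + 1):
        if filter_score(genome[pos:pos + filter_len]) >= threshold:
            yield pos


def two_stage_search(name: str, genome: str,
                     filter_score: Callable[[str], float],
                     filter_len: int, filter_threshold: float,
                     structure_score: Callable[[str], float],
                     left_flank: int, right_flank: int,
                     structure_threshold: float) -> List[Hit]:
    """Run the expensive structure scorer only around filter hits."""
    hits = []
    for pos in filter_hits(genome, filter_score, filter_len, filter_threshold):
        # Extend the filter hit to a window that can hold the whole structure.
        start = max(0, pos - left_flank)
        end = min(len(genome), pos + filter_len + right_flank)
        candidate = genome[start:end]
        score = structure_score(candidate)
        if score >= structure_threshold:
            hits.append(Hit(name, start, end, candidate, score))
    return hits
```

The point of the design is that the costly `structure_score` callback is invoked once per filter hit rather than once per genome position, so a selective filter removes most of the work.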

ACKNOWLEDGEMENTS

A part of the server interface was implemented with help from Mark Wilson.

Funding: NIH Biomedical Information Science and Technology Initiative (R01GM072080-01A1, in part).

Conflict of Interest: none declared.

REFERENCES

Bafna,V. and Zhang,S. (2004) FastR: fast database search tool for non-coding RNA. In Proceedings of the 3rd IEEE Computational Systems Bioinformatics Conference, Palo Alto, CA, pp. 52–61.
Capra,J.A. and Singh,M. (2007) Predicting functionally important residues from sequence conservation. Bioinformatics, 15, 1875–1882.
Griffiths-Jones,S. (2007) Annotating noncoding RNA genes. Ann. Rev. Genomics Hum. Genet., 8, 279–298.
Griffiths-Jones,S. et al. (2003) Rfam: an RNA family database. Nucleic Acids Res., 31, 439–441.
Hofacker,I.L. (2006) RNAs everywhere: genome-wide annotation of structured RNAs. Genome Informatics, 17, 281–282.
Huang,Z. et al. (2008) Fast and accurate search for non-coding RNA pseudoknot structures in genomes. Bioinformatics, 24, 2281–2287.
Klein,R.J. and Eddy,S.R. (2003) RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics, 4, 44.
Lowe,T.M. and Eddy,S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25, 955–964.
Nawrocki,E.P. and Eddy,S.R. (2007) Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput. Biol., 3, e56.
Rivas,E. and Eddy,S.R. (2001) Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics, 2, 8.
Rivas,E. et al. (2001) Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr. Biol., 11, 1369–1373.
Song,B. et al. (2005) ARCS: an aggregated related column scoring scheme for aligned sequences. Bioinformatics, 19, 2326–2332.
Song,Y. et al. (2005) Tree decomposition based fast searching for RNA structures with and without pseudoknots. In Proceedings of IEEE Computational Systems Bioinformatics Conference, Palo Alto, CA, pp. 223–234.
Washietl,S. et al. (2005) Fast and reliable prediction of noncoding RNAs. Proc. Natl Acad. Sci., 102, 2454–2459.
Weinberg,Z. and Ruzzo,W.L. (2004) Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy. Bioinformatics, 20 (Suppl. 1), I334–I341.
Weinberg,Z. and Ruzzo,W.L. (2006) Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 22, 35–39.
Zhang,S. et al. (2005) Searching genomes for noncoding RNA using FastRI. IEEE/ACM Trans. Comput. Biol. Bioinform., 2, 366–379.


presented. Detailed process suggestions and an example to illustrate ... the diabetes example are drawn upon throughout the article. ... heterogeneous changes.