Bioinformation

open access

by Biomedical Informatics Publishing Group

Database

www.bioinformation.net

____________________________________________________________________________

SNPinProbe_1.0: A database for filtering out probes in the Affymetrix GeneChip® Human Exon 1.0 ST array potentially affected by SNPs

Shiwei Duan1, Wei Zhang1, Wasim Kamel Bleibel1, Nancy Jean Cox1, 2 and M. Eileen Dolan1, 3, 4, *

1Section of Hematology/Oncology, Department of Medicine; 2Department of Human Genetics; 3Committee on Clinical Pharmacology and Pharmacogenomics; 4Cancer Research Center, The University of Chicago, IL 60637, USA; M. Eileen Dolan* - E-mail: [email protected]; Phone: 773 702 4441; Fax: 773 702 0963; * Corresponding Author

received June 18, 2008; revised July 20, 2008; accepted July 23, 2008; published August 01, 2008

Abstract: The Affymetrix GeneChip® Human Exon 1.0 ST array (exon array) is designed to measure both gene-level and exon-level expression in human samples. The exon array contains ~1.4 million probesets consisting of ~5.4 million probes and profiles over 17,000 well-annotated gene transcripts in the human genome. As with all expression arrays, the exon array is vulnerable to SNPs within probes, because these SNPs can affect probe hybridization and thus produce misleading expression values. In some cases, this can cause dramatic fluctuations in exon-level expression. We therefore performed a genome-wide search for SNPs within regions that hybridize to probes, evaluating approximately 18 million SNPs in dbSNP (Build 129) against about 5.4 million probes on the exon array. We identified 597,068 probes within 350,382 probesets that hybridize to regions containing SNPs. These affected probes and/or probesets can be filtered out during data processing, thus controlling for potential false expression phenotypes when using this exon array.

Keywords: database; probes; SNP; Affymetrix GeneChip® human exon 1.0 ST array; human genome

Availability: http://cid-fb2a64e541add2be.skydrive.live.com/browse.aspx/Affy%7C_HuEx%7C_1.0ST?uc=2

Background: With high-throughput gene expression microarrays, thousands of genes can now be profiled in a single analysis. The Affymetrix GeneChip® Human Exon 1.0 ST array was designed to detect novel exons, spliced exons or sub-exons of a gene in human samples [1]. The exon array uses over 5.4 million probes representing about 1.4 million probesets, which are designed from the genomic regions of known genes and from regions that may harbor hypothetical genes. Unlike other arrays, including the Affymetrix Human Genome Focus® array and the U95® and U133® series arrays, the exon array's probes are designed to cover the whole gene region rather than only the 3′-untranslated region [1].
Additionally, gene structures are represented by the probesets, with each probeset on the exon array consisting of up to 4 perfect match probes designed against a region of the exon. This differs from previous Affymetrix gene expression arrays, which contain paired sets of perfect match and mismatch oligonucleotides tiled onto the microarray to account for nonspecific hybridization [1, 2]. However, studies have shown that SNPs within probes can affect hybridization on the 3′ expression arrays [3] as well as on the exon arrays [4-6]. Because the human exon array carries 5.4 million probes, more of its probes hybridize to regions containing SNPs, and the effect can be dramatic when evaluating exon-level expression.

SNPs in probe-covered regions have been shown to affect the hybridization efficiency of some probes, which can create false relationships between SNP genotypes and the gene expression levels that the probes represent [4-6]. Furthermore, differences in hybridization among individuals for certain probes may not reflect true expression differences in the regions the probes represent, but rather genotype differences at common SNPs inside the hybridized sequences [3-6]. Quality control should therefore identify the probes containing SNPs so that they can be filtered out prior to expression analysis, thereby controlling the confounding effects of these SNPs [5, 7, 8].

Methodology:

Dataset
The dataset [9] contains the probes affected by SNPs in their hybridization regions, based on the dbSNP database (version 129, genome build 36, April, 2008) [10].

Development
The genomic positions (build 36) of over 18 million SNPs were retrieved from the dbSNP database (version 129). The sequences of over 5.4 million probes and over 1.4 million probesets were downloaded from the Affymetrix website [11]. Because the probesets are annotated with build 36 genomic regions while the probes are still annotated with the older build 34 regions, a local BLAT [12] alignment of the probes against their probesets was performed to update the genomic regions covered by the probes. A genome-wide search was then performed between the ~18 million SNPs and the over 5.4 million probes to identify the probes affected by SNPs.

Database content
This database [9] provides 597,068 probes within 350,382 probesets affected by the known SNPs in dbSNP (version 129).
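The genome-wide search described under Development amounts to asking, for each probe, whether any SNP position falls inside the probe's genomic interval. The authors do not describe their implementation, so the following is only a minimal Python sketch of one way to do it, using sorted SNP positions and binary search. The function names, the tuple layouts and the toy coordinates are all hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from bisect import bisect_left
from collections import defaultdict


def index_snps(snps):
    """Group SNP positions by chromosome and sort them for binary search."""
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    return {chrom: sorted(positions) for chrom, positions in by_chrom.items()}


def probe_has_snp(snp_index, chrom, start, end):
    """Return True if any SNP lies in the closed interval [start, end]."""
    positions = snp_index.get(chrom, [])
    i = bisect_left(positions, start)  # index of the first SNP at or after start
    return i < len(positions) and positions[i] <= end


def affected_probes(probes, snp_index):
    """Return the IDs of probes whose region contains at least one SNP."""
    return [
        probe_id
        for probe_id, chrom, start, end in probes
        if probe_has_snp(snp_index, chrom, start, end)
    ]


# Toy data (hypothetical): SNPs as (chromosome, position),
# probes as (probe_id, chromosome, start, end) covering 25 bases each.
snps = [("chr1", 1010), ("chr1", 5000), ("chr2", 300)]
probes = [
    (1, "chr1", 1000, 1024),  # contains the SNP at chr1:1010
    (2, "chr1", 1025, 1049),  # no SNP inside
    (3, "chr2", 276, 300),    # SNP on the last base of the probe
    (4, "chr3", 10, 34),      # chromosome without any listed SNP
]

snp_index = index_snps(snps)
print(affected_probes(probes, snp_index))  # [1, 3]
```

Sorting the SNPs once makes each probe lookup logarithmic in the number of SNPs on its chromosome, which matters at the scale of roughly 18 million SNPs and 5.4 million probes.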

Database usage
The user can download the list of affected probes and probesets [9] and then apply it to filter out the affected probes using Affymetrix Power Tools (version 1.8.6) (Figure 1). This free software can exclude a known set of probes from the analysis. Affected probes are removed by running the apt-probeset-summarize program with the --kill-list option, a workflow that Affymetrix describes as highly experimental [13]. The resulting probeset intensities are summarized solely from probes not affected by known SNPs, and the generated expression data can then be used for routine expression analysis.
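To illustrate what the filtering step achieves, the Python sketch below drops affected probes from each probeset and summarizes the remaining ones. It is not the Affymetrix Power Tools implementation and does not reproduce the kill-list file format. The median of log2 intensities is only a stand-in for a real summarization algorithm, and all names and values are hypothetical.

```python
from statistics import median


def filter_probesets(probeset_to_probes, affected):
    """Drop affected probes; omit probesets left with no usable probe."""
    kept = {}
    for probeset_id, probe_ids in probeset_to_probes.items():
        clean = [p for p in probe_ids if p not in affected]
        if clean:
            kept[probeset_id] = clean
    return kept


def summarize(kept, log2_intensity):
    """Summarize each probeset as the median log2 intensity of its kept probes.

    The median is an illustrative stand-in, not the summarization used by APT.
    """
    return {
        probeset_id: median(log2_intensity[p] for p in probe_ids)
        for probeset_id, probe_ids in kept.items()
    }


# Toy data (hypothetical): probeset membership, SNP-affected probe IDs,
# and one sample's log2 probe intensities.
probeset_to_probes = {"PS1": [1, 2, 3, 4], "PS2": [5, 6], "PS3": [7]}
affected = {2, 5, 7}
log2_intensity = {1: 8.0, 2: 3.0, 3: 9.0, 4: 10.0, 5: 6.0, 6: 7.5, 7: 5.0}

kept = filter_probesets(probeset_to_probes, affected)
summary = summarize(kept, log2_intensity)
print(kept)     # {'PS1': [1, 3, 4], 'PS2': [6]}
print(summary)  # {'PS1': 9.0, 'PS2': 7.5}
```

In this toy example probe 2 reads far lower than its neighbors, as might happen when a SNP disrupts hybridization. Removing it keeps the low value from pulling down the PS1 summary, and PS3 disappears entirely because its only probe is affected.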

Figure 1: The process of filtering out probes affected by SNPs within their sequences.

Caveats: There are 111,685 probes (2% of the total probes) that failed the BLAT step, possibly because they are background controls. These probes are also included in the database [9].

Acknowledgment: This Pharmacogenetics of Anticancer Agents Research (PAAR) Group (http://pharmacogenetics.org) study was supported by NIH/NIGMS grant U01GM61393.

References:
[01] http://www.affymetrix.com/support/technical/whitepapers/exon_probeset_trans_clust_whitepaper.pdf
[02] http://www.affymetrix.com/support/technical/technotes/25mer_technote.pdf
[03] E. Sliwerska, et al., Biol Psychiatry, 61: 13 (2007) [PMID: 16690034]
[04] R. Alberts, et al., PLoS ONE, 2: e622 (2007) [PMID: 17637838]
[05] W. Zhang, et al., Am J Hum Genet., 82: 631 (2008) [PMID: 18313023]
[06] A. Sequeira, et al., Mol Psychiatry, 13: 363 (2008) [PMID: 18347597]
[07] W. Zhang and M. E. Dolan, Bioinformation, 2: 238 (2008) [PMID: 18317571]
[08] S. Duan, et al., Am J Hum Genet., 82: 1101 (2008) [PMID: 18439551]
[09] http://cid-fb2a64e541add2be.skydrive.live.com/browse.aspx/Affy|_HuEx|_1.0ST?uc=2
[10] S. T. Sherry, et al., Nucleic Acids Res., 29: 308 (2001) [PMID: 11125122]
[11] http://www.affymetrix.com
[12] W. J. Kent, Genome Res., 12: 656 (2002) [PMID: 11932250]
[13] http://www.affymetrix.com/support/developer/powertools/changelog/VIGNETTE-expression-mask-probes.html

Edited by P. Kangueane
Citation: Duan et al., Bioinformation 2(10): 469-470 (2008)
License statement: This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.

ISSN 0973-2063
Bioinformation, an open access forum © 2008 Biomedical Informatics Publishing Group
