Title | Integration of 198 ChIP-seq Datasets Reveals Human cis-Regulatory Regions. |
Publication Type | Journal Article |
Year of Publication | 2012 |
Authors | Bolouri H, Ruzzo WL |
Journal | Journal of computational biology : a journal of computational molecular cell biology |
Volume | 19 |
Issue | 9 |
Start Page | 1 |
Pagination | 1-9 |
Date or Month Published | Sept |
ISSN | 1557-8666 |
Abstract | We analyzed 198 datasets of chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) and developed a methodology for identification of high-confidence enhancer and promoter regions from transcription factor ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experiments as high-confidence cis-regulatory regions. Although the selected regions mark only ∼0.67% of the genome, 70.5% of our predicted binding regions fall within independently identified, strongly expression-correlated and histone-marked enhancer regions, which cover ∼8% of the genome (Ernst et al., Nature 2011, 473, 43-49). Even more remarkably, 85.6% of our selected regions overlap transcription factor (TF) binding regions identified in evolutionarily conserved DNase1 hypersensitivity cluster regions, which cover 0.75% of the genome (Boyle et al., Genome Research 2011, 21, 456-464). P-values for these overlaps are effectively zero (Z-scores of 328 and 715 respectively). Furthermore, 62% of our selected regions overlap the intersection of the evolutionarily conserved DNase1 hypersensitivity-identified TF-binding regions of Boyle et al. (2011) with the histone-marked enhancers found to be strongly associated with transcriptional activity by Ernst et al. (2011). Two hundred thirty of our candidate cis-regulatory regions overlap cancer-associated variants reported in the Catalogue of Somatic Mutations in Cancer (http://www.sanger.ac.uk/genetics/CGP/cosmic/). We also identify 1,252 potential proximal promoters for the 7,561 disjoint lincRNA regions currently in the Human lincRNA Catalog (www.broadinstitute.org/genome_bio/human_lincrnas/). Our investigation used approximately half of all currently available ENCODE ChIP-seq datasets, suggesting further gains are likely from analysis of all datasets currently available. |
DOI | 10.1089/cmb.2012.0100 |
Alternate Journal | J. Comput. Biol. |
Citation Key | 8392 |
PubMed ID | 22897152 |
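
The selection rule stated in the abstract (regions marked with binding peaks in 15 or more experiments) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the half-open interval convention, the restriction to a single chromosome, and the merging of peaks within each experiment are all assumptions made here for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a half-open interval [start, end) on one chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching peaks so one experiment counts once per base."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def high_confidence_regions(
    peaks_by_experiment: Dict[str, List[Interval]],
    min_experiments: int = 15,
) -> List[Interval]:
    """Return maximal regions covered by peaks from at least min_experiments experiments."""
    # Each merged peak contributes +1 at its start and -1 at its end.
    events: List[Tuple[int, int]] = []
    for peaks in peaks_by_experiment.values():
        for start, end in merge_intervals(peaks):
            events.append((start, 1))
            events.append((end, -1))
    events.sort()

    regions: List[Interval] = []
    depth = 0  # number of experiments with a peak covering the current position
    region_start = None
    i = 0
    while i < len(events):
        pos = events[i][0]
        # Apply every event at this position before testing the threshold.
        while i < len(events) and events[i][0] == pos:
            depth += events[i][1]
            i += 1
        if depth >= min_experiments and region_start is None:
            region_start = pos
        elif depth < min_experiments and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    return regions
```

A real analysis would run this per chromosome on peak calls loaded from files. The threshold of 15 comes from the abstract; everything else in the sketch is a simplification.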