Correcting imbalanced reads coverage in bacterial transcriptome sequencing with extreme deep coverage

Published Online: pp 195-213

High-throughput bacterial RNA-Seq experiments can generate extremely high and imbalanced sequencing coverage. Over- or underestimation of gene expression levels hinders accurate differential gene expression analysis. Here we evaluated strategies to identify expression differences of genes with high coverage in bacterial transcriptome data using either raw sequence reads or unique reads with duplicate fragments removed. In addition, we propose a generalised linear model (GLM)-based approach to identify imbalance in read coverage based on sequence composition. Our results show that analysis using raw reads identifies more differentially expressed genes, with more accurate fold changes, than analysis using unique reads. We also demonstrate the presence of sequence-composition-related biases that are independent of gene expression levels and experimental conditions. Finally, genes that still show strong coverage imbalance after correction were tagged using a statistical approach.
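The contrast between raw and unique reads can be illustrated with a small sketch. It is not the authors' pipeline: the read layout `(gene, start, strand, sequence)`, the function names `count_reads` and `log2_fold_change`, the pseudocount and the toy data are all assumptions made for illustration. It shows one plausible reason why duplicate removal can distort fold changes under extreme coverage: once every distinct fragment of a gene has been observed in both conditions, the unique-read counts stop growing.

```python
import math
from collections import Counter


def count_reads(reads, unique=False):
    """Count reads per gene.

    reads: iterable of (gene, start, strand, sequence) tuples
           (a hypothetical layout chosen for this sketch).
    unique=True collapses reads that share gene, start, strand and
    sequence, i.e. treats them as duplicate fragments.
    """
    if unique:
        reads = set(reads)
    return Counter(gene for gene, _start, _strand, _seq in reads)


def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """log2 ratio of condition B over condition A, with a pseudocount."""
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))


# Toy data (invented): the same two fragments are seen in both conditions,
# four times more often in the treated sample.
control = [("geneA", 10, "+", "ACGT")] * 4 + [("geneA", 25, "+", "TTGA")] * 2
treated = [("geneA", 10, "+", "ACGT")] * 16 + [("geneA", 25, "+", "TTGA")] * 8

raw_fc = log2_fold_change(count_reads(control)["geneA"],
                          count_reads(treated)["geneA"])
unique_fc = log2_fold_change(count_reads(control, unique=True)["geneA"],
                             count_reads(treated, unique=True)["geneA"])

print(f"raw reads:    log2FC = {raw_fc:.2f}")
print(f"unique reads: log2FC = {unique_fc:.2f}")
```

In this toy case the raw counts (6 versus 24) retain the expression difference, whereas the unique counts (2 versus 2) give a log2 fold change of zero.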

Keywords

bacterial transcriptome sequencing, RNA-Seq, gene differential expression, coverage imbalance, tri-nucleotides, GLM, generalised linear model, computational biology
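A GLM that relates read coverage to tri-nucleotide composition could be sketched as follows. The abstract does not give the model specification, so everything here is an assumption: a Poisson log-linear model, per-position read-start counts as the response, indicator features for two arbitrarily chosen tri-nucleotides (`ACG`, `TTT`), the log of the gene's mean count as an offset standing in for expression level, and a plain Newton fit written out by hand. The sequence and counts are invented toy data.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_poisson_glm(X, y, offset, iterations=25):
    """Newton fit of a Poisson GLM with log link: log(mu) = offset + X beta."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        mu = [math.exp(o + sum(b * x for b, x in zip(beta, row)))
              for row, o in zip(X, offset)]
        # Gradient X^T (y - mu) and Hessian X^T diag(mu) X of the log-likelihood.
        grad = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                for j in range(p)]
        hess = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def trinucleotides(seq):
    """Tri-nucleotide starting at each position of seq."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


# Toy gene (invented): positions starting with ACG are over-covered,
# positions starting with TTT are under-covered.
seq = "ACGTTTACGTTTACG"
tris = trinucleotides(seq)
toy_count = {"ACG": 40, "TTT": 5}
counts = [toy_count.get(t, 10) for t in tris]

motifs = ["ACG", "TTT"]  # hypothetical feature set
X = [[1.0] + [1.0 if t == m else 0.0 for m in motifs] for t in tris]
offset = [math.log(sum(counts) / len(counts))] * len(counts)

beta = fit_poisson_glm(X, counts, offset)
bias = dict(zip(motifs, beta[1:]))

# Divide out the fitted composition effect to obtain corrected counts.
corrected = [c / math.exp(bias.get(t, 0.0)) for c, t in zip(counts, tris)]

for motif, b in bias.items():
    print(f"{motif}: fitted log-bias = {b:+.3f}")
```

In this toy example the fitted coefficients recover the built-in biases (log 4 for `ACG`, log 0.5 for `TTT`), and the corrected counts are uniform along the gene. On real data one would more likely use an established GLM implementation and a full tri-nucleotide feature set; the hand-written fit is only there to keep the sketch dependency-free.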
