Software for processing whole-genome sequencing data of microorganisms
Abstract
About the authors
M. V. Sprindzhuk
Belarus
6 Surganova St., Minsk, 220012
R. S. Sergeev
Belarus
Yu. E. Demidchik
Belarus
O. M. Zalutskaya
Belarus
A. E. Skryagin
Belarus
A. M. Skryagina
Belarus
For citation:
Sprindzhuk M.V., Sergeev R.C., Demidchik Yu.E., Zalutskaya O.M., Skryagin A.E., Skryagina A.M. Software for processing of data of whole-genome sequence analysis of microorganisms. Tuberculosis and Lung Diseases. 2016;94(2):47-54. (In Russ.)