Mangul Lab publishes second paper – a benchmarking study of error correction methods

Today we celebrate the publication of our second peer-reviewed paper since Mangul Lab launched at USC.

Keith Mitchell, an alumnus of Serghei’s group at UCLA, is the first author of “Benchmarking of computational error-correction methods for next-generation sequencing data,” which is now available online. Jaque (our postdoc), Lana (our project specialist), Aaron (our bioinformatics software engineer), Qiaozhen (Jenny) (undergraduate researcher), and Sei (undergraduate researcher) collaborated with Serghei and many others to evaluate how well error-correction algorithms fix errors across different types of datasets with various levels of heterogeneity. In addition, Mangul Lab alumni Russell, Harry, Kevin, Linus, Eli, Taylor, German, and Douglas are co-authors who contributed to the paper as undergraduate or high school researchers at UCLA.

While rapid advancements in next-generation sequencing have improved our ability to study the genomic material of a biological sample at an unprecedented scale, sequencing errors occur in approximately 0.1–1% of bases sequenced and may bias or limit the results of a study.
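To give a feel for what a 0.1–1% per-base error rate means for a single read, here is a small back-of-the-envelope sketch. It assumes errors are independent and uniformly distributed along the read, which real sequencers do not strictly obey, and the 100 bp read length and the function names are hypothetical choices made only for illustration.

```python
def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of erroneous bases in one read."""
    return read_length * error_rate


def prob_at_least_one_error(read_length: int, error_rate: float) -> float:
    """Probability that a read contains one or more erroneous bases,
    assuming each base is wrong independently with the same probability."""
    return 1.0 - (1.0 - error_rate) ** read_length


if __name__ == "__main__":
    read_length = 100  # hypothetical read length in bases
    for rate in (0.001, 0.01):  # the 0.1% and 1% ends of the range
        print(
            f"rate={rate:.3f}: "
            f"expected errors per read={expected_errors(read_length, rate):.2f}, "
            f"P(at least one error)={prob_at_least_one_error(read_length, rate):.3f}"
        )
```

Under these simplifying assumptions, roughly one in ten reads carries at least one error at the low end of the range and close to two in three at the high end, which is why uncorrected errors can matter for downstream analysis.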

Our comprehensive benchmarking study of currently available error-correction methods identified numerous ways in which sequencing settings and method parameters affect the accuracy of the corrected output. We investigated the advantages and limitations of computational error-correction techniques across different domains of biology, including immunogenomics and virology.
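For readers curious how the accuracy of a corrected read might be scored, here is a minimal sketch. It assumes a simplified setting in which the true read, the raw read, and the corrected read all have the same length and can be compared base by base (no insertions or deletions). The function name and the metric definitions below are illustrative assumptions; see the paper itself for the evaluation actually used.

```python
def score_correction(true_read: str, raw_read: str, corrected_read: str) -> dict:
    """Classify each base of a corrected read and derive summary metrics.

    TP: an erroneous base in the raw read that was fixed.
    FN: an erroneous base that is still wrong after correction.
    FP: a correct base that the tool changed into an error.
    """
    if not (len(true_read) == len(raw_read) == len(corrected_read)):
        raise ValueError("this sketch only handles equal-length reads")

    tp = fp = fn = 0
    for t, r, c in zip(true_read, raw_read, corrected_read):
        if r != t:  # the raw base was a sequencing error
            if c == t:
                tp += 1
            else:
                fn += 1
        elif c != t:  # the raw base was correct but was altered
            fp += 1

    def ratio(num: int, den: int):
        # Return None when a metric is undefined (zero denominator).
        return num / den if den else None

    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        # Gain is negative when a tool introduces more errors than it fixes.
        "gain": ratio(tp - fp, tp + fn),
    }


if __name__ == "__main__":
    # Toy example: two errors in the raw read; one is fixed, one is missed,
    # and one previously correct base is changed.
    print(score_correction("ACGTACGT", "ACGAACTT", "ACGTACTA"))
```

In this toy example the tool fixes one error, misses one, and introduces one, so the gain is zero: the corrected read is no closer to the truth than the raw read.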

We hope that researchers who analyze next-generation sequencing data find our paper helpful in guiding their selection of error-correction tools!

We note that benchmarking projects are very accessible to undergraduates, and even high schoolers, who can make meaningful contributions: the many routine tasks involved provide great training. If you are still hesitant to involve undergraduates in your research, please check out our paper in Nature Biotechnology, which provides recommendations.

As with any Mangul Lab project, all the data and code necessary to reproduce the figures are available on GitHub. Notebooks built with Project Jupyter make the figure generation fully reproducible; for example, see https://github.com/Mangul-Lab-USC/benchmarking_error_correction/tree/master/notebooks. We owe credit to Jaque for organizing the data and creating the notebooks!

Reference:

Keith Mitchell, Jaqueline J Brito, Igor Mandric, Qiaozhen Wu, Sergey Knyazev, Sei Chang, Lana S Martin, Aaron Karlsberg, Ekaterina Gerasimov, Russell Jared Littman, Brian L Hill, Nicholas C Wu, Harry Taegyun Yang, Kevin Hsieh, Linus Chen, Eli Littman, Taylor Shabani, German Shabanets, Douglas Yao, Ren Sun, Jan Schroeder, Eleazar Eskin, Alex Zelikovsky, Pavel Skums, Mihai Pop, Serghei Mangul: Benchmarking of computational error-correction methods for next-generation sequencing data. In: Genome Biology, 21 (71), 2020. https://doi.org/10.1186/s13059-020-01988-3