Effect of sequence depth and length in long-read assembly of the maize inbred NC358

Shujun Ou, Jianing Liu, Kapeel M. Chougule, Arkarachai Fungtammasan, Arun S. Seetharam, Joshua C. Stein, Victor Llaca, Nancy Manchanda, Amanda M. Gilbert, Sharon Wei, Chen Shan Chin, David E. Hufnagel, Sarah Pedersen, Samantha J. Snodgrass, Kevin Fengler, Margaret Woodhouse, Brian P. Walenz, Sergey Koren, Adam M. Phillippy, Brett T. Hannigan, R. Kelly Dawe, Candice N. Hirsch, Matthew B. Hufford, Doreen Ware

Research output: Contribution to journal › Article › peer-review


Abstract

Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
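
The two read-set metrics the abstract relies on, sequencing depth and N50 subread length, can be made concrete with a short sketch. This is a minimal illustration using the standard definitions (depth as total sequenced bases divided by genome size; N50 as the length L such that reads of length L or longer hold at least half of all bases). The function names and the 2.2 Gb genome size in the usage example are hypothetical and are not taken from the study's own pipeline.

```python
def n50(lengths):
    """Return the N50: the length L such that reads (or contigs) of
    length >= L together contain at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def depth(total_bases, genome_size):
    """Average sequencing depth (fold coverage): total bases / genome size."""
    return total_bases / genome_size


# Usage with made-up numbers (hypothetical read set and genome size).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))         # 8
print(depth(66_000_000_000, 2_200_000_000))      # 30.0
```

The same N50 definition applies to subreads and to assembled contigs, which is why it serves both as an input-quality measure and as a contiguity measure.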

Original language: English (US)
Article number: 2288
Journal: Nature Communications
Volume: 11
Issue number: 1
State: Published, Dec 1 2020

Bibliographical note

Funding Information:
This work was supported by NSF Plant Genome Research Program grant IOS-1744001 to R.K.D., D.W., and M.B.H., and grant IOS-1546727 to C.N.H., USDA ARS 5030-21000-068-00D to M.W., and USDA ARS 58-8062-2100-044 to D.W. B.P.W., S.K., and A.M.P. were supported by the Intramural Research Program of the National Human Genome Research Institute. We acknowledge Jonathan Gent for helpful discussion on repeat space analyses. This research was supported in part by the U.S. Department of Agriculture, Agricultural Research Service. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer.

Publisher Copyright:
© 2020, This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.

