Effect of sequence depth and length in long-read assembly of the maize inbred NC358

Shujun Ou; Jianing Liu; Kapeel M. Chougule; Arkarachai Fungtammasan; Arun S. Seetharam; Joshua C. Stein; Victor Llaca; Nancy Manchanda; Amanda M. Gilbert; Sharon Wei; Chen Shan Chin; David E. Hufnagel; Sarah Pedersen; Samantha J. Snodgrass; Kevin Fengler; Margaret Woodhouse; Brian P. Walenz; Sergey Koren; Adam M. Phillippy; Brett T. Hannigan; R. Kelly Dawe; Candice N. Hirsch; Matthew B. Hufford; Doreen Ware

doi:10.1038/s41467-020-16037-7

Agronomy and Plant Genetics

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75 × genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30 × depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20 × depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
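The two quantities the abstract varies, genomic depth and N50 subread length, can be illustrated with a minimal sketch. It assumes the common definitions (depth as total sequenced bases divided by genome size; N50 as the read length at which the longest reads together hold at least half of all bases). The function names, read lengths, and genome size below are hypothetical toy values, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are sorted from longest to shortest; the N50 is the length of the
    read at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length
    raise ValueError("n50() needs at least one read length")


def genomic_depth(lengths, genome_size):
    """Return average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


# Hypothetical toy inputs, chosen only to make the arithmetic easy to follow.
toy_reads = [2_000, 5_000, 8_000, 11_000, 14_000, 21_000]
toy_genome_size = 10_000

if __name__ == "__main__":
    print("N50 read length:", n50(toy_reads))
    print("Depth:", genomic_depth(toy_reads, toy_genome_size))
```

With real data the lengths would come from the subread files and the genome size from an independent estimate; both are outside the scope of this sketch.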

Original language	English (US)
Article number	2288
Journal	Nature Communications
Volume	11
Issue number	1
DOIs	https://doi.org/10.1038/s41467-020-16037-7
State	Published - Dec 1 2020

Bibliographical note

Publisher Copyright:
© 2020, This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.

Cite this

Ou, S., Liu, J., Chougule, K. M., Fungtammasan, A., Seetharam, A. S., Stein, J. C., Llaca, V., Manchanda, N., Gilbert, A. M., Wei, S., Chin, C. S., Hufnagel, D. E., Pedersen, S., Snodgrass, S. J., Fengler, K., Woodhouse, M., Walenz, B. P., Koren, S., Phillippy, A. M., ... Ware, D. (2020). Effect of sequence depth and length in long-read assembly of the maize inbred NC358. Nature Communications, 11(1), Article 2288. https://doi.org/10.1038/s41467-020-16037-7

Ou, S, Liu, J, Chougule, KM, Fungtammasan, A, Seetharam, AS, Stein, JC, Llaca, V, Manchanda, N, Gilbert, AM, Wei, S, Chin, CS, Hufnagel, DE, Pedersen, S, Snodgrass, SJ, Fengler, K, Woodhouse, M, Walenz, BP, Koren, S, Phillippy, AM, Hannigan, BT, Dawe, RK, Hirsch, CN, Hufford, MB & Ware, D 2020, 'Effect of sequence depth and length in long-read assembly of the maize inbred NC358', Nature Communications, vol. 11, no. 1, 2288. https://doi.org/10.1038/s41467-020-16037-7

@article{ad7ba9b36e264edd83383438c1c854c4,

title = "Effect of sequence depth and length in long-read assembly of the maize inbred NC358",

abstract = "Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75 × genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30 × depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20 × depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.",

author = "Shujun Ou and Jianing Liu and Chougule, {Kapeel M.} and Arkarachai Fungtammasan and Seetharam, {Arun S.} and Stein, {Joshua C.} and Victor Llaca and Nancy Manchanda and Gilbert, {Amanda M.} and Sharon Wei and Chin, {Chen Shan} and Hufnagel, {David E.} and Sarah Pedersen and Snodgrass, {Samantha J.} and Kevin Fengler and Margaret Woodhouse and Walenz, {Brian P.} and Sergey Koren and Phillippy, {Adam M.} and Hannigan, {Brett T.} and Dawe, {R. Kelly} and Hirsch, {Candice N.} and Hufford, {Matthew B.} and Doreen Ware",

note = "Publisher Copyright: {\textcopyright} 2020, This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.",

year = "2020",

month = dec,

day = "1",

doi = "10.1038/s41467-020-16037-7",

language = "English (US)",

volume = "11",

journal = "Nature Communications",

issn = "2041-1723",

publisher = "Nature Publishing Group",

number = "1",

}

TY - JOUR

T1 - Effect of sequence depth and length in long-read assembly of the maize inbred NC358

AU - Ou, Shujun

AU - Liu, Jianing

AU - Chougule, Kapeel M.

AU - Fungtammasan, Arkarachai

AU - Seetharam, Arun S.

AU - Stein, Joshua C.

AU - Llaca, Victor

AU - Manchanda, Nancy

AU - Gilbert, Amanda M.

AU - Wei, Sharon

AU - Chin, Chen Shan

AU - Hufnagel, David E.

AU - Pedersen, Sarah

AU - Snodgrass, Samantha J.

AU - Fengler, Kevin

AU - Woodhouse, Margaret

AU - Walenz, Brian P.

AU - Koren, Sergey

AU - Phillippy, Adam M.

AU - Hannigan, Brett T.

AU - Dawe, R. Kelly

AU - Hirsch, Candice N.

AU - Hufford, Matthew B.

AU - Ware, Doreen

PY - 2020/12/1

Y1 - 2020/12/1

N2 - Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75 × genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30 × depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20 × depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.

AB - Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75 × genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30 × depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20 × depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.

UR - http://www.scopus.com/inward/record.url?scp=85084721264&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85084721264&partnerID=8YFLogxK

U2 - 10.1038/s41467-020-16037-7

DO - 10.1038/s41467-020-16037-7

M3 - Article

C2 - 32385271

AN - SCOPUS:85084721264

SN - 2041-1723

VL - 11

JO - Nature Communications

JF - Nature Communications

IS - 1

M1 - 2288

ER -
