Methods for eliciting, annotating, and analyzing databases for child speech development

Mary E. Beckman; Andrew R. Plummer; Benjamin Munson; Patrick F. Reidy

doi:10.1016/j.csl.2017.02.010

Speech-Language-Hearing Sciences

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver–infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. 
Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.

Original language	English (US)
Pages (from-to)	278-299
Number of pages	22
Journal	Computer Speech and Language
Volume	45
DOIs	https://doi.org/10.1016/j.csl.2017.02.010
State	Published - Sep 2017

Bibliographical note

Publisher Copyright:
© 2017 Elsevier Ltd

Keywords

Automatic speech recognition
Big data corpora
Child speech development
Phonetic transcription
Spectral kinematics

Cite this

@article{b1552b9667c54729b3a239f03a41b04a,

title = "Methods for eliciting, annotating, and analyzing databases for child speech development",

abstract = "Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver–infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. 
Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.",

keywords = "Automatic speech recognition, Big data corpora, Child speech development, Phonetic transcription, Spectral kinematics",

author = "Beckman, {Mary E.} and Plummer, {Andrew R.} and Munson, Benjamin and Reidy, {Patrick F.}",

note = "Publisher Copyright: {\textcopyright} 2017 Elsevier Ltd",

year = "2017",

month = sep,

doi = "10.1016/j.csl.2017.02.010",

language = "English (US)",

volume = "45",

pages = "278--299",

journal = "Computer Speech and Language",

issn = "0885-2308",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Methods for eliciting, annotating, and analyzing databases for child speech development

AU - Beckman, Mary E.

AU - Plummer, Andrew R.

AU - Munson, Benjamin

AU - Reidy, Patrick F.

PY - 2017/9

Y1 - 2017/9

AB - Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver–infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. 
Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.

KW - Automatic speech recognition

KW - Big data corpora

KW - Child speech development

KW - Phonetic transcription

KW - Spectral kinematics

UR - http://www.scopus.com/inward/record.url?scp=85014803867&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014803867&partnerID=8YFLogxK

U2 - 10.1016/j.csl.2017.02.010

DO - 10.1016/j.csl.2017.02.010

M3 - Article

C2 - 28943715

AN - SCOPUS:85014803867

SN - 0885-2308

VL - 45

SP - 278

EP - 299

JO - Computer Speech and Language

JF - Computer Speech and Language

ER -
