This paper describes an experiment that used the Principal Direction Divisive Partitioning algorithm (Boley, 1998) to extract linguistic regularities in word errors from several sets of medical dictation data. For each of six physicians, two hundred finished medical dictations, aligned with their corresponding automatic speech recognition output, were clustered, and the results were analyzed for linguistic regularities between and within clusters. Sparsity measures indicated a good fit between the algorithm and the input data. Linguistic analysis of the output clusters showed evidence of systematic word recognition errors for short words, function words, and words with destressed vowels, as well as phonological confusion errors due to telephony (recording) bandwidth interference. No qualitatively significant distinctions between clusters could be made by examining word errors alone, but the results confirmed several informally held hypotheses and suggested several avenues of further investigation, such as the examination of word error contexts.
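For readers unfamiliar with the clustering step, the following is a minimal sketch of Principal Direction Divisive Partitioning in its general form, not the paper's actual implementation. It assumes each dictation is represented as a row of a numeric feature matrix (for example, word error counts); the abstract does not specify the representation, and the function names `pddp_split` and `pddp` are hypothetical. At each step the cluster with the largest scatter is split in two by the sign of each row's projection onto the leading principal direction of the mean-centered cluster.

```python
import numpy as np


def pddp_split(X):
    """Split the rows of X in two by the sign of their projection
    onto the leading principal direction of the centered data."""
    centered = X - X.mean(axis=0)
    # The first right singular vector is the principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[0]
    return projection <= 0, projection > 0


def scatter(X):
    """Total squared distance of the rows of X from their mean."""
    return float(np.sum((X - X.mean(axis=0)) ** 2))


def pddp(X, n_clusters):
    """Divisively partition the rows of X into at most n_clusters groups.

    Returns a list of index arrays, one per cluster.
    """
    clusters = [np.arange(X.shape[0])]
    while len(clusters) < n_clusters:
        scatters = [scatter(X[idx]) for idx in clusters]
        target = int(np.argmax(scatters))
        if scatters[target] == 0.0:
            break  # every remaining cluster is a single point or identical rows
        idx = clusters.pop(target)
        left, right = pddp_split(X[idx])
        clusters.extend([idx[left], idx[right]])
    return clusters


# Toy data (illustrative only, not from the paper): two separated groups.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
clusters = pddp(X, 2)
```

Choosing the cluster with the largest scatter to split next is one common selection rule; other criteria are possible. The sketch uses a dense SVD for brevity, whereas sparse document matrices would usually call for a sparse solver.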
|Original language||English (US)|
|Title of host publication||Machine Learning|
|Subtitle of host publication||ECML 2000 - 11th European Conference on Machine Learning, Proceedings|
|Editors||Ramon Lopez de Mantaras, Enric Plaza|
|Number of pages||8|
|State||Published - 2000|
|Event||11th European Conference on Machine Learning, ECML 2000 - Barcelona, Catalonia, Spain (Duration: May 31 2000 → Jun 2 2000)|
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Other||11th European Conference on Machine Learning, ECML 2000|
|Period||5/31/00 → 6/2/00|
Bibliographical note
Funding Information:
This work was partially supported by NSF grant IIS-9811229.