Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora

Guergana Savova, Michael Schonwetter, Sergey Pakhomov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semi-literal corpora. By building and testing interpolated language models (ILM) with literal (LILM), semi-literal (SILM) and partial (PILM) corpora, we show that both perplexity and recognition results improve significantly with LILM and SILM, with the two yielding very close results.
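The abstract does not state the exact form of the interpolation, so the following is only a minimal sketch, assuming plain linear interpolation of two word-probability models with a single weight `lam`. The add-one-smoothed unigram models, the toy sentences, and every function name here are illustrative assumptions, not the authors' setup, which would more plausibly use n-gram models trained on the 25M-word finished-text corpus.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one-smoothed unigram model over a fixed vocabulary (toy assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_large, p_small, lam):
    """Linear interpolation: lam * p_large(w) + (1 - lam) * p_small(w)."""
    return {w: lam * p_large[w] + (1.0 - lam) * p_small[w] for w in p_large}


def perplexity(model, tokens):
    """Perplexity = exp of the negative mean log-probability per token."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical toy data: "finished" text is the edited report,
# "literal" text is a verbatim transcript of what was dictated.
finished = "the patient is a 45 year old male with chest pain".split()
literal = "um the patient is a 45 year old male period next line".split()
held_out = "the patient is um a male period".split()

vocab = set(finished) | set(literal) | set(held_out)
p_finished = train_unigram(finished, vocab)
p_literal = train_unigram(literal, vocab)
p_mixed = interpolate(p_finished, p_literal, lam=0.5)  # lam chosen arbitrarily

ppl_finished = perplexity(p_finished, held_out)
ppl_mixed = perplexity(p_mixed, held_out)
print(f"finished-only perplexity: {ppl_finished:.2f}")
print(f"interpolated perplexity:  {ppl_mixed:.2f}")
```

On this toy data the interpolated model scores a lower perplexity on the dictation-like held-out sentence than the finished-text model alone, because the literal corpus covers spoken tokens such as "um" and "period". This illustrates the general idea only and says nothing about the magnitudes reported in the paper; in practice the weight `lam` would be tuned on held-out data rather than fixed at 0.5.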

Original language: English (US)
Title of host publication: 6th International Conference on Spoken Language Processing, ICSLP 2000
Publisher: International Speech Communication Association
ISBN (Electronic): 7801501144, 9787801501141
State: Published - 2000
Event: 6th International Conference on Spoken Language Processing, ICSLP 2000 - Beijing, China
Duration: Oct 16 2000 → Oct 20 2000

Bibliographical note

Funding Information:
Financial support from the Academy of Finland is gratefully acknowledged (Grant Number 111692). The author would also like to thank Johnny Lindroos, Fredrick Sundell and Marketta Hiisa for their contribution to the project and their assistance in carrying out some of the experiments.
