Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora

Guergana Savova; Michael Schonwetter; Sergey Pakhomov

Pharmaceutical Care and Health Systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semi-literal corpora. By building and testing interpolated language models (ILM) with literal (LILM), semi-literal (SILM) and partial (PILM) corpora, we show that both perplexity and recognition results improve significantly with LILM and SILM, with the two yielding very close results.
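
The abstract does not state the exact interpolation scheme. As a rough illustration only, the sketch below assumes simple linear interpolation of two add-one-smoothed unigram models and measures perplexity on held-out text. The toy sentences, the function names (`unigram_model`, `interpolate`, `perplexity`) and the weight `lam=0.5` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_a, p_b, lam):
    """Linear interpolation: lam * p_a(w) + (1 - lam) * p_b(w)."""
    return {w: lam * p_a[w] + (1 - lam) * p_b[w] for w in p_a}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy stand-ins (assumptions): "finished" mimics edited report text,
# "literal" mimics a verbatim transcript with fillers and spoken punctuation.
finished = "the patient was seen today the patient is stable".split()
literal = "uh the patient um was seen today period".split()
held_out = "um the patient is stable period".split()
vocab = set(finished) | set(literal) | set(held_out)

p_finished = unigram_model(finished, vocab)
p_literal = unigram_model(literal, vocab)
p_mixed = interpolate(p_finished, p_literal, lam=0.5)

for name, model in [("finished", p_finished),
                    ("literal", p_literal),
                    ("interpolated", p_mixed)]:
    print(name, round(perplexity(model, held_out), 2))
```

In practice the interpolation weight would be tuned on held-out dictation data, and the component models would be higher-order n-gram models rather than unigrams.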

Original language	English (US)
Title of host publication	6th International Conference on Spoken Language Processing, ICSLP 2000
Publisher	International Speech Communication Association
ISBN (Electronic)	7801501144, 9787801501141
State	Published - 2000
Event	6th International Conference on Spoken Language Processing, ICSLP 2000 - Beijing, China
Duration	Oct 16 2000 → Oct 20 2000

Publication series

Name	6th International Conference on Spoken Language Processing, ICSLP 2000

Other

Other	6th International Conference on Spoken Language Processing, ICSLP 2000
Country/Territory	China
City	Beijing
Period	10/16/00 → 10/20/00

Bibliographical note

Funding Information:
Financial support from Academy of Finland is gratefully acknowledged (Grant Number 111692). The author would also like to thank Johnny Lindroos, Fredrick Sundell and Marketta Hiisa for their contribution to the project and their assistance in carrying out some of the experiments.

Cite this

Savova, G., Schonwetter, M., & Pakhomov, S. (2000). Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora. In 6th International Conference on Spoken Language Processing, ICSLP 2000 (6th International Conference on Spoken Language Processing, ICSLP 2000). International Speech Communication Association.

Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora. / Savova, Guergana; Schonwetter, Michael; Pakhomov, Sergey.
6th International Conference on Spoken Language Processing, ICSLP 2000. International Speech Communication Association, 2000. (6th International Conference on Spoken Language Processing, ICSLP 2000).

Savova, G, Schonwetter, M & Pakhomov, S 2000, Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora. in 6th International Conference on Spoken Language Processing, ICSLP 2000. 6th International Conference on Spoken Language Processing, ICSLP 2000, International Speech Communication Association, 6th International Conference on Spoken Language Processing, ICSLP 2000, Beijing, China, 10/16/00.

Savova G, Schonwetter M, Pakhomov S. Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora. In 6th International Conference on Spoken Language Processing, ICSLP 2000. International Speech Communication Association. 2000. (6th International Conference on Spoken Language Processing, ICSLP 2000).

Savova, Guergana ; Schonwetter, Michael ; Pakhomov, Sergey. / Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora. 6th International Conference on Spoken Language Processing, ICSLP 2000. International Speech Communication Association, 2000. (6th International Conference on Spoken Language Processing, ICSLP 2000).

@inproceedings{55238258ab2740ce857f0d522088a503,

title = "Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora",

abstract = "We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semi-literal corpora. By building and testing interpolated language models (ILM) with literal (LILM), semi-literal (SILM) and partial (PILM) corpora, we show that both perplexity and recognition results improve significantly with LILM and SILM, with the two yielding very close results.",

author = "Guergana Savova and Michael Schonwetter and Sergey Pakhomov",

note = "Funding Information: Financial support from Academy of Finland is gratefully acknowledged (Grant Number 111692). The author would also like to thank Johnny Lindroos, Fredrick Sundell and Marketta Hiisa for their contribution to the project and their assistance in carrying out some of the experiments.; 6th International Conference on Spoken Language Processing, ICSLP 2000 ; Conference date: 16-10-2000 Through 20-10-2000",

year = "2000",

language = "English (US)",

series = "6th International Conference on Spoken Language Processing, ICSLP 2000",

publisher = "International Speech Communication Association",

booktitle = "6th International Conference on Spoken Language Processing, ICSLP 2000",

}

TY - GEN

T1 - Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora

AU - Savova, Guergana

AU - Schonwetter, Michael

AU - Pakhomov, Sergey

N1 - Funding Information: Financial support from Academy of Finland is gratefully acknowledged (Grant Number 111692). The author would also like to thank Johnny Lindroos, Fredrick Sundell and Marketta Hiisa for their contribution to the project and their assistance in carrying out some of the experiments.

PY - 2000

Y1 - 2000

N2 - We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semi-literal corpora. By building and testing interpolated language models (ILM) with literal (LILM), semi-literal (SILM) and partial (PILM) corpora, we show that both perplexity and recognition results improve significantly with LILM and SILM, with the two yielding very close results.

AB - We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semi-literal corpora. By building and testing interpolated language models (ILM) with literal (LILM), semi-literal (SILM) and partial (PILM) corpora, we show that both perplexity and recognition results improve significantly with LILM and SILM, with the two yielding very close results.

UR - http://www.scopus.com/inward/record.url?scp=85009067813&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009067813&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85009067813

T3 - 6th International Conference on Spoken Language Processing, ICSLP 2000

BT - 6th International Conference on Spoken Language Processing, ICSLP 2000

PB - International Speech Communication Association

T2 - 6th International Conference on Spoken Language Processing, ICSLP 2000

Y2 - 16 October 2000 through 20 October 2000

ER -
