A single channel speech enhancement approach by combining statistical criterion and Multi-frame Sparse Dictionary Learning

Hung Wei Tseng; Srikanth Vishnubhotla; Mingyi Hong; Xiangfeng Wang; Jinjun Xiao

Electrical and Computer Engineering

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

In this paper, we consider the single-channel speech enhancement problem, in which a clean speech signal needs to be estimated from a noisy observation. To capture the characteristics of both the noise and speech signals, we combine the well-known Short-Time-Spectrum-Amplitude (STSA) estimator with a machine-learning-based technique called Multi-frame Sparse Dictionary Learning (MSDL). The former utilizes statistical information for denoising, while the latter helps better preserve speech, especially its temporal structure. When applied to the TIMIT database, the proposed algorithm, named STSA-MSDL, outperforms standard statistical algorithms such as the Wiener filter and the STSA estimator, as well as dictionary-based algorithms, on four objective metrics that measure speech intelligibility, speech distortion, background noise reduction, and overall quality.

Original language	English (US)
Pages (from-to)	451-455
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State	Published - Jan 1 2013
Event	14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration	Aug 25 2013 → Aug 29 2013

Keywords

ADMM
Contextual effects
Dictionary learning
STSA
Speech enhancement

Cite this

A single channel speech enhancement approach by combining statistical criterion and Multi-frame Sparse Dictionary Learning. / Tseng, Hung Wei; Vishnubhotla, Srikanth; Hong, Mingyi et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 01.01.2013, p. 451-455.

@article{0c11ae7e20a54f12bea586c92856b43d,

title = "A single channel speech enhancement approach by combining statistical criterion and Multi-frame Sparse Dictionary Learning",

abstract = "In this paper, we consider the single-channel speech enhancement problem, in which a clean speech signal needs to be estimated from a noisy observation. To capture the characteristics of both the noise and speech signals, we combine the well-known Short-Time-Spectrum-Amplitude (STSA) estimator with a machine learning based technique called Multi-frame Sparse Dictionary Learning (MSDL). The former utilizes statistical information for denoising, while the latter helps better preserve speech, especially its temporal structure. The proposed algorithm, named STSA-MSDL, outperforms standard statistical algorithms such as the Wiener filter, STSA estimator, as well as dictionary based algorithms when applied to the TIMIT database, using four different objective metrics that measure speech intelligibility, speech distortion, background noise reduction, and the overall quality.",

keywords = "ADMM, Contextual effects, Dictionary learning, STSA, Speech enhancement",

author = "Tseng, {Hung Wei} and Srikanth Vishnubhotla and Mingyi Hong and Xiangfeng Wang and Jinjun Xiao",

year = "2013",

month = jan,

day = "1",

language = "English (US)",

pages = "451--455",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

note = "14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 ; Conference date: 25-08-2013 Through 29-08-2013",

}
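
The BibTeX entry above can be cited from a LaTeX document by its key. The following is a minimal sketch, assuming the entry has been saved to a file hypothetically named refs.bib in the same directory; the file name, the sample sentence, and the choice of the plain bibliography style are illustrative assumptions, not part of the record.

```latex
% Minimal sketch: cite the entry above by its BibTeX key.
% Assumption: the entry is stored in a file named refs.bib (hypothetical name).
\documentclass{article}
\begin{document}
% The citation key is copied verbatim from the @article entry.
STSA-MSDL combines the STSA estimator with Multi-frame Sparse
Dictionary Learning~\cite{0c11ae7e20a54f12bea586c92856b43d}.

% The plain style is an arbitrary choice; fields it does not know
% (abstract, keywords, language, day) are simply ignored.
\bibliographystyle{plain}
\bibliography{refs}
\end{document}
```

Running latex, then bibtex, then latex twice should resolve the citation. Because the generated key is an opaque identifier, some users may prefer to rename it in their own copy of the entry.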

TY - JOUR

T1 - A single channel speech enhancement approach by combining statistical criterion and Multi-frame Sparse Dictionary Learning

AU - Tseng, Hung Wei

AU - Vishnubhotla, Srikanth

AU - Hong, Mingyi

AU - Wang, Xiangfeng

AU - Xiao, Jinjun

PY - 2013/1/1

Y1 - 2013/1/1

N2 - In this paper, we consider the single-channel speech enhancement problem, in which a clean speech signal needs to be estimated from a noisy observation. To capture the characteristics of both the noise and speech signals, we combine the well-known Short-Time-Spectrum-Amplitude (STSA) estimator with a machine learning based technique called Multi-frame Sparse Dictionary Learning (MSDL). The former utilizes statistical information for denoising, while the latter helps better preserve speech, especially its temporal structure. The proposed algorithm, named STSA-MSDL, outperforms standard statistical algorithms such as the Wiener filter, STSA estimator, as well as dictionary based algorithms when applied to the TIMIT database, using four different objective metrics that measure speech intelligibility, speech distortion, background noise reduction, and the overall quality.

AB - In this paper, we consider the single-channel speech enhancement problem, in which a clean speech signal needs to be estimated from a noisy observation. To capture the characteristics of both the noise and speech signals, we combine the well-known Short-Time-Spectrum-Amplitude (STSA) estimator with a machine learning based technique called Multi-frame Sparse Dictionary Learning (MSDL). The former utilizes statistical information for denoising, while the latter helps better preserve speech, especially its temporal structure. The proposed algorithm, named STSA-MSDL, outperforms standard statistical algorithms such as the Wiener filter, STSA estimator, as well as dictionary based algorithms when applied to the TIMIT database, using four different objective metrics that measure speech intelligibility, speech distortion, background noise reduction, and the overall quality.

KW - ADMM

KW - Contextual effects

KW - Dictionary learning

KW - STSA

KW - Speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=84906222524&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84906222524&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84906222524

SN - 2308-457X

SP - 451

EP - 455

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013

Y2 - 25 August 2013 through 29 August 2013

ER -
