Writing Samples

1. Reverberant Speech Processing for Hubot Communication

M.E. (Signal Processing) Project Report, Dept. of ECE, IISc, June 2010.

Advisor: Prof. T.V. Sreenivas, Dept. of ECE, IISc

The full report can be found here.

Abstract: Human-machine interaction through speech is a natural and desirable form of communication for humans. We explore the signal processing problems in such Human-Robot (Hubot) speech communication. The target acoustic environment is a closed room with moving sources and moving listeners. The most important degradation of hands-free speech recorded by distant microphones in such a room is reverberation, which is the focus of the present work. We consider a single stationary source and multiple stationary distant microphones, and focus mainly on reverberant speech enhancement and Isolated Word Recognition (IWR) with a small, confusable vocabulary.

We develop a general Iterative Multi-channel Wiener Filter (IMWF) framework for speech enhancement by extending the classical Iterative Wiener Filter (IWF). We show that multi-channel measurements provide an advantage for speech degraded by diffuse additive noise, both when all channels have the same SNR and when the SNR differs across channels. A clean-speech vector quantization (VQ) codebook is effective for introducing intra-frame constraints and improving the convergence of the IWF; this codebook constraint likewise improves the convergence properties of the IMWF.
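
The iteration behind an iterative multi-channel Wiener filter can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not the report's IMWF: every channel is taken to contain the same clean speech plus white noise that is independent across channels with a known variance (a simplification of diffuse noise), the whole signal is treated as one frame, and the AR order, iteration count, toy signal and all function names are hypothetical. Each pass fits an all-pole model to the current estimate with Levinson-Durbin and then forms, bin by bin, the posterior mean of the speech DFT given all channels. The VQ codebook constraint is not included.

```python
import cmath
import math
import random


def autocorr(x, p):
    """Biased autocorrelation r[0..p], normalised by len(x)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(p + 1)]


def levinson(r, p):
    """Levinson-Durbin: A(z) = 1 + sum a[j] z^-j and the prediction-error power."""
    a, err = [1.0] + [0.0] * p, r[0]
    for i in range(1, p + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def dft(x, inverse=False):
    """Plain O(N^2) DFT; the inverse includes the 1/N factor."""
    n, sign = len(x), (1 if inverse else -1)
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def ar_psd(a, err, n):
    """All-pole PSD err / |A(e^{jw})|^2 sampled on n DFT bins."""
    return [err / abs(sum(c * cmath.exp(-2j * math.pi * k * m / n)
                          for m, c in enumerate(a))) ** 2 for k in range(n)]


def iterative_mc_wiener(channels, noise_vars, order=4, iters=3):
    """Hypothetical sketch: y_m = s + n_m, n_m white with known variance noise_vars[m]."""
    n = len(channels[0])
    specs = [dft(y) for y in channels]
    wsum = sum(1.0 / v for v in noise_vars)
    # Initial speech estimate: inverse-noise-variance weighted channel average
    est = [sum(y[t] / v for y, v in zip(channels, noise_vars)) / wsum for t in range(n)]
    for _ in range(iters):
        a, err = levinson(autocorr(est, order), order)  # all-pole fit to current estimate
        ps = ar_psd(a, err, n)
        # Per-bin posterior mean of S given all channels (multi-channel Wiener gain)
        s_hat = [ps[k] * sum(Y[k] / v for Y, v in zip(specs, noise_vars)) / (1.0 + ps[k] * wsum)
                 for k in range(n)]
        est = [v.real for v in dft(s_hat, inverse=True)]
    return est


# Hypothetical toy example: a resonant AR(2) signal seen by two channels with different SNRs
rng = random.Random(0)
clean, x1, x2 = [], 0.0, 0.0
for _ in range(256):
    x = 1.2728 * x1 - 0.81 * x2 + rng.gauss(0.0, 1.0)
    clean.append(x)
    x1, x2 = x, x1
noise_vars = [1.0, 4.0]
chans = [[s + rng.gauss(0.0, math.sqrt(v)) for s in clean] for v in noise_vars]
enhanced = iterative_mc_wiener(chans, noise_vars)
```

With equal noise variances the per-bin gain reduces to a single-channel Wiener filter applied to the channel average (with the noise variance divided by the number of channels); unequal variances weight the cleaner channel more heavily.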

Motivated by the Human Auditory System (HAS), we explore frequency-warped signal processing for speech enhancement and recognition. We study the performance of Warped Linear Prediction (WLP) and a Multi-channel Warped Wiener Filter (MWWF); experiments show that frequency warping yields only limited gains. However, using Warped Linear Prediction Cepstral Coefficients (w-LPCC) as feature vectors in IWR improves recognition performance for reverberant speech.
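
Warped linear prediction replaces each unit delay z^-1 in the LP analysis by a first-order allpass section D(z) = (z^-1 - λ)/(1 - λz^-1), which stretches the low-frequency region when λ > 0. The Python sketch below assumes the common warped-autocorrelation formulation: it computes r_w[k] = Σ_n x[n]·(D^k x)[n] and feeds it to the ordinary Levinson-Durbin recursion. The value λ = 0.5, the model order, the noise test signal and the function names are illustrative assumptions, not values from the report.

```python
import math
import random


def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1), zero initial state."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = -lam * xn + x_prev + lam * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def warped_autocorr(x, order, lam):
    """r_w[k] = sum_n x[n] * (D^k x)[n]: every unit delay replaced by the allpass D(z)."""
    r, xk = [], list(x)
    for _ in range(order + 1):
        r.append(sum(p * q for p, q in zip(x, xk)))
        xk = allpass(xk, lam)
    return r


def levinson(r, p):
    """Levinson-Durbin: A(z) = 1 + sum a[j] z^-j and the prediction-error power."""
    a, err = [1.0] + [0.0] * p, r[0]
    for i in range(1, p + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def warped_frequency(w, lam):
    """Frequency mapping of D(z): w -> w + 2*atan(lam*sin(w) / (1 - lam*cos(w)))."""
    return w + 2.0 * math.atan2(lam * math.sin(w), 1.0 - lam * math.cos(w))


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(256)]
r_w = warped_autocorr(x, 10, 0.5)   # lam = 0.5 is an arbitrary illustrative value
a_w, err_w = levinson(r_w, 10)      # warped LP coefficients of order 10
```

With λ = 0 the allpass reduces to a plain unit delay and r_w becomes the ordinary (unnormalised) autocorrelation, so standard LP is recovered as a special case.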

Reverberant speech is less intelligible and degrades the performance of Automatic Speech Recognition (ASR) systems. Late reverberation is the main cause of this degradation, so reverberant speech enhancement is an important pre-processing step. We study several effective linear-prediction-based algorithms for multi-channel dereverberation and propose a novel Multi-channel Iterative Dereverberation (MID) algorithm based on a Codebook Constrained Iterative Multi-channel Wiener Filter (CCIMWF). Late reverberation is estimated using Long-term Multi-step Linear Prediction (LTMLP), and this estimate is used in the CCIMWF framework through a doubly iterative formulation. For multi-channel dereverberation, we propose a variant called joint CCIMWF (j-CCIMWF), which uses a clean-speech VQ codebook. The MID algorithm is applied to simulated multi-channel reverberant speech. The signal-to-reverberation ratio (SRR) and log spectral distance (LSD) improve over the double iterations, showing that the algorithm suppresses the effect of late reverberation and improves speech quality and intelligibility. The algorithm also converges well over the iterations.
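
The core of multi-step (delayed) linear prediction is a least-squares fit that predicts x[n] only from samples at least D samples in the past, so the prediction captures correlation that outlasts the short-term correlation of speech, i.e. the late reverberant tail. The following single-channel Python sketch is a toy version of that idea under stated assumptions: a synthetic source with an exponentially decaying echo every 5 samples stands in for a room response, and the delay, order, ridge term and function names are hypothetical. The report's LTMLP is multi-channel and its estimate feeds the CCIMWF, neither of which is reproduced here.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][j] * sol[j] for j in range(r + 1, n))) / M[r][r]
    return sol


def multistep_lp(x, delay, order, reg=1e-6):
    """Hypothetical helper: least-squares c minimising sum_n (x[n] - sum_k c[k] x[n-delay-k])^2."""
    start = delay + order - 1
    rows = [[x[n - delay - k] for k in range(order)] for n in range(start, len(x))]
    target = x[start:]
    R = [[sum(row[i] * row[j] for row in rows) + (reg if i == j else 0.0)
          for j in range(order)] for i in range(order)]
    g = [sum(row[i] * t for row, t in zip(rows, target)) for i in range(order)]
    return solve(R, g)


def predict_late(x, c, delay):
    """Late-reverberation estimate: x_late[n] = sum_k c[k] * x[n - delay - k]."""
    return [sum(ck * x[n - delay - k] for k, ck in enumerate(c) if n - delay - k >= 0)
            for n in range(len(x))]


# Hypothetical toy "room": white source plus an exponentially decaying echo every 5 samples
rng = random.Random(0)
x = []
for n in range(2000):
    x.append(rng.gauss(0.0, 1.0) + (0.9 * x[n - 5] if n >= 5 else 0.0))
c = multistep_lp(x, delay=5, order=3)
late = predict_late(x, c, delay=5)
early = [xn - ln for xn, ln in zip(x, late)]   # what remains after removing the late part
```

Because the toy echo repeats every 5 samples with gain 0.9, the fitted coefficients should come out close to (0.9, 0, 0), and subtracting the prediction leaves roughly the source itself.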

We also explore features that are less sensitive to reverberation: MFCC, LPC, warped LPC (w-LPC) and w-LPCC. The features are tested in an IWR experiment with a 10-word confusable vocabulary, using simultaneously recorded clean and reverberant channels. w-LPCC outperforms MFCC under train-test mismatch between clean and reverberant speech. We also study how reverberant smearing increases phoneme confusability; within the studied set of phonemes, nasals and glottals are strongly affected by reverberation.
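
LPCC features are obtained from LP coefficients with the standard LPC-to-cepstrum recursion. A minimal Python sketch follows; it assumes the sign convention A(z) = 1 + Σ a[k] z^-k and, for w-LPCC, that the same recursion is simply applied to warped LP coefficients (an assumption about how the report computes them). The function name is hypothetical.

```python
import math


def lpc_to_cepstrum(a, n_ceps, err=1.0):
    """Cepstrum of the all-pole model sqrt(err) / A(z), A(z) = 1 + sum_k a[k] z^-k.

    a[0] must be 1. Returns [c0, c1, ..., c_{n_ceps}] with c0 = 0.5 * ln(err).
    Recursion: c[n] = -a[n] - sum_{k=1}^{n-1} (k/n) c[k] a[n-k], with a[m] = 0 for m > p.
    """
    p = len(a) - 1
    c = [0.5 * math.log(err)] + [0.0] * n_ceps
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):   # only terms where a[n-k] exists (n-k <= p)
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c


# One pole at z = 0.5, i.e. A(z) = 1 - 0.5 z^-1
print(lpc_to_cepstrum([1.0, -0.5], 4))
```

For a single pole at z = 0.5 the recursion reproduces the known closed form c[n] = 0.5^n / n, which makes a convenient sanity check.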

Future work includes detailed characterization of room reverberation, handling moving sources and microphones as well as multiple sources, source localization in reverberant environments, and handling out-of-vocabulary (OOV) words in IWR.

2. Publications

1. Ajay S. and T.V. Sreenivas, "Multi-channel Iterative Dereverberation based on Codebook Constrained Iterative Multi-channel Wiener Filter," in Proceedings of INTERSPEECH 2010, Makuhari, Japan, September 2010. [pdf]

Abstract: A novel Multi-channel Iterative Dereverberation (MID) algorithm based on a Codebook Constrained Iterative Multi-channel Wiener Filter (CCIMWF) is proposed. We extend the classical iterative Wiener filter (IWF) to the multi-channel dereverberation case. Late reverberation is estimated using Long-term Multi-step Linear Prediction (LTMLP), and this estimate is used in the CCIMWF framework through a doubly iterative formulation. Since a clean-speech VQ codebook is effective for inducing intra-frame constraints and improving the convergence of the IWF, a joint CCIMWF algorithm is proposed for the multi-channel case. The signal-to-reverberation ratio (SRR) and log spectral distortion (LSD) improve over the double iterations, showing that the algorithm suppresses the effect of late reverberation and improves speech quality and intelligibility. The algorithm also has fair convergence properties over the iterations.

The poster can be seen here.
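
Definitions of SRR and LSD vary between papers, and the abstract does not fix one, so the following Python sketch uses one common form as an assumption: SRR as the global ratio of reference energy to the energy of the residual (estimate minus reference, where for dereverberation the reference would be the clean or direct-path signal), and LSD as the RMS difference of dB power spectra per frame, averaged over frames. The frame length, hop, rectangular window, spectral floor, toy signals and function names are all hypothetical choices.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """|DFT|^2 on bins 0..N/2 of a rectangular-windowed frame (plain O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def lsd(ref, est, frame_len=64, hop=32, floor=1e-10):
    """Mean over frames of the RMS difference (in dB) between the two power spectra."""
    dists = []
    for start in range(0, len(ref) - frame_len + 1, hop):
        pr = power_spectrum(ref[start:start + frame_len])
        pe = power_spectrum(est[start:start + frame_len])
        d2 = [(10 * math.log10(max(r, floor)) - 10 * math.log10(max(e, floor))) ** 2
              for r, e in zip(pr, pe)]
        dists.append(math.sqrt(sum(d2) / len(d2)))
    return sum(dists) / len(dists)


def srr(ref, est):
    """Global SRR in dB: reference energy over the energy of the residual est - ref."""
    num = sum(r * r for r in ref)
    den = sum((e - r) ** 2 for r, e in zip(ref, est))
    return 10 * math.log10(num / den)


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(512)]
degraded = [c + 0.3 * rng.gauss(0.0, 1.0) for c in clean]
print(f"SRR = {srr(clean, degraded):.1f} dB, LSD = {lsd(clean, degraded):.2f} dB")
```

Under these definitions a better dereverberated estimate shows a higher SRR and a lower LSD, which is the direction of improvement the abstract reports over the double iterations.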

2. Veeravasantarao D, Ajay S, Prem Kumar P, Laxmidhar Behera, "Adaptive Active Noise Control Schemes for Headset Applications," in Proceedings of the 17th IFAC World Congress, Seoul, July 2008. [pdf]

Abstract: This paper presents the design and implementation of an adaptive feedback active noise control (ANC) system for headphone applications. ANC is a technique for acoustic noise reduction that uses a secondary sound source to produce "anti-noise" that cancels the primary noise. The paper focuses on narrowband, single-channel feedback ANC for headsets. The filter weights are updated using the feedback form of the Filtered-X LMS (FXLMS) algorithm. The performance of IIR-based and FIR-based filters is compared with that of an ADALINE-based algorithm. The system is implemented in real time on a TMS320C6713 DSP Starter Kit (DSK), and its performance is analyzed for two-tone sinusoidal noise.

A link to the proceedings is here. The poster is here.
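
A feedback ANC headset has no separate reference microphone, so the reference is synthesised from the error signal plus the modelled anti-noise, and the filtered reference used in the LMS update is that signal passed through a secondary-path model. The Python sketch below simulates an FIR controller with this feedback FXLMS structure on two-tone noise. It is an illustration under loud assumptions, not the paper's DSK implementation or its IIR and ADALINE variants: the secondary path is a hypothetical 2-sample delay with gain 0.8 that is assumed to be known exactly, and the tone frequencies, step size, filter length and function name are illustrative.

```python
import math


def feedback_fxlms(d, s, taps=16, mu=0.01):
    """Hypothetical sketch: feedback ANC with an FIR controller adapted by FXLMS.

    d: primary noise at the error microphone.
    s: secondary-path impulse response, assumed known exactly and with s[0] == 0.
    Returns the residual e[n] = d[n] - (s * y)[n].
    """
    if s[0] != 0.0:
        raise ValueError("sketch assumes at least one sample of secondary-path delay")
    n_total = len(d)
    w = [0.0] * taps
    y = [0.0] * n_total    # anti-noise sent to the loudspeaker
    x = [0.0] * n_total    # internally synthesised reference
    xf = [0.0] * n_total   # reference filtered by the secondary-path model
    e = [0.0] * n_total
    for n in range(n_total):
        sy = sum(s[j] * y[n - j] for j in range(1, len(s)) if n >= j)
        e[n] = d[n] - sy
        # Feedback structure: add the modelled anti-noise back to recover an estimate of d[n]
        x[n] = e[n] + sy
        xf[n] = sum(s[j] * x[n - j] for j in range(len(s)) if n >= j)
        # FXLMS update: w <- w + mu * e[n] * [xf[n], xf[n-1], ..., xf[n-taps+1]]
        for i in range(min(taps, n + 1)):
            w[i] += mu * e[n] * xf[n - i]
        y[n] = sum(w[i] * x[n - i] for i in range(min(taps, n + 1)))
    return e


# Two-tone noise and a hypothetical secondary path (2-sample delay, gain 0.8)
d = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.12 * n) for n in range(4000)]
e = feedback_fxlms(d, s=[0.0, 0.0, 0.8])
```

Because the synthesised reference equals d[n] when the secondary-path model is exact, the controller only has to learn a two-sample-ahead predictor of the tones scaled by 1/0.8, which a short FIR filter can represent; for this noiseless two-tone input the residual should decay toward zero once adaptation settles.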
