Research Projects

MALORCA: Machine Learning of Speech Recognition Models for Controller Assistance

Postdoctoral research, Idiap Research Institute (Sep 2016 - Feb 2018)

MALORCA is a Horizon 2020 SESAR project (MALORCA website).

Abstract: Air traffic control (ATC) instructions are usually still given to pilots via voice communication. To be safe and efficient, however, ATC systems need up-to-date data, which requires substantial input from the air traffic controllers (ATCOs) to keep the system data correct. The projects AcListant® and AcListant®-Strips have shown that Assistant Based Speech Recognition (ABSR) can significantly reduce controllers’ workload and increase ATM efficiency. One main obstacle to transferring ABSR from the laboratory to operational systems is the cost of deployment, because modern speech recognition models require manual adaptation to local requirements (local accents, phraseology deviations, environmental constraints, etc.). MALORCA proposes a general, cheap and effective solution that automates this re-learning, adaptation and customisation process by automatically learning local speech recognition and controller models from radar and speech data recordings. (Description adapted from the project website)

A Data-driven Bayesian Approach to Automatic Rhythm Analysis of Indian Art Music

PhD thesis, part of the CompMusic project (Oct 2012 - Aug 2016)

Advisor: Prof. Xavier Serra, Music Technology Group.

CompMusic website

My work in CompMusic

Thesis companion page

Abstract: Large and growing collections of a wide variety of music are now available on demand to music listeners, necessitating novel ways of automatically structuring these collections using different dimensions of music. Rhythm is one of the basic music dimensions and its automatic analysis, which aims to extract musically meaningful rhythm related information from music, is a core task in Music Information Research (MIR).

Musical rhythm, similar to most musical dimensions, is culture-specific and hence its analysis requires culture-aware approaches. Indian art music is one of the major music traditions of the world and has complexities in rhythm that have not been addressed by the current state of the art in MIR, motivating us to choose it as the primary music tradition for study. Our intent is to address unexplored rhythm analysis problems in Indian art music to push the boundaries of the current MIR approaches by making them culture-aware and generalizable to other music traditions.

The thesis aims to build data-driven signal processing and machine learning approaches for automatic analysis, description and discovery of rhythmic structures and patterns in audio music collections of Indian art music. After identifying challenges and opportunities, we present several relevant research tasks that open up the field of automatic rhythm analysis of Indian art music. Data-driven approaches require well curated data corpora for research and efforts towards creating such corpora and datasets are documented in detail. We then focus on the topics of meter analysis and percussion pattern discovery in Indian art music.

Meter analysis aims to align several hierarchical metrical events with an audio recording. Meter analysis tasks such as meter inference, meter tracking and informed meter tracking are formulated for Indian art music. Different Bayesian models that can explicitly incorporate higher-level metrical structure information are evaluated for these tasks, and novel extensions are proposed. The proposed methods overcome the limitations of existing approaches, and their performance indicates the effectiveness of informed meter analysis.
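The bar-pointer idea behind such models can be sketched in miniature: a hidden state tracks the discretized position within the bar, and decoding aligns it with an onset-strength signal. The sketch below is a deliberately simplified toy (fixed tempo of one position per frame, Viterbi decoding instead of the full Bayesian inference, and hypothetical function and parameter names); the thesis models jointly infer tempo and handle much richer metrical structure.

```python
import numpy as np

def toy_meter_tracking(onset_strength, beats_per_bar=4, positions_per_beat=4):
    """Viterbi decoding of a toy bar-pointer model.

    The hidden state is the discretized position within the bar; it advances
    by exactly one position per frame (i.e. tempo is fixed, a simplification).
    Beat-position states expect high onset strength, off-beat states low.
    """
    n_pos = beats_per_bar * positions_per_beat
    n_frames = len(onset_strength)
    is_beat = np.zeros(n_pos, dtype=bool)
    is_beat[::positions_per_beat] = True
    # Observation likelihoods per (state, frame).
    obs = np.where(is_beat[:, None],
                   onset_strength[None, :],        # beat states like onsets
                   1.0 - onset_strength[None, :])  # off-beat states like gaps
    log_obs = np.log(np.clip(obs, 1e-6, None))
    # Viterbi with the deterministic cyclic transition i -> (i + 1) mod n_pos.
    delta = np.empty((n_pos, n_frames))
    delta[:, 0] = log_obs[:, 0]
    for t in range(1, n_frames):
        delta[:, t] = np.roll(delta[:, t - 1], 1) + log_obs[:, t]
    # Backtrack: every state's predecessor is (i - 1) mod n_pos.
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(delta[:, -1]))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = (path[t] - 1) % n_pos
    return path
```

On a synthetic onset signal with one onset every four frames, the decoded path advances cyclically through the bar and places beat states on the onset frames.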

Percussion in Indian art music uses onomatopoeic oral mnemonic syllables for the transmission of repertoire and technique, providing a language for percussion. We use these percussion syllables to define, represent and discover percussion patterns in audio recordings of percussion solos. We approach the problem of percussion pattern discovery using hidden Markov model based automatic transcription followed by an approximate string search using a data derived percussion pattern library. Preliminary experiments on Beijing opera percussion patterns, and on both tabla and mridangam solo recordings in Indian art music demonstrate the utility of percussion syllables, identifying further challenges to building practical discovery systems.
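The approximate string search step can be illustrated with a small sketch: once a solo has been transcribed into syllables, occurrences of a pattern are found by edit-distance substring matching. The function name, the syllable alphabet and the tolerance parameter below are illustrative assumptions, not the thesis implementation.

```python
def approx_pattern_search(transcription, pattern, max_dist=1):
    """Find approximate occurrences of a syllable pattern in a transcription.

    Classic dynamic-programming edit distance with free start positions
    (substring matching): row i, column j holds the minimum edit distance
    between pattern[:i] and the best-matching transcription substring
    ending at position j.
    """
    m, n = len(pattern), len(transcription)
    prev = [0] * (n + 1)                  # empty pattern matches anywhere for free
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pattern[i - 1] == transcription[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # syllable deleted in transcription
                          curr[j - 1] + 1,     # syllable inserted in transcription
                          prev[j - 1] + cost)  # match / substitution
        prev = curr
    # End positions where the pattern matches within max_dist edits.
    return [j for j in range(1, n + 1) if prev[j] <= max_dist]
```

With max_dist=0 this reduces to exact substring search; raising it tolerates transcription errors such as substituted, inserted or dropped syllables.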

The technologies resulting from the research in the thesis are a part of the complete set of tools being developed within the CompMusic project for a better understanding and organization of Indian art music, aimed at providing an enriched experience with listening and discovery of music. The data and tools should also be relevant for data-driven musicological studies and other MIR tasks that can benefit from automatic rhythm analysis.

Culture Aware MUsic Technologies (CAMUT)

Music Technology Group (Sep 2015 - Aug 2016)

CAMUT website

ERC Proof of Concept project aimed at prototyping and commercializing some of the technologies developed under the CompMusic project in India. For more details, please see the CAMUT website linked above.

MusicMuni Labs is a spinoff from CAMUT, founded by some CompMusic members and focused on music education.

Predictive Modeling of Music

Research assistant, Georgia Tech Center for Music Technology (Aug 2011 - May 2012)

Advisor: Dr. Parag Chordia, Georgia Tech Center for Music Technology

Previous research has shown that ensembles of variable length Markov models (VLMMs), known as Multiple Viewpoint Models (MVMs), can be used to predict the continuation of Western tonal melodies and outperform simpler, fixed-order Markov models. We explored the use of this technique for predicting melodic continuation in North Indian classical music, providing further evidence that MVMs are an effective means of modeling temporal structure in a wide variety of musical systems.
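The core VLMM idea can be sketched with a single viewpoint and longest-context backoff; an actual MVM combines predictions from several such models over different viewpoints (pitch, interval, duration, and so on). All names below are illustrative, not the published implementation.

```python
from collections import defaultdict

class VLMM:
    """Minimal variable-order Markov model with longest-context backoff.

    A simplification of the multiple-viewpoint ensembles described above:
    one viewpoint (the symbol sequence itself), with prediction backed off
    from the longest context that was seen in training.
    """
    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequence):
        # Count successors of every context up to max_order at each position.
        for i, sym in enumerate(sequence):
            for order in range(self.max_order + 1):
                if i - order < 0:
                    break
                ctx = tuple(sequence[i - order:i])
                self.counts[ctx][sym] += 1

    def predict(self, context):
        """Return the most likely next symbol, backing off to shorter contexts."""
        ctx = tuple(context[-self.max_order:])
        while True:
            if ctx in self.counts:
                dist = self.counts[ctx]
                return max(dist, key=dist.get)
            if not ctx:
                return None              # model has seen no training data
            ctx = ctx[1:]                # back off to a shorter context
```

A production model would return a probability distribution (with escape probabilities) rather than a single argmax, and merge distributions across viewpoints.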

For this work, we built bandishDB, a dataset of symbolic scores of North Indian classical vocal compositions. The up-to-date dataset can be obtained here as a zipped archive: bandishDB

Reverberant Speech Processing for Hubot Communication

M.E. (Signal Processing) Final Project at IISc, Bangalore (May 2009 - June 2010)

Advisor: Prof. T V Sreenivas, Professor, Dept. of E.C.E, IISc

Abstract: Human-machine interaction through speech is a natural and desired form of communication for humans. We explore the signal processing problems in such Human-Robot (Hubot) speech communication. The target acoustic environment is a closed room with moving sources and moving listeners. The most significant degradation in such hands-free, distant-microphone speech recorded in a natural room environment is reverberation, which is the focus of the present work. We consider the case of a single stationary source and multiple stationary distant microphones, and mainly focus on reverberant speech enhancement and Isolated Word Recognition (IWR) on a small confusable vocabulary.

We develop a general Iterative Multi-channel Wiener Filter (IMWF) framework for speech enhancement by extending the classical iterative Wiener filter. We show that multi-channel measurement provides an advantage for diffuse additive noise in speech, both when all channels have the same SNR and when the SNRs differ across channels. A clean-speech VQ codebook is effective for introducing intra-frame constraints and improves the convergence properties of both the iterative Wiener filter and the IMWF.
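The iterative Wiener filtering idea can be sketched in a toy single-channel form: each pass re-estimates the clean-speech power spectrum from the current enhanced signal and re-applies the Wiener gain. The sketch omits the LP modelling, the VQ codebook constraint and the multi-channel combination of the IMWF; the function name and the known-noise-PSD assumption are illustrative.

```python
import numpy as np

def iterative_wiener(noisy, noise_psd, n_iter=4):
    """Toy single-channel iterative Wiener filter.

    noise_psd is the (assumed known) noise power spectral density, defined
    here as |N(k)|^2 / len(noisy); a scalar suffices for white noise.
    """
    n = len(noisy)
    noisy_spec = np.fft.rfft(noisy)
    enhanced = noisy.copy()
    for _ in range(n_iter):
        # Re-estimate the clean-speech PSD from the current enhanced signal.
        speech_psd = np.abs(np.fft.rfft(enhanced)) ** 2 / n
        gain = speech_psd / (speech_psd + noise_psd)  # Wiener gain per bin
        enhanced = np.fft.irfft(gain * noisy_spec, n)
    return enhanced
```

In practice the filtering is done frame by frame with an LP-based speech model; the iteration above still shows the key property that the gain sharpens as the speech estimate improves.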

Motivated by the Human Auditory System (HAS), we explore frequency-warped signal processing for speech enhancement and recognition. Warped Linear Prediction (WLP) and the Multi-channel Warped Wiener Filter (MWWF) are studied for their performance, but experiments show that frequency warping provides only limited benefit. However, using Warped Linear Prediction Cepstral Coefficients (w-LPCC) as the feature vectors in IWR improves recognition performance for reverberant speech.

Reverberant speech has poorer intelligibility and degrades the performance of Automatic Speech Recognition (ASR) systems, with late reverberation being the main cause of this degradation. Reverberant speech enhancement is thus an important pre-processing step. A few effective algorithms for dereverberation of multi-channel speech based on linear prediction are studied. A novel Multi-channel Iterative Dereverberation (MID) algorithm based on the Codebook Constrained Iterative Multi-channel Wiener Filter (CCIMWF) is proposed. The late reverberation is estimated using Long-term Multi-step Linear Prediction (LTMLP), and this estimate is used in the CCIMWF framework through a doubly iterative formulation. A variant of CCIMWF called j-CCIMWF, which uses a clean-speech VQ codebook, is proposed for multi-channel dereverberation. The MID algorithm is applied to the dereverberation of simulated multi-channel reverberant speech. The signal-to-reverberation ratio (SRR) and log spectral distance (LSD) measures improve over the double iterations, showing that the algorithm suppresses the effect of late reverberation and improves speech quality and intelligibility. The algorithm also has good convergence properties across iterations.
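The LTMLP step can be illustrated with a toy single-channel sketch: the component of the signal that is linearly predictable from samples a whole delay D in the past is attributed to late reflections. The function name and parameters below are illustrative assumptions; the thesis uses a multi-channel formulation inside the CCIMWF framework.

```python
import numpy as np

def estimate_late_reverb(x, order=50, delay=400):
    """Estimate the late-reverberation component of x by multi-step LP.

    x[t] is predicted, in the least-squares sense, from the delayed samples
    x[t - delay], ..., x[t - delay - order + 1]; whatever is predictable at
    that lag is attributed to late reflections. Toy single-channel version.
    """
    n = len(x)
    t0 = delay + order - 1           # first sample with a full delayed context
    # Delayed data matrix: A[r, k] = x[t0 + r - delay - k].
    A = np.column_stack([x[t0 - delay - k : n - delay - k] for k in range(order)])
    b = x[t0:]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    late = np.zeros(n)
    late[t0:] = A @ w                # predicted (late-reverberant) component
    return late
```

Subtracting this estimate from the reverberant signal (in practice, subtracting its spectrum) suppresses the late reflections while leaving the direct path and early reflections largely intact.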

Features that are less sensitive to reverberation are also explored: MFCC, LPC, warped LPC (w-LPC) and w-LPCC are examined for robustness to reverberation. The features are tested in an IWR experiment on a 10-word confusable vocabulary with simultaneously recorded clean and reverberant channels. w-LPCC performs better than MFCC under train-test mismatch conditions between clean and reverberant speech. The confusability of phonemes due to smearing by reverberation is also studied; among the studied set of phonemes, the nasals and the glottal phonemes are the most affected by reverberation.

Detailed characterization of room reverberation, handling of moving sources and microphones, multiple sources, source localization in a reverberant environment, and handling of out-of-vocabulary (OOV) words in IWR remain to be addressed as future work.

Please contact me if you wish to read through my dissertation. The cover page of the report was made using Inkscape. It can be seen here.

Adaptive Active Noise Control

I did this research project as a part of the Summer Undergraduate Research Grant for Excellence (SURGE) 2007 programme, from May to July 2007, at IIT Kanpur.

The project targets noise-cancellation headsets. The implementation was done using the TMS320C6713 DSP Starter Kit.

Advisor: Dr. Laxmidhar Behera, Associate Professor, Dept. of Electrical Engineering, IIT Kanpur

Abstract: Active Noise Control (ANC) is a technique of acoustic noise reduction in which a secondary sound source produces “antinoise” that destructively interferes with the primary noise, thereby reducing it. ANC is useful in the low frequency range (below 600 Hz), where passive noise control devices like mufflers and silencers become bulky and ineffective. This project focuses on narrowband single-channel feedback ANC for headset applications, which is suitable for reducing low-frequency periodic noise such as that from an engine or a pump. The architecture uses a single error microphone and a single cancelling headphone speaker, with a controller that uses an adaptive algorithm to generate the antinoise. The secondary path effects due to the acoustic environment of the system are incorporated into the system, with the secondary path modeled using the Least Mean Squares (LMS) algorithm. The ANC is implemented using the feedback form of the Filtered-X LMS (FXLMS) algorithm. The algorithm is verified in MATLAB and the system is simulated in C. Real-time implementation is done using the TMS320C6713 DSP Starter Kit (DSK). Simulation results using a two-tone noise signal show a noise reduction of 16.19 dB, while the real-time setup on the DSK shows a noise reduction of 8.57 dB under the same test conditions. The inferior performance of the practical system is attributed to the acoustic environment of the system, which is complex to model. An advanced adaptive algorithm, the Kalman filter, was also implemented for ANC, but the implementation could not run in real time due to DSK hardware limitations.
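A toy floating-point simulation of feedback FXLMS conveys the structure of the controller: the reference signal is regenerated from the error and the filtered controller output, then used in a filtered-x LMS update. Everything below (names, path lengths, step size) is an illustrative assumption, not the fixed-point DSK implementation.

```python
import numpy as np

def feedback_fxlms(d, s_path, s_hat, n_taps=16, mu=0.01):
    """Feedback FXLMS sketch for single-channel narrowband ANC.

    d      : primary noise at the error microphone
    s_path : true secondary path (speaker -> error mic), used only to simulate
    s_hat  : the controller's estimate of the secondary path
    Returns the error-microphone signal over time.
    """
    w = np.zeros(n_taps)               # adaptive controller taps
    x_buf = np.zeros(n_taps)           # regenerated reference history
    fx_buf = np.zeros(n_taps)          # filtered-reference history
    xh_buf = np.zeros(len(s_hat))      # reference history for filtering by s_hat
    y_buf = np.zeros(len(s_path))      # controller output history (true path)
    yh_buf = np.zeros(len(s_hat))      # controller output history (estimated path)
    e = np.zeros(len(d))
    for i in range(len(d)):
        y = w @ x_buf                  # antinoise sample from past reference
        y_buf = np.roll(y_buf, 1)
        y_buf[0] = y
        yh_buf = np.roll(yh_buf, 1)
        yh_buf[0] = y
        e[i] = d[i] - s_path @ y_buf   # residual at the error microphone
        w += mu * e[i] * fx_buf        # LMS update with the filtered reference
        d_hat = e[i] + s_hat @ yh_buf  # regenerate an estimate of the noise
        xh_buf = np.roll(xh_buf, 1)
        xh_buf[0] = d_hat
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = d_hat
        fx_buf = np.roll(fx_buf, 1)
        fx_buf[0] = s_hat @ xh_buf     # reference filtered through s_hat
    return e
```

With a perfect secondary-path estimate and tonal noise, the residual error converges towards zero; on the DSK, fixed-point arithmetic and the real acoustic path make this considerably harder, which is consistent with the gap between the simulated and practical results above.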

You can view the full project report here.

You can view the project poster here.

More details about SURGE programme of IIT Kanpur can be found here.