
Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping

International Journal of Computer Applications
© 2012 by IJCA Journal
Volume 40 - Number 3
Year of Publication: 2012
Authors:
Santosh V. Chapaneri
10.5120/5022-7167

Santosh V. Chapaneri. Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping. International Journal of Computer Applications 40(3):6-12, February 2012. Full text available.

BibTeX

@article{key:article,
	author = {Santosh V. Chapaneri},
	title = {Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping},
	journal = {International Journal of Computer Applications},
	year = {2012},
	volume = {40},
	number = {3},
	pages = {6-12},
	month = {February},
	note = {Full text available}
}

Abstract

In this paper, we propose novel techniques for feature extraction based on MFCC and for feature recognition using the dynamic time warping (DTW) algorithm, applied to speaker-independent isolated digit recognition. The proposed Weighted MFCC (WMFCC) keeps the computational overhead of the recognition stage low because it uses only 13 weighted MFCC coefficients instead of the conventional 39 coefficients, which include the delta and double-delta features. To capture the trends or patterns that a feature sequence presents during alignment, we compute local and global features using the Improved Features for DTW (IFDTW) algorithm rather than using the raw feature values or their estimated derivatives. Experiments on the TI-Digits corpus demonstrate the effectiveness of the proposed techniques, which achieve a recognition accuracy of 98.13%.
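
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the WMFCC formula, so the weighted sum `static + p * delta + q * double_delta` and the weights `p` and `q` below are assumptions, as are the function names and the simple central-difference deltas. The matcher is plain DTW with a Euclidean local cost; the local and global features of IFDTW are not reproduced here.

```python
import math


def deltas(frames):
    """Central-difference derivative of a feature sequence.

    Assumption: edges are handled by repeating the first/last frame.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def weighted_features(static, p=0.5, q=0.25):
    """Fold delta and double-delta into the static coefficients.

    Hypothetical weighting (not taken from the abstract): each output
    frame keeps the dimension of the input frame, e.g. 13 instead of 39.
    """
    d1 = deltas(static)
    d2 = deltas(d1)
    return [
        [c + p * a + q * b for c, a, b in zip(cs, d1s, d2s)]
        for cs, d1s, d2s in zip(static, d1, d2)
    ]


def euclidean(u, v):
    """Euclidean distance between two equal-length frames."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(x, y):
    """Plain DTW with steps (1,0), (0,1), (1,1) and no path constraints."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Two sequences that differ only by one repeated frame align at zero cost.
a = [[0.0], [1.0], [2.0], [1.0]]
b = [[0.0], [1.0], [1.0], [2.0], [1.0]]
print(dtw_distance(a, b))  # 0.0
```

In an isolated-digit recognizer built this way, a test utterance would be assigned the label of the reference template with the smallest `dtw_distance`. Because each frame has 13 values rather than 39, every local-cost evaluation inside the DTW loop touches one third as many coefficients.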
