Research Journal of Applied Sciences

Year: 2021
Volume: 16
Issue: 2
Page No. 75 - 84

A Survey: Text Independent Automatic Speech Segmentation Techniques

Authors: Ihsan Al-Hassani, Oumayma Al-Dakkak and Abdlnaser Assami

References

Adell, J. and A. Bonafonte, 2004. Towards Phone Segmentation for Concatenative Speech Synthesis. Proceedings of the 5th ISCA Workshop on Speech Synthesis, June 14-16, 2004, International Speech Communication Association, Pittsburgh, Pennsylvania, pp: 139-144.

Almpanidis, G. and C. Kotropoulos, 2008. Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion. Speech Commun., 50: 38-55.

Almpanidis, G., M. Kotti and C. Kotropoulos, 2009. Robust detection of phone boundaries using model selection criteria with few observations. IEEE. Trans. Audio Speech Lang. Process., 17: 287-298.

Brugnara, F., D. Falavigna and M. Omologo, 1993. Automatic segmentation and labeling of speech based on hidden Markov models. Speech Commun., 12: 357-370.

Chappell, D.T. and J.H. Hansen, 2002. A comparison of spectral smoothing methods for segment concatenation based speech synthesis. Speech Commun., 36: 343-373.

Chen, L., X. Mao and H. Yan, 2016. Text-independent phoneme segmentation combining EGG and speech data. IEEE/ACM. Trans. Audio Speech Lang. Process., 24: 1029-1037.

Yeh, C.K., J. Chen, C. Yu and D. Yu, 2019. Unsupervised speech recognition via segmental empirical output distribution matching. Proceedings of the International Conference on Learning Representations, (ICLR), May 6-9, 2019, New Orleans, Louisiana, pp: 1-14.

Delacourt, P. and C.J. Wellekens, 2000. DISTBIC: A speaker-based segmentation for audio data indexing. Speech Commun., 32: 111-126.

Dinler, O.B. and N. Aydin, 2020. An optimal feature parameter set based on gated recurrent unit recurrent neural networks for speech segment detection. Appl. Sci., Vol. 10, 10.3390/app10041273

Dusan, S. and L. Rabiner, 2006. On the relation between maximum spectral transition positions and phone boundaries. Proceedings of the 9th International Conference on Spoken Language Processing, September 17-21, 2006, Interspeech, Pittsburgh, Pennsylvania, USA., pp: 645-648.

Esposito, A. and G. Aversano, 2004. Text Independent Methods for Speech Segmentation. In: Nonlinear Speech Modeling and Applications, Chollet, G., A. Esposito, M. Faundez-Zanuy and M. Marinaro (Eds.)., Springer, Berlin, Germany, pp: 261-290.

Franke, J., M. Mueller, F. Hamlaoui, S. Stueker and A. Waibel, 2016. Phoneme boundary detection using deep bidirectional LSTMs. Proceedings of the Speech Communication: 12. ITG Symposium, October 5-7, 2016, VDE, Paderborn, Germany, pp: 1-5.

Frihia, H. and H. Bahi, 2017. HMM/SVM segmentation and labelling of Arabic speech for speech recognition applications. Int. J. Speech Technol., 20: 563-573.

Glass, J.R., 2003. A probabilistic framework for segment-based speech recognition. Comput. Speech Lang., 17: 137-152.

Hemert, J.P.V., 1991. Automatic segmentation of speech. IEEE. Trans. Signal Process., 39: 1008-1012.

Hoang, D.T. and H.C. Wang, 2015. Blind phone segmentation based on spectral change detection using Legendre polynomial approximation. J. Acoust. Soc. Am., 137: 797-805.

Hosom, J.P., 2009. Speaker-independent phoneme alignment using transition-dependent states. Speech Commun., 51: 352-368.

Javed, M., M.M.A. Baig and S.A. Qazi, 2020. Unsupervised phonetic segmentation of classical Arabic speech using forward and inverse characteristics of the vocal tract. Arab. J. Sci. Eng., 45: 1581-1597.

Kamper, H., K. Livescu and S. Goldwater, 2017. An embedded segmental k-means model for unsupervised segmentation and clustering of speech. Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 16-20, 2017, IEEE, Okinawa, Japan, pp: 719-726.

Khanagha, V., K. Daoudi, O. Pont and H. Yahia, 2014. Phonetic segmentation of speech signal using local singularity analysis. Digital Signal Process., 35: 86-94.

Kreuk, F., J. Keshet and Y. Adi, 2020. Self-supervised contrastive learning for unsupervised phoneme segmentation. Proc. Interspeech, 1: 3700-3704.

Kreuk, F., Y. Sheena, J. Keshet and Y. Adi, 2020. Phoneme boundary detection using learnable segmental features. Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 4-8, 2020, IEEE, Barcelona, Spain, pp: 8089-8093.

Ljolje, A., J. Hirschberg and J.P.H.V. Santen, 1997. Automatic Speech Segmentation for Concatenative Inventory Selection. In: Progress in Speech Synthesis, Santen, J.P.H.V., J.P. Olive, R.W. Sproat and J. Hirschberg (Eds.)., Springer, Berlin, Germany, pp: 305-311.

Lu, L., L. Kong, C. Dyer, N.A. Smith and S. Renals, 2016. Segmental recurrent neural networks for end-to-end speech recognition. Comput. Lang., Vol. 1,

Mporas, I., T. Ganchev and N. Fakotakis, 2010. Speech segmentation using regression fusion of boundary predictions. Comput. Speech Lang., 24: 273-288.

Pellom, B.L. and J.H. Hansen, 1998. Automatic segmentation of speech recorded in unknown noisy channel characteristics. Speech Commun., 25: 97-116.

Qiao, Y. and N. Minematsu, 2008. Metric learning for unsupervised phoneme segmentation. Proceedings of the 9th Annual Conference of the International Speech Communication Association, September 22-26, 2008, Interspeech, Brisbane, Australia, pp: 1060-1063.

Qiao, Y., D. Luo and N. Minematsu, 2013. Unsupervised optimal phoneme segmentation: Theory and experimental evaluation. IET Signal Process., 7: 577-586.

Qiao, Y., N. Shimomura and N. Minematsu, 2008. Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons. Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing, March 31-April 4, 2008, IEEE, Las Vegas, Nevada, USA., pp: 3989-3992.

Ramteke, P.B. and S.G. Koolagudi, 2019. Phoneme boundary detection from speech: A rule based approach. Speech Commun., 107: 1-17.

Rasanen, O.J., U.K. Laine and T. Altosaar, 2009. An improved speech segmentation quality measure: The R-value. Proceedings of the 10th Annual Conference of the International Speech Communication Association, September 6-10, 2009, Interspeech, Brighton, UK, pp: 1851-1854.

Sahu, P.K., A. Biswas, A. Bhowmick and M. Chandra, 2014. Auditory ERB like admissible wavelet packet features for TIMIT phoneme recognition. Eng. Sci. Technol. Int. J., 17: 145-151.

Sarma, B.D. and S.M. Prasanna, 2018. Acoustic-phonetic analysis for speech recognition: A review. IETE. Tech. Rev., 35: 305-327.

Teng, P., X. Liu and Y. Jia, 2013. Text-Independent Phoneme Segmentation Via Learning Critical Acoustic Change Points. In: Intelligent Science and Big Data Engineering, Sun, C., F. Fang, Z. Zhou, W. Yang and Z. Liu (Eds.)., Springer, Berlin, Germany, pp: 54-61.

Toledano, D.T., L.A.H. Gomez and L.V. Grande, 2003. Automatic phonetic segmentation. IEEE. Trans. Speech Audio Process., 11: 617-625.

Wang, H., T. Lee, C.C. Leung, B. Ma and H. Li, 2015. Acoustic segment modeling with spectral clustering methods. IEEE/ACM. Trans. Audio Speech Lang. Process., 23: 264-277.

Ziolko, B., S. Manandhar, R.C. Wilson and M. Ziolko, 2006. Wavelet method of speech segmentation. Proceedings of the 2006 14th European Signal Processing Conference, September 4-8, 2006, IEEE, Florence, Italy, pp: 1-5.

Ziolko, B., S. Manandhar, R.C. Wilson and M. Ziolko, 2011. Phoneme segmentation based on wavelet spectra analysis. Arch. Acoust., 36: 29-47.

Ziolko, M., J. Galka, B. Ziolko and T. Drwiega, 2010. Perceptual wavelet decomposition for speech segmentation. Proceedings of the 11th Annual Conference of the International Speech Communication Association, September 26-30, 2010, ISCA, Makuhari, Chiba, Japan, pp: 2234-2237.
