Elements of Methodology of Precision Phonetic Analysis of Oral Phonograms

Authors

  • O. M. Danylchuk, Vasyl’ Stus Donetsk National University, Vinnytsia
  • V. V. Kovtun, Vinnytsia National Technical University
  • O. D. Nykytenko, Vinnytsia National Technical University
  • Yu. Yu. Nestiuk, Vinnytsia National Technical University
  • V. V. Prysiazhniuk, Vinnytsia National Technical University

DOI:

https://doi.org/10.31649/1997-9266-2022-162-3-36-51

Keywords:

computational linguistics, classification of language units, automated transcription, phonetic analysis of speech

Abstract

The cornerstone of modern linguistics, the process of spoken and written interpersonal communication, cannot be studied at the scale of the twenty-first-century infosphere without a well-founded and purposeful use of information technology drawn from other fields of knowledge, computer science in particular. The resulting, relatively young discipline of computational linguistics aims at the automatic analysis of natural languages in every form of their realization. Topical problems actively studied within computational linguistics include the automated compilation and linguistic processing of language corpora, the automated classification and abstracting of documents, the construction of accurate linguistic models of natural languages, and the extraction of factual information from informal linguistic data. An effective, strictly formalized methodology for computational phonetic analysis of linguistic information, and of speech in particular, could improve the results obtained for each of these research problems. The article develops this thesis, which supports the relevance of the scientific and applied results it presents. Accordingly, the paper presents elements of a methodology for precision phonetic analysis of phonograms of oral speech that takes the phenomenon of phonetic fusion into account. The mathematical apparatus of the proposed methods rests on pattern recognition theory, information theory, and the acoustic theory of speech production. On this foundation, the multicriteria problem of recognizing the language units of human speech is formalized analytically. Three methods result: a method for reliable clustering of speakers' personal phonetic alphabets; a method for detecting speech units that may have been classified unreliably and for correcting the output of automated transcription of speech signals accordingly; and a method for estimating how the propagation medium of the speech signals under study affects the transcription result.

Author Biographies

O. M. Danylchuk, Vasyl’ Stus Donetsk National University, Vinnytsia

Cand. Sc. (Educ.), Associate Professor, Associate Professor of the Chair of Applied Mathematics

V. V. Kovtun, Vinnytsia National Technical University

Dr. Sc. (Eng.), Associate Professor, Professor of the Chair of Computer Control Systems

O. D. Nykytenko, Vinnytsia National Technical University

Cand. Sc. (Eng.), Associate Professor, Associate Professor of the Chair of Computer Control Systems

Yu. Yu. Nestiuk, Vinnytsia National Technical University

Student of the Department of Intelligent Information Technology and Automation

V. V. Prysiazhniuk, Vinnytsia National Technical University

Senior Lecturer of the Chair of Metrology and Industrial Automation


Published

2022-06-30

How to Cite

[1]
O. M. Danylchuk, V. V. Kovtun, O. D. Nykytenko, Y. Y. Nestiuk, and V. V. Prysiazhniuk, “Elements of Methodology of Precision Phonetic Analysis of Oral Phonograms”, Вісник ВПІ, no. 3, pp. 36–51, Jun. 2022.

Section

Information technologies and computer sciences
