Elements of Methodology of Precision Phonetic Analysis of Oral Phonograms
DOI:
https://doi.org/10.31649/1997-9266-2022-162-3-36-51
Keywords:
computer linguistics, classification of language units, automated transcription, phonetic analysis of speech
Abstract
Speech and textual interpersonal communication is a cornerstone of modern linguistics. Given the size of the twenty-first-century infosphere, it cannot be studied without the sound and purposeful use of information technology from other fields of knowledge, including computer science. The resulting relatively young discipline, computational linguistics, aims to analyze natural languages automatically across the full range of their realizations. Topical problems actively studied in computational linguistics include automating the compilation and linguistic processing of language corpora, automated classification and abstracting of documents, building accurate linguistic models of natural languages, and extracting factual information from informal linguistic data. An effective, strictly formalized methodology for computational phonetic analysis of linguistic information, especially speech, could drive better results on all of these problems, which is what makes the scientific and applied results presented in this article relevant.
Accordingly, the paper presents elements of a methodology for precision phonetic analysis of phonograms of oral speech that takes the phenomenon of phonetic fusion into account. The mathematical apparatus of the proposed methods rests on pattern recognition theory, information theory, and the acoustic theory of speech production. On this foundation, the multicriteria problem of recognizing the language units of human speech is formalized analytically. As a result, three methods are presented: a method for reliable clustering of speakers' personal phonetic alphabets; a method for detecting potentially unreliably classified speech units and correcting the results of automated transcription of speech signals; and a method for estimating how the propagation medium of the studied speech signals affects the transcription result.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.