Detection of Voice Activity Based on the Angle of the Slope of the Approximating Line of the Eigenvalues
DOI: https://doi.org/10.31649/1997-9266-2023-169-4-68-77
Keywords: voice activity detector, speech signal, eigenvalues, noise reduction
Abstract
The article discusses a method for detecting voice activity that aims to make noise reduction methods more effective at low signal-to-noise ratios. Acoustic disturbances limit the applicability of voice activity detection (VAD) and degrade its performance. The study focuses on VAD methods that serve noise reduction systems by supporting noise estimation in noisy speech signals. The high efficiency of subspace-based noise reduction methods built on the Karhunen–Loève transform has prompted the search for a simple and reliable VAD suited to them. The proposed method requires no additional transformations of the noisy speech and facilitates voice activity detection within subspace-based noise reduction methods.
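Subspace (Karhunen–Loève) methods operate on the eigenvalue spectrum of a covariance matrix estimated from each frame. The sketch below shows one way to obtain that spectrum. It embeds the frame into a trajectory (Hankel) matrix of hypothetical order `order`, estimates a covariance matrix from the lagged windows, and returns its eigenvalues in descending order. The embedding and normalisation follow singular spectrum analysis and are assumptions, not details taken from the article.

```python
import numpy as np


def frame_eigenvalues(frame, order: int) -> np.ndarray:
    """Eigenvalues (descending) of an order x order covariance estimate of one frame.

    Assumption: the covariance is formed from the frame's trajectory (Hankel)
    matrix, as in singular spectrum analysis; the article's estimator may differ.
    """
    x = np.asarray(frame, dtype=float)
    if not 1 <= order <= x.size:
        raise ValueError("order must be between 1 and the frame length")
    k = x.size - order + 1  # number of lagged windows
    # Columns are the lagged windows x[i : i + order] -> shape (order, k).
    traj = np.lib.stride_tricks.sliding_window_view(x, order).T
    cov = traj @ traj.T / k
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues;
    # the corresponding eigenvectors would form the KLT basis.
    return np.linalg.eigvalsh(cov)[::-1]
```

For white noise, the eigenvalues cluster around the noise variance. A pure sinusoid puts all of its energy into two eigenvalues. An eigenvalue-based VAD can exploit this contrast between a flat spectrum and a steeply falling one.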
The proposed VAD uses the slope angle of the line approximating the adjusted eigenvalues as the feature for classifying speech frames. The approach relies on an adjusted eigenvalue spectrum: subtracting the noise variance from the eigenvalues of the input-data covariance matrix reduces the noise energy in the observation. An improved estimate of the noise variance accounts for the additive noise components present in the signal subspace. The study proposes an adaptive threshold that depends on the input signal-to-noise ratio as the decision criterion. The performance of the proposed VAD in colored noise was compared with that of the G.729 codec VAD. The VAD models were implemented in MATLAB and evaluated with objective measures of erroneous decisions under noisy conditions. The simulation results show that the proposed method is effective at low signal-to-noise ratios (down to 0 dB): it increases speech detection accuracy and reduces the number of erroneous VAD decisions. The results can be used to improve noise suppression systems.
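The feature and decision steps described above can be sketched as follows. The sketch assumes the eigenvalues of a frame's covariance matrix are already available. `noise_variance` uses the classical subspace estimate, the mean of the eigenvalues outside an assumed signal subspace of dimension `signal_dim`. It is not the improved estimator proposed in the article. The following are hypothetical placeholders chosen only for illustration:

- clipping negative adjusted eigenvalues;
- normalising the adjusted eigenvalues by the noise variance;
- the linear form and the constants (`base_deg`, `deg_per_db`) of the SNR-dependent threshold in `is_speech`.

```python
import numpy as np


def noise_variance(eigvals, signal_dim: int) -> float:
    """Classical subspace estimate: mean of the eigenvalues outside an assumed
    signal subspace of dimension signal_dim (NOT the article's improved estimator)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    if not 0 <= signal_dim < lam.size:
        raise ValueError("signal_dim must be in [0, number of eigenvalues)")
    return float(lam[signal_dim:].mean())


def slope_angle_deg(eigvals, noise_var: float) -> float:
    """Angle, in degrees, of the least-squares line fitted to the adjusted
    eigenvalues (lambda_i - noise_var) / noise_var against their index i = 1..p."""
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    # Hypothetical choices: clip negative values, express in units of noise_var.
    adjusted = np.maximum(lam - noise_var, 0.0) / noise_var
    idx = np.arange(1, lam.size + 1, dtype=float)
    slope, _intercept = np.polyfit(idx, adjusted, 1)
    return float(np.degrees(np.arctan(slope)))


def is_speech(angle_deg: float, snr_db: float,
              base_deg: float = 20.0, deg_per_db: float = 1.0) -> bool:
    """Hypothetical adaptive rule: the threshold on |angle| grows linearly
    with the input SNR (in dB) and is kept inside (0, 90) degrees."""
    threshold = float(np.clip(base_deg + deg_per_db * snr_db, 1.0, 85.0))
    return abs(angle_deg) > threshold
```

Consider eigenvalues `[50, 20, 1, 1, 1, 1, 1, 1, 1, 1]` with `signal_dim=2`. They give a noise variance of 1 and a slope angle of about -74 degrees, which exceeds the 20-degree threshold obtained at 0 dB. A flat spectrum yields an angle of 0 and is classified as noise. Dividing by the noise variance makes the angle invariant to a common gain applied to the whole frame. The article may scale the eigenvalues differently.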
Full text: PDF (Ukrainian)
License
This work is licensed under a Creative Commons Attribution 4.0 International License.