Pitch Estimation for Automated Speaker Recognition System for Critical Use

Authors

  • V. V. Kovtun, Vinnytsia National Technical University

Keywords:

automated speaker recognition system for critical use, pitch, deep neural network, recurrent neural network, factorial hidden Markov model

Abstract

The article proposes a method for pitch trend estimation that, unlike existing methods, uses a factorial hidden Markov model (FHMM) optimized with the junction tree algorithm. The model generalizes information from pitch state detectors based on deep and recurrent neural networks. This makes it possible to predict the pitch trend accurately using long-term information from packets of speech frames, to describe the dynamics of the pitch in the time domain, and to reduce the influence of noise on the quality of the pitch estimates. Methods for estimating pitch states based on deep and recurrent neural networks, and a method for estimating the pitch trend based on the FHMM, are developed. A study was carried out to optimize the parameters of the proposed methods for use as part of the automated speaker recognition system for critical use (ASRSCU). In particular, the results make it possible to recommend power-normalized cepstral coefficients as the feature basis for pitch estimation by the proposed methods, frame packets 10 frames long, 1024 neurons in the hidden layers of the neural networks that implement the proposed methods, and 68 states to describe the pitch. Further experiments examined how the quality of speaker recognition by the ASRSCU depends on the signal-to-noise ratio (SNR) of the input speech material and on the pitch estimates produced by the developed methods, with parameters set according to the optimization study. They showed that at all SNR levels the most accurate pitch estimate is provided by the FHMM method, with which the ASRSCU recognizes speakers correctly with a probability of 96–99 % on the selected test sample.
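As a rough illustration of how discrete pitch states and a Markov model can smooth noisy frame-level estimates, the sketch below quantizes pitch into 68 states and decodes a state path from per-frame detector scores. It is a deliberately simplified stand-in: it uses a single-chain HMM with Viterbi decoding, not the factorial HMM with junction tree inference described in the abstract. The function names, the 60–400 Hz range, the log-spaced bins, the reserved unvoiced state, and the distance-based transition penalty are all assumptions made for illustration and are not taken from the article.

```python
import math

N_STATES = 68  # number of pitch states recommended in the abstract


def hz_to_state(f0_hz, f_min=60.0, f_max=400.0, n_states=N_STATES):
    """Map a pitch value in Hz to a discrete state index.

    Assumptions (not from the article): state 0 marks unvoiced frames,
    and states 1..n_states-1 are log-spaced bins between f_min and f_max.
    """
    if f0_hz is None or f0_hz <= 0:
        return 0
    f0 = min(max(f0_hz, f_min), f_max)  # clip to the assumed pitch range
    n_bins = n_states - 1
    pos = math.log(f0 / f_min) / math.log(f_max / f_min)  # position in [0, 1]
    return 1 + min(int(pos * n_bins), n_bins - 1)


def viterbi_trend(log_post, jump_penalty=1.0):
    """Return the best-scoring state path for per-frame log-scores.

    log_post: list of T frames, each a list of S log-scores, for example
    the log-outputs of a neural pitch state detector.
    The transition score from state i to state j is the hypothetical
    -jump_penalty * |i - j|, which discourages large jumps between frames.
    """
    n_states = len(log_post[0])
    score = list(log_post[0])
    back = []  # back[t-1][j] = best predecessor of state j at frame t
    for frame in log_post[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] - jump_penalty * abs(i - j))
            new_score.append(score[best_i]
                             - jump_penalty * abs(best_i - j) + frame[j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state to recover the whole path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a nonzero `jump_penalty`, a single frame whose detector scores briefly favor a neighboring state is overridden by its context, which mirrors in a very reduced form the idea of using longer-term information to suppress noise in the pitch estimates.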

Author Biography

V. V. Kovtun, Vinnytsia National Technical University

Cand. Sc. (Eng.), Assistant Professor, Department of Computer Control Systems

Published

2018-10-18

How to Cite

[1]
V. V. Kovtun, “Pitch Estimation for Automated Speaker Recognition System for Critical Use”, Вісник ВПІ, no. 4, pp. 61–73, Oct. 2018.

Section

Information technologies and computer sciences