For this experiment, we analyzed real-life audio recordings of the phrases “Khule Dao” and “Bondho Koro”. We then developed an algorithm that automatically identifies which of the two phrases a test recording contains.
Research Trend:
Two recent studies on Bangla voice recognition are discussed briefly here.
A large body of speech recognition research and results exists for many languages around the world. For Bangla, early researchers in this field had only qualified success, though the situation has been changing in recent years. The work in [1] aims to develop a neural-network-based connected digit recognition system for Bangla. First, a Bangla digit corpus comprising male and female speakers was developed. Speech was recorded in connected fashion and words were extracted through automatic segmentation. MFCC features of the segmented words were then calculated and fed as input to a back-propagation neural network (BPNN), which was trained with the BPNN learning algorithm. Training time, number of hidden layers, error threshold, and number of epochs were considered during training to reach the best possible recognition accuracy. The system was implemented using object-oriented programming, and the authors describe the achieved recognition accuracy as satisfactory and consistent. The network was tested in three different setups, and the best recognition accuracy achieved on the digit dataset was 98.46%. [1]
Voice recognition is a biometric technology used to recognize a particular individual's voice. The speech waveform of a particular voice forms the basis for identifying the speaker. Voice identification can be used in application areas such as telephone banking, telephone shopping, access to database information, and voice mail. One powerful application is security, where a person supplies his or her voice for authentication. Each voice has unique characteristics called features, and the process of extracting these features from an individual voice is called feature extraction. The extracted features are compared with voices already saved in a database to find a match. [2]
Extraction Techniques:
MATLAB functions used: audioread, num2str, strcat, fft, abs, max, find, length.
Proposed Feature:
1. Our technique is a simple speech recognition system based on the fast Fourier transform (FFT).
2. First, each audio file is read using the audioread function of MATLAB:
for i = 1:116
    s1 = 'Z:\EEE 309\Open_Ended\Train_Data\Train_Open\OP-';
    s2 = num2str(i);
    s3 = '.mp3';
    file1 = strcat(s1, s2, s3);        % build the file name OP-<i>.mp3
    if exist(file1, 'file') == 2       % 2 means the name is an existing file
        [y, Fs] = audioread(file1);    % y: samples, Fs: sample rate in Hz
3. The FFT is then applied to each signal inside the loop:
NFFT=length(y); % finding the length of y
x=fft(y, NFFT);
4. The maximum amplitude is found using the max function
x1=abs(x);
F=((0:1/NFFT:1-1/NFFT)*Fs);
max_amp=max(x1);
5. The frequency corresponding to the maximum amplitude is found using the find function:
b=find(x1==max_amp(1));
F_KD_max(i)=F(b(1));
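6. The peak frequencies are summed and averaged separately for “Khule Dao” and “Bondho Koro”.
7. Only peak frequencies between 50 Hz and 600 Hz are included in the average (shown here for “Khule Dao”):
if (F_KD_max(i) > 50 && F_KD_max(i) < 600)
    c1 = c1 + 1;                   % count of accepted recordings
    sum1 = sum1 + F_KD_max(i);     % running sum of accepted peak frequencies
end
average_max_frequency_of_Khule_Dao = sum1/c1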
8. We propose the following decision rule. Let F_max be the test recording's peak frequency (the frequency at maximum magnitude). If the absolute difference between F_max and the average peak frequency of “Khule Dao” is no greater than the absolute difference between F_max and the average peak frequency of “Bondho Koro”, the test recording is predicted as “Khule Dao”; otherwise it is predicted as “Bondho Koro”.
Diff_khule_dao_max(i)= abs(F_max(i)-average_max_frequency_of_Khule_Dao);
Diff_bondo_koro_max(i)= abs(F_max(i)-average_max_frequency_of_bondo_koro);
if (Diff_bondo_koro_max(i) >= Diff_khule_dao_max(i))
    disp('Khule Dao')
    No_of_khule_dao = No_of_khule_dao + 1;
else
    disp('Bondho Koro')
    No_of_bondho_koro = No_of_bondho_koro + 1;
end
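The steps above can be sketched end to end on a synthetic signal. This is a minimal illustration under stated assumptions, not the project's actual script: the sample rate, the two tone frequencies, and the class averages avg_khule_dao and avg_bondho_koro are placeholder values chosen for the example, not measured results. Unlike the fragments above, the sketch searches only the first half of the spectrum, since the FFT of a real signal is mirrored in the second half.

```matlab
% Illustrative sketch: a synthetic tone stands in for a test recording.
Fs = 8000;                        % assumed sample rate in Hz
t  = (0:Fs-1)/Fs;                 % one second of sample times

% Hypothetical class averages (placeholders, not measured values)
avg_khule_dao   = 200;            % Hz
avg_bondho_koro = 400;            % Hz

% Synthetic "test recording": a 230 Hz tone plus a weaker 900 Hz tone
y = sin(2*pi*230*t) + 0.3*sin(2*pi*900*t);

% Peak frequency from the FFT magnitude (steps 3 to 5)
NFFT = length(y);
x1   = abs(fft(y, NFFT));
F    = (0:NFFT-1)*Fs/NFFT;        % frequency axis in Hz
half = 1:floor(NFFT/2);           % non-mirrored half of the spectrum
[~, b] = max(x1(half));           % index of the largest magnitude
F_max  = F(b);

% Nearest-average decision rule (step 8); a tie goes to 'Khule Dao'
if abs(F_max - avg_bondho_koro) >= abs(F_max - avg_khule_dao)
    label = 'Khule Dao';
else
    label = 'Bondho Koro';
end
fprintf('Peak at %.1f Hz -> %s\n', F_max, label);
```

With these placeholder values the peak falls at 230 Hz, which is closer to 200 Hz than to 400 Hz, so the sketch reports 'Khule Dao'. With real recordings, the two averages would come from the training loop in steps 6 and 7.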