2016 International Conference on Frontiers of Information Technology (FIT) (2016)
Islamabad, Pakistan
Dec. 19, 2016 to Dec. 21, 2016
ISBN: 978-1-5090-5300-1
pp: 303-308
This paper investigates the separation of the human voice from a mixture of voice and various musical instruments. The voice may be the singing voice in a song, or speech in a news broadcast that contains background music. The final output of this work is a file containing only the vocals. The approach operates on stereo audio and processes the signal in the time-frequency domain. In this blind source separation method, the input stereo audio file is split into frames, each frame is windowed, and finally the short-time Fourier transform (STFT) is applied. For de-mixing, the signal is masked using independent layers of time-frequency filters (TFF), with a mask defined for each layer according to its filtering technique. One filtering technique is the pan TFF and the other is the inter-channel phase difference TFF. Filtering selects the STFT coefficients estimated to belong to the vocals and sets the rest to zero. After coefficient selection, the signal is reconstructed by the overlap-add (OLA) method to obtain the final output signal containing only vocals.
Filtering, Time-frequency analysis, Interference, Human voice, Blind source separation, Spectrogram, Filtering algorithms, Pan, Short time Fourier transform, Masking, Time frequency filters, Overlap-add, Stereophonic audio
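The pipeline outlined in the abstract (framing, windowing, STFT, two binary mask layers, overlap-add) can be illustrated with a minimal NumPy sketch. The abstract does not give the mask definitions, so everything specific here is an assumption made for illustration: the pan index `(|R| - |L|) / (|R| + |L|)`, the thresholds `pan_width` and `ipd_width`, the periodic Hann window with 50% overlap, the idea that the vocals are panned to the centre, and the function names `stft`, `istft`, and `extract_center`. It is not the authors' implementation.

```python
import numpy as np


def stft(x, frame_len=1024):
    """Frame a mono signal, apply a periodic Hann window, and FFT each frame."""
    hop = frame_len // 2  # 50% overlap: shifted Hann windows sum to 1
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    # Zero-pad so every input sample is covered by exactly two frames.
    padded = np.pad(x, (hop, frame_len))
    n_frames = 1 + (len(padded) - frame_len) // hop
    frames = np.stack([padded[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)


def istft(spec, length, frame_len=1024):
    """Invert each frame and reconstruct the signal by overlap-add (OLA)."""
    hop = frame_len // 2
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out[hop:hop + length]  # drop the padding added in stft()


def extract_center(left, right, pan_width=0.2, ipd_width=0.3, frame_len=1024):
    """Keep STFT coefficients that look centre-panned and in phase.

    pan_width and ipd_width (radians) are illustrative thresholds,
    not values taken from the paper.
    """
    L, R = stft(left, frame_len), stft(right, frame_len)
    mag_l, mag_r = np.abs(L), np.abs(R)

    # Layer 1 (assumed pan TFF): pan index in [-1, 1], 0 means centre.
    pan = (mag_r - mag_l) / (mag_r + mag_l + 1e-12)
    pan_mask = np.abs(pan) <= pan_width

    # Layer 2 (assumed inter-channel phase difference TFF).
    ipd = np.angle(L * np.conj(R))
    ipd_mask = np.abs(ipd) <= ipd_width

    # Coefficients rejected by either layer are set to zero.
    mask = pan_mask & ipd_mask
    return np.stack([
        istft(L * mask, len(left), frame_len),
        istft(R * mask, len(right), frame_len),
    ])
```

Calling `extract_center(left, right)` on two equal-length NumPy arrays returns a `(2, n_samples)` array holding the masked left and right channels. Widening both thresholds to their maximum (`pan_width=1.0`, `ipd_width=np.pi`) passes every coefficient, which is a quick way to check that the STFT and overlap-add stages alone reproduce the input.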
