SPEECH PROCESSING
CT 785 08
Course Objectives:
- To introduce the characteristics of speech signals and the related time and frequency domain methods for speech analysis and speech compression
- To introduce the models for speech production
- To develop time and frequency domain techniques for estimating speech parameters
- To introduce a predictive technique for speech compression
- To understand speech recognition, synthesis and speaker identification.
- Nature of speech signal [8 hours]
- Speech production: Mechanism of speech production
- Acoustic phonetics
- Digital models for speech signals
- Representations of speech waveform
- Sampling speech signals
- Basics of quantization
- Delta modulation
- Differential PCM
- Time domain methods for speech processing [8 hours]
- Time domain parameters of speech signal
- Methods for extracting the parameters
- Short-time Energy
- Average Magnitude
- Short-time average Zero crossing Rate
- Auditory perception: psychoacoustics
- Silence Discrimination using ZCR and energy
- Short Time Auto Correlation Function
- Pitch period estimation using Auto Correlation Function
- Frequency domain methods for speech processing [10 hours]
- Short Time Fourier analysis
- Fourier transform and linear filtering interpretations
- Sampling rates
- Spectrographic displays
- Pitch and formant extraction
- Analysis by Synthesis
- Analysis synthesis systems
- Phase vocoder
- Channel Vocoder
- Homomorphic speech analysis
- Cepstral analysis of Speech
- Formant and Pitch Estimation
- Homomorphic Vocoders
- Linear predictive analysis of speech [10 hours]
- Basic Principles of linear predictive analysis
- Auto correlation method
- Covariance method
- Solution of LPC equations
- Cholesky method
- Durbin’s Recursive algorithm
- Application of LPC parameters
- Pitch detection using LPC parameters
- Formant analysis
- VELP
- CELP
- Applications of speech & audio signal processing [9 hours]
- Algorithms:
- Dynamic time warping
- K-means clustering and Vector quantization
- Gaussian mixture modeling
- Hidden Markov modeling
- Automatic Speech Recognition
- Feature Extraction for ASR
- Deterministic sequence recognition
- Statistical sequence recognition
- Language models
- Speaker identification and verification
- Voice response system
- Speech synthesis
- Basics of articulatory, source-filter, and concatenative synthesis
Practical:
There should be 4-6 experiments based on the following topics:
- Spectral analysis
- Time-Frequency analysis
- Pitch extraction
- Formant tracking
- Speech enhancement
- Audio coding
- Speaker recognition
All of these lab works may be performed in Matlab or similar software capable of processing speech signals. They can also be implemented in hardware, if available.
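As a rough illustration of the kind of experiment intended for the pitch extraction topic, the time-domain measures from the syllabus (short-time energy, zero-crossing rate and autocorrelation-based pitch estimation) might be sketched in Python as below. This is a minimal sketch, not a prescribed lab solution: the function names, the 50-400 Hz pitch search range and the synthetic 200 Hz test tone are all assumptions made for illustration, and only the standard library is used.

```python
import math


def split_frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame (rectangular window)."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorr_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Estimate pitch in Hz from the lag that maximizes the short-time
    autocorrelation, searching only lags that correspond to f_min..f_max.
    The search range is an assumed, typical range for voiced speech."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    # Synthetic stand-in for a voiced segment: a pure 200 Hz tone.
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(480)]
    for frame in split_frames(tone, frame_len=240, hop=120):
        print(short_time_energy(frame),
              zero_crossing_rate(frame),
              autocorr_pitch(frame, fs))
```

In an actual lab, a recorded utterance would replace the synthetic tone, and low-energy frames would normally be classified as silence before pitch is estimated.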
References:
- Thomas F. Quatieri, “Discrete-Time Speech Signal Processing”, Prentice Hall/Pearson Education.
- Ben Gold and Nelson Morgan, “Speech and Audio Signal Processing”, John Wiley and Sons Inc.
- L.R. Rabiner and R.W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall.
- L.R. Rabiner and B.H. Juang, “Fundamentals of Speech Recognition”, Prentice Hall.
- J.R. Deller, J.H.L. Hansen and J.G. Proakis, “Discrete-Time Processing of Speech Signals”, John Wiley, IEEE Press.
- J.L. Flanagan, “Speech Analysis, Synthesis and Perception”, Springer-Verlag.
Evaluation Scheme:
The questions will cover all the chapters of the syllabus. The evaluation scheme will be as indicated in the table below:
Chapters | Hours | Marks Distribution* |
1 | 8 | 14 |
2 | 8 | 14 |
3 | 10 | 18 |
4 | 10 | 18 |
5 | 9 | 16 |
Total | 45 | 80 |
*There could be a minor deviation in the marks distribution.