Hidden Markov Model Toolkit (HTK) Tutorial
http://www.redicals.com
What is HTK?
HTK is the “Hidden Markov Model Toolkit”, developed by the Cambridge University Engineering Department. The toolkit is designed for building and manipulating Hidden Markov Models (HMMs).
HTK is primarily used for speech recognition research, although it has also been used for numerous other applications, including speech synthesis, character recognition and DNA sequencing. HTK consists of a set of library modules and tools available in C source form. It is available as a free download, along with complete documentation.
1.1 HTK Construction steps
The main construction steps are the following:
- Creation of a training database: each element of the vocabulary is recorded several times and labelled with the corresponding word.
- Acoustic analysis: the training waveforms are converted into a series of coefficient (feature) vectors.
- Definition of the models: a prototype Hidden Markov Model (HMM) is defined for each element of the task vocabulary.
- Training of the models: each HMM is initialised and trained with the training data.
- Definition of the task: the grammar of the recogniser (what can be recognised) is defined.
- Evaluation: the performance of the recogniser can be evaluated on a corpus of test data.
How to install HTK on Windows
STEP 1:- Register on the HTK website and download the HTK toolkit.
STEP 2:- Extract the Windows HTK binaries: right-click the htk-3.x-windows-binary.zip file, select 'Extract All' from the right-click menu, and follow the steps in the extraction wizard to extract the zip file to your HTK directory.
STEP 3:- After extraction, you will get two directories, "bin.win32" and "bin", which contain the HTK commands and other data-processing commands, respectively.
STEP 4:- Add these two directories to the system path.
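STEP 5:- Go to Start, type “cmd” to open a command prompt, and add the addresses of the two directories to the path, for example:
set path=%path%;c:\htk\bin.win32;c:\htk\bin
This setting only applies to the current command prompt session; to make it permanent, add the same two directories to the Path environment variable in the Windows system settings.
NOTE :- I have created an HTK folder on the C drive and stored both directories in it, i.e. c:\htk\bin and c:\htk\bin.win32.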
STEP 6:- To test the HTK toolkit, call any of its tools, such as HVite or HCopy, from the command prompt.
How to install HTK on Linux
STEP 1 :- Download the HTK toolkit and extract it.
STEP 2 :- cd htk
STEP 3 :- ./configure --prefix=/tmp --without-x --disable-hslab
STEP 4 :- make all
STEP 5 :- sudo make install
STEP 6 :- To test the HTK toolkit, call any of its tools, such as HVite or HCopy, from the terminal. With --prefix=/tmp the tools are installed in /tmp/bin, so that directory must be on your PATH.
How to install HTK on Mac OS X
STEP 1 :- Download the HTK toolkit.
STEP 2 :- Open Terminal and extract the archive: $ tar zxf HTK-3.4.1.tar.gz
STEP 3 :- cd htk
STEP 4 :- export CPPFLAGS=-UPHNALG
STEP 5 :- chmod +x configure
STEP 6 :- ./configure --without-x --disable-hslab
STEP 7 :- make all
STEP 8 :- sudo make install
STEP 9 :- To test the HTK toolkit, call any of its tools, such as HVite or HCopy, from the terminal.
1. Data creation
HCopy: feature extraction
HList: display of file information
HLEd: label creation and editing (e.g. outputting a Master Label File, MLF)
2. Learning
MakeProtoHMMSet: determination of the model topology (prototype HMMs)
HInit: initial training of each HMM, together with the corresponding segmentation of the phoneme training data (segmental k-means)
HRest: further training of each HMM after HInit (Baum-Welch re-estimation, also used after increasing the number of mixtures)
3. Recognition
HVite: recognition using the Viterbi algorithm
HBuild: generation of word networks (sub-networks can also be generated)
HParse: conversion of a grammar written in EBNF (extended Backus-Naur Form) into a word network
HDMan: dictionary management tool
4. Analysis
HResults: calculation of the recognition rate
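Recognition with HVite rests on the Viterbi algorithm. Purely as an illustration of that algorithm (not of HVite's own implementation), here is a minimal Viterbi decoder for a small discrete HMM in Python. The two states, the three observation symbols, all probabilities and the function name viterbi are made-up assumptions for the example.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for obs and its log probability."""
    # delta[s]: best log probability of any state path ending in s at the current frame
    delta = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, delta, bp = delta, {}, {}
        for s in states:
            # Best predecessor of state s for this frame
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            delta[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            bp[s] = best
        backpointers.append(bp)
    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1], delta[last]

# Hypothetical two-state model; observation symbols are 0, 1 and 2
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": [0.5, 0.4, 0.1], "S2": [0.1, 0.3, 0.6]}
```

With these values, viterbi([0, 1, 2], states, start_p, trans_p, emit_p) returns the path ['S1', 'S1', 'S2'], whose probability (the exponential of the returned log value) is 0.01512. Working in log probabilities avoids numerical underflow on long utterances.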
HTK Tools
HCopy
This program copies one or more data files to a designated output file, optionally converting the data into a parameterised form. While the source files can be in any supported format, the output format is always HTK. Hence, this program is used to convert data files in other formats to the HTK format. If the options for label files are set, each source data file must have an associated label file, and a target label file is created. HCopy can also be used to convert the parameter kind of a file, for example from WAVEFORM to MFCC, depending on the configuration options. Conversions must be specified via a configuration file, which is passed with the -C option.
USE :- HCopy -C mfcc13.cfg ..\waveFiles\1001-10a.wav output\feature\001-10a.fea
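Since HCopy always writes HTK-format output, it can help to see what such a file contains. The sketch below assumes the plain layout described in the HTK Book: a 12-byte big-endian header (number of frames, sample period in 100 ns units, bytes per frame, parameter kind code) followed by float32 frames. It does not handle compressed (_C) files or CRC checksums, and write_htk and read_htk are hypothetical helpers, not HTK tools.

```python
import struct

# Subset of HTK base parameter kinds and qualifier bits (octal codes from the HTK Book)
BASE_KINDS = {0: "WAVEFORM", 6: "MFCC", 7: "FBANK", 8: "MELSPEC", 9: "USER", 11: "PLP"}
QUALIFIERS = [(0o100, "_E"), (0o400, "_D"), (0o1000, "_A"), (0o20000, "_0")]
# nSamples, sampPeriod (100 ns units), sampSize (bytes per frame), parmKind
HEADER = struct.Struct(">iiHH")

def write_htk(path, frames, samp_period=100000, parm_kind=6):
    """Write float32 frames as an uncompressed, big-endian HTK parameter file."""
    dim = len(frames[0])
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(frames), samp_period, 4 * dim, parm_kind))
        for frame in frames:
            f.write(struct.pack(">%df" % dim, *frame))

def read_htk(path):
    """Read an uncompressed, big-endian HTK parameter file (no CRC, no _C compression)."""
    with open(path, "rb") as f:
        n_samples, samp_period, samp_size, parm_kind = HEADER.unpack(f.read(HEADER.size))
        dim = samp_size // 4
        frames = [list(struct.unpack(">%df" % dim, f.read(samp_size)))
                  for _ in range(n_samples)]
    # The low six bits hold the base kind; higher bits are qualifiers
    base = BASE_KINDS.get(parm_kind & 0o77, "KIND%d" % (parm_kind & 0o77))
    name = base + "".join(q for bit, q in QUALIFIERS if parm_kind & bit)
    return frames, samp_period, name
```

For example, writing two frames with parm_kind=6 | 0o20000 and reading them back yields the same frames, a sample period of 100000 (10 ms) and the kind name "MFCC_0".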
HInit
HInit is used to provide initial estimates for the parameters of a single HMM using a
set of observation sequences. It works by repeatedly using Viterbi alignment to
segment the training observations and then recomputing the parameters by pooling
the vectors in each segment. For mixture Gaussians, each vector in each segment is
aligned with the component with the highest likelihood. HInit can be used to provide
initial estimates of whole word models in which case the observation sequences are
realisations of the corresponding vocabulary word. Alternatively, HInit can be used
to generate initial estimates of HMMs for phoneme-based speech recognition.
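HInit's cycle of Viterbi segmentation followed by pooling can be sketched in a few lines of Python. This is a simplified illustration, not HInit itself, and rests on stated assumptions: one-dimensional observations, a single Gaussian per state, a left-to-right topology without skips, transition probabilities ignored during alignment, a uniform first segmentation, and every training sequence having at least as many frames as states. All function names are hypothetical.

```python
import math

def uniform_segment(T, n_states):
    """First segmentation: split T frames evenly across the states."""
    return [t * n_states // T for t in range(T)]

def pool(seqs, alignments, n_states, var_floor=1e-3):
    """Re-estimate each state's mean and variance from the frames aligned to it."""
    sums, sqs, counts = [0.0] * n_states, [0.0] * n_states, [0] * n_states
    for x, a in zip(seqs, alignments):
        for xt, s in zip(x, a):
            sums[s] += xt
            sqs[s] += xt * xt
            counts[s] += 1
    means = [sums[s] / counts[s] for s in range(n_states)]
    # Floor the variances so that constant segments do not give zero variance
    variances = [max(sqs[s] / counts[s] - means[s] ** 2, var_floor) for s in range(n_states)]
    return means, variances

def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_align(x, means, variances):
    """Best left-to-right path (stay or advance one state) from the first to the last state."""
    n, T = len(means), len(x)
    score = [[-math.inf] * n for _ in range(T)]
    back = [[0] * n for _ in range(T)]
    score[0][0] = log_gauss(x[0], means[0], variances[0])
    for t in range(1, T):
        for s in range(n):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else -math.inf
            prev, best = (s, stay) if stay >= advance else (s - 1, advance)
            if best > -math.inf:
                score[t][s] = best + log_gauss(x[t], means[s], variances[s])
                back[t][s] = prev
    path = [n - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

def hinit_sketch(seqs, n_states, max_iter=20):
    """Alternate segmentation and pooling until the segmentation stops changing."""
    alignments = [uniform_segment(len(x), n_states) for x in seqs]
    for _ in range(max_iter):
        means, variances = pool(seqs, alignments, n_states)
        new = [viterbi_align(x, means, variances) for x in seqs]
        if new == alignments:
            break
        alignments = new
    means, variances = pool(seqs, alignments, n_states)
    return means, variances, alignments
```

Because the path must start in the first state, end in the last one and move forward by at most one state per frame, every state receives at least one frame, so the pooling step never divides by zero. The real HInit also estimates transition probabilities and handles mixtures and multiple streams.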
HLEd
This program is a simple editor for manipulating label files. Typical examples of its
use might be to merge a sequence of labels into a single composite label or to
expand a set of labels into a context-sensitive set. HLEd works by reading in a list of
editing commands from an edit script file and then making an edited copy of one or
more label files.
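A common context-sensitive expansion turns monophone labels into triphones written as left-centre+right (in HLEd this is requested through an edit-script command such as TC). The Python sketch below only illustrates that naming scheme; it is not HLEd's script syntax, the function name is hypothetical, and boundary phones simply become biphones, with no special treatment of silence, word boundaries or excluded labels.

```python
def to_triphones(labels):
    """Rewrite a monophone label sequence as left-centre+right context labels."""
    result = []
    for i, centre in enumerate(labels):
        left = labels[i - 1] + "-" if i > 0 else ""                  # no left context at the start
        right = "+" + labels[i + 1] if i + 1 < len(labels) else ""  # no right context at the end
        result.append(left + centre + right)
    return result
```

For instance, to_triphones(["sil", "k", "ae", "t", "sil"]) gives ['sil+k', 'sil-k+ae', 'k-ae+t', 'ae-t+sil', 't-sil'].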
HSLab
HSLab is an interactive label editor for manipulating speech label files. An example
of using HSLab would be to load a sampled waveform file, determine the boundaries
of the speech units of interest and assign labels to them. Alternatively, an existing
label file can be loaded and edited by changing current label boundaries, deleting labels and creating new ones.
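The labels that HSLab and the other tools read and write are stored in HTK label files, where each line gives a start time, an end time and a label name, with times in units of 100 ns. The minimal Python parser below assumes exactly that simple layout; lines without times, extra fields such as scores, and multi-level labels are ignored, and the function name is hypothetical.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # label times are in units of 100 ns

def parse_htk_labels(text):
    """Parse 'start end label' lines into (start_seconds, end_seconds, label) tuples."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and labels without times (not handled here)
        start, end, name = int(fields[0]), int(fields[1]), fields[2]
        segments.append((start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND, name))
    return segments
```

For example, the two lines "0 2500000 sil" and "2500000 4800000 one" describe silence from 0 s to 0.25 s followed by the word "one" up to 0.48 s.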
HParse
The HParse program generates word level lattice files (for use with e.g. HVite) from
a text file syntax description containing a set of rewrite rules based on extended
Backus-Naur Form (EBNF). The EBNF rules are used to generate an internal
representation of the corresponding finite-state network where HParse network
nodes represent the words in the network, and are connected via sets of links. This
HParse network is then converted to an HTK word-level lattice.
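To make the node-and-link idea concrete, here is a hand-built Python version of the network that a small grammar such as ( SIL ( ONE | TWO | THREE ) SIL ) would describe, with nodes carrying words and links joining them, plus a function that lists every word sequence the network accepts. The grammar, the node numbering and the function name are illustrative assumptions; this is neither HParse's parser nor the HTK lattice file format, and the enumeration only works for networks without loops.

```python
from collections import defaultdict

def enumerate_sentences(words, links, start, end):
    """List every word sequence from start to end in an acyclic word network."""
    successors = defaultdict(list)
    for a, b in links:
        successors[a].append(b)
    sentences = []

    def walk(node, path):
        path = path + [words[node]]
        if node == end:
            sentences.append(" ".join(path))
            return
        for nxt in successors[node]:
            walk(nxt, path)

    walk(start, [])
    return sentences

# Hypothetical network for ( SIL ( ONE | TWO | THREE ) SIL ): each node holds one word
words = {0: "SIL", 1: "ONE", 2: "TWO", 3: "THREE", 4: "SIL"}
links = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (3, 4)]
```

Here enumerate_sentences(words, links, 0, 4) returns ['SIL ONE SIL', 'SIL TWO SIL', 'SIL THREE SIL'], i.e. exactly the sentences the grammar allows.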
HERest
This program is used to perform a single re-estimation of the parameters of a set of
HMMs, or linear transforms, using an embedded training version of the Baum-Welch
algorithm. Training data consists of one or more utterances each of which has a
transcription in the form of a standard label file (segment boundaries are ignored).
For each training utterance, a composite model is effectively synthesised by
concatenating the phoneme models given by the transcription. Each phone model
has the same set of accumulators allocated to it as are used in HRest but in HERest
they are updated simultaneously by performing a standard Baum-Welch pass over
each training utterance using the composite model. HERest is intended to operate on
HMMs with initial parameter values estimated by HInit/HRest. HERest supports
multiple mixture Gaussians, discrete and tied-mixture HMMs, multiple data streams,
parameter tying within and between models, and full or diagonal covariance
matrices.
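The core of a Baum-Welch pass is a forward-backward computation followed by re-estimation from the accumulated state occupation counts. The Python sketch below shows one such pass for a small discrete HMM, with per-frame scaling to avoid underflow. It is an illustration under strong assumptions, not HERest: it trains a single model on whole sequences instead of building composite models from transcriptions, and it uses discrete output probabilities instead of Gaussians. Function names and example values are hypothetical.

```python
import math

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta, scale factors and log P(obs)."""
    N, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[i] * B[i][obs[0]] for i in range(N)]
        else:
            a = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                 for j in range(N)]
        c = sum(a)
        scale.append(c)
        alpha.append([x / c for x in a])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   / scale[t + 1] for i in range(N)]
    return alpha, beta, scale, sum(math.log(c) for c in scale)

def baum_welch_step(seqs, pi, A, B):
    """One re-estimation pass over all sequences; returns new (pi, A, B)."""
    N, M = len(pi), len(B[0])
    pi_acc = [0.0] * N
    a_num = [[0.0] * N for _ in range(N)]
    a_den = [0.0] * N
    b_num = [[0.0] * M for _ in range(N)]
    b_den = [0.0] * N
    for obs in seqs:
        alpha, beta, scale, _ = forward_backward(obs, pi, A, B)
        for t in range(len(obs)):
            for i in range(N):
                gamma = alpha[t][i] * beta[t][i]  # probability of being in state i at frame t
                if t == 0:
                    pi_acc[i] += gamma
                b_num[i][obs[t]] += gamma
                b_den[i] += gamma
                if t + 1 < len(obs):
                    a_den[i] += gamma
                    for j in range(N):
                        # probability of moving from state i to state j between t and t+1
                        a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                        * beta[t + 1][j] / scale[t + 1])
    new_pi = [p / len(seqs) for p in pi_acc]
    new_A = [[a_num[i][j] / a_den[i] for j in range(N)] for i in range(N)]
    new_B = [[b_num[i][k] / b_den[i] for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B

def total_loglik(seqs, pi, A, B):
    return sum(forward_backward(obs, pi, A, B)[3] for obs in seqs)
```

Calling baum_welch_step repeatedly and checking total_loglik after each call shows the defining property of Baum-Welch: the total log-likelihood of the training data never decreases from one pass to the next.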
HResults
HResults is the HTK performance analysis tool. It reads in a set of label files (typically output from a recognition tool such as HVite) and compares them with the corresponding reference transcription files. For the analysis of speech recognition output, the comparison is based on a dynamic programming (DP) string alignment. A typical overall summary looks like this:
------------------------ Overall Results --------------------------
SENT: %Correct=86.67 [H=52, S=8, N=60]
WORD: %Corr=86.67, Acc=86.67 [H=52, D=0, S=8, I=0, N=60]
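H is the number of correct labels, D is the number of deletions, S is the number of substitutions, I is the number of insertions and N is the total number of labels in the defining transcription files, so that H = N - D - S. The percentage of labels correctly recognised is given by

$$\%\text{Correct} = \frac{H}{N} \times 100\%$$

and the accuracy is computed by

$$\text{Accuracy} = \frac{H - I}{N} \times 100\%$$

For the example above, %Corr = 52/60 × 100% = 86.67%, and since I = 0, Acc is also 86.67%.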
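HResults obtains H, D, S and I by aligning each recognised label sequence with its reference transcription. The Python sketch below illustrates the idea with a plain minimum edit-distance alignment using unit costs (an assumption; HResults' own alignment costs may differ), and then applies %Correct = H/N × 100 and Accuracy = (H - I)/N × 100. The function names are hypothetical.

```python
def align_counts(ref, hyp):
    """Count hits (H), deletions (D), substitutions (S) and insertions (I)
    from a minimum edit-distance alignment with unit costs."""
    n, m = len(ref), len(hyp)
    # cost[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back through the table, classifying each alignment step
    H = D = S = I = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                H += 1
            else:
                S += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            D += 1  # reference label missing from the recognised output
            i -= 1
        else:
            I += 1  # extra label in the recognised output
            j -= 1
    return H, D, S, I

def percent_scores(H, I, N):
    """Return (%Correct, Accuracy) as percentages."""
    return 100.0 * H / N, 100.0 * (H - I) / N
```

For example, aligning the reference "the cat sat" with the output "the bat sat down" gives H = 2, D = 0, S = 1 and I = 1, so %Correct is 66.67 while Accuracy is only 33.33, because the inserted word is penalised. With the summary figures H = 52, I = 0 and N = 60, percent_scores returns 86.67 for both.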