
Status report 1) ANN training at ICSI 2) gmtkTie



1. Status report
1) ANN training at ICSI
2) gmtkTie

2. ANN training
• Joe Frankel, with help from Matthew Magimai-Doss, is training up nets on Fisher (minus SWB1)
• We already have train and validation scores for some nets
• Training times appear to be of the order of 100 hours per net

3. ANN performance summary
• Finished
  • glottal - 351x1400x4 - Train accuracy: 87.27%, CV accuracy: 87.1%
  • degree1 - 351x1600x6 - Train accuracy: 78.01%, CV accuracy: 77.79%
  • nasal - 351x1200x3 - Train accuracy: 90.74%, CV accuracy: 90.55%
• In progress
  • place1 - 351x1900x10 - Train accuracy: 76.30%, CV accuracy: 76.06%
  • rounding - 351x1200x3 - Train accuracy: 87.69%, CV accuracy: 87.52%
  • vowel - 351x2400x23 - Train accuracy: 72.95%, CV accuracy: 72.96%
  • front - 351x1700x7 - Train accuracy: 74.36%, CV accuracy: 75.14%
• Not yet started
  • height - 351x1800x8

4. Glottal – 87% overall
Confusion matrix (cross-validation set):
• Entries are percentages of frames in the cross-validation set
• Columns are correct labels (each column adds up to 100%)
• Rows are classified labels
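As a concrete illustration of the normalisation convention above (columns are correct labels, each column sums to 100%), here is a minimal Python sketch; the label names and frame counts are invented for the example and are not taken from the actual glottal net:

import numpy as np  # not needed here, but matches the later sketches
from collections import Counter

def confusion_percentages(correct, classified, labels):
    # Entries are percentages of frames: rows are classified labels,
    # columns are correct labels, and each column sums to 100%.
    counts = Counter(zip(classified, correct))      # (classified, correct) frame counts
    col_totals = Counter(correct)                   # frames per correct label
    return {
        (row, col): 100.0 * counts[(row, col)] / col_totals[col]
        for row in labels for col in labels if col_totals[col]
    }

# Invented example data: 87 of 100 truly "voiced" frames classified correctly
correct    = ["voiced"] * 100 + ["voiceless"] * 50
classified = ["voiced"] * 87 + ["voiceless"] * 13 + ["voiceless"] * 45 + ["voiced"] * 5
cm = confusion_percentages(correct, classified, ["voiced", "voiceless"])
print(cm[("voiced", "voiced")], cm[("voiceless", "voiced")])   # 87.0 13.0 (column sums to 100)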

  5. Degree1 – 78% overall

  6. Nasal – 91% overall

7. gmtkTie
• A general-purpose parameter tying tool for GMTK
• Eventually it will be able to:
  • Tie any type of parameter
  • Do bottom-up or top-down clustering of parameters of all types
  • Perform other manipulations, e.g. removal of unused parameters, maybe even changes to model structure (e.g. changing variable cardinalities)
  • Do a full emulation of HTK’s HHEd tying commands
• The user provides a list of commands to execute, just as with HTK

8. gmtkTie – current capabilities 1
• Bottom-up (purely data-driven) clustering (a minimal sketch follows this slide)
• Uses a simple agglomerative clustering algorithm to find sets of similar parameters
• Can be followed by tying all parameters within each cluster
• Many different dissimilarity measures are available, such as: Euclidean; cross-likelihood of means; variance-scaled Euclidean (Mahalanobis); etc.
• Many different measures of the size (“purity”) of clusters, such as: max pairwise dissimilarity; average dissimilarity to the centroid
• Many different criteria for finding the cluster centroid, such as: the item with the least total dissimilarity to the other cluster items; the average value of all cluster items; an arbitrary choice; etc.
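The sketch below is a toy version of the kind of bottom-up scheme described above, assuming parameters are summarised by their mean vectors and picking two of the listed options (Euclidean dissimilarity, maximum pairwise dissimilarity as the cluster “purity” measure). The function and variable names are illustrative only; this is not gmtkTie’s actual interface.

import numpy as np

def agglomerative_tie(means, max_cluster_size):
    # Start with one cluster per parameter; greedily merge the pair of clusters
    # whose merged "purity" is smallest, as long as it stays under the threshold.
    clusters = [[i] for i in range(len(means))]

    def dissim(a, b):                     # Euclidean distance between parameter means
        return float(np.linalg.norm(means[a] - means[b]))

    def purity(cluster):                  # max pairwise dissimilarity within a cluster
        return max((dissim(a, b) for a in cluster for b in cluster), default=0.0)

    while True:
        best = None                       # (purity of merged cluster, i, j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                p = purity(clusters[i] + clusters[j])
                if p <= max_cluster_size and (best is None or p < best[0]):
                    best = (p, i, j)
        if best is None:                  # no merge stays under the threshold: stop
            break
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters                       # the parameters within each cluster would then be tied

# Example (made-up means): agglomerative_tie([np.array(m) for m in mean_list], max_cluster_size=0.5)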

9. gmtkTie – current capabilities 2
• Top-down, decision tree-based clustering
• As in HTK, uses a decision tree to determine the clustering
• Simple, greedy (non-backtracking) clustering scheme which repeatedly splits clusters (sketched after this slide)
• Uses questions about parameter features in order to split clusters; uses a measure of cluster “purity” based on parameter values in order to select the best question
• Key property of this method: can “synthesise” parameters for which there is little or no training data
• Currently only for mixtures of Gaussians
• Stopping criterion is a threshold on the minimum log likelihood improvement
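The following toy sketch shows only the greedy splitting step, under assumptions: each candidate question partitions a cluster in two, the split is scored by the gain in an approximate log likelihood under one diagonal-covariance Gaussian per cluster, and splitting stops once no question clears the threshold. Unlike HTK-style tying (and presumably gmtkTie), it treats each parameter’s mean vector as a single data point and ignores occupancy statistics; all names are illustrative.

import numpy as np

def cluster_loglik(vectors):
    # Approximate log likelihood of the cluster members under one
    # diagonal-covariance Gaussian fitted to them (variance floored to avoid log(0)).
    X = np.atleast_2d(np.array(vectors))
    n, d = X.shape
    var = X.var(axis=0) + 1e-6
    return -0.5 * n * (d * np.log(2 * np.pi) + np.sum(np.log(var)) + d)

def best_split(items, questions, min_gain):
    # items: list of (feature_dict, mean_vector); questions: name -> predicate over features.
    parent = cluster_loglik([m for _, m in items])
    best = None
    for name, q in questions.items():
        yes = [m for f, m in items if q(f)]
        no  = [m for f, m in items if not q(f)]
        if not yes or not no:             # the question must actually split the cluster
            continue
        gain = cluster_loglik(yes) + cluster_loglik(no) - parent
        if gain >= min_gain and (best is None or gain > best[0]):
            best = (gain, name)
    return best                           # None => stop: no question improves likelihood enough

# Tree building would apply best_split recursively to each resulting "yes"/"no" cluster.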

10. Decision tree-based clustering 1
• Currently, the user must provide the feature values for each parameter, e.g.
  • What state is this parameter used in?
  • What are the left/right phonetic contexts?
  • Or anything else you like…
• One problem: there is no sanity checking – the user must be sure that, e.g., the Gaussian mixture called “gmMx34” is used only when the left context is “ah”
• Would like to make construction of features more automated, but this is hard (i.e. I think I need Jeff’s help, because it involves running some of the inference routines in GMTK and I don’t know enough about that part of the code yet)
• The user also supplies candidate questions about these features (illustrated after this slide), e.g.
  • Is the left phonetic context in the set {ax, ah, axr}?
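To make the feature/question idea concrete, here is a purely hypothetical representation of the slide’s examples as Python data; the actual formats gmtkTie uses for feature definitions and question sets are not shown in the slides, and the right-context and state values below are invented.

# Hypothetical per-parameter features and candidate questions (not gmtkTie's real input format).
features = {
    # The user asserts (unchecked by gmtkTie) that gmMx34 is used only when
    # the left phonetic context is "ah"; the other values are invented.
    "gmMx34": {"leftContext": "ah", "rightContext": "n", "state": 1},
}

questions = {
    "L_Central": lambda f: f["leftContext"] in {"ax", "ah", "axr"},
    "State_1":   lambda f: f["state"] == 1,
}

print(questions["L_Central"](features["gmMx34"]))   # True: gmMx34 goes to the "yes" branch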

11. Decision tree-based clustering 2
• Can save/load the decision trees
• Can synthesise parameters using these trees (the user simply provides the features and gmtkTie ties that parameter to an existing parameter)
• Multiple feature sets (definitions + values for each parameter) and trees can be loaded in memory at once, then referred to by name

12. gmtkTie – alpha testing
• Currently building a tied-state TIMIT triphone system
• Parameters are tied at the Gaussian component/mixing weights level
• Would be better (smaller saved parameter files, for one thing) to tie at the mixture distribution level (easy to do: can have repeated mixture names in a collection)
• Requires minor changes to gmtkTie (it currently only loads the parameter file, not the collection definition)

13. gmtkTie – what will be available for the workshop?
• Tested and working decision tree-based clustering/tying for Gaussian mixture distributions, using the same method as HTK
• Bottom-up clustering/tying
• Documentation (on the GMTK Wiki)
  • Including a working example (probably the TIMIT triphone system)
