User Manual of Mining Mouse Vocalizations

Prepared by Jesin Zakaria and Eamonn Keogh

CREATE SPECTROGRAM
Run the code createSpectro.m to:
- create a spectrogram from a .wav file,
- idealize the spectrogram, and
- extract candidate syllables from the idealized spectrogram.

Try the following example. Set

    rec = '..\031611KOKO02MATED.wav';     % path and name of the .wav file
    D = '...\031611KOKO02MATEDspectro\';  % folder that will contain the syllables

Set the range of the for loop according to the size of main memory and the length of the recording. Each iteration creates the spectrogram of two minutes of the recording. You can change this value to create spectrograms of longer sections of the recording.

RUNNING TIME: Because the code runs faster than real time, we did not include a running time analysis in our paper. For example:
- Creating the spectrogram of a two-minute recording took on average (12.95 + 12.81 + 12.67)/3 = 12.81 seconds.
- Extracting connected components from the idealized spectrogram of a six-minute recording took 85.7 seconds.
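Choosing the range of the for loop amounts to splitting the recording into fixed-length chunks. The sketch below shows one way to compute the chunk boundaries. chunkBounds is a hypothetical helper and is not part of createSpectro.m. A chunk length of 120 s matches the two-minute chunks described above.

```matlab
function bounds = chunkBounds(totalSamples, fs, chunkSec)
% Hypothetical helper (not part of createSpectro.m).
% Splits a recording of totalSamples samples, sampled at fs Hz, into
% consecutive chunks of chunkSec seconds. Row k of bounds holds the
% [first, last] sample index of chunk k; the last chunk may be shorter.
chunkLen = round(chunkSec * fs);           % samples per chunk
starts = (1:chunkLen:totalSamples)';       % first sample of each chunk
stops = min(starts + chunkLen - 1, totalSamples);
bounds = [starts, stops];
end
```

For example, with info = audioinfo(rec), the call chunkBounds(info.TotalSamples, info.SampleRate, 120) returns one row per two-minute chunk. Each row can be passed directly to audioread(rec, bounds(k, :)) inside the loop, so only one chunk is held in memory at a time.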

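CREATE SPECTROGRAM
Use the following code to create the idealized spectrogram.
Figure 1: idealized spectrogram, laboratory mice (Time (second): 124 to 125; frequency axis marked at 40 and 100 kHz).

    rec = 'C:\Users\Jesin\Desktop\temp\031611KOKO02MATED.wav';
    t1 = 124000*250;   % first sample: 124 s, assuming a 250 kHz sampling rate
    t2 = 125000*250;   % last sample: 125 s
    [Y, FS] = audioread(rec, [t1, t2]);   % replaces wavread, which newer MATLAB releases no longer provide
    [y, F, T, P] = spectrogram(Y, 512, 256, 512, FS, 'yaxis');
    C = -10*log10(P);  % -10*log10 of the power spectral density
    C(C<35) = 0;       % discard cells outside the range 35 to 80
    C(C>80) = 0;
    C(C~=0) = 1;       % binary (idealized) spectrogram
    imshow(~C);        % kept cells are drawn black on a white background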
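The four thresholding lines above can be wrapped in a small function that returns a logical mask. This makes the thresholds explicit parameters. idealizeSpectro is a hypothetical name. Because the lower threshold (35) is positive, keeping cells with lo <= C <= hi selects the same cells as the original code.

```matlab
function mask = idealizeSpectro(P, lo, hi)
% Hypothetical helper mirroring the thresholding in the example above.
% P: power spectral density matrix returned by spectrogram.
% A cell is kept (true) when -10*log10(P) lies in [lo, hi]; the example
% above uses lo = 35 and hi = 80.
C = -10 * log10(P);
mask = C >= lo & C <= hi;
end
```

Usage: mask = idealizeSpectro(P, 35, 80); imshow(~mask); draws the same picture as the original code.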

EXTRACT CANDIDATE SYLLABLES
In createSpectro.m we marked the part of the code that extracts candidate syllables. The results of all filtering steps are included in extractcandidatesyllable.zip:
- The folder …\031611KOKO02MATEDspectro contains all connected components with duration >10 and <300 that lie within the frequency range 30 to 110 kHz.
- The folder …\031611KOKO02MATED contains all candidate syllables after some noise is filtered out. When several syllables appear in the same time stamp, only one of them is kept.
- The folder …\sametime contains the syllables that were excluded because they appear in the same time stamp as another syllable.
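The extraction code itself is not reproduced in this manual. The sketch below shows the general idea under stated assumptions:
- It labels 8-connected components of the idealized spectrogram with a plain flood fill, so no toolbox is needed. bwconncomp from the Image Processing Toolbox uses the same connectivity for 2-D input.
- It applies the duration and frequency filters described above.
- It measures duration in spectrogram columns (time bins); the manual does not state the units.
- It requires the whole component to lie inside the frequency band.
extractCandidates is a hypothetical name.

```matlab
function comps = extractCandidates(mask, F, minDur, maxDur, fLo, fHi)
% Hypothetical sketch, not the code in createSpectro.m.
% mask: logical idealized spectrogram (rows = frequency bins, columns = time bins)
% F:    ascending frequency of each row (e.g. in Hz, as returned by spectrogram)
% Keeps components with minDur < duration < maxDur (duration in columns,
% an assumption) whose whole frequency extent lies within [fLo, fHi].
% Each kept component is returned as a K-by-2 list of [row, column] pixels.
[nr, nc] = size(mask);
visited = false(nr, nc);
comps = {};
for c0 = 1:nc
    for r0 = 1:nr
        if ~mask(r0, c0) || visited(r0, c0)
            continue;
        end
        % Iterative flood fill over the 8 neighbours, starting at (r0, c0)
        stack = [r0, c0];
        visited(r0, c0) = true;
        pix = zeros(0, 2);
        while ~isempty(stack)
            p = stack(end, :);
            stack(end, :) = [];
            pix(end+1, :) = p; %#ok<AGROW>
            for dr = -1:1
                for dc = -1:1
                    r = p(1) + dr;
                    c = p(2) + dc;
                    if r >= 1 && r <= nr && c >= 1 && c <= nc && mask(r, c) && ~visited(r, c)
                        visited(r, c) = true;
                        stack(end+1, :) = [r, c]; %#ok<AGROW>
                    end
                end
            end
        end
        % Duration and frequency filters
        dur = max(pix(:, 2)) - min(pix(:, 2)) + 1;
        fMin = F(min(pix(:, 1)));
        fMax = F(max(pix(:, 1)));
        if dur > minDur && dur < maxDur && fMin >= fLo && fMax <= fHi
            comps{end+1} = pix; %#ok<AGROW>
        end
    end
end
end
```

When F comes from the spectrogram call with a sampling rate, F is in Hz. In that case the 30 to 110 kHz band would be passed as 30e3 and 110e3, for example extractCandidates(mask, F, 10, 300, 30e3, 110e3).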

CLASSIFY CANDIDATE SYLLABLES
Run the code classifySyllables.m.
Requires:
- labelGrndTruth.txt: the labels of the ground truth.
- theta.txt: the thresholds for each class. Columns 1, 2, 4 and 5 hold the mean, sigma, mean+sigma and mean+2*sigma for each class of syllables in the ground truth.
- The normalized ground truth.
- The candidate syllable bitmaps.
- The list of candidate syllables in sorted order.
Result: for our sample example,
- 'dis031611KOKO02MATED.txt' contains the distances of the candidate syllables to the ground truth.
- 'label031611KOKO02MATED.txt' contains the labels of all the candidate syllables.
To see the class distribution, uncomment the class distribution code in classifySyllables.m.
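The manual lists the inputs but does not spell out the decision rule. The sketch below assumes a nearest-neighbour rule with a class-specific rejection threshold:
- Each candidate takes the class of its closest ground-truth syllable.
- If that distance exceeds the threshold for the class (for example, one column of theta.txt), the candidate gets label 0.
It also assumes a distance matrix with one row per candidate and one column per ground-truth syllable. classifyByThreshold is a hypothetical name and is not the code in classifySyllables.m.

```matlab
function labels = classifyByThreshold(D, gtLabels, thr)
% Hypothetical sketch of a thresholded nearest-neighbour classifier.
% D(i, j):     distance of candidate i to ground-truth syllable j
% gtLabels(j): class of ground-truth syllable j (positive integers)
% thr(k):      acceptance threshold for class k (e.g. a column of theta.txt)
% Returns a column vector of labels; 0 means "rejected".
[dMin, idx] = min(D, [], 2);   % nearest ground-truth syllable per candidate
labels = gtLabels(idx);
labels = labels(:);            % force a column vector
t = thr(:);
labels(dMin > t(labels)) = 0;  % reject candidates beyond their class threshold
end
```

A quick look at the resulting class distribution is accumarray(labels(labels > 0), 1), which counts the candidates assigned to each class.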

CLASSIFY CANDIDATE SYLLABLES: Normalization method
In our paper we stated that all candidate syllables and the ground truth are normalized before the GHT distance between them is computed. For brevity, the paper neither describes our normalization method in detail nor validates it. The following slides present the normalization method in detail.

CLASSIFY CANDIDATE SYLLABLES: Normalization method
GHT is calculated without normalizing the syllables.
Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes).
Syllables that are not clustered correctly are marked with a red circle.

CLASSIFY CANDIDATE SYLLABLES: Normalization method
GHT is calculated after normalizing the syllables by dividing x and y by the larger dimension (row or column).
Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes).
As the figure shows, some syllables are still not clustered correctly.
Figure: the same set of syllables after normalization.

CLASSIFY CANDIDATE SYLLABLES: Normalization method (used in our paper)
GHT is calculated after normalizing the syllables by dividing x by the number of rows and y by the number of columns.
Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes).
As the figure shows, all syllables except one (marked with an arrow) are clustered correctly.
Figure: the same set of syllables after normalization.
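The two normalizations compared on these slides differ only in the divisor. The sketch below applies them to the pixel coordinates of a syllable bitmap. It assumes that x is the row index and y the column index of a pixel, which matches the pairing "x by rows, y by columns" in the text, and that the divisors are the bitmap's dimensions. normalizeSyllable is a hypothetical name. The GHT distance itself is not shown.

```matlab
function out = normalizeSyllable(bitmap, method)
% Hypothetical sketch of the two normalizations compared on the slides.
% bitmap: logical syllable image; out: K-by-2 normalized [x y] coordinates.
[x, y] = find(bitmap);          % x = row index, y = column index (assumption)
[nRows, nCols] = size(bitmap);
switch method
    case 'larger'               % divide x and y by the larger dimension
        s = max(nRows, nCols);
        out = [x / s, y / s];
    case 'perAxis'              % method used in the paper: x by rows, y by columns
        out = [x / nRows, y / nCols];
    otherwise
        error('normalizeSyllable:method', 'Unknown method "%s".', method);
end
end
```

With 'larger', the aspect ratio of the syllable is preserved. With 'perAxis', every syllable is stretched to the unit square. This difference is consistent with the better clustering reported for the per-axis method.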

CLASSIFY CANDIDATE SYLLABLES: Normalization method (used in our paper)
GHT is calculated after normalizing the syllables by dividing x by the number of rows and y by the number of columns.
Set: 16 syllables of class 1 and 27 syllables of class 9 (confusing classes).
Figure: the same set of syllables after normalization.

EDITING GROUND TRUTH
Run accuracyGrndTrth.m to generate the plot. It requires editMatrix.txt, dis692.txt and label692.txt.
Figure: Classification Accuracy (0 to 1) against a horizontal axis from 0 to 700, annotated "Adding more instances"; one curve for the edited ground truth and one for all the labeled syllables.

DESCRIPTION OF THE FILES
Our paper mentions the 692 syllables annotated by the domain expert. Instead of using these 692 syllables as ground truth, we applied a data editing technique. This produced a set of 108 syllables, which we used as the GROUND TRUTH for our experiments.
1. editMatrix.txt contains the result of editing the 692 annotated syllables. Columns 2, 3, 4 and 5 give the number of the syllable added to the ground truth, its class label, the total number of syllables classified using the edited ground truth, and the accuracy rate.
2. dis692.txt contains the GHT distances of the 692 annotated syllables.
3. label692.txt contains the class labels of the 692 syllables.
groundtruth.zip contains the set of 692 syllables and the set of 108 syllables mentioned in our paper.

MOTIF DISCOVERY
Run findMotif.m to find motifs in a vocalization.
Instructions: in findMotif.m, change the locations of the folders that will contain the motifs, the .wav file, the list of syllables and the labels of the syllables. Before running the code, also create folders such as …/motif/6 and …/motif/7. These folders will contain motifs of length 6, 7, and so on. motif.zip contains motifs from the attached .wav file.
Figure panels: 194.8 to 195.2 sec; 944.7 to 945.2 sec.
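The manual does not describe the algorithm inside findMotif.m. The sketch below makes a simplifying assumption: a motif of length L is an exact substring of the syllable label sequence that occurs at least twice. The labels are written one character per syllable, as in the query strings used for similarity search. findRepeats is a hypothetical name.

```matlab
function motifs = findRepeats(labels, L)
% Hypothetical sketch: exact repeated substrings as motifs.
% labels: char vector with one character per syllable class label.
% Returns, in sorted order, every length-L substring that occurs at least
% twice (overlapping occurrences are counted).
counts = containers.Map('KeyType', 'char', 'ValueType', 'double');
for i = 1:numel(labels) - L + 1
    key = labels(i:i+L-1);
    if isKey(counts, key)
        counts(key) = counts(key) + 1;
    else
        counts(key) = 1;
    end
end
allKeys = keys(counts);                      % containers.Map keys are sorted
isRepeat = cellfun(@(k) counts(k) >= 2, allKeys);
motifs = allKeys(isRepeat);
end
```

The per-length output folders mentioned above can be created in a loop before running the code. For example, with motifRoot a hypothetical variable holding the elided path: for L = 6:7, mkdir(fullfile(motifRoot, 'motif', num2str(L))); end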

Clustering mice vocalizations
Run clusterMtf.m to cluster motifs from mice vocalizations. The folder ‘dendo_mice’ contains all the files required to generate the dendrograms of Figures 12 and 13.

Similarity search / Query by content
Some additional results are attached here.
QUERY: ddqd (‘q’ means unknown class). The 10 nearest neighbors (10 NN) from four vocalizations are presented.

Similarity search / Query by content
Some additional results are attached here.
QUERY: qaiaiacia (‘q’ means unknown class). The 10 nearest neighbors (10 NN) from four vocalizations are presented.
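The slides show queries in which ‘q’ marks an unknown class but do not state the distance measure. The sketch below illustrates one plausible ranking:
- Slide the query along a vocalization's label sequence.
- Score each window by Hamming distance, treating ‘q’ in the query as a wildcard.
- Return the k closest windows.
Both the distance and the function name knnQuery are assumptions.

```matlab
function [pos, d] = knnQuery(seq, query, k)
% Hypothetical sketch of query by content over a label sequence.
% seq, query: char vectors of syllable labels; 'q' in the query matches any
% label (assumption). Returns the start positions of the k windows of seq
% with the smallest Hamming distance to the query, and those distances.
L = numel(query);
n = numel(seq) - L + 1;
if n < 1
    pos = [];
    d = [];
    return;
end
known = query ~= 'q';                 % positions that must match
d = zeros(n, 1);
for i = 1:n
    w = seq(i:i+L-1);
    d(i) = sum(w(known) ~= query(known));
end
[d, order] = sort(d, 'ascend');       % MATLAB's sort keeps ties in original order
k = min(k, n);
pos = order(1:k);
d = d(1:k);
end
```

For the slides' setting, [pos, d] = knnQuery(seq, 'ddqd', 10) would be run once per vocalization.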

Motif Significance
Run mtfSgnfnc.m to assess the significance of motifs based on their z-scores. The folder ‘../mtfSgnfcn’ contains all the files required to generate the plot of Figure 17.
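The manual does not describe the null model behind the z-scores. A common choice, assumed here, is a permutation null model:
- Shuffle the label sequence many times, which keeps the label frequencies but destroys their order.
- Count the motif in each shuffle.
- Standardize the observed count as z = (observed - mean) / std.
motifZScore is a hypothetical name and is not the code in mtfSgnfnc.m.

```matlab
function [z, obs, nullCounts] = motifZScore(seq, motif, nShuffles)
% Hypothetical sketch: z-score of a motif's count under a permutation null.
% seq: char vector of syllable labels; motif: char vector to test.
obs = numel(strfind(seq, motif));          % observed overlapping occurrences
nullCounts = zeros(nShuffles, 1);
for s = 1:nShuffles
    shuffled = seq(randperm(numel(seq)));  % same label frequencies, order destroyed
    nullCounts(s) = numel(strfind(shuffled, motif));
end
% If every shuffle gives the same count, std is 0 and z is Inf or NaN.
z = (obs - mean(nullCounts)) / std(nullCounts);
end
```

A large positive z means the motif occurs far more often than label frequencies alone would predict.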

Contrast sets
createContrastset.m is used to create the contrast sets, and contratset.m is used to extract the patterns in the contrast sets from a vocalization. The folder ‘../contrastSet’ contains some examples of the contrast sets mentioned in our paper, as well as the files needed by createContrastset.m. ‘contrastset.txt’ contains the list of substrings sorted in descending order of information gain.
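The ranking in contrastset.txt uses information gain, but the manual does not give the formula. The sketch below uses the standard definition, assuming:
- two groups of vocalizations, each written as a label string;
- a substring splits all strings into those that contain it and those that do not;
- the gain is the drop in class entropy, in bits, caused by that split.
infoGain and the group variables are hypothetical.

```matlab
function ig = infoGain(substr, groupA, groupB)
% Hypothetical sketch: information gain (bits) of splitting two groups of
% label strings by whether each string contains substr.
% groupA, groupB: cell arrays of char vectors.
inA = cellfun(@(s) ~isempty(strfind(s, substr)), groupA);
inB = cellfun(@(s) ~isempty(strfind(s, substr)), groupB);
nA = numel(groupA);  nB = numel(groupB);  n = nA + nB;
pa = sum(inA);  pb = sum(inB);      % strings that contain substr
qa = nA - pa;   qb = nB - pb;       % strings that do not
ig = binEntropy(nA, nB) ...
     - (pa + pb) / n * binEntropy(pa, pb) ...
     - (qa + qb) / n * binEntropy(qa, qb);
end

function h = binEntropy(a, b)
% Entropy (bits) of a two-class distribution with counts a and b,
% using the convention 0*log2(0) = 0.
h = 0;
n = a + b;
if n == 0
    return;
end
for c = [a, b]
    if c > 0
        p = c / n;
        h = h - p * log2(p);
    end
end
end
```

To obtain a list like contrastset.txt, compute the gain of each candidate substring and order the candidates with sort(gains, 'descend').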

Questions or comments? Email jzaka001@cs.ucr.edu
