The model by Ferrand, Nelson, and Wiggins uses entropy to predict segmentation points in melodies, applying N-gram models to pitch and duration features. Duration features show higher entropy variance than pitch features, suggesting that they carry more information for segmentation. The discussion notes that N-gram models may be over-simplified for music sequences, although listeners' ability to relate distant events is also limited.
A Probabilistic Model for Melody Segmentation By Miguel Ferrand, Peter Nelson, and Geraint Wiggins
Outline • Overview of the model • N-gram models and entropy • A case study • Comparison with an experiment using real listeners • Discussion
Overview • A probabilistic approach to predicting segmentation boundaries in melodies • No knowledge of music theory is used in this model; it is a purely mathematical method • Uses entropy as a measure of the unpredictability of musical features • Hypothesis: segmentation boundaries will appear where entropy changes
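N-gram Models (1) • N-gram grammar ((n − 1)th-order Markov model): the probability of occurrence of a symbol depends on the prior occurrence of n − 1 other symbols. • The probability of a sequence $s = w_1 \ldots w_l$ of length $l$ is approximated as $P(s) \approx \prod_{i=1}^{l} P(w_i \mid w_{i-n+1}^{i-1})$, where $w_i^j$ denotes $w_i \ldots w_j$ and $n$ is the N-gram order.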
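The sequence probability on this slide can be sketched with relative-frequency (maximum-likelihood) counts. This is a minimal sketch under stated assumptions: the corpus is a list of symbol sequences, the function names are hypothetical, and the first n − 1 symbols are skipped rather than padded (one of several possible conventions).

```python
from collections import Counter
from math import prod

def train_ngram_counts(corpus, n):
    """Count n-grams and their (n-1)-symbol contexts (hypothetical helper)."""
    ngrams, contexts = Counter(), Counter()
    for seq in corpus:
        for i in range(n - 1, len(seq)):
            ctx = tuple(seq[i - n + 1:i])      # w_{i-n+1} .. w_{i-1}
            ngrams[ctx + (seq[i],)] += 1
            contexts[ctx] += 1
    return ngrams, contexts

def sequence_probability(seq, ngrams, contexts, n):
    """P(s) as the product of P(w_i | w_{i-n+1}^{i-1}) with relative-frequency estimates.

    Symbols without a full (n-1)-symbol context are skipped; an unseen context
    or transition contributes probability 0.
    """
    probs = []
    for i in range(n - 1, len(seq)):
        ctx = tuple(seq[i - n + 1:i])
        count_ctx = contexts[ctx]
        probs.append(ngrams[ctx + (seq[i],)] / count_ctx if count_ctx else 0.0)
    return prod(probs)
```

With unsmoothed counts, a single unseen transition makes the whole product zero, which is exactly the data-sparseness problem.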
N-gram Models (2) • Problems: • Data sparseness: some P(wi | …) = 0 • Longer sequences have lower counts when the training corpus is small • Solution: linear interpolation smoothing. For a tri-gram model, for example: $\hat{P}(w_k \mid w_{k-2}, w_{k-1}) = \lambda_1 P(w_k) + \lambda_2 P(w_k \mid w_{k-1}) + \lambda_3 P(w_k \mid w_{k-2}, w_{k-1})$, where $\lambda_1 + \lambda_2 + \lambda_3 = 1$ and $\lambda_1 < \lambda_2 < \lambda_3$
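The interpolated tri-gram estimate on this slide can be sketched as a small class. Assumptions: each component is a relative-frequency estimate, the class name is hypothetical, and the weights (0.1, 0.3, 0.6) are illustrative values chosen only to satisfy λ1 + λ2 + λ3 = 1 and λ1 < λ2 < λ3; in practice such weights are usually tuned on held-out data.

```python
from collections import Counter

class InterpolatedTrigram:
    """Linear interpolation of uni-, bi- and tri-gram estimates (hypothetical helper)."""

    def __init__(self, corpus, lambdas=(0.1, 0.3, 0.6)):
        if abs(sum(lambdas) - 1.0) > 1e-9:
            raise ValueError("lambdas must sum to 1")
        self.lambdas = lambdas
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.bi_ctx, self.tri_ctx = Counter(), Counter()
        for seq in corpus:
            for i, w in enumerate(seq):
                self.uni[w] += 1
                if i >= 1:                                  # bi-gram (w_{k-1}, w_k)
                    self.bi[(seq[i - 1], w)] += 1
                    self.bi_ctx[seq[i - 1]] += 1
                if i >= 2:                                  # tri-gram (w_{k-2}, w_{k-1}, w_k)
                    self.tri[(seq[i - 2], seq[i - 1], w)] += 1
                    self.tri_ctx[(seq[i - 2], seq[i - 1])] += 1
        self.total = sum(self.uni.values())

    def prob(self, w, prev2, prev1):
        """Smoothed P(w | prev2, prev1), where prev2 = w_{k-2} and prev1 = w_{k-1}."""
        l1, l2, l3 = self.lambdas
        p1 = self.uni[w] / self.total
        # Unseen contexts fall back to 0 for that component; the uni-gram term keeps P > 0.
        p2 = self.bi[(prev1, w)] / self.bi_ctx[prev1] if self.bi_ctx[prev1] else 0.0
        ctx = (prev2, prev1)
        p3 = self.tri[ctx + (w,)] / self.tri_ctx[ctx] if self.tri_ctx[ctx] else 0.0
        return l1 * p1 + l2 * p2 + l3 * p3
```

Because every component is a proper distribution (or zero for an unseen context), the smoothed estimates over the vocabulary still sum to 1 whenever the context has been seen.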
Entropy • For an N-gram model M, the entropy associated with a context c is $H_c(M) = -\sum_{e} P(e \mid c) \log_2 P(e \mid c)$, where e ranges over all possible successor symbols of c and P(e | c) is calculated with the linear interpolation smoothing method. • Low entropy usually means high predictability.
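A minimal sketch of the context entropy, assuming base-2 logarithms (entropy in bits) and a probability function supplied by the caller, for example a smoothed N-gram estimate. The name `context_entropy` and its parameters are hypothetical.

```python
from math import log2

def context_entropy(prob, context, vocabulary):
    """H_c(M) = -sum_e P(e | c) * log2 P(e | c) over all candidate successors e.

    prob: callable (e, context) -> P(e | c).
    Terms with P(e | c) = 0 are skipped, following the convention 0 * log 0 = 0.
    """
    h = 0.0
    for e in vocabulary:
        p = prob(e, context)
        if p > 0:
            h -= p * log2(p)
    return h
```

For a context after which four symbols are equally likely, the entropy is 2 bits; a context with a single certain successor has entropy 0, the most predictable case.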
A case study (1) • Deliège’s experiment • Subjects listened to a melody and had to identify segmentation points in real time (the solo for English horn from Wagner's Tristan and Isolde was used) • Subjects included both musically trained and untrained listeners. • Eight main segment boundaries were found
A case study (2) • Translate melody information into an event-based representation • Pitch Step (PS): interval distance to the following event, in semitones • Pitch Contour (PC): the sign of PS, in {-1, 0, +1} • Duration Ratio (DR): the ratio between the durations of the present and following events • Duration Contour (DC): the direction of duration change given by DR; -1 if DR > 1, +1 if DR < 1, 0 if DR = 1
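The four event-based features can be sketched as follows. Assumptions, not taken from the slides: notes are (MIDI pitch, duration) pairs, PS is the following pitch minus the present pitch (positive means ascending), and DR is the present duration divided by the following duration, which matches the DC rule (DR > 1, i.e. the next note is shorter, gives -1). `Fraction` keeps the DR = 1 test exact; the function name is hypothetical.

```python
from fractions import Fraction

def sign(x):
    """Return -1, 0 or +1."""
    return (x > 0) - (x < 0)

def extract_features(notes):
    """notes: list of (midi_pitch, duration) pairs (hypothetical input format).

    Returns the PS, PC, DR and DC sequences, one entry per event that has a
    following event.
    """
    ps, pc, dr, dc = [], [], [], []
    for (p0, d0), (p1, d1) in zip(notes, notes[1:]):
        step = p1 - p0                         # Pitch Step in semitones
        ratio = Fraction(d0) / Fraction(d1)    # assumed: present / following duration
        ps.append(step)
        pc.append(sign(step))                  # Pitch Contour
        dr.append(ratio)                       # Duration Ratio
        dc.append(-1 if ratio > 1 else 1 if ratio < 1 else 0)  # Duration Contour
    return ps, pc, dr, dc
```

For example, a quarter note on C4 followed by two eighth notes on D4 and a quarter note on A3 yields PS = [2, 0, -3], PC = [1, 0, -1], DR = [2, 1, 1/2] and DC = [-1, 0, 1].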
A case study (4) • Tri-gram, bi-gram and uni-gram models were generated for PS, PC, DR and DC. • The standard deviation of entropy was calculated over a sliding window (size = 10) • Results
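The sliding-window statistic can be sketched as below, applied to a sequence of per-event entropy values. Assumptions: the slides do not say whether the population or sample standard deviation was used, nor how the window is aligned, so this sketch uses the population form (`statistics.pstdev`) and windows that start at each event. `sliding_std` is a hypothetical name.

```python
from statistics import pstdev

def sliding_std(values, window=10):
    """Population standard deviation of every run of `window` consecutive values.

    Output index i covers values[i : i + window]; a sequence shorter than the
    window yields an empty list.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [pstdev(values[i:i + window]) for i in range(len(values) - window + 1)]
```

A flat stretch of entropy gives values near 0, while a window straddling a jump in entropy gives a larger value, so peaks in this curve mark regions where predictability changes.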
Results • Duration-based features show a much higher entropy variance than pitch-based features, which suggests that time-based features are likely to convey more information for segmentation. • Distinct changes in entropy coincided with the melody segment boundaries indicated by listeners.
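The slides do not state a rule for turning entropy changes into predicted boundaries. One simple hypothetical rule, shown here only as an illustration, flags every event where entropy jumps by at least a chosen threshold relative to the previous event; both `candidate_boundaries` and the threshold are assumptions.

```python
def candidate_boundaries(entropies, threshold):
    """Indices i where |H[i] - H[i-1]| >= threshold (hypothetical rule, not from the source)."""
    return [i for i in range(1, len(entropies))
            if abs(entropies[i] - entropies[i - 1]) >= threshold]
```

The resulting indices could then be compared against the listener-indicated boundaries; the threshold trades missed boundaries against spurious ones.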
Discussion • The N-gram model might be over-simplified for music sequences. • A state depends only on the previous n − 1 states. • However, human memory is not unlimited either. • The ability to establish long-span temporal relations is limited.