Using Blackboard Systems for Polyphonic Transcription

A Literature Review by Cory McKay

Outline
  • Intro to polyphonic transcription
  • Intro to blackboard systems
  • Keith Martin’s work
  • Kunio Kashino’s work
  • Recent contributions
  • Conclusion
Polyphonic Transcription
  • Represent an audio signal as a score
  • Must segregate notes belonging to different voices
  • Problems: variations of timbre within a voice, voice crossing, identification of correct octave
  • No successful general purpose system to date
Polyphonic Transcription
  • Can use simplified models:
    • Music for a single instrument (e.g. piano)
    • Extract only a given instrument from mix
    • Use music which obeys restrictive rules
  • Simplified systems have had success rates of between 80% and 90%
  • These rates may be exaggerated, since only very limited test suites are generally used
Polyphonic Transcription
  • Systems to date generally identify only rhythm, pitch and voice
  • Would like systems that also identify other notated aspects such as dynamics and vibrato
  • Ideal is to have system that can identify and understand parameters of music that humans hear but do not notate
Blackboard Systems
  • Used in AI for decades but only applied to music transcription in the early 1990s
  • Term “blackboard” comes from notion of a group of experts standing around a blackboard working together to solve a problem
  • Each expert writes contributions on blackboard
  • Experts watch problem evolve on blackboard, making changes until a solution is reached
Blackboard Systems
  • “Blackboard” is a central dataspace
  • Usually arranged in hierarchy so that input is at lowest level and output is at highest
  • “Experts” are called “knowledge sources”
  • KSs generally consist of a set of heuristics and a precondition whose satisfaction results in a hypothesis that is written on blackboard
  • Each KS forms hypotheses based on information from front end of system and hypotheses presented by other KSs
Blackboard Systems
  • Problem is solved when all KSs are satisfied with all hypotheses on blackboard to within a given margin of error
  • Eliminates need for global control module
  • Each KS can be easily updated and new KSs can be added with little difficulty
  • Combines top-down and bottom-up processing
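The structure described on these slides can be sketched in a few lines. This is only a minimal illustration, not any of the reviewed systems: the names `Blackboard`, `KnowledgeSource` and `run`, the level names, and the toy harmonic-grouping heuristic are all assumptions made for the example. The loop stops when a full pass over the knowledge sources adds nothing new, which stands in for "all KSs are satisfied".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Blackboard:
    """Central dataspace: hypotheses grouped by hierarchy level."""
    levels: Dict[str, list] = field(default_factory=dict)

    def post(self, level: str, hypothesis) -> bool:
        """Write a hypothesis; return True only if it was not already there."""
        entries = self.levels.setdefault(level, [])
        if hypothesis in entries:
            return False
        entries.append(hypothesis)
        return True


@dataclass
class KnowledgeSource:
    """An 'expert': a precondition plus an action that posts hypotheses."""
    name: str
    precondition: Callable[[Blackboard], bool]
    action: Callable[[Blackboard], bool]


def run(board: Blackboard, sources: List[KnowledgeSource]) -> int:
    """Fire KSs in turn until a full pass changes nothing; return pass count."""
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for ks in sources:
            if ks.precondition(board) and ks.action(board):
                changed = True
    return passes


def harmonic_precondition(board: Blackboard) -> bool:
    # Only worth firing when at least two partials are on the blackboard.
    return len(board.levels.get("partials", [])) >= 2


def harmonic_action(board: Blackboard) -> bool:
    # Toy heuristic (assumption): if every partial lies near an integer
    # multiple of the lowest one, hypothesize a note at that frequency.
    partials = sorted(board.levels["partials"])
    f0 = partials[0]
    if all(abs(p / f0 - round(p / f0)) < 0.03 for p in partials):
        return board.post("notes", f0)
    return False


board = Blackboard({"partials": [440.0, 880.0, 1320.0]})
note_ks = KnowledgeSource("harmonic grouping", harmonic_precondition, harmonic_action)
passes = run(board, [note_ks])
print(board.levels["notes"], passes)
```

New knowledge sources can be appended to the list passed to `run` without touching the existing ones, which is the modularity the slides point to.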
Blackboard Systems
  • Music has a naturally hierarchical structure that lends itself well to blackboard systems
  • Allow integration of different types of expertise:
    • signal processing KSs at low level
    • human perception KSs at middle level
    • musical knowledge KSs at upper level
Blackboard Systems
  • Limitation: giving upper level KSs too much specialized knowledge and influence limits generality of transcription systems
  • Ideal system would not use knowledge above the level of human perception and the most rudimentary understanding of music
  • Current trend is to increase significance of upper-level musical KSs in order to increase success rate
Keith Martin (1996 a)
  • “A Blackboard System for Automatic Transcription of Simple Polyphonic Music”
  • Used a blackboard system to transcribe a four-voice Bach chorale with appropriate segregation of voices
  • Limited input signal to synthesized piano performances
  • Gave system only rudimentary musical knowledge, although the choice of a Bach chorale allowed lower-level KSs to rely on assumptions that would not hold for music in general
Keith Martin (1996 a)
  • Front-end system used short-time Fourier transform on input signal
  • Equivalent to a filter bank that is a gross approximation of the way the human cochlea processes auditory signals
  • Blackboard system fed sets of associated onset times, frequencies and amplitudes
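A rough sketch of such a front end follows. It is not Martin's implementation: the function name `stft_peaks`, the Hann window, the frame size, hop and threshold are assumptions chosen for illustration, and the frame start time is used as a stand-in for an onset time. A naive DFT is used so that the example needs only the standard library.

```python
import cmath
import math


def stft_peaks(signal, sample_rate, frame_size=64, hop=32, threshold=0.1):
    """Return (frame_time_s, frequency_hz, amplitude) for each spectral peak."""
    # Periodic Hann window to reduce leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    peaks = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT of the windowed frame (non-negative frequencies only).
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc) / (frame_size / 2))
        # Keep local maxima that rise above the threshold.
        for k in range(1, len(mags) - 1):
            if mags[k] > threshold and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
                peaks.append((start / sample_rate,
                              k * sample_rate / frame_size,
                              mags[k]))
    return peaks


sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(128)]
peaks = stft_peaks(tone, sample_rate)
for time_s, freq_hz, amp in peaks:
    print(f"t={time_s:.3f}s f={freq_hz:.0f}Hz amp={amp:.2f}")
```

Each tuple is the kind of time, frequency and amplitude triple that could be posted to the lowest level of a blackboard.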
Keith Martin (1996 a)
  • Knowledge sources made five classes of hierarchically organized hypotheses:
    • “Tracks”
    • Partials
    • Notes
    • Intervals
    • Chords
Keith Martin (1996 a)
  • Three types of knowledge sources:
    • Garbage collection
    • Physics
    • Musical practice
  • Thirteen knowledge sources in all
  • Each KS only authorized to make certain classes of hypotheses
Keith Martin (1996 a)
  • KSs with access to upper-level hypotheses can put “pressure” on KSs with lower-level access to make certain hypotheses and vice versa
  • Example: if the hypotheses have been made that the notes C and G are present in a beat, a KS with information about chords might put forward the hypothesis that there is a C chord, thus putting pressure on other KSs to find an E or Eb.
  • Used a sequential scheduler to coordinate KSs
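The C and G example can be made concrete with a small sketch. The function `chord_pressure` and the restriction to major and minor triads are assumptions for illustration; Martin's actual knowledge sources are not reproduced here. Given the pitch classes already hypothesized, it lists the triads that are one note short of complete and names the note each one would ask lower-level KSs to look for.

```python
# Pitch classes: C=0, C#=1, ..., B=11.
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
TRIADS = {"major": (0, 4, 7), "minor": (0, 3, 7)}


def chord_pressure(observed):
    """Map each nearly complete triad to the single note it is missing."""
    observed = set(observed)
    expectations = {}
    for root in range(12):
        for quality, intervals in TRIADS.items():
            chord = {(root + step) % 12 for step in intervals}
            missing = chord - observed
            # Apply pressure only when the observed notes fit the chord
            # and exactly one chord tone has not been found yet.
            if observed <= chord and len(missing) == 1:
                name = f"{NOTE_NAMES[root]} {quality}"
                expectations[name] = [NOTE_NAMES[pc] for pc in missing]
    return expectations


# C and G have been hypothesized in the current beat.
pressure = chord_pressure([0, 7])
print(pressure)
```

With C and G on the blackboard, the only candidates are C major and C minor, so the pressure is to find an E or an Eb.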
Keith Martin (1996 b)
  • “Automatic Transcription of Simple Polyphonic Music: Robust Front End Processing”
  • Previous system often misidentified octaves
  • Attempted to improve performance by shifting octave identification task from a top-down process to a bottom-up process
Keith Martin (1996 b)
  • Proposes the use of log-lag correlograms in front end
  • Models the inner hair cells in the cochlea with a bank of filters
  • Determines pitch by measuring the periodic energy in each filter channel as a function of lag
  • Correlograms now basic unit fed to blackboard system
  • No definitive results as to which approach is better
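The idea of measuring periodic energy per channel as a function of lag can be sketched as follows. This is a heavy simplification and not Martin's front end: the channels are assumed to be already band-filtered (two sinusoids stand in for filter outputs), the lag axis is linear rather than logarithmic, and the names `estimate_pitch`, `f_min` and `f_max` are hypothetical. Each channel is half-wave rectified, autocorrelated, and the per-channel results are summed before picking the strongest lag.

```python
import math


def half_wave(x):
    """Crude stand-in for inner hair cell rectification."""
    return [max(v, 0.0) for v in x]


def channel_autocorrelation(x, lags, n_terms):
    """Periodic energy of one channel at each candidate lag."""
    return [sum(x[n] * x[n + lag] for n in range(n_terms)) for lag in lags]


def estimate_pitch(channels, sample_rate, f_min=120.0, f_max=500.0):
    """Sum the per-channel autocorrelations and return the best pitch in Hz."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    lags = list(range(min_lag, max_lag + 1))
    # Use the same number of terms at every lag so lags are comparable.
    n_terms = min(len(c) for c in channels) - max_lag
    summary = [0.0] * len(lags)
    for channel in channels:
        rectified = half_wave(channel)
        for i, value in enumerate(channel_autocorrelation(rectified, lags, n_terms)):
            summary[i] += value
    best = max(range(len(lags)), key=lambda i: summary[i])
    return sample_rate / lags[best]


sample_rate = 8000
n_samples = 466  # 400 summed terms plus the largest lag (66 samples)
# Two channels carrying the first and second harmonic of a 200 Hz tone.
channels = [
    [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(n_samples)],
    [0.5 * math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(n_samples)],
]
pitch = estimate_pitch(channels, sample_rate)
print(pitch)
```

The 400 Hz channel alone is also periodic at half the lag, but only the lag of the 200 Hz fundamental is supported by both channels, which is how this kind of representation helps resolve octave ambiguity from the bottom up.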
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • “Application of Bayesian Probability Networks to Music Scene Analysis”
  • Work slightly preceded that of Martin
  • Used test patterns involving more than one instrument
  • Uses principles of stream segregation from auditory scene analysis
  • Implements more high-level musical knowledge
  • Uses Bayesian network instead of Martin’s simple scheduler to coordinate KSs
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • Knowledge sources used:
    • Chord transition dictionary
    • Chord-note relation
    • Chord naming rules
    • Tone memory
    • Timbre models
    • Human perception rules
  • Used very specific instrument timbres and musical rules, so has limited general applicability
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • Tone memory: frequency components of different instruments played with different parameters
  • Found that the integration of tone memory with the other KSs greatly improved success rates
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • Bayesian networks well known for finding good solutions despite noisy input or missing data
  • Often used in implementing learning methods that trade off prior belief in a hypothesis against its agreement with current data
  • Therefore seem to be a good choice for coordinating KSs
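The trade-off between prior belief and agreement with current data is Bayes' rule. The sketch below applies it to a single hypothesis rather than to a full network, and all numbers are invented for illustration: a prior of 0.7 that a note is in the lower octave (say, from chord context) is weighed against spectral evidence that fits the upper octave better.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a single binary hypothesis."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Hypothetical numbers: context favours the lower octave (prior 0.7),
# but the observed spectrum is twice as likely under the upper octave.
belief = posterior(prior=0.7, likelihood_if_true=0.4, likelihood_if_false=0.8)
print(round(belief, 3))
```

The belief drops from 0.7 to about 0.54: the data pulls against the prior without overriding it, which is the behaviour that makes this kind of integration tolerant of noisy input.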
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • No experimental comparisons of this approach and Martin’s simple scheduler
  • Only used simple test patterns rather than real music
Kashino and Hagita (1996)
  • “A Music Scene Analysis System with the MRF-Based Information Integration Scheme”
  • Suggests replacing Bayesian networks with Markov Random Field hypothesis network
  • Successful in correcting two most common problems in previous system:
    • Misidentification of instruments
    • Incorrect octave labelling
Kashino and Hagita (1996)
  • MRF-based networks use simulated annealing to converge to a low-energy state
  • MRF approach enables information to be integrated on a multiply connected hypothesis network
  • Bayesian networks only allow singly connected networks
  • Could now deal with two kinds of transition information within a single hypothesis network:
    • chord transitions
    • note transitions
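A toy version of annealing on a multiply connected hypothesis network is sketched below. It is not Kashino and Hagita's system: the three nodes, the octave labels, the cost values, the cooling schedule and the function names are all assumptions for illustration. Each node has a cost per label from local evidence, each edge adds a cost when its two nodes disagree, and the three edges form a loop.

```python
import math
import random


def energy(state, unary, edges, jump_cost):
    """Local evidence cost plus a penalty for each disagreeing edge."""
    total = sum(unary[node][label] for node, label in state.items())
    total += sum(jump_cost for a, b in edges if state[a] != state[b])
    return total


def anneal(unary, edges, jump_cost=0.5, steps=2000, t_start=2.0,
           cooling=0.995, seed=0):
    """Simulated annealing; returns the lowest-energy state encountered."""
    rng = random.Random(seed)
    # Start from each node's locally cheapest label.
    state = {node: min(costs, key=costs.get) for node, costs in unary.items()}
    current = energy(state, unary, edges, jump_cost)
    best_state, best_energy = dict(state), current
    temperature = t_start
    nodes = list(unary)
    for _ in range(steps):
        node = rng.choice(nodes)
        old_label = state[node]
        state[node] = rng.choice([l for l in unary[node] if l != old_label])
        candidate = energy(state, unary, edges, jump_cost)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if current < best_energy:
                best_state, best_energy = dict(state), current
        else:
            state[node] = old_label
        temperature *= cooling
    return best_state, best_energy


# Hypothetical octave hypotheses (3 or 4) for three notes in a loop.
unary = {
    "n1": {3: 0.2, 4: 1.0},
    "n2": {3: 0.9, 4: 0.8},  # local evidence slightly prefers octave 4
    "n3": {3: 0.1, 4: 1.2},
}
edges = [("n1", "n2"), ("n2", "n3"), ("n1", "n3")]
best_state, best_energy = anneal(unary, edges)
print(best_state, round(best_energy, 2))
```

Taken alone, `n2` would be labelled octave 4, but the two neighbours connected to it make the all-octave-3 labelling the low-energy state, a small analogue of correcting an octave error through context.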
Kashino and Hagita (1996)
  • Instrument and octave identification errors corrected, but some new errors introduced
  • Overall, performed roughly 10% better than Bayesian-based system at transcribing 3-part arrangement of Auld Lang Syne
  • Still only had a recognition rate of 71.7%
Kashino and Murase (1998)
  • Shifts some work away from blackboard system by feeding it higher-level information
  • Simplifies and mathematically formalizes notion of knowledge sources
  • Switches back to Bayesian network
  • Perhaps not truly a blackboard system anymore
  • Has very good recognition rate
  • Scalability of system is seriously compromised by new approach
Kashino and Murase (1998)
  • Uses adaptive template matching
  • Implemented using a bank of filters arranged in parallel and a number of templates corresponding to particular notes played by particular instruments
  • The correlation between the outputs of the filters is calculated and a match is then made to one of the templates
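The matching step can be illustrated with a minimal sketch. It leaves out the adaptive part of adaptive template matching, and the template values, the four-channel filter bank and the names `normalized_correlation` and `best_template` are invented for the example. Each template is keyed by an (instrument, note) pair and holds the expected output level of each filter.

```python
import math


def normalized_correlation(a, b):
    """Cosine similarity between two vectors of filter output levels."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_template(filter_outputs, templates):
    """Return the (instrument, note) key whose template correlates best."""
    return max(templates,
               key=lambda key: normalized_correlation(filter_outputs, templates[key]))


# Hypothetical templates: relative energy in four filter channels.
templates = {
    ("piano", "C4"): [1.0, 0.5, 0.25, 0.1],
    ("flute", "C4"): [1.0, 0.1, 0.05, 0.0],
    ("violin", "C4"): [0.6, 0.8, 0.7, 0.5],
}
observed = [0.9, 0.15, 0.05, 0.02]
match = best_template(observed, templates)
print(match)
```

Every observation is compared against every template, so the cost grows with the number of instruments and notes covered, and templates with similar spectra become hard to tell apart.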
Kashino and Murase (1998)
  • Achieved recognition rate of 88.5% on real recordings of piano, violin and flute
  • Including templates for many more instruments could make adaptive template matching intractable
  • Particularly a problem for instruments with
    • Similar frequency spectra
    • A great deal of spectral variation from note to note
Hainsworth and Macleod (2001)
  • “Automatic Bass Line Transcription from Polyphonic Music”
  • Wanted to be able to extract a single given instrument from an arbitrary musical signal
  • Contrasts with previous approaches, which used recordings of only one instrument or of a set of pre-defined instruments
Hainsworth and Macleod (2001)
  • Chose to work with bass
    • Can filter out high frequencies
    • Notes usually fairly steady
  • Used simple mathematical relations to trim hypotheses rather than a true blackboard system
  • Had a 78.7% success rate on a Miles Davis recording
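Filtering out high frequencies before looking for the bass line can be sketched with a very simple low-pass filter. The slides do not say which filter Hainsworth and Macleod used, so the one-pole design, the 200 Hz cutoff and the function names below are assumptions for illustration only.

```python
import math


def one_pole_lowpass(signal, sample_rate, cutoff_hz):
    """First-order recursive low-pass filter (exponential smoothing)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


sample_rate = 8000
# One second each of a bass-range tone and a much higher tone.
bass = [math.sin(2 * math.pi * 50 * n / sample_rate) for n in range(sample_rate)]
treble = [math.sin(2 * math.pi * 2000 * n / sample_rate) for n in range(sample_rate)]
bass_rms = rms(one_pole_lowpass(bass, sample_rate, 200.0))
treble_rms = rms(one_pole_lowpass(treble, sample_rate, 200.0))
print(round(bass_rms, 2), round(treble_rms, 2))
```

The 50 Hz tone passes almost unchanged while the 2000 Hz tone is strongly attenuated, which removes much of the material that competes with the bass.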
Bello and Sandler (2000)
  • “Blackboard Systems and Top-Down Processing for the Transcription of Simple Polyphonic Music”
  • Return to a true blackboard system
  • Based on Martin’s implementation, using a conventional scheduler
  • Refines knowledge sources and adds high-level musical knowledge
  • Implements one of the knowledge sources as a neural network
Bello and Sandler (2000)
  • The chord recognizer KS is a feedforward network
  • Trained using spectrograms of different piano chords
  • Trained network is fed a spectrogram and outputs possible chords
  • Can therefore output more than one hypothesis at each iteration
  • Gives other KSs more information and allows parallel exploration of solution space
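How a network can return several chord hypotheses at once is sketched below. This is not Bello and Sandler's network: the weights are set by hand as stand-ins for trained values, the input is a 12-element pitch-class energy vector rather than a full spectral frame, and the three chords, the threshold and all names are assumptions for illustration.

```python
import math

CHORDS = {"C major": (0, 4, 7), "C minor": (0, 3, 7), "G major": (7, 11, 2)}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-set stand-ins for trained weights (assumption, not learned values).
# Hidden unit j responds to energy on the tones of chord j.
W1 = [[4.0 if pc in tones else -4.0 for pc in range(12)]
      for tones in CHORDS.values()]
B1 = [-6.0] * len(CHORDS)
# Output unit i reads hidden unit i.
W2 = [[6.0 if i == j else 0.0 for j in range(len(CHORDS))]
      for i in range(len(CHORDS))]
B2 = [-3.0] * len(CHORDS)


def chord_hypotheses(pitch_class_energy, threshold=0.5):
    """Forward pass; return every chord whose output exceeds the threshold."""
    hidden = layer(pitch_class_energy, W1, B1)
    outputs = layer(hidden, W2, B2)
    return [name for name, score in zip(CHORDS, outputs) if score > threshold]


# Energy on C and G only: the third of the chord is not yet visible.
frame = [0.0] * 12
frame[0] = 1.0
frame[7] = 1.0
hypotheses = chord_hypotheses(frame)
print(hypotheses)
```

Because each output is thresholded independently, an ambiguous frame yields both C major and C minor, and both can be posted to the blackboard for other KSs to confirm or reject.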
Bello and Sandler (2000)
  • Could automatically retrain network to recognize spectrograms of other instruments with no manual modifications needed
  • Preliminary testing showed tendency to misidentify octaves and make incorrect identification of note onsets
  • These problems could potentially be corrected by signal processing system that feeds blackboard system
Conclusions
  • Bass transcription system and more recent work of Kashino useful for specific applications, but limited potential for general transcription purposes
  • True blackboard approach scales well and appears to hold the most potential for general-purpose polyphonic transcription
Conclusions
  • Use of adaptive learning in knowledge sources seems promising
  • Interchangeable modules could be automatically trained to specialize in different areas
  • Could have semi-automatic transcription, where user chooses correct modules and system performs transcription using them