Teaching machines to appreciate music

Presentation Transcript


  1. Teaching machines to appreciate music. Classifying songs into three genres using a trusty Multi-Layer Perceptron. A Project by Chad Ostrowski & Curtis Reinking for EE 456 (Neural Networks) in the spring of 2009.

  2. Our genres-to-classify are Post-Rock, Folk, and Hip-Hop. Can you guess which is which?

  3. Let’s feed songs-as-data-arrays into an MLP and let it do its thing! The sample rate of the music files MATLAB insists on is 44100 Hz: good ol’ .wav (but the right kind of .wav, ACM Waveform, if you’re wondering; there are at least 4 kinds). A lot of our songs are over 6 minutes long: 6 minutes * 60 seconds/minute * 44100 samples/second = 15,876,000 samples, so we’d have an input layer of that size and hidden layers of, eh, 2e7 units. Worse: that size varies from song to song.
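
The arithmetic on that slide is easy to reproduce. Below is a minimal sketch of why raw samples make an unworkable input layer, written in Python with SciPy rather than the original MATLAB; "song.wav" is a hypothetical 44100 Hz file.

```python
from scipy.io import wavfile

# Hypothetical 6-minute ACM Waveform file; any 44100 Hz .wav illustrates the point.
rate, samples = wavfile.read("song.wav")
print(rate)                     # 44100 samples per second
print(samples.shape[0])         # ~15,876,000 samples for a 6-minute song
print(samples.shape[0] / rate)  # duration in seconds: different for every song,
                                # so even the input-layer size isn't fixed
```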

  4. Let us chop it up. Let us extract features. [Waveform plots: folk, hip hop] Features, anyone?

  5. The FFT (the Fast Fourier Transform, a quick way to compute the Discrete Fourier Transform) ought to be a good way for a computer to learn about the music it listens to. [FFT magnitude plots: hip hop, folk]
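
A rough sketch of that per-clip FFT step follows. The original work was done in MATLAB, so the clip position and length here are illustrative assumptions, not the project's actual values.

```python
import numpy as np

def clip_spectrum(samples, rate=44100, start_s=30.0, length_s=5.0):
    """Cut a short clip out of the song and return its one-sided magnitude spectrum."""
    start = int(start_s * rate)
    clip = samples[start:start + int(length_s * rate)]
    spectrum = np.abs(np.fft.rfft(clip))              # magnitude of each frequency bin
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / rate)  # bin frequencies in Hz
    return freqs, spectrum
```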

  6. It turns out double peaks are frightfully common. We mute them. [FFT plots with doubled peaks: hip hop, folk]
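
The slide doesn't say exactly how the double peaks were muted. One plausible reading is that when two spectral peaks sit almost on top of each other, only the taller one is kept; SciPy's find_peaks can express that suppression via its distance argument. The 50-bin threshold and the random stand-in spectrum below are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
spectrum = np.abs(np.fft.rfft(rng.standard_normal(44100)))  # stand-in for a clip's FFT

all_peaks, _ = find_peaks(spectrum)
kept_peaks, _ = find_peaks(spectrum, distance=50)  # peaks within 50 bins of a taller peak are dropped
print(len(all_peaks), "->", len(kept_peaks))
```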

  7. So all of our inputs are:
     - Size of the song
     - Means of the un-FFT-ed clips (that’s four inputs)
     - Means of the FFT-ed clips (that’s four more)
     - The average number of “big” peaks
     - Locations of the five tallest peaks in the FFT of each clip (twenty inputs)
     30 total inputs
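
Here is a sketch of how that 30-element vector could be assembled for one song. Only the feature counts come from the slide; the clip positions and lengths, the "big peak" threshold, and the use of the song's opening seconds are all assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def song_features(samples, rate=44100, n_clips=4, clip_s=5):
    """Build the 30-element input vector: 1 + 4 + 4 + 1 + 20 features."""
    clips = np.array_split(samples[: n_clips * clip_s * rate], n_clips)
    spectra = [np.abs(np.fft.rfft(c)) for c in clips]

    features = [len(samples)]                                  # size of the song (1)
    features += [np.mean(np.abs(c)) for c in clips]            # means of un-FFT-ed clips (4)
    features += [s.mean() for s in spectra]                    # means of FFT-ed clips (4)
    big = [len(find_peaks(s, height=s.max() * 0.5)[0]) for s in spectra]
    features.append(np.mean(big))                              # average number of "big" peaks (1)
    for s in spectra:
        features += list(np.argsort(s)[-5:])                   # locations of the 5 tallest peaks (20)
    return np.array(features, dtype=float)                     # 30 inputs total
```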

  8. How to output? We recall that one-hot output vectors give greater accuracy than a single output scalar, which would impose an arbitrary ordering and decision boundaries between the genres. 1 0 0 denotes folk, 0 1 0 denotes post-rock, 0 0 1 denotes hip-hop.
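
In code, the targets on the slide are just three one-hot vectors:

```python
targets = {
    "folk":      [1, 0, 0],
    "post-rock": [0, 1, 0],
    "hip-hop":   [0, 0, 1],
}
```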

  9. Drum-roll please!! We tested various network sizes, settling on 30x100x100x3. It crapped out on us.
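
For reference, a modern stand-in for that 30-100-100-3 network (scikit-learn rather than the 2009 MATLAB toolbox; the training data X_train, y_train is hypothetical):

```python
from sklearn.neural_network import MLPClassifier

# 30 inputs and 3 outputs are implied by the feature vector and the one-hot labels;
# the two hidden layers of 100 units come from the slide.
mlp = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=1000)
# mlp.fit(X_train, y_train)
# print(mlp.score(X_test, y_test))
```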

  10. (and that’s all we got) (so far)
