Progress Report - Year 2


  1. Progress Report - Year 2 Extensions of the PhD Symposium Presentation Daniel McEnnis

  2. Overview • Accomplishments • Data set acquisition and cleaning • Theoretical achievements • Graph-RAT improvements

  3. Current Data • 40’s Jazz Recordings • 2000 annotated recordings from 80 CDs • Covers nearly all 40’s popular music • LastFM by Song • Retrieves tag and user info by song • Data cleaning on user playcounts needed

  4. Planned Data Set Acquisition • Explored the DBTunes XML version of MySpace. • Linking with LastFM data is designed but not yet written. • Provides per-artist audio data for all recent artists.

  5. Theoretical Achievements • Algorithm Literature Review • Theoretical Computer Science journal submission • NZCSRSC conference submission • Recommendation Tasks and Evaluation Metrics

  6. Algorithm Literature • Systematic exploration of theoretical computer science and discrete mathematics. • Discovered a 1973 SIAM paper on a maximal clique algorithm. • This maximal clique algorithm is the most efficient one found
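
The slide does not name the algorithm from the 1973 SIAM paper, so the sketch below does not attempt to reproduce it. As an illustration of what maximal clique enumeration involves, it swaps in the widely taught Bron-Kerbosch procedure with pivoting; the function name `maximal_cliques` and the adjacency-dictionary input are assumptions made for this example, not part of Graph-RAT.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph as a frozenset.

    `adjacency` maps each vertex to the set of its neighbours
    (hypothetical input format chosen for this sketch).
    """
    def expand(clique, candidates, excluded):
        # No vertex can extend the clique and none was skipped: it is maximal.
        if not candidates and not excluded:
            yield frozenset(clique)
            return
        # Pivot on the vertex covering the most candidates to prune branches.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            yield from expand(clique | {v},
                              candidates & adjacency[v],
                              excluded & adjacency[v])
            # v is fully explored: move it from candidates to excluded.
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(adjacency), set())


# Small illustrative graph: a triangle a-b-c with a pendant vertex d on c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = set(maximal_cliques(graph))
```

On this toy graph the result contains the triangle and the pendant edge as the two maximal cliques.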

  7. Journal Submission • Submitted Graph Triples Census algorithm. • Proof of correctness • Proof of time complexity • Proof of space complexity • Rediscovery of 2001 algorithm in Social Networks • Most efficient implementation known
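
To make the task concrete, here is a deliberately naive baseline for a triple census, assuming an undirected graph and classifying each triple of nodes only by how many of its three possible edges are present. It is a cubic-time illustration of the problem, not the submitted algorithm or the 2001 algorithm; a census over directed graphs distinguishes more classes than the four used here. The name `triad_census` is hypothetical.

```python
from itertools import combinations


def triad_census(nodes, edges):
    """Count node triples by number of edges present (0, 1, 2 or 3).

    Naive O(n^3) baseline for an undirected graph; returns a list
    where index k holds the number of triples with exactly k edges.
    """
    edge_set = {frozenset(edge) for edge in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(nodes, 3):
        present = sum(frozenset(pair) in edge_set
                      for pair in combinations(triple, 2))
        counts[present] += 1
    return counts


# Triangle a-b-c plus the edge c-d.
census = triad_census("abcd", [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

An efficient census avoids visiting every triple; the point of the baseline is only to pin down what is being counted.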

  8. NZCSRSC • Poster at the conference • Written as a short user's guide

  9. Evaluation Exploration • Incorporating cross-validation into relational data. • 9 types of music recommendation • Personalized versus generic • Open query versus targeted query • Dynamic versus static data • New music versus all music

  10. Personalized Radio • Open query with personalized presentation • Static data vs dynamic data • New items prediction vs predict anything

  11. Targeted Search • Not personalized • Similarity queries • Automatically generating targeted lists for a browsing hierarchy • New music vs all music • Static vs dynamic data

  12. Personalized Tag Radio • Create a personalized play list matching a given query • New music vs all music • Static vs dynamic data

  13. Excluded Types • ‘Top 40’ prediction • Rendered obsolete by other types

  14. Cross-Validation in Graphs • Actor removal • Only form currently used • All links to a particular actor are removed • Link removal • Selected links from ground truth are removed • Algorithm evaluated on reproducing missing links
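
The two hold-out schemes on this slide can be sketched as follows, assuming the graph is reduced to a list of `(user, song)` link tuples. The function names, the recall measure, and the sample data are all hypothetical choices for this example rather than Graph-RAT's API.

```python
import random


def actor_removal_split(links, actor):
    """Actor removal: hold out every link that touches the given actor."""
    train = [link for link in links if actor not in link]
    held_out = [link for link in links if actor in link]
    return train, held_out


def link_removal_split(links, fraction, seed=0):
    """Link removal: hold out a random fraction of the ground-truth links."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[cut:], shuffled[:cut]


def recall(predicted, held_out):
    """Share of held-out links that the algorithm managed to reproduce."""
    if not held_out:
        return 0.0
    return len(set(predicted) & set(held_out)) / len(held_out)


links = [("u1", "s1"), ("u1", "s2"), ("u2", "s1"), ("u3", "s3")]
actor_train, actor_test = actor_removal_split(links, "u1")
link_train, link_test = link_removal_split(links, 0.5)
```

In the actor-removal case the test actor keeps no links at all, whereas link removal leaves every actor in the training graph with part of its neighbourhood intact, so the algorithm is scored on reproducing the missing links.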

  15. Graph-RAT Improvements • Release of 0.4.4 • Finalized Graph-RAT as a relational programming language • Added propositional algorithms • Release of 0.5.0 • New Query Subsystem • Usability enhancements • Space complexity improvements

  16. Aggregators • 8 algorithms with 9 helper functions • Cover each form of propositionalization • Cover mappings between links and properties • Core primitives for Graph-RAT as a programming language.
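
One way to read "mappings between links and properties" is an aggregator that folds the properties of linked actors into a flat property on the source actor, which is the essence of propositionalization. The sketch below assumes weighted `(source, target, weight)` links and a tag list per target; the function name and data layout are hypothetical and not taken from Graph-RAT.

```python
from collections import Counter, defaultdict


def aggregate_links_to_property(links, neighbor_property):
    """Build one weighted histogram per source actor.

    Each link (source, target, weight) contributes `weight` to every
    property value attached to its target.
    """
    profiles = defaultdict(Counter)
    for source, target, weight in links:
        for value in neighbor_property.get(target, ()):
            profiles[source][value] += weight
    return dict(profiles)


# Illustrative data: play counts per user and tags per song.
listens = [("alice", "song1", 3), ("alice", "song2", 1), ("bob", "song2", 5)]
tags = {"song1": ["swing"], "song2": ["swing", "bebop"]}
profiles = aggregate_links_to_property(listens, tags)
```

The resulting per-user tag histograms are plain attribute vectors, so they can be passed to a conventional propositional learner.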

  17. Similarity • 2 new similarity algorithms • 1 new distance metric

  18. Query Subsystem • 28 primitives for searching in a graph • 10 graph primitives • 7 actor primitives • 7 link primitives • 4 property primitives • Functional composition to build queries
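
Functional composition of query primitives might look like the sketch below, where each primitive returns a function from a graph to a lazy stream of results and larger queries are built by nesting them. The three primitives shown and the dictionary-based graph are hypothetical stand-ins, not any of the 28 Graph-RAT primitives.

```python
def actors_of_type(actor_type):
    """Actor primitive: select actors of one type."""
    return lambda graph: (a for a in graph["actors"] if a["type"] == actor_type)


def with_property(name, value, source):
    """Property primitive: filter the actors produced by another query."""
    return lambda graph: (a for a in source(graph) if a.get(name) == value)


def links_from(source):
    """Link primitive: select links leaving the actors of another query."""
    def query(graph):
        ids = {a["id"] for a in source(graph)}
        return (link for link in graph["links"] if link["source"] in ids)
    return query


graph = {
    "actors": [
        {"id": "alice", "type": "user", "country": "NZ"},
        {"id": "bob", "type": "user", "country": "US"},
        {"id": "song1", "type": "song"},
    ],
    "links": [
        {"source": "alice", "target": "song1"},
        {"source": "bob", "target": "song1"},
    ],
}

# Compose three primitives into one query: links from users located in NZ.
query = links_from(with_property("country", "NZ", actors_of_type("user")))
result = list(query(graph))
```

Because every primitive has the same shape, a composed query can itself be reused as the input of another primitive.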

  19. Performance Specs • Queries can return collections or iterators. • Collections • Implemented as references into graphs • Linear in number of references • Iterators • Ordered sequences of objects • Constant in space complexity (excluding Graph ID and AllGraphs)
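
The trade-off between the two return types can be illustrated with a small store that answers the same query either as a materialized collection or as a lazy iterator. The class `LinkStore` and its method names are assumptions for this example only.

```python
class LinkStore:
    """Toy link container offering both result styles."""

    def __init__(self, links):
        self._links = list(links)

    def query_collection(self, predicate):
        # Materializes references to every match: space grows
        # linearly with the number of results.
        return [link for link in self._links if predicate(link)]

    def query_iterator(self, predicate):
        # Yields matches one at a time in order: only the current
        # position is kept, whatever the number of results.
        return (link for link in self._links if predicate(link))


def from_alice(link):
    return link[0] == "alice"


store = LinkStore([("alice", "song1"), ("bob", "song2"), ("alice", "song3")])
collection = store.query_collection(from_alice)
iterator = store.query_iterator(from_alice)
first = next(iterator)
```

A collection can be indexed and traversed repeatedly, while an iterator is consumed once, which is the price paid for its constant space.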

  20. Usability Enhancements • Properties and Metadata • Interface enhancements • Dynamic Loading of Classes • XML scripting support

  21. Properties and Metadata • Properties description • Encapsulates all parameter code • Utilizes Graph-RAT Property objects • Comparison to JavaBeans • New Metadata Model • Parameter model update • Input/Output descriptors update

  22. Interface Updates • Arrays->Lists • graph, link, actor, and property objects • Iterators • All graph operations support iterators

  23. Dynamic Loading • Classes loaded from file at runtime. • Loading controlled by call to loader object • Automatic registering with relevant factories • All factories updated to support dynamic loading • Extend Abstract Factory
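
A rough Python analogue of loader-driven registration is sketched below: a loader resolves a class by name at runtime and registers it with a factory, which can then build instances without a compile-time dependency. The classes `AlgorithmFactory` and `Loader` are hypothetical, and loading by dotted module path via `importlib` stands in for loading class files from disk.

```python
import importlib


class AlgorithmFactory:
    """Minimal factory that builds instances of registered classes."""

    def __init__(self):
        self._registry = {}

    def register(self, name, cls):
        self._registry[name] = cls

    def create(self, name, *args, **kwargs):
        return self._registry[name](*args, **kwargs)


class Loader:
    """Loads a class at runtime and registers it with a factory."""

    def __init__(self, factory):
        self._factory = factory

    def load(self, dotted_path, name=None):
        module_name, _, class_name = dotted_path.rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        self._factory.register(name or class_name, cls)
        return cls


factory = AlgorithmFactory()
# A standard-library class stands in for a dynamically loaded algorithm.
Loader(factory).load("collections.Counter", name="counter")
instance = factory.create("counter", "aab")
```

Keeping registration inside the loader means client code only ever asks the factory for a name, which is what lets new classes appear without changes elsewhere.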

  24. XML Scripting support • SAX parser support for all components except crawling and parsing • Implemented using the Builder pattern
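
The combination of a SAX parser with the Builder pattern can be sketched as a handler that forwards each element event to a builder, which assembles the final object step by step. The element names (`graph`, `actor`, `link`) and the `GraphBuilder` class are invented for this illustration and do not reflect Graph-RAT's actual XML schema.

```python
import xml.sax


class GraphBuilder:
    """Builder: accumulates parts, then produces the finished graph."""

    def __init__(self):
        self._actors = []
        self._links = []

    def add_actor(self, actor_id, actor_type):
        self._actors.append((actor_id, actor_type))
        return self

    def add_link(self, source, target):
        self._links.append((source, target))
        return self

    def build(self):
        return {"actors": list(self._actors), "links": list(self._links)}


class GraphHandler(xml.sax.ContentHandler):
    """SAX handler: translates element events into builder calls."""

    def __init__(self, builder):
        super().__init__()
        self.builder = builder

    def startElement(self, name, attrs):
        if name == "actor":
            self.builder.add_actor(attrs["id"], attrs["type"])
        elif name == "link":
            self.builder.add_link(attrs["source"], attrs["target"])


SCRIPT = b"""<graph>
  <actor id="alice" type="user"/>
  <actor id="song1" type="song"/>
  <link source="alice" target="song1"/>
</graph>"""

builder = GraphBuilder()
xml.sax.parseString(SCRIPT, GraphHandler(builder))
parsed = builder.build()
```

Since SAX streams events instead of building a document tree, the builder is the only place where state accumulates.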

  25. Core Improvements • 2 cross-validation algorithms • ~20 algorithms with space complexity improvements • Iterators for all graph primitives • Macros for separation of graph data by cross-validation property.

  26. Additional algorithms • 2 new similarity algorithms • 1 new distance metric added • Obsolete algorithms removed

  27. LastFM crawler updates • LastFM upgraded its web-services, removing the old version • New version will link to the semantic web • ~20 parsers completed • Still under construction

  28. Planned Future Work • Contingent on arrival of computer • Testing of existing code • Cross-Validation Scheduler • Completion of LastFM Parser • DBTunes (from semantic web) parser • Experiments! • Write Thesis!

  29. Unplanned Future Work • Full semantic web crawler • Incorporating GData protocols • Database backend • Colt-Matrix-Over-Graph adapter • Database-backed Weka instance

  30. Beyond the Horizon • Support for Prolog primitives • Multi-database graph support • Semantic Web graph utilizing the proxy pattern • Support for dynamic updates and dynamic data
