Distributed, Real-Time Computation of Community Preferences. Thomas Lutkenhouse, Michael L. Nelson, Johan Bollen Old Dominion University Computer Science Department Norfolk, VA 23529 USA {lutken,mln,jbollen}@cs.odu.edu HT 2005 - Sixteenth ACM Conference on Hypertext and Hypermedia

Distributed, Real-Time Computation of Community Preferences

Thomas Lutkenhouse, Michael L. Nelson, Johan Bollen

Old Dominion University, Computer Science Department, Norfolk, VA 23529 USA

{lutken,mln,jbollen}@cs.odu.edu

HT 2005 - Sixteenth ACM Conference on Hypertext and Hypermedia

6-9 September 2005, Salzburg, Austria

Outline
  • Review of technologies
    • buckets
    • Hebbian learning
    • previous results
  • Experiment design
  • Results
  • Lessons learned
  • Conclusions
Buckets
  • Premise: repositories come and go, but the objects should endure
  • Began as part of NASA DL research
    • focus on digital preservation
    • implementation of the “Smart Objects, Dumb Archives” (SODA) model for digital libraries
      • CACM 2001, doi.acm.org/10.1145/374308.374342
      • D-Lib, dx.doi.org/10.1045/february2001-nelson
Smart Objects
  • Responsibilities generally associated with the repository are “pushed down” into the stored object
    • terms and conditions (T&C), maintenance, logging, pagination & display, etc.
  • Aggregate:
    • metadata
    • data
    • methods to operate on the metadata/data
  • API examples
      • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=getMetadata&type=all
      • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listMethods
      • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listPreference
      • (cheat) http://www.cs.odu.edu/~mln/teaching/cs595-f03/bucket/bucket.xml
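
The API examples above share one pattern: the bucket's URL plus a `method` query argument and optional further arguments. A minimal sketch of composing such calls, assuming only what the example URLs show; the helper name `bucket_url` is hypothetical and not part of the bucket software:

```python
from urllib.parse import urlencode

def bucket_url(bucket, method, **args):
    """Compose a bucket method call as a URL (hypothetical helper)."""
    query = {"method": method}
    query.update(args)  # extra arguments such as type=all
    return bucket + "?" + urlencode(query)

BUCKET = "http://www.cs.odu.edu/~mln/teaching/cs595-f03/"
print(bucket_url(BUCKET, "getMetadata", type="all"))
print(bucket_url(BUCKET, "listMethods"))
```

The sketch only builds the URLs; it does not contact the bucket.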
Examples
  • 1.6.X buckets
    • http://ntrs.nasa.gov/
    • http://www.cs.odu.edu/~mln/phd/
  • 2.0 buckets
    • http://www.cs.odu.edu/~mln/teaching/cs595-f03/
    • http://www.cs.odu.edu/~lutken/bucket/
  • 3.0 buckets (under development)
    • http://beaufort.cs.odu.edu:8080/
    • uses MPEG-21 DIDLs
      • cf. http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
Hebbian Learning

Implementation issues:
  • gather log files
    • problematic when spread across servers/domains
  • determine a time window T for session reconstruction
    • typically 5 min
  • compute links & weights
  • update the network periodically
    • typically monthly
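
The log-based steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the earlier projects: the log is assumed to be already gathered into `(user, timestamp_seconds, document)` rows, and the function names are hypothetical.

```python
from collections import defaultdict

SESSION_GAP = 5 * 60  # T in seconds; the slide cites 5 minutes as typical

def reconstruct_sessions(log, gap=SESSION_GAP):
    """Split (user, timestamp, document) rows into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, doc in sorted(log, key=lambda row: (row[0], row[1])):
        by_user[user].append((ts, doc))
    sessions = []
    for hits in by_user.values():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, doc) in zip(hits, hits[1:]):
            if ts - prev_ts > gap:  # a pause longer than T starts a new session
                sessions.append(current)
                current = []
            current.append(doc)
        sessions.append(current)
    return sessions

def count_links(sessions):
    """Count consecutive document pairs within each session."""
    links = defaultdict(int)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            links[(a, b)] += 1
    return dict(links)

log = [("u1", 0, "A"), ("u1", 60, "B"), ("u1", 1000, "C"),
       ("u2", 10, "A"), ("u2", 20, "B")]
print(count_links(reconstruct_sessions(log)))
```

In a periodic (for example monthly) update, the pair counts would be turned into link weights and merged into the network.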

Previous, Log-Based Recommendation Implementations
  • LANL Journal Recommendations
    • collection analysis based on journal readership patterns
      • D-Lib Magazine, dx.doi.org/10.1045/june2002-bollen
  • NASA Technical Report Server
    • compared recommendations with those generated by VSM
      • WIDM 2004, doi.acm.org/10.1145/1031453.1031480
  • Open Video Project
    • generated recommendations for videos (little descriptive metadata)
      • JCDL 2005, doi.acm.org/10.1145/1065385.1065472
Hebbian Learning with Bucket Methods

http://b?method=display&referer=http://b&redirect=http://a?method=display%26redirect=http://c?method=display%26referer=http://b

http://a?method=display&referer=http://a&redirect=http://b?method=display%26referer=http://a
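
The URLs on this slide follow a regular pattern: a bucket calls its own `display` method, names itself as `referer`, and passes the next hop as a `redirect` value whose inner separators are written as `%26`. A minimal sketch that reproduces the pattern; the helper `traversal_link` and its parameter names are hypothetical. Reading the extra hop through the previous bucket as a way for that bucket to observe the two-step path is an assumption, not something the slide states.

```python
def traversal_link(current, target, previous=None):
    """Build the link that bucket `current` would show for `target`.

    Hypothetical helper; inner '&' separators are written as %26,
    exactly as they appear on the slide.
    """
    inner = f"{target}?method=display%26referer={current}"
    if previous is not None:
        # Route the hop through the bucket the user arrived from.
        inner = f"{previous}?method=display%26redirect={inner}"
    return f"{current}?method=display&referer={current}&redirect={inner}"

print(traversal_link("http://a", "http://b"))
print(traversal_link("http://b", "http://c", previous="http://a"))
```

Because every hop passes through a bucket method, each bucket can update its own links at access time and no log files need to be collected.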

Experiment
  • Spin Magazine’s “Top 50 Rock Bands of All Time”
    • something other than reports, journals, etc.
    • harvest allmusic.com for metadata for all LPs by the 50 bands (total = 800 LPs)
  • Maintain hierarchical arrangement
    • 1 artist → N albums
  • Initialize the network of 800 LPs with each LP randomly linked to 5 other LPs
  • Send out email invitations to browse the network
    • recipients explore the network; the resulting network is examined afterward
    • users not informed about the workings of the network
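
The random initialization described above can be sketched as below. The node count (800) and links per node (5) come from this slide, and the starting weight of 0.5 is the "initial" value listed on the weights slide. Treating links as directed, excluding self-links, and the function name `init_network` are assumptions.

```python
import random

def init_network(n_nodes=800, links_per_node=5, initial_weight=0.5, seed=None):
    """Give every node random outgoing links to distinct other nodes."""
    rng = random.Random(seed)
    network = {}
    for node in range(n_nodes):
        others = [m for m in range(n_nodes) if m != node]  # no self-links
        targets = rng.sample(others, links_per_node)
        network[node] = {t: initial_weight for t in targets}
    return network

network = init_network(seed=1)
print(len(network), len(network[0]))
```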
Hierarchical, Weighted Links

<structural>
  <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/121/">
    <metadata>
      <descriptive>
        <title>Terrapin Station, Capital Centre, Landover, MD, 3/15/90</title>
      </descriptive>
      <administrative/>
    </metadata>
  </element>
  <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/11/">
    <metadata>
      <descriptive>
        <title>Jealousy/Progress</title>
      </descriptive>
      <administrative/>
    </metadata>
  </element>
  <element wt="3" id="~http://www.cs.odu.edu/~lutken/bucket/434/">
    <metadata>
      <descriptive>
        <title>Nevermind</title>
      </descriptive>
      <administrative/>
    </metadata>
  </element>
  <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/130/">
    <metadata>
      <descriptive>
        <title>Technical Ecstasy</title>
      </descriptive>
      <administrative/>
    </metadata>
  </element>
  …….

Weights:
  • initial: 0.5
  • frequency: 1.0
  • symmetry: 0.5
  • transitivity: 0.3
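
A minimal sketch of how the frequency, symmetry, and transitivity increments could be applied on each traversal. The increment values are the ones listed above. How each rule maps to a link (frequency for the link followed, symmetry for its reverse, transitivity for the shortcut from the previously visited bucket) is an assumption based on the usual formulation of these Hebbian rules, and `record_traversal` is a hypothetical name.

```python
from collections import defaultdict

FREQUENCY, SYMMETRY, TRANSITIVITY = 1.0, 0.5, 0.3  # values from the slide

def record_traversal(weights, current, target, previous=None):
    """Update link weights after a user moves from current to target."""
    weights[current][target] += FREQUENCY   # the link that was followed
    weights[target][current] += SYMMETRY    # the reverse link
    if previous is not None and previous != target:
        weights[previous][target] += TRANSITIVITY  # shortcut previous -> target
    return weights

weights = defaultdict(lambda: defaultdict(float))
record_traversal(weights, "A", "B")                # user goes A -> B
record_traversal(weights, "B", "C", previous="A")  # then B -> C
print({src: dict(dst) for src, dst in weights.items()})
```

Each update touches only the links of the buckets involved in one traversal, so the work is spread over individual accesses.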

Respondents

  • August 2004 - October 2004
  • 160 respondents
    • self-identify at the beginning; exit survey at the end
    • 1200 bucket-to-bucket traversals (7.5 average traversals per session)
How to Evaluate the Resulting Network?
  • Compute network analysis metrics:
    • PageRank
    • Degree Centrality
    • Weighted Degree Centrality
  • Compare the results to:
    • Other “expert” lists (VH1, DigitalDreamDoor, original Spin Magazine list)
    • Artist / LP best seller according to RIAA
    • Artist / LP Amazon sales rank
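
The three network metrics named above can be sketched in plain Python over a `{node: {target: weight}}` mapping. This illustrates the standard definitions and is not the code used in the study. Counting incoming rather than outgoing links, the damping factor of 0.85, and the toy network are all assumptions.

```python
def degree_centrality(network, weighted=False):
    """In-degree per node; with weighted=True, the sum of incoming weights."""
    score = {node: 0.0 for node in network}
    for links in network.values():
        for target, w in links.items():
            score[target] = score.get(target, 0.0) + (w if weighted else 1.0)
    return score

def pagerank(network, damping=0.85, iterations=50):
    """Weighted PageRank computed by power iteration."""
    nodes = set(network)
    for links in network.values():
        nodes.update(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            links = network.get(node, {})
            total = sum(links.values())
            if total == 0:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new[other] += damping * rank[node] / n
            else:
                for target, w in links.items():
                    new[target] += damping * rank[node] * w / total
        rank = new
    return rank

toy = {"A": {"B": 1.0, "C": 0.5}, "B": {"C": 3.0}, "C": {"A": 0.5}}
print(degree_centrality(toy))
print(degree_centrality(toy, weighted=True))
print(pagerank(toy))
```

The resulting per-LP scores can be sorted into a ranking and compared with the external lists.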
Expert Rankings
  • No correlation with:
    • VH1 artist list
    • DigitalDreamDoor list
    • original Spin Magazine list (!)

(critics don’t agree with each other, or the record buying public)

RIAA Results
  • RIAA data covered only:
    • 51 of the 800 LPs
    • 14 of the 50 artists

(critics don’t buy records!)

*RIAA sales caveat

Figure 6. Probability of albums being best-sellers.

Figure 7. Probability of artists being best-sellers.

Amazon Sales Rank
  • No correlation with individual LP sales rank…
  • …but correlated with mean artist sales rank
    • similar to RIAA data
    • interpretation: popular artists often have obscure LPs
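
A small sketch of the per-artist aggregation implied above, in which each artist's LP sales ranks are averaged into one value. The rows are invented placeholders, not data from the study, and `mean_rank_by_artist` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

def mean_rank_by_artist(lp_ranks):
    """Average LP sales ranks per artist from (artist, lp, rank) rows."""
    by_artist = defaultdict(list)
    for artist, _lp, rank in lp_ranks:
        by_artist[artist].append(rank)
    return {artist: mean(ranks) for artist, ranks in by_artist.items()}

# Placeholder rows: artist X has one well-selling LP and one obscure LP.
rows = [("X", "x1", 100), ("X", "x2", 90000),
        ("Y", "y1", 5000), ("Y", "y2", 7000)]
print(mean_rank_by_artist(rows))
```

Averaging smooths over the wide spread of ranks that a single artist's LPs can have.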
Lessons Learned
  • While the subject matter was interesting, it was oriented toward music geeks
    • i.e., no actual music was delivered to the users (intellectual property considerations)
    • more traversals needed
  • Random initial starting points were difficult to overcome
    • “cold start problem”: pre-seed the links according to some criteria?
    • weights did not decay over time/traversals
  • Choosing only artists from Spin Magazine may have pre-filtered the response
    • could instead draw artists from Down Beat (Jazz), Vibe (Urban), Music City News (Country), etc.
Conclusions
  • Can build a network of smart objects featuring adaptive, hierarchical links constructed in real-time without central state
    • network is created without latency and with computations amortized over individual accesses
  • In an experimental testbed of popular-music LP metadata, the resulting network rankings approached the sales rank of artists, but not of individual LPs