Ralf Schenkel

Efficient Top-k Querying over Social Tagging Networks

Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Gerhard Weikum

Social Tagging Networks

Common examples:

  • Flickr (images)
  • YouTube (videos)
  • del.icio.us (bookmarks)
  • Librarything (books)
  • Discogs (CDs)
  • CiteULike (papers)
  • Facebook
  • Myspace (media)

Definition: Social Tagging Network

Website where people

  • publish + tag information
  • review + rate information
  • publish their interests
  • maintain network of friends
  • interact with friends

SIGIR, Singapore

Outline
  • Search in Social Tagging Networks
    • Graph Model
    • Different Information Needs
  • Effective Query Scoring
  • Efficient Query Evaluation
  • Summary & Further Challenges

Social Network Model

[Figure, built up over three slides: a graph with parts labeled USERS, TAGS and ITEMS. Labels shown: travelChina, queueingtheory, travelNorway, harrypotter, and the tags travel, queues, probability, tripvldb.]

Components of a Social Tagging Network

Graph G=(UI, EUEIEUI) with

  • 2 types of nodes:
    • Users U (optionally weighted)
    • Items I (optionally weighted)
  • 3 types of edges:
    • EU: User-User (optionally weighted)
    • EI: Item-Item (optionally weighted)
    • EUI: User-Item (labeled with tags T, opt. weighted)

Information Need 1: Global

[Figure: the users/tags/items graph from the Social Network Model slide, with the added label "harry potter".]

Tags by all users are equally important

Information Need 2: Similar Users

[Figure: the users/tags/items graph from the Social Network Model slide, with a "?" marker and the added label "travel".]

Tags by users with similar tags/items („brothers in spirit“) are more important

Information Need 3: Trusted Friends

[Figure: the users/tags/items graph from the Social Network Model slide, with a "?" marker and the added label "probability".]

Tags by closely related users are more important

Wishlist for Social-Aware Social Search
  • Search results depend on
    • Global popularity of items
    • Collection context of the querying user (books, tags)
    • Social context of the querying user (trusted friends)
  • Scalable query processing

(similar wishlist for social recommendations)

Outline
  • Search in Social Tagging Networks
  • Effective Query Scoring
    • Quantifying Friendship Strengths
    • User-specific Scoring Functions
    • Experimental Evaluation
  • Efficient Query Evaluation
  • Summary & Further Challenges

Notation

$U$: set of users

$T$: set of tags

$I$: set of items

$tags(u)$: tags used by user $u$

$items(u)$: items tagged by user $u$

$items(t)$: items tagged with tag $t$ by at least one user

$df(t)$: number of items tagged with tag $t$

$tf_u(i,t)$: number of times user $u$ tagged item $i$ with tag $t$

$tf(i,t)$: number of times item $i$ was tagged with tag $t$

[Figure: user $u_j$ tags items $i_1, \dots, i_n$ with the tag sets $t_{11} \dots t_{1m_1}$ through $t_{n1} \dots t_{nm_n}$.]
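
As a rough illustration of the notation above, the following sketch derives these statistics from a flat list of (user, item, tag) triples. The triple representation, the function name `build_stats` and the sample data are assumptions made for illustration and do not come from the talk.

```python
from collections import Counter, defaultdict


def build_stats(taggings):
    """Derive the notation's statistics from (user, item, tag) triples.

    A triple may repeat if a user applied the same tag to an item more than once.
    """
    tags_of = defaultdict(set)     # tags(u)
    items_of = defaultdict(set)    # items(u)
    items_with = defaultdict(set)  # items(t)
    tf_u = Counter()               # tf_u(i,t), keyed by (u, i, t)
    tf = Counter()                 # tf(i,t), keyed by (i, t)
    for u, i, t in taggings:
        tags_of[u].add(t)
        items_of[u].add(i)
        items_with[t].add(i)
        tf_u[(u, i, t)] += 1
        tf[(i, t)] += 1
    # df(t): number of distinct items carrying tag t
    df = {t: len(items) for t, items in items_with.items()}
    return tags_of, items_of, items_with, tf_u, tf, df


# Hypothetical sample data
taggings = [
    ("alice", "book1", "travel"),
    ("bob", "book1", "travel"),
    ("bob", "book2", "travel"),
    ("bob", "book2", "probability"),
]
tags_of, items_of, items_with, tf_u, tf, df = build_stats(taggings)
print(tf[("book1", "travel")], df["travel"])
```

Keeping `tf_u` keyed by the full triple makes it easy to aggregate per item (for `tf`) or per user later on.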

Quantifying Friendship Strengths
  • Global „friendship“ strength:
  • Content-based friendship strength
  • Graph-based friendship strength
  • Integrated friendship strength

Content-Based Friendship Strength
  • Several alternatives:
  • based on overlap of tag usage:
  • based on overlap of tagged items:
  • For both:
  • $P_{content}(u,u) := 0$
  • normalization such that
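
A minimal sketch of a content-based strength based on overlap of tag usage. The overlap measure (Jaccard) and the normalization (each user's strengths sum to 1) are assumptions chosen for illustration; the slide does not state which measure or normalization is used. The function name and sample data are hypothetical.

```python
def content_friendship(tags_of):
    """Return P[u][v]: normalized tag-set overlap between users u and v."""
    users = list(tags_of)
    strengths = {}
    for u in users:
        row = {}
        for v in users:
            if u == v:
                row[v] = 0.0  # P_content(u,u) := 0
                continue
            union = tags_of[u] | tags_of[v]
            # Jaccard overlap of the two tag sets (assumed measure)
            row[v] = len(tags_of[u] & tags_of[v]) / len(union) if union else 0.0
        total = sum(row.values())
        # Assumed normalization: strengths of each user sum to 1 (or stay 0)
        strengths[u] = {v: (s / total if total else 0.0) for v, s in row.items()}
    return strengths


# Hypothetical sample data
tags_of = {
    "u1": {"travel", "china"},
    "u2": {"travel", "norway"},
    "u3": {"probability"},
}
P = content_friendship(tags_of)
print(P["u1"])
```

The same function works for overlap of tagged items by passing `items(u)` sets instead of `tags(u)` sets.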

Graph-Based Friendship Strength

$P_{graph}(u,u')$

[Figure: two example friendship graphs over users u1 … u7, one with edges weighted with $P_{content}$ and one with unweighted edges.]

  • For both:
  • $P_{graph}(u,u) := 0$
  • normalization such that

Integrated Friendship Similarity

Mixture of

  • content-based similarity
  • graph-based friendship similarity
  • background model (global)

(0,,1; +=1)

$P_{int}(u,u')$
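
One way to read such a mixture is as a convex combination of the three models. The sketch below uses the hypothetical weight names `w_content` and `w_graph` and gives the remaining weight to the background model; this parametrization is an assumption for illustration and may differ from the one used in the talk.

```python
def integrated_strength(p_content, p_graph, p_global, w_content, w_graph):
    """Mix content-based, graph-based and background strengths.

    w_content and w_graph are hypothetical weight names; whatever weight
    is left over goes to the background (global) model.
    """
    w_global = 1.0 - w_content - w_graph
    if min(w_content, w_graph) < 0.0 or w_global < -1e-12:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return (w_content * p_content
            + w_graph * p_graph
            + max(w_global, 0.0) * p_global)


# Hypothetical input values
p = integrated_strength(p_content=0.5, p_graph=0.25, p_global=0.125,
                        w_content=0.5, w_graph=0.25)
print(p)
```

Setting `w_content` or `w_graph` to 0 switches the corresponding model off, which is convenient when comparing configurations.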

Towards a User-specific Score

global friendship strength

Convert into user-specific social frequency:

Define user-specific social score:

Including Tag Expansion

Problem: Users use different tags for similar things

⇒ poor recall (relevant results are missed)

Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DB&IS, UdS, Saarland University, …

Solution:

1. Define notion of similar tags

2. Expand queries with similar tags

3. Modify scoring function for expanded queries

Heuristics for finding similar tags

Specialization heuristics:

Tag $t_2$ is a specialization of $t_1$ if $t_1$ occurs (almost) whenever $t_2$ occurs

Co-Occurrence heuristics:

Tags $t_1$ and $t_2$ are similar if they (almost) always occur together
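
Both heuristics can be sketched as simple set statistics. The sketch assumes that "occurs" means "is attached to the same item" and that tag occurrences are given as a mapping from tag to item set; the function names and the sample data are hypothetical, and any cut-off for "almost" is left to the caller.

```python
def specialization(items_with, t1, t2):
    """Fraction of t2's items that also carry t1.

    A value near 1 suggests that t2 is a specialization of t1.
    """
    occ2 = items_with.get(t2, set())
    if not occ2:
        return 0.0
    return len(occ2 & items_with.get(t1, set())) / len(occ2)


def cooccurrence(items_with, t1, t2):
    """Jaccard overlap of the two tags' item sets.

    A value near 1 means the tags (almost) always occur together.
    """
    a = items_with.get(t1, set())
    b = items_with.get(t2, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sample data: tag -> set of item ids
items_with = {
    "travel": {1, 2, 3, 4},
    "norway": {3, 4},
    "mpi": {5, 6},
    "mpii": {5, 6},
}
print(specialization(items_with, "travel", "norway"))
print(cooccurrence(items_with, "mpi", "mpii"))
```

Note that the specialization measure is asymmetric while the co-occurrence measure is symmetric.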

Scoring Expanded Queries

Naive approach:

For query tag t, add similar tags t‘ with sim(t,t‘)>δ to query

But:

„transportation disaster“ expanded by „train car bus plane …“

„international crime“ expanded by „mafia camorra yakuza …“

Result quality drops due to topic drift

Better: auto-tuning incremental expansion [SIGIR’05]

For query tag t, consider only the expansion with the highest combined score per item
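
The difference between the two approaches can be sketched as summing over all expansions versus taking the single best one per item. The sketch assumes that the combined score of an expansion is the tag similarity multiplied by the item's score for that tag, and that `similar` already contains only tags above the threshold δ; the function names, data layout and sample values are hypothetical.

```python
def candidate_tags(similar, tag):
    """The query tag itself (similarity 1.0) plus its similar tags."""
    tags = dict(similar.get(tag, {}))
    tags[tag] = 1.0
    return tags


def naive_score(item_scores, similar, tag, item):
    """Naive expansion: every similar tag contributes to the item's score."""
    return sum(sim * item_scores.get(t, {}).get(item, 0.0)
               for t, sim in candidate_tags(similar, tag).items())


def incremental_score(item_scores, similar, tag, item):
    """Only the expansion with the highest combined score counts."""
    return max(sim * item_scores.get(t, {}).get(item, 0.0)
               for t, sim in candidate_tags(similar, tag).items())


# Hypothetical sample data
item_scores = {
    "transportation": {"d1": 0.6},
    "train": {"d1": 0.4, "d2": 0.9},
    "bus": {"d2": 0.8},
}
similar = {"transportation": {"train": 0.5, "bus": 0.5}}

for item in ("d1", "d2"):
    print(item,
          naive_score(item_scores, similar, "transportation", item),
          incremental_score(item_scores, similar, "transportation", item))
```

In this toy data, d2 never carries the query tag but accumulates score from two expansions under the naive sum, so it overtakes d1; with the maximum, d1 stays ahead.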

Experimental Evaluation: Effectiveness

Systematic evaluation of result quality difficult

Three setups:

  • Manual queries + human assessments
  • Queries + assessments derived from external information (e.g., DMOZ categories)
  • Automated assessments from context of user
    • Items tagged by user and/or friends
    • Items tagged in the future

Prototype Implementation

Preliminary User Study

LibraryThing user study: [Data Engineering Bulletin, June 2008]

  • 6 LibraryThing users with reasonably large library and friend sets
  • Overall 49 queries
  • Crawled (part of) LibraryThing: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
  • Measured NDCG[10]
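
For reference, a minimal sketch of NDCG at rank 10. It uses the common gain/log2(rank+1) discount and computes the ideal ranking from the returned items only; both are assumptions, since the talk does not say which NDCG variant was measured.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k graded relevance values."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg(gains, k=10):
    """NDCG@k; `gains` are relevance grades of the results in ranked order."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for two rankings of the same three items
print(ndcg([2, 1, 0]))
print(ndcg([0, 1, 2]))
```

A full evaluation would normally build the ideal ranking from all assessed items for the query, not only from those that were returned.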

(1-α) (content)

  • Result quality generally very high
  • Limited social influence is best (not enough friends?)
  • Tag expansion has limited influence on results

(1-α)

(graph)

Outline
  • Search in Social Tagging Networks
  • Effective Query Scoring
  • Efficient Query Evaluation
    • Threshold Algorithms
    • ContextMerge
    • Experimental Evaluation
  • Summary & Further Challenges

Algorithmic Overview
  • Input: query q={t1…tn} for user u, α, , 
  • Output: k items with highest scores
  • Goals:
    • Avoid computing all results
    • Minimize disk I/O and CPU load
    • Utilize precomputed information on disk

Excursion: Threshold Algorithms for Text IR

Input:

  • query $q = \{t_1, \dots, t_n\}$
  • lists $L(t_p)$ with pairs <$i$, $score(i,t_p)$>, sorted by $score(i,t_p)$↓

Output: k items with highest aggregated score

Algorithm:

  • scan lists in parallel
  • maintain partial candidate results with score bounds
  • terminate as soon as top-k results are stable
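
The three steps above can be sketched in the style of the no-random-access (NRA) family of threshold algorithms. The sketch assumes summation as the aggregation function, non-negative scores and round-robin scanning; the function name `nra_topk` and the sample lists are hypothetical.

```python
import heapq


def nra_topk(lists, k):
    """Top-k by summed score using sorted accesses only.

    lists: one list per query tag of (item, score) pairs, sorted by score
    descending. Returns (item, worstscore) pairs; a returned worstscore is a
    lower bound and may be smaller than the item's final score.
    """
    n = len(lists)
    pos = [0] * n
    high = [lst[0][1] if lst else 0.0 for lst in lists]  # bound on unseen scores
    seen = {}  # item -> {list index: score}
    while True:
        progressed = False
        for j, lst in enumerate(lists):  # scan lists in parallel (round-robin)
            if pos[j] < len(lst):
                item, score = lst[pos[j]]
                pos[j] += 1
                seen.setdefault(item, {})[j] = score
                high[j] = score
                progressed = True
            if pos[j] == len(lst):
                high[j] = 0.0  # exhausted list: nothing unseen remains
        worst = {i: sum(s.values()) for i, s in seen.items()}
        top = heapq.nlargest(k, worst, key=worst.get)
        if not progressed:
            return [(i, worst[i]) for i in top]
        if len(top) == k:
            min_k = worst[top[-1]]
            in_top = set(top)
            # bestscore: worstscore plus the high values of lists not yet seen
            best_other = max(
                (worst[i] + sum(high[j] for j in range(n) if j not in seen[i])
                 for i in seen if i not in in_top),
                default=0.0,
            )
            # stable if no other candidate and no unseen item can still win
            if min_k >= best_other and min_k >= sum(high):
                return [(i, worst[i]) for i in top]


# Hypothetical index lists for a two-tag query
lists = [
    [("a", 0.875), ("b", 0.75), ("c", 0.125)],
    [("b", 0.75), ("a", 0.5), ("d", 0.25)],
]
result = nra_topk(lists, k=1)
print(result)
```

With these lists the scan stops after two entries per list, because neither the other candidate nor any unseen item can still exceed the current top-1 score.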

Excursion: Threshold Algorithms

Many powerful extensions:

  • Probabilistic pruning of candidates with guarantees on result quality
  • Random accesses to index lists
  • Scheduling scans and random accesses
  • Dynamic query expansion techniques
  • Hierarchical top-k for phrases
  • Structured queries for XML

Most variants provably instance optimal

Problem: $score_u(i,t)$ cannot be precomputed

(this would require materializing a BM25 model per user and configuration)

⇒ Threshold Algorithms cannot be applied directly

Revisiting the Social Frequency

$sf_u(i,t)$ consists of a part independent of user $u$ and a part dependent on user $u$

Compute $sf_u(i,t)$ on the fly from $tf(i,t)$, the friends of $u$, and their tagged documents
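
A minimal sketch of such an on-the-fly computation. It assumes that the social frequency is a weighted sum of a user-independent part based on $tf(i,t)$ and a user-dependent part that adds up the friends' $tf_{u'}(i,t)$ values weighted by friendship strength. The mixing weight `alpha`, the absence of further normalization, the function name and the sample data are all assumptions for illustration.

```python
def social_frequency(user, item, tag, tf, friends, tf_u, alpha=0.5):
    """Assumed form: alpha * tf(i,t) + (1 - alpha) * sum of weighted friend tfs.

    tf:      {(item, tag): count}           user-independent part
    friends: {user: {friend: strength}}     friendship strengths
    tf_u:    {(user, item, tag): count}     per-user tag frequencies
    """
    global_part = alpha * tf.get((item, tag), 0)
    social_part = (1 - alpha) * sum(
        strength * tf_u.get((friend, item, tag), 0)
        for friend, strength in friends.get(user, {}).items()
    )
    return global_part + social_part


# Hypothetical sample data
tf = {("book1", "travel"): 4}
friends = {"u": {"v": 0.75, "w": 0.25}}
tf_u = {("v", "book1", "travel"): 1, ("w", "book1", "travel"): 1}

print(social_frequency("u", "book1", "travel", tf, friends, tf_u))
print(social_frequency("x", "book1", "travel", tf, friends, tf_u))
```

Only the second term needs the querying user's friend list, which is why the first term can be served from a precomputed, user-independent index.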

ContextMerge (=0)

Precomputed lists:

  • ITEMS(t): pairs <i, $tf(i,t)$>, sorted by $tf(i,t)$↓
  • FRIENDS(u): pairs <u', $P_{graph}(u,u')$>, sorted by $P_{graph}(u,u')$↓
  • USERITEMS(u',t): pairs <i, $tf_{u'}(i,t)$>, unsorted
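
A sketch of how these three lists might be built in memory from per-user tag frequencies and friendship strengths. The input layout, the function name `build_indexes` and the sample data are assumptions; the talk describes the lists as precomputed on disk.

```python
from collections import defaultdict


def build_indexes(tf_u, p_graph):
    """Build ITEMS, FRIENDS and USERITEMS from raw statistics.

    tf_u:    {(user, item, tag): count}
    p_graph: {user: {friend: strength}}
    """
    tf = defaultdict(int)
    useritems = defaultdict(list)  # USERITEMS(u', t): unsorted (item, tf_u') pairs
    for (u, i, t), count in tf_u.items():
        tf[(i, t)] += count
        useritems[(u, t)].append((i, count))

    items = defaultdict(list)  # ITEMS(t): (item, tf) pairs, tf descending
    for (i, t), count in tf.items():
        items[t].append((i, count))
    for t in items:
        items[t].sort(key=lambda pair: pair[1], reverse=True)

    # FRIENDS(u): (friend, strength) pairs, strength descending
    friends = {u: sorted(fs.items(), key=lambda pair: pair[1], reverse=True)
               for u, fs in p_graph.items()}
    return dict(items), friends, dict(useritems)


# Hypothetical sample data
tf_u = {
    ("v", "book1", "travel"): 1,
    ("w", "book1", "travel"): 1,
    ("w", "book2", "travel"): 1,
}
p_graph = {"u": {"v": 0.25, "w": 0.75}}
ITEMS, FRIENDS, USERITEMS = build_indexes(tf_u, p_graph)
print(ITEMS["travel"], FRIENDS["u"], USERITEMS[("w", "travel")])
```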

Adapted Threshold Algorithm for query $u, t_1 \dots t_n$:

  • Scan ITEMS($t_p$) and n copies of FRIENDS(u), pick „best“ list
    • If ITEMS($t_p$): read next entry
    • If FRIENDS(u,p): read USERITEMS(u',$t_p$) for next friend u'
    • Update candidates and topk
    • Check for termination

ContextMerge: Candidates

Candidate items c maintain, for each query term t:

$tf(t)$: value read from ITEMS(t), or UNDEF

$tf_u(t)$: sum of values read from USERITEMS(u',t), weighted by $P_{graph}(u,u')$

$c(t)$: unweighted sum of values read from USERITEMS(u',t)

To compute worstscore(c):

  • plug $tf(t)$ and $tf_u(t)$ into the definition of $sf_u(t)$ (using 0 if UNDEF)
  • plug $sf_u(t)$ into the definition of $score_u(t)$

ContextMerge: Candidates

To compute bestscore(c):

  • if $tf(t)$ = UNDEF [not yet seen in ITEMS(t)]

use $tf(t) = high_t$, $tf_u(t) = highF_t \cdot (high_t - c(t))$

  • else [already seen in ITEMS(t)]

use $tf_u(t) = highF_t \cdot (tf(t) - c(t))$

and plug it into the definition of $sf_u$ as before

$high_t$: current high score in ITEMS(t)

$highF_t$: current high score in FRIENDS(u,t)
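
The bookkeeping for both bounds can be sketched as follows, under several loudly stated assumptions: the social frequency is taken to be `alpha * tf + (1 - alpha) * tf_u`, the score is a BM25-like saturating function without an idf factor, and the bestscore rule is read as adding the optimistic remainder `highF_t * (tf - c)` on top of the friend contributions already accumulated. None of these definitions is confirmed by the slides; all names and values are hypothetical.

```python
def sf(tf, tfu, alpha):
    """Assumed social frequency: mix of global and friend-weighted frequency."""
    return alpha * tf + (1 - alpha) * tfu


def score(sf_value, k1=1.0):
    """Assumed BM25-like saturation (monotone in sf_value, idf omitted)."""
    return (k1 + 1) * sf_value / (k1 + sf_value)


def worstscore(cand, alpha):
    """Lower bound: use only what has been read so far (0 if tf is UNDEF)."""
    total = 0.0
    for state in cand.values():
        tf = state["tf"] if state["tf"] is not None else 0
        total += score(sf(tf, state["tfu"], alpha))
    return total


def bestscore(cand, high, high_f, alpha):
    """Upper bound: assume unseen values are as large as still possible."""
    total = 0.0
    for t, state in cand.items():
        # Unseen in ITEMS(t): tf can be at most the current high value
        tf = state["tf"] if state["tf"] is not None else high[t]
        remaining = max(tf - state["c"], 0)  # taggings not yet attributed
        tfu = state["tfu"] + high_f[t] * remaining  # assumed reading of the rule
        total += score(sf(tf, tfu, alpha))
    return total


# Hypothetical candidate state for a one-tag query; None stands for UNDEF
cand = {"travel": {"tf": None, "tfu": 0.5, "c": 1}}
high = {"travel": 3}
high_f = {"travel": 0.5}
w = worstscore(cand, alpha=0.5)
b = bestscore(cand, high, high_f, alpha=0.5)
print(w, b)
```

Because the assumed score function is monotone, plugging upper bounds for `tf` and `tf_u` into it yields an upper bound on the score, and the gap between the two bounds shrinks as more entries are read.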

ContextMerge: List Selection

Lists are greedily selected by highest expected score

  • ITEMS(t):

compute $sf_u(t)$, $score_u(t)$ with $tf(t) = high_t$, $tf_u(t) = 0$

  • FRIENDS(u,t):

compute $sf_u(t)$, $score_u(t)$ with $tf(t) = 0$, $tf_u(t) = highF_t \cdot maxtf$, where $maxtf = \max_{u,t} tf_u(t)$

ContextMerge: Schematic execution

[Figure, shown over five slides: schematic execution over the lists Items(t1), Items(t2), Friends(u,t1) and Friends(u,t1), marking the considered USERITEMS(u',t1) and USERITEMS(u',t2) entries; the intermediate steps show the label u7.]

Experimental Evaluation: Efficiency
  • Testbed: 3 large crawls of real social networks
    • Flickr: 10 million pictures, ~50,000 users
    • Del.icio.us: ~175,000 bookmarks, ~12,000 users
    • LibraryThing: ~6.5 million books, ~10,000 users
  • Queries:
    • ~150 frequent tag pairs in each set
    • for each query pick user with „enough“ results & friends
  • Cost measure: #sorted accesses + 100 × #random accesses
  • Baseline: full join + sort

Outline
  • Search in Social Tagging Networks
  • Effective Query Scoring
  • Efficient Query Evaluation
  • Summary & Further Challenges

Summary
  • Need for social-aware social search, supporting global, social, and spiritual information needs

  • Social scoring
    • integrating global, collection, and social context
    • including dynamic tag expansion
  • ContextMerge: scalable implementation

Further Challenges
  • Meaningful & common benchmark
  • Incremental maintenance for high dynamics
  • Extend to ratings, user weights, item weights, …
  • Extend to non-tags (like image features)
  • Automatic query parameterization
  • Meaningful explanations of results
  • Exploit dynamics (hot topics, evolving groups, …)

Social-Aware Search & Recommendations at planet scale
