Connecting Distributed People and Information on the Web

Jennifer Golbeck

College of Information Studies

Human-Computer Interaction Lab

University of Maryland, College Park

Information Access on the Web
  • Find an mp3 of a song that was on the Billboard Top Ten that features a cowbell.

The Cowbell Project -

Finding Trusted Information
  • How many cows in Texas?

The Social Solution
  • People are the sources of information
  • Social relationships give us information about people
  • Use relationships to understand the information people produce.
Current State
  • 250-ish social networks
  • 850,000,000 users
  • Ning claims 185,000 networks
My Research Questions
  • How do users behave and relate to one another in web-based social networks?
  • How do social connections, like trust, relate to information?
  • How can we estimate relationships (like trust) between people who do not know each other?
  • How can we use social networks to build intelligent systems to improve information access?
Social Relationships and Information

How Trust Relates to Similarity

A Study
  • People create information on the web
    • An expression of their opinions and view of the world
    • Focus on quantitative information (e.g. ratings)
  • People express trust in social networks
  • How does trust relate to the similarity of two people?
The Idea
  • We know trust correlates with overall similarity (Ziegler and Golbeck, 2006)
  • Does trust capture more than just overall agreement?
  • Two Part Analysis
    • Controlled study to find profile similarity measures that relate to trust
    • Verification through application in a live system
Experimental Outline
  • Phase 1: Rate Movies - Subjects rate movies on the list
    • Ratings grouped as extreme (1,2,9,10) or far from average (≥4 different)
  • Create profiles of hypothetical users
    • Profile is a list of movies and the hypothetical user’s ratings of them
  • Subjects rate how much they would trust the person represented by the profile
    • Vary the profile’s ratings in a controlled way
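
The two rating groups from Phase 1 can be sketched as follows. This is a minimal illustration, not the study's code: the function name `rating_categories` and the `average` argument are assumptions, and "far from average" is read here as differing from a film's average rating by at least 4 points.

```python
# Extreme values on the 1-10 scale, as listed on the slide.
EXTREME_RATINGS = {1, 2, 9, 10}


def rating_categories(rating, average):
    """Return the set of categories a single rating falls into.

    `average` is assumed to be the film's average rating; the slides
    do not say which average was used.
    """
    categories = set()
    if rating in EXTREME_RATINGS:
        categories.add("extreme")
    if abs(rating - average) >= 4:
        categories.add("far_from_average")
    return categories


# A rating can belong to both groups, one group, or neither.
print(rating_categories(9, 4.5))
print(rating_categories(2, 3.0))
print(rating_categories(5, 6.0))
```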
Phase 1: Rating Movies
  • Movies most subjects would have seen (the 100 worldwide top-grossing films of all time)
  • Cover a broad spectrum of genres
    • Top 10 rated movies from each genre as listed in the Internet Movie Database (IMDB): Action, Adventure, Animation, Family, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Independent, Musical, Mystery, Romance, Science Fiction, Thriller, War, and Western.
  • Include bad movies (the IMDB 100 worst-rated movies with at least 1,000 ratings)
  • 283 total films
  • Ratings on 1 (bad) to 10 (great) scale
Generating Profiles
  • Each profile contained exactly 10 movies, 4 from an experimental category and 6 from its complement
    • E.g. 4 movies with extreme ratings and 6 with non-extreme ratings
  • Control for average difference, standard deviation, etc. so we could see how differences on specific categories of films affected trust
  • 59 subjects
  • Age 20 to 52
  • Education
    • 6 high school, 11 bachelors, 23 masters, 11 PhD, 8 unreported
  • Movie Experience
    • Watch 1-2 times per week on average
    • Movie media (web, magazines, etc.) every week or two
  • Reconfirmed that trust strongly correlates with overall similarity ().
  • Agreement on extremes ()
  • Largest single difference (r)
  • Subject’s propensity to trust ()
  • Gather all pairs of FilmTrust users who have a known trust relationship and share movies in common
    • 322 total user pairs
  • Develop a formula using the experimental parameters to estimate trust
  • Compute accuracy by comparing computed trust value with known value
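
The accuracy check in the last bullet can be sketched as below. The slides do not say which error measure was used; mean absolute error is assumed here as one common choice, and the trust values are made up for illustration.

```python
def mean_absolute_error(computed, known):
    """Average absolute gap between computed and known trust values."""
    if len(computed) != len(known) or not computed:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(c - k) for c, k in zip(computed, known)) / len(computed)


# Illustrative values only (not data from the study).
computed_trust = [7, 5, 9]
known_trust = [8, 5, 6]
print(mean_absolute_error(computed_trust, known_trust))
```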
In FilmTrust

Use weights (w1, w2, w3, w4, w) = (7, 2, 1, 8, 2)

Experimental Conclusions
  • Social trust relationships are stronger between people who are similar in certain ways
  • First observed in controlled experiments
  • Verified through application in a real system


Using social trust for improved information access

Social Information Access
  • Use social relationships (e.g. trust) for
    • Aggregating Information
    • Sorting and Ranking Information
    • Filtering and Assessing the Quality of Information
  • FilmTrust
  • Use Trust for information access
    • Recommender system
    • Review ordering
  • 1200 users
Information Aggregation Using Trust
  • Trust-based Recommender System
    • Generates predictive movie ratings based on trust
    • Weighted average of everyone’s ratings of the film, where trust is the weight
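
A trust-weighted average of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not FilmTrust's implementation: the function name, the dictionary layout, and the choice to give raters with no known trust value a weight of zero are all hypothetical.

```python
def trust_weighted_rating(ratings, trust):
    """Predict a rating as the trust-weighted average of other users' ratings.

    ratings: {rater: that rater's rating of the film}
    trust:   {rater: how much the target user trusts that rater}
    Raters missing from `trust` get weight 0 (an assumption of this sketch).
    """
    weighted_sum = 0.0
    total_trust = 0.0
    for rater, rating in ratings.items():
        weight = trust.get(rater, 0.0)
        weighted_sum += weight * rating
        total_trust += weight
    if total_trust == 0:
        return None  # no trusted rater has rated this film
    return weighted_sum / total_trust


# Illustrative values: alice is trusted three times as much as bob.
ratings = {"alice": 4.0, "bob": 2.0, "carol": 3.0}
trust = {"alice": 9, "bob": 3}
print(trust_weighted_rating(ratings, trust))  # (9*4 + 3*2) / 12 = 3.5
```

The prediction is pulled toward the ratings of highly trusted people, which is the point of using trust as the weight.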

[Figure. Labels: "Difference between known user rating and recommended rating (measured in number of stars difference)"; "Minimum difference between known user rating and average rating"]

Conclusions - Social Information Access
  • Use understanding and analysis of social behavior in web-based social networks to improve information access
  • Shown a connection between social trust and similarity
  • Shown how trust can be used for aggregating, sorting, and filtering information
Future Directions - General
  • Improved understanding of behavior in web-based social networks
  • How different types of social connections relate to information
  • How to improve information access using new social analyses
Future Directions - Specific
  • Ad hoc information and social networks for micro news
  • E.g. I have evacuated because of a natural disaster (earthquake, hurricane, flood) and want to know what’s going on at my house.
  • Distributed information (satellite photos, ground, video, photos, blog entries, local news reports, message board text)
  • Needs
    • Provenance - is this information unique, or is it all derived from the same source?
    • Trust - should I trust the source of this information?
  • Jennifer Golbeck
Generating Profiles
  • Pre-defined rating differences
  • Subjects rated 54 total profiles
    • Six categories
    • Three  values
    • Three profiles in each -category combination
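
The count of 54 follows from 6 × 3 × 3. A short enumeration makes the design explicit; the category names and the labels for the three values (read here as the pre-defined rating differences) are placeholders, since the slides do not list them.

```python
from itertools import product

# Placeholder labels: the slides give the counts but not the names.
categories = [f"category_{i}" for i in range(1, 7)]  # six categories
differences = ["value_1", "value_2", "value_3"]      # three values
replicates = [1, 2, 3]                               # three profiles per combination

profiles = list(product(categories, differences, replicates))
print(len(profiles))  # 6 * 3 * 3 = 54
```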
The Provenance Challenge
  • Researchers in many areas
    • Storage systems
    • Databases
    • Grid computing
    • Data mining
  • A challenge provides a standard for comparing approaches
  • Given a scientific workflow and nine challenge queries
  • Represent all data that we consider relevant about the history of each file
  • Answer as many queries as possible
FilmTrust Results
  • FilmTrust compared trust from the social network with overall similarity (via collaborative filtering algorithms) as a weight in recommender systems.
  • Trust outperformed overall similarity in some cases, suggesting that trust captures something more than overall similarity does
Ten Largest WBSNs
  • MySpace 150,000,000
  • ChinaRen Xiaonei 60,000,000
  • Adult Friend Finder 26,000,000
  • Bebo 25,000,000
  • Friendster 21,000,000
  • Cyworld 21,000,000
  • Tickle 20,000,000
  • Black Planet 18,000,000
  • Hi5 14,000,000
  • LiveJournal 12,000,000
Example Queries
  • Find everything that caused a given Graphic to be as it is.
  • Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter that ran on a Monday.
  • Find all images where at least one of the input files had an entry global maximum=4095.
  • A user has annotated some images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
Semantic Web Approach
  • Ontology represents information about the execution of services and the dependencies among files
  • Logical inferences connect objects to their ancestors
    • Role hierarchy separates direct lineage from ancestry
    • Semantics of transitive roles imply connections among files connected through ancestral relationships
  • Additional reasoning with Semantic Web Rules
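
The effect of a transitive ancestry role can be imitated with a plain graph traversal. The sketch below is a stand-in for the ontology reasoning described above, not the authors' system; the function name and the file names are hypothetical.

```python
def ancestors(direct_sources, item):
    """Return every file that `item` was derived from, directly or indirectly.

    direct_sources: {file: set of files it was directly derived from}
    Direct lineage is the stored relation; ancestry is its transitive closure.
    """
    found = set()
    stack = list(direct_sources.get(item, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(direct_sources.get(current, ()))
    return found


# Hypothetical workflow outputs and inputs.
direct_sources = {
    "atlas.gif": {"atlas.img"},
    "atlas.img": {"warp1.img", "warp2.img"},
    "warp1.img": {"anatomy1.img"},
}
print(sorted(direct_sources["atlas.gif"]))             # direct lineage only
print(sorted(ancestors(direct_sources, "atlas.gif")))  # full ancestry
```

Keeping the direct relation separate from the derived one mirrors the role hierarchy on the slide: queries about immediate inputs and queries about "everything that caused" a file use different relations.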
Evaluation through Query Answering
  • SPARQL, a W3C standard, is used to formulate queries
  • We were easily able to answer all nine queries for the challenge (one of only 3 teams from 15 entries)
  • We have already completed the second phase of the challenge, importing data from other systems and applying our techniques

A Web-based Social Network (WBSN) must meet these criteria:

  • Accessible over the web with a web browser
  • Users must explicitly state their relationship with other people qua stating a relationship
  • Must have explicit built-in support for users making these connections.
  • Relationships must be visible and browsable


Why the Difference?
  • Ranges of disconnected members
    • Dogster and HAMSTERster have the lowest rates
    • Ecademy
    • FilmTrust
    • Mobango and Worldshine
  • As the non-social networking purpose of the website becomes stronger, the number of friendless members and outsiders increases
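
Counting friendless members and outsiders comes down to finding connected components. The sketch below follows the definitions used elsewhere in the deck (friendless: no connections; outsiders: connected, but outside the major connected component). It assumes an undirected network stored as a dictionary with one entry per member; the names are hypothetical.

```python
from collections import deque


def classify_members(adjacency):
    """Split members into friendless and outsiders.

    adjacency: {member: set of friends}, assumed symmetric, with every
    member present as a key (friendless members map to an empty set).
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            member = queue.popleft()
            for friend in adjacency.get(member, ()):
                if friend not in seen:
                    seen.add(friend)
                    component.add(friend)
                    queue.append(friend)
        components.append(component)

    largest = max(components, key=len) if components else set()
    friendless = {m for m, friends in adjacency.items() if not friends}
    outsiders = {m for m in adjacency if m not in largest and m not in friendless}
    return friendless, outsiders


network = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a"},  # major component
    "d": {"e"}, "e": {"d"},                   # connected, but cut off
    "f": set(),                               # no connections at all
}
friendless, outsiders = classify_members(network)
print(sorted(friendless), sorted(outsiders))
```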


Using Web-Based Social Networks (WBSNs)
  • If we are going to use social networks for information access we must understand…
  • How do users behave in social networks?
  • How do social relationships relate to information?
  • The trust we have in people can inform how we treat information provided by those people
  • This and other studies suggest trust will work well for aggregating, filtering and sorting information
  • Important when working on the web
  • Motivation
  • Understanding Relationships in Web-based Social Networks
    • Behavior
    • Trust
  • Using Social Relationships for Information Access
  • Conclusions and Future Directions
Understanding Social Behavior

In Web-Based Social Networks

Behavior and Dynamics
  • Social networks are not static.
    • Relationships constantly change, are formed, and are dropped.
    • New people enter the network and others leave
  • Do people behave the same way in social networks on the Web?
  • How do these networks grow (and shrink)?
  • How are relationships added (and removed)?
  • What affects social disconnect?
  • What affects centrality?
  • 24 month study
  • Automatically collected adjacency lists (everyone and who they know), join dates, and last active dates for all members.
    • December 2004
    • December 2006
  • For 7 networks, I collected adjacency lists every day for 7 weeks.
    • Who joined or left
    • What relationships were added or removed
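
Comparing two daily adjacency-list snapshots yields exactly these four quantities. The sketch below is one way to do it, assuming each snapshot is a dictionary from member to set of friends and that relationships are undirected; the function name and sample data are hypothetical.

```python
def diff_snapshots(before, after):
    """Compare two adjacency-list snapshots of the same network.

    Returns (joined, left, added, removed), where the last two are sets of
    undirected relationships represented as frozensets of two members.
    """
    def edges(adjacency):
        return {frozenset((member, friend))
                for member, friends in adjacency.items()
                for friend in friends}

    joined = set(after) - set(before)
    left = set(before) - set(after)
    added = edges(after) - edges(before)
    removed = edges(before) - edges(after)
    return joined, left, added, removed


day_1 = {"a": {"b"}, "b": {"a"}, "c": set()}
day_2 = {"a": {"b", "d"}, "b": {"a"}, "d": {"a"}}
joined, left, added, removed = diff_snapshots(day_1, day_2)
print(joined, left, added, removed)
```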
Network Growth
  • People do not leave social networks
    • On sites with a clear, simple process for leaving, fewer than a dozen members leave per day
    • In some networks, essentially no one has ever left
  • Lots of people join social networks
    • For ten networks we knew the date that every member joined the network
    • Networks tend to show linear growth
    • The slope can shift
      • Usually occurs suddenly
      • Explained by some event
  • Forming relationships is the basis for social networking
  • Almost all networks are growing denser
    • Relationships grow at approximately 1.7 - 2.7 times the rate of membership
  • There is a strong social disincentive to remove relationships
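
One way to read "relationships grow at 1.7 - 2.7 times the rate of membership" is as a ratio of growth slopes. The sketch below fits a least-squares line to each series and divides the slopes; this reading is an assumption, and the counts are invented for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Invented daily counts, not data from the study.
days = [0, 1, 2, 3]
members = [100, 110, 120, 130]
relationships = [300, 320, 340, 360]

ratio = slope(days, relationships) / slope(days, members)
print(ratio)  # relationships are added twice as fast as members here
```

A ratio above 1 means the network is growing denser, as the slide describes.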
Friendless and the Outsiders
  • Friendless have no social connections
  • Outsiders have social connections but are independent from the major connected component of the network
  • Important because if we are using the social network for information access, these people will get little benefit.
  • Other than having lots of friends, what makes people more central?
    • Average shortest path length as centrality measure
  • Activity
    • Consider join date, last active date, and length of activity (last active date - join date)
    • Compute rank correlation with centrality
    • Medium strength correlation (~0.5) between duration and centrality
  • Networks follow a linear growth pattern, where the slope shifts in response to events
    • People rarely leave networks
  • Networks grow denser, with relationships added more frequently than members
    • People will delete relationships, but orders of magnitude less frequently than they add them
  • Websites with more non-social features tend to have more friendless and disconnected users
  • Users with longer periods of activity tend to be more central to the network
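
The centrality analysis can be sketched in two steps: average shortest path length per member, then a rank correlation against length of activity. Spearman's coefficient is assumed here as the rank correlation, since the slides do not name one; all function names and sample data are hypothetical.

```python
from collections import deque


def average_shortest_path(adjacency, source):
    """Mean BFS distance from `source` to every member it can reach.

    A lower value means a more central member.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        member = queue.popleft()
        for friend in adjacency.get(member, ()):
            if friend not in distance:
                distance[friend] = distance[member] + 1
                queue.append(friend)
    others = [d for m, d in distance.items() if m != source]
    return sum(others) / len(others) if others else float("inf")


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation (assumes neither input is constant)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# A four-member chain a - b - c - d with invented activity lengths in days.
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
activity_days = {"a": 30, "b": 400, "c": 350, "d": 10}

members = sorted(network)
path_lengths = [average_shortest_path(network, m) for m in members]
durations = [activity_days[m] for m in members]
print(path_lengths)
print(spearman(durations, path_lengths))
```

Because lower path length means higher centrality, a negative coefficient in this sketch corresponds to longer-active members being more central; the sign depends on how centrality is scored.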
Example Profile
  • Movies m1 through m10
  • User ratings r1…r10 for m1…m10
    • r1…r4 are extreme (1,2,9, or 10)
    • r5…r10 are not extreme
  • Profile ratings pi = ri§i
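
Choosing the ten movies for a profile of this kind can be sketched as below. It covers only the selection of m1…m10 (four extreme ratings followed by six non-extreme ones), not the derivation of the profile ratings; the function name, the sampling method, and the sample ratings are assumptions.

```python
import random

EXTREME = {1, 2, 9, 10}  # extreme values on the 1-10 scale


def pick_profile_movies(subject_ratings, rng):
    """Pick 10 movies: 4 the subject rated extremely, then 6 rated otherwise.

    subject_ratings: {movie: the subject's rating on the 1-10 scale}
    rng: a random.Random instance, passed in so results can be reproduced.
    """
    extreme = [m for m, r in subject_ratings.items() if r in EXTREME]
    other = [m for m, r in subject_ratings.items() if r not in EXTREME]
    if len(extreme) < 4 or len(other) < 6:
        raise ValueError("not enough rated movies in one of the groups")
    return rng.sample(extreme, 4) + rng.sample(other, 6)


# Invented ratings for thirteen placeholder movies.
subject_ratings = {
    f"movie_{i}": rating
    for i, rating in enumerate([1, 2, 9, 10, 1, 5, 6, 4, 7, 3, 5, 6, 8], start=1)
}
profile_movies = pick_profile_movies(subject_ratings, random.Random(0))
print(profile_movies)
```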