Connecting distributed people and information on the web
Connecting Distributed People and Information on the Web. Jennifer Golbeck, College of Information Studies, Human-Computer Interaction Lab, University of Maryland, College Park. Information Access on the Web. Find an mp3 of a song that was on the Billboard Top Ten that features a cowbell.


Connecting Distributed People and Information on the Web

Jennifer Golbeck

College of Information Studies

Human-Computer Interaction Lab

University of Maryland, College Park


Information Access on the Web

  • Find an mp3 of a song that was on the Billboard Top Ten that features a cowbell.

The Cowbell Project - http://www.geekspeakweekly.com/cowbell/


Finding Trusted Information

  • How many cows in Texas?

http://www.cowabduction.com/


The Social Solution

  • People are the sources of information

  • Social relationships give us information about people

  • Use relationships to understand the information people produce.


Current State

  • 250-ish social networks

  • 850,000,000 users

  • Ning claims 185,000 networks


My Research Questions

  • How do users behave and relate to one another in web-based social networks?

  • How do social connections, like trust, relate to information?

  • How can we estimate relationships (like trust) between people who do not know each other?

  • How can we use social networks to build intelligent systems to improve information access?



Social Relationships and Information

How Trust Relates to Similarity


A Study

  • People create information on the web

    • An expression of their opinions and view of the world

    • Focus on quantitative information (e.g. ratings)

  • People express trust in social networks

  • How does trust relate to the similarity of two people?


The Idea

  • We know trust correlates with overall similarity (Ziegler and Golbeck, 2006)

  • Does trust capture more than just overall agreement?

  • Two Part Analysis

    • Controlled study to find profile similarity measures that relate to trust

    • Verification through application in a live system


Experimental Outline

  • Phase 1: Rate Movies - Subjects rate movies on the list

    • Ratings grouped as extreme (1,2,9,10) or far from average (≥4 different)

  • Create profiles of hypothetical users

    • Profile is a list of movies and the hypothetical user’s ratings of them

  • Subjects rate how much they would trust the person represented by the profile

    • Vary the profile’s ratings in a controlled way
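
The two rating groups from Phase 1 can be sketched in a few lines of Python. This is only an illustration: the slide does not say which average "far from average" refers to, so the sketch assumes it is each movie's average rating, and the function names are hypothetical.

```python
# Ratings the slide calls "extreme" on the 1-10 scale.
EXTREME_RATINGS = {1, 2, 9, 10}


def is_extreme(rating):
    """True if the rating is one of 1, 2, 9, or 10."""
    return rating in EXTREME_RATINGS


def is_far_from_average(rating, average, threshold=4.0):
    """True if the rating differs from the average by at least `threshold`.

    Assumption: `average` is the movie's average rating; the slide only
    says "far from average (>= 4 different)".
    """
    return abs(rating - average) >= threshold


def categorize(ratings, averages):
    """Label every rated movie with both group memberships.

    ratings:  dict mapping movie -> the subject's rating
    averages: dict mapping movie -> assumed average rating
    """
    return {
        movie: {
            "extreme": is_extreme(rating),
            "far_from_average": is_far_from_average(rating, averages[movie]),
        }
        for movie, rating in ratings.items()
    }
```

A rating can fall into both groups, one group, or neither, which is why the sketch reports the two flags separately.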


Phase 1: Rating Movies

  • Movies most subjects would have seen (the 100 worldwide top-grossing films of all time)

  • Cover a broad spectrum of genres

    • Top 10 rated movies from each genre as listed in the Internet Movie Database (IMDB): Action, Adventure, Animation, Family, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Independent, Musical, Mystery, Romance, Science Fiction, Thriller, War, and Western.

  • Include bad movies (the IMDB 100 worst-rated movies with at least 1,000 ratings)

  • 283 total films

  • Ratings on 1 (bad) to 10 (great) scale


Generating Profiles

  • Each profile contained exactly 10 movies, 4 from an experimental category and 6 from its complement

    • E.g. 4 movies with extreme ratings and 6 with non-extreme ratings

  • Control for average difference, standard deviation, etc. so we could see how differences on specific categories of films affected trust
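
The 4-plus-6 composition of a profile could be sampled roughly as follows. This is a sketch under assumptions: the helper name and arguments are hypothetical, and it covers only the movie selection, not the controls on average difference and standard deviation that the study applied.

```python
import random


def build_profile(category_movies, other_movies, rng, n_category=4, n_other=6):
    """Pick 4 movies from the experimental category and 6 from its complement.

    category_movies: movies the subject rated inside the category
                     (e.g. with extreme ratings)
    other_movies:    movies the subject rated outside the category
    rng:             a random.Random instance, so a run can be reproduced
    """
    if len(category_movies) < n_category or len(other_movies) < n_other:
        raise ValueError("subject has not rated enough movies for a profile")
    # Sort first so the sample does not depend on set iteration order.
    chosen = rng.sample(sorted(category_movies), n_category)
    chosen += rng.sample(sorted(other_movies), n_other)
    return chosen
```

For example, `build_profile(extreme, non_extreme, random.Random(0))` returns a list of ten distinct movies, the first four drawn from `extreme`.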



Subjects

  • 59 subjects

  • Age 20 to 52

  • Education

    • 6 high school, 11 bachelor's, 23 master's, 11 PhD, 8 unreported

  • Movie Experience

    • Watch movies 1-2 times per week on average

    • Read movie media (web, magazines, etc.) every week or two


Results

  • Reconfirmed that trust strongly correlates with overall similarity ().

  • Agreement on extremes ()

  • Largest single difference (r)

  • Subject’s propensity to trust ()



Validation

  • Gather all pairs of FilmTrust users who have a known trust relationship and share movies in common

    • 322 total user pairs

  • Develop a formula using the experimental parameters to estimate trust

  • Compute accuracy by comparing computed trust value with known value


In FilmTrust

Use weights (w1, w2, w3, w4, w) = (7, 2, 1, 8, 2)


Experimental Conclusions

  • Social trust relationships are stronger between people who are similar in certain ways

  • First observed in controlled experiments

  • Verified through application in a real system



Applications

Using social trust for improved information access


Social Information Access

  • Use social relationships (e.g. trust) for

    • Aggregating Information

    • Sorting and Ranking Information

    • Filtering and Assessing the Quality of Information

  • FilmTrust


FilmTrust

  • Use Trust for information access

    • Recommender system

    • Review ordering

  • 1200 users


Information Aggregation Using Trust

  • Trust-based Recommender System

    • Generates predictive movie ratings based on trust

    • Weighted average of everyone’s ratings of the film, where trust is the weight
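
A minimal sketch of a trust-weighted average, assuming the trust values from the target user to each rater are already known. How FilmTrust obtains trust for people the user has not rated directly is outside this sketch, and the function name is hypothetical.

```python
def trust_weighted_rating(ratings, trust):
    """Predict a rating for one film as a trust-weighted average.

    ratings: dict mapping rater -> that rater's rating of the film
    trust:   dict mapping rater -> trust the target user places in the rater
             (raters missing from this dict get weight 0)
    Returns None when no rater has positive total weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rater, rating in ratings.items():
        weight = trust.get(rater, 0.0)
        weighted_sum += weight * rating
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Two raters: the more trusted one pulls the prediction toward her rating.
predicted = trust_weighted_rating(
    {"alice": 4.0, "bob": 2.0},
    {"alice": 9, "bob": 3},
)
# (9 * 4.0 + 3 * 2.0) / (9 + 3) = 3.5
```

With equal trust in every rater, the result reduces to the plain average rating, which is the baseline a trust-based recommendation would be compared against.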


[Figure. Axes: difference between known user rating and recommended rating (measured in number of stars); minimum difference between known user rating and average rating.]



Conclusions and Future Directions


Conclusions - Social Information Access

  • Use understanding and analysis of social behavior in web-based social networks to improve information access

  • Shown a connection between social trust and similarity

  • Shown how trust can be used for aggregating, sorting, and filtering information


Future Directions - General

  • Improved understanding of behavior in web-based social networks

  • How different types of social connections relate to information

  • How to improve information access using new social analyses


Future Directions - Specific

  • Ad hoc information and social networks for micro news

  • E.g. I have evacuated for a natural disaster (earthquake, hurricane, flood). I want to know what’s going on at my house.

  • Distributed information (satellite photos, ground, video, photos, blog entries, local news reports, message board text)

  • Needs

    • Provenance - is this information unique, or is it all derived from the same source?

    • Trust - should I trust the source of this information?


Questions

  • Jennifer Golbeck

  • [email protected]

  • http://trust.mindswap.org


Generating Profiles

  • Pre-defined rating differences

  • Subjects rated 54 total profiles

    • Six categories

    • Three  values

    • Three profiles in each -category combination


The Provenance Challenge

  • Researchers in many areas

    • Storage systems

    • Databases

    • Grid computing

    • Data mining

  • A challenge provides a standard for comparing approaches

  • Given a scientific workflow and nine challenge queries

  • Represent all data that we consider relevant about the history of each file

  • Answer as many queries as possible


FilmTrust Results

  • FilmTrust compared trust from the social network with overall similarity (via collaborative filtering algorithms) as a weight in recommender systems.

  • Trust outperformed overall similarity in some cases, suggesting that trust captures something more than overall similarity does


Ten Largest WBSNs

  • MySpace 150,000,000

  • ChinaRen Xiaonei 60,000,000

  • Adult Friend Finder 26,000,000

  • Bebo 25,000,000

  • Friendster 21,000,000

  • Cyworld 21,000,000

  • Tickle 20,000,000

  • Black Planet 18,000,000

  • Hi5 14,000,000

  • LiveJournal 12,000,000


Example Queries

  • Find everything that caused a given Graphic to be as it is.

  • Find all invocations of procedure align_warp using a twelfth-order nonlinear 1365-parameter model that ran on a Monday.

  • Find all images where at least one of the input files had an entry global maximum=4095.

  • A user has annotated some images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.


Semantic Web Approach

  • Ontology represents information about the execution of services and the dependencies among files

  • Logical inferences connect objects to their ancestors

    • Role hierarchy separates direct lineage from ancestry

    • Semantics of transitive roles imply connections among files connected through ancestral relationships

  • Additional reasoning with Semantic Web Rules
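
The effect of a transitive ancestry role can be illustrated without a reasoner. The sketch below is not the ontology or inference engine used in the challenge entry; it is plain Python that follows direct-lineage links until no new ancestors appear, and the file names in the example are hypothetical.

```python
from collections import deque


def ancestors(direct_parents, item):
    """Return every item reachable by following direct-lineage links.

    direct_parents: dict mapping an item -> set of items it was
                    directly derived from (the "direct lineage" role)
    The result is the transitive closure, i.e. the "ancestry" role.
    """
    seen = set()
    queue = deque(direct_parents.get(item, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        # Ancestors of an ancestor are ancestors too (transitivity).
        queue.extend(direct_parents.get(current, ()))
    return seen


# Hypothetical workflow: a graphic derived from an atlas image, which was
# derived from two warped images, one of which came from an anatomy scan.
lineage = {
    "atlas.gif": {"atlas.img"},
    "atlas.img": {"warp1.img", "warp2.img"},
    "warp1.img": {"anatomy1.img"},
}
everything_behind_graphic = ancestors(lineage, "atlas.gif")
```

Keeping `direct_parents` separate from the computed closure mirrors the role hierarchy on the slide: direct lineage stays queryable on its own, while ancestry is derived from it.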


Evaluation through Query Answering

  • SPARQL, a W3C standard, is used to formulate queries

  • We were easily able to answer all nine queries for the challenge (one of only 3 teams from 15 entries)

  • Have already completed the second phase of the challenge, importing data from other systems and applying our techniques


Definition

A Web-based Social Network (WBSN) must meet these criteria:

  • Accessible over the web with a web browser

  • Users must explicitly state their relationship with other people qua stating a relationship

  • Must have explicit built-in support for users making these connections.

  • Relationships must be visible and browsable



Why the Difference?

  • Ranges of disconnected members

    • Dogster and HAMSTERster have lowest rates

    • Ecademy

    • FilmTrust

    • Mobango and Worldshine

  • As the non-social networking purpose of the website becomes stronger, the number of friendless and outsiders increases



Using Web-Based Social Networks (WBSNs)

  • If we are going to use social networks for information access we must understand…

  • How do users behave in social networks?

  • How do social relationships relate to information?



Implications

  • The trust we have in people can inform how we treat information provided by those people

  • This and other studies suggest trust will work well for aggregating, filtering and sorting information

  • Important when working on the web


Outline

  • Motivation

  • Understanding Relationships in Web-based Social Networks

    • Behavior

    • Trust

  • Using Social Relationships for Information Access

  • Conclusions and Future Directions



Understanding Social Behavior

In Web-Based Social Networks


Behavior and Dynamics

  • Social networks are not static.

    • Relationships constantly change, are formed, and are dropped.

    • New people enter the network and others leave

  • Do people behave the same way in social networks on the Web?


Questions

  • How do these networks grow (and shrink)?

  • How are relationships added (and removed)?

  • What affects social disconnect?

  • What affects centrality?


Methodology

  • 24 month study

  • Automatically collected adjacency lists (everyone and who they know), join dates, and last active dates for all members.

    • December 2004

    • December 2006

  • For 7 networks, I collected adjacency lists every day for 7 weeks.

    • Who joined or left

    • What relationships were added or removed
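
Comparing two daily adjacency-list snapshots comes down to set differences. The sketch below assumes relationships are symmetric and that each snapshot is a dict from member to the set of people they list; the actual collection format is not described on the slide, and the function names are hypothetical.

```python
def undirected_edges(adjacency):
    """Collect each relationship once, ignoring direction and self-links."""
    return {
        frozenset((member, friend))
        for member, friends in adjacency.items()
        for friend in friends
        if member != friend
    }


def diff_snapshots(before, after):
    """Report membership and relationship changes between two snapshots.

    before, after: dict mapping member -> set of members they are linked to
    """
    members_before, members_after = set(before), set(after)
    edges_before = undirected_edges(before)
    edges_after = undirected_edges(after)
    return {
        "joined": members_after - members_before,
        "left": members_before - members_after,
        "relationships_added": edges_after - edges_before,
        "relationships_removed": edges_before - edges_after,
    }


day1 = {"a": {"b"}, "b": {"a"}, "c": set()}
day2 = {"a": {"b", "d"}, "b": {"a"}, "d": {"a"}}
changes = diff_snapshots(day1, day2)
# "d" joined, "c" left, and the a-d relationship was added.
```

Running this over consecutive days gives per-day counts of joins, departures, additions, and removals, which is the kind of series the growth and density observations rely on.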


Network Growth

  • People do not leave social networks

    • On sites with a clear, simple process for leaving, fewer than a dozen members leave per day

    • In some networks, essentially no one has ever left

  • Lots of people join social networks

    • For ten networks we knew the date that every member joined the network

    • Networks tend to show linear growth

    • The slope can shift

      • Usually occurs suddenly

      • Explained by some event


Relationships

  • Forming relationships is the basis for social networking

  • Almost all networks are growing denser

    • Relationships grow at approximately 1.7 - 2.7 times the rate of membership

  • There is a strong social disincentive to remove relationships



Friendless and the Outsiders

  • Friendless have no social connections

  • Outsiders have social connections but are independent from the major connected component of the network

  • Important because if we are using the social network for information access, these people will get little benefit.
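
Both groups can be found with a connected-components pass over the adjacency lists. This is a sketch that treats relationships as undirected and takes the largest component as "the major connected component"; both choices are assumptions, as is the function name.

```python
from collections import deque


def classify_members(adjacency):
    """Split members into friendless, outsiders, and the main component.

    adjacency: dict mapping member -> set of members they are linked to
    Returns (friendless, outsiders, largest_component) as sets.
    """
    # Build a symmetric neighbor map so one-sided listings still count.
    neighbors = {member: set() for member in adjacency}
    for member, friends in adjacency.items():
        for friend in friends:
            if friend == member:
                continue
            neighbors[member].add(friend)
            neighbors.setdefault(friend, set()).add(member)

    # Breadth-first search from every unvisited member finds components.
    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for friend in neighbors[current]:
                if friend not in seen:
                    seen.add(friend)
                    component.add(friend)
                    queue.append(friend)
        components.append(component)

    largest = max(components, key=len) if components else set()
    friendless = {m for m, friends in neighbors.items() if not friends}
    outsiders = set(neighbors) - largest - friendless
    return friendless, outsiders, largest


network = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},  # main component
    "d": {"e"}, "e": {"d"},                   # connected, but cut off
    "f": set(),                               # no connections at all
}
friendless, outsiders, main = classify_members(network)
```

In this example `f` is friendless and `d` and `e` are outsiders, so none of the three would benefit from information flowing through the main component.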


Centrality

  • Other than having lots of friends, what makes people more central?

    • Average shortest path length as centrality measure

  • Activity

    • Consider join date, last active date, and length of activity (last active date - join date)

    • Compute rank correlation with centrality

    • Medium strength correlation (~0.5) between duration and centrality
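
The two measurements on this slide can be sketched in plain Python. The slide says only "rank correlation", so the use of Spearman's coefficient here is an assumption, as are the function names; unreachable members are simply left out of the path-length average.

```python
from collections import deque


def average_shortest_path_length(neighbors, source):
    """Mean BFS distance from `source` to every member it can reach."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for friend in neighbors.get(current, ()):
            if friend not in distance:
                distance[friend] = distance[current] + 1
                queue.append(friend)
    reachable = [d for node, d in distance.items() if node != source]
    return sum(reachable) / len(reachable) if reachable else None


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5
```

Note that a lower average shortest path length means a more central member, so when correlating activity duration against this measure directly, longer activity going with greater centrality shows up as a negative coefficient unless the centrality ranking is reversed first.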


Conclusions

  • Networks follow a linear growth pattern, where the slope shifts in response to events

    • People rarely leave networks

  • Networks grow denser, with relationships added more frequently than members

    • People will delete relationships, but orders of magnitude less frequently than they add them

  • Websites with more non-social features tend to have more friendless and disconnected users

  • Users with longer periods of activity tend to be more central to the network


Example Profile

  • Movies m1 through m10

  • User ratings r1…r10 for m1…m10

    • r1…r4 are extreme (1,2,9, or 10)

    • r5…r10 are not extreme

  • Profile ratings pi = ri ± Δi

