
Location Mining from Online Social Networks

Satyen Abrol

Advisors:

Dr. Latifur Khan

Dr. Bhavani Thuraisingham


Location Mining in Online Social Networks

What is the city level home location of a user?


Outline

  • Introduction and Problem Statement

  • Different Approaches

  • Social Graph Based: Our Approaches

    • Tweethood: Fuzzy k-Closest Friends with Variable Depth

    • Tweecalization: Label Propagation

    • Tweeque: Graph Partitioning for Spatio-Temporal Analysis

  • Experiments and Results

  • Future Work


    Twitter - Basics

    • Profile fields: Location, # of Followers, # of Following, # of Tweets

    • Tweets: maximum 140 characters



    Privacy and Security

    • Losing locational privacy forever

      • Users leave field blank, don’t want strangers to know their locations

    • http://pleaserobme.com/


    Trustworthiness

    Goal: to be able to trust and verify the location stated in a user's profile

    • Corporate companies use social media for better advertising and marketing

    • Iran Elections of 2009

      • US State Department used Twitter as a source

    • Trustworthiness is important in such cases


    Marketing and Business

    • Large corporations such as Walmart, Starbucks, and United Airlines use social media

      • Great tool for inexpensive advertising

      • Getting feedback from users


    The Problem

    Many users:

    • Leave the location field blank in their Twitter profiles

    • Do not provide valid geographic information

      • “Justin Biebers heart”, “NON YA BISNESS!!”, “looking down on u people”

    • Provide incorrect locations that nonetheless exist in the real world

      • “Nothing” in Arizona, “Little Heaven” in Connecticut

    • Provide several locations, making the home location difficult to identify

      • “CALi b0Y $TuCCiN V3Ga$” – California boy stuck in Las Vegas, NV

    • Enter just a country, state, county, etc. and no city-level location (~35%) [1]

    1. B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.


    Location Prediction in Social Networks

    • Two Approaches

      • Content Based [1, 2]

      • Using Social Graph [3, 4, 5]

    1. Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: A content-based approach to geo-locating twitter users”. In CIKM ’10.

    2. B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.

    3. S. Abrol, L. Khan and B. Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

    4. S. Abrol, L. Khan and B. Thuraisingham, “Tweecalization: Efficient and intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania.

    5. S. Abrol and L. Khan, “Agglomerative clustering on fuzzy k-closest friends with variable depth for location mining,” The Second IEEE International Conference on Social Computing (SocialCom2010), Aug 20-22, 2010, Minneapolis, Minnesota.


    Content Based Approach

    • Inaccurate – Location in Text not Location of User

    • Involves Ambiguity: Paris can mean

      • Paris Hilton

      • Paris, the capital of France

      • Paris, a town in Texas

    • Slow – Uses NLP/ Machine Learning techniques, searches gazetteers


    Using Social Graphs

    • Based on Japanese Proverb - “When the character of a man is not clear to you, look at his friends.”

    • Relationship between geospatial proximity and friendship

    • Uses classical data mining algorithms for more accurate results

    • Faster and can be used for real world applications


    Geospatial Proximity and Friendship

    • Form 1012 Twitter user pairs and identify geo distance

    • The curve follows a power law of the form $a(x+b)^{-c}$, with an exponent of -0.87


    Graph Construction

    • Vertices (data points) represent users

    • Each edge represents ‘similarity’ between two users

    • Deal with special cases

      • Spammers – follow random people

      • Celebrities – followed by random people

    • In these special cases the edge weight is reduced


    Defining Edge Weight

    • Consists of two components:

      • Trustworthiness (TW)

      • Mutual Friends (MF)


    Trustworthiness

    • Fraction of a user's friends who have the same location label as the user

    • Intuition: A person who has stayed in the same place all their life will have most friends from the same location, and hence high trustworthiness

    [Figure: a user in Seattle/WA/USA with friends labeled by location; Trustworthiness: 0.6]
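
    The trustworthiness definition above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name and the choice to count only friends that have a location label are assumptions.

```python
def trustworthiness(user_location, friend_locations):
    """Fraction of labeled friends whose location equals the user's location.

    Assumption: friends without a location are excluded by the caller.
    """
    if not friend_locations:
        return 0.0
    same = sum(1 for loc in friend_locations if loc == user_location)
    return same / len(friend_locations)


# Hypothetical friend list: three of five friends share the user's label.
friends = ["Seattle/WA/USA"] * 3 + ["Dallas/TX/USA", "Boston/MA/USA"]
tw = trustworthiness("Seattle/WA/USA", friends)
print(tw)  # 0.6
```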


    Mutual Friends

    • Use the number of common friends as the similarity measure

      • Better Accuracy

      • Low Time Complexity


    Defining Edge Weight

    • Defined as

      $\mathrm{Weight}_{ij} = \alpha \times \max\{TW(U_i), TW(U_j)\} + (1 - \alpha) \times MF_{ij}$

    • $0 < \alpha < 1$, typically chosen to be around 0.7
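
    As a worked example of the edge-weight formula, the sketch below combines the two components. The slides do not say how the mutual-friends term MF is scaled, so the Jaccard-style normalization in `mutual_friend_score` is purely an assumption for illustration, as are all function names.

```python
def mutual_friend_score(friends_i, friends_j):
    """Hypothetical MF term: shared friends divided by all distinct friends."""
    union = friends_i | friends_j
    if not union:
        return 0.0
    return len(friends_i & friends_j) / len(union)


def edge_weight(tw_i, tw_j, mf_ij, alpha=0.7):
    """Weight_ij = alpha * max(TW_i, TW_j) + (1 - alpha) * MF_ij."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * max(tw_i, tw_j) + (1.0 - alpha) * mf_ij


# Two users with trustworthiness 0.6 and 0.3 who share 2 of 5 distinct friends.
mf = mutual_friend_score({"a", "b", "c", "d"}, {"c", "d", "e"})
weight = edge_weight(0.6, 0.3, mf)
print(round(weight, 4))
```

    With these inputs the trustworthiness term contributes 0.7 × 0.6 and the mutual-friends term 0.3 × 0.4, so the more trustworthy endpoint dominates the weight.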


    Tweethood: Fuzzy k-Closest Friends with Variable Depth

    • Choose k “closest” friends for the user

    • If a location is not found, look further (at friends of friends) for the answer

    • Each node is defined by a vector having locations with their respective probabilities

    • Boost and Aggregate at each step

    Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010




    Choose k closest friends of John Doe

    [Figure: John Doe connected to his k closest friends CB1, CB2, CB3, ..., CBk]


    Identify Locations

    [Figure: at depth 1, John Doe's k closest friends CB1, CB2, CB3, ..., CBk; only one has a location (Seattle, USA) and the rest are NULL. Annotation: LOW ACCURACY]


    What if we have depth=2 ?

    [Figure: at depth 2, the friends of CB1, CB2, CB3, ..., CBk are also examined; their locations include Seattle/WA/USA, Dallas/TX/USA, Sydney/AU, and Richardson/TX/USA, while others remain NULL]


    Location Vector for John Doe’s friends

    • CB1: Dallas/TX/USA 0.4, Seattle/WA/USA 0.2, Richardson/TX/USA 0.2, Sydney/AU 0.2

    • CB2: Dallas/TX/USA 0.33, New Delhi/Delhi/India 0.33, Sunnyvale/CA/USA 0.33

    • CB3: Austin/TX/USA 0.50, Minneapolis/MN/USA 0.50

    • CBk: Plano/TX/USA 0.25, Boulder/CO/USA 0.25, Salt Lake City/UT/USA 0.25, London/London/GB 0.25


    Location Vector for John Doe

    Dallas/TX/USA 0.1825

    Seattle/WA/USA 0.05

    Richardson/TX/USA 0.05

    Sydney/AU 0.05

    New Delhi/Delhi/IN 0.0825

    Sunnyvale/CA/USA 0.0825

    Austin/TX/USA 0.125

    Minneapolis/MN/USA 0.125

    Plano/TX/USA 0.0625

    Boulder/CO/USA 0.0625

    Salt Lake City/UT/USA 0.0625

    London/GB 0.0625
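
    The values in John Doe's vector are consistent with an equal-weight average of his four friends' vectors (for example, Dallas: 0.25 × 0.4 + 0.25 × 0.33 = 0.1825). The sketch below reproduces that aggregation step under this assumption; the "boost" step mentioned on the Tweethood slide is not specified in the slides and is omitted, and the function name is hypothetical.

```python
from collections import defaultdict


def aggregate(friend_vectors, weights=None):
    """Combine friends' location vectors into one vector for the user.

    Assumption: with no weights given, every friend counts equally (1/k).
    """
    k = len(friend_vectors)
    if weights is None:
        weights = [1.0 / k] * k
    combined = defaultdict(float)
    for w, vector in zip(weights, friend_vectors):
        for location, prob in vector.items():
            combined[location] += w * prob
    return dict(combined)


friend_vectors = [
    {"Dallas/TX/USA": 0.4, "Seattle/WA/USA": 0.2,
     "Richardson/TX/USA": 0.2, "Sydney/AU": 0.2},
    {"Dallas/TX/USA": 0.33, "New Delhi/Delhi/IN": 0.33,
     "Sunnyvale/CA/USA": 0.33},
    {"Austin/TX/USA": 0.50, "Minneapolis/MN/USA": 0.50},
    {"Plano/TX/USA": 0.25, "Boulder/CO/USA": 0.25,
     "Salt Lake City/UT/USA": 0.25, "London/GB": 0.25},
]
john = aggregate(friend_vectors)
for location, prob in sorted(john.items(), key=lambda kv: -kv[1]):
    print(f"{location}: {prob:.4f}")
```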


    Agglomerative Clustering

    {Dallas, Plano, Richardson}/TX/USA 0.295

    Seattle/WA/USA 0.05

    Sydney/AU 0.05

    New Delhi/Delhi/IN 0.0825

    Sunnyvale/CA/USA 0.0825

    Austin/TX/USA 0.125

    Minneapolis/MN/USA 0.125

    Boulder/CO/USA 0.0625

    Salt Lake City/UT/USA 0.0625

    London/GB 0.0625
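
    The merge of Dallas, Plano, and Richardson into one cluster with probability 0.1825 + 0.0625 + 0.05 = 0.295 can be sketched as a simple agglomerative pass that joins locations lying close together. The slides do not state the merge criterion, so the distance threshold `max_km`, the approximate coordinates, and the single-linkage rule are all assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def merge_nearby(vector, coords, max_km=50.0):
    """Repeatedly merge two clusters that contain locations within max_km."""
    clusters = [({name}, prob) for name, prob in vector.items()]
    while True:
        pair = next(
            ((i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))
             if any(haversine_km(coords[a], coords[b]) <= max_km
                    for a in clusters[i][0] for b in clusters[j][0])),
            None,
        )
        if pair is None:
            return clusters
        i, j = pair
        clusters[i] = (clusters[i][0] | clusters[j][0],
                       clusters[i][1] + clusters[j][1])
        del clusters[j]  # j > i, so index i stays valid


# Subset of John Doe's vector; coordinates are approximate.
vector = {"Dallas": 0.1825, "Plano": 0.0625, "Richardson": 0.05,
          "Austin": 0.125, "Seattle": 0.05}
coords = {"Dallas": (32.78, -96.80), "Plano": (33.02, -96.70),
          "Richardson": (32.95, -96.73), "Austin": (30.27, -97.74),
          "Seattle": (47.61, -122.33)}
clusters = merge_nearby(vector, coords)
best = max(clusters, key=lambda c: c[1])
print(sorted(best[0]), round(best[1], 4))
```

    After merging, the Dallas-area cluster carries more probability than any single city, which is why clustering changes the predicted location's confidence.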



    Tweecalization: Label Propagation

    • Users with a known location are in limited supply

    • Most users do not have a location

    • Need a method that can learn from unlabeled data

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi- supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania


    Tweecalization: Label Propagation

    • Ideal scenario for semi-supervised learning: only a few friends with locations (labeled data) [1]

    • Use both labeled and unlabeled data for training

    • Points which are close to each other are more likely to share a label

    1. Y. Bengio, O. Delalleau, and N. Le Roux, “Label propagation and quadratic criterion,” In O. Chapelle, B. Schölkopf and A. Zien (Eds.), Semi-supervised learning. MIT Press, 2006.


    Label Propagation: An Illustration

    [Figure: a central user (?) surrounded by friends with location (“CLAMPED LOCATIONS”) and friends without location]
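
    The idea of clamped locations can be illustrated with a small iterative label propagation loop: labeled friends keep their location fixed, and every other node repeatedly takes the weighted average of its neighbors' label distributions. This is a simplified sketch in the spirit of the cited label propagation work, not the Tweecalization implementation; the graph, edge weights, and function name are hypothetical.

```python
def propagate_labels(weights, clamped, iterations=50):
    """weights: node -> {neighbor: edge weight}; clamped: node -> fixed label."""
    labels = sorted(set(clamped.values()))
    dist = {n: {l: 0.0 for l in labels} for n in weights}
    for node, label in clamped.items():
        dist[node][label] = 1.0
    for _ in range(iterations):
        updated = {}
        for node, nbrs in weights.items():
            if node in clamped:
                updated[node] = dist[node]  # clamped nodes never change
                continue
            total = sum(nbrs.values())
            updated[node] = {
                l: (sum(w * dist[m][l] for m, w in nbrs.items()) / total
                    if total else 0.0)
                for l in labels
            }
        dist = updated
    return dist


# Central user with three labeled friends (a, b, c) and one unlabeled (d).
weights = {
    "user": {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.5},
    "a": {"user": 0.9, "d": 0.7},
    "b": {"user": 0.8},
    "c": {"user": 0.3},
    "d": {"user": 0.5, "a": 0.7},
}
clamped = {"a": "Dallas/TX/USA", "b": "Dallas/TX/USA", "c": "Seattle/WA/USA"}
result = propagate_labels(weights, clamped)
predicted = max(result["user"], key=result["user"].get)
print(predicted)  # Dallas/TX/USA
```

    Note that the unlabeled friend `d` still contributes: it picks up a distribution from its own neighbors and passes it on, which is how unlabeled data enters the prediction.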



    What About Temporal Analysis?

    • None of the existing work performs temporal analysis

    • What about migration/ geographical mobility?


    Migration/Geographical Mobility

    • 4% to 6% of the population migrates every year, which means 12 to 17 million people each year

    United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/


    Migration/Geographical Mobility

    • Migration as a function of age

    • People aged 20-29 have a higher probability to move

    High Migration Rate: College and Jobs

    Low Migration Rate: Old age, people settle down

    United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/


    Facebook Users and Mobility

    • Let us look at the cumulative effect

    • Only 28% to 37% are currently living in their hometown

    Based on our experiments on 300k Public Facebook Profiles


    Twitter Users and Mobility

    • Linking Twitter users to migration

    • 33% of all Twitter users are aged 25-34 years

    Based on findings by ABI Research. Online. Available: http://www.abiresearch.com


    Tweeque: Graph Partitioning

    • How do we know if “this” is the current location for a user?

    • How do we perform temporal analysis of friendships?

    • Propose a technique that indirectly infers the current location

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.


    Observation 1: Social Cliques and Location

    • Our definition: A social clique is an inclusive group of people that share friendship

    • Apart from friendship, what is the attribute that links members of a clique? Individual Locations

    • All members of a clique were or are at a particular geographical location at a particular instant of time like college, school, a company, etc.


    Observation 2: Migration and Time

    • As shown previously, people tend to migrate over the course of time

    • Based on these two observations, we hypothesize:

      • If we can divide the social graph of a particular user into cliques and check the location-based purity of each clique, we can accurately separate the user's current location from previous locations

    • Migration is our latent time factor


    Tweeque: An Example

    [Figure: all friends of the user, drawn as one graph, come from four groups: friends from high school in Dallas, friends from college in Boston, relatives/cousins, and friends from a job in Seattle. Graph partitioning separates them into Social Clique #1 (High School), Social Clique #2 (College), Social Clique #3 (Current Work), and Social Clique #4 (Relatives)]


    Tweeque: An Example

    [Figure: the four cliques (Relatives, High School, College, Work) with each member's location, such as Dallas/TX/USA, Boston/MA/USA, and Seattle/WA/USA. Clique purities shown: Purity (Dallas) = 0.32, Purity (Boston) = 0.45, Purity (Dallas) = 0.18, Purity (Seattle) = 0.69]
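
    The purity check can be sketched as follows. The slides do not define purity formally, so this example assumes it is the share of a clique's labeled members at the clique's most common location, and that the clique with the highest purity indicates the current location; both are assumptions, and the sample cliques are made up.

```python
from collections import Counter


def clique_purity(locations):
    """Return (dominant location, its share among labeled clique members)."""
    labeled = [loc for loc in locations if loc]
    if not labeled:
        return None, 0.0
    location, count = Counter(labeled).most_common(1)[0]
    return location, count / len(labeled)


def purest_clique(cliques):
    """Pick the clique with the highest purity (assumed current location)."""
    name = max(cliques, key=lambda c: clique_purity(cliques[c])[1])
    return name, clique_purity(cliques[name])


# Hypothetical cliques produced by graph partitioning.
cliques = {
    "Work": ["Seattle/WA/USA", "Seattle/WA/USA", "Redmond/WA/USA"],
    "College": ["Boston/MA/USA", "New York/NY/USA"],
}
name, (location, purity) = purest_clique(cliques)
print(name, location, round(purity, 2))
```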



    Tweeque: Graph Partitioning

    J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.





    Experiment Data

    • Randomly choose 1000 Twitter users


    Experiments and Results

    • 75.5% accuracy for city-level prediction

    • 80.1% accuracy for country-level prediction

    • We observe that the accuracy saturates after depth 4

    • Six degrees of separation is the idea that everyone is, on average, approximately six steps away, by way of introduction, from any other person in the world

    • For Twitter this distance is found to be 4.67


    Comparison of Different Approaches

    Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010 (Nominated for best paper award, Acceptance Rate: 13%)

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

    Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: A content-based approach to geo-locating twitter users”. In CIKM ’10.


    Contributions

    • Developed three graph based location mining algorithms for online social networks

      • Map the location mining problem to k-nearest-neighbor, semi-supervised learning, and graph partitioning problems

      • Outperform content based approach in time and accuracy

    • Relationship between geospatial proximity and friendship

    • Effect of geographical mobility on current location of users


    Future Work

    • Combining Content and Graph based methods

      • Score based geo-tagging technique [1]

      • Associating keywords with locations to build a probabilistic model: “cowboys” → Dallas, “casino” → Las Vegas

      • Since tweets have timestamps, this leads to more accurate prediction of current location

    1. Satyen Abrol, Latifur Khan, Tahseen Al-khateeb, “MapIt: Smarter Searches using Location Driven Knowledge Discovery and Mining”, In Proc. of 1st SIGSPATIAL ACM GIS 2009 International Workshop on Querying and Mining Uncertain Spatio-Temporal Data (QUeST), Nov 2009, Seattle.


    Future Work

    • Improve scalability of current algorithms using cloud computing framework

      • Each of the friends of a user is handled by a separate node in the distributed environment

    • Micro-level location identification

      • Identify specific points of interest (POIs) such as restaurants, place of work, etc., from tweets

      • Identify comfort zone for a user

      • Use Foursquare check-in dataset: over 30 million POIs all over the world


    Publications

    • Satyen Abrol, Latifur Khan and Bhavani Thuraisingham,“Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

    • Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi- supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania

    • Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010 (Nominated for best paper award, Acceptance Rate:13%)


    Publications

    • Satyen Abrol and Latifur Khan, “TWinner: Understanding News Queries With Geo-Content Using Twitter”. In Proc. of 6th Workshop on Geographic Information Retrieval (GIR'10) at Zurich, Switzerland.

    • Satyen Abrol, Latifur Khan, Tahseen Al-khateeb, “MapIt: Smarter Searches using Location Driven Knowledge Discovery and Mining”, In Proc. of 1st SIGSPATIAL ACM GIS 2009 International Workshop on Querying and Mining Uncertain Spatio-Temporal Data (QUeST), Nov 2009, Seattle.

    • Satyen Abrol, Latifur Khan, Vaibhav Khadilkar, Bhavani M. Thuraisingham, Tyrone Cadenhead, “Design and implementation of SNODSOC: Novel class detection for social network analysis”, ISI 2012: 215-220


    Thank You!

    Questions?

