Location Mining from Online Social Networks

Satyen Abrol

Advisors:

Dr. Latifur Khan

Dr. Bhavani Thuraisingham


Location Mining in Online Social Networks

What is the city level home location of a user?


Outline

  • Introduction and Problem Statement

  • Different Approaches

  • Social Graph Based: Our Approaches

    • Tweethood: Fuzzy k – Closest Friends with Variable Depth

    • Tweecalization: Label Propagation

    • Tweeque: Graph Partitioning for Spatio-Temporal Analysis

  • Experiments and Results

  • Future Work


    Twitter - Basics

    [Figure: a Twitter profile showing Location, # of Followers, # of Following, and # of Tweets]

    Tweets: maximum 140 characters


    Why is location so important?


    Privacy and Security

    • Losing locational privacy forever

      • Users leave the field blank because they don’t want strangers to know their locations

    • http://pleaserobme.com/


    Trustworthiness

    To be able to trust/verify the correctness of the location mentioned in a user profile

    • Companies use social media for better advertising and marketing

    • Iran Elections of 2009

      • US State Department used Twitter as a source

    • Trustworthiness is important in such cases


    Marketing and Business

    • Large corporations such as Walmart, Starbucks, and United Airlines use social media

      • Great tool for inexpensive advertising

      • Getting feedback from users


    The Problem

    • Users leave the location field blank in their Twitter profiles

    • Do not provide valid geographic information

      • “Justin Biebers heart”, “NON YA BISNESS!!”, “looking down on u people”

    • Provide incorrect locations that nevertheless exist in the real world

      • “Nothing” in Arizona, “Little Heaven” in Connecticut

    • Provide several locations, difficult to identify the home location

      • “CALi b0Y $TuCCiN V3Ga$” – California boy stuck in Las Vegas, NV

    • (~35%) enter just a country, state, county, etc., and no city-level location [1]

    [1] B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.


    Location Prediction in Social Networks

    • Two Approaches

      • Content Based [1, 2]

      • Using Social Graph [3, 4, 5]

    [1] Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: A content-based approach to geo-locating twitter users”. In CIKM ’10.

    [2] B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.

    [3] S. Abrol, L. Khan and B. Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

    [4] S. Abrol, L. Khan and B. Thuraisingham, “Tweecalization: Efficient and intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania.

    [5] S. Abrol, L. Khan, “Agglomerative clustering on fuzzy k-closest friends with variable depth for location mining,” The Second IEEE International Conference on Social Computing (SocialCom2010), Aug 20-22, 2010, Minneapolis, Minnesota.


    Content Based Approach

    • Inaccurate – Location in Text not Location of User

    • Involves Ambiguity: Paris can mean

      • Paris Hilton

      • Paris, the capital of France

      • Paris, a town in Texas

    • Slow – Uses NLP/ Machine Learning techniques, searches gazetteers


    Using Social Graphs

    • Based on a Japanese proverb: “When the character of a man is not clear to you, look at his friends.”

    • Relationship between geospatial proximity and friendship

    • Uses classical data mining algorithms for more accurate results

    • Faster and can be used for real world applications


    Geospatial Proximity and Friendship

    • Form 1012 Twitter user pairs and identify geo distance

    • Curve follows a power law of the form a(x + b)^(-c), with an exponent of -0.87
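
    The power-law shape can be illustrated with a minimal sketch. Only the exponent (0.87) comes from the slide; the values of a and b, the distance unit, and the function name are assumptions made for illustration.

```python
def friendship_likelihood(distance, a=1.0, b=1.0, c=0.87):
    """Evaluate the power-law curve a * (x + b) ** (-c).

    c = 0.87 is the exponent quoted on the slide; a and b are
    placeholder values, not fitted parameters from the study.
    """
    return a * (distance + b) ** (-c)


# The curve decays as the distance between two users grows.
for d in (0, 10, 100, 1000):
    print(d, round(friendship_likelihood(d), 4))
```

    Under these assumed parameters the value drops steadily with distance, which is the relationship between geospatial proximity and friendship that the graph-based methods rely on.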


    Graph Construction

    • Vertices (data points) represent users

    • Each edge represents the ‘similarity’ between two users

    • Deal with special cases

      • Spammers – follow random people

      • Celebrities – followed by random people

    • Edge weight gets abbreviated


    Defining Edge Weight

    • Consists of two components:

      • Trustworthiness (TW)

      • Mutual Friends (MF)


    Trustworthiness

    • Fraction of a user's friends that have the same label (location) as the user

    • Intuition: A person who has stayed at the same place all his life will have most friends from the same location and hence high trustworthiness

    [Figure: a user located in Seattle/WA/USA and the profile locations of the user's friends. Trustworthiness: 0.6]
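
    The definition above can be sketched in a few lines. The function name and the sample friend list are hypothetical; the friend locations are chosen only so that the result matches the 0.6 shown in the figure.

```python
def trustworthiness(user_location, friend_locations):
    """Fraction of friends whose location label equals the user's label."""
    if not friend_locations:
        return 0.0
    same = sum(1 for loc in friend_locations if loc == user_location)
    return same / len(friend_locations)


# Hypothetical data: 3 of 5 friends share the user's location.
friends = [
    "Seattle/WA/USA",
    "Seattle/WA/USA",
    "Seattle/WA/USA",
    "Dallas/TX/USA",
    "Boston/MA/USA",
]
print(trustworthiness("Seattle/WA/USA", friends))
```

    How friends with a blank location should be counted is not stated in the transcript; this sketch simply counts every friend it is given.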


    Mutual Friends

    • Chose the number of common friends as the similarity measure

      • Better Accuracy

      • Low Time Complexity


    Defining Edge Weight

    • Defined as

      Weight_ij = α × max{TW(U_i), TW(U_j)} + (1 − α) × MF_ij

    • 0<α<1, typically chosen to be around 0.7
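
    A minimal sketch of the edge-weight formula follows. The formula and α = 0.7 are from the slide; the transcript does not say how MF is scaled, so normalizing the mutual-friend count by the size of the union of both friend sets is an assumption, as are all function names.

```python
def mutual_friend_score(friends_i, friends_j):
    """Assumed normalization: shared friends / all distinct friends."""
    union = friends_i | friends_j
    if not union:
        return 0.0
    return len(friends_i & friends_j) / len(union)


def edge_weight(tw_i, tw_j, mf_ij, alpha=0.7):
    """Weight_ij = alpha * max(TW_i, TW_j) + (1 - alpha) * MF_ij."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * max(tw_i, tw_j) + (1.0 - alpha) * mf_ij


# Hypothetical users: trustworthiness 0.6 and 0.4, two of four friends shared.
mf = mutual_friend_score({"a", "b", "c"}, {"b", "c", "d"})
print(edge_weight(0.6, 0.4, mf))
```

    With α near 0.7 the more trustworthy endpoint dominates the weight, and the mutual-friend term acts as a smaller correction.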


    Tweethood: Fuzzy k-Closest Friends with Variable Depth

    • Choose k “closest” friends for the user

    • If location is not found look further for the answer

    • Each node is defined by a vector having locations with their respective probabilities

    • Boost and Aggregate at each step

    Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010
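
    The variable-depth idea in the bullets above can be sketched as a recursive lookup. This is an illustration under stated assumptions, not the published algorithm: friends are assumed to be pre-sorted by edge weight, vectors are combined by a plain average, and the "boost" step is omitted because the transcript does not define it. All names and data are hypothetical.

```python
def location_vector(user, friends, known, k, depth):
    """Return {location: probability} for a user.

    friends: user -> list of friends, assumed sorted by closeness.
    known:   user -> profile location, for users who provide one.
    """
    if user in known:
        return {known[user]: 1.0}
    if depth == 0:
        return {}
    vectors = []
    for friend in friends.get(user, [])[:k]:
        vector = location_vector(friend, friends, known, k, depth - 1)
        if vector:  # skip friends for whom nothing was found
            vectors.append(vector)
    combined = {}
    for vector in vectors:
        for location, p in vector.items():
            combined[location] = combined.get(location, 0.0) + p / len(vectors)
    return combined


friends = {"john": ["cb1", "cb2"], "cb1": ["x", "y"]}
known = {"cb2": "Dallas/TX/USA", "x": "Dallas/TX/USA", "y": "Seattle/WA/USA"}

print(location_vector("john", friends, known, k=2, depth=1))
print(location_vector("john", friends, known, k=2, depth=2))
```

    At depth 1 only cb2 contributes; at depth 2 the unlabeled friend cb1 is resolved through its own friends, so more evidence reaches John Doe's vector.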


    Find the location of John Doe


    Social Network of John Doe

    [Figure: John Doe connected to the users CB1, CB2, CB3, ..., CBn]


    Choose k closest friends of John Doe

    [Figure: the k closest friends CB1, CB2, CB3, ..., CBk]


    Identify Locations

    [Figure: profile locations of CB1, CB2, CB3, ..., CBk. One is Seattle, USA; the other three shown are NULL. Caption: LOW ACCURACY]


    What if we have depth = 2?

    [Figure: the friends of CB1, CB2, CB3, ..., CBk are examined as well. Locations shown: Seattle/WA/USA, Dallas/TX/USA (twice), Sydney/AU, Richardson/TX/USA, and five NULL entries]


    Location Vector for John Doe’s friends

    [Figure: one location vector for each of the friends CB1, CB2, CB3, ..., CBk]

    • Dallas/TX/USA 0.4, Seattle/WA/USA 0.2, Richardson/TX/USA 0.2, Sydney/AU 0.2

    • Dallas/TX/USA 0.33, New Delhi/Delhi/India 0.33, Sunnyvale/CA/USA 0.33

    • Austin/TX/USA 0.50, Minneapolis/MN/USA 0.50

    • Plano/TX/USA 0.25, Boulder/CO/USA 0.25, Salt Lake City/UT/USA 0.25, London/London/GB 0.25


    Location Vector for John Doe

    Dallas/TX/USA 0.1825
    Seattle/WA/USA 0.05
    Richardson/TX/USA 0.05
    Sydney/AU 0.05
    New Delhi/Delhi/IN 0.0825
    Sunnyvale/CA/USA 0.0825
    Austin/TX/USA 0.125
    Minneapolis/MN/USA 0.125
    Plano/TX/USA 0.0625
    Boulder/CO/USA 0.0625
    Salt Lake City/UT/US 0.0625
    London/GB 0.0625
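
    The numbers in John Doe's vector are consistent with an equal-weight average of the four friend vectors from the previous slide (for example, Dallas: 0.25 × 0.4 + 0.25 × 0.33 = 0.1825). The sketch below reproduces that arithmetic; equal weighting is inferred from the numbers rather than stated in the transcript, and the function name is hypothetical.

```python
def aggregate(friend_vectors):
    """Average the friends' location vectors with equal weights."""
    weight = 1.0 / len(friend_vectors)
    combined = {}
    for vector in friend_vectors:
        for location, p in vector.items():
            combined[location] = combined.get(location, 0.0) + weight * p
    return combined


# The four friend vectors listed on the previous slide.
friend_vectors = [
    {"Dallas/TX/USA": 0.4, "Seattle/WA/USA": 0.2,
     "Richardson/TX/USA": 0.2, "Sydney/AU": 0.2},
    {"Dallas/TX/USA": 0.33, "New Delhi/Delhi/India": 0.33,
     "Sunnyvale/CA/USA": 0.33},
    {"Austin/TX/USA": 0.50, "Minneapolis/MN/USA": 0.50},
    {"Plano/TX/USA": 0.25, "Boulder/CO/USA": 0.25,
     "Salt Lake City/UT/USA": 0.25, "London/London/GB": 0.25},
]

john_doe = aggregate(friend_vectors)
for location, p in sorted(john_doe.items(), key=lambda item: -item[1]):
    print(f"{location} {p:.4f}")
```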


    Agglomerative Clustering

    {Dallas, Plano, Richardson}/TX/USA 0.295
    Seattle/WA/USA 0.05
    Sydney/AU 0.05
    New Delhi/Delhi/IN 0.0825
    Sunnyvale/CA/USA 0.0825
    Austin/TX/USA 0.125
    Minneapolis/MN/USA 0.125
    Boulder/CO/USA 0.0625
    Salt Lake City/UT/US 0.0625
    London/GB 0.0625
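
    The merge of Dallas, Plano, and Richardson (0.1825 + 0.0625 + 0.05 = 0.295) can be sketched as single-linkage agglomerative clustering on geographic distance. The transcript does not give the merge criterion, so the 50 km threshold, the approximate city coordinates, and the function names are all assumptions for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def merge_nearby(vector, coords, max_km):
    """Repeatedly merge clusters that contain cities within max_km."""
    clusters = [([city], p) for city, p in vector.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                close = any(haversine_km(coords[a], coords[b]) <= max_km
                            for a in clusters[i][0] for b in clusters[j][0])
                if close:
                    clusters[i] = (clusters[i][0] + clusters[j][0],
                                   clusters[i][1] + clusters[j][1])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


# Approximate coordinates (assumed) for a subset of the cities on the slide.
coords = {
    "Dallas": (32.78, -96.80), "Richardson": (32.95, -96.73),
    "Plano": (33.02, -96.70), "Austin": (30.27, -97.74),
    "Seattle": (47.61, -122.33),
}
vector = {"Dallas": 0.1825, "Richardson": 0.05, "Plano": 0.0625,
          "Austin": 0.125, "Seattle": 0.05}

clusters = merge_nearby(vector, coords, max_km=50.0)
for cities, p in clusters:
    print(cities, round(p, 4))
```

    Merging neighboring cities lets the Dallas area outweigh Austin and Minneapolis, which individually score higher than Plano or Richardson.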


    Tweethood: Algorithm


    Tweecalization: Label Propagation

    • But the availability of users with location is limited

    • Most users do not have a location

    • Need a method that can learn from unlabeled data

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania


    Tweecalization: Label Propagation

    • Ideal scenario for semi-supervised learning: only a few friends with locations (labeled data) [1]

    • Use both labeled and unlabeled data for training

    • Points which are close to each other are more likely to share a label

    [1] Y. Bengio, O. Delalleau, and N. Le Roux, “Label propagation and quadratic criterion,” In O. Chapelle, B. Schölkopf and A. Zien (Eds.), Semi-supervised learning. MIT Press, 2006.


    Label Propagation: An Illustration

    [Figure: the central user, whose location is unknown (“?”), connected to friends with locations (“clamped locations”) and friends without locations]
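
    A minimal sketch of iterative label propagation with clamped labels is given below. It follows the general scheme of the cited reference (each unlabeled node takes the weighted average of its neighbors' label scores while labeled nodes stay fixed); it is not the Tweecalization algorithm itself, and the graph, weights, and names are hypothetical.

```python
def propagate_labels(weights, clamped, labels, iterations=50):
    """weights: symmetric matrix of edge weights (list of lists).
    clamped: node index -> known label; these rows never change.
    Returns one {label: score} dict per node."""
    n = len(weights)
    scores = [{label: 0.0 for label in labels} for _ in range(n)]
    for node, label in clamped.items():
        scores[node][label] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in clamped or total == 0:
                updated.append(dict(scores[i]))
                continue
            updated.append({
                label: sum(weights[i][j] * scores[j][label]
                           for j in range(n)) / total
                for label in labels
            })
        scores = updated
    return scores


# Node 0: central user. Nodes 1-3: friends with locations. Node 4: no location.
labels = ["Dallas/TX/USA", "Seattle/WA/USA"]
clamped = {1: "Dallas/TX/USA", 2: "Dallas/TX/USA", 3: "Seattle/WA/USA"}
weights = [
    [0.0, 0.8, 0.6, 0.3, 0.5],
    [0.8, 0.0, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0, 0.7],
    [0.5, 0.0, 0.0, 0.7, 0.0],
]

scores = propagate_labels(weights, clamped, labels)
predicted = max(scores[0], key=scores[0].get)
print(predicted, scores[0])
```

    The unlabeled friend (node 4) still carries information: it passes on part of the Seattle label from node 3, which is how unlabeled data contributes to the central user's estimate.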


    Tweecalization: Algorithm


    What About Temporal Analysis?

    • None of the existing work performs temporal analysis

    • What about migration/ geographical mobility?


    Migration/Geographical Mobility

    • 4% to 6% of people move every year, i.e., 12 to 17 million people each year

    United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/


    Migration/Geographical Mobility

    • Migration as a function of age

    • People aged 20-29 have a higher probability to move

    High Migration Rate: College and Jobs

    Low Migration Rate: Old age, people settle down

    United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/


    Facebook Users and Mobility

    • Let us look at the cumulative effect

    • Only 28% to 37% are currently living in their hometown

    Based on our experiments on 300k Public Facebook Profiles


    Twitter Users and Mobility

    • Linking Twitter users to migration

    • 33% of all Twitter users are aged 25-34 years

    Based on findings by ABI Research. Online. Available: http://www.abiresearch.com


    Tweeque: Graph Partitioning

    • How do we know if “this” is the current location for a user?

    • How do we perform temporal analysis of friendships?

    • Propose a technique that indirectly infers the current location

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.


    Observation 1: Social Cliques and Location

    • Our definition: A social clique is an inclusive group of people that share friendship

    • Apart from friendship, what is the attribute that links members of a clique? Individual Locations

    • All members of a clique were or are at a particular geographical location at a particular instant of time like college, school, a company, etc.


    Observation 2: Migration and Time

    • As shown previously, over the course of time people have a tendency to migrate

    • Based on these two observations, we hypothesize:

    • If we can divide the social graph of a particular user into cliques and check the location-based purity of the cliques, we can accurately separate the user's current location from previous locations.

    • Migration is our latent time factor


    Tweeque: An example

    Friends from high school in Dallas

    Friends from college in Boston

    Relatives/Cousins

    Friends from job in Seattle


    Tweeque: An example

    All Friends of the User


    Tweeque: An example

    Social Clique #1 (High School)

    Social Clique #2 (College)

    Social Clique #3 (Current Work)

    Social Clique #4 (Relatives)


    Tweeque: An Example

    [Figure: friends' profile locations grouped into four social cliques: Relatives, High School, College, and Work. The locations shown include Dallas/TX/USA, Seattle/WA/USA, Boston/MA/USA, and several others.]

    Purity (Dallas) = 0.32

    Purity (Boston) = 0.45

    Purity (Dallas) = 0.18

    Purity (Seattle) = 0.69
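
    A minimal sketch of a location-based purity check follows. The transcript does not define purity formally, so treating it as the share of a clique's members at the clique's most common location is an assumption, as is picking the purest clique afterwards. The clique memberships are hypothetical and do not reproduce the values in the figure.

```python
from collections import Counter


def clique_purity(locations):
    """Return (dominant location, share of members at that location)."""
    known = [loc for loc in locations if loc is not None]
    if not known:
        return None, 0.0
    location, count = Counter(known).most_common(1)[0]
    return location, count / len(known)


# Hypothetical cliques for one user.
cliques = {
    "high_school": ["Dallas/TX/USA", "Dallas/TX/USA",
                    "Austin/TX/USA", "Seattle/WA/USA"],
    "college": ["Boston/MA/USA", "Boston/MA/USA",
                "New York/NY/USA", "Dallas/TX/USA"],
    "work": ["Seattle/WA/USA", "Seattle/WA/USA",
             "Seattle/WA/USA", "Redmond/WA/USA"],
}

purities = {name: clique_purity(members) for name, members in cliques.items()}
purest = max(purities, key=lambda name: purities[name][1])
print(purities)
print("purest clique:", purest, purities[purest])
```

    The intuition matches the hypothesis above: older cliques disperse as members migrate, so the clique tied to the user's current location tends to remain the purest.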


    Tweeque: Graph Partitioning


    J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
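
    The cited Shi and Malik paper scores a two-way partition (A, B) of a weighted graph with the normalized cut, Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V). The sketch below only evaluates that criterion for a given partition on a small hypothetical friendship graph; the partitioning steps used on the slides are not in the transcript.

```python
def assoc(weights, group, universe):
    """Total edge weight from nodes in `group` to nodes in `universe`."""
    return sum(weights[u][v] for u in group for v in universe)


def normalized_cut(weights, part_a, part_b):
    """Ncut(A, B) = cut / assoc(A, V) + cut / assoc(B, V)."""
    everyone = part_a | part_b
    cut = assoc(weights, part_a, part_b)
    return (cut / assoc(weights, part_a, everyone)
            + cut / assoc(weights, part_b, everyone))


# Hypothetical graph: two tight pairs (0-1 and 2-3) joined by a weak edge 1-2.
weights = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

balanced = normalized_cut(weights, {0, 1}, {2, 3})
lopsided = normalized_cut(weights, {0}, {1, 2, 3})
print(round(balanced, 4), round(lopsided, 4))
```

    A lower value indicates a better split, so cutting the weak edge between the two pairs is preferred over isolating a single user.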



    Tweeque: Algorithm


    Tweeque: Purity Voting


    Experiment Data

    • Randomly choose 1000 Twitter users


    Experiments and Results

    • 75.5% for city level prediction

    • 80.1% for country level prediction

    • We observe that the accuracy saturates after depth 4

    • Six degrees of separation is the idea that everyone is on average approximately six steps away, by way of introduction, from any other person in the world

    • For Twitter this distance is found to be 4.67


    Comparison of Different Approaches

    Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010 (Nominated for best paper award, Acceptance Rate: 13%)

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

    Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: A content-based approach to geo-locating twitter users”. In CIKM ’10.


    Contributions

    • Developed three graph based location mining algorithms for online social networks

      • Maps the location mining problem to k-nearest neighbor, semi-supervised learning, and graph partitioning problems

      • Outperform content based approach in time and accuracy

    • Relationship between geospatial proximity and friendship

    • Effect of geographical mobility on current location of users


    Future Work

    • Combining Content and Graph based methods

      • Score based geo-tagging technique [1]

      • Associating keywords with locations to build probabilistic model: “cowboys” → Dallas, “casino” → Las Vegas

      • Since tweets have timestamps, it leads to more accurate prediction of current location

    [1] Satyen Abrol, Latifur Khan, Tahseen Al-khateeb, “MapIt: Smarter Searches using Location Driven Knowledge Discovery and Mining”, In Proc. of 1st SIGSPATIAL ACM GIS 2009 International Workshop on Querying and Mining Uncertain Spatio-Temporal Data (QUeST), Nov 2009, Seattle.
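
    The keyword-to-location idea can be pictured with a small sketch. The two keyword associations come from the slide; the probabilities, the extra location, the scoring rule (summing per-keyword probabilities), and all names are assumptions, since this is proposed future work rather than a described method.

```python
def score_locations(tweet, keyword_model):
    """Sum the location probabilities of every known keyword in a tweet."""
    scores = {}
    for word in tweet.lower().split():
        word = word.strip(".,!?\"'")
        for location, p in keyword_model.get(word, {}).items():
            scores[location] = scores.get(location, 0.0) + p
    return scores


# Hypothetical probabilities attached to the slide's two example keywords.
keyword_model = {
    "cowboys": {"Dallas": 0.8},
    "casino": {"Las Vegas": 0.7, "Atlantic City": 0.2},
}

scores = score_locations("Watching the Cowboys game tonight!", keyword_model)
print(scores)
```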


    Future Work

    • Improve scalability of current algorithms using cloud computing framework

      • Each of the friends of a user is handled by a separate node in the distributed environment

    • Micro-level location identification

      • Identify specific points of interest (POIs) such as restaurants, place of work, etc., from tweets

      • Identify comfort zone for a user

      • Use Foursquare check-in dataset: over 30 million POIs all over the world


    Publications

    • Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

    • Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania

    • Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010 (Nominated for best paper award, Acceptance Rate:13%)


    Publications

    • Satyen Abrol and Latifur Khan, “TWinner: Understanding News Queries With Geo-Content Using Twitter”. In Proc. of 6th Workshop on Geographic Information Retrieval (GIR'10) at Zurich, Switzerland.

    • Satyen Abrol, Latifur Khan, Tahseen Al-khateeb, “MapIt: Smarter Searches using Location Driven Knowledge Discovery and Mining”, In Proc. of 1st SIGSPATIAL ACM GIS 2009 International Workshop on Querying and Mining Uncertain Spatio-Temporal Data (QUeST), Nov 2009, Seattle.

    • Satyen Abrol, Latifur Khan, Vaibhav Khadilkar, Bhavani M. Thuraisingham, Tyrone Cadenhead, “Design and implementation of SNODSOC: Novel class detection for social network analysis”, ISI 2012: 215-220


    Thank You!

    Questions?

