
Friendship Prediction and Homophily in Social Media

By Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines and Filippo Menczer. Presenter: Maltin Shkarpa.



Presentation Transcript


  1. Friendship Prediction and Homophily in Social Media. By LUCA MARIA AIELLO, ALAIN BARRAT, ROSSANO SCHIFANELLA, CIRO CATTUTO, BENJAMIN MARKINES and FILIPPO MENCZER; Presenter: Maltin Shkarpa

  2. Introduction • Online social sites are very popular nowadays • A great place to share information, experiences, and ideas, all tailored to specific interests • Users sharing the same interests tend to be friends with each other

  3. Introduction • This article: • Suggests that users with similar interests are more likely to be friends. • Confirms that social networks constructed from topical similarity capture actual friendship accurately.

  4. Introduction • Homophily • Metrics to measure “similarity”? • Tagging • A three-way relation (user, resource, tag) • Explicit representation of user activities by: • Exposing resources • Tagging items • Discussion groups • And other user relations

  5. The article addresses these questions: • How does the similarity between user profiles relate to their proximity on the social network? • Amount of activity • Content of activity • And can we predict the existence of social links from knowledge of the similarity among user profiles?

  6. Flickr • Users upload, tag, and share their pictures • Directed links • Data collected via flickr.com/api and crawling • Last.fm • Users tag songs, artists, or albums and create or join groups • Undirected links • Data collected via last.fm/api and crawling • Tasteometer

  7. Datasets • aNobii • The public aNobii book database • Each user has a digital book collection: • Library • Wishlist • Two different types of social ties: • Friendship • Neighborhood • The study considers the union of friendship and neighborhood ties

  8. Data Analysis – Heterogeneity and Correlations • Number of friendship relations is considered to be a measure of activity. • Analyzing: • The activity patterns of individual users. • The correlation between various activity indicators.

  9. Data Analysis - Heterogeneity • Activity patterns of users are highly heterogeneous Fig 1. Flickr: complementary cumulative distributions

  10. Data Analysis - Heterogeneity • Activity patterns of users are highly heterogeneous Fig 2. Complementary cumulative distributions of the measures of activity of aNobii users
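
To make the complementary cumulative distributions in Figs. 1 and 2 concrete, here is a minimal Python sketch that computes the empirical CCDF of one activity measure (for example, the number of distinct tags per user). It assumes the convention P(X ≥ x); the function name `ccdf` is hypothetical and this is not the authors' code.

```python
from collections import Counter


def ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    result = []
    remaining = n  # number of observations >= the current x
    for x in sorted(counts):
        result.append((x, remaining / n))
        remaining -= counts[x]
    return result


# Hypothetical per-user activity counts (e.g. distinct tags per user).
print(ccdf([1, 1, 2, 5]))  # [(1, 1.0), (2, 0.5), (5, 0.25)]
```

Heavy-tailed activity shows up as a CCDF that decays slowly when plotted on log-log axes.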

  11. Data Analysis - Correlations • Are different types of activity correlated? • Compute the average activity of one type for users having a given value of another activity type.

  12. Data Analysis - Correlations • Overall, the various activity metrics are all positively correlated with each other • Large fluctuations are still present Fig 3. Left: Average number of distinct tags (n_t), of groups (n_g), and of tag assignments (a) of users having k_out out-neighbors in the Flickr social network. Right: Correlations between the activity of aNobii users and their number of declared friends and neighbors: group memberships n_g, library size n_b and wishlist size n_w, averaged over users with k_out out-links, vs. k_out.
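
The conditional average from slide 11, as plotted in Fig. 3 (e.g. the mean n_t over all users with a given k_out), can be sketched in a few lines of Python. This assumes each user is a dictionary of activity counts; the field names `k_out` and `n_t` and the function `conditional_average` are hypothetical.

```python
from collections import defaultdict


def conditional_average(users, by, of):
    """Average of attribute `of` over users grouped by the value of attribute `by`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u in users:
        key = u[by]
        sums[key] += u[of]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical users: out-degree k_out and number of distinct tags n_t.
users = [
    {"k_out": 1, "n_t": 3},
    {"k_out": 1, "n_t": 5},
    {"k_out": 4, "n_t": 20},
]
print(conditional_average(users, by="k_out", of="n_t"))  # {1: 4.0, 4: 20.0}
```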

  13. Data Analysis - Mixing Patterns • Mixing Patterns • The correlations between the activity metrics of neighboring users • “Assortative mixing” or “homophily” • Clear assortative trends
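
One common way to look for assortative mixing is to average, for each activity value a, the mean activity of the out-neighbors of users with activity a. A curve that grows with a suggests active users link to active users. The Python sketch below is illustrative only; the adjacency-list format and the name `neighbor_activity_curve` are assumptions, not the authors' procedure.

```python
from collections import defaultdict
from statistics import mean


def neighbor_activity_curve(activity, out_links):
    """Map each activity value a to the average, over users with activity a,
    of the mean activity of their out-neighbors."""
    groups = defaultdict(list)
    for u, neighbors in out_links.items():
        if not neighbors:
            continue  # users without out-neighbors contribute nothing
        groups[activity[u]].append(mean(activity[v] for v in neighbors))
    return {a: mean(vals) for a, vals in sorted(groups.items())}


# Hypothetical network: low-activity users link to each other, as do high-activity ones.
activity = {"a": 1, "b": 2, "c": 10, "d": 12}
out_links = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
print(neighbor_activity_curve(activity, out_links))  # {1: 2, 2: 1, 10: 12, 12: 10}
```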

  14. Data Analysis - Topic Similarity

  15. Data Analysis - Topic Similarity • Homophily • Link selection • Social influence • How to measure it and how to relate it to the social network structure? • Number of shared items, tags, groups, books, songs
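
The simplest measure listed above, the number of shared items, is a set intersection. The Python sketch below also shows a Jaccard normalization as one illustrative way to correct for profile size; the function names are hypothetical and the Jaccard variant is not claimed to be one of the paper's measures.

```python
def shared_items(profile_u, profile_v):
    """Number of items (tags, groups, books, songs, ...) two users have in common."""
    return len(set(profile_u) & set(profile_v))


def jaccard(profile_u, profile_v):
    """Shared items divided by the size of the union (0.0 when both are empty)."""
    union = set(profile_u) | set(profile_v)
    return len(set(profile_u) & set(profile_v)) / len(union) if union else 0.0


# Hypothetical tag profiles of two users.
alice = {"jazz", "blues", "rock"}
bob = {"rock", "jazz", "pop", "metal"}
print(shared_items(alice, bob))  # 2
print(jaccard(alice, bob))       # 0.4
```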

  16. Data Analysis - Topic Similarity Fig 7. Average library and wishlist similarity as a function of the distance on the aNobii social network.

  17. Data Analysis - Topic Similarity Fig 8. Average tag and group similarity as a function of the distance on the Flickr and Last.fm social networks.
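
Figs. 7 and 8 relate profile similarity to distance on the social network. A minimal Python sketch of that computation, assuming the network is a dictionary of adjacency lists and profiles are sets: breadth-first search gives hop distances, and similarities are averaged per distance. The names are hypothetical, Jaccard stands in for whichever σ(u, v) is used, and on a real network one would sample source users instead of running BFS from every node.

```python
from collections import defaultdict, deque


def bfs_distances(graph, source):
    """Hop distances from `source` following the (possibly directed) adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def similarity_by_distance(graph, profiles, sim):
    """Average sim(profile_u, profile_v) over reachable pairs u != v, grouped by distance."""
    totals, counts = defaultdict(float), defaultdict(int)
    for u in graph:
        for v, d in bfs_distances(graph, u).items():
            if d > 0:
                totals[d] += sim(profiles[u], profiles[v])
                counts[d] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}


def jaccard(a, b):
    """Illustrative set similarity; any sigma(u, v) could be plugged in instead."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical path network a - b - c with tag sets.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
profiles = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"y", "z"}}
print(similarity_by_distance(graph, profiles, jaccard))  # about 0.67 at d=1, 0.33 at d=2
```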

  18. Social Link Prediction - Methodology • Hypothesis • A social tie can be predicted based only on topical similarity • Methodology • ROC curves to test prediction performance • AUC (area under the ROC curve) as the evaluation measure • Sensitivity analysis • Is prediction accuracy affected by density?
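
The AUC used here has a simple probabilistic reading: it is the probability that a randomly chosen linked pair receives a higher similarity score than a randomly chosen unlinked pair, with ties counting one half. A minimal Python sketch of that pairwise (Mann-Whitney) formulation follows; the function name and the scores are hypothetical. It is quadratic in the number of pairs, so a sort-based method would be preferable for large samples.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of positive vs negative scores."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical similarity scores for linked (positive) and unlinked (negative) pairs.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 5 of 6 pairs ranked correctly, about 0.833
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.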

  19. Social Link Prediction – Similarity Metrics • Tripartite graph (three-mode data) • Triple: a ternary relation (e.g. user, resource, tag) • Similarity measures σ(u, v) • Two-mode data obtained through aggregation
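
To illustrate the step from three-mode to two-mode data, the Python sketch below collapses hypothetical (user, item, tag) triples into per-user tag-frequency vectors and compares two users with cosine similarity. Counting every tag assignment is a frequency-style aggregation chosen for illustration, and cosine stands in for σ(u, v); the MIP measure discussed in the article is not reproduced. All names are hypothetical.

```python
from collections import Counter
from math import sqrt


def aggregate(triples):
    """Collapse (user, item, tag) triples into per-user tag-frequency vectors,
    turning three-mode data into two-mode (user x tag) data."""
    profiles = {}
    for user, _item, tag in triples:
        profiles.setdefault(user, Counter())[tag] += 1
    return profiles


def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = sqrt(sum(x * x for x in p.values())) * sqrt(sum(x * x for x in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical tag assignments on Flickr-like photos.
triples = [
    ("u1", "photo1", "beach"), ("u1", "photo2", "beach"), ("u1", "photo2", "sunset"),
    ("u2", "photo9", "beach"), ("u2", "photo9", "city"),
]
profiles = aggregate(triples)
print(profiles["u1"])  # Counter({'beach': 2, 'sunset': 1})
print(round(cosine(profiles["u1"], profiles["u2"]), 3))  # 0.632
```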

  20. Social Link Prediction with Single Feature • MIP (maximum information path) similarity often outperforms the other measures. • Very good prediction on groups and libraries. Table III. AUC Values for Last.fm and aNobii Social Link Predictions

  21. Social Link Prediction with Single Feature Fig 14. Summary comparison between the ROC curves of the best performing prediction measures

  22. Social Link Prediction with Single Feature • Sensitivity Analysis Fig 15. Sensitivity analysis of link prediction based on the library feature in aNobii.

  23. Social Link Prediction – Combining Features for Prediction • Predictive Power of Single and Combined Features (using a decision tree on a balanced set of 10,000 positive and negative samples extracted from the aNobii dataset)

  24. Social Link Prediction – Language Community Analysis • aNobii has two main groups • Italian community (60%) • Far East community (Hong Kong and Taiwan) (20%) • Tags in users' native languages • Very little intersection between different language clusters

  25. Social Link Prediction – Language Community Analysis Fig 16. ROC curves comparing link prediction within different language communities in aNobii and Last.fm. The user samples are composed of the top 500 taggers in the whole system (All) or within a single language community (Italian, Chinese, English, German). In all cases we used the MIP similarity metric with distributional aggregation over tags.

  26. Conclusions • Strong correlation between social connectivity and the intensity of user activities. • Link prediction • Can use any user profile features • MIP similarity – best prediction results • Library feature – very accurate • Combining features – even better accuracy • Easier in social networks that are strongly clustered by language

  27. Thank you!
