
What you want is not what you get: Predicting sharing policies for text-based content on Facebook

Presentation Transcript


  1. What you want is not what you get: Predicting sharing policies for text-based content on Facebook • Arunesh Sinha*, Yan Li †, Lujo Bauer* • *Carnegie Mellon University • †Singapore Management University

  2. Motivation

  3. Problem for Social Networks • Report in dailymail.co.uk† (†http://www.dailymail.co.uk/sciencetech/article-2423713/Facebook-users-committing-virtual-identity-suicide-quitting-site-droves-privacy-addiction-fears.html)

  4. More User Control ⇏ Better Privacy • Users fail to comprehend the controls • Users fail to comprehend the consequences • Though concerned, users often make no effort to use the controls better

  5. Smarter user control, not more user control • Our goal: help users pick the correct policy for new Facebook posts

  6. Facebook’s Strategy • Facebook Wall: Post n-2 (Friends), Post n-1 (Public), Post n (Public) • New Post n+1: default policy is Public, copied from Post n

  7. Our Goal and Approach • Facebook Wall: Post n-2 (Friends), Post n-1 (Public), Post n (Public) • New Post n+1: default policy = ?, to be predicted by machine learning (ML) from earlier posts

  8. Outline • Data collection methodology • Survey results • Machine learning approach • Results and analysis • Limitations / Conclusion

  9. Data Collection Method: Survey Methodology • Created an online survey • Advertised on Craigslist and at CMU • Advertisement text: “Participate in a Carnegie Mellon research study on Facebook sharing. Earn $5 for participating in a ~20 minute online study. We’re looking for English speaking adults, who have used Facebook for at least 4 months, update their Facebook status or post on Facebook at least every other day, and have used more than one privacy setting for their posts. Please click on the following link to start the online study: http://greyw1.ece.cmu.edu/survey/survey.php Upon completion of the study, you will receive a $5 Amazon gift card.”

  10. Data Collection Method: Filtering Users

  11. Data Collection Method: Survey Questions • Collected demographic data: age, gender, country, level of education • Degree of agreement with the statements: “I have a strong set of privacy rules.” and “I find Facebook's privacy controls confusing.” • Open-ended question: “Have you ever posted something on a social network and then regretted doing it? If so, what happened?”

  12. Data Collection Method: Facebook App • Fetched 4 months of users’ posts • For each post: the text of the post and its policy

  13. Survey Results: Demographics • 42 participants (on average 146 posts and 4.6 policies per participant) • Age: 18 to 65, average 29.1 • 35 female, 7 male • 39 from the USA

  14. Survey Results: Sentiment

  15. ML Usage Plan • Facebook Wall: Post n-2 (Friends), Post n-1 (Public), Post n (Public) • New Post n+1: default policy = ?, predicted by ML from earlier posts

  16. Machine Learning Approach: Machine Learning • We use MaxEnt (a maximum entropy classifier) as the ML tool • Used the Stanford NLP software • MaxEnt provides good generalization, i.e., it helps prevent overfitting • Learns a probabilistic hypothesis h that outputs a probability distribution over labels given data x • Chooses the hypothesis h that maximizes entropy • Subject to agreement with the training data: the expected value of each feature under h matches its average in the training data
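The MaxEnt bullets can be made concrete with a small sketch. A conditional MaxEnt model scores each label y for a post x as p(y | x) = exp(Σ_f w[f, y]) / Z(x), summing over the post's active binary features f, with Z(x) normalizing over all labels; maximizing entropy under the feature-agreement constraints leads to this exponential form, and the weights are fit by maximizing the conditional log-likelihood. The talk used the Stanford NLP software; the plain-Python class below is not that library's API but a hypothetical stand-in, trained by batch gradient ascent with an L2 penalty (a Gaussian prior on the weights). The hyperparameters l2, lr, and epochs are illustrative assumptions.

```python
import math
from collections import defaultdict


class MaxEntClassifier:
    """Minimal conditional maximum-entropy (multinomial logistic) classifier.

    Hypothetical stand-in for the Stanford classifier used in the talk.
    Each post is a set of binary feature names; weights are indexed by
    (feature, label), so p(y | x) = exp(sum_f w[f, y]) / Z(x).
    """

    def __init__(self, l2=0.1, lr=0.5, epochs=200):
        self.l2 = l2          # L2 penalty strength (Gaussian prior); assumed value
        self.lr = lr          # gradient-ascent step size; assumed value
        self.epochs = epochs  # number of full passes over the data; assumed value
        self.labels = []
        self.w = defaultdict(float)  # (feature, label) -> weight

    def predict_proba(self, feats):
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def predict(self, feats):
        probs = self.predict_proba(feats)
        return max(probs, key=probs.get)

    def fit(self, data):
        """data: list of (set_of_feature_names, label) pairs."""
        self.labels = sorted({y for _, y in data})
        n = len(data)
        for _ in range(self.epochs):
            # Log-likelihood gradient: empirical feature counts minus
            # the counts expected under the current model.
            grad = defaultdict(float)
            for feats, y in data:
                probs = self.predict_proba(feats)
                for f in feats:
                    grad[(f, y)] += 1.0
                    for label, p in probs.items():
                        grad[(f, label)] -= p
            # Ascent step on (mean log-likelihood - l2/2 * ||w||^2).
            for key in set(grad) | set(self.w):
                self.w[key] += self.lr * (grad[key] / n - self.l2 * self.w[key])
        return self
```

Batch gradient ascent keeps the sketch short; a production tool would more likely use a quasi-Newton optimizer such as L-BFGS.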

  17. Machine Learning Approach: Features Considered • Words and 2-grams in the Facebook post • Presence of multimedia • Time of day (morning, evening, night) • Previous post’s policy • Model (feature set) chosen using cross-validation
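The feature list can be sketched as a function that turns one post into a set of binary feature names. The post fields (text, has_media, hour, prev_policy), the feature-name prefixes, the tokenizer, and the hour boundaries for morning, evening, and night are all hypothetical; the talk names the feature groups but not how they were encoded.

```python
import re


def time_of_day(hour):
    """Bucket an hour (0-23) into the three periods named on the slide.

    The boundaries are hypothetical; the talk does not specify them.
    """
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 21:
        return "evening"
    return "night"


def extract_features(text, has_media, hour, prev_policy):
    """Return the set of binary feature names for one post (hypothetical encoding)."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # simple lowercase tokenizer
    feats = {f"w:{w}" for w in words}                           # unigrams
    feats |= {f"bi:{a}_{b}" for a, b in zip(words, words[1:])}  # 2-grams
    feats.add(f"media:{bool(has_media)}")                       # multimedia present?
    feats.add(f"tod:{time_of_day(hour)}")                       # time-of-day bucket
    feats.add(f"prev:{prev_policy}")                            # previous post's policy
    return feats
```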

  18. Machine Learning Approach: Temporal Testing • The data is temporal • Picked 10 posts at random as test data • We simulate a real-world scenario: each test post is predicted using a model trained only on the posts before it
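The temporal test setup can be sketched as a generator that pairs each randomly chosen test post with only the posts published before it. The function name, the seed parameter, and the choice never to pick the very first post (so every test post has some history) are assumptions for illustration.

```python
import random


def temporal_splits(posts, n_test=10, seed=0):
    """Yield (train, test_post) pairs that respect time order.

    posts must be sorted oldest-first. Each randomly chosen test post is
    paired only with the posts published before it, mimicking predicting
    the policy of a new post from the user's history.
    """
    rng = random.Random(seed)
    # Index 0 is excluded so every test post has at least one earlier post.
    test_idx = sorted(rng.sample(range(1, len(posts)), n_test))
    for i in test_idx:
        yield posts[:i], posts[i]
```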

  19. Machine Learning Approach: Training • Cross-validation to choose features • May have a different model for each test point, since each is trained on the posts preceding it
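Per-test-point feature selection can be sketched as k-fold cross-validation over candidate feature groups, run only on the training prefix for that test point. Everything here is hypothetical: the helper names, the contiguous folds, k=5, and the train_fn interface (any learner that takes (features, label) pairs and returns a predict function). The talk does not say which feature subsets were compared or how the folds were formed.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for j in range(k):
        size = n // k + (1 if j < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_accuracy(examples, train_fn, k=5):
    """Mean held-out accuracy of train_fn over k folds of (features, label) pairs."""
    k = min(k, len(examples))  # avoid empty folds on short histories
    accs = []
    for fold in kfold_indices(len(examples), k):
        held_out = set(fold)
        train = [ex for i, ex in enumerate(examples) if i not in held_out]
        predict = train_fn(train)
        hits = sum(predict(examples[i][0]) == examples[i][1] for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / len(accs)


def choose_feature_groups(history, candidates, featurize, train_fn, k=5):
    """Return the candidate feature-group tuple with the best CV accuracy.

    history holds only the posts before the current test point, so each
    test point can end up with a different model.
    """
    best, best_acc = None, -1.0
    for groups in candidates:
        examples = [(featurize(p, groups), p["policy"]) for p in history]
        acc = cv_accuracy(examples, train_fn, k)
        if acc > best_acc:
            best, best_acc = groups, acc
    return best, best_acc
```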

  20. Results and analysis: Baseline Approach • Previous policy (Facebook’s approach) • Use the policy of the last post as the prediction • Surprisingly good accuracy: 0.85 on average
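Facebook's previous-policy baseline is simple enough to state as code: predict each post's policy as the policy of the post before it and count how often that is right. The function name and the list-of-labels input are hypothetical; the talk reports the resulting accuracy but not exactly how it was computed (for instance, whether only the test posts were scored).

```python
def previous_policy_accuracy(policies):
    """Accuracy of predicting each post's policy as the previous post's policy.

    policies: chronologically ordered policy labels for one user.
    The first post has no predecessor, so it is not scored.
    """
    if len(policies) < 2:
        raise ValueError("need at least two posts")
    hits = sum(prev == cur for prev, cur in zip(policies, policies[1:]))
    return hits / (len(policies) - 1)
```

For a made-up history ["Friends", "Public", "Public", "Public", "Friends"], two of the four predictions are correct, giving 0.5.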

  21. Results and analysis: MaxEnt Accuracy

  22. Results and analysis: Prediction Mismatch • Problem: we are not predicting the intended policy • Instead, we are predicting the implemented policy • Conjecture: the implemented policy is often incorrect • Users just accept Facebook’s default policy

  23. Results and analysis: Ground Truth Collection • Feedback on 20 randomly chosen posts • Provides ground truth (intended policy) • For each post: the text of the post, with a choice among all policies the participant ever used

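  24. Results and analysis: Datasets • Original data • Clean data: original data with the 20 feedback posts corrected to their intended policy • Pruned clean data: clean data with 80% of the posts labeled only with their implemented policy removed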
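One reading of the dataset diagram, sketched in code: the clean data replaces the implemented policy with the intended policy on the posts that received feedback, and the pruned clean data additionally drops 80% of the posts whose label is still only the implemented policy. The post schema (policy and intended keys), the function name, and dropping posts uniformly at random are assumptions; only the 20 corrected posts and the 80% figure come from the slides.

```python
import random


def make_datasets(posts, drop_frac=0.8, seed=0):
    """Build original, clean, and pruned clean datasets for one user.

    Each post is a dict with an implemented "policy" and, for posts the
    participant gave feedback on, an "intended" policy (hypothetical schema).
    """
    original = [dict(p) for p in posts]
    # Clean data: overwrite the implemented policy with the intended one
    # wherever feedback is available.
    clean = [dict(p, policy=p.get("intended", p["policy"])) for p in posts]
    # Pruned clean data: keep every corrected post, but randomly drop a
    # fraction of the posts labeled only with the implemented policy
    # (random selection is an assumption).
    rng = random.Random(seed)
    unverified = [i for i, p in enumerate(clean) if "intended" not in p]
    dropped = set(rng.sample(unverified, round(drop_frac * len(unverified))))
    pruned = [p for i, p in enumerate(clean) if i not in dropped]
    return original, clean, pruned
```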

  25. Results and analysis: Temporal Testing • Intended policy known for 20 posts • Picked 8 of these at random as test data • We simulate a real-world scenario: each test post is predicted using a model trained only on earlier posts

  26. Results and analysis: Baseline • Same previous-policy approach as before • Measure intended accuracy: predict only for posts with a known intended policy • A better measure of performance • Baseline intended accuracy: 0.67, versus 0.85 obtained previously on implemented policies
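Measuring intended accuracy can be sketched as scoring a predictor only on posts with a known intended policy, comparing its output to that intended policy rather than to the implemented one. The post schema, the function names, and the predict(history, post) interface are hypothetical; previous_policy reproduces the previous-policy baseline described on the slides.

```python
def intended_accuracy(posts, predict):
    """Score predictions only on posts whose intended policy is known.

    posts: chronologically ordered dicts with "policy" (implemented) and,
    where feedback exists, "intended". predict(history, post) returns a label.
    The first post is skipped because it has no history.
    """
    scored = [(predict(posts[:i], p), p["intended"])
              for i, p in enumerate(posts) if "intended" in p and i > 0]
    if not scored:
        raise ValueError("no scorable posts with a known intended policy")
    return sum(pred == truth for pred, truth in scored) / len(scored)


def previous_policy(history, post):
    """Baseline: repeat the implemented policy of the most recent post."""
    return history[-1]["policy"]
```

On a made-up history where a user accepted a Public default for a post they intended for Friends, the same baseline that matches every implemented policy scores only 2 of 3 against the intended ones, the same kind of gap as 0.85 versus 0.67 on the slide.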

  27. Results and analysis: MaxEnt Intended Accuracy • Baseline: 67% • MaxEnt (clean): 71% • MaxEnt (pruned clean): 81%

  28. Results and analysis: Confidence About Policy • Confidence Factor (CF): fraction of posts for which the intended policy matched the implemented policy
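The Confidence Factor and the CF > 0.5 split used in the conclusion can be sketched as follows. Computing CF per participant over their feedback posts, the (implemented, intended) pair format, and the function names are assumptions; the slide defines CF only as the fraction of posts whose intended policy matched the implemented policy.

```python
def confidence_factor(feedback):
    """Fraction of feedback posts whose intended policy equals the implemented one.

    feedback: list of (implemented, intended) policy pairs for one participant.
    """
    if not feedback:
        raise ValueError("no feedback posts")
    return sum(impl == intended for impl, intended in feedback) / len(feedback)


def high_confidence_users(users, threshold=0.5):
    """Names of users whose CF is strictly above threshold (CF > 0.5 by default)."""
    return [name for name, fb in users.items() if confidence_factor(fb) > threshold]
```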

  29. Results and analysis: Analysis of Improvement

  30. Limitations • Intended policy available for only 20 posts per participant • 42 participants is not a large number, though other studies have used similar numbers • A richer feature space is possible, e.g., by processing the post’s attachments • Could use more sophisticated ML techniques

  31. Conclusion • An approach demonstrating the feasibility of learning the intended privacy policy of Facebook posts • Accuracy: 67% → 81% • Accuracy for CF > 0.5: 78% → 94%

  32. Results and analysis: Discarding “Bad” Data Helps

  33. Results and analysis: Improvement vs. #Participants
