
The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions

22nd USENIX Security Symposium (USENIX Security '13). The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions. Tao Zhu 1; David Phipps 2; Adam Pridgen 3; Jedidiah R. Crandall 4; Dan S. Wallach 3. 1 Independent Researcher 2 Bowdoin College


Presentation Transcript


  1. 22nd USENIX Security Symposium (USENIX Security '13) • The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions • Tao Zhu 1; David Phipps 2; Adam Pridgen 3; Jedidiah R. Crandall 4; Dan S. Wallach 3 • 1 Independent Researcher, 2 Bowdoin College, 3 Rice University, 4 University of New Mexico • 左昌國 2013/09/10 Seminar @ ADLab, CSIE, NCU

  2. Outline • Introduction • Methodology • Hypotheses • Topic Extraction • Discussion • Conclusion

  3. Introduction • Microblogs in China: Weibo • Sina Weibo ( http://weibo.com ) • 503 million registered users (Dec. 2012) • 100 million messages sent daily • Promoting visibility of social issues • China employs both backbone-level filtering of IP packets and higher-level filtering implemented in software • Many works focus on how and what to filter • This paper focuses on how quickly microblog posts are removed

  4. Introduction • Contributions: • The implementation of a method that detects a censorship event within 1-2 minutes of its occurrence • To understand how Weibo can react so quickly in deleting posts with sensitive content • 4 hypotheses • To overcome the use of neologisms, named entities, and informal language in Chinese for topical analysis

  5. Methodology • Identifying the sensitive user group • Crawling posts of sensitive user group • Detecting deletions

  6. Methodology – Identifying the Sensitive User Group • Search for outdated sensitive keywords listed by China Digital Times (http://chinadigitaltimes.net/2013/06/two-years-of-sensitive-words-grass-mud-horse-list/) • Using keywords like “党产共”; 2011-04 to 2012-10 • Starting with 25 manually selected sensitive users • Diagram labels: 25 sensitive users; > 5 deletions; > 5 reposts for each user

  7. Methodology – Identifying the Sensitive User Group • Sensitive group reaches 3567 users after 15 days • More than 4500 post deletions daily • 1500 “permission denied” posts • 12% of the total posts from the group were eventually deleted • This methodology cannot yield a representative sample of Weibo as a whole

  8. Methodology – Crawling • User timeline: • The Weibo user timeline API returns the most recent 50 posts of the specified user • Querying each of the 3567 sensitive users once per minute • 100 accounts for API calls • 300 concurrent Tor circuits • Four-node cluster running Hadoop and HBase

  9. Methodology – Detecting Deletions • If a post is in the database but is not returned by Weibo, issue a secondary query for that post to determine which error message is returned • Permission-denied (system) deletion • “Permission denied” error • Caused by a censorship event • The post still exists but cannot be accessed by users • General deletion • “Post does not exist” error • May be caused by user self-deletion or by a censorship event • The post no longer exists
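
The two-step check on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' crawler: the `lookup_error` callable and the exact error strings are assumptions standing in for the secondary Weibo query, and the `Deletion` enum is a hypothetical name.

```python
from enum import Enum


class Deletion(Enum):
    NONE = "still visible"
    SYSTEM = "system deletion (permission denied)"
    GENERAL = "general deletion (post does not exist)"


def detect_deletions(stored_ids, timeline_ids, lookup_error):
    """Classify stored posts that are missing from the latest timeline.

    stored_ids:   post ids already in the local database for one user
    timeline_ids: post ids returned by the latest timeline query
    lookup_error: hypothetical callable, post id -> error string or None,
                  standing in for the secondary per-post query
    """
    results = {}
    for post_id in sorted(set(stored_ids) - set(timeline_ids)):
        error = lookup_error(post_id)
        if error == "permission denied":
            results[post_id] = Deletion.SYSTEM
        elif error == "post does not exist":
            results[post_id] = Deletion.GENERAL
        else:
            # Missing from the timeline for another reason, for example
            # the post has aged out of the 50 most recent posts.
            results[post_id] = Deletion.NONE
    return results
```

The secondary query matters because a post can drop out of the timeline without being deleted; only the returned error separates the two deletion types.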

  10. Methodology – Detecting Deletions • This paper focuses on system deletions • Apparently not performed by users • From July 2012 to September 2012, 2.38 million posts were collected, with a 12.8% total deletion rate (4.5% system deletions and 8.3% general deletions) • The lifetime of a post is the difference between the time the system detected the deletion and the post's creation time • The measurement fidelity is on the order of minutes
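
The lifetime definition on this slide is a timestamp difference. A minimal sketch, assuming both timestamps are available as `datetime` values (the function names are hypothetical):

```python
from datetime import datetime, timedelta


def post_lifetime(created_at, deletion_detected_at):
    """Lifetime = time the deletion was detected minus creation time."""
    lifetime = deletion_detected_at - created_at
    if lifetime < timedelta(0):
        raise ValueError("deletion detected before the post was created")
    return lifetime


def fraction_within(lifetimes, window):
    """Share of deleted posts whose lifetime is at most `window`."""
    if not lifetimes:
        return 0.0
    return sum(1 for lt in lifetimes if lt <= window) / len(lifetimes)
```

Because deletions are detected by polling, a lifetime computed this way is an upper bound that can overshoot the true value by up to the polling interval, which is why the fidelity is stated in minutes.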

  11. Distribution of Deleted Posts

  12. Hypotheses • How can the Weibo system find sensitive posts and remove them so quickly? • How do moderators locate sensitive posts in the huge database a month after they were posted? • Weibo has different strategies for targeting sensitive content

  13. Hypotheses • Hypothesis 1: • Weibo has filtering mechanisms as a proactive, automated defense • Explicit filtering • Implicit filtering • “shishikanfalunhowle” • Camouflaged posts

  14. Hypotheses • Hypothesis 2: • Weibo targets specific users, such as those who frequently post sensitive content

  15. Hypotheses • Hypothesis 3: • When a sensitive post is found, a moderator will use automated searching tools to find all of its related reposts (parent, child, etc.), and delete them all at once
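
To make Hypothesis 3 concrete, the following sketch shows what a search over a repost tree could look like. It only illustrates the idea and says nothing about Weibo's actual tooling; the `parent_of` mapping (repost id to the id of the post it reposts) is an assumed data layout.

```python
from collections import defaultdict, deque


def related_reposts(post_id, parent_of):
    """Return every post in the same repost tree as `post_id`.

    parent_of: dict mapping a repost's id to the id of the post it reposts
               (an assumed representation for this sketch).
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    # Walk up to the original post, guarding against cycles in bad data.
    root = post_id
    visited_up = {root}
    while root in parent_of and parent_of[root] not in visited_up:
        root = parent_of[root]
        visited_up.add(root)

    # Breadth-first walk down from the root collects the whole tree.
    related = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current in related:
            continue
        related.add(current)
        queue.extend(children[current])
    return related
```

If deletions happen this way, posts in the same tree should disappear at nearly the same time regardless of when each was created.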

  16. Hypotheses • Hypothesis 4: • Deletion speed is related to the topic. That is, particular topics are targeted for deletion based on how sensitive they are. • Five main topics: • Qidong • Qian Yunhui • Beijing Rainstorm • Diaoyu Island • Group Sex

  17. Topic Extraction • Automatic methods are needed to classify the posts • TF*IDF (https://zh.wikipedia.org/wiki/TF-IDF) • Assign weights to the terms (n-grams) of a document • Pointillism approach [27] • Reconstruction from grams to words and phrases using external information
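
A minimal sketch of TF*IDF weighting over character n-grams, which avoids the need for a Chinese word segmenter. It uses the textbook formulation (term frequency times the log of inverse document frequency); the exact variant and n-gram length used in the paper are not given on the slide, so `n=2` and the function names are assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """All overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tf_idf(documents, n=2):
    """Return one {ngram: weight} dict per document."""
    grams_per_doc = [Counter(char_ngrams(doc, n)) for doc in documents]

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(grams.keys())

    total_docs = len(documents)
    weights = []
    for grams in grams_per_doc:
        total = sum(grams.values())
        weights.append({
            gram: (count / total) * math.log(total_docs / doc_freq[gram])
            for gram, count in grams.items()
        })
    return weights
```

An n-gram that occurs in every document gets weight zero, so ubiquitous terms drop out and the n-grams distinctive to a post rise to the top.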

  18. Topic Extraction • 李W阳 (Li Wangyang, from 李旺阳) • 六圌四 (June Fourth, from 六四) • 胡()涛 (Hu Jintao, from 胡锦涛) • 启-东, 启\东 and 启/东 (Qidong, from 启东)
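
The separator-based variants of Qidong above can be undone with a simple normalization pass. This is a far simpler stand-in than the Pointillism approach the slides cite, and it handles only inserted punctuation or whitespace between two Chinese characters; the separator set is an illustrative assumption, and substitutions such as 李W阳 or 胡()涛 are left untouched.

```python
import re

# Hyphen, backslash, slash, or whitespace sitting between two CJK characters.
# The separator set is illustrative, not exhaustive.
_CJK = "[\u4e00-\u9fff]"
_SEPARATOR_BETWEEN_CJK = re.compile(r"(?<=" + _CJK + r")[-\\/\s]+(?=" + _CJK + r")")


def strip_separators(text):
    """Remove separators inserted between adjacent Chinese characters."""
    return _SEPARATOR_BETWEEN_CJK.sub("", text)
```

Restricting the match to separators flanked by Chinese characters keeps ordinary hyphens and slashes in Latin text and URLs intact.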

  19. Topic Extraction • Which topics among these have been discussed for the longest period of time? • Independent Component Analysis (ICA) • Beijing, government, China, country, policeman, and people • These 6 terms appear in almost every individual topic

  20. Discussion – Filtering Mechanisms • Proactive mechanisms • Hypothesis 1 • Backwards reposts search • Hypothesis 3: chain reposts deletion • Backwards keyword search • Similar to hypothesis 3: deletion by related keywords • 兲朝 • 37人 (http://news.now.com/home/international/player?newsId=40857) • Monitoring specific users • Hypothesis 2

  21. Discussion – Filtering Mechanisms • Account closures • 300 user accounts closed • Search filtering • Public timeline filtering • User credit points • Users can report sensitive or rumor-based posts to earn points

  22. Discussion – Time-of-day Behavior

  23. Discussion – Time-of-day Behavior

  24. Conclusion • Deletions happen most heavily in the first hour • 90% of the deletions happen within the first 24 hours • The 4 hypotheses
