Dark Web Collection, Search, and Analysis

Dark Web Collection, Search, and Analysis. Dr. Hsinchun Chen, Director, Artificial Intelligence Lab, University of Arizona, hchen@eller.arizona.edu, http://ai.arizona.edu. Acknowledgements: NSF CRI; NSF EXP-LA; DTRA, DOD CTFP, NPS; (ARFL WMD, CIA, FBI). Leaderless Jihad and the Internet.

Presentation Transcript


  1. Dark Web Collection, Search, and Analysis • Dr. Hsinchun Chen, Director, Artificial Intelligence Lab, University of Arizona • hchen@eller.arizona.edu • http://ai.arizona.edu • Acknowledgements: NSF CRI; NSF EXP-LA; DTRA, DOD CTFP, NPS; (ARFL WMD, CIA, FBI)

  2. Leaderless Jihad and the Internet • “The process of radicalization in a hostile habitat but linked through the Internet leads to a disconnected global network, the Leaderless Jihad.” • Before 2004: face-to-face interactions; 26-year-olds • After 2004: interactions on the Internet (Madrid, the Dutch Hofstad network, Cairo, Toronto…); Irhabi007 and Muntada; 20-year-olds

  3. Intelligence and Security Informatics • Intelligence and Security Informatics (ISI): “Development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy-based approach” (Chen et al., 2003a) • Data, text, and web mining • From COPLINK to Dark Web

  4. COPLINK project in the press • The New York Times, November 2, 2002: COPLINK assisted in DC sniper investigation • ABC News, April 15, 2003: Google for Cops: Coplink software helps police search for cyber clues to bust criminals • Newsweek Magazine, March 3, 2003: A computerized way for police to coordinate crime databases • Washington Post, March 6, 2008: National dragnet is a click away! • COPLINK is in use in 1,600 police agencies in the US!

  5. Dark Web Overview • Dark Web: Terrorists’ and cyber criminals’ use of the Internet • Collection: Web sites, forums, blogs, YouTube, Second Life • Analysis and Visualization: Link and content analysis; Web metrics analysis; Authorship analysis; Sentiment analysis; Multimedia analysis • Our collection is about 2 TB in size, with close to 500M pages/files/messages from more than 10,000 Dark Web sites.

  6. Dark Web project in the press • Project Seeks to Track Terror Web Posts, 11/11/2007 • Researchers say tool could trace online posts to terrorists, 11/11/2007 • Mathematicians Work to Help Track Terrorist Activity, 9/14/2007 • Team from the University of Arizona identifies and tracks terrorists on the Web, 9/10/2007

  7. Dark Web Forum Crawler System

  8. Middle Eastern Web Collection File Types • Dynamic files (e.g., PHP, ASP, JSP) are widely used in extremist Web sites, indicating a high level of technical sophistication. • Multimedia files (videos, images) are also heavily used in extremist Web sites.

  9. CyberGate System: Analysis & Visualization

  10. Measuring Hate and Violence: US vs. Middle Eastern Groups • [Figure: Results: Intensity Relationship] • Strong hate and violence correlation, especially for Middle Eastern groups.
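The slide reports a correlation between hate and violence intensities but does not say how it was computed. A minimal sketch, assuming a Pearson correlation over per-group intensity scores that have already been measured; the function name and the sample scores below are hypothetical, not the study's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

# Hypothetical per-group intensity scores (illustrative only)
hate = [0.2, 0.5, 0.7, 0.9]
violence = [0.1, 0.4, 0.8, 0.95]
print(round(pearson(hate, violence), 3))
```

Values of r near 1 would correspond to the "strong correlation" reading on the slide; comparing r separately for US and Middle Eastern groups is one way to express the "especially for" contrast.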

  11. Number of Posts by Month: Al-Firdaws vs. Montada • Al-Firdaws has consistently had between 2,500 and 3,000 posts per month since the second half of 2006. • Montada was very active in 2002 and 2005.
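Monthly activity curves like these come down to bucketing post timestamps by calendar month. A minimal sketch, assuming each post's timestamp has already been parsed into a `datetime`; the helper names are hypothetical, and the 2,500 to 3,000 band is simply the range quoted on the slide.

```python
from collections import Counter
from datetime import datetime

def posts_per_month(timestamps):
    """Count posts per calendar month, keyed by 'YYYY-MM'."""
    return Counter(ts.strftime("%Y-%m") for ts in timestamps)

def months_in_range(counts, low, high):
    """Sorted months whose post count lies within [low, high]."""
    return sorted(month for month, n in counts.items() if low <= n <= high)

# Hypothetical timestamps for three posts
stamps = [datetime(2006, 7, 1), datetime(2006, 7, 15), datetime(2006, 8, 2)]
counts = posts_per_month(stamps)
print(counts)
# On a real forum, one might check months_in_range(counts, 2500, 3000)
```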

  12. Affect Intensities: Al-Firdaws vs. Montada • [Charts: Al-Firdaws - Violence; Montada - Violence; Al-Firdaws - Anger; Montada - Anger] • Al-Firdaws has considerably higher violence intensity and also greater anger intensity.

  13. Arabic Writeprint Feature Set

  14. Arabic Feature Extraction Component • [Diagram: (1) Incoming message → (2) Elongation filter → filtered message (Count +1, Degree +5) → (3) Root clustering algorithm: similarity scores (SC) against a root dictionary; best-matching root updated as max(SC) + 1 → (4) Generic feature extractor → all remaining feature values]
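The diagram names an elongation filter and a root clustering step without giving their internals. The sketch below is one hypothetical reading, assuming elongation is marked by the Arabic tatweel character (U+0640), that "Count" tallies elongated words and "Degree" tallies stretch characters, and that the similarity score is the share of a root's letters found in order in the word (a longest-common-subsequence ratio swapped in for illustration). All function names and the `min_score` threshold are hypothetical.

```python
TATWEEL = "\u0640"  # Arabic tatweel (kashida), used to stretch words

def elongation_filter(message):
    """Strip tatweel; return (filtered text, elongated-word count, degree).

    Count grows by 1 per elongated word; degree grows by the number of
    tatweel characters removed from that word (assumed interpretation).
    """
    words, count, degree = [], 0, 0
    for word in message.split():
        stretch = word.count(TATWEEL)
        if stretch:
            count += 1
            degree += stretch
        words.append(word.replace(TATWEEL, ""))
    return " ".join(words), count, degree

def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == other else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def root_counts(words, roots, min_score=1.0):
    """Assign each word to the dictionary root with the highest score (SC).

    SC = LCS(word, root) / len(root). The winning root's count is
    incremented by 1, mirroring the diagram's max(SC) + 1.
    """
    counts = {root: 0 for root in roots}
    for word in words:
        scores = {root: lcs_length(word, root) / len(root) for root in roots}
        best = max(scores, key=scores.get)
        if scores[best] >= min_score:
            counts[best] += 1
    return counts

text = "ك" + TATWEEL * 3 + "تاب سلام"
filtered, count, degree = elongation_filter(text)
print(count, degree)
print(root_counts(filtered.split(), ["كتب", "سلم"]))
```

Real Arabic root extraction also has to handle infixes, weak letters, and ambiguous matches; this sketch only shows where the diagram's counters and the max-similarity step would sit.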

  15. Sliding Window + PCA: Turning Text into Dots • Step 1: Compute eigenvectors for the 2 principal components of a feature group. • Step 2: Slide a window over the message text and extract feature usage vectors Z (e.g., 1,0,0,2,1,2 and 0,1,3,0,1,0). • Step 3: Transform each vector into 2-dimensional space: x = Σ Zᵢ·xᵢ and y = Σ Zᵢ·yᵢ, where x and y are the two eigenvectors. • Repeat steps 2 and 3. • Example eigenvectors (x, y): (0.533, 0.956), (-0.541, 0.445), (0.034, 0.089), (0.653, 0.456), (0.975, -0.085), (0.143, -0.381)
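The three steps can be sketched in plain Python, assuming whitespace tokens as the features of one feature group and power iteration with deflation as the eigen-solver. The window size, step, solver, and function names are all hypothetical choices, not necessarily what CyberGate uses. As on the slide, raw vectors Z are projected directly; centering them first would only shift every dot by the same offset.

```python
import math
from collections import Counter

def window_vectors(tokens, features, size, step=1):
    """Feature usage vector (count of each feature) for each sliding window."""
    last_start = max(len(tokens) - size, 0)
    vectors = []
    for start in range(0, last_start + 1, step):
        counts = Counter(tokens[start:start + size])
        vectors.append([counts[f] for f in features])
    return vectors

def covariance(vectors):
    """Sample covariance matrix of the usage vectors (needs at least 2)."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - means[j] for j in range(d)] for v in vectors]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def power_iteration(matrix, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-uniform start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    value = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return value, v

def top_two_components(vectors):
    """Step 1: eigenvectors for the 2 principal components of a feature group."""
    cov = covariance(vectors)
    lam1, e_x = power_iteration(cov)
    d = len(cov)
    # Remove the first component so the next power iteration finds the second
    deflated = [[cov[i][j] - lam1 * e_x[i] * e_x[j] for j in range(d)]
                for i in range(d)]
    _, e_y = power_iteration(deflated)
    return e_x, e_y

def to_dot(z, e_x, e_y):
    """Step 3: x = sum(Z_i * x_i), y = sum(Z_i * y_i)."""
    return (sum(a * b for a, b in zip(z, e_x)),
            sum(a * b for a, b in zip(z, e_y)))

tokens = "a b a c a b".split()
vectors = window_vectors(tokens, ["a", "b", "c"], size=3)
e_x, e_y = top_two_components(vectors)
dots = [to_dot(z, e_x, e_y) for z in vectors]  # one dot per window
```

Power iteration keeps the sketch dependency-free; with NumPy available, an eigen-decomposition of the covariance matrix would be the more usual choice.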

  16. Anonymous Messages and Author Writeprints • [Figure: writeprints of Author A (10 messages) and Author B (10 messages)]
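The slide compares anonymous messages with the writeprints of known authors. One simple way to make that comparison concrete, swapped in here as an assumption and not presented as the Writeprints technique itself, is nearest-centroid matching with cosine similarity over feature usage vectors. Function names and sample vectors are hypothetical.

```python
import math

def centroid(vectors):
    """Average feature usage vector of an author's known messages."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def attribute(anonymous, author_messages):
    """Return (most similar author, similarity) for an anonymous message vector."""
    scores = {name: cosine(anonymous, centroid(msgs))
              for name, msgs in author_messages.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical feature vectors for two known authors
authors = {"A": [[5, 0, 1], [4, 1, 0]], "B": [[0, 3, 4], [1, 4, 3]]}
print(attribute([4, 0, 1], authors))
```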

  17. ClearGuidance.com (Toronto Plot): Participant Network Visualization

  18. ClearGuidance Forum “Experts” • The series of overlapping circular patterns for bag-of-words features indicates that the author’s discussion revolves around a related set of topics. • Bag-of-words features are predominantly related to religious topics (e.g., Adam, angels). • Many large red blots indicate the presence of features unique to this author (e.g., Adam, angels).

  19. This author was later arrested as a major culprit in the Toronto terror plot (“Soldier of God”). • He uses many violent affect terms. • Radar chart showing violent affect feature usage: the selected feature (shown in red) is use of the term “jihad”, for which this author is the highest in the forum. • This author constantly attempts to justify acts of violence and terrorism: “…there are so many paid sheikhs stuck in this life….no point going to them for fatwas…personally speaking…cuz they don’t even agree with jihad in the first place”

  20. Dark Web Forum Tools • Information contained within Dark Web forums represents a significant source of knowledge for security and intelligence organizations. • We have developed tools supporting the large-scale collection, search, and analysis of Dark Web forums, specifically addressing the needs of security analysts. • Collection: AZ Forum Spider • Search: AZ Forum Portal, AZ Sentiment Analyzer • Analysis: AZ CyberGate Text Analyzer

  21. Collection: AZ Forum Spider • Automated collection of forum communications; weekly updates • Proxy servers and parameters • Site map, URL ordering, and forum extraction • Incremental spider • Collection visualization • [Screenshot panels: Forum List, Spidering Status, Collection Statistics, Spidering Profile]

  22. AZ Forum Portal: Dark Web Forum Portal • Current version: 13M messages (340K members) across 29 major Jihadi forums in English, Arabic, French, German, and Russian • Forum analysis by forum, thread, member, time period, or topic • Social network analysis and visualization • Google Translate

  23. Forum Portal Data Set

  24. Data Set (Cont’d)

  25. Forum Statistics Summary (Cont’d)

  26. Cross Forum Search

  27. Single Forum Search & Translation • [Screenshot: search for “bomb, iraq”; translations of thread titles]

  28. SNA Reply Network • Top posters: 1. Bint ul Islam (290 postings); 2. Iloveislam (239 postings); 3. Abuhannah (173 postings)
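A reply network like this one can be built from who-replied-to-whom pairs. A minimal sketch, assuming a hypothetical data model in which each post is an `(author, replied_to)` pair, with `replied_to` set to `None` for thread-starting posts; posting counts rank members as in the slide's list, and degree centrality counts distinct conversation partners.

```python
from collections import Counter, defaultdict

def reply_network(posts):
    """Build a weighted, directed reply network from (author, replied_to) pairs.

    Edge weight = number of replies from author to target; self-replies
    count as postings but are not added as edges.
    """
    edges, postings = Counter(), Counter()
    for author, replied_to in posts:
        postings[author] += 1
        if replied_to is not None and replied_to != author:
            edges[(author, replied_to)] += 1
    return edges, postings

def degree_centrality(edges):
    """Number of distinct members each member is linked to (in or out)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {member: len(links) for member, links in neighbours.items()}

# Hypothetical posts
posts = [("u1", None), ("u2", "u1"), ("u3", "u1"), ("u2", "u3"),
         ("u1", "u2"), ("u2", "u1"), ("u4", "u2")]
edges, postings = reply_network(posts)
print(postings.most_common(3))
print(degree_centrality(edges))
```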

  29. Search: AZ Sentiment Analyzer • Portal for the sentiment and affect analysis of forums, measuring member opinions and emotions • Characterizes the affects conveyed in forum text and the underlying sentiment polarity • By forum, thread, member, or time period • Keyword search
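The slide lists what the analyzer reports but not how it scores text. A minimal sketch, assuming a simple lexicon-based polarity score and messages stored as dictionaries with a `text` field plus grouping fields such as `member`; the lexicon, field names, and functions are all hypothetical and stand in for whatever the actual system uses.

```python
from collections import defaultdict

def polarity(text, lexicon):
    """Sum of lexicon weights over the message's lowercase tokens."""
    return sum(lexicon.get(tok, 0.0) for tok in text.lower().split())

def average_by(messages, key, lexicon):
    """Average polarity per group (key may name a forum, thread, member, or period field)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for msg in messages:
        group = msg[key]
        totals[group] += polarity(msg["text"], lexicon)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

def keyword_search(messages, keyword):
    """Messages whose whitespace tokens include the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [m for m in messages if kw in m["text"].lower().split()]

# Hypothetical lexicon and messages
lexicon = {"good": 1.0, "bad": -1.0, "hate": -2.0}
messages = [{"member": "a", "text": "good good"},
            {"member": "a", "text": "bad"},
            {"member": "b", "text": "I hate this"}]
print(average_by(messages, "member", lexicon))
```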

  30. Analysis: AZ CyberGate Text Analyzer • Comprehensive system for the analysis and visualization of forum communications • Shows all text features • Utilizes Writeprint and Ink Blot techniques in text analysis • Incorporates rich visualization based on multi-dimensional scaling and parallel coordinates

  31. Conclusion • The web offers extremists a rich medium for recruiting, communication, and radicalization. • Information contained within Dark Web sites, forums, blogs, multimedia, etc. represents a significant source of knowledge for security and intelligence organizations. • A computational approach to Dark Web research spans collection, search, and analysis. • Dark Web research could potentially assist in terrorism research and intelligence analysis. • The Dark Web Forum Portal is available now!

  32. Dark Web Collection, Search, and Analysis • For more information: Dr. Hsinchun Chen, University of Arizona, hchen@eller.arizona.edu, http://ai.arizona.edu
