Dark Web Collection, Search, and Analysis


Dark Web Collection, Search, and Analysis. Dr. Hsinchun Chen Director, Artificial Intelligence Lab University of Arizona hchen@eller.arizona.edu http://ai.arizona.edu Acknowledgements: NSF CRI; NSF EXP-LA; DTRA, DOD CTFP, NPS; (ARFL WMD, CIA, FBI). Leaderless Jihad and the Internet.

Presentation Transcript
Dark Web Collection, Search, and Analysis

Dr. Hsinchun Chen

Director, Artificial Intelligence Lab

University of Arizona

hchen@eller.arizona.edu http://ai.arizona.edu


Leaderless Jihad and the Internet
  • “The process of radicalization in a hostile habitat but linked through the Internet leads to a disconnected global network, the Leaderless Jihad.”
  • Before 2004: face-to-face interactions, 26-year-old
  • After 2004: interactions on the Internet: Madrid, Dutch Hofstad, Cairo, Toronto… Irhabi007 and Muntada, 20-year-old

Intelligence and Security Informatics

Intelligence and Security Informatics (ISI): “Development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy-based approach” (Chen et al., 2003a)

  • Data, text, and web mining
  • From COPLINK to Dark Web

COPLINK project in the press

The New York Times, November 2, 2002

COPLINK assisted in DC sniper investigation

ABC News, April 15, 2003

Google for Cops: Coplink software helps police search for cyber clues to bust criminals

Newsweek Magazine, March 3, 2003

A computerized way for police to coordinate crime databases

Washington Post, March 6, 2008

National dragnet is a click away! COPLINK in use in 1,600 police agencies in US!

Dark Web Overview
  • Dark Web: Terrorists’ and cyber criminals’ use of the Internet
  • Collection: Web sites, forums, blogs, YouTube, Second Life
  • Analysis and Visualization: Link and content analysis; Web metrics analysis; Authorship analysis; Sentiment analysis; Multimedia analysis
  • Our collection is about 2 TB in size, with close to 500M pages, files, and messages from more than 10,000 Dark Web sites.
Dark Web project in the press
  • Project Seeks to Track Terror Web Posts, 11/11/2007
  • Researchers say tool could trace online posts to terrorists, 11/11/2007
  • Mathematicians Work to Help Track Terrorist Activity, 9/14/2007
  • Team from the University of Arizona identifies and tracks terrorists on the Web, 9/10/2007
Middle Eastern Web Collection File Types
  • Dynamic files (e.g., PHP, ASP, JSP) are widely used in extremist Web sites, indicating a high level of technical sophistication.
  • Multimedia files (videos, images) are also heavily used in extremist Web sites.

Measuring Hate and Violence: US vs. Middle Eastern Groups

7. Results: Intensity Relationship
Strong hate and violence correlation, especially for Middle Eastern groups.
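The slide does not say which statistic underlies the reported correlation. As a minimal sketch, assuming each message has a hate intensity score and a violence intensity score, Pearson's r is one common way to measure how strongly the two move together. The function name and the sample scores below are hypothetical and are not taken from the presentation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-message intensity scores (illustration only)
hate_scores = [0.1, 0.4, 0.5, 0.9]
violence_scores = [0.2, 0.3, 0.6, 0.8]
r = pearson_r(hate_scores, violence_scores)
```

A value of r close to 1 would correspond to the "strong correlation" described on the slide.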

Number of Posts By Month: Al-Firdaws vs. Montada
  • Al-Firdaws has consistently had between 2,500 and 3,000 posts per month since the second half of 2006.
  • Montada was very active in 2002 and 2005.
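Monthly posting counts such as those above can be produced with a simple aggregation. The following is a minimal sketch, assuming each collected post carries a posting date; the function name and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date

def posts_per_month(post_dates):
    """Count posts per (year, month) from an iterable of date objects."""
    return Counter((d.year, d.month) for d in post_dates)

# Hypothetical posting dates (illustration only)
sample_dates = [date(2006, 7, 3), date(2006, 7, 19), date(2006, 8, 1)]
monthly = posts_per_month(sample_dates)
```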
Affect Intensities: Al-Firdaws vs. Montada
  • Chart panels: Al-Firdaws - Violence; Montada - Violence; Al-Firdaws - Anger; Montada - Anger
  • Al-Firdaws has considerably higher violence and also greater anger intensity.

Arabic Feature Extraction Component


Diagram components:

  • Incoming Message
  • Elongation Filter
  • Count + 1
  • Degree + 5
  • Filtered Message
  • Root Dictionary
  • Root Clustering Algorithm
  • Similarity Scores (SC)
  • Generic Feature Extractor
  • All Remaining Feature Values
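The slide names an elongation filter but does not define it. The sketch below assumes that elongation refers to the Arabic tatweel character (U+0640), which stretches a word without changing its meaning. It also assumes that "Count" tracks the number of elongated words and "Degree" the number of elongation characters removed. Both readings are assumptions, and the function name is hypothetical.

```python
TATWEEL = "\u0640"  # Arabic elongation character (assumed meaning of "elongation")

def elongation_filter(message):
    """Strip tatweel characters; return (filtered_message, count, degree).

    count  = number of words that contained at least one tatweel (assumed)
    degree = total number of tatweel characters removed (assumed)
    """
    count = 0
    degree = 0
    filtered_words = []
    for word in message.split():
        n = word.count(TATWEEL)
        if n:
            count += 1
            degree += n
        filtered_words.append(word.replace(TATWEEL, ""))
    return " ".join(filtered_words), count, degree

# One word stretched with five tatweel characters, followed by a normal word
elongated = "\u0633" + TATWEEL * 5 + "\u0644\u0627\u0645 \u0639\u0644\u064a\u0643\u0645"
filtered, count, degree = elongation_filter(elongated)
```

Under these assumptions, the example message yields a count of 1 and a degree of 5, and the filtered text can then be passed on to later feature extraction steps.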


Sliding Window + PCA : Turning Text into Dots

Message Text

Extract feature usage vectors

Compute eigenvectors for 2 principal components of feature group

0.533 0.956 -0.541 0.445 0.034 0.089 0.653 0.456 0.975 -0.085 0.143 -0.381

Feature Usage Vector Z

Transform into 2-dimensional space: x = Z · e1, y = Z · e2 (e1, e2: the eigenvectors of the 2 principal components)

Repeat steps 2 and 3
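The steps named on this slide can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the window size, step, feature list, and all function names are hypothetical, and feature usage is reduced here to simple token counts per window. Each window's usage vector is projected onto the two leading eigenvectors of the covariance matrix, giving one (x, y) dot per window.

```python
import numpy as np

def feature_usage_vectors(tokens, features, window=4, step=2):
    """One row per sliding window; each column counts one feature in that window."""
    rows = []
    for start in range(0, max(len(tokens) - window, 0) + 1, step):
        chunk = tokens[start:start + window]
        rows.append([chunk.count(f) for f in features])
    return np.array(rows, dtype=float)

def top_two_eigenvectors(Z):
    """Eigenvectors of the feature covariance matrix for the 2 largest eigenvalues."""
    cov = np.cov(Z - Z.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:2]       # indices of the two largest
    return eigvecs[:, order]                    # shape: (n_features, 2)

def to_dots(Z, basis):
    """Project each centered window vector onto the two eigenvectors."""
    return (Z - Z.mean(axis=0)) @ basis         # shape: (n_windows, 2)

# Hypothetical message and feature set (illustration only)
tokens = "a b a c a b b c a a".split()
features = ["a", "b", "c"]
Z = feature_usage_vectors(tokens, features)
basis = top_two_eigenvectors(Z)
dots = to_dots(Z, basis)
```

Plotting the rows of `dots` for one author's messages gives the kind of two-dimensional pattern that the presentation refers to as a Writeprint.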



Anonymous Messages

Author Writeprints: Author A (10 messages), Author B (10 messages)

ClearGuidance Forum “Experts”

The series of overlapping circular patterns for bag-of-words features indicates that the author’s discussion revolves around a related set of topics.

Bag-of-words features are predominantly related to religious topics, e.g., Adam, angels.

Many large red blots indicate the presence of features unique to this author, e.g., Adam, angels.


This author was later arrested as a major culprit in the Toronto terror plot (“Soldier of God”). He uses many violent affect terms.

Radar chart showing violent affect feature usage.

The selected feature is the use of the term “jihad,” which is the highest in the forum.

Selected feature (i.e., “jihad”) is shown in red.

This author constantly attempts to justify acts of violence and terrorism.

“…there are so many paid sheikhs stuck in this life….no point going to them for fatwas…personally speaking…cuz they don’t even agree with jihad in the first place”

Dark Web Forum Tools


  • Information contained within Dark Web forums represents a significant source of knowledge for security and intelligence organizations.
  • We have developed tools supporting the large-scale collection, search, and analysis of Dark Web forums, specifically addressing the needs of security analysts.

AZ Forum Spider
AZ Forum Portal
AZ Sentiment Analyzer
AZ CyberGate Text Analyzer

AZ Forum Spider
  • Automated collection of forum communications; weekly update
  • Proxy servers and parameters
  • Site map, URL ordering, and forum extraction
  • Incremental spider
  • Collection visualization
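The idea behind URL ordering and incremental spidering can be sketched as follows. This is a minimal sketch, not the AZ Forum Spider itself: the function, the `fetch` and `priority` callables, and the toy site are hypothetical, and no network access is performed. Seed pages are re-fetched on every run so that new links are discovered, while pages collected in an earlier run are skipped.

```python
import heapq

def incremental_crawl(seeds, fetch, seen, priority):
    """Visit URLs in priority order; return only pages not collected before.

    fetch(url)    -> (content, list_of_links)   # hypothetical callable
    seen          -> set of URLs collected in earlier runs (updated in place)
    priority(url) -> sort key; lower values are visited first
    """
    queue = [(priority(u), u) for u in seeds]
    heapq.heapify(queue)
    queued = set(seeds)
    new_pages = {}
    while queue:
        _, url = heapq.heappop(queue)
        content, links = fetch(url)
        if url not in seen:
            new_pages[url] = content
            seen.add(url)
        for link in links:
            # Skip links already queued in this run or collected in an earlier one
            if link not in queued and link not in seen:
                queued.add(link)
                heapq.heappush(queue, (priority(link), link))
    return new_pages

# Toy in-memory "forum" standing in for a real site (illustration only)
site = {
    "/index": ("index", ["/thread/1", "/thread/2"]),
    "/thread/1": ("first thread", []),
    "/thread/2": ("second thread", []),
}
seen = set()
thread_first = lambda url: 0 if "/thread/" in url else 1
first_run = incremental_crawl(["/index"], site.get, seen, thread_first)

# A new thread appears; the next run collects only that thread
site["/thread/3"] = ("third thread", [])
site["/index"] = ("index", ["/thread/1", "/thread/2", "/thread/3"])
second_run = incremental_crawl(["/index"], site.get, seen, thread_first)
```

A real spider would also need the proxy handling, site-specific parameters, and forum extraction that the slide lists, which this sketch leaves out.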

Collection – AZ Forum Spider

Forum List







AZ Forum Portal

Dark Web Forum Portal

  • Current version: 13M messages (340K members) across 29 major Jihadi forums in English, Arabic, French, German and Russian
  • Forum analysis
    • By forum, thread, member, time period, or topic
    • Social network analysis and visualization
    • Google Translation
Single Forum Search & Translation

Search: bomb, iraq

Translations of thread titles


SNA Replay Network

1. Bint ul Islam (290 postings)

2. Iloveislam (239 postings)

3. Abuhannah (173 postings)
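A ranking of members by number of postings, together with the interaction edges needed for a social network view, can be derived from message records. The sketch below assumes each record gives the posting author and, where applicable, the author being replied to. The record layout, function name, and member names are hypothetical.

```python
from collections import Counter

def build_reply_network(posts):
    """posts: iterable of (author, replied_to_author_or_None) pairs.

    Returns (postings, edges): postings counts messages per author;
    edges counts directed (author -> replied_to) interactions.
    """
    postings = Counter()
    edges = Counter()
    for author, target in posts:
        postings[author] += 1
        if target is not None and target != author:
            edges[(author, target)] += 1
    return postings, edges

# Hypothetical message records (illustration only)
records = [
    ("member_a", None),
    ("member_b", "member_a"),
    ("member_a", "member_b"),
    ("member_c", "member_a"),
    ("member_a", "member_c"),
    ("member_b", "member_a"),
]
postings, edges = build_reply_network(records)
top_posters = postings.most_common(3)
```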

AZ Sentiment Analyzer

Search – AZ Sentiment Analyzer

  • Portal for the sentiment and affect analysis of forums, measuring member opinions and emotions
  • Characterizes the affects conveyed in forum text, and the underlying sentiment polarity
    • By forum, thread, member, or time period
    • Keyword search
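The presentation does not describe how affect intensities are computed. One simple possibility, shown here only as an assumption, is a lexicon-based score: each term in a hand-built lexicon carries an intensity weight, and a message's score is the sum of matched weights divided by its length in tokens. The lexicon, its weights, and the function names are hypothetical.

```python
from collections import defaultdict

def affect_intensity(text, lexicon):
    """Average lexicon weight per token; 0.0 for empty text."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)

def intensity_by_member(messages, lexicon):
    """messages: iterable of (member, text). Returns mean intensity per member."""
    scores = defaultdict(list)
    for member, text in messages:
        scores[member].append(affect_intensity(text, lexicon))
    return {m: sum(v) / len(v) for m, v in scores.items()}

# Hypothetical violence lexicon and messages (illustration only)
violence_lexicon = {"attack": 0.9, "enemy": 0.6}
sample_messages = [
    ("member_a", "attack the enemy"),
    ("member_a", "see you tomorrow"),
    ("member_b", "good morning everyone"),
]
by_member = intensity_by_member(sample_messages, violence_lexicon)
```

The same grouping could be applied by forum, thread, or time period instead of by member.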
AZ CyberGate Text Analyzer

Analysis – AZ CyberGate Text Analyzer

  • Comprehensive system for the analysis and visualization of forum communications
  • Shows all text features
  • Utilizes Writeprint and Ink Blot techniques in text analysis
  • Incorporates rich visualization based upon multi-dimensional scaling and parallel coordinates
  • The web offers extremists a rich medium for recruiting, communication, and radicalization.
  • Information contained within Dark Web sites, forums, blogs, and multimedia represents a significant source of knowledge for security and intelligence organizations.
  • A computational approach to Dark Web research spans collection, search, and analysis.
  • Dark Web research could potentially assist in terrorism research and intelligence analysis.
  • Dark Web Forum Portal available now!!!
Dark Web Collection, Search, and Analysis

For more information:

Dr. Hsinchun Chen, University of Arizona