Data Mining for Malicious Traffic Dr. Latifur Khan (NASA, AFOSR)

Cyber Security Research at the University of Texas at Dallas: Sample Projects. Prof. Bhavani Thuraisingham, PhD, CISSP; Prof. Latifur Khan, PhD; Prof. Murat Kantarcioglu, PhD; Prof. Kevin Hamlen, PhD; Prof. Edwin Sha, PhD. August 2010.



Data Mining for Malicious Traffic: Dr. Latifur Khan (NASA, AFOSR)
  • Motivation
  • Network traffic is a continuous flow of data, which is evolving with time
  • How can we detect intrusions by mining the network traffic when
    • the intrusions themselves evolve?
    • only a small fraction of the traffic is analyzed and labeled by human experts?
    • new kinds of intrusions appear?
  • Technical Approach
  • Idea: Build a classification model from past data and predict intrusions using the model.
  • The model must be able to
    • keep itself up-to-date so that it can detect intrusions even if their characteristics change over time
    • use the limited amount of labeled data to efficiently update itself
    • detect new kinds of intrusions in the traffic
  • Strategy:
    • Semi-supervised learning to compensate for the shortage of labeled training data
    • Ensemble classification technique to cope with the changes in the traffic
    • Novel class detection to detect new kinds of intrusions in the traffic
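
The ensemble part of this strategy can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes a simple nearest-centroid classifier per chunk, majority voting, and a fixed ensemble size, and every name in it (`train_centroids`, `update_ensemble`, `max_models`) is hypothetical. Semi-supervised training and novel class detection are left out.

```python
from collections import Counter
import math


def train_centroids(chunk):
    """Train a nearest-centroid model from (features, label) pairs."""
    sums, counts = {}, Counter()
    for x, y in chunk:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_one(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: math.dist(model[y], x))


def ensemble_predict(ensemble, x):
    """Majority vote across all models in the ensemble."""
    votes = Counter(predict_one(m, x) for m in ensemble)
    return votes.most_common(1)[0][0]


def accuracy(model, chunk):
    """Fraction of the chunk's instances that the model labels correctly."""
    return sum(predict_one(model, x) == y for x, y in chunk) / len(chunk)


def update_ensemble(ensemble, labeled_chunk, max_models=3):
    """Add a model trained on the newest chunk, then keep the best models.

    Models are ranked by accuracy on the newest labeled chunk, so models
    that no longer match the current traffic drop out first.
    """
    candidates = ensemble + [train_centroids(labeled_chunk)]
    candidates.sort(key=lambda m: accuracy(m, labeled_chunk), reverse=True)
    return candidates[:max_models]


# Toy stream: the "attack" class drifts between the two chunks.
old_chunk = [((0.0, 0.0), "normal"), ((1.0, 1.0), "attack")]
new_chunk = [((0.0, 0.2), "normal"), ((3.0, 3.0), "attack")]

ensemble = update_ensemble([], old_chunk)
ensemble = update_ensemble(ensemble, new_chunk)
print(ensemble_predict(ensemble, (2.9, 3.1)))
```

Ranking models on the most recent labeled chunk is one simple way to let the ensemble follow changes in the traffic; other replacement rules are equally possible.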

System Architecture (figure). Labels: Network traffic; Older chunks; Newer chunks; Last partially labeled chunk; Last unlabeled chunk; New model; Ensemble of models

Reactively Adaptive Malware: Dr. Kevin W. Hamlen and Dr. Latifur Khan (AFOSR)
  • Motivation
  • Design and study malware immune to conventional antivirus technologies
  • Important for AF active defense project
  • Important for developing adequate defenses in anticipation of next-generation attacks
  • Technical Approach
  • Data Mining
    • use machine learning to discover signatures dynamically
    • adapt to new malware in the field
    • share learned signatures amongst mutually trusting attackers
  • Reactively Adaptive Malware
    • discover false negatives in protection system
    • self-obfuscate to defeat defenses

Figure labels: Malware Binary; Obfuscated Binary; Antivirus Signature Database; Signature Query Interface; Signature Inference Engine; Signature Approximation Model; Obfuscation Generation; Obfuscation Function

AFOSR: Assured Information Sharing: 2005-2008 (Dr. Bhavani Thuraisingham)
  • Integrate the Medicaid claims data and mine the data; then enforce policies and determine how much information has been lost (Trustworthy partners); Prototype system; Application of Semantic Web technologies
  • Apply game theory and probing to extract information from semi-trustworthy partners
  • Conduct Active Defence and determine the actions of an untrustworthy partner
    • Defend ourselves from our partners using data mining techniques
    • Conduct active defence: find out what our partners are doing by monitoring them so that we can defend ourselves in dynamic situations
  • Trust for Peer to Peer Networks (Infrastructure security)

Figure labels: Data/Policy for Coalition; Data/Policy for Agency A; Data/Policy for Agency B; Data/Policy for Agency C; Trustworthy Partners; Semi-Trustworthy Partners; Untrustworthy Partners


Incentive Issues in Assured Information Sharing: Dr. Murat Kantarcioglu (DoD MURI Project 2008-2013, AFOSR)
  • Motivation
  • Misaligned incentives could be a significant problem in Information Security.
    • Software bugs vs. Software companies’ incentives
  • Incentive issues in information sharing have been explored to some extent
    • Incentive issues in file sharing p2p networks
  • Assured information sharing creates new challenges
    • Security considerations vs. Utility
  • Technical Approach
  • Verify that the other participants do not lie about their data.
    • If the data is revealed as it is
      • Trust but verify (Our initial results: DKE ’08 paper)
    • If the data is not revealed (e.g., SMC techniques are used)
      • Non-cooperative computing
      • Mechanism design
      • SMC with rational adversaries.
Scalable Social Network Mining: Dr. Murat Kantarcioglu (NSF)
  • Motivation
  • Mining social network data could provide important insights.
  • Recently many different data mining techniques have been suggested for mining social network data.
  • These techniques require many iterations (e.g., collective inference techniques) and expensive computations (e.g., maximum likelihood methods) over the large social networks.
  • Initial Results
  • Partitioning techniques based on various social network centrality metrics have been implemented
    • Degree centrality (DC)
    • Clustering coefficient (CC)
    • Closeness centrality (CloC)
    • Betweenness centrality (BC)
    • Random partitioning
    • Domain specific
  • Our initial results indicate that, with intelligent partitioning, we can increase accuracy and reduce running time.
  • Technical Approach
  • Our goal is to scale the existing social network mining techniques to very large social network data by using cloud computing.
  • To achieve this goal, we are exploring
    • Intelligent data partition techniques based on social network concepts
    • Caching of some important queries
    • Efficient update of cached query results using cloud computing
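
Two of the metrics listed under Initial Results, degree centrality and clustering coefficient, can be computed as in the sketch below. The slides do not say how the metrics drive the partitioning, so `partition_by_score` is a purely hypothetical scheme (round-robin over nodes ranked by score), and the toy graph is made up for illustration.

```python
def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0
            continue
        # Count each connected neighbor pair once (a < b).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        result[v] = 2.0 * links / (k * (k - 1))
    return result


def partition_by_score(scores, num_parts):
    """Hypothetical scheme: deal nodes round-robin in descending score order."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return [ranked[i::num_parts] for i in range(num_parts)]


# Undirected toy graph as an adjacency map (node -> set of neighbors).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

print(degree_centrality(graph))
print(clustering_coefficient(graph))
print(partition_by_score(degree_centrality(graph), 2))
```

Closeness and betweenness centrality need shortest-path computations and are omitted here to keep the sketch short.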
Language-based Security: Dr. Kevin W. Hamlen (AFOSR)
  • Motivation
  • Mobile code security (web scripts, patches, etc.)
  • How to enforce application-specific security policies over these untrusted software extensions?
    • Policy #1: Untrusted code must not create or modify any file whose name ends in “.exe”
    • Policy #2: Untrusted code must not access the network after reading a confidential file
    • Policy #3: Untrusted code must relinquish the thread after at most 1000 instruction cycles

System Architecture (figure). Surviving label: code + proof

  • Technical Approach
  • Idea: Automatically rewrite the code prior to execution
  • Two constraints on rewritten code:
    • rewritten code must satisfy security policy
    • rewritten code behaves exactly like original (except with regard to policy violations)
  • One simple rewriting strategy:
    • insert guard instructions before every potentially dangerous instruction
  • Use compiler optimizations to eliminate or streamline unnecessary guards
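
The guard-insertion strategy can be illustrated with a small monitor for policies #1 and #2. This is a sketch under stated assumptions, not the project's rewriter: the `Monitor` class, its method names, and `rewritten_program` are hypothetical, the guards are written by hand rather than inserted automatically, and matching ".exe" case-insensitively is an assumption.

```python
class PolicyViolation(Exception):
    """Raised by a guard when an operation would violate the policy."""


class Monitor:
    """Security state shared by all guards inserted into the untrusted code."""

    def __init__(self, confidential_paths):
        self.confidential_paths = set(confidential_paths)
        self.read_confidential = False  # state needed for policy #2

    def guard_open(self, path, mode):
        # Policy #1: no creating or modifying files whose name ends in ".exe".
        writes = any(flag in mode for flag in "wax+")
        if writes and path.lower().endswith(".exe"):
            raise PolicyViolation(f"write to {path} denied")
        # Remember confidential reads so later network guards can refuse.
        if "r" in mode and path in self.confidential_paths:
            self.read_confidential = True

    def guard_connect(self, host):
        # Policy #2: no network access after reading a confidential file.
        if self.read_confidential:
            raise PolicyViolation(f"connection to {host} denied")


def rewritten_program(monitor, log):
    """What untrusted code might look like after guards are inserted."""
    monitor.guard_open("notes.txt", "r")    # inserted guard
    log.append("open notes.txt")            # stands in for the real call
    monitor.guard_open("payload.exe", "w")  # inserted guard, raises
    log.append("open payload.exe")          # never reached


monitor = Monitor(confidential_paths={"secret.txt"})
log = []
try:
    rewritten_program(monitor, log)
except PolicyViolation as err:
    log.append(f"aborted: {err}")
print(log)
```

Policy #2 shows why guards need shared state: whether a network call is allowed depends on what the program did earlier, not only on the call itself.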



Example Code (inserted code was shown in green on the original slide; the guard on the second line is the inserted code)

eax := “filename.exe”
if (eax == “*.exe”) abort();
call, “w”);

Privacy-preserving Distributed Data Mining: Dr. Murat Kantarcioglu (NSF)

Figure labels: Source Data 1; Source Data 2; Data Sanitization (one per source); Data 1 (Public); Data 2 (Public); Data Processing; Cryptographic Protocols
  • Motivation
  • Privacy sensitive data that is needed for many critical tasks is distributed among different organizations.
    • Statistical analysis of hospital discharge data for detecting biological weapons attacks.
  • Privacy concerns may hinder sharing such data for legitimate purposes.
  • Our goal is to develop techniques to enable distributed data mining without sacrificing individual privacy
  • Technical Approach
  • Idea: Combine sanitization and cryptographic techniques to enable efficient and accurate privacy-preserving distributed data mining.
    • Each data source sanitizes its own data.
    • Sanitized data is shared directly.
    • Cryptographic algorithms use the sanitized data along with the original data to get the data mining results.
  • Our initial results indicate that this idea is more efficient than pure cryptographic approaches and more accurate than pure sanitization approaches.
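
As an illustration of the kind of cryptographic building block such protocols rely on, the sketch below computes a joint sum with additive secret sharing, so that no party sees another party's private count. It is not the project's protocol: it assumes honest-but-curious parties, simulates all of them in one process, and the names `MODULUS`, `make_shares`, and `secure_sum` are hypothetical. The sanitization step is not modeled.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any true total


def make_shares(value, num_parties):
    """Split value into random shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate all parties in one process and return the joint total."""
    n = len(private_values)
    # Row i holds the shares party i sends; column j is what party j receives.
    sent = [make_shares(v, n) for v in private_values]
    # Each party publishes only the sum of the shares it received.
    partial = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial) % MODULUS


# Hypothetical per-hospital case counts that must stay private.
print(secure_sum([12, 7, 30]))
```

Each individual share is uniformly random, so a single share reveals nothing about the value it came from; only the combined total is meaningful.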

WWW Disambiguation & Geo-tagging: Dr. L. Khan (NGA)



  • WWW problems as a source of geo-information
    • Geographic context embedded in natural language descriptions
    • Place names ambiguous and confused with names of organisations, people, buildings and streets
    • Web queries depend on exact match of text terms





  • Applications:
    • Location-based services
    • Locally targeted web advertising
    • Mining geographic properties
    • Market research
    • Geo-Information Web services

Ranking Based Disambiguation

  • Geo-Tagging = Geo-parsing + Geo-coding
  • Geo-parsing
    • Recognising geographic references (ignoring non-geographic uses of place terminology)
  • Geo-coding
    • Attaching a unique quantitative location (footprint) to each geographic reference
  • Example:
    • Geo/Geo ambiguity
    • Geo/non-Geo ambiguity, e.g. “Samuel Lancaster”
      • Lancaster > last name
      • {City} Lancaster / Texas / U.S.
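
The two stages, geo-parsing and geo-coding, can be sketched with a toy gazetteer. Everything here is an assumption made for illustration: the gazetteer entries and their rounded coordinates, the list of given names, and the context-overlap score are hypothetical and are not the ranking method used in the project.

```python
# Hypothetical gazetteer; coordinates are rounded, illustrative values.
GAZETTEER = {
    "Lancaster": [
        {"region": "Texas", "country": "U.S.", "lat": 32.59, "lon": -96.76},
        {"region": "Pennsylvania", "country": "U.S.", "lat": 40.04, "lon": -76.31},
        {"region": "England", "country": "U.K.", "lat": 54.05, "lon": -2.80},
    ],
}
FIRST_NAMES = {"Samuel"}  # hypothetical list of given names


def geo_parse(tokens):
    """Return indexes of tokens used as place names, skipping person names."""
    hits = []
    for i, tok in enumerate(tokens):
        follows_first_name = i > 0 and tokens[i - 1] in FIRST_NAMES
        if tok in GAZETTEER and not follows_first_name:
            hits.append(i)
    return hits


def geo_code(name, tokens):
    """Rank candidate places by how many context tokens mention them."""
    context = set(tokens)

    def score(place):
        return (place["region"] in context) + (place["country"] in context)

    # On a tie, max() returns the first candidate listed in the gazetteer.
    return max(GAZETTEER[name], key=score)


def geo_tag(text):
    """Geo-tagging = geo-parsing followed by geo-coding."""
    tokens = text.replace(",", " ").split()
    return [(tokens[i], geo_code(tokens[i], tokens)) for i in geo_parse(tokens)]


print(geo_tag("Samuel Lancaster spoke at the event"))
print(geo_tag("Floods hit Lancaster, Texas"))
```

The first call handles the Geo/non-Geo case by rejecting "Lancaster" as a surname; the second handles the Geo/Geo case by using the surrounding word "Texas" to pick among the candidate places.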

Other Projects
  • Secure Cloud Computing
  • Secure Social and Private Networks
  • Security and Privacy preserving ontology alignment
  • Secure Peer to Peer Data Management
  • Risk modeling and analysis of Botnets
  • Policy interoperability of geospatial data
  • Data provenance and Attribution of Attacks
  • Accountability of Secure Systems