Parallel and Distributed Computing for Cyber Security

1. Parallel and Distributed Computing for Cyber Security

2. Progress in HPC - past 6 decades

3. Applications Drive the Technology
"I think there is a world market for maybe 5 computers" - Thomas Watson Sr. (1943)

4. Data Mining - A Driver for Parallel/Distributed Computing
- Lots of data being collected in the commercial and scientific world
- Strong competitive pressure to extract and use the information from the data
- Scaling of data mining to large data requires HPC
- Data and/or computational resources needed for analysis are often distributed
- Sometimes the choice is distributed data mining or no data mining
- Ownership, privacy, security issues

5. Cyber Intrusion Detection - Motivation
- Sophistication of cyber attacks and their severity are increasing
  - Large-scale denial of service attacks
  - Identity theft/fraud
  - Espionage
- DOD and other U.S. Government agencies are major targets for sophisticated state-sponsored cyber attacks
- Security mechanisms always have inevitable vulnerabilities
- Firewalls are not sufficient to ensure security in computer networks
- Insider attacks are difficult to detect

7. What are Intrusions?
Intrusions are actions that attempt to bypass security mechanisms of computer systems. They are caused by:
- Attackers accessing the system from the Internet
- Insider attackers: authorized users attempting to gain and misuse non-authorized privileges
[Figure: Typical intrusion scenario]

9. Intrusion Detection Systems

10. Data Mining for Intrusion Detection
Increased interest in data mining based intrusion detection over the past decade.
- Misuse detection
  - Suitable for attacks for which it is difficult to build signatures
  - Builds predictive models from labeled data sets (instances are labeled as "normal" or "intrusive") to identify known intrusions
  - Cannot detect unknown and emerging attacks
  - Related work: Madam ID project, ADAM project, fuzzy association rules [Bridges00], decision trees [Sinclair99], neural networks [Lippmann00, Ghosh99], genetic algorithms [Bridges00, Sinclair99], cost sensitive modeling (AdaCost [Fan99], MetaCost [Domingos99, Ting00]), learning from rare class [Kubat97, Fawcett97, Provost01, Japkowicz01, Joshi02, Lazarevic03]
- Anomaly detection
  - Detects emerging/novel attacks as deviations from "normal" behavior
  - Potential high false alarm rate: previously unseen (yet legitimate) system behaviors may also be recognized as anomalies
  - Related work: PHAD, ALAD [Chan01, Cha02], ADAM [Barbara01], finite mixture model [Yamanishi00], chi-square (χ²) based [Ye01], temporal sequence learning [Lane98], neural networks [Ryan98], generating artificial anomalies [Fan01], clustering [Eskin02], unsupervised SVM [Eskin02, Lazarevic03], outlier detection schemes (MINDS), Bayesian net [Valdes00], Hidden Markov models [Ourston03]
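The anomaly detection idea on this slide can be illustrated with a very small sketch. It is not the method of any system cited above: it assumes a single hypothetical feature (bytes per connection), models "normal" as a mean and standard deviation, and scores new observations by their distance from that mean. The names `fit_normal_profile` and `anomaly_score` and all data values are made up for illustration.

```python
from statistics import mean, pstdev

def fit_normal_profile(values):
    """Summarize 'normal' behavior as the mean and standard deviation of one feature."""
    return mean(values), pstdev(values)

def anomaly_score(value, profile):
    """Return how many standard deviations `value` lies from the normal mean."""
    mu, sigma = profile
    if sigma == 0:
        # Degenerate profile: anything different from the mean is maximally unusual.
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma

# Hypothetical training data: bytes per connection observed during normal operation.
normal_bytes = [480, 510, 495, 505, 520, 490, 500]
profile = fit_normal_profile(normal_bytes)

typical_score = anomaly_score(502, profile)      # close to the normal profile
attack_score = anomaly_score(5000, profile)      # far from anything seen before
new_but_legit_score = anomaly_score(700, profile)  # e.g. a legitimate but unseen workload
```

Note that the third observation also scores far from the mean even though it is assumed to be legitimate, which is the false alarm problem the slide mentions: the model only knows that the behavior is new, not whether it is malicious.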

11. Data Mining for Intrusion Detection: Misuse Detection - Building Predictive Models

12. Data Mining for Intrusion Detection: Misuse Detection - Building Predictive Models

13. MINDS - Minnesota INtrusion Detection System

14. Typical Anomaly Detection Output

15. Summarization Using Association Patterns

16. Typical MINDS Output

17. Typical MINDS Output

18. Typical Summarization Output

19. Detecting Modes of Network Traffic Using Clustering
- Used Shared Nearest Neighbor (SNN) clustering
  - Not distracted by "noise" in the data
  - CPU intensive: O(N^2)
  - Requires storing an N x K matrix
    - K (number of neighbors) is typically between 10 and 20
    - K should be about the size of the smallest expected mode
- Clustered 850,000 connections collected over one hour at one US Army fort
  - Took 10 hours on a 16 CPU cluster
  - Found 3135 clusters
  - Largest clusters around 500 records, smallest cluster 10 records
- Large clusters correspond to normal behavior
- Many small clusters correspond to policy violations or other undesired behavior
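To make the O(N^2) cost and the N x K matrix concrete, here is a minimal sketch of the shared-nearest-neighbor similarity that SNN clustering builds on. It assumes one common formulation (two points are similar only if each is in the other's K-nearest-neighbor list, and their similarity is the number of neighbors they share); the slides do not give the exact variant used, and the function names and toy 2-D points are hypothetical.

```python
import math

def k_nearest_neighbors(points, k):
    """Brute-force k-NN: about N^2 distance computations, result is an N x K list."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, sorted nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def snn_similarity(neighbors, i, j):
    """Count shared neighbors of i and j; 0 unless they are mutual neighbors."""
    if j not in neighbors[i] or i not in neighbors[j]:
        return 0
    return len(set(neighbors[i]) & set(neighbors[j]))

# Two well-separated toy groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
nbrs = k_nearest_neighbors(points, k=3)

same_group = snn_similarity(nbrs, 0, 1)       # points in the same group
different_group = snn_similarity(nbrs, 0, 4)  # points in different groups
```

At the scale quoted on the slide, a brute-force neighbor search over N = 850,000 connections is on the order of 7.2 x 10^11 distance computations, while the N x K neighbor matrix with K = 20 holds only 17 million entries, so the computation rather than the storage is what calls for a parallel implementation.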

20. Detecting Modes of Network Traffic Using Clustering

24. Need for HPC
- Very large data size
  - Typical network traffic at the university level reaches around 500 million connections per day
- Compute-intensive nature of the pattern-finding algorithms
  - Associative analysis
  - Clustering
  - Sequential pattern analysis

25. Need for Distributed Intrusion Detection
- Attacks on the network infrastructure may be launched from several different locations and may target multiple destinations
- Stealthy coordinated attacks with low traffic volumes are difficult to detect by IDSs based at a single network site
- Detection of such attacks at an early stage requires correlation of data from multiple network sites
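The point about low-volume coordinated attacks can be sketched as follows. This is an illustrative simplification, not a design from the slides: it assumes each site reports a per-source connection count, uses a hypothetical alert threshold of 10, and uses documentation-range IP addresses as made-up data.

```python
from collections import Counter

def flag_sources(site_counts, threshold):
    """Return the source addresses whose connection count meets the threshold."""
    return {src for src, n in site_counts.items() if n >= threshold}

def correlate_sites(all_site_counts):
    """Sum per-source connection counts reported by several sites."""
    total = Counter()
    for counts in all_site_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return total

# Hypothetical per-site summaries: source address -> suspicious connections seen.
site_a = {"198.51.100.7": 4, "203.0.113.9": 1}
site_b = {"198.51.100.7": 5}
site_c = {"198.51.100.7": 3, "192.0.2.44": 2}
sites = [site_a, site_b, site_c]

THRESHOLD = 10  # assumed alert threshold, for illustration only

local_alerts = [flag_sources(s, THRESHOLD) for s in sites]        # each site alone
global_alerts = flag_sources(correlate_sites(sites), THRESHOLD)  # sites combined
```

No single site sees enough traffic from the scanning source to raise an alert, but the combined counts do. Only small summaries are exchanged here rather than raw traffic, which is one way to correlate across sites when the underlying data cannot be moved.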

31. Need for Grid-based IDS
- Centralizing data is not possible
  - Data needed for analysis is distributed
  - Cost of centralizing data is too high
  - Security and privacy issues
- Computational resources needed for analysis are distributed

32. Data Mining Middleware for Grids

33. Grid-Based Data Mining: Distributed Network Intrusion Detection

34. Publications
- Managing Cyber Threats: Issues, Approaches and Challenges, edited by V. Kumar, J. Srivastava, and A. Lazarevic, Kluwer Academic Publishers (forthcoming).
- MINDS - Minnesota Intrusion Detection System, L. Ertöz, E. Eilertson, A. Lazarevic, P. Tan, J. Srivastava, V. Kumar, P. Dokas, Data Mining: Next Generation Challenges and Future Directions, editors: H. Kargupta, A. Joshi, K. Sivakumar, Y. Yesha, MIT/AAAI Press, 2004, AHPCRC Technical Report # 2003-121.
- Detection of Novel Network Attacks Using Data Mining, L. Ertöz, E. Eilertson, A. Lazarevic, P. Tan, P. Dokas, V. Kumar, J. Srivastava, Workshop on Data Mining for Computer Security, IEEE International Conference on Data Mining, Melbourne, FL, November 19, 2003, AHPCRC Technical Report # 2003-108.
