


CISA

Continually Improving Stream Analysis

Nancy McMillan

Doug Mooney

Dave Burgoon

March 14, 2003


Agenda

  • Background and Overview

  • Architecture

  • Algorithms

  • Results


MURALS: Multiple Use Real-time Analytics for Large Scale Data

  • Major information technology initiative

    • Objective: Develop intellectual property addressing the challenges created by:

      • Data generation/collection at previously unimaginable rates

      • Growing expectation that real time decision-making is feasible and necessary for competitive advantage

      • Dramatic increase in the data to information ratio

      • Compelling need for balance between result precision and timeliness

  • Sponsored development of two technologies

    • InfoRes: Addresses IT issues associated with real-time querying of very large relational databases

    • CISA: Addresses IT issues associated with real-time analysis of high volume (varying arrival speed) stream data


Background: Our problem space

  • Many data sources supplying stream data

  • Stream data can be summarized by a set of features/summary statistics over some time window

  • Each data source needs to be continually classified or characterized

  • Classification/characterization of a single data source may depend on data from other data sources

  • Examples:

    • Computers connecting to a firewall

    • Sensor networks
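A minimal sketch of the "features over some time window" idea, assuming events arrive in timestamp order and that a count and a mean are the features of interest. The class name `WindowedSource` and the 60-second window are illustrative choices, not part of the original slides.

```python
from collections import deque


class WindowedSource:
    """Keeps the events of one data source that fall inside a time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        # Assumption: events arrive in non-decreasing timestamp order.
        self.events.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        # Evict everything that has aged out of the window.
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def features(self):
        # Summary statistics over the events currently in the window.
        values = [v for _, v in self.events]
        count = len(values)
        mean = sum(values) / count if count else 0.0
        return {"count": count, "mean": mean}


source = WindowedSource(window_seconds=60)
for t, v in [(0, 2.0), (30, 4.0), (90, 6.0)]:
    source.add(t, v)
summary = source.features()
```

Here the event at time 90 pushes the two earlier events out of the 60-second window, so the summary reflects only the most recent event.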


Internet Security Example: Who is trying to inappropriately access a company’s network?

  • There are 19 firewalls recording connections in a log file

    • Date/Time

    • Source and Destination IP addresses

    • Protocol

    • Action (Accept/Drop/Decrypt/..)

    • Service

    • Rule

  • Inbound and outbound connections and warnings were logged over a six-day period in July 2002

    • Connections from site-to-site VPNs were not logged

    • Only externally initiated connections are analyzed

    • More data (6 days in September) were provided later


The Problem: The faster data arrives, the more processing power required for real-time analysis.

  • Every data arrival initiates some tasks (store data, recalculate features, update decisions, etc.), each of which requires computational time

    • Systems designed for gushing data waste resources when data trickles.

    • Systems designed for slower data flow fail when data arrives too fast.

  • More sophisticated analysis techniques (better features, decision algorithms, etc.) require more computational time, but can provide better answers

    • Analytics designed for gushing data don’t provide the best answer possible when data trickles.

    • Analytics designed for slower data flow don’t provide timely answers when data arrives too fast.

  • To what data arrival rate should the system be designed?


The CISA Answer: A precision-speed trade-off

  • When the data arrives more slowly than the system design rate, the best possible answer is provided

    • All data is considered.

    • Best analysis techniques are used.

  • As the data flows faster than the system design rate, the accuracy and/or precision of the solution degrades smoothly.

  • System achieves precision-speed trade-off through:

    • Architecture

      • Answer not based on all current data

      • Requires feedback from algorithm so most important data is considered

    • Algorithms

      • Partial/approximate solutions provided
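The trade-off can be illustrated with a small sketch: tasks are taken in priority order until a processing budget runs out, so a slow stream gets every task done while a fast stream only gets the most important ones. The function `process_with_budget`, the abstract cost units, and the task names are assumptions made for illustration. The slides do not describe CISA's scheduler at this level of detail.

```python
import heapq


def process_with_budget(tasks, budget):
    """Run the highest-priority tasks first until the budget is used up.

    tasks: iterable of (priority, cost, name); a lower priority number means
    more important. budget: total cost available before an answer is due.
    Returns (names processed, names deferred).
    """
    heap = list(tasks)
    heapq.heapify(heap)
    done, spent = [], 0
    # Stop as soon as the next most important task no longer fits.
    while heap and spent + heap[0][1] <= budget:
        priority, cost, name = heapq.heappop(heap)
        spent += cost
        done.append(name)
    deferred = [name for _, _, name in sorted(heap)]
    return done, deferred


tasks = [
    (2, 3, "recalculate features"),
    (1, 1, "store data"),
    (3, 5, "update decision"),
]
slow_done, slow_deferred = process_with_budget(tasks, budget=10)  # data trickles
fast_done, fast_deferred = process_with_budget(tasks, budget=4)   # data gushes
```

With the larger budget all three tasks complete. With the smaller one the decision update is deferred, so the answer rests on less than all current data but is still available on time.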


Architecture and Algorithm Overview: How CISA achieves precision-speed trade-off

  • Architecture

    • Assign analysis tasks to asynchronously operating objects

      • storage, characterization, decision-making, and visualization

    • Prioritize analysis tasks associated with each new piece of data

      • Data likely to impact analysis is analyzed sooner

  • Algorithm

    • Use incremental algorithms where possible

      • Update previous answer with new data rather than re-analyze all data

    • Stop or modify iterative or multi-step algorithms before completion when new data arrivals need to enter algorithm

      • Partial/approximate solutions provided
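As one example of "update previous answer with new data rather than re-analyze all data", the sketch below keeps a running mean and variance using Welford's online method. This is a standard technique chosen for illustration. The slides do not say which incremental statistics CISA uses, and the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Running mean and sample variance, updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Welford's update: constant work per arrival, no stored history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Each arrival costs the same amount of work regardless of how much data has already been seen, which is what makes the approach attractive when arrival rates vary.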




CISA Architectural Components: Diagram


Internet Security Example Architecture: Diagram

  • Java

  • Access database

  • JMS object communication

  • SAS Analytics


Advantages / Issues: Related to rapid prototyping decisions

  • JMS

    • Advantages

      • Asynchronous

      • Prioritized Lists

      • Open Source / Off-the-shelf

      • Platform Independent

    • Issues

      • Slow: system resources, “thrashing”, db, (network speeds)

      • JMS implementations vary slightly

  • Access

    • Advantages

      • Easy communication with Java

      • Easily and quickly developed data storage and feature calculation

    • Issues

      • Slow

      • Not available on many platforms




Candidate CISA Algorithms: A very broad group of statistical methods…

  • Feature characteristics

    • Relies on more than one feature

    • Some of the individual features take time to compute or measure

    • Meaningful nested "sub-algorithms" can be built on increasing sets of features

  • Data source characteristics

    • The algorithm can efficiently update its current solution when feature values for only a small group of source objects change

    • There is a natural method for prioritizing objects


Construction Methodologies: General

  • Feature Priority

    • Order features (statically)

    • Create series of nested models that use an increasing number of features

    • Develop a function to assign priorities based on feature order and current object classification

  • Data Source Priority

    • Order data sources (dynamically)

    • Assign priorities based on uncertainty of classification or cost of misclassification

    • Incremental algorithms are usually essential

  • Combinations of Both
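One possible reading of "assign priorities based on uncertainty of classification or cost of misclassification" is sketched below. The scoring rule (one minus the largest class probability, scaled by a cost factor), the function name `source_priority`, and the example addresses are all assumptions. The slides do not give the actual priority function.

```python
def source_priority(class_probs, misclassification_cost=1.0):
    """Score a data source; a larger value means it should be analyzed sooner.

    class_probs: dict mapping class label -> current estimated probability.
    """
    # A source whose best class is barely ahead of the others is uncertain.
    uncertainty = 1.0 - max(class_probs.values())
    return uncertainty * misclassification_cost


# Hypothetical current classifications for three sources.
sources = {
    "192.0.2.1": {"normal": 0.95, "suspicious": 0.05},
    "192.0.2.2": {"normal": 0.55, "suspicious": 0.45},
    "192.0.2.3": {"normal": 0.70, "suspicious": 0.30},
}
ordered = sorted(sources, key=lambda s: source_priority(sources[s]), reverse=True)
```

Because the ordering depends on the current classification, it changes as new data arrives, which matches the "dynamically" ordered data sources in the list above.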


Construction Methodologies: Examples

  • Feature Priority: Decompose an algorithm into subalgorithms that use subsets of features. Prioritize feature computation.

    • Example: Decision tree using X1, X2, …, Xn

    • Prioritize order of Xi computation based on tree structure

    • Use pruned trees to classify:

      {X1}, {X1,X2}, {X1, X2, X3}, …, {X1, X2, …, Xn}

  • Data Source Priority:

    • Example: Cluster analysis—All features needed

    • Objects with incomplete feature sets get higher priority

    • Objects with more uncertain classifications get higher priority
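The nested-model idea in the feature priority example can be sketched as follows: each model needs one more feature than the previous one, and an object is classified with the largest model whose features have already been computed. The rules, thresholds, and labels are hypothetical placeholders and are not the pruned trees from the presentation.

```python
def level1(f):
    # Hypothetical rule that uses only X1.
    return "suspicious" if f["x1"] > 50 else "normal"


def level2(f):
    # Hypothetical refinement that applies once X2 is also available.
    if f["x1"] > 50 and f["x2"] > 10:
        return "port scan"
    return level1(f)


# Models ordered from the smallest feature set to the largest: {X1}, {X1, X2}.
NESTED_MODELS = [(("x1",), level1), (("x1", "x2"), level2)]


def classify_nested(features, models=NESTED_MODELS):
    """Return (label, level) from the deepest model whose features exist."""
    label, level = None, 0
    for i, (required, model) in enumerate(models, start=1):
        if not all(name in features for name in required):
            break  # later models need even more features
        label, level = model(features), i
    return label, level


early = classify_nested({"x1": 90})            # only the quick feature so far
later = classify_nested({"x1": 90, "x2": 25})  # slower feature now computed
```

An object always has some answer as soon as X1 is known, and the answer is refined rather than recomputed from scratch when slower features arrive.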


Feature Priority Construction: Decision tree example






External Network Connectors: Summary statistics/features

  • Quickly calculated features

    • % Drop

    • % Accept

    • Hits/Sec

    • # Hits

  • More time consuming features

    • # Different Services

    • Different Services/Hit

    • # Different IPs

    • Different IPs/Hit
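A rough sketch of how these per-source features might be computed from firewall log records. The record layout (`src`, `dst`, `service`, `action`, `time`) and the exact definitions are assumptions: for example, "# Different IPs" is taken here to mean distinct destination addresses, and Hits/Sec is taken over the span between a source's first and last hit. The slides do not spell these out.

```python
from collections import defaultdict


def source_features(records):
    """records: iterable of dicts with keys src, dst, service, action, time
    (time in seconds). Returns one feature dict per source IP."""
    by_source = defaultdict(list)
    for r in records:
        by_source[r["src"]].append(r)

    result = {}
    for src, rows in by_source.items():
        hits = len(rows)
        times = [r["time"] for r in rows]
        duration = max(times) - min(times)
        services = {r["service"] for r in rows}
        destinations = {r["dst"] for r in rows}
        result[src] = {
            "hits": hits,
            "pct_drop": 100.0 * sum(r["action"] == "Drop" for r in rows) / hits,
            "pct_accept": 100.0 * sum(r["action"] == "Accept" for r in rows) / hits,
            # Assumption: a source seen at a single instant counts as one second.
            "hits_per_sec": hits / duration if duration > 0 else float(hits),
            "services": len(services),
            "services_per_hit": len(services) / hits,
            "ips": len(destinations),
            "ips_per_hit": len(destinations) / hits,
        }
    return result


# Hypothetical log: one source probing four addresses on the same service.
log = [
    {"src": "198.51.100.7", "dst": f"192.0.2.{i}", "service": "ssh",
     "action": "Drop", "time": t}
    for i, t in zip(range(1, 5), [0, 1, 2, 4])
]
features = source_features(log)["198.51.100.7"]
```

The counts and percentages only need running tallies, while the distinct-service and distinct-IP features need sets of values per source, which is consistent with the slide's split into quick and more time-consuming features.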


Dates: 7/21/02 - 7/27/02

  • Slow Port and IP Scans (N=3): High Services, High Number of IPs, High Number of Hits, Low Hits/Sec, Large Drop %

  • Suspicious (N=4636): Large Drop %, Medium IP/Hit, Low everything else

  • Fast IP Address Scans (N=10): Low Services, High Number of Hits, High IP/Hit, High Number of Hits/Sec, Large Drop %, Mostly Foreign, Represent 40% of External Connections

  • Normal (N=7828): High Accept %

  • Suspicious-Too Early to Tell (N=8055): Large Drop %, High IP/Hit, Few Hits

  • Port Scans (N=36): High Services, Large Drop %


External Network Connectors: Classifications

70%-80% of IPs stay in same group from day to day.
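A day-to-day stability figure of this kind could be computed along the following lines. The slides do not say how the 70%-80% was derived, so restricting the comparison to IPs seen on both days is an assumption, and the function name and sample data are hypothetical.

```python
def fraction_unchanged(day1, day2):
    """day1, day2: dicts mapping source IP -> group label.
    Returns the share of IPs seen on both days that kept the same group."""
    common = day1.keys() & day2.keys()
    if not common:
        return 0.0
    same = sum(day1[ip] == day2[ip] for ip in common)
    return same / len(common)


monday = {"a": "Normal", "b": "Suspicious", "c": "Port Scans", "d": "Normal"}
tuesday = {"a": "Normal", "b": "Normal", "c": "Port Scans", "d": "Normal", "e": "Normal"}
stability = fraction_unchanged(monday, tuesday)
```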


External Network Connectors: Rule-based, feature priority classification algorithm

[Diagram not recoverable; surviving label: Priority]


Precision-Speed Trade-off: Expected results

[Figure: percentage (0 to 100%) plotted against connections per second. Series: correctly classified, same level algorithm; correctly classified, different level algorithm; consistently classified; inconsistently classified]


Precision-Speed Trade-off: Observed results


External Network Connectors: Dynamic, data source priority algorithm

  • Traditional cluster analysis (e.g., K-means) is time consuming on large datasets

  • Incremental clustering algorithm required for reasonable performance

  • Our approach:

    • After first cluster analysis, use centroid locations to seed the next analysis

    • Used the SAS procedure FASTCLUS for proof-of-concept purposes
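The seeding idea can be sketched with a plain Lloyd's k-means that accepts starting centroids. This is an illustration only, not the SAS FASTCLUS procedure used in the proof of concept. The function name `kmeans` and the toy data are assumptions.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means starting from the given centroids.
    Returns (centroids, iterations used)."""
    centroids = [tuple(c) for c in centroids]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            return centroids, iteration
        centroids = new_centroids
    return centroids, max_iter


day1 = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
day2 = day1 + [(1, 1), (11, 11)]  # yesterday's data plus new arrivals

poor_seeds = [(0, 0), (0, 1)]
day1_centroids, _ = kmeans(day1, poor_seeds)

cold_centroids, cold_iters = kmeans(day2, poor_seeds)      # start from scratch
warm_centroids, warm_iters = kmeans(day2, day1_centroids)  # seed with last answer
```

On this toy data both runs reach the same centroids, and the seeded run needs fewer passes. That is not guaranteed in general, but when the data changes little between analyses the previous centroids are usually a good starting point.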


Dates: 8/11/02 - 8/17/02

  • Outlier: n = 1 (0.32% of connections); Extremely high services; China


Dates: 8/11/02 - 8/17/02

  • Cluster 0: n = 5207 (10.11% of connections); High Accept %, Mix Max Hits, Mix IP/Hit

  • Cluster 1: n = 2561 (17.16% of connections); High Drop %, Medium IP/Hit

  • Cluster 2: n = 7 (50.35% of connections); High Drop %, High Num Hits, High Num IPs, High Max Hits/Sec

  • Cluster 3: n = 180 (17.81% of connections); High Services and/or Max Hits/Sec, Mixed

  • Cluster 4: n = 4 (1.42% of connections); High Drop %, High Services, 94.5% of connections from Korea, 1 of 4 IPs from Korea, average 23 sec between hits

  • Cluster 5: n = 5104 (2.82% of connections); High IP/Hit, High Drop %


External Network Connector Classifications: Dashboard report

[Report columns: Drop %, Service/Hit, IPs/Hit, Max Hit/Sec, IPs Scanned, Services Scanned, % of Sources, % Connections]


External Network Connector Classifications: Outlier report

[Report columns: Drop %, Service/Hit, IPs/Hit, Max Hit/Sec, IPs Scanned, Services Scanned]

