Semi-Automated Discovery of Application Session Structure. Jayanthkumar Kannan (Berkeley), Jaeyeon Jung (Mazu Networks), Vern Paxson (Berkeley), Can Emre Koksal (EPFL). ACM Internet Measurement Conference 2006.


Semi-Automated Discovery of Application Session Structure

Jayanthkumar Kannan (Berkeley), Jaeyeon Jung (Mazu Networks), Vern Paxson (Berkeley), Can Emre Koksal (EPFL)

ACM Internet Measurement Conference 2006

Outline
  • Introduction
  • Background
  • Session Extraction
  • Structure Abstraction
  • Results
  • Conclusion & Comments

Speaker: Li-Ming Chen

Network traffic analysis
  • Previous works have extensively examined network behavior at the level of packets and connections.
    • Dynamics, self-similarity
    • Packet delays and losses
    • Connection characteristics at different sites
    • Transport behavior, structural analysis
    • Applications: traffic engineering, capacity planning, anomaly detection
  • What about session level analysis?

Understanding traffic at a higher level - Session level analysis
  • Comparatively, the structure of user-initiated sessions remains much less explored
    • Sessions here means application sessions
    • A session is defined as a group of connections associated with a single network task (e.g., the response to a user event)
  • What could be considered as application sessions?
    • Applications have pre-specified forms (e.g., FTP sessions)
    • More types of sessions:
      • User behavior (e.g., Web surfing, sending e-mail)
      • Anomalies, mis-configuration
      • Malicious activities (e.g., Botnet)

Results/Examples
  • FTP session
  • Or imagine:
    • Logging into a website and listening to music online
    • A botnet zombie receiving instructions from its master and acting on them

Benefits of session level analysis
  • For the researchers
    • Aid with traffic characterization and monitoring
    • Provide a foundation for forming source models
      • Descriptions of network activity in terms of what a source is attempting to achieve using the network
    • Aid with anomaly detection
  • For the administrators
    • Track application use in their network at a higher level
    • Provide richer information for framing network policies
    • Anomaly detection

Problem & Goals
  • Mine a connection-level trace
  • Derive session descriptors (abstract descriptions of the session structure) for the different applications present in the trace
    • Without any a priori knowledge about the application
  • Deduce descriptors to provide qualitative structure for the analysts
  • Express these descriptors as
    • Regular expressions
    • Deterministic finite automata (DFAs)
      • The expressions focus on the order, type, and directionality of the connections, but not on their inter-arrival timing
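As a rough illustration of what a regular-expression descriptor could look like, the sketch below encodes a session as a space-separated sequence of connection types and matches it against a pattern. The token encoding (`proto_dir`), the helper names, and the FTP pattern itself are assumptions made for this example; they are not taken from the paper.

```python
import re

def encode(session_type):
    """Turn a sequence of (proto, dir) pairs into a token string (hypothetical encoding)."""
    return " ".join(f"{proto}_{direction}" for proto, direction in session_type)

# Illustrative descriptor (an assumption): one outgoing FTP control
# connection followed by zero or more data connections in either direction.
FTP_DESCRIPTOR = re.compile(r"ftp_out( data_(in|out))*")

def matches(descriptor, session_type):
    """True if the whole session type is accepted by the descriptor."""
    return descriptor.fullmatch(encode(session_type)) is not None
```

Note that only the order, type, and direction of the connections enter the match; timing plays no role, in line with the bullet above.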

Approach (the concept)

[Figure: Connection-level traffic trace → Session Extraction → Structure Abstraction → Session Descriptors]

  • Session Extraction
    • Reduces a stream of connections down to a stream of sessions
    • (Observation) Connections belonging to the same session tend to occur “close” to one another
    • Models the temporal characteristics of session arrivals
  • Structure Abstraction
    • Attempts to infer succinct session descriptors for each application
    • Simplifies the raw descriptions to a generalized form
    • Provides complexity-coverage curves to represent the trade-off between economy of expression and more detailed fidelity

Outline
  • Introduction
  • Background
  • Session Extraction
  • Structure Abstraction
  • Results
  • Conclusion & Comments

Dataset
  • Connection-level traces collected at the border of LBNL
    • 1 month trace, about 2700K connections per day
  • 1st half – used to develop and calibrate the model
  • 2nd half – apply the model to infer descriptors for about 40 different applications, including:
    • Content-transfer (SMTP, FTP, HTTP)
    • Remote access (SSH, Telnet)
    • Database (OracleSQL, MySQL)
    • P2P (BitTorrent)
    • Mapping, authentication, remote desktop, etc.
  • How to evaluate? Compare against the protocol specification or inspect manually

Terminology
  • Connection C:
    • Denoted by (proto, dir, remote-host, local-host, start-time, duration)
      • proto: destination port X
      • dir: incoming or outgoing connection
  • Type of a connection T(C):
    • Defined as (proto, dir)
  • Session S = (C1, C2, …, Cn)
    • A sequence of connections that involve only a single local-host and a single remote-host
  • Application A(S):
    • Associated with a session S as T(C1), the type of its first connection
  • A session S belongs to the session type ST(S) = (T1, T2, …, Tn)
    • For all i ≤ n, Ti = T(Ci)
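The definitions above map naturally onto a small data model. The following is a minimal sketch; the class and function names are chosen for this example only, and representing `proto` as a string label rather than a port number is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

ConnType = Tuple[str, str]  # T(C) = (proto, dir)

@dataclass(frozen=True)
class Connection:
    proto: str         # application protocol label (derived from the destination port)
    direction: str     # "in" or "out"
    remote_host: str
    local_host: str
    start_time: float  # seconds
    duration: float    # seconds

def conn_type(c: Connection) -> ConnType:
    """T(C): the type of a connection."""
    return (c.proto, c.direction)

def session_type(session: List[Connection]) -> Tuple[ConnType, ...]:
    """ST(S) = (T(C1), ..., T(Cn))."""
    return tuple(conn_type(c) for c in session)

def application(session: List[Connection]) -> ConnType:
    """A(S): the type of the first connection in the session."""
    return conn_type(session[0])
```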

Types of Sessions
  • Singleton
    • A lone connection by itself
  • Homogeneous sessions
    • Sessions consisting of consecutive invocations of the same application protocol and all with the same directionality
      • -> same connection type !
  • Mixed sessions
    • Sessions involving different connection types
  • Sessions involving multiple remote hosts are left as future work
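The three categories can be told apart from the session type alone. A minimal sketch, assuming a session type is given as a non-empty sequence of connection types (the function name is hypothetical):

```python
def classify(session_type):
    """Label a session type as singleton, homogeneous, or mixed."""
    if not session_type:
        raise ValueError("a session contains at least one connection")
    if len(session_type) == 1:
        return "singleton"
    if len(set(session_type)) == 1:
        # Same protocol and same direction throughout.
        return "homogeneous"
    return "mixed"
```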

Applications vs. Types of Sessions
  • Applications vary widely in how prevalent each of these session types is
  • E.g.,
    • LDAP (mapping): 11% singleton, 88% homogeneous
    • SSH (remote access): 80% singleton, 18% homogeneous
    • GridFTP (content-transfer): 58% singleton, 42% mixed
    • About half of the 40 applications involve more complex structure

Outline
  • Introduction
  • Background
  • Session Extraction
  • Structure Abstraction
  • Results
  • Conclusion & Comments

Session extraction
  • Problem:
    • Given a stream of connections,
    • Parse and reduce it into sessions (a stream of application-level sessions)
  • When observing a new connection Ci, the algorithm must decide:
    • (a) Is Ci part of a current session?
    • (b) Does Ci represent the beginning of a new session?
  • Observation/Assumption:
    • The connections in a session are causally related
    • Such connections tend to occur “close” to each other

1. Extracting homogeneous sessions (the aggregation rule)
  • Consider connections less than a time Taggreg apart as part of the same session [24]
  • For Ci and an existing active session Sj
    • Sj = (C1j, …, Cnj) and A(Sj) ≡ T(C1j) = T(Ci)
    • If Cnj arrived less than Taggreg before Ci’s arrival, then we consider Ci part of Sj
  • What about connections that involve a different proto, or that are somewhat further apart?

[Figure: timeline of session Sj = (C1j, C2j, …, Cnj) followed by Ci, with the window Taggreg marked]

[24] C. Nuzman, I. Saniee, W. Sweldens, and A. Weiss, “A compound model for TCP connection arrivals for LAN and WAN applications,” Computer Networks, 2002.
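The aggregation rule can be sketched in a few lines. This is only an illustration under stated assumptions: connections are plain tuples sorted by start time, sessions are keyed by (local host, remote host, proto, dir), and the default window of 100 seconds is the Taggreg value listed on the parameters slide.

```python
def aggregate(connections, t_aggreg=100.0):
    """Group same-type connections between the same host pair into sessions.

    connections: iterable of (start_time, local, remote, proto, direction),
    assumed sorted by start_time. Returns a list of sessions (lists of tuples).
    """
    latest = {}    # key -> most recent session for that host pair and type
    sessions = []
    for conn in connections:
        start, local, remote, proto, direction = conn
        key = (local, remote, proto, direction)
        current = latest.get(key)
        # Join if the last connection of the session arrived < t_aggreg ago.
        if current is not None and start - current[-1][0] < t_aggreg:
            current.append(conn)
        else:
            current = [conn]
            sessions.append(current)
            latest[key] = current
    return sessions
```

Connections of a different type never join an existing session here, which is exactly the gap that the mixed-session test is meant to close.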

2. Extracting mixed sessions
  • Attempt to assess possible causality
    • For Ci and an existing active session Sk
    • Sk = (C1k, …, Cmk) and A(Sk) ≡ T(C1k) ≠ T(Ci) (the types are not exactly the same)
    • Is Ci a “triggered” connection of C1k?
    • Based on the observation that if Ci is causally related to Sk, its arrival is likely to be “closer” to Sk than if Ci were a normal (unrelated) connection
  • (Approach) Devise a statistical test:
    • Identifies pairs of causally linked connections
    • Builds a base model of what is “normal”, and flags deviations
      • Using a null hypothesis test

2. Extracting mixed sessions (causality detection algorithm)
  • On the arrival of a connection C of type T involving a local-host L
    • Let the sessions observed at L in the previous Ttrigger (500) seconds be S1, S2, …, Sn
    • First check whether C can simply be aggregated into the most recent homogeneous session Si
    • Estimate the rate of connection arrivals at L for each session type within the past Trate (3600) seconds
    • For 1 ≤ i ≤ n, compute P[Ti, T, xi], where xi is the interval between the arrival of Si and C
    • If P[Ti, T, xi] < α and C and Si involve the same remote-host, then add C to Si
      • Otherwise, C is considered to be the first connection of a new session Sn+1
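The decision step of the algorithm can be sketched as follows. Several details are assumptions made for illustration: sessions and connections are plain dictionaries, the probability P is supplied by the caller as a function `prob`, the homogeneous aggregation step is omitted, and when several sessions pass the test the one with the lowest probability is chosen (the slides do not say how ties are resolved).

```python
T_TRIGGER = 500.0  # seconds, as on the slide
ALPHA = 0.1        # significance threshold, as on the parameters slide

def assign(conn, sessions, prob, t_trigger=T_TRIGGER, alpha=ALPHA):
    """Attach conn to a causally related session or start a new one.

    conn: dict with keys "type", "remote", "start".
    sessions: list of dicts with keys "type", "remote", "start", "conns",
              all observed at the same local host.
    prob: callable (session_type, conn_type, x) -> probability of a
          coincidental arrival within x seconds.
    """
    best, best_p = None, alpha
    for s in sessions:
        x = conn["start"] - s["start"]
        if x < 0 or x > t_trigger or s["remote"] != conn["remote"]:
            continue
        p = prob(s["type"], conn["type"], x)
        if p < best_p:  # strictly below alpha, and lowest seen so far
            best, best_p = s, p
    if best is None:
        best = {"type": conn["type"], "remote": conn["remote"],
                "start": conn["start"], "conns": []}
        sessions.append(best)
    best["conns"].append(conn)
    return best
```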

2. Extracting mixed sessions (causality detection algorithm) (cont’d)
  • (Empirically known fact) The arrival process is often roughly stationary Poisson over hourly periods
  • Identify connections whose arrivals deviate from this model as triggered connections
    • Arrival process of unrelated (normal) connections = union of independent Poisson processes
    • Very close coincidental arrivals are very rare
    • Therefore, arrivals that are close are likely related, i.e., part of the same session
  • P[T1, T2, x] is the probability that two sessions of types T1 and T2 have an arrival within time x of each other
  • If P[T1, T2, x] < α, declare C1 and C2 to be in the same session

[Figure: timeline with an FTP connection C1 (rate λ1) followed, after inter-arrival time x, by an HTTP connection C2 (rate λ2); x might be longer than Taggreg]
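The slides do not spell out how P is computed. One simple way to get a feel for the test, assuming unrelated connections of the second type arrive as a Poisson process with rate λ2, is to use the probability of at least one such arrival within x seconds, 1 − exp(−λ2·x). The sketch below uses that formula; both the formula and the function names are assumptions for illustration and may differ from the expression used in the paper.

```python
import math

def estimate_rate(arrival_times, now, t_rate=3600.0):
    """Arrivals per second over the past t_rate seconds (Trate on the slide)."""
    recent = [t for t in arrival_times if now - t_rate <= t <= now]
    return len(recent) / t_rate

def coincidence_probability(rate, x):
    """P(at least one Poisson arrival within x seconds) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)
```

With 36 arrivals in the past hour (0.01 per second), a gap of 5 seconds gives a probability of about 0.049, below α = 0.1, so the pair would be treated as related; a gap of 100 seconds gives about 0.63, so it would not.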

Outline
  • Introduction
  • Background
  • Session Extraction
  • Structure Abstraction
  • Results
  • Conclusion & Comments

Structure abstraction
  • Derive succinct descriptions of application sessions based on the set of session types (ST) reported by Session Extraction
  • Use regular expressions & DFAs to represent an application session
    • Good balance between expressiveness and ease of generation
    • Further refine this representation by labeling state transitions with probabilities
      • Helps avoid false positives

Exact DFA vs. “Natural” DFA
  • Exact FTP DFA
    • (Naïve approach) Simply build a DFA that matches the list of all the observed sessions
    • More complex, because it has to completely capture several FTP sessions
  • “Natural” FTP DFA
    • A more tractable DFA for FTP
    • Benefits: simplicity, generalization, highlighting common behavior, minimizing false positives

Structure Abstraction Framework

[Figure: Connection-level traffic trace → Session Extraction → Structure Abstraction (4 steps) → Session Descriptors]

  • Semi-automatic: lack of the ground truth
  • Categorize sessions based on the server port of the 1st connection
  • Construct the exact DFA E from the union of the observed session types (ST)
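One straightforward way to build an automaton that accepts exactly the observed session types is a prefix tree, sketched below. Using a prefix tree (rather than, say, a minimized DFA) is an assumption for illustration, as are the function names and the string symbols such as `ftp_out`.

```python
def build_exact_dfa(session_types):
    """Build a prefix-tree DFA accepting exactly the given session types.

    Returns (transitions, accepting): transitions maps (state, symbol) to
    the next state; state 0 is the start state.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for st in session_types:
        state = 0
        for symbol in st:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting

def accepts(transitions, accepting, st):
    """Run the DFA on a session type."""
    state = 0
    for symbol in st:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting
```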

Step 3: Coverage Phase
  • Given the exact DFA E
  • Aim to extract a set of DFAs that capture subsets of the observed session behavior
    • Best trade-off between simplicity of expression (fewest states/edges) and coverage (capturing most types of behavior)
  • A greedy algorithm: DFA E -> DFAs F1, F2, …, Fn
    • Feed every session instance in ST to E
    • Compute the hit count h(e) for every edge e
    • Next, compute the augmented hit count h’(e) = Σ h(e’)
      • summed over all edges e’ reachable from e
    • Order edges by decreasing h’(e), denoted e1, e2, …
    • Construct DFA Fi by taking the union of the edges e1, …, ei
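The greedy ordering can be sketched on a prefix-tree view of E, where each edge is identified by the prefix it completes. Two points are assumptions made for this example: the set of edges “reachable from e” is taken to include e itself, and ties are broken by shorter prefix first.

```python
from collections import Counter

def coverage_order(instances):
    """Order prefix-tree edges by decreasing augmented hit count h'(e).

    instances: list of session types (tuples of symbols), one per observed
    session, so repeated types appear repeatedly.
    """
    hits = Counter()
    for st in instances:
        for i in range(1, len(st) + 1):
            hits[st[:i]] += 1  # h(e): sessions that traverse edge e
    # h'(e): sum of h over e and every edge below it in the tree.
    augmented = {
        e: sum(h for e2, h in hits.items() if e2[:len(e)] == e)
        for e in hits
    }
    return sorted(hits, key=lambda e: (-augmented[e], len(e), e))

def prefix_dfas(order):
    """F_i is the union of the first i edges in the ordering."""
    return [set(order[:i]) for i in range(1, len(order) + 1)]
```

Because an edge always scores at least as high as the edges below it, each Fi in this sketch is a connected sub-tree rooted at the start state.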

Step 4: Generalization Phase
  • Generalize F1, F2, … by transforming them into a set of generalized DFAs G1, G2, …
  • 3 workable generalization rules:
    • Prefix Rule: STi in trace -> all prefixes of STi
    • Counting Rule: (aBc) & (aB^n c) in trace -> (aB+c)
    • Invert Direction Rule: STi in trace -> invert(STi)

[Figure: example DFA with edges labeled ftp_in, ftp_out, data_in, and data_out]
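The three rules can be sketched on session types written as tuples of symbols such as `ftp_out`. This is a simplified illustration: the symbol naming, the function names, and the restriction of the counting rule to a repeated single symbol (|B| = 1) are assumptions, and the result of the counting rule is written with a trailing `+` on the repeated symbol.

```python
def prefix_rule(st):
    """All non-empty prefixes of a session type."""
    return [st[:i] for i in range(1, len(st) + 1)]

def invert_direction(st):
    """Swap the direction suffix (_in <-> _out) of every symbol."""
    def flip(symbol):
        proto, direction = symbol.rsplit("_", 1)
        return proto + "_" + ("out" if direction == "in" else "in")
    return tuple(flip(s) for s in st)

def counting_rule(observed):
    """If both aBc and aB^n c (n >= 2) were observed, emit a B+ c."""
    observed = set(observed)
    generalized = set()
    for st in observed:
        i = 0
        while i < len(st):
            j = i
            while j + 1 < len(st) and st[j + 1] == st[i]:
                j += 1
            if j > i:  # st[i..j] is a run of the same symbol
                base = st[:i + 1] + st[j + 1:]
                if base in observed:
                    generalized.add(st[:i] + (st[i] + "+",) + st[j + 1:])
            i = j + 1
    return generalized
```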


Refer to author’s slides

Outline
  • Introduction
  • Background
  • Session Extraction
  • Structure Abstraction
  • Results
  • Conclusion & Comments
  • Parameters:
    • Taggreg = 100 sec
    • Ttrigger = 500 sec
    • Trate = 1 hr
    • Threshold α = 0.1
    • Tservice ≥ 5
    • Counting rule |B| = 2
    • Only feed session types of length ≤ 10

FTP session structures (content transfer)
  • Coverage of Gi: the fraction of session types in ST accepted by Gi, weighted by the frequency with which each type occurs
  • Gi may have more or fewer than i edges
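The frequency-weighted coverage described above can be computed as in the sketch below. The function name and the way the DFA is passed in (as a predicate `accepts`) are assumptions for illustration.

```python
from collections import Counter

def weighted_coverage(instances, accepts):
    """Fraction of observed sessions whose type is accepted.

    instances: list of session types, one entry per observed session, so
    frequent types count more. accepts: callable session_type -> bool.
    """
    counts = Counter(instances)
    total = sum(counts.values())
    covered = sum(n for st, n in counts.items() if accepts(st))
    return covered / total
```

For example, if 9 of 10 observed sessions are singletons and the DFA accepts only singletons, its coverage is 0.9 even though it accepts just one of the two distinct session types.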

[Figure: FTP coverage curve and DFA with 4 edges; annotations: “2: singleton” and “4”]

FTP session structure (cont’d)
  • DFA: 8 edges. Single data transfer in the opposite direction
  • DFA: 8 edges, but fewer actual edges
  • DFA: 10 edges. HTTP connections can occur during FTP sessions
  • DFA: 18 edges. Coverage: 99%

Timbuktu session structures (remote desktop)
  • 2: Singleton > 90%
  • Others < 10%

[Figure: Timbuktu DFA with 4 edges]

Timbuktu session structures (cont’d)

DFA: 10 edges

HTTP session structure (content transfer)
  • DFA: 30 edges (to save space, only sessions that begin with an outgoing HTTP connection are shown)
  • More complex; ~99% are singleton or aggregated sessions that reflect successive retrieval of multiple pages from the same server

Finding Attacks Using Anomaly Detection
  • One goal is to detect network attacks by finding sessions that deviate from established session structures.
    • Such deviations would reflect either unintended mis-configurations, scanning, or “phone home” connections associated with compromises.
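A deviation check of this kind could look like the sketch below, which flags sessions whose type is not accepted by the descriptor of their application. The regex-based descriptors, the space-separated token encoding, the example pattern, and the choice to flag applications without any descriptor are all assumptions made for illustration.

```python
import re

def find_anomalies(sessions, descriptors):
    """Return the session types that no established descriptor accepts.

    sessions: iterable of session types (tuples of symbols like "ftp_out").
    descriptors: dict mapping the first symbol (the application) to a
    compiled regex over the space-joined symbols.
    """
    anomalies = []
    for st in sessions:
        descriptor = descriptors.get(st[0])
        if descriptor is None or descriptor.fullmatch(" ".join(st)) is None:
            anomalies.append(st)
    return anomalies

# Hypothetical descriptor table with a single illustrative entry.
DESCRIPTORS = {"ftp_out": re.compile(r"ftp_out( data_(in|out))*")}
```

An FTP control connection followed by an unexpected outgoing IRC connection would be reported here, which is the flavor of “phone home” deviation mentioned above.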

Outline
  • Introduction
  • Background
  • Session Extraction
  • Structure Abstraction
  • Results
  • Conclusion & Comments

Conclusion
  • Session extraction
    • A statistical technique to extract application sessions from a connection-level trace of network activity
  • Structure abstraction
    • A method to deduce descriptors that can be used by an analyst to capture the qualitative structure of such sessions.
  • The results show that the proposed method works well over many of the applications in the trace
  • Future work:
    • Evaluate/validate the proposed method over more applications
    • Extend the method to support single-to-multiple host sessions
    • Try to collate descriptors for closely-related protocols

Comments
  • This method statistically correlates connections by observing connection-level traffic traces
    • It might not be suitable for a complex environment
    • What if packet-level traces can be acquired?
  • Surprisingly, a particular application can manifest various session structures
  • The session structures in this paper can help to detect host-based anomalies
  • Single-to-multiple host sessions might be more helpful for observing and identifying worm-like activity
