Immune system metaphors applied to intrusion detection and related problems
1 / 29

Immune System Metaphors Applied to Intrusion Detection and Related Problems - PowerPoint PPT Presentation

  • Uploaded on

Immune System Metaphors Applied to Intrusion Detection and Related Problems. by Ian Nunn, SCS, Carleton University Overview of Presentation. Review of immune system properties of most interest Algorithm design and the representation of application domains

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Immune System Metaphors Applied to Intrusion Detection and Related Problems' - sean-mejia

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Immune system metaphors applied to intrusion detection and related problems

Immune System Metaphors Applied to Intrusion Detection and Related Problems


Ian Nunn, SCS, Carleton University

Overview of presentation
Overview of Presentation Related Problems

  • Review of immune system properties of most interest

  • Algorithm design and the representation of application domains

  • Examples of two recognition algorithms

  • Overview of application areas

  • Focus on intrusion detection systems (IDS)

  • Advantages of IS models and future research

  • The IS model as a swarm system

Immune system characteristics of interest
Immune System Characteristics of Interest Related Problems

  • The human immune system (IS) is a system of detectors (principally B and T cells) that:

    • After initial negative selection (tolerization), does not recognize elements of the body (self)

    • Is adaptable in that it can recognize over time, any foreign element (non-self) including those never before encountered

    • Remembers previous foreign element encounters

    • Dynamically regenerates its elements

    • Regulates the population size and diversity of its elements

    • Is robust to input signal noise (recognition region) and detector loss

    • Is distributed in nature with no central or hierarchical control

    • Is error tolerant in that self recognition does not halt the system

    • Is self-protecting since it is part of self

Representation of self non self

  • Shape space model: a parameterized representation (genotype) of the conformational form of self/non-self elements (phenotype)

Representation of Self/Non-Self

  • IS elements involved are cellular proteins and their peptide sequences

Is application algorithm design
IS Application Algorithm Design called

  • Requires a deep understanding of the problem domain

  • Self/non-self discrimination the fundamental IS principle

  • Steps in designing an IS algorithm:

    • Identification of features allowing correct and complete self/non-self discrimination*

    • Representation or encoding of features, particularly of continuous real-valued parameters*. Ab and Ag feature strings of same length facilitate algorithm performance analysis

    • Determination of a matching or fitness function. Important for evolution of Ab populations (affinity maturation)

    • Selection of IS principles to apply, e.g. negative selection, costimulation, affinity maturation, etc.

* This is hard stuff and an important step in applying any modeling technique whether genetic algorithms or swarm simulations (recall for army ants the problem of deciding what parameters to assign to the ants and to the environment and what values to allow).

Approach to feature selection and representation
Approach to Feature Selection and Representation called

  • Antibodies and antigens represented by strings of features:

    • The set of actual values observed such as sensor readings, voltages, ASCII text is called the application’s phenotype

    • The coded representation is called the application’s genotype

  • A feature is encoded by symbols from a finite alphabet

  • Some application feature domains:

    • Binary variables: digital signals in computer systems

    • Discrete real variables: ASCII character text

    • Continuous real variables: real world sensors

  • Continuous domains must be mapped onto discrete domains since we work with finite alphabets to ensure finite Ab/Ag population spaces

Phenotype representation change detection problem domains
Phenotype Representation: Change Detection Problem Domains called

  • OS (UNIX) processes: sequences of top level system calls

  • Program execution: alphabet symbols represent op codes

  • File system: reduction to ASCII or binary strings

  • User behavior and interface use: keystrokes, mouse clicks

  • Time series data representation of a physical (plant) processes: x/y position of a milling machine tool

  • Memory accesses: memory address calls

  • Local network traffic: TCP/IP packets: addresses, ports

  • Network traffic through routers and gateways: TCP/IP packets, addresses, volume

Is phenotype encoding and matching using a binary model 1

IS Phenotype Encoding and Matching Using a Binary Model1

  • Genotype = Phenotype : 32 bit string on a binary alphabet

Example use of a binary model with a ga for clonal selection 1
Example: Use of a Binary Model with a GA for Clonal Selection1

  • Start with randomly generated Ag and possibly incomplete Ab populations

  • For each Ab in turn, compute its average match (fitness) with a random fixed-size Ag subpopulation

  • Use a standard GA with mutation but no crossover to evolve successively better generations of antibodies

  • Niches observed to develop in coverage space for genetic commonalities (bacterial polysaccaride coating) if the initial populations have a bias

  • Self recognition minimized (without negative selection) by selecting for more Ag specific instead of more general antibodies – less likely to match self

Establishing antibody fitness
Establishing Antibody Fitness Selection

Random sub-population

Ga evolution of antibodies 1
GA Evolution of Antibodies Selection1



Use of a negative selection algorithm for clonal selection 5 2
Use of a Negative Selection Algorithm for Clonal Selection Selection5,2

  • Want explicit self-filtering (tolerization)

  • Algorithm:

    • Generate the set S of self (sub)strings

    • Generate a set R0of random strings

    • Match each string from R0 against S :

      • Match (non-complementary) on at least r contiguous locations: reject

      • No match: add string to detector set R

  • How to generate detectors efficiently an issue

  • Match detector set against target strings to detect intruder

  • Strings can be on any alphabet

Negative selection algorithm 2

Self string to be protected: 1011 0111 0011 0000, length of contiguous match substring r = 2

Negative Selection Algorithm2

Match Ab1: 10xx

Match Ab4: xx00

The problem of holes 6

  • Let contiguous match substring r = 2s1 and s2 be two antibodies matching over r-1 contiguous bits and h1 and h2 be two antigens

The Problem of Holes6

  • For a particular choice of matching rule and Ab repertoire, some non-self strings may not be found causing a hole in the coverage space

  • A detector that matches any r contiguous bits in h1 will also match either s1 or s2 for the same feature string. The same for h2. So h1 and h2 are undetectable.

Major application categories of immune system theory
Major Application Categories of Immune System Theory contiguous match substring r = 2

  • Machine learning and pattern recognition: limited but promising work done to date

  • Associative memories: limited work done to date

  • Elimination of identified elements:

    • IS model: use the B cell and Tk cell “kill” disable viruses

    • Use a phagocyte analogue for cleanup and garbage collection

    • IBM virus lab and Forrest’s group at UNM have looked at this

  • Recovery, repair and augmentation of identified elements :

    • IS model: use the B cell and Tk cell analogue to deliver a positive payload to an agent

    • Very little work done to date

Application areas cont
Application Areas (cont.) contiguous match substring r = 2

  • Detection problems – where most of the work has occurred:

    • Fault: failure of a self element (industrial plant systems)

    • Change: any change in self (tumors)

    • Anomaly: unusual presentation of a self element

    • Virus: presence of a non-self element

    • Intrusion: attempt to gain access by non-self element

    • Many of the classical issues of computer and network security involve some element of detection or self/non-self discrimination

The intrusion detection problem
The Intrusion Detection Problem contiguous match substring r = 2

  • Two classical types of intrusion detection systems (IDS):

    • Host-based: domain is a single machine possibly on a network

    • Network-based: domain is a network of hosts

  • Two classes of problem:

    • Anomaly detection: deviations from normal local resource use and network traffic

    • Misuse detection: usage identified with known system vulnerabilities and security policies

Essential requirements of a network based ids
Essential Requirements of a Network-based IDS contiguous match substring r = 2

  • Robustness to host failures and noisy signals (anomalous behavior)

  • Easy (self-)configurability of hosts

  • Easy extendibility to new hosts

  • Scalability: extendible to large networks without degradation of performance

  • Adaptability: dynamically able to recognize new anomalies

  • Efficiency: simple and low overhead operation

  • Global analysis: able to correlate local events to form global patterns

Network representation
Network Representation contiguous match substring r = 2

  • Commonly represent problem as the connection events (not message content) between computers

  • Kim and Bentley4:

    • Phenotype: 35 real-valued fields in four categories - connection identifier, port vulnerabilities, TCP handshaking, traffic intensity

    • Genotype: 35 genes. A detector gene has three “nucleotides” (cluster number in (0, 9), min offset, max offset). An antibody or antigen has a single real value.

    • Cluster and offset tables established for each host at start

    • A matching function maps an Ag or Ab value to a cluster and takes the distance to the nearest cluster bound as the measure of similarity

    • Use positive detection events to evolve the offsets for clusters

Kim and bentley model 4
Kim and Bentley Model contiguous match substring r = 24

New lower interval bound for cluster 2

New upper interval bound for cluster 2

Network representation cont
Network Representation (cont.) contiguous match substring r = 2

  • Hofmeyr and Forrest3:

    • Phenotype: 3 integer fields (source IP address, destination IP address, service or port number)

    • Genotype: for a detector, 49 bit binary string + state

Algorithmic refinements 3
Algorithmic Refinements contiguous match substring r = 23

  • Detectors may have a lifetime at the end of which they are replaced if they have not matched – maintain diversity

  • Activation threshold and time decay on activation level to deter limited autoimmune reaction to rare self strings

  • Local activation causes a message to be sent to other hosts decreasing their activation levels (cytokine costimulation)

  • Matching rule may result in holes in coverage. A randomly assigned permutation mask to control packet presentation helps avoid this (MHC molecule host diversity) contributing to population diversity.

  • Each host has a unique detector set contributing to diversity and self-protection across a population of hosts

The hofmeyr and forrest model 3
The Hofmeyr and Forrest Model contiguous match substring r = 23

Antigen pheotype

Host-based refinement fields

Detector state

Problem posed by computer applications
Problem Posed by Computer Applications contiguous match substring r = 2

  • The repertoire of human self proteins is fixed over a lifetime

  • In networks, valid hosts are added and deleted without notice so what “self” is constantly changes

  • Among a fixed set of hosts, valid usage patterns may change without notice

  • One solution: costimulation by a trusted (human) authority both at start and subsequent operation

  • Much work needs to be done

Advantages of is models
Advantages of IS Models contiguous match substring r = 2

  • Adaptability through the ability to recognize foreign patterns never before encountered

  • Distributed detection contributes to:

    • Diversity (shape space coverage)

    • Robustness (failure of individual hosts)

    • Scalability and extendibility

  • Quick response to new variants of old attacks

  • Ability to reproduce detectors of increasing fitness while self-regulating the overall population

Is as a swarm system
IS as a Swarm System contiguous match substring r = 2

  • The IS model has a number of characteristics in common with swarm systems:

    • Large populations of independent agents of characterizable classes

    • Each agent has at most a very few characteristic simple behaviors:

      • Bind with another appropriate agent and activate (B and T cells)

      • Kill something (killer T cell)

      • Clone myself (B cell)

      • Secrete a signaling chemical or an antibody (T and B cells)

      • Live for a long time (memory B and T cells)

    • Simple interactions with the environment:

      • Special things that happen in lymphoid organs

      • Secreting signal chemicals which alter environmental properties (cytokines and inflammation)

    • Self-organizing as an emergent property

    • No centralized control over the system

Areas for additional research
Areas for Additional Research contiguous match substring r = 2

  • Matching rules with good computational properties, perhaps application specific ones

  • Self/non-self representation and encoding

  • Algorithms for generating detector sets

  • Other selection algorithms

  • Incorporation of additional IS characteristics

  • Detector set populations: evolution, dynamics and emergent properties at the species level

References contiguous match substring r = 2

  • Forrest, Smith, Javornik and Perelson. Using Genetic Algorithms to Explore Pattern Recognition in the Immune System. Evolutionary Computation, 1(3):191-211, 1993.

  • Forrest, Allen, Perelson and Cherukuri. Self-Nonself Discrimination in a Computer. Proceeding of the 1994 IEEE Symposium on Research in Security and Privacy, Los Alamitos CA, 1994.

  • Hofmeyr and Forrest. Immunity by Design: An Artificial Immune System. In Proceedings of 1999 GECCO Conference, 1999.

  • Kim and Bentley. Negative selection and niching by an artificial immune system for network intrusion detection. In Late Breaking Papers at the 1999 Genetic and Evolutionary Computation Conference, Orlando, Florida, 1999.

References cont
References (cont.) contiguous match substring r = 2

  • Forrest, Allen, Perelson and Cherukuri. A Change-Detection Algorithm Inspired by the Immune System. Submitted to IEEE Transactions on Software Engineering, 1995.

  • D'haeseleer, Forrest and Helman. An Immunological Approach to Change Detection: Algorithms, Analysis, and Implications. Proceeding of the 1994 IEEE Symposium on Research in Security and Privacy, 1996.