Real-valued negative selection algorithms

Zhou Ji

11-2-2005


Outline

  • Background

  • Variations of real-valued negative selection algorithms

  • More details through an example: V-detector

  • Demonstration


Background: AIS

  • AIS (Artificial Immune Systems): a field with only about 10 years of history

    • Negative selection (development of T cells)

    • Immune network theory (how B cells and antibodies interact with each other)

    • Clonal selection (how a pool of B cells, especially memory cells, is developed)

    • New inspirations from immunology: danger theory, germinal center, etc.

  • Negative selection algorithms

    • The earliest and most widely used AIS approach.

background


Biological metaphor of negative selection

How T cells mature in the thymus:

  • The cells are diversified.

  • Those that recognize self are eliminated.

  • The rest are used to recognize nonself.


The idea of negative selection algorithms (NSA)

The concept of feature space and detectors

  • The problem to deal with: anomaly detection (or one-class classification)

  • Detector set

    • random generation: maintains diversity

    • censoring: eliminates candidates that match self samples


Outline of a typical NSA

Figure: the two phases of a typical NSA, (1) generation of the detector set and (2) anomaly detection (classification of incoming data items).
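The two phases can be sketched in a few lines of Python. This is a generic constant-size scheme, assuming hyperspherical detectors in the unit hypercube and Euclidean matching, chosen only for illustration; all names (`generate_detectors`, `is_anomaly`, `threshold`) are hypothetical and not taken from any cited implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length real-valued vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generate_detectors(self_samples, num_detectors, threshold, dim, rng, max_tries=100_000):
    """Generation phase: random candidates in [0, 1]^dim, then censoring.

    A candidate survives censoring only if it does not match any self
    sample, i.e. its distance to every self sample is at least `threshold`.
    """
    detectors = []
    tries = 0
    while len(detectors) < num_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        if all(euclidean(candidate, s) >= threshold for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_anomaly(x, detectors, threshold):
    """Detection phase: an item is classified as nonself if any detector matches it."""
    return any(euclidean(x, d) < threshold for d in detectors)


if __name__ == "__main__":
    rng = random.Random(42)
    self_samples = [[0.5 + 0.05 * rng.uniform(-1, 1), 0.5 + 0.05 * rng.uniform(-1, 1)]
                    for _ in range(20)]
    detectors = generate_detectors(self_samples, num_detectors=50, threshold=0.1, dim=2, rng=rng)
    print(len(detectors), is_anomaly([0.95, 0.05], detectors, 0.1))
```

Because the same threshold is used for censoring and for detection, no detector can ever match a training self sample.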


Family of NSA

Types of work on NSA

  • Applications: solving real-world problems by using a typical version or adapting it to specific applications

  • Improving NSA with new detector schemes and generation methods, and analyzing existing methods. These works are specific to a data representation, mostly binary.

  • Establishing a framework for binary representation that includes various matching rules; discussing the uniqueness and usefulness of NSA; introducing new concepts.

    What defines a negative selection algorithm?

  • Representation in negative space

  • One-class learning

  • Usage of detector set


Data representation in NSA

  • Different representations imply different search spaces

  • Various representations:

    • Binary

    • String over finite alphabet: no fundamental difference from binary

    • Real-valued vector

    • hybrid

  • Different distance measures

    • Data representation is not the only factor to make a scheme different


Real-valued NSA

  • Why is real-valued NSA different from binary NSA?

    • Hard to analyze: simple combinatorics does not work

    • Necessary and proper for many real applications: binary representation may decouple the relation between feature space and representation

  • Is categorization based on data representation a good way to understand and develop NSA?


Major issues in NSA

  • Number of detectors

    • Affects the efficiency of generation and detection

  • Detector coverage

    • Affects detection accuracy

  • Generation mechanisms

    • Affects the efficiency of generation and the quality of the resulting detectors

  • Matching rules – generalization

    • How to interpret the training data

    • depending on the feature space and representation scheme

  • Issues that are not NSA specific

    • Difficulty of one-class classification

    • Curse of dimensionality


Variations of real-valued NSA

  • Rectangular detectors generated with GA

  • Circular detectors that move and change size

  • MILA (multilevel immune learning algorithm)


Rectangular detectors + GA

  • Rectangular detectors: “rules” of value range

  • Generated by a typical genetic algorithm

By Gonzalez, Dasgupta
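A rectangular detector can be read as a value-range rule, which the following minimal sketch makes concrete. The genetic algorithm itself is omitted, and `toy_fitness` is a hypothetical stand-in (reward volume, reject rules covering self), not the fitness used by Gonzalez and Dasgupta.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RectDetector:
    """Hyper-rectangular detector read as a rule:
    IF x[0] in [lo0, hi0] AND ... AND x[n-1] in [lo_{n-1}, hi_{n-1}] THEN nonself.
    """
    bounds: List[Tuple[float, float]]  # one (low, high) range per feature

    def matches(self, x) -> bool:
        return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, self.bounds))

    def volume(self) -> float:
        v = 1.0
        for lo, hi in self.bounds:
            v *= max(0.0, hi - lo)
        return v


def toy_fitness(detector: RectDetector, self_samples) -> float:
    """Hypothetical GA fitness: reward volume, but a rule that covers any
    self sample is worthless. Not the fitness from the cited work."""
    covered_self = sum(detector.matches(s) for s in self_samples)
    return 0.0 if covered_self else detector.volume()


if __name__ == "__main__":
    rule = RectDetector([(0.6, 1.0), (0.0, 0.3)])
    print(rule.matches([0.7, 0.1]), toy_fitness(rule, [[0.2, 0.2]]))
```

A GA would evolve the `bounds` of a population of such rules under some fitness of this kind.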


Circular detectors (hypersphere)

  • From constant size to variable size

  • Moving after initial generation:

    • Reduce overlap

    • “artificial annealing”

By Dasgupta, KrishnaKumar et al

By Dasgupta, Gonzalez


MILA

  • Multilevel – to capture local patterns and global patterns

  • Negative selection + positive selection

  • Euclidean distance on sub-space

For example, suppose that a self string is <s1, s2, …, sL> and the window size is 3. Then the self peptide strings can be <s1, s3, sL>, <s2, s4, s9>, <s5, s7, s8>, and so on, formed by randomly picking the attributes at some positions.
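The sub-space idea can be sketched as follows. The helper names are hypothetical, and positions are 0-based, so <s1, s3, sL> corresponds to positions [0, 2, L-1]. How MILA combines these sub-space distances with negative and positive selection is not reproduced here.

```python
import math
import random


def random_windows(length, window_size, count, rng):
    """Pick `count` random attribute subsets ("peptides") of size `window_size`."""
    return [sorted(rng.sample(range(length), window_size)) for _ in range(count)]


def project(x, positions):
    """Keep only the attributes at the given (0-based) positions."""
    return [x[i] for i in positions]


def subspace_distance(a, b, positions):
    """Euclidean distance computed only on the selected sub-space."""
    return math.dist(project(a, positions), project(b, positions))


if __name__ == "__main__":
    rng = random.Random(7)
    windows = random_windows(length=10, window_size=3, count=4, rng=rng)
    self_string = [0.1 * i for i in range(10)]
    peptides = [project(self_string, w) for w in windows]
    print(windows, peptides)
```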


V-detector

  • V-detector is a new negative selection algorithm.

  • It brings together a series of related works aimed at a more efficient and more reliable algorithm.

  • It has its own process for generating detectors and determining coverage.


V-detector’s major features

  • Variable-sized detectors

  • Statistical confidence in detector coverage

  • Boundary-aware algorithm

  • Extensibility


Match or not match?

In real-valued representation, a detector can be visualized as a hypersphere.

Candidate 1 is thrown away; candidate 2 is made a detector.


Variable-sized detectors in the V-detector method are “maximized detectors”

  • Unanswered question: what is the self space?

V-detector: maximized size

traditional detectors: constant size
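A minimal sketch of how a "maximized" detector size can be computed, assuming Euclidean distance and the point-wise interpretation in which each self sample stands for a hypersphere of radius `self_radius`: the new detector's radius is the distance to the nearest self sample minus the self radius, so the detector grows until it just touches the self region. The function name and return convention are hypothetical.

```python
import math
import random


def v_detector_candidate(candidate, self_samples, detectors, self_radius):
    """Decide what happens to one random candidate.

    detectors: list of (center, radius) pairs already accepted.
    Returns one of:
      ("self", None)     candidate lies within self_radius of a self sample
      ("covered", None)  candidate already lies inside an existing detector
      ("new", radius)    candidate becomes a detector with this radius
    """
    d_min = min(math.dist(candidate, s) for s in self_samples)
    if d_min <= self_radius:
        return ("self", None)
    if any(math.dist(candidate, c) <= r for c, r in detectors):
        return ("covered", None)
    # Maximized size: extend the hypersphere up to the edge of the self region.
    return ("new", d_min - self_radius)


if __name__ == "__main__":
    rng = random.Random(0)
    self_samples = [[0.5, 0.5], [0.55, 0.5], [0.5, 0.55]]
    detectors = []
    for _ in range(200):
        cand = [rng.random(), rng.random()]
        kind, radius = v_detector_candidate(cand, self_samples, detectors, self_radius=0.05)
        if kind == "new":
            detectors.append((cand, radius))
    print(len(detectors))
```

Candidates far from self produce large detectors, while candidates near self produce small ones that fill the gaps, which is why overlap is not a concern.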


Why is the idea of “variable sized detectors” novel?

  • The rationale for constant size: a uniform matching threshold

  • Detectors of variable size exist in some negative selection algorithms, but as a different mechanism

    • Allowing multiple or evolving sizes to optimize the coverage, limited by the concern about overlap

    • Variable size as part of random property of detectors/candidates

  • V-detector uses variable sized detectors to maximize the coverage with limited number of detectors

    • Size is decided by the training data

    • Large nonself region is covered easily

    • Small detectors cover ‘holes’

    • Overlap is not an issue in V-detector


Statistical estimate of detector coverage

  • Existing works estimate the necessary number of detectors; there is no direct relationship between the estimate and the actual detector set obtained.

  • Novelty of V-detector:

    • Evaluate the coverage of the actual detector set

    • Statistical inference is used as an integrated component of the detector generation algorithm, not to estimate the coverage of a finished detector set.


Basic idea leading to the new estimation mechanism

  • Random points are taken as detector candidates. The probability that a random point falls in the covered region (inside some existing detector) reflects the proportion that is covered, similar to the idea of Monte Carlo integration.

    • Proportion of covered nonself space = probability that a sample point is a covered point (points in the self region are not counted).

  • As more nonself space is covered, it becomes less likely that a sample point is uncovered. In other words, we need to try more random points to find an uncovered one, i.e., one that can be used to make a detector.
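The Monte Carlo estimate described above can be sketched as a standalone function, assuming hyperspherical detectors given as (center, radius) pairs, uniform sampling in the unit hypercube, and Euclidean distance. The function name is hypothetical.

```python
import math
import random


def estimate_nonself_coverage(detectors, self_samples, self_radius, dim, n_points, rng):
    """Monte Carlo estimate of the covered proportion of nonself space.

    detectors: list of (center, radius) hyperspheres.
    Uniform points in [0, 1]^dim are drawn; points within self_radius of a
    self sample are skipped (self-region points are not counted).
    Returns covered / counted, or None if every drawn point was in self.
    """
    covered = counted = 0
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        if any(math.dist(p, s) <= self_radius for s in self_samples):
            continue
        counted += 1
        if any(math.dist(p, c) <= r for c, r in detectors):
            covered += 1
    return covered / counted if counted else None


if __name__ == "__main__":
    rng = random.Random(1)
    dets = [([0.2, 0.2], 0.15), ([0.8, 0.8], 0.15)]
    print(estimate_nonself_coverage(dets, [[0.5, 0.5]], 0.1, 2, 5000, rng))
```

According to the slide, V-detector reuses the random candidates themselves as the sample, so no separate sampling pass is needed. This version separates the two steps only to make the estimate explicit.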


Statistics involved

  • Central limit theorem: a sample statistic approximately follows a normal distribution

    • A sample statistic is used to estimate a population parameter

    • In our application, the proportion of covered random points is used to estimate the actual proportion of covered area

Figure: distribution of the sample proportion on the interval [0, 1].
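In symbols (our notation, not the slide's): if n random points fall in nonself space and x of them are covered by existing detectors, the sample proportion and its standardized score against a target coverage p_0 are as follows. The normal approximation is usually trusted only when n p_0 and n(1 - p_0) are both reasonably large (a common rule of thumb is at least 5), which relates to the difficulty with proportions close to 100%.

```latex
\hat{p} = \frac{x}{n}, \qquad
z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}, \qquad
\text{reject } H_0 \colon p \le p_0 \ \text{ if } z > z_\alpha
```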


Statistical inference

  • Point estimate versus confidence interval

  • Estimate with confidence interval versus hypothesis testing

    • A proportion close to 100% makes the central limit theorem assumption invalid: the distribution is not normal.

    • Purpose: deciding when to terminate detector generation


Hypothesis testing

  • Identifying null hypothesis/alternative hypothesis.

    • Type I error: falsely reject null hypothesis

    • Type II error: falsely accept null hypothesis

    • The null hypothesis is the statement that we’d rather take as true if there is not strong enough evidence showing otherwise. In other words, we consider type I error more costly.

    • In terms of coverage estimation, we consider wrongly accepting inadequate coverage as adequate to be more costly. So the null hypothesis is: the current coverage is below the target coverage.

  • Choose a significance level: the maximum probability of making a Type I error that we are willing to accept.

  • Collect a sample and compute its statistic, in this case the proportion.

  • Calculate the z score from the proportion and compare it with zα.

  • If z is larger, we can reject the null hypothesis and claim adequate coverage with confidence.
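The test steps above correspond to a standard one-sided z-test for a proportion, sketched here with the standard library's `statistics.NormalDist` for the critical value. The function name is hypothetical, and how V-detector chooses the sample size and interleaves this test with detector generation is not shown.

```python
import math
from statistics import NormalDist


def coverage_reached(covered, n, target, alpha):
    """One-sided z-test of H0: coverage <= target against H1: coverage > target.

    covered: sampled nonself points that fell inside some detector
    n:       sampled nonself points in total
    target:  target coverage p0, with 0 < p0 < 1
    alpha:   significance level (maximum accepted Type I error probability)
    Returns True when H0 is rejected, i.e. adequate coverage is claimed.
    """
    p_hat = covered / n
    z = (p_hat - target) / math.sqrt(target * (1 - target) / n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return z > z_alpha


if __name__ == "__main__":
    print(coverage_reached(995, 1000, 0.99, 0.05), coverage_reached(996, 1000, 0.99, 0.05))
```

With target = 0.99, alpha = 0.05 and n = 1000, 995 covered points give z ≈ 1.59 (not enough evidence), while 996 give z ≈ 1.91 (adequate coverage is claimed).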


Boundary-aware algorithm versus point-wise interpretation

  • A new concept in negative selection algorithm

  • Previous works of NSA

    • Matching threshold is used as mechanism to control the extent of generalization

    • However, each self sample is used individually, so the continuous area represented by a group of samples is not captured (point-wise interpretation).

Figure: the desired interpretation is the area represented by the group of points. One panel shows more specificity (relatively more aggressive in detecting anomalies); the other shows more generalization (the real boundary is extended).


Boundary-aware: using the training points as a collection

  • Boundary-aware algorithm

    • A ‘clustering’ mechanism, though represented in negative space

    • The training data are used as a collection instead of individually.

    • Positive selection cannot do the same thing


V-detector is more than a real-valued negative selection algorithm

  • V-detector can be implemented for any data representation and distance measure.

    • Usually, negative selection algorithms were designed with a specific data representation and distance measure.

  • The features we just introduced are not limited by representation scheme or generation mechanism. (as long as we have a distance measure and a threshold to decide matching)


V-detector algorithm with confidence in detector coverage

contribution


V-detector’s advantages

  • Efficiency:

    • fewer detectors

    • fast generation

  • Coverage confidence

  • Extensibility, simplicity


Experiments

  • A large pool of synthetic data (2-D real space) is used to understand V-detector’s behavior

    • More detailed analysis of the influence of various parameters is planned as future work

  • Real world data

    • Confirm it works well enough to detect real world “anomaly”

    • Compare with methods dealing with similar problems

  • Demonstration

    • What actual training data and detectors look like

    • Basic UI and visualization of V-detector implementation


Parameters to evaluate its performance

  • Detection rate

  • False alarm rate

  • Number of detectors
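The first two measures are not defined on the slide; the following sketch assumes the conventional definitions (detection rate is the fraction of true anomalies flagged, false alarm rate is the fraction of normal items flagged). The function name is hypothetical.

```python
def detection_and_false_alarm_rate(predicted, actual):
    """predicted, actual: equal-length sequences of booleans (True = nonself/anomaly).

    Detection rate   = TP / (TP + FN)  (fraction of true anomalies flagged)
    False alarm rate = FP / (FP + TN)  (fraction of normal items flagged)
    """
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    detection_rate = tp / (tp + fn) if tp + fn else float("nan")
    false_alarm_rate = fp / (fp + tn) if fp + tn else float("nan")
    return detection_rate, false_alarm_rate


if __name__ == "__main__":
    print(detection_and_false_alarm_rate([True, False, True, False], [True, True, False, False]))
```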


Control parameters and algorithm variations

  • Self radius – key parameter

  • Target coverage

  • Significance level (of hypothesis testing)

  • Boundary-aware versus point-wise

  • Hypothesis testing versus naïve estimate

  • Reuse random points versus minimum detector set (to be implemented)


Data’s influence on performance

  • Specific shape

    • Intuitively, “corners” will affect the results.

  • Number of training points

    • Major influence


Experiments on 2-D synthetic data

Training points (1000)

Test data (1000 points) and the ‘real shape’ we try to learn


Detector sets generated

Trained with 1000 points

Trained with 100 points


Synthetic data (‘intersection’ and pentagram): compare naïve estimate and hypothesis testing

‘intersection’ shape

pentagram


Synthetic data: results for different shapes of self region


Synthetic data (ring): compare boundary-aware and point-wise

False alarm rate

Detection rate


Synthetic data (cross-shaped self): balance of errors


Real world data

  • Biomedical data

  • Pollution data

  • Ball bearing – preprocessed time series data

  • Others: Iris data, gene data, India Telugu


Results of biomedical data


Results of air pollution data

Detection rate and false alarm rate

Number of detectors


Ball bearing’s structure and damage

Damaged cage


Ball bearing data

Example of raw data (new bearings, first 1000 points)

  • raw data: time series of acceleration measurements

  • Preprocessing (from time domain to representation space for detection)

    • FFT (Fast Fourier Transform) with Hanning windowing: window size 32

    • Statistical moments: up to 5th order
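A rough sketch of the two preprocessing routes in plain Python, under several assumptions the slide does not pin down: consecutive non-overlapping windows of 32 samples, the magnitudes of the full 32-point DFT as FFT features, and "moments up to 5th order" read as the mean plus central moments of orders 2 to 5 (five features). A direct DFT stands in for an FFT; for a 32-sample window the cost is negligible, and `numpy.fft` would be the usual choice in practice. How the FFT output was reduced to the 30 features mentioned later in the deck is not stated, so the sketch simply returns all 32 magnitudes. All function names are hypothetical.

```python
import cmath
import math


def hanning(n):
    """Same definition as numpy.hanning: 0.5 - 0.5*cos(2*pi*k/(n-1))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def windowed_fft_magnitudes(segment):
    """Magnitude spectrum of one Hanning-windowed segment (direct DFT)."""
    n = len(segment)
    x = [s * w for s, w in zip(segment, hanning(n))]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def moments(segment, max_order=5):
    """Assumed feature set: mean, then central moments of order 2..max_order."""
    n = len(segment)
    mean = sum(segment) / n
    return [mean] + [sum((s - mean) ** k for s in segment) / n
                     for k in range(2, max_order + 1)]


def segments(series, window=32):
    """Split a time series into consecutive non-overlapping windows."""
    return [series[i:i + window] for i in range(0, len(series) - window + 1, window)]


if __name__ == "__main__":
    series = [math.sin(0.8 * t) + 0.1 * math.cos(3.1 * t) for t in range(1000)]
    fft_features = [windowed_fft_magnitudes(seg) for seg in segments(series)]
    moment_features = [moments(seg) for seg in segments(series)]
    print(len(fft_features), len(fft_features[0]), len(moment_features[0]))
```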


Ball bearing data: results

Preprocessed with FFT

Preprocessed with statistical moments


Ball bearing experiments with two different preprocessing techniques



Results of Iris data


Iris data: number of detectors


Conclusions

  • Real-valued NSA has unique advantages and difficulties.

  • A good NSA should not be limited by differences in data representation

  • A “killer application” is needed to support the necessity of NSA, as with many other “soft computing” paradigms

    • Compare with other methods; in the case of NSA, other one-class classification methods, e.g., one-class SVM

  • Good representation scheme and distance measure play a very important role in performance – more important than algorithm variations in many cases.


References

  • S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri. Self-nonself discrimination in a computer. In Proc. of the IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press, Los Alamitos, CA, pp. 202–212, 1994.

  • D. Dasgupta and F. Gonzalez. An immunity-based technique to characterize intrusions in computer networks. IEEE Transactions on Evolutionary Computation, 6(3):281–291, June 2002.

  • F. Gonzalez, D. Dasgupta, and L. F. Nino. A randomized real-valued negative selection algorithm. In Proc. of the 2nd International Conference on Artificial Immune Systems, UK, September 1–3, 2003.

  • D. Dasgupta, S. Yu, and N. S. Majumdar. MILA: multilevel immune learning algorithm. In Proc. of the Genetic and Evolutionary Computation Conference (GECCO), Chicago, July 12–16, 2003.

  • D. Dasgupta, Z. Ji, and F. Gonzalez. Artificial immune system (AIS) research in the last five years. CEC 2003.

  • Z. Ji and D. Dasgupta. Augmented negative selection algorithm with variable-coverage detectors. CEC 2004.

  • D. Dasgupta, K. KrishnaKumar, D. Wong, and M. Berry. Negative selection algorithm for aircraft fault detection. In Proc. of the 3rd International Conference on Artificial Immune Systems, Catania, Sicily, Italy, September 13–16, 2004.

  • Z. Ji and D. Dasgupta. Real-valued negative selection algorithm with variable-sized detectors. GECCO 2004.

  • S. M. Garrett. How do we evaluate artificial immune systems? Evolutionary Computation, 13(2):145–178, 2005.

  • Z. Ji and D. Dasgupta. Estimating the detector coverage in a negative selection algorithm. GECCO 2005.

  • Z. Ji. A boundary-aware negative selection algorithm. ASC 2005.

  • Z. Ji and D. Dasgupta. Revisiting negative selection algorithms. Submitted to the Evolutionary Computation Journal.

  • Z. Ji and D. Dasgupta. An efficient negative selection algorithm of “probably adequate” coverage. Submitted to SMC.


Questions?

Thank you!


What is a matching rule?

  • It defines when a sample and a detector are considered matching.

  • The matching rule plays an important role in a negative selection algorithm. It largely depends on the data representation.
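To illustrate how the matching rule depends on the representation, the following sketch contrasts the r-contiguous bits rule commonly used with binary strings (introduced in the Forrest et al. 1994 work cited in the references) with a Euclidean-distance threshold for real-valued vectors. The function names are hypothetical, and r is assumed to be at least 1.

```python
import math


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """Binary/string representation: match if the two equal-length strings
    agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def euclidean_match(x, d, threshold) -> bool:
    """Real-valued representation: match if the Euclidean distance is below
    a threshold (the detector is a hypersphere)."""
    return math.dist(x, d) < threshold


if __name__ == "__main__":
    print(r_contiguous_match("10110", "10011", 2), euclidean_match([0, 0], [3, 4], 6))
```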


Experiments and Results

  • Synthetic Data

    • 2-D. Training data are randomly chosen from the normal region.

  • Fisher’s Iris Data

    • One of the three types is considered “normal”.

  • Biomedical Data

    • Abnormal data are the medical measurements of patients who are carriers of the disease.

  • Air Pollution Data

    • Abnormal data are made by artificially altering the normal air measurements.

  • Ball bearings:

    • Measurements: time series data with preprocessing (30-D and 5-D)


Synthetic data: cross-shaped self space. Shape of self region and example detector coverage

(a) Actual self space (b) self radius = 0.05 (c) self radius = 0.1


Synthetic data: cross-shaped self space. Results

Detection rate and false alarm rate

Number of detectors


Synthetic data: ring-shaped self space. Shape of self region and example detector coverage

(a) Actual self space (b) self radius = 0.05 (c) self radius = 0.1


Synthetic data: ring-shaped self space. Results

Detection rate and false alarm rate

Number of detectors


Iris data: Virginica as normal, 50% of points used for training

Detection rate and false alarm rate

Number of detectors


Biomedical data

  • Blood measurements for a group of 209 patients

  • Each patient has four different types of measurement

  • 75 patients are carriers of a rare genetic disorder. Others are normal.


Biomedical data

Detection rate and false alarm rate

Number of detectors


Air pollution data

  • In total, 60 original records.

  • Each consists of 16 different measurements concerning air pollution.

  • All the real data are considered normal.

  • More data are generated artificially:

    • Decide the normal range of each of the 16 measurements

    • Randomly choose a real record

    • Change three randomly chosen measurements within a larger-than-normal range

    • If any of the changed measurements are out of range, the record is considered abnormal; otherwise it is considered normal

  • In total, 1000 records, including the original 60, are used as test data. The original 60 are used as training data.
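The data-generation recipe above can be sketched as follows. The slide does not say how the normal ranges were decided or how much larger the "larger than normal range" is, so both are assumptions here: each normal range is taken as the min to max of that measurement over the real records, and the enlarged range extends it by a hypothetical `widen` fraction of its width on each side. The function names are also hypothetical.

```python
import random


def normal_ranges_from(records):
    """Assumed: the normal range of a measurement is its min..max over the real records."""
    return [(min(col), max(col)) for col in zip(*records)]


def make_test_record(real_records, normal_ranges, rng, n_changes=3, widen=0.5):
    """Return (record, is_abnormal) following the slide's recipe.

    `widen` (hypothetical) enlarges each normal range by that fraction of its
    width on both sides before a new value is drawn uniformly from it.
    """
    record = list(rng.choice(real_records))
    for i in rng.sample(range(len(record)), n_changes):
        lo, hi = normal_ranges[i]
        span = hi - lo
        record[i] = rng.uniform(lo - widen * span, hi + widen * span)
    # With ranges from normal_ranges_from, unchanged values are always in range,
    # so only the changed measurements can make the record abnormal.
    is_abnormal = any(not (lo <= v <= hi) for v, (lo, hi) in zip(record, normal_ranges))
    return record, is_abnormal


if __name__ == "__main__":
    rng = random.Random(0)
    real = [[rng.gauss(50, 10) for _ in range(16)] for _ in range(60)]
    ranges = normal_ranges_from(real)
    test_set = real + [make_test_record(real, ranges, rng)[0] for _ in range(940)]
    print(len(test_set))
```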


Example of data (FFT of new bearings): first 3 coefficients of the first 100 points


Example of data (statistical moments of new bearings): moments up to 3rd order of the first 100 points


How much one sample tells


Samples may be on boundary


In terms of detectors


Comparing three methods

Panels (self radius = 0.05): new algorithm; V-detector; constant-sized detectors


Comparing three methods

Panels (self radius = 0.1): V-detector; new algorithm; constant-sized detectors

