# Real-valued Negative Selection Algorithms



Outline

- Background
- Variations of real-valued negative selection algorithms
- More details through an example: V-detector
- Demonstration

Background: AIS

- AIS (Artificial Immune Systems): only about 10 years of history
- Negative selection (development of T cells)
- Immune network theory (how B cells and antibodies interact with each other)
- Clonal selection (how a pool of B cells, especially memory cells, is developed)
- New inspirations from immunology: danger theory, germinal center, etc.
- Negative selection algorithms
- The earliest and most widely used AIS technique.

Background

Biological metaphor of negative selection

How T cells mature in the thymus:

- The cells are diversified.
- Those that recognize self are eliminated.
- The rest are used to recognize nonself.

The idea of negative selection algorithms (NSA)

The concept of feature space and detectors

- The problem to deal with: anomaly detection (or one-class classification)
- Detector set
- random generation: maintain diversity
- censoring: eliminating those that match self samples

Background

Outline of a typical NSA

- Generation of detector set
- Anomaly detection (classification of incoming data items)
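A minimal sketch of this two-phase structure, assuming hyperspherical detectors with Euclidean distance in the unit hypercube (the function names, thresholds, and detector shape are illustrative assumptions, not a specific published variant):

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def generate_detectors(self_samples, num_detectors, self_radius, dim):
    """Censoring phase: keep only random candidates that match no self sample."""
    detectors = []
    while len(detectors) < num_detectors:
        candidate = [random.random() for _ in range(dim)]  # random generation
        if all(euclidean(candidate, s) > self_radius for s in self_samples):
            detectors.append(candidate)
    return detectors

def is_anomalous(item, detectors, detector_radius):
    """Detection phase: an item matched by any detector is classified as nonself."""
    return any(euclidean(item, d) <= detector_radius for d in detectors)
```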

Background

Family of NSA

Types of work on NSA:

- Applications: solving real-world problems using a typical version, or adapting one for specific applications
- Improvements: new detector schemes and generation methods, and analysis of existing methods. These works are data-representation specific, mostly binary.
- Frameworks: establishing a framework for binary representation that includes various matching rules; discussion of the uniqueness and usefulness of NSA; introduction of new concepts.

What defines a negative selection algorithm?

- Representation in negative space
- One-class learning
- Usage of detector set

Background

Data representation in NSA

- Different representations vs. different search spaces
- Various representations:
- Binary
- Strings over a finite alphabet: no fundamental difference from binary
- Real-valued vectors
- Hybrid
- Different distance measures
- Data representation is not the only factor that makes a scheme different

Real-valued NSA

- Why is real-valued NSA different from binary NSA?
- Hard to analyze: simple combinatorics would not work
- Necessary and proper for many real applications: binary representation may decouple the relation between feature space and representation
- Is categorization based on data representation a good way to understand and develop NSA?

Major issues in NSA

- Number of detectors
- Affecting the efficiency of generation and detection
- Detector coverage
- Affecting the accuracy of detection
- Generation mechanisms
- Affecting the efficiency of generation and the quality of the resulting detectors
- Matching rules – generalization
- How to interpret the training data
- depending on the feature space and representation scheme
- Issues that are not NSA specific
- Difficulty of one-class classification
- Curse of dimensionality

Variations of real-valued NSA

- Rectangular detectors generated with GA
- Circular detectors that move and change size
- MILA (multilevel immune learning algorithm)

Rectangular detectors + GA

- Rectangular detectors: “rules” of value range
- Generated by a typical genetic algorithm

By Gonzalez, Dasgupta
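Concretely, a rectangular detector is a conjunction of value-range conditions, one per dimension; a minimal sketch of the matching test (the interval representation is an illustrative assumption, and the GA that evolves the intervals is omitted):

```python
def matches_rectangle(item, low, high):
    """A rectangular detector covers an item iff every coordinate falls
    inside the detector's value range for that dimension."""
    return all(lo <= x <= hi for x, lo, hi in zip(item, low, high))

# A 2-D detector covering [0.2, 0.5] x [0.7, 0.9]:
print(matches_rectangle([0.3, 0.8], [0.2, 0.7], [0.5, 0.9]))  # True
```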

Circular detectors (hypersphere)

- From constant size to variable size
- Moving after initial generation:
- Reduce overlap
- “artificial annealing”

By Dasgupta, KrishnaKumar et al

By Dasgupta, Gonzalez

MILA

- Multilevel – to capture local patterns and global patterns
- Negative selection + positive selection
- Euclidean distance on sub-space

For example, suppose a self string is <s1, s2, …, sL> and the window size is chosen as 3. Then the self peptide strings can be <s1, s3, sL>, <s2, s4, s9>, <s5, s7, s8>, and so on, formed by randomly picking the attributes at some positions (a sketch follows).
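A hedged sketch of this sub-space matching (the window-sampling helper and its names are illustrative assumptions):

```python
import math
import random

def random_window(length, window_size):
    """Pick a sorted list of random attribute positions, e.g. [0, 2, L-1]."""
    return sorted(random.sample(range(length), window_size))

def subspace_distance(a, b, positions):
    """Euclidean distance restricted to the chosen attribute positions."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in positions))
```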

V-detector

- V-detector is a new negative selection algorithm.
- It builds on a series of related works to achieve a more efficient and more reliable algorithm.
- It has a unique process for generating detectors and determining coverage.

V-detector’s major features

- Variable-sized detectors
- Statistical confidence in detector coverage
- Boundary-aware algorithm
- Extensibility

In real-valued representation, a detector can be visualized as a hypersphere.

Candidate 1 is thrown away; candidate 2 is made a detector.

Variable-sized detectors in the V-detector method are “maximized detectors”.

- Unanswered question: what is the self space?

V-detector: maximized size; traditional detectors: constant size.
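The defining step is that each surviving candidate grows to the largest radius that still avoids all self samples; a minimal sketch, assuming a Euclidean metric and a fixed self radius around each training point (names are illustrative):

```python
import math

def maximized_radius(candidate, self_samples, self_radius):
    """A V-detector-style candidate takes the largest radius that keeps it
    clear of every self region: the distance to the nearest self sample
    minus the self radius. A non-positive result means the candidate lies
    inside self space, so it is discarded (like candidate 1 above)."""
    nearest = min(math.dist(candidate, s) for s in self_samples)
    return nearest - self_radius
```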

Why is the idea of “variable sized detectors” novel?

- The rationale for constant size: a uniform matching threshold
- Detectors of variable size exist in some negative selection algorithms as a different mechanism
- Allowing multiple or evolving size to optimize the coverage – limited by the concern of overlap
- Variable size as part of random property of detectors/candidates
- V-detector uses variable sized detectors to maximize the coverage with limited number of detectors
- Size is decided by the training data
- Large nonself region is covered easily
- Small detectors cover ‘holes’
- Overlap is not an issue in V-detector

Statistical estimate of detector coverage

- Existing works estimate the necessary number of detectors; there is no direct relationship between the estimate and the actual detector set obtained.
- Novelty of V-detector:
- Evaluate the coverage of the actual detector set
- Statistical inference is used as an integrated component of the detector generation algorithm, not merely to estimate the coverage of a finished detector set.

Basic idea leading to the new estimation mechanism

- Random points are taken as detector candidates. The probability that a random point falls on a covered region (some existing detector) reflects the portion that is covered, similar to the idea of Monte Carlo integration.
- Proportion of covered nonself space = probability that a sample point is a covered point (points in the self region are not counted).
- When more nonself space has been covered, it becomes less likely that a sample point is uncovered. In other words, we need to try more random points to find an uncovered one, i.e., one that can be used to make a detector (see the sketch below).
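A hedged sketch of this sampling idea (the helper names `in_self` and `is_covered` are hypothetical stand-ins for the self-match and detector-match tests described above):

```python
import random

def estimate_coverage(detectors, in_self, is_covered, num_samples, dim):
    """Estimate the covered proportion of nonself space by sampling:
    the fraction of random nonself points that fall inside some existing
    detector, in the spirit of Monte Carlo integration."""
    hits, total = 0, 0
    while total < num_samples:
        p = [random.random() for _ in range(dim)]
        if in_self(p):            # points in the self region are not counted
            continue
        total += 1
        if is_covered(p, detectors):
            hits += 1
    return hits / total
```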

Statistics involved

- Central limit theorem: the sample statistic follows a normal distribution
- Using a sample statistic to estimate a population parameter
- In our application, the proportion of covered random points is used to estimate the actual proportion of covered area


Statistical inference

- Point estimate versus confidence interval
- Estimation with a confidence interval versus hypothesis testing
- A proportion close to 100% makes the central limit theorem assumption invalid: the distribution is no longer normal.
- The purpose is to decide when to terminate detector generation.

Hypothesis testing

- Identify the null hypothesis and the alternative hypothesis.
- Type I error: falsely rejecting the null hypothesis
- Type II error: falsely accepting the null hypothesis
- The null hypothesis is the statement that we would rather take as true unless there is strong enough evidence showing otherwise. In other words, we consider a Type I error more costly.
- In terms of coverage estimation, we consider falsely claiming adequate coverage more costly. So the null hypothesis is: the current coverage is below the target coverage.
- Choose the significance level: the maximum probability we are willing to accept of making a Type I error.
- Collect a sample and compute its statistic, in this case the proportion.
- Calculate the z-score from the proportion and compare it with z_α (the formula is shown below).
- If z is larger, we can reject the null hypothesis and claim adequate coverage with confidence.
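With sample size $m$, target coverage $p_0$, and observed covered proportion $\hat{p}$, the test statistic takes the standard one-proportion form (a textbook formula, restated here for completeness):

$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/m}} $$

Rejecting the null hypothesis when $z > z_\alpha$ bounds the probability of falsely claiming adequate coverage by the significance level $\alpha$.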

Boundary-aware algorithm versus point-wise interpretation

- A new concept in negative selection algorithms
- Previous works on NSA:
- A matching threshold is used as the mechanism to control the extent of generalization
- However, each self sample is used individually; the continuous area represented by a group of samples is not captured (point-wise interpretation).

Desired interpretation: the area represented by the group of points. More specificity is relatively more aggressive in detecting anomalies; more generalization means the real boundary is extended.

Boundary-aware: using the training points as a collection

- Boundary-aware algorithm
- A ‘clustering’ mechanism, though represented in negative space
- The training data are used as a collection instead of individually.
- Positive selection cannot do the same thing

V-detector is more than a real-valued negative selection algorithm

- V-detector can be implemented for any data representation and distance measure.
- Negative selection algorithms have usually been designed for a specific data representation and distance measure.
- The features just introduced are not limited to a representation scheme or generation mechanism, as long as we have a distance measure and a threshold to decide matching (see the sketch below).
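A minimal sketch of that extensibility, assuming the matching test is the only representation-specific piece (the helper and example thresholds are illustrative):

```python
import math

def make_matcher(distance, threshold):
    """Any representation can plug in, as long as it supplies a
    distance measure and a threshold that decides matching."""
    return lambda a, b: distance(a, b) <= threshold

# Euclidean matching for real-valued vectors:
euclid_match = make_matcher(math.dist, 0.1)
print(euclid_match((0.10, 0.20), (0.15, 0.20)))  # True: distance 0.05 <= 0.1

# Hamming matching for binary strings:
hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
hamming_match = make_matcher(hamming, 2)
print(hamming_match("10110", "10011"))  # True: Hamming distance 2 <= 2
```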

V-detector’s advantages

- Efficiency:
- fewer detectors
- fast generation
- Coverage confidence
- Extensibility, simplicity

Experiments

- A large pool of synthetic data (2-D real space) is used in experiments to understand V-detector’s behavior
- More detailed analysis of the influence of various parameters is planned as future work
- Real-world data
- Confirm it works well enough to detect real-world “anomalies”
- Compare with methods dealing with similar problems
- Demonstration
- What actual training data and detectors look like
- Basic UI and visualization of the V-detector implementation

Parameters to evaluate its performance

- Detection rate
- False alarm rate
- Number of detectors
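In the usual notation, where true/false positives are counted on anomalous test items and false/true negatives on normal ones, these rates are (standard definitions, restated for completeness):

$$ \text{detection rate} = \frac{TP}{TP + FN}, \qquad \text{false alarm rate} = \frac{FP}{FP + TN} $$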

Control parameters and algorithm variations

- Self radius – key parameter
- Target coverage
- Significance level (of hypothesis testing)
- Boundary-aware versus point-wise
- Hypothesis testing versus naïve estimate
- Reuse random points versus minimum detector set (to be implemented)

Data’s influence on performance

- Specific shape
- Intuitively, “corners” will affect the results.
- Number of training points
- Major influence

Experiments on 2-D synthetic data

Figures: training points (1000); test data (1000 points) and the ‘real shape’ we try to learn.

Synthetic data (‘intersection’ shape and pentagram): comparing the naïve estimate and hypothesis testing.

Real world data

- Biomedical data
- Pollution data
- Ball bearing – preprocessed time series data
- Others: Iris data, gene data, India Telugu

Ball bearing’s structure and damage

Damaged cage

Ball bearing data

Example of raw data (new bearings, first 1000 points)

- Raw data: time series of acceleration measurements
- Preprocessing (from time domain to representation space for detection)
- FFT (Fast Fourier Transform) with Hanning windowing: window size 32
- Statistical moments: up to 5th order
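A hedged sketch of this preprocessing with NumPy (window size 32 as stated; the non-overlapping hop and the exact moment definitions are assumptions):

```python
import numpy as np

def fft_features(signal, window_size=32):
    """Slide a Hanning window over a 1-D NumPy time series and take the
    FFT magnitude of each segment as one feature vector."""
    window = np.hanning(window_size)
    segments = [signal[i:i + window_size] * window
                for i in range(0, len(signal) - window_size + 1, window_size)]
    return np.array([np.abs(np.fft.rfft(seg)) for seg in segments])

def moment_features(signal, max_order=5):
    """The mean plus central moments of order 2..max_order."""
    mean = np.mean(signal)
    return np.array([mean] + [np.mean((signal - mean) ** k)
                              for k in range(2, max_order + 1)])
```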

Conclusions

- Real-valued NSA has unique advantages and difficulties.
- A good NSA should not be limited by differences in data representation.
- A “killer application” is needed to establish the necessity of NSA, as with many other “soft computing” paradigms.
- Comparison with other methods is needed; in the case of NSA, with other one-class classifiers, e.g., one-class SVM.
- A good representation scheme and distance measure play a very important role in performance, more important than algorithm variations in many cases.

References

- S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri. Self-nonself discrimination in a computer. In Proc. of the IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press, Los Alamitos, CA, pp. 202–212, 1994.
- D. Dasgupta and F. Gonzalez. An immunity-based technique to characterize intrusions in computer networks. IEEE Transactions on Evolutionary Computation, 6(3):281–291, June 2002.
- F. Gonzalez, D. Dasgupta, and L. F. Nino. A randomized real-valued negative selection algorithm. In Proc. of the 2nd International Conference on Artificial Immune Systems (ICARIS), UK, September 1–3, 2003.
- D. Dasgupta, S. Yu, and N. S. Majumdar. MILA: multilevel immune learning algorithm. In Proc. of the Genetic and Evolutionary Computation Conference (GECCO), Chicago, July 12–16, 2003.
- D. Dasgupta, Z. Ji, and F. Gonzalez. Artificial immune system (AIS) research in the last five years. In Proc. of CEC 2003.
- Z. Ji and D. Dasgupta. Augmented negative selection algorithm with variable-coverage detectors. In Proc. of CEC 2004.
- D. Dasgupta, K. KrishnaKumar, D. Wong, and M. Berry. Negative selection algorithm for aircraft fault detection. In Proc. of the 3rd International Conference on Artificial Immune Systems (ICARIS), Catania, Sicily, Italy, September 13–16, 2004.
- Z. Ji and D. Dasgupta. Real-valued negative selection algorithm with variable-sized detectors. In Proc. of GECCO 2004.
- S. M. Garrett. How do we evaluate artificial immune systems? Evolutionary Computation, 13(2):145–178, 2005.
- Z. Ji and D. Dasgupta. Estimating the detector coverage in a negative selection algorithm. In Proc. of GECCO 2005.
- Z. Ji. A boundary-aware negative selection algorithm. In Proc. of ASC 2005.
- Z. Ji and D. Dasgupta. Revisiting negative selection algorithms. Submitted to the Evolutionary Computation Journal.
- Z. Ji and D. Dasgupta. An efficient negative selection algorithm of “probably adequate” coverage. Submitted to SMC.

Thank you!

What is a matching rule?

- A matching rule defines when a sample and a detector are considered to match.
- The matching rule plays an important role in negative selection algorithms; it largely depends on the data representation.

Experiments and Results

- Synthetic Data
- 2D. Training data are randomly chosen from the normal region.
- Fisher’s Iris Data
- One of the three types is considered as “normal”.
- Biomedical Data
- Abnormal data are the medical measures of disease carrier patients.
- Air Pollution Data
- Abnormal data are made by artificially altering the normal air measurements
- Ball bearings:
- Measurement: time series data with preprocessing - 30D and 5D

Synthetic data - Cross-shaped self space: shape of self region and example detector coverage

(a) Actual self space (b) self radius = 0.05 (c) self radius = 0.1

Synthetic data - Cross-shaped self space: results

Detection rate and false alarm rate

Number of detectors

Synthetic data - Ring-shaped self space: shape of self region and example detector coverage

(a) Actual self space (b) self radius = 0.05 (c) self radius = 0.1

Synthetic data - Ring-shaped self space: results

Detection rate and false alarm rate

Number of detectors

Iris data: Virginica as normal, 50% of points used to train

Detection rate and false alarm rate

Number of detectors

Biomedical data

- Blood measurements for a group of 209 patients
- Each patient has four different types of measurements
- 75 patients are carriers of a rare genetic disorder; the others are normal.

Air pollution data

- 60 original records in total.
- Each consists of 16 different measurements concerning air pollution.
- All the real data are considered normal.
- More data are made artificially (a sketch of the procedure follows):
- Determine the normal range of each of the 16 measurements
- Randomly choose a real record
- Change three randomly chosen measurements within a larger-than-normal range
- If any of the changed measurements is out of range, the record is considered abnormal; otherwise it is considered normal
- In total, 1000 records, including the original 60, are used as test data; the original 60 are used as training data.
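A hedged sketch of that generation procedure (the widening factor and helper names are illustrative assumptions):

```python
import random

def make_synthetic_record(records, normal_low, normal_high, widen=1.5):
    """Perturb a random real record: alter three randomly chosen
    measurements within a wider-than-normal range; the result is
    abnormal iff any altered value leaves its normal range."""
    rec = list(random.choice(records))
    changed = random.sample(range(len(rec)), 3)
    for i in changed:
        center = (normal_low[i] + normal_high[i]) / 2
        half_span = widen * (normal_high[i] - normal_low[i]) / 2
        rec[i] = random.uniform(center - half_span, center + half_span)
    abnormal = any(not (normal_low[i] <= rec[i] <= normal_high[i])
                   for i in changed)
    return rec, abnormal
```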

Example of data (FFT of new bearings) --- first 3 coefficients of the first 100 points

Example of data (statistical moments of new bearings) --- moments up to 3rd order of the first 100 points
