algorithms for port of entry inspection for wmds n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Algorithms for Port of Entry Inspection for WMDs PowerPoint Presentation
Download Presentation
Algorithms for Port of Entry Inspection for WMDs

Loading in 2 Seconds...

play fullscreen
1 / 97

Algorithms for Port of Entry Inspection for WMDs - PowerPoint PPT Presentation


  • 123 Views
  • Uploaded on

Algorithms for Port of Entry Inspection for WMDs. Fred S. Roberts DyDAn Center Rutgers University . Port of Entry Inspection Algorithms. Goal : Find ways to intercept illicit nuclear materials and weapons destined for the U.S. via the maritime transportation system

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Algorithms for Port of Entry Inspection for WMDs' - branden


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
algorithms for port of entry inspection for wmds

Algorithms for Port of Entry Inspection for WMDs

Fred S. Roberts

DyDAn Center

Rutgers University

port of entry inspection algorithms

Port of Entry Inspection Algorithms

  • Goal: Find ways to intercept illicit
  • nuclear materials and weapons
  • destined for the U.S. via the
  • maritime transportation system
  • Currently inspecting only small
  • % of containers arriving at ports
  • Even inspecting 8% of containers in Port of NY/NJ might bring international trade to a halt (Larrabbee 2002)
port of entry inspection algorithms1

Aim: Develop decision support algorithms that will help us to “optimally” intercept illicit materials and weapons subject to limits on delays, manpower, and equipment

  • Find inspection schemes that minimize total “cost” including “cost” of false positives and false negatives

Port of Entry Inspection Algorithms

Mobile Vacis: truck-mounted gamma ray imaging system

port of entry inspection algorithms2

My work on port of entry inspection has gotten me and our students and colleagues to some remarkable places.

Port of Entry Inspection Algorithms

Me on a Coast Guard

boat in a tour of the

harbor in Philadelphia

sequential decision making problem

Stream of containers arrives at a port

  • The Decision Maker’s Problem:
    • Which to inspect?
    • Which inspections next based on previous results?
  • Approach:
    • “decision logics”
    • combinatorial optimization methods
    • Builds on ideas of Stroud

and Saeger at Los Alamos

National Laboratory

    • Need for new models

and methods

Sequential Decision Making Problem

sequential diagnosis problem

Such sequential diagnosis problems arise in many areas:

    • Communication networks (testing connectivity, paging cellular customers, sequencing tasks, …)
    • Manufacturing (testing machines, fault diagnosis, routing customer service calls, …)
    • Artificial intelligence/CS (optimal derivation strategies in knowledge bases, best-value satisficing search, coding decision trees, …)
    • Medicine (diagnosing patients, sequencing treatments, …)

Sequential Diagnosis Problem

sequential decision making problem1

Containersarriving to be classified into categories.

  • Simple case: 0 = “ok”, 1 = “suspicious”
  • Inspection scheme: specifies which inspections are to be made based on previous observations

Sequential Decision Making Problem

sequential decision making problem2

Containers have attributes, each

  • in a number of states
  • Sample attributes:
    • Levels of certain kinds of chemicals or biological materials
    • Whether or not there are items of a certain kind in the cargo list
    • Whether cargo was picked up in a certain port

Sequential Decision Making Problem

sequential decision making problem3

Currently used attributes:

    • Does ship’s manifest set off an “alarm”?
    • What is the neutron or Gamma emission count? Is it above threshold?
    • Does a radiograph image come up positive?
    • Does an induced fission test come up positive?

Sequential Decision Making Problem

Gamma ray detector

sequential decision making problem4

We can imagine many other attributes

  • This project is concerned with general algorithmic approaches.
  • We seek a methodology not tied to today’s technology.
  • Detectors are evolving quickly.

Sequential Decision Making Problem

sequential decision making problem5

Simplest Case: Attributes are in state 0 or 1

  • Then: Container is a binary string like 011001
  • So: Classification is a decision functionF that assigns each binary string to a category.

Sequential Decision Making Problem

011001

F(011001)

If attributes 2, 3, and 6 are present, assign container to category F(011001).

sequential decision making problem6

If there are two categories, 0 and 1, decision function F is a Boolean function.

  • Example:
  • F(000) = F(111) = 1, F(abc) = 0 otherwise
  • This classifies a container as positive iff it has none of the attributes or all of them.

Sequential Decision Making Problem

1 =

sequential decision making problem7

Sequential Decision Making Problem

  • Given a container, test its attributes until know enough to calculate the value of F.
  • An inspection scheme tells us in which order to test the attributes to minimize cost.
  • Even this simplified problem is hard computationally.
sequential decision making problem8

This assumes F is known.

  • Simplifying assumption: Attributes are independent.
  • At any point we stop inspecting and output the value of F based on outcomes of inspections so far.
  • Complications: May be precedence relations in the components (e.g., can’t test attribute a4 before testing a6.
  • Or: cost may depend on attributes tested before.
  • F may depend on variables that cannot be directly tested or for which tests are too costly.

Sequential Decision Making Problem

sequential decision making problem9

Such problems are hard computationally.

  • There are many possible Boolean functions F.
  • Even if F is fixed, problem of finding a good classification scheme (to be defined precisely below) is NP-complete.
  • Several classes of functions F allow for efficient inspection schemes:
    • k-out-of-n systems
    • Certain series-parallel systems
    • Read-once systems
    • “regular” systems
    • Horn systems

Sequential Decision Making Problem

sensors and inspection lanes

n types of sensors measure presence or absence of the n attributes.

  • Many copies of each sensor.
  • Complication: different characteristics of sensors.
  • Entities come for inspection.
  • Which sensor of a given type to
  • use?
  • Think of inspection lanes and
  • queues.
  • Besides efficient inspection
  • schemes, could decrease costs by:
    • Buying more sensors
    • Change allocation of containers to sensor lanes.

Sensors and Inspection Lanes

binary decision tree approach

Sensors measure presence/absence of attributes.

  • Binary Decision Tree:
    • Nodes are sensors or categories (0 or 1)
    • Two arcs exit from each sensor node, labeled left and right.
    • Take the right arc when sensor says the attribute is present, left arc otherwise

Binary Decision Tree Approach

binary decision tree approach1

Reach category 1 from the root only through the path a0 to a1 to 1.

  • Container is classified in category 1 iff it has both attributes a0 and a1 .
  • Corresponding Boolean function F(11) = 1, F(10) = F(01) = F(00) = 0.

Binary Decision Tree Approach

Figure 1

binary decision tree approach2

Reach category 1 from the root by:

  • a0 L to a1 R a2 R 1 or
  • a0 R a2 R1
  • Container classified in category 1 iff it has
  • a1 and a2 and not a0 or
  • a0 and a2 and possibly a1.
  • Corresponding Boolean function F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.

Binary Decision Tree Approach

Figure 2

binary decision tree approach3

This binary decision tree corresponds to the same Boolean function

  • F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.
  • However,it has one less observation node ai. So, it is more efficient if all observations are equally costly and equally likely.

Binary Decision Tree Approach

Figure 3

binary decision tree approach4

Even if the Boolean function F is fixed, the problem of finding the “optimal” binary decision tree for it is very hard (NP-complete) if optimal means smallest number of sensor nodes.

  • For small n = number of attributes, can try to solve it by brute force enumeration.
  • Even for n = 4, not practical. (n = 4 at Port of Long Beach-Los Angeles)

Binary Decision Tree Approach

Port of Long Beach

binary decision tree approach5

Promising Approaches:

  • Heuristic algorithms, approximations to optimal.
  • Special assumptions about the Boolean function F.
  • For “monotone” Boolean functions, integer programming formulations give promising heuristics.
  • Stroud and Saeger enumerate all
  • “complete,” monotone Boolean functions
  • and calculate the least expensive
  • corresponding binary decision trees.
  • Their method practical for n up to 4,
  • not n = 5.

Binary Decision Tree Approach

binary decision tree approach6

Monotone Boolean Functions:

  • Given two strings x1x2…xn, y1y2…yn
  • Suppose that xi yi for all i implies that F(x1x2…xn)  F(y1y2…yn).
  • Then we say that F is monotone.
  • Then 11…1 has highest probability of being in category 1.

Binary Decision Tree Approach

binary decision tree approach7

Incomplete Boolean Functions:

  • Boolean function F is incomplete if F can be calculated by finding at most n-1 attributes and knowing the value of the input string on those attributes
  • Example: F(111) = F(110) = F(101) = F(100) = 1, F(000) = F(001) = F(010) = F(011) = 0.
  • F(abc) is determined without knowing b (or c).
  • F is incomplete.

Binary Decision Tree Approach

binary decision tree approach8

Complete, Monotone Boolean Functions:

  • Stroud and Saeger: algorithm for enumerating binary decision trees implementing complete, monotone Boolean functions.
  • Feasible to implement up to n = 4.
  • n = 2:
    • There are 6 monotone Boolean functions.
    • Only 2 of them are complete, monotone
    • There are 4 binary decision trees for calculating these 2 complete, monotone Boolean functions.

Binary Decision Tree Approach

binary decision tree approach9

Complete, Monotone Boolean Functions:

  • n = 3:
    • 9 complete, monotone Boolean functions.
    • 60 distinct binary trees for calculating them
  • n = 4:
    • 114 complete, monotone Boolean functions.
    • 11,808 distinct binary decision trees for calculating them.
    • (Compare 1,079,779,602 BDTs for all Boolean functions)

Binary Decision Tree Approach

binary decision tree approach10

Complete, Monotone Boolean Functions:

  • n = 5:
    • 6894 complete, monotone Boolean functions
    • 263,515,920 corresponding binary decision trees.
  • Combinatorial explosion!
  • Need alternative approaches; enumeration not feasible!
  • (Even worse: compare 5 x 1018 BDTs corresponding to all Boolean functions)

Binary Decision Tree Approach

cost functions

Cost Functions

  • Stroud-Saeger method applies to more sophisticated cost models, not just cost = number of sensors in the BDT.
  • Using a sensor has a cost:
    • Unit cost of inspecting one item with it
    • Fixed cost of purchasing and deploying it
    • Delay cost from queuing up at the sensor station
  • Preliminary problem: disregard fixed and delay costs. Minimize unit costs.
cost functions1

Cost Functions

  • Simplification so far: Disregard characteristics of population of entities being inspected.
  • Only count number of observation (attribute) nodes in the tree.
  • Unit Cost Complication: How many nodes of the decision tree are actually visited during average container’s inspection? Depends on “distribution” of containers. In our early models, depends on probability of sensor errors and probability of bomb in a container.
cost functions delay costs

Tradeoff between fixed costs and delay costs: Add more sensors cuts down on delays.

  • Stochastic process of containers arriving
  • Distribution of delay times for inspections
  • Use queuing theory to find average delay times under different models

Cost Functions: Delay Costs

cost functions unit costs tree utilization

Cost Functions:Unit CostsTree Utilization

  • Complication: Assume cost depends on how many nodes of BDT are actually visited during an “average” container’s inspection. (This is sum of unit costs.)
  • Depends on characteristics of population of entities being inspected.
  • I.e., depends on “distribution” of containers.
  • In our early models, we assume we are given probability of sensor errors and probability of bomb in a container.
  • This allows us to calculate “expected” cost of utilization of the tree Cutil.
cost function used for evaluating the decision trees
Cost Function used for Evaluating the Decision Trees.

Later: Expect to analyze models for distribution of attributes of containers and more sophisticated analysis of expected cost of utilizing the tree, bringing in delay costs.

cost functions2

Cost Functions

  • Cost of false positive: Cost of additional tests.
    • If it means opening the container, it’s very expensive.
  • Cost of false negative:
    • Complex issue.
    • What is cost of a bomb going off in Manhattan?
cost functions sensor errors

One Approach to False Positives/Negatives:

  • Assume there can be Sensor Errors
  • Simplest model: assume that all sensors checking for attribute ai have same fixed probability of saying ai is 0 if in fact it is 1, and similarly saying it is 1 if in fact it is 0.
  • More sophisticated analysis later describes a model for determining probabilities of sensor errors.
  • Notation: X = state of nature (bomb or no bomb)
    • Y = outcome (of sensor or entire inspection process).

Cost Functions: Sensor Errors

probability of error for the entire tree

A

A

B

0

B

0

C

1

C

1

1

0

1

0

Probability of Error for The Entire Tree

State of nature is one (X = 1), presence of a bomb

State of nature is zero (X = 0), absence of a bomb

Probability of false positive

(P(Y=1|X=0))

for this tree is given by

Probability of false negative

(P(Y=0|X=1))

for this tree is given by

P(Y=1|X=0) = P(YA=1|X=0) * P(YB=1|X=0)

+ P(YA=1|X=0) *P(YB=0|X=0)* P(YC=1|X=0)

Pfalsepositive

P(Y=0|X=1) = P(YA=0|X=1) +

P(YA=1|X=1) *P(YB=0|X=1)*P(YC=0|X=1)

Pfalsenegative

cost function used for evaluating the decision trees1
Cost Function used for Evaluating the Decision Trees.

CTot =CFalsePositive *PFalsePositive + CFalseNegative *PFalseNegative+ Cutil

CFalsePositive is the cost of false positive (Type I error)

CFalseNegative is the cost of false negative (Type II error)

PFalsePositive is the probability of a false positive occurring

PFalseNegative is the probability of a false negative occurring

Cutil is the expected cost of utilization of the tree.

stroud saeger experiments
Stroud Saeger Experiments
  • Stroud-Saeger ranked all trees formed

from 3 or 4 sensors A, B, C and D

according to increasing tree costs.

  • Used cost function defined above.
  • Values used in their experiments:
    • CA = .25; P(YA=1|X=1) = .90; P(YA=1|X=0) = .10;
    • CB = 10; P(YC=1|X=1) = .99; P(YB=1|X=0) = .01;
    • CC = 30; P(YD=1|X=1) = .999; P(YC=1|X=0) = .001;
    • CD = 1; P(YD=1|X=1) = .95; P(YD=1|X=0) = .05;
    • Here, Ci = unit cost of utilization of sensor i.
  • Also fixed were: CFalseNegative, CFalsePositive, P(X=1)
stroud saeger experiments our sensitivity analysis
Stroud Saeger Experiments: Our Sensitivity Analysis
  • We have explored sensitivity of the Stroud-Saeger conclusions to variations in values of these three parameters.
  • We estimated high and low values for these parameters and did experiments with selection of values from the interval of possible values.
stroud saeger experiments our sensitivity analysis1
Stroud Saeger Experiments: Our Sensitivity Analysis
  • CFalseNegativewas varied between 25 million and 10 billion dollars
    • Low and high estimates of direct and indirect costs incurred due to a false negative.
  • CFalsePositive was varied between $180 and $720
    • Cost incurred due to false positive

(4 men * (3 -6 hrs) * (15 – 30 $/hr)

  • P(X=1)was varied between 1/10,000,000 and 1/100,000
stroud saeger experiments our sensitivity analysis2
Stroud Saeger Experiments: Our Sensitivity Analysis

n = 3 (use sensors A, B, C)

  • We chose one of the values from the interval of values and then explored the highest ranked tree as the other two were chosen at random in the interval of values. 10,000 experiments for each fixed value.
  • We looked for the variation in the top-ranked tree and how the top-rank related to choice of parameter values.
  • Very surprising results.
frequency of top ranked trees when c falsenegative and c falsepositive are varied
Frequency of Top-ranked Trees when CFalseNegative and CFalsePositive are Varied
  • 10,000 randomized experiments (randomly selected values of CFalseNegative and CFalsePositive from the specified range of values) for the median value of P(X=1).
  • The above graph has frequency counts of the number of experiments when a particular tree was ranked first or second, or third and so on.
  • Only three trees (7, 55 and 1) ever came first. 6 trees came second, 10 came third, 13 came fourth.
frequency of top ranked trees when c falsenegative and p x 1 are varied
Frequency of Top-ranked Trees when CFalseNegative and P(X=1) are Varied
  • 10,000 randomized experiments for the median value of CFalsePositive.
  • Only 2 trees (7 and 55) ever came first. 4 trees came second. 7 trees came third. 10 and 13 trees came 4th and 5th respectively.
frequency of top ranked trees when p x 1 and c falsepositive are varied
Frequency of Top-ranked Trees when P(X=1) and CFalsePositive are Varied
  • 10,000 randomized experiments for the median value of CFalseNegative.
  • Only 3 trees (7, 55 and 1) ever came first. 6 trees came second. 10 trees came third. 13 and 16 trees came 4th and 5th respectively.
stroud saeger experiments our sensitivity analysis 4 sensors
Stroud Saeger Experiments: Our Sensitivity Analysis: 4 Sensors
  • Second set of computer experiments: n = 4

(use sensors, A, B, C, D).

  • Same values as before.
  • Experiment 1: Fix values of two of CFalseNegative,CFalsePositive, P(X=1) and vary the third through their interval of possible values.
  • Experiment 2: Fix a value of one of CFalseNegative,CFalsePositive, P(X=1) and vary the other two.
  • Do 10,000 experiments each time.
  • Look for the variation in the highest ranked tree.
stroud saeger experiments our sensitivity analysis 4 sensors1
Stroud Saeger Experiments: Our Sensitivity Analysis: 4 Sensors
  • Experiment 1: Fix values of two of CFalseNegative,CFalsePositive, P(X=1) and vary the third.
c tot vs c falsenegative for ranked 1 trees trees 11485 9651 and 10129 349
CTot vs CFalseNegative for Ranked 1 Trees (Trees 11485(9651) and 10129(349))

Only two trees ever were ranked first, and one, tree 11485, was ranked first in 9651 out of 10,000 runs.

c tot vs c falsepositive for ranked 1 trees tree no 11485 10000
CTot vs CFalsePositive for Ranked 1 Trees (Tree no. 11485 (10000))

One tree, number 11485, was ranked first every time.

c tot vs p x 1 for ranked 1 trees tree no 11485 8372 10129 488 11521 1056
CTot vs P(X=1) for Ranked 1 Trees (Tree no. 11485(8372), 10129(488), 11521(1056))

Three trees dominated first place. Trees 10201(60), 10225(17) and 10153(7) also achieved first rank but with relatively low frequency.

stroud saeger experiments our sensitivity analysis 4 sensors2
Stroud Saeger Experiments: Our Sensitivity Analysis: 4 Sensors
  • Experiment 2: Fix the values of one of CFalseNegative,CFalsePositive, P(X=1) and vary the others.
slide51

Frequency of First Ranked Trees when Two Parameters (CFalseNegative and CFalsePositive) were Varied Keeping P(X=1) Constant at Randomly Selected Values.

Only 6 trees

ever ranked 1st

Tree 11521

dominated

  • 10,000 randomized experiments with randomly selected values ofP(X=1)
  • The experiments were repeated for 20 different randomly selected values ofP(X=1)
slide52

Frequency of First Ranked Trees when Two Parameters (CFalseNegative and P(X=1)) were Varied Keeping CFalsePositiveConstant at Randomly Selected Values.

Only 12 trees

ever ranked 1st

Tree 11521

dominated

  • 10,000 randomized experiments with randomly selected values of CFalsePositive
  • The experiments were repeated for 20 different randomly selected values of CFalsePositive
slide53

Frequency of First Ranked Trees when Two Parameters (P(X=1) and CFalsePositive) were Varied Keeping CFalseNegative Constant at Randomly Selected Values.

Only 7 trees

ever ranked1st

Tree 11521

dominated

  • 10,000 randomized experiments with randomly selected values of CFalseNegative
  • The experiments were repeated for 20 different randomly selected values of CFalseNegative
conclusions from sensitivity analysis
Conclusions from Sensitivity Analysis
  • Considerable lack of sensitivity to modification in parameters for trees using 3 or 4 sensors.
  • Very few optimal trees.
  • Very few Boolean functions arise among optimal and near-optimal trees.
modeling sensor errors

One Approach to Sensor Errors: Modeling Sensor Operation

  • Threshold Model:
    • Sensors have different discriminating power
    • Many use counts (e.g., Gamma radiation counts)
    • See if count exceeds

threshold

    • If so, say attribute is present.

Modeling Sensor Errors

modeling sensor errors1

Threshold Model:

    • Sensor i has discriminating power Ki, threshold Ti
    • Attribute present if counts exceed Ti
    • Seek threshold values that minimize the overall cost function, including costs of inspection, false positive/negative
    • Assume readings of category 0 containers follow a Gaussian distribution and similarly category 1 containers
    • Simulation approach

Modeling Sensor Errors

probability of error for individual sensors

Ti

P(i|X=1)

P(i|X=0)

Characteristics of a typical sensor

Probability of Error for Individual Sensors
  • For ith sensor, the type 1 (P(Yi=1|X=0)) and type 2 (P(Yi=0|X=1)) errors are modeled using Gaussian distributions.
    • State of nature X=0 represents absence of a bomb.
    • State of nature X=1 represents presence of a bomb.
    • i represents the outcome (count) of sensor i.
    • Σi is variance of the distributions
    • PD = prob. of detection, PF= prob. of false pos.

Ki

i

modeling sensor errors2
Modeling Sensor Errors

The probability of false positive for the ith sensor is computed as:

P(Yi=1|X=0) = 0.5 erfc[Ti/√2]

The probability of detection for the ith sensor is computed as:

P(Yi=1|X=1) = 0.5 erfc[(Ti-Ki)/(Σ√2)]

erfc = complementary error function erfc(x) = (1/2,x2)/sqrt()

The following experiments have been done using sensors A, B, C and using:

KA = 4.37; ΣA = 1

KB = 2.9;ΣB = 1

KC = 4.6;ΣC = 1

We then varied the individual sensor thresholds TA, TB and TC from -4.0 to +4.0 in steps of 0.4. These values were chosen since they gave us an “ROC curve” for the individual sensors over a complete rangeP(Yi=1|X=0) and P(Yi=1|X=1)

frequency of first ranked trees for variations in sensor thresholds
Frequency of First Ranked Trees for Variations in Sensor Thresholds
  • 68,921 experiments were conducted, as each Ti was varied through its entire range. (n = 3)
  • The above graph has frequency counts of the number of experiments when a particular tree was ranked first. There are 15 such trees. Tree 37 had the highest frequency of attaining rank one.
modeling sensor errors3
Modeling Sensor Errors
  • A number of trees ranking first in other experiments also ranked first here.
  • Similar results in case of n = 4.
  • 4,194,481 experiments.
  • 244 different trees were ranked first in at least one experiment.
  • Trees ranked first in other experiments also frequently appeared first here.
  • Conclusion: considerable insensitivity to change of threshold.
new approaches to optimum threshold computation
New Approaches to Optimum Threshold Computation
  • Extensive search over a range of thresholds has some practical drawbacks:
    • Large number of threshold values for every sensor
    • Large step size
    • Grows exponentially with the number of sensors (computationally infeasible for n > 4)
  • A non-linear optimization approach proves more satisfactory:
    • A combination of Gradient Descent and modified Newton’s methods
optimum threshold computation
Optimum Threshold Computation
  • We implemented standard approaches including:
    • Gradient Descent Method
    • Newton’s Method in Optimization
  • In Gradient Descent Method we have to adjust the “step size” heuristically, while the Newton method uses the Hessian matrix to adjust the step size automatically

Though the computation of the Hessian matrix is a little expensive and tedious, the method quickly converges in fewer iterations than the gradient descent method

problems with standard approaches
Problems with Standard Approaches
  • Gradient Descent Method:Setting the value of the step size heuristically, since:
    • Too small step size results in large number of iterations to reach the minimum
    • Too big step size results in skipping the minimum
  • Newton’s Method:
    • The convergence depends largely on the starting point. This method occasionally drifts in the wrong direction and hence fails to converge.
  • Solution: combination of gradient descent and Newton’s methods
a combined method
A Combined Method
  • The Hessian matrix H f(τ) might not be a well-conditioned, positive definite matrix
  • We explored alternative approaches to computing positive definite approximations to H f(τ)
    • Modified Cholesky Algorithms
  • If the Hessian matrix H f(τ) is ill-conditioned, we take small steps towards the minimum using the gradient descent method until it becomes well conditioned
results threshold optimization
Results: Threshold Optimization
  • Costs of false positive CFalsePositive and false negative CFalseNegative and prior probability of occurrence of a bad container, P(X=1),were fixed as means of the min and max values given by Stroud and Saeger (same as we used in earlier experiments)
  • We were able to converge to a (hopefully-close-to-minimum) cost every time with a modest number of iterations. For example:
    • For 3 sensors, it took an average of 0.081 seconds (as opposed to 0.387 seconds using extensive search) to converge to a cost for all 114 trees
    • For 4 sensors, it took an average of 0.196 seconds (as opposed to more than 2 seconds using extensive search) to converge to a cost for all 66,936 trees
  • In each case, min cost attained with new algorithm was lower, and often much lower, than that attained with extensive search.
results threshold optimization1
Results: Threshold Optimization

Many times the minimum obtained using the optimization method was considerably less than the one from the extensive search technique.

new idea searching through a generalized tree space
New Idea: Searching through a Generalized Tree Space
  • We expand the space of trees from those corresponding to Stroud and Saeger’s “Complete” and “Monotonic” Boolean Functions to “Complete and Monotonic BDTs”, because…
  • Unlike Boolean functions, BDTs may not consider all sensor outputs to give a final decision
  • Advantages:
    • Allow more potentially useful trees to participate in the analysis
    • Help define an irreducible tree space for search operations
    • Move focus from Boolean Functions to Binary Decision Trees
revisiting monotonicity

a b b

b c a c c a

0 1 0 1 0 c a 1 0 a 1 c

0 1 1 0 0 1 0 1

  • a b c F(abc)
  • 0 0 0 0
  • 0 0 1 0
  • 0 1 0 1
  • 0 1 1 1
  • 1 0 0 0
  • 1 0 1 1
  • 1 1 0 0
  • 1 1 1 1

b c

c c b b

0 a a 1 0 a a 1

0 1 1 0 1 0 0 1

c c

a b b a

b 0 a 1 0 a b 1

0 1 0 1 1 0 0 1

Revisiting Monotonicity
  • Monotonic Decision Trees
    • A binary decision tree will be called monotonic if all the left leaves are class “0” and all the right leaves are class “1”.
  • Example:

All these trees correspond to same monotonic Boolean function

Only one is a monotonic BDT.

revisiting completeness

a

b c

c 1 b 1

0 1 0 1

a

c b

b 1 c 1

0 1 0 1

  • a b c F(abc)
  • 0 0 0 0
  • 0 0 1 1
  • 0 1 0 1
  • 0 1 1 1
  • 1 0 0 0
  • 1 0 1 1
  • 1 1 0 1
  • 1 1 1 1

a

c c

b 1 b 1

0 1 0 1

a

b b

c 1 c 1

0 1 0 1

Revisiting Completeness
  • Complete Decision Trees
    • A binary decision tree will be called complete if every sensor occurs at least once in the tree and, at any non-leaf node in the tree, its left and right sub-trees are not identical.
  • Example:
the cm tree space
The CM Tree Space

complete, monotonic BDTs

tree neighborhood and tree space
Tree Neighborhood and Tree Space
  • Define tree neighborhood by giving operations for moving from one tree in CM Tree Space to another.
  • Structure-based methods
  • Classifications-based methods
  • We choose structure-based neighborhood methods because the cost of a BDT depends more on its structure
  • Small changes in tree structure don’t significantly affect cost, and…
  • All BDTs with the same Boolean functions may differ a lot in cost
search operations in tree space

a

b c

0 c d 1

d 1 0 1

0 1

a

b c

0 c d 1

d 1 b 1

0 1 0 1

SPLIT

Search Operations in Tree Space
  • Split

Pick a leaf node and replace it with a sensor that is not already present in that branch, and then insert arcs from that sensor to 0 and to 1.

search operations

a

b c

0 c d 1

d 1 0 1

0 1

a

b c

0 d d 1

c 1 0 1

0 1

SWAP

Search Operations
  • Swap

Pick a non-leaf node in the tree and swap it with its parent node such that the new tree is still monotonic and complete and no sensor occurs more than once in any branch.

search operations1
Search Operations
  • Merge

Pick a parent node of two leaf nodes and make it a leaf node by collapsing the two leaf nodes below it, or pick a parent node with one leaf node, collapse both the parent node and its one leaf node, and shift the sub-tree up in the tree by one level.

a

b c

0 c d 1

0 1 0 1

a

b c

0 c d 1

d 1 0 1

0 1

a

b c

0 d d 1

0 1 0 1

MERGE

search operations2

a

b c

0 c d 1

d 1 0 1

0 1

a

b c

0 c b 1

d 1 0 1

0 1

REPLACE

Search Operations
  • Replace

Pick a node with a sensor occurring more than once in the tree and replace it with any other sensor such that no sensor occurs more than once in any branch.

tree neighborhood and tree space1
Tree Neighborhood and Tree Space
  • Define tree neighborhood by using these four operations for moving from one tree in CM Tree Space to another.
  • Irreducibility
    • Theorem: Any tree in the CM tree space can be reached from any other tree by using these neighborhood operations repetitively
    • An irreducible CM tree space helps “search” for the cheapest trees using neighborhood operations
tree space traversal
Tree Space Traversal
  • Greedy Search
      • Randomly start at any tree in the CM tree space
      • Find its neighboring trees using the above operations
      • Move to the neighbor with the lowest cost
      • Iterate until we find a minimum
    • The CM Tree space is highly multi-modal (more than one minimum)!
    • Therefore, we implement a stochastic search algorithm with simulated annealing to find the best tree
tree space traversal1
Tree Space Traversal
  • Stochastic Search
    • Randomly start at any tree in CM space
    • Find its neighboring trees, and evaluate each one for its total cost
    • Select next move according to a probability distribution over the neighboring trees
  • To deal with the multimodality of the tree space, we introduce Simulated Annealing:
    • Make more random jumps initially, gradually decrease the randomness and finally converge at the overall minimum
tree space traversal2
Tree Space Traversal
  • Stochastic Search
    • Randomly start at any tree in CM space
    • Find its neighboring trees, and evaluate each one for its total cost
    • Select move according to a probability distribution over neighboring trees proportional to the ratio of the costs of two trees
    • Specifically, if we are at the ith tree τi, then the probability of going to its kth neighbor, denoted τik, is given by
    • where f(τj) and f(τij) are the costs of trees τi and τij, respectively and ni is the number of trees in the neighborhood of τij
    • so-called “temperature” t is initiated to one and lowered in discrete unequal steps after every m hops until we reach a minimum
results searching cm tree space
Results: Searching CM Tree Space
  • We were able to perform experiments for 3, 4 and 5 sensors, successfully
  • Results show improvement compared to the extensive search method. E.g., for 4 sensors (66,936 trees)
    • 100 different experiments were performed
    • Each experiment was started 10 times randomly at some tree and chains were formed by making stochastic moves in the neighborhood, until we find a local minimum
    • Only 4890 trees were examined on average for every experiments
    • Global minimum was found 82 times while the second best tree was found 10 times
    • The method found trees that were less costly than those found by earlier searches of BDTs corresponding to complete, monotonic Boolean functions.
genetic algorithms based approach
Genetic Algorithms-based Approach
  • Structure-based neighborhood moves allow very short moves only. Therefore,…
  • Techniques like Genetic Algorithms and Evolutionary Techniques may suggest ways for getting more efficiently to better trees, given a population of good trees
genetic algorithms based approach1
Genetic Algorithms-based Approach
  • Started implementing genetic algorithms- based techniques for tree space traversal
  • Basically, we try to get “better” trees from the current population of “good” trees using the basic genetic operations on them:
    • Selection
    • Crossover
    • Mutation
  • Here, “better” decision trees correspond to lower cost decision trees than the ones in the current population (“good”).
genetic algorithms based approach2
Genetic Algorithms-based Approach
  • Therefore, as new trees are generated, the size of the genePool keeps increasing.
  • Selection:
    • Select an initial population of trees, bestPop, randomly out of the CM tree space
    • Always keep a population of N best trees for further crossover and mutation operations
  • Crossover:
    • Performed between every pair of trees in bestPop.
    • For each crossover operation between two trees τi and τj, we randomly select nodes and exchange the subtrees
genetic algorithms based approach3
Genetic Algorithms-based Approach
  • We randomly perform crossover operation k times between a pair of trees
  • However, we impose certain restrictions to ensure that resultant trees lie in CM tree space
genetic algorithms based approach4
Genetic Algorithms-based Approach
  • Mutation:
    • Performed after every m generations of the algorithm
    • We do two types of mutations:
      • Generating all the neighboring trees of the current best population of trees using the four operations used in the stochastic search method and put them into the gene pool
      • Replacing some fraction of the trees of bestPop with random samples from the CM tree space which are not in the gene pool
    • Only ~1600 trees had to be examined to obtain the 10 best trees for 4 sensors!
discussion and future work
Discussion and Future Work
  • Very few optimal trees; optimality insensitive to changes in parameters.
  • The extensive search techniques become practically infeasible beyond a very small number of sensors
  • The new threshold optimization algorithms provide faster ways to arrive at a low tree cost; cost is lower and often much lower than in extensive search
  • The irreducible tree space helps us to “search” for the best trees rather than evaluating all the trees for their cost
  • The stochastic search algorithm allows us to search for optimum inspection schemes beyond 4 sensors successfully
discussion and future work1
Discussion and Future Work
  • Future work: Because of the rapid growth in number of trees in CM Tree Space when the number of sensors grows, it is necessary to try to reduce the number of trees we need to search through.
  • A notion of tree equivalence could be incorporated when the number of sensors go beyond 5 or 6
  • We hope that incorporating this into our model will enable us to extend our model to a large number of sensors
tree equivalence and tree irreducibility
Tree Equivalence and Tree Irreducibility
  • A decision tree in CM tree space can be exactly equivalent to another decision tree in the CM tree space:
  • This property is called transposition and the two trees are called transposes of each others
tree equivalence and tree irreducibility1
Tree Equivalence and Tree Irreducibility
  • The problem is that the size of the equivalence class (number of transposes for a given tree) increases exponentially with the number of sensors.
  • Therefore, we need to define a “canonical” representation of the equivalence class and deal with the equivalence class as a whole rather than treating each individual tree separately
tree irreducibility

a

b b

c d c d

0 1 0 c 0 d 0 c

0 1 0 1 0 1

b

a a

c c d d

0 1 0 d 0 c 0 c

0 1 0 1 0 1

Tree Irreducibility
  • Another problem is that in addition to being complete and monotonic, a tree should also be irreducible (no non-useful sensor). For example:
discussion and future work3

Future work: More than two values of an attribute

      • (present, absent, present with probability > 75%, absent with probability at least 75%)
      • (ok, not ok, ok with probability > 99%, ok with probability between 95% and 99%)
  • Future work: In the Boolean function model: inferring the Boolean function from observations (partially defined Boolean functions)

Discussion and Future Work

discussion and future work4

Future Work: Explain why conclusions are so insensitive to variation in parameter values.

  • Future Work: Explore the structure of the optimal trees and compare the different optimal trees.
  • Future Work: Develop methods for
  • approximating the optimal tree.

Discussion and Future Work

Pallet vacis

closing remark

Recall that the “cost” of inspection includes the cost of failure, including failure to foil a terrorist plot.

  • There are many ways to lower the total “cost” of inspection:
    • Use more efficient

orders of inspection.

    • Find ways to inspect

more containers.

    • Find ways to cut down

on delays at inspection lanes.

Closing Remark

research team
Research Team
  • Tayfur Altiok, Rutgers, Ind. & Systems Engineering
  • Saket Anand, Rutgers, ECE graduate student
  • Endre Boros, Rutgers, Operations Research
  • Elsayed Elsayed, Rutgers, Ind. & Systems Engineering
  • Liliya Fedzhora, Rutgers, Operations Res. grad. student
  • Paul Kantor, Rutgers, Schl. of Infor. & Library Studies
  • Abdullah Karaman, Rutgers Ind. & Syst. Eng. grad. student
  • David Madigan, Rutgers, Statistics
  • Richard Mammone, Rutgers, Center for Advanced Information Processing
  • Ben Melamed, Rutgers Business School
  • Sushil Mittal, Rutgers, ECE graduate student
  • S. Muthukrishnan, Rutgers, Computer Science
  • Saumitr Pathek, Rutgers ECE graduate student
  • Richard Picard, Los Alamos, Statistical Sciences Group
  • Fred Roberts, Rutgers, DIMACS Center
  • Kevin Saeger, Los Alamos, Homeland Security
  • Christina Schroepfer, Rutgers, Ind. & Syst. Eng. Grad. student
  • Phillip Stroud, Los Alamos, Systems Engineering and Integration Group
  • Minge Xie, Rutgers, Statistics
  • Hao Zhang, Rutgers Ind. & Systems Eng., graduate student
slide97

Collaborators on Work Described:

  • Saket Anand
  • David Madigan
  • Richard Mammone
  • Sushil Mittal*
  • Saumitr Pathak
  • Research Support:
  • Office of Naval Research
  • National Science Foundation
  • New support: Domestic Nuclear Detection Office
  • Los Alamos National Laboratory:
  • Rick Picard
  • Kevin Saeger
  • Phil Stroud