Algorithms for Port of Entry Inspection for WMDs Fred S. Roberts DIMACS Center, Rutgers University
Port of Entry Inspection Algorithms • Goal: Find ways to intercept illicit nuclear materials and weapons destined for the U.S. via the maritime transportation system • Currently inspecting only a small % of containers arriving at ports • Even inspecting 8% of containers in Port of NY/NJ might bring international trade to a halt (Larrabbee 2002)
Port of Entry Inspection Algorithms • Aim: Develop decision support algorithms that will help us to “optimally” intercept illicit materials and weapons subject to limits on delays, manpower, and equipment • Find inspection schemes that minimize total “cost” including “cost” of false positives and false negatives • Image: Mobile Vacis, a truck-mounted gamma ray imaging system
Stream of containers arrives at a port • The Decision Maker’s Problem: • Which to inspect? • Which inspections next based on previous results? • Approach: • “decision logics” • combinatorial optimization methods • Builds on ideas of Stroud and Saeger at Los Alamos National Laboratory • Need for new models and methods Sequential Decision Making Problem
Such sequential diagnosis problems arise in many areas: • Communication networks (testing connectivity, paging cellular customers, sequencing tasks, …) • Manufacturing (testing machines, fault diagnosis, routing customer service calls, …) • Artificial intelligence/CS (optimal derivation strategies in knowledge bases, best-value satisficing search, coding decision trees, …) • Medicine (diagnosing patients, sequencing treatments, …) Sequential Diagnosis Problem
Containers arriving are to be classified into categories. • Simple case: 0 = “ok”, 1 = “suspicious” • Inspection scheme: specifies which inspections are to be made based on previous observations Sequential Decision Making Problem
Containers have attributes, each in a number of states • Sample attributes: • Levels of certain kinds of chemicals or biological materials • Whether or not there are items of a certain kind in the cargo list • Whether cargo was picked up in a certain port Sequential Decision Making Problem
Currently used attributes: • Does ship’s manifest set off an “alarm”? • What is the neutron or Gamma emission count? Is it above threshold? • Does a radiograph image come up positive? • Does an induced fission test come up positive? Sequential Decision Making Problem Gamma ray detector
We can imagine many other attributes • This project is concerned with general algorithmic approaches. • We seek a methodology not tied to today’s technology. • Detectors are evolving quickly. Sequential Decision Making Problem
Simplest Case: Attributes are in state 0 or 1 • Then: Container is a binary string like 011001 • So: Classification is a decision function F that assigns each binary string to a category. • Example: 011001 → F(011001). If attributes 2, 3, and 6 are present, assign container to category F(011001). Sequential Decision Making Problem
If there are two categories, 0 and 1, decision function F is a boolean function. • Example: • F(000) = F(111) = 1, F(abc) = 0 otherwise • This classifies a container as positive iff it has none of the attributes or all of them. Sequential Decision Making Problem
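A decision function of this kind can be written down directly as a lookup on the attribute string. The sketch below encodes the example F from this slide (positive iff none or all of the three attributes are present); the name `F` follows the slide, and the string encoding is only an illustrative choice.

```python
from itertools import product

def F(bits: str) -> int:
    """Example decision function: 1 iff no attribute or every attribute is present."""
    return 1 if bits in ("000", "111") else 0

# Print the full truth table over the three binary attributes.
for combo in product("01", repeat=3):
    s = "".join(combo)
    print(s, F(s))
```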
Sequential Decision Making Problem • Given a container, test its attributes until we know enough to calculate the value of F. • An inspection scheme tells us in which order to test the attributes to minimize cost. • Even this simplified problem is hard computationally.
This assumes F is known. • Simplifying assumption: Attributes are independent. • At any point we stop inspecting and output the value of F based on outcomes of inspections so far. • Complications: There may be precedence relations in the components (e.g., can’t test attribute a4 before testing a6). • Or: cost may depend on attributes tested before. • F may depend on variables that cannot be directly tested or for which tests are too costly. Sequential Decision Making Problem
Such problems are hard computationally. • There are many possible boolean functions F. • Even if F is fixed, problem of finding a good classification scheme (to be defined precisely below) is NP-complete. • Several classes of functions F allow for efficient inspection schemes: • k-out-of-n systems • Certain series-parallel systems • Read-once systems • “regular” systems • Horn systems Sequential Decision Making Problem
n types of sensors measure presence or absence of the n attributes. • Many copies of each sensor. • Complication: different characteristics of sensors. • Entities come for inspection. • Which sensor of a given type to use? • Think of inspection lanes and queues. • Besides efficient inspection schemes, could decrease costs by: • Buying more sensors • Changing allocation of containers to sensor lanes. Sensors and Inspection Lanes
Sensors measure presence/absence of attributes. • Binary Decision Tree: • Nodes are sensors or categories (0 or 1) • Two arcs exit from each sensor node, labeled left and right. • Take the right arc when sensor says the attribute is present, left arc otherwise Binary Decision Tree Approach
Reach category 1 from the root only through the path a0 to a1 to 1. • Container is classified in category 1 iff it has both attributes a0 and a1 . • Corresponding boolean function F(11) = 1, F(10) = F(01) = F(00) = 0. Binary Decision Tree Approach Figure 1
Reach category 1 from the root by: • a0 L to a1 R to a2 R to 1 or • a0 R to a2 R to 1 • Container classified in category 1 iff it has • a1 and a2 and not a0 or • a0 and a2 and possibly a1. • Corresponding boolean function F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise. Binary Decision Tree Approach Figure 2
This binary decision tree corresponds to the same boolean function • F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise. • However, it has one fewer observation node ai. So, it is more efficient if all observations are equally costly and equally likely. Binary Decision Tree Approach Figure 3
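The comparison between two trees for the same boolean function can be sketched in code. The first tree follows the paths described for Figure 2, with every leaf not named in the text assumed to be 0. The second tree is a hypothetical three-node tree, not necessarily the one drawn in Figure 3; the tuple encoding and the function names are illustrative assumptions.

```python
from itertools import product

# A tree is a leaf category (0 or 1) or a tuple (attribute_index, left, right).
# Left arc = attribute absent, right arc = attribute present.
FIG2_TREE = (0, (1, 0, (2, 0, 1)), (2, 0, 1))   # a0 at root, four sensor nodes
SMALLER_TREE = (2, 0, (0, (1, 0, 1), 1))        # hypothetical: a2 at root, three sensor nodes

def classify(tree, bits):
    """Follow arcs from the root until a leaf category is reached."""
    while isinstance(tree, tuple):
        attr, left, right = tree
        tree = right if bits[attr] else left
    return tree

def count_sensor_nodes(tree):
    """Number of observation (sensor) nodes in the tree."""
    if not isinstance(tree, tuple):
        return 0
    _, left, right = tree
    return 1 + count_sensor_nodes(left) + count_sensor_nodes(right)

def truth_table(tree, n):
    """Boolean function computed by the tree, as a dict over all n-bit inputs."""
    return {bits: classify(tree, bits) for bits in product((0, 1), repeat=n)}

same_function = truth_table(FIG2_TREE, 3) == truth_table(SMALLER_TREE, 3)
print(same_function, count_sensor_nodes(FIG2_TREE), count_sensor_nodes(SMALLER_TREE))
```

Counting nodes is only the simplest cost measure; it ignores how often each node is actually visited.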
Even if the boolean function F is fixed, the problem of finding the “optimal” binary decision tree for it is very hard (NP-complete). • For small n = number of attributes, can try to solve it by brute force enumeration. • Even for n = 5, not practical. (n = 4 at Port of Long Beach-Los Angeles) Binary Decision Tree Approach Port of Long Beach
Promising Approaches: • Heuristic algorithms, approximations to optimal. • Special assumptions about the boolean function F. • Example: For “monotone” boolean functions, integer programming formulations give promising heuristics. • Stroud and Saeger enumerate all “complete,” monotone boolean functions and calculate the least expensive corresponding binary decision trees. Binary Decision Tree Approach
Monotone Boolean Functions: • Given two strings x1x2…xn, y1y2…yn • Suppose that xi ≥ yi for all i implies that F(x1x2…xn) ≥ F(y1y2…yn). • Then we say that F is monotone. • Then 11…1 has highest probability of being in category 1. Binary Decision Tree Approach
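Monotonicity can be checked mechanically for small n. The sketch below uses the fact that it is enough to compare inputs that differ in a single attribute, since any pair x ≥ y is connected by a chain of single-attribute increases. The helper name `is_monotone` and the two example functions are illustrative.

```python
from itertools import product

def is_monotone(F, n):
    """True if raising any single attribute from 0 to 1 never lowers F."""
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]   # x with attribute i switched on
                if F(x) > F(y):
                    return False
    return True

both_present = lambda x: x[0] & x[1]                  # F(11) = 1, else 0
none_or_all = lambda x: int(sum(x) in (0, len(x)))    # F(000) = F(111) = 1, else 0

print(is_monotone(both_present, 2))   # monotone
print(is_monotone(none_or_all, 3))    # not monotone: F(000) = 1 but F(100) = 0
```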
Incomplete Boolean Functions: • Boolean function F is incomplete if F can be calculated by finding at most n-1 attributes and knowing the value of the input string on those attributes • Example: F(111) = F(110) = F(101) = F(100) = 1, F(000) = F(001) = F(010) = F(011) = 0. • F(abc) is determined without knowing b (or c). • F is incomplete. Binary Decision Tree Approach
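One way to test for incompleteness, assuming it means that at least one attribute never affects the value of F, is to look for attributes whose flip leaves F unchanged on every input. The helper names below are illustrative; `example` encodes the function from this slide, which depends only on the first attribute.

```python
from itertools import product

def irrelevant_attributes(F, n):
    """Indices i such that flipping attribute i never changes the value of F."""
    result = []
    for i in range(n):
        if all(F(x) == F(x[:i] + (1 - x[i],) + x[i + 1:])
               for x in product((0, 1), repeat=n)):
            result.append(i)
    return result

def is_complete(F, n):
    """Complete = every attribute matters for at least one input."""
    return not irrelevant_attributes(F, n)

example = lambda x: x[0]          # F(1bc) = 1, F(0bc) = 0
print(irrelevant_attributes(example, 3), is_complete(example, 3))
```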
Complete, Monotone Boolean Functions: • Stroud and Saeger: algorithm for enumerating binary decision trees implementing complete, monotone boolean functions. • Feasible to implement up to n = 4. • n = 2: • There are 6 monotone boolean functions. • Only 2 of them are complete, monotone • There are 4 binary decision trees for calculating these 2 complete, monotone boolean functions. Binary Decision Tree Approach
Complete, Monotone Boolean Functions: • n = 3: • 9 complete, monotone boolean functions. • 60 distinct binary trees for calculating them • n = 4: • 114 complete, monotone boolean functions. • 11,808 distinct binary decision trees for calculating them. Binary Decision Tree Approach
Complete, Monotone Boolean Functions: • n = 5: • 6894 complete, monotone boolean functions • 263,515,920 corresponding binary decision trees. • Combinatorial explosion! • Need alternative approaches; enumeration not feasible! Binary Decision Tree Approach
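The function counts quoted for small n can be reproduced by exhaustive search, which also makes the combinatorial explosion concrete: there are 2^(2^n) boolean functions to scan. This is a brute-force sketch, not the Stroud and Saeger enumeration algorithm, and it counts functions only, not decision trees. It encodes each function as a truth-table bitmask, an illustrative choice.

```python
def count_complete_monotone(n):
    """Count boolean functions of n attributes that are monotone and depend on every attribute."""
    size = 1 << n                          # number of input strings
    count = 0
    for table in range(1 << size):         # each function as a truth-table bitmask
        value = lambda x: (table >> x) & 1
        # Monotone: switching any absent attribute on never lowers the value.
        monotone = all(value(x) <= value(x | (1 << i))
                       for x in range(size) for i in range(n)
                       if not x & (1 << i))
        if not monotone:
            continue
        # Complete: every attribute changes the value for at least one input.
        complete = all(any(value(x) != value(x ^ (1 << i)) for x in range(size))
                       for i in range(n))
        count += complete
    return count

for n in (2, 3, 4):
    print(n, count_complete_monotone(n))
```

For n = 5 the same loop would have to scan 2^32 truth tables, so plain enumeration stops being practical at that point.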
Cost Functions • Above analysis: Only uses number of sensors • Using a sensor has a cost: • Unit cost of inspecting one item with it • Fixed cost of purchasing and deploying it • Delay cost from queuing up at the sensor station • Preliminary problem: disregard fixed and delay costs. Minimize unit costs.
Cost Functions • Simplification so far: Disregard characteristics of population of entities being inspected. • Only count number of observation (attribute) nodes in the tree. • Unit Cost Complication: How many nodes of the decision tree are actually visited during average container’s inspection? Depends on “distribution” of containers. In our early models, will depend on probability of sensor errors and probability of bomb in a container.
Tradeoff between fixed costs and delay costs: Adding more sensors cuts down on delays. • Stochastic process of containers arriving • Distribution of delay times for inspections • Use queuing theory to find average delay times under different models Cost Functions: Delay Costs
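As one possible instance of such a queuing model, the sketch below computes the mean time a container waits in line when identical sensors serve a single queue. It assumes Poisson arrivals and exponential inspection times (the standard M/M/c model with the Erlang C formula). The slides do not commit to this model, and the function name and rates are illustrative.

```python
from math import factorial

def mmc_mean_wait(arrival_rate, service_rate, servers):
    """Mean time in queue (excluding service) for an M/M/c queue."""
    a = arrival_rate / service_rate            # offered load
    rho = a / servers                          # utilization per sensor
    if rho >= 1:
        raise ValueError("unstable queue: arrivals exceed total service capacity")
    tail = a**servers / factorial(servers)
    head = sum(a**k / factorial(k) for k in range(servers))
    p_wait = tail / ((1 - rho) * head + tail)  # Erlang C: probability an arrival must wait
    return p_wait / (servers * service_rate - arrival_rate)

# Illustrative rates: 0.5 containers per minute arriving, 1 inspection per minute per sensor.
for c in (1, 2, 3):
    print(c, mmc_mean_wait(0.5, 1.0, c))
```

Comparing the output across sensor counts gives the delay side of the fixed-cost versus delay-cost tradeoff.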
Cost Functions • Cost of false positive: Cost of additional tests. • If it means opening the container, it’s very expensive. • Cost of false negative: • Complex issue. • What is cost of a bomb going off in Manhattan?
The cost of each binary decision tree corresponding to a complete, monotone boolean function is calculated. • The optimum tree is selected. • Optimum depends on assumptions about sensor errors, costs of false positive and false negative outcomes, and unit, fixed, and delay costs for each sensor. The Brute Force Approach
One Approach to False Positives/Negatives: • Assume there can be Sensor Errors • Simplest model: assume that all sensors checking for attribute ai have same fixed probability of saying ai is 0 if in fact it is 1, and similarly saying it is 1 if in fact it is 0. • More sophisticated analysis later describes a model for determining probabilities of sensor errors. • Notation: X = state of nature (bomb or no bomb) • Y = outcome (of sensor or entire inspection process). Cost Functions: Sensor Errors
Probability of Error for the Entire Tree • [Figure: the same decision tree over sensors A, B, and C, drawn once for each state of nature] • State of nature is one (X = 1): presence of a bomb • State of nature is zero (X = 0): absence of a bomb • Probability of false positive for this tree: P(Y=1|X=0) = P(YA=1|X=0) * P(YB=1|X=0) + P(YA=1|X=0) * P(YB=0|X=0) * P(YC=1|X=0) • Probability of false negative for this tree: P(Y=0|X=1) = P(YA=0|X=1) + P(YA=1|X=1) * P(YB=0|X=1) * P(YC=0|X=1)
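The two formulas can be obtained by propagating sensor error probabilities down the tree. The sketch below assumes the tree implied by the formulas (A at the root, A = 0 gives 0, A = 1 leads to B, B = 1 gives 1, B = 0 leads to C) and assumes sensor readings are conditionally independent given X, as the products suggest. The tuple encoding, function name, and numeric rates are illustrative.

```python
# A tree is a leaf category (0 or 1) or a tuple (sensor, left, right);
# left = sensor reads 0, right = sensor reads 1.
TREE = ("A", 0, ("B", ("C", 0, 1), 1))

def prob_tree_says_one(tree, p_one):
    """P(Y = 1) for the whole tree, where p_one[s] = P(Y_s = 1 | X) for each sensor s."""
    if not isinstance(tree, tuple):
        return float(tree)
    sensor, left, right = tree
    p = p_one[sensor]
    return (1 - p) * prob_tree_says_one(left, p_one) + p * prob_tree_says_one(right, p_one)

# Illustrative per-sensor rates: P(Y_s = 1 | X = 0) and P(Y_s = 1 | X = 1).
RATE_X0 = {"A": 0.10, "B": 0.01, "C": 0.001}
RATE_X1 = {"A": 0.90, "B": 0.99, "C": 0.999}

p_false_positive = prob_tree_says_one(TREE, RATE_X0)        # P(Y=1 | X=0)
p_false_negative = 1 - prob_tree_says_one(TREE, RATE_X1)    # P(Y=0 | X=1)
print(p_false_positive, p_false_negative)
```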
Cost Function used for Evaluating the Decision Trees. • CTot = CFalsePositive * PFalsePositive + CFalseNegative * PFalseNegative + Cutil • CFalsePositive is the cost of false positive (Type I error) • CFalseNegative is the cost of false negative (Type II error) • PFalsePositive is the probability of a false positive occurring • PFalseNegative is the probability of a false negative occurring • Cutil is the cost of utilization of the tree. • The error probability of the entire tree is computed from the error probabilities of the individual sensors.
Cost Function used for Evaluating the Decision Trees. • Cutil is the cost of utilization of the tree. • Simplest assumption: Cutil is the expected sum of unit costs associated with the tree. • Count unit cost of each sensor each time it is used. • Use P(X = 1) and probability of errors at each type of sensor to calculate expected value. • Later: models for distribution of attributes of containers and more sophisticated analysis of expected cost of utilizing the tree, bringing in delay costs.
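A minimal sketch of the total cost calculation follows. It makes two assumptions that the slides leave open: PFalsePositive and PFalseNegative are taken as joint probabilities, P(X=0)·P(Y=1|X=0) and P(X=1)·P(Y=0|X=1), and Cutil is the expected unit cost averaged over both states of nature. The tree, rates, unit costs, and dollar figures are illustrative.

```python
# A tree is a leaf category (0 or 1) or a tuple (sensor, left, right);
# left = sensor reads 0, right = sensor reads 1.
TREE = ("A", 0, ("B", ("C", 0, 1), 1))
RATE_X0 = {"A": 0.10, "B": 0.01, "C": 0.001}   # P(Y_s = 1 | X = 0), illustrative
RATE_X1 = {"A": 0.90, "B": 0.99, "C": 0.999}   # P(Y_s = 1 | X = 1), illustrative
UNIT_COST = {"A": 0.25, "B": 10.0, "C": 30.0}  # cost per use of each sensor, illustrative

def prob_tree_says_one(tree, p_one):
    """P(Y = 1) for the whole tree given per-sensor P(Y_s = 1 | X)."""
    if not isinstance(tree, tuple):
        return float(tree)
    sensor, left, right = tree
    p = p_one[sensor]
    return (1 - p) * prob_tree_says_one(left, p_one) + p * prob_tree_says_one(right, p_one)

def expected_unit_cost(tree, p_one, cost):
    """Expected sum of unit costs of the sensors actually used on one container."""
    if not isinstance(tree, tuple):
        return 0.0
    sensor, left, right = tree
    p = p_one[sensor]
    return (cost[sensor]
            + (1 - p) * expected_unit_cost(left, p_one, cost)
            + p * expected_unit_cost(right, p_one, cost))

def total_cost(tree, p_x1, c_false_pos, c_false_neg):
    """CTot = CFalsePositive*PFalsePositive + CFalseNegative*PFalseNegative + Cutil."""
    p_fp = (1 - p_x1) * prob_tree_says_one(tree, RATE_X0)      # assumed joint probability
    p_fn = p_x1 * (1 - prob_tree_says_one(tree, RATE_X1))      # assumed joint probability
    c_util = ((1 - p_x1) * expected_unit_cost(tree, RATE_X0, UNIT_COST)
              + p_x1 * expected_unit_cost(tree, RATE_X1, UNIT_COST))
    return c_false_pos * p_fp + c_false_neg * p_fn + c_util

print(total_cost(TREE, p_x1=1e-6, c_false_pos=450.0, c_false_neg=1e9))
```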
Stroud Saeger Experiments • Stroud-Saeger ranked all trees formed from 3 or 4 sensors A, B, C and D according to increasing tree costs. • Used cost function defined above. • Values used in their experiments: • CA = .25; P(YA=1|X=1) = .90; P(YA=1|X=0) = .10; • CB = 10; P(YB=1|X=1) = .99; P(YB=1|X=0) = .01; • CC = 30; P(YC=1|X=1) = .999; P(YC=1|X=0) = .001; • CD = 1; P(YD=1|X=1) = .95; P(YD=1|X=0) = .05; • Here, Ci = cost of utilization of sensor i. • Also fixed were: CFalseNegative, CFalsePositive, P(X=1)
Stroud Saeger Experiments: Our Sensitivity Analysis • We have explored sensitivity of the Stroud-Saeger conclusions to variations in values of these three parameters. • We estimated high and low values for these parameters. • We fixed one parameter at a value from its interval and then explored which tree was ranked highest as the other two were chosen at random from their intervals. 10,000 experiments for each pair of fixed values. • We looked for the variation in the top-ranked tree and how the top rank related to choice of parameter values. • Very surprising results.
Stroud Saeger Experiments: Our Sensitivity Analysis • CFalseNegative was varied between 25 million and 10 billion dollars • Low and high estimates of direct and indirect costs incurred due to a false negative. • CFalsePositive was varied between $180 and $720 • Cost incurred due to false positive (4 men * (3-6 hrs) * (15-30 $/hr)) • P(X=1) was varied between 1/10,000,000 and 1/100,000
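A randomized experiment of this kind can be sketched as follows. The code fixes P(X=1), draws CFalseNegative and CFalsePositive from the stated ranges, and counts how often each candidate comes out cheapest. The three candidates and their error and utilization figures are hypothetical stand-ins for the enumerated trees. Uniform sampling, the fixed value of P(X=1), and the joint-probability form of the cost function are assumptions as well.

```python
import random
from collections import Counter

C_FN_RANGE = (25e6, 10e9)     # dollars, from the stated low and high estimates
C_FP_RANGE = (180.0, 720.0)   # dollars

# Hypothetical candidates: (P(Y=1|X=0), P(Y=0|X=1), expected utilization cost).
CANDIDATES = {
    "no inspection": (0.0, 1.0, 0.0),
    "sensor A only": (0.10, 0.10, 0.25),
    "A, then B, then C": (0.001099, 0.100009, 4.22),
}

def total_cost(candidate, p_x1, c_fp, c_fn):
    """Total cost with error terms weighted by the prior on the state of nature (assumed)."""
    p_fp, p_fn, c_util = candidate
    return c_fp * (1 - p_x1) * p_fp + c_fn * p_x1 * p_fn + c_util

def top_tree_frequencies(trials=10_000, p_x1=1e-6, seed=0):
    """Count how often each candidate has the lowest total cost over random draws."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(trials):
        c_fn = rng.uniform(*C_FN_RANGE)
        c_fp = rng.uniform(*C_FP_RANGE)
        best = min(CANDIDATES,
                   key=lambda name: total_cost(CANDIDATES[name], p_x1, c_fp, c_fn))
        wins[best] += 1
    return wins

print(top_tree_frequencies())
```

Because total cost is linear in CFalseNegative and CFalsePositive for each candidate, the winner changes only when a draw crosses a boundary between two such linear cost surfaces.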
Stroud Saeger Experiments: Sensitivity Analysis • First set of experiments: 3 attributes or types of sensors, A, B, C. • Extensive computer experimentation.
Frequency of Top-ranked Trees when CFalseNegative and CFalsePositive are Varied • 10,000 randomized experiments (randomly selected values of CFalseNegative and CFalsePositive from the specified range of values) for the median value of P(X=1). • The above graph has frequency counts of the number of experiments when a particular tree was ranked first or second, or third and so on. • Only three trees (7, 55 and 1) ever came first. 6 trees came second, 10 came third, 13 came fourth.
Frequency of Top-ranked Trees when CFalseNegative and P(X=1) are Varied • 10,000 randomized experiments for the median value of CFalsePositive. • Only 2 trees (7 and 55) ever came first. 4 trees came second. 7 trees came third. 10 and 13 trees came 4th and 5th respectively.
Frequency of Top-ranked Trees when P(X=1) and CFalsePositive are Varied • 10,000 randomized experiments for the median value of CFalseNegative. • Only 3 trees (7, 55 and 1) ever came first. 6 trees came second. 10 trees came third. 13 and 16 trees came 4th and 5th respectively.
Most Frequent Tree Groups Attaining the Top Three Ranks • Trees 7, 9 and 10 • [Figure: decision trees 7, 9 and 10 over sensors A, B, and C] • All three decision trees have been generated from the same boolean expression 00000111, representing F(000)F(001)…F(111). • Both Tree 9 and Tree 10 are ranked second and third more than 99% of the time when Tree 7 is ranked first.
Most Frequent Tree Groups Attaining the Top Three Ranks • Trees 55, 57 and 58 • [Figure: decision trees 55, 57 and 58 over sensors A, B, and C] • The boolean expression for these three decision trees is 01111111. • Tree 57 is ranked second 96% of the time and tree 58 is ranked third 79% of the time when tree 55 is ranked first.
Most Frequent Tree Groups Attaining the Top Three Ranks • Trees 1, 3, and 2 • [Figure: decision trees 1, 3 and 2 over sensors A, B, and C] • The boolean expression for these three decision trees is 00000001. • Tree 3 is ranked second 98% of the time and tree 2 is ranked third 80% of the time when tree 1 is ranked first.
Values of CFalseNegative and CFalsePositive when Tree 7 was Ranked First • This is a graph of CFalsePositive against CFalseNegative values obtained from the randomized experiments. The black dots represent points at which tree 7 scored first rank.
Values of CFalseNegative and CFalsePositive when Tree 55 was Ranked First • Tree 55 fills up the lower area in the range of CFalseNegative and CFalsePositive values.
Values of CFalseNegative and CFalsePositive when Tree 1 was Ranked First • Tree 1 fills up the major area in the range of CFalseNegative and CFalsePositive.
Values of CFalseNegative and CFalsePositive for the Three First Ranked Trees • Trees 7, 55 and 1 fill up the entire area in the range of CFalseNegative and CFalsePositive among themselves.
Values of CTot, CFalseNegative and CFalsePositive for First Ranked Trees • This graph shows total costs for trees 7, 55 and 1 in the respective regions in which they were ranked first. • Each tree’s total cost is a hyperplane which cuts other hyperplanes as it gains and then loses first rank.