A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Visual Multidimensional Scaling (VMDS): an Intelligence Fusion Engine
Tim Hanratty (Lead), John Brand, Ann E. M. Bornstein, Andrew Niederer, John Richardson
US Army Research Laboratory
Computational and Information Sciences Directorate
Aberdeen Proving Ground, MD 21005

A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine
Analysts need a method to determine the similarity of objects under analysis to other objects in a background population, all described by sparse, non-normal data of different data types. Multidimensional scaling (MDS) is an appropriate similarity analysis methodology but is difficult to use.

Why Multidimensional Scaling?
• The similarity analysis methodology must be tolerant of sparse data of dissimilar types and of varying reliability.
• Data on a person may include information such as tribal affiliation, gender, pulse rate, agreement of any bio-ID resulting from an encounter with identity documents on the person during the encounter, how many and what kinds of documents are on the person, the number of entries in the intel data base linked to the bio-ID or claimed name of the person, etc.
• These data are
  • Seldom distributed normally,
  • Virtually certain to be incomplete, and
  • Of differing data types (e.g., ratio data such as pulse rate, nominal data such as tribal membership).
• Multidimensional Scaling (MDS) is a similarity methodology tolerant of these factors.
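The data properties listed above (mixed types, missing entries) can be illustrated with a small sketch of a pairwise dissimilarity. It assumes a Gower-style coefficient, which the slides do not name; the attribute names, ranges, and values are invented for illustration and are not taken from the notional data base.

```python
from typing import Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing entry


def gower_dissimilarity(a: Sequence[Value], b: Sequence[Value],
                        ranges: Sequence[Optional[float]]) -> Optional[float]:
    """Average per-attribute dissimilarity over attributes observed in both records.

    ranges[i] is the value range of a numeric attribute, or None for a nominal one.
    Returns None when the two records share no observed attribute.
    """
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue  # sparse data: skip attributes missing in either record
        if r is None:
            total += 0.0 if x == y else 1.0  # nominal data: match or mismatch
        else:
            total += abs(x - y) / r  # ratio data: range-normalised difference
        used += 1
    return total / used if used else None


# Hypothetical attributes: tribal affiliation, gender, pulse rate, document count
ranges = [None, None, 80.0, 5.0]
person_a = ["tribe_1", "M", 72.0, 2.0]
person_b = ["tribe_2", "M", 92.0, None]  # document count not observed
d_ab = gower_dissimilarity(person_a, person_b, ranges)
```

Skipping attributes that are missing in either record keeps incomplete records comparable, at the cost of basing some dissimilarities on fewer attributes than others.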

The Visual Multidimensional Scaling (VMDS) engine
The Visual Multidimensional Scaling (VMDS) similarity analysis engine simplifies and partially automates the use of MDS, including data import/export and display of results.
A focus problem to develop the engine and its underlying methodology: identify High Value Individuals (HVIs) using similarity analysis of massive but sparse input data sets resulting from parallel streams of dissimilar data types, of varying reliability and arriving at different times, so that a field commander can take appropriate action:
• Release
• Release and watch
• Detain.

Use of similarity analysis
Similarity analysis:
• Is complementary to graph analysis based on functional relationships, when functional relationships (A controls B; B provides services to C, D, and E; C has been seen with E and F; etc.) are known.
• Allows inferences when no information on functional relationships exists, that is, A resembles B, rather than A controls B.
• Determines resemblance based on observables to
  • Cue the analyst to the importance of a subject to begin the surveillance/monitoring process,
  • Indicate other subjects in the subject’s circle who may also be important, once functional information about a subject has been gathered,
  • Act as a filter or as an additional tool to investigate a possible enemy cell.
• Is a mathematical implementation of what police and intelligence officers have always done:
  • “He looks suspicious” integrates observational and other data to form a conclusion based on unspecified and possibly subliminal recognition of similarities to known groups.
• Similarity analysis is complementary to the present method of graph analysis based on functional relationships. It is a first step to Level 3 Fusion.

Identity (ID) is a key
• A subject may be
  • linked to a positive identity through biometrics and then through that ID to other data such as civil records, or
  • known only by a name, possibly an alias or variant name.
• If we have a personal ID associated with a subject we may compile
  • A Biometric Rap Sheet of ID events dealing with that subject (presently done at the Biometric Fusion Center)
  • A list of intel reports referring to that subject (presently done at NGIC).
• The biometric information may then be fused with other data such as human observations, situational information, etc.
• If the person is known only by reports citing names which may be aliases, information on that person may be linked to those aliases and a pattern of similarities with key groups determined.
• If a subject has no traceable identity, that itself is a cue to military value.
• The data is almost always sparse and of varying reliability.

Similarity analysis using MDS
• MDS is a similarity analysis methodology based on dimension reduction.
• MDS reduces the set of pairwise (dis)similarities of entities described in a high-dimension input data set to a set of (dis)similarities in a reduced-dimension solution space (typically 2 or 3 dimensions) that follow the distances in the higher-dimension set.
• The (dis)similarities in the solution set are reflected by distances in the 2- or 3-dimension solution space, produced by a set of coordinates describing the location of the entities.
• The points describing the entities in the solution space may be visualized to show the closeness and, hence, similarity of the entities to each other. The display and other measures may guide further investigation.
• Effective use of MDS requires substantial knowledge on the part of the analyst, experience in choosing analytical options, and prior preparation of data.
• This knowledge can be built into an analytical engine that supplies the needed background and facilitates data preparation.
• An application has been developed to partially automate and simplify the analysis: the Visual MDS (VMDS) application.
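The dimension reduction described above can be sketched with classical (Torgerson) MDS, one of the analysis types the options panel lists. This is an illustrative sketch, not the VMDS implementation (which the slides say was built in R); it uses plain power iteration in place of a library eigensolver to stay dependency-free, and the four test points are invented.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def double_center(d: Matrix) -> Matrix:
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix D."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]  # row means (equal to column means)
    grand = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
            for i in range(n)]


def top_eigenpair(b: Matrix, iters: int = 500) -> Tuple[float, List[float]]:
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    n = len(b)
    rng = random.Random(0)  # fixed seed: deterministic start vector
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return 0.0, [0.0] * n  # nothing left to extract
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * bv[i] for i in range(n)), v  # Rayleigh quotient


def classical_mds(d: Matrix, dims: int = 2) -> Matrix:
    """Embed n entities in `dims` dimensions from pairwise distances."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues contribute nothing
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenpair
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Invented example: four points whose distances are exactly 2-D Euclidean
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(dist, dims=2)
```

For input distances that are exactly Euclidean in two dimensions, the embedding reproduces them up to rotation and reflection; for real, higher-dimension data the low-dimension distances only approximate the originals.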

Development conducted with a notional persons data base
• Methodology development was done using a data base of invented persons described by notional, plausible information.
• Data include reasonable but incomplete situational, observational, biometric ID, biometric stress indicator, ID document, records, and intel data.
• Some data in the notional data base are congruent to real or widely accepted surrogate data sources:
  • Intel messages are from STEF or modeled after STEF.
  • Situational, descriptive, and biometric ID data are congruent to real data from the Biometric Task Force, including the Iraq Biometric Watch List and a sample Biometric Rap Sheet.
  • Biometric stress indicators have been widely used but are not presently used in theater due to the focus on ID. They are technically accessible and based on the well accepted “fight or flight” syndrome.
  • Biometric stress indicators are a cue and may reflect innocent stress as well as guilty stress.

Notional persons data base
A data base of 52 notional persons was constructed based on the Soft Target Exploitation and Fusion (STEF) intel message set. The 19 STEF persons are known only by reference to those names in intel messages and are terrorists of one kind or another. An additional 33 persons were invented based on an overall scenario and individual encounter scenarios to justify the development of the known information on each person. The 52 persons were from several population subgroups, including 10 innocents, 5 petty criminals, 8 militia, 10 non-STEF terrorists, and the 19 STEF terrorists. A 53rd person, an innocent, was invented as a probe of the analysis methodology.

Development conducted with notional persons data base
• The notional persons data base was originally embodied in Excel, a fast development environment.
• Excel does not easily portray associations with multiple entities, such as multiple identities.
• A relational data base is under development using Access.
The excerpt illustrates the ease of portrayal of the description of a person, including the abstraction of the descriptive data vector. Numerical attributes are in aqua.

Surrogate data set congruent to real biometric ID data sets
The biometric identity and personal descriptive information in the biometric data bases is a foundation of the notional persons data base. The extent of the additional data beyond bio-ID (situational, biometric stress indicators, documentary, intel) is also clear.
Figure labels:
• Numerical attributes used in MDS, extracted from characteristics of notional persons data base
• Attributes included in biometric watch list and biometric rap sheet (bio-ID only)

Attribute vectors used as input to MDS analysis
Attribute categories:
• Human observation
• Direct biosensed
• Civil, criminal records
• Situational
• Remote biosensed
• Personal documents
• Intel data
Heavy emphasis on biometrics: underlying ID is key and biometrics is key to ID!

Visual MultiDimensional Scaling (VMDS) Program
The Visual MDS (VMDS) package is an integrated analysis and visualization engine developed using R to carry out MDS analyses and GGobi to display the results. It is presently in beta version. This package integrates and partially automates
• data import/export,
• MDS data map generation,
• a visualization tool based on GGobi.
VMDS will include specific entity/relations analysis tools. Analytical entity/relations analysis functions may include methods from Outlier Analysis and Facet Theory.

VMDS analysis options control panel
• VMDS allows import of data in a commonly used format, .csv.
• Analysis options are menu driven, which helps the analyst choose:
  • MDS analysis type (classical, non-metric)
  • (Dis)similarity function
  • Distance function, e.g., city block, Euclidean, etc.
  • Dimensions in the solution set (2-D, 3-D)
  • Icon color keying.
• Data analysis will also be menu driven in later versions.
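The distance-function option can be illustrated with the Minkowski family, of which city block and Euclidean distance are the two cases named on the slide. This is a minimal sketch; the function name and the sample vectors are illustrative and do not reflect the VMDS menu internals.

```python
from typing import Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance: p=1 gives city block, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
city_block = minkowski(u, v, p=1)  # |3| + |4|
euclidean = minkowski(u, v, p=2)   # sqrt(3^2 + 4^2)
```

City block distance weights each attribute difference equally, while Euclidean distance emphasizes the largest differences, so the choice can change which entities appear close in the MDS map.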

VMDS visualization output
VMDS analysis of the 21-dimension version of the notional persons data. The analyst used the default analysis options. The VMDS visualization tool is GGobi; its three-dimensional cue comes from the relative motion of points in the constellation as it rotates. Separation of groups is very clear on rotation but not in a static display.

VMDS visualization output Rotating the constellation shows separation of groups.

3-D Rendering with static visual depth cues
• The utility of static depth cueing is shown by the display above.
• The 3-D coordinates of the reduced dimension solution set produced using PERMAP, an MDS engine used earlier in the development of the methodology, were inserted in a customized X3D visualizer that provides static depth cues.
• The size of the spheres is indicative of closeness to the forward plane in the perspective view.
• The red sphere (solid white arrow) represents the individual under analysis, RI.
• Green spheres are innocents. RI clearly closely resembles innocents, with increasingly less resemblance to criminals (blue), militia (pinkish red), and various terrorists (yellow, gold, light cyan).
• Yellow and gold spheres represent the STEF terrorists; gold symbols are a hostage team that is a subset of the STEF terrorists.
• The smaller yellow sphere (dashed arrow) near RI is a STEF terrorist; its size indicates the 3-D perspective of distance in that viewing angle. It is not actually near the green spheres or near RI. This separation becomes clear on rotation of the constellation.

Analyst report
• The subject of Remote Inquiry (RI) appears to most closely resemble innocents in the background population, and to less closely resemble the groups that include petty criminals, militia, and several kinds of terrorists.
• The analyst concludes that the subject is probably not a High Value Individual and most likely is an innocent.

Analyst conclusion The ground truth is that RI is an innocent. Analysis corroborates ground truth.

View ahead
• Develop more quantitative method.
  • Analysis based on map evaluation.
  • Investigate quantitative methods for estimating resemblance such as Outlier Detection.
• Expand data set.
  • Framework based on relatively small notional data set.
  • Expanding scope of notional data set.
  • Introducing problem of multiple personas vs. intrinsic identity.
• Obtain real data.
  • Notional set plausible, reasonable, but need to compare with reality.
  • Real data obtained from Biometric Task Force.
  • Observed Tactical Network Topology Test.

VMDS improvements
VMDS is being modified with additional features:
• Static visual cues to 3-D depth
• User ability to designate groups and members of groups from the screen
• Computation of group centroid
• Computation of distances of selected entities to selected group centroids
• Computation and display of frequency histograms of distances of group members to group centroids, with distances of selected points to the centroids overlain on the respective histograms
• User blogs/notes
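The centroid, distance, and histogram computations listed above can be sketched as follows. The coordinates, group names, and bin width are invented for illustration; they are not VMDS output, and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a group of points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def distances_to_centroid(points: Sequence[Point], c: Point) -> List[float]:
    """Euclidean distance of each point to the centroid c."""
    return [math.dist(p, c) for p in points]


def histogram(values: Sequence[float], bin_width: float) -> Counter:
    """Frequency count per bin; bin i covers [i * width, (i + 1) * width)."""
    return Counter(int(v // bin_width) for v in values)


# Hypothetical 3-D MDS coordinates for two analyst-designated groups
groups: Dict[str, List[List[float]]] = {
    "innocents": [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 3.0, 0.0]],
    "militia": [[10.0, 10.0, 0.0], [12.0, 10.0, 0.0]],
}
probe = [1.0, 2.0, 0.0]  # the selected entity under analysis

centroids = {name: centroid(pts) for name, pts in groups.items()}
probe_dist = {name: math.dist(probe, c) for name, c in centroids.items()}
nearest = min(probe_dist, key=probe_dist.get)
member_hist = histogram(
    distances_to_centroid(groups["innocents"], centroids["innocents"]), 1.0)
```

Comparing `probe_dist` for a group against that group's `member_hist` indicates whether the probe lies within the spread of the group's own members, which is the comparison the overlaid histogram display is meant to support.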

VMDS development
• VMDS will be exercised in a concept development exercise/demonstration at a quarterly Biometrics Task Force exercise.
• A controlled experiment will evaluate the difference in an analyst’s ability to determine population grouping in simulated encounters with from 1 up to 500 subjects at a time. The notional data base has been bootstrapped to several hundred individuals to support this experiment.
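The slides do not say how the notional data base was bootstrapped. A minimal sketch, assuming the usual meaning of resampling existing records with replacement, is shown below; the record fields and the population size of 300 are hypothetical.

```python
import random
from typing import Dict, List


def bootstrap_population(records: List[Dict[str, object]], size: int,
                         seed: int = 0) -> List[Dict[str, object]]:
    """Resample records with replacement to build a larger notional population."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [dict(rng.choice(records)) for _ in range(size)]


# Hypothetical notional records (only two fields shown)
notional = [
    {"group": "innocent", "pulse_rate": 72},
    {"group": "petty criminal", "pulse_rate": 88},
    {"group": "militia", "pulse_rate": 95},
]
population = bootstrap_population(notional, size=300)
```

Plain resampling yields exact duplicates of the source records; an experiment of this kind might also perturb attribute values, which this sketch does not attempt.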

Summary
• A methodology has been developed and documented to perform a similarity analysis of a specific intelligence problem, the determination of high value individuals.
• The methodology has been embodied in a beta version of a software package to partially automate the application of the methodology, including display of the results.
• The software package will be exercised at a quarterly Biometric Task Force exercise.
• The utility of the package will be assessed in a controlled experiment with a bootstrapped population sample.
• This methodology is applicable to a general class of intelligence problems, similarity based analysis.

BACKUPS

Example of relation of multiple personas to underlying ID
Person with underlying identity or verified ID (birth and/or other records). (Files are sometimes referred to as “personality files,” since the information gives a sense of who the subject is, or their personality. By extension, the person is sometimes referred to as a “personality.”)
Underlying ID: Abdul bin Zawahiri, illegitimate son of Achmed bin Zawahiri, born Mosul 3 December 1982….
Personas:
• Persona 1: Alternate legitimate name convention 1, “Abdul bin Zawahiri”
• Persona 2: Alternate legitimate name convention 2, “Abdul al-Tikriti”
• Persona 3: Alternate legitimate name convention 3, “Abdul al-Talebani”
• Persona 4: Deceptive (false) identity 1, “Dhul Fiqar”
• Persona 5: Incomplete information, only known as “Abdullah”
Encounters:
• Encounter 1. Detained at check point, subject used real name corroborated by bio-ID.
• Encounters 2, 3. Detention at check point, no bio-ID kit available. Subject’s bogus ID cards accepted as real, analyst unable to link to underlying identity.
• Encounter 4. Arrest in raid, police recognize as al Zawahiri, use bio-ID kit, determine underlying ID.
• Encounter 5. Informant report, no other info, no bio-ID, analyst unaware of ground truth ID.
Diagram labels (as extracted): “Bio-ID as”, “Bio-ID as”, “Bio-ID unavailable”, “Bio-ID unavailable”, “no other info”

Possible representation of multiple personas and underlying ID
Excel representation of multiple personas: the underlying ID of an individual, one Muhammad_al_Rekh, is assumed known. Al_Rekh is also known under three aliases used in three encounters, each encounter involving a different alias, with different situational data, a different claimed background for the false identity, and different stress-related biometric data. All the encounter aliases are linked biometrically to the same underlying identity.
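One way to represent the linkage described above is to group encounter records by biometric ID. This is a sketch only: the class, field names, alias strings, and the identifier "BIO-001" are hypothetical and are not drawn from the Excel or Access representations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Encounter:
    claimed_name: str      # the persona presented in the encounter
    bio_id: Optional[str]  # None when no biometric ID was collected


def link_by_bio_id(
    encounters: List[Encounter],
) -> Tuple[Dict[str, List[Encounter]], List[Encounter]]:
    """Group encounters by biometric ID; those without one stay unlinked."""
    linked: Dict[str, List[Encounter]] = {}
    unlinked: List[Encounter] = []
    for e in encounters:
        if e.bio_id is None:
            unlinked.append(e)
        else:
            linked.setdefault(e.bio_id, []).append(e)
    return linked, unlinked


# Three hypothetical aliases tied to one underlying identity, plus one
# report with no biometric data
encounters = [
    Encounter("alias_1", "BIO-001"),
    Encounter("alias_2", "BIO-001"),
    Encounter("alias_3", "BIO-001"),
    Encounter("name_from_informant_report", None),
]
linked, unlinked = link_by_bio_id(encounters)
personas = [e.claimed_name for e in linked["BIO-001"]]
```

Encounters left in `unlinked` correspond to the case where a subject is known only by a name that may be an alias, so their attribute vectors would enter the similarity analysis without a confirmed underlying identity.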
