
Hsinchun Chen, Ph.D. Director, COPLINK Center of Excellence, Artificial Intelligence Lab, Hoffman E-Commerce Lab, Univer

Community-based Security Informatics Research: The COPLINK Experience Acknowledgement: NSF, CIA/ITIC, DHS, NIJ/DOJ, NLM/NIH, COPS, TPD, PPD, KCC. Hsinchun Chen, Ph.D. Director, COPLINK Center of Excellence, Artificial Intelligence Lab, Hoffman E-Commerce Lab, University of Arizona. Outline.





Presentation Transcript


  1. Community-based Security Informatics Research: The COPLINK Experience Acknowledgement: NSF, CIA/ITIC, DHS, NIJ/DOJ, NLM/NIH, COPS, TPD, PPD, KCC Hsinchun Chen, Ph.D. Director, COPLINK Center of Excellence, Artificial Intelligence Lab, Hoffman E-Commerce Lab, University of Arizona

  2. Outline • COPLINK Background and Research Framework • COPLINK Connect and Detect: Community-based Research • COPLINK STV, Agent, and Deception Detection Research • COPLINK Visual Criminal Network Analysis Research • From COPLINK to BorderSafe and Terrorism Research

  3. Outline COPLINK Background and Research Framework

  4. Introduction • The concern about national security has increased significantly since the terrorist attack on September 11, 2001 • Intelligence agencies such as the CIA and FBI are actively collecting and analyzing information to investigate terrorists’ activities • Local law enforcement agencies have also become more alert to criminal activities in their own jurisdictions that may be relevant to national security

  5. COPLINK Progression • 1990-present NSF CISE funding (IIS, Digital Government, Digital Library, NSDL, ITR, IDM, CSS), NLM/NIH (medical informatics), DARPA • 1997 NIJ COPLINK funding; Web-enabled data warehousing for law enforcement • NIJ AGILE interoperability funding; information sharing • NSF Digital Government funding; data/text mining, agents, and knowledge management; COPLINK Center • NSF/CIA KDD funding; intelligence community • DHS BorderSafe funding; NSF/CIA disease informatics (bioterrorism) funding; NSF ITR funding, terrorism portal • Goal: A model and testbed for law enforcement and national security research

  6. Increasing public influence

  7. The COPLINK Research Framework

  8. Building the Science of Intelligence and Security Informatics

  9. Outline COPLINK Testbed: Data Characteristics Information Sharing and Interoperability

  10. Tucson PD Data Sources • TPD Record Management System: Stores a wide range of information from incident reports to warrants to pawn tickets, from person descriptions to vehicles to weapons and property items. Incident data goes back as early as 1983. Database: Litton PRC RMS31 on Oracle 7.3, Compaq OpenVMS • TPD Mug Shot Database: Stores about 90,000 mug shots taken by the ID Department. Database: ImageWare on SQL Server 7.0, Windows NT 4.0 Server • TPD Gang Database: Stores comprehensive information about 3,200 gang members: their activities, aliases, physical descriptions, vehicles, etc. Database: In House Access 97, Windows NT 4.0 Server

  11. Tucson PD RMS Documents • Incident Reports: Report number, crime type, precinct, MOs, date and time. • Pawn Tickets: Ticket number, date and time. • Warrants: Warrant number, docket number, type and issue date. • Field Interviews: FI number, type, precinct, date and time.

  12. Tucson PD RMS Data Objects • Person: True names, aliases, descriptions, addresses, IDs, marks and phone numbers. • Organization: Name, address and phones. • Vehicle: VIN, license plate, make, model, style, year and colors. • Property: Serial number, type, make, model, size and colors. • Weapon: Serial number, type, manufacturer, caliber and colors.

  13. COPLINK Database: Tucson PD

  14. COPLINK Documentation Sample COPLINK ERD, Entity Relationship Diagram

  15. COPLINK Documentation COPLINK Data Dictionary: 217 Tables, 1000 attributes

  16. COPLINK Data Formats • Delimited ASCII text files • SQL Server 2000 backup file • SQL Server 2000 detached database • Oracle 8i/9i dump file • Oracle 8i/9i transportable tablespace • DB2 UDB 7 backup file • TPD data available: 10/1/2002, PPD data: 2/1/2003

  17. Information Management Challenges: Tucson PD Data Across all Crime Types • Incident Reports: Report number, crime type, precinct, MOs, date and time. • Pawn Tickets: Ticket number, date and time. • Warrants: Warrant number, docket number, type and issue date. • Field Interviews: FI number, type, precinct, date and time.

  18. Information Management Challenges: Sample COPLINK Table COPLINK Data Dictionary: 217 Tables, 1000 attributes

  19. Outline COPLINK Connect and Detect: Community-based Research User-centered Design, Information Sharing, Information Retrieval, HCI, and Association Rule Mining

  20. COPLINK Connect: Information Sharing Consolidating & sharing information promotes problem solving and collaboration Records Management Systems (RMS) Gang Database Mugshots Database

  21. COPLINK Connect Functionality • Generic, common XML based criminal elements representation • Data migration (batch and incremental) and mapping for all major databases and legacy systems • Database independent: ODBC compliance data warehouse • Multi-layered Web-based architecture: database server, Web server, browser • Powerful and flexible search tools for various reports, e.g., incidents, warrants, pawns, etc. • Graphical browser-based GUI interface for ease of use, training and maintenance H. Chen, J. Schroeder, R. V. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K. Rasmussen, and A. W. Clements, “COPLINK Connect: Information and Knowledge Management for Law Enforcement,” Decision Support Systems, Special Issue on Digital Government, 2003.

  22. COPLINK Detect: Crime Analysis Consolidated information enables targeted problem solving via powerful investigative criminal association analysis

  23. COPLINK Detect Functionality • Simple association rule mining applied to criminal elements relationships • Generic, common XML based representation for criminal relationships • Incremental data migration and association analysis on databases • Support powerful, multi-attribute queries using partial crime information • Graphical browser-based GUI interface for simple crime relationship analysis and case retrieval H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.
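The "simple association rule mining" bullet on slide 23 can be illustrated with a small co-occurrence count. This is only a sketch under stated assumptions: the transcript does not describe COPLINK Detect's actual algorithm, so the function name `cooccurrence_rules`, the entity labels, and the support/confidence scoring below are all hypothetical. Each incident is treated as a set of entities, and a pair that appears together in at least `min_support` incidents yields two directed rules.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rules(incidents, min_support=2):
    """Return (antecedent, consequent, support, confidence) tuples.

    support    = number of incidents containing both entities
    confidence = support / number of incidents containing the antecedent
    """
    item_counts = Counter()
    pair_counts = Counter()
    for entities in incidents:
        unique = sorted(set(entities))          # ignore repeats within one report
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), n in pair_counts.items():
        if n < min_support:
            continue
        rules.append((a, b, n, n / item_counts[a]))
        rules.append((b, a, n, n / item_counts[b]))
    # Strongest rules first; ties broken by support, then by name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Made-up incident reports, for illustration only.
incidents = [
    ["person:A", "vehicle:V1", "person:B"],
    ["person:A", "vehicle:V1"],
    ["person:B", "address:X"],
    ["person:A", "person:B"],
]
rules = cooccurrence_rules(incidents)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}  support={support}  confidence={confidence:.2f}")
```

A partial-information query ("which persons are associated with vehicle V1?") then reduces to filtering these rules by antecedent.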

  24. COPLINK Detect 2.0/2.5

  25. COPLINK Connect/Detect Deployment • Tucson, Phoenix (Arizona) • Huntsville (Texas) • Montgomery County (Maryland) • Polk County/Des Moines (Iowa) • Ann Arbor (Michigan) • Boston (Massachusetts) • Redmond (Washington) • Henderson County (North Carolina) • Shawnee County (Kansas) • San Diego (CA) • Pima County, Arizona DHS (Arizona) • State of Alaska, Los Angeles (CA) Serving 20+ states, 300+ agencies, protecting 30M+ citizens

  26. Outline COPLINK STV, Deception Detection and Agent Research Visualization, HCI, Agent, Data Mining

  27. COPLINK Spatial-Temporal Visualization: Timeline Tool • Visualizes the chronologically ordered set of events associated with user-selected database entities • Events placed along horizontal axis • Entities placed along vertical axis • Entities can be grouped together • Each row contains all events associated with the entities in a group • Time-based Zooming • User can zoom into a specific time interval for more detail, while hiding uninteresting portions of the timeline

  28. COPLINK Spatial-Temporal Visualization: GeoMapping Tool • Plots location of incident events within a selected time interval • Zooming/panning capabilities • User-selectable GIS layers • Overview map • Provides context to the currently selected region • Plot events over time • Plot events as they occur, use different color shadings to indicate when it occurred relative to other events • Plot events as they occur and remove them after they are over, using directed arrows to highlight movement from one event to the next in time

  29. COPLINK Spatial-Temporal Visualization: Periodic Pattern Tool • Reveals periodic patterns of incident occurrence • Incident events will be plotted continuously on a circular graph • Time period represented along circle (day, week, month, etc.) • Height from center indicates number of incidents that occurred at that specific time • Customizable granularity (e.g. year, month, day, etc.) • 3-sigma statistical significance line • Indicates unusually large or small number of occurrences at a specific time
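The 3-sigma significance line on slide 29 can be sketched as follows. The transcript does not say how the tool computes the line, so this assumes the simplest reading: take the incident count in each time bin (for example, each hour of the day), compute the mean and population standard deviation across bins, and flag any bin more than three standard deviations from the mean. The function name `unusual_bins` and the sample counts are hypothetical.

```python
from statistics import mean, pstdev


def unusual_bins(counts, sigmas=3.0):
    """Return indices of bins whose count lies outside mean +/- sigmas * stdev."""
    mu = mean(counts)
    sd = pstdev(counts)
    upper = mu + sigmas * sd
    lower = mu - sigmas * sd
    return [i for i, c in enumerate(counts) if c > upper or c < lower]


# Made-up hourly incident counts: a steady baseline with one spike at 02:00.
hourly_counts = [5] * 24
hourly_counts[2] = 40
flagged = unusual_bins(hourly_counts)
print(flagged)
```

With granularity set to "hour of day", the flagged indices are the hours that would rise above (or fall below) the significance ring on the circular plot.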

  30. COPLINK Data Mining Research Deception Detection, a data mining approach • “An agent must spell a suspect’s name exactly right, or the FBI computer will not recognize it. That can be particularly frustrating in cases such as the Sept. 11 probe, in which suspects have used multiple names and sometimes created identities by switching a few letters in their names.” – FBI • FBI’s problem with 9/11 suspect names, e.g., “Majed M.GH Moqed,” “Majed Moqed,” and “Majed Mashaan Moqed,” and DOB, e.g., “01-01-1976” and “03-03-1976.” • A deception taxonomy was created based on criminal deceptions in law enforcement databases • Patterns existed in criminal deceptions, e.g., SSN variations, name variations, etc. • Phonetic and syntactic string comparators are adopted • Promising initial testing result: 94% accuracy in deception detection G. Wang, H. Chen, H. Atabakhsh, “Automatically Detecting Deceptive Criminal Identities,” Communications of the ACM, forthcoming, 2002.

  31. A Taxonomy for Deceptions in Criminal Identity

  32. A Taxonomy of Deceptions in Criminal Identity: Name Deception • Either false first name or false last name (62.5%) • Only the middle initial is changed (62.5%) • Similar pronunciation but different spelling (42%) • A completely false name (29.2%) • Using abbreviated names or adding extra letters (29.2%) • Leaving out the first name or last name (29.2%) • Exchanging last name and first name (8%)

  33. A Taxonomy of Deceptions in Criminal Identity: DOB, SSN, Residency • DOB and ID (SSN) deception: • In most cases, criminals only make minor changes in DOB and SSN, e.g., 19700207 → 19700208 • Residency deception: • 42% of criminals in the collection deceived on address information. In most cases, only one portion of the address is changed slightly, e.g., street number.

  34. String Comparators • Phonetic Russell SoundEx code: Newcombe [1959], encodes a name with a format having a prefix letter followed by a three-digit number, • e.g., PEARCE and PIERCE both coded as: “P620”. However, phonetic matching is particularly poor at finding matches [Zobel and Dart 1996]; • Spelling string comparator [Jaro 1976; Winkler 1990]. • compares spelling variations between two strings instead of phonetic codes Limitation: common characters in both strings must be within half the length of the shorter string
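The two comparators on slide 34 can be sketched in a few lines each. Both functions below are textbook versions written for illustration, not COPLINK's code. `soundex` follows the common American Soundex rules (prefix letter plus three digits). `jaro` uses the widely implemented search window of half the longer string's length minus one, which is an assumption that differs slightly from the slide's wording ("half the length of the shorter string").

```python
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(name):
    """Encode a name as a prefix letter followed by three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":            # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len(s1)):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


print(soundex("PEARCE"), soundex("PIERCE"))   # P620 P620
print(round(jaro("MARTHA", "MARHTA"), 4))
```

The contrast is the point of the slide: Soundex collapses both spellings to one code and says nothing about how close they are, while the spelling comparator returns a graded score that can feed a distance calculation.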

  35. Other Approximate String Matching tool • Agrep [Wu, Manber 1992]: A general string matching algorithm that can handle character variations of insertion, deletion, and substitution. • The pattern is represented as a bit array. The computation only involves simple bit operations (RightShift) and logic operations (AND, OR) on bit arrays. $R^{d}_{j+1} = \mathrm{Rshift}[R^{d}_{j}] \text{ AND } S_c \text{ OR } \mathrm{Rshift}[R^{d-1}_{j} \text{ OR } R^{d-1}_{j+1}] \text{ OR } R^{d-1}_{j}$ • Agrep has been integrated into Unix and been in wide use since June 1991
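The bit-array recurrence on slide 35 can be sketched with Python integers standing in for the bit arrays. This is a minimal illustration of the Wu-Manber update, not the agrep source: the function name `agrep_find` is hypothetical, bit i of `R[d]` means "the first i+1 pattern characters match a suffix of the text read so far with at most d errors", and the paper's Rshift becomes a left shift that feeds a 1 into the lowest bit.

```python
def agrep_find(pattern, text, k):
    """Return end indices in text where pattern matches with at most k errors."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1

    # S_c: bit i is set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    def rshift(bits):
        return ((bits << 1) | 1) & full

    # R^d_0 has its first d bits set: d pattern characters may be skipped.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    accept = 1 << (m - 1)
    ends = []
    for j, ch in enumerate(text):
        S = masks.get(ch, 0)
        prev_old = R[0]                      # R^{d-1}_j for d = 1
        R[0] = rshift(R[0]) & S              # exact-match row
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = (
                (rshift(cur_old) & S)        # next character matches
                | rshift(prev_old | R[d - 1])  # substitution or deletion
                | prev_old                   # insertion
            )
            prev_old = cur_old
        if R[k] & accept:
            ends.append(j)
    return ends


print(agrep_find("Moqed", "Majed Moqed", 0))
print(agrep_find("Moqed", "Majed Mashaan Moqad", 1))
```

Each text character costs k+1 shift/AND/OR operations regardless of pattern content, which is why the approach is fast for short patterns such as names.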

  36. Algorithm Design • Compare corresponding fields of each pair of records (disagreement): Sname, SDOB, Saddr, and SID • To capture different types of name deceptions, • Calculate the Normalized Euclidean Distance for the overall dis-similarity between two records, i.e., Disagreement =
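The formula on slide 36 did not survive in this transcript, so the sketch below rests on assumptions rather than on the authors' definition. It assumes each field disagreement (Sname, SDOB, Saddr, SID) lies in [0, 1] and that "Normalized Euclidean Distance" means the square root of the mean of the squared field values. The per-field comparator is a stand-in: `difflib.SequenceMatcher` from the Python standard library replaces the phonetic and spelling comparators described on slide 34. The function names and sample records are hypothetical.

```python
from difflib import SequenceMatcher
from math import sqrt

FIELDS = ("name", "dob", "addr", "id")   # Sname, SDOB, Saddr, SID on the slide


def field_disagreement(a, b):
    """Stand-in comparator: 0.0 for identical strings, 1.0 for nothing in common."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def record_disagreement(rec1, rec2):
    """Assumed form: sqrt of the mean squared field disagreement, in [0, 1]."""
    squares = [field_disagreement(rec1[f], rec2[f]) ** 2 for f in FIELDS]
    return sqrt(sum(squares) / len(FIELDS))


# Made-up records that differ only in name variant and one DOB digit.
rec_a = {"name": "Majed Moqed", "dob": "19700207",
         "addr": "123 Main St", "id": "123456789"}
rec_b = {"name": "Majed Mashaan Moqed", "dob": "19700208",
         "addr": "123 Main St", "id": "123456789"}
score = record_disagreement(rec_a, rec_b)
print(round(score, 3))
```

Dividing by the number of fields keeps the score on the same 0 to 1 scale as the individual comparators, so a single threshold can be applied to any pair of records.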

  37. Experimental Results (Training: 80 cases) Table: Distance matrix, the distance value shows the degree of disagreement between each pair of records in the training data set.

  38. Experimental Results (Training: 80 cases) Table: Determining best threshold value (0.48)
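The table behind slide 38 is not in this transcript, so the selection procedure can only be sketched under assumptions. The version below assumes that a pair of records is judged to be the same person when its disagreement value falls below the threshold, and that the "best" threshold is the candidate with the highest accuracy on labeled training pairs. The function name `best_threshold` and the toy data are hypothetical and are not the study's data.

```python
def best_threshold(pairs, candidates):
    """pairs: (disagreement, is_same_person) tuples; returns (threshold, accuracy)."""
    def accuracy(threshold):
        correct = sum((d < threshold) == same for d, same in pairs)
        return correct / len(pairs)

    best = max(candidates, key=accuracy)   # first candidate wins ties
    return best, accuracy(best)


# Toy training pairs, for illustration only.
training_pairs = [
    (0.10, True), (0.30, True), (0.45, True),
    (0.55, False), (0.70, False), (0.90, False),
]
threshold, acc = best_threshold(training_pairs, [0.2, 0.4, 0.5, 0.6, 0.8])
print(threshold, acc)
```

The chosen threshold would then be frozen and applied unchanged to a held-out test set, which is the split the slides describe (80 training cases, 40 testing cases).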

  39. Experimental Results (Testing: 40 cases) Table: Accuracy of deception detection when the best threshold value (0.48) is applied to the testing data set (40 records)

  40. COPLINK Agent Research COPLINK Agent: alert and collaboration in a wireless architecture • Enhance police information timeliness, collaboration, mobility, and safety via a web-based wireless alerting system (under testing at TPD) • Real-time alert of time-critical information from multiple databases, e.g., CAD (computer-aided dispatching) database, MVD • Identify and inform officers/detectives who are working on similar cases • Push time-critical information via wireless and personalized communications, i.e., web alert, email, cell phone, and pager

  41. COPLINK Agent: Wireless Alert and Collaboration • Allows Patrol Officers to enhance their community expertise • Further promotes Officer safety through curbside knowledge • Secure wireless access and alert: laptop, PDA, pager, cell phone • Alert: 24-7 monitoring of time-critical information from different databases • Collaboration: Automatically informing detectives working on similar cases

  42. COPLINK Agent: Vehicle Search Form Multi-DB Search Notification setting Alert Method

  43. COPLINK Agent: Web and E-mail Collaboration Alerts Web Alert Email Alert

  44. COPLINK Agent: Cell Phone and Pager Alert Cell phone alert Pager alert with case number

  45. Agent User Study and Result Summary • Study Design: • Case study method based on structured interviews, archival records analysis, and usability survey. • Use QUIS (Questionnaire for User Interaction Satisfaction) survey instrument developed by the HCI Lab at the U. of Maryland. • 10 participants: crime analysts and detectives in several TPD units. • Positive feedback on system Effectiveness and Efficiency: • Monitoring: “… the information I have received back was instrumental in making at least 2 felony cases that will be prosecuted on the federal level.” • Collaboration from CAD Alert: “… allowing us to respond to incidents we know are important that the field units perhaps don’t realize in a timely manner.” • Multi-database Search: “The Tucson City Court Search was helpful because I located one of my suspects on her court date.” • High User Satisfaction from QUIS survey items: • Averaged 5.5 for 49 items on a 7-point Likert scale (7: most useful). • Strengths: Offers good Investigative power; Easy to read layout; Potential for Collaborative information sharing; CAD Integration; High intention to use. • Weaknesses: Lack of help messages; Difficult for inexperienced users; Obscure user preference settings.

  46. Arizona Daily Star, Jan 7, 2001

  47. New York Times, Nov 2, 2002

  48. Newsweek, March 3, 2003

  49. Interacting with the LE Community • User-centered design (2 officers assigned to project); frequent, focused, staged user studies (a user study team); quick prototyping and user feedback (quarterly) • TPD user briefings: 30+ user groups and management demos/briefings (2 chiefs, 7 assistant chiefs) • Arizona/regional partner briefings: 30+ regional partners demos/meetings; Phoenix, Pima, etc. • Annual COPLINK Center research workshop, under NSF Digital Government Program • National/regional NIJ/DOJ and LE meetings: 20+ LE IT meetings; International Association of Chiefs of Police (IACP) meetings • Regional deployment and success: Arizona, TX, Iowa, Michigan, Boston, Alaska, CA, etc.

  50. COPLINK Lessons Learned • Know their pain and build something they can use. → What street cops need. • Build trust and know the culture. → security, policy, training, user acceptance (build a Living Lab) • Early and consistent user involvement. → 2 TPD officers, 7 asst. chiefs, 2 chiefs • Create early and small successes. → Detect/Connect, group to division and department • Spread the success and solicit partners. → Tucson, AZ, CA, TX, MA, Montgomery, MD, Alaska, etc. • Understand funding agencies' expectations. → NIJ (tools), NSF (research) • Development and research prioritization. → research (Ph.D.) after development (MS/BS); little cutting-edge research in the first two years • Establish deployment partners. → KCC, diff(operational system, research prototype) = $2M • Work with university technology transfer office. → office of (preventing) technology transfer?
