Towards situational awareness systems for disaster response

Towards situational awareness systems fordisaster response Naveen AshishCalit2@UC-Irvine Bell Labs India, Bangalore, 04/23/07

Organization • Introduction to • SAMI • Selected research areas • Technology transition • Discussion

RESCUE • The SAMI TEAM • Students • Stella Chen, Chaitanya Desai, Vibhav Gogate, Jon Hutchinson, • Ram Hariharan, Shengyue Ji, Yiming Ma, Rabia Nuray-Turan, • Dawit Seid, Shankar Shivappa • Staff • Jay Lickfett, Chris Davison • Collaborators • Charles Huyck, Ron Eguchi, Shubharoop Ghosh • Faculty, Scientists and Post-docs • Dmitri Kalashnikov, Rajesh Hedge, Sharad Mehrotra, Sangho Park • Slide Aggregator (aka Project Leader) • Naveen Ashish • NSF funded “large-ITR” project • Advance information technologies for disaster response • 5 year project • Oct 2003 to Oct 2008 • Institutions • 6 universities (UCI, UCSD, UIUC, BYU, U-Colorado, U-Maryland) and 1 company (ImageCat) • Active and formal community partners • City of LA, OCFA, Irvine Police, …. • People • Director: Sharad Mehrotra • ~ 25 researchers and staff, ~40 students • Web: http://www.itr-rescue.org

RESCUE Mission The mission of RESCUE is to enhance the ability of emergency response organizations and the public to mitigate crises, save lives, and prevent secondary and indirect human and economic loss by radically transforming ways in which these organizations gather, process, manage, use and disseminate information during man-made and natural catastrophes.

Response • Effectiveness • lives & property saved • damage prevented • cascades avoided • Quality of • Decisions • first responders • consequence planners • public Quality & Timeliness of Information • Situational • Awareness • incidences • resources • victims • needs Motivation: Transform the Ability of First Responders to Mitigate Crisis Observation: Right Information to the Right Person at the Right Time can result in dramatically better response

RESCUE Objectives • Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster.

RESCUE Objectives • Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster. • Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals).

RESCUE Objectives • Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster. • Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals). • Develop robust communication systems that continue to operate in crisis situations despite partial/total failure of infrastructure and increased communication demands.

RESCUE Objectives • Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster. • Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals). • Develop robust communication systems that continue to operate in crisis situations despite partial/total failure of infrastructure and increased communication demands. • Develop technologies that can be used for timely and customizeddissemination of crisis information that inform the public at large thus enhancing the abilities of the affected populations to take appropriate self-protective actions.

RESCUE Objectives • Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster. • Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals). • Develop robust communication systems that continue to operate in crisis situations despite partial/total failure of infrastructure and increased communication demands. • Develop technologies that can be used for timely and customized dissemination of crisis information that inform the public at large thus enhancing the abilities of the affected populations to take appropriate self-protective actions. • Explore the privacy challenges that emerge as a result of infusing technology to improve information flow in crisis response networks and the public.

RESCUE Objectives • Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster. • Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals). • Develop robust communication systems that continue to operate in crisis situations despite partial/total failure of infrastructure and increased communication demands. • Develop technologies that can be used for timely and customized dissemination of crisis information that inform the public at large thus enhancing the abilities of the affected populations to take appropriate self-protective actions. • Explore the privacy challenges that emerge as a result of infusing technology to improve information flow in crisis response networks and the public. • Promote interdisciplinary education at all levels (graduate, undergraduate, K-12) and across diverse student groups to expose the future community of citizens to issues in emergency management and homeland security – an area of global and national importance.

RESCUE Research Projects • SAMI: Situational Awareness from Multi-Modal Input(Project Lead: N. Ashish, UCI) • PISA: Policy-driven Information Sharing Architecture (Project Lead: M. Winslett, UIUC) • Customized Dissemination in the Large (Project Leads: K. Tierney, UC-B & N. Venkatasubramanian, UCI) • Privacy Implications of Technology Adoption (Project Lead: S. Mehrotra, UCI) • Robust Networking and Information Collection (Project Lead: BS Manoj, UCSD)

Applications Evacuation Planning Damage Assessment Situational Dashboard Information Reports Responders News Weather Traffic Simulations Reconnaissance System A Situational Awareness Application

Situational data management Analysis Extraction and synthesis Architecture Events as fundamental abstraction units

Areas Situational awareness systems Extraction and synthesis Data management Analysis graph analysis semantic extraction from text geospatial audio-visual extraction E event model SAT-ware predictive modeling spatial indexing damage assessment

Extraction and Synthesis Extraction and Synthesis Semantic extraction from text Audio event extraction Visual event extraction

Why do we need “Data Cleaning”? An actual excerpt from a person’s CV • sanitized for privacy • quite common in CVs, etc • this particular person • argues he is good • because his work is well-cited • but, there is a problem with using CiteSeer ranking • in general, it is not valid (in CVs) • let’s see why... “... In June 2004, I was listed as the 1000th most cited author in computer science (of 100,000 authors) by CiteSeer, available at http://citeseer.nj.nec.com/allcited.html. ...”

What is the problem in the example? Suspicious entries • Let us go to the DBLP website • which stores bibliographic entries of many CS authors • Let us check who are • “A. Gupta” • “L. Zhang” CiteSeer: the top-k most cited authors DBLP DBLP

Comparing raw and cleaned CiteSeer Cleaned CiteSeer top-k CiteSeer top-k

What is the lesson? • data should be cleaned first • e.g., determine the (unique) real authors of publications • solving such challenges is not always “easy” • that explains a large body of work on data cleaning • note • CiteSeer is aware of the problem with its ranking • there are more issues with CiteSeer • many not related to data cleaning “Garbage in, garbage out” principle: Making decisions based on bad data, can lead to wrong results.

High-level view of the problem

Traditional Domain-Independent DC Methods

What is “Reference Disambiguation”? ? Author table (clean) Publication table (to be cleaned) A1, ‘Dave White’, ‘Intel’ A2, ‘Don White’, ‘CMU’ A3, ‘Susan Grey’, ‘MIT’ A4, ‘John Black’, ‘MIT’ A5, ‘Joe Brown’, unknown A6, ‘Liz Pink’, unknown P1, ‘Databases . . . ’, ‘John Black’, ‘Don White’ P2, ‘Multimedia . . . ’, ‘Sue Grey’, ‘D. White’ P3, ‘Title3 . . .’, ‘Dave White’ P4, ‘Title5 . . .’, ‘Don White’, ‘Joe Brown’ P5, ‘Title6 . . .’, ‘Joe Brown’, ‘Liz Pink’ P6, ‘Title7 . . . ’, ‘Liz Pink’, ‘D. White’ • Analysis(‘D. White’ in P2, our approach): • 1. ‘Don White’ • has a paper with ‘John Black’@MIT • 2. ‘Dave White’ • is not connected to MIT in any way • 3. ‘Sue Grey’ • is coauthor of P2 too, and @ MIT • Thus: ‘D. White’ in P2 is probably Don • (since we know he collaborates with MIT ppl.) • Analysis (‘D. White’ in P6, our approach): • 1. ‘Don White’ • has a paper (P4) with Joe Brown; • Joe has a paper (P5) with Liz Pink; • Liz Pink is a coauthor of P6. • 2. ‘Dave White’ • does not have papers with Joe or Liz • Thus: ‘D. White’ in P6 is probably Don • (since co-author networks often form clusters)

Attributed Relational Graph (ARG) • View dataset as a graph • nodes for entities • papers, authors, organizations • e.g., P2, Susan, MIT • edges for relationships • “writes”, “affiliated with” • e.g. Susan → P2 (“writes”) • “Choice” nodes • for uncertain relationships • mutual exclusion • “1” and “2” in the figure • Analysis can be viewed as • application of the “Context AP” • to this graph • defined next... Q: How come domain-independent?

Context Attraction Principle (CAP) publication P1 “J. Smith” if • reference r, made in the context of entity x, refers to an entity yj • but, the description, provided by r, matches multiple entities: y1,…,yj,…,yN, then • x and yj are likely to be more strongly connected to each other via chains of relationships • than x and yk (k = 1, 2, … , N; k j). John E. Smith SSN = 123 P1 John E. Smith Jane Smith Joe A. Smith • In designing the RelDC approach • - our goal was to use CAP as an axiom • - then solve problem formally, without heuristics

Analyzing paths: linking entities and contexts D. White is a reference • in the context of P2, P6 • can link P2, P6 to Don • cannot link P2, P6 to Dave • more complex paths in general • Analysis(‘D. White’ in P2): path P2→Don • 1. ‘Don White’ • has a paper with ‘John Black’@MIT • 2. ‘Dave White’ • is not connected to MIT in any way • 3. ‘Sue Grey’ • is coauthor of P1 too, and @ MIT • Thus: ‘D. White’ is probably Don White • Analysis(‘D. White’ in P6): path P6→Don • 1. ‘Don White’ • has a paper (P4) with Joe Brown; • Joe has a paper (P5) with Liz Pink; • Liz Pink is a coauthor of P6. • 2. ‘Dave White’ • does not have papers with Joe or Liz • Thus: ‘D. White’ is probably Don White

Does the CAP principle hold over real datasets? That is, if we disambiguate references based on it, will the references be correctly disambiguated? Can we design a generic solution to exploiting relationships for disambiguation? Questions to answer

Problem formalization the name of k-th author of paper xi, e.g. ‘J. Smith’ the truek-th author of paper xi ‘John A. Smith’, ‘Jane B. Smith’, ...

Entity-Relationship Graph RelDC views dataset as a graph • undirected • nodes for entities • don’t have weights • edges for relationships • have weights • real number in [0,1] • the confidence the relationship exists “J. Smith” “John Smith” P1 Handling References: Linking (references correspond to relationships) if|CS[xi .rk]| = 1then • we know the answer d[xi .rk] • link xi and d[xi .rk] directly, w = 1 else • the answer is uncertain for xi .rk • create a “choice” node, link it • “option-weights”, w1 + ... + wN= 1 • option-weights are variables “Jane Smith”

Objective of Reference Disambiguation Definition: To resolve a reference xi .rk means • to pick one yj from CS[xi .rk] as d[xi .rk]. Graph interpretation • among w1, w2, ... , wN, assign wj= 1 to onewj • means yj is chosen as the answer d[xi .rk] Definition: Reference xi .rk is resolved correctly, if the chosen yj =d[xi .rk]. Definition: Reference xi .rk is unresolved or uncertain, if not yet resolved... Goal: Resolve all uncertain references as correctly as possible.

Formalizing the CAP CAP • is based on “connection strength” • c(u,v) for entities u and v • measures how strongly u and v are connected to each other via relationships • e.g. c(u,v) > c(u,z) in the figure • will formalize c(u,v) later Context Attraction Principle (CAP) ifc(xi, yj) ≥ c(xi, yk) thenwj≥ wk(most of the time) We use proportionality: c(xi, yj) ∙ wk = c(xi, yk) ∙ wj

RelDC approach Input: the ARG for the dataset • Computing connection strengths • for each unresolved reference xi .rk • determine equations for all (i.e., N) c(xi, yj)’s • c(xi, yj) = gij(w) • a function of other option-weights • Determining equations for option-weights • use CAP to relate all wj’s and connection strengths • since c(xi, yj) = gij(w), hence wij= fij(w) • Computing option-weights • solve the system of equations from Step 2. • Resolving references • use the interpretation procedure to resolve weights

Computing connection strength (Step 1) Computation of c(u,v) consists of two phases • Phase 1: Discover connections • all L-short simple paths between u and v • bottleneck • optimizations, not in SDM05 • Phase 2: Measure the strength • in the discovered connections • many c(u,v) models exist • we use random walks in graphs model

Measuring connection strength • Note: • c(u,v) returns an equations • because paths can go via various option-edges • cuv = c(u,v) = guv(w)

Equations for option-weights (Step 2) CAP (proportionality): System (over-constrained): Add slack:

Solving the system (Steps 3 and 4) Step 3: Solve the system of equations • use a math solver, or • iterative method (approx. solution ), or • bounding-interval-based method (tech. report). Step 4: Interpret option-weights • to determine the answer for each reference • pick yj with the largest weight as the answer

Experimental Setup Parameters • When looking for L-short simple paths, L = 7 • L is the path-length limit RealPub dataset: • CiteSeer + HPSearch • publications (255K) • authors (176K) • organizations (13K) • departments (25K) • ground truth is not known • accuracy... SynPub datasets: • many ds of two types • emulation of RealPub • publications (5K) • authors (1K) • organizations (25K) • departments (125K) • ground truth is known RealMov: • movies (12K) • people (22K) • actors • directors • producers • studious (1K) • producing • distributing

Sample Publication Data CiteSeer: publication records HPSearch: author records

Efficiency and Long paths Non-exponential cost Longer paths do help

Web Disambiguation Music Composer Football Player UCSD Professor Comedian Botany Professor @ Idaho

Web Disambiguation

Web Disambiguation • Extract key information such as mentions of entities (persons, names, locations) and other information such as hyperlinks and email addresses from Web pages • Cast as a relationship analysis problem • Prototype at: http://opteron.calit2.uci.edu:1977/Diamond/people_search.jsp

Extraction and Synthesis Semantic extraction from text Audio event extraction Visual event extraction • Information extraction from text • Many systems and techniques • May benefit from semantics • Limitations • All or nothing extraction • Towards probabilistic extraction systems

Leads • Disambiguation and data cleaning • Dmitri Kalashnikov, Stella Chen, Rabia Nuray-Turan • Information extraction • Naveen Ashish, Sharad Mehrotra

Extraction and Synthesis Semantic extraction from text Audio event extraction Visual event extraction • Multi-microphone speech processing • Speaker identification • Noise reduction • Audio-visual speech recognition • Combine visual features (venemes) with audio • Speech recognition on light-weight devices • Team • Rajesh Hegde, Bhaskar Rao, Shankar Shivappa (UCSD)

Extraction and Synthesis Semantic extraction from text Audio event extraction Visual event extraction • Combine views from multiple cameras • Homomorphic transformations • Multi-perspective “view-binding” • Team • Sangho Park, Mohan Trivedi (UCSD)

Situational Data Management Situational Data Management Spatial Indexing Event data model SAT-Ware

Outline • Overall Goal • Use examples to illustrate: • Different approaches in modeling and querying • Advantage of our approach • Extracting spatial expression • Building model for spatial expression • Experiments • Conclusion

Overall Goal Info about events, that constitute a crisis, is often available as text. reports ... Goal: Situation Awareness from Textual Sources Database Textual data during crisis • transcribed • 911 calls • first responder communications Textual data after crisis • first responders reports • Internet sources • for post factum analysis

Motivating Examples • Two reports filed by first responders after 9/11 attack: • “…the PAPD Mobile Command Post was located on West St. north of WTC …” • “…a PAPD Command Truck parked on the west side of Broadway St. and north of Vesey St….” • Query: Retrieve Events around WTC • Goal: Both events should be retrieved with high scores attached.

Towards situational awareness systems for disaster response

Towards situational awareness systems for disaster response

Presentation Transcript

Situational Awareness

SITUATIONAL AWARENESS TOOLS

Situational Awareness for Flash Flooding

Situational Awareness

Situational Awareness Technology

Space Weather for Situational Awareness

Using Honeynets for Internet Situational Awareness

Situational Awareness for Fire Fighters (SAFIRE)

Situational Awareness for Flash Flooding

Situational Awareness in Emergency Response

Situational Awareness Technologies for Fire Emergency Response Prof. Nalini Venkatasubramanian

Geodynamic Situational Awareness for Intelligence Operations

Information Infrastructure for Situational Awareness and Systems Integration

Situational Awareness

Situational Awareness Board

Situational Awareness, Delivered

Chaotic Situational Awareness

Space Weather for Situational Awareness

Situational Awareness, Delivered

Situational Awareness