Towards situational awareness systems for disaster response

Towards situational awareness systems for disaster response. Naveen Ashish [email protected] Bell Labs India, Bangalore, 04/23/07. Organization. Introduction to SAMI Selected research areas Technology transition Discussion . RESCUE. The SAMI TEAM Students

Towards situational awareness systems for disaster response

Naveen Ashish [email protected]

Bell Labs India, Bangalore, 04/23/07


Organization

  • Introduction to

  • SAMI

  • Selected research areas

  • Technology transition

  • Discussion


RESCUE

  • The SAMI TEAM

    • Students

    • Stella Chen, Chaitanya Desai, Vibhav Gogate, Jon Hutchinson,

    • Ram Hariharan, Shengyue Ji, Yiming Ma, Rabia Nuray-Turan,

    • Dawit Seid, Shankar Shivappa

    • Staff

    • Jay Lickfett, Chris Davison

    • Collaborators

    • Charles Huyck, Ron Eguchi, Shubharoop Ghosh

    • Faculty, Scientists and Post-docs

    • Dmitri Kalashnikov, Rajesh Hegde, Sharad Mehrotra, Sangho Park

    • Slide Aggregator (aka Project Leader)

    • Naveen Ashish

  • NSF funded “large-ITR” project

    • Advance information technologies for disaster response

  • 5 year project

    • Oct 2003 to Oct 2008

  • Institutions

    • 6 universities (UCI, UCSD, UIUC, BYU, U-Colorado, U-Maryland) and 1 company (ImageCat)

    • Active and formal community partners

      • City of LA, OCFA, Irvine Police, ….

  • People

    • Director: Sharad Mehrotra

    • ~ 25 researchers and staff, ~40 students

  • Web: http://www.itr-rescue.org


RESCUE Mission

The mission of RESCUE is to enhance the ability of emergency response organizations and the public to mitigate crises, save lives, and prevent secondary and indirect human and economic loss by radically transforming ways in which these organizations gather, process, manage, use and disseminate information during man-made and natural catastrophes.


Motivation: Transform the Ability of First Responders to Mitigate Crisis

  • Response Effectiveness

    • lives & property saved

    • damage prevented

    • cascades avoided

  • Quality of Decisions

    • first responders

    • consequence planners

    • public

  • Quality & Timeliness of Information

  • Situational Awareness

    • incidences

    • resources

    • victims

    • needs

Observation: Right Information to the Right Person at the Right Time can result in dramatically better response


RESCUE Objectives

  • Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster.

  • Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals).

  • Develop robust communication systems that continue to operate in crisis situations despite partial/total failure of infrastructure and increased communication demands.

  • Develop technologies that can be used for timely and customized dissemination of crisis information that inform the public at large thus enhancing the abilities of the affected populations to take appropriate self-protective actions.

  • Explore the privacy challenges that emerge as a result of infusing technology to improve information flow in crisis response networks and the public.

  • Promote interdisciplinary education at all levels (graduate, undergraduate, K-12) and across diverse student groups to expose the future community of citizens to issues in emergency management and homeland security – an area of global and national importance.


RESCUE Research Projects

  • SAMI: Situational Awareness from Multi-Modal Input (Project Lead: N. Ashish, UCI)

  • PISA: Policy-driven Information Sharing Architecture (Project Lead: M. Winslett, UIUC)

  • Customized Dissemination in the Large (Project Leads: K. Tierney, UC-B & N. Venkatasubramanian, UCI)

  • Privacy Implications of Technology Adoption (Project Lead: S. Mehrotra, UCI)

  • Robust Networking and Information Collection (Project Lead: BS Manoj, UCSD)


A Situational Awareness Application

  • Applications

    • Evacuation Planning

    • Damage Assessment

    • Situational Dashboard

  • Information

    • Reports, Responders, News, Weather, Traffic, Simulations, Reconnaissance

  • System


Architecture

  • Extraction and synthesis

  • Situational data management

  • Analysis

Events as fundamental abstraction units


Areas

Situational awareness systems

  • Extraction and synthesis

    • semantic extraction from text

    • audio-visual extraction

  • Data management

    • event model

    • SAT-ware

    • spatial indexing

  • Analysis

    • graph analysis

    • geospatial

    • predictive modeling

    • damage assessment


Extraction and Synthesis

  • Semantic extraction from text

  • Audio event extraction

  • Visual event extraction

Why do we need “Data Cleaning”?

An actual excerpt from a person’s CV

  • sanitized for privacy

  • quite common in CVs, etc

  • this particular person

    • argues he is good

    • because his work is well-cited

  • but, there is a problem with using CiteSeer ranking

    • in general, it is not valid (in CVs)

    • let’s see why...

“... In June 2004, I was listed as the 1000th most cited author in computer science (of 100,000 authors) by CiteSeer, available at http://citeseer.nj.nec.com/allcited.html. ...”


What is the problem in the example?

Suspicious entries

  • Let us go to the DBLP website

    • which stores bibliographic entries of many CS authors

  • Let us check who are

    • “A. Gupta”

    • “L. Zhang”

CiteSeer: the top-k most cited authors

DBLP

DBLP


Comparing raw and cleaned CiteSeer

Cleaned CiteSeer top-k

CiteSeer top-k


What is the lesson?

  • data should be cleaned first

  • e.g., determine the (unique) real authors of publications

  • solving such challenges is not always “easy”

  • that explains a large body of work on data cleaning

  • note

    • CiteSeer is aware of the problem with its ranking

    • there are more issues with CiteSeer

    • many not related to data cleaning

“Garbage in, garbage out” principle:

Making decisions based on bad data can lead to wrong results.




What is “Reference Disambiguation”?

Author table (clean)

    A1, ‘Dave White’, ‘Intel’

    A2, ‘Don White’, ‘CMU’

    A3, ‘Susan Grey’, ‘MIT’

    A4, ‘John Black’, ‘MIT’

    A5, ‘Joe Brown’, unknown

    A6, ‘Liz Pink’, unknown

Publication table (to be cleaned)

    P1, ‘Databases . . . ’, ‘John Black’, ‘Don White’

    P2, ‘Multimedia . . . ’, ‘Sue Grey’, ‘D. White’

    P3, ‘Title3 . . .’, ‘Dave White’

    P4, ‘Title5 . . .’, ‘Don White’, ‘Joe Brown’

    P5, ‘Title6 . . .’, ‘Joe Brown’, ‘Liz Pink’

    P6, ‘Title7 . . . ’, ‘Liz Pink’, ‘D. White’

  • Analysis(‘D. White’ in P2, our approach):

  • 1. ‘Don White’

    • has a paper with ‘John Black’ @ MIT

  • 2. ‘Dave White’

    • is not connected to MIT in any way

  • 3. ‘Sue Grey’

    • is coauthor of P2 too, and @ MIT

  • Thus: ‘D. White’ in P2 is probably Don

  • (since we know he collaborates with MIT ppl.)

  • Analysis (‘D. White’ in P6, our approach):

  • 1. ‘Don White’

    • has a paper (P4) with Joe Brown;

    • Joe has a paper (P5) with Liz Pink;

    • Liz Pink is a coauthor of P6.

  • 2. ‘Dave White’

    • does not have papers with Joe or Liz

  • Thus: ‘D. White’ in P6 is probably Don

  • (since co-author networks often form clusters)


Attributed Relational Graph (ARG)

  • View dataset as a graph

    • nodes for entities

      • papers, authors, organizations

      • e.g., P2, Susan, MIT

    • edges for relationships

      • “writes”, “affiliated with”

      • e.g. Susan → P2 (“writes”)

  • “Choice” nodes

    • for uncertain relationships

    • mutual exclusion

    • “1” and “2” in the figure

  • Analysis can be viewed as

    • application of the “Context AP”

    • to this graph

    • defined next...

Q: How come domain-independent?


Context Attraction Principle (CAP)

if

  • reference r, made in the context of entity x, refers to an entity yj

  • but the description provided by r matches multiple entities: y1, …, yj, …, yN,

then

  • x and yj are likely to be more strongly connected to each other via chains of relationships

    • than x and yk (k = 1, 2, …, N; k ≠ j).

[Figure: publication P1 contains the reference “J. Smith”; the candidate entities shown are John E. Smith (SSN = 123), Jane Smith and Joe A. Smith.]

  • In designing the RelDC approach

    • our goal was to use CAP as an axiom

    • then solve the problem formally, without heuristics


Analyzing paths: linking entities and contexts

D. White is a reference

  • in the context of P2, P6

  • can link P2, P6 to Don

  • cannot link P2, P6 to Dave

  • more complex paths in general

  • Analysis(‘D. White’ in P2): path P2→Don

  • 1. ‘Don White’

    • has a paper with ‘John Black’ @ MIT

  • 2. ‘Dave White’

    • is not connected to MIT in any way

  • 3. ‘Sue Grey’

    • is coauthor of P2 too, and @ MIT

  • Thus: ‘D. White’ is probably Don White

  • Analysis(‘D. White’ in P6): path P6→Don

  • 1. ‘Don White’

    • has a paper (P4) with Joe Brown;

    • Joe has a paper (P5) with Liz Pink;

    • Liz Pink is a coauthor of P6.

  • 2. ‘Dave White’

    • does not have papers with Joe or Liz

  • Thus: ‘D. White’ is probably Don White
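The path analysis above can be reproduced on the toy tables with a few lines of Python. This is only a sketch under stated assumptions: it links papers, authors and organizations with plain undirected edges, leaves out the two uncertain ‘D. White’ references, treats ‘Sue Grey’ as ‘Susan Grey’, and uses a hypothetical helper `simple_paths` that enumerates simple paths up to a length limit. It is not the RelDC implementation.

```python
from collections import defaultdict

# Certain relationships from the toy tables. The uncertain 'D. White'
# references in P2 and P6 are deliberately left out.
EDGES = [
    ("P1", "John Black"), ("P1", "Don White"),
    ("P2", "Susan Grey"),
    ("P3", "Dave White"),
    ("P4", "Don White"), ("P4", "Joe Brown"),
    ("P5", "Joe Brown"), ("P5", "Liz Pink"),
    ("P6", "Liz Pink"),
    ("Dave White", "Intel"), ("Don White", "CMU"),
    ("Susan Grey", "MIT"), ("John Black", "MIT"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def simple_paths(graph, source, target, max_len):
    """Yield every simple path from source to target with at most max_len edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in sorted(graph[node]):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


graph = build_graph(EDGES)
for context in ("P2", "P6"):
    for candidate in ("Don White", "Dave White"):
        paths = list(simple_paths(graph, context, candidate, max_len=7))
        print(context, "->", candidate, ":", paths)
```

In this toy graph both contexts reach Don White through a chain of relationships, while Dave White is connected only to P3 and Intel and therefore cannot be reached from either paper.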


Questions to answer

Does the CAP principle hold over real datasets?

That is, if we disambiguate references based on it, will the references be correctly disambiguated?

Can we design a generic solution to exploiting relationships for disambiguation?


Problem formalization

  • xi.rk: the name of the k-th author of paper xi, e.g. ‘J. Smith’

  • d[xi.rk]: the true k-th author of paper xi

  • CS[xi.rk]: the candidate entities that xi.rk may refer to, e.g. ‘John A. Smith’, ‘Jane B. Smith’, ...


Entity-Relationship Graph

RelDC views dataset as a graph

  • undirected

  • nodes for entities

    • don’t have weights

  • edges for relationships

    • have weights

    • real number in [0,1]

    • the confidence the relationship exists

Handling References: Linking

(references correspond to relationships)

if |CS[xi.rk]| = 1 then

  • we know the answer d[xi.rk]

  • link xi and d[xi.rk] directly, w = 1

else

  • the answer is uncertain for xi.rk

  • create a “choice” node, link it

  • “option-weights”, w1 + ... + wN = 1

  • option-weights are variables

[Figure: paper P1 with the reference “J. Smith”, linked through a choice node to the candidates “John Smith” and “Jane Smith”.]


Objective of Reference Disambiguation

Definition:

To resolve a reference xi.rk means

  • to pick one yj from CS[xi.rk] as d[xi.rk].

Graph interpretation

  • among w1, w2, ..., wN, assign wj = 1 to one wj

  • means yj is chosen as the answer d[xi.rk]

Definition:

Reference xi.rk is resolved correctly if the chosen yj = d[xi.rk].

Definition:

Reference xi.rk is unresolved or uncertain if not yet resolved...

Goal:

Resolve all uncertain references as correctly as possible.


Formalizing the CAP

CAP

  • is based on “connection strength”

  • c(u,v) for entities u and v

    • measures how strongly u and v are connected to each other via relationships

    • e.g. c(u,v) > c(u,z) in the figure

  • will formalize c(u,v) later

Context Attraction Principle (CAP)

if c(xi, yj) ≥ c(xi, yk)

then wj ≥ wk (most of the time)

We use proportionality:

c(xi, yj) ∙ wk = c(xi, yk) ∙ wj
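As a worked step, the proportionality can be combined with the normalization of option-weights (w1 + ... + wN = 1) to express each weight in terms of the connection strengths. The derivation below is a sketch that treats the strengths as known constants and assumes at least one of them is positive; in RelDC the strengths are themselves functions of other option-weights, so this is only the per-reference form.

```latex
\begin{aligned}
c(x_i, y_j)\, w_k &= c(x_i, y_k)\, w_j
  && \text{for all } j, k \in \{1, \dots, N\} \\
\sum_{k=1}^{N} c(x_i, y_j)\, w_k &= \sum_{k=1}^{N} c(x_i, y_k)\, w_j
  && \text{sum both sides over } k \\
c(x_i, y_j) &= w_j \sum_{k=1}^{N} c(x_i, y_k)
  && \text{since } \textstyle\sum_{k} w_k = 1 \\
w_j &= \frac{c(x_i, y_j)}{\sum_{k=1}^{N} c(x_i, y_k)}
\end{aligned}
```

So each option-weight is the candidate's share of the total connection strength, which matches the CAP ordering: a larger c(xi, yj) gives a larger wj.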


RelDC approach

Input: the ARG for the dataset

  • Step 1: Computing connection strengths

    • for each unresolved reference xi.rk

      • determine equations for all (i.e., N) c(xi, yj)’s

      • c(xi, yj) = gij(w)

        • a function of other option-weights

  • Step 2: Determining equations for option-weights

    • use CAP to relate all wj’s and connection strengths

    • since c(xi, yj) = gij(w), hence wij = fij(w)

  • Step 3: Computing option-weights

    • solve the system of equations from Step 2

  • Step 4: Resolving references

    • use the interpretation procedure to resolve weights


Computing connection strength (Step 1)

Computation of c(u,v) consists of two phases

  • Phase 1: Discover connections

    • all L-short simple paths between u and v

    • bottleneck

    • optimizations, not in SDM05

  • Phase 2: Measure the strength

    • in the discovered connections

    • many c(u,v) models exist

    • we use a random-walk model on graphs
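The two phases can be illustrated with a small sketch. The slides do not spell out the exact random-walk model, so the code below assumes one plausible reading: the strength c(u, v) is the sum, over all simple paths from u to v with at most L edges, of the probability that a walk which never revisits a node and stops at v follows that path, with each step chosen in proportion to edge weight. The function name `connection_strength` and the toy graph are hypothetical.

```python
def connection_strength(graph, u, v, max_len):
    """Sum of path probabilities over simple paths u..v with at most max_len edges.

    graph maps each node to a dict {neighbor: edge weight}. At every step the
    walk picks an unvisited neighbor with probability proportional to the
    edge weight (an assumed model, not necessarily the one used in RelDC).
    """
    total = 0.0
    stack = [(u, frozenset([u]), 1.0, 0)]
    while stack:
        node, visited, prob, length = stack.pop()
        if node == v:
            total += prob          # a complete u..v path was discovered
            continue
        if length >= max_len:
            continue               # path-length limit L reached
        choices = {n: w for n, w in graph[node].items() if n not in visited}
        norm = sum(choices.values())
        if norm == 0:
            continue               # dead end
        for nxt, weight in choices.items():
            stack.append((nxt, visited | {nxt}, prob * weight / norm, length + 1))
    return total


# Hypothetical toy graph: x reaches y via a and via b; z hangs off b.
graph = {
    "x": {"a": 1.0, "b": 1.0},
    "a": {"x": 1.0, "y": 1.0},
    "b": {"x": 1.0, "y": 1.0, "z": 1.0},
    "y": {"a": 1.0, "b": 1.0},
    "z": {"b": 1.0},
}
print(connection_strength(graph, "x", "y", max_len=3))
print(connection_strength(graph, "x", "z", max_len=3))
```

With this graph, y is more strongly connected to x than z is, because two short paths lead to y and only one leads to z.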


Measuring connection strength

  • Note:

    • c(u,v) returns an equation

    • because paths can go via various option-edges

    • cuv = c(u,v) = guv(w)


Equations for option-weights (Step 2)

CAP (proportionality):

System (over-constrained):

Add slack:


Solving the system (Steps 3 and 4)

Step 3: Solve the system of equations

  • use a math solver, or

  • iterative method (approx. solution), or

  • bounding-interval-based method (tech. report).

Step 4: Interpret option-weights

  • to determine the answer for each reference

  • pick yj with the largest weight as the answer
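Steps 3 and 4 can be sketched as a fixed-point iteration followed by an argmax. This is an illustration only: the strength functions in `STRENGTHS` are made up, the update rule (set each weight to its share of the total connection strength, using the previous iteration's weights) is one simple way to iterate on the proportionality constraint, and none of the names come from the RelDC code.

```python
def solve_option_weights(strength_fns, iterations=50):
    """Approximate the option-weights by repeated normalization.

    strength_fns maps a reference id to a list of functions; function j takes
    the current weights of all references and returns the connection strength
    of option j for that reference.
    """
    weights = {ref: [1.0 / len(fns)] * len(fns) for ref, fns in strength_fns.items()}
    for _ in range(iterations):
        updated = {}
        for ref, fns in strength_fns.items():
            strengths = [fn(weights) for fn in fns]
            total = sum(strengths)
            # Keep the old weights if every strength is zero.
            updated[ref] = [s / total for s in strengths] if total > 0 else weights[ref]
        weights = updated
    return weights


def interpret(weights):
    """Step 4: pick, for each reference, the option with the largest weight."""
    return {ref: max(range(len(w)), key=lambda j: w[j]) for ref, w in weights.items()}


# Hypothetical system with two uncertain references whose strengths depend
# on each other's option-weights.
STRENGTHS = {
    "r1": [lambda w: 0.3 + 0.5 * w["r2"][0], lambda w: 0.1],
    "r2": [lambda w: 0.2 + 0.5 * w["r1"][0], lambda w: 0.2],
}

weights = solve_option_weights(STRENGTHS)
answers = interpret(weights)
print(weights)
print(answers)
```

The iteration only approximates a solution; a math solver or the bounding-interval method mentioned above would replace `solve_option_weights` without changing `interpret`.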


Experimental Setup

Parameters

  • When looking for L-short simple paths, L = 7

  • L is the path-length limit

RealPub dataset:

  • CiteSeer + HPSearch

    • publications (255K)

    • authors (176K)

    • organizations (13K)

    • departments (25K)

  • ground truth is not known

    • accuracy...

SynPub datasets:

  • many datasets of two types

  • emulation of RealPub

    • publications (5K)

    • authors (1K)

    • organizations (25K)

    • departments (125K)

  • ground truth is known

RealMov:

  • movies (12K)

  • people (22K)

    • actors

    • directors

    • producers

  • studios (1K)

    • producing

    • distributing


Sample Publication Data

CiteSeer: publication records

HPSearch: author records


Efficiency and Long paths

Non-exponential cost

Longer paths do help


Web Disambiguation

Music Composer

Football Player

UCSD Professor

Comedian

Botany Professor @ Idaho



Web Disambiguation

  • Extract key information such as mentions of entities (persons, names, locations) and other information such as hyperlinks and email addresses from Web pages

  • Cast as a relationship analysis problem

  • Prototype at: http://opteron.calit2.uci.edu:1977/Diamond/people_search.jsp


Extraction and Synthesis: semantic extraction from text

  • Information extraction from text

  • Many systems and techniques

  • May benefit from semantics

  • Limitations

    • All or nothing extraction

    • Towards probabilistic extraction systems


Leads

  • Disambiguation and data cleaning

    • Dmitri Kalashnikov, Stella Chen, Rabia Nuray-Turan

  • Information extraction

    • Naveen Ashish, Sharad Mehrotra


Extraction and Synthesis: audio event extraction

  • Multi-microphone speech processing

    • Speaker identification

    • Noise reduction

  • Audio-visual speech recognition

    • Combine visual features (visemes) with audio

  • Speech recognition on light-weight devices

  • Team

    • Rajesh Hegde, Bhaskar Rao, Shankar Shivappa (UCSD)


Extraction and Synthesis: visual event extraction

  • Combine views from multiple cameras

  • Homomorphic transformations

    • Multi-perspective “view-binding”

  • Team

    • Sangho Park, Mohan Trivedi (UCSD)


Situational Data Management

  • Spatial Indexing

  • Event data model

  • SAT-Ware


Outline

  • Overall Goal

  • Use examples to illustrate:

    • Different approaches in modeling and querying

    • Advantage of our approach

  • Extracting spatial expression

  • Building model for spatial expression

  • Experiments

  • Conclusion


Overall Goal

Info about events that constitute a crisis is often available as text.

Goal: Situation Awareness from Textual Sources

[Figure: textual reports are extracted into a database.]

Textual data during crisis

  • transcribed

    • 911 calls

    • first responder communications

Textual data after crisis

  • first responders’ reports

  • Internet sources

  • for post factum analysis


Motivating Examples

  • Two reports filed by first responders after 9/11 attack:

    • “…the PAPD Mobile Command Post was located on West St. north of WTC …”

    • “…a PAPD Command Truck parked on the west side of Broadway St. and north of Vesey St….”

  • Query: Retrieve Events around WTC

  • Goal: Both events should be retrieved with high scores attached.


Approach 1: Using IR approach

  • Direct Keyword retrieval

    • Only one report mentioned keyword “WTC”

  • Query expansion

    • based on nearby spatial objects

    • E.g. Nearby streets and buildings…

    • Ad hoc, and the set of nearby objects might not be bounded


Approach 2: Mapping Using Uncertain Region

  • Query: Near WTC

  • Report 1: West St. north of WTC

  • Report 2: west side of Broadway St. and north of Vesey St.

  • Rank based on the ratio of intersection

  • Problem: the rank score is not accurate because it rests on uniformity assumptions


Our Approach

  • Step 1: Converting Text to Spatial Expression

    • S-expression: has well-defined function form

  • Query: Near(WTC)

  • West St. north of WTC

    On(West St.) ∧ North(WTC)

  • west side of Broadway St. and north of Vesey St.

    West(Broadway St.) ∧ North(Vesey St.)


Our Approach

Step 2: Mapping S-expression to probability density function (PDF)

  • Near(A)

  • On(West St.) ∧ North(WTC)


Answering Range Query

  • Given a query region

    • Retrieve objects based on the degree of belonging

On(West St.) ∧ North(WTC)

West(Broadway St.) ∧ North(Vesey St.)

  • Consider location as a random variable
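A range query over probabilistic locations can be sketched as follows. The code assumes, purely for illustration, that each event location is an axis-aligned 2-D Gaussian with independent x and y components, so the probability of falling inside a rectangular query region is a product of two one-dimensional normal probabilities. The event names, coordinates and helper functions are hypothetical and do not come from the slides.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a 1-D normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_in_rectangle(mean, std, rect):
    """P(location inside rect) for a Gaussian with independent x and y."""
    (x_min, y_min), (x_max, y_max) = rect
    px = normal_cdf(x_max, mean[0], std[0]) - normal_cdf(x_min, mean[0], std[0])
    py = normal_cdf(y_max, mean[1], std[1]) - normal_cdf(y_min, mean[1], std[1])
    return px * py


def rank_events(events, rect):
    """Score every event by its probability mass inside the query region."""
    scored = [(name, prob_in_rectangle(mean, std, rect))
              for name, (mean, std) in events.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical events: (mean, standard deviation) per axis, arbitrary units.
EVENTS = {
    "report_1": ((0.2, 0.3), (0.3, 0.3)),
    "report_2": ((2.5, 0.4), (0.5, 0.5)),
}
QUERY = ((-1.0, -1.0), (1.0, 1.0))   # lower-left and upper-right corners

ranked = rank_events(EVENTS, QUERY)
for name, score in ranked:
    print(name, round(score, 4))
```

The score plays the role of the "degree of belonging": events whose probability mass lies mostly inside the query region rank first.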


Advantages of Our Approach

  • More explicit spatial mapping removes the need for keyword expansion (IR approach)

  • Probabilistic representation is more formal and accurate than the uncertain region (UR) approach

  • Decouples the extraction and modeling modules

    • Better extraction and modeling modules can easily be plugged in


Extracting Spatial Expression

  • Step1: Discovering landmarks

    • buildings, roads, intersections

  • Step2: Generating s-descriptors

    • Use spatial relations to connect the landmarks

    • Spatial relations: near, behind, between

    • in the format D(L1, L2, ... ,Ln)

  • Step3: Generating s-expressions

    • compositions of s-descriptors

    • near(A) ∧ near(B)


Step1: Discovering landmarks

  • Mark up the text with the landmarks

    • Using gazetteers (incorporated into the information extractor, GATE)

    • Note: the markup covers not only the “name”; features are also attached

Examples of Landmark


Step2: Generating s-descriptors

  • Discover spatial relations around the landmarks

    • Dictionary approach (convert spatial relations to potential words)

    • Machine learning techniques can also be used

Examples of s-descriptors
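A dictionary-based extractor in the spirit of Steps 1 and 2 can be sketched with regular expressions. Everything named here is an assumption made for illustration: the tiny `GAZETTEER`, the `RELATION_PHRASES` dictionary and the function `extract_s_descriptors` are hypothetical, and the actual system builds on GATE rather than on plain regular expressions.

```python
import re

# Hypothetical gazetteer of landmarks and dictionary of spatial-relation phrases.
GAZETTEER = ["Broadway St.", "Vesey St.", "West St.", "WTC"]
RELATION_PHRASES = {
    "west side of": "West",
    "north of": "North",
    "near": "Near",
    "on": "On",
}


def extract_s_descriptors(text):
    """Return (relation, landmark) pairs found as '<phrase> <landmark>' in text."""
    landmarks = "|".join(re.escape(name)
                         for name in sorted(GAZETTEER, key=len, reverse=True))
    phrases = "|".join(re.escape(phrase)
                       for phrase in sorted(RELATION_PHRASES, key=len, reverse=True))
    pattern = re.compile(rf"\b({phrases})\s+({landmarks})", re.IGNORECASE)
    return [(RELATION_PHRASES[m.group(1).lower()], m.group(2))
            for m in pattern.finditer(text)]


report_1 = "the PAPD Mobile Command Post was located on West St. north of WTC"
report_2 = ("a PAPD Command Truck parked on the west side of Broadway St. "
            "and north of Vesey St.")

print(extract_s_descriptors(report_1))
print(extract_s_descriptors(report_2))
```

Each pair corresponds to one s-descriptor such as On(West St.) or North(WTC); composing the pairs found in one report gives its s-expression.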


Modeling S-expression

  • Goal: generating a reasonable probabilistic representation for s-expression

  • Step1: Modeling S-descriptors

  • Step2: Combining s-descriptors


Modeling S-descriptors

  • Modeling templates

    • e.g., uniform or normal distribution

  • Using parameter learning techniques


Generating s-expression

  • In an s-expression, we assume the s-descriptors are conditionally independent.

  • If an s-expression has 2 descriptors, S1, S2

  • It can be generalized to n descriptors, S1…Sn
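The combination formula itself is not preserved in this transcript. One common form that follows from conditional independence, assumed here for illustration, is to multiply the descriptor densities pointwise and renormalize. The sketch below does this on a discretized grid; the function names and the 2 × 2 grids are hypothetical.

```python
from functools import reduce


def combine(pdf_a, pdf_b):
    """Pointwise product of two discretized PDFs on the same grid, renormalized."""
    product = [[a * b for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(pdf_a, pdf_b)]
    total = sum(sum(row) for row in product)
    if total == 0:
        raise ValueError("descriptors are incompatible: no cell has non-zero mass")
    return [[value / total for value in row] for row in product]


def combine_all(pdfs):
    """Generalization to n descriptors S1...Sn."""
    return reduce(combine, pdfs)


# Hypothetical 2 x 2 grids: probability mass per cell for two s-descriptors.
near_a = [[0.4, 0.4],
          [0.1, 0.1]]
north_b = [[0.5, 0.0],
           [0.5, 0.0]]

combined = combine(near_a, north_b)
print(combined)
```

Cells that either descriptor rules out get zero mass, and the remaining mass is redistributed among the cells both descriptors allow.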


Generating s-expression

  • Near(A)

  • Outdoor()

  • Near(WTC)


Experimental Setup

Domain

  • real geographic dataset

  • Manhattan, NY, near WTC

  • buildings, streets, roads

  • 4 × 4 km²

Data

  • Based on 164 reports

    • by Police Officers

    • participants of 9/11

  • s-expressions

    • near(A), on(A), outdoor

    • intersections, buildings, street

  • Construct 2359 pdfs

Queries

  • 50 Range Queries

  • rdsf


Simulate the Errors

  • Extraction Errors:

    • With human supervision, error is small.

  • Modeling Errors:

    • Even with supervision, model parameters can still be away from the ideal settings.

    • E.g., the mean and variance settings for the Gaussian model.

  • We simulate two types of modeling errors for the analysts:

    • Overly confident: estimated model is too “tight”

      • By reducing variance of the “ideal” Gaussian model

    • Not confident: estimated model is too “loose”

      • By increasing variance in the “ideal” Gaussian model


Results

  • Even with large errors, probabilistic models are still better than bounding region methods


Conclusions

Spatial Awareness from Textual Sources

[Figure: textual reports are extracted into a database.]

Novel in this work

  • approach for mapping text to PDF

  • query requirements for SA apps

    • query design issues

  • representation of PDFs

Ongoing work

  • database aspects of the problem

    • more types of queries

Future work

  • spatio-temporal aspects

  • better modeling (text to PDF)


Lead

  • Spatial awareness

    • Yiming Ma




Analysis

Analysis and Visualization

  • Graph analysis

  • GIS

  • Predictive modeling

  • Damage assessment


Graph Analysis

[Figure: layered view of graph analysis.]

  • Relationship summarization/exploration [relations]; multi-dimensional analysis [for documents]; graph pattern-based querying; ranked graph pattern matching

  • Semantic metadata (DBMS): semantic graphs (attributed graphs), taxonomies (“reference data”), entity-relationship schemas, ontologies (“semantic models”)

  • Described data: document repositories, relations


Graph Data Model (Entity-Attribute-Value Model)

  • Graph (edge sets aka triple sets):

    E.g. (&dawit ns:studentAt &UCI)

    (&UCI ns:type &university)

    (ns:university ns:subClassOf ns:organization)

    • Two kinds of nodes: object-ids, literals (e.g. integer, string, etc.)

      • Blank nodes (e.g. (&dawit :studentAt _))

    • Directed edges (aka predicates or properties)

      • there exists only one edge with a given label between a pair of nodes

  • Symmetric representation of Metadata + data

    • Nodes: object classes or link classes

    • Links: predicates on classes:

      (:studentAt :domain :person)

      (:studentAt :range :organization)

      (:university :subclassOf :organization)

  • Object identity + relationship identity

    • Objects and relationships have unique ids (called URIs)
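A triple set of this kind is easy to hold in memory, which makes the "symmetric" treatment of data and metadata concrete. The sketch below stores data triples and schema triples in the same Python set and answers a type question by walking `ns:subClassOf` edges. The spelling of identifiers is normalized to an `ns:` prefix as an assumption, and the helper names are hypothetical.

```python
# Data and schema triples live side by side in one set (subject, predicate, object).
TRIPLES = {
    ("&dawit", "ns:studentAt", "&UCI"),
    ("&UCI", "ns:type", "ns:university"),
    ("ns:university", "ns:subClassOf", "ns:organization"),
    ("ns:studentAt", "ns:domain", "ns:person"),
    ("ns:studentAt", "ns:range", "ns:organization"),
}


def objects(triples, subject, predicate):
    """All o such that (subject, predicate, o) is in the triple set."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def superclasses(triples, cls):
    """All classes reachable from cls via ns:subClassOf edges (transitive)."""
    seen, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for parent in objects(triples, current, "ns:subClassOf"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def types_of(triples, node):
    """Direct types of node plus everything they are subclasses of."""
    direct = objects(triples, node, "ns:type")
    return direct.union(*(superclasses(triples, cls) for cls in direct))


print(types_of(TRIPLES, "&UCI"))
```

Because schema and data share one representation, the same `objects` lookup serves both the instance query (where does &dawit study?) and the schema traversal (what is a university a kind of?).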


Graphs for actual data storage - beyond data modeling

  • Graphs normally used for conceptual data modeling

    • the entity-relationship (ER) model

  • What is different ?

    • Using graphs for actual (minimally structured) data representation.

  • Why ?

    • Store/represent and query data without schema

    • Symmetrically Store/query both schema (ontology) and data

    • Graph traversal based query + reasoning (inference)

    • Multi-schema queries on the same graph

    • Query unstructured data annotated with taxonomies/ontologies using traditional (structured) query operators


[Figure: example graph in three panels (a), (b) and (c). It combines a topic ontology (Comp.Sc, Info. Sys., DB, Multimedia DB, Distributed DB, IR, Interfaces, Languages, Systems, Data Struct., Encrypt., D. Lib., Online services), a MODEL layer with classes (researcher, organization, publication, book, proceeding, article, chapter, editor, author) and properties (name, title, year, price, list_price, rating, pages, org_name, affiliates, produces, refersTo, editsProc, editsBook, writesBook, writesArticle, inProceeding), and an INSTANCE layer with researchers &r1 to &r4 (names John, Alex, Sara), organizations &o1 and &o2 (IBM, UCI), books &b1 to &b3 (prices 90, 110, 100; years 2003, 1998), proceeding &p1 and article &a1. Legend: subClassOf/subPropertyOf and rdf:type edges; &o organization, &r researcher, &b book, &p proceeding, &a article.]


Graph Pattern based Querying

SELECT *
WHERE { ?org :affiliates ?aut .
        ?aut :produces ?b .
        ?b :type :book .
        ?b :price ?p .
        ?b ?pred ?x . }

  • ?org, ?aut, ?b, ?p, ?pred and ?x are variables; each line of the WHERE clause is a triple pattern

  • :produces is a super-class of writesBook

  • queries schema (a); uses schema (b)

  • Variable on predicates (?pred) - matches all applicable predicates
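How a graph pattern is matched against a triple set can be sketched in a few lines. The matcher below is a naive nested-loop evaluation written for illustration, not the system's query engine. The triples are hypothetical and only loosely modeled on the example graph, and `:produces` is stored directly because subproperty inference is not modeled here.

```python
# Hypothetical instance data, loosely modeled on the example graph.
TRIPLES = [
    ("&o1", ":affiliates", "&r1"),
    ("&o1", ":affiliates", "&r2"),
    ("&o1", ":affiliates", "&r3"),
    ("&r1", ":produces", "&b1"),
    ("&r2", ":produces", "&b1"),
    ("&r2", ":produces", "&b2"),
    ("&b1", ":type", ":book"),
    ("&b2", ":type", ":book"),
    ("&b1", ":price", "90"),
    ("&b2", ":price", "110"),
]


def match_pattern(pattern, triples):
    """Return the variable bindings under which one triple pattern matches."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must take the same value everywhere in the pattern.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def match_query(patterns, triples):
    """Match all triple patterns, extending the solutions one pattern at a time."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute variables that are already bound in this solution.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match_pattern(bound, triples):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions


QUERY = [
    ("?org", ":affiliates", "?aut"),
    ("?aut", ":produces", "?b"),
    ("?b", ":type", ":book"),
    ("?b", ":price", "?p"),
]

solutions = match_query(QUERY, TRIPLES)
for row in solutions:
    print(row)
```

A variable in predicate position, such as `?pred` in the last pattern of the query on the slide, needs no special handling in this matcher: any term that starts with `?` is bound the same way.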


Graph Pattern based Querying

The same graph pattern can be evaluated under two semantics:

WHERE { ?org :affiliates ?aut .
        ?aut :produces ?b .
        ?b :type :book .
        ?b :price ?p .
        ?b ?pred ?x . }

  • CONSTRUCT * : extractive semantics; the result is a graph (or graph set)

  • SELECT * : enumerative semantics; the result is a relation

Example result rows:

    &o1  &r1  &b1  90  2003

    &o1  &r2  &b1  90  2003

Graph Pattern based Querying

[Figure: results of the CONSTRUCT (extractive semantics) and SELECT (enumerative semantics) forms of the same query, shown side by side as a graph and as a relation. The values shown involve &o1, &o2, &r1, &r2, &r3, &b1, &b2, &b3, the prices 90, 110 and 100, and the years 2003 and 1998.]


Enumerative Algebra

  • Enumerative algebra - algebra over sets of variable bindings

Bindings (per triple pattern):

    ?org :affiliates ?aut

    org    aut
    &o1    &r1
    &o1    &r2
    &o1    &r3

    ?aut :produces ?b

    aut    b
    &r1    &b1
    &r2    &b1
    &r2    &b2

Joinable bindings: same variable, same value.


Enumerative Algebra (ctd.)

Given two sets of bindings T1 and T2, with r denoting a binding:

T1 ∪ T2 = {r | r ∈ T1 or r ∈ T2}

T1 ⋈ T2 = {r1 ∪ r2 | r1 ∈ T1 and r2 ∈ T2 and r1 and r2 are joinable}

Example tables from the slide. The first includes a row with no value for ?b:

?org   ?aut   ?b
&o1    &r1    &b1
&o1    &r2    &b1
&o1    &r2    &b2
&o1    &r3

The second holds only the joined rows:

?org   ?aut   ?b
&o1    &r1    &b1
&o1    &r2    &b1
&o1    &r2    &b2
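The union and join of binding sets can be sketched as follows, with each binding modeled as a Python dict from variable name to value. The function names are illustrative assumptions; the input rows are the per-pattern bindings from the example slide.

```python
def joinable(r1, r2):
    """Two bindings are joinable if every shared variable has the same value."""
    return all(r2[v] == val for v, val in r1.items() if v in r2)

def union(t1, t2):
    """T1 ∪ T2: every binding found in either input (duplicates dropped)."""
    out = list(t1)
    out.extend(r for r in t2 if r not in out)
    return out

def join(t1, t2):
    """T1 ⋈ T2: merge every joinable pair of bindings."""
    return [{**r1, **r2} for r1 in t1 for r2 in t2 if joinable(r1, r2)]

# Bindings for ?org :affiliates ?aut
T1 = [{"?org": "&o1", "?aut": "&r1"},
      {"?org": "&o1", "?aut": "&r2"},
      {"?org": "&o1", "?aut": "&r3"}]
# Bindings for ?aut :produces ?b
T2 = [{"?aut": "&r1", "?b": "&b1"},
      {"?aut": "&r2", "?b": "&b1"},
      {"?aut": "&r2", "?b": "&b2"}]

joined = join(T1, T2)
```

The binding for `&r3` has no partner in `T2`, so it does not appear in `joined`; the three remaining rows are the ones in the joined table.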


Enumerative Algebra (ctd.)

  • match[P](G) – matches the graph pattern P to graph G

    • Given P = {p1, p2, …, pm}:

match[P](G) = match[p1](G) ⋈ match[p2](G) ⋈ … ⋈ match[pm](G)

  • The result is a set of sets (tuples) of bindings.
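A minimal sketch of match[P](G) as a chain of joins over per-pattern bindings. The helper names (`match_one`, `join`, `match`) are assumptions for illustration, and the nested-loop join is chosen for brevity rather than efficiency.

```python
from functools import reduce

def match_one(pattern, graph):
    """Bindings for a single triple pattern ('?x' terms are variables)."""
    out = []
    for triple in graph:
        b, ok = {}, True
        for p, t in zip(pattern, triple):
            ok = (b.setdefault(p, t) == t) if p.startswith("?") else (p == t)
            if not ok:
                break
        if ok:
            out.append(b)
    return out

def join(t1, t2):
    """Natural join of two binding sets on their shared variables."""
    return [{**r1, **r2} for r1 in t1 for r2 in t2
            if all(r2[v] == x for v, x in r1.items() if v in r2)]

def match(patterns, graph):
    """match[P](G) = match[p1](G) ⋈ match[p2](G) ⋈ ... ⋈ match[pm](G)."""
    return reduce(join, (match_one(p, graph) for p in patterns))

G = [("&o1", "affiliates", "&r1"), ("&o1", "affiliates", "&r2"),
     ("&o1", "affiliates", "&r3"), ("&r1", "produces", "&b1"),
     ("&r2", "produces", "&b1"), ("&r2", "produces", "&b2")]
P = [("?org", "affiliates", "?aut"), ("?aut", "produces", "?b")]

rows = match(P, G)
```

Each element of `rows` is one tuple of bindings for (?org, ?aut, ?b).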


Enumerative Algebra (ctd.)

  • Other operators:

Difference:

T1 \ T2 = {r ∈ T1 | for all r′ ∈ T2, r and r′ are not joinable}

Outer Join:

T1 ⟕ T2 = (T1 ⋈ T2) ∪ (T1 \ T2)

Filter:

σφ(T) evaluates the Boolean condition φ on T. An example of φ is: ?p > 100.
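The three remaining operators can be sketched on the same dict-based bindings. The function names and the toy inputs below (including the price of 110) are assumptions made for the example.

```python
def joinable(r1, r2):
    """Bindings are joinable when shared variables agree."""
    return all(r2[v] == x for v, x in r1.items() if v in r2)

def join(t1, t2):
    return [{**r1, **r2} for r1 in t1 for r2 in t2 if joinable(r1, r2)]

def difference(t1, t2):
    """T1 minus T2: bindings of T1 that are joinable with nothing in T2."""
    return [r for r in t1 if not any(joinable(r, r2) for r2 in t2)]

def outer_join(t1, t2):
    """(T1 join T2) union (T1 minus T2): unmatched T1 bindings are kept as-is."""
    return join(t1, t2) + difference(t1, t2)

def select(cond, t):
    """Filter: keep the bindings for which the Boolean condition holds."""
    return [r for r in t if cond(r)]

# Toy inputs: &b2 has no price binding here.
T1 = [{"?aut": "&r1", "?b": "&b1"}, {"?aut": "&r2", "?b": "&b2"}]
T2 = [{"?b": "&b1", "?p": 110}]

with_price = outer_join(T1, T2)
# The example condition from the slide: ?p > 100 (unbound ?p fails the test).
expensive = select(lambda r: r.get("?p", 0) > 100, with_price)
```

The outer join keeps the `&r2` row even though it has no price, which is what distinguishes it from the plain join.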


Extractive Algebra

Given two graphs G1 and G2, with t denoting a triple:

G1 ∪ G2 = {t | t ∈ G1 or t ∈ G2}

Triples matched per triple pattern (:aff and :prod abbreviate :affiliates and :produces):

?org :affiliates ?aut          ?aut :produces ?b
&o1 :aff &r1                   &r1 :prod &b1
&o1 :aff &r2                   &r2 :prod &b1
&o1 :aff &r3                   &r2 :prod &b2

  • Matching retains structure.

  • More compact representation during implementation.

Extractive Algebra (ctd.)

The extractive join, written ⋈̂ (a join symbol with a hat), takes a pair of triple patterns p = (p1, p2):

G1 ⋈̂p G2 = {G1 ∪ G2 | both conditions below hold}

  • For all t1 ∈ G1, either there exists t2 ∈ G2 such that t1 and t2 are joinable by p, or t1 does not match p1 ∈ p.

  • For all t2 ∈ G2, either there exists t1 ∈ G1 such that t2 and t1 are joinable by p, or t2 does not match p2 ∈ p.

Example inputs for ⋈̂((?org :affiliates ?aut),(?aut :produces ?b)):

?org :affiliates ?aut          ?aut :produces ?b
&o1 :aff &r1                   &r1 :prod &b1
&o1 :aff &r2                   &r2 :prod &b1
&o1 :aff &r3                   &r2 :prod &b2
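One way to read the definition is as a filter over the two input graphs: a triple survives if it has a joinable partner on the other side, or if it does not match its side's pattern at all. The sketch below follows that reading; it is an interpretation for illustration, and `bind`, `joinable`, and `extractive_join` are hypothetical names.

```python
def bind(pattern, triple):
    """Binding if the triple matches the pattern ('?x' = variable), else None."""
    b = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if b.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return b

def joinable(b1, b2):
    return all(b2[v] == x for v, x in b1.items() if v in b2)

def extractive_join(g1, g2, p1, p2):
    """Union of the triples from both graphs that satisfy the two conditions."""
    def survivors(ga, pa, gb, pb):
        partners = [b for b in (bind(pb, t) for t in gb) if b is not None]
        kept = set()
        for t in ga:
            b = bind(pa, t)
            # Keep t if it does not match its pattern, or if it has a partner.
            if b is None or any(joinable(b, other) for other in partners):
                kept.add(t)
        return kept
    return survivors(g1, p1, g2, p2) | survivors(g2, p2, g1, p1)

G1 = {("&o1", ":aff", "&r1"), ("&o1", ":aff", "&r2"), ("&o1", ":aff", "&r3")}
G2 = {("&r1", ":prod", "&b1"), ("&r2", ":prod", "&b1"), ("&r2", ":prod", "&b2")}

result = extractive_join(G1, G2,
                         ("?org", ":aff", "?aut"), ("?aut", ":prod", "?b"))
```

The output is still a set of triples, not a relation: `&o1 :aff &r3` is dropped because `&r3` produces nothing, and every other triple is retained exactly once.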


Extractive Algebra (ctd.)

Graph pattern:

?org :affiliates ?aut .
?aut :produces ?b .
?b :price ?p .
?b ?pred ?x

Step: ⋈̂((?aut :produces ?b),(?b :price ?p))

[Figure: the inputs are the result of the first join (&o1 :aff &r1; &o1 :aff &r2; &r1 :prod &b1; &r2 :prod &b1; &r2 :prod &b2), the price triples (&b1 :price 90; &b3 :price 110), and the year triples (&b1 :year 2003; &b3 :year 1998). After this step the graph holds &o1 :aff &r1; &o1 :aff &r2; &r1 :prod &b1; &r2 :prod &b1; &b1 :price 90. The year triples are still to be joined.]

Extractive Algebra (ctd.)

Graph pattern:

?org :affiliates ?aut .
?aut :produces ?b .
?b :price ?p .
?b ?pred ?x

Step: ⋈̂((?aut :produces ?b),(?b ?pred ?x))

[Figure: the price triples (&b1 :price 90; &b3 :price 110) and year triples (&b1 :year 2003; &b3 :year 1998) are joined with the graph from the previous step. The triples shown in the result are &o1 :aff &r1; &o1 :aff &r2; &r1 :prod &b1; &r2 :prod &b1; &b1 :price 90; &b1 :year 2003.]

Extractive Algebra (ctd.)

  • extract[P](G) – matches the graph pattern P

    • Given P = {p1, p2, …, pm}:

extract[P](G) = match[p1](G) ⋈̂ match[p2](G) ⋈̂ … ⋈̂ match[pm](G)

  • The result is a graph.


Extractive Algebra (ctd.)

  • Other operations:

Difference:

G1 \ G2 = {t | t ∈ G1 and t ∉ G2}

Filter:

σφ(G) = G \ {t | φ(t) ≠ true}


Implementing Extract – Naïve/Join-split

  • As a post-process of enumerative matching

    • Do enumerative matching

      • Produces a joined relation

    • Vertically split join result into triples

  • IO cost: for a pair of triple-sets:

    • 2 reads of triple sets +

    • 1 write of joined result +

    • 2 reads of join result (one for each split/projection) +

    • 2 writes of projected result +

    • 2 reads of the projected triple sets +

    • 1 write of unioned result

    • Total: 6 reads and 4 writes (4 reads and 3 writes if no union).
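The join-split steps above can be sketched in memory as follows: first build the joined relation enumeratively, then project every row back onto each triple pattern and union the resulting triples. The helper names are assumptions, and the sketch ignores the disk reads and writes that the cost figures count.

```python
def bind(pattern, triple):
    """Binding if the triple matches the pattern ('?x' = variable), else None."""
    b = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if b.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return b

def enumerate_matches(patterns, graph):
    """Step 1: enumerative matching, producing one joined relation."""
    rows = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            for triple in graph:
                b = bind(pattern, triple)
                # Extend the row only if shared variables agree.
                if b is not None and all(row.get(v, x) == x for v, x in b.items()):
                    next_rows.append({**row, **b})
        rows = next_rows
    return rows

def split(rows, patterns):
    """Step 2: vertically split each row into triples and union them."""
    return {tuple(row.get(term, term) for term in p)
            for row in rows for p in patterns}

G = [("&o1", ":aff", "&r1"), ("&o1", ":aff", "&r2"), ("&o1", ":aff", "&r3"),
     ("&r1", ":prod", "&b1"), ("&r2", ":prod", "&b1"), ("&r2", ":prod", "&b2")]
P = [("?org", ":aff", "?aut"), ("?aut", ":prod", "?b")]

extracted = split(enumerate_matches(P, G), P)
```

The intermediate relation repeats `&o1 :aff &r2` once per book of `&r2`; the final set union removes that repetition, which is the extra work the other two implementations try to avoid.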


Implementing Extract – 2-way semi-joins

  • Use 2-way semi-joins

    • Given two joinable triple sets A and B, reduce each one against the other to obtain A’ and B’.

  • IO Cost

    • 2 reads of triple sets (first semi-join)

    • 1 write of result to union (writes smaller table)

    • 2 reads to perform next semi-join (1 read is on smaller table)

    • 1 write of result to union

    • Total: 4 reads and 2 writes.

[Figure: A and B are semi-joined in both directions to give A’ and B’.]
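A small sketch of the two semi-joins, assuming the join variable sits at a known position in each triple set. The order shown (reduce A first, then reduce B against the smaller A’) is one plausible reading of the cost breakdown, and `semi_join` is a hypothetical helper.

```python
def semi_join(left, right, left_pos, right_pos):
    """Keep the triples of `left` whose term at left_pos occurs
    at right_pos in some triple of `right`."""
    keys = {t[right_pos] for t in right}
    return [t for t in left if t[left_pos] in keys]

A = [("&o1", ":aff", "&r1"), ("&o1", ":aff", "&r2"), ("&o1", ":aff", "&r3")]
B = [("&r1", ":prod", "&b1"), ("&r2", ":prod", "&b1"), ("&r2", ":prod", "&b2")]

# ?aut is the object (index 2) of A and the subject (index 0) of B.
A_reduced = semi_join(A, B, 2, 0)            # first semi-join: reads A and B
B_reduced = semi_join(B, A_reduced, 0, 2)    # second: reads B and the smaller A'
extracted = A_reduced + B_reduced            # union of the two reduced sets
```

No joined relation is ever materialized, which is where the saving over the join-split approach comes from.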


Implementing Extract – 2-stream operator

  • Scan each input and produce the triples that have at least one match in the other.

  • It is a high-level operator that can be implemented via:

    • Hashing, or

    • Sort-merge

[Figure: inputs A and B pass through the operator and yield A’ and B’.]
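A hash-based version of the 2-stream operator might look like the sketch below: build a hash index on one input, probe it while scanning the other, and emit the matching triples from both sides. The function name and signature are assumptions for illustration.

```python
from collections import defaultdict

def two_stream_hash(a, b, a_pos, b_pos):
    """Return (A', B'): the triples of each input with a match in the other."""
    index_a = defaultdict(list)
    for t in a:                      # build phase: hash A on the join term
        index_a[t[a_pos]].append(t)
    matched_keys = set()
    b_out = []
    for t in b:                      # probe phase: single scan of B
        if t[b_pos] in index_a:
            b_out.append(t)
            matched_keys.add(t[b_pos])
    # Emit each matching A triple once, however many B triples it matched.
    a_out = [t for key in matched_keys for t in index_a[key]]
    return a_out, b_out

A = [("&o1", ":aff", "&r1"), ("&o1", ":aff", "&r2"), ("&o1", ":aff", "&r3")]
B = [("&r1", ":prod", "&b1"), ("&r2", ":prod", "&b1"), ("&r2", ":prod", "&b2")]

a_out, b_out = two_stream_hash(A, B, 2, 0)
```

Each input is scanned once, and both reduced outputs come out of the same pass, in contrast to the two separate passes of the semi-join version.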


Grouping and Aggregation: Flatten-and-Aggregate Approach

  • This is how Oracle supports aggregation over graph data!

  • Also, [Hung, Deng, and Subrahmanian, ICDE 2005]

SELECT ?org, sum (?p) as totalPrice
WHERE { ?org :affiliates ?aut .
        ?aut :writesBook ?b .
        ?b :price ?p }
GROUP BY ?org

[Figure: the enumerative match results over organizations &o1 and &o2, researchers &r1, &r2, and &r3, and books &b1 (90), &b2 (110), and &b3 (100), linked by affiliates and writesBook edges, are passed to a group-and-aggregate step.]

Result: 390. WRONG!
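The double counting can be reproduced with a few lines of Python. The figure did not survive extraction, so the edge lists below are a reconstruction chosen to be consistent with the totals on the slides (390 here; 300 and 200 on the later solution slide). Treat them as an assumption, not as the original data.

```python
from collections import defaultdict

# Assumed edges (reconstructed, see lead-in); prices are as shown on the slides.
AFFILIATES = [("&o1", "&r1"), ("&o1", "&r2"), ("&o1", "&r3"), ("&o2", "&r2")]
WRITES_BOOK = [("&r1", "&b1"), ("&r2", "&b1"), ("&r2", "&b2"), ("&r3", "&b3")]
PRICE = {"&b1": 90, "&b2": 110, "&b3": 100}

# Enumerative match: one flat row per (org, author, book) combination.
rows = [(org, aut, b, PRICE[b])
        for org, aut in AFFILIATES
        for writer, b in WRITES_BOOK if writer == aut]

# Group and aggregate over the flattened rows.
totals = defaultdict(int)
for org, _aut, _b, price in rows:
    totals[org] += price   # &b1 is added once per co-author, so twice for &o1
```

Under these assumed edges `&b1` has two authors affiliated with `&o1`, so its price of 90 is summed twice and the total for `&o1` comes out as 390 instead of 300.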


Group By

  • Should be based on extractive matching (graphs).

  • What should group by mean on graphs?

    • Collapse a set of triples into a single triple.

    • Use Bag nodes.

CONSTRUCT *
WHERE { ?org :affiliates ?aut .
        ?aut :writesBook ?b .
        ?b :price ?p }
GROUP BY ?aut ON :writesBook

  • The GROUP BY clause names a grouping target and a grouping basis.

[Figure: the example graph (&o1, &o2, &r1, &r2, &r3, &b1, &b2, &b3, with affiliates and writesBook edges) with three nodes of type Bag whose members are numbered :1, :2.]


Aggregation

  • Two types (modes) of aggregation on graphs

    • Branch-wise: aggregate a set of values adjacent to a node type

    • Path-wise: aggregate over a path in the graph

      • Not discussed here.

  • Branch-wise example:

SELECT ?b, branch sum (:price) as totalPrice
WHERE { ?org :affiliates ?aut .
        ?aut :writesBook ?b .
        ?b :price ?p }

  • Anchor: ?b. Mode: branch. Aggregation basis label: :price.

[Figure: books with their price and year values: &b1 (price 90, year 2003), &b2 (price 110, year 1998), &b3 (price 100, year 1998).]


Aggregation – revisit example

  • Anchor and aggregation basis are not adjacent!

SELECT ?org, branch sum (:price) as totalPrice
WHERE { ?org :affiliates ?aut .
        ?aut :writesBook ?b .
        ?b :price ?p }
GROUP BY ?org

  • Anchor: ?org. Mode: branch. Aggregation basis label: :price. An "Optional" annotation also appears on the slide.

[Figure: the example graph with organizations &o1 and &o2, researchers &r1, &r2, and &r3, and books &b1 (price 90), &b2 (price 110), and &b3 (price 100), linked by affiliates and writesBook edges.]


Aggregation - solution

  • RULE: All nodes between anchor and aggregation basis should be bags!

    • If anchor and aggregation basis are adjacent, push aggregation into group by.

    • Otherwise, iteratively perform graph grouping with edge-propagation, making each intermediary node an aggregation target.

[Figure: the example graph with Bag nodes (members numbered :1, :2) placed between the organizations &o1, &o2 and the book prices 90, 110, 100.]

Result: &o1, 300; &o2, 200.
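The effect of the rule can be sketched by collapsing each intermediate level into a bag of distinct nodes before summing. Python sets stand in for the Bag nodes here, and the edge lists are a reconstruction chosen to agree with the totals on the slides (the figure itself was lost), so both are assumptions.

```python
# Assumed edges (reconstructed); prices are as shown on the slides.
AFFILIATES = [("&o1", "&r1"), ("&o1", "&r2"), ("&o1", "&r3"), ("&o2", "&r2")]
WRITES_BOOK = [("&r1", "&b1"), ("&r2", "&b1"), ("&r2", "&b2"), ("&r3", "&b3")]
PRICE = {"&b1": 90, "&b2": 110, "&b3": 100}

def author_bag(org):
    """Bag of researchers reachable from the anchor via affiliates."""
    return {aut for o, aut in AFFILIATES if o == org}

def book_bag(authors):
    """Bag of books reachable from a bag of researchers via writesBook.
    A book with several authors in the bag appears only once."""
    return {b for aut, b in WRITES_BOOK if aut in authors}

def branch_sum_price(org):
    """Sum :price over the bag of books, so each book is counted once."""
    return sum(PRICE[b] for b in book_bag(author_bag(org)))

orgs = sorted({o for o, _ in AFFILIATES})
totals = {org: branch_sum_price(org) for org in orgs}
```

With each intermediate level reduced to a bag, `&b1` contributes 90 only once to `&o1`, giving 300 and 200 in place of the flattened 390 and 200.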


Lead

  • Dawit Yimam Seid


Analysis and Visualization

  • Areas: graph analysis, GIS, predictive modeling, damage assessment

  • Ram Hariharan (with Sharad Mehrotra and Chen Li)

  • Searching (open source) GIS data and datasets

    • Metadata

    • Compression


Analysis and Visualization (ctd.)

  • Vibhav Gogate and Jon Hutchinson (with Padhraic Smyth)

  • Activity monitoring and prediction

  • Anomalous event detection


Analysis and Visualization (ctd.)

  • ImageCat Inc (Ron Eguchi, Charles Huyck)

  • INLET, MetaSIM



Disaster Portal

  • Many Communities – Many Disaster Portals

    • Contents of sites are administered by the respective city's emergency management.

    • Easily customized to meet needs of different communities.

    • Regional summarization capabilities built in (e.g., county/state level summary view).

  • Objectives of the Disaster Portal project are to provide:

    • An integrated platform for RESCUE team members to develop, test, and demonstrate their research projects in real-life scenarios.

    • Next-generation capabilities to first responders and the public.

  • Key development partner:

    • City of Ontario

The Disaster Portal is a suite of web applications for disseminating information and providing situational awareness to the general public during a disaster.


Community Deployment of Disaster Portal

  • Applications selected from Disaster Portal suite.

  • Portal framework providing situation summary page, custom look-and-feel




Disaster Portal

Included in Ontario Pilot Disaster Portal


Situational awareness systems

  • Extraction and synthesis

  • Data management

  • Analysis

[Figure: SAMI components arranged under these three areas: semantic extraction from text, audio-visual extraction, event model, SAT-ware, spatial indexing, geospatial, graph analysis, predictive modeling, damage assessment.]


Conclusions

  • Situational data management

  • Semantics

  • Synergies

  • Integrated demonstration

Thank you !

[email protected]

