
### Towards situational awareness systems for disaster response

Naveen Ashish, Calit2@UC-Irvine

Bell Labs India, Bangalore, 04/23/07

Organization

- Introduction to
- SAMI
- Selected research areas
- Technology transition
- Discussion

RESCUE

- The SAMI TEAM
- Students
- Stella Chen, Chaitanya Desai, Vibhav Gogate, Jon Hutchinson,
- Ram Hariharan, Shengyue Ji, Yiming Ma, Rabia Nuray-Turan,
- Dawit Seid, Shankar Shivappa
- Staff
- Jay Lickfett, Chris Davison
- Collaborators
- Charles Huyck, Ron Eguchi, Shubharoop Ghosh
- Faculty, Scientists and Post-docs
- Dmitri Kalashnikov, Rajesh Hegde, Sharad Mehrotra, Sangho Park
- Slide Aggregator (aka Project Leader)
- Naveen Ashish

- NSF funded “large-ITR” project
- Advance information technologies for disaster response

- 5 year project
- Oct 2003 to Oct 2008

- Institutions
- 6 universities (UCI, UCSD, UIUC, BYU, U-Colorado, U-Maryland) and 1 company (ImageCat)
- Active and formal community partners
- City of LA, OCFA, Irvine Police, ….

- People
- Director: Sharad Mehrotra
- ~ 25 researchers and staff, ~40 students

- Web: http://www.itr-rescue.org

RESCUE Mission

The mission of RESCUE is to enhance the ability of emergency response organizations and the public to mitigate crises, save lives, and prevent secondary and indirect human and economic loss by radically transforming ways in which these organizations gather, process, manage, use and disseminate information during man-made and natural catastrophes.

- Response Effectiveness: lives & property saved, damage prevented, cascades avoided
- Quality of Decisions: first responders, consequence planners, public
- Quality & Timeliness of Information
- Situational Awareness: incidences, resources, victims, needs

Observation: Right Information to the Right Person at the Right Time can result in dramatically better response

RESCUE Objectives

- Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster.
- Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals).
- Develop robust communication systems that continue to operate in crisis situations despite partial/total failure of infrastructure and increased communication demands.
- Develop technologies that can be used for timely and customized dissemination of crisis information that inform the public at large thus enhancing the abilities of the affected populations to take appropriate self-protective actions.
- Explore the privacy challenges that emerge as a result of infusing technology to improve information flow in crisis response networks and the public.
- Promote interdisciplinary education at all levels (graduate, undergraduate, K-12) and across diverse student groups to expose the future community of citizens to issues in emergency management and homeland security – an area of global and national importance.

RESCUE Research Projects

- SAMI: Situational Awareness from Multi-Modal Input (Project Lead: N. Ashish, UCI)
- PISA: Policy-driven Information Sharing Architecture (Project Lead: M. Winslett, UIUC)
- Customized Dissemination in the Large (Project Leads: K. Tierney, UC-B & N. Venkatasubramanian, UCI)
- Privacy Implications of Technology Adoption (Project Lead: S. Mehrotra, UCI)
- Robust Networking and Information Collection (Project Lead: BS Manoj, UCSD)

A Situational Awareness Application

[Figure: a situational awareness system. Information inputs: reports, responders, news, weather, traffic, simulations, reconnaissance. Applications: evacuation planning, damage assessment, situational dashboard.]

Architecture

Events as fundamental abstraction units

[Figure: architecture with extraction and synthesis and analysis components.]

Areas

[Figure: research areas of situational awareness systems. Groups: extraction and synthesis, data management, analysis. Topics: graph analysis, semantic extraction from text, geospatial, audio-visual extraction, E event model, SAT-ware, predictive modeling, spatial indexing, damage assessment.]

Extraction and Synthesis

[Figure: topics in this area: semantic extraction from text, audio event extraction, visual event extraction.]

Why do we need “Data Cleaning”?

An actual excerpt from a person’s CV

- sanitized for privacy
- quite common in CVs, etc
- this particular person
- argues he is good
- because his work is well-cited

- but, there is a problem with using CiteSeer ranking
- in general, it is not valid (in CVs)
- let’s see why...

“... In June 2004, I was listed as the 1000th most cited author in computer science (of 100,000 authors) by CiteSeer, available at http://citeseer.nj.nec.com/allcited.html. ...”

What is the problem in the example?

Suspicious entries

- Let us go to the DBLP website
- which stores bibliographic entries of many CS authors

- Let us check who are
- “A. Gupta”
- “L. Zhang”

[Screenshots: CiteSeer, the top-k most cited authors; DBLP.]

What is the lesson?

- data should be cleaned first
- e.g., determine the (unique) real authors of publications
- solving such challenges is not always “easy”
- that explains a large body of work on data cleaning
- note
- CiteSeer is aware of the problem with its ranking
- there are more issues with CiteSeer
- many not related to data cleaning

“Garbage in, garbage out” principle:

Making decisions based on bad data can lead to wrong results.

What is “Reference Disambiguation”?

Author table (clean)

| ID | Name | Affiliation |
| --- | --- | --- |
| A1 | Dave White | Intel |
| A2 | Don White | CMU |
| A3 | Susan Grey | MIT |
| A4 | John Black | MIT |
| A5 | Joe Brown | unknown |
| A6 | Liz Pink | unknown |

Publication table (to be cleaned)

| ID | Title | Authors |
| --- | --- | --- |
| P1 | Databases . . . | John Black, Don White |
| P2 | Multimedia . . . | Sue Grey, D. White |
| P3 | Title3 . . . | Dave White |
| P4 | Title5 . . . | Don White, Joe Brown |
| P5 | Title6 . . . | Joe Brown, Liz Pink |
| P6 | Title7 . . . | Liz Pink, D. White |

- Analysis(‘D. White’ in P2, our approach):
- 1. ‘Don White’
- has a paper with ‘John Black’@MIT

- 2. ‘Dave White’
- is not connected to MIT in any way

- 3. ‘Sue Grey’
- is coauthor of P2 too, and @ MIT

- Thus: ‘D. White’ in P2 is probably Don
- (since we know he collaborates with MIT ppl.)

- Analysis (‘D. White’ in P6, our approach):
- 1. ‘Don White’
- has a paper (P4) with Joe Brown;
- Joe has a paper (P5) with Liz Pink;
- Liz Pink is a coauthor of P6.

- 2. ‘Dave White’
- does not have papers with Joe or Liz

- Thus: ‘D. White’ in P6 is probably Don
- (since co-author networks often form clusters)
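The two analyses above can be reproduced with a short script. This is only a minimal sketch under stated assumptions: 'Sue Grey' is taken to be Susan Grey (A3), the ambiguous 'D. White' references are left out of the graph, and the number of short simple paths is used as a crude stand-in for connection strength. The function names and the path-length limit of 5 are illustrative choices, not part of the original approach.

```python
from collections import defaultdict

# Author table (clean): id -> (name, affiliation or None if unknown)
authors = {
    "A1": ("Dave White", "Intel"),
    "A2": ("Don White", "CMU"),
    "A3": ("Susan Grey", "MIT"),
    "A4": ("John Black", "MIT"),
    "A5": ("Joe Brown", None),
    "A6": ("Liz Pink", None),
}

# Publications with only the unambiguous author references linked.
# The 'D. White' references in P2 and P6 are what we want to resolve.
papers = {
    "P1": ["A4", "A2"],
    "P2": ["A3"],  # assumption: 'Sue Grey' is Susan Grey
    "P3": ["A1"],
    "P4": ["A2", "A5"],
    "P5": ["A5", "A6"],
    "P6": ["A6"],
}


def build_graph():
    """Undirected graph over papers, authors, and organizations."""
    graph = defaultdict(set)
    for paper, author_ids in papers.items():
        for author_id in author_ids:
            graph[paper].add(author_id)
            graph[author_id].add(paper)
    for author_id, (_, org) in authors.items():
        if org is not None:
            graph[author_id].add(org)
            graph[org].add(author_id)
    return graph


def count_simple_paths(graph, src, dst, max_len):
    """Count simple paths from src to dst that use at most max_len edges."""
    def dfs(node, visited, depth):
        if node == dst:
            return 1
        if depth == max_len:
            return 0
        return sum(dfs(nxt, visited | {nxt}, depth + 1)
                   for nxt in graph[node] if nxt not in visited)
    return dfs(src, {src}, 0)


def resolve(graph, context, candidates, max_len=5):
    """Pick the candidate with the most short paths to the context entity."""
    scores = {c: count_simple_paths(graph, context, c, max_len) for c in candidates}
    return max(scores, key=scores.get), scores


graph = build_graph()
print(resolve(graph, "P2", ["A1", "A2"]))  # ('A2', {'A1': 0, 'A2': 1})
print(resolve(graph, "P6", ["A1", "A2"]))  # ('A2', {'A1': 0, 'A2': 1})
```

In both cases the only connection runs to Don White (A2): via MIT and John Black for P2, and via the Liz Pink, Joe Brown co-author chain for P6.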

Attributed Relational Graph (ARG)

- View dataset as a graph
- nodes for entities
- papers, authors, organizations
- e.g., P2, Susan, MIT

- edges for relationships
- “writes”, “affiliated with”
- e.g. Susan → P2 (“writes”)

- “Choice” nodes
- for uncertain relationships
- mutual exclusion
- “1” and “2” in the figure

- Analysis can be viewed as
- application of the “Context AP”
- to this graph
- defined next...

Q: How come domain-independent?

Context Attraction Principle (CAP)

if

- reference r, made in the context of entity x, refers to an entity yj
- but the description provided by r matches multiple entities: y1, …, yj, …, yN,

then

- x and yj are likely to be more strongly connected to each other via chains of relationships
- than x and yk (k = 1, 2, …, N; k ≠ j).

[Figure: publication P1 contains the reference “J. Smith”, which matches the entities John E. Smith (SSN = 123), Jane Smith, and Joe A. Smith.]

- In designing the RelDC approach
- - our goal was to use CAP as an axiom
- - then solve problem formally, without heuristics

Analyzing paths: linking entities and contexts

D. White is a reference

- in the context of P2, P6
- can link P2, P6 to Don
- cannot link P2, P6 to Dave
- more complex paths in general

- Analysis(‘D. White’ in P2): path P2→Don
- 1. ‘Don White’
- has a paper with ‘John Black’@MIT

- 2. ‘Dave White’
- is not connected to MIT in any way

- 3. ‘Sue Grey’
- is coauthor of P2 too, and @ MIT

- Thus: ‘D. White’ is probably Don White

- Analysis(‘D. White’ in P6): path P6→Don
- 1. ‘Don White’
- has a paper (P4) with Joe Brown;
- Joe has a paper (P5) with Liz Pink;
- Liz Pink is a coauthor of P6.

- 2. ‘Dave White’
- does not have papers with Joe or Liz

- Thus: ‘D. White’ is probably Don White

Questions to answer

- Does the CAP principle hold over real datasets? That is, if we disambiguate references based on it, will the references be correctly disambiguated?
- Can we design a generic solution to exploiting relationships for disambiguation?

Problem formalization

- xi.rk: a reference, e.g. the name of the k-th author of paper xi, such as ‘J. Smith’
- d[xi.rk]: the true k-th author of paper xi
- CS[xi.rk]: the set of candidate entities for the reference, e.g. ‘John A. Smith’, ‘Jane B. Smith’, ...

Entity-Relationship Graph

RelDC views dataset as a graph

- undirected
- nodes for entities
- don’t have weights

- edges for relationships
- have weights
- real number in [0,1]
- the confidence the relationship exists

Handling References: Linking

(references correspond to relationships)

if |CS[xi.rk]| = 1 then

- we know the answer d[xi.rk]
- link xi and d[xi.rk] directly, w = 1

else

- the answer is uncertain for xi.rk
- create a “choice” node, link it
- “option-weights”, w1 + ... + wN = 1
- option-weights are variables

[Figure: the reference “J. Smith” in paper P1 linked through a choice node to candidates such as “John Smith” and “Jane Smith”.]

Objective of Reference Disambiguation

Definition:

To resolve a reference xi.rk means

- to pick one yj from CS[xi.rk] as d[xi.rk].

Graph interpretation

- among w1, w2, ... , wN, assign wj = 1 to one wj
- means yj is chosen as the answer d[xi.rk]

Definition:

Reference xi.rk is resolved correctly, if the chosen yj = d[xi.rk].

Definition:

Reference xi.rk is unresolved or uncertain, if not yet resolved...

Goal:

Resolve all uncertain references as correctly as possible.

Formalizing the CAP

CAP

- is based on “connection strength”
- c(u,v) for entities u and v
- measures how strongly u and v are connected to each other via relationships
- e.g. c(u,v) > c(u,z) in the figure

- will formalize c(u,v) later

Context Attraction Principle (CAP)

if c(xi, yj) ≥ c(xi, yk)

then wj ≥ wk (most of the time)

We use proportionality:

c(xi, yj) ∙ wk = c(xi, yk) ∙ wj
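For a single reference, the proportionality can be solved in closed form. The derivation below is a sketch that assumes the connection strengths are treated as fixed numbers with a positive sum, and uses the constraint w1 + ... + wN = 1 on option-weights; the shorthand c_j for c(xi, yj) is introduced here for brevity. In the full approach the strengths themselves depend on other option-weights, so this only describes one step.

```latex
% Shorthand (assumption): c_j = c(x_i, y_j), with \sum_k c_k > 0.
\begin{align*}
c_j \, w_k &= c_k \, w_j && \text{for all } j, k \\
c_j \sum_{k=1}^{N} w_k &= w_j \sum_{k=1}^{N} c_k && \text{(sum over } k\text{)} \\
c_j &= w_j \sum_{k=1}^{N} c_k && \text{(since } \textstyle\sum_k w_k = 1\text{)} \\
w_j &= \frac{c_j}{\sum_{k=1}^{N} c_k}
\end{align*}
```

Each option-weight is thus the share of the total connection strength that goes to candidate yj.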

RelDC approach

Input: the ARG for the dataset

1. Computing connection strengths
   - for each unresolved reference xi.rk
   - determine equations for all (i.e., N) c(xi, yj)’s
   - c(xi, yj) = gij(w)
   - a function of other option-weights
2. Determining equations for option-weights
   - use CAP to relate all wj’s and connection strengths
   - since c(xi, yj) = gij(w), hence wij = fij(w)
3. Computing option-weights
   - solve the system of equations from Step 2.
4. Resolving references
   - use the interpretation procedure to resolve weights

Computing connection strength (Step 1)

Computation of c(u,v) consists of two phases

- Phase 1: Discover connections
- all L-short simple paths between u and v
- bottleneck
- optimizations, not in SDM05

- Phase 2: Measure the strength
- in the discovered connections
- many c(u,v) models exist
- we use random walks in graphs model

Measuring connection strength

- Note:
- c(u,v) returns an equation
- because paths can go via various option-edges
- cuv = c(u,v) = guv(w)
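The two phases can be illustrated with a small numeric sketch. It is a simplification under stated assumptions: edge weights are fixed numbers rather than option-weight variables (so the result is a number, not an equation), and the probability of following a path is taken as the product, at each step, of the chosen edge weight divided by the total weight of edges leading to not-yet-visited nodes. The function name and the example graph are hypothetical.

```python
def connection_strength(graph, u, v, max_len):
    """Sum, over simple u-v paths with at most max_len edges, of the
    probability that a random walk starting at u follows that path.

    graph maps each node to a dict {neighbor: edge weight in [0, 1]}.
    """
    def walk(node, visited, depth, prob):
        if node == v:
            return prob
        if depth == max_len:
            return 0.0
        # A simple path may not revisit nodes, so only unvisited neighbors count.
        options = {n: w for n, w in graph[node].items() if n not in visited}
        total_weight = sum(options.values())
        if total_weight == 0:
            return 0.0
        return sum(walk(n, visited | {n}, depth + 1, prob * w / total_weight)
                   for n, w in options.items())

    return walk(u, frozenset([u]), 0, 1.0)


# Hypothetical undirected graph: v is reachable from u via a and via b,
# while z is reachable within 3 edges only via b.
example_graph = {
    "u": {"a": 1.0, "b": 1.0},
    "a": {"u": 1.0, "v": 1.0},
    "b": {"u": 1.0, "v": 1.0, "z": 1.0},
    "v": {"a": 1.0, "b": 1.0},
    "z": {"b": 1.0},
}

print(connection_strength(example_graph, "u", "v", 3))  # 0.75
print(connection_strength(example_graph, "u", "z", 3))  # 0.25
```

Here c(u,v) > c(u,z), so under CAP a reference made in the context of u would lean towards v.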

Solving the system (Steps 3 and 4)

Step 3: Solve the system of equations

- use a math solver, or
- iterative method (approx. solution), or
- bounding-interval-based method (tech. report).

Step 4: Interpret option-weights

- to determine the answer for each reference
- pick yj with the largest weight as the answer

Experimental Setup

Parameters

- When looking for L-short simple paths, L = 7
- L is the path-length limit

RealPub dataset:

- CiteSeer + HPSearch
- publications (255K)
- authors (176K)
- organizations (13K)
- departments (25K)

- ground truth is not known
- accuracy...

SynPub datasets:

- many datasets of two types
- emulation of RealPub
- publications (5K)
- authors (1K)
- organizations (25K)
- departments (125K)

- ground truth is known

RealMov:

- movies (12K)
- people (22K)
- actors
- directors
- producers

- studios (1K)
- producing
- distributing

Web Disambiguation

- Extract key information such as mentions of entities (persons, names, locations) and other information such as hyperlinks and email addresses from Web pages
- Cast as a relationship analysis problem
- Prototype at: http://opteron.calit2.uci.edu:1977/Diamond/people_search.jsp

Semantic extraction from text

- Information extraction from text
- Many systems and techniques
- May benefit from semantics
- Limitations
- All or nothing extraction
- Towards probabilistic extraction systems

Leads

- Disambiguation and data cleaning
- Dmitri Kalashnikov, Stella Chen, Rabia Nuray-Turan

- Information extraction
- Naveen Ashish, Sharad Mehrotra

Audio event extraction

- Multi-microphone speech processing
- Speaker identification
- Noise reduction

- Audio-visual speech recognition
- Combine visual features (visemes) with audio

- Speech recognition on light-weight devices
- Team
- Rajesh Hegde, Bhaskar Rao, Shankar Shivappa (UCSD)

Visual event extraction

- Combine views from multiple cameras
- Homomorphic transformations
- Multi-perspective “view-binding”

- Team
- Sangho Park, Mohan Trivedi (UCSD)

Outline

- Overall Goal
- Use examples to illustrate:
- Different approaches in modeling and querying
- Advantage of our approach

- Extracting spatial expression
- Building model for spatial expression
- Experiments
- Conclusion

Overall Goal

Info about events that constitute a crisis is often available as text.

Goal: Situation Awareness from Textual Sources

[Figure: textual reports feeding a database.]

Textual data during crisis

- transcribed
- 911 calls
- first responder communications

Textual data after crisis

- first responders reports
- Internet sources
- for post factum analysis

Motivating Examples

- Two reports filed by first responders after 9/11 attack:
- “…the PAPD Mobile Command Post was located on West St. north of WTC …”
- “…a PAPD Command Truck parked on the west side of Broadway St. and north of Vesey St….”

- Query: Retrieve Events around WTC
- Goal: Both events should be retrieved with high scores attached.

Approach 1: Using IR approach

- Direct Keyword retrieval
- Only one report mentioned keyword “WTC”

- Query expansion
- based on nearby spatial objects
- E.g. Nearby streets and buildings…
- Ad-hoc and Objects might not be bounded

Approach 2: Mapping Using Uncertain Region

- Query: Near WTC
- Report 1: West St. north of WTC
- Report 2: west side of Broadway St. and north of Vesey St.
- Rank based on the ratio of intersection
- Problem: the rank score is inaccurate because it assumes a uniform distribution within each region

[Figure: uncertain regions for the two reports and for the query Near(WTC).]

Our Approach

Step 1: Converting Text to Spatial Expression

- S-expression: has a well-defined functional form
- “West St. north of WTC”: On(West St.) North(WTC)
- “west side of Broadway St. and north of Vesey St.”: West(Broadway St.) North(Vesey St.)

Our Approach

Step 2: Mapping S-expression to probability density function (PDF)

[Figure: PDFs for Near(A) and for On(West St.) North(WTC).]

Answering Range Query

- Given a query region
- Retrieve objects based on the degree of belonging

[Figure: a query region overlaid on the PDFs of On(West St.) North(WTC) and West(Broadway St.) North(Vesey St.).]

- Consider location as a random variable
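A range query over such PDFs can be sketched as follows. This is a minimal illustration, assuming each event location is modeled as a 2-D Gaussian with independent x and y coordinates and that the query region is an axis-aligned rectangle; the event names, coordinates, and function names are hypothetical. The score of an event is the probability mass of its PDF inside the query region.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a 1-D normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def mass_in_rectangle(pdf, rect):
    """Probability that the location falls inside rect.

    pdf  = ((mean_x, std_x), (mean_y, std_y)), independent coordinates
    rect = (x_lo, x_hi, y_lo, y_hi)
    """
    (mean_x, std_x), (mean_y, std_y) = pdf
    x_lo, x_hi, y_lo, y_hi = rect
    px = normal_cdf(x_hi, mean_x, std_x) - normal_cdf(x_lo, mean_x, std_x)
    py = normal_cdf(y_hi, mean_y, std_y) - normal_cdf(y_lo, mean_y, std_y)
    return px * py


def range_query(events, rect):
    """Rank events by their degree of belonging to the query region."""
    scored = [(name, mass_in_rectangle(pdf, rect)) for name, pdf in events.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical coordinates in km, relative to the query landmark.
events = {
    "report_1": ((0.0, 0.2), (0.3, 0.2)),
    "report_2": ((0.9, 0.3), (0.4, 0.3)),
}
query_region = (-0.5, 0.5, -0.5, 0.5)

for name, score in range_query(events, query_region):
    print(name, round(score, 3))
```

Both events receive a score, so neither is dropped outright; the one whose PDF lies mostly inside the region is ranked first.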

Advantages of Our Approach

- More explicit spatial mapping removes the need for keyword expansion (IR approach)
- Probabilistic representation is more formal and accurate than the uncertain region (UR) approach
- Decouples the extraction and modeling modules
- Better extraction and modeling modules can be easily plugged in

Extracting Spatial Expression

- Step1: Discovering landmarks
- buildings, roads, intersections

- Step2: Generating s-descriptors
- Use spatial relations to connect the landmarks
- Spatial relations: near, behind, between
- in the format D(L1, L2, ... ,Ln)

- Step3: Generating s-expressions
- compositions of s-descriptors
- near(A) near(B)

Step1: Discovering landmarks

- Mark up the text with landmarks
- Using gazetteers (incorporated into the information extractor, GATE)
- Note: the markup records not only the landmark “name” but also its attached features

Examples of Landmark

Step2: Generating s-descriptors

- Discover spatial relations around the landmarks
- Dictionary approach (convert spatial relations to potential words)
- Machine learning techniques can also be used

Examples of s-descriptors

Modeling S-expression

- Goal: generating a reasonable probabilistic representation for s-expression
- Step1: Modeling S-descriptors
- Step2: Combining s-descriptors

Modeling S-descriptors

- Modeling templates
- e.g. Uniform, Normal distribution

- Using parameter learning techniques

Generating s-expression

- In an s-expression, we assume the s-descriptors are conditionally independent.
- If an s-expression has 2 descriptors, S1, S2
- It can be generalized to n descriptors, S1…Sn
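The combination formula from the slide is not preserved in this transcript. One common way to combine conditionally independent descriptors, shown here purely as an assumption, is to multiply their densities cell by cell on a discretized grid and renormalize (which additionally assumes a uniform prior over locations). The function names and the 2 × 2 grids are hypothetical.

```python
def normalize(grid):
    """Scale a grid of non-negative values so that it sums to 1."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        raise ValueError("descriptors are contradictory: no cell has positive mass")
    return [[cell / total for cell in row] for row in grid]


def combine(*grids):
    """Combine the PDFs of n descriptors S1..Sn by cell-wise product."""
    rows, cols = len(grids[0]), len(grids[0][0])
    product = [[1.0] * cols for _ in range(rows)]
    for grid in grids:
        for i in range(rows):
            for j in range(cols):
                product[i][j] *= grid[i][j]
    return normalize(product)


# Hypothetical descriptors on a 2 x 2 grid:
# S1 puts all mass on the top row, S2 on the left column.
s1 = normalize([[1.0, 1.0], [0.0, 0.0]])
s2 = normalize([[1.0, 0.0], [1.0, 0.0]])

print(combine(s1, s2))  # [[1.0, 0.0], [0.0, 0.0]]
```

The combined density concentrates on the only cell consistent with both descriptors, and the same function accepts any number of grids.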

Experimental Setup

Domain

- real geographic dataset
- Manhattan, NY, near WTC
- buildings, streets, roads
- 4 × 4 km²

Data

- Based on 164 reports
- by Police Officers
- participants of 9/11

- s-expressions
- near(A), on(A), outdoor
- intersections, buildings, street

- Construct 2359 pdfs

Queries

- 50 Range Queries

Simulate the Errors

- Extraction Errors:
- With human supervision, error is small.

- Modeling Errors:
- Even with supervision, model parameters can still be away from the ideal settings.
- E.g., the mean and variance settings for the Gaussian model.

- We simulate two types of modeling errors for the analysts:
- Overly confident: estimated model is too “tight”
- By reducing variance of the “ideal” Gaussian model

- Not confident: estimated model is too “loose”
- By increasing variance in the “ideal” Gaussian model


Results

- Even with large errors, probabilistic models are still better than bounding region methods

Conclusions

Spatial Awareness from Textual Sources

[Figure: textual reports feeding a database.]

Novel in this work

- approach for mapping text to PDF
- query requirements for SA apps
- query design issues

- representation of PDFs

Ongoing work

- database aspects of the problem
- more types of queries
Future work

- more types of queries
- spatio-temporal aspects
- better modeling (text to PDF)

Lead

- Spatial awareness
- Yiming Ma

Graph Analysis

[Figure: a DBMS for semantic metadata. Operations: relationship summarization/exploration (relations), multi-dimensional analysis (for documents), graph pattern-based querying, ranked graph pattern matching. Semantic metadata: semantic graphs (attributed graphs), taxonomies (“reference data”), entity-relationship schemas, ontologies (“semantic models”). Described data: document repositories, relations.]

Graph Data Model (Entity-Attribute-Value Model)

- Graph (edge sets aka triple sets), e.g.:

(&dawit ns:studentAt &UCI)

(&UCI ns:type &university)

(ns:university ns:subClassOf ns:organization)

- Two kinds of nodes: object-ids, literals (e.g. integer, string, etc.)
- Blank nodes, e.g. (&dawit :studentAt _)
- Directed edges (aka predicates or properties)
- there exists only one edge with a given label between a pair of nodes
- Symmetric representation of metadata + data
- Nodes: object classes or link classes
- Links: predicates on classes:

(:studentAt :domain :person)

(:studentAt :range :organization)

(:university :subclassOf :organization)

- Object identity + relationship identity
- Objects and relationships have unique ids (called URIs)
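The triple-set view of a graph is easy to mimic in code. The sketch below is an illustration only: it stores the example triples as tuples in a Python set and matches a single triple pattern whose variables start with `?`; the function names are hypothetical and this is not the storage engine described in the talk.

```python
# The example triples from the slide, stored as (subject, predicate, object).
triples = {
    ("&dawit", "ns:studentAt", "&UCI"),
    ("&UCI", "ns:type", "&university"),
    ("ns:university", "ns:subClassOf", "ns:organization"),
}


def is_variable(term):
    """Variables are written with a leading question mark, e.g. ?who."""
    return term.startswith("?")


def match(pattern, graph):
    """Return one binding (dict variable -> value) per triple matching pattern."""
    results = []
    for triple in graph:
        binding = {}
        matched = True
        for pattern_term, value in zip(pattern, triple):
            if is_variable(pattern_term):
                # A variable used twice in a pattern must bind to the same value.
                if binding.get(pattern_term, value) != value:
                    matched = False
                    break
                binding[pattern_term] = value
            elif pattern_term != value:
                matched = False
                break
        if matched:
            results.append(binding)
    return results


print(match(("?who", "ns:studentAt", "?where"), triples))
# [{'?who': '&dawit', '?where': '&UCI'}]
```

Because schema-level triples such as the `ns:subClassOf` edge live in the same set as the data, the same `match` call can query both.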

Graphs for actual data storage - beyond data modeling

- Graphs normally used for conceptual data modeling
- the entity-relationship (ER) model

- What is different ?
- Using graphs for actual (minimally structured) data representation.

- Why ?
- Store/represent and query data without schema
- Symmetrically Store/query both schema (ontology) and data
- Graph traversal based query + reasoning (inference)
- Multi-schema queries on the same graph
- Query unstructured data annotated with taxonomies/ontologies using traditional (structured) query operators

[Figure: example data graph. Model level: three schemas (a), (b), (c) over classes such as researcher, publication, book, proceeding, article, chapter, editor, author, and organization, with properties such as name, title, year, price, list_price, rating, pages, org_name, affiliates, produces, refersTo, writesBook, writesArticle, editsBook, editsProc, and inProceeding, plus a topic ontology (Comp.Sc, Info. Sys., DB, IR, Multimedia DB, Distributed DB, and others). Instance level: organizations &o1 (IBM) and &o2 (UCI), researchers &r1 to &r4 (names John, Alex, Sara), books &b1 to &b3 (prices 90, 110, 100; years 2003, 1998, 1998), proceeding &p1, article &a1. Legend: &o organization, &r researcher, &b book, &p proceeding, &a article; edge types subClassOf/subPropertyOf and rdf:type.]

Graph Pattern based Querying

SELECT *
WHERE { ?org :affiliates ?aut .
?aut :produces ?b .
?b :type :book .
?b :price ?p .
?b ?pred ?x . }

Annotations on the query:

- terms such as ?org are variables; each line of the WHERE clause is a triple pattern
- :produces is a super-class of writesBook
- queries schema (a), uses schema (b)
- a variable on predicates (?pred) matches all applicable predicates

Graph Pattern based Querying

Extractive semantics:

CONSTRUCT *
WHERE { ?org :affiliates ?aut .
?aut :produces ?b .
?b :type :book .
?b :price ?p .
?b ?pred ?x . }

Enumerative semantics:

SELECT *
WHERE { ?org :affiliates ?aut .
?aut :produces ?b .
?b :type :book .
?b :price ?p .
?b ?pred ?x . }

[Figure: results over the example data. Enumerative semantics returns a relation (a graph set) with one tuple per match, e.g. (&o1, &r1, &b1, 90, 2003) and (&o1, &r2, &b1, 90, 2003); extractive semantics returns a single graph containing the matched nodes and edges.]

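Enumerative Algebra

- Enumerative algebra: algebra over sets of variable bindings
- Each triple pattern yields a set of bindings for its variables
- Joinable bindings: same variable, same value

Bindings for ?org :affiliates ?aut

| ?org | ?aut |
| --- | --- |
| &o1 | &r1 |
| &o1 | &r2 |
| &o1 | &r3 |

Bindings for ?aut :produces ?b

| ?aut | ?b |
| --- | --- |
| &r1 | &b1 |
| &r2 | &b1 |
| &r2 | &b2 |

Joined bindings

| ?org | ?aut | ?b |
| --- | --- | --- |
| &o1 | &r1 | &b1 |
| &o1 | &r2 | &b1 |
| &o1 | &r2 | &b2 |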

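Enumerative Algebra (ctd.)

Given two sets of bindings T1 and T2, and r denoting a binding:

$$T_1 \cup T_2 = \{\, r \mid r \in T_1 \text{ or } r \in T_2 \,\}$$

$$T_1 \bowtie T_2 = \{\, r_1 \cup r_2 \mid r_1 \in T_1 \text{ and } r_2 \in T_2 \text{ and } r_1 \text{ and } r_2 \text{ are joinable} \,\}$$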
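The union and join of binding sets can be sketched directly from these definitions. In this illustration a binding is a Python dict from variable name to value and a binding set is a list of such dicts; the function names are hypothetical, and the sample data are the bindings of ?org :affiliates ?aut and ?aut :produces ?b from the running example.

```python
def joinable(r1, r2):
    """Two bindings are joinable if every shared variable has the same value."""
    return all(r2[var] == value for var, value in r1.items() if var in r2)


def union(t1, t2):
    """All bindings that occur in t1 or in t2, without duplicates."""
    result = list(t1)
    result.extend(r for r in t2 if r not in result)
    return result


def join(t1, t2):
    """Merge every joinable pair of bindings from t1 and t2."""
    return [{**r1, **r2} for r1 in t1 for r2 in t2 if joinable(r1, r2)]


affiliates = [
    {"?org": "&o1", "?aut": "&r1"},
    {"?org": "&o1", "?aut": "&r2"},
    {"?org": "&o1", "?aut": "&r3"},
]
produces = [
    {"?aut": "&r1", "?b": "&b1"},
    {"?aut": "&r2", "?b": "&b1"},
    {"?aut": "&r2", "?b": "&b2"},
]

for row in join(affiliates, produces):
    print(row)
# {'?org': '&o1', '?aut': '&r1', '?b': '&b1'}
# {'?org': '&o1', '?aut': '&r2', '?b': '&b1'}
# {'?org': '&o1', '?aut': '&r2', '?b': '&b2'}
```

The binding for &r3 drops out because &r3 has no matching ?aut :produces ?b binding.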

Enumerative Algebra (ctd.)

- match[P](G) – matches the graph pattern P to graph G
- Given P = {p1, p2, …, pm}

$$\mathrm{match}[P](G) = \mathrm{match}[p_1](G) \bowtie \mathrm{match}[p_2](G) \bowtie \dots \bowtie \mathrm{match}[p_m](G)$$

The result is a set of sets (tuples) of bindings.

Enumerative Algebra (ctd.)

- Other operators:

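Difference:

$$T_1 \setminus T_2 = \{\, r \in T_1 \mid \text{for all } r' \in T_2,\ r \text{ and } r' \text{ are not joinable} \,\}$$

Outer Join:

$$T_1 \mathbin{⟕} T_2 = (T_1 \bowtie T_2) \cup (T_1 \setminus T_2)$$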

Filter, $\sigma_\theta(T)$, evaluates the Boolean condition $\theta$ on T.

E.g. of $\theta$ is: ?p > 100.

Extractive Algebra

Given two graphs G1 and G2, and t denoting a triple:

$$G_1 \cup G_2 = \{\, t \mid t \in G_1 \text{ or } t \in G_2 \,\}$$

- Matching retains structure
- More compact representation during implementation

[Figure: triples matching ?org :affiliates ?aut: (&o1 :aff &r1), (&o1 :aff &r2), (&o1 :aff &r3); triples matching ?aut :produces ?b: (&r1 :prod &b1), (&r2 :prod &b1), (&r2 :prod &b2); and their union as a single graph.]

Extractive Algebra (ctd.)

$$G_1 \mathbin{\hat{\bowtie}_p} G_2 = \{\, G_1 \cup G_2 \mid \text{conditions 1 and 2 below hold} \,\}$$

where p = (p1, p2), i.e. a pair of triple patterns, and

1. For all t1 ∈ G1, either there exists t2 ∈ G2 such that t1 and t2 are joinable by p, or t1 does not match p1 ∈ p.
2. For all t2 ∈ G2, either there exists t1 ∈ G1 such that t2 and t1 are joinable by p, or t2 does not match p2 ∈ p.

[Figure: extractive join on ((?org :affiliates ?aut), (?aut :produces ?b)). Input triples: (&o1 :aff &r1), (&o1 :aff &r2), (&o1 :aff &r3), (&r1 :prod &b1), (&r2 :prod &b1), (&r2 :prod &b2), plus price and year triples. Output keeps (&o1 :aff &r1), (&o1 :aff &r2), (&r1 :prod &b1), (&r2 :prod &b1), (&r2 :prod &b2).]

Extractive Algebra (ctd.)

Graph pattern:

?org :affiliates ?aut .
?aut :produces ?b .
?b :price ?p .
?b ?pred ?x

[Figure: successive extractive joins over this pattern on the example triples: first on ((?aut :produces ?b), (?b :price ?p)), then on ((?aut :produces ?b), (?b ?pred ?x)).]

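Extractive Algebra (ctd.)

- extract[P](G) – matches the graph pattern P
- Given P = {p1, p2, …, pm}

$$\mathrm{extract}[P](G) = \mathrm{match}[p_1](G) \mathbin{\hat{\bowtie}} \mathrm{match}[p_2](G) \mathbin{\hat{\bowtie}} \dots \mathbin{\hat{\bowtie}} \mathrm{match}[p_m](G)$$

The result is a graph.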

Extractive Algebra (ctd.)

- Other operations:

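Difference:

$$G_1 \setminus G_2 = \{\, t \mid t \in G_1 \text{ and } t \notin G_2 \,\}$$

Filter:

$$\sigma_\theta(G) = G \setminus \{\, t \mid \theta(t) \neq \text{true} \,\}$$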

Implementing Extract – Naïve/Join-split

- As a post-process of enumerative matching
- Do enumerative matching
- Produces a joined relation
- Vertically split join result into triples
- IO cost: for a pair of triple-sets:
- 2 reads of triple sets +
- 1 write of joined result +
- 2 reads of join result (one for each split/projection) +
- 2 writes of projected result +
- 2 reads of the projected triple sets +
- 1 write of unioned result
- Total: 6 reads and 4 writes (4 reads and 3 writes if no union).

Implementing Extract – 2-way semi-joins

- Use 2-way semi-joins
- Given two joinable triple sets A and B, take the union (⋃) of the two semi-join results A’ and B’

- IO Cost
- 2 reads of triplesets (first semi-join)
- 1 write of result to union (writes smaller table)
- 2 reads to perform next semijoin (1 read is on smaller table)
- 1 write of result to union
- Total: 4 reads and 2 writes.
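The semi-join strategy can be sketched on in-memory triple sets. This is an illustration under stated assumptions: triples are (subject, predicate, object) tuples, the join condition is equality between one position of A and one position of B, and IO is not modeled; the function names are hypothetical. As in the cost breakdown above, the second semi-join runs against the already reduced (smaller) set.

```python
def semi_join(left, right, left_pos, right_pos):
    """Keep the triples of `left` whose term at left_pos equals the term at
    right_pos of at least one triple in `right`."""
    keys = {triple[right_pos] for triple in right}
    return {triple for triple in left if triple[left_pos] in keys}


def extractive_join(a, b, a_pos, b_pos):
    """Union of the two semi-join results; triples stay triples."""
    a_reduced = semi_join(a, b, a_pos, b_pos)          # A'
    b_reduced = semi_join(b, a_reduced, b_pos, a_pos)  # B', against the smaller A'
    return a_reduced | b_reduced


# Triples matching ?org :affiliates ?aut and ?aut :produces ?b in the example.
A = {("&o1", ":aff", "&r1"), ("&o1", ":aff", "&r2"), ("&o1", ":aff", "&r3")}
B = {("&r1", ":prod", "&b1"), ("&r2", ":prod", "&b1"), ("&r2", ":prod", "&b2")}

# Join the object of A (position 2) with the subject of B (position 0).
for triple in sorted(extractive_join(A, B, 2, 0)):
    print(triple)
```

The result contains five triples: (&o1 :aff &r3) is dropped because &r3 produces nothing, and no joined relation ever has to be materialized and split again.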

[Figure: triple sets A and B reduced by semi-joins to A’ and B’.]

Implementing Extract – 2-stream operator

- Scan each input and produce triples that have at least one match in the other
- Is a high-level operator that can be implemented via:
- Hashing or
- Sort-merge

[Figure: inputs A and B streamed through the operator to produce A’ and B’.]

Grouping and Aggregation : Flatten-and-Aggregate Approach

- This is how Oracle supports aggregation over graph data !
- Also, [Hung, Deng, and Subrahmanian, ICDE 2005]

SELECT ?org, sum (?p) as totalPrice
WHERE { ?org :affiliates ?aut .
?aut :writesBook ?b .
?b :price ?p }
GROUP BY ?org

[Figure: the example graph (organizations &o1, &o2; researchers &r1, &r2, &r3; books &b1, &b2, &b3 with prices 90, 110, 100) is flattened into enumerative match results, which are then grouped and aggregated.]

Result: 390. WRONG !

Group By

- Should be based on extractive matching (graphs).
- What should group by mean on graphs ?
- Collapse a set of triples into a single triple.
- Use Bag nodes.
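The pitfall of aggregating over flattened (enumerative) match results can be shown on a tiny example. The data below are hypothetical and deliberately smaller than the slide's figure: one organization, two researchers who co-wrote one book, and one further book. Summing over the flattened rows counts the shared book twice, while summing over the distinct books that are collapsed under the organization does not.

```python
# Hypothetical data: &r1 and &r2 both wrote &b1; &r2 also wrote &b2.
affiliates = [("&o1", "&r1"), ("&o1", "&r2")]
writes_book = [("&r1", "&b1"), ("&r2", "&b1"), ("&r2", "&b2")]
price = {"&b1": 90, "&b2": 110}


def flatten_and_sum(org):
    """Join everything into rows (org, researcher, book, price), then sum."""
    rows = [(o, r, b, price[b])
            for o, r in affiliates if o == org
            for r2, b in writes_book if r2 == r]
    return sum(p for *_, p in rows)


def sum_over_distinct_books(org):
    """Collapse the books reachable from org into one collection, then sum."""
    books = {b
             for o, r in affiliates if o == org
             for r2, b in writes_book if r2 == r}
    return sum(price[b] for b in books)


print(flatten_and_sum("&o1"))          # 290: &b1 is counted once per author
print(sum_over_distinct_books("&o1"))  # 200: each book is counted once
```

Grouping on the graph, with each intermediate node collapsed into a collection, plays the role of `sum_over_distinct_books` here.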

[Figure: the grouped example graph, in which Bag nodes (with members :1, :2) collect the writesBook targets of &r1, &r2, and &r3.]

CONSTRUCT *
WHERE { ?org :affiliates ?aut .
?aut :writesBook ?b .
?b :price ?p }
GROUP BY ?aut ON :writesBook

The GROUP BY clause is annotated with its grouping target and grouping basis.

Aggregation

- Two types (modes) of aggregations on graphs
- Branch-wise : aggregate a set of values adjacent to a node type
- Path-wise : aggregate over a path in the graph
- Not discussed here.

- Branch-wise example:

SELECT ?b, branch sum (:price) as totalPrice
WHERE { ?org :affiliates ?aut .
?aut :writesBook ?b .
?b :price ?p }

Query annotations: anchor (?b), mode (branch), aggregation basis label (:price).

[Figure: books &b1 (year 2003, price 90), &b2 (year 1998, price 110), &b3 (year 1998, price 100).]

Aggregation – revisit example

- Anchor and aggregation basis not adjacent!

SELECT ?org, branch sum (:price) as totalPrice
WHERE { ?org :affiliates ?aut .
?aut :writesBook ?b .
?b :price ?p }
GROUP BY ?org

Query annotations: anchor (?org), mode (branch), aggregation basis label (:price); part of the query is marked optional.

[Figure: the example graph with organizations &o1, &o2, researchers &r1, &r2, &r3, and books &b1, &b2, &b3 with prices 90, 110, 100.]

Aggregation - solution

- RULE: All nodes between anchor and aggregation basis should be bags !
- If anchor and aggregation basis are adjacent, push aggregation into group by.
- Otherwise, iteratively perform graph grouping with edge-propagation making each intermediary node an aggregation target.

[Figure: the grouped example graph; the nodes between the organizations &o1, &o2 and the prices 90, 110, 100 are collected into Bag nodes with members :1, :2.]

Result: &o1, 300.

&o2, 200

Lead

- Dawit Yimam Seid

GIS

- Ram Hariharan (with Sharad Mehrotra and Chen Li)
- Searching (open source) GIS data and datasets
- Metadata
- Compression

Predictive modeling

- Vibhav Gogate and Jon Hutchinson (with Padhraic Smyth)
- Activity monitoring and prediction
- Anomalous event detection

Damage assessment

- ImageCat Inc (Ron Eguchi, Charles Huyck)
- INLET, MetaSIM

Disaster Portal

- Many Communities – Many Disaster Portals
- Contents of sites are administered by respective city emergency mgmt.
- Easily customized to meet needs of different communities.
- Regional summarization capabilities built in (eg. county/state level summary view).

- Objectives of the Disaster Portal project are to provide:
- An integrated platform for RESCUE team members to develop, test, and demonstrate their research projects in real-life scenarios.
- Next-generation capabilities to first responders and the public.

- Key development partner:
- City of Ontario

The Disaster Portal is a suite of web applications for disseminating information and providing situational awareness to the general public during a disaster.

Community Deployment of Disaster Portal

- Applications selected from Disaster Portal suite.
- Portal framework providing situation summary page, custom look-and-feel

Included in Ontario Pilot Disaster Portal

[Figure: SAMI research areas: extraction and synthesis, data management, analysis.]

Conclusions

- Situational data management
- Semantics
- Synergies
- Integrated demonstration

Thank you !

ashish@ics.uci.edu
