Towards situational awareness systems for disaster response

Naveen Ashish, Bell Labs India, Bangalore, 04/23/07

Organization
  • Introduction to SAMI
  • Selected research areas
  • Technology transition
  • Discussion
RESCUE
  • The SAMI TEAM
    • Students
    • Stella Chen, Chaitanya Desai, Vibhav Gogate, Jon Hutchinson,
    • Ram Hariharan, Shengyue Ji, Yiming Ma, Rabia Nuray-Turan,
    • Dawit Seid, Shankar Shivappa
    • Staff
    • Jay Lickfett, Chris Davison
    • Collaborators
    • Charles Huyck, Ron Eguchi, Shubharoop Ghosh
    • Faculty, Scientists and Post-docs
    • Dmitri Kalashnikov, Rajesh Hegde, Sharad Mehrotra, Sangho Park
    • Slide Aggregator (aka Project Leader)
    • Naveen Ashish
  • NSF-funded “large-ITR” project
    • Advance information technologies for disaster response
  • 5-year project
    • Oct 2003 to Oct 2008
  • Institutions
    • 6 universities (UCI, UCSD, UIUC, BYU, U-Colorado, U-Maryland) and 1 company (ImageCat)
    • Active and formal community partners
      • City of LA, OCFA, Irvine Police, …
  • People
    • Director: Sharad Mehrotra
    • ~ 25 researchers and staff, ~40 students
  • Web: http://www.itr-rescue.org
RESCUE Mission

The mission of RESCUE is to enhance the ability of emergency response organizations and the public to mitigate crises, save lives, and prevent secondary and indirect human and economic loss by radically transforming ways in which these organizations gather, process, manage, use and disseminate information during man-made and natural catastrophes.

Motivation: Transform the Ability of First Responders to Mitigate Crisis

  • Response effectiveness
    • lives & property saved
    • damage prevented
    • cascades avoided
  • Quality of decisions
    • first responders
    • consequence planners
    • public
  • Quality & timeliness of information
  • Situational awareness
    • incidents
    • resources
    • victims
    • needs

Observation: Right Information to the Right Person at the Right Time can result in dramatically better response

RESCUE Objectives
  • Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster.
  • Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals).
  • Develop robust communication systems that continue to operate in crisis situations despite partial/total failure of infrastructure and increased communication demands.
  • Develop technologies that can be used for timely and customized dissemination of crisis information that informs the public at large, thus enhancing the ability of affected populations to take appropriate self-protective actions.
  • Explore the privacy challenges that emerge as a result of infusing technology to improve information flow in crisis response networks and the public.
  • Promote interdisciplinary education at all levels (graduate, undergraduate, K-12) and across diverse student groups to expose the future community of citizens to issues in emergency management and homeland security – an area of global and national importance.
RESCUE Research Projects
  • SAMI: Situational Awareness from Multi-Modal Input (Project Lead: N. Ashish, UCI)
  • PISA: Policy-driven Information Sharing Architecture (Project Lead: M. Winslett, UIUC)
  • Customized Dissemination in the Large (Project Leads: K. Tierney, UC-B & N. Venkatasubramanian, UCI)
  • Privacy Implications of Technology Adoption (Project Lead: S. Mehrotra, UCI)
  • Robust Networking and Information Collection (Project Lead: BS Manoj, UCSD)
A Situational Awareness Application
  • Applications: evacuation planning, damage assessment, situational dashboard
  • System
  • Information: reports, responders, news, weather, traffic, simulations, reconnaissance
Architecture
  • Situational data management
  • Analysis
  • Extraction and synthesis
  • Events as fundamental abstraction units

Areas
  • Situational awareness systems
    • Extraction and synthesis: semantic extraction from text, audio-visual extraction
    • Data management: event model, SAT-ware, spatial indexing
    • Analysis: graph analysis, geospatial, predictive modeling, damage assessment

Extraction and Synthesis
  • Semantic extraction from text
  • Audio event extraction
  • Visual event extraction

Why do we need “Data Cleaning”?

An actual excerpt from a person’s CV:

  • sanitized for privacy
  • such claims are quite common in CVs, etc.
  • this particular person
    • argues he is good
    • because his work is well cited
  • but there is a problem with using the CiteSeer ranking
    • in general, it is not valid (in CVs)
    • let’s see why...

“... In June 2004, I was listed as the 1000th most cited author in computer science (of 100,000 authors) by CiteSeer, available at http://citeseer.nj.nec.com/allcited.html. ...”

What is the problem in the example?

[Figure: CiteSeer: the top-k most cited authors, with suspicious entries]

  • Let us go to the DBLP website
    • which stores bibliographic entries of many CS authors
  • Let us check who “A. Gupta” and “L. Zhang” are

[Figure: DBLP entries for these names]

Comparing raw and cleaned CiteSeer

[Figure: CiteSeer top-k vs. cleaned CiteSeer top-k]

What is the lesson?
  • data should be cleaned first
  • e.g., determine the (unique) real authors of publications
  • solving such challenges is not always “easy”
  • that explains a large body of work on data cleaning
  • note
    • CiteSeer is aware of the problem with its ranking
    • there are more issues with CiteSeer
    • many not related to data cleaning

“Garbage in, garbage out” principle: making decisions based on bad data can lead to wrong results.

What is “Reference Disambiguation”?

Author table (clean)
  A1, ‘Dave White’, ‘Intel’
  A2, ‘Don White’, ‘CMU’
  A3, ‘Susan Grey’, ‘MIT’
  A4, ‘John Black’, ‘MIT’
  A5, ‘Joe Brown’, unknown
  A6, ‘Liz Pink’, unknown

Publication table (to be cleaned)
  P1, ‘Databases . . . ’, ‘John Black’, ‘Don White’
  P2, ‘Multimedia . . . ’, ‘Sue Grey’, ‘D. White’
  P3, ‘Title3 . . .’, ‘Dave White’
  P4, ‘Title5 . . .’, ‘Don White’, ‘Joe Brown’
  P5, ‘Title6 . . .’, ‘Joe Brown’, ‘Liz Pink’
  P6, ‘Title7 . . . ’, ‘Liz Pink’, ‘D. White’

  • Analysis (‘D. White’ in P2, our approach):
    • 1. ‘Don White’ has a paper (P1) with ‘John Black’@MIT
    • 2. ‘Dave White’ is not connected to MIT in any way
    • 3. ‘Sue Grey’ is a coauthor of P2 too, and is @MIT
    • Thus: ‘D. White’ in P2 is probably Don (since we know he collaborates with MIT people)
  • Analysis (‘D. White’ in P6, our approach):
    • 1. ‘Don White’ has a paper (P4) with Joe Brown; Joe has a paper (P5) with Liz Pink; Liz Pink is a coauthor of P6
    • 2. ‘Dave White’ does not have papers with Joe or Liz
    • Thus: ‘D. White’ in P6 is probably Don (since co-author networks often form clusters)
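The two analyses above can be checked mechanically as a path search over the tables. This is a minimal sketch, not RelDC itself: it assumes the tables are loaded as an undirected graph of unambiguous "writes" and "affiliated with" edges, leaves out the two ‘D. White’ references being resolved, and treats ‘Sue Grey’ in P2 as Susan Grey (A3). Plain breadth-first shortest paths stand in for a real connection-strength measure; they only show that a chain of relationships exists from each context to Don and none to Dave.

```python
from collections import defaultdict, deque

# Hypothetical encoding of the two example tables. Only unambiguous
# "writes" and "affiliated with" relationships are included; the two
# 'D. White' references (in P2 and P6) are left out because they are the
# references being resolved. Assumption: 'Sue Grey' in P2 is Susan Grey (A3).
EDGES = [
    ("John", "P1"), ("Don", "P1"),
    ("Susan", "P2"),
    ("Dave", "P3"),
    ("Don", "P4"), ("Joe", "P4"),
    ("Joe", "P5"), ("Liz", "P5"),
    ("Liz", "P6"),
    ("Dave", "Intel"), ("Don", "CMU"),
    ("Susan", "MIT"), ("John", "MIT"),
]


def build_graph(edges):
    """Undirected adjacency sets: relationships can be followed either way."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


graph = build_graph(EDGES)
for context in ("P2", "P6"):
    for candidate in ("Don", "Dave"):
        print(context, "->", candidate, shortest_path(graph, context, candidate))
```

The P2 path runs P2, Susan, MIT, John, P1, Don (the MIT argument) and the P6 path runs P6, Liz, P5, Joe, P4, Don (the cluster argument); Dave sits in a separate component with P3 and Intel.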
Attributed Relational Graph (ARG)
  • View dataset as a graph
    • nodes for entities
      • papers, authors, organizations
      • e.g., P2, Susan, MIT
    • edges for relationships
      • “writes”, “affiliated with”
      • e.g. Susan → P2 (“writes”)
  • “Choice” nodes
    • for uncertain relationships
    • mutual exclusion
    • “1” and “2” in the figure
  • Analysis can be viewed as
    • application of the “Context AP”
    • to this graph
    • defined next...

Q: How is this domain-independent?

Context Attraction Principle (CAP)

If
  • reference r, made in the context of entity x, refers to an entity yj
  • but the description provided by r matches multiple entities: y1, …, yj, …, yN,

then
  • x and yj are likely to be more strongly connected to each other via chains of relationships
    • than x and yk (k = 1, 2, …, N; k ≠ j).

[Figure: the reference “J. Smith” in publication P1 matches John E. Smith (SSN = 123), Jane Smith, and Joe A. Smith]

  • In designing the RelDC approach
    • our goal was to use CAP as an axiom
    • then solve the problem formally, without heuristics
Analyzing paths: linking entities and contexts
  • ‘D. White’ is a reference in the context of P2 and P6
    • paths can link P2 and P6 to Don
    • paths cannot link P2 and P6 to Dave
    • in general, the paths are more complex
  • Analysis (‘D. White’ in P2): path P2 → Don
    • 1. ‘Don White’ has a paper with ‘John Black’@MIT
    • 2. ‘Dave White’ is not connected to MIT in any way
    • 3. ‘Sue Grey’ is a coauthor of P2 too, and is @MIT
    • Thus: ‘D. White’ is probably Don White
  • Analysis (‘D. White’ in P6): path P6 → Don
    • 1. ‘Don White’ has a paper (P4) with Joe Brown; Joe has a paper (P5) with Liz Pink; Liz Pink is a coauthor of P6
    • 2. ‘Dave White’ does not have papers with Joe or Liz
    • Thus: ‘D. White’ is probably Don White
Questions to answer
  • Does CAP hold over real datasets?
    • That is, if we disambiguate references based on it, will the references be correctly disambiguated?
  • Can we design a generic solution for exploiting relationships for disambiguation?
Problem formalization
  • xi.rk: the name of the k-th author of paper xi, e.g. ‘J. Smith’
  • d[xi.rk]: the true k-th author of paper xi
  • CS[xi.rk]: the set of candidate entities the reference may refer to, e.g. ‘John A. Smith’, ‘Jane B. Smith’, ...

Entity-Relationship Graph

RelDC views the dataset as a graph
  • undirected
  • nodes for entities
    • don’t have weights
  • edges for relationships
    • have weights
    • real number in [0,1]
    • the confidence that the relationship exists

[Figure: publication P1 with the reference “J. Smith”, whose candidates are “John Smith” and “Jane Smith”]

Handling References: Linking
(references correspond to relationships)

if |CS[xi.rk]| = 1 then
  • we know the answer d[xi.rk]
  • link xi and d[xi.rk] directly, w = 1
else
  • the answer is uncertain for xi.rk
  • create a “choice” node, link it
  • “option-weights”, w1 + ... + wN = 1
  • option-weights are variables
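The linking rule above can be sketched as a small graph builder. This is a minimal illustration under stated assumptions: the `EntityGraph` class, the `link_reference` helper and the `choice[...]` node naming are hypothetical, and the option-weights simply start out uniform here, whereas in RelDC they are unknowns to be solved for.

```python
from collections import defaultdict


class EntityGraph:
    """Undirected graph; an edge weight in [0, 1] is the confidence that the
    relationship exists. Nodes (entities) carry no weights."""

    def __init__(self):
        self.adj = defaultdict(dict)  # node -> {neighbor: weight}

    def add_edge(self, u, v, w=1.0):
        self.adj[u][v] = w
        self.adj[v][u] = w


def link_reference(graph, context, ref_id, choice_set):
    """Add the reference `ref_id`, made in entity `context`, to the graph.

    Returns the name of the new choice node, or None if the reference is certain."""
    if len(choice_set) == 1:
        # |CS| = 1: the answer is known, so link the context to it directly, w = 1.
        graph.add_edge(context, choice_set[0], 1.0)
        return None
    # |CS| > 1: create a "choice" node whose option-weights are unknowns with
    # w1 + ... + wN = 1; this sketch simply starts them out uniform.
    choice = "choice[" + ref_id + "]"
    graph.add_edge(context, choice, 1.0)
    for option in choice_set:
        graph.add_edge(choice, option, 1.0 / len(choice_set))
    return choice


g = EntityGraph()
link_reference(g, "P1", "P1.r1", ["John Smith", "Jane Smith"])  # 'J. Smith': uncertain
link_reference(g, "P2", "P2.r1", ["John Smith"])                # certain reference
print(dict(g.adj["choice[P1.r1]"]))
```

Keeping the context-to-choice edge at weight 1 and putting the uncertainty only on the option edges matches the slide's split between fixed relationship weights and variable option-weights.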

Objective of Reference Disambiguation

Definition: to resolve a reference xi.rk means
  • to pick one yj from CS[xi.rk] as d[xi.rk].

Graph interpretation
  • among w1, w2, ..., wN, assign wj = 1 to one wj (and 0 to the others)
  • this means yj is chosen as the answer d[xi.rk]

Definition: reference xi.rk is resolved correctly if the chosen yj = d[xi.rk].

Definition: reference xi.rk is unresolved, or uncertain, if it is not yet resolved.

Goal: resolve all uncertain references as correctly as possible.

Formalizing the CAP

CAP
  • is based on “connection strength”
  • c(u,v) for entities u and v
    • measures how strongly u and v are connected to each other via relationships
    • e.g. c(u,v) > c(u,z) in the figure
  • c(u,v) is formalized later

Context Attraction Principle (CAP)

if c(xi, yj) ≥ c(xi, yk)

then wj ≥ wk (most of the time)

We use proportionality:

c(xi, yj) ∙ wk = c(xi, yk) ∙ wj
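Combining the proportionality with the choice-node constraint w1 + ... + wN = 1 gives a closed form for one reference: if some c(xi, yk) > 0, then wj = c(xi, yj) / Σk c(xi, yk). A minimal sketch of that single step follows; the uniform fallback for all-zero strengths is an assumption, and in RelDC the strengths themselves depend on other option-weights, so this is one step of the full system rather than its solution.

```python
import math


def option_weights(strengths):
    """Option-weights w_j for one reference, given c_j = c(x_i, y_j) for each
    candidate. Enforces c_j * w_k == c_k * w_j for all j, k and sum(w) == 1,
    whose unique solution (when some c_j > 0) is w_j = c_j / sum(c)."""
    total = sum(strengths)
    if total == 0:
        # No connection evidence at all: fall back to a uniform split (an assumption).
        return [1.0 / len(strengths)] * len(strengths)
    return [c / total for c in strengths]


c = [3.0, 1.0, 0.0]          # hypothetical connection strengths for y1, y2, y3
w = option_weights(c)
print(w)                      # [0.75, 0.25, 0.0]

# Check the CAP proportionality for every pair of candidates.
assert all(math.isclose(c[j] * w[k], c[k] * w[j]) for j in range(3) for k in range(3))
```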

RelDC approach

Input: the ARG for the dataset

  • Step 1. Computing connection strengths
    • for each unresolved reference xi.rk
      • determine equations for all N connection strengths c(xi, yj)
      • c(xi, yj) = gij(w), a function of other option-weights
  • Step 2. Determining equations for option-weights
    • use CAP to relate all wj’s and connection strengths
    • since c(xi, yj) = gij(w), hence wij = fij(w)
  • Step 3. Computing option-weights
    • solve the system of equations from Step 2
  • Step 4. Resolving references
    • interpret the option-weights to resolve the references
Computing connection strength (Step 1)

Computation of c(u,v) consists of two phases
  • Phase 1: Discover connections
    • find all L-short simple paths between u and v
    • this phase is the bottleneck
    • optimizations exist (not in the SDM05 paper)
  • Phase 2: Measure the strength
    • of the discovered connections
    • many c(u,v) models exist
    • we use the random walks in graphs model

Measuring connection strength
  • Note:
    • c(u,v) returns an equation
    • because paths can go via various option-edges
    • cuv = c(u,v) = guv(w)
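The two phases can be sketched together as a depth-first search that enumerates L-short simple paths and scores each one with a random-walk probability. This is one simple variant chosen for illustration, not necessarily RelDC's exact model: at every node the walker picks an incident edge with probability proportional to its weight, and c(u, v) is the sum of the probabilities of all discovered paths. The toy graph is hypothetical.

```python
def connection_strength(adj, u, v, max_len):
    """Random-walk connection strength c(u, v), as a sketch.

    Phase 1 enumerates all simple u-v paths with at most `max_len` edges
    (the "L-short" paths). Phase 2 scores each path by the probability that a
    random walk starting at u follows it, where at every node the walker picks
    an incident edge with probability proportional to the edge weight.
    c(u, v) is the sum of those path probabilities."""
    total = 0.0

    def dfs(node, visited, prob, depth):
        nonlocal total
        if node == v:
            total += prob
            return
        if depth == max_len:
            return
        out_weight = sum(adj[node].values())
        if out_weight == 0:
            return
        for nxt, w in adj[node].items():
            if nxt not in visited:  # simple paths only: never revisit a node
                visited.add(nxt)
                dfs(nxt, visited, prob * w / out_weight, depth + 1)
                visited.remove(nxt)

    dfs(u, {u}, 1.0, 0)
    return total


# Hypothetical toy graph: x reaches y1 along two 2-edge paths, y2 along one.
adj = {
    "x": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"x": 1.0, "y1": 1.0},
    "b": {"x": 1.0, "y1": 1.0},
    "c": {"x": 1.0, "y2": 1.0},
    "y1": {"a": 1.0, "b": 1.0},
    "y2": {"c": 1.0},
}
print(connection_strength(adj, "x", "y1", max_len=7))
print(connection_strength(adj, "x", "y2", max_len=7))
```

With L = 7, as in the experimental setup, c(x, y1) = 2 · (1/3 · 1/2) = 1/3 and c(x, y2) = 1/6, so CAP favors y1. If some edges are option-edges whose weights are unknowns, the same sum becomes a function of those weights, which is why c(u,v) "returns an equation".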
Equations for option-weights (Step 2)

CAP (proportionality): c(xi, yj) ∙ wk = c(xi, yk) ∙ wj

System (over-constrained):

Add slack:

Solving the system (Steps 3 and 4)

Step 3: Solve the system of equations

  • use a math solver, or
  • an iterative method (approximate solution), or
  • a bounding-interval-based method (tech. report).

Step 4: Interpret option-weights

  • to determine the answer for each reference
  • pick yj with the largest weight as the answer
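Steps 3 and 4 can be sketched with the iterative option: recompute every reference's connection strengths under the current option-weights, renormalize them per CAP, and repeat until the weights stop changing; then pick the largest weight. This is a hedged reading of "iterative method", written as a simple in-place fixed-point loop; the graph, the `resolve` helper and the rule that paths may not pass through the reference's own choice node are assumptions for illustration.

```python
from collections import defaultdict


def set_weight(adj, a, b, w):
    adj[a][b] = w
    adj[b][a] = w


def strength(adj, u, v, max_len, skip):
    """Random-walk strength over simple u-v paths of <= max_len edges that avoid
    node `skip` (the reference's own choice node, so it cannot support itself)."""
    total = 0.0

    def dfs(node, visited, prob, depth):
        nonlocal total
        if node == v:
            total += prob
            return
        if depth == max_len:
            return
        out = sum(adj[node].values())
        if out == 0:
            return
        for nxt, w in adj[node].items():
            if nxt not in visited and nxt != skip:
                visited.add(nxt)
                dfs(nxt, visited, prob * w / out, depth + 1)
                visited.remove(nxt)

    dfs(u, {u}, 1.0, 0)
    return total


def resolve(adj, refs, max_len=4, iters=50, tol=1e-12):
    """refs: reference id -> (context node, choice node, candidate list)."""
    # Step 3, iterative variant: recompute strengths under the current
    # option-weights, renormalize per CAP, repeat until the weights stop moving.
    for _ in range(iters):
        delta = 0.0
        for context, choice, cands in refs.values():
            c = [strength(adj, context, y, max_len, skip=choice) for y in cands]
            total = sum(c)
            new = [x / total for x in c] if total > 0 else [1 / len(c)] * len(c)
            for y, w in zip(cands, new):
                delta = max(delta, abs(adj[choice][y] - w))
                set_weight(adj, choice, y, w)
        if delta < tol:
            break
    # Step 4: interpret the option-weights by picking the largest one.
    return {ref: max(cands, key=lambda y: adj[choice][y])
            for ref, (_, choice, cands) in refs.items()}


# Hypothetical graph: two uncertain references whose resolutions support each other.
adj = defaultdict(dict)
for a, b in [("P", "A"), ("A", "Q"), ("A", "Z"), ("Z", "y1"),
             ("P", "C"), ("C", "D"), ("D", "E"), ("E", "y2")]:
    set_weight(adj, a, b, 1.0)
refs = {"P.r1": ("P", "ch1", ["y1", "y2"]), "Q.r1": ("Q", "ch2", ["y1", "y3"])}
for context, choice, cands in refs.values():
    set_weight(adj, context, choice, 1.0)
    for y in cands:
        set_weight(adj, choice, y, 1.0 / len(cands))

print(resolve(adj, refs))
```

In this toy graph the weight of y1 for P.r1 rises from 5/8 to 2/3 once Q.r1 settles on y1, which illustrates why the equations have to be solved jointly rather than one reference at a time.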
Experimental Setup

Parameters

  • When looking for L-short simple paths, L = 7
  • L is the path-length limit

RealPub dataset:

  • CiteSeer + HPSearch
    • publications (255K)
    • authors (176K)
    • organizations (13K)
    • departments (25K)
  • ground truth is not known
    • accuracy...

SynPub datasets:

  • many datasets of two types
  • emulation of RealPub
    • publications (5K)
    • authors (1K)
    • organizations (25K)
    • departments (125K)
  • ground truth is known

RealMov:

  • movies (12K)
  • people (22K)
    • actors
    • directors
    • producers
  • studios (1K)
    • producing
    • distributing
Sample Publication Data

[Figure: CiteSeer publication records; HPSearch author records]

Efficiency and Long paths
  • Non-exponential cost
  • Longer paths do help

Web Disambiguation

[Figure: different people sharing one name: music composer, football player, UCSD professor, comedian, botany professor @ Idaho]

Web Disambiguation
  • Extract key information from Web pages, such as entity mentions (persons, names, locations), hyperlinks, and email addresses
  • Cast web disambiguation as a relationship analysis problem
  • Prototype at: http://opteron.calit2.uci.edu:1977/Diamond/people_search.jsp
Extraction and Synthesis: Semantic Extraction from Text
  • Information extraction from text
  • Many systems and techniques
  • May benefit from semantics
  • Limitations
    • All-or-nothing extraction
    • Towards probabilistic extraction systems
Leads
  • Disambiguation and data cleaning
    • Dmitri Kalashnikov, Stella Chen, Rabia Nuray-Turan
  • Information extraction
    • Naveen Ashish, Sharad Mehrotra
Extraction and Synthesis: Audio Event Extraction
  • Multi-microphone speech processing
    • Speaker identification
    • Noise reduction
  • Audio-visual speech recognition
    • Combine visual features (visemes) with audio
  • Speech recognition on light-weight devices
  • Team
    • Rajesh Hegde, Bhaskar Rao, Shankar Shivappa (UCSD)
Extraction and Synthesis: Visual Event Extraction
  • Combine views from multiple cameras
  • Homomorphic transformations
    • Multi-perspective “view-binding”
  • Team
    • Sangho Park, Mohan Trivedi (UCSD)
Situational Data Management
  • Spatial indexing
  • Event data model
  • SAT-Ware

Outline
  • Overall goal
  • Use examples to illustrate:
    • different approaches to modeling and querying
    • advantages of our approach
  • Extracting spatial expressions
  • Building models for spatial expressions
  • Experiments
  • Conclusion
Overall Goal

Information about the events that constitute a crisis is often available as text.

Goal: Situation Awareness from Textual Sources

[Figure: textual reports (...) and a database]

Textual data during a crisis
  • transcribed
    • 911 calls
    • first responder communications

Textual data after a crisis
  • first responders’ reports
  • Internet sources
  • for post factum analysis
Motivating Examples
  • Two reports filed by first responders after the 9/11 attack:
    • “…the PAPD Mobile Command Post was located on West St. north of WTC …”
    • “…a PAPD Command Truck parked on the west side of Broadway St. and north of Vesey St….”
  • Query: Retrieve Events around WTC
  • Goal: Both events should be retrieved with high scores attached.
Approach 1: Using IR
  • Direct keyword retrieval
    • Only one report mentions the keyword “WTC”
  • Query expansion
    • based on nearby spatial objects
    • e.g., nearby streets and buildings…
    • ad hoc, and the set of nearby objects might not be bounded
Approach 2: Mapping Using Uncertain Region
  • Query: near WTC
  • Report 1: “West St.”, “north of WTC”
  • Report 2: “west side of Broadway St. and north of Vesey St.”
  • Rank based on the ratio of intersection between each report’s uncertain region and the query region
  • Problem: the rank score is not accurate, because it assumes a uniform distribution within each region
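The uncertain-region score can be sketched with axis-aligned rectangles. This is a minimal sketch under the assumption that each report's region and the query are rectangles in some local coordinate system (the coordinates below are hypothetical). The score is the fraction of the region's area inside the query, which equals a probability only if the location is uniform over the region; that is exactly the assumption the slide criticizes.

```python
def area(rect):
    """Area of an axis-aligned rectangle ((x0, y0), (x1, y1)); 0 if empty."""
    (x0, y0), (x1, y1) = rect
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersect(a, b):
    """Intersection of two rectangles (may come out empty, i.e. area 0)."""
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    return (max(ax0, bx0), max(ay0, by0)), (min(ax1, bx1), min(ay1, by1))


def ur_score(region, query):
    """Fraction of the report's uncertain region that falls inside the query.
    This equals P(location in query) only if the location is uniform over the region."""
    return area(intersect(region, query)) / area(region)


query = ((0, 0), (10, 10))                   # hypothetical "near WTC" window
print(ur_score(((5, 5), (15, 15)), query))   # 0.25
print(ur_score(((20, 0), (30, 10)), query))  # 0.0
```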
Our Approach
  • Step 1: Convert text to a spatial expression (s-expression)
    • an s-expression has a well-defined functional form
  • “near WTC” → Near(WTC)
  • “West St.” … “north of WTC” → On(West St.) ∧ North(WTC)
  • “west side of Broadway St. and north of Vesey St.” → West(Broadway St.) ∧ North(Vesey St.)

Our Approach
  • Step 2: Map the s-expression to a probability density function (PDF)

[Figure: PDFs for Near(A) and for On(West St.) ∧ North(WTC)]

Answering Range Query
  • Given a query region
    • retrieve objects based on their degree of belonging to the region

[Figure: query region overlaid on the PDFs of On(West St.) ∧ North(WTC) and West(Broadway St.) ∧ North(Vesey St.)]

  • Consider location as a random variable
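Treating location as a random variable turns the degree of belonging into a probability: the mass of the event's PDF inside the query region. A minimal sketch, assuming each event's location is an isotropic 2-D Gaussian (so x and y are independent) and the query is an axis-aligned rectangle; the coordinates, spreads and event names are hypothetical.

```python
import math


def prob_in_interval(mu, sigma, lo, hi):
    """P(lo <= X <= hi) for X ~ Normal(mu, sigma)."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)


def prob_in_rect(event, rect):
    """Degree of belonging of an event to a rectangular query region, assuming its
    location is an isotropic 2-D Gaussian (so x and y are independent)."""
    (mx, my), sigma = event
    (x0, y0), (x1, y1) = rect
    return prob_in_interval(mx, sigma, x0, x1) * prob_in_interval(my, sigma, y0, y1)


def range_query(events, rect):
    """Rank events by the probability that their location lies in `rect`."""
    scores = {name: prob_in_rect(ev, rect) for name, ev in events.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical local coordinates in metres, with WTC at the origin.
events = {
    "report1": ((0.0, 100.0), 50.0),    # e.g. a PDF for "north of WTC"
    "report2": ((300.0, 200.0), 80.0),  # a location further away
}
for name, p in range_query(events, ((-200.0, -200.0), (200.0, 200.0))):
    print(f"{name}: {p:.3f}")
```

Unlike the uncertain-region ratio, points near the mode of the PDF count more than points at its edge, so two reports whose regions overlap the query equally can still be ranked apart.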
Advantages of Our Approach
  • A more explicit spatial mapping removes the need for keyword expansion (IR approach)
  • The probabilistic representation is more formal and accurate than the uncertain region (UR) approach
  • The extraction and modeling modules are decoupled
    • better extraction and modeling modules can easily be plugged in
Extracting Spatial Expression
  • Step 1: Discovering landmarks
    • buildings, roads, intersections
  • Step 2: Generating s-descriptors
    • use spatial relations to connect the landmarks
    • spatial relations: near, behind, between
    • in the format D(L1, L2, ..., Ln)
  • Step 3: Generating s-expressions
    • compositions of s-descriptors
    • e.g., near(A) ∧ near(B)
Step 1: Discovering landmarks
  • Mark up the landmarks in the text
    • using gazetteers (incorporated into the GATE information extractor)
    • note: not only the “name” is marked up; features are also attached

[Figure: examples of landmarks]

Step 2: Generating s-descriptors
  • Discover spatial relations around the landmarks
    • dictionary approach (map each spatial relation to the words that may express it)
    • machine learning techniques can also be used

[Figure: examples of s-descriptors]
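The dictionary approach can be sketched with a regular expression that looks for a relation phrase followed by a known landmark. This is a hypothetical sketch, not the GATE-based extractor: the gazetteer (`LANDMARKS`) and the phrase-to-relation dictionary (`RELATIONS`) are small made-up lists, and longer phrases are tried first so that “west side of” is preferred over shorter entries.

```python
import re

# Hypothetical gazetteer and relation dictionary, for illustration only.
LANDMARKS = ["West St.", "WTC", "Broadway St.", "Vesey St."]
RELATIONS = {"west side of": "West", "north of": "North", "near": "Near", "on": "On"}


def extract_descriptors(text):
    """Return s-descriptors such as 'North(WTC)' found in `text`."""
    # Longest alternatives first, so "west side of" is tried before shorter entries.
    rel = "|".join(re.escape(r) for r in sorted(RELATIONS, key=len, reverse=True))
    lms = "|".join(re.escape(lm) for lm in sorted(LANDMARKS, key=len, reverse=True))
    pattern = re.compile(rf"\b({rel})\s+(?:the\s+)?({lms})", re.IGNORECASE)
    return [f"{RELATIONS[m.group(1).lower()]}({m.group(2)})"
            for m in pattern.finditer(text)]


r1 = "the PAPD Mobile Command Post was located on West St. north of WTC"
r2 = "a PAPD Command Truck parked on the west side of Broadway St. and north of Vesey St."
print(extract_descriptors(r1))  # ['On(West St.)', 'North(WTC)']
print(extract_descriptors(r2))  # ['West(Broadway St.)', 'North(Vesey St.)']
```

The descriptors found in one report would then be composed into an s-expression such as West(Broadway St.) ∧ North(Vesey St.).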

Modeling S-expression
  • Goal: generate a reasonable probabilistic representation for an s-expression
  • Step1: Modeling S-descriptors
  • Step2: Combining s-descriptors
Modeling S-descriptors
  • Modeling templates
    • e.g., uniform or normal distributions
  • Using parameter learning techniques
Generating s-expression
  • In an s-expression, we assume the s-descriptors are conditionally independent.
  • If an s-expression has 2 descriptors, S1, S2
  • It can be generalized to n descriptors, S1…Sn
Generating s-expression

[Figure: example combining the s-descriptors Near(WTC) and Outdoor()]
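Under the conditional-independence assumption, an s-expression's PDF can be approximated on a grid by multiplying its descriptors' densities cell by cell and renormalizing. A minimal sketch, assuming a uniform prior over locations, a hypothetical 1 km grid with 10 m cells, a Normal template for Near(A) and a uniform half-plane template for North(A); North is used instead of the slide's Outdoor() because Outdoor() would need building footprints.

```python
import math

# Hypothetical 1-km grid with 10 m cells; coordinates in metres.
CELL = 10.0
XS = [i * CELL for i in range(100)]
YS = [j * CELL for j in range(100)]
WTC = (500.0, 500.0)


def near(landmark, sigma=100.0):
    """Normal template: density falls off with distance from the landmark."""
    lx, ly = landmark
    return lambda x, y: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * sigma ** 2))


def north(landmark):
    """Uniform template over the half-plane north of the landmark."""
    _, ly = landmark
    return lambda x, y: 1.0 if y > ly else 0.0


def s_expression_pdf(*descriptors):
    """Combine s-descriptors assumed conditionally independent: multiply their
    densities cell by cell (uniform prior assumed) and renormalize to sum to 1."""
    grid = {(x, y): math.prod(d(x, y) for d in descriptors) for x in XS for y in YS}
    total = sum(grid.values())
    return {cell: v / total for cell, v in grid.items()}


pdf = s_expression_pdf(near(WTC), north(WTC))   # Near(WTC) ∧ North(WTC)
print(max(pdf, key=pdf.get))                    # most likely cell
```

The same function extends to n descriptors S1…Sn, since it multiplies however many templates it is given.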
Experimental Setup

Domain
  • real geographic dataset
  • Manhattan, NY, near WTC
  • buildings, streets, roads
  • 4 × 4 km²

Data
  • Based on 164 reports
    • by police officers
    • participants of 9/11
  • s-expressions
    • near(A), on(A), outdoor
    • intersections, buildings, streets
  • Constructed 2359 PDFs

Queries
  • 50 range queries
Simulate the Errors
  • Extraction errors:
    • with human supervision, the error is small.
  • Modeling errors:
    • even with supervision, model parameters can still deviate from the ideal settings,
    • e.g., the mean and variance settings for the Gaussian model.
  • We simulate two types of modeling errors for the analysts:
    • Overly confident: the estimated model is too “tight”
      • by reducing the variance of the “ideal” Gaussian model
    • Not confident: the estimated model is too “loose”
      • by increasing the variance of the “ideal” Gaussian model
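The two error types can be illustrated by scaling the spread of an "ideal" Gaussian and comparing how much probability it assigns to a window at its center and to a window off to the side. A minimal sketch in one dimension; the ideal spread, the scaling factors and the windows are hypothetical, and scaling the standard deviation by a factor f scales the variance by f².

```python
import math


def gaussian_mass(mu, sigma, lo, hi):
    """P(lo <= X <= hi) for X ~ Normal(mu, sigma)."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)


IDEAL_SIGMA = 50.0  # hypothetical "ideal" standard deviation, in metres
MODELS = {
    "overly confident": IDEAL_SIGMA * 0.25,  # variance reduced: too "tight"
    "ideal": IDEAL_SIGMA,
    "not confident": IDEAL_SIGMA * 4.0,      # variance increased: too "loose"
}

for label, sigma in MODELS.items():
    near_center = gaussian_mass(0.0, sigma, -25.0, 25.0)
    far_window = gaussian_mass(0.0, sigma, 100.0, 200.0)
    print(f"{label:16s} near: {near_center:.3f}  far: {far_window:.3f}")
```

A too-tight model over-scores queries at the center and almost ignores nearby ones; a too-loose model does the opposite.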
Results
  • Even with large errors, probabilistic models are still better than bounding-region methods
Conclusions

Spatial Awareness from Textual Sources

[Figure: textual reports (...) and a database]

Novel in this work
  • approach for mapping text to PDFs
  • query requirements for SA applications
    • query design issues
  • representation of PDFs

Ongoing work
  • database aspects of the problem
    • more types of queries

Future work
  • spatio-temporal aspects
  • better modeling (text to PDF)
Lead
  • Spatial awareness
    • Yiming Ma

Analysis
  • Analysis and visualization
    • graph analysis
    • GIS
    • predictive modeling
    • damage assessment

Graph Analysis
  • Relationship summarization/exploration [relations]
  • Multi-dimensional analysis [for documents]
  • Graph pattern-based querying
  • Ranked graph pattern matching
  • DBMS
    • Semantic metadata: semantic graphs (attributed graphs), taxonomies (“reference data”), entity-relationship schemas, ontologies (“semantic models”)
    • Described data: document repositories, relations

Graph Data Model (Entity-Attribute-Value Model)

[Figure: &dawit —ns:studentAt→ &UCI]

  • Graph (edge sets aka triple sets), e.g.:
      (&dawit ns:studentAt &UCI)
      (&UCI ns:type ns:university)
      (ns:university ns:subClassOf ns:organization)
    • Two kinds of nodes: object-ids and literals (e.g. integer, string, etc.)
      • Blank nodes (e.g. (&dawit :studentAt _))
    • Directed edges (aka predicates or properties)
      • there exists only one edge with a given label between a pair of nodes
  • Symmetric representation of metadata + data
    • Nodes: object classes or link classes
    • Links: predicates on classes, e.g.:
      (:studentAt :domain :person)
      (:studentAt :range :organization)
      (:university :subClassOf :organization)
  • Object identity + relationship identity
    • Objects and relationships have unique ids (called URIs)
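Because schema and data live in the same triple set, simple reasoning is just more traversal over that set. A minimal sketch, assuming RDFS-style rules (types follow ns:subClassOf transitively, and a predicate's ns:domain and ns:range type its subject and object); the `ns:` prefix is used uniformly for the slide's triples, and the helper names are hypothetical.

```python
# Hypothetical triple set mirroring the slide's examples; schema and data are
# stored the same way, as (subject, predicate, object) triples.
TRIPLES = {
    ("&dawit", "ns:studentAt", "&UCI"),
    ("&UCI", "ns:type", "ns:university"),
    ("ns:university", "ns:subClassOf", "ns:organization"),
    ("ns:studentAt", "ns:domain", "ns:person"),
    ("ns:studentAt", "ns:range", "ns:organization"),
}


def superclasses(triples, cls):
    """All classes reachable from `cls` through ns:subClassOf (transitively)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "ns:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def types_of(triples, node):
    """Types stated with ns:type or implied by a predicate's ns:domain/ns:range,
    closed under ns:subClassOf."""
    direct = {o for s, p, o in triples if s == node and p == "ns:type"}
    for s, p, o in triples:
        if s == node:   # node is a subject of p, so it belongs to p's domain
            direct |= {c for s2, p2, c in triples if s2 == p and p2 == "ns:domain"}
        if o == node:   # node is an object of p, so it belongs to p's range
            direct |= {c for s2, p2, c in triples if s2 == p and p2 == "ns:range"}
    result = set(direct)
    for c in direct:
        result |= superclasses(triples, c)
    return result


print(sorted(types_of(TRIPLES, "&UCI")))     # ['ns:organization', 'ns:university']
print(sorted(types_of(TRIPLES, "&dawit")))   # ['ns:person']
```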
Graphs for actual data storage - beyond data modeling
  • Graphs are normally used for conceptual data modeling
    • the entity-relationship (ER) model
  • What is different?
    • Using graphs for actual (minimally structured) data representation.
  • Why?
    • Store/represent and query data without a schema
    • Symmetrically store/query both schema (ontology) and data
    • Graph-traversal-based query + reasoning (inference)
    • Multi-schema queries on the same graph
    • Query unstructured data annotated with taxonomies/ontologies using traditional (structured) query operators
[Figure: example semantic graph, panels (a)-(c), showing a topic ontology (e.g., Comp. Sc., Info. Sys., DB, Multimedia DB, Distributed DB), a schema MODEL (researcher, organization, publication, book, proceeding, article; properties such as affiliates, produces, writesBook, writesArticle, inProceeding, price, year) and INSTANCE data (&o1, &o2, &r1–&r4, &b1–&b3, &p1, &a1). Legend: &o organization, &r researcher, &b book, &p proceeding, &a article; subClassOf/subPropertyOf and rdf:type edges.]

Graph Pattern based Querying

    SELECT *
    WHERE { ?org :affiliates ?aut .
            ?aut :produces ?b .
            ?b :type :book .
            ?b :price ?p .
            ?b ?pred ?x . }

  • each line of the WHERE clause is a triple pattern; ?org, ?aut, ?b, ?p, ?pred and ?x are variables
  • :produces is a super-class of writesBook, so the pattern also matches writesBook edges
  • queries schema (a); uses schema (b)
  • a variable on predicates (?pred) matches all applicable predicates
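The query's behavior, including the variable predicate and the super-property :produces, can be sketched with a naive backtracking matcher over a set of triples. This is a hypothetical illustration, not a SPARQL engine: prefixes are dropped, the data only loosely follows the slide's instance graph, and sub-property reasoning is limited to a one-level lookup table.

```python
def is_var(term):
    return term.startswith("?")


def match_triple(pattern, triple, binding, super_props):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible.
    A constant predicate in the pattern also matches any of its sub-properties."""
    new = dict(binding)
    for pos, (pt, t) in enumerate(zip(pattern, triple)):
        if is_var(pt):
            if new.setdefault(pt, t) != t:
                return None
        elif pt != t and not (pos == 1 and pt in super_props.get(t, ())):
            return None
    return new


def match_bgp(patterns, triples, super_props, binding=None):
    """All bindings satisfying every triple pattern (naive backtracking join)."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    results = []
    for triple in sorted(triples):
        extended = match_triple(patterns[0], triple, binding, super_props)
        if extended is not None:
            results.extend(match_bgp(patterns[1:], triples, super_props, extended))
    return results


# Hypothetical data loosely following the slide's example (prefixes dropped).
TRIPLES = {
    ("&o1", "affiliates", "&r1"), ("&o1", "affiliates", "&r2"),
    ("&r1", "writesBook", "&b1"), ("&r2", "writesBook", "&b1"),
    ("&b1", "type", "book"), ("&b1", "price", "90"), ("&b1", "year", "2003"),
}
SUPER_PROPS = {"writesBook": {"produces"}}  # produces is a super-property of writesBook

QUERY = [("?org", "affiliates", "?aut"), ("?aut", "produces", "?b"),
         ("?b", "type", "book"), ("?b", "price", "?p"), ("?b", "?pred", "?x")]

for row in match_bgp(QUERY, TRIPLES, SUPER_PROPS):
    print(row)
```

Each of the two authors yields one row per predicate on &b1 (type, price, year), because ?pred ranges over all of them.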

Graph Pattern based Querying

    CONSTRUCT *
    WHERE { ?org :affiliates ?aut .
            ?aut :produces ?b .
            ?b :type :book .
            ?b :price ?p .
            ?b ?pred ?x . }

  • Extractive semantics (CONSTRUCT): the result is a graph set / graph
  • Enumerative semantics (SELECT, same WHERE clause): the result is a relation of bindings, e.g.
      &o1, &r1, &b1, 90, 2003
      &o1, &r2, &b1, 90, 2003

[Figure: the extractive result as graphs over &o1, &o2, &r1–&r3, &b1–&b3 and literal values such as 90, 100, 110, 1998, 2003]

Enumerative Algebra
  • Enumerative algebra: an algebra over sets of variable bindings
  • Bindings (per triple pattern):
    • ?org :affiliates ?aut
        ?org  ?aut
        &o1   &r1
        &o1   &r2
        &o1   &r3
    • ?aut :produces ?b
        ?aut  ?b
        &r1   &b1
        &r2   &b1
        &r2   &b2
  • Joinable bindings: same variable, same value.

Enumerative Algebra (ctd.)

[Tables: bindings over (?org, ?aut, ?b): {(&o1, &r1, &b1), (&o1, &r2, &b1), (&o1, &r2, &b2), (&o1, &r3, –)} and {(&o1, &r1, &b1), (&o1, &r2, &b1), (&o1, &r2, &b2)}]

Given two sets of bindings T1 and T2, and r denoting a binding:

$$T_1 \cup T_2 = \{\, r \mid r \in T_1 \text{ or } r \in T_2 \,\}$$

$$T_1 \bowtie T_2 = \{\, r_1 \cup r_2 \mid r_1 \in T_1,\ r_2 \in T_2, \text{ and } r_1, r_2 \text{ are joinable} \,\}$$

Enumerative Algebra (ctd.)
  • match[P](G): matches the graph pattern P to graph G
    • Given P = {p1, p2, …, pm}:

$$\text{match}[P](G) = \text{match}[p_1](G) \bowtie \text{match}[p_2](G) \bowtie \cdots \bowtie \text{match}[p_m](G)$$

    • the result: sets of sets (tuples) of bindings

Enumerative Algebra (ctd.)
  • Other operators:

Difference:

$$T_1 \setminus T_2 = \{\, r \in T_1 \mid \forall r' \in T_2:\ r \text{ and } r' \text{ are not joinable} \,\}$$

Outer join:

$$T_1 \mathbin{⟕} T_2 = (T_1 \bowtie T_2) \cup (T_1 \setminus T_2)$$

Filter: $\sigma_\theta(T)$ evaluates the Boolean condition θ on each binding in T, e.g. θ is ?p > 100.
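The enumerative operators translate almost line for line into set operations. A minimal sketch, assuming a binding is represented as a frozenset of (variable, value) pairs so that sets of bindings are ordinary Python sets; the example graph reproduces the slide's :affiliates and :produces triples with prefixes dropped, and the helper names are hypothetical.

```python
from functools import reduce


def binding(**kv):
    """A binding as an immutable set of (variable, value) pairs."""
    return frozenset(("?" + k, v) for k, v in kv.items())


def joinable(r1, r2):
    """Joinable: every variable the two bindings share has the same value."""
    d1, d2 = dict(r1), dict(r2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def union(t1, t2):
    return t1 | t2


def join(t1, t2):
    return {r1 | r2 for r1 in t1 for r2 in t2 if joinable(r1, r2)}


def difference(t1, t2):
    return {r for r in t1 if all(not joinable(r, r2) for r2 in t2)}


def outer_join(t1, t2):
    return join(t1, t2) | difference(t1, t2)


def select(t, condition):
    """Filter: keep the bindings for which the Boolean condition holds."""
    return {r for r in t if condition(dict(r))}


def match(pattern, graph):
    """Bindings produced by one triple pattern over a set of triples."""
    out = set()
    for triple in graph:
        b = {}
        # Variables bind (consistently); constants must match exactly.
        if all((b.setdefault(p, t) == t) if p.startswith("?") else p == t
               for p, t in zip(pattern, triple)):
            out.add(frozenset(b.items()))
    return out


def match_all(patterns, graph):
    """match[P](G) = match[p1](G) ⋈ match[p2](G) ⋈ ... ⋈ match[pm](G)."""
    return reduce(join, (match(p, graph) for p in patterns))


G = {("&o1", "affiliates", "&r1"), ("&o1", "affiliates", "&r2"), ("&o1", "affiliates", "&r3"),
     ("&r1", "produces", "&b1"), ("&r2", "produces", "&b1"), ("&r2", "produces", "&b2")}
T1 = match(("?org", "affiliates", "?aut"), G)
T2 = match(("?aut", "produces", "?b"), G)
print(len(join(T1, T2)), len(outer_join(T1, T2)))
```

The join yields the three complete (?org, ?aut, ?b) rows; the outer join additionally keeps (&o1, &r3), which has no joinable binding, matching the two tables on the earlier slide.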

Extractive Algebra

Given two graphs G1 and G2, and t denoting a triple:

$$G_1 \cup G_2 = \{\, t \mid t \in G_1 \text{ or } t \in G_2 \,\}$$

  • Matching a triple pattern yields a set of graphs:
    • ?org :affiliates ?aut: {&o1 :aff &r1}, {&o1 :aff &r2}, {&o1 :aff &r3}
    • ?aut :produces ?b: {&r1 :prod &b1}, {&r2 :prod &b1}, {&r2 :prod &b2}
  • Matching retains structure
  • More compact representation during implementation

Extractive Algebra (ctd.)

Given p = (p1, p2), a pair of triple patterns:

$$G_1 \mathbin{\hat{\bowtie}_p} G_2 = \{\, G_1 \cup G_2 \mid \text{both conditions below hold} \,\}$$

  • For all t1 ∈ G1, either there exists t2 ∈ G2 such that t1 and t2 are joinable by p, or t1 does not match p1 ∈ p.
  • For all t2 ∈ G2, either there exists t1 ∈ G1 such that t2 and t1 are joinable by p, or t2 does not match p2 ∈ p.

[Figure: the extractive join on p = ((?org :affiliates ?aut), (?aut :produces ?b)) applied to the triple sets {&o1 :aff &r1, &o1 :aff &r2, &o1 :aff &r3} and {&r1 :prod &b1, &r2 :prod &b1, &r2 :prod &b2}.]
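One way to read the two conditions is as a filter on each input. Under that reading, a triple survives if it does not match its own pattern, or if some triple of the other graph matches the other pattern with a joinable binding. The sketch below encodes this reading, which is our interpretation; `bind`, `keep`, and `ext_join` are hypothetical names. It applies the filter to the figure's triples, with the abbreviations :aff and :prod written out as :affiliates and :produces.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def bind(pattern: Triple, t: Triple) -> Optional[Binding]:
    """Binding produced by matching triple t against the pattern, or None."""
    b: Binding = {}
    for p, v in zip(pattern, t):
        if p.startswith("?"):
            if b.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return b


def joinable(r1: Binding, r2: Binding) -> bool:
    return all(r2[k] == v for k, v in r1.items() if k in r2)


def keep(t: Triple, own: Triple, other_graph: List[Triple], other: Triple) -> bool:
    """The slide's condition, applied to a single triple t."""
    b = bind(own, t)
    if b is None:
        return True  # t does not match its own pattern, so it is retained
    for u in other_graph:
        b2 = bind(other, u)
        if b2 is not None and joinable(b, b2):
            return True  # t has a joinable partner in the other graph
    return False


def ext_join(g1: List[Triple], g2: List[Triple], p1: Triple, p2: Triple) -> List[Triple]:
    kept1 = [t for t in g1 if keep(t, p1, g2, p2)]
    kept2 = [t for t in g2 if keep(t, p2, g1, p1)]
    return kept1 + [t for t in kept2 if t not in kept1]  # result is a graph


G1 = [("&o1", ":affiliates", "&r1"), ("&o1", ":affiliates", "&r2"),
      ("&o1", ":affiliates", "&r3")]
G2 = [("&r1", ":produces", "&b1"), ("&r2", ":produces", "&b1"),
      ("&r2", ":produces", "&b2")]
P1 = ("?org", ":affiliates", "?aut")
P2 = ("?aut", ":produces", "?b")
print(ext_join(G1, G2, P1, P2))  # &o1 :affiliates &r3 is dropped
```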

Extractive Algebra (ctd.)

?org :affiliates ?aut . ?aut :produces ?b . ?b :price ?p . ?b ?pred ?x

[Figure: the next step, the extractive join on p = ((?aut :produces ?b), (?b :price ?p)), applied to the previous result plus the triples &b1 :price 90, &b1 :year 2003, &b3 :price 110, &b3 :year 1998. Resulting graph: &o1 :aff &r1, &o1 :aff &r2, &r1 :prod &b1, &r2 :prod &b1, &b1 :price 90, &b1 :year 2003, &b3 :year 1998.]

Extractive Algebra (ctd.)

?org :affiliates ?aut . ?aut :produces ?b . ?b :price ?p . ?b ?pred ?x

[Figure: the final step, the extractive join on p = ((?aut :produces ?b), (?b ?pred ?x)), applied to the previous result.]

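Extractive Algebra (ctd.)
  • extract[P](G) matches the graph pattern P against G and returns a graph
    • Given P = {p1, p2, …, pm}:

$$\mathrm{extract}[P](G) = \mathrm{match}[p_1](G) \mathbin{\hat{\bowtie}} \mathrm{match}[p_2](G) \mathbin{\hat{\bowtie}} \cdots \mathbin{\hat{\bowtie}} \mathrm{match}[p_m](G)$$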
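One way to evaluate extract[P](G) is in two steps. First, match each pattern separately. Second, repeat the pairwise extractive-join test between every two patterns that share a variable, until no triple is removed. This amounts to a semi-join reduction run to a fixpoint. It is a swapped-in strategy and not necessarily the project's method. On acyclic, connected patterns like the one here, it should keep only triples that take part in some complete match. On cyclic patterns it may keep extra triples. The data mirrors the triples on the preceding slides, and the function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def bind(pattern: Triple, t: Triple) -> Optional[Binding]:
    """Binding produced by matching triple t against the pattern, or None."""
    b: Binding = {}
    for p, v in zip(pattern, t):
        if p.startswith("?"):
            if b.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return b


def joinable(r1: Binding, r2: Binding) -> bool:
    return all(r2[k] == v for k, v in r1.items() if k in r2)


def variables(p: Triple) -> Set[str]:
    return {x for x in p if x.startswith("?")}


def extract(patterns: List[Triple], graph: List[Triple]) -> List[Triple]:
    # Step 1: per-pattern match, keeping each matched triple with its binding.
    sets: List[List[Tuple[Triple, Binding]]] = []
    for p in patterns:
        matched: List[Tuple[Triple, Binding]] = []
        for t in graph:
            b = bind(p, t)
            if b is not None:
                matched.append((t, b))
        sets.append(matched)
    # Step 2: pairwise extractive-join test, repeated until nothing is removed.
    changed = True
    while changed:
        changed = False
        for i, j in permutations(range(len(patterns)), 2):
            if not variables(patterns[i]) & variables(patterns[j]):
                continue  # no shared variable: no join condition between i and j
            kept = [(t, b) for t, b in sets[i]
                    if any(joinable(b, b2) for _, b2 in sets[j])]
            if len(kept) < len(sets[i]):
                sets[i] = kept
                changed = True
    # Step 3: the answer is a graph, i.e. the union of the surviving triples.
    result: List[Triple] = []
    for s in sets:
        for t, _ in s:
            if t not in result:
                result.append(t)
    return result


G = [
    ("&o1", ":affiliates", "&r1"), ("&o1", ":affiliates", "&r2"),
    ("&o1", ":affiliates", "&r3"), ("&r1", ":produces", "&b1"),
    ("&r2", ":produces", "&b1"), ("&r2", ":produces", "&b2"),
    ("&b1", ":price", "90"), ("&b1", ":year", "2003"),
    ("&b3", ":price", "110"), ("&b3", ":year", "1998"),
]
P = [("?org", ":affiliates", "?aut"), ("?aut", ":produces", "?b"),
     ("?b", ":price", "?p"), ("?b", "?pred", "?x")]
print(extract(P, G))
```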

Extractive Algebra (ctd.)
  • Other operations:
    • Difference: $G_1 \setminus G_2 = \{\, t \mid t \in G_1 \text{ and } t \notin G_2 \,\}$
    • Filter: $\sigma_\theta(G) = G \setminus \{\, t \mid \theta(t) \neq \text{true} \,\}$
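Graph difference and filter reduce to set operations on triples. The sketch below models a graph as a `frozenset` of triples. The condition θ is a hypothetical predicate on a single triple that keeps :price triples above 100, loosely mirroring ?p > 100. Names and data are our own.

```python
from typing import Callable, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def g_difference(g1: Graph, g2: Graph) -> Graph:
    """G1 \\ G2: triples in G1 that are not in G2."""
    return g1 - g2


def g_filter(g: Graph, theta: Callable[[Triple], bool]) -> Graph:
    """σθ(G) = G \\ {t | θ(t) is not true}."""
    rejected = frozenset(t for t in g if theta(t) is not True)
    return g_difference(g, rejected)


G: Graph = frozenset({("&b1", ":price", "90"), ("&b1", ":year", "2003"),
                      ("&b3", ":price", "110")})
# Hypothetical θ: keep only :price triples whose value exceeds 100.
expensive = g_filter(G, lambda t: t[1] == ":price" and int(t[2]) > 100)
print(sorted(expensive))  # [('&b3', ':price', '110')]
```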

Implementing Extract – Naïve/Join-split
  • As a post-process of enumerative matching
    • Do enumerative matching
      • Produces a joined relation
    • Vertically split join result into triples
  • IO cost: for a pair of triple-sets:
    • 2 reads of triple sets +
    • 1 write of joined result +
    • 2 reads of join result (one for each split/projection) +
    • 2 writes of projected result +
    • 2 reads of the projected triple sets +
    • 1 write of unioned result
    • Total: 6 reads and 4 writes (4 reads and 3 writes if no union).
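The join-split steps can be sketched in memory as follows. The sketch assumes that the join variable is the object of A and the subject of B, as with ?aut in the running example. `join_split` and the data are hypothetical. The split must remove duplicates because one A triple can join several B triples; here, &o1 :affiliates &r2 appears in two joined rows.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def join_split(a: List[Triple], b: List[Triple]) -> List[Triple]:
    """Naive join-split: enumerative join, then split rows back into triples."""
    # 1. Enumerative join: one wide 6-column row per matching (A, B) pair.
    joined = [ta + tb for ta in a for tb in b if ta[2] == tb[0]]
    # 2. Vertical split: project each row onto its A half and its B half,
    #    removing duplicate triples while preserving order.
    left = list(dict.fromkeys(row[:3] for row in joined))
    right = list(dict.fromkeys(row[3:] for row in joined))
    # 3. Union of the two projected triple sets.
    return left + [t for t in right if t not in left]


A = [("&o1", ":affiliates", "&r1"), ("&o1", ":affiliates", "&r2"),
     ("&o1", ":affiliates", "&r3")]
B = [("&r1", ":produces", "&b1"), ("&r2", ":produces", "&b1"),
     ("&r2", ":produces", "&b2")]
print(join_split(A, B))
```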
Implementing Extract – 2-way semi-joins
  • Use 2-way semi-joins
    • Given two joinable triple sets A and B, compute A′ = A ⋉ B, then B′ = B ⋉ A′; the result is A′ ∪ B′.
  • IO cost:
    • 2 reads of triple sets (first semi-join) +
    • 1 write of result to union (writes the smaller table) +
    • 2 reads to perform the next semi-join (1 read is on the smaller table) +
    • 1 write of result to union
    • Total: 4 reads and 2 writes.
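The 2-way semi-join plan can be sketched with a key-set lookup per semi-join. The sketch assumes the same object-of-A / subject-of-B join key as before. The second semi-join runs against A′, the smaller, already reduced input, which matches the IO accounting above. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]
Key = Callable[[Triple], str]


def semi_join(x: List[Triple], y: List[Triple], x_key: Key, y_key: Key) -> List[Triple]:
    """x ⋉ y: triples of x whose join key occurs somewhere in y."""
    keys = {y_key(t) for t in y}
    return [t for t in x if x_key(t) in keys]


def obj(t: Triple) -> str:
    return t[2]


def subj(t: Triple) -> str:
    return t[0]


def two_way_semi_join(a: List[Triple], b: List[Triple]) -> List[Triple]:
    a_prime = semi_join(a, b, obj, subj)        # A' = A ⋉ B
    b_prime = semi_join(b, a_prime, subj, obj)  # B' = B ⋉ A' (A' is the smaller input)
    return a_prime + [t for t in b_prime if t not in a_prime]  # A' ∪ B'


A = [("&o1", ":affiliates", "&r1"), ("&o1", ":affiliates", "&r2"),
     ("&o1", ":affiliates", "&r3")]
B = [("&r1", ":produces", "&b1"), ("&r2", ":produces", "&b1"),
     ("&r2", ":produces", "&b2")]
print(two_way_semi_join(A, B))
```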


Implementing Extract – 2-stream operator
  • Scan each input and produce the triples that have at least one match in the other.
  • This is a high-level operator that can be implemented via:
    • Hashing, or
    • Sort-merge
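The two implementation choices for the 2-stream operator can be sketched side by side. The sketch again assumes the join key is the object of A and the subject of B. Both functions return (A′, B′), and the names are hypothetical. The hash version keeps one key set per input. The sort-merge version sorts both inputs on the key and emits both runs of any key that appears on both sides.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
Pair = Tuple[List[Triple], List[Triple]]


def two_stream_hash(a: List[Triple], b: List[Triple]) -> Pair:
    # Hash the join keys of each input (object of A, subject of B).
    a_keys = {t[2] for t in a}
    b_keys = {t[0] for t in b}
    # Stream each input and emit triples with at least one partner in the other.
    return [t for t in a if t[2] in b_keys], [t for t in b if t[0] in a_keys]


def two_stream_sort_merge(a: List[Triple], b: List[Triple]) -> Pair:
    a_sorted = sorted(a, key=lambda t: t[2])
    b_sorted = sorted(b, key=lambda t: t[0])
    a_out: List[Triple] = []
    b_out: List[Triple] = []
    i = j = 0
    while i < len(a_sorted) and j < len(b_sorted):
        ka, kb = a_sorted[i][2], b_sorted[j][0]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Equal keys: every triple in both runs has a partner, so emit both runs.
            while i < len(a_sorted) and a_sorted[i][2] == ka:
                a_out.append(a_sorted[i])
                i += 1
            while j < len(b_sorted) and b_sorted[j][0] == kb:
                b_out.append(b_sorted[j])
                j += 1
    return a_out, b_out


A = [("&o1", ":affiliates", "&r1"), ("&o1", ":affiliates", "&r2"),
     ("&o1", ":affiliates", "&r3")]
B = [("&r1", ":produces", "&b1"), ("&r2", ":produces", "&b1"),
     ("&r2", ":produces", "&b2")]
print(two_stream_hash(A, B))
print(two_stream_sort_merge(A, B))
```

Hashing avoids the sort but needs the key sets in memory. Sort-merge is attractive when the inputs already arrive sorted on the join key.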

Grouping and Aggregation: Flatten-and-Aggregate Approach
  • This is how Oracle supports aggregation over graph data!
  • Also, [Hung, Deng, and Subrahmanian, ICDE 2005]

SELECT ?org, sum (?p) as totalPrice

WHERE { ?org :affiliates ?aut .

?aut :writesBook ?b .

?b :price ?p }

GROUP BY ?org

[Figure: enumerative match results over organizations &o1 and &o2, authors &r1, &r2, &r3, and books &b1 (90), &b2 (110), and &b3 (100), linked by affiliates and writesBook edges, are flattened into rows, then grouped and aggregated.]

Result: 390. WRONG!
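The error can be reproduced in a few lines. The graph used here is a guessed reconstruction of the figure, chosen because it has four affiliates edges, four writesBook edges, and the three prices shown, and because it reproduces the slides' numbers (390 here, and 300 and 200 on the solution slide). It should not be read as the authors' actual data. Flattening yields one row per (org, author, book) path. &b1 is reached through both &r1 and &r2, so its price is counted twice for &o1.

```python
from collections import defaultdict
from typing import Dict, Set

# Guessed reconstruction of the figure's graph (assumption).
affiliates = [("&o1", "&r1"), ("&o1", "&r2"), ("&o1", "&r3"), ("&o2", "&r2")]
writes_book = [("&r1", "&b1"), ("&r2", "&b1"), ("&r2", "&b2"), ("&r3", "&b3")]
price = {"&b1": 90, "&b2": 110, "&b3": 100}

# Enumerative match: one flat row (?org, ?aut, ?b, ?p) per solution.
rows = [(org, aut, b, price[b])
        for org, aut in affiliates
        for aut2, b in writes_book if aut == aut2]

# Flatten-and-aggregate: sum ?p over the rows of each ?org.
flat: Dict[str, int] = defaultdict(int)
for org, _aut, _b, p in rows:
    flat[org] += p  # &b1 appears in two rows for &o1, so 90 is added twice

# For comparison: each distinct book counted once per organization.
books: Dict[str, Set[str]] = defaultdict(set)
for org, _aut, b, _p in rows:
    books[org].add(b)
correct = {org: sum(price[b] for b in bs) for org, bs in books.items()}

print(dict(flat))  # {'&o1': 390, '&o2': 200}
print(correct)     # {'&o1': 300, '&o2': 200}
```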

Group By
  • Should be based on extractive matching (graphs).
  • What should group by mean on graphs?
    • Collapse a set of triples into a single triple.
    • Use Bag nodes.

[Figure: after grouping, each author (&r1, &r2, &r3) has a single writesBook edge to a node of type Bag whose members (books &b1, &b2, &b3) hang off numbered edges :1, :2; the affiliates edges from &o1 and &o2 are unchanged.]

CONSTRUCT *

WHERE { ?org :affiliates ?aut .

?aut :writesBook ?b .

?b :price ?p }

GROUP BY ?aut ON :writesBook

Here ?aut is the grouping target and :writesBook is the grouping basis.
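GROUP BY ?aut ON :writesBook can be sketched as a rewrite of the triple set. For each subject, all of its :writesBook edges are replaced by one edge to a fresh node typed Bag, and the original objects are attached through numbered edges :1, :2, and so on. The data reuses the guessed graph from the flatten-and-aggregate sketch. The `_:bag_…` naming scheme and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def group_by_on(graph: List[Triple], basis: str) -> List[Triple]:
    """Collapse each node's `basis` edges into a single edge to a Bag node."""
    members: Dict[str, List[str]] = defaultdict(list)
    out: List[Triple] = []
    for s, p, o in graph:
        if p == basis:
            members[s].append(o)  # collected and grouped below
        else:
            out.append((s, p, o))  # other edges are copied unchanged
    for s, objs in members.items():
        bag = f"_:bag_{s.lstrip('&')}"  # hypothetical naming for the new Bag node
        out.append((s, basis, bag))
        out.append((bag, "type", "Bag"))
        for i, o in enumerate(objs, start=1):
            out.append((bag, f":{i}", o))  # numbered membership edges :1, :2, ...
    return out


G = [("&o1", ":affiliates", "&r1"), ("&o1", ":affiliates", "&r2"),
     ("&o1", ":affiliates", "&r3"), ("&o2", ":affiliates", "&r2"),
     ("&r1", ":writesBook", "&b1"), ("&r2", ":writesBook", "&b1"),
     ("&r2", ":writesBook", "&b2"), ("&r3", ":writesBook", "&b3")]
for t in group_by_on(G, ":writesBook"):
    print(t)
```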

Aggregation
  • Two types (modes) of aggregation on graphs:
    • Branch-wise: aggregate a set of values adjacent to a node type
    • Path-wise: aggregate over a path in the graph
      • Not discussed here.
  • Branch-wise example:

SELECT ?b, branch sum (:price) as totalPrice

WHERE { ?org :affiliates ?aut .

?aut :writesBook ?b .

?b :price ?p }

Here ?b is the anchor, branch is the mode, and :price is the aggregation basis label.

[Figure: books &b1 (price 90, year 2003), &b2 (price 110, year 1998), and &b3 (price 100, year 1998).]
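Branch-wise aggregation looks only at the edges with the basis label that leave each anchor node. The sketch below sums the :price edges of each ?b. It assumes the anchors are all three books in the figure, and it uses hypothetical names. The :year edges are ignored because their label is not the aggregation basis.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

AttrTriple = Tuple[str, str, int]


def branch_sum(graph: List[AttrTriple], anchors: Set[str], label: str) -> Dict[str, int]:
    """Branch-wise SUM: for each anchor node, add up the values on its `label` edges."""
    totals: Dict[str, int] = defaultdict(int)
    for node, pred, value in graph:
        if node in anchors and pred == label:
            totals[node] += value
    return dict(totals)


G = [("&b1", ":price", 90), ("&b1", ":year", 2003),
     ("&b2", ":price", 110), ("&b2", ":year", 1998),
     ("&b3", ":price", 100), ("&b3", ":year", 1998)]
# Anchors: the ?b nodes bound by the WHERE clause (assumed to be all three books).
print(branch_sum(G, {"&b1", "&b2", "&b3"}, ":price"))
```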

Aggregation – revisit example
  • Anchor and aggregation basis are not adjacent!

SELECT ?org, branch sum (:price) as totalPrice

WHERE { ?org :affiliates ?aut .

?aut :writesBook ?b .

?b :price ?p }

GROUP BY ?org

Here ?org is the anchor, branch is the mode, and :price is the aggregation basis label; ?aut and ?b lie between the anchor and the aggregation basis.

[Figure: the organization/author/book graph from the flatten-and-aggregate example, with book prices 90 (&b1), 110 (&b2), and 100 (&b3).]

Aggregation – solution
  • RULE: All nodes between the anchor and the aggregation basis should be bags!
    • If the anchor and aggregation basis are adjacent, push the aggregation into the group by.
    • Otherwise, iteratively perform graph grouping with edge propagation, making each intermediary node an aggregation target.

[Figure: the graph after grouping, with Bag nodes between each organization and the books &b1 (90), &b2 (110), and &b3 (100).]

Result: (&o1, 300), (&o2, 200).
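The iterative grouping can be sketched as hop-by-hop propagation from the anchor. First, group the affiliates edges so that each ?org points to a bag of authors. Then merge those authors' writesBook bags, so that each ?org points to a bag of books. Finally, apply the branch-wise sum over :price. We model a bag as a set of graph nodes, which means a book reached along two paths (&b1 for &o1) is a single node and is counted once. This is our reading of how the slide arrives at 300. The graph is the same guessed reconstruction used in the flatten-and-aggregate sketch, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def group_targets(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Group edges of one label by source: source -> bag of target nodes."""
    bags: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        bags[src].add(dst)
    return bags


def propagate(anchor_bags: Dict[str, Set[str]],
              next_bags: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Move each anchor's bag one hop further by merging its members' bags."""
    out: Dict[str, Set[str]] = {}
    for anchor, members in anchor_bags.items():
        merged: Set[str] = set()
        for m in members:
            merged |= next_bags.get(m, set())
        out[anchor] = merged
    return out


# Guessed reconstruction of the figure's graph (assumption).
affiliates = [("&o1", "&r1"), ("&o1", "&r2"), ("&o1", "&r3"), ("&o2", "&r2")]
writes_book = [("&r1", "&b1"), ("&r2", "&b1"), ("&r2", "&b2"), ("&r3", "&b3")]
price = {"&b1": 90, "&b2": 110, "&b3": 100}

org_to_authors = group_targets(affiliates)                            # ?org -> bag of ?aut
org_to_books = propagate(org_to_authors, group_targets(writes_book))  # ?org -> bag of ?b
totals = {org: sum(price[b] for b in bs) for org, bs in org_to_books.items()}
print(totals)  # {'&o1': 300, '&o2': 200}
```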

Lead
  • Dawit Yimam Seid

Analysis and Visualization

Graph analysis

GIS

Predictive modeling

Damage assessment

  • Ram Hariharan (with Sharad Mehrotra and Chen Li)
  • Searching (open source) GIS data and datasets
    • Metadata
    • Compression

  • Vibhav Gogate and Jon Hutchinson (with Padhraic Smyth)
  • Activity monitoring and prediction
  • Anomalous event detection

  • ImageCat Inc (Ron Eguchi, Charles Huyck)
  • INLET, MetaSIM
Disaster Portal
  • Many Communities – Many Disaster Portals
    • Site contents are administered by each city's emergency management.
    • Easily customized to meet the needs of different communities.
    • Regional summarization capabilities built in (e.g., county/state-level summary view).
  • Objectives of the Disaster Portal project are to provide:
    • An integrated platform for RESCUE team members to develop, test, and demonstrate their research projects in real-life scenarios.
    • Next-generation capabilities to first responders and the public.
  • Key development partner:
    • City of Ontario

The Disaster Portal is a suite of web applications for disseminating information and providing situational awareness to the general public during a disaster.

Community Deployment of Disaster Portal
  • Applications selected from Disaster Portal suite.
  • Portal framework providing situation summary page, custom look-and-feel

Disaster Portal

Included in Ontario Pilot Disaster Portal

[Figure: SAMI overview. Situational awareness systems: extraction and synthesis, data management, and analysis; components include semantic extraction from text, geospatial and audio-visual extraction, the E event model, SAT-ware, spatial indexing, graph analysis, predictive modeling, and damage assessment.]
Conclusions
  • Situational data management
  • Semantics
  • Synergies
  • Integrated demonstration

Thank you!

[email protected]
