- By
**shing** - Follow User

- 160 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Mining for Spatial Patterns' - shing

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### Army Relevance and Collaborations

Mining for Spatial Patterns

Shashi Shekhar

Department of Computer Science

University of Minnesota http://www.cs.umn.edu/~shekhar

Collaborators: V. Kumar, G. Karypis, C.T. Lu, W. Wu, Y. Huang, V. Raju, P. Zhang, P. Tan, M. Steinbach

This work was partially funded by NASA and Army High Performance Computing Center

Mining For Spatial Patterns

Spatial Data Mining(SDM) - Examples

- Historical Examples:
- London Asiatic Cholera 1854 (Griffith)
- Dental health and fluoride in water, Colorado early 1900s
- Current Examples:
- Cancer clusters (CDC), Spread of disease (e.g. Nile virus)
- Crime hotspots (NIJ CML, police petrol planning)
- Environmental justice (EPA), fair lending practices
- Upcoming Applications: Location aware services
- Defense: Sensor networks, Mobile ad-hoc networks
- Civilian: Mortgage PMI determination based on location

Mining For Spatial Patterns

Army Relevance of SDM

- Strategic
- Predicting global hot spots (FORMID)
- Army land: endangered species vs. training and war games
- Search for local trends in massive simulation data
- Critical infra-structure defense (threat assessment)
- Tactical
- Inferring enemy tactics (e.g. flank attack) from blobology
- Detection of lost ammunition dumps (Dr. Radhakrishnan)
- Operational
- Interpretation of maps: map matching (locating oneself on map)
- identify terrain feature, e.g. ravines, valleys, ridge, etc.
- Locating enemy (e.g. sniper in a haystack, sensor networks)
- Avoiding friendly fire

Mining For Spatial Patterns

Spatial Data Mining(SDM) - Definition

- Search of implicit, interesting patterns in geo-spatial data
- Ex. Reconnaissance, Vector maps(NIMA, TEC), GPS, Sensor networks
- Data Mining vs. Statistics:
- Primary vs. Secondary analysis
- Global vs. local trends
- Spatial Data Mining vs. Data Mining:
- Spatial Autocorrelation
- Continuous vs. Discrete data types

Mining For Spatial Patterns

Background

- Spatial Data Mining
- Spatial statistics in Geology, Regional Economics
- NSF workshop on GIS and DM (3/99)
- NSF workshop on spatial data analysis (5/02)
- Spatial patterns:
- Spatial outliers
- Location prediction
- Associations, colocations
- Hotspots, Clustering, trends, …

Mining For Spatial Patterns

Framework

- 2 Approaches to mining Spatial Data
- 1. Pick spatial features; use classical DM methods
- 2. Use novel data mining techniques
- Our Approach:
- Define the problem: capture special needs
- Explore data using maps, other visualization
- Try reusing classical DM methods
- If classical DM perform poorly, try new methods
- Evaluate chosen methods rigourously
- Performance tuning if needed

Mining For Spatial Patterns

Spatial Association Rule

- Citation: Symp. On Spatial Databases 2001
- Problem: Given a set of boolean spatial features
- find subsets of co-located features, e.g. (fire, drought, vegetation)
- Data - continuous space, partition not natural, no reference feature
- Classical data mining approach: association rules
- But, Look Ma! No Transactions!!! No support measure!
- Approach: Work with continuous data without transactionizing it!
- confidence = Pr.[fire at s | drought in N(s) and vegetation in N(s)]
- support: cardinality of spatial join of instances of fire, drought, dry veg.
- participation: min. fraction of instances of a features in join result
- new algorithm using spatial joins and apriori_gen filters

Mining For Spatial Patterns

Event Definition

- Convert the time series into sequence of events at each spatial location.

Mining For Spatial Patterns

Interesting Association Patterns

- Use domain knowledge to eliminate uninteresting patterns.
- A pattern is less interesting if it occurs at random locations.
- Approach:
- Partition the land area into distinct groups (e.g., based on land-cover type).
- For each pattern, find the regions for which the pattern can be applied.
- If the pattern occurs mostly in a certain group of land areas, then it is potentially interesting.
- If the pattern occurs frequently in all groups of land areas, then it is less interesting.

Mining For Spatial Patterns

Association Rules

FPAR-Hi NPP-Hi (support 10)

Shrubland regions

- Intra-zone non-sequential Patterns

- Region corresponds to semi-arid grasslands, a type of vegetation, which is able to quickly take advantage of high precipitation than forests.
- Hypothesis: FPAR-Hi events could be related to unusual precipitation conditions.

Mining For Spatial Patterns

Co-location

Can you find co-location patterns from the following sample dataset?

Answers: and

Mining For Spatial Patterns

Co-location

Spatial Co-location

A set of features frequently co-located

Given

A set T of K boolean spatial feature types T={f1,f2, … , fk}

A set P of N locations P={p1, …, pN } in a spatial frame work S, pi P is of some spatial feature in T

A neighbor relation R over locations in S

Find

Tc = subsets of T frequently co-located

Objective

Correctness

Completeness

Efficiency

Constraints

R is symmetric and reflexive

Monotonic prevalence measure

Reference Feature Centric

Window Centric

Event Centric

Mining For Spatial Patterns

Co-location

Comparison with association rules

Participation index

Participation ratio pr(fi, c) of feature fi in co-location c = {f1, f2, …, fk}: fraction of instances of fi with

feature {f1, …, fi-1, fi+1, …, fk} nearby 2.Participation index = min{pr(fi, c)}

Algorithm

Hybrid Co-location Miner

Mining For Spatial Patterns

Spatial Co-location Patterns

- Dataset

- Spatial feature A,B,C and their instances
- Possible associations are (A, B), (B, C), etc.
- Neighbor relationship includes following pairs:
- A1, B1
- A2, B1
- A2, B2
- B1, C1
- B2, C2

Mining For Spatial Patterns

Spatial Co-location Patterns

- Dataset

- Partition approach[Yasuhiko, KDD 2001]
- Support not well defined,i.e. not independent of execution trace
- Has a fast heuristic which is hard to analyze for correctness/completeness

Spatial feature A,B, C,

and their instances

Support A,B=1 B,C=2

Support A,B =2 B,C=2

Mining For Spatial Patterns

Spatial Co-location Patterns

- Dataset

- Reference feature approach [Han SSD 95]
- C as reference feature to get transactions
- Transactions: (B1) (B2)
- Support (A,B) = Ǿ from Apriori algorithm
- Note: Neighbor relationship includes following pairs:
- A1, B1
- A2, B1
- A2, B2
- B1, C1
- B2, C2

Spatial feature A,B, C,

and their instances

Mining For Spatial Patterns

Spatial Co-location Patterns

- Dataset

- Our approach (Event Centric)
- Neighborhood instead of transactions
- Spatial join on neighbor relationship
- Support Prevalence
- Participation index = min. p_ratio
- P_ratio(A, (A,B)) = fraction of instance of A participating in join(A,B, neighbor)
- Examples
- Support(A,B)=min(2/2,3/3)=1
- Support(B,C)=min(2/2,2/2)=1

Spatial feature A,B, C,

and their instances

Mining For Spatial Patterns

Spatial Co-location Patterns

- Partition approach

- Our approach

- Dataset

Support(A,B)=min(2/2,3/3)=1

Spatial feature A,B, C,

and their instances

Support(B,C)=min(2/2,2/2)=1

Support A,B =2 B,C=2

- Reference feature approach

C as reference feature

Transactions: (B1) (B2)

Support (A,B) = Ǿ

Support A,B=1 B,C=2

Mining For Spatial Patterns

Spatial Outliers

- Spatial Outlier: A data point that is extreme relative to it neighbors
- Case Study: traffic stations different from neighbors [SIGKDD 2001, JIDA 2002]
- Data - space-time plot, distr. Of f(x), S(x)
- Distribution of base attribute:
- spatially smooth
- frequency distribution over value domain: normal
- Classical test - Pr.[item in population] is low
- Q? distribution of diff.[f(x), neighborhood agg{f(x)}]
- Insight: this statistic is distributed normally!
- Test: (z-score on the statistics) > 2
- Performance - spatial join, clustering methods

Mining For Spatial Patterns

Spatial Outlier Detection

Given

A spatial graph G={V,E}

A neighbor relationship (K neighbors)

An attribute function : V -> R

An aggregation function : :R k -> R

A comparison function

Confidence level threshold

Statistic test function ST: R ->{T, F}

Find

O = {vi | vi V, vi is a spatial outlier}

Objective

Correctness: The attribute values of vi

is extreme, compared with its neighbors

Computational efficiency

Constraints

and ST are algebraic aggregate functions of and

Computation cost dominated by I/O op.

Mining For Spatial Patterns

Spatial Outlier Detection

Spatial Outlier Detection Test

1. Choice of Spatial Statistic

S(x) = [f(x)–E y N(x)(f(y))]

Theorem: S(x) is normally distributed

if f(x) is normally distributed

2. Test for Outlier Detection

| (S(x) - s) / s | >

Hypothesis

I/O cost determined by clustering efficiency

f(x)

S(x)

Mining For Spatial Patterns

A Unified Approach Spatial Outliers

- Tests : quantitative, graphical
- Results:
- Computation = spatial self-join
- Tests: algebraic functions of join
- Join predicate: neighbor relations
- I/O-cost: f(clustering efficiency)
- Our algorithm is I/O-efficient for
- Algebraic tests

Scatter Plot

Original Data

Our Approach

Mining For Spatial Patterns

Spatial Outlier Detection

Results

1. CCAM achieves higher clustering efficiency (CE)

2. CCAM has lower I/O cost

3. High CE => low I/O cost

4. Big Page => high CE

CE value

I/O cost

Z-order

Cell-Tree

CCAM

Mining For Spatial Patterns

Location Prediction

- Citations: IEEE Tran. on Multimedia2002, SIAM DM Conf. 2001, SIGKDD DMKD 2000
- Problem: predict nesting site in marshes
- given vegetation, water depth, distance to edge, etc.
- Data - maps of nests and attributes
- spatially clustered nests, spatially smooth attributes
- Classical method: logistic regression, decision trees, bayesian classifier
- but, independence assumption is violated ! Misses auto-correlation !
- Spatial auto-regression (SAR), Markov random field bayesian classifier
- Open issues: spatial accuracy vs. classification accurary
- Open issue: performance - SAR learning is slow!

Mining For Spatial Patterns

Location Prediction

Given:

1. Spatial Framework

2. Explanatory functions:

3. A dependent class:

4. A family of function mappings:

Find: Classification model:

Objective:maximize

classification_accuracy

Constraints:

Spatial Autocorrelation exists

Nest locations

Distance to open water

Water depth

Vegetation durability

Mining For Spatial Patterns

Motivation and Framework

Mining For Spatial Patterns

Spatial AutoRegression (SAR)

- Spatial Autoregression Model (SAR)
- y = Wy + X +
- W models neighborhood relationships
- models strength of spatial dependencies
- error vector
- Solutions
- and - can be estimated using ML or Bayesian stat.
- e.g., spatial econometrics package uses Bayesian approach using sampling-based Markov Chain Monte Carlo (MCMC) method.
- Likelihood-based estimation requires O(n3) ops.
- Other alternatives – divide and conquer, sparse matrix, LU decomposition, etc.

Mining For Spatial Patterns

MRF Bayesian

- Markov Random Field based Bayesian Classifiers
- Pr(li | X, Li) = Pr(X|li, Li) Pr(li | Li) / Pr (X)
- Pr(li | Li) can be estimated from training data
- Li denotes set of labels in the neighborhood of si excluding labels at si
- Pr(X|li, Li) can be estimated using kernel functions
- Solutions
- stochastic relaxation [Geman]
- Iterated conditional modes [Besag]
- Graph cut [Boykov]

Mining For Spatial Patterns

Experiment Design

Mining For Spatial Patterns

Prediction Maps(Learning)

Actual Nest Sites (Real Learning)

MRF-P Prediction (ADNP=3.36)

NZ=85

NZ=138

MRF-GMM Prediction (ADNP=5.88)

SAR Prediction (ADNP=9.80)

NZ=140

NZ=130

Mining For Spatial Patterns

Prediction Maps(Testing)

Actual Nest Sites (Real Testing)

MRF-P Prediction (ADNP=2.84)

Actual Nest Sites (Real Learning)

NZ=30

NZ=80

MRF-GMM Prediction (ADNP=3.35)

SAR Prediction (ADNP=8.63)

NZ=76

NZ=80

Mining For Spatial Patterns

Comparison (MRF-BC vs. SAR)

- SAR can be rewritten as y = (QX) + Q
- where Q = (I- W)-1 which can be viewed as a spatial smoothing operation.
- This transformation shows that SAR is similar to linear logistic model, and thus suffers with same limitations – i.e., SAR model assumes linear separability of classes in transformed feature space
- SAR model also make more restrictive assumptions about the distribution of features and class shapes than MRF
- The relationship between SAR and MRF are analogous to the relationship between logistic regression and Bayesian classifiers.
- Our experimental results shows that MRF model yields better spatial and classification accuracies than SAR predictions.

Mining For Spatial Patterns

Conclusion and Future Directions

- Spatial domains may not satisfy assumptions of classical methods
- data: auto-correlation, continuous geographic space
- patterns: global vs. local, e.g. spatial outliers vs. outliers
- data exploration: maps and albums
- Open Issues
- patterns: hot-spots, blobology (shape), spatial trends, …
- metrics: spatial accuracy(predicted locations), spatial contiguity(clusters)
- spatio-temporal dataset
- scale and resolutions sentivity of patterns
- geo-statistical confidence measure for mined patterns

Mining For Spatial Patterns

Relevance: “Maps are as important to soldiers as guns” - unknown

Joint Projects:

High Performance GIS for Battlefield Simulation (ARL Adelphi)

Spatial Querying for Battlefield Situation Assessment (ARL Adelphi)

Joint Publications:

w/ G. Turner (ARL Adelphi, MD) & D. Chubb (CECOM IEWD)

IEEE Computer (December 1996)

IEEE Transactions on Knowledge and Data Eng. (July-Aug. 1998)

Three conference papers

Visits, Other Collaborations

GIS group, Waterways Experimentation Station (Army)

Concept Analysis Agency, Topographic Eng. Center, ARL, Adelphi

Workshop on Battlefield Visualization and Real Time GIS (4/2000)

Reference

- S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu and C.T. Liu, “Spatial Databases: Accomplishments and Research Needs”, IEEE Transactions on Knowledge and Data Engineering, Jan.-Feb. 1999.
- S. Shekhar and Y. Huang, “Discovering Spatial Co-location Patterns: a Summary of Results”, In Proc. of 7th International Symposium on Spatial and Temporal Databases (SSTD01), July 2001.
- S. Shekhar, C.T. Lu, P. Zhang, "Detecting Graph-based Spatial Outliers: Algorithms and Applications“, the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
- S. Shekhar, C.T. Lu, P. Zhang, “Detecting Graph-based Saptial Outlier”, Intelligent Data Analysis, To appear in Vol. 6(3), 2002
- S. Shekhar, S. Chawla, the book“Spatial Database: Concepts, Implementation and Trends”, Prentice Hall, 2002
- S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, “Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations”, Proc. Int. Confi. on 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14, 2000.
- S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, “Modeling Spatial Dependencies for Mining Geospatial Data”, First SIAM International Conference on Data Mining, 2001.
- S. Shekhar, P.R. Schrater, R. R. Vatsavai, W. Wu, and S. Chawla, “Spatial Contextual Classification and Prediction Models for Mining Geospatial Data”,To Appear in IEEE Transactions on Multimedia, 2002.
- S. Shekhar, V. Kumar, P. Tan. M. Steinbach, Y. Huang, P. Zhang, C. Potter, S. Klooster, “Mining Patterns in Earth Science Data”, IEEE Computing in Science and Engineering (Submitted)

Mining For Spatial Patterns

Reference

- S. Shekhar, C.T. Lu, P. Zhang, “A Unified Approach to Spatial Outliers Detection”, IEEE Transactions on Knowledge and Data Engineering (Submitted)
- S. Shekhar, C.T. Lu, X. Tan, S. Chawla, Map Cube: A Visualization Tool for Spatial Data Warehouses, as Chapter of Geographic Data Mining and Knowledge Discovery. Harvey J. Miller and Jiawei Han (eds.), Taylor and Francis, 2001, ISBN 0-415-23369-0.
- S. Shekhar, Y. Huang, W. Wu, C.T. Lu, What's Spatial about Spatial Data Mining: Three Case Studies , as Chapter of Book: Data Mining for Scientific and Engineering Applications. V. Kumar, R. Grossman, C. Kamath, R. Namburu (eds.), Kluwer Academic Pub., 2001, ISBN 1-4020-0033-2
- Shashi Shekhar and Yan Huang , Multi-resolution Co-location Miner: a New Algorithm to Find Co-location Patterns in Spatial Datasets, Fifth Workshop on Mining Scientific Datasets (SIAM 2nd Data Mining Conference), April 2002

Mining For Spatial Patterns

Download Presentation

Connecting to Server..