mining for spatial patterns
Skip this Video
Download Presentation
Mining for Spatial Patterns

Loading in 2 Seconds...

play fullscreen
1 / 39

Mining for Spatial Patterns - PowerPoint PPT Presentation

  • Uploaded on

Mining for Spatial Patterns . Shashi Shekhar Department of Computer Science University of Minnesota Collaborators: V. Kumar, G. Karypis, C.T. Lu, W. Wu, Y. Huang, V. Raju, P. Zhang, P. Tan, M. Steinbach

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Mining for Spatial Patterns' - shing

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
mining for spatial patterns
Mining for Spatial Patterns

Shashi Shekhar

Department of Computer Science

University of Minnesota

Collaborators: V. Kumar, G. Karypis, C.T. Lu, W. Wu, Y. Huang, V. Raju, P. Zhang, P. Tan, M. Steinbach

This work was partially funded by NASA and Army High Performance Computing Center

Mining For Spatial Patterns

spatial data mining sdm examples
Spatial Data Mining(SDM) - Examples
  • Historical Examples:
    • London Asiatic Cholera 1854 (Griffith)
    • Dental health and fluoride in water, Colorado early 1900s
  • Current Examples:
    • Cancer clusters (CDC), Spread of disease (e.g. Nile virus)
    • Crime hotspots (NIJ CML, police petrol planning)
    • Environmental justice (EPA), fair lending practices
  • Upcoming Applications: Location aware services
    • Defense: Sensor networks, Mobile ad-hoc networks
    • Civilian: Mortgage PMI determination based on location

Mining For Spatial Patterns

army relevance of sdm
Army Relevance of SDM
  • Strategic
    • Predicting global hot spots (FORMID)
    • Army land: endangered species vs. training and war games
    • Search for local trends in massive simulation data
    • Critical infra-structure defense (threat assessment)
  • Tactical
    • Inferring enemy tactics (e.g. flank attack) from blobology
    • Detection of lost ammunition dumps (Dr. Radhakrishnan)
  • Operational
    • Interpretation of maps: map matching (locating oneself on map)
      • identify terrain feature, e.g. ravines, valleys, ridge, etc.
    • Locating enemy (e.g. sniper in a haystack, sensor networks)
    • Avoiding friendly fire

Mining For Spatial Patterns

spatial data mining sdm definition
Spatial Data Mining(SDM) - Definition
  • Search of implicit, interesting patterns in geo-spatial data
    • Ex. Reconnaissance, Vector maps(NIMA, TEC), GPS, Sensor networks
  • Data Mining vs. Statistics:
    • Primary vs. Secondary analysis
    • Global vs. local trends
  • Spatial Data Mining vs. Data Mining:
    • Spatial Autocorrelation
    • Continuous vs. Discrete data types

Mining For Spatial Patterns

  • Spatial Data Mining
    • Spatial statistics in Geology, Regional Economics
    • NSF workshop on GIS and DM (3/99)
    • NSF workshop on spatial data analysis (5/02)
  • Spatial patterns:
    • Spatial outliers
    • Location prediction
    • Associations, colocations
    • Hotspots, Clustering, trends, …

Mining For Spatial Patterns

  • 2 Approaches to mining Spatial Data
    • 1. Pick spatial features; use classical DM methods
    • 2. Use novel data mining techniques
  • Our Approach:
    • Define the problem: capture special needs
    • Explore data using maps, other visualization
    • Try reusing classical DM methods
    • If classical DM perform poorly, try new methods
    • Evaluate chosen methods rigourously
    • Performance tuning if needed

Mining For Spatial Patterns

spatial association rule
Spatial Association Rule
  • Citation: Symp. On Spatial Databases 2001
  • Problem: Given a set of boolean spatial features
    • find subsets of co-located features, e.g. (fire, drought, vegetation)
    • Data - continuous space, partition not natural, no reference feature
  • Classical data mining approach: association rules
    • But, Look Ma! No Transactions!!! No support measure!
  • Approach: Work with continuous data without transactionizing it!
    • confidence = Pr.[fire at s | drought in N(s) and vegetation in N(s)]
    • support: cardinality of spatial join of instances of fire, drought, dry veg.
    • participation: min. fraction of instances of a features in join result
    • new algorithm using spatial joins and apriori_gen filters

Mining For Spatial Patterns

event definition
Event Definition
  • Convert the time series into sequence of events at each spatial location.

Mining For Spatial Patterns

interesting association patterns
Interesting Association Patterns
  • Use domain knowledge to eliminate uninteresting patterns.
  • A pattern is less interesting if it occurs at random locations.
  • Approach:
    • Partition the land area into distinct groups (e.g., based on land-cover type).
    • For each pattern, find the regions for which the pattern can be applied.
    • If the pattern occurs mostly in a certain group of land areas, then it is potentially interesting.
    • If the pattern occurs frequently in all groups of land areas, then it is less interesting.

Mining For Spatial Patterns

association rules
Association Rules

FPAR-Hi  NPP-Hi (support  10)

Shrubland regions

  • Intra-zone non-sequential Patterns
  • Region corresponds to semi-arid grasslands, a type of vegetation, which is able to quickly take advantage of high precipitation than forests.
  • Hypothesis: FPAR-Hi events could be related to unusual precipitation conditions.

Mining For Spatial Patterns


Can you find co-location patterns from the following sample dataset?

Answers: and

Mining For Spatial Patterns


Spatial Co-location

A set of features frequently co-located


A set T of K boolean spatial feature types T={f1,f2, … , fk}

A set P of N locations P={p1, …, pN } in a spatial frame work S, pi P is of some spatial feature in T

A neighbor relation R over locations in S


Tc = subsets of T frequently co-located






R is symmetric and reflexive

Monotonic prevalence measure

Reference Feature Centric

Window Centric

Event Centric

Mining For Spatial Patterns


Comparison with association rules

Participation index

Participation ratio pr(fi, c) of feature fi in co-location c = {f1, f2, …, fk}: fraction of instances of fi with

feature {f1, …, fi-1, fi+1, …, fk} nearby 2.Participation index = min{pr(fi, c)}


Hybrid Co-location Miner

Mining For Spatial Patterns

Spatial Co-location Patterns
  • Dataset
  • Spatial feature A,B,C and their instances
  • Possible associations are (A, B), (B, C), etc.
  • Neighbor relationship includes following pairs:
    • A1, B1
    • A2, B1
    • A2, B2
    • B1, C1
    • B2, C2

Mining For Spatial Patterns

Spatial Co-location Patterns
  • Dataset
  • Partition approach[Yasuhiko, KDD 2001]
  • Support not well defined,i.e. not independent of execution trace
  • Has a fast heuristic which is hard to analyze for correctness/completeness

Spatial feature A,B, C,

and their instances

Support A,B=1 B,C=2

Support A,B =2 B,C=2

Mining For Spatial Patterns

Spatial Co-location Patterns
  • Dataset
  • Reference feature approach [Han SSD 95]
    • C as reference feature to get transactions
    • Transactions: (B1) (B2)
    • Support (A,B) = Ǿ from Apriori algorithm
    • Note: Neighbor relationship includes following pairs:
      • A1, B1
      • A2, B1
      • A2, B2
      • B1, C1
      • B2, C2

Spatial feature A,B, C,

and their instances

Mining For Spatial Patterns

Spatial Co-location Patterns
  • Dataset
  • Our approach (Event Centric)
    • Neighborhood instead of transactions
    • Spatial join on neighbor relationship
    • Support  Prevalence
      • Participation index = min. p_ratio
      • P_ratio(A, (A,B)) = fraction of instance of A participating in join(A,B, neighbor)
      • Examples
  • Support(A,B)=min(2/2,3/3)=1
  • Support(B,C)=min(2/2,2/2)=1

Spatial feature A,B, C,

and their instances

Mining For Spatial Patterns

Spatial Co-location Patterns
  • Partition approach
  • Our approach
  • Dataset


Spatial feature A,B, C,

and their instances


Support A,B =2 B,C=2

  • Reference feature approach

C as reference feature

Transactions: (B1) (B2)

Support (A,B) = Ǿ

Support A,B=1 B,C=2

Mining For Spatial Patterns

spatial outliers
Spatial Outliers
  • Spatial Outlier: A data point that is extreme relative to it neighbors
  • Case Study: traffic stations different from neighbors [SIGKDD 2001, JIDA 2002]
  • Data - space-time plot, distr. Of f(x), S(x)
  • Distribution of base attribute:
    • spatially smooth
    • frequency distribution over value domain: normal
  • Classical test - Pr.[item in population] is low
    • Q? distribution of diff.[f(x), neighborhood agg{f(x)}]
    • Insight: this statistic is distributed normally!
    • Test: (z-score on the statistics) > 2
    • Performance - spatial join, clustering methods

Mining For Spatial Patterns

Spatial Outlier Detection


A spatial graph G={V,E}

A neighbor relationship (K neighbors)

An attribute function : V -> R

An aggregation function : :R k -> R

A comparison function

Confidence level threshold 

Statistic test function ST: R ->{T, F}


O = {vi | vi V, vi is a spatial outlier}


Correctness: The attribute values of vi

is extreme, compared with its neighbors

Computational efficiency


and ST are algebraic aggregate functions of and

Computation cost dominated by I/O op.

Mining For Spatial Patterns

Spatial Outlier Detection

Spatial Outlier Detection Test

1. Choice of Spatial Statistic

S(x) = [f(x)–E y N(x)(f(y))]

Theorem: S(x) is normally distributed

if f(x) is normally distributed

2. Test for Outlier Detection

| (S(x) - s) / s | > 


I/O cost determined by clustering efficiency



Mining For Spatial Patterns

graphical spatial tests
Graphical Spatial Tests

Moran Scatter Plot

Original Data

Variogram Cloud

Mining For Spatial Patterns

a unified approach spatial outliers
A Unified Approach Spatial Outliers
  • Tests : quantitative, graphical
  • Results:
    • Computation = spatial self-join
    • Tests: algebraic functions of join
    • Join predicate: neighbor relations
    • I/O-cost: f(clustering efficiency)
    • Our algorithm is I/O-efficient for
      • Algebraic tests

Scatter Plot

Original Data

Our Approach

Mining For Spatial Patterns

Spatial Outlier Detection


1. CCAM achieves higher clustering efficiency (CE)

2. CCAM has lower I/O cost

3. High CE => low I/O cost

4. Big Page => high CE

CE value

I/O cost




Mining For Spatial Patterns

location prediction
Location Prediction
  • Citations: IEEE Tran. on Multimedia2002, SIAM DM Conf. 2001, SIGKDD DMKD 2000
  • Problem: predict nesting site in marshes
    • given vegetation, water depth, distance to edge, etc.
  • Data - maps of nests and attributes
    • spatially clustered nests, spatially smooth attributes
  • Classical method: logistic regression, decision trees, bayesian classifier
    • but, independence assumption is violated ! Misses auto-correlation !
    • Spatial auto-regression (SAR), Markov random field bayesian classifier
    • Open issues: spatial accuracy vs. classification accurary
    • Open issue: performance - SAR learning is slow!

Mining For Spatial Patterns

Location Prediction


1. Spatial Framework

2. Explanatory functions:

3. A dependent class:

4. A family of function mappings:

Find: Classification model:




Spatial Autocorrelation exists

Nest locations

Distance to open water

Water depth

Vegetation durability

Mining For Spatial Patterns

Motivation and Framework

Mining For Spatial Patterns

Spatial AutoRegression (SAR)
  • Spatial Autoregression Model (SAR)
    • y = Wy + X + 
      • W models neighborhood relationships
      •  models strength of spatial dependencies
      •  error vector
    • Solutions
      •  and  - can be estimated using ML or Bayesian stat.
      • e.g., spatial econometrics package uses Bayesian approach using sampling-based Markov Chain Monte Carlo (MCMC) method.
      • Likelihood-based estimation requires O(n3) ops.
      • Other alternatives – divide and conquer, sparse matrix, LU decomposition, etc.

Mining For Spatial Patterns

  • Linear Regression
  • Spatial Regression
  • Spatial model is better

Mining For Spatial Patterns

MRF Bayesian
  • Markov Random Field based Bayesian Classifiers
    • Pr(li | X, Li) = Pr(X|li, Li) Pr(li | Li) / Pr (X)
      • Pr(li | Li) can be estimated from training data
      • Li denotes set of labels in the neighborhood of si excluding labels at si
      • Pr(X|li, Li) can be estimated using kernel functions
    • Solutions
      • stochastic relaxation [Geman]
      • Iterated conditional modes [Besag]
      • Graph cut [Boykov]

Mining For Spatial Patterns

Experiment Design

Mining For Spatial Patterns

prediction maps learning
Prediction Maps(Learning)

Actual Nest Sites (Real Learning)

MRF-P Prediction (ADNP=3.36)



MRF-GMM Prediction (ADNP=5.88)

SAR Prediction (ADNP=9.80)



Mining For Spatial Patterns

prediction maps testing
Prediction Maps(Testing)

Actual Nest Sites (Real Testing)

MRF-P Prediction (ADNP=2.84)

Actual Nest Sites (Real Learning)



MRF-GMM Prediction (ADNP=3.35)

SAR Prediction (ADNP=8.63)



Mining For Spatial Patterns

Comparison (MRF-BC vs. SAR)
  • SAR can be rewritten as y = (QX)  + Q
    • where Q = (I- W)-1 which can be viewed as a spatial smoothing operation.
    • This transformation shows that SAR is similar to linear logistic model, and thus suffers with same limitations – i.e., SAR model assumes linear separability of classes in transformed feature space
  • SAR model also make more restrictive assumptions about the distribution of features and class shapes than MRF
  • The relationship between SAR and MRF are analogous to the relationship between logistic regression and Bayesian classifiers.
  • Our experimental results shows that MRF model yields better spatial and classification accuracies than SAR predictions.

Mining For Spatial Patterns


Confusion Matrix:

Spatial Confusion Matrix:

Mining For Spatial Patterns

Conclusion and Future Directions
  • Spatial domains may not satisfy assumptions of classical methods
    • data: auto-correlation, continuous geographic space
    • patterns: global vs. local, e.g. spatial outliers vs. outliers
    • data exploration: maps and albums
  • Open Issues
    • patterns: hot-spots, blobology (shape), spatial trends, …
    • metrics: spatial accuracy(predicted locations), spatial contiguity(clusters)
    • spatio-temporal dataset
    • scale and resolutions sentivity of patterns
    • geo-statistical confidence measure for mined patterns

Mining For Spatial Patterns

army relevance and collaborations

Army Relevance and Collaborations

Relevance: “Maps are as important to soldiers as guns” - unknown

Joint Projects:

High Performance GIS for Battlefield Simulation (ARL Adelphi)

Spatial Querying for Battlefield Situation Assessment (ARL Adelphi)

Joint Publications:

w/ G. Turner (ARL Adelphi, MD) & D. Chubb (CECOM IEWD)

IEEE Computer (December 1996)

IEEE Transactions on Knowledge and Data Eng. (July-Aug. 1998)

Three conference papers

Visits, Other Collaborations

GIS group, Waterways Experimentation Station (Army)

Concept Analysis Agency, Topographic Eng. Center, ARL, Adelphi

Workshop on Battlefield Visualization and Real Time GIS (4/2000)

  • S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu and C.T. Liu, “Spatial Databases: Accomplishments and Research Needs”, IEEE Transactions on Knowledge and Data Engineering, Jan.-Feb. 1999.
  • S. Shekhar and Y. Huang, “Discovering Spatial Co-location Patterns: a Summary of Results”, In Proc. of 7th International Symposium on Spatial and Temporal Databases (SSTD01), July 2001.
  • S. Shekhar, C.T. Lu, P. Zhang, "Detecting Graph-based Spatial Outliers: Algorithms and Applications“, the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
  • S. Shekhar, C.T. Lu, P. Zhang, “Detecting Graph-based Saptial Outlier”, Intelligent Data Analysis, To appear in Vol. 6(3), 2002
  • S. Shekhar, S. Chawla, the book“Spatial Database: Concepts, Implementation and Trends”, Prentice Hall, 2002
  • S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, “Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations”, Proc. Int. Confi. on 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14, 2000.
  • S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, “Modeling Spatial Dependencies for Mining Geospatial Data”, First SIAM International Conference on Data Mining, 2001.
  • S. Shekhar, P.R. Schrater, R. R. Vatsavai, W. Wu, and S. Chawla, “Spatial Contextual Classification and Prediction Models for Mining Geospatial Data”,To Appear in IEEE Transactions on Multimedia, 2002.
  • S. Shekhar, V. Kumar, P. Tan. M. Steinbach, Y. Huang, P. Zhang, C. Potter, S. Klooster, “Mining Patterns in Earth Science Data”, IEEE Computing in Science and Engineering (Submitted)

Mining For Spatial Patterns

  • S. Shekhar, C.T. Lu, P. Zhang, “A Unified Approach to Spatial Outliers Detection”, IEEE Transactions on Knowledge and Data Engineering (Submitted)
  • S. Shekhar, C.T. Lu, X. Tan, S. Chawla, Map Cube: A Visualization Tool for Spatial Data Warehouses, as Chapter of Geographic Data Mining and Knowledge Discovery. Harvey J. Miller and Jiawei Han (eds.), Taylor and Francis, 2001, ISBN 0-415-23369-0.
  • S. Shekhar, Y. Huang, W. Wu, C.T. Lu, What's Spatial about Spatial Data Mining: Three Case Studies , as Chapter of Book: Data Mining for Scientific and Engineering Applications. V. Kumar, R. Grossman, C. Kamath, R. Namburu (eds.), Kluwer Academic Pub., 2001, ISBN 1-4020-0033-2
  • Shashi Shekhar and Yan Huang , Multi-resolution Co-location Miner: a New Algorithm to Find Co-location Patterns in Spatial Datasets, Fifth Workshop on Mining Scientific Datasets (SIAM 2nd Data Mining Conference), April 2002

Mining For Spatial Patterns