
Geographic Knowledge Discovery in Spatial Interaction With Self-Organizing Maps

Ph.D. Dissertation Defense. Jun Yan, Geography Department, SUNY at Buffalo, July 29, 2004. Dissertation Committee: Dr. Jean-Claude Thill (Chair), Dr. Ling Bian, Dr. David Mark.


Presentation Transcript


  1. Geographic Knowledge Discovery in Spatial Interaction With Self-Organizing Maps Ph.D. Dissertation Defense Jun Yan Geography Department SUNY at Buffalo July 29, 2004 • Dissertation Committee: • Dr. Jean-Claude Thill (Chair) • Dr. Ling Bian • Dr. David Mark

  2. Outline • Background • Spatial Interaction Data • Methodology • Self-Organizing Maps • Visual Data Mining • Case studies • Conclusions and Future Research

  3. Background • Information technologies: two legs! • More tools available • More data available • Data-rich vs. computation-rich: • challenge? • opportunity!

  4. Background (Cont.) • Data Mining & Knowledge Discovery: “useful information from large databases” • useful • novel • valid • understandable • Geographic data mining (GDM) and geographic knowledge discovery (GKD)?

  5. Background (Cont.) • Mining techniques: statistics, pattern recognition, machine learning, visualization, high performance computing … • Knowledge discovery process [Diagram: User, Controller, DB Interface, DBMS, Target Data Selection, Data Mining, Evaluation, Discoveries, Domain Knowledge, Knowledge Base]

  6. Background (Cont.) • Finding all the patterns in a database autonomously? Unrealistic • because the patterns could be too many and mostly uninteresting • Data mining: an iterative, interactive, semi-automated process • people direct what is to be mined • Visualization: Geovisualization (GVis) • visual data mining!

  7. Visualization in KDD Process • Stages: Selecting Application Domain → Selecting Target Data → Processing Data → Extracting Information/Knowledge → Interpretation and Evaluation • Visualization tasks listed between successive stages: • Understanding basic data distribution, selecting meaningful target datasets • Locating missing data, noise removing, data smoothing • Parameters setting, process tracking, process steering • Interpretation, reporting, comparison, validity checking

  8. Background (Cont.) • Machine learning & Neural Networks [Diagram 1: a learning algorithm with inputs (examples; background knowledge, sometimes) and outputs (concept description or other knowledge). Diagram 2: a neural network with input layer, hidden layer, and output layer]

  9. Background (Cont.) • Objectives: • Explore the effectiveness of neural networks in GKD • Examine the roles of GVis in GKD

  10. Spatial Interaction Data • What is spatial interaction? • Pairs of places • Elemental: trips made by individuals • Aggregate: flows from origins to destinations • Examples: migration, freight shipment, movement of capital & information …

  11. Spatial Interaction Data (Cont.) [Figure: three data layouts. Trip table (elemental level): rows Trip 1 to Trip 3; columns Origin, Destination, Distance. Basic O-D matrix (aggregate level): rows and columns Region 1 to Region 3. Dyadic O-D matrix (aggregate level): rows Region 1>Region 1, Region 1>Region 2, Region 1>Region 3; columns Type 1 to Type 3]
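As a rough illustration of the three layouts on this slide, the sketch below aggregates a made-up trip table into a basic O-D matrix and a dyadic O-D matrix. The trip records, the region and type names, and the `trip_type` field are hypothetical and are not taken from the dissertation data.

```python
from collections import defaultdict

# Hypothetical elemental-level trip table: one record per trip.
trips = [
    {"origin": "Region 1", "destination": "Region 2", "trip_type": "Type 1"},
    {"origin": "Region 1", "destination": "Region 2", "trip_type": "Type 2"},
    {"origin": "Region 1", "destination": "Region 3", "trip_type": "Type 1"},
    {"origin": "Region 2", "destination": "Region 1", "trip_type": "Type 3"},
    {"origin": "Region 1", "destination": "Region 2", "trip_type": "Type 1"},
]
regions = ["Region 1", "Region 2", "Region 3"]
types = ["Type 1", "Type 2", "Type 3"]


def basic_od_matrix(trips, regions):
    """Aggregate trips into flows: rows are origins, columns are destinations."""
    index = {name: i for i, name in enumerate(regions)}
    matrix = [[0] * len(regions) for _ in regions]
    for t in trips:
        matrix[index[t["origin"]]][index[t["destination"]]] += 1
    return matrix


def dyadic_od_matrix(trips, types):
    """One row per origin>destination pair, one column per interaction type."""
    col = {name: j for j, name in enumerate(types)}
    rows = defaultdict(lambda: [0] * len(types))
    for t in trips:
        key = f'{t["origin"]}>{t["destination"]}'
        rows[key][col[t["trip_type"]]] += 1
    return dict(rows)


od = basic_od_matrix(trips, regions)
dyads = dyadic_od_matrix(trips, types)
```

In the dyadic layout each origin-destination pair becomes one record with one variable per interaction type, which is the shape a SOM would take as input.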

  12. Spatial Interaction Data (Cont.) • Exploring the Patterns of Interaction • Very necessary!!! • Existing Exploratory Data Analysis (EDA): lack of interactivity • Challenges: • a large number of interactions • wide range of interaction magnitudes • multiple semantics

  13. Spatial Interaction Data (Cont.) • Multidimensionality! [Figure: a stack of O-D matrices with axes Origin, Destination, and Interaction semantics]

  14. Spatial Interaction Data (Cont.) [Figure panels: Electronic products; Machinery; Vehicle and parts; Photographic products]

  15. Methodology • Self-Organizing Maps (SOM) • Visual Data Mining (VDM): • SOM as core DM engine • Interactivity

  16. Self-Organizing Maps • A crucial task of KDD: reduce data complexity • Data quantization: reduce the number of records, here the number of spatial interactions • Data projection: reduce the number of variables, here the number of interaction semantics • By reducing data complexity, identification of meaningful geographic structures becomes possible • Traditional multivariate statistical methods have their limitations

  17. Self-Organizing Maps (Cont.) • A special type of competitive neural network • Based on some measure of dissimilarity in the attribute space • Capable of reducing data complexity on two dimensions simultaneously • Actually an unsupervised pattern classifier [Diagram: an input layer connected to a competitive output layer, with one winning node and several losing nodes]

  18. Self-Organizing Maps (Cont.) • The best matching unit (BMU) changes its value to fit the input data; • Its neighboring nodes change their values to fit the input data as well, but the magnitude of the change decreases with distance from the BMU; • Like a flexible net; • Similar data will be located close to each other in the mapping
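The update rule described on this slide can be sketched in a few lines. This is a minimal illustration of a generic SOM, not the implementation used in the dissertation: the rectangular grid, the Gaussian neighborhood, the linear decay schedules, and the toy two-cluster data are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to x (squared Euclidean distance)."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                      # learning rate decays
            radius = max(radius0 * (1.0 - frac), 0.5)    # neighborhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                # Gaussian neighborhood: the BMU moves most, and the update
                # magnitude falls off with grid distance from the BMU.
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy data (hypothetical): two well-separated groups in a 2-D attribute space.
data_rng = random.Random(1)
data = ([[data_rng.gauss(0.1, 0.03), data_rng.gauss(0.1, 0.03)] for _ in range(30)]
        + [[data_rng.gauss(0.9, 0.03), data_rng.gauss(0.9, 0.03)] for _ in range(30)])
som = train_som(data, rows=3, cols=3)
bmu_low = best_matching_unit(som, [0.1, 0.1])
bmu_high = best_matching_unit(som, [0.9, 0.9])
```

After training, records from the two groups map to different nodes, which is the "similar data locate close to each other" behavior in a very small form.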

  19. Visual Data Mining • Framework [Diagram: Visualization Forms coupled with Interaction Forms; labels: Dynamic linking, Assignment, Focusing, Operation, Brushing, Colormap manipulation]

  20. Visualization Forms

  21. Case Studies • Airline Origin and Destination Survey Market Table (DB1BMarket): http://www.bts.org • 10% of air flight itineraries • Geographic scale: airport level → 280 metros in the contiguous US • Temporal range: 1993 to 2002 • Two case studies on DB1BMarket • Cross-sectional analysis • Temporal changes

  22. Clustering Analysis • A cluster is an area of low values (distance) surrounded by areas of high values (distance). • There are several clusters in the feature map [Figure: two feature maps with clusters labeled 1 to 9; in one map cluster 9 is subdivided into 9-1 to 9-5]
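The "low values surrounded by high values" reading matches the unified distance matrix (U-matrix) commonly drawn for SOMs, although the slide does not name the display. The sketch below assumes that interpretation, a rectangular grid with 4-connected neighbors, and a hand-made toy codebook; none of these details come from the slides.

```python
import math


def u_matrix(weights, rows, cols):
    """Average distance from each node's weight vector to its grid neighbors."""
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[(r, c)], weights[(nr, nc)]))
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u


# Toy codebook (hypothetical): the two left columns hold one prototype,
# the two right columns hold another.
rows, cols = 2, 4
weights = {(r, c): [0.0, 0.0] if c < 2 else [1.0, 1.0]
           for r in range(rows) for c in range(cols)}
u = u_matrix(weights, rows, cols)
```

In this toy map the outer columns have a U-matrix value of zero (cluster interiors), while the two middle columns form a ridge of higher values that separates the clusters.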

  23. Clustering Analysis (Cont.) A cluster is a valley in a 3-D map

  24. Cluster Analysis (Cont.) Market Share Contribution

  25. Cluster Analysis (Cont.) [Figure: clusters labeled with carrier codes AA, MQ, CO, RU, NW, XJ, WN, ZW, UA, QX, DL, HP, EV, US, and "Multiple"]

  26. Cluster Analysis (Cont.) [Figure panels: Cluster 2; Markets Represented by Cluster 2; Markets with US Airways Market Share >= 50%]

  27. Cluster Analysis: Markets From Nashville [Figure labels (carrier codes): CO, RU, WN, AA, NW, DL, UA, US, EV]

  28. Cluster Analysis: Markets From Nashville (Cont.) [Figure labels (carrier codes): CO, RU, WN, AA, NW, DL, UA, US, EV]

  29. Association Analysis Market Share Average Airfare

  30. Association Analysis (Cont.)

  31. Association Analysis (Cont.)

  32. Temporal Changes

  33. Temporal Changes (Cont.) [Figure panels: TWA 2001; AA 2001; AA 1993; AA 2002]

  34. Temporal Changes (Cont.) [Figure panels: Northwest share; Continental share]

  35. Temporal Changes: Trajectory • Market from Buffalo to DC [Figure panels: US Airways fare; US Airways share; Southwest share; each shows a trajectory through points labeled 93, 96, 98, 00, 01]

  36. Conclusions • Data-rich environment: large databases and high dimensionality • Data complexity reduction is crucial • Results suggest that SOM: • summarizes the overall data distribution well • is capable of detecting clustered structures • can be used to analyze the properties of clustered structures • can be used to study the associations among input variables

  37. Conclusions (Cont.) • Interactive visual data mining can: • examine subsets of the data more closely • study relationships among interaction types • analyze how detected clusters are distributed in actual geographic space • help us gain a better understanding of the underlying factors and spatial processes

  38. Future Research • SOM/VDM analysis • DB1BMarket • Other types of spatial interaction data • Data at elemental level • Improved VDM environment • Human subject testing • Seamlessly coupled

  39. Thank You! Questions? Comments? Contact: junyan@buffalo.edu

  40. Background (Cont.) • Geographic databases fit the profile: • massive volume: GIS, GPS, Remote Sensing … • high dimensionality • Geographic data mining (GDM) and geographic knowledge discovery (GKD)? • Current topic in GIS research

  41. Background (Cont.) • Roles of Visualization [Diagram along a time axis: a data-driven track and a model-driven track passing through exploratory analysis, knowledge construction, analysis and modeling, and evaluation of results; visualization roles shown: visual exploration & visual data mining; visual knowledge construction & refinement; visual model tracking, model steering; data presentation, visualization of uncertainty]

  42. Visualization in KDD Process • Stages: Selecting Application Domain → Selecting Target Data → Processing Data → Extracting Information/Knowledge → Interpretation and Evaluation • Visualization tasks listed between successive stages: • Understanding basic data distribution, selecting meaningful target datasets • Locating missing data, noise removing, data smoothing • Parameters setting, process tracking, process steering • Interpretation, reporting, comparison, validity checking

  43. Modeling Flows • Spatial interaction models: “Gravity Models” • Other geographic factors: • Geographic relationships among origins? • Geographic relationships among destinations? • Association among types of interaction?

  44. Modeling Flows • Spatial interaction models: “Gravity Models” • Push: origin • Pull: destination • Transportation cost: distance decay • $I_{ij} = k \, \dfrac{P_i P_j}{d_{ij}^{a}} = k \, P_i P_j \, d_{ij}^{-a}$
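A small worked example may help in reading the gravity formula. The function below evaluates the interaction I_ij from the masses P_i and P_j and the distance d_ij; the default values of k and a, and the population and distance figures, are made up for illustration.

```python
def gravity_flow(p_i, p_j, d_ij, k=1.0, a=2.0):
    """Predicted interaction I_ij = k * P_i * P_j / d_ij**a."""
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return k * p_i * p_j / d_ij ** a


# Hypothetical origin and destination masses and distances.
near = gravity_flow(1000, 2000, 10)   # 1000 * 2000 / 10**2
far = gravity_flow(1000, 2000, 20)    # same places, twice the distance
```

With a = 2, doubling the distance cuts the predicted flow to a quarter, which is the distance-decay effect named on the slide.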

  45. Spatial Interaction Data (Cont.)

  46. Spatial Interaction Data (Cont.)

  47. Limitations of Traditional Multivariate Methods • Data Projection: • Factor analysis • Projection pursuit • Multi-dimensional scaling • Data Quantization: • Partitioning methods • Hierarchical methods • First property list on the slide: • Linearity • Stationary • Normal distribution • Limited data amount • One dimension compression • Second property list on the slide: • Non-linear • Non-stationary • Distribution unknown • Sparse • Large data amount • Multi-dimensional

  48. Visualization Forms

  49. Interaction Forms

  50. Interaction Forms
