Minimum Spanning Tree Based Spatial Outlier Mining and Its Applications

Minimum Spanning Tree Based Spatial Outlier Mining and Its Applications. Jiaxiang Lin & D.Y. Ye, Key Lab of Spatial Data Mining & Information Sharing of Ministry of Education, Fuzhou University, China. May, 2008.

Presentation Transcript


  1. Minimum Spanning Tree Based Spatial Outlier Mining and Its Applications Jiaxiang Lin & D.Y. Ye Key Lab of Spatial Data Mining & Information Sharing of Ministry of Education, Fuzhou University, China May, 2008

  2. Motivation • To combine the power of automatic computation with the capabilities of human processing • Human perception offers phenomenal abilities to extract structures from pictures; • Research on visual spatial outlier mining; • Take spatial autocorrelation into consideration in SDM techniques • To improve the computational efficiency and the correctness of traditional spatial outlier mining methods • Apply spatial outlier mining techniques to real applications;

  3. Problem Statement • Input Dataset • Edaphic chemical element data inspected in the project "Ecological Geochemical Survey of Fujian Coastal Economic Belt" • Output • Spots whose chemical element values are seriously abnormal relative to their neighbors; • D-TIN, MST chart, sub-MST chart, neighborhood relationship graph between spots, and a presentation of the spatial outliers; • Constraints • The dataset should be numeric; • The algorithm targets spatial point objects; • Objective • To find sets of real spatial outliers and show the results visually.

  4. Challenges in spatial data mining • Spatial Autocorrelation • Near things are more related than far things • The assumption that values are independently and identically distributed does not hold. • Large size of spatial data sets • Continuous vs. discrete data types • Classical data consists of numbers and categories, while spatial data is more complex and includes extended objects such as points, lines and polygons; • Classical data mining works with explicit inputs, whereas spatial predicates and attributes are often implicit;

  5. Key Concepts • Outliers • An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. (Hawkins) • Observations inconsistent with the rest of the dataset (global outliers). • Spatial Outliers • A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from the values of its spatial neighborhood. (Shashi Shekhar) • Observations inconsistent with their neighborhoods. • A local instability or discontinuity. • Spatial neighborhoods may be defined using spatial attributes & spatial relations. Comparisons between spatially referenced objects can be based on non-spatial attributes.

  6. Key Concepts (contd. 1) • Minimum Spanning Tree • A spanning tree is a tree containing all vertices of the graph. • When a graph has weighted edges, such as the Euclidean distance between the two end points, the weight of a tree is the sum of the weights of the edges in the tree. • A minimum spanning tree (MST) is a spanning tree with minimum weight, out of all spanning trees. [Figure: example of a connected weighted graph on vertices A to F and its spanning tree (MST)]
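
The definition above can be illustrated with a minimal Kruskal sketch in Python. The edge list is a hypothetical six-vertex graph (the weights of the graph drawn on the slide are not recoverable from the transcript), and the function name `kruskal_mst` is introduced here only for illustration.

```python
def kruskal_mst(num_vertices, edges):
    """Return (total_weight, mst_edges) for a connected weighted graph.

    edges is a list of (weight, u, v) tuples with vertices numbered 0..n-1.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst_edges = []
    total = 0
    for weight, u, v in sorted(edges):      # lightest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # edge joins two components: keep it
            parent[root_u] = root_v
            mst_edges.append((u, v, weight))
            total += weight
    return total, mst_edges


# Hypothetical graph, NOT the one shown on the slide.
example_edges = [
    (7, 0, 1), (5, 0, 2), (9, 1, 2), (6, 1, 3),
    (1, 2, 3), (2, 2, 4), (3, 3, 5), (1, 4, 5),
]
total_weight, tree = kruskal_mst(6, example_edges)
print(total_weight, tree)
```

A spanning tree on six vertices always has five edges; Kruskal's algorithm picks the five that give the smallest total weight without closing a cycle.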

  7. Key Concepts (contd. 2) • Inconsistent Edge in Graph • An edge is said to be inconsistent if its weight, i.e., the distance between its two end nodes, is significantly larger than the average weight of nearby edges. • We can “control” the number of clusters by changing the precise definition of an inconsistent edge to be removed. [Figure: an inconsistent edge separating Cluster 1 from Cluster 2]

  8. Main Idea • Visualization and automated mining methods are applied sequentially • The result of the Delaunay triangulation construction can be used as input for the minimum spanning tree. • Combining the MST with the definition of an inconsistent edge yields a partitional clustering. • The spatial relationships implied in the D-TIN and the clustering info can be used as the groundwork for spatial outlier detection. Loose Integration (VDM)

  9. The proposed algorithm • D-TIN construction • Each point is treated as a spatial object. • A plane sweep algorithm with O(n log n) time complexity is adopted to construct the D-TIN. • MST construction based on D-TIN • Theorem: The Euclidean MST of a set of points P (in any dimension) is a subgraph of the Delaunay triangulation. • Compute the MST using Kruskal's algorithm. Restricted to the edges of the Delaunay triangulation, this can be done in O(n log n) time; • MST segmentation & clustering • Cut off several inconsistent edges of the MST • Obtain cluster info according to spatial location. • Spatial outlier mining • Make use of the spatial neighborhood relationships given by the D-TIN and the blocking role of clusters, then calculate the local instability of each target.
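
The theorem on this slide can be checked on a small example. The sketch below is not the plane sweep algorithm from the slide: it swaps in a brute-force O(n^4) Delaunay construction (keep every triangle whose circumcircle contains no other point) purely to keep the example short and dependency-free. The point set and all function names are hypothetical.

```python
from itertools import combinations
from math import dist


def circumcircle(a, b, c):
    """Return (center, squared_radius) of the circle through a, b, c, or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Brute-force Delaunay edges: edges of all triangles with an empty circumcircle."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            edges.update({(i, j), (i, k), (j, k)})   # i < j < k, so pairs are ordered
    return edges


def mst_from_edges(points, candidate_edges):
    """Kruskal's algorithm restricted to candidate_edges, weighted by Euclidean length."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = set()
    for u, v in sorted(candidate_edges, key=lambda e: dist(points[e[0]], points[e[1]])):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.add((u, v))
    return tree


def tree_weight(points, tree):
    return sum(dist(points[u], points[v]) for u, v in tree)


# Hypothetical sample points, not taken from the survey data.
points = [(0, 0), (4, 1), (1, 3), (6, 5), (3, 7), (9, 2), (5, 9)]
all_pairs = set(combinations(range(len(points)), 2))
dt = delaunay_edges(points)
mst_full = mst_from_edges(points, all_pairs)   # MST over the complete graph
mst_dt = mst_from_edges(points, dt)            # MST over Delaunay edges only
print(mst_full <= dt, len(dt), len(all_pairs))
```

Running Kruskal's algorithm on the Delaunay edges instead of all point pairs is what makes the O(n log n) bound possible, since a planar triangulation has only O(n) edges.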

  10. MST based Partitional Clustering • Basic Steps: • Construct the minimum spanning tree (MST) for the data; • Identify “inconsistent” edges in the MST; • Remove the inconsistent edges and consider each of the connected components as a cluster; • Notes: • During clustering based on minimum spanning trees, we associate: • each sample (record, spatial object) with a node in a graph; • the Euclidean distance between each pair of samples with the weight of the edge connecting these two samples; • “Inconsistent” edge • An edge whose weight (the distance between its two end nodes) is significantly larger than the average weight of the edges near its two end nodes. • The number of “inconsistent” edges is controlled by a threshold factor: an edge is “inconsistent” when it is at least that many times larger than its nearby edges.
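
The three basic steps can be sketched as follows, taking an already built MST as input. Two assumptions are made here that the transcript does not confirm: "nearby edges" is read as the other MST edges that share an end node with the edge under test, and the threshold factor of 2.0 is a placeholder, since the value used in the slides did not survive. The function name `mst_clusters` is hypothetical.

```python
from collections import defaultdict


def mst_clusters(num_nodes, mst_edges, factor=2.0):
    """Cut inconsistent MST edges and return clusters as sorted lists of node ids.

    mst_edges: list of (u, v, weight). factor is an assumed threshold value.
    """
    # Index the MST edges incident to each node.
    incident = defaultdict(list)
    for idx, (u, v, _) in enumerate(mst_edges):
        incident[u].append(idx)
        incident[v].append(idx)

    # Keep an edge unless it is much longer than the edges around its end nodes.
    kept = []
    for idx, (u, v, weight) in enumerate(mst_edges):
        nearby = [mst_edges[i][2] for i in incident[u] + incident[v] if i != idx]
        if nearby and weight > factor * (sum(nearby) / len(nearby)):
            continue                      # inconsistent edge: remove it
        kept.append((u, v))

    # Each connected component of the remaining forest is one cluster.
    adjacency = defaultdict(list)
    for u, v in kept:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, clusters = set(), []
    for start in range(num_nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical MST: a chain of six nodes with one long edge in the middle.
chain = [(0, 1, 1.0), (1, 2, 1.2), (2, 3, 6.0), (3, 4, 1.1), (4, 5, 0.9)]
clusters = mst_clusters(6, chain, factor=2.0)
print(clusters)
```

Raising `factor` removes fewer edges and yields fewer, larger clusters; lowering it has the opposite effect, which is how the threshold controls the number of clusters.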

  11. Spatial Outliers Detection • Basic Steps of the Method: • For each object xi, • find the set NNk(xi) of its k(xi) nearest neighbors, according to the spatial neighbourhoods defined by the Delaunay TIN and the partitional clustering info of the multi-MST; • calculate the weighted average g(xi) of all of xi’s neighbors, where g is the neighborhood function; • compute the comparison function hi = h(xi) = f(xi) / g(xi); • Let ha or ha^(-1) denote the maximum value of the comparison. For a given threshold §, if ha ≥ § or ha^(-1) ≥ §, then treat xa as a candidate s-outlier; • Remove xa and repeat the former step to find the next candidate s-outlier, until the threshold condition is not met or the number of objects in the cluster equals 1; • Notes: • The threshold § is set to a different value in different applications. In this project § = 1.5, which means that if an object is 1.5 times larger or smaller than its neighbors, it will be treated as a candidate s-outlier; • An isolated object is directly treated as a candidate s-outlier; • Whether a candidate s-outlier is a true one or not should be further confirmed by domain experts;
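
A minimal sketch of this iterative test for a single cluster is given below. It rests on several assumptions: f(xi) is taken to be the object's (positive) attribute value, g(xi) is an unweighted mean of the remaining neighbors because the slide does not give the weights, and the neighbor sets are supplied directly rather than derived from a D-TIN. The object ids, values and the function name `candidate_outliers` are hypothetical.

```python
def candidate_outliers(values, neighbors, threshold=1.5):
    """Return candidate s-outliers of one cluster, in the order they are found.

    values:    {object_id: positive attribute value}, playing the role of f(x).
    neighbors: {object_id: set of neighboring object ids within the cluster}.
    """
    active = set(values)
    # Isolated objects are treated as candidates directly.
    candidates = [obj for obj in sorted(active) if not neighbors.get(obj)]
    active -= set(candidates)

    while len(active) > 1:
        best_obj, best_score = None, 0.0
        for obj in sorted(active):
            remaining = [n for n in neighbors.get(obj, ()) if n in active]
            if not remaining:
                continue                       # nothing left to compare against
            g = sum(values[n] for n in remaining) / len(remaining)
            h = values[obj] / g                # comparison function h = f / g
            score = max(h, 1.0 / h)            # "larger or smaller" than neighbors
            if score > best_score:
                best_obj, best_score = obj, score
        if best_obj is None or best_score < threshold:
            break                              # threshold condition no longer met
        candidates.append(best_obj)
        active.remove(best_obj)                # remove it and test the rest again
    return candidates


# Hypothetical cluster: s4 is about three times larger than its neighbors.
values = {"s1": 10.0, "s2": 11.0, "s3": 9.5, "s4": 30.0, "s5": 10.5}
neighbors = {
    "s1": {"s2", "s3", "s4"},
    "s2": {"s1", "s4", "s5"},
    "s3": {"s1", "s4"},
    "s4": {"s1", "s2", "s3", "s5"},
    "s5": {"s2", "s4"},
}
found = candidate_outliers(values, neighbors, threshold=1.5)
print(found)
```

In this example every neighbor of s4 initially scores above 1.5 as well, only because s4 inflates their neighborhood averages. Removing the strongest candidate before re-testing is what keeps those normal objects from being flagged.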

  12. Real Application • Dataset of edaphic chemical elements • Collected from both the shallow soil and the deep soil; • inspected in the project “Ecological Geochemical Survey of Fujian Coastal Economic Belt”; • 251 records & 54 attributes in total • 3 spatial attributes: longitude, latitude and inspection time (not included in the paper); • 51 thematic attributes: As, Ag, Al2O3, Au, B, Ba, Be, Bi, Br, CaO, Ce, Cl, Co, Cr, Cu, F, Fe2O3, Ga, Ge, Hg, I, K2O, La, Li, MgO, Mn, Mo, Na2O, Nb, Ni, N, P, Pb, Ph, Rb, S, Sb, Sc, Se, SiO2, Sn, Sr, Th, Ti, U, V, W, Zn, Zr, Organic Carbon, sum of carbon;

  13. Preliminary work [Figure panels: Plane Sweep Line Algorithm; Inconsistent Threshold and inconsistent edges; MST segmentation, Partitional Clustering]

  14. Candidate Spatial Outliers [Figure panels: iterative test of spatial objects to get candidate s-outliers; outlier score of each candidate s-outlier; candidate s-outliers displayed in a chart]

  15. Further Examination [Figure panels: s-outliers displayed in ESRI ArcMap; real s-outliers]

  16. Analysis • In this project, a spatial outlier is a spot that has an extreme value of chemical elements compared to its neighboring spots • Through comparison between the inspected values and the average, objects p16, p49, p75, p133, p148 and p227 are identified as real outliers, • because some of the harmful chemical elements at these spots are seriously abnormal. • Abnormality of harmful edaphic chemical elements may derive from: • external factors during industrial development, such as the piling of garbage; • internal factors during soil formation, like the enrichment of chemical elements.

  17. Conclusion • Spatial outliers are defined as local instabilities; • The spatial structure characteristics of spatial objects are well maintained; • Spatial neighborhoods are obtained by means of the Delaunay triangulation net and the partition info of the multi-MST • With suitably modified thresholds for inconsistent edges and abnormality, MST based algorithms can be used to effectively detect spatial outliers in different kinds of spatial data. • If only little is known about the data, we can make good use of human abilities • The user is expected to be directly involved in the process of outlier detection;

  18. Thank you !!!
