
Modeling & Analyzing Massive Terrain Data Sets

Modeling & Analyzing Massive Terrain Data Sets. Pankaj K. Agarwal, Duke University. STREAM Project: Scalable Techniques for hi-Resolution Elevation data Analysis & Modeling. In collaboration with: Lars Arge, Helena Mitasova. Students: Andrew Danner, Thomas Molhave, Ke Yi.


Presentation Transcript


  1. Modeling & Analyzing Massive Terrain Data Sets Pankaj K. Agarwal Duke University

  2. STREAM Project: Scalable Techniques for hi-Resolution Elevation data Analysis & Modeling • In collaboration with: Lars Arge, Helena Mitasova • Students: Andrew Danner, Thomas Molhave, Ke Yi

  3. Diverse elevation point data: density, distribution, accuracy • Photogrammetry: 0.76m vertical accuracy (5ft contours); 1974, 1995, 1998 • Lidar: 0.15m vertical accuracy; altitude 700m and 2300m • RTK-GPS: 0.10m vertical accuracy; 1999, 2001, 2004

  4. Constructing Digital Elevation Models Grid, TIN, Contour DEMs

  5. Natural feature extraction • Maps of topo parameters: computed simultaneously with interpolation using partial derivatives of the RST function • Terrain features: combined thresholds of topo parameters • Extracted foredunes 1999-2004 using profile curvature and elevation threshold

  6. Tracking evolving features • Slip faces: slope > 30 deg • Dune crests: profile curvature threshold • (figures: 1995, 1999, 2001; new slip face in 1999) • How to automatically track features that change some of their properties over time?

  7. Hydrology Analysis • Flow Analysis • Watershed hierarchy

  8. Challenge: Massive Data Sets • LIDAR • NC Coastline: 200 million points – over 7 GB • Neuse River basin (NC): 500 million points – over 17 GB • Output is also large • 10ft grid: 10GB • 5ft grid: 40GB

  9. Approximation Algorithms • Exact computation is expensive: many practical problems are intractable; multiple and often conflicting optimization criteria; it suffices to find a near-optimal solution • Tunable approximation algorithms • Tradeoff between accuracy and efficiency • User specifies tolerance

  10. I/O-Bottleneck (figure: disk with read/write head, read/write arm, track, magnetic surface) • Data resides in secondary memory • Disk access is 10^6 times slower than main memory access • Maximize useful data transferred with each access • Amortize large access time by transferring large contiguous blocks of data • I/O – not CPU – is often the bottleneck for processing massive data sets

  11. I/O-efficient Algorithms [AV88] (figure: CPU, internal memory of size M, disk; data moves in blocks of size B, B ~ 2K-8K) • Traditional algorithms optimize CPU computation • Not sensitive to penalty of disk access • I/O model • Memory is finite • Data is transferred in blocks • Complexity measured in disk blocks transferred

  12. Elevation Data to Watershed Hierarchies: A Pipeline Approach

  13. Point Cloud to Grid DEM • Given a point cloud sampled from a bivariate function z=F(x,y) (elevation data) • Goal: construct a grid DEM for z • DEM Applications • Hydrology • Ecology • Viewsheds … • Many algorithms only work on grid data

  14. Sub-meter resolution: improved structure representation (images: 1999 lidar 2m resolution DSM; 2004 lidar 0.5m resolution binning and interpolated DSMs; 2004 lidar-based 30cm resolution DSM) • Binning (within-cell average) is sufficient for ~1-3m DEMs, but interpolation produces improved geometry representation and allows us to create higher resolution DEMs/DSMs

  15. General Approach: Interpolation • Kriging, IDW, Splines, … • Interpolating N points to fit z=F(x,y) takes O(N^3) time • Modern data sets can have millions of points • Direct interpolation does not scale

  16. Improving Interpolation with Quad Trees • Segment initial point set using a quad tree • Each quad tree leaf has < K points (10-100) • Interpolate each segment separately • N/K segments • O(K^3) interpolation per segment • O(K^2 N) total running time • Linear in N • Boundary smoothness? • Use points from nearby segments
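The segmentation step above can be sketched in a few lines. This is an illustrative in-memory version, not the STREAM project's actual (I/O-efficient) code: split any cell holding more than K points into four quadrants, so each leaf can be interpolated independently in O(K^3) time. All names (`build_quadtree`, `leaves`) are hypothetical.

```python
K = 4  # max points per leaf (the slides suggest 10-100 in practice)

def build_quadtree(points, x0, y0, x1, y1):
    """Return a nested dict: leaves hold point lists, internal nodes hold children."""
    if len(points) <= K:
        return {"leaf": True, "points": points, "bbox": (x0, y0, x1, y1)}
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quads = [[], [], [], []]
    for p in points:
        i = (p[0] >= xm) + 2 * (p[1] >= ym)   # quadrant index 0..3
        quads[i].append(p)
    bboxes = [(x0, y0, xm, ym), (xm, y0, x1, ym),
              (x0, ym, xm, y1), (xm, ym, x1, y1)]
    return {"leaf": False,
            "children": [build_quadtree(q, *b) for q, b in zip(quads, bboxes)]}

def leaves(node):
    if node["leaf"]:
        yield node
    else:
        for c in node["children"]:
            yield from leaves(c)

# Each leaf's points would then be passed (together with points from
# neighboring segments, for boundary smoothness) to the interpolator.
pts = [(x / 10.0, y / 10.0, 0.0) for x in range(10) for y in range(10)]
tree = build_quadtree(pts, 0.0, 0.0, 1.0, 1.0)
assert all(len(l["points"]) <= K for l in leaves(tree))
assert sum(len(l["points"]) for l in leaves(tree)) == len(pts)
```

With N/K leaves of at most K points each, the total interpolation cost drops from O(N^3) to O(K^2 N), linear in N as the slide states.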

  17. Overview of the Algorithm • Construct quad-tree on disk using bulk loading [AAPV 01] • For each quad-tree node, find points in neighboring segments • Arrange points on disk for sequential interpolation • Use the algorithm by Mitasova, Mitas, Harmon • Can use any other interpolation method

  18. Building an External Quad Tree (figure: block I/O, K=2) • Build top few levels in memory • Buffer points in lower levels • Write buffers to disk when full

  19. Building a TIN • TIN: Triangulated Irregular Network • Constrained Delaunay Triangulation • Developed an I/O-efficient algorithm • Builds TIN for Neuse River Basin (14GB) in 7.5 hours

  20. Future Work: Hierarchical DEMs • Considerable work in graphics and GIS communities on multiresolution hierarchies for terrains • Focuses on visualization • Most algorithms perform random memory access • Out-of-core simplification • I/O-efficient algorithms for terrain simplification (edge contraction method) • Feature preserving simplification • Preserves main river networks • Preserves visibility for most of the points

  21. Coping with Noisy Data • Nearly all flow models assume [O’Callaghan and Mark 84, de Berg et al. 96] • water flows downwards • stops at a minimum

  22. Denoising via Flooding[Jenson and Domingue 88, Arge et al. 03] • Pour water uniformly on terrain • Water level rises until steady-state
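The "pour water until steady state" idea can be sketched as priority-flood depression filling, in the spirit of Jenson and Domingue: grow inward from the grid boundary, always expanding from the lowest cell seen so far, and raise any lower neighbor to that level. This is a minimal internal-memory illustration; the slides' contribution is doing this I/O-efficiently on grids far larger than memory.

```python
import heapq

def flood(grid):
    """Fill depressions: every cell is raised to its spill elevation."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    pq = []
    # Water can exit only at the grid boundary, so seed the queue with it.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(pq, (out[r][c], r, c))
                seen[r][c] = True
    while pq:
        h, r, c = heapq.heappop(pq)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                seen[nr][nc] = True
                out[nr][nc] = max(out[nr][nc], h)  # water fills the pit
                heapq.heappush(pq, (out[nr][nc], nr, nc))
    return out

# Spurious pit (elevation 1) behind a rim, with an outlet (3) on the edge;
# the pit is raised to its spill level 4:
dem = [[5, 5, 5, 5],
       [5, 1, 4, 3],
       [5, 4, 4, 5],
       [5, 5, 5, 5]]
assert flood(dem)[1][1] == 4
```

After flooding, every cell has a strictly descending path to the boundary, so the flow models of the previous slide apply.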

  23. River Network After Flooding (images: river network before and after flooding) • Flooding takes Sort(N) I/Os

  24. Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03] • Pour water uniformly on terrain • Water level rises until steady-state • Use topological persistence (Edelsbrunner et al.) to measure the importance of a minimum • Flood low-persistence minima, preserve high-persistence minima

  25. Topological Persistence [Edelsbrunner et al. 02]
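As a concrete illustration of the persistence-based importance measure, here is 0-dimensional persistence on a 1-D elevation profile, computed with the standard union-find sweep: process samples bottom-up; when two components meet at a saddle, the minimum born later "dies", with persistence equal to the saddle height minus its depth. This sketch is mine, not the paper's algorithm.

```python
def persistence_1d(h):
    """Return (birth, death) pairs of non-global minima of the profile h."""
    order = sorted(range(len(h)), key=lambda i: h[i])
    parent = {}
    lowest = {}          # component representative -> index of its minimum
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:                       # sweep by increasing elevation
        parent[i] = i
        lowest[i] = i
        for j in (i - 1, i + 1):
            if j in parent:               # neighbor already above water
                ri, rj = find(i), find(j)
                if ri != rj:
                    # the component with the higher minimum dies at h[i]
                    keep, die = (ri, rj) if h[lowest[ri]] <= h[lowest[rj]] else (rj, ri)
                    pairs.append((h[lowest[die]], h[i]))   # (birth, death)
                    parent[die] = keep
    return [p for p in pairs if p[0] < p[1]]   # drop trivial pairs

# Deep minimum (0) and a shallow one (2) behind a saddle (3):
profile = [5, 0, 3, 2, 6]
assert persistence_1d(profile) == [(2, 3)]   # shallow pit persists only 3-2=1
```

A flooding threshold on persistence would remove the pit at 2 (persistence 1) while preserving the global minimum at 0, which never dies.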

  26. The Union-Find Problem • A universe of N elements: x1, x2, …, xN • Initially N singleton sets: {x1}, {x2 }, …, {xN} • Each set has a representative • Maintain the partition under • Union(xi, xj) : Joins the sets containing xi and xj • Find(xi) : Returns the representative of the set containing xi

  27. Formulated as Batched Union-Find • On a terrain represented as a TIN, when we reach • A minimum or maximum: do nothing • A regular point u: issue union(u,v) for a lower neighbor v • A saddle u: let v and w be nodes from u's two connected pieces in its lower link; issue find(v), find(w), union(u,v), union(u,w) • Sequence of union/find operations can be computed in advance

  28. Classical Union-Find Algorithm (figure: a forest of representatives; Union(d, h) links trees by rank, Find(n) compresses the path from n to its root)
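The classical algorithm sketched on this slide, with link-by-rank and path compression, looks like this in a textbook in-memory form. It is the baseline the I/O-efficient algorithm improves on: the pointer chasing inside `find` is exactly what causes a random disk access per step on disk-resident data.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:       # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:   # link-by-rank: shorter under taller
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

uf = UnionFind(6)
uf.union(0, 1); uf.union(2, 3); uf.union(1, 2)
assert uf.find(0) == uf.find(3)
assert uf.find(4) != uf.find(5)
```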

  29. Complexity • O(N α(N)) for a sequence of N union and find operations [Tarjan 75] • α(•): inverse Ackermann function (grows very slowly!) • Optimal in the worst case [Tarjan 79, Fredman and Saks 89] • Batched (off-line) version • Entire sequence known in advance • Can be improved to linear on a RAM [Gabow and Tarjan 85] • Not possible on a pointer machine [Tarjan 79]

  30. Our Results • An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os • Same as sorting • Optimal in the worst case • A simpler algorithm using O(sort(N) log(N/M)) I/Os • Implemented • Apply to persistence computation, flooding • O(sort(N)) I/Os • Other applications

  31. I/O-Efficient Batched Union-Find • Two-stage algorithm • Convert to interval union-find • Compute an order on the elements s.t. each union joins two adjacent sets • Solve batched interval union-find • Each stage formulated as a geometric problem and solved using sort(N) I/Os

  32. Experimental Results Traditional algorithm good as long as data fits in memory

  33. Experimental Results: Neuse River Basin • 0.5 billion points (14GB raw data), sampled to obtain smaller data sets • On the entire data set: the external-memory (EM) algorithm takes 5 hours, the internal-memory (IM) algorithm fails

  34. Contour Trees

  35. Previous Results • Directly maintain contours • O(N log N) time [van Kreveld et al. 97] • Needs union-split-find for circular lists • Does not extend to higher dimensions • Two sweeps by maintaining components, then merge • O(N log N) time [Carr et al. 03] • Extends to arbitrary dimensions

  36. Join Tree and Split Tree [Carr et al. 03] (figure: a terrain graph with elevation labels 1-9, its join tree and split tree, and their qualified nodes)

  37. Final Contour Tree (figure: join tree and split tree with labels 1-9 merged into the contour tree) • Merging the two trees is hard to BATCH!

  38. Another Characterization (figure: join tree, split tree, and contour tree with nodes u, v, w) • Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge • Now we can BATCH!

  39. Experimental Results On the Neuse River Basin data set

  40. Coping with Noise: Future Work • Constructing hydrologically conditioned DEMs • Formulate as an optimization problem • Using persistent homology • Integrating topology and geometry • Handling flat areas • Scalability remains a big issue • Handling anthropogenic features

  41. Back to Pipeline: Flow Modeling

  42. From Elevation to River Networks (figure: grid cells with elevations 110, 100, 95, 90, 85, 80 draining to the sea) • Where does water go? • From higher elevation to lower elevation • Single flow directions form a tree
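Single-flow-direction routing in the D8 style can be sketched as follows: each cell drains to its lowest strictly lower neighbor, so the directed edges form a forest rooted at local minima (the sea). This is an illustrative minimal version, not the slides' grid implementation; the elevation values loosely follow the figure.

```python
def flow_directions(grid):
    """Map each cell to the neighbor it drains to (None at a minimum)."""
    rows, cols = len(grid), len(grid[0])
    flow = {}
    for r in range(rows):
        for c in range(cols):
            best = None
            for dr in (-1, 0, 1):       # examine all 8 neighbors
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        if grid[nr][nc] < grid[r][c] and \
                           (best is None or grid[nr][nc] < grid[best[0]][best[1]]):
                            best = (nr, nc)
            flow[(r, c)] = best
    return flow

dem = [[110, 100, 95],
       [100,  90, 85],
       [ 95,  85, 80]]
f = flow_directions(dem)
assert f[(0, 0)] == (1, 1)   # 110 drains to its lowest neighbor, 90
assert f[(2, 2)] is None     # 80 is the local minimum (the "sea")
```

Because every edge goes strictly downhill, following `flow` from any cell always terminates at a minimum, which is why the directions form a tree per outlet.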

  43. Drainage Area (figure: flow tree with per-node drainage areas, e.g. 1, 2, 3, 5, 7, 9, 11, 14, 17, 25) • How much area is upstream of each node? • Each node has initial drainage area (1) • Drainage area of internal nodes depends on drainage area of children
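The accumulation rule above is a single bottom-up pass over the flow tree: every node starts with area 1, and an internal node adds the areas of the cells draining into it. A minimal sketch, with the flow tree as a hypothetical child-list dict:

```python
def drainage_area(children, root):
    """Upstream area of every node: 1 plus the areas of its children."""
    area = {}
    stack = [(root, False)]
    while stack:                 # iterative post-order (no recursion limit)
        node, done = stack.pop()
        if done:
            area[node] = 1 + sum(area[c] for c in children.get(node, []))
        else:
            stack.append((node, True))
            for c in children.get(node, []):
                stack.append((c, False))
    return area

# Tiny flow tree: a and b drain into c; c and d drain into the outlet.
children = {"outlet": ["c", "d"], "c": ["a", "b"]}
area = drainage_area(children, "outlet")
assert area == {"a": 1, "b": 1, "c": 3, "d": 1, "outlet": 5}
```

The I/O-efficient challenge is performing this pass when the tree does not fit in memory; the in-memory logic itself is this simple.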

  44. Drainage Area example (Panama)

  45. IFSARE- and SRTM-based stream analysis • Stream and watershed boundary extraction for Panama for on-going water quality research, to provide more detailed and up-to-date data • SRTM: 7400x3600 DSM at 90m resolution for all of Panama • IFSARE: 10800x11300 DEM at 10m resolution for the Panama canal section • Using r.terraflow (officially released in GRASS 6.2 in 2006), streams can be extracted in 3-4 hours; a tile-based approach takes days • (image: SRTM DEM for all of Panama with main rivers extracted from the DEM in 3 hr)

  46. Hierarchical Watershed Decomposition

  47. Watershed Hierarchies • Decompose a river network into a hierarchy of hydrological units (HUs) • All water in an HU flows to a common outlet • Hierarchy provides tunable level of detail • Method used: Pfafstetter [VV99] • Want a solution scalable to large modern hi-res terrains

  48. Pfafstetter (figure: river network with drainage areas and first-level labels 1-9) • Find main river • Find four largest tributaries • Label basins/interbasins • Recurse until single path
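One level of the Pfafstetter labeling can be sketched like this: along the main stem, the four largest tributaries (by drainage area) receive the even digits 2, 4, 6, 8 from downstream to upstream, and the interbasin stretches between them receive the odd digits 1, 3, 5, 7, 9. The tributary representation here, (position-along-main-stem, drainage-area) pairs, is a hypothetical simplification of the real river-network input.

```python
def pfafstetter_level(tributaries):
    """Assign one level of Pfafstetter digits to (position, area) tributaries."""
    # Four largest tributaries by area, then ordered downstream-to-upstream:
    largest = sorted(sorted(tributaries, key=lambda t: -t[1])[:4])
    basins = {t: 2 * (i + 1) for i, t in enumerate(largest)}      # 2,4,6,8
    interbasins = [2 * i + 1 for i in range(len(largest) + 1)]    # 1,3,5,7,9
    return basins, interbasins

# Six tributaries; the four with the largest areas get even labels:
tribs = [(1, 50), (2, 5), (3, 40), (4, 30), (5, 8), (6, 20)]
basins, inter = pfafstetter_level(tribs)
assert basins == {(1, 50): 2, (3, 40): 4, (4, 30): 6, (6, 20): 8}
assert inter == [1, 3, 5, 7, 9]
```

The "recurse" step of the slide would then repeat this inside each basin and interbasin, appending a digit per level, until a basin has a single river path.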

  49. Recurse (figure: second-level Pfafstetter labels, e.g. 21-27 inside basin 2 and 71-75 inside basin 7)

  50. Example Watershed Boundaries (figure: nested Pfafstetter labels, e.g. 91-98 and 991-999 inside basin 9, on the watershed boundary map)
