experiences on processing spatial data with mapreduce l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Experiences on Processing Spatial Data with MapReduce PowerPoint Presentation
Download Presentation
Experiences on Processing Spatial Data with MapReduce

Loading in 2 Seconds...

play fullscreen
1 / 22

Experiences on Processing Spatial Data with MapReduce - PowerPoint PPT Presentation


  • 164 Views
  • Uploaded on

Experiences on Processing Spatial Data with MapReduce. Ariel Cary, Zhengguo Sun, Vagelis Hristidis , Naphtali Rishe Florida International University School of Computing and Information Sciences 11200 SW 8 th St, Miami, FL 33199 {acary001,sunz,vagelis,rishen}@ cis.fiu.edu

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Experiences on Processing Spatial Data with MapReduce' - etana


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
experiences on processing spatial data with mapreduce

Experiences on Processing Spatial Data with MapReduce

Ariel Cary, Zhengguo Sun, VagelisHristidis, Naphtali Rishe

Florida International University

School of Computing and Information Sciences

11200 SW 8th St, Miami, FL 33199

{acary001,sunz,vagelis,rishen}@cis.fiu.edu

Sponsored by: NSF Cluster Exploratory (CluE)

agenda
Agenda
  • Introduction
  • Solving Spatial Problems in MapReduce
    • R-Tree Index Construction
    • Aerial Image Processing
  • Experimental Results
  • Related Work
  • Conclusion

Florida International University

introduction
Introduction
  • Spatial databases mainly store:
    • Raster data (satellite/aerial digital images), and
    • Vector data (points, lines, polygons).
  • Traditional sequential computing models may take excessive time to process large and complex spatial repositories.
  • Emerging parallel computing models, such as MapReduce, provide a potential for scaling data processing in spatial applications.

Florida International University

introduction cont
Introduction (cont.)
  • MapReduce is an emerging massively parallel computing model (Google) composed of two functions:
    • Map: takes a key/value pair, executes some computation, and emits a set of intermediate key/value pairs as output.
    • Reduce: merges its intermediate values, executes some computation on them, and emits the final output.
  • In this work, we present our experiences in applying the MapReduce model to:
    • Bulk-construct R-Trees (vector) and
    • Compute aerial image quality (raster)

Florida International University

introduction cont5
Introduction (cont.)
  • Apache's Hadoop
  • Linux operating system
  • XEN hypervisor
  • HadoopDistributed File System (HDFS)
  • SOCKS proxy server
  • 480 computers (nodes), each half terabytes storage

Florida International University

2 solving spatial problems in mapreduce

2. Solving Spatial Problems in MapReduce

R-Tree Index Construction

Aerial Image Processing

mapreduce mr r tree construction
MapReduce (MR) R-Tree Construction
  • R-Tree Bulk-Construction
    • Every object oin database D has two attributes:
      • o.id - the object’s unique identifier.
      • o.P - the object’s location in some spatial domain.
    • The goal is to build an R-Tree index on D.
  • MapReduce Algorithm
    • Database partitioning function computation (MR).
    • A small R-Tree is created for each partition (MR).
    • The small R-Trees are merged into the final R-Tree.

Florida International University

phase 1 partitioning function
Phase 1 – Partitioning Function
  • Goal: compute f to assign objects of D into one of R possible partitions s.t.:
    • R (ideally) equally-sized partitions are generated (minimal variance is acceptable).
    • Objects close in the spatial domain are placed within the same partition.
  • Proposed solution:
    • Use Z-order space-filling curve to map spatial coordinate samples (3%) into an sorted sequence.
    • Collect splitting points that partition the sequence in R ranges.
  • Where:
    • o is an spatial object in the database.
    • C which is a constant that helps in sending Mappers' outputs to a single Reducer.
    • U is a space-filling curve, e.g. Z-order value.
    • S' is an array containing R-1 splitting points.

Florida International University

phase 2 r tree construction in mr
Phase 2 - R-Tree Construction in MR
  • Mappers compute f() values for objects.
  • Reducers compute an R-Tree for each group of objects with identical f() value
  • Where:
    • o is an spatial object in the database.
    • f is the partitioning function computed in phase 1.
    • Tree.root is the R-Tree root node.

Florida International University

phase 3 r tree consolidation
Phase 3 - R-Tree Consolidation
  • sequential process

Florida International University

image processing in mapreduce
Image Processing in MapReduce
  • Aerial Image Quality Computation
    • Let d be a orthorectified aerial photography (DOQQ) file and t be a tile inside d, d.name is d’s file name and t.q is the quality information of tile t.
    • The goal is to compute a quality bitmap for d.
  • MapReduce Algorithm
    • A customized InputFormatter partitions each DOQQ file d into several splits containing multiple tiles.
    • The Mappers compute the quality bitmap for each tile inside a split.
    • The Reducers merge all the bitmaps that belongs to a file d and write them to an output file.

Florida International University

image processing in mapreduce12
Image Processing in MapReduce
  • MapReduce Algorithm
  • Where:
    • d is a DOQQ file.
    • t is a tile in d.
    • t.q is the quality bitmap of t.

Florida International University

experimental results setting
Experimental Results: Setting
  • Data Set
  • Environment
    • The cluster was provided by the Google and IBM Academic Cluster Computing Initiative.
    • The cluster contains around 480 computers running Hadoop- open source MapReduce.

Florida International University

experimental results r tree
Experimental Results: R-Tree
  • R-Tree Construction Performance Metrics

Florida International University

experimental results r tree16
Experimental Results: R-Tree
  • MapReduce R-Trees vs. Single Process (SP)

Florida International University

experimental results imagery
Experimental Results: Imagery
  • Tile Quality Computation

Florida International University

related work
Related Work
  • Previous works on R-Tree parallel construction faced intrinsic distributed computing problems: load balancing, process scheduling, fault tolerance, etc.
  • Schnitzer and Leutenegger [16] proposed a Master-Client R-Tree, where the data set is first partitioned using Hilbert packing sort algorithm, then the partitions are declustered into a number of processors, where individual trees are built. At the end, a master process combines the individual trees into the final R-Tree.
  • Papadopoulos and Manolopoulos [17] proposed a methodology for sampling-based space partitionining, load balancing, and partition assignment into a set of processors in parallely building R-Trees.

Florida International University

conclusion
Conclusion
  • We used the MapReduce model to solve two spatial problems on a Google&IBM cluster:
    • (a) Bulk-construction of R-Trees and
    • (b) Aerial image quality computation
  • MapReduce can dramatically improve task completion times. Our experiments show close to linear scalability.
  • Our experience in this work shows MapReduce has the potential to be applicable to more complex spatial problems.

Florida International University

references
References

[1] Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD 1984:47-57.

[2] NSF Cluster Exploratory Program: http://www.nsf.gov/pubs/2008/nsf08560/nsf08560.htm

[3] Google&IBM Academic Cluster Computing Initiative:

http://www.google.com/intl/en/press/pressrel/20071008_ibm_univ.html

[4] Apache Hadoop project: http://hadoop.apache.org

[6] Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation, USENIX Association, Volume 6, pp. 10-10, December 2004.

[12] High Performance Database Research Center (HPDRC), Research Division of the Florida International University, School of Computing and Information Sciences, University Park, Telephone: (305) 348-1706, FIU ECS-243, Miami, FL 33199.

[16] Schnitzer B., Leutenegger S.T.: Master-client R-trees: a new parallel R-tree architecture, In Proceedings of the 11th International Conference on Scientific and Statistical Database Management, pp. 68-77, August 1999.

[17] Apostolos Papadopoulos, Yannis Manolopoulos: Parallel bulk-loading of spatial data, Parallel Computing, Volume 29, Issue 10, pp. 1419 - 1444, October 2003.

Florida International University