1 / 21

PGAS 2006 Vijay Gandhi, Mete Celik, Shashi Shekhar

Parallelizing Spatial Data Mining Algorithms: A case study with Multiscale and Multigranular Classification. PGAS 2006 Vijay Gandhi, Mete Celik, Shashi Shekhar Army High Performance Computing and Research Center (AHPCRC) University of Minnesota . Overview. PGAS Relevance, Application Domain

minya
Download Presentation

PGAS 2006 Vijay Gandhi, Mete Celik, Shashi Shekhar

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallelizing Spatial Data Mining Algorithms:A case study with Multiscale and Multigranular Classification PGAS 2006 Vijay Gandhi, Mete Celik, Shashi Shekhar Army High Performance Computing and Research Center (AHPCRC) University of Minnesota

  2. Overview • PGAS Relevance, Application Domain • Problem Definition • Approach • Experimental Results • Conclusion & Future Work

  3. PGAS Relevance • How effective is UPC in parallelizing spatial applications? • How effective is UPC in improving productivity of researchers in spatial domain?

  4. Spatial Applications: An Example • Multiscale Multigranular Image Classification Output Images at Multiple Scales Input

  5. MSMG Classification - Formulation • Model • = observations • = a classification model • = log-likelihood (Quality Measure) of M • = Penalty function • Calculation of log-likelihood of • Uses Expectation Maximization • Computationally Expensive • 7 hours of Computation time for an input image of size 512 x 512 pixels with 4 Classes

  6. Spatial Application: Multiscale Multigranular Image Classification • Applications • Land-cover change Analysis • Environmental Assessment • Agricultural Monitoring • Challenges • Expensive computation of Quality Measure i.e. likelihood • Large amount of data • Many dimensions

  7. Pseudo-code : Serial Version • 1.Initialize parameters. • 2.for each Class • 3.for each Spatial Scale • 4.for each Quad • 5. Calculate Quality Measure • 8.end for Quad • 9. end for Spatial Scale • 10.end for Class • 11.Post-processing Q? What are the options for parallelization?

  8. Parallelization – Problem Definition • Given • Serial version of a Spatial Data Mining Algorithm • Likelihood of each specific class at each pixel • Class-hierarchy • Maximum Spatial Scale • Find • Parallel formulation of the algorithm • Objective • Scalability e.g. Isoefficiency • Constraints • Parallel Platform “UPC”

  9. Challenges in Parallelization Description of work • Compute Quality Measure for combinations of Class-label, Scale, Quad (Spatial Unit) Challenges • Variable workload across computations of quality measure • Many dimensions to parallelize i.e. Class-label, Scale, Quad • Dependency across scales

  10. Class-level Parallelization • 1.    Initialize parameters and memory • 2.    upc_forall Class • 3.    for each Spatial Scale • 4.    for each Quad • 5.    Calculate Quality Measure • 8.    end for Quad • 9.    end for Spatial Scale • 10. end upc_forall Class • 11. Post-processing

  11. Class-level Parallelization • Disadvantages: • Workload distribution is uneven (Cost of Quality measure changes with Class) • Number of parallel processors is restricted to number of classes Examples

  12. Quad-level Parallelization • 1.    Initialize parameters and memory • 2.    for each Spatial Scale • 3.upc_forall Quad • 4.    for each Class • 5.    Calculate Quality Measure • 6 end for Class • 7.   end upc_forall Quad • 8. upc_barrier • 9.    end for Spatial Scale • 11. Post-processing

  13. Quad-level Parallelization • Advantages: • Workload distribution is more even • Greater number of processors can be used • Number of Quads = f (Number of pixels) • Example • Input: 4 Classes, Scale of 6

  14. Experimental Design

  15. Workload Scale: 64 x 64 Scale: 2 x 2 Input class hierarchy Output Images at Multiple scales

  16. Effect of Number of Processors Speedup Efficiency Plot • Quad-level parallelization gives better speed-up • Room for Speed-up for both approaches • Q? Class-level << Quad-level. Why?

  17. Workload Distribution Fixed Parameter - Number of processors: 4 • Quad-level parallelization provides better load-balance • Probably because of large number of Quads (~100,000)

  18. Conclusions • How effective is UPC in parallelizing Spatial applications? • Quad-level parallelization • Speed-up of 6.65 on 8 processors • Large number of Quads (98,304) • Class-level parallelization • Speed-ups are lower • Smaller number of Classes (4)

  19. Conclusions • How effective is UPC in improving productivity of researches in spatial domain? • Coding effort was reduced • 20 lines of new code in program with base size of 2000 lines • 1 person-month • Analysis effort refocused • Identify units of parallel work i.e. Quality Measure • Identify dimensions to parallelize i.e. Quad, Class, Scale • Selecting dimension(s) to parallelize • Dependency Analysis (Ruled out Scale) • Number of Units (Larger the better) • Load Balancing • 6 person-month

  20. Future Work • Improve Efficiency • Explore Dynamic Load Balancing • Other parallel formulations Acknowledgements • Spatial Databases / Spatial Data Mining Group • AHPCRC • Richard Welsh, NCS • University of Boston • Junchang Ju, Eric D. Kolaczyk, Sucharita Gopal

  21. References • E. D. Kolaczyk, J. J., and G. S. Multiscale, Multigranular Statistical Image Segmentation. Journal of the American Statistical Association, 100, 1358-1369, 2005. • Z. Kato, M. Berthod, and J. Zerubia. A hierarchical Markov random field model and multi-temperature annealing for parallel image classification. Graphical Models and Image Processing, 58(1):18–37, January 1996. • A. Y. Grama, A. Gupta, V. Kumar. Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures. IEEE Parallel & Distributed Technology: Systems & Technology, 1, 12-21, 1993.

More Related