
Mr. Scan: Efficient Clustering with MRNet and GPUs

Mr. Scan: Efficient Clustering with MRNet and GPUs. Evan Samanas and Ben Welton. Density-based clustering discovers the number of clusters and finds oddly-shaped clusters. Clustering example (DBSCAN [1]). Goal: find regions that meet minimum density and spatial distance characteristics.


Presentation Transcript


  1. Mr. Scan: Efficient Clustering with MRNet and GPUs Evan Samanas and Ben Welton

  2. Density-based clustering
     • Discovers the number of clusters
     • Finds oddly-shaped clusters

  3. Clustering Example (DBSCAN [1])
     Goal: Find regions that meet minimum density and spatial distance characteristics.
     • Two parameters determine whether a point is in a cluster: Epsilon (Eps) and MinPts.
     • If the number of points within Eps of a point is > MinPts, the point is a core point.
     • The same calculation is performed for every discovered point until the cluster is fully expanded.
     [Figure: example neighborhood of radius Eps with MinPts: 3]
     [1] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise (1996)
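The expansion described on this slide can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the Mr. Scan implementation: the names `region_query`, `dbscan`, and `NOISE` are made up for this example, the neighborhood is assumed to include the point itself, and the core test uses the strict "> MinPts" wording from the slide (DBSCAN is often stated with ">=" instead).

```python
from math import dist

NOISE = -1  # label for points that end up in no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point, or NOISE for unclustered points."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        # Core test as worded on the slide: strictly more than MinPts.
        if len(neighbors) <= min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) > min_pts:
                frontier.extend(j_neighbors)  # j is core: keep expanding
        cluster += 1
    return labels


group_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = dbscan(group_a + group_b + [(50, 50)], eps=1.5, min_pts=3)
```

Each `region_query` call scans every point, so this sketch is quadratic in the number of points; that cost is what makes the partitioning and dense-region handling on later slides matter.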

  4. Scaling DBSCAN
     • PDBSCAN (1999) [2]
       • Quality equivalent to single-node DBSCAN
       • Linear speedup up to 8 nodes
     • DBDC (2004) [3]
       • Sacrifices quality
       • ~30x speedup on 15 nodes
     • PDSDBSCAN (2012) [4]
       • Quality equivalent to single-node DBSCAN
       • 5675x speedup on 8192 nodes (72 million points)
     • 2 Map/Reduce attempts (2011, 2012)
       • Quality equivalent to single-node DBSCAN
       • 6x speedup on 12 nodes
     [2] X. Xu et al., A Fast Parallel Clustering Algorithm for Large Spatial Databases (1999)
     [3] E. Januzaj et al., DBDC: Density Based Distributed Clustering (2004)
     [4] M. Patwary et al., A new scalable parallel DBSCAN algorithm using the disjoint-set data structure (2012)

  5. Challenges of scaling DBSCAN
     • Data distribution
       • How do we effectively take an input file and create partitions that can be clustered by DBSCAN?
       • Distributed 2-D partitioner reading from a distributed file system
     • Load balancing
       • How do we keep the variance in clustering times across nodes to a minimum?
       • Dense Box
     • Merge
       • How do we reduce the amount of data needed for the merge while keeping accuracy high?
       • Representative points

  6. MRNet – Multicast / Reduction Network
     • General-purpose TBON API
       • Network: user-defined topology
       • Stream: logical data channel
         • to a set of back-ends
         • multicast, gather, and custom reduction
       • Packet: collection of data
       • Filter: stream data operator
         • synchronization
         • transformation
     • Widely adopted by HPC tools
       • CEPBA toolkit
       • Cray ATP & CCDB
       • Open|SpeedShop & CBTF
       • STAT
       • TAU
     [Figure: tree with the FE at the root, CP nodes in the middle, and BE nodes attached to app processes at the leaves; filter F(x1,…,xn)]
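The idea of a filter applied at each level of a reduction tree can be illustrated without any MRNet code. The sketch below is plain Python and does not use or imitate the MRNet API: `reduce_tree` and the nested-list topology are hypothetical stand-ins, where a leaf value plays the role of a back-end's packet and each list plays the role of a node that combines its children's packets.

```python
def reduce_tree(node, filter_fn):
    """Reduce a tree bottom-up.

    A leaf is any non-list value (one back-end's packet); an internal node
    is a list of children whose results are combined by filter_fn.
    """
    if not isinstance(node, list):
        return node
    return filter_fn([reduce_tree(child, filter_fn) for child in node])


# Hypothetical topology: one root with two internal nodes,
# each internal node with two back-ends.
tree = [[3, 5], [7, 11]]
total = reduce_tree(tree, sum)    # sum filter applied at every level
largest = reduce_tree(tree, max)  # max filter applied at every level
```

Both filters return a single value no matter how many children feed them, so the data volume does not grow as results move toward the root.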

  7. TBON Computation
     • Ideal characteristics:
       • Filter output size: constant or decreasing
       • Computation rate: similar across levels
       • Adjustable for load balance
     [Figure: two TBON timing diagrams with 10 MB of data per BE. In one, each level takes ~10 sec with packets ≤10 MB, for a total time of ~30 sec; in the other, one level takes ~40 sec (4x), for a total time of ~60 sec.]

  8. Intro to Mr. Scan
     Mr. Scan Phases
     • Partition: Distributed
     • DBSCAN: GPU (@ BE)
     • Merge: CPU (x #levels)
     • Sweep: CPU (x #levels)
     [Figure: tree of FE, CP, and BE nodes with the file system (FS), labeled with the DBSCAN, Merge, and Sweep phases]

  9. Mr. Scan Architecture
     Clustering 6.5 billion points, total time: 18.2 min
     • FS Read: 224 secs
     • FS Write: 489 secs
     • FS Read: 24 secs
     • MRNet Startup: 130 secs
     • DBSCAN: 168 secs
     • Write Output: 19 secs
     • Merge Time: 6 secs
     • Sweep Time: 4 secs
     [Figure: timeline starting at Time: 0 with the stages Partitioner, DBSCAN, and Merge & Sweep]

  10. Partition Phase
      • Goal: Partitions computationally equivalent to DBSCAN
      • Algorithm:
        • Form initial partitions
        • Add shadow regions
        • Rebalance
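The first two algorithm steps can be sketched as follows. The slide does not define a shadow region, so this example assumes it means the points that lie outside a partition but within Eps of its edge, which is what a partition would need in order to classify its own boundary points. The function name, the vertical-strip layout, and the dictionary keys are all hypothetical, and the rebalance step is not shown.

```python
def partition_with_shadows(points, boundaries, eps):
    """Split 2-D points into vertical strips and attach shadow points.

    boundaries: sorted x-coordinates [b0, b1, ..., bk]; strip i owns points
    with b_i <= x < b_(i+1). Shadow points lie outside the strip but within
    eps of one of its two edges (an assumed definition).
    """
    strips = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        owned = [p for p in points if lo <= p[0] < hi]
        shadow = [p for p in points
                  if (lo - eps <= p[0] < lo) or (hi <= p[0] < hi + eps)]
        strips.append({"owned": owned, "shadow": shadow})
    return strips


points = [(0.5, 0.0), (1.9, 0.0), (2.1, 0.0), (3.5, 0.0)]
strips = partition_with_shadows(points, boundaries=[0, 2, 4], eps=0.2)
```

Here the point at x = 2.1 is owned by the second strip and also appears as a shadow point of the first, so the same point can be read by more than one node.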

  11. Distributed Partitioner

  12. GPU DBSCAN Filter
      DBSCAN is performed in two distinct steps:
      • Step 1: Detect core points
      • Step 2: Expand core points and color
      [Figure: for each step, GPU blocks 1 through 900, each with threads T 1 through T 512]

  13. Dense Box
      • One significant scalability issue is dealing with dense regions of data
        • Density increases the computation cost of DBSCAN
        • Requires more comparison operations
      • We reduce the computation cost of high-density regions by pre-clustering these regions
        • Look at each leaf bounding box for boxes with point count > MinPts and size < 0.35 * Eps
        • DBSCAN no longer needs to expand these regions
      [Figure: KD-Tree with regions R1 and R2]
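The leaf test on this slide can be written as a small predicate. The slide does not say how "size" is measured, so the sketch assumes it is the longest side of the leaf's 2-D bounding box; `is_dense_box` and `size_factor` are illustrative names, not names from Mr. Scan.

```python
def is_dense_box(leaf_points, eps, min_pts, size_factor=0.35):
    """True if a leaf qualifies as a dense box under the slide's two conditions.

    Assumes "size" means the longest side of the 2-D bounding box.
    """
    if not leaf_points:
        return False
    xs = [p[0] for p in leaf_points]
    ys = [p[1] for p in leaf_points]
    size = max(max(xs) - min(xs), max(ys) - min(ys))
    return len(leaf_points) > min_pts and size < size_factor * eps


tight = [(0, 0), (0.1, 0.1), (0.2, 0), (0.1, 0.2), (0.3, 0.3)]
spread = [(0, 0), (1, 1), (2, 2), (3, 3)]
few = [(0, 0), (0.1, 0.1)]
results = [is_dense_box(leaf, eps=1.0, min_pts=3) for leaf in (tight, spread, few)]
```

Under that reading, a box whose longest side is below 0.35 * Eps has a diagonal below roughly 0.5 * Eps, so every pair of points in it is within Eps of each other. With more than MinPts points in the box, each of them passes the core test without any further neighbor search.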

  14. Merge Algorithm
      • Merge overlapping clusters found on different nodes.
      • Two steps in the merge operation:
        • Select representative points (BE)
        • Merge operation

  15. Representative Points
      • These are points that represent the core points in the dataset.
      • They create a boundary (shaded region) that must contain at least one core point shared between overlapping clusters; a point must fall in this region for the overlapping clusters to be merged.
      • Representative points are the points closest to the corners and to the middle of each side of the Eps box.

  16. Merge Algorithm
      • The merge algorithm is responsible for merging overlapping clusters detected on different DBSCAN nodes.
      • It needs to handle the merge with low overhead and without the full dataset.
      • Two overlap cases:
        • 1. Core/Core overlap: core point in common. 64 operations to detect.
        • 2. Non-core/Core overlap: core point seen as non-core by one node. MinPts * 2 operations required to detect.
      [Figure: for each case, Node 1 and Node 2 with core points and non-core points marked]
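One way to picture the core/core case is a union-find pass over the clusters reported by each node. The sketch below is a simplification, not the Mr. Scan merge: it joins two clusters only when they report an identical core point, rather than testing points against the representative-point boundary, and the input layout and function name are hypothetical.

```python
def merge_clusters(node_clusters):
    """Group clusters from different nodes that share a core point.

    node_clusters: dict mapping (node, local_id) -> set of core points.
    Returns a dict mapping each key to the key representing its merged group.
    """
    parent = {key: key for key in node_clusters}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    owner = {}  # core point -> first cluster seen that contains it
    for key, core_points in node_clusters.items():
        for point in core_points:
            if point in owner:
                parent[find(key)] = find(owner[point])  # union the two groups
            else:
                owner[point] = key
    return {key: find(key) for key in node_clusters}


clusters = {
    ("n1", 0): {(1, 1), (2, 2)},
    ("n2", 0): {(2, 2), (3, 3)},
    ("n2", 1): {(9, 9)},
}
merged = merge_clusters(clusters)
```

In this example the two clusters that both contain the core point (2, 2) end up in one group, while the third cluster stays separate.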

  17. Sweep Step
      • Get cluster identifiers and file offsets down to the BEs to write the final clusters.
      • The FE gives each cluster a unique ID and a file offset.
      • This data is passed back down to the BE that holds the data in the cluster.
      • Data is written out to disk by the BE.
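Assigning a unique ID and a file offset to each cluster amounts to a running sum over cluster sizes. The sketch below assumes fixed-size records and a single contiguous output file; `assign_ids_and_offsets` and the 16-byte `record_bytes` default are hypothetical choices for illustration, not details from the talk.

```python
from itertools import accumulate


def assign_ids_and_offsets(cluster_sizes, record_bytes=16):
    """Give each cluster a unique ID and the byte offset where its data starts.

    cluster_sizes: number of points in each merged cluster, in output order.
    Assumes every point is written as a fixed-size record of record_bytes.
    """
    # Exclusive prefix sum: number of points written before each cluster.
    points_before = list(accumulate(cluster_sizes, initial=0))[:-1]
    return [(cluster_id, count * record_bytes)
            for cluster_id, count in enumerate(points_before)]


assignments = assign_ids_and_offsets([3, 5, 2])
```

Because each offset depends only on the sizes of the clusters before it, every BE can write its portion independently once it has received its (ID, offset) pairs.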

  18. Experiment Setup
      • Dataset: Generated data with distribution from real Twitter data
      • Measuring:
        • Weak scaling up to 8192 GPUs
        • Strong scaling
        • Quality compared to single-threaded DBSCAN

  19. Results
      Weak scaling: 4096x data/compute increase, 18.48x-31.68x time increase
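To put the weak-scaling figures in perspective: with ideal weak scaling the run time would stay flat as data and compute grow together. Dividing the scale increase by the time increase gives the net gain in points processed per unit time. The helper below is illustrative arithmetic on the two numbers from the slide, and `throughput_gain` is a made-up name.

```python
def throughput_gain(scale_increase, time_increase):
    """Factor by which points processed per unit time grows."""
    return scale_increase / time_increase


best = throughput_gain(4096, 18.48)   # smaller time increase
worst = throughput_gain(4096, 31.68)  # larger time increase
```

So a 4096x larger problem on 4096x more compute is processed at roughly 129x to 222x the original rate, depending on which end of the reported time range applies.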

  20. Results Breakdown – Partition Phase
      At 6.5 billion points:
      • 65.9% of Mr. Scan’s time
      • 94.6% I/O time

  21. Results Breakdown – GPU Cluster Time

  22. Strong Scaling

  23. Quality

  24. Future Work
      • Remove partitioner’s I/O bottleneck
      • Multiple dimensions

  25. Conclusion
      • Clustered 6.5 billion points with DBSCAN in 18.2 minutes
      • Controlled computational variance of DBSCAN
      • Partitioner I/O = scaling enemy

  26. Questions? A Brief Discussion of Ways and Means

  27. Summary of previous Mr. Scan implementation
      Algorithm Steps
      • SpatialDecomp: CPU (@ FE)
      • DBSCAN: CPU or GPU (@ BE)
      • DrawBoundBox: CPU or GPU
      • MergeCluster: CPU (x #levels)
      [Figure: tree of FE, CP, and BE nodes, labeled with MergeCluster and DBSCAN]
