Disc finder a distributed algorithm for identifying galaxy clusters
This presentation is the property of its rightful owner.
Sponsored Links
1 / 6

DISC-Finder: A distributed algorithm for identifying galaxy clusters PowerPoint PPT Presentation


  • 87 Views
  • Uploaded on
  • Presentation posted in: General

DISC-Finder: A distributed algorithm for identifying galaxy clusters. Two galaxies are “friends” if they are close to each other. We analyze an undirected graph, where galaxies are vertices and their “friendships” are edges. We need to identify its connected components.

Download Presentation

DISC-Finder: A distributed algorithm for identifying galaxy clusters

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Disc finder a distributed algorithm for identifying galaxy clusters

DISC-Finder:A distributed algorithmfor identifying galaxy clusters


Friends of friends fof technique

  • Two galaxies are “friends” if they are close to each other

  • We analyze an undirected graph, where galaxies are vertices and their “friendships” are edges

  • We need to identify its connected components

Friends-of-Friends (FoF) technique:

Identification of galaxy clusters

Sequential algorithms

  • Exact: O((n∙ log n)1.5)

  • Approximate: O(n)


Distributed procedure

  • Divide the space into “slightly overlapping” cubes

Load balancing:

  • Randomly select a subset of galaxies

  • Apply the kd-tree construction to build a balanced partition for the subset

  • Use it for the full set of galaxies

  • Distributed computation: Apply a sequential FoF algorithm to find the clusters within each cube

  • Use any sequential FoF

  • Allocate different cores to cubes

  • Identify cross-cube edges and merge the respective clusters

  • Apply the union-find algorithm to thegalaxies in the cube overlaps

Distributed procedure


Distributed procedure1

REDUCER

MAPPER

MAPPER

MAPPER

REDUCER

REDUCER

Distributed procedure

localclusters

galaxysets

apply localsequential FoF

divide the spaceinto cubes


Advantages

Advantages

  • Scalable: We can apply it to massive datasets and use all available cores

  • Black-box use of a sequential FoF:We can utilize any FoF algorithm

  • Hadoop friendly: We have mapped all main operations into the Hadoop framework, which has resulted in very compact code (800 lines)


Scalability

14,800 mlngalaxies

500 mlngalaxies

Scalability

Time

(min)

240

SCALES TO LARGE DATA SETS

EFFECTIVELY USES AVAILBLE CORES

60

1000 mlngalaxies

15

4

8

16

32

64

256

128

Number of cores


  • Login