
Clustering Using Graph Theory Algorithms






Presentation Transcript


  1. Clustering Using Graph Theory Algorithms רועי יצחק

  2. Elements of Graph Theory • A graph G = (V, E) consists of a vertex set V and an edge set E. • If G is a directed graph, each edge is an ordered pair of vertices. • A bipartite graph is one in which the vertices can be divided into two groups, so that all edges join vertices in different groups.

  3. Converting the points to a graph • Fix the points in the plane • Compute the distance between every pair of points • Pipeline: n-D data points → distance matrix → graph representation. [Figure: a sample distance matrix, with rows such as 0, 1, 1.5, 2, 5, 6, 7, 9 / 1, 0, 2, 1, 6.5, 6, 8, 8 / 1.5, 2, 0, 1, 4, 4, 6, 5.5 / ...]
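A minimal sketch of this step, assuming the data points are given as rows of a NumPy array (the array name X and the use of SciPy here are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# n-D data points, one point per row (illustrative 2-D example)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.5], [1.0, 1.7]])

# Pairwise Euclidean distance matrix: D[i, j] = ||X[i] - X[j]||
D = squareform(pdist(X, metric="euclidean"))
print(D)
```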

  4. Minimal Cut • Create a graph G(V,E) from the data • Compute all pairwise distances in the graph • The weight of each edge is the distance • Remove edges from the graph whose weight exceeds some threshold; the connected components that remain are the clusters (see the sketch below).
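One way to realise this as code, as a sketch only: it assumes a precomputed distance matrix D as in the previous snippet, and the threshold value is left to the caller.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def threshold_clusters(D, threshold):
    """Keep only edges shorter than `threshold`; each connected
    component of the remaining graph is treated as one cluster."""
    A = np.where(D < threshold, D, 0.0)   # drop edges longer than the threshold
    np.fill_diagonal(A, 0.0)              # no self-loops
    n_clusters, labels = connected_components(csr_matrix(A), directed=False)
    return n_clusters, labels
```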

  5. Similarity Graph • Represent the dataset as a weighted graph G(V,E) • V = {xi}: set of n vertices representing data points • E = {Wij}: set of weighted edges indicating pair-wise similarity between points • As distance decreases, similarity increases. [Figure: a six-vertex graph with edge weights between 0.1 and 0.8]

  6. Similarity Graph • Wij represents the similarity between vertices i and j • Wij = 0 means the two points have no similarity • Wii = 0
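A common way to turn distances into similarities with exactly these properties is a Gaussian kernel. This is a sketch; the kernel and the bandwidth parameter sigma are standard choices assumed here, not specified on the slide.

```python
import numpy as np

def similarity_matrix(D, sigma=1.0):
    """W[i, j] = exp(-d_ij^2 / (2 sigma^2)): close to 1 when points are
    near each other, close to 0 when they are far apart."""
    W = np.exp(-D**2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)   # Wii = 0, as required above
    return W
```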

  7. Graph Partitioning • Clustering can be viewed as partitioning a similarity graph • Bi-partitioning task: divide the vertices into two disjoint groups (A, B), with V = A ∪ B. [Figure: the six-vertex graph split into groups A and B]

  8. Clustering Objectives • Traditional definition of a "good" clustering: points assigned to the same cluster should be highly similar, and points assigned to different clusters should be highly dissimilar. • Applied to our graph representation, this means minimising the weight of the between-group connections. [Figure: the six-vertex weighted graph with the weak between-group edges highlighted]

  9. Graph Cuts • Express partitioning objectives as a function of the "edge cut" of the partition. • Cut: the set of edges with exactly one endpoint in each group. • We want to find the minimal cut between the groups; the partition achieving this minimal cut is chosen. [Figure: a partition (A, B) of the six-vertex graph with crossing edges of weight 0.1 and 0.2, so cut(A,B) = 0.3]
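Written out, the cut weight is the total weight of the edges crossing the partition, consistent with the figure's cut(A,B) = 0.1 + 0.2 = 0.3:

```latex
\mathrm{cut}(A, B) = \sum_{i \in A,\; j \in B} w_{ij}
```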

  10. Graph Cut Criteria • Criterion: Minimum-cut • Minimise the weight of connections between groups: min cut(A,B) • Problem: this has a degenerate case, where the optimal cut simply isolates a small set of vertices. The criterion only considers external cluster connections and does not consider internal cluster density. [Figure: a balanced optimal cut vs. a degenerate minimum cut]

  11. Graph Cut Criteria (continued) • Criterion: Normalised-cut (Shi & Malik, 1997) • Consider the connectivity between groups relative to the density of each group. • Normalise the association between groups by volume. • Vol(A): the total weight of the edges originating from group A. • Why use this criterion? • Minimising the normalised cut is equivalent to maximising the normalised association. • Produces more balanced partitions.
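The criterion written out, in the standard form of the Shi & Malik objective:

```latex
\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(B)},
\qquad \mathrm{vol}(A) = \sum_{i \in A} \sum_{j \in V} w_{ij}
```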

  12. Minimum Spanning Tree (MST, Prim's algorithm) • Pick a source vertex and insert it into set A (the tree) • Find the lightest edge connecting set B (the remaining vertices of the graph) to the tree (A) • Repeat until no vertices remain in set B (a code sketch follows).
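A sketch of Prim's algorithm as described above, using a heap and representing the graph as an adjacency dict (an illustrative representation, not from the slides):

```python
import heapq

def prim_mst(graph, source):
    """graph: {u: {v: weight, ...}, ...}, undirected.
    Returns the MST edges in the order they are added, as (weight, u, v)."""
    in_tree = {source}                       # set A: the growing tree
    heap = [(w, source, v) for v, w in graph[source].items()]
    heapq.heapify(heap)
    mst_edges = []
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)        # lightest edge leaving the tree
        if v in in_tree:
            continue                         # stale entry, both ends already in A
        in_tree.add(v)
        mst_edges.append((w, u, v))
        for x, wx in graph[v].items():
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return mst_edges
```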

  13. Finding the clustering • Plot the progress of the tree (each addition of a vertex) as a function of the weight of the edge that was added • Each "valley" in this plot represents a cluster (see the sketch below).
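One simple way to act on this observation, as a sketch: rather than locating the "valleys" directly, it deletes the k-1 heaviest MST edges, which separates the valleys from each other; the number of clusters k is assumed to be given.

```python
def mst_clusters(graph, mst_edges, k):
    """Split the MST into k clusters by deleting its k-1 heaviest edges."""
    keep = sorted(mst_edges)[:len(mst_edges) - (k - 1)]  # drop heaviest k-1
    # Union-find over the remaining edges to label the components.
    parent = {v: v for v in graph}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for _, u, v in keep:
        parent[find(u)] = find(v)
    return {v: find(v) for v in graph}      # vertex -> cluster representative
```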

  14.-18. Example • A step-by-step run of Prim's algorithm on a nine-vertex weighted graph (vertices a-i, edge weights between 1 and 14). At each step the lightest edge crossing from the tree to the remaining vertices is added, and the key values of the vertices outside the tree are updated, until the tree spans the whole graph. [Figures: five successive snapshots of the growing tree]

  19. Similarity Graph (recap) • Wij represents the similarity between vertices i and j • Wij = 0 means the two points have no similarity • Wii = 0

  20. Example: 2 Spirals • The dataset exhibits complex cluster shapes • K-means performs very poorly in this space, due to its bias toward dense spherical clusters • In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate. [Figure: the two spirals in the original space and in the embedded space]

  21. Spectral Graph Theory • Possible approach: represent a similarity graph as a matrix and apply knowledge from linear algebra. • The eigenvalues and eigenvectors of a matrix provide global information about its structure. • Spectral Graph Theory: analyse the "spectrum" of the matrix representing a graph. • Spectrum: the eigenvectors of the graph, ordered by the magnitude (strength) of their corresponding eigenvalues.

  22. Matrix Representations • Adjacency matrix (A) • n x n matrix • Aij: edge weight between vertices xi and xj • Important properties: symmetric matrix; eigenvalues are real; the eigenvectors can be chosen to form an orthogonal basis. [Figure: the six-vertex weighted graph and its adjacency matrix]

  23. Matrix Representations (continued) • Degree matrix (D) • n x n diagonal matrix • Dii: total weight of the edges incident to vertex xi • Important application: normalising the adjacency matrix. [Figure: the six-vertex graph and its degree matrix]

  24. Matrix Representations (continued) • Laplacian matrix (L) • n x n symmetric matrix, L = D - A • Important properties: eigenvalues are non-negative real numbers; eigenvectors are real and orthogonal; the eigenvalues and eigenvectors provide an insight into the connectivity of the graph (a code sketch follows). [Figure: the six-vertex graph and its Laplacian]
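A sketch of these matrices for the six-vertex example. The edge weights below are read off the slide figure as best they can be recovered, so treat them as illustrative rather than exact.

```python
import numpy as np

n = 6
A = np.zeros((n, n))
# Symmetric edge weights (assumed from the figure; 0-based vertex indices).
for i, j, w in [(0, 1, 0.8), (0, 2, 0.6), (1, 2, 0.8), (0, 4, 0.1),
                (2, 3, 0.2), (3, 4, 0.8), (3, 5, 0.7), (4, 5, 0.8)]:
    A[i, j] = A[j, i] = w

D = np.diag(A.sum(axis=1))   # degree matrix: total incident weight
L = D - A                    # unnormalised Laplacian

evals, evecs = np.linalg.eigh(L)   # eigh: for symmetric matrices, real spectrum
print(np.round(evals, 1))          # smallest eigenvalue of a Laplacian is 0
```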

  25. Another option: the normalised Laplacian • L_sym = D^(-1/2) L D^(-1/2), an n x n symmetric matrix • Important properties: eigenvectors are real and normalised; diagonal entries equal 1 (for vertices of non-zero degree); each off-diagonal entry with i ≠ j equals -Wij / sqrt(Dii * Djj). [Figure: the six-vertex graph]
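The same construction in code, as a self-contained sketch; it assumes every vertex has at least one edge, so all degrees are positive.

```python
import numpy as np

def normalized_laplacian(A):
    """L_sym = D^{-1/2} (D - A) D^{-1/2}.
    Assumes positive degree for every vertex (no isolated vertices)."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return d_inv_sqrt @ (np.diag(d) - A) @ d_inv_sqrt
```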

  26. Spectral Clustering Algorithms • Three basic stages: • Pre-processing • Construct a matrix representation of the dataset. • Decomposition • Compute eigenvalues and eigenvectors of the matrix. • Map each point to a lower-dimensional representation based on one or more eigenvectors. • Grouping • Assign points to two or more clusters, based on the new representation.

  27. Spectral Bi-partitioning Algorithm • Pre-processing: build the Laplacian matrix L of the graph • Decomposition: find the eigenvalues Λ and eigenvectors X of the matrix L • Map each vertex to the corresponding component of λ2, the eigenvector of the second-smallest eigenvalue. For the example graph, Λ = (0.0, 0.4, 2.2, 2.3, 2.5, 3.0), and the components of λ2 are x1 = 0.2, x2 = 0.2, x3 = 0.2, x4 = -0.4, x5 = -0.7, x6 = -0.7.

  28. Spectral Bi-partitioning Algorithm • The columns of the matrix X are the eigenvectors of the Laplacian, matched to their corresponding eigenvalues and sorted in increasing order of eigenvalue.

  29. Spectral Bi-partitioning (continued) • Grouping: sort the components of the reduced 1-dimensional vector and identify clusters by splitting the sorted vector in two. • How to choose a splitting point? • Naïve approaches: split at 0, or at the mean or median value • More expensive approaches: attempt to minimise the normalised cut criterion in 1 dimension. • Splitting at 0 in the example: cluster A = positive points (x1 = 0.2, x2 = 0.2, x3 = 0.2), cluster B = negative points (x4 = -0.4, x5 = -0.7, x6 = -0.7). A sketch of the full pipeline follows.
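Putting the three stages together, a sketch of the bi-partitioning algorithm described on the last few slides; W is any symmetric similarity matrix, and the splitting rule is the naive split at 0.

```python
import numpy as np

def spectral_bipartition(W):
    """Pre-processing: build the Laplacian. Decomposition: take the
    eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    Grouping: split its components at 0."""
    L = np.diag(W.sum(axis=1)) - W
    evals, evecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = evecs[:, 1]              # second-smallest eigenvalue's eigenvector
    return fiedler >= 0                # True -> cluster A, False -> cluster B
```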
