# Data Structures and Algorithms




## Data Structures and Algorithms

Graphs I:

Representation and Search

Gal A. Kaminka

Computer Science Department

### Outline

• Reminder: Graphs

• Directed and undirected

• Matrix representation of graphs

• Directed and undirected

• Sparse matrices and sparse graphs

### Graphs

• Tuple <V,E>

• V is set of vertices

• E is a binary relation on V

• Each edge is a tuple < v1,v2 >, where v1,v2 in V

• |E| ≤ |V|²

### Directed and Undirected Graphs

• Directed graph:

• < v1, v2 > in E is ordered, i.e., a relation (v1,v2)

• Undirected graph:

• < v1, v2 > in E is un-ordered, i.e., a set { v1, v2 }

• Degree of a node X:

• Out-degree: number of edges < X, v2 >

• In-degree: number of edges < v1, X >

• Degree: In-degree + Out-degree

• In undirected graph: number of edges { X, v2 }
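The definitions above can be tried out on a small graph. The following is a minimal sketch, assuming a directed graph stored as a Python set of ordered pairs; the vertex set, the edge set, and the function names are illustrative and not part of the slides.

```python
# Illustrative directed graph; these particular vertices and edges are assumed.
V = {1, 2, 3}
E = {(1, 3), (2, 2), (3, 1), (3, 2)}  # each edge is an ordered pair <v1, v2>


def out_degree(x, edges):
    """Count edges <x, v2> that leave x."""
    return sum(1 for (u, _) in edges if u == x)


def in_degree(x, edges):
    """Count edges <v1, x> that enter x."""
    return sum(1 for (_, v) in edges if v == x)


def degree(x, edges):
    """Degree in a directed graph: in-degree + out-degree."""
    return in_degree(x, edges) + out_degree(x, edges)


# E is a binary relation on V, so it can hold at most |V|^2 pairs.
assert all(u in V and v in V for (u, v) in E)
assert len(E) <= len(V) ** 2

print(out_degree(3, E), in_degree(3, E), degree(3, E))  # 2 1 3
```

Note that the self-loop < 2, 2 > counts once toward the in-degree and once toward the out-degree of vertex 2.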

### Paths

• Path from vertex v₀ to vertex vₖ:

• A sequence of vertices < v₀, v₁, …, vₖ >

• For all 0 ≤ i < k, edge < vᵢ, vᵢ₊₁ > exists.

• Path is of length k

• Two vertices x, y are adjacent if < x, y > is an edge

• Path is simple if vertices in sequence are distinct.

• Cycle: a path with v₀ = vₖ

• < v, v > is cycle of length 1
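These path definitions translate almost directly into code. The sketch below assumes a directed graph given as a set of ordered pairs; the edge set and all function names are hypothetical, chosen only for illustration.

```python
def is_path(seq, edges):
    """True if every consecutive pair <v_i, v_(i+1)> in seq is an edge."""
    return all((u, v) in edges for u, v in zip(seq, seq[1:]))


def path_length(seq):
    """A path <v0, ..., vk> has length k, the number of edges it uses."""
    return len(seq) - 1


def is_simple(seq):
    """True if no vertex is repeated in the sequence."""
    return len(set(seq)) == len(seq)


def is_cycle(seq, edges):
    """True if seq is a path of length >= 1 that ends where it starts."""
    return len(seq) >= 2 and is_path(seq, edges) and seq[0] == seq[-1]


# Assumed example edge set (includes the self-loop <2, 2>).
E = {(1, 3), (2, 2), (3, 1), (3, 2)}

print(is_path([1, 3, 2], E), path_length([1, 3, 2]))  # True 2
print(is_cycle([2, 2], E), path_length([2, 2]))       # True 1
```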

### Connected graphs

• Undirected Connected graph:

• For any vertices x, y there exists a path xy (= yx)

• Directed connected graph:

• If underlying undirected graph is connected

• Strongly connected directed graph:

• If for any two vertices x, y there exist path xy and path yx

• Clique: a set of vertices in which every pair is joined by an edge

• In a connected graph: |V| − 1 ≤ |E| ≤ |V|²

### Cycles and trees

• Graph with no cycles: acyclic

• Directed Acyclic Graph: DAG

• Undirected forest:

• Acyclic undirected graph

• Tree: undirected acyclic connected graph

• one connected component

### Representing graphs

• When graph is dense

• |E| close to |V|²

• When graph is sparse

• |E| ≪ |V|²

• Adjacency matrix: a matrix of size |V| × |V|

• Each row (column) j corresponds to a distinct vertex j

• “1” in cell < i, j > if there exists an edge <i,j>

• Otherwise, “0”

• In an undirected graph, “1” in <i,j> => “1” in <j,i>

• “1” in <j,j> means there’s a self-loop in vertex j
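The matrix rules above can be sketched as a small builder function. This is a minimal sketch, assuming vertices are numbered 1..n and the matrix is a list of lists; the function name `adjacency_matrix` and its `directed` flag are illustrative, not taken from the slides.

```python
def adjacency_matrix(n, edges, directed=True):
    """Build an n x n 0/1 matrix for vertices numbered 1..n."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1          # "1" in cell <u, v>
        if not directed:
            A[v - 1][u - 1] = 1      # undirected: mirror the entry
    return A


# Assumed example: a directed graph with a self-loop at vertex 2.
for row in adjacency_matrix(3, [(1, 3), (2, 2), (3, 1), (3, 2)]):
    print(row)
```

For an undirected graph, passing `directed=False` fills both <i,j> and <j,i>, so the result is symmetric.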

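### Examples

Example 1: directed graph on vertices 1, 2, 3 (figure omitted).

|   | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 0 | 0 | 1 |
| 2 | 0 | 1 | 0 |
| 3 | 1 | 1 | 0 |

Example 2: graph on vertices 1, 2, 3, 4 (figure omitted); the matrix is symmetric, as for an undirected graph.

|   | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 0 |
| 2 | 1 | 0 | 0 | 0 |
| 3 | 1 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 |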

• Storage complexity: O(|V|²)

• But can use bit-vector representation

• Undirected graph: symmetric along main diagonal

• Aᵀ: transpose of A

• Undirected: A = Aᵀ

• In-degree of X: sum along column X, O(|V|)

• Out-degree of X: sum along row X, O(|V|)

• Very simple, good for small graphs

• Edge existence query: O(1)
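The matrix operations listed above are each a line or two of code. The sketch below assumes a small 3-vertex matrix stored as a list of lists with vertices numbered from 1; the helper names are hypothetical.

```python
# Assumed example matrix for a directed graph on vertices 1, 2, 3.
A = [[0, 0, 1],
     [0, 1, 0],
     [1, 1, 0]]


def has_edge(A, i, j):
    """Edge existence query: a single cell lookup, O(1)."""
    return A[i - 1][j - 1] == 1


def out_degree(A, x):
    """Sum along row x, O(|V|)."""
    return sum(A[x - 1])


def in_degree(A, x):
    """Sum along column x, O(|V|)."""
    return sum(row[x - 1] for row in A)


def is_symmetric(A):
    """True if A equals its transpose, as it must for an undirected graph."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


print(has_edge(A, 3, 2), out_degree(A, 3), in_degree(A, 3), is_symmetric(A))
# True 2 1 False
```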

### But, ….

• Many graphs in practical problems are sparse

• Not many edges --- not all pairs x,y have edge xy

• Matrix representation demands too much memory

• We want to reduce memory footprint

• Use sparse matrix techniques

• Adjacency list: an array Adj[ ] of size |V|

• Each cell Adj[v] holds a list of the vertices adjacent to v

• List does not have to be sorted

Undirected graphs:

• Each edge is represented twice
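A possible way to build such a structure is sketched below. It assumes vertices numbered 1..n and uses a Python dict in place of the array Adj[ ]; the function name and the `directed` flag are illustrative choices, not the slides' own.

```python
def adjacency_list(n, edges, directed=True):
    """Map each vertex 1..n to the (unsorted) list of its adjacent vertices."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed and u != v:
            adj[v].append(u)   # undirected: the edge is represented twice
    return adj


# Assumed examples: one directed graph, one undirected graph.
print(adjacency_list(3, [(1, 3), (2, 2), (3, 1), (3, 2)]))
# {1: [3], 2: [2], 3: [1, 2]}
print(adjacency_list(4, [(1, 2), (1, 3)], directed=False))
# {1: [2, 3], 2: [1], 3: [1], 4: []}
```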

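### Examples

Example 1: directed graph on vertices 1, 2, 3 (figure omitted).

| Vertex | Adj[vertex] |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 3 | 1, 2 |

Example 2: undirected graph on vertices 1, 2, 3, 4 (figure omitted).

| Vertex | Adj[vertex] |
|---|---|
| 1 | 2, 3 |
| 2 | 1 |
| 3 | 1 |
| 4 | (empty) |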

• Storage Complexity:

• O(|V| + |E|)

• In undirected graph: O(|V|+2*|E|) = O(|V|+|E|)

• Edge query check:

• O(|V|) in worst case

• Degree of node X:

• Out-degree: length of Adj[X], an O(|V|) calculation

• In-degree: check all Adj[ ] lists, O(|V|+|E|)

• Can be done in O(1) with some auxiliary information!
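One way to get O(1) degree queries is to keep counters that are updated whenever an edge is added. The class below is a hedged sketch of that idea: the name `Digraph`, its fields, and its methods are assumptions for illustration, and the slides do not say which auxiliary information they have in mind.

```python
class Digraph:
    """Adjacency lists plus per-vertex degree counters (assumed auxiliary data)."""

    def __init__(self, n):
        self.adj = {v: [] for v in range(1, n + 1)}
        self.in_deg = {v: 0 for v in self.adj}
        self.out_deg = {v: 0 for v in self.adj}

    def add_edge(self, u, v):
        self.adj[u].append(v)
        self.out_deg[u] += 1   # counters kept in step with the lists
        self.in_deg[v] += 1

    def has_edge(self, u, v):
        """Edge query: scans Adj[u], so O(|V|) in the worst case."""
        return v in self.adj[u]


g = Digraph(3)
for u, v in [(1, 3), (2, 2), (3, 1), (3, 2)]:
    g.add_edge(u, v)

print(g.in_deg[2], g.out_deg[3], g.has_edge(3, 2))  # 2 2 True
```

The cost is O(|V|) extra storage and a little more work per insertion.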

Questions?

### Graph Traversals (Search)

• We have covered some of these with binary trees

• Depth-first search (DFS)

• A traversal (search):

• An algorithm for systematically exploring a graph

• Visiting (all) vertices

• Until finding a goal vertex or until no more vertices

Only for connected graphs

• One of the simplest algorithms

• Also one of the most important

• It forms the basis for MANY graph algorithms

### BFS: Level-by-level traversal

• Given a starting vertex s

• Visit all vertices at increasing distance from s

• Visit all vertices at distance k from s

• Then visit all vertices at distance k+1 from s

• Then ….

[Figure: binary tree with root 5 and nodes 1, 2, 3, 6, 7, 8, 9, 10]

### BFS in a binary tree (reminder)

BFS: visit all siblings before their descendants

Visit order: 5 2 8 1 3 6 10 7 9

### BFS(tree t)

• q ← new queue

• enqueue(q, t)

• while (not empty(q))

  • curr ← dequeue(q)

  • visit curr // e.g., print curr.datum

  • if curr.left exists: enqueue(q, curr.left)

  • if curr.right exists: enqueue(q, curr.right)

This version for binary trees only!
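The tree version of BFS can be written out as runnable code. This is a minimal sketch: the `Node` class is hypothetical, and the shape of the example tree is an assumption chosen to be consistent with the visit order 5 2 8 1 3 6 10 7 9 (the exact positions of 7 and 9 in the original figure are a guess).

```python
from collections import deque


class Node:
    """Hypothetical binary tree node with a datum and two optional children."""

    def __init__(self, datum, left=None, right=None):
        self.datum = datum
        self.left = left
        self.right = right


def bfs_tree(root):
    """Return the data of all nodes in level-by-level order."""
    order = []
    q = deque()
    if root is not None:
        q.append(root)
    while q:
        curr = q.popleft()
        order.append(curr.datum)      # "visit" the node
        if curr.left is not None:     # skip missing children
            q.append(curr.left)
        if curr.right is not None:
            q.append(curr.right)
    return order


# Assumed tree shape; only the level-by-level order is taken from the slides.
tree = Node(5,
            Node(2, Node(1), Node(3)),
            Node(8, Node(6, None, Node(7)), Node(10, Node(9))))

print(bfs_tree(tree))  # [5, 2, 8, 1, 3, 6, 10, 7, 9]
```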

### BFS for general graphs

• This version assumes vertices have two children

• left, right

• This is trivial to fix

• But still no good for general graphs

• It does not handle cycles

Example (graph with vertices A, B, C, D, E, F, G; figure omitted).

Queue: A

Queue: A B E

B and E are next

Queue: A B E C G D F

When we go to B, we put G and C in the queue

When we go to E, we put D and F in the queue

Queue: A B E C G D F F

Suppose we now want to expand C.

We put F in the queue again!

### Generalizing BFS

• Cycles:

• We need to save auxiliary information

• Each node needs to be marked

• Visited: No need to be put on queue

• Not visited: Put on queue when found

What about assuming only two children vertices?

• Need to put all adjacent vertices in queue

### BFS(graph g, vertex s)

• unmark all vertices in g

• q ← new queue

• mark s

• enqueue(q, s)

• while (not empty(q))

  • curr ← dequeue(q)

  • visit curr // e.g., print its data

  • for each edge <curr, v>

    • if v is unmarked

      • mark v

      • enqueue(q, v)
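The pseudocode above maps closely onto Python. The sketch below assumes the graph is a dict of adjacency lists and uses a set for the marks; the example graph `GRAPH` is an assumption, with its edges and neighbor order inferred from the walkthroughs in these slides rather than stated by them.

```python
from collections import deque

# Assumed undirected example graph; neighbor order within each list is a guess.
GRAPH = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["E", "F"],
    "E": ["A", "D", "F"],
    "F": ["C", "E", "D"],
    "G": ["B"],
}


def bfs(adj, s):
    """Return the vertices reachable from s in the order BFS visits them."""
    marked = {s}            # every vertex not in this set is unmarked
    q = deque([s])
    order = []
    while q:
        curr = q.popleft()
        order.append(curr)  # "visit" curr
        for v in adj[curr]:
            if v not in marked:
                marked.add(v)   # mark before enqueueing, so v is queued once
                q.append(v)
    return order


print(bfs(GRAPH, "A"))  # ['A', 'B', 'E', 'C', 'G', 'D', 'F']
```

Marking a vertex at the moment it is enqueued, rather than when it is dequeued, is what prevents F from entering the queue a second time.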

### The general BFS algorithm

• Each vertex can be in one of three states:

• Unmarked and not on queue

• Marked and on queue

• Marked and off queue

• The algorithm moves vertices between these states

### Handling vertices

• Unmarked and not on queue:

• Not reached yet

• Marked and on queue:

• Known, but adjacent vertices not visited yet (possibly)

• Marked and off queue:

• Known, all adjacent vertices on queue or done with

[Figure: the example graph with vertices A to G, redrawn at each step.] Each step below lists the vertices already taken off the queue, then the current queue contents.

Dequeued: (none). Queue: A.

Dequeued: A. Queue: B E. Mark them and put them in queue.

Dequeued: A B. Queue: E C G. Now take B off queue, and queue its neighbors.

Dequeued: A B E. Queue: C G D F. Do same with E.

Dequeued: A B E C. Queue: G D F. Visit C. Its neighbor F is already marked, so not queued.

Dequeued: A B E C G. Queue: D F. Visit G.

Dequeued: A B E C G D. Queue: F. Visit D. F, E marked so not queued.

Dequeued: A B E C G D F. Queue: empty. Visit F. E, D, C marked, so not queued again.

Done. We have explored the graph in order: A B E C G D F.

### Interesting features of BFS

• Complexity: O(|V| + |E|)

• All vertices put on queue exactly once

• For each vertex on queue, we expand its edges

• In other words, we traverse every edge a constant number of times (once per endpoint in an undirected graph)

• BFS finds shortest path from s to each vertex

• Shortest in terms of number of edges

• Why does this work?
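One way to see why: vertices leave the queue in non-decreasing order of distance from s, so the first time a vertex is reached is along a path with the fewest edges. The sketch below records that distance and a parent pointer for each vertex; the example graph and the function names are assumptions for illustration.

```python
from collections import deque

# Assumed undirected example graph, inferred from the slides' walkthrough.
GRAPH = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["E", "F"],
    "E": ["A", "D", "F"],
    "F": ["C", "E", "D"],
    "G": ["B"],
}


def bfs_distances(adj, s):
    """Return (dist, parent): edge counts from s and the BFS tree pointers."""
    dist = {s: 0}          # a vertex is marked once it has a distance
    parent = {s: None}
    q = deque([s])
    while q:
        curr = q.popleft()
        for v in adj[curr]:
            if v not in dist:
                dist[v] = dist[curr] + 1
                parent[v] = curr
                q.append(v)
    return dist, parent


def path_to(parent, v):
    """Follow parent pointers back to the start and reverse them."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


dist, parent = bfs_distances(GRAPH, "A")
print(dist["F"], path_to(parent, "F"))  # 2 ['A', 'E', 'F']
```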

### Depth-first search

• Again, a simple and powerful algorithm

• Given a starting vertex s

• Pick an adjacent vertex, visit it.

• Then visit one of its adjacent vertices

• …..

• Until impossible, then backtrack, visit another

### DFS(graph g, vertex s)

Assume all vertices initially unmarked

• mark s

• visit s // e.g., print its data

• for each edge <s, v>

  • if v is not marked

    • DFS(g, v)
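The recursive procedure above can be sketched in Python as follows. The graph is assumed to be a dict of adjacency lists; the example `GRAPH`, including the order of neighbors in each list, is an assumption inferred from the slides' walkthrough, and the visit order depends on that neighbor order.

```python
# Assumed undirected example graph; neighbor order within each list is a guess.
GRAPH = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["E", "F"],
    "E": ["A", "D", "F"],
    "F": ["C", "E", "D"],
    "G": ["B"],
}


def dfs(adj, s, marked=None, order=None):
    """Return the vertices reachable from s in the order DFS visits them."""
    if marked is None:          # first call: all vertices start unmarked
        marked, order = set(), []
    marked.add(s)
    order.append(s)             # "visit" s
    for v in adj[s]:
        if v not in marked:
            dfs(adj, v, marked, order)   # go deeper before trying siblings
    return order


print(dfs(GRAPH, "A"))  # ['A', 'B', 'C', 'F', 'E', 'D', 'G']
```

Backtracking is implicit: when the loop over a vertex's neighbors finishes, the call returns to the vertex that invoked it. On very deep graphs, Python's recursion limit may require an explicit stack instead.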

[Figure: the example graph with vertices A to G, redrawn at each step.]

Current vertex: A

Current: B. Expand A’s adjacent vertices. Pick one (B). Mark it and visit it.

Current: C. Now expand B, and visit its neighbor, C.

Current: F. Visit F. Pick one of its neighbors, E.

Current: E. E’s adjacent vertices are A, D and F. A and F are marked, so pick D.

Current: D. Visit D. No new vertices available. Backtrack to E. Backtrack to F. Backtrack to C. Backtrack to B.

Current: G. Visit G. No new vertices from here. Backtrack to B. Backtrack to A. E is already marked, so there is nothing new to visit.

Done. We have explored the graph in order: A B C F E D G

### Interesting features of DFS

• Complexity: O(|V| + |E|)

• All vertices visited once, then marked

• For each visited vertex, we examine all its edges

• In other words, we traverse all edges once

• DFS does not necessarily find shortest path

• Why?
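A small experiment suggests the answer: DFS commits to the first unmarked neighbor and keeps going, so it can reach a nearby vertex by a long detour. The sketch below compares the depth at which DFS first reaches each vertex with the BFS distance; the example graph, its neighbor order, and the function names are assumptions for illustration.

```python
from collections import deque

# Assumed undirected example graph; neighbor order within each list is a guess.
GRAPH = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["E", "F"],
    "E": ["A", "D", "F"],
    "F": ["C", "E", "D"],
    "G": ["B"],
}


def dfs_depths(adj, s, depth=0, found=None):
    """Length of the DFS tree path from the start to each vertex."""
    if found is None:
        found = {}
    found[s] = depth
    for v in adj[s]:
        if v not in found:
            dfs_depths(adj, v, depth + 1, found)
    return found


def bfs_distances(adj, s):
    """Fewest edges from s to each vertex."""
    dist = {s: 0}
    q = deque([s])
    while q:
        curr = q.popleft()
        for v in adj[curr]:
            if v not in dist:
                dist[v] = dist[curr] + 1
                q.append(v)
    return dist


# E is adjacent to A, yet DFS reaches it through B, C and F first.
print(dfs_depths(GRAPH, "A")["E"], bfs_distances(GRAPH, "A")["E"])  # 4 1
```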