Data Structures and Algorithms

Graphs I: Representation and Search

Gal A. Kaminka

Computer Science Department

Outline
• Reminder: Graphs
• Directed and undirected
• Matrix representation of graphs
• Directed and undirected
• Sparse matrices and sparse graphs
• Adjacency list representation
Graphs
• Tuple <V,E>
• V is set of vertices
• E is a binary relation on V
• Each edge is a tuple < v1,v2 >, where v1,v2 in V
• |E| ≤ |V|^2
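The definition above can be sketched in a few lines of Python. This is only an illustration: the names `V` and `E` mirror the slide, while the concrete vertices, the edges, and the helper variables are assumptions made for the example.

```python
# A graph as a tuple <V, E>: V is a set of vertices, E a set of vertex pairs.
# The concrete vertices and edges here are illustrative, not from the slide.
V = {1, 2, 3}
E = {(1, 3), (2, 2), (3, 1), (3, 2)}  # a binary relation on V

# Every edge must connect two members of V.
edges_are_valid = all(u in V and v in V for (u, v) in E)

# E is a subset of V x V, so it can never hold more than |V|^2 pairs.
max_edges = len(V) ** 2
within_bound = len(E) <= max_edges
```

Storing `E` as a set of tuples makes the "binary relation on V" reading literal: an edge exists exactly when the pair is a member of the set.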
Directed and Undirected Graphs
• Directed graph:
• < v1, v2 > in E is ordered, i.e., an ordered pair (v1, v2)
• Undirected graph:
• < v1, v2 > in E is un-ordered, i.e., a set { v1, v2 }
• Degree of a node X:
• Out-degree: number of edges < X, v2 >
• In-degree: number of edges < v1, X >
• Degree: In-degree + Out-degree
• In undirected graph: number of edges { X, v2 }
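The degree definitions translate directly into counting over the edge set. The sketch below assumes directed edges are stored as tuples and undirected edges as `frozenset` pairs; the function names and sample graphs are hypothetical.

```python
def out_degree(E, x):
    """Number of directed edges <x, v2> leaving x."""
    return sum(1 for (u, _) in E if u == x)

def in_degree(E, x):
    """Number of directed edges <v1, x> entering x."""
    return sum(1 for (_, v) in E if v == x)

def degree(E, x):
    """Degree in a directed graph: in-degree + out-degree."""
    return in_degree(E, x) + out_degree(E, x)

def undirected_degree(E, x):
    """Number of undirected edges {x, v2}; each edge is a frozenset."""
    return sum(1 for e in E if x in e)

# Illustrative graphs (assumed for this example).
directed = {(1, 3), (2, 2), (3, 1), (3, 2)}
undirected = {frozenset({1, 2}), frozenset({1, 3})}
```

Note that under "in-degree + out-degree" a self-loop such as `(2, 2)` contributes to both counts of its vertex.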
Paths
• Path from vertex v0 to vertex vk:
• A sequence of vertices < v0, v1, …, vk >
• For all 0 ≤ i < k, edge < v_i, v_{i+1} > exists.
• Path is of length k
• Two vertices x, y are adjacent if < x, y > is an edge
• Path is simple if vertices in sequence are distinct.
• Cycle: a path with v0 = vk
• < v, v > is cycle of length 1
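The path definitions above can be checked mechanically. The following is a small sketch, assuming a directed graph stored as a set of tuples; the helper names and the sample edge set are illustrative only.

```python
def is_path(E, seq):
    """True if every consecutive pair <v_i, v_{i+1}> in seq is an edge of E."""
    return all((seq[i], seq[i + 1]) in E for i in range(len(seq) - 1))

def path_length(seq):
    """A path <v0, ..., vk> has length k (its number of edges)."""
    return len(seq) - 1

def is_simple(seq):
    """A path is simple if no vertex appears twice in the sequence."""
    return len(set(seq)) == len(seq)

def is_cycle(E, seq):
    """A cycle is a path of length >= 1 whose first and last vertex coincide."""
    return len(seq) >= 2 and seq[0] == seq[-1] and is_path(E, seq)

# Illustrative directed edge set (assumed for this example).
E = {(1, 3), (2, 2), (3, 1), (3, 2)}
```

For instance, `[1, 3, 2]` is a simple path of length 2 in this edge set, `[1, 3, 1]` is a cycle, and the self-loop `[2, 2]` is a cycle of length 1.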
Connected graphs
• Undirected Connected graph:
• For any vertices x, y there exists a path xy (= yx)
• Directed connected graph:
• If underlying undirected graph is connected
• Strongly connected directed graph:
• If for any two vertices x, y there exist a path xy and a path yx

• Clique: a set of vertices in which every two vertices are adjacent
• Connected graph: |V|-1 ≤ |E| ≤ |V|^2
Cycles and trees
• Graph with no cycles: acyclic
• Directed Acyclic Graph: DAG
• Undirected forest:
• Acyclic undirected graph
• Tree: undirected acyclic connected graph
• one connected component
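One way to test the tree definition is to use the standard fact that an undirected graph is a tree exactly when it is connected and has |V|-1 edges. The sketch below assumes edges are given as a list of distinct pairs; `is_connected` and `is_tree` are hypothetical helper names.

```python
def is_connected(V, adj):
    """True if every vertex is reachable from an arbitrary start vertex."""
    if not V:
        return True
    start = next(iter(V))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == set(V)

def is_tree(V, edges):
    """Undirected tree test: connected and exactly |V| - 1 edges."""
    adj = {v: set() for v in V}
    for u, w in edges:
        adj[u].add(w)   # each undirected edge is recorded in both directions
        adj[w].add(u)
    return len(edges) == len(V) - 1 and is_connected(V, adj)
```

A forest that is not a tree fails the connectivity test, and any connected graph containing a cycle fails the edge count.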
Representing graphs
• When graph is dense
• |E| close to |V|^2
• When graph is sparse
• |E| << |V|^2
Matrix representation of graphs
• Matrix of size |V| x |V|
• Each row (column) j corresponds to a distinct vertex j
• “1” in cell < i, j > if there exists an edge <i,j>
• Otherwise, “0”
• In an undirected graph, “1” in <i,j> => “1” in <j,i>
• “1” in <j,j> means there’s a self-loop at vertex j
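Examples

Directed graph on vertices 1, 2, 3 (the matrix is not symmetric):

    1 2 3
1   0 0 1
2   0 1 0
3   1 1 0

Graph on vertices 1, 2, 3, 4 (symmetric matrix; vertex 4 has no edges):

    1 2 3 4
1   0 1 1 0
2   1 0 0 0
3   1 0 0 0
4   0 0 0 0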

• Storage complexity: O(|V|^2)
• But can use bit-vector representation
• Undirected graph: symmetric along main diagonal
• A^T: transpose of A
• Undirected: A = A^T
• In-degree of X: sum along column X, O(|V|)
• Out-degree of X: sum along row X, O(|V|)
• Very simple, good for small graphs
• Edge existence query: O(1)
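The matrix operations listed above can be sketched with plain nested lists. This is an illustrative sketch, not a tuned implementation: vertices are assumed to be numbered 1..n, and all function names are hypothetical. The two sample matrices reproduce the ones in the Examples slide.

```python
def build_matrix(n, edges, directed=True):
    """Return an n x n 0/1 adjacency matrix for vertices numbered 1..n."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
        if not directed:
            A[j - 1][i - 1] = 1  # undirected: mirror across the diagonal
    return A

def has_edge(A, i, j):
    """Edge existence query: a single cell lookup, O(1)."""
    return A[i - 1][j - 1] == 1

def out_degree(A, x):
    """Sum along row x, O(|V|)."""
    return sum(A[x - 1])

def in_degree(A, x):
    """Sum along column x, O(|V|)."""
    return sum(row[x - 1] for row in A)

def is_symmetric(A):
    """True if A equals its transpose, as for any undirected graph."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

D = build_matrix(3, [(1, 3), (2, 2), (3, 1), (3, 2)])
U = build_matrix(4, [(1, 2), (1, 3)], directed=False)
```

The memory cost is |V|^2 cells no matter how few edges exist, which is what makes this layout attractive only for small or dense graphs.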
But…
• Many graphs in practical problems are sparse
• Not many edges --- not all pairs x,y have edge xy
• Matrix representation demands too much memory
• We want to reduce memory footprint
• Use sparse matrix techniques
Adjacency list representation
• An array Adj[ ] of size |V|
• Each cell holds a list of the vertices adjacent to the associated vertex
• List does not have to be sorted
• Undirected graphs: each edge is represented twice
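Examples

Directed graph on vertices 1, 2, 3:

Adj[1]: 3
Adj[2]: 2
Adj[3]: 1, 2

Graph on vertices 1, 2, 3, 4 (each edge stored twice; vertex 4 has an empty list):

Adj[1]: 2, 3
Adj[2]: 1
Adj[3]: 1
Adj[4]: (empty)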

• Storage Complexity:
• O(|V| + |E|)
• In undirected graph: O(|V|+2*|E|) = O(|V|+|E|)
• Edge existence query:
• O(|V|) in worst case
• Degree of node X:
• Out-degree: length of Adj[X], O(|V|) to compute
• In-degree: check all Adj[] lists, O(|V|+|E|)
• Both can be done in O(1) with some auxiliary information (e.g., stored counters)
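A possible adjacency-list sketch, including the kind of auxiliary information that makes degree queries O(1). The class name `AdjListGraph` and its methods are hypothetical; the per-vertex in-degree counter is one assumed choice of "auxiliary information", not necessarily the one the slides had in mind.

```python
class AdjListGraph:
    """Adjacency lists for vertices numbered 1..n, plus in-degree counters."""

    def __init__(self, n):
        self.adj = {v: [] for v in range(1, n + 1)}
        self.in_deg = {v: 0 for v in range(1, n + 1)}  # auxiliary information

    def add_edge(self, u, v):
        """Add the directed edge <u, v>."""
        self.adj[u].append(v)
        self.in_deg[v] += 1

    def add_undirected_edge(self, u, v):
        """An undirected edge is represented twice, once in each list."""
        self.add_edge(u, v)
        if u != v:
            self.add_edge(v, u)

    def has_edge(self, u, v):
        """Linear scan of Adj[u]: O(|V|) in the worst case."""
        return v in self.adj[u]

    def out_degree(self, u):
        """Length of Adj[u]; O(1) here because Python lists store their length."""
        return len(self.adj[u])

    def in_degree(self, v):
        """O(1) thanks to the counter, instead of scanning every list."""
        return self.in_deg[v]

g = AdjListGraph(3)
for u, v in [(1, 3), (2, 2), (3, 1), (3, 2)]:
    g.add_edge(u, v)

h = AdjListGraph(4)
h.add_undirected_edge(1, 2)
h.add_undirected_edge(1, 3)
```

The two graphs built at the end reproduce the adjacency lists from the Examples slide. The trade-off against the matrix is visible in `has_edge`, which is no longer a constant-time lookup.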
Graph Traversals (Search)
• We have covered some of these with binary trees
• Breadth-first search (BFS)
• Depth-first search (DFS)
• A traversal (search):
• An algorithm for systematically exploring a graph
• Visiting (all) vertices
• Until finding a goal vertex or until no more vertices
• A traversal from a single start vertex reaches every vertex only if the graph is connected

Breadth-first search
• One of the simplest algorithms
• Also one of the most important
• It forms the basis for MANY graph algorithms
BFS: Level-by-level traversal
• Given a starting vertex s
• Visit all vertices at increasing distance from s
• Visit all vertices at distance k from s
• Then visit all vertices at distance k+1 from s
• Then …

BFS in a binary tree (reminder)

[Figure: binary tree with nodes 5, 2, 1, 3, 8, 6, 10, 7, 9]

BFS: visit all siblings before their descendants

5 2 8 1 3 6 10 7 9

BFS(tree t)
• q ← new queue
• enqueue(q, t)
• while (not empty(q))
    • curr ← dequeue(q)
    • visit curr // e.g., print curr.datum
    • if curr.left is not null: enqueue(q, curr.left)
    • if curr.right is not null: enqueue(q, curr.right)

This version for binary trees only!
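The binary-tree version can be written in Python roughly as follows. The `Node` class and `bfs_tree` function are hypothetical names, and the tree shape built at the end is an assumption: the transcript lists the node values and the BFS order but not the edges, so the nodes are arranged here as a binary search tree that is consistent with that order.

```python
from collections import deque

class Node:
    """A binary tree node with a datum and optional left/right children."""
    def __init__(self, datum, left=None, right=None):
        self.datum = datum
        self.left = left
        self.right = right

def bfs_tree(root):
    """Return the data of the tree rooted at `root` in level-by-level order."""
    order = []
    if root is None:
        return order
    q = deque([root])
    while q:
        curr = q.popleft()
        order.append(curr.datum)      # "visit" the node
        if curr.left is not None:     # only enqueue children that exist
            q.append(curr.left)
        if curr.right is not None:
            q.append(curr.right)
    return order

# Assumed shape: a binary search tree over the node values from the slide.
tree = Node(5,
            Node(2, Node(1), Node(3)),
            Node(8, Node(6, None, Node(7)), Node(10, Node(9))))
order = bfs_tree(tree)
```

`collections.deque` is used because removing from the front of a plain list costs time proportional to its length, while `popleft` is constant time.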

BFS for general graphs
• This version assumes vertices have two children
• left, right
• This is trivial to fix
• But still no good for general graphs
• It does not handle cycles

Example.

[Figure: example graph on vertices A, B, C, D, E, F, G, redrawn at each step]

Queue: A

Queue: A B E

B and E are next

Queue: A B E C G D F

When we go to B, we put G and C in the queue

When we go to E, we put D and F in the queue

Queue: A B E C G D F F

Suppose we now want to expand C.

We put F in the queue again!
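The duplicate enqueue can be reproduced with a tree-style BFS that never marks vertices. The child lists below are a hypothetical, directed simplification chosen to reproduce the queue shown in the example; they are not the full edge set of the figure.

```python
from collections import deque

# Assumed child lists: F is reachable both from E and from C.
CHILDREN = {
    "A": ["B", "E"],
    "B": ["C", "G"],
    "C": ["F"],
    "D": [],
    "E": ["D", "F"],
    "F": [],
    "G": [],
}

def naive_bfs(children, root):
    """Tree-style BFS with no marking; returns every enqueue, in order."""
    enqueued = [root]
    q = deque([root])
    while q:
        curr = q.popleft()
        for child in children[curr]:
            q.append(child)          # no check: a vertex can be queued twice
            enqueued.append(child)
    return enqueued

log = naive_bfs(CHILDREN, "A")
```

Here `log` ends with F appearing twice. On a graph that contains a cycle the situation is worse: this version would keep re-enqueueing the vertices on the cycle and never terminate.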

Generalizing BFS
• Cycles:
• We need to save auxiliary information
• Each node needs to be marked
• Visited: No need to be put on queue
• Not visited: Put on queue when found

What about assuming only two children vertices?

• Need to put all adjacent vertices in queue
BFS(graph g, vertex s)
• unmark all vertices in g
• q ← new queue
• mark s
• enqueue(q, s)
• while (not empty(q))
    • curr ← dequeue(q)
    • visit curr // e.g., print its data
    • for each edge <curr, v>
        • if v is unmarked
            • mark v
            • enqueue(q, v)
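A direct Python rendering of this pseudocode might look like the sketch below. The function name `bfs` and the dict-of-lists graph format are assumptions. The `GRAPH` adjacency lists are also an assumption: they were reconstructed from the narrated walkthroughs in this deck (which neighbors each vertex reports), since the figure itself is not available.

```python
from collections import deque

def bfs(adj, s):
    """Return the vertices reachable from s in breadth-first order."""
    marked = {s}                 # "unmark all" is implicit: the set starts empty
    q = deque([s])
    order = []
    while q:
        curr = q.popleft()
        order.append(curr)       # "visit" curr
        for v in adj[curr]:
            if v not in marked:  # marked vertices are never queued again
                marked.add(v)
                q.append(v)
    return order

# Assumed adjacency lists for the example graph (undirected: each edge twice).
GRAPH = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["E", "F"],
    "E": ["A", "D", "F"],
    "F": ["C", "E", "D"],
    "G": ["B"],
}
order = bfs(GRAPH, "A")
```

Marking a vertex when it is enqueued, rather than when it is dequeued, is what guarantees each vertex enters the queue at most once.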
The general BFS algorithm
• Each vertex can be in one of three states:
• Unmarked and not on queue
• Marked and on queue
• Marked and off queue
• The algorithm moves vertices between these states
Handling vertices
• Unmarked and not on queue:
• Not reached yet
• Marked and on queue:
• Known, but adjacent vertices not visited yet (possibly)
• Marked and off queue:
• Known, all adjacent vertices on queue or done with
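The three states can be made explicit in code. This is a sketch under assumptions: the `State` names, the `bfs_states` function, and the small sample graph are all hypothetical and chosen only to show the transitions.

```python
from collections import deque
from enum import Enum

class State(Enum):
    UNREACHED = 1   # unmarked and not on queue
    QUEUED = 2      # marked and on queue
    DONE = 3        # marked and off queue

def bfs_states(adj, s):
    """Run BFS from s; record (vertex, snapshot of all states) after each step."""
    state = {v: State.UNREACHED for v in adj}
    state[s] = State.QUEUED
    q = deque([s])
    history = []
    while q:
        curr = q.popleft()
        state[curr] = State.DONE              # QUEUED -> DONE
        for v in adj[curr]:
            if state[v] is State.UNREACHED:
                state[v] = State.QUEUED       # UNREACHED -> QUEUED
                q.append(v)
        history.append((curr, dict(state)))
    return history

# Illustrative graph: a triangle s-a-b with a tail b-c.
adj = {"s": ["a", "b"], "a": ["s", "b"], "b": ["s", "a", "c"], "c": ["b"]}
history = bfs_states(adj, "s")
```

Each vertex moves through the states in one direction only, UNREACHED to QUEUED to DONE, which is why the loop terminates.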

[Figure: the example graph on vertices A, B, C, D, E, F, G, redrawn at each step]

Visited: (none). Queue: A

Visited: A. Queue: B E
Take A off the queue; mark its neighbors B and E and put them in the queue.

Visited: A B. Queue: E C G
Now take B off queue, and queue its neighbors.

Visited: A B E. Queue: C G D F
Do same with E.

Visited: A B E C. Queue: G D F
Visit C.
Its neighbor F is already marked, so not queued.

Visited: A B E C G. Queue: D F
Visit G.

Visited: A B E C G D. Queue: F
Visit D. F, E marked so not queued.

Visited: A B E C G D F. Queue: (empty)
Visit F.
E, D, C marked, so not queued again.

Done. We have explored the graph in order:

A B E C G D F.

Interesting features of BFS
• Complexity: O(|V| + |E|)
• Every vertex reachable from s is put on the queue exactly once
• For each vertex taken off the queue, we examine its edges
• In other words, we traverse each edge once (twice in an undirected graph, once from each endpoint)
• BFS finds shortest path from s to each vertex
• Shortest in terms of number of edges
• Why does this work?
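One way to see why this works: the queue hands out vertices in non-decreasing distance from s, so the first time a vertex is reached is along a path with the fewest edges. The sketch below records that distance and the predecessor of each vertex. The function names are hypothetical, and the `GRAPH` adjacency lists are an assumption reconstructed from the narrated walkthroughs in this deck.

```python
from collections import deque

def bfs_distances(adj, s):
    """Return (dist, parent): edge-count distance from s and BFS predecessor."""
    dist = {s: 0}
    parent = {s: None}
    q = deque([s])
    while q:
        curr = q.popleft()
        for v in adj[curr]:
            if v not in dist:                 # first discovery = shortest route
                dist[v] = dist[curr] + 1
                parent[v] = curr
                q.append(v)
    return dist, parent

def path_to(parent, v):
    """Follow predecessors back to the start and return the path s..v."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]

# Assumed adjacency lists for the example graph (undirected: each edge twice).
GRAPH = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["E", "F"],
    "E": ["A", "D", "F"],
    "F": ["C", "E", "D"],
    "G": ["B"],
}
dist, parent = bfs_distances(GRAPH, "A")
```

With these assumed lists, F is first discovered from E, so the recovered path from A to F has two edges even though F is also reachable through B and C in three.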
Depth-first search
• Again, a simple and powerful algorithm
• Given a starting vertex s
• Pick an adjacent vertex, visit it.
• Then visit one of its adjacent vertices
• …
• Until impossible, then backtrack, visit another
DFS(graph g, vertex s)
Assume all vertices initially unmarked
• mark s
• visit s // e.g., print its data
• for each edge <s, v>
    • if v is not marked
        • DFS(g, v)
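The recursive pseudocode maps onto Python almost line for line. In this sketch the function name `dfs`, the optional `marked` and `order` parameters, and the dict-of-lists format are assumptions. The `GRAPH` adjacency lists are likewise assumed: they were reconstructed from the narrated walkthroughs in this deck, with neighbor order chosen to match the choices described there.

```python
def dfs(adj, s, marked=None, order=None):
    """Return the vertices reachable from s in depth-first (preorder) order."""
    if marked is None:           # first call: all vertices start unmarked
        marked, order = set(), []
    marked.add(s)
    order.append(s)              # "visit" s
    for v in adj[s]:
        if v not in marked:
            dfs(adj, v, marked, order)   # go deeper before trying siblings
    return order

# Assumed adjacency lists for the example graph (undirected: each edge twice).
GRAPH = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["E", "F"],
    "E": ["A", "D", "F"],
    "F": ["C", "E", "D"],
    "G": ["B"],
}
order = dfs(GRAPH, "A")
```

Backtracking is implicit: it happens whenever a recursive call returns. For very deep graphs, Python's recursion limit can become a constraint, in which case an explicit stack is a common alternative.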

[Figure: the example graph on vertices A, B, C, D, E, F, G, redrawn at each step]

Current vertex: A

Current: B
Expand A’s adjacent vertices. Pick one (B).
Mark it and visit it.

Current: C
Now expand B, and visit its neighbor, C.

Current: F
Visit F.
Pick one of its neighbors, E.

Current: E
E’s adjacent vertices are A, D and F.
A and F are marked, so pick D.

Current: D
Visit D. No new vertices available. Backtrack to E. Backtrack to F. Backtrack to C. Backtrack to B.

Current: G
Visit G. No new vertices from here. Backtrack to B. Backtrack to A. E already marked so no new vertices.

Done. We have explored the graph in order:

A B C F E D G

Interesting features of DFS
• Complexity: O(|V| + |E|)
• All vertices visited once, then marked
• For each visited vertex, we examine all its edges
• In other words, we traverse each edge once (twice in an undirected graph, once from each endpoint)
• DFS does not necessarily find shortest path
• Why?
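A concrete answer can be read off the example: DFS commits to the first unmarked neighbor and keeps going, so it may reach a vertex by a long detour even when a direct edge exists. The sketch below records the DFS predecessor of each vertex. The function names are hypothetical, and the `GRAPH` adjacency lists are an assumption reconstructed from the narrated walkthroughs in this deck.

```python
def dfs_parents(adj, s, parent=None):
    """Return a dict mapping each reached vertex to its DFS predecessor."""
    if parent is None:
        parent = {s: None}       # membership in `parent` doubles as the mark
    for v in adj[s]:
        if v not in parent:
            parent[v] = s
            dfs_parents(adj, v, parent)
    return parent

def path_to(parent, v):
    """Follow predecessors back to the start and return the path s..v."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]

# Assumed adjacency lists for the example graph (undirected: each edge twice).
GRAPH = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["E", "F"],
    "E": ["A", "D", "F"],
    "F": ["C", "E", "D"],
    "G": ["B"],
}
parent = dfs_parents(GRAPH, "A")
dfs_path = path_to(parent, "E")
```

With these assumed lists, DFS reaches E through B, C and F, a path of four edges, although A and E are adjacent and BFS would have found the one-edge path.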