CS 163 Data Structures Chapter 8 Graphs, Strongly Connected Components

CS 163 Data Structures Chapter 8 Graphs, Strongly Connected Components Herbert G. Mayer, PSU Status 5/21/2015

Syllabus • Definition of Graph • Building a Graph From Edges • Graph Data Structure • Strongly Connected Components • Basic Blocks • Control Flow Graph • References

Formal Definition of Graph
• Empty Graph: For simplicity and expediency we ignore the possibility of a graph G being empty
• Graph: a data structure G = { V, E } consisting of a set V of vertices, AKA nodes, and a set E of edges. Any node vi ϵ V may be connected to any other node vj; such a connection is called an edge. Edges may be directed, or even bi-directed. Different from a tree, a node in G may have any number of predecessors, i.e. incoming edges
• Connected Graph: If all n > 0 nodes in G are connected somehow, the graph G is called connected, regardless of edge directions
• Strongly Connected Component: A subset SG ⊆ G is strongly connected if every node vi in SG can reach every node vj in SG along directed edges, not necessarily in one single step
• Directed Acyclic Graph (DAG): A DAG is a graph with directed edges that form no cycle. A node may still have multiple predecessors
• When programming graphs, it is convenient to add fields to the node type for auxiliary functions; e.g. it is possible to process all nodes in a linear fashion by adding a link field, here called the "finger" field
• Sample 1: building a stack of all nodes in G
• Sample 2: traversing all nodes in G, even though G is unconnected

Building a Graph

Graph Data Structure
A graph G( v, e ) consists of nodes v and edges e
• Implemented via a suitable node_type data structure
G is identified and thus accessible via one select node, called entry node, or simply entry, AKA head
Head is of type pointer to node_type
G is not necessarily connected
• If parts of G are unconnected, how can they be retrieved in case of a necessary, complete graph traversal?
Several methods of forcing complete access:
• Either create a super-node S, not specified by the user of G, such that S points at each unconnected region
• Or have a linked-list (LL) meandering through each node of G, without this link field being part of G proper; e.g. finger

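Sample Graph G0
[Figure: directed graph G0 with six nodes named 1 through 6; the nodes carry the attributes R, Y, G, B, O, V. The edges are not recoverable from this transcript.]
How many Strongly Connected Components in G0?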

Graph Data Structure
Sample Graph G0 above has 6 nodes
The ID, AKA name, of each node is shown next to the node, e.g. 1 2 3 4 5 …
The graph's node type data structure includes this name as part of node_type
In addition, each node in G0 has attributes, such as R, G, Y etc. in the sample above
There may be many more attributes belonging to each node, depending on what the graph will be used for
All of these attributes must also be declared in the node_type data structure
And the successors, if any, of each node must be encoded in the node somehow; there is no limit on their number!
G0 has 3 SCCs; 2 of those are trivial, thus not interesting!

Graph Data Structure
Since in general there is no inherent upper bound on the number of successor nodes, a suitable way to define successors is via a linked list
Thus the data type for successors is a pointer to a link node
Link nodes are allocated off the heap, as needed, and are of type link_type
Each link consists of just 2 fields
• One field pointing to the next link, if any; the type is pointer to link_type, in some languages expressed as *link_type
• The other field pointing to the successor node; the type is pointer to node_type
For convenience, the last link inserted is added at the head of the list, saving repeated searches for the list end

Graph Data Structure, Link
// node may have any number of successors
// all need to be retrieved
// so each node in G has a link pointer,
// pointing to Llist of all successor nodes.
// The last one connected is the first one in the list
typedef struct link_tp * link_ptr_tp; // announce fwd
typedef struct node_tp * node_ptr_tp; // announce fwd
typedef struct link_tp
{
    link_ptr_tp next_link; // point to next link
    node_ptr_tp next_node; // point to successor node
} str_link_tp;
#define LINK_SIZE sizeof( str_link_tp )

Graph Data Structure, Node
// "name" is arbitrary number given during creation
// "link" is head of Llist of successor nodes, while
// "finger" is linear link through all nodes
// "visited" is true if node was visited; initially FALSE
typedef struct node_tp * node_ptr_tp; // done earlier
typedef struct node_tp
{
    link_ptr_tp link;    // Llist of successors
    node_ptr_tp finger;  // finger through all nodes
    int name;            // name given at creation
    bool visited;        // to check connectivity
    // others ...        // many other fields
} str_node_tp;
#define NODE_SIZE sizeof( str_node_tp )

Building a Graph
// create a node in graph G, identified by "name"
// connect to the global "finger" at head of Llist
// fields lowlink and number are the SCC fields of node_tp
node_ptr_tp make_node( int name )
{ // make_node
    node_ptr_tp node = (node_ptr_tp) malloc( NODE_SIZE );
    // check once for non-NULL here, not on user side!
    ASSERT( node, "no space for node in heap!" );
    node->finger  = finger; // re-assign finger!!
    node->lowlink = NIL;    // int, not pointer
    node->number  = NIL;    // int type
    node->link    = NULL;   // pointer type
    node->name    = name;   // IDs this node
    node->visited = FALSE;  // initially
    finger = node;          // now link to "this"
    return node;
} //end make_node
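Because every new node is hooked into the finger list, a complete traversal is possible even when G is unconnected. The following is a minimal sketch of that idea, not the course's own code: the reduced node type and the helper names add_node() and count_nodes() are hypothetical, and only the global finger follows the slide above.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced node type: only the fields this sketch needs (assumption). */
typedef struct node_tp * node_ptr_tp;
typedef struct node_tp
{
    node_ptr_tp finger;   /* linear link through all nodes */
    int         name;     /* name given at creation */
} str_node_tp;

static node_ptr_tp finger = NULL;   /* global head of the finger list */

/* Hypothetical helper: allocate a node and hook it in at the list head. */
node_ptr_tp add_node( int name )
{
    node_ptr_tp node = (node_ptr_tp) malloc( sizeof( str_node_tp ) );
    assert( node );                 /* no space for node in heap? */
    node->name   = name;
    node->finger = finger;          /* old head becomes second element */
    finger       = node;            /* new node becomes the head */
    return node;
}

/* Hypothetical helper: visit every node via finger; edges play no role. */
int count_nodes( void )
{
    int count = 0;
    for ( node_ptr_tp n = finger; n; n = n->finger ) {
        count++;
    }
    return count;
}
```

The loop in count_nodes() never follows an edge, so isolated nodes and unconnected regions are reached just like any other node.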

Building a Graph from Edges
// input is list of pairs, each element being a node name (a number)
// craft edge from first to second name
// If a node is new: create it; else use ptr = exists()
// scanf() returns EOF, which is non-zero, at end of input: compare against 2
while( scanf( "%d%d", &a, &b ) == 2 ) { // a, b are ints
    if ( ! ( first = exists( a ) ) ) {   // 'a' new node?
        first = make_node( a );          // allocate 'a'
    } //end if
    if ( ! ( second = exists( b ) ) ) {  // 'b' new node?
        second = make_node( b );         // allocate 'b'
    } //end if
    // both exist. Either created, or pre-existed: Connect!
    if ( new_link( first, second ) ) {
        link = make_link( first->link, second );
        ASSERT( link, "no space for link node" );
        first->link = link;
    }else{
        // link was there already, no need to add again!
        printf( "<><> skip duplicate link %d->%d\n", a, b );
    } //end if
} //end while

Building a Graph
// check whether link between these 2 nodes already exists
// if not, return true: New! Else return false, NOT new!
bool new_link( node_ptr_tp first, node_ptr_tp second )
{ // new_link
    int target = second->name;
    link_ptr_tp link = first->link;
    while ( link ) {
        if ( target == link->next_node->name ) {
            return FALSE; // it is an existing link, NOT new
        } //end if
        // check next link, if any
        link = link->next_link;
    } //end while
    // none of the successors equals the second node's name
    return TRUE; // is a new link
} //end new_link
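The edge-reading loop calls exists() and make_link(), which the slides do not show. The sketch below is one plausible way to write them, inferred only from how they are called: exists() is assumed to search the finger list by name, and make_link() is assumed to allocate a link that points at the successor and is placed in front of the current list head. The reduced node type is likewise an assumption.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct link_tp * link_ptr_tp;
typedef struct node_tp * node_ptr_tp;

typedef struct link_tp
{
    link_ptr_tp next_link;   /* point to next link */
    node_ptr_tp next_node;   /* point to successor node */
} str_link_tp;

typedef struct node_tp       /* reduced to the fields used here */
{
    link_ptr_tp link;        /* Llist of successors */
    node_ptr_tp finger;      /* finger through all nodes */
    int         name;        /* name given at creation */
} str_node_tp;

static node_ptr_tp finger = NULL;   /* global head of the finger list */

/* Assumed behavior: return the node called "name", or NULL if none. */
node_ptr_tp exists( int name )
{
    for ( node_ptr_tp n = finger; n; n = n->finger ) {
        if ( n->name == name ) {
            return n;
        }
    }
    return NULL;
}

/* Assumed behavior: new link to "successor", inserted before "next".
   Returns NULL when the heap is exhausted; the caller checks that. */
link_ptr_tp make_link( link_ptr_tp next, node_ptr_tp successor )
{
    link_ptr_tp link = (link_ptr_tp) malloc( sizeof( str_link_tp ) );
    if ( link ) {
        link->next_link = next;
        link->next_node = successor;
    }
    return link;
}
```

Since exists() walks the whole finger list, building a graph with n nodes this way costs on the order of n steps per lookup; a table indexed by name would be faster, at the price of knowing the name range in advance.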

Strongly Connected Components: Optional

Strongly Connected Components
We'll analyse graphs for the attribute of strong connectivity
Using the best method known to date: by Robert E Tarjan, in his awesome 1972 SIAM paper: Pure beauty in Computer Science
Requires special fields in the graph node, which we just add to any regular node: int number and int lowlink
typedef struct node_tp
{
    link_ptr_tp link;    // points to Llist of successors
    node_ptr_tp finger;  // finger through all nodes
    int lowlink;         // Tarjan's lowlink
    int number;          // Tarjan's number
    int name;            // name given during creation
    bool visited;        // avoid repeat visit
} str_node_tp;
#define NODE_SIZE sizeof( str_node_tp )

Strongly Connected Components
• Every node vi in a strongly connected component SCC of graph G can reach every node vj of that SCC, not necessarily in one single step
• An SCC is a subgraph SG of graph G, SG ⊆ G
• By definition then, a singleton node graph is strongly connected; not very interesting, but this shows up when we discuss Tarjan's method to uncover SCCs
• We'll enhance Tarjan's code to filter out singleton-node SCCs
• It is not required that an SCC have a single entry point, single exit point, and a single back-edge
• Graph needs defined entry point: named entry or head
• Tarjan's SCC analysis may start at any node vi ϵ G
• Correctness proof in Tarjan's beautiful 1972 paper
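The definition can be checked directly, if slowly, by testing that every node reaches every node. The sketch below does this for a whole graph held in an adjacency matrix; it is a brute-force illustration of the definition, not Tarjan's method, and the names MAX_NODES, mark_reachable() and strongly_connected() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 8   /* arbitrary bound for this sketch */

/* adj[ v ][ w ] is true if there is a directed edge v -> w */

/* Depth-first search: mark every node reachable from v, including v. */
static void mark_reachable( bool adj[][ MAX_NODES ], int n, int v, bool seen[] )
{
    seen[ v ] = true;
    for ( int w = 0; w < n; w++ ) {
        if ( adj[ v ][ w ] && !seen[ w ] ) {
            mark_reachable( adj, n, w, seen );
        }
    }
}

/* True if each of the nodes 0..n-1 can reach all of the nodes 0..n-1. */
bool strongly_connected( bool adj[][ MAX_NODES ], int n )
{
    for ( int v = 0; v < n; v++ ) {
        bool seen[ MAX_NODES ];
        memset( seen, 0, sizeof seen );
        mark_reachable( adj, n, v, seen );
        for ( int w = 0; w < n; w++ ) {
            if ( !seen[ w ] ) {
                return false;   /* v cannot reach w */
            }
        }
    }
    return true;
}
```

A 3-node cycle passes this test, a 3-node chain does not, and a single node passes trivially, which is the singleton case mentioned above. The check runs one search per node, whereas Tarjan's method needs only a single depth-first pass over the graph.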

Strongly Connected Components
// Pseudo code for Tarjan's method
// of detecting SCCs in directed graph
int scc_number = 0     // global to scc()
Pointer to scc_stack   // initially empty
int scc_count = 0      // how many SCCs?
procedure main()       // pseudo code
{ // main
    // assume, or verify stack is empty!
    // mark all nodes in G as 'not visited'
    for each node w ϵ G not yet visited, do
        scc( w )
    end for
} // end main

Strongly Connected Components
// Pseudo code for Tarjan's method of detecting SCCs in directed graph G
// Nodes in G in Tarjan's notation have added fields lowlink and number
// Also, there is a stack of nodes, encoded via scc_stack
procedure scc( node_ptr_tp v )
{ // scc
    lowlink( v ) := number( v ) := ++scc_number  -- use global: scc_number
    push( v )                                    -- changes global: scc_stack
    for all successors w of v do
        if w is not visited then                 -- w has no number yet: v->w is a tree arc
            scc( w )
            lowlink( v ) := min( lowlink( v ), lowlink( w ) )
        elsif number( w ) < number( v ) then     -- v->w is a frond or cross link
            if in_stack( w ) then
                lowlink( v ) := min( lowlink( v ), number( w ) )
            end if
        end if
    end for
    if lowlink( v ) == number( v ) then          -- next scc found
        scc_count++                              -- just count number of SCCs
        printf( "New SCC %d found.\n", scc_count );
        while scc_stack is not empty and its top node w has number( w ) >= number( v ) do
            printf( "Node %d is part of it.\n", w->name );
            pop( w )
        end while
    end if
} // end scc

SCC Sample – Omit Trivial SCCs

3-Node SCC Sample
[Figure: three nodes named 1, 2 and 3, each annotated with the fields Num, Low and Stack; the field values are not recoverable from this transcript.]

3-Node SCC Sample
Outside call to scc( node 1 )
• Increments scc_number to 1, sets fields .number and .lowlink = 1
• Pushes node 1, i.e., stack = 1, node 1's predecessor = null
• Recursive call to scc( node 2 )
• When done, find that lowlink = number, hence SCC: it consists of nodes 1 and 2
Recursive call from node 1: scc( node 2 )
• Set node 2's .number and .lowlink = 2
• Stack points to node 2, node 2.pred is node 1
• Has 2 successors: node 1 and node 3
• Node 1 is already numbered and still on the stack, so node 2's .lowlink drops to 1; node 3 is new and causes scc( node 3 )
Recursive call from node 2: scc( node 3 )
• Set .number and .lowlink = 3
• Stack points to node 3, node 3.pred is node 2
• Has no successor
• lowlink = number, hence an SCC, but only a singleton-node SCC

Strongly Connected Components
////////////////////////////////////////////////////////////////////////
////////////         S C C   G r a p h   A n a l y s i s    ////////////
////////////////////////////////////////////////////////////////////////
// globals for scc
int scc_number = 0;            // Tarjan's SCC numbers
node_ptr_tp scc_stack = NULL;  // stack exists via link in nodes
int scc_count = 0;             // tracks # of SCCs

// global "scc_stack" simulates stack via Llist in nodes
// each node has scc_pred link, linking up in fashion of a stack
void push( node_ptr_tp v )
{ // push
    ASSERT( v, "push() called with NIL vertex v" );
    ASSERT( !( v->visited ), "pushing vertex again?" );
    v->scc_pred = scc_stack;   // first time NULL, then stack ptr
    v->visited = TRUE;         // will be handled now
    scc_stack = v;             // global now points to new top of stack
} //end push

// starting with global scc_stack, we can traverse whole stack
// all elements are connected by node field scc_pred
void pop()
{ // pop
    ASSERT( scc_stack, "error, empty SCC stack" );
    scc_stack->visited = FALSE; // remove from stack
    scc_stack = scc_stack->scc_pred;
} //end pop

SCC Coded, Part 1
void scc( node_ptr_tp v )
{ // scc
    node_ptr_tp w;
    ASSERT( v, "calling scc with NULL pointer" );
    ASSERT( !v->number, "node already has non-null number!" );
    v->number = v->lowlink = ++scc_number;
    push( v );
    for( link_ptr_tp link = v->link; link; link = link->next_link ) {
        w = link->next_node;
        ASSERT( w, "node w linked as successor must be != 0" );
        if( ! w->number ) {
            // if number is 0: not yet SCC'ed
            scc( w );
            v->lowlink = min( v->lowlink, w->lowlink );
        }else if( w->number < v->number ) {
            // frond, AKA "cross link"
            if( w->visited ) {
                // visited means: is on stack
                v->lowlink = min( v->lowlink, w->number );
            } //end if
        } //end if
    } //end for
    // continued in Part 2: now we can pop

SCC Coded, Part 2
    // now see whether v is the root of an SCC
    // and if so, record SCC number and all nodes belonging to it
    if ( v->lowlink == v->number ) {
        // found next scc; but if singleton node SCC, then skip it
        if ( scc_stack == v ) {
            // yes, singleton node; so be silent! Uninteresting!
            pop();
        }else{
            // multi-node SCC; so we do consider it
            scc_count++;
            cout << "next scc number is: " << scc_count << endl;
            while( scc_stack && ( scc_stack->number >= v->number ) ) {
                cout << "node " << scc_stack->name << " is in." << endl;
                pop();
            } //end while
        } //end if
    } //end if
} //end scc
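To see the method run end to end, the sketch below applies the same logic to the 3-node sample (edges 1->2, 2->1, 2->3). It is an array-based adaptation written for brevity, not the linked-node code above: the adjacency matrix edge, the arrays on_stack and scc_of, and the functions min_int() and run_sample() are hypothetical names, with on_stack playing the role of the visited field.

```c
#include <assert.h>
#include <stdbool.h>

#define N 4                     /* nodes are named 1..3; index 0 is unused */

static bool edge[ N ][ N ];     /* edge[ v ][ w ]: directed edge v -> w */
static int  number[ N ];        /* Tarjan's number; 0 means not yet numbered */
static int  lowlink[ N ];       /* Tarjan's lowlink */
static bool on_stack[ N ];      /* plays the role of "visited" */
static int  stack[ N ];         /* explicit stack of node names */
static int  top = 0;            /* number of nodes currently on the stack */
static int  scc_number = 0;     /* Tarjan's SCC numbers */
static int  scc_count = 0;      /* counts multi-node SCCs only */
static int  scc_of[ N ];        /* 0 for a singleton, else index of the SCC */

static int min_int( int a, int b ) { return a < b ? a : b; }

static void scc( int v )
{
    number[ v ] = lowlink[ v ] = ++scc_number;
    stack[ top++ ] = v;                          /* push( v ) */
    on_stack[ v ] = true;
    for ( int w = 1; w < N; w++ ) {
        if ( !edge[ v ][ w ] ) {
            continue;                            /* w is no successor of v */
        }
        if ( !number[ w ] ) {                    /* tree arc */
            scc( w );
            lowlink[ v ] = min_int( lowlink[ v ], lowlink[ w ] );
        } else if ( number[ w ] < number[ v ] && on_stack[ w ] ) {
            lowlink[ v ] = min_int( lowlink[ v ], number[ w ] );
        }
    }
    if ( lowlink[ v ] == number[ v ] ) {         /* v is the root of an SCC */
        /* singleton if v itself is on top of the stack: gets index 0 */
        int id = ( stack[ top - 1 ] == v ) ? 0 : ++scc_count;
        while ( top > 0 && number[ stack[ top - 1 ] ] >= number[ v ] ) {
            int w = stack[ --top ];              /* pop() */
            on_stack[ w ] = false;
            scc_of[ w ] = id;
        }
    }
}

/* Build the 3-node sample and analyse it; returns # of multi-node SCCs. */
int run_sample( void )
{
    edge[ 1 ][ 2 ] = edge[ 2 ][ 1 ] = edge[ 2 ][ 3 ] = true;
    for ( int v = 1; v < N; v++ ) {
        if ( !number[ v ] ) {
            scc( v );
        }
    }
    return scc_count;
}
```

Calling run_sample() finds one multi-node SCC made of nodes 1 and 2, while node 3 is recognized as a singleton and skipped, matching the walk-through of the sample.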

Basic Blocks
SCCs are usually uncovered for control flow graphs, AKA CFG; the goal is to identify loops, specifically inner loops
A CFG is a graph of basic blocks, AKA BB
• Each node in a CFG is a BB
• An edge indicates transfer of control from one BB to another
This control transfer is static: an edge records a possible transfer; it is unknown at compile time whether such a branch/call etc. will actually be executed
Transfer instructions can have 1, 2, or more successors
• An unconditional branch has 1 successor, the destination
• A call has 2 successors, the destination and the instruction after the call; that's where the callee returns to
• A conditional branch has 2 successors, the destination and the fall-through location; like a call
• An indexed branch, used for a branch table, has any number of successors; the table entry is computed at run-time

Basic Blocks
Def: A Basic Block is a sequence of i instructions, i > 0, with one point of entry (header) and one point of exit; the two don't need to be distinct!
The point of entry for a BB may be the target of a branch or call, or the fall-through of a conditional branch
The point of exit for a BB may be a branch or call instruction itself, or a conditional branch, but also a regular, sequential instruction, such as an integer add
• This happens if the next instruction is the target of some other branch, i.e. that next instruction is the entry point of a new BB
Given a sequence of instructions for a program, BB analysis is a two-pass process; its inputs are the instruction sequence and knowledge of the first instruction to be executed, by default instruction 1
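One common way to organize such an analysis is to mark block entry points ("leaders") first and count or form blocks afterwards. The sketch below follows that textbook rule; it is an assumption and may differ from the two-pass algorithm the course presents elsewhere. The instruction encoding (op_tp, instr_tp) and the function find_basic_blocks() are hypothetical, and instructions are indexed from 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction kinds for this sketch */
typedef enum { OP_PLAIN, OP_BRANCH, OP_COND_BRANCH } op_tp;

typedef struct
{
    op_tp op;       /* kind of instruction */
    int   target;   /* index of the destination; ignored for OP_PLAIN */
} instr_tp;

/* Pass 1 marks leaders, pass 2 counts them. Returns the number of BBs;
   leader[ i ] is true if instruction i is the entry point of a BB. */
int find_basic_blocks( const instr_tp code[], int n, bool leader[] )
{
    for ( int i = 0; i < n; i++ ) {
        leader[ i ] = false;
    }
    if ( n > 0 ) {
        leader[ 0 ] = true;                  /* first instruction executed */
    }
    for ( int i = 0; i < n; i++ ) {          /* pass 1 */
        if ( code[ i ].op != OP_PLAIN ) {
            assert( code[ i ].target >= 0 && code[ i ].target < n );
            leader[ code[ i ].target ] = true;   /* target of a branch */
            if ( i + 1 < n ) {
                leader[ i + 1 ] = true;      /* instruction after a transfer */
            }
        }
    }
    int blocks = 0;
    for ( int i = 0; i < n; i++ ) {          /* pass 2 */
        if ( leader[ i ] ) {
            blocks++;
        }
    }
    return blocks;
}
```

For the five instructions plain, conditional branch to 4, plain, branch to 1, plain, the leaders are instructions 0, 1, 2 and 4, giving four blocks. The first block ends in a regular sequential instruction only because instruction 1 is a branch target, which is the case described in the definition above.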

Basic Block Analysis • Is a two-pass algorithm • Shown in separate document • Ditto for cfg generation • Would be too advanced for CS 163

References
• Control Flow Graph: H. Mayer, "Parallel Execution Enabled by Refined Source Analysis: Cost and Benefits in a Supercompiler", R. Oldenbourg Verlag, München/Wien, March 1997
• Graphs: C. Berge, "Graphs and Hypergraphs", North-Holland, Amsterdam, 1973
• SCCs: Robert Tarjan, "Depth-First Search and Linear Graph Algorithms", SIAM J. Computing, Vol. 1, No. 2, June 1972
