
CS 163 Data Structures Chapter 8 Graphs, Strongly Connected Components

Herbert G. Mayer, PSU, Status 5/21/2015. Syllabus: Definition of Graph; Building a Graph From Edges; Graph Data Structure; Strongly Connected Components; Basic Blocks; Control Flow Graph; References.



  1. CS 163 Data Structures, Chapter 8: Graphs, Strongly Connected Components. Herbert G. Mayer, PSU, Status 5/21/2015

  2. Syllabus • Definition of Graph • Building a Graph From Edges • Graph Data Structure • Strongly Connected Components • Basic Blocks • Control Flow Graph • References

  3. Formal Definition of Graph
     • Empty Graph: for simplicity and expediency we ignore the possibility of a graph G being empty
     • Graph: a data structure G = { V, E } consisting of a set V of vertices, AKA nodes, and a set E of edges. Any node vi ∈ V may be connected to any other node vj; such a connection is called an edge. Edges may be directed, or even bi-directed. Unlike in a tree, a node in G may have any number of predecessors, i.e. incoming edges
     • Connected Graph: if all n > 0 nodes of G are connected somehow, regardless of edge directions, G is called connected (for directed graphs this is also called weakly connected)
     • Strongly Connected Component: a subset SG ⊆ G is strongly connected if every node vi in SG can reach every other node vj in SG via a directed path, possibly of several edges; an SCC is a maximal strongly connected subset
     • Directed Acyclic Graph (DAG): a graph with directed edges that form no cycle; a node may still have multiple predecessors
     • When programming graphs, it is convenient to add fields to the node type for auxiliary functions; e.g. all nodes can be processed in a linear fashion by adding a link field, here called the "finger" field
     • Sample 1: building a stack of all nodes in G
     • Sample 2: traversing all nodes in G, even if G is unconnected!
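A minimal C sketch of Samples 1 and 2, assuming a reduced node type that carries only the finger, name, and visited fields; the helper names add_node() and reset_and_count() are hypothetical. Because each new node is placed at the front of the finger list, that list is a stack of all nodes, and walking it reaches every node even when G has unconnected parts, since it never follows edges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Reduced node type for this sketch (assumption): no successor list. */
typedef struct node_tp *node_ptr_tp;
typedef struct node_tp {
    node_ptr_tp finger;   /* links ALL nodes, independent of edges */
    int name;
    bool visited;
} str_node_tp;

static node_ptr_tp finger = NULL;   /* head of the finger list */

/* Sample 1 (hypothetical helper): each new node is pushed on the front
   of the finger list, so the finger list is a stack of all nodes in G. */
node_ptr_tp add_node(int name)
{
    node_ptr_tp node = malloc(sizeof(str_node_tp));
    assert(node != NULL);
    node->name = name;
    node->visited = false;
    node->finger = finger;   /* old head becomes the second element */
    finger = node;           /* new node is now on top */
    return node;
}

/* Sample 2 (hypothetical helper): visit every node, connected or not,
   by walking the finger list; here we reset 'visited' and count nodes. */
int reset_and_count(void)
{
    int count = 0;
    for (node_ptr_tp n = finger; n != NULL; n = n->finger) {
        n->visited = false;
        count++;
    }
    return count;
}
```

After add_node(1), add_node(2), add_node(3), the finger list reads 3, 2, 1: the most recently created node is on top, exactly like a stack.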

  4. Building a Graph

  5. Graph Data Structure
     • A graph G( V, E ) consists of nodes V and edges E, implemented via a suitable node_type data structure
     • G is identified, and thus accessible, via one designated node, called the entry node, or simply entry, AKA head; head is of type pointer to node_type
     • G is not necessarily connected. If parts of G are unconnected, how can they be retrieved when a complete graph traversal is necessary? Several methods force complete access:
        • Either create a super-node S, not specified by the user of G, such that S points to each unconnected region
        • Or have a linked list (LL) meander through every node of G, with this link field not being part of G proper; e.g. the finger

  6. Sample Graph G0 [Figure: sample graph G0, 6 nodes named 1 to 6, with color attributes R, Y, G, B, O, V] How many Strongly Connected Components are in G0?

  7. Graph Data Structure
     • Sample graph G0 above has 6 nodes
     • The ID, AKA name, of each node is shown next to it, e.g. 1, 2, 3, 4, 5, ...; the graph's node_type data structure includes this name information
     • In addition, each node in G0 has attributes, such as R, G, Y etc. in the sample above. There may be many more attributes per node, depending on what the graph will be used for; each of them must also be declared in node_type
     • The successors of each node, if any, must also be encoded in the node somehow; there is no limit on their number!
     • G0 has 3 SCCs; 2 of those are trivial, thus not interesting!

  8. Graph Data Structure
     • Since in general there is no inherent upper bound on the number of successor nodes, a suitable way to define successors is via a linked list
     • Thus the data type for successor is a pointer to a link node
     • Link nodes, of type link_type, are also allocated off the heap as needed
     • Each link consists of just 2 fields:
        • One field points to the next link, if any; its type is pointer to link_type, in some languages expressed as *link_type
        • The other field points to the successor node; its type is pointer to node_type
     • For convenience, each new link is inserted at the head of the list, so no search for the end of the list is needed

  9. Graph Data Structure, Link

    // a node may have any number of successors;
    // all need to be retrieved, so each node in G has a link pointer,
    // pointing to an Llist of all successor nodes.
    // The last one connected is the first one in the list
    typedef struct link_tp * link_ptr_tp;   // forward declaration
    typedef struct node_tp * node_ptr_tp;   // forward declaration
    typedef struct link_tp
    {
        link_ptr_tp next_link;   // point to next link
        node_ptr_tp next_node;   // point to successor node
    } str_link_tp;
    #define LINK_SIZE sizeof( str_link_tp )
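The slides call make_link() when building a graph but do not show it. A possible version, assuming it returns NULL when the heap is exhausted (so the caller's ASSERT catches it) and inserts the new link at the head of the successor list as slide 8 describes; the reduced node type and the wrapper add_successor() are hypothetical, the latter mirroring the three-line call pattern used when edges are added.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct link_tp *link_ptr_tp;   /* forward declarations */
typedef struct node_tp *node_ptr_tp;

typedef struct link_tp {
    link_ptr_tp next_link;   /* next link in the successor list */
    node_ptr_tp next_node;   /* the successor node itself */
} str_link_tp;
#define LINK_SIZE sizeof(str_link_tp)

/* Reduced node type for this sketch: only the list head and a name. */
typedef struct node_tp {
    link_ptr_tp link;        /* head of the successor Llist */
    int name;
} str_node_tp;

/* Hypothetical make_link(): new link to 'succ', placed in front of 'head'.
   Returns NULL if the heap is exhausted; the caller checks for that. */
link_ptr_tp make_link(link_ptr_tp head, node_ptr_tp succ)
{
    link_ptr_tp link = malloc(LINK_SIZE);
    if (link != NULL) {
        link->next_link = head;   /* old list hangs off the new link */
        link->next_node = succ;
    }
    return link;
}

/* Hypothetical wrapper for the usual call pattern. */
void add_successor(node_ptr_tp from, node_ptr_tp to)
{
    link_ptr_tp link = make_link(from->link, to);
    assert(link != NULL && "no space for link node");
    from->link = link;        /* newest successor is now first */
}
```

Because insertion is at the head, successors come back in reverse order of insertion: after adding b and then c to a, a's list reaches c first, then b.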

  10. Graph Data Structure, Node

    // "name" is an arbitrary number given during creation
    // "link" is the head of the Llist of successor nodes, while
    // "finger" is a linear link through all nodes
    // "visited" is true once the node was visited; initially FALSE
    #include <stdbool.h>                    // bool in C
    typedef struct node_tp * node_ptr_tp;   // done earlier
    typedef struct node_tp
    {
        link_ptr_tp link;     // Llist of successors
        node_ptr_tp finger;   // finger through all nodes
        int name;             // name given at creation
        bool visited;         // to check connectivity
        // ... many other fields
    } str_node_tp;
    #define NODE_SIZE sizeof( str_node_tp )

  11. Building a Graph

    // create a node in graph G, identified by "name",
    // and connect it to the global "finger" at the head of the Llist;
    // lowlink and number are Tarjan's fields, added to node_tp for SCC analysis
    node_ptr_tp make_node( int name )
    { // make_node
        node_ptr_tp node = (node_ptr_tp) malloc( NODE_SIZE );
        // check once for non-NULL here, not on the user side!
        ASSERT( node, "no space for node in heap!" );
        node->finger  = finger;   // old head of the finger list follows this node
        node->lowlink = NIL;      // int, not pointer
        node->number  = NIL;      // int; NIL must be 0: scc() treats number 0 as "not yet numbered"
        node->link    = NULL;     // pointer type
        node->name    = name;     // IDs this node
        node->visited = FALSE;    // initially
        finger = node;            // "this" node is now the head
        return node;
    } //end make_node

  12. Building a Graph from Edges

    // input is a list of pairs, each element being a node name (a number);
    // craft an edge from the first to the second name.
    // If a node is new, create it; else use the pointer returned by exists()
    while ( scanf( "%d%d", &a, &b ) == 2 ) {   // a, b are ints; stop at EOF or bad input
        if ( ! ( first = exists( a ) ) ) {     // 'a' new node?
            first = make_node( a );            // allocate 'a'
        } //end if
        if ( ! ( second = exists( b ) ) ) {    // 'b' new node?
            second = make_node( b );           // allocate 'b'
        } //end if
        // both exist, either created or pre-existing: connect!
        if ( new_link( first, second ) ) {
            link = make_link( first->link, second );
            ASSERT( link, "no space for link node" );
            first->link = link;
        }else{
            // link was there already, no need to add it again!
            printf( "<><> skip duplicate link %d->%d\n", a, b );
        } //end if
    } //end while
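exists() is not shown on the slides either. One plausible sketch, assuming it searches the finger list linearly by name; to keep the example self-contained, the edges come from an array instead of scanf(), and build_from_edges() is a hypothetical driver name. This new_link() compares node pointers rather than names, which is equivalent here because exists() guarantees one node per name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct link_tp *link_ptr_tp;
typedef struct node_tp *node_ptr_tp;
typedef struct link_tp { link_ptr_tp next_link; node_ptr_tp next_node; } str_link_tp;
typedef struct node_tp {
    link_ptr_tp link;     /* successor Llist */
    node_ptr_tp finger;   /* linear list of all nodes */
    int name;
} str_node_tp;

static node_ptr_tp finger = NULL;

/* Hypothetical exists(): linear search of the finger list by name. */
node_ptr_tp exists(int name)
{
    for (node_ptr_tp n = finger; n != NULL; n = n->finger)
        if (n->name == name)
            return n;
    return NULL;
}

node_ptr_tp make_node(int name)
{
    node_ptr_tp node = malloc(sizeof(str_node_tp));
    assert(node != NULL);
    node->link = NULL;
    node->name = name;
    node->finger = finger;   /* push on the finger list */
    finger = node;
    return node;
}

bool new_link(node_ptr_tp first, node_ptr_tp second)
{
    for (link_ptr_tp l = first->link; l != NULL; l = l->next_link)
        if (l->next_node == second)
            return false;    /* edge already present */
    return true;
}

/* Build G from n edges (a, b); returns the number of links added. */
int build_from_edges(const int edges[][2], int n)
{
    int added = 0;
    for (int i = 0; i < n; i++) {
        int a = edges[i][0], b = edges[i][1];
        node_ptr_tp first = exists(a);
        if (!first) first = make_node(a);
        node_ptr_tp second = exists(b);
        if (!second) second = make_node(b);
        if (new_link(first, second)) {
            link_ptr_tp link = malloc(sizeof(str_link_tp));
            assert(link != NULL);
            link->next_link = first->link;   /* insert at head */
            link->next_node = second;
            first->link = link;
            added++;
        } else {
            printf("<><> skip duplicate link %d->%d\n", a, b);
        }
    }
    return added;
}
```

A linear search makes each exists() call O(n) in the number of nodes; for large graphs a hash table keyed by name would be the usual alternative.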

  13. Building a Graph

    // check whether a link between these 2 nodes already exists;
    // if not, return true: new! Else return false: NOT new!
    bool new_link( node_ptr_tp first, node_ptr_tp second )
    { // new_link
        int target = second->name;
        link_ptr_tp link = first->link;
        while ( link ) {
            if ( target == link->next_node->name ) {
                return FALSE;   // it is an existing link, NOT new
            } //end if
            // check next node, if any
            link = link->next_link;
        } //end while
        // none of the successors equals the second node's name
        return TRUE;   // is a new link
    } //end new_link

  14. Strongly Connected Components: Optional

  15. Strongly Connected Components
     • We'll analyse graphs for the attribute of strong connectivity
     • Using the best method known to date: by Robert E. Tarjan, in his awesome 1972 SIAM paper, which finds all SCCs in linear time, O( |V| + |E| ). Pure beauty in Computer Science!
     • Requires special fields in the graph node, which we just add to any regular node: int number and int lowlink, plus the stack link scc_pred used by push() and pop()

    typedef struct node_tp
    {
        link_ptr_tp link;       // points to Llist of successors
        node_ptr_tp finger;     // finger through all nodes
        node_ptr_tp scc_pred;   // next node down the SCC stack
        int lowlink;            // Tarjan's lowlink
        int number;             // Tarjan's number
        int name;               // name given during creation
        bool visited;           // in scc(): true while the node is on the SCC stack
    } str_node_tp;
    #define NODE_SIZE sizeof( str_node_tp )

  16. Strongly Connected Components
     • Every node vi in a strongly connected component (SCC) of graph G can reach every other node vj in that SCC, not necessarily in one single step
     • An SCC is a subgraph SG of graph G, SG ⊆ G
     • By definition then, a singleton-node graph is strongly connected; not very interesting, but this shows up when we discuss Tarjan's method to uncover SCCs
     • We'll enhance Tarjan's code to filter out singleton-node SCCs
     • An SCC is not required to have a single entry point, a single exit point, or a single back-edge
     • The graph needs a defined entry point: named entry or head
     • Tarjan's SCC analysis may start at any node vi ∈ G
     • Correctness proof in Tarjan's beautiful 1972 paper

  17. Strongly Connected Components

    // Pseudo code for Tarjan's method
    // of detecting SCCs in a directed graph
    int scc_number = 0       // global to scc()
    Pointer to scc_stack     // initially empty
    int scc_count = 0        // how many SCCs?
    procedure main()         // pseudo code
    { // main
        // assume, or verify, that the stack is empty!
        // mark all nodes in G as 'not visited'
        for each node w ∈ G not yet visited, do
            scc( w )
        end for
    } // end main

  18. Strongly Connected Components

    // Pseudo code for Tarjan's method of detecting SCCs in directed graph G
    // In Tarjan's notation, nodes in G have the added fields lowlink and number
    // Also, there is a stack of nodes, encoded via scc_stack
    procedure scc( node_ptr_tp v )
    { // scc
        lowlink( v ) := number( v ) := ++scc_number   -- uses global: scc_number
        push( v )                                      -- changes global: scc_stack
        for all successors w of v do
            if w is not visited then                   -- v->w is a tree arc
                scc( w )
                lowlink( v ) := min( lowlink( v ), lowlink( w ) )
            elsif number( w ) < number( v ) then       -- v->w is a frond or cross link
                if in_stack( w ) then
                    lowlink( v ) := min( lowlink( v ), number( w ) )
                end if
            end if
        end for
        if lowlink( v ) == number( v ) then            -- next SCC found; v is its root
            scc_count++                                -- just count number of SCCs
            printf( "New SCC %d found.\n", scc_count );
            while scc_stack not empty and number( top( scc_stack ) ) >= number( v ) do
                w := top( scc_stack )
                printf( "Node %d is part of it.\n", w->name );
                pop()
            end while
        end if
    } // end scc
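The pseudo code translates fairly directly into C. The sketch below assumes vertices numbered 0 to n_vertices-1 stored in fixed-size adjacency arrays (MAX_V, add_edge(), count_sccs(), and scc_of[] are hypothetical names) and, unlike the node-based code of slides 22 to 24, keeps an explicit on_stack[] array instead of reusing the visited flag; it counts all SCCs, including singletons.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_V 64                    /* assumption: small, fixed-size graph */

static int  n_vertices;             /* vertices are 0 .. n_vertices-1 */
static int  adj[MAX_V][MAX_V];      /* adj[v][k]: k-th successor of v */
static int  n_succ[MAX_V];
static int  number[MAX_V];          /* Tarjan's number; 0 = not yet visited */
static int  lowlink[MAX_V];
static bool on_stack[MAX_V];
static int  stack[MAX_V], sp;       /* explicit SCC stack */
static int  scc_number, scc_count;
static int  scc_of[MAX_V];          /* SCC id assigned to each vertex */

static int min_int(int a, int b) { return a < b ? a : b; }

void add_edge(int from, int to)
{
    assert(from >= 0 && from < MAX_V && to >= 0 && to < MAX_V);
    assert(n_succ[from] < MAX_V);
    adj[from][n_succ[from]++] = to;
}

static void scc(int v)
{
    number[v] = lowlink[v] = ++scc_number;
    stack[sp++] = v;                          /* push( v ) */
    on_stack[v] = true;
    for (int k = 0; k < n_succ[v]; k++) {
        int w = adj[v][k];
        if (number[w] == 0) {                 /* v->w is a tree arc */
            scc(w);
            lowlink[v] = min_int(lowlink[v], lowlink[w]);
        } else if (number[w] < number[v] && on_stack[w]) {
            lowlink[v] = min_int(lowlink[v], number[w]);  /* frond or cross link */
        }
    }
    if (lowlink[v] == number[v]) {            /* v is the root of the next SCC */
        scc_count++;
        int w;
        do {                                  /* pop everything down to v */
            w = stack[--sp];
            on_stack[w] = false;
            scc_of[w] = scc_count;
        } while (w != v);
    }
}

/* Like main() on slide 17: start scc() at every vertex not yet visited. */
int count_sccs(void)
{
    assert(n_vertices <= MAX_V);
    scc_number = scc_count = sp = 0;
    for (int v = 0; v < n_vertices; v++) {
        number[v] = 0;
        on_stack[v] = false;
    }
    for (int v = 0; v < n_vertices; v++)
        if (number[v] == 0)
            scc(v);
    return scc_count;
}
```

For the 3-node sample of slide 20, with nodes 1, 2, 3 mapped to vertices 0, 1, 2 and edges 1→2, 2→1, 2→3, count_sccs() reports 2 SCCs: {1, 2} and the singleton {3}. The recursion depth can reach the number of vertices, which matters for very large graphs.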

  19. SCC Sample – Omit Trivial SCCs

  20. 3-Node SCC Sample [Figure: nodes 1, 2, 3, each annotated with its Num, Low, and Stack fields]

  21. 3-Node SCC Sample
     Outside call to scc( node 1 )
     • Increments num to 1, sets fields .num and .lowlink = 1
     • Pushes node 1, i.e., stack = 1; node 1's scc_pred = null
     • Recursive call scc( node 2 )
     • When done, finds that lowlink = num, hence an SCC: nodes 1 and 2
     Recursive call from node 1: scc( node 2 )
     • Sets node 2's .num and .lowlink = 2
     • Stack points to node 2; node 2's scc_pred is node 1
     • Has 2 successors: node 1 and node 3
     • Node 1 is already visited and still on the stack, with num 1 < 2, so node 2's .lowlink becomes 1; node 3 causes a new call scc( node 3 )
     Recursive call from node 2: scc( node 3 )
     • Sets .num and .lowlink = 3
     • Stack points to node 3; node 3's scc_pred is node 2
     • Has no successor
     • lowlink = num, hence an SCC, but a singleton-node SCC, which is skipped

  22. Strongly Connected Components

    ////////////////////////////////////////////////////////////
    ////////////   S C C   G r a p h   A n a l y s i s   ////////
    ////////////////////////////////////////////////////////////

    // globals for scc
    int         scc_number = 0;     // Tarjan's SCC numbers
    node_ptr_tp scc_stack  = NULL;  // stack exists via link in nodes
    int         scc_count  = 0;     // tracks # of SCCs

    // global "scc_stack" simulates a stack via an Llist through the nodes:
    // each node's scc_pred field links to the node below it on the stack
    void push( node_ptr_tp v )
    { // push
        ASSERT( v, "push() called with NIL vertex v" );
        ASSERT( !( v->visited ), "pushing vertex again?" );
        v->scc_pred = scc_stack;   // first time NULL, then stack ptr
        v->visited  = TRUE;        // visited means: is on stack
        scc_stack   = v;           // global points to the stack's head
    } //end push

    // starting with global scc_stack, we can traverse the whole stack;
    // all elements are connected by the node field scc_pred
    void pop( void )
    { // pop
        ASSERT( scc_stack, "error, empty SCC stack" );
        scc_stack->visited = FALSE;   // remove from stack
        scc_stack = scc_stack->scc_pred;
    } //end pop

  23. SCC Coded, Part 1

    #define min( a, b ) ( (a) < (b) ? (a) : (b) )

    void scc( node_ptr_tp v )
    { // scc
        node_ptr_tp w;
        ASSERT( v, "calling scc with NULL pointer" );
        ASSERT( !v->number, "node already has non-null number!" );
        v->number = v->lowlink = ++scc_number;
        push( v );
        for( link_ptr_tp link = v->link; link; link = link->next_link ) {
            w = link->next_node;
            ASSERT( w, "node w linked as successor must be /= 0" );
            if( ! w->number ) {                      // number is 0: not yet SCC'ed
                scc( w );
                v->lowlink = min( v->lowlink, w->lowlink );
            }else if( w->number < v->number ) {      // frond or cross link
                if( w->visited ) {                   // visited means: is on stack
                    v->lowlink = min( v->lowlink, w->number );
                } //end if
            } //end if
        } //end for
        // ... continued on next slide: now we can pop

  24. SCC Coded, Part 2

        // now see whether v is the root of an SCC,
        // and if so, record the SCC number and all nodes belonging to it
        if ( v->lowlink == v->number ) {
            // found next SCC; but if it is a singleton-node SCC, skip it
            if ( scc_stack == v ) {
                // yes, singleton node; so be silent! Uninteresting!
                pop();
            }else{
                // multi-node SCC; so we do consider it
                scc_count++;
                printf( "next scc number is: %d\n", scc_count );
                while( scc_stack && ( scc_stack->number >= v->number ) ) {
                    printf( "node %d is in.\n", scc_stack->name );
                    pop();
                } //end while
            } //end if
        } //end if
    } //end scc
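Slides 22 to 24 can be assembled into one self-contained C file. This sketch assumes the scc_pred field on the node (used by push() and pop()), replaces ASSERT with the standard assert(), and adds the hypothetical helpers new_node() and add_succ() to build the 3-node sample; as on the slides, visited means "currently on the SCC stack" and number == 0 means "not yet numbered".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct link_tp *link_ptr_tp;
typedef struct node_tp *node_ptr_tp;

typedef struct link_tp {
    link_ptr_tp next_link;
    node_ptr_tp next_node;
} str_link_tp;

typedef struct node_tp {
    link_ptr_tp link;       /* Llist of successors */
    node_ptr_tp scc_pred;   /* node below this one on the SCC stack */
    int lowlink;
    int number;             /* 0 = not yet numbered by scc() */
    int name;
    bool visited;           /* true while the node is on the SCC stack */
} str_node_tp;

static int scc_number = 0;
static int scc_count = 0;             /* counts multi-node SCCs only */
static node_ptr_tp scc_stack = NULL;

#define MIN(a, b) ((a) < (b) ? (a) : (b))

static void push(node_ptr_tp v)
{
    assert(v != NULL && !v->visited);
    v->scc_pred = scc_stack;
    v->visited = true;
    scc_stack = v;
}

static void pop(void)
{
    assert(scc_stack != NULL);
    scc_stack->visited = false;
    scc_stack = scc_stack->scc_pred;
}

void scc(node_ptr_tp v)
{
    assert(v != NULL && v->number == 0);
    v->number = v->lowlink = ++scc_number;
    push(v);
    for (link_ptr_tp l = v->link; l != NULL; l = l->next_link) {
        node_ptr_tp w = l->next_node;
        if (w->number == 0) {                        /* tree arc */
            scc(w);
            v->lowlink = MIN(v->lowlink, w->lowlink);
        } else if (w->number < v->number && w->visited) {
            v->lowlink = MIN(v->lowlink, w->number); /* frond or cross link */
        }
    }
    if (v->lowlink == v->number) {                   /* v is an SCC root */
        if (scc_stack == v) {
            pop();                                   /* singleton: stay silent */
        } else {
            scc_count++;
            printf("next scc number is: %d\n", scc_count);
            while (scc_stack && scc_stack->number >= v->number) {
                printf("node %d is in.\n", scc_stack->name);
                pop();
            }
        }
    }
}

/* Hypothetical helpers to build a test graph. */
node_ptr_tp new_node(int name)
{
    node_ptr_tp n = calloc(1, sizeof(str_node_tp)); /* all fields zero */
    assert(n != NULL);
    n->name = name;
    return n;
}

void add_succ(node_ptr_tp from, node_ptr_tp to)
{
    link_ptr_tp l = malloc(sizeof(str_link_tp));
    assert(l != NULL);
    l->next_link = from->link;   /* newest link goes to the head */
    l->next_node = to;
    from->link = l;
}
```

With edges 1→2, 2→1, 2→3, calling scc() on node 1 skips the singleton {3} silently, then prints one multi-node SCC, listing node 2 and then node 1 as they are popped.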

  25. Basic Blocks
     • SCCs are typically computed on control flow graphs (CFGs); the goal is to identify loops, specifically inner loops
     • A CFG is a graph of basic blocks (BBs)
        • Each node in a CFG is a BB
        • An edge indicates a transfer of control from one BB to another
     • These edges are static: at compile time it is unknown whether such a branch, call etc. will actually be executed
     • Transfer instructions can have 1, 2, or more successors
        • An unconditional branch has 1 successor, the destination
        • A call has 2 successors, the destination and the instruction after the call; that's where the callee returns to
        • A conditional branch has 2 successors, the destination and the fall-through location; like a call
        • An indexed branch, used for branch tables, has any number of successors; the table entry is computed at run time
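The successor rules above can be captured in a small C function. The instruction categories and successor_count() are hypothetical names for this sketch; an indexed branch is counted as one edge per branch-table entry, although duplicate table entries would in practice collapse into fewer distinct successors. SEQUENTIAL covers a BB that ends in a regular instruction and simply falls into the next BB.

```c
#include <assert.h>

/* Hypothetical classification of the last instruction of a basic block. */
typedef enum {
    SEQUENTIAL,       /* e.g. an integer add followed by a new BB */
    UNCOND_BRANCH,
    COND_BRANCH,
    CALL,
    INDEXED_BRANCH    /* branch through a table */
} kind_tp;

/* Number of static CFG successors of a BB ending in an instruction of
   the given kind; 'table_size' is only used for an indexed branch. */
int successor_count(kind_tp kind, int table_size)
{
    switch (kind) {
    case SEQUENTIAL:     return 1;           /* falls into the next BB */
    case UNCOND_BRANCH:  return 1;           /* destination only */
    case COND_BRANCH:    return 2;           /* destination + fall-through */
    case CALL:           return 2;           /* callee + return point */
    case INDEXED_BRANCH: return table_size;  /* one edge per table entry */
    }
    assert(0 && "unknown instruction kind");
    return 0;
}
```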

  26. Basic Blocks
     • Def: a Basic Block is a sequence of i > 0 instructions with one point of entry (header) and one point of exit; the two need not be distinct, i.e. a BB may consist of a single instruction!
     • The point of entry of a BB may be the target of a branch or call, or the fall-through of a conditional branch
     • The point of exit of a BB may be a branch or call instruction itself, or a conditional branch, but also a regular, sequential instruction, such as an integer add
        • This happens if the next instruction is the target of some other branch, i.e. that next instruction is the entry point of a new BB
     • Given the instruction sequence of a program, BB analysis is a two-pass process; its inputs are the instruction sequence and knowledge of the first instruction to be executed, by default instruction 1
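One common formulation of the first pass marks the leaders, i.e. the first instruction of each BB: the entry instruction, every branch or call destination, and every instruction that follows a branch or call. A sketch under those assumptions, with a hypothetical instr_tp record and 0-based indices (so the slide's default instruction 1 is index 0 here); destinations computed at run time, such as those of indexed branches, are marked unknown with -1 and are not handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction record for this sketch. */
typedef struct {
    bool transfers;   /* branch, conditional branch, or call */
    int  target;      /* index of the destination, or -1 if none/unknown */
} instr_tp;

/* Pass 1: mark leaders, i.e. the first instruction of each BB.
   Instructions are indexed 0 .. n-1; 'entry' is executed first.
   Returns the number of leaders, hence the number of BBs. */
int mark_leaders(const instr_tp code[], int n, int entry, bool leader[])
{
    assert(n > 0 && entry >= 0 && entry < n);
    for (int i = 0; i < n; i++)
        leader[i] = false;
    leader[entry] = true;                       /* first instruction executed */
    for (int i = 0; i < n; i++) {
        if (!code[i].transfers)
            continue;
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = true;      /* branch or call destination */
        if (i + 1 < n)
            leader[i + 1] = true;               /* instruction after a transfer */
    }
    int count = 0;
    for (int i = 0; i < n; i++)
        count += leader[i];
    return count;
}
```

A second pass would then form each BB from a leader up to, but excluding, the next leader.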

  27. Basic Block Analysis
     • Is a two-pass algorithm
     • Shown in a separate document
     • Ditto for CFG generation
     • Would be too advanced for CS 163

  28. References
     • Control Flow Graph, in: H. Mayer, "Parallel Execution Enabled by Refined Source Analysis: Cost and Benefits in a Supercompiler", R. Oldenbourg Verlag, München/Wien, March 1997
     • Graphs, in: C. Berge, "Graphs and Hypergraphs", North-Holland, Amsterdam, 1973
     • SCCs: R. E. Tarjan, "Depth-First Search and Linear Graph Algorithms", SIAM J. Computing, Vol. 1, No. 2, June 1972
