SSA Optimization Find All Referenced Variable Dead Code Elimination

SSA Optimization • Find All Referenced Variable • Dead Code Elimination
Presented by 蔡進義

Outline • Find All Referenced Variable • Dead Code Elimination

Find All Referenced Variable
• This pass walks the entire function and collects an array of all variables referenced in the function, referenced_vars.
• The index at which a variable is found in the array is used as a UID for the variable within this function.
• This data is needed by the SSA rewriting routines. The pass is located in tree-dfa.c and is described by pass_referenced_vars.
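A minimal sketch of the "index as UID" idea, not GCC's code. The names var_table, MAX_VARS and add_referenced_var_sketch are hypothetical, and a linear search stands in for the hash table GCC uses to avoid duplicates.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VARS 64  /* hypothetical fixed capacity for this sketch */

/* Toy stand-in for referenced_vars: each distinct variable (identified
   here by its address) is stored once; its index is its UID.  */
struct var_table
{
  const void *vars[MAX_VARS];
  size_t count;
};

/* Return the UID of VAR, adding it to the table on first reference.
   A linear search replaces the hash table used for deduplication.  */
static size_t
add_referenced_var_sketch (struct var_table *t, const void *var)
{
  size_t i;

  for (i = 0; i < t->count; i++)
    if (t->vars[i] == var)
      return i;

  assert (t->count < MAX_VARS);
  t->vars[t->count] = var;
  return t->count++;
}
```

Referencing the same variable twice returns the same UID, so later passes can use the UID as a stable per-function index.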

Find All Referenced Variable (cont’d)
tree-optimize.c: void init_tree_optimization_passes (void)
  NEXT_PASS (pass_referenced_vars)
struct tree_opt_pass pass_referenced_vars, initial value:
  { NULL, NULL, find_referenced_vars, NULL, NULL, 0, TV_FIND_REFERENCED_VARS, PROP_gimple_leh | PROP_cfg, PROP_referenced_vars, 0, 0, 0, 0 }
tree-dfa.c: find_referenced_vars()

find_referenced_vars
  vars_found = htab_create ();
  tree *stmt_p = bsi_stmt_ptr (si);
  walk_tree (stmt_p, find_vars_r, &walk_state, NULL);
  htab_delete (vars_found);

walk_tree()
result = (*func) (tp, &walk_subtrees, data);   // find_vars_r()
WALK_SUBTREE_TAIL (TREE_CHAIN (*tp));
result = lang_hooks.tree_inlining.walk_subtrees (tp, &walk_subtrees, func, data, pset);
if (code == DECL_EXPR
    && TREE_CODE (DECL_EXPR_DECL (*tp)) == TYPE_DECL
    && TREE_CODE (TREE_TYPE (DECL_EXPR_DECL (*tp))) != ERROR_MARK)
  { ….. }
else if (code != EXIT_BLOCK_EXPR
         && code != SAVE_EXPR
         && code != BIND_EXPR
         && IS_EXPR_CODE_CLASS (TREE_CODE_CLASS (code)))
  { …. }
else if (TYPE_P (*tp))
  {
    result = walk_type_fields (*tp, func, data, pset);
    if (result)
      return result;
  }
else
  { ….. }

static void
find_referenced_vars (void)
{
  htab_t vars_found;
  basic_block bb;
  block_stmt_iterator si;
  struct walk_state walk_state;

  vars_found = htab_create (50, htab_hash_pointer, htab_eq_pointer, NULL);
  memset (&walk_state, 0, sizeof (walk_state));
  walk_state.vars_found = vars_found;

  FOR_EACH_BB (bb)
    for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
      {
        tree *stmt_p = bsi_stmt_ptr (si);
        walk_tree (stmt_p, find_vars_r, &walk_state, NULL);
      }

  htab_delete (vars_found);
}

/* State information for find_vars_r.  */
struct walk_state
{
  /* Hash table used to avoid adding the same variable more than once.  */
  htab_t vars_found;
};
htab_t vars_found;
vars_found = htab_create (50, htab_hash_pointer, htab_eq_pointer, NULL);
walk_state.vars_found = vars_found;
htab_create() -> htab_create_alloc()
This function creates a hash table whose size is slightly larger than the requested initial size.

This function frees all memory allocated for the given hash table. The hash table must already exist.
htab_delete (vars_found);
#define FOR_EACH_BB_REVERSE(BB) \
  FOR_BB_BETWEEN (BB, EXIT_BLOCK_PTR->prev_bb, ENTRY_BLOCK_PTR, prev_bb)
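The block-iteration macros walk a doubly linked chain of blocks that sits between an entry block and an exit block. The following is a rough sketch of that idea under simplified assumptions: bb_sketch, the *_SKETCH macros, link_chain and visit_order are hypothetical names, and the entry and exit blocks are passed explicitly rather than read from globals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down basic block: only the chain links.  */
struct bb_sketch
{
  int index;
  struct bb_sketch *prev_bb;
  struct bb_sketch *next_bb;
};

/* Visit blocks from FROM up to, but excluding, TO, following link DIR.  */
#define FOR_BB_BETWEEN_SKETCH(BB, FROM, TO, DIR) \
  for (BB = (FROM); BB != (TO); BB = BB->DIR)

/* Forward order: first real block after ENTRY, stopping at EXIT.  */
#define FOR_EACH_BB_SKETCH(BB, ENTRY, EXIT) \
  FOR_BB_BETWEEN_SKETCH (BB, (ENTRY)->next_bb, EXIT, next_bb)

/* Reverse order: last real block before EXIT, stopping at ENTRY.  */
#define FOR_EACH_BB_REVERSE_SKETCH(BB, ENTRY, EXIT) \
  FOR_BB_BETWEEN_SKETCH (BB, (EXIT)->prev_bb, ENTRY, prev_bb)

/* Link N blocks into one chain; blocks[0] plays the entry block and
   blocks[n - 1] the exit block.  */
static void
link_chain (struct bb_sketch *blocks, int n)
{
  int i;

  assert (n >= 2);
  for (i = 0; i < n; i++)
    {
      blocks[i].index = i;
      blocks[i].prev_bb = i > 0 ? &blocks[i - 1] : NULL;
      blocks[i].next_bb = i < n - 1 ? &blocks[i + 1] : NULL;
    }
}

/* Record the indices of the visited blocks in OUT; return how many.  */
static int
visit_order (struct bb_sketch *blocks, int n, int reverse, int *out)
{
  struct bb_sketch *bb;
  struct bb_sketch *entry = &blocks[0];
  struct bb_sketch *exit_bb = &blocks[n - 1];
  int count = 0;

  if (reverse)
    {
      FOR_EACH_BB_REVERSE_SKETCH (bb, entry, exit_bb)
        out[count++] = bb->index;
    }
  else
    {
      FOR_EACH_BB_SKETCH (bb, entry, exit_bb)
        out[count++] = bb->index;
    }
  return count;
}
```

With five linked blocks, the forward walk visits indices 1, 2, 3 and the reverse walk visits 3, 2, 1; the entry and exit blocks themselves are never visited.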

/* Apply FUNC to all the sub-trees of TP in a pre-order traversal.
   FUNC is called with the DATA and the address of each sub-tree.
   If FUNC returns a non-NULL value, the traversal is aborted, and the value returned by FUNC is returned.
   If PSET is non-NULL it is used to record the nodes visited, and to avoid visiting a node more than once.  */
walk_tree (stmt_p, find_vars_r, &walk_state, NULL);
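The callback contract described in that comment can be sketched on a toy binary tree. This is an illustration only: struct node, walk_sketch and count_nodes are hypothetical, and the PSET visited-set is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary expression node; GCC's tree type is far richer.  */
struct node
{
  int code;
  struct node *kid[2];
};

typedef struct node *(*walk_fn) (struct node **np, int *walk_subtrees,
                                 void *data);

/* Pre-order walk: call FUNC on *NP first, then on its children unless
   FUNC cleared *walk_subtrees.  A non-NULL result aborts the walk and
   is returned to the caller.  */
static struct node *
walk_sketch (struct node **np, walk_fn func, void *data)
{
  int walk_subtrees = 1;
  struct node *result;
  int i;

  assert (np != NULL);
  if (*np == NULL)
    return NULL;

  result = func (np, &walk_subtrees, data);
  if (result != NULL)
    return result;
  if (!walk_subtrees)
    return NULL;

  for (i = 0; i < 2; i++)
    {
      result = walk_sketch (&(*np)->kid[i], func, data);
      if (result != NULL)
        return result;
    }
  return NULL;
}

/* Example callback: count visited nodes and do not descend below
   nodes whose code is 0.  */
static struct node *
count_nodes (struct node **np, int *walk_subtrees, void *data)
{
  ++*(int *) data;
  if ((*np)->code == 0)
    *walk_subtrees = 0;
  return NULL;
}
```

Clearing *walk_subtrees is how a callback such as find_vars_r tells the walker that a node has no interesting children.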

Outline • Find All Referenced Variable • Dead Code Elimination

Dead Code Elimination
• Dead-code elimination is the removal of statements which have no impact on the program's output.
• "Dead statements" have no impact on the program's output, while "necessary statements" may have an impact on the output.

Dead Code Elimination Algorithm
• The algorithm consists of three phases:
  • Marking as necessary all statements known to be necessary, e.g. most function calls, writing a value to memory, etc.;
  • Propagating necessary statements, e.g. the statements giving values to operands in necessary statements; and
  • Removing dead statements.
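The three phases can be sketched on a toy SSA-like statement list. This is a simplified illustration, not the tree-ssa-dce.c implementation: struct stmt, MAX_STMTS and dce_sketch are hypothetical, each statement defines one value named by its own index, and control dependence is ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STMTS 32  /* hypothetical capacity for this sketch */

/* Toy statement: it defines the value named by its own index and uses
   up to two values, each named by the index of its defining statement
   (-1 means "no operand").  */
struct stmt
{
  int use[2];
  bool has_side_effect;  /* e.g. a call or a store to memory */
  bool necessary;
  bool removed;
};

/* Run the three phases over S[0..N-1]; return the number of statements
   removed.  */
static int
dce_sketch (struct stmt *s, int n)
{
  int worklist[MAX_STMTS];
  int top = 0, removed = 0, i, k;

  assert (n <= MAX_STMTS);

  /* Phase 1: mark the obviously necessary statements.  */
  for (i = 0; i < n; i++)
    {
      s[i].necessary = s[i].has_side_effect;
      s[i].removed = false;
      if (s[i].necessary)
        worklist[top++] = i;
    }

  /* Phase 2: propagate necessity to the statements that define the
     operands of necessary statements.  Each statement is pushed at
     most once, so the worklist cannot overflow.  */
  while (top > 0)
    {
      i = worklist[--top];
      for (k = 0; k < 2; k++)
        {
          int d = s[i].use[k];
          if (d >= 0 && !s[d].necessary)
            {
              s[d].necessary = true;
              worklist[top++] = d;
            }
        }
    }

  /* Phase 3: remove everything that was never marked.  */
  for (i = 0; i < n; i++)
    if (!s[i].necessary)
      {
        s[i].removed = true;
        removed++;
      }
  return removed;
}
```

For the sequence `a = 1; b = a + a; c = 5; d = c + c; print (b)`, only the print has a side effect, so `c` and `d` are never marked and are removed.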

Dead Code Elimination (cont’d)
tree-optimize.c: void init_tree_optimization_passes (void)
  NEXT_PASS (pass_dce);
struct tree_opt_pass pass_dce =
{
  "dce",                                /* name */
  gate_dce,                             /* gate */
  tree_ssa_dce,                         /* execute */
  NULL,                                 /* sub */
  NULL,                                 /* next */
  0,                                    /* static_pass_number */
  TV_TREE_DCE,                          /* tv_id */
  PROP_cfg | PROP_ssa | PROP_alias,     /* properties_required */
  0,                                    /* properties_provided */
  0,                                    /* properties_destroyed */
  0,                                    /* todo_flags_start */
  TODO_fix_def_def_chains | TODO_ggc_collect | TODO_verify_ssa,  /* todo_flags_finish */
  0                                     /* letter */
};
tree-ssa-dce.c: tree_ssa_dce()

tree_ssa_dce()
  perform_tree_ssa_dce()
    tree_dce_init (aggressive);
    find_obviously_necessary_stmts (el);
      mark_stmt_necessary (phi, true);
      mark_stmt_if_obviously_necessary (stmt, el != NULL);
    propagate_necessity (el);
      i = VARRAY_TOP_TREE (worklist);
      VARRAY_POP (worklist);
      mark_operand_necessary (arg, false);
    mark_really_necessary_kill_operand_phis ();
    eliminate_unnecessary_stmts ();
      remove_dead_phis (bb);
      remove_dead_stmt (&i, bb);

tree-ssa-dce.c
/* Pass entry points.  */
static void
tree_ssa_dce (void)
{
  perform_tree_ssa_dce (/*aggressive=*/false);
}

void
perform_tree_ssa_dce (bool aggressive)
{
  struct edge_list *el = NULL;

  tree_dce_init (aggressive);

  if (aggressive)
    {
      /* Compute control dependence.  */
      timevar_push (TV_CONTROL_DEPENDENCES);
      calculate_dominance_info (CDI_POST_DOMINATORS);
      el = create_edge_list ();
      find_all_control_dependences (el);
      timevar_pop (TV_CONTROL_DEPENDENCES);

      mark_dfs_back_edges ();
    }

tree-ssa-dce.c
  find_obviously_necessary_stmts (el);
  propagate_necessity (el);
  mark_really_necessary_kill_operand_phis ();
  eliminate_unnecessary_stmts ();

  if (aggressive)
    free_dominance_info (CDI_POST_DOMINATORS);

  cleanup_tree_cfg ();

  /* Debugging dumps.  */
  if (dump_file)
    {
      dump_function_to_file (current_function_decl, dump_file, dump_flags);
      print_stats ();
    }

  tree_dce_done (aggressive);
  free_edge_list (el);
}

/* Initialization for this pass.  Set up the used data structures.  */
static void
tree_dce_init (bool aggressive)
{
  memset ((void *) &stats, 0, sizeof (stats));

  if (aggressive)
    {
      int i;

      control_dependence_map = xmalloc (last_basic_block * sizeof (bitmap));
      for (i = 0; i < last_basic_block; ++i)
        control_dependence_map[i] = BITMAP_XMALLOC ();

      last_stmt_necessary = sbitmap_alloc (last_basic_block);
      sbitmap_zero (last_stmt_necessary);
    }

  processed = sbitmap_alloc (num_ssa_names + 1);   // Allocate a simple bitmap of N_ELMS bits.
  sbitmap_zero (processed);                        // Zero all elements in a bitmap: memset (bmap->elms, 0, bmap->bytes);
  VARRAY_TREE_INIT (worklist, 64, "work list");    // Allocate a virtual array with 64 elements.
}

#define VARRAY_TREE_INIT(va, num, name) \
  va = varray_init (num, VARRAY_DATA_TREE, name)

Find obviously necessary statements. These are things like most function calls, and stores to file level variables.
static void
find_obviously_necessary_stmts (struct edge_list *el)
{
  basic_block bb;
  block_stmt_iterator i;
  edge e;

  FOR_EACH_BB (bb)
    {
      tree phi;

      /* Check any PHI nodes in the block.  */
      for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
        {
          NECESSARY (phi) = 0;   // #define NECESSARY(stmt) stmt->common.asm_written_flag
          if (is_gimple_reg (PHI_RESULT (phi))                      // scalar variable
              && is_global_var (SSA_NAME_VAR (PHI_RESULT (phi))))   // global variable
            mark_stmt_necessary (phi, true);
        }

      /* Check all statements in the block.  */
      for (i = bsi_start (bb); ! bsi_end_p (i); bsi_next (&i))
        {
          tree stmt = bsi_stmt (i);
          NECESSARY (stmt) = 0;
          mark_stmt_if_obviously_necessary (stmt, el != NULL);
        }

      /* Mark this basic block as `not visited'.  A block will be marked
         visited when the edges that it is control dependent on have been
         marked.  */
      bb->flags &= ~BB_VISITED;
    }

  if (el)   // EL contains the list of edges used by control dependence analysis.
    {
      /* Prevent the loops from being removed.  We must keep the infinite loops,
         and we currently do not have a means to recognize the finite ones.  */
      FOR_EACH_BB (bb)
        {
          edge_iterator ei;
          FOR_EACH_EDGE (e, ei, bb->succs)
            if (e->flags & EDGE_DFS_BACK)
              mark_control_dependent_edges_necessary (e->dest, el);
        }
    }
}

/* If STMT is not already marked necessary, mark it, and add it to the
   worklist if ADD_TO_WORKLIST is true.  */
static inline void
mark_stmt_necessary (tree stmt, bool add_to_worklist)
{
  gcc_assert (stmt);
  gcc_assert (stmt != error_mark_node);
  gcc_assert (!DECL_P (stmt));

  if (NECESSARY (stmt))   // already marked
    return;

  if (dump_file && (dump_flags & TDF_DETAILS))
    {
      fprintf (dump_file, "Marking useful stmt: ");
      print_generic_stmt (dump_file, stmt, TDF_SLIM);
      fprintf (dump_file, "\n");
    }

  NECESSARY (stmt) = 1;   // mark
  if (add_to_worklist)
    VARRAY_PUSH_TREE (worklist, stmt);   // add to the worklist: push a new element on the end of VA
}

static void
propagate_necessity (struct edge_list *el)
{
  tree i;

  while (VARRAY_ACTIVE_SIZE (worklist) > 0)
    {
      /* Take `i' from worklist.  */
      i = VARRAY_TOP_TREE (worklist);
      VARRAY_POP (worklist);

      if (TREE_CODE (i) == PHI_NODE)
        {
        }
      else
        {
          ssa_op_iter iter;
          tree use;

          get_stmt_operands (i);
          FOR_EACH_SSA_TREE_OPERAND (use, i, iter, SSA_OP_ALL_USES)
            mark_operand_necessary (use, false);
        }
    }
}

Appendix
tree-pass.h
/* Describe one pass.  */
struct tree_opt_pass
{
  /* Terse name of the pass used as a fragment of the dump file name.  */
  const char *name;

  /* If non-null, this pass and all sub-passes are executed only if
     the function returns true.  */
  bool (*gate) (void);

  /* This is the code to run.  If null, then there should be sub-passes,
     otherwise this pass does nothing.  */
  void (*execute) (void);

  /* A list of sub-passes to run, dependent on gate predicate.  */
  struct tree_opt_pass *sub;

  /* Next in the list of passes to run, independent of gate predicate.  */
  struct tree_opt_pass *next;

  /* Static pass number, used as a fragment of the dump file name.  */
  int static_pass_number;

  /* The timevar id associated with this pass.  */
  /* ??? Ideally would be dynamically assigned.  */
  unsigned int tv_id;

  /* Sets of properties input and output from this pass.  */
  unsigned int properties_required;
  unsigned int properties_provided;
  unsigned int properties_destroyed;

  /* Flags indicating common sets things to do before and after.  */
  unsigned int todo_flags_start;
  unsigned int todo_flags_finish;

  /* Letter for RTL dumps.  */
  char letter;
};
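How the gate, execute, sub and next fields cooperate can be sketched with a much smaller structure. This is an illustration under stated assumptions, not GCC's pass manager: pass_sketch, append_pass, run_passes, gate_never, exec_count and trace_count are all hypothetical names, and details such as dump files, timevars and property checks are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down pass descriptor.  */
struct pass_sketch
{
  const char *name;
  bool (*gate) (void);        /* NULL means "always run" */
  void (*execute) (void);     /* NULL means only sub-passes do work */
  struct pass_sketch *sub;    /* run only if the gate allows it */
  struct pass_sketch *next;   /* run regardless of this pass's gate */
};

/* Store PASS at *LIST and return the address of its `next' field, so
   repeated calls build a singly linked list in order.  */
static struct pass_sketch **
append_pass (struct pass_sketch **list, struct pass_sketch *pass)
{
  assert (list != NULL && pass != NULL);
  *list = pass;
  return &pass->next;
}

/* Run a pass list; return how many execute callbacks were invoked.  */
static int
run_passes (struct pass_sketch *pass)
{
  int executed = 0;

  for (; pass != NULL; pass = pass->next)
    {
      if (pass->gate != NULL && !pass->gate ())
        continue;               /* skips the pass and its sub-passes */
      if (pass->execute != NULL)
        {
          pass->execute ();
          executed++;
        }
      executed += run_passes (pass->sub);
    }
  return executed;
}

/* Tiny callbacks used to exercise the sketch.  */
static int trace_count;
static bool gate_never (void) { return false; }
static void exec_count (void) { trace_count++; }
```

A grouping pass with no execute callback simply forwards to its sub list, and a pass whose gate returns false is skipped together with its sub-passes while the rest of the list still runs.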

Appendix (cont’d)
tree-optimize.c
/* Construct the pass tree.  */
void
init_tree_optimization_passes (void)
{
  struct tree_opt_pass **p;

#define NEXT_PASS(PASS) (p = next_pass_1 (p, &PASS))

  p = &all_passes;
  NEXT_PASS (pass_gimple);
  NEXT_PASS (pass_remove_useless_stmts);
  NEXT_PASS (pass_mudflap_1);
  NEXT_PASS (pass_lower_cf);
  NEXT_PASS (pass_lower_eh);
  NEXT_PASS (pass_build_cfg);
  NEXT_PASS (pass_pre_expand);
  NEXT_PASS (pass_tree_profile);
  NEXT_PASS (pass_init_datastructures);
  NEXT_PASS (pass_all_optimizations);
  NEXT_PASS (pass_warn_function_return);
  NEXT_PASS (pass_mudflap_2);
  NEXT_PASS (pass_free_datastructures);
  NEXT_PASS (pass_expand);
  NEXT_PASS (pass_rest_of_compilation);
  *p = NULL;

  p = &pass_all_optimizations.sub;
  NEXT_PASS (pass_referenced_vars);
  NEXT_PASS (pass_build_ssa);
  NEXT_PASS (pass_may_alias);
  NEXT_PASS (pass_rename_ssa_copies);
  NEXT_PASS (pass_early_warn_uninitialized);
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_dominator);
  NEXT_PASS (pass_redundant_phi);
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_forwprop);
  NEXT_PASS (pass_phiopt);
  NEXT_PASS (pass_may_alias);
  NEXT_PASS (pass_tail_recursion);
  NEXT_PASS (pass_ch);
  NEXT_PASS (pass_profile);
  NEXT_PASS (pass_sra);
  NEXT_PASS (pass_rename_ssa_copies);
  NEXT_PASS (pass_dominator);
  NEXT_PASS (pass_redundant_phi);
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_dse);
  NEXT_PASS (pass_may_alias);
  NEXT_PASS (pass_forwprop);
  NEXT_PASS (pass_phiopt);
  NEXT_PASS (pass_ccp);

Appendix (cont’d)
tree-optimize.c
  NEXT_PASS (pass_redundant_phi);
  NEXT_PASS (pass_fold_builtins);
  NEXT_PASS (pass_split_crit_edges);
  NEXT_PASS (pass_pre);
  NEXT_PASS (pass_loop);
  NEXT_PASS (pass_dominator);
  NEXT_PASS (pass_redundant_phi);
  NEXT_PASS (pass_cd_dce);
  NEXT_PASS (pass_dse);
  NEXT_PASS (pass_forwprop);
  NEXT_PASS (pass_phiopt);
  NEXT_PASS (pass_tail_calls);
  NEXT_PASS (pass_late_warn_uninitialized);
  NEXT_PASS (pass_del_ssa);
  NEXT_PASS (pass_nrv);
  NEXT_PASS (pass_remove_useless_vars);
  NEXT_PASS (pass_cleanup_cfg_post_optimizing);
  *p = NULL;

  p = &pass_loop.sub;
  NEXT_PASS (pass_loop_init);
  NEXT_PASS (pass_lim);
  NEXT_PASS (pass_unswitch);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_linear_transform);
  NEXT_PASS (pass_iv_canon);
  NEXT_PASS (pass_if_conversion);
  NEXT_PASS (pass_vectorize);
  NEXT_PASS (pass_complete_unroll);
  NEXT_PASS (pass_iv_optimize);
  NEXT_PASS (pass_loop_done);
  *p = NULL;

#undef NEXT_PASS

  /* Register the passes with the tree dump code.  */
  register_dump_files (all_passes, 0);
}

Appendix (cont’d)
basic-block.h
/* This structure maintains an edge list vector.  */
struct edge_list
{
  int num_blocks;
  int num_edges;
  edge *index_to_edge;
};

sbitmap.h
#define SBITMAP_ELT_TYPE unsigned HOST_WIDEST_FAST_INT

typedef struct simple_bitmap_def
{
  unsigned int n_bits;          /* Number of bits.  */
  unsigned int size;            /* Size in elements.  */
  unsigned int bytes;           /* Size in bytes.  */
  SBITMAP_ELT_TYPE elms[1];     /* The elements.  */
} *sbitmap;

typedef SBITMAP_ELT_TYPE *sbitmap_ptr;
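A simple bitmap of this shape stores one bit per index in an array of machine words that follows a small header. The sketch below illustrates that layout with hypothetical names (sbitmap_sketch and its helpers); it assumes `unsigned long` as the element type and uses a C99 flexible array member where the original declares elms[1].

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long elt_type;                 /* assumed element type */
#define ELT_BITS (sizeof (elt_type) * CHAR_BIT)

struct sbitmap_sketch
{
  unsigned int n_bits;   /* Number of bits.  */
  unsigned int size;     /* Size in elements.  */
  unsigned int bytes;    /* Size in bytes.  */
  elt_type elms[];       /* The elements.  */
};

/* Allocate a bitmap of N_BITS bits; contents are uninitialized until
   sbitmap_sketch_zero is called.  Returns NULL on allocation failure.  */
static struct sbitmap_sketch *
sbitmap_sketch_alloc (unsigned int n_bits)
{
  unsigned int size = (unsigned int) ((n_bits + ELT_BITS - 1) / ELT_BITS);
  unsigned int bytes = (unsigned int) (size * sizeof (elt_type));
  struct sbitmap_sketch *b = malloc (sizeof (*b) + bytes);

  if (b == NULL)
    return NULL;
  b->n_bits = n_bits;
  b->size = size;
  b->bytes = bytes;
  return b;
}

/* Clear every bit.  */
static void
sbitmap_sketch_zero (struct sbitmap_sketch *b)
{
  memset (b->elms, 0, b->bytes);
}

/* Set bit N.  */
static void
sbitmap_sketch_set (struct sbitmap_sketch *b, unsigned int n)
{
  assert (n < b->n_bits);
  b->elms[n / ELT_BITS] |= (elt_type) 1 << (n % ELT_BITS);
}

/* Return 1 if bit N is set, 0 otherwise.  */
static int
sbitmap_sketch_test (const struct sbitmap_sketch *b, unsigned int n)
{
  assert (n < b->n_bits);
  return (int) ((b->elms[n / ELT_BITS] >> (n % ELT_BITS)) & 1);
}
```

In the DCE pass a bitmap like this is indexed by SSA name version or basic block index, so membership tests cost one shift and one mask.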

Appendix (cont’d)
/* Allocate a virtual array with NUM_ELEMENT elements, each of which is
   ELEMENT_SIZE bytes long, named NAME.  Array elements are zeroed.  */
varray_type
varray_init (size_t num_elements, enum varray_data_enum element_kind,
             const char *name)
{
  size_t data_size = num_elements * element[element_kind].size;
  varray_type ptr;
#ifdef GATHER_STATISTICS
  struct varray_descriptor *desc = varray_descriptor (name);

  desc->created++;
  desc->allocated += data_size + VARRAY_HDR_SIZE;
#endif
  if (element[element_kind].uses_ggc)
    ptr = ggc_alloc_cleared (VARRAY_HDR_SIZE + data_size);
  else
    ptr = xcalloc (VARRAY_HDR_SIZE + data_size, 1);

  ptr->num_elements = num_elements;
  ptr->elements_used = 0;
  ptr->type = element_kind;
  ptr->name = name;
  return ptr;
}

Appendix (cont’d)
tree-flow.h
typedef struct
{
  tree_stmt_iterator tsi;
  basic_block bb;
} block_stmt_iterator;

Appendix (cont’d)
basic-block.h
struct basic_block_def GTY((chain_next ("%h.next_bb"), chain_prev ("%h.prev_bb")))
{
  /* The first and last insns of the block.  */
  rtx head_;
  rtx end_;

  /* Pointers to the first and last trees of the block.  */
  tree stmt_list;

  /* The edges into and out of the block.  */
  VEC(edge) *preds;
  VEC(edge) *succs;

  /* The registers that are live on entry to this block.  */
  bitmap GTY ((skip (""))) global_live_at_start;

  /* The registers that are live on exit from this block.  */
  bitmap GTY ((skip (""))) global_live_at_end;

  /* Auxiliary info specific to a pass.  */
  PTR GTY ((skip (""))) aux;

  /* Innermost loop containing the block.  */
  struct loop * GTY ((skip (""))) loop_father;

  /* The dominance and postdominance information node.  */
  struct et_node * GTY ((skip (""))) dom[2];

  /* Previous and next blocks in the chain.  */
  struct basic_block_def *prev_bb;
  struct basic_block_def *next_bb;

  /* The data used by basic block copying and reordering functions.  */
  struct reorder_block_def * GTY ((skip (""))) rbi;

  /* Annotations used at the tree level.  */
  struct bb_ann_d *tree_annotations;

  /* Expected number of executions: calculated in profile.c.  */
  gcov_type count;

  /* The index of this block.  */
  int index;

  /* The loop depth of this block.  */
  int loop_depth;

  /* Expected frequency.  Normalized to be in range 0 to BB_FREQ_MAX.  */
  int frequency;

  /* Various flags.  See BB_* below.  */
  int flags;
};

Appendix (cont’d)
A basic block is a sequence of instructions with only one entry and only one exit. If any one of the instructions is executed, they will all be executed, and in sequence from first to last.
There may be COND_EXEC instructions in the basic block. The COND_EXEC *instructions* will be executed -- but if the condition is false the conditionally executed *expressions* will of course not be executed. We don't consider the conditionally executed expression (which might have side-effects) to be in a separate basic block because the program counter will always be at the same location after the COND_EXEC instruction, regardless of whether the condition is true or not.
Basic blocks need not start with a label nor end with a jump insn. For example, a previous basic block may just "conditionally fall" into the succeeding basic block, and the last basic block need not end with a jump insn. Block 0 is a descendant of the entry block.
A basic block beginning with two labels cannot have notes between the labels.
Data for jump tables are stored in jump_insns that occur in no basic block even though these insns can follow or precede insns in basic blocks.

Appendix (cont’d)
/* Callback for walk_tree.  Used to collect variables referenced in
   the function.  */
static tree
find_vars_r (tree *tp, int *walk_subtrees, void *data)
{
  struct walk_state *walk_state = (struct walk_state *) data;

  /* If T is a regular variable that the optimizers are interested
     in, add it to the list of variables.  */
  if (SSA_VAR_P (*tp))
    add_referenced_var (*tp, walk_state);

  /* Type, _DECL and constant nodes have no interesting children.
     Ignore them.  */
  else if (IS_TYPE_OR_DECL_P (*tp) || CONSTANT_CLASS_P (*tp))
    *walk_subtrees = 0;

  return NULL_TREE;
}
