
CS 410 / 510 Mastery in Programming Chapter 3 Program and Language Complexity

Herbert G. Mayer, PSU CS Status 7/4/2013. CS 410 / 510 Mastery in Programming Chapter 3 Program and Language Complexity. Syllabus. Thoughts on Complexity Hard to Understand Code? Program Complexity Complex vs. Hard Halstead Program Metrics McCabe Cyclomatic Number


Presentation Transcript


  1. Herbert G. Mayer, PSU CS Status 7/4/2013 CS 410 / 510 Mastery in Programming Chapter 3 Program and Language Complexity

  2. Syllabus • Thoughts on Complexity • Hard to Understand Code? • Program Complexity • Complex vs. Hard • Halstead Program Metrics • McCabe Cyclomatic Number • Cyclomatic Number Samples • References

  3. Thoughts on Complexity
     • ‘Complexity’ as used in this class:
     • Refers to the number of different paths of execution through a given program, dictated by flow of control; synonym: convoluted
     • Or refers to the degree of difficulty of expressing some algorithm via a string of symbols, i.e. the source program; synonym: hard
     • Some hard to compute functions are easy to code and understand, once invented
     • E.g. R. E. Tarjan’s SCC algorithm, or Newton’s square-root formula
     • Complexity, as used here, does not mean:
     • “intractable to compute”, such as NP-complete problems requiring too much compute power to ever terminate in human time
     • Complexity also does not mean:
     • “hard to understand”, as may be the case with obfuscated programming styles or poorly written code
     • A synonym for that type of “complex” may be: difficult to read
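
Newton's square-root formula is a good illustration of "easy to code, once invented". The following is a minimal sketch, not taken from the slides: the function name newton_sqrt and the choice of starting guess are assumptions made for illustration. It repeats the update x = (x + s / x) / 2 until the estimate stops decreasing.

```c
/* Newton's iteration for the square root of s: x_next = (x + s / x) / 2.
   Hypothetical helper for illustration; returns 0 for s <= 0 or NaN. */
double newton_sqrt(double s) {
    if (!(s > 0.0)) {
        return 0.0;
    }
    /* Start at or above the root, so the estimates decrease monotonically. */
    double x = (s > 1.0) ? s : 1.0;
    for (;;) {
        double next = 0.5 * (x + s / x);
        if (!(next < x)) {   /* no further progress: converged */
            break;
        }
        x = next;
    }
    return x;
}
```

The loop body is a single line of arithmetic; the hard part was discovering the formula, not writing it down.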

  4. Hard to Understand C Code?
     #include <stdio.h>
     int a[ 1 ]; // just to have an array to index
     int p( char arg )
     { // p
       printf( "%c", arg );
       return 0; // no array bounds violation!
     } //end p
     int main( )
     { // main
       a[ p( 'a' ) ] = a[ p( 'b' ) ] = a[ p( 'c' ) ] = a[ p( 'd' ) ];
       printf( "\n" );
       return 0;
     } //end main

  5. Hard to Understand Code?
     • Output using PSU Unix C compiler is: a b c d
     • Is this correct? If not, what should the output be?
     • Does the C++ implementation used respect the assignment-statement rule of executing the right-hand side first?
     • Are other outputs feasible under the rules of C++, Java, or C#?
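
C leaves the evaluation order of the operands of = unspecified, so the order of the four calls to p, and with it the output, may legitimately differ between compilers. The sketch below is one way to pin the order down, here right to left. It is an illustration under assumptions: p records its argument in a hypothetical buffer call_log instead of printing, so the order can be inspected, and the function name assign_in_explicit_order is invented for this example.

```c
#include <string.h>

int a[1];            /* just to have an array to index, as on the slide */
char call_log[8];    /* hypothetical: records the order of calls to p */

/* Like p on the slide, but appends to call_log instead of printing. */
int p(char arg) {
    size_t len = strlen(call_log);
    if (len + 1 < sizeof call_log) {
        call_log[len] = arg;
        call_log[len + 1] = '\0';
    }
    return 0;
}

/* Each call to p is its own statement, so the order is fully sequenced. */
void assign_in_explicit_order(void) {
    int id = p('d');
    int ic = p('c');
    int ib = p('b');
    int ia = p('a');
    int value = a[id];
    a[ic] = value;
    a[ib] = value;
    a[ia] = value;
}
```

After one call to assign_in_explicit_order, call_log holds "dcba" on every conforming compiler, because separate statements leave no freedom in the order of the calls.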

  6. Hard to Understand, Not Complex
     #include <stdio.h>
     #define MAX 7 // 7 redundant? Discuss!
     int a[ MAX ] = { 0, 1, 2, 3, 4, 5, 6 };
     void p()
     { // p
       for( int i = 0; i < MAX; i++ ) {
         printf( " a[%d] = %d\n", i, a[ i ] );
       } //end for
       printf( "\n" );
     } //end p
     int main()
     { // main
       int x = 99;
       p();
       a[ x = 3 ] = a[ x = 5 ] = x = 6;
       p();
     } //end main

  7. Hard to Understand, Not Complex
     • Output of the first call to p(): a[0] = 0 a[1] = 1 a[2] = 2 a[3] = 3 a[4] = 4 a[5] = 5 a[6] = 6
     • Output of the second call to p(): a[0] = 0 a[1] = 1 a[2] = 2 a[3] = 6 a[4] = 4 a[5] = 6 a[6] = 6
     • x ends up being = 6 on [most] C++ run-time systems
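
In C, the statement a[ x = 3 ] = a[ x = 5 ] = x = 6; modifies x three times with no sequencing between the modifications, which the C standard classifies as undefined behavior; the output above is what one implementation happens to produce. The following sketch reproduces that result with one side effect per statement. The function name assign_in_steps is an assumption made for illustration.

```c
#define MAX 7

int a[MAX] = { 0, 1, 2, 3, 4, 5, 6 };

/* Same end state as reported on the slide, with every store sequenced.
   Returns the final value of x. */
int assign_in_steps(void) {
    int x = 6;     /* innermost assignment: x = 6 */
    a[5] = x;      /* then a[5] receives that value */
    a[3] = x;      /* then a[3] receives the same value */
    return x;
}
```

Written this way the code is neither complex nor hard to understand: there is a single path of execution and no dependence on evaluation order.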

  8. Program Complexity
     • Some computable problems are hard, NP-hard, complex, or hard-to-understand!
     • Assuming an experienced designer and programmer:
     • Some problems are laborious to solve; they are “complex” due to amount of work
     • Others are hard, due to elusiveness of a solution; just try to find a better SCC!!!
     • Yet others are not solvable; e.g. non-computable functions, e.g. Halting Problem [10]
     • What is program complexity?
     • Is a large program complex, i.e. one with many lines of code (LOC)?
     • More complicated code?
     • Spaghetti code? Labels? Computable labels? Gotos? Poor naming conventions?
     • Recursive functions?
     • What unit-of-measure does complexity have?
     • Time to run?
     • Number of different paths through control-flow graph?
     • Space for memory locations needed to run?
     • Number of processors needed to solve computation?
     • Number of iterations for suitable solution? E.g. number of digits for π
     • Degree of “mental hardness” to identify a solution? E.g. in the chess game?
     • V(G) by McCabe is a stab at a unit of complexity. But will it be universally acceptable?

  9. Program Complexity
     • Is a programmatic solution for “chess” hard or complex or both?
     • Safely: A complete and correct chess program is hard to code
     • Yet the rules are simple and relatively few
     • And it has been solved programmatically to the grand-master level
     • Kasparov lost game 1 to “Deep Blue” in 1996 but won that match 4-2; Deep Blue won the 1997 rematch 3.5-2.5 [8]
     • Degree of difficulty for finding a solution quantifies complexity!
     • For example, solving Sudoku?
     • Some problems seem not hard, yet the number of special cases renders a solution virtually intractable
     • E.g. US tax code [9]; contains about 9,800 different sections; ~75,000 pages
     • Could be simpler and fairer, even equally applicable to all citizens
     • But instead is highly complex, due to “special cases” and requires experts to give definitive answers; has exceptions for individual tax payers!
     • Numerous CS attempts to formalize complexity, unit, computability
     • We cover 2 very briefly: Halstead’s and McCabe’s

  10. Complex vs. Hard
     • Complex is to be interpreted as “Mathematically difficult to find a correct algorithm!”
     • E.g. find an algorithm to identify all strongly-connected components in a graph: SCC
     • Hard is to be interpreted as “Very much work to compute the solution”, with the algorithm being not hard
     • E.g. compute the shortest path for a Travelling Salesman’s n stopping points
     • Might take so long that we are no longer interested in the solution
     • Instead: use heuristic provably no worse than x times the best solution
     • An incorrect solution is always easy to compute

  11. Halstead Program Metrics
     • Measures a specific program’s complexity
     • Metrics developed by the late Maurice Halstead
     • To directly quantify complexity of any given source program
     • Solely from operators, operands used in source
     • Halstead introduced measures in 1977
     • Early formal program complexity measures; see [1], [2], [3]
     • Not formally derived, but postulated
     • Halstead metrics carry an element of arbitrariness
     • Lack scientific proof! No formal derivation of the rules!

  12. Halstead Program Metrics
     • Halstead’s metrics count operators and operands in source code of program being analyzed
     • number of unique (distinct) operators (n1)
     • number of unique (distinct) operands (n2)
     • total number of operators (N1)
     • total number of operands (N2)
     • Number of unique operators and operands (n1 and n2) as well as the total number of operators and operands (N1 and N2) are calculated during lexical analysis of source program
     • Other Halstead measures are derived from these 4 units
     • but without proof or scientific derivation!
     • intuition of developer was used as the basis for deriving the measures
     • Halstead intended to provide formal proofs, but he died before doing so

  13. Halstead Program Metrics
     Operands:
     • Literals, AKA constants; e.g. 0, 1000, “hello”
     • User defined identifiers for values, AKA symbolic constants, e.g. MAX is an operand in: #define MAX 5
     • Reserved keywords that denote value, e.g. NIL
     • Declarations like #define MAX 5 are less obvious
     • Depending on language, some language-defined type specifiers are treated as operands, e.g. in C++ char, int, double

  14. Halstead Program Metrics
     Operators:
     • Common arithmetic symbols, e.g. + - / * ^ %
     • Other arithmetic symbols, e.g. ( and )
     • Symbols for boolean operations, e.g. > >= < <= != && ||
     • Symbols for all kinds of operations, including cat for concatenation in some languages
     • Reserved keywords, e.g. or, or else, and, and then, xor
     • Function names, e.g. add( a, 8 ), sin( 45 ), sqrt( 3 )
     • Reserved operations, e.g. try, catch, throw
     • Type qualifiers, e.g. const, volatile
     • Scope specifiers, e.g. extern, static (note: “static” is an overloaded qualifier in C, used for both scope and storage)

  15. Halstead Program Metrics
     Operators that are control constructs:
     • if ( ... ) plus then-clause and optional else-clause
     • while ( ... )
     • do ...
     • for( ; ; ) ...
     • catch()
     • return ...
     • switch {... }

  16. Halstead Program Metrics
     Program length N, vocabulary size n, program volume V:
     • Program length N is the sum of the total number of operators and operands in the program analyzed: N = N1 + N2
     • Vocabulary size n is the sum of the number of unique operators and operands: n = n1 + n2
     • Program volume V is the information content of the program: V = N * log2( n )
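
As a worked example of the three formulas, consider the two-statement fragment x = a + b; y = a * b; which is a hypothetical fragment chosen for illustration. Assuming the semicolon is counted as an operator (a counting convention, not something the slides fix), a hand tally gives the distinct operators =, +, *, ; so n1 = 4 and N1 = 6, and the distinct operands x, y, a, b so n2 = 4 and N2 = 6.

```latex
\begin{align*}
N &= N_1 + N_2 = 6 + 6 = 12 \\
n &= n_1 + n_2 = 4 + 4 = 8 \\
V &= N \cdot \log_2 n = 12 \cdot \log_2 8 = 12 \cdot 3 = 36
\end{align*}
```

A different counting convention, for example one that ignores semicolons, changes all three values, which illustrates the element of arbitrariness in the metrics.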

  17. Halstead Program Metrics
     Difficulty level D, AKA degree of error-proneness:
     • Level of difficulty D of a program is proportional to the number of unique operators n1 in the program
     • And proportional to the total number of operands N2
     • But with scale-factors applied to both
     • D is postulated to be: D = ( n1 / 2 ) * ( N2 / n2 )
     • Interestingly, total number of operators N1 is not part of the formula for the difficulty level D

  18. Halstead Program Metrics
     Program level L:
     • Program level L is the inverse of error-proneness
     • i.e. a low level program is more prone to errors than a corresponding high level program for the same computable function
     • L = 1 / D
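
The two formulas D = ( n1 / 2 ) * ( N2 / n2 ) and L = 1 / D can be sketched in C as follows. The function names and the handling of zero denominators are assumptions made for this illustration, not part of Halstead's definitions.

```c
/* Halstead difficulty: D = (n1 / 2) * (N2 / n2).
   n1: unique operators, n2: unique operands, N2: total operands. */
double halstead_difficulty(int n1, int n2, int N2) {
    if (n2 == 0) {
        return 0.0;   /* sketch only: no operands, report 0 */
    }
    return (n1 / 2.0) * ((double)N2 / n2);
}

/* Halstead program level: L = 1 / D. */
double halstead_level(double difficulty) {
    if (difficulty == 0.0) {
        return 0.0;   /* sketch only: avoid division by zero */
    }
    return 1.0 / difficulty;
}
```

For hypothetical counts n1 = 4, n2 = 4 and N2 = 6, halstead_difficulty yields ( 4 / 2 ) * ( 6 / 4 ) = 3, and halstead_level of that value yields 1 / 3.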

  19. Halstead Program Metrics
     Other measures, for you to elaborate in your paper:
     • Effort to implement
     • Time to implement
     • Number of bugs delivered
     • Etc.

  20. Cyclomatic Number
     • Goal of McCabe’s Cyclomatic Numbers:
     • To have a measure of source program complexity
     • To manage complexity, rather than dealing with an unknown
     • See [4], [6]
     • Builds on:
     • Graph theory
     • E.g. [7] Berge: “Graphs and Hypergraphs”
     • Fundamental units:
     • Graph G, not necessarily connected!
     • Number of edges: e
     • Number of nodes: n
     • Number of connected components: p
     • i.e. if ( p > 1 ) then G is not connected

  21. Cyclomatic Number V
     • Cyclomatic number V of a graph G is called V(G)
     • If:
     • e = number of edges
     • n = number of nodes, AKA vertices in other literature
     • p = number of connected components
     • then: V(G) = e - n + 2 * p
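
The formula translates directly into code. A minimal sketch, with a function name chosen for illustration:

```c
/* McCabe's cyclomatic number: V(G) = e - n + 2 * p.
   edges = e, nodes = n, components = p (connected components). */
int cyclomatic_number(int edges, int nodes, int components) {
    return edges - nodes + 2 * components;
}
```

For example, a connected graph with 4 edges and 4 nodes gives cyclomatic_number( 4, 4, 1 ) = 4 - 4 + 2 = 2.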

  22. Cyclomatic Number Samples
     • Sequence of 2 statements: e = 1, n = 2, p = 1
     • V(G) = 1 - 2 + 2 * 1 = 1
     • If Statement with Then- and Else-: e = 4, n = 4, p = 1
     • V(G) = 4 - 4 + 2 * 1 = 2
     • Sequence of 4 statements: e = 3, n = 4, p = 1
     • V(G) = 3 - 4 + 2 * 1 = 1

  23. Cyclomatic Number of While
     • While Loop: e = 3, n = 3, p = 1
     • V(G) = 3 - 3 + 2 * 1 = 2
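
The counts e, n and p can also be derived from a control-flow graph rather than tallied by hand. The sketch below assumes a hypothetical edge-list representation (struct cfg_edge, nodes numbered 0 to n - 1, at most MAX_NODES nodes) and finds p with a small union-find; none of these names come from the slides.

```c
enum { MAX_NODES = 64 };   /* arbitrary limit for this sketch */

/* Hypothetical edge type: one control-flow edge between two nodes. */
struct cfg_edge { int from, to; };

/* Representative of the component containing v (with path halving). */
static int find_root(int parent[], int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* V(G) = e - n + 2 * p, where p is computed from the edges.
   Returns -1 if n is outside the supported range. */
int cyclomatic_from_edges(const struct cfg_edge edges[], int e, int n) {
    if (n < 0 || n > MAX_NODES) {
        return -1;
    }
    int parent[MAX_NODES];
    int p = n;                       /* every node starts as its own component */
    for (int v = 0; v < n; v++) {
        parent[v] = v;
    }
    for (int i = 0; i < e; i++) {
        int a = find_root(parent, edges[i].from);
        int b = find_root(parent, edges[i].to);
        if (a != b) {                /* edge joins two components */
            parent[a] = b;
            p--;
        }
    }
    return e - n + 2 * p;
}
```

With the while loop modelled as condition = 0, body = 1, exit = 2 and the edges 0 to 1, 1 to 0, 0 to 2 (one possible modelling), the function returns 3 - 3 + 2 * 1 = 2, matching the hand count.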

  24. Cyclomatic Number of Program
     Multiple-Module program with no cross-module vertices:
     • Main Program = M
     • Module A = A()
     • Module B = B()
     • V(G) = V( M U A U B ) = V(M) + V(A) + V(B)
     • M: V(M) = 2 - 3 + 2 = 1
     • A: V(A) = 4 - 4 + 2 = 2
     • B: V(B) = 6 - 5 + 2 = 3
     • V(G) = 12 - 12 + 2 * 3 = 6
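
The additivity V( M U A U B ) = V(M) + V(A) + V(B) follows because edges, nodes and components of disjoint graphs simply add up. A minimal sketch, using a hypothetical struct graph_counts; M is taken to have 2 edges and 3 nodes, the counts consistent with V(M) = 1 and with the totals of 12 edges and 12 nodes.

```c
/* Hypothetical summary of a graph: e, n and p. */
struct graph_counts { int edges, nodes, components; };

/* V(G) = e - n + 2 * p. */
int v_of(struct graph_counts g) {
    return g.edges - g.nodes + 2 * g.components;
}

/* Union of two graphs that share no nodes and no edges. */
struct graph_counts graph_union(struct graph_counts a, struct graph_counts b) {
    struct graph_counts u = {
        a.edges + b.edges,
        a.nodes + b.nodes,
        a.components + b.components
    };
    return u;
}
```

With M = { 2, 3, 1 }, A = { 4, 4, 1 } and B = { 6, 5, 1 }, the union has 12 edges, 12 nodes and 3 components, so v_of yields 12 - 12 + 2 * 3 = 6, the same as 1 + 2 + 3.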

  25. References
     [1] Halstead metrics: http://www.verifysoft.com/en_halstead_metrics.html
     [2] Halstead’s book: Maurice Halstead, “Elements of Software Science”, Elsevier, 1977, ISBN 0444002057
     [3] Detail on Halstead: http://www.horst-zuse.homepage.t-online.de/halstead.html
     [4] Wiki page on Cyclomatic numbers: http://en.wikipedia.org/wiki/Cyclomatic_complexity
     [5] Program complexity: http://www.acis.pamplin.vt.edu/faculty/tegarden/wrk-pap/DSS.PDF
     [6] Thomas J. McCabe, “A Complexity Measure”, IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976
     [7] C. Berge: “Graphs and Hypergraphs”, North-Holland, Amsterdam 1973
     [8] Deep Blue Info: http://www.research.ibm.com/deepblue/
     [9] Tax code info: http://www.fourmilab.ch/ustax/ustax.html
     [10] Halting Problem: http://www.comp.nus.edu.sg/~cs5234/FAQ/halt.html
     [11] Robert E. Tarjan: “Depth-First Search and Linear Graph Algorithms”, SIAM J. Computing, Vol. 1, No. 2, June 1972
