reading & understanding code

reading & understanding code. experts are better at code comprehension because they focus on higher level patterns. patterns can be considered “discourse rules”: naming conventions, design patterns, schemas. experts work significantly better when reading & writing code according to these patterns.

ozzie

Presentation Transcript


  1. reading & understanding code • experts are better at code comprehension because they focus on higher level patterns • patterns can be considered “discourse rules” • naming conventions, design patterns, schemas • experts work significantly better when reading & writing code according to these patterns

  2. reading & understanding code • program comprehension • expertise effects • mental models • tools

  3. outline • mental models • types • models • conventions & “discourse rules” • expertise effects • tool implications • interesting tools

  5. mental model • an explanation of someone’s thought process when carrying out a task • our someone: programmers • our task: program comprehension • several models exist

  6. mental model classes • bottom-up • read code statement by statement then ascend for a higher-level picture • top-down • start with a high-level picture of what the code is doing then descend into code • mixed • incorporate elements from both, based on the situation

  8. bottom-up mental models • 1st: read code statements • 2nd: chunking: group statements as abstractions • 3rd: repeat

  9. chunking [figure: a sequence divided into chunk 1, chunk 2, ..., chunk n, each chunk made up of element 1, element 2, ..., element k; modified from wikipedia]

  10. chunking • program model • reasoning about the order of computation, how control moves throughout a program • “control flow” • situation model • reasoning about how data moves through atomic models • “data flow” N. Pennington Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs Cognitive Psychology, 1987
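
A minimal sketch of the two views on one piece of code, in Python. The function `total_price` and the two hand-written tables are hypothetical illustrations, not material from Pennington's study. The program model follows the order in which statements run; the situation model follows which values feed which other values.

```python
def total_price(prices, tax_rate):
    """Sum the positive prices and apply tax (hypothetical example)."""
    subtotal = 0.0
    for price in prices:          # control flow: loop over each price
        if price > 0:             # control flow: branch on the sign
            subtotal += price     # data flow: price -> subtotal
    tax = subtotal * tax_rate     # data flow: subtotal, tax_rate -> tax
    return subtotal + tax         # data flow: subtotal, tax -> result


# program model (control flow): the order in which statements execute
CONTROL_FLOW = ["init subtotal", "loop over prices", "test price > 0",
                "accumulate", "compute tax", "return"]

# situation model (data flow): which values feed which other values
DATA_FLOW = {
    "subtotal": ["prices"],
    "tax": ["subtotal", "tax_rate"],
    "result": ["subtotal", "tax"],
}


def depends_on(name, flow=DATA_FLOW):
    """Return every value that `name` transitively depends on."""
    seen = set()
    stack = list(flow.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(flow.get(current, []))
    return seen
```

A data-flow question such as "what does the result depend on?" is answered by `depends_on("result")`, while a control-flow question such as "what runs right after the test?" is answered by position in `CONTROL_FLOW`.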

  11. program & situation model studies • participants first primed for either control flow or data flow • shown a piece of code, asked to recall another piece of code which is related through either control flow or data flow • participants then asked a question that relates to either control or data flow • participants primed to think about control flow answered other control-flow questions faster, same with data flow N. Pennington Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs Cognitive Psychology, 1987

  12. types of programmer knowledge • semantic: general programming concepts • low-level knowledge, e.g. what a=1 means • high-level knowledge, e.g. sorting algorithms • syntactic: language detail • overlaps between languages • stylistic: programming conventions • “discourse rules” B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979 E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984

  13. [figure: problem statement; short term memory; internal semantics (working memory) of the program, from high level concepts to low level concepts; knowledge (long term memory): semantic knowledge (high level concepts, low level concepts) and syntactic knowledge (COBOL, FORTRAN, PL/I, LISP)] B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979

  14. evidence for semantic & syntactic knowledge • lab studies using FORTRAN • participants: programmers and non-programmers • asked to perform tasks that used one type of knowledge • six studies (will describe two) B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979

  15. program memorization • study • two subject types: non-programmers & programmers • two program versions: normal & shuffled • participants asked to memorize a program • results • non-programmers performed equally poorly with normal & shuffled programs • programmers performed poorly with shuffled program, well with normal • were able to remember semantic details with syntactic variations • conclusion • programmers were not memorizing the program, but internal semantics to represent its function B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979

  16. commenting • study • two program versions • 5-line high-level block comment at top • numerous interspersed low-level comments • participants asked to make modifications to program & memorize program • result • high-level comment participants performed better • strong correlation between ability to make modifications and ability to memorize • conclusion • memorization is a strong correlate to comprehension • hierarchical chunking to organize statements into a unit facilitates the comprehension process B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979
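
A rough Python sketch of the two commenting styles compared in the study. The original programs were FORTRAN; the function below and its comments are hypothetical and only show the contrast between one high-level block comment and many low-level comments.

```python
def average_high_level(values):
    # Compute the mean of the positive values.
    # Non-positive values are skipped entirely.
    # An empty selection yields 0.0.
    total = 0.0
    count = 0
    for value in values:
        if value > 0:
            total += value
            count += 1
    return total / count if count else 0.0


def average_low_level(values):
    total = 0.0              # set total to zero
    count = 0                # set count to zero
    for value in values:     # loop over values
        if value > 0:        # check whether value is positive
            total += value   # add value to total
            count += 1       # add one to count
    return total / count if count else 0.0   # divide total by count
```

Both functions behave identically. The block comment in the first hands the reader a ready-made chunk, while the second restates each statement and leaves the chunking to the reader.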

  17. mental model classes • bottom-up • read code statement by statement then ascend for a higher-level picture • top-down • start with a high-level picture of what the code is doing then descend into code • mixed • incorporate elements from both, based on the situation

  19. top-down models • 1st: develop hypotheses about the program • 2nd: evaluate and refine hypotheses • with the help of beacons • 3rd: repeat • a process of “reconstructing knowledge”

  20. beacons • “indexes into existing knowledge” • recognizable features in the code that are cues to the presence of certain structures • e.g., looking for a listener pattern M. Storey Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future IEEE Workshop on Program Comprehension, 2005 R. Brooks Towards a theory of the comprehension of computer programs International Journal of Man-Machine Studies, 1983
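
A small Python sketch of what beacons for a listener pattern might look like. The `Button` class and all of its names are hypothetical; the comments mark the features a reader could use to form the hypothesis "this is a listener pattern" before reading every statement.

```python
class Button:
    """Hypothetical widget; the names below act as the beacons."""

    def __init__(self):
        self._listeners = []              # beacon: a collection named "listeners"

    def add_listener(self, callback):     # beacon: add/remove registration pair
        self._listeners.append(callback)

    def remove_listener(self, callback):
        self._listeners.remove(callback)

    def click(self):
        for callback in self._listeners:  # beacon: loop that notifies each callback
            callback(self)
```

Usage: after `button.add_listener(handler)`, each `button.click()` calls `handler(button)` until the handler is removed.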

  21. beacon types • semantic knowledge “plans” • reusable generic program fragments • high-level or low-level • programming discourse conventions • “rules” that make program comprehension easier • found across programmers E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984

  22. brooks’ model [figure: problem; external representation (requirement documentation, program code, design document); match via beacons; syntactic knowledge; semantic knowledge; verify internal schema vs external representation; internal representation (hypotheses and subgoals)] R. Brooks Towards a theory of the comprehension of computer programs International Journal of Man-Machine Studies, 1983 modified from Jonathan I. Maletic’s slides: An Overview of Mental Models for Program Understanding

  23. mental model classes • bottom-up • read code statement by statement then ascend for a higher-level picture • top-down • start with a high-level picture of what the code is doing then descend into code • mixed • incorporate elements from both, based on the situation

  25. opportunistic & systematic strategies • programmers enhancing existing program • two strategies: • systematic: read code in detail, tracing through control and data flow manually • developed control and data flow knowledge • opportunistic: focus only on code relevant to a task • developed only control flow knowledge, resulted in a weaker understanding Margaret-Anne Storey Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future Int. Workshop on Program Comprehension, 2005

  26. integrated model • maintainers switch between top-down and bottom-up comprehension • top-down if code or code type is familiar • program model (control-flow) when code is completely unfamiliar • situation model (data-flow) after a partial data-flow understanding is developed through top-down or program model methods • knowledge base: information from previous three models Margaret-Anne Storey Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future Int. Workshop on Program Comprehension, 2005 A. von Mayrhauser and A.M. Vans From Program Comprehension to Tool Requirements for an Industrial Environment IEEE Workshop on Program Comprehension, 1993

  27. validating the integrated model • taped professional maintenance programmers • worked with a large code base • classified as domain and language experts • tape transcriptions classified into model types • one of few studies with real world tasks

  28. outline • mental models • types • models • conventions & “discourse rules” • expertise effects • tool implications • interesting tools

  30. programming discourse rules • specify the conventions of programming • e.g., a variable’s name should reflect its function • e.g., don’t include code that won’t be used • similar to writing discourse rules, as outlined in books like Elements of Style • e.g., you expect to find the description for fig. 7 between those for fig. 6 and fig. 8 E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984

  31. rules of programming discourse • variable names should reflect function • don’t include code that won’t be used • if there is a test for a condition, then the condition must have the potential of being true • a variable that is initialized via an assignment statement should be updated via an assignment statement • don’t do double duty with code in a non-obvious way • an if should be used when a statement body is guaranteed to be executed only once, and a while used when a statement body may need to be repeatedly executed E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
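
A hedged Python sketch of what following and breaking these rules can look like. Both functions are hypothetical and are not the stimuli used by Soloway and Ehrlich. They return the same value, but the second breaks the rules on naming, unused code, and tests that can never be true.

```python
def largest_plan_like(numbers):
    """Plan-like: the variable name says what the variable holds."""
    largest = numbers[0]
    for number in numbers[1:]:
        if number > largest:
            largest = number
    return largest


def largest_unplan_like(numbers):
    """Un-plan-like: same result, but it breaks several discourse rules."""
    smallest = numbers[0]          # name does not reflect function
    for number in numbers[1:]:
        if number > smallest:
            smallest = number
        elif number > smallest:    # this test can never be true here
            smallest = 0           # code that is never used
    return smallest
```

A reader who trusts the name `smallest` will form the wrong hypothesis about the second function and has to fall back on statement-by-statement reading to correct it.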

  32. testing discourse rules • lab study with expert & novice programmers • two program types • α (plan-like): obeyed discourse rules • β (un-plan-like): disobeyed discourse rules • participants given either α or β code, with one blank • task: fill the blank with what seems “natural” • participants were not told about α or β code • conclusion: experts fared best with α code

  33. why have un-plan-like (β) code? • machine limitations • limited memory, processing, bandwidth, etc. • language limitations • less common. bugs, efficiency issues, etc. • programmer limitations • does not have full mastery of discourse • historical traces • resistance to changing legacy code, permanent “temporary” code source: The Psychology of Computer Programming

  34. XXX: PROCEDURE OPTIONS(MAIN);
        DECLARE B(1000) FIXED(7,2),
                C FIXED(11,2),
                (I, J) FIXED BINARY;
        C = 0;
        DO I = 1 TO 10;
          GET LIST((B(J) DO J = 1 TO 1000));
          DO J = 1 TO 1000;
            C = C + B(J);
          END;
        END;
        PUT LIST('RESULT IS ', C);
      END XXX;
      modified from The Psychology of Computer Programming

  35. XXX: PROCEDURE OPTIONS(MAIN);
        DECLARE A(10000) FIXED(7,2),
                C FIXED(11,2),
                I FIXED BINARY;
        C = 0;
        GET LIST((A(I) DO I = 1 TO 10000));
        DO I = 1 TO 10000;
          C = C + A(I);
        END;
        PUT LIST('RESULT IS ', C);
      END XXX;
      modified from The Psychology of Computer Programming
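
A rough Python analogue of the two PL/I programs above. It assumes the nested-loop version exists because only 1000 values fit in memory at once, which is one reading of the "machine limitations" bullet rather than something the slides state. The function names and the `batch_size` parameter are hypothetical.

```python
from itertools import islice


def sum_in_batches(values, batch_size=1000):
    """Sum an iterable while holding at most batch_size values at a time,
    like the nested-loop program that reads 10 batches of 1000."""
    iterator = iter(values)
    total = 0
    while True:
        batch = list(islice(iterator, batch_size))   # read one batch
        if not batch:
            break
        for value in batch:                          # inner summing loop
            total += value
    return total


def sum_all_at_once(values):
    """Sum after loading every value, like the single-loop program."""
    loaded = list(values)
    total = 0
    for value in loaded:
        total += value
    return total
```

Both functions return the same total. The batching only matters to a reader who knows about the memory limit, which is why such code looks un-plan-like once the limit is gone.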

  36. rules of programming discourse • variable names should reflect function • don’t include code that won’t be used • if there is a test for a condition, then the condition must have the potential of being true • a variable that is initialized via an assignment statement should be updated via an assignment statement • don’t do double duty with code in a non-obvious way • an if should be used when a statement body is guaranteed to be executed only once, and a while used when a statement body may need to be repeatedly executed E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984

  38. naming conventions • meaningful names • variable naming reflects cognitive structure • grammatical sensibility • interact with language spec. to form expressions • containers & paths • objects & pointers • polysemy, homonymy, & overloading • operators, name sharing B. Liblit, A. Begel, and E. Sweetser Cognitive Perspectives on the Role of Naming in Computer Programs Psychology of Programming Interest Group, 2006

  40. meaningful names • metaphors for domain tasks • e.g. pushing objects onto a stack • keywords for grouping • e.g. common prefixes & suffixes • informative names • balanced with name length A. Blackwell Metaphor or analogy: how should we see programming abstractions? Psychology of Programming Interest Group, 1996 B. Liblit, A. Begel, and E. Sweetser Cognitive Perspectives on the Role of Naming in Computer Programs Psychology of Programming Interest Group, 2006

  41. name length • longer names harm readability and recall ability • idioms and memory ties improve readability and recall ability • takeaway: variable names with consistent and abbreviated vocabulary are optimal • (variable names that concisely express a metaphor) D. Binkley, D. Lawrie, S. Maex, and C. Morrell Identifier length and limited programmer memory Science of Computer Programming, 2009

  42. grammatical sensibility • names as phrase fragments • methods as actions (change state of program) • e.g. addElement, setSize, removeAll • methods as mathematical functions (compute result, don’t alter state) • e.g. true/false: contains, equals, isEmpty • e.g. data: capacity, indexOf, size • valence cues (phrase fragments w/ open slot) • e.g. roster.contains(player) • smalltalk makes use of this extensively: • roster insert: player at: position B. Liblit, A. Begel, and E. Sweetser Cognitive Perspectives on the Role of Naming in Computer Programs Psychology of Programming Interest Group, 2006
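
A minimal Python sketch of these naming categories. The `Roster` class is hypothetical and its method names are adapted to Python's snake_case; the keyword argument `at` is one way to keep the open slot visible at the call site, loosely in the spirit of the Smalltalk example.

```python
class Roster:
    """Hypothetical roster; method names follow the categories above."""

    def __init__(self):
        self._players = []

    # actions: verb phrases that change the state of the roster
    def add_player(self, player):
        self._players.append(player)

    def remove_all(self):
        self._players.clear()

    def insert(self, player, at):
        self._players.insert(at, player)

    # true/false queries: read as predicates, do not change state
    def contains(self, player):
        return player in self._players

    def is_empty(self):
        return not self._players

    # data queries: noun phrases that return a value, do not change state
    def size(self):
        return len(self._players)

    def index_of(self, player):
        return self._players.index(player)
```

Usage: `roster.insert("kim", at=0)` reads as a phrase with both slots filled, and `roster.contains("kim")` reads as a yes/no question.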

  43. outline • mental models • types • models • conventions & “discourse rules” • expertise effects • tool implications • interesting tools

  45. 20:1 programmer performance • Sackman et al.: best programmers are 20x better than worst programmers at bug fixing • study originally meant to evaluate the effectiveness of time-shared systems H. Sackman, W. J. Erikson, and E. E. Grant Exploratory experimental studies comparing online and offline programming performance Communications of the ACM, 1968

  46. 10:1 programmer performance • there are substantial programmer efficiency differences, but not as dramatic as initially reported • what makes experts so much better at understanding code?

  47. testing discourse rules • lab study with expert & novice programmers • two program types • α (plan-like): obeyed discourse rules • β (un-plan-like): disobeyed discourse rules • participants given either α or β code, with one blank • task: fill the blank with what seems “natural” • participants were not told about α or β code

  48. α problem (? marks the blank to fill)
      PROGRAM Magenta(input, output);
      VAR Max, I, Num : INTEGER;
      BEGIN
        Max := 0;
        FOR I := 1 TO 10 DO
          BEGIN
            READLN(Num);
            IF Num ? Max THEN Max := Num
          END;
        WRITELN(Max)
      END.
      E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984

  49. α solution
      PROGRAM Magenta(input, output);
      VAR Max, I, Num : INTEGER;
      BEGIN
        Max := 0;
        FOR I := 1 TO 10 DO
          BEGIN
            READLN(Num);
            IF Num > Max THEN Max := Num
          END;
        WRITELN(Max)
      END.
      E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
