
C-REX: An Evolutionary Code Extractor for C

Ahmed E. Hassan and Richard C. Holt, University of Waterloo. A traditional code extractor examines a single snapshot of the code (examples: Rigiparse, CIA, CFX, CPPX) and produces facts such as "function_1 calls function_2" and "function_1 uses variable_1".


Presentation Transcript


  1. C-REX: An Evolutionary Code Extractor for C
     Ahmed E. Hassan and Richard C. Holt, University of Waterloo

  2. Traditional Code Extractor
     • Examines a snapshot of the code
     • Examples: Rigiparse, CIA, CFX, CPPX
     • Produces facts such as:
       • function_1 calls function_2
       • function_1 uses variable_1
     [Diagram: Source Code Snapshot → Traditional Extractor → Facts]

  3. Evolutionary Code Extractor
     • Examines multiple snapshots of the code
     [Diagram: snapshots S0, S1, .., St, St+1 → Traditional Extractor → facts F0, F1, .., Ft, Ft+1 → Compare Snapshot Facts → Evolutionary Change Data]

  4. Evolutionary Code Extractor
     • Produces facts about:
       • Addition, removal, or modification of code entities (functions/variables/macros):
         • function_1 is added/removed/modified
       • Dependencies between code entities:
         • function_1 no longer calls function_2
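The "compare snapshot facts" step on slides 3 and 4 can be sketched as a set difference between the facts of two consecutive snapshots. This is only an illustration of the idea, not C-REX's implementation; the `Fact` type and `diff_facts` function are hypothetical names introduced here.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One extracted fact, e.g. function_1 calls function_2 (hypothetical type)."""
    source: str
    relation: str  # "calls" or "uses"
    target: str


def diff_facts(old: set, new: set) -> dict:
    """Return the facts that appeared and disappeared between two snapshots."""
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }


# Facts for snapshot t and snapshot t+1, using the names from the slides.
old_facts = {
    Fact("function_1", "calls", "function_2"),
    Fact("function_1", "uses", "variable_1"),
}
new_facts = {
    Fact("function_1", "uses", "variable_1"),
}

changes = diff_facts(old_facts, new_facts)
for fact in changes["removed"]:
    print(f"{fact.source} no longer {fact.relation} {fact.target}")
```

Keeping facts as hashable tuples makes the comparison a plain set operation, which is why the sketch uses a `NamedTuple` rather than a mutable class.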

  5. The Need for Evolutionary Change Data
     • Assist in understanding how source code evolves and how code changes propagate
     • Build better software development tools:
       • Measure the benefits of not-yet-existing tools
       • Measure the value of adopting development tools or methodologies
     • Monitor the quality and state of software systems:
       • Examine if changes are localized or scattered

  6. Motivating Example
     V1: Undefined function (link error)
         main() { int a; /*call help*/ helpInfo(); }
     V2: Syntax error
         main() { int a; /*call help*/ helpInfo(); }
         helpInfo() { errorString! }
     V3: Valid code
         main() { int a; /*call help*/ helpInfo(); }
         helpInfo() { int b; }

  7. Challenges
     • Robustness (code is always changing):
       • Release frequency: too coarse
       • Change frequency: code may be invalid/incomplete (lookahead techniques)
     • Accuracy of extracted data (AST level is too complex):
       • Do not go to AST level, to make life easier
     • Scalability/performance to extract large, long-lived systems
     • Complexity of building the extractor and effort required:
       • Adopt off-the-shelf components

  8. Evolutionary Code Extractor
     Using a traditional snapshot extractor is not feasible.
     [Diagram: snapshots S0, S1, .., St, St+1 → Traditional Extractor → facts F0, F1, .., Ft, Ft+1 → Compare Snapshot Facts → Evolutionary Change Data]

  9. ctags
     • Can tag start and end of code entities
     • Used by source editors for highlighting and navigational support
     • Contains a variety of heuristics to handle incomplete and complex code (ifdef’s and K&R vs. ANSI C)
     • Supports over 30 languages
     • Actively maintained and highly optimized
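To make "tag start and end of code entities" concrete, here is a minimal sketch that reads one line of a tags file. It assumes the tab-separated extended format with `line:` and `end:` extension fields; whether a given ctags build emits an `end:` field depends on its version and options, and the sample line below is made up. C-REX itself drives ctags from Perl, so `Entity` and `parse_tag_line` are hypothetical names for illustration only.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    """A code entity with its location, as recovered from one tags line."""
    name: str
    path: str
    kind: Optional[str]
    start: Optional[int]
    end: Optional[int]


def parse_tag_line(line: str) -> Optional[Entity]:
    """Parse one extended-format tags line; return None for metadata lines."""
    if line.startswith("!_TAG_"):
        return None  # pseudo-tags describing the tags file itself
    # Assumption: the search pattern (third column) contains no tab character.
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        return None
    name, path = parts[0], parts[1]
    kind = start = end = None
    for field in parts[3:]:
        key, sep, value = field.partition(":")
        if not sep:
            kind = field  # a bare field such as "f" is the entity kind
        elif key == "kind":
            kind = value
        elif key == "line" and value.isdigit():
            start = int(value)
        elif key == "end" and value.isdigit():
            end = int(value)
    return Entity(name, path, kind, start, end)


# Hypothetical tags line for the helpInfo function of the motivating example.
sample = 'helpInfo\tmain.c\t/^helpInfo() {$/;"\tf\tline:6\tend:8'
entity = parse_tag_line(sample)
print(entity)
```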

  10. Our Solution: C-REX
      • Use CVS to acquire thousands of code snapshots
      • Use ctags to assist in the parsing and code analysis
      • Use Perl scripts to drive the ctags analysis
      • Attach additional CVS data

  11. C-REX Evolutionary Change Data Schema (1/2)

  12. C-REX Schema (2/2)

  13. C-REX Implementation Overview: Simple Example

  14. Implementation Overview
      • Revision Data Extraction
        • Retrieve revision details from CVS:
          • File revisions
          • Developer name
          • Change message
          • Other changed files
      • Entity Extraction Using ctags
        • Record start and end of each entity and its contents
        • Build Historical Symbol Table:
          • helpInfo
          • main
          • helpInfo2
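One way to picture the historical symbol table is as a record of every entity name that has existed in any revision, so that a name can still be recognized in a revision where its definition is missing or broken. The sketch below works under that assumption; the slides do not specify the table's layout, and the revision numbers, field names, and `build_historical_symbol_table` function are hypothetical.

```python
def build_historical_symbol_table(revisions):
    """Collect every entity name ever defined, with first and last revision seen.

    `revisions` is an ordered list of (revision_id, entity_names) pairs.
    """
    table = {}
    for revision_id, entity_names in revisions:
        for name in entity_names:
            entry = table.setdefault(
                name, {"first_seen": revision_id, "last_seen": revision_id}
            )
            entry["last_seen"] = revision_id
    return table


# Hypothetical history using the entity names from the slide.
history = [
    ("1.1", {"main"}),
    ("1.2", {"main", "helpInfo"}),
    ("1.3", {"main", "helpInfo", "helpInfo2"}),
]

symbols = build_historical_symbol_table(history)
for name in sorted(symbols):
    print(name, symbols[name])
```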

  15. Implementation Overview
      • Entity Analysis: create 3 buckets:
        • Code bucket
        • Comment bucket
        • Control bucket
      • Token Change Analysis
      • Dependency Change Analysis using Historical Symbol Table
      • Attaching Revision Data to Recovered Change Data
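The bucket and dependency steps can be sketched at the token level, in keeping with the decision not to go to the AST. The sketch separates comments from code, then treats any code token that matches a name in the historical symbol table as a dependency and compares two revisions of one entity. This is an assumption-laden illustration rather than the C-REX algorithm: the control bucket is omitted because the slides do not define it, and all function names and the second revision of `main` are hypothetical.

```python
import re

COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)  # C block comments only
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def split_buckets(body):
    """Split an entity body into a code bucket and a comment bucket."""
    comments = COMMENT_RE.findall(body)
    code = COMMENT_RE.sub(" ", body)
    return code, comments


def dependencies(body, known_symbols, self_name):
    """Names from the symbol table that occur as tokens in the code bucket."""
    code, _ = split_buckets(body)
    return {
        token
        for token in IDENT_RE.findall(code)
        if token in known_symbols and token != self_name
    }


def dependency_changes(old_body, new_body, known_symbols, self_name):
    """Compare the dependencies of two revisions of the same entity."""
    old = dependencies(old_body, known_symbols, self_name)
    new = dependencies(new_body, known_symbols, self_name)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


known = {"main", "helpInfo", "helpInfo2"}
old_main = "main() { int a; /*call help*/ helpInfo(); }"
# Hypothetical later revision that switches to helpInfo2.
new_main = "main() { int a; /*call helpInfo2*/ helpInfo2(); }"

result = dependency_changes(old_main, new_main, known, "main")
print(result)
```

Because names mentioned only inside comments land in the comment bucket, they do not count as dependencies; only tokens in the code bucket are matched against the symbol table.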

  16. C-REX Output Size (in MB)

  17. C-REX Limitations
      • Performance:
        • A project with 10 years of history (NetBSD) takes 12 hours to extract
        • Use RAID drives to improve performance
        • Parallelize the extraction
      • Dependency Analysis:
        • Does not consider the build system (Makefiles)
        • Dependency linking windows
      • Beyond C and CVS

  18. Conclusions
      • Introduced the evolutionary code extractor, a new type of code extractor that extracts the evolutionary history of a project
      • Discussed the challenges associated with building such an extractor
      • Presented the implementation of C-REX and highlighted its limitations
