
Presentation 7 Summary

Presentation 7 Summary. Cross Language Clone Analysis Team 2 November 22, 2010. Agenda. Feasibility Study Release Plan Architecture Parsing CodeDOM Clone Analysis Testing Demonstration Team Collaboration Path Forward. Our Team. Allen Tucker Patricia Bradford Greg Rodgers


Presentation Transcript


  1. Presentation 7 Summary Cross Language Clone Analysis Team 2 November 22, 2010

  2. Agenda • Feasibility Study • Release Plan • Architecture • Parsing • CodeDOM • Clone Analysis • Testing • Demonstration • Team Collaboration • Path Forward

  3. Our Team • Allen Tucker • Patricia Bradford • Greg Rodgers • Brian Bentley • Ashley Chafin

  4. Feasibility Study Our evaluation of the project to determine the difficulty in carrying out the task.

  5. Task Summary • Our Customers: Dr. Etzkorn and Dr. Kraft • Customer Request: • A tool that will abstract programs in C++, C#, Java, and (Python or VB) to the Dagstuhl Middle Metamodel, Microsoft CodeDOM or something similar, and detect cross-language clones. • Areas to Note: • the user interface • easy comparisons of clones • visualization of clones • sub-clones • clone detection for large bodies of code

  6. Task Summary (cont.) • Per our task, in order to find clones across different programming languages, we will have to first convert the code from each language over to a language independent object model. • Some Language Independent Object Models: • Dagstuhl Middle Metamodel (DMM) • Microsoft CodeDOM • Both of these models provide a language independent object model for representing the structure of source code.

  7. Task Understanding • Three Step Process • Step 1 Code Translation: Source Files → Translator → Common Model • Step 2 Clone Detection: Common Model → Inspector → Detected Clones • Step 3 Visualization: Detected Clones → UI → Clone Visualization
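The three-step process can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Node` class, the three function names, and the hard-coded output of `translate` are hypothetical and are not taken from the team's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One element of a hypothetical language-independent model."""
    kind: str            # e.g. "method", "loop", "assign"
    children: tuple = ()


def translate(language: str, source: str) -> Node:
    """Step 1 (stub): a real translator would parse `source`."""
    # Hard-coded for illustration: both inputs hold a method with one loop.
    return Node("method", (Node("loop", (Node("assign"),)),))


def detect(models: dict) -> list:
    """Step 2: report pairs of files whose models are structurally equal."""
    names = sorted(models)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if models[a] == models[b]]


def visualize(clones: list) -> str:
    """Step 3: render the detected clones as plain text."""
    return "\n".join(f"{a} <-> {b}" for a, b in clones)


models = {
    "Sum.java": translate("java", "for (int i = 0; i < n; i++) total += i;"),
    "Sum.cs": translate("csharp", "for (int i = 0; i < n; i++) total += i;"),
}
report = visualize(detect(models))
print(report)
```

Because both files translate to the same common model, the inspector step pairs them even though they come from different languages.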

  8. Benefits • Fact: Modularity is a key characteristic in today’s software world • Why? It lets us decompose software along separate concerns • Contributes to maintainability, reusability, testability and reliability • Clone Detection allows us to detect common software spread across large bodies of code • Identifies code that is a candidate for further modularization

  9. Features • Clone Detection Software Suite • Identifies • Tracks • Manages Software Clones • Multi-language support • C++ • C# • Java

  10. Features (cont) • Provides complete code coverage • Multi-Application Support • Stand-alone • Plug-in based (Eclipse) • Backend service (Ant task) • Extendible • Built on a Plug-in Framework • Add new languages • Easy to Navigate between Clones • Persists Clones for easy Retrieval

  11. Risk Analysis • Complexity of the problem proves greater than initial estimates. • Technology to be applied is not well established or has yet to be developed. • Unable to complete the defined project scope within schedule. • Volatile user requirements lead to redefinition of project objectives.

  12. Release Plan Release Plan and User Stories

  13. Re-tooled User Stories • Came out with original Release Plan on 9/15/20 • Due to customer wants/needs, we had to re-tool our user stories. • Dr. Etzkorn’s main concerns: • Load source code and translate to a language independent model • Analyze the translated source code for clones • Results from meeting: • Created two new user stories (see next two slides) • These two user stories have been pushed to the front of our card stack

  14. CS 666 Studio I User Stories Phase I

  15. Source Code Load & Translate 017 1 14 Days As an analyst I want to load and translate my source code projects so I can analyze the source for clones.

  16. Source Code Analyze 018 1 14 Days As an analyst I want to analyze my source code projects so I can see the clones.

  17. Code Clone Highlights 002 1 14 Days As an analyst I want the capability to have the source code associated with clones highlighted within source files so that they are easy to identify.

  18. Current Tasks Requirements & Models

  19. Current Tasks’ Requirements • Requirements modeling for the first user story “Source Code Load & Translate”: • Load & parse C#, Java, C++ source code. • Translate the parsed C#, Java, C++ source code to CodeDOM. • Associate the CodeDOM with the original source code. • Requirements modeling for the second user story “Source Code Analyze”: • Analyze the CodeDOM for clones.
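The "associate" and "analyze" requirements can be illustrated together. The sketch below is one possible approach, not the team's algorithm: each model node carries its source file and line, and subtrees with the same structure are grouped. `ModelNode`, `shape`, `find_clones`, and the `min_size` threshold are all hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelNode:
    kind: str          # language-neutral node kind, e.g. "if", "call"
    file: str          # originating source file
    line: int          # line in that file (the "associate" requirement)
    children: tuple = ()


def shape(node: ModelNode) -> tuple:
    """Structural fingerprint that ignores file and line information."""
    return (node.kind, tuple(shape(c) for c in node.children))


def size(node: ModelNode) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def walk(node: ModelNode):
    """Yield every node of the subtree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_clones(roots, min_size=3):
    """Group subtrees of at least `min_size` nodes that share a shape."""
    buckets = defaultdict(list)
    for root in roots:
        for node in walk(root):
            if size(node) >= min_size:
                buckets[shape(node)].append((node.file, node.line))
    return [locs for locs in buckets.values() if len(locs) > 1]


java_tree = ModelNode("method", "A.java", 3, (
    ModelNode("if", "A.java", 4, (
        ModelNode("call", "A.java", 5),
        ModelNode("return", "A.java", 6),
    )),
))
cs_tree = ModelNode("method", "B.cs", 10, (
    ModelNode("if", "B.cs", 12, (
        ModelNode("call", "B.cs", 13),
        ModelNode("return", "B.cs", 14),
    )),
))

clones = find_clones([java_tree, cs_tree])
for group in clones:
    print(group)
```

The result lists the whole-method clone and the nested `if` sub-clone, each with the file and line needed to highlight it in the original source. Exact structural matching is the simplest choice here; a real detector would likely also need to tolerate small differences.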

  20. UML Model – Load & Parse

  21. UML Model – Translate

  22. UML Model – Associate

  23. UML Model – Analyze

  24. Architecture Design and Architecture

  25. Key Architecture Points • Multilanguage support • Configurable for different platforms • Stand-alone application • Plug-in • Backend service • Extendable

  26. Architecture Application User Interface Web Interface Core Clone Detection Algorithms Code Model Service API Language Support (Interface) Eclipse Plug-in C# Service Java Service C++ Service Etc…

  27. Core Unit • Code Model • Stores the code in common format • Application Programming Interface • Used to embed clone detection in applications • Language Service Interface • Communication layer between the core and the specific language services Core Clone Detection Algorithms Code Model API Language Service Interface
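A minimal sketch of how a language service interface could decouple the core from the individual language services. The class names, the `extensions` attribute, and the dictionary returned as the "code model" are assumptions made for illustration; the slides do not specify the actual interface.

```python
from abc import ABC, abstractmethod


class LanguageService(ABC):
    """Hypothetical contract between the core and one language plug-in."""

    extensions: tuple = ()

    @abstractmethod
    def to_code_model(self, source: str) -> dict:
        """Return the common-format model for `source`."""


class Core:
    """Holds the registered services and routes files to them."""

    def __init__(self):
        self._services = {}

    def register(self, service: LanguageService) -> None:
        for ext in service.extensions:
            self._services[ext] = service

    def load(self, filename: str, source: str) -> dict:
        ext = filename[filename.rfind("."):] if "." in filename else ""
        if ext not in self._services:
            raise ValueError(f"no language service registered for {filename!r}")
        return self._services[ext].to_code_model(source)


class JavaService(LanguageService):
    extensions = (".java",)

    def to_code_model(self, source):
        # Placeholder model: a real service would parse the source.
        return {"language": "java", "lines": len(source.splitlines())}


class CSharpService(LanguageService):
    extensions = (".cs",)

    def to_code_model(self, source):
        return {"language": "csharp", "lines": len(source.splitlines())}


core = Core()
core.register(JavaService())
core.register(CSharpService())
model = core.load("Hello.cs", "class Hello {\n}\n")
print(model)
```

Adding a new language in this sketch means writing one more `LanguageService` subclass and registering it; the core itself does not change.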

  28. App Configuration

  29. CRC Card Sampling Class Responsibility Collaboration Cards

  30. Java Parser CRC

  31. C# Parser CRC

  32. Language Service CRC

  33. Java Service CRC

  34. Cs Service CRC

  35. Clone Detection CRC

  36. Parsing Our struggles and our successes.

  37. Parsing Struggles & Successes • We explored and conducted spikes on CSParser and CS CodeDOM Parser. • They both had advantages and disadvantages. • We came to the conclusion that neither of them was going to fit our needs. • We explored and conducted a spike on GOLD Parser. • We ultimately chose the GOLD Parser because it best fit our needs. • This gave us a way to manage multiple language grammars with one engine.

  38. GOLD Parsing System GOLD Parsing Populating CodeDOM

  39. How It Works (Block Structure) • Grammar → Builder → Compiled Grammar Table (*.cgt) • Source Code + Compiled Grammar Table (*.cgt) → Engine → Parsed Data

  40. How It Works (Process) • Grammar → Builder → Compiled Grammar Table (*.cgt) • Source Code + Compiled Grammar Table (*.cgt) → Engine → Parsed Data • Typical output from engine: a long nested tree

  41. Usage within CloneDigger Source Code Compiled Grammar Table (*.cgt) Engine Parsed Data CodeDOM Conversion AST • CodeDOM Conversion • Need to write a routine to move data from the parsed tree to CodeDOM • Parsed data trees from the parser are stored in a consistent data structure, but their contents depend on the rules defined within each grammar
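One way to write such a conversion routine is to register one handler per production rule and walk the parsed tree bottom-up. The sketch below assumes a parsed tree made of `(rule_name, children)` tuples with plain strings as tokens; that shape, the rule names, and the two model classes (loosely named after CodeDOM's `CodeTypeDeclaration` and `CodeMemberMethod`) are illustrative assumptions, not the GOLD engine's actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class MemberMethod:
    name: str


@dataclass
class TypeDeclaration:
    name: str
    members: list = field(default_factory=list)


# Maps a production rule name to the function that converts its reduction.
HANDLERS = {}


def handles(rule_name):
    """Decorator that registers a handler for one production rule."""
    def decorator(fn):
        HANDLERS[rule_name] = fn
        return fn
    return decorator


def convert(node):
    """Convert children first, then hand the results to the rule's handler."""
    if isinstance(node, str):      # a token: pass through unchanged
        return node
    rule, children = node
    handler = HANDLERS.get(rule)
    if handler is None:
        raise NotImplementedError(f"no handler for rule {rule!r}")
    return handler([convert(c) for c in children])


@handles("ClassDecl")
def class_decl(parts):
    # parts: ["class", name, members]
    return TypeDeclaration(parts[1], parts[2])


@handles("MemberList")
def member_list(parts):
    return list(parts)


@handles("MethodDecl")
def method_decl(parts):
    # parts: [return_type, name]
    return MemberMethod(parts[1])


parsed = ("ClassDecl", ["class", "Account", ("MemberList", [
    ("MethodDecl", ["void", "Deposit"]),
    ("MethodDecl", ["void", "Withdraw"]),
])])
type_decl = convert(parsed)
print(type_decl)
```

A rule without a handler raises `NotImplementedError`, which makes gaps in grammar coverage visible instead of silently dropping parts of the tree.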

  42. Grammar Updates Bookkeeping for parsing the multiple grammars.

  43. Grammar Updates • Currently the grammars we have for the GOLD Parser are outdated. • Current GOLD grammars • C# version 2.0 • Java version 1.4 • Currently available language versions • C# version 4.0 • Java version 6

  44. Grammar Update Issues • Grammars for C# and Java are very complex and require a lot of work to build. • ANTLR and GOLD Parser grammars use completely different syntax. • Positive note: Other development is not halted by use of the older grammars.

  45. Our Bookkeeping Bookkeeping for parsing the multiple grammars

  46. Compiled Grammar Table • For Java, there are: • 359 production rules • 249 distinct symbols (terminal & non-terminal) • For C#, there are: • 415 production rules • 279 distinct symbols (terminal & non-terminal)

  47. Production Rule Dependencies

  48. Our Grammar Bookkeeping Since there are so many production rules, we came up with the following bookkeeping: • A spreadsheet of the compiled grammar table (for each language) with each production rule indexed. • This spreadsheet covers: • various aspects of language • what we have/have not handled from the parser • what we have/have not implemented into CodeDOM • percentage complete
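The bookkeeping described above amounts to one record per indexed production rule plus a percentage over those records. A minimal sketch follows; the `RuleStatus` fields and the four sample rows are invented for illustration and are not the team's spreadsheet data.

```python
from dataclasses import dataclass


@dataclass
class RuleStatus:
    index: int                  # production rule index in the compiled table
    rule: str                   # rule name (hypothetical examples below)
    parser_handled: bool        # handled from the parser?
    codedom_implemented: bool   # implemented into CodeDOM?


def percent(rows, attribute):
    """Percentage of rows whose boolean `attribute` is set."""
    if not rows:
        return 0.0
    done = sum(1 for r in rows if getattr(r, attribute))
    return round(100.0 * done / len(rows), 1)


# Made-up sample rows, one per production rule.
rows = [
    RuleStatus(0, "<Class Decl>", True, True),
    RuleStatus(1, "<Method Decl>", True, True),
    RuleStatus(2, "<For Statement>", True, False),
    RuleStatus(3, "<Switch Statement>", False, False),
]

parser_pct = percent(rows, "parser_handled")
codedom_pct = percent(rows, "codedom_implemented")
print(f"parser handlers: {parser_pct}%  CodeDOM: {codedom_pct}%")
```

Tracking the two flags separately keeps "parsed but not yet converted" rules visible, which is the distinction the spreadsheet columns draw.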

  49. Our Grammar Bookkeeping

  50. Parsing & CodeDOM Status • Parsing Handlers’ Status: • C# = 100% complete • Java = 100% complete
