
Information Extraction from Documents for Automating Software Testing by Patricia Lutsky

Explore the need for a natural-language-based system that extracts information from documents for automating software testing. Learn about SIFT (Specification Information From Text) and its unique features, alternative extraction methods, system design, and experimental results.


Presentation Transcript


  1. Information Extraction from Documents for Automating Software Testing, by Patricia Lutsky • Presented by Ramiro Lopez

  2. Outline • Why a natural-language-based system for extracting information from documents is needed • Alternative ways of extracting information from documents • System design and implementation details • Experimental results

  3. Motivation for SIFT • What is SIFT? SIFT stands for Specification Information From Text. • Various documents in Software Engineering are written in natural language. • Examples: Requirements and Specification Documents, User Manuals. • Software Engineering Documents tend to be written in a very particular way with specific sections and subsections, i.e., they are semi-structured.

  4. What does SIFT do? • SIFT is essentially an automated testing tool • It extracts specification-level information, generates tests with that information and adds them to the set of existing test cases • The tests are then run to check that the system conforms to the documentation

  5. Alternative ways for extracting information from documents • Use a controlled language for requirements specifications • Parse natural language texts about testing entirely and generate test scripts • Extract specific facts on system specifications, but no specific testable facts

  6. What is unique about SIFT? • Extracts specific testable facts from semi-structured documents • Uses XML, which separates content information from presentation formats, to give the document a consistent structure • Does not pursue full-text understanding, thus avoiding issues related to the endless ways of saying the same thing

  7. How to use SIFT • Identify concepts that can be extracted for testing • Examine a document to find out how it is organized and to find the different sentence types • Encode sentence types in a grammar • Create XML tags to give the document a consistent structure

  8. XML tag examples
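
As a purely hypothetical illustration of the kind of tagging described on the previous slide, the sketch below wraps one specification sentence in XML using Python's standard library. The tag names (`routine`, `argument`, `description`), the routine name, and the helper `sentences_for_argument` are assumptions made for this example and are not taken from SIFT.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names and routine name; SIFT's actual tag set is not given here.
doc = ET.Element("routine", name="EXAMPLE_ROUTINE")
arg = ET.SubElement(doc, "argument", name="BUFQUO")
desc = ET.SubElement(arg, "description")
desc.text = "The maximum value you can specify with the BUFQUO argument is 65355"


def sentences_for_argument(root, arg_name):
    """Return the description text tagged under the named argument."""
    return [
        d.text
        for a in root.iter("argument")
        if a.get("name") == arg_name
        for d in a.iter("description")
    ]


# Serialize the tagged document; content is now separate from presentation.
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
print(sentences_for_argument(doc, "BUFQUO"))
```

With a consistent structure like this, an extractor can go straight to the sentences attached to a given argument instead of scanning the whole manual.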

  9. Example of how a sentence is processed • Natural-language specification: "The maximum value you can specify with the BUFQUO argument is 65355" • The parser translates this into a canonical sentence, "The maximum value for BUFQUO is 65355", and then into a canonical form, (maximum_value BUFQUO 65355) • The canonical form is then mechanically converted into actual code (a test case) and added to the system
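
The processing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not SIFT's parser: it assumes a single sentence type matched with a regular expression, and the names `MAX_VALUE_PATTERN`, `to_canonical`, and `boundary_values` are hypothetical. The idea of testing the limit and one value above it is also an assumption about what a generated test might check.

```python
import re

# Hypothetical pattern covering one sentence type (an assumption, not SIFT's grammar).
MAX_VALUE_PATTERN = re.compile(
    r"The maximum value (?:you can specify with the|for) "
    r"(?P<arg>\w+)(?: argument)? is (?P<value>\d+)"
)


def to_canonical(sentence):
    """Map a matching sentence to a (maximum_value, ARG, N) tuple, else None."""
    m = MAX_VALUE_PATTERN.search(sentence)
    if m is None:
        return None
    return ("maximum_value", m.group("arg"), int(m.group("value")))


def boundary_values(fact):
    """Derive (argument, value, should_be_accepted) test inputs from a fact."""
    _, arg, limit = fact
    return [(arg, limit, True), (arg, limit + 1, False)]


spec = "The maximum value you can specify with the BUFQUO argument is 65355"
fact = to_canonical(spec)
print(fact)
print(boundary_values(fact))
```

Both the original sentence and the canonical sentence "The maximum value for BUFQUO is 65355" map to the same tuple, which is the point of the canonical form: many phrasings, one testable fact.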

  10. Example of a rule in a grammar • Suppose you have two structurally equivalent sentences: The box is on the counter. The glass is under the counter. • They would be translated into a rule in a grammar as follows: NounPhrase is Preposition NounPhrase
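
To make the rule concrete, here is a small sketch of a matcher for "NounPhrase is Preposition NounPhrase". It assumes a noun phrase is just a determiner followed by one noun and uses a tiny hand-written word list; `DETERMINERS`, `PREPOSITIONS`, `parse_noun_phrase`, and `match_rule` are hypothetical names for this example only.

```python
# Tiny hypothetical lexicon; a real grammar would be far richer.
DETERMINERS = {"the", "a", "an"}
PREPOSITIONS = {"on", "under", "in", "over"}


def parse_noun_phrase(tokens, i):
    """Parse 'Determiner Noun' starting at index i; return (noun, next_index) or None."""
    if i + 1 < len(tokens) and tokens[i] in DETERMINERS:
        return tokens[i + 1], i + 2
    return None


def match_rule(sentence):
    """Match 'NounPhrase is Preposition NounPhrase'; return (prep, subject, object) or None."""
    tokens = sentence.lower().rstrip(".").split()
    first = parse_noun_phrase(tokens, 0)
    if first is None:
        return None
    subject, i = first
    if i + 1 >= len(tokens) or tokens[i] != "is" or tokens[i + 1] not in PREPOSITIONS:
        return None
    prep = tokens[i + 1]
    second = parse_noun_phrase(tokens, i + 2)
    if second is None or second[1] != len(tokens):
        return None
    return (prep, subject, second[0])


print(match_rule("The box is on the counter."))
print(match_rule("The glass is under the counter."))
```

Both example sentences are accepted by the same rule and differ only in the words that fill its slots, which is what makes one grammar rule reusable across many specification sentences.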

  11. When can SIFT be used • Use on long-term projects where documentation will go through many versions • Use on semi-structured documents that are organized in a predefined way • Use on documents written in a consistent style • Use on domains that have many similar semantic entities (example: methods that have arguments)

  12. Experimental Results • SIFT was used to extract information from an operating system’s reference manual • The developers had identified a total of 174 tests • SIFT was able to find 25 of those 174 tests, or about 14%

  13. Final thoughts • It is only a proof-of-concept testing tool, but it has potential to save developers time on trivial test cases • I think the natural-language approach is error-prone and costly because people may not follow a consistent writing style • Deciding on a standard template that limits the choices of structure in a document might be more useful, since people will be forced to follow the standard and it is less likely that tests will be missed because of an inconsistent writing style
