
Toped: Enabling End-User Programmers to Validate Data

Chris Scaffidi, Brad Myers, Mary Shaw
Carnegie Mellon University, School of Computer Science
http://www.cs.cmu.edu/~cscaffid




End-user Programmers (EUPs)

Millions of end users are also programmers who create spreadsheets or web forms containing…
• Company names, such as Microsoft
• Room numbers, such as Wean Hall 4104
• Campus phone numbers, such as 8-3564
• Project numbers, such as 004.000.270999.99
• Grant numbers, such as CCF-0613823
• Email addresses, such as cscaffid@cs.cmu.edu

… and other kinds of inputs that are…
• Short (usually in 1 spreadsheet cell or web form textfield)
• Often ambiguously defined (a “valid” company name)
• Often organization-specific (your validation rules may differ from mine!)
• Sometimes application-specific

Problem

How can we enable EUPs to implement input-validation code?

Prototype

Based on pilot results, we designed a tool called Toped for implementing validation “formats”. Each format consists of named parts with constraints that are marked as either often or always true. Toped accepts a set of examples, then infers a boilerplate format for EUPs to review and customize (Fig A). To support iterative refinement, a window allows EUPs to enter test strings. Toped converts the format to a CFG with constraints attached to the productions, then checks the strings against this constrained CFG.

Toped’s integration with Microsoft Excel and Visual Studio (web form design tool) enables reuse of formats for validating spreadsheet and web form data. Our system identifies inputs that violate the CFG or constraints, then displays a human-readable message summarizing errors (Fig B). Users can override warnings in spreadsheets, as well as soft constraint violations in web forms.

Fig A: Editing a format in Toped
Fig B: Human-readable descriptions of input errors

Pilot Study

In their own words, 4 administrative assistants described how to recognize American mailing addresses and university project numbers. They almost always described data as a hierarchy of named parts, such as describing a mailing address as a street address, city, state, and zip. This hierarchy structurally resembled a context-free grammar (CFG), down to the point where sub-parts were so small that participants lacked names for them. At that point, participants used soft constraints to define sub-parts, such as saying that the street type usually is “Ave” or “St”, indicating that valid data occasionally violate these constraints. This stands in stark contrast to regexps and CFGs, which classify inputs as valid or invalid, with no shades of gray.

Evaluation: Usability Study

16 EUPs implemented validation to find typos in 3 kinds of data: phone numbers, street addresses, and company names. We randomly assigned them to use Toped or a comparison tool (Lapis). Toped EUPs completed more tasks (2.79 of 3, vs 1.75), found more typos (92% of typos, vs 32%), were more accurate overall (F1 .74 vs .51), and were more satisfied with the tool (satisfaction questionnaire scale score 3.78 ≈ “somewhat satisfied” vs 3.00 = “Neutral”). These differences were significant at P<0.01, except for accuracy (F1). Also, Toped EUPs were faster and more accurate at our tasks than EUPs doing similar tasks in an earlier study that evaluated a regexp editor.

Future Work

Our evaluation only involved 3 formats, and EUPs might struggle to implement formats for other data. We will develop a repository where EUPs can publish and share formats, enabling us to collect formats and feedback from EUPs using formats in real applications.

Funded by EUSES under ITR-0325273, and by NSF under CCF-0438929 and CCF-0613823.
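
The idea of a format made of named parts, each with constraints that are always or only usually true, can be illustrated with a small sketch. This is not Toped's implementation: the poster says Toped compiles a format to a constrained CFG, whereas the sketch below substitutes a flat sequence of regex-matched parts for simplicity. The names Constraint, Part, validate, and STREET_LINE are hypothetical, as are the example constraints other than the “Ave”/“St” rule mentioned in the pilot study.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Constraint:
    description: str                 # human-readable text used in messages
    check: Callable[[str], bool]     # returns True when the value satisfies it
    always: bool                     # True: must always hold; False: usually holds (soft)


@dataclass
class Part:
    name: str                        # must be a valid identifier (used as a regex group name)
    pattern: str                     # regex standing in for the part's grammar production
    constraints: List[Constraint] = field(default_factory=list)


def validate(parts: List[Part], text: str) -> Tuple[str, List[str]]:
    """Return (verdict, messages); verdict is 'valid', 'questionable', or 'invalid'."""
    # Parts are assumed to be separated by whitespace (an assumption of this sketch).
    regex = r"\s+".join(f"(?P<{p.name}>{p.pattern})" for p in parts)
    match = re.fullmatch(regex, text)
    if match is None:
        names = ", ".join(p.name for p in parts)
        return "invalid", [f"'{text}' does not have the expected parts: {names}."]

    verdict, messages = "valid", []
    for part in parts:
        value = match.group(part.name)
        for c in part.constraints:
            if c.check(value):
                continue
            if c.always:
                verdict = "invalid"
                messages.append(f"The {part.name} '{value}' must {c.description}.")
            else:
                # A soft violation only downgrades an otherwise valid input.
                if verdict == "valid":
                    verdict = "questionable"
                messages.append(f"The {part.name} '{value}' usually should {c.description}.")
    return verdict, messages


# Hypothetical format for the street line of a mailing address.
STREET_LINE = [
    Part("number", r"\d+",
         [Constraint("not start with 0", lambda s: not s.startswith("0"), always=True)]),
    Part("street", r"[A-Za-z]+(?: [A-Za-z]+)*"),
    Part("type", r"[A-Za-z]+\.?",
         [Constraint("be 'Ave' or 'St'", lambda s: s in ("Ave", "St"), always=False)]),
]

if __name__ == "__main__":
    for test_string in ["5000 Forbes Ave", "12 Wean Hall Blvd", "0500 Forbes Ave"]:
        print(test_string, "->", validate(STREET_LINE, test_string))
```

The three-way verdict is the point of the sketch: an input that breaks only a soft constraint is flagged as questionable, with a message the user can read and override, rather than being rejected outright as a plain regexp or CFG would do.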
