Toped: Enabling End-User Programmers to Validate Data

Chris Scaffidi, Brad Myers, Mary Shaw
Carnegie Mellon University, School of Computer Science
http://www.cs.cmu.edu/~cscaffid

End-User Programmers (EUPs)

Millions of end users are also programmers who create spreadsheets or web forms containing…
• Company names, such as Microsoft
• Room numbers, such as Wean Hall 4104
• Campus phone numbers, such as 8-3564
• Project numbers, such as 004.000.270999.99
• Grant numbers, such as CCF-0613823
• Email addresses, such as cscaffid@cs.cmu.edu

… and other kinds of inputs that are…
• Short (usually in 1 spreadsheet cell or web form textfield)
• Often ambiguously defined (a “valid” company name)
• Often organization-specific (your validation rules may differ from mine!)
• Sometimes application-specific

Problem

How can we enable EUPs to implement input-validation code?

Pilot Study

In their own words, 4 administrative assistants described how to recognize American mailing addresses and university project numbers.

They almost always described data as a hierarchy of named parts, such as describing a mailing address as a street address, city, state, and zip. This structurally resembled a context-free grammar (CFG), down to the point where sub-parts were so small that participants lacked names for them. At that point, participants used soft constraints to define sub-parts, such as saying that the street type usually is “Ave” or “St”, indicating that valid data occasionally violate these constraints. This stands in stark contrast to regexps and CFGs, which classify inputs as valid or invalid, with no shades of gray.

Prototype

Based on pilot results, we designed a tool called Toped for implementing validation “formats”. Each format consists of named parts with constraints that are either often or always true.

Toped accepts a set of examples, then infers a boilerplate format for EUPs to review and customize (Fig A). To support iterative refinement, a window allows EUPs to enter test strings. Toped converts the format to a CFG with constraints attached to the productions, then checks the strings against this constrained CFG.

Toped’s integration with Microsoft Excel and Visual Studio (web form design tool) enables reuse of formats for validating spreadsheet and web form data. Our system identifies inputs that violate the CFG or constraints, then displays a human-readable message summarizing errors (Fig B). Users can override warnings in spreadsheets, as well as soft constraint violations in web forms.

Fig A: Editing a format in Toped
Fig B: Human-readable descriptions of input errors

Evaluation: Usability Study

16 EUPs implemented validation to find typos in 3 kinds of data: phone numbers, street addresses, and company names. We randomly assigned them to use Toped or a comparison tool (Lapis).

Toped EUPs completed more tasks (2.79 of 3, vs. 1.75), found more typos (92% of typos, vs. 32%), were more accurate overall (F1 0.74 vs. 0.51), and were more satisfied with the tool (satisfaction questionnaire scale score 3.78 ≈ “somewhat satisfied” vs. 3.00 = “neutral”). These differences were significant at P<0.01, except for accuracy (F1).

Also, Toped EUPs were faster and more accurate at our tasks than EUPs doing similar tasks in an earlier study that evaluated a regexp editor.

Future Work

Our evaluation only involved 3 formats, and EUPs might struggle to implement formats for other data. We will develop a repository where EUPs can publish and share formats, enabling us to collect formats and feedback from EUPs using formats in real applications.

Funded by EUSES under ITR-0325273, and by NSF under CCF-0438929 and CCF-0613823.
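The idea of a format made of named parts, each carrying constraints that are always or only usually true, can be illustrated with a small sketch. This is not Toped's implementation: Toped compiles formats to a constrained CFG, whereas the sketch below substitutes a regular expression for the structural check to stay short. The names Constraint, Part, validate, and ADDRESS are hypothetical, and the 6-digit limit on street numbers is an invented rule used only to show a hard constraint. The soft constraint on the street type comes from the pilot study example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Constraint:
    description: str              # phrased to follow "must"/"should", e.g. "be 'Ave' or 'St'"
    check: Callable[[str], bool]  # returns True when the part satisfies the constraint
    soft: bool = False            # soft = "often true": a violation is only a warning


@dataclass
class Part:
    name: str      # must be a valid Python identifier (used as a regex group name)
    pattern: str   # regex describing the part's text (a stand-in for CFG productions)
    constraints: List[Constraint] = field(default_factory=list)


def validate(parts: List[Part], separator: str, text: str) -> Tuple[str, List[str]]:
    """Classify text as 'valid', 'questionable', or 'invalid', with messages."""
    regex = re.escape(separator).join(f"(?P<{p.name}>{p.pattern})" for p in parts)
    match = re.fullmatch(regex, text)
    if match is None:
        return "invalid", ["input does not have the expected parts"]
    status, messages = "valid", []
    for part in parts:
        value = match.group(part.name)
        for c in part.constraints:
            if c.check(value):
                continue
            verb = "should" if c.soft else "must"
            messages.append(f"{part.name} {verb} {c.description}")
            if not c.soft:
                status = "invalid"
            elif status == "valid":
                status = "questionable"
    return status, messages


# Hypothetical street-address format: number, street name, street type.
ADDRESS = [
    Part("number", r"\d+",
         [Constraint("be at most 6 digits long", lambda s: len(s) <= 6)]),  # invented hard rule
    Part("street_name", r"[A-Za-z]+(?: [A-Za-z]+)*"),
    Part("street_type", r"[A-Za-z]+",
         [Constraint("be 'Ave' or 'St'", lambda s: s in ("Ave", "St"), soft=True)]),
]

for sample in ["5000 Forbes Ave", "5000 Forbes Blvd", "Forbes Ave"]:
    print(sample, "->", validate(ADDRESS, " ", sample))
```

The three-way result mirrors the poster's point that valid data occasionally violate soft constraints: a "questionable" input produces a warning the user could override, while a structural mismatch or hard-constraint violation is rejected outright.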
