A Framework for Empirical Evaluation of Model Comprehensibility

A Framework for Empirical Evaluation of Model Comprehensibility Jorge Aranda, Neil Ernst, Jennifer Horkoff, & Steve Easterbrook University of Toronto MiSE 2007, Minneapolis, MN

IMAGINE DILBERT STRIP HERE

If there are so many modeling languages... • Static modeling languages • Dynamic modeling languages • Intentional modeling languages • Argumentation modeling languages • Belief modeling languages • Meta-modeling languages • Unified modeling languages • ...

...why does nobody use them? • OK, “nobody” is too strong • But from what I see, when faced with real projects... • my colleagues do not use them • my professors do not use them • my tutored students do not use them • unless I force them to • most small software houses I know of do not use them • 86% in a soon-to-appear field study of small software companies • most big software corporations I know of do not use them • Figure appears to range from 10-25% depending on the survey • Highest usage number (~50%) is for use case and class diagrams, among companies contacted through the OMG • Dilbert does not seem to use them either • There is a community that, though it may not use them, certainly talks about them

Why does nobody use my modeling language? stuff?

Possibility 1: Kial neniu uzi Esperanto? • Esperanto for “why does nobody use Esperanto?” • I think... • Reason: Unnecessary invention • English is universal enough • Klingon is geekier • Only useful in Esperanto conventions! • Remedy? • Mostly hopeless

Possibility 2: Why does nobody use my typewriter? • Looks complicated • Not intuitive • I’m used to handwriting • Reason: Complexity and unfamiliarity • At first sight it doesn’t seem to be worth the effort • But all it takes is to see an expert doing wonders with it for us to want to learn as well • Remedy? • Training and publicity

Possibility 3: Why does nobody use my bicycle? • Very uncomfortable! • Nicknamed “bone breaker” • Painful landings • Inappropriate roads • Reason: Needs refinement • Evolution • Trial and error • Context (roads) evolves along with artifact (bike) • Remedy? • Evaluation and refinement

So which is it? • Why does nobody use my modeling language? • Useless proposal? • Certainly true in some cases • (but not for any of the members of this distinguished audience!) • Lack of training? • Less true than we’d like it to be • Our favourite excuse • “Software developers don’t know what they’re doing” • “This is why we’re going through a software crisis” • “Adapt the user” approach • Or often “blame the user”... • Needs refinement? • “Adapt the tool” approach • Evaluate, identify, eliminate weaknesses • True of almost every proposal

Why would we use models? • Exploration and reflection • Model-driven development • Model checking • RUP asked me to do it • Explaining a domain to developers • Explaining a system to clients • Documenting for future maintainers • Memory aids • Minimizing ambiguity • Simplified, abstracted terms • Models are primarily communication artifacts!

Communication Artifacts • Models are communication artifacts... • ...so let’s study them as such! • Some qualities of communication artifacts: • Codification effort • Learning curve • Obsolescence • Comprehensibility • We decided (for now) to focus on comprehensibility • Why? • It has bitten us in the past • “It is not enough to preach: one must be heard”

Challenges of Evaluating Comprehensibility • Tricky construct • Affected variables • Correctness of understanding • Time • Confidence • Perceived difficulty • Affecting variables • Type of task • Language expertise • Domain expertise • Problem size • Unfeasible to evaluate them all in a single empirical study
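To make the two groups of variables concrete, here is a minimal sketch of how a single participant trial could be recorded, with the affecting variables as the conditions of the trial and the affected variables as the measurements. The field names, the 1-to-5 rating scales, and the `summarize` helper are assumptions made for illustration; the talk does not prescribe any particular encoding.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Observation:
    """One participant working on one model (hypothetical record layout)."""
    # Affecting variables: the conditions under which the trial ran.
    task_type: str            # e.g. "answer questions", "find defects"
    language_expertise: str   # e.g. "novice", "expert"
    domain_expertise: str     # e.g. "novice", "expert"
    problem_size: int         # e.g. number of model elements (assumed unit)
    # Affected variables: what the study measures.
    correctness: float        # fraction of answers judged correct, 0..1
    time_seconds: float       # time taken on the task
    confidence: int           # self-reported, assumed 1..5 scale
    perceived_difficulty: int  # self-reported, assumed 1..5 scale


def summarize(observations):
    """Average each affected variable over a non-empty list of observations."""
    if not observations:
        raise ValueError("need at least one observation")
    return {
        "correctness": mean(o.correctness for o in observations),
        "time_seconds": mean(o.time_seconds for o in observations),
        "confidence": mean(o.confidence for o in observations),
        "perceived_difficulty": mean(o.perceived_difficulty for o in observations),
    }
```

Keeping all four affected variables in the same record means a single study can report them together, even when it fixes most of the affecting variables rather than varying them.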

Challenges of Evaluating Comprehensibility • Accessibility of participants • It’s hard enough to find participants for standard software engineering studies • Requiring language/domain expertise makes the task much harder • Ensuring “fair” comparisons • It is practically impossible to guarantee that two different representations transmit the same meaning to a human reader • Informal semantics play a large role in human comprehension

A Framework for Empirical Evaluation... • Most modeling languages are never evaluated • And third-party evaluation is almost non-existent • Popular languages do get their share of studies (ER, DFDs, some UML) • But for most proposals we’re stuck with version 1.0 • We designed a framework to run empirical studies of model comprehensibility • Based on our survey of (scarce) past comprehensibility papers... • ...and on our struggle to design appropriate evaluations • I am not going to explain it in full here • No time! • I will only cover it superficially and refer you to our paper • Warning: The framework itself has not been evaluated!

The Framework • Step 1: Select the modeling notation • Which version will be studied? • Are we including language extensions? • Can we tweak the rules of the notation (as often happens in practice), or are we implementing the rules strictly?

The Framework (cont) • Step 2: Articulate the underlying theory of the language • What is the language useful for? • Who should be writing in it? • Who should be reading it? • When in the software process should the language be used? • Step 3: Formulate the claims of the notation • Re-express the underlying theory as a set of claims regarding comprehension

The Framework (cont) • Step 4: Choose a control • It should be a sensible alternative to the notation • It does not need to be diagrammatic • Risky to compare a language extension vs. the bare language • Step 5: Turn the claims into hypotheses • Consider the affected/affecting comprehensibility variables • From a language evolution perspective it is more important to discover which elements and concepts work well, and which do not, rather than to make general claims of the language
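As one illustration of Step 5, a claim such as "readers of the notation answer more questions correctly than readers of the control" can be turned into a testable hypothesis about a single affected variable. The sketch below uses an exact one-sided permutation test on correctness scores. The choice of test and the function name `permutation_p_value` are assumptions for illustration; the framework itself does not prescribe a statistical method, and exhaustive enumeration is only practical for small groups.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(notation_scores, control_scores):
    """Exact one-sided permutation test on the difference of group means.

    Null hypothesis: the notation and the control yield the same scores.
    Returns the fraction of all relabelings of the pooled scores whose
    mean difference is at least as large as the observed one.
    """
    pooled = list(notation_scores) + list(control_scores)
    n = len(notation_scores)
    m = len(pooled) - n
    if n == 0 or m == 0:
        raise ValueError("both groups need at least one score")
    observed = mean(notation_scores) - mean(control_scores)
    total = sum(pooled)
    relabelings = 0
    at_least_as_extreme = 0
    # Enumerate every way to assign n of the pooled scores to "notation".
    for idx in combinations(range(len(pooled)), n):
        group_sum = sum(pooled[i] for i in idx)
        diff = group_sum / n - (total - group_sum) / m
        relabelings += 1
        if diff >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / relabelings
```

A small p-value would count as evidence for that one claim only. Repeating the comparison per element or concept of the notation fits the language-evolution goal better than a single overall verdict.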

The Framework (cont) • Step 6: Inform the hypotheses • Bring insights from other areas that study comprehension • External cognition • Cognitive dimensions framework • ... • Step 7: Design and execute the study • Suggestions: • Natural domains • Explicit participant roles • Expert modelers • Two or more domains • Collect data on all affected variables • Step 8: Improve these guidelines

The Framework (summary) • Step 1: Select the modeling notation • Step 2: Articulate the underlying theory • Step 3: Formulate the claims of the notation • Step 4: Choose a control • Step 5: Turn the claims into hypotheses • Step 6: Inform the hypotheses • Step 7: Design and execute the study • Step 8: Improve these guidelines

Questions?
