
Benchmarking Web Accessibility Evaluation Tools: Measuring the Harm of Sole Reliance on Automated Tests

http://dx.doi.org/10.6084/m9.figshare.701216. Benchmarking Web Accessibility Evaluation Tools: Measuring the Harm of Sole Reliance on Automated Tests. Markel Vigo, University of Manchester (UK); Justin Brown, Edith Cowan University (Australia)

Presentation Transcript


  1. http://dx.doi.org/10.6084/m9.figshare.701216 Benchmarking Web Accessibility Evaluation Tools: Measuring the Harm of Sole Reliance on Automated Tests. Markel Vigo, University of Manchester (UK); Justin Brown, Edith Cowan University (Australia); Vivienne Conway, Edith Cowan University (Australia). 10th International Cross-Disciplinary Conference on Web Accessibility (W4A2013)

  2. Problem & Fact: WWW is not accessible. 13 May 2013, W4A2013

  3. Evidence: Webmasters are familiar with accessibility guidelines. Lazar et al., 2004. Improving web accessibility: a study of webmaster perceptions. Computers in Human Behavior 20(2), 269–288.

  4. Hypothesis I: Assuming guidelines do a good job... H1: Awareness of accessibility guidelines is not that widespread.

  5. Evidence II: Webmasters put compliance logos on non-compliant websites. Gilbertson and Machin, 2012. Guidelines, icons and marketable skills: an accessibility evaluation of 100 web development company homepages. W4A 2012.

  6. Hypothesis II: Assuming webmasters are not trying to cheat... H2: There is a lack of awareness of the negative effects of overreliance on automated tools.

  7. Expanding on H2: Why we rely on automated tests • It's easy • In some scenarios it seems like the only option: web observatories, real-time... • We don't know how harmful they can be

  8. Expanding on H2: Knowing the limitations of tools • If we are able to measure these limitations we can raise awareness • Inform developers and researchers • We run a study with 6 tools • Compute coverage, completeness and correctness with respect to WCAG 2.0

  9. Method: Computed Metrics • Coverage: whether a given Success Criterion (SC) is reported at least once • Completeness: • Correctness:

  10. Method: Stimuli • Vision Australia (www.visionaustralia.org.au): non-profit, non-government, accessibility resource • Prime Minister (www.pm.gov.au): federal government, should abide by the Transition Strategy • Transperth (www.transperth.wa.gov.au): government affiliated, used by people with disabilities

  11. Method: Obtaining the "Ground Truth" Ad-hoc sampling → Manual evaluation → Agreement → Ground truth

  12. Method: Computing Metrics For every page in the sample: evaluate, get reports, compare with the ground truth (GT), compute metrics. [Diagram labels: T1 to T6, R1 to R6, GT, M1 to M6]
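The comparison step on this slide can be sketched in Python. The formulas for completeness and correctness did not survive in the transcript, so the definitions below are assumptions: completeness is taken as the share of ground-truth violations that a tool finds, and correctness as the share of a tool's reported violations that appear in the ground truth. Coverage follows the slide's wording (an SC counts as covered if it is reported at least once). Representing a violation as a (page, success criterion) pair, and all data in the example, are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: one violation = (page id, WCAG success criterion).
Violation = Tuple[str, str]


def coverage(reported: Set[Violation], ground_truth: Set[Violation]) -> float:
    """Share of SC violated in the ground truth that the tool reports at least once."""
    gt_sc = {sc for _, sc in ground_truth}
    if not gt_sc:
        return 0.0
    reported_sc = {sc for _, sc in reported}
    return len(gt_sc & reported_sc) / len(gt_sc)


def completeness(reported: Set[Violation], ground_truth: Set[Violation]) -> float:
    """Assumed definition: true positives / all ground-truth violations."""
    if not ground_truth:
        return 0.0
    return len(reported & ground_truth) / len(ground_truth)


def correctness(reported: Set[Violation], ground_truth: Set[Violation]) -> float:
    """Assumed definition: true positives / all violations the tool reported."""
    if not reported:
        return 0.0
    return len(reported & ground_truth) / len(reported)


# Toy data for illustration only; not taken from the study.
ground_truth = {("p1", "1.1.1"), ("p1", "2.4.4"), ("p2", "1.1.1"), ("p2", "3.1.2")}
tool_report = {("p1", "1.1.1"), ("p2", "1.1.1"), ("p2", "1.4.3")}

cov = coverage(tool_report, ground_truth)
comp = completeness(tool_report, ground_truth)
corr = correctness(tool_report, ground_truth)
```

Under these assumed definitions, a tool that reports more issues can raise its completeness while lowering its correctness, because every extra false positive enlarges the denominator of the correctness ratio.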

  13. Accessibility of Stimuli: Vision Australia (www.visionaustralia.org.au), Prime Minister (www.pm.gov.au), Transperth (www.transperth.wa.gov.au)

  14. Results: Coverage • 650 WCAG Success Criteria violations (A and AA) • 23-50% of SC are covered by automated tests • Coverage varies across guidelines and tools

  15. Results: Completeness per tool • Completeness ranges from 14% to 38% • Variable across tools and principles

  16. Results: Completeness per type of SC • How conformance levels influence completeness • Wilcoxon Signed Rank: W=21, p<0.05 • Completeness levels are higher for 'A level' SC

  17. Results: Completeness vs. accessibility • How accessibility levels influence completeness • ANOVA: F(2,10)=19.82, p<0.001 • The less accessible a page is, the higher the level of completeness

  18. Results: Tool Similarity on Completeness • Cronbach's α = 0.96 • Multidimensional Scaling (MDS) • Tools behave similarly
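As a reminder of what the α value on this slide measures, here is a minimal sketch of the standard Cronbach's α formula, α = k/(k-1) · (1 - Σ item variances / variance of row totals). The slide does not say how the data were arranged, so treating each tool as an "item" (column) and each row as one observation of completeness is an assumption, and the numbers are toy values.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a table with one row per observation
    and one column per item (here, hypothetically, one per tool)."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two observations and two items")
    k = len(scores[0])
    # Sample variance of each column (item).
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the per-row totals.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("row totals have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy completeness scores, three tools that agree perfectly (illustration only).
toy_scores = [
    [0.1, 0.1, 0.1],
    [0.3, 0.3, 0.3],
    [0.2, 0.2, 0.2],
]
alpha = cronbach_alpha(toy_scores)
```

Items that vary together push α toward 1, which is the sense in which a value of 0.96 supports "tools behave similarly".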

  19. Results: Correctness • Tools with lower completeness scores exhibit higher levels of correctness (93-96%) • Tools that obtain higher completeness yield lower correctness (66-71%) • Tools with higher completeness are also the most incorrect ones

  20. Implications: Coverage • We corroborate that 50% is the upper limit for automatising guidelines • Natural Language Processing? • Language: 3.1.2 Language of parts • Domain: 3.3.4 Error prevention

  21. Implications: Completeness I • Automated tests do a better job... ...on non-accessible sites ...on 'A level' success criteria • Automated tests aim at catching stereotypical errors

  22. Implications: Completeness II • Strengths of tools can be identified across WCAG principles and SC • A method to inform decision making • Maximising completeness in our sample of pages • On all tools: 55% (+17 percentage points) • On non-commercial tools: 52%
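One way to read "maximising completeness" is to combine tools and count a violation as found if any tool in the combination reports it. The slide does not describe the procedure, so the sketch below is an assumption: it searches all tool combinations of a given size and scores each by the completeness of the union of their reports. Tool names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical representation: one violation = (page id, WCAG success criterion).
Violation = Tuple[str, str]


def best_combination(
    reports: Dict[str, Set[Violation]],
    ground_truth: Set[Violation],
    size: int,
) -> Tuple[float, Tuple[str, ...]]:
    """Return (completeness, tools) for the best combination of `size` tools,
    where completeness is computed on the union of the tools' reports."""
    if not ground_truth:
        return 0.0, ()
    best_score, best_tools = 0.0, ()
    for combo in combinations(sorted(reports), size):
        # A ground-truth violation counts as found if any tool reports it.
        found = set().union(*(reports[t] for t in combo)) & ground_truth
        score = len(found) / len(ground_truth)
        if score > best_score:
            best_score, best_tools = score, combo
    return best_score, best_tools


# Toy data for illustration only; not taken from the study.
ground_truth = {("p1", "1.1.1"), ("p1", "2.4.4"), ("p2", "1.1.1"), ("p2", "3.1.2")}
reports = {
    "T1": {("p1", "1.1.1"), ("p1", "2.4.4")},
    "T2": {("p1", "2.4.4"), ("p2", "1.1.1")},
    "T3": {("p1", "1.1.1"), ("p2", "1.4.3")},
}

score, tools = best_combination(reports, ground_truth, size=2)
```

Exhaustive search is fine for a handful of tools (six in the study); the union can only improve completeness, but false positives accumulate as well, so correctness would need to be checked separately.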

  23. Conclusions • Coverage: 23-50% • Completeness: 14-38% • Higher completeness leads to lower correctness

  24. Follow up • Contact: @markelvigo | markel.vigo@manchester.ac.uk • Presentation DOI: http://dx.doi.org/10.6084/m9.figshare.701216 • Datasets: http://www.markelvigo.info/ds/bench12/index.html
