Testing / Evaluation Eric Morley June 1, 2010 PowerPoint Presentation

  1. Testing / Evaluation, Eric Morley, June 1, 2010

  2. Papers • File, P., Todman, J., 2002. Evaluation of the coherence of computer-aided conversations. Augmentative and Alternative Communication 18 (December), 228–241. • Todman, J., Rzepecka, H., 2003. Effect of pre-utterance pause length on perceptions of communicative competence in AAC-aided social conversations. Augmentative and Alternative Communication 19 (4), 222–234. • Ball, L. L., Beukelman, D. R., Pattee, G. L., 2004. Acceptance of augmentative and alternative communication technology by persons with amyotrophic lateral sclerosis. Augmentative and Alternative Communication 20 (2), 113–122.

  3. File and Todman 2002 • Voice output communication aids (VOCAs) allow storage of whole utterances, but are not designed for users to rely on these for open-ended conversations • Impossible to predict which utterances will be needed • Difficult to locate potential follow-up phrases • May be possible to use “imperfect expressions” in social conversation and still have a positive experience

  4. TALK • Computer system (Talk Aid Using Pre-Loaded Knowledge) • Uses pre-stored messages • For people who understand language but cannot talk (or can no longer talk) • User writes phrases ahead of time, on their own schedule • Phrases stored in person, time and aspect locations • Ex: “My father was a shop manager” (me/past/who) • Quick fires: “ah yes”, “too bad”, etc. • Can be set to speak a phrase equivalent to the one selected • Comments: useful in many contexts (e.g., “what about you?”) • Allows word-by-word entry
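
The slide describes TALK's storage scheme only at the level of person, time and aspect slots. A minimal Python sketch of that idea follows, assuming a plain dictionary keyed by a (person, time, aspect) tuple plus a separate quick-fire list; `PhraseStore`, `add` and `lookup` are hypothetical names, not TALK's actual implementation.

```python
from collections import defaultdict

class PhraseStore:
    """Hypothetical TALK-style store: phrases filed under (person, time, aspect)."""

    def __init__(self):
        self._slots = defaultdict(list)  # (person, time, aspect) -> list of phrases
        self.quick_fires = []            # short interjections usable at any point

    def add(self, person, time, aspect, phrase):
        # Phrases are written in advance by the user and filed under one slot.
        self._slots[(person, time, aspect)].append(phrase)

    def lookup(self, person, time, aspect):
        # .get() does not create empty slots; return a copy so callers cannot mutate it.
        return list(self._slots.get((person, time, aspect), []))

store = PhraseStore()
store.add("me", "past", "who", "My father was a shop manager.")
store.quick_fires.extend(["Ah yes.", "Too bad."])
print(store.lookup("me", "past", "who"))  # ['My father was a shop manager.']
```

Looking up the me/past/who slot returns the example phrase from the slide; an empty slot returns an empty list rather than raising an error.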

  5. Old TALK • Evaluations have become more realistic over time (starting with a cocktail-party-type situation) • Rates of up to 40–60 wpm (vs. 2–15 wpm for word-by-word entry) • Improvements in interaction quality • Some differences from normal conversation in simulated use by an unimpaired user • VOCA user was less likely to follow up with narrative

  6. Current Study • Conversations could be on any topic • Genuine users interacting with new partners • 19 of the 68 conversations were with the same partner, allowing assessment of coherence with a familiar partner • Analyzed 68 previously recorded transcripts with lag-sequential analysis • Participants: • 1 TALK user (about 40 years old, with cerebral palsy and dysarthria) • 56 undergraduate psychology students • 50 students were invited to talk • 6 were invited to have a “getting to know you” conversation • Labels: TALK user (CP); repeat partner (RP); new partners (NP1 and NP2) from different sets of partners (not repeated)

  7. Hand Labeling • Speech acts coded by category • Question; answer; observation; agreement; disagreement; repetition; interjection; directive; narrative • 2,345 speech acts with RP; 2,802 with NP1; 4,268 with NP2 • Approximately 90% agreement when 15% of the acts were re-labeled • Kappa statistic used to confirm agreement between labelers
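
The slide reports raw agreement on a re-labeled sample plus a kappa check, but does not say which kappa. Assuming Cohen's kappa for two labelers, the Python sketch below shows why kappa is lower than raw agreement: it discounts the agreement expected by chance from each labeler's category frequencies. The function names and the toy labels are illustrative only, not data from the study.

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two labelers chose the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(x == y for x, y in zip(labels_a, labels_b)) / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(labels_a, labels_b)
    n = len(labels_a)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both labelers independently pick the same category.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both labelers used one identical category throughout
    return (p_o - p_e) / (1 - p_e)

# Toy labels (hypothetical)
first = ["question", "answer", "question", "observation"]
second = ["question", "answer", "observation", "observation"]
print(percent_agreement(first, second))        # 0.75
print(round(cohens_kappa(first, second), 3))   # 0.636
```

On the toy labels, raw agreement is 0.75 but kappa is only about 0.64, because some of that agreement would be expected by chance.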

  8. Lag Sequential Analysis • Statistical analysis of a sequence of events (here, speech acts in a conversation) • Finds statistically significant pairs of acts that occur at particular distances (i.e., lags) • Lag n means acts a and b with n-1 intervening acts • Crosslag: speech act a (intervening acts) speech act b • Autolag: speech act a (intervening acts) speech act a • Looked at both types together and at crosslag alone • Pooled data across conversation sets (RP, NPi) • Used z-scores: the standardized difference between observed and expected lag probabilities • Longer sequences counted as significant only if their subsequences were also significant
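
The slide does not give the exact z-score formula. One common choice in lag-sequential analysis is Sackett's binomial z, sketched below in Python: each occurrence of act a that has an act `lag` positions later is treated as a trial, and the overall proportion of act b is the expected probability. `lag_z` is a hypothetical helper, and the authors may have used a different variant (for example, the Allison and Liker correction).

```python
from collections import Counter
from math import sqrt

def lag_z(seq, a, b, lag):
    """Binomial z-score for 'act b occurs `lag` positions after act a' (Sackett-style)."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    # Antecedents: each occurrence of a that has an act `lag` positions later.
    starts = [i for i in range(len(seq) - lag) if seq[i] == a]
    n_a = len(starts)
    if n_a == 0:
        return float("nan")  # no antecedents, z-score undefined
    p_exp = Counter(seq)[b] / len(seq)  # unconditional base rate of b
    if p_exp in (0.0, 1.0):
        return float("nan")  # zero variance, z-score undefined
    observed = sum(1 for i in starts if seq[i + lag] == b)
    p_obs = observed / n_a  # observed lag-conditional probability
    return (p_obs - p_exp) / sqrt(p_exp * (1 - p_exp) / n_a)

# Toy sequence of speech acts (hypothetical): Q=question, A=answer, O=observation
acts = ["Q", "A", "Q", "A", "Q", "A", "O", "O"]
print(round(lag_z(acts, "Q", "A", lag=1), 3))  # 2.236 (crosslag Q -> A)
```

Lag 1 means adjacent acts (no intervening acts), matching the slide's definition; for a single two-sided test, |z| > 1.96 corresponds to p < .05.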

  9. Results • Most speech acts were questions, answers, observations or agreements • VOCA user made 42% of acts (41% of questions and observations, 55% of answers, 10% of agreements) • In unaided conversations one would expect many more observations than questions and answers, and somewhat more agreements • 31 sequences identified • 19 had ≥3 speech acts • 13 sequences occurred in all 3 sets, 7 of them with 3 speech acts • Question-answer sequences were common • These facilitate turn-taking

  10. Initiations • Observation • Often followed by questions, sometimes by agreement • CP doesn’t interject as much as RP • Agreement • Used for turn-taking in unaided conversation • Questions serve this role in aided conversation • Other • CP repetition and NP/RP narrative are followed by questions

  11. Discussion • Only the speaking partners reliably followed answers with observations or narrative • Aided partner did not use narrative (possibly because of lack of practice) • Training might help with this • More questions in aided conversation • A VOCA user is more likely to have an appropriate general question stored than a specific narrative or observation • Questions give the VOCA user control over the topic • Pre-stored utterances seem to be reusable across new and repeat conversation partners • Should include “quick fire” interjection support for conversations with RP • In NP mode, perhaps that space could be used for something else?

  12. Todman and Rzepecka 2003 • Several types of VOCAs • Word by word • Text must be generated at some point, even with pre-stored messages; pauses are so long that speaking rate drops to 2–15 wpm • Utterances are extremely short, the partner dominates the conversation, “folk walk away” • Whole utterance approach (WUA) • Based on the idea that the content of conversations is “frequently approximate rather than precise” • Should result in faster communication rates when precision is not critical • Is this the case? • Does quality of conversation degrade? • If rates are faster and quality holds up, does this result in more positive perceptions of VOCA users’ communicative competence and of interactions with them?

  13. More on WUA • TALK system • For free-ranging social interactions • Uses lots of small talk • 40 wpm without training, 50–80 wpm with training • Frametalker • Designed for transactional conversations (restaurant, bank, etc.) • 45 wpm, rated as having a high degree of naturalness

  14. Quality of Conversation (WUA) • Todman, Elder and Alm (1995) • A speaking person used TALK to converse • Parts of these conversations were reenacted by speakers, with pauses removed • Compared with unaided conversations • Aided conversations were judged to be of higher quality • Likely because pre-composed messages are more coherent • Would this hold for the conversation partner, or only for observers listening in?

  15. Conversation Rate and Perceived Competence • Variation in conversation rate (CR) can be approximated by looking at pre-utterance pause length • A pause of even a few seconds can cause problems • User perceived as unintelligent • Poor quality of social interactions • Frustration • Abandonment of VOCA • Previous studies have found • Positive correlation between conversational rate and satisfaction • Negative correlation between pre-utterance pause length and satisfaction
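
To see why pre-utterance pauses dominate conversation rate, it helps to count the pause as part of each turn. The Python sketch below assumes a hypothetical 10-word utterance and a synthetic-speech output rate of 150 wpm; `effective_wpm` is an illustrative helper, not a measure used in the studies.

```python
def effective_wpm(words, pause_s, speech_wpm):
    """Words per minute for one turn, including the pre-utterance pause."""
    if words <= 0 or pause_s < 0 or speech_wpm <= 0:
        raise ValueError("words and speech_wpm must be positive, pause_s non-negative")
    speaking_s = words / speech_wpm * 60     # seconds spent actually speaking
    total_min = (pause_s + speaking_s) / 60  # whole turn, in minutes
    return words / total_min

# Hypothetical 10-word utterance spoken at an assumed 150 wpm (4 s of speech)
for pause in (2, 10):
    print(pause, round(effective_wpm(10, pause, 150), 1))
# 2 100.0
# 10 42.9
```

Under these assumptions, lengthening the pause from 2 s to 10 s cuts the effective rate by more than half, even though the speech itself is unchanged.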

  16. Previous Experiments • Newman (1982) • Pre-utterance pauses of 4–7 s led to worse interactions compared with utterances without pauses • If the pauses resulted from doing something else (sculpting), this effect disappeared • Does this apply to VOCA users? • Ratcliff et al. (2002) • Effects of pauses and speaking rate on the naturalness of synthetic speech • Increased speaking rate was perceived as more natural; pauses had little effect on their own (they mattered only insofar as they changed the speaking rate) • Bedrosian (2002) • VOCA users ask for a book: one with a mostly irrelevant message after 4 s, the other after 90 s with a highly relevant message • The second approach was preferred • Had VOCA users give too much or too little information quickly, or relevant information after a delay • Tradeoff: a short delay led to improved affective/behavioral ratings, while the wrong amount of information led to lower ratings of the cognitive component

  17. Current Experiment • How does pre-utterance pause time affect the perception of social conversation? • Does the amount of experience a VOCA user has with WUA have an effect on this? • 3 VOCA users with cerebral palsy • Used TalkBoards, with varying experience • Partners had a 20 min introductory conversation • 2 of 3 got sick, so there were 5 partners, none with VOCA experience • Possible effects of having different partners • Partners were told there were no restrictions on topic, but that the other person would use a VOCA • Interactions recorded; a 5 min chunk extracted (after small talk) • Pre-utterance pauses replaced with pauses of set lengths (2–10 s) • Pauses were not exactly identical in length, so the stated lengths are means • Original interaction also used • 28 raters • All psychology students • Used Likert scale, with recorded conversations as baseline • Told they would first listen to a conversation with some natural speech changed to synthetic • Then they would listen to a “getting to know you” conversation involving a VOCA user
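
The pause manipulation can be pictured as re-timing a list of turns. The Python sketch below is hypothetical: it assumes the extracted chunk is a sorted, non-overlapping list of `Turn` records, sets the gap before each VOCA-user turn to a target length, and shifts all later turns accordingly. The study edited audio recordings, where achieved pauses varied; `mean_pre_utterance_pause` shows how a mean pause would be computed from measured turn times.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Turn:
    speaker: str  # hypothetical labels: "VOCA" or "partner"
    start: float  # seconds
    end: float    # seconds

def retime(turns, target_pause, voca="VOCA"):
    """Set each VOCA-user pre-utterance pause to target_pause s, shifting later turns."""
    out, shift = [], 0.0
    for i, t in enumerate(turns):
        start, end = t.start + shift, t.end + shift
        if i > 0 and t.speaker == voca:
            delta = (out[-1].end + target_pause) - start
            shift += delta  # every later turn moves by the same amount
            start, end = start + delta, end + delta
        out.append(Turn(t.speaker, start, end))
    return out

def mean_pre_utterance_pause(turns, voca="VOCA"):
    """Mean gap between the end of a turn and the start of the next VOCA-user turn."""
    return mean(b.start - a.end for a, b in zip(turns, turns[1:]) if b.speaker == voca)

chunk = [Turn("partner", 0.0, 3.0), Turn("VOCA", 9.0, 12.0),
         Turn("partner", 12.5, 15.0), Turn("VOCA", 20.0, 22.0)]
print(mean_pre_utterance_pause(chunk))               # 5.5
print(mean_pre_utterance_pause(retime(chunk, 2.0)))  # 2.0
```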

  18. Current Experiment (cont’d) • 7-point scale to evaluate 4 areas • Linguistic, operational, social, strategic • Raters heard each conversation once, each with a single pause variation • Raters were blocked; high level of agreement among raters • Effect of pause length found to be significant

  19. Results, Discussion • Shorter pause time is preferred • Linear trend for 2–16 s pauses • Pause time may have become salient because raters listened to conversations with multiple pause times • VOCA user had a significant effect • Something other than experience may be causing this • Smaller effect than pause time, no interaction • Social nature of conversation • Perhaps the perceived nature of the VOCA user had an effect • Pauses might not be legitimized because there was no shared activity • WUA is important because of the causal relationship between pause time and partner/observer preferences

  20. Amyotrophic Lateral Sclerosis (ALS) • Neuromuscular disease that results in weakness, atrophy and paralysis • 80% eventually require AAC • ≥25% of these did not accept AAC • Little is known about the 80% of ALS patients who need and use AAC • High vs. low tech • Stage of ALS at adoption • “Attitudes toward technology” • Mathy et al. (2000) give preliminary information • High tech: detailed needs and wants; written communication; stories • Low tech: immediate needs and wants; conversation • Gutmann (1999) found gender differences • Women prefer low-tech strategies and VOCAs more than men • Men prefer high-tech writing systems more often than women • Gutmann and Gryfe (1996) found that early and frequent intervention, and early introduction of AAC, are critical for acceptance • Using AAC can allow someone with ALS to continue working • Focus on high-tech AAC • Low-tech options haven’t changed very much and have already been examined • High-tech options are changing rapidly and becoming more accessible • Is there a pattern to adoption of high-tech AAC? • Why do people use/reject AAC? (Ball et al., 2004)

  21. Group • 50 ALS patients monitored over 4 years • 22 females, 28 males • 17 bulbar, 22 spinal, 11 mixed diagnosis • Ages 36–78 (mean 60.16 years) • All spoke English as their primary language • 2 had cognitive deficits • Seen for AAC assessment when their speech began to change • Those wanting only written communication were not included in the study • Wide variety of educational levels and social status

  22. Procedure • AAC assessment when intelligibility ≤90% or speaking rate ≤100 wpm (tested quarterly) • Patients presented with various devices • Tried them during the presentation • Evaluator made recommendations • Could bring the favored device home for a 1-week trial • AAC intervention recommended • AAC acceptance, use, rejection and discontinuance were monitored until death (range 4–181 months; mean 43.8 months, SD 37.54 months)
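
The referral criterion is a simple threshold rule applied at each quarterly check. A minimal Python sketch, assuming intelligibility is measured in percent and speaking rate in words per minute; `needs_aac_assessment` and the visit values are hypothetical.

```python
def needs_aac_assessment(intelligibility_pct, speaking_rate_wpm):
    """Referral rule from the slide: intelligibility <= 90% OR speaking rate <= 100 wpm."""
    return intelligibility_pct <= 90 or speaking_rate_wpm <= 100

# Hypothetical quarterly measurements for one person: (intelligibility %, wpm)
visits = [(98, 160), (95, 120), (92, 100), (85, 90)]
first_due = next(i for i, (intel, rate) in enumerate(visits)
                 if needs_aac_assessment(intel, rate))
print(first_due)  # 2 (third quarterly check: rate reached 100 wpm)
```

Either criterion alone triggers the assessment, so a person whose intelligibility is still above 90% is referred once speaking rate falls to 100 wpm.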

  23. Results • Acceptance: 90% immediate, 6% delayed • Came from all social classes • No gender differences • Immediate Acceptance • In interviews, participants listed communication, participation and employment as reasons for acceptance • All used AAC as primary means of communication

  24. Delayed Acceptance • Ages 30–39, 70–79 • Delay of 6–24 months • Preferred multifunctional devices • Delayed in part because of family members • Family members believed they could understand the participants well enough to meet their needs • Thought they were providing adequate care without AAC • 2 felt that AAC called the quality of their care into question • One physician advised a family to accept dysarthria rather than turn to technology • Three individuals were in some form of denial

  25. Rejection and Discontinuance • Rejected by the two participants with cognitive limitations • No discontinuance • High-tech AAC often abandoned at the end of life
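
The acceptance percentages on slide 23 can be checked against the group size of 50 from slide 21. This Python arithmetic is only a consistency check on the reported figures.

```python
n = 50                       # participants (slide 21)
immediate = round(0.90 * n)  # immediate acceptance
delayed = round(0.06 * n)    # delayed acceptance
remaining = n - immediate - delayed
print(immediate, delayed, remaining)  # 45 3 2
```

The two remaining participants match the two rejections by participants with cognitive limitations, and the three delayed acceptances match the three individuals described on slide 24.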

  26. Discussion • Saw wider adoption than reported earlier (1996) • AAC seems to be more widely accepted in society • US began funding AAC devices in 2000 • Recommendations • Provide appropriate information regarding the speech-language characteristics of ALS • Regular contact/monitoring • Sustain awareness of AAC/intervention opportunities • Doctors must be aware of these options and be able to explain them