
Testing / Evaluation Eric Morley June 1, 2010


Presentation Transcript


  1. Testing / Evaluation Eric Morley June 1, 2010

  2. Papers • File, P., Todman, J., 2002. Evaluation of the coherence of computer-aided conversations. Augmentative and Alternative Communication 18 (December), 228-241. • Todman, J., Rzepecka, H., 2003. Effect of pre-utterance pause length on perceptions of communicative competence in AAC-aided social conversations. Augmentative and Alternative Communication 19 (4), 222-234. • Ball, L. L., Beukelman, D. R., Pattee, G. L., 2004. Acceptance of augmentative and alternative communication technology by persons with amyotrophic lateral sclerosis. Augmentative and Alternative Communication 20 (2), 113-122.

  3. File and Todman 2002 • Voice output communication aids (VOCAs) allow storage of whole utterances, but are not designed for users to rely on these for open-ended conversations • Impossible to predict which utterances will be needed • Difficult to locate potential follow-up phrases • May be possible to use “imperfect expressions” in social conversation and still have a positive experience

  4. TALK • Computer system (Talk Aid Using Pre-Loaded Knowledge) • Uses pre-stored messages • For people who understand language but can’t talk (anymore) • User writes phrases on their own time • Phrases stored in person, time and aspect locations • Ex: My father was a shop manager (me/past/who) • Quick fires: “ah yes”, “too bad”, etc. • Can be set to say equivalent phrase to one selected • Comments: useful in many contexts (ex. “what about you?”) • Allows word by word entry

  5. Old TALK • Evaluations have been getting more realistic since inception (started with cocktail party type situation) • Rates of up to 40-60wpm (vs 2-15wpm for word by word) • Improvements in interaction quality • Some differences from normal conversation in simulated use with unimpaired user • VOCA user was less likely to follow up with narrative

  6. Current Study • Conversations can be on any topic • Genuine users interacting with new partners • 19 of the 68 conversations were with the same partner, which allowed assessment of the coherence of conversations with a familiar partner • Analyzed 68 old transcripts of conversations with lag-sequential analysis • Participants were • 1 TALK user (~40 y.o. w/cerebral palsy w/dysarthria) • 56 undergrad psych students • 50 students were invited to talk • 6 were invited to have a “getting to know you” conversation • People: TALK user (CP); repeat partner (RP); new partners (NP1 and NP2) from different sets of partners (not repeated)

  7. Hand Labeling • Speech acts coded by category • Questions; answer; observation; agreement; disagreement; repetition; interjection; directive; narrative • 2,345 speech acts w/RP; 2,802 acts w/NP1; 4,268 w/NP2 • Seemed to be ~90% agreement on labeling based off of re-labeling 15% of the acts • Used kappa statistic to confirm agreement of labelers
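
The agreement check on this slide can be sketched in a few lines. This is a minimal illustration that assumes Cohen's kappa for two labelers; the slides only say "kappa statistic" and do not name the variant. The function name `cohen_kappa` and the toy labels are hypothetical, not data from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of category labels.

    Assumption: two labelers, nominal categories (e.g. speech-act types).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Proportion of items on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeler's category frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used a single identical category
    return (observed - expected) / (1 - expected)

# Toy labels (hypothetical): the labelers disagree on the last act only.
first = ["question", "answer", "observation", "question"]
second = ["question", "answer", "observation", "answer"]
kappa = cohen_kappa(first, second)
```

Kappa corrects raw percent agreement for agreement expected by chance, so it is lower than the raw figure whenever a few categories dominate.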

  8. Lag Sequential Analysis • Statistical analysis of a sequence of terms (here speech acts in a conversation) • Finds statistically significant pairs of acts which occur at particular distances (i.e. lags) • Lag n means acts a and b with n-1 intervening acts • Crosslag: speech act a (intervening acts) speech act b • Autolag: speech act a (intervening acts) speech act a • Looked at both types together and at crosslag alone • Pooled data across conversation sets (RP, NPi) • Use z-scores: difference between observed and expected lag probabilities • Only counted long sequences as statistically significant if subsequences were also significant
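
The lag counting and z-scoring described on this slide can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes a plain binomial z-test in which the expected probability of act b after act a is b's overall base rate, and the function name `lag_z_scores` and the toy sequence are hypothetical.

```python
import math
from collections import Counter

def lag_z_scores(acts, lag=1):
    """Return {(a, b): z} for act b occurring `lag` positions after act a.

    Lag n pairs acts[i] with acts[i + n], i.e. n-1 intervening acts.
    Pairs with a == b are autolags; pairs with a != b are crosslags.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    pairs = list(zip(acts, acts[lag:]))
    if not pairs:
        return {}
    pair_counts = Counter(pairs)
    antecedents = Counter(a for a, _ in pairs)  # times a starts a pair
    base = Counter(acts)                        # overall act frequencies
    total = len(acts)
    scores = {}
    for (a, b), observed in pair_counts.items():
        p_b = base[b] / total                # expected probability of b
        n_a = antecedents[a]
        expected = n_a * p_b
        variance = n_a * p_b * (1 - p_b)     # binomial variance (assumption)
        if variance > 0:
            scores[(a, b)] = (observed - expected) / math.sqrt(variance)
    return scores

# Toy sequence (hypothetical): strict question/answer alternation.
acts = ["question", "answer"] * 10
lag1 = lag_z_scores(acts, lag=1)  # crosslag: question then answer
lag2 = lag_z_scores(acts, lag=2)  # autolag: question ... question
```

A pair would be flagged when its z-score exceeds a chosen threshold such as 1.96. The subsequence rule on the slide would then require each shorter pair inside a longer chain to pass as well.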

  9. Results • Most speech acts were questions, answers, observations or agreements • VOCA user made 42% of acts (41% of questions and observations, 55% of answers, 10% of agreements) • In unaided conversations one would expect many more observations than questions and answers, and some more agreements • 31 sequences identified • 19 had ≥3 speech acts • Of those which occurred in all 3 sets, 13 sequences identified, 7 with 3 speech acts • Question and answer sequences were common • Facilitates turn-taking

  10. Initiations • Observation • Often followed by questions, sometimes agreement • CP doesn’t interject as much as RP • Agreement • Used for turn taking in unaided conversation • Questions used for this in aided conversation • Other • CP repetition and N/RP narrative followed by question

  11. Discussion • Only the natural speakers reliably followed answers with observations or narrative • Aided partner did not use narrative (possibly because of lack of practice) • Maybe training would help this • More questions in aided conversation • More likely that the VOCA user has an appropriate general question than a specific narrative or observation • Gives VOCA user control over topic • Pre-stored utterances seem to be re-usable between new and repeat conversation partners • Should include “quick fire” interjection support for conversation with RP • Maybe the space this takes up could be used for something else in NP mode?

  12. Todman and Rzepecka 2003 • Several types of VOCAs • Word by word • Need to generate text at some point, even with pre-stored messages. Pauses are so long that speaking rate goes to 2-15wpm • Utterances are extremely short, partner dominates conversation, “folk walk away” • Whole utterance approach (WUA) • Based on the idea that content of conversations is “frequently approximate rather than precise” • Should result in faster communication rates when precision is not critical • Is this the case? • Does quality of conversation degrade? • If yes to both of these, does this result in more positive perceptions of VOCA users’ communicative competence and interactions with them?

  13. More on WUA • TALK system • For free-ranging social interactions • Uses lots of small talk • 40wpm w/o training, 50-80 with • Frametalker • Designed for transactional conversations (restaurant, bank, etc) • 45 wpm, rated as having a high degree of naturalness

  14. Quality of Conversation (WUA) • Todman, Elder and Alm (1995) • Speaking person used TALK to converse • Parts of these conversations were reenacted with speakers, and pauses were removed • Compared with non-aided conversations • Aided found to be of higher quality • Likely because pre-composed messages will be more coherent • Would this be the case to the other listener, or only for people eavesdropping?

  15. Conversation Rate and Perceived Competence • Variation in conversation rate (CR) can be approximated by looking at pre-utterance pause length • A pause of even a few seconds can cause problems • User perceived as unintelligent • Poor quality of social interactions • Frustration • Abandonment of VOCA • Previous studies have found • Positive correlation between conversational rate and satisfaction • Negative correlation between pre-utterance pause length and satisfaction

  16. Previous Experiments • Newman (1982) • Pre-utterance pauses of 4-7 s led to worse interactions when compared with utterances w/o pauses • If the pauses were a result of doing something else (sculpting), this effect disappeared • Does this apply to VOCA users? • Ratcliff et al. (2002) • Effects of pauses and speaking rate on naturalness of synthetic speech • Increased speaking rate perceived as more natural; pauses didn’t seem to do much on their own (only had an effect since they changed the speaking rate) • Bedrosian (2002) • VOCA users ask for a book, one with a mostly irrelevant message (after 4), the other after 90 s with a highly relevant message • 2nd approach preferred • Had VOCA users give too much/too little information quickly, or relevant info after a delay • Tradeoff: short delay led to improved affective/behavioral ratings, wrong amount of info led to lower rating of cognitive component

  17. Current Experiment • How does pre-utterance pause time affect the perception of social conversation? • Does the amount of experience a VOCA user has with WUA have an effect on this? • 3 VOCA users with cerebral palsy • Used TalkBoards, had varying experience with this • Partners had 20 min introductory conversation • 2 of 3 got sick, so 5 partners, none with VOCA experience • Possible effects of having different partners • Told there were no restrictions on topic of conversation, but that the other person would use a VOCA • Interactions recorded, 5 min chunk extracted (after small talk) • Pre-utterance pauses replaced with pauses of set lengths (2-10 s) • Pauses didn’t seem to be identical in length, so those are means • Also used original interaction • 28 raters • All psych students • Used Likert scale and recorded conversations as baseline • Told they would first listen to a conversation with some natural speech changed to synthetic • Then they would listen to “getting to know you” conversation involving VOCA user

  18. Current Experiment (cont’d) • 7 point scale to evaluate 4 areas • Linguistic, operational, social, strategic • Raters heard each conversation 1x with one pause variation for each one • Blocked raters, found high level of agreement among raters • Effect of pause length found to be significant

  19. Results, Discussion • Shorter pause time is preferred • Linear trend for 2-16sec pause • Possible that pause time became salient because raters listened to conversations with multiple pause times • VOCA user had significant effect • May be something other than experience causing this • Smaller effect than pause time, no interaction • Social nature of conversation • Perhaps perceived nature of VOCA user had effect • Pauses might not be legitimized b/c no shared activity • WUA is important because of causal relationship between pause time and partner/observer preferences

  20. Amyotrophic Lateral Sclerosis (ALS) • Neuromuscular disease which results in weakness, atrophy and paralysis • 80% eventually require AAC • ≥25% of these did not accept AAC • Little is known about the 80% of ALS patients who need and use AAC • High vs. low tech • Stage of ALS at adoption • “Attitudes toward technology” • Mathy et al. (2000) gives preliminary information • High tech: detailed needs and wants; written communication; stories • Low tech: immediate needs and wants; conversation • Gutmann (1999) found gender differences • Women prefer low-tech strategies and VOCAs more than men • Men prefer high-tech writing systems more often than women • Gutmann and Gryfe (1996) found that early and frequent intervention, and early introduction of AAC is critical for acceptance • Using an AAC can allow someone with ALS to continue working • Focus on high-tech AAC • Low-tech options haven’t changed very much, and have been examined • High-tech options are changing rapidly and becoming more accessible • Is there a pattern to adoption of high-tech AAC? • Why do people use/reject AAC? Ball, et al. 2004

  21. Group • 50 ALS patients monitored over 4 years • 22 females, 28 males • 17 bulbar, 22 spinal, 11 mixed diagnosis • Ages 36-78 (μ=60.16 y.o.) • All spoke English primarily • 2 had cognitive deficits • Seen for AAC assessment when their speech began to change • Those wanting only written communication were not included in the study • Wide variety of educational levels, social status

  22. Procedure • AAC assessment when intelligibility ≤90% or speaking rate ≤100 wpm (tested quarterly) • Patients presented with various devices • Tried them during presentation • Evaluator made recommendations • Could bring home favored device for 1 week trial • AAC intervention recommended • AAC acceptance, use, rejection and discontinuance were monitored until their death (4-181 mo., μ=43.8 mo., SD=37.54 mo.)

  23. Results • Acceptance: 90% immediate, 6% delayed • Came from all social classes • No gender differences • Immediate Acceptance • In interviews, participants listed communication, participation and employment as reasons for acceptance • All used AAC as primary means of communication

  24. Delayed Acceptance • Ages 30-39, 70-79 • Delay of 6-24 mo. • Preferred multifunctional devices • Delayed in part because of family members • Believed that they could understand the participants well enough to meet their needs • Thought they were providing adequate care w/o AAC • 2 thought that AAC questioned the quality of their care • One physician advised a family to accept dysarthria rather than turn to technology • Three individuals were in some form of denial

  25. Rejection and Discontinuance • Rejected by the two participants with cognitive limitation • No discontinuance • High-tech AAC often abandoned at end-of-life
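
As a quick consistency check, the reported outcomes can be converted to head counts. This sketch assumes the percentages on the Results slide are fractions of all 50 participants and that the two rejections are the only other outcome; the variable names are illustrative.

```python
participants = 50                        # group size from the Group slide

immediate = round(0.90 * participants)   # 90% immediate acceptance
delayed = round(0.06 * participants)     # 6% delayed acceptance
rejected = 2                             # the two participants with cognitive limitation

# Participants not covered by the three reported outcomes.
unaccounted = participants - (immediate + delayed + rejected)
rejected_pct = 100 * rejected / participants
```

Under these assumptions the three outcomes account for every participant: 45 immediate, 3 delayed and 2 rejected, with the rejections making up the remaining 4%.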

  26. Discussion • Saw wider adoption than before (1996) • AAC seems to be more widely accepted in society • US began funding AAC devices in 2000 • Recommendations • Providing appropriate information regarding the speech-language characteristics of ALS • Regular contact/monitoring • Sustaining awareness of AAC/intervention opportunities • Doctors must be aware of these options and be able to explain them
