Noisy Text Correction – an exercise in futility?


Sreeram Balakrishnan

IBM India Research Lab

Aggregate versus Instance analysis
  • Applications for noisy text can be divided into two broad categories
    • Applications that look at individual text instances, e.g. search and transcription (OCR, speech-to-text)
    • Applications that look at aggregate features of the text, e.g. document classification and aggregate text analytics
  • Aggregate analysis is more robust to noise, since errors can be averaged out
    • Text correction techniques can help improve the accuracy of aggregate statistics
  • Applications that require accurate correction of each text instance may be an exercise in futility (at least in the short term)
    • E.g. SMS messages that cannot be corrected, even manually, without knowledge of the whole conversation context
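
The claim that aggregate statistics tolerate noise better than individual instances can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not part of the original presentation: the two message templates, the random letter-substitution noise model, the 5% noise rate, and the names `add_noise` and `simulate` are all hypothetical, and the noise is assumed to affect both topics equally.

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def add_noise(text, p, rng):
    """Replace each letter with a random lowercase letter with probability p."""
    return "".join(
        rng.choice(LETTERS) if ch.isalpha() and rng.random() < p else ch
        for ch in text
    )


def simulate(n=2000, bill_share=0.3, p=0.05, seed=0):
    """Compare instance-level and aggregate-level accuracy on noisy messages.

    Returns (instance_accuracy, true_share, estimated_share).
    """
    rng = random.Random(seed)
    n_bill = round(n * bill_share)
    # Hypothetical clean messages for two topics (billing and plans).
    clean = ["please check my bill"] * n_bill + ["please change my plan"] * (n - n_bill)
    noisy = [add_noise(msg, p, rng) for msg in clean]

    # Instance view: fraction of messages that came through untouched.
    instance_accuracy = sum(c == m for c, m in zip(clean, noisy)) / n

    # Aggregate view: share of "bill" messages among all keyword hits.
    bill_hits = sum("bill" in m for m in noisy)
    plan_hits = sum("plan" in m for m in noisy)
    estimated_share = bill_hits / (bill_hits + plan_hits)

    return instance_accuracy, n_bill / n, estimated_share


if __name__ == "__main__":
    acc, true_share, est = simulate()
    print(f"messages received intact: {acc:.2f}")
    print(f"true share of billing messages: {true_share:.2f}")
    print(f"share estimated from noisy text: {est:.2f}")
```

Under these assumptions a large fraction of individual messages is corrupted somewhere, yet the estimated topic share stays close to the true share, because the noise removes keyword hits from both topics at a similar rate. If the noise hit one keyword harder than the other (for example, a much longer keyword), the aggregate estimate would be biased as well.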
Example – Customer Contact Records
  • IBM PC help centers received over 500,000 calls per year
  • Agents produce summary transcripts for each call

Date: 19990425, ID: 13163548 PRELA

04/25/1999 20:46 - Call started by John Velocci (MOB_NORTH).

Q: wants to know if he's protected againt the CIH virus. a: has ibm anti virus installed.

A: told him to goto the web site for upgrade patches. told him to fax pop. s: self st: closed 04/25/1999 20:51 - Call closed by John Velocci (MOB_NORTH).

Date: 19990426, ID: 13171316 POWER

04/26/1999 18:50 - Call started by Scott MacDonald (MOB_NORTH).


a: looked up the pn for ac adapter 02k6496 and transfered her to parts 04/26/1999 18:55 - Call closed by Scott MacDonald


Date: 19990604, ID: 13376646MONIT

06/04/1999 22:16 - Call started by Barry O'Kelly (IREL_MOB3). Q:Tp attached to dock with external border around LCD and monitor A:Undocked Tp......booted.....screen full Only gets border when attached to dock and monitor Was reinstalling monitor when cus disconnected S:Training 06/04/1999 23:11 - Call placed in Mobiles call back queue by Barry O'Kelly (IREL_MOB3). 06/04/1999 18:14 - Call taken by Andrew Atias (TAG4). Q: Customer calling back, customer still getting black border on LCD and monitor. A: I explained to the customer that this will will happen when using a simultanious display. S: SOP 06/04/1999 19:04 - Call closed by Andrew Atias (TAG4).

CALL TYPE: Technical Information for Purchased Equipment

COMPONENT TYPE: Monitor/Display



Date: 19990605, ID: 13376581MONIT

06/03/1999 10:39 - Call started by Robert Dennis (MOB_NORTH). Machine is oow warranty Q:machine lcd panel will cut out on cust with the machine sitting normally cust states that presure on the under side of machine at or near the F10 key is what is needed to keep machine running....advised billable repair cust agreed, seeking R3 service 10:46:02 * MSG FROM EZSRV : EasyServe R3 pickup request received for 10:46:02 * MSG FROM EZSRV : Machine Type: 2640 Serial# 78GM283 advised customer that data may be lost when sending in to ez serve.... advised customer to back up all personal data...if possible also machine may be reloaded as part of pd/repair process please have all software, product licenses and COA's available when machine is returned also please write the case number on the outside of the box before sending in to ez serve repair 06/03/1999 10:49 - Call closed by Robert Dennis (MOB_NORTH). 06/04/1999 22:52 - Case Number: 13366697 continued by Margaret Butler



Date: 19991001, ID: 8629697

inquiry about TP8:memory upgrade

customer would like to upgrade TP8:memory but needs information.

gave customer information about memory products.

Date: 19991051, ID: 8630655

complaint about TPxx

customer reports paint is peeling around palm rest

large number of customer claim logs

diverse contact media

  • Summary records contain details of why customers are unhappy
  • Aggregate analysis of the key phrases reveals that paint peeling at the palm rest is a common complaint among TPxx users
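
Counting key phrases across many records can be made tolerant of lost or extra spaces. The following is a minimal sketch, not the method used in the presentation: the function names `squash` and `count_key_phrases`, the phrase table, and all but the first sample record are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def squash(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def count_key_phrases(records, phrases):
    """Count the records that mention every term of each key phrase.

    Matching is done on space-free text, so "palm rest", "palmrest" and
    "aroundpalm rest" all count as the same mention.
    """
    counts = Counter()
    for record in records:
        flat = squash(record)
        for name, terms in phrases.items():
            if all(squash(term) in flat for term in terms):
                counts[name] += 1
    return counts


# First record is modelled on the slide; the others are invented examples.
records = [
    "customer reportspaint ispeelingaroundpalm rest",
    "Paint peeling near the palm rest again",
    "inquiry about memory upgrade",
    "cust says PAINT is peeling off palmrest",
]

# Hypothetical key phrases, each defined as a set of terms that must co-occur.
phrases = {
    "paint peeling at palm rest": ["paint", "peeling", "palm rest"],
    "memory upgrade": ["memory", "upgrade"],
}

counts = count_key_phrases(records, phrases)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

Matching on space-free text is deliberately crude: it can produce false positives when a term happens to span two adjacent words, so a real system would likely combine it with other evidence.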
Some examples of SMS data
  • Please send me about yyy card
  • What is the no. that i may have to dial for kaun banega lakhpati ?
  • Tell me about new plans plese thanks
  • Mera custmbar care kayu band kar diya gaya hai kirpa karke aap mera custmbar chalu karen kayu ki mera ko newplan ki jankare chahi
  • Sir mere custmer care nall gall nahi ho rahi..
  • Please activate my wap over gprs
  • Gup shup pack not activating .unable to connect 656.
  • Pl. confirm the receipt of payment of Rs. 500 paid on 19.05.06 vide receipt 0244213 at Karanagar. Thanks
  • Request har never made by me for ISD. I dont need ISD
  • 24 hrs over what is the reply
  • I am post paid customor & i have quary about my bill but custamer ex are not there. What can i do now
  • 3 din ho gaye aap ko aap ke 24 hour kab pure honge
  • Xxx ki service ko kya ho gaya hai.custmer ko satisfied hi nahi karte.
  • Tell me where i can contact othewise i would take another connection.
  • The service of xxx is extremly bad and some of the senior employee are irresponsible regarding their work.e.g. (XYZ)
  • Xxx service veri poor.
  • No care for customer is what Xxx focus on. I've to leave xxx as it is not solving my problem. Gudbye Keep NOT care customers
  • I am very distrebed to xxx massangar I riqvest 3rd time complained
  • Bhaji plz custmar care service chalu kar do nahi ta mai no. Band kar devaga.Menu bahut mushkil aa rahi plz.Kal spice da no chalega
Some examples of call centre notes
  • no.fwd to unbarr
  • pls actv. AR on cst reqt ............10:19am.
  • cust wants to actv roam as he don't understand the ivr
  • roming actv on cust req ,,,,,,,,,,charges told,,,,reena
  • no. unbarred as pymt reflected on cust req
  • xxx roaming deactv on cust req
  • the cust secratory called up and he inf tht he was not able to access GPRS ,he was not able to confirm whether its masala or MO,and he told that he will call back with other details later and disconn teh call
  • No waiver given to him at any cost........
  • promotional mssg restricted as on cst req......11:14am
  • Customer was charged SMS for Rs.3074.But customer didnt give request for deactivation of 10000sms pack.Since om dwn,not able to chk active or not.But its shows active in new crm window.
  • resume no. as pymt is reflected.....................9.20am
  • ar deactivated on cust request
  • case escallated : HEALTH ALERTS to be deactivated ......11:15 am
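
Notes like these do not need to be corrected perfectly to be useful in aggregate. As a rough sketch of how light normalization can help aggregate counts, the code below maps a few shorthand tokens to a canonical form before counting. The shorthand table is a guess made from reading the notes above, not a list supplied by the presentation, and the names `SHORTHAND`, `normalize_note` and `token_counts` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical shorthand table, guessed from the sample notes.
SHORTHAND = {
    "cust": "customer",
    "cst": "customer",
    "actv": "activate",
    "deactv": "deactivate",
    "pymt": "payment",
    "req": "request",
    "reqt": "request",
    "roming": "roaming",
    "mssg": "message",
    "chk": "check",
}


def normalize_note(note):
    """Split a note into lowercase tokens and expand known shorthand."""
    tokens = re.findall(r"[a-z0-9']+", note.lower())
    return [SHORTHAND.get(tok, tok) for tok in tokens]


def token_counts(notes):
    """Count normalized tokens over a collection of notes."""
    counts = Counter()
    for note in notes:
        counts.update(normalize_note(note))
    return counts


# Four of the notes shown on the slide.
notes = [
    "pls actv. AR on cst reqt ............10:19am.",
    "roming actv on cust req ,,,,,,,,,,charges told,,,,reena",
    "xxx roaming deactv on cust req",
    "ar deactivated on cust request",
]

counts = token_counts(notes)
print(counts["customer"], counts["request"], counts["roaming"])
```

Tokens that are not in the table (such as "ar" or "deactivated") pass through unchanged, so the aggregate counts improve only for the shorthand the table happens to cover.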
Enriching structured BI with unstructured data
  • Augment classic structured data warehouses with information extracted from unstructured sources
  • Domain-specialized annotators embedded in UIMA (open source) extract structured attributes from unstructured sources

[Figure: Unstructured Information Management Architecture (UIMA). Diagram labels: Unstructured sources, UIMA processing, Structured sources, Unstructured Enriched Data Warehouse, OLAP tools]
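
The idea of an annotator that turns free text into structured attributes can be sketched in a few lines. This example does not use the UIMA API (UIMA itself is a Java framework); it is a plain-Python illustration, and the `CallRecord` fields, the regular expressions, and the `mentions_virus` attribute are assumptions made for the sketch. The sample text is taken from the customer contact records shown earlier.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    """Structured attributes extracted from one free-text call record."""
    date: Optional[str]
    call_id: Optional[str]
    agent: Optional[str]
    mentions_virus: bool


# Patterns assumed from the layout of the sample records.
HEADER = re.compile(r"Date:\s*(\d{8}),\s*ID:\s*(\d+)")
STARTED = re.compile(r"Call started by ([^(]+?)\s*\(")


def annotate(text):
    """Extract a few structured attributes from an unstructured call record."""
    header = HEADER.search(text)
    started = STARTED.search(text)
    return CallRecord(
        date=header.group(1) if header else None,
        call_id=header.group(2) if header else None,
        agent=started.group(1) if started else None,
        mentions_virus="virus" in text.lower(),
    )


sample = (
    "Date: 19990425, ID: 13163548 PRELA\n"
    "04/25/1999 20:46 - Call started by John Velocci (MOB_NORTH).\n"
    "Q: wants to know if he's protected againt the CIH virus.\n"
)

record = annotate(sample)
print(record)
```

Rows of this kind could then be loaded into a warehouse table alongside the structured sources, which is the enrichment step the slide describes.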

Analysis of Agent Performance (AT002)

Scenario: An IBM BPO business whose agents handle car rentals wants a higher check-out rate

Solution: Extract key phrases from call transcripts and correlate them with outcomes

Value selling phrases
  • Mention of good vehicle
  • Mention of good rate
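
Correlating key phrases with outcomes can be sketched as a per-outcome usage rate. The code below is an illustration only: the function name `phrase_rate_by_outcome`, the outcome labels, and the five toy transcripts are invented for the example and are not data or results from the study.

```python
from collections import defaultdict


def phrase_rate_by_outcome(calls, phrases):
    """For each outcome, return the fraction of calls that mention each phrase.

    calls is an iterable of (transcript, outcome) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for transcript, outcome in calls:
        totals[outcome] += 1
        text = transcript.lower()
        for phrase in phrases:
            if phrase in text:
                hits[outcome][phrase] += 1
    return {
        outcome: {p: hits[outcome][p] / totals[outcome] for p in phrases}
        for outcome in totals
    }


# Toy transcripts, invented purely to exercise the function.
calls = [
    ("we have a good rate for you this week", "checked_out"),
    ("that is a good rate and a good vehicle", "checked_out"),
    ("your booking is confirmed", "checked_out"),
    ("the pick up is at the airport", "no_show"),
    ("we have a good vehicle available", "no_show"),
]
phrases = ["good rate", "good vehicle"]

rates = phrase_rate_by_outcome(calls, phrases)
for outcome, by_phrase in rates.items():
    for phrase, rate in by_phrase.items():
        print(f"{outcome:12s} {phrase:14s} {rate:.2f}")
```

Because the rates are aggregates over many calls, occasional transcription errors in individual calls shift them only slightly, which is the same robustness argument made at the start of the presentation.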

Analysis of Agent Performance

Value selling phrases that mention a good rate are used more often in calls for cars that were checked out than in calls that ended in no-shows.

[Chart labels: Value selling phrases, Pick up information, Checked out cars, No shows]