
QA for the Web

QA for the Web. Language Computer Corporation, www.languagecomputer.com, Dallas, Texas. PI: Dan Moldovan, moldovan@languagecomputer.com. Motivation: In the US alone, there are more than 100 million Internet users per day. Each user asks on average 5 questions.


Presentation Transcript


  1. QA for the Web • Language Computer Corporation, www.languagecomputer.com, Dallas, Texas • PI: Dan Moldovan, moldovan@languagecomputer.com

  2. Motivation • In the US alone, there are more than 100 million Internet users per day • Each user asks on average 5 questions • Each user spends about half an hour to find answers

  3. Tasks • Task 1 – Adapt the QA technology to the universality of the Web hypertexts • Task 2 – Interface the QA system with the emerging Semantic Web technologies

  4. Task 1: Adapt QA Technology to the Web • Two approaches: • use available search engines • gather documents from the Web and form a local collection

  5. QA on Top of a Search Engine • Diagram components: Search Engine, Documents, Keywords, Format Manager, Normalized Documents, Question Processing, Paragraph Retrieval, Answer Processing
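
The components named on this slide can be sketched as a small pipeline. This is only an illustrative sketch, not the system described in the talk: the search engine is replaced by a hypothetical in-memory stand-in, keyword matching is plain substring matching, and "answer processing" is reduced to returning the best-scoring paragraph. All function names and the stopword list are assumptions.

```python
import re
from html import unescape

# Assumed stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "of", "what", "who", "where", "when",
             "how", "does", "do", "from", "for", "i", "can", "s"}


def question_keywords(question):
    """Question processing: keep content words as search keywords."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOPWORDS]


def search_engine(keywords, pages):
    """Hypothetical stand-in for an external search engine:
    return every page that contains at least one keyword (substring match)."""
    return [p for p in pages if any(k in p.lower() for k in keywords)]


def normalize(page):
    """Format manager: strip HTML tags and return plain-text paragraphs."""
    text = re.sub(r"</p\s*>|<br\s*/?>", "\n", page, flags=re.I)
    text = unescape(re.sub(r"<[^>]+>", " ", text))
    return [re.sub(r"\s+", " ", p).strip() for p in text.split("\n") if p.strip()]


def retrieve_paragraphs(keywords, paragraphs):
    """Paragraph retrieval: rank paragraphs by how many keywords they contain."""
    scored = [(sum(k in p.lower() for k in keywords), p) for p in paragraphs]
    ranked = sorted(scored, key=lambda sp: -sp[0])
    return [p for score, p in ranked if score > 0]


def answer(question, pages):
    """Answer processing (simplified): return the top-ranked paragraph, if any."""
    keywords = question_keywords(question)
    documents = search_engine(keywords, pages)
    paragraphs = [p for doc in documents for p in normalize(doc)]
    ranked = retrieve_paragraphs(keywords, paragraphs)
    return ranked[0] if ranked else None
```

A real answer-processing stage would extract a short answer span rather than a whole paragraph; the sketch stops at paragraph level to keep the data flow visible.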

  6. QA on Top of a Database Engine • Diagram components: Database Engine, Database Records, Query, Query Builder, Format Manager, Normalized Documents, Keywords, Question Processing, Paragraph Retrieval, Answer Processing

  7. Technical challenges • Different formats: PDF, HTML, DOC, PS • Document layout • Dynamically generated pages • Password protection • Subscription required • Cookies

  8. Build local collections of documents • Gather documents from a specific site and cache them locally • Transform them into a canonical text form, then index the documents • Maintain the document collection: update constantly, avoid redundant documents, garbage-collect obsolete files, etc.
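
The three steps on this slide (cache, canonicalize and index, maintain) could look roughly like the sketch below. It assumes documents arrive as already-fetched HTML or text strings, uses a content hash to detect redundant copies, and keeps a simple inverted index. The class and method names are hypothetical and do not come from the talk.

```python
import hashlib
import re
from collections import defaultdict


class LocalCollection:
    """Minimal local document cache with an inverted index (illustrative only)."""

    def __init__(self):
        self.docs = {}                  # url -> (content hash, canonical text)
        self.hashes = {}                # content hash -> url that owns it
        self.index = defaultdict(set)   # term -> set of urls

    @staticmethod
    def canonical(raw):
        """Canonical text form: tags stripped, whitespace collapsed, lowercased."""
        no_tags = re.sub(r"<[^>]+>", " ", raw)
        return re.sub(r"\s+", " ", no_tags).strip().lower()

    def add(self, url, raw):
        """Add or update a document; return False if it duplicates another URL."""
        text = self.canonical(raw)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.hashes.get(digest, url) != url:
            return False                # same content already cached elsewhere
        self.remove(url)                # drop the stale version, if any
        self.docs[url] = (digest, text)
        self.hashes[digest] = url
        for term in set(text.split()):
            self.index[term].add(url)
        return True

    def remove(self, url):
        """Garbage-collect one document and its index entries."""
        entry = self.docs.pop(url, None)
        if entry is None:
            return
        digest, text = entry
        del self.hashes[digest]
        for term in set(text.split()):
            self.index[term].discard(url)
            if not self.index[term]:
                del self.index[term]

    def search(self, term):
        """Return the sorted URLs whose canonical text contains the term."""
        return sorted(self.index.get(term.lower(), ()))
```

Hashing the canonical text rather than the raw bytes means two copies that differ only in markup or spacing count as the same document, which is one plausible reading of "avoid redundant documents".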

  9. Experiments • Business: InterVoice Brite Product Manuals • Community: City of Irving • NEWS: cnn.com, abcnews.com, dallasnews.com, time.com, washingtonpost.com

  10. InterVoiceBrite • Collection: • product manuals • size: 38MB • files: 802 • format: PDF • layout: specific to manuals • changes occur at large time intervals

  11. Peculiarities of Their Needs • The question is in the form of a problem description • The expected answer is a solution to the problem • The answer is compiled from different parts of documents and given in the form of a procedure to be followed • Follow-ups frequently lead to a dialogue

  12. An Example • Question: “I would like to have the caller be able to control the playback of a long set of instructions with speech recognition. While the message is playing, the caller may say “stop”, “go back”, “forward”, “start over” and have the system respond appropriately. Can this be done? The SpeechAccess engine is Nuance.” • Answer: “Yes, this can be done. Play a lead-in message to tell the caller to say “next”, “backup” or “done”. Then with the loop play the first instruction you want the caller to hear in keyover mode. To obtain the line balancing procedure and the required files, please visit the continuing engineering web page.”

  13. Our Demo • Q: How can I obtain line balancing information? • A: READ DSLAC Request AI1 DSLAC line balancing information • Q: How can I modify a message? • A: Your Voice The feature that enables a voice mail user to change specific voice messages • Q: What is the runtime engine? • A: ISINIT, the runtime engine,

  14. Our Demo • Q: What type of error is HH? • A: Hardware Handler (HH) error • Q: What causes telephony connection problems? • A: Telephony connection problems can be caused by the InterSoft system or by the telephony equipment (PBX) • Q: What does FUSE mean? • A: FUSE Indicates a problem with the fuse

  15. City of Irving • Collection: • heterogeneous, city information • size: 96MB • files: 1097 • format: HTML, PDF, DOC • layout: WWW space • small daily changes

  16. Examples • Q: When does the Farmers’ Market take place? • A: Irving Farmers’ Market: 1st and 3rd Saturdays in Downtown Irving • Q: What is Irving’s news source? • A: Irving’s news source is the City Spectrum • Q: Where does Irving’s water supply come from? • A: The City of Irving purchases its entire water supply from the City of Dallas

  17. Examples • Q: Where can I pay traffic fines? • A: Irving Municipal Court Criminal Justice Center 305 N. O’Connor Rd • Q: How do I apply for a job with the City? • A: Applications are accepted from 8 a.m. to 5 p.m. Monday – Friday at the Civic Center Complex, 825 W. Irving Blvd. Job listings are available on the city’s Web site, www.ci.irving.tx.us, or by calling the city’s 24-hour job line at (972) 721 3773

  18. NEWS • Collection: • sources: CNN.COM, TIME.COM, ABCNEWS.COM, DALLASNEWS.COM, WASHINGTONPOST.COM • size: 531MB • files: 55880 • format: HTML, PDF, DOC • frequent changes

  19. Issues • broken links • garbage collection for obsolete files • cumulative NEWS • updates depending on the type of source (TIME.COM - weekly)
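
One way to handle source-dependent updates and broken links is sketched below. Only the weekly interval for TIME.COM comes from the slide; the daily default for other sources, the function names, and the `is_live` callback are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Refresh interval per source. Weekly for time.com is stated on the slide;
# the daily default for every other source is an assumption.
REFRESH_INTERVAL = {"time.com": timedelta(days=7)}
DEFAULT_INTERVAL = timedelta(days=1)


def due_for_refresh(source, last_fetched, now):
    """True if the source's refresh interval has elapsed since the last fetch."""
    interval = REFRESH_INTERVAL.get(source, DEFAULT_INTERVAL)
    return now - last_fetched >= interval


def prune_broken(cache, is_live):
    """Return a copy of the cache without entries whose URL is no longer live.

    `cache` maps url -> document; `is_live` is a caller-supplied check
    (hypothetical here, e.g. an HTTP HEAD request in a real crawler).
    """
    return {url: doc for url, doc in cache.items() if is_live(url)}
```

Whether an unreachable article should be dropped or kept is a policy choice: the "cumulative NEWS" bullet suggests older stories may need to stay in the collection even after the source removes them.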

  20. Examples • Q: How many soldiers died in Afghanistan? • A: The US military has opened an investigation into last week’s friendly fire incident in Afghanistan that killed four Canadian soldiers and injured eight others • Q: How much did President Bush increase aid for poor countries? • A: Bush said the US will increase its initial pledge of $200 million only after the fund proves successful • Q: Who is the owner of Dallas Mavericks? • A: Mark Cuban, Internet entrepreneur and owner of the NBA’s Dallas Mavericks

  21. QA and Semantic Web • QA technology can contribute to the development of the Semantic Web • Possible architectures: • 1. QA as an interface between an Intelligent Agent and the Semantic Web • Diagram components: Agent, Human, Web, QA

  22. QA and Semantic Web • 2. QA works on a local collection • Diagram components: Local Collection, QA, Web, Agent, Human

  23. Technical Challenges to be Addressed • 1. Make the QA system compatible with Semantic Web languages (e.g. XML, RDF, DAML, OIL) • 2. Make QA ontologies compatible with the Semantic Web ontology • 3. Interface the QA system with Intelligent Agents
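
As a small illustration of the first challenge, a question/answer pair can be serialized as RDF in N-Triples syntax without any external library. This is a sketch under stated assumptions: the `http://example.org/qa#` namespace and the property names `question`, `answerText`, and `source` are hypothetical placeholders, not an established vocabulary and not something the talk specifies.

```python
# Hypothetical namespace for illustration; not a real published vocabulary.
EX = "http://example.org/qa#"


def ntriples_literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def answer_to_ntriples(answer_id, question, answer, source_url):
    """Describe one QA result as three N-Triples statements."""
    subject = f"<{EX}{answer_id}>"
    return "\n".join([
        f"{subject} <{EX}question> {ntriples_literal(question)} .",
        f"{subject} <{EX}answerText> {ntriples_literal(answer)} .",
        f"{subject} <{EX}source> <{source_url}> .",
    ])
```

For example, `answer_to_ntriples("a1", "Who is the owner of Dallas Mavericks?", "Mark Cuban", "http://example.org/news/1")` yields three lines, one triple per line, that an RDF parser or agent could consume.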

  24. Thank you!
