Recent Work at ISI

  1. Recent Work at ISI
     Jose Luis Ambite, Yigal Arens, Eduard Hovy, Andrew Philpot
     USC/ISI

  2. Overview
     1. EDC system
        • NHANES health questionnaire data
        • (Semi-)automatic domain model construction
        • NL-based question understanding
     2. Proposals
        • Urban Transportation SGER awarded
        • Submitted proposal to ITR
     3. Outreach
        • Connections to USC campus
        • Conference planning: dg.o 2002

  3. NHANES Data Collection
     • We acquired and wrapped the NHANES database
        • From the National Center for Health Statistics
        • Survey of thousands of records (people); each record contains up to 12,000 questions about health, family, medical history, etc.
        • Database wrapped and accessible via the EDC system
     Challenge: can we learn the domain model automatically?
     • Try to extract terms from the DB, cluster them, and then link them into the Ontology
     • Then test the Domain Model using SIMS

  4. Automated Domain Modeling Research
     • Step 1: performed manual pre-test
        • extracted approx. 60 column headings (database questions)
        • clustered them manually
        • compared accuracy: about 50% overlap only
     • Step 2: developed clustering toolkit
        • assembled CLINK, SLINK, Median, k-Means, etc. into toolkit
        • developed speedup techniques
     • Step 3: ran series of 10 experiments
        • various word manipulations (word weighting by inverse frequency, word stemming, longer passage extracts, etc.)
        • mapped out extensive parameter space; did pinpointed sweep
     • Results still not great
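The clustering step on this slide can be illustrated with a small sketch. It is not the toolkit described in the talk: it is a naive single-linkage agglomerative clusterer (the same linkage criterion as SLINK, without SLINK's optimized algorithm) over inverse-frequency-weighted word vectors. The function names, the smoothed weighting formula, the similarity threshold, and the sample headings are all assumptions made for illustration; the headings are invented and are not real NHANES column names.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a heading and keep only purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


def idf_vectors(headings):
    """Turn each heading into a sparse word vector weighted by inverse frequency."""
    docs = [Counter(tokenize(h)) for h in headings]
    n = len(docs)
    # Document frequency: in how many headings each word occurs.
    df = Counter(w for d in docs for w in d)
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in df}
    return [{w: c * idf[w] for w, c in d.items()} for d in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_link(headings, threshold):
    """Merge the two most similar clusters until no pair reaches the threshold.

    Cluster similarity is the similarity of the closest pair of members
    (single linkage). Returns clusters as lists of heading indices.
    """
    vecs = idf_vectors(headings)
    clusters = [[i] for i in range(len(headings))]
    while len(clusters) > 1:
        best_sim, best_x, best_y = -1.0, None, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = max(cosine(vecs[i], vecs[j])
                          for i in clusters[x] for j in clusters[y])
                if sim > best_sim:
                    best_sim, best_x, best_y = sim, x, y
        if best_sim < threshold:
            break
        clusters[best_x] = clusters[best_x] + clusters[best_y]
        del clusters[best_y]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Invented headings, for illustration only.
    sample = [
        "age at interview",
        "age in months",
        "blood pressure systolic",
        "blood pressure diastolic",
    ]
    for cluster in single_link(sample, threshold=0.2):
        print([sample[i] for i in cluster])
```

Swapping `max` for `min` in the cluster-similarity line would give complete linkage (the CLINK criterion), which is one way a toolkit could share code across methods.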

  5. NL Question Understanding
     Challenge: can we interpret the user’s question when posed in English, not using menus or ontology?
     • Approach:
        1. create new Finite State Machine
        2. create question grammar and lexicon (linked to Ontology)
        3. create conversion routines that assemble SQL queries out of user input
        4. test and evaluate using EDC system and SIMS
     • Current status:
        • new FSM completed
        • grammar and conversion routines under construction
        • will demo English (+ other?) query input at conference
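As a rough illustration of the approach on this slide, the sketch below runs a tiny finite state machine over a restricted English question and assembles a parameterized SQL query from what it recognizes. It is not the system described in the talk: the lexicon, the column names (`age_years`, `weight_kg`), the table name `survey`, the state names, and the one-pattern grammar are all hypothetical.

```python
import re

# Hypothetical lexicon linking English words to database columns.
LEXICON = {"age": "age_years", "weight": "weight_kg"}
AGGREGATES = {"average": "AVG", "maximum": "MAX", "minimum": "MIN"}
COMPARATORS = {"over": ">", "under": "<"}


def question_to_sql(question, table="survey"):
    """Map '<aggregate> <column> [... <column> <comparator> <number>]' to SQL.

    States: START -> TARGET -> COND_COL -> OP -> VALUE -> DONE.
    Words that do not fit the current state are treated as filler and skipped.
    Returns (sql_text, params) with the number passed as a bound parameter.
    """
    tokens = re.findall(r"[a-z]+|[0-9]+", question.lower())
    state = "START"
    agg = target = cond_col = op = value = None
    for tok in tokens:
        if state == "START" and tok in AGGREGATES:
            agg, state = AGGREGATES[tok], "TARGET"
        elif state == "TARGET" and tok in LEXICON:
            target, state = LEXICON[tok], "COND_COL"
        elif state == "COND_COL" and tok in LEXICON:
            cond_col, state = LEXICON[tok], "OP"
        elif state == "OP" and tok in COMPARATORS:
            op, state = COMPARATORS[tok], "VALUE"
        elif state == "VALUE" and tok.isdigit():
            value, state = int(tok), "DONE"
    # Accept only a complete question: with or without a full condition.
    if state not in ("COND_COL", "DONE"):
        raise ValueError(f"could not interpret question: {question!r}")
    sql = f"SELECT {agg}({target}) FROM {table}"
    params = ()
    if state == "DONE":
        sql += f" WHERE {cond_col} {op} ?"
        params = (value,)
    return sql, params


if __name__ == "__main__":
    print(question_to_sql("What is the average weight of people with age over 40?"))
```

Only values drawn from the fixed dictionaries are interpolated into the SQL text, and the user-supplied number is returned separately as a parameter, so the result can be passed to a DB-API call such as `cursor.execute(sql, params)`.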

  6. Proposals
     • SGER proposal funded
        • Topic: urban transportation study (new methods for freight tracking in LA by comparing across databases)
        • Grant awarded to USC, shared by ISI and USC’s Dept of Policy and Planning
        • Jose Luis Ambite will spend approx. 25% of his time on this study
     • White paper to DoT
        • Topic: searching for patterns in freight traffic
        • Submitted by USC campus people and Jose Luis Ambite
     • ITR proposal submitted
        • Topic: semi-automated topic hierarchy creation
        • Partners: Eduard Hovy communicated with EPA group
        • If funded, will use EPA’s CARAT ontology as starting point and evaluation standard

  7. Outreach
     • USC Campus Group
        • Urban policy planners, digital democracy sociologists, industrial and systems engineers, etc.
        • Held several meetings, chaired by Yigal Arens and Genevieve Giuliano, to explore collaborations and to see if we can extend DGRC to start a separate organization
        • Drafted a statement of goals to hand to Provost and USC-based small funding offices
     • New issue of DG Online! http://www.dgrc.org/dg-online/
     • Conference: dg.o 2002
        • Hotel arranged
        • Website up (but still need fancy graphics)
        • Call for presentations disseminated
        • Some portions of program and invitees determined
