
Applications of Text Mining




  1. Applications of Text Mining
     Ewan Klein, School of Informatics & NeSC

  2. Text Mining
     • Goals
       • Extract useful information from large bodies of unstructured or semi-structured documents
       • Look for patterns in natural language text
       • Driven by application needs
     • Three Areas:
       • Adding Metadata
         • E.g., identify Dublin Core elements from document headers
       • Information Extraction
         • Identify nuggets of text data and marshall them into a fixed format
       • Assisting Curation
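To make the "Adding Metadata" bullet concrete, here is a minimal sketch of mapping document header lines to Dublin Core elements. It assumes, purely for illustration, that headers are plain `Field: value` lines ending at the first blank line. The `HEADER_TO_DC` mapping and the function name `extract_dublin_core` are hypothetical and do not come from the presentation.

```python
import re

# Hypothetical mapping from header field names to Dublin Core elements.
HEADER_TO_DC = {
    "title": "dc:title",
    "author": "dc:creator",
    "date": "dc:date",
    "keywords": "dc:subject",
}

# Assumed header layout: "Field: value" on a single line.
HEADER_LINE = re.compile(r"^(?P<field>[A-Za-z]+)\s*:\s*(?P<value>.+?)\s*$")


def extract_dublin_core(document):
    """Read header lines up to the first blank line and map known fields to Dublin Core."""
    metadata = {}
    for line in document.splitlines():
        if not line.strip():
            break  # the header block ends at the first blank line
        match = HEADER_LINE.match(line)
        if match is None:
            continue  # not a "Field: value" line
        element = HEADER_TO_DC.get(match.group("field").lower())
        if element is not None:
            metadata[element] = match.group("value")
    return metadata


sample = """Title: Applications of Text Mining
Author: Ewan Klein
Affiliation: School of Informatics & NeSC

Text mining extracts useful information from documents.
"""

record = extract_dublin_core(sample)
print(record)
```

Fields with no entry in the mapping (here `Affiliation`) are skipped rather than guessed, which keeps the output in the fixed format the slide describes.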

  3. Text Mining and Curation
     • Example workflow:
       • Make an observation
       • Search the research literature for knowledge
       • Incorporate relevant information into the database
     • Challenges:
       • Current Information Retrieval (IR) techniques are often too imprecise
         • Which enzymes act as catalysts in the glycolysis pathway?
         • We want to identify a relation between two entities
       • Move to augmenting IR with more knowledge of text structure
         • Mostly supervised machine learning techniques
         • Still need training data for each domain
       • Need to integrate text mining into Grid applications
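The gap between keyword retrieval and identifying "a relation between two entities" can be illustrated with a small pattern-based sketch. This is not the supervised approach the slide mentions. It is a hand-written rule over a toy enzyme lexicon, and the lexicon, the trigger pattern, and the function name `extract_relations` are all assumptions made for the example.

```python
import re

# Toy enzyme lexicon (illustrative only); a real system would use a curated resource.
ENZYMES = ["hexokinase", "phosphofructokinase", "pyruvate kinase"]

# One pattern: <enzyme> <catalyzes|catalyses|catalyze|catalyse> <object phrase>.
ENZYME_ALTERNATION = "|".join(re.escape(name) for name in ENZYMES)
RELATION = re.compile(
    rf"\b(?P<enzyme>{ENZYME_ALTERNATION})\s+cataly[sz]es?\s+(?P<object>[^.;]+)",
    re.IGNORECASE,
)


def extract_relations(text):
    """Return (enzyme, catalyzed process) pairs matched by the pattern."""
    return [
        (match.group("enzyme").lower(), match.group("object").strip())
        for match in RELATION.finditer(text)
    ]


def keyword_hits(text, keyword):
    """Keyword-style retrieval: every sentence that merely mentions the keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if keyword.lower() in s.lower()]


abstract = (
    "Hexokinase catalyzes the phosphorylation of glucose. "
    "Glycolysis occurs in the cytosol. "
    "Pyruvate kinase catalyses the final step of glycolysis."
)

relations = extract_relations(abstract)
hits = keyword_hits(abstract, "glycolysis")
print(relations)
print(hits)
```

The keyword search returns the sentence about the cytosol even though it names no enzyme, while the relation pattern returns only enzyme and process pairs. A hand-written pattern like this is brittle, which is one reason to move to learned models that need training data for each domain.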

  4. BlueDwarf for Text Mining
     • BioCreative Competition
       • Joint entry with Stanford
       • Recognition of drug names, chemical names, and protein names in MEDLINE abstracts
     • Java maximum entropy tagger
       • Used roughly 700,000 features in the early stages
       • Java memory size of 1950 MB
       • Died on available Informatics and Stanford machines
     • BlueDwarf
       • Arrived at 1,247,77 features, memory: 2560 MB
       • Several experiments running in parallel
     • Provisional results: top-scoring
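For readers unfamiliar with maximum entropy tagging, the following is a toy sketch of the general technique: binary features per token, a softmax over per-label weight sums, and gradient ascent on the conditional log-likelihood. It is not the Java tagger used for the BioCreative entry. The feature templates, the label set, the training sentences, and all function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict

LABELS = ["DRUG", "PROTEIN", "O"]  # hypothetical label set


def token_features(tokens, i):
    """Binary features for token i; these templates are illustrative assumptions."""
    word = tokens[i]
    return [
        "bias",
        f"word={word.lower()}",
        f"suffix3={word[-3:].lower()}",
        f"is_capitalized={word[0].isupper()}",
        f"has_digit={any(c.isdigit() for c in word)}",
        f"prev={tokens[i - 1].lower()}" if i > 0 else "prev=<s>",
    ]


def predict_proba(weights, labels, feats):
    """Softmax over per-label sums of feature weights (the maxent model form)."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train(examples, labels, epochs=100, rate=0.2):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)  # one weight per (feature, label) pair
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                gradient = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(f, y)] += rate * gradient
    return weights


def tag(weights, labels, tokens):
    """Label each token independently with its most probable tag."""
    tags = []
    for i in range(len(tokens)):
        probs = predict_proba(weights, labels, token_features(tokens, i))
        tags.append(max(probs, key=probs.get))
    return tags


# Tiny hand-labeled training set (illustrative only).
tagged_sentences = [
    (["Aspirin", "inhibits", "COX-2"], ["DRUG", "O", "PROTEIN"]),
    (["Ibuprofen", "inhibits", "COX-1"], ["DRUG", "O", "PROTEIN"]),
]
examples = [
    (token_features(tokens, i), gold[i])
    for tokens, gold in tagged_sentences
    for i in range(len(tokens))
]

weights = train(examples, LABELS)
print(tag(weights, LABELS, ["Aspirin", "inhibits", "COX-2"]))
```

In this sketch the weight table holds one entry per (feature, label) pair, so it grows with the number of distinct features times the number of labels. That gives a rough sense of why feature counts in the hundreds of thousands translate into large memory footprints.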
