
Text Mining in Combination with Enterprise Search

7th Fraunhofer Symposium on Text Mining, 5./6. October 2009. Text Mining in Combination with Enterprise Search. Thomas Herbst, CEO, B-S-S GmbH. Today's challenge: information overload. 30% of working time is spent searching for relevant information.


Presentation Transcript


  1. 7th Fraunhofer Symposium on Text Mining, 5./6. October 2009. Text Mining in Combination with Enterprise Search. Thomas Herbst, CEO, B-S-S GmbH

  2. Today's Challenge: Information Overload
     • 30% of working time is spent searching for relevant information.
     • 85% of all relevant data is unstructured.
     • The amount of unstructured information doubles approximately every 8 months.
     • Users need to get information in combined form.
     • Users lack a 360° view of all relevant content.
     [Diagram labels: Search; WWW, CMS, APPs, DMS, KM]
     B-S-S Business Software Solutions GmbH

  3. What customers ask for: a dynamic, holistic view of all information in its proper context.

  4. Today's information system architecture

  5. Classic information architecture: siloed content that can't be used in a combined context.
     [Diagram labels: Portal; DMS, Intranet, App, KM, News, DMS, WCMS, MOSS, KM, DB]

  6. Enterprise Search Today: finds most of the content, but only links to the content silos.
     [Diagram labels: enterprise search, Search]

  7. Today's KM + Search Infrastructure
     [Diagram labels: Information Worker; KM, Web Search, Search 1, Search 2; Oracle, Google, Lucene, Web]
     • Search or KM systems often address only a specific need or purpose
     • Data must be transferred and transformed between the systems
     • Time-consuming
     • Information is lost
     • A holistic view cannot be created, because every system is a new data silo that can't be combined with the others
     • Users must learn the query language of every system

  8. Enterprise Search + Text Mining based on an Information Access Layer

  9. Information Access Layer (IAL)
     [Diagram labels: Search; IAL; CMS, WWW, DMS, APPs, KM]

  10. Create Virtual Datasources
      [Diagram labels: App 1, Portal 1, App 2, Portal 2; Healthcare, Market Watch, Brand Protection, Marketing; Intranet, App, CMS, DB, Search, DMS]

  11. Pipeline = Extract + Enrich
      [Diagram labels: Company, Geography, People, Speechtagger, Conversion, Language, Lemmas, PLUG-IN, Sentiment, Ontology, Taxonomy, Entities; Search, Alert]
      Sample of enriched output:
      <pages>
      <page id="1"><abstract id="0"><sentence id="0">dpa-afx <location country="Germany" long="46225533" lat="13452345">FRANKFURT</location>. </sentence><sentence id="1">"Wir werden weiter profitabel wachsen, die Qualität verbessern und die operative Marge vergrößern", sagte Vorstandschef <person typ="male" class="economy">Wolfgang Mayrhuber</person> am Donnerstag in <location country="Germany" long="46225533" lat="13452345">Frankfurt</location>. </sentence>...
      </page>
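
The extract-and-enrich idea on this slide can be illustrated with a minimal sketch. It is not the product's implementation: the stage functions, the tiny gazetteers `LOCATIONS` and `PERSONS`, and the output layout are all assumptions made for illustration, and a real pipeline would use trained taggers rather than lookup tables.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical gazetteers (assumption): stand-ins for real entity extractors.
LOCATIONS = {"Frankfurt": {"country": "Germany"}}
PERSONS = {"Wolfgang Mayrhuber": {"class": "economy"}}


def split_sentences(doc):
    """Extract step: naive split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["sentences"] = [p for p in parts if p]
    return doc


def _wrap(text, tag, gazetteer):
    """Wrap every exact, whole-word gazetteer hit in an XML-like tag."""
    for name, attrs in gazetteer.items():
        attr_text = "".join(f' {key}="{value}"' for key, value in attrs.items())
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        text = pattern.sub(
            lambda m: f"<{tag}{attr_text}>{m.group(0)}</{tag}>", text
        )
    return text


def tag_entities(doc):
    """Enrich step: annotate locations first, then persons."""
    doc["tagged"] = [
        _wrap(_wrap(escape(s), "location", LOCATIONS), "person", PERSONS)
        for s in doc["sentences"]
    ]
    return doc


def run_pipeline(text, stages):
    """Pass a document dict through each stage in order."""
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


def to_xml(doc):
    """Serialize the tagged sentences in a layout similar to the slide's sample."""
    body = "".join(
        f'<sentence id="{i}">{s}</sentence>' for i, s in enumerate(doc["tagged"])
    )
    return f'<page id="1">{body}</page>'
```

Each stage only adds keys to the document dict, so further enrichers (sentiment, taxonomy, and so on) could be appended to the `stages` list without changing the earlier ones.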

  12. Advanced Linguistics (example query: "Cerebral infarct")
      Features: doc type classification, phrasing, spellchecking – phonetic match, topic classification, thesaurus support, lemmatization, synonymy, character normalization, refinement, ambiguous queries
      Query variants shown: Cerebral infarct / conferences; Cerebral infarkt; Serebral infarct; Cetebral ingarct; "Cerebral infarct"; Cerebral infarct / medicine; Cerebral infarct / biology; Cerebral disease; Infarction; Cerebral infarcts; Apoplexy; Apoplectic insult; Stroke; Infarctus cérébral
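
Two of the features named on this slide, character normalization and spellchecking, can be sketched with the Python standard library. This is only an illustration under stated assumptions: the `VOCABULARY` list and the `cutoff` value are invented, and the matching uses `difflib` string similarity rather than a true phonetic algorithm.

```python
import difflib
import unicodedata

# Hypothetical index vocabulary (assumption): terms known to the search index.
VOCABULARY = ["cerebral", "infarct", "infarction", "stroke", "apoplexy"]


def normalize(term):
    """Character normalization: lowercase and strip accents (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def correct(term, vocabulary=VOCABULARY, cutoff=0.7):
    """Spellchecking: return the closest vocabulary term, or the input if none is close."""
    term = normalize(term)
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term


def correct_query(query):
    """Correct each whitespace-separated term of a query independently."""
    return " ".join(correct(term) for term in query.split())
```

With this vocabulary, the three misspellings from the slide ("Cerebral infarkt", "Serebral infarct", "Cetebral ingarct") are all mapped to the same corrected query, which is the behaviour the slide describes for spellchecking.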

  13. Architecture Overview
      Portal Frontend (App 1, Portal 1, App 2, Portal 2):
      • Intuitive generation of dynamic applications and portals
      • Enablement of search-driven portals
      • Highly flexible to modify, adapt and update
      • Rank-based content delivery (popularity, expected sales, confidence)
      Information Access Layer (e.g. App, CMS, DB, Search, DMS):
      • Building a real information layer
      • Integrate all needed content
      • Convert to one common access layer
      • Combine all content into virtual datasources
      • WITHOUT INFLUENCING THE EXISTING INFRASTRUCTURE

  14. Dynamic Content Networking
      • Automatic cross-linking of content based on user context, content context or extracted entities
      • A sports article about "Tiger Woods" links to galleries and boulevard news about him
      • A boulevard article also offers upcoming events
      [Diagram labels: Portal; Boulevard, Gallery, Sport, Events]

  15. Automatic content linking
      • Paragraphs
      • Persons
      • Locations
      • Countries / Regions
      • Companies
      • Branches
      • Acronyms
      • Chemical Structures
      • Dates
      • Other custom entities
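
Cross-linking by extracted entities, as described on slides 14 and 15, could be sketched with an inverted index from entity to documents. The document ids, the `entities` field and the ranking by number of shared entities are assumptions for illustration, not details taken from the talk.

```python
from collections import defaultdict


def build_entity_index(documents):
    """Map each extracted entity to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for entity in doc["entities"]:
            index[entity].add(doc_id)
    return index


def related_documents(doc_id, documents, index):
    """Return other documents sharing entities, most shared entities first."""
    shared = defaultdict(int)
    for entity in documents[doc_id]["entities"]:
        for other_id in index[entity]:
            if other_id != doc_id:
                shared[other_id] += 1
    # Ties are broken by document id so the order is deterministic.
    return sorted(shared, key=lambda other_id: (-shared[other_id], other_id))
```

In this sketch a sports article tagged with "Tiger Woods" would be linked to every gallery or boulevard item carrying the same entity, without any editor maintaining the links by hand.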

  16. Navigators + Tag Clouds
      • Automatically generated navigators and clouds for the most common topics
      • Give the user an overview of the content and results, and help to understand and navigate through them
      • Automatic search by relevant words or pairs of words
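
A tag cloud over words and word pairs can be approximated by simple frequency counting over the result texts. The sketch below makes several assumptions: the small `STOPWORDS` set is invented, tokenization is ASCII-only, and word pairs are formed from neighbours after stopword removal.

```python
import re
from collections import Counter

# Hypothetical stopword list (assumption): a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tag_cloud(results, top_n=5):
    """Count single words and adjacent word pairs across all result texts."""
    words = Counter()
    pairs = Counter()
    for text in results:
        tokens = tokenize(text)
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    return words.most_common(top_n), pairs.most_common(top_n)
```

The counts returned for each word or pair can be mapped to font sizes in the cloud, and clicking an entry can trigger a new search for that word or pair.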

  17. Offering similar news
      • Offers similar content, based on topic-sensitive matching techniques
      • Real-time provision of related content (Find, Refine, Exclude, Custom Logic)
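
The slide does not say which matching technique is used. As a stand-in, the sketch below ranks documents by cosine similarity of raw term counts; the function names, the `exclude` parameter (loosely mirroring the slide's "Exclude" option) and the corpus layout are assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(doc_id, corpus, top_n=3, exclude=()):
    """Rank other documents by similarity to doc_id, skipping excluded ids."""
    query = vectorize(corpus[doc_id])
    scored = [
        (other_id, cosine(query, vectorize(text)))
        for other_id, text in corpus.items()
        if other_id != doc_id and other_id not in exclude
    ]
    scored = [(other_id, score) for other_id, score in scored if score > 0]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_n]
```

A production system would more likely weight terms (for example with TF-IDF) or compare extracted topics, but the ranking step would look much the same.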

  18. Document thumbnailing
      • Creates thumbnails from many document types in different sizes
      • Gives the user a quick look without opening a native application
      • Allows visual navigation at page level between text and images

  19. Content Analysis
      • On-the-fly, multi-dimensional cross-tab content analysis
      • Discover trends, knowledge or content relations in structured or unstructured content
      • E.g. sales per region, experts for products, relations between persons and locations
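
A cross-tab over two dimensions, such as the "sales per region" example on this slide, reduces to grouping records by a row key and a column key. The record fields used below (`region`, `product`, `sales`) are assumptions for illustration.

```python
from collections import defaultdict


def cross_tab(records, row_key, col_key, value_key=None):
    """Aggregate records into a {row: {column: total}} table.

    With value_key set, the values are summed; otherwise records are counted.
    """
    table = defaultdict(lambda: defaultdict(float))
    for record in records:
        amount = record[value_key] if value_key else 1
        table[record[row_key]][record[col_key]] += amount
    return {row: dict(columns) for row, columns in table.items()}
```

The same function can tabulate extracted entities, for example persons against locations, by passing records built from the enrichment output and leaving `value_key` unset so that co-occurrences are counted.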

  20. User-generated content
      • Put comments on any content
      • Comment list to show the latest comments or the content with the most comments
      • Let users rate your content
      • Use the rating to boost or deboost content in the results
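
Boosting or deboosting by user rating can be sketched as a multiplier on the relevance score. The formula, the 1 to 5 rating scale and the parameters `neutral`, `scale` and `weight` are all assumptions; the slide does not specify how ratings enter the ranking.

```python
def boosted_score(relevance, ratings, neutral=3.0, scale=2.0, weight=0.5):
    """Scale a relevance score by the average user rating.

    Assumes ratings from 1 to 5: an average above `neutral` boosts the score,
    an average below it deboosts, and unrated content is left unchanged.
    """
    if not ratings:
        return relevance
    average = sum(ratings) / len(ratings)
    factor = 1.0 + weight * (average - neutral) / scale
    return relevance * factor


def rerank(results):
    """Sort (doc_id, relevance, ratings) tuples by boosted score, best first."""
    return [
        doc_id
        for doc_id, relevance, ratings in sorted(
            results, key=lambda item: -boosted_score(item[1], item[2])
        )
    ]
```

With the default `weight`, ratings can change a score by at most 50% in either direction, so a highly rated but barely relevant document cannot overtake a clearly better match.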

  21. Summary
      • An Information Access Layer can combine different kinds of data silos
      • Integrate content once and use it in different scenarios from different perspectives
      • Full security and access control support
      • Seamless integration of different text mining products

  22. Thank you
      B-S-S Business Software Solutions GmbH, Wartburgstrasse 1, 99817 Eisenach/Germany
      Tel. +49 3691 709000
      thomas.herbst@b-s-s.de
      www.b-s-s.de
