
Googalize your Search with DirectInfo Documents

Googalize your Search with DirectInfo Documents. Author: Kiril Rusev Software Architect Semantec Bulgaria OOD. DirectInfo Documents - New Features. Semantec GmbH Benzstr. 32 D-71083 Herrenberg, Germany www.semantec.de. Agenda. Motivation What is DirectInfo Documents? What's new?


Presentation Transcript


  1. Googalize your Search with DirectInfo Documents • DirectInfo Documents - New Features • Author: Kiril Rusev, Software Architect, Semantec Bulgaria OOD • Semantec GmbH, Benzstr. 32, D-71083 Herrenberg, Germany • www.semantec.de

  2. Agenda • Motivation • What is DirectInfo Documents? • What's new? • Live Demo • Future development

  3. Motivation - The Need ? ? ?

  4. Motivation - The Challenge • Database Data • Local Files • Intranet • Email • Internet

  5. Motivation - The Answer (diagram labels) • Document Files • Database Data • Web Contents • DirectInfo • Oracle Text Index • Structured Search Results

  6. What is DirectInfo? • A framework based on Oracle Text • Can index and search various data sources • Can be extended • Can be adjusted to the customer’s needs

  7. Oracle Text - how does indexing work?

  8. DirectInfo and Oracle Text (diagram labels; columns: Oracle Text, DirectInfo) • Custom defined document grouping • Context indexes with USER_DATASTORE • Fast and flexible searching • Full control over the indexing • A lot of context information • Flexible and extensible filtering • Summarizing capabilities • Regular index management • Effective caching mechanism

  9. DirectInfo Architecture

  10. What is DirectInfo Documents? • Based on DirectInfo platform • A powerful document searching tool • A web based “google-like” application • Easily managed and deployed

  11. What's new? • Speed improvement • Robustness • Manageability • Functional improvements • Look and feel (LF) and search results presentation improved

  12. Speed improvement – Document Cache (diagram labels: User Datastore PL/SQL Procedure, Filtering PDF to HTML, Store/Retrieve HTML, Document Cache, NullFilter) • Filtering is done only once • The HTML version of the document is cached
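
The caching idea on slide 12 (filter once, then reuse the stored HTML) can be illustrated with a minimal Python sketch. This is not DirectInfo code: the `DocumentCache` class, the document-ID key, and the `fake_pdf_to_html` converter are all hypothetical names chosen for illustration, and the real product does this inside a PL/SQL user datastore procedure.

```python
from typing import Callable, Dict


class DocumentCache:
    """Hypothetical cache: stores the filtered HTML of each document by ID."""

    def __init__(self, to_html: Callable[[bytes], str]) -> None:
        self._to_html = to_html          # the (expensive) filtering step
        self._cache: Dict[str, str] = {}  # document ID -> cached HTML
        self.filter_calls = 0            # counts how often filtering really ran

    def get_html(self, doc_id: str, raw: bytes) -> str:
        # Filter only on a cache miss; later requests reuse the stored HTML.
        if doc_id not in self._cache:
            self.filter_calls += 1
            self._cache[doc_id] = self._to_html(raw)
        return self._cache[doc_id]


def fake_pdf_to_html(raw: bytes) -> str:
    # Stand-in for a real PDF-to-HTML filter (assumption, for the demo only).
    return "<html><body>" + raw.decode("utf-8", errors="replace") + "</body></html>"


cache = DocumentCache(fake_pdf_to_html)
first = cache.get_html("doc-1", b"annual report")
second = cache.get_html("doc-1", b"annual report")
```

Both calls return the same HTML, but the filter runs only for the first one. A real cache would also need an invalidation rule for changed documents, which the slide does not describe.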

  13. Speed improvement – Faster Crawling (diagram labels: DirectInfo, Crawler Interface, File Crawler, Web Crawler, Other…, Local Files, Internet, Email) • Crawlers are adjusted according to the target document sources
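
A common way to build source-specific crawlers behind one interface, as slide 13 suggests, is an abstract base class with one implementation per source. The sketch below is an assumption about the shape of such a design, not the DirectInfo API: `Crawler`, `FileCrawler`, `StaticListCrawler`, and `collect` are hypothetical names, and the second crawler merely stands in for a web or email crawler.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator, List, Sequence


class Crawler(ABC):
    """Hypothetical common interface: every crawler yields document locators."""

    @abstractmethod
    def crawl(self) -> Iterator[str]:
        ...


class FileCrawler(Crawler):
    """Walks a local directory tree and yields the path of every file."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def crawl(self) -> Iterator[str]:
        for path in sorted(self.root.rglob("*")):
            if path.is_file():
                yield str(path)


class StaticListCrawler(Crawler):
    """Stand-in for a web or email crawler: yields a pre-supplied list."""

    def __init__(self, locators: Sequence[str]) -> None:
        self.locators = list(locators)

    def crawl(self) -> Iterator[str]:
        yield from self.locators


def collect(crawlers: Sequence[Crawler]) -> List[str]:
    # The caller depends only on the interface, not on the source type.
    found: List[str] = []
    for crawler in crawlers:
        found.extend(crawler.crawl())
    return found


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("hello")
    found = collect([
        FileCrawler(Path(tmp)),
        StaticListCrawler(["http://example.invalid/page"]),
    ])
```

Adding a new source then means adding one subclass; the collecting code stays unchanged.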

  14. Robustness – Better Filtering • Before: Datastore, INSO Filter (PDF to HTML) • After: Datastore with XFilter (Filter 1, Filter 2, …, Filter N; PDF to HTML), NULL Filter
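
Slide 14 shows a chain of filters (Filter 1 to Filter N) but not how a filter is chosen. One plausible reading, assumed here, is that filters are tried in order and the first one that produces HTML wins, so a single failing filter no longer breaks indexing. The function and filter names below are hypothetical and do not describe the actual XFilter.

```python
import html
from typing import Callable, Optional, Sequence

# A filter returns HTML on success or None if it cannot handle the input.
Filter = Callable[[bytes], Optional[str]]


def pdf_filter(raw: bytes) -> Optional[str]:
    # Stand-in: only recognizes the PDF magic bytes, does no real conversion.
    return "<html>pdf text</html>" if raw.startswith(b"%PDF") else None


def text_filter(raw: bytes) -> Optional[str]:
    # Accepts anything that decodes as UTF-8 text.
    try:
        return "<html>" + html.escape(raw.decode("utf-8")) + "</html>"
    except UnicodeDecodeError:
        return None


def run_filter_chain(raw: bytes, filters: Sequence[Filter]) -> Optional[str]:
    """Try each filter in order; return the first successful result."""
    for current in filters:
        try:
            converted = current(raw)
        except Exception:
            continue  # a crashing filter must not stop the chain
        if converted is not None:
            return converted
    return None  # no filter could handle the document


chain = [pdf_filter, text_filter]
pdf_result = run_filter_chain(b"%PDF-1.4", chain)
text_result = run_filter_chain(b"hello", chain)
unknown_result = run_filter_chain(b"\xff\xfe", chain)
```

Here the PDF input is handled by the first filter, plain text falls through to the second, and undecodable bytes yield `None` instead of an error.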

  15. Manageability - Indexing in Chunks • Before: Index, a single Dtx_Ddl.Sync_Index call (Unstoppable !!!) • After: Index, repeated calls Dtx_Ddl.Sync_Index, Dtx_Ddl.Sync_Index, Dtx_Ddl.Sync_Index ………
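
The benefit of chunked indexing on slide 15 is that a long synchronization can be interrupted between chunks. The following sketch shows only that control flow under stated assumptions: `sync_in_chunks`, `index_chunk`, and `should_stop` are hypothetical names, and no Oracle call is made.

```python
from typing import Callable, List


def sync_in_chunks(
    pending: List[str],
    chunk_size: int,
    index_chunk: Callable[[List[str]], None],
    should_stop: Callable[[], bool],
) -> int:
    """Index pending documents chunk by chunk; return how many were indexed.

    The stop condition is checked before every chunk, so the work can be
    interrupted at a chunk boundary instead of running to completion.
    """
    done = 0
    while done < len(pending) and not should_stop():
        chunk = pending[done:done + chunk_size]
        index_chunk(chunk)  # in a real system: one index sync call per chunk
        done += len(chunk)
    return done


pending_docs = [f"d{i}" for i in range(10)]
indexed_chunks: List[List[str]] = []

# Demo: request a stop once two chunks have been indexed.
indexed_count = sync_in_chunks(
    pending_docs,
    chunk_size=4,
    index_chunk=indexed_chunks.append,
    should_stop=lambda: len(indexed_chunks) >= 2,
)
```

With ten pending documents and a chunk size of four, the run stops after two chunks, leaving two documents for the next run.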

  16. Functional improvements - Duplicated Files Detection • Before: Found Files, Indexed Files • After: Indexed Files, Found Files
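
The slides do not say how duplicates are detected. A common technique, assumed here purely for illustration, is to compare a hash of each file's content and index only the first file seen for each hash. `split_duplicates` is a hypothetical helper name.

```python
import hashlib
from typing import Dict, List, Tuple


def split_duplicates(found: Dict[str, bytes]) -> Tuple[List[str], Dict[str, str]]:
    """Separate found files into those to index and detected duplicates.

    Returns (paths to index, mapping of duplicate path -> original path).
    Files are compared by the SHA-256 hash of their content (assumption).
    """
    seen: Dict[str, str] = {}        # content hash -> first path with it
    to_index: List[str] = []
    duplicates: Dict[str, str] = {}
    for path in sorted(found):       # sorted for a deterministic "original"
        digest = hashlib.sha256(found[path]).hexdigest()
        if digest in seen:
            duplicates[path] = seen[digest]
        else:
            seen[digest] = path
            to_index.append(path)
    return to_index, duplicates


found_files = {
    "a/report.pdf": b"Q1 results",
    "b/copy.pdf": b"Q1 results",
    "c/notes.txt": b"meeting notes",
}
to_index, duplicates = split_duplicates(found_files)
```

Three files are found, but only two are indexed; the copy is recorded as a duplicate of the original.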

  17. Functional improvements - Summarizer

  18. Look and feel (LF) and search results presentation improved • Deferred fragments loading • Skins support, XP look and feel • Visual and functional redesign - HTML Frames • Searching made simpler

  19. Live Demo

  20. Future development • Defining and searching metadata • Search results clustering • Improved flexibility • Improved administration • Improved caching • Better summarizing

  21. Thank You!
