
Improving Intranet Search with Database-Backed Technology

Session id: 40185. Improving Intranet Search with Database-Backed Technology. Omar Alonso, Oracle Corporation. Agenda: Issues with Enterprise Search; Oracle’s products; Infrastructure: Oracle Text; Solution: Oracle Ultra Search; Looking into the details; Overview of main features; Conclusions.





Presentation Transcript


  1. Session id: 40185. Improving Intranet Search with Database-Backed Technology. Omar Alonso, Oracle Corporation

  2. Agenda • Issues with Enterprise Search • Oracle’s products • Infrastructure: Oracle Text • Solution: Oracle Ultra Search • Looking into the details • Overview of main features • Conclusions

  3. Current Problems with Intranet Search • Enterprise Intranet is very different from typical Internet websites • Users are different • Tasks are different • Amount and quality of information are different • Searching is also different

  4. Main Issues with Intranet Search • Multiple repositories • Different data sources (websites, files, email, etc.) • Performance • Sub-second query response time, not minutes • Quality • Good search results, not thousands of irrelevant items • Ease of Use • One single search engine, not an engine per data source • Bad search is very easy to do • Good search is very difficult

  5. What is a Bad Search? • No search box • Too many hits • Returning 10,000 hits when the average user looks only at the top 20 • The most relevant item is not at the top of the list • Bad scoring • Too many similar documents • Poor duplicate detection • Inability to judge user intent • No spell checking • No context disambiguation (cricket the game or cricket the bug?) • No recommendation system
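"Too many hits" and "the most relevant item is not at the top" are both ranking problems. As a minimal sketch, assuming plain TF-IDF scoring (a standard textbook scheme, not Oracle Text's own scoring algorithm), the hypothetical helper `top_k` below scores every matching document and returns only the best `k`, highest first, instead of an unordered list of thousands.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def top_k(query: str, docs: dict[str, str], k: int = 20) -> list[tuple[str, float]]:
    """Score documents with a simple TF-IDF sum and return only the k best, highest first."""
    n = len(docs)
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()  # number of documents containing each term
    for tf in tfs.values():
        df.update(tf.keys())
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, tf in tfs.items():
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log((n + 1) / (df[term] + 1)) + 1.0  # smoothed IDF, always positive
                score += (1 + math.log(tf[term])) * idf         # sublinear term frequency
        if score > 0:
            scored.append((doc_id, score))
    # heapq.nlargest avoids sorting every hit when only the top k are shown.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


docs = {
    "a": "cricket bat and ball",
    "b": "cricket insect cricket chirp",
    "c": "database index",
}
print([doc_id for doc_id, _ in top_k("cricket", docs)])  # ['b', 'a']
```

Note that this scoring alone cannot tell "cricket the game" from "cricket the bug"; disambiguation needs context beyond term counts.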

  6. What is a Bad Search (Cont.) • Inability to understand why a document has been returned • No KWIC • Lack of categorization • Similar documents in the same list • Documents change behind your back • No cache • Missing meta information • Size, format, date, feedback, etc.
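KWIC (Key Word In Context) addresses "inability to understand why a document has been returned" by showing each match with a few surrounding words. The sketch below is a minimal, hypothetical illustration that works on whitespace-separated words; real snippet generators also handle stemming, phrases, and markup.

```python
import re


def normalize(word: str) -> str:
    """Strip punctuation and case so 'Search,' matches 'search'."""
    return re.sub(r"\W", "", word).lower()


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return every occurrence of keyword with `width` words of context on each side.

    The matched word is wrapped in brackets to mimic highlighting.
    """
    words = text.split()
    target = normalize(keyword)
    snippets = []
    for i, word in enumerate(words):
        if normalize(word) == target:
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            snippets.append(" ".join(left + [f"[{word}]"] + right))
    return snippets


for line in kwic("Oracle Text supports keyword search and Ultra Search builds on Oracle Text.", "search", 2):
    print(line)
# supports keyword [search] and Ultra
# and Ultra [Search] builds on
```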

  7. Some Examples - I Where is the search box?

  8. Some Examples – II “ultra seek” or “ultraseek”?

  9. Some Examples - III Looking for “k-means” in lotus.com

  10. The Oracle Products • Oracle Text • Complete API for building any type of search application • Features range from basic keyword searching to advanced techniques like classification and information visualization • Oracle Ultra Search • Out-of-the-box solution that requires no coding • Can search across OCS components, websites, databases, files, email, and Portal • Built on top of Oracle Text

  11. The Oracle Solution (Cont.) Looking into the details • Quality • Performance • Ease of Use • Personalization • Advanced features • Classification and visualization

  12. Quality • Link awareness • Popular pages and hubs • Website structure • Page structure • Duplicate elimination • Remove URLs with duplicate or near duplicate content • Spelling correction • Component that uses a dictionary and data from query logs • Did you mean …? • KWIC (Key Word In Context) • Highlights relevant parts of the document • No need to open the URL if it doesn’t look relevant
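Duplicate elimination can be sketched with word shingles and Jaccard similarity, a common textbook technique for near-duplicate detection; the slide does not say which method Ultra Search uses, so treat this as an assumption. Each page becomes the set of its overlapping k-word sequences, and a URL is dropped if its set overlaps an already kept page above a threshold. The helper names and the 0.8 threshold are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_near_duplicates(pages: dict[str, str], threshold: float = 0.8, k: int = 3) -> list[str]:
    """Keep a URL only if its shingle set differs enough from every URL kept so far."""
    kept: list[tuple[str, set]] = []
    for url, text in pages.items():
        signature = shingles(text, k)
        if all(jaccard(signature, other) < threshold for _, other in kept):
            kept.append((url, signature))
    return [url for url, _ in kept]


pages = {
    "u1": "Oracle Ultra Search crawls web sites files and email",
    "u2": "Oracle Ultra Search crawls web sites, files and email.",
    "u3": "Quarterly sales report for the western region",
    "u4": "Oracle Ultra Search crawls web sites files and email today",
}
print(remove_near_duplicates(pages))  # ['u1', 'u3']
```

Comparing every page with every kept page is quadratic; large collections usually approximate the Jaccard score with MinHash signatures instead.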

  13. Performance • Oracle Text integrates with and benefits from features like • Data partitioning • RAC • Query optimization • Common and rare queries • Small index on URL and title for common queries • Large index on document content for rare queries • Query Relaxation • Executes the most restrictive query first • Then progressively relaxes the search
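Query relaxation can be sketched as a sequence of progressively looser matches: exact phrase first, then all terms, then any term, stopping at the first step that returns enough documents. Oracle Text provides its own mechanism for this; the in-memory sketch below is a generic illustration, not that API, and the step names and `min_hits` parameter are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(terms: list[str], doc_terms: list[str]) -> bool:
    n = len(terms)
    return any(doc_terms[i:i + n] == terms for i in range(len(doc_terms) - n + 1))


def relaxed_search(query: str, docs: dict[str, str], min_hits: int = 1) -> tuple[str, list[str]]:
    """Run progressively looser queries until one returns at least min_hits documents.

    Returns the name of the step that was used and the matching document ids.
    """
    terms = tokenize(query)
    if not terms:
        return "empty query", []
    doc_terms = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    steps = [
        ("phrase", lambda dt: contains_phrase(terms, dt)),      # most restrictive
        ("all terms", lambda dt: all(t in dt for t in terms)),
        ("any term", lambda dt: any(t in dt for t in terms)),   # least restrictive
    ]
    for name, matches in steps:
        hits = sorted(doc_id for doc_id, dt in doc_terms.items() if matches(dt))
        if len(hits) >= min_hits:
            break
    return name, hits


docs = {
    "d1": "oracle text query relaxation",
    "d2": "relaxation of the text query in oracle",
    "d3": "ultra search",
}
print(relaxed_search("query text", docs))  # ('all terms', ['d1', 'd2'])
```

Because the most restrictive step runs first, precise queries get precise answers, and looser matching is used only when the stricter steps come back nearly empty.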

  14. Ease of Use • Users want a simple, easy-to-use search interface • Hide all the complexity and expose a simple interface • Ultra Search • Two search modes • Basic: a simple search box where results are sorted by relevance • Advanced: an interface with more options, where the user has more control over the collection

  15. Ease of Use (Cont.)

  16. Personalization • Know user search patterns • What do they search? • When do they search? • Search query log analysis • Which queries were made? • Which queries were successful? • How many times was each query made?
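The query-log questions on this slide (which queries were made, how many times, and which succeeded) can be answered with simple aggregation. A minimal sketch, assuming a hypothetical `LogEntry` record and defining "success" as a query that returned hits and led to a click; real logs and success criteria will differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    user: str
    query: str
    hits: int       # number of results returned
    clicked: bool   # whether the user opened any result


def analyze(log: list[LogEntry], top: int = 3) -> dict:
    """Summarize which queries were made, how often, and how often they succeeded.

    "Success" here is an assumption: the query returned hits and the user clicked one.
    """
    normalized = [(e.query.strip().lower(), e) for e in log]
    counts = Counter(q for q, _ in normalized)
    successes = Counter(q for q, e in normalized if e.hits > 0 and e.clicked)
    return {
        "most_frequent": counts.most_common(top),
        "success_rate": {q: successes[q] / counts[q] for q in counts},
        "never_successful": sorted(q for q in counts if successes[q] == 0),
    }


log = [
    LogEntry("a", "expense report", 12, True),
    LogEntry("b", "Expense Report", 12, False),
    LogEntry("a", "vpn setup", 0, False),
    LogEntry("c", "expense report", 12, True),
]
report = analyze(log)
print(report["most_frequent"])     # [('expense report', 3), ('vpn setup', 1)]
print(report["never_successful"])  # ['vpn setup']
```

Queries that are frequent but never successful are good candidates for new content, synonyms, or spelling suggestions.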

  17. Advanced Features • Classification • Supervised classification of content • Two ways: rules or training sets • You can group a number of categories into a taxonomy • Very useful for defining a common vocabulary in an enterprise • Clustering • Unsupervised classification of patterns into groups • The engine analyzes the document collection and outputs a set of clusters, each containing a group of documents • Very useful for discovering patterns or nuggets in collections • Could be used as a starting point when no taxonomy is present
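Clustering groups documents without any predefined categories. The sketch below is a k-means-style loop using cosine similarity on raw term counts, one common way to cluster text; it is not necessarily the algorithm Oracle Text uses. Seeding with the first `k` documents is a simplifying assumption chosen to keep the output deterministic, and `kmeans` is a hypothetical helper name.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: dict) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(docs: dict[str, str], k: int, iterations: int = 10) -> list[list[str]]:
    """Partition document ids into k clusters of similar term vectors."""
    ids = list(docs)
    if not 0 < k <= len(ids):
        raise ValueError("k must be between 1 and the number of documents")
    vecs = {d: vectorize(docs[d]) for d in ids}
    # Deterministic seeding: the first k documents become the initial centroids.
    centroids = [dict(vecs[d]) for d in ids[:k]]
    assignment: dict[str, int] = {}
    for _ in range(max(1, iterations)):
        # Assignment step: each document joins its most similar centroid.
        new_assignment = {d: max(range(k), key=lambda c: cosine(vecs[d], centroids[c])) for d in ids}
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members' term vectors.
        for c in range(k):
            members = [vecs[d] for d in ids if assignment[d] == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = {t: v / len(members) for t, v in total.items()}
    return [[d for d in ids if assignment[d] == c] for c in range(k)]


docs = {
    "d1": "database index query optimizer",
    "d2": "cricket bat ball wicket",
    "d3": "query optimizer database partitioning",
    "d4": "cricket wicket bowler ball",
}
print(kmeans(docs, 2))  # [['d1', 'd3'], ['d2', 'd4']]
```

The resulting clusters carry no names; a person still has to label them, which is why clustering fits best as a starting point for building a taxonomy.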

  18. Advanced Features (Cont.) • Information Visualization • Very useful for • Navigating large data sets • Discovering relationships and associations between items • Focus + context tasks • A number of visualizations available • StretchViewer • Interactive Viewer (ThemeMap, Cluster visualization) • Integration with 3rd-party vendors

  19. Conclusions • Search is hitting a plateau • Bad search is easy to implement, good search is difficult • Correcting deficiencies • Quality, performance, and other features help • Moving to the next level • Classification and clustering • Text mining • Information Visualization • Content-structure awareness • Oracle Database 10g provides a complete solution for enterprise search • Oracle Text: a complete API that gives you total control • Ultra Search: an out-of-the-box solution that requires no coding

  20. Links • Oracle Text page http://otn.oracle.com/products/text • Ultra Search page http://otn.oracle.com/products/ultrasearch • Java library for Text visualization http://otn.oracle.com/software/products/workspace_mgr/text_visualizer.html

  21. Q & A: Questions and Answers
