Improving Intranet Search with Database-Backed Technology - PowerPoint PPT Presentation

Faraday
slide1 l.
Skip this Video
Loading SlideShow in 5 Seconds..
Improving Intranet Search with Database-Backed Technology PowerPoint Presentation
Download Presentation
Improving Intranet Search with Database-Backed Technology

play fullscreen
1 / 23
Download Presentation
Improving Intranet Search with Database-Backed Technology
329 Views
Download Presentation

Improving Intranet Search with Database-Backed Technology

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Session id: 40185 Improving Intranet Search with Database-Backed Technology Omar Alonso Oracle Corporation

  2. Agenda • Issues with Enterprise Search • Oracle’s products • Infrastructure: Oracle Text • Solution: Oracle Ultra Search • Looking into the details • Overview of main features • Conclusions

  3. Current Problems with Intranet Search • Enterprise Intranet is very different from typical Internet websites • Users are different • Tasks are different • Amount and quality of information are different • Searching is also different

  4. Main Issues with Intranet Search • Multiple repositories • Different data sources (websites, files, email, etc.) • Performance • Sub-second query respond time no minutes • Quality • Good search results not thousand of irrelevant stuff • Ease of Use • One single search engine not an engine per data source • Bad search is very easy to do • Good search is very difficult

  5. What is a Bad Search? • No search box • Too many hits • Return 10,000 hits when the average user looks at the top-20 only • The most relevant item is not at the top of the list • Bad scoring • Too many similar documents • Poor duplicate detection • Inability to judge user intent • No spell checking • No context disambiguation (cricket the game or cricket the bug?) • No recommendation system

  6. What is a Bad Search (Cont.) • Inability to understand why a document has been returned • No KWIC • Lack of categorization • Similar documents in the same list • Documents change behind your back • No cache • Meta information • Size, format, date, feedback, etc.

  7. Some Examples - I Where is the search box?

  8. Some Examples – II “ultra seek” or “ultraseek”?

  9. Some Examples - III Looking for “k-means” in lotus.com

  10. The Oracle Products • Oracle Text • Complete API for building any type of search application • Features range from basic keyword searching to advanced techniques like classification and information visualization • Oracle Ultra Search • Out-of-the-box solution that requires no coding • Can search across OCS components, websites, databases, files, email, and Portal • Built on top of Oracle Text

  11. The Oracle Solution (Cont.) Looking into the details • Quality • Performance • Ease of Use • Personalization • Advanced features • Classification and visualization

  12. Quality • Link awareness • Popular pages and hubs • Website structure • Page structure • Duplicate elimination • Remove URLs with duplicate or near duplicate content • Spelling correction • Component that uses a dictionary and data from query logs • Did you mean …? • KWIC (Key Word In Context) • Highlights relevant parts of the document • No need to open the URL if it doesn’t look relevant

  13. Performance • Oracle Text integrates with and benefits from features like • Data partitioning • RAC • Query optimization • Common and rare queries • Small index on URL and title for common queries • Large index on document content for rare queries • Query Relaxation • Enables you to execute most restrictive query first • Then relaxing the search

  14. Ease of Use • Users want a simple and easy to use search interface • Hide all the complexity and expose simple interface • Ultra Search • Two search modes • Basic: simple search box where search results are sorted by relevance • Advanced: interface with more options where user has more control over the collection

  15. Ease of Use (Cont.)

  16. Personalization • Know user search patterns • What do they search? • When do they search? • Search query log analysis • Which queries were made? • Which queries were successful? • How many times was each query made?

  17. Advances Features • Classification • Supervised classification of content • Two ways: rules or training sets • You can group a number of categories into a taxonomy • Very useful for defining a common vocabulary in an enterprise • Clustering • Unsupervised classification of patterns into groups • The engine analyzes the document collection and outputs a set of clusters with documents on it • Very useful for discovering patterns or nuggets in collections • Could be used as a starting point when there is no taxonomy present

  18. Advanced Features (Cont.) • Information Visualization • Very useful for • Navigation through large data sets • Discover relationships and associations between items • Focus + context tasks • Number of visualizations available • StretchViewer • Interactive Viewer (ThemeMap, Cluster visualization) • Integration with 3rd party vendors

  19. Conclusions • Search is hitting a plateau • Bad search is easy to implement, good search is difficult • Correcting deficiencies • Quality, performance, and other features help • Moving to the next level • Classification and clustering • Text mining • Information Visualization • Content structure aware • Oracle Database 10g provides complete solution for enterprise search • Oracle Text: complete API where you have total control • Ultra Search: out-of-the-box solution that requires no coding

  20. Links • Oracle Text page http://otn.oracle.com/products/text • Ultra Search page http://otn.oracle.com/products/ultrasearch • Java library for Text visualization http://otn.oracle.com/software/products/workspace_mgr/text_visualizer.html

  21. Q & Q U E S T I O N S A N S W E R S A