slide1 l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Improving Intranet Search with Database-Backed Technology PowerPoint Presentation
Download Presentation
Improving Intranet Search with Database-Backed Technology

Loading in 2 Seconds...

play fullscreen
1 / 23

Improving Intranet Search with Database-Backed Technology - PowerPoint PPT Presentation


  • 320 Views
  • Uploaded on

Session id: 40185 Improving Intranet Search with Database-Backed Technology Omar Alonso Oracle Corporation Agenda Issues with Enterprise Search Oracle’s products Infrastructure: Oracle Text Solution: Oracle Ultra Search Looking into the details Overview of main features Conclusions

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Improving Intranet Search with Database-Backed Technology' - Faraday


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide2

Session id: 40185

Improving Intranet Search with Database-Backed Technology

Omar Alonso

Oracle Corporation

agenda
Agenda
  • Issues with Enterprise Search
  • Oracle’s products
    • Infrastructure: Oracle Text
    • Solution: Oracle Ultra Search
  • Looking into the details
  • Overview of main features
  • Conclusions
current problems with intranet search
Current Problems with Intranet Search
  • Enterprise Intranet is very different from typical Internet websites
    • Users are different
    • Tasks are different
    • Amount and quality of information are different
  • Searching is also different
main issues with intranet search
Main Issues with Intranet Search
  • Multiple repositories
    • Different data sources (websites, files, email, etc.)
  • Performance
    • Sub-second query respond time no minutes
  • Quality
    • Good search results not thousand of irrelevant stuff
  • Ease of Use
    • One single search engine not an engine per data source
  • Bad search is very easy to do
  • Good search is very difficult
what is a bad search
What is a Bad Search?
  • No search box
  • Too many hits
    • Return 10,000 hits when the average user looks at the top-20 only
  • The most relevant item is not at the top of the list
    • Bad scoring
  • Too many similar documents
    • Poor duplicate detection
  • Inability to judge user intent
    • No spell checking
    • No context disambiguation (cricket the game or cricket the bug?)
    • No recommendation system
what is a bad search cont
What is a Bad Search (Cont.)
  • Inability to understand why a document has been returned
    • No KWIC
  • Lack of categorization
    • Similar documents in the same list
  • Documents change behind your back
    • No cache
  • Meta information
    • Size, format, date, feedback, etc.
some examples i
Some Examples - I

Where is the search box?

some examples ii
Some Examples – II

“ultra seek” or “ultraseek”?

some examples iii
Some Examples - III

Looking for “k-means” in lotus.com

the oracle products
The Oracle Products
  • Oracle Text
    • Complete API for building any type of search application
    • Features range from basic keyword searching to advanced techniques like classification and information visualization
  • Oracle Ultra Search
    • Out-of-the-box solution that requires no coding
    • Can search across OCS components, websites, databases, files, email, and Portal
    • Built on top of Oracle Text
the oracle solution cont
The Oracle Solution (Cont.)

Looking into the details

  • Quality
  • Performance
  • Ease of Use
  • Personalization
  • Advanced features
    • Classification and visualization
quality
Quality
  • Link awareness
    • Popular pages and hubs
    • Website structure
    • Page structure
  • Duplicate elimination
    • Remove URLs with duplicate or near duplicate content
  • Spelling correction
    • Component that uses a dictionary and data from query logs
    • Did you mean …?
  • KWIC (Key Word In Context)
    • Highlights relevant parts of the document
    • No need to open the URL if it doesn’t look relevant
performance
Performance
  • Oracle Text integrates with and benefits from features like
    • Data partitioning
    • RAC
    • Query optimization
  • Common and rare queries
    • Small index on URL and title for common queries
    • Large index on document content for rare queries
  • Query Relaxation
    • Enables you to execute most restrictive query first
    • Then relaxing the search
ease of use
Ease of Use
  • Users want a simple and easy to use search interface
  • Hide all the complexity and expose simple interface
  • Ultra Search
  • Two search modes
    • Basic: simple search box where search results are sorted by relevance
    • Advanced: interface with more options where user has more control over the collection
personalization
Personalization
  • Know user search patterns
    • What do they search?
    • When do they search?
  • Search query log analysis
    • Which queries were made?
    • Which queries were successful?
    • How many times was each query made?
advances features
Advances Features
  • Classification
    • Supervised classification of content
    • Two ways: rules or training sets
    • You can group a number of categories into a taxonomy
    • Very useful for defining a common vocabulary in an enterprise
  • Clustering
    • Unsupervised classification of patterns into groups
    • The engine analyzes the document collection and outputs a set of clusters with documents on it
    • Very useful for discovering patterns or nuggets in collections
    • Could be used as a starting point when there is no taxonomy present
advanced features cont
Advanced Features (Cont.)
  • Information Visualization
  • Very useful for
    • Navigation through large data sets
    • Discover relationships and associations between items
    • Focus + context tasks
  • Number of visualizations available
    • StretchViewer
    • Interactive Viewer (ThemeMap, Cluster visualization)
    • Integration with 3rd party vendors
conclusions
Conclusions
  • Search is hitting a plateau
    • Bad search is easy to implement, good search is difficult
  • Correcting deficiencies
    • Quality, performance, and other features help
  • Moving to the next level
    • Classification and clustering
    • Text mining
    • Information Visualization
    • Content structure aware
  • Oracle Database 10g provides complete solution for enterprise search
    • Oracle Text: complete API where you have total control
    • Ultra Search: out-of-the-box solution that requires no coding
links
Links
  • Oracle Text page

http://otn.oracle.com/products/text

  • Ultra Search page

http://otn.oracle.com/products/ultrasearch

  • Java library for Text visualization

http://otn.oracle.com/software/products/workspace_mgr/text_visualizer.html

slide22

Q

&

Q U E S T I O N S

A N S W E R S

A