Web clustering Engines


Web clustering engines are an emerging trend in the field of data retrieval. They organize search results by topic, providing a complementary view to the flat ranked list returned by standard search engines.

PowerPoint Slideshow about 'Web clustering Engines' - factscomputersoftware


Presentation Transcript
Search Engine?
  • Search engines are an invaluable tool for retrieving information from the Web. In response to a user query, they return a list of results ranked in order of relevance to the query.
  • E.g., Google, Yahoo, etc.
Flat Ranked vs. Clustered
  • Google (Flat Ranked Search Engine)
Why Web Clustering Engines?
  • Conventional engines are not very effective on ‘ambiguous’ queries.
  • The results a conventional search engine returns for such a query are mixed together in a single list, in which irrelevant items occur.
These systems group the results returned by a search engine into a hierarchy of labeled clusters (also called categories).

Web clustering engines:

1. Northern Light - predefined set of clusters

2. Credo Reference

3. Kartoo

4. Eyeplorer

Main advantages of the cluster hierarchy
  • It provides shortcuts to the items that relate to the same meaning.
  • It allows better topic understanding.
Issues in Implementation of Clusters
  • Short input data description.
  • Meaningful labels.
  • Selection of similarity measure.
  • Grouping of objects into clusters.
  • Computational efficiency.
  • Unknown number of clusters.
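
For the similarity-measure issue listed above, one common choice is cosine similarity between term-weight vectors. The sketch below is only an illustration: the function name `cosine_similarity` and the sample vectors are assumptions, and the slides do not say which measure any particular engine uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an all-zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors pointing in the same direction score close to 1,
# vectors with no shared terms score 0.
same_direction = cosine_similarity([1, 2, 0], [2, 4, 0])
no_overlap = cosine_similarity([1, 0], [0, 1])
print(same_direction, no_overlap)
```

Because the score depends only on direction, a long and a short snippet about the same topic can still come out as similar.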
1. Search Results Acquisition
  • Provides input for the rest of the system.
  • Based on the query, the acquisition component must deliver 50 to 500 results, each of which should contain a title, a contextual snippet, and the URL.
  • The source of search results can be any public search engine, such as Google, Yahoo, etc.
  • Fetching results from other search engines.
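
The requirements above can be sketched as a small data structure plus a filter. This is a minimal illustration under stated assumptions: `SearchResult`, `acquire`, and `has_enough` are hypothetical names, the input is a list of plain dictionaries, and no real search engine API is called.

```python
from dataclasses import dataclass

MIN_RESULTS = 50   # lower bound named on the slide
MAX_RESULTS = 500  # upper bound named on the slide


@dataclass(frozen=True)
class SearchResult:
    title: str
    snippet: str
    url: str


def acquire(raw_results, limit=MAX_RESULTS):
    """Keep only results that have a title, a snippet, and a URL, up to `limit`."""
    results = []
    for raw in raw_results:
        title = raw.get("title", "").strip()
        snippet = raw.get("snippet", "").strip()
        url = raw.get("url", "").strip()
        if title and snippet and url:
            results.append(SearchResult(title, snippet, url))
        if len(results) == limit:
            break
    return results


def has_enough(results):
    """True when the result count falls in the 50 to 500 range."""
    return MIN_RESULTS <= len(results) <= MAX_RESULTS


# Hypothetical raw results standing in for a search engine response.
raw = [
    {"title": f"Result {i}", "snippet": "some context", "url": f"https://example.com/{i}"}
    for i in range(60)
]
raw.append({"title": "incomplete result"})  # dropped: no snippet or URL

results = acquire(raw)
print(len(results), has_enough(results))
```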
2. Preprocessing of Search Results
  • The primary aim is to convert the search results into ‘features’.

Steps:

i. Language identification

ii. Tokenization

iii. Stemming

iv. Feature selection

ii. Tokenization:

The text of each search result is split into a sequence of basic independent units called tokens, each of which represents a word, number, or symbol.
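
As a rough illustration of this step, the sketch below splits text into word, number, and symbol tokens with a regular expression. The pattern and the name `tokenize` are assumptions made for this example; real engines may use language-specific tokenizers.

```python
import re

# A run of word characters (letters, digits, underscore) is one token;
# any other non-space character is a one-character symbol token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into a list of word, number, and symbol tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Web clustering engines, 2 views!"))
# ['Web', 'clustering', 'engines', ',', '2', 'views', '!']
```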

iii. Stemming:

Remove the inflectional prefixes and suffixes of each word to reduce the different grammatical forms of the word to a common base form called a ‘stem’.

E.g.:

connected, connecting & interconnection

↓ ↓ ↓

‘connect’
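
The example above can be reproduced with a deliberately tiny affix stripper. This is a toy sketch, not the Porter algorithm or any production stemmer: the affix lists and the name `toy_stem` are assumptions chosen only to cover the three words on the slide, and the function will mis-stem many ordinary words.

```python
PREFIXES = ("inter",)                # assumption: just enough for the slide's example
SUFFIXES = ("ing", "ion", "ed", "s")


def toy_stem(word, min_stem=3):
    """Strip at most one known prefix and one known suffix from a word."""
    word = word.lower()
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[:-len(suffix)]
            break
    return word


for w in ("connected", "connecting", "interconnection"):
    print(w, "->", toy_stem(w))
# connected -> connect
# connecting -> connect
# interconnection -> connect
```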

iv. Feature selection:

  • Extract features for each search result present in the input.
  • Features are atomic entities by which we can describe an object and represent its most important characteristics to an algorithm.
  • Features vary from single words to tuples of words.
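
One simple way to obtain features that range from single words to tuples of words is to collect word n-grams. The sketch below assumes the text is already tokenized; `extract_features` and the default `max_n=2` are illustrative choices, not something the slides specify.

```python
def extract_features(tokens, max_n=2):
    """Return all word n-grams of length 1..max_n as tuples, shortest first."""
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(tuple(tokens[i:i + n]))
    return features


print(extract_features(["web", "clustering", "engine"]))
# [('web',), ('clustering',), ('engine',), ('web', 'clustering'), ('clustering', 'engine')]
```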
How can we represent a feature/text?
  • Vector Space Model (VSM)
  • Document d is represented in the VSM as a vector [w_t0, w_t1, ..., w_tn]

where t0, t1, ..., tn is a set of words/features

and w_ti is the weight/importance of feature ti

E.g.:

d → “Polly had a dog and the dog had Polly”

vsm representation
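
To make the vector representation concrete, the sketch below builds a vector for the example sentence. It assumes the weight of each feature is its raw term frequency and that the vocabulary is the sorted set of lowercased words; the slides do not state which weighting scheme is intended, and `to_vector` is a hypothetical helper.

```python
from collections import Counter


def to_vector(text, vocabulary):
    """Return one weight per vocabulary term (here: raw term frequency)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


doc = "Polly had a dog and the dog had Polly"
vocabulary = sorted(set(doc.lower().split()))

print(vocabulary)                  # ['a', 'and', 'dog', 'had', 'polly', 'the']
print(to_vector(doc, vocabulary))  # [1, 1, 2, 2, 2, 1]
```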

3. Cluster Construction & Labelling
  • The set of search results, along with their features, is the input to the clustering algorithm, which builds the clusters and labels them.

Three types of algorithms:

1. Data-centric

2. Description-aware

3. Description-centric

Data-Centric Clustering Algorithm
  • It starts with an initial clustering of a collection of documents into a set of k clusters (scatter).
  • At query time, the user selects the clusters of interest (gather), and the system re-clusters those documents.
  • The process repeats until a small cluster with relevant documents is found.
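
The scatter and gather steps described above can be sketched as follows. This is a simplified illustration, not the algorithm of any particular engine: the scatter step uses a basic k-means loop over bag-of-words vectors with cosine similarity, seeded naively with the first k documents, and the names `scatter` and `gather` as well as the sample documents are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest(vector, centroids):
    """Index of the centroid most similar to the vector (first wins on ties)."""
    return max(range(len(centroids)), key=lambda c: cosine(vector, centroids[c]))


def scatter(docs, k, rounds=5):
    """Partition docs into at most k clusters with a basic k-means loop."""
    vectors = [Counter(d.lower().split()) for d in docs]
    centroids = [Counter(v) for v in vectors[:k]]  # naive seeding: depends on doc order
    assignment = [0] * len(docs)
    for _ in range(rounds):
        assignment = [nearest(v, centroids) for v in vectors]
        for c in range(len(centroids)):
            total = Counter()
            for v, a in zip(vectors, assignment):
                if a == c:
                    total.update(v)  # summed counts; cosine ignores the scale
            if total:
                centroids[c] = total
    clusters = [[] for _ in centroids]
    for doc, a in zip(docs, assignment):
        clusters[a].append(doc)
    return [c for c in clusters if c]


def gather(clusters, chosen):
    """Merge the documents of the clusters the user selected."""
    return [doc for i in chosen for doc in clusters[i]]


docs = ["jaguar car speed", "jaguar cat jungle", "jaguar car engine", "jaguar cat prey"]
clusters = scatter(docs, k=2)   # scatter: initial clustering
selected = gather(clusters, [0])  # gather: the user picks the first cluster
refined = scatter(selected, k=2)  # re-cluster only the selected documents
print(clusters, selected, refined)
```

In an interactive system the gather and re-scatter calls would repeat, driven by the user's selections, until the remaining cluster is small enough to read.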
Difficulties in Data-Centric Algorithms
  • None of these algorithms is incremental in nature; that is, they cannot process each document as it arrives from the web by “cleaning” it and adding it to the existing model.
  • Meaningful labels are missing.
4. Visualization of Clustered Results
  • One prominent approach is based on hierarchical folders.
  • Clusty, CREDO, Lingo3G - hierarchical folder visualization approach
  • Grokker - nesting and zooming approach
  • KartOO - graph-based interface
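
The hierarchical folder approach can be illustrated with a plain-text rendering of a cluster tree. This is only a sketch: the nested-dictionary layout, the names `render_folders` and `count_results`, and the sample labels and URLs are all assumptions, and the engines listed above present folders in their own graphical interfaces.

```python
def count_results(cluster):
    """Number of results in a cluster, including all of its sub-clusters."""
    own = len(cluster.get("results", []))
    return own + sum(count_results(c) for c in cluster.get("children", []))


def render_folders(cluster, depth=0):
    """Return indented 'label (count)' lines for a cluster and its sub-clusters."""
    lines = ["  " * depth + f"{cluster['label']} ({count_results(cluster)})"]
    for child in cluster.get("children", []):
        lines.extend(render_folders(child, depth + 1))
    return lines


# Hypothetical cluster hierarchy for an ambiguous query.
tree = {
    "label": "jaguar",
    "children": [
        {"label": "Cars", "results": ["https://example.com/1", "https://example.com/2"]},
        {"label": "Animals", "results": ["https://example.com/3"]},
    ],
}

print("\n".join(render_folders(tree)))
# jaguar (3)
#   Cars (2)
#   Animals (1)
```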