Web clustering engines are an emerging trend in the field of data retrieval. They organize search results by topic, thus providing a complementary view to the flat ranked list returned by standard search engines.
Web clustering engines:
1. Northern Light - predefined set of clusters
2. Credo Reference
ii. Tokenization:
The text of each search result is split into a sequence of basic independent units called tokens, each representing a word, number or symbol.
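As a rough illustration of the tokenization step, here is a minimal sketch that splits a snippet into word, number and symbol tokens. The regular expression and the name `tokenize` are assumptions made for this example, not the method of any particular clustering engine.

```python
import re

# Assumed token classes: runs of ASCII letters, runs of digits,
# or any single non-space character that is neither (a "symbol").
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")

def tokenize(text):
    """Split text into word, number and symbol tokens (illustrative only)."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Polly had 2 dogs & a cat."))
# ['Polly', 'had', '2', 'dogs', '&', 'a', 'cat', '.']
```

Real engines typically also handle Unicode letters, URLs and punctuation inside words, which this sketch deliberately ignores.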
iii. Stemming:
Remove the inflectional prefixes and suffixes of each word to reduce its different grammatical forms to a common base form called a ‘stem’.
connected, connecting & interconnection
↓ ↓ ↓
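For the three example words, affix removal can be sketched with a toy stemmer. The prefix and suffix lists below are assumptions chosen only to cover this example; this is not the Porter algorithm or any other production stemmer, and the name `toy_stem` is hypothetical.

```python
# Hypothetical, deliberately tiny affix lists for illustration.
PREFIXES = ("inter",)
SUFFIXES = ("ing", "ion", "ed", "s")

def toy_stem(word, min_len=3):
    """Strip at most one known prefix and one known suffix from a word."""
    w = word.lower()
    for prefix in PREFIXES:
        # Only strip if at least min_len characters would remain.
        if w.startswith(prefix) and len(w) - len(prefix) >= min_len:
            w = w[len(prefix):]
            break
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_len:
            w = w[:-len(suffix)]
            break
    return w

for word in ("connected", "connecting", "interconnection"):
    print(word, "->", toy_stem(word))
```

With these lists, all three words reduce to the same string, which is what lets a clustering engine treat them as one feature.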
iv. Selection features:
where t0, t1, ..., tn is a set of words/features
and w_ti is the weight/importance of feature ti
d → “Polly had a dog and the dog had Polly”
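To make the feature weights concrete for the example document d, the sketch below uses raw term frequency as the weight of each feature. The slides do not say which weighting scheme is meant, so term frequency is an assumption here (tf-idf is another common choice), and `term_frequency_vector` is a hypothetical helper name.

```python
from collections import Counter

def term_frequency_vector(text):
    """Map each lowercase word (feature) to its count in the text.

    Assumption: the weight of a feature is its raw term frequency.
    """
    tokens = text.lower().split()
    return Counter(tokens)

d = term_frequency_vector("Polly had a dog and the dog had Polly")
for feature, weight in sorted(d.items()):
    print(feature, weight)
```

Under this assumption, "polly", "had" and "dog" each get weight 2, while "a", "and" and "the" get weight 1.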
for building the clusters and labeling.
Three types of algorithms:
1. Data-centric algorithms
THANK YOU