Automatic Classification of Text Databases Through Query Probing
Panagiotis G. Ipeirotis, Luis Gravano (Columbia University)
Mehran Sahami (E.piphany Inc.)
Search-only Text Databases
Sources of valuable information
Hidden behind search interfaces
Non-crawlable
Example: Microsoft Support KB
Examples: MetaCrawler, SavvySearch, Profusion
Example from InvisibleWeb.com
Computers > Publications > ACM DL
During the training phase:
How can we extract this information?
The probes should extract information about the categories of the documents in the database
IF lung AND cancer THEN health => +lung +cancer
IF linux THEN computers => +linux
jordan AND bulls => sports
lung AND cancer => health
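The step from classification rules to query probes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Rule` type, the `RULES` list, and the `to_probe` helper are hypothetical names, and the `+term` syntax for required words is taken from the examples above.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    """A classification rule: IF all terms are present THEN category (hypothetical type)."""
    terms: Tuple[str, ...]
    category: str


# The example rules from the slides.
RULES: List[Rule] = [
    Rule(("lung", "cancer"), "health"),
    Rule(("linux",), "computers"),
    Rule(("jordan", "bulls"), "sports"),
]


def to_probe(rule: Rule) -> str:
    """Render the rule antecedent as a conjunctive query in which every term is required."""
    return " ".join(f"+{term}" for term in rule.terms)


# Each probe stays paired with the category its rule predicts.
probes = [(to_probe(rule), rule.category) for rule in RULES]

for query, category in probes:
    print(f"{query}  ({category})")
```

Keeping each probe paired with its category means the match counts returned by the database can later be attributed to a category without ever retrieving the documents.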
We use the results to estimate coverage and specificity values
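A rough sketch of that estimation step, under stated assumptions: coverage for a category is approximated by summing the number of matches reported for that category's probes, and specificity by the category's share of all matches across the categories being compared. The slides do not spell out these formulas, so the normalization here is an assumption, and the function name `estimate` and the match counts are invented for illustration. The sketch also makes no correction for probes that match documents from the wrong category.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def estimate(
    match_counts: Iterable[Tuple[str, int]]
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Estimate coverage and specificity from per-probe match counts.

    match_counts holds one (category, number_of_matches) pair per probe.
    """
    coverage: Dict[str, int] = defaultdict(int)
    for category, matches in match_counts:
        coverage[category] += matches  # assumed: coverage is the sum of matches per category

    total = sum(coverage.values())
    # Assumed: specificity is the fraction of all matches that fall in the category.
    specificity = {
        category: (count / total if total else 0.0)
        for category, count in coverage.items()
    }
    return dict(coverage), specificity


# Hypothetical match counts for three probes sent to one database.
results = [("health", 20), ("computers", 150), ("sports", 30)]
coverage, specificity = estimate(results)
print(coverage["computers"], specificity["computers"])
```

With these made-up counts the database would look predominantly like a Computers database: it has the largest estimated coverage and accounts for three quarters of all matches.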
CS Papers archive (Computers)
Science and technology magazine (Science)
Articles about outdoor activities (Hobbies)
News and discussion about religions (Society)