
Web Categorization Crawler
Students: Mohammed Agabaria, Adam Shobash
Advisor: Victor Kulikov



Presentation Transcript


  1. Web Categorization Crawler
  Students: Mohammed Agabaria, Adam Shobash
  Advisor: Victor Kulikov

  “A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion”

  Crawler Overview:
  • The Crawler starts with a list of URLs to visit, called the seeds list
  • The Crawler visits these URLs and identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the frontier
  • URLs from the frontier are recursively visited according to a predefined set of policies

  “The main role of the categorizer is to classify the current fetched page to the categories defined by the user. The crawler passes through all the categories, and determines to which categories the page is ascribed”

  [Diagram labels: Internet; Crawler Server; Crawl task: Get scheduled URL, Fetch from the Internet, Categorize the page; Downloading … Done; Matching … Done; Category A, Category B, Category C]

  “This project deals with Implementation of multi-threaded Web Categorization Crawler, which consists of fetching pages from the internet, extracting all the hyperlinks from the fetched page, ranking every link depending on the relevance of the link. Every page then is categorized and the results are saved in the database”
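  The crawl loop and the categorizer described in the slide can be sketched in a few lines of Python. This is a minimal, single-threaded illustration and not the project's implementation. The names `LinkExtractor`, `categorize`, and `crawl` are hypothetical. The categorizer is assumed to be a plain keyword match over the raw page source, and the link rank is assumed to be the number of categories matched by the page on which the link was found. The caller supplies the `fetch` function, and results are kept in a dictionary rather than a database.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def categorize(text, categories):
    """Return the names of all categories that have a keyword in text.

    Assumption: a category is a name mapped to a list of keywords, and a
    page belongs to it if any keyword occurs (case-insensitively).
    """
    lowered = text.lower()
    return [name for name, keywords in categories.items()
            if any(keyword.lower() in lowered for keyword in keywords)]


def crawl(seeds, fetch, categories, max_pages=10):
    """Visit URLs starting from the seeds list; return {url: [category, ...]}.

    fetch(url) must return the page source as a string, or None on failure.
    """
    frontier = []  # min-heap of (negated rank, insertion order, url)
    order = 0
    for url in seeds:
        heapq.heappush(frontier, (0, order, url))
        order += 1
    seen = set(seeds)
    results = {}

    while frontier and len(results) < max_pages:
        _, _, url = heapq.heappop(frontier)  # get the next scheduled URL
        html = fetch(url)                    # fetch from the Internet
        if html is None:
            continue
        matched = categorize(html, categories)  # categorize the page
        results[url] = matched

        parser = LinkExtractor()
        parser.feed(html)
        rank = len(matched)  # assumed relevance measure, see lead-in
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-rank, order, link))
                order += 1
    return results
```

  Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP client and lets it run against an in-memory dictionary of pages. A multi-threaded variant, as the project describes, would need a thread-safe frontier shared between worker threads.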
