FICA: A Fast and Intelligent Crawling Algorithm for Efficient Web Information Retrieval

Web Information Retrieval (Web IR)
Handout #11: FICA: A Fast Intelligent Crawling Algorithm
Ali Mohammad Zareh Bidoki
ECE Department, Yazd University
alizareh@yaduni.ac.ir

Web Crawling
• Search engines do not index the entire Web
• Therefore, a crawler has to focus on the most valuable and appealing pages
• To do this, a better crawling criterion is required
• FICA

Breadth-First Crawling
[Figure: example web graph with pages p, q, r, s, t, u, v, w, x, y, z]
• BFS advantages
• Why is it an acceptable algorithm?
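As a point of reference for the breadth-first baseline, the following is a minimal sketch of a BFS crawl order. The function name `bfs_crawl_order`, the `fetch_links` callback, and the edges in `TOY_WEB` are assumptions made for illustration; the slide's figure does not preserve the actual links between pages.

```python
from collections import deque

# Hypothetical toy graph: page -> list of pages it links to.
TOY_WEB = {
    "p": ["q", "r", "s", "t"],
    "q": ["u", "v", "w"],
    "s": ["x", "y"],
    "t": ["z", "y"],
}


def bfs_crawl_order(seeds, fetch_links, max_pages):
    """Return pages in the order a FIFO (breadth-first) frontier visits them."""
    queue = deque(seeds)
    seen = set(seeds)
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)  # stands in for "download the page"
        for target in fetch_links(url):
            if target not in seen:  # enqueue each URL at most once
                seen.add(target)
                queue.append(target)
    return order


order = bfs_crawl_order(["p"], lambda url: TOY_WEB.get(url, []), max_pages=20)
print(order)
```

Pages are visited level by level from the seed, so the frontier needs no priority computation at all.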

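Logarithmic Distance Crawling
When i points to j then: [equation image not preserved in the transcript]
[Figure: example web graph annotated with logarithmic distances from page p; logarithms are base 10]
• $d_{pv} = \log 4 + \log 3 \approx 1.08$
• $d_{pt} = \log 4 \approx 0.60$
• $d_{pz} = \log 4 + \log 2 \approx 0.90$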
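The worked values on this slide (log 4 + log 3, log 4, and log 4 + log 2) are consistent with each link out of page i weighing the base-10 logarithm of i's out-degree, with weights summed along a path. The sketch below checks that reading. The weight rule is an assumption inferred from those numbers, and the edges in `GRAPH` are hypothetical, chosen only so that the out-degrees match the slide.

```python
import math

# Hypothetical edges: p has 4 out-links, q has 3, t has 2.
GRAPH = {
    "p": ["q", "r", "s", "t"],
    "q": ["u", "v", "w"],
    "s": ["x", "y"],
    "t": ["z", "y"],
}


def link_weight(graph, page):
    """Assumed weight of any link leaving `page`: log10 of its out-degree."""
    return math.log10(len(graph[page]))


def path_distance(graph, path):
    """Sum the assumed link weights along a path such as ["p", "q", "v"]."""
    return sum(link_weight(graph, page) for page in path[:-1])


d_pv = path_distance(GRAPH, ["p", "q", "v"])  # log10(4) + log10(3)
d_pt = path_distance(GRAPH, ["p", "t"])       # log10(4)
d_pz = path_distance(GRAPH, ["p", "t", "z"])  # log10(4) + log10(2)
print(round(d_pv, 3), round(d_pt, 3), round(d_pz, 3))
```

Under this reading, a page reached through hubs with many out-links ends up farther from the seed than one reached through pages with few out-links.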

FICA
• Intelligent surfer model
• It is based on reinforcement learning

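FICA (On-line)
• Distance is used as the priority value
[Figure: crawler architecture. Seeds and newly discovered URLs feed the FICA scheduler, which orders them in a priority queue (URL1, URL2, …). The downloader fetches web pages from the Web, stores text and metadata in the repository, and returns extracted URLs to the scheduler.]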
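A distance-ordered frontier of this kind can be sketched with a binary heap. This is a simplified illustration, not the author's implementation: it assumes the link weight is log10 of the source page's out-degree, omits the reinforcement-learning component, and uses hypothetical names (`distance_priority_crawl`, `fetch_links`, `TOY_WEB`).

```python
import heapq
import math

# Hypothetical toy graph standing in for the Web.
TOY_WEB = {
    "p": ["q", "r", "s", "t"],
    "q": ["u", "v", "w"],
    "s": ["x", "y"],
    "t": ["z", "y"],
}


def distance_priority_crawl(seeds, fetch_links, max_pages):
    """Visit pages in order of increasing assumed logarithmic distance."""
    best = {url: 0.0 for url in seeds}          # smallest distance seen per URL
    frontier = [(0.0, url) for url in seeds]    # (distance, url) min-heap
    heapq.heapify(frontier)
    done = set()
    order = []
    while frontier and len(order) < max_pages:
        dist, url = heapq.heappop(frontier)
        if url in done:                          # stale heap entry
            continue
        done.add(url)
        order.append(url)                        # stands in for "download"
        links = fetch_links(url)
        if not links:
            continue
        step = math.log10(len(links))            # assumed link weight
        for target in links:
            cand = dist + step
            if target not in done and cand < best.get(target, math.inf):
                best[target] = cand
                heapq.heappush(frontier, (cand, target))
    return order


order = distance_priority_crawl(["p"], lambda u: TOY_WEB.get(u, []), 20)
print(order)
```

Each URL is pushed only when its distance improves, and every heap operation costs logarithmic time, which is the usual source of an O(E log V) bound for this style of scheduler.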

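Comparison with Others
[Figure: crawler architecture used by other methods. Seeds feed a URL queue (URL1, URL2, …); the downloader fetches pages from the Web into the repository; a partial ranking algorithm reads URLs and links from the repository to reorder the queue.]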

Experimental Results
• The experiment was run on a UK web graph of 18 million web pages
• We chose PageRank as the ideal ranking mechanism

FICA Properties
• Its time complexity is O(E log V), where E is the number of links and V the number of pages
• Complexity of Partial PageRank is [formula not preserved in the transcript]
• FICA outperforms others in discovering highly important pages
• It requires little memory for computation
• It is online and adaptive

FICA as a Ranking Algorithm
• We used Kendall's metric to measure the correlation between two rank lists
• PageRank is taken as the ideal ranking
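To make the rank-list comparison concrete, here is a minimal sketch assuming that "Kendall's metric" refers to the Kendall tau rank correlation over two orderings of the same items without ties. The function name `kendall_tau` and the sample lists are illustrative only and are not taken from the experiments.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau between two orderings of the same distinct items."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    if set(pos_a) != set(pos_b) or len(pos_a) < 2:
        raise ValueError("need two orderings of the same 2+ items")
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Positive product: both lists order the pair (x, y) the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


reference = ["a", "b", "c", "d"]   # e.g. pages ordered by the ideal ranking
candidate = ["a", "c", "b", "d"]   # e.g. pages ordered by another algorithm
tau = kendall_tau(reference, candidate)
print(tau)
```

A value of 1 means the two lists agree on every pair, and -1 means one list is the exact reverse of the other. This pairwise loop is quadratic, so it suits small samples rather than a full web graph.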

Dynamic Version of FICA
