Crawling Deep Web Content Through Query Forms
Jun Liu, Zhaohui Wu, Lu Jiang, Qinghua Zheng and Xiao Liu
Speaker: Lu Jiang
Xi’an Jiaotong University, P.R. China

Outline
Background
Related work
Minimum Executable Pattern
Adaptive Crawling Algorithm
Experimental results
Conclusions
Data retrieval in Deep Web [Michael K. Bergman, 2001]
Why the Deep Web
Data Accumulation Phase
A query obtains x new records while accessing y records.
Harvest rate = x/y.
The harvest rate and the extracted records are used to evaluate query candidates.
Iteration continues until the stop condition is satisfied.
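The harvest-rate bookkeeping above can be sketched in a few lines of Python. This is only an illustration, not the crawler from the talk: the function names, the `result_pages` input (a stand-in for the records returned by real form submissions) and the `min_rate` threshold used as the stop condition are all assumptions made for the example.

```python
def harvest_rate(new_records, accessed_records):
    """Harvest rate = x / y (new records obtained per record accessed)."""
    if accessed_records == 0:
        return 0.0
    return new_records / accessed_records


def accumulate(result_pages, min_rate=0.1):
    """Simulate the data accumulation phase.

    result_pages: iterable of lists of record ids, one list per issued
    query (hypothetical stand-in for real query results).
    min_rate: assumed stop condition; stop once a query's harvest rate
    drops below this value.
    """
    seen = set()      # local copy of the records harvested so far
    history = []      # harvest rate of each issued query
    for records in result_pages:
        new = set(records) - seen
        rate = harvest_rate(len(new), len(records))
        seen |= new
        history.append(rate)
        if rate < min_rate:
            break
    return seen, history


pages = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 5, 6]]
seen, history = accumulate(pages)
print(history)
```

In this toy run the third query returns no new records, so its harvest rate is 0 and the loop stops.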
We believe the MEP method using multiple MEPs outperforms the same method restricted to any single one of those MEPs.
Appendix
Issue a query via mep1 and get 200 new records while accessing 250 records
New record rate (harvest rate) = 200/250 = 0.8
mep1 = 0.8/(0.33+0.33+0.8) = 0.55
mep2 = 0.33/(0.33+0.33+0.8) = 0.22
mep3 = 0.33/(0.33+0.33+0.8) = 0.22
Issue a query via mep1 and get 30 new records while accessing 100 records
New record rate (harvest rate) = 30/100 = 0.3
mep1 = 0.3/(0.22+0.22+0.3) = 0.40
mep2 = 0.22/(0.22+0.22+0.3) = 0.29
mep3 = 0.22/(0.22+0.22+0.3) = 0.29
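The two update steps above follow one pattern: the score of the MEP that was just queried is replaced by its latest new record rate, and all scores are then renormalized so they sum to 1. A minimal Python sketch of that pattern follows. The function name `update_weights` and the reading of the normalized values as selection weights are assumptions for illustration; the slides do not name them. The slide figures appear to be truncated to two decimals, so the values computed here can differ from them in the second decimal place.

```python
def update_weights(weights, issued, new_records, accessed_records):
    """Replace the issued MEP's score with its latest new record rate,
    then renormalize all scores so they sum to 1.

    weights: dict mapping MEP name -> current normalized weight.
    issued: name of the MEP through which the query was issued.
    """
    rate = new_records / accessed_records
    scores = dict(weights)       # copy so the caller's dict is untouched
    scores[issued] = rate
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Start from equal weights, as in the worked example (0.33 each).
weights = {"mep1": 0.33, "mep2": 0.33, "mep3": 0.33}

# First query via mep1: 200 new records out of 250 accessed.
weights = update_weights(weights, "mep1", 200, 250)
print({name: round(w, 3) for name, w in weights.items()})

# Second query via mep1: 30 new records out of 100 accessed.
weights = update_weights(weights, "mep1", 30, 100)
print({name: round(w, 3) for name, w in weights.items()})
```

A low harvest rate on mep1 in the second step shifts weight toward mep2 and mep3, which is the adaptive behaviour the example is meant to show.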
mep1 mep2 mep3
Keyword capability =