
Tour the World: building a web-scale landmark recognition engine

ICCV 2009. Yan-Tao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, and Hartmut Neven.


Presentation Transcript


  1. Tour the World: building a web-scale landmark recognition engine • ICCV 2009 • Yan-Tao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, and Hartmut Neven

  2. Outline • Introduction • Methods • Experiments & Results • Conclusion

  3. Introduction

  4. Introduction • Motivation • the vast number of landmark images on the Internet • Applications • clean landmark images for building virtual tourism • content understanding and geo-location detection • tour guide recommendation and visualization • Issues • no readily available list of the world's landmarks • even if there were such a list, collecting true landmark images would still be challenging • efficiency at such a large scale.

  5. Discovering Landmarks in the World • Comprehensive and well-organized list of landmarks • Two sources on the Internet • Geographically calibrated images on photo-sharing websites (e.g., Picasa) • Geo-tag, text tag, popularity • Travel guide articles from websites (e.g., Wikitravel) • Text-based

  6. Frameworks • Agglomerative hierarchical clustering • Validation criterion: the number of authors in a cluster > threshold • Google Image Search & Google Maps • Extract landmark names from the city tour guide corpus [16] • [16] J. Yi and N. Sundaresan. A classifier for semi-structured documents. In Proc. of Conf. on Knowledge Discovery and Data Mining, pages 340–344, New York, NY, USA, 2000.
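The clustering and validation steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the clustering runs on photo GPS coordinates, uses complete linkage with a haversine distance, and the names `geo_clusters`, `max_km` and `author_threshold` are hypothetical.

```python
import math
from itertools import combinations

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon, ...) records in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def geo_clusters(photos, max_km=0.2, author_threshold=2):
    """photos: list of (lat, lon, author). Returns validated clusters as index lists.

    max_km and author_threshold are illustrative values, not taken from the slides.
    """
    clusters = [[i] for i in range(len(photos))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(haversine_km(photos[i], photos[j]) for i in c1 for j in c2)

    # Agglomerative step: repeatedly merge the closest pair of clusters.
    while len(clusters) > 1:
        (x, y), d = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if d > max_km:
            break  # remaining clusters are too far apart to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y

    # Validation criterion from the slide: number of authors > threshold.
    return [c for c in clusters
            if len({photos[i][2] for i in c}) > author_threshold]
```

This brute-force version recomputes every pairwise linkage on each merge, so it is only suitable for toy inputs; the deck's efficiency slide indicates that the real system needed a far more efficient clustering.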

  7. Landmarks mined from GPS-tagged photos

  8. Landmarks extracted from tour guide corpus

  9. Frameworks • SIFT-128 -> PCA -> feature dimension 40 • Affine transformation [9] • [9] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

  10. Graph cluster

  11. Frameworks • Validation criteria: 1. Reject a landmark name that is too long or whose words are mostly not capitalized 2. Keep a cluster only if its number of authors > threshold • Cleaning: 1. Photographic vs non-photographic classifier (AdaBoost algorithm) 2. Face detector [15] • [15] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. of Conf. on Computer Vision and Pattern Recognition, volume 1, pages I-511–I-518, 2001.
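The name-based validation rule can be illustrated with a small heuristic. This is only a sketch: the slide does not say how long "too long" is, so the `max_words` limit and the function name `is_plausible_landmark_name` are assumptions made for illustration.

```python
def is_plausible_landmark_name(name, max_words=6):
    """Return False for names that are too long or mostly not capitalized.

    max_words is a hypothetical cut-off; the slides give no concrete value.
    """
    words = name.split()
    if not words or len(words) > max_words:
        return False
    not_capitalized = sum(1 for w in words if not w[0].isupper())
    # Reject when most (more than half) of the words are not capitalized.
    return not_capitalized * 2 <= len(words)
```

For example, "Statue of Liberty" passes because only one of its three words is lowercase, while a sentence fragment such as "how to get there by bus" is rejected.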

  12. Photographic vs non-photographic classifier

  13. Efficiency Issues • Parallel computing to mine true landmark images • Efficiency in hierarchical clustering • Indexing local features for matching • Queries use a k-d tree
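To make the k-d tree indexing idea concrete, here is a minimal pure-Python sketch of building a tree over feature vectors and running an exact nearest-neighbor query. It is not the system's index: the function names are hypothetical, and large-scale systems working with 40-dimensional descriptors typically rely on approximate search rather than the exact search shown here.

```python
import math

def build_kdtree(points, depth=0):
    """Build a k-d tree as nested (point, left, right) tuples; None for empty."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    d = math.dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

# Usage: index a few 2-D points and look up the closest one to a query.
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
distance, match = nearest(tree, (9, 2))
```

The pruning test in `nearest` is what saves work compared with a linear scan: whole subtrees are skipped when they cannot contain a closer point.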

  14. Experiments & Results • ~20 million GPS-tagged photos from Picasa and Panoramio are used to construct geo-clusters • Landmark names mined from the tour guide corpus are queried in Google Image Search; the first 200 returned images form the noisy image set. • The total number of images is ~21.4 million.

  15. Statistics of mined landmarks • GPS-tagged photos deliver • 2240 validated landmarks, from 812 cities in 104 countries. • The tour guide corpus yields • 3246 validated landmarks, from 626 cities in 130 countries. • Only 174 landmarks appear in both lists

  16. Statistics of mined landmarks • The combined list of landmarks consists of 5312 unique landmarks from 1259 cities in 144 countries.
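The combined total follows from the per-source counts and their overlap reported on the preceding slide, by inclusion-exclusion, where $G$ denotes the landmarks mined from GPS-tagged photos and $T$ those from the tour guide corpus (symbols introduced here for illustration):

```latex
|G \cup T| = |G| + |T| - |G \cap T| = 2240 + 3246 - 174 = 5312
```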

  17. Statistics of mined landmarks • Our language processing focuses on English only. • The number of landmarks in China amounts to only 101 -> landmark mining is a multi-lingual data mining task

  18. Evaluation of landmark image mining • The minimum cluster size is set to 4 • The visual clustering yields • ~14k visual clusters with ~800k images for landmarks mined from GPS-tagged photos • ~12k clusters with ~110k images for landmarks mined from the tour guide corpus. • 1000 visual clusters are randomly selected to evaluate correctness • Outlier cluster rate: 6.8% (68 out of 1000) • After validation and cleaning, it drops to 3.7% (37 out of 1000).

  19. Evaluation of landmark recognition • Positive and negative query images • The positive testing image set consists of 728 images from 124 randomly selected landmarks • manually annotated from the images ranked 201 to 300 in Google Image Search

  20. Evaluation of landmark recognition • The negative testing set consists of 30524 (Caltech-256) + 9986 (Pascal VOC 07) = 40510 images in total • Recognition is done by local feature matching of the query image against model images • using the nearest neighbor (NN) principle.

  21. Evaluation of landmark recognition • In the positive testing image set, 417 images are detected by the system to be landmarks, of which 337 are correctly identified. The accuracy of identification is 80.8% (337/417), which is fairly satisfactory, considering the large number of landmark models in the system. • This high accuracy enables our system to provide landmark recognition to other applications, like image content analysis and geo-location detection. • The identification rate (correctly identified / positive testing images) is 46.3% (337/728)
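The two percentages on this slide use different denominators, which is easy to miss. The short sketch below recomputes both from the counts given on the slide; the variable and function names are chosen here for illustration.

```python
def rate(numerator, denominator):
    """Return numerator / denominator as a fraction in [0, 1]."""
    return numerator / denominator

positives = 728   # positive testing images
detected = 417    # images the system flags as some landmark
correct = 337     # flagged images assigned the right landmark

# Accuracy of identification: correct among the images detected as landmarks.
accuracy = rate(correct, detected)
# Identification rate: correct among all positive testing images.
identification_rate = rate(correct, positives)

print(f"accuracy: {accuracy:.1%}")                        # accuracy: 80.8%
print(f"identification rate: {identification_rate:.1%}")  # identification rate: 46.3%
```

The gap between the two figures comes from the 311 positive images (728 - 417) that the system did not detect as landmarks at all.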

  22. Evaluation of landmark recognition

  23. Evaluation of landmark recognition • For the negative testing set, 463 out of 40510 images are falsely accepted as landmarks • The false acceptance rate is only 1.1% • After careful examination, we find that most false matches occur in two scenarios: • (1) the match is technically correct, but the match region is not representative of the landmark; and • (2) the match is technically false, due to the visual similarity between negative images and landmarks.
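The false acceptance rate quoted above is the share of negative images that the system matched to some landmark:

```latex
\text{FAR} = \frac{463}{40510} \approx 0.0114 \approx 1.1\%
```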

  24. Conclusion • We build a world-scale landmark recognition engine with a multi-source and multi-modal data mining task. • We have employed the GPS-tagged photos and online tour guide corpus to generate a worldwide landmark list. • We then utilize ~21.4 M images to build up landmark visual models, in an unsupervised fashion. • The experiments demonstrate that the engine can deliver satisfactory recognition performance, with high efficiency.
