Gunhee Kim Eric P. Xing

Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines Gunhee Kim Eric P. Xing School of Computer Science, Carnegie Mellon University June 19, 2013

Outline • Problem Statement • Algorithm • Dataset and preprocessing • Alignment of Multiple Photo Streams • Large-scale Cosegmentation • Experiments • Conclusion

Background Online photo sharing becomes so popular Information overload problem in visual data

Background Query scuba+divingfrom Flickr Any meaningful structural summary? Taken in different spatial, temporal, and personal perspective Likely to share common storylines

Our Ultimate Goal An example of scuba+diving storyline beach on boat diving underwater dinner boat coral sunset cf) ranking and retrieval by Narrative structural summary vs. independently retrieved images Reconstructing photo storylines from large-scale online images

Objective of This Paper As a first technical step, jointly perform two crucial tasks... Mutually rewarding! Alignment Cosegmentation • Segment K common regions from aligned M images • Match images from different photo streams PS2 User 1 at 10/19/2008 (Cayman Islands) PS14 User 2 at 03/19/2005 (Phuket, Thailand) PS3 User 3 at 08/27/2008 (Cozumel, Mexico)

Objective of This Paper As a first technical step, jointly perform two crucial tasks... Mutually rewarding! Alignment Cosegmentation • Online images are too diverse to segment together at once • The alignment discovers the images that share common regions PS2 User 1 at 10/19/2008 (Cayman Islands) PS14 User 2 at 03/19/2005 (Phuket, Thailand) PS3 User 3 at 08/27/2008 (Cozumel, Mexico)

Objective of This Paper As a first technical step, jointly perform two crucial tasks... Mutually rewarding! Alignment Cosegmentation • Improve image matching by a better image similarity measure Closing a loop between the two tasks

Flickr Dataset Flickr dataset of 15 outdoor recreational activities • Experiments with more than 100Kimages of 1Kphoto streams • Larger than those of previous work by orders of magnitude # photo streams (13,157) # of images(1,514,976) SurfingBeach HorseRiding AirBallooning RAfting YAcht ROwing ScubaDiving FormulaOne SNowboarding SafariPark MountainCamping RockClimbing LondonMarathon FlyFishing Tour deFrance

Image Descriptor and Similarity Measure Image description • HSV color SIFT and HOG features on regular grid • L1 normalized spatial pyramid histogram using 300 visual words Image similarity measure • (Our assumption) Segmentation enhances the image alignment. 2. Segmentation available 1. No segmentation available • Histogram intersection on SPH • Histogram intersection on the best assignment of segments • Not robust against location/pose changes

Alignment of Photo Streams Input: A set of photo streams (PS): P = {P1,…,PL} • Photo Stream: a set of photos taken in sequence by a single user in a single day Idea: Align all photo streams at once after building K-NN graph • Naïve-Bayes Nearest Neighbor(NBNN) [Boiman et al. 08] for similarity metric P1 P2 P3 P4

Alignment of Photo Streams Input: A set of photo streams (PS): P = {P1,…,PL} • Photo Stream: a set of photos taken in sequence by a single user in a single day Idea: Align all photo streams at once after building K-NN graph • Naïve-Bayes Nearest Neighbor(NBNN) [Boiman et al. 08] for similarity metric For simplicity, first consider pairwise alignment of two photo streams Pairwise alignment P1 P2 P3 P4

Pairwise Alignment Goal of alignment: find a matching btw a pair of PS • means I in P1 has no match in P2. Optimization: MRF-based energy minimization • ex) Deformable image matching [Shekhovtsovet al.07] or SIFT flow [Liu et al. 2009] • Flexibility: Various energy terms • Solved by discrete BP

Pairwise Alignment Objective function 9AM Data term: The matched image pairs should be visually similar. Smoothness term: The matched images to neighbors in P1 should be neighbors in P2. Time term: The matched image pairs should be temporally similar. 6PM 10AM

Alignment of Multiple Photo Streams Objective : MRF-based energy minimization : All pairs of NN photo streams Message-passing based optimization • until convergence or for fixed iterations Pairwise alignment Pairwise alignment Pairwise alignment P1 P2 Pairwise alignment P3 P4

Alignment of Multiple Photo Streams Objective : MRF-based energy minimization : All pairs of NN photo streams Message-passing based optimization • until convergence or for fixed iterations P1 P2 P3 P4

Build an Image Graph Idea: Connect the images that are similar enough to be cosegmented Image Graph G = (I, E) • I : The set of images. E: The set of edges. • E = EBUEw • EB: Edges between different photo streams (results of alignment) • EW: Edges within a photo stream consider the images such that For each image I, links I with the K-NN of I (EW). Ew EB

Scalable Cosegmentation Iteratively run the MFC algorithm [Kim and Xing, 2012]on the image graph Review of MFC algorithm • Cosegmentation: Jointly segment M images into K+1regions (Kforegrounds (FG) + background (BG)) Foreground Modeling RegionAssignment • Learn appearance models of K FGs and BG Iterate • Allocate the regions of image into one of KFGs or BG • Any region classifiers or their combination • Very efficiently solve using the idea of combinatorial auction • Ex. Gaussian mixture on RGB, linear SVM on SPH

Scalable Cosegmentation on Image Graph Message-passing based optimization • Learn FG Models from neighbors of Ii. • Run region assignment on Ii. Iteratively solve… FG 1 (car) FG 3(BG) FG 2(road)

Scalable Cosegmentation on Image Graph Initialization Message-passing based optimization • Supervised: start from seed labels • Learn FG Models from neighbors of Ii. • Unsupervised: use the algorithm of CoSand[Kim et al. 2011]. • Run region assignment on Ii. Iteratively solve… FG 1 (car) FG 3(BG) FG 2(road)

Summary of Algorithm As a first technical step, jointly perform two tasks... Mutually rewarding! Alignment Cosegmentation Message-passing style optimization • Build K-NN graph between photo streams (PS) • Build K-NN graph between images • Align all photo streams at once guided by the PS graph • Cosegment all images at once guided by the image graph Computation time: O(T|E|) • O(TN) where N: # of images • O(TL) where L: # of photo streams

Evaluation – Two Experiments Evaluation for Alignment Very hard to obtain groundtruth! • Correspondences btw two sets of thousands of images? Task: Temporal localization (inspired by geo-location estimation) Where is it likely to be taken? Whenare they likely to be taken? Timeline P1 [Hays and Efros. 2008] Evaluation for cosegmentation Task: Foreground detection • We manually annotate 100 images per class • Accuracy is measured by intersection-over-union

Evaluation of Alignment Procedures of temporal localization 1. Given a set of photo streams, randomly split training and test sets Training (80%) Better temporal localization ≠ Better Alignment Test (20%) 2. Run alignment 3. Estimate timestamps of all images in test photo streams 4. Temporal localization is correct if Baselines • BPS: Our Alignment + Cosegmentation • BP: Our alignment only Popular multiple sequence alignment • KNN: K-nearest neighbors – Image similarity only (the simplest) • HMM: Hidden Markov Models Justify closing a loop • DTW: Dynamic Time Windows

Evaluation of Alignment = 60 min. • Temporal localization is correct if BPS: Our Alignment + Cosegmentation BP: Our alignment only KNN: K-nearest neighbors HMM: Hidden Markov Models DTW: Dynamic Time Windows

Evaluation of Cosegmentation Task: Foreground detection BP+MFC: (Proposed) Alignment + Cosegmentation MFC: Our cosegmentation without alignment COS : Submodular optimization [Kim et al. ICCV11] LDA: LDA-based localization [Russell et al. CVPR06] Examples

Conclusion Ultimate goal: building photo storylinesfrom large-scale online images horse+riding safari+park

Conclusion Jointly perform Alignment and cosegmentation • As a first technical step • Much remains to be done. • Code will be available soon. Message-passing based optimization • An unified framework for both alignment and cosegmentation • Scalable (ex. linear computation time with # PS, # of images)

Thank you !Stop by our poster.

Gunhee Kim Eric P. Xing

Gunhee Kim Eric P. Xing

Presentation Transcript

Yi Xing Hong Cha

Kim

Gunhee Kim Leonid Sigal Eric P. Xing

ERIC

ERIC PAN (#6) z 3329915 DAVID KIM (#17) z3220406

Yanxin Shi 1 , Fan Guo 1 , Wei Wu 2 , Eric P. Xing 1

Dennis P. Lettenmaier Qiuhong Tang Eric Rosenberg

Xing Fu Eric Puster 14 October 2008

P. Kim Sturgess, P.Eng. FCAE

My Hometown- Shao Xing

Eric Strawser, Kim Smith, Gretchen Sturm Per. 4

KIM

Amr Ahmed and Eric P. Xing

Suyash Shringarpure and Eric Xing School of Computer Science Carnegie Mellon University ICML 2008

Dennis P. Lettenmaier Qiuhong Tang Eric Rosenberg

ERIC

ERIC

Eric Manns | Eric Manns

Xing Data Scraper