420 likes | 449 Views
Develops an innovative way to create dynamic photo slideshows with music beats, combining visual processing and music analysis for an engaging presentation.
E N D
Tiling Slideshow Jun-Cheng Chen, Wei-Ta Chu, Jin-Hau Kuo, Chung-Yi Weng, and Ja-Ling Wu wtchu@cmlab.csie.ntu.edu.tw National Taiwan University ACM Multimedia Conference 2006
Motivation • Large amounts of unorganized photos burden information access • Conventional methods • Content-based image retrieval • Digital photo album • Photo browsers • Issues • Organized presentation • Presenting one-by-one makes users boring • Photo quality ACM Multimedia Conference 2006
Generator Tiling slideshow Goal • Automatically generate well-organized and lively photo presentation
Topic photo Supportive photos Photographic Story • Paragraph: describe by text • Contains a topic sentence and several supportive sentences. • Tiling slideshow: describe by photos • Contains a topic photo and several supportive photos ACM Multimedia Conference 2006
The Proposed Slideshow Music Beats Time 5 1 2 3 4 6 7 8 ACM Multimedia Conference 2006
Outline • System Overview • Visual Processing • Music Analysis • Tiling Slideshow Composition • Evaluation • Conclusion ACM Multimedia Conference 2006
Orientation cor. • Blur detection • Overexposure • underexposure • detection • Time-based & • Content-based • clustering • Temp. selection • ROI detection • Temporal & spatial composition System Overview Music Photos Preprocessing Beat detection Clustering Composition Tiling slideshow ACM Multimedia Conference 2006
Blurred photo Underexposure photo Photo Processing • Orientation correction • EXIF (Exchangeable Image File Format) metadata • Photo Filtering • Blur detection • Check edge information in diff. resolutions • Overexposure/Underexposure detection • Check intensity information of each photo
15 sec 47 sec 7hr 30 sec Time-based Clustering • Check the time gap between adjacent photos ACM Multimedia Conference 2006
Within-cluster distance: d(.) is the average of normalized dominant color and color layout distances. Between-cluster distance: Goodness of a clustering case: Content-based Clustering (1/3) • Given a time-based photo cluster, finer clustering is performed based on content-based features. (dominant color and color layout) ACM Multimedia Conference 2006
Clustering case 1 Clustering case 2 Content-based Clustering (2/3) ACM Multimedia Conference 2006
Clustering Results Clustering Results Content-based Clustering (3/3)
Search range for frame switching Sound EnergyDifference 1 2 3 4 5 Frame 1 starts Frame 2 starts Music Beats t1 t2 t3 t4 t5 r1 (4 seconds) r2 (6 seconds) Music Analysis For frame switching and photo displaying • Beat detection • Music segmentation Frame 2 Frame 1 2 4 1 3 5
Short Summary • Photo • Filter out defective photos • Organize photos in terms of time and content characteristics • Music • Segment into smaller pieces Music Photos Preprocessing Beat detection Clustering Composition Tiling slideshow ACM Multimedia Conference 2006
Tiling Slideshow Composition • Problem 1 • Given a time-limited music clip, only a subset of photo clusters can be displayed. • Problem 2 • For a cluster of photos to be displayed, more important photos should occupy larger space. • Problem 3 • Photos should be smartly manipulated to fit in with the limited displaying space. ACM Multimedia Conference 2006
─ Shooting frequency ─ Opposite to within-cluster distance ─ Nonlinear fusion scheme Cluster Selection (for Problem 1) • Cluster-based importance • Defined based on “photo per minute (PPM)” and “photo conformance (PC)” For each content-based cluster Cg in a time-based cluster ACM Multimedia Conference 2006
─ Template importance vector Template Determination (for Problem 2) • Templates importance 3-cell Template Topic cell Supportive cell 4-cell Template Topic cell Topic cell ACM Multimedia Conference 2006
Template Determination (for Problem 2) • Photo-based importance • Defined based on “face region (FR)” and “attention value (AV)” ─ Photo importance vector ACM Multimedia Conference 2006
Template Determination (for Problem 2) • Find the most matching between template importance and photo importance • Find the minimum included angle between them ACM Multimedia Conference 2006
Top-down case: (photo with face) Bottom-up case: (photo without face) Composition (for Problem 3) • Find the region that conveys most “content value” and conforms to the aspect ratio of the targeted cell. ACM Multimedia Conference 2006
480 pixels 720 pixels Composition (for Problem 3) 1. Find ROI 2. Extend 3. Crop 4. Resize
Demo ACM Multimedia Conference 2006
Evaluation • Data set ACM Multimedia Conference 2006
User Study • Compare the satisfaction of ACDSee, PhotoStory, and Tiling slideshow • Questionnaire • Q1: How do you feel the photo variety in a time unit? • Q2: Do you think it's a funny presentation? • Q3: Do you think the sequence helps you experience this trip? • Q4: Are you willing to use it to generate your own slideshow? • Q5: How do you feel the audiovisual effects of this slideshow? ACM Multimedia Conference 2006
Subjective Scores Sequence 1 Questions Sequence 2 Sequence 3
Objective Tests (1/2) • Clustering performance evaluation ACM Multimedia Conference 2006
Objective Tests (2/2) • Cropping performance evaluation ACM Multimedia Conference 2006
Summary • We propose a new type of audiovisual presentation for consumer photos. • Perform both visual and music analysis for organized presentation. • We deal with issues on content selection and smart manipulation to display qualified content in limited time and limited space. • Semantic features or user intervention can be added to enhance the performance. ACM Multimedia Conference 2006
Backup Slides ACM Multimedia Conference 2006
Photos Music Orientation Correction Quality Estimation Preprocess Photo Filtering Beat Detection Time-based Clustering ROI Determination Analysis Music Segmentation Content-based Clustering Composition Tiling Slideshow Composition Tiling slideshow System Overview
An EXIF Example File name : IMG_1770.JPG File size : 2062120 bytes File date : 2005:11:16 10:04:20 Camera make : Canon Camera model : Canon PowerShot S60 Date/Time : 2005:11:16 10:04:21 Resolution : 2592 x 1944 Orientation : rotate 90 Flash used : No (auto) Focal length : 5.8mm (35mm equivalent: 29mm) CCD width : 7.19mm Exposure time: 0.0100 s (1/100) Aperture : f/2.8 Whitebalance : Auto Metering Mode: matrix
+ + + Blur Detection H. Tong, M. Li, H.-J. Zhang, and C. Zhang, “Blue detection for digital images using wavelet transform,” Proc. of ICME, pp. 17-20, 2004.
Time-based Clustering Adaptive threshold clustering algorithm gi is the time gap between photo i and photo i+1 K is a suitable threshold (K=log(17)) d is the size of sliding windows (d = 5) 11 time gaps g1 g2 gN J.C. Platt, M. Czerwinski, and B.A. Field, “PhotoTOC: automating clustering for browsing personal photographs,” Proc. of PCM, pp. 6-10, 2003. ACM Multimedia Conference 2006
(a) N+5 N-3 N-2 N-1 N N+1 N+2 N+3 N+4 t (b) N+5 N-3 N-2 N-1 N N+1 N+2 N+3 N+4 t ACM Multimedia Conference 2006
Beat Detection Music Signal Frequency Filterbank . . . . Envelope Extractor Envelope Extractor . . . . First-Order Differentiator First-Order Differentiator . . . . Half-wave Rectifier Half-wave Rectifier . . . . Comb Filterbank Comb Filterbank . . . . . . . . Energy Energy Energy Energy . . . . ∑ ∑ Beat Peak Picking E.D. Scheirer, “Tempo and beat analysis of acoustic musical signals,” Journal of Acoustical Society of America, vol. 103, no. 1, pp. 588-601, 1998.
Tiling Slideshow Composition • Cluster Selection • Cluster-based importance • Template Determination • Photo-based importance • Spatial Composition • Smart cropping • Temporal Composition ACM Multimedia Conference 2006
ROI Determination Top-Down Attention Detection Face detection ACM Multimedia Conference 2006
ROI Determination Bottom-up Attention Detection Salience Map Generation Attentive Center and Region Extraction ACM Multimedia Conference 2006
Composition (for Problem 3) • Region selection C(Ri) = content value of the region Ri Top-down case: Bottom-up case: IMP(x,y): Applying a 2D Gaussian to the point (x,y), which is the centroid of face region or saliency map. ACM Multimedia Conference 2006
User Study 2 • Evaluate the performances in terms of content-based clustering and template determination. • Questionnaire • Q6: How do you feel the visual coherence of photos in the same frame? • Q7: How do you feel the layout of display? ACM Multimedia Conference 2006
Question 6 Question 7 User Study 2 ACM Multimedia Conference 2006