


Presentation Transcript


  1. Cascaded Classification Models: Combining Models for Holistic Scene Understanding. Helping models play nice since 2008... Geremy Heitz, Steve Gould, Ashutosh Saxena, Daphne Koller. August 11, 2008, DAGS

  2. Outline • Understanding Scene Understanding • Related Work • Model Desiderata • CCM Framework • Results • Extensions

  3. Computer View of a “Scene” [Image labeled: SKY, GRASS, SEASIDE PASTURE]

  4. Human View of a “Scene” [Annotations: “She’s walking.” “A cow.” “Some grass…”] “The cow is walking through the grass on a pasture by the sea.”

  5. Scene Understanding • Requires combining many tasks • Object Detection • Scene Categorization • Region Labeling • Depth Reconstruction • Requires the “right” representation • Matches the questions we might ask • Operates at multiple granularities • The whole is greater than the sum… • What information can they share?

  6. Visual Context • Context (from http://www.thefreedictionary.com): • “The words before and after a word or passage in a piece of writing that contribute to its meaning.” • Visual Context: • “The visual objects ‘near’ a particular visual object that contribute to its meaning” • Visual Context Cues: • “Signals obtained from nearby visual objects that may help a classifier classify a query object”

  7. Context Example

  8. Outline • Understanding Scene Understanding • Related Work • Model Desiderata • CCM Framework • Results • Extensions

  9. 3D from Line Drawings • David Waltz – “Understanding Line Drawings of Scenes with Shadows” - 1975

  10. Intrinsic Images • Barrow and Tenenbaum – “Recovering intrinsic scene characteristics from images” – 1978 • Tappen et al. – “Recovering intrinsic images from a single image” – 2005 [Figure: original image decomposed into a reflectance image and a shading image]

  11. Scene Understanding • Derek Hoiem – “Closing the Loop in Scene Interpretation” – CVPR 2008 • Uses “Intrinsic Image” idea • But… • Tailored specifically to his previous models • Fewer classes • Regions get generic properties • Hard to pronounce his name

  12. Context Model Desiderata • Allow state-of-the-art subcomponents • Generic method of combining them • Limited interface into “black boxes” [Components: REGION LABELING (Gould et al., 2007), DEPTH RECONSTRUCTION (Saxena et al., 2007), DETECTION (Dalal & Triggs, 2005)]

  13. Context Model Desiderata • Learn from datasets with arbitrary sets of labels • Different components improve each other [Datasets: MSRC Multiclass Segmentation, PASCAL Visual Object Classes, LabelMe, Stanford Range Image Data]

  14. Cascaded Classification Models • Component modules must have 3 properties (see the interface sketch below) • Learning: the classifier should be able to learn from a set of training instances. • Classification: we should be able to obtain a classification of the output variables. • Connectivity: the classifier should provide a mechanism for including features from other modules.
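To make the “black box” requirement concrete, here is a minimal interface sketch in Python; the class and method names are illustrative assumptions, not from the paper:

```python
# Hypothetical interface for a CCM component module. Each black-box
# classifier must support learning, classification, and accepting
# context features produced by the other modules.
from abc import ABC, abstractmethod

class CCMModule(ABC):
    @abstractmethod
    def learn(self, features, labels, context_features=None):
        """Train on instances; context_features (if any) come from the
        outputs of the other modules in the previous tier."""

    @abstractmethod
    def classify(self, features, context_features=None):
        """Return output labels (or scores) for the given instances."""
```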

  15. CCMs [Diagram: image I → features ΦD, ΦS, ΦZ → tier-0 labels ŶD, ŶS, ŶZ → tier-1 labels → … → tier-L labels] • I: image • Φ: image features • Ŷ: output labels • Features for tier ℓ+1 are computed from Φ and the labels output at tier ℓ
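Under the hypothetical interface sketched above, the cascade wiring is a short loop; this assumes one dict of trained modules per tier, since each tier carries its own parameters:

```python
def ccm_inference(features, tier_modules):
    """Run the cascade. Tier 0 sees only image features Phi; each later
    tier also sees the labels output by the previous tier as context."""
    context, outputs = None, {}
    for modules in tier_modules:   # one {task_name: module} dict per tier
        outputs = {name: m.classify(features, context_features=context)
                   for name, m in modules.items()}
        context = outputs          # feed labels forward to tier l+1
    return outputs
```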

  16. How to use black boxes? [Diagram: BLACK BOX “WAHOO” CLASSIFIER → output labels Y_WAHOO; BLACK BOX “SHAZAM” CLASSIFIER → output labels Y_SHAZAM]

  17. CCMs for Scene Understanding • Scene Categorization • Object Detection • Region Labeling • Depth Reconstruction

  18. Scene Categorization • C = { ‘urban’, ‘rural’, ‘ocean’, ‘other’ } • RGB mean/stddev • YCbCr mean/stddev • From detection: # of detections of each object • From region labeling: fraction of each region type
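A sketch of how this categorization feature vector might be assembled; the helper and argument names are hypothetical placeholders for the statistics listed above:

```python
import numpy as np

def scene_features(rgb_stats, ycbcr_stats, detection_counts, region_fractions):
    """Concatenate appearance statistics with context cues coming from
    the detection and region-labeling modules (tiers > 0 only)."""
    return np.concatenate([
        rgb_stats,          # mean/stddev of the R, G, B channels
        ycbcr_stats,        # mean/stddev of the Y, Cb, Cr channels
        detection_counts,   # number of detections of each object class
        region_fractions,   # fraction of pixels given each region label
    ])
```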

  19. Object Detection – HOG Features • Dalal & Triggs, 2005 [Figure: HOG feature pipeline feeding an SVM]

  20. Object Detection – Sliding Window • Consider every bounding box: all shifts, all scales, possibly all rotations • Each box gets a score D(x, y, s, Θ) • Detections: local peaks in D(·) [Example boxes scored D = 1.5 and D = −0.3]
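A minimal sliding-window sketch, assuming an externally supplied scoring function for D(x, y, s); the base box size, scales, stride, and threshold are illustrative, and taking local peaks (non-maximum suppression) would follow:

```python
def sliding_window_detect(score_fn, img_w, img_h, base_box=(64, 128),
                          scales=(1.0, 1.5, 2.0), stride=8, threshold=0.0):
    """Score every shifted and scaled box; keep the high-scoring ones."""
    detections = []
    for s in scales:
        bw, bh = int(base_box[0] * s), int(base_box[1] * s)
        for y in range(0, img_h - bh + 1, stride):
            for x in range(0, img_w - bw + 1, stride):
                d = score_fn(x, y, s)           # D(x, y, s)
                if d > threshold:
                    detections.append((d, x, y, s))
    return detections
```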

  21. Object Detection • Φ = [1, D(x,y,s), X, Y, X², Y², XY, W, W²] • P(Y) = LogReg(Φ, w) • Y = 1{is a car} • Example features: F2 = detector score of window; F10 = amount of “building” above window; F50 = variance of depths in window
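A sketch of this slide’s classifier: the window feature vector Φ concatenated with context features, scored by logistic regression (function names are illustrative):

```python
import numpy as np

def detection_phi(d_score, x, y, w, context_feats):
    """Phi = [1, D(x,y,s), X, Y, X^2, Y^2, XY, W, W^2], concatenated with
    context features from the other modules (e.g. amount of 'building'
    above the window, variance of depths in the window)."""
    base = np.array([1.0, d_score, x, y, x**2, y**2, x*y, w, w**2])
    return np.concatenate([base, np.asarray(context_feats, dtype=float)])

def p_is_object(phi, weights):
    """P(Y = 1 | Phi) = LogReg(Phi, w): a sigmoid of a linear score."""
    return 1.0 / (1.0 + np.exp(-np.dot(weights, phi)))
```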

  22. Region Labeling CRF • Y = { ‘grass’, ‘road’, ‘tree’, ‘sky’, ‘water’, ‘building’, ‘foreground’ } • Per-region features: mean R,G,B; mean H,U,V; texture responses; area; aspect ratio; … • Pairwise features between adjacent regions: delta R,G,B; offset vector; … [Figure: image regions labeled SKY, GRASS]
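As a rough illustration of the per-region vs. pairwise split, assuming a hypothetical dict-based region representation holding numpy arrays:

```python
import numpy as np

def node_features(region):
    """Per-region (node) features: appearance and shape statistics."""
    return np.concatenate([region["mean_rgb"], region["mean_huv"],
                           region["texture"],
                           [region["area"], region["aspect_ratio"]]])

def edge_features(region_a, region_b):
    """Pairwise (edge) features between adjacent regions: color
    difference and the offset vector between region centroids."""
    return np.concatenate([region_a["mean_rgb"] - region_b["mean_rgb"],
                           region_a["centroid"] - region_b["centroid"]])
```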

  23. Region Labeling Context • Predict “grass” • Relative location map

  24. Depth Reconstruction

  25. Depth Reconstruction with Context [Figure: SKY and GRASS regions, with surface normals pointing out / pointing up] • Find d* with the black-box depth module • Reoptimize depths with the new constraints: dCCM = argmin_d γ‖d − d*‖ + β‖n − nCONTEXT‖ + …
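A toy version of this reoptimization: it folds the normal constraints into per-pixel context depth targets so that a squared-penalty version of the objective has a closed form. The real objective constrains the normals n directly; this is only a sketch:

```python
import numpy as np

def reoptimize_depths(d_star, d_context, gamma=1.0, beta=1.0):
    """Minimize  gamma*||d - d*||^2 + beta*||d - d_context||^2  per pixel,
    where d_context encodes constraints such as 'grass is flat' or 'sky
    is far'. The closed-form minimizer is a weighted average."""
    return (gamma * np.asarray(d_star) + beta * np.asarray(d_context)) / (gamma + beta)
```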

  26. SU-CCM [Example image: SKY, GRASS, SEASIDE PASTURE] • Grass = flat, Sky = far, FG = vertical • 40% grass, 30% sky, … • 1 cow, 2 boats, …

  27. Results • Experiments on 2 datasets • SU-1 • 362 images, fully labeled • Scene categorization, object detection, region labeling • Gathered by us • SU-2 • 1746 images, disjoint labels • Object detection, region labeling, depth reconstruction • Combination of PASCAL data, MSRC data, Stanford Range Image Data, other…

  28. Methods [Diagram: the cascade of slide 15, with tier indices 0…L] • Independent: level-0 models only • Groundtruth: each tier is trained using the groundtruth outputs from the previous tier • 2-CCM: parameters from tier 1 are copied to all later tiers • 5-CCM
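A sketch of this tiered training scheme under the hypothetical CCMModule interface from slide 14. Each tier is trained with context computed from the previous tier: groundtruth labels for the “Groundtruth” baseline, predicted labels otherwise:

```python
import copy

def train_ccm(modules, features, labels, num_tiers, use_groundtruth=False):
    """Train one copy of each black-box module per tier; tier 0 gets no
    context, tier l+1 gets context features from tier l's outputs."""
    tiers, context = [], None
    for _ in range(num_tiers):
        trained = {name: copy.deepcopy(m) for name, m in modules.items()}
        for name, m in trained.items():
            m.learn(features, labels[name], context_features=context)
        prev_context = context
        # Context handed to the next tier: groundtruth or predictions.
        context = labels if use_groundtruth else {
            name: m.classify(features, context_features=prev_context)
            for name, m in trained.items()}
        tiers.append(trained)
    return tiers
```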

  29. SU-1 Segment Labeling [Plot: pixel accuracy (0.65–0.75) vs. number of classification tiers (1–6) for Independent, Groundtruth, 2-CCM, and 5-CCM]

  30. SU-1 Object Detection [Plot: detection AP (0.33–0.38) vs. number of classification tiers (1–6)] • Detection AP = robust area under the precision-recall curve
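The slide defines detection AP as a robust area under the precision-recall curve; one common instantiation (shown only for illustration, not necessarily the exact variant used here) is PASCAL-style 11-point interpolated AP:

```python
import numpy as np

def average_precision_11pt(precision, recall):
    """At each of 11 recall levels, take the best precision achieved at
    that recall or higher, then average the interpolated values."""
    precision, recall = np.asarray(precision), np.asarray(recall)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recall >= r
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return ap
```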

  31. SU-1 Scene Categorization [Plot: scene category accuracy (0.60–0.80) vs. number of classification tiers (1–6)]

  32. Some Examples: SU-2

  33. Some Examples: SU-2

  34. SU-2 Results

  35. Scene Understanding • Requires combining many tasks • Object Detection • Scene Categorization • Region Labeling • Depth Reconstruction • Requires the “right” representation • Matches the questions we might ask • Operates at multiple granularities • The whole is greater than the sum… • What information can they share?

  36. Descriptive Classification [Figure: localized test with outlines; “Up”/“Down” labels] • Descriptive classification: city walking during rush hour? OR long walk on the beach? • Object level → scene level?
