
Recognition by Parts



Presentation Transcript


  1. Recognition by Parts Visual Recognition Lecture 12 “The whole is equal to the sum of its parts” Euclid

  2. Main approaches to recognition: • Pattern recognition • Invariants • Alignment • Part decomposition • Functional description

  3. Recognize!

  4. “One of the most interesting aspects of the world is that it can be considered to be made up of patterns. A pattern is essentially an arrangement. It is characterized by the order of the elements of which it is made rather than by the intrinsic nature of the elements” Norbert Wiener

  5. Nonsense Object • The description reflects the workings of a representational system • Segmentation at regions of deep concavity • Parts are described with common volumetric terms • The manner of segmentation and analysis into components does not depend on our familiarity with the object

  6. Issues • Why parts? Why partition the shape? • How does the visual system decompose shapes into parts? • Are parts chosen arbitrarily by the visual system? • How are the 3D parts of an object inferred from its 2D projection delivered by the eye? • Etc.

  7. Between Speech and Object Recognition (OR) • The number of object categories rivals the number of words that can be identified from speech • Speech perception: by identification of primitive elements – phonemes • Small set of primitives (44 in English), each with a handful of attributes • The representational power derives from combinations of the primitives

  8. OR – The Visual Domain • Primitives – modest number of simple geometric components • Generally convex and volumetric (cylinders, blocks, cones, etc.) • Segmentation at regions of sharp concavity • Primitives derive from combinations of few qualitative characteristics of the edges in the 2D image (straight vs. curved, symmetry etc.) • These particular properties of edges are invariant over changes in orientation and can be determined from just a few points on each edge • Tolerance for variations of viewpoint, occlusion, noise • The representational power derives from the enormous number of combinations

  9. Count vs. Mass Noun Objects • Categorization of isolated (unanticipated) objects • Modeling is limited to concrete entities with specified boundaries • Mass nouns (water, sand) do not have a simple volumetric description and are identified differently, primarily through surface characteristics (texture, color)

  10. Unexpected Object Recognition • Is possible (not an obvious conclusion) • Can be done rapidly • When viewed from novel orientations • Under moderate levels of visual noise • When partially occluded • When it is a new exemplar of a category

  11. Resulting Constraints • Access to mental representation should not be dependent on absolute judgment of quantitative detail • The information that is the basis of recognition should be relatively invariant with respect to orientation and modest degradation • Partial matches should be computable

  12. RBC: Recognition-By-Components The contribution: a proposal for a particular vocabulary of components derived from perceptual mechanisms and its account of how an arrangement of these components can access a representation of an object in memory

  13. Issues • Stages up to and including the identification of components are assumed to be bottom-up • It is likely that top-down routes (e.g. from expectancy, object familiarity, scene constraints) will be observed at a number of the stages (e.g. segmentation, component definition, matching) – omitted in the interests of simplicity • Matching of the components occurs in parallel • Partial matches are possible (degree of match is proportional to the similarity in the components between image and representation)
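
The partial-match bullet can be illustrated with a simple similarity score between component lists. This is only a sketch: the multiset Jaccard overlap, the name `match_degree`, and the geon labels are assumptions made for illustration, not part of RBC as stated on the slide.

```python
from collections import Counter

def match_degree(image_geons, model_geons):
    """Shared components divided by all components (multiset Jaccard overlap)."""
    img, mod = Counter(image_geons), Counter(model_geons)
    shared = sum((img & mod).values())   # per-label minimum counts
    total = sum((img | mod).values())    # per-label maximum counts
    return shared / total if total else 0.0

# Hypothetical labels: a stored three-part model and a view with one part occluded.
lamp_model = ["cylinder", "cone", "block"]
occluded_view = ["cylinder", "block"]
score = match_degree(occluded_view, lamp_model)
```

Any score that grows with the number of shared components would serve the same purpose; the overlap ratio is just the simplest choice that stays between 0 and 1.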

  14. Geons - Units of Representation • Segmentation into separate regions at points of deep concavity (particularly at cusps where there are discontinuities in curvature) • Transversality – paired concavities arise whenever convex volumes are joined • Each segmented region is approximated by one of a possible set of simple components = geons (geometrical ions) • Can be modeled by generalized cones: volume swept out by a cross section moving along an axis
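
Segmentation at concavities can be sketched on a polygonal outline by looking for vertices where the boundary turns the "wrong" way. The function name `concave_vertices`, the counter-clockwise ordering, and the sample outline are assumptions for illustration; a real system would work on curvature of extracted contours rather than a clean polygon.

```python
def concave_vertices(polygon):
    """Return indices of concave (reflex) vertices of a simple polygon.

    Assumes vertices are (x, y) tuples listed counter-clockwise.
    """
    n = len(polygon)
    result = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        # z-component of the cross product of the incoming and outgoing edges
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross < 0:  # right turn on a counter-clockwise outline = concavity
            result.append(i)
    return result

# Hypothetical outline with a single notch in its top edge.
outline = [(0, 0), (4, 0), (4, 2), (2, 1), (0, 2)]
cuts = concave_vertices(outline)
```

Here `cuts` marks the notch vertex, which is where a part boundary would be proposed.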

  15. Geons • Are hypothesized to be simple, typically symmetrical volumes lacking sharp concavities (e.g. blocks, cylinders, spheres) • Can be differentiated on the basis of perceptual properties in the 2D image that are readily detectable and relatively independent of viewing position and degradation (e.g. good continuation, symmetry) • Objects can be complex – the units are simple and regular

  16. Relations Among the Geons • The arrangement of primitives is necessary for representing a particular object • Different arrangements of the same components can lead to different objects
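
A minimal sketch of why arrangement matters: two descriptions that use the same geons but different relations are different objects. The triple encoding, the relation names, and the cup-like and pail-like examples are illustrative assumptions.

```python
# Each object is a set of (geon, relation, geon) triples; all labels are hypothetical.
cup = frozenset({("curved_cylinder", "side-attached-to", "cylinder")})
pail = frozenset({("curved_cylinder", "top-attached-to", "cylinder")})

def components(obj):
    """Collect the geon labels used by an object description, ignoring relations."""
    return {g for (a, _, b) in obj for g in (a, b)}

same_parts = components(cup) == components(pail)   # same vocabulary of geons
same_object = cup == pail                          # but a different arrangement
```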

  17. Perceptual Basis for RBC • Certain properties of edges in 2D are taken by the visual system as strong evidence that the 3D edges contain those same properties • Nonaccidental properties – would only rarely be produced by accidental alignments of viewpoint and object features Five nonaccidental properties: • Collinearity – the edge in the 3D world is also straight • Curvilinearity – smoothly curved elements in the image are inferred to arise from smoothly curved features in the 3D world • Symmetry – the object projecting the image is also symmetrical • Parallelism • Cotermination
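
The collinearity property can be checked numerically: a straight 3D edge always projects to a straight 2D edge, while a bent 3D edge projects to a straight line only from special viewpoints. The pinhole model, the function names, and the sample coordinates below are assumptions chosen for illustration.

```python
def project(point, focal=1.0):
    """Pinhole projection of a 3D point (x, y, z) with z > 0 onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def collinear_2d(p, q, r, tol=1e-9):
    """True if three 2D points lie on one line (zero signed triangle area)."""
    area2 = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(area2) < tol

# Three points on one straight 3D edge: start + t * direction
start, direction = (1.0, 2.0, 5.0), (0.5, -1.0, 2.0)
edge = [tuple(s + t * d for s, d in zip(start, direction)) for t in (0.0, 1.0, 2.0)]
straight_in_image = collinear_2d(*[project(p) for p in edge])

# A bent 3D edge (last point moved off the line) seen from a generic viewpoint
bent = [(1.0, 2.0, 5.0), (1.5, 1.0, 7.0), (2.0, 1.0, 9.0)]
bent_in_image = collinear_2d(*[project(p) for p in bent])
```

This is the sense in which image straightness is "nonaccidental" evidence: the straight edge passes the test from every viewpoint, the bent one only by coincidence.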

  18. Nonaccidental Properties Witkin & Tenenbaum (1983): a surface's silhouette can override the perceptual interpretation of the luminance gradient

  19. Penrose Impossible Triangle

  20. Penrose Impossible Triangle • Cotermination – accidental alignment of the ends of noncoterminous segments

  21. Muller-Lyer Illusion

  22. Muller-Lyer Illusion

  23. Muller-Lyer Illusion Y, arrow, and L vertices allow inference as to the identity of the volume in the image

  24. Generating Geons from Generalized Cones (GC) • The primitives should be rapidly identifiable and invariant over viewpoint and noise • Differences among components are based on differences in nonaccidental properties • Variation over the nonaccidental relations of four attributes of GC generates a set of 36 geons

  25. Geon Set • The characteristics of the cross section: Shape, Symmetry, Constancy of size along the axis (2 x 3 x 3) • The shape of the axis (x 2) • [Figures 6 and/or 7 here]
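
The count of 36 follows from taking every combination of the four attributes (2 x 3 x 3 x 2). The sketch below enumerates them; the counts come from the slide, while the individual value names are assumptions added for readability.

```python
from itertools import product

# Counts per attribute follow the slide; the value names are assumptions.
EDGE = ("straight", "curved")                                     # cross-section shape
SYMMETRY = ("rotational+reflective", "reflective", "asymmetric")  # cross-section symmetry
SIZE = ("constant", "expanding", "expanding-then-contracting")    # size along the axis
AXIS = ("straight", "curved")                                     # shape of the axis

geons = list(product(EDGE, SYMMETRY, SIZE, AXIS))
n_geons = len(geons)  # 2 * 3 * 3 * 2
```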

  26. Nonaccidental 2D Contrasts Among Geons • The values of the 4 attributes can be directly detected as differences in nonaccidental properties, e.g.: • Cross-section edges and curvature of the axis – collinearity or curvilinearity • Constant vs. expanding size of the cross section – parallelism • Specification of the above is sufficient to uniquely classify a given arrangement of edges as one of the 36 geons

  27. More Distinctive Nonaccidental Differences The arrangement of vertices – a richer description

  28. RBC - Summary • A specific set of primitives is derived from a small number of independent characteristics of the input • The perceptual system is designed to represent the free combination of a modest number of primitives based on simple perceptual contrasts • Geons are uniquely specified from their 2D image properties (so a 3D object-centered reconstruction is not needed) • The input is mapped onto this modest number of primitives. Then, using a representational system, we can code and access free combinations of these primitives

  29. RBC – General Principles • A line drawing which represents discontinuities is an efficient description and sufficient for primal access • Objects are better represented and analyzed by decomposing them into their natural components – parts • A qualitative description of the components is necessary and sufficient to permit fast access to a database (DB) of object models • Non-accidental instances of viewpoint-invariant features in the 2D line drawing are sufficient to permit fast access to the qualitative model of a 3D object • Primal access for visual OR is obtained by matching a description of the spatial structure of components making up the object to an indexed DB of models in a similar representation

  30. RBC – Computational Hypotheses • Five specific classes of 2D line groupings are sufficient to access the parts representation • Segmentation should happen at concavities in the outline of an object • The geons form an efficient qualitative shape representation for the parts which is suitable for primal access • The symbolic description for objects and models should include geon labels, aspect ratios, and relative sizes of parts

  31. Implementations • PARVO - Bergevin and Levine 1988 • OPTICA – Dickinson, Rosenfeld, Pentland 1989 • Munck-Fairwood 1991 • Pentland and Sclaroff 1991 • Raja and Jain 1992

  32. Example - Recovering Geons using Superquadrics • Lamé curves (1818) • Superellipse (Hein 1960) • Exponent parameters: p is an even positive integer and q is an odd positive integer
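
The slide names these curves without formulas. Assuming the standard definitions (the symbols a, b and m are not taken from the slide), a Lamé curve with a rational exponent can be written as below. An even p and odd q keep the powers real and non-negative for negative x and y. The superellipse is the same family written with absolute values, which allows any real m > 0.

```latex
% Lamé curve with rational exponent m = p/q (p even, q odd)
\left(\frac{x}{a}\right)^{m} + \left(\frac{y}{b}\right)^{m} = 1,
\qquad m = \frac{p}{q} > 0

% Superellipse: same family, any real m > 0
\left|\frac{x}{a}\right|^{m} + \left|\frac{y}{b}\right|^{m} = 1
```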

  33. Superellipse From star-shape to a square in the limit
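
The transition from star shape to square can be seen by measuring how far the curve reaches along the 45-degree diagonal as the exponent grows. This is a sketch assuming the form |x/a|^m + |y/b|^m = 1 with a = b = 1; the function names and the sampled exponents are illustrative choices.

```python
import math

def superellipse_point(theta, a=1.0, b=1.0, m=2.0):
    """Point on |x/a|^m + |y/b|^m = 1 at parameter angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    x = a * math.copysign(abs(c) ** (2.0 / m), c)
    y = b * math.copysign(abs(s) ** (2.0 / m), s)
    return x, y

def diagonal_reach(m):
    """Distance from the origin to the curve along the 45-degree diagonal (a = b = 1)."""
    x, y = superellipse_point(math.pi / 4, m=m)
    return math.hypot(x, y)

# Small m pinches the curve toward the axes (star), m = 2 is a circle,
# and large m pushes the diagonal point toward the square's corner at sqrt(2).
reach = {m: diagonal_reach(m) for m in (0.5, 2.0, 50.0)}
```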

  34. Superellipsoid 3D surface is obtained by the spherical product of two 2D curves
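
The spherical product can be sketched directly. One 2D curve is parameterized by a latitude-like angle and the other by a longitude-like angle, and the surface point is built from their components. The parameter names `eta`, `omega`, `e1`, `e2` and the size tuple `a` follow the common superquadric convention and are assumptions here, not symbols taken from the slide.

```python
import math

def signed_pow(value, exponent):
    """sign(value) * |value|**exponent, so odd symmetry is kept for any real exponent."""
    return math.copysign(abs(value) ** exponent, value)

def spherical_product(m, h):
    """Spherical product of 2D curve points m = (m1, m2) and h = (h1, h2)."""
    return (m[0] * h[0], m[0] * h[1], m[1])

def superellipsoid_point(eta, omega, a=(1.0, 1.0, 1.0), e1=1.0, e2=1.0):
    """Surface point for -pi/2 <= eta <= pi/2 and -pi <= omega < pi."""
    m = (signed_pow(math.cos(eta), e1), signed_pow(math.sin(eta), e1))
    h = (signed_pow(math.cos(omega), e2), signed_pow(math.sin(omega), e2))
    x, y, z = spherical_product(m, h)
    return (a[0] * x, a[1] * y, a[2] * z)
```

With `e1 = e2 = 1` and unit sizes this reduces to the ordinary parameterization of a unit sphere, which is a quick sanity check on the construction.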

  35. [Figure: superellipsoid shapes for e1 = 0.1, 1, 2 and e2 = 0.1, 1, 2]

  36. Superquadrics Barr 1981 – extension to include superhyperboloids (of one or two pieces) and supertoroids

  37. Superquadrics in General Position From world coordinates to SQ centered (11 DOF)
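
The 11 degrees of freedom can be read as 3 sizes, 2 shape exponents, 3 rotation angles and 3 translations. The sketch below maps a world point into the superquadric-centered frame and evaluates the usual inside-outside function there. The ZYZ Euler-angle convention and all function names are assumptions, since the slide does not specify them.

```python
import math

def rotation_zyz(phi, theta, psi):
    """Rotation matrix Rz(phi) @ Ry(theta) @ Rz(psi) as nested tuples (assumed convention)."""
    cf, sf = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(psi), math.sin(psi)
    return (
        (cf * ct * cp - sf * sp, -cf * ct * sp - sf * cp, cf * st),
        (sf * ct * cp + cf * sp, -sf * ct * sp + cf * cp, sf * st),
        (-st * cp, st * sp, ct),
    )

def world_to_sq(point, rotation, translation):
    """Map a world point into the superquadric-centered frame: R^T (p - t)."""
    d = [p - t for p, t in zip(point, translation)]
    return tuple(sum(rotation[row][col] * d[row] for row in range(3)) for col in range(3))

def inside_outside(point, size, e1, e2, angles, translation):
    """F < 1 inside, F = 1 on the surface, F > 1 outside (11 parameters in total)."""
    x, y, z = world_to_sq(point, rotation_zyz(*angles), translation)
    a1, a2, a3 = size
    xy = abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + abs(z / a3) ** (2.0 / e1)
```

Fitting a superquadric to range data then amounts to adjusting these 11 numbers so that the function is close to 1 at the measured points.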

  38. Issues Domain: • Suitable mainly for categorization. Problems: • Extracting parts from the image is often difficult and unreliable. • Many objects cannot be distinguished by their part structure only. • Metric information is essential in many cases.
