
Hierarchical Framework for Content-Based Image Retrieval

This presentation discusses the implementation and progress of a hierarchical framework for content-based image retrieval. It explores the advantages, problem formulation, knowledge encoding, remaining work, and future possibilities of the framework.


Presentation Transcript


  1. Hierarchical Framework for Content-Based Image Retrieval: Status and Implementation. Bob Curry

  2. Agenda • BRIEF Review of work to date • Progress since last presentation • Overview of implementation • Remaining work • Future Possibilities

  3. Proposed Solution – Hierarchical Framework • Use complementary strengths of RSC and RIE • Create a small domain-specific RSC database (domain database) that will process semantic queries relevant to that problem domain and will generate a set of example images. • Send the set of example images to the larger RIE database (target database) to get matched images From Previous Presentation

  4. Advantages of Hierarchical Framework • Provides a semantically-relevant method of querying an image database • Decouples the knowledge organization from the image matching mechanism • Requires expert involvement to encode knowledge of smaller set of data • Images and image matching algorithms in the large target database can change and improve with no impact to knowledge • Multiple, distributed domain databases can be used against one target database From Previous Presentation

  5. Hierarchical CBIR Framework Diagram From Previous Presentation

  6. Problem Formulation • In order for this framework to be realized, several problems must be solved: • Definition and method for capturing domain knowledge that allows semantic queries • Method to allow semantic features in domain to be mapped to an example image • Definition of an interface to an RIE image retrieval system that possesses the necessary properties. • Development of a query language to specify desired semantic values for image retrieval • The solution of these problems will be the contribution of this work to the general knowledge in CBIR From Previous Presentation

  7. Knowledge Encoding for the Problem Domain • Various possible ways to encode knowledge • This approach uses • Structured Annotation (agent, action, object) to specify semantic values of interest • Shown to be effective in semantically describing visual data • Domain-Specific Ontology to represent all agents and objects • Ontology intrinsically codes general-to-specific relationships • Spatial Relationships to represent all actions • Eliminates need to develop example images for all possible actions between all possible agents and objects. From Previous Presentation

  8. Remaining Work • Refine mechanism to transform from action specification to translation/rotation data • Create prototype image synthesis process – wring out problems – generate images from examples • Expand example to fully populate ontology with instances including images. Do full end-to-end scenarios From Previous Presentation

  9. Additional Comments from Previous Presentation • In addition to Agents, Actions, Objects, use Setting as a relevant concept • Develop a method to select the best image from group of possible images

  10. Accomplishments since last Presentation • Refined mechanism to transform from Action specification to image clip placement • Defined and implemented all data structures required in ontology to support the approach • Developed general methodology and specific implementation for clip image pruning based on • Image knowledge • Action Constraints • Created support for Setting information and Setting images • Defined format and algorithm for queries

  11. Accomplishments since last Presentation • Created implementation with capabilities including: • Creating clip images • Processing query results from ontology • Synthesizing example images from queries • Tested example scenarios on simplified knowledge base of geometrical shapes • Generated example images as expected • Confirmed proper operation of all elements including pruning process

  12. Overview of Implementation • Consists of three main parts • Domain Database (Win2000) • Protégé ontology • Example domain images • Processing Module (Win2000) • MS Visual C++ code • Interpretation of query results • Image selection • Image synthesis • Target Database (Linux) • GIFT image matching system • Simulated heterogeneous images

  13. Hierarchical CBIR System Diagram

  14. Query Definition
  Query → [Agent; Action;] Object [; Setting]
  Agent → Ont_Descriptor
  Object → Ont_Descriptor
  Setting → SettingImages::ImageName
  Action → Actions::ActionName
  Ont_Descriptor → Term [, (Ont_Descriptor)]
  Term → ClassName::SlotName (= | < | >) Value
  Value → String | Integer | Real | Enum
  An example of a query is: GeometricShape::Edges > 2, GeometricShape::Verticies > 3; RightOf; GeometricShape::NAME = AcuteTriangle1; Outdoors

  15. Image Synthesis Expanded • Tasks are • Obtain Query Results • Agent Image Descriptors • Object Image Descriptors • Setting Image Descriptor • Action Descriptor • Prune Query Results • Synthesize Images

  16. Obtain Query Results

  17. Descriptors
  class CClipImage : public CImage {
      CString ImageName;
      CString FileName;
      CKnowledgeDef MyKnowledgeDef;
  };
  class CKnowledgeDef {
      Direction Direction_Front;
      Direction Direction_Top;
  };
  class CAction {
      int DeltaX;
      int DeltaY;
      int DeltaZ;
      double Theta;
      CConstraintDef MyConstraintDef;
  };
  class CConstraintDef {
      // Orientation of Agent toward Object
      Orientation AgentToObjectOr;
      // Orientation of Object toward Agent
      Orientation ObjectToAgentOr;
  };

  18. Descriptors
  enum Direction {
      Left = -3, In = -2, Down = -1, None = 0, Up = 1, Out = 2, Right = 3
  };
  enum Orientation { Top, Front, Back, Bottom, LSide, RSide, Any };

  19. Pruning – Image Knowledge/Action Constraint • Semantically relevant image knowledge is linked to each image • Example: • In a passport photo, • Direction_Front = Out; (The front of the object is pointing out of the image) • Direction_Top = Up; (The top of the object is pointing up in the image) • In a silhouette, • Direction_Front = Right; (or left) • Direction_Top = Up;

  20. Pruning – Image Knowledge/Action Constraint • Semantically relevant constraints on image knowledge between Object and Agent are specified in Action descriptor • Example: • If the Action between two people is “talking” it is appropriate to consider only images where the two people are facing each other. • AgentToObjectOr = Front; (Front of Agent is toward Object) • ObjectToAgentOr = Front; (Front of Object is toward Agent) • Requires that: Direction(AgentToObjectOr) = - Direction(ObjectToAgentOr)

  21. Pruning – Image Knowledge/Action Constraint • Putting it together • Each pair of candidate images obtained from the query is examined to see whether the direction knowledge meets the orientation constraint. If so, that pair can be combined. [Diagram: candidate Agent images (Front = Right, Top = Up; Front = Out, Top = Up) paired against candidate Object images (Front = In, Top = Up; Front = Left, Top = Up)]

  22. Test Ontology • Simple 2-D Geometric Shapes • Triangles, quadrilaterals, etc. • Allows simple example images – finite set • Includes semantic attributes such as “short”, “long”, etc. • Supports demonstration of all important aspects of synthesis • Synthesized images should be less susceptible to shortcomings of Target Database matching algorithms. • Example Query statement: • “Show all images of a fat cylinder under and touching a narrow red cone”

  23. Example Cases • Simple Presence Representation • No Action term - DONE • Single Agent, Single Object • Simple synthesis – no pruning - DONE • Multiple Agents and Objects • No Pruning required - DONE • Pruning - DONE • Compound Queries • AND • OR • BUTNOT

  24. Remaining for me • Consider changing from current DeltaX, DeltaY, DeltaZ and theta image placement to relative spatial relationships defined by Allen (Virtual Images for Similarity Retrieval) • Solves a problem of specifying touching images • More intuitive • Add Image Knowledge/Action Constraints to qualitatively classify distance of image object from observer • Create and test a realistic problem domain • Weapons and Targets – simulating battlefield image info • “Show images of missile chasing airplane over desert” • Incorporate results and recent changes into thesis

  25. Further work possible • Automation of all system interfaces • A complete turn-key test bed for future use • Expand concept beyond unary and binary (Agent – Object) to n-ary object synthesis • Includes associated enhancement of queries • Generalized implementation of image knowledge/action constraints • This could be different for different problem domains

  26. [Diagram: Problem → Query → Image Synthesizer (drawing on the Knowledge base and the Image database) → Synthesized Image]

  27. Future Work Dipti’s Pitch

  28. Example (Bob’s system) [Diagram: sample database images (man, hat, bike) and an ontology with agent → man; object → hat, two-wheelers (bike, motorbike)] • Role • Wears(agent, object) = object above agent • Rides(agent, two-wheeler) = agent above two-wheeler

  29. Example (simple queries) Query: man wears hat Query: man rides a bike

  30. Example (Dipti’s system) [Diagram: sample database images (man, hat) and an ontology with agent → man, girl; object → hat, two-wheelers (bike, motorbike); Rides(Girl, two-wheeler)] • Role • Wears(agent, object) = object above agent • Rides(agent, two-wheeler) = agent above two-wheeler

  31. Example (composite queries) Query: girl rides a bike and wears a hat

  32. Challenges • Representing the semantic content of an image • Identifying objects in an image • Resolving queries • Synthesizing images

  33. Representing the semantic content of an image • Use Description Logic (DL) to represent the content of the image in a way that is hierarchical and compositional • What is DL? It is a language that allows reasoning about information, in particular supporting the classification of descriptors. References: CLASSIC, Meghini (image description using DL), GALEN.

  34. Description Logic • A DL models a domain in terms of 3 things: • INDIVIDUALS • which represent the instances of the objects we are modeling • CONCEPTS • descriptions of a class of objects which represent the same thing • ROLES • the relationships between Concepts

  35. Example • Concept Example • person represents all human beings • fruit represents all the fruits • Individual Example • Man and woman are individuals of the concept person • Banana is an individual of the concept fruit • Role Example • Eat(person, fruit) is a relationship describing a person and the fruit they are eating

  36. DL • Using these small blocks we can build more complex expressions • Example • Eat(person, fruits) • Eat(Person, fruits) & Sits(Person,Chair) • Example • cool-student = student & drives(student, ferrari)

  37. Reasoning with DL • Subsumption (⊑) • Basic inferencing tool • Checks whether one concept is more general than another • Example: • mother ⊑ woman

  38. Reasoning with DL – Classification • A collection of descriptions can be classified using subsumption, providing a hierarchy of descriptions ranging from general to specific. [Diagram: hierarchy person ⊒ person driving car ⊒ person driving car and wearing hat; a new concept “person wearing hat” is classified under person, above “person driving car and wearing hat”]

  39. Automatic Classification • [Diagram: hierarchy person ⊒ person driving car ⊒ person driving car and wearing hat] • Where does a new concept or query “person wearing hat” classify?

  40. Architecture of DL • DL is described as being split into 2 parts: the T-Box and the A-Box • T-Box => Subsumption & Classification • A-Box => Reasons about relationships between individuals, thus providing classification and retrieval • [Diagram: T-Box concept hierarchy (mammal, vehicle; person, dog; bus, bike; person wearing cap; person driving bus; person wearing cap & driving bus) with A-Box individuals (Mary’s neighbor driving nimbus; Ted)]

  41. Identifying objects in an image • Segment sections of images and associate them with concepts • [Example image segmented into regions labeled “2 men standing” and “mountains”]

  42. Resolving queries [Diagram: sample database images (man, hat) and an ontology with agent → man, girl; object → hat, two-wheelers (bike, motorbike); roles wears, rides; Rides(Girl, two-wheeler)] • Role • Wears(agent, hat) = hat above agent • Rides(agent, two-wheeler) = agent above two-wheeler

  43. Resolving queries Query: girl rides a bike and wears a hat • 1 - Attempt to find an image described by the whole query (no synthesizing needed); if not found, then • 2 - Break the query into components (synthesizing needed) • Girl rides bike, girl wears a hat; if not found, then • 3 - Break the query into individual concepts (synthesizing needed) • Girl, bike, hats, actions… (for actions, we can use a geo-spatial model which maps the action; we can use Bob’s definitions here.)

  44. Synthesizing images • Issues • Find a method to segment out the image content regions • Scaling issues, e.g. we have an image of a big hat and a small man…

  45. To do • Protégé + OIL • Adding the OIL tag to Protégé supports the transformation rules of knowledge representation, as shown in the papers on DL. • FaCT reasoner • FaCT (Fast Classification of Terminologies) is a DL classifier which can be used for concept-validity testing. • Query interface • Explore the idea of using image models instead of images.
