
Prototypes selection with context based intra-class clustering for video annotation with MPEG-7 features

Costantino Grana, Roberto Vezzani, Rita Cucchiara

DELOS Conference, Tirrenia (PI), Italy, 13-14 February 2007


Presentation Transcript


  1. DELOS Conference Tirrenia (PI), Italy, 13-14 February 2007 Prototypes selection with context based intra-class clustering for video annotation with MPEG-7 features Costantino Grana, Roberto Vezzani, Rita Cucchiara

  2. INTRODUCTION • We present a system for the semi-automatic annotation of videos by means of Pictorially Enriched Ontologies. These are ontologies for context-based video digital libraries, enriched with pictorial concepts for video annotation, summarization and similarity-based retrieval. • We describe the extraction of pictorial concepts through video clip clustering, the storage of the ontology in MPEG-7, and the use of the ontology for stored video annotation. • (Pipeline diagram: video shot & sub-shot detection → feature extraction → classification with the ontology → results saved in MPEG-7. Example concepts: camera car.)

  3. SIMILARITY OF VIDEO CLIPS • The color histogram, with 256 bins in the HSV color space (Scalable Color Descriptor). • The 64 spatial color distributions: to account for the spatial distribution of the colors, an 8x8 grid is superimposed on the frame and the mean YCbCr color is computed for each area (Color Layout Descriptor). • The four main motion vectors: one per quarter of the frame, computed from the MPEG motion vectors extracted in that quarter. The median value (rather than the mean) has been adopted since MPEG motion vectors are not always reliable and are often affected by noise.
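The 8x8 grid-mean step behind the second descriptor can be sketched as below. This is not the full MPEG-7 Color Layout Descriptor (which additionally applies an 8x8 DCT to the grid means and quantizes the coefficients); it only illustrates the spatial color-mean computation described on the slide, on a synthetic frame.

```python
import numpy as np

def grid_mean_colors(frame, grid=8):
    """Mean color of each cell of a grid x grid partition of the frame.

    frame: H x W x 3 array (the slide uses YCbCr; any 3-channel space
    works for the illustration). Returns a (grid*grid, 3) array with
    one mean color per cell, scanned row by row.
    """
    h, w, _ = frame.shape
    ys = np.linspace(0, h, grid + 1, dtype=int)  # cell row boundaries
    xs = np.linspace(0, w, grid + 1, dtype=int)  # cell column boundaries
    cells = []
    for i in range(grid):
        for j in range(grid):
            cell = frame[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            cells.append(cell.reshape(-1, 3).mean(axis=0))
    return np.array(cells)

# synthetic 120x160 "frame" with random colors
frame = np.random.randint(0, 256, size=(120, 160, 3)).astype(float)
desc = grid_mean_colors(frame)
print(desc.shape)  # (64, 3): the 64 spatial color means of the slide
```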

  4. SIMILARITY OF VIDEO CLIPS • A generalization of image similarity. • Usually a single key frame is used per shot, but the variation within a shot may be too large. • Our approach: • M representative frames uniformly selected from the shot • The worst matches are removed to keep only reliable correspondences
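The two bullets above can be sketched as a clip-to-clip dissimilarity. The exact matching scheme is not spelled out on the slide, so the following is one hypothetical reading: match each of the M frames of one clip to its best counterpart in the other clip, then discard the worst of those M distances before averaging.

```python
import numpy as np

def clip_dissimilarity(clip_a, clip_b, drop_worst=1):
    """Dissimilarity between two clips, each an (M, D) array of
    per-frame feature vectors.

    For each frame of clip_a, take its best-matching frame in clip_b,
    then drop the `drop_worst` largest of those distances before
    averaging (hypothetical reading of "remove the worst matches").
    """
    # pairwise Euclidean distances, shape (M_a, M_b)
    d = np.linalg.norm(clip_a[:, None, :] - clip_b[None, :, :], axis=2)
    best = d.min(axis=1)                            # best match per frame of clip_a
    best = np.sort(best)[:len(best) - drop_worst]   # remove the worst matches
    return best.mean()

a = np.random.rand(5, 8)
print(clip_dissimilarity(a, a))  # 0.0 for identical clips
```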

  5. PICTORIALLY ENRICHED ONTOLOGY CREATION • After the definition of the textual domain ontology, a pictorially enriched ontology requires the selection of the prototypal clips that can constitute pictorial concepts as specializations of each ontology category. A large training set of clips for each category must be defined, and an automatic process extracts some visual prototypes for every category. • Using the previously defined features and dissimilarity function, we employ a hierarchical clustering method based on Complete Link. • This technique guarantees that each clip in a cluster is similar to every other clip in the cluster, and that any clip outside the cluster has a dissimilarity greater than the maximum distance between cluster elements. • For this clustering method the dissimilarity between two clusters is defined as d(Ci, Cj) = max d(x, y), over x in Ci and y in Cj.
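Complete-link agglomerative clustering can be sketched in a few lines: repeatedly merge the two clusters whose *maximum* pairwise element distance is smallest. This is the standard algorithm the slide names, shown here on toy 2-D points rather than clip features.

```python
import numpy as np

def complete_link(points, n_clusters):
    """Agglomerative clustering with the complete-link rule:
    d(Ci, Cj) = max over x in Ci, y in Cj of d(x, y)."""
    clusters = [[i] for i in range(len(points))]
    # precompute all pairwise point distances
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete-link dissimilarity between clusters a and b
                d = max(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]   # merge the closest pair of clusters
        del clusters[b]
    return clusters

pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
print(sorted(sorted(c) for c in complete_link(pts, 2)))  # [[0, 1], [2, 3]]
```

Stopping at a fixed `n_clusters` is only for the demo; the next slides select the level automatically.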

  6. Automatic clustering level selection • A selection rule has to be based on the cluster topology: a trade-off between faithful data representation and a small number of clusters. • Even if we can define an objective function, it is not always possible to choose the right level: in real cases the data may fit that definition badly. • An example is provided by our experience with Dunn's Separation Index, which was conceived for this particular clustering approach. • Better results (in terms of a subjective evaluation) have been obtained with the following approach: start from the definitions of the cluster diameter (the maximum distance between two elements of a cluster) and of the delta distance between two clusters (the minimum distance between their elements). • From these we obtain the corresponding maximum diameter and minimum delta distance at level n.

  7. Automatic clustering level selection • We define the Clustering Score at level n from the maximum diameter and the minimum delta distance at that level. • The selected level is the one which maximizes the clustering score. (Note that the paper erroneously says "minimize".) • Note that both the maximum diameter and the minimum delta distance between clusters are monotonically increasing with the level, so we can stop the clustering as soon as a local optimum of the score is reached (leading to a computational improvement).
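The Clustering Score formula itself was an image on the original slide and is lost in this transcript. A minimal sketch, assuming a Dunn-like ratio (minimum inter-cluster delta over maximum diameter, so that compact, well-separated partitions score high and the best level maximizes it) is:

```python
import numpy as np

def diameter(cluster, dist):
    """Largest pairwise distance inside a cluster (0 for singletons)."""
    return max((dist[i, j] for i in cluster for j in cluster), default=0.0)

def delta(c1, c2, dist):
    """Smallest distance between elements of two clusters."""
    return min(dist[i, j] for i in c1 for j in c2)

def clustering_score(partition, dist):
    """Hypothetical Dunn-like score: min inter-cluster delta / max diameter.
    The paper's actual formula is not reproduced in the transcript."""
    diam_max = max(diameter(c, dist) for c in partition)
    delta_min = min(delta(partition[a], partition[b], dist)
                    for a in range(len(partition))
                    for b in range(a + 1, len(partition)))
    return delta_min / diam_max if diam_max > 0 else float("inf")

# four 1-D points: two natural groups {0, 1} and {10, 11}
pts = np.array([[0.0], [1.0], [10.0], [11.0]])
dist = np.abs(pts - pts.T)
good = [[0, 1], [2, 3]]   # the natural partition
bad = [[0, 2], [1, 3]]    # a partition that mixes the groups
print(clustering_score(good, dist), clustering_score(bad, dist))  # 9.0 0.1
```

Under this assumption the natural partition scores far higher, so selecting the level of maximum score recovers it.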

  8. Intra-class clustering with context data • The presented choice of prototypes is guided by how similar the original clips are in the feature space, without considering the elements belonging to the other classes (context data). • This may lead to a prototype selection which is indeed representative of the class but lacks the properties useful for discrimination purposes.

  9. Intra-class clustering with context data • We define an isolation coefficient for each clip with respect to the context data, and from it a class-based dissimilarity measure between two clips. • Even if the central points (B, C) are closer to each other than to the corresponding colored ones (A and D respectively), the interposed purple distribution largely increases their dissimilarity measure, preventing their merge into a single cluster.
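The formulas on this slide were images and are lost in the transcript. The sketch below is therefore only a guess at the mechanism, not the paper's definitions: it assumes the isolation coefficient is the distance to the nearest clip of another class, and that the class-based dissimilarity inflates the plain distance when either clip sits close to context data. On the slide's A/B/C/D example this reproduces the described behavior (B and C are pushed apart by an interposed other-class point).

```python
import numpy as np

def isolation(x, context):
    """Hypothetical isolation coefficient: distance from clip x to the
    nearest clip of any other class (the "context data")."""
    return min(np.linalg.norm(x - c) for c in context)

def class_dissimilarity(a, b, context):
    """Hypothetical class-based dissimilarity: the plain distance d(a, b),
    inflated when either clip is poorly isolated from the context data."""
    d = np.linalg.norm(a - b)
    return d / (isolation(a, context) * isolation(b, context))

# 1-D toy version of the slide's figure: A, B, C, D on a line,
# with one other-class ("purple") point interposed between B and C.
A, B, C, D = (np.array([v]) for v in (0.0, 4.0, 6.0, 10.0))
ctx = [np.array([5.0])]

# B and C are geometrically closest, but the context point between them
# makes their class-based dissimilarity the largest of the two pairs.
print(class_dissimilarity(B, C, ctx) > class_dissimilarity(A, B, ctx))  # True
```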

  10. Intra-class clustering with context data

  11. ONTOLOGIES IN MPEG-7 • Ontologies may be effectively defined with OWL, but this language does not contain any construct for including a pictorial representation. Such a feature is, on the other hand, present in the MPEG-7 standard. MPEG-7 has much less sophisticated tools for knowledge representation, since its purpose of standardization limits the definition of new data types, concepts and complex structures. Nevertheless, the MPEG-7 standard can naturally include pictorial elements such as objects, key-frames, clips and visual descriptors in the ontology description. • Therefore, our system stores the pictorially enriched ontology following the directions of the MPEG-7 standard; in particular it uses a double description: a ModelDescriptionType DS, which includes a CollectionModelType DS, combined with a ClassificationSchemeDescriptionType DS.

<Description xsi:type="ModelDescriptionType">
  <Model xsi:type="CollectionModelType">
    <Label href="urn:mpeg:mpeg7:cs:OntologiaMpeg7:CameraCar"/>
    <Collection xsi:type="ContentCollectionType" id="prototype0">
      ……………………………
    </Collection>
    <Collection xsi:type="ContentCollectionType" id="prototype1">
      ……………………………
    </Collection>
  </Model>
</Description>

<Description xsi:type="ClassificationSchemeDescriptionType">
  <ClassificationScheme uri="urn:mpeg:mpeg7:cs:OntologiaMpeg7">
    <Term termID="CameraCar"/>
    <Term termID="External car view"/>
    <Term termID="Spectators"/>
    <Term termID="People"/>
  </ClassificationScheme>
</Description>
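Such descriptions are plain XML, so the classification scheme can be read back with any XML parser. A minimal sketch with Python's standard library (note: the `xsi` namespace must be declared for the fragment to parse standalone; in a full MPEG-7 document it is declared on the root `Mpeg7` element):

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The ClassificationSchemeDescriptionType fragment from the slide,
# with the xsi namespace declared so it parses on its own.
doc = f"""
<Description xmlns:xsi="{XSI}" xsi:type="ClassificationSchemeDescriptionType">
  <ClassificationScheme uri="urn:mpeg:mpeg7:cs:OntologiaMpeg7">
    <Term termID="CameraCar"/>
    <Term termID="External car view"/>
    <Term termID="Spectators"/>
    <Term termID="People"/>
  </ClassificationScheme>
</Description>
"""

root = ET.fromstring(doc)
terms = [t.get("termID") for t in root.iter("Term")]
print(terms)  # ['CameraCar', 'External car view', 'Spectators', 'People']
```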

  12. Example results

  13. Semi-automatic annotation analysis (annotate and correct)

  14. Screenshot of the classification scheme manager window of the semi-automatic annotation framework, showing the classification scheme together with the selected prototypes.

  15. Conclusions and future work • We presented a system for the creation of a specific-domain ontology, enriched with visual features and references to multimedia objects. • The ontology is stored in an MPEG-7 compliant format and can be used to annotate new videos. • This approach allows the system to behave differently simply by providing a different ontology, thus extending its applicability to mixed-source Digital Libraries. • We are working on extending the similarity measures from key-frame matching to clip matching, by means of the Mallows distance (a linear-programming-based distance). • Other features are being considered, and extensive tests are being performed to assess the scalability of the proposed approach.
