Video Indexing and Retrieval using an MPEG7 Based Inference Network

1. 1 Video Indexing and Retrieval using an MPEG7 Based Inference Network Andrew Graves andrew@dcs.qmul.ac.uk

2. 2

3. 3 Introduction

4. 4 Project Aims Metadata based retrieval using MPEG7 Assume we have the metadata Build a modular retrieval system {Video analysis -> MPEG7 -> Video retrieval} Exploit MPEG7 structure, context and concepts

5. 5 Background Information Retrieval IR Models {Inference Network model} Text retrieval: Indexing & Retrieval; Term Statistics Structured Information Retrieval at QM & Dortmund Multimedia MPEG 1/2/4/7 MPEG7, "Multimedia Content Description Interface" Video Indexing and Retrieval {Annotation, Content, Metadata} based approaches Assume we have the Metadata {Shot/Scene detection} Feature extraction / Acquisition of Semantics

6. 6 MPEG7 {Description Definition Language (DDL), Descriptor (D) and Description Scheme (DS)} Just another XML format

7. 7 Inference Network Model Probabilistic Framework for IR that uses a Bayesian Network (so based on proven statistical theory) Complete Network = Document Network + Query Network + Attachment + Evaluation Complete Network used to estimate the "probability of relevance" for each Document Node

8. 8 Positioning Inference Network Allows a "combination of evidence" Allows hierarchical document nodes (structure) MPEG7 Structural, conceptual & contextual info So, we process DSs and Ds to form IN

9. 9 In Other Words... Build a Document Network that represents all of the Ds (concepts) and DSs (structure) Attach a Query Network and evaluate

10. 10 MPEG7 Collection

11. 11 Collection

12. 12 Annotations "Abstract" from box "StructuredAnnotation" for each scene to specify exactly participants and location "FreeTextAnnotation" to describe action "FreeTextAnnotation" with speech extracts

13. 13 MPEG7 Excerpt #1 <AudioVisual id="Communication Problems"> <MediaInformation/> <MediaProfile/> <CreationInformation> <Creation> <Title>Communication Problems </Title> <Abstract> <FreeTextAnnotation> It's not a wise man who entrusts his furtive winnings on the horses to a geriatric Major, but Basil was never known for that quality. Parting with those ill gotten gains was Basil's first mistake; his second was to tangle with the intermittently deaf Mrs Richards. </FreeTextAnnotation> </Abstract> <Creator>BBC</Creator> </Creation> <Classification> <Genre>Comedy</Genre> <Language>English</Language> </Classification> </CreationInformation>

14. 14 MPEG7 Excerpt #2 <SegmentDecomposition decompositionType="temporal" gap="true" id="TableOfContent" overlap="false"> <Segment id="A satisfied customer" xsi:type="AudioVisualSegmentType"> <TextAnnotation> <FreeTextAnnotation>Basil receives a tip on a horse from a customer. Sybil warns Basil not to bet. Basil says Sybil is a dragon to Polly.</FreeTextAnnotation> <StructuredAnnotation> <Who>Basil,Sybil,Major,Polly</Who> <Where>Lobby</Where> </StructuredAnnotation> </TextAnnotation> <SegmentDecomposition decompositionType="temporal" gap="true" overlap="false"> <Segment id="Shot_1" xsi:type="AudioVisualSegmentType"> <TextAnnotation><FreeTextAnnotation> Glad you enjoyed it. Polly will you get Mr Firkins bill please. </FreeTextAnnotation></TextAnnotation> <MediaTime><MediaIncrDuration timeUnit="PT1N25F">86</MediaIncrDuration> </MediaTime> </Segment> </SegmentDecomposition> <MediaTime> <MediaIncrDuration timeUnit="PT1N25F">3028</MediaIncrDuration> </MediaTime>

15. 15 Model

16. 16 Model Overview Document Network (built during indexing) Static, contains information about the collection Query Network (built during retrieval) Query Language based upon INQUERY Statistical operators (and approximations of Boolean) Attachment process Builds the "Complete Network" Create DN->QN links where concepts are the same Evaluation process Calculate probability of relevance for each element

17. 17 Document Network Document Node layer. Created from MPEG7 structural aspects Context Node layer. Provides contextual information Concept Node layer. Contains all the concepts present in the collection

18. 18 Query Network 1 Query text is parsed to produce Query tree Inverted DAG with a single final node Terms & Operators Boolean Operators: #and #or #not Statistical Operators: #sum #wsum #max Constraints: #constraint #tree
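The operators listed on this slide can be illustrated with the closed-form belief combinations commonly given for INQUERY-style inference networks. This is a minimal sketch: the slides only say the query language is "based upon INQUERY", so whether this system uses exactly these formulas is an assumption, and the function names are hypothetical.

```python
from math import prod


def op_and(beliefs):
    # #and: product of the parent beliefs
    return prod(beliefs)


def op_or(beliefs):
    # #or: one minus the probability that every parent is false
    return 1.0 - prod(1.0 - b for b in beliefs)


def op_not(belief):
    # #not: complement of the single parent belief
    return 1.0 - belief


def op_sum(beliefs):
    # #sum: arithmetic mean of the parent beliefs
    return sum(beliefs) / len(beliefs)


def op_wsum(weighted_beliefs):
    # #wsum: weighted mean; input is a list of (weight, belief) pairs
    total_weight = sum(w for w, _ in weighted_beliefs)
    return sum(w * b for w, b in weighted_beliefs) / total_weight


def op_max(beliefs):
    # #max: largest parent belief
    return max(beliefs)


if __name__ == "__main__":
    # Illustrative beliefs only; these values are not taken from the slides.
    print(op_and([0.8, 0.5]), op_or([0.8, 0.5]), op_wsum([(0.2, 0.5), (0.8, 1.0)]))
```

The #and and #or forms are the "approximations of Boolean" in the sense that they return graded beliefs rather than true/false.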

19. 19 {No; Simple; Complex} constraints #constraint and #tree Query Network 2

20. 20 Attachment Attachment creates DN->QN links (at concept level) Find candidate links & then consider constraints Strength of link can be determined by closeness of match Perform Tree Matching to find "Edit Distance" (ED) Use ED by a) testing against a threshold, or b) reducing the link weight
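The two uses of the Edit Distance can be sketched as follows. Note the substitution: instead of a full tree edit distance, this sketch uses plain Levenshtein distance between two context paths (sequences of element names), which is a simpler stand-in and not necessarily what the system computes. The function names and the `1 / (1 + ED)` weighting are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of labels (stand-in for tree matching)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            cur.append(min(prev[j] + 1,          # delete x
                           cur[j - 1] + 1,       # insert y
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def attach_threshold(doc_path, constraint_path, threshold):
    """Threshold variant: return link weight 1.0 if ED is below the threshold, else None (no link)."""
    ed = edit_distance(doc_path, constraint_path)
    return 1.0 if ed < threshold else None


def attach_weighted(doc_path, constraint_path):
    """Weighted variant: always link, but reduce the weight as ED grows (assumed formula)."""
    ed = edit_distance(doc_path, constraint_path)
    return 1.0 / (1.0 + ed)


if __name__ == "__main__":
    doc = ["Video", "CreationInformation", "Creation"]
    constraint = ["CreationInformation"]
    print(edit_distance(doc, constraint))
    print(attach_threshold(doc, constraint, threshold=3))
    print(attach_weighted(doc, constraint))
```

The threshold variant gives a hard accept/reject decision, while the weighted variant keeps every candidate link and lets a poor match contribute less evidence.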

21. 21 Evaluation After attachment we have formed the Complete Network This is evaluated for every Document Node and resultant probabilities are used for ranking All nodes required are evaluated using 1) Value of parent nodes 2) Conditional probabilities Nodes may inherit parental contexts (Link Inheritance) The parents outside the constraint may be ignored (Path Cropping)

22. 22 Extraction Structural Extraction. About the hierarchical makeup. Attribute Extraction. Data about the structural elements. Concept Extraction. Obtain the concepts that appear. Text preprocessing Luhn's Analysis, Term Statistics
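The three extraction steps can be sketched on a small, well-formed fragment modelled on Excerpt #2. This is a sketch under stated assumptions: the sample XML is simplified (the `xsi:type` attributes are dropped), the stop-word list is a tiny hypothetical one, and only raw term frequencies are computed, not Luhn's frequency cut-offs.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified fragment modelled on Excerpt #2 (assumption: not the real file).
SAMPLE = """
<Segment id="A satisfied customer">
  <TextAnnotation>
    <FreeTextAnnotation>Basil receives a tip on a horse from a customer. Sybil warns Basil not to bet.</FreeTextAnnotation>
  </TextAnnotation>
  <Segment id="Shot_1">
    <TextAnnotation>
      <FreeTextAnnotation>Glad you enjoyed it.</FreeTextAnnotation>
    </TextAnnotation>
  </Segment>
</Segment>
"""

# Hypothetical stop-word list, just large enough for the sample text.
STOPWORDS = {"a", "on", "from", "not", "to", "you", "it", "the"}


def extract(segment, depth=0, out=None):
    """Walk nested Segment elements and record structure, attributes and concepts."""
    if out is None:
        out = []
    # Concept extraction: tokenise this segment's own free text and count terms.
    text = " ".join(t.text or "" for t in segment.findall("TextAnnotation/FreeTextAnnotation"))
    tokens = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    out.append({
        "id": segment.get("id"),              # structural extraction: which element
        "depth": depth,                       # structural extraction: nesting level
        "attributes": dict(segment.attrib),   # attribute extraction
        "concepts": Counter(tokens),          # term statistics
    })
    for child in segment.findall("Segment"):
        extract(child, depth + 1, out)
    return out


records = extract(ET.fromstring(SAMPLE))
for record in records:
    print(record["depth"], record["id"], dict(record["concepts"]))
```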

23. 23 Probability Estimation Probability document is relevant to the query Conditional probabilities between the nodes Context->Context (eg: Video->Scene) Context->Concept

24. 24 Experiments

25. 25 Experiment Overview Software written in {C++ NT} Not using INQUERY 1. Basic. Does the model work at all? 2. Real Data. Does the model work with our real metadata collection? 3. Metrics. What are the precision/recall metrics?

26. 26 Remember... Link Inheritance (LI) Link Degradation (LID) Tree Matching (TM) Threshold (TMT): The attachment is made only if the constraint is met, and if the Edit Distance is below the specified threshold. Weighted (TMW): The attachment is made if the constraint is met. The Edit Distance is used as a weight upon the DN->QN link. Path Cropping (PC)

27. 27 Representations <Root> <Operator Type="WSUM"> <Concept weight="0.2">breakfast </Concept> <Concept weight="0.8">view </Concept> </Operator> </Root> <Root> <Operator Type="AND"> <Concept> <Text>BBC</Text> <Constraint>Creation </Constraint> </Concept> <Concept>Basil</Concept> </Operator> </Root>
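A query in this XML representation can be turned into a tree of operators and concepts with a short recursive parser. This is a minimal sketch: the dictionary layout, the function names and the default weight of 1.0 for concepts without a `weight` attribute are assumptions, not details given in the slides.

```python
import xml.etree.ElementTree as ET

# Second representation from the slide, reproduced as a string.
QUERY = (
    '<Root><Operator Type="AND">'
    "<Concept><Text>BBC</Text><Constraint>Creation </Constraint></Concept>"
    "<Concept>Basil</Concept>"
    "</Operator></Root>"
)


def parse_node(elem):
    """Convert an Operator or Concept element into a nested dictionary."""
    if elem.tag == "Operator":
        return {"op": elem.get("Type"), "children": [parse_node(c) for c in elem]}
    # Concept: the term is either in a <Text> child or directly in the element text.
    text_elem = elem.find("Text")
    term = text_elem.text if text_elem is not None else (elem.text or "")
    constraint = elem.findtext("Constraint")
    return {
        "term": term.strip(),
        "constraint": constraint.strip() if constraint else None,
        "weight": float(elem.get("weight", "1.0")),  # assumed default
    }


def parse_query(xml_text):
    """Parse a query document and return the tree below <Root>."""
    root = ET.fromstring(xml_text)
    return parse_node(root[0])


tree = parse_query(QUERY)
print(tree)
```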

28. 28 Experiment 1 <Root> <Video id='Video1' Duration='1000' weight='1.000000'> <CreationInformation weight='0.750000'> <Creation weight='1.000000'> <Concept weight='0.8' cid='1'>banana</Concept> </Creation> </CreationInformation> <MediaInformation weight='0.750000'/> <Scene id='Scene1' KeyFrame='none.jpg' Duration='400' weight='0.700000'> <Shot id='Shot1' KeyFrame='none.jpg' Duration='100' weight='0.625000'/> <Shot id='Shot2' KeyFrame='none.jpg' Duration='300' weight='0.875000'> <Concept weight='0.7' cid='1'>banana</Concept> </Shot> </Scene> <Scene id='Scene2' Duration='600' weight='0.800000'/> </Video> <Video id='Video2' weight='1.000000'/> <Video id='Video3' weight='1.000000'/> </Root> <Root> <Concept> <Text>banana</Text> <Constraint>CreationInformation</Constraint> </Concept> </Root>
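To see which occurrences of "banana" the constrained query could attach to, the document network can be walked while tracking each concept's context path. This is a sketch only: the XML below is a trimmed copy of the slide's document network, the function names are hypothetical, and treating a simple constraint as "the named element appears somewhere on the path" is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Experiment 1 document network (empty elements omitted).
DOC_NETWORK = """<Root>
  <Video id='Video1' weight='1.0'>
    <CreationInformation weight='0.75'>
      <Creation weight='1.0'>
        <Concept weight='0.8' cid='1'>banana</Concept>
      </Creation>
    </CreationInformation>
    <Scene id='Scene1' weight='0.7'>
      <Shot id='Shot2' weight='0.875'>
        <Concept weight='0.7' cid='1'>banana</Concept>
      </Shot>
    </Scene>
  </Video>
</Root>"""


def concept_occurrences(elem, path=()):
    """Yield (term, context path, link weight) for every Concept below elem."""
    for child in elem:
        if child.tag == "Concept":
            yield child.text.strip(), path, float(child.get("weight"))
        else:
            yield from concept_occurrences(child, path + (child.tag,))


def candidate_links(root, term, constraint=None):
    """Find occurrences of term, optionally keeping only those inside the constraint element."""
    return [
        (path, weight)
        for text, path, weight in concept_occurrences(root)
        if text == term and (constraint is None or constraint in path)
    ]


root = ET.fromstring(DOC_NETWORK)
print(candidate_links(root, "banana"))
print(candidate_links(root, "banana", "CreationInformation"))
```

Under these assumptions the unconstrained query finds both occurrences, while the constrained query keeps only the one under CreationInformation.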

29. 29 Experiment 1 Model works Different levels of document granularity (Video/Scene/Shot) retrieved in same list Parameters work but unclear if they help

30. 30 Experiment 2 Model worked with real collection to produce real results Results were as expected given knowledge of material

31. 31 Experiment 3 Recall/Precision metrics calculated Rank in results list (not result rank) used for analysis Ten best Video/Scene/Shot chosen by author Ranking seems good: 6/10 required in top 10 All 10 within top 93 (out of 362 in total) Figures suggest that the model is working effectively although this is not conclusive
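The figures quoted on this slide translate into precision and recall at two cut-offs. The sketch below assumes that the ten author-chosen items are the complete relevant set for the query, which the slide implies but does not state; the function name is hypothetical.

```python
def precision_recall(relevant_retrieved, cutoff, total_relevant):
    """Precision and recall after looking at the top `cutoff` results."""
    precision = relevant_retrieved / cutoff
    recall = relevant_retrieved / total_relevant
    return precision, recall


# 6 of the 10 required items appear in the top 10 results.
p10, r10 = precision_recall(6, 10, 10)
# All 10 required items appear within the top 93 results.
p93, r93 = precision_recall(10, 93, 10)

print(f"P@10={p10:.3f} R@10={r10:.3f}")
print(f"P@93={p93:.3f} R@93={r93:.3f}")
```

Full recall is only reached about a quarter of the way down the 362-item list, which is one way to read the slide's caution that the figures are not conclusive.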

32. 32 Discussion Size of collection too small to produce significant results. No known MPEG7 collections. No independent queries with relevance assessments exist (obviously) Software efficiency crucial - simplifying assumptions can be made to ensure that the IN is computationally viable. Size of computation is not proportional to size of collection.

33. 33 Concluding Remarks

34. 34 Concluding Remarks MPEG7 was found to contain useful "tools" Model for VIR developed Based on Inference Network, Built from MPEG7 files Indexing captures structure, context and concepts Retrieval done using Terms, Operators and Constraints Model parameters devised Results suggest that the approach taken is well founded, although lack of data is problematic

35. 35 Next... Build an independent MPEG7 collection with relevance assessments etc! Automatic methods for generating metadata Eliminate bias, Increase consistency, Improve quality Feature extraction etc. to produce Simple Semantics Solve the Semantic Gap issue Build metadata based models that exploit contextual information Assume contextual information can help retrieval Assume we have good metadata Efficiency of the evaluation vital

36. 36 The End
