Video Indexing and Retrieval using an MPEG7 Based Inference Network

Presentation Transcript

    1. Video Indexing and Retrieval using an MPEG7 Based Inference Network. Andrew Graves, andrew@dcs.qmul.ac.uk

    2.

    3. Introduction

    4. Project Aims: Metadata-based retrieval using MPEG7. Assume we have the metadata. Build a modular retrieval system {Video analysis -> MPEG7 -> Video retrieval}. Exploit MPEG7 structure, context and concepts.

    5. Background. Information Retrieval: IR models {Inference Network model}; text retrieval (indexing & retrieval, term statistics); Structured Information Retrieval at QM & Dortmund. Multimedia: MPEG 1/2/4/7; MPEG7, the Multimedia Content Description Interface. Video Indexing and Retrieval: {Annotation, Content, Metadata} based approaches; assume we have the metadata {Shot/Scene detection}; feature extraction / acquisition of semantics.

    6. MPEG7: Description Definition Language (DDL), Descriptors (D) and Description Schemes (DS). Just another XML format.

    7. Inference Network Model: Probabilistic framework for IR that uses a Bayesian Network (so based on proven statistical theory). Complete Network = Document Network + Query Network + Attachment + Evaluation. The Complete Network is used to estimate the probability of relevance for each Document Node.

    8. Positioning. Inference Network: allows a combination of evidence; allows hierarchical document nodes (structure). MPEG7: structural, conceptual & contextual information. So, we process DSs and Ds to form the IN.

    9. In Other Words... Build a Document Network that represents all of the Ds (concepts) and DSs (structure). Attach a Query Network and evaluate.

    10. MPEG7 Collection

    11. Collection

    12. Annotations: Abstract from box. StructuredAnnotation for each scene to specify exactly the participants and location. FreeTextAnnotation to describe action. FreeTextAnnotation with speech extracts.

    13. MPEG7 Excerpt #1
        <AudioVisual id="Communication Problems">
          <MediaInformation/>
          <MediaProfile/>
          <CreationInformation>
            <Creation>
              <Title>Communication Problems</Title>
              <Abstract>
                <FreeTextAnnotation>It's not a wise man who entrusts his furtive winnings on the horses to a geriatric Major, but Basil was never known for that quality. Parting with those ill gotten gains was Basil's first mistake; his second was to tangle with the intermittently deaf Mrs Richards.</FreeTextAnnotation>
              </Abstract>
              <Creator>BBC</Creator>
            </Creation>
            <Classification>
              <Genre>Comedy</Genre>
              <Language>English</Language>
            </Classification>
          </CreationInformation>

    14. MPEG7 Excerpt #2
        <SegmentDecomposition decompositionType="temporal" gap="true" id="TableOfContent" overlap="false">
          <Segment id="A satisfied customer" xsi:type="AudioVisualSegmentType">
            <TextAnnotation>
              <FreeTextAnnotation>Basil receives a tip on a horse from a customer. Sybil warns Basil not to bet. Basil says Sybil is a dragon to Polly.</FreeTextAnnotation>
              <StructuredAnnotation>
                <Who>Basil,Sybil,Major,Polly</Who>
                <Where>Lobby</Where>
              </StructuredAnnotation>
            </TextAnnotation>
            <SegmentDecomposition decompositionType="temporal" gap="true" overlap="false">
              <Segment id="Shot_1" xsi:type="AudioVisualSegmentType">
                <TextAnnotation>
                  <FreeTextAnnotation>Glad you enjoyed it. Polly will you get Mr Firkins bill please.</FreeTextAnnotation>
                </TextAnnotation>
                <MediaTime>
                  <MediaIncrDuration timeUnit="PT1N25F">86</MediaIncrDuration>
                </MediaTime>
              </Segment>
            </SegmentDecomposition>
            <MediaTime>
              <MediaIncrDuration timeUnit="PT1N25F">3028</MediaIncrDuration>
            </MediaTime>

    15. Model

    16. Model Overview. Document Network (built during indexing): static, contains information about the collection. Query Network (built during retrieval): query language based upon INQUERY; statistical operators (and approximations of Boolean). Attachment process: builds the Complete Network; creates DN->QN links where concepts are the same. Evaluation process: calculates the probability of relevance for each element.

    17. Document Network. Document Node layer: created from MPEG7 structural aspects. Context Node layer: provides contextual information. Concept Node layer: contains all the contents present in the collection.

    18. Query Network 1: Query text is parsed to produce a Query tree, an inverted DAG with a single final node, made of Terms & Operators. Boolean operators: #and #or #not. Statistical operators: #sum #wsum #max. Constraints: #constraint #tree.
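
The operators listed above are borrowed from INQUERY, where each one combines the beliefs of its child nodes with a simple closed-form rule. The following Python sketch shows those commonly cited forms. The slides only say the query language is "based upon INQUERY", so the exact formulas, the function names and the example beliefs are assumptions, not this system's confirmed implementation.

```python
from math import prod

# Closed-form belief combination as commonly given for INQUERY-style operators.
# Assumption: the system described in the slides follows the same forms.

def op_and(beliefs):
    """#and: product of the child beliefs."""
    return prod(beliefs)

def op_or(beliefs):
    """#or: one minus the product of the complements."""
    return 1.0 - prod(1.0 - b for b in beliefs)

def op_not(belief):
    """#not: complement of a single belief."""
    return 1.0 - belief

def op_sum(beliefs):
    """#sum: arithmetic mean of the child beliefs."""
    return sum(beliefs) / len(beliefs)

def op_wsum(weights, beliefs):
    """#wsum: weighted mean of the child beliefs."""
    return sum(w * b for w, b in zip(weights, beliefs)) / sum(weights)

def op_max(beliefs):
    """#max: largest child belief."""
    return max(beliefs)

# Hypothetical concept beliefs for one document node.
beliefs = {"breakfast": 0.5, "view": 0.9}

# A weighted-sum query with weights 0.2 (breakfast) and 0.8 (view).
score = op_wsum([0.2, 0.8], [beliefs["breakfast"], beliefs["view"]])
print(round(score, 2))
```

Because every operator maps child beliefs in [0, 1] to a belief in [0, 1], operators can be nested freely, which is what lets the query tree end in a single final node.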

    19. Query Network 2: {No; Simple; Complex} constraints. #constraint and #tree.

    20. Attachment: Attachment creates DN->QN links (at concept level). Find candidate links and then consider constraints. Strength of a link can be determined by closeness of match. Perform Tree Matching to find the Edit Distance (ED). Use the ED by a) testing it against a threshold, or b) reducing the link weight.
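
The two uses of the Edit Distance can be sketched in Python as below. This is a simplified stand-in, not the tree matching used in the project: it compares flat paths of element names with a plain Levenshtein distance instead of matching trees, it treats a constraint as met when every constraint name occurs in the context path, and it maps ED to a weight with 1 / (1 + ED). All three choices, and the function names, are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of element names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x by y
        prev = cur
    return prev[-1]

def constraint_met(context_path, constraint_path):
    """Assumed rule: every constraint name must occur in the context path."""
    return all(name in context_path for name in constraint_path)

def attach_threshold(context_path, constraint_path, threshold):
    """Threshold variant: link weight 1.0 if ED is below the threshold, else no link."""
    if not constraint_met(context_path, constraint_path):
        return None
    ed = edit_distance(context_path, constraint_path)
    return 1.0 if ed < threshold else None

def attach_weighted(context_path, constraint_path):
    """Weighted variant: always link when the constraint is met, weight shrinks with ED."""
    if not constraint_met(context_path, constraint_path):
        return None
    ed = edit_distance(context_path, constraint_path)
    return 1.0 / (1.0 + ed)  # assumed mapping from ED to weight

creation = ["Video", "CreationInformation", "Creation"]
shot = ["Video", "Scene", "Shot"]
constraint = ["CreationInformation"]

print(attach_threshold(creation, constraint, threshold=3))
print(attach_weighted(creation, constraint))
print(attach_weighted(shot, constraint))
```

The threshold variant gives a hard yes/no decision, while the weighted variant keeps every constrained match and lets poorer matches contribute less evidence.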

    21. Evaluation: After attachment we have formed the Complete Network. This is evaluated for every Document Node and the resultant probabilities are used for ranking. All nodes required are evaluated using 1) the values of their parent nodes and 2) conditional probabilities. Nodes may inherit parental contexts (Link Inheritance). The parents outside the constraint may be ignored (Path Cropping).

    22. Extraction. Structural Extraction: about the hierarchical makeup. Attribute Extraction: data about the structural elements. Concept Extraction: obtain the concepts that appear. Text preprocessing: Luhn's analysis, term statistics.
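
The three extraction steps can be illustrated on a trimmed copy of MPEG7 Excerpt #2. The Python sketch below is an assumption-laden illustration rather than the project's indexer (which was written in C++): the excerpt has been closed and shortened so that it parses, the `xsi:type` attributes are dropped to avoid declaring a namespace, and the stop list and tokenizer are hypothetical toy versions of the text preprocessing.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of Excerpt #2 (closing tags added for parsing).
MPEG7 = """
<SegmentDecomposition decompositionType="temporal" id="TableOfContent">
  <Segment id="A satisfied customer">
    <TextAnnotation>
      <FreeTextAnnotation>Basil receives a tip on a horse from a customer.</FreeTextAnnotation>
      <StructuredAnnotation>
        <Who>Basil,Sybil,Major,Polly</Who>
        <Where>Lobby</Where>
      </StructuredAnnotation>
    </TextAnnotation>
    <SegmentDecomposition decompositionType="temporal">
      <Segment id="Shot_1">
        <TextAnnotation>
          <FreeTextAnnotation>Glad you enjoyed it.</FreeTextAnnotation>
        </TextAnnotation>
        <MediaTime><MediaIncrDuration timeUnit="PT1N25F">86</MediaIncrDuration></MediaTime>
      </Segment>
    </SegmentDecomposition>
    <MediaTime><MediaIncrDuration timeUnit="PT1N25F">3028</MediaIncrDuration></MediaTime>
  </Segment>
</SegmentDecomposition>
"""

STOPWORDS = {"a", "on", "from", "it", "you", "the"}  # hypothetical toy stop list

def terms(text):
    """Concept extraction: lowercase word tokens minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def extract(segment, depth=0, out=None):
    """Walk nested segments, recording structure, attributes and concepts."""
    out = [] if out is None else out
    ann = segment.find("TextAnnotation")
    text = ann.findtext("FreeTextAnnotation", default="") if ann is not None else ""
    who = ann.findtext("StructuredAnnotation/Who", default="") if ann is not None else ""
    duration = segment.findtext("MediaTime/MediaIncrDuration")
    out.append({
        "id": segment.get("id"),                          # attribute extraction
        "depth": depth,                                   # structural extraction
        "duration": int(duration) if duration else None,  # raw MediaIncrDuration count
        "concepts": terms(text) + terms(who),             # concept extraction
    })
    for child in segment.findall("SegmentDecomposition/Segment"):
        extract(child, depth + 1, out)
    return out

root = ET.fromstring(MPEG7)
records = []
for seg in root.findall("Segment"):
    extract(seg, 0, records)

for rec in records:
    print(rec["depth"], rec["id"], rec["duration"], rec["concepts"])
```

Each record corresponds to one candidate Document Node, and its `depth` preserves the scene/shot nesting that the Document Network needs.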

    23. Probability Estimation: Probability that a document is relevant to the query. Conditional probabilities between the nodes: Context->Context (e.g. Video->Scene) and Context->Concept.

    24. Experiments

    25. Experiment Overview: Software written in {C++ NT}, not using INQUERY. 1. Basic: does the model work at all? 2. Real Data: does the model work with our real metadata collection? 3. Metrics: what are the precision/recall metrics?

    26. Remember... Link Inheritance (LI). Link Degradation (LID). Tree Matching (TM). Threshold (TMT): the attachment is made only if the constraint is met and the Edit Distance is below the specified threshold. Weighted (TMW): the attachment is made if the constraint is met; the Edit Distance is used as a weight upon the DN->QN link. Path Cropping (PC).

    27. Representations
        <Root>
          <Operator Type="WSUM">
            <Concept weight="0.2">breakfast</Concept>
            <Concept weight="0.8">view</Concept>
          </Operator>
        </Root>

        <Root>
          <Operator Type="AND">
            <Concept>
              <Text>BBC</Text>
              <Constraint>Creation</Constraint>
            </Concept>
            <Concept>Basil</Concept>
          </Operator>
        </Root>
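
A query in this XML representation can be turned into an in-memory query tree with a few lines of Python. The sketch below is only one possible reading of the format: the tuple layout and the function names are hypothetical, and it assumes each `<Root>` holds exactly one top-level node.

```python
import xml.etree.ElementTree as ET

AND_QUERY = """
<Root>
  <Operator Type="AND">
    <Concept><Text>BBC</Text><Constraint>Creation</Constraint></Concept>
    <Concept>Basil</Concept>
  </Operator>
</Root>
"""

WSUM_QUERY = """
<Root>
  <Operator Type="WSUM">
    <Concept weight="0.2">breakfast</Concept>
    <Concept weight="0.8">view</Concept>
  </Operator>
</Root>
"""

def parse_node(elem):
    """Convert one XML element into a nested tuple (hypothetical layout)."""
    if elem.tag == "Operator":
        return ("op", elem.get("Type"), [parse_node(child) for child in elem])
    if elem.tag == "Concept":
        text_el = elem.find("Text")
        # The term is either in a <Text> child or directly in the element body.
        term = (text_el.text if text_el is not None else elem.text or "").strip()
        constraint = elem.findtext("Constraint")
        weight = elem.get("weight")
        return ("concept",
                term,
                constraint.strip() if constraint else None,
                float(weight) if weight else None)
    raise ValueError(f"unexpected element: {elem.tag}")

def parse_query(xml_text):
    """Parse a <Root> document and return its single top-level query node."""
    root = ET.fromstring(xml_text)
    return parse_node(root[0])

and_tree = parse_query(AND_QUERY)
wsum_tree = parse_query(WSUM_QUERY)
print(and_tree)
print(wsum_tree)
```

Keeping the constraint on the concept node, rather than on the operator, mirrors the XML and means attachment can check it link by link.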

    28. Experiment 1
        <Root>
          <Video id='Video1' Duration='1000' weight='1.000000'>
            <CreationInformation weight='0.750000'>
              <Creation weight='1.000000'>
                <Concept weight='0.8' cid='1'>banana</Concept>
              </Creation>
            </CreationInformation>
            <MediaInformation weight='0.750000'/>
            <Scene id='Scene1' KeyFrame='none.jpg' Duration='400' weight='0.700000'>
              <Shot id='Shot1' KeyFrame='none.jpg' Duration='100' weight='0.625000'/>
              <Shot id='Shot2' KeyFrame='none.jpg' Duration='300' weight='0.875000'>
                <Concept weight='0.7' cid='1'>banana</Concept>
              </Shot>
            </Scene>
            <Scene id='Scene2' Duration='600' weight='0.800000'/>
          </Video>
          <Video id='Video2' weight='1.000000'/>
          <Video id='Video3' weight='1.000000'/>
        </Root>

        <Root>
          <Concept>
            <Text>banana</Text>
            <Constraint>CreationInformation</Constraint>
          </Concept>
        </Root>

    29. Experiment 1: The model works. Different levels of document granularity (Video/Scene/Shot) are retrieved in the same list. The parameters work, but it is unclear whether they help.
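
To make the mixed-granularity ranking concrete, the Python sketch below scores a trimmed copy of the Experiment 1 document network for the term "banana". The slides do not state how link weights are combined, so this sketch assumes the belief reaching a Document Node is the product of the `weight` attributes along the path down to a matching concept, with the maximum taken over alternative paths. It also ignores Link Inheritance and Link Degradation. Treat it as an illustration of the idea, not a reproduction of the experiment.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Experiment 1 document network (weights as in the slide).
DN = """
<Root>
  <Video id='Video1' weight='1.0'>
    <CreationInformation weight='0.75'>
      <Creation weight='1.0'>
        <Concept weight='0.8'>banana</Concept>
      </Creation>
    </CreationInformation>
    <Scene id='Scene1' weight='0.7'>
      <Shot id='Shot1' weight='0.625'/>
      <Shot id='Shot2' weight='0.875'>
        <Concept weight='0.7'>banana</Concept>
      </Shot>
    </Scene>
    <Scene id='Scene2' weight='0.8'/>
  </Video>
  <Video id='Video2' weight='1.0'/>
</Root>
"""

def belief(node, term, constraint=None, inside=False):
    """Best product of link weights from `node` down to a matching Concept.

    Assumed rule: multiply weights along a path, take the max over paths.
    With a constraint, only concepts below an element of that name count.
    """
    inside = inside or node.tag == constraint
    best = 0.0
    for child in node:
        w = float(child.get("weight", "1.0"))
        if child.tag == "Concept":
            if (child.text or "").strip() == term and (constraint is None or inside):
                best = max(best, w)
        else:
            best = max(best, w * belief(child, term, constraint, inside))
    return best

def rank(root, term, constraint=None):
    """Score every element that has an id and return the non-zero ones, best first."""
    scored = [(n.get("id"), belief(n, term, constraint))
              for n in root.iter() if n.get("id")]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: s[1], reverse=True)

root = ET.fromstring(DN)
unconstrained = rank(root, "banana")
constrained = rank(root, "banana", constraint="CreationInformation")
print([doc_id for doc_id, _ in unconstrained])
print([doc_id for doc_id, _ in constrained])
```

Under these assumptions a shot, a scene and a video all appear in one ranked list for the unconstrained query, while the `CreationInformation` constraint leaves only the video whose creation metadata mentions the term.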

    30. Experiment 2: The model worked with the real collection to produce real results. Results were as expected given knowledge of the material.

    31. Experiment 3: Recall/precision metrics calculated. Rank in the results list (not result rank) used for analysis. Ten best Video/Scene/Shot chosen by the author. Ranking seems good: 6/10 required in top 10. All 10 within top 93 (out of 362 in total). Figures suggest that the model is working effectively, although this is not conclusive.
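
The figures quoted for Experiment 3 translate into precision and recall at two rank cutoffs. The short Python calculation below assumes that "6/10 required in top 10" means six of the ten author-chosen items rank in the first ten positions; the function name is illustrative.

```python
def precision_recall(relevant_retrieved, cutoff, total_relevant):
    """Return (precision, recall) at a given rank cutoff."""
    precision = relevant_retrieved / cutoff
    recall = relevant_retrieved / total_relevant
    return precision, recall

TOTAL_RELEVANT = 10  # the ten best items chosen by the author

p10, r10 = precision_recall(6, 10, TOTAL_RELEVANT)   # 6 relevant in the top 10
p93, r93 = precision_recall(10, 93, TOTAL_RELEVANT)  # all 10 within the top 93

print(f"P@10={p10:.2f} R@10={r10:.2f}")    # P@10=0.60 R@10=0.60
print(f"P@93={p93:.3f} R@93={r93:.2f}")    # P@93=0.108 R@93=1.00
```

Full recall is reached after examining 93 of the 362 ranked items, roughly a quarter of the list.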

    32. Discussion: Size of collection too small to produce significant results. No known MPEG7 collections. No independent queries with relevance assessments exist (obviously). Software efficiency is crucial: simplifying assumptions can be made to ensure that the IN is computationally viable. Size of computation is not proportional to size of collection.

    33. Concluding Remarks

    34. Concluding Remarks: MPEG7 was found to contain useful tools. Model for VIR developed: based on the Inference Network, built from MPEG7 files. Indexing captures structure, context and concepts. Retrieval done using Terms, Operators and Constraints. Model parameters devised. Results suggest that the approach taken is well founded, although the lack of data is problematic.

    35. Next... Build an independent MPEG7 collection with relevance assessments etc! Automatic methods for generating metadata: eliminate bias, increase consistency, improve quality; feature extraction etc. to produce Simple Semantics; solve the Semantic Gap issue. Build metadata-based models that exploit contextual information: assume contextual information can help retrieval; assume we have good metadata; efficiency of the evaluation is vital.

    36. The End