Supervised by Prof. LYU, Rung Tsong Michael



  1. Department of Computer Science & Engineering The Chinese University of Hong Kong LYU0102 XML for Interoperable Digital Video Library Supervised by Prof. LYU, Rung Tsong Michael Prepared by: Chan Pik Wah, Pat Ngai Cheuk Han, Table

  2. Outline • Introduction to XVIP • Overview of Project • Extraction Techniques • Face Detection • Speech Recognition • Multimedia Transformation & Presentation • XSL • SMIL • Transformation • Problems & Solutions • Conclusion

  3. Motivations • Rapid increase in the usage of multimedia information • New approach: DIGITAL VIDEO LIBRARY

  4. Motivations • Little attention has been paid to video information extraction and storage • Scalability of the system when adding new extraction components • Lack of a generic framework for presentation and visualization of video information

  5. Overview of XVIP [figure]

  6. Achievements in last Semester • 2 Extraction Techniques: Scene Change, VOCR • Integrate data into XML • XML Editor • Knowledge Enrichment

  7. Achievements in this Semester • 2 more extraction techniques: Face Detection, Speech Recognition • New data integrated into XML • XML to SMIL Transformer

  8. Extraction Techniques [figure; labels: Video, Scene Change, VOCD, Face Detection, Speech Recognition, XML]

  9. Face Detection • Object-presence detection is also an important technique. • It identifies and indexes features to support image similarity matching; face detection is a good example.

  10. Face Detection • Names of people appearing in the video • How they interact with the environment • Makes the video more searchable

  11. Face Detection • Neural network-based algorithm • The basic algorithm used for face detection

  12. Face Detection • Face Recognition • Facial Expression Analysis • Enrich the XML • Easier for users to search the content of the video

  13. Speech Recognition • Speech recognition technology can make any spoken data useful for library indexing and retrieval

  14. Speech Recognition Engine [figure]

  15. Speech Recognition • ViaVoice • Error rate > 50%
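
The slide reports an error rate above 50% for ViaVoice but does not say how it was measured. One common metric for speech recognizers is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes that metric; the function name `word_error_rate` is hypothetical and this is not the project's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in recognizer output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a recognizer that gets one word wrong and drops another in a four-word utterance scores 0.5, which is the level the slide describes.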

  16. Usage of XML • Indexing & Searching • Combine with other XML for Knowledge Enrichment • Exchange data with different applications • Presentation

  17. Presentation of the video data • XML is not presentable without processing • HTML can show images but is static • SMIL is good for multimedia presentation • No existing tools integrate different XML data into a SMIL presentation • Current transformation languages have many limitations in transforming XML to SMIL

  18. SMIL • SMIL stands for Synchronized Multimedia Integration Language and is currently a W3C Recommendation. • It is a markup language that can synchronize and integrate multimedia. • It enables authors to specify what should be presented and when. • RealPlayer, QuickTime, and IE support SMIL.

  19. Advantages • SMIL is text-based: easy to develop with a text editor • Generates customized presentations: a customized SMIL file can be generated based on preferences recorded in the visitor's browser • The SMIL effort is led by the W3C, which tries to shape a specification that is beneficial to all parties involved • Avoids container formats: SMIL can stream many media formats, so there is no need to merge clips into a single streaming file

  20. Timing and Synchronization
      Sequence element:
        <seq>
          <img src="pix/0.jpg" dur="15" region="scene"/>
          <img src="pix/15.jpg" dur="5" region="scene"/>
          <img src="pix/20.jpg" dur="7" region="scene"/>
          <img src="pix/27.jpg" dur="4" region="scene"/>
          ……
        </seq>
      Parallel element:
        <par>
          <text src="text/transcript.rt" region="transcript" />
          <text src="text/mapdetail.rt" region="mapdetail" />
          <video src="news.mpg" region="video" fill="freeze"/>
          …
        </par>
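
In the sequence element on slide 20, each image's `dur` equals the gap to the next image, and the file names look like scene start times. A minimal sketch of how such a `<seq>` could be generated from scene-change times follows. It assumes key frames are named after their start time in seconds and that the end time of the last scene is known; `build_seq` and its parameters are hypothetical names, not part of XVIP.

```python
import xml.etree.ElementTree as ET


def build_seq(scene_starts, video_end, region="scene", image_dir="pix"):
    """Build a SMIL <seq> with one <img> per scene, shown for the scene's length."""
    starts = list(scene_starts)
    if starts != sorted(starts):
        raise ValueError("scene_starts must be in ascending order")
    if starts and video_end <= starts[-1]:
        raise ValueError("video_end must be later than the last scene start")

    seq = ET.Element("seq")
    boundaries = starts + [video_end]
    # Pair each scene start with the next boundary to get its duration.
    for start, next_start in zip(boundaries, boundaries[1:]):
        ET.SubElement(seq, "img", {
            "src": f"{image_dir}/{start}.jpg",  # assumed naming scheme
            "dur": str(next_start - start),
            "region": region,
        })
    return seq


# Scene starts taken from the slide; the end time 31 is assumed from dur="4".
seq = build_seq([0, 15, 20, 27], video_end=31)
smil_fragment = ET.tostring(seq, encoding="unicode")
```

Because a `<seq>` plays its children one after another, the durations alone keep the images aligned with the video that runs alongside them in the `<par>` element.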

  21. XSL • Stands for “Extensible Stylesheet Language” • XSL is the language defined by the W3C to add formatting information to XML data. • XSLT is the most commonly used XSL standard: it transforms one XML document into another and is used in our FYP.

  22. Working Principle [figure; labels: XSL Stylesheet, Source Tree, Output]

  23. Transformation Process • Input files: the XML file generated by XVIP; XML files of additional information • Output files: a SMIL file; some RealText files

  24. Design 1 • Built solely with VC++ • Reads all the input files and gets the information • Creates the output files for the SMIL presentation • Disadvantages: the layout of the SMIL presentation needs to be hard-coded in the VC++ program, so the layout becomes hard to change and the transformer becomes hard to extend.

  25. Design 1 with modification • Modification: provide an additional file or interface as a template for the user to define the layout of the SMIL presentation. • Disadvantages: the flexibility provided is still limited; it is not a standard way to define a template.

  26. Design 2 • Use XSLT to assist the transformation; users can define their own templates with XSL. • Advantages: program-independent; extensible; standard templates • Limitations of XSLT: it can only read one input data file and one XSL file, then generate one output; it cannot combine files.

  27. Design 2 Solutions • Knowledge Enrichment: combine additional information with the XML file from XVIP before converting to SMIL • Creating output files: use separate XSL files to generate the RealText files; use separate XSL files to generate the layout of the presentation and the display order of objects in different regions, then combine them into a SMIL file

  28. Knowledge Enrichment [figure; labels: Information of major cities, XML file from XVIP, Combined XML file]

  29. XML file contains information of major cities that are related to the video.
        <COMBINE>
          <TIME begin="10" dur="11">
            <NAME>香港</NAME> <!-- Hong Kong -->
            <DETAIL>中國南部一個沿海城市</DETAIL> <!-- A coastal city in southern China -->
            <AREA>China</AREA>
          </TIME>
          <TIME begin="21" dur="20">
            <NAME>紐約</NAME> <!-- New York -->
            <DETAIL>隸屬美國紐約州的城市</DETAIL> <!-- A city in New York State, USA -->
            <AREA>America</AREA>
          </TIME>
        </COMBINE>
      Combined XML file
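
The knowledge enrichment step that produces a file like the one on slide 29 can be sketched as a join between timed mentions from the video and a city-information file. The slides show neither input schema, so the `<CITIES>`/`<CITY>` layout, the `(begin, dur, name)` tuples, and the `combine` function are all hypothetical; only the `<COMBINE>`/`<TIME>` output shape comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical city-information file; the real schema is not shown in the slides.
CITY_INFO_XML = """
<CITIES>
  <CITY><NAME>香港</NAME><DETAIL>中國南部一個沿海城市</DETAIL><AREA>China</AREA></CITY>
  <CITY><NAME>紐約</NAME><DETAIL>隸屬美國紐約州的城市</DETAIL><AREA>America</AREA></CITY>
</CITIES>
"""


def combine(city_info_xml, mentions):
    """Attach city details to timed mentions, producing a <COMBINE> tree.

    `mentions` is an iterable of (begin, dur, name) tuples, assumed here to
    have been extracted from the XVIP video description.
    """
    root_in = ET.fromstring(city_info_xml)
    cities = {city.findtext("NAME"): city for city in root_in.iter("CITY")}

    combined = ET.Element("COMBINE")
    for begin, dur, name in mentions:
        city = cities.get(name)
        if city is None:
            continue  # no extra knowledge for this name, so skip it
        time_el = ET.SubElement(
            combined, "TIME", {"begin": str(begin), "dur": str(dur)}
        )
        for tag in ("NAME", "DETAIL", "AREA"):
            ET.SubElement(time_el, tag).text = city.findtext(tag)
    return combined


combined = combine(
    CITY_INFO_XML,
    [(10, 11, "香港"), (21, 20, "紐約"), (50, 5, "Atlantis")],
)
combined_xml = ET.tostring(combined, encoding="unicode")
```

Merging first leaves a single XML document, which is what lets a one-input XSLT step generate the RealText and SMIL files afterwards.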

  30. Create RealText files [figure; labels: Geographical Information, Biographical Information, Video Transcript]

  31. Create SMIL file [figure; labels: Layout, Displaying order]

  32. Create SMIL file [figure; labels: SMIL Presentation, Combining the temporary files]

  33. Problems & Solutions • Problem 1: the result from the XSLT processor is in UTF-8 encoding, but SMIL needs ANSI encoding. • Solution: write a function “UTF8toANSI” for the conversion.

  34. Problems & Solutions • Problem 2: XSLT has a limitation: it can only read one XML file and one XSL file and generate one output file, but our transformation process has more than one input file. • Solution: do knowledge enrichment and produce a combined XML result file before creating the output files.
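
The slides name the “UTF8toANSI” function but do not show it. A minimal sketch of such a conversion is given below. On Windows, "ANSI" usually means the system code page; the sketch assumes code page 950 (Traditional Chinese) because the sample data is Chinese. That choice, and the name `utf8_to_ansi`, are assumptions rather than the project's actual code.

```python
def utf8_to_ansi(data: bytes, codepage: str = "cp950") -> bytes:
    """Re-encode UTF-8 bytes into a legacy Windows code page.

    The default code page (cp950) is an assumption; pass another codec name
    if the target player expects a different one.
    """
    # "utf-8-sig" also drops a leading byte-order mark if one is present.
    text = data.decode("utf-8-sig")
    # Characters missing from the target code page become "?" instead of
    # aborting the whole conversion.
    return text.encode(codepage, errors="replace")


utf8_bytes = "香港".encode("utf-8")
ansi_bytes = utf8_to_ansi(utf8_bytes)
```

Decoding to text first and then encoding again keeps the conversion independent of any particular byte layout, at the cost of replacing characters the target code page cannot represent.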

  35. Conclusion • XVIP contains: • Four video information modalities: scene change detection, VOCD, speech recognition, face detection • Information integration module with XML, for storing the extracted video data in XML format

  36. Conclusion • XML editor, for editing the generated XML file • Knowledge enrichment component, for adding additional information to the XML-based video data • XML to SMIL transformer, for converting the XML-based video data into a SMIL presentation

  37. Conclusion • XVIP provides multiple functions for extracting video information • stores video information in a flexible and scalable way • includes a transformer to generate presentations of the information • The paper “XVIP: An XML-Based Video Information Processing System” by Michael Lyu, Edward Yau, C.H. Ngai, and P.W. Chan was accepted by COMPSAC 2002.

  38. Q & A
