CIS750 – Seminar in Advanced Topics in Computer ScienceAdvanced topics in databases – Multimedia Databases V. Megalooikonomou Introduction
General Overview • Multimedia data • Multimedia data applications • Access of different media types • Operations on multimedia data • Infrastructure to support operations • Content: Extracting media content; indexing; queries • Physical storage: storing media data • Creating presentations: Creating media presentations (in response to queries); delivering them to users at remote sites
Media types • Text/Document • Image • Video • Audio • “Traditional” data (e.g., relations, flat files, object bases, etc) • Video and audio differ from the other media types (temporal info): • retrievals must be continuous • support fast-forward, rewind, pause
Multimedia DBMS • A framework that manages different types of data potentially represented in a wide diversity of formats on a wide array of media sources • Abilities of Multimedia DBMSs (MMDBMSs) • Uniformly query data represented in different formats • Simultaneously query different media sources and apply traditional (classical) database operations on them • Retrieve media objects from local storage devices in a smooth jitter free (i.e., continuous) manner • Develop a presentation of the answer generated by a query and deliver the presentation in way that satisfies Quality of Service
Sample Multimedia Scenario • Scenario: A police investigation of a large-scale drug operation • Types of data: • Video: from surveillance cameras that record activities at various locations • Audio: from legally authorized telephone wiretaps • Images: from still photographs taken by investigators • Documents: seized by the police • Structured relational data: containing background information, bank records, etc. • Geographical information systems data: containing geographic data relevant to the investigation
Example Images Queries for Multimedia scenario • Query 1: • Retrieve all images from the image library in which the person appearing in the (currently displayed) photograph appears • Query 2: • Retrieve all images from the image library in which Dennis Dopeman appears
Example Image Queries: Issues • Image based vs keyword-based queries • Output: a ranked list of images that are “similar” to the query image • Similarity? • Ranking? • Efficiency in supporting these queries • Association of different attributes with images(or parts of images) • Indexing and retrieving images effectively
Example Audio Query for Multimedia Scenario • Query 1: • Listening to a tape containing a conversation between individual A (person under surveillance) and individual B (somebody meeting person A)…. Query: Find the identity of individual B, given that individual A is Denis Dopeman • Query 2: • Find all audio tapes in which Denis Dopeman was a participant
Example Text/Video Query • Text Query: Find all documents (from the corpora of text documents, e.g., newspaper archives, police dept. files on old, unsolved cases, witness statements, etc) that deal with the Cali drug cartel’s financial transactions with ABC Corp. • Video Query: Find all video segments in which the victim of the assault appears.
Heterogeneous Query • Complex queries will “mix and match” data from different media sources • Textual example: Find all individuals who have been convicted of attempted murder in North America and who have recently has electronic fund transfers made into their bank accounts from ABC Corp. • Answering this query is problematic: • May requiring accessing a wide variety of databases belonging to different police jurisdictions, etc • ABC Corp. may have accounts in hundreds of banks worldwide each using different formats and database systems
Heterogeneous Multimedia Query • Find all individuals who have been photographed with Jose Orojuelo and who have been convicted of attempted murder in North America and who have recently has electronic fund transfers made into their bank accounts from ABC Corp. • This query requires: • Accessing a database containing names and pictures of various individuals • Accessing photograph database of still images, surveillance video database • Accessing image processing algorithms to determine who occurs in which video/still photograph
Multimedia Research Issues: Queries • A single language within which multimedia data of different types can be accessed • Be able to specify combination operations across different media types (in addition to across different relations through “join”, “union”, “difference”, etc) • Be able to access “metadata” (describing the content of different media sources) and “raw” data supported by different media sources • Be able to merge, manipulate, and “join” results from different media sources • Optimize a single query or a set of queries
Multimedia Research Issues: Content • What is the content of a media source? Under what conditions can content be described textually and under what conditions must it be described directly through the original media type? • How should we extract the content of: an image, video-clip, audio-clip, free/structured document? • How to index the results of the extracted content? • How to efficiently retrieve media data on the basis of similarity? • How one had to design a multimedia database • How should queries be “relaxed” so that not only the original but also “similar” queries get processed? • What are efficient algorithms for processing these queries?
Multimedia Research Issues: Storage • How do the following (standard) storage devices work? • Disk systems • CD-ROM systems • Tape systems and tape libraries • How is data laid out on such devices? • How do we design disk/CD-ROM/tape servers so as to optimally satisfy different clients concurrently, when the operations are: • Playback • Rewind • Fast forward • Pause
Multimedia Research Issues: Presentations and Delivery • How to specify the content of multimedia presentations? • How to specify the form (temporal/spatial layout) of this content? • How to create a presentation schedule that satisfies these temporal/spatial presentation requirements? • How to deliver a multimedia presentation to users when: • There is a need to interact with remote servers to assemble the presentation • There are bounds on the bandwidth, load and other resources • There is a mismatch between host servers’ capabilities and customer’s machine capabilities • How to achieve Quality of Service?