BIT 3193 MULTIMEDIA DATABASE. CHAPTER 3 : MULTIMEDIA INFORMATION MODELING. Metadata for Multimedia.
Metadata • Media objects such as audio, image and video are binary, unstructured, and uninterpreted. • Multimedia databases have to derive and store interpretation based on the content of these media objects. • These interpretations are termed metadata.
Metadata • Metadata is the description of a media object's content. It is: • subjective in nature • dependent on the media type as well as the role of the application • Metadata has to be generated automatically or semi-automatically from the media information.
Metadata • Generation can be based on: • Intra-media: deals with the interpretation of information within a single medium. • Inter-media: deals with the interpretation of information across multiple media.
Figure 3.1 : Intra-media and Inter-media Metadata. [Figure: the media metadata extracting functions f1, f2, …, fn each produce intra-media metadata i1, i2, …, in; the inter-media metadata extracting function F produces the inter-media metadata.]
Metadata Classification • Figure 3.1 shows how metadata is generated from various media objects. • The extracting functions f1, f2, …, fn each work on an individual medium and generate the intra-media metadata i1, i2, …, in. • The extracting function F generates the inter-media metadata I1, I2, …, In by working on all the composed media. • The functions applied to extract metadata can be dependent on or independent of the contents of the media objects.
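The split between per-medium functions and a single cross-media function can be sketched in a few lines of Python. This is only an illustration: the media values, the feature choices and the names `f_text`, `f_audio`, `extract_intra` and the flag `narrated_summit_scene` are assumptions made for the example and do not come from the slides.

```python
from typing import Callable, Dict, List

def f_text(text: str) -> Dict[str, object]:
    """Intra-media function for text (hypothetical features)."""
    words = text.lower().split()
    return {"word_count": len(words), "keywords": sorted(set(words))}

def f_audio(samples: List[float]) -> Dict[str, object]:
    """Intra-media function for audio (hypothetical features)."""
    return {"num_samples": len(samples), "peak": max(abs(s) for s in samples)}

# One extracting function per medium, as with f1, f2, ..., fn in Figure 3.1.
INTRA_FUNCTIONS: Dict[str, Callable] = {"text": f_text, "audio": f_audio}

def extract_intra(media: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Apply each medium's own function to that medium only."""
    return {name: INTRA_FUNCTIONS[name](obj) for name, obj in media.items()}

def F(media: Dict[str, object]) -> Dict[str, object]:
    """Inter-media function: looks at all composed media together."""
    text_words = set(media["text"].lower().split())
    audio_active = any(abs(s) > 0.1 for s in media["audio"])
    return {
        "media_types": sorted(media),
        # Toy cross-media interpretation: text mentions a summit AND audio is not silent.
        "narrated_summit_scene": "summit" in text_words and audio_active,
    }

# Toy media objects (assumed values).
media = {
    "text": "Climbers reach the summit at dawn",
    "audio": [0.0, 0.4, -0.3, 0.0, 0.0],
}
intra = extract_intra(media)
inter = F(media)
```

Here `intra` holds one metadata record per medium, while `inter` holds a single record that could only be derived by looking at both media at once.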
Metadata Classification
• Content Dependent Metadata
  • Depends only on the content of media objects.
  • Example:
    • derivation of the facial features in a person's photographic image (e.g. type of nose)
    • derivation of camera operations in a video clip
• Content Independent Metadata
  • Does not depend on the contents of the media information, but can still be useful for document retrieval and information checking.
  • Example:
    • name of the photographer who took the picture
    • budget of the movie
    • author who created a multimedia document
Content Dependent Metadata
• Direct Content Based Metadata
  • It is based directly on the content of a document.
  • Example: full-text indices based on the text of the document, such as inverted tree and document vectors.
• Content Descriptive Metadata
  • This type of metadata describes the contents of a document without direct utilization of those contents.
  • This type of metadata always involves the use of knowledge or human perception.
  • Example: fragrance of an image containing a flower.
Content Descriptive Metadata
• Domain Independent Metadata
  • Captures information present in the document independent of the application or subject domain of the information.
  • Example: C++ parse tree and HTML/SGML document type definition.
• Domain Specific Metadata
  • Described in a manner specific to the application or subject domain of the information.
  • Issues of vocabulary become very important, as the terms have to be chosen in a domain specific manner.
  • Example: relief and land-cover from a GIS.
Metadata Generation Methodologies • Metadata is generated by applying feature extracting functions to media objects. • These feature extracting functions employ content dependent, content independent and content descriptive techniques. • These techniques use a set of terms that represents an application's view as well as the contents of the media objects. • These sets of terms, referred to as ontologies, create a semantic space onto which metadata is mapped.
Metadata Generation Methodologies • The ontologies used for derivation of metadata are: • media dependent ontologies • media independent ontologies • metacorrelation
Media Dependent Ontologies • These refer to concepts and relationships that are specific to a particular media type, such as text, image or video. • Example: • features such as color or texture can be applied to image data • features such as silence periods can be applied only to audio data
Media Independent Ontologies • These ontologies describe the characteristics that are independent of the content of media objects. • Example: • the ontologies corresponding to the time of creation, location and owner of a media object are media independent.
Metacorrelation • Metadata associated with different media objects has to be correlated to provide a unified picture of a multimedia database. • This unified picture, called query metadata, is modeled for the needs of database users based on application-dependent ontologies.
Metacorrelation • Example: • Consider the following query on a GIS database on the Himalayan mountain ranges: • "Get me images of the peaks that are at least 20000 feet in height, showing at least 5 people climbing the peaks" • Here a correlation has to be established between the height of the peaks in the Himalayas and their available images.
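The correlation needed by the Himalayan query can be sketched as a join between two metadata sets. The sketch below is a toy model: the peak names, heights, image identifiers and climber counts are invented placeholders, and the function name `images_of_tall_peaks` is hypothetical.

```python
# Metadata derived from the GIS data (placeholder values, not real peaks).
peak_metadata = [
    {"peak": "Peak A", "height_ft": 26000},
    {"peak": "Peak B", "height_ft": 21500},
    {"peak": "Peak C", "height_ft": 14000},
]

# Metadata derived from the image collection (placeholder values).
image_metadata = [
    {"image_id": "img1", "peak": "Peak A", "climbers": 6},
    {"image_id": "img2", "peak": "Peak B", "climbers": 2},
    {"image_id": "img3", "peak": "Peak C", "climbers": 9},
]

def images_of_tall_peaks(min_height_ft, min_climbers):
    """Correlate peak heights with image metadata to answer the query."""
    tall = {p["peak"] for p in peak_metadata if p["height_ft"] >= min_height_ft}
    return [
        img["image_id"]
        for img in image_metadata
        if img["peak"] in tall and img["climbers"] >= min_climbers
    ]

result = images_of_tall_peaks(min_height_ft=20000, min_climbers=5)
```

Neither metadata set can answer the query alone: the height condition lives in the GIS metadata and the climber count lives in the image metadata, so the shared peak name acts as the correlation key.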
Metadata Generation Methodologies • Figure 3.2 shows the various stages in the process of generating metadata. • The media pre-processor helps in identifying the contents of interest in the media information. • A separate pre-processor is required for each medium.
Figure 3.2 : Ontologies and Metadata Generation. [Figure: text, audio, image and video in the physical storage view pass through media pre-processors; media dependent and media independent ontologies yield text, audio, image and video metadata; metacorrelation combines these into inter-media metadata and, at the top, query metadata.]
Metadata For Text • Text is often represented as a string of characters, stored in formats such as ASCII. • Text can also be stored as digitized images, scanned from the original text document. • Text has a logical structure based on the type of information it contains. • Example: • if the text is a book, the information can be logically structured into chapters, sections, etc.
Metadata For Text • Metadata for text describes this logical structure, and generating metadata for text involves identifying its logical structure. • Text that is keyed into a computer system normally uses a language to represent the logical structure. • Metadata for text images: in the case of scanned text images, mechanisms are needed for identifying columns, paragraphs and semantic information, and for locating keywords.
Types of Text Metadata • Document Content Description • Representation of Text Data • Document History • Document Location
Document Content Description • This metadata provides additional information about the content of the document. • Example: • a document on global warming can be described as a document in environmental sciences.
Representation Of Text Data • This metadata describes the format, coding and compression techniques that are used to store the data. • Example: • the language used to format the text is metadata. • This metadata is content descriptive.
Document History • This metadata describes the history of creation of the document. • This metadata is also content descriptive. • The components described as part of the history could include: • status of the document • date of update of the document • components of the older document which have been modified.
Document Location • This content independent metadata describes the location in a computer network that holds the document.
Generating Text Metadata
• Implicit Generation
  • Metadata is produced when creating the raw media data, without further analyzing it.
  • For instance:
    • a digital camera can implicitly deliver the time and date for pictures or video taken
    • an SGML editor generates metadata according to the document type definition
• Explicit Generation
  • Depends on the analysis of the media object.
  • This is very often done off-line, after the medium has been recorded and stored and prior to any processing which makes use of the data.
  • Reason: the tremendous cost of extracting features such as an image's features.
Generating Text Metadata • SGML describes typographical annotations as well as the structure of the document. • Markup is a notion from the publishing world. • Manuscripts are annotated with a pencil or a marker to indicate the layout. • The markup or the structure provided by the author alone may not be sufficient in many instances. • Hence, automatic or semi-automatic mechanisms might be required to identify metadata such as topic and subtopic boundaries.
Generating Text Metadata
• Such mechanisms are especially needed when text is present in the form of scanned images.
• Here, we discuss the following text metadata generation methodologies:
  • Text Formatting Language: SGML (Implicit Metadata Generation)
  • Automatic / Semi-automatic Mechanisms (Explicit Metadata Generation)
    • Subtopic Boundary Location
    • Word Image Spotting
Subtopic Boundary Location • TextTiling algorithms are used to partition text information into tiles that reflect the underlying topic structure. • TextTiling is a technique for automatically subdividing texts into multi-paragraph units that represent passages or subtopics. • The algorithm uses quantitative lexical analysis to determine the segmentation of the documents.
The algorithm identifies subtopic boundaries by:
a) Dividing the text into adjacent token sequences of 20 words each.
b) Comparing adjacent blocks of token sequences for overall lexical similarity.
  • In TextTiling, a block of k sentences is treated as a logical unit.
  • The frequency of occurrence of a term within each block is compared to its frequency in the entire domain.
  • This helps in identifying whether the term is used within a discussed topic or throughout the entire text.
  • If the term occurs frequently over the entire text, then it cannot be used to identify topics.
The algorithm identifies subtopic boundaries by (continued):
c) Computing similarity values for adjacent blocks.
d) Determining boundary changes in the sequence of similarity scores.
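The four steps can be sketched in Python as follows. This is a simplified illustration, not the full TextTiling algorithm: it assumes plain term counts with cosine similarity (the comparison of a term's block frequency with its frequency in the entire domain is left out), blocks of k token sequences rather than k sentences, and a depth-score cutoff of mean minus half a standard deviation. All function names are hypothetical.

```python
import math
import re
import statistics
from collections import Counter

def token_sequences(text, w=20):
    """Step a: cut the text into adjacent token sequences of w words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tokens[i:i + w] for i in range(0, len(tokens), w)]

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_similarities(seqs, k=2):
    """Steps b and c: similarity of the k sequences before and after each gap."""
    sims = []
    for gap in range(1, len(seqs)):
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        sims.append(cosine(left, right))
    return sims

def depth_scores(sims):
    """How deep each similarity valley is relative to the peaks on both sides."""
    depths = []
    for i, s in enumerate(sims):
        left = i
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1
        right = i
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1
        depths.append((sims[left] - s) + (sims[right] - s))
    return depths

def subtopic_boundaries(text, w=20, k=2):
    """Step d: return indices of token sequences that start a new subtopic."""
    sims = gap_similarities(token_sequences(text, w), k)
    if not sims:
        return []
    depths = depth_scores(sims)
    cutoff = statistics.mean(depths) - statistics.pstdev(depths) / 2
    return [i + 1 for i, d in enumerate(depths) if d > 0 and d > cutoff]

# Toy text: 40 words on one topic followed by 40 words on another.
toy_text = "cat " * 40 + "dog " * 40
boundaries = subtopic_boundaries(toy_text)
```

In the toy text the vocabulary changes completely after the second token sequence, so the similarity curve has a single deep valley there and `boundaries` points at the third sequence (index 2).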
Word Image Spotting • In the case of digitized text images, keywords have to be located in the document. • The set of keywords to be located can be specified as part of the application.
Typical word spotting systems need to do the following:
a) Identify a text line by using a bounding box of a standard height and width.
  • The concept of multi-resolution morphology is used to identify text lines using the specified bounding boxes.
b) Identify specific words within the determined text line.
  • A technique termed Hidden Markov Model (HMM) is used to identify the specific words in the text line.
* A separate sheet will be given for Word Image Spotting using HMM.
Metadata For Speech • The speech medium refers to spoken language. • It is considered part of audio. • Speech processing is important because speech is easy to use as an input/output mechanism for multimedia applications. • The metadata that needs to be generated can be content dependent or content descriptive.
The metadata generated for speech can be as follows:
a) Identification of spoken words.
  • This is called speech recognition.
b) Identification of the speaker.
  • This is called speaker identification or speaker recognition.
  • Here a person's identity is chosen from a set of known speakers.
  • It also helps in deciding whether or not a particular speaker produced the utterance, which is termed verification.
c) Identification of prosodic information.
  • Prosody is used for drawing attention to a phrase or sentence, or to alter the word meaning.
Metadata For Speech • Metadata generated as part of speech recognition is content dependent. • This metadata can consist of the start and end times of the speech, along with a confidence level for the spoken word identification.
Metadata For Speech • Metadata generated as part of speaker recognition can be considered content descriptive, even though it is generated by analyzing the content of the speech. • This metadata can consist of the name of the speaker and the start and end times of the speech.
Metadata For Speech • Metadata describing the prosodic information can be considered content dependent. • It can consist of: • the implied meaning, in case the speaker altered the word meaning, and • a confidence score for the recognition of the prosodic information.
Metadata For Speech • Content independent metadata can also be associated with speech data. • The time of the speech and the location where the speech was given can be considered content independent metadata for speech.
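The different kinds of speech metadata listed on these slides can be gathered into one record per recognized word. The sketch below is one possible layout, not a prescribed schema: the class name `SpeechMetadata`, its field names, the helper `confident_words` and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpeechMetadata:
    # Content dependent: output of speech recognition.
    word: str
    start_s: float          # start time of the word, in seconds
    end_s: float            # end time of the word, in seconds
    confidence: float       # confidence of the spoken word identification
    # Content descriptive: output of speaker recognition.
    speaker: Optional[str] = None
    # Content dependent: prosodic information.
    implied_meaning: Optional[str] = None
    prosody_confidence: Optional[float] = None
    # Content independent: known without analyzing the speech.
    recorded_at: Optional[str] = None
    location: Optional[str] = None

def confident_words(records: List[SpeechMetadata], min_confidence: float = 0.8) -> List[str]:
    """Keep only the words recognized with at least the given confidence."""
    return [r.word for r in records if r.confidence >= min_confidence]

# Placeholder records (invented values).
records = [
    SpeechMetadata("summit", 0.0, 0.6, 0.93, speaker="Speaker A", location="Lecture hall"),
    SpeechMetadata("really", 0.6, 1.1, 0.55, implied_meaning="doubt", prosody_confidence=0.7),
]
words = confident_words(records)
```

Keeping the confidence values next to the recognized word lets a query decide how much uncertainty it is willing to accept, instead of fixing that decision at extraction time.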
Speech Recognition System • Has two (2) main components: • Signal Processing Module • Pattern Matching Module
Signal Processing Module • Gets the analog speech signal (via a microphone or a recorder) and digitizes it. • The digitized signal is processed to do the following: • detection of silence periods • separation of speech from non-speech components • conversion of the raw waveform into a frequency domain representation, and data compression
Signal Processing Module • The stream of sampled speech data values is grouped into frames, usually of 10-30 ms duration. • The aim of this conversion is to retain only those components that are useful for recognition purposes. • This processed speech signal is used for identification of the spoken words, the speaker or the prosodic information.
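Two of these steps, grouping samples into frames and detecting silence periods, can be sketched with the standard library alone. The sketch assumes an 8000 Hz sample rate, 20 ms frames and a fixed energy threshold, none of which come from the slides, and it does not cover the frequency domain conversion or compression. The function names are hypothetical.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=20):
    """Group the sample stream into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def silence_mask(frames, threshold=1e-4):
    """True for frames whose energy falls below the (assumed) silence threshold."""
    return [frame_energy(f) < threshold for f in frames]

# Toy signal: 20 ms of silence, 20 ms of a 440 Hz tone, 20 ms of silence.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(160)]
signal = [0.0] * 160 + tone + [0.0] * 160

frames = frame_signal(signal, sample_rate)
mask = silence_mask(frames)
```

With these settings each frame holds 160 samples, and `mask` marks the first and last frames as silence. A real system would usually pick the threshold from the background noise level rather than hard-coding it.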
Pattern Matching Module • Recognition is done by matching the processed speech with stored patterns. • The pattern matching module has a repository of reference patterns that consists of the following: • different utterances of the same set of words • different utterances by the same speaker • different ways of modifying the meaning of a word
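Matching against a repository of reference patterns can be illustrated with a nearest-pattern search. The slides do not say which matching technique is used, so the Euclidean distance below is a stand-in chosen for simplicity, and the three-element feature vectors and the names `reference_patterns` and `best_match` are invented for the example.

```python
import math

# Repository of reference patterns: several utterances per word.
# Each utterance is reduced to a short feature vector (placeholder values).
reference_patterns = {
    "yes": [[1.0, 0.0, 0.2], [0.8, 0.1, 0.3]],
    "no": [[0.0, 1.0, 0.5], [0.1, 0.9, 0.4]],
}

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(features, references):
    """Return the label of the closest stored pattern and its distance."""
    best_label, best_dist = None, math.inf
    for label, patterns in references.items():
        for pattern in patterns:
            d = distance(features, pattern)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist

label, dist = best_match([0.9, 0.1, 0.2], reference_patterns)
```

Storing several utterances per word, as the slide describes, gives the matcher more than one chance to find a close pattern when speakers pronounce the same word differently.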
Metadata For Images • Metadata for images depends on: • the type of images that are to be analyzed • the applications that will be using the analyzed data • Metadata can be used for a few types of images, such as: • satellite images • facial images • architectural design images
Metadata For Satellite Images • From a computer scientist's point of view, satellite images are treated as 3-D grids (2889 rows, 4578 columns, 10 layers deep). • Earth scientists focus instead on the processes that created the images. • From this point of view, the images have 10 bands or layers, each created by a different process.
Categories of Satellite Metadata • Raster metadata • Lineage metadata • Data set metadata • Object description metadata
Raster Metadata • Describes the grid structure, spatial information and temporal information. • The spatial information describes the location and overlay of images. • The temporal information describes the time at which the images were taken.
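A raster metadata record holding these three kinds of information might look like the sketch below. The grid dimensions reuse the 2889 x 4578 x 10 example from the satellite image slide. Everything else is assumed for illustration: the class and field names, the bounding-box representation of location, the overlap test used to stand in for overlay, and the coordinate and timestamp values.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RasterMetadata:
    # Grid structure.
    rows: int
    columns: int
    layers: int
    # Spatial information: (min_lon, min_lat, max_lon, max_lat), placeholder values.
    bbox: Tuple[float, float, float, float]
    # Temporal information: acquisition time as an ISO 8601 string.
    acquired_at: str

    def cell_count(self) -> int:
        """Total number of grid cells across all layers."""
        return self.rows * self.columns * self.layers

def overlaps(a: RasterMetadata, b: RasterMetadata) -> bool:
    """True if the two images cover a common area, so they could be overlaid."""
    a_min_lon, a_min_lat, a_max_lon, a_max_lat = a.bbox
    b_min_lon, b_min_lat, b_max_lon, b_max_lat = b.bbox
    return (
        a_min_lon < b_max_lon and b_min_lon < a_max_lon
        and a_min_lat < b_max_lat and b_min_lat < a_max_lat
    )

scene_a = RasterMetadata(2889, 4578, 10, (86.0, 27.0, 88.0, 29.0), "2001-05-01T06:30:00")
scene_b = RasterMetadata(2889, 4578, 10, (87.0, 28.0, 89.0, 30.0), "2001-05-02T06:30:00")
scene_c = RasterMetadata(2889, 4578, 10, (90.0, 30.0, 91.0, 31.0), "2001-05-03T06:30:00")
```

With records like these, a query such as "which images overlap this scene and were taken before a given date" can be answered from the metadata alone, without reading any pixel data.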