Concepts of Multimedia Processing and Transmission

IT 481, Lecture #1

Dennis McCaughey, Ph.D.

28 August, 2006

Outline
  • Course Description
  • Instructor
  • Student Survey
  • Exams, Homework and Project
  • Grading
  • General Policies
  • Lecture Schedule

IT 481, Fall 2006

Course Description
  • Topics
    • The fundamentals of signal and image processing, including algorithms for signal processing that have applications to multimedia
    • Techniques for voice coding and recognition, CD and DVD technology, streaming video, WANs and LANs, and videoconferencing technology
  • Text: Multimedia Communication Systems: Techniques, Standards, and Networks, K. R. Rao, Zoran S. Bojkovic, Dragorad A. Milovanovic,  Prentice Hall PTR; 1st edition (April 26, 2002), ISBN: 013031398X.

Instructor
  • Dennis McCaughey
    • Contact Information
      • 703-263-7425 (Office)
      • 703-624-6830 (Cell)
      • dgm@rincon.com (e-mail)
      • Office Hours: one hour before class
    • Background
      • PhD in EE, University of Southern California, 1977
        • Thesis: Degrees of Freedom for Projection Imaging

Student Survey
  • Name
  • Contact Information
  • Last degree and current degree objective, e.g.:
    • Undergrad seeking Bachelor’s, Grad seeking MS/PhD, Other
  • Mathematical Background
    • Calculus?
    • Differential Equations?
    • Linear Algebra?
    • Probability, Statistics, Random Processes?

Student Survey Cont’d
  • Systems Background
    • Linear Systems?
    • Signal Processing?
    • Image Processing?
  • Programming Languages
    • C or C++?
    • MATLAB?

Exams, Homework and Project
  • Mid-Term: 1 Hour Closed Book
    • Covers the key topics from class and homework
  • Final: Format “To Be Determined”
  • Homework: 1) Reading assignments, 2) Written answers to selected questions based on reading assignments, 3) Some limited math problems
  • Project: Format (Preliminary): MATLAB implementation of a multimedia processing application.

More on the Project
  • A course project will be required, exploring aspects of multimedia signal processing; it may be computer-based using MATLAB.
  • Project topics will be of the student’s choice subject to review by the instructor.
  • Each student will also be required to present a short briefing on the results.
  • Projects will be evaluated on the content of the presentation and not on the briefing itself.
  • Details regarding topics, content, and format will be provided during the course.

Grading
  • The final grade will be determined by a weighted average of the homework assignments, a mid-term exam, a final exam and a project

General Policies
  • Collaboration
    • Students are permitted and encouraged to collaborate on homework assignments. 
    • All graded work, however, must be the original effort of the student submitting the paper. 
  • Homework
    • Homework will be collected at the beginning of each class period.  Note:  Late homework will be accepted provided the reason for the delay is coordinated with the instructor within 2 days of its assignment. Homework solutions will be discussed in class.
  • Make-up Exams
    • Make-up exams will not be given unless a detailed written explanation, accompanied by documentation for the absence, is provided. If this information is not provided, an F grade will be given for the exam. The location and time for a make-up exam will be decided by the instructor. Also, students are expected to be in class and on time for every class.

What is Multimedia?
  • Multimedia is a combination of text, art, sound, animation, and video.

Slide: Courtesy, Hung Nguyen

Multimedia Components Simplified

Diagram: Audio, Video and Data combining into Multimedia

  • Multimedia can be viewed as the combination of audio, video and data, together with how they interact with the user (more than the sum of the individual components)

Background
  • Fast-paced emergence of applications in medicine, education, travel, etc.
  • Characterized by large documents that must be communicated with short delays
  • Glamorous applications such as distance learning, video teleconferencing
  • Applications that are enhanced by video are often seen as a driver for the development of multimedia networks

Forces Driving Communications That Facilitate Multimedia Communications
  • Evolution of communications and data networks
  • Increasing availability of almost unlimited bandwidth on demand
  • Availability of ubiquitous access to the network
  • Ever increasing amount of memory and computational power
  • Sophisticated terminals
  • Digitization of virtually everything

New Information System Paradigm

Diagram labels: Multimedia Integrated Communication (Broadband Link); Integration; Multimedia Processing (Workstation, PC)

Slide: Courtesy, Hung Nguyen

Elements of Multimedia Systems

Diagram labels: User Interface (three instances), Transport (two instances), Processing, Storage and Retrieval

  • Two key communication modes
    • Person-to-person
    • Person-to-machine

Slide: Courtesy, Hung Nguyen

Multimedia Networks
  • The world has been wrapped in copper and glass fiber and can be viewed as a “hair ball” with physical, wireless and satellite entry/exit points.
  • Physical: LAN-WAN connections
  • Wireless: Cellular telephony, wireless PC connectivity
  • Satellite: Inmarsat, Thuraya, ACeS, etc.

Multimedia Communication Model
  • Partitioning of information objects into distinct types, e.g., text, audio, video
  • Standardization of service components per information type
  • Creation of platforms at two levels – network service and multimedia communication
  • Define general applications for multiple use in various multimedia environments
  • Define specific applications, e.g. e-commerce, tele-training, … using building blocks from platform and general applications

Requirements
  • User Requirements
    • Fast preparation and presentation
    • Dynamic control of multimedia applications
    • Intelligent support to users
    • Standardization
  • Network Requirements
    • High speed and variable bit rates
    • Multiple virtual connections using the same access
    • Synchronization of different information types
    • Suitable standardized services along with support

Network Requirements
  • ATM-BISDN and SS7 have enabled the switching-based communications capabilities over the PSTN that support the necessary services
  • ATM-BISDN-SS7 will evolve to all optical “switchless” networks based on packet transfer

Packet Transfer Concept
  • Allows voice, video and data to be dealt with in a common format
  • More flexible than circuit switching, which it can emulate, while allowing the multiplexing of data streams with varied bit rates
  • Dynamic allocation of bandwidth
  • Handles Variable Bit Rate (VBR) traffic directly

Considerations
  • Buffering required for constant bit rate data such as audio
  • Re-sequencing and recovery capabilities must be provided over networks where packets may arrive out of order or be dropped
    • In an ATM network some packets can be dropped while others may not (e.g., voice packets vs. bank-transfer data packets)
    • Optimum packet lengths for voice, video and data differ in an ATM network
    • IP packets over the Internet may arrive in a different order or be dropped
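
The re-sequencing and loss-detection requirement above can be sketched as follows. This is only an illustration, assuming each packet carries an integer sequence number starting at 0 and that the receiver knows how many packets were sent; the function name `resequence` and the tuple layout are hypothetical and not taken from any particular protocol.

```python
def resequence(packets, expected_count):
    """Reorder received packets by sequence number and report losses.

    packets: list of (sequence_number, payload) tuples in arrival order.
    expected_count: number of packets the sender transmitted (assumed known).
    Returns (ordered_payloads, missing_sequence_numbers); the slot of a lost
    packet holds None so a decoder could conceal it or request retransmission.
    """
    received = {}
    for seq, payload in packets:
        # Keep the first copy of each packet and ignore duplicates.
        received.setdefault(seq, payload)
    ordered = [received.get(seq) for seq in range(expected_count)]
    missing = [seq for seq in range(expected_count) if seq not in received]
    return ordered, missing


arrived = [(2, "c"), (0, "a"), (3, "d")]  # packet 1 was dropped in transit
ordered, missing = resequence(arrived, expected_count=4)
print(ordered)  # ['a', None, 'c', 'd']
print(missing)  # [1]
```

Whether a missing packet is concealed (voice) or retransmitted (bank-transfer data) is an application decision that this sketch leaves open.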

Digital Video Signal Transport
  • Encoder
    • Transformation
    • Quantization
    • Entropy Coding
    • Bit-Rate Control
  • Application
    • Data Structuring
  • Application
    • Re-Synch
  • Decoder
    • De-quantization
    • Entropy decode
    • Inv Trans
    • Loss conceal
    • Post process

Diagram labels: Video, Network Multiplexing/Routing, Users

  • Error detection
  • Loss detection
  • Error correction
  • Erasure correction
  • Overhead (FEC)
  • Re-Trans
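
As one hedged illustration of the "Overhead (FEC)" and "Erasure correction" items, the sketch below adds a single XOR parity packet to a group of equal-length packets and rebuilds one lost packet from the survivors. Single-parity XOR is swapped in here as the simplest erasure code; the slide does not say which FEC scheme is meant, and the helper names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets):
    """Append one parity packet (the XOR of all data packets): the FEC overhead."""
    return packets + [reduce(xor_bytes, packets)]


def recover(received):
    """Rebuild the data packets when exactly one entry (marked None) was lost.

    Assumes equal-length packets and exactly one erasure; the XOR of all
    surviving packets equals the missing one.
    """
    lost = received.index(None)
    survivors = [p for p in received if p is not None]
    repaired = list(received)
    repaired[lost] = reduce(xor_bytes, survivors)
    return repaired[:-1]  # drop the parity packet


data = [b"abcd", b"efgh", b"ijkl"]
sent = add_parity(data)
sent[1] = None  # the network erases the second packet
print(recover(sent) == data)  # True
```

The cost is one extra packet per group, and the scheme cannot repair two losses in the same group; that is where retransmission ("Re-Trans") or a stronger code would come in.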

Quality of Service (QoS)
  • The set of parameters that defines the properties of media streams
  • Can define four QoS layers:
    • User QoS: Perception of the multimedia data at the user interface (“qualitative”)
    • Application QoS: Parameters such as end-to-end delay (“quantitative”)
    • System QoS: Requirements on the communications services derived from the application QoS
    • Network QoS: Parameters such as network load and performance
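
A quantitative Application QoS parameter such as end-to-end delay can be measured from packet timestamps. The sketch below is a minimal example, assuming sender and receiver clocks are synchronized and timestamps are in milliseconds; the function name and the use of peak-to-peak spread as the delay-variation measure are assumptions made for illustration.

```python
def delay_stats(send_ms, recv_ms):
    """Per-packet end-to-end delays plus their mean and spread (all in ms)."""
    delays = [r - s for s, r in zip(send_ms, recv_ms)]
    mean = sum(delays) / len(delays)
    spread = max(delays) - min(delays)  # peak-to-peak delay variation
    return delays, mean, spread


delays, mean, spread = delay_stats([0, 20, 40], [100, 130, 140])
print(delays)  # [100, 110, 100]
print(spread)  # 10
```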

Importance of Interaction
  • Multimedia is more than the combination of text, audio, video and data
  • Interaction among media is important
  • Consider a poorly dubbed movie
    • Audio not synchronized with video
    • Lip movements inconsistent with language
    • Audio dynamic range inconsistent with the scene

Slide: Courtesy, Hung Nguyen

Media Interaction
  • Process and Model
  • Media shown in the diagram: Audio, Text, Image, Video (combined as Multimedia)
  • Techniques labeled in the diagram: Compression, Synthesis, 3D Sound; Lip synch, Face Animation, Joint A/V Coding; Speech Recognition, Text-to-Speech; Sign language, Lip reading; Compression, Graphics; Database indexing/retrieval; Translation, Natural language

Slide: Courtesy, Hung Nguyen

Bimodality of Human Speech
  • Human speech is produced by vibration of the vocal cords and configuration of the vocal tract, using muscles that also generate facial expressions

Slide: Courtesy, Hung Nguyen

Basic Definitions
  • The basic unit of acoustic speech is called a phoneme
  • In the visual domain, the basic unit of mouth movement is called a viseme
    • A viseme is the smallest visibly distinguishable unit of speech
    • Several phonemes can look alike on the lips and thus form one viseme group
    • A many-to-one mapping between phonemes and visemes
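
The many-to-one mapping between phonemes and visemes can be pictured as a lookup table. The groups below are illustrative assumptions for this sketch (for example, /p/, /b/ and /m/ all close the lips and look alike), not a standard viseme inventory, and the group labels are made up.

```python
# Illustrative viseme groups; labels and membership are assumptions
# for this sketch, not a standard inventory.
VISEME_GROUPS = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
}

# Invert the groups: many phonemes map to one viseme.
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_GROUPS.items()
    for phoneme in phonemes
}


def to_visemes(phonemes):
    """Map a phoneme sequence to viseme labels ('unknown' if unlisted)."""
    return [PHONEME_TO_VISEME.get(p, "unknown") for p in phonemes]


print(to_visemes(["b", "m", "f"]))  # ['bilabial', 'bilabial', 'labiodental']
```

Because the mapping is many-to-one it cannot be inverted: seeing a "bilabial" viseme does not tell a lip reader which of the three phonemes was spoken.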

Slide: Courtesy, Hung Nguyen

Lip Reading System
  • Application: supporting hearing-impaired persons
  • People learn to understand spoken language by combining visual content with lexical, syntactic, semantic and pragmatic information
  • Automated lip reading systems
    • Speech recognition possible using only visual information
    • Integrated with speech recognition systems to improve accuracy

Slide: Courtesy, Hung Nguyen

Lip Synchronization
  • Applications
    • In VTC (video teleconferencing), where video frames are dropped (to meet a low-bandwidth requirement) but audio must still be continuous
    • In non-real-time use such as dubbing in a studio, where the recorded voice is full of background noise
  • Time-warping commonly used in both audio and video modes
    • Time-frequency analysis
    • Video time-warping could be used for VTC
    • Audio time-warping could be used for dubbing
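
The slide does not say which time-warping algorithm is meant. As one well-known variant, the sketch below computes a dynamic time warping (DTW) cost between two one-dimensional feature sequences; the function name and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]; each step may
    advance one sequence, the other, or both, so a sequence can be locally
    stretched or compressed in time.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# The second sequence repeats one sample; warping absorbs the repeat.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```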

Slide: Courtesy, Hung Nguyen

Lip Tracking
  • Aims to prevent too much jerkiness in the motion rendering and too much loss of lip synchronization
  • Involves real-time analysis of the three-dimensional video signal plus one temporal dimension
  • Produce meaningful parameters
    • Classification of mouth images into visemes
    • Measures of dimension, e.g. mouth widths and heights
  • Analysis tools – Fourier Transform, Karhunen-Loeve Transform (KLT), Probability Density Function (pdf) Estimation

Slide: Courtesy, Hung Nguyen

Audio-to-Visual Mapping for Lip Tracking
  • Conversion of acoustic speech to mouth shape parameters
  • A mapping of phonemes to visemes
  • Could be most precisely implemented with a complete speech recognizer followed by a look-up table
    • High computational overhead plus table look-up complexity
    • Recognizing the spoken words is not necessary to achieve audio-to-visual mapping
  • Physical relationships exist between vocal tract shape and the sound produced, so functional relationships exist between speech and visual parameters

Slide: Courtesy, Hung Nguyen

Classification-Based Conversion Approaches for Lip Tracking
  • Two-step process
    • Classification of the acoustic signal using VQ (vector quantization), HMM (hidden Markov model) or NN (neural network)
    • Mapping of each acoustic class to a visual output: the visual vectors belonging to the class are averaged to get a centroid
  • Shortcomings
    • Error resulting from averaging visual vectors to get the visual centroid
    • Not a continuous mapping – finite output levels
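
The two-step process can be sketched with the simplest of the listed classifiers, vector quantization. This is a toy illustration under stated assumptions: the codebook is given rather than trained, the feature vectors are invented, and all function names are hypothetical.

```python
def nearest(codebook, vec):
    """Step 1: index of the codeword closest to vec (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def train_visual_centroids(codebook, pairs):
    """Average the visual vectors of each acoustic class into a centroid.

    pairs: list of (acoustic_vector, visual_vector) training examples.
    Classes with no training example get no centroid.
    """
    groups = {}
    for acoustic, visual in pairs:
        groups.setdefault(nearest(codebook, acoustic), []).append(visual)
    return {
        k: [sum(col) / len(vs) for col in zip(*vs)]
        for k, vs in groups.items()
    }


def audio_to_visual(codebook, centroids, acoustic):
    """Step 2: output the visual centroid of the acoustic vector's class."""
    return centroids[nearest(codebook, acoustic)]


codebook = [[0.0, 0.0], [10.0, 10.0]]  # assumed, not trained here
pairs = [
    ([1.0, 0.0], [2.0, 4.0]),
    ([0.0, 1.0], [4.0, 6.0]),
    ([9.0, 10.0], [20.0, 8.0]),
]
centroids = train_visual_centroids(codebook, pairs)
print(audio_to_visual(codebook, centroids, [0.5, 0.5]))  # [3.0, 5.0]
```

Both shortcomings are visible here: every acoustic vector in class 0 yields the same output `[3.0, 5.0]`, so the output takes only as many values as there are classes, and neither training example in that class is reproduced exactly.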

Slide: Courtesy, Hung Nguyen

Classification-Based Conversion

Diagram labels: Phoneme Space, Viseme Space, Centroid

Slide: Courtesy, Hung Nguyen

Audio and Visual Integration for Lip Reading Applications
  • Three major steps
    • Audio-visual pre-processing – Principal Component Analysis (PCA) has been used for feature extraction
    • Pattern recognition strategy (HMM, NN, time-warping…)
    • Integration strategy (decision making)
      • Heuristic rules to incorporate knowledge of phonemes about the two modalities
      • Combination of independent evaluation scores for each modality
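
One simple way to combine independent evaluation scores is a weighted sum per candidate class. The sketch below assumes each recognizer returns a log-likelihood-style score (higher is better) for the same set of labels; the weight, the scores and the function name are made-up illustrative values, not taken from the slides.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.7):
    """Weighted sum of per-class scores from the two modalities.

    Returns (best_label, fused_scores). Both dicts must share the same keys.
    """
    fused = {
        label: audio_weight * audio_scores[label]
        + (1 - audio_weight) * visual_scores[label]
        for label in audio_scores
    }
    return max(fused, key=fused.get), fused


audio = {"ba": -2.0, "da": -1.8}   # audio alone slightly prefers "da"
visual = {"ba": -0.5, "da": -3.0}  # the lips clearly close, as in "ba"
best, fused = fuse_scores(audio, visual)
print(best)  # ba
```

In a noisy room the audio weight could be lowered so the visual score counts for more; how to choose the weight is part of the integration strategy.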

Slide: Courtesy, Hung Nguyen

Application in Biometrics – Bimodal Person Verification
  • Existing methods for person verification are mainly based on a single modality, which limits their security and robustness
  • Audio-visual integration using a camera and microphone makes person verification more reliable

Slide: Courtesy, Hung Nguyen

Joint Audio-Video Coding
  • Correlation between audio and video can be used to achieve more efficient coding
    • Predictive coding of audio and video information is used to construct an estimate of the current frame (cross-modal redundancy)
    • Difference between original and estimated signal can be transmitted as parameters
    • Decision on what and how to send is based on Rate Distortion (R-D) criteria
  • Reconstruction done at receiver according to agreed-upon decoding rules
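
A rate-distortion decision is often formalized as minimizing a Lagrangian cost D + λR over the available options. The slide does not state the exact criterion, so the sketch below is an assumption-laden illustration: the three modes, their rates in bits and their distortions are invented numbers, and `choose_mode` is a hypothetical name.

```python
def choose_mode(candidates, lam):
    """Pick the option minimizing distortion + lam * rate.

    candidates: list of (name, rate_bits, distortion) tuples.
    lam: Lagrange multiplier; larger values make bits more expensive.
    """
    return min(candidates, key=lambda c: c[2] + lam * c[1])


candidates = [
    ("send nothing", 0, 9.0),          # decoder relies on its own estimate
    ("send difference", 8, 2.0),       # original minus estimate
    ("send full parameter", 32, 0.5),
]
print(choose_mode(candidates, lam=0.1)[0])   # send difference
print(choose_mode(candidates, lam=2.0)[0])   # send nothing
print(choose_mode(candidates, lam=0.01)[0])  # send full parameter
```

The better the cross-modal estimate, the lower the distortion of "send nothing" and the more often the encoder can skip transmission entirely.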

Slide: Courtesy, Hung Nguyen

Cross-Modal Predictive Coding

Diagram labels: Visual Analysis, Parameter X, A-to-V Mapping, Decision Module (R-D), Nothing, Parameter X

Slide: Courtesy, Hung Nguyen

Applications of Multimedia
  • Business - Business applications for multimedia include presentations, training, marketing, advertising, product demos, databases, catalogues, instant messaging, and networked communication.
  • Schools - Educational software can be developed to enrich the learning process.

Slide: Courtesy, Hung Nguyen

Applications of Multimedia
  • Home - Most multimedia projects reach the homes via television sets or monitors with built-in user inputs.
  • Public places - Multimedia will become available at stand-alone terminals or kiosks to provide information and help.

Slide: Courtesy, Hung Nguyen

Compact Disc Read-Only (CD-ROM)
  • CD-ROM is the most cost-effective distribution medium for multimedia projects.
  • It can contain up to 80 minutes of full-screen video or sound.
  • CD burners are used for reading and writing discs in audio, video, and data formats.
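
For the sound case, the 80-minute figure can be checked with a short calculation. The parameters below are the standard CD audio format (44.1 kHz sampling, 16-bit samples, two channels); they are not stated on the slide. The video case depends on the compression used and is not computed here.

```python
SAMPLE_RATE_HZ = 44_100  # CD audio sampling rate
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 2             # stereo
MINUTES = 80

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
total_bytes = bytes_per_second * MINUTES * 60
print(bytes_per_second)  # 176400
print(total_bytes)       # 846720000
```

So 80 minutes of uncompressed CD-quality audio is roughly 847 million bytes of sample data.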

Slide: Courtesy, Hung Nguyen

Digital Versatile Disc (DVD)
  • Multilayered DVD technology increases the capacity of current optical technology to about 17 GB (the double-sided, dual-layer DVD-18 format).
  • DVD authoring and integration software is used to create interactive front-end menus for films and games.
  • DVD burners are used for reading and writing discs in audio, video, and data formats.

Slide: Courtesy, Hung Nguyen

Multimedia Communications

Diagram: three areas combining into Multimedia Communications
  • Audio Communications (telephony, sound, broadcast)
  • Video Communications (video telephony, TV/HDTV)
  • Data, Text, Image Communications (data transfer, fax…)

  • Multimedia communications is the delivery of multimedia to the user by electronic or digitally manipulated means.

Slide: Courtesy, Hung Nguyen
