WMES3103 : INFORMATION RETRIEVAL


PowerPoint Slideshow about 'WMES3103 : INFORMATION RETRIEVAL' - mizell



TEXT AND MULTIMEDIA LANGUAGES AND PROPERTIES

INTRODUCTION
  • Text - main form of communicating data and information
  • Text is often supplemented with multimedia elements to make the contents of an information retrieval system (IRS) more attractive and interactive
  • A website that combines text and multimedia is likely to attract more visitors than one that is text-only
  • In an IRS, text and multimedia are represented via special languages
Metadata
  • A newer concept for describing information: metadata
  • Information about the organization of the data, the data domains and the relationship between them
  • Data about data
  • 2 types – descriptive and semantic
Descriptive metadata – metadata that describes a document or a unit of information
  • Commonly used Metadata :
    • Authors
    • Date of publication
    • Source of publication
    • Length of document
    • Type of document
Metadata
  • Semantic metadata – characterizes the subject matter that can be obtained from the contents of the document, e.g. subject headings
  • Keywords
  • LC (Library of Congress) classification code
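The two kinds of metadata can be pictured as fields on a single record. The following is a minimal sketch only: the class name `DocumentMetadata`, its field names and all sample values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentMetadata:
    """Hypothetical record holding both kinds of metadata for one document."""
    # Descriptive metadata: facts about the document as an object.
    authors: List[str]
    publication_date: str
    source: str
    length_pages: int
    document_type: str
    # Semantic metadata: what the document is about.
    subject_headings: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    lc_code: str = ""

    def descriptive(self) -> Dict[str, object]:
        """Return only the descriptive fields."""
        return {
            "authors": self.authors,
            "publication_date": self.publication_date,
            "source": self.source,
            "length_pages": self.length_pages,
            "document_type": self.document_type,
        }

    def semantic(self) -> Dict[str, object]:
        """Return only the semantic fields."""
        return {
            "subject_headings": self.subject_headings,
            "keywords": self.keywords,
            "lc_code": self.lc_code,
        }


# Made-up sample values, for illustration only.
record = DocumentMetadata(
    authors=["A. Author"],
    publication_date="2001-01-01",
    source="Example Journal",
    length_pages=12,
    document_type="article",
    subject_headings=["Information retrieval"],
    keywords=["metadata", "indexing"],
    lc_code="Z699",  # illustrative classification code, not from the slides
)
```

An IRS could filter on the descriptive fields (for example, by author or date) and index the semantic fields for subject searching.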
TEXT
  • With computers, we need to code text into binary digits
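  • First coding schemes – EBCDIC (8 bits per symbol) and ASCII (7 bits per symbol)
  • Later, ASCII was extended to 8 bits (e.g. ISO Latin-1) to accommodate other languages, accents and diacritical marks
  • Languages with very large character sets (e.g. Chinese, Japanese, Korean) – Unicode, originally designed as a 16-bit code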
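The effect of the coding scheme on storage can be checked directly. This is a small sketch using Python's built-in codecs; the helper name `encoded_size` and the sample strings are assumptions chosen for illustration, and `latin-1` stands in for an 8-bit extension of ASCII.

```python
from typing import Optional


def encoded_size(text: str, encoding: str) -> Optional[int]:
    """Return the number of bytes needed, or None if the text cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


plain = "cafe"          # only 7-bit ASCII characters
accented = "caf\u00e9"  # ends with e-acute, outside 7-bit ASCII
cjk = "\u6587"          # a single Chinese character

# 7-bit ASCII: every byte value is below 128.
ascii_bytes = plain.encode("ascii")
all_seven_bit = all(b < 128 for b in ascii_bytes)

sizes = {
    "plain/ascii": encoded_size(plain, "ascii"),
    "accented/ascii": encoded_size(accented, "ascii"),
    "accented/latin-1": encoded_size(accented, "latin-1"),
    "accented/utf-16": encoded_size(accented, "utf-16-be"),
    "cjk/latin-1": encoded_size(cjk, "latin-1"),
    "cjk/utf-16": encoded_size(cjk, "utf-16-be"),
}
```

The accented word fails in 7-bit ASCII but fits in one byte per character in an 8-bit code, while the Chinese character needs a 16-bit code unit.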
TEXT

Formats

  • No one single format for a text document
  • A good IRS should be able to retrieve information from any format
  • Early IRSs converted each document to an internal format, but this had many disadvantages
  • Now, many formats have been developed for document interchange
TEXT
  • RTF – Rich Text Format for word processing
  • PDF – Portable Document Format for displaying and printing documents
  • PostScript – powerful programming language for drawing
  • MIME – Multipurpose Internet Mail Extensions, used to encode e-mail
  • Files are often compressed – compress (Unix), ARJ (PCs), ZIP
  • Binary files can be converted to ASCII text – uuencode/uudecode, BinHex
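Compression and binary-to-ASCII conversion are often chained: compress first, then encode the binary result as text. The sketch below assumes Python's standard `zlib` and `binascii` modules. Note that `zlib` uses DEFLATE (the method commonly used inside ZIP files), not the algorithms of Unix compress or ARJ, and the helper names are hypothetical.

```python
import binascii
import zlib
from typing import List

UU_CHUNK = 45  # binascii.b2a_uu accepts at most 45 bytes per line


def uuencode_lines(data: bytes) -> List[bytes]:
    """uuencode data one 45-byte chunk at a time; each result is an ASCII line."""
    return [
        binascii.b2a_uu(data[i:i + UU_CHUNK])
        for i in range(0, len(data), UU_CHUNK)
    ]


def uudecode_lines(lines: List[bytes]) -> bytes:
    """Reverse uuencode_lines by decoding and concatenating each line."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


original = b"information retrieval " * 40

compressed = zlib.compress(original)           # binary, smaller than the input
encoded = uuencode_lines(compressed)           # ASCII-only lines
restored = zlib.decompress(uudecode_lines(encoded))
```

The encoded lines contain only 7-bit characters, so they can pass through channels that are not safe for arbitrary binary data, at the cost of some size overhead.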
MARKUP LANGUAGES
  • Markup = extra textual syntax that can be used to describe formatting actions, structure information, text semantics, attributes, etc.
  • Formal markup languages are more structured
  • Marks = tags - initial and ending tag surrounding the marked text
  • Standard metalanguage = SGML
  • New metalanguage for the Web = XML (eXtensible Markup Language) = subset of SGML
  • Most popular markup language used for the Web = HTML (HyperText Markup Language)
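To make the idea of initial and ending tags concrete, here is a small sketch that parses an XML fragment with Python's standard `xml.etree.ElementTree` module. The element names `slide`, `title` and `point` and the `number` attribute are invented for this example and are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Each piece of marked text sits between an initial tag and an ending tag.
document = """<slide number="10">
  <title>MARKUP LANGUAGES</title>
  <point>Marks = tags</point>
  <point>Standard metalanguage = SGML</point>
</slide>"""

root = ET.fromstring(document)

slide_number = root.attrib["number"]                 # attribute on the initial tag
title = root.find("title").text                      # text between <title> and </title>
points = [p.text for p in root.findall("point")]     # every <point> element in order
```

Because the markup describes structure rather than appearance, the same parsed tree could feed an index, a search interface or a renderer.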
MULTIMEDIA
  • Applications that handle different types of digital data originating from distinct types of media
  • Text, sound, images, video
  • Each type of digital data differs in volume, format and processing requirements
  • A different format is needed to store each type of media
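The difference in volume between media types can be estimated with simple arithmetic on uncompressed sizes. The functions and sample figures below are illustrative assumptions (a 2,000-character page, a 640 x 480 image, CD-quality audio, 25 frames per second video), not values taken from the slides.

```python
def raw_text_bytes(characters: int, bytes_per_char: int = 1) -> int:
    """Uncompressed size of text in a fixed-width character code."""
    return characters * bytes_per_char


def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a bit-mapped image."""
    return width * height * bits_per_pixel // 8


def raw_audio_bytes(sample_rate_hz: int, bits_per_sample: int,
                    channels: int, seconds: int) -> int:
    """Uncompressed size of sampled (PCM) audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds


def raw_video_bytes(width: int, height: int, bits_per_pixel: int,
                    frames_per_second: int, seconds: int) -> int:
    """Uncompressed size of video treated as a sequence of full frames."""
    return raw_image_bytes(width, height, bits_per_pixel) * frames_per_second * seconds


page_of_text = raw_text_bytes(2000)                  # one page of 8-bit text
one_image = raw_image_bytes(640, 480, 24)            # one true-colour image
one_minute_audio = raw_audio_bytes(44100, 16, 2, 60)  # CD-quality stereo
one_second_video = raw_video_bytes(640, 480, 24, 25, 1)
```

Even one second of uncompressed video outweighs a minute of audio here, which is one reason each medium has its own storage formats and compression methods.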
MULTIMEDIA
  • Different formats used commonly on the Web and in digital libraries
    • Images
    • Audio
    • Moving Images
    • Textual Images
    • Graphics and Virtual Reality
IMAGES
  • XBM, BMP, PCX – direct representation of a bit-mapped (or pixel-based) image
  • GIF (Graphics Interchange Format) – includes compression and is good for black-and-white images or images with a small number of colours or gray levels (up to 256)
  • JPEG (Joint Photographic Experts Group) – includes compression
  • TIFF (Tagged Image File Format) – used to exchange different documents between different applications and different computer platforms
  • TGA (Truevision Targa image file) – associated with video game boards
  • Various other image formats
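An IRS that accepts several image formats needs a way to tell them apart. One common approach, sketched here, is to inspect the first few bytes of the file. The signatures listed are the well-known leading bytes for GIF, JPEG, BMP and TIFF; the function name `sniff_image_format` is hypothetical, and formats without a simple leading signature (such as TGA) are left out of this sketch.

```python
from typing import List, Tuple

# (leading bytes, format name)
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
]


def sniff_image_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_image_file(path: str) -> str:
    """Read just enough of a file to apply sniff_image_format."""
    with open(path, "rb") as handle:
        return sniff_image_format(handle.read(8))
```

Checking content rather than the file extension is more robust when files have been renamed or arrive without a name at all.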
AUDIO
  • Must be digitized before storage
  • AU, MIDI (standard format to interchange music between electronic instruments and computers), WAVE – for small pieces of digital audio
  • Audio libraries – RealAudio or CD formats
  • Animation or moving pictures
    • MPEG (Moving Picture Experts Group) – related to JPEG
    • Others – AVI, FLI, QuickTime
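Digitizing audio means sampling the signal at regular intervals and storing each sample as a number. The sketch below generates a short sine tone and stores it in WAVE format using Python's standard `wave` module. The tone frequency, duration and sample rate are arbitrary assumptions, and the function name is hypothetical.

```python
import io
import math
import struct
import wave


def sine_wave_bytes(freq_hz: float = 440.0, seconds: float = 0.1,
                    sample_rate: int = 8000) -> bytes:
    """Sample a sine tone and return it as an in-memory WAVE file."""
    n_samples = int(seconds * sample_rate)
    # Digitize: one 16-bit signed integer per sampling instant.
    samples = [
        int(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]
    frames = struct.pack("<%dh" % n_samples, *samples)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)        # mono
        writer.setsampwidth(2)        # 2 bytes = 16 bits per sample
        writer.setframerate(sample_rate)
        writer.writeframes(frames)
    return buffer.getvalue()


wav_data = sine_wave_bytes()

# Read the file back to confirm what was stored.
with wave.open(io.BytesIO(wav_data), "rb") as reader:
    stored_frames = reader.getnframes()
    stored_rate = reader.getframerate()
```

A higher sample rate or sample width captures the signal more faithfully but increases the stored size proportionally.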
TEXTUAL IMAGES
  • Images that contain mainly typed or typeset text
  • Obtained by scanning the documents
  • For archival purposes
  • Saved as images but with further compression
  • Textual and non-textual regions are stored and compressed separately and, when needed, can be combined and displayed together
GRAPHICS AND VIRTUAL REALITY
  • 3-dimensional graphics found on Web
  • CGM (Computer Graphics Metafile) standard
  • Metafile = collection of elements
  • CGM standard specifies which elements are allowed to occur in which positions in a metafile
  • VRML (Virtual Reality Modeling Language)
    • File format for describing interactive 3D objects and worlds
    • Universal interchange format for 3D graphics and multimedia
    • Can be used for various applications
MULTIMEDIA DOCUMENTS MARKUP
  • HyTime = Hypermedia/Time-based Structuring Language – standard defined for multimedia documents markup
  • An SGML architecture that specifies the generic hypermedia structure of documents