
Overview

jacoba

Presentation Transcript


  1. Overview
     Focus: Methods and technologies to store and retrieve information in the form of documents that contain text and that may also contain tables, diagrams, and images
     • In any information system, the "real world" is represented by a collection of data abstracted from observations of the real world and made available to the system
     • Need, reality, data, query

  2. Overview (cont. 1)
     • Ectosystem: system factors that are not under the control of the designer
     • Endosystem: system factors that the designer can specify and control (e.g., algorithms)
     • Performance: effectiveness, efficiency, economy

  3. From Data to Wisdom
     • Data: impersonal and equally available
     • Information: a set of data matched to a need; personal and time-dependent
     • Knowledge
     • Data, information, and rules
     • IR&S process description

  4. Data Compression
     • Level of compression: character-based vs. word-based
     • Data model
       • Statistical: build statistical tables from a sample of the text
       • Adaptive: start with an a priori statistical distribution for the text symbols, but modify it as each character/word is encoded
       • Semi-static: start with a model for, say, Chapter 1, then modify it to better fit Chapter 2, and so on
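The difference between a statistical (static) model and an adaptive one can be made concrete by adding up ideal code lengths of -log2 p bits per symbol. The Python sketch below is illustrative only. The function names, the order-0 (single-symbol) model, and the uniform a priori distribution (every symbol of an assumed alphabet starts with count 1) are all assumptions. The static figure also ignores the cost of transmitting the frequency table.

```python
import math
from collections import Counter


def static_cost(text):
    """Ideal code length in bits when one frequency table is built from the whole text.

    Hypothetical helper: the cost of sending the table itself is NOT included.
    """
    counts = Counter(text)
    n = len(text)
    return sum(-math.log2(counts[ch] / n) for ch in text)


def adaptive_cost(text, alphabet):
    """Ideal code length in bits for an adaptive order-0 model.

    Assumption: every symbol of `alphabet` starts with count 1 (a uniform
    a priori distribution), and `alphabet` contains every symbol of `text`.
    After each symbol is encoded its count is incremented, so the model
    drifts toward the statistics of the text seen so far.
    """
    counts = {ch: 1 for ch in alphabet}
    total = len(alphabet)
    bits = 0.0
    for ch in text:
        bits += -math.log2(counts[ch] / total)  # cost under the current model
        counts[ch] += 1                         # then update the model
        total += 1
    return bits
```

For "aaaa" over the alphabet "ab", the adaptive model assigns the successive "a"s probabilities 1/2, 2/3, 3/4 and 4/5, for a total of log2 5 ≈ 2.32 bits. The static model, which already knows that only "a" occurs, spends 0 bits on the symbols but would still have to send its table to the decoder.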

  5. Types of Codes for Text Compression
     • Huffman: static; builds a binary tree
     • Ziv-Lempel: adaptive; identifies each text segment the first time it appears and then points back to it when it occurs again
     • Arithmetic: adaptive; the text stream is identified by a single number computed from the statistical distribution of the symbols, and the distribution is updated as the text is encoded
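Two of the codes listed above can be sketched in Python. This is a minimal illustration, not a production coder. `huffman_code`, `lz78_parse` and `lz78_decode` are hypothetical names, and the Huffman code is built from the symbol frequencies of the text itself (a static model). LZ78 is chosen here as one concrete Ziv-Lempel variant: each new phrase is stored once, and later occurrences point back to it by dictionary index. Bits are shown as strings for readability, and arithmetic coding is omitted.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Build a static Huffman code (symbol -> bit string) from symbol frequencies in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap comparisons away from the dict payloads
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new internal node:
        # left branch gets prefix "0", right branch gets prefix "1".
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def lz78_parse(text):
    """LZ78 parse: emit (index of longest known phrase, next char); index 0 is the empty phrase."""
    dictionary = {"": 0}
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase seen before
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # first appearance: store it
            phrase = ""
    if phrase:  # text ended inside a known phrase: just point back to it
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text by following each back-pointer and appending its character."""
    phrases = [""]
    out = []
    for idx, ch in pairs:
        phrase = phrases[idx] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)
```

For example, `lz78_parse("abababa")` yields `[(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]`: after the first two single characters, each pair points back to a phrase already seen and extends it by one character. For "abracadabra", the Huffman code built by this sketch encodes the 11 characters in 23 bits.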
