slide1

Compression

  • Word document: 1 page is about 2 to 4 KB
  • Raster image of 1 page at 600 dpi is about 35 MB
  • Compression Ratio: CR = n1/n2, where n1 is the number of bits before compression and n2 is the number of bits after (e.g., compressing 35 MB to 5 MB gives CR = 7)
  • Compression techniques take advantage of:
    • Sparse coverage
    • Repetitive scan lines
    • Large smooth gray areas
    • ASCII code using a fixed 8 bits per character regardless of frequency
    • Frequently used long words
slide2

Entropy

  • Entropy is a quantitative measure of the amount of information in a string

[Plot: binary entropy H(p) = H(1) + H(0) = -p log2 p - (1 - p) log2 (1 - p), in bits, against the probability p of a 1; the curve peaks at 1.0 when p = 0.5.]

For N clusters: H = -Σ pi log2 pi, where pi = li / Σ lj and li is the length of the ith cluster
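
A minimal sketch in Python (the language is an assumption; the slides contain no code) of the binary entropy plotted above:

import math

def binary_entropy(p):
    """H(p) = H(1) + H(0), in bits per pixel, for 1-probability p."""
    if p in (0.0, 1.0):
        return 0.0                          # certainty carries no information
    h1 = -p * math.log2(p)                  # contribution of the 1-pixels
    h0 = -(1 - p) * math.log2(1 - p)        # contribution of the 0-pixels
    return h1 + h0

print(binary_entropy(0.5))    # 1.0 bit: maximum uncertainty
print(binary_entropy(0.05))   # ~0.29 bits: sparse images compress well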

slide3

Binary Image Compression Techniques

  • Packing: 8 pixels per byte
  • Run Length Encoding: assume 100 dpi, 850 bits per line
    • Encode only the white runs, since they are long (see the sketch below)
    • The top part of a page could be encoded as 0(200) 11111 0(3) 11111 0(3) …
  • Huffman Coding: use short codes for frequent messages

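A minimal run-length encoder sketch emitting (value, count) pairs for one scan line (the slide's notation writes runs as value(count)):

def run_length_encode(bits):
    """Collapse a scan line of 0/1 pixels into (value, count) runs."""
    runs = []
    count = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((prev, count))
            count = 1
    if bits:
        runs.append((bits[-1], count))
    return runs

# The top-of-page example: 0(200) 11111 0(3) 11111 0(3)
line = [0] * 200 + [1] * 5 + [0] * 3 + [1] * 5 + [0] * 3
print(run_length_encode(line))
# [(0, 200), (1, 5), (0, 3), (1, 5), (0, 3)]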

slide4

Huffman Encoding

Example: a bitmap of 10 scan lines, 16 pixels wide, stored as run-length pairs. Each pair (s, l) marks a black run starting at column s with length l, and each line ends with a 0 marker (a bare 0 is a blank line):

0
(2,7) (13,2) 0
(2,7) (13,2) 0
(2,7) (13,2) 0
(2,2) (7,2) (13,2) 0
(2,2) (7,2) (13,2) 0
(2,7) (13,2) 0
(2,2) (7,2) (13,2) 0
(2,2) (7,2) (13,2) 0
0

  • Bit map: 10 × 16 = 160 bits
  • The pairs flatten to 50 numbers in the range 0-15
  • Fixed-length coding at 4 bits per number: 200 bits (worse than the raw bitmap)
  • Only four distinct values occur (0, 2, 7, 13), so 2 bits per symbol suffice: 100 bits
  • Huffman coding: 1.84 bits/symbol × 50 = 92 bits
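
A sketch of Huffman code construction over these runs, using Python's heapq. The symbol counts below are tallied from the bitmap above (2 occurs 24 times, 0 ten times, 7 and 13 eight times each); the exact codewords depend on tie-breaking, but the average length does not.

import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code: frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    # Heap entries: (weight, tiebreak id, {symbol: code-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # the two lightest subtrees...
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))  # ...are merged
        tiebreak += 1
    return heap[0][2]

# The 50 run-length numbers from the bitmap above.
data = [2] * 24 + [0] * 10 + [7] * 8 + [13] * 8
code = huffman_code(data)
avg = sum(len(code[s]) for s in data) / len(data)
print(code)  # e.g. {2: '0', 0: '10', 7: '110', 13: '111'}
print(avg)   # 1.84 bits/symbol, i.e. 92 bits total as on the slide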

slide5

Predictive Coding

  • Most pixels in adjacent scan lines s1 and s2 are the same
  • s2' is the predicted version of s2

2-dimensional prediction

  • Prediction probabilities are gathered from document collections
  • Tradeoff between context size and table size; a context of 12 pixels is common, which requires a 2^12 = 4096-entry table
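
A minimal sketch assuming the simplest predictor (copy the scan line above), rather than the 12-pixel context model just described:

def prediction_errors(s1, s2):
    """XOR the actual line s2 with its prediction s2' (a copy of s1).
    Where adjacent lines agree the result is 0, so the error image
    is mostly zeros and far more compressible than s2 itself."""
    return [actual ^ predicted for actual, predicted in zip(s2, s1)]

s1 = [0, 0, 1, 1, 1, 0, 0, 0]
s2 = [0, 0, 1, 1, 0, 0, 0, 0]        # differs from s1 in one pixel
print(prediction_errors(s1, s2))     # [0, 0, 0, 0, 1, 0, 0, 0]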
slide6

Group III Fax

  • White runs and black runs alternate
  • All lines begin with a white run (possibly of length zero)
  • There are 1728 pixels in a scan line
  • Makeup codes encode a multiple of 64 pixels
  • Terminating codes encode the remainder (0 to 63)
  • EOL code for each line
  • CCITT lookup tables give the codewords
  • Example (see the sketch below):
    • A white run of 500 pixels would be encoded as
    • 500 = 7 × 64 + 52
    • Makeup code for 7 × 64 = 448 is 0110 0100
    • Terminating code for 52 is 0101 0101
    • Complete code is 0110 0100 0101 0101
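
A sketch of the two-table lookup for this example. Only the codewords quoted above are filled in; a real encoder carries the full CCITT white and black tables:

# Partial CCITT Group III white-run tables (only this example's entries).
WHITE_MAKEUP = {448: "01100100"}        # 7 x 64
WHITE_TERMINATING = {52: "01010101"}

def encode_white_run(length):
    """Split a white run into a multiple of 64 plus a 0-63 remainder."""
    code = ""
    multiple = (length // 64) * 64
    if multiple:
        code += WHITE_MAKEUP[multiple]      # KeyError outside the demo table
    code += WHITE_TERMINATING[length % 64]
    return code

print(encode_white_run(500))   # 0110010001010101, as on the slide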
slide8

Group IV READ

[Diagram: a reference line with changing pixels b1 and b2 above a coding line with changing pixels a0, a1, a2]

  • a0 is the reference changing pixel; a1 is the next changing pixel after a0; and a2 is the next changing pixel after a1.
  • b1 is the first changing pixel on the reference line after a0 and is of opposite color to a0; b2 is the next changing pixel after b1.
  • To start, a0 is located at an imaginary white pixel point immediately to the left of the coding line.
  • Follow the READ algorithm chart to choose between pass, vertical, and horizontal modes (the decision is sketched below)
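
A sketch of the mode decision at the heart of the READ chart (codeword emission and the advance of a0 are omitted; column positions are plain integers here):

def read_mode(a1, b1, b2):
    """Pick the 2-D coding mode for changing pixel a1 on the coding line,
    given changing pixels b1 and b2 on the reference line."""
    if b2 < a1:
        return "pass"                      # reference run ends before a1
    if abs(a1 - b1) <= 3:
        return f"vertical V{a1 - b1:+d}"   # code a1 relative to b1
    return "horizontal"                    # code run lengths a0a1 and a1a2

print(read_mode(a1=10, b1=9, b2=15))   # vertical V+1
print(read_mode(a1=10, b1=2, b2=4))    # pass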
slide12

Information Retrieval (Typed text documents)

  • The goal of IR is to represent a collection of documents, where a single document is the smallest unit of information
  • Typify document content and present information upon request

[Diagram: documents and requests compared by a similarity measure]

  • OCR translates images of text into computer-readable form, and IR extracts the text upon request
  • Inverted Index: transpose the document-term relationship to a term-document relationship (sketched below)
  • Remove Stopwords: the, and, to, a, in, that, through, but, etc.
  • Word Stemming: remove prefixes and suffixes and normalize
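
A sketch of index construction with stopword removal and a toy stemmer (a real system would use a proper stemming algorithm such as Porter's; the documents are illustrative):

def stem(word):
    """Toy suffix stripping; stands in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

STOPWORDS = {"the", "and", "to", "a", "in", "that", "through", "but"}

def build_inverted_index(documents):
    """Transpose document -> terms into term -> documents."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            term = stem(word)
            if term not in STOPWORDS:
                index.setdefault(term, set()).add(doc_id)
    return index

docs = {1: "character recognition", 2: "translate the images", 3: "retrieval"}
print(build_inverted_index(docs))
# {'character': {1}, 'recognition': {1}, 'translate': {2}, 'image': {2}, 'retrieval': {3}}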
slide13

Query 1: recognition or retrieval
Response: 1 2 3

Query 2: sequentially and readable
Response: 3

Query 3: not translate
Response: 2

Query: character and recognition or retrieval
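
Against an inverted index, these operators reduce to set operations. The postings below are illustrative, chosen to reproduce the three responses above:

# term -> set of document ids containing it (illustrative postings)
index = {
    "recognition": {1, 2},
    "retrieval": {3},
    "sequentially": {3},
    "readable": {1, 3},
    "translate": {1, 3},
}
all_docs = {1, 2, 3}

print(index["recognition"] | index["retrieval"])   # or  -> {1, 2, 3}
print(index["sequentially"] & index["readable"])   # and -> {3}
print(all_docs - index["translate"])               # not -> {2}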

slide14

Vector Space Model

  • Each document is denoted by a vector of concepts (index terms)
  • If the term is present in the document 1 is placed in the vector
  • Vector of document 1 from table: (1 1 1 0 1 0 0 1 1 1 1 1 0 0 0 1 1)
  • Weighting: Favor terms with high frequency in a few documents

Term weight: wij = Tij × log2(N / Dfi), where

N = total number of documents
Dfi = number of documents containing term i
Tij = frequency of term i in document j

Document similarity between Dj = (w1j, w2j, …, wmj) and Qr = (q1r, q2r, …, qmr) is the cosine measure:

sim(Dj, Qr) = Σ wij qir / ( sqrt(Σ wij²) sqrt(Σ qir²) )
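
A sketch of the weighting and the cosine measure (the counts and vectors are illustrative):

import math

def tfidf_weight(tf, df, n_docs):
    """wij = Tij * log2(N / Dfi): favors terms frequent in few documents."""
    return tf * math.log2(n_docs / df) if df else 0.0

def cosine_similarity(d, q):
    """Cosine of the angle between document and query weight vectors."""
    dot = sum(w * v for w, v in zip(d, q))
    norm = math.sqrt(sum(w * w for w in d)) * math.sqrt(sum(v * v for v in q))
    return dot / norm if norm else 0.0

print(tfidf_weight(tf=3, df=2, n_docs=8))   # 3 * log2(4) = 6.0
doc = [6.0, 0.0, 2.0, 0.0]                  # illustrative weight vectors
query = [1.0, 0.0, 1.0, 0.0]
print(cosine_similarity(doc, query))        # ~0.89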

slide15

Relevance Feedback

N = number of documents in the collection

R = number of documents relevant to query q

n = number of documents containing term t

r = number of relevant documents containing term t

F = log [ (r / (R - r)) / ((n - r) / (N - n - R + r)) ]: the log proportion of relevant to non-relevant documents in which the term occurs

F' = log ((N - n) / n): the same weight without relevance feedback

k = constant, adjusted with collection size

c = collection size

fi = number of documents in which term i occurs

tij = frequency of term i in document j

Maxtfj = maximum term frequency in document j

These combine into an augmented term weight such as wij = (k + (1 - k) tij / Maxtfj) × log2(c / fi)
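
A sketch of F and F'. The add-0.5 smoothing is an assumption (the standard way to keep the ratio finite when r or n - r is zero):

import math

def relevance_weight(N, R, n, r):
    """F: log proportion of relevant to non-relevant docs containing t."""
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))

def no_feedback_weight(N, n):
    """F': the same weight when no relevance information is available."""
    return math.log((N - n + 0.5) / (n + 0.5))

# A term concentrated in the relevant set gets a large positive weight.
print(relevance_weight(N=1000, R=20, n=50, r=15))  # ~4.3
print(no_feedback_weight(N=1000, n=50))            # ~2.9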

slide16

Precision and Recall

  • Coverage: extent to which the system includes relevant documents
  • Time lag: average time it takes to produce an answer to a search request
  • Presentation: quality of the output
  • Effort: energy the user must put forth to obtain the information sought
  • Recall: proportion of the relevant material that is retrieved by a query
  • Precision: proportion of the retrieved documents that are actually relevant

Recall = (number of relevant documents retrieved) / (total number of relevant documents in the collection)

Precision = (number of relevant documents retrieved) / (number of documents retrieved)
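
A sketch with illustrative retrieved and relevant sets:

def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

retrieved = {1, 2, 5, 7}   # what the query returned
relevant = {2, 3, 5}       # ground-truth relevant documents
print(recall(retrieved, relevant))     # 2/3 ~ 0.67
print(precision(retrieved, relevant))  # 2/4 = 0.5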