Effect of Linearization on Normalized Compression Distance


Jonathan Mortensen

Julia Wu

DePaul University

July 2009

Introduction
  • Kolmogorov Complexity is an emerging similarity metric
    • Transformation Distance
  • Universal Similarity Measure
    • Does not require feature identification and selection
  • How can it be applied to images?
    • CBIR, Classification
  • Investigate its effectiveness
  • Found that some fundamentals have been overlooked so far
Outline
  • Background
  • Kolmogorov Complexity and Complearn
  • Research Topics
  • Spatial Transformations
  • Intensity Transformations
  • Image Groupings
  • Conclusion
  • Future Work
Background
  • Li (2004): successful clustering of phylogeny trees, music, text files
    • Open question: does this carry over from 1D to 2D data?
  • Tran (2007): NCD not a good predictor of visual indistinguishability
    • Only one photograph used, one type of linearization (row-by-row)
  • Gondra (2008): CBIR using NCD produced statistically significant measures against H0 of random retrieval and other similarity measures
    • Test set of hundreds of images, inconsistent methods of compression and concatenation, linearization unclear
Kolmogorov Complexity

K(x): the length of the shortest program (or string) x* that produces x

K(x|y): the length of the shortest program that produces x when given y as input

Information distance: E(x,y) = max{K(x|y), K(y|x)}

Normalized Information Distance: NID(x,y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}

Kolmogorov Complexity (continued)
  • Universal, in that it captures all other semi-computable normalized distance measures
  • Therefore also semi-computable
  • K(x) cannot be computed exactly; a lossless compressor shortens strings without losing information, so the compressed length C(x) is used as an approximation of K(x)
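
Replacing K with a compressed length C gives the Normalized Compression Distance, NCD(x,y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. The following is a minimal sketch of that formula, assuming bz2 as the compressor; the choice of compressor and the function names are illustrative and are not taken from the slides.

```python
import bz2


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the bz2-compressed input (compressor is an assumption)."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation of x and y
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 indicate that one string adds little information to the other; values near 1 indicate that the two strings share almost nothing the compressor can exploit. Real compressors can yield values slightly outside the range 0 to 1.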

“The human brain is incapable of creating anything which is really complex.”--Kolmogorov,  A.N., Statistical Science, 6, p314, 1990

CompLearn
  • Open-source package that approximates Kolmogorov complexity with compression and computes NCD
  • Developed by Rudi Cilibrasi, Anna Lissa Cruz, Steven de Rooij, and Maarten Keijzer
  • Uses standard Linux compression tools to build the pairwise comparison map
Initial Questions
  • Linearization Methods and Alternatives
    • How to Preserve a 2D signal
  • Linearization's effect on NCD under spatial transformations and intensity shifts
  • Do additional feature images lower NCD?
  • CBIR: Can K-Complexity be used with feature vectors or image semantics?
Spatial Transformations
  • Applied 4 types of linearization to 800 images (original and 7 transformations)
  • Found that each linearization type produced distinctly different NCDs
  • Certain linearizations result in lower NCDs for certain transformations
Linearization Methods
  • Row Major
  • Column Major
  • Hilbert-Peano space-filling curve: images resized to 128x128
  • SCPO (self-describing context-based pixel ordering): images resized to 35% of original size
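
A small sketch of three of these orderings on an image stored as a list of rows. The Hilbert traversal uses the standard index-to-coordinate construction for a square grid whose side is a power of two; it is an assumption that this matches the curve used in the study, and SCPO is not sketched because it depends on image content.

```python
def row_major(img):
    """Read pixels left to right, top row first."""
    return [p for row in img for p in row]


def column_major(img):
    """Read pixels top to bottom, leftmost column first."""
    return [img[r][c] for c in range(len(img[0])) for r in range(len(img))]


def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so sub-curves join end to end
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(img):
    """Read pixels of a square power-of-two image along the Hilbert curve."""
    n = len(img)
    return [img[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]
```

Unlike row-major and column-major order, consecutive positions on the Hilbert curve are always neighbouring pixels, which is why it is a candidate for preserving 2D locality in a 1D string.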

Spatial Transformations

Figure panels:
  • Original Image
  • Down Shift
  • Left Shift
  • 180° rotation
  • 90° rotation
  • 270° rotation
  • Reflection Y Axis
  • Reflection X Axis
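
The seven transformations can be sketched on a list-of-rows image as follows. The slides do not state the shift amount, whether shifted pixels wrap around, or the rotation direction, so the circular shifts and clockwise rotations here are assumptions.

```python
def shift_down(img, k=1):
    """Circularly shift rows down by k (wraparound is an assumption)."""
    h = len(img)
    return [list(img[(r - k) % h]) for r in range(h)]


def shift_left(img, k=1):
    """Circularly shift every row left by k (wraparound is an assumption)."""
    k %= len(img[0])
    return [row[k:] + row[:k] for row in img]


def rotate_90(img):
    """Rotate 90 degrees clockwise (direction is an assumption)."""
    return [list(col) for col in zip(*img[::-1])]


def rotate_180(img):
    return [row[::-1] for row in img[::-1]]


def rotate_270(img):
    """Rotate 270 degrees clockwise, i.e. 90 degrees counterclockwise."""
    return [list(col) for col in zip(*img)][::-1]


def reflect_y_axis(img):
    """Mirror left to right."""
    return [row[::-1] for row in img]


def reflect_x_axis(img):
    """Mirror top to bottom."""
    return [list(row) for row in img[::-1]]
```

Each transformation only permutes pixels, so the image content is unchanged; any change in NCD comes from how the chosen linearization reorders the resulting 1D string.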

Intensity Transformations
  • Additive Constant
  • Three types of noise
    • Gaussian
    • Speckle
    • Salt and Pepper
  • Least Significant Bit (LSB) Steganography
  • Contrast Windowing
Additive Constant

Figure: image 937.jpg with constants +32 and +64 added, respectively

  • P = Intensity + Constant
    • +4, +8, +12… +100
  • 16 bit
    • 255 (+4)-> 259
  • Truncation
    • 255 (+4)-> 255
  • Wrap
    • 255 (+4)-> 4
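
The three overflow policies can be sketched as below for 8-bit input pixels. The wrap mode here assumes modulo-256 arithmetic, under which 255 + 4 becomes 3; the slide lists 4, which would correspond to a slightly different wrap rule, so treat the modulus as an assumption.

```python
def add_constant(pixels, c, mode="truncate"):
    """Add c to every 8-bit pixel under one of three overflow policies."""
    out = []
    for p in pixels:
        v = p + c
        if mode == "truncate":
            v = min(v, 255)  # saturate at the 8-bit maximum
        elif mode == "wrap":
            v %= 256  # assumed modulo-256 wraparound
        elif mode != "16bit":
            raise ValueError(f"unknown mode: {mode}")
        out.append(v)  # "16bit" keeps the full sum
    return out


print(add_constant([255], 4, "16bit"))     # [259]
print(add_constant([255], 4, "truncate"))  # [255]
print(add_constant([255], 4, "wrap"))      # [3]
```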
Various Noise
  • Gaussian (Statistical)
  • Speckle (Multiplicative)
  • Salt and Pepper (Drop-off)

Figure: examples at 0.32 and 0.64 variance (Gaussian, speckle) or noise density (salt and pepper), respectively
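
A rough sketch of the three noise models on intensities scaled to the range 0 to 1. The slides do not say which tool generated the noise; the clipping and the use of Gaussian noise inside the speckle model are assumptions made for illustration.

```python
import random


def _clip(v):
    return min(1.0, max(0.0, v))


def gaussian_noise(pixels, variance, rng):
    """Additive zero-mean Gaussian noise."""
    sd = variance ** 0.5
    return [_clip(p + rng.gauss(0.0, sd)) for p in pixels]


def speckle_noise(pixels, variance, rng):
    """Multiplicative noise: p + p * n, with n zero-mean (Gaussian n is an assumption)."""
    sd = variance ** 0.5
    return [_clip(p + p * rng.gauss(0.0, sd)) for p in pixels]


def salt_and_pepper(pixels, density, rng):
    """Replace roughly a fraction `density` of pixels with 0 (pepper) or 1 (salt)."""
    out = []
    for p in pixels:
        if rng.random() < density:
            out.append(1.0 if rng.random() < 0.5 else 0.0)
        else:
            out.append(p)
    return out
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes a run repeatable, which matters when NCD values are compared across noise levels.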

Noise (continued)
  • Gaussian and speckle noise do not compress well
  • Gaussian and salt and pepper noise experience some posterior decay
Least Significant Bit Steganography
  • Hide4PGP
  • “Scrambles” message
  • Changes pixel bit to most similar color with opposite bit assignment
  • Spreads secret data over entire file
  • True Grayscale: Changes two bits per pixel

Image with No Text

Image hiding “Gettysburg Address”
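
For orientation, here is a sketch of plain least-significant-bit embedding in grayscale pixels. It is not Hide4PGP's algorithm: it does not scramble the message, spread it over the file, or pick nearest palette colors. It only shows why each pixel changes by at most one intensity level.

```python
def embed_lsb(pixels, message: bytes):
    """Write the message bits, most significant bit first, into the pixel LSBs."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for cover image")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the lowest bit, then set it to the message bit
    return out


def extract_lsb(pixels, n_bytes: int) -> bytes:
    """Read n_bytes back from the pixel LSBs."""
    bits = [p & 1 for p in pixels[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i : i + 8]))
        for i in range(0, len(bits), 8)
    )
```

Because the change per pixel is so small, the stego image is visually identical to the cover, yet the altered low-order bits still affect how well the pixel string compresses.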

Contrast Windowing
  • Computed Tomography image enhancement that increases contrast in certain structures
  • Brief Medical Exploration
Contrast Windowing

Lung Window (-200 HU, width 2000 HU)

Bone Window (300 HU, width 1500 HU)

Patient 5: Original Image top left

Soft Tissue Window (50 HU, width 350 HU)
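
A minimal sketch of windowing, assuming the first number in each caption is the window level (center) in Hounsfield units and that the windowed range is mapped linearly onto 8-bit gray levels. The function name and the 0 to 255 output range are illustrative choices.

```python
def apply_window(hu_values, level, width):
    """Clip HU values to [level - width/2, level + width/2] and rescale to 0..255."""
    lo = level - width / 2
    hi = level + width / 2
    out = []
    for hu in hu_values:
        clipped = min(max(hu, lo), hi)  # everything outside the window saturates
        out.append(round(255 * (clipped - lo) / (hi - lo)))
    return out


# Lung window from the slide: level -200 HU, width 2000 HU, i.e. -1200..800 HU.
print(apply_window([-1500, -1200, 800, 1500], -200, 2000))  # [0, 0, 255, 255]
```

A narrow window such as the soft tissue one spends all gray levels on a small HU range, so large parts of the image saturate to black or white.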

[Figure: panels labeled P1, P3, P5]

Conclusion: "How Many" vs "How Little"
  • NCD for Ordinal Comparisons
  • Numerical Redundancy

[Figure: diagram relating the intensity transformations to NCD. Labels: Selective, Entire Picture; Gaussian and Speckle Noise, Salt and Pepper Noise, Steganography, Additive Constants, Contrast Windowing; Larger NCD, Smaller NCD]

Feature Image Comparison and Grouping
  • Feature Image: Pixel based values derived from the original image
  • 3 Main Types of Linearization
  • Avg NCD inter > Avg NCD intra
  • The larger the gap (Avg NCD inter minus Avg NCD intra), the better NCD separates the groupings
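
The inter versus intra comparison can be sketched as follows, assuming a precomputed symmetric matrix of pairwise NCD values and one group label per image. Both inputs and the function name are hypothetical.

```python
from itertools import combinations


def intra_inter_means(dist, labels):
    """Return (mean intra-group NCD, mean inter-group NCD) over all unordered pairs."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        target = intra if labels[i] == labels[j] else inter
        target.append(dist[i][j])
    return sum(intra) / len(intra), sum(inter) / len(inter)
```

The difference between the second and first returned value is the gap described above; a gap near zero would mean NCD does not distinguish the predefined groups.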
Feature Image Linearization
  • Image-At-Once – row-order one feature image at a time
  • Row Concatenation – Appends all images, then performs row-order linearization
  • Pixel Order – Selects value from same pixel of each feature image in row-order fashion
  • Gray Row-Major – Grayscales an image and follows row-order on intensities
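
The first three orderings can be sketched for a list of equally sized feature images, each stored as a list of rows. Row Concatenation is read here as placing the feature images side by side before the row-order scan; that reading is an assumption, since appending them vertically would reproduce Image-At-Once.

```python
def image_at_once(features):
    """Row-order scan of each feature image in turn."""
    return [p for img in features for row in img for p in row]


def row_concatenation(features):
    """Append images side by side (assumed), then scan the wide image in row order."""
    return [p for rows in zip(*features) for row in rows for p in row]


def pixel_order(features):
    """For each pixel position in row order, emit that pixel from every feature image."""
    h, w = len(features[0]), len(features[0][0])
    return [img[r][c] for r in range(h) for c in range(w) for img in features]


f1 = [[1, 2], [3, 4]]
f2 = [[5, 6], [7, 8]]
print(image_at_once([f1, f2]))      # [1, 2, 3, 4, 5, 6, 7, 8]
print(row_concatenation([f1, f2]))  # [1, 2, 5, 6, 3, 4, 7, 8]
print(pixel_order([f1, f2]))        # [1, 5, 2, 6, 3, 7, 4, 8]
```

All three functions accept any number of feature images, and all produce the same multiset of values; only the order seen by the compressor differs.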
Data Set and Methods
  • Corel Image Database with 10 predefined groupings
  • Linearized by 5 methods
  • NCDs were found within a group and then to the left and to the right
Results
  • Nearly every linearization produced statistically different NCDs
  • Intra Group was always less than Inter Group
  • Gray provided the greatest difference Inter-Intra
    • Initially attributed to file size
  • Gray concatenated three times to equalize file size: found an even greater difference
Conclusion
  • NCD is a good model for predefined human groupings and linearization has little impact on this
  • Gray-Triple Row-Major may be the best form of linearization
  • Direction of concatenation does not matter
  • Defined a methodology for any number of feature images
Conclusion
  • Compressor Errors
  • Numerical Redundancy
    • Ordinal Variables vs Nominal Variables
    • EX: 195 195 195 195 <=> 198 198 198 198
      • NCD = 0.100000
    • 199 199 199 199 <=> 202 202 202 202
      • NCD = 0.128205
  • NCD needs refinement
  • 2D image as a 1D string?
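
One possible reading of the numerical redundancy example, assuming the intensities reach the compressor as decimal text: a general-purpose compressor treats digits as nominal symbols, so two pairs with the same numeric gap can differ in a different number of characters. The helper below is hypothetical and only counts differing digit positions; it does not compute NCD.

```python
def digit_differences(a: int, b: int) -> int:
    """Count decimal digit positions in which a and b differ (zero-padded to equal width)."""
    sa, sb = str(a), str(b)
    width = max(len(sa), len(sb))
    return sum(x != y for x, y in zip(sa.zfill(width), sb.zfill(width)))


print(198 - 195, digit_differences(195, 198))  # 3 1
print(202 - 199, digit_differences(199, 202))  # 3 3
```

Both pairs are 3 intensity levels apart, but the second pair shares no digit, which is consistent with it receiving the larger NCD in the example above.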
Future Work
  • Image Scaling and Normalization
  • Additional Feature Images
  • New Forms of Image concatenation
  • Investigate Compressors (Numeric?)
References
  • A. Itani and D. Manohar. Self-describing context-based pixel ordering. Lecture Notes in Computer Science, pages 124-134, 2002.
  • M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12), 2004.
  • R. Dafner, D. Cohen-Or, and Y. Matias. Context-based space filling curves. In Computer Graphics Forum, volume 19, pages 209-218. Blackwell Publishers Ltd, 2000.
  • R. Cilibrasi, A. L. Cruz, S. de Rooij, and M. Keijzer. CompLearn home. http://www.complearn.org/.
  • R. Cilibrasi, P. Vitányi, and R. de Wolf. Algorithmic clustering of music. arXiv preprint cs.SD/0303025, 2003.
  • N. Tran. The normalized compression distance and image distinguishability. Proceedings of SPIE, 6492:64921D, 2007.
  • I. Gondra and D. R. Heisterkamp. Content-based image retrieval with the normalized information distance. Computer Vision and Image Understanding, 111(2):219-228, 2008.