Effect of Linearization on Normalized Compression Distance

Jonathan Mortensen

Julia Wu

DePaul University

July 2009

Introduction
  • Kolmogorov Complexity underlies an emerging similarity metric
    • Transformation Distance
  • Universal Similarity Measure
    • Does not require feature identification and selection
  • How can it be applied to images?
    • CBIR (content-based image retrieval), classification
  • Investigate its effectiveness
  • We discovered that some fundamentals have been overlooked thus far
Outline
  • Background
    • Kolmogorov Complexity and CompLearn
  • Research Topics
    • Spatial Transformations
    • Intensity Transformations
    • Image Groupings
  • Conclusion
  • Future Work
Previous Work
  • Li (2004): successful clustering of phylogeny trees, music, and text files
    • Does this carry over from 1D to 2D data?
  • Tran (2007): NCD is not a good predictor of visual indistinguishability
    • Only one photograph used, with one type of linearization (row-by-row)
  • Gondra (2008): CBIR using NCD gave statistically significant results against the null hypothesis (H0) of random retrieval and against other similarity measures
    • Test set of hundreds of images; inconsistent methods of compression and concatenation; linearization unclear
Kolmogorov Complexity
K(x): the length of the shortest program (binary string) x* that produces x

K(x|y): the length of the shortest program that produces x given input y

Normalized Information Distance:

$$\mathrm{NID}(x,y) = \frac{\max\{K(x \mid y),\, K(y \mid x)\}}{\max\{K(x),\, K(y)\}}$$

  • Universal, in that it minorizes (up to a small error term) every other semi-computable normalized distance measure
  • Like K(x) itself, it is therefore not computable and can only be approximated
  • A lossless compressor provides the approximation: C(x), the compressed length of x, stands in for K(x)
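Replacing each K term with a compressed length gives the standard normalized compression distance, NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. The following is a minimal sketch, assuming bzip2 (Python's `bz2` module) as the compressor; that choice is for illustration only and is not necessarily the compressor used in this study.

```python
import bz2
import random


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the bzip2-compressed data, approximating K(x)."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    rng = random.Random(0)
    a = bytes(rng.getrandbits(8) for _ in range(5000))
    b = bytes(rng.getrandbits(8) for _ in range(5000))
    print(f"NCD(a, a) = {ncd(a, a):.3f}")  # a repeated adds little new information
    print(f"NCD(a, b) = {ncd(a, b):.3f}")  # unrelated random data: near 1
```

Because a real compressor is only an approximation of K, NCD(x, x) is not exactly 0, and values slightly above 1 are possible.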

“The human brain is incapable of creating anything which is really complex.” (A. N. Kolmogorov, Statistical Science, 6, p. 314, 1990)

CompLearn
  • Open-source package that approximates Kolmogorov complexity through compression
  • Developed by Rudi Cilibrasi, Anna Lissa Cruz, Steven de Rooij, and Maarten Keijzer
  • Uses standard Linux compression tools to build the comparison (distance) map
Initial Questions
  • Linearization Methods and Alternatives
    • How can a 2D signal be preserved in a 1D string?
  • How does linearization affect NCD under spatial transformations and intensity shifts?
  • Do additional feature images lower NCD?
  • CBIR: Can Kolmogorov complexity be used with feature vectors or image semantics?
Spatial Transformations
  • Applied 4 types of linearization to 800 images (originals plus 7 transformations of each)
  • Each linearization type produced distinctly different NCDs
  • Certain linearizations gave lower NCDs for certain transformations
Linearization Methods
  • Row Major
  • Column Major
  • Hilbert-Peano space-filling curve
  • Images resized to 128x128
  • Images resized to 35% of original size

[Figure: one image under each spatial transformation: original, down shift, left shift, 180° rotation, 90° rotation, 270° rotation, reflection about the Y axis, and reflection about the X axis]
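The seven transformations can be sketched on a list-of-rows image as follows. The slides do not state the shift amount, the boundary handling, or the rotation direction, so the shift size `k`, the circular (wrap-around) shift, and counterclockwise 90° rotation are all assumptions; the names are hypothetical.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale image: a list of rows


def down_shift(img: Image, k: int = 1) -> Image:
    """Move rows down by k; bottom rows wrap to the top (circular shift assumed)."""
    n = len(img)
    k %= n
    return [row[:] for row in img[n - k:] + img[: n - k]]


def left_shift(img: Image, k: int = 1) -> Image:
    """Move columns left by k; leftmost columns wrap to the right (assumed)."""
    k %= len(img[0])
    return [row[k:] + row[:k] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate 90 degrees counterclockwise (direction assumed)."""
    return [list(col) for col in zip(*img)][::-1]


def rotate_180(img: Image) -> Image:
    return [row[::-1] for row in img[::-1]]


def rotate_270(img: Image) -> Image:
    """Rotate 270 degrees counterclockwise, i.e. 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def reflect_y_axis(img: Image) -> Image:
    """Mirror left-right about the vertical axis."""
    return [row[::-1] for row in img]


def reflect_x_axis(img: Image) -> Image:
    """Mirror top-bottom about the horizontal axis."""
    return [row[:] for row in img[::-1]]


TRANSFORMS: Dict[str, Callable[[Image], Image]] = {
    "down_shift": down_shift,
    "left_shift": left_shift,
    "rotate_180": rotate_180,
    "rotate_90": rotate_90,
    "rotate_270": rotate_270,
    "reflect_y_axis": reflect_y_axis,
    "reflect_x_axis": reflect_x_axis,
}
```

Each transformed image would then be linearized and compared to its linearized original with NCD.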

Intensity Transformations
  • Additive Constant
  • Three types of noise
    • Gaussian
    • Speckle
    • Salt and Pepper
  • Least Significant Bit (LSB) Steganography
  • Contrast Windowing
Additive Constant

[Figure: image 937.jpg with +32 and +64 added, respectively]

  • P = Intensity + Constant
    • Constants +4, +8, +12, …, +100
  • 16-bit storage (no overflow)
    • 255 (+4) -> 259
  • Truncation (clamp at 255)
    • 255 (+4) -> 255
  • Wrap
    • 255 (+4) -> 4
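A minimal sketch of the three overflow-handling schemes for 8-bit intensities follows. "Wrap" is assumed here to mean ordinary 8-bit modulo-256 overflow, which maps 255 + 4 to 3; the value of 4 shown above would instead correspond to a different rule (for example, subtracting 255 on overflow), so treat the `wrap` branch as an assumption. The names `add_constant` and `CONSTANTS` are hypothetical.

```python
from typing import List


def add_constant(pixels: List[int], c: int, mode: str) -> List[int]:
    """Add constant c to 8-bit intensities, handling overflow according to `mode`.

    "16bit":   store in a wider type, so values may exceed 255
    "truncate": clamp results at 255
    "wrap":    wrap around modulo 256 (assumed meaning of "wrap")
    """
    if mode == "16bit":
        return [p + c for p in pixels]
    if mode == "truncate":
        return [min(p + c, 255) for p in pixels]
    if mode == "wrap":
        return [(p + c) % 256 for p in pixels]
    raise ValueError(f"unknown mode: {mode}")


CONSTANTS = list(range(4, 101, 4))  # +4, +8, +12, ..., +100
```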
Various Noise
  • Gaussian (Statistical)
  • Speckle (Multiplicative)
  • Salt and Pepper (Drop-off)

[Figure: noisy examples with variance/noise density 0.32 and 0.64, respectively]
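The three noise types can be sketched on intensities normalized to [0, 1]. The conventions below (zero-mean Gaussian with a given variance, speckle as p + n·p with uniform n of mean 0 and the given variance, salt and pepper replacing about `density` of the pixels half with black and half with white) are assumptions modeled on common image-processing tools, not a statement of the exact routine used here.

```python
import math
import random
from typing import List


def _clip(v: float) -> float:
    """Keep a normalized intensity inside [0, 1]."""
    return min(1.0, max(0.0, v))


def gaussian_noise(pixels: List[float], var: float, rng: random.Random) -> List[float]:
    """Additive zero-mean Gaussian noise with variance `var`."""
    sd = math.sqrt(var)
    return [_clip(p + rng.gauss(0.0, sd)) for p in pixels]


def speckle_noise(pixels: List[float], var: float, rng: random.Random) -> List[float]:
    """Multiplicative noise p + n*p, with n uniform, mean 0, variance `var`."""
    a = math.sqrt(3.0 * var)  # U(-a, a) has variance a**2 / 3
    return [_clip(p + p * rng.uniform(-a, a)) for p in pixels]


def salt_and_pepper(pixels: List[float], density: float, rng: random.Random) -> List[float]:
    """Replace about `density` of pixels: half with 0 (pepper), half with 1 (salt)."""
    out = []
    for p in pixels:
        u = rng.random()
        if u < density / 2:
            out.append(0.0)
        elif u < density:
            out.append(1.0)
        else:
            out.append(p)
    return out
```

Passing a seeded `random.Random` makes each noisy image reproducible across linearization runs.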

Noise (cont.)
  • Gaussian and speckle noise do not compress well
  • Gaussian and salt-and-pepper noise show some posterior decay
Least Significant Bit Steganography
  • Hide4PGP
  • “Scrambles” the message
  • Changes a pixel's bit by switching to the most similar color with the opposite bit assignment
  • Spreads secret data over the entire file
  • True grayscale: changes two bits per pixel

[Figure: image with no hidden text, and the same image hiding the “Gettysburg Address”]
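For comparison, the plainest form of LSB embedding writes one message bit into the lowest bit of each successive pixel. This sketch is not Hide4PGP's algorithm: it neither scrambles the message, spreads it over the file, nor changes two bits per grayscale pixel. The function names are hypothetical.

```python
from typing import List


def embed_lsb(pixels: List[int], message: bytes) -> List[int]:
    """Write message bits (most significant first) into the LSB of successive pixels."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message is too long for this cover image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(pixels: List[int], n_bytes: int) -> bytes:
    """Read n_bytes back from the least significant bits of the first pixels."""
    out = bytearray()
    for k in range(n_bytes):
        value = 0
        for p in pixels[8 * k: 8 * k + 8]:
            value = (value << 1) | (p & 1)
        out.append(value)
    return bytes(out)
```

Each pixel changes by at most 1 intensity level, which is why LSB changes are visually invisible yet still alter the byte string a compressor sees.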

Contrast Windowing
  • Computed tomography (CT) image enhancement that increases contrast in selected structures
  • A brief exploration of medical images
[Figure: Patient 5, original image (top left) with lung window (-200 HU, width 2000 HU), bone window (300 HU, width 1500 HU), and soft tissue window (50 HU, width 350 HU)]
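A window (level, width) maps Hounsfield units to display gray levels: everything below level - width/2 becomes black, everything above level + width/2 becomes white, and values in between are scaled linearly. The sketch below uses that simple linear mapping as an assumption; the DICOM standard specifies a slightly different formula with half-unit offsets. The window values are the ones listed in the figure.

```python
from typing import Dict, List, Tuple


def apply_window(hu: List[float], level: float, width: float) -> List[int]:
    """Map Hounsfield units to 0-255 gray levels for the window (level, width)."""
    low = level - width / 2
    high = level + width / 2
    out = []
    for v in hu:
        if v <= low:
            out.append(0)  # below the window: black
        elif v >= high:
            out.append(255)  # above the window: white
        else:
            out.append(round((v - low) / width * 255))  # linear ramp inside
    return out


# (level, width) in HU, as listed in the figure
WINDOWS: Dict[str, Tuple[float, float]] = {
    "lung": (-200, 2000),
    "bone": (300, 1500),
    "soft_tissue": (50, 350),
}
```

A narrow window (soft tissue) saturates most of the image to pure black or white, producing long runs of identical values, which changes how well the linearized image compresses.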





Conclusion: "How Many" vs "How Little"
  • NCD for Ordinal Comparisons
  • Numerical Redundancy


[Figure: NCD compared for the entire picture, salt and pepper noise, additive constants, and contrast windowing (axis: larger NCD)]


Feature Image Comparison and Grouping
  • Feature Image: pixel-based values derived from the original image
  • 3 Main Types of Linearization
  • Average inter-group NCD > average intra-group NCD
  • The larger the difference (inter - intra), the better NCD recovers the groupings
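Given a pairwise NCD matrix and a group label per image, the two averages and their difference can be computed as below. This is a sketch under an assumption: it averages over all cross-group pairs, whereas the study compared each group only with its neighboring groups. The function name is hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import List, Sequence, Tuple


def intra_inter(dist: List[List[float]], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (average intra-group NCD, average inter-group NCD, inter - intra)."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        # each unordered pair of images counts once
        (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    avg_intra, avg_inter = mean(intra), mean(inter)
    return avg_intra, avg_inter, avg_inter - avg_intra
```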
Feature Image Linearization
  • Image-At-Once: row-major order, one feature image at a time
  • Row Concatenation: appends all feature images, then linearizes in row-major order
  • Pixel Order: takes the value at the same pixel from each feature image in turn, moving in row-major order
  • Gray Row-Major: converts the image to grayscale and reads the intensities in row-major order
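The four orderings can be sketched for feature images stored as lists of rows. Two assumptions are labeled here: "Row Concatenation" is taken to append the images side by side, so each output row joins row r of every feature image (appending them vertically would reproduce Image-At-Once); and grayscale uses the common luma weights 0.299, 0.587, 0.114 on three R, G, B feature images. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # one feature image: a list of rows


def image_at_once(features: List[Image]) -> List[int]:
    """Row-major order, one whole feature image after another."""
    return [p for img in features for row in img for p in row]


def row_concatenation(features: List[Image]) -> List[int]:
    """Images placed side by side (assumed), then read row by row."""
    n_rows = len(features[0])
    return [p for r in range(n_rows) for img in features for p in img[r]]


def pixel_order(features: List[Image]) -> List[int]:
    """At each pixel (row-major), emit that pixel's value from every feature image."""
    n_rows, n_cols = len(features[0]), len(features[0][0])
    return [img[r][c] for r in range(n_rows) for c in range(n_cols) for img in features]


def gray_row_major(red: Image, green: Image, blue: Image) -> List[int]:
    """Convert to grayscale (assumed luma weights), then read in row-major order."""
    return [
        round(0.299 * pr + 0.587 * pg + 0.114 * pb)
        for row_r, row_g, row_b in zip(red, green, blue)
        for pr, pg, pb in zip(row_r, row_g, row_b)
    ]


def gray_triple_row_major(red: Image, green: Image, blue: Image) -> List[int]:
    """Gray row-major string repeated three times, matching three feature images in size."""
    return gray_row_major(red, green, blue) * 3
```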
Data Set and Methods
  • Corel Image Database with 10 predefined groupings
  • Linearized by 5 methods
  • NCDs were computed within each group and between each group and its neighboring groups to the left and to the right
  • Nearly every linearization produced statistically significantly different NCDs
  • Intra-group NCD was always lower than inter-group NCD
  • Gray row-major gave the greatest inter - intra difference
    • We suspected this was due to its smaller file size
  • Concatenating the gray image three times to equalize file size gave an even greater difference
  • NCD is a good model for predefined human groupings, and linearization has little impact on this
  • Gray-Triple Row-Major may be the best form of linearization
  • The direction of concatenation does not matter
  • Defined a methodology for any number of feature images
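  • Compressor Errors
  • Numerical Redundancy
    • Ordinal variables vs. nominal variables
    • Example: 195 195 195 195 <=> 198 198 198 198
      • NCD = 0.100000
    • 199 199 199 199 <=> 202 202 202 202
      • NCD = 0.128205
    • Both pairs differ by 3, yet their NCDs differ: the compressor treats the digits as nominal symbols, not ordinal values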
  • NCD needs refinement
  • Is a 1D string an adequate representation of a 2D image?
Future Work
  • Image Scaling and Normalization
  • Additional Feature Images
  • New Forms of Image Concatenation
  • Investigate Compressors (e.g., Numeric Compressors)
References
  • A. Itani and D. Manohar. Self-describing context-based pixel ordering. Lecture Notes in Computer Science, pages 124-134, 2002.
  • M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12), 2004.
  • R. Dafner, D. Cohen-Or, and Y. Matias. Context-based space filling curves. In Computer Graphics Forum, volume 19, pages 209-218. Blackwell Publishers Ltd, 2000.
  • R. Cilibrasi, A. L. Cruz, S. de Rooij, and M. Keijzer. CompLearn home. http://www.complearn.org/.
  • R. Cilibrasi, P. Vitányi, and R. de Wolf. Algorithmic clustering of music. arXiv preprint cs.SD/0303025, 2003.
  • N. Tran. The normalized compression distance and image distinguishability. Proceedings of SPIE, 6492:64921D, 2007.
  • I. Gondra and D. R. Heisterkamp. Content-based image retrieval with the normalized information distance. Computer Vision and Image Understanding, 111(2):219-228, 2008.