Introduction to Steganalysis Schemes

Multimedia Security

Outline • Steganalysis of LSB encoding • Steganalysis based on JPEG compatibility • Some discussions

Introduction • Steganography • The art of secret communication • Stego content (e.g. images) should not contain any easily detectable artifacts due to message embedding • The less information is embedded, the smaller the probability of introducing detectable artifacts

Watermarking vs. Steganography • [Figure: watermarking and steganography positioned with respect to fidelity, capacity, and robustness]

Steganalysis of LSB Encoding

Goal • To inspect one or more images for statistical artifacts caused by message embedding in color images using the LSB method • To find out which images are likely to contain secret messages • To estimate the reliability of decisions • Type I error (false alarm) and Type II error (miss)

Application Scenarios • Automatic checking: an Internet node with a special filter inspects images sent to a certain address • Forensic expert: inspects images found on a seized computer

LSB Encoding • Replacing the LSB of every gray level of each color channel with message bits • On average, 50% of the LSBs are changed • Logic behind this scheme • LSBs in scanned or camera-taken images are essentially random • Encrypted (randomized) messages are random • So, supposedly, no statistical artifacts are introduced
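The embedding rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the scheme of any particular tool: the function names `embed_lsb` and `extract_lsb` are hypothetical, the cover is assumed to be a flat list of 8-bit channel values, and the message bits are assumed to be already randomized by encryption.

```python
import random

def embed_lsb(samples, bits):
    """Replace the LSB of the first len(bits) samples with the message bits."""
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # clear the LSB, then set it to the message bit
    return out

def extract_lsb(samples, n_bits):
    """Read the message back from the LSBs of the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]

rng = random.Random(0)                               # fixed seed for repeatability
cover = [rng.randrange(256) for _ in range(3000)]    # stand-in for R, G, B values
message = [rng.randrange(2) for _ in range(3000)]    # stand-in for encrypted bits
stego = embed_lsb(cover, message)

changed = sum(c != s for c, s in zip(cover, stego))
print(f"fraction of samples changed: {changed / len(cover):.3f}")
```

Each sample changes by at most one gray level, and roughly half of the samples change at all, which matches the 50% figure on the slide.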

Important Observation • Number of unique colors in cover images • Typically smaller than the number of pixels in the image • Ratio of unique colors to pixels: about 1:2 for high-quality scans in BMP format • 1:6 or lower for JPEG images or video • Many true-color images have a relatively small “palette” • After LSB embedding, the new color palette has a distinct feature • Many pairs of close colors • Evidence of LSB encoding-based steganography

Formulations • U: number of unique colors in an image • P: number of close color pairs • Two colors (R1,G1,B1) and (R2,G2,B2) are close if |R1-R2|≤1 and |G1-G2|≤1 and |B1-B2|≤1 • R: ratio between the number of close pairs of colors and the number of all pairs of colors • R=P/C(U, 2), where C(U, 2)=U(U-1)/2 is the number of combinations
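As a small worked example of these definitions, the sketch below computes U, P, and R for a handful of pixels. The helper names `is_close` and `color_stats` are illustrative only, and the brute-force pair count is quadratic in U, so it is meant for toy inputs.

```python
from itertools import combinations
from math import comb

def is_close(c1, c2):
    """Two colors are close if every channel differs by at most 1."""
    return all(abs(a - b) <= 1 for a, b in zip(c1, c2))

def color_stats(pixels):
    """Return (U, P, R) for a list of (R, G, B) tuples."""
    colors = set(pixels)                        # unique colors
    u = len(colors)
    p = sum(1 for c1, c2 in combinations(colors, 2) if is_close(c1, c2))
    r = p / comb(u, 2) if u > 1 else 0.0        # R = P / C(U, 2)
    return u, p, r

pixels = [(10, 10, 10), (10, 10, 11), (10, 10, 10), (200, 0, 0), (11, 11, 11)]
print(color_stats(pixels))   # U = 4, P = 3, R = 3/6 = 0.5
```

Here the three colors near (10, 10, 10) are mutually close, while (200, 0, 0) is close to none of them, giving 3 close pairs out of C(4, 2) = 6.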

The Proposed Scheme • After embedding, U increases to U’, and the new number of close pairs P’ can be evaluated • The value of R for an image without a message is smaller than that for an image that already has a message embedded in it

The Proposed Scheme (cont.) • No single threshold on R works for all images • Because U varies widely from image to image • Observations for reliable distinguishing • For an image that already contains a large message • Embedding another message in it does not modify R significantly • For an image not containing a message • R increases significantly • Use the relative change of R as the decision criterion

Detection Algorithm • To find out whether or not an image has a secret message • Calculate R=P/C(U, 2) • Embed a test message using LSB embedding in randomly selected pixels • Size of the test message: 3·a·M·N bits (for M by N color images) • Calculate R’=P’/C(U’, 2) • Decide whether the image contains a message • R’≈R → the image already had a large message hidden • R’>R → the image did not have a message in it • R’/R: the separating statistic
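The detection steps can be tried on a synthetic image. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the "image" is a list of RGB tuples drawn from an artificial 32-color palette, and the code assumes at least two unique colors and at least one close pair (otherwise R is undefined or zero).

```python
import random
from itertools import combinations
from math import comb

def close_pair_ratio(pixels):
    """R = P / C(U, 2) over the unique colors of the image."""
    colors = list(set(pixels))
    p = sum(1 for a, b in combinations(colors, 2)
            if all(abs(x - y) <= 1 for x, y in zip(a, b)))
    return p / comb(len(colors), 2)

def embed_random_lsb(pixels, n_bits, rng):
    """Overwrite the LSB of n_bits randomly chosen channel samples with random bits."""
    samples = [v for px in pixels for v in px]            # flatten to R, G, B, R, ...
    for pos in rng.sample(range(len(samples)), n_bits):
        samples[pos] = (samples[pos] & ~1) | rng.randrange(2)
    return [tuple(samples[i:i + 3]) for i in range(0, len(samples), 3)]

def separating_statistic(pixels, a, rng):
    """Return R'/R after embedding a test message of 3*a*M*N bits."""
    r = close_pair_ratio(pixels)
    test_image = embed_random_lsb(pixels, round(3 * a * len(pixels)), rng)
    return close_pair_ratio(test_image) / r

rng = random.Random(1)
# Artificial palette: 16 well-separated gray-like colors, each with one close neighbor.
palette = [(16 * i + d, 16 * i, 16 * i) for i in range(16) for d in (0, 1)]
clean = [rng.choice(palette) for _ in range(50 * 50)]     # 50 x 50 "image"
stego = embed_random_lsb(clean, 3 * len(clean), rng)      # message at full capacity

stat_clean = separating_statistic(clean, 0.1, rng)
stat_stego = separating_statistic(stego, 0.1, rng)
print(f"R'/R clean: {stat_clean:.2f}, R'/R stego: {stat_stego:.2f}")
```

On this toy data the clean image gives a ratio clearly above 1, while the fully embedded image stays at about 1, which is the relative comparison the slide describes. Real images behave less cleanly, so the gap there is statistical rather than guaranteed.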

Limitations • If the secret message is too small • The two ratios will be very close to each other • Images with and without messages cannot be distinguished

Experiments • Using an image database of 300 color images • 350x250 pixels • JPEG compressed • Capacity of each image: about 32.8 KB (350x250x3/8 bytes) • A message of length 20 KB (roughly 2/3 of the maximal capacity) was embedded into each image to form a new database of images with messages • The detection algorithm is run on both databases, and message presence is tested by embedding a test message of size 1 KB (a=1/30)

Experimental Results • [Figure: results for the two databases; solid line: original database, dotted line: embedded database]

Parameter Optimization • Model the density functions of R’/R as Gaussian distributions • N(μ, σ) for images without a message and N(μs, σs) for images with a secret message of size s • Different sizes of secret messages, denoted as s, and of test messages are tested • Secret messages: 1% to 50% • Test messages: a=0.01 to 0.5 • Results • μ>μs for all s • As s decreases, N(μs, σs) becomes flatter and its peak moves right • As s increases, N(μs, σs) becomes narrower and its peak moves left • It is easier to separate the two peaks for larger secret message sizes

Threshold Selection • Choose the threshold Th so that Type I error = Type II error (taken here as minimizing the overall error) • Change Th to give more weight to not missing an image with a secret message, at the expense of more false alarms
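Under the Gaussian model, the equal-error threshold has a closed form. The sketch below assumes the decision rule "declare a message present when R'/R < Th" (consistent with μ > μs), and the numeric parameters are made-up values for illustration, not results from the experiments. Setting the two tail probabilities equal gives (Th − μ)/σ = −(Th − μs)/σs, hence Th = (μ·σs + μs·σ)/(σ + σs).

```python
from statistics import NormalDist

def equal_error_threshold(mu, sigma, mu_s, sigma_s):
    """Threshold at which the false-alarm and miss probabilities coincide."""
    return (mu * sigma_s + mu_s * sigma) / (sigma + sigma_s)

def error_rates(th, mu, sigma, mu_s, sigma_s):
    """Type I: clean image falls below Th. Type II: stego image falls above Th."""
    type1 = NormalDist(mu, sigma).cdf(th)
    type2 = 1.0 - NormalDist(mu_s, sigma_s).cdf(th)
    return type1, type2

# Hypothetical parameters: clean images N(1.10, 0.03), stego images N(1.01, 0.01).
mu, sigma, mu_s, sigma_s = 1.10, 0.03, 1.01, 0.01
th = equal_error_threshold(mu, sigma, mu_s, sigma_s)
type1, type2 = error_rates(th, mu, sigma, mu_s, sigma_s)
print(f"Th = {th:.4f}, Type I = {type1:.4f}, Type II = {type2:.4f}")
```

Lowering Th below this value reduces false alarms at the cost of more misses, and raising it does the opposite, which is the trade-off noted on the slide.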

Experimental Results • [Figures not recoverable]

Experimental Results (cont.) • [Figures not recoverable]

Conclusions • The probability of a wrong decision is mainly determined by the size of the secret message • The influence of the test message size is much smaller • The optimal test message size differs for different secret message sizes • The detection algorithm mainly targets images with a small number of unique colors • The results for high-quality scanned and losslessly compressed images (U>0.5MN) may be unreliable

Steganalysis Based on JPEG Compatibility

Image Steganography • Image formats • Uncompressed (BMP) • Offers the highest capacity and best overall security • Palette (GIF) • Difficult to provide security with reasonable capacity • Lossy compressed (JPEG, JPEG 2000) • Difficult to hide a message in the JPEG stream securely while keeping the capacity practical

Goal of this Paper • To show that images may be extremely poor candidates for cover images if • They were initially acquired as JPEG images and later decompressed to a lossless format • Steganographic methods aim for a minimal amount of distortion to reduce visible artifacts • So the act of message embedding will not erase the characteristic structure created by JPEG compression • Analyzing the DCT coefficients of an image can even recover the values of the JPEG quantization table • Evidence for steganography • An image stored in a lossless format that bears a strong fingerprint of JPEG compression, yet is not fully compatible with any JPEG compressed image

JPEG Compression • Uncompressed image block Borig • DCT: dk(i), i=0,…,63 • Quantization with the JPEG quantization matrix Q: Dk(i)=round(dk(i)/Q(i)) • Zigzag scan • Huffman coder

JPEG Decompression • Huffman decoding • QDk(i)=Q(i)·Dk(i) • Multiplying each quantized DCT coefficient by its quantization step • Braw=DCT⁻¹(QD) • Inverse DCT • B=[Braw] • Rounded to integers and truncated to the range 0-255

Observations • If the block B has no pixels saturated at 0 or 255 • ‖Braw−B‖² ≤ 16, where ‖·‖ is the L2 norm • Since |Braw(i)−B(i)| ≤ 0.5 for all 64 pixels i, the squared error is at most 64×0.5² = 16
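The compression and decompression steps, and the bound above, can be checked numerically on one block. The following is a pure-Python sketch under assumptions: the quantization matrix is a uniform table of 16s chosen only for illustration, the test block is a synthetic gradient, entropy coding is omitted because it is lossless, and the helper names are hypothetical.

```python
import math

N = 8
# Orthonormal 8-point DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / 16)
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]
CT = [list(col) for col in zip(*C)]            # transpose of C

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def dct2(block):
    return matmul(matmul(C, block), CT)        # 2-D DCT: C * block * C^T

def idct2(coeffs):
    return matmul(matmul(CT, coeffs), C)       # inverse: C^T * coeffs * C

def jpeg_roundtrip(b_orig, q):
    """Quantize and dequantize one 8x8 block; return (b_raw, b)."""
    d = dct2([[p - 128 for p in row] for row in b_orig])     # level shift, then DCT
    qd = [[q[u][v] * round(d[u][v] / q[u][v]) for v in range(N)]
          for u in range(N)]                                 # QD = Q * round(d / Q)
    b_raw = [[x + 128 for x in row] for row in idct2(qd)]    # real-valued pixels
    b = [[min(255, max(0, round(x))) for x in row] for row in b_raw]
    return b_raw, b

b_orig = [[100 + 4 * x + 4 * y for x in range(N)] for y in range(N)]
q = [[16] * N for _ in range(N)]               # illustrative uniform table
b_raw, b = jpeg_roundtrip(b_orig, q)

err2 = sum((b_raw[y][x] - b[y][x]) ** 2 for y in range(N) for x in range(N))
saturated = any(p in (0, 255) for row in b for p in row)
print(f"squared rounding error = {err2:.3f}, saturated = {saturated}")
```

Because no pixel of this block is clipped, the only difference between Braw and B is rounding, so the squared error stays within the bound of 16.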

The Proposed Scheme • Question • Given an arbitrary 8x8 block B of pixel values, could this block have arisen through the process of JPEG decompression with the quantization matrix Q (if available)? • Let QD’=DCT(B). By Parseval’s equality, ‖B−Braw‖² = ‖DCT(B)−DCT(Braw)‖² = ‖QD’−QD‖² ≤ 16 • Since every QD(i) is an integer multiple of Q(i), ‖QD’−QD‖² ≥ Σ|QD’(i)−Q(i)·round(QD’(i)/Q(i))|² = S, so a compatible block must satisfy S ≤ 16 • Additional check • Look for integer multiples q_{p(i)}(i) of Q(i) close to QD’(i) such that Σ(QD’(i)−q_{p(i)}(i))² ≤ 16 • and B=[DCT⁻¹(QD)], where QD(i)=q_{p(i)}(i)

Algorithm • 1. Divide the image into 8x8 blocks • 2. Arrange the blocks in a list, and remove all saturated blocks from the list • T: number of remaining blocks • 3. Extract the quantization matrix Q from all T blocks • If all elements of Q are 1, the image shows no trace of JPEG quantization and is not processed further

Algorithm (cont.) • 4. For each block B, calculate S • 5. If S>16, B is not compatible with JPEG compression; otherwise, perform the additional check • 6. After going through all T blocks, if no incompatible block is found, there is no evidence of steganography • 7. Repeat the algorithm for different 8x8 divisions to detect cropped images
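Steps 4 and 5 can be illustrated on a single block. The sketch below covers only the S > 16 test and does not implement the additional check or the extraction of Q. It assumes a known, uniform quantization table of 16s, builds a JPEG-compatible block from hand-picked quantized coefficients, and then tampers with it by changing one pixel by 9 gray levels. That change is deliberately larger than an LSB flip so that the outcome in this toy setting is clear-cut; all names are hypothetical.

```python
import math

N = 8
# Orthonormal 8-point DCT-II basis, as used for 8x8 JPEG blocks.
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]
CT = [list(col) for col in zip(*C)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def decompress(quantized, q):
    """B = [DCT^-1(QD)] with QD(i) = Q(i) * D(i), plus the 128 level shift."""
    qd = [[q[u][v] * quantized[u][v] for v in range(N)] for u in range(N)]
    b_raw = matmul(matmul(CT, qd), C)
    return [[min(255, max(0, round(x + 128))) for x in row] for row in b_raw]

def compatibility_statistic(block, q):
    """S = sum over i of (QD'(i) - Q(i) * round(QD'(i) / Q(i)))^2, QD' = DCT(B)."""
    shifted = [[p - 128 for p in row] for row in block]
    qd_prime = matmul(matmul(C, shifted), CT)
    return sum((qd_prime[u][v] - q[u][v] * round(qd_prime[u][v] / q[u][v])) ** 2
               for u in range(N) for v in range(N))

q = [[16] * N for _ in range(N)]               # illustrative uniform table
quantized = [[0] * N for _ in range(N)]
quantized[0][0], quantized[0][1], quantized[1][0], quantized[2][2] = 2, -3, 2, 1

block = decompress(quantized, q)               # a genuinely JPEG-compatible block
s_jpeg = compatibility_statistic(block, q)

tampered = [row[:] for row in block]
tampered[3][4] += 9                            # exaggerated single-pixel change
s_tampered = compatibility_statistic(tampered, q)

for name, s in (("decompressed", s_jpeg), ("tampered", s_tampered)):
    verdict = "incompatible with JPEG" if s > 16 else "needs the additional check"
    print(f"{name}: S = {s:.2f} -> {verdict}")
```

A block with S ≤ 16 is not yet proven compatible; it only passes the necessary condition, which is why the algorithm follows up with the additional check.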

Extracting the Quantization Matrix

Some Discussions

Reference • J. Fridrich, R. Du and M. Long, “Steganalysis of LSB encoding in color images,” ICME 2000, New York, 2000 • J. Fridrich, M. Goljan and R. Du, “Steganalysis based on JPEG compatibility,” SPIE Multimedia Systems and Applications IV, Denver, 2001 • G. Goth, “Steganalysis gets past the hype,” IEEE Distributed Systems Online, April 2005
