Slide 1

Hide and Seek: An Introduction to Steganography

Niels Provos and Peter Honeyman, University of Michigan. IEEE Security and Privacy Journal, May-June 2003 (Vol. 1, No. 3)

Sweety Chauhan

October 24, 2005

CMSC 691I

Clandestine Channels

Slide 2 ### Overview

- New and Significant
- What is Steganography?
- Previous Work
- Steganographic systems for JPEG images
- Steganography Detection on the Internet
- Results

Slide 3 ### New and Significant

- Detection of steganographic systems via statistical steganalysis
- Practical application of detection algorithms

Slide 4 ### What is Steganography?

- The art and science of hiding communication
- A steganographic system embeds hidden content in unremarkable cover media
- A steganographic system consists of:
- Identifying the cover medium’s redundant bits
- An embedding process that creates a stego medium by replacing the redundant bits with hidden message data
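The two components can be sketched in a few lines of Python. This is a minimal illustration, assuming the cover is a list of 8-bit integer samples (for example, grayscale pixel values) and that every sample's least-significant bit is treated as redundant; `embed_lsb` and `extract_lsb` are hypothetical helper names, not part of any system discussed in these slides.

```python
def embed_lsb(cover, message_bits):
    """Replace the least-significant bit of successive cover samples with message bits."""
    if len(message_bits) > len(cover):
        raise ValueError("message does not fit in the cover")
    stego = list(cover)
    for i, bit in enumerate(message_bits):
        # Clear the LSB (the "redundant" bit), then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(stego, n_bits):
    """Read the hidden bits back from the first n_bits samples."""
    return [sample & 1 for sample in stego[:n_bits]]


cover = [200, 13, 54, 77, 128, 9]  # hypothetical 8-bit pixel values
message = [1, 0, 1, 1]
stego = embed_lsb(cover, message)
print(stego)                  # [201, 12, 55, 77, 128, 9]
print(extract_lsb(stego, 4))  # [1, 0, 1, 1]
```

Each sample changes by at most one, so the change is visually negligible; the statistical traces discussed in the following slides are the real risk.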

Slide 5 ### Statistical Steganalysis

- Modern steganography’s goal is to keep its mere presence undetectable
- However, steganographic systems leave behind detectable traces in the cover medium
- Even if the secret content is not revealed, its existence can be detected
- Modifying the cover medium changes its statistical properties
- Eavesdroppers can detect the distortions in the resulting stego medium’s statistical properties

The process of finding these distortions is called statistical steganalysis

Slide 6 ### Information Hiding Systems

- Three different aspects in information-hiding systems contend with each other:
- Capacity – amount of information that can be hidden in the cover medium
- Security – an eavesdropper’s inability to detect hidden information
- Robustness – amount of modification the stego medium can withstand before an adversary can destroy hidden information

- Watermarking systems – emphasize robustness
- Steganography – strives for high security and capacity
- As a result, the hidden information is fragile

Slide 7 ### Steganographic Systems

- Classical steganographic systems
- Security relies on the encoding system’s secrecy
- e.g. a Roman general shaved a slave’s head and tattooed a message on it; after the hair grew back, the slave was sent to deliver the hidden message

- Modern steganography
- Attempts to be detectable only if secret information (a secret key) is known
- Similar to Kerckhoffs’ principle in cryptography, which holds that “a cryptographic system’s security should rely solely on the key material”

Slide 8 ### Modern Steganography

- Senders and receivers of steganographic communication agree on:
- a steganographic system
- a shared secret key that determines how the message is encoded in the cover medium

Slide 9 ### Overview of Encoding Step

- To send a hidden message, for example,
- Alice creates a new image with a digital camera
- Alice supplies the steganographic system with her shared secret and message
- The steganographic system uses the shared secret to determine how the hidden message should be encoded in the redundant bits
- The result is the stego image that Alice sends to Bob
- When Bob receives the image, he uses the shared secret and the agreed steganographic system to retrieve the hidden message

Slide 10 ### Hide and Seek in JPEG images

- Why steganographic systems for the JPEG format?
- The systems operate in a transform space
- They are not affected by visual attacks (which work against systems for BMP images)
- Modifications are made in the frequency domain instead of the spatial domain

- Neil F. Johnson and Sushil Jajodia showed that steganographic systems for palette-based images leave easily detected distortions

Slide 11 ### Discrete Cosine Transform (DCT)

For each color component, the JPEG image format uses a Discrete Cosine Transform (DCT) to transform successive 8x8 pixel blocks of the image into 64 DCT coefficients each

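The DCT coefficients F(u, v) of an 8 x 8 block of image pixels f(x, y) are given by

$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

where $C(x) = 1/\sqrt{2}$ for $x = 0$ and $C(x) = 1$ otherwise.

The following operation quantizes the coefficients:

$$F^Q(u,v) = \left\lfloor \frac{F(u,v)}{Q(u,v)} + 0.5 \right\rfloor$$

where Q(u,v) is a 64-element quantization table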
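A direct, unoptimized Python transcription of the two steps on this slide may make them concrete: the standard JPEG forward DCT, F(u,v) = ¼·C(u)·C(v)·Σx Σy f(x,y)·cos((2x+1)uπ/16)·cos((2y+1)vπ/16) with C(0) = 1/√2 and C(k) = 1 otherwise, followed by rounding each F(u,v)/Q(u,v) to the nearest integer. The flat input block and the uniform quantization table are hypothetical; real encoders use fast DCT algorithms and standard or tuned tables.

```python
import math


def dct_8x8(block):
    """Forward 2-D DCT of an 8x8 block block[x][y] (direct, O(n^4) form)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs


def quantize(coeffs, table):
    """Round each F(u, v) / Q(u, v) to the nearest integer via floor(x + 0.5)."""
    return [[math.floor(coeffs[u][v] / table[u][v] + 0.5) for v in range(8)]
            for u in range(8)]


# Hypothetical inputs: a flat block (assumed already level-shifted by -128, as
# JPEG does before the DCT) and a uniform table instead of a real JPEG table.
block = [[16] * 8 for _ in range(8)]
table = [[16] * 8 for _ in range(8)]
coeffs = dct_8x8(block)
quantized = quantize(coeffs, table)
print(quantized[0][0])  # 8: all the energy of a flat block lands in the DC term
```

The quantized integers are the values that the JPEG steganographic systems in the next slides modify.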

Slide 12 ### Steganographic Systems

- Sequential – for example: JSteg
- Pseudo Random – for example: Outguess 0.1
- Subtraction – for example: F5
- Statistics-aware embedding

Slide 13 The least-significant bits of the quantized DCT coefficients are used as redundant bits to embed the hidden message

### Sequential Embedding (I)

- Derek Upham’s JSteg algorithm does not require a shared secret
Input: message, cover image

Output: stego image

while data left to embed do

get next DCT coefficient from cover image

if DCT ≠ 0 and DCT ≠ 1 then

get next LSB from message

replace DCT LSB with message LSB

end if

insert DCT into stego image

end while

- As a result, anyone who knows the steganographic system can retrieve the message hidden by JSteg
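The JSteg pseudocode translates almost line for line into Python. This sketch assumes the quantized DCT coefficients are already available as a flat list of integers (no JPEG parsing), and it treats a negative coefficient's LSB as its two's-complement low bit; that convention is an assumption, not necessarily JSteg's own.

```python
def jsteg_embed(coefficients, message_bits):
    """Sequential embedding: walk the quantized DCT coefficients in order."""
    stego = []
    bits = iter(message_bits)
    pending = next(bits, None)
    for dct in coefficients:
        # Coefficients equal to 0 or 1 are skipped, as in the pseudocode.
        if pending is not None and dct not in (0, 1):
            # Replace the LSB (two's-complement low bit for negatives: assumption).
            dct = (dct & ~1) | pending
            pending = next(bits, None)
        stego.append(dct)
    if pending is not None:
        raise ValueError("cover image too small for the message")
    return stego


def jsteg_extract(coefficients, n_bits):
    """No shared secret needed: read the LSB of every coefficient not in {0, 1}."""
    usable = [dct & 1 for dct in coefficients if dct not in (0, 1)]
    return usable[:n_bits]


cover = [5, 0, -2, 1, 3, 8, -1]  # hypothetical quantized coefficients
stego = jsteg_embed(cover, [0, 1, 1, 0])
print(stego)                     # [4, 0, -1, 1, 3, 8, -1]
print(jsteg_extract(stego, 4))   # [0, 1, 1, 0]
```

Replacing the LSB only moves a value within its pair {2i, 2i+1}, so a coefficient outside {0, 1} never becomes 0 or 1; the receiver therefore visits exactly the positions the sender used, and no key is involved.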

Slide 14 ### Sequential Embedding Steganalysis (I)

- Andreas Westfeld and Andreas Pfitzmann noticed that
- steganographic systems that change least-significant bits sequentially cause distortions detectable by steganalysis
- for a given image, embedding high-entropy data (often due to encryption) changes the histogram of color frequencies in a predictable way.

- Embedding uniformly distributed message bits reduces the frequency difference between adjacent DCT coefficient values (values that differ only in their least-significant bit)
- Embedding can therefore be detected by observing these differences in the DCT coefficients’ frequencies

Slide 15 ### Frequency Histograms

Frequency histograms of an image’s DCT coefficients (a) before and (b) after a hidden message is embedded in a JPEG image. Sequential changes to the least-significant bits of the discrete cosine transform (DCT) coefficients tend to equalize the frequencies of adjacent DCT coefficient values in the histogram.

Slide 16 ### Sequential Embedding Steganalysis (II)

- Westfeld and Pfitzmann’s χ² test
- determines whether the observed frequency distribution in an image matches a distribution that shows distortion from embedding hidden data

- The probability of embedding is determined by calculating p for a sample from the DCT coefficients
- The samples start at the beginning of the image and for each measurement the sample size is increased
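A sketch of the test, assuming the quantized coefficients are given as a flat Python list. The pairing of values by `c >> 1`, the `min_expected` cut-off, and the Wilson-Hilferty approximation of the χ² tail (used instead of an exact CDF so the sketch needs only the standard library) are this sketch's own choices, not details from the slides. `embedding_profile` mirrors the measurement scheme described above: each sample starts at the beginning of the image and grows.

```python
import math
from collections import Counter


def chi_square_probability(sample, min_expected=5.0):
    """Return p, the estimated probability of embedding, or None if data is too sparse."""
    counts = Counter(sample)
    # Values 2i and 2i+1 differ only in the LSB; c >> 1 gives the pair index i,
    # also for negative values. The pair {0, 1} is skipped, as JSteg skips it.
    pairs = {c >> 1 for c in counts if c not in (0, 1)}
    chi2, categories = 0.0, 0
    for i in pairs:
        observed = counts[2 * i]
        expected = (counts[2 * i] + counts[2 * i + 1]) / 2
        if expected < min_expected:  # sparse pairs are skipped (assumption)
            continue
        chi2 += (observed - expected) ** 2 / expected
        categories += 1
    dof = categories - 1
    if dof < 1:
        return None
    # p = upper tail of the chi-square distribution with dof degrees of freedom,
    # via the Wilson-Hilferty normal approximation instead of an exact CDF.
    z = ((chi2 / dof) ** (1 / 3) - (1 - 2 / (9 * dof))) / math.sqrt(2 / (9 * dof))
    return 0.5 * math.erfc(z / math.sqrt(2))


def embedding_profile(coefficients, steps=10):
    """Apply the test to samples that start at the beginning and grow each time."""
    sizes = [len(coefficients) * s // steps for s in range(1, steps + 1)]
    return [(size, chi_square_probability(coefficients[:size])) for size in sizes]


# Hypothetical data: equalized pairs (as after embedding) vs. a skewed histogram.
equalized = [2, 3, -2, -1, 4, 5] * 50
skewed = [2] * 100 + [3] * 10 + [-2] * 10 + [-1] * 100 + [4] * 80 + [5] * 10
print(chi_square_probability(equalized) > 0.99)  # True
print(chi_square_probability(skewed) < 0.01)     # True
```

For a JSteg image, the profile would stay near 1 while the sample lies inside the embedded region and fall once it extends past the end of the message.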

Slide 17 ### Sequential Embedding Steganalysis (III)

- A high probability of embedding indicates that the image contains steganographic content
- The length of a message hidden by JSteg can also be determined

Slide 18 ### Pseudo Random Embedding

- Niels Provos’s OutGuess 0.1 steganographic system
- Improves the encoding step by using a pseudo-random number generator (PRNG) to select DCT coefficients at random
- The LSB of a selected DCT coefficient is replaced with encrypted message data

Slide 19 The algorithm replaces the least-significant bit of pseudo-randomly selected discrete cosine transform (DCT) coefficients with message data

### OutGuess 0.1 Algorithm

- The OutGuess 0.1 algorithm:
Input: message, shared secret, cover image

Output: stego image

initialize PRNG with shared secret

while data left to embed do

get pseudo-random DCT coefficient from cover image

if DCT ≠ 0 and DCT ≠ 1 then

get next LSB from message

replace DCT LSB with message LSB

end if

insert DCT into stego image

end while
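A runnable sketch of the OutGuess 0.1 loop. Assumptions not taken from OutGuess itself: the PRNG is Python's `random.Random` seeded with a SHA-256 hash of the shared secret, "get pseudo-random DCT coefficient" is modeled as visiting a key-dependent permutation of all coefficient indices, and the message bits are assumed to be encrypted already (the encryption step is omitted). `visit_order` is a hypothetical helper.

```python
import hashlib
import random


def visit_order(shared_secret, n):
    """Hypothetical key schedule: a PRNG seeded from the shared secret permutes indices."""
    seed = int.from_bytes(hashlib.sha256(shared_secret.encode()).digest(), "big")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # not a cryptographic PRNG; illustration only
    return order


def outguess01_embed(coefficients, message_bits, shared_secret):
    stego = list(coefficients)
    bits = iter(message_bits)
    pending = next(bits, None)
    for idx in visit_order(shared_secret, len(stego)):
        if pending is None:
            break  # all message bits embedded
        if stego[idx] not in (0, 1):
            stego[idx] = (stego[idx] & ~1) | pending  # replace the LSB
            pending = next(bits, None)
    if pending is not None:
        raise ValueError("cover image too small for the message")
    return stego


def outguess01_extract(coefficients, n_bits, shared_secret):
    """Bob regenerates the same order from the shared secret and reads the LSBs."""
    bits = []
    for idx in visit_order(shared_secret, len(coefficients)):
        if len(bits) == n_bits:
            break
        if coefficients[idx] not in (0, 1):
            bits.append(coefficients[idx] & 1)
    return bits


cover = [3, -2, 0, 7, 1, 4, -5, 9, 2, 6, -3, 8]  # hypothetical coefficients
stego = outguess01_embed(cover, [1, 0, 0, 1, 1], "alice-and-bob")
print(outguess01_extract(stego, 5, "alice-and-bob"))  # [1, 0, 0, 1, 1]
```

Spreading the changes over the whole image defeats the sequential χ² test, but each change still moves a value within its pair {2i, 2i+1}, which is what the extended test on the next slide exploits.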

Slide 20 ### Embedded Message Detection (I)

- The χ² test can be extended to detect local distortions in an image
- Two identical distributions produce about the same χ2 values in any part of the distribution
- Instead of increasing the sample size and applying the test at a constant position,
- a constant sample size is used and the sample position is moved forward (slid) through the image
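A sketch of the sliding variant, under stated assumptions: it reports only the raw χ² statistic per window (no p-value), uses every value pair except {0, 1}, and omits the paper's heuristics. Windows in which embedding has equalized the value pairs give a statistic near zero. The data below is synthetic.

```python
from collections import Counter


def chi_square_statistic(sample):
    """Chi-square statistic over value pairs (2i, 2i+1), skipping the pair {0, 1}."""
    counts = Counter(sample)
    chi2 = 0.0
    for i in {c >> 1 for c in counts if c not in (0, 1)}:
        expected = (counts[2 * i] + counts[2 * i + 1]) / 2
        chi2 += (counts[2 * i] - expected) ** 2 / expected
    return chi2


def sliding_chi_square(coefficients, window, step):
    """Constant sample size; the sample's start position slides through the image."""
    return [
        (start, chi_square_statistic(coefficients[start:start + window]))
        for start in range(0, len(coefficients) - window + 1, step)
    ]


# Hypothetical data: the first 100 values have equalized pairs (as after
# embedding); the last 70 keep a skewed, unmodified-looking histogram.
coefficients = [2, 3, -2, -1] * 25 + [2] * 30 + [3] * 5 + [-1] * 30 + [-2] * 5
for start, chi2 in sliding_chi_square(coefficients, window=50, step=50):
    print(start, round(chi2, 2))
```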

Slide 21 ### Embedded Message Detection (II)

- The extended χ² test detects pseudo-randomly embedded messages in JPEG images
- The detection rate depends on
- the hidden message’s size
- the number of DCT coefficients in an image
- The detection rate can be improved by applying a heuristic that eliminates coefficients likely to lead to false negatives

The graph shows the detection rates for three different false-positive rates

The change rate refers to the fraction of discrete cosine transform (DCT) coefficients available for embedding a hidden message that have been modified

Slide 22 ### Subtraction

- Andreas Westfeld’s steganographic system, F5
- Instead of replacing the least-significant bit of a DCT coefficient with message data,
- F5 decrements its absolute value in a process called matrix encoding

- There is no coupling of any fixed pair of DCT coefficients, so the χ² test cannot detect F5’s modifications

Slide 23 ### Matrix Encoding

- Matrix encoding computes an appropriate $(1, 2^k - 1, k)$ Hamming code by calculating the message block size $k$ from
- the message length and
- the number of nonzero non-DC coefficients

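- The Hamming code $(1, 2^k - 1, k)$ encodes a $k$-bit message word $m$ into an $n$-bit code word $a$ with $n = 2^k - 1$
- It can recover from a single bit error in the code word, so embedding a $k$-bit message word changes at most one of the $n$ code word bits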
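A small Python sketch of the Hamming-code step, assuming the decoding function f(a) is the XOR of the 1-based positions i whose bit a_i is 1 (the slides do not spell f out). Flipping the bit at position s = f(a) ⊕ m yields a code word whose hash equals the message m, so at most one of the n bits changes. In F5 the bits are coefficient LSBs, and a flip is carried out by decrementing the coefficient's absolute value.

```python
from functools import reduce
from operator import xor


def hamming_hash(code_word):
    """f(a): XOR of the 1-based positions i whose bit a_i is 1 (a k-bit value)."""
    return reduce(xor, (i for i, bit in enumerate(code_word, start=1) if bit), 0)


def matrix_embed(code_word, message, k):
    """Change at most one of n = 2^k - 1 bits so that f(a') equals the k-bit message."""
    n = 2 ** k - 1
    if len(code_word) != n or not 0 <= message < 2 ** k:
        raise ValueError("need an n-bit code word and a k-bit message")
    s = hamming_hash(code_word) ^ message  # position to flip (0 means none)
    result = list(code_word)
    if s != 0:
        result[s - 1] ^= 1
    return result


# k = 2: hide two message bits in the LSBs of three coefficients.
lsbs = [1, 0, 1]                         # f(lsbs) = 1 XOR 3 = 2
changed = matrix_embed(lsbs, 0b01, k=2)  # s = 2 XOR 1 = 3, so flip position 3
print(changed, hamming_hash(changed))    # [1, 0, 0] 1
```

Flipping position s toggles s in the XOR, so the new hash is f(a) ⊕ s = m; with k = 2 this embeds two bits while changing at most one of three coefficients.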

Slide 24 ### The F5 algorithm

Input: message, shared secret, cover image

Output: stego image

initialize PRNG with shared secret

permutate DCT coefficients with PRNG

determine k from image capacity

calculate code word length n←2^k − 1

while data left to embed do

get next k-bit message block

repeat

G←{n non-zero AC coefficients}

s←k-bit hash f of LSB in G

s←s ⊕ k-bit message block

if s ≠ 0 then

decrement absolute value of DCT coefficient G_s

insert G_s into stego image

end if

until s = 0 or G_s ≠ 0

insert DCT coefficients from G into stego image

end while
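A runnable sketch of the F5 loop, including the shrinkage case where decrementing G_s produces a zero that the receiver will skip. Assumptions: the input is a flat list of AC coefficients (DC coefficients already removed), k is passed in rather than derived from image capacity, message blocks are integers below 2^k, the permutation comes from Python's `random.Random` seeded with SHA-256 of the secret, and a coefficient's LSB is taken as its parity. Real F5 handles further details omitted here.

```python
import hashlib
import random


def hamming_hash(bits):
    """f: XOR of the 1-based positions whose bit is 1."""
    h = 0
    for i, bit in enumerate(bits, start=1):
        if bit:
            h ^= i
    return h


def permutation(shared_secret, n):
    """Hypothetical key schedule: SHA-256 of the secret seeds Python's PRNG."""
    seed = int.from_bytes(hashlib.sha256(shared_secret.encode()).digest(), "big")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def f5_embed(ac_coefficients, message_blocks, k, shared_secret):
    """Embed k-bit blocks (ints < 2**k) into a flat list of AC coefficients."""
    n = 2 ** k - 1
    stego = list(ac_coefficients)
    order = permutation(shared_secret, len(stego))
    pos = 0  # where the next group G starts in the permuted order
    for block in message_blocks:
        while True:  # the pseudocode's repeat ... until loop
            group, scan = [], pos
            while len(group) < n:  # G <- next n non-zero AC coefficients
                if scan == len(order):
                    raise ValueError("cover image too small for the message")
                if stego[order[scan]] != 0:
                    group.append(order[scan])
                scan += 1
            s = hamming_hash([stego[i] & 1 for i in group]) ^ block
            if s == 0:
                break  # LSBs already encode the block
            target = group[s - 1]
            stego[target] += -1 if stego[target] > 0 else 1  # decrement |G_s|
            if stego[target] != 0:
                break  # successful change
            # Shrinkage: G_s became 0 and the receiver will skip it, so the
            # same block is re-embedded in a freshly formed group.
        pos = scan
    return stego


def f5_extract(stego, n_blocks, k, shared_secret):
    """Bob reads the non-zero coefficients in the same order, n at a time."""
    n = 2 ** k - 1
    order = permutation(shared_secret, len(stego))
    nonzero = [stego[i] for i in order if stego[i] != 0]
    return [hamming_hash([c & 1 for c in nonzero[j * n:(j + 1) * n]])
            for j in range(n_blocks)]


cover = [3, -1, 0, 2, 5, -4, 1, 0, 7, -2, 6, 1, -3, 2, 4, -1, 8, 0, 2, -5] * 3
blocks = [3, 1, 0, 2, 2, 3]  # six 2-bit message blocks, so k = 2 and n = 3
stego = f5_embed(cover, blocks, 2, "alice-and-bob")
print(f5_extract(stego, len(blocks), 2, "alice-and-bob"))  # [3, 1, 0, 2, 2, 3]
```

Because every change lowers an absolute value by one instead of swapping a value within a fixed pair, the coefficient histogram shrinks toward zero rather than equalizing pairs, which is why the χ² attack does not apply.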

Slide 25 ### F5 Detection Algorithm

- Because most images are already stored in the JPEG format, embedding information with F5 leads to double compression
- Double compression could confuse this detection algorithm

- Fridrich and her group proposed a method for eliminating the effects of double compression by estimating the quality factor used to compress the cover image

Slide 26 ### Statistics-aware embedding

- The previously discussed algorithms overwrite image data without directly considering the distortions that the embedding will cause
- To embed a single bit,
- a DCT coefficient’s value can be either incremented or decremented, so its least-significant bit can be changed in two different ways
- groups of DCT coefficients can be formed, using the parity of their least-significant bits as message bits

- For every DCT block, the space of all possible changes is searched to find a configuration that minimizes the change to image statistics
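A much-simplified sketch of the idea, with assumptions stated up front: each message bit is the parity of a hypothetical group of coefficient LSBs, the search only considers a single +1 or −1 change to one coefficient in the group (not every configuration in a DCT block), and "change to image statistics" is measured as the L1 distance between the cover's and the candidate's value histograms. None of these choices come from a specific published system.

```python
from collections import Counter


def histogram_distance(a, b):
    """L1 distance between the value histograms of two coefficient lists."""
    ha, hb = Counter(a), Counter(b)
    return sum(abs(ha[v] - hb[v]) for v in ha.keys() | hb.keys())


def embed_bit_in_group(cover, stego, group, bit):
    """Set the parity of the group's LSBs to `bit` with the least histogram change."""
    if sum(stego[i] & 1 for i in group) % 2 == bit:
        return stego  # parity already matches: no change needed
    best_cost, best = None, None
    for i in group:
        for delta in (+1, -1):  # either direction flips this coefficient's LSB
            candidate = list(stego)
            candidate[i] += delta
            cost = histogram_distance(cover, candidate)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, candidate
    return best


def embed_message(cover, groups, bits):
    stego = list(cover)
    for group, bit in zip(groups, bits):
        stego = embed_bit_in_group(cover, stego, group, bit)
    return stego


cover = [3, 4, 5, 4, 3, 6]  # hypothetical coefficients
groups = [[0, 1, 2], [3, 4, 5]]
stego = embed_message(cover, groups, [1, 0])
print(stego, histogram_distance(cover, stego))  # [4, 4, 5, 3, 3, 6] 0
```

The first change moves a 3 to a 4; when the second group needs a parity flip, the search picks the change that moves a 4 back to a 3, so the overall histogram matches the cover again.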

Slide 27 ### Detection Algorithms

- Two different classes of algorithms:
- Based on inherent statistical properties
- Need no representative training set
- Can estimate an embedded message’s length

- Based on class discrimination
- Creating a representative training set is often difficult
- Do not provide an estimate of the hidden message’s length

Slide 28 ### Steganography Detection on the Internet

- How can the previously discussed steganalytic methods be used in a real-world setting?
- Created a steganography detection framework that
- gets JPEG images off the Internet and
- uses steganalysis to identify subsets of the images likely to contain steganographic content

Slide 29 ### Steganography Systems in use

- JSteg-Shell (a user interface to JSteg)
- supports content encryption and compression before JSteg embeds the data
- uses the RC4 stream cipher for encryption

- JPHide
- uses Blowfish as a PRNG
- version 0.5 supports additional compression of the hidden message and uses slightly different headers to store embedding information
- Before the content is embedded, the content is Blowfish-encrypted with a user-supplied pass phrase

- OutGuess

- All three systems use some form of least-significant-bit embedding and are detectable with statistical analysis

Slide 30 ### Detection Framework

- Stegdetect is an automated utility that can analyze JPEG images that have content hidden with JSteg, JPHide, and OutGuess 0.13b
- For each image, Stegdetect’s output lists
- the steganographic systems it finds, or
- “negative” if it couldn’t detect any

- Stegdetect’s false-negative rate depends on:
- The steganographic system and the embedded message’s size
- The smaller the message, the harder it is to detect by statistical means.

- Stegdetect is very reliable in finding images that have content embedded with JSteg
- For JPHide, detection also depends on the size and the compression quality of the JPEG images

Slide 31 ### Detection Results

Using Stegdetect over the Internet. (a) JPHide and (b) JSteg produce different detection results for different test images and message sizes

Slide 32 ### Finding Images

- Images from eBay auctions and from discussion groups in the Usenet archive were collected for analysis
- Developed Crawl, a simple, efficient Web crawler that makes a local copy of any JPEG images it encounters on a Web page
- Crawl performs a depth-first search and has two key features:
- Images and Web pages can be matched against regular expressions
- so Web pages can be included in or excluded from the search

- Minimum and maximum image size can be specified
- so images that are too small to contain hidden messages can be excluded

- Calculation of the true positive rate – the probability that an image detected by Stegdetect really contains steganographic content
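The slide names the quantity but not the computation. One standard way to obtain it is Bayes' rule, combining the detector's detection rate, its false-positive rate, and the fraction of images that really carry hidden content (unknown in practice). The numbers below are hypothetical, not the paper's.

```python
def true_positive_rate(detection_rate, false_positive_rate, prior):
    """P(hidden content | flagged), by Bayes' rule.

    detection_rate:      P(flagged | image has hidden content)
    false_positive_rate: P(flagged | image is clean)
    prior:               fraction of images that really carry hidden content
    """
    flagged_stego = detection_rate * prior
    flagged_clean = false_positive_rate * (1 - prior)
    return flagged_stego / (flagged_stego + flagged_clean)


# Hypothetical numbers: 90% detection, 1% false positives, 1 in 1,000 images stego.
print(round(true_positive_rate(0.9, 0.01, 0.001), 3))  # 0.083
```

With these made-up numbers only about 8% of flagged images would really contain hidden content, which suggests why flagged images still need verification when steganography is rare.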

Slide 33 Percentages of (false) positives for analyzed images

| Test | eBay (%) | Usenet (%) |
|---|---|---|
| JSteg | 0.003 | 0.007 |
| JPHide | 1 | 2.1 |
| OutGuess | 0.1 | 0.14 |

### Percentages of positives for analyzed images

- After processing 2 million eBay images with Stegdetect
- Over 1% of all the images seemed to contain hidden content
- JPHide was detected most often

Slide 34 ### Verifying Hidden Content

- Stegdetect cannot guarantee a hidden message’s existence
- To verify the hidden content, the Stegbreak tool launches a dictionary attack against the JPEG files
- JSteg-Shell, JPHide, and OutGuess all hide content based on a user-supplied password
- an attacker can try to guess the password by taking a large dictionary and trying to use every single word in it to retrieve the hidden message
- all three embed header information, so attackers can verify a guessed password using that header information
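A toy model of password verification through an embedded header. Everything concrete here is hypothetical: the SHA-256 keystream, the `STEG` marker, and the header layout stand in for the real systems' ciphers (the slides mention RC4 and Blowfish) and formats. A short marker can also match a wrong password by chance, so a hit is only a candidate that still needs checking against the extracted content.

```python
import hashlib

MAGIC = b"STEG"  # hypothetical header marker, not any real tool's format


def keystream(password, length):
    """Toy keystream: SHA-256(password || counter) blocks (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password.encode() + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_header(password, header):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(h ^ k for h, k in zip(header, keystream(password, len(header))))


def dictionary_attack(encrypted_header, words):
    """Return the first word whose decrypted header starts with MAGIC, else None."""
    for word in words:
        if xor_header(word, encrypted_header).startswith(MAGIC):
            return word
    return None


# Alice hides a header (marker + message length) under the password "sunshine".
encrypted = xor_header("sunshine", MAGIC + (42).to_bytes(4, "big"))
print(dictionary_attack(encrypted, ["password", "letmein", "sunshine", "dragon"]))
```

Each guess costs one key derivation and one header decryption, which is why throughput is measured in words per second.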

Slide 35 Stegbreak Performance on a 1,200-MHz Pentium III

| System | One image (words/second) | Fifty images (words/second) |
|---|---|---|
| JPHide | 4,500 | 8,700 |
| OutGuess | 18,000 | 34,000 |
| JSteg | 36,000 | 47,000 |

### Stegbreak Performance

Slide 36 ### Results: Steganography Detection on the Internet

- In the eBay and Usenet research,
- not a single hidden message was found

- Explanations for inability to find steganographic content on the Internet:
- All steganographic system users carefully choose passwords that are not susceptible to dictionary attacks
- Images from sources that were not analyzed may carry steganographic content
- Nobody uses steganographic systems that researchers could find
- All messages are too small for analysis to detect

Either the researchers are looking in the wrong place, or there is no widespread use of steganography on the Internet

Slide 37 ### Conclusion

- Today, computer and network technologies provide easy-to-use communication channels for steganography
- This research
- provides an overview of existing steganographic systems
- presents methods for detecting them via statistical steganalysis

Slide 38 ### Future Work

- Research new algorithms to
- Hide information
- Improve Steganalysis

Slide 39 ### References

- Hide and Seek: An Introduction to Steganography, Niels Provos, Peter Honeyman, IEEE Security and Privacy Journal, May-June 2003
- Cyber warfare: steganography vs. steganalysis, Huaiqing Wang, Shuozhong Wang, Communications of the ACM, Volume 47, Issue 10, October 2004
- http://www.outguess.org/detection.php
- http://www.jjtc.com/Security/stegtools.htm
- http://www.stack.nl/~galactus/remailers/index-stego.html

Slide 40 ### Thanks a lot …

For your presence and patience

Slide 41 Slide 42 ### Homework

Presentation slides and research papers are available at:

www.umbc.edu/~chauhan2/CMSC691I/