
COE-589: Carving contiguous and fragmented files with fast object validation

Author: Simson L. Garfinkel

Presented by: Mohammad Faizuddin


Outline

  • Introduction
  • Limitations of File Carving Programs
  • Contribution
  • Related work
  • Fragmentation in the wild
  • Experimental Methodology
  • Object Validation
  • Pluggable validator framework
  • Carving with validation
  • Contiguous carving algorithms
  • Fragment Recovery Carving
  • Conclusions
  • Future work
Introduction

  • File Carving
    • Reconstruction of files based on their content, rather than using metadata that points to the content.
  • Carving is useful for both computer forensics and data recovery.
  • Challenges
    • Files to be carved must be recognized in the disk image.
    • Some process must establish if the files are intact or not.
    • The files must be copied out of the disk image and presented to the examiner or analyst in a manner that makes sense.
Limitations of File Carving Programs
  • Most of today’s file carving programs share two important limitations.
      • They can only carve data files that are contiguous.
      • They do not perform extensive validation on the files that they carve and, as a result, present the examiner with many false positives.
  • This paper significantly advances our understanding of the carving problem in three ways
    • First, a detailed survey of file system fragmentation statistics from more than 300 active file systems from drives that were acquired on the secondary market.
    • Second, this paper considers the ranges of options available for carving tools to validate carved data.
    • Third, this paper discusses the results of applying these algorithms to the DFRWS 2006 Carving Challenge.
Related work
  • Defense Computer Forensics Lab developed CarvThis in 1999.
  • CarvThis inspired Agent Kris Kendall to develop a carving program called SNARFIT.
  • Foremost was released as an open source carving tool.
  • Mikus extended Foremost while working on his master’s thesis and released version 1.4 in February 2007.
  • Richard and Roussev re-implemented Foremost; the resulting tool was called Scalpel.
Related work Cont.(2)
  • Garfinkel introduced several techniques for carving fragmented files in his submission to the 2006 challenge.
  • CarvFS and LibCarvPath are virtual file system implementations that provide for “zero-storage carving”.
  • Douceur and Bolosky (1999) conducted a study of 10,568 file systems from 4801 personal computers running Microsoft Windows at Microsoft.
Fragmentation in the wild
  • A copy of Garfinkel’s used hard drive corpus was obtained for this paper.
  • Garfinkel’s corpus contains drive images collected over an eight year period (1998-2006) from the US, Canada, England, France, Germany, Bosnia, and New Zealand.
  • Many of the drives were purchased on eBay.
  • One third of the drives in the corpus were sanitized before they were sold.
  • The fragmentation patterns observed on these drives are typically close to the patterns found on drives of forensic interest.
Experimental Methodology
  • Garfinkel’s corpus was delivered as a series of AFF files ranging between 100 KB and 20 GB in length.
  • Analysis was performed using Carrier’s Sleuth Kit and a file-walking program specially written for this project.
  • Results stored in text files and later imported into an SQL database where further analysis was performed.
Experimental Methodology Cont.(2)
  • Sleuth Kit identified active file systems on 449 of the disk images in the Garfinkel corpus.
  • Many drives in the Garfinkel corpus were either completely blank or completely formatted with a FAT or NTFS file system.
  • Only 324 hard drives contained more than five files.
  • Sleuth Kit identified 2,204,139 files with file names, of which 2,143,553 had associated data.
Fragmentation distribution
  • 125,659 (6%) of the files recovered from the corpus were fragmented.
  • Half of the drives did not contain a single fragmented file.
  • 30 drives had more than 10% of their files fragmented into two or more pieces.
Fragmentation distribution Cont.(2)
  • Modern operating systems try to write files without fragmentation because these files are faster to write and to read.
  • There are three conditions under which an operating system must write a file with two or more fragments.
    • There is no contiguous region of free sectors on the media large enough to hold the file.
    • There are not enough unallocated sectors following the end of an existing file to accommodate appended data.
    • File system itself may not support writing files of a certain size in a contiguous manner (e.g. Unix File System).
Fragmentation distribution Cont.(3)
  • Files on the Unix File System (UFS) were far more likely to be fragmented than those on FAT or NTFS volumes.
Fragmentation by file extension
  • High fragmentation rates were seen for log files and PST files.
  • Surprisingly, TMP files were the most highly fragmented.
  • High fragmentation rates were also seen for file types (e.g. AVI, DOC, JPEG and PST) that are likely to be of interest to forensic examiners.
Files split into two fragments
  • The term bifragmented describes a file that is split into two fragments.
  • Bifragmented files can be carved using straightforward algorithms.
  • The table shows the bifragmented files observed in the corpus for the 20 most popular file extensions.
Files split into two fragments Cont.(2)
  • A histogram analysis was performed of the most common gap sizes between the first and the second fragment.
Files split into two fragments Cont.(3)
  • Tables show common gap sizes for JPEG and HTML files.
  • Gaps are represented in sectors (1 sector = 512 bytes).
Files split into two fragments Cont.(4)
  • The table shows more files with a gap of eight blocks than files with a gap of eight sectors.
  • It appears that some of the files with gaps of 16 or 32 sectors were actually on file systems with a cluster size of two or four sectors.
Highly Fragmented files
  • Small number of drives in the corpus had files that were highly fragmented.
    • Total of 6731 files on 63 drives had more than 100 fragments.
    • 592 files on 12 drives had more than 1000 fragments.
  • Highly fragmented files
    • Large DLLs and CAB files.
Fragmentation and volume size
  • Large hard drives are less likely to have fragmented files than smaller hard drives.
  • In Garfinkel’s corpus
    • 303 drives were smaller than 20 GB.
    • 21 were larger than 20 GB.
  • Most highly fragmented drives
    • were in the 10-20 GB range (e.g. one 14 GB drive had 43% of its 2517 JPEGs fragmented).
  • Fragmentation does appear to go down as drive size increases
    • 4.3 GB drive had 34% fragmentation.
    • 9 GB drive had 33% fragmentation.
Object Validation
  • Object Validation
    • The process of determining which sequences of bytes represent valid Microsoft Office files, JPEGs, or other kinds of data objects.
  • Object Validation is a superset of file validation
    • It is possible to extract, validate and ultimately use meaningful components from within a file (e.g. extracting a JPEG image embedded within a Word file).
Fast object Validation
  • Validator
    • attempts to determine if a sequence of bytes is a valid file.
  • A disk with n bytes has n(n+1)/2 possible strings; thus, a 200 GB hard drive would require roughly 2.0 × 10^22 different validations (see the worked arithmetic below).
  • Because objects of interest are sector-aligned in FAT and NTFS file systems, and a JPEG decompressor can report how much data it consumed, the number of validations falls from roughly 1.9 × 10^22 to about 4 × 10^8.
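As a quick sanity check of these magnitudes (a back-of-the-envelope sketch, assuming 200 GB ≈ 2 × 10^11 bytes, and that sector alignment plus a length-reporting decompressor leave roughly one validation per 512-byte sector):

```latex
\frac{n(n+1)}{2} \approx \frac{n^2}{2}
  = \frac{(2\times10^{11})^2}{2} \approx 2.0\times10^{22}
\qquad\text{vs.}\qquad
\frac{n}{512} = \frac{2\times10^{11}}{512} \approx 3.9\times10^{8} \approx 4\times10^{8}
```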
Validating Headers and Footers
  • Verifies static headers and footers.
  • JPEG files
    • begin with FF D8 FF followed by an E0 or E1.
    • end with FF D9.
  • The chance of these header patterns occurring randomly in an arbitrary object is 2 in 2^32.
  • Limitation
    • Fails to discover sectors that are inserted, deleted or modified between the header and footer, because those sectors are never examined.
  • Header/footer matching should therefore be used only to reject data, not to confirm it.
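A minimal sketch of such a header/footer test for JPEG; the function name is illustrative, not part of the paper's framework:

```cpp
#include <cstddef>
#include <cstdint>

// Quick header/footer plausibility test for a candidate JPEG byte string.
// Passing proves nothing by itself; failing is grounds for rejection.
bool jpeg_header_footer_plausible(const uint8_t *buf, size_t len) {
    if (len < 6) return false;        // too short to hold both header and footer
    const bool header_ok =
        buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF &&
        (buf[3] == 0xE0 || buf[3] == 0xE1);           // SOI + APP0/APP1 marker
    const bool footer_ok =
        buf[len - 2] == 0xFF && buf[len - 1] == 0xD9; // EOI
    return header_ok && footer_ok;
}
```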
Validating Container Structures
  • JPEG files
    • Contain metadata, color tables and a Huffman-coded image.
  • ZIP files
    • Contain a directory and multiple compressed files.
  • Microsoft Word files
    • Contain a Master Sector Allocation Table, a Sector Allocation Table, a Short Sector Allocation Table, a directory and one or more data streams.
validating container structures cont 2
Validating Container Structures Cont.(2)
  • Container structures have integers and pointers.
  • Validating requires checking
    • whether an integer is within a predefined range,
    • or whether a pointer points to another valid structure.
  • Container structure validation is more likely than header/footer validation to detect incorrect byte sequences or sectors.
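A minimal sketch of a container-structure check; the ContainerHeader layout and magic value are hypothetical, invented for illustration, not any real format:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical container header: a magic number, a count with a known
// legal range, and a file offset that must point inside the object.
struct ContainerHeader {
    uint32_t magic;
    uint32_t entry_count;   // integer: must fall in a predefined range
    uint32_t directory_off; // pointer: must reference a valid structure
};

bool container_plausible(const uint8_t *buf, size_t len) {
    ContainerHeader h;
    if (len < sizeof h) return false;
    std::memcpy(&h, buf, sizeof h);                   // avoid unaligned access
    if (h.magic != 0x4F424A31) return false;          // illustrative magic number
    if (h.entry_count == 0 || h.entry_count > 4096) return false; // range check
    if (h.directory_off >= len) return false;         // pointer must stay in bounds
    return true;
}
```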
Validating with decompression
  • Validates the actual data contained in the object.
  • The Huffman-coded data is decompressed to display the JPEG image.
  • A JPEG decompressor will frequently decompress corrupt data for many sectors before detecting an error.
  • 2006 challenge
    • A photo was present in two fragments (sectors 31,533-31,752 and 31,888-32,773).
Validating with decompression Cont.(2)
  • JPEG decompressor
    • Given a contiguous stream of sectors as input,
    • it does not generate an error until it reaches sector 31,761.
    • The sectors in the range 31,753-31,760 decompress as valid data, even though they are wrong.
Validating with decompression Cont.(3)
  • JPEG decompressor
    • May decompress many invalid sectors before realizing there is a problem.
    • For corrupted data, it will never conclude that the entire JPEG decompressed without error.
    • It is therefore successful as a validator.
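A minimal sketch of decompression-based validation built on stock libjpeg (assuming libjpeg 8+ or libjpeg-turbo for jpeg_mem_src; the paper's actual validator uses a modified libjpeg, which this does not reproduce):

```cpp
#include <csetjmp>
#include <cstddef>
#include <cstdio>
#include <vector>
#include <jpeglib.h>

// libjpeg's default error handler calls exit(); trap errors with setjmp/longjmp.
struct ValidatorErrMgr {
    jpeg_error_mgr pub;   // must be the first member
    jmp_buf jump;
};

static void on_jpeg_error(j_common_ptr cinfo) {
    longjmp(reinterpret_cast<ValidatorErrMgr *>(cinfo->err)->jump, 1);
}

// Returns true only if every scanline decompresses without a libjpeg error.
bool jpeg_decompresses_cleanly(unsigned char *buf, unsigned long len) {
    jpeg_decompress_struct cinfo;
    ValidatorErrMgr err;
    std::vector<unsigned char> row;         // declared before setjmp so it is
                                            // destroyed normally after a longjmp
    cinfo.err = jpeg_std_error(&err.pub);
    err.pub.error_exit = on_jpeg_error;
    if (setjmp(err.jump)) {                 // any decode error lands here
        jpeg_destroy_decompress(&cinfo);
        return false;
    }
    jpeg_create_decompress(&cinfo);
    jpeg_mem_src(&cinfo, buf, len);
    jpeg_read_header(&cinfo, TRUE);
    jpeg_start_decompress(&cinfo);
    row.resize((size_t)cinfo.output_width * cinfo.output_components);
    JSAMPROW rows[1] = { row.data() };
    while (cinfo.output_scanline < cinfo.output_height)
        jpeg_read_scanlines(&cinfo, rows, 1);
    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
    return true;                            // reached the last scanline cleanly
}
```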
Validating with decompression Cont.(4)
  • Using the JPEG decompressor
    • the authors were able to build a carving tool.
  • Carving tool
    • Automatically carves both contiguous and fragmented JPEG files in the DFRWS 2006 Challenge image with no false positives.
    • The six contiguous JPEGs were identified and carved in 6 s.
Semantic validation
  • The use of English and other human languages to automatically validate data objects.
  • Garfinkel solved part of the 2006 Challenge
    • using a manually tuned corpus recognizer that based its decisions on vocabulary unique to each text in question.
Manual validation
  • Manual validation
    • is what users think of as the accurate way to validate an object.
    • It is still not definitive.
  • Word and Excel will open files that contain substituted sectors.
  • Opening a file and examining it with human eyes
    • is not possible in an automated framework.
  • Even the best object validators give false positives.
Pluggable validator framework
  • Implements each object validator as a C++ class.
  • The framework allows
    • a validator to perform fast operations first,
    • slow operations only if the fast ones succeed,
    • and feedback to be passed from the validator to the carvers.
Validator return values
  • The validator supports a richer set of return values to enable more efficient carvers.
Validator methods
  • A validator must implement one method: Validation_function().
  • Validation_function()
    • Input is a sequence of bytes.
    • Returns
      • V_OK if the sequence validates.
      • V_ERR if it does not.
      • Optionally V_EOF if the validator runs out of data.
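A sketch of what such an interface might look like in C++; the class and method names are assumptions in the spirit of the paper's framework, not its actual API (the optional hooks correspond to the flags listed on the next slide):

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative return values, mirroring the slide's V_OK / V_ERR / V_EOF.
enum ValidationResult {
    V_OK,   // the byte sequence validates
    V_ERR,  // the byte sequence does not validate
    V_EOF   // the validator ran out of data before it could decide
};

// Hypothetical pluggable-validator base class.
class Validator {
public:
    virtual ~Validator() {}

    // The one required method: judge a sequence of bytes.
    virtual ValidationResult validation_function(const uint8_t *buf,
                                                 size_t len) = 0;

    // Optional hints a carver can exploit; defaults are conservative.
    virtual size_t allocation_increment() const { return 1; } // e.g. 512 for MSOLE
    virtual bool err_is_prefix() const { return false; }
    virtual bool appended_data_ignored() const { return false; }

    // Optional: report the object's embedded length once enough bytes are
    // visible, or -1 if more data is needed (used by embedded-length carving).
    virtual long length_function(const uint8_t *, size_t) { return -1; }
};
```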
Validator methods Cont.(2)
  • Validators may implement additional methods for
    • Sequence(s) of bytes in
      • File header.
      • File footer.
    • A variable that indicates the allocation increment used by file creators
      • JPEG files allocated in 1-byte increments.
      • Office files allocated in 512-byte increments.
    • Err_is_prefix flag.
    • Appended_data_ignored flag.
    • No_zblocks flag.
    • Plaintext_container.
    • Length_function.
    • Offset_function.
Validator methods Cont.(3)
  • Implemented three validators with this architecture
    • V_jpeg
      • Checks JPEG segments and attempts to decompress the JPEG image using a modified libjpeg version.
    • V_msole
      • Checks the CDH, MSAT, SAT and SSAT of Microsoft Office files and attempts to extract text from the file using the wvWare library.
    • V_zip
      • Validates the ZIP ECDR and CDR structures, then uses the unzip -t command to validate the compressed data.
Carving with validation
  • The authors developed a carving framework that allows carvers implementing different algorithms to be created from a common set of primitives.
  • Framework
    • Starts with a byte in a given sector.
    • Attempts to grow the byte into a contiguous run of bytes.
    • Periodically validates the resulting string (see the sketch below).
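A minimal sketch of that grow-and-validate primitive, reusing the hypothetical Validator interface sketched earlier (names illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Grow a candidate run one sector at a time from start_sector, revalidating
// as it grows; returns the longest prefix that ever validated (empty if none).
std::vector<uint8_t> grow_and_validate(const uint8_t *image, size_t image_len,
                                       size_t start_sector, Validator &v,
                                       size_t sector_size = 512) {
    std::vector<uint8_t> run, best;
    for (size_t off = start_sector * sector_size;
         off + sector_size <= image_len; off += sector_size) {
        run.insert(run.end(), image + off, image + off + sector_size);
        switch (v.validation_function(run.data(), run.size())) {
        case V_OK:  best = run; break;          // remember the validating string
        case V_EOF: break;                      // validator wants more data: keep growing
        case V_ERR:
            if (v.err_is_prefix()) return best; // no longer run can validate
            break;                              // otherwise keep trying
        }
    }
    return best;
}
```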
Carving with validation Cont.(2)
  • Several optimizations are provided
    • Carver maintains a map of sectors that are
      • Already carved.
      • Available for carving.
    • If the no_zblocks flag is set, the run is abandoned if the carver encounters a block filled with NULs.
    • If the err_is_prefix flag is set, the run is abandoned when the validator stops returning V_EOF and starts returning V_ERR.
    • If the appended_data_ignored flag is set, the run’s length is found by performing a binary search on run lengths.
Carving algorithms
  • Contiguous carving algorithms
    • Support block-based carving.
    • Support character-based carving.
  • Fragment Recovery Carving
    • Carving method in which two or more fragments are reassembled to form the original file or object.
  • Garfinkel called this approach “split carving”.
Contiguous carving algorithms: Header/footer carving
  • Carving files out of raw data using
    • Distinct header
    • Distinct footer
  • The algorithm works
    • by finding all strings contained within the disk image that begin with a header and end with a footer
    • and submitting them to the validator (see the sketch below).
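A minimal sketch of header/footer carving on top of the same hypothetical Validator interface; header and footer positions are assumed to have been found by a prior scan:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Pair every header with every following footer and submit the enclosed
// string to the validator; validated (start, end) ranges are collected.
std::vector<std::pair<size_t, size_t>> header_footer_carve(
        const uint8_t *image, size_t image_len, Validator &v,
        const std::vector<size_t> &headers,   // offsets found by scanning
        const std::vector<size_t> &footers,   // offsets found by scanning
        size_t footer_len) {
    std::vector<std::pair<size_t, size_t>> carved;
    for (size_t h : headers)
        for (size_t f : footers) {
            if (f <= h) continue;             // footer must follow the header
            size_t end = f + footer_len;      // include the footer bytes
            if (end > image_len) continue;
            if (v.validation_function(image + h, end - h) == V_OK)
                carved.push_back({h, end});
        }
    return carved;
}
```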
Contiguous carving algorithms: Header/maximum size carving
  • Submits strings to the validator that begin with each discernible header and continue to the end of the disk image.
  • A binary search is performed to find the length at which the string still validates, recovering the object’s true length (see the sketch below).
  • Header/maximum size carving works because
    • Many file formats (e.g. JPEG, MP3) do not care if additional data are appended to the end of a valid file.
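A sketch of that binary search, assuming a format for which validation is monotone in length: every length at or above the true object length validates (because appended data is ignored) and shorter lengths do not. Names reuse the hypothetical Validator interface from earlier:

```cpp
#include <cstddef>
#include <cstdint>

// Binary search for the smallest length, starting at header_off, that the
// validator accepts; under the monotonicity assumption above this is the
// true object length. Returns 0 if even the maximal string fails.
size_t binary_search_length(const uint8_t *image, size_t image_len,
                            size_t header_off, Validator &v) {
    size_t lo = 1, hi = image_len - header_off;
    if (v.validation_function(image + header_off, hi) != V_OK)
        return 0;                                // nothing starting here validates
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v.validation_function(image + header_off, mid) == V_OK)
            hi = mid;                            // mid validates: answer is <= mid
        else
            lo = mid + 1;                        // mid fails: answer is > mid
    }
    return lo;                                   // length of the carved object
}
```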
Contiguous carving algorithms: Header/embedded length carving
  • The carver scans the image file for sectors that can be identified as the start of a file.
  • These sectors are taken as the seeds of objects.
  • Seeds are grown one sector at a time by passing each candidate object to the validator.
  • The validator returns
    • the length of the object, or
    • V_ERR.
  • If a length is found, this information is used to create a test object for validation.
  • If an object is found with a given start sector, the carver moves on to the next sector (see the sketch below).
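A sketch of this loop, using the optional Length_function hook from the validator interface sketched earlier (the hook's exact signature is an assumption):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Grow a seed one sector at a time until the validator can report the
// object's embedded length, then validate an object of exactly that length.
bool carve_embedded_length(const uint8_t *image, size_t image_len,
                           size_t seed_off, Validator &v,
                           std::pair<size_t, size_t> &carved,
                           size_t sector = 512) {
    for (size_t len = sector; seed_off + len <= image_len; len += sector) {
        long stated = v.length_function(image + seed_off, len);
        if (stated < 0) continue;             // length not yet readable: grow the seed
        if (seed_off + (size_t)stated > image_len) return false;
        if (v.validation_function(image + seed_off, (size_t)stated) == V_OK) {
            carved = {seed_off, seed_off + (size_t)stated};
            return true;                      // object carved; move to the next sector
        }
        return false;                         // length read but the object failed
    }
    return false;                             // ran out of image before a length appeared
}
```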
Contiguous carving algorithms: File trimming
  • Trimming
    • Removing content from the end of an object that was not part of the original file.
  • Two ways for automating trimming
    • Footer trimming (In case of JPEG and ZIP).
    • Character trimming (byte-at-a-time formats).
Fragment Recovery Carving: Bifragment Gap Carving
  • Improved algorithm for split carving.
  • Places a gap between the start and the end flags, trying all gap sizes and positions.
  • O(n^2) for carving a single object for file formats with a recognizable header and footer.
  • O(n^4) for finding all bifragmented objects of a particular type (see the sketch below).
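A minimal sketch of bifragment gap carving between a fixed header offset h and footer end f_end, again using the hypothetical Validator interface; it tries every gap size and position, which is what gives the quadratic cost per object:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Try every (gap size, gap position) pair between header offset h and footer
// end f_end; concatenate the two fragments and validate the result.
bool bifragment_gap_carve(const uint8_t *image, size_t h, size_t f_end,
                          Validator &v, size_t sector,
                          std::vector<uint8_t> &out) {
    size_t total = (f_end - h) / sector;               // sectors from header to footer
    for (size_t gap = 1; gap < total; ++gap)           // gap size, in sectors
        for (size_t s1 = 1; s1 + gap < total; ++s1) {  // first-fragment length
            out.assign(image + h, image + h + s1 * sector);
            out.insert(out.end(),
                       image + h + (s1 + gap) * sector, image + f_end);
            if (v.validation_function(out.data(), out.size()) == V_OK)
                return true;                           // fragments reassembled
        }
    out.clear();
    return false;
}
```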


Fragment Recovery Carving: Bifragment Carving with constant size and known offset
  • Carver makes use of CDH to find and recover MSOLE files.
  • Employs an algorithm similar to gap carving except that the two independent variables are
    • Number of sectors in the first fragment.
    • Starting sector of the second fragment.
Fragment Recovery Carving: Bifragment Carving with constant size and known offset Cont.(2)
  • O(n^2) if
    • The CDH location is known.
    • The MSAT appears in the second fragment.
  • O(n^4) if
    • The forensic analyst desires to find all bifragmented MSOLE files in the disk image.
  • 2006 Challenge
    • The authors were able to recover all Microsoft Word and Excel files that were split into two pieces.
    • The number of false positives was low, and they were able to manually eliminate the incorrect ones.
    • The one file that could not be recovered was split into three pieces.
Conclusions

  • Files contain significant internal structure that can be used
    • to improve today’s file carvers
    • and to carve files that are fragmented into more than one piece.
  • Carvers should attempt to handle the carving of fragmented files.
Future work
  • Modify our carver to take into account the output of Sleuth Kit and see how many orphan files can actually be validated.
  • Integrate semantic carving into our carving system.
  • Develop an intelligent carver that can automatically suppress
    • sectors that belong to allocated files
    • and sectors that match sectors of known good files.