Bulk Extractor Advanced Topics Webinar BitCurator Consortium

Michael Olson, Stanford University; Sandy Ortiz, Stanford University. February 16, 2017.

Presentation Transcript


  1. Bulk Extractor Advanced Topics Webinar BitCurator Consortium Michael Olson, Stanford University Sandy Ortiz, Stanford University February 16, 2017

  2. Topics • Bulk Extractor overview • Why we use it at Stanford • Bulk Extractor 1.6.0-dev • Advanced features: definitions

  3. Topics continued • Requirements to run • Configuration • Sample run • Results • Discussion / Questions

  4. What is Bulk Extractor? • Software that scans disk images, files, file directories • Identifies potentially sensitive information: SSN, financial data, etc. • Creates histograms of features
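
The "histograms of features" idea can be sketched in a few lines. This is a minimal illustration, not bulk_extractor's own code: it assumes feature-file lines are tab-separated as offset, feature, context, with `#` marking comment lines, and the sample lines and the `feature_histogram` helper are hypothetical.

```python
from collections import Counter


def feature_histogram(lines):
    """Count how often each feature value occurs in feature-file lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines that start with '#'.
        if not line or line.startswith("#"):
            continue
        # Assumed layout: offset <TAB> feature <TAB> context
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        counts[fields[1]] += 1
    return counts


# Hypothetical excerpt standing in for a real feature file.
sample = [
    "# hypothetical feature-file excerpt",
    "1024\talice@example.com\t...mail alice@example.com today...",
    "2048\tbob@example.com\t...cc bob@example.com...",
    "4096\talice@example.com\t...from alice@example.com...",
]

hist = feature_histogram(sample)
for feature, n in hist.most_common():
    print(f"n={n}\t{feature}")
```

Sorting by count puts the most frequent features first, which is the view a reviewer would usually start from.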

  5. Bulk Extractor @ Stanford • Identify PII in BD collections • Data classification mandate • Identify collection specific data for further analysis

  6. Bulk Extractor • BitCurator 1.8.16 running Bulk Extractor Viewer 1.6.0-dev • Performance / scanner improvements • BEViewer usability improvements

  7. BEViewer interface

  8. Image source: https://www.krogen.co/alice-in-wonderland-paintings/alice-in-wonderland-paintings-top-25-best-alice-in-wonderland-artwork-ideas-on-pinterest-picture/

  9. Going Down the Rabbit hole... Source: https://image.shutterstock.com/z/stock-vector-alice-is-falling-down-into-the-rabbit-hole-170986505.jpg

  10. Overview • Define Stop List, Wordlist, Alert List, Find regex text • Define requirements to run: General Option, Find regex text • Sample run configuration • Sample run results review

  11. Definitions
      • Stop List (white list; saves time and processing): A stop list can simply be a list of words that the user wants bulk_extractor to ignore. Stop lists can also be used to remove features not relevant to a case. See Section 4.4, "Suppressing false positives," p. 24.
      • Wordlist (for password cracking or custom analysis): A list of all "words" extracted from the disk, useful for password cracking or to discover whether an author ever used a specific term (including in deleted or hidden files). The words this scanner can access depend on which other scanners are on; to include words in .zip files, for example, the "zip" scanner must be enabled. General option; disabled by default. See Section 5.4, p. 32.
      • Alert List (red list; context-sensitive term search): The alert list can contain words and/or feature filenames, and it alerts the user when a match is found. Feature-file alerts work much like context-sensitive stop lists: the alert fires on a specified feature only when it is found in the specified context. General option; disabled by default. See Section 4.5, p. 26.
      • Find Regex Text File (custom lexicon file; read vs. find occurrences of): The find scanner reads through the data for anything listed in the global find list. The find list is formatted as rows of regular expressions; any line beginning with # is treated as a comment. The search is case sensitive: terms match only in the exact case given. See Section 5.3, pp. 29-32.
      Source: Bulk Extractor Users Manual v1.4
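
The find-list format described above (rows of regular expressions, `#` for comments, case-sensitive matching) can be illustrated with a short sketch. It uses Python's `re` module as a stand-in, so the regex dialect may differ from the one bulk_extractor uses. The lexicon entries, the `load_find_list` helper, and the skipping of blank lines are assumptions made for illustration.

```python
import re


def load_find_list(lines):
    """Compile one case-sensitive regex per non-comment line."""
    patterns = []
    for raw in lines:
        line = raw.strip()
        # Lines beginning with '#' are comments; skipping blank lines
        # is an assumption of this sketch.
        if not line or line.startswith("#"):
            continue
        # No re.IGNORECASE flag, so matching stays case sensitive.
        patterns.append(re.compile(line))
    return patterns


# Hypothetical lexicon entries, not taken from the webinar's lexicon file.
find_list = [
    "# hypothetical lexicon entries",
    "tenure",
    "grant proposal",
    r"sabbatical\s+leave",
]

patterns = load_find_list(find_list)
text = "Her Tenure case cites a grant proposal and sabbatical  leave."
hits = [p.pattern for p in patterns if p.search(text)]
print(hits)
```

Here "tenure" does not match "Tenure" because of the case difference, which is one reason a lexicon can return fewer hits than expected.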

  12. Requirements to Run

  13. Sample Run: Configuration

  14. Sample Run: How it works
      • Program > Output (destination path): bulk_extractor -o /media/veracrypt1/NTFS_Pract_2017/Find_NTFS_Pract_2017
      • Option (use Find regex text file) > File path: -F /home/bcadmin/Desktop/Persona.faculty2.UCI.english.lex.txt
      • Source (image path): /media/veracrypt1/NTFS_Pract_2017/NTFS_Pract_2017.E01
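
The three parts of this slide combine into a single command line. The sketch below only assembles and prints that command using the `-o` and `-F` options and the paths shown on the slide; the `build_command` helper is a hypothetical name, and nothing is executed.

```python
import shlex


def build_command(output_dir, find_file, image_path):
    """Assemble the argument list: program, -o output, -F find list, image."""
    return ["bulk_extractor", "-o", output_dir, "-F", find_file, image_path]


cmd = build_command(
    "/media/veracrypt1/NTFS_Pract_2017/Find_NTFS_Pract_2017",
    "/home/bcadmin/Desktop/Persona.faculty2.UCI.english.lex.txt",
    "/media/veracrypt1/NTFS_Pract_2017/NTFS_Pract_2017.E01",
)

# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To actually launch the run, the same list could be passed to `subprocess.run(cmd, check=True)` on a machine where bulk_extractor is installed and the paths exist.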

  15. Sample Run: How it works • Find scanner: one term, one pass over the entire image; e.g., an 853-term lexicon means 853 passes over the image. Very inefficient. • Lightgrep scanner: the whole group of terms is searched for in the current buffer (processing segment), so there is one pass through the image, looking for all terms per "chunk." Higher efficiency. • Refer to the liblightgrep dev blog for details: http://strozfriedberg.github.io/liblightgrep/ • Refer to the Bulk Extractor summary by Garfinkel: http://downloads.digitalcorpora.org/downloads/bulk_extractor/2014-07-17_BE15.pdf • Several scanners may write to several different feature files.
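
The difference in pass counts described on this slide can be made concrete with a toy model. This is only an illustration of the access pattern, not how either scanner is implemented: it uses plain substring tests instead of regular expressions, ignores matches that span chunk boundaries, and the function names and sample data are hypothetical.

```python
def find_style(chunks, terms):
    """One full pass over every chunk for each term."""
    reads = 0
    hits = set()
    for term in terms:
        for chunk in chunks:
            reads += 1  # each chunk is read again for every term
            if term in chunk:
                hits.add(term)
    return hits, reads


def grouped_style(chunks, terms):
    """One pass over the chunks, checking all terms per chunk."""
    reads = 0
    hits = set()
    for chunk in chunks:
        reads += 1  # each chunk is read once
        hits.update(t for t in terms if t in chunk)
    return hits, reads


# Hypothetical stand-ins for image chunks and lexicon terms.
chunks = ["alpha beta", "gamma delta", "beta epsilon"]
terms = ["beta", "delta", "omega"]

per_term_hits, per_term_reads = find_style(chunks, terms)
grouped_hits, grouped_reads = grouped_style(chunks, terms)
print(per_term_reads, grouped_reads)
```

Both approaches find the same terms, but the per-term version reads the data once per term, so its cost grows with the size of the lexicon.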

  16. Note: bulk_extractor version • Scanners included: httplogs, lightgrep, msxml, sqlite • Scanners missing:

  17. Sample Run Start: Hardware Monitor • Keep an eye on your CPU... • Open Hardware Monitor: CPU temp rising, 53 °C; CPU load 53% • Host memory: 16 GB • Guest memory: 10 GB • Guest swap file size: 11 GB

  18. Sample Run: Finish • Approximately 12 min processing time • Rabbit hole curiosity #1: 272 MB processed, but the source image is 524 MB? • Scanners used: default scanners + faculty lexicon.

  19. Sample Run: Results • Alert feature file: only one term found out of 41. • Rabbit hole curiosity #2: Why?

  20. Sample Run: Results • Service term count: 3693 • Service term count: 3435

  21. References
      Bradley, J. R., & Garfinkel, S. (2015, March 23). Bulk Extractor Users Manual v. 1.4 [PDF]. Retrieved from http://downloads.digitalcorpora.org/downloads/bulk_extractor/BEUsersManual.pdf
      Bulk Extractor 1.6.0 release notes. Retrieved from https://github.com/simsong/bulk_extractor/blob/master/doc/announce/announce_1.6.0.md
      ePadd Lexicon (n.d.). Persona.faculty2.UCI.english.lex.txt. Retrieved from https://drive.google.com/open?id=0B89h5GBZe8FaMGptb1locDQzOUk
      ePadd Lexicon Library (n.d.). Retrieved from https://library.stanford.edu/projects/epadd/community/lexicon-working-group
      Stroz Friedberg (n.d.). Liblightgrep technical info [Blog]. Retrieved from http://strozfriedberg.github.io/liblightgrep/

  22. References
      Garfinkel, S. (n.d.). Bulk Extractor 1.5 Overview [PDF]. Retrieved from http://downloads.digitalcorpora.org/downloads/bulk_extractor/2014-07-17_BE15.pdf
      Linux LEO (n.d.). Sample Image [.E01], NTFS, 524 MB. Retrieved from http://linuxleo.com/Files/NTFS_Pract_2017_E01.tar.gz
      Stanford Risk Classifications. Retrieved from https://uit.stanford.edu/guide/riskclassifications
      Stevens, C., Malan, D., Garfinkel, S., Dubec, K. A., & Pham, C. (2006). Advanced forensic format: An open, extensible format for disk imaging. International Federation for Information Processing. Retrieved from https://dash.harvard.edu/handle/1/2829932

  23. Questions?
