
Information Management



Presentation Transcript


  1. Information Management: DIG 3563 – Lecture 17: File Structures and Cloud Computing. J. Michael Moshell, University of Central Florida. Original image* by Moshell et al. Imagery is from Wikimedia except where marked with *.

  2. File System Organization * Disks have sectors; each sector has an address (integer) * A file is a collection of sectors. They can be contiguous or fragmented. * To find the sectors comprising a file, we need a directory. * The directory system records which sectors belong to each file. * The Operating System has software to manage directories & files. recovermyfiles.com planetoftunes.com

  3. Formatting a Disk * Factory (low-level) format: - timing tracks, etc. ("marks in the parking lot") - usually not re-doable * Local reformatting: - check for read/write errors - mark good sectors and bad ones - create a list of available sectors - set up file structure: directory, boot sector (for bootable drives) stripespls.com

  4. File System Organization * A simple (conceptual) architecture: Directory: * at sectors 22010, 22021 we have records: So the file is a linked list (like a treasure hunt) through the disk's sectors. (Not all disks are organized this way.) recovermyfiles.com planetoftunes.com
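
The "treasure hunt" on this slide can be sketched in a few lines of Python. This is only a toy model of the conceptual architecture, not how any particular file system works: the sector addresses 22010 and 22021 come from the slide, while the third address, the file name, the payloads, and the names `disk`, `directory`, and `read_file` are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy "disk": sector address -> (payload, address of the next sector or None).
# 22010 and 22021 are from the slide; 22040 and all payloads are invented.
disk: Dict[int, Tuple[bytes, Optional[int]]] = {
    22010: (b"Penguins live ", 22021),
    22021: (b"in Antarctica", 22040),
    22040: (b".", None),
}

# Toy directory: file name -> address of the file's first sector.
directory: Dict[str, int] = {"penguins.txt": 22010}


def read_file(name: str) -> bytes:
    """Follow the chain of sector links and join the payloads in order."""
    chunks: List[bytes] = []
    visited = set()
    address: Optional[int] = directory[name]
    while address is not None:
        if address in visited:
            # A corrupted link could point back into the chain.
            raise ValueError(f"link loop at sector {address}")
        visited.add(address)
        payload, address = disk[address]
        chunks.append(payload)
    return b"".join(chunks)
```

Note that the directory only knows where the file starts; every later sector is reachable only through the link stored in the sector before it, which is why one damaged link can hide the rest of a file.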

  5. File System Errors * Disk drive hardware checks parity when reading sectors * If a parity error occurs, data may have been lost * Usually the drive just reports a failure to the OS and you're stuck. However, the actual disk drive hardware can probably still read the data; it just doesn't LIKE it. So, specialized software can sometimes get this "bad checksum" data and display it ... we discuss this shortly.
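
A minimal sketch of the parity idea, assuming a single even-parity bit per sector. Real drives use much stronger error-correcting codes; the function names and the `strict` flag here are hypothetical and only illustrate "report a failure" versus "hand over the data anyway".

```python
from typing import Tuple


def parity_bit(data: bytes) -> int:
    """Even parity: 1 if the data contains an odd number of 1 bits, else 0."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def write_sector(data: bytes) -> Tuple[bytes, int]:
    """Store the data together with its parity bit."""
    return data, parity_bit(data)


def read_sector(stored: Tuple[bytes, int], strict: bool = True) -> bytes:
    """Return the data; in strict mode refuse if the parity does not match."""
    data, stored_parity = stored
    if strict and parity_bit(data) != stored_parity:
        raise IOError("parity error: sector may be damaged")
    return data


good = write_sector(b"Adams")
# Simulate damage: flip the lowest bit of the first byte, keep the old parity.
damaged = (bytes([good[0][0] ^ 0x01]) + good[0][1:], good[1])
```

Calling `read_sector(damaged)` raises the error the OS would see, while `read_sector(damaged, strict=False)` still returns the (slightly wrong) bytes, which is the kind of override a recovery tool relies on.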

  6. File System Organization * Deleting a file: The OS keeps an available sector list of sectors that can be reused. To DELETE a file, the system just changes its first and last links. (Think of out-of-service boxcars). The data is not gone, it's just unlinked. It will be overwritten, when (and if) the OS needs more space. recovermyfiles.com tdc.ca
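
The "changes its first and last links" step can be sketched as follows, assuming the available sector list is itself a chain of sectors and the directory records each file's first and last sector. The class `ToyDisk`, its addresses, and its contents are all invented for illustration.

```python
from typing import Dict, Optional, Tuple


class ToyDisk:
    """Linked-sector disk with a chained available-sector list (illustrative only)."""

    def __init__(self) -> None:
        # sector address -> (payload, address of the next sector or None)
        self.sectors: Dict[int, Tuple[bytes, Optional[int]]] = {
            100: (b"secret ", 101),
            101: (b"plans", None),
            102: (b"", None),  # the only free sector at the start
        }
        # file name -> (first sector, last sector)
        self.directory: Dict[str, Tuple[int, int]] = {"plans.txt": (100, 101)}
        self.free_head: Optional[int] = 102

    def delete_file(self, name: str) -> None:
        """Splice the file's chain onto the front of the free chain.

        Only links change; no payload is erased.
        """
        first, last = self.directory.pop(name)
        payload, _ = self.sectors[last]
        self.sectors[last] = (payload, self.free_head)  # last link -> old free chain
        self.free_head = first                          # free chain starts at the file

    def allocate(self, payload: bytes) -> int:
        """Take the first free sector and overwrite whatever it held."""
        if self.free_head is None:
            raise MemoryError("no free sectors")
        address = self.free_head
        _, self.free_head = self.sectors[address]
        self.sectors[address] = (payload, None)
        return address
```

After `delete_file("plans.txt")` both payloads are still sitting in sectors 100 and 101; only a later `allocate` call overwrites them, one sector at a time.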

  7. Losing and Recovering Data: Now what if the directory or a sector gets screwed up? a) Software error: the pointer or link to a file gets erased, or b) hardware error: part of the directory or a sector gets corrupted. The data is still out there, but the OS can't find it. If you can directly READ THE SECTORS, you will find broken strands of spaghetti ... with clues in 'em. recovermyfiles.com restaurantwidow.com

  8. Recovering Data: What clues exist? * Links (obviously), if it's a linked system: try to reconstruct the files, or fragments of them. * Directory item numbers, if these exist: try to "work backwards" and reconstruct the directory. * The data itself (e.g. search for "Adams"): use syntactic knowledge to match up partial sentences in blocks. Which block might match that one? "nguins live in Antarc..." ".. and we re spect the opinions of..." "492.7 \t 333.9e14 ..."
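
Two of these clues lend themselves to a small sketch: searching every raw sector for a keyword, and using a word list to guess which block continues a sentence. This is a toy heuristic under stated assumptions, not a real recovery tool; it assumes the slide's fragment breaks after "re", and the function names, block addresses, and vocabulary are invented.

```python
from typing import Dict, List, Set


def find_keyword(sectors: Dict[int, bytes], keyword: bytes) -> List[int]:
    """Return the addresses of all sectors containing the keyword, ignoring the directory."""
    return sorted(addr for addr, data in sectors.items() if keyword in data)


def candidate_successors(tail_block: str, blocks: Dict[int, str],
                         vocabulary: Set[str]) -> List[int]:
    """Find blocks whose first word fragment completes the last fragment of tail_block.

    A block is a candidate if gluing the two fragments together yields a known word.
    """
    tail_words = tail_block.split()
    if not tail_words:
        return []
    tail = tail_words[-1].lower()
    matches: List[int] = []
    for address, text in sorted(blocks.items()):
        head_words = text.split()
        if head_words and (tail + head_words[0].lower()) in vocabulary:
            matches.append(address)
    return matches


blocks = {
    7: "spect the opinions of",
    9: "nguins live in Antarc",
    12: "492.7\t333.9e14",
}
vocabulary = {"respect", "penguins", "antarctica"}
candidates = candidate_successors(".. and we re", blocks, vocabulary)
```

Here only block 7 survives, because "re" + "spect" is a known word. A heuristic like this only narrows the search; a human still has to judge whether the joined text makes sense.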

  9. Recovering Data: If you have 'bad sectors' (i.e. bad checksums): read the data and override the parity error messages. Humans are normally required to look at the data and piece it back together. Success is not guaranteed. A full format can write 0 in all the sectors (a quick format only rebuilds the file-system structures). SOME claim they can recover what was there before the zeros were written (maybe NSA can?), but it is not a high-percentage bet.

  10. Forensics: Finding Hidden Stuff * Simplest cases: just "erased" your files? - Straightforward disk recovery may work. * The famous photocopier story: - Copiers have hard drives and remember what was copied. http://www.cbsnews.com/stories/2010/04/19/eveningnews/main6412439.shtml * RAM sticks (nonvolatile flash memory) are just like hard drives; "delete" does not empty them. (Nonvolatile RAM versus volatile RAM. Why isn't it ALL nonvolatile?) macforensiclabs.com

  11. Forensics: Finding Hidden Stuff * Virtual memory copies part of your RAM onto the computer's hard drive. * Those images may include print queues and other information that can be recovered. * Backup systems may not have been reformatted even if the main hard drive was reformatted. * Offsite backup probably was NOT reformatted; old sectors may have copies of data you wanted to make disappear. macforensiclabs.com

  12. File Structures: Summary * vocabulary terms throughout lecture * backup/archive/redundant storage * criteria for choice of offsite backup * understand and explain disk organization * understand how disk errors occur * analyze what data could be recovered from a particular accident * discuss forensic issues concerning disk data erasure and recovery motifake.com

  13. Cloud Computing and Digital Asset Management. First let's look at the Cloud: - Where did it come from? - What is it? - How can it help me? - What new skills will I need to use it? - What effect does Cloud have on DAM?

  14. As of the Year 2000 ... Most Internet Service Providers sold (... rented ...): * dedicated hosting: one website delivered by 1 computer (mystore.com) * shared virtual hosting: N websites each got 1/Nth of a computer (yourstore.com, histore.com, herstore.com)

  15. And a few giants (Yahoo, Google, Amazon) built giant 'ad hoc' systems with thousands of CPUs and petabytes of storage. phaseoneenterprises.com

  16. And a few giants (Yahoo, Google, Amazon) built giant 'ad hoc' systems with thousands of CPUs and petabytes of storage. Amazon noticed ... less than 10% of their capacity was being used most of the time. phaseoneenterprises.com

  17. ... and in 2006 launched Amazon Web Services. The 'utility model': power plants have capacity to meet AVERAGE demand and so can deliver UNLIMITED* power to some customers when needed. (*"Unlimited" as long as << total capacity) en.wikimedia.org

  18. The Shared Telescope Model: Astronomers worldwide now schedule time on big telescopes through the Internet and don't have to go to a cold mountaintop and stay up all night to capture imagery. as.utexas.edu

  19. The Shared Computing Model: NASA released NEBULA in 2008, to share research computers instead of building additional data centers. NEBULA is an open source cloud management system.

  20. ... resembles the old Mainframe Timeshare model: Before PCs, we programmed on punch-cards as.utexas.edu

  21. ... resembles the old Mainframe Timeshare model: Before PCs, we programmed on punch-cards and thought it was a great INNOVATION when time-sharing became possible. as.utexas.edu

  22. But with one fundamental difference: In 1965 this was SCARCE and we were NUMEROUS (relatively) (skilled specialists who wanted to use computers). as.utexas.edu redlinecs.com.au

  23. But with one fundamental difference: In 2012 this is ABUNDANT and we are EVERYONE. allthingsdistributed.com reuters.com

  24. ... relies on fast, reliable networks ... may reduce your company's IT costs: * software is expensive, so RENT it * hardware is expensive to update, so RENT it * buildings are expensive, so share them * land is expensive, so build in rural areas

  25. Key Cloud Concepts: 1. Agility through dynamic provisioning - Order up "supercomputer for an hour" 2. API Accessibility - Your program can specify the needed QOS* (*QOS: Quality of Service) - Maximum guaranteed latency (e.g. <1 ms) - Minimum guaranteed CPU (e.g. >1 petaflop)

  26. What's a flop? "Floating Point Operations" (like x=239.44*456.3733) per second. Math models (physics, stock market, statistics) may need teraflops (tera = million × million) or more. Prefixes: giga = 10^9, tera = 10^12, peta = 10^15, exa = 10^18.
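
To get a feel for these prefixes (giga = 10^9, tera = 10^12, peta = 10^15, exa = 10^18), here is a small worked calculation. The job size of 10^18 operations and the helper name `seconds_needed` are made up for illustration.

```python
# Metric prefixes used for flops (floating point operations per second).
PREFIXES = {
    "giga": 10**9,
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
}


def seconds_needed(operations: float, flops: float) -> float:
    """Time to run a job of `operations` floating point operations at `flops` per second."""
    return operations / flops


job = 10**18  # hypothetical model run: one exa-operation in total

on_gigaflop_pc = seconds_needed(job, PREFIXES["giga"])        # 10**9 seconds, about 32 years
on_petaflop_cluster = seconds_needed(job, PREFIXES["peta"])   # 1000 seconds, about 17 minutes
```

The same job that would occupy a 1-gigaflop machine for decades finishes in minutes on a rented petaflop system.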

  27. Key Cloud Concepts: 1. Agility through dynamic provisioning - Order up "supercomputer for an hour" 2. API Accessibility - Your program can specify the needed QOS* 3. Virtualization - You "THINK" you have your own machine - Protection models don't need to be reinvented http://www.vmware.com/virtualization/
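
"Your program can specify the needed QOS" might look roughly like the sketch below. This is not any real provider's API: the classes `Offering` and `QosRequest`, the function `cheapest_match`, and the catalog entries with their prices are all hypothetical, chosen only to show a program stating a maximum latency and a minimum compute rate and getting back a matching resource.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offering:
    """One rentable configuration in a hypothetical provider catalog."""
    name: str
    latency_ms: float       # guaranteed maximum latency
    flops: float            # guaranteed minimum compute rate
    price_per_hour: float


@dataclass(frozen=True)
class QosRequest:
    """What the calling program needs."""
    max_latency_ms: float
    min_flops: float


def cheapest_match(request: QosRequest, catalog: List[Offering]) -> Optional[Offering]:
    """Return the cheapest offering that meets both guarantees, or None if none does."""
    suitable = [
        o for o in catalog
        if o.latency_ms <= request.max_latency_ms and o.flops >= request.min_flops
    ]
    return min(suitable, key=lambda o: o.price_per_hour) if suitable else None


catalog = [
    Offering("small", latency_ms=5.0, flops=1e12, price_per_hour=0.10),
    Offering("big", latency_ms=0.5, flops=2e15, price_per_hour=40.0),
    Offering("bigger", latency_ms=0.2, flops=5e15, price_per_hour=90.0),
]

# "Supercomputer for an hour": at most 1 ms latency, at least 1 petaflop.
choice = cheapest_match(QosRequest(max_latency_ms=1.0, min_flops=1e15), catalog)
```

Because of virtualization, the program never needs to know which physical machines back the chosen offering.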

  28. One Key Cloud Concern: SECURITY. (I know this guy) http://www.acsac.org/2012/workshops/ccw/ One solution (for larger firms): Build your own Cloud. http://www.enterprisenetworkingplanet.com/ebooks/50950510/95900/4190310/

  29. Quickly, web-hosts realized that they could virtualize their service bigbird.com cookie.com elmo.com kermit.com piggie.com

  30. Software as a Service (SaaS) The 800 pound anthropoid: Salesforce.com http://www.salesforce.com sales cloud (CRM systems) force.com – build your own pin.primate.wisc.edu

  31. Digital Asset Management in the Cloud pin.primate.wisc.edu 1. Simple: Dropbox 2. Specialized for software: Github 3. Rich metadata -> DAM (e. g. AlienBrain) Media Valet - http://www.mediavalet.co/home.aspx Widen Fordela

  32. Digital Asset Management in the Cloud pin.primate.wisc.edu 1. Simple: Dropbox 2. Specialized for software: Github 3. Rich metadata -> DAM (e. g. AlienBrain) Media Valet - http://www.mediavalet.co/home.aspx "CMIS compliant?"

  33. Content Management Interoperability Services (CMIS) http://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS is an open standard that defines how DAM systems can manage metadata ("generic properties") for files and folders. Adobe, HP, IBM, Microsoft, Oracle + + +

  34. Digital Asset Management in the Cloud pin.primate.wisc.edu 1. Simple: Dropbox 2. Specialized for software: Github 3. Rich metadata -> DAM (e. g. AlienBrain) Media Valet Widen - http://www.widen.com/ Fordela

  35. Digital Asset Management in the Cloud pin.primate.wisc.edu 1. Simple: Dropbox 2. Specialized for software: Github 3. Rich metadata -> DAM (e. g. AlienBrain) Media Valet Widen Fordela http://www.fordela.com/ - VIDEO focus (started by LucasArts veterans)

  36. Choosing a DAM System pin.primate.wisc.edu Here's a logically organized Buyer's Guide http://www.datamation.com/storage/digital-asset-management-buying-guide-1.html

  37. Choosing a DAM System pin.primate.wisc.edu Here's a logically organized Buyer's Guide http://www.datamation.com/storage/digital-asset-management-buying-guide-1.html End of lecture ... End of lectureS. When we return ... Project Show-and-tell!
