Agenda




  1. CS C446 Data Storage Technologies & Networks
     Agenda
     • Course Motivation
     • Storage Requirements (Models)
     • Large Data Cases
     • Data Explosion
     • Data Characteristics & Storage Characteristics

  2. Course Focus – Storage Requirements
     • From a (logical) computing perspective:
       • Transitory data
         • To be stored for the period of computation.
       • Persistent, Isolated data
         • To be stored across computations but useful only by (or through) a single computer
       • Persistent, Shared data
         • To be stored across computations and used by (or through) multiple computers
       • Persistent, Exportable data
         • To be stored beyond computations and possibly used by external (non-computing) systems
     Is sharing of transitory data meaningful?
     Sundar B.

  3. Storage Requirements [2]
     • Persistent Isolated Data
       • Data is accessible to (or accessible through) a single computer and persistent across computations
       • (Input/Output on the) Storage is controlled by the computer
       • Types of (logical) data accesses:
         • Large streams – either text or binary (e.g. program code, multimedia)
         • Transactional units – records
       • Applications need not be aware of physical details of storage
         • Operating System provides a logical layer – File System
         • Special purpose logical layers are possible – Database System

  4. Network Attached Storage vs. Storage Area Networks – Storage Requirements [3]
     • Persistent Shared Data
       • Data is accessible to (or accessible through) multiple computers and persistent across computations
       • Storage is shared by multiple computers, i.e. available on a network
         • Question: Is the network the same as the “network of computers”?
         • Question: Is the network “transparent” to the computers?

  5. Storage Requirements [4]
     • Network Attached Storage
       • Shared storage is on the same network as computers
       • Computers are aware of the network and the fact that storage is attached over the network
       • Translation: Data are accessed as files or logical units
     • Storage Area Network
       • Shared storage is on a different network from the computer network, but these networks are connected.
       • Computers are not aware of this (storage) network
       • Translation: Data are accessed raw (as from direct storage)
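The two "Translation" bullets above can be illustrated with a rough sketch. It assumes that file-level access means asking for data by name and raw access means asking for a numbered fixed-size block. The names `read_file_level`, `read_block_level`, `make_fake_device` and the 512-byte block size are hypothetical, and an ordinary temporary file stands in for a storage device; no real NAS or SAN protocol is involved.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size, chosen only for this sketch


def read_file_level(path):
    """NAS-style access: the client names a file and gets its contents."""
    with open(path, "rb") as f:
        return f.read()


def read_block_level(device_path, block_number, block_size=BLOCK_SIZE):
    """SAN-style access: the client asks for a numbered raw block."""
    with open(device_path, "rb") as dev:
        dev.seek(block_number * block_size)  # blocks are addressed by offset
        return dev.read(block_size)


def make_fake_device(num_blocks=4):
    """Create a temp file that plays the role of a small raw device.

    Block i is filled with the byte value i so blocks are easy to tell apart.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_blocks):
            f.write(bytes([i]) * BLOCK_SIZE)
    return path


if __name__ == "__main__":
    device = make_fake_device()
    try:
        print(len(read_file_level(device)), "bytes read by name")
        print(read_block_level(device, 2)[:4], "starts block 2")
    finally:
        os.remove(device)
```

In the file-level case the storage side decides where the bytes live; in the block-level case the computer (typically its own file system) keeps track of which blocks hold which data.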

  6. Large Data – Cases
     • Case 1: Genome Database
       • http://csis/faculty/sundarb/courses/dstn/lectures/lec2-cases/genome.txt
     • Case 2: Mass General – X-Rays
       • http://csis/faculty/sundarb/courses/dstn/lectures/lec2-cases/hospital.txt
     • Case 3: Google’s replica of the web
       • http://csis/faculty/sundarb/courses/dstn/lectures/lec2-cases/google.txt

  7. Data Explosion
     • Consider Case 2 (Mass General):
       • Mass General is moving to 3D images
         • (in technicolor?)
       • Increasing Resolution
     • Consider Case 3 (Google):
       • Number of websites/pages is ever increasing
       • Google Library (Books)
       • Google Earth (Geographic/Cartographic Images)
     • More generally,
       • Businesses collecting more information 24x365
       • New (automated) technologies for data collection (e.g. RFID)

  8. Data Explosion [2]
     • Examples of data collection*:
       • Birth certificates by hospitals in the U.S.
         • 1983: 280 bytes
         • 1996: 1864 bytes
       • Grocery store purchase entry
         • 1983: 32 bytes
         • 1996: 1272 bytes
     • *Reference:
       • L. Sweeney, Information Explosion. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, L. Zayatz, P. Doyle, J. Theeuwes and J. Lane (eds), Urban Institute, Washington, DC, 2001.
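The record sizes above can be turned into growth figures with a short calculation. This is a sketch only: the helper `growth` is a hypothetical name, and the annual rate assumes smooth compound growth between 1983 and 1996, which the slide does not claim.

```python
def growth(start_bytes, end_bytes, years):
    """Return (overall growth factor, implied compound annual growth rate)."""
    factor = end_bytes / start_bytes
    annual_rate = factor ** (1 / years) - 1  # assumes steady compounding
    return factor, annual_rate


YEARS = 1996 - 1983  # 13 years between the two measurements

# Record sizes in bytes, taken from the slide.
RECORD_SIZES = {
    "birth certificate": (280, 1864),
    "grocery purchase entry": (32, 1272),
}

if __name__ == "__main__":
    for name, (start, end) in RECORD_SIZES.items():
        factor, rate = growth(start, end, YEARS)
        print(f"{name}: x{factor:.2f} overall, about {rate:.1%} per year")
```

Under that assumption, the birth certificate record grew by a factor of roughly 6.7 and the grocery purchase entry by a factor of 39.75 over the 13 years.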

  9. Data Explosion [3]
     • Data quality (resolution)
       • Refer to example data collections
     • Data Availability and Access
       • Copies/Replicas for simultaneous access or low latency access
         • All major websites are “Akamaized”
       • Copies/Replicas for fault tolerance / disaster recovery
         • Companies were up and running within a few hours after WTC collapse
     • Regulatory Compliance
       • HIPAA (U.S. Govt.) requires 7 years of data to be stored by hospitals
       • Sarbanes-Oxley (U.S. Govt.) requires documentation on corporate governance
         • i.e. all corporate decisions / deliberations must be recorded

  10. Data Explosion [4]
     • High volume requirements drive market
       • Low cost storage
     • Leads to increased access to storage
       • Personal / Organizational storage is more affordable
         • Personal Multimedia content (mp3 songs, digicam/mobile pictures/videos) is increasing
         • All Course contents online and growing; every technology has its website (dhcp.com, snia.org, …)
       • Mass storage services are feasible
         • Gmail gives 2+x GB, x is monotonically increasing over time
         • Already > 1.2 billion email users (not all in Gmail but …) and growing
         • Blog sites, Photosharing sites, {naukri, monster}.coms, Pornographic sites, …
     • Nothing is ever deleted from websites – local or global!
       • Data is the new entropy!!!

  11. Data & Storage Characteristics
     • Data
       • May be transactional or stream data
       • But 80% of data is “semi-structured” or “unstructured”:
         • X-Ray image does not have any structure
         • A website (in HTML) is semi-structured
       • Is business critical
     • Storage
       • Must be highly available
         • With redundancy/replication and across non-local networks
       • Must provide high data rates
       • Must support both streaming and transactional access!
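The last bullet contrasts two access patterns, which can be sketched as follows. The assumption here is that streaming access means reading data sequentially in chunks and transactional access means reading or updating one record at a known position. The record layout (`RECORD`: a 4-byte id plus a 16-byte name) and the function names are hypothetical, and an in-memory buffer stands in for real storage.

```python
import io
import struct

# Hypothetical fixed-size record: little-endian 4-byte id + 16-byte name.
RECORD = struct.Struct("<I16s")


def stream_chunks(f, chunk_size=4096):
    """Streaming access: yield the data front to back in chunks."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        yield chunk


def write_record(f, index, rec_id, name):
    """Transactional access: overwrite exactly one record in place."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(rec_id, name.encode()))  # name is null-padded to 16 bytes


def read_record(f, index):
    """Transactional access: fetch exactly one record by position."""
    f.seek(index * RECORD.size)
    rec_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\x00").decode()


if __name__ == "__main__":
    store = io.BytesIO()
    for i in range(3):
        write_record(store, i, i, f"rec{i}")
    print(read_record(store, 1))          # random access to one record
    store.seek(0)
    print(sum(len(c) for c in stream_chunks(store, 8)), "bytes streamed")
```

A storage system serving both patterns has to handle long sequential reads and small random reads and writes against the same data, which is why the slide lists them as a joint requirement.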
