
Trends and Directions of Mass Storage in the Scientific Computing Arena, CAS 2001





Trends and Directions of Mass Storage in the Scientific Computing Arena

CAS 2001

Gene Harano

National Center for Atmospheric Research


Vision

  • How do we accomplish that vision?

    • Handling large datasets – Analysis and Visualization

    • Shared File Systems and Cache Pools

    • Middleware and layering

    • Management tools

    • Emerging Technologies

    • (To name a few)


Large Datasets

  • The NCAR MSS was originally a tape-based archive.

  • The average NCAR MSS file size is 35 MB (11 million files). It is small because of historical restrictions (single-volume datasets, model history files) and because a large share of files (25%) are under 1 MB (user backups).

  • Single terabyte-sized files are common for visualization and analysis

    • Currently these large files are sliced up prior to landing in the archive.

    • Access is generally sequential, with some random access.
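
The slicing step mentioned above could look roughly like the following. This is a minimal sketch only: the slides do not say how the NCAR MSS slices files, so the fixed slice size, the `.partNNNNN` naming, and the function names `slice_file` and `join_slices` are all assumptions made for illustration.

```python
from pathlib import Path


def slice_file(src, out_dir, slice_bytes):
    """Split src into consecutive slices of at most slice_bytes each.

    Returns the slice paths in order. Each slice is read into memory in
    one piece, which is fine for a sketch; a real tool would copy each
    slice through a smaller buffer.
    """
    if slice_bytes <= 0:
        raise ValueError("slice_bytes must be positive")
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    slices = []
    with src.open("rb") as f:
        index = 0
        while True:
            chunk = f.read(slice_bytes)
            if not chunk:
                break
            # Hypothetical naming scheme: original name plus a slice index.
            part = out_dir / f"{src.name}.part{index:05d}"
            part.write_bytes(chunk)
            slices.append(part)
            index += 1
    return slices


def join_slices(slices, dest):
    """Reassemble slices, in the given order, into dest."""
    with Path(dest).open("wb") as out:
        for part in slices:
            out.write(Path(part).read_bytes())
```

Because each slice is a contiguous byte range, sequential readers can stream the slices in order, while random access has to map an offset to a slice index first.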


Large Datasets

  • Are tape-based archives obsolete?

    • No, but there is a need to reevaluate the entire storage structure at NCAR.

      • Cache pools

      • Data warehouses, data sub-setting

    • The NCAR MSS is being treated as a shared file system rather than an archive.


Shared File System

  • Heterogeneous

  • High-Performance

  • High-Capacity

  • Doesn’t yet exist.

[Diagram labels: Shared Data; Web/GRID/servers; Programmatic; Command Line]

Cache Pools

  • External to the archive

    • Minimize archive activity

    • Temporary data stays out of the archive

    • Customized for a smaller set of associated data

  • Internal to the archive

    • Minimize tape activity

    • Improve response time

    • Federate and distribute

    • Repackage small files for tape storage under system control
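
Repackaging small files for tape could be planned along these lines. This is a hedged sketch, not the MSS implementation: the greedy first-come grouping, the size thresholds, and the name `pack_small_files` are assumptions for illustration, and the function only plans bundles from (name, size) pairs without touching any data.

```python
def pack_small_files(files, small_limit, bundle_target):
    """Group small files into bundles of roughly bundle_target bytes.

    files: iterable of (name, size_bytes) in arrival order.
    Files of small_limit bytes or more are left alone and returned
    separately. Returns (bundles, large), where bundles is a list of
    lists of file names.
    """
    bundles, large = [], []
    current, current_size = [], 0
    for name, size in files:
        if size >= small_limit:
            large.append(name)
            continue
        # Close the current bundle if this file would overflow it.
        if current and current_size + size > bundle_target:
            bundles.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bundles.append(current)
    return bundles, large
```

Writing one bundle instead of many tiny files reduces the number of tape objects, at the cost of reading a whole bundle back to recover a single member.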


Terascale Modeling & Analysis

[Diagram labels: Advanced Research Computing System (IBM SP); MSS Proxy; Data analysis; GPFS; Shared File System]


Terascale Analysis & Visualization

[Diagram labels: Vislab; MSS Proxy; Data analysis; Storage Area Network; Shared File System]


Data Provisioning & Access

[Diagram labels: MSS Proxy; Data Processor; Unidata, DODS; DSS server; CDP/ESG; Storage Area Network; Shared File System]


Internal Cache Pools

  • NCAR MSS event log modeling (April 2000 – April 2001) – looking at tape activity

  • 20 TB cache pool – can be federated and distributed

    • 30 day average cache residency

    • 70% reduction in tape read-backs

    • Greatly enhanced response time

    • Reduce the amount of tape resources or redefine their use.
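
The kind of event-log modeling described above can be illustrated by replaying read requests against a simulated disk cache and counting how many still go to tape. This is a minimal sketch under stated assumptions: the slides do not give the log format or the eviction policy, so the (file_id, size) event tuples, the LRU policy, and the name `replay_reads` are hypothetical, and the sketch does not reproduce the 70% figure.

```python
from collections import OrderedDict


def replay_reads(events, cache_bytes):
    """Replay read requests against an LRU cache of cache_bytes.

    events: iterable of (file_id, size_bytes) in time order.
    Returns (tape_reads, cache_hits).
    """
    cache = OrderedDict()  # file_id -> size, least recently used first
    used = 0
    tape_reads = 0
    cache_hits = 0
    for file_id, size in events:
        if file_id in cache:
            cache.move_to_end(file_id)  # mark as most recently used
            cache_hits += 1
            continue
        tape_reads += 1  # miss: the file has to come back from tape
        if size > cache_bytes:
            continue  # too large to ever fit in the cache
        while used + size > cache_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[file_id] = size
        used += size
    return tape_reads, cache_hits
```

Running the same log with `cache_bytes=0` gives the no-cache baseline, so the reduction in tape read-backs for a given pool size is one minus the ratio of the two `tape_reads` counts.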


Middleware and Layering

Role of an archive

  • An archive performs two basic functions

    • Reliably storing data

    • Returning data on demand

  • Data analysis, data mining, data assimilation, distributed data servers, etc. are functions of middleware that sits on top of an archive; they should be implemented independently of the underlying archive.
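
One way to picture this layering is an archive interface with only the two basic functions, and middleware written purely against that interface. The sketch below is illustrative only: `Archive`, `InMemoryArchive`, and `subset` are hypothetical names, not part of the NCAR MSS.

```python
from abc import ABC, abstractmethod


class Archive(ABC):
    """The archive layer: store data reliably, return it on demand."""

    @abstractmethod
    def store(self, name, data):
        """Store data (bytes) under name."""

    @abstractmethod
    def retrieve(self, name):
        """Return the bytes stored under name."""


class InMemoryArchive(Archive):
    """Stand-in archive for the sketch; a real one would sit on disk and tape."""

    def __init__(self):
        self._files = {}

    def store(self, name, data):
        self._files[name] = bytes(data)

    def retrieve(self, name):
        return self._files[name]


def subset(archive, name, start, length):
    """Middleware example: data subsetting built only on retrieve()."""
    return archive.retrieve(name)[start:start + length]
```

Because `subset` depends only on the `Archive` interface, the underlying archive can be swapped for another implementation without changing the middleware.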


Middleware and Layering

  • Separate archive functionality from

    • Visualization

    • Data servers

    • Data warehousing, data mining, data subsetting

    • Web and Grid access

    • Etc.

  • Maximally enables the use of COTS

  • Allows (transparent) replacement of components as needed

  • Fill the gaps with custom software


Future Data Services

[Diagram labels: NCAR MSS Archive; WEB; Visualization; Data Analysis/Mining/Assimilation; Digital Libraries, Data Servers; Data Cataloging/Searching; Data Storage (shown twice); File Services; Cache Pools]


Management Tools

  • There is a need for better user and system management tools as MSS capacity scales.

  • How does a single user manage 1 million files?

  • How does an MSS administrator dynamically tune a system, predict workloads, and find and correct bottlenecks?


Management Tools

NCAR MSS tools

  • Defining new roles

    • Single ordinary user

    • MSS superuser

    • As users come and go, there is a need for:

      • Project superuser (new)

      • Division data administrator (new)

  • Web-based metadata user tools

    • List, search, catalog holdings – metadata mining

    • Remove unwanted files


Management Tools

NCAR MSS tools

  • From the system perspective – utilize data warehousing and data mining techniques

    • System modeling using event logs.

      • Capacity planning

      • Identify bottlenecks

    • Operational monitoring

      • Track errors, identify trends (media problems)

      • Intrusion detection

      • Dynamic system tuning
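
As a small illustration of mining event logs for media problems, errors can be counted per tape volume and the repeat offenders flagged. This is a sketch under assumptions: the (volume, status) record shape, the "error" status string, and the name `suspect_volumes` are hypothetical, since the slides do not describe the MSS log format.

```python
from collections import Counter


def suspect_volumes(events, threshold):
    """Return the volumes with at least `threshold` logged errors, sorted.

    events: iterable of (volume_id, status) records, where status is
    "ok" or "error" (a hypothetical log format).
    """
    errors = Counter(vol for vol, status in events if status == "error")
    return sorted(vol for vol, count in errors.items() if count >= threshold)
```

The same per-key counting pattern extends to drives or time windows when looking for bottlenecks or trends.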


Emerging Technologies

  • Data Path

  • Tape

  • Holographic Storage

  • Probe-Based MEMS

  • High-Density Rosetta (analog)


Data Path

  • HIPPI in use today in the NCAR archive

  • Fibre Channel will replace our HIPPI in the near term

    • FC SAN for RAID Cache Pools

    • FC SAN for Tape sharing

  • Others

    • iSCSI

    • FC over IP

    • Infiniband


Tape

[Chart: native cartridge capacity (GB) for linear and helical tape formats, up to 2001, with roadmap points beyond. Formats shown: 3480/90, 3490 E, 9490 EE, 3570, 3570C, 3590, 3590E, DLT-4000, DLT-7000, SDLT, Ultrium, Accelis, 9840, 9840B, 9940, AIT, AIT-2, Mammoth, Mammoth 2, DTF, SD-3. Roadmap annotations: 200GB 1Q02; 2H02; 500GB 2003; Opt 2003 1 TB; 1 TB,60MB,2004.]


Tape

  • To be competitive with magnetic disk, magnetic tape capacity must grow 10x every 5 years.

  • Achieved by a combination of increased areal density and longer (and possibly wider) tape.

    (from a storage vendor)
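
To put the 10x-per-5-years figure in yearly terms: it corresponds to a compound annual growth factor of 10^(1/5), roughly 1.58 (about 58% per year), or a doubling about every 1.5 years. A short sketch of that arithmetic follows; the function names are illustrative.

```python
import math


def annual_growth_factor(total_factor, years):
    """Compound yearly factor that yields total_factor over `years` years."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor, years):
    """Years needed to double at the same compound rate."""
    return years * math.log(2) / math.log(total_factor)


if __name__ == "__main__":
    print(round(annual_growth_factor(10, 5), 3))
    print(round(doubling_time_years(10, 5), 3))
```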


Tape

  • RAIT (Redundant Array of Independent Tapes)

    • Increased Performance

    • Higher Reliability with the use of parity

    • Higher single “volume” Capacity

    • Large datasets on a single “volume”

  • RAIL (Redundant Array of Independent Libraries)

    • Greater total system capacity

    • Improved response time

  • These are resource-intensive solutions that require dedicated libraries and drives
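
The parity idea behind RAIT can be sketched with single XOR parity, as in parity-based RAID. The slides only say "use of parity", so the single-parity scheme and the names `xor_blocks`, `make_parity`, and `rebuild_missing` are assumptions for illustration.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_parity(data_blocks):
    """Parity block written to the extra tape: XOR of one block per data tape."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

With N data tapes and one parity tape, any single lost or unreadable tape in the set can be rebuilt, which is where the reliability gain comes from; the cost is the extra drive and cartridge per stripe.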


Holographic

  • Large capacity – 10 GB in a single cubic centimeter (compared with 10 Gbit/in² for magnetic disk)

  • High-speed – 2 Gigabits/sec

  • Low power

  • Billions of write cycles


Probe-Based MEMS

  • MEMS – Micro-Electro-Mechanical Systems

  • Probe-based storage arrays

    • Dense

    • Highly parallel to achieve high bandwidth

    • Rectilinear 2D positioning

    • Commercial devices in the next several years


HD Rosetta

  • Product marketed by Norsam Technologies

  • Developed at Los Alamos National Lab

  • Analog

    • Lifetime of thousands of years

    • Can be read back with only a microscope

    • Stores text and images

