FreeLoader : Lightweight Data Management for Scientific Visualization

Presentation Transcript


  1. FreeLoader: Lightweight Data Management for Scientific Visualization Vincent Freeh (1), Xiaosong Ma (1,2), Nandan Tammineedi (1), Jonathan Strickland (1), Sudharshan Vazhkudai (2) 1. North Carolina State University 2. Oak Ridge National Laboratory September, 2004

  2. Roadmap • Motivation • FreeLoader architecture • Initial design and optimization • Preliminary results • In-progress and future work

  3. Motivation: Data Avalanche • More data to process • Science, industry, government • Example: scientific data • Better observational instruments • Better experimental instruments • More simulation power [Images: P&E gene sequencer, from http://www.genome.uci.edu/; space telescope, picture courtesy Jim Gray, SLAC Data Management Workshop]

  4. Motivation: Need for Remote Data • Data acquisition, reduction, analysis, visualization, storage [Diagram: a data acquisition system sends raw data over a high-speed network to supercomputers and remote storage; local users and remote users (with their own computing and storage) access the data and its metadata]

  5. Motivation: Remote Data Sources • Supercomputing centers • Shared file systems • Archiving systems • Data centers • Internet • World Wide Telescope / Virtual Observatory • NCBI bio databases • Tools used for access • FTP, GridFTP • Grid file systems • Customized data migration programs • Web browser

  6. Motivation: Insufficient Local Storage • End user consumes data locally • Convenience and control • Better CPU/memory configurations • Problem 1: needs local space to hold data • Problem 2: getting data from remote sources is slow • Dataset characteristics • Write-once, read-many (or a few) • Raw data often discarded • Shared interest in the same data among groups • Primary copy archived somewhere

  7. Condor for Storage? • Harnessing storage resources of individual workstations ~ Harnessing idle CPU cycles

  8. Why would it work, and work well? • Average workstations have more and more GBs • And half of the space is idle! • Even a modest contribution (contribution << available space) can amass a staggering aggregate storage capacity! • Increasing numbers of workstations are online most of the time [desktop-grid research] • Access locality, aggregate I/O and network bandwidth, data sharing

  9. Use Cases • FreeLoader storage cloud as a: • Cache • Local, client-side scratch • Intermediate hop • Grid replica • RAS for Terascale Supercomputers

  10. Related Work and Design Issues • Related Work: • Network/Distributed File Systems (NFS, LOCUS) • Parallel File Systems (PVFS, XFS) • Serverless File Systems (FARSITE, xFS, GFS) • Peer-to-Peer Storage (OceanStore, PAST, CFS) • Grid Storage Services (LegionFS, SRB, IBP, SRM, GASS) • Design Issues & Assumptions: • Scalability: O(100) or O(1000) • Commodity Components • User Autonomy • Security and trust • Heterogeneity • Large, “write once read many” datasets • Transparent • Naming • Grid Aware

  11. Intended Role of FreeLoader • What the scavenged storage “is not”: • Not a replacement to high-end storage • Not a file system • Not intended for integrating resources at wide-area scale • What it “is”: • Low-cost, best-effort alternative to scientific data sources • Intended to facilitate • transient access to large, read-only datasets • data sharing within administrative domain • To be used in conjunction with higher-end storage systems

  12. FreeLoader Architecture [Diagram: grid data access tools sit on top of a Management Layer (data placement, replication, grid awareness, metadata management), which in turn sits above a Storage Layer of benefactor pools (Pool A … Pool n, Pool m) providing morsel access, data integrity, and non-invasiveness; pools register with the management layer]
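The registration links in the diagram suggest soft-state bookkeeping: benefactors periodically re-announce themselves so that stale entries simply time out at the manager. Below is a minimal sketch of such a heartbeat sender in Python; the message fields, the UDP endpoint, and the 30-second interval are illustrative assumptions, not details taken from the slides.

```python
import json
import socket
import time

MANAGER_ADDR = ("manager.example.org", 9000)   # hypothetical manager endpoint
HEARTBEAT_INTERVAL = 30                        # seconds; assumed, not from the slides

def send_registrations(benefactor_id: str, contributed_bytes: int, free_bytes: int):
    """Periodically re-register with the manager over UDP (soft state)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        msg = {
            "type": "register",
            "benefactor": benefactor_id,
            "contributed": contributed_bytes,
            "free": free_bytes,
        }
        sock.sendto(json.dumps(msg).encode(), MANAGER_ADDR)
        time.sleep(HEARTBEAT_INTERVAL)
```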

  13. Storage Layer [Diagram: morsels 1–3 of dataset 1 and morsels 1a–4a of dataset n striped and replicated across several benefactors] • Benefactors: • Morsels as a unit of contribution • Basic morsel operations [new(), free(), get(), put()…] • Space Reclaim: • User withdrawal / space shrinkage • Data Integrity through checksums • Performance history • Pools: • Benefactor registrations (soft state) • Dataset distributions • Metadata • Selection heuristics
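To make the morsel abstraction concrete, here is a rough sketch of what a benefactor-side store exposing new(), free(), get(), put() might look like, assuming morsels are kept as files in the contributed directory and MD5 checksums are used for integrity; the class name, morsel size, and checksum choice are assumptions, not the actual FreeLoader implementation.

```python
import hashlib
import os
import uuid

MORSEL_SIZE = 1 << 20  # 1 MB; illustrative, the real morsel size may differ

class Benefactor:
    """Minimal sketch of a benefactor-side morsel store (assumed design)."""

    def __init__(self, contrib_dir: str):
        self.contrib_dir = contrib_dir
        os.makedirs(contrib_dir, exist_ok=True)

    def new(self) -> str:
        """Allocate an empty morsel and return its id."""
        morsel_id = uuid.uuid4().hex
        open(os.path.join(self.contrib_dir, morsel_id), "wb").close()
        return morsel_id

    def put(self, morsel_id: str, data: bytes) -> str:
        """Store morsel data and return its checksum for integrity checks."""
        assert len(data) <= MORSEL_SIZE
        with open(os.path.join(self.contrib_dir, morsel_id), "wb") as f:
            f.write(data)
        return hashlib.md5(data).hexdigest()

    def get(self, morsel_id: str) -> bytes:
        """Read a morsel back for a client request."""
        with open(os.path.join(self.contrib_dir, morsel_id), "rb") as f:
            return f.read()

    def free(self, morsel_id: str):
        """Reclaim space, e.g. when the owner withdraws or shrinks the contribution."""
        os.remove(os.path.join(self.contrib_dir, morsel_id))
```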

  14. Management Layer • Manager: • Pool registrations • Metadata: datasets-to-pools; pools-to-benefactors, etc. • Availability: • Redundant Array of Replicated Morsels • Minimum replication factor for morsels • Where to replicate? • Which morsel replica to choose? • Grid Awareness: • Information Providers • Space reservations • Transfer protocols • Transparent Access: • Namespace
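One way the manager could answer "where to replicate?" for a minimum replication factor is a simple load-balancing heuristic like the sketch below; the greedy least-loaded policy is purely illustrative and is not FreeLoader's actual selection heuristic.

```python
from collections import defaultdict

def place_replicas(morsels, benefactors, min_replicas=2):
    """Greedy placement sketch: each morsel replica goes to the least-loaded benefactors.

    morsels     -- list of morsel ids belonging to a dataset
    benefactors -- list of benefactor ids currently registered with the pool
    Returns a mapping morsel_id -> list of benefactor ids holding a replica.
    """
    load = defaultdict(int)            # morsels assigned per benefactor so far
    placement = {}
    for m in morsels:
        targets = sorted(benefactors, key=lambda b: load[b])[:min_replicas]
        for b in targets:
            load[b] += 1
        placement[m] = targets
    return placement
```

In practice the choice would also weigh the heterogeneity, performance history, and network connectivity mentioned on the surrounding slides.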

  15. Dataset Striping • Stripe datasets across benefactors • Morsel doubles as basic unit of striping • Multiple-fold benefits • Higher aggregate access bandwidth • Better resource usage • Lowering impact per benefactor • Tradeoff between access rates and availability • Need to consider • Heterogeneity, network connections • Working together with replication • Serving partial datasets
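As a concrete illustration of striping, the sketch below cuts a dataset into morsel-sized stripes and assigns them round-robin across benefactors; the morsel size and the round-robin policy are assumptions for illustration (a real distribution must also account for heterogeneity and replication, as noted above).

```python
def stripe_dataset(data: bytes, benefactors, morsel_size=1 << 20):
    """Split a dataset into morsels and assign them round-robin to benefactors.

    Returns a list of (stripe_index, benefactor_id, morsel_bytes) tuples, so a
    client can later fetch stripes from several benefactors in parallel for
    higher aggregate bandwidth.
    """
    layout = []
    for offset in range(0, len(data), morsel_size):
        index = offset // morsel_size
        chunk = data[offset:offset + morsel_size]
        target = benefactors[index % len(benefactors)]
        layout.append((index, target, chunk))
    return layout
```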

  16. Current Status • (A) services: • Dataset creation/deletion • Space reservation • (B) services: • Dataset retrieval • Hints • (C) services: • Registration • Benefactor alerts, warnings, alarms to manager • (D) services: • Dataset store • Morsel request [Diagram: an application calls the client's I/O interface (reserve(), cancel(), store(), retrieve(), delete(), open(), close(), read(), write()); the client reaches the manager over UDP (A) and the benefactors over UDP/TCP (B, D); benefactors register with the manager over UDP (C) and expose the morsel operations new(), free(), get(), put() on top of the local OS; simple data striping across benefactors]
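From an application's point of view, the (A)/(B)/(D) services above might be exercised roughly as follows; the freeloader_client module, the FreeLoader class, the manager address, and every call signature here are hypothetical stand-ins rather than the actual client API.

```python
# Hypothetical client-side usage; module name, manager address, and call
# signatures are illustrative only, not the real FreeLoader interface.
from freeloader_client import FreeLoader          # assumed client library

fl = FreeLoader(manager="manager.ornl.example:9000")

# (A) reserve space and create a dataset
fl.reserve("climate-run-42", size=8 * 2**30)      # 8 GB reservation
with open("raw.dat", "rb") as src:
    fl.store("climate-run-42", src)               # (D) morsels pushed to benefactors

# (B) later, retrieve the dataset (striped reads from several benefactors)
with open("copy.dat", "wb") as dst:
    fl.retrieve("climate-run-42", dst)
```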

  17. Preliminary Results: Experiment Setup • FreeLoader prototype running at ORNL • Client Box • AMD Athlon 700MHz • 400MB memory • Gig-E card • Linux 2.4.20-8 • Benefactors • Group of heterogeneous Linux workstations • Contributing 7GB-30GB each • 100Mb cards

  18. Sample Data Sources • Local GPFS • Attached to ORNL SPs • Accessed through GridFTP • 1MB TCP buffer, 4 parallel streams • Local HPSS • Accessed through HSI client, highly optimized • Hot: data in disk cache without tape unloading • Cold: data purged, retrieval done in large intervals • Remote NFS • At NCSU HPC center • Accessed through GridFTP • 1MB TCP buffer, 4 parallel streams

  19. FreeLoader Data Retrieval Performance [Chart: throughput (MB/s)]

  20. Impact Tests • How much discomfort does scavenging cause donors? • A set of tests at NCSU • Benefactor performing local tasks • Client retrieving datasets at a given rate

  21. CPU-intensive Task [Chart: time (s)]

  22. Network-intensive Task [Chart: normalized download time]

  23. Disk-intensive Task [Chart: throughput (MB/s)]

  24. Mixed Task: Linux Kernel Compilation [Chart: time (s)]

  25. In-progress and Future Work • In-progress • APIs for use as scratch space • Windows support • Future • Complete pool structure, registration • Intelligent data distribution, service profiling • Benefactor impact control, self-configuration • Naming and replication • Grid awareness • Potential extensions • Harnessing local storage at cluster nodes? • Complementing commercial storage servers?

  26. Further Information • http://www.csm.ornl.gov/~vazhkuda/Morsels/
