
Corollary


eddy


Presentation Transcript


  1. Corollary

  2. System Overview

  3. Second Key Idea: Specialization • Think GoogleFS

  4. Third idea: Enable cross-layer optimizations • Layered Architectures: High benefits, but … • TCP/IP • File System • Benefits, but… • … limits information flow across layers. API http://netsyslab.ece.ubc.ca

  5. Cross-Layer Optimizations • Examples • IP • Storage systems • … • Applications → Storage System • Performance • QoS requirements • Consistency requirements • Applications ← Storage System • Provide storage-level information to applications • Data Intensive Applications: Co-usage of files • Data Intensive Schedulers: Notification about data movements • What’s missing? A vehicle to pass information across layers

  6. Traditional Use of Custom Metadata • Application Layer: File Browser, input.dat tagged Author=Smith • POSIX API • File System Layer: Metadata Manager, File Organization Module, Basic File System • Storage System Layer

  7. Cross-Layer Communication • Application Layer: Replicate input.dat 3x (reply: OK.) • POSIX API • File System Layer: Metadata Manager, File Organization Module, Basic File System • Storage System Layer: input.dat moved from node1 to node3 • Application Layer: Schedule Task on node3
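Slides 6 and 7 describe custom metadata as the vehicle for passing information across layers. The sketch below is only an illustration of that idea, not the system's actual interface: `MetadataManager`, `on_tag`, `set_tag`, `get_tag`, and the tag names `replication` and `location` are all hypothetical. It models a top-down hint (the application asks for 3 replicas of input.dat) and a bottom-up notification (the storage layer reports that input.dat now lives on node3).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class MetadataManager:
    """Toy metadata manager (hypothetical): per-file tags plus listeners."""

    def __init__(self) -> None:
        self._tags: Dict[str, Dict[str, str]] = defaultdict(dict)
        self._listeners: Dict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def on_tag(self, key: str, handler: Callable[[str, str], None]) -> None:
        """Register a handler that runs whenever `key` is set on any file."""
        self._listeners[key].append(handler)

    def set_tag(self, path: str, key: str, value: str) -> None:
        """Store the tag, then notify every handler registered for `key`."""
        self._tags[path][key] = value
        for handler in self._listeners[key]:
            handler(path, value)

    def get_tag(self, path: str, key: str, default: Optional[str] = None) -> Optional[str]:
        """Read a tag without creating an entry for unknown files."""
        return self._tags.get(path, {}).get(key, default)


manager = MetadataManager()

# Storage layer side: react to replication hints set by applications.
replication_requests = []
manager.on_tag(
    "replication",
    lambda path, value: replication_requests.append((path, int(value))),
)

# Top-down: the application tags the file with a replication hint.
manager.set_tag("input.dat", "replication", "3")

# Bottom-up: the storage layer exposes the file's new location as a tag.
manager.set_tag("input.dat", "location", "node3")

# A scheduler reads the location tag to place the task near the data.
node = manager.get_tag("input.dat", "location")
print(f"schedule task on {node}")
```

The point of the design is that both directions reuse the same per-file tag channel, so no layer needs a new API to exchange hints.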

  8. Recap • Object-based storage • Enable specialization --> performance • Enable cross-layer optimization --> generality

  9. One intended use: A Workflow-Aware Storage System

  10. Workflow Example - ModFTDock • Protein docking application • Simulates the creation of a complex protein from two known proteins • Applications • Drug design • Protein interaction prediction

  11. Platform Example – Argonne BlueGene/P • 160K cores • 2.5K IO nodes • 3D Torus network: 2.5 GBps per node • Tree network: 850 MBps per 64 nodes • 10 Gb/s Switch Complex • 24 I/O servers • GPFS IO rate: 8 GBps = 51 KBps / core !! • The central storage is a potential bottleneck • Underused resources
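The per-core figure on this slide can be checked with a few lines of arithmetic. The sketch assumes binary prefixes (1 GB = 1024^3 bytes, 160K = 160 × 1024 cores), which is an assumption that happens to reproduce the slide's 51 KBps; with decimal prefixes the same division gives 50 KBps.

```python
GIB = 1024 ** 3  # bytes per GB, assuming binary prefixes
KIB = 1024       # bytes per KB, assuming binary prefixes

aggregate_bytes_per_s = 8 * GIB  # GPFS aggregate IO rate from the slide
cores = 160 * 1024               # "160K cores", read as 163,840

# Share of the central file system's bandwidth available to each core.
per_core_kib = aggregate_bytes_per_s / cores / KIB
print(f"{per_core_kib:.1f} KB/s per core")  # prints "51.2 KB/s per core"
```

Either way, each core gets only tens of kilobytes per second from the central storage, which is why the slide flags it as a potential bottleneck.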

  12. Background – ModFTDock in Argonne BG/P • 1.2 M Docking Tasks • Workflow Runtime Engine • App. tasks, each with local storage • File based communication • Large IO volume • Scale: 40960 compute nodes • IO rate: 8 GBps = 51 KBps / core • Backend file system (e.g., GPFS, NFS)

  13. Intermediate Storage Approach • Workflow Runtime Engine • App. tasks, each with local storage • Scale: 40960 compute nodes • POSIX API • Intermediate Storage • Stage In / Stage Out • Backend file system (e.g., GPFS, NFS)
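The stage-in / stage-out pattern on this slide can be sketched with two directories standing in for the backend file system and the intermediate storage. This is a minimal sketch under stated assumptions: the helper names (`stage_in`, `stage_out`, `run_task`), the file names, and the two toy tasks are hypothetical, and temporary directories replace the real storage systems.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def stage_in(backend: Path, intermediate: Path, names: Iterable[str]) -> None:
    """Copy workflow inputs from the backend into intermediate storage."""
    for name in names:
        shutil.copy(backend / name, intermediate / name)


def stage_out(intermediate: Path, backend: Path, names: Iterable[str]) -> None:
    """Copy only the final results back to the backend."""
    for name in names:
        shutil.copy(intermediate / name, backend / name)


def run_task(intermediate: Path, src: str, dst: str,
             transform: Callable[[str], str]) -> None:
    """A task reads and writes files in intermediate storage only."""
    data = (intermediate / src).read_text()
    (intermediate / dst).write_text(transform(data))


with tempfile.TemporaryDirectory() as b, tempfile.TemporaryDirectory() as i:
    backend, intermediate = Path(b), Path(i)
    (backend / "input.dat").write_text("protein")

    stage_in(backend, intermediate, ["input.dat"])
    # Two chained tasks communicate through a file the backend never sees.
    run_task(intermediate, "input.dat", "stage1.tmp", str.upper)
    run_task(intermediate, "stage1.tmp", "result.dat", lambda s: s + "!")
    stage_out(intermediate, backend, ["result.dat"])

    backend_files = sorted(p.name for p in backend.iterdir())
    result = (backend / "result.dat").read_text()

print(backend_files, result)
```

The intermediate file `stage1.tmp` never reaches the backend, which is the intended effect: file-based communication between tasks stays off the central storage.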

  14. Usage scenario II: • Support for deduplication
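The slide does not say how deduplication is supported, so the following only illustrates one common technique, fixed-size chunking with content hashes, and should not be read as the system's method. `DedupStore` and its methods are hypothetical names, and the 4-byte chunk size is chosen purely to keep the example small.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}     # digest -> chunk bytes
        self.files: Dict[str, List[str]] = {}  # file name -> list of digests

    def put(self, name: str, data: bytes) -> None:
        """Split data into fixed-size chunks and store each unique chunk once."""
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])


store = DedupStore(chunk_size=4)
store.put("v1", b"AAAABBBBCCCC")
store.put("v2", b"AAAABBBBDDDD")  # shares its first two chunks with v1

logical_chunks = sum(len(d) for d in store.files.values())
print(logical_chunks, len(store.chunks))  # 6 logical chunks, 4 stored
```

Two versions that differ in one chunk share the rest, which is the property the requirement on versioning and partially similar data relies on.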

  15. Stakeholders • The final clients • Financing agencies ($) • DoE • NSERC • Science teams • Development team • Graduate students (6+) • Undergraduate students, visitors (10+) • Me

  16. Stakeholders – and their goals • The final clients • Financing agencies ($) • DoE • NSERC • Science teams • Development team • Graduate students (6+) • Undergraduate students, visitors (10+) • Me

  17. Requirements • Easy to deploy • Easy to integrate with applications • Versatility and ability to configure • Efficiency / high-performance / scalability • Ability to support versioning and partially similar data • All have big architectural implications

  18. Early architectural decisions 1.) Object-based storage • system structure 2.) Network/protocol stack: uniform • Stateless to the degree possible

  19. Early architectural decisions 3.) FUSE-based implementation • Impact: structure, deployability 4.) Policy to manage tension between code maturity and need to experiment

  20. Mid-way architectural decisions 5.) GeneralIO hack 6.) Test-driven design - integrate 3-month projects

  21. Implicit architectural policies 7.) Personnel management: • prioritize ‘fun’ • Flat Team structure • Bottom-up decision making / prioritization: • ‘campaigns’ 8.) Align ‘values’

  22. Key architectural decisions 1.) Object-based storage 2.) Uniform protocol stack 3.) POSIX, FUSE-based implementation 4.) Policy to manage tension between code maturity and need to experiment 5.) GeneralIO hack 6.) Test-driven design 7.) Personnel management: prioritize ‘fun’ 8.) Align values
