
Towards Global Scale Cloud Computing: Using Sector and Sphere on the Open Cloud Testbed


Presentation Transcript


  1. NCDM SC08 Bandwidth Challenge. Towards Global Scale Cloud Computing: Using Sector and Sphere on the Open Cloud Testbed. Robert Grossman (1,2), Yunhong Gu (1), Michal Sabala (1), David Hanley (1), Shirley Connelly (1), David Turkington (1), and Jonathan Seidman (2). (1) National Center for Data Mining, Univ. of Illinois at Chicago; (2) Open Data Group

  2. Overview • Cloud computing software Sector/Sphere • Open source, developed by NCDM • Open Cloud Testbed • Cloud computing testbed supported by the Open Cloud Consortium • BWC Applications and Performance Evaluation • TeraSort: sorting a 1TB distributed dataset • CreditStone: computing the fraud rate from 3TB of credit card transaction records

  3. Cloud Computing & Sector/Sphere • Hardware • Inexpensive computer clusters (which may span multiple data centers) • High speed wide area networks • Software • Storage cloud: distributed storage for large datasets • Compute cloud: simplified distributed data processing • Architecture (figure): Compute Cloud: Sphere; Storage Cloud: Sector; Routing: C/S or P2P; High Speed Transport Protocol: UDT

  4. Sector: Distributed Storage System • Components (figure): Security Server, Master, Client, and slave nodes • Master: storage system management, processing scheduling, service provider • Client: system access tools, application programming interfaces • Security Server: user accounts, data protection, system security • Control connections use SSL; data moves over UDT, with optional encryption • Slaves provide storage and processing

  5. Sphere: Simplified Data Processing • Data is processed where it resides (locality) • The same user-defined function (UDF) can be applied to all elements (records, blocks, or files) • Processing output can be written to Sector files, on the same node or on other nodes • Generalizes MapReduce

  6. Sphere: Simplified Data Processing
  Serial pseudocode:
    for each file F in (SDSS datasets)
      for each image I in F
        findBrownDwarf(I, …);
  Sphere version:
    SphereStream sdss;
    sdss.init("sdss files");
    SphereProcess myproc;
    myproc->run(sdss, "findBrownDwarf");
    myproc->read(result);
  UDF signature:
    findBrownDwarf(char* image, int isize, char* result, int rsize);
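  A rough, hedged sketch of what a UDF body with the simplified signature above might look like (the shipped Sector/Sphere API wraps inputs and outputs in its own structures; the brightness heuristic here is invented purely for illustration):

    // Illustrative only: follows the simplified signature shown on the slide,
    // not the exact Sector/Sphere UDF interface.
    #include <cstdio>

    // Scan one image record and write any detection into the result buffer.
    int findBrownDwarf(char* image, int isize, char* result, int rsize)
    {
        long sum = 0;
        for (int i = 0; i < isize; ++i)
            sum += static_cast<unsigned char>(image[i]);

        // Hypothetical "dark object" test on the mean pixel value.
        if (isize > 0 && sum / isize < 32)
            return std::snprintf(result, rsize, "candidate:%d", isize);

        result[0] = '\0';   // no candidate found in this image
        return 0;
    }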

  7. Sphere: Data Movement • Slave -> Slave (Local) • Slave -> Slaves (Shuffle) • Slave -> Client

  8. Load Balance & Fault Tolerance • The number of data segments is much larger than the number of SPEs. When an SPE completes a data segment, a new segment is assigned to it. • If an SPE fails, the data segment assigned to it is re-assigned to another SPE and processed again. • Load balance and fault tolerance are transparent to users!
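  The assignment policy above can be paraphrased as two event handlers; this is only a sketch of the idea, not Sector's master code, and the Segment type and pending queue are invented for the illustration:

    // Sketch of the load-balance / fault-tolerance policy described on the slide.
    #include <queue>

    struct Segment { int id; };

    std::queue<Segment> pending;   // many more segments than SPEs

    // When an SPE finishes its segment, hand it the next pending one.
    bool onSegmentDone(Segment& next)
    {
        if (pending.empty()) return false;   // nothing left to assign
        next = pending.front();
        pending.pop();
        return true;                         // caller dispatches `next` to the SPE
    }

    // When an SPE fails, its segment goes back to be processed again elsewhere.
    void onSpeFailure(const Segment& lost)
    {
        pending.push(lost);
    }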

  9. Implementation • Sector/Sphere is open source software developed by NCDM • Written in C++ • Client tools (file access, system management, etc.) • Sector/Sphere API library • http://sector.sf.net • Current release: version 1.15

  10. Open Cloud Testbed • 4 Racks • Baltimore (JHU): 32 nodes, 1 AMD dual-core CPU, 4GB RAM, 1TB single disk • Chicago (StarLight): 32 nodes, 2 AMD dual-core CPUs, 12GB RAM, 1TB single disk • Chicago (UIC): 32 nodes, 2 AMD dual-core CPUs, 8GB RAM, 1TB single disk • San Diego (Calit2): 32 nodes, 2 AMD dual-core CPUs, 12GB RAM, 1TB single disk • 10Gb/s inter-rack connection on CiscoWave • 1Gb/s intra-rack connection • SC08: 9 nodes, 2 AMD quad-core CPUs, 16GB RAM, 6TB 8-disk RAID, 10Gb/s intra-rack connection

  11. SC08 & Open Cloud Testbed

  12. BWC App #1: Sorting a Terabyte • Data is split into small files scattered across all slaves • Stage 1: On each slave, an SPE scans its local files and sends each record, according to its key, to a bucket file on a remote node, so that the buckets themselves are in key order. • Stage 2: On each destination node, an SPE sorts the data inside each bucket.

  13. TeraSort (figure) • Binary record, 100 bytes: 10-byte key + 90-byte value • Stage 1: hash each record on the first 10 bits of its key into buckets 0-1023 (Bucket-0 ... Bucket-1023) • Stage 2: sort each bucket on the local node
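  A hedged, self-contained sketch of the two stages described above (record layout as on the slide; this is not the team's actual Sphere UDF code):

    // TeraSort bucketing sketch: 100-byte records, 10-byte key + 90-byte value.
    #include <algorithm>
    #include <cstring>
    #include <vector>

    struct Record {
        char key[10];
        char value[90];
    };

    // Stage 1: route a record to one of 1024 buckets by the top 10 bits of its key.
    int bucketOf(const Record& r)
    {
        unsigned hi = static_cast<unsigned char>(r.key[0]);
        unsigned lo = static_cast<unsigned char>(r.key[1]);
        return static_cast<int>((hi << 2) | (lo >> 6));   // 0..1023
    }

    // Stage 2: sort one bucket in place on the destination node.
    void sortBucket(std::vector<Record>& bucket)
    {
        std::sort(bucket.begin(), bucket.end(),
                  [](const Record& a, const Record& b) {
                      return std::memcmp(a.key, b.key, 10) < 0;
                  });
    }

  Because stage 1 partitions on the most significant key bits, reading the locally sorted buckets in bucket order yields a globally sorted result with no further merging.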

  14. Performance Results: TeraSort [previous run] (chart: run time in seconds)

  15. Performance Results: TeraSort, SC08 Tue Night • Sorting 1TB on 101 nodes • UIC: 24 + SL: 10 + JHU: 29 + Calit2: 29 + SC08: 9 • Hash vs. Local Sort: 1660 sec + 604 sec • Hash • Per rack (in/out): UIC 201/193GB; SL 96/95GB; JHU 212/217GB; Calit2 214/218GB; SC08 87/89GB • Per node (in/out): 10GB AVG • System total AVG 9.6Gb/s (Peak 20Gb/s) • Local Sort • No network IO

  16. BWC App #2: CreditStone • Text record format: Trans ID|Time|Merchant ID|Fraud|Amount, e.g. 01491200300|2007-09-27|2451330|0|66.49 • Stage 1: transform each record into (Merchant ID, Time, Fraud) and hash it into buckets merch-000 ... merch-999 according to a 3-byte key (000-999) derived from the merchant ID • Stage 2: compute the fraud rate for each merchant
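  A hedged sketch of the stage-1 record handling described above (field layout as on the slide; the parsing code and the choice of the leading three digits as the bucket key are assumptions made for this illustration, not the actual Sphere UDF):

    // CreditStone stage-1 sketch: parse "TransID|Time|MerchantID|Fraud|Amount"
    // and bucket by a 3-digit key (000-999).
    #include <sstream>
    #include <string>

    struct Txn {
        std::string merchantId;
        std::string time;
        bool        fraud;
    };

    // Parse one pipe-delimited text record into the fields stage 2 needs.
    Txn parseRecord(const std::string& line)
    {
        std::istringstream in(line);
        std::string transId, time, merchant, fraud, amount;
        std::getline(in, transId,  '|');
        std::getline(in, time,     '|');
        std::getline(in, merchant, '|');
        std::getline(in, fraud,    '|');
        std::getline(in, amount,   '|');
        return Txn{merchant, time, fraud == "1"};
    }

    // Pick the destination bucket (merch-000 ... merch-999) for a transaction.
    // Assumption: the key is the first three digits of the merchant ID.
    int bucketOf(const Txn& t)
    {
        return std::stoi(t.merchantId.substr(0, 3));
    }

  Stage 2 then counts fraudulent versus total transactions per merchant within each local bucket, which is why that stage needs no network I/O.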

  17. Performance Results: CreditStone [previous run]

  18. Performance Results: CreditStone, SC08 Tue Night • Processed 3TB (53.5 billion records) on 101 nodes • UIC: 24 + SL: 10 + JHU: 29 + Calit2: 29 + SC08: 9 • Stage 1 vs. Stage 2: 3022 sec vs. 471 sec • Scan and hash each record • Per rack (in/out): UIC 157/248GB; SL 84/122GB; JHU 229/135GB; Calit2 222/154GB; SC08 77/111GB • Per node (in/out): 9.7GB AVG • System total AVG 5.5Gb/s (Peak 7.2Gb/s) • Compute the fraud rate per merchant • No network IO

  19. BWC App #3: Cistrack, SC08 Tue Night • 10Gb/s connections from SC08 and from JGN2 (Japan) to the OCT, via CiscoWave and JGN2 respectively • Upload Cistrack data from SC08 & JGN2 (Japan) to the OCT: approx 8Gb/s • Download Cistrack data from the OCT to SC08 & JGN2 (Japan): approx 8Gb/s

  20. System Monitoring (Testbed)

  21. System Monitoring (Sector/Sphere)

  22. Summary • Aggregated bandwidth • Upload/Download SC08 <=> OCT: 8Gb/s each direction • TeraSort on OCT: 9.6Gb/s AVG, 20Gb/s Peak • CreditStone on OCT: 5.5Gb/s AVG, 7.2Gb/s Peak • Use of distributed clients • 101 nodes in 4 locations (Chicago, Baltimore, San Diego, Austin) • Bandwidth monitoring • Sector/Sphere application monitoring • Testbed system monitoring • Both monitors also track other resources (CPU, memory, etc.)

  23. Summary (cont.) • Usability • Client tools and a C++ library (which can call functions/applications written in other languages) • Sector behaves just like a distributed file system • TeraSort: 60 lines of Sphere code, plus application-specific code • CreditStone: 170 lines of Sphere code, plus application-specific code • Real-world applicability • Software: open source Sector/Sphere • System: Open Cloud Testbed • The software can be installed on heterogeneous clusters within a single data center or across multiple data centers • Real applications: bioinformatics, sorting, and credit card transaction processing (and many other applications!)

  24. Summary (cont.) • Use of distributed HPC resources • Open Cloud Testbed (4 distributed racks + SC08 booth) • 10Gb/s CiscoWave • Conference floor resources • One 10Gb/s SCinet connection to CiscoWave • 9 servers + 1 10GE switch • Participating as slave (storage + computing) nodes for Sphere applications

  25. For More Information • Sector/Sphere code & docs: http://sector.sf.net • Open Cloud Consortium: http://www.opencloudconsortium.org • NCDM: http://www.ncdm.uic.edu • @SC08: Booth 321
