
Case Study: The University of Alabama at Birmingham – OpenStack, Ceph, Dell



Presentation Transcript


  1. Case Study: The University of Alabama at Birmingham – OpenStack, Ceph, Dell. Kamesh Pemmaraju, Dell; John-Paul Robinson, UAB. OpenStack Summit 2014, Atlanta, GA

  2. An overview • Dell – UAB backgrounder • What we were doing before • How the implementation went • What we’ve been doing since • Where we’re headed

  3. Dell – UAB background • 900 researchers working on cancer and genomics projects • Their growing data sets challenged available resources • Research data distributed across laptops, USB drives, local servers, and HPC clusters • Transferring datasets to HPC clusters took too much time and clogged shared networks • Distributed data management reduced researcher productivity and put data at risk • They therefore needed a centralized data repository for researchers to ensure compliance with data-retention requirements • They also wanted a cost-effective, scale-out solution with hardware that could be re-purposed for compute and storage

  4. Dell – UAB background (contd.) • Potential solutions investigated: • Traditional SAN • Public cloud storage • Hadoop • UAB chose Dell/Inktank to architect a platform that would scale out, deliver a low cost per GB, and offer the best of both worlds: compute and storage on the same hardware

  5. A little background… • We didn’t get here overnight • 2000s-era High Performance Computing • ROCKS-based compute cluster • The Grid and proto-clouds • GridWay Meta-scheduler • OpenNebula, an early entrant that connected grids with this thing called “the cloud” • Virtualization through-and-through • DevOps is US

  6. Challenges and Drivers • Technology • Many hypervisors • Many clouds • We have the technology…can we rebuild it here? • Applications • Researchers started shouting “Data!”: NextGen sequencing, research data repositories, Hadoop • Researchers kept shouting “Compute!”

  7. Data Intensive Scientific Computing • We knew we needed storage and computing • We knew we wanted to tie it together with an HPC commodity scale-out philosophy • So in August 2012 we bought 10 Dell R720xd servers • 16-core • 96GB RAM • 36TB disk • A 192-core, ~1TB RAM, 360TB expansion to our HPC fabric • Now to integrate it…

  8. December 2012 • Bob said: “Hearing good things about OpenStack and Ceph this week at Dell World. Simon Anderson, CEO of DreamHost, spoke highly of Dell, OpenStack, and Ceph today. He is also chair of a company that supports… He also spoke highly of the Dell Crowbar deployment tool.”

  9. December 2012 • Bob said: “Hearing good things about OpenStack and Ceph this week at Dell World. Simon Anderson, CEO of DreamHost, spoke highly of Dell, OpenStack, and Ceph today. He is also chair of a company that supports… He also spoke highly of the Dell Crowbar deployment tool.” • I said: “Good to hear. I’ve been thinking a lot about Dell in this picture too. We have the building blocks in place. Might be a good way to speed the construction.”

  10. Lesson 1: Recognize when a partnership will help you achieve your goals.

  11. The 2013 Implementation • The Timeline • In January we started our discussions with Dell and Inktank • By March we had committed to the fabric • A week in April and we had our own cloud in place • The Experience • Vendors committed to their product • Direct engagement through open communities • Bright people who share your development ethic

  12. Next Step…Build Adoption • Defined a new storage product based on the commodity scale-out fabric • Able to focus on the strengths of Ceph to aggregate storage across servers • Provision any-sized image to provide Flexible Block Storage (see the sketch below) • Promote cloud adoption within IT and across the research community • Demonstrate utility with applications
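
As a rough illustration of the "provision any-sized image" point, here is a minimal Python sketch of creating a Ceph-backed block volume with the OpenStack command-line clients and handing it to a VM. The volume name, size, and instance name are hypothetical, and the exact client flags depend on the OpenStack release (the `--display-name` form matches the 2014-era cinder client).

```python
#!/usr/bin/env python
"""Hedged sketch: create a Ceph-backed block volume of an arbitrary size and
attach it to an instance.  Names and sizes are hypothetical; the CLI flags
assume a 2014-era cinder/nova client."""

import subprocess

VOLUME_NAME = "research-scratch"   # hypothetical volume name
VOLUME_GB = 500                    # any size; with a Ceph backend this becomes a thin RBD image
INSTANCE = "crashplan-01"          # hypothetical instance to attach to

def run(cmd):
    """Echo and run a CLI command so the sketch doubles as a runbook."""
    print("$ " + " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Create the volume; the Ceph/RBD Cinder backend turns this into an RBD image.
run(["cinder", "create", "--display-name", VOLUME_NAME, str(VOLUME_GB)])

# 2. Look up the new volume's ID, then attach it to a running instance.
run(["cinder", "list"])
# run(["nova", "volume-attach", INSTANCE, "<volume-id>", "/dev/vdb"])  # fill in the ID from `cinder list`
```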

  13. Applications • Crashplan backup in the cloud • A couple of hours to provision the VM resources • An easy half-day deploy with the vendor because we controlled our resources (a.k.a. the firewall) • Add storage containers on the fly as we grow…10TB in a few clicks • Gitlab hosting • Start a VM spec’d according to the project site • Works with the Omnibus install. Hey, it uses Chef! • Research storage • 1TB storage containers for cluster users • Uses Ceph RBD images and NFS (see the sketch below) • The storage infrastructure part was easy • Scaled provisioning: 100+ user containers (100TB) created in about 5 minutes • Add storage servers as existing ones fill
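
The research-storage bullets above are where scripted provisioning paid off. The sketch below shows, under stated assumptions, what a bulk "1 TB container per user" run can look like with the rbd CLI: create a thin-provisioned RBD image, map it, put a filesystem on it, mount it, and export it over NFS. The pool name, export root, client subnet, and user list are hypothetical placeholders, not UAB's actual values.

```python
#!/usr/bin/env python
"""Hedged sketch of bulk 'research storage container' provisioning:
one 1 TB RBD image per user, mapped, formatted, and exported over NFS.
Pool, mount root, client subnet, and user list are placeholders."""

import subprocess

POOL = "research"                 # hypothetical Ceph pool for user containers
MOUNT_ROOT = "/export/research"   # hypothetical NFS export root
SIZE_MB = 1024 * 1024             # 1 TB, expressed in MB for older `rbd` clients

def sh(cmd):
    print("$ " + " ".join(cmd))
    subprocess.check_call(cmd)

def provision(user):
    image = "%s/%s" % (POOL, user)
    sh(["rbd", "create", image, "--size", str(SIZE_MB)])   # thin-provisioned RBD image
    sh(["rbd", "map", image])                              # exposes /dev/rbd/<pool>/<image>
    dev = "/dev/rbd/%s/%s" % (POOL, user)
    sh(["mkfs.xfs", dev])
    mountpoint = "%s/%s" % (MOUNT_ROOT, user)
    sh(["mkdir", "-p", mountpoint])
    sh(["mount", dev, mountpoint])
    # Append an export line; the client subnet here is a made-up placeholder.
    with open("/etc/exports", "a") as exports:
        exports.write("%s 10.0.0.0/16(rw,no_root_squash)\n" % mountpoint)

if __name__ == "__main__":
    for user in ["user001", "user002"]:   # in practice, 100+ users from the HPC account list
        provision(user)
    sh(["exportfs", "-ra"])               # publish all new exports in one go
```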

  14. Ceph Rebalances as Storage Grows :)
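
A hedged sketch of how one might watch that rebalance from the command line after adding OSDs: poll `ceph -s` until backfill/recovery activity disappears. The keywords and polling interval are illustrative, not taken from the original deployment.

```python
#!/usr/bin/env python
"""Hedged sketch: after adding new storage servers (OSDs), poll `ceph -s`
until the cluster finishes backfilling/recovering, i.e. the rebalance
the slide refers to."""

import subprocess
import time

REBALANCE_HINTS = ("backfill", "recover", "remapped", "degraded")

def rebalancing():
    status = subprocess.check_output(["ceph", "-s"]).decode("utf-8", "replace")
    return any(hint in status for hint in REBALANCE_HINTS)

while rebalancing():
    print("cluster still rebalancing...")
    time.sleep(30)
print("rebalance complete; new capacity is in service")
```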

  15. Lesson 2: Use it! That’s what it’s for!

  16. Lesson 2: Use it! That’s what it’s for! The sooner you start using the cloud the sooner you start thinking like the cloud.

  17. How PoC Decisions Age Over Time • Pick the environment you want when you are in operation…you’ll be there before you know it • Simple networking is good • But don’t go basic unless you are able to reinstall the fabric • Class B ranges to match the campus fabric • We chose a split admin range to coordinate with our HPC admin range • We chose a collapsed admin/storage network because we had a single switch…it probably would have been better to keep them separate and allow for growth • It’s OK to add non-provisioned interfacing nodes…know your network • Avoid painting yourself into a corner • Don’t let the Paranoid Folk box in your deployment • An inaccessible fabric is an unusable fabric • Fixed IP range mismatch with “fake” reservations (see the sketch after this slide)
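
The fixed-IP mismatch in the last bullet is the kind of thing a small pre-flight check can catch. Below is a minimal sketch, using entirely made-up addresses, that verifies the planned admin, storage, and fixed (VM) ranges don't overlap and flags any hand-reserved addresses that fall inside the fixed pool.

```python
#!/usr/bin/env python3
"""Hedged sketch of a network-plan sanity check for a PoC fabric.
All addresses below are invented placeholders, not UAB's plan."""

import ipaddress

# Hypothetical plan, loosely mirroring the split described on the slide.
RANGES = {
    "admin":   ipaddress.ip_network("172.16.0.0/24"),
    "storage": ipaddress.ip_network("172.16.1.0/24"),
    "fixed":   ipaddress.ip_network("172.16.2.0/23"),   # VM fixed-IP pool
}
RESERVED = [ipaddress.ip_address("172.16.2.1"),          # e.g. a gateway
            ipaddress.ip_address("172.16.3.254")]        # e.g. switch management

# 1. No two provisioning networks may overlap.
names = list(RANGES)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        if RANGES[a].overlaps(RANGES[b]):
            raise SystemExit("overlap between %s and %s ranges" % (a, b))

# 2. Anything reserved "by hand" must be excluded from the fixed pool,
#    otherwise you get the fake-reservation mismatch noted on the slide.
for addr in RESERVED:
    if addr in RANGES["fixed"]:
        print("WARNING: %s sits inside the fixed range; reserve it in the "
              "cloud's allocation pool or move it." % addr)
```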

  18. Lesson 3: The fabric is flexible. Let it help you solve your problems

  19. Problems will Arise • The release version of the ixgbe driver in the Ubuntu 12.04.1 kernel didn’t perform well with our 10Gbit cards • Open source has an upstream • Use it as part of your debugging network • Upgrading the drivers was a simple fix (see the sketch after this slide) • Sometimes when you fix something you break something else • There are still a lot of moving parts, but each has a strong open source community • Work methodically • You will learn as you go • Recognize that the stack is integrated and respect tool boundaries
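
For the driver issue above, the quickest sanity check is to ask the interface which driver and version it is actually running. The sketch below does that by parsing `ethtool -i`; the interface name is a placeholder, and comparing against the upstream ixgbe release you installed is left to the operator.

```python
#!/usr/bin/env python
"""Hedged sketch: confirm which driver and version an interface is running,
before and after installing an upstream ixgbe module.  Interface name is
a placeholder."""

import subprocess

IFACE = "eth2"   # hypothetical 10GbE interface

def driver_info(iface):
    """Parse `ethtool -i` output into a dict of driver/version/firmware fields."""
    out = subprocess.check_output(["ethtool", "-i", iface]).decode("utf-8", "replace")
    info = {}
    for line in out.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info

info = driver_info(IFACE)
print("driver:  %s" % info.get("driver", "unknown"))
print("version: %s" % info.get("version", "unknown"))
if info.get("driver") == "ixgbe":
    print("compare this version against the upstream release you installed "
          "to confirm the upgrade actually took effect")
```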

  20. Sometimes a Problem is just a Problem • Code ex

  21. Lesson 4: The code *is* the documentation

  22. Lesson 4: The code *is* the documentation …and that’s a *good* thing

  23. Where we are today • OpenStack plus Ceph are here to stay for our Research Computing System • They give us the flexibility we need for an ever-expanding research applications portfolio • Move our UAB Galaxy NextGen Sequencing platform to our cloud • Add object storage services • Put the cloud in the hands of researchers • The big question…

  24. …how far can we take it? • The goal of process automation is scale • Incompatible, non-repeatable, manual processes are a cost • Success is in dual use • Satisfy your needs and customer demand • Automating a process implies documenting the process…great for compliance and repeatability • Recognize the latent talent in your staff: today’s system admins are tomorrow’s systems developers • Traditional infrastructure models are ripe for replacement

  25. Lesson 5? You can learn from research and engage as a partner.

  26. Want to learn more about Dell + OpenStack + Ceph? Join the session “Software Defined Storage, Big Data and Ceph – What Is All the Fuss About?” at 2:00 pm Tuesday, Room #313. Neil Levine, Inktank & Kamesh Pemmaraju, Dell
