
Using OpenStack and Puppet to deliver IaaS at CERN


Presentation Transcript


  1. Ben Jones ben.dylan.jones@cern.ch Using OpenStack and Puppet to deliver IaaS at CERN NEC'2013

  2. Agile Infrastructure • Why change the operating model? • Twice the compute, same staff levels • New DC at Wigner, Budapest • “We’re not special” • Existence of open source tool chain: OpenStack, Puppet, Foreman, Kibana • “Coffee time” provisioning of cloud servers


  4. New Data Centre • Data centre in Geneva at the limit of electrical capacity at 3.5 MW • New centre chosen in Budapest, Hungary • Additional 2.7 MW of usable power • Local on-site support for hardware maintenance and installations

  5. What is Cloud? • Technology model • virtualization of compute, network, storage • Operational model • run your services in a certain way • Consumption model • “don’t make me talk to IT” • delivered instantly* over the wire, variable price

  6. What is IaaS?

  7. Private Cloud Software • We use OpenStack, an open source cloud project http://openstack.org • ATLAS and CMS High Level Trigger clouds • HEP Clouds at BNL, IN2P3, NeCTAR, FutureGrid, … • Clouds at HP, IBM, Rackspace, eBay, PayPal, Yahoo!, Comcast, Bloomberg, Fidelity, NSA, CloudWatt, Numergy, Intel, Cisco …

  8. OpenStack

  9. [Architecture diagram. Labels: CERN Network Database, Block Storage Provider, Cinder, Network, Account management system, Compute, Scheduler, Keystone, Nova, Microsoft Active Directory, Horizon, Glance, CERN DB on Demand]

  10. Nova • Cloud computing fabric controller • Network manager modified for CERN • integration with network database • specific to our use case, not pushed upstream • Nova Compute aware of CERN DNS & AD • Multiple availability zones • special zone for Hyper-V • scheduler has filter based on image distribution metadata
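The scheduler filter mentioned on this slide can be pictured with a small stand-alone sketch. It does not use the Nova API, and the talk does not show the real CERN filter; the `Host` and `Image` classes, the `os_distro` metadata key as used here, the host names, and the rule "Windows images go to Hyper-V, everything else to KVM" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical rule for this sketch: Windows images run on Hyper-V hosts,
# every other distribution on KVM hosts.
HYPERVISOR_FOR_DISTRO = {"windows": "hyperv"}
DEFAULT_HYPERVISOR = "kvm"


@dataclass
class Host:
    name: str
    zone: str
    hypervisor: str  # "kvm" or "hyperv" in this sketch


@dataclass
class Image:
    name: str
    # Free-form image metadata; only "os_distro" is read below.
    metadata: dict = field(default_factory=dict)


def filter_hosts(hosts, image):
    """Keep only hosts whose hypervisor suits the image's distribution."""
    distro = image.metadata.get("os_distro")
    wanted = HYPERVISOR_FOR_DISTRO.get(distro, DEFAULT_HYPERVISOR)
    return [h for h in hosts if h.hypervisor == wanted]


hosts = [
    Host("hv-001", zone="zone-a", hypervisor="kvm"),
    Host("hv-002", zone="zone-b", hypervisor="kvm"),
    Host("hv-101", zone="zone-hyperv", hypervisor="hyperv"),
]
windows = Image("windows-server", {"os_distro": "windows"})
linux = Image("linux-generic", {"os_distro": "linux"})

print([h.name for h in filter_hosts(hosts, windows)])  # ['hv-101']
print([h.name for h in filter_hosts(hosts, linux)])    # ['hv-001', 'hv-002']
```

Keeping the rule in a lookup table rather than in the filter body means a new distribution can be routed without touching the filtering logic.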

  11. Glance • Services for discovering, registering and retrieving VM images • Aim for automated image creation / update • common process for Linux & Windows images • common tools – Aeolus Oz • CERN tools to hook up Oz & Glance API • Images for all CERN supported OS • user defined images supported • Initial contextualization via cloud-init • Cloudbase contributed cloud-init for Windows

  12. Keystone • Identity service: authentication, authorization and service catalog • Full integration with Active Directory via LDAP • CERN’s AD: 44K users & 29K groups • Minimal changes to AD • CERN submitting changes upstream • Account management system integration for project creation / deletion • SSL for everything


  14. Operational practices evolving • Security incidents • old: reinstall, new: replace with new VM • Misconfiguration requiring reboot • Resize a service • lxplus.cern.ch add VMs to serve demand • resize VMs (or rather, replace with bigger) • In future resize services automatically

  15. Service Models • Pets are given names like pussinboots.cern.ch • They are unique, lovingly hand raised and cared for • When they get ill, you nurse them back to health • Cattle are given numbers like vm0042.cern.ch • They are almost identical to other cattle • When they get ill, you get another one
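The "cattle" model on this slide amounts to replacing an unhealthy machine rather than repairing it. A minimal sketch of that idea follows; the helper names, the numbering scheme beyond the slide's `vm0042.cern.ch` example, and the health check are hypothetical, and no cloud API is called.

```python
import itertools


def make_namer(start):
    """Return a function that hands out numbered cattle-style host names."""
    counter = itertools.count(start)
    return lambda: f"vm{next(counter):04d}.cern.ch"


def heal(pool, is_healthy, new_name):
    """Replace every unhealthy VM with a freshly numbered one.

    Nothing is repaired in place: a sick VM is simply dropped from the
    pool and a new name takes its slot.
    """
    return [vm if is_healthy(vm) else new_name() for vm in pool]


pool = ["vm0040.cern.ch", "vm0041.cern.ch", "vm0042.cern.ch"]
sick = {"vm0042.cern.ch"}  # hypothetical health-check result

pool = heal(pool, lambda vm: vm not in sick, make_namer(43))
print(pool)  # ['vm0040.cern.ch', 'vm0041.cern.ch', 'vm0043.cern.ch']
```

A pet, by contrast, would keep its name and be repaired in place, which is why the same function does not fit hosts like pussinboots.cern.ch.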

  16. Some other use cases… • Hippos are cattle with block storage. Useful where there is redundancy, e.g. MongoDB, Cassandra. • Canaries are cattle at high risk to give early warning of failures. Fail fast and fix.

  17. Heat • Heat orchestrates composite cloud apps (stacks) • HA (restarts resources) & “auto-scaling”

  18. Configuration Management • Adopted Puppet • widely used, large community, scales • Needed to make reproducible services in the CERN CC • Simplify the configuration of OpenStack itself • community modules from Red Hat, Puppet Labs, users


  20. Accounting • CERN computing is funded from CERN central budgets, no billing but quotas • Experiments don’t have credit cards • What to do when quota is exceeded? • Unused capacity? • low SLA usage to plug the gaps? • Fair share across the cloud? • Worked for supercomputers but heavy for clouds at scale • Bursting to public clouds?

  21. Ceilometer • Accounting for OpenStack by project • Collects statistics from each compute node • common OpenStack message bus • Sharded MongoDB store • 2 GB / day • Hyper-V in Havana • Cinder statistics upcoming
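The 2 GB / day figure gives a rough feel for how the metering store grows. The arithmetic below is only a sketch: it assumes the rate stays constant and that no old samples are expired, neither of which the slide states.

```python
GB_PER_DAY = 2  # ingest rate quoted on the slide


def store_growth_gb(days, gb_per_day=GB_PER_DAY):
    """Data accumulated after `days` at a constant daily ingest rate."""
    return days * gb_per_day


print(store_growth_gb(30))   # 60 GB after a month
print(store_growth_gb(365))  # 730 GB after a year
```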

  22. CERN Status • CERN IT OpenStack Cloud • Folsom based service ~500 hypervisors on KVM and Hyper-V • New Grizzly production service opened late July • 280 hypervisors, 600 VMs, 50 projects and growing rapidly • High availability components using load balancing • i.e. 3 Nova controllers per cell • All Puppet managed to configure OpenStack • LHC experiment farms • CMS currently running 1,300 hypervisors with 50,000 cores • ATLAS starting to ramp up to a similar size • Other science grid sites moving to private cloud on OpenStack • Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …

  23. Outlook • Track stable Grizzly releases in Red Hat RDO • Up to date but not too close to the leading edge • Scaling • Expect 15,000 hypervisors, 150,000 VMs by 2015 • Manageability • Metering, Orchestration with Heat, Bare Metal • Functionality • Load Balancing, High Availability Storage and Pets
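The figures quoted on the status and outlook slides can be turned into per-hypervisor densities for comparison. This is only arithmetic on the numbers in the talk; the helper name is hypothetical, and the ratios are averages that say nothing about how individual hosts are loaded.

```python
def per_hypervisor(total, hypervisors):
    """Average number of items (VMs or cores) per hypervisor, one decimal."""
    return round(total / hypervisors, 1)


# Grizzly production service: 600 VMs on 280 hypervisors
print(per_hypervisor(600, 280))         # 2.1
# CMS farm: 50,000 cores on 1,300 hypervisors
print(per_hypervisor(50_000, 1_300))    # 38.5
# 2015 expectation: 150,000 VMs on 15,000 hypervisors
print(per_hypervisor(150_000, 15_000))  # 10.0
```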

  24. What have we learnt? • Automate everything from the beginning • Puppet and Stackforge are a great help • Distributions and appliances make getting started much easier • Constant rate of change requires a different approach • Focus on core technologies and keep up to date • Track new projects but don’t adopt too early unless strategic • Many of our users are cloud aware • Culture changes for legacy application coding and IT services • Communities are major motivators • But administrators need to engage and adapt rather than re-invent

  25. Conclusions • CERN IT is re-engineering to deliver additional capacity to 11,000 physicists within fixed resources • Cloud models can simplify current large scale computing infrastructure • OpenStack and its ecosystem allow us to meet this challenge and help others through open source

  26. Questions?

  27. Preproduction Service

  28. [Tool-chain diagram. Labels: mcollective, yum, Bamboo, Puppet, AIMS/PXE, Foreman, JIRA, OpenStack Nova, git, Koji, Mock, Yum repo, Pulp, Active Directory / LDAP, Hardware database, Lemon / Hadoop / LogStash / Kibana, Puppet-DB]

  29. Training for Newcomers • Buy the book rather than guru mentoring

  30. Job Opportunities
