Real World Cloud Application Security chan@netflix.com

Presentation Transcript


  1. Real World Cloud Application Security chan@netflix.com

  2. About Me • Director of Engineering @ Netflix • Responsible for: • Cloud app, product, infrastructure, ops security • Previously: • Led security team @ VMware • Earlier, primarily security consulting at @stake, iSEC Partners

  3. Netflix, Inc. “Netflix is the world’s leading Internet television network with more than 33 million members in 40 countries enjoying more than one billion hours of TV shows and movies per month, including original series . . .” Source: http://ir.netflix.com

  4. AppSec Challenges

  5. Lots of Good Advice • BSIMM • Microsoft SDL • SAFECode

  6. But, what works? Forrester Consulting, 12/10

  7. Especially given phenomena such as DevOps, cloud, and agile, and the unique characteristics of an organization?

  8. Cloud @ Netflix

  9. Availability

  10. “Undifferentiated Heavy Lifting”

  11. Netflix Culture “may well be the most important document ever to come out of the Valley.” Sheryl Sandberg, Facebook COO

  12. Scale and Usage Curve

  13. Netflix is now ~99% in the cloud

  14. On the way to the cloud . . . (architecture)

  15. On the way to the cloud . . . (organization) (or NoOps, depending on definitions)

  16. Deploying Code

  17. A common graph @ Netflix • Lots of watching in prime time • Not as much in early morning • Weekend afternoon ramp-up • Old way: pay and provision for peak, 24/7/365 • Multiply this pattern across the dozens of apps that comprise the Netflix streaming service

  18. Solution: Load-Based Autoscaling

  19. Autoscaling • Goals: • # of systems matches load requirements • Load per server is constant • Happens without intervention (the ‘auto’ in autoscaling) • Results: • Clusters continuously add & remove nodes • New nodes must mirror existing ones
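The autoscaling goals above amount to a simple control rule: size the cluster so that load per server stays near a target. A minimal sketch, assuming load can be summarized as a single number (e.g. requests per second) and that each cluster has a hypothetical per-server target and min/max size bounds; real AWS Auto Scaling policies are configured rather than hand-coded like this:

```python
import math


def desired_capacity(total_load: float, target_per_server: float,
                     min_size: int, max_size: int) -> int:
    """Return the server count that keeps per-server load near the target.

    Hypothetical helper: the slide states the goals (capacity tracks load,
    constant load per server), not a formula.
    """
    if target_per_server <= 0:
        raise ValueError("target_per_server must be positive")
    # Round up so no server runs above the target load.
    needed = math.ceil(total_load / target_per_server)
    # Clamp to the cluster's configured bounds.
    return max(min_size, min(max_size, needed))


# Prime time vs. early morning for the same (hypothetical) cluster.
print(desired_capacity(12_000, 500, min_size=3, max_size=100))
print(desired_capacity(900, 500, min_size=3, max_size=100))
```

The minimum bound keeps a floor of capacity for the early-morning trough, while the maximum caps spend if load spikes unexpectedly.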

  20. Every change requires a new cluster push (not an incremental change to existing systems)
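Because every change ships as a new cluster, each push needs a new Auto Scaling group (ASG) name. The names quoted later in the deck (testapp-live-v001 to testapp-live-v002) suggest a version-suffix convention; the sketch below assumes that convention, and the wraparound after v999 is a guess, not documented Netflix behavior:

```python
import re

# Assumed convention: "<cluster>-v<3-digit push number>",
# e.g. testapp-live-v001 -> testapp-live-v002.
_ASG_NAME = re.compile(r"^(?P<cluster>.+)-v(?P<version>\d{3})$")


def next_asg_name(current: str) -> str:
    """Return the ASG name for the next push of the same cluster."""
    match = _ASG_NAME.match(current)
    if match is None:
        raise ValueError(f"not a versioned ASG name: {current!r}")
    # Assumption: the push number wraps from v999 back to v000.
    version = (int(match.group("version")) + 1) % 1000
    return f"{match.group('cluster')}-v{version:03d}"


print(next_asg_name("testapp-live-v001"))
```

In a push like this, the previous ASG can be kept until the new one is healthy, which is one way the "trivial rollback" listed on slide 23 can fall out of the model.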

  21. Deploying code must be easy (it is)

  22. Netflix Deployment Pipeline • Code change (Perforce/Git) → RPM with app-specific bits (YUM) • Bakery/Aminator: Base image + RPM → AMI (VM template ready to launch) • Config change → Cluster config → ASG → Running systems

  23. Operational Impact • No changes to running systems • No systems mgmt infrastructure (Puppet, Chef, etc.) • Fewer logins to prod • No snowflakes • Trivial “rollback”

  24. Security Impact • Need to think differently on: • Vulnerability management • Patch management • User activity monitoring • File integrity monitoring • Forensic investigations

  25. Architecture, organization, and deployment are all different. What about security?

  26. We’ve adapted too. Some principles we’ve found useful.

  27. Points of Emphasis

  28. Points of Emphasis • Integrate • Make the right way easy • Self-service, with exceptions • Trust, but verify • Two contexts: • Integration with your engineering ecosystem • Integration of your security controls • Organization • SCM, build and release • Monitoring and alerting

  29. Integration: Base AMI Testing • Base AMI – VM/instance template used for all cloud systems • Average instance age = ~24 days (one-time sample) • The base AMI is managed like other packages, via P4, Jenkins, etc. • We watch the SCM directory & kick off testing when it changes • Launch an instance of the AMI, perform vuln scan and other checks • Sample scan alert: SCAN COMPLETED ALERT; Site name: AMI1; Stopped by: N/A; Total Scan Time: 4 minutes 46 seconds; Critical Vulnerabilities: 5; Severe Vulnerabilities: 4; Moderate Vulnerabilities: 4
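A scan summary like the one above lends itself to an automated pass/fail gate on the base AMI. A minimal sketch, assuming the scanner's output contains `<Severity> Vulnerabilities: <n>` fields as in the sample alert; the severity budgets are hypothetical policy values, and the scanner itself is not modeled:

```python
import re

# Text taken from the sample alert on the slide.
SAMPLE_ALERT = (
    "SCAN COMPLETED ALERT Site name: AMI1 Stopped by: N/A "
    "Total Scan Time: 4 minutes 46 seconds Critical Vulnerabilities: 5 "
    "Severe Vulnerabilities: 4 Moderate Vulnerabilities: 4"
)


def parse_scan_counts(alert: str) -> dict[str, int]:
    """Extract '<Severity> Vulnerabilities: <n>' counts, keyed by lowercase severity."""
    return {
        severity.lower(): int(count)
        for severity, count in re.findall(r"(\w+) Vulnerabilities:\s*(\d+)", alert)
    }


def ami_passes(alert: str, max_allowed: dict[str, int]) -> bool:
    """Fail the base AMI if any severity exceeds its (hypothetical) budget."""
    counts = parse_scan_counts(alert)
    return all(counts.get(sev, 0) <= limit for sev, limit in max_allowed.items())


# Under a zero-critical policy, the sample AMI would be blocked.
print(ami_passes(SAMPLE_ALERT, {"critical": 0, "severe": 10}))
```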

  30. Integration: Control Packaging and Installation • From the RPM spec file of a webserver: Requires: ossec cloudpassage nflx-base-harden hyperguard-enforcer • Pulls in the following RPMs: • HIDS agent • Config assessment/firewall agent • Host hardening package • WAF
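RPM enforces `Requires:` at install time, but a periodic host-side check can catch drift (for example, a control package removed with `--nodeps`). A minimal sketch, assuming the spec line lists ossec, cloudpassage, nflx-base-harden, and hyperguard-enforcer as separate packages (our reading of the slide), and that the installed set comes from something like `rpm -qa --qf '%{NAME}\n'`:

```python
def parse_requires(line: str) -> list[str]:
    """Split an RPM spec 'Requires:' line into package names.

    Handles whitespace- or comma-separated names; version constraints
    such as '>= 1.0' are not handled in this sketch.
    """
    _, _, deps = line.partition(":")
    return [name for name in deps.replace(",", " ").split() if name]


def missing_controls(requires_line: str, installed: set[str]) -> list[str]:
    """Return required security-control packages absent from the host."""
    return [pkg for pkg in parse_requires(requires_line) if pkg not in installed]


SPEC_LINE = "Requires: ossec cloudpassage nflx-base-harden hyperguard-enforcer"
print(missing_controls(SPEC_LINE, {"ossec", "nflx-base-harden", "httpd"}))
```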

  31. Integration: Timeline (Chronos) • What IP addresses have been blacklisted by the WAF in the last few weeks? • GET /api/v1/event?timelines=type:blacklist&start=20130125000000000 • Which security groups have changed today? • GET /api/v1/event?timelines=type:securitygroup&start=20130206000000000
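The two timeline queries above differ only in event type and start time, so they are easy to generate. A minimal sketch that reproduces the slide's query strings, assuming the 17-digit `start` value is `yyyyMMddHHmmss` plus three millisecond digits (an interpretation of the examples, not documented Chronos behavior); the host is omitted:

```python
from datetime import datetime
from urllib.parse import urlencode


def timeline_query(event_type: str, start: datetime) -> str:
    """Build a Chronos-style event query path like the ones on the slide."""
    # Assumed format: date/time to the second, then milliseconds.
    stamp = start.strftime("%Y%m%d%H%M%S") + f"{start.microsecond // 1000:03d}"
    params = {"timelines": f"type:{event_type}", "start": stamp}
    # Keep ':' unescaped so the query matches the slide's format.
    return "/api/v1/event?" + urlencode(params, safe=":")


print(timeline_query("securitygroup", datetime(2013, 2, 6)))
```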

  32. Integration: Static Analysis • Available self-service through build environment • FindBugs, PMD • Jenkins plugin to display graphs and support drill through to results

  33. Integration: Static Analysis

  34. Points of Emphasis • Integrate • Make the right way easy • Self-service, with exceptions • Trust, but verify • Developers are lazy

  35. Making it Easy: Cryptex • Crypto: DDIY (“Don’t Do It Yourself”) • Many uses of crypto in web/distributed systems: • Encrypt/decrypt (cookies, data, etc.) • Sign/verify (URLs, data, etc.) • Netflix also uses crypto heavily for device activation, DRM playback, etc.

  36. Making it Easy: Cryptex • Multi-layer crypto system (HSM basis, scale-out layer) • Easy to use • Key management handled transparently • Access control and auditable operations
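Cryptex's interface is not described in these slides, so as a stand-in, here is a generic sketch of one listed operation (sign/verify URLs) using Python's standard `hmac` module. The `sig` parameter name is hypothetical, and the key is passed in explicitly; that explicit key handling is exactly the part a service like Cryptex is meant to take off developers' hands:

```python
import hashlib
import hmac
from urllib.parse import urlsplit


def sign_url(url: str, key: bytes) -> str:
    """Append a hypothetical 'sig' parameter: HMAC-SHA256 over the URL."""
    mac = hmac.new(key, url.encode(), hashlib.sha256).hexdigest()
    sep = "&" if urlsplit(url).query else "?"
    return f"{url}{sep}sig={mac}"


def verify_url(signed: str, key: bytes) -> bool:
    """Recompute the HMAC over the URL minus its trailing 'sig' parameter."""
    base, found, mac = signed.rpartition("sig=")
    if not found or not base:
        return False
    unsigned = base[:-1]  # drop the '?' or '&' added by sign_url
    expected = hmac.new(key, unsigned.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(mac.encode(), expected.encode())


signed = sign_url("https://example.com/a?x=1", b"demo-key")
print(verify_url(signed, b"demo-key"))
```

A production scheme would usually also sign an expiry time; this sketch covers only integrity of the URL itself.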

  37. Making it Easy: Cloud-Based SSO • In the AWS cloud, access to data center services is problematic • Examples: AD, LDAP, DNS • But, many cloud-based systems require authN, authZ • Examples: Dashboards, admin UIs • Asking developers to securely handle/accept credentials is also problematic

  38. Making it Easy: Cloud-Based SSO • Solution: Leverage OneLogin SaaS SSO (SAML), used by IT for enterprise apps (e.g. Workday, Google Apps) • Provides a single & centralized login page • Built a base module to make SSO/authN trivial

  39. Points of Emphasis • Integrate • Make the right way easy • Self-service, with exceptions • Trust, but verify • Self-service is perhaps the most transformative cloud characteristic • Failing to adopt this for security controls will lead to friction

  40. Self-Service: Security Groups • Asgard cloud orchestration tool allows developers to configure their own firewall rules • Limited to same AWS account, no IP-based rules
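The two limits on self-service firewall rules are easy to express as a pre-submit check, with anything that fails routed to the exception path instead. A minimal sketch, assuming a hypothetical `IngressRule` model; Asgard's actual validation logic is not described in the slides:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical model of one proposed ingress rule."""
    port: int
    source_group: Optional[str] = None    # e.g. "sg-12345678"
    source_account: Optional[str] = None  # account owning source_group, if known
    source_cidr: Optional[str] = None     # e.g. "10.0.0.0/8"


def rule_violations(rule: IngressRule, own_account: str) -> list[str]:
    """List reasons a rule cannot be self-served (empty list = allowed)."""
    problems = []
    if rule.source_cidr is not None:
        problems.append("IP-based rules are not self-service")
    if rule.source_group is None:
        problems.append("rule must reference a security group")
    elif rule.source_account not in (None, own_account):
        problems.append("source group must be in the same AWS account")
    return problems


print(rule_violations(IngressRule(22, source_cidr="0.0.0.0/0"), "111111111111"))
```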

  41. Points of Emphasis • Integrate • Make the right way easy • Self-service, with exceptions • Trust, but verify • Culture precludes traditional “command and control” approach • Organizational desire for agile, DevOps, CI/CD blurs traditional security engagement touchpoints

  42. Trust but Verify: Security Monkey • Cloud APIs make verification and analysis of configuration and running state simpler • Security Monkey created as the framework for this analysis • Includes: • Certificate checking • Firewall analysis • IAM entity analysis • Limit warnings • Resource policy analysis

  43. Trust but Verify: Security Monkey • From: Security Monkey • Date: Wed, 24 Oct 2012 17:08:18 +0000 • To: Security Alerts • Subject: prod Changes Detected • Table of Contents: Security Groups • Changed Security Group: <sgname> (eu-west-1 / prod)
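A "Changes Detected" report like this comes from comparing successive snapshots of configuration pulled through the cloud APIs. A minimal sketch of that comparison, assuming a hypothetical snapshot format (security group name mapped to a set of rule strings) rather than Security Monkey's actual data model:

```python
def diff_security_groups(before: dict[str, set[str]],
                         after: dict[str, set[str]]) -> dict[str, list[str]]:
    """Compare two snapshots mapping group name -> set of rule strings."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    # A group counts as changed if it exists in both snapshots with different rules.
    changed = sorted(
        name for name in before.keys() & after.keys() if before[name] != after[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


before = {"web": {"tcp:80 from lb"}, "db": {"tcp:3306 from web"}}
after = {
    "web": {"tcp:80 from lb", "tcp:22 from 0.0.0.0/0"},
    "cache": {"tcp:6379 from web"},
}
print(diff_security_groups(before, after))
```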

  44. Trust but Verify: Exploit Monkey • The AWS Auto Scaling group (ASG) is the unit of deployment, so a change of ASG signals a good time to rerun dynamic scans • On 10/23/12 12:35 PM, Exploit Monkey wrote: "I noticed that testapp-live has changed current ASG name from testapp-live-v001 to testapp-live-v002. I'm starting a vulnerability scan against test app from these private/public IPs: 10.29.24.174"
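The trigger logic reduces to noticing when an application's current ASG changes between polls. A minimal sketch, assuming a hypothetical mapping from application name to current ASG name gathered on each poll; launching the actual scan is out of scope:

```python
from typing import Optional


def asg_changes(previous: dict[str, str],
                current: dict[str, str]) -> list[tuple[str, Optional[str], str]]:
    """Return (app, old_asg, new_asg) for apps whose current ASG changed.

    Apps seen for the first time are reported with old_asg=None so they
    also get an initial scan.
    """
    return [
        (app, previous.get(app), asg)
        for app, asg in sorted(current.items())
        if previous.get(app) != asg
    ]


previous = {"testapp-live": "testapp-live-v001", "other": "other-v003"}
current = {"testapp-live": "testapp-live-v002", "other": "other-v003"}
for app, old, new in asg_changes(previous, current):
    print(f"{app} changed current ASG name from {old} to {new}; starting scan")
```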

  45. Takeaways • Netflix runs a large, dynamic service in AWS • Newer concepts like cloud & DevOps need an updated approach to application security • Specific context can help jumpstart a pragmatic and effective security program • Don’t swim upstream: integrate and collaborate with your engineering partners

  46. Netflix References • http://netflix.github.com • http://techblog.netflix.com • http://slideshare.net/netflix

  47. Other References • http://www.webpronews.com/netflix-outage-angers-customers-2008-08 • http://www.pcmag.com/article2/0,2817,2395372,00.asp • http://www.readwriteweb.com/archives/etech_amazon_cto_aws.php • http://bsimm.com/online/ • http://www.microsoft.com/en-us/download/confirmation.aspx?id=29884 • http://www.slideshare.net/reed2001/culture-1798664 • http://techcrunch.com/2013/01/31/read-what-facebooks-sandberg-calls-maybe-the-most-important-document-ever-to-come-out-of-the-valley/ • http://www.gauntlt.org

  48. Questions? chan@netflix.com
