
Autonomic Recovery of Enterprise-wide Systems After Attack or Failure with Forward Correction

By Anup K. Ghosh, George Mason University. With Sushil Jajodia (GMU), Angelos Stavrou (GMU), Angelos Keromytis (Columbia University), Jason Nieh (Columbia University), and Sal Stolfo (Columbia University).


Presentation Transcript


  1. Autonomic Recovery of Enterprise-wide Systems After Attack or Failure with Forward Correction
     By Anup K. Ghosh, George Mason University
     With Sushil Jajodia (GMU), Angelos Stavrou (GMU), Angelos Keromytis (Columbia University), Jason Nieh (Columbia University), Sal Stolfo (Columbia University), and Peng Liu (Penn State University)
     July 10, 2008

  2. Problem
     • The complexity of most enterprise-scale software systems precludes perfect reliability and invulnerability
     • Enterprise computing supports mission-critical functions including logistics, transportation, intelligence, and command and control, the failure of which can have severe consequences
     • Enterprise-wide solutions must be engineered to account for failures and attacks against network servers and workstation clients

  3. Objective
     Develop self-regenerative enterprise networks that recover and reconstitute themselves after attacks and failures
     • Develop a transaction-based model for commodity operating systems to determine where an attack occurred and what data or programs were altered, and to back out all these changes without affecting unrelated data or activities
     • Automatically generate patches to make systems more robust after an attack

  4. Approach
     • Develop an enterprise-wide approach to self-regenerative systems, including:
        • Application-level resilience using error virtualization and rescue points (Columbia U)
        • Non-stop server resilience using virtualization and automatic feedback control to continuously provide servers with known high integrity, even after compromise (GMU)
        • Self-healing database to track damage, quarantine tainted records, and repair damage (PSU/GMU)
        • Journaling computer system for workstations to determine malicious actions and their effects, including tainted documents, programs, and network events (GMU)
        • System restore with correction to back out malicious changes (GMU)
        • Dynamic patching of applications to improve resiliency after attack (Columbia U)

  5. Overview of Presentations
     • Application-level error virtualization & dynamic patching (Columbia U)
     • Self-healing database (Penn State)
     • Non-stop server system (GMU)
     • Journaling Computer System for desktop applications (GMU)

  6. A Non-Stop Server using Automatic Feedback Control
     • Goal
        • Develop a virtual machine-based system for enterprise servers that provides non-stop computing services by compensating for faults and attacks against critical network servers
        • A feedback control system coupled with VM-based redundancy provides very high availability for imperfect software systems

  7. State of Server Systems
     • Servers are very complex pieces of software
        • Complex -> buggy -> software failures & attacks
     • Servers are used to provide mission-critical services for enterprise networks
        • Failures in servers result in substantial business loss of revenue and productivity
        • Internet-facing servers tend to bear the brunt of attacks against enterprises (govt & commercial)
     • Current strategies to mitigate the risk of failing servers
        • Demand better software from vendors -> perfection is not realizable
           • Best is auto-patching, which introduces downtime
        • Provide hardware back-up redundancy
           • Common-mode failures for incidental bugs & attacks
     • Large, emerging market for server consolidation via virtualization
        • Reduces TCO for maintaining lots of server boxes
        • Virtualized server farms are becoming the norm

  8. Solution for Non-Stop Computer Servers
     • Diversify and replicate servers in virtual machines
     • Create a trustworthy controller (TC) that uses automatic feedback control to control the state of servers
     • Hide details of server replication from clients
     • Revert servers to pristine condition on attack or corruption while continuing to provide service
     [Diagram labels: VS, VSH, TC, LoadBalancer, Sensor Reports, Action, Action Recommendation, Action decisions]

  9. Trustworthiness Controller (TC)
     • Sensors
        • Intrusion sensors
        • Anomaly detectors
        • Integrity monitors
        • Performance monitors
        • Exposure time
     • Actuators
        • Service restoration
        • Terminate unauthorized processes
        • VM revert
        • Client throttling/blocking
     • Control models
        • Rules-based engine
        • Learning-based state estimator
     [Diagram labels: Sensors, Actuators, TC, State Estimator, Response Selector]
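
     The sensor, state estimator, response selector, and actuator chain on this slide can be sketched as a small rules-based loop. This is only an illustration under stated assumptions: the field names, thresholds, state labels, and rule table below are hypothetical and do not come from the TC implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    THROTTLE_CLIENT = "throttle_client"
    KILL_PROCESS = "kill_process"
    REVERT_VM = "revert_vm"


@dataclass
class SensorReport:
    """One round of readings for one virtual server (all fields hypothetical)."""
    vm: str
    intrusion_alert: bool = False   # intrusion sensor
    integrity_ok: bool = True       # integrity monitor
    unauthorized_pids: tuple = ()   # anomaly detector
    requests_per_sec: float = 0.0   # performance monitor
    exposure_sec: float = 0.0       # time since the VM was last pristine


def estimate_state(r, max_rps=500.0, max_exposure=300.0):
    """State estimator: collapse raw readings into a coarse state label.

    The thresholds are assumed values chosen for the example.
    """
    if r.intrusion_alert or not r.integrity_ok:
        return "compromised"
    if r.unauthorized_pids:
        return "suspicious"
    if r.requests_per_sec > max_rps:
        return "overloaded"
    if r.exposure_sec > max_exposure:
        return "stale"
    return "healthy"


# Response selector: a fixed rule table mapping state to actuator.
# Reverting "stale" VMs is an assumed policy for limiting exposure time
# even when no sensor has fired.
RULES = {
    "compromised": Action.REVERT_VM,
    "suspicious": Action.KILL_PROCESS,
    "overloaded": Action.THROTTLE_CLIENT,
    "stale": Action.REVERT_VM,
    "healthy": Action.NONE,
}


def select_response(report):
    """Run one step of the control loop for one sensor report."""
    return RULES[estimate_state(report)]
```

     A learning-based state estimator would replace estimate_state while leaving the rule table and the actuators unchanged.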

  10. TC Testbed Setup
     [Diagram labels: Apache00, Apache01, Apache02, TC GUI Station, TC Control Station, Client, LoadBalancer, Server; networks 192.168.0/24 and 10.0.0.0/16]

  11. TC GUI
     • Visualizes system state and dynamics
     • Passive: it receives state information from TC control and displays it; the GUI does not direct TC control
     • System View:
        • Shows the state of the server machine and the summarized state of the VMs
     • VM View:
        • Shows the state of individual VMs

  12. TC GUI: System View

  13. TC GUI: VM View

  14. Denial of Service Attack

  15. Active Malicious Process

  16. Withstanding Persistent DoS Attacks
     • 1 attack per second: 92% of normal throughput
     • 8 attacks per second: 60% of normal throughput
     8/14/2014

  17. Revert Overhead
     • The 8 measurements took one minute
     • Worst-case revert overhead = 12% (when reversion starts)
     • Throughput returns to 99% of normal within 30 sec (measurement 5)

  18. Conclusion
     • TC is a closed-loop control architecture for intrusion detection and server defense
     • Servers are virtualized so that they can be reverted to a pristine state at low cost
     • The control loop triggers actuators in response to sensor inputs
     • Handles “false negatives,” including zero-day exploits and ingenious stealthy attacks that evade detection
     • Handles false alarms automatically, without a human in the control loop
     • Addresses the problem of overwhelming “false positives”

  19. Autonomic Recovery & Regeneration using Lightweight Virtualization
     • Objective: develop self-regenerative enterprise networks that recover and reconstitute themselves after attacks and failures
     • Recover: bring the system back to an operational state
     • Regenerate: roll forward with correction to quarantine tainted processes and files and back out corrupted changes

  20. Traditional Logging for Recovery
     • To be comprehensive, all system objects and activities need to be monitored, including processes, threads, inter-process communications, file system activities, signals, memory, network and local sockets, …
     • The challenges lie in the number of activities to monitor and the amount of resulting information to log

  21. Virtualization Technologies
     • Full and para-virtualization
        • A virtual machine (VM) acts like a complete system, equipped with its own OS and (virtual) hardware management
     • Lightweight virtualization
        • A VE, aka container, has its own file system space, process space, socket space, and network identity, but no guest OS and none of the ensuing overhead

  22. Journaling Computing System
     • JCS executes applications in lightweight VEs, created on demand and started in a pristine state
     • The host monitors; the VEs do the jobs
     • The focus of monitoring is on the interactions among VEs, not on VE-internal activities
     • A novel VE construction method allows VE integrity to be monitored with minimal effort (to be discussed later)
     • This drastically reduces the number of activities of interest
     • Inter-VE interactions are abstracted as transactions; the high-level semantics of transactions further reduce the information that needs to be kept

  23. JCS Host Diagram
     [Diagram labels: Transaction Summarization Engine, VE Manager, System Journal, VE 1, VE 2, VE N, Atomic Transactions, Syscalls, JCS Kernel Monitor, OpenVZ Kernel]

  24. The JCS Transactions
     • File transactions: information exchange through shared files
     • Socket transactions: limited to Inet sockets
     • Memory transactions
     • Transaction channels are tightly controlled
        • File sharing setup cannot be changed from within VEs
        • Firewalls can be used to block illegal TCP connections

  25. JCS Transactions Continued
     • Atomic transactions: the lowest-level system events of interest, presently syscalls
     • Summarized transactions: multiple atomic transactions combined into one; summarization can be lossless or lossy
     • Application-defined transactions: bring in application semantics
     • Ongoing research: causality-based summarization
        • Summarize the combined effects of multiple system calls as one transaction without losing information needed for causality analysis
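
     To illustrate how atomic transactions might be combined into summarized ones, the sketch below folds per-syscall records into one file transaction per (VE, path, mode). The record layout and field names are assumptions made for this example, not the JCS journal format, and this particular summary is lossy: it keeps the time span and byte count but drops the ordering of individual calls.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Atomic:
    """One atomic transaction, i.e. one observed syscall (hypothetical layout)."""
    ts: float
    ve: int
    syscall: str      # "open", "read", "write", or "close"
    path: str
    nbytes: int = 0


@dataclass(frozen=True)
class FileTransaction:
    """A summarized transaction: all reads or all writes by one VE on one file."""
    ve: int
    path: str
    mode: str         # "read" or "write"
    start: float
    end: float
    nbytes: int


def summarize(events):
    """Combine atomic read/write events into one transaction per (VE, path, mode)."""
    ordered = sorted(events, key=lambda e: (e.ve, e.path, e.ts))
    summary = []
    for (ve, path), group in groupby(ordered, key=lambda e: (e.ve, e.path)):
        group = list(group)
        for mode in ("read", "write"):
            ops = [e for e in group if e.syscall == mode]
            if ops:  # open/close events carry no data and are dropped
                summary.append(FileTransaction(
                    ve, path, mode, ops[0].ts, ops[-1].ts,
                    sum(e.nbytes for e in ops)))
    return summary
```

     With this scheme, a download that arrives as hundreds of write calls is journaled as a single write transaction.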

  26. JCS Desktop
     • Clicking on icons or menu items creates a new VE to run the designated application
     • Each VE has its own file system, process space, local socket space, and network identity
     • It is like running applications in their own VMs, but without the overhead of full virtualization
     • Application windows are seamlessly integrated with the desktop
     • Directory sharing is set up for a seamless workflow: see the example on the next slide

  27. An Example of File Transactions through Shared Directory
     • An application in a VE can’t see other apps/VEs
     • A shared directory is used for file transactions
     • VEs are created on demand
     [Diagram labels: Email VE, Office VE, Shared Directory, Foo.doc, Save Attachment, Click on Foo.doc]

  28. Analysis & Recovery Actions
     • An interface to allow the user to identify corrupted files, virus infections, bad URLs, …
     • Intrusion/corruption detectors can be integrated to automate the above
     • Corruption propagation analysis button
     • Analysis of sensitive data leakage
     • Corruption source discovery button (bad applications, URLs, etc.)
     • Application self-healing button
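
     Corruption propagation analysis can be pictured as forward taint tracking over the journal. The sketch below is a minimal version under assumptions: each journal entry is a hypothetical (timestamp, source, destination) triple meaning that information flowed from source to destination, and the node names with "ve:", "file:", and "ip:" prefixes are invented for the example.

```python
def propagate(journal, seed, seed_time=0.0):
    """Return {node: time it became tainted}, starting from one known-bad node.

    A destination becomes tainted when it receives data from a source that
    was already tainted at the time of the transaction.
    """
    tainted = {seed: seed_time}
    for ts, src, dst in sorted(journal):      # replay in time order
        if src in tainted and ts >= tainted[src] and dst not in tainted:
            tainted[dst] = ts
    return tainted


def leaked_to(tainted):
    """Sensitive-data leakage view: remote addresses reached by tainted nodes."""
    return sorted(node for node in tainted if node.startswith("ip:"))
```

     Replaying in timestamp order matters: a file that a VE wrote before it was infected stays clean.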

  29. Name Space Unification
     [Diagram labels: Union Mount; mount point / vz JCS 101; branches FFX-PS, 101-dirties, Ubuntu, Firefox; branch contents .mozilla, download, bin/ls; branch permissions RO, RO, RW, RW]

  30. JCS VE
     • When a Firefox VE is created at /vz/101, the /vz/101 subtree becomes its entire file system (the /)
     • Applications in VE 101 see only the unified namespace; they cannot see individual branches
     • /vz/101-dirties serves as a “honey branch”
     [Diagram: a Firefox VE is created; the file system seen within the VE is / with .mozilla, download, and bin/ls]

  31. 101-dirties: The “Honey Branch”
     • When a Trojan horse version of the ls command is installed, it reveals itself in 101-dirties, the only branch that needs to be monitored
     [Diagram labels: Union Mount; mount point / vz JCS 101; branches FFX-PS, 101-dirties, Ubuntu, Firefox; branch contents .mozilla, download, bin/ls; branch permissions RO, RO, RW, RW]
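
     The honey-branch idea can be modeled with a toy union namespace in which lookups take the first branch that holds a path and writes land in the first writable branch. This is a simplification under assumptions: it ignores the FFX-PS branch and real unionFS copy-up semantics, and the class and function names are hypothetical.

```python
class UnionNamespace:
    """Toy union mount: branches are (name, writable, files), highest priority first."""

    def __init__(self, branches):
        self.branches = branches

    def lookup(self, path):
        """Return (branch name, content) from the first branch that has the path."""
        for name, _writable, files in self.branches:
            if path in files:
                return name, files[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        """Store data in the first writable branch; read-only branches never change."""
        for name, writable, files in self.branches:
            if writable:
                files[path] = data
                return name
        raise PermissionError(path)


def shadowed_paths(ns, honey):
    """Paths in the honey branch that hide a file from a read-only branch."""
    honey_files = next(f for name, _w, f in ns.branches if name == honey)
    read_only = [f for _name, writable, f in ns.branches if not writable]
    return sorted(p for p in honey_files if any(p in f for f in read_only))
```

     Because the read-only branches cannot change, any system binary that appears in the writable branch is suspect, so the monitor only has to watch that one branch.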

  32. Disk Space Usage by Firefox VEs

  33. Memory Space by Firefox VE
     • Memory in kilobytes; each VM is configured with 128 MB of memory

  34. First Generation Prototype
     • Dell 2900 server: 8 cores, 16 GB memory, 15000 RPM SCSI hard drive
     • OS: 64-bit CentOS 5.1 (free version of Red Hat Enterprise Linux 5.1)
     • Kernel: 2.6.24-4 with OpenVZ and unionFS patches
     • 32 kprobes created to monitor file system and AF_INET socket activities
     • Relay channel used to send probe reports to a user-space program (jcs-relay)

  35. Demo
     • Scenario
        • Open the browser
        • Go to bad websites
        • Download some files, one of which is malware (installer.exe)
        • Run installer.exe
        • Corrupt some files
        • Send sensitive data out (browsing history)
     • Demo
        • Show all corrupted files and give recovery instructions
        • Show all data leaked from corrupted processes
        • List possible IP sources of the malware

  36. Movie 1: JCS System Startup
     • Show the startup of the VE manager
     • JCS kernel module installed
     • Relay channel opened
     • jcs-relay program running
     • System ready

  37. Movie 2: Starting an Application
     • Click on a PDF file
     • VE-manager (at the lower right of the screen) shows: Launching PDF Reader Container 105
     • PDF file displayed from container 105

  38. Movie 3: Online Browsing
     • Click on the Firefox icon on the desktop
     • VE-manager shows: Launching Firefox Container 106
     • Firefox window shows up
     • Visit kernel.org and download the change log of recent kernel revisions
     • Visit a bad site (192.168.0.12, the server in the TC testbed) and download Installer.exe, the malware
     • Visit cnn.com and leave the browser there

  39. Movie 4: Running Malicious Software
     • Click on the Terminal icon
     • VE-manager shows: Launching Terminal Container 107
     • Execute Installer.exe in the terminal
     • The user finds that files on the desktop are corrupted
     • The user sees a message demanding $100 for the decryption key

  40. Movie 5: Analysis Engine
     • Run the analysis program with the name and directory of the malware as inputs
     • Wait until the program ends

  41. Movie 6: Analysis Results
     • Identifies the container (107) that executed the malware
     • Displays the files that VE 107 might have leaked, and the IP addresses they might have been leaked to
     • Displays the files written by VE 107
     • Recommends which files the user should recover, and to versions before which times
     • Shows that the malware was created by container 106, a Firefox container
     • Shows the files in the shared directory (Desktop) that VE 106 had “touched”
     • Shows the suspected (IP) sources of the malware
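
     Two of the results above, the suspected sources of the malware and the recovery recommendation, can be sketched as backward queries over the journal. As before, this rests on assumptions: journal entries are hypothetical (timestamp, source, destination) triples, and the "ve:", "file:", and "ip:" prefixes are an invented naming scheme, not the JCS journal format.

```python
def trace_sources(journal, malware):
    """Find which VEs created the malware file and which IPs they heard from first.

    Returns (creators, suspects): creators maps each writing VE to the time of
    its earliest write; suspects lists IPs that sent data to a creator no later
    than that write.
    """
    creators = {}
    for ts, src, dst in journal:
        if dst == malware and src.startswith("ve:"):
            creators[src] = min(ts, creators.get(src, ts))
    suspects = set()
    for ts, src, dst in journal:
        if src.startswith("ip:") and dst in creators and ts <= creators[dst]:
            suspects.add(src)
    return creators, sorted(suspects)


def recovery_plan(journal, bad_ve):
    """For each file the bad VE wrote, return the time of its first write.

    Restoring a file to a version from before that time undoes the damage.
    """
    plan = {}
    for ts, src, dst in journal:
        if src == bad_ve and dst.startswith("file:"):
            plan[dst] = min(ts, plan.get(dst, ts))
    return plan
```

     The suspect list is deliberately conservative: every address contacted before the malware was written stays on it, which matches the demo's wording of possible sources.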

  42. Ongoing Research
     • Applying lightweight virtualization to TC
        • More virtual web servers in rotation
        • Less exposure time for each
     • Causality-based summarization
     • Capturing application semantics in transactions
     • Mission-specific summarization: for data recovery, intrusion analysis, …
     • Mechanisms to capture memory transactions

  43. Discussion aghosh1@gmu.edu
