1 / 9

Objective

Autonomic Recovery of Enterprise-wide Systems After Attack or Failure with Forward Correction Anup Ghosh, Sushil Jajodia: {aghosh1,jajodia}@gmu.edu; Angelos Kerymidas, Sal Stolfo, Jason Nieh: {angelos,sal,nieh}@cs.columbia.edu; Peng Liu :pliu@ist.psu.edu. Objective

stevie
Download Presentation

Objective

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Autonomic Recovery of Enterprise-wide Systems After Attack or Failure with Forward CorrectionAnup Ghosh, Sushil Jajodia: {aghosh1,jajodia}@gmu.edu; Angelos Kerymidas, Sal Stolfo, Jason Nieh: {angelos,sal,nieh}@cs.columbia.edu; Peng Liu :pliu@ist.psu.edu • Objective • Develop self-regenerative enterprise networks that recover and re-constitute themselves after attacks and failures • Develop a transaction-based model for commodity operating systems to determine where an attack occurred, what data or programs were altered, and back-out all these changes without affecting unrelated data/activities. • Automatically generate patches to make systems more robust after attack. • Technical Approach: • Develop a layered approach to self-regenerative systems: • application-level resilience using error virtualization and rescue points • system-level resilience using virtualization and transaction semantics for programs to roll back system state to the last known good continuation point • dynamic patching of applications to improve resiliency after attack • roll forward with correction to quarantine tainted processes and files & back-out changes • DoD Benefit: • Uninterruptible service for critical network centric warfare services • Error localization and tolerance in applications • Automatic system recovery after attack including quarantine of tainted processes and data • Increased resiliency after attack through auto-patch generation Budget: Planned/Actual $K Dates and location of Major Reviews/Meetings: July 2008, TBD

  2. Uninterruptible Server Developed an architecture, algorithms, and system for providing uninterruptible critical network services in the face of attack Breakthroughs: Supports use of COTS buggy software while still providing 100% availability Experimental results show resilience against classes of malicious attack including denial of service, worms, and stealthy Trojans Experimentally-verified low-overhead Eliminates false negatives from sensors, and automatically handles false positives without manual review Technical Breakthroughs & Accomplishments (1 of 3) Architecture for uninterruptible servers Sensors Actuators TC State Estimator Response Selector Health status monitor for virtual machines and uninterruptible server

  3. Self-Healing Systems Developed an approach for a self-recoverable Linux file system Developed Self-Healing PostgreSQL, a damage tracking, quarantine, and repair DBMS The first COTS DBMS that satisfies two essential enterprise health requirements: Near-zero-run-time overhead: less than 8% Zero-system-down-time: during online repair, its throughput degradation quickly improves from 40% to 10-20% within few seconds Technical Breakthroughs & Accomplishments (2 of 3)

  4. Application Recovery Through Error Virtualization Developed novel “error virtualization with rescue points” recovery technique retrofit exception-handling capabilities in vulnerable code allows for safe and efficient application recovery from failures and attacks Evaluated recovery mechanism with 6 open-source apps 90%+ success Technical Breakthroughs & Accomplishments (3 of 3)

  5. Uninterruptible Server Develop a scalable architecture that virtualizes diverse redundant copies of critical network services Create a trustworthy controller (TC) that uses automatic feedback control to control state of servers Hide details of server replication from clients Revert servers to pristine condition on attack or corruption while continuing to provide service Technical Approach (1 of 3) VS VS VSH VSH Architecture for uninterruptible servers TC LoadBalancer VSH VSH VS VS TC Control Station

  6. Self-Healing Systems Zero-down-time self-recoverable Linux file system Create a DQR (dynamic quarantine and repair) hypervisor “underneath” User-Mode-Linux (UML) Zero-down-time quarantine and repair of infected application execution through processor emulation Use process-level reconstruction to guide instruction-level quarantine controls Selective replay to keep the results of good instructions while removing the effects of bad ones Technical Approach (2 of 3) Gang A Gang B Display process Stack Timer Log Heap Dependency Analyzer Keyboard Task structure Guest OS Guest OS Ports CPU auditor Quarantine VMM Task structure Roll-Forward Correction Instruction Generator Disks Hook Cache Surgery Agent Host Kernel Drivers

  7. Overview: Develop failure-agnostic application-level recovery mechanisms from faults and attacks React to previously unknown (zero day) observed attacks and software faults Recover using program’s native error handling Develop map between set of faults that could occur and explicit error handling Profile programs during “bad” runs Discover candidate “rescue” points Error Virtualization Modify program execution so that fault is translated into handled error Technical Approach (3 of 3) Application Recovery Through Error Virtualization

  8. Uninterruptible Servers Complex and buggy COTS server software can be deployed in mission-critical system without compromising reliability or security Off-the-shelf intrusion sensors can be used and false positives and false negatives automatically handled without requiring human intervention Self Healing Systems Local and remote surgical corrections of corrupted applications and operating systems Zero-downtime correction of corrupted processes Application Recovery Retrofitting 3rd party code with fault resilience by mapping potential faults to native error handling Increased resiliency after attack through auto-patch generation Impact of AFOSR Funding

  9. AFOSR MURI funding is leveraging prior NSF and DARPA-sponsored work Breadth, duration, and funding of MURI program makes significant research breakthroughs feasible Air Force Air Combat Command has assigned technical officer to monitor progress and transition useful technologies Besides GMU, Penn State, and Columbia University, collaborations with Dartmouth College and the University of Pennsylvania on application recovery has made an impact Patents filed by GMU and Columbia Significant industry interest in technologies being developed License agreement with VA-based start-up is being pursued Test & evaluation discussions with large system engineering firm being discussed Collaboration & Funding Sources

More Related