
Computational Resiliency

Steve J. Chapin and Susan Older (Syracuse University); Gregg Irvin (Mobium Enterprises). Computational resiliency is the ability to sustain application operation and dynamically restore the level of assurance during an attack.


Presentation Transcript


  1. Computational Resiliency. Steve J. Chapin, Susan Older (Syracuse University); Gregg Irvin (Mobium Enterprises). OASIS PI Meeting.

  2. Recap: What is Computational Resiliency? The ability to sustain application operation and dynamically restore the level of assurance during an attack. Application-centric self defense, built on replication, migration, functionality mutation, and camouflage.

  3. Computational Resiliency (diagram): an attack hits a mission-critical application; the result of the attack is a degraded application trying to perform its mission-critical function; computational resiliency techniques are applied to correct the situation, yielding a degraded application sufficiently improved by resiliency to perform the mission-critical function.

  4. Example of CRLib (diagram): the “Safe Zone” has OASIS protection; “The Wild” has limited protection.

  5. The Players • Rocky & Bullwinkle: our heroes, both air and ground forces. • Dudley: representative of allied power. • Boris & Natasha: Directed by shadowy figure (Fearless Leader). Mission: big trouble for Moose and Squirrel. • Snidely: attempting to disrupt Dudley’s jobs.

  6. The Benign State (diagram labels): Rocky’s job; Bullwinkle’s job; Dudley’s job (low priority).

  7. The Attacks: Snidely is blocked at the firewall; Dudley does nothing.

  8. The Attacks: Natasha attacks Rocky and is caught by the IDS.

  9. The Attacks: Rocky’s job migrates back into the safe zone; Dudley must give up resources.

  10. The Attacks: Boris attacks Bullwinkle’s job. Some attacks succeed.

  11. The Attacks: Bullwinkle’s job employs camouflage, decoys, and migration.

  12. Multi-Faceted Approach • Strong theoretical basis: reason about conformance to policy • Computational resiliency library: dynamic application management • System software support: scheduling/policy frameworks

  13. Computational Resiliency Library • Group messaging: a group contains multiple nodes; all nodes receive all messages sent to the group • Replication/recovery with migration: liveness check at synchronization points; application readiness restored via node creation and migration
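
The group-messaging rule on this slide (every node in a group receives every message sent to the group) can be sketched as follows. This is a minimal illustration, not the CRLib API: the `Group` class, its `send` method, and the node names are all hypothetical, and in-memory lists stand in for real channels.

```python
class Group:
    """Hypothetical stand-in for a group of replicated nodes."""

    def __init__(self, name, node_ids):
        self.name = name
        # One inbox per node; a real system would use network channels.
        self.inboxes = {node_id: [] for node_id in node_ids}

    def send(self, message):
        """Deliver the message to every node in the group.

        Returns the number of copies delivered (one per node).
        """
        for inbox in self.inboxes.values():
            inbox.append(message)
        return len(self.inboxes)


# Three replicas of one cooperating task (names are illustrative only).
workers = Group("workers", ["w0", "w1", "w2"])
copies = workers.send("task-42")
```

Because every replica sees the same message stream, any surviving node can carry on the task if the others are lost, which is what makes the replication and recovery step possible.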

  14. Groups and Messaging (diagram: Groups 1, 2, and 3, each made of nodes and connected by channels): one group per cooperating task in a distributed computation.

  15. Group Messaging Detail (diagram: Group 1 and Group 2): in actuality, each member of Group 1 has a channel to each member of Group 2.

  16. Mapping of Nodes to Processors (diagram of groups and processors; channels not shown) • Nodes of a group are mapped across processors • Multiple nodes run as threads in a single process • One or more processes per processor

  17. Periodic Liveness Check • Done at user-defined synchronization points in the computation • All group members send ping messages to all others in the same group • A Local Group Leader (LGL, one per group) is elected and is responsible for restoring the intra-group replication level • The LGLs elect a Global Group Leader (GGL), responsible for inter-group coordination
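
The liveness check and two-level election can be sketched roughly as below. The slides do not say how leaders are chosen, so the "lowest live node id wins" rule is an assumption made only for illustration; `liveness_check`, `is_alive`, and the node names are likewise hypothetical, and the all-to-all ping exchange is collapsed into a single lookup per node.

```python
def liveness_check(groups, is_alive):
    """Elect one local leader per group and one global leader.

    groups:   {group_name: [node_id, ...]}
    is_alive: callable node_id -> bool, standing in for "answered the ping"
    """
    local_leaders = {}
    for name, members in groups.items():
        # Nodes that answered the pings sent within their own group.
        responders = sorted(n for n in members if is_alive(n))
        if responders:
            # Assumed election rule: the lowest live node id becomes leader.
            local_leaders[name] = responders[0]
    # The local leaders elect a global leader among themselves
    # (same assumed rule).
    global_leader = min(local_leaders.values()) if local_leaders else None
    return local_leaders, global_leader


groups = {"g1": ["a1", "a2", "a3"], "g2": ["b1", "b2"]}
# Pretend node a1 was taken out by an attacker.
leaders, ggl = liveness_check(groups, lambda node: node != "a1")
```

A deterministic rule like this lets every live node reach the same answer without extra messages, but it is only one of many possible election schemes.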

  18. Periodic Liveness Check II • LGLs determine local status by fiat, restore the replication level (creating new threads by cloning the LGL), and report to the GGL; a consensus option is in place but currently unused • The GGL reports the results of LGL actions to the other LGLs • The LGLs and GGL return to normal duty
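
Restoring the replication level by cloning the leader might look like the sketch below. It models nodes as plain objects rather than threads, and the `Node` class, `restore_replication` function, clone naming scheme, and target level are all assumptions for illustration, not the library's actual mechanism.

```python
import copy


class Node:
    """Hypothetical replica: an id plus the task state it carries."""

    def __init__(self, node_id, state):
        self.node_id = node_id
        self.state = state


def restore_replication(live_nodes, leader, target_level):
    """Return the group with clones of the leader added up to target_level."""
    restored = list(live_nodes)
    clone_index = 0
    while len(restored) < target_level:
        # Each clone gets its own deep copy of the leader's state.
        clone = Node(f"{leader.node_id}-clone{clone_index}",
                     copy.deepcopy(leader.state))
        restored.append(clone)
        clone_index += 1
    return restored


leader = Node("a2", {"step": 7})
survivors = [leader, Node("a3", {"step": 7})]
group = restore_replication(survivors, leader, target_level=3)
```

Cloning the leader works here because all nodes in a group are replicas of the same task, so the leader's state is a valid starting point for any new node.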

  19. Simple Application

  20. Simple Application After Process Taken Out by Attacker

  21. Application After Second Processor Lost

  22. Current Issues • Exploring through in-house red teaming and modeling • Efficiency of basic mechanisms: multiplicative communication load; additive computation load • Efficacy of basic mechanisms: window of attack between liveness checks; attack during liveness check; agreement algorithms
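
One plausible reading of "multiplicative communication load, additive computation load" follows from the earlier slides: channels connect every member of one group to every member of another, and pings go from each member to all others in its group. The arithmetic below is a sketch under that reading; the function names and the example replica counts are assumptions, not measurements from the project.

```python
def inter_group_messages(sender_replicas, receiver_replicas):
    """Physical messages for one logical message between two groups,
    assuming every sender replica writes to every receiver replica."""
    return sender_replicas * receiver_replicas


def ping_messages(group_size):
    """Pings in one liveness check, assuming each member pings all others."""
    return group_size * (group_size - 1)


def replica_executions(replicas_per_group):
    """Total task executions: each added replica adds one more execution."""
    return sum(replicas_per_group)


# Illustrative numbers only: two groups replicated three ways.
messages = inter_group_messages(3, 3)
pings = ping_messages(3)
executions = replica_executions([3, 3])
```

Under these assumptions, raising the replication level from 3 to 4 in both groups adds two task executions but raises the per-message cost from 9 to 16 physical messages, which is why communication is the more pressing efficiency concern.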

  23. Next Steps • Additional policy choices (not necessarily orthogonal): agreement protocols; replication/recovery methods; message passing schemes • Tool for user policy expression: state-dependent policy specified via “chinese menu” approach; logical predicates, state transitions

  24. Next Steps • π-calculus-based formal model for core library behavior • Split/merge for groups: all nodes in a group must be identical; basis for load balancing, functionality mutation • First demo at summer PI meeting, 2001
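
A state-dependent policy built from logical predicates and state transitions could be expressed along these lines. The policy tool is described only as future work, so everything here is hypothetical: the state names, the `Policy` fields, the replica counts, and the `lost_nodes` observation. Only the "fiat" and "consensus" agreement options are taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    agreement: str   # "fiat" or "consensus", the two options in the slides
    replicas: int    # hypothetical target replication level
    messaging: str   # hypothetical label for the message passing scheme


# One menu selection per state (the states themselves are assumptions).
POLICIES = {
    "benign": Policy("fiat", 2, "all-to-all"),
    "under_attack": Policy("consensus", 4, "all-to-all"),
}

# Transitions as (current state, predicate over observations, next state).
TRANSITIONS = [
    ("benign", lambda obs: obs["lost_nodes"] > 0, "under_attack"),
    ("under_attack", lambda obs: obs["lost_nodes"] == 0, "benign"),
]


def next_state(state, observations):
    """Follow the first transition whose predicate holds, else stay put."""
    for source, predicate, target in TRANSITIONS:
        if source == state and predicate(observations):
            return target
    return state


state = next_state("benign", {"lost_nodes": 1})
active_policy = POLICIES[state]
```

Keeping the menu choices and the transition predicates in separate tables makes it easy to see that the choices need not be orthogonal: a given agreement protocol may only make sense with certain replication levels.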

  25. Schedule (timeline: 6/00, 12/00, 6/01, 12/01, 6/02, 12/02, 6/03, 12/03): Basic π-calc; Formal equivalence; Policy/Protocol Analysis; Basic CRLib

  26. Schedule II (timeline: 6/00, 12/00, 6/01, 12/01, 6/02, 12/02, 6/03, 12/03): Funct. Mut.; Policy Frameworks; Camouflage; Schedulers; Hard. Apps.; Integration; Demos

  27. Open Issues • Cost/benefit analysis of CR: how much protection do we provide if the attacker knows what we’re trying to do? How much is performance affected by message load, active replication, etc.? • Potential integration with other OASIS projects: complementary with system-hardening technology (e.g., Dependable Intrusion Tolerance)
