Join the CS598YYZ-Fall 2005 discussions and presentations exploring both old and new software errors. Reflecting on a 13-year-old paper, we'll analyze how software errors have evolved, identify persistent problems, and devise improved strategies for robustness and recovery. Key projects include bug characteristics studies, anomaly detection in parallel programs, and automated recovery strategies rooted in dynamic execution information. Access recordings and slides from Chuck Thompson’s talk and collaborate on future software resilience strategies.
CS598-YYZ: Reliable and Robust Software
Software Errors
Yuanyuan (YY) Zhou
Admin. Things
• Chuck Thompson’s talk slides and video are available from the web
• Wiki server is ready
  • https://www-s.cs.uiuc.edu/wiki/cs598yyz/
• Today
  • Two presentations
  • Discussions
  • 3 project descriptions
CS598YYZ-Fall 2005
Discussion: Old Software Errors
• It has been 13 years since that paper was published, and things may have changed. So:
  • What things are still the same?
  • What has changed? How would the changes affect the results? (James)
• How to address the “undefined state” problem? (Joe)
• How can we use the results? (Chao)
Discussion: New Software Errors
• How useful are the results from the paper? (Soumyadeb, Joe)
• How to make device drivers more robust? (Mohammad)
• How to recover from non-environment-dependent failures?
Project 4: Bug Crash Team
• OS bug characteristics study
  • So we know where to focus and how to design the OS
• Previous study
  • Based on the bugs detected by a home-built tool
  • May not represent what happens in the real world
• Idea
  • Crawl the Linux or FreeBSD newsgroups, source code, change logs, and documents to find bugs that have been reported
  • Characterize and analyze them
• Related work
  • Zhenmin and Lin’s current project
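The "characterize and analyze" step could start from something as simple as keyword-based tagging of collected report text. The sketch below is only an illustration under stated assumptions: the category names, the keywords, and the sample reports are all hypothetical, and a real study would need a far more careful taxonomy and manual validation.

```python
from collections import Counter

# Hypothetical categories and keywords, chosen only for illustration.
CATEGORIES = {
    "memory": ("null pointer", "use after free", "memory leak", "buffer overflow"),
    "concurrency": ("deadlock", "race condition", "lock"),
    "error handling": ("error path", "return value", "unchecked"),
}


def classify(report: str) -> str:
    """Return the first category whose keyword appears in the report text."""
    text = report.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def characterize(reports):
    """Count how many reports fall into each category."""
    return Counter(classify(report) for report in reports)


# Made-up report summaries standing in for crawled change-log entries.
SAMPLE_REPORTS = [
    "Fix NULL pointer dereference in driver probe",
    "Possible deadlock between inode lock and journal",
    "Memory leak on error path in mount",
    "Typo in documentation",
]

if __name__ == "__main__":
    for category, count in characterize(SAMPLE_REPORTS).most_common():
        print(category, count)
```

Substring matching is crude (a report that matches several categories is assigned to the first one), so the counts would only be a starting point for manual analysis.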
Project 5: Bug Detection Team
• Observation
  • Dynamic execution has many patterns
  • Value-based invariants (Stanford, MIT), PC-based invariants (Pin)…
  • Parallel programs also have patterns
• Idea
  • Explore patterns in parallel programs to detect anomalies
• Choices of focus
  • Shared-memory applications
  • Message-passing applications
• Challenges
  • What information is useful?
  • How to mine the information?
  • How to detect anomalies?
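One very simple form of value-based invariant is an observed range per variable: learn the minimum and maximum from runs assumed to be correct, then flag values that fall outside. The sketch below shows only that idea; it is not the method of any tool named on the slide, and the variable names (`queue_len`, `msgs_sent`) are hypothetical stand-ins for whatever a parallel program would expose.

```python
def learn_invariants(traces):
    """Learn a (min, max) range for each variable.

    traces: iterable of dicts mapping a variable name to one observed value.
    """
    invariants = {}
    for sample in traces:
        for name, value in sample.items():
            low, high = invariants.get(name, (value, value))
            invariants[name] = (min(low, value), max(high, value))
    return invariants


def find_anomalies(invariants, sample):
    """Return the sorted names of variables whose value violates its learned range."""
    return sorted(
        name
        for name, value in sample.items()
        if name in invariants
        and not (invariants[name][0] <= value <= invariants[name][1])
    )


if __name__ == "__main__":
    # Hypothetical per-thread observations from runs assumed to be correct.
    training = [
        {"queue_len": 0, "msgs_sent": 4},
        {"queue_len": 3, "msgs_sent": 6},
    ]
    learned = learn_invariants(training)
    print(find_anomalies(learned, {"queue_len": 9, "msgs_sent": 5}))
```

Range invariants produce false positives whenever training runs do not cover all legitimate behavior, which is one face of the "what information is useful?" challenge.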
Project 6: Bug Recovery Team
• Motivation
  • Software always has bugs, so we need recovery and self-healing
  • Effective recovery needs to understand what just happened, so automatic diagnosis is important
  • Automatic diagnosis requires dynamic execution information, but collecting and analyzing this information at run time is expensive
• Idea
  • Monitor at different overhead levels
  • If an exception happens, roll back to a previous checkpoint
  • Based on the symptom, dynamically translate the binary to collect more information
  • Automatically analyze this information to diagnose the problem and come up with a good recovery strategy
  • Recovery (already provided by Rx) but with a more precise workaround
  • Also potentially fix the bug permanently
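The checkpoint, rollback, and re-execute-with-more-monitoring loop can be sketched at a toy scale. This is an illustration under heavy assumptions: the slide proposes dynamic binary translation, whereas here a plain `level` argument stands in for it; the level names, `run_with_escalation`, and `faulty_task` are all hypothetical, and the state is assumed to be a dict that can be deep-copied.

```python
import copy

# Hypothetical monitoring levels, from cheapest to most expensive.
MONITOR_LEVELS = ("light", "medium", "heavy")


def run_with_escalation(task, state):
    """Run task(state, level, trace); on failure, roll back and retry with more monitoring.

    Returns (result, diagnostics), where diagnostics is a list of
    (level, error, trace) tuples, one per failed attempt.
    """
    checkpoint = copy.deepcopy(state)  # snapshot taken before execution
    diagnostics = []
    for level in MONITOR_LEVELS:
        trace = []
        try:
            return task(state, level, trace), diagnostics
        except Exception as exc:
            diagnostics.append((level, repr(exc), trace))
            # Roll back: restore the state saved in the checkpoint.
            state.clear()
            state.update(copy.deepcopy(checkpoint))
    return None, diagnostics


def faulty_task(state, level, trace):
    """Toy task with a deterministic bug; records more detail at higher levels."""
    state["count"] += 1
    if level != "light":
        trace.append(("count", state["count"]))
    if level == "heavy":
        trace.append(("divisor", state["divisor"]))
    return 10 / state["divisor"]


if __name__ == "__main__":
    state = {"count": 0, "divisor": 0}
    result, diagnostics = run_with_escalation(faulty_task, state)
    for level, error, trace in diagnostics:
        print(level, error, trace)
```

In this toy the bug is deterministic, so every retry fails and the value lies only in the richer trace gathered at the heavier levels; choosing a workaround from that trace is the part the project would have to design.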