Debugging of Parallel Systems

  1. Debugging of Parallel Systems: A Short Introduction
     Joel Huselius (joel.huselius@mdh.se)

  2. Terminology
     Error (bug): An unwanted state in a product
     Fault: An unintended condition that can cause an error
     Debug: The process of locating, analysing, and correcting suspected faults

  3. Classes of Errors
     - Probe effect
     - Observability problem
     - Livelock
     - Deadlock
     - Stampede effect
     - Bystander effect
     - Irreproducibility effects
     - Completeness problem
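Of the error classes above, deadlock is the easiest to make concrete. A common detection technique builds a wait-for graph, in which an edge from T1 to T2 means thread T1 is blocked on a resource held by T2; a cycle in that graph indicates a deadlock. The Python sketch below assumes the wait-for edges are already available as a dictionary (a real monitor would derive them from lock-ownership data), and the function name find_deadlock_cycle is hypothetical, not taken from the presentation.

```python
from typing import Dict, List, Optional


def find_deadlock_cycle(wait_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of thread names, or None.

    wait_for maps each thread to the threads it is blocked on (hypothetical
    input format chosen for this sketch).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for succ in wait_for.get(node, []):
            state = colour.get(succ, WHITE)
            if state == GREY:
                # Back edge: the cycle is the path suffix starting at succ.
                return path[path.index(succ):] + [succ]
            if state == WHITE:
                cycle = visit(succ)
                if cycle is not None:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(wait_for):
        if colour.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle is not None:
                return cycle
    return None


if __name__ == "__main__":
    # T1 waits for T2 and T2 waits for T1: the classic two-lock deadlock.
    print(find_deadlock_cycle({"T1": ["T2"], "T2": ["T1"]}))  # ['T1', 'T2', 'T1']
    # A chain without a cycle is blocking, but not deadlocked.
    print(find_deadlock_cycle({"T1": ["T2"], "T2": ["T3"]}))  # None
```

Livelock does not show up this way: livelocked threads keep running and changing state, so no thread sits in the wait-for graph waiting on another.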

  4. Cyclic Debugging
     - Repeated executions
     - Execute – Halt – Examine – Continue loop
     - Probe effect
     - Irreproducibility problem
     - Stampede effect
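Cyclic debugging assumes that re-executing a program reproduces the same behaviour. That assumption fails when the result depends on how threads interleave. The Python sketch below is a deliberately simplified toy model, not real threads: two simulated threads each perform an unsynchronised read-modify-write on a shared counter, and the interleaving is given as an explicit, hypothetical schedule so that the effect is deterministic.

```python
from typing import Dict, List

# Each simulated thread performs two steps: "read" copies the shared counter
# into a thread-local register, "write" stores register + 1 back.
STEPS = ["read", "write"]


def run(schedule: List[str]) -> int:
    """Execute the two-step increment for every thread in the given order.

    schedule lists thread names; each occurrence advances that thread by one
    step. Returns the final value of the shared counter.
    """
    shared = 0
    register: Dict[str, int] = {}
    progress: Dict[str, int] = {}
    for thread in schedule:
        step = STEPS[progress.get(thread, 0)]
        if step == "read":
            register[thread] = shared
        else:
            shared = register[thread] + 1
        progress[thread] = progress.get(thread, 0) + 1
    return shared


if __name__ == "__main__":
    # Serial order: A finishes before B starts, so both increments survive.
    print(run(["A", "A", "B", "B"]))  # 2
    # Interleaved order: both read 0 before either writes, so one update is lost.
    print(run(["A", "B", "A", "B"]))  # 1
```

The program text is identical in both runs, so re-executing it under a debugger does not guarantee hitting the failing schedule again. A breakpoint that halts one thread can also change the interleaving, which is the probe effect.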

  5. Monitoring
     Recording information about a program execution in order to review it offline in a model of the target environment
     - Software
     - Hardware
     - Hybrid
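A software monitor can be as simple as instrumentation that appends timestamped events to a trace, which is reviewed offline later. The Python sketch below is a minimal illustration; the TraceRecorder class, the Event fields and the worker function are hypothetical names chosen for the example. The recording code itself takes time and acquires a lock, so it can perturb the timing of the program it observes. This is the probe effect listed among the error classes.

```python
import threading
import time
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    timestamp_ns: int  # monotonic clock reading when the event was recorded
    thread: str        # name of the recording thread
    label: str         # free-form description supplied by the instrumentation


class TraceRecorder:
    """Collects events from many threads into one in-memory trace."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: List[Event] = []

    def record(self, label: str) -> None:
        event = Event(time.monotonic_ns(), threading.current_thread().name, label)
        # The lock keeps the list consistent, but it also serialises the
        # instrumented threads, which is one source of probe effect.
        with self._lock:
            self._events.append(event)

    def snapshot(self) -> List[Event]:
        """Return a copy of the trace sorted by timestamp for offline review."""
        with self._lock:
            return sorted(self._events, key=lambda e: e.timestamp_ns)


def worker(recorder: TraceRecorder, n: int) -> None:
    for i in range(n):
        recorder.record(f"step {i}")


if __name__ == "__main__":
    recorder = TraceRecorder()
    threads = [threading.Thread(target=worker, args=(recorder, 3), name=f"T{k}")
               for k in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for e in recorder.snapshot():
        print(e.timestamp_ns, e.thread, e.label)
```

Timestamps are taken before the lock is acquired, so the stored list may not be in time order. For that reason snapshot() sorts the events, and the sort is stable, so each thread's events keep their program order.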

  6. Monitoring (cont.)
     - Browsing
     - Replay
     - Simulated replay
     - Probe effect
     - Regression testing
     - Accuracy of the model versus reality
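Replay records the order in which threads access shared data during a monitored run. It then forces the same order when the program is re-executed. The Python sketch below is a simplified illustration of that idea with a single shared counter. It is not the Instant Replay protocol itself, and the class and function names (ReplayableCounter, run) are hypothetical.

```python
import threading
from typing import List, Optional


class ReplayableCounter:
    """A shared counter whose access order can be recorded or enforced.

    In record mode (schedule=None) each access appends the thread name to
    self.log. In replay mode each access waits until it is that thread's turn
    according to a previously recorded schedule.
    """

    def __init__(self, schedule: Optional[List[str]] = None) -> None:
        self.value = 0
        self.log: List[str] = []
        self._schedule = schedule
        self._cond = threading.Condition()

    def increment(self) -> None:
        me = threading.current_thread().name
        schedule = self._schedule
        with self._cond:
            if schedule is not None:
                # Block until the next recorded access belongs to this thread.
                self._cond.wait_for(lambda: schedule[len(self.log)] == me)
            self.value += 1
            self.log.append(me)
            self._cond.notify_all()


def run(counter: ReplayableCounter, names: List[str], n: int) -> None:
    """Start one thread per name; each increments the counter n times."""
    def body() -> None:
        for _ in range(n):
            counter.increment()

    threads = [threading.Thread(target=body, name=name) for name in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    # Record: the access order depends on the OS scheduler.
    original = ReplayableCounter()
    run(original, ["A", "B"], 3)
    # Replay: the recorded order is enforced, so the log is identical.
    replay = ReplayableCounter(schedule=original.log)
    run(replay, ["A", "B"], 3)
    print(original.log == replay.log)  # True
```

Because the replayed run is repeatable, it can be halted and examined cyclically or reused as a regression test. The enforcement adds its own synchronisation, so timing behaviour beyond the recorded ordering is still not reproduced. This is the slide's concern about the accuracy of the model versus reality.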

  7. Major Players and Contributions
     Recent Disputations (doctoral theses)
     - Dieter Kranzmüller, “Event Graph Analysis for Debugging Massively Parallel Programs”, 2000
     - Henrik Thane, “Monitoring, Testing and Debugging Distributed Real-Time Systems”, 2000
     Seminal Papers
     - LeBlanc and Mellor-Crummey, “Debugging Parallel Programs with Instant Replay”, 1987
     - McDowell and Helmbold, “Debugging Concurrent Programs”, 1989
     - Carver and Tai, “Replay and Testing for Concurrent Programs”, 1991
     - Fidge, “Fundamentals of Distributed System Observation”, 1996
     - Schütz, “Fundamental Issues in Testing Distributed Real-Time Systems”, 1994

  8. Conferences
     - IEEE Parallel and Distributed Systems
     - IEEE Symposium on Reliable Distributed Systems
     - ACM International Symposium on Software Testing and Analysis
