LKCD – Linux Kernel Crash Dump

Harish K, Motorola Inc. Topics: introduction, the LKCD process, design considerations, kernel implementation, and user-level analysis (lcrash).


Presentation Transcript


  1. LKCD – Linux Kernel Crash Dump. Harish K, Motorola Inc.

  2. What is LKCD? Why LKCD?

  3. The Journey
      • Introduction
      • LKCD – Process
      • Design Considerations
      • Kernel Implementation
      • User Level Analysis (lcrash)

  4. Introduction
      LKCD is a set of kernel and application code to configure, implement, and analyze system crash dumps.
      Objectives:
      • Enable post-failure kernel analysis
      • Resolve kernel problems more quickly
      • Keep pace with the Linux kernel: as it becomes more complex, the need for LKCD increases

  5. LKCD - Process

  6. LKCD – Kernel Design Considerations
      The biggest design considerations were:
      • Dump Save Mechanism
      • Raw I/O vs. Buffer Cache I/O
      • Kernel Code Location
      • Dump Storage

  7. LKCD – Kernel Design Considerations: 1. Dump Save Mechanism
      • PROM save method: crash, reset the system, and have the hardware's PROM save the memory image to disk.
      • Kernel save method: crash, save the memory image to disk, and then reset the system.

  8. LKCD – Kernel Design Considerations: 1. Dump Save Mechanism
      The kernel save method was chosen because:
      • PROM/BIOS is too architecture-specific
      • A reset or power-off may clear memory
      • Kernel disk driver restrictions
      • Kernel code can be modified; PROM code is difficult to change

  9. LKCD – Kernel Design Considerations: 2. Raw I/O vs. Buffer Cache I/O
      • Buffer cache locking cannot be worked around for dumping without a major performance hit on basic I/O
      • Raw I/O was not fully supported in the Linux kernel
      • IDE, RAID, and other drivers need raw I/O hooks (the current plan is to create a driver layer above them to avoid the locking that would otherwise be necessary)

  10. LKCD – Kernel Design Considerations: 3. Kernel Code Location
      • Code changes are separated into generic and architecture-specific files:
          • kernel/vmdump.c
          • arch/<arch>/kernel/vmdump.c
      • Additional modifications were made to include/linux/sysctl.h, kernel/sysctl.c, and the kernel crash hook functions

  11. LKCD – Kernel Design Considerations: 4. Dump Storage
      • Memory dumps are saved to swap space
      • Swapping during boot-up is an issue
      • Disk partition tables are held in memory: could this cause a data corruption problem?
      • Cannot assume the filesystem layer will be available during a crash

  12. LKCD - Kernel Implementation: Dump Process Activation
      • Kernel hooks for executing the dump process:
          • The kernel directly calls panic()
          • A kernel exception occurs due to a system fault and calls die_if_kernel()
      • In both instances dump_execute() is called, which in turn calls the architecture-specific __dump_execute() to save the dump to disk

  13. LKCD - Kernel Implementation: Storing Crash Dumps (diagram: dump header, dump page headers, dump pages)

  14. LKCD - Kernel Implementation: Storing Crash Dumps
      • The first 64K of the crash dump contains the dump header, which shows the system state at the time of the kernel failure
      • Memory pages are written next, each with a page header containing:
          • the virtual address of the page in memory
          • the size of the page (important if compressed)
          • page flags (compressed, raw, dump end)
      • Finally, a page header with a special end marker is written and the dump process completes
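The layout described on this slide can be sketched as a toy writer and reader. This is only an illustration under stated assumptions: the 64K header area comes from the slide, but the field widths in `PAGE_HDR`, the flag values, and the function names are hypothetical and are not the real LKCD on-disk format.

```python
import io
import struct

# All field widths and flag values are illustrative assumptions,
# not the real LKCD on-disk format.
HEADER_SIZE = 64 * 1024           # the slide's "first 64K" header area
PAGE_HDR = struct.Struct("<QII")  # address, size, flags (assumed widths)
FLAG_RAW = 0x1
FLAG_COMPRESSED = 0x2
FLAG_END = 0x4


def write_dump(out, header_bytes, pages):
    """Write the header area, then (page header, page data) pairs, then an end marker."""
    if len(header_bytes) > HEADER_SIZE:
        raise ValueError("header does not fit in the reserved area")
    out.write(header_bytes.ljust(HEADER_SIZE, b"\0"))
    for address, data, flags in pages:
        # The size field lets a reader skip compressed pages of varying length.
        out.write(PAGE_HDR.pack(address, len(data), flags))
        out.write(data)
    out.write(PAGE_HDR.pack(0, 0, FLAG_END))


def read_pages(src):
    """Yield (address, flags, data) until the end-marker page header is seen."""
    src.seek(HEADER_SIZE)
    while True:
        raw = src.read(PAGE_HDR.size)
        if len(raw) < PAGE_HDR.size:
            raise ValueError("dump truncated before end marker")
        address, size, flags = PAGE_HDR.unpack(raw)
        if flags & FLAG_END:
            return
        data = src.read(size)
        if len(data) < size:
            raise ValueError("dump truncated inside a page")
        yield address, flags, data


def demo():
    """Round-trip two toy pages through an in-memory dump."""
    buf = io.BytesIO()
    pages = [(0xC0100000, b"A" * 4096, FLAG_RAW),
             (0xC0101000, b"short", FLAG_COMPRESSED)]
    write_dump(buf, b"toy dump header", pages)
    return list(read_pages(buf))
```

Because each page header carries its own size, the reader never needs to know the page size or the compression scheme to walk the file, and a missing end marker is a direct sign that the dump was cut short.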

  15. Kernel Dump Tunables
      • The kernel dump tunables are listed in /etc/sysconfig/vmdump, which configures the behavior of the LKCD system
      • The tunables are:
          • DUMP_ACTIVE
          • DUMPDEV
          • DUMPDIR
          • DUMP_LEVEL
          • DUMP_COMPRESS_PAGES
          • PANIC_TIMEOUT
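As a rough illustration of how such a file might be read, the sketch below parses shell-style `KEY=value` lines. It assumes the file follows the usual /etc/sysconfig convention, which the slides do not confirm; the function name and every value in `SAMPLE` are hypothetical.

```python
def parse_tunables(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    tunables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        tunables[key.strip()] = value.strip().strip('"')
    return tunables


# Hypothetical values, for illustration only; the slides list only the names.
SAMPLE = """\
# example vmdump-style settings (values are made up)
DUMP_ACTIVE=1
DUMPDEV=/dev/hypothetical-dump-device
DUMPDIR="/var/log/dump"
DUMP_LEVEL=8
DUMP_COMPRESS_PAGES=1
PANIC_TIMEOUT=5
"""

settings = parse_tunables(SAMPLE)
print(settings["DUMPDIR"])
```

All values come back as strings; a real tool would still have to validate and convert each one according to what that tunable means.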

  16. User Level Analysis - lcrash
      lcrash is a utility that generates detailed kernel information about crash dumps. It contains many features for displaying information about the events leading up to a system crash in a clear, easy-to-read manner.
      It operates in two modes:
      • Crash Dump Report Generation
      • Interactive Crash Dump Analysis

  17. User Level Analysis - lcrash: Crash Dump Report Generation
      This report contains selected pieces of information from the kernel considered most useful when trying to identify the cause of a crash. The lcrash report includes the following information:
      • General system information
      • Type of crash
      • Dump of system log_buf
      • CPU summary
      • Kernel stack trace leading up to the system PANIC
      • Disassembly of instructions before and after the instructions that caused the crash

  18. User Level Analysis - lcrash: Interactive Commands
      • For a more detailed examination of the elements of a crash
      • Kernel data is displayed in a clear, easy-to-read manner
      • Invoked via an ASCII command-line user interface featuring command-line editing and command history
      • Command output can be piped to utilities such as more and grep

  19. User Level Analysis - lcrash: Interactive Command Examples
      • stat: displays pertinent system information and the contents of the log_buf array
      • vtop: displays virtual-to-physical address mappings for both kernel and application virtual addresses
      • symbol: maps kernel symbols to virtual addresses
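For the simplest case of a virtual-to-physical mapping, a directly mapped kernel address on i386, the translation is a fixed subtraction. The sketch below assumes the default 3G/1G split (`PAGE_OFFSET` of 0xC0000000); it is not how lcrash implements vtop, which must also handle application addresses and other kernel mappings through the page tables. The function name is hypothetical.

```python
PAGE_OFFSET = 0xC0000000  # assumption: i386 with the default 3G/1G split


def kernel_vtop(vaddr):
    """Translate a directly mapped kernel virtual address to a physical address."""
    if vaddr < PAGE_OFFSET:
        raise ValueError("not a directly mapped kernel address")
    return vaddr - PAGE_OFFSET


# log_buf address taken from the lcrash example output in this presentation
print(hex(kernel_vtop(0xC0332C60)))
```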

  20. User Level Analysis - lcrash: Interactive Command Examples
      • dump: dumps the contents of system memory in a variety of bases (hexadecimal, decimal, or octal) and data sizes (byte, short, int, or long)
      • task: displays relevant information for selected tasks or all tasks running at the time of the crash
      • trace: displays a kernel stack backtrace for selected tasks, or for all tasks running on the system
      • dis: disassembles one or more machine instructions

  21. lcrash Example Output

      >> stat | head
      sysname    : Linux
      nodename   : crashme.atmyhouse.com
      release    : 2.4.8
      version    : #9 SMP Mon Dec 10 00:05:19 PST 2001
      machine    : i686
      domainname : (none)
      LOG_BUF:

      >> dump log_buf 10
      0xc0332c60: 4c3e343c 78756e69 72657620 6e6f6973 : <4>Linux version
      0xc0332c70: 342e3220 2820382e 746f6f72 74617740 :  2.4.8 (root@cra
      0xc0332c80: 79657265 70612e65                   : shme.atm
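The relationship between the hex words and the ASCII column in the dump output can be checked by hand: each 32-bit word is shown as a number, so on a little-endian i686 machine its bytes appear in memory in reverse order. A minimal sketch, with a hypothetical helper name, that decodes the first row:

```python
def words_to_ascii(words):
    """Decode 32-bit hex words as little-endian bytes; show non-printables as '.'."""
    raw = b"".join(int(w, 16).to_bytes(4, "little") for w in words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# First row of the "dump log_buf 10" output above
FIRST_ROW = ["4c3e343c", "78756e69", "72657620", "6e6f6973"]
print(words_to_ascii(FIRST_ROW))  # prints "<4>Linux version"
```

For example, 0x4c3e343c is stored as the bytes 3c 34 3e 4c, which are the characters `<`, `4`, `>`, `L`.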

  22. lcrash Example Output

      >> task
      ADDR        UID   PID  PPID  STATE  FLAGS  CPU  NAME
      ======================================================================
      0xc02e4000    0     0     0      0      0    -  swapper
      0xdfffc000    0     1     0      0  0x100    -  init
      0xdfff2000    0     2     1      1   0x40    -  keventd
      0xdffee000    0     3     0      0   0x40    -  ksoftirqd_CPU0
      [ . . . ]
      0xde47a000    0   867     1      1  0x100    -  mingetty
      0xda0fe000    0  1017   660      0  0x140    -  sshd
      0xd9c06000    0  1018  1017      1  0x100    -  bash
      0xde4b4000    0  1101  1018      0  0x100    0  insmod
      ======================================================================
      31 active task structs found

  23. lcrash Example Output

      >> t 0xda0fe000
      =========================================================
      STACK TRACE FOR TASK: 0xda0fe000(sshd)

       0 schedule+1040 [0xc0111250]
       1 schedule_timeout+121 [0xc0110d89]
       2 do_select+506 [0xc014251a]
       3 sys_select+820 [0xc01428c4]
       4 system_call+44 [0xc0106ed4]
      =========================================================
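Each frame in the trace is printed as symbol+offset next to the raw address. One common way to produce that form is a sorted symbol table searched with bisection; the sketch below illustrates the idea and is not lcrash's actual code. The start addresses are derived from the trace itself as (frame address minus offset), assuming the printed offsets are decimal, and the names `SYMBOLS`, `STARTS`, and `resolve` are hypothetical.

```python
import bisect

# Start addresses derived from the trace above as (frame address - offset),
# assuming the printed offsets are decimal. A real table has many more entries.
SYMBOLS = sorted([
    (0xC0106EA8, "system_call"),
    (0xC0110D10, "schedule_timeout"),
    (0xC0110E40, "schedule"),
    (0xC0142320, "do_select"),
    (0xC0142590, "sys_select"),
])
STARTS = [start for start, _ in SYMBOLS]


def resolve(addr):
    """Return 'symbol+offset' for the nearest symbol starting at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        raise LookupError(f"no symbol at or below {addr:#x}")
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start}"


print(resolve(0xC0111250))  # prints "schedule+1040"
```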

  24. Reference: http://lkcd.sourceforge.net • Contact: harish@motorola.com

  25. Questions/Comments?
