MC Customer Seminar: Dump Analysis Case Studies, 2002/7/18, Byung-Soo Kim, Mission Critical Support Center


Presentation Transcript


  1. MC Customer Seminar: Dump Analysis Case Studies, 2002/7/18, Byung-Soo Kim, Mission Critical Support Center

  2. Agenda
     1. System crash dump overview
     2. Dump device check
     3. Checking & changing the crash dump configuration
     4. Crash dump environment variables
     5. System crash types
     6. System crash analysis tools
     7. ESC & WTEC support process
     8. Distinguishing a panic from an HPMC with q4
     9. Stack trace example
     10. Spinlock panic dump analysis example
     11. Memory-related problems (kmeminfo, vmtrace, ...)
     12. Q & A

  3. 1. System crash dump overview (1)
     A crash dump is an extract of what was loaded in system memory when the
     system terminated abnormally (panic) or was forcibly reset (TOC). It is a
     snapshot of the kernel and of the memory contents at the moment the
     problem occurred.
     - Physical memory
     - Dump device
     - Dump file system
     - crashconf & savecrash

  4. System crash dump overview (2)
     Panic or TOC -> dump device
       The contents of memory are saved to the dump device.
       - /etc/rc.config.d/crashconf
       - /sbin/rc1.d/S440savecrash
     System reboot (booting) -> file system
       The contents stored on the dump device are saved to the file system.
       - /etc/rc.config.d/savecrash
       - /sbin/init.d/savecrash

  5. 2. Checking the system dump device
     krcdump</etc/rc.config.d> lvlnboot -v
     Boot Definitions for Volume Group /dev/vg00:
     Physical Volumes belonging in Root Volume Group:
             /dev/dsk/c1t6d0 (8/16/5.6.0) -- Boot Disk
     PV Name: lvol1   on:   /dev/dsk/c1t6d0
     Root:    lvol3   on:   /dev/dsk/c1t6d0
     Swap:    lvol2   on:   /dev/dsk/c1t6d0
     Dump:    lvol2   on:   /dev/dsk/c1t6d0, 0

  6. 3. Checking & changing the crash dump configuration (1)
     krcdump</etc/rc.config.d> crashconf -v
     CLASS     PAGES   INCLUDED IN DUMP  DESCRIPTION
     --------  ------  ----------------  -------------------------
     UNUSED     13899  no,  by default   unused pages
     USERPG     45913  no,  by default   user process pages
     BCACHE    130278  no,  by default   buffer cache pages
     KCODE       1551  no,  by default   kernel code pages
     USTACK       839  yes, by default   user process stacks
     FSDATA       457  yes, by default   file system metadata
     KDDATA     58232  yes, by default   kernel dynamic data
     KSDATA     10975  yes, by default   kernel static data

     Total pages on system:         262144
     Total pages included in dump:   70503   (70503 * 4096 bytes -> approx. 275 MB)

     DEVICE       OFFSET(kB)  SIZE (kB)  LOGICAL VOL.  NAME
     -----------  ----------  ---------  ------------  ---------------
     31:0x016000      310112    1572864  64:0x000002   /dev/vg00/lvol2
                              ---------
                                1572864

  7. Checking & changing the crash dump configuration (2)
     # crashconf -i KCODE
     # crashconf -i BCACHE
     # crashconf
     Crash dump configuration has been changed since boot.
     CLASS     PAGES   INCLUDED IN DUMP  DESCRIPTION
     --------  ------  ----------------  -------------------------
     UNUSED     13899  no,  by default   unused pages
     USERPG     45913  no,  by default   user process pages
     BCACHE    130278  yes, by default   buffer cache pages
     KCODE       1551  yes, by default   kernel code pages
     USTACK       839  yes, by default   user process stacks
     FSDATA       457  yes, by default   file system metadata
     KDDATA     58232  yes, by default   kernel dynamic data
     KSDATA     10975  yes, by default   kernel static data

     Total pages on system:         262144
     Total pages included in dump:  202332   (202332 * 4096 bytes -> approx. 790 MB)

     DEVICE       OFFSET(kB)  SIZE (kB)  LOGICAL VOL.  NAME
     -----------  ----------  ---------  ------------  ---------------
     31:0x016000      310112    1572864  64:0x000002   /dev/vg00/lvol2
                              ---------
                                1572864
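
The dump sizes quoted on slides 6 and 7 follow directly from the page counts. The sketch below redoes that arithmetic, assuming the 4096-byte page size the slides use; the dictionary and function names are made up for illustration and are not part of crashconf.

```python
PAGE_SIZE = 4096  # bytes per page, as in the slide's own "* 4096" arithmetic

# Page counts per class, copied from the crashconf output on the slides.
CLASS_PAGES = {
    "UNUSED": 13899, "USERPG": 45913, "BCACHE": 130278, "KCODE": 1551,
    "USTACK": 839, "FSDATA": 457, "KDDATA": 58232, "KSDATA": 10975,
}

DUMP_DEVICE_KB = 1572864  # size of /dev/vg00/lvol2 from the DEVICE table


def dump_size(included):
    """Return (pages, megabytes) for the given list of included classes."""
    pages = sum(CLASS_PAGES[name] for name in included)
    return pages, pages * PAGE_SIZE / (1024 * 1024)


default_classes = ["USTACK", "FSDATA", "KDDATA", "KSDATA"]
extended_classes = default_classes + ["KCODE", "BCACHE"]

default_pages, default_mb = dump_size(default_classes)
extended_pages, extended_mb = dump_size(extended_classes)

# Does the larger dump still fit on the configured dump device?
fits = extended_pages * PAGE_SIZE / 1024 <= DUMP_DEVICE_KB

print(f"default : {default_pages} pages, {default_mb:.0f} MB")
print(f"extended: {extended_pages} pages, {extended_mb:.0f} MB, fits={fits}")
```

Adding KCODE and BCACHE almost triples the dump, so a check like this against the dump device size is worth doing before changing the included classes.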

  8. 4. Crash dump environment variables
     - /etc/rc.config.d/savecrash file
         SAVECRASH=1
         SAVECRASH_DIR=/var/adm/crash
     - /etc/rc.config.d/crashconf file
         CRASHCONF_ENABLED=1
         CRASH_INCLUDED_PAGES=""
         CRASH_EXCLUDED_PAGES=""
         CRASHCONF_READ_FSTAB=1
         CRASHCONF_REPLACE=1

  9. 5. System crash types
     1. Panic
        - Spinlock
        - Data page fault
        - kmalloc panic
     2. TOC
        - Manual TOC: console, remote access
        - MC/SG TOC
     3. HPMC

  10. 6. System crash analysis tools
      - q4 & p4
      - kmeminfo
      - vmtrace
      - lanshow
      - seminfo
      - shminfo
      - /usr/contrib/bin/q4
      - dump-analyze.tar.gz
      - getasm
      - adb
      - tusc

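  11. 7. ESC & WTEC support process
      Parties: Customer, Korea ESC, WTEC, LAB
      - Call received
      - WTEC support requested
      - Technical support
      - Support direction proposed
      - New problems discussed
      - Solution provided
      - In cooperation with WTEC: beta patch, release patch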

  12. 8. Distinguishing a panic from an HPMC with q4
      # cd /var/adm/crash/crash.0
      # ls
      INDEX      image.1.2  image.1.4  image.1.6  image.2.1  vmunix
      image.1.1  image.1.3  image.1.5  image.1.7  image.3.1
      # /usr/contrib/bin/q4 .
      @(#) q4 $Revision: A.11.10dl $ $Fri Jun 23 18:05:11 PDT 2000 0
      Reading kernel symbols ...
      Reading kernel data types ...
      Initialized PA-RISC 2.0 address translator ...
      Initializing stack tracer ...

      Panic:
      q4> trace event 0
      stack trace for event 0
      crash event was panic
      panic+0x14
      report_trap_or_int_and_panic+0x80
      trap+0xdb8
      nokgdb+0x8
      soo_select+0x10
      pollscan+0xa8
      poll+0x104
      syscall+0x480

      HPMC:
      q4> trace event 0
      stack trace for event 0
      crash event was an HPMC
      skip_int_restore_crs2_0+0x8
      idle+0x3f4
      swidle_exit+0x0

  13. INDEX file
      # cat INDEX
      comment   savecrash crash dump INDEX file
      version   2
      hostname  AA123
      modelname 9000/800/N4000-44
      panic     pdvtopg_size2_0 called with bogus address (no translation)
      dumptime  1018724765 Sun Apr 14 04:06:05 KST 2002
      savetime  1018733443 Sun Apr 14 06:30:43 KST 2002
      release   @(#) $Revision: vmunix: vw: -proj selectors: CUPI80_BL2000_1108 -c 'Vw for CUPI80_BL2000_1108 build' -- cupi80_bl2000_1108 'CUPI80_BL2000_1108' Wed Nov 8 19:24:56 PST 2000 $
      memsize   8589934592
      chunksize 268435456
      module    /stand/vmunix vmunix 14436240 3149385108
      module    /stand/dlkm/mod.d/undel undel 100640 2853146214
      warning   savecrash: savecrash running in the background
      image     image.1.1 0x000000000000 0x000000000fff8000 0x000000000000 0x000000000001883f 2246856296
      image     image.1.2 0x000000000000 0x000000000fb68000 0x000000018840 0x000000000007ffff 3581939390
      image     image.2.1 0x000000000000 0x000000000fff5000 0x000000100000 0x0000000000142f5f 2736275502
      image     image.2.2 0x000000000000 0x000000000fff7000 0x000000142f60 0x000000000015a437 3525879551
      image     image.2.3 0x000000000000 0x000000000fffe000 0x00000015a438 0x000000000016e17f 2007137303
      image     image.2.4 0x000000000000 0x0000000009f82000 0x00000016e180 0x00000000001fffff 1373600558
      image     image.3.1 0x000000000000 0x0000000000010000 0x000000280000 0x00000000002fffff 4215202376
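
The INDEX file is plain "keyword value" text, so its headline facts are easy to pull out with a script. This is a minimal sketch rather than a savecrash utility: `parse_index` is a hypothetical helper, the sample text is a subset of the INDEX shown on slide 13, and KST is assumed to be UTC+9.

```python
from datetime import datetime, timedelta, timezone

# Subset of the INDEX file from slide 13.
INDEX_TEXT = """\
hostname AA123
panic pdvtopg_size2_0 called with bogus address (no translation)
dumptime 1018724765 Sun Apr 14 04:06:05 KST 2002
savetime 1018733443 Sun Apr 14 06:30:43 KST 2002
memsize 8589934592
chunksize 268435456
"""

KST = timezone(timedelta(hours=9))  # assumption: KST is UTC+9


def parse_index(text):
    """Map each keyword to the list of values seen for it (keywords may repeat)."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields.setdefault(key, []).append(value.strip())
    return fields


fields = parse_index(INDEX_TEXT)

# The first token after dumptime/savetime is a Unix epoch timestamp.
dump_epoch = int(fields["dumptime"][0].split()[0])
save_epoch = int(fields["savetime"][0].split()[0])

dump_time = datetime.fromtimestamp(dump_epoch, tz=KST)
save_delay = timedelta(seconds=save_epoch - dump_epoch)
memsize_gib = int(fields["memsize"][0]) / 2**30

print("panic string :", fields["panic"][0])
print("dumped at    :", dump_time.strftime("%Y-%m-%d %H:%M:%S"))
print("saved after  :", save_delay)
print("memory (GiB) :", memsize_gib)
```

The gap between dumptime and savetime shows how long it took from the crash until savecrash recorded the dump on the file system.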

  14. 9. Stack trace example (1)
      q4> trace -u event 0
      LEVEL    FUNC                                   ARG0 ARG1 ARG2 ARG3 ARG4 ARG5 ARG6 ARG7
      lev  0)  panic+0x6c                             0'00757110  n/a  0'00756e50  n/a  n/a  n/a  n/a  n/a
      lev  1)  pdvtopg_size+0x104                     n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev  2)  hdl_virt_to_pgsize+0x10                0'00000000  0'17e5f000  n/a  n/a  n/a  n/a  n/a  n/a
      lev  3)  kfree_to_superpage+0x30                n/a  0'00000001  0'00000000  n/a  n/a  n/a  n/a  n/a
      lev  4)  kalloc_from_superpage+0x240            n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev  5)  kalloc+0x14                            0'fff8477a  0'00000002  n/a  n/a  n/a  n/a  n/a  n/a
      lev  6)  alloc_mem+0x44                         0xffffffff'fff8477a  0'00002  0'00000  0'0000  n/a  n/a
      lev  7)  get_kmem+0x8c                          n/a  0'00000001  0'00000000  n/a  n/a  n/a  n/a  n/a
      lev  8)  kmem_arena_xlarge_alloc+0x74           0'40001240  0'00000  n/a  0'0018c590  n/a  n/a  n/a
      lev  9)  kmalloc+0x1e4                          n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev 10)  klvxbread2+0x48                        n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev 11)  VxTyped4IndirGetFileAllocTable+0x94    n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev 12)  VxTyped4IndirGetFileAllocTable+0x1b8   n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev 13)  VxTyped4GetFileAllocTable+0xc8         n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev 14)  VxfsGetFileAllocTable+0x168            n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev 15)  FMakeOneRecord+0x1d4                   n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev 16)  my_unlink+0x6a4                        0x400003ff'ffff03a0  n/a  0'00ed1aa8  n/a  n/a  n/a
      lev 17)  syscall+0x750                          n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a
      lev 18)  $syscallrtn+0x0                        n/a  n/a  n/a  n/a  n/a  n/a  n/a  n/a

  15. Stack trace example (2)
      krcdump</hasc/kbs/program> cat a.c
      func_b(a,b)
      int a;
      int b;
      {
          int sum;
          sum = a + b;
          return(sum);
      }
      main()
      {
          int a,b;
          int sum;
          a = 1;
          b = 2;
          sum = func_b(a,b);
      }

      main:
      main:        STW    r2,-20(r30)
      main+4:      LDO    128(r30),r30
      main+8:      LDI    1,r20
      main+0xC:    STW    r20,-112(r30)
      main+10:     LDI    2,r21
      main+14:     STW    r21,-108(r30)
      main+18:     LDW    -112(r30),r26
      main+1C:     LDW    -108(r30),r25
      main+20:     LDIL   L%0x2000,r31
      main+24:     BE,L   0x660(sr4,r31),sr0,r31
      main+28:     COPY   r31,r2
      main+2C:     STW    r28,-104(r30)
      main+30:     LDW    -148(r30),r2
      main+34:     BV     r0(r2)
      main+38:     LDO    -128(r30),r30

      func_b:
      func_b:      LDO    128(r30),r30
      func_b+4:    STW    r26,-164(r30)
      func_b+8:    STW    r25,-168(r30)
      func_b+0xC:  LDW    -164(r30),r1
      func_b+10:   LDW    -168(r30),r31
      func_b+14:   ADD    r1,r31,r19
      func_b+18:   STW    r19,-96(r30)
      func_b+1C:   LDW    -96(r30),r28
      func_b+20:   BV     r0(r2)
      func_b+24:   LDO    -128(r30),r30

  16. Stack trace example (3)
      func_A( ) {
          .....
          func_B( );            0x40
          func_C( );
          .....
      }
      func_B( ) {
          .....
          func_B_11( );         0xE30
          func_B_22( );
          .....
      }
      func_B_11( ) {
          .....
          func_B_11_aa( );
          func_B_11_bb( );      0xA10
          .....
      }
      func_B_11_aa( ) {
          .....
      }
      func_B_11_bb( ) {
          .....
          .....                 -> panic occurs here, 0x104
          .....
      }
      func_B_22( ) {
          .....
          func_B_22_aa( );
          func_B_22_bb( );
          .....
      }

      q4> trace event 0
      stack trace for event 0
      crash event was a panic
      panic+0x6c
      func_B_11_bb+0x104
      func_B_11+0xA10
      func_B+0xE30
      func_A+0x40
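
A q4 trace lists the innermost frame first, so reading it bottom-up gives the call chain that led to the panic. The sketch below splits the `function+0xoffset` frames from slide 16 and reverses them; the regular expression and helper name are illustrative assumptions, not part of q4.

```python
import re

# A frame is "<symbol>+0x<hex offset>"; symbols may contain '$' (e.g. $syscallrtn).
FRAME_RE = re.compile(r"^(?P<func>[$\w]+)\+0x(?P<off>[0-9a-fA-F]+)$")

# Frames exactly as printed by "trace event 0" on slide 16 (innermost first).
TRACE = [
    "panic+0x6c",
    "func_B_11_bb+0x104",
    "func_B_11+0xA10",
    "func_B+0xE30",
    "func_A+0x40",
]


def parse_frame(frame):
    """Split one frame into (function name, byte offset into that function)."""
    match = FRAME_RE.match(frame)
    if match is None:
        raise ValueError(f"unrecognised frame: {frame!r}")
    return match["func"], int(match["off"], 16)


frames = [parse_frame(f) for f in TRACE]

# Outermost caller first: the order in which the functions were entered.
call_chain = [func for func, _ in reversed(frames)]

print(" -> ".join(call_chain))
for func, offset in frames:
    print(f"{func:14s} offset {offset:#x} ({offset} bytes)")
```

Each offset marks where execution was inside that function, which is what the 0x40, 0xE30, 0xA10 and 0x104 annotations on the slide point at.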

  17. 10. Spinlock panic analysis example (1)
      Spinlock timeout failure:
      The spinlock code has NOT failed! Instead, some spinlock
      using code has failed to release a spinlock soon enough.
      Address: 0x00000001000b1180X ; owner 0xD78B20 ; lock 0x0 ; flag 0x1 next_cpu 0x4
      Address: 0x00000001000b10a0X ; owner 0xD75858 ; lock 0x0 ; flag 0x1
      Milliseconds spent spinning = 60001
      Millseconds/sec = 1000
      panic: Spinlock deadlock!
      PC-Offset Stack Trace (read across, top of stack is 1st):
        0x002ea9bc  0x0009ca60  0x000f037c
        0x0003485c  0x005a9eb0  0x005a9e14
      End Of Stack

      Processor 0: running thread @ 0x794dc040 (tid 574427)
        command was /opt/omni/lbin/bma
        in system mode since 60.05 seconds (6005 ticks), here is the stack trace:
      -------------
      check_panic_loop+0x38
      trap+0xb2c
      nokgdb+0x8
      spluser+0x14
      syscall+0x59c
      $syscallrtn+0x0
      -------------
      Processor 1: servicing interrupt
      -------------
      panic+0x6c
      too_much_time+0x2e8
      wait_for_lock+0x174
      sl_retry+0x1c
      chanq_timer_expire+0x28
      invoke_callouts_for_self+0x9c
      sw_service+0x100
      mp_ext_interrupt+0x1ec
      ivti_patch_to_nop3+0x0
      check_panic_loop+0x3c
      trap+0xb2c
      nokgdb+0x8

  18. Spinlock panic analysis example (2)
      Address: 0x00000001000b1180X; owner 0xD78B20; lock 0x0 ; flag 0x1 next_cpu 0x4
      Milliseconds spent spinning = 60001
      Millseconds/sec = 1000

      q4> load struct lock from 0x1000b1180
      loaded 1 struct lock as an array (stopped by max count)
      q4> print -tx
      indexof      0
      mapped       0x1
      spaceof      0
      addrof       0x1000b1180
      physaddrof   0x480b1180
      realmode     0
      sl_lock      0           -> locked
      sl_owner     0xd78b20    -> the CPU that was holding this lock
      sl_flag      0x1         -> some process is waiting for this lock
      sl_next_cpu  0x4
      sl_indirect  0
      sl_name_ptr  0x17a56e
      sl_pad[0]    0
      q4> examine 0x17a56e<<2 using s
      schedlock

  19. Spinlock panic analysis example (3)
      q4> load struct mpinfo from sl_owner
      loaded 1 struct mpinfo as an array (stopped by max count)
      q4> print -tx prochpa procindex
      prochpa    0xfffffffffc060000
      procindex  0x3
      q4> load struct mpinfo from mpproc_info max nmpinfo
      loaded 6 struct mpinfos as an array (stopped by max count)
      q4> print -x procindex prochpa
      procindex  prochpa
      0          0xfffffffffc220000
      0x1        0xfffffffffc020000
      0x2        0xfffffffffc0a0000
      0x3        0xfffffffffc060000   <- lock owner
      0x4        0xfffffffffc0e0000
      0x5        0xfffffffffc260000
      q4>

  20. Spinlock panic analysis example (4)
      SPINLOCK(rootvfs_lock);
      if (vfsp == rootvfs) {
          if (rootvfs->vfs_next)
              panic("vfs_remove: unmounting root");
          rootvfs = NULL;
          SPINUNLOCK(rootvfs_lock);
          vfs_unlock(vfsp);
          return;
      }
      for (tvfsp = rootvfs; tvfsp != (struct vfs *)0; tvfsp = tvfsp->vfs_next) {
          if (tvfsp->vfs_next == vfsp) {
              tvfsp->vfs_next = vfsp->vfs_next;
              SPINUNLOCK(rootvfs_lock);
              vp = vfsp->vfs_vnodecovered;
              vp->v_vfsmountedhere = (struct vfs *)NULL;
              vfs_unlock(vfsp);
              VN_RELE(vp);
              return;
          }
      }
      SPINUNLOCK(rootvfs_lock);

  21. Spinlock panic analysis example (5)

  22. 11. Memory-related problems (1)
      +--------------------------------------------+
      |        Performance Related Globals         |
      +--------------------------------------------+
      Physical memory in pages: 1835004   7167.98 MBytes
      desfree in pages        :    5120     20.00 MBytes
      minfree in pages        :    2304      9.00 MBytes
      freemem in pages        :    2178      8.51 MBytes
      sleepmem in pages       :    2304      9.00 MBytes
      avefree in pages        :    2176      8.50 MBytes
      avefree30 in pages      :    2160      8.44 MBytes

      There were 55 lower priority threads waiting for memory.
      They were:
      thread[26] @ 0x6e06b10, tid 1318, pri 130, cmd getty
      thread[27] @ 0x6e06d18, tid 404, pri 130, cmd syncer
      thread[88] @ 0x6e0e900, tid 876, pri 130, cmd snmpdm
      thread[91] @ 0x6e0ef18, tid 604, pri 130, cmd inetd
      thread[92] @ 0x6e0f120, tid 897, pri 130, cmd mib2agt
      thread[93] @ 0x6e0f328, tid 579, pri 130, cmd rpcbind
      thread[94] @ 0x6e0f530, tid 497, pri 130, cmd syslogd
      thread[98] @ 0x6e0fd50, tid 869, pri 130, cmd sendmail
      ...

  23. Memory-related problems (2)
      Descending List of most frequent waitchannel's
      ==============================================
                                              sleeptime in ticks:
      waitchannel                  # threads   longest   shortest
      ---------------------------  ---------  --------  ---------
      memory_sleepers                     55   1095559      19740
      vx_inactive_thread_sv+0x10          26     50380        190
      selwait                             22  15023769         91
      vx_inactive_thread_sv+0x18          15     28380        380
      lvmkd_q                              6      3123       3123
      streams_mp_sync                      5   1421106    1243523
      vx_inactive_thread_sv                5      8380        380
      ubase                                3   1478338         77
      vx_inactive_thread_sv+0x20           3      4380        380
      pm_sigwait+0x280                     2        80         14
      hpstreams_read_int+0x238             2  15003499    1408477
      streams_blk_sync                     2  15042552   15042552
      ticks_since_boot                     2        42         42

      Ticks since boot: 15050632
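
Sleep times in ticks are easier to judge once converted to wall-clock time. The sketch below assumes 100 ticks per second, a rate inferred from the "60.05 seconds (6005 ticks)" line on slide 17 rather than read from the dump itself; the helper name is made up for illustration.

```python
TICKS_PER_SEC = 100  # assumption: inferred from "60.05 seconds (6005 ticks)"


def ticks_to_hms(ticks):
    """Convert a tick count to (hours, minutes, seconds), dropping partial seconds."""
    seconds = ticks // TICKS_PER_SEC
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs


# Values copied from the waitchannel listing on slide 23.
uptime = ticks_to_hms(15050632)             # "Ticks since boot"
longest_mem_sleep = ticks_to_hms(1095559)   # longest memory_sleepers sleep

print("uptime              : %dh %02dm %02ds" % uptime)
print("longest memory sleep: %dh %02dm %02ds" % longest_mem_sleep)
```

Under that assumption the longest thread in memory_sleepers had been waiting for memory for about three hours out of roughly 42 hours of uptime.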

  24. Memory-related problems (3): kmeminfo
      kmeminfo (3.29)
      libp4 (6.60): Opening ./vmunix ./INDEX
      Loading symbols from ./vmunix
      Kernel TEXT pages not requested in crashconf
      Will use an artificial mapping from ./vmunix TEXT pages
      Processing pfdat table (1016730 entries) ...
      ----------------------------------------------------------------------
      Physical memory usage summary (in pages):
      Physmem   = 1835004  Physical memory
      Freemem   =    2178  Free physical memory
      Used      = 1832826  Used physical memory
      System    =  362341  By kernel:
      Static    =   42221  for text and static data
      Dynamic   =  169727  for dynamic data
      Bufcache  =  146799  for file-system buffer cache
      Eqmem     =      42  for equiv. mapped page pool
      SCmem     =    3552  for system critical page pool
      User      = 1425163  By user processes:  ----------> 5567 MB
      Uarea     =    1344  for thread uareas
      Disowned  =   43978  Disowned pages
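
The kmeminfo summary can be cross-checked with simple arithmetic, which also shows where the "5567 MB" annotation comes from. This is a sketch using the figures on slide 24 and the 4096-byte page size from the crashconf slides; the variable names are illustrative only.

```python
PAGE_SIZE = 4096  # bytes per page, as on the crashconf slides

# Page counts copied from the kmeminfo summary on slide 24.
summary = {
    "Physmem": 1835004, "Freemem": 2178, "Used": 1832826,
    "System": 362341, "Static": 42221, "Dynamic": 169727,
    "Bufcache": 146799, "Eqmem": 42, "SCmem": 3552,
    "User": 1425163, "Uarea": 1344, "Disowned": 43978,
}


def pages_to_mb(pages):
    """Convert a page count to megabytes (2**20 bytes)."""
    return pages * PAGE_SIZE / 2**20


# Used memory is simply everything that is not free.
used_ok = summary["Physmem"] - summary["Freemem"] == summary["Used"]

# The kernel total is the sum of its five sub-categories.
kernel_parts = ["Static", "Dynamic", "Bufcache", "Eqmem", "SCmem"]
system_ok = sum(summary[k] for k in kernel_parts) == summary["System"]

# Kernel + user + uareas + disowned pages account for all used pages.
top_level = ["System", "User", "Uarea", "Disowned"]
breakdown_ok = sum(summary[k] for k in top_level) == summary["Used"]

user_mb = pages_to_mb(summary["User"])
user_share = summary["User"] / summary["Physmem"]

print(f"checks: used={used_ok} system={system_ok} breakdown={breakdown_ok}")
print(f"user processes: {user_mb:.0f} MB ({user_share:.1%} of physical memory)")
```

With user processes holding more than three quarters of physical memory and under 9 MB free, the per-process view from `kmeminfo -u` is the natural next step.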

  25. Memory-related problems (4): kmeminfo -u
      kmeminfo (3.39)
      libp4 (7.33): Opening ./vmunix ./INDEX
      Loading symbols from ./vmunix
      Kernel TEXT pages not requested in crashconf
      Will use an artificial mapping from ./vmunix TEXT pages
      ----------------------------------------------------------------------
      Summary of user processes memory usage:
      Process list sorted by resident set size ...
      proc         vas          p_pid  va_rss  va_prss  va_ucount  command
      -----------------------------------------------------------------------
      0x002d70980  0x0420acb00    783   40339    40298     112760  mib2agt
      0x002d7b600  0x042c84500   3222    9535     1675     114346  oracle  (37 MB)
      0x002d7f980  0x042a50100   3224    8499      639     113354  oracle
      0x002d75480  0x042e14300  12241    8199      339     113018  oracle
      0x002d7bd80  0x042c31900   3228    8131      271     112938  oracle
      0x002d82a40  0x042c30700   3233    8127      267     112950  oracle
      0x002d74940  0x0421e5c00   3235    8115      255     112918  oracle
      0x002d7ae80  0x042c01600   3231    8099      239     112906  oracle
      0x002d80100  0x042ba8a00  22349    3576     3154       5489  tnslsnr
      0x002d7a700  0x042304000   1296     887      754       3343  opcctla
      0x002d7e6c0  0x0422cac00   2150     845      587       3406  rep_server
      ...

  26. Memory-related problems (5): shminfo
      shminfo (3.5)
      libp4 (6.24): Opening ./vmunix ./INDEX
      Loading symbols from ./vmunix
      Kernel TEXT pages not requested in crashconf
      Will use an artificial mapping from ./vmunix TEXT pages

      Global 64-bit shared quadrants:
      ===============================
          Space      Start               End                  Kbytes  Usage
      Q4  0x00b4f000.0xc000000000000000-0xc000000000045fff      280  OTHER
      Q4  0x00b4f000.0xc000000000048000-0xc00000000004efff       28  OTHER
      Q4  0x00b4f000.0xc00000000017c000-0xc000000000fbffff    14608  FREE
      Q4  0x00b4f000.0xc000000000fd6000-0xc000000003ffffff    49320  FREE
      Q4  0x00b4f000.0xc000000004000000-0xc00000007424afff  1837356  SHMEM id=4773376
      Q4  0x00b4f000.0xc00000007424b000-0xc000000077ffffff    63188  FREE
      Q4  0x00b4f000.0xc000000078000000-0xc0000000e824afff  1837356  SHMEM id=693780
      Q4  0x00b4f000.0xc0000000e824b000-0xc0000000ebffffff    63188  FREE
      Q4  0x00b4f000.0xc0000000ec000000-0xc00000015c24afff  1837356  SHMEM id=662549
      Q4  0x00b4f000.0xc00000015c24b000-0xc00000015fffffff    63188  FREE
      Q4  0x00b4f000.0xc000000160000000-0xc0000001d024afff  1837356  SHMEM id=58390
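
The Kbytes column of the shminfo listing is just the inclusive length of each address range. The sketch below recomputes it for the SHMEM segments on slide 26 and totals them; `range_kbytes` is a hypothetical helper, not a shminfo option.

```python
def range_kbytes(start, end):
    """Size in KB of the inclusive address range [start, end]."""
    return (end - start + 1) // 1024


# (start, end, shared memory id) for the SHMEM rows on slide 26.
SHMEM_SEGMENTS = [
    (0xc000000004000000, 0xc00000007424afff, 4773376),
    (0xc000000078000000, 0xc0000000e824afff, 693780),
    (0xc0000000ec000000, 0xc00000015c24afff, 662549),
    (0xc000000160000000, 0xc0000001d024afff, 58390),
]

segment_kb = {shmid: range_kbytes(start, end)
              for start, end, shmid in SHMEM_SEGMENTS}
total_shmem_kb = sum(segment_kb.values())

# One of the FREE gaps between two segments, for comparison.
free_gap_kb = range_kbytes(0xc00000007424b000, 0xc000000077ffffff)

for shmid, kb in segment_kb.items():
    print(f"shm id {shmid:>8}: {kb} KB")
print(f"total SHMEM: {total_shmem_kb} KB ({total_shmem_kb / 2**20:.2f} GB)")
print(f"free gap   : {free_gap_kb} KB")
```

The four segments shown are the same size, and together they cover roughly 7 GB of shared address space on a system whose kmeminfo output reports about 7 GB of physical memory.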

  27. Memory-related problems (6): vmtrace
      Virtual Memory Trace: the tracing mechanism basically lets you do three things:
      - Detect memory corruptions
      - Detect memory leaks
      - General logging of allocation and free operations
      ***
      The following menu will appear.
       0) END OF LIST
       1) 32 byte bucket
       2) 64 byte bucket
       3) 128 byte bucket
       4) 256 byte bucket
       5) 512 byte bucket
       6) 1024 byte bucket
       7) 2048 byte bucket
       8) 1 page bucket
       9) 2 page bucket
      10) 3 page bucket
      11) 4 page bucket
      12) 5 page bucket
      13) 6 page bucket
      14) 7 page bucket
      15) 8 page bucket
      16) > 8 pages
      Enter bucket sizes(only one at each prompt)[0- 16]>1
      Enter bucket sizes(only one at each prompt)[0- 16]>0

       0) END OF LIST
       1) Tracing for Memory Corruption
       2) Tracing for Memory Leaks
       3) Tracing for Logging
      Enter type of tracing(only one at this prompt) [0- 3]>1
      Enter type of tracing(only one at this prompt) [0- 3]>3
      Enter type of tracing(only one at this prompt) [0- 3]>0

  28. Memory-related problems (7): vmtrace
      +--------------------------------------+
      |          Processor activity          |
      +--------------------------------------+
      Processor 1 started it by panic'ing. Here is the stack trace:
      stack trace for event 0
      crash event was a panic
      ...
      LEVEL    FUNC                         ARG0        ARG1  ARG2        ARG3
      lev  0)  panic+0x10                   0x2d6d8     n/a   n/a         n/a
      lev  1)  kalloc+0x174                 0x1         0x2   n/a         n/a
      lev  2)  kalloc_from_superpage+0xc8   0x1         0x2   n/a         n/a
      lev  3)  kmalloc+0x358                0x400       0x16  n/a         n/a    (0x400 -> 1024)
      lev  4)  kmem_alloc+0x114             0x400       n/a   n/a         n/a
      lev  5)  pn_alloc+0x14                n/a         n/a   n/a         n/a
      lev  6)  pn_get+0x1c                  n/a         n/a   0x7ffe6c30  n/a
      lev  7)  lookupname+0x18              n/a         n/a   n/a         0
      lev  8)  stat1+0x34                   n/a         0x1   0x20        n/a
      lev  9)  stat+0x14                    0x7ffe6268  n/a   n/a         n/a
      lev 10)  syscall+0x1a0                n/a         n/a   n/a         n/a
      lev 11)  $syscallrtn+0x0              n/a         n/a   n/a         n/a

  29. Memory-related problems (8): vmtrace
      ------------------------------------------------------------
      The following is the information in the Leak Log
      The records are printed in no particular order.
      This lists all memory allocations which have not been FREE'd.
      The stack trace for the corresponding address indicates the
      location at which it was allocated.
      Each record is printed in the following format:
          Pid, Address, Size, Type, Time
          stack trace where it was allocated
      ------------------------------------------------------------
      8304 0x33800000 576 0 Tue Jun 30 22:09:47 1998
          vmtrace_kmalloc+0x1b4
          kmalloc+0x15c
          poll+0x2a0
          syscall+0x1a0
          $syscallrtn
      8304 0x33400000 560 0 Tue Jun 30 22:09:29 1998
          vmtrace_kmalloc+0x1b4
          kmalloc+0x15c
          poll+0x2a0
          syscall+0x1a0
          $syscallrtn
      8304 0x33000000 672 0 Tue Jun 30 22:08:51 1998
          vmtrace_kmalloc+0x1b4
          kmalloc+0x15c
          poll+0x2a0
          syscall+0x1a0
          $syscallrtn
      ...
      8304 0x2e002000 528 0 Tue Jun 30 21:55:05 1998
          vmtrace_kmalloc+0x1b4
          kmalloc+0x15c
          poll+0x2a0
          syscall+0x1a0
          $syscallrtn
      Number of entries printed: 100
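
With a hundred leak-log records, the useful question is which allocation path keeps recurring. The sketch below groups records by process and allocating stack and totals the unfreed bytes. It uses only the four records visible on slide 29, and the tuple layout and function name are assumptions made for this illustration, not vmtrace output formats.

```python
from collections import defaultdict

# The allocating stack shared by every record shown on slide 29.
POLL_STACK = (
    "vmtrace_kmalloc+0x1b4", "kmalloc+0x15c", "poll+0x2a0",
    "syscall+0x1a0", "$syscallrtn",
)

# (pid, address, size in bytes, stack) for the four records visible on the slide.
LEAK_RECORDS = [
    (8304, 0x33800000, 576, POLL_STACK),
    (8304, 0x33400000, 560, POLL_STACK),
    (8304, 0x33000000, 672, POLL_STACK),
    (8304, 0x2e002000, 528, POLL_STACK),
]


def summarize_leaks(records):
    """Group records by (pid, stack); return [(key, [count, bytes])], largest first."""
    totals = defaultdict(lambda: [0, 0])
    for pid, _address, size, stack in records:
        entry = totals[(pid, stack)]
        entry[0] += 1
        entry[1] += size
    return sorted(totals.items(), key=lambda item: item[1][1], reverse=True)


leak_summary = summarize_leaks(LEAK_RECORDS)

for (pid, stack), (count, total_bytes) in leak_summary:
    # stack[2] is the first frame above the kmalloc wrappers, i.e. the real caller.
    print(f"pid {pid}: {count} unfreed allocations, {total_bytes} bytes, via {stack[2]}")
```

Every visible record comes from the same process through `poll`, so grouping collapses them into a single suspect path.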

  30. Process trace tool: tusc
      # ./tusc -T %T -v -l -E -p nnnnn mmmm
      ( Attached to process 1471: "./ns-admin -d /var/opt/netscape/server4/admin-serv/config" [32-bit] )
      14:16:46 [1471]{1582} ksleep(PTH_CONDVAR_OBJECT, 0x40177acc, 0x40177ad4, 0x7694062c) [sleeping]
      14:16:46 [1471]{1583} sigwait(0x4002af44, 0x7683b128) ...................... [sleeping]
                            set : SIGTERM
      14:16:46 [1471]{1584} ksleep(PTH_CONDVAR_OBJECT, 0x401428cc, 0x401428d4, 0x7682a1ac) [sleeping]
      14:16:46 [1471]{1607} ksleep(PTH_CONDVAR_OBJECT, 0x401828cc, 0x401828d4, 0x768191ec) [sleeping]
      14:16:46 [1471]{1608} poll(0x76808550, 1, 5000) ............................ [sleeping]
      14:16:46 [1471]{1609} poll(0x767f7550, 1, 5000) ............................ [sleeping]
      14:16:49 [1471]{1584} ksleep(PTH_CONDVAR_OBJECT, 0x401428cc, 0x401428d4, 0x7682a1ac) = -ETIMEDOUT
      14:16:49 [1471]{1584} gettimeofday(0x7682a1a4, 0) .......................... = 0
      14:16:49 [1471]{1584} clock_gettime(CLOCK_REALTIME, 0x7682a250) ............ = 0
      14:16:49 [1471]{1582} ksleep(PTH_CONDVAR_OBJECT, 0x40177acc, 0x40177ad4, 0x7694062c) = -ETIMEDOUT
      14:16:49 [1471]{1582} gettimeofday(0x769405e0, 0) .......................... = 0
      14:16:49 [1471]{1582} gettimeofday(0x76940624, 0) .......................... = 0
      14:16:49 [1471]{1582} clock_gettime(CLOCK_REALTIME, 0x769406d0) ............ = 0
      14:16:49 [1471]{1607} ksleep(PTH_CONDVAR_OBJECT, 0x401828cc, 0x401828d4, 0x768191ec) = -ETIMEDOUT
      14:16:49 [1471]{1607} gettimeofday(0x768191e4, 0) .......................... = 0
      14:16:49 [1471]{1607} clock_gettime(CLOCK_REALTIME, 0x76819290) ............ = 0
      14:16:50 [1471]{1584} ksleep(PTH_CONDVAR_OBJECT, 0x401428cc, 0x401428d4, 0x7682a1ac) = -ETIMEDOUT
      14:16:50 [1471]{1584} gettimeofday(0x7682a1a4, 0) .......................... = 0
      14:16:50 [1471]{1584} clock_gettime(CLOCK_REALTIME, 0x7682a250) ............ = 0
      14:16:50 [1471]{1582} ksleep(PTH_CONDVAR_OBJECT, 0x40177acc, 0x40177ad4, 0x7694062c) = -ETIMEDOUT
      14:16:50 [1471]{1582} gettimeofday(0x769405e0, 0) .......................... = 0

  31. System hang & TOC
      - Was console access to the system possible?
      - Did ping or telnet from another system work?
      - Was anything changed, or was anything abnormal, around the time the problem occurred?
      - Was it a partial hang of a subsystem (e.g. a network card)?
      - What was the memory usage?
      - MC/SG TOC
      ** Files that help with system crash analysis
         OLDsyslog.log, shutdownlog, /var/tombstones/ts99
      ** List of installed patches
         ( swlist -l patch -a patch_state PH\* > swlist.txt )

  32. 12. Dump analysis case studies: Q & A
      HP ESC, Byung-Soo Kim: byung-soo_kim@hp.com

  33. Thanks
