1 / 57

ASE 131: Troubleshooting ASE

ASE 131: Troubleshooting ASE. David Wein Principal Product Support Engineer david.wein@sybase.com July 17 th , 2003. Agenda. Introduction Presentation Goals Problem Solving Resources Understanding ASE Diagnostics Finding information in error messages Decoding stack traces

theta
Download Presentation

ASE 131: Troubleshooting ASE

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ASE 131: Troubleshooting ASE David WeinPrincipal Product Support Engineer david.wein@sybase.com July 17th, 2003

  2. Agenda • Introduction • Presentation Goals • Problem Solving Resources • Understanding ASE Diagnostics • Finding information in error messages • Decoding stack traces • Picking good key words for research • Trace Flags • The shared memory dump facility • Types of Problems • Performance Issues • Locking / Blocking / Deadlocking • Low-Level Task Related Problems • Backup and Recovery Issues

  3. Presentation Goals • Help you help yourself • What resources are available to customers? • Understand how Sybase Tech Support works on problems • What is the process? What can you do to help? • Understand ASE’s diagnostics • Errors, stack traces, and memory dumps • Understand how to deal with specific problems that may be encountered with ASE • Minimize impact by understanding what is happening within ASE and how to handle it.

  4. Online Problem Solving Resources • Online support home is http://www.sybase.com/support • Product Manuals • Technical Documents • Newsgroups • Solved Cases / Existing Change Requests • EBF Downloads • Case Management • ASE documents are very strong • Troubleshooting Guide • Early chapters are a wealth of information on getting out of trouble and staying out of trouble. • Also contains detailed write-ups on many ASE messages and should always be checked as a first step. • System Administration (SAG) and Performance and Tuning (P+T) guides are also very useful for understanding the area of ASE where you may be having problems

  5. Online Problem Solving Resources, Continued • Public newsgroups are an invaluable service • Over a dozen ASE newsgroups covering general issue to specific topcis such as High Availability and Linux • Searchable archive • Informally monitored by many Sybase employees and lots of experienced customers • Solved cases contains over 17,000 searchable support cases with answers • Common problems and their answers are well documented in this knowledge base. • Hundreds of additional entries are added every month • Change Requests (CRs or bugs) are also searchable • Contains most CRs reported by customers starting in May of 2002. • Contains a summary and applicable version

  6. P&T Newsgroup Screenshot

  7. Agenda • Introduction • Problem Solving Resources • Working with Sybase Tech Support • Change Request / Bug Fix Process • Understanding ASE Diagnostics • Finding information in error messages • Decoding stack traces • Picking good key words for research • Trace flags • The shared memory dump facility • Types of Problems • Performance Issues • Locking / Blocking / Deadlocking • Low-Level Task Related Problems • Backup and Recovery Issues

  8. ASE Error Message Formats • Messages that appear in the log have a “header” string that provides good information: 02:00000:00056:2002/12/16 23:40:49.14 kernel current process (0x11890001) infected with 11 • Use this information to isolate overlapping messages on multi-engine systems Date / Time Family ID ASE Layer Process ID Engine ID

  9. ASE Error Message Formats, Continued • ASE uses both numbered and unnumbered messages • The “numbered” messages correspond to what is found in the sysmessages table. • The “unnumbered” messages are more internal in nature, do not exist in sysmessages, and are not sent to the client. • Examples: • Numbered message: Msg 208, Level 16, State 1: Line 1: foo not found. Specify owner.objectname or use sp_help to check whether the object exists (sp_help may produce lots of output). • Unnumbered message: 2002/09/05 16:44:49.82 kernel  sddone : write error on virtual disk 16 block 3990498:

  10. Severity and State • Numbered messages consist of four components • The message number itself, i.e. 208, 695, 1105, etc. • A severity level between 10 and 24. • Severity determines how ASE responds to the error • Levels 10 – 18 represent a user error. These are non-fatal. 10 – 16 can be corrected by the user. 17 and 18 may require DBA intervention. • Levels 19 – 26 are fatal and will result in a terminated client session. These are more severe problems and represent internal inconsistencies • Severity determines how the error is reported • Non-SA client only receive non-fatal errors (<= 18). They receive an alternate message for severity 19 and higher. • SA clients receive errors regardless of severity • Only severity 19 and higher are sent to the errorlog / console. • If traceflag 3602 is enabled, severity 10-18 are also sent to the errorlog. • A state value • States identify the instance of an error. For errors raised several places, this isolates the code location that raised the instance. • This is very important for tech support! • The actual text of the message

  11. Stack Traces An invaluable diagnostic tool • Stack traces provide useful information relating to what a task was doing when a fatal condition was encountered • Stack traces go to the errorlog and console when a fatal error is encountered • Fatal conditions such as time slice violations and signals also cause stack traces to be generated • Trace flag 3601 was cause a stack trace to be generated for non-fatal errors as well, but this is not normally advisable. • Stack traces are often preceded or followed by relevant information • Be sure to save the entire errorlog if you see a stack trace • If you are working with Tech Support, they will want to see the entire log, not just a cut and paste of the stack trace. • Picking good key words out of a stack may help you find the cause online • Using good key words for searches may turn up the answer in the known problems or solved cases databases.

  12. Stack Traces, Continued An example stack trace 05:00000:00074:2002/06/21 08:58:21.17 kernel  current process (0x128b007b) infected with 11 05:00000:00074:2002/06/21 08:58:21.23 kernel  Address 0x0068b7f8 (getnext_ctlib_subst+0x138), siginfo (code, address) = (1, 0x0000010a) 05:00000:00074:2002/06/21 08:58:21.23 kernel  ************************************ 05:00000:00074:2002/06/21 08:58:21.23 kernel  SQL causing error : execute etcm.etcm.ap_get_ba_address;1 @country = 'CAN', @addr_type = 'I', @basicba = '999999', @addrsub = '99', @baname = '' 05:00000:00074:2002/06/21 08:58:21.23 kernel  ************************************ 05:00000:00074:2002/06/21 08:58:21.23 server  SQL Text: execute etcm.etcm.ap_get_ba_address;1 @country = 'CAN', @addr_type = 'I', @basicba = '999999', @addrsub = '99', @baname = '' 05:00000:00074:2002/06/21 08:58:21.23 kernel  curdb = 13 pstat = 0x10100 lasterror = 0 05:00000:00074:2002/06/21 08:58:21.23 kernel  preverror = 0 transtate = 0 05:00000:00074:2002/06/21 08:58:21.23 kernel  curcmd = 197 program =

  13. Stack Traces, Continued 05:00000:00074:2002/06/21 08:58:21.24 kernel  pc: 0x0064f8dc pcstkwalk+0x24(0x29b869a0, 0x00000000, 0x0000270f, 0x00000002, 0xfffffff8) 05:00000:00074:2002/06/21 08:58:21.24 kernel  pc: 0x0064f7e8 ucstkgentrace+0x194(0x128b007b, 0x2d3ccd58, 0x2d3ccd58, 0x2bb795c8, 0x00000000) 05:00000:00074:2002/06/21 08:58:21.24 kernel  pc: 0x0061cc90 ucbacktrace+0xa8(0x2bb795c8, 0x00000001, 0x128b007b, 0x2d14f238, 0x00000000) 05:00000:00074:2002/06/21 08:58:21.25 kernel  pc: 0x000f62e0 terminate_process+0xd70(0x00beef50, 0x000054e8, 0x000054e4, 0xffffffff, 0x00005000) 05:00000:00074:2002/06/21 08:58:21.25 kernel  pc: 0x006324d4 kisignal+0x1e0(0x29b8745c, 0x29b871c4, 0x0000000b, 0x29b87198, 0x29b87450) 05:00000:00074:2002/06/21 08:58:21.26 kernel  pc: 0xff34b7dc _getfp+0x220(0x0000000b, 0x29b87450, 0x29b87198, 0x006322f4, 0x00bf3424) 05:00000:00074:2002/06/21 08:58:21.26 kernel  pc: 0xff3484c0 _fork+0x864(0x0000000b, 0x00bf3380, 0x00000000, 0x00000000, 0x00000000)

  14. Stack Traces, Continued 05:00000:00074:2002/06/21 08:58:21.26 kernel  pc: 0x0068b77c getnext_ctlib_subst+0xbc(0x00000200, 0x00000000, 0x00000000, 0x00000000, 0x38433e50) 05:00000:00074:2002/06/21 08:58:21.26 kernel  [Handler pc: 0x006886ec omni_backout installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.26 kernel  pc: 0x0068b490 omni_getnext+0x2c0(0x38433e50, 0x2c65f574, 0x2d3ccd58, 0x2c662c5c, 0x2d3d2298) 05:00000:00074:2002/06/21 08:58:21.27 kernel  pc: 0x0042a760 exec_eop+0x1750(0x00002000, 0x2d3ccd58, 0x00000000, 0x00000000, 0x000005c5) 05:00000:00074:2002/06/21 08:58:21.27 kernel  pc: 0x004297ac exec_eop+0x79c(0x00007000, 0x2d3ccd58, 0x45478880, 0x00000000, 0x00000000) 05:00000:00074:2002/06/21 08:58:21.28 kernel  [Handler pc: 0x0042e510 execerr installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.28 kernel  [Handler pc: 0x0056f508 jcsExHandler installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x003dfe84 sortmerr installed by the following function:-]

  15. Stack Traces, Continued 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x0042e510 execerr installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x0042e510 execerr installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x00365b38 aritherr installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x00428960 execute+0xb80(0x455e96d0, 0x0000702c, 0x455e96d0, 0x2d3ccd58, 0x00005000) 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x0038aebc s_execute+0x2404(0x80000000, 0x000000c5, 0x455e9348, 0x00005393, 0x00001000) 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x003a4e50 s_handle installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x003a1e8c sequencer+0x10d8(0x00be0400, 0x00beec00, 0x29025200, 0x00005388, 0x2d3ccd58) 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x003921f8 execproc+0x79c(0x00005388, 0x00007000, 0x2d3ccd58, 0x00005000, 0x0000001f)

  16. Stack Traces, Continued 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x0038b3d4 s_execute+0x291c(0x00beec00, 0x00be2800, 0x29025200, 0x00000008, 0x00000000) 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x003a4e50 s_handle installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x003a1e8c sequencer+0x10d8(0x00be0400, 0x00beec00, 0x29025200, 0x00005388, 0x2d3ccd58) 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x003921f8 execproc+0x79c(0x00005388, 0x00007000, 0x2d3ccd58, 0x00005000, 0x0000001b) 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x0038b3d4 s_execute+0x291c(0x00beec00, 0x00be2800, 0x29025200, 0x00000008, 0x00000000) 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x003a4e50 s_handle installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x003a1e8c sequencer+0x10d8(0x00be0400, 0x00beec00, 0x29025200, 0x00005388, 0x2d3ccd58)

  17. Stack Traces, Continued 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x003921f8 execproc+0x79c(0x00005388, 0x00007000, 0x2d3ccd58, 0x00005000, 0x0000001b) 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x0038b3d4 s_execute+0x291c(0x00beec00, 0x00be2800, 0x29025200, 0x00000008, 0x00000000) 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x003a4e50 s_handle installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x003a1e8c sequencer+0x10d8(0x00be0400, 0x00beec00, 0x29025200, 0x00005388, 0x2d3ccd58) 05:00000:00074:2002/06/21 08:58:21.30 kernel  pc: 0x0011cf9c tdsrecv_language+0xb4(0x00be0400, 0x00befc00, 0x00000001, 0x00005000, 0x00005400) 05:00000:00074:2002/06/21 08:58:21.30 kernel  [Handler pc: 0x0013d9ec hdl_backout installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.31 kernel  [Handler pc: 0x0033fe1c ut_handle installed by the following function:-] 05:00000:00074:2002/06/21 08:58:21.31 kernel  [Handler pc: 0x0033fe1c ut_handle installed by the following function:-]

  18. Stack Traces, Continued 05:00000:00074:2002/06/21 08:58:21.31 kernel  pc: 0x0012c6bc conn_hdlr+0x20f0(0x00befc00, 0x00be0400, 0x00000000, 0x000054e4, 0x00005400) 05:00000:00074:2002/06/21 08:58:21.31 kernel  pc: 0x00662618 _coldstart(0x00000081, 0x0012a5cc, 0x00000000, 0x00000000, 0x00000000) 05:00000:00074:2002/06/21 08:58:21.31 kernel  end of stack trace, spid 74, kpid 311099515, suid 141

  19. Good Information from Stack Traces • While most of the stack is only relevant for internal Sybase folks, some of it can be used to help match problems with exiting cases or bugs. • First, make note of the problem. Here it is an ‘infected with 11’ 05:00000:00074:2002/06/21 08:58:21.17 kernel  current process (0x128b007b) infected with 11 • Next, note which function was executing when the signal arrived 05:00000:00074:2002/06/21 08:58:21.23 kernel  Address 0x0068b7f8 (getnext_ctlib_subst+0x138), siginfo (code, address) = (1, 0x0000010a) • Here the function getnext_ctlib_subst() is where the problem was encountered. • This line is only relevant for “signal” stack traces and doesn’t apply to time slices or most other errors.

  20. Good Information from Stack Traces, Continued • The SQL causing error can help you reproduce or avoid the problem. 05:00000:00074:2002/06/21 08:58:21.23 kernel  SQL causing error : execute etcm.etcm.ap_get_ba_address;1 @country = 'CAN', @addr_type = 'I', @basicba = '999999', @addrsub = '99', @baname = '‘ • Lines thatfollow contain the current database (pcurdb) as well as any errors that were encountered prior to the event that triggered the stack trace.

  21. Picking Good Key Words for Research • Use good searching strategies, such as remembering synonyms. • When researching stack traces, search on functions near the top of the stack. • For “infected with” issues, try searching on the function in the address line following the infected with 11. • In the stack example we just looked at, we would search on getnext_ctlib_subst. • Search example on following slides

  22. Picking Good Keywords for Research, Continued • Some ASE functions appear in almost every stack trace, and are therefore poor keywords to use when doing research • Avoid searching based on these functions: • pcstkwalk • ucstkgentrace • ucbacktrace • os_backtrace • terminate_process • kisignal • hdl_default • hdl_backout • s_handle • execerr • aritherr • ex_raise • close_network • execute • s_execute • exec_eop • sequencer • s_compile • tdsrecv_language • conn_hdlr • _coldstart • kaclkintr • _fork • _getfp • Functions towards the top of the stack are more likely to be relevant than those on the bottom • If you match the bottom of the stack but not the top, you likely haven’t matched the problem.

  23. Researching our Stack Trace Step 1: Search solved cases Keyword we have entered

  24. Researching our Stack Trace, Continued Step 2: Get the results

  25. Researching our Stack Trace, Continued Step 3: Click on the case number and see the details

  26. Researching our Stack Trace, Continued Step 4: Search for CR details Changed search type

  27. Researching our Stack Trace, Continued Step 5: Get more details on the CR

  28. ASE Trace Flags • ASE has many, many trace flags that affect the server behavior. • Some are officially documented, some are not. • Some basic rules apply for all trace flags: • Don’t turn on a trace flag unless you understand what it does and what the side effects are. • Document who turned the flag on, when it was turned on, and why it was turned on. • Don’t ever add a trace flag to a RUN_server file without adding a comment detailing why it is there.

  29. ASE Trace Flags – How they work • With few exceptions, trace flags are not tracked on a per-connection basis. • A single, server-wide, mask of activated trace flags is maintained. • When a connection turns on a trace flag, it will inherit the behavior of all other activate trace flags within the server. • Example: • Spid 10 runs dbcc traceon (3604, 302, 310) • Spid 11 runs dbcc traceon (8399) • Trace 302 and 310 output will be generated for any query spid 11 runs. However, the output will not be sent back to the client since spid 11 did not activate trace 3604 • Some trace flags will automatically apply to all connections, regardless of whether or not they have activated other flags. • Access to the ASE code is required to determine which flags have this behavior, therefore when in doubt you should consult Sybase Tech Support • Most flags that affect query processing fall into this category • The only flags that must appear in the RUN_server file are those that affect the boot-time behavior and initialization of the server.

  30. The Sybmon Utility ASE’s advanced diagnostic facility • Diagnostic utility built into the dataserver binary • Provides access to shared memory structures • May be used to analyze live servers or dumped shared memory images of servers • Non-intrusive and requires no prior planning before invoking • The “halt” and “restart” commands are intrusive and will cause the server to suspend processing. However, these are the exceptions. • Sybmon dumps provide a “snapshot” image into a server • This snapshot is often times very useful in determining the cause of a problem • However, Sybmon dumps alone cannot answer every problem • Problems that occur over time (example: loss of SIGALRMs) can’t be seen with a snapshot • Problems that are occur outside of the shared memory segment can’t be seen.

  31. The Shared Memory Dump Facility (CSMD) • ASE has the ability to copy its shared memory to a file for examination by Sybase Tech Support • The shared memory dump can be manually triggered, or it can be event based • The event based dump is called a Configurable Shared Memory Dump or CSMD • CSMD can be configured for: • A specific error or a group of errors, such as 605 or 6xx • All errors with a severity equal to or greater than that configured • A signal, such as signal 10 or 11 (infected with 10 or 11) • A timeslice violation • A panic / stack overflow • An unnumbered message that might appear in the log • Memory dumps can be analyzed by Sybase to determine the state of the server and the contents of memory structures at the time of a fault

  32. sp_shmdumpconfig Stored Procedure The interface to CSMD • The sp_shmdumpconfig stored procedure interfaces with the CSMD facility • Add or drop dump conditions • View configured dump conditions • View and modify dump defaults • To view current setup, run sp_shmdumpconfig with no arguments • Example on the next slide • For details on syntax and usage, see the notes section of this slide • Two relevant sp_configure parameters: • “dump on conditions” set to 1 to turn on, set to 0 (default) to turn off • “max dump conditions” sets the maximum number of dump conditions that can be configured (default 10)

  33. sp_shmdumpconfig Sample Output > sp_shmdumpconfig > go Type       Value Maxdumps Page_Cache Proc_Cache Unused_Space Est_File_Size          Filename          Directory   ---------- ----- -------- ---------- ---------- ------------ -------------         ------------------------------         ---------- Error      813   Default  Default    Default    Default      13 MB Default File Name          Default Directory   Signal     10    Default  Include    Default    Default      22 MB    Default File Name          Default Directory     Timeslice  ---   Default  Default    Default    Default      13 MB   Default File Name          Default Directory   Panic      ---   Default  Default    Default    Default      13 MB   Default File Name          Default Directory   Defaults   ---   1        Omit       Omit       Omit         13 MB   Generated File Name          /opt/sybase/csmd Current number of conditions: 4 Maximum number of conditions: 10

  34. Sybmon and CSMD – What You Should Know • Sections of memory included in dump files are configurable • Core data structures are always included • Page cache, procedure cache, and unused space are optional • Inclusion of procedure cache is recommended • Dump files can be very large • Size is based on the amount of memory ASE is configured for and the optional memory modes • ASE processing will stop while the dump file is being written • This is done to achieve a consistent image of memory in the dump file • Time of the halt is based on the size of the dump and the speed of the I/O system • Specify a dump directory, but don’t specify a file name • It is best to let ASE generate the name for the dump file. This prevents dump files from being overwritten • Always FTP dump files in binary mode

  35. Should You Be Proactive With CSMD? • Some shops setup CSMD before they encounter problems to help speed resolution should they encounter any • Do this at your own discretion. • Keep in mind that generating a CSMD causes ASE to halt while the file is being written • Some conditions to consider • Signals 4 (very rare), 10, 11 (signal 0xC0000005 on NT) • Timeslice, panic • Fatal errors (configure for severity 19) • Keep in mind that the dump file may not be that useful and Tech Support may not be interested in it • Based on the specifics of the issue the CSMD may not be the best way to approach the problem

  36. Agenda • Introduction • Problem Solving Resources • Working with Sybase Tech Support • Change Request / Bug Fix Process • Understanding ASE Diagnostics • Finding information in error messages • Decoding stack traces • Picking good key words for research • Trace Flags • The shared memory dump facility • Types of Problems • Performance Issues • Locking / Blocking / Deadlocking • Low-Level Task Related Problems • Backup and Recovery Issues

  37. Performance Problems • Planning ahead greatly simplifies the resolution of performance problems. • Due to the variances in applications and environment, it is difficult to determine what is normal and what is abnormal for a particular customer. • Gathering benchmark data prior to encountering a problem allows for the isolation of problems

  38. Basic Performance Troubleshooting • Isolation query issues from system issues • Optimizer issue, blocking issue, or resource issue? • Gather data over time • Establish baselines for all levels of activity • Understand typical system loads at various times • Look for trends • Avoid tuning based on a single 5 minute sample • Understand what you are changing • Read documentation before tuning anything • Document what you did and why you did it • Don’t ever turn on a trace flag until you know what it does and what the side effects are!

  39. Tools to Consider for General Problems • sp_sysmon • Very good at giving an overall view of what the server is doing. • Not so good at narrowing down to specific users, queries, etc. • Well documented and easy to use, but can be difficult to find meaningful data. • Monitor and historical servers • Provides data sp_sysmon can’t • Better at monitoring specific users, queries, objects, etc. • Requires separate setup, but this is trivial • Can serve client apps written with the Monitor Client Library, such as Sybase central plug-in • Shipped as part of ASE, no separate licenses required • DBXray • New optional feature for ASE 12.5.0.3 • Provides real-time, graphical monitoring of ASE • Other external tools • OS level tools such as iostat, vmstat, sar, etc. • In extreme cases, tools such as truss can be used to determine the amount of time spent in individual system calls, but this adds a lot of overhead.

  40. Guidelines for Using sp_sysmon • Archive your output • If performance drops off, archived output will be very useful • Run sysmon prior to major changes • Hardware • Application • Configuration parameters • Run sysmon for short periods of time • Don’t overflow monitor counters (unsigned int) • A sample that includes both busy and idle time may reveal neither. • Look at per second and per transaction numbers • Better than raw numbers for making comparisons • Be careful of conflicts • Monitor and historical server share counters with sysmon • Run only a single sysmon at a time • Be careful of automated recommendations • Newer versions make recommendations • Look at the system over time and consider implications prior to following any of them.

  41. Locking / Blocking: Tempdb Contention As the overall scalability of ASE dramatically improved, tempdb became a bottleneck • Why is this a problem? • Creating and dropping tables requires exclusive locks on system catalogs • High throughput environments create / drop many temp tables • This is normally fast, but having to do physical IO while holding these locks can queue up other processes • Becomes a point of serialization in a high throughput environment

  42. Tempdb Contention, Continued Types of temporary tables • Procedural tables • Created inside of a stored procedure • Visible only to that procedure and sub procedures • Automatically dropped when proc exists • Session tables • Created outside of a proc using #tablename • Visible only to that session • Automatically dropped when that session exists • Shareable tables • A regular table that lives in tempdb • Visible to all users • Will not be automatically dropped (except for ASE reboot) Contention

  43. Tempdb Contention, Continued Some things to try • On the application side, reduce the number of #tables created • Replace procedural tables with session tables • proc can truncate the table at entry • Have multiple sessions share the same “regular” table • Row lock this table • Requires additional logic at the application level • One the server side, reduce latency while holding locks • Place tempdb on a ramdisk • Use named caches to take load out of default data cache

  44. Tempdb Contention, Continued Common Misconceptions • Performing create followed by insert…select is better than doing select into • Based on the belief that ASE holds the system table locks for duration of select into • Reality is select into releases catalog locks prior to inserting rows • Select into also faster due to reducing logging • Truncating a #table prior to dropping it reduces time locks are held • This is absolutely true. During drop we must deallocate all extents while holding the system table locks. • However, ASE already truncates temp tables prior to grabbing system table locks (introduced in 11.9.2) • Traceflag 3703 disables this, and it can increase contention. A common misconception is that it will reduce contention.

  45. Tempdb Contention, Continued Big Improvements in 12.5.0.3 • Multiple tempdbs spread the load • 12.5.0.3 allows DBAs to create and assign multiple tempdbs, based on application, login, etc. • Spreads the load and dramatically improves throughput • Lazy log writing reduces physical IO • Log buffers are not flushed to disk during commit processing in temporary databases • Data buffers not flushed to disk at end of select into where target table is in a temporary database

  46. Deadlocks • Deadlocks are almost always an application issue. • Get details by turning on “print deadlock information” • sp_configure option • Replaces the older 1204 traceflag for every day troubleshooting • Traceflags can provided additional details • 1204 : prints deadlock chains (similar to “print deadlock information” above). This is required for other deadlock traceflags, so it should only be used when the trace flags below are required. • 1205 : prints a stack trace for each task in the deadlock chain. Tech support may request this in rare circumstances (requires the use of trace 1204) • 1218 : prints the query plan (showplan) upon a deadlock. (requires the use of trace 1204) • My favorite way to understand deadlocks is to diagram them. • Note: 12.5.0.3 MDA tables also contain deadlock information

  47. Other Lock Troubleshooting Tips • Traceflag 1202 will insert blocked lock requests into syslocks • Normally blocked requests do not go to syslocks, and therefore are not displayed by sp_lock • 1202 causes them to be displayed for additional troubleshooting • Traceflags 1212 and 1217 provide extensive lock tracing • 1212 traces every lock acquisition and release • 1217 traces lock acquisition and release for user tables only (same as 1212, but doesn’t display system table locks) • Both of these traces provide A LOT of output, but may be helpful in isolated test and development environments • May provide a better idea of how the locking system is handling your query

  48. Low-Level Task Related Problems Things that make your process go boom • A task may be terminated due to some internal problems • Time slice violations • Operating system signals • Stack and stack guard overflows • Normally the task will be terminated and ASE will move on. However… • If the terminating task is holding a spinlock, ASE will shut itself down. • Stack overflow issues will always result in an ASE shutdown

  49. Low-Level Task Related Problems – Time Slice Time slice violations prevent a run-away task from taking over your server • ASE uses a non-preemptive scheduling algorithm • Each scheduled task is responsible for voluntarily or automatically scheduling itself out. • Every task is given a time quantum to run in, along with a grace time. • If a task fails to yield after this grace time is exhausted, it will be killed. • Tasks normally run for a very short time, yielding when: • The task must sleep for a resource, such a network or physical I/O • Execution has completed and the task becomes idle • They have exhausted their time quantum and are executing in the grace period. ASE code frequently checks to see if the time quantum has been exhausted and if the task should yield. • Tasks may fail to yield within the grace period when: • The task is caught in a loop that does not contain any yield point • A long code path is being executed that does not contain a necessary yield point • An operating system call that was expected to take a short period of time takes a long time

  50. Time Slice - Yielding

More Related