1 / 36

Ten Database Mysteries Chris Lawson Database Specialists, Inc. www.dbspecialists.com clawson@dbspecialists.com

Ten Database Mysteries Chris Lawson Database Specialists, Inc. www.dbspecialists.com clawson@dbspecialists.com. Focus of Presentation. Explore some “strange” database problems that have baffled some DBAs

jacob
Download Presentation

Ten Database Mysteries Chris Lawson Database Specialists, Inc. www.dbspecialists.com clawson@dbspecialists.com

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ten Database MysteriesChris LawsonDatabase Specialists, Inc.www.dbspecialists.comclawson@dbspecialists.com

  2. Focus of Presentation • Explore some “strange” database problems that have baffled some DBAs • Most of the mysteries occurred on critical production systems, although some were on development systems • ALL of the mysteries were eventually explained • Depending on your personal experience, some of these “mysteries” will seem trivial or commonplace; others will indeed seem mysterious • Most mysteries have a simple explanation • Most mysteries have a simple fix

  3. Why Spend Time on These Database Mysteries? • Each DBA has a unique set of experiences and biases. What one DBA thinks is obvious, another will not. • An Oracle “detective” is part scientist, part artist. Many solutions require creativity, not just logic. • A superior DBA will look for ways to “stretch” and learn ways to handle difficult problems. • Without working out difficult problems, you will not advance as a DBA. • You will be the “hero” if you encounter a mystery and solve it; remember the solution--you may see it again!

  4. A Word About Oracle Versions • This presentation was originally written in 1998 • Most of these mysteries involve Oracle 7 databases • Although some of the mysteries might not apply directly to Oracle 8i, they still offer insight into the problem-solving process

  5. 1 The Case of the Berserk Application Clue #1: “Big Phone Company” 1997 • Using HPUX, Oracle 7.3.2.3 • Help desk application (Vantive) that connects to Oracle database suddenly goes berserk, creating thousands of connections • Program had worked normally for many months • DBAs watch helplessly as CPU load driven from 1 to 50 • As DBAs kill extra processes, more take their place • Alert log and recent trace files show nothing unusual • DBAs are united in accusing the application as the culprit

  6. Berserk Application(continued) Clue #2: “Big Publisher Ltd.” 1998 • Running Sun Solaris, Oracle 7.3.2.3 • Users complain that performance has degraded in recent months • Manager states that “something must be wrong with the network” • Application is CORIS, a document management/printing application • DBA investigates. Discovers that time to connect in SQL*Plus is 30-45 seconds, even though server load is low • Connect time is bad whether remote (PC) or directly on server • Server load (file I/O and CPU) is generally low

  7. Berserk Application: Solution • OTRACE is the culprit. It is active by default on many 7.3 Oracle versions • Excerpt from Oracle Corporation Alert: “Problems described here can occur when Oracle Trace is not configured and is widely enabled.”

  8. Berserk Application: Solution(continued) To Detect: • Check directory ORACLE_HOME/rdbms/otrace: As size of files process.dat and regid.dat approach 10mb, problems arise -rw-r--r-- 1 oracle dba 3161424 Jun 05 09:43 process.dat -rw-r--r-- 1 oracle dba 263808 Jun 05 09:43 regid.dat • To correct: simply remove these two files, then issue command otrccref

  9. Berserk Application: Solution(continued) To Prevent: • Add line to listener.ora for each database (after ORACLE_HOME): ( ENVS=’EPC_DISABLED=TRUE’) • Set and export environment variable EPC_DISABLED=TRUE for all users. Put standard profile in /etc directory • Restart all databases and restart listener

  10. 2 The Case of the Reluctant Patch Background: • To correct several bugs, decision is made to upgrade from 7.3.2.2 to 7.3.2.3 (HPUX) • Patch is obtained from Oracle and applied to test server. DBA notes that patch ran very quickly and runs again “just to be sure” • Bug is now gone on test server

  11. The Case of the Reluctant Patch(continued) Problem: • Patch is similarly applied to production server--same operating system and version. • Production application is tested, but bug is still there! • Another DBA reviews patch file, location, etc. All seem correct.

  12. The Case of the Reluctant Patch: Solution • DBA happens to notice that upon SQL*Plus startup, database is 7.3.2.2! • The patch was really only applied on the second run. This is apparently a quirk in the patch readme file. • The command what oracle (then grep for patch#) can be used to determine which patches are applied

  13. 3 The Case of the Sleazy SQL • “Big Publisher Ltd.” runs an MRP system called “AVALON,” similar to Oracle Manufacturing. Database stores inventory, part information, vendors, etc. • Server is ATT3555, running NCR UNIX. Database is Oracle 7.1.6 • Issue: Users report that certain common operations are very slow

  14. The Case of the Sleazy SQL(continued) • DBA investigates and queries v$sqlarea using: SELECT sql_text FROM v$sqlarea WHERE disk_reads/executions > 1000; • Query yields troublesome SQL statement, with these stats: DISK_READS PER EXECUTION =5,000 BUFFER_GETS PER EXECUTION =5,100

  15. The Case of the Sleazy SQL(continued) • Statement has been accidentally designed to ensure worst possible performance by making index usage impossible: SELECT * FROM ABC WHERE NVL (COL_W) = NVL (:1) AND NVL (COL_X) = NVL (:2) AND NVL (COL_Y) = NVL (:3) AND NVL (COL_Z) = NVL (:4); • TABLE ABC SIZE = 3mb, about 25,000 rows • DBA requests developers to alter statement to eliminate NVL (COL_N) functions • DBA advised that no resources available to make change • Problem: If code can’t be changed, what can be done to improve performance?

  16. The Case of the Sleazy SQL:Solution • CACHE the table! For example: alter table xyz cache; • Normally, blocks from full-table scans are designated for rapid age-out; otherwise, they would “wipe-out” the db cache. Cache of table causes blocks to be treated “normally.” • Caching table disables rapid age-out of this table • Logical reads will not be reduced, but disk reads approach zero! • Note: DB_BLOCK_BUFFERS was slightly increased to compensate for the cached table that now consumes a few megabytes of database cache

  17. 4 The Case of the Non-Optimal Optimizer • A large software company based in “Cedar Shores” has designed a large financials application. Program has been tuned for Rule Based Optimizer. • The application runs very well, is a mature product, which is used in thousands of companies around the world. • Some users clamor for new features: more horns and whistles

  18. The Case of the Non-Optimal Optimizer(continued) • The new development team, afraid to become obsolete, wants to convert to Cost-Based Optimizer (CBO). They also wisely consider that Oracle recommends using CBO on new projects. • The older developers, now nearing peaceful retirement, predict disaster if the database is switched to CBO, because the execution plans will change. • Issue: How can Optimizer be selectively switched to CBO without changing the code?

  19. The Case of the Non-Optimal Optimizer: Solution • Simply substitute a view having a “hint” for the table needing CBO For example: rename DEPT to DEPT_ORIG create view DEPT as select / * + ALL_ROWS */ * from DEPT_ORIG; • Now, application will use the VIEW when it looks for DEPT • All queries using DEPT will use CBO • Note: Upon renaming a table, the indexes and constraints will “move” with the table; however, synonyms and grants may need to be reset.

  20. 5 The Case of the Forgetful Memory • A new internet-transaction application, ECXpert, and its database have been installed on a Sun Ultra Enterprise Server • Sun Solaris 2.5.1, Oracle 7.2.3 • Application appears to run smoothly for several months, although it occasionally creates large dump files • Trace files appear occasionally with ORA-4030 “Out of Process Memory” and recommends “increase process memory quota” • Server seems to hang occasionally. Server reboot fixes • SysAdmin checks kernel parameters related to memory. All correct and match other servers. Not using any large stored procedures • Problem: What is causing memory/hang problems?

  21. The Case of the Forgetful Memory: Solution • DBA checks /tmp (swap area on server) and notes 99% consumed • Investigation reveals that application occasionally goes berserk and consumes ENTIRE SWAP area with log files • Deletion of log files does not return disk space, since application is still “holding” the files • Reboot of server cleaned up /tmp area, thereby correcting problem • Suggestion: If memory-related error messages exist, check swap area first

  22. 6 The Reluctant Index Affair Background: • DBA asked to analyze and tune Australia manufacturing database. Database is running CBO. One particularly bothersome SQL statement is identified • The WHERE condition is perfect for a new index, because of its excellent selectivity • Index is quickly created. Table is also analyzed Problem: • Even though index is a “perfect” solution to the query, a full table scan is used instead

  23. The Reluctant Index Affair:Solution • The values in the table are very lopsided. Optimizer, however, will assume uniform distribution, which is incorrect in many cases • Re-analyze and specify histogram: ANALYZE TABLE XYZ FOR ALL INDEXED COLUMNS SIZE 75 • This creates histogram of 75 “buckets” for each indexed column • With these statistics, optimizer will “know” how values are distributed, and will more often make right decision to use an index or not

  24. 7 Mystery of the Hanging Database • At random intervals, a 7.3.2.3 database hangs. No trace files, and nothing unusual in the alert log. • When problem occurs, no response to new connections requests; over 1200 existing connections “hang.” • Oracle Support is alerted to priority 1 problem; experts across the world investigate for weeks • DBA is using OEM Lock Manager tool and notices user who is blocking about 25 other users. The hang occurs soon after. • Oracle Australia recommends checking indexes. This suggestion led to the solution. Problem: • How did index problems hang database?

  25. Mystery of the Hanging Database: Solution • Application design flaw • There are hundreds of foreign keys in the database; 99% had indexes. A few did not, violating good design practice. When batch program began updates, locking increased rapidly. • Without FK index, updates on parent table completely block updates on child (vice versa for 7.1.6) • Reference: Server Application Developers Guide • Although not admitted as database “bug,” database was overwhelmed by the locks • Once indexes on all FK’s created, problems disappeared

  26. 8 The Case of the Mysterious Package • Manufacturing application was installed on a Sun Ultra 3000 server. A small database was created for testing purposes. Oracle version 7.2.3. Shared pool size about 60mb. • At first, all went well. Then, seemingly randomly, when the users began to try new features, they would receive a “funny” error message and the application failed. • A trace file recommended increasing shared pool Problem: • How can application fail with such a sizable shared pool? • Aside from massive increase in shared pool, what can be done?

  27. The Case of the Mysterious Package: Solution • The application uses about 20 massive PL/SQL packages. Some are 5x the SYS.STANDARD package. When a package load is attempted, it will not fit in the shared pool. • Memory-intensive packages should be “pinned” or “kept” in shared pool after database startup EXECUTE SYS.dbms_shared_pool.keep ('OBJECT_NAME'); • But first, must find the “big” packages (will also list SYS.STANDARD): SELECT owner, name, sharable_mem FROM v$db_object_cache WHERE sharable_mem > 100000;

  28. The Case of the Mysterious Package: Solution(continued) • Example script to find “big” packages and generate SQL script to “pin” them in memory SELECT 'EXECUTE SYS.dbms_shared_pool.keep('''|| owner||'.'||name||''');' FROM v$db_object_cache WHERE sharable_mem > 100000 AND type NOT IN ('VIEW', 'SYNONYM', 'TABLE') AND name NOT LIKE '%SHARED_POOL%' AND owner IS NOT NULL;

  29. 9 The Case of the Uncooperative Rollback • In mid-afternoon, DBA (running “OEM Top Sessions”) notices many users “ACTIVE” but showing 0 file I/O. Lock Manager reveals one user performing big update blocking all. • Culprit tracked down--agrees to be terminated. DBA disconnects session. • Locks are not released, but user is “marked for kill.” • Very little file I/O activity. Alert log shows very slow switching of redo logs. • DBA performs shutdown abort then startup. Database starts up after 2 minutes. All is well. Problem: • Why did user not rollback and release locks?

  30. The Case of the Uncooperative Rollback: Solution • If session is terminated, speed of rollback is proportional to init.ora parameter CLEANUP_ROLLBACK_ENTRIES • If default value (20) is used, rollback of killed session can take 50x time of original update. Alternatively, shutdown abort/startup cleans up database much faster. • Rationale: Parameter prevents rollback of one user from hogging all the resources on a busy system • Solution: Increase parameter to reduce rollback time (since shutdown abort is usually not an option)

  31. 10 The Singular Case of the Phantom Users • A manufacturing database in Sydney, Australia, needed performance tuning. SQL tuning on US databases had yielded good results. • The table v$sqlarea was queried to find resource-intensive SQL statements. Several commonly-run statements were isolated. Performance was improved through index additions. • Statistics were re-examined over the next 4 hours, in order to confirm improvements. • However, repeated looks at execution statistics showed no change. • DBA puzzles over enigma for several hours, then realizes that NOTHING is WRONG! What did he finally realize?

  32. The Singular Case of the Phantom Users: Solution • Nothing is wrong because the users were still asleep. It was only 5:00am in Sydney!

  33. 11 The Case of the Slow Physician(Bonus Mystery) • Health application is experiencing slow run times. Analysis shows following SQL statement causing 3000 disk reads • COSIGN_VIEW is a join of 2 tables (DOCS + COSIGN), joined on patient_id (indexed) • Search criteria ‘DR. MCKENZIE’ is very selective; thus, nested loop IS expected choice for optimizer, with DOCS as Driving table. SELECT * from COSIGN_VIEW WHERE doctor_id = 'DR. MCKENZIE'

  34. The Case of the Slow Physician(continued) • Even with index on DOCS(doctor_id), optimizer (CBO) insists on using hash-join, and refuses to ever use index on doctor_id! • Repeated analyze table commands do not correct • Substituting query not using a view yields expected NL result Problem: • Why does using the view cause optimizer to make the “wrong” choice?

  35. The Case of the Slow Physician:Solution • Everything seemed to point to a problem with the view, because all worked normally as long as the view was excluded • Finally, DBA compared view definition (in OEM Schema Manager) to definition seen using “describe table” syntax. The columns did NOT match! • Examining the object-create script revealed that the view switched column names, so that column DOC_ID in the view did NOT match DOC_ID in the table! • Once the correct column was indexed, a Nested-Loop Join was selected by the optimizer

  36. Contact Information Chris Lawson clawson@dbspecialists.com http://www.dbspecialists.com Database Specialists, Inc. 388 Market Street, Suite 400 San Francisco, CA 94111

More Related