
DM221 Build a Productive 24x7 Database Operation Infrastructure

Presenters: George Wang, Principal Consultant, Sybase Inc. (zwang@sybase.com), and Melinda Meyers, Senior DBA, America Online Inc. (mmeyersm@aol.com). Agenda: AOL Business; Challenge; Architecture; Future Direction; Q & A.


Presentation Transcript


  1. DM221 Build a Productive 24x7 Database Operation Infrastructure. George Wang, Principal Consultant, Sybase Inc., zwang@sybase.com; Melinda Meyers, Senior DBA, America Online Inc., mmeyersm@aol.com

  2. Agenda: AOL Business; Challenge; Architecture; Future Direction; Q & A

  3. AOL Business: World’s leader in interactive services, Web brands, Internet technologies, and e-commerce services. AOL Mission Statement: “To build a global medium as central to people’s lives as telephone or television…and even more valuable.” AOL Vision Statement: “To build an interactive medium that improves the lives of people and benefits society as no other medium before it.”

  4. AOL Business: World’s #1 Web-based communication portal; World’s #1 Internet online service; #1 value brand Internet online service; 1 of the top 5 Web sites; #1 local content network and community guide; Nation’s #1 movie guide and ticketing service

  5. AOL Business (graph of ASE and RS). Note: The number of ASE and RS on the graph does not indicate production deployment.

  6. Challenge: Explosive growth; Large-scale distributed deployment; 24x7 operation; Heterogeneous environment (mixed versions of OS; mixed versions of ASE and RS); Dynamic configuration; Staff

  7. Architecture: Operation (System monitoring; Problem detection; Maintenance; Performance analysis); Repository; Automation; Notification

  8. Operation – Minimize Down Time: Standardization (Installation; Configuration; Procedures); High-availability; Fast response; History analysis; Proactive action

  9. NOC SA On-call Designated SA Group Operation Chain of Escalation

  10. System Monitoring: Ping. ASE: responding; connectivity to BAK. RS: responding; health of all threads; stable device. (Diagram: ASE, RS, ASE, BAK, BAK)

  11. System Monitoring: Data and Log Space. Gauge from 0 to 100 with marks at 50 (Warning), 75 (Warning), and 90 (Alarm)
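
The gauge on this slide can be read as a simple threshold classification. The following is a minimal sketch under that reading, assuming usage is already available as a percentage; the function name, the tuple return value, and the "ok" level are illustrative and do not come from the presentation.

```python
from typing import Optional, Tuple

# Thresholds from the slide, highest first: (percent used, severity).
THRESHOLDS = [(90, "alarm"), (75, "warning"), (50, "warning")]


def classify_space_usage(used_pct: float) -> Tuple[str, Optional[int]]:
    """Return (severity, threshold crossed) for a data or log space usage percentage."""
    if not 0 <= used_pct <= 100:
        raise ValueError("used_pct must be between 0 and 100")
    for threshold, severity in THRESHOLDS:
        if used_pct >= threshold:
            return severity, threshold
    # Below the lowest threshold: nothing to report (hypothetical "ok" level).
    return "ok", None
```

Returning the crossed threshold alongside the severity lets the two warning levels be told apart in a notification.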

  12. Problem Detection: LogChecker (rule-based). Check ASE errorlog: detect error messages; filter out informational messages. Check RS errorlog: capture message tags
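
A rule-based log check of the kind described here might look like the sketch below. It is not the LogChecker tool itself, whose rules are not shown in the slides; the patterns, the severity cut-off, and the sample informational message are assumptions chosen for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical "informational" patterns to filter out before applying rules.
IGNORE_PATTERNS = [re.compile(r"Recovery complete", re.IGNORECASE)]

# Hypothetical rules, checked in order: first match wins.
RULES = [
    (re.compile(r"Error: \d+, Severity: (1[7-9]|2\d)\b"), "alarm"),
    (re.compile(r"Error: \d+, Severity: \d+\b"), "warning"),
]


def check_log(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (severity, line) pairs for lines that match a rule."""
    findings = []
    for line in lines:
        if any(p.search(line) for p in IGNORE_PATTERNS):
            continue  # informational message: filtered out
        for pattern, severity in RULES:
            if pattern.search(line):
                findings.append((severity, line))
                break  # stop at the first matching rule
    return findings
```

Lines that match no rule are dropped in this sketch; a production checker would more likely report them as unclassified so that new message types are not silently missed.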

  13. Problem Detection: RS Heartbeat & Latency. Program flow; Alarm if latency > threshold; Detect health of RS; Latency analysis. Diagram (ASE, RS, ASE): insert a row at Time A; detect the row at Time A+latency
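
The heartbeat flow on this slide (insert a row at Time A, detect it at Time A+latency, alarm if latency exceeds a threshold) can be sketched as follows. Database access is left to two caller-supplied callables, `insert_row` and `row_arrived`, which are hypothetical names; the timeout and polling interval are also assumptions not stated in the presentation.

```python
import time
from typing import Callable, Optional


def measure_latency(
    insert_row: Callable[[float], None],    # writes a marker row at the primary ASE
    row_arrived: Callable[[float], bool],   # checks for that row at the replicate ASE
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return the observed replication latency in seconds, or None on timeout."""
    time_a = clock()
    insert_row(time_a)
    while True:
        if row_arrived(time_a):
            return clock() - time_a
        if clock() - time_a >= timeout_s:
            return None  # row never showed up within the timeout
        sleep(poll_s)


def should_alarm(latency: Optional[float], threshold_s: float) -> bool:
    """Alarm when the row never arrived or the latency exceeds the threshold."""
    return latency is None or latency > threshold_s
```

The measured value is only as fine-grained as `poll_s`, so the polling interval should be small relative to the alarm threshold. Storing each measurement would support the latency analysis the slide mentions.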

  14. Maintenance: Database & Transaction Dump. Dump to file system; Copy system tables; Unix backup; Monitor capacity

  15. Maintenance: Miscellaneous. Threshold of transaction log (50%, 75% and 90%); Update statistics; Rotate errorlogs; Database consistency check

  16. Performance Analysis: Performance Data Collection with ASE Monitor & Historical Server. CPU utilization; Stored procedure execution; IO activity; Cache activity; Object activity; Server status; Locking

  17. Performance Analysis: Performance Data Analysis. Exception; Trend; Capacity; Load; Benchmark

  18. Operation

  19. Repository: Server inventory; Maintenance log; Problem history; Performance warehouse

  20. Server Inventory: Server characteristics (name, hostname, SA, CPU, version, OS, PM, memory, type, subsystem, POC, connection); Manual update via web pages; Automatic update via collection agents

  21. Maintenance Log: Maintenance history (Installation; Upgrade; Bounce; OS maintenance; Configuration; Space allocation); Update & query via web pages; Help diagnose problems

  22. Problem History: Problem history (Symptom; Diagnostics; Solution; Case tracking; Workaround); Update & query via web pages. Benefits: Diagnose similar problems; Share knowledge and skills

  23. Performance Warehouse: Automatic data collection; Automatic data summary; Analysis model (Dynamic: on-demand analysis on the Web; Static: pre-defined and complex data model); Delivery via the Web

  24. Operation & Repository. SI: Server Inventory; ML: Maintenance Log; PH: Problem History; PW: Performance Warehouse

  25. Job Scheduling: Unix Cron. Benefits: Simple. Drawbacks: Failure detection; Job stream & dependency; Standalone vs. distributed environment

  26. Job Scheduling: Autosys. Scheduling and operations automation for distributed environments. Benefits: Centralized job scheduling & management; Flexible job scheduling and dependency; Uninterrupted job processing; Failure detection; Fault tolerance

  27. Autosys (diagram: Client, Server, Ethernet, Autosys Database, Event Processor, Remote Agent). Event Processor: polls; event found; starting conditions met; start up remote agent. Remote Agent: start up; run job; return job status; exit

  28. Job @ Autosys: Job location

  29. Job @ Autosys: Start condition; Min & max run time; Automatic restart; Grouping; Dependency; Standard input & output redirection
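
To make the job attributes above concrete, here is a small sketch of dependency-based start conditions with automatic restart. It is a generic illustration, not Autosys or its job definition syntax; the `Job` class, the status strings, and `run_jobs` are hypothetical, and run-time limits, grouping, and I/O redirection are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    name: str
    action: Callable[[], bool]            # returns True on success
    depends_on: List[str] = field(default_factory=list)
    max_restarts: int = 0                 # extra attempts after a failure


def run_jobs(jobs: List[Job]) -> Dict[str, str]:
    """Run jobs once their dependencies succeed; return a status per job."""
    status: Dict[str, str] = {}
    pending = {job.name: job for job in jobs}
    progressed = True
    while pending and progressed:
        progressed = False
        for name in list(pending):
            job = pending[name]
            dep_states = [status.get(dep) for dep in job.depends_on]
            if any(s in ("failure", "skipped") for s in dep_states):
                status[name] = "skipped"  # start condition can never be met
            elif all(s == "success" for s in dep_states):
                attempts = 1 + job.max_restarts
                # any() stops at the first successful attempt.
                ok = any(job.action() for _ in range(attempts))
                status[name] = "success" if ok else "failure"
            else:
                continue  # dependencies have not finished yet
            del pending[name]
            progressed = True
    for name in pending:
        status[name] = "blocked"  # cyclic or unknown dependencies
    return status
```

With cron alone, the ordering and failure handling above would have to be built into each script, which is the drawback the Unix Cron slide points to.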

  30. EMAIL Email Notification ASE RS ASE Monitoring Host Autosys NOC SA On-call Designated SA Group

  31. Email Notification. Benefits: Easy to configure. Drawbacks: Large volume; Duplicate; Hard to prioritize; Broken escalation chain; Difficult to identify problem

  32. Probes Object Server Netcool/OMNIbus ASE RS ASE Monitoring Host Autosys NOC SA On-call Designated SA Group

  33. Netcool/OMNIbus: Real-time event monitoring & management; Consolidates; Integrates; Configurable; Meaningful
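
One way to picture how consolidation addresses the volume and duplication drawbacks of raw email is the sketch below. It is a generic illustration, not the Netcool/OMNIbus implementation or API; the event tuple layout and the `consolidate` function are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]  # (source, message, severity); higher is more severe


def consolidate(events: Iterable[Event]) -> List[dict]:
    """Merge repeated (source, message) events and order them by severity."""
    merged: Dict[Tuple[str, str], dict] = {}
    for source, message, severity in events:
        key = (source, message)
        if key in merged:
            entry = merged[key]
            entry["count"] += 1
            entry["severity"] = max(entry["severity"], severity)
        else:
            merged[key] = {
                "source": source,
                "message": message,
                "severity": severity,
                "count": 1,
            }
    # Most severe first; sorted() is stable, so ties keep arrival order.
    return sorted(merged.values(), key=lambda e: -e["severity"])
```

A single consolidated entry with a count replaces a stream of identical messages, and the severity ordering gives the on-call SA a starting point for prioritization.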

  34.    Partially Infrastructure Component Web-enabled Operation Repository Automation Notification

  35. Infrastructure: Productivity; Availability; Reliability; Scalability

  36. Business Impact: Improved quality of online services; Overall system availability 99.6% in 1999; Strong subscriber growth (10M to 20M in 2 years); Strong revenue growth (500M to 1.3B in 2 years)

  37. Future Direction: Web-based infrastructure; Knowledge-centric event analysis; Performance-based early detection; Automated agent; Problem auto-correction; Enterprise management integration

  38. Conclusion: Questions & Answers. Contact Information: George Wang <zwang@sybase.com>; Mendy Meyers <mmeyersm@aol.com>. Thank You
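
For a sense of scale, an availability percentage can be converted into hours of downtime. The sketch below assumes the 99.6% figure is measured over every hour of a 365-day year; the slides do not say how availability was measured, so the result is only indicative. Under that assumption, 99.6% corresponds to roughly 35 hours of downtime in the year.

```python
def downtime_hours(availability_pct: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime implied by an availability percentage over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (100.0 - availability_pct) / 100.0 * period_hours
```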
