This presentation focuses on BT's strategies for managing intricate end-to-end (e2e) systems, highlighting challenges and solutions in service design and operational management. Topics include standards for application events, business process monitoring, and the integration of metrics for effective service level management (SLM). Discussion encompasses real-time order tracking, capacity management, automated event correlation, and the importance of aligning SLAs with business requirements. BT's experience exemplifies the need for innovative approaches in dynamic, distributed architectures.
BT – Managing Complex Systems Ian Johnston & John Palmer BCS Kingston & Croydon Branch presentation 26/02/08
Presentation Objectives • Approach to managing e2e systems • A standard for application events • Business process and component transaction monitoring • Order tracking and jeopardy • Leveraging the value of monitoring, e.g. ASGs, Service and Capacity • Managing COTS products, e.g. BEA, Siebel
The BT experience • BT architecture – SOA – linked reusable capabilities • Our position has been driven by experience of monitoring complex distributed architectures. • Configuring toolsets to monitor e2e is unachievable for large enterprises: maintenance is expensive or impossible. • This has led us along the design route, which now parallels ITIL's Service Design concepts.
BT Matrix Architecture Challenges - Service Design • Service Level Management • SLAs aligned to business requirements • BT’s outsourcing strategy • Availability • Understanding CE requirements • Response times • Capacity Management • Accurate measurement of transaction volumes • Response times broken down by capability • IT Service Continuity management • Dynamic deployment in virtualised environments • Physical and geographic resilience • SLM • Defining measurements & targets, eg volumes, response times • Aligning SLAs with UCs • Capacity Management • Procedures to ensure customer targets are met • Business Continuity management • Deployment designs to ensure resilience • Availability management • Measure e2e availability broken down to capabilities
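As a rough illustration of "e2e availability broken down to capabilities", the sketch below assumes that an e2e transaction passes through its capabilities in series and that their failures are independent, so the e2e figure is the product of the per-capability figures. The capability names and availability values are hypothetical, not BT data.

```python
from math import prod


def e2e_availability(capability_availability):
    """Multiply per-capability availabilities (fractions between 0 and 1).

    Assumes a serial chain with independent failures; shared or
    redundant capabilities would need a different model.
    """
    return prod(capability_availability.values())


def weakest_capability(capability_availability):
    """Return the name of the capability with the lowest availability."""
    return min(capability_availability, key=capability_availability.get)


# Hypothetical figures for three capabilities in one order journey.
measured = {
    "manage_order": 0.999,
    "build_port": 0.995,
    "activate_service": 0.990,
}

overall = e2e_availability(measured)
bottleneck = weakest_capability(measured)
```

Keeping the figures per capability, rather than only the e2e total, is what allows an SLA breach to be traced back to the capability that caused it.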
BT Matrix Architecture Challenges - Service Operation • Operational management • How to assess the impact of application events and prioritise them by business process and IT service? • Application management • Routing of PRs to the appropriate support groups? • Analysing high volumes of events in log files? • Technical management • Pinpointing root cause across multiple shared capabilities • Metrics • Stepped changes in volumes, errors and response times? • Impact of changes, e.g. trend in error rates • Measuring operational efficiency, e.g. txns vs. failures
BT Matrix Architecture Challenges – E2E Design [Flow diagram: "SF – Provide – Progress – pt 1" (Place Order), incorporating Flow Stream / Manage / Monitor / Director. Steps shown include Create ServiceID, Place Order, Build Port, Get Tie Cable Mapping, Build VC (RADIUS, B-RAS, VCI, etc.), Update (SMPF ID, Installation DN, etc.) and the activation email. A network capacity shortfall goes into an error queue for manual processing. Order status moves through Pending, Assigned, Acknowledged (SMPF ID), Committed, Installed and Completed, then passes to the "Close Order" sub-process.]
BT Approach – Application event standard [Diagram: fields of the standard] • Business process • Business transaction • Event type • Time • Application • Host • Server • Business keys • e2e correlation key • Component capability
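The fields named on this slide could be carried in a single event record. The sketch below is only an illustration of that idea: the slide gives the field names but not their types or a wire format, so the Python types, the JSON encoding and all sample values are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApplicationEvent:
    """One application event carrying the fields listed on the slide."""

    business_process: str
    business_transaction: str
    event_type: str
    time: datetime
    application: str
    host: str
    server: str
    component_capability: str
    e2e_correlation_key: str
    # Free-form business identifiers; the key names are hypothetical.
    business_keys: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise to one JSON line (an assumed format, not BT's)."""
        record = asdict(self)
        record["time"] = self.time.isoformat()
        return json.dumps(record, sort_keys=True)


# Hypothetical sample event.
event = ApplicationEvent(
    business_process="Provide",
    business_transaction="Place Order",
    event_type="error",
    time=datetime(2008, 2, 26, 9, 30, tzinfo=timezone.utc),
    application="order-manager",
    host="host01",
    server="server-1",
    component_capability="Build Port",
    e2e_correlation_key="ORD-0001",
    business_keys={"service_id": "S123"},
)
line = event.to_json()
```

Because every application emits the same fields, downstream tools can filter, group and summarise events without per-application parsing rules.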
BT Matrix Architecture Solution - Service Design SLM • Agile design workshops to build in measures to support SLAs Availability • Agile capability workshops to build in measures for monitoring of capacity, implemented by APIs • Standardised events for common error conditions such as interface failures IT Service Continuity • Dynamic reports of services and deployment profile (host/server distribution)
BT Matrix Architecture Solution - Service Operation Operational management • Event correlation (by service and transaction identifiers) • Impact (problem scenario and guided action) • Performance bottlenecks • Support group checklists (quick wins) Application management • Improved routing of PRs to the appropriate support groups, provided by the e2e view • We can analyse high volumes of events by restricting the types of events and providing summarisation Technical management • Diagnosis – root cause (e2e location and standard error) Metrics • Summarisation and granularity inherent in the standard
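Event correlation, summarisation and locating the first failure in an e2e chain can be sketched as follows. This is a minimal illustration, assuming events arrive as dictionaries with hypothetical keys `e2e_key`, `capability`, `event_type` and `time`; it is not the toolset described in the presentation.

```python
from collections import Counter, defaultdict


def correlate(events):
    """Group events by e2e correlation key, each group ordered by time."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev["e2e_key"]].append(ev)
    return {key: sorted(evs, key=lambda e: e["time"])
            for key, evs in groups.items()}


def summarise(events):
    """Count events per (capability, event type) pair."""
    return Counter((ev["capability"], ev["event_type"]) for ev in events)


def first_error(chain):
    """Return the earliest error event in a correlated chain, or None."""
    return next((ev for ev in chain if ev["event_type"] == "error"), None)


# Hypothetical events from two orders; time is a simple sequence number.
events = [
    {"e2e_key": "ORD-1", "capability": "place_order", "event_type": "start", "time": 1},
    {"e2e_key": "ORD-2", "capability": "place_order", "event_type": "start", "time": 2},
    {"e2e_key": "ORD-1", "capability": "build_port", "event_type": "error", "time": 3},
    {"e2e_key": "ORD-2", "capability": "build_port", "event_type": "end", "time": 4},
    {"e2e_key": "ORD-1", "capability": "build_vc", "event_type": "error", "time": 5},
]

chains = correlate(events)
summary = summarise(events)
root = first_error(chains["ORD-1"])
```

In this sample the earliest error for order `ORD-1` is in `build_port`, so a PR would be routed to that capability's support group rather than to the downstream one that failed as a consequence.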
Outsourcing Supplier Contracts 1. Monthly views to identify any stepped changes in • Volumes, response times, error rates 2. Weekly views of top 5-10 transactions showing • Distribution of volumes, variance in response times, peaks and spikes • Any worsening trends in errors and thresholds 3. Monthly analysis of error messages showing • Volumes of errors, e.g. aborts, application, business, etc. • Breakdown by business process, IT service and component transaction • Corresponding traps and CR/DRs using AlarmMis 4. Ad-hoc investigations to review • Loadings and relative performance across servers • Real-time transaction analysis • Drill-down diagnostics • COTS, platform and network root cause analysis 5. Service management process to review • Capacity • Suppliers' (e.g. Siebel, WLS) and applications development group's CRs and DRs • PRs against remedial activities
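One simple way to surface a stepped change in a monthly series is to compare each value with the average of the preceding few periods. The sketch below assumes that approach; the window size, the 25% threshold and the sample volumes are illustrative choices, not figures from the presentation or from any supplier contract.

```python
from statistics import mean


def stepped_changes(series, window=3, threshold=0.25):
    """Return the indices whose value differs from the mean of the
    previous `window` values by more than `threshold` (as a fraction).
    """
    flagged = []
    for i in range(window, len(series)):
        baseline = mean(series[i - window:i])
        # Skip a zero baseline to avoid dividing by zero.
        if baseline and abs(series[i] - baseline) / baseline > threshold:
            flagged.append(i)
    return flagged


# Hypothetical monthly transaction volumes (thousands) with a step up.
monthly_volumes = [100, 102, 98, 101, 150, 152, 149]
changes = stepped_changes(monthly_volumes)
```

The same function can be applied to response times or error rates. Note that periods just after a step may also be flagged until the trailing baseline has caught up with the new level.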
What is the BT experience? Key messages • Define a standard for application events • Instrumentation by design, built into matrix capabilities • Implementation using agile design workshops • Exploitation of the toolset supported by supplier contracts • An application monitoring standard promotes effective problem management through integration with the enterprise's diagnostic toolsets
[Diagram labels: Events; Performance; Hunter Integration Console; System & Application Trap Definitions; Management Frameworks; COTS Monitoring definitions, e.g. Siebel, BEA, Oracle; Remote Operation; Business Process & Application txn Monitoring] • Flexible & agile • Uses COTS out-of-the-box • Rapid development & deployment • Any management frameworks • Low maintenance