Incremental Detection and Visualization

Incremental Detection and Visualization of Problem Patterns – a “Simplified” Symptomatic Event Vizualizer – Marcelo Perazolo Autonomic Computing Architecture mperazolo@us.ibm.com Abdi Salahshour Autonomic Computing Technology & Development abdis@us.ibm.com


Incremental Detection and Visualization of Problem Patterns – a “Simplified” Symptomatic Event Vizualizer –

  • Marcelo Perazolo, Autonomic Computing Architecture, mperazolo@us.ibm.com

  • Abdi Salahshour, Autonomic Computing Technology & Development, abdis@us.ibm.com

April 25-26, 2006


Agenda

  • Statement of Problem

  • What is the Common Event Format

  • What is the Symptoms Reference Format

  • A Solution

  • Conclusion

  • Helpful Links


Problems Facing Today's Data Collection

  • Complexity of e-Business

    • Collection of distributed and heterogeneous software and hardware components

  • Variety of Data and Collectors/Adapters

    • Consume and publish proprietary data formats

    • Require ad hoc, product-specific code

      • Data format and APIs

    • Design and Standards considerations

    • Different skill sets needed to configure, maintain, and tune

    • Difficult to correlate for end-to-end (e2e) problem diagnostics

  • Instrumentation

    • Many-to-Many

    • Standards compliance

    • Customer pain and cost of ownership


[ibm][db2][jcc][t4] 0150 0400162110E2C1D4 D7D3C5F140404040 ...!........@@@@ .....SAMPLE1

[ibm][db2][jcc][t4] 0160 4040404040404000 59D0030003005324 @@@@@@@.Y.....S$ ..}......

[ibm][db2][jcc][t4] 0170 0800640000003032 30303053514C5249 ..d...02000SQLRI .............<..

[ibm][db2][jcc][t4] 0180 4558540001000480 0100000000000000 EXT............. ................

[ibm][db2][jcc][t4] 0190 0000000000000000 0000000020202020 ............ ................

[ibm][db2][jcc][t4] 01A0 2020202020202000 1253414D504C4531 ..SAMPLE1 ...........(&<..

[ibm][db2][jcc][t4] 01B0 2020202020202020 20202000000000FF ..... ................

[ibm][db2][jcc][t4]

[ibm][db2][jcc][ResultSetMetaData@108ac50a] BEGIN TRACE_RESULT_SET_META_DATA

[ibm][db2][jcc][ResultSetMetaData@108ac50a] Result set meta data for statement Statement@2b2cc50a

[ibm][db2][jcc][ResultSetMetaData@108ac50a] Number of result set columns: 1

isDescribed=true[ibm][db2][jcc][ResultSetMetaData@108ac50a] Column 1: { label=BALANCE, name=BALANCE, type name=DECIMAL, type=3, nullable=1, precision=9, scale=2, schema name=TEST , table name=ACCOUNTS, writable=false, sqlPrecision=9, sqlScale=2, sqlLength=0, sqlType=485, sqlCcsid=0, sqlName=BALANCE, sqlLabel=null, sqlUnnamed=0, sqlComment=null, sqludtxType=<null>, sqludtRdb=<null>, sqludtSchema=<null>, sqludtName=<null>, sqlxKeymem=0, sqlxGenerated=0, sqlxParmmode=0, sqlxCorname=ACCOUNTS, sqlxName=BALANCE, sqlxBasename=ACCOUNTS, sqlxUpdatable=0, sqlxSchema=TEST , sqlxRdbnam=SAMPLE1, internal type=3, is locator parameter=false }

[ibm][db2][jcc][ResultSetMetaData@108ac50a] { sqldHold=0, sqldReturn=0, sqldScroll=0, sqldSensitive=0, sqldFcode=85, sqldKeytype=0,

Event Logging

source=com.ibm.ws.rsadapter.spi.WSRdbDataSource org=IBM prod=WebSphere component=Application Server

<init>

[11/25/03 14:14:33:695 EST] 42754514 > UOW= source=com.ibm.ws.rsadapter.DSConfigurationHelper org=IBM prod=WebSphere component=Application Server

createDataStoreHelper parm1=com.ibm.websphere.rsadapter.CloudscapeDataStoreHelper parm2={}

[11/25/03 14:14:33:695 EST] 42754514 d UOW= source=com.ibm.websphere.rsadapter.GenericDataStoreHelper org=IBM prod=WebSphere component=Application Server

init parm1=com.ibm.websphere.rsadapter.CloudscapeDataStoreHelper@2128451b

[11/25/03 14:14:33:695 EST] 42754514 d UOW= source=com.ibm.websphere.rsadapter.DataStoreHelperMetaData org=IBM prod=WebSphere component=Application Server

setGetTypeMapSupport: false

[11/25/03 14:14:33:695 EST] 42754514 d UOW= source=com.ibm.websphere.rsadapter.DataStoreHelperMetaData org=IBM prod=WebSphere component=Application Server

setHelperType: 0

[11/25/03 14:14:33:695 EST] 42754514 d UOW= source=com.ibm.websphere.rsadapter.CloudscapeDataStoreHelper org=IBM prod=WebSphere component=Application Server

the cloudscape metadata is : parm1=

The defaultTransactionIsolation is: 2

The supportsExtendedForUpdate is: false

The supportsKerberos is: false

The supportsSelectForUpdate is: true

The supportsGetCatalog is: true

The supportsGetTypeMap is: false

The supportsIsReadOnly is: true

The supporstMultiplePartitionDB is: false

[Figure labels: Applications; Database Servers; Application Servers; Storage devices; Networks; Proprietary format]


Blame Storming Syndrome

  • Problem determination may take days or weeks

  • Proprietary log format

  • Domain specific set of tools

  • No interfaces between tools

  • Siloed problem determination

  • Finger pointing resolution

[Figure labels: Applications; Database Servers; Application Servers; Storage devices; Networks; Proprietary format; Specialized skills and tools]


Common Base Event (CBE) / WSDM Event Format (WEF)

  • Richer, normalized data enables cross-product analysis and correlation, and is a prerequisite to effective root cause analysis and automation

  • Without standards, event data are of little value to autonomic management for problem determination and responsive action

  • To alleviate this, event data are structured in four categories:

    • The identification of the component that is affected by or experienced the situation

      • This is also known as the source of a situation

    • The identification of the component that is reporting the situation

      • This is also known as the reporter of a situation

      • It may be the same as the source component of the situation

    • The situation data

      • Properties or attributes that describe the situation

    • The Context/Correlation data

      • Properties or attributes to correlate the situations with others

  • CBE / WEF

    • A consistent specification for the definition of normalized event and log information for various domains (business, security, network, system, etc.)

    • An exchange format for events and logs

    • Describes situations concerning the external operational capabilities of a component

    • Not positioned for data that captures execution information within a component (i.e., trace)

    • Context Data
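The four categories above can be pictured as a small record type. The following is a minimal sketch only: the class and field names (`ComponentId`, `Event`, `source`, `reporter`, `situation`, `context`) are hypothetical and are not the element names of the CBE or WEF schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ComponentId:
    """Hypothetical identification of a component (not the CBE schema)."""
    component: str
    location: str


@dataclass
class Event:
    source: ComponentId                     # component affected by the situation
    situation: Dict[str, str]               # properties describing the situation
    context: Dict[str, str] = field(default_factory=dict)  # correlation data
    reporter: Optional[ComponentId] = None  # may be the same as the source

    def effective_reporter(self) -> ComponentId:
        # When no separate reporter is given, the source reported the event itself.
        return self.reporter if self.reporter is not None else self.source


def correlate(a: Event, b: Event) -> bool:
    """True when the two events share a context key with the same value."""
    return any(b.context.get(key) == value for key, value in a.context.items())
```

Because both events carry the same kind of context data, a correlation check needs no product-specific parsing, which is the point the slide makes about normalized events.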


What is a Symptom?

  • Dictionary definition: “A characteristic sign or indication of the existence of something else.”

  • AC definition: “A characteristic sign or indication of a possible problem or situation happening in the context of one or more manageable resources.”

    • A form of knowledge, used to solve problems and situations automatically in an autonomic system.

    • Symptoms are composite records of information, formed by the combination of raw or composite information into patterns

    • Symptoms may be composed of other symptoms as well


From Events to Symptoms

  • Event: an indication of something being monitored

    • For example, memory usage has exceeded a set limit

  • Symptom: a characteristic sign or indication of a possible problem or situation happening in the context of one or more manageable resources

    • Symptom: If event x (and y (and…) ) occur (under certain conditions), then report the occurrence and possible resolution actions

    • For example, memory usage has exceeded a set limit three times in a 10-minute stretch: suggest increasing your buffer sizes
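The memory-usage example can be sketched as a sliding-window rule. This is an illustration under stated assumptions, not the presenters' implementation: the class name `ThresholdSymptomRule`, its parameters, and the requirement that events arrive in timestamp order are all hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class ThresholdSymptomRule:
    """Recognize a symptom when a limit is exceeded N times within a time window."""

    def __init__(self, limit, occurrences=3, window=timedelta(minutes=10),
                 action="Consider increasing your buffer sizes"):
        self.limit = limit
        self.occurrences = occurrences
        self.window = window
        self.action = action
        self._hits = deque()  # timestamps of events that exceeded the limit

    def observe(self, timestamp, memory_usage):
        """Feed one event (assumed in timestamp order); return the action or None."""
        if memory_usage <= self.limit:
            return None
        self._hits.append(timestamp)
        # Drop occurrences that have fallen out of the window.
        while self._hits and timestamp - self._hits[0] > self.window:
            self._hits.popleft()
        if len(self._hits) >= self.occurrences:
            return self.action
        return None
```

A single event over the limit yields nothing; only the pattern of three events inside ten minutes produces the suggested resolution, which is the difference between an event and a symptom.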


Symptoms Reference Architecture

[Figure: autonomic control loop (Monitor, Analyze, Plan, Execute) around Knowledge. Labels: Event, Symptom, Change Req, Change Plan, Policy, SymptomDefinition, SymptomCatalog, rule, deploy, engine, instance.]

  • schema: schema used to create a new instance of the symptom

  • metadata: schema used to index and categorize all forms of knowledge

  • rule: schema used to recognize a symptom instance

  • effect: schema that describes how to react to instances of the symptom

  • engine: a runtime artifact used to produce symptom instances

  • instance: an instance of this symptom that conforms to the symptom schema
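One way to read the parts of a symptom definition (schema, metadata, rule, effect, engine, instance) is as plain data structures. The sketch below is an assumption-laden illustration: every class and method name is hypothetical, the rule is reduced to a Python callable, and the effect to a recommendation string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SymptomInstance:
    """Plays the role of the schema: the shape every produced instance has."""
    symptom_id: str
    recommendation: str


@dataclass
class SymptomDefinition:
    symptom_id: str
    metadata: Dict[str, str]             # indexes and categorizes the knowledge
    rule: Callable[[List[dict]], bool]   # recognizes the symptom in a set of events
    effect: str                          # how to react to an instance


class SymptomCatalog:
    """Holds definitions and looks them up by metadata."""

    def __init__(self):
        self._definitions: List[SymptomDefinition] = []

    def add(self, definition: SymptomDefinition) -> None:
        self._definitions.append(definition)

    def find(self, **wanted: str) -> List[SymptomDefinition]:
        return [d for d in self._definitions
                if all(d.metadata.get(k) == v for k, v in wanted.items())]


class Engine:
    """Runtime artifact: evaluates deployed rules and produces instances."""

    def __init__(self, definitions: List[SymptomDefinition]):
        self.definitions = list(definitions)

    def evaluate(self, events: List[dict]) -> List[SymptomInstance]:
        return [SymptomInstance(d.symptom_id, d.effect)
                for d in self.definitions if d.rule(events)]
```

Deploying then amounts to handing a subset of the catalog to an engine, for example `Engine(catalog.find(product="DB2"))`, where the `product` metadata key is again only an example.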


The Value Proposition

  • Management data made more consumable for end users

    • Visualization of product symptoms within problem determination tooling

    • Symptoms are more deterministic than individual events

    • Increased customer satisfaction

  • Reduced problem determination costs

    • Administrators use automated event correlation to recognize symptoms (and potentially, corrective actions)

    • Support personnel access symptoms directly from the problem determination tools

    • Cross-product symptom catalogs allow quick diagnosis for known errors

  • Reduced maintenance costs

    • Incremental improvements to symptom databases will reduce requests to L2 and L3 support

    • Reduced support requests from other IBM organizations

    • Standard symptom format allows products to leverage problem resolution cost from other IBM organizations (e.g. Collaboration Center)


One Tool Does Not Fit All!

[Figure: tools plotted against user skills, from Basic (e.g. operators) to Advanced (e.g. developers), and from Simple to Advanced. Roles: Operators, System Analysts, Support Engineers, Change Team, Developers. Tasks: Triage, Analysis, Correlation. Tools: LTA-JD, LTA-portal, LTA-eclipse.]


“Simple” Log and Trace Analyzer for Java Desktop

  • Standalone simple Java event viewer to merge, filter, sort, and display contents of event sources in a common event format (i.e., CBE) for problem isolation and triage to problem analysis

    • Enables end-to-end viewing of event sources across the heterogeneous environment

    • Customizable summary view

    • Ability to select and expand any row from the summary view to display the full CBE attributes

    • Correlate on timestamp and/or sort on any Common Base Event property

    • Filtering and multi-level sorting on any event property

    • Custom highlighting of triage events (simple symptoms definition)

    • Save and share configuration settings (import/export)

    • Starting point for Support personnel and Operations staff

      • Springboard to more advanced analyzer tools
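The merge, filter, and multi-level sort operations listed above can be sketched in a few lines. This is not LTA-JD code: the function names are hypothetical, events are plain dictionaries, and the property names `creationTime` and `severity` are used only as examples.

```python
import heapq
from operator import itemgetter


def merge_sources(*sources):
    """Merge event lists that are each already sorted by creation time."""
    return list(heapq.merge(*sources, key=itemgetter("creationTime")))


def filter_events(events, predicate):
    """Keep only the events for which the predicate holds."""
    return [event for event in events if predicate(event)]


def multi_level_sort(events, keys):
    """Sort on several properties.

    keys is a list of (property, descending) pairs, most significant first.
    Sorting from the least significant key upward works because Python's
    sort is stable.
    """
    result = list(events)
    for prop, descending in reversed(keys):
        result.sort(key=itemgetter(prop), reverse=descending)
    return result
```

Merging on the timestamp is what gives the end-to-end view across heterogeneous sources; the filter and sort steps then narrow that view for triage.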


Overall Architecture

[Figure labels: CBE Event Sources; Process CBE; Fast XPath; Visual Filters]

  • FastXPath

    • Integrates solution with existing code generation tools

    • Extracts XML schema-specific metadata from the object it queries

    • Uses metadata available in auto-generated classes to build optimized XSL engines
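The idea of using class metadata to evaluate a query directly against objects, rather than against an XML tree, can be illustrated with a toy compiler for one expression shape. This sketch is not FastXPath and does not reflect its internals: the `CommonBaseEvent` dataclass stands in for an auto-generated class, `compile_filter` is a hypothetical name, and only expressions of the form `/Element[@attr OP 'value']` are handled.

```python
import operator
import re
from dataclasses import dataclass, fields


@dataclass
class CommonBaseEvent:
    """Hypothetical stand-in for an auto-generated event class."""
    severity: int
    msg: str


_OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
        "=": operator.eq, ">": operator.gt, "<": operator.lt}

_PATTERN = re.compile(r"^/(\w+)\[@(\w+)\s*(>=|<=|!=|=|>|<)\s*'([^']*)'\]$")


def compile_filter(expression, cls):
    """Turn a simple attribute predicate into a function over cls objects."""
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError("unsupported expression: " + expression)
    element, attr, op, literal = match.groups()
    if element != cls.__name__:
        raise ValueError("expression does not target " + cls.__name__)
    # Use the class metadata to convert the literal once, at compile time.
    types = {f.name: f.type for f in fields(cls)}
    if attr not in types:
        raise ValueError("unknown attribute: " + attr)
    value = types[attr](literal)
    compare = _OPS[op]
    return lambda event: compare(getattr(event, attr), value)
```

The expression is parsed and type-converted once, so filtering a large list of events reduces to one attribute read and one comparison per event.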


[Screenshot: LTA-JD main window, showing the event sources collection, the customizable results/summary area, and the events detail area]


[Screenshot: filter definitions, annotated as equivalent to Symptom Rules]

This filter selects events by Creation Time using an XPath expression that can be generated by the Filter Builder


Filter Builder (Novice Users)

Powerful composition dialogs…

… while still showing full XPath syntax for power users


We associate visualization attributes with Symptom Rules




Flexibility to show only what the user wants to see: filters out the non-participating events


Symptom details (description of the problem) show up when hovering over the highlighted events
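The behavior described on the last few slides (visualization attributes attached to symptom rules, hiding non-participating events, and a description shown on hover) could be modeled roughly as follows. All names here (`SymptomRule`, `annotate`, `participating_only`, `tooltip`) are hypothetical, and a rule is simplified to a predicate over a single event.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class SymptomRule:
    name: str
    matches: Callable[[dict], bool]  # single-event rule, as in simple triage
    color: str                       # visualization attribute for highlighting
    description: str                 # shown when hovering over a highlighted event


def annotate(events: List[dict],
             rules: List[SymptomRule]) -> List[Tuple[dict, Optional[SymptomRule]]]:
    """Pair each event with the first rule it matches, or None."""
    return [(event, next((r for r in rules if r.matches(event)), None))
            for event in events]


def participating_only(annotated):
    """Filter out events that take part in no symptom."""
    return [(event, rule) for event, rule in annotated if rule is not None]


def tooltip(rule: SymptomRule) -> str:
    """Text a viewer might display on hover."""
    return f"{rule.name}: {rule.description}"
```

Keeping the color and description on the rule rather than on the event means one definition both detects and presents the symptom.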


Helpful Links

  • Autonomic Computing Enablement Site

    • http://acenablement.raleigh.ibm.com/

    • http://acenablement.raleigh.ibm.com/html/technology/pd/pddwnlds.html

  • Autonomic Computing

    • http://www.ibm.com/autonomic

  • Autonomic Computing Toolkit

    • http://www.ibm.com/developerworks/autonomic

  • Autonomic Computing Toolkit Download

    • http://www-106.ibm.com/developerworks/autonomic/probdet1.html

  • Common Base Event Version V1.0.1 (CBE)

    • http://dev.eclipse.org/viewcvs/indextools.cgi/~checkout~/hyades-home/docs/components/common_base_event/cbe101spec/CommonBaseEvent_SituationData_V1.0.1.pdf

  • WSDM Event Format V1.0 (WEF)

    • PART 1: http://docs.oasis-open.org/wsdm/2004/12/muws/cd-wsdm-muws-part1-1.0.pdf

    • PART 2: http://docs.oasis-open.org/wsdm/2004/12/muws/cd-wsdm-muws-part2-1.0.pdf

  • Common Event Infrastructure (CEI)

    • http://www.ibm.com/software/tivoli/features/cei/

    • http://www-106.ibm.com/developerworks/library-combined/ac-cei


Backups


[Figure labels: CBE Object (ACT/XPath); CEI/ESB (XPath); Import (CBE XML formatted logs); CBE Logs; SymptomDB. Stages: Solution Problem Isolation; Solution Problem Analysis; Solution Problem Isolation & Analysis; Product Problem Isolation & Analysis.]

Use Cases

LTA-Eclipse (Correlate/Analyze)

  • Event viewing

  • Merge/sort/filter

  • Event correlation

  • Cross-Event analysis (symptoms)

  • Remote/local data collection

  • Event conversion

[Figure labels: Generic Log Adapters (GLA); CBE Events; RAC (API); CBE XML; Log and Trace Analyzer tools retrieve and analyze CBE log data; LTA-JD (Triage); Triaged CBE Events; LTA-JD (Analyze)]

LTA-Portal (Correlate/Analyze)

  • Event viewing

  • Merge/sort/filter

  • Event correlation

  • Cross-Event analysis (symptoms)

  • Remote/local data collection

  • Event conversion

[Figure labels: CBE XML formatted logs; Applications]

  • Event viewing

  • Merge/sort/filter

  • Single-event analysis (highlighting/simple symptom rules)

  • Local data collection

  • Remote data collection from CEI server


LTA-JD Performance

  • Evaluation of LTA-JD end-to-end (XML input, convert and process object, filter, display)

  • Evaluation of simple FastXPath expression

    • /CommonBaseEvent[@severity >= '10'] on 100000 CBEs

    • FastXPath (157 millisecs), JXPath (468 millisecs), Xalan (1328 secs)

  • Better results with

    • smarter filters

    • bigger JVM heap

    • IBM JDK 1.5 (~60% improvement)

