1. Management Pack Development Vlad Joanovic
Senior Program Manager
Microsoft Corporation
2. Agenda Building manageable applications
Common integration patterns
Customer data
Technical best practices and architecture
Understand how to model an application and use the model to build a management pack
Understanding of the steps required to build a management pack
What to do next and what questions to ask
Summary
References
3. Operations Manager's Focus Areas Best for Windows
Best for Physical and Virtual
New Monitoring Scenarios
End-to-end IT service mgmt (distributed apps, web services)
Client Monitoring
Audit and Compliance
SP2: Non-Windows monitoring
Best for Data Center Management
Solution-based selling
4. Microsoft Investing in Ops Mgr Operations Manager Team:
SP1 and developing SP2
Includes non-Windows monitoring
Management Packs
Over 85 Operations Management Packs delivered since RTM
Windows Server 2008 MPs and Support
New MP Authoring Console!
Other Microsoft Areas:
Corporate: Engyro acquisition for Ops Mgr interoperability
Microsoft Consulting Services: Server Infrastructure Optimization
Visio: New Connectors for Visio 2007
Solution Accelerator: Service Level Dashboard for System Center Operations Manager 2007 – New Beta!
Visual Studio: Team System Management Model Designer
Forefront: next generation security products – Stirling - built on Operations Manager 2007
5. Our Commitment to MP Partners Make it as easy as possible for you to develop onto Operations Manager
Authoring Console and documentation
Responsive business and technical resources
Built on open standards such as WS-Management and common models
Options for us to work together
Joint sales and marketing plans
Content (case studies), training, and readiness to support joint efforts
Online marketing through System Center Alliance
www.microsoft.com/systemcenter/alliance
Management Pack catalog
Frequent communication
Joint collaboration in planning for future releases
By building partner solutions to extend Operations Manager you increase your opportunities, and we close more joint sales!
6. Thinking about monitoring Most instrumentation seen in the wild today…
Doesn’t tell me if a service or application is working well
Was great for the developer while debugging
Reports a symptom, and rarely alone is suitable to make a diagnosis
Most monitoring today is …
Added on by the people who are responsible for keeping the application running
Rarely a part of the up front system or application design effort
A best guess on the part of the person or team who designed monitoring rules based on what instrumentation is visible after setting up an application on a test environment.
Today we’ll talk about improving these
Just food for thought if you are shipping applications that need to be monitored.
7. Life cycle states & measurements
8. Three important questions Is my application healthy?
Use health measures to show there are no customer impacting issues
Look at redundant measures that detect elements that have failed
Look at the balance of work across the system
Are critical dependencies able to perform in concert without major disruption to users?
Are the users of my application happy?
How fast do your pages load from request to responsiveness?
Look at page abandonment rates relative to overall traffic
Can an end to end interaction happen without interruption?
Consider artificial transactions as a weak proxy for these
How well do the parts of my application work together?
Look at subsystem measures that signal imbalances
Instrument for detecting problems where they occur
Be able to follow a call from end to end if necessary
9. Failure-mode analysis Definition
An up-front design effort for a monitoring plan that is similar to threat modeling
Produces
Instrumentation plan
Design artifacts used to write code that helps detect failures
Typically shows up in specs
Monitoring plan
Used by operations to configure the monitoring system(s) during deployment
Health model
Describes health at the end-user and subsystem level
Used to understand impact of specific types of failures on each subsystem
Guides mitigation and recovery documentation
Helps drive escalations from Tier 1 to Tier 2
Driven by:
Monitoring champion: Ensures that monitoring is part of design process
10. Failure mode analysis Process
Step 1: List what can go wrong and cause harm to service:
Identify all failure modes: List predictable ways to fail
Understand if an item is a way to fail or an effect of a failure
Prioritize according to impact on service health, probability, cost
Include physical, software, and network components
Step 2: Identify a detection strategy for each failure mode
Each high-impact item needs at least two detection methods
Detection can be a measure or event, or can require watchdog (code)
Step 3: Add these detection elements to your code effort
Some are probes, some are monitors (automate as much as possible)
Step 4: Write your management pack
Result is the basis of instrumentation and monitoring plans
Failure modes are root causes
Detecting root causes directly is optimal
Inferring root causes via symptoms requires correlation
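The prioritization and coverage checks in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration, not part of any OpsMgr tooling: the 1-5 scoring scales, the impact-times-probability ranking (cost is left out for brevity), the threshold for "high impact", and the detection names are all assumptions made for the example. The failure mode names are borrowed from the Blue client library example.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    name: str
    impact: int        # assumed scale: 1 (low) .. 5 (high)
    probability: int   # assumed scale: 1 (rare) .. 5 (frequent)
    detections: list = field(default_factory=list)  # measures, events or watchdogs

    @property
    def priority(self):
        # Hypothetical ranking rule: impact multiplied by probability.
        return self.impact * self.probability


def under_covered(modes, high_impact=4, required=2):
    """Names of high-impact failure modes with fewer than `required` detection methods."""
    return [m.name for m in modes
            if m.impact >= high_impact and len(m.detections) < required]


modes = [
    FailureMode("Config file missing", 5, 2, ["startup event", "watchdog probe"]),
    FailureMode("Discovery layer not responding", 5, 3, ["timeout counter"]),
    FailureMode("Log cannot be written", 2, 2, []),
]

# Step 1: rank failure modes so the most important ones are handled first.
ranked = sorted(modes, key=lambda m: m.priority, reverse=True)
print([m.name for m in ranked])

# Step 2: flag high-impact items that do not yet have two detection methods.
print(under_covered(modes))
```

Running the check as part of a spec review makes gaps visible before the management pack is written; the list it prints is effectively the empty cells of a coverage matrix.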
11. Example: Blue client library Log cannot be written
Garbage collection interferes with expected behaviors
Machine.config is not set up right
Wrong version of .NET framework on machine
CRC failures
Discovery layer responding slowly
Discovery layer not responding
Discovery layer returns incorrect location
Location cache is out of date
Unrecognized or bad blob format causes crash
Remote code execution exploit occurs
Version mismatch
Assembly load fails
Unable to find a specific collection
Unable to communicate with anything
Authorization: Client library not able to communicate
Connection pool is not adding new connections
Client goes into infinite loop and spams blue
Memory not available to hold working set
Config file missing
Config file corrupt
Config file points to wrong places
Communicates with wrong end point
12. Coverage matrix
13. Common integration patterns An application that runs on Windows
Windows Service
Windows Application
Web Service
An application that runs where there is no health service
Exposes syslog, SNMP, WS-Man, or other providers of management data
Proxy management
Another application exists that is already monitoring
14. App that runs on Windows Agent is where monitoring happens
Management servers, gateways and the RMS have “agents” too, so monitoring can happen there
-OpsMgr has already defined and discovered the objects that represent the Windows computers being monitored with an “agent” (this includes management servers, gateways and RMSs)
-This means a user has to deploy an OpsMgr agent to the server or client that you want to monitor
-If discoveries for your objects are going to run within OpsMgr, your first discovery has to be associated with something OpsMgr already knows about
15. An application not on HS “No health service” really means any machine or system that does not have a local agent / health service running on it.
This means it needs to be discovered from a Windows machine that does have a health service, and its data is proxied through that Windows machine.
Note: Proxy monitoring needs to be turned on for any agent that wants to discover instances that are not “hosted” by the Windows computer that is responsible for that HS. This can be changed from Administration -> Agents: right-click, choose Properties, then Security.
16. Proxy Management The proxy machine should be a Windows machine that has an agent on it, or a management server, so the proxy software (i.e. the service) can be monitored by a standard Windows application MP.
17. Pattern Comparison *If the computer is monitored agentless, the object is owned by the HS on the management server or RMS that monitors that computer (the user picks this when choosing how to monitor the computer). 99% of our customers choose to monitor with an agent.
**RMS is the default HS that manages any object discovered through the SDK. This can be changed by specifying a special shouldmanage relationship between the object and the HS that should monitor it. After being discovered, SDK objects typically have events and performance data created against them that should go through the workflows associated with those objects; the SDKevent and SDKperformance data sources only run on the RMS. If a really large amount of data needs to be collected through the SDK/connector, the SDK data source method is not the right one. A ballpark estimate of when there is too much data is around 100-500 data items per minute being inserted (depending on the number of objects, number of associated workflows, etc.). If larger streams of data are expected, new custom performance and event data sources would need to be developed and used instead of the SDK ones to generate data for the workflows. These can execute on management servers or agents dedicated to the connector without negatively affecting the performance of the RMS. For more detail on building data source modules, contact vladj@microsoft.com
***Discovery could be done as in the no-health-service case and does not have to be done through the SDK. For discovery data rules, the maximum discovery size (i.e. the XML that the discovery data produces) is 4MB.
18. MP Methodology Build application to be manageable
Gather monitoring requirements and start building MP
Failure mode analysis
Seek out the knowledge and encode in MP
often in the heads of customers, support staff, operators, subject matter experts, etc
Deploy it in real data centers
Refine it based on real world data
Is it noisy? Or not identifying all the unhealthy situations?
Did it identify the problems before a customer called the helpdesk with an issue?
Regularly update MP as more knowledge is gained
Also provide requirements into application for missing monitoring scenarios
This doesn’t just apply to an application; it could apply to any service, system, device, etc.
The key is to build the application to be manageable from the beginning. The first point really means having the right instrumentation so a monitoring product like OpsMgr can monitor it: events in the event log, log files, SNMP, syslog, WS-Man.
MP requirements need to be identified: what should be monitored? How can it be monitored? It is simple if everything runs on Windows, as that is where we have infrastructure (and soon will have it on non-Windows).
The key is to start building an MP. The first one is never perfect, and it is important to capture how users monitor your application today (e.g. playbooks or operations guides) and encode this into the MP, which OpsMgr can execute at pennies a transaction and which scales better than people manually checking things.
After this is done, the MP needs to be validated in a real-world environment; this is the only way to tell whether it adds value. Key things to look for: does the MP identify problems before users call about them? Is it saying there is a problem when everything is just fine?
Once an MP exists, it needs to be refined and continually updated to stay fresh with knowledge. The potential here is huge: imagine our support team helping one customer with an issue, spending lots of time figuring out what the issue was and how it happened, and then encoding this knowledge/scenario into a new MP. Any customer in the world who uses that new MP will not have to go through the same troubleshooting experience and will know directly from the MP that there is an issue and how it should be fixed.
19. Management Pack contents Class definitions – “the model”
Discoveries
Monitors – “health model”
Tasks
diagnostics, recovery, generic tasks
console tasks
Rules
event and performance collection
generic rules
Knowledge
Views
Reports
Class definitions and their structure (the model)
Discoveries define how to discover instances of the classes and their relationships to other objects
Monitors specify health states to detect and how to detect problems
Tasks
Diagnostic tasks describe what extra data is needed to troubleshoot the problem
Recovery tasks describe how to fix the problem
Generic tasks describe common actions that can be executed on demand for a given instance of an object type
Rules
Event and performance collection rules specify the event and performance data that needs to be collected
Generic rules describe what actions to execute, and when, for all instances of an object type
Knowledge that is displayed in alerts and monitors
Reports that provide common sets of data against the model
20. Getting started Data flow
Where can the objects be discovered?
How is the instrumentation data going to get to an OpsMgr HealthService?
Is something else already “monitoring”
Building the actual MP
Standalone Authoring Console
Operations Manager Console
XML file
OpsMgr SDK apis
21. Building a MP Define the application/model
Define how to discover the application
Define the health model with knowledge
Define views
Define tasks – diagnostic, recoveries, other
Define reports
22. Who are the customers? Anxious IT Managers Don’t Sleep Well
InformationWeek - March 12, 2007
Two out of three IT managers say that they are kept awake at night worrying about work
75 percent admit ongoing anxiety about application performance concerns
25 percent of respondents reported suffering physical symptoms, including nausea, headaches, migraines, panic attacks, heart arrhythmia, and muscle twitches. And nightmares.
Terry Beehr, a Central Michigan University professor of psychology
"If IT goes down, a lot of other departments can't do their work."
"IT is 24-by-7, plus that's combined with heavy workloads and work that needs to be done quickly,"
23. What do the customers want? What MP features are most important?
Health Model 61%
Alerting rules 23%
Alert views 6%, health views 6%, reports 3%
What features are least important?
Tasks 50%
Alert views 16%, performance views 10%, reports 10%
Which of the following monitoring aspects is most applicable?
Availability 84%
Configuration 10%, performance 6%
65% of customers expect SOME MP 30 days after product release
84% of customers expect knowledge to be kept up to date and live with current knowledge bases (quarterly updates)
61% of customers expect “quality” in the MP over timeliness or coverage
81% of customers want to know when an MP is available or has been updated
Get them listed in our catalog and tell us the version so we can let users know when they have the wrong version
24. MP quality Coverage
Are all features being monitored?
Monitor the critical ones or the ones that break the most and cause the most support cases first (release then expand)
Depth
Health and availability
Performance
Transactions (synthetic or real)
Quality
Customer overrides required?
Knowledge correct and up to date
Tasks to help users resolve common problems
25. Quality MP author questions Does the MP show the object as red/down when it is really green/up?
False alerts are a waste of time
Would this red alert be something that a user should get out of bed at 3am for?
Are there cases where the helpdesk learned about a problem with the application before the management pack said there was a problem?
These are important to ask customers about and follow up:
Is this something that can be added to the MP? Instrumentation or transaction?
Or is the application missing instrumentation to enable the MP?
Remember the 3 important questions!!!
26. Define the application Step 1
Define your application
Define the key components
Define the important properties
Define how the key components are tied together
Step 2
Identify how it relates to the rest of the OpsMgr Model
Common base classes
Why relate to other classes? Roll up health, allow user to see associations
3 general approaches
Discover these in your MP
Discover these in another MP (that depends on your MP)
User discovers this in a distributed application
Step 2 can really be done 3 different ways:
1. You can discover the associations to the OpsMgr classes from your MP. This is what you have to do if your classes are hosted directly by the OpsMgr classes (or indirectly, through hosting relationships in the classes you inherited from). For example, you inherit from computer role and define a new computer role for your application. When you discover an instance of your application, you always have to tell OpsMgr which object (a computer in this case) is hosting your instance. This automatically ties your objects to something users are familiar with (a computer), and they will be able to drill into your health model from the computer's health explorer.
2. These relationships can be discovered in a different management pack that depends on yours and on the Microsoft MP you have depended on. This can be done for reference or containment relationships. For example, your application uses SQL: depending on which HS is monitoring SQL, and on whether you want to place a dependency on SQL in your base MP, your application definition could contain or reference the SQL DB that you use.
3. There are many cases where the relationships are very hard to determine. In this case the user can use our distributed application designer to drag and drop a collection of instances from many different MPs into one distributed application definition, so that the app's health is a union of all the objects that the user added.
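One way to picture the health of a distributed application as a combination of its member objects is a worst-state rollup. The Python sketch below is a simplification under stated assumptions: the three-state `Health` enum, the `rollup` function, the worst-of policy, and the member names are all hypothetical and are not the OpsMgr implementation.

```python
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that a larger value means a worse state.
    GREEN = 0
    YELLOW = 1
    RED = 2


def rollup(member_states):
    """Worst-of rollup: the application is as unhealthy as its worst member.

    An application with no members is reported as GREEN here, which is an
    assumption made to keep the example small.
    """
    return max(member_states, default=Health.GREEN)


# Hypothetical members dragged into one distributed application definition.
app = {
    "web front end": Health.GREEN,
    "SQL DB": Health.YELLOW,
    "message queue": Health.GREEN,
}

print(rollup(app.values()).name)
```

With a worst-of policy a single degraded member is enough to change the application's state, which is why transient or noisy objects are poor candidates for inclusion in the model.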
27. OpsMgr Model Keys to success
A model is never perfect - approximations of the real world never are
need to balance complexity and simplicity – keep it simple!
Pitfalls to avoid
Marking objects as internal if they are required to be in distributed applications
Modeling classes for objects that are too transient or not useful for a typical admin to use
Modeling objects that have a very large instance space on 1 HS
Any time more than 1 object exists on an HS, think about cookdown strategies
28. OpsMgr Model OpsMgr ships a large collection of predefined classes or types
Decide which class or type is closest to your model
Use the predefined class or type as a foundation for your application model
29. Model
30. Operating System Model
31. Computer Role Model
32. Defining the application DEMO
33. Discovery 2 different execution & data flow methods
Discovery rule in OpsMgr – timed intervals (every hour) or based on some event
SDK discovered – based on when connector calls SDK API
Incremental vs Snapshot
Incremental means the logic around removals and additions lives somewhere else
Snapshot means discovery submission will always include ALL instances
OpsMgr implements business logic around removals, additions
HealthService “Hosting” implications
HS that discovered the object by default “hosts” all workflows against it
When SDK discovers an object RMS hosts it (this can be changed)
Reference counting and removing objects
If X discovery rules discover an object, X rules need to undiscover it
If discovery is disabled by an override, or the object it was targeted to goes away, instances may remain (use the PowerShell command in the notes)
A discovery rule in OpsMgr is executed by the HS in monitoringhost.exe using the specified credentials (the action account by default).
Objects are reference counted, so if X discovery rules discover them, then X discovery rules have to “undiscover” them. If your discovery rules are no longer running at all, due to a targeting change or a disabled discovery rule, then you have to delete the instances through this PowerShell command:
http://blogs.msdn.com/boris_yanushpolsky/archive/2007/11/20/opsmgr-sp1-removing-instances-for-which-discovery-is-disabled.aspx
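The reference-counting behaviour can be illustrated with a small Python model. It is a sketch only: `InstanceStore`, its method names, and the rule and instance names are hypothetical and do not correspond to any OpsMgr API.

```python
from collections import defaultdict


class InstanceStore:
    """Toy model: an instance exists while at least one discovery rule claims it."""

    def __init__(self):
        # instance name -> set of discovery rules that currently report it
        self._sources = defaultdict(set)

    def discover(self, rule, instance):
        self._sources[instance].add(rule)

    def undiscover(self, rule, instance):
        self._sources[instance].discard(rule)
        if not self._sources[instance]:
            # The last reference is gone, so the instance is removed.
            del self._sources[instance]

    def instances(self):
        return sorted(self._sources)


store = InstanceStore()
store.discover("RuleA", "INST1")
store.discover("RuleB", "INST1")

store.undiscover("RuleA", "INST1")
print(store.instances())  # INST1 is still present: RuleB still claims it

store.undiscover("RuleB", "INST1")
print(store.instances())  # now empty
```

If `RuleB` were disabled instead of running again, nothing would ever call `undiscover` for it and `INST1` would linger, which is the situation the manual cleanup addresses.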
34. Discovery timeline (snapshot) MP imported
-New classes and relationships exist in the model
-Anything targeting an object that already exists (like Microsoft.Windows.Server.2003) will automatically be sent to the health services that “host” those instances
-E.g. the SQL Server MP defines a discovery that looks at every Windows server (if SQL is installed on a client that is monitored by the same OpsMgr management group, it will NOT be monitored)
-This discovery looks at the registry to see if SQL is installed; it does this against ALL objects of type Microsoft.Windows.Server.Computer. This object is owned by the HS that is local to that Windows server (except if that machine is being monitored agentless; then the object is “owned” by an HS on the management server or RMS that is configured to monitor that server agentlessly. The user decides this during discovery of the system.)
Discovery runs on the HSs that match its target and sends the output to the management server. Discoveries can be based on things like registry keys (which are cheap; every MP should use this as the first and only discovery rule that gets associated with every object in our system), on WMI queries, or on completely flexible scripts: VBScript, JScript or Perl (anything supported by cscript), as there is a scripting API that allows discovery data to be created.
Let's say in our example there is only 1 server being monitored by the OpsMgr management group that is a Microsoft.Windows.Server.Computer, and it also happens to have 2 instances of SQL installed on it. In this case 2 instances of SQL would come back from the discovery rule (discovery rules run independently on different health services). OpsMgr would see these 2 instances, look at what already exists for this class in the DB (at this point nothing), and then insert these 2 objects, so if the user looks at a state view that shows all SQLDBEngine instances they will see these 2 objects.
Now let's say the user deletes 1 instance of SQL from the machine that the discovery just ran on. The discovery rule that looks for SQL instances is configured to run daily. The discovery rule is in snapshot mode (it can only be changed to incremental in a script), so it runs without any state carried over from the previous run; it always discovers all instances, period. This allows OpsMgr to infer (at the DB level, where the discovery data is being inserted) that 1 of the instances must no longer exist, so it is removed from the model and the user no longer sees the state associated with it.
The same happens with discovery through the SDK if snapshot is turned on. The only difference is that nothing schedules the discovery through the SDK (unless you use a rule from OpsMgr to schedule it). Basically, the connector can submit discovery data any time it wants. The big question is whether it has the means to give OpsMgr deltas and do the business logic to keep the model up to date, or whether it just wants to submit ALL instances ALL the time and let OpsMgr figure it out. For small discovery data sets (fewer than 100 objects) snapshot is convenient. As the discovery data set grows to 1,000s or 10,000s of objects, you will definitely want to make sure that your connector does not use snapshot, as large amounts of discovery data are expensive to process.
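The difference between snapshot and incremental submission comes down to who computes the delta. The Python sketch below models both with plain sets; the function names `apply_snapshot` and `apply_incremental` and the instance names are hypothetical and stand in for logic that OpsMgr or a connector would implement.

```python
def apply_snapshot(known, snapshot):
    """Snapshot mode: the submission lists ALL instances, so the receiver
    infers additions and removals by comparing with what it already stores.

    Returns (new_known, added, removed).
    """
    snapshot = set(snapshot)
    added = snapshot - known
    removed = known - snapshot
    return snapshot, added, removed


def apply_incremental(known, added=(), removed=()):
    """Incremental mode: the submitter has already worked out the delta."""
    return (known | set(added)) - set(removed)


known = set()

# First daily run: two SQL instances are found on the server.
known, added, removed = apply_snapshot(known, ["INST1", "INST2"])
print(sorted(added), sorted(removed))

# Next run, after one instance was uninstalled: it is simply absent.
known, added, removed = apply_snapshot(known, ["INST1"])
print(sorted(added), sorted(removed))

# The same removal expressed as an incremental submission.
print(sorted(apply_incremental({"INST1", "INST2"}, removed=["INST2"])))
```

Note that `apply_snapshot` has to touch every stored and submitted instance on each run, while `apply_incremental` only touches the changes, which is the reason large connectors are better off sending deltas.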
When an MP is imported:
- New classes and relationships exist in the model.
- Anything targeting an object that already exists (like Microsoft.Windows.Server.2003) is automatically sent to the health services (HS) that “host” those instances.
- E.g. the SQL Server MP defines a discovery that looks at every Windows server (if SQL is installed on a client that is monitored by the same OpsMgr management group, the MP will NOT monitor it).
- This discovery looks at the registry to see whether SQL is installed. It does this against ALL objects of type Microsoft.Windows.Server.Computer. Each such object is owned by the HS that is local to that Windows server (except when the machine is monitored agentlessly; then the object is “owned” by the HS on the management server or RMS that is configured to monitor that server agentlessly, which the user decides during discovery of the system).
Discovery runs on the HSs that match its target and sends its output to the management server. Discoveries can be based on registry keys (which are cheap; every MP should use a registry discovery as the first and only discovery rule that gets associated to every object in the system), on WMI queries, or on scripts, which are completely flexible: VBScript, JScript or Perl (anything supported by cscript), since there is a scripting API that allows discovery data to be created.
Let's say that in our example only 1 server monitored by the OpsMgr management group is a Microsoft.Windows.Server.Computer, and that it has 2 instances of SQL installed on it. In this case 2 instances of SQL come back from the discovery rule (discovery rules run independently on different health services). OpsMgr sees these 2 instances, looks at what already exists for this class in the DB (at this point nothing) and inserts the 2 objects, so a user looking at a state view that shows all SQLDBEngine instances will see them.
Now let's say the user removes 1 instance of SQL from the machine that the discovery just ran on. The discovery rule that looks for SQL instances is configured to run daily. The discovery rule is in snapshot mode (only in a script can this be changed to incremental), so it runs without any state carried over from the previous run and always discovers all instances. This allows OpsMgr to infer (at the DB level, where the discovery data is inserted) that 1 of the instances no longer exists. That instance is removed from the model, so the user no longer sees any state associated with it.
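The snapshot inference described above can be illustrated with a small sketch. This is a conceptual model only, not OpsMgr code: the function names, the in-memory set standing in for the DB, and the instance names are all assumptions made for illustration.

```python
def apply_snapshot(stored, discovered):
    """Reconcile a full (snapshot) discovery against what is already stored.

    Anything stored but missing from the snapshot is inferred to be gone.
    Returns (inserted, removed); `stored` is updated in place.
    """
    discovered = set(discovered)
    inserted = discovered - stored
    removed = stored - discovered  # absent from a snapshot => no longer exists
    stored -= removed
    stored |= inserted
    return inserted, removed


def apply_incremental(stored, added=(), deleted=()):
    """Apply explicit deltas; nothing is inferred from absence."""
    stored |= set(added)
    stored -= set(deleted)


# Hypothetical SQL instances on one server (names are made up).
db = set()
first_run = apply_snapshot(db, {"SERVER1\\INST1", "SERVER1\\INST2"})
# The user removes INST2; the next daily snapshot only reports INST1.
second_run = apply_snapshot(db, {"SERVER1\\INST1"})
```

In snapshot mode the submitter needs no memory of earlier runs, but every run has to be compared against everything stored. In incremental mode the submitter (for example a connector) has to track its own deltas, and the comparison work disappears.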
35. Discovery guidelines Keys to success
Use the registry as a discovery source when targeting an object that OpsMgr discovered (i.e. Windows computer, server, client)
A simple model is simple to discover
Pitfalls to avoid
Discovering data too often, which makes the default configuration report useless
General rule: balance discovery frequency with the amount of likely change. Err on the side of less frequent discovery, or use an event-driven model
If it changes a lot and is important to track, think about using a perf counter or event
Making the model too complex – too many classes, relationships
36. Discovery sources on HS Registry
Look for presence of HKLM\Software\YourApp
Look for HKLM\Software\YourApp\Version = 6.0.6278.0
WMI
Perform “select * from Win32_LogicalDisk”
Script
Anything your heart desires (and can be implemented in a script)
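The two registry checks on this slide (key presence, and an exact version match) can be sketched as a pure function. This is an illustration under stated assumptions: the registry is modelled as a plain dictionary so the sketch runs anywhere, and the function name and the shape of the returned object are hypothetical, not an OpsMgr API.

```python
APP_KEY = r"HKLM\Software\YourApp"  # key name taken from the slide


def discover_your_app(registry, required_version=None):
    """Return a discovered-object description, or None if nothing is found.

    `registry` is an in-memory stand-in: {key path: {value name: value}}.
    """
    values = registry.get(APP_KEY)
    if values is None:
        return None  # key absent: the application is not installed
    version = values.get("Version")
    if required_version is not None and version != required_version:
        return None  # installed, but not the version this discovery targets
    return {"class": "YourApp", "version": version}


fake_registry = {APP_KEY: {"Version": "6.0.6278.0"}}
found = discover_your_app(fake_registry, required_version="6.0.6278.0")
```

Both checks are cheap lookups, which is why the registry is the recommended first discovery source.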
37. Discovering the application DEMO
38. Defining Health Model Provide a way for OpsMgr to understand whether application is healthy or not
Monitors (state is based on monitor state transitions)
Monitors can create alerts for state transitions
Rules – create alerts that do not impact state
Provide a way for OpsMgr to automatically diagnose or fix the problem (optional)
Diagnostics
Recoveries
Provide knowledge to the operator on how to fix the application
A set of monitors needs to be defined for each object.
By definition, an object that has no monitors associated with it will show up as “unmonitored”, even if it has open alerts associated with it
The model defined earlier constrains how health can be rolled up (health can roll up over any containment relationship, and this includes hosting)
The HS hosting the object could also prevent the desired rollup
The job/contract of the MP/model is to make sure the “state” of an object is always up to date
Users use the objects in distributed applications (DA), so having the right state is critical
Users mostly use the alerts generated from monitors or rules, so create alerts on health state transitions
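Two points from these notes can be made concrete in a short sketch: an object with no monitors is “unmonitored” regardless of its alerts, and health rolls up over containment. The sketch assumes a simple worst-state rollup policy and a dictionary-based object tree; both are assumptions for illustration, not the product's data model.

```python
from enum import IntEnum


class Health(IntEnum):
    UNMONITORED = 0
    GREEN = 1
    YELLOW = 2
    RED = 3


def object_state(monitor_states):
    """State of one object from its own monitors; no monitors => unmonitored."""
    return max(monitor_states, default=Health.UNMONITORED)


def rolled_up_state(node):
    """Worst state of the object and of everything it contains or hosts."""
    own = object_state(node.get("monitors", []))
    children = [rolled_up_state(child) for child in node.get("contains", [])]
    return max([own, *children])


# Open alerts do not influence state: only monitors do.
alert_only = {"monitors": [], "alerts": ["disk full"]}

# Hypothetical server hosting one database with a red monitor.
server = {
    "monitors": [Health.GREEN],
    "contains": [{"monitors": [Health.GREEN, Health.RED]}],
}
```

Here `rolled_up_state(alert_only)` is `Health.UNMONITORED` and `rolled_up_state(server)` is `Health.RED`, because the contained object's red monitor propagates upward.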
39. Monitors (State monitoring)
40. Health Model
41. Health Model – Roll up
42. Health Model guidelines Keys to success
Provide a HM that captures the different symptoms and appropriately reports state
Pitfalls to avoid
When marking rules or monitors as “Remotable”, make sure they really are remotable
For agentless monitoring
To be public or not to be public (meaning Internal)
If Internal, no customer will be able to add diagnostics or recoveries
For script-based monitors
Do implement OnDemand detection
If a HS hosts multiple instances of the same class (like SQL databases)
Make sure monitors have a criterion on the instance key
Run As profiles
Customers want to run agent action account as a low privilege account
Use a profile for all things resulting in workflows – rules, monitors, tasks and discoveries
43. Blue client library
44. Defining the health model DEMO
45. Cookdown strategy Will your model include monitoring more than 1 object of the same class on a HS?
Workflows are created in the HS from the associated rules/monitors/tasks
Workflows are per instance: with 100s or 1000s of instances there will be (number of instances) * (rules + monitors) workflows = a much bigger number
RMS and MS have an ability in which workflows run per type: (number of types) * (rules + monitors) = a much smaller number
Called “special aggregate monitors”
Cookdown can have a similar effect, reducing load on the HS and MonitoringHost.exe
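The workflow arithmetic on this slide can be checked with a few lines. The counts below (1000 instances, 20 rules, 10 monitors, 1 type) are made-up numbers chosen only to show the scale difference.

```python
def per_instance_workflows(instances, rules, monitors):
    """Workflows loaded when every instance gets its own copy."""
    return instances * (rules + monitors)


def per_type_workflows(types, rules, monitors):
    """Workflows loaded when they run once per type instead."""
    return types * (rules + monitors)


# Hypothetical MP: 20 rules and 10 monitors targeting one class.
heavy = per_instance_workflows(instances=1000, rules=20, monitors=10)
light = per_type_workflows(types=1, rules=20, monitors=10)
```

With these numbers the per-instance design loads 30000 workflows on the HS, against 30 for the per-type design.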
46. Cookdown dataflow Scenario: a HS has 5 instances of LOBAPP. Data comes from the WMI class LOBAPPPerf; there is an instance in WMI for each instance in OpsMgr
Data sources are expensive
If the configuration being passed into a module (including a data source module) is exactly the same in 2 or more instances, the health service will “cook” these down and wire the workflow as shown on the slide. If the configuration being passed into each data source (DS) is different, it cannot be cooked down.
For example, imagine that in the WMI case above the WQL query was something like this:
select * from LOBAPPPerf where LOBAppPerfKey = '$Target/Property[Type="MyCompanyname.OpsMgrLOBAPPClass"]/key$'
This cannot be cooked down, because the configuration of each DS is different: the LOBAPP key property is passed into the query, which is part of the configuration of a WMI data source module. A better way is to specify the query as: select * from LOBAPPPerf. This can be cooked down, and a condition detection module later in the workflow can then match the specific instance (this is far more efficient than running the WQL query 5 times).
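The effect of the two query styles can be modelled by grouping workflows on their data source configuration. This is a conceptual sketch, not the health service's implementation: the function names and the instance keys are hypothetical, and a "configuration" is reduced to the query string.

```python
from collections import defaultdict


def cook_down(workflows):
    """Group workflows by data source configuration.

    Each distinct configuration needs its own running data source, so the
    number of groups is the number of times the query actually executes.
    """
    shared = defaultdict(list)
    for instance_key, ds_config in workflows:
        shared[ds_config].append(instance_key)
    return dict(shared)


def match_instance(rows, instance_key):
    """Stand-in for the condition detection step that filters shared output."""
    return [row for row in rows if row["LOBAppPerfKey"] == instance_key]


keys = ["app1", "app2", "app3", "app4", "app5"]  # 5 hypothetical instances

# The instance key is baked into the query: every configuration differs.
per_instance = [
    (k, f"select * from LOBAPPPerf where LOBAppPerfKey = '{k}'") for k in keys
]
# Same query for every instance; filtering happens later in the workflow.
shared_query = [(k, "select * from LOBAPPPerf") for k in keys]

rows = [{"LOBAppPerfKey": k, "Value": i} for i, k in enumerate(keys)]
```

`cook_down(per_instance)` yields 5 groups (5 query executions), while `cook_down(shared_query)` yields 1 group whose output each workflow then filters with `match_instance`.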
47. Scripting Extend the monitoring abilities of OpsMgr 2007
Discovery
Create new tasks
Create new monitor types
Execute monitoring business logic in rules
Run synthetic transactions to simulate common user behaviour
Good when no monitoring instrumentation exists for a particular scenario
Easier to debug scripts
Debugging scripts:
Use Visual Studio for script debugging
Use Script Debugger:
Find it here http://www.microsoft.com/downloads/details.aspx?familyid=2f465be0-94fd-4569-b3c4-dffdf19ccd99&displaylang=en
When it is installed you can launch a script in the debugger with the //x switch:
cscript.exe //x script.vbs parameter1 parameter2
The script debugger allows you to set breakpoints and to step through code line by line, examining the contents of the variables in the command window. To see the contents of a variable in the command window, enter ?variableName.
Use a debugging tool like PrimalScript. For quick XML formatting use XmlSpy or Visual Studio
Add a “LogDetail” argument to the script; when it is true, dump verbose log information. Expose it as an overridable parameter.
Use poor man's debugging techniques like MsgBox or WScript.Echo in the script.
48. Reporting Out of the box reports
Availability, configuration
Extend reporting
New generic reports or linked (specialized) reports
Supporting database objects (views, functions, stored procedures)
New storage structures (tables) and collection for new data types
49. SCE & management packs SCE = System Center Essentials
OpsMgr 2007 MP with rules/monitors ‘Enabled’ value configured as
OnEssentialMonitoring = Enabled both in OpsMgr and Essentials by default
critical rules/monitors appropriate for all environments
OnStandardMonitoring = Enabled only in OpsMgr by default
warning and information rules/monitors appropriate for complex environments/IT specialists
Customer can override; enable rules/monitors in Essentials that are marked OnStandardMonitoring
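The default behaviour of the two ‘Enabled’ values, plus the customer override, can be summarised in a small sketch. It covers only the two values named on this slide; the function names and the product strings "OpsMgr" and "Essentials" are assumptions for illustration.

```python
def enabled_by_default(enabled_value, product):
    """Default state of a rule/monitor for product "OpsMgr" or "Essentials"."""
    value = enabled_value.lower()
    if value == "onessentialmonitoring":
        return True  # critical workflows: on in both products
    if value == "onstandardmonitoring":
        return product == "OpsMgr"  # on in OpsMgr only
    raise ValueError(f"value not covered by this sketch: {enabled_value}")


def is_enabled(enabled_value, product, override=None):
    """A customer override, when present, wins over the MP default."""
    if override is not None:
        return override
    return enabled_by_default(enabled_value, product)
```

For example, `is_enabled("OnStandardMonitoring", "Essentials")` is false by default, and becomes true when the customer passes `override=True`.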
50. Releasing a MP During creation MP is an XML file
Before releasing it to customers you are required to “seal” it
Not a way to protect IP
Verifies your identity during import
Makes it “read only” and enforces backwards compatibility during upgrades
51. Putting it all together Your objects can be added to a distributed application
Value and customer scenarios are enabled through the health model associated with your object(s)
52. Distributed Applications DEMO
53. Summary Understand which pattern you are building and data flow strategy
Discovery, instrumentation
Building a management pack
Define the application
Define discovery
Define the application health model
Seal your management pack
Release and repeat – focus on knowledge and customer value
Think of the contract signed between OpsMgr and an MP:
- The MP will define classes
- OpsMgr will ensure workflows run on the HS for the types specified, including executing discovery rules, monitors, rules and tasks
- The MP will keep instances/relationships up to date with the real-world environment
- The MP will keep state correct: it will not show green when the object is (or should be) red, or red when the object is green (there is also yellow)
54. References MP authoring guide
Develop against OpsMgr SP1
OpsMgr Authoring console
Microsoft management packs as samples
Blogs
Feedback, or for more information on building MPs: vladj@microsoft.com
MP authoring links and information
OpsMgr 2007 SP1 RTM bits:
Upgrade to be installed on RTM: http://www.microsoft.com/downloads/details.aspx?FamilyId=EDE38D83-32D1-46FB-8B6D-78FA1DCB3E85&displaylang=en
Full install that includes SP1: http://www.microsoft.com/downloads/details.aspx?FamilyId=C3B6A44C-A90F-4E7D-B646-957F2A5FFF5F&displaylang=en
Here is where you can get the latest authoring console:
http://download.microsoft.com/download/f/4/3/f438d6a0-290c-42b8-8f9c-c6660f89e1aa/OpsMgr07_x64_AuthConsole.exe
http://download.microsoft.com/download/f/4/3/f438d6a0-290c-42b8-8f9c-c6660f89e1aa/OpsMgr07_x86_AuthConsole.exe
Here are our authoring guides:
Operations Manager 2007 Management Pack Authoring Guide:
http://download.microsoft.com/download/7/4/d/74deff5e-449f-4a6b-91dd-ffbc117869a2/OM2007_AuthGuide.doc
The report authoring guide for Operations Manager 2007:
http://download.microsoft.com/download/7/4/d/74deff5e-449f-4a6b-91dd-ffbc117869a2/OpsMgr2007_RprtGuide.doc
MP Authoring resources and lots of samples: http://www.authormps.com
Blogs:
OpsMgr++ http://blogs.msdn.com/boris_yanushpolsky
Notes on System Center http://blogs.msdn.com/mariussutara/default.aspx
System Center modules http://blogs.msdn.com/sampatton/default.aspx
SDK and connectors: http://blogs.msdn.com/jakuboleksy/default.aspx
Report authoring: http://blogs.msdn.com/eugenebykov/
Clive's Support blog: http://blogs.technet.com/cliveeastwood/default.aspx
Powershell http://blogs.msdn.com/scshell/
Here is some info on how to generate the XML files for MPs that are imported into your OpsMgr system; the Microsoft MPs are very complete samples. All MPs can be found at our catalog site here:
http://www.microsoft.com/technet/prodtechnol/scp/catalog.aspx
These XML files are useful to view in the authoring console or directly as XML.
It is possible to export them via the command shell or the SDK. In both cases, you need to find the management pack you want to export through whatever method you prefer and then export it as below:
Command Shell Example:
Get-ManagementPack -Name "Microsoft.SystemCenter.2007" | Export-ManagementPack -Path "C:\"
SDK Example:
using System;
using System.Collections.ObjectModel;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Configuration.IO;

namespace WorkSamples
{
    partial class Program
    {
        static void ExportManagementPack()
        {
            // Connect to the local management group
            ManagementGroup mg = new ManagementGroup("localhost");

            // Get any management pack you want
            ManagementPack managementPack =
                mg.GetManagementPack("Microsoft.SystemCenter.2007", "31bf3856ad364e35", new Version("6.0.5000.0"));

            // Provide the directory you want the file created in
            ManagementPackXmlWriter xmlWriter = new ManagementPackXmlWriter(@"C:\");
            xmlWriter.WriteManagementPack(managementPack);
        }
    }
}
55. Questions