HealthService 1200 and 1209 errors

See the updates at the bottom of this page.

Management Pack with id: “{<GUID>}”, version: “{<GUID>}” has been requested “<LARGE NUMBER>” times. Management Group “SMProd”. 
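If you are collecting these events from several servers, it helps to pull the MP id and the request count out of the message text. What follows is only a minimal sketch in Python, assuming the message follows the template quoted above; the function name is hypothetical, and it accepts both straight and curly quotes because pasted event text often carries the curly ones.

```python
import re

# Quote characters: straight double quote plus left/right curly quotes.
_Q = r'["\u201c\u201d]'
_TEXT = r'[^"\u201c\u201d]*'

# Pattern for the message template quoted above (an assumption: the real
# event text on your system may differ slightly in spacing or wording).
EVENT_RE = re.compile(
    r'Management Pack with id:\s*' + _Q + r'(?P<mp_id>' + _TEXT + r')' + _Q
    + r',\s*version:\s*' + _Q + r'(?P<version>' + _TEXT + r')' + _Q
    + r'\s*has been requested\s*' + _Q + r'?(?P<count>\d+)' + _Q + r'?\s*times\.'
    + r'\s*Management Group\s*' + _Q + r'(?P<group>' + _TEXT + r')' + _Q
)


def parse_mp_request_event(message):
    """Return a dict with mp_id, version, count and group, or None if no match."""
    match = EVENT_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Drop the surrounding braces so the id can be passed to other tools.
    fields["mp_id"] = fields["mp_id"].strip("{}")
    fields["version"] = fields["version"].strip("{}")
    fields["count"] = int(fields["count"])
    return fields
```

The mp_id value is the "first GUID" from the message, which is the one to look up when you are trying to work out which MP is stuck.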

This is kinda the bane of my existence right now. Operations Manager (and by extension Service Manager, which runs on the same engine) distributes MPs to registered agents through a largely undocumented process that populates the Health Service State folder. Sometimes this fails, and it’s not really clear why, but it seems to be happening a lot more since 2012 SP1.

The reason this is a problem is that an agent that can’t get an MP will assume its health state isn’t up to date, and will never start any workflows. For Ops Manager, this means no health checks, no reports back to the management servers, and a grey mark in the health explorer. For Service Manager it is worse: the agent on the workflow management server is what executes the workflows that do background maintenance, such as advancing activities, processing votes, populating connectors, and generating notifications, so all of that stops.

The usual fix for this is to stop all of the System Center services, clear the Health Service State folder, restart the services, and cross your fingers. For a lot of agents that seems to work. The Health Service realizes its state is empty, whatever black-magic method it uses to retrieve MPs restarts cleanly, and the agent is back to normal within about 2-3 minutes.
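The stop, clear, restart sequence can be scripted. This is a rough sketch in Python rather than a supported procedure: the function names are hypothetical, you supply the folder path and the list of service names yourself (HealthService is the one named in the title; anything else depends on your install), and it shells out to the standard Windows `net stop` / `net start` commands.

```python
import shutil
import subprocess
from pathlib import Path


def clear_state_folder(state_dir):
    """Delete everything inside state_dir but keep the folder itself."""
    removed = 0
    for entry in Path(state_dir).iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


def reset_health_service(state_dir, services, run=subprocess.run):
    """Stop the given services, clear the state folder, start them again.

    services is a list of Windows service names supplied by the caller.
    run is injectable so the sequence can be dry-run without touching services.
    """
    for name in services:
        run(["net", "stop", name], check=False)
    removed = clear_state_folder(state_dir)
    # Start in the reverse order of stopping (an assumption, not a documented rule).
    for name in reversed(services):
        run(["net", "start", name], check=False)
    return removed
```

Run it from an elevated prompt on the affected server, and give the agent a few minutes afterwards before deciding whether it worked.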

In certain conditions, the MP itself may not be completely imported, or may be stored in the database incorrectly. This is more common in Service Manager (because more MPs are custom authored and change faster). The fix is to change the MP version (for custom or unsealed MPs that can be easily versioned) or to remove and re-import the MP (for sealed MPs that can’t be versioned, though you risk losing data). Use Get-SCSMManagementPack or Get-SCOMManagementPack with the -Id parameter and the first GUID from the error to determine which MP is problematic.
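For an unsealed MP, changing the version just means editing the Version element in the exported XML before re-importing it. Here is a minimal sketch in Python, assuming the usual Manifest/Identity/Version layout of an unsealed MP and a dotted numeric version; the function name is hypothetical. Work on a copy, since ElementTree drops comments and the XML declaration when it rewrites the file.

```python
import xml.etree.ElementTree as ET


def bump_mp_version(xml_text):
    """Increment the last component of Manifest/Identity/Version.

    Assumes an unsealed MP with a dotted numeric version such as 1.0.0.9.
    Returns the modified XML as a string.
    """
    root = ET.fromstring(xml_text)
    version = root.find("./Manifest/Identity/Version")
    if version is None or not version.text:
        raise ValueError("no Manifest/Identity/Version element found")
    parts = version.text.strip().split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    version.text = ".".join(parts)
    return ET.tostring(root, encoding="unicode")
```

Re-import the bumped file through the console or your usual import tooling; the sketch only touches the version string and nothing else in the MP.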

In some rare conditions, this doesn’t work either. Ops Manager has one more thing up its sleeve, removing and re-installing the agent, but for Service Manager this is a pretty intensive thing to do to the workflow server (which is the only place where this issue causes any problems).

I have two SM management groups that are constantly throwing a screw and complaining about one MP or another. Recently, one of them decided that it wasn’t going to accept new versions of a certain MP. Clearing the health state isn’t working, this MP isn’t one that can easily be removed, and this is the only management server in the group, so re-installing would require Disaster Recovery methods.

I’m not sure what I am going to do about it, but I figured I should throw something up about these errors, since they seem to be somewhat common in Service Manager, and don’t seem to be widely documented.

update:
Last edited by Thomas Bianco on December 26, 2013 at 11:56 AM
The cause of this issue turned out to be a bad version of the MP that had previously been imported into the database and wasn’t upgrading properly for some unknown reason. I had to remove the MP, wait for the database to clear all of the objects that MP defined, and then re-import it and recreate all those objects.

update the 2nd:
Last edited by Thomas Bianco on January 20, 2014 at 1:29 PM
This has reoccurred, apparently with a different MP, and then again on a different management group using the same set of MPs. My colleague, Brody Kilpatrick, spent several days on the line with Microsoft attempting to troubleshoot this, with no definitive solution.

We ended up seeing LOTS of counter-intuitive behaviors. The most striking was that removing the workflows from the MP sometimes caused it to load and deploy correctly, though workflows still would not start. Eventually, we ended up rebuilding those systems over the weekend from clean source and re-importing the MPs, and we haven’t run into the same behavior yet.

update the 3rd:
Last edited by Thomas Bianco on February 17, 2014 at 12:29 PM
Brody found some additional weirdness that I feel needs to be documented:

  1. MPs (sealed or unsealed) are only copied to the Health Service State folder if they contain workflows. This makes sense if you consider it from the Ops Manager perspective; SCSM workflows that run on the workflow server are actually OMSDK health monitoring rules that run on agents.
  2. Sealed MPs cannot contain Config Item group rules. We’re still not certain why, but if a sealed MP contains a group or queue definition, something in the MP distribution engine hits a corner case and stops updating, and any sealed MPs processed after that one do not get distributed. We’re not 100% certain this is the full cause of the issue, but it’s certainly not helping.

update the 4th:
Last edited by Thomas Bianco on March 7, 2014 at 2:12 AM
One more update. We ran into this issue in the production environment, which is a much larger install with multiple management servers in the group. Importing the MP on a different management server resulted in a successful distribution. We’re really not sure what this means, other than that it seems to be tied to the workflow server.
