While there is a whole host of great ideas you can apply to monitoring and alerting, one of the key reasons you spend time crafting your operations story is to avoid being interrupted during family time. So the auto-healing feature for Azure Web Sites is your family-friendly helper that will take care of minor issues without human intervention.
The story behind this feature is that there is a rather broad set of problems that are solved by simply recycling your worker process. From slow-downs to full-on website down scenarios, restarting your website will solve the issue quite a lot of the time. This is where Azure’s auto-healing can step in and take action, without disturbing your quality time.
You can enable auto-healing based on a number of factors – and it is as simple as adding a little configuration to your application:
<system.webServer>
  <monitoring>
    <triggers>
      <!-- The cool stuff happens here! -->
    </triggers>
    <actions value="..."/>
  </monitoring>
</system.webServer>
Invalid Child Element
The monitoring element is only really applicable once your application is on Azure, so you may see the error "'system.webServer' has invalid child element 'monitoring'" when you are working locally on this configuration. A common way around this is to add the configuration as a transformation in a separate file:
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.webServer>
    <monitoring xdt:Transform="Insert">
      <triggers>
        <!-- The cool stuff happens here! -->
      </triggers>
      <actions value="..."/>
    </monitoring>
  </system.webServer>
</configuration>
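If there is any chance the base configuration already contains a monitoring element, a plain Insert would produce a duplicate. The sketch below is one possible variation, assuming the XDT engine in your build tooling supports the InsertIfMissing transform; the file name Web.Release.config is just the conventional name for a release transform and is an assumption here, not something this setup requires.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical file: Web.Release.config, applied only when publishing a Release build -->
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.webServer>
    <!-- Assumption: InsertIfMissing is available; it adds the element only when the base config has none -->
    <monitoring xdt:Transform="InsertIfMissing">
      <triggers>
        <!-- The cool stuff happens here! -->
      </triggers>
      <actions value="..."/>
    </monitoring>
  </system.webServer>
</configuration>
```

Either way, the local Web.config stays free of the monitoring element, so the invalid child element error should not appear during local development.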
You can trigger auto-healing in a number of different scenarios, based on:
- The number of requests
- The number of slow requests
- The number of requests matching an HTTP status code, or sub-status code
- The memory usage of a worker process
To implement this effectively, you’ll need to understand what normal looks like, but the goal is to eliminate out-of-hours emergencies over time, so overshoot at first, and then gradually bring in the numbers until the phone stops ringing. It may be tempting to just get auto-healing to step in at the drop of a hat, but you don’t want to end up in a situation where you are automatically restarting your application every ten minutes due to false alarms.
Here are the standard examples for you to take a look at…
<system.webServer>
  <monitoring>
    <triggers>
      <!-- Triggers when you have "count" number of requests within "timeInterval" amount of time -->
      <requests count="1000" timeInterval="00:10:00"/>

      <!-- Triggers when you have "count" number of requests that take "timeTaken" within "timeInterval" amount of time -->
      <slowRequests timeTaken="00:00:45" count="20" timeInterval="00:02:00" />

      <!-- Triggers when your worker process reaches "privateBytesInKB" kilobytes of private set -->
      <memory privateBytesInKB="800000"/>

      <!-- Triggers when you have "count" responses matching the configured status within "timeInterval" amount of time -->
      <statusCode>
        <add statusCode="500" subStatusCode="100" win32StatusCode="0" count="10" timeInterval="00:00:30"/>
      </statusCode>
    </triggers>

    <!-- Performs an overlapping recycle of the worker process when a trigger fires -->
    <actions value="Recycle"/>
  </monitoring>
</system.webServer>
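To make the "overshoot at first" advice concrete, a first-pass configuration might enable just one trigger with deliberately generous thresholds, and then tighten them as you learn what normal looks like. The numbers below are illustrative assumptions rather than recommendations; they only use the slowRequests trigger and Recycle action already shown.

```xml
<system.webServer>
  <monitoring>
    <triggers>
      <!-- Assumed starting values: recycle only if 50 requests each take over a minute within five minutes -->
      <slowRequests timeTaken="00:01:00" count="50" timeInterval="00:05:00" />
    </triggers>
    <!-- Recycle the worker process when the trigger fires -->
    <actions value="Recycle"/>
  </monitoring>
</system.webServer>
```

If this never fires while you are still being called out for slow-downs, lower count or timeTaken a little at a time. If it fires during normal traffic, loosen it again before adding further triggers.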
Kudu Diagnostic Tools
You can also set up auto-healing in Kudu, by navigating to Kudu -> Tools -> Support, selecting the application you want to configure, and opening the Mitigate tab.
The auto-heal options have now moved into the main Azure portal. Navigate to the app service and select “Diagnose and solve problems”. This will bring up several options, but auto-heal is found under Diagnostic Tools > Auto Healing. The new UI features a wizard for setting up the auto-healing conditions. It also allows you to see the configuration in one view, so you don’t need to skip between the trigger and action tabs.