Automation

Uptime and SLAs

This is a bonus post with some follow-up information that is useful if you have read Web Operations Dashboards, Monitoring, and Alerting. This article is all about uptime and SLAs. Having helped a number of businesses understand what uptime and SLAs are, and how they work in real life, I have encountered a few […]

Automation

The Monitor Matrix

This is the last in a series of posts to share some techniques that I wrote about in Web Operations Dashboards, Monitoring, and Alerting. In this final bite-size chunk, I’m going to talk about the Monitor Matrix. Monitor selection evolves gradually. You start off monitoring the things that everyone starts monitoring. You keep […]

Automation

The Monitor Selection Principles

This is one more article in a series of posts to share some techniques that I wrote about in Web Operations Dashboards, Monitoring, and Alerting. In this article, I’m going to talk about Monitor Selection Principles. While it can be tempting to start off by monitoring everything, and alerting every time something slightly odd happens, […]

Automation

The Alerting Principles

This is the next in a series of posts to share some techniques that I wrote about in Web Operations Dashboards, Monitoring, and Alerting. In this instalment, I’m going to talk about the Alerting Principles. When it comes to monitoring, alerting is an area you will want to get right. There is a natural tension […]

Automation

The Incident Causation Principles

This is another in a series of posts to share some techniques that I wrote about in Web Operations Dashboards, Monitoring, and Alerting. In this article, I’m going to talk about incident investigations and the Incident Causation Principles. When things go wrong, it may be that some internal trigger such as a software release or configuration […]

Automation

The Three Fs of Event Log Monitoring

This is one of a series of posts to share some techniques that I wrote about in Web Operations Dashboards, Monitoring, and Alerting. In this article, I’m going to talk about the Three Fs of event log monitoring. When you first start collecting event logs, it is likely that you will be inundated with a […]

Automation

DataDog Interactive Monitor Report

There are two competing fundamental needs for web operations: reducing false alarms, and ensuring you never miss a real incident! Here is a useful DataDog feature that you might not be using, and that can help a great deal in finding that magical balance-point between these two competing needs: interactive monitor reports. You can […]

Automation

Monitor Replication with DataDog and DogStatsd

Although DataDog comes with a healthy selection of integrations, there is always going to be something custom that you want to monitor. This is why DogStatsd has been made available. DogStatsd is a small server that aggregates your custom app metrics. Let’s look at monitoring SQL Server Replication using DogStatsd and C#. DogStatsd runs on […]