Synthetic Monitoring

Simulate visitor interaction with your site to monitor the end user experience.

View Product Info

FEATURES

Simulate visitor interaction

Identify bottlenecks and speed up your website.

Learn More

Real User Monitoring

Enhance your site performance with data from actual site visitors

View Product Info

FEATURES

Real user insights in real time

Know how your site or web app is performing with real user insights

Learn More

Infrastructure Monitoring Powered by SolarWinds AppOptics

Instant visibility into servers, virtual hosts, and containerized environments

View Infrastructure Monitoring Info
Comprehensive set of turnkey infrastructure integrations

Including dozens of AWS and Azure services, container orchestrations like Docker and Kubernetes, and more 

Learn More

Application Performance Monitoring Powered by SolarWinds AppOptics

Comprehensive, full-stack visibility, and troubleshooting

View Application Performance Monitoring Info
Complete visibility into application issues

Pinpoint the root cause down to a poor-performing line of code

Learn More

Log Management and Analytics Powered by SolarWinds Loggly

Integrated, cost-effective, hosted, and scalable full-stack, multi-source log management

 View Log Management and Analytics Info
Collect, search, and analyze log data

Quickly jump into the relevant logs to accelerate troubleshooting

Learn More

Correlating Pingdom Alerts With AppOptics and Loggly

So SolarWinds® Pingdom® has alerted you to an issue—what do you do now? 

In this article, I’ll explain the features and capabilities of a full monitoring stack in SolarWinds and how you can use it to get to the bottom of a 3 a.m. Pingdom wake-up call.

The Setup

For our web service, we use a simple architecture of a front-end Flask application with a Postgres back end served behind an edge SSL-terminating NGINX instance on the DigitalOcean Managed Kubernetes service. 

Though this architecture is easy to set up and visualize, this kind of complexity can quickly overwhelm an investigation and give no clear source of alerts or clues. To help us correlate our problem from a single Pingdom alert and nail down the root cause quickly and efficiently, we’ll use two tools from the SolarWinds stack: AppOptics for capturing metrics and Loggly® for capturing logs.

Installing each of these tools is beyond the scope of this article, but it’s easy in a DigitalOcean managed cluster. To export the AppOptics metrics, we utilized the SolarWinds agent as a DaemonSet, and for the Loggly logs, we utilized rKubelog, a lightweight Kubernetes log tail application. Both tools are powerful assets and can be used independently, but together, they’re an even more powerful way to quickly get to the root cause of an issue and reduce mean time to resolution (MTTR). 

Pingdom: The Alert

In our scenario, we’re using the alerting mechanisms of Pingdom, an externally hosted suite of agents designed to constantly check your site for health and speed.

Below is our Pingdom alert dashboard, showing us the events that have triggered our wake-up call over the past several hours.

Alerts Dashboard in Pingdom


We can see from our dashboard there are some moments of unavailability in the last few tests, but the cause isn’t clear from this alone.

Based on the information in the Test Log from Pingdom, we can see our performance from several regions across the U.S. has worsened in the last 10 minutes:

Pindgom Test

And by clicking through to the root cause analysis section in Pingdom, we can see pretty clearly this is an application-side error (HTTP Error 500). We’ll need to dig into application timelines and logs to understand what we’re looking at.

Pingdom Root Cause Analysis

AppOptics: The Timing

To better understand the scope and timing of our problem, we can look at a simple metric heuristic dashboard AppOptics has provided out of the box with our Docker container monitoring, courtesy of our snap agent. 

With AppOptics, we can easily see in the image below one container repeatedly dumps memory (the bottom-right graph) and is high in CPU (the bottom-left graph) with each iteration of container replacement. This tells us at the very least, this one container is struggling with something. It also tells us when this probably started (just before 16:30) and gives us an idea of the kind of impact we might be seeing with this behavior. 

Unfortunately, since the application appears to be continuously restarting, we can be sure our customers are constantly seeing application-related failures on our site.

AppOptics Dashboard

Loggly: The Error Volume

With the information from AppOptics, we have a clear start time and a general idea of what applications or services are seeing the problem. We can now go one step further and use Loggly to filter our logs to this time and service and gain a crystal-clear picture of the errors causing our 500s all the way up at Pingdom.

The Loggly dashboard is an all-inclusive log view, giving us some easy search functionality to dial into what’s causing these trends. In this image, we can see a flood of new events coming into our unfiltered log view.

Loggly Dashboard

In general, we can see a large increase in errors. By using the search feature in Loggly to drill down into our filter of errors, we can clearly see our Postgres connector is experiencing unexpected closed connections in large volumes.

Drilling down into filtered errors in Loggly

Oddly, there aren’t any errors logged for the database itself. This indicates there may not be an issue with the database at all, leaving only the path between the application and the database as the source of the error. 

Now that we fully understand the error causing our alerts and the servers on which it’s happening, we can start working on mitigation. In our case, we know there’s a problem connecting only from our application to this specific database; no other applications or infrastructure appear to have the behavior based on our logs and overall metrics from our tool suites. To attempt a quick mitigation, we might restart the two pods and (hopefully) make the problem magically vanish. But as this is a simulated error, we already know what the problem is and can fast-forward to the cause. In our case, with a little manual testing of network pathing tools, such as traceroute, we can quickly find the latency exists on the database public interface and was caused by a misconfigured network queue. 

Using SolarWinds to Find Resolution

To find a problem, it’s necessary to articulate it. The best way to do this is with a connected set of metrics and observability capable of bringing you directly to the originating event. While Pingdom is great for overall availability monitoring, AppOptics has powerful built-in functionality to integrate APM into the application stack, and Loggly provides intuitive and elegant centralized log parsing. Using these tools together provides a steady and scalable workflow for any user.

So before you get another call at 3 a.m., do yourself the favor of laying the groundwork now—while you’re awake—so tired-you can quickly resolve the issue and go back to bed.

Webpages Are Getting Larger Every Year, and Here’s Why it Matters

Last updated: February 29, 2024 Average size of a webpage matters because it [...]

A Beginner’s Guide to Using CDNs

Last updated: February 28, 2024 Websites have become larger and more complex [...]

The Five Most Common HTTP Errors According to Google

Last updated: February 28, 2024 Sometimes when you try to visit a web page, [...]

Page Load Time vs. Response Time – What Is the Difference?

Last updated: February 28, 2024 Page load time and response time are key met [...]

Can gzip Compression Really Improve Web Performance?

Last updated: February 26, 2024 The size of the web is slowly growing. Over [...]

Monitor your website’s uptime and performance

With Pingdom's website monitoring you are always the first to know when your site is in trouble, and as a result you are making the Internet faster and more reliable. Nice, huh?

START YOUR FREE 30-DAY TRIAL

MONITOR YOUR WEB APPLICATION PERFORMANCE

Gain availability and performance insights with Pingdom – a comprehensive web application performance and digital experience monitoring tool.

START YOUR FREE 30-DAY TRIAL
Start monitoring for free