
How to stop an outage from becoming an outrage


Sooner or later, every site or application will fail. The consequences, however, depend not only on how the failure is managed but also on how it is communicated. The web hosting company Media Temple and even Google recently illustrated how hard it is for modern, connected organizations to respond quickly enough to system outages. Here is a suggested crisis checklist, along with notes on why it is difficult to follow every time.
On Saturday, February 28, a storage cluster at Media Temple failed, depriving thousands of customers of their service until the following Monday morning. During the outage, the company did not mass e-mail its customers, and it did not seem to promptly update anything other than its system status account on Twitter. Only later did the company attempt to send private messages to the accounts of some irritated customers. This quickly led to outrage on blogs and in online communities.
In an incident covered here as well as elsewhere, Google faced a similar crisis four days earlier, when its Gmail service stopped functioning globally for two to four hours. While millions of users and companies were unable to use their e-mail, the company communicated only very briefly on its official blog. Predictably, the major media, the blogosphere, and online communities were soon on fire with messages about “Gfail”.
These examples show that modern organizations need to excel at following the deceptively simple rules of crisis communication. Always try to reserve the capacity for the following:

I. Preparations:
  • Define your main stakeholders – customers, investors, partners, suppliers etc.
  • Keep an eye on big real-time forums where they may communicate.
  • Define what a serious error is and how to notice when one has happened.
II. Urgent actions:
  • Define what has happened as far as possible – be careful to separate facts from guesses.
  • Define what to do about it – recovery, calling in extra resources etc.
  • Define which stakeholders are affected.
  • Define how to communicate with these groups – favor continuous updates over speculation and optimistic promises, so that you fill the information vacuum and ease user frustration.
  • Start communicating.
III. Follow-up:
  • Respond quickly to further questions from key stakeholders – always stick to the facts/message as agreed above and avoid speculation.
  • If an error has been committed, offer apologies and compensation (which both of the companies mentioned have since done).

In hindsight, Media Temple reacted as quickly as possible, throwing all its resources at solving the issue, but forgot to communicate actively with its customers, generating anger and accusations that might have been avoided. Google, for its part, aggravated the error by initially giving its users erroneous information: the failure was hardly “limited to a small subset of users”.
Both companies were pilloried on Twitter, which underlines the need for organizations to monitor real-time communities like this one. Such communities can improve or aggravate the situation by instantly spreading information, if any is available. Media Temple later claimed that it lacked the staff resources to handle the thousands of micro-conversations.
In this kind of situation, the best course for a company may be to define a single message, communicate it to everyone at once, update it actively, and avoid speculation or individualized responses. This is when it helps to have a single source of information to which all customers can be referred for status updates, for example an externally hosted status blog.
So, are we saying that following the rules above would prevent every communications mishap? Of course not. Crisis management is never easy; otherwise it wouldn’t be a crisis.
Do you have any examples of superb crisis communications – or the opposite?
Please don’t hesitate to share them with us in the comments.
