“Pingdom is an important part of the tool suite that allows us to promise a service uptime close to 100%”
Accedo was founded ten years ago, in 2004, and today has eleven offices, with headquarters in Stockholm, Sweden. Accedo develops apps for Smart/Connected TVs, for customers such as HBO, Spotify, Disney, MTV and Viaplay. With more than 1000 deployed applications on more than 40 different platforms, reaching more than 100 million households, Accedo supports most of the application platforms on the market today.
Fredrik Wallenius is the Head of IT & Operations at Accedo. “We have used Pingdom for many years. We needed a reliable and easy-to-configure way of monitoring the status of our systems,” recalls Wallenius.
Since that initial contact, Accedo’s use of Pingdom has evolved, and today it is one of the tools they use every day. “The custom HTTP check is a very clever feature. This allows us to build a health check API into our services that can perform a detailed check,” says Wallenius.
“Some of our most uptime critical services are not web sites but rather APIs. It can be complicated to judge if an API is fully operational by just calling one of the API calls. If you are (very) unlucky it can be so that only that API call is working but the rest of the service is down,” he says. In those cases Accedo often creates a custom health check that resides “next to” the actual API and sets up Pingdom to ping this health check.
“When the health check receives a request, it makes requests of its own to several parts of the API we are monitoring and only returns an OK if all parts respond as expected. That way we can be sure that all parts of the system are operational using only one Pingdom check.”
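The pattern Wallenius describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Accedo's implementation: the function names, the probe names and the plain-text response are all hypothetical, and the exact response format that Pingdom's custom HTTP check expects is not covered here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple
from urllib.request import urlopen


def http_probe(url: str, timeout: float = 3.0) -> Callable[[], bool]:
    """Build a probe that passes only if `url` answers with HTTP 200."""
    def probe() -> bool:
        # urlopen raises on 4xx/5xx and on timeouts; the aggregator
        # below treats any exception as a failed probe.
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    return probe


def aggregate_health(probes: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every probe and report OK only if all of them pass."""
    failed = []
    for name, probe in probes.items():
        try:
            ok = probe()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        return 503, "FAIL: " + ", ".join(failed)
    return 200, "OK"


def make_handler(probes: Dict[str, Callable[[], bool]]):
    """Create a request handler that serves the aggregated result."""
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = aggregate_health(probes)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
    return HealthHandler


def serve(probes: Dict[str, Callable[[], bool]], port: int = 8080) -> None:
    """Serve the health check 'next to' the API (hypothetical port)."""
    HTTPServer(("", port), make_handler(probes)).serve_forever()
```

A deployment might call `serve({"catalog": http_probe("http://localhost:9000/catalog"), "search": http_probe("http://localhost:9000/search")})`, where both URLs are made-up placeholders, and point a single monitoring check at the resulting endpoint. Because any failing or unreachable part turns the response into a 503, one check covers the whole service.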
One of Accedo’s biggest products is an App Store solution called Accedo Application Sphere, used by several TV operators and manufacturers. Millions of set-top boxes and TVs throughout the world rely on this service’s API being up in order to present their app content.
“We have this service deployed in several locations around the globe and even the slightest downtime will affect many end-users, so it is crucial that our operation teams get notified immediately in case of events,” says Wallenius.
For their most important services and apps, Accedo alerts the operations team by email, SMS and phone through an integration with the PagerDuty service. During weekends, when staff members are on call, the phone alert is key to maintaining the uptime they promise their customers.
“But we also use Pingdom alerts for less critical systems, for example several staging environments. Here we only use mail alerts since even if an action is needed it may not have to be done within minutes,” says Fredrik Wallenius.
Furthermore, Accedo uses Pingdom reports as a basis for SLA follow-ups with several of their customers. “For these follow-ups it is important to have information from an unbiased and reliable source that both parties can trust,” he adds.
“Monitor as much as possible! Things will break and you will want to notice this before your customers do. The customers can often understand that there are occasional events of downtime but they will not accept you not knowing about it and not being able to afterwards tell them exactly what happened and when it happened,” Fredrik Wallenius concludes.