“Using Pingdom has resulted in better uptime, better flow of communication and happier staff.”
Aptoma is a SaaS provider located in Oslo, Norway. Aptoma provides tools for efficient and beautiful editing of front pages, optimized and flexible workflows for article production and layout, and complete solutions for video encoding, distribution and playback. They also help customers with consulting on and implementation of their services.
Håkon Drange is the Head of Infrastructure Operations at Aptoma, where they have been using Pingdom for many years. “Back in the day there were no other real alternatives for hosted solutions like this which could notify through SMS in a simple way,” Håkon Drange recalls. “Self-hosted Nagios with email alerts wasn’t good enough.”
Today Aptoma uses Pingdom primarily to cover their need for external monitoring of servers, network components and application health status. “In addition we use other tools for other use cases, but Pingdom complements our monitoring solutions for Aptoma’s SaaS offerings,” says Drange, and lists the check types in use: “Network ping, TCP connect on different ports and HTTP requests with various degrees of check complexity. Some custom HTTP and XML checks and some for matching content on pages.”
One example of the custom checks that Aptoma uses counts the number of video file transcoding jobs completed (or failed) during the last 24 hours. These values are then exported to XML files that Pingdom monitors. The values from the Pingdom checks are imported into a custom Geckoboard dashboard. Regular HTTP(S), Ping and TCP checks are also integrated with Geckoboard and displayed, with pretty graphs, on a large monitor in the Aptoma office.
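As an illustration only, and not Aptoma's actual code, an export like this could be sketched in Python. The sketch assumes the XML follows the layout Pingdom documents for custom HTTP checks (a `pingdom_http_custom_check` root with `status` and `response_time` elements) and that the job count is carried in `response_time`. The job list, the function name `build_check_xml` and the failure threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET


def build_check_xml(jobs, now, failure_threshold=5):
    """Summarise transcoding jobs from the last 24 hours as check XML.

    `jobs` is a list of (finished_at, succeeded) tuples. The threshold
    is a made-up value for this sketch, not a figure from Aptoma.
    """
    cutoff = now - timedelta(hours=24)
    recent = [succeeded for finished_at, succeeded in jobs if finished_at >= cutoff]
    completed = sum(1 for succeeded in recent if succeeded)
    failed = len(recent) - completed

    root = ET.Element("pingdom_http_custom_check")
    # Assumption: any status other than "OK" is treated as a failed check.
    status = "OK" if failed < failure_threshold else "TOO_MANY_FAILURES"
    ET.SubElement(root, "status").text = status
    # Assumption: the numeric field is reused to carry the completed-job count.
    ET.SubElement(root, "response_time").text = f"{float(completed):.3f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample_jobs = [
        (now - timedelta(hours=1), True),
        (now - timedelta(hours=3), False),
        (now - timedelta(hours=30), True),  # older than 24 hours, ignored
    ]
    print(build_check_xml(sample_jobs, now))
```

The resulting string would be written to a file served over HTTP, so that the monitoring service can fetch it on each check.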
TCP checks monitor that services like Apache, Varnish and/or load balancers reply correctly on port 80, that Node.js applications reply on their respective ports and that FTP servers are functioning.
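Conceptually, a TCP check of this kind boils down to trying to open a connection to a host and port within a time limit. The following is a minimal Python sketch of that idea, not a description of how Pingdom implements its checks. The function name `tcp_check` and the default timeout are assumptions.

```python
import socket


def tcp_check(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Note that this only confirms the port accepts connections. Verifying that the service behind it answers correctly requires a protocol-level check on top, such as an HTTP request.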
Some TCP checks are used in conjunction with more advanced HTTP (custom) checks to differentiate between infrastructure issues and application issues.
“For instance, a web server might be functioning from an infrastructure perspective, but an application level bug or problem might cause the service to malfunction from an end-user perspective. This insight drastically speeds up debugging,” says Drange.
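The reasoning in the quote above can be written down as a small decision rule. This is a sketch under the assumption that each service has one TCP check and one HTTP check whose latest results are available as booleans. The function name `classify` and the labels it returns are hypothetical.

```python
def classify(tcp_ok, http_ok):
    """Combine a TCP result and an HTTP result into a rough diagnosis."""
    if not tcp_ok:
        # Nothing is listening or reachable: look at servers and network first.
        return "infrastructure"
    if not http_ok:
        # The port answers but the application response is wrong.
        return "application"
    return "healthy"
```

For example, `classify(True, False)` points at the application layer, which tells the person on call to start with the application rather than with the servers or network.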
Like many others, Aptoma first and foremost started using Pingdom to get insight into how their systems are performing. “And of course to be notified in case of service outages before situations escalate into problems which in turn could affect customers,” says Håkon Drange.
Monitoring and alerts are essential, but no monitoring solution is complete without the transparency that comes with reports. Aptoma has set up Pingdom reports to be sent out within the company: “We have set up weekly email reports to our internal hosting department, and monthly email reports to all managers and developers,” says Drange, and continues: “A selection of the most important checks are presented on the public status page service which has been integrated in our status system.”
Aptoma has built a system that queries the Pingdom API and a self-hosted Nagios API. “It collects information about outages to be put in a centralized system so that we can correlate and group Nagios and Pingdom checks/outages into incidents,” says Drange.
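The article does not say how Aptoma correlates outages, so the following Python sketch shows just one plausible approach: grouping outages from both sources into an incident when their time ranges overlap or lie within a short gap of each other. The `Outage` record, the function `group_into_incidents` and the five-minute gap are all assumptions, and fetching the data from the two APIs is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    source: str      # e.g. "pingdom" or "nagios"
    check: str       # name of the check that failed
    start: datetime
    end: datetime


def group_into_incidents(outages, gap=timedelta(minutes=5)):
    """Group outages that overlap, or start within `gap` of the running end."""
    incidents = []
    incident_end = None
    for outage in sorted(outages, key=lambda o: o.start):
        if incidents and outage.start <= incident_end + gap:
            # Close enough in time: treat as part of the current incident.
            incidents[-1].append(outage)
            incident_end = max(incident_end, outage.end)
        else:
            # Too far from the previous outages: open a new incident.
            incidents.append([outage])
            incident_end = outage.end
    return incidents
```

Grouping purely by time is a simplification. A real system would probably also consider which hosts or services the checks belong to.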
As a result of using Pingdom, Aptoma has experienced some significant wins as a company. “Better uptime, better flow of communication and happier staff” are some of them, according to Håkon Drange. “Developers and technicians are less afraid to try new things because they are confident they will be alerted in case of changes gone wrong,” he concludes.