automated way of checking whether an internet destination responds to ping
Ping monitoring is an automated way of checking whether an internet destination, such as an IP address or a domain name, responds to ping. When a service becomes unavailable or stops responding during an outage (downtime), ping monitoring spots the issue and alerts the right person on the development team.
Go to Palzin Monitor and start with ping monitoring in 2 minutes.
Ping monitoring works by sending automated ICMP echo requests to the desired destination at a pre-defined frequency and checking for the expected response. The frequency depends on the user's needs but generally ranges from 30 seconds for business use cases up to 10 or more minutes for hobby projects.
The expected response from the monitored destination is an echo reply with no packets lost. If the correct reply is received, no further action is taken and the monitoring continues. When the ping doesn’t receive a reply, the monitor opens what is called a downtime incident and starts alerting according to the on-call calendar.
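The check-and-incident cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring tool is implemented: the `PingMonitor` and `Incident` names are hypothetical, and closing the incident on the first successful reply is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    started_at: float                    # timestamp of the first failed check
    resolved_at: Optional[float] = None  # set once a reply is received again


@dataclass
class PingMonitor:
    """Hypothetical tracker that turns check results into downtime incidents."""
    incidents: List[Incident] = field(default_factory=list)

    def _open_incident(self) -> Optional[Incident]:
        # The most recent incident is "open" if it has not been resolved yet.
        if self.incidents and self.incidents[-1].resolved_at is None:
            return self.incidents[-1]
        return None

    def record(self, timestamp: float, got_reply: bool) -> Optional[str]:
        """Record one check; return "opened", "resolved", or None."""
        current = self._open_incident()
        if not got_reply and current is None:
            # No reply and nothing open yet: a downtime incident starts here.
            self.incidents.append(Incident(started_at=timestamp))
            return "opened"
        if got_reply and current is not None:
            # Assumption: the first successful reply closes the incident.
            current.resolved_at = timestamp
            return "resolved"
        return None  # healthy and still healthy, or down and still down
```

A caller would invoke `record()` once per check interval and trigger alerting whenever it returns `"opened"`.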
ICMP stands for Internet Control Message Protocol. It supports communication between endpoints on the internet, including your own device when it is connected.
Control messages used by ICMP provide communication feedback between two destinations on the internet. A ping exchange has two phases: echo-request and echo-reply. The first device sends a message to the destination requesting a reply; this is the echo-request. The destination receives the message and replies to the sender; this is the echo-reply. ICMP messages were designed to help identify network issues and check the reachability of devices on the internet.
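To make the echo-request and echo-reply exchange more concrete, here is a rough Python sketch that builds an echo-request message and recognizes the matching echo-reply. The type values (8 for request, 0 for reply) and the checksum follow the ICMP specification; the function names and the `b"ping"` payload are illustrative assumptions. Actually sending the packet needs a raw socket and usually elevated privileges, so that part is left out.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type of an echo-request
ICMP_ECHO_REPLY = 0    # ICMP type of an echo-reply


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int,
                       payload: bytes = b"ping") -> bytes:
    """Return the ICMP portion of an echo-request (no IP header)."""
    # First pack the header with a zero checksum, then fill in the real one.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum,
                         identifier, sequence)
    return header + payload


def is_matching_reply(icmp_packet: bytes, identifier: int, sequence: int) -> bool:
    """Check that an ICMP message (IP header already stripped) answers our request."""
    if len(icmp_packet) < 8:
        return False
    icmp_type, code, _checksum, ident, seq = struct.unpack("!BBHHH", icmp_packet[:8])
    return (icmp_type == ICMP_ECHO_REPLY and code == 0
            and ident == identifier and seq == sequence
            and internet_checksum(icmp_packet) == 0)  # a valid packet sums to zero
```

The identifier and sequence number are what let the sender pair each reply with the request that caused it.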
A downtime incident is a period of time during which a given destination is not responding to ping. This is what the ping output would look like in a terminal if, for example, Cloudflare DNS were down.
PING 184.108.40.206 (220.127.116.11) 56(84) bytes of data.
From 18.104.22.168 icmp_seq=1 Destination Host Unreachable
From 22.214.171.124 icmp_seq=2 Destination Host Unreachable
From 126.96.36.199 icmp_seq=3 Destination Host Unreachable
From 188.8.131.52 icmp_seq=4 Destination Host Unreachable
From 184.108.40.206 icmp_seq=5 Destination Host Unreachable
--- 220.127.116.11 ping statistics ---
5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 9ms
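A monitor that wraps the system `ping` command has to read the statistics line at the end of this output. The following Python sketch parses that line, assuming the Linux-style summary format shown above; the `PingSummary` and `parse_summary` names are hypothetical, and other platforms word the line differently.

```python
import re
from typing import NamedTuple, Optional


class PingSummary(NamedTuple):
    transmitted: int
    received: int
    loss_percent: float


# Matches e.g. "5 packets transmitted, 0 received, +5 errors, 100% packet loss"
_SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received,.*?([\d.]+)% packet loss"
)


def parse_summary(output: str) -> Optional[PingSummary]:
    """Extract the packet counts from ping output, or None if no summary is found."""
    match = _SUMMARY_RE.search(output)
    if match is None:
        return None
    return PingSummary(
        transmitted=int(match.group(1)),
        received=int(match.group(2)),
        loss_percent=float(match.group(3)),
    )


def is_down(summary: PingSummary) -> bool:
    """Treat the destination as down when no packet came back at all."""
    return summary.received == 0
```

Applied to the statistics line above, `parse_summary` yields 5 transmitted, 0 received, and 100% loss, which `is_down` reports as downtime.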
A downtime incident can also be a situation where the request sent by the ping monitor doesn’t receive a response within a given time frame. The request timeout can be anywhere from 5 seconds to 1 minute, depending on the priority of the monitor. Setting the monitor sensitivity correctly is key to avoiding a flood of alerts.
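One way to picture timeout and sensitivity settings is the Python sketch below. It treats a check as failed when no reply arrives or the reply is slower than the timeout, and it only reports an incident after a configurable number of consecutive failures. The `confirmations` parameter is an assumption used here to illustrate sensitivity; the text above does not say how any specific tool implements it.

```python
from typing import Iterable, Optional


def is_failed_check(rtt_seconds: Optional[float], timeout_seconds: float) -> bool:
    """A check fails if there was no reply (None) or the reply exceeded the timeout."""
    return rtt_seconds is None or rtt_seconds > timeout_seconds


def first_incident_index(rtts: Iterable[Optional[float]],
                         timeout_seconds: float,
                         confirmations: int = 1) -> Optional[int]:
    """Return the index of the check that opens an incident, or None.

    `confirmations` (hypothetical setting) is how many failed checks in a row
    are required before an incident is opened; higher means less sensitive.
    """
    streak = 0
    for index, rtt in enumerate(rtts):
        if is_failed_check(rtt, timeout_seconds):
            streak += 1
            if streak >= confirmations:
                return index
        else:
            streak = 0  # a successful reply resets the failure streak
    return None
```

With `confirmations=1` a single dropped packet opens an incident; raising it trades detection speed for fewer false alarms.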
After an incident is spotted by the ping monitoring tool, it needs to be communicated to you. This process is called incident alerting or on-call alerting. On-call (or the on-call calendar) is a schedule of duties that defines which team member is responsible for incoming incidents.
The most common ways of getting alerted by a ping monitor are automated phone calls, SMS, Slack, and Microsoft Teams messages. The choice of channel depends on factors like the importance of the monitored service, the time of day, and team preference. For example, push notifications or emails are generally used for less vital monitors.
Ping alerts include information about which monitor went down and when. They also include information about the error that triggered the incident, specifically the details about received packets. There are 3 situations that can cause a ping incident:
Downtime alerts also include a call to action for the on-call person to take. Those usually include the option to acknowledge the incident or to view the incident.
After an alert is received, it should be acknowledged immediately. If the alert is not acknowledged within a specified time frame (usually 3 minutes), the next person in line on the on-call schedule is alerted. This process can continue until the whole team is alerted. The best practice, however, is to set up the on-call schedule so that the first team member is always ready to solve incoming incidents.
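The escalation rule can be sketched as a small Python function that returns who has been alerted so far for an unacknowledged incident. The 3-minute default mirrors the figure mentioned above; the function name and the example names are hypothetical.

```python
from typing import List


def alerted_so_far(on_call_order: List[str],
                   seconds_since_first_alert: float,
                   ack_timeout_seconds: float = 180.0) -> List[str]:
    """List the people alerted so far, assuming nobody has acknowledged yet.

    The first person is alerted immediately; each further person is alerted
    once another full acknowledgement timeout has passed.
    """
    if seconds_since_first_alert < 0:
        return []
    steps = int(seconds_since_first_alert // ack_timeout_seconds) + 1
    return on_call_order[:steps]  # slicing caps the result at the whole team
```

For a team of `["alice", "bob", "carol"]`, only `alice` is alerted during the first 3 minutes, `bob` joins at the 3-minute mark, and `carol` at 6 minutes.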
Once the incident is acknowledged, the escalation process is paused and the team can fully focus on solving it. The time it takes to acknowledge an alert is called Time to Acknowledge (TTA). Its average across incidents, called Mean Time to Acknowledge (MTTA), is a widely used incident management metric.
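As a worked example, MTTA is simply the arithmetic mean of the individual TTA values. The Python sketch below assumes each incident is given as a pair of timestamps in seconds (alert sent, alert acknowledged); that input shape is an illustrative choice.

```python
from statistics import mean
from typing import Iterable, Tuple


def mean_time_to_acknowledge(incidents: Iterable[Tuple[float, float]]) -> float:
    """Average of (acknowledged_at - alerted_at) over all incidents, in seconds."""
    ttas = [acknowledged_at - alerted_at for alerted_at, acknowledged_at in incidents]
    if not ttas:
        raise ValueError("at least one acknowledged incident is required")
    return mean(ttas)
```

Three incidents acknowledged after 60, 120, and 30 seconds give an MTTA of 70 seconds.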
The next steps in the downtime resolution process are individual to different teams and apps. For larger teams, they can include collaborations between a few developers or even teams of developers, delegations of incidents to dedicated team members, and more. There are some best practices that should be used by all teams managing incidents. These include incident communication (both internal and external) and incident post-mortems.
Ping monitoring is a fully automated process that can run as often as every 30 seconds, which helps to discover any issues right away. In a best-case scenario, any downtime is fixed quickly, keeping the number of affected users to a minimum.
Ping is the most basic way of checking an IP address's availability and is the first thing to check when troubleshooting unknown downtime errors. Compared to more complex types of monitoring, like HTTP monitoring or API monitoring, it gives a basic overview of the network situation, so any investigation can start from there.
By running consistently over a long period of time, ping monitoring gives a unique insight into an app's performance, specifically its uptime and response times (round-trip times). This historical data makes it possible to benchmark against competitors or older versions of the same apps or products. Here is an example of spiked response times in the Asia region shown by the Palzin Monitor dashboard.
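The two historical metrics mentioned above can be computed from stored check results roughly as follows. This Python sketch assumes each check is stored as a round-trip time in milliseconds, with `None` for a check that got no reply; the function names and the check-based (rather than time-based) uptime definition are illustrative assumptions.

```python
from statistics import mean
from typing import List, Optional


def uptime_percent(rtts_ms: List[Optional[float]]) -> float:
    """Share of checks that received a reply, as a percentage."""
    if not rtts_ms:
        raise ValueError("no checks recorded")
    answered = sum(1 for rtt in rtts_ms if rtt is not None)
    return 100.0 * answered / len(rtts_ms)


def average_rtt_ms(rtts_ms: List[Optional[float]]) -> Optional[float]:
    """Mean round-trip time over the checks that were answered, or None."""
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    return mean(replies) if replies else None
```

For the history `[20.0, 22.0, None, 18.0]`, uptime is 75% and the average round-trip time is 20 ms.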
Ping monitoring is the main, but not the only, part of the synthetic monitoring toolbox. Ping checks of servers, DNS, or IP addresses are ideally accompanied by regular uptime checks. This provides better visibility into the functionality of the monitored services.
Palzin Monitor is an infrastructure monitoring tool that offers reliable ping monitoring. Here is how to get notified whenever an IP address becomes unavailable (doesn't respond to ping).