The main mechanism used is the Internet Control Message Protocol (ICMP) Echo mechanism, also known as the Ping facility. It lets the user send a packet of a chosen length to a remote node and have it echoed back.
Ping comes pre-installed on almost all platforms, so there is nothing to install on the clients. The server (i.e. the echo responder) runs at high priority (e.g. in the kernel on Unix), so it is more likely than a user application to give a good measure of network performance.
It is very modest in its network bandwidth requirements (~100 bits per second per monitoring-host/remote-host pair, for the way we use it).
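The ~100 bits per second figure can be sanity-checked with simple arithmetic. The sketch below assumes a hypothetical schedule (10 pings of 100 bytes and 10 of 1000 bytes every 30 minutes, counting both the request and the echoed reply, and ignoring header overhead). These numbers are illustrative assumptions, not a description of PingER's actual configuration.

```python
# Hypothetical ping schedule, used only to illustrate the arithmetic.
# These values are assumptions, not PingER's documented configuration.
PINGS_PER_SIZE = 10                 # pings sent at each packet size
PACKET_SIZES_BYTES = (100, 1000)    # payload sizes; header overhead ignored
INTERVAL_S = 30 * 60                # one burst of pings every 30 minutes


def average_bandwidth_bps(pings_per_size, sizes_bytes, interval_s):
    """Average bits per second for one monitoring/remote host pair.

    Both the echo request and the echoed reply are counted (factor of 2).
    """
    bytes_per_burst = sum(pings_per_size * size for size in sizes_bytes)
    bits_per_burst = 2 * bytes_per_burst * 8
    return bits_per_burst / interval_s


bps = average_bandwidth_bps(PINGS_PER_SIZE, PACKET_SIZES_BYTES, INTERVAL_S)
print(f"{bps:.1f} bits/s")  # prints "97.8 bits/s"
```

Changing the assumed sizes, counts or interval shows how quickly the load scales; even several times this schedule stays far below the capacity of any real link.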
The tools that implement the ping monitoring are collectively known as PingER. There are over 17 monitoring sites, over 300 remote sites being monitored, and over 1000 monitoring-site/remote-site pairs.
We use Ping to measure the response time, the packet loss percentage, the variability of the response time over both the short and the longer term, and the lack of reachability (no response for a succession of pings). See, for example, the PingER results for Africa.
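As an illustration of these per-sample measurements, here is a minimal sketch that summarizes one sample of pings. It assumes a sample is a list of RTTs in milliseconds with None marking a ping that got no reply. The function names and the consecutive-loss test for unreachability are hypothetical choices, not PingER's own code.

```python
from statistics import mean
from typing import List, Optional

Sample = List[Optional[float]]  # RTTs in ms; None means no reply


def summarize_sample(rtts_ms: Sample) -> dict:
    """Loss percentage and RTT statistics for one sample of pings."""
    if not rtts_ms:
        raise ValueError("a sample must contain at least one ping")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    if not replies:
        return {"loss_pct": loss_pct, "min_ms": None, "avg_ms": None,
                "max_ms": None, "spread_ms": None}
    return {
        "loss_pct": loss_pct,
        "min_ms": min(replies),
        "avg_ms": mean(replies),
        "max_ms": max(replies),
        # max - min within the sample: a crude short-term variability measure
        "spread_ms": max(replies) - min(replies),
    }


def is_unreachable(rtts_ms: Sample, run_length: int) -> bool:
    """True if at least `run_length` consecutive pings got no reply."""
    run = 0
    for rtt in rtts_ms:
        run = run + 1 if rtt is None else 0
        if run >= run_length:
            return True
    return False


sample = [150.2, 148.9, None, 151.0, 149.5, None, None, 150.1, 152.3, 149.8]
stats = summarize_sample(sample)
print(stats["loss_pct"])          # 30.0
print(is_unreachable(sample, 3))  # False: the longest gap is 2 pings
```

Longer-term variability would come from comparing these per-sample summaries across many samples rather than from a single sample.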
A short description of each follows.
Packet loss is a good measure of the quality of the link for many TCP-based applications. Loss is typically caused by congestion, which causes queues (e.g. in routers) to fill and packets to be dropped. Loss may also occur when the network delivers an imperfect (corrupted) copy of the packet, usually because of bit errors in the links or in network devices.
When a sample (a set of n pings) shows zero packet loss, we refer to the network as quiescent, or non-busy. We can then measure the percentage of samples in which the network was found to be quiescent. A high percentage is an indication of a good network.
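The quiescence percentage is simply the share of samples with no lost pings. A minimal sketch, assuming each sample has already been reduced to its count of lost pings (the input list is hypothetical):

```python
from typing import Sequence


def quiescence_pct(lost_per_sample: Sequence[int]) -> float:
    """Percentage of samples in which no ping was lost (a quiescent network)."""
    if not lost_per_sample:
        raise ValueError("need at least one sample")
    quiet = sum(1 for lost in lost_per_sample if lost == 0)
    return 100.0 * quiet / len(lost_per_sample)


# Number of lost pings in each of ten hypothetical samples.
lost = [0, 0, 1, 0, 3, 0, 0, 0, 0, 2]
print(quiescence_pct(lost))  # 70.0
```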
To characterize the quality of a link we focus mainly on packet loss. This is because users now rely heavily on interactive applications, such as video conferencing and audio chat, which require a low packet loss percentage. The link quality levels are: 0-1% packet loss is good, 1-2.5% is acceptable, 2.5-5% is poor, 5-12% is very poor, and above 12% is bad. Our observations show that above 4-6% packet loss, video conferencing becomes irritating and non-native language speakers become unable to communicate. Above 10-12% packet loss there is an unacceptable level of back-to-back packet loss and extremely long timeouts, connections start to break, and video conferencing is unusable.
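The quality levels map naturally onto a lookup table. In this sketch, placing an exact boundary value (for example 1.0%) in the better of the two bands is an assumption, since the bands as stated do not say which side their boundaries fall on. The band table and function name are illustrative.

```python
# Inclusive upper bound (in percent) of each quality band.
# Putting a boundary value in the better band is an assumption.
LOSS_BANDS = (
    (1.0, "good"),
    (2.5, "acceptable"),
    (5.0, "poor"),
    (12.0, "very poor"),
)


def classify_loss(loss_pct: float) -> str:
    """Map a packet loss percentage to a link quality level."""
    if loss_pct < 0:
        raise ValueError("loss percentage cannot be negative")
    for upper, label in LOSS_BANDS:
        if loss_pct <= upper:
            return label
    return "bad"  # anything above 12%


for loss in (0.3, 2.0, 4.0, 8.0, 15.0):
    print(loss, classify_loss(loss))
```

Keeping the thresholds in one table makes it easy to change them without touching the logic.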
The response time, or Round Trip Time (RTT), can give an idea of the ping data rate (kilobytes/sec). The RTT depends on the distance between the sites plus the delay at each hop along the path between them. The distance contribution is set by the speed of light in fiber and is roughly distance / (0.6 * c), where c is the speed of light in vacuum. Adding the hop delays, the RTT is roughly:
RTT = 2 * (distance / (0.6 * c) + hops * delay)
where hops is the number of hops on the path, delay is the average delay per hop, and the factor of 2 accounts for measuring both the outbound and the return legs of the round trip.
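The RTT rule of thumb can be turned into a small calculator. The sketch assumes a single average per-hop delay, as the formula does. The example distance, hop count and 1 ms per-hop delay are made-up inputs chosen only for illustration.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum
FIBER_FACTOR = 0.6                # fraction of c used for light in fiber


def estimate_rtt_s(distance_m: float, hops: int, per_hop_delay_s: float) -> float:
    """RTT = 2 * (distance / (0.6 * c) + hops * delay), in seconds."""
    propagation_s = distance_m / (FIBER_FACTOR * C_VACUUM_M_PER_S)
    return 2 * (propagation_s + hops * per_hop_delay_s)


# Hypothetical path: 10,000 km of fiber, 15 hops, 1 ms average delay per hop.
rtt = estimate_rtt_s(1.0e7, 15, 0.001)
print(f"{rtt * 1000:.1f} ms")  # prints "141.2 ms"
```

Here propagation contributes about 111 ms and the hops about 30 ms, so on long paths the distance term usually dominates.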
This rule does not apply if there is a satellite in the route. If a satellite is present in any portion of the route, that portion is allocated a fixed FTD (Frame Transfer Delay) of 320 msec. The 320 msec value takes into account factors such as low earth-station viewing angles and forward error correction (FEC) coding, so it is an upper allocation: most portions that contain a satellite are not expected to exceed 290 msec of delay.
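A satellite portion can be folded into the same kind of estimate by adding its fixed delay. This sketch assumes that the 320 msec FTD is a one-way figure (so it counts twice in the RTT) and that the terrestrial part of the route still follows the distance-plus-hops rule. Both are assumptions made for illustration, and the example route is hypothetical.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum
SATELLITE_FTD_S = 0.320           # fixed allocation per satellite portion (assumed one-way)


def estimate_rtt_with_satellite_s(terrestrial_distance_m: float, hops: int,
                                  per_hop_delay_s: float,
                                  satellite_portions: int) -> float:
    """RTT estimate where each satellite portion adds a fixed one-way FTD."""
    one_way_s = (terrestrial_distance_m / (0.6 * C_VACUUM_M_PER_S)
                 + hops * per_hop_delay_s
                 + satellite_portions * SATELLITE_FTD_S)
    return 2 * one_way_s


# Hypothetical route: 2,000 km of fiber, 10 hops at 1 ms each, one satellite portion.
rtt = estimate_rtt_with_satellite_s(2.0e6, 10, 0.001, 1)
print(f"{rtt * 1000:.0f} ms")  # prints "682 ms"
```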
The delay at each hop is a function of 3 major components: the speed of the router, the interface clocking rates, and the queuing in the router. The first two are constant over short periods of time. The router queuing effects, on the other hand, depend on more random queuing processes and on cross-traffic, and so are more variable.
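The three per-hop components can be written as a simple sum. In this sketch the interface clocking rate is interpreted as serialization delay (packet bits divided by the link rate), which is our reading of the text. The router processing and queuing values are hypothetical inputs, and only the queuing term is varied to show where the variability comes from.

```python
def per_hop_delay_s(packet_bytes: int, link_bps: float,
                    router_processing_s: float, queuing_s: float) -> float:
    """Per-hop delay = router processing + serialization + queuing."""
    serialization_s = packet_bytes * 8 / link_bps  # time to clock the packet onto the link
    return router_processing_s + serialization_s + queuing_s


# Hypothetical hop: 1000-byte packet, 10 Mbit/s link, 50 microseconds processing.
# Processing and serialization stay fixed; only the queuing term changes.
for queuing_ms in (0.0, 2.0, 10.0):
    delay = per_hop_delay_s(1000, 10e6, 50e-6, queuing_ms / 1000)
    print(f"queuing {queuing_ms:4.1f} ms -> hop delay {delay * 1000:.2f} ms")
```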
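The maximum TCP throughput can be estimated by combining the loss and the RTT in the Mathis formula: derived throughput = MSS / (RTT * sqrt(loss)), where MSS is the TCP maximum segment size and loss is the packet loss expressed as a fraction. (The original formula by Mathis et al. also includes a constant factor of order 1, which is omitted here.)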
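A direct transcription of the simplified formula, as a sketch. The 1460-byte MSS (a common value on Ethernet paths) and the example RTT and loss are assumptions chosen for illustration. Loss must be a fraction (1% is 0.01), and the formula has no finite value at zero loss, so the function rejects it.

```python
import math


def mathis_throughput_bytes_per_s(mss_bytes: float, rtt_s: float,
                                  loss_fraction: float) -> float:
    """Derived throughput = MSS / (RTT * sqrt(loss)), in bytes per second."""
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss must be a fraction in (0, 1]; the formula diverges at zero loss")
    return mss_bytes / (rtt_s * math.sqrt(loss_fraction))


# Assumed inputs: MSS 1460 bytes, RTT 100 ms, 1% packet loss.
tput = mathis_throughput_bytes_per_s(1460, 0.100, 0.01)
print(f"{tput * 8 / 1e6:.2f} Mbit/s")  # prints "1.17 Mbit/s"
```

Because throughput scales with 1/sqrt(loss), cutting loss by a factor of 4 only doubles the derived throughput, whereas halving the RTT doubles it directly.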
Given the historical PingER measurements of packet loss and RTT, we can calculate the maximum TCP throughput over the last few years for various groups of sites.
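Applying the formula to a series of periods is then a small loop. The values below are made-up placeholders for one hypothetical group of sites, not PingER measurements, and the 1460-byte MSS is again an assumption. Periods with zero measured loss are reported as None because the formula has no finite value there.

```python
import math
from typing import Dict, Optional, Tuple

MSS_BYTES = 1460  # assumed maximum segment size

# Made-up placeholder values (NOT PingER data):
# period -> (median RTT in ms, median packet loss in percent)
history: Dict[str, Tuple[float, float]] = {
    "period 1": (320.0, 4.0),
    "period 2": (300.0, 2.0),
    "period 3": (250.0, 1.0),
    "period 4": (240.0, 0.0),
}


def derived_throughput_kbps(rtt_ms: float, loss_pct: float,
                            mss_bytes: int = MSS_BYTES) -> Optional[float]:
    """Mathis-style derived throughput in kbit/s, or None at zero loss."""
    loss = loss_pct / 100.0
    if loss <= 0:
        return None
    bytes_per_s = mss_bytes / ((rtt_ms / 1000.0) * math.sqrt(loss))
    return bytes_per_s * 8 / 1000.0


trend = {period: derived_throughput_kbps(rtt, loss)
         for period, (rtt, loss) in history.items()}
for period, kbps in trend.items():
    print(period, "n/a (zero loss)" if kbps is None else f"{kbps:.1f} kbit/s")
```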
Uses of the PingER data
The PingER data for a site can be used in many different ways. A few examples:
- economic: when you buy a network connection, you should be able to check whether the quality of the connection is satisfactory. Using the PingER data, you can check whether you are getting what you are paying for in terms of throughput and link quality.
- technical: based on the PingER findings, a recommendation can be made to policy makers and funding agencies to increase the bandwidth. Furthermore, if one site in a region can attain good connectivity, then other sites in that region should be able to improve their connectivity as well. As a troubleshooting tool, PingER can be used to determine whether a problem is network related, identify when the problem started and whether it is still occurring, and provide a quantitative analysis.
- collaboration: scientists need a certain level of link quality in order to collaborate. By using PingER to measure loss and RTT, you can set expectations for the performance of bulk data transfer and other applications. For real-time collaboration, by comparing the PingER results with various recommendations for loss, RTT and jitter, together with users' perceptions of voice quality, you can determine how well VoIP and other interactive applications might work between various pairs of sites.
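For the troubleshooting use, finding when a problem started and whether it is still occurring can be sketched as a scan over a time series of per-sample loss. The 2.5% threshold (the upper end of the acceptable band) and the function name are illustrative choices, not part of PingER.

```python
from typing import Optional, Sequence, Tuple


def latest_episode(losses_pct: Sequence[float],
                   threshold_pct: float) -> Tuple[Optional[int], bool]:
    """Return (start index of the most recent run of samples whose loss
    exceeds the threshold, whether that run is still ongoing).

    Returns (None, False) if no sample ever exceeded the threshold.
    """
    start: Optional[int] = None
    in_episode = False
    for i, loss in enumerate(losses_pct):
        above = loss > threshold_pct
        if above and not in_episode:
            start = i  # a new episode begins at this sample
        in_episode = above
    return start, in_episode


# Hypothetical hourly loss percentages for one monitor/remote pair.
hourly_loss = [0.0, 0.5, 0.0, 6.0, 8.5, 7.2]
print(latest_episode(hourly_loss, 2.5))  # (3, True): began at hour 3, still ongoing
```

Mapping the returned index back to a sample timestamp gives the time the problem started.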
A detailed technical description of the PingER project, with a list of published papers, is available at the PingER homepage.