CodeWhore.com
Networking: Synchronizing Network Clocks

One basic problem with running a networked simulation is keeping a consistent, millisecond-accurate clock across all the hosts in the session. Such a clock enables collision detection, event scheduling, and path prediction between hosts and authorities.

The naive solution is to set the clock once at startup by bouncing a ping packet off another host and estimating the latency. This doesn't work well, unfortunately, because even hardware timers drift over time. The causes include cheap clock crystals, time registers on overclocked CPUs, or just inconsistent results from different hardware on the same machine.

Fortunately, there has been plenty of practical research in this field. Network Time Protocol, or NTP, keeps Internet hosts synchronized to Coordinated Universal Time (UTC), which is derived from atomic clocks. If you are very serious about accuracy, then you should start your research there. As a game designer, however, I didn't need all of that complexity, so I adapted a few of the techniques and implemented a simpler algorithm:

  1. The first host records an initial timestamp from his clock into an unsigned 32-bit value. All others defer to his authority for the session-wide clock (which is only as accurate as his hardware and network allow).
  2. Other hosts periodically poll their clock (say every 3 or 5 seconds) and send the value in a "ping packet" to the server. This ping can even be piggybacked on a normal data packet.
  3. The server polls his clock, appends the result, and sends back a "pong packet" in response -- as quickly as possible. Dropped or extremely late packets are not resent, because they would be inaccurate and useless.
  4. On receipt, the client timestamps the response once more. He now has 3 numbers: the original timestamp, the server's timestamp, and the final timestamp. Once again, very slow responses (over 10 seconds) are discarded immediately.
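  5. From these values, calculating the round trip time and average latency is easy: the round trip time is the final timestamp minus the original timestamp, and the average one-way latency is half of that. By adding the average latency to the server's timestamp, the client knows what its clock should read.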
  6. Record the round trip time into an array of the last 32 or 64 pings.
  7. Create a temporary array and fill it with the entries in increasing order.
  8. Extract the values at the 20th, 50th, and 80th percentile, or perform a similar statistical operation (e.g., square of differences).
  9. If the round trip time for the latest ping falls within the 20th percentile and the clock appears to be more than 3 milliseconds off, then the client synchronizes his clock to the "correct" time.
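
The client side of the steps above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the names `ClockSync` and `on_pong` are hypothetical, the clock is adjusted by storing an offset rather than by setting a hardware timer, and 32-bit wraparound of the timestamps is ignored for brevity.

```python
from collections import deque

HISTORY = 32          # step 6: remember the last 32 round trip times
MAX_RTT_MS = 10_000   # step 4: discard responses slower than 10 seconds
TOLERANCE_MS = 3      # step 9: leave the clock alone if it is within 3 ms


class ClockSync:
    """Client-side bookkeeping for the ping/pong exchange (hypothetical names)."""

    def __init__(self):
        self.offset_ms = 0                  # local clock + offset = session clock
        self.rtts = deque(maxlen=HISTORY)   # oldest entries fall off automatically

    def on_pong(self, sent_ms, server_ms, received_ms):
        """Handle one pong. sent_ms and received_ms are raw local clock
        readings, server_ms is the server's reading. Returns True if the
        session clock was adjusted."""
        rtt = received_ms - sent_ms
        if rtt < 0 or rtt > MAX_RTT_MS:
            return False                    # very slow or nonsensical response
        self.rtts.append(rtt)

        ordered = sorted(self.rtts)         # step 7: temporary sorted copy
        p20 = ordered[(len(ordered) - 1) * 20 // 100]   # step 8: 20th percentile

        latency = rtt / 2                   # step 5: average one-way latency
        target = server_ms + latency        # what the session clock should read now
        error = target - (received_ms + self.offset_ms)

        # Step 9: trust only the fastest pings, and only correct real errors.
        if rtt <= p20 and abs(error) > TOLERANCE_MS:
            self.offset_ms += error
            return True
        return False
```

With this sketch the very first pong always sets the clock, because a single sample is its own 20th percentile. Later pongs adjust it only when they are among the fastest recently seen.
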

The percentiles above are slightly arbitrary, but they are good indicators of network performance in several ways:

The following diagram illustrates the array of recent round trip times, sorted and broken into percentiles.

[Diagram: recent round trip times, sorted and broken into percentiles]

Notice that beyond the 80th percentile, packet times tend to increase quickly and erratically. This is typical of changing network conditions, such as backbone traffic and busy routers.
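
As a rough illustration of the percentile extraction, the snippet below sorts a handful of round trip times and picks out the 20th, 50th, and 80th percentile entries. The sample values are made up for this example, and `percentile` is a hypothetical helper using a simple index lookup. Other percentile definitions would give slightly different entries.

```python
def percentile(sorted_values, pct):
    """Pick the entry at roughly the pct-th percentile of a sorted list."""
    index = (len(sorted_values) - 1) * pct // 100
    return sorted_values[index]


# Made-up round trip times in milliseconds, for illustration only.
samples = [48, 41, 45, 43, 250, 44, 42, 47, 46, 120]
ordered = sorted(samples)

p20, p50, p80 = (percentile(ordered, p) for p in (20, 50, 80))
tail = [rtt for rtt in ordered if rtt > p80]   # the erratic slow packets
```

In this made-up sample the three percentiles sit within a few milliseconds of each other, while the two entries past the 80th percentile are several times larger.
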

You may decide that sending 32-bit timestamps is a waste of bandwidth -- and you'd be right. A 32-bit millisecond timer has a range of about 49.7 days, while a 16-bit timestamp delta still covers a range of about +/- 32 seconds. This means that you can reduce your packet size by stripping off the most significant bits of the timestamp, and then extrapolating them on the remote side. The extrapolation must be done immediately on receipt, though, because a slow packet may actually spend 30 seconds on the wire or in a queue waiting for acknowledgement.
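
One way to sketch the truncation and extrapolation is shown below. The function names `pack16` and `unpack16` are hypothetical. The sketch assumes both sides already share a roughly synchronized 32-bit millisecond clock, and that the receiver calls `unpack16` as soon as the packet arrives.

```python
def pack16(timestamp_ms):
    """Keep only the low 16 bits of a 32-bit millisecond timestamp."""
    return timestamp_ms & 0xFFFF


def unpack16(low16, local_now_ms):
    """Rebuild the 32-bit timestamp closest to the receiver's current clock.

    Only correct if the true value lies within about +/- 32 seconds
    (half of 2**16 ms) of local_now_ms.
    """
    delta = (low16 - local_now_ms) & 0xFFFF   # difference modulo 2**16
    if delta >= 0x8000:                       # reinterpret as signed 16-bit
        delta -= 0x10000
    return (local_now_ms + delta) & 0xFFFFFFFF
```

For example, a timestamp of `0x0001FFF0` is sent as `0xFFF0`. A receiver whose clock reads `0x00020010` recovers the original value even though the upper bits changed in between.
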

Once the clock on every machine is accurate within about 6 milliseconds of each other, it becomes easy to compare timestamps and perform detection or prediction in real time. To measure the elapsed time between two events A and B, simply subtract the timestamps (A - B) and examine the signed result:
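
A minimal sketch of that subtraction for unsigned 32-bit millisecond timestamps follows. The name `elapsed_ms` is hypothetical. Python integers do not wrap, so the 32-bit behavior is emulated with a mask. In a language with fixed-width integers, a cast of the unsigned difference to a signed 32-bit type would do the same job.

```python
def elapsed_ms(a, b):
    """Signed difference a - b of two unsigned 32-bit millisecond timestamps.

    A positive result means A happened after B, a negative result means
    A happened before B. Valid while the events are less than about
    24.8 days (2**31 ms) apart.
    """
    diff = (a - b) & 0xFFFFFFFF    # wrap to 32 bits, as a hardware counter would
    if diff >= 0x80000000:         # reinterpret the top bit as the sign
        diff -= 0x100000000
    return diff
```

Because the difference is taken modulo 2^32, the comparison stays correct even when the timer rolls over between the two events.
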

There are problems, of course. Accuracy is still limited by the network latency, and frequent errors can cause the clock to "jitter". One fix is to widen the tolerance from 3 to 7 milliseconds over time, so that clocks are well synchronized at start and touched later only if they drift heavily.
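
The widening tolerance could be sketched as below. The text does not say how quickly to widen it, so the linear ramp and the 60-second default are assumptions made for illustration, as is the name `tolerance_ms`.

```python
def tolerance_ms(session_age_s, start=3.0, end=7.0, ramp_s=60.0):
    """Widen the sync tolerance linearly from start to end over ramp_s seconds.

    ramp_s and the linear shape are assumptions; only the 3 ms and 7 ms
    endpoints come from the text.
    """
    fraction = min(max(session_age_s / ramp_s, 0.0), 1.0)   # clamp to 0..1
    return start + (end - start) * fraction
```

The client would then compare its measured clock error against `tolerance_ms(age)` instead of a fixed 3 milliseconds.
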

Asymmetric links are another problem. When the latency from host X to host Y is significantly greater than the latency from host Y to host X (e.g., some cable modems use Ethernet for downstream data, but an analog modem for upstream), it is practically impossible to measure the one-way latency using only the round trip time of ping packets. This and other network time issues are still under active research.
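
A short derivation shows how large the resulting error is. The symbols are introduced here for illustration: $\theta$ is the true offset between the server clock and the client clock, $d_{up}$ is the client-to-server latency, $d_{down}$ is the server-to-client latency, $t_0$ and $t_1$ are the client's send and receive timestamps, and $s$ is the server's timestamp. Server processing time is assumed to be negligible.

```latex
\begin{align*}
s   &= t_0 + \theta + d_{up} \\
t_1 &= t_0 + d_{up} + d_{down} \\
\text{estimated time at } t_1 &= s + \frac{t_1 - t_0}{2}
      = t_0 + \theta + d_{up} + \frac{d_{up} + d_{down}}{2} \\
\text{true time at } t_1 &= t_1 + \theta
      = t_0 + \theta + d_{up} + d_{down} \\
\text{error} &= \frac{d_{up} + d_{down}}{2} - d_{down}
      = \frac{d_{up} - d_{down}}{2}
\end{align*}
```

The error vanishes on a symmetric link and grows to half the difference between the two directions on an asymmetric one. The round trip time only reveals the sum $d_{up} + d_{down}$, so the client cannot detect this difference from ping packets alone.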

Copyright (c) 1999-2003 Matt Slot and Ambrosia Software, Inc.