Networking: Synchronizing Network Clocks
One basic problem with running a networked simulation is keeping a consistent,
millisecond-accurate clock across all the hosts in the session. Such a clock
enables collision detection, event scheduling, and path prediction between
hosts and authorities.
The naive solution is to set the clock once at startup by bouncing a ping packet
off another host and estimating the latency. Unfortunately, this doesn't work
well, because even hardware timers drift over time. The drift can come from
cheap clock crystals, time registers on overclocked CPUs, or just inconsistent
results from different hardware on the same machine.
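The following calculation shows how quickly a small rate error adds up, which is why a one-time adjustment is not enough. For illustration only, it assumes a crystal that runs 50 parts per million (ppm) fast. The function name `accumulated_drift_ms` is hypothetical.

```python
def accumulated_drift_ms(drift_ppm: float, elapsed_s: float) -> float:
    """Clock error in milliseconds after elapsed_s seconds, for a clock whose
    rate is off by drift_ppm parts per million (positive means it runs fast)."""
    return drift_ppm * 1e-6 * elapsed_s * 1000.0


# Illustrative assumption: a crystal running 50 ppm fast.
for minutes in (1, 10, 60):
    print(f"after {minutes:>2} min: {accumulated_drift_ms(50, minutes * 60):.1f} ms")
```

Under that assumption the clock is already 3 ms off after one minute and 180 ms off after an hour. A single measurement taken at startup would soon be useless.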
Fortunately, there has been plenty of practical research in this field. The
Network Time Protocol, or NTP, keeps Internet hosts synchronized to Coordinated
Universal Time (UTC), which is derived from atomic clocks. If you are
very serious about accuracy, then you should start your research
there. As a game designer, however, I didn't need all of that complexity, so
I adapted a few of the techniques and implemented a simpler algorithm:
- The first host, which acts as the server, records an initial timestamp from
his clock into an unsigned 32-bit value. All others defer to his authority for
the session-wide clock (which is only as accurate as his hardware and network
allow).
- Other hosts periodically poll their clock (say every 3 or 5 seconds) and
send the value in a "ping packet" to the server. This ping can even be
piggybacked on a normal data packet.
- The server polls his clock, appends the result, and sends back a "pong
packet" in response -- as quickly as possible. Dropped or extremely late
packets are not resent, because they would be inaccurate and useless.
- On receipt, the client timestamps the response once more. He now has 3
numbers: the original timestamp, the server's timestamp, and the final
timestamp. Once again, very slow responses (over 10 seconds) are discarded
immediately.
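- From these values, the round trip time and latency are easy to calculate.
The round trip time is the final timestamp minus the original timestamp, and
the one-way latency is estimated as half of that. By adding this latency to
the server's timestamp, the client knows what its clock should read.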
- Record the round trip time into an array of the last 32 or 64 pings.
- Copy that array into a temporary array and sort it in increasing order.
- Extract the values at the 20th, 50th, and 80th percentiles, or perform
a similar statistical operation (e.g., the sum of squared differences from
the mean).
- If the round trip time for the latest ping is at or below the 20th %ile
and the clock appears to be more than 3 milliseconds off, then the client
synchronizes his clock to the "correct" time.
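The following Python sketches the client's side of the steps above. It is an illustration under stated assumptions and is not the author's code:

- The `ClockSync` class and the `percentile` helper are hypothetical names.
- Percentiles use the simple nearest-rank method.
- The client keeps an offset instead of rewriting its hardware timer.
- The unsigned 32-bit wraparound of real timestamps is ignored, because Python integers do not overflow.

```python
import math
from collections import deque


def percentile(sorted_values, pct):
    """Nearest-rank percentile (pct in 0..100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


class ClockSync:
    """Client-side clock synchronization against the server (hypothetical API).

    Rather than rewriting the local timer, this keeps an offset that is added
    to the local clock to obtain the session-wide clock.
    """

    def __init__(self, history=32, tolerance_ms=3, max_rtt_ms=10_000):
        self.rtts = deque(maxlen=history)  # last `history` round trip times
        self.offset_ms = 0                 # session clock = local clock + offset
        self.tolerance_ms = tolerance_ms
        self.max_rtt_ms = max_rtt_ms

    def session_time(self, local_ms):
        return local_ms + self.offset_ms

    def on_pong(self, t_sent, t_server, t_received):
        """t_sent/t_received: raw local clock (ms) when the ping left and the
        pong arrived; t_server: server clock (ms) when it answered.
        Returns True if the offset was adjusted."""
        rtt = t_received - t_sent
        if rtt < 0 or rtt > self.max_rtt_ms:
            return False                   # discard very slow (or bogus) replies
        self.rtts.append(rtt)

        # Server time plus estimated one-way latency = what our clock should read.
        expected = t_server + rtt // 2
        error = expected - self.session_time(t_received)

        # Only trust the fastest pings (at or below the 20th percentile).
        best = percentile(sorted(self.rtts), 20)
        if rtt <= best and abs(error) > self.tolerance_ms:
            self.offset_ms += error
            return True
        return False


# Example: the server's clock is 1000 ms ahead and the link has a 40 ms RTT.
sync = ClockSync()
sync.on_pong(t_sent=0, t_server=1020, t_received=40)
print(sync.offset_ms)  # 1000
```

Keeping the correction as a separate offset is only one possible design. It leaves the raw timer untouched, so round trip times are always measured on a single, unadjusted clock.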
The percentiles above are slightly arbitrary, but they are good indicators of
network performance in several ways:
- 20th %ile: These are the absolute best ping times for the link recently,
so the application shouldn't expect much better. Use this value to
select only the most accurate ping packets for synchronizing the clock.
- 50th %ile: The median value splits the recent pings in half: half were
faster and half were slower. Use this (or a true average) to predict how
long a given packet should spend on the wire, halving it for a one-way trip
because the array holds round trip times.
- 80th %ile: Most packets arrive within this time, so this determines when
a message should be considered "late". A reliable packet that hasn't been
acknowledged within twice this round trip time should be considered lost or
late, and resent.
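The following Python sketches one way to derive those three indicators from the recent history. It uses hypothetical names (`link_stats`, `percentile`) and the nearest-rank percentile method. Because the history holds round trip times, the sketch assumes the one-way estimate is half the median.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile (pct in 0..100) of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def link_stats(rtt_history_ms):
    """Turn recent round trip times (ms) into the three percentile indicators."""
    ordered = sorted(rtt_history_ms)       # sort a copy; the history is untouched
    p20, p50, p80 = (percentile(ordered, p) for p in (20, 50, 80))
    return {
        "sync_cutoff_ms": p20,             # only pings this fast may reset the clock
        "one_way_latency_ms": p50 / 2,     # median RTT halved: expected time on the wire
        "resend_timeout_ms": 2 * p80,      # resend reliable packets unacked after this
    }


history = [30, 32, 35, 40, 41, 45, 50, 60, 90, 200]
print(link_stats(history))
# {'sync_cutoff_ms': 32, 'one_way_latency_ms': 20.5, 'resend_timeout_ms': 120}
```

The single 200 ms outlier in this made-up history has no effect on any of the three values. This is one reason percentiles suit a link whose slowest packets behave erratically.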
The following diagram illustrates the array of recent round trip times after it
has been sorted and broken into percentiles. Notice that beyond the 80th %ile,
packet times tend to increase quickly and erratically. This is typical of
changing network conditions, such as backbone traffic and busy routers.

You may decide that sending 32-bit timestamps is a waste of bandwidth, and
you'd be right. A 32-bit millisecond timer has a range of about 50 days, but
a 16-bit timestamp delta still covers +/- 32 seconds. This means that you can
reduce your packet size by stripping off the most significant bits of the
timestamp, and then extrapolating them on the remote side from the local
clock. This must be done as soon as the packet arrives, though, and it only
works while the packet is less than about 32 seconds old. A slow packet may
actually spend 30 seconds on the wire or in a queue waiting for
acknowledgement, which is already close to that limit.
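The following Python is a minimal sketch of the truncation and reconstruction. It assumes both hosts already share the synchronized session clock and the packet is less than about 32.7 seconds old when it is decoded. `truncate16` and `extend16` are hypothetical helper names.

```python
def truncate16(timestamp_ms: int) -> int:
    """Sender side: keep only the low 16 bits of a 32-bit ms timestamp."""
    return timestamp_ms & 0xFFFF


def extend16(low16: int, now_ms: int) -> int:
    """Receiver side: rebuild the full 32-bit timestamp from its low 16 bits,
    assuming the true value lies within -32768..+32767 ms of now_ms."""
    delta = (low16 - now_ms) & 0xFFFF      # difference modulo 2**16
    if delta >= 0x8000:                    # reinterpret as a signed 16-bit delta
        delta -= 0x10000
    return (now_ms + delta) & 0xFFFFFFFF   # wrap like an unsigned 32-bit timer


now = 100_000
sent = now - 500                           # packet stamped half a second ago
print(extend16(truncate16(sent), now) == sent)   # True
```

If the same packet had been stamped 40 seconds earlier, `extend16` would place it 65,536 ms later than its true time. That failure is why the extension has to happen promptly.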
Once the clocks on all machines agree to within about 6 milliseconds of each
other, it becomes easy to compare timestamps and perform detection or prediction
in real time. To measure the elapsed time between two events A and B, simply
subtract the timestamps (A - B) and examine the signed result:
- Positive: Event A occurred (A - B) milliseconds
after event B.
- Negative: Event A occurred (B - A) milliseconds
before event B.
- Zero: Event A occurred at exactly the same time as event B.
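The session clock is an unsigned 32-bit counter that eventually wraps around. For that reason, the subtraction should be done modulo 2^32 and the result reinterpreted as a signed value. The following Python is a sketch of this; `elapsed_ms` is a hypothetical name.

```python
def elapsed_ms(a: int, b: int) -> int:
    """Signed milliseconds from event B to event A (A - B) for unsigned 32-bit
    timestamps. Stays correct across the timer's wraparound as long as the
    events are less than 2**31 ms (just under 25 days) apart."""
    diff = (a - b) & 0xFFFFFFFF            # unsigned 32-bit subtraction
    return diff - 0x100000000 if diff >= 0x80000000 else diff


print(elapsed_ms(1_250, 1_000))        # 250: A happened 250 ms after B
print(elapsed_ms(1_000, 1_250))        # -250: A happened 250 ms before B
print(elapsed_ms(5, 0xFFFFFFFB))       # 10: B was just before the timer wrapped
```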
There are problems, of course. Accuracy is still limited by network latency,
and frequent measurement errors can cause the clock to "jitter" as it is
corrected back and forth. One fix is to widen the tolerance from 3 to 7
milliseconds over time, so that clocks are tightly synchronized at the start
and adjusted later only if they drift heavily.
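One way to widen the tolerance is a linear ramp. The article only says the tolerance grows "over time", so the linear shape and the 60-second ramp in this sketch are both assumptions, and `sync_tolerance_ms` is a hypothetical name. A client would compare the absolute clock error against this value instead of a fixed 3 milliseconds.

```python
def sync_tolerance_ms(elapsed_s: float, start_ms: float = 3.0,
                      end_ms: float = 7.0, ramp_s: float = 60.0) -> float:
    """Resync threshold that widens linearly from start_ms to end_ms over the
    first ramp_s seconds of the session, then stays at end_ms."""
    if elapsed_s >= ramp_s:
        return end_ms
    fraction = max(elapsed_s, 0.0) / ramp_s
    return start_ms + (end_ms - start_ms) * fraction


for t in (0, 30, 60, 600):
    print(t, sync_tolerance_ms(t))
```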
Asymmetric links are another problem. When the latency from host X to host Y is
significantly greater than the latency from host Y to host X (e.g., some cable
modems use Ethernet for downstream data, but an analog modem for upstream), the
difference is practically impossible to detect using the round-trip time of
ping packets.
This and other network time issues are still under active research.
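A small worked example shows why round trips cannot reveal the asymmetry. It uses hypothetical helpers and illustrative numbers, and it assumes the server replies instantly. Under these assumptions, two different situations produce exactly the same three timestamps:

- a 30 ms clock offset on a symmetric link, and
- perfectly matched clocks on a link whose upstream leg is 70 ms and downstream leg is 10 ms.

In general, the estimated offset is wrong by half the difference between the two legs.

```python
def estimated_offset_ms(t_sent, t_server, t_received):
    """Server-minus-client clock offset, assuming the ping and pong legs take
    equal time (half the round trip each)."""
    rtt = t_received - t_sent
    return t_server + rtt / 2 - t_received


def simulate(true_offset_ms, upstream_ms, downstream_ms, t_sent=0.0):
    """Timestamps a client would record for one ping/pong exchange,
    assuming the server replies with no processing delay."""
    t_server = t_sent + upstream_ms + true_offset_ms   # server's clock at reply
    t_received = t_sent + upstream_ms + downstream_ms  # client's clock at arrival
    return t_sent, t_server, t_received


symmetric = simulate(true_offset_ms=30, upstream_ms=40, downstream_ms=40)
asymmetric = simulate(true_offset_ms=0, upstream_ms=70, downstream_ms=10)
print(symmetric == asymmetric)               # True: identical timestamps
print(estimated_offset_ms(*asymmetric))      # 30.0, although the clocks agree
```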