How to communicate peer-to-peer through NAT (Network Address Translation) firewalls
This document aims to describe how to provide peer-to-peer network communication
in the case where both peers are each behind an individual NAT device. This is
possible by using a central server for authentication and matchmaking purposes,
but once connections are established, the server gets out of the loop and no
bandwidth is expended on the server.
First, some assumptions just to make the explanation clear:
The client has local IP address 10.0.1.1 and is said to "live behind"
the thing providing NAT services (firewalling or connection sharing). Let's call
the client "Joe".
The server has global ("real") IP address 20.0.2.2, and runs a service
of some sort on port 2345 which is generally accessible from the internet at
large.
The NAT box has global ("real") IP address 30.0.3.3, and is the router
for the internal, NAT-ed machines on the 10.x.y.z network using a second IP
address of 10.0.3.3 (this is the "gateway" address).
How NAT works
A NAT firewall (or Internet Connection Sharing device) works like so:
- Client, on the inside, using address 10.0.1.1:1234, wants to connect
to server, on the outside, using address 20.0.2.2:2345, so client sends a
packet to that address through the NAT box, which is the default gateway
for the client.
- Default gateway intercepts the packet, and notices that if it forwarded
it to the outside, nobody would know where to return the packet, as 10.x.y.z
is a special "private" range that has no meaning on the real internet. Instead,
the gateway generously offers up its own address, 30.0.3.3, for the IP, and
allocates a locally unused port, say 3456, for the source address. Thus, the
packet that gets forwarded to server 20.0.2.2:2345 has a return address of
30.0.3.3:3456.
- To be able to do the right thing with packets that come back from the
server, the NAT box maintains a record that it used 30.0.3.3:3456 on behalf
of the internal client 10.0.1.1:1234.
- The server at 20.0.2.2:2345 receives a packet from a client claiming to
live at 30.0.3.3:3456. The server does whatever it needs to do, and then
returns a reply to 30.0.3.3:3456.
- The reply hits the NAT box on port 3456. The NAT box looks up its internal
records, and notices that this is really intended for the internal address
10.0.1.1:1234. It re-writes the destination address of the incoming packet
and forwards the packet to the internal machine.
- The internal machine sees a packet addressed to itself (10.0.1.1:1234)
from the remote server (20.0.2.2:2345).
- The server may respond from the same port that the client sent to
(such as in this situation) or it may respond from some other port.
Thus, most useful, working NAT boxes must not keep a table of where the
inbound packets are supposed to be coming from; it is sufficient and necessary
to only keep track of the local masqueraded addresses.
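The steps above can be sketched as a toy translation table. This is only a model of the bookkeeping, not a real NAT; the class name `NatBox` and its methods are made up for illustration, and the starting port 3456 is simply the example port used in this document.

```python
class NatBox:
    """Toy model of the masquerading table described above (hypothetical)."""

    def __init__(self, public_ip, first_port=3456):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}  # (inside_ip, inside_port) -> public port
        self.in_map = {}   # public port -> (inside_ip, inside_port)

    def outbound(self, src, dst):
        """Rewrite the source address of a packet leaving the private network."""
        if src not in self.out_map:
            port = self.next_port
            self.next_port += 1
            self.out_map[src] = port
            self.in_map[port] = src
        return (self.public_ip, self.out_map[src]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of a packet arriving from the outside.

        Returns None when no record exists, meaning the packet is dropped.
        Note that the sender (src) is not checked at all.
        """
        ip, port = dst
        if ip != self.public_ip or port not in self.in_map:
            return None
        return src, self.in_map[port]


nat = NatBox("30.0.3.3")
client = ("10.0.1.1", 1234)
server = ("20.0.2.2", 2345)

seen_src, seen_dst = nat.outbound(client, server)
print(seen_src)                       # ('30.0.3.3', 3456)
print(nat.inbound(server, seen_src))  # (('20.0.2.2', 2345), ('10.0.1.1', 1234))
```

The `inbound` lookup uses only the public port as its key, which matches the point that the box need not track where inbound packets come from.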
Some observations
All is well and good; as long as the NAT box remembers that 30.0.3.3:3456
really means 10.0.1.1:1234 internally, packets from the outside will reach the
internal client. Similarly, as long as it remembers that packets from
10.0.1.1:1234 should be sent to 20.0.2.2:2345 using 30.0.3.3:3456 as the
sender address, the remote server will not be surprised.
Most NAT firewalls keep a time-out for these connections; these time-outs
vary from 10 minutes (on some really broken residential firewalls) to several
hours. As long as some traffic flows within the time-out, the connection is
remembered and future traffic will keep working.
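As a rough illustration of the time-out rule, here is a sketch that checks whether a series of packets would keep a mapping alive, and picks keep-alive times for a quiet connection. The function names and the "send at half the time-out" safety factor are my own assumptions, not something any particular NAT box specifies.

```python
def mapping_survives(send_times, timeout_s):
    """True if no gap between consecutive packets exceeds the NAT time-out."""
    return all(later - earlier <= timeout_s
               for earlier, later in zip(send_times, send_times[1:]))


def keepalive_times(start, end, timeout_s, safety=0.5):
    """Times (in seconds) at which to send something between start and end.

    The interval is a fraction of the time-out; 0.5 is an assumed margin.
    """
    interval = timeout_s * safety
    times = []
    t = start
    while t < end:
        times.append(t)
        t += interval
    times.append(end)
    return times


# A 10-minute time-out and two packets 15 minutes apart: the mapping is lost.
print(mapping_survives([0, 900], 600))                         # False
# With keep-alives every 5 minutes in between, it is remembered.
print(mapping_survives(keepalive_times(0, 900, 600), 600))     # True
```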
There are additional gnarls for some protocols which use in-band data for
talking about addresses in the assumed global internet name space -- but
client addresses are actually taken from the internal, 10.x.y.z name space.
(There are a few other private name spaces, as well). For example, FTP uses
ASCII to send port numbers and addresses around, to support a feature that
nobody uses (telling computer 2 to connect and transfer a file to computer 3,
when you're sitting at computer 1). Let's ignore these gnarls; they're not
important for modern, well-designed protocols and will just cloud the
discussion.
You will note that this scheme relies on the fact that the client
initiates the connection to the server, which lets the NAT gateway remember
how to re-write packet headers and forward packets correctly. If instead the
server had tried to somehow send a packet first, it would have tried to send
a packet to 30.0.3.3:3456 (presumably remembered from some previous session).
This would have hit the NAT firewall, which wouldn't know what to do with it
(as the previous connection would have timed out at that point). The packet
would never get re-written to 10.0.1.1:1234.
Similarly, if the remote server had tried to send packets to 10.0.1.1:1234,
having gotten that address for the client through some out-of-band
channel, they wouldn't have made it there; the only address that's valid
on the network is 30.0.3.3, the address of the NAT box itself (this is why
these set-ups are also referred to as "internet connection sharing" -- there's
only a single IP address visible to the outside internet).
The firewall may also remember that we tried to talk to 20.0.2.2, and
put that address on a list of addresses we're willing to talk to (this varies
from firewall to firewall).
The problem with peer-to-peer
Another assumption: our peer-to-peer partner is on a machine behind a firewall
with address 40.0.4.4; internally, he uses the address 10.0.4.4. Also, let's call
him "Bob."
This is all cool and stuff, but what if your machine 10.0.1.1 wants to talk
to some other machine behind another NAT, which uses some other address; let's
call it 10.0.4.4 (although it could, in fact, ALSO be using the address 10.0.1.1
inside ITS network -- confusing, but true!).
The 10.0.4.4 machine wouldn't know where to send its packets. The best it could
do would be to send to 30.0.3.3 -- but 30.0.3.3 doesn't know how to forward the packet.
Similarly, if 10.0.1.1 tried to send to 40.0.4.4, that firewall would have the
same conundrum.
Introducing the Introducer
The NAT boxes need connections to be established by having outgoing traffic
come from the internal clients. One approach would be to start guessing at what
port number would be used on the firewall on the other side for a particular
connection, once a connection is opened. This could take a lot of time and
network traffic before it managed to get a packet through. Another approach is
to use the information you might already have, if you could sit in the middle
and look at what's going on.
Suppose that 20.0.2.2:2345 remembered the remote addresses it saw, and let
other clients know about those addresses. Thus, 10.0.1.1, masqueraded through
30.0.3.3:3456, would look like "Joe:30.0.3.3:3456" on the server list of
connected users, and 10.0.4.4, masqueraded through 40.0.4.4:4567, would look
like "Bob:40.0.4.4:4567".
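A minimal sketch of that server-side list follows. The class `Introducer` and its method names are hypothetical; the only idea taken from the text is that the server records the address it observed on the wire, which is the NAT's public address and port.

```python
class Introducer:
    """Toy user list for the server at 20.0.2.2:2345 (hypothetical class)."""

    def __init__(self):
        self.users = {}  # name -> (public_ip, public_port) as seen by the server

    def register(self, name, observed_addr):
        # observed_addr is the source address of the packet as it arrived,
        # never the private 10.x.y.z address the client thinks it has.
        self.users[name] = observed_addr

    def listing(self):
        """User list in the "name:ip:port" form used in this document."""
        return sorted(f"{name}:{ip}:{port}"
                      for name, (ip, port) in self.users.items())

    def peers_of(self, name):
        """Everyone except the named user, for telling peers about each other."""
        return {n: addr for n, addr in self.users.items() if n != name}


server = Introducer()
server.register("Joe", ("30.0.3.3", 3456))
server.register("Bob", ("40.0.4.4", 4567))
print(server.listing())        # ['Bob:40.0.4.4:4567', 'Joe:30.0.3.3:3456']
print(server.peers_of("Joe"))  # {'Bob': ('40.0.4.4', 4567)}
```

With a real UDP socket in Python, the observed address would be the second value returned by `sock.recvfrom()`.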
- Joe starts out connecting to the introducer/server. This server can
also conveniently act as a gatekeeper, login server, and authenticator for
anyone wanting to participate in the network. This connection ends up telling
the NAT gateway that inbound traffic on 30.0.3.3:3456 should be forwarded back
to 10.0.1.1:1234, and the NAT firewall may, if it's paranoid, keep 20.0.2.2
on a list of addresses that are allowed to send data in.
- Bob starts out connecting to the server as well. His NAT gateway remembers
that 40.0.4.4:4567 really means 10.0.4.4:1234, and that 20.0.2.2 is on the
list of okay hosts.
- Now, the introducer lets Joe know about Bob, and vice versa.
- Joe tries to send packets to Bob on address 40.0.4.4:4567. Because it's sent
from the same internal address, the firewall re-uses the From address of 30.0.3.3:3456,
and writes that as the "from" address on a packet headed "to" 40.0.4.4:4567. A
paranoid NAT will also add 40.0.4.4 as an allowed destination for this port.
- Bob's firewall sees an incoming packet on 40.0.4.4:4567, looks that up in
its masquerading table, and forwards the packet to 10.0.4.4:1234.
Hey! Joe just sent a packet straight to Bob!
Bob does the same thing going the other way. Suddenly, with a little help from
the introducer server out on the network, these friends can talk to each other.
The cool thing is that whatever traffic goes on between these peers does NOT go
through the central server. Other than letting the clients find each other
("matchmaking") the server gets out of the way.
Small gnarl: if the firewalls keep lists of allowed IP addresses, then some
of the early packets will be dropped, until each side of the connection has sent
outgoing traffic to the other side, and thus put that side on its "white list".
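To make the sequence and the small gnarl concrete, here is a simulation of two "paranoid" NAT boxes. It is a sketch under simplifying assumptions: the class `ParanoidNat` is invented for illustration, each box handles a single inside client, and the public port is fixed up front instead of being allocated on first use.

```python
class ParanoidNat:
    """Toy NAT that reuses one public port for its inside client and only
    admits inbound packets from IPs that client has already sent to."""

    def __init__(self, public_ip, public_port, inside):
        self.public = (public_ip, public_port)
        self.inside = inside
        self.allowed_ips = set()  # the "white list" of remote IPs

    def send(self, dst):
        """Inside client sends to dst; returns the rewritten source address."""
        self.allowed_ips.add(dst[0])
        return self.public

    def receive(self, src, dst):
        """Return the inside address if the packet is admitted, else None."""
        if dst != self.public or src[0] not in self.allowed_ips:
            return None
        return self.inside


introducer = ("20.0.2.2", 2345)
joe_nat = ParanoidNat("30.0.3.3", 3456, ("10.0.1.1", 1234))
bob_nat = ParanoidNat("40.0.4.4", 4567, ("10.0.4.4", 1234))

# Both clients connect to the introducer, which learns their public addresses.
joe_public = joe_nat.send(introducer)
bob_public = bob_nat.send(introducer)

# Joe's first packet to Bob is dropped: Bob has not sent to 30.0.3.3 yet.
first = bob_nat.receive(joe_nat.send(bob_public), bob_public)

# Bob's packet to Joe gets through: Joe already sent to 40.0.4.4.
second = joe_nat.receive(bob_nat.send(joe_public), joe_public)

# Joe retries, and now Bob's firewall lets the packet in.
third = bob_nat.receive(joe_nat.send(bob_public), bob_public)

print(first, second, third)  # None ('10.0.1.1', 1234) ('10.0.4.4', 1234)
```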
Summary
Peer-to-peer gaming, file sharing, voice conferencing and other bandwidth hungry
network uses can be run without paying huge bandwidth bills for servers. All the
servers need to do is authenticate each user and give that user a listing of other
users to connect to. Without the introducer, no communication
is possible, so the servers still have authority over "call setup" or similar
functionality, but the bandwidth usage is really low.
No doubt, some firewall set-ups, and some rather broken NAT boxes will not allow
this punch-through to happen. There are workarounds, such as searching for ports in
the range near the server connection port, because that's where the NAT box is
likely to allocate the UDP port for the "unused" peer connection. However, plain
punch-through is likely to work on most NAT set-ups in use today.
I don't think I invented this technique, although I did independently re-discover
it. I know that there are match-making services for games that purport to work
through firewalls, and I know that some mad people are actually doing punch-through
without introducers, basically by using heuristics and the "try many ports" approach.
Update: I have since found the following document on the web, which
describes this technique and has a date in 1999. See, I knew this would be done
already!
http://www.alumni.caltech.edu/~dank/peer-nat.html
Another Update: I have received reports that modern NAT boxes use the
pair of source-address/destination-address as keys for masquerading. Punch-through
will still work, as long as the NAT box re-uses the same outward facing port
for the same inside port/address pair (which is legal and reasonable to do), as this
description explicitly states that the first message going each way may be discarded
before the punch-through has been set up, when peers start talking to each other.
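The port re-use condition can be shown with a small sketch. The helper `make_nat` is hypothetical, and the two behaviours it models are simplifications: one box keeps a single outward port per inside address, the other hands out a fresh port for every new destination.

```python
import itertools


def make_nat(reuse_port):
    """Return an allocate(inside, dst) function for a toy NAT.

    reuse_port=True:  one public port per inside address, whatever the destination.
    reuse_port=False: a fresh public port for every (inside, destination) pair.
    """
    ports = itertools.count(3456)
    table = {}

    def allocate(inside, dst):
        key = inside if reuse_port else (inside, dst)
        if key not in table:
            table[key] = next(ports)
        return table[key]

    return allocate


joe = ("10.0.1.1", 1234)
introducer = ("20.0.2.2", 2345)
bob_public = ("40.0.4.4", 4567)

friendly = make_nat(reuse_port=True)
advertised = friendly(joe, introducer)  # the port the introducer tells Bob about
used = friendly(joe, bob_public)        # the port Joe's packets to Bob really use
print(advertised == used)               # True: Bob's packets reach the open port

awkward = make_nat(reuse_port=False)
advertised = awkward(joe, introducer)
used = awkward(joe, bob_public)
print(advertised == used)               # False: Bob sends to a port Joe isn't using
```

In the second case the address handed out by the introducer is stale for the peer connection, which is where the "try nearby ports" heuristics come in.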
Third Update: To convince the doubters, I hacked up a simple application
that will compile on UNIX (server/client) and Win32 (client only) which demonstrates
the concept. It has been validated through FreeBSD and Linux kernel NAT firewalls.
You need to change the hard-coded address of the introducer server if you want to
re-build it. Find source here.
Fourth Update: I found
http://midcom-p2p.sourceforge.net/ when surfing the web. It has a piece of
software that can test gateway NAT compatibility, and also attempts NAT traversal using TCP
(which is not for the faint-of-heart). There's also a (rather inactive) Yahoo
mailing list dedicated to the issue at
http://games.groups.yahoo.com/group/nat-peer-games/.
Fifth Update: References seem to be popping up all over; it's quite the
topic of the day!
http://www.intel.com/cd/ids/developer/asmo-na/eng/79524.htm is an article from
Intel describing the same thing.
Sixth Update: The Midcom guys are now working on an Internet Draft to
further formalize language around NAT traversal, and suggesting best practices. They
also talk about TCP traversal (which still has severe implementation issues). Try
the document
http://www.brynosaurus.com/pub/net/draft-ford-midcom-p2p-03.txt , unless they updated it again and there's something newer there.
Seventh Update: Some firewalls will reject a packet they're not expecting,
and then keep rejecting such packets even after they would otherwise start expecting
them. This leads to a race when starting to negotiate the connection. To fix that, the
two sides should first send a packet with a short TTL (typically, 2) to "open" their
firewalls, and when both sides have done that, start communicating with each other for
real. This readiness can either be brokered using extra negotiation through the
introducer, or it can be done using some reasonable time-outs.
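Here is a minimal sketch of sending such an "opener" from Python. The function name `send_opener` and the payload are made up; the socket calls are the standard `IP_TTL` option on an IPv4 UDP socket. As I understand it, the idea is that the packet gets far enough to pass your own NAT box and create the mapping, but expires before it reaches the other side's firewall, so it is never rejected there. Whether a TTL of 2 is enough depends on how many hops sit between the client and its NAT box.

```python
import socket


def send_opener(sock, peer_addr, ttl=2):
    """Send one throwaway UDP packet with a short TTL, then restore the TTL.

    sock must be an IPv4 UDP socket (AF_INET, SOCK_DGRAM), and it should be
    the same socket that is later used for the real peer traffic, so that
    the NAT box sees the same inside address and port.
    """
    old_ttl = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    try:
        sock.sendto(b"open", peer_addr)
    finally:
        # Later packets on this socket go out with the normal TTL again.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, old_ttl)


# Usage, with the example addresses from this document:
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.bind(("", 1234))
#   send_opener(sock, ("40.0.4.4", 4567))
```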
Eighth Update: A good paper with some statistics on how common NAT
traversal connectivity actually is can be found at
http://www.brynosaurus.com/pub/net/p2pnat/. It also talks about a similar TCP
punch-through technique, which is less supported than UDP, but still surprisingly
well supported.
Ninth Update: A draft RFC which specifies how applications should behave
to be NAT compliant is available at
http://tools.ietf.org/wg/behave/draft-ford-behave-app-05.txt. Good reading!
I hope this quick introduction has been helpful. If you put this into your own
application and gather experience about it, please let me know! My e-mail is: