How to communicate peer-to-peer through NAT (Network Address Translation) firewalls

This document describes how to provide peer-to-peer network communication when each peer sits behind its own NAT device. This is possible by using a central server for authentication and matchmaking purposes, but once connections are established, the server gets out of the loop and no further bandwidth is expended on the server.

First, some assumptions just to make the explanation clear:

The client has local IP address 10.0.1.1 and is said to "live behind" the thing providing NAT services (firewalling or connection sharing). Let's call the client "Joe".

The server has global ("real") IP address 20.0.2.2, and runs a service of some sort on port 2345 which is generally accessible from the internet at large.

The NAT box has global ("real") IP address 30.0.3.3, and is the router for the internal, NAT-ed machines on the 10.x.y.z network using a second IP address of 10.0.3.3 (this is the "gateway" address).

How NAT works

A NAT firewall (or Internet Connection Sharing device) works like so:

Client, on the inside, using address 10.0.1.1:1234, wants to connect to server, on the outside, using address 20.0.2.2:2345, so client sends a packet to that address through the NAT box, which is the default gateway for the client.
Default gateway intercepts the packet, and notices that if it forwarded it to the outside, nobody would know where to return the packet, as 10.x.y.z is a special "private" range that has no meaning on the real internet. Instead, the gateway generously offers up its own address, 30.0.3.3, for the IP, and allocates a locally unused port, say 3456, for the source address. Thus, the packet that gets forwarded to server 20.0.2.2:2345 has a return address of 30.0.3.3:3456.
To be able to do the right thing with packets that come back from the server, the NAT box maintains a record that it used 30.0.3.3:3456 on behalf of the internal client 10.0.1.1:1234.
The server at 20.0.2.2:2345 receives a packet from a client claiming to live at 30.0.3.3:3456. The server does whatever it needs to do, and then returns a reply to 30.0.3.3:3456.
The reply hits the NAT box on port 3456. The NAT box looks up its internal records, and notices that this is really intended for the internal address 10.0.1.1:1234. It re-writes the destination address of the incoming packet and forwards the packet to the internal machine.
The internal machine sees a packet addressed to itself (10.0.1.1:1234) from the remote server (20.0.2.2:2345).
The server may respond from the same port that the client sent to (as in this example), or it may respond from some other port. Thus, most useful, working NAT boxes must not keep a table of where the inbound packets are supposed to be coming from; it is both necessary and sufficient to keep track only of the local masqueraded addresses.
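The translation steps above can be sketched as a small simulation. This is only a toy model, not how any particular NAT box is implemented: the class name NatBox, its method names, and the choice to hand out ports sequentially starting at 3456 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NatBox:
    """Toy NAT: remembers which public port stands in for which internal address."""
    public_ip: str
    next_port: int = 3456                          # assumed starting port, for illustration
    out_map: dict = field(default_factory=dict)    # internal (ip, port) -> public port
    in_map: dict = field(default_factory=dict)     # public port -> internal (ip, port)

    def outbound(self, src, dst):
        """Rewrite the source address of a packet leaving the private network."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out_map[src]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of a packet from outside, or return None to drop it."""
        internal = self.in_map.get(dst[1])
        if dst[0] != self.public_ip or internal is None:
            return None
        # Only the destination is checked; the sender's address is not (see above).
        return src, internal


nat = NatBox("30.0.3.3")
sent = nat.outbound(("10.0.1.1", 1234), ("20.0.2.2", 2345))
reply = nat.inbound(("20.0.2.2", 2345), ("30.0.3.3", 3456))
```

Here sent carries the return address 30.0.3.3:3456, and reply is delivered to 10.0.1.1:1234. A packet arriving on a port with no record, such as 30.0.3.3:9999, comes back as None, which stands for "dropped".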

Some observations

All is well and good; as long as the NAT box remembers that 30.0.3.3:3456 really means 10.0.1.1:1234 internally, packets from the outside will reach the internal client. Similarly, as long as it remembers that packets from 10.0.1.1:1234 should be sent to 20.0.2.2:2345 using 30.0.3.3:3456 as the sender address, the remote server will not be surprised.

Most NAT firewalls keep a time-out for these connections; these time-outs vary from 10 minutes (on some really broken residential firewalls) to several hours. As long as some traffic flows within the time-out, the connection is remembered and future traffic will keep working.
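To make the time-out behaviour concrete, here is a minimal sketch of a mapping table whose entries are forgotten after a period of silence. The class name MappingTable and its methods are hypothetical, and the clock is passed in explicitly so the example does not depend on real time; real NAT boxes differ in when exactly they restart the timer.

```python
class MappingTable:
    """Toy NAT mapping table whose entries expire after timeout_s idle seconds."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.entries = {}  # public port -> (internal address, time of last traffic)

    def touch(self, public_port, internal_addr, now):
        """Record traffic on a mapping, creating it or restarting its timer."""
        self.entries[public_port] = (internal_addr, now)

    def lookup(self, public_port, now):
        """Return the internal address, or None if the mapping is missing or expired."""
        entry = self.entries.get(public_port)
        if entry is None:
            return None
        internal_addr, last_seen = entry
        if now - last_seen > self.timeout_s:
            del self.entries[public_port]  # forgotten: later packets will be dropped
            return None
        return internal_addr
```

With a 600-second time-out, a mapping touched at time 0 is still found at time 599 but not at time 601. The practical consequence for an application is to send some small packet well within the shortest time-out it expects to meet.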

There are additional gnarls for some protocols which use in-band data for talking about addresses in the assumed global internet name space -- but client addresses are actually taken from the internal, 10.x.y.z name space. (There are a few other private name spaces as well.) For example, FTP uses ASCII to send port numbers and addresses around, to support a feature that nobody uses (telling computer 2 to connect and transfer a file to computer 3, when you're sitting at computer 1). Let's ignore these gnarls; they're not important for modern, well-designed protocols and will just cloud the discussion.

You will note that this scheme relies on the fact that the client initiates the connection to the server, which lets the NAT gateway remember how to re-write packet headers and forward packets correctly. If instead the server had tried to somehow send a packet first, it would have tried to send a packet to 30.0.3.3:3456 (presumably remembered from some previous session). This would have hit the NAT firewall, which wouldn't know what to do with it (as the previous connection would have timed out at that point). The packet would never get re-written to 10.0.1.1:1234.

Similarly, if the remote server tries to send packets to 10.0.1.1:1234, if it had gotten that address for the client through some out-of-band channel, it wouldn't have made it there; the official address that's valid on the network is the address of 30.0.3.3, the NAT box itself (this is why these set-ups are also referred to as "internet connection sharing" -- there's only a single IP address visible to the outside internet).

The firewall may also remember that we tried to talk to 20.0.2.2, and put that address on a list of addresses we're willing to talk to (this varies from firewall to firewall).

The problem with peer-to-peer

Another assumption: our peer-to-peer partner is on a machine behind a firewall with address 40.0.4.4; internally, he uses the address 10.0.4.4. Also, let's call him "Bob."

This is all cool and stuff, but what if your machine 10.0.1.1 wants to talk to some other machine behind another NAT, which uses some other address; let's call it 10.0.4.4 (although it could, in fact, ALSO be using the address 10.0.1.1 inside ITS network -- confusing, but true!).

The 10.0.4.4 machine wouldn't know what to send to. The best it could send to would be 30.0.3.3 -- but 30.0.3.3 doesn't know how to forward the packet. Similarly, if 10.0.1.1 tried to send to 40.0.4.4, that firewall would have the same conundrum.

Introducing the Introducer

The NAT boxes need connections to be established by having outgoing traffic come from the internal clients. One approach would be to start guessing at what port number would be used on the firewall on the other side for a particular connection, once a connection is opened. This could take a lot of time and network traffic before it managed to get a packet through. Another approach is to use the information you might already have, if you could sit in the middle and look at what's going on.

Suppose that 20.0.2.2:2345 remembered the remote addresses it saw, and let other clients know about those addresses. Thus, 10.0.1.1, masqueraded through 30.0.3.3:3456, would look like "Joe:30.0.3.3:3456" on the server list of connected users, and 10.0.4.4, masqueraded through 40.0.4.4:4567, would look like "Bob:40.0.4.4:4567".
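A rough sketch of such an introducer follows. It assumes UDP, a hypothetical wire format in which each client simply sends its name as a datagram, and no authentication at all; the function names register and run_introducer are made up for this example. The important point is that the server records the address it observed on the incoming packet, not anything the client says about itself.

```python
import socket


def register(users, name, observed_addr):
    """Remember the address a user's packet arrived from; return the user listing."""
    users[name] = observed_addr
    return "\n".join(
        f"{n}:{ip}:{port}" for n, (ip, port) in sorted(users.items())
    )


def run_introducer(port=2345):
    """Serve forever: each datagram is a user name, answered with the full listing."""
    users = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        while True:
            data, addr = sock.recvfrom(1024)   # addr is what the client's NAT presented
            name = data.decode("utf-8", "replace").strip()
            listing = register(users, name, addr).encode("utf-8")
            for peer_addr in users.values():   # let everyone know about everyone
                sock.sendto(listing, peer_addr)
```

After Joe and Bob have both registered, the listing reads "Bob:40.0.4.4:4567" and "Joe:30.0.3.3:3456" on separate lines. A real server would authenticate users first and would not broadcast the whole list to everybody.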

Joe starts out connecting to the introducer/server. This server can also conveniently act as a gatekeeper, login server, and authenticator for anyone wanting to participate in the network. This connection ends up telling the NAT gateway that inbound traffic on 30.0.3.3:3456 should be forwarded back to 10.0.1.1:1234, and the NAT firewall may, if it's paranoid, keep 20.0.2.2 on a list of addresses that are allowed to send data in.
Bob starts out connecting to the server as well. His NAT gateway remembers that 40.0.4.4:4567 really means 10.0.4.4:1234, and that 20.0.2.2 is on the list of okay hosts.
Now, the introducer lets Joe know about Bob, and vice versa.
Joe tries to send packets to Bob on address 40.0.4.4:4567. Because they are sent from the same internal address and port, the firewall re-uses the mapping 30.0.3.3:3456, and writes that as the "from" address on a packet headed "to" 40.0.4.4:4567. A paranoid NAT will also add 40.0.4.4 as an allowed destination for this port.
Bob's firewall sees an incoming packet on 40.0.4.4:4567, looks that up in its masquerading table, and forwards the packet to 10.0.4.4:1234.
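On the client side, the last two steps above might look like the following sketch. It assumes UDP and assumes the introducer describes peers as text entries of the form "Bob:40.0.4.4:4567"; the helper names parse_entry and punch, the "hello" payload, and the retry counts are hypothetical. The one thing that matters is that the client sends to the peer from the same socket it used to talk to the introducer, so that the NAT re-uses the same outside port.

```python
import socket


def parse_entry(entry):
    """Split 'Bob:40.0.4.4:4567' into ('Bob', ('40.0.4.4', 4567))."""
    name, ip, port = entry.rsplit(":", 2)
    return name, (ip, int(port))


def punch(sock, peer_addr, attempts=5, wait_s=1.0):
    """Send greetings to the peer until a datagram from the peer arrives.

    sock must be the socket already used to contact the introducer.
    Returns the peer's first datagram, or None if nothing got through.
    """
    sock.settimeout(wait_s)
    for _ in range(attempts):
        sock.sendto(b"hello", peer_addr)
        try:
            data, addr = sock.recvfrom(1024)
        except socket.timeout:
            continue                 # nothing yet; early packets may be dropped
        if addr == peer_addr:
            return data              # the peer's packet made it through our NAT
    return None
```

Both peers run punch at roughly the same time, each aiming at the other's address as reported by the introducer.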

Hey! Joe just sent a packet straight to Bob!

Bob does the same thing going the other way. Suddenly, with a little help from the introducer server out on the network, these friends can talk to each other. The cool thing is that whatever traffic goes on between these peers does NOT go through the central server. Other than letting the clients find each other ("matchmaking") the server gets out of the way.

Small gnarl: if the firewalls keep lists of allowed IP addresses, then some of the early packets will be dropped, until each side of the connection has sent outgoing traffic to the other side, and thus put that side on its "white list".
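This white-list effect can be shown with a small simulation of two "paranoid" NAT boxes. It is a toy model under stated assumptions: the class FilteringNat and its methods are invented for this example, the white list is kept per remote IP address only, and each NAT is assumed to re-use one outside port per inside address.

```python
class FilteringNat:
    """Toy NAT that only admits packets from IPs the inside host has sent to."""

    def __init__(self, public_ip, first_port):
        self.public_ip = public_ip
        self.next_port = first_port
        self.port_of = {}    # internal addr -> public port
        self.owner_of = {}   # public port -> internal addr
        self.allowed = {}    # public port -> set of white-listed remote IPs

    def send(self, src, dst):
        """Outbound: allocate or re-use the mapping and white-list the destination."""
        if src not in self.port_of:
            self.port_of[src] = self.next_port
            self.owner_of[self.next_port] = src
            self.allowed[self.next_port] = set()
            self.next_port += 1
        port = self.port_of[src]
        self.allowed[port].add(dst[0])
        return (self.public_ip, port)

    def receive(self, src, dst):
        """Inbound: return the internal address, or None if the packet is dropped."""
        port = dst[1]
        if port not in self.owner_of or src[0] not in self.allowed[port]:
            return None
        return self.owner_of[port]


joe, bob, server = ("10.0.1.1", 1234), ("10.0.4.4", 1234), ("20.0.2.2", 2345)
joe_nat, bob_nat = FilteringNat("30.0.3.3", 3456), FilteringNat("40.0.4.4", 4567)

joe_public = joe_nat.send(joe, server)   # both peers contact the introducer first
bob_public = bob_nat.send(bob, server)

# Joe's first packet is dropped: Bob has not yet sent anything towards 30.0.3.3.
first = bob_nat.receive(joe_nat.send(joe, bob_public), bob_public)
# Bob's packet gets through: Joe's NAT white-listed 40.0.4.4 when Joe sent above.
second = joe_nat.receive(bob_nat.send(bob, joe_public), joe_public)
# Joe's next packet now gets through as well.
third = bob_nat.receive(joe_nat.send(joe, bob_public), bob_public)
```

In this run first is None (dropped), while second and third reach Joe and Bob respectively, so an application should be prepared to lose and re-send its opening packets.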

Summary

Peer-to-peer gaming, file sharing, voice conferencing and other bandwidth hungry network uses can be run without paying huge bandwidth bills for servers. All the servers need to do is authenticate each user and give each authenticated user a listing of other users to connect to. Without the introducer, no communication is possible, so the servers still have authority over "call setup" or similar functionality, but the bandwidth usage is really low.

No doubt, some firewall set-ups, and some rather broken NAT boxes, will not allow this punch-through to happen. There are work-arounds, such as searching for ports in the range near the server connection port, because that's where the NAT box is likely to allocate the UDP port for the "unused" peer connection. However, the basic technique is likely to work on most NAT set-ups in use today.
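The nearby-port work-around amounts to trying a list of candidate ports around the one the introducer observed. The sketch below only generates that list; the function name candidate_ports, the default spread, and the nearest-first ordering are assumptions, and whether this helps at all depends on how the particular NAT box allocates ports.

```python
def candidate_ports(observed_port, spread=8):
    """Yield the observed port first, then its neighbours at increasing distance."""
    yield observed_port
    for delta in range(1, spread + 1):
        for port in (observed_port + delta, observed_port - delta):
            if 1 <= port <= 65535:   # stay inside the valid port range
                yield port
```

For example, list(candidate_ports(4567, spread=2)) gives [4567, 4568, 4566, 4569, 4565]; the peer would send its opening packet to each of these in turn.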

I don't think I invented this technique, although I did independently re-discover it. I know that there are match-making services for games that purport to work through firewalls, and I know that some mad people are actually doing punch-through without introducers, basically by using heuristics and the "try many ports" approach.

Update: I have since found the following document on the web, which describes this technique and has a date in 1999. See, I knew this would be done already! http://www.alumni.caltech.edu/~dank/peer-nat.html

Another Update: I have received reports that modern NAT boxes use the pair of source-address/destination-address as keys for masquerading. Punch-through will still work, as long as the NAT box re-uses the same outward facing port for the same inside port/address pair (which is legal and reasonable to do), because this description explicitly states that the first message going each way may be discarded before the punch-through has been set up, when peers start talking to each other.

Third Update: To convince the doubters, I hacked up a simple application that will compile on UNIX (server/client) and Win32 (client only) which demonstrates the concept. It has been validated through FreeBSD and Linux kernel NAT firewalls. You need to change the hard-coded address of the introducer server if you want to re-build it. Find source here.

Fourth Update: I found http://midcom-p2p.sourceforge.net/ when surfing the web. It has a piece of software that can test gateway NAT compatibility, and also attempts NAT traversal using TCP (which is not for the faint-of-heart). There's also a (rather inactive) Yahoo mailing list dedicated to the issue at http://games.groups.yahoo.com/group/nat-peer-games/.

Fifth Update: References seem to be popping up all over; it's quite the topic of the day! http://www.intel.com/cd/ids/developer/asmo-na/eng/79524.htm is an article from Intel describing the same thing.

Sixth Update: The Midcom guys are now working on an Internet Draft to further formalize language around NAT traversal and to suggest best practices. They also talk about TCP traversal (which still has severe implementation issues). Try the document http://www.brynosaurus.com/pub/net/draft-ford-midcom-p2p-03.txt , unless they updated it again and there's something newer there.

Seventh Update: Some firewalls will reject a packet they're not expecting, and then keep rejecting packets from that sender even after they would otherwise have started expecting them. This leads to a race when starting to negotiate the connection. To fix that, the two sides should first send a packet with a short TTL (typically, 2) to "open" their firewalls, and when both sides have done that, start communicating with each other for real. This readiness can either be brokered using extra negotiation through the introducer, or it can be done using some reasonable time-outs.
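Sending such a short-TTL "opener" could be sketched as follows, using the standard IP_TTL socket option. The function name send_opener and the "open" payload are hypothetical, and the right TTL depends on how many hops lie between the client and its own NAT box, so 2 is only the typical value mentioned above.

```python
import socket


def send_opener(sock, peer_addr, ttl=2):
    """Send one datagram with a short TTL, then restore the socket's previous TTL.

    The packet should pass our own NAT (creating the mapping there) but expire
    before it reaches the peer's firewall, so it cannot be rejected there.
    """
    old_ttl = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    try:
        sock.sendto(b"open", peer_addr)
    finally:
        # Later, real traffic needs the normal TTL again.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, old_ttl)
```

As with the punch-through itself, sock should be the same socket that was used to reach the introducer, so that the opener exercises the mapping the peer will be sending to.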

Eighth Update: A good paper with some statistics on how common NAT traversal connectivity actually is can be found at http://www.brynosaurus.com/pub/net/p2pnat/. It also talks about a similar TCP punch-through technique, which is less supported than UDP, but still surprisingly well supported.

Ninth Update: A draft RFC which specifies how applications should behave to be NAT compliant is available at http://tools.ietf.org/wg/behave/draft-ford-behave-app-05.txt. Good reading!

I hope this quick introduction has been helpful. If you put this into your own application and gather experience about it, please let me know! My e-mail is: