VOICE Home Page: http://www.os2voice.org
March 2002
By Peter Moylan © March 2002
Originally I wrote this article in the context of a specific problem: is it possible
to make an FTP server work when it is behind a firewall? As a result, you'll find
many references to FTP here. The material should, however, also be useful to people
who have no particular interest in FTP, but who want a better understanding of how
networking works.
The Internet Protocol (IP) provides the basic transport of packets from one node
to another. Each IP packet has a header that contains, among other things,
a source and a destination address. These addresses are known as IP addresses.
Transmission Control Protocol (TCP) is a layer that sits on top of the IP layer.
It adds some higher-level functions, but it depends on IP for the actual transfer
of command and data packets. You can find out more about TCP and TCP/IP applications
by typing the command 'TCPhelp' at an OS/2 command prompt.
The IP addresses of the form
At any given time, a network node might have several connections in progress
with various machines around the network. This means that we need a way of labelling
the different connections, so that we can tell them apart. The method used is to
assign port numbers to (our end of) the connections. The port number has nothing
to do with hardware I/O ports. Instead, it is simply a numbering system to label
the connections. A port number is an unsigned 16-bit number. One end of a connection
is uniquely identified by the pair (IP address, port number). A connection is defined
by two of these pairs, one for each end.
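As a rough illustration of this labelling scheme, here is a small Python sketch. The names Endpoint, Connection and make_endpoint are my own inventions for the example, not part of any networking API, and the addresses are made up. It shows that two connections to the same server port are still distinct, because the port numbers at the client end differ.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    """One end of a connection: the pair (IP address, port number)."""
    ip: str
    port: int


class Connection(NamedTuple):
    """A connection is defined by two endpoints, one for each end."""
    client: Endpoint
    server: Endpoint


def make_endpoint(ip: str, port: int) -> Endpoint:
    # A port number is an unsigned 16-bit value, so it must fit in 0..65535.
    if not 0 <= port <= 0xFFFF:
        raise ValueError(f"port {port} does not fit in 16 bits")
    return Endpoint(ip, port)


# Hypothetical addresses, chosen only for the example.
server = make_endpoint("192.0.2.10", 21)
first = Connection(make_endpoint("192.0.2.77", 50001), server)
second = Connection(make_endpoint("192.0.2.77", 50002), server)

# Same server endpoint, but the two connections can still be told apart.
print(first.server == second.server, first == second)
```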
The software mechanism for setting this up is known as a "socket".
You can think of the socket as a data structure that keeps track of the two IP addresses
and two port numbers involved in the transfer, together with whatever other state
information is needed (data buffers, byte counters, etc.).
When a socket is first created, we know only the IP address and port number at
our own end. Once a connection is established, we get to find out the IP address
and port number at the other end. Naturally, one of the two ends has to be the one
to initiate the connection. The usual mechanism is that the server goes into a "listening"
state where it waits for client connections, and then the client end actually establishes
the connection.
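The sequence just described can be tried out with Python's standard socket module. This is only a sketch: it uses the loopback address, asks the operating system to pick a free port (by binding to port 0), and the function name loopback_demo is my own. It shows that each end learns the other end's address and port only once the connection exists.

```python
import socket


def loopback_demo():
    """Open one TCP connection over loopback and report both views of it."""
    # Server end: bind to our own address and go into the listening state.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS choose a port
    listener.listen(1)
    server_addr = listener.getsockname()  # (IP address, port) at the server end

    # Client end: this is the end that actually establishes the connection.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server_addr)

    # The server finds out the client's address and port only now.
    conn, peer_seen_by_server = listener.accept()

    result = {
        "server_listening_on": server_addr,
        "client_local": client.getsockname(),
        "client_remote": client.getpeername(),
        "server_remote": peer_seen_by_server,
    }
    conn.close()
    client.close()
    listener.close()
    return result


if __name__ == "__main__":
    for name, (ip, port) in loopback_demo().items():
        print(f"{name}: {ip}:{port}")
```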
Port numbers in the range 0 to 49151 are reserved for "server" ports.
More precisely, those in the range 0 to 1023 are defined by official standards;
those in the range 1024 to 49151 have a less official status, but they are still
considered to be "registered ports" which are allocated to known applications.
A list of these reserved port numbers can be found in the file MPTN\ETC\SERVICES.
Port numbers in the range 49152 to 65535 form a pool of "available"
ports which are used whenever a new port is needed. Typically one of these ports
is allocated for a short-term connection, and then deallocated once the operation
is done.
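The three ranges can be summarised in a few lines of Python. The function name classify_port and the labels it returns are mine, chosen to match the wording used above.

```python
def classify_port(port: int) -> str:
    """Return the range that a port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError(f"{port} is not a valid 16-bit port number")
    if port <= 1023:
        return "well-known"   # defined by official standards
    if port <= 49151:
        return "registered"   # allocated to known applications
    return "available"        # pool used for short-term connections


for p in (21, 8080, 52165):
    print(p, classify_port(p))
```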
In a client/server protocol, the clients need to have some way of finding the
servers. For this reason, servers always listen on "well known ports"
which are reserved for this purpose. The FTP protocol uses two connection channels,
one for commands and one for data. Consequently it has two well known ports: port
21 for the commands, and port 20 for the data. This, of course, is at the server
end. At the client end, the client can use whatever ports it wants, and normally
the client will choose its ports from the pool of available ports.
The FTP protocol allows for two kinds of data connection. In so-called "passive
FTP" the data transfer is initiated from the client end. In non-passive FTP,
also known as "port FTP", the transfer is initiated from the server end.
In both cases the command connection uses port 21 at the server end. The difference
between the two methods lies in the way the data ports are allocated.
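Passive FTP is requested with a PASV command from the client. The server responds with a line such as
227 Entering passive mode (127,0,0,1,203,197)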
The first four of those numbers specify the IP address of the server. The remaining
two specify a port number. In effect the PASV command is saying "please choose
a data port, and tell me what port it is". The server chooses the port, normally
from the big pool of available port numbers, and reports its number back to the
client. The server then listens at that port for the client to initiate a data connection.
Of course the client must also choose a data port at its own end; the server finds
out which port it is after the data connection is established.
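To make the encoding concrete, here is a sketch of how a client might decode such a reply. The last two numbers are the high and low bytes of the 16-bit port number, so the port is the first multiplied by 256, plus the second. The function name parse_pasv_reply is my own; real FTP clients will differ in detail.

```python
import re
from typing import Tuple


def parse_pasv_reply(line: str) -> Tuple[str, int]:
    """Extract (IP address, port) from a '227 Entering passive mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", line)
    if match is None:
        raise ValueError("no address found in reply: " + line)
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("each number must fit in one byte")
    ip = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]   # high byte, then low byte
    return ip, port


print(parse_pasv_reply("227 Entering passive mode (127,0,0,1,203,197)"))
# prints ('127.0.0.1', 52165)
```

Port 52165 lies in the 49152 to 65535 range, which is what we would expect for a port taken from the pool of available ports.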
In passive FTP, the command channel and the data channel behave in the same way.
The server listens at a known port. The client knows what the port number is, so
it can initiate a connection to that port.
Non-passive FTP is specified with a PORT command from the client. An example of this command
is
PORT 127,0,0,1,203,201
This specifies the IP address of the client, and the port number that the client
wishes to use for the data transfer. That is, the client chooses a port number,
it listens on that port, and the server makes a connection to that port. In both
this and the PASV example, the IP address was 127.0.0.1. That's because I used the
loopback connection on my computer to generate the examples. In practice, the IP
address will be whatever address belongs to your own machine.
Meanwhile, the server must also choose a port number at its own end. By standard
convention, this is almost always port 20.
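Going in the other direction, a client that has chosen a data port needs to turn its address and port into the six comma-separated numbers of a PORT command. A minimal sketch, with a function name (format_port_command) that is my own:

```python
def format_port_command(ip: str, port: int) -> str:
    """Build a PORT command from a dotted IP address and a port number."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-decimal IPv4 address: " + ip)
    if not 0 <= port <= 65535:
        raise ValueError(f"{port} is not a valid 16-bit port number")
    high, low = divmod(port, 256)          # split into high and low bytes
    return "PORT " + ",".join(octets + [str(high), str(low)])


print(format_port_command("127.0.0.1", 52169))
# prints PORT 127,0,0,1,203,201
```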
Note that the client must always give either a PASV command or a PORT command
before it starts an upload or download. The command it gives determines whether
the transfer is passive or non-passive.
The function of a firewall is to allow some packets to pass through, while refusing
to let others pass through. This is done by a set of rules created by the system
administrator. The administrator has to decide which classes of traffic are legal.
I don't have much experience with firewalls, so I can't give an expert description
of what happens here. I believe, however, that the rules are usually based on port
numbers. Traffic to/from certain ports is allowed, while other ports are blocked.
The rules would normally have to be asymmetric, in the sense that the rules for
outgoing packets would be quite different from the rules for incoming packets.
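To show what I mean by asymmetric port-based rules, here is a toy model in Python. It is not the rule syntax of any real firewall product; the Rule fields, the is_allowed function and the example rule set are all assumptions made up for the illustration, and a real firewall would look at much more than the destination port.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    direction: str   # "out" = leaving the LAN, "in" = arriving from outside
    dest_port: int   # destination port of the connection attempt
    allow: bool


def is_allowed(rules: List[Rule], direction: str, dest_port: int) -> bool:
    """Return the verdict of the first matching rule; block if none matches."""
    for rule in rules:
        if rule.direction == direction and rule.dest_port == dest_port:
            return rule.allow
    return False


# Hypothetical rule set: LAN machines may open FTP command connections to
# outside servers, but outsiders may not connect to port 21 inside the LAN.
rules = [
    Rule("out", 21, True),
    Rule("in", 21, False),
]

print(is_allowed(rules, "out", 21))    # outgoing FTP command connection
print(is_allowed(rules, "in", 21))     # incoming attempt on the same port
```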
Consider the case where an FTP client is behind a firewall, and is talking to
a server that is not behind a firewall. A typical choice of firewall rules would
make non-passive FTP illegal, because non-passive FTP requires the server - the
machine outside the firewall - to initiate the data connection. A firewall is often
set up in such a way that machines outside the firewall are not allowed to initiate
a connection. In fact, this is a very large part of the motivation for adding passive
FTP to the FTP standard. If the client is behind a firewall, then normally it should
use only passive FTP.
Conversely, if the server is behind the firewall and the client is not, then
passive FTP is likely to be blocked and port FTP is the only sensible option.
If the client is behind a firewall, and the server is behind a different firewall,
then we are in trouble. The 'firewall' concept was not designed with this situation
in mind. Servers should not be behind a firewall, except of course for servers that
are supposed to be private to the LAN. If you have a firewall protecting your LAN,
then you should normally put your public server applications on the same machine
that is running the firewall software. If you do this, then technically the server
is outside the firewall.
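The cases discussed so far can be collected into a small decision function. This is only a summary of the reasoning above, under the assumption that each firewall blocks connections initiated from outside; the function name and the strings it returns are mine.

```python
def suggested_ftp_mode(client_behind_firewall: bool,
                       server_behind_firewall: bool) -> str:
    """Suggest a data-connection mode, assuming a firewall blocks
    connections that are initiated from outside it."""
    if client_behind_firewall and server_behind_firewall:
        # Neither end can accept a data connection from the other.
        return "trouble"
    if client_behind_firewall:
        return "passive"   # client initiates the data connection
    if server_behind_firewall:
        return "port"      # server initiates the data connection
    return "either"


for client, server in ((False, False), (True, False), (False, True), (True, True)):
    print(client, server, suggested_ftp_mode(client, server))
```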
If, for any reason, you really have to have your server behind a firewall, then
you had better be an expert in designing the firewall rules. You should read the
preceding sections very carefully, to see which ports should be enabled in the firewall
rules. Actually, that part is easy. The difficult part is to do this in such a way
that you allow the FTP server to function, but without compromising the security
of your LAN. If you make the rules too permissive, you might as well not have a
firewall.
An IP data packet has a header that specifies, among other things, a source and
destination IP address. That says who is sending the packet, and who is supposed
to receive it. With Network Address Translation (NAT) in place, the firewall alters the source IP
address, to make it appear as if the packet came from a different address. That
is for outgoing traffic. For incoming traffic, the firewall intercepts the packets
destined for the "fake" IP address, and sends them to the real intended
recipient.
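A toy model may help to show the bookkeeping involved. The class below is a sketch, not a description of any real NAT implementation. In particular, I have assumed that the firewall also substitutes a port number of its own, so that it can tell which internal machine a reply belongs to; the description above only mentions the address. The class name, method names and addresses are all made up for the example.

```python
class ToyNat:
    """Very simplified address translation table for one firewall."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self._next_port = 49152   # assumption: take ports from the available pool
        self._out = {}            # (private ip, private port) -> public port
        self._in = {}             # public port -> (private ip, private port)

    def outgoing(self, src_ip, src_port):
        """Return the (address, port) that the outside world will see."""
        key = (src_ip, src_port)
        if key not in self._out:
            self._out[key] = self._next_port
            self._in[self._next_port] = key
            self._next_port += 1
        return self.public_ip, self._out[key]

    def incoming(self, dest_port):
        """Return the real recipient for a packet sent to the firewall,
        or None if no translation exists for that port."""
        return self._in.get(dest_port)


nat = ToyNat("203.0.113.1")                 # hypothetical public address
print(nat.outgoing("192.168.1.5", 50000))   # what the server sees
print(nat.incoming(49152))                  # where the reply is delivered
```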
Consider the case where you have an FTP client behind the firewall, and an FTP
server outside the firewall. When the client connects to the server, the server
does not see the client's true IP address. Instead, it sees the address of the firewall.
The server doesn't even know that a firewall is present. It simply interacts with
that address, just as it would with any client. As far as the server is concerned,
the client is the firewall machine. However, the firewall passes on the server's
responses to the true client.
Similarly, if a server is behind a firewall then every client outside the firewall
thinks that the server is at the same address as the firewall machine. The firewall
modifies the addresses, and passes the traffic on to the true address of the server.
All of this would work well except for two little details. As we have seen in
earlier examples, the PASV and PORT commands of the FTP protocol send IP addresses
as data. These are the true IP addresses, not the addresses as altered by the NAT
software. This can result in data being sent to the wrong address.
The logical solution to this problem would be for the NAT software to intercept
the PASV and PORT commands, and alter the numbers in those lines. Some firewall
software is smart enough to do this. Unfortunately, many firewalls are not able
to make this adjustment.
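The adjustment itself is not complicated. Here is a sketch of what the firewall would have to do to a PORT command passing through it. The function name is mine, the addresses are made up, and for simplicity I assume that the port number can be left unchanged and that the command arrives without its line terminator. A real implementation would have more to worry about.

```python
import re

PORT_COMMAND = re.compile(r"PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def rewrite_port_command(line: str, public_ip: str) -> str:
    """Replace the address in a PORT command with the firewall's address.
    Any other command is passed through unchanged."""
    match = PORT_COMMAND.fullmatch(line)
    if match is None:
        return line
    port_high, port_low = match.group(5), match.group(6)
    return "PORT " + ",".join(public_ip.split(".") + [port_high, port_low])


print(rewrite_port_command("PORT 192,168,1,5,203,201", "203.0.113.1"))
# prints PORT 203,0,113,1,203,201
```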
In the past, people didn't normally think in terms of putting an FTP server behind
a firewall. Now that we have cable modems, ADSL, and various other ways to get high-speed
data links, that option is becoming more common. To deal with this complication,
it might be necessary for the FTP server to fake its response to the PASV command,
by giving the address of the firewall rather than its own IP address.
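In code, the faked response amounts to building the 227 reply from a configured address instead of the server's own. A sketch, assuming the server has been told the firewall's address in advance (the function name and the address are hypothetical):

```python
def pasv_reply(advertised_ip: str, port: int) -> str:
    """Build a PASV response that advertises the given address and port."""
    if not 0 <= port <= 65535:
        raise ValueError(f"{port} is not a valid 16-bit port number")
    high, low = divmod(port, 256)
    fields = advertised_ip.split(".") + [str(high), str(low)]
    return "227 Entering passive mode (" + ",".join(fields) + ")"


# The server advertises the firewall's address rather than its own.
print(pasv_reply("203.0.113.1", 52165))
# prints 227 Entering passive mode (203,0,113,1,203,197)
```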
Likewise, an FTP client that is behind a firewall should in principle adjust its PORT command parameters to allow for the firewall. (In the case of PORT, the problem must be solved at the client end rather than at the server end.) In practice, I have never heard of an FTP client that does this. This means that an FTP client that is behind a firewall must always use passive FTP.