Introduction
============
This document describes some of the Traffic Control features of the
Linux 2.2 kernels. It was not written by a kernel expert, so some of the
information and terminology presented here may be wrong, and the features
may be considerably more complicated than this text suggests. I hope the
information presented here will be useful to many users anyway.


What you need
=============
Traffic Control is new in the Linux 2.2 kernels, so you will need a 2.2 kernel
or later to use it. You will also need to select the options in the section
"QoS and/or fair queuing" when compiling the kernel. Furthermore, you will
need the user space program called "tc" (Traffic Control). The program is
part of the iproute2+tc package available from:

     ftp://ftp.inr.ac.ru/ip-routing


The big picture
===============
Each NIC (Network Interface Controller = network card) supported on your Linux
box is handled by a Network Driver which controls the hardware. Basically,
two things can be done with such network drivers:

1) The Linux Networking Code can request the network driver to send a 
   packet on the physical network.
2) The network driver can give packets it has received on the physical 
   network to the Linux Networking Code.   

The Traffic Control features deal with the first of these only. That is, they
deal with the transmission of packets from your machine. Traditionally,
packets sent from your machine have traveled the following way:

 +------------+    +------------+
 | Linux      |    | Network    |
 | Networking | -> | Driver     | ->  <physical wires>
 | Code       |    |            |
 +------------+    +------------+

Note that the packets from the Networking Code could have been generated in
several different ways. They could, for example, have been created at the
request of some application (Netscape, for example) running on your box. If
your machine is acting as a firewall, router or an Ethernet bridge, they could
also have been read on one network interface and then put out on another
interface by the Networking Code.

With Traffic Control an extra box is inserted in this picture:

 +------------+    +------------+    +------------+
 | Linux      |    | Traffic    |    | Network    |
 | Networking | -> | Control    | -> | Driver     |  -> <physical wires>
 | Code       |    |            |    |            |
 +------------+    +------------+    +------------+

An important thing to understand is that basically only the following can be
done with the Traffic Control box:

1) The Linux Networking Code can give a packet to the box.
2) The Network Driver can request a packet from the box.

What characterizes the Traffic Control box in a given setup is which packets
it decides to give to the Network Driver, in which order and at what speed.


Queuing disciplines
===================
When the Linux kernel starts up there is no Traffic Control box and the picture
looks like the first drawing above.

But suppose we want to insert a so-called FIFO queue of 10 packets in the box.
The properties of such a queue are:

1) Packets are given to the Network Driver in the same order as they arrived
   from the Networking Code.
2) Up to 10 packets can be stored in the queue.

You can think of a FIFO queue as a queue of humans in which there is only room
for 10 people. We are just queuing packets instead of humans. Note that it is
not very useful to set up such a queue since the Network Driver itself has
a queue of maybe 100 packets.

To set up such a FIFO queue for the Traffic Control box on eth0 use the command:

  tc qdisc add dev eth0 root pfifo limit 10

The FIFO queue is an example of a so-called qdisc, which is an abbreviation for 
queuing discipline. Queuing disciplines are the most fundamental concept in 
the Traffic Control functions of Linux. A qdisc has the same properties as the 
Traffic Control box, that is:

1) You can give a packet to it (this is called enqueuing a packet)
2) You can request a packet from it (this is called dequeuing a packet)

So in fact, the Traffic Control box can be viewed as a socket into which you
can plug an arbitrary queuing discipline. The root keyword in the command
above tells tc that the pfifo queue should be put into this socket.

Now try to type:

  tc qdisc ls dev eth0
  
You will see a line describing what you have just created:

  qdisc pfifo 8001: dev eth0 limit 10p  

This illustrates another aspect of qdiscs: all qdiscs are given a handle, in
this case the handle "8001:" (or "8001:0").

You can specify a handle when you create a qdisc. If you don't, the system
will pick one, which is what has happened here. The handle of a qdisc is a
hexadecimal number of up to four digits followed by a colon.

Finally, you can delete the qdisc again with the command:

  tc qdisc del dev eth0 root
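
After deleting it, the same queue could be recreated with an explicitly chosen
handle instead of an automatically assigned one. This is only a sketch; the
handle value "1:" is an arbitrary choice made for this illustration:

  tc qdisc add dev eth0 root handle 1: pfifo limit 10
  tc qdisc ls dev eth0

The listing should then show "1:" where "8001:" appeared before.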
  
  
The bfifo, sfq and tbf queuing disciplines
==========================================
Many other qdiscs exist in the Linux Traffic Control code besides the
pfifo queuing discipline.

The bfifo queue is like pfifo, but instead of containing a limited number 
of packets it will at most contain a limited number of bytes.
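
For illustration, a bfifo queue holding at most 15000 bytes might be set up
like this, assuming no other root qdisc is installed on eth0. The byte limit
is an arbitrary value picked for this example:

  tc qdisc add dev eth0 root bfifo limit 15000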
     
The sfq (Stochastic Fairness Queue) is more advanced. As far as I have
understood, it divides the packets into so-called flows. That is, if you open
two different TCP connections, packets from these two connections will
probably be considered as belonging to two different flows. It then tries to
distribute bandwidth equally between the different flows.
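
A minimal sketch of installing sfq as the root qdisc on eth0, again assuming
no other root qdisc is present. The perturb parameter, which asks sfq to
change its flow hashing every 10 seconds, is optional and its value is only
an example:

  tc qdisc add dev eth0 root sfq perturb 10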

The tbf (Token Bucket Filter) queue can be used to limit the bandwidth. It is
not possible to take packets from a tbf queue at a speed greater than the one
you specify. For example, if you don't want packets to be sent at a speed
greater than 128Kbit/s, you can use a tbf queue.
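
As a hedged example, the 128Kbit/s limit could be expressed roughly as
follows. The burst and latency values are assumptions chosen for illustration,
not tuned recommendations, and the exact parameter set may differ between tc
versions:

  tc qdisc add dev eth0 root tbf rate 128kbit burst 1600 latency 50ms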

If you want to attach one of these qdiscs to your network interface, you can
start with the command:

  tc qdisc add <name> help
  
This will print a short description of the extra parameters the queue <name>
takes. The queue can then be installed as in the pfifo example above. Note
that you need to delete any existing root qdisc before you can create a new
one in the way described above.
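
Put together, switching eth0 from an existing root qdisc to, say, sfq could
look like this sketch. The choice of sfq is only an example:

  tc qdisc del dev eth0 root
  tc qdisc add dev eth0 root sfq
  tc qdisc ls dev eth0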


Classes
=======
The basic building blocks of the Traffic Control system are qdiscs, as
described above. The queuing disciplines mentioned in the previous section are
comparatively simple to deal with. That is, they are set up and maybe given
some parameters. Afterwards packets can be enqueued to them and dequeued from
them as described.

But many queuing disciplines are of a different nature. These qdiscs do not
store packets themselves. Instead, they contain other qdiscs, which they give
packets to and take packets from. Such qdiscs are known as qdiscs with classes.

For example, one could imagine a priority-based queuing discipline with the 
following properties:

1) Each packet enqueued to the queuing discipline is assigned a priority. For
   example, the priority could be deduced from the source or destination IP
   address of the packet. Let us say that the priority is a number
   between 1 and 5.
2) When a packet is dequeued, the queuing discipline always selects the packet
   it contains with the lowest priority number.
   
A way to implement such a queuing discipline is to make the priority-based 
queuing discipline contain 5 other queuing disciplines numbered from 1 to 5. 
The priority-based queuing discipline will then do the following:

1) When a packet is enqueued, it calculates the priority number, i.e. a number
   between 1 and 5. It then enqueues the packet to the queuing discipline
   indicated by this number.
2) When a packet is dequeued, it always dequeues from the non-empty queuing
   discipline with the lowest number.
   
What is interesting about this is that the 5 contained queuing disciplines
could be arbitrary queuing disciplines, for example sfq queues or any other
queue.

In Linux this concept is handled by classes. That is, a queuing discipline
might contain classes. In this example, the priority queuing discipline has 5
classes. Each class can be viewed as a socket into which you can plug any
other queuing discipline. When a qdisc with classes is created, it will
typically assign simple FIFO queues to the classes it contains. But these can
be replaced with other qdiscs by the tc program.

There is a qdisc called prio, which I believe does exactly what is described
here. I have not used it myself, however.

If a qdisc has the handle "8001:", its classes are given handles of the form
"8001:wxyz" where wxyz is a non-zero hexadecimal number. So in this example
the prio qdisc will maybe have the handle "8001:" and its classes the
handles "8001:1" to "8001:5". To insert an sfq queue in the socket of the
class "8001:2" you can use the command:

  tc qdisc add dev eth0 parent 8001:2 sfq
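
The whole setup might be sketched as follows. The handles "1:" and "20:" are
chosen by hand for this illustration. The sketch creates prio with its default
number of classes, which is three in current tc versions rather than the five
used in the example above:

  tc qdisc add dev eth0 root handle 1: prio
  tc class ls dev eth0
  tc qdisc add dev eth0 parent 1:2 handle 20: sfq
  tc qdisc ls dev eth0

The second command lists the classes that prio created, and the last one
should show both the prio qdisc and the sfq queue plugged into class "1:2".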


Filters
=======
The last main concept that needs to be explained is filters. In the example
with the priority qdisc above, it was said that the qdisc could select a
priority depending on, for example, the IP addresses of the packets. The job
of the filters is exactly to map packets to classes. That is, if a qdisc
contains classes, you will typically also be able to assign a filter to that
qdisc. Each time a packet is enqueued to the qdisc, the qdisc will ask the
filter to which class this packet should go. The qdisc will then enqueue the
packet to the qdisc plugged into that class.
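
As an untested sketch, a filter that sends all IP packets destined for one
host to the first class of a prio qdisc might look like this. It assumes a
prio qdisc with handle "1:" already sits at the root of eth0. The address
192.168.1.10 and the filter priority 1 are made up for the example, and u32
is only one of several filter types:

  tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
     match ip dst 192.168.1.10/32 flowid 1:1
  tc filter ls dev eth0 parent 1: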

So what typically makes qdiscs with classes differ from each other is which
class they decide to dequeue a packet from when they are asked to dequeue.

The author of this document has not experimented with filters. But as far as
he can figure out, some of the qdiscs (cbq and prio) seem to be able both to
handle filters as described here AND to use some kind of built-in filter.


Other material
==============
For more information, look at:

  http://qos.ittc.ukans.edu

and at

  http://www.ds9a.nl/2.4Routing/


--------------------------------------------------------------------
This document was written by Christian Worm Mortensen, cworm@it-c.dk
