IP Performance Tuning
Bandwidth Delay Product
Large windows (RFC 1323)
Selective Acknowledgments (RFC 2018)
Path MTU Discovery
Reliability, performance and security engineering are all integral parts of systems engineering, and each has an impact on the security posture of an organisation. An unreliable or poorly performing infrastructure will more often than not leave a bad impression on customers, competitors, and even the employees who are expected to build the organisation's future.
Bandwidth Delay Product [back]
BDP = bottleneck link bandwidth * RTT
The BDP is the amount of data that can be "in flight" on a certain IP channel at any given moment; to keep the link fully utilised, a sender must be able to buffer at least this much unacknowledged data. In order to calculate it, we need the "theoretical" maximum speed of the slowest link in the path, as well as the Round Trip Time (RTT) between both peers.
(Theoretical bandwidth in bits/s) * (1 byte/8 bits) * (RTT in ms/1000)
This translates into the simpler formula:
Bandwidth (in bytes/s) * RTT (in seconds)
The result is known as the BDP. If the BDP is very high, the default networking settings of most operating systems will not be sufficient to achieve high performance and throughput.
Large windows (RFC 1323) [back]
Described in RFC 1323, "Large Windows" (window scaling) must be enabled in order for Linux to accept a buffer size larger than 64 kbytes. It is enabled by default on every modern kernel.
TCP, by default, only supports a maximum window of 64 kbytes, as the window field in the TCP header is 16 bits wide. This is a usable window size for links with a very short RTT, but it is not usable on high speed links with a higher RTT. Here, the maximum buffer size should be increased to closely match the BDP. E.g., if you have a 10 Mbit/s link with a 500 ms average RTT, your buffer size works out to approximately 625 kbytes, almost ten times the 64 kbyte default:
(10 000 000/8) * 0.5 = 625 000
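The arithmetic above is easy to script. A minimal sketch using shell integer arithmetic, taking the bandwidth in bits per second and the RTT in milliseconds, and printing the BDP in bytes (the function name and the example figures are mine, for illustration):

```shell
# Compute the BDP in bytes from bandwidth (bits/s) and RTT (ms).
bdp() {
    bandwidth=$1   # bits per second
    rtt_ms=$2      # round trip time in milliseconds
    echo $(( bandwidth / 8 * rtt_ms / 1000 ))
}

bdp 100000000 80   # 100 Mbit/s at 80 ms RTT, prints 1000000
```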
The actual buffer values are stored in the kernel's read and write buffer settings. These can be accessed on Linux in the following way:
echo 1280000 > /proc/sys/net/core/rmem_default
echo 1280000 > /proc/sys/net/core/rmem_max
echo 1280000 > /proc/sys/net/core/wmem_default
echo 1280000 > /proc/sys/net/core/wmem_max
The value to use here is twice the advised buffer size, thus (625 * 1024) * 2 = 1280000.
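The settings above apply to all sockets. Recent Linux kernels additionally autotune TCP buffers per connection within the limits in tcp_rmem and tcp_wmem, each a triple of min, default and max in bytes. A sketch raising those ceilings to the same value; the min and default figures shown are the common kernel defaults, and the exact numbers are an example rather than a recommendation:

```shell
# Raise the per-connection TCP autotuning limits (min default max, in bytes).
echo "4096 87380 1280000" > /proc/sys/net/ipv4/tcp_rmem
echo "4096 16384 1280000" > /proc/sys/net/ipv4/tcp_wmem
```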
Selective Acknowledgments (RFC 2018) [back]
Selective Acknowledgments, aka SACK, allow the receiving party to acknowledge non-consecutive data. When a segment is lost, the receiver can still accept a number of subsequent segments while requesting only the missing one for retransmission. This provides a significant increase in speed on links with a high BDP.
echo 1 > /proc/sys/net/ipv4/tcp_sack
Path MTU Discovery [back]
Described in RFC 1191, Path MTU Discovery enables a host to automatically select the highest possible MTU (Maximum Transmission Unit) for a specific network path. If it is disabled, most systems will fall back to an MTU of 576 bytes, which is very low. As every unit of data sent carries a fixed amount of header overhead, a larger MTU can improve performance noticeably. However, discovering the path MTU can take some time at the start of a connection. The sending host transmits a packet with the Don't Fragment bit set and an MTU of, typically, 1500 bytes. If 1500 is an acceptable MTU for the entire path to the destination, the connection will establish successfully. However, if somewhere along the path there is a link which only accepts a maximum MTU of e.g. 800 bytes, the router in front of that link will send back an ICMP type 3 code 4 message (fragmentation needed but don't fragment bit set), and the host will lower its MTU and try again. Continuing in this way, the highest usable MTU will iteratively be selected.
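The discovery process can also be exercised by hand. A sketch using the Linux iputils tools, which can set the Don't Fragment bit; example.org stands in for a real destination. A 1472-byte payload plus 28 bytes of IP and ICMP headers makes a 1500-byte packet:

```shell
# Send 1500-byte packets (1472 payload + 28 header) with DF set.
# If a link on the path has a smaller MTU, ping reports the
# "frag needed" error instead of a reply.
ping -c 3 -M do -s 1472 example.org

# tracepath probes hop by hop and prints the discovered path MTU.
tracepath example.org
```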
Path MTU Discovery is enabled by default on modern kernels, and can be disabled as follows:
echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc
Unfortunately, Path MTU Discovery is not handled well everywhere on the internet. Quite a number of firewall administrators decide to block ICMP type 3 code 4 messages, which are necessary for PMTU discovery to work. In such a case, a server with PMTU discovery enabled will keep sending 1500-byte packets with the Don't Fragment bit set, and will never see the ICMP reply from the router in front of the link with the lower MTU. This results in the connection never establishing correctly. If you want more information on which ICMP messages to filter, and which to pass, visit my ICMP Filtering page.
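When the offending filters cannot be fixed, a common workaround on a Linux router is to clamp the TCP Maximum Segment Size to the path MTU, so that full-sized segments which would trigger the problem are never generated. A sketch using iptables; it rewrites the MSS option on TCP SYN packets passing through the FORWARD chain:

```shell
# Clamp the advertised MSS on forwarded TCP SYNs to the outgoing path MTU.
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
    -j TCPMSS --clamp-mss-to-pmtu
```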
Another problem, unrelated to filtering, occurs when a provider uses RFC 1918 addresses for point-to-point links. This looks like a valid solution, as "free" IP addresses can be used for simple leased line links between sites. However, if traffic traverses such a link and one of the routers sends an ICMP error message, such as those used for PMTU discovery, it may be sent with an RFC 1918 address as its source. As these networks are not routable on the internet, such packets are commonly dropped by anti-spoofing filters and will never arrive.