[net-perf] bad performance w/o jumbo frames over large BDPs
Michael Sinatra
michael at rancid.berkeley.edu
Tue Oct 17 14:44:56 PDT 2006
We're making progress at Berkeley building a performance measurement and
characterization infrastructure. The measurement nodes connect to
backbone routers, and all of the measurement nodes (and the paths
between them) are jumbo-frame-capable. Recently, I set up an ad-hoc
active measurement system on a user network that is not
jumbo-frame-capable. While the performance is very good between this
net and our other campus measurement nodes (including the node at our
border), performance is very poor between this host and nodes that are
farther away. This performance problem only exists when sending from
the host in question to another node; receive performance is very good.
Here's an example:
drl10 ~ # /home/piPEs/bwctl/bin/bwctl -s nms1-chin.abilene.ucaid.edu -w 8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
bwctl: 34 seconds until test results available
RECEIVER START
3370091618.641665: /usr/bin/iperf -B 169.229.144.125 -P 1 -s -f b -m -p 5001 -w 8388608 -t 10 -i 2
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 169.229.144.125
TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
------------------------------------------------------------
[ 14] local 169.229.144.125 port 5001 connected with 198.32.8.162 port 59750
[ 14] 0.0- 2.0 sec 116840032 Bytes 467360128 bits/sec
[ 14] 2.0- 4.0 sec 211912536 Bytes 847650144 bits/sec
[ 14] 4.0- 6.0 sec 213602576 Bytes 854410304 bits/sec
[ 14] 6.0- 8.0 sec 210712912 Bytes 842851648 bits/sec
[ 14] 8.0-10.0 sec 212375576 Bytes 849502304 bits/sec
[ 14] 0.0-10.0 sec 968015872 Bytes 772489277 bits/sec
[ 14] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
RECEIVER END
drl10 ~ # /home/piPEs/bwctl/bin/bwctl -c nms1-chin.abilene.ucaid.edu -w 8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
bwctl: 38 seconds until test results available
RECEIVER START
3370091715.986859: /ami/bin/iperf -B 198.32.8.162 -P 1 -s -f b -m -p 5001 -w 8388608 -t 10 -i 2
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 198.32.8.162
TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
------------------------------------------------------------
[ 15] local 198.32.8.162 port 5001 connected with 169.229.144.125 port 5001
[ 15] 0.0- 2.0 sec 4023992 Bytes 16095968 bits/sec
[ 15] 2.0- 4.0 sec 12415152 Bytes 49660608 bits/sec
[ 15] 4.0- 6.0 sec 15610888 Bytes 62443552 bits/sec
[ 15] 6.0- 8.0 sec 15926552 Bytes 63706208 bits/sec
[ 15] 8.0-10.0 sec 16485480 Bytes 65941920 bits/sec
RECEIVER END
As you can see, we peak at 850+ Mb/s from the Abilene Chicago node to
this host, but we only get about 65 Mb/s from our host to Abilene
Chicago. Performance on this gigabit-connected host is actually worse
than what is achieved on a host that is only connected to a 100 Mb/s
interface.
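The bits/sec column iperf prints is just each interval's byte count
times 8, divided by the 2-second reporting interval (-i 2). A small
sketch that reproduces the figures from the byte counts above and
compares the two directions; the helper name interval_bps is
hypothetical, not part of iperf or bwctl.

```python
# Reproduce iperf's per-interval bits/sec figures from its byte counts.
# interval_bps is a hypothetical helper; the byte counts are copied from
# the two bwctl runs above (2-second intervals, -i 2).

def interval_bps(nbytes: int, seconds: float) -> float:
    """Convert a byte count over an interval to bits per second."""
    return nbytes * 8 / seconds

# 8.0-10.0 sec interval, Abilene Chicago -> Berkeley host (receive test)
rx = interval_bps(212375576, 2.0)
# 8.0-10.0 sec interval, Berkeley host -> Abilene Chicago (send test)
tx = interval_bps(16485480, 2.0)

print(f"receive: {rx / 1e6:.1f} Mb/s")
print(f"send:    {tx / 1e6:.1f} Mb/s")
print(f"ratio:   {rx / tx:.1f}x")
```

For the final interval this gives about 849.5 Mb/s versus 65.9 Mb/s,
a gap of roughly a factor of 13.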
I checked on more than one of our stationary measurement nodes,
switching between jumbo and non-jumbo frames, and I can replicate the
performance issue with non-jumbo frames on each of these nodes. Again,
it appears to manifest itself only when the bandwidth-delay product
(BDP) gets above a certain threshold. Performance is still fine in both
directions with jumbo frames. This is the case with FreeBSD 6-STABLE
and 7-CURRENT (both csup'ed a few days ago) as well as Gentoo Linux
(vanilla kernel 2.6.18-web100, with very few other modifications). The
platform is amd64; I haven't been able to compare with i386 yet.
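One way to make the bandwidth-delay product threshold concrete: the
window needed to fill a path is rate x RTT, and the number of segments
that window holds depends on the MSS. Under Reno-style congestion
avoidance the window grows by about one segment per RTT, so after a
single loss halves it, regaining the full window takes roughly W/2
round trips, where W is the window in segments. The sketch below
assumes a 50 ms Berkeley-Chicago RTT (an illustrative figure, not a
measurement from these tests) and a jumbo MSS of 8948 bytes (a
9000-byte MTU minus the same 52 bytes of IP/TCP/timestamp overhead
that turns a 1500-byte MTU into the 1448-byte MSS iperf reports). The
function names are hypothetical.

```python
# Sketch: BDP, segments in flight, and Reno-style recovery time after one
# loss. ASSUMPTIONS: 1 Gb/s bottleneck and 50 ms RTT (illustrative);
# MSS 1448 (as iperf reported) vs 8948 for a 9000-byte MTU.

def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the path full."""
    return rate_bps * rtt_s / 8

def reno_recovery_s(rate_bps: float, rtt_s: float, mss: int) -> float:
    """Approximate time to regrow cwnd from W/2 to W at +1 MSS per RTT."""
    w_segments = bdp_bytes(rate_bps, rtt_s) / mss
    return (w_segments / 2) * rtt_s

RATE, RTT = 1e9, 0.050   # hypothetical path parameters
for mss in (1448, 8948):
    segs = bdp_bytes(RATE, RTT) / mss
    print(f"MSS {mss}: {segs:.0f} segments in flight, "
          f"~{reno_recovery_s(RATE, RTT, mss):.0f} s to recover from one loss")
```

Under these assumptions a standard-MTU flow needs on the order of 108
seconds to recover from one loss versus about 17 seconds with jumbo
frames, and the gap widens linearly as RTT grows.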
I feel like I must be doing something wrong, especially since the
Abilene nodes are clearly able to send traffic at near line rate to our
non-jumbo-enabled host using an MSS of only 1460 bytes (1448 bytes of
payload per segment once TCP timestamps are counted, as iperf reports
above).
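A common way to reason about why MSS matters so much on long paths is
the Mathis et al. approximation for steady-state TCP Reno throughput,
BW ~ (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2) and p the packet
loss rate. At a fixed RTT and loss rate, throughput scales linearly
with MSS. Inverting it gives the loss rate a flow can tolerate while
still sustaining a target rate. The sketch below uses an assumed 50 ms
RTT and an assumed loss rate of 1e-5; neither comes from these
measurements, and the helper names are hypothetical.

```python
import math

# Mathis et al. approximation of steady-state TCP Reno throughput:
#   BW ~= (MSS / RTT) * (C / sqrt(p)),  C = sqrt(3/2)
# ASSUMPTIONS: RTT and loss rate below are illustrative, not measured.

C = math.sqrt(3 / 2)

def mathis_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate Reno throughput in bits/sec for a given loss rate."""
    return (mss_bytes * 8 / rtt_s) * (C / math.sqrt(loss))

def loss_for_rate(mss_bytes: int, rtt_s: float, rate_bps: float) -> float:
    """Loss rate at which the model predicts exactly rate_bps."""
    return (mss_bytes * 8 * C / (rtt_s * rate_bps)) ** 2

RTT, LOSS = 0.050, 1e-5   # hypothetical
for mss in (1448, 8948):
    print(f"MSS {mss}: ~{mathis_bps(mss, RTT, LOSS) / 1e6:.0f} Mb/s at p={LOSS}; "
          f"1 Gb/s needs p <= {loss_for_rate(mss, RTT, 1e9):.1e}")
```

With these inputs, 1448-byte segments need a loss rate below roughly
1e-7 to hold 1 Gb/s, while 8948-byte segments tolerate about 3e-6,
so the same sporadic loss hurts a standard-MTU sender far more.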
Here are my sysctl variable changes.
In Linux:
# increase TCP maximum buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limits
# min, default, and maximum number of bytes to use
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
# recommended to increase this for 1 GigE or faster
net.core.netdev_max_backlog = 2500
# for 10 GigE, use this
#net.core.netdev_max_backlog = 30000
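As a quick sanity check on the Linux settings, the "min default max"
triples can be parsed and the max compared with one BDP. This sketch
assumes the same illustrative 1 Gb/s path with a 50 ms RTT; the helper
names are hypothetical, and it checks only the configured ceiling, not
what autotuning actually chooses at run time.

```python
# Sketch: can the Linux autotuning limits above hold one BDP?
# ASSUMPTIONS: 1 Gb/s path, 50 ms RTT (illustrative); helper names are
# hypothetical.

def parse_tcp_mem(value: str) -> tuple:
    """Split a 'min default max' sysctl triple such as net.ipv4.tcp_wmem."""
    lo, default, hi = (int(field) for field in value.split())
    return lo, default, hi

def covers_bdp(max_bytes: int, rate_bps: float, rtt_s: float) -> bool:
    """True if max_bytes is at least rate * RTT expressed in bytes."""
    return max_bytes >= rate_bps * rtt_s / 8

settings = {
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "net.ipv4.tcp_wmem": "4096 65536 16777216",
}
for name, value in settings.items():
    _, _, hi = parse_tcp_mem(value)
    print(name, "max", hi, "covers 1 Gb/s x 50 ms:", covers_bdp(hi, 1e9, 0.050))
```

Both 16 MB maxima exceed the 6.25 MB BDP under these assumptions, so
the configured buffer ceilings alone would not seem to explain the
send-side slowdown.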
In FreeBSD:
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendspace=8388608
net.inet.tcp.recvspace=8388608
net.inet.tcp.inflight.enable=0 # <-- doesn't seem to have any effect either way
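For the FreeBSD settings, a fixed socket buffer caps throughput at
window / RTT, which also gives the longest RTT at which a given window
can still sustain a target rate. This sketch plugs in the
net.inet.tcp.sendspace and kern.ipc.maxsockbuf values above with an
illustrative 1 Gb/s target and 50 ms RTT; the helper names are
hypothetical and kernel bookkeeping overhead is ignored.

```python
# Sketch: window-limited throughput ceiling, rate <= window / RTT.
# ASSUMPTIONS: 1 Gb/s target and 50 ms RTT are illustrative; helper
# names are hypothetical; kernel buffer overhead is ignored.

SENDSPACE = 8388608        # net.inet.tcp.sendspace from the config above
MAXSOCKBUF = 16777216      # kern.ipc.maxsockbuf from the config above

def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Highest rate a fixed window can sustain over the given RTT."""
    return window_bytes * 8 / rtt_s

def max_rtt_for_rate(window_bytes: int, rate_bps: float) -> float:
    """Longest RTT at which this window can still sustain rate_bps."""
    return window_bytes * 8 / rate_bps

assert SENDSPACE <= MAXSOCKBUF, "sendspace must fit under maxsockbuf"
print(f"ceiling at 50 ms: {window_limited_bps(SENDSPACE, 0.050) / 1e9:.2f} Gb/s")
print(f"1 Gb/s holds up to RTT {max_rtt_for_rate(SENDSPACE, 1e9) * 1e3:.1f} ms")
```

An 8 MB window allows about 1.34 Gb/s at 50 ms and can sustain 1 Gb/s
up to roughly 67 ms of RTT, so under these assumptions the window is
not the limiting factor on this path.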
I am going to try configuring the kernel to use the advanced TCP
congestion-control algorithms that Linux has built in and see how that
goes. I will also capture a packet trace so that I can better figure
out what is going on.
I am hoping that Jeff Boote and/or Eric Boyd might be able to shed some
light on how the Abilene nodes are configured so as to get the
performance they do.
And if anyone else can point out the folly of my ways, let me know...
thanks,
michael