[net-perf] bad performance w/o jumbo frames over large BDPs

Michael Sinatra michael at rancid.berkeley.edu
Tue Oct 17 14:44:56 PDT 2006


We're making progress at Berkeley building a performance measurement and
characterization infrastructure.  The measurement nodes connect to
backbone routers, and all of the measurement nodes (and the paths
between them) are jumbo-frame-capable.  Recently, I set up an ad-hoc
active measurement system on a user network that is not
jumbo-frame-capable.  While the performance is very good between this
net and our other campus measurement nodes (including the node at our
border), performance is very poor between this host and nodes that are
farther away.  This performance problem only exists when sending from
the host in question to another node; receive performance is very good.

Here's an example:

drl10 ~ # /home/piPEs/bwctl/bin/bwctl -s nms1-chin.abilene.ucaid.edu -w
8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
bwctl: 34 seconds until test results available

RECEIVER START
3370091618.641665: /usr/bin/iperf -B 169.229.144.125 -P 1 -s -f b -m -p
5001 -w 8388608 -t 10 -i 2
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 169.229.144.125
TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
------------------------------------------------------------
[ 14] local 169.229.144.125 port 5001 connected with 198.32.8.162 port 59750
[ 14]  0.0- 2.0 sec  116840032 Bytes  467360128 bits/sec
[ 14]  2.0- 4.0 sec  211912536 Bytes  847650144 bits/sec
[ 14]  4.0- 6.0 sec  213602576 Bytes  854410304 bits/sec
[ 14]  6.0- 8.0 sec  210712912 Bytes  842851648 bits/sec
[ 14]  8.0-10.0 sec  212375576 Bytes  849502304 bits/sec
[ 14]  0.0-10.0 sec  968015872 Bytes  772489277 bits/sec
[ 14] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

RECEIVER END
drl10 ~ # /home/piPEs/bwctl/bin/bwctl -c nms1-chin.abilene.ucaid.edu -w
8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
bwctl: 38 seconds until test results available

RECEIVER START
3370091715.986859: /ami/bin/iperf -B 198.32.8.162 -P 1 -s -f b -m -p
5001 -w 8388608 -t 10 -i 2
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 198.32.8.162
TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
------------------------------------------------------------
[ 15] local 198.32.8.162 port 5001 connected with 169.229.144.125 port 5001
[ 15]  0.0- 2.0 sec  4023992 Bytes  16095968 bits/sec
[ 15]  2.0- 4.0 sec  12415152 Bytes  49660608 bits/sec
[ 15]  4.0- 6.0 sec  15610888 Bytes  62443552 bits/sec
[ 15]  6.0- 8.0 sec  15926552 Bytes  63706208 bits/sec
[ 15]  8.0-10.0 sec  16485480 Bytes  65941920 bits/sec

RECEIVER END
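
To compare the two directions without eyeballing the output, the per-interval
lines can be pulled out with a small script. This is only a sketch: the regular
expression assumes iperf was run with "-f b" as above, and the names
parse_intervals and peak_mbps are made up for this example. The sample data is
the send-direction output pasted above.

```python
import re

# Matches iperf interval lines printed with "-f b", for example:
# "[ 15]  8.0-10.0 sec  16485480 Bytes  65941920 bits/sec"
INTERVAL_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\d+\.\d+)-\s*(\d+\.\d+) sec\s+(\d+) Bytes\s+(\d+) bits/sec"
)


def parse_intervals(text):
    """Return (start_s, end_s, bytes, bits_per_sec) for each matching line."""
    rows = []
    for line in text.splitlines():
        m = INTERVAL_RE.search(line)
        if m:
            rows.append(
                (float(m.group(2)), float(m.group(3)),
                 int(m.group(4)), int(m.group(5)))
            )
    return rows


def peak_mbps(rows):
    """Highest reported rate in Mb/s (a summary line, if present, also counts)."""
    return max(r[3] for r in rows) / 1e6


# Send direction (our host to Abilene Chicago), copied from the output above.
SEND_OUTPUT = """\
[ 15]  0.0- 2.0 sec  4023992 Bytes  16095968 bits/sec
[ 15]  2.0- 4.0 sec  12415152 Bytes  49660608 bits/sec
[ 15]  4.0- 6.0 sec  15610888 Bytes  62443552 bits/sec
[ 15]  6.0- 8.0 sec  15926552 Bytes  63706208 bits/sec
[ 15]  8.0-10.0 sec  16485480 Bytes  65941920 bits/sec
"""

send_rows = parse_intervals(SEND_OUTPUT)
print("intervals parsed:", len(send_rows))
print("peak Mb/s:", peak_mbps(send_rows))
```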


As you can see, we peak at 850+ Mb/s from the Abilene Chicago node to this
host, but we only get 65+ Mb/s from our host to Abilene Chicago.
Performance on a gig-connected host is actually worse than what is
achieved on a host that is only connected to a 100 Mb/s interface.

I checked on more than one of our stationary measurement nodes,
switching between jumbo and non-jumbo frames.  I can replicate the
performance issue with non-jumbo frames on each of these nodes.  Again,
it appears to manifest itself only when the bandwidth-delay product gets
above a certain threshold.  Performance is still fine in both directions
with jumbo frames.  This is the case with FreeBSD 6-STABLE and 7-CURRENT
(both csup'ed a few days ago) and with Gentoo Linux (vanilla kernel
2.6.18-web100, with very few other modifications).  The platform is
amd64--I haven't been able to compare with i386 yet.
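
For reference, the bandwidth-delay product is the link rate multiplied by the
round-trip time, and the sender has to keep that many bytes in flight to fill
the path. The arithmetic below is a rough sketch, not a measurement. The 1 Gb/s
rate, the 8 MB window and the 1448-byte MSS come from the tests above; the
50 ms RTT and the 8948-byte jumbo MSS (9000-byte MTU with the same header
overhead) are assumptions chosen for illustration, since neither was reported.

```python
import math


def bdp_bytes(bandwidth_bps, rtt_s):
    """Bytes that must be in flight to fill a path of this rate and RTT."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes, rtt_s):
    """Upper bound on throughput imposed by a fixed window and RTT."""
    return window_bytes * 8 / rtt_s


def segments_in_flight(window_bytes, mss_bytes):
    """Number of full-size segments needed to cover the window."""
    return math.ceil(window_bytes / mss_bytes)


LINK_BPS = 1e9          # gig-connected host (from the post)
WINDOW_BYTES = 8388608  # bwctl -w 8m (from the post)
MSS_STANDARD = 1448     # reported by iperf -m above
RTT_S = 0.050           # ASSUMPTION: RTT was not reported
MSS_JUMBO = 8948        # ASSUMPTION: 9000-byte MTU, same 52-byte overhead

bdp = bdp_bytes(LINK_BPS, RTT_S)
print("BDP in bytes:", bdp)
print("rate allowed by the 8 MB window (b/s):",
      window_limited_bps(WINDOW_BYTES, RTT_S))
print("segments in flight at MSS 1448:", segments_in_flight(bdp, MSS_STANDARD))
print("segments in flight at MSS 8948:", segments_in_flight(bdp, MSS_JUMBO))
```

Under these assumptions the requested window is larger than the BDP, so the
window size alone would not explain the slow direction. What does change with
the frame size is how many segments the sender has to keep in flight to fill
the path.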

I feel like I must be doing something wrong, especially since the
Abilene nodes are clearly able to send traffic to me at near-line-rate
using an MSS of only 1460, sending to our non-jumbo-enabled host.

Here are my sysctl variable changes.

In Linux:

# increase TCP maximum buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# increase Linux autotuning TCP buffer limits
# min, default, and maximum number of bytes to use
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
# recommended to increase this for 1000 BT or higher
net.core.netdev_max_backlog = 2500
# for 10 GigE, use this
#net.core.netdev_max_backlog = 30000
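
A quick way to confirm that the Linux settings above are actually in effect on
the running kernel is to read them back from /proc/sys. This is a minimal
sketch: the helper names read_sysctl and check are invented for this example,
and it relies on the usual mapping of a sysctl name to a path under /proc/sys
by replacing the dots with slashes.

```python
from pathlib import Path

# Values taken from the Linux settings listed above.
EXPECTED = {
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "net.ipv4.tcp_wmem": "4096 65536 16777216",
    "net.ipv4.tcp_no_metrics_save": "1",
    "net.core.netdev_max_backlog": "2500",
}


def read_sysctl(name, root="/proc/sys"):
    """Read one sysctl value, normalizing tabs and newlines to single spaces."""
    path = Path(root, *name.split("."))
    return " ".join(path.read_text().split())


def check(expected, root="/proc/sys"):
    """Return {name: (wanted, found)} for every value that does not match."""
    mismatches = {}
    for name, want in expected.items():
        try:
            got = read_sysctl(name, root)
        except OSError:
            got = None  # missing or unreadable on this system
        if got != " ".join(want.split()):
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    for name, (want, got) in check(EXPECTED).items():
        print(f"{name}: wanted {want!r}, found {got!r}")
```

The root argument exists only so the check can be pointed at a copy of the
tree for testing; on a live host the default is what you want.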

In FreeBSD:

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendspace=8388608
net.inet.tcp.recvspace=8388608
net.inet.tcp.inflight.enable=0 # <-- doesn't seem to have any effect
                                # either way

I am going to try configuring the kernel for the advanced TCP congestion
stuff that Linux has built in and see how that goes.  I will also set up
to capture a packet trace, so that I can better figure out what is going on.

I am hoping that Jeff Boote and/or Eric Boyd might be able to shed some
light on how the Abilene nodes are configured so as to get the
performance they do.

And if anyone else can point out the folly of my ways, let me know...

thanks,
michael



