[net-perf] bad performance w/o jumbo frames over large BDPs

Michael Van Norman mvn at ucla.edu
Tue Oct 17 22:34:00 PDT 2006


I'm getting about a 6x increase in sending throughput by turning off TCP
segmentation offload (TSO).  This was theoretically fixed in the 2.6.12
kernel, but I'm running 2.6.17.6-web100, so maybe not.  I am still not
getting the performance levels I should, but it is much better with TSO off.
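
For anyone who wants to repeat the comparison, here is a minimal sketch of
toggling TSO through ethtool, wrapped in Python.  The interface name eth0 is
an assumption (substitute your own), the helper names tso_command and set_tso
are made up for illustration, and actually running the command needs root.

```python
import subprocess


def tso_command(iface, enable):
    """Build the ethtool argv that switches TSO on or off for iface."""
    return ["ethtool", "-K", iface, "tso", "on" if enable else "off"]


def set_tso(iface, enable):
    """Run the command; requires root and the ethtool binary."""
    subprocess.run(tso_command(iface, enable), check=True)


if __name__ == "__main__":
    # Only print the command by default, so running the sketch changes nothing.
    # "eth0" is an assumed interface name.
    print(" ".join(tso_command("eth0", False)))
```

"ethtool -k eth0" (lowercase k) shows the current offload settings, which is
worth checking before and after each iperf run.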

/Mike

Michael Sinatra wrote:
> We're making progress at Berkeley building a performance measurement and
> characterization infrastructure.  The measurement nodes connect to
> backbone routers, and all of the measurement nodes (and the paths
> between them) are jumbo-frame-capable.  Recently, I set up an ad-hoc
> active measurement system on a user network that is not
> jumbo-frame-capable.  While the performance is very good between this
> net and our other campus measurement nodes (including the node at our
> border), performance is very poor between this host and nodes that are
> farther away.  This performance problem only exists when sending from
> the host in question to another node; receive performance is very good.
> 
> Here's an example:
> 
> drl10 ~ # /home/piPEs/bwctl/bin/bwctl -s nms1-chin.abilene.ucaid.edu -w 8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
> bwctl: 34 seconds until test results available
> 
> RECEIVER START
> 3370091618.641665: /usr/bin/iperf -B 169.229.144.125 -P 1 -s -f b -m -p 5001 -w 8388608 -t 10 -i 2
> ------------------------------------------------------------
> Server listening on TCP port 5001
> Binding to local address 169.229.144.125
> TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
> ------------------------------------------------------------
> [ 14] local 169.229.144.125 port 5001 connected with 198.32.8.162 port 59750
> [ 14]  0.0- 2.0 sec  116840032 Bytes  467360128 bits/sec
> [ 14]  2.0- 4.0 sec  211912536 Bytes  847650144 bits/sec
> [ 14]  4.0- 6.0 sec  213602576 Bytes  854410304 bits/sec
> [ 14]  6.0- 8.0 sec  210712912 Bytes  842851648 bits/sec
> [ 14]  8.0-10.0 sec  212375576 Bytes  849502304 bits/sec
> [ 14]  0.0-10.0 sec  968015872 Bytes  772489277 bits/sec
> [ 14] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
> 
> RECEIVER END
> drl10 ~ # /home/piPEs/bwctl/bin/bwctl -c nms1-chin.abilene.ucaid.edu -w 8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
> bwctl: 38 seconds until test results available
> 
> RECEIVER START
> 3370091715.986859: /ami/bin/iperf -B 198.32.8.162 -P 1 -s -f b -m -p 5001 -w 8388608 -t 10 -i 2
> ------------------------------------------------------------
> Server listening on TCP port 5001
> Binding to local address 198.32.8.162
> TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
> ------------------------------------------------------------
> [ 15] local 198.32.8.162 port 5001 connected with 169.229.144.125 port 5001
> [ 15]  0.0- 2.0 sec  4023992 Bytes  16095968 bits/sec
> [ 15]  2.0- 4.0 sec  12415152 Bytes  49660608 bits/sec
> [ 15]  4.0- 6.0 sec  15610888 Bytes  62443552 bits/sec
> [ 15]  6.0- 8.0 sec  15926552 Bytes  63706208 bits/sec
> [ 15]  8.0-10.0 sec  16485480 Bytes  65941920 bits/sec
> 
> RECEIVER END
> 
> 
> As you can see, we peak at 850+ Mb/s from the Abilene Chicago node to
> this host, but we only get 65+ Mb/s from our host to Abilene Chicago.
> Performance on a gig-connected host is actually worse than what is
> achieved on a host that is only connected to a 100 Mb/s interface.
> 
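
The two rates above can be turned into bytes in flight, which makes it easier
to judge whether the requested 8 MB window is the limit.  This is a rough
sketch only: the 50 ms round-trip time is an assumption (no RTT was measured
in this thread), and bdp_bytes is a made-up helper name.

```python
def bdp_bytes(rate_bps, rtt_ms):
    """Bytes that must be in flight to sustain rate_bps over rtt_ms."""
    return rate_bps * rtt_ms // 8000  # bits/s * ms -> bytes


RTT_MS = 50  # ASSUMPTION: hypothetical RTT, not measured in this thread

# Bytes needed in flight to fill a 1 Gb/s path at the assumed RTT.
needed = bdp_bytes(1_000_000_000, RTT_MS)

# Bytes implied by the best 2-second interval in the slow direction
# (65941920 bits/sec in the second iperf report above).
observed = bdp_bytes(65_941_920, RTT_MS)

# Window requested with "-w 8m" in the bwctl commands above.
REQUESTED_WINDOW = 8_388_608

print(f"needed for 1 Gb/s:      {needed} bytes")
print(f"implied at 65.9 Mb/s:   {observed} bytes")
print(f"requested window >= BDP: {REQUESTED_WINDOW >= needed}")
```

Under that assumed RTT, about 6.25 MB has to be in flight to fill the path,
while the slow direction only keeps roughly 0.41 MB in flight.  If the real
RTT is in that range, the requested window is large enough on paper, which
would point at something other than the socket buffer size.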
> I checked on more than one of our stationary measurement nodes,
> switching between jumbo and non-jumbo frames.  I can replicate the
> performance issue with non-jumbo frames on each of these nodes.  Again,
> it appears to manifest itself only when the bandwidth-delay product
> gets above a certain threshold.  Performance is still fine in both
> directions with jumbo frames.  This is the case with FreeBSD 6-STABLE
> and 7-CURRENT (both csup'ed a few days ago) and with Gentoo Linux
> (vanilla kernel 2.6.18-web100, with very few other modifications).
> The platform is amd64--I haven't been able to compare with i386 yet.
> 
> I feel like I must be doing something wrong, especially since the
> Abilene nodes are clearly able to send traffic to me at near-line-rate
> using an MSS of only 1460, sending to our non-jumbo-enabled host.
> 
> Here are my sysctl variable changes.
> 
> In Linux:
> 
> # increase TCP maximum buffer size
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
> 
> # increase Linux autotuning TCP buffer limits
> # min, default, and maximum number of bytes to use
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> # don't cache ssthresh from previous connection
> net.ipv4.tcp_no_metrics_save = 1
> # recommended to increase this for 1000 BT or higher
> net.core.netdev_max_backlog = 2500
> # for 10 GigE, use this
> #net.core.netdev_max_backlog = 30000
> 
> In FreeBSD:
> 
> kern.ipc.maxsockbuf=16777216
> net.inet.tcp.sendspace=8388608
> net.inet.tcp.recvspace=8388608
> net.inet.tcp.inflight.enable=0 # <-- doesn't seem to have any effect
>                                # either way
> 
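
One quick sanity check on settings like these is to parse them and compare
each buffer maximum against a target bandwidth-delay product.  A minimal
sketch follows; the 6,250,000-byte target assumes a 1 Gb/s path with a
hypothetical 50 ms RTT (not a measured value), and parse_sysctl and
max_buffer are made-up helper names.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.split()
    return settings


def max_buffer(settings, key):
    """Largest value listed for key (the last field of tcp_rmem/tcp_wmem)."""
    return max(int(v) for v in settings[key])


# The Linux buffer settings quoted above, copied verbatim.
LINUX = """
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
"""

TARGET = 6_250_000  # ASSUMPTION: BDP of 1 Gb/s at a hypothetical 50 ms RTT

cfg = parse_sysctl(LINUX)
for key in sorted(cfg):
    largest = max_buffer(cfg, key)
    print(f"{key}: max {largest} bytes, covers target: {largest >= TARGET}")
```

The same parser accepts the FreeBSD "key=value # comment" form, so the
kern.ipc.maxsockbuf and sendspace/recvspace lines can be checked the same way.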
> I am going to try configuring the kernel for the advanced TCP
> congestion control options that Linux has built in and see how that
> goes.  I will also set up a packet capture, so that I can better figure
> out what is going on.
> 
> I am hoping that Jeff Boote and/or Eric Boyd might be able to shed some
> light on how the Abilene nodes are configured so as to get the
> performance they do.
> 
> And if anyone else can point out the folly of my ways, let me know...
> 
> thanks,
> michael