[net-perf] bad performance w/o jumbo frames over large BDPs
Michael Van Norman
mvn at ucla.edu
Tue Oct 17 22:34:00 PDT 2006
I'm getting about a 6x increase in send throughput by turning off TCP
segmentation offload (TSO). This was theoretically fixed in the 2.6.12
kernel, but I'm running 2.6.17.6-web100, so maybe not. I am still not
getting the performance levels I should, but it is much better with TSO off.
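
For anyone who wants to check the offload state on their own sender, here is a rough sketch in Python. It assumes `ethtool` is installed and that `ethtool -k <interface>` prints a line such as "tcp-segmentation-offload: on" (older versions print "tcp segmentation offload: on", which the parser also accepts). The interface name `eth0` and the helper names are placeholders of mine. Turning TSO off is normally done with `ethtool -K eth0 tso off` as root.

```python
import subprocess


def tso_enabled(ethtool_output: str):
    """Return True/False for the TSO line in `ethtool -k` output, or None if absent."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        # Normalize "tcp segmentation offload" (older ethtool) to the hyphenated form.
        if sep and key.strip().replace(" ", "-") == "tcp-segmentation-offload":
            # The value may carry a suffix such as "[fixed]"; only the first word matters.
            words = value.split()
            return bool(words) and words[0] == "on"
    return None


def read_offload_settings(interface: str) -> str:
    """Run `ethtool -k <interface>` and return its text output (requires ethtool)."""
    result = subprocess.run(
        ["ethtool", "-k", interface], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical sample output, used so the sketch runs without ethtool present.
SAMPLE = """Features for eth0:
rx-checksumming: on
tcp-segmentation-offload: on
generic-segmentation-offload: off
"""

if __name__ == "__main__":
    print("TSO enabled:", tso_enabled(SAMPLE))
```

On a live host one would call `tso_enabled(read_offload_settings("eth0"))` instead of using the sample text.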
/Mike
Michael Sinatra wrote:
> We're making progress at Berkeley building a performance measurement and
> characterization infrastructure. The measurement nodes connect to
> backbone routers, and all of the measurement nodes (and the paths
> between them) are jumbo-frame-capable. Recently, I set up an ad-hoc
> active measurement system on a user network that is not
> jumbo-frame-capable. While the performance is very good between this
> net and our other campus measurement nodes (including the node at our
> border), performance is very poor between this host and nodes that are
> farther away. This performance problem only exists when sending from
> the host in question to another node; receive performance is very good.
>
> Here's an example:
>
> drl10 ~ # /home/piPEs/bwctl/bin/bwctl -s nms1-chin.abilene.ucaid.edu -w
> 8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
> bwctl: 34 seconds until test results available
>
> RECEIVER START
> 3370091618.641665: /usr/bin/iperf -B 169.229.144.125 -P 1 -s -f b -m -p
> 5001 -w 8388608 -t 10 -i 2
> ------------------------------------------------------------
> Server listening on TCP port 5001
> Binding to local address 169.229.144.125
> TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
> ------------------------------------------------------------
> [ 14] local 169.229.144.125 port 5001 connected with 198.32.8.162 port
> 59750
> [ 14] 0.0- 2.0 sec 116840032 Bytes 467360128 bits/sec
> [ 14] 2.0- 4.0 sec 211912536 Bytes 847650144 bits/sec
> [ 14] 4.0- 6.0 sec 213602576 Bytes 854410304 bits/sec
> [ 14] 6.0- 8.0 sec 210712912 Bytes 842851648 bits/sec
> [ 14] 8.0-10.0 sec 212375576 Bytes 849502304 bits/sec
> [ 14] 0.0-10.0 sec 968015872 Bytes 772489277 bits/sec
> [ 14] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>
> RECEIVER END
> drl10 ~ # /home/piPEs/bwctl/bin/bwctl -c nms1-chin.abilene.ucaid.edu -w
> 8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
> bwctl: 38 seconds until test results available
>
> RECEIVER START
> 3370091715.986859: /ami/bin/iperf -B 198.32.8.162 -P 1 -s -f b -m -p
> 5001 -w 8388608 -t 10 -i 2
> ------------------------------------------------------------
> Server listening on TCP port 5001
> Binding to local address 198.32.8.162
> TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
> ------------------------------------------------------------
> [ 15] local 198.32.8.162 port 5001 connected with 169.229.144.125 port 5001
> [ 15] 0.0- 2.0 sec 4023992 Bytes 16095968 bits/sec
> [ 15] 2.0- 4.0 sec 12415152 Bytes 49660608 bits/sec
> [ 15] 4.0- 6.0 sec 15610888 Bytes 62443552 bits/sec
> [ 15] 6.0- 8.0 sec 15926552 Bytes 63706208 bits/sec
> [ 15] 8.0-10.0 sec 16485480 Bytes 65941920 bits/sec
>
> RECEIVER END
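
The second run above has no summary line, so here is a small Python sketch that averages the per-interval lines instead. It assumes the `-f b` report format shown in the output (bytes and bits/sec as plain integers); the function name is mine, and it should be fed interval lines only, since a "0.0-10.0" summary line would otherwise be counted twice.

```python
import re

# Matches iperf interval lines such as
# "[ 15] 0.0- 2.0 sec 4023992 Bytes 16095968 bits/sec" (a leading "> " is tolerated).
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+(\d+)\s+Bytes\s+(\d+)\s+bits/sec"
)


def mean_bits_per_sec(report: str) -> float:
    """Average throughput over all interval lines found in the report text."""
    total_bytes = 0
    total_secs = 0.0
    for line in report.splitlines():
        match = INTERVAL.search(line)
        if match:
            start, end = float(match.group(1)), float(match.group(2))
            total_bytes += int(match.group(3))
            total_secs += end - start
    if total_secs == 0:
        raise ValueError("no interval lines found")
    return total_bytes * 8 / total_secs


# Interval lines copied from the drl10 -> nms1-chin run above.
SEND_RUN = """\
> [ 15] 0.0- 2.0 sec 4023992 Bytes 16095968 bits/sec
> [ 15] 2.0- 4.0 sec 12415152 Bytes 49660608 bits/sec
> [ 15] 4.0- 6.0 sec 15610888 Bytes 62443552 bits/sec
> [ 15] 6.0- 8.0 sec 15926552 Bytes 63706208 bits/sec
> [ 15] 8.0-10.0 sec 16485480 Bytes 65941920 bits/sec
"""

if __name__ == "__main__":
    print(f"{mean_bits_per_sec(SEND_RUN) / 1e6:.1f} Mb/s")
```

For the lines above this works out to roughly 51.6 Mb/s over the ten seconds, noticeably below the 65 Mb/s peak because of the slow first interval.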
>
>
> As you can see, we peak at 850+ Mb/s from the Abilene Chicago node to this
> host, but we only get 65+ Mb/s from our host to Abilene Chicago.
> Performance on a gig-connected host is actually worse than what is
> achieved on a host that is only connected to a 100 Mb/s interface.
>
> I checked on more than one of our stationary measurement nodes,
> switching between jumbo and non-jumbo frames. I can replicate the
> performance issue with non-jumbo frames on each of these nodes. Again,
> it appears to manifest itself only when the bandwidth-delay product
> (BDP) gets above a certain threshold. Performance is still fine in both
> directions with jumbo frames. This is the case with FreeBSD 6-STABLE and
> 7-CURRENT (both csup'ed a few days ago) and with Gentoo Linux (vanilla
> kernel 2.6.18-web100, with very few other modifications). The platform
> is amd64--I haven't been able to compare with i386 yet.
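
To put numbers on the bandwidth-delay product, here is a back-of-the-envelope sketch in Python. The 8388608-byte window and the 1448-byte MSS come from the iperf output above. The 50 ms round-trip time and the 8948-byte jumbo MSS are assumptions of mine for illustration, since neither the RTT nor the jumbo MTU is stated in this thread.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this bandwidth and RTT full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


def implied_window_bytes(throughput_bps: float, rtt_s: float) -> float:
    """Average amount of data in flight implied by an observed throughput."""
    return throughput_bps * rtt_s / 8


LINK_BPS = 1e9           # gigabit interface
WINDOW_BYTES = 8388608   # from "-w 8m"
MSS_STANDARD = 1448      # reported by iperf for the 1500-byte MTU
MSS_JUMBO = 8948         # ASSUMPTION: 9000-byte MTU minus the same 52 bytes of headers
RTT_S = 0.050            # ASSUMPTION: hypothetical round-trip time, not measured here

if __name__ == "__main__":
    bdp = bdp_bytes(LINK_BPS, RTT_S)
    print(f"BDP at line rate: {bdp / 1e6:.2f} MB")
    print(f"Window-limited ceiling: {window_limited_bps(WINDOW_BYTES, RTT_S) / 1e6:.0f} Mb/s")
    print(f"Segments in flight, standard MSS: {bdp / MSS_STANDARD:.0f}")
    print(f"Segments in flight, jumbo MSS: {bdp / MSS_JUMBO:.0f}")
    print(f"In flight at 65 Mb/s: {implied_window_bytes(65e6, RTT_S) / 1e3:.0f} kB")
```

Under those assumptions the 8 MB window is not the ceiling. The sender would need several thousand standard-size segments in flight to fill the pipe, against several hundred with jumbo frames, which is at least consistent with a per-packet cost on the sending side.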
>
> I feel like I must be doing something wrong, especially since the
> Abilene nodes are clearly able to send traffic to me at near-line-rate
> using an MSS of only 1460, sending to our non-jumbo-enabled host.
>
> Here are my sysctl variable changes.
>
> In Linux:
>
> # increase TCP maximum buffer size
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
>
> # increase Linux autotuning TCP buffer limits
> # min, default, and maximum number of bytes to use
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> # don't cache ssthresh from previous connection
> net.ipv4.tcp_no_metrics_save = 1
> # recommended to increase this for 1000 BT or higher
> net.core.netdev_max_backlog = 2500
> # for 10 GigE, use this
> #net.core.netdev_max_backlog = 30000
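
As a quick sanity check on limits like these, the following Python sketch parses sysctl-style text and reports which buffer limits fall below a required window. The helper names are hypothetical. The required size is whatever the test asks for, for example the 8388608 bytes requested with `-w 8m`.

```python
# Settings whose last field caps the socket buffer size.
LIMIT_KEYS = (
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
)


def parse_sysctl(text: str) -> dict:
    """Parse 'key = v1 v2 ...' lines into {key: [ints]}, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # tolerate email quoting
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = [int(v) for v in value.split()]
    return settings


def undersized(settings: dict, needed_bytes: int) -> list:
    """Return the limit keys that are missing or whose maximum is below needed_bytes."""
    small = []
    for key in LIMIT_KEYS:
        values = settings.get(key)
        # The maximum is the last field of tcp_rmem/tcp_wmem and the only field of *_max.
        if values is None or values[-1] < needed_bytes:
            small.append(key)
    return small


# The Linux settings quoted above.
SAMPLE = """\
> # increase TCP maximum buffer size
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> net.ipv4.tcp_no_metrics_save = 1
> net.core.netdev_max_backlog = 2500
> #net.core.netdev_max_backlog = 30000
"""

if __name__ == "__main__":
    print(undersized(parse_sysctl(SAMPLE), 8388608))
```

With the values quoted above nothing is reported as undersized for an 8 MB window, which suggests the buffer limits themselves are not what holds the sender back.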
>
> In FreeBSD:
>
> kern.ipc.maxsockbuf=16777216
> net.inet.tcp.sendspace=8388608
> net.inet.tcp.recvspace=8388608
> net.inet.tcp.inflight.enable=0 # <-- doesn't seem to have any effect
> # either way
>
> I am going to try configuring the kernel for the advanced TCP congestion
> stuff that Linux has built in and see how that goes. I will also set up
> to capture a packet trace, so that I can better figure out what is going
> on.
>
> I am hoping that Jeff Boote and/or Eric Boyd might be able to shed some
> light on how the Abilene nodes are configured so as to get the
> performance they do.
>
> And if anyone else can point out the folly of my ways, let me know...
>
> thanks,
> michael
>
>
> _______________________________________________
> Network-performance mailing list
> Network-performance at lists.cenic.org
> http://lists.cenic.org/mailman/listinfo/network-performance