[net-perf] bad performance w/o jumbo frames over large BDPs
Jim Madden
jmadden at ucsd.edu
Tue Oct 17 23:20:56 PDT 2006
You've probably long since confirmed that MSS negotiation from your
local machine to off-campus machines works and produces something
around 1500 bytes, so that there's no packet fragmentation going on?
I don't see any indication of such a negotiation in the slow transmit
test.
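
For reference, a rough sketch of the arithmetic behind the MSS figures that appear in this thread (1460 quoted in the text, 1448 reported by iperf for a 1500-byte MTU). It assumes IPv4 with no IP options and that the 12-byte difference is the TCP timestamp option; the function and constant names are made up for illustration.

```python
# Sketch only: assumes IPv4 without IP options, and that TCP timestamps
# (10 bytes, padded to 12) are the only TCP option carried on data segments.
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12


def mss_for_mtu(mtu, timestamps=True):
    """Return the TCP payload bytes that fit in one packet of the given MTU."""
    mss = mtu - IPV4_HEADER - TCP_HEADER
    if timestamps:
        mss -= TCP_TIMESTAMP_OPTION
    return mss


if __name__ == "__main__":
    print(mss_for_mtu(1500, timestamps=False))  # 1460
    print(mss_for_mtu(1500))                    # 1448
    print(mss_for_mtu(9000))                    # 8948 (9000 is a common jumbo MTU)
```

If the path were fragmenting, the sender would be using segments larger than the smallest MTU on the path allows, so comparing the reported MSS against this figure is a quick sanity check.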
Jim Madden
At 10:34 PM -0700 10/17/06, Michael Van Norman wrote:
>I'm getting about a 6x increase in send throughput by turning off TCP
>segmentation offload (TSO). This was theoretically fixed in the 2.6.12
>kernel, but I'm running 2.6.17.6-web100, so maybe not. I am still
>not getting the performance levels I should, but it is much better
>with TSO off.
>
>/Mike
>
>Michael Sinatra wrote:
>>We're making progress at Berkeley building a performance measurement and
>>characterization infrastructure. The measurement nodes connect to
>>backbone routers, and all of the measurement nodes (and the paths
>>between them) are jumbo-frame-capable. Recently, I set up an ad-hoc
>>active measurement system on a user network that is not
>>jumbo-frame-capable. While the performance is very good between this
>>net and our other campus measurement nodes (including the node at our
>>border), performance is very poor between this host and nodes that are
>>farther away. This performance problem only exists when sending from
>>the host in question to another node; receive performance is very good.
>>
>>Here's an example:
>>
>>drl10 ~ # /home/piPEs/bwctl/bin/bwctl -s nms1-chin.abilene.ucaid.edu -w
>>8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
>>bwctl: 34 seconds until test results available
>>
>>RECEIVER START
>>3370091618.641665: /usr/bin/iperf -B 169.229.144.125 -P 1 -s -f b -m -p
>>5001 -w 8388608 -t 10 -i 2
>>------------------------------------------------------------
>>Server listening on TCP port 5001
>>Binding to local address 169.229.144.125
>>TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
>>------------------------------------------------------------
>>[ 14] local 169.229.144.125 port 5001 connected with 198.32.8.162 port 59750
>>[ 14] 0.0- 2.0 sec 116840032 Bytes 467360128 bits/sec
>>[ 14] 2.0- 4.0 sec 211912536 Bytes 847650144 bits/sec
>>[ 14] 4.0- 6.0 sec 213602576 Bytes 854410304 bits/sec
>>[ 14] 6.0- 8.0 sec 210712912 Bytes 842851648 bits/sec
>>[ 14] 8.0-10.0 sec 212375576 Bytes 849502304 bits/sec
>>[ 14] 0.0-10.0 sec 968015872 Bytes 772489277 bits/sec
>>[ 14] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>>
>>RECEIVER END
>>drl10 ~ # /home/piPEs/bwctl/bin/bwctl -c nms1-chin.abilene.ucaid.edu -w
>>8m -i 2 -L 1800 -A A AESKEY ucb /home/piPEs/bwctl/etc/bwctld.keys
>>bwctl: 38 seconds until test results available
>>
>>RECEIVER START
>>3370091715.986859: /ami/bin/iperf -B 198.32.8.162 -P 1 -s -f b -m -p
>>5001 -w 8388608 -t 10 -i 2
>>------------------------------------------------------------
>>Server listening on TCP port 5001
>>Binding to local address 198.32.8.162
>>TCP window size: 16777216 Byte (WARNING: requested 8388608 Byte)
>>------------------------------------------------------------
>>[ 15] local 198.32.8.162 port 5001 connected with 169.229.144.125 port 5001
>>[ 15] 0.0- 2.0 sec 4023992 Bytes 16095968 bits/sec
>>[ 15] 2.0- 4.0 sec 12415152 Bytes 49660608 bits/sec
>>[ 15] 4.0- 6.0 sec 15610888 Bytes 62443552 bits/sec
>>[ 15] 6.0- 8.0 sec 15926552 Bytes 63706208 bits/sec
>>[ 15] 8.0-10.0 sec 16485480 Bytes 65941920 bits/sec
>>
>>RECEIVER END
>>
>>
>>As you can see, we peak at 850+ Mb/s from the Abilene Chicago node to this
>>host, but we only get 65+ Mb/s from our host to Abilene-Chicago.
>>Performance on a gig-connected host is actually worse than what is
>>achieved on a host that is only connected to a 100 Mb/s interface.
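
The rates quoted above can be rechecked from the byte counts in the two iperf reports. This is only a sketch of that arithmetic, using the final 2-second interval of each test; the helper name is made up for illustration.

```python
def bits_per_second(byte_count, seconds):
    """Convert a byte count over an interval into a rate in bits per second."""
    return byte_count * 8 / seconds


# Final 8.0-10.0 sec interval of each test, copied from the iperf output.
inbound = bits_per_second(212375576, 2.0)   # Abilene Chicago -> local host
outbound = bits_per_second(16485480, 2.0)   # local host -> Abilene Chicago

if __name__ == "__main__":
    print(f"inbound:  {inbound / 1e6:.1f} Mb/s")
    print(f"outbound: {outbound / 1e6:.1f} Mb/s")
    print(f"ratio:    {inbound / outbound:.1f}x")
```

On these numbers the receive direction is roughly 13 times faster than the send direction.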
>>
>>I checked on more than one of our stationary measurement nodes,
>>switching between jumbo and non-jumbo frames. I can replicate the
>>performance issue with non-jumbo frames on each of these nodes. Again,
>>it appears to manifest itself only when the bandwidth-delay product gets
>>above a certain threshold. Performance is still fine in both directions with
>>jumbo frames. This is the case both with FreeBSD 6-STABLE, 7-CURRENT
>>(both csup'ed a few days ago) and Gentoo Linux (vanilla kernel
>>2.6.18-web100, with very few other modifications). The platform is
>>amd64--I haven't been able to compare with i386 yet.
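
To put a number on the bandwidth-delay product being discussed, here is a minimal sketch. The 60 ms round-trip time is an assumption chosen for illustration (no RTT is given in this thread), and the function names are hypothetical; the 8388608-byte window is the value requested with -w 8m in the tests, and 1448 is the MSS iperf reported.

```python
def bdp_bytes(rate_bits_per_sec, rtt_ms):
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_bits_per_sec * rtt_ms // 8000


def segments_in_flight(window_bytes, mss):
    """Number of MSS-sized segments needed to cover the window (rounded up)."""
    return -(-window_bytes // mss)


RATE = 1_000_000_000   # 1 Gb/s interface
RTT_MS = 60            # ASSUMED round-trip time, not measured in this thread
REQUESTED_WINDOW = 8388608

if __name__ == "__main__":
    bdp = bdp_bytes(RATE, RTT_MS)
    print(bdp)                             # 7500000
    print(REQUESTED_WINDOW >= bdp)         # True
    print(segments_in_flight(bdp, 1448))   # 5180 segments at a 1500-byte MTU
    print(segments_in_flight(bdp, 8948))   # 839 segments at a 9000-byte MTU
```

Under that assumed RTT the requested window is large enough on paper, so window size alone would not explain the slow direction. What changes between the two frame sizes is the number of segments the sender has to keep in flight, which is about six times higher without jumbo frames.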
>>
>>I feel like I must be doing something wrong, especially since the
>>Abilene nodes are clearly able to send traffic to me at near-line-rate
>>using an MSS of only 1460, sending to our non-jumbo-enabled host.
>>
>>Here are my sysctl variable changes.
>>
>>In Linux:
>>
>># increase TCP maximum buffer size
>>net.core.rmem_max = 16777216
>>net.core.wmem_max = 16777216
>>
>># increase Linux autotuning TCP buffer limits
>># min, default, and maximum number of bytes to use
>>net.ipv4.tcp_rmem = 4096 87380 16777216
>>net.ipv4.tcp_wmem = 4096 65536 16777216
>># don't cache ssthresh from previous connection
>>net.ipv4.tcp_no_metrics_save = 1
>># recommended to increase this for 1000 BT or higher
>>net.core.netdev_max_backlog = 2500
>># for 10 GigE, use this
>>#net.core.netdev_max_backlog = 30000
>>
>>In FreeBSD:
>>
>>kern.ipc.maxsockbuf=16777216
>>net.inet.tcp.sendspace=8388608
>>net.inet.tcp.recvspace=8388608
>>net.inet.tcp.inflight.enable=0 # <-- doesn't seem to have any effect
>> # either way
>>
>>I am going to try configuring the kernel for the advanced TCP congestion
>>stuff that Linux has built in and see how that goes. I will also set up
>>to capture a packet trace, so that I can better figure out what is going on.
>>
>>I am hoping that Jeff Boote and/or Eric Boyd might be able to shed some
>>light on how the Abilene nodes are configured so as to get the
>>performance they do.
>>
>>And if anyone else can point out the folly of my ways, let me know...
>>
>>thanks,
>>michael
>>
>>
>>_______________________________________________
>>Network-performance mailing list
>>Network-performance at lists.cenic.org
>>http://lists.cenic.org/mailman/listinfo/network-performance