To achieve the highest bandwidth from high-speed network devices (40Gb and faster), you will very likely need to perform some tuning. The following steps were used to tune the test system. They were performed on a system with 100Gb network devices and are provided only as an example.
Please note: the following actions were all performed as the "root" user. Most or all of these steps require root access, either by running each command via "sudo" or by first switching to the root account completely with "sudo su -".
Additionally, not all tuning steps are supported on all systems or devices. For example, the NUMA steps only work on systems that support NUMA. Setting "MaxReadReq" comes directly from tuning suggestions for Mellanox 100Gb adapters and thus may not be supported on adapters from other vendors.
root@sys-6029u-trt:~# cat /sys/class/net/enp94s0f0/device/numa_node
0
root@sys-6029u-trt:~# cat /sys/class/net/enp94s0f1/device/numa_node
0
root@sys-6029u-trt:~# lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71

Note, this lets us set affinity to keep the iperf processes on CPUs in the NUMA node the NIC is attached to.
root@sys-6029u-trt:~# grep -E '^cpu MHz' /proc/cpuinfo
cpu MHz         : 1000.000
for x in `seq 0 71`; do
    cpufreq-set -r -g performance -c $x
done
grep -E '^cpu MHz' /proc/cpuinfo
cpu MHz         : 2301.000

Note, the cores should now report frequencies at or above the CPU max.
lspci -s 04:00.0 -vvv | grep Speed
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited
        LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
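As a sanity check, the Gen3 x16 link reported above has enough headroom for a 100Gb NIC. A quick sketch of the arithmetic (8 GT/s per lane, 16 lanes, 128b/130b encoding):

```shell
# Theoretical throughput of the PCIe Gen3 x16 link reported by lspci:
# 8 GT/s per lane, 16 lanes, 128b/130b line encoding, 8 bits per byte.
awk 'BEGIN { printf "%.1f GB/s\n", 8 * 16 * (128 / 130) / 8 }'
```

That comes to roughly 15.8 GB/s (about 126 Gb/s), comfortably above 100Gb/s; a card in an x8 slot or a Gen2 slot would bottleneck.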
lspci -s 04:00.0 -vvv | grep MaxReadReq
        MaxPayload 256 bytes, MaxReadReq 512 bytes
setpci -s 04:00.0 68.w
2936
setpci -s 04:00.0 68.w=5936
lspci -s 04:00.0 -vvv | grep MaxReadReq
        MaxPayload 256 bytes, MaxReadReq 4096 bytes
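The register values above are not arbitrary: on this adapter, offset 0x68 lands on the PCIe Device Control register (the exact offset depends on where the PCIe capability sits in config space), and bits 14:12 of that register encode the maximum read request size. A small sketch decoding the two values used with setpci:

```shell
# Decode the Max_Read_Request_Size field (bits 14:12) of the PCIe
# Device Control register values read/written with setpci above.
decode_mrr() {
    # An encoded field value of N means (128 << N) bytes:
    # 0 -> 128, 2 -> 512, 5 -> 4096
    printf '%d\n' $(( 128 << (($1 >> 12) & 0x7) ))
}
decode_mrr 0x2936   # before tuning -> 512 bytes
decode_mrr 0x5936   # after tuning  -> 4096 bytes
```

Only the top nibble changes between 0x2936 and 0x5936; the other fields of the register are left as they were.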
root@sys-6029u-trt:~# sysctl net.core.rmem_max=563870912
net.core.rmem_max = 563870912
root@sys-6029u-trt:~# sysctl net.core.wmem_max=563870912
net.core.wmem_max = 563870912
root@sys-6029u-trt:~# sysctl net.ipv4.tcp_rmem="4096 87380 268435456"
net.ipv4.tcp_rmem = 4096 87380 268435456
root@sys-6029u-trt:~# sysctl net.ipv4.tcp_wmem="4096 87380 268435456"
net.ipv4.tcp_wmem = 4096 87380 268435456
root@sys-6029u-trt:~# sysctl net.core.netdev_max_backlog=300000
net.core.netdev_max_backlog = 300000
# sysctl net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_no_metrics_save = 1
# sysctl net.ipv4.tcp_congestion_control=htcp
net.ipv4.tcp_congestion_control = htcp
# sysctl net.ipv4.tcp_mtu_probing=1
net.ipv4.tcp_mtu_probing = 1
# sysctl net.core.default_qdisc=fq
net.core.default_qdisc = fq
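The sysctl changes above take effect immediately but do not survive a reboot. To make them persistent, the same settings can be dropped into a sysctl configuration file; the filename below is just an example:

```shell
# /etc/sysctl.d/90-100gb-tuning.conf (example filename)
# Values match the interactive sysctl commands used above.
net.core.rmem_max = 563870912
net.core.wmem_max = 563870912
net.ipv4.tcp_rmem = 4096 87380 268435456
net.ipv4.tcp_wmem = 4096 87380 268435456
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_congestion_control = htcp
net.ipv4.tcp_mtu_probing = 1
net.core.default_qdisc = fq
```

Load it without rebooting via "sysctl -p /etc/sysctl.d/90-100gb-tuning.conf".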
# ethtool -K enp216s0f0 lro on
# ethtool -K enp216s0f1 lro on
# ifconfig enp216s0f0 txqueuelen 20000
# ifconfig enp216s0f1 txqueuelen 20000
# ip link set enp216s0f0 mtu 9000
# ip link set enp216s0f1 mtu 9000
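After raising the MTU it is worth confirming that 9000-byte frames actually pass end-to-end, since any switch in the path must also be configured for jumbo frames. A don't-fragment ping with a payload sized to fill the MTU does this; the sketch below just computes and prints the command (172.16.21.1 is the test target address used elsewhere in this guide):

```shell
# Build a don't-fragment ping that exercises the full 9000-byte MTU.
# Payload = MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
MTU=9000
PAYLOAD=$(( MTU - 20 - 8 ))
echo "ping -M do -c 3 -s $PAYLOAD 172.16.21.1"
```

If the path is not jumbo-clean, running the printed command reports that the message is too long instead of getting replies.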
Note, this only applies to Mellanox cards. See Issue #1241056 in the driver release notes.

# ethtool -C enp216s0f0 adaptive-rx off
# ethtool -C enp216s0f0 rx-usecs 8 rx-frames 128
# systemctl stop irqbalance
# systemctl status irqbalance | grep Active
   Active: inactive (dead) since Tue 2018-04-17 20:58:16 UTC; 23s ago
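With irqbalance stopped, the NIC's interrupts can be pinned manually by writing a CPU mask to /proc/irq/&lt;N&gt;/smp_affinity (Mellanox's driver tools also ship a set_irq_affinity helper script for this). A sketch of building the mask for NUMA node 0's CPUs, which on this system are 0-17 and 36-53:

```shell
# Build the hex affinity mask covering NUMA node 0's CPUs on this system
# (0-17 and 36-53, per the lscpu output gathered earlier).
mask=0
for cpu in $(seq 0 17) $(seq 36 53); do
    mask=$(( mask | (1 << cpu) ))
done
printf '%x\n' "$mask"
# The mask can then be written, as root, to /proc/irq/<N>/smp_affinity
# for each of the NIC's interrupt numbers (see /proc/interrupts).
```

Note the interrupt numbers themselves are device- and boot-specific, so they are not filled in here.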
Note, the example below shows tests run over 60 seconds. Actual certification testing requires a test run of 1 hour per port. Thus for certification testing you would need to use "-t 3600" rather than "-t 60".
On the iperf target server, start 4 iperf3 daemons on different ports, pinned to NUMA Node 0 cores (see #2 above)
# iperf3 -sD -B 172.16.21.2 -p5101 -A0
# iperf3 -sD -B 172.16.21.2 -p5102 -A14
# iperf3 -sD -B 172.16.21.2 -p5103 -A36
# iperf3 -sD -B 172.16.21.2 -p5104 -A52

Note, we're using -A to ensure each process runs on a CPU core on the same NUMA node our 100Gb NIC is attached to.

On the System Under Test, kick off four iperf3 processes, one for each remote port. Please note that this is for example only; it is easier and neater to perform this using the "parallel" tool, as noted in the next section.
$ iperf3 -c 172.16.21.1 -O 15 -t 60 -p 5101 -R -i 60 -T s1 &
  iperf3 -c 172.16.21.1 -O 15 -t 60 -p 5102 -R -i 60 -T s2 &
  iperf3 -c 172.16.21.1 -O 15 -t 60 -p 5103 -R -i 60 -T s3 &
  iperf3 -c 172.16.21.1 -O 15 -t 60 -p 5104 -R -i 60 -T s4 &

This is abbreviated output:
s4: [ ID] Interval           Transfer     Bandwidth       Retr
s4: [  4]   0.00-60.00  sec   161 GBytes  23.1 Gbits/sec  18726             sender
s4: [  4]   0.00-60.00  sec   161 GBytes  23.1 Gbits/sec                  receiver
s4:
s4: iperf Done.
s3: [ ID] Interval           Transfer     Bandwidth
s3: [  4]   0.00-60.00  sec   160 GBytes  22.9 Gbits/sec
s3: - - - - - - - - - - - - - - - - - - - - - - - - -
s3: [ ID] Interval           Transfer     Bandwidth       Retr
s3: [  4]   0.00-60.00  sec   160 GBytes  22.9 Gbits/sec  16953             sender
s3: [  4]   0.00-60.00  sec   160 GBytes  22.9 Gbits/sec                  receiver
s3:
s3: iperf Done.
s2: [ ID] Interval           Transfer     Bandwidth
s2: [  4]   0.00-60.00  sec   163 GBytes  23.3 Gbits/sec
s2: - - - - - - - - - - - - - - - - - - - - - - - - -
s2: [ ID] Interval           Transfer     Bandwidth       Retr
s2: [  4]   0.00-60.00  sec   163 GBytes  23.3 Gbits/sec  17582             sender
s2: [  4]   0.00-60.00  sec   163 GBytes  23.3 Gbits/sec                  receiver
s1: [ ID] Interval           Transfer     Bandwidth
s2:
s2: iperf Done.
s1: [  4]   0.00-60.00  sec   159 GBytes  22.7 Gbits/sec
s1: - - - - - - - - - - - - - - - - - - - - - - - - -
s1: [ ID] Interval           Transfer     Bandwidth       Retr
s1: [  4]   0.00-60.00  sec   159 GBytes  22.7 Gbits/sec  17869             sender
s1: [  4]   0.00-60.00  sec   159 GBytes  22.7 Gbits/sec                  receiver

The average bandwidth over 60 seconds for all four threads adds up to 92Gb/s.
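The aggregate figure is just the sum of the four receiver rates; a one-liner to check the arithmetic:

```shell
# Sum the per-stream receiver bandwidths reported above (Gbits/sec).
awk 'BEGIN { printf "%.1f Gbits/sec\n", 23.1 + 22.9 + 23.3 + 22.7 }'
```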
# sudo apt-get -y install parallel
# cat commands.txt
iperf3 -c 172.16.21.1 -O 15 -t 30 -p 5101 -R -i 60 -T s1
iperf3 -c 172.16.21.1 -O 15 -t 30 -p 5102 -R -i 60 -T s2
iperf3 -c 172.16.21.1 -O 15 -t 30 -p 5103 -R -i 60 -T s3
iperf3 -c 172.16.21.1 -O 15 -t 30 -p 5104 -R -i 60 -T s4
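Rather than maintaining commands.txt by hand, it can be generated with a loop; this sketch assumes the same target address and 5101-5104 port numbering used above:

```shell
# Generate one iperf3 client command per stream (ports 5101-5104),
# matching the hand-written commands.txt shown above.
for i in 1 2 3 4; do
    echo "iperf3 -c 172.16.21.1 -O 15 -t 30 -p 510$i -R -i 60 -T s$i"
done > commands.txt
cat commands.txt
```

Adjusting the loop range is all that is needed to scale to more streams (the matching iperf3 server daemons must exist for any added ports).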
# parallel -a commands.txt | tee -a 100Gb-Port0.log
When using programs that use GNU Parallel to process data for publication please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
Or you can get GNU Parallel without this requirement by paying 10000 EUR.

To silence this citation notice run 'parallel --bibtex' once or use '--no-notice'.

s1: Connecting to host 172.16.21.1, port 5101
s1: Reverse mode, remote host 172.16.21.1 is sending
s1: [  4] local 172.16.21.11 port 47762 connected to 172.16.21.1 port 5101
s1: [ ID] Interval           Transfer     Bandwidth
s1: [  4]   0.00-30.00  sec  74.4 GBytes  21.3 Gbits/sec
s1: - - - - - - - - - - - - - - - - - - - - - - - - -
s1: [ ID] Interval           Transfer     Bandwidth       Retr
s1: [  4]   0.00-30.00  sec  74.5 GBytes  21.3 Gbits/sec  39793             sender
s1: [  4]   0.00-30.00  sec  74.4 GBytes  21.3 Gbits/sec                  receiver
s1:
s1: iperf Done.
s2: Connecting to host 172.16.21.1, port 5102
s2: Reverse mode, remote host 172.16.21.1 is sending
s2: [  4] local 172.16.21.11 port 33354 connected to 172.16.21.1 port 5102
s2: [ ID] Interval           Transfer     Bandwidth
s2: [  4]   0.00-30.00  sec  79.6 GBytes  22.8 Gbits/sec
s2: - - - - - - - - - - - - - - - - - - - - - - - - -
s2: [ ID] Interval           Transfer     Bandwidth       Retr
s2: [  4]   0.00-30.00  sec  79.7 GBytes  22.8 Gbits/sec  43638             sender
s2: [  4]   0.00-30.00  sec  79.6 GBytes  22.8 Gbits/sec                  receiver
s2:
s2: iperf Done.
s3: Connecting to host 172.16.21.1, port 5103
s3: Reverse mode, remote host 172.16.21.1 is sending
s3: [  4] local 172.16.21.11 port 57094 connected to 172.16.21.1 port 5103
s3: [ ID] Interval           Transfer     Bandwidth
s3: [  4]   0.00-30.00  sec  75.3 GBytes  21.6 Gbits/sec
s3: - - - - - - - - - - - - - - - - - - - - - - - - -
s3: [ ID] Interval           Transfer     Bandwidth       Retr
s3: [  4]   0.00-30.00  sec  75.4 GBytes  21.6 Gbits/sec  41230             sender
s3: [  4]   0.00-30.00  sec  75.3 GBytes  21.6 Gbits/sec                  receiver
s3:
s3: iperf Done.
s4: Connecting to host 172.16.21.1, port 5104
s4: Reverse mode, remote host 172.16.21.1 is sending
s4: [  4] local 172.16.21.11 port 59674 connected to 172.16.21.1 port 5104
s4: [ ID] Interval           Transfer     Bandwidth
s4: [  4]   0.00-30.00  sec  75.7 GBytes  21.7 Gbits/sec
s4: - - - - - - - - - - - - - - - - - - - - - - - - -
s4: [ ID] Interval           Transfer     Bandwidth       Retr
s4: [  4]   0.00-30.00  sec  75.8 GBytes  21.7 Gbits/sec  41177             sender
s4: [  4]   0.00-30.00  sec  75.7 GBytes  21.7 Gbits/sec                  receiver
s4:
s4: iperf Done.