03 Jul, 2019

1 commit

  • Adds support for fq's Earliest Departure Time to HBM (Host Bandwidth
    Manager). Includes a new BPF program supporting EDT, and also updates
    corresponding programs.
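    In the EDT model used by the fq qdisc, each packet carries a timestamp
    (skb->tstamp) giving the earliest time it may be sent, and fq holds the
    packet until then. The following is a minimal sketch of how a rate limiter
    can compute such timestamps. It assumes a single per-cgroup rate in Mbps
    and nanosecond clocks. The struct and function names are hypothetical:
    this shows the general technique, not the code of the new BPF program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cgroup limiter state (illustrative, not the HBM maps). */
struct edt_state {
    uint64_t next_tstamp_ns; /* earliest time the next packet may depart */
    uint64_t rate_mbps;      /* configured rate limit in Mbps */
};

/* Time to send `len` bytes at `rate_mbps`, in nanoseconds:
 * len * 8 bits / (rate_mbps * 1e6 bits/s) * 1e9 ns/s = len * 8000 / rate_mbps */
uint64_t tx_time_ns(uint32_t len, uint64_t rate_mbps)
{
    assert(rate_mbps > 0);
    return (uint64_t)len * 8000 / rate_mbps;
}

/* Stamp a packet: it may leave no earlier than now, and no earlier than
 * the departure slot left behind by the previous packet. Advancing the
 * slot by this packet's transmission time is what enforces the rate. */
uint64_t assign_edt(struct edt_state *s, uint64_t now_ns, uint32_t len)
{
    uint64_t edt = s->next_tstamp_ns > now_ns ? s->next_tstamp_ns : now_ns;

    s->next_tstamp_ns = edt + tx_time_ns(len, s->rate_mbps);
    return edt;
}
```

    With a 1000 Mbps limit, a 1500-byte packet advances the next slot by
    12us, so back-to-back packets get EDTs 12us apart. How far an EDT lies
    beyond "now" is the quantity that the 500us and 50us thresholds are
    compared against.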

    It drops packets whose EDT is more than 500us in the future, unless the
    packet belongs to a flow with fewer than 2 packets in flight. This ensures
    that each flow has at least 2 packets in flight, so flows do not starve,
    and it also helps prevent delayed-ACK timeouts.

    It also works with ECN-enabled traffic: packets are CE marked if their
    EDT is more than 50us in the future.
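    The drop and mark rules above can be sketched as a single decision
    function. This is a minimal illustration, not the program's code. The
    names are hypothetical, and a flow with fewer than 2 packets in flight
    is treated as exempt from dropping. The drop check runs first, so an
    ECN-capable packet more than 500us ahead is still dropped when its flow
    has 2 or more packets in flight. The text does not say which rule wins
    in that case, so this ordering is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Thresholds from the description; macro names are hypothetical. */
#define DROP_THRESH_NS (500 * 1000ULL) /* drop if EDT > 500us ahead */
#define MARK_THRESH_NS (50 * 1000ULL)  /* CE mark if EDT > 50us ahead */
#define MIN_INFLIGHT   2               /* flows below this are never dropped */

static_assert(MARK_THRESH_NS < DROP_THRESH_NS,
              "marking should start before dropping");

enum edt_action { EDT_PASS, EDT_MARK, EDT_DROP };

/* Decide what to do with a packet whose EDT is edt_ns at time now_ns. */
enum edt_action edt_decide(uint64_t edt_ns, uint64_t now_ns,
                           uint32_t pkts_in_flight, bool ecn_capable)
{
    /* How far in the future the packet is scheduled to leave. */
    uint64_t ahead = edt_ns > now_ns ? edt_ns - now_ns : 0;

    /* Never drop below MIN_INFLIGHT packets, so the flow cannot starve. */
    if (ahead > DROP_THRESH_NS && pkts_in_flight >= MIN_INFLIGHT)
        return EDT_DROP;
    /* ECN-capable packets are marked once they are queued too far ahead. */
    if (ecn_capable && ahead > MARK_THRESH_NS)
        return EDT_MARK;
    return EDT_PASS;
}
```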

    The table below shows some performance numbers. The flows are back-to-back
    RPCs from one server to another, with either 2 or 4 flows. One flow uses
    10KB RPCs; the rest use 1MB RPCs. When there is more than one flow of a
    given RPC size, the numbers are averages.

    The rate limit applies to all flows (they are in the same cgroup).
    Tests ending with "-edt" ran with the new BPF program supporting EDT.
    Tests ending with "-htb" ran on top of the HTB qdisc with the specified
    rate (i.e. no HBM). The other tests ran with the HBM BPF program included
    in the HBM patch-set.

    EDT has limited value when using DCTCP, but it helps in many cases when
    using Cubic. It usually achieves higher link utilization and lower P99
    latencies for the 1MB RPCs.
    HTB ends up queueing a lot of packets with its default parameter values,
    reducing the goodput of the 10KB RPCs and increasing their latency. Also,
    the RTTs seen by the flows are quite large.

                             Aggr                 10K    10K    10K    1MB    1MB    1MB
               Limit         rate   drops  RTT    rate   P90    P99    rate   P90    P99
    Test       rate   Flows  Mbps   %      us     Mbps   us     us     Mbps   ms     ms
    ---------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
    cubic      1G     2      904    0.02   108    257    511    539    647    13.4   24.5
    cubic-edt  1G     2      982    0.01   156    239    656    967    743    14.0   17.2
    dctcp      1G     2      977    0.00   105    324    408    744    653    14.5   15.9
    dctcp-edt  1G     2      981    0.01   142    321    417    811    660    15.7   17.0
    cubic-htb  1G     2      919    0.00   1825   40     2822   4140   879    9.7    9.9

    cubic      200M   2      155    0.30   220    81     532    655    74     283    450
    cubic-edt  200M   2      188    0.02   222    87     1035   1095   101    84     85
    dctcp      200M   2      188    0.03   111    77     912    939    111    76     325
    dctcp-edt  200M   2      188    0.03   217    74     1416   1738   114    76     79
    cubic-htb  200M   2      188    0.00   5015   8      14ms   15ms   180    48     50

    cubic      1G     4      952    0.03   110    165    516    546    262    38     154
    cubic-edt  1G     4      973    0.01   190    111    1034   1314   287    65     79
    dctcp      1G     4      951    0.00   103    180    617    905    257    37     38
    dctcp-edt  1G     4      967    0.00   163    151    732    1126   272    43     55
    cubic-htb  1G     4      914    0.00   3249   13     7ms    8ms    300    29     34

    cubic      5G     4      4236   0.00   134    305    490    624    1310   10     17
    cubic-edt  5G     4      4865   0.00   156    306    425    759    1520   10     16
    dctcp      5G     4      4936   0.00   128    485    221    409    1484   7      9
    dctcp-edt  5G     4      4924   0.00   148    390    392    623    1508   11     26
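    Link utilization here is the aggregate rate divided by the limit rate.
    For example, with a 1G limit and 2 flows, cubic reaches 904 Mbps (90.4%)
    while cubic-edt reaches 982 Mbps (98.2%). At 200M the figures are 155
    vs. 188 Mbps (77.5% vs. 94.0%). A small helper for that arithmetic,
    assuming a "1G" limit means 1000 Mbps:

```c
#include <assert.h>

/* Percentage of the rate limit actually achieved. Both arguments are in
 * Mbps; a "1G" limit is taken to be 1000 Mbps (an assumption). */
double utilization_pct(double aggr_mbps, double limit_mbps)
{
    assert(limit_mbps > 0);
    return 100.0 * aggr_mbps / limit_mbps;
}
```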

    v1 -> v2: Incorporated Andrii's suggestions
    v2 -> v3: Incorporated Yonghong's suggestions
    v3 -> v4: Removed credit update that is not needed

    Signed-off-by: Lawrence Brakmo
    Acked-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    brakmo
     

01 Jun, 2019

1 commit


03 Mar, 2019

1 commit

  • Script for testing HBM (Host Bandwidth Manager) framework.
    It creates a cgroup to use for testing and loads a BPF program to limit
    egress bandwidth. It then uses iperf3 or netperf to create
    loads. The output is the goodput in Mbps (unless -D is used).

    It can work on a single host using loopback or between two hosts (with
    netperf). When using loopback, it is recommended to also introduce a delay
    of at least 1ms (-d=1); otherwise the assigned bandwidth is likely to be
    underutilized.

    USAGE: $name [out] [-b=|--bpf=] [-c=|--cc=] [-D]
    [-d=|--delay=] [--debug] [-E]
    [-f=|--flows=] [-h] [-i=|--id=] [-l]
    [-N] [-p=|--port=] [-P] [-q=]
    [-r=|--rate=] [-R] [-s=|--server=] [--stats]
    [-t=|--time=] [-w] [cubic|dctcp]
    Where:
    out Egress (default)
    -b or --bpf BPF program filename to load and attach.
    Default is nrm_out_kern.o for egress.
    -c or --cc TCP congestion control (cubic or dctcp)
    -d or --delay Add a delay in ms using netem
    -D In addition to the goodput in Mbps, it also outputs
    other detailed information. This information is
    test dependent (i.e. iperf3 or netperf).
    --debug Print BPF trace buffer
    -E Enable ECN (not required for dctcp)
    -f or --flows Number of concurrent flows (default=1)
    -i or --id cgroup id (an integer, default is 1)
    -l Also limit flows that use loopback
    -N Use netperf instead of iperf3
    -h Help
    -p or --port iperf3 port (default is 5201)
    -P Use an iperf3 instance for each flow
    -q Use the specified qdisc.
    -r or --rate Rate in Mbps (default is 1Gbps)
    -R Use TCP_RR for netperf. The first flow has a request
    size of 10KB, the rest 1MB. The reply in all
    cases is 1 byte.
    More detailed output for each flow can be found
    in the files netperf.., where the first field is the
    cgroup id as specified with the -i flag, and the second
    is the flow id, starting at 1 and increasing by 1 for
    each flow (as specified by -f).
    -s or --server Hostname of the netperf server. Used to create netperf
    test traffic between two hosts (default is within the host).
    netserver must be running on that host.
    --stats Get HBM stats (marked, dropped, etc.)
    -t or --time Duration of the test in seconds (default=5)
    -w Work-conserving flag. cgroup can increase its
    bandwidth beyond the rate limit specified
    while there is available bandwidth. Current
    implementation assumes there is only one NIC
    (eth0), but can be extended to support multiple
    NICs. This is just a proof of concept.
    cubic or dctcp specify TCP CC to use

    Examples:
    ./do_hbm_test.sh -l -d=1 -D --stats
    Runs a 5-second test using a single iperf3 flow, the default rate
    limit of 1Gbps, a delay of 1ms (using netem) and the default TCP
    congestion control, on the loopback device (hence "-l", to enforce
    the bandwidth limit on the loopback device). Since no direction is
    specified, it defaults to egress. Since no TCP CC algorithm is
    specified, it uses the system default (Cubic for this test).
    Without the -D flag, only the AGGREGATE_GOODPUT value would be shown.
    id refers to the cgroup id and is useful when running multi-cgroup
    tests (supported by a future patch).
    This patchset does not support calling TCP's congestion window
    reduction, even when packets are dropped by the BPF program, which
    results in a large number of dropped packets. It is recommended that
    the current HBM implementation only be used with ECN-enabled flows.
    A future patch will add support for reducing TCP's cwnd and will
    improve the performance of non-ECN-enabled flows.
    Output:
    Details for HBM in cgroup 1
    id:1
    rate_mbps:493
    duration:4.8 secs
    packets:11355
    bytes_MB:590
    pkts_dropped:4497
    bytes_dropped_MB:292
    pkts_marked_percent: 39.60
    bytes_marked_percent: 49.49
    pkts_dropped_percent: 39.60
    bytes_dropped_percent: 49.49
    PING AVG DELAY:2.075
    AGGREGATE_GOODPUT:505
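    The percentage fields follow from the raw counters: 4497 of 11355
    packets is 39.60%, and 292 of 590 MB is 49.49%. Likewise, the dctcp
    run drops 1 of 16859 packets, about 0.006%, which prints as 0.01. The
    helper below reproduces that arithmetic as an illustrative check. It is
    not taken from the script, and its name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Share of `part` in `total`, in percent (e.g. dropped vs. sent packets). */
double pct_of(uint64_t part, uint64_t total)
{
    assert(total > 0);
    return 100.0 * (double)part / (double)total;
}
```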

    ./do_hbm_test.sh -l -d=1 -D --stats dctcp
    Same as above but using dctcp. Note that fewer bytes are dropped
    (0.01% vs. 49%).
    Output:
    Details for HBM in cgroup 1
    id:1
    rate_mbps:945
    duration:4.9 secs
    packets:16859
    bytes_MB:578
    pkts_dropped:1
    bytes_dropped_MB:0
    pkts_marked_percent: 28.74
    bytes_marked_percent: 45.15
    pkts_dropped_percent: 0.01
    bytes_dropped_percent: 0.01
    PING AVG DELAY:2.083
    AGGREGATE_GOODPUT:965

    ./do_hbm_test.sh -d=1 -D --stats
    As the first example, but without limiting the loopback device (i.e. no
    "-l" flag). Since there is no bandwidth limiting, no details for
    HBM are printed out.
    Output:
    Details for HBM in cgroup 1
    PING AVG DELAY:2.019
    AGGREGATE_GOODPUT:42655

    ./do_hbm_test.sh -l -d=1 -D --stats -f=2
    Uses iperf3 with 2 flows.
    ./do_hbm_test.sh -l -d=1 -D --stats -f=4 -P
    Uses iperf3 with 4 flows, each flow in a separate process.
    ./do_hbm_test.sh -l -d=1 -D --stats -f=4 -N
    Uses netperf with 4 flows.
    ./do_hbm_test.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=
    Uses netperf between two hosts. The remote host name is specified
    with -s=, and you need to start netserver manually on the remote
    host. It uses 1 flow, a rate limit of 2Gbps and dctcp.
    ./do_hbm_test.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \
    -s=
    As the previous example, but allows use of extra bandwidth. For this
    test the rate is 8Gbps vs. 1Gbps of the previous test.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: Alexei Starovoitov

    brakmo