10 Jan, 2014

15 commits


08 Jan, 2014

12 commits

  • The ct expression currently cannot be used in the inet family, since
    we don't have a conntrack module for NFPROTO_INET, so
    nf_ct_l3proto_try_module_get() fails. Add some manual handling to
    load the modules for both NFPROTO_IPV4 and NFPROTO_IPV6 if the
    ct expression is used in the inet family.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
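A minimal userspace sketch of that fallback, assuming stubbed module handling: for the inet family it takes a reference on the IPv4 conntrack module, then on the IPv6 one, and drops the IPv4 reference again if IPv6 fails. try_l3proto_get(), l3proto_put(), ct_get_l3proto() and the refcnt[]/available[] arrays are hypothetical stand-ins, not the kernel's nf_ct_l3proto_try_module_get() API.

```c
#include <errno.h>
#include <stdbool.h>

/* Family numbers with the same values as NFPROTO_INET/IPV4/IPV6. */
enum { SK_NFPROTO_INET = 1, SK_NFPROTO_IPV4 = 2, SK_NFPROTO_IPV6 = 10 };

/* Hypothetical module state: a reference count per family and whether a
 * conntrack module exists for it (none exists for the inet family). */
static int refcnt[16];
static bool available[16] = { [SK_NFPROTO_IPV4] = true, [SK_NFPROTO_IPV6] = true };

/* Hypothetical stand-in for nf_ct_l3proto_try_module_get(). */
static int try_l3proto_get(int family)
{
	if (family < 0 || family >= 16 || !available[family])
		return -EAFNOSUPPORT;
	refcnt[family]++;
	return 0;
}

static void l3proto_put(int family)
{
	refcnt[family]--;
}

/* There is no module for the inet family itself, so take references on
 * both the IPv4 and the IPv6 modules instead. */
static int ct_get_l3proto(int family)
{
	int err;

	if (family != SK_NFPROTO_INET)
		return try_l3proto_get(family);

	err = try_l3proto_get(SK_NFPROTO_IPV4);
	if (err < 0)
		return err;
	err = try_l3proto_get(SK_NFPROTO_IPV6);
	if (err < 0) {
		l3proto_put(SK_NFPROTO_IPV4);   /* undo the IPv4 reference */
		return err;
	}
	return 0;
}
```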
     
  • For L3-proto independent rules we need to get at the L4 protocol value
    directly. Add it to the nft_pktinfo struct and use the meta expression
    to retrieve it.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
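The idea can be sketched with a heavily reduced, hypothetical packet-info struct (pktinfo_sketch) and getter (meta_l4proto()); the real nft_pktinfo and meta expression carry much more state. The L4 protocol number is stored once per packet, so a rule reads the same value for TCP over IPv4 and TCP over IPv6 without parsing either L3 header itself.

```c
#include <stdint.h>

/* Hypothetical reduced packet info: the L3 family that parsed the packet
 * and the transport protocol number it found. */
struct pktinfo_sketch {
	uint8_t nfproto;   /* 2 for IPv4, 10 for IPv6 (NFPROTO_* values) */
	uint8_t l4proto;   /* 6 for TCP, 17 for UDP (IPPROTO_* values) */
};

/* Meta-style getter: an L3-independent rule only looks at l4proto. */
static uint32_t meta_l4proto(const struct pktinfo_sketch *pkt)
{
	return pkt->l4proto;
}
```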
     
  • Needed by multi-family tables to distinguish IPv4 and IPv6 packets.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • This patch adds a new table family and a new filter chain that you can
    use to attach IPv4 and IPv6 rules. This should help to simplify
    rule-set maintenance in dual-stack setups.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
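A hypothetical sketch of how one chain in such a dual-stack table might walk a shared rule list for both families. Rules that leave nfproto at 0 apply to IPv4 and IPv6 alike, while a rule that sets it (the kind of family check multi-family tables need) only fires for one family. The rule and packet structs and inet_chain_eval() are illustrative assumptions, not the nf_tables data structures.

```c
#include <stddef.h>
#include <stdint.h>

enum { SK_NFPROTO_IPV4 = 2, SK_NFPROTO_IPV6 = 10 };
enum verdict_sketch { V_CONTINUE, V_ACCEPT, V_DROP };

struct pkt_sketch {
	uint8_t nfproto;
	uint8_t l4proto;
	uint16_t dport;
};

/* Hypothetical rule: a match field of 0 means "any". */
struct rule_sketch {
	uint8_t nfproto;
	uint8_t l4proto;
	uint16_t dport;
	enum verdict_sketch verdict;
};

/* Both IPv4 and IPv6 packets walk the same rules; only rules that set
 * nfproto are family-specific. Falls back to the chain policy. */
static enum verdict_sketch inet_chain_eval(const struct rule_sketch *rules, size_t n,
					   const struct pkt_sketch *pkt,
					   enum verdict_sketch policy)
{
	for (size_t i = 0; i < n; i++) {
		const struct rule_sketch *r = &rules[i];

		if (r->nfproto && r->nfproto != pkt->nfproto)
			continue;
		if (r->l4proto && r->l4proto != pkt->l4proto)
			continue;
		if (r->dport && r->dport != pkt->dport)
			continue;
		if (r->verdict != V_CONTINUE)
			return r->verdict;
	}
	return policy;
}
```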
     
  • Add support to register chains to multiple hooks for different address
    families for mixed IPv4/IPv6 tables.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
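A sketch of the registration loop, assuming a chain keeps one hook-ops entry per address family it is attached to. hook_ops_sketch, chain_sketch, register_hook_sketch() and chain_register() are hypothetical stand-ins for netfilter's own hook registration.

```c
#include <stddef.h>

/* Hypothetical hook-ops descriptor: one entry per (family, hook) pair. */
struct hook_ops_sketch {
	int pf;          /* protocol family, e.g. 2 = IPv4, 10 = IPv6 */
	int hooknum;     /* which hook point, e.g. input */
	void *priv;      /* back-pointer to the owning chain */
};

#define MAX_FAMILIES 2

/* Hypothetical chain that may sit on several families' hooks at once. */
struct chain_sketch {
	struct hook_ops_sketch ops[MAX_FAMILIES];
	size_t nops;
};

/* Hypothetical registry standing in for the real hook lists. */
static struct hook_ops_sketch *registered[16];
static size_t nregistered;

static int register_hook_sketch(struct hook_ops_sketch *ops)
{
	if (nregistered == sizeof(registered) / sizeof(registered[0]))
		return -1;
	registered[nregistered++] = ops;
	return 0;
}

/* Register one hook-ops entry per family so a mixed IPv4/IPv6 table gets
 * the same chain on both stacks. A real implementation would also
 * unregister the entries already added when a later one fails. */
static int chain_register(struct chain_sketch *chain, const int *families,
			  size_t nfamilies, int hooknum)
{
	if (nfamilies > MAX_FAMILIES)
		return -1;
	chain->nops = 0;
	for (size_t i = 0; i < nfamilies; i++) {
		struct hook_ops_sketch *ops = &chain->ops[i];

		ops->pf = families[i];
		ops->hooknum = hooknum;
		ops->priv = chain;
		if (register_hook_sketch(ops) < 0)
			return -1;
		chain->nops++;
	}
	return 0;
}
```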
     
  • Multi-family tables need the AF from the hook ops. Add a pointer to the
    hook ops and replace usage of the hooknum member in struct nft_pktinfo.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
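A reduced sketch of that change, with hypothetical struct names: rather than copying only the hook number into the packet info, the packet info points at the hook ops, so both the hook number and the address family are reachable from a chain that serves several families.

```c
/* Hypothetical reduced hook ops: carries the family and the hook number. */
struct hook_ops_sketch {
	int pf;
	unsigned int hooknum;
};

/* Packet info keeps a pointer to the ops instead of a hooknum copy. */
struct pktinfo_sketch {
	const struct hook_ops_sketch *ops;
};

static int pkt_family(const struct pktinfo_sketch *pkt)
{
	return pkt->ops->pf;
}

static unsigned int pkt_hooknum(const struct pktinfo_sketch *pkt)
{
	return pkt->ops->hooknum;
}
```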
     
  • Currently the AF-specific hook functions override the chain-type specific
    hook functions. That doesn't make too much sense since the chain types
    are a special case of the AF-specific hooks.

    Make the AF-specific hook functions the default and make the optional
    chain type hooks override them.

    As a side effect, the necessary code restructuring reduces the code size,
    for instance in the case of nf_tables_ipv4.o:

    nf_tables_ipv4_init_net | -24
    nft_do_chain_ipv4 | -113
    2 functions changed, 137 bytes removed, diff: -137

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
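The new precedence rule fits in a few lines. This is a sketch with hypothetical types (afinfo_sketch, chain_type_sketch) and placeholder hook functions; it shows only the selection order, where the family's hook is the default and a chain type's hook, when present, overrides it.

```c
#include <stddef.h>

typedef unsigned int (*hookfn_sketch)(void *pkt);

#define NHOOKS_SKETCH 5

/* Hypothetical per-family and per-chain-type hook tables. */
struct afinfo_sketch {
	hookfn_sketch hooks[NHOOKS_SKETCH];
};
struct chain_type_sketch {
	hookfn_sketch hooks[NHOOKS_SKETCH];   /* NULL means "no override" */
};

/* Placeholder hook functions so the selection can be observed. */
static unsigned int af_default_hook(void *pkt) { (void)pkt; return 1; }
static unsigned int nat_type_hook(void *pkt) { (void)pkt; return 2; }

/* The family hook is the default; a chain type only replaces it when it
 * provides its own function for that hook. hooknum must be < NHOOKS_SKETCH. */
static hookfn_sketch select_hook(const struct afinfo_sketch *afi,
				 const struct chain_type_sketch *type,
				 unsigned int hooknum)
{
	if (type && type->hooks[hooknum])
		return type->hooks[hooknum];
	return afi->hooks[hooknum];
}
```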
     
  • net/netfilter/nft_reject.c: In function 'nft_reject_eval':
    net/netfilter/nft_reject.c:37:14: warning: unused variable 'net' [-Wunused-variable]

    Reported-by: kbuild test robot
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • There are many cases where tx-nocache-copy does not improve performance,
    or even reduces it.

    For example, here are the results from tests that I've run using 3.12.6 on one
    Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
    from the Xeon, but they're similar on the i7. All numbers report the
    mean±stddev over 10 runs of 10s.

    1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache
    copy from user on transmit"
    There is no statistically significant difference between tx-nocache-copy
    on/off.
    nic irqs spread out (one queue per cpu)

    200x netperf -r 1400,1
    tx-nocache-copy off
    692000±1000 tps
    50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
    tx-nocache-copy on
    693000±1000 tps
    50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7

    200x netperf -r 14000,14000
    tx-nocache-copy off
    86450±80 tps
    50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
    tx-nocache-copy on
    86110±60 tps
    50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20

    2) single stream throughput tests
    tx-nocache-copy leads to higher service demand

                            throughput  cpu0        cpu1         demand
                            (Mb/s)      (Gcycle)    (Gcycle)     (cycle/B)

    nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)

    tx-nocache-copy off     9402±5      9.4±0.2     -            0.80±0.01
    tx-nocache-copy on      9403±3      9.85±0.04   -            0.838±0.004

    nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)

    tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1      0.923±0.007
    tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009  0.958±0.002

    As a second example, here are some results from Eric Dumazet with the
    latest net-next. tx-nocache-copy also leads to higher service demand.

    (cpu is Intel(R) Xeon(R) CPU X5660 @ 2.80GHz)

    lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
    lpq83:~# perf stat ./netperf -H lpq84 -c
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

    87380  16384   16384    10.00    9407.44     2.50     -1.00    0.522   -1.000

    Performance counter stats for './netperf -H lpq84 -c':

    4282.648396 task-clock # 0.423 CPUs utilized
    9,348 context-switches # 0.002 M/sec
    88 CPU-migrations # 0.021 K/sec
    355 page-faults # 0.083 K/sec
    11,812,797,651 cycles # 2.758 GHz [82.79%]
    9,020,522,817 stalled-cycles-frontend # 76.36% frontend cycles idle [82.54%]
    4,579,889,681 stalled-cycles-backend # 38.77% backend cycles idle [67.33%]
    6,053,172,792 instructions # 0.51 insns per cycle
    # 1.49 stalled cycles per insn [83.64%]
    597,275,583 branches # 139.464 M/sec [83.70%]
    8,960,541 branch-misses # 1.50% of all branches [83.65%]

    10.128990264 seconds time elapsed

    lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
    lpq83:~# perf stat ./netperf -H lpq84 -c
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

    87380  16384   16384    10.00    9412.45     2.15     -1.00    0.449   -1.000

    Performance counter stats for './netperf -H lpq84 -c':

    2847.375441 task-clock # 0.281 CPUs utilized
    11,632 context-switches # 0.004 M/sec
    49 CPU-migrations # 0.017 K/sec
    354 page-faults # 0.124 K/sec
    7,646,889,749 cycles # 2.686 GHz [83.34%]
    6,115,050,032 stalled-cycles-frontend # 79.97% frontend cycles idle [83.31%]
    1,726,460,071 stalled-cycles-backend # 22.58% backend cycles idle [66.55%]
    2,079,702,453 instructions # 0.27 insns per cycle
    # 2.94 stalled cycles per insn [83.22%]
    363,773,213 branches # 127.757 M/sec [83.29%]
    4,242,732 branch-misses # 1.17% of all branches [83.51%]

    10.128449949 seconds time elapsed

    CC: Tom Herbert
    Signed-off-by: Benjamin Poirier
    Signed-off-by: David S. Miller

    Benjamin Poirier
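A worked cross-check of the single-stream table, assuming each Gcycle figure is the total over one 10 s run and that throughput is in Mb/s (9402 Mb/s is about 9.4 Gb/s, plausible for ixgbe). Under those assumptions the demand column agrees with total cycles divided by bytes sent; service_demand() is a hypothetical helper, not part of netperf.

```c
/* Service demand in cycles per byte for a fixed-length run, assuming
 *   bytes  = throughput (Mb/s) * 1e6 / 8 * duration (s)
 *   demand = total CPU cycles (summed over all CPUs) / bytes */
static double service_demand(double gcycles_total, double mbit_per_s, double seconds)
{
	double bytes = mbit_per_s * 1e6 / 8.0 * seconds;

	return gcycles_total * 1e9 / bytes;
}
```

For the "off" row with everything on cpu0, 9.4e9 cycles over 9402 Mb/s for 10 s gives about 0.80 cycle/B; for the split case the cpu0 and cpu1 cycles are summed before dividing.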
     
  • When lo is brought up, a new ifa is created. The devconf and neigh
    value bitfields should then be set, so that later changes to the default
    values do not affect lo's values.

    Note that ipv6 behaves the same way. Also note that this is likely
    not an issue in many distros (for example Fedora 19) because userspace
    sets the address on lo manually before bringing it up.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
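A sketch of the bitfield idea with hypothetical names: each config entry has a bit that, once set, stops the entry from following later changes to the default. Setting every bit when lo comes up keeps lo's values fixed, while a device whose bits are still clear picks up new defaults.

```c
#define NCONF_SKETCH 4

/* Hypothetical per-device config: values plus an "explicitly set" bitmap. */
struct devconf_sketch {
	int data[NCONF_SKETCH];
	unsigned long state;   /* bit i set: data[i] no longer follows the default */
};

/* Mark every entry as explicitly set. */
static void devconf_setall(struct devconf_sketch *cnf)
{
	cnf->state = (1UL << NCONF_SKETCH) - 1;
}

/* Propagate a changed default only to entries whose bit is still clear. */
static void devconf_default_changed(struct devconf_sketch *cnf, int i, int newval)
{
	if (!(cnf->state & (1UL << i)))
		cnf->data[i] = newval;
}
```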
     
  • This change makes it possible to follow a recommendation of RFC4942.

    - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
    as source addresses for ICMPv6 echo reply. This sysctl is false by default
    to preserve existing behavior.
    - Add inline check ipv6_anycast_destination().
    - Use them in icmpv6_echo_reply().

    Reference:
    RFC4942 - IPv6 Transition/Coexistence Security Considerations
    (http://tools.ietf.org/html/rfc4942#section-2.1.6)

    2.1.6. Anycast Traffic Identification and Security

    [...]
    To avoid exposing knowledge about the internal structure of the
    network, it is recommended that anycast servers now take advantage of
    the ability to return responses with the anycast address as the
    source address if possible.

    Signed-off-by: Francois-Xavier Le Bail
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    FX Le Bail
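A sketch of the resulting source-address choice under simplified, assumed inputs: the echo request's destination is reused as the reply source when it is a unicast address, or when it is an anycast address and anycast_src_echo_reply is enabled; otherwise the sketch returns NULL, standing for "let source address selection decide". Only the sysctl name comes from the commit; echo_req_sketch and echo_reply_saddr() are hypothetical and do not mirror icmpv6_echo_reply() line by line.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs to the source-address choice for an echo reply. */
struct echo_req_sketch {
	const char *daddr;       /* address the echo request was sent to */
	bool dst_is_unicast;     /* destination is one of our unicast addresses */
	bool dst_is_anycast;     /* destination is one of our anycast addresses */
};

/* Returns the address to reply from, or NULL to leave the choice to normal
 * source selection. anycast_src_echo_reply is off by default. */
static const char *echo_reply_saddr(const struct echo_req_sketch *req,
				    bool anycast_src_echo_reply)
{
	if (req->dst_is_unicast)
		return req->daddr;
	if (anycast_src_echo_reply && req->dst_is_anycast)
		return req->daddr;
	return NULL;
}
```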
     
  • Fix to return a negative error code from the error handling
    case instead of 0.

    Fixes: 837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling')
    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
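The bug class behind this fix, sketched with hypothetical functions: err still holds 0 from an earlier successful step, so a later failure that jumps to the cleanup label must store a negative errno first, or the caller sees success.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static int resource;   /* stand-in object for a successful allocation */

/* Hypothetical allocator that can be told to fail, to exercise the error path. */
static void *alloc_sketch(bool fail)
{
	return fail ? NULL : &resource;
}

/* Hypothetical init routine: err is 0 after the first step succeeds, so the
 * allocation failure path must set it before jumping to the cleanup label. */
static int init_sketch(bool fail_alloc, void **out)
{
	int err;

	err = 0;          /* pretend an earlier setup step succeeded */
	if (err)
		goto out;

	*out = alloc_sketch(fail_alloc);
	if (!*out) {
		err = -ENOMEM;   /* without this, the function would return 0 */
		goto out;
	}
	return 0;
out:
	return err;
}
```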
     

07 Jan, 2014

13 commits