19 Sep, 2015

1 commit


10 Jul, 2015

1 commit

  • inet_twsk_deschedule() calls are followed by inet_twsk_put().

    The only exception is in inet_twsk_purge(), but there is no point
    in deferring the inet_twsk_put() until after BH is re-enabled.

    Let's rename inet_twsk_deschedule() to inet_twsk_deschedule_put()
    and move the inet_twsk_put() inside.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
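
The pattern described above, folding the trailing put into the deschedule helper, can be sketched in user-space C. This is only an illustration under stated assumptions: `struct tw_sock`, `tw_put()`, `tw_deschedule()` and `tw_deschedule_put()` are hypothetical stand-ins, and the plain integer refcount does not reflect the kernel's actual locking or timer handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a timewait socket (not the kernel struct). */
struct tw_sock {
    int  refcnt;     /* references currently held on the object */
    bool scheduled;  /* true while the expiry timer is armed */
    bool freed;      /* set once the last reference is dropped */
};

/* Drop one reference; mark the object freed when the count reaches zero. */
static void tw_put(struct tw_sock *tw)
{
    assert(tw->refcnt > 0);
    if (--tw->refcnt == 0)
        tw->freed = true;
}

/* Cancel the timer and release the timer's own reference, if it was armed. */
static void tw_deschedule(struct tw_sock *tw)
{
    if (tw->scheduled) {
        tw->scheduled = false;
        tw_put(tw);
    }
}

/* Combined helper: deschedule, then drop the caller's reference as well,
 * so callers no longer need a separate tw_put() after every deschedule. */
static void tw_deschedule_put(struct tw_sock *tw)
{
    tw_deschedule(tw);
    tw_put(tw);
}
```

With one reference held by the timer and one by the caller, a single `tw_deschedule_put()` call releases both.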
     

14 Apr, 2015

1 commit

  • Using a timer wheel for timewait sockets was nice ~15 years ago when
    memory was expensive and machines had a single processor.

    This does not scale; the code is ugly and a source of huge latencies
    (typically 30 ms have been seen, with CPUs spinning on the death_lock spinlock).

    We can afford to use an extra 64 bytes per timewait sock and spread
    timewait load to all cpus to have better behavior.

    Tested:

    In the following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1
    on the target (lpaa24).

    Before patch :

    lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
    419594

    lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
    437171

    While test is running, we can observe 25 or even 33 ms latencies.

    lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
    ...
    1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
    rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2

    lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
    ...
    1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
    rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2

    After patch :

    About 90% increase of throughput :

    lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
    810442

    lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
    800992

    And latencies are kept to minimal values during this load, even
    if network utilization is 90% higher :

    lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
    ...
    1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
    rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Mar, 2015

1 commit


20 Mar, 2015

1 commit


18 Mar, 2015

1 commit


09 Oct, 2013

1 commit

  • TCP listener refactoring, part 4 :

    To speed up inet lookups, we moved IPv4 addresses from inet to struct
    sock_common.

    Now is the time to do the same for IPv6, because it permits us to have fast
    lookups for all kinds of sockets, including upcoming SYN_RECV.

    Getting IPv6 addresses in TCP lookups currently requires two extra cache
    lines, plus a dereference (and memory stall).

    inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

    This patch is way bigger than its IPv4 counterpart, because for IPv4,
    we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
    it's not easily doable.

    inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
    inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

    And timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
    at the same offset.

    We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
    macro.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Aug, 2013

1 commit

  • After commit 93742cf (netfilter: tproxy: remove nf_tproxy_core.h), a build with

    CONFIG_IPV6=y
    CONFIG_IP6_NF_IPTABLES=n

    gives us:

    net/netfilter/xt_TPROXY.c: In function 'nf_tproxy_get_sock_v6':
    net/netfilter/xt_TPROXY.c:178:4: error: implicit declaration of function 'inet6_lookup_listener'

    Reported-by: kbuild test robot
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

01 Aug, 2013

1 commit


31 Jul, 2013

1 commit

  • The module was "permanent", due to the special tproxy skb->destructor.
    Nowadays we have tcp early demux and its sock_edemux destructor in
    networking core which can be used instead.

    Thanks to early demux changes the input path now also handles
    "skb->sk is tw socket" correctly, so this no longer needs the special
    handling introduced with commit d503b30bd648b3cb4e5f50b65d27e389960cc6d9
    (netfilter: tproxy: do not assign timewait sockets to skb->sk).

    Thus:
    - move assign_sock function to where it's needed
    - don't prevent timewait sockets from being assigned to the skb
    - remove nf_tproxy_core.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

11 Jul, 2012

1 commit


09 May, 2012

1 commit

  • This patch adds the flags parameter to ipv6_find_hdr. The flags
    allow us to:

    * know if this is a fragment.
    * stop at the AH header, so the information contained in that header
    can be used for some specific packet handling.

    This patch also adds the offset parameter for inspection of one
    inner IPv6 header that is contained in error messages.

    Signed-off-by: Hans Schillstrom
    Signed-off-by: Pablo Neira Ayuso

    Hans Schillstrom
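
To make the role of the flags concrete, here is a minimal user-space sketch of a header-chain walk with an in/out flags word. It is an illustration under stated assumptions, not the kernel's ipv6_find_hdr(): the function `find_hdr()`, the `FH_F_*` flag names and the representation of a packet as a plain array of next-header values are all hypothetical, and the offset parameter is left out. The sketch stops at AH only when no specific target is requested, which is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* IANA protocol numbers for the headers used in this sketch. */
#define NH_HOPOPTS   0
#define NH_TCP       6
#define NH_ROUTING  43
#define NH_FRAGMENT 44
#define NH_AUTH     51
#define NH_DSTOPTS  60

/* Hypothetical flag names for this sketch. */
#define FH_F_FRAG 0x1u  /* output: a fragment header was seen */
#define FH_F_AUTH 0x2u  /* input: stop at the AH header */

static int is_ext_hdr(int nh)
{
    return nh == NH_HOPOPTS || nh == NH_ROUTING || nh == NH_FRAGMENT ||
           nh == NH_AUTH || nh == NH_DSTOPTS;
}

/* Walk a chain of header types. Returns the index of the target header,
 * or, when target < 0, the index of the first non-extension header.
 * Returns -1 when nothing suitable is found. */
static int find_hdr(const int *chain, size_t n, int target, unsigned int *flags)
{
    for (size_t i = 0; i < n; i++) {
        int nh = chain[i];

        if (nh == NH_FRAGMENT && flags)
            *flags |= FH_F_FRAG;        /* tell the caller this is a fragment */
        if (nh == target)
            return (int)i;
        if (nh == NH_AUTH && target < 0 && flags && (*flags & FH_F_AUTH))
            return (int)i;              /* caller asked to stop at AH */
        if (!is_ext_hdr(nh))
            return target < 0 ? (int)i : -1;
    }
    return -1;
}
```

For a chain of hop-by-hop, fragment, AH and TCP headers, a lookup for TCP reports the fragment through the flags word, while a lookup with no target and the stop-at-AH flag set returns the position of the AH header instead.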
     

17 Dec, 2011

1 commit


17 Feb, 2011

1 commit

  • Assigning a socket in timewait state to skb->sk can trigger
    kernel oops, e.g. in nfnetlink_log, which does:

    if (skb->sk) {
    read_lock_bh(&skb->sk->sk_callback_lock);
    if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...

    In the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
    is invalid.

    Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
    or xt_TPROXY must not assign a timewait socket to skb->sk.

    This does the latter.

    If a TW socket is found, assign the tproxy nfmark, but skip the skb->sk assignment,
    thus mimicking behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.

    The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
    listener socket.

    Cc: Balazs Scheidler
    Cc: KOVACS Krisztian
    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
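
The rule adopted here (always apply the mark, but never attach a timewait socket to the packet) can be sketched in user-space C. The types `struct mini_sock` and `struct packet`, the state enum and the function `tproxy_assign()` are hypothetical stand-ins for illustration, not the xt_TPROXY code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket states for this sketch. */
enum sk_state { SK_ESTABLISHED, SK_LISTEN, SK_TIME_WAIT };

/* Hypothetical minimal socket and packet types. */
struct mini_sock { enum sk_state state; };
struct packet    { struct mini_sock *sk; unsigned int mark; };

/* Apply the mark unconditionally, but attach the socket only when it is
 * not in timewait state. Returns true when the socket was attached. */
static bool tproxy_assign(struct packet *pkt, struct mini_sock *sk,
                          unsigned int mark)
{
    assert(pkt != NULL && sk != NULL);
    pkt->mark = mark;
    if (sk->state == SK_TIME_WAIT)
        return false;   /* leave pkt->sk untouched */
    pkt->sk = sk;
    return true;
}
```

Code that later dereferences `pkt->sk` then only ever sees a socket that is safe to treat as a full socket, which is the point of choosing this option over adding state checks at every consumer.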
     

26 Oct, 2010

1 commit

  • One of the previous tproxy related patches split IPv6 defragmentation and
    connection tracking, but did not add the Kconfig stanzas needed to handle the
    new dependencies correctly. This patch fixes that by making the config options
    mirror the setup we have for IPv4: a distinct config option for defragmentation
    that is automatically selected by both connection tracking and
    xt_TPROXY/xt_socket.

    The patch also changes the #ifdefs enclosing IPv6 specific code in xt_socket
    and xt_TPROXY: we only compile these in case we have ip6tables support enabled.

    Signed-off-by: KOVACS Krisztian
    Signed-off-by: David S. Miller

    KOVACS Krisztian
     

21 Oct, 2010

3 commits

  • The REDIRECT target and the older TProxy versions used the primary address
    of the incoming interface as the default value of the --on-ip parameter.
    This was unintentionally changed during the initial TProxy submission and
    caused confusion among users.

    Since IPv6 has no notion of primary address, we just select the first address
    on the list: this way the socket lookup finds wildcard bound sockets
    properly and we cannot really do better without the user telling us the
    IPv6 address of the proxy.

    This is implemented for both IPv4 and IPv6.

    Signed-off-by: Balazs Scheidler
    Signed-off-by: KOVACS Krisztian
    Signed-off-by: Patrick McHardy

    Balazs Scheidler
     
  • This requires a new revision as the old target structure was
    IPv4 specific.

    Signed-off-by: Balazs Scheidler
    Signed-off-by: KOVACS Krisztian
    Signed-off-by: Patrick McHardy

    Balazs Scheidler
     
  • Without tproxy redirections an incoming SYN kicks out conflicting
    TIME_WAIT sockets, in order to handle clients that reuse ports
    within the TIME_WAIT period.

    The same mechanism didn't work in case TProxy is involved in finding
    the proper socket, as the time_wait processing code looked up the
    listening socket assuming that the listener addr/port matches those
    of the established connection.

    This is not the case with TProxy as the listener addr/port is possibly
    changed with the tproxy rule.

    Signed-off-by: Balazs Scheidler
    Signed-off-by: KOVACS Krisztian
    Signed-off-by: Patrick McHardy

    Balazs Scheidler
     

17 Sep, 2010

1 commit


09 Jul, 2010

1 commit


12 May, 2010

1 commit


25 Mar, 2010

3 commits

  • Part of the transition is done by this semantic patch:
    //
    @ rule1 @
    struct xt_target ops;
    identifier check;
    @@
    ops.checkentry = check;

    @@
    identifier rule1.check;
    @@
    check(...) { }

    @@
    identifier rule1.check;
    @@
    check(...) { }
    //

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
     
  • Restore function signatures from bool to int so that we can report
    memory allocation failures or similar using -ENOMEM rather than
    always having to pass -EINVAL back.

    //
    @@
    type bool;
    identifier check, par;
    @@
    -bool check
    +int check
    (struct xt_tgchk_param *par) { ... }
    //

    Minus the change it does to xt_ct_find_proto.

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
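
What the int return type buys can be sketched with a small user-space check function. Everything here is a hypothetical illustration: `struct tg_param`, `TG_VALID_FLAGS` and `tg_check()` are invented names, and `calloc()` stands in for whatever allocation a real checkentry hook might perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical parameter block handed to a check function. */
struct tg_param {
    unsigned int flags;      /* user-supplied option bits */
    size_t       state_size; /* bytes of private state to allocate */
    void        *state;      /* filled in on success */
};

#define TG_VALID_FLAGS 0x3u  /* hypothetical set of accepted option bits */

/* Returns 0 on success or a negative errno value, so the caller can tell
 * a bad parameter (-EINVAL) apart from an allocation failure (-ENOMEM).
 * A bool return could only say "failed". */
static int tg_check(struct tg_param *par)
{
    assert(par != NULL);
    if (par->flags & ~TG_VALID_FLAGS)
        return -EINVAL;
    par->state = calloc(1, par->state_size);
    if (par->state == NULL)
        return -ENOMEM;
    return 0;
}
```

A caller can propagate the returned value unchanged instead of mapping every failure to -EINVAL.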
     
  • Supplement to 1159683ef48469de71dc26f0ee1a9c30d131cf89.

    Downgrade the log level to INFO for most checkentry messages as they
    are, IMO, just extra information accompanying the -EINVAL code that is
    returned as part of a parameter "constraint violation". Reserve the error
    level for real errors, such as being unable to create a LED trigger.

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
     

08 Oct, 2008

3 commits