20 Jan, 2021

1 commit

  • commit 869f4fdaf4ca7bb6e0d05caf6fa1108dddc346a7 upstream.

    When register_pernet_subsys() fails, nf_nat_bysource
    should be freed just like when nf_ct_extend_register()
    fails.

    Fixes: 1cd472bf036ca ("netfilter: nf_nat: add nat hook register functions to nf_nat")
    Signed-off-by: Dinghao Liu
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman
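
    For illustration, a minimal sketch of the error-unwind shape this fix
    restores; simplified, not the exact upstream code:

        static int __init example_nat_init(void)
        {
            int ret;

            nf_nat_bysource = nf_ct_alloc_hashtable(&nf_nat_htable_size, 0);
            if (!nf_nat_bysource)
                return -ENOMEM;

            ret = nf_ct_extend_register(&nat_extend);
            if (ret < 0)
                goto free_hash;

            ret = register_pernet_subsys(&nat_net_ops);
            if (ret)
                goto unregister_extend;    /* previously leaked the hashtable */

            return 0;

        unregister_extend:
            nf_ct_extend_unregister(&nat_extend);
        free_hash:
            kvfree(nf_nat_bysource);       /* same cleanup as the extend failure */
            return ret;
        }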

    Dinghao Liu
     

22 Jul, 2020

1 commit

  • Replace the existing /* fall through */ comments and its variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings when it is the case.

    [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Pablo Neira Ayuso
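
    For illustration, the shape of the conversion on a hypothetical switch
    statement (the flags are made up):

        switch (maniptype) {
        case NF_NAT_MANIP_SRC:
            flags |= EXAMPLE_SNAT_DONE;
            fallthrough;               /* replaces the old "fall through" comment */
        case NF_NAT_MANIP_DST:
            flags |= EXAMPLE_NAT_DONE;
            break;
        default:
            break;
        }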

    Gustavo A. R. Silva
     

16 Jul, 2019

1 commit

  • In 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") the new
    generic nf_conntrack was introduced, and it came to supersede the old
    ip_conntrack.

    This change updates some of the obsolete comments referring to old
    file/function names of the ip_conntrack mechanism, and removes a
    few self-referencing comments that we shouldn't maintain anymore.

    I did not update any comments referring to historical actions (e.g.,
    comments like "this file was derived from ..." were left untouched, even
    if the referenced file is no longer here).

    Signed-off-by: Yonatan Goldschmidt
    Signed-off-by: Pablo Neira Ayuso

    Yonatan Goldschmidt
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman
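
    In practice this replaces the license boilerplate paragraph at the top of
    each affected file with a single tag, e.g.:

        // SPDX-License-Identifier: GPL-2.0-only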

    Thomas Gleixner
     

28 Apr, 2019

1 commit

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then, going
    forward, prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller
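
    As an illustration, a hypothetical call site before and after the
    mechanical rename (attribute constants and policy name are made up):

        /* before */
        err = nla_parse_nested(tb, EXAMPLE_ATTR_MAX, nla,
                               example_policy, extack);

        /* after: same behaviour, the name now flags it as the legacy,
         * lenient parser */
        err = nla_parse_nested_deprecated(tb, EXAMPLE_ATTR_MAX, nla,
                                          example_policy, extack);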

    Johannes Berg
     

15 Apr, 2019

1 commit

  • Sven Auhagen reported that a 2nd ping request will fail if 'fully-random'
    mode is used.

    Reason is that if no proto information is given, min/max are both 0,
    so we set the icmp id to 0 instead of choosing a random value between
    0 and 65535.

    Update the test case as well to catch this; without the fix this yields:
    [..]
    ERROR: cannot ping ns1 from ns2 with ip masquerade fully-random (attempt 2)
    ERROR: cannot ping ns1 from ns2 with ipv6 masquerade fully-random (attempt 2)

    ... because the 2nd ping clashes with the existing 'id 0' icmp conntrack
    and gets dropped.

    Fixes: 203f2e78200c27e ("netfilter: nat: remove l4proto->unique_tuple")
    Reported-by: Sven Auhagen
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
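
    A minimal sketch of the intended id selection, with assumed variable and
    helper names rather than the real nat core code:

        /* With no proto range configured, min == max == 0; fully-random mode
         * should still pick a random icmp id rather than forcing id 0. */
        static u16 example_pick_icmp_id(u16 min, u16 max)
        {
            if (min == 0 && max == 0)              /* no range given */
                return prandom_u32() & 0xffff;     /* any of the 65536 ids */

            return min + prandom_u32() % (max - min + 1U);
        }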

    Florian Westphal
     

09 Apr, 2019

1 commit

  • We need minimal support from the nat core for this, as we do not
    want to register additional base hooks.

    When an inet hook is registered, internally register ipv4 and ipv6
    hooks for it, and unregister those when the inet hook is removed.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
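
    A sketch of the idea with hypothetical helper names (the real registration
    plumbing differs):

        static int example_nat_inet_register(struct net *net,
                                             const struct nf_hook_ops *ops)
        {
            int ret;

            ret = example_nat_register(net, NFPROTO_IPV4, ops);  /* hypothetical */
            if (ret)
                return ret;

            ret = example_nat_register(net, NFPROTO_IPV6, ops);  /* hypothetical */
            if (ret)
                example_nat_unregister(net, NFPROTO_IPV4, ops);

            return ret;
        }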

    Florian Westphal
     

27 Feb, 2019

6 commits

  • An empty case is fine and does not need a fall-through marking.

    Signed-off-by: Li RongQing
    Signed-off-by: Pablo Neira Ayuso

    Li RongQing
     
  • The l3proto name is gone; its header file is the last trace.
    While at it, also remove nf_nat_core.h: it is very small and all of its
    users include nf_nat.h too.

    before:
       text    data     bss     dec     hex filename
      22948    1612    4136   28696    7018 nf_nat.ko

    after removal of l3proto register/unregister functions:
       text    data     bss     dec     hex filename
      22196    1516    4136   27848    6cc8 nf_nat.ko

    checkpatch complains about overly long lines, but line breaks
    do not make things more readable and the line length gets smaller
    here, not larger.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • All l3proto function pointers have been removed.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • We can now use direct calls.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • before:
       text    data     bss     dec     hex filename
      16566    1576    4136   22278    5706 nf_nat.ko
       3598     844       0    4442    115a nf_nat_ipv6.ko
       3187     844       0    4031     fbf nf_nat_ipv4.ko

    after:
       text    data     bss     dec     hex filename
      22948    1612    4136   28696    7018 nf_nat.ko

    ... with ipv4/v6 nat now provided directly via nf_nat.ko.

    Also changes:

        ret = nf_nat_ipv4_fn(priv, skb, state);
        if (ret != NF_DROP && ret != NF_STOLEN && ...

    into

        if (ret != NF_ACCEPT)
                return ret;

    everywhere.

    The nat hooks should never return anything other than
    ACCEPT or DROP (and the latter only in rare error cases).

    The original code uses multi-line ANDing including an assignment-in-if:

        if (ret != NF_DROP && ret != NF_STOLEN &&
            !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
            (ct = nf_ct_get(skb, &ctinfo)) != NULL) {

    I removed this while moving the code, breaking the condition into
    separate conditionals and moving the assignments onto their own lines.

    checkpatch still generates some warnings:
    1. Overly long lines (of moved code).
       Breaking them would be even more ugly, so I kept this as-is.
    2. Use of extern function declarations in a .c file.
       This is a necessary evil; we must call
       nf_nat_l3proto_register() from the nat core now.
       All l3proto related functions are removed later in this series,
       and those prototypes are removed with them.

    v2: keep empty nf_nat_ipv6_csum_update stub for CONFIG_IPV6=n case.
    v3: remove IS_ENABLED(NF_NAT_IPV4/6) tests, NF_NAT_IPVx toggles
    are removed here.
    v4: also get rid of the assignments in conditionals.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
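
    A sketch of the reworked control flow described above (declarations and
    the surrounding function omitted):

        ret = nf_nat_ipv4_fn(priv, skb, state);
        if (ret != NF_ACCEPT)
            return ret;

        if (IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED)
            return ret;

        ct = nf_ct_get(skb, &ctinfo);
        if (!ct)
            return ret;

        /* ... continue with the conntrack-dependent work ... */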

    Florian Westphal
     
  • None of these functions calls any external functions; moving them allows
    us to avoid both the indirection and the need to export these symbols.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

21 Dec, 2018

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next:

    1) Support for destination MAC in ipset, from Stefano Brivio.

    2) Disallow all-zeroes MAC address in ipset, also from Stefano.

    3) Add IPSET_CMD_GET_BYNAME and IPSET_CMD_GET_BYINDEX commands,
    introduce protocol version number 7, from Jozsef Kadlecsik.
    A follow up patch to fix ip_set_byindex() is also included
    in this batch.

    4) Honor CTA_MARK_MASK from ctnetlink, from Andreas Jaggi.

    5) Statify nf_flow_table_iterate(), from Taehee Yoo.

    6) Use nf_flow_table_iterate() to simplify garbage collection in
    nf_flow_table logic, also from Taehee Yoo.

    7) Don't use _bh variants of call_rcu(), rcu_barrier() and
    synchronize_rcu_bh() in Netfilter, from Paul E. McKenney.

    8) Remove NFC_* cache definition from the old caching
    infrastructure.

    9) Remove layer 4 port rover in NAT helpers, use random port
    instead, from Florian Westphal.

    10) Use strscpy() in ipset, from Qian Cai.

    11) Remove NF_NAT_RANGE_PROTO_RANDOM_FULLY branch now that
    random port is allocated by default, from Xiaozhou Liu.

    12) Ignore NF_NAT_RANGE_PROTO_RANDOM too, from Florian Westphal.

    13) Limit port allocation selection routine in NAT to avoid
    softlockup splats when most ports are in use, from Florian.

    14) Remove unused parameters in nf_ct_l4proto_unregister_sysctl()
    from Yafang Shao.

    15) Direct call to nf_nat_l4proto_unique_tuple() instead of
    indirection, from Florian Westphal.

    16) Several patches to remove all layer 4 NAT indirections,
    remove nf_nat_l4proto struct, from Florian Westphal.

    17) Fix RTP/RTCP source port translation when SNAT is in place,
    from Alin Nastac.

    18) Selective rule dump per chain, from Phil Sutter.

    19) Revisit CLUSTERIP target, this includes a deadlock fix from
    netns path, sleep in atomic, remove bogus WARN_ON_ONCE()
    and disallow mismatching IP address and MAC address.
    Patchset from Taehee Yoo.

    20) Update UDP timeout to stream after 2 seconds, from Florian.

    21) Shrink UDP established timeout to 120 seconds like TCP timewait.

    22) Sysctl knobs to set GRE timeouts, from Yafang Shao.

    23) Move seq_print_acct() to conntrack core file, from Florian.

    24) Add enum for conntrack sysctl knobs, also from Florian.

    25) Place nf_conntrack_acct, nf_conntrack_helper, nf_conntrack_events
    and nf_conntrack_timestamp knobs in the core, from Florian Westphal.
    As a side effect, shrink netns_ct structure by removing obsolete
    sysctl anchors, also from Florian.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Dec, 2018

6 commits

  • This removes the (now empty) nf_nat_l4proto struct, all its instances
    and all the no longer needed runtime (un)register functionality.

    nf_nat_need_gre() can be axed as well: the module that calls it (to
    load the no-longer-existing nat_gre module) also calls other nat core
    functions. GRE nat is now always available if the kernel is built with it.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • All protocols set this to nf_nat_l4proto_nlattr_to_range, so
    just call it directly.

    The important difference is that we'll now also call it for
    protocols that we don't support (i.e., nf_nat_proto_unknown did
    not provide .nlattr_to_range).

    However, there should be no harm, even icmp provided this callback.
    If we don't implement a specific l4nat for this, nothing would make
    use of this information, so adding a big switch/case construct listing
    all supported l4protocols seems a bit pointless.

    This change leaves a single function pointer in the l4proto struct.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • With the exception of icmp, all of the l4 nat protocols set this to
    nf_nat_l4proto_in_range.

    Get rid of this and just check the l4proto in the caller.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • No need for indirections here, we only support ipv4 and ipv6
    and the called functions are very small.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Fold the remaining users (icmp, icmpv6, gre) into nf_nat_l4proto_unique_tuple.
    The static save of the previously resolved key in gre and icmp is
    removed as well; just use the prandom-based offset like the others.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Almost all l4proto->unique_tuple implementations just call this helper,
    so make ->unique_tuple() optional and call the helper directly if the
    l4proto doesn't override it.

    This is an intermediate step to get rid of ->unique_tuple completely.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

13 Dec, 2018

1 commit

  • The dst entry might already have a zero refcount, waiting on the rcu list
    to be freed. Using dst_hold() transitions its reference count to 1, and the
    next dst release will try to free it again -- resulting in a double free:

    WARNING: CPU: 1 PID: 0 at include/net/dst.h:239 nf_xfrm_me_harder+0xe7/0x130 [nf_nat]
    RIP: 0010:nf_xfrm_me_harder+0xe7/0x130 [nf_nat]
    Code: 48 8b 5c 24 60 65 48 33 1c 25 28 00 00 00 75 53 48 83 c4 68 5b 5d 41 5c c3 85 c0 74 0d 8d 48 01 f0 0f b1 0a 74 86 85 c0 75 f3 0b e9 7b ff ff ff 29 c6 31 d2 b9 20 00 48 00 4c 89 e7 e8 31 27
    Call Trace:
    nf_nat_ipv4_out+0x78/0x90 [nf_nat_ipv4]
    nf_hook_slow+0x36/0xd0
    ip_output+0x9f/0xd0
    ip_forward+0x328/0x440
    ip_rcv+0x8a/0xb0

    Use dst_hold_safe instead and bail out if we cannot take a reference.

    Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag")
    Reported-by: Martin Zaharinov
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
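
    A minimal sketch of the fix (error code illustrative):

        /* Take the dst reference only if the entry is still alive;
         * dst_hold_safe() refuses to resurrect a zero-refcount dst that is
         * already queued for rcu freeing. */
        static int example_take_dst_ref(struct sk_buff *skb)
        {
            if (!dst_hold_safe(skb_dst(skb)))
                return -EHOSTUNREACH;

            return 0;
        }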

    Florian Westphal
     

04 Aug, 2018

1 commit

  • nf_ct_alloc_hashtable() is used to allocate memory for the conntrack,
    NAT bysrc and expectation hashtables. Assuming a 64k bucket size,
    which means a 7th-order page allocation, __get_free_pages(), called
    by nf_ct_alloc_hashtable(), will trigger direct memory reclaim
    and stall for a long time when the system is under memory stress.

    So replace the combination of __get_free_pages() and vzalloc() with
    kvmalloc_array(), which provides an overflow check and a fallback
    if no high-order memory is available, and does not retry to reclaim
    memory, reducing the stall.

    Also remove nf_ct_free_hashtable(), since it is now just a kvfree().

    Signed-off-by: Zhang Yu
    Signed-off-by: Wang Li
    Signed-off-by: Li RongQing
    Signed-off-by: Pablo Neira Ayuso
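
    A hedged sketch of the allocation change (bucket count and names are
    illustrative):

        static void *example_alloc_hashtable(unsigned int nr_slots)
        {
            /* One overflow-checked allocation with an automatic vmalloc
             * fallback when high-order pages are not available. */
            return kvmalloc_array(nr_slots, sizeof(struct hlist_nulls_head),
                                  GFP_KERNEL | __GFP_ZERO);
        }

        static void example_free_hashtable(void *hash)
        {
            kvfree(hash);    /* why a separate free helper is no longer needed */
        }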

    Li RongQing
     

17 Jul, 2018

1 commit

  • This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
    abstraction.

    This gets rid of all l3proto indirect calls and the need to do
    a lookup on the function to call for l3 demux.

    It increases the nf_conntrack module size by only a small amount (12 kbyte),
    but overall this reduces size, because nf_conntrack.ko is useless without
    either the nf_conntrack_ipv4 or the nf_conntrack_ipv6 module.

    before:
       text    data     bss     dec     hex filename
       7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
       7405    1084       4    8493    212d nf_conntrack_ipv6.ko
      72614   13689     236   86539   1520b nf_conntrack.ko
     19K nf_conntrack_ipv4.ko
     19K nf_conntrack_ipv6.ko
    179K nf_conntrack.ko

    after:
       text    data     bss     dec     hex filename
      79277   13937     236   93450   16d0a nf_conntrack.ko
    191K nf_conntrack.ko

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

28 Jun, 2018

1 commit

  • Netfilter assumes that if the socket is present in the skb, then
    it can be used because that reference is cleaned up while the skb
    is crossing netns.

    We want to change that to preserve the socket reference in a future
    patch, so this is a preparation updating netfilter to check whether the
    socket netns matches before using it.

    Signed-off-by: Flavio Leitner
    Acked-by: Florian Westphal
    Signed-off-by: David S. Miller
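
    A sketch of the added check (context variable names assumed):

        /* Only trust skb->sk if it belongs to the netns the hook runs in. */
        static struct sock *example_usable_sk(const struct sk_buff *skb,
                                              const struct net *net)
        {
            struct sock *sk = skb->sk;

            if (sk && !net_eq(net, sock_net(sk)))
                sk = NULL;    /* different netns: do not use the socket */

            return sk;
        }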

    Flavio Leitner
     

13 Jun, 2018

1 commit

  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a, b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook
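
    A tiny illustrative before/after of the conversion (call sites are made up):

        /* before */
        buf = kmalloc(count * sizeof(*buf), GFP_KERNEL);
        tbl = kmalloc(rows * cols * sizeof(*tbl), GFP_KERNEL);

        /* after: overflow-checked forms */
        buf = kmalloc_array(count, sizeof(*buf), GFP_KERNEL);
        tbl = kmalloc(array3_size(rows, cols, sizeof(*tbl)), GFP_KERNEL);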

    Kees Cook
     

23 May, 2018

5 commits

  • In nfqueue, two consecutive skbuffs may race to create the conntrack
    entry. Hence, the one that loses the race gets dropped due to a clash in
    the insertion into the hashes from the nf_conntrack_confirm() path.

    This patch adds a new nf_conntrack_update() function which searches for
    possible clashes and resolves them. NAT mangling for the packet that loses
    the race is corrected by using the conntrack information that won the race.

    In order to avoid direct module dependencies with conntrack and NAT, the
    nf_ct_hook and nf_nat_hook structures are used for this purpose.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Move the decode_session() and parse_nat_setup_hook() indirections to the
    nf_nat_hook structure.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Currently the packet rewrite and instantiation of nat NULL bindings
    happens from the protocol specific nat backend.

    Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type.

    Invocation looks like this (simplified):

        NF_HOOK()
           |
           `--- iptable_nat
                   |
                   `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
                           |
                           new packet? pass skb through iptables nat chain
                           |
                           `---> iptable_nat: ipt_do_table

    In the nft case, this looks the same (nft_chain_nat_ipv4 instead of
    iptable_nat).

    This is a problem for two reasons:
    1. Can't use iptables nat and nf_tables nat at the same time,
    as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a
    NULL binding if do_table() did not find a matching nat rule so we
    can detect post-nat tuple collisions).
    2. If you use e.g. nft_masq, snat, redir, etc., users must also register
    an empty base chain so that the nat core gets called from NF_HOOK()
    to do the reverse translation, which is neither obvious nor user
    friendly.

    After this change, the base hook gets registered not from iptable_nat or
    nftables nat hooks, but from the l3 nat core.

    iptables/nft nat base hooks get registered with the nat core instead:

        NF_HOOK()
           |
           `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
                   |
                   new packet? pass skb through iptables/nftables nat chains
                   |
                   +-> iptables_nat: ipt_do_table
                   +-> nft nat chain x
                   `-> nft nat chain y

    The nat core deals with null bindings and reverse translation.
    When no mapping exists, it calls the registered nat lookup hooks until
    one creates a new mapping.
    If both iptables and nftables nat hooks exist, the first matching
    one is used (i.e., higher priority wins).

    Also, nft users do not need to create empty nat hooks anymore;
    the nat core always registers the base hooks that take care of
    reverse/reply translation.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This adds the infrastructure to register nat hooks with the nat core
    instead of the netfilter core.

    nat hooks are used to configure nat bindings. Such hooks are registered
    from ip(6)table_nat or by the nftables core when a nat chain is added.

    After the next patch, nat hooks will be registered with nf_nat instead of
    the netfilter core. This allows using many nat lookup functions at the
    same time while doing the real packet rewrite (nat transformation) in
    one place.
    one place.

    This change doesn't convert the intended users yet to ease review.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Both l3 helpers use almost the same copy-pasted code here.
    Split out the common part into an 'inet' helper.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

24 Apr, 2018

1 commit

  • This is a patch proposal to support shifted ranges in portmaps. (i.e. tcp/udp
    incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100)

    Currently DNAT only works for single port or identical port ranges. (i.e.
    ports 5000-5100 on WAN interface redirected to a LAN host while original
    destination port is not altered) When different port ranges are configured,
    either 'random' mode should be used, or else all incoming connections are
    mapped onto the first port in the redirect range. (in described example
    WAN:5000-5100 will all be mapped to 192.168.1.5:2000)

    This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
    which uses a base port value to calculate an offset with the destination port
    present in the incoming stream. That offset is then applied as index within the
    redirect port range (index modulo rangewidth to handle range overflow).

    In described example the base port would be 5000. An incoming stream with
    destination port 5004 would result in an offset value 4 which means that the
    NAT'ed stream will be using destination port 2004.

    Other possibilities include deterministic mapping of larger or multiple ranges
    to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
    51xx)

    This patch does not change any current behavior. It just adds new NAT proto
    range functionality, which must be selected explicitly via the new flag when
    it is intended to be used.

    A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
    which makes this functionality immediately available.

    Signed-off-by: Thierry Du Tre
    Signed-off-by: Pablo Neira Ayuso
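
    A sketch of the shifted mapping (names assumed): base_port is the start of
    the original destination range and [min, max] is the redirect range:

        static u16 example_shifted_port(u16 dport, u16 base_port,
                                        u16 min, u16 max)
        {
            u32 range = (u32)max - min + 1;
            u16 off = dport - base_port;     /* offset into the incoming range */

            return min + off % range;        /* index modulo range width */
        }

        /* e.g. dport 5004, base_port 5000, range 2000-2100  ->  2004 */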

    Thierry Du Tre
     

20 Mar, 2018

1 commit

  • Using pr_<level>() is more concise than printk(KERN_<LEVEL>).
    This patch:
    * Replace printks having a log level with the appropriate
    pr_*() macros.
    * Define pr_fmt() to include relevant name.
    * Remove redundant prefixes from pr_*() calls.
    * Indent the code where possible.
    * Remove the useless output messages.
    * Remove periods from messages.

    Signed-off-by: Arushi Singhal
    Signed-off-by: Pablo Neira Ayuso
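
    An illustrative snippet of the conversion pattern (the message and module
    prefix are made up):

        #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt  /* prefix defined once */

        /* before: printk(KERN_DEBUG "nf_nat: no mapping found.\n"); */
        pr_debug("no mapping found\n");              /* prefix and period dropped */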

    Arushi Singhal
     

18 Sep, 2017

1 commit

  • If no spinlock debugging options (CONFIG_GENERIC_LOCKBREAK,
    CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC) are enabled on a UP
    platform (e.g. m68k defconfig), arch_spinlock_t is an empty struct,
    hence using ARRAY_SIZE(nf_nat_locks) causes a division by zero:

    net/netfilter/nf_nat_core.c: In function ‘nf_nat_setup_info’:
    net/netfilter/nf_nat_core.c:432: warning: division by zero
    net/netfilter/nf_nat_core.c: In function ‘__nf_nat_cleanup_conntrack’:
    net/netfilter/nf_nat_core.c:535: warning: division by zero
    net/netfilter/nf_nat_core.c:537: warning: division by zero
    net/netfilter/nf_nat_core.c: In function ‘nf_nat_init’:
    net/netfilter/nf_nat_core.c:810: warning: division by zero
    net/netfilter/nf_nat_core.c:811: warning: division by zero
    net/netfilter/nf_nat_core.c:824: warning: division by zero

    Fix this by using the CONNTRACK_LOCKS definition instead.

    Suggested-by: Florian Westphal
    Fixes: 8073e960a03bf7b5 ("netfilter: nat: use keyed locks")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Pablo Neira Ayuso

    Geert Uytterhoeven
     

09 Sep, 2017

1 commit

  • No need to serialize on a single lock; we can partition the table and
    add/delete in parallel to different slots.
    This restores one of the advantages that got lost with the rhlist
    revert.

    Cc: Ivan Babrou
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
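
    A sketch of the keyed-lock pattern (simplified; the real code also deals
    with hash computation and table sizing):

        /* The per-source hash picks one of several locks, so insertions and
         * deletions on different buckets can run in parallel. */
        static void example_add_bysource(struct nf_conn *ct, unsigned int hash)
        {
            unsigned int lock = hash % CONNTRACK_LOCKS;

            spin_lock_bh(&nf_nat_locks[lock]);
            hlist_add_head_rcu(&ct->nat_bysource, &nf_nat_bysource[hash]);
            spin_unlock_bh(&nf_nat_locks[lock]);
        }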

    Florian Westphal