19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

18 Jan, 2019

3 commits


03 Nov, 2018

1 commit


21 Sep, 2018

4 commits

  • l4 protocols are demuxed by l3num, l4num pair.

    However, almost all l4 trackers are l3 agnostic.

    Only exceptions are:
    - gre, icmp (ipv4 only)
    - icmpv6 (ipv6 only)

    This commit gets rid of the l3 mapping, l4 trackers can now be looked up
    by their IPPROTO_XXX value alone, which gets rid of the additional l3
    indirection.

    For icmp, ipcmp6 and gre, add a check on state->pf and
    return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4,
    this seems more fitting than using the generic tracker.

    Additionally we can kill the 2nd l4proto definitions that were needed
    for v4/v6 split -- they are now the same so we can use single l4proto
    struct for each protocol, rather than two.

    The EXPORT_SYMBOLs can be removed as all these object files are
    part of nf_conntrack with no external references.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Its unused, next patch will remove l4proto->l3proto number to simplify
    l4 protocol demuxer lookup.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The error() handler gets called before allocating or looking up a
    connection tracking entry.

    We can instead use direct calls from the ->packet() handlers which get
    invoked for every packet anyway.

    Only exceptions are icmp and icmpv6, these two special cases will be
    handled in the next patch.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Only two protocols need the ->error() function: icmp and icmpv6.
    This is because icmp error mssages might be RELATED to an existing
    connection (e.g. PMTUD, port unreachable and the like), and their
    ->error() handlers do this.

    The error callback is already optional, so remove it for
    udp and call them from ->packet() instead.

    As the error() callback can call checksum functions that write to
    skb->csum*, the const qualifier has to be removed as well.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

20 Sep, 2018

2 commits

  • ->new() gets invoked after ->error() and before ->packet() if
    a conntrack lookup has found no result for the tuple.

    We can fold it into ->packet() -- the packet() implementations
    can check if the conntrack is confirmed (new) or not
    (already in hash).

    If its unconfirmed, the conntrack isn't in the hash yet so current
    skb created a new conntrack entry.

    Only relevant side effect -- if packet() doesn't return NF_ACCEPT
    but -NF_ACCEPT (or drop), while the conntrack was just created,
    then the newly allocated conntrack is freed right away, rather than not
    created in the first place.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • nf_hook_state contains all the hook meta-information: netns, protocol family,
    hook location, and so on.

    Instead of only passing selected information, pass a pointer to entire
    structure.

    This will allow to merge the error and the packet handlers and remove
    the ->new() function in followup patches.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

11 Sep, 2018

1 commit

  • Now that cttimeout support for nft_ct is in place, these should depend
    on CONFIG_NF_CONNTRACK_TIMEOUT otherwise we can crash when dumping the
    policy if this option is not enabled.

    [ 71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    [...]
    [ 71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
    [...]
    [ 71.600188] Call Trace:
    [ 71.600201] ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

29 Aug, 2018

1 commit

  • tcp, sctp and dccp trackers re-use the userspace ctnetlink states
    to index their timeout arrays, which means timeout[0] is never
    used. Copy the 'new' state (syn-sent, dccp-request, ..) to 0 as well
    so external users can simply read it off timeouts[0] without need to
    differentiate dccp/sctp/tcp and udp/icmp/gre/generic.

    The alternative is to map all array accesses to 'i - 1', but that
    is a much more intrusive change.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

25 Jul, 2018

1 commit


20 Jul, 2018

1 commit


16 Jul, 2018

3 commits


09 Jan, 2018

1 commit


08 Jan, 2018

1 commit


25 Oct, 2017

3 commits


04 Sep, 2017

1 commit


25 Aug, 2017

3 commits


30 Jun, 2017

1 commit

  • After running the following commands for a while, kmemleak reported that
    "1879 new suspected memory leaks" happened:
    # while : ; do
    ip netns add test
    ip netns delete test
    done

    unreferenced object 0xffff88006342fa38 (size 1024):
    comm "ip", pid 15477, jiffies 4295982857 (age 957.836s)
    hex dump (first 32 bytes):
    b8 b0 4d a0 ff ff ff ff c0 34 c3 59 00 88 ff ff ..M......4.Y....
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x4a/0xa0
    [] __kmalloc_track_caller+0x150/0x300
    [] kmemdup+0x20/0x50
    [] dccp_init_net+0x8a/0x160 [nf_conntrack]
    [] nf_ct_l4proto_pernet_register_one+0x25/0x90
    ...
    unreferenced object 0xffff88006342da58 (size 1024):
    comm "ip", pid 15477, jiffies 4295982857 (age 957.836s)
    hex dump (first 32 bytes):
    10 b3 4d a0 ff ff ff ff 04 35 c3 59 00 88 ff ff ..M......5.Y....
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x4a/0xa0
    [] __kmalloc_track_caller+0x150/0x300
    [] kmemdup+0x20/0x50
    [] sctp_init_net+0x5d/0x130 [nf_conntrack]
    [] nf_ct_l4proto_pernet_register_one+0x25/0x90
    ...

    This is because we forgot to implement the get_net_proto for sctp and
    dccp, so we won't invoke the nf_ct_unregister_sysctl to free the
    ctl_table when do netns cleanup. Also note, we will fail to register
    the sysctl for dccp/sctp either due to the lack of get_net_proto.

    Fixes: c51d39010a1b ("netfilter: conntrack: built-in support for DCCP")
    Fixes: a85406afeb3e ("netfilter: conntrack: built-in support for SCTP")
    Cc: Davide Caratti
    Signed-off-by: Liping Zhang
    Acked-by: Davide Caratti
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

01 May, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. A large bunch of code cleanups, simplify the conntrack extension
    codebase, get rid of the fake conntrack object, speed up netns by
    selective synchronize_net() calls. More specifically, they are:

    1) Check for ct->status bit instead of using nfct_nat() from IPVS and
    Netfilter codebase, patch from Florian Westphal.

    2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

    3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

    4) Introduce nft_is_base_chain() helper function.

    5) Enforce expectation limit from userspace conntrack helper,
    from Gao Feng.

    6) Add nf_ct_remove_expect() helper function, from Gao Feng.

    7) NAT mangle helper function return boolean, from Gao Feng.

    8) ctnetlink_alloc_expect() should only work for conntrack with
    helpers, from Gao Feng.

    9) Add nfnl_msg_type() helper function to nfnetlink to build the
    netlink message type.

    10) Get rid of unnecessary cast on void, from simran singhal.

    11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.

    12) Use list_prev_entry() from nf_tables, from simran signhal.

    13) Remove unnecessary & on pointer function in the Netfilter and IPVS
    code.

    14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.

    15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

    16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks are already
    guaranteed to run under RCU read side.

    17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

    18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.

    19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole.

    20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.

    21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.

    22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate a conntrack object is untracked, from Florian Westphal.

    23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.

    24) Add event mask support to nft_ct, also from Florian.

    25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.

    26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure userspace conntrack helper doesn't go over that size.
    Remove variable size ct extension infrastructure now this code
    got no more clients. From Florian Westphal.

    27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.

    28) Allow to get rid of unassured flows under stress in conntrack,
    this applies to DCCP, SCTP and TCP protocols, from Florian.

    29) Shrink size of nf_conntrack_ecache structure, from Florian.

    30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.

    31) Register SYNPROXY hooks on demand, from Florian Westphal.

    32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.

    33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.

    34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.

    35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we
    don't need anymore if we just select a fixed size instead of
    expensive runtime time calculation of this. From Florian.

    36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.

    37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.

    38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.

    39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

    40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.

    41) Silence stack size warning gcc in 32-bit arch in snmp helper,
    from Florian.

    42) Inconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Apr, 2017

1 commit

  • If insertion of a new conntrack fails because the table is full, the kernel
    searches the next buckets of the hash slot where the new connection
    was supposed to be inserted at for an entry that hasn't seen traffic
    in reply direction (non-assured), if it finds one, that entry is
    is dropped and the new connection entry is allocated.

    Allow the conntrack gc worker to also remove *assured* conntracks if
    resources are low.

    Do this by querying the l4 tracker, e.g. tcp connections are now dropped
    if they are no longer established (e.g. in finwait).

    This could be refined further, e.g. by adding 'soft' established timeout
    (i.e., a timeout that is only used once we get close to resource
    exhaustion).

    Cc: Jozsef Kadlecsik
    Signed-off-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

14 Apr, 2017

1 commit


02 Feb, 2017

1 commit


05 Dec, 2016

1 commit

  • CONFIG_NF_CT_PROTO_DCCP is no more a tristate. When set to y, connection
    tracking support for DCCP protocol is built-in into nf_conntrack.ko.

    footprint test:
    $ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \
    net/ipv4/netfilter/nf_conntrack_ipv4.ko \
    net/ipv6/netfilter/nf_conntrack_ipv6.ko

    (builtin)|| dccp | ipv4 | ipv6 | nf_conntrack
    ---------++--------+--------+--------+--------------
    none || 469140 | 828755 | 828676 | 6141434
    DCCP || - | 830566 | 829935 | 6533526

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
     

18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into an zero based array and
    thus is unsigned entity. Using negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preffered to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    ...
    ptr = ng->ptr[id - 1];
    ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a semmingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

10 Nov, 2016

1 commit

  • modify registration and deregistration of layer-4 protocol trackers to
    facilitate inclusion of new elements into the current list of builtin
    protocols. Both builtin (TCP, UDP, ICMP) and non-builtin (DCCP, GRE, SCTP,
    UDPlite) layer-4 protocol trackers usually register/deregister themselves
    using consecutive calls to nf_ct_l4proto_{,pernet}_{,un}register(...).
    This sequence is interrupted and rolled back in case of error; in order to
    simplify addition of builtin protocols, the input of the above functions
    has been modified to allow registering/unregistering multiple protocols.

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
     

12 Aug, 2016

1 commit


24 Apr, 2016

1 commit


19 Sep, 2015

1 commit