31 Mar, 2020

3 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next:

    1) Add support to specify a stateful expression in set definitions,
    this allows users to specify e.g. counters per set elements.

    2) Flowtable software counter support.

    3) Flowtable hardware offload counter support, from wenxu.

    4) Parallelize flowtable hardware offload requests, from Paul Blakey.
    This includes a patch to add one work entry per offload command.

    5) Several patches to rework nf_queue refcount handling, from Florian
    Westphal.

    6) A few fixes for the flowtable tunnel offload: Fix crash if tunneling
    information is missing and set up indirect flow block as TC_SETUP_FT,
    patch from wenxu.

    7) Stricter netlink attribute sanity check on filters, from Romain Bellan
    and Florent Fourcot.

    8) Annotations to make sparse happy, from Jules Irenge.

    9) Improve icmp errors in debugging information, from Haishuang Yan.

    10) Fix warning in IPVS icmp error debugging, from Haishuang Yan.

    11) Fix endianness issue in tcp extension header, from Sergey Marinkevich.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • If outer_proto is not set, GCC emits the following warning:

    In file included from net/netfilter/ipvs/ip_vs_core.c:52:
    net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_in_icmp':
    include/net/ip_vs.h:233:4: warning: 'outer_proto' may be used uninitialized in this function [-Wmaybe-uninitialized]
    233 | printk(KERN_DEBUG pr_fmt(msg), ##__VA_ARGS__); \
    | ^~~~~~
    net/netfilter/ipvs/ip_vs_core.c:1666:8: note: 'outer_proto' was declared here
    1666 | char *outer_proto;
    | ^~~~~~~~~~~

    Fixes: 73348fed35d0 ("ipvs: optimize tunnel dumps for icmp errors")
    Signed-off-by: Haishuang Yan
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    Haishuang Yan
     
  • I hit a problem on MIPS with Big-Endian turned on: every time NF
    tries to change the TCP MSS, it returns because new.v16 is greater
    than old.v16. But the real MSS was 1460 and my rule was:

    add rule table chain tcp option maxseg size set 1400

    And 1400 is less than 1460, not greater.

    Later I found that the main cause is the cast from u32 to __be16.

    Debugging:

    In this example MSS = 1400 (hex: 0x578). Here is the representation of
    each byte as it sits in memory, by address from left to right (e.g.
    [0x0 0x1 0x2 0x3]). LE is a Little-Endian system, BE is Big-Endian, and
    the left column is the type.

                         LE               BE
    u32:                 [78 05 00 00]    [00 00 05 78]

    As you can see, a u32 is cast to u16 from a different half of the
    4-byte address range depending on endianness. But nf_tables uses
    registers that store data of various sizes. The TCP MSS is stored in
    2 bytes, yet the registers are still defined as u32:

    struct nft_regs {
            union {
                    u32                 data[20];
                    struct nft_verdict  verdict;
            };
    };

    So an access like regs->data[priv->sreg] is exactly a u32. According
    to the table above, the per-byte representation of the TCP MSS stored
    in a register will be:

                         LE               BE
    (u32)regs->data[]:   [78 05 00 00]    [05 78 00 00]
                                           ^^ ^^

    We see that the register uses just half of the u32, and the other 2
    bytes may be used for some other data. But in
    nft_exthdr_tcp_set_eval() it is cast directly as u32 -> __be16:

    new.v16 = src

    But a u32 overflows a __be16, so the cast keeps the 2 low bytes. For
    clarity, here is one more table ({xx xx} marks the bytes that will be
    used for the cast).

                         LE                 BE
    u32:                 [{78 05} 00 00]    [00 00 {05 78}]
    (u32)regs->data[]:   [{78 05} 00 00]    [05 78 {00 00}]

    As you can see, for Little-Endian nothing changes, but for Big-Endian we
    take the wrong half. In my case there was some other data instead of
    zeros, so the new MSS was wrongly seen as greater.

    To fix this bug I used the same solution as for port ranges. Applying
    this patch does not affect Little-Endian systems.

    Signed-off-by: Sergey Marinkevich
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Sergey Marinkevich
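
    The register layout described in the MSS commit above can be reproduced
    in userspace. The following is a minimal sketch, not the kernel code:
    reg_store16(), reg_load16_truncate() and reg_load16_bytes() are
    hypothetical names, and a plain uint32_t stands in for one entry of
    regs->data[].

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: place a 16-bit value in the FIRST two bytes (in
 * memory order) of a 32-bit register, leaving the other two untouched. */
static void reg_store16(uint32_t *reg, uint16_t val)
{
    memcpy(reg, &val, sizeof(val));
}

/* Integer truncation, as in "new.v16 = src": keeps the numerically low
 * 16 bits. Those are the first two bytes on Little-Endian, but the LAST
 * two bytes on Big-Endian. */
static uint16_t reg_load16_truncate(const uint32_t *reg)
{
    return (uint16_t)*reg;
}

/* Endian-independent read: always takes the first two bytes in memory,
 * which is where reg_store16() put the value. */
static uint16_t reg_load16_bytes(const uint32_t *reg)
{
    uint16_t val;

    memcpy(&val, reg, sizeof(val));
    return val;
}
```

    On a Little-Endian host both loads return the stored value. On a
    Big-Endian host only reg_load16_bytes() does, because truncation reads
    the half of the register that may hold unrelated data.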
     

30 Mar, 2020

6 commits


29 Mar, 2020

4 commits

  • Instead of dropping refs+kfree, use the helper added in previous patch.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • nf_queue is problematic when another NF_QUEUE invocation happens
    from nf_reinject().

    1. nf_queue is invoked, increments state->sk refcount.
    2. skb is queued, waiting for verdict.
    3. sk is closed/released.
    4. verdict comes back, nf_reinject is called.
    5. nf_reinject drops the reference -- refcount can now drop to 0

    Instead of get_ref/release_ref pattern, we need to nest the get_ref calls:
        get_ref
            get_ref
            release_ref
        release_ref

    So that when we invoke the next processing stage (another netfilter
    hook or the okfn()), we hold at least one reference count on the
    devices/socket.

    After previous patch, it is now safe to put the entry even after okfn()
    has potentially free'd the skb.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
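
    The difference between the flat and the nested pattern in the nf_queue
    commit above can be sketched with a toy counter. This is an
    illustration only: struct obj, next_stage() and the two reinject_*()
    functions are hypothetical and do not mirror the kernel structures.

```c
#include <stdbool.h>

struct obj {
    int  refcnt;
    bool resurrected;   /* set if a reference is taken after hitting 0 */
};

static void get_ref(struct obj *o)
{
    if (o->refcnt == 0)
        o->resurrected = true;  /* the 0 -> 1 transition we must avoid */
    o->refcnt++;
}

static void release_ref(struct obj *o)
{
    o->refcnt--;
}

/* Stand-in for the next processing stage, which may queue again and
 * therefore takes and later drops its own reference. */
static void next_stage(struct obj *o)
{
    get_ref(o);
    release_ref(o);
}

/* Flat pattern: drop the queue entry's reference first, then continue. */
static void reinject_flat(struct obj *o)
{
    release_ref(o);
    next_stage(o);
}

/* Nested pattern: keep the queue entry's reference across the next
 * stage and drop it only afterwards. */
static void reinject_nested(struct obj *o)
{
    next_stage(o);
    release_ref(o);
}
```

    Starting from a count of 1, reinject_flat() goes 1 -> 0 -> 1 -> 0,
    while reinject_nested() goes 1 -> 2 -> 1 -> 0 and never touches the
    object after its count reached zero.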
     
  • The refcount is done via entry->skb, which does work fine.
    Major problem: When putting the refcount of the bridge ports, we
    must always put the references while the skb is still around.

    However, we will need to put the references after okfn() to avoid
    a possible 1 -> 0 -> 1 refcount transition, so we cannot use the
    skb pointer anymore.

    Place the physports in the queue entry structure instead to allow
    for refcounting changes in the next patch.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This is a preparation patch, no logical changes.
    Move free_entry into core and rename it to something more sensible.

    This will ease follow-up patches, which will complicate the refcount
    handling.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

28 Mar, 2020

10 commits


26 Mar, 2020

2 commits

  • Overlapping header include additions in macsec.c

    A bug fix in 'net' overlapping with the removal of 'version'
    string in ena_netdev.c

    Overlapping test additions in selftests Makefile

    Overlapping PCI ID table adjustments in iwlwifi driver.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
    net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
    pkt->skb->tc_redirected = 1;
    ^~
    net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
    pkt->skb->tc_from_ingress = 1;
    ^~

    To avoid a direct dependency with tc actions from netfilter, wrap the
    redirect bits around CONFIG_NET_REDIRECT and move helpers to
    include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
    only existing client of these bits in the tree.

    This patch adds skb_set_redirected(), which sets the redirected bit
    on the skbuff, records whether the packet was redirected from ingress,
    and resets the timestamp (the timestamp reset was originally missing
    in the netfilter bugfix).

    Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
    Reported-by: noreply@ellerman.id.au
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

25 Mar, 2020

6 commits

  • Set skb->tc_redirected to 1, otherwise the ifb driver drops the packet.
    Set skb->tc_from_ingress to 1 to reinject the packet back to the ingress
    path after leaving the ifb egress path.

    This patch unconditionally sets these two skb fields, which are
    meaningful to the ifb driver. The existing forward action is guaranteed
    to run from the ingress path.

    Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Make sure the forward action is only used from ingress.

    Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • ...and return -ENOTEMPTY to the front-end in this case, instead of
    proceeding. Currently, nft takes care of checking for these cases
    and not sending them to the kernel, but if we drop the set_overlap()
    call in nft we can end up in situations like:

    # nft add table t
    # nft add set t s '{ type inet_service ; flags interval ; }'
    # nft add element t s '{ 1 - 5 }'
    # nft add element t s '{ 6 - 10 }'
    # nft add element t s '{ 4 - 7 }'
    # nft list set t s
    table ip t {
            set s {
                    type inet_service
                    flags interval
                    elements = { 1-3, 4-5, 6-7 }
            }
    }

    The primary purpose of this change is to make the behaviour
    consistent with nft_set_pipapo, but it also serves to avoid
    inconsistent behaviour if userspace sends overlapping elements for
    any reason.

    v2: When we meet the same key data in the tree, as start element while
    inserting an end element, or as end element while inserting a start
    element, actually check that the existing element is active, before
    resetting the overlap flag (Pablo Neira Ayuso)

    Signed-off-by: Stefano Brivio
    Signed-off-by: Pablo Neira Ayuso

    Stefano Brivio
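
    The distinction between an identical element and a partial overlap in
    the commit above can be sketched as follows. This is a simplified
    linear scan over closed ranges, not the rbtree walk used by the set
    back-end; struct interval and check_insert() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

struct interval {
    unsigned int start;
    unsigned int end;   /* inclusive */
};

/* Returns 0 if cand can be inserted, -EEXIST if an identical interval
 * is already present, -ENOTEMPTY if cand partially overlaps one. */
static int check_insert(const struct interval *set, size_t n,
                        struct interval cand)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (set[i].start == cand.start && set[i].end == cand.end)
            return -EEXIST;
        /* Two closed ranges overlap when each starts before the
         * other one ends. */
        if (cand.start <= set[i].end && set[i].start <= cand.end)
            return -ENOTEMPTY;
    }
    return 0;
}
```

    With the set { 1-5, 6-10 } from the example, inserting 4-7 yields
    -ENOTEMPTY, inserting 1-5 again yields -EEXIST, and inserting 11-20
    succeeds.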
     
  • Replace negations of nft_rbtree_interval_end() with a new helper,
    nft_rbtree_interval_start(), wherever this helps to visualise the
    problem at hand, that is, for all the occurrences except for the
    comparison against given flags in __nft_rbtree_get().

    This gets especially useful in the next patch.

    Signed-off-by: Stefano Brivio
    Signed-off-by: Pablo Neira Ayuso

    Stefano Brivio
     
  • ...and return -ENOTEMPTY to the front-end on collision, -EEXIST if
    an identical element already exists. Together with the previous patch,
    element collision will now be returned to the user as -EEXIST.

    Reported-by: Phil Sutter
    Signed-off-by: Stefano Brivio
    Signed-off-by: Pablo Neira Ayuso

    Stefano Brivio
     
  • Currently, the -EEXIST return code of ->insert() callbacks is ambiguous: it
    might indicate that a given element (including intervals) already exists as
    such, or that the new element would clash with existing ones.

    If identical elements already exist, the front-end ignores this without
    returning an error when NLM_F_EXCL is not set. However, if the new element
    can't be inserted due to an overlap, we should report this to the user.

    To this end, allow set back-ends to return -ENOTEMPTY on collision with
    existing elements, translate that to -EEXIST, and return that to userspace,
    no matter if NLM_F_EXCL was set.

    Reported-by: Phil Sutter
    Signed-off-by: Stefano Brivio
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
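
    The translation rule described in the commit above fits in a few
    lines. A minimal sketch, assuming a hypothetical frontend_result()
    helper and a bool standing in for the NLM_F_EXCL flag; the real
    front-end code is structured differently.

```c
#include <errno.h>
#include <stdbool.h>

/* Map a set back-end return code to what userspace should see. */
static int frontend_result(int backend_err, bool excl)
{
    /* Identical element: only an error if the caller asked for
     * exclusive creation. */
    if (backend_err == -EEXIST)
        return excl ? -EEXIST : 0;

    /* Overlap with existing elements: always reported, as -EEXIST. */
    if (backend_err == -ENOTEMPTY)
        return -EEXIST;

    return backend_err;
}
```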
     

20 Mar, 2020

5 commits

  • nf_flow_rule_match() sets control.addr_type in key, so needs to also set
    the corresponding mask. An exact match is wanted, so mask is all ones.

    Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
    Signed-off-by: Edward Cree
    Signed-off-by: Pablo Neira Ayuso

    Edward Cree
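
    Why the mask matters in the commit above can be seen with a generic
    key/mask comparison. This sketch assumes the usual convention that only
    bits set in the mask take part in the match; masked_match() is a
    hypothetical helper, not part of the flow offload API.

```c
#include <stdbool.h>
#include <stdint.h>

/* A field matches when it equals the key on every bit the mask selects. */
static bool masked_match(uint16_t field, uint16_t key, uint16_t mask)
{
    return (field & mask) == (key & mask);
}
```

    With a zero mask every field value matches regardless of the key, so
    the key is effectively ignored. An all-ones mask (0xffff) turns the
    comparison into an exact match.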
     
  • The tc ct action does not cache the route in the flowtable entry.

    Fixes: 88bf6e4114d5 ("netfilter: flowtable: add tunnel encap/decap action offload support")
    Fixes: cfab6dbd0ecf ("netfilter: flowtable: add tunnel match offload support")
    Signed-off-by: wenxu
    Signed-off-by: Pablo Neira Ayuso

    wenxu
     
  • When freeing a flowtable with offloaded flows, the flows are deleted
    from hardware but not from the flow table, leaking them and leaving
    their offload bit on.

    Add a second pass of the disabled gc to delete these flows from
    the flow table before freeing it.

    Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
    Signed-off-by: Paul Blakey
    Signed-off-by: Pablo Neira Ayuso

    Paul Blakey
     
  • Since pskb_may_pull() may change skb->data, we need to reload ip{v6}h
    at the right place.

    Fixes: a908fdec3dda ("netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table")
    Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table")
    Signed-off-by: Haishuang Yan
    Signed-off-by: Pablo Neira Ayuso

    Haishuang Yan
     
  • Since nf_flow_snat_port and nf_flow_snat_ip{v6} call pskb_may_pull(),
    which may change skb->data, we need to reload ip{v6}h at the right
    place.

    Fixes: a908fdec3dda ("netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table")
    Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table")
    Signed-off-by: Haishuang Yan
    Signed-off-by: Pablo Neira Ayuso

    Haishuang Yan
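
    The pointer-reload rule from the two commits above has a close
    userspace analogue in realloc(). The sketch below is only an analogy
    under that assumption: struct buf and buf_may_pull() are hypothetical
    stand-ins for the skb and pskb_may_pull().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct buf {
    uint8_t *data;
    size_t   len;
};

/* Make sure at least 'need' bytes are available. Like the real helper
 * it may move the buffer, which invalidates pointers taken earlier.
 * Returns 1 on success, 0 on allocation failure. */
static int buf_may_pull(struct buf *b, size_t need)
{
    uint8_t *p;

    if (need <= b->len)
        return 1;
    p = realloc(b->data, need);
    if (!p)
        return 0;
    memset(p + b->len, 0, need - b->len);   /* zero the new tail */
    b->data = p;
    b->len = need;
    return 1;
}

/* Read one header byte. The header pointer is derived from b->data
 * only AFTER the pull, never cached across it. */
static int read_byte_after_pull(struct buf *b, size_t off, size_t need)
{
    const uint8_t *hdr;

    if (off >= need || !buf_may_pull(b, need))
        return -1;
    hdr = b->data;  /* reload: the buffer may have moved */
    return hdr[off];
}
```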
     

19 Mar, 2020

4 commits