25 Jan, 2021

1 commit


20 Jan, 2021

3 commits

  • commit 869f4fdaf4ca7bb6e0d05caf6fa1108dddc346a7 upstream.

    When register_pernet_subsys() fails, nf_nat_bysource
    should be freed just like when nf_ct_extend_register()
    fails.

    Fixes: 1cd472bf036ca ("netfilter: nf_nat: add nat hook register functions to nf_nat")
    Signed-off-by: Dinghao Liu
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Dinghao Liu
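
The error-path rule in the commit above (whatever was set up before a failing step must be undone) can be sketched in user-space C. This is a minimal illustration under stated assumptions: every `demo_*` name is hypothetical and merely stands in for the table allocation, nf_ct_extend_register() and register_pernet_subsys() steps; it is not the kernel code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel calls; none of this is the real API. */
int demo_live_tables;   /* tables allocated and not yet freed */
int demo_fail_pernet;   /* set to 1 to make the second registration fail */

static void *demo_table_alloc(void)
{
    void *p = calloc(16, sizeof(void *));

    if (p)
        demo_live_tables++;
    return p;
}

static void demo_table_free(void *p)
{
    if (p) {
        demo_live_tables--;
        free(p);
    }
}

static int demo_extend_register(void) { return 0; }
static void demo_extend_unregister(void) { }
static int demo_pernet_register(void) { return demo_fail_pernet ? -1 : 0; }

static void *demo_bysource;

/* Returns 0 on success; on any failure every earlier step is undone. */
int demo_nat_init(void)
{
    int ret;

    demo_bysource = demo_table_alloc();
    if (!demo_bysource)
        return -1;

    ret = demo_extend_register();
    if (ret < 0)
        goto err_free;

    ret = demo_pernet_register();
    if (ret < 0)
        goto err_unregister;

    return 0;

err_unregister:
    demo_extend_unregister();
err_free:
    demo_table_free(demo_bysource);
    demo_bysource = NULL;
    return ret;
}
```

The labels are ordered so that a later failure falls through every earlier cleanup step, which keeps both failure cases on one unwind path.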
     
  • commit f6351c3f1c27c80535d76cac2299aec44c36291e upstream.

    The old way of changing the conntrack hashsize at runtime was through changing
    the module param via file /sys/module/nf_conntrack/parameters/hashsize. This
    was extended to sysctl change in commit 3183ab8997a4 ("netfilter: conntrack:
    allow increasing bucket size via sysctl too").

    That commit introduced a second "user" variable, nf_conntrack_htable_size_user,
    which shadows the actual variable nf_conntrack_htable_size. When hashsize is
    changed via the module param, this "user" variable isn't updated. As a result,
    sysctl net/netfilter/nf_conntrack_buckets shows the wrong value when users
    update via the old way.

    This patch fixes the issue by always updating the "user" variable when reading
    the proc file. This takes care of changes to the actual variable without the
    sysctl needing to be aware of them.

    Fixes: 3183ab8997a4 ("netfilter: conntrack: allow increasing bucket size via sysctl too")
    Reported-by: Yoel Caspersen
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Jesper Dangaard Brouer
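
The refresh-on-read idea from the commit above can be modelled in a few lines. This is only a sketch: the two variables mirror the roles of nf_conntrack_htable_size and nf_conntrack_htable_size_user, but the `demo_*` names, the initial value and the function shapes are assumptions, not the kernel's sysctl handler.

```c
/* Hypothetical model of the two variables named in the commit message. */
static unsigned int demo_htable_size = 16384;       /* value actually in use */
static unsigned int demo_htable_size_user = 16384;  /* copy exposed via sysctl */

/* Old path: the module parameter changes only the real value. */
void demo_set_hashsize_param(unsigned int n)
{
    demo_htable_size = n;
}

/* Read handler: refresh the shadow copy before reporting it. */
unsigned int demo_sysctl_read_buckets(void)
{
    demo_htable_size_user = demo_htable_size;
    return demo_htable_size_user;
}
```

Because the copy happens on every read, the writer that bypasses the sysctl does not need to know the shadow variable exists.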
     
  • [ Upstream commit 2b33d6ffa9e38f344418976b06057e2fc2aa9e2a ]

    Currently mtype_resize() can cause an oops:

    t = ip_set_alloc(htable_size(htable_bits));
    if (!t) {
            ret = -ENOMEM;
            goto out;
    }
    t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits));

    An increased htable_bits can force htable_size() to return 0.
    In turn, ip_set_alloc(0) returns not 0 but ZERO_SIZE_PTR,
    so the following access to t->hregion should trigger an OOPS.

    Signed-off-by: Vasily Averin
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Vasily Averin
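
One way to guard against the zero-size allocation described above is to check the computed size before it reaches the allocator. The sketch below is a user-space approximation under stated assumptions: `demo_htable`, `demo_htable_size()` and `demo_table_create()` are hypothetical, and the limit of 31 bits is taken from the related htable_size() description in this log rather than from the kernel source.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_htable {
    unsigned int bits;
    void *bucket[];   /* 2^bits bucket pointers follow the header */
};

/* Bytes needed for 2^bits buckets, or 0 if that cannot be represented. */
static size_t demo_htable_size(unsigned int bits)
{
    size_t n;

    if (bits > 31)
        return 0;
    n = (size_t)1 << bits;
    if (n > (SIZE_MAX - sizeof(struct demo_htable)) / sizeof(void *))
        return 0;
    return sizeof(struct demo_htable) + n * sizeof(void *);
}

/* Allocates a table; returns 0 or a negative errno-style code. */
int demo_table_create(unsigned int bits, struct demo_htable **out)
{
    size_t size = demo_htable_size(bits);
    struct demo_htable *t;

    if (size == 0)          /* reject before the allocator ever sees 0 */
        return -ENOMEM;
    t = calloc(1, size);
    if (!t)
        return -ENOMEM;
    t->bits = bits;
    *out = t;
    return 0;
}
```

A non-NULL return from an allocator is only meaningful when the requested size was non-zero, so the size check has to come first.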
     

13 Jan, 2021

3 commits

  • commit 95cd4bca7b1f4a25810f3ddfc5e767fb46931789 upstream.

    If userspace requests a feature which is not available in the original set
    definition, then bail out with EOPNOTSUPP. If userspace sends
    unsupported dynset flags (new feature not supported by this kernel),
    then report EOPNOTSUPP to userspace. EINVAL should be only used to
    report malformed netlink messages from userspace.

    Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
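
The error-code split described in the commit above (EINVAL for malformed messages, EOPNOTSUPP for unsupported or unavailable features) might look like this in isolation. The flag names, their values and the function signature are hypothetical and chosen only for illustration; they are not the nf_tables uAPI.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real uAPI names and values differ. */
#define DEMO_F_INV      0x1u
#define DEMO_F_EXPR     0x2u
#define DEMO_KNOWN      (DEMO_F_INV | DEMO_F_EXPR)

/*
 * attr_len:      length of the flags attribute as received
 * flags:         decoded flag word
 * set_features:  features the target set was created with
 */
int demo_check_flags(size_t attr_len, uint32_t flags, uint32_t set_features)
{
    if (attr_len != sizeof(uint32_t))
        return -EINVAL;         /* malformed message */
    if (flags & ~DEMO_KNOWN)
        return -EOPNOTSUPP;     /* newer feature this build does not know */
    if (flags & ~set_features)
        return -EOPNOTSUPP;     /* set was not created with that feature */
    return 0;
}
```

Keeping the two codes apart lets userspace distinguish "my message is broken" from "this kernel or this set cannot do that".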
     
  • commit 6cb56218ad9e580e519dcd23bfb3db08d8692e5a upstream.

    syzbot reports:
    detected buffer overflow in strlen
    [..]
    Call Trace:
    strlen include/linux/string.h:325 [inline]
    strlcpy include/linux/string.h:348 [inline]
    xt_rateest_tg_checkentry+0x2a5/0x6b0 net/netfilter/xt_RATEEST.c:143

    strlcpy assumes src is a C string. Check info->name before it is used.

    Reported-by: syzbot+e86f7c428c8c50db65b4@syzkaller.appspotmail.com
    Fixes: 5859034d7eb8793 ("[NETFILTER]: x_tables: add RATEEST target")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
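
A check of the kind the commit above asks for can be written with standard C only. This is a sketch, assuming a fixed-size name field filled in by an untrusted caller; `demo_info`, `DEMO_NAMSIZ`, `demo_checkentry()` and the choice of ENAMETOOLONG as the error code are illustrative assumptions, not the xt_RATEEST code.

```c
#include <errno.h>
#include <string.h>

#define DEMO_NAMSIZ 16

struct demo_info {
    char name[DEMO_NAMSIZ];   /* filled in by userspace, may lack a NUL */
};

/* Copies the name only if a terminator exists inside the fixed-size field. */
int demo_checkentry(const struct demo_info *info, char dst[DEMO_NAMSIZ])
{
    if (memchr(info->name, '\0', sizeof(info->name)) == NULL)
        return -ENAMETOOLONG;
    strcpy(dst, info->name);   /* safe: a NUL within DEMO_NAMSIZ was just found */
    return 0;
}
```

Searching for the terminator with a bounded scan never reads past the field, which is exactly what an unbounded strlen() cannot promise.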
     
  • commit 5c8193f568ae16f3242abad6518dc2ca6c8eef86 upstream.

    htable_bits() can call jhash_size(32) and trigger shift-out-of-bounds

    UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6
    shift exponent 32 is too large for 32-bit type 'unsigned int'
    CPU: 0 PID: 8498 Comm: syz-executor519
    Not tainted 5.10.0-rc7-next-20201208-syzkaller #0
    Call Trace:
    __dump_stack lib/dump_stack.c:79 [inline]
    dump_stack+0x107/0x163 lib/dump_stack.c:120
    ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
    __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
    htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline]
    hash_mac_create.cold+0x58/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524
    ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115
    nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252
    netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
    nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:672
    ____sys_sendmsg+0x6e8/0x810 net/socket.c:2345
    ___sys_sendmsg+0xf3/0x170 net/socket.c:2399
    __sys_sendmsg+0xe5/0x1b0 net/socket.c:2432
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This patch replaces htable_bits() with a simple fls(hashsize - 1) call:
    it alone returns valid nbits for both round and non-round hashsizes.
    Any nbits value is acceptable here because it is validated inside the
    following htable_size() call, which returns 0 for nbits > 31.

    Fixes: 1feab10d7e6d ("netfilter: ipset: Unified hash type generation")
    Reported-by: syzbot+d66bfadebca46cf61a2b@syzkaller.appspotmail.com
    Signed-off-by: Vasily Averin
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
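
The arithmetic behind fls(hashsize - 1) is easy to check by hand. The sketch below uses a portable loop instead of the kernel's fls(), and `demo_buckets()` is a hypothetical stand-in for the "returns 0 for nbits > 31" validation mentioned above.

```c
#include <stdint.h>

/* Portable "find last set": 1-based index of the highest set bit, 0 for x == 0. */
unsigned int demo_fls(uint32_t x)
{
    unsigned int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Bucket count for nbits, or 0 when nbits is too large (assumed limit: 31). */
uint64_t demo_buckets(unsigned int nbits)
{
    if (nbits > 31)
        return 0;
    return (uint64_t)1 << nbits;
}
```

For a round hashsize of 1024, demo_fls(1023) gives 10; for a non-round 1000, demo_fls(999) also gives 10, so both map to 1024 buckets. A hashsize close to 2^32 yields 32 bits, which the size check then rejects without ever shifting by 32.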
     

11 Dec, 2020

1 commit


09 Dec, 2020

4 commits

  • Since commit 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id
    hash calculation") the ct id will not change from initialization to
    confirmation. Removing the confirmation check allows for things like
    adding an element to a 'typeof ct id' set in prerouting upon reception
    of the first packet of a new connection, and then being able to
    reference that set consistently both before and after the connection
    is confirmed.

    Fixes: 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation")
    Signed-off-by: Brett Mastbergen
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Brett Mastbergen
     
  • Linux 5.10-rc7

    Signed-off-by: Greg Kroah-Hartman
    Change-Id: Ie61b3510311a825ee57bee12610e25bc1500b350

    Greg Kroah-Hartman
     
  • Add an explicit comment in the code to describe the indirect
    serialization of the holders of the commit_mutex with the rtnl_mutex.
    Commit 90d2723c6d4c ("netfilter: nf_tables: do not hold reference on
    netdevice from preparation phase") already describes this, but a comment
    in this case is better for reference.

    Reported-by: Vladimir Oltean
    Reviewed-by: Vladimir Oltean
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by
    8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23
    days"), otherwise ruleset listing breaks.

    Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

08 Dec, 2020

1 commit

  • When running concurrent iptables rules replacement with data, the per CPU
    sequence count is checked after the assignment of the new information.
    The sequence count is used to synchronize with the packet path without the
    use of any explicit locking. If there are any packets in the packet path using
    the table information, the sequence count is incremented to an odd value and
    is incremented to an even value once packet processing completes.

    The new table value assignment is followed by a write memory barrier so every
    CPU should see the latest value. If the packet path has started with the old
    table information, the sequence counter will be odd and the iptables
    replacement will wait till the sequence count is even prior to freeing the
    old table info.

    However, this assumes that the new table information assignment and the memory
    barrier are actually executed prior to the counter check in the replacement
    thread. If the CPU decides to execute the assignment later, as there is no user
    of the table information prior to the sequence check, the packet path on another
    CPU may use the old table information. The replacement thread would then free
    the table information under it, leading to a use after free in the packet
    processing context:

    Unable to handle kernel NULL pointer dereference at virtual
    address 000000000000008e
    pc : ip6t_do_table+0x5d0/0x89c
    lr : ip6t_do_table+0x5b8/0x89c
    ip6t_do_table+0x5d0/0x89c
    ip6table_filter_hook+0x24/0x30
    nf_hook_slow+0x84/0x120
    ip6_input+0x74/0xe0
    ip6_rcv_finish+0x7c/0x128
    ipv6_rcv+0xac/0xe4
    __netif_receive_skb+0x84/0x17c
    process_backlog+0x15c/0x1b8
    napi_poll+0x88/0x284
    net_rx_action+0xbc/0x23c
    __do_softirq+0x20c/0x48c

    This could be fixed by forcing instruction order after the new table
    information assignment or by switching to RCU for the synchronization.

    Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
    Reported-by: Sean Tranchetti
    Reported-by: kernel test robot
    Suggested-by: Florian Westphal
    Signed-off-by: Subash Abhinov Kasiviswanathan
    Signed-off-by: Pablo Neira Ayuso

    Subash Abhinov Kasiviswanathan
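
The ordering requirement described above (publish the new table, then read the sequence count, never the other way round) can be expressed with C11 atomics. This is a simplified user-space sketch, not the x_tables code: it assumes a single counter with a single reader, where the kernel keeps one counter per CPU, and every `demo_*` name is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_table { int id; };

static _Atomic(struct demo_table *) demo_current;
static atomic_uint demo_seq;   /* odd while the (single) reader is inside */

/* Packet path: mark entry, then pick up whichever table is current. */
struct demo_table *demo_reader_enter(void)
{
    atomic_fetch_add(&demo_seq, 1);          /* becomes odd */
    return atomic_load(&demo_current);
}

void demo_reader_exit(void)
{
    atomic_fetch_add(&demo_seq, 1);          /* becomes even again */
}

/* Replacement path: returns the old table once no reader can still hold it. */
struct demo_table *demo_replace(struct demo_table *new_table)
{
    struct demo_table *old = atomic_exchange(&demo_current, new_table);

    /* Full fence: the store above must be visible before the counter is read. */
    atomic_thread_fence(memory_order_seq_cst);

    while (atomic_load(&demo_seq) & 1u)
        ;   /* reader still inside; wait for it to leave */
    return old;
}
```

With the store and the counter read kept in order, either the reader sees the new table or the writer sees an odd count and waits, so the old table is never freed while in use. A write-only barrier would not give that guarantee, since it does not order the store against the later load.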
     

27 Nov, 2020

3 commits

  • Userspace might match on prefix bytes of header fields if they are on
    the byte boundary, this requires that the mask is adjusted accordingly.
    Use NFT_OFFLOAD_MATCH_EXACT() for meta since prefix byte matching is not
    allowed for this type of selector.

    The bitwise expression might be optimized out by userspace, hence the
    kernel needs to infer the prefix from the number of payload bytes to
    match on. This patch adds nft_payload_offload_mask() to calculate the
    bitmask to match on the prefix.

    Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
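
Inferring a prefix mask from the number of payload bytes, as the commit above describes, amounts to setting the leading bytes of the mask and clearing the rest. The helper below is a hypothetical illustration of that calculation only; it is not nft_payload_offload_mask() and makes no claim about that function's signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill mask[0..field_len) so that only the first match_len bytes are compared.
 * Returns false if the requested prefix is longer than the field.
 */
bool demo_prefix_mask(unsigned char *mask, size_t field_len, size_t match_len)
{
    if (match_len > field_len)
        return false;
    memset(mask, 0xff, match_len);
    memset(mask + match_len, 0x00, field_len - match_len);
    return true;
}
```

Matching the first 3 bytes of a 4-byte field, for instance, produces the mask ff ff ff 00, which corresponds to a /24 prefix on an IPv4 address.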
     
  • This patch adds nft_flow_rule_set_addr_type() to set the address type
    from the nft_payload expression accordingly.

    If the address type is not set in the control dissector then a rule that
    matches either on source or destination IP address does not work.

    After this patch, nft hardware offload generates the flow dissector
    configuration as tc-flower does to match on an IP address.

    This patch has also been tested functionally to make sure packets are
    filtered out by the NIC.

    This also aligns the code with the existing netfilter flow
    offload infrastructure, which also sets the control dissector.

    Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • kmemleak reports a memory leak as follows:

    BUG: memory leak
    unreferenced object 0xffff8880759ea000 (size 256):
    backtrace:
    kmem_cache_zalloc include/linux/slab.h:656 [inline]
    __proc_create+0x23d/0x7d0 fs/proc/generic.c:421
    proc_create_reg+0x8e/0x140 fs/proc/generic.c:535
    proc_create_net_data+0x8c/0x1b0 fs/proc/proc_net.c:126
    ip_vs_control_net_init+0x308/0x13a0 net/netfilter/ipvs/ip_vs_ctl.c:4169
    __ip_vs_init+0x211/0x400 net/netfilter/ipvs/ip_vs_core.c:2429
    ops_init+0xa8/0x3c0 net/core/net_namespace.c:151
    setup_net+0x2de/0x7e0 net/core/net_namespace.c:341
    copy_net_ns+0x27d/0x530 net/core/net_namespace.c:482
    create_new_namespaces+0x382/0xa30 kernel/nsproxy.c:110
    copy_namespaces+0x2e6/0x3b0 kernel/nsproxy.c:179
    copy_process+0x220a/0x5f00 kernel/fork.c:2072
    _do_fork+0xc7/0xda0 kernel/fork.c:2428
    __do_sys_clone3+0x18a/0x280 kernel/fork.c:2703
    do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    In the error path of ip_vs_control_net_init(), remove_proc_entry() needs
    to be called to remove the added proc entry, otherwise a memory leak
    will occur.

    Also, add '#ifdef CONFIG_PROC_FS' guards because the proc_create_net*
    functions return NULL when procfs is not used.

    Fixes: b17fc9963f83 ("IPVS: netns, ip_vs_stats and its procfs")
    Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.")
    Reported-by: Hulk Robot
    Signed-off-by: Wang Hai
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    Wang Hai
     

26 Nov, 2020

2 commits

  • There are reports of a lockdep splat in nftables, e.g.:
    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 31416 at net/netfilter/nf_tables_api.c:622
    lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables]
    ...

    These are caused by an earlier, unrelated bug such as an ABBA deadlock
    in a different subsystem.
    In such an event, lockdep is disabled and lockdep_is_held returns true
    unconditionally. This then causes the WARN() in nf_tables.

    Make the WARN conditional on lockdep still being active to avoid this.

    Fixes: f102d66b335a417 ("netfilter: nf_tables: use dedicated mutex to guard transactions")
    Reported-by: Naresh Kamboju
    Link: https://lore.kernel.org/linux-kselftest/CA+G9fYvFUpODs+NkSYcnwKnXm62tmP=ksLeBPmB+KFrB2rvCtQ@mail.gmail.com/
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
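
The reasoning in the commit above can be reproduced with a tiny model: once lock debugging is switched off, a "not held" assertion can no longer trust the answer it gets. The `demo_*` variables below are hypothetical stand-ins for lockdep state and a warning counter; they are not the kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for lockdep state; not the kernel symbols. */
bool demo_lockdep_active = true;   /* cleared after lockdep hits any bug */
bool demo_mutex_held;
int demo_warnings;

/* Mimics the described behaviour: once disabled, always answers "held". */
static bool demo_is_held(void)
{
    return demo_lockdep_active ? demo_mutex_held : true;
}

/* Assert that the mutex is NOT held, but only while lockdep is trustworthy. */
void demo_assert_not_held(void)
{
    if (demo_lockdep_active && demo_is_held())
        demo_warnings++;
}
```

Without the `demo_lockdep_active` test, the first call after lockdep shuts down would count a warning even though nothing is wrong in this code path.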
     
  • syzbot found that we are not validating user input properly
    before copying 16 bytes [1].

    Using NLA_BINARY in ipaddr_policy[] for IPv6 address is not correct,
    since it ensures at most 16 bytes were provided.

    We should instead make sure the user provided exactly 16 bytes.

    In old kernels (before v4.20), the fix would be to remove the NLA_BINARY,
    since NLA_POLICY_EXACT_LEN() was not yet available.

    [1]
    BUG: KMSAN: uninit-value in hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892
    CPU: 1 PID: 11611 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x21c/0x280 lib/dump_stack.c:118
    kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
    __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
    hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892
    hash_ip6_uadt+0x976/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:267
    call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
    ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
    ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
    nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
    netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
    nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
    __do_sys_sendmsg net/socket.c:2449 [inline]
    __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
    do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45deb9
    Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fe2e503fc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000029ec0 RCX: 000000000045deb9
    RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
    RBP: 000000000118bf60 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
    R13: 000000000169fb7f R14: 00007fe2e50409c0 R15: 000000000118bf2c

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
    kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289
    __msan_chain_origin+0x57/0xa0 mm/kmsan/kmsan_instr.c:147
    ip6_netmask include/linux/netfilter/ipset/pfxlen.h:49 [inline]
    hash_ip6_netmask net/netfilter/ipset/ip_set_hash_ip.c:185 [inline]
    hash_ip6_uadt+0xb1c/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:263
    call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
    ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
    ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
    nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
    netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
    nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
    __do_sys_sendmsg net/socket.c:2449 [inline]
    __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
    do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
    kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289
    kmsan_memcpy_memmove_metadata+0x25e/0x2d0 mm/kmsan/kmsan.c:226
    kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:246
    __msan_memcpy+0x46/0x60 mm/kmsan/kmsan_instr.c:110
    ip_set_get_ipaddr6+0x2cb/0x370 net/netfilter/ipset/ip_set_core.c:310
    hash_ip6_uadt+0x439/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:255
    call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
    ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
    ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
    nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
    netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
    nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
    __do_sys_sendmsg net/socket.c:2449 [inline]
    __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
    do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
    kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104
    kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76
    slab_alloc_node mm/slub.c:2906 [inline]
    __kmalloc_node_track_caller+0xc61/0x15f0 mm/slub.c:4512
    __kmalloc_reserve net/core/skbuff.c:142 [inline]
    __alloc_skb+0x309/0xae0 net/core/skbuff.c:210
    alloc_skb include/linux/skbuff.h:1094 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
    netlink_sendmsg+0xdb8/0x1840 net/netlink/af_netlink.c:1894
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
    __do_sys_sendmsg net/socket.c:2449 [inline]
    __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
    do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
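
The difference between "at most 16 bytes" and "exactly 16 bytes" from the commit above is a single comparison. The function below is a hypothetical user-space illustration, with the attribute payload and its length passed in directly; the name, signature and EINVAL return are assumptions, and it does not use the netlink policy machinery.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_IPV6_LEN 16

/* Copies an IPv6 address from an attribute payload of attr_len bytes. */
int demo_get_ipaddr6(const void *payload, size_t attr_len,
                     unsigned char out[DEMO_IPV6_LEN])
{
    /* "At most 16" is not enough: a short payload would make the copy
     * read past the data the user actually supplied. */
    if (attr_len != DEMO_IPV6_LEN)
        return -EINVAL;
    memcpy(out, payload, DEMO_IPV6_LEN);
    return 0;
}
```

An upper bound protects the destination buffer, but only an exact-length check protects the source side of a fixed-size copy.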
     

09 Nov, 2020

1 commit


31 Oct, 2020

1 commit

  • In ip_set_match_extensions(), for sets with counters, we take care of
    updating counters themselves by calling ip_set_update_counter(), and of
    checking if the given comparison and values match, by calling
    ip_set_match_counter() if needed.

    However, if a given comparison on counters doesn't match the configured
    values, that doesn't mean the set entry itself isn't matching.

    This fix restores the behaviour we had before commit 4750005a85f7
    ("netfilter: ipset: Fix "don't update counters" mode when counters used
    at the matching"), without reintroducing the issue fixed there: back
    then, mtype_data_match() first updated counters in any case, and then
    took care of matching on counters.

    Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set,
    ip_set_update_counter() will skip counter updates anyway.

    The issue observed is illustrated by this reproducer:

    ipset create c hash:ip counters
    ipset add c 192.0.2.1
    iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP

    If we now send packets from 192.0.2.1, the bytes and packets counters
    for the entry as shown by 'ipset list' are always zero, and, no
    matter how many bytes we send, the rule will never match, because
    counters themselves are not updated.

    Reported-by: Mithil Mhatre
    Fixes: 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching")
    Signed-off-by: Stefano Brivio
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Stefano Brivio
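
The order of operations the commit above restores (update the counters first, then compare them) can be sketched as follows. The structure, the flag value and the function names are hypothetical; the threshold mirrors the --bytes-gt 800 rule from the reproducer.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SKIP_COUNTER_UPDATE 0x1u   /* hypothetical flag value */

struct demo_counter {
    uint64_t bytes;
    uint64_t packets;
};

/* Update step: honours the "skip" flag on its own. */
static void demo_update_counter(struct demo_counter *c, uint32_t len,
                                unsigned int flags)
{
    if (flags & DEMO_SKIP_COUNTER_UPDATE)
        return;
    c->bytes += len;
    c->packets++;
}

/* Update first, then evaluate a "--bytes-gt" style comparison. */
bool demo_match(struct demo_counter *c, uint32_t len, unsigned int flags,
                uint64_t bytes_gt)
{
    demo_update_counter(c, len, flags);
    return c->bytes > bytes_gt;
}
```

With ten 100-byte packets and a threshold of 800, the first eight calls return false while the counters keep growing, and the ninth and tenth return true. If the comparison ran before the update and returned early, the counters would stay at zero and the rule could never match.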
     

30 Oct, 2020

3 commits

  • If userspace does not include the trailing end of batch message, then
    nfnetlink aborts the transaction. This allows checking that ruleset
    updates trigger no errors.

    After this patch, invoking this command from the prerouting chain:

    # nft -c add rule x y fib saddr . oif type local

    fails since oif is not supported there.

    This patch fixes the lack of rule validation from the abort/check path
    to catch configuration errors such as the one above.

    Fixes: a654de8fdc18 ("netfilter: nf_tables: fix chain dependency validation")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • If netfilter changes the packet mark when mangling, the packet is
    rerouted using the route_me_harder set of functions. Prior to this
    commit, there's one big difference between route_me_harder and the
    ordinary initial routing functions, described in the comment above
    __ip_queue_xmit():

    /* Note: skb->sk can be different from sk, in case of tunnels */
    int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,

    That function goes on to correctly make use of sk->sk_bound_dev_if,
    rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
    tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
    It will make some transformations to that packet, and then it will send
    the encapsulated packet out of a *new* socket. That new socket will
    basically always have a different sk_bound_dev_if (otherwise there'd be
    a routing loop). So for the purposes of routing the encapsulated packet,
    the routing information as it pertains to the socket should come from
    that socket's sk, rather than the packet's original skb->sk. For that
    reason __ip_queue_xmit() and related functions all do the right thing.

    One might argue that all tunnels should just call skb_orphan(skb) before
    transmitting the encapsulated packet into the new socket. But tunnels do
    *not* do this -- and this is wisely avoided in skb_scrub_packet() too --
    because features like TSQ rely on skb->destructor() being called when
    that buffer space is truly available again. Calling skb_orphan(skb) too
    early would result in buffers filling up unnecessarily and accounting
    info being all wrong. Instead, additional routing must take into account
    the new sk, just as __ip_queue_xmit() notes.

    So, this commit addresses the problem by fishing the correct sk out of
    state->sk -- it's already set properly in the call to nf_hook() in
    __ip_local_out(), which receives the sk as part of its normal
    functionality. So we make sure to plumb state->sk through the various
    route_me_harder functions, and then make correct use of it following the
    example of __ip_queue_xmit().

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason A. Donenfeld
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Jason A. Donenfeld
     
  • The netlink report should be sent regardless of the available listeners.

    Fixes: 84d7fce69388 ("netfilter: nf_tables: export rule-set generation ID")
    Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

27 Oct, 2020

1 commit


26 Oct, 2020

1 commit


22 Oct, 2020

1 commit

  • Similar to 7980d2eabde8 ("ipvs: clear skb->tstamp in forwarding path").
    The fq qdisc requires tstamp to be cleared in the forwarding path.

    Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths")
    Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
    Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

20 Oct, 2020

3 commits

  • This patch fixes the issue due to:

    BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2
    net/netfilter/nf_tables_offload.c:40
    Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

    The error happens when expr->ops is accessed before the boundary check
    is performed, after nft_expr_next() has moved expr out of bounds.

    This patch checks the boundary condition before accessing expr->ops,
    which fixes the slab-out-of-bounds read.

    Add nft_expr_more() and use it to fix this problem.

    Signed-off-by: Saeed Mirzamohammadi
    Signed-off-by: Pablo Neira Ayuso

    Saeed Mirzamohammadi
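
The "check the bound before touching the element" pattern from the commit above is shown here on a simplified layout. Everything in this sketch is an assumption made for illustration: the rule is a flat byte buffer of size-prefixed records, and `demo_expr_more()` only mirrors the role of nft_expr_more(), not its implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: a rule is a byte buffer of back-to-back records,
 * each starting with a one-byte total size (header included).
 */
struct demo_rule {
    const uint8_t *data;
    size_t len;
};

/* True only if a whole record starts at off; checked BEFORE reading the record. */
static bool demo_expr_more(const struct demo_rule *r, size_t off)
{
    return off < r->len && r->data[off] != 0 && r->data[off] <= r->len - off;
}

size_t demo_count_exprs(const struct demo_rule *r)
{
    size_t off = 0, n = 0;

    while (demo_expr_more(r, off)) {
        n++;
        off += r->data[off];   /* step to the next record */
    }
    return n;
}
```

The short-circuit `&&` matters: `r->data[off]` is read only after `off < r->len` has held, which is the ordering the original loop got wrong.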
     
  • If the first packet conntrack sees after a re-register is an outgoing
    keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
    SND.NXT-1.
    When the peer correctly acknowledges SND.NXT, tcp_in_window fails
    check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and
    returns false, which cascades into nf_conntrack_in setting
    skb->_nfct = 0 and in later conntrack iptables rules not matching.
    In cases where iptables is dropping packets that do not match
    conntrack rules, this can cause idle TCP connections to time out.

    v2: adjust td_end when getting the reply rather than when sending out
    the keepalive packet.

    Fixes: f94e63801ab2 ("netfilter: conntrack: reset tcp maxwin on re-register")
    Signed-off-by: Francesco Ruggeri
    Signed-off-by: Pablo Neira Ayuso

    Francesco Ruggeri
     
  • Output client, virtual, and dst address info when the TCP state changes,
    which makes connection debugging clearer.

    Signed-off-by: longguang.yue
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    longguang.yue
     

16 Oct, 2020

2 commits

  • Minor conflicts in net/mptcp/protocol.h and
    tools/testing/selftests/net/Makefile.

    In both cases code was added on both sides in the same place
    so just keep both.

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     
  • nftables payload statements are used to mangle SCTP headers, but they can
    only replace the Internet Checksum. As a consequence, nftables rules that
    mangle sport/dport/vtag in SCTP headers potentially generate packets that
    are discarded by the receiver, unless the CRC-32C is "offloaded" (e.g. the
    rule mangles a skb having 'ip_summed' equal to 'CHECKSUM_PARTIAL').

    Fix this by extending uAPI definitions and the L4 checksum update function, in a
    way that userspace programs (e.g. nft) can instruct the kernel to compute
    CRC-32C in SCTP headers. Also ensure that LIBCRC32C is built if NF_TABLES
    is 'y' or 'm' in the kernel build configuration.

    Signed-off-by: Davide Caratti
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Jakub Kicinski

    Davide Caratti
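
For reference, the CRC-32C that SCTP uses in place of the Internet Checksum can be computed bit by bit as below. This is a plain, slow user-space sketch for illustration, not the kernel's LIBCRC32C; `demo_crc32c()` is a hypothetical name. For SCTP the value is computed over the whole packet with the checksum field set to zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t demo_crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    int bit;

    while (len--) {
        crc ^= *p++;
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The standard check value applies: the nine ASCII bytes "123456789" give 0xE3069283. Changing any header byte such as sport, dport or vtag changes this value, which is why a mangled packet is discarded unless the CRC is recomputed.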
     

15 Oct, 2020

1 commit

  • This definition is used by the iptables legacy UAPI, restore it.

    Fixes: d3519cb89f6d ("netfilter: nf_tables: add inet ingress support")
    Reported-by: Jason A. Donenfeld
    Tested-by: Jason A. Donenfeld
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Jakub Kicinski

    Pablo Neira Ayuso
     

14 Oct, 2020

1 commit

  • Dump vlan tag and proto for the usual vlan offload case if the
    NF_LOG_MACDECODE flag is set. Without this information the logging is
    misleading as there is no reference to the VLAN header.

    [12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0
    [12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1

    Fixes: 83e96d443b37 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

12 Oct, 2020

7 commits