20 Jan, 2021

1 commit

  • [ Upstream commit 2b33d6ffa9e38f344418976b06057e2fc2aa9e2a ]

    Currently mtype_resize() can cause an oops:

    t = ip_set_alloc(htable_size(htable_bits));
    if (!t) {
            ret = -ENOMEM;
            goto out;
    }
    t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits));

    An increased htable_bits can force htable_size() to return 0.
    In turn, ip_set_alloc(0) does not return NULL but ZERO_SIZE_PTR,
    so the following access to t->hregion should trigger an oops.

    Signed-off-by: Vasily Averin
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Vasily Averin
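
The failure mode described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `sketch_htable`, `sketch_bucket`, `sketch_htable_size()` and `sketch_table_alloc()` are hypothetical names, and the guard shown (rejecting a zero size before allocating) is only one plausible way to avoid dereferencing a zero-sized allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's table header and bucket pointer. */
struct sketch_bucket { void *data; };
struct sketch_htable {
    uint8_t htable_bits;
    void *hregion;
};

/* Return the table size in bytes, or 0 if it cannot be represented. */
static size_t sketch_htable_size(uint8_t hbits)
{
    size_t hsize;

    if (hbits > 31)            /* 1 << hbits must fit into 32 bits */
        return 0;
    hsize = (size_t)1 << hbits;
    if ((SIZE_MAX - sizeof(struct sketch_htable)) / sizeof(struct sketch_bucket *) < hsize)
        return 0;              /* the multiplication below would overflow */
    return hsize * sizeof(struct sketch_bucket *) + sizeof(struct sketch_htable);
}

/* Allocate a table, refusing a zero size before touching the allocator. */
static struct sketch_htable *sketch_table_alloc(uint8_t hbits)
{
    size_t size = sketch_htable_size(hbits);
    struct sketch_htable *t;

    if (size == 0)
        return NULL;           /* caller treats this as an allocation failure */
    t = calloc(1, size);
    if (t != NULL)
        t->htable_bits = hbits;
    return t;
}
```

Checking the computed size before the allocation means the caller never sees a non-NULL pointer to zero bytes, which is the situation the commit message warns about.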
     

13 Jan, 2021

1 commit

  • commit 5c8193f568ae16f3242abad6518dc2ca6c8eef86 upstream.

    htable_bits() can call jhash_size(32) and trigger shift-out-of-bounds

    UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6
    shift exponent 32 is too large for 32-bit type 'unsigned int'
    CPU: 0 PID: 8498 Comm: syz-executor519
    Not tainted 5.10.0-rc7-next-20201208-syzkaller #0
    Call Trace:
    __dump_stack lib/dump_stack.c:79 [inline]
    dump_stack+0x107/0x163 lib/dump_stack.c:120
    ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
    __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
    htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline]
    hash_mac_create.cold+0x58/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524
    ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115
    nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252
    netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
    nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:672
    ____sys_sendmsg+0x6e8/0x810 net/socket.c:2345
    ___sys_sendmsg+0xf3/0x170 net/socket.c:2399
    __sys_sendmsg+0xe5/0x1b0 net/socket.c:2432
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This patch replaces htable_bits() with a simple fls(hashsize - 1) call,
    which by itself returns valid nbits for both round and non-round hashsizes.
    Any nbits value is acceptable here because it is validated inside the
    following htable_size() call, which returns 0 for nbits > 31.

    Fixes: 1feab10d7e6d ("netfilter: ipset: Unified hash type generation")
    Reported-by: syzbot+d66bfadebca46cf61a2b@syzkaller.appspotmail.com
    Signed-off-by: Vasily Averin
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
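
To make the fls(hashsize - 1) computation concrete, here is a small userspace sketch. `sketch_fls()` is a portable stand-in written for this example (the kernel has its own fls()), and `sketch_hash_bits()` is a hypothetical wrapper name.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based position of the most significant set bit; 0 when x is 0. */
static int sketch_fls(uint32_t x)
{
    int r = 0;

    while (x != 0) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Number of hash bits needed to hold at least `hashsize` buckets. */
static int sketch_hash_bits(uint32_t hashsize)
{
    assert(hashsize != 0);     /* assumption: callers enforce a minimum size */
    return sketch_fls(hashsize - 1);
}
```

A round size such as 1024 and a non-round size such as 1000 both yield 10 bits, while 1025 yields 11. A hashsize above 2^31 yields 32, which is the value the subsequent size check is expected to reject.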
     

26 Nov, 2020

1 commit

  • syzbot found that we are not validating user input properly
    before copying 16 bytes [1].

    Using NLA_BINARY in ipaddr_policy[] for an IPv6 address is not correct,
    since it only ensures that at most 16 bytes were provided.

    We should instead make sure the user provided exactly 16 bytes.

    In old kernels (before v4.20), the fix would be to remove the NLA_BINARY,
    since NLA_POLICY_EXACT_LEN() was not yet available.

    [1]
    BUG: KMSAN: uninit-value in hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892
    CPU: 1 PID: 11611 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x21c/0x280 lib/dump_stack.c:118
    kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
    __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
    hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892
    hash_ip6_uadt+0x976/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:267
    call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
    ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
    ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
    nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
    netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
    nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
    __do_sys_sendmsg net/socket.c:2449 [inline]
    __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
    do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45deb9
    Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fe2e503fc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000029ec0 RCX: 000000000045deb9
    RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
    RBP: 000000000118bf60 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
    R13: 000000000169fb7f R14: 00007fe2e50409c0 R15: 000000000118bf2c

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
    kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289
    __msan_chain_origin+0x57/0xa0 mm/kmsan/kmsan_instr.c:147
    ip6_netmask include/linux/netfilter/ipset/pfxlen.h:49 [inline]
    hash_ip6_netmask net/netfilter/ipset/ip_set_hash_ip.c:185 [inline]
    hash_ip6_uadt+0xb1c/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:263
    call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
    ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
    ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
    nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
    netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
    nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
    __do_sys_sendmsg net/socket.c:2449 [inline]
    __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
    do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
    kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289
    kmsan_memcpy_memmove_metadata+0x25e/0x2d0 mm/kmsan/kmsan.c:226
    kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:246
    __msan_memcpy+0x46/0x60 mm/kmsan/kmsan_instr.c:110
    ip_set_get_ipaddr6+0x2cb/0x370 net/netfilter/ipset/ip_set_core.c:310
    hash_ip6_uadt+0x439/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:255
    call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
    ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
    ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
    nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
    netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
    nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
    __do_sys_sendmsg net/socket.c:2449 [inline]
    __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
    do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
    kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104
    kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76
    slab_alloc_node mm/slub.c:2906 [inline]
    __kmalloc_node_track_caller+0xc61/0x15f0 mm/slub.c:4512
    __kmalloc_reserve net/core/skbuff.c:142 [inline]
    __alloc_skb+0x309/0xae0 net/core/skbuff.c:210
    alloc_skb include/linux/skbuff.h:1094 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
    netlink_sendmsg+0xdb8/0x1840 net/netlink/af_netlink.c:1894
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
    __do_sys_sendmsg net/socket.c:2449 [inline]
    __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
    do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
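
The difference between an "at most 16 bytes" check and an "exactly 16 bytes" check can be sketched as follows. This is a userspace illustration only: `sketch_get_ipaddr6()` and `SKETCH_IPV6_LEN` are hypothetical names, and the real fix is expressed through the netlink attribute policy rather than a hand-written length test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IPV6_LEN 16

/*
 * Copy an IPv6 address out of an attribute payload.
 * Returns 0 on success, -1 if the payload is not exactly 16 bytes.
 */
static int sketch_get_ipaddr6(const uint8_t *payload, size_t len,
                              uint8_t out[SKETCH_IPV6_LEN])
{
    if (payload == NULL || len != SKETCH_IPV6_LEN)
        return -1;             /* a shorter payload would leave bytes unset */
    memcpy(out, payload, SKETCH_IPV6_LEN);
    return 0;
}
```

With a "len <= 16" test instead, an 8-byte payload would pass validation and the subsequent 16-byte copy would read past the data the user actually supplied.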
     

31 Oct, 2020

1 commit

  • In ip_set_match_extensions(), for sets with counters, we take care of
    updating counters themselves by calling ip_set_update_counter(), and of
    checking if the given comparison and values match, by calling
    ip_set_match_counter() if needed.

    However, if a given comparison on counters doesn't match the configured
    values, that doesn't mean the set entry itself isn't matching.

    This fix restores the behaviour we had before commit 4750005a85f7
    ("netfilter: ipset: Fix "don't update counters" mode when counters used
    at the matching"), without reintroducing the issue fixed there: back
    then, mtype_data_match() first updated counters in any case, and then
    took care of matching on counters.

    Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set,
    ip_set_update_counter() will anyway skip counter updates if desired.

    The issue observed is illustrated by this reproducer:

    ipset create c hash:ip counters
    ipset add c 192.0.2.1
    iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP

    If we now send packets from 192.0.2.1, the bytes and packets counters
    for the entry as shown by 'ipset list' are always zero and, no
    matter how many bytes we send, the rule will never match, because
    the counters themselves are not updated.

    Reported-by: Mithil Mhatre
    Fixes: 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching")
    Signed-off-by: Stefano Brivio
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Stefano Brivio
     

05 Oct, 2020

1 commit


22 Jul, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Pablo Neira Ayuso

    Gustavo A. R. Silva
     

01 Jul, 2020

1 commit

  • Whenever ip_set_alloc() is used, the allocated memory can come from
    either kmalloc() or vmalloc(). We should therefore free it with
    kvfree() or ip_set_free(), not kfree().

    invalid opcode: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 21935 Comm: syz-executor.3 Not tainted 5.8.0-rc2-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__phys_addr+0xa7/0x110 arch/x86/mm/physaddr.c:28
    Code: 1d 7a 09 4c 89 e3 31 ff 48 d3 eb 48 89 de e8 d0 58 3f 00 48 85 db 75 0d e8 26 5c 3f 00 4c 89 e0 5b 5d 41 5c c3 e8 19 5c 3f 00 0b e8 12 5c 3f 00 48 c7 c0 10 10 a8 89 48 ba 00 00 00 00 00 fc
    RSP: 0000:ffffc900018572c0 EFLAGS: 00010046
    RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffc9000fac3000
    RDX: 0000000000040000 RSI: ffffffff8133f437 RDI: 0000000000000007
    RBP: ffffc90098aff000 R08: 0000000000000000 R09: ffff8880ae636cdb
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000408018aff000
    R13: 0000000000080000 R14: 000000000000001d R15: ffffc900018573d8
    FS: 00007fc540c66700(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc9dcd67200 CR3: 0000000059411000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    virt_to_head_page include/linux/mm.h:841 [inline]
    virt_to_cache mm/slab.h:474 [inline]
    kfree+0x77/0x2c0 mm/slab.c:3749
    hash_net_create+0xbb2/0xd70 net/netfilter/ipset/ip_set_hash_gen.h:1536
    ip_set_create+0x6a2/0x13c0 net/netfilter/ipset/ip_set_core.c:1128
    nfnetlink_rcv_msg+0xbe8/0xea0 net/netfilter/nfnetlink.c:230
    netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
    nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:564
    netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
    netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:672
    ____sys_sendmsg+0x6e8/0x810 net/socket.c:2352
    ___sys_sendmsg+0xf3/0x170 net/socket.c:2406
    __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
    do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45cb19
    Code: Bad RIP value.
    RSP: 002b:00007fc540c65c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004fed80 RCX: 000000000045cb19
    RDX: 0000000000000000 RSI: 0000000020001080 RDI: 0000000000000003
    RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 000000000000095e R14: 00000000004cc295 R15: 00007fc540c666d4

    Fixes: f66ee0410b1c ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports")
    Fixes: 03c8b234e61a ("netfilter: ipset: Generalize extensions support")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     

25 Jun, 2020

1 commit

  • When using ip_set with counters and comment, traffic causes the kernel
    to panic on 32-bit ARM:

    Alignment trap: not handling instruction e1b82f9f at []
    Unhandled fault: alignment exception (0x221) at 0xea08133c
    PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]

    The problem occurs when we try to update the 64-bit counters - the
    faulting address above is not 64-bit aligned. The problem occurs
    due to the way elements are allocated, for example:

    set->dsize = ip_set_elem_len(set, tb, 0, 0);
    map = ip_set_alloc(sizeof(*map) + elements * set->dsize);

    If the element has a requirement for a member to be 64-bit aligned,
    and set->dsize is not a multiple of 8 but is a multiple of four,
    then every odd-numbered element will be misaligned, and hitting
    an atomic64_add() on such an element will cause the kernel to panic.

    ip_set_elem_len() must return a size that is rounded to the maximum
    alignment of any extension field stored in the element. This change
    ensures that is the case.

    Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment")
    Signed-off-by: Russell King
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Russell King
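
The arithmetic behind the misalignment can be checked with a short sketch. `sketch_align_up()` and `sketch_elem_offset()` are hypothetical helpers written for this example; the element sizes used in the note below are illustrative values, not ones taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t sketch_align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Byte offset of element `index` in an array of dsize-byte elements. */
static size_t sketch_elem_offset(size_t index, size_t dsize)
{
    return index * dsize;
}
```

With a 12-byte element, element 1 starts at offset 12, which is not a multiple of 8, so a 64-bit counter at the start of that element would be misaligned. Rounding the element size up to 16 places every element on an 8-byte boundary, assuming the array itself starts on one.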
     

26 May, 2020

1 commit

  • If IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE is set, the user requested not to
    update counters in sub sets. Therefore IPSET_FLAG_SKIP_COUNTER_UPDATE
    must be set, not unset.

    Fixes: 6e01781d1c80e ("netfilter: ipset: set match: add support to match the counters")
    Signed-off-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso

    Phil Sutter
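
The intended flag handling can be sketched as a pure function. The `SKETCH_FLAG_*` names and their bit positions are assumptions made for this example, and `sketch_subset_cmdflags()` is a hypothetical helper, not the kernel function that was patched.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions; the real values live in the ipset UAPI header. */
#define SKETCH_FLAG_SKIP_COUNTER_UPDATE    (1u << 3)
#define SKETCH_FLAG_SKIP_SUBCOUNTER_UPDATE (1u << 4)

/* Compute the flags passed down when a list-type set consults its sub sets. */
static uint32_t sketch_subset_cmdflags(uint32_t cmdflags)
{
    /* "Skip sub-counter update" must turn counter updates off, not on. */
    if (cmdflags & SKETCH_FLAG_SKIP_SUBCOUNTER_UPDATE)
        cmdflags |= SKETCH_FLAG_SKIP_COUNTER_UPDATE;
    return cmdflags;
}
```

The bug described above corresponds to clearing the skip bit in that branch, which would enable exactly the updates the user asked to suppress.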
     

06 Apr, 2020

1 commit

  • ip_set_type_list is traversed using list_for_each_entry_rcu
    outside an RCU read-side critical section but under the protection
    of ip_set_type_mutex.

    Hence, add corresponding lockdep expression to silence false-positive
    warnings, and harden RCU lists.

    Signed-off-by: Amol Grover
    Signed-off-by: Pablo Neira Ayuso

    Amol Grover
     

15 Mar, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    Lastly, fix checkpatch.pl warning
    WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
    in net/bridge/netfilter/ebtables.c

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Pablo Neira Ayuso

    Gustavo A. R. Silva
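
As a concrete illustration of allocating a structure that ends in a flexible array member, consider the sketch below. `sketch_map` and `sketch_map_alloc()` are hypothetical names, and the overflow check is an addition made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_map {
    size_t elements;
    uint32_t members[];        /* flexible array member, must come last */
};

/* Allocate a map with room for `elements` trailing members, zero-filled. */
static struct sketch_map *sketch_map_alloc(size_t elements)
{
    struct sketch_map *map;

    /* Reject counts whose total byte size would overflow size_t. */
    if (elements > (SIZE_MAX - sizeof(*map)) / sizeof(map->members[0]))
        return NULL;
    /* sizeof(*map) covers the header only; the array is added explicitly. */
    map = calloc(1, sizeof(*map) + elements * sizeof(map->members[0]));
    if (map != NULL)
        map->elements = elements;
    return map;
}
```

The size expression is the same one that would be written for a zero-length array, which is why the conversion leaves existing allocation code unchanged.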
     

22 Feb, 2020

2 commits

  • When the forceadd option is enabled, the hash:* types should find and replace
    the first entry in the bucket with the new one if there are no reusable
    (deleted or timed out) entries. However, the position index was just not set
    to zero and remained the invalid -1 if there were no reusable entries.

    Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com
    Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
    Signed-off-by: Jozsef Kadlecsik

    Jozsef Kadlecsik
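
The slot selection described in the forceadd fix above can be sketched like this. `sketch_pick_slot()` is a hypothetical helper; the bucket is modelled as an array of "reusable" flags rather than real entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Pick the bucket position to overwrite.
 * Returns the first reusable slot, slot 0 when forceadd is set and nothing
 * is reusable, or -1 when there is no usable position.
 */
static int sketch_pick_slot(const bool *reusable, size_t n, bool forceadd)
{
    int j = -1;

    assert(reusable != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (reusable[i]) {
            j = (int)i;
            break;
        }
    }
    /* Without this fallback, j would stay at the invalid index -1. */
    if (j == -1 && forceadd && n > 0)
        j = 0;
    return j;
}
```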
     
  • In the case of huge hash:* types of sets, due to the single spinlock of
    a set, processing the whole set under spinlock protection could take
    too long.

    There were four places where the whole hash table of the set was processed
    from bucket to bucket while holding the spinlock:

    - During resizing a set, the original set was locked to exclude kernel side
      add/del element operations (userspace add/del is excluded by the
      nfnetlink mutex). The original set is actually just read during the
      resize, so the spinlocking is replaced with rcu locking of regions.
      However, this means there can be parallel kernel side add/del of entries.
      In order not to lose those operations, a backlog is added and replayed
      after the successful resize.
    - Garbage collection of timed out entries was also protected by the spinlock.
      In order not to lock too long, region locking is introduced and a single
      region is processed in one gc go. Also, the simple timer based gc running
      is replaced with a workqueue based solution. The internal book-keeping
      (number of elements, size of extensions) is moved to region level due to
      the region locking.
    - Adding elements: when the max number of the elements is reached, the gc
      was called to evict the timed out entries. The new approach is that the gc
      is called just for the matching region, assuming that if the region
      (proportionally) seems to be full, then the whole set is too. We could scan
      the other regions to check every entry under rcu locking, but for huge
      sets it'd mean a slowdown at adding elements.
    - Listing the set header data: when the set was defined with timeout
      support, the garbage collector was called to clean up timed out entries
      to get the correct element numbers and set size values. Now the set is
      scanned to check non-timed out entries, without actually calling the gc
      for the whole set.

    Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
    SOFTIRQ-unsafe lock order issues while working on the patch.

    Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
    Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
    Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
    Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
    Signed-off-by: Jozsef Kadlecsik

    Jozsef Kadlecsik
     

30 Jan, 2020

1 commit

  • find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held.
    However, in the error path there can be a follow-up recvmsg() without
    the mutex held. Use the start() function of struct netlink_dump_control
    instead of dump() to verify and report if the specified set does not
    exist.

    Thanks to Pablo Neira Ayuso for helping me to understand the subtleties
    of the netlink protocol.

    Reported-by: syzbot+fc69d7cb21258ab4ae4d@syzkaller.appspotmail.com
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Kadlecsik József
     

21 Jan, 2020

1 commit

  • The bitmap allocation did not use full unsigned long sizes
    when calculating the required size, and this was reported by KASAN
    as slab-out-of-bounds reads in several places. The patch fixes all
    of them.

    Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com
    Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com
    Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com
    Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com
    Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com
    Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com
    Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Kadlecsik József
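
Sizing a bitmap in whole unsigned longs can be sketched as below. `sketch_bitmap_bytes()` and `SKETCH_BITS_PER_LONG` are hypothetical names for this example; the kernel provides its own bitmap helpers for the same purpose.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Bytes needed to store nbits when the bitmap is accessed as unsigned longs. */
static size_t sketch_bitmap_bytes(size_t nbits)
{
    size_t longs = (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

    return longs * sizeof(unsigned long);
}
```

Bit operations that read a full unsigned long at a time will touch every byte of the last word, so an allocation rounded only to a smaller unit can be read out of bounds even when all requested bits fit.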
     

14 Jan, 2020

1 commit

  • map->members is freed by ip_set_free() right before it is used again in
    mtype_ext_cleanup(). So we just have to move the ip_set_free() call down.

    Reported-by: syzbot+4c3cc6dbe7259dbf9054@syzkaller.appspotmail.com
    Fixes: 40cd63bf33b2 ("netfilter: ipset: Support extensions which need a per data destroy function")
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Cong Wang
    Signed-off-by: Pablo Neira Ayuso

    Cong Wang
     

09 Jan, 2020

1 commit

  • The set uadt functions assume lineno is never NULL, but it is in
    case of ip_set_utest().

    syzkaller managed to generate a netlink message that calls this with
    LINENO attr present:

    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    RIP: 0010:hash_mac4_uadt+0x1bc/0x470 net/netfilter/ipset/ip_set_hash_mac.c:104
    Call Trace:
    ip_set_utest+0x55b/0x890 net/netfilter/ipset/ip_set_core.c:1867
    nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229
    netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
    nfnetlink_rcv+0x1ba/0x460 net/netfilter/nfnetlink.c:563

    Pass a dummy lineno storage; it's easier than patching all set
    implementations.

    This seems to be a day-0 bug.

    Cc: Jozsef Kadlecsik
    Reported-by: syzbot+34bd2369d38707f3f4a7@syzkaller.appspotmail.com
    Fixes: a7b4f989a6294 ("netfilter: ipset: IP set core support")
    Signed-off-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

13 Nov, 2019

1 commit


10 Nov, 2019

1 commit


05 Nov, 2019

4 commits

  • Since v5.2 (commit "netlink: re-add parse/validate functions in strict
    mode") NL_VALIDATE_STRICT is enabled. Fix the ipset nla_policies which did
    not support strict mode and convert from deprecated parsings to verified ones.

    Signed-off-by: Jozsef Kadlecsik

    Jozsef Kadlecsik
     
  • Same as commit 1b4a75108d5b ("netfilter: ipset: Copy the right MAC
    address in bitmap:ip,mac and hash:ip,mac sets"), another copy and paste
    went wrong in commit 8cc4ccf58379 ("netfilter: ipset: Allow matching on
    destination MAC address for mac and ipmac sets").

    When I fixed this for IPv4 in 1b4a75108d5b, I didn't realise that
    hash:ip,mac sets also support IPv6 as family, and this is covered by a
    separate function, hash_ipmac6_kadt().

    In hash:ip,mac sets, the first dimension is the IP address, and the
    second dimension is the MAC address: check the IPSET_DIM_TWO_SRC flag
    in flags while deciding which MAC address to copy, destination or
    source.

    This way, mixing source and destination matches for the two dimensions
    of ip,mac hash type works as expected, also for IPv6. With this setup:

    ip netns add A
    ip link add veth1 type veth peer name veth2 netns A
    ip addr add 2001:db8::1/64 dev veth1
    ip -net A addr add 2001:db8::2/64 dev veth2
    ip link set veth1 up
    ip -net A link set veth2 up

    dst=$(ip netns exec A cat /sys/class/net/veth2/address)

    ip netns exec A ipset create test_hash hash:ip,mac family inet6
    ip netns exec A ipset add test_hash 2001:db8::1,${dst}
    ip netns exec A ip6tables -A INPUT -p icmpv6 --icmpv6-type 135 -j ACCEPT
    ip netns exec A ip6tables -A INPUT -m set ! --match-set test_hash src,dst -j DROP

    ipset now correctly matches a test packet:

    # ping -c1 2001:db8::2 >/dev/null
    # echo $?
    0

    Reported-by: Chen, Yi
    Fixes: 8cc4ccf58379 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets")
    Signed-off-by: Stefano Brivio
    Signed-off-by: Jozsef Kadlecsik

    Stefano Brivio
     
  • The copy_to_user() function returns the number of bytes remaining to be
    copied. In this code, that positive return is checked at the end of the
    function and we return zero/success. What we should do instead is
    return -EFAULT.

    Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Jozsef Kadlecsik

    Dan Carpenter
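
The return-value convention at issue can be illustrated with a userspace mock. `sketch_copy_to_user()` is a hypothetical stand-in that imitates the "returns the number of bytes not copied" contract, and `sketch_send_reply()` is an invented caller; neither is kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Mock of the copy contract: copy at most `writable` bytes and
 * return how many of the requested n bytes could NOT be copied.
 */
static unsigned long sketch_copy_to_user(void *dst, const void *src,
                                         unsigned long n, unsigned long writable)
{
    unsigned long done = n < writable ? n : writable;

    memcpy(dst, src, done);
    return n - done;
}

/* Translate a partial copy into -EFAULT instead of passing the count on. */
static int sketch_send_reply(void *udst, unsigned long writable,
                             const void *data, unsigned long len)
{
    assert(udst != NULL && data != NULL);
    if (sketch_copy_to_user(udst, data, len, writable) != 0)
        return -EFAULT;
    return 0;
}
```

Storing the raw return value in the function's result variable is what lets a positive "bytes remaining" count be mistaken for something other than an error further down.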
     
  • The net,iface equal functions currently compare the full interface
    names. In several cases, wildcard (or prefix) matching is useful. For
    example, when converting a large iptables rule-set to make use of ipset,
    I was able to significantly reduce the number of set elements by making
    use of wildcard matching.

    Wildcard matching is enabled by adding "wildcard" when adding an element
    to a set. Internally, this causes the IPSET_FLAG_IFACE_WILDCARD-flag to
    be set. When this flag is set, only the initial part of the interface
    name is used for comparison.

    Wildcard matching is done per element and not per set, as there are many
    cases where mixing wildcard and non-wildcard elements is useful. This
    means that it is up to the user to handle (avoid) overlapping interface
    names.

    Signed-off-by: Kristian Evensen
    Signed-off-by: Jozsef Kadlecsik

    Kristian Evensen
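
Per-element prefix matching of interface names, as described in the wildcard commit above, can be sketched as follows. `sketch_iface_equal()` is a hypothetical helper; it assumes the stored element name is the prefix and the packet's interface name is the candidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Compare a stored element name against a packet's interface name.
 * With wildcard set, only the first strlen(elem_iface) characters count.
 */
static bool sketch_iface_equal(const char *elem_iface, const char *pkt_iface,
                               bool wildcard)
{
    assert(elem_iface != NULL && pkt_iface != NULL);
    if (wildcard)
        return strncmp(elem_iface, pkt_iface, strlen(elem_iface)) == 0;
    return strcmp(elem_iface, pkt_iface) == 0;
}
```

A wildcard element "eth" therefore matches "eth0" and "eth1", while a non-wildcard element "eth0" matches only "eth0". Two wildcard elements such as "eth" and "eth1" overlap, which is the situation the commit message leaves to the user to avoid.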
     

08 Oct, 2019

6 commits


14 Aug, 2019

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next:

    1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.

    2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.

    3) More strict validation of IPVS sysctl values, from Junwei Hu.

    4) Remove unnecessary spaces on the right hand side of assignments,
    from yangxingwu.

    5) Add offload support for bitwise operation.

    6) Extend the nft_offload_reg structure to store immediate data.

    7) Collapse several ip_set header files into ip_set.h, from
    Jeremy Sowden.

    8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
    from Jeremy Sowden.

    9) Fix several sparse warnings due to missing prototypes, from
    Valdis Kletnieks.

    10) Use static lock initialiser to ensure connlabel spinlock is
    initialized on boot time to fix sched/act_ct.c, patch
    from Florian Westphal.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

13 Aug, 2019

2 commits

  • linux/netfilter/ipset/ip_set.h included four other header files:

    include/linux/netfilter/ipset/ip_set_comment.h
    include/linux/netfilter/ipset/ip_set_counter.h
    include/linux/netfilter/ipset/ip_set_skbinfo.h
    include/linux/netfilter/ipset/ip_set_timeout.h

    Of these the first three were not included anywhere else. The last,
    ip_set_timeout.h, was included in a couple of other places, but defined
    inline functions which call other inline functions defined in ip_set.h,
    so ip_set.h had to be included before it.

    Inlined all four into ip_set.h, and updated the other files that
    included ip_set_timeout.h.

    Signed-off-by: Jeremy Sowden
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
     
  • This patch removes extra spaces.

    Signed-off-by: yangxingwu
    Signed-off-by: Pablo Neira Ayuso

    yangxingwu
     

30 Jul, 2019

3 commits

  • Shijie Luo reported that when stress-testing ipset with multiple concurrent
    create, rename, flush, list, destroy commands, it can result in

    ipset : Broken LIST kernel message: missing DATA part!

    error messages and broken list results. The problem was that the rename
    operation was not properly handled with respect to listing. The patch fixes
    the issue.

    Reported-by: Shijie Luo
    Signed-off-by: Jozsef Kadlecsik

    Jozsef Kadlecsik
     
  • In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
    for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
    KADT functions for sets matching on MAC addresses the copy of source or
    destination MAC address depending on the configured match.

    This was done correctly for hash:mac, but for hash:ip,mac and
    bitmap:ip,mac, copying and pasting the same code block presents an
    obvious problem: in these two set types, the MAC address is the second
    dimension, not the first one, and we are actually selecting the MAC
    address depending on whether the first dimension (IP address) specifies
    source or destination.

    Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.

    This way, mixing source and destination matches for the two dimensions
    of ip,mac set types works as expected. With this setup:

    ip netns add A
    ip link add veth1 type veth peer name veth2 netns A
    ip addr add 192.0.2.1/24 dev veth1
    ip -net A addr add 192.0.2.2/24 dev veth2
    ip link set veth1 up
    ip -net A link set veth2 up

    dst=$(ip netns exec A cat /sys/class/net/veth2/address)

    ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
    ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
    ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP

    ip netns exec A ipset create test_hash hash:ip,mac
    ip netns exec A ipset add test_hash 192.0.2.1,${dst}
    ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP

    ipset correctly matches a test packet:

    # ping -c1 192.0.2.2 >/dev/null
    # echo $?
    0
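
    The flag check described above can be sketched in userspace C as
    follows. This is only an illustration under stated assumptions: the
    struct, the function name select_mac() and the numeric flag values are
    hypothetical stand-ins, not the kernel's definitions. The point is that
    the MAC address, being the second dimension, is chosen by the
    second-dimension flag rather than the first-dimension one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Stand-in option flags, one "source" bit per set dimension.
 * The numeric values are illustrative assumptions. */
enum {
	IPSET_DIM_ONE_SRC = 1u << 1, /* first dimension (IP) matches on source */
	IPSET_DIM_TWO_SRC = 1u << 2, /* second dimension (MAC) matches on source */
};

/* Hypothetical, simplified Ethernet header. */
struct eth_header {
	uint8_t dest[ETH_ALEN];
	uint8_t source[ETH_ALEN];
};

/* Pick the MAC address used for the second dimension of an ip,mac set.
 * Only the second-dimension flag is consulted; the first-dimension flag
 * describes the IP address and must not influence this choice. */
void select_mac(const struct eth_header *eth, unsigned int flags,
		uint8_t out[ETH_ALEN])
{
	assert(eth != NULL && out != NULL);

	if (flags & IPSET_DIM_TWO_SRC)
		memcpy(out, eth->source, ETH_ALEN);
	else
		memcpy(out, eth->dest, ETH_ALEN);
}
```

    With a "src,dst" match only the first-dimension flag is set, so this
    sketch copies the destination MAC address, which is what the setup
    above relies on.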

    Reported-by: Chen Yi
    Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
    Signed-off-by: Stefano Brivio
    Signed-off-by: Jozsef Kadlecsik

    Stefano Brivio
     
  • In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
    for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
    KADT check that prevents matching on destination MAC addresses for
    hash:mac sets, but forgot to remove the same check for hash:ip,mac set.

    Drop this check: the functionality is now documented in the man pages
    and there's no reason to restrict matching to source MAC addresses
    anymore.

    Reported-by: Chen Yi
    Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
    Signed-off-by: Stefano Brivio
    Signed-off-by: Jozsef Kadlecsik

    Stefano Brivio
     

25 Jun, 2019

1 commit


19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

10 Jun, 2019

3 commits

  • It's better to use my kadlec@netfilter.org email address in
    the source code. I might not be able to use
    kadlec@blackhole.kfki.hu in the future.

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Jozsef Kadlecsik

    Jozsef Kadlecsik
     
  • If a fresh array block is allocated during resize, the current in-memory
    set size should be increased by the size of the block, not replaced by it.

    Before the fix, adding entries to a hash set type, leading to a table
    resize, caused an inconsistent memory size to be reported. This becomes
    more obvious when swapping sets with similar sizes:

    # cat hash_ip_size.sh
    #!/bin/sh
    FAIL_RETRIES=10

    tries=0
    while [ ${tries} -lt ${FAIL_RETRIES} ]; do
        ipset create t1 hash:ip
        for i in `seq 1 4345`; do
            ipset add t1 1.2.$((i / 255)).$((i % 255))
        done
        t1_init="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')"

        ipset create t2 hash:ip
        for i in `seq 1 4360`; do
            ipset add t2 1.2.$((i / 255)).$((i % 255))
        done
        t2_init="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')"

        ipset swap t1 t2
        t1_swap="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')"
        t2_swap="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')"

        ipset destroy t1
        ipset destroy t2
        tries=$((tries + 1))

        if [ ${t1_init} -lt 10000 ] || [ ${t2_init} -lt 10000 ]; then
            echo "FAIL after ${tries} tries:"
            echo "T1 size ${t1_init}, after swap ${t1_swap}"
            echo "T2 size ${t2_init}, after swap ${t2_swap}"
            exit 1
        fi
    done
    echo "PASS"
    # echo -n 'func hash_ip4_resize +p' > /sys/kernel/debug/dynamic_debug/control
    # ./hash_ip_size.sh
    [ 2035.018673] attempt to resize set t1 from 10 to 11, t 00000000fe6551fa
    [ 2035.078583] set t1 resized from 10 (00000000fe6551fa) to 11 (00000000172a0163)
    [ 2035.080353] Table destroy by resize 00000000fe6551fa
    FAIL after 4 tries:
    T1 size 9064, after swap 71128
    T2 size 71128, after swap 9064
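
    The difference between replacing and increasing the accounted size can
    be illustrated with a minimal C sketch. The struct and function names
    below are hypothetical and do not mirror the kernel code; they only
    contrast the two accounting styles described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accounting state for a table that is being resized. */
struct table_acct {
	size_t ext_size; /* bytes accounted so far */
};

/* Buggy variant: each freshly allocated block overwrites the running
 * total, so only the last block ends up being counted. */
void account_block_replace(struct table_acct *t, size_t block_bytes)
{
	assert(t != NULL);
	t->ext_size = block_bytes;
}

/* Fixed variant: each freshly allocated block is added to the total. */
void account_block_add(struct table_acct *t, size_t block_bytes)
{
	assert(t != NULL);
	t->ext_size += block_bytes;
}
```

    Accounting three 64-byte blocks yields 64 with the first variant and
    192 with the second, which is the kind of under-reporting the script
    above detects.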

    Reported-by: NOYB
    Fixes: 9e41f26a505c ("netfilter: ipset: Count non-static extension memory for userspace")
    Signed-off-by: Stefano Brivio
    Signed-off-by: Jozsef Kadlecsik

    Stefano Brivio
     
  • In dump_init(), the outdated comment was incorrect and the return value
    of nla_parse_deprecated() was not validated.
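
    The general pattern of validating a parser's return value before
    touching its output can be sketched as below. Everything here is a
    hypothetical stand-in: parse_attrs() is a toy parser over (type, value)
    byte pairs and does not reproduce the signature or behaviour of
    nla_parse_deprecated(), and dump_setup() is not the kernel's
    dump_init().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ATTR_MAX 4

/* Toy parser: input is a sequence of (type, value) byte pairs.
 * Fills tb[type] with a pointer to the value byte.
 * Returns 0 on success or -EINVAL on malformed input. */
int parse_attrs(const unsigned char *buf, size_t len,
		const unsigned char *tb[ATTR_MAX + 1])
{
	assert(buf != NULL && tb != NULL);

	for (size_t t = 0; t <= ATTR_MAX; t++)
		tb[t] = NULL;

	if (len % 2 != 0)
		return -EINVAL;

	for (size_t i = 0; i < len; i += 2) {
		if (buf[i] > ATTR_MAX)
			return -EINVAL;
		tb[buf[i]] = &buf[i + 1];
	}
	return 0;
}

/* Caller: checks the parse result before reading any attribute. */
int dump_setup(const unsigned char *buf, size_t len, unsigned char *proto)
{
	const unsigned char *tb[ATTR_MAX + 1];
	int ret = parse_attrs(buf, len, tb);

	if (ret < 0)
		return ret; /* the kind of check that was missing */

	if (tb[1] == NULL)
		return -ENOENT; /* required attribute absent */

	*proto = *tb[1];
	return 0;
}
```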

    Signed-off-by: Jozsef Kadlecsik

    Jozsef Kadlecsik