07 Oct, 2020

1 commit

  • commit 1cc5ef91d2ff94d2bf2de3b3585423e8a1051cb6 upstream.

    The indexes to the nf_nat_l[34]protos arrays come from userspace. So
    check the tuple's family, e.g. l3num, when creating the conntrack in
    order to prevent an OOB memory access during setup. Here is an example
    kernel panic on 4.14.180 when userspace passes in an index greater than
    NFPROTO_NUMPROTO.

    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    Modules linked in:...
    Process poc (pid: 5614, stack limit = 0x00000000a3933121)
    CPU: 4 PID: 5614 Comm: poc Tainted: G S W O 4.14.180-g051355490483
    Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM
    task: 000000002a3dfffe task.stack: 00000000a3933121
    pc : __cfi_check_fail+0x1c/0x24
    lr : __cfi_check_fail+0x1c/0x24
    ...
    Call trace:
    __cfi_check_fail+0x1c/0x24
    name_to_dev_t+0x0/0x468
    nfnetlink_parse_nat_setup+0x234/0x258
    ctnetlink_parse_nat_setup+0x4c/0x228
    ctnetlink_new_conntrack+0x590/0xc40
    nfnetlink_rcv_msg+0x31c/0x4d4
    netlink_rcv_skb+0x100/0x184
    nfnetlink_rcv+0xf4/0x180
    netlink_unicast+0x360/0x770
    netlink_sendmsg+0x5a0/0x6a4
    ___sys_sendmsg+0x314/0x46c
    SyS_sendmsg+0xb4/0x108
    el0_svc_naked+0x34/0x38

    This crash no longer happens on 5.4+; however, ctnetlink still
    allows creating entries with an unsupported layer 3 protocol number.

    Fixes: c1d10adb4a521 ("[NETFILTER]: Add ctnetlink port for nf_conntrack")
    Signed-off-by: Will McVicker
    [pablo@netfilter.org: rebased original patch on top of nf.git]
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Will McVicker
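A minimal userspace sketch of the check described above: an index that comes from userspace is validated against the array bound before it is used. The table `l3protos`, the `NUM_PROTOS` constant and `lookup_l3proto()` are hypothetical stand-ins, not the kernel's nf_nat_l[34]protos arrays or its NFPROTO_NUMPROTO definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NFPROTO_NUMPROTO: number of slots in the table. */
#define NUM_PROTOS 13

struct l3proto {
	const char *name;
};

/* Hypothetical per-family table, indexed by a family number from userspace. */
static const struct l3proto *l3protos[NUM_PROTOS];

/*
 * Return the handler for @family, or NULL if it is out of range or
 * unregistered. The range check must come before the array access,
 * otherwise a large family value reads past the end of l3protos[].
 */
static const struct l3proto *lookup_l3proto(uint8_t family)
{
	if (family >= NUM_PROTOS)
		return NULL;
	return l3protos[family];
}
```

Returning NULL lets the caller reject the request (for example with an error code) instead of dereferencing memory outside the table.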
     

01 Oct, 2020

2 commits

  • [ Upstream commit 526e81b990e53e31ba40ba304a2285ffd098721f ]

    The openvswitch module fails initialization when used in a kernel
    without IPv6 enabled. nf_conncount_init() fails because the ct code
    unconditionally tries to initialize the netns IPv6-related bit,
    regardless of the build option. This change skips the IPv6 part when
    IPv6 is not enabled.

    Note that the corresponding _put() function already has this IPv6
    configuration check.

    Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
    Signed-off-by: Eelco Chaudron
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Eelco Chaudron
     
  • [ Upstream commit 0a6a9515fe390976cd762c52d8d4f446d7a14285 ]

    It is safe to traverse &net->nft.tables with &net->nft.commit_mutex
    held using list_for_each_entry_rcu(). Silence the following
    PROVE_RCU_LIST false positive:

    WARNING: suspicious RCU usage
    net/netfilter/nf_tables_api.c:523 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by iptables/1384:
    #0: ffffffff9745c4a8 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x25/0x60 [nf_tables]

    Call Trace:
    dump_stack+0xa1/0xea
    lockdep_rcu_suspicious+0x103/0x10d
    nft_table_lookup.part.0+0x116/0x120 [nf_tables]
    nf_tables_newtable+0x12c/0x7d0 [nf_tables]
    nfnetlink_rcv_batch+0x559/0x1190 [nfnetlink]
    nfnetlink_rcv+0x1da/0x210 [nfnetlink]
    netlink_unicast+0x306/0x460
    netlink_sendmsg+0x44b/0x770
    ____sys_sendmsg+0x46b/0x4a0
    ___sys_sendmsg+0x138/0x1a0
    __sys_sendmsg+0xb6/0x130
    __x64_sys_sendmsg+0x48/0x50
    do_syscall_64+0x69/0xf4
    entry_SYSCALL_64_after_hwframe+0x49/0xb3

    Signed-off-by: Qian Cai
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Qian Cai
     

17 Sep, 2020

1 commit

  • [ Upstream commit cc5453a5b7e90c39f713091a7ebc53c1f87d1700 ]

    If an sctp connection gets re-used, heartbeats are flagged as invalid
    because their vtag doesn't match.

    Handle this in a similar way as TCP conntrack when it suspects that the
    endpoints and conntrack are out-of-sync.

    When a HEARTBEAT request fails its vtag validation, flag this in the
    conntrack state and accept the packet.

    When a HEARTBEAT_ACK is received with an invalid vtag in the reverse
    direction after we allowed such a HEARTBEAT through, assume we are
    out-of-sync and re-set the vtag info.

    v2: remove a left-over snippet from an older incarnation that moved
    the new_state/old_state assignments; that's not needed, so keep that
    as-is.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Florian Westphal
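The heartbeat handling above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the kernel's nf_conntrack_proto_sctp code: the `struct sctp_ct` layout, the `hb_vtag_failed`/`hb_dir`/`hb_vtag` fields and `sctp_check_vtag()` are hypothetical, and "re-set the vtag info" is modelled as adopting the tags seen on both sides of the HEARTBEAT/HEARTBEAT_ACK exchange.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };
enum chunk { CHUNK_HEARTBEAT, CHUNK_HEARTBEAT_ACK, CHUNK_OTHER };

/* Hypothetical, simplified per-connection state. */
struct sctp_ct {
	uint32_t vtag[2];     /* expected verification tag per direction */
	bool hb_vtag_failed;  /* a HEARTBEAT with a bad vtag was let through */
	enum dir hb_dir;      /* direction of that HEARTBEAT */
	uint32_t hb_vtag;     /* vtag the HEARTBEAT carried */
};

/* Return true to accept the packet, false to flag it as invalid. */
static bool sctp_check_vtag(struct sctp_ct *ct, enum dir dir,
			    enum chunk type, uint32_t vtag)
{
	if (vtag == ct->vtag[dir])
		return true;

	if (type == CHUNK_HEARTBEAT) {
		/* Possibly a re-used connection: remember it and accept. */
		ct->hb_vtag_failed = true;
		ct->hb_dir = dir;
		ct->hb_vtag = vtag;
		return true;
	}

	if (type == CHUNK_HEARTBEAT_ACK && ct->hb_vtag_failed &&
	    ct->hb_dir != dir) {
		/* Reply to that HEARTBEAT: assume out-of-sync and resync. */
		ct->vtag[ct->hb_dir] = ct->hb_vtag;
		ct->vtag[dir] = vtag;
		ct->hb_vtag_failed = false;
		return true;
	}

	return false;
}
```

Only a HEARTBEAT_ACK travelling in the reverse direction of a previously accepted mismatching HEARTBEAT triggers the resync, which mirrors the TCP-style "endpoints and conntrack are out-of-sync" heuristic the commit refers to.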
     

10 Sep, 2020

3 commits

  • [ Upstream commit ee921183557af39c1a0475f982d43b0fcac25e2e ]

    The frontend callback reports EAGAIN to nfnetlink to retry a command;
    this is used to signal that module autoloading is required.
    Unfortunately, nlmsg_unicast() also reports EAGAIN when the receiver
    socket buffer is full, so nfnetlink enters a busy loop.

    This patch updates nfnetlink_unicast() to turn EAGAIN into ENOBUFS and
    to use nlmsg_unicast(). Remove the flags field in nfnetlink_unicast(),
    since it is always MSG_DONTWAIT in the existing code, which is exactly
    what nlmsg_unicast() passes to netlink_unicast() as a parameter.

    Fixes: 96518518cc41 ("netfilter: add nftables")
    Reported-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
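The error translation described above fits in a few lines. This sketch only shows the mapping; `map_unicast_err()` is a hypothetical helper name, not the kernel's nfnetlink_unicast(), and it follows the kernel convention of negative errno return values.

```c
#include <assert.h>
#include <errno.h>

/*
 * EAGAIN is reserved to mean "autoload a module and retry the command",
 * so a full receive buffer must not be reported with the same code.
 * Map it to ENOBUFS so the caller does not retry forever.
 */
static int map_unicast_err(int err)
{
	if (err == -EAGAIN)
		return -ENOBUFS;
	return err;
}
```

Any other result, including success (0), is passed through unchanged.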
     
  • [ Upstream commit 1e105e6afa6c3d32bfb52c00ffa393894a525c27 ]

    The following bug was reported via IRC:
    nft list ruleset
      set knock_candidates_ipv4 {
        type ipv4_addr . inet_service
        size 65535
        elements = { 127.0.0.1 . 123,
                     127.0.0.1 . 123 }
      }
      ..
      udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 }
      udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport }

    It should not have been possible to add a duplicate set entry.

    After some debugging it turned out that the problem is the immediate
    value (123) in the second-to-last rule.

    Concatenations use 32-bit registers, i.e. the elements are 8 bytes each,
    not 6, and it turns out the kernel inserted

    inet firewall @knock_candidates_ipv4
    element 0100007f ffff7b00 : 0 [end]
    element 0100007f 00007b00 : 0 [end]

    Note the non-zero upper bits of the first element. It turns out that
    nft_immediate doesn't zero the destination register, but this is needed
    when the length isn't a multiple of 4.

    Furthermore, the zeroing in nft_payload is broken. We can't use
    [len / 4] = 0: if len is a multiple of 4, the index is off by one.

    Skip zeroing in this case and use a conditional instead of (len - 1) / 4.

    Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Florian Westphal
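A userspace sketch of the zeroing rule described above, assuming 4-byte registers as in the commit. `reg_load()` is a hypothetical helper, not the kernel's nft_payload or nft_immediate code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy @len bytes into an array of 32-bit registers. When @len is not a
 * multiple of 4, the last register is only partially written, so clear
 * it first; otherwise stale upper bytes end up in set keys.
 * Zeroing dst[len / 4] unconditionally would be off by one when @len is
 * a multiple of 4: it would clobber the register after the copied data.
 */
static void reg_load(uint32_t *dst, const uint8_t *src, unsigned int len)
{
	if (len % 4)
		dst[len / 4] = 0;
	memcpy(dst, src, len);
}
```

With a 6-byte value (an IPv4 address plus a port in the concatenation example above), the last two bytes of the second register come out as zero regardless of what the register held before.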
     
  • [ Upstream commit 6f03bf43ee05b31d3822def2a80f11b3591c55b3 ]

    Kernel sends an empty NFTA_SET_USERDATA attribute with no value if
    userspace adds a set with no NFTA_SET_USERDATA attribute.

    Fixes: e6d8ecac9e68 ("netfilter: nf_tables: Add new attributes into nft_set to store user data.")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     

26 Aug, 2020

1 commit

  • [ Upstream commit b428336676dbca363262cc134b6218205df4f530 ]

    On big-endian machines, the register data returned when the exthdr is
    present is not compared correctly, because little-endian is assumed:
    nft_cmp_fast_mask(), called by nft_cmp_fast_eval() and
    nft_cmp_fast_init(), calls cpu_to_le32().

    The following dump also shows that little endian is assumed:

    $ nft --debug=netlink add rule ip recordroute forward ip option rr exists counter
    ip
    [ exthdr load ipv4 1b @ 7 + 0 present => reg 1 ]
    [ cmp eq reg 1 0x01000000 ]
    [ counter pkts 0 bytes 0 ]

    Lastly, debug prints in nft_cmp_fast_init() and nft_cmp_fast_eval() when
    the RR option exists in the packet show that the comparison fails because
    of this assumption:

    nft_cmp_fast_init:189 priv->sreg=4 desc.len=8 mask=0xff000000 data.data[0]=0x10003e0
    nft_cmp_fast_eval:57 regs->data[priv->sreg=4]=0x1 mask=0xff000000 priv->data=0x1000000

    v2: use nft_reg_store8() instead (Florian Westphal). This also avoids the
    warnings reported by the kernel test robot.

    Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options")
    Fixes: c078ca3b0c5b ("netfilter: nft_exthdr: Add support for existence check")
    Signed-off-by: Stephen Suryaputra
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Stephen Suryaputra
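The endianness issue comes down to where a single byte lands inside a 32-bit register. The sketch below is a userspace illustration of storing the byte at the register's first memory position, which is understood to be what nft_reg_store8() does; `reg_store8()` is a hypothetical name, not the kernel helper itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Store one byte so that it occupies the register's first byte in memory
 * on any host. Comparisons that are set up byte-wise then behave the same
 * on little- and big-endian machines. Writing "*reg = val" instead would
 * place the byte at the last memory position on a big-endian host.
 */
static void reg_store8(uint32_t *reg, uint8_t val)
{
	*reg = 0;
	memcpy(reg, &val, 1);
}
```

Clearing the whole register first also guarantees that the remaining three bytes are zero, so a masked comparison does not pick up stale data.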
     

19 Aug, 2020

1 commit

  • [ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

    YangYuxi is reporting that connection reuse
    causes a one-second delay when a SYN hits an
    existing connection in TIME_WAIT state.
    Such a delay was added to give time to expire
    both the IPVS connection and the corresponding
    conntrack. This was considered a rare case
    at that time, but it is causing problems in
    some environments such as Kubernetes.

    As nf_conntrack_tcp_packet() can decide to
    release the conntrack in TIME_WAIT state and
    replace it with a fresh NEW conntrack, we
    can use this to allow rescheduling just by
    tuning our check: if the conntrack is
    confirmed, we cannot schedule it to a different
    real server and the one-second delay still
    applies, but if a new conntrack was created,
    we are free to select a new real server without
    any delay.

    YangYuxi lists some of the problem reports:

    - One second connection delay in masquerading mode:
    https://marc.info/?t=151683118100004&r=1&w=2

    - IPVS low throughput #70747
    https://github.com/kubernetes/kubernetes/issues/70747

    - Apache Bench can fill up ipvs service proxy in seconds #544
    https://github.com/cloudnativelabs/kube-router/issues/544

    - Additional 1s latency in `host -> service IP -> pod`
    https://github.com/kubernetes/kubernetes/issues/90854

    Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
    Co-developed-by: YangYuxi
    Signed-off-by: YangYuxi
    Signed-off-by: Julian Anastasov
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Julian Anastasov
     

29 Jul, 2020

1 commit

  • [ Upstream commit 8210e344ccb798c672ab237b1a4f241bda08909b ]

    sync_thread_backup() only checks whether sk_receive_queue is empty.
    Connection entries cannot be synced when sk_receive_queue is empty but
    sk_rmem_alloc is larger than sk_rcvbuf: the sync packets are then
    dropped in __udp_enqueue_schedule_skb(), because the packets in
    reader_queue are not read, so the rmem is not reclaimed.

    Also check whether the reader_queue of the UDP socket is empty to
    solve this problem.

    Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
    Reported-by: zhouxudong
    Signed-off-by: guodeqing
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    guodeqing
     

16 Jul, 2020

2 commits

  • [ Upstream commit d005fbb855d3b5660d62ee5a6bd2d99c13ff8cf3 ]

    __nf_conntrack_update() might refresh the conntrack object that is
    attached to the skbuff, so it must be fetched again afterwards.
    Otherwise, this triggers a use-after-free (UAF).

    [ 633.200434] ==================================================================
    [ 633.200472] BUG: KASAN: use-after-free in nf_conntrack_update+0x34e/0x770 [nf_conntrack]
    [ 633.200478] Read of size 1 at addr ffff888370804c00 by task nfqnl_test/6769

    [ 633.200487] CPU: 1 PID: 6769 Comm: nfqnl_test Not tainted 5.8.0-rc2+ #388
    [ 633.200490] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
    [ 633.200491] Call Trace:
    [ 633.200499] dump_stack+0x7c/0xb0
    [ 633.200526] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
    [ 633.200532] print_address_description.constprop.6+0x1a/0x200
    [ 633.200539] ? _raw_write_lock_irqsave+0xc0/0xc0
    [ 633.200568] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
    [ 633.200594] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
    [ 633.200598] kasan_report.cold.9+0x1f/0x42
    [ 633.200604] ? call_rcu+0x2c0/0x390
    [ 633.200633] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
    [ 633.200659] nf_conntrack_update+0x34e/0x770 [nf_conntrack]
    [ 633.200687] ? nf_conntrack_find_get+0x30/0x30 [nf_conntrack]

    Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1436
    Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • [ Upstream commit c4e8fa9074ad94f80e5c0dcaa16b313e50e958c5 ]

    Whenever ip_set_alloc() is used, the allocated memory can come from
    either kmalloc() or vmalloc(), so it must be released with kvfree()
    or ip_set_free(), not kfree():

    invalid opcode: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 21935 Comm: syz-executor.3 Not tainted 5.8.0-rc2-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__phys_addr+0xa7/0x110 arch/x86/mm/physaddr.c:28
    Code: 1d 7a 09 4c 89 e3 31 ff 48 d3 eb 48 89 de e8 d0 58 3f 00 48 85 db 75 0d e8 26 5c 3f 00 4c 89 e0 5b 5d 41 5c c3 e8 19 5c 3f 00 0b e8 12 5c 3f 00 48 c7 c0 10 10 a8 89 48 ba 00 00 00 00 00 fc
    RSP: 0000:ffffc900018572c0 EFLAGS: 00010046
    RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffc9000fac3000
    RDX: 0000000000040000 RSI: ffffffff8133f437 RDI: 0000000000000007
    RBP: ffffc90098aff000 R08: 0000000000000000 R09: ffff8880ae636cdb
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000408018aff000
    R13: 0000000000080000 R14: 000000000000001d R15: ffffc900018573d8
    FS: 00007fc540c66700(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc9dcd67200 CR3: 0000000059411000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    virt_to_head_page include/linux/mm.h:841 [inline]
    virt_to_cache mm/slab.h:474 [inline]
    kfree+0x77/0x2c0 mm/slab.c:3749
    hash_net_create+0xbb2/0xd70 net/netfilter/ipset/ip_set_hash_gen.h:1536
    ip_set_create+0x6a2/0x13c0 net/netfilter/ipset/ip_set_core.c:1128
    nfnetlink_rcv_msg+0xbe8/0xea0 net/netfilter/nfnetlink.c:230
    netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
    nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:564
    netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
    netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:672
    ____sys_sendmsg+0x6e8/0x810 net/socket.c:2352
    ___sys_sendmsg+0xf3/0x170 net/socket.c:2406
    __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
    do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45cb19
    Code: Bad RIP value.
    RSP: 002b:00007fc540c65c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004fed80 RCX: 000000000045cb19
    RDX: 0000000000000000 RSI: 0000000020001080 RDI: 0000000000000003
    RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 000000000000095e R14: 00000000004cc295 R15: 00007fc540c666d4

    Fixes: f66ee0410b1c ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports")
    Fixes: 03c8b234e61a ("netfilter: ipset: Generalize extensions support")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Eric Dumazet
     

01 Jul, 2020

1 commit

  • [ Upstream commit 715028460082d07a7ec6fcd87b14b46784346a72 ]

    When using ip_set with counters and comment, traffic causes the kernel
    to panic on 32-bit ARM:

    Alignment trap: not handling instruction e1b82f9f at []
    Unhandled fault: alignment exception (0x221) at 0xea08133c
    PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]

    The problem occurs when we try to update the 64-bit counters: the
    faulting address above is not 64-bit aligned. This happens because
    of the way elements are allocated, for example:

    set->dsize = ip_set_elem_len(set, tb, 0, 0);
    map = ip_set_alloc(sizeof(*map) + elements * set->dsize);

    If the element has a member that must be 64-bit aligned, and
    set->dsize is a multiple of four but not of eight, then every
    odd-numbered element will be misaligned, and an atomic64_add() on
    such an element will cause the kernel to panic.

    ip_set_elem_len() must return a size that is rounded to the maximum
    alignment of any extension field stored in the element. This change
    ensures that is the case.

    Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment")
    Signed-off-by: Russell King
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Russell King
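The fix amounts to rounding the element size up to the strictest alignment of its fields. A minimal sketch, assuming an 8-byte alignment requirement for the 64-bit counters (as on 32-bit ARM); `align_up()` and `elem_stride()` are hypothetical helpers, not the ipset code.

```c
#include <assert.h>
#include <stddef.h>

/* Round @len up to a multiple of @align, which must be a power of two. */
static size_t align_up(size_t len, size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/*
 * Hypothetical element size calculation: the stride between elements in
 * a packed array must be a multiple of the strictest alignment among the
 * element's fields, or every other element starts misaligned.
 */
static size_t elem_stride(size_t payload_len, size_t max_align)
{
	return align_up(payload_len, max_align);
}
```

With a 12-byte element, element 1 would start at offset 12, which is not 8-byte aligned; rounding the stride to 16 keeps every element on an 8-byte boundary.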
     

22 Jun, 2020

1 commit


03 Jun, 2020

7 commits

  • commit 4946ea5c1237036155c3b3a24f049fd5f849f8f6 upstream.

    >> include/linux/netfilter/nf_conntrack_pptp.h:13:20: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers]
    extern const char *const pptp_msg_name(u_int16_t msg);
    ^~~~~~

    Reported-by: kbuild test robot
    Fixes: 4c559f15efcc ("netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit 46c1e0621a72e0469ec4edfdb6ed4d387ec34f8a upstream.

    Clang warns:

    net/netfilter/nf_conntrack_core.c:2068:21: warning: variable 'ctinfo' is
    uninitialized when used here [-Wuninitialized]
    nf_ct_set(skb, ct, ctinfo);
    ^~~~~~
    net/netfilter/nf_conntrack_core.c:2024:2: note: variable 'ctinfo' is
    declared here
    enum ip_conntrack_info ctinfo;
    ^
    1 warning generated.

    nf_conntrack_update was split up into nf_conntrack_update and
    __nf_conntrack_update, where the assignment of ctinfo is in
    nf_conntrack_update but it is used in __nf_conntrack_update.

    Pass the value of ctinfo from nf_conntrack_update to
    __nf_conntrack_update so that uninitialized memory is not used
    and everything works properly.

    Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again")
    Link: https://github.com/ClangBuiltLinux/linux/issues/1039
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Nathan Chancellor
     
  • commit 94945ad2b330207cded0fd8d4abebde43a776dfb upstream.

    net/netfilter/nf_conntrack_core.c: In function nf_confirm_cthelper:
    net/netfilter/nf_conntrack_core.c:2117:15: warning: comparison of unsigned expression in < 0 is always false [-Wtype-limits]
    2117 | if (protoff < 0 || (frag_off & htons(~0x7)) != 0)
    | ^

    ipv6_skip_exthdr() returns a signed integer.

    Reported-by: Colin Ian King
    Fixes: 703acd70f249 ("netfilter: nfnetlink_cthelper: unbreak userspace helper support")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit 4c559f15efcc43b996f4da528cd7f9483aaca36d upstream.

    Dan Carpenter says: "Smatch complains that the value for "cmd" comes
    from the network and can't be trusted."

    Add pptp_msg_name() helper function that checks for the array boundary.

    Fixes: f09943fefe6b ("[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port")
    Reported-by: Dan Carpenter
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
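A sketch of the bounds-checked lookup described above, in userspace C. The table contents, `msg_names` and `msg_name()` are hypothetical and only illustrate the pattern; the fallback to entry 0 is an assumption about how an unknown message is reported. The return type is plain `const char *`, since a `const` qualifier on the returned pointer value itself has no effect, which is what the -Wignored-qualifiers warning in the earlier follow-up fix points out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical message-name table; index 0 doubles as the "unknown" entry. */
static const char *const msg_names[] = {
	"UNKNOWN_MESSAGE",
	"START_SESSION_REQUEST",
	"START_SESSION_REPLY",
};

/* @msg comes off the wire, so check it against the table size first. */
static const char *msg_name(uint16_t msg)
{
	if (msg >= ARRAY_SIZE(msg_names))
		return msg_names[0];
	return msg_names[msg];
}
```

Callers can then print the name of any received command without trusting the value they were given.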
     
  • commit 703acd70f2496537457186211c2f03e792409e68 upstream.

    Restore helper data size initialization and fix memcopy of the helper
    data size.

    Fixes: 157ffffeb5dc ("netfilter: nfnetlink_cthelper: reject too large userspace allocation requests")
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit ee04805ff54a63ffd90bc6749ebfe73473734ddb upstream.

    Florian Westphal says:

    "Problem is that after the helper hook was merged back into the confirm
    one, the queueing itself occurs from the confirm hook, i.e. we queue
    from the last netfilter callback in the hook-list.

    Therefore, on return, the packet bypasses the confirm action and the
    connection is never committed to the main conntrack table.

    To fix this there are several ways:
    1. revert the 'Fixes' commit and have an extra helper hook again.
    Works, but has the drawback of adding another indirect call for
    everyone.

    2. Special case this: split the hooks only when userspace helper
    gets added, so queueing occurs at a lower priority again,
    and normal enqueue reinject would eventually call the last hook.

    3. Extend the existing nf_queue ct update hook to allow a forced
    confirmation (plus run the seqadj code).

    This goes for 3)."

    Fixes: 827318feb69cb ("netfilter: conntrack: remove helper hook again")
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit a164b95ad6055c50612795882f35e0efda1f1390 upstream.

    If IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE is set, user requested to not
    update counters in sub sets. Therefore IPSET_FLAG_SKIP_COUNTER_UPDATE
    must be set, not unset.

    Fixes: 6e01781d1c80e ("netfilter: ipset: set match: add support to match the counters")
    Signed-off-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Phil Sutter
     

20 May, 2020

3 commits

  • [ Upstream commit 340eaff651160234bdbce07ef34b92a8e45cd540 ]

    Expired intervals would still match and be dumped to user space until
    garbage collection wiped them out. Make sure they stop matching and
    disappear (from users' perspective) as soon as they expire.

    Fixes: 8d8540c4f5e03 ("netfilter: nft_set_rbtree: add timeout support")
    Signed-off-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Phil Sutter
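The idea of the rbtree timeout fix is that an element stops matching as soon as its timeout passes, even before garbage collection removes it. The sketch below uses a plain array instead of an rbtree, and `struct elem`, `elem_expired()` and `set_lookup()` are hypothetical names; "expired" is taken to mean the current time has reached the expiry time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set element with an absolute expiry time (0 = no timeout). */
struct elem {
	uint32_t key;
	uint64_t expires;
};

static bool elem_expired(const struct elem *e, uint64_t now)
{
	return e->expires && now >= e->expires;
}

/*
 * Lookup that treats expired elements as absent even though they are
 * still stored; a dump would apply the same filter.
 */
static const struct elem *set_lookup(const struct elem *elems, size_t n,
				     uint32_t key, uint64_t now)
{
	for (size_t i = 0; i < n; i++) {
		if (elems[i].key == key && !elem_expired(&elems[i], now))
			return &elems[i];
	}
	return NULL;
}
```

Garbage collection can still reclaim the memory later; the check only changes what lookups and dumps report in the meantime.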
     
  • [ Upstream commit 6f7c9caf017be8ab0fe3b99509580d0793bf0833 ]

    Replace negations of nft_rbtree_interval_end() with a new helper,
    nft_rbtree_interval_start(), wherever this helps to visualise the
    problem at hand, that is, for all the occurrences except for the
    comparison against given flags in __nft_rbtree_get().

    This gets especially useful in the next patch.

    Signed-off-by: Stefano Brivio
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Stefano Brivio
     
  • [ Upstream commit 2c407aca64977ede9b9f35158e919773cae2082f ]

    gcc-10 warns about a suspicious access to an empty struct member:

    net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc':
    net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds]
    1522 | memset(&ct->__nfct_init_offset[0], 0,
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~
    In file included from net/netfilter/nf_conntrack_core.c:37:
    include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset'
    90 | u8 __nfct_init_offset[0];
    | ^~~~~~~~~~~~~~~~~~

    The code is correct but a bit unusual. Rework it slightly in a way that
    does not trigger the warning, using an empty struct instead of an empty
    array. There are probably more elegant ways to do this, but this is the
    smallest change.

    Fixes: c41884ce0562 ("netfilter: conntrack: avoid zeroing timer")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Arnd Bergmann
     

14 May, 2020

2 commits

  • commit c165d57b552aaca607fa5daf3fb524a6efe3c5a3 upstream.

    gcc-10 points out that a code path exists where a pointer to a stack
    variable may be passed back to the caller:

    net/netfilter/nfnetlink_osf.c: In function 'nf_osf_hdr_ctx_init':
    cc1: warning: function may return address of local variable [-Wreturn-local-addr]
    net/netfilter/nfnetlink_osf.c:171:16: note: declared here
    171 | struct tcphdr _tcph;
    | ^~~~~

    I am not sure whether this can happen in practice, but moving the
    variable declaration into the callers avoids the problem.

    Fixes: 31a9c29210e2 ("netfilter: nf_osf: add struct nf_osf_hdr_ctx")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
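The pattern behind the nf_osf fix, moving the scratch variable into the callers, can be sketched as follows. `struct hdr`, `read_hdr()` and `dport_is_zero()` are hypothetical and stand in for the TCP header handling; the point is only that the storage a returned pointer may refer to belongs to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct hdr {
	uint16_t sport;
	uint16_t dport;
};

/*
 * Copy the header out of @pkt into caller-provided storage and return a
 * pointer to it. Because @buf belongs to the caller, the pointer stays
 * valid after this function returns; a local "struct hdr tmp;" declared
 * here instead would leave the caller with a dangling pointer.
 */
static const struct hdr *read_hdr(const unsigned char *pkt, size_t len,
				  struct hdr *buf)
{
	if (len < sizeof(*buf))
		return NULL;
	memcpy(buf, pkt, sizeof(*buf));
	return buf;
}

/* Hypothetical caller: owns the scratch header for as long as it uses it. */
static int dport_is_zero(const unsigned char *pkt, size_t len)
{
	struct hdr scratch;
	const struct hdr *h = read_hdr(pkt, len, &scratch);

	return h && h->dport == 0;
}
```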
     
  • commit ea64d8d6c675c0bb712689b13810301de9d8f77a upstream.

    If the UDP header of a local VXLAN endpoint is NAT-ed, and the VXLAN
    device has disabled UDP checksums and enabled Tx checksum offloading,
    then the skb passed to udp_manip_pkt() has hdr->check == 0 (outer
    checksum disabled) and skb->ip_summed == CHECKSUM_PARTIAL (inner packet
    checksum offloaded).

    Because of the ->ip_summed value, udp_manip_pkt() tries to update the
    outer checksum with the new address and port, leading to an invalid
    checksum sent on the wire, as the original null checksum obviously
    didn't take the old address and port into account.

    So, we can't take ->ip_summed into account in udp_manip_pkt(), as it
    might not refer to the checksum we're acting on. Instead, we can base
    the decision to update the UDP checksum entirely on the value of
    hdr->check, because it's null if and only if checksum is disabled:

    * A fully computed checksum can't be 0, since a 0 checksum is
    represented by the CSUM_MANGLED_0 value instead.

    * A partial checksum can't be 0, since the pseudo-header always adds
    at least one non-zero value (the UDP protocol type 0x11) and adding
    more values to the sum can't make it wrap to 0 as the carry is then
    added to the wrapped number.

    * A disabled checksum uses the special value 0.

    The problem seems to be there from day one, although it was probably
    not visible before UDP tunnels were implemented.

    Fixes: 5b1158e909ec ("[NETFILTER]: Add NAT support for nf_conntrack")
    Signed-off-by: Guillaume Nault
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
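The second bullet above relies on a property of one's-complement addition with end-around carry: adding values that are not all zero never folds to 0. The sketch below checks that property and encodes the resulting rule, "update only if the header checksum field is non-zero". `csum16_add()` and `udp_csum_needs_update()` are hypothetical helpers, not the kernel's checksum API or udp_manip_pkt() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One's-complement 16-bit addition with end-around carry. */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	/* Fold the carry back in; sum <= 0x1fffe, so one fold suffices. */
	return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

/*
 * Decide from the header field alone whether NAT must rewrite the UDP
 * checksum: 0 means "checksum disabled" and must stay 0.
 */
static bool udp_csum_needs_update(uint16_t hdr_check)
{
	return hdr_check != 0;
}
```

For example, 0x0011 + 0xffee gives 0xffff rather than 0, and 0xffff + 0x0001 wraps to 0x0001: the carry is added back, so the sum never becomes 0 once a non-zero term such as the protocol number 0x11 is included.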
     

02 May, 2020

1 commit

  • commit b4faef1739dd1f3b3981b8bf173a2266ea86b1eb upstream.

    syzbot reported the following warning.

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 19934 at net/netfilter/nf_nat_core.c:1106
    nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 0 PID: 19934 Comm: syz-executor.5 Not tainted 5.6.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x188/0x20d lib/dump_stack.c:118
    panic+0x2e3/0x75c kernel/panic.c:221
    __warn.cold+0x2f/0x35 kernel/panic.c:582
    report_bug+0x27b/0x2f0 lib/bug.c:195
    fixup_bug arch/x86/kernel/traps.c:175 [inline]
    fixup_bug arch/x86/kernel/traps.c:170 [inline]
    do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
    do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
    invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
    RIP: 0010:nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
    Code: ff df 48 c1 ea 03 80 3c 02 00 75 75 48 8b 44 24 10 4c 89 ef 48 c7 00 00 00 00 00 e8 e8 f8 53 fb e9 4d fe ff ff e8 ee 9c 16 fb 0b e9 41 fe ff ff e8 e2 45 54 fb e9 b5 fd ff ff 48 8b 7c 24 20
    RSP: 0018:ffffc90005487208 EFLAGS: 00010246
    RAX: 0000000000040000 RBX: 0000000000000004 RCX: ffffc9001444a000
    RDX: 0000000000040000 RSI: ffffffff865c94a2 RDI: 0000000000000005
    RBP: ffff88808b5cf000 R08: ffff8880a2620140 R09: fffffbfff14bcd79
    R10: ffffc90005487208 R11: fffffbfff14bcd78 R12: 0000000000000000
    R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
    nf_nat_ipv6_unregister_fn net/netfilter/nf_nat_proto.c:1017 [inline]
    nf_nat_inet_register_fn net/netfilter/nf_nat_proto.c:1038 [inline]
    nf_nat_inet_register_fn+0xfc/0x140 net/netfilter/nf_nat_proto.c:1023
    nf_tables_register_hook net/netfilter/nf_tables_api.c:224 [inline]
    nf_tables_addchain.constprop.0+0x82e/0x13c0 net/netfilter/nf_tables_api.c:1981
    nf_tables_newchain+0xf68/0x16a0 net/netfilter/nf_tables_api.c:2235
    nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
    nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
    nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
    netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
    netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:672
    ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
    ___sys_sendmsg+0x100/0x170 net/socket.c:2416
    __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
    do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
    entry_SYSCALL_64_after_hwframe+0x49/0xb3

    To quiesce it, unregister the NFPROTO_IPV6 hook instead of NFPROTO_INET
    when registering the NFPROTO_IPV4 hook fails.

    Reported-by: syzbot
    Fixes: d164385ec572 ("netfilter: nat: add inet family nat support")
    Cc: Florian Westphal
    Cc: Stefano Brivio
    Signed-off-by: Hillf Danton
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Hillf Danton
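The error-unwinding rule in this fix can be sketched with a toy registration state. All names here (`struct hooks`, `register_v4()`, `register_v6()`, `unregister_v6()`, `register_inet()`) are hypothetical; the sketch assumes, as the call trace suggests, that the IPv6 hook is registered before the IPv4 one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical registration state for one NAT chain. */
struct hooks {
	bool v4;
	bool v6;
};

static int register_v6(struct hooks *h)
{
	h->v6 = true;
	return 0;
}

static void unregister_v6(struct hooks *h)
{
	h->v6 = false;
}

/* @fail simulates a registration error. */
static int register_v4(struct hooks *h, bool fail)
{
	if (fail)
		return -1;
	h->v4 = true;
	return 0;
}

/*
 * Register both families. If IPv4 fails, undo only what succeeded (the
 * IPv6 hook); tearing down the combined "inet" registration here would
 * also try to remove the IPv4 hook that was never added.
 */
static int register_inet(struct hooks *h, bool v4_fails)
{
	int err = register_v6(h);

	if (err)
		return err;
	err = register_v4(h, v4_fails);
	if (err)
		unregister_v6(h);
	return err;
}
```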
     

23 Apr, 2020

1 commit

  • commit d9583cdf2f38d0f526d9a8c8564dd2e35e649bc7 upstream.

    EINVAL should be used for malformed netlink messages. A new userspace
    utility running on an old kernel can easily get EINVAL when exercising
    new set features, which is misleading.

    Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     

01 Apr, 2020

4 commits

  • commit 2c64605b590edadb3fb46d1ec6badb49e940b479 upstream.

    net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
    net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
    pkt->skb->tc_redirected = 1;
    ^~
    net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
    pkt->skb->tc_from_ingress = 1;
    ^~

    To avoid a direct dependency with tc actions from netfilter, wrap the
    redirect bits around CONFIG_NET_REDIRECT and move helpers to
    include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
    only existing client of these bits in the tree.

    This patch adds skb_set_redirected(), which sets the redirected bit
    on the skbuff, records whether the packet was redirected from ingress,
    and resets the timestamp (the timestamp reset was originally missing in
    the netfilter bugfix).

    Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
    Reported-by: noreply@ellerman.id.au
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit bcfabee1afd99484b6ba067361b8678e28bbc065 upstream.

    Set skb->tc_redirected to 1, otherwise the ifb driver drops the packet.
    Set skb->tc_from_ingress to 1 to reinject the packet back to the ingress
    path after leaving the ifb egress path.

    This patch unconditionally sets these two skb fields, which are
    meaningful to the ifb driver. The existing forward action is guaranteed
    to run from the ingress path.

    Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit 76a109fac206e158eb3c967af98c178cff738e6a upstream.

    Make sure the forward action is only used from ingress.
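
    Restricting an expression to certain hooks usually comes down to a bitmask
    test at rule validation time. The sketch below is hypothetical: the hook
    enum and fake_validate_hooks() are invented for illustration (the egress
    value just gives the test a disallowed hook), and the EOPNOTSUPP error code
    is an assumption rather than a statement about the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hook numbers for a netdev-style family. */
enum fake_netdev_hook { FAKE_HOOK_INGRESS = 0, FAKE_HOOK_EGRESS = 1 };

/* Reject use of an expression from any hook not present in allowed_mask. */
static int fake_validate_hooks(unsigned int hooknum, unsigned int allowed_mask)
{
	if (!(allowed_mask & (1u << hooknum)))
		return -EOPNOTSUPP;
	return 0;
}

/* A forward expression allows only the ingress hook. */
static int fake_fwd_validate(unsigned int hooknum)
{
	return fake_validate_hooks(hooknum, 1u << FAKE_HOOK_INGRESS);
}
```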

    Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit 41e9ec5a54f95eee1a57c8d26ab70e0492548c1b upstream.

    Since pskb_may_pull() may change skb->data, the ip{v6}h pointer must be
    reloaded after calling it.
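
    The hazard can be reproduced in userspace with a buffer that may be
    reallocated. This is a model, not the flowtable code: struct fake_pkt and
    fake_may_pull() are hypothetical, and the "pull" simply grows the buffer
    with zeroes instead of copying paged data. The point it shows is that a
    header pointer derived from the data before the call can be left dangling,
    so it has to be derived again afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet whose data area may be reallocated. */
struct fake_pkt {
	unsigned char *data;
	size_t len; /* bytes currently accessible */
};

/*
 * Stand-in for pskb_may_pull(): make at least `need` bytes accessible.
 * Returns nonzero on success. The data may move, so pointers into the old
 * buffer are stale after this call.
 */
static int fake_may_pull(struct fake_pkt *pkt, size_t need)
{
	unsigned char *bigger;

	if (need <= pkt->len)
		return 1;
	bigger = malloc(need);
	if (!bigger)
		return 0;
	memcpy(bigger, pkt->data, pkt->len);
	memset(bigger + pkt->len, 0, need - pkt->len);
	free(pkt->data); /* old pointers now dangle */
	pkt->data = bigger;
	pkt->len = need;
	return 1;
}

/* Read the first header byte, taking the header pointer only after the pull. */
static int fake_first_byte(struct fake_pkt *pkt, size_t need)
{
	const unsigned char *hdr;

	if (!fake_may_pull(pkt, need))
		return -1;
	hdr = pkt->data; /* (re)load after the call, as the fix does for ip{v6}h */
	return hdr[0];
}
```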

    Fixes: a908fdec3dda ("netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table")
    Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table")
    Signed-off-by: Haishuang Yan
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Haishuang Yan
     

21 Mar, 2020

2 commits

  • [ Upstream commit 99b79c3900d4627672c85d9f344b5b0f06bc2a4d ]

    Before releasing the global mutex, we only unlink the hashtable
    from the hash list; its proc file is not yet unregistered at
    this point. syzbot could therefore trigger a race in which a
    parallel htable_create() registers the same file immediately
    after the mutex is released.

    Move htable_remove_proc_entry() back under mutex protection to
    fix this. Also fold htable_destroy() into htable_put() to make
    the code slightly easier to understand.
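
    The locking rule behind this fix can be sketched with pthreads. This is a
    hypothetical userspace model, not xt_hashlimit: struct fake_htable,
    fake_get() and fake_put() are invented, a boolean stands in for the /proc
    entry, and freeing the table is omitted. The last put unlinks the table and
    drops its /proc entry inside one critical section, so a concurrent creator
    never sees an unlisted table whose /proc name is still registered.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define FAKE_MAX 4

/* Hypothetical table; only the state involved in the race is modelled. */
struct fake_htable {
	char name[16];
	int use;           /* reference count */
	bool listed;       /* reachable through the global hash list */
	bool proc_present; /* a /proc entry with this name is registered */
};

static pthread_mutex_t fake_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct fake_htable fake_tables[FAKE_MAX];

/* Take a reference on the listed table `name`, creating it if needed.
 * Creation fails with -EEXIST if a /proc entry of that name still exists. */
static int fake_get(const char *name, struct fake_htable **out)
{
	struct fake_htable *free_slot = NULL;
	int i, ret = -ENOSPC;

	pthread_mutex_lock(&fake_mutex);
	for (i = 0; i < FAKE_MAX; i++) {
		struct fake_htable *t = &fake_tables[i];

		if (t->listed && strcmp(t->name, name) == 0) {
			t->use++;
			*out = t;
			ret = 0;
			goto unlock;
		}
	}
	for (i = 0; i < FAKE_MAX; i++) {
		struct fake_htable *t = &fake_tables[i];

		if (t->proc_present && strcmp(t->name, name) == 0) {
			ret = -EEXIST; /* stale /proc file: what the race hit */
			goto unlock;
		}
		if (!t->listed && !t->proc_present && !free_slot)
			free_slot = t;
	}
	if (free_slot) {
		strncpy(free_slot->name, name, sizeof(free_slot->name) - 1);
		free_slot->name[sizeof(free_slot->name) - 1] = '\0';
		free_slot->use = 1;
		free_slot->listed = true;
		free_slot->proc_present = true;
		*out = free_slot;
		ret = 0;
	}
unlock:
	pthread_mutex_unlock(&fake_mutex);
	return ret;
}

/* Drop a reference; the last put unlinks and unregisters under the mutex. */
static void fake_put(struct fake_htable *t)
{
	pthread_mutex_lock(&fake_mutex);
	if (--t->use == 0) {
		t->listed = false;
		t->proc_present = false;
	}
	pthread_mutex_unlock(&fake_mutex);
}
```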

    Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com
    Fixes: c4a3922d2d20 ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()")
    Signed-off-by: Cong Wang
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Cong Wang
     
  • [ Upstream commit 28b3a4270c0fc064557e409111f2a678e64b6fa7 ]

    No need; just use a simple boolean to indicate that we want to reap all
    entries.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Florian Westphal
     

18 Mar, 2020

6 commits

  • commit 6a42cefb25d8bdc1b391f4a53c78c32164eea2dd upstream.

    Set owner to THIS_MODULE; otherwise the nft_chain_nat module might be
    removed while there are still inet/nat chains in place.

    [ 117.942096] BUG: unable to handle page fault for address: ffffffffa0d5e040
    [ 117.942101] #PF: supervisor read access in kernel mode
    [ 117.942103] #PF: error_code(0x0000) - not-present page
    [ 117.942106] PGD 200c067 P4D 200c067 PUD 200d063 PMD 3dc909067 PTE 0
    [ 117.942113] Oops: 0000 [#1] PREEMPT SMP PTI
    [ 117.942118] CPU: 3 PID: 27 Comm: kworker/3:0 Not tainted 5.6.0-rc3+ #348
    [ 117.942133] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
    [ 117.942145] RIP: 0010:nf_tables_chain_destroy.isra.0+0x94/0x15a [nf_tables]
    [ 117.942149] Code: f6 45 54 01 0f 84 d1 00 00 00 80 3b 05 74 44 48 8b 75 e8 48 c7 c7 72 be de a0 e8 56 e6 2d e0 48 8b 45 e8 48 c7 c7 7f be de a0 8b 30 e8 43 e6 2d e0 48 8b 45 e8 48 8b 40 10 48 85 c0 74 5b 8b
    [ 117.942152] RSP: 0018:ffffc9000015be10 EFLAGS: 00010292
    [ 117.942155] RAX: ffffffffa0d5e040 RBX: ffff88840be87fc2 RCX: 0000000000000007
    [ 117.942158] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffffffffa0debe7f
    [ 117.942160] RBP: ffff888403b54b50 R08: 0000000000001482 R09: 0000000000000004
    [ 117.942162] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8883eda7e540
    [ 117.942164] R13: dead000000000122 R14: dead000000000100 R15: ffff888403b3db80
    [ 117.942167] FS: 0000000000000000(0000) GS:ffff88840e4c0000(0000) knlGS:0000000000000000
    [ 117.942169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 117.942172] CR2: ffffffffa0d5e040 CR3: 00000003e4c52002 CR4: 00000000001606e0
    [ 117.942174] Call Trace:
    [ 117.942188] nf_tables_trans_destroy_work.cold+0xd/0x12 [nf_tables]
    [ 117.942196] process_one_work+0x1d6/0x3b0
    [ 117.942200] worker_thread+0x45/0x3c0
    [ 117.942203] ? process_one_work+0x3b0/0x3b0
    [ 117.942210] kthread+0x112/0x130
    [ 117.942214] ? kthread_create_worker_on_cpu+0x40/0x40
    [ 117.942221] ret_from_fork+0x35/0x40

    nf_tables_chain_destroy() crashes on module_put() because the module is
    gone.
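
    Why a missing owner lets the module disappear can be shown with a small
    reference-count model. Everything here is hypothetical (struct fake_module,
    fake_module_get() and friends only mimic try_module_get()/module_put()):
    a chain type whose owner is NULL pins nothing, so unloading succeeds while
    chains of that type still exist, and the later put touches freed code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module with a reference count. */
struct fake_module {
	int refcnt;
	bool loaded;
};

/* Hypothetical chain type; `owner` plays the role of THIS_MODULE. */
struct fake_chain_type {
	struct fake_module *owner;
};

/* Pin the owner while a chain exists. A NULL owner pins nothing. */
static bool fake_module_get(struct fake_module *m)
{
	if (!m)
		return true;
	if (!m->loaded)
		return false;
	m->refcnt++;
	return true;
}

static void fake_module_put(struct fake_module *m)
{
	if (m)
		m->refcnt--;
}

/* Unloading is refused while references remain. */
static int fake_module_unload(struct fake_module *m)
{
	if (m->refcnt > 0)
		return -EBUSY;
	m->loaded = false;
	return 0;
}
```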

    Fixes: d164385ec572 ("netfilter: nat: add inet family nat support")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit d78008de6103c708171baff9650a7862645d23b0 upstream.

    Add the missing NFTA_CHAIN_FLAGS netlink attribute when dumping basechain
    definitions.

    Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit 88a637719a1570705c02cacb3297af164b1714e7 upstream.

    Add missing attribute validation for tunnel source and
    destination ports to the netlink policy.
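
    What a missing policy entry means can be sketched as a table lookup. In
    the kernel such policies are arrays of struct nla_policy; the model below
    is hypothetical and much simpler (a minimum payload length per attribute
    type, fake_* names, and error codes chosen for the sketch). Without an
    entry, a 1-byte "port" is accepted and a later 16-bit read would run past
    the payload.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types. */
enum { FAKE_ATTR_UNSPEC, FAKE_ATTR_SPORT, FAKE_ATTR_DPORT, FAKE_ATTR_MAX };

/* Hypothetical policy entry: minimum payload length, 0 = not validated. */
struct fake_policy {
	size_t min_len;
};

/* Before the fix: no entries, so nothing is checked. */
static const struct fake_policy fake_policy_before_fix[FAKE_ATTR_MAX];

/* After the fix: both ports must carry at least a 16-bit payload. */
static const struct fake_policy fake_tunnel_policy[FAKE_ATTR_MAX] = {
	[FAKE_ATTR_SPORT] = { .min_len = sizeof(uint16_t) },
	[FAKE_ATTR_DPORT] = { .min_len = sizeof(uint16_t) },
};

/* Reject unknown types and payloads shorter than the policy requires. */
static int fake_validate_attr(const struct fake_policy *policy, int type,
			      size_t payload_len)
{
	if (type <= FAKE_ATTR_UNSPEC || type >= FAKE_ATTR_MAX)
		return -EINVAL;
	if (payload_len < policy[type].min_len)
		return -ERANGE;
	return 0;
}
```

    The same idea applies to the NFTA_PAYLOAD_CSUM_FLAGS and cthelper fixes
    that follow: each adds the policy entry so the netlink core validates the
    attribute before the expression code reads it.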

    Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     
  • commit 9d6effb2f1523eb84516e44213c00f2fd9e6afff upstream.

    Add missing attribute validation for NFTA_PAYLOAD_CSUM_FLAGS
    to the netlink policy.

    Fixes: 1814096980bb ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     
  • commit c049b3450072b8e3998053490e025839fecfef31 upstream.

    Add missing attribute validation for cthelper
    to the netlink policy.

    Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     
  • commit ee84f19cbbe9cf7cba2958acb03163fed3ecbb0f upstream.

    If the .next function does not update the position index,
    the following .show call repeats the output for the
    current position index.
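
    The rule can be illustrated with a userspace model of seq_file-style
    callbacks. It does not reproduce seq_read(); fake_start() and fake_next()
    are hypothetical and drop the struct seq_file * argument that the real
    callbacks take. The key line is that .next advances *pos even when it
    returns NULL at the end, so a later .start(pos) resumes past the last
    element instead of showing it again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list standing in for the registered matches. */
static const char *const fake_matches[] = {
	"conntrack", "recent", "icmp", "udp", "tcp"
};
#define FAKE_NMATCHES (sizeof(fake_matches) / sizeof(fake_matches[0]))

/* .start: return the element at *pos, or NULL past the end. */
static const char *const *fake_start(long *pos)
{
	if (*pos < 0 || (size_t)*pos >= FAKE_NMATCHES)
		return NULL;
	return &fake_matches[*pos];
}

/* .next: always advance *pos, including on the final (NULL) step. */
static const char *const *fake_next(const char *const *v, long *pos)
{
	(void)v;
	++*pos;
	return fake_start(pos);
}
```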

    Without patch:
    # dd if=/proc/net/ip_tables_matches # original file output
    conntrack
    conntrack
    conntrack
    recent
    recent
    icmp
    udplite
    udp
    tcp
    0+1 records in
    0+1 records out
    65 bytes copied, 5.4074e-05 s, 1.2 MB/s

    # dd if=/proc/net/ip_tables_matches bs=62 skip=1
    dd: /proc/net/ip_tables_matches: cannot skip to specified offset
    cp <<< end of last line
    tcp <<< and then unexpected whole last line once again
    0+1 records in
    0+1 records out
    7 bytes copied, 0.000102447 s, 68.3 kB/s

    Cc: stable@vger.kernel.org
    Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin