20 Jan, 2020

1 commit

  • Pull networking fixes from David Miller:

    1) Fix non-blocking connect() in x25, from Martin Schiller.

    2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.

    3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.

    4) Limit size of TSO packets properly in lan78xx driver, from Eric
    Dumazet.

    5) r8152 probe needs an endpoint sanity check, from Johan Hovold.

    6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
    John Fastabend.

    7) hns3 needs short frames padded on transmit, from Yunsheng Lin.

    8) Fix netfilter ICMP header corruption, from Eyal Birger.

    9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.

    10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.

    11) Fix memory leak in act_ctinfo, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
    cxgb4: reject overlapped queues in TC-MQPRIO offload
    cxgb4: fix Tx multi channel port rate limit
    net: sched: act_ctinfo: fix memory leak
    bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
    bnxt_en: Fix ipv6 RFS filter matching logic.
    bnxt_en: Fix NTUPLE firmware command failures.
    net: systemport: Fixed queue mapping in internal ring map
    net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
    net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
    net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
    net: hns: fix soft lockup when there is not enough memory
    net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
    net/sched: act_ife: initalize ife->metalist earlier
    netfilter: nat: fix ICMP header corruption on ICMP errors
    net: wan: lapbether.c: Use built-in RCU list checking
    netfilter: nf_tables: fix flowtable list del corruption
    netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
    netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
    netfilter: nft_tunnel: ERSPAN_VERSION must not be null
    netfilter: nft_tunnel: fix null-attribute check
    ...

    Linus Torvalds
     

19 Jan, 2020

1 commit

  • Implement a cleanup method to properly free ci->params

    BUG: memory leak
    unreferenced object 0xffff88811746e2c0 (size 64):
    comm "syz-executor617", pid 7106, jiffies 4294943055 (age 14.250s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    c0 34 60 84 ff ff ff ff 00 00 00 00 00 00 00 00 .4`.............
    backtrace:
    kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    slab_post_alloc_hook mm/slab.h:586 [inline]
    slab_alloc mm/slab.c:3320 [inline]
    kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
    kmalloc include/linux/slab.h:556 [inline]
    kzalloc include/linux/slab.h:670 [inline]
    tcf_ctinfo_init+0x21a/0x530 net/sched/act_ctinfo.c:236
    tcf_action_init_1+0x400/0x5b0 net/sched/act_api.c:944
    tcf_action_init+0x135/0x1c0 net/sched/act_api.c:1000
    tcf_action_add+0x9a/0x200 net/sched/act_api.c:1410
    tc_ctl_action+0x14d/0x1bb net/sched/act_api.c:1465
    rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
    netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:639 [inline]
    sock_sendmsg+0x54/0x70 net/socket.c:659
    ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
    ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
    __sys_sendmsg+0x80/0xf0 net/socket.c:2417
    __do_sys_sendmsg net/socket.c:2426 [inline]
    __se_sys_sendmsg net/socket.c:2424 [inline]
    __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424

    Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Kevin 'ldir' Darbyshire-Bryant
    Cc: Cong Wang
    Cc: Toke Høiland-Jørgensen
    Acked-by: Kevin 'ldir' Darbyshire-Bryant
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Jan, 2020

3 commits

  • syzbot reported some bogus lockdep warnings, for example bad unlock
    balance in sch_direct_xmit(). They are due to a race condition between
    slow path and fast path, that is qdisc_xmit_lock_key gets re-registered
    in netdev_update_lockdep_key() on slow path, while we could still
    acquire the queue->_xmit_lock on fast path in this small window:

    CPU A                           CPU B
    __netif_tx_lock();
                                    lockdep_unregister_key(qdisc_xmit_lock_key);
    __netif_tx_unlock();
                                    lockdep_register_key(qdisc_xmit_lock_key);

    In fact, unlike the addr_list_lock which has to be reordered when
    the master/slave device relationship changes, queue->_xmit_lock is
    only acquired on fast path and only when NETIF_F_LLTX is not set,
    so there is likely no nested locking for it.

    Therefore, we can just get rid of re-registration of
    qdisc_xmit_lock_key.

    Reported-by: syzbot+4ec99438ed7450da6272@syzkaller.appspotmail.com
    Fixes: ab92d68fc22f ("net: core: add generic lockdep keys")
    Cc: Taehee Yoo
    Signed-off-by: Cong Wang
    Acked-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Cong Wang
     
  • It seems better to init ife->metalist earlier in tcf_ife_init()
    to avoid the following crash:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
    RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
    Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
    RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
    RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
    R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
    R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
    FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
    __tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
    __tcf_idr_release net/sched/act_api.c:165 [inline]
    __tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
    tcf_idr_release include/net/act_api.h:171 [inline]
    tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
    tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
    tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
    tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
    tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
    rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
    netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:639 [inline]
    sock_sendmsg+0xd7/0x130 net/socket.c:659
    ____sys_sendmsg+0x753/0x880 net/socket.c:2330
    ___sys_sendmsg+0x100/0x170 net/socket.c:2384
    __sys_sendmsg+0x105/0x1d0 net/socket.c:2417
    __do_sys_sendmsg net/socket.c:2426 [inline]
    __se_sys_sendmsg net/socket.c:2424 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Davide Caratti
    Reviewed-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Eric Dumazet
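
    The crash above comes from a cleanup routine walking a list head that
    had not been initialized yet. The following is a minimal userspace
    sketch of that ordering issue, not the kernel code: the toy list and
    the names toy_ife, toy_ife_init and toy_ife_cleanup are hypothetical
    and only model the idea of initializing the list before any step that
    can fail.

```c
#include <stddef.h>

/* Toy circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Hypothetical stand-in for the action's private state. */
struct toy_ife {
    struct list_node metalist;
};

/* Walk the list and count its entries. This requires an initialized
 * head: a zeroed head has next == NULL and the walk would fault. */
static int toy_ife_cleanup(struct toy_ife *ife)
{
    int n = 0;
    struct list_node *pos;

    for (pos = ife->metalist.next; pos != &ife->metalist; pos = pos->next)
        n++;
    return n;
}

/* Initialize the list head first, so that any later failure can safely
 * call the cleanup routine. 'fail' simulates a validation error. */
static int toy_ife_init(struct toy_ife *ife, int fail)
{
    list_init(&ife->metalist);
    if (fail) {
        toy_ife_cleanup(ife);   /* safe: the list is empty, not garbage */
        return -1;
    }
    return 0;
}
```

    With the list_init() call moved below the failure check, the error
    path would hand an uninitialized head to toy_ife_cleanup().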
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net

    The following patchset contains Netfilter fixes for net:

    1) Fix use-after-free in ipset bitmap destroy path, from Cong Wang.

    2) Missing init netns in entry cleanup path of arp_tables,
    from Florian Westphal.

    3) Fix WARN_ON in set destroy path due to missing cleanup on
    transaction error.

    4) Incorrect netlink sanity check in tunnel, from Florian Westphal.

    5) Missing sanity check for erspan version netlink attribute, also
    from Florian.

    6) Remove WARN in nft_request_module() that can be triggered from
    userspace, from Florian Westphal.

    7) Memleak in NFTA_HOOK_DEVS netlink parser, from Dan Carpenter.

    8) List poison from commit path for flowtables that are added and
    deleted in the same batch, from Florian Westphal.

    9) Fix NAT ICMP packet corruption, from Eyal Birger.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Jan, 2020

19 commits

  • Commit 8303b7e8f018 ("netfilter: nat: fix spurious connection timeouts")
    made nf_nat_icmp_reply_translation() use icmp_manip_pkt() as the l4
    manipulation function for the outer packet on ICMP errors.

    However, icmp_manip_pkt() assumes the packet has an 'id' field which
    is not correct for all types of ICMP messages.

    This is not correct for ICMP error packets, and leads to bogus bytes
    being written to the ICMP header, which can be wrongfully regarded as
    'length' bytes by RFC 4884 compliant receivers.

    Fix by assigning the 'id' field only for ICMP messages that have this
    semantic.

    Reported-by: Shmulik Ladkani
    Fixes: 8303b7e8f018 ("netfilter: nat: fix spurious connection timeouts")
    Signed-off-by: Eyal Birger
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Eyal Birger
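
    The idea of writing the 'id' field only for message types that define
    one can be sketched as below. This is an illustrative userspace model,
    not the netfilter code: the toy_* names and the header struct are
    assumptions, the type numbers are the standard ICMP query types, and
    the checksum update a real NAT rewrite needs is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICMP query message types; these are the ones that carry an id. */
enum {
    TOY_ICMP_ECHOREPLY      = 0,
    TOY_ICMP_ECHO           = 8,
    TOY_ICMP_TIMESTAMP      = 13,
    TOY_ICMP_TIMESTAMPREPLY = 14,
    TOY_ICMP_INFO_REQUEST   = 15,
    TOY_ICMP_INFO_REPLY     = 16,
    TOY_ICMP_ADDRESS        = 17,
    TOY_ICMP_ADDRESSREPLY   = 18,
};

/* Simplified header layout (hypothetical, for illustration only). */
struct toy_icmphdr {
    uint8_t  type;
    uint8_t  code;
    uint16_t checksum;
    uint16_t id;        /* only meaningful for query messages */
    uint16_t sequence;
};

static bool toy_icmp_has_id(uint8_t type)
{
    switch (type) {
    case TOY_ICMP_ECHO:
    case TOY_ICMP_ECHOREPLY:
    case TOY_ICMP_TIMESTAMP:
    case TOY_ICMP_TIMESTAMPREPLY:
    case TOY_ICMP_INFO_REQUEST:
    case TOY_ICMP_INFO_REPLY:
    case TOY_ICMP_ADDRESS:
    case TOY_ICMP_ADDRESSREPLY:
        return true;
    default:
        return false;   /* error messages reuse these bytes differently */
    }
}

/* Rewrite the id only when the type defines one.
 * Returns true if the field was written. */
static bool toy_icmp_set_id(struct toy_icmphdr *hdr, uint16_t new_id)
{
    if (!toy_icmp_has_id(hdr->type))
        return false;
    hdr->id = new_id;
    return true;
}
```

    For a destination-unreachable message (type 3) the helper leaves the
    header bytes alone, which is the behaviour the commit message asks for.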
     
  • syzbot reported the following crash:

    list_del corruption, ffff88808c9bb000->prev is LIST_POISON2 (dead000000000122)
    [..]
    Call Trace:
    __list_del_entry include/linux/list.h:131 [inline]
    list_del_rcu include/linux/rculist.h:148 [inline]
    nf_tables_commit+0x1068/0x3b30 net/netfilter/nf_tables_api.c:7183
    [..]

    The commit transaction list has:

    NFT_MSG_NEWTABLE
    NFT_MSG_NEWFLOWTABLE
    NFT_MSG_DELFLOWTABLE
    NFT_MSG_DELTABLE

    A missing generation check during DELTABLE processing causes it to queue
    the DELFLOWTABLE operation a second time, so we corrupt the list here:

    case NFT_MSG_DELFLOWTABLE:
            list_del_rcu(&nft_trans_flowtable(trans)->list);
            nf_tables_flowtable_notify(&trans->ctx,

    because we have two different DELFLOWTABLE transactions for the same
    flowtable. We then call list_del_rcu() twice for the same flowtable->list.

    The object handling seems to suffer from the same bug so add a generation
    check too and only queue delete transactions for flowtables/objects that
    are still active in the next generation.

    Reported-by: syzbot+37a6804945a3a13b1572@syzkaller.appspotmail.com
    Fixes: 3b49e2e94e6eb ("netfilter: nf_tables: add flow table netlink frontend")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Syzbot detected a leak in nf_tables_parse_netdev_hooks(). If the hook
    already exists, then the error handling doesn't free the newest "hook".

    Reported-by: syzbot+f9d4095107fc8749c69c@syzkaller.appspotmail.com
    Fixes: b75a3e8371bc ("netfilter: nf_tables: allow netdevice to be used only once per flowtable")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Dan Carpenter
     
  • This WARN can trigger because some of the names fed to the module
    autoload function can be of arbitrary length.

    Remove the WARN and add limits for all NLA_STRING attributes.

    Reported-by: syzbot+0e63ae76d117ae1c3a01@syzkaller.appspotmail.com
    Fixes: 452238e8d5ffd8 ("netfilter: nf_tables: add and use helper for module autoload")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Fixes: af308b94a2a4a5 ("netfilter: nf_tables: add tunnel support")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Fix the null-attribute check in nft_tunnel: both attributes must be
    non-null, otherwise we get a null deref when one of them is missing.

    Reported-by: syzbot+76d0b80493ac881ff77b@syzkaller.appspotmail.com
    Fixes: aaecfdb5c5dd8ba ("netfilter: nf_tables: match on tunnel metadata")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This patch fixes a WARN_ON in nft_set_destroy() due to missing
    set reference count drop from the preparation phase. This is triggered
    by the module autoload path. Do not exercise the abort path from
    nft_request_module() while preparation phase cleaning up is still
    pending.

    WARNING: CPU: 3 PID: 3456 at net/netfilter/nf_tables_api.c:3740 nft_set_destroy+0x45/0x50 [nf_tables]
    [...]
    CPU: 3 PID: 3456 Comm: nft Not tainted 5.4.6-arch3-1 #1
    RIP: 0010:nft_set_destroy+0x45/0x50 [nf_tables]
    Code: e8 30 eb 83 c6 48 8b 85 80 00 00 00 48 8b b8 90 00 00 00 e8 dd 6b d7 c5 48 8b 7d 30 e8 24 dd eb c5 48 89 ef 5d e9 6b c6 e5 c5 0b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 7f 10 e9 52
    RSP: 0018:ffffac4f43e53700 EFLAGS: 00010202
    RAX: 0000000000000001 RBX: ffff99d63a154d80 RCX: 0000000001f88e03
    RDX: 0000000001f88c03 RSI: ffff99d6560ef0c0 RDI: ffff99d63a101200
    RBP: ffff99d617721de0 R08: 0000000000000000 R09: 0000000000000318
    R10: 00000000f0000000 R11: 0000000000000001 R12: ffffffff880fabf0
    R13: dead000000000122 R14: dead000000000100 R15: ffff99d63a154d80
    FS: 00007ff3dbd5b740(0000) GS:ffff99d6560c0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00001cb5de6a9000 CR3: 000000016eb6a004 CR4: 00000000001606e0
    Call Trace:
    __nf_tables_abort+0x3e3/0x6d0 [nf_tables]
    nft_request_module+0x6f/0x110 [nf_tables]
    nft_expr_type_request_module+0x28/0x50 [nf_tables]
    nf_tables_expr_parse+0x198/0x1f0 [nf_tables]
    nft_expr_init+0x3b/0xf0 [nf_tables]
    nft_dynset_init+0x1e2/0x410 [nf_tables]
    nf_tables_newrule+0x30a/0x930 [nf_tables]
    nfnetlink_rcv_batch+0x2a0/0x640 [nfnetlink]
    nfnetlink_rcv+0x125/0x171 [nfnetlink]
    netlink_unicast+0x179/0x210
    netlink_sendmsg+0x208/0x3d0
    sock_sendmsg+0x5e/0x60
    ____sys_sendmsg+0x21b/0x290

    Update comment on the code to describe the new behaviour.

    Reported-by: Marco Oliverio
    Fixes: 452238e8d5ff ("netfilter: nf_tables: add and use helper for module autoload")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • DSA subsystem takes care of netdev statistics since commit 4ed70ce9f01c
    ("net: dsa: Refactor transmit path to eliminate duplication"), so
    any accounting inside tagger callbacks is redundant and can lead to
    messing up the stats.
    This bug is present in Qualcomm tagger since day 0.

    Fixes: cafdc45c949b ("net-next: dsa: add Qualcomm tag RX/TX handler")
    Reviewed-by: Andrew Lunn
    Signed-off-by: Alexander Lobakin
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Alexander Lobakin
     
  • The correct name is GSWIP (Gigabit Switch IP). Typo was introduced in
    875138f81d71a ("dsa: Move tagger name into its ops structure") while
    moving tagger names to their structures.

    Fixes: 875138f81d71a ("dsa: Move tagger name into its ops structure")
    Reviewed-by: Andrew Lunn
    Signed-off-by: Alexander Lobakin
    Reviewed-by: Florian Fainelli
    Acked-by: Hauke Mehrtens
    Signed-off-by: David S. Miller

    Alexander Lobakin
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2020-01-15

    The following pull-request contains BPF updates for your *net* tree.

    We've added 12 non-merge commits during the last 9 day(s) which contain
    a total of 13 files changed, 95 insertions(+), 43 deletions(-).

    The main changes are:

    1) Fix refcount leak for TCP time wait and request sockets for socket lookup
    related BPF helpers, from Lorenz Bauer.

    2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.

    3) Batch of several sockmap and related TLS fixes found while operating
    more complex BPF programs with Cilium and OpenSSL, from John Fastabend.

    4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue()
    to avoid purging data upon teardown, from Lingpeng Chen.

    5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly
    dump a BPF map's value with BTF, from Martin KaFai Lau.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When user returns SK_DROP we need to reset the number of copied bytes
    to indicate to the user the bytes were dropped and not sent. If we
    don't reset the copied arg sendmsg will return as if those bytes were
    copied giving the user a positive return value.

    This works as expected today except in the case where the user also
    pops bytes. In the pop case the sg.size is reduced but we don't correctly
    account for this when copied bytes is reset. The popped bytes are not
    accounted for and we return a small positive value potentially confusing
    the user.

    The reason this happens is due to a typo where we do the wrong comparison
    when accounting for pop bytes. In this fix notice the if/else is not
    needed and that we have a similar problem if we push data, except it's not
    visible to the user because if delta is larger than the sg.size we return a
    negative value so it appears as an error regardless.

    Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Jonathan Lemon
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com

    John Fastabend
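
    The accounting described above can be modelled with plain arithmetic.
    This is a sketch under stated assumptions, not the kernel code: the
    function names are hypothetical, 'delta' is taken to be the message
    size before the verdict program ran minus the size after it, and
    'tosend' is the number of bytes still queued when the drop happens.

```c
/* Bytes removed by the verdict program: size before minus size after.
 * A pop shrinks the message, so the result is positive; a push grows it
 * and the result is negative. Signed arithmetic covers both cases
 * without an if/else. */
static long toy_verdict_delta(long size_before, long size_after)
{
    return size_before - size_after;
}

/* On a drop, take back both the bytes still queued (tosend) and the
 * bytes the program already popped (delta). */
static long toy_copied_after_drop(long copied, long tosend, long delta)
{
    return copied - (tosend + delta);
}
```

    With 100 bytes copied and 20 of them popped, tosend is 80 and delta
    is 20, so the reported count drops to 0. If delta were wrongly left
    at 0, the caller would see 20, the small positive value mentioned in
    the commit message.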
     
  • It's possible through a set of push, pop, apply helper calls to construct
    a skmsg, which is just a ring of scatterlist elements, with the start
    value larger than the end value. For example,

         end      start
    |_0_|_1_| ... |_n_|_n+1_|

    Where end points at 1 and start points at n so that valid elements is
    the set {n, n+1, 0, 1}.

    Currently, because we don't build the correct chain only {n, n+1} will
    be sent. This adds a check and sg_chain call to correctly submit the
    above to the crypto and tls send path.

    Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Jonathan Lemon
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com

    John Fastabend
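
    A wrapped ring like the one drawn above can be walked as in the
    following sketch. It is a userspace illustration only: the function
    name is hypothetical, and 'end' is treated as inclusive to match the
    example set {n, n+1, 0, 1} in the commit message, which may differ
    from the convention used inside the kernel's sk_msg code.

```c
#include <stddef.h>

/* Collect ring indices from 'start' to 'end' inclusive, wrapping at
 * 'size'. 'out' must have room for 'size' entries. Returns the number
 * of indices written, or 0 on invalid input. */
static size_t toy_ring_collect(size_t start, size_t end, size_t size,
                               size_t *out)
{
    size_t n = 0;
    size_t i = start;

    if (size == 0 || start >= size || end >= size)
        return 0;
    for (;;) {
        out[n++] = i;
        if (i == end || n == size)
            break;
        i = (i + 1) % size;     /* step forward, wrapping past the end */
    }
    return n;
}
```

    A walker that stops at the last array slot instead of wrapping would
    only see {n, n+1}, which is the truncated send the commit describes.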
     
  • It is possible to build a plaintext buffer using push helper that is larger
    than the allocated encrypt buffer. When this record is pushed to crypto
    layers this can result in a NULL pointer dereference because the crypto
    API expects the encrypt buffer is large enough to fit the plaintext
    buffer. Kernel splat below.

    To resolve this, catch the cases where this can happen and split the buffer into two
    records to send individually. Unfortunately, there is still one case to
    handle where the split creates a zero sized buffer. In this case we merge
    the buffers and unmark the split. This happens when apply is zero and user
    pushed data beyond encrypt buffer. This fixes the original case as well
    because the split allocated an encrypt buffer larger than the plaintext
    buffer and the merge simply moves the pointers around so we now have
    a reference to the new (larger) encrypt buffer.

    Perhaps it's not ideal but it seems the best solution for a fixes branch
    and avoids handling these two cases, (a) apply that needs split and (b)
    non apply case. These are edge cases anyway so optimizing them seems not
    necessary unless someone wants to later in next branches.

    [ 306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
    [...]
    [ 306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
    [...]
    [ 306.770350] Call Trace:
    [ 306.770956] scatterwalk_map_and_copy+0x6c/0x80
    [ 306.772026] gcm_enc_copy_hash+0x4b/0x50
    [ 306.772925] gcm_hash_crypt_remain_continue+0xef/0x110
    [ 306.774138] gcm_hash_crypt_continue+0xa1/0xb0
    [ 306.775103] ? gcm_hash_crypt_continue+0xa1/0xb0
    [ 306.776103] gcm_hash_assoc_remain_continue+0x94/0xa0
    [ 306.777170] gcm_hash_assoc_continue+0x9d/0xb0
    [ 306.778239] gcm_hash_init_continue+0x8f/0xa0
    [ 306.779121] gcm_hash+0x73/0x80
    [ 306.779762] gcm_encrypt_continue+0x6d/0x80
    [ 306.780582] crypto_gcm_encrypt+0xcb/0xe0
    [ 306.781474] crypto_aead_encrypt+0x1f/0x30
    [ 306.782353] tls_push_record+0x3b9/0xb20 [tls]
    [ 306.783314] ? sk_psock_msg_verdict+0x199/0x300
    [ 306.784287] bpf_exec_tx_verdict+0x3f2/0x680 [tls]
    [ 306.785357] tls_sw_sendmsg+0x4a3/0x6a0 [tls]

    test_sockmap test signature to trigger bug,

    [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):

    Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Jonathan Lemon
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com

    John Fastabend
     
  • Leaving an incorrect end mark in place when passing to crypto
    layer will cause crypto layer to stop processing data before
    all data is encrypted. To fix this, clear the end mark on push
    data instead of expecting users of the helper to clear the
    mark value after the fact.

    This happens when we push data into the middle of a skmsg and
    have room for it so we don't do a set of copies that already
    clear the end flag.

    Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Song Liu
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-6-john.fastabend@gmail.com

    John Fastabend
     
  • In the push, pull, and pop helpers operating on skmsg objects to make
    data writable or insert/remove data we use this bounds check to ensure
    specified data is valid,

    /* Bounds checks: start and pop must be inside message */
    if (start >= offset + l || last >= msg->sg.size)
            return -EINVAL;

    The problem here is that offset already includes the length of the
    current element, the 'l' above. So start could be past the end of
    the scatterlist element in the case where start also points into an
    offset on the last skmsg element.

    To fix this, do the accounting slightly differently by adding the
    length of the previous entry to offset at the start of the iteration.
    And ensure it's initialized to zero so that the first iteration does
    nothing.

    Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
    Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
    Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Song Liu
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-5-john.fastabend@gmail.com

    John Fastabend
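
    The corrected accounting can be sketched as a search over element
    lengths. This is a simplified userspace model with a flat array
    instead of the skmsg ring, and the function name is hypothetical;
    it only shows offset being advanced by the previous element's length
    at the top of each iteration.

```c
#include <stddef.h>

/* Find the element that contains byte 'start' of a message made of
 * 'n' elements with the given lengths. On success returns the element
 * index and stores the offset of that element's first byte in
 * *elem_off. Returns -1 if 'start' lies past the end of the message. */
static int toy_find_elem(const size_t *lens, size_t n, size_t start,
                         size_t *elem_off)
{
    size_t offset = 0;  /* bytes in front of the current element */
    size_t len = 0;     /* length of the previous element, 0 at first */
    size_t i;

    for (i = 0; i < n; i++) {
        offset += len;          /* account for the previous element only */
        len = lens[i];
        if (start < offset + len) {
            *elem_off = offset;
            return (int)i;
        }
    }
    return -1;
}
```

    Because offset never includes the current element, a check of the
    form 'start >= offset + len' after the loop rejects exactly the
    positions beyond the last element.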
     
  • When sockmap sock with TLS enabled is removed we clean up bpf/psock state
    and call tcp_update_ulp() to push updates to TLS ULP on top. However, we
    don't push the write_space callback up and instead simply overwrite the
    op with the psock stored previous op. This may or may not be correct so
    to ensure we don't overwrite the TLS write space hook pass this field to
    the ULP and have it fix up the ctx.

    This completes a previous fix that pushed the ops through to the ULP
    but at the time missed doing this for write_space, presumably because
    write_space TLS hook was added around the same time.

    Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jakub Sitnicki
    Acked-by: Jonathan Lemon
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com

    John Fastabend
     
  • The sock_map_free() and sock_hash_free() paths used to delete sockmap
    and sockhash maps walk the maps and destroy psock and bpf state associated
    with the socks in the map. When done the socks no longer have BPF programs
    attached and will function normally. This can happen while the socks in
    the map are still "live" meaning data may be sent/received during the walk.

    Currently, though we don't take the sock_lock when the psock and bpf state
    is removed through this path. Specifically, this means we can be writing
    into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc.
    while they are also being called from the networking side. This is not
    safe, we never used proper READ_ONCE/WRITE_ONCE semantics here if we
    believed it was safe. Further it's not clear to me it's even a good idea
    to try and do this on "live" sockets while networking side might also
    be using the socket. Instead of trying to reason about using the socks
    from both sides let's realize that every use case I'm aware of rarely
    deletes maps, in fact the Kubernetes/Cilium case builds the map at init
    and never tears it down except on errors. So let's do the simple fix and
    grab sock lock.

    This patch wraps sock deletes from maps in sock lock and adds some
    annotations so we catch any other cases easier.

    Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Song Liu
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com

    John Fastabend
     
  • Simon Wunderlich says:

    ====================
    Here is a batman-adv bugfix:

    - Fix DAT candidate selection on little endian systems,
    by Sven Eckelmann
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When the packet pointed to by retransmit_skb_hint is unlinked by ACK,
    retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue().
    If packet loss is detected at this time, retransmit_skb_hint will be set
    to point to the current packet loss in tcp_verify_retransmit_hint(),
    then the packets that were previously marked lost but not retransmitted
    due to the restriction of cwnd will be skipped and cannot be
    retransmitted.

    To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can
    be reset only after all marked lost packets are retransmitted
    (retrans_out >= lost_out), otherwise we need to traverse from
    tcp_rtx_queue_head in tcp_xmit_retransmit_queue().

    Packetdrill to demonstrate:

    // Disable RACK and set max_reordering to keep things simple
    0 `sysctl -q net.ipv4.tcp_recovery=0`
    +0 `sysctl -q net.ipv4.tcp_max_reordering=3`

    // Establish a connection
    +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    +0 bind(3, ..., ...) = 0
    +0 listen(3, 1) = 0

    +.1 < S 0:0(0) win 32792
    +0 > S. 0:0(0) ack 1
    +.01 < . 1:1(0) ack 1 win 257
    +0 accept(3, ..., ...) = 4

    // Send 8 data segments
    +0 write(4, ..., 8000) = 8000
    +0 > P. 1:8001(8000) ack 1

    // Enter recovery and 1:3001 is marked lost
    +.01 < . 1:1(0) ack 1 win 257
    +0 < . 1:1(0) ack 1 win 257
    +0 < . 1:1(0) ack 1 win 257

    // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001
    +0 > . 1:1001(1000) ack 1

    // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL
    +.01 < . 1:1(0) ack 2001 win 257
    // Now retransmit_skb_hint points to 4001:5001 which is now marked lost

    // BUG: 2001:3001 was not retransmitted
    +0 > . 2001:3001(1000) ack 1

    Signed-off-by: Pengcheng Yang
    Acked-by: Neal Cardwell
    Tested-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Pengcheng Yang
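
    The rule described above, that a NULL hint may only be replaced once
    every packet already marked lost has been retransmitted, can be
    sketched as follows. This is a simplified userspace model with
    hypothetical toy_* types and names, not the code in the TCP stack;
    the sequence comparison uses the usual wrap-safe signed difference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_skb {
    uint32_t seq;
};

struct toy_tcp {
    const struct toy_skb *retransmit_hint;  /* where to resume retransmits */
    uint32_t retrans_out;                   /* packets retransmitted */
    uint32_t lost_out;                      /* packets marked lost */
};

/* Wrap-safe "a is before b" for 32-bit sequence numbers. */
static bool toy_seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Called when 'skb' is newly marked lost. A NULL hint is only replaced
 * when nothing marked lost is still waiting for retransmission;
 * otherwise it stays NULL so the sender restarts from the queue head.
 * A non-NULL hint only ever moves backwards in sequence space. */
static void toy_verify_retransmit_hint(struct toy_tcp *tp,
                                       const struct toy_skb *skb)
{
    if ((!tp->retransmit_hint && tp->retrans_out >= tp->lost_out) ||
        (tp->retransmit_hint &&
         toy_seq_before(skb->seq, tp->retransmit_hint->seq)))
        tp->retransmit_hint = skb;
}
```

    In the packetdrill scenario, one lost packet (2001:3001) is still
    unsent when the hint becomes NULL, so the hint is left NULL and the
    next pass starts from the head of the retransmit queue.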
     

15 Jan, 2020

14 commits

  • …rnel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    A few fixes:
    * -O3 enablement fallout, thanks to Arnd who ran this
    * fixes for a few leaks, thanks to Felix
    * channel 12 regulatory fix for custom regdomains
    * check for a crash reported by syzbot
    (NULL function is called on drivers that don't have it)
    * fix TKIP replay protection after setup with some APs
    (from Jouni)
    * restrict obtaining some mesh data to avoid WARN_ONs
    * fix deadlocks with auto-disconnect (socket owner)
    * fix radar detection events with multiple devices
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • The fragments attached to a skb can be part of a compound page. In that case,
    page_ref_inc will increment the refcount for the wrong page. Fix this by
    using get_page instead, which calls page_ref_inc on the compound head and
    also checks for overflow.

    Fixes: 2b67f944f88c ("cfg80211: reuse existing page fragments in A-MSDU rx")
    Cc: stable@vger.kernel.org
    Signed-off-by: Felix Fietkau
    Link: https://lore.kernel.org/r/20200113182107.20461-1-nbd@nbd.name
    Signed-off-by: Johannes Berg

    Felix Fietkau
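
    The difference between the two calls can be illustrated with a toy
    model of a compound page. The struct and function names below are
    hypothetical and greatly simplified; in particular the overflow check
    mentioned in the commit message is omitted.

```c
#include <stddef.h>

/* Toy page: tail pages of a compound page point at their head. */
struct toy_page {
    int refcount;
    struct toy_page *head;  /* NULL for a head or non-compound page */
};

static struct toy_page *toy_compound_head(struct toy_page *p)
{
    return p->head ? p->head : p;
}

/* Raw increment: touches exactly the page it is handed, which is the
 * wrong one when that page is a tail page. */
static void toy_page_ref_inc(struct toy_page *p)
{
    p->refcount++;
}

/* get_page-like helper: resolve to the compound head first, then
 * take the reference there. */
static void toy_get_page(struct toy_page *p)
{
    toy_compound_head(p)->refcount++;
}
```

    Handing a tail page to toy_page_ref_inc() bumps the tail's own
    counter, while toy_get_page() bumps the head's, which is where the
    reference for the whole compound page is kept in this model.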
     
  • Check if set_wiphy_params is assigned and return an error if not;
    some drivers (e.g. virt_wifi, where syzbot reported it) don't have
    it.

    Reported-by: syzbot+e8a797964a4180eb57d5@syzkaller.appspotmail.com
    Reported-by: syzbot+34b582cf32c1db008f8e@syzkaller.appspotmail.com
    Signed-off-by: Johannes Berg
    Link: https://lore.kernel.org/r/20200113125358.ac07f276efff.Ibd85ee1b12e47b9efb00a2adc5cd3fac50da791a@changeid
    Signed-off-by: Johannes Berg

    Johannes Berg
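
    Guarding an optional driver callback can be sketched as below. The
    struct, the wrapper and the example callback are hypothetical names
    for illustration, and returning -EOPNOTSUPP is an assumption; the
    commit message only says that an error is returned.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table with an optional callback. */
struct toy_ops {
    int (*set_wiphy_params)(uint32_t changed);
};

/* Example driver callback that accepts every change. */
static int toy_driver_set_params(uint32_t changed)
{
    (void)changed;
    return 0;
}

/* Wrapper: refuse the request instead of calling through a NULL
 * function pointer when the driver does not implement the op. */
static int toy_set_wiphy_params(const struct toy_ops *ops, uint32_t changed)
{
    if (!ops->set_wiphy_params)
        return -EOPNOTSUPP;     /* assumed error code */
    return ops->set_wiphy_params(changed);
}
```

    A driver that leaves the pointer unset now gets an error back from
    the wrapper, while one that sets it is called as before.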
     
  • The per-tid statistics need to be released after the call to rdev_get_station

    Cc: stable@vger.kernel.org
    Fixes: 8689c051a201 ("cfg80211: dynamically allocate per-tid stats for station info")
    Signed-off-by: Felix Fietkau
    Link: https://lore.kernel.org/r/20200108170630.33680-2-nbd@nbd.name
    Signed-off-by: Johannes Berg

    Felix Fietkau
     
  • The per-tid statistics need to be released after the call to rdev_get_station

    Cc: stable@vger.kernel.org
    Fixes: 5ab92e7fe49a ("cfg80211: add support to probe unexercised mesh link")
    Signed-off-by: Felix Fietkau
    Link: https://lore.kernel.org/r/20200108170630.33680-1-nbd@nbd.name
    Signed-off-by: Johannes Berg

    Felix Fietkau
     
  • Use methods which do not try to acquire the wdev lock themselves.

    Cc: stable@vger.kernel.org
    Fixes: 37b1c004685a3 ("cfg80211: Support all iftypes in autodisconnect_wk")
    Signed-off-by: Markus Theil
    Link: https://lore.kernel.org/r/20200108115536.2262-1-markus.theil@tu-ilmenau.de
    Signed-off-by: Johannes Berg

    Markus Theil
     
  • After the introduction of CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3,
    the wext code produces a bogus warning:

    In function 'iw_handler_get_iwstats',
    inlined from 'ioctl_standard_call' at net/wireless/wext-core.c:1015:9,
    inlined from 'wireless_process_ioctl' at net/wireless/wext-core.c:935:10,
    inlined from 'wext_ioctl_dispatch.part.8' at net/wireless/wext-core.c:986:8,
    inlined from 'wext_handle_ioctl':
    net/wireless/wext-core.c:671:3: error: argument 1 null where non-null expected [-Werror=nonnull]
    memcpy(extra, stats, sizeof(struct iw_statistics));
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In file included from arch/x86/include/asm/string.h:5,
    net/wireless/wext-core.c: In function 'wext_handle_ioctl':
    arch/x86/include/asm/string_64.h:14:14: note: in a call to function 'memcpy' declared here

    The problem is that ioctl_standard_call() sometimes calls the handler
    with a NULL argument that would cause a problem for iw_handler_get_iwstats.
    However, iw_handler_get_iwstats never actually gets called that way.

    Marking that function as noinline avoids the warning and leads
    to slightly smaller object code as well.

    Signed-off-by: Arnd Bergmann
    Link: https://lore.kernel.org/r/20200107200741.3588770-1-arnd@arndb.de
    Signed-off-by: Johannes Berg

    Arnd Bergmann
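
As a rough illustration of the approach, the GCC/Clang `noinline` attribute keeps a handler out of its callers, so the compiler does not analyse the handler body with a caller's possibly-NULL argument. The function and type names here are hypothetical and this is not the wext code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_stats {
    int level;
    int noise;
};

/* Kept out of line so callers' NULL paths are not folded into the memcpy. */
static __attribute__((noinline)) int toy_get_stats(void *extra,
                                                   const struct toy_stats *stats)
{
    memcpy(extra, stats, sizeof(*stats));
    return 0;
}

/* Generic caller; in this sketch it always passes a valid buffer. */
static int toy_standard_call(void *extra, const struct toy_stats *stats)
{
    assert(stats != NULL);
    return toy_get_stats(extra, stats);
}
```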
     
  • TKIP replay protection was skipped for the very first frame received
    after a new key is configured. While this is potentially needed to avoid
    dropping a frame in some cases, this does leave a window for replay
    attacks with group-addressed frames at the station side. Any earlier
    frame sent by the AP using the same key would be accepted as a valid
    frame and the internal RSC would then be updated to the TSC from that
    frame. This would allow multiple previously transmitted group-addressed
    frames to be replayed until the next valid new group-addressed frame
    from the AP is received by the station.

    Fix this by limiting the no-replay-protection exception to apply only
    for the case where TSC=0, i.e., when this is for the very first frame
    protected using the new key, and the local RSC had not been set to a
    higher value when configuring the key (which may happen with GTK).

    Signed-off-by: Jouni Malinen
    Link: https://lore.kernel.org/r/20200107153545.10934-1-j@w1.fi
    Signed-off-by: Johannes Berg

    Jouni Malinen
     
  • If a CAC_FINISHED or RADAR_DETECTED radar event happens while
    another phy is performing CAC, we might need to cancel that CAC.

    If radar is detected on a channel that another phy is currently
    doing CAC on, then the CAC should be canceled there.

    If, for example, two phys are doing CAC on the same channels,
    or on compatible channels, then once one of them finishes its
    CAC the other might need to cancel its CAC, since it is no
    longer relevant.

    To fix that, this commit adds a callback and implements it in
    mac80211 to end CAC.
    This commit also adds a call to said callback if, after a radar
    event, we see that the CAC is no longer relevant.

    Signed-off-by: Orr Mazor
    Reviewed-by: Sergey Matyukevich
    Link: https://lore.kernel.org/r/20191222145449.15792-1-Orr.Mazor@tandemg.com
    [slightly reformat/reword commit message]
    Signed-off-by: Johannes Berg

    Orr Mazor
     
  • Commit e33e2241e272 ("Revert "cfg80211: Use 5MHz bandwidth by
    default when checking usable channels"") fixed a broken
    regulatory (leaving channel 12 open for AP where not permitted).
    Apply a similar fix to custom regulatory domain processing.

    Signed-off-by: Cathy Luo
    Signed-off-by: Ganapathi Bhat
    Link: https://lore.kernel.org/r/1576836859-8945-1-git-send-email-ganapathi.bhat@nxp.com
    [reword commit message, fix coding style, add a comment]
    Signed-off-by: Johannes Berg

    Ganapathi Bhat
     
  • Currently, hv_sock restricts the port the guest socket can accept
    connections on. hv_sock divides the socket port namespace into two parts
    for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF
    (there are no restrictions on client port namespace). The first part
    (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted.
    The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports
    for the peer (host) socket, once a connection is accepted.

    This reservation of the port namespace is specific to hv_sock and not
    known by the generic vsock library (ex: af_vsock). This is problematic
    because auto-binds/ephemeral ports are handled by the generic vsock
    library and it has no knowledge of this port reservation and could
    allocate a port that is not compatible with hv_sock (and legitimately so).

    The issue hasn't surfaced so far because the auto-bind code of vsock
    (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for
    VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and
    start assigning ports. That will take a large number of iterations to hit
    0x7FFFFFFF. But, after the above change to randomize port selection, the
    issue has started coming up more frequently.

    There has really been no good reason to have this port reservation logic
    in hv_sock from the get go. Reserving a local port for peer ports is not
    how things are handled generally. Peer ports should reflect the peer port.

    This fixes the issue by lifting the port reservation, and also returns the
    right peer port. Since the code converts the GUID to the peer port (by
    using the first 4 bytes), there is a possibility of conflicts, but that
    seems like a reasonable risk to take, given this is limited to vsock and
    that only applies to all local sockets.

    Signed-off-by: Sunil Muthuswamy
    Signed-off-by: David S. Miller

    Sunil Muthuswamy
     
  • Since v5.4, a device removal occasionally triggered this oops:

    Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
    Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
    Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
    Dec 2 17:13:53 manet kernel: PGD 0 P4D 0
    Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
    Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W 5.4.0-00050-g53717e43af61 #883
    Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
    Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
    Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
    Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
    Dec 2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
    Dec 2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
    Dec 2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
    Dec 2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
    Dec 2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
    Dec 2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
    Dec 2 17:13:53 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
    Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec 2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
    Dec 2 17:13:53 manet kernel: Call Trace:
    Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
    Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
    Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
    Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
    Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
    Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
    Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9
    Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
    Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30

    The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
    is still pointing to the old ib_device, which has been freed. The
    only way that is possible is if this rpcrdma_rep was not destroyed
    by rpcrdma_ia_remove.

    Debugging showed that was indeed the case: this rpcrdma_rep was
    still in use by a completing RPC at the time of the device removal,
    and thus wasn't on the rep free list. So, it was not found by
    rpcrdma_reps_destroy().

    The fix is to introduce a list of all rpcrdma_reps so that they all
    can be found when a device is removed. That list is used to perform
    only regbuf DMA unmapping, replacing that call to
    rpcrdma_reps_destroy().

    Meanwhile, to prevent corruption of this list, I've moved the
    destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
    rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
    not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
    protecting the rb_all_reps list.

    Fixes: b0b227f071a0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
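
A simplified model of why a second list is needed: objects that are in use are not on the free list, so a walk over the free list misses them, while a list of all objects reaches every one. The types and functions here are hypothetical and omit the locking and draining that the real code relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rep {
    bool dma_mapped;
    struct toy_rep *next_free; /* link on the free list (only while idle) */
    struct toy_rep *next_all;  /* link on the list of every rep created */
};

struct toy_buffer {
    struct toy_rep *free_reps;
    struct toy_rep *all_reps;
};

/* Register a rep; idle reps also go onto the free list. */
static void toy_rep_add(struct toy_buffer *buf, struct toy_rep *rep, bool in_use)
{
    assert(buf != NULL && rep != NULL);
    rep->dma_mapped = true;
    rep->next_all = buf->all_reps;
    buf->all_reps = rep;
    rep->next_free = NULL;
    if (!in_use) {
        rep->next_free = buf->free_reps;
        buf->free_reps = rep;
    }
}

/* Old approach: only reps sitting on the free list are reached. */
static int toy_unmap_free_only(struct toy_buffer *buf)
{
    int unmapped = 0;

    for (struct toy_rep *r = buf->free_reps; r; r = r->next_free) {
        if (r->dma_mapped) {
            r->dma_mapped = false;
            unmapped++;
        }
    }
    return unmapped;
}

/* New approach: walk every rep, including those held by in-flight requests. */
static int toy_unmap_all(struct toy_buffer *buf)
{
    int unmapped = 0;

    for (struct toy_rep *r = buf->all_reps; r; r = r->next_all) {
        if (r->dma_mapped) {
            r->dma_mapped = false;
            unmapped++;
        }
    }
    return unmapped;
}
```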
     
  • I've found that on occasion, "rmmod " will hang if an NFS mount
    is under load.

    Ensure that ri_remove_done is initialized only just before the
    transport is woken up to force a close. This avoids the completion
    possibly getting initialized again while the CM event handler is
    waiting for a wake-up.

    Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • On device re-insertion, the RDMA device driver crashes trying to set
    up a new QP:

    Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
    Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
    Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
    Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
    Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
    Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852
    Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
    Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
    Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
    Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
    Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
    Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
    Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
    Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
    Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
    Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
    Nov 27 16:32:06 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
    Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
    Nov 27 16:32:06 manet kernel: Call Trace:
    Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
    Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
    Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
    Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
    Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]

    The fix is to copy the qp_init_attr struct that was just created by
    rpcrdma_ep_create() instead of using the one from the previous
    connection instance.

    Fixes: 98ef77d1aaa7 ("xprtrdma: Send Queue size grows after a reconnect")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

14 Jan, 2020

2 commits

  • An earlier commit (1b789577f655060d98d20e,
    "netfilter: arp_tables: init netns pointer in xt_tgchk_param struct")
    fixed missing net initialization for arptables, but it turns out it was
    incomplete. We can get a very similar struct net NULL deref during
    error unwinding:

    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    RIP: 0010:xt_rateest_put+0xa1/0x440 net/netfilter/xt_RATEEST.c:77
    xt_rateest_tg_destroy+0x72/0xa0 net/netfilter/xt_RATEEST.c:175
    cleanup_entry net/ipv4/netfilter/arp_tables.c:509 [inline]
    translate_table+0x11f4/0x1d80 net/ipv4/netfilter/arp_tables.c:587
    do_replace net/ipv4/netfilter/arp_tables.c:981 [inline]
    do_arpt_set_ctl+0x317/0x650 net/ipv4/netfilter/arp_tables.c:1461

    Also init the netns pointer in xt_tgdtor_param struct.

    Fixes: add67461240c1d ("netfilter: add struct net * to target parameters")
    Reported-by: syzbot+91bdd8eece0f6629ec8b@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • map->members is freed by ip_set_free() right before it is used again
    in mtype_ext_cleanup(). So we just have to move the ip_set_free() call
    down.

    Reported-by: syzbot+4c3cc6dbe7259dbf9054@syzkaller.appspotmail.com
    Fixes: 40cd63bf33b2 ("netfilter: ipset: Support extensions which need a per data destroy function")
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Cong Wang
    Signed-off-by: Pablo Neira Ayuso

    Cong Wang
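
The ordering issue can be shown with a minimal sketch: any cleanup that still reads a buffer has to run before that buffer is freed. The names `toy_map`, `toy_ext_cleanup` and `toy_destroy` are hypothetical and only model the order of operations.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_map {
    int *members;
    size_t count;
    int cleaned; /* number of members visited by the per-element cleanup */
};

/* Per-element cleanup; it dereferences map->members. */
static void toy_ext_cleanup(struct toy_map *map)
{
    assert(map->members != NULL); /* must still be allocated here */
    for (size_t i = 0; i < map->count; i++) {
        map->members[i] = 0;
        map->cleaned++;
    }
}

/* Correct order: clean up the elements first, free the array afterwards. */
static void toy_destroy(struct toy_map *map)
{
    toy_ext_cleanup(map);
    free(map->members);
    map->members = NULL;
}
```

Swapping the two calls in `toy_destroy` would make the cleanup read freed memory, which is the use-after-free pattern the commit describes.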