31 Dec, 2019

13 commits

  • commit 00d4e14d2e4caf5f7254a505fee5eeca8cd37bd4 upstream.

    syzbot reproduced following crash:

    ===============================================================================
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 9844 Comm: syz-executor.0 Not tainted 5.4.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    RIP: 0010:__lock_acquire+0x1254/0x4a00 kernel/locking/lockdep.c:3828
    Code: 00 0f 85 96 24 00 00 48 81 c4 f0 00 00 00 5b 41 5c 41 5d 41 5e 41
    5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 3c 02
    00 0f 85 0b 28 00 00 49 81 3e 20 19 78 8a 0f 84 5f ee ff
    RSP: 0018:ffff888099c3fb48 EFLAGS: 00010006
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000218 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: ffff888099c3fc60 R08: 0000000000000001 R09: 0000000000000001
    R10: fffffbfff146e1d0 R11: ffff888098720400 R12: 00000000000010c0
    R13: 0000000000000000 R14: 00000000000010c0 R15: 0000000000000000
    FS: 00007f0559e98700(0000) GS:ffff8880ae800000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fe4d89e0000 CR3: 0000000099606000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
    _raw_spin_lock_bh+0x33/0x50 kernel/locking/spinlock.c:175
    spin_lock_bh include/linux/spinlock.h:343 [inline]
    j1939_jsk_del+0x32/0x210 net/can/j1939/socket.c:89
    j1939_sk_bind+0x2ea/0x8f0 net/can/j1939/socket.c:448
    __sys_bind+0x239/0x290 net/socket.c:1648
    __do_sys_bind net/socket.c:1659 [inline]
    __se_sys_bind net/socket.c:1657 [inline]
    __x64_sys_bind+0x73/0xb0 net/socket.c:1657
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x45a679
    Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89
    f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01
    f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f0559e97c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a679
    RDX: 0000000000000018 RSI: 0000000020000240 RDI: 0000000000000003
    RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0559e986d4
    R13: 00000000004c09e9 R14: 00000000004d37d0 R15: 00000000ffffffff
    Modules linked in:
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 9844 at kernel/locking/mutex.c:1419
    mutex_trylock+0x279/0x2f0 kernel/locking/mutex.c:1427
    ===============================================================================

    This issue was caused by a NULL pointer dereference: j1939_sk_bind()
    was using a priv that does not currently exist.

    A possible scenario may look as follows:

    cpu0                                cpu1
    bind()
                                        bind()
     j1939_sk_bind()
                                         j1939_sk_bind()
      priv = jsk->priv;
                                          priv = jsk->priv;
      lock_sock(sock->sk);
      priv = j1939_netdev_start(ndev);
      j1939_jsk_add(priv, jsk);
        jsk->priv = priv;
      release_sock(sock->sk);
                                          lock_sock(sock->sk);
                                          j1939_jsk_del(priv, jsk);
                                          ..... ooops ......

    With this patch we move "priv = jsk->priv;" after the lock, to avoid
    assigning of wrong priv pointer.
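
    The ordering described above can be sketched in a small single-threaded
    model. All names here (demo_sock, demo_lock, demo_bind_read_priv) are
    hypothetical stand-ins, not the kernel types or the actual patch; the
    only point is that the shared pointer is read after the lock is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket and its private state. */
struct demo_priv { int ifindex; };
struct demo_sock {
    int locked;               /* models lock_sock()/release_sock() */
    struct demo_priv *priv;   /* may be changed by a concurrent bind() */
};

void demo_lock(struct demo_sock *sk)
{
    assert(!sk->locked);      /* single-threaded model: no contention */
    sk->locked = 1;
}

void demo_unlock(struct demo_sock *sk)
{
    assert(sk->locked);
    sk->locked = 0;
}

/* Fixed ordering: take the lock first, then read the shared pointer. */
struct demo_priv *demo_bind_read_priv(struct demo_sock *sk)
{
    struct demo_priv *priv;

    demo_lock(sk);
    priv = sk->priv;          /* the read happens under the lock */
    demo_unlock(sk);
    return priv;
}
```

    A snapshot of sk->priv taken before demo_lock() can be stale by the time
    the lock is held, which is the situation the scenario above describes.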

    Reported-by: syzbot+99e9e1b200a1e363237d@syzkaller.appspotmail.com
    Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
    Signed-off-by: Oleksij Rempel
    Cc: linux-stable # >= v5.4
    Signed-off-by: Marc Kleine-Budde
    Signed-off-by: Greg Kroah-Hartman

    Oleksij Rempel
     
  • [ Upstream commit 08a5bdde3812993cb8eb7aa9124703df0de28e4b ]

    Commit 7b6ddeaf27ec ("mac80211: use QoS NDP for AP probing")
    let STAs send QoS Null frames as PS triggers if the AP was
    a QoS STA. However, the mac80211 PS stack relies on an
    interface flag IEEE80211_STA_NULLFUNC_ACKED for
    determining trigger frame ACK, which was not being set for
    acked non-QoS Null frames. The effect is an inability to
    trigger hardware sleep via IEEE80211_CONF_PS since the QoS
    Null frame was seemingly never acked.

    This bug only applies to drivers which set both
    IEEE80211_HW_REPORTS_TX_ACK_STATUS and
    IEEE80211_HW_PS_NULLFUNC_STACK.

    Detect the acked QoS Null frame to restore STA power save.

    Fixes: 7b6ddeaf27ec ("mac80211: use QoS NDP for AP probing")
    Signed-off-by: Thomas Pedersen
    Link: https://lore.kernel.org/r/20191119053538.25979-4-thomas@adapt-ip.com
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin

    Thomas Pedersen
     
  • [ Upstream commit 6012b9346d8959194c239fd60a62dfec98d43048 ]

    Instances may have flags set as part of its data in which case the code
    should not attempt to add it again otherwise it can cause duplication:

    < HCI Command: LE Set Extended Advertising Data (0x08|0x0037) plen 35
    Handle: 0x00
    Operation: Complete extended advertising data (0x03)
    Fragment preference: Minimize fragmentation (0x01)
    Data length: 0x06
    Flags: 0x04
    BR/EDR Not Supported
    Flags: 0x06
    LE General Discoverable Mode
    BR/EDR Not Supported

    Signed-off-by: Luiz Augusto von Dentz
    Signed-off-by: Johan Hedberg
    Signed-off-by: Sasha Levin

    Luiz Augusto von Dentz
     
  • [ Upstream commit eb8c101e28496888a0dcfe16ab86a1bee369e820 ]

    During the setup() stage, HCI device drivers expect the chip to
    acknowledge its setup() completion via vendor specific frames.

    If userspace opens() such an HCI device in HCI_USER_CHANNEL [1] mode,
    the vendor specific frames are never transmitted to the driver, as
    they are filtered in hci_rx_work().

    Allow HCI devices which operate in HCI_USER_CHANNEL mode to receive
    frames if the HCI device is in HCI_INIT state.

    [1] https://www.spinics.net/lists/linux-bluetooth/msg37345.html

    Fixes: 23500189d7e0 ("Bluetooth: Introduce new HCI socket channel for user operation")
    Signed-off-by: Mattijs Korpershoek
    Signed-off-by: Marcel Holtmann
    Signed-off-by: Sasha Levin

    Mattijs Korpershoek
     
  • [ Upstream commit 4c371bb95cf06ded80df0e6139fdd77cee1d9a94 ]

    It appears that some Broadcom controllers (eg BCM20702A0) reject LE Set
    Advertising Parameters command if advertising intervals provided are not
    within range for undirected and low duty directed advertising.

    Work around this bug by populating min and max intervals with 'valid'
    values.

    < HCI Command: LE Set Advertising Parameters (0x08|0x0006) plen 15
    Min advertising interval: 0.000 msec (0x0000)
    Max advertising interval: 0.000 msec (0x0000)
    Type: Connectable directed - ADV_DIRECT_IND (high duty cycle) (0x01)
    Own address type: Public (0x00)
    Direct address type: Random (0x01)
    Direct address: E2:F0:7B:9F:DC:F4 (Static)
    Channel map: 37, 38, 39 (0x07)
    Filter policy: Allow Scan Request from Any, Allow Connect Request from Any (0x00)
    > HCI Event: Command Complete (0x0e) plen 4
    LE Set Advertising Parameters (0x08|0x0006) ncmd 1
    Status: Invalid HCI Command Parameters (0x12)

    Signed-off-by: Szymon Janc
    Tested-by: Sören Beye
    Signed-off-by: Marcel Holtmann
    Signed-off-by: Sasha Levin

    Szymon Janc
     
  • [ Upstream commit 727ea61a5028f8ac96f75ab34cb1b56e63fd9227 ]

    It looks like in hci_init4_req() the request is being
    initialised from cpu-endian data but the packet is specified
    to be little-endian. This causes a warning from sparse due
    to the __le16 to u16 conversion.

    Fix this by using cpu_to_le16() on the two fields in the packet.

    net/bluetooth/hci_core.c:845:27: warning: incorrect type in assignment (different base types)
    net/bluetooth/hci_core.c:845:27: expected restricted __le16 [usertype] tx_len
    net/bluetooth/hci_core.c:845:27: got unsigned short [usertype] le_max_tx_len
    net/bluetooth/hci_core.c:846:28: warning: incorrect type in assignment (different base types)
    net/bluetooth/hci_core.c:846:28: expected restricted __le16 [usertype] tx_time
    net/bluetooth/hci_core.c:846:28: got unsigned short [usertype] le_max_tx_time
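
    As a userspace illustration of what a CPU-to-little-endian conversion
    guarantees, the sketch below writes a 16-bit value with its least
    significant byte first, independent of host byte order. demo_put_le16()
    and demo_get_le16() are hypothetical helpers, not the kernel's
    cpu_to_le16().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into out[0..1] in little-endian order on any host. */
void demo_put_le16(uint8_t *out, uint16_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v & 0xff);   /* least significant byte first */
    out[1] = (uint8_t)(v >> 8);
}

/* Read a little-endian 16-bit value back into host order. */
uint16_t demo_get_le16(const uint8_t *in)
{
    assert(in != NULL);
    return (uint16_t)(in[0] | ((uint16_t)in[1] << 8));
}
```

    Writing the bytes explicitly gives the same wire layout on big-endian
    and little-endian hosts.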

    Signed-off-by: Ben Dooks
    Signed-off-by: Marcel Holtmann
    Signed-off-by: Sasha Levin

    Ben Dooks (Codethink)
     
  • [ Upstream commit b3cb53c05f20c5b4026a36a7bbd3010d1f3e0a55 ]

    SMCD link groups belong to certain ISM-devices and SMCR link group
    links belong to certain IB-devices. Increase the refcount for
    these devices, as long as corresponding link groups exist.

    Signed-off-by: Ursula Braun
    Signed-off-by: Karsten Graul
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Ursula Braun
     
  • [ Upstream commit f394722fb0d0f701119368959d7cd0ecbc46363a ]

    neigh_cleanup() has not been used for seven years, and was a wrong design.

    Messing with a shared pointer in bond_neigh_init() without proper
    memory barriers would at least trigger syzbot complaints eventually.

    It is time to remove this stuff.

    Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit b6f3320b1d5267e7b583a6d0c88dda518101740c ]

    Syzbot found a crash:

    BUG: KMSAN: uninit-value in crc32_body lib/crc32.c:112 [inline]
    BUG: KMSAN: uninit-value in crc32_le_generic lib/crc32.c:179 [inline]
    BUG: KMSAN: uninit-value in __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
    Call Trace:
    crc32_body lib/crc32.c:112 [inline]
    crc32_le_generic lib/crc32.c:179 [inline]
    __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
    chksum_update+0xb2/0x110 crypto/crc32c_generic.c:90
    crypto_shash_update+0x4c5/0x530 crypto/shash.c:107
    crc32c+0x150/0x220 lib/libcrc32c.c:47
    sctp_csum_update+0x89/0xa0 include/net/sctp/checksum.h:36
    __skb_checksum+0x1297/0x12a0 net/core/skbuff.c:2640
    sctp_compute_cksum include/net/sctp/checksum.h:59 [inline]
    sctp_packet_pack net/sctp/output.c:528 [inline]
    sctp_packet_transmit+0x40fb/0x4250 net/sctp/output.c:597
    sctp_outq_flush_transports net/sctp/outqueue.c:1146 [inline]
    sctp_outq_flush+0x1823/0x5d80 net/sctp/outqueue.c:1194
    sctp_outq_uncork+0xd0/0xf0 net/sctp/outqueue.c:757
    sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1781 [inline]
    sctp_side_effects net/sctp/sm_sideeffect.c:1184 [inline]
    sctp_do_sm+0x8fe1/0x9720 net/sctp/sm_sideeffect.c:1155
    sctp_primitive_REQUESTHEARTBEAT+0x175/0x1a0 net/sctp/primitive.c:185
    sctp_apply_peer_addr_params+0x212/0x1d40 net/sctp/socket.c:2433
    sctp_setsockopt_peer_addr_params net/sctp/socket.c:2686 [inline]
    sctp_setsockopt+0x189bb/0x19090 net/sctp/socket.c:4672

    The issue was caused by transport->ipaddr set with uninit addr param, which
    was passed by:

    sctp_transport_init net/sctp/transport.c:47 [inline]
    sctp_transport_new+0x248/0xa00 net/sctp/transport.c:100
    sctp_assoc_add_peer+0x5ba/0x2030 net/sctp/associola.c:611
    sctp_process_param net/sctp/sm_make_chunk.c:2524 [inline]

    where 'addr' is set by sctp_v4_from_addr_param(), and it doesn't initialize
    the padding of addr->v4.

    Later, when calling sctp_make_heartbeat(), hbinfo.daddr(=transport->ipaddr)
    becomes part of the skb, and the issue occurs.

    This patch fixes it by initializing the padding of addr->v4 in
    sctp_v4_from_addr_param(), as well as in other functions that do a
    similar thing. As Marcelo suggested, these functions should not trust
    the caller to initialize the memory.
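
    A minimal sketch of the idea, assuming a simplified address layout:
    the conversion helper clears the padding itself instead of relying on
    the caller. struct demo_sockaddr_v4, DEMO_AF_INET and
    demo_v4_from_param() are hypothetical and only mirror the shape of an
    IPv4 socket address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: three fields followed by explicit padding. */
struct demo_sockaddr_v4 {
    uint16_t family;
    uint16_t port;
    uint32_t addr;
    uint8_t  zero[8];   /* padding that is later copied out with the struct */
};

#define DEMO_AF_INET 2

/* Fill every byte of *out, including the padding, whatever it held before. */
void demo_v4_from_param(struct demo_sockaddr_v4 *out, uint32_t addr, uint16_t port)
{
    assert(out != NULL);
    out->family = DEMO_AF_INET;
    out->port = port;
    out->addr = addr;
    memset(out->zero, 0, sizeof(out->zero));
}
```

    With the padding cleared in the helper, copying the whole structure into
    a packet no longer exposes whatever the caller's buffer contained.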

    Reported-by: syzbot+6dcbfea81cd3d4dd0b02@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 951c6db954a1adefab492f6da805decacabbd1a7 ]

    syzbot reported a memory leak when an allocation fails within
    genradix_prealloc() for output streams. That's because
    genradix_prealloc() leaves initialized members initialized when the
    issue happens and SCTP stack will abort the current initialization but
    without cleaning up such members.

    The fix here is to always call genradix_free() when genradix_prealloc()
    fails, for output and also input streams, as it suffers from the same
    issue.
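
    The cleanup pattern can be sketched in userspace as follows. The chunked
    table, the allocation hook and all demo_* names are hypothetical; this is
    not the genradix API, only an illustration of freeing partial state when
    preallocation fails part-way through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_CHUNKS 8

/* Hypothetical chunked table standing in for a preallocated structure. */
struct demo_table {
    void *chunk[DEMO_MAX_CHUNKS];
    size_t nr;
};

/* Failure injection: allocation fails once demo_fail_at chunks were handed out. */
static int demo_allocs, demo_fail_at = -1;

void *demo_alloc(size_t sz)
{
    if (demo_fail_at >= 0 && demo_allocs >= demo_fail_at)
        return NULL;
    demo_allocs++;
    return malloc(sz);
}

void demo_table_free(struct demo_table *t)
{
    for (size_t i = 0; i < t->nr; i++) {
        free(t->chunk[i]);
        t->chunk[i] = NULL;
    }
    t->nr = 0;
}

/* May leave some chunks allocated when it fails part-way through. */
int demo_table_prealloc(struct demo_table *t, size_t want)
{
    assert(want <= DEMO_MAX_CHUNKS);
    while (t->nr < want) {
        void *p = demo_alloc(64);
        if (!p)
            return -1;
        t->chunk[t->nr++] = p;
    }
    return 0;
}

/* Caller-side pattern: always release partial state on failure. */
int demo_stream_init(struct demo_table *t, size_t want)
{
    int ret = demo_table_prealloc(t, want);

    if (ret)
        demo_table_free(t);
    return ret;
}
```

    Without the demo_table_free() call in the error path, the chunks that
    were allocated before the failure would be leaked.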

    Reported-by: syzbot+772d9e36c490b18d51d1@syzkaller.appspotmail.com
    Fixes: 2075e50caf5e ("sctp: convert to genradix")
    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Marcelo Ricardo Leitner
     
  • [ Upstream commit ddd9b5e3e765d8ed5a35786a6cb00111713fe161 ]

    dev_hold() must always be called in rx_queue_add_kobject().
    Otherwise the usage count drops below 0 if kobject_init_and_add()
    fails.
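
    The reference-count balance can be illustrated with a small model. It
    assumes, as a simplification, that the error path runs the same release
    step as normal teardown; demo_dev, demo_queue and the demo_* functions
    are hypothetical names, not the net-sysfs code.

```c
#include <assert.h>

/* Hypothetical refcounted device and a queue object that pins it. */
struct demo_dev { int refcnt; };
struct demo_queue { struct demo_dev *dev; };

void demo_dev_hold(struct demo_dev *d) { d->refcnt++; }

void demo_dev_put(struct demo_dev *d)
{
    assert(d->refcnt > 0);   /* going below zero is the bug being modelled */
    d->refcnt--;
}

/* Release step: runs on every teardown, including the error path. */
void demo_queue_release(struct demo_queue *q) { demo_dev_put(q->dev); }

/* init_ok simulates whether the fallible registration step succeeds. */
int demo_queue_add(struct demo_queue *q, struct demo_dev *d, int init_ok)
{
    q->dev = d;
    demo_dev_hold(d);            /* taken before the step that can fail */
    if (!init_ok) {
        demo_queue_release(q);   /* puts exactly the reference taken above */
        return -1;
    }
    return 0;
}
```

    If the hold were taken only after a successful registration, the release
    in the failure path would drop a reference that was never acquired.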

    Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
    Reported-by: syzbot
    Cc: Tetsuo Handa
    Cc: David Miller
    Cc: Lukas Bulwahn
    Signed-off-by: Jouni Hogander
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jouni Hogander
     
  • [ Upstream commit b7ac893652cafadcf669f78452329727e4e255cc ]

    The kernel may sleep while holding a spinlock.
    The function call path (from bottom to top) in Linux 4.19 is:

    net/nfc/nci/uart.c, 349:
    nci_skb_alloc in nci_uart_default_recv_buf
    net/nfc/nci/uart.c, 255:
    (FUNC_PTR)nci_uart_default_recv_buf in nci_uart_tty_receive
    net/nfc/nci/uart.c, 254:
    spin_lock in nci_uart_tty_receive

    nci_skb_alloc(GFP_KERNEL) can sleep at runtime.
    (FUNC_PTR) means a function pointer is called.

    To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC for
    nci_skb_alloc().

    This bug was found by STCheck, a static analysis tool written by myself.

    Signed-off-by: Jia-Ju Bai
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jia-Ju Bai
     
  • [ Upstream commit b43d1f9f7067c6759b1051e8ecb84e82cef569fe ]

    There is a soft lockup when using TPACKET_V3:
    ...
    NMI watchdog: BUG: soft lockup - CPU#2 stuck for 60010ms!
    (__irq_svc) from [] (_raw_spin_unlock_irqrestore+0x44/0x54)
    (_raw_spin_unlock_irqrestore) from [] (mod_timer+0x210/0x25c)
    (mod_timer) from []
    (prb_retire_rx_blk_timer_expired+0x68/0x11c)
    (prb_retire_rx_blk_timer_expired) from []
    (call_timer_fn+0x90/0x17c)
    (call_timer_fn) from [] (run_timer_softirq+0x2d4/0x2fc)
    (run_timer_softirq) from [] (__do_softirq+0x218/0x318)
    (__do_softirq) from [] (irq_exit+0x88/0xac)
    (irq_exit) from [] (msa_irq_exit+0x11c/0x1d4)
    (msa_irq_exit) from [] (handle_IPI+0x650/0x7f4)
    (handle_IPI) from [] (gic_handle_irq+0x108/0x118)
    (gic_handle_irq) from [] (__irq_usr+0x44/0x5c)
    ...

    If __ethtool_get_link_ksettings() fails in prb_calc_retire_blk_tmo(),
    msec and tmo will be zero, so tov_in_jiffies is zero and the
    retire_blk_timer is re-armed with
    mod_timer(&pkc->retire_blk_timer, jiffies + 0),
    which drives softirq CPU usage to 100%.
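
    One way to avoid a zero timeout is to fall back to a non-zero default
    whenever the link speed is unknown or the computed value rounds down to
    zero. The sketch below is an assumption about the general approach, not
    the actual patch; demo_retire_timeout_ms() and DEMO_DEFAULT_RETIRE_MS
    are hypothetical.

```c
#include <assert.h>

#define DEMO_DEFAULT_RETIRE_MS 8u   /* hypothetical fallback value */

/*
 * Time in milliseconds to fill a block of block_bits at the given link
 * speed. link_speed_mbps <= 0 models a failed link-settings query.
 */
unsigned int demo_retire_timeout_ms(int link_speed_mbps, unsigned int block_bits)
{
    unsigned int ms;

    if (link_speed_mbps <= 0)
        return DEMO_DEFAULT_RETIRE_MS;

    /* 1 Mbit/s is 1000 bits per millisecond. */
    ms = block_bits / ((unsigned int)link_speed_mbps * 1000u);
    if (ms == 0)
        ms = DEMO_DEFAULT_RETIRE_MS;

    assert(ms > 0);   /* a zero timeout would re-arm the timer immediately */
    return ms;
}
```

    The caller can then arm its timer with the returned value and never
    schedule it for the current tick.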

    Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
    Tested-by: Xiao Jiangfeng
    Signed-off-by: Mao Wenan
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mao Wenan
     

18 Dec, 2019

22 commits

  • [ Upstream commit 86c76c09898332143be365c702cf8d586ed4ed21 ]

    A lockdep splat was observed when trying to remove an xdp memory
    model from the table since the mutex was obtained when trying to
    remove the entry, but not before the table walk started.

    Fix the splat by obtaining the lock before starting the table walk.

    Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.")
    Reported-by: Grygorii Strashko
    Signed-off-by: Jonathan Lemon
    Tested-by: Grygorii Strashko
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Ilias Apalodimas
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jonathan Lemon
     
  • [ Upstream commit c3f812cea0d7006469d1cf33a4a9f0a12bb4b3a3 ]

    The page pool keeps track of the number of pages in flight, and
    it isn't safe to remove the pool until all pages are returned.

    Disallow removing the pool until all pages are back, so the pool
    is always available for page producers.

    Make the page pool responsible for its own delayed destruction
    instead of relying on XDP, so the page pool can be used without
    the xdp memory model.

    When all pages are returned, free the pool and notify xdp if the
    pool is registered with the xdp memory system. Have the callback
    perform a table walk since some drivers (cpsw) may share the pool
    among multiple xdp_rxq_info.

    Note that the increment of pages_state_release_cnt may result in
    inflight == 0, resulting in the pool being released.

    Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
    Signed-off-by: Jonathan Lemon
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Ilias Apalodimas
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jonathan Lemon
     
  • [ Upstream commit 95219afbb980f10934de9f23a3e199be69c5ed09 ]

    The act_ct TC module shares a common conntrack and NAT infrastructure
    exposed via netfilter. It's possible that a packet needs both SNAT and
    DNAT manipulation, due to e.g. tuple collision. Netfilter can support
    this because it runs through the NAT table twice - once on ingress and
    again after egress. The act_ct action doesn't have such capability.

    Like netfilter hook infrastructure, we should run through NAT twice to
    keep the symmetry.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: Aaron Conole
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Aaron Conole
     
  • [ Upstream commit d04ac224b1688f005a84f764cfe29844f8e9da08 ]

    The skb_mpls_push was not updating ethertype of an ethernet packet if
    the packet was originally received from a non ARPHRD_ETHER device.

    In the below OVS data path flow, since the device corresponding to
    port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
    not update the ethertype of the packet even though the previous
    push_eth action had added an ethernet header to the packet.

    recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
    actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
    push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4

    Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper")
    Signed-off-by: Martin Varghese
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Martin Varghese
     
  • [ Upstream commit df95467b6d2bfce49667ee4b71c67249b01957f7 ]

    hsr_dev_xmit() calls hsr_port_get_hsr() to find the master node, which
    returns NULL if the master node does not exist in the list.
    But hsr_dev_xmit() doesn't check the returned pointer, so a NULL
    dereference could occur.
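
    The missing check can be sketched with a simplified port list. The types
    and functions below (demo_port, demo_port_get, demo_dev_xmit) are
    hypothetical and only model a lookup that may return NULL followed by a
    transmit path that handles that case.

```c
#include <assert.h>
#include <stddef.h>

enum demo_port_type { DEMO_PT_SLAVE_A, DEMO_PT_SLAVE_B, DEMO_PT_MASTER };

struct demo_port { enum demo_port_type type; int tx_packets; };
struct demo_hsr { struct demo_port *ports; size_t nr_ports; };

#define DEMO_TX_OK      0
#define DEMO_TX_DROPPED 1

/* Returns NULL when no port of the requested type is in the list. */
struct demo_port *demo_port_get(struct demo_hsr *hsr, enum demo_port_type pt)
{
    assert(hsr != NULL);
    for (size_t i = 0; i < hsr->nr_ports; i++)
        if (hsr->ports[i].type == pt)
            return &hsr->ports[i];
    return NULL;
}

int demo_dev_xmit(struct demo_hsr *hsr)
{
    struct demo_port *master = demo_port_get(hsr, DEMO_PT_MASTER);

    if (!master)
        return DEMO_TX_DROPPED;   /* master already gone: drop, do not dereference */
    master->tx_packets++;
    return DEMO_TX_OK;
}
```

    Dropping the frame when the master port is missing keeps the transmit
    path safe while the device is being torn down.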

    Test commands:
    ip netns add nst
    ip link add veth0 type veth peer name veth1
    ip link add veth2 type veth peer name veth3
    ip link set veth1 netns nst
    ip link set veth3 netns nst
    ip link set veth0 up
    ip link set veth2 up
    ip link add hsr0 type hsr slave1 veth0 slave2 veth2
    ip a a 192.168.100.1/24 dev hsr0
    ip link set hsr0 up
    ip netns exec nst ip link set veth1 up
    ip netns exec nst ip link set veth3 up
    ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
    ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
    ip netns exec nst ip link set hsr1 up
    hping3 192.168.100.2 -2 --flood &
    modprobe -rv hsr

    Splat looks like:
    [ 217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled
    [ 217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192
    [ 217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr]
    [ 217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b
    [ 217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202
    [ 217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002
    [ 217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010
    [ 217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001
    [ 217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000
    [ 217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00
    [ 217.383094][ T1635] FS: 00007f060d9d1740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
    [ 217.384289][ T1635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0
    [ 217.385940][ T1635] Call Trace:
    [ 217.386544][ T1635] dev_hard_start_xmit+0x160/0x740
    [ 217.387114][ T1635] __dev_queue_xmit+0x1961/0x2e10
    [ 217.388118][ T1635] ? check_object+0xaf/0x260
    [ 217.391466][ T1635] ? __alloc_skb+0xb9/0x500
    [ 217.392017][ T1635] ? init_object+0x6b/0x80
    [ 217.392629][ T1635] ? netdev_core_pick_tx+0x2e0/0x2e0
    [ 217.393175][ T1635] ? __alloc_skb+0xb9/0x500
    [ 217.393727][ T1635] ? rcu_read_lock_sched_held+0x90/0xc0
    [ 217.394331][ T1635] ? rcu_read_lock_bh_held+0xa0/0xa0
    [ 217.395013][ T1635] ? kasan_unpoison_shadow+0x30/0x40
    [ 217.395668][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
    [ 217.396280][ T1635] ? __kmalloc_node_track_caller+0x3a8/0x3f0
    [ 217.399007][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
    [ 217.400093][ T1635] ? __kmalloc_reserve.isra.46+0x2e/0xb0
    [ 217.401118][ T1635] ? memset+0x1f/0x40
    [ 217.402529][ T1635] ? __alloc_skb+0x317/0x500
    [ 217.404915][ T1635] ? arp_xmit+0xca/0x2c0
    [ ... ]

    Fixes: 311633b60406 ("hsr: switch ->dellink() to ->ndo_uninit()")
    Acked-by: Cong Wang
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • [ Upstream commit 040b5cfbcefa263ccf2c118c4938308606bb7ed8 ]

    The skb_mpls_pop was not updating ethertype of an ethernet packet if the
    packet was originally received from a non ARPHRD_ETHER device.

    In the below OVS data path flow, since the device corresponding to port 7
    is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
    the ethertype of the packet even though the previous push_eth action had
    added an ethernet header to the packet.

    recirc_id(0),in_port(7),eth_type(0x8847),
    mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
    actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
    pop_mpls(eth_type=0x800),4

    Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper")
    Signed-off-by: Martin Varghese
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Martin Varghese
     
  • [ Upstream commit 0e4940928c26527ce8f97237fef4c8a91cd34207 ]

    After pskb_may_pull() we should always refetch the header
    pointers from the skb->data in case it got reallocated.

    In gre_parse_header(), the erspan header is still fetched
    from the 'options' pointer which is fetched before
    pskb_may_pull().
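
    The same hazard exists with any buffer that may move when it grows. In
    the userspace sketch below, demo_may_pull() is a hypothetical analogue
    built on realloc(); the reader keeps an offset and re-derives the pointer
    from the buffer base after the call instead of reusing an older pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct demo_buf { unsigned char *data; size_t len; };

/* Ensure at least `need` bytes are present; the storage may move. */
int demo_may_pull(struct demo_buf *b, size_t need)
{
    unsigned char *p;

    if (need <= b->len)
        return 1;
    p = realloc(b->data, need);
    if (!p)
        return 0;
    memset(p + b->len, 0, need - b->len);   /* new tail starts zeroed */
    b->data = p;
    b->len = need;
    return 1;
}

/* Read the byte at offset `off`, growing the buffer first if required. */
int demo_read_option(struct demo_buf *b, size_t off, unsigned char *out)
{
    assert(b != NULL && out != NULL);
    if (!demo_may_pull(b, off + 1))
        return -1;
    *out = b->data[off];   /* re-derived from b->data after the pull */
    return 0;
}
```

    A pointer computed from b->data before demo_may_pull() may refer to freed
    memory afterwards, which is why only the offset is carried across the
    call.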

    Found this during code review of a KMSAN bug report.

    Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup")
    Cc: Lorenzo Bianconi
    Signed-off-by: Cong Wang
    Acked-by: Lorenzo Bianconi
    Acked-by: William Tu
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 8ffb055beae58574d3e77b4bf9d4d15eace1ca27 ]

    The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify
    packets using port ranges") had added filtering based on port ranges
    to tc flower. However the commit missed necessary changes in hw-offload
    code, so the feature gave rise to generating incorrect offloaded flow
    keys in NIC.

    One more detailed example is below:

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 100-200 action drop

    With the setup above, an exact match filter with dst_port == 0 will be
    installed in NIC by hw-offload. IOW, the NIC will have a rule which is
    equivalent to the following one.

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 0 action drop

    The behavior was caused by the flow dissector which extracts packet
    data into the flow key in the tc flower. More specifically, regardless
    of exact match or specified port ranges, fl_init_dissector() set the
    FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port
    numbers from skb in skb_flow_dissect() called by fl_classify(). Note
    that device drivers received the same struct flow_dissector object as
    used in skb_flow_dissect(). Thus, offloaded drivers could not identify
    which of these is used because the FLOW_DISSECTOR_KEY_PORTS flag was
    set to struct flow_dissector in either case.

    This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
    tp_range field in struct fl_flow_key to recognize which filters are applied
    to offloaded drivers. At this point, when filters based on port ranges
    are passed to drivers, the drivers return the EOPNOTSUPP error because
    they do not support the feature (the newly created
    FLOW_DISSECTOR_KEY_PORTS_RANGE flag).
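
    The effect of a separate flag can be sketched with plain bit flags. The
    DEMO_KEY_* values, struct demo_rule and both functions are hypothetical;
    they only show how a driver can tell a range match from an exact match
    and refuse what it cannot offload.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit flags standing in for dissector key IDs. */
#define DEMO_KEY_PORTS        (1u << 0)   /* exact-match ports */
#define DEMO_KEY_PORTS_RANGE  (1u << 1)   /* port range match */

struct demo_rule {
    unsigned int used_keys;
    unsigned short dst_min, dst_max;
};

/* Pick the key that describes how the destination port is matched. */
unsigned int demo_rule_keys(unsigned short dst_min, unsigned short dst_max)
{
    assert(dst_min <= dst_max);
    return dst_min == dst_max ? DEMO_KEY_PORTS : DEMO_KEY_PORTS_RANGE;
}

/* A driver without range support rejects the rule instead of
 * installing an incorrect exact match. */
int demo_driver_offload(const struct demo_rule *r)
{
    if (r->used_keys & DEMO_KEY_PORTS_RANGE)
        return -EOPNOTSUPP;
    return 0;
}
```

    With a single shared flag the driver in this model could not distinguish
    the two cases, which corresponds to the dst_port == 0 rule described
    above.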

    Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
    Signed-off-by: Yoshiki Komachi
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yoshiki Komachi
     
  • [ Upstream commit 25a443f74bcff2c4d506a39eae62fc15ad7c618a ]

    When a device is bound to a clsact qdisc, bind events are triggered to
    registered drivers for both ingress and egress. However, if a driver
    registers to such a device using the indirect block routines then it is
    assumed that it is only interested in ingress offload and so only replays
    ingress bind/unbind messages.

    The NFP driver supports the offload of some egress filters when
    registering to a block with qdisc of type clsact. However, on unregister,
    if the block is still active, it will not receive an unbind egress
    notification which can prevent proper cleanup of other registered
    callbacks.

    Modify the indirect block callback command in TC to send messages of
    ingress and/or egress bind depending on the qdisc in use. NFP currently
    supports egress offload for TC flower offload so the changes are only
    added to TC.

    Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
    Signed-off-by: John Hurley
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Hurley
     
  • [ Upstream commit dbad3408896c3c5722ec9cda065468b3df16c5bf ]

    With indirect blocks, a driver can register for callbacks from a device
    that it does not 'own', for example, a tunnel device. When registering to
    or unregistering from a new device, a callback is triggered to generate
    a bind/unbind event. This, in turn, allows the driver to receive any
    existing rules or to properly clean up installed rules.

    When first added, it was assumed that all indirect block registrations
    would be for ingress offloads. However, the NFP driver can, in some
    instances, support clsact qdisc binds for egress offload.

    Change the name of the indirect block callback command in flow_offload to
    remove the 'ingress' identifier from it. While this does not change
    functionality, a follow-up patch will implement a more generic
    callback than the current ones, which support only ingress offload.

    Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
    Signed-off-by: John Hurley
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Hurley
     
  • [ Upstream commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 ]

    ipv6_stub uses the ip6_dst_lookup function to allow other modules to
    perform IPv6 lookups. However, this function skips the XFRM layer
    entirely.

    All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
    ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
    which calls xfrm_lookup_route(). This patch fixes this inconsistent
    behavior by switching the stub to ip6_dst_lookup_flow, which also calls
    xfrm_lookup_route().

    This requires some changes in all the callers, as these two functions
    take different arguments and have different return types.

    Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
    Reported-by: Xiumei Mu
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e ]

    This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow,
    as some modules currently pass a net argument without a socket to
    ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change
    ipv6_stub_impl.ipv6_dst_lookup to take net argument").

    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit 9cf1cd8ee3ee09ef2859017df2058e2f53c5347f ]

    In order to set/get/dump, the tipc uses the generic netlink
    infrastructure. So, when tipc module is inserted, init function
    calls genl_register_family().
    After genl_register_family(), set/get/dump commands are immediately
    allowed and these callbacks internally use the net_generic.
    net_generic is allocated by register_pernet_device() but this
    is called after genl_register_family() in the __init function.
    So, these callbacks would use un-initialized net_generic.

    Test commands:
    #SHELL1
    while :
    do
    modprobe tipc
    modprobe -rv tipc
    done

    #SHELL2
    while :
    do
    tipc link list
    done

    Splat looks like:
    [ 59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
    [ 59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
    [ 59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
    [ 59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
    [ 59.622550][ T2780] NET: Registered protocol family 30
    [ 59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
    [ 59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
    [ 59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
    [ 59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
    [ 59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
    [ 59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
    [ 59.624639][ T2788] FS: 00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
    [ 59.624645][ T2788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 59.625875][ T2780] tipc: Started in single node mode
    [ 59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
    [ 59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 59.636478][ T2788] Call Trace:
    [ 59.637025][ T2788] tipc_nl_add_bc_link+0x179/0x1470 [tipc]
    [ 59.638219][ T2788] ? lock_downgrade+0x6e0/0x6e0
    [ 59.638923][ T2788] ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
    [ 59.639533][ T2788] ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
    [ 59.640160][ T2788] ? mutex_lock_io_nested+0x1380/0x1380
    [ 59.640746][ T2788] tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
    [ 59.641356][ T2788] ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
    [ 59.642088][ T2788] ? __skb_ext_del+0x270/0x270
    [ 59.642594][ T2788] genl_lock_dumpit+0x85/0xb0
    [ 59.643050][ T2788] netlink_dump+0x49c/0xed0
    [ 59.643529][ T2788] ? __netlink_sendskb+0xc0/0xc0
    [ 59.644044][ T2788] ? __netlink_dump_start+0x190/0x800
    [ 59.644617][ T2788] ? __mutex_unlock_slowpath+0xd0/0x670
    [ 59.645177][ T2788] __netlink_dump_start+0x5a0/0x800
    [ 59.645692][ T2788] genl_rcv_msg+0xa75/0xe90
    [ 59.646144][ T2788] ? __lock_acquire+0xdfe/0x3de0
    [ 59.646692][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320
    [ 59.647340][ T2788] ? genl_lock_dumpit+0xb0/0xb0
    [ 59.647821][ T2788] ? genl_unlock+0x20/0x20
    [ 59.648290][ T2788] ? genl_parallel_done+0xe0/0xe0
    [ 59.648787][ T2788] ? find_held_lock+0x39/0x1d0
    [ 59.649276][ T2788] ? genl_rcv+0x15/0x40
    [ 59.649722][ T2788] ? lock_contended+0xcd0/0xcd0
    [ 59.650296][ T2788] netlink_rcv_skb+0x121/0x350
    [ 59.650828][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320
    [ 59.651491][ T2788] ? netlink_ack+0x940/0x940
    [ 59.651953][ T2788] ? lock_acquire+0x164/0x3b0
    [ 59.652449][ T2788] genl_rcv+0x24/0x40
    [ 59.652841][ T2788] netlink_unicast+0x421/0x600
    [ ... ]

    Fixes: 7e4369057806 ("tipc: fix a slab object leak")
    Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace")
    Signed-off-by: Taehee Yoo
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • [ Upstream commit 9424e2e7ad93ffffa88f882c9bc5023570904b55 ]

    Back in 2008, Adam Langley fixed the corner case of packets for flows
    having all of the following options: MD5, TS, SACK.

    Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
    can be cooked from the remaining 8 bytes.

    tcp_established_options() correctly sets opts->num_sack_blocks
    to zero, but returns 36 instead of 32.

    This means TCP cooks packets with 4 extra bytes at the end
    of options, containing uninitialized bytes.
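
    The arithmetic can be checked with a small sketch. The 20 and 12 byte
    sizes come from the text above; the 40 byte option limit, the 4 byte SACK
    base and the 8 bytes per SACK block are assumptions stated here for
    illustration, and the two size functions are hypothetical simplifications
    of tcp_established_options(), not its actual code.

```c
#define MAX_TCP_OPTION_SPACE  40 /* assumed: maximum TCP option bytes */
#define OPT_MD5_ALIGNED       20 /* from the commit text */
#define OPT_TS_ALIGNED        12 /* from the commit text */
#define OPT_SACK_BASE_ALIGNED  4 /* assumed: SACK kind/length plus padding */
#define OPT_SACK_PERBLOCK      8 /* assumed: two 32-bit sequence numbers */

/* How many SACK blocks fit once 'used' option bytes are taken. */
static unsigned int sack_blocks_that_fit(unsigned int used)
{
    unsigned int remaining = MAX_TCP_OPTION_SPACE - used;

    if (remaining < OPT_SACK_BASE_ALIGNED + OPT_SACK_PERBLOCK)
        return 0;
    return (remaining - OPT_SACK_BASE_ALIGNED) / OPT_SACK_PERBLOCK;
}

/* Buggy accounting: charges the SACK base even when no block fits. */
static unsigned int options_size_buggy(unsigned int wanted_blocks)
{
    unsigned int size = OPT_MD5_ALIGNED + OPT_TS_ALIGNED;
    unsigned int fit = sack_blocks_that_fit(size);
    unsigned int blocks = wanted_blocks < fit ? wanted_blocks : fit;

    if (wanted_blocks)
        size += OPT_SACK_BASE_ALIGNED + blocks * OPT_SACK_PERBLOCK;
    return size;
}

/* Fixed accounting: only charges for SACK when at least one block fits. */
static unsigned int options_size_fixed(unsigned int wanted_blocks)
{
    unsigned int size = OPT_MD5_ALIGNED + OPT_TS_ALIGNED;
    unsigned int fit = sack_blocks_that_fit(size);
    unsigned int blocks = wanted_blocks < fit ? wanted_blocks : fit;

    if (blocks)
        size += OPT_SACK_BASE_ALIGNED + blocks * OPT_SACK_PERBLOCK;
    return size;
}
```

    Under these assumptions the buggy variant reports 36 bytes and the fixed
    one 32, matching the numbers in the commit message.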

    Fixes: 33ad798c924b ("tcp: options clean up")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: Neal Cardwell
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 5d50aa83e2c8e91ced2cca77c198b468ca9210f4 ]

    The openvswitch module shares a common conntrack and NAT infrastructure
    exposed via netfilter. It's possible that a packet needs both SNAT and
    DNAT manipulation, due to e.g. tuple collision. Netfilter can support
    this because it runs through the NAT table twice - once on ingress and
    again after egress. The openvswitch module doesn't have such capability.

    Like the netfilter hook infrastructure, we should run through NAT twice
    to keep the symmetry.

    Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
    Signed-off-by: Aaron Conole
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Aaron Conole
     
  • [ Upstream commit 4a5cdc604b9cf645e6fa24d8d9f055955c3c8516 ]

    ENOTSUPP is not available in userspace, for example:

    setsockopt failed, 524, Unknown error 524
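
    A userspace program can only turn an errno value into text if the C
    library knows it. The sketch below assumes EOPNOTSUPP as the
    userspace-visible alternative (the commit text above does not name the
    replacement); KERNEL_ENOTSUPP and describe_errno() are hypothetical names
    introduced for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Value taken from the error message above; userspace <errno.h> has no name for it. */
#define KERNEL_ENOTSUPP 524

/* Format "<code>, <text>" in the same shape as the failing setsockopt message. */
static int describe_errno(int err, char *buf, size_t len)
{
    return snprintf(buf, len, "%d, %s", err, strerror(err));
}
```

    Calling describe_errno(EOPNOTSUPP, ...) yields a readable message, while
    the text produced for 524 depends on the C library and is typically a
    generic "unknown error" string.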

    Signed-off-by: Valentin Vidic
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Valentin Vidic
     
  • [ Upstream commit 2dd5616ecdcebdf5a8d007af64e040d4e9214efe ]

    Use the new tcf_proto_check_kind() helper to make sure the
    user-provided value is well formed.

    BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:606 [inline]
    BUG: KMSAN: uninit-value in string+0x4be/0x600 lib/vsprintf.c:668
    CPU: 0 PID: 12358 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c9/0x220 lib/dump_stack.c:118
    kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
    __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
    string_nocheck lib/vsprintf.c:606 [inline]
    string+0x4be/0x600 lib/vsprintf.c:668
    vsnprintf+0x218f/0x3210 lib/vsprintf.c:2510
    __request_module+0x2b1/0x11c0 kernel/kmod.c:143
    tcf_proto_lookup_ops+0x171/0x700 net/sched/cls_api.c:139
    tc_chain_tmplt_add net/sched/cls_api.c:2730 [inline]
    tc_ctl_chain+0x1904/0x38a0 net/sched/cls_api.c:2850
    rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5224
    netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5242
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg net/socket.c:657 [inline]
    ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
    __sys_sendmsg net/socket.c:2356 [inline]
    __do_sys_sendmsg net/socket.c:2365 [inline]
    __se_sys_sendmsg+0x305/0x460 net/socket.c:2363
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
    do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45a649
    Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f0790795c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a649
    RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000006
    RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f07907966d4
    R13: 00000000004c8db5 R14: 00000000004df630 R15: 00000000ffffffff

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
    kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
    kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
    slab_alloc_node mm/slub.c:2773 [inline]
    __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
    __kmalloc_reserve net/core/skbuff.c:141 [inline]
    __alloc_skb+0x306/0xa10 net/core/skbuff.c:209
    alloc_skb include/linux/skbuff.h:1049 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
    netlink_sendmsg+0x783/0x1330 net/netlink/af_netlink.c:1892
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg net/socket.c:657 [inline]
    ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
    __sys_sendmsg net/socket.c:2356 [inline]
    __do_sys_sendmsg net/socket.c:2365 [inline]
    __se_sys_sendmsg+0x305/0x460 net/socket.c:2363
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
    do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 6f96c3c6904c ("net_sched: fix backward compatibility for TCA_KIND")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: Cong Wang
    Cc: Marcelo Ricardo Leitner
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 2f23cd42e19c22c24ff0e221089b7b6123b117c5 ]

    In mq_dump() and mqprio_dump(), sch->q.len is not set if the
    subqueue is a NOLOCK qdisc.

    Fixes: ce679e8df7ed ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
    Signed-off-by: Dust Li
    Signed-off-by: Tony Lu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dust Li
     
  • [ Upstream commit 8bef0af09a5415df761b04fa487a6c34acae74bc ]

    Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
    ability to override protocol and network offset during flow dissection
    for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
    in order to fix skb hashing for RPS on Rx path.

    However, skb_hash() and the added code can be invoked not only on
    Rx, but also on the Tx path if we have a multi-queued device and:
    - the kernel is running on a UP system, or
    - XPS is not configured.

    The call stack in these two cases will look like: dev_queue_xmit() ->
    __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
    skb_tx_hash() -> skb_get_hash().

    The problem is that skbs queued for Tx have both network offset and
    correct protocol already set up even after inserting a CPU tag by DSA
    tagger, so calling tag_ops->flow_dissect() on this path actually only
    breaks flow dissection and hashing.

    This can be observed by adding debug prints just before and right after
    tag_ops->flow_dissect() call to the related block of code:

    Before the patch:

    Rx path (RPS):

    [ 19.240001] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
    [ 19.244271] tag_ops->flow_dissect()
    [ 19.247811] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */

    [ 19.215435] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
    [ 19.219746] tag_ops->flow_dissect()
    [ 19.223241] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */

    [ 18.654057] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
    [ 18.658332] tag_ops->flow_dissect()
    [ 18.661826] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */

    Tx path (UP system):

    [ 18.759560] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */
    [ 18.763933] tag_ops->flow_dissect()
    [ 18.767485] Tx: proto: 0x920b, nhoff: 34 /* junk */

    [ 22.800020] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */
    [ 22.804392] tag_ops->flow_dissect()
    [ 22.807921] Tx: proto: 0x920b, nhoff: 34 /* junk */

    [ 16.898342] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */
    [ 16.902705] tag_ops->flow_dissect()
    [ 16.906227] Tx: proto: 0x920b, nhoff: 34 /* junk */

    After:

    Rx path (RPS):

    [ 16.520993] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
    [ 16.525260] tag_ops->flow_dissect()
    [ 16.528808] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */

    [ 15.484807] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
    [ 15.490417] tag_ops->flow_dissect()
    [ 15.495223] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */

    [ 17.134621] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
    [ 17.138895] tag_ops->flow_dissect()
    [ 17.142388] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */

    Tx path (UP system):

    [ 15.499558] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */

    [ 20.664689] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */

    [ 18.565782] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */

    To fix that, we can add the check 'proto == htons(ETH_P_XDSA)'
    to prevent the code from calling tag_ops->flow_dissect() on Tx.
    I also decided to initialize the 'offset' variable so tagger callbacks
    can now safely leave it untouched without causing chaos.
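
    The guard can be sketched in userspace C as follows. The ETH_P_XDSA value
    and the offsets are taken from the debug output above; struct pkt,
    tagger_flow_dissect() and dissect() are hypothetical stand-ins, not the
    flow dissector code itself.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define ETH_P_XDSA 0x00F8 /* value shown in the debug output above */

/* Hypothetical packet view: protocol in network byte order plus offset. */
struct pkt {
    uint16_t proto;
    int nhoff;
};

/* Stand-in for tag_ops->flow_dissect(): reports the real protocol and tag length. */
static void tagger_flow_dissect(uint16_t *proto, int *offset)
{
    *proto = htons(0x0800); /* ETH_P_IP behind the tag, for this example */
    *offset = 8;            /* tag length seen on the Rx path above */
}

/* Only untagged-looking (ETH_P_XDSA) frames, i.e. Rx, go through the tagger. */
static void dissect(struct pkt *p)
{
    if (p->proto == htons(ETH_P_XDSA)) {
        int offset = 0; /* initialised so a tagger may leave it untouched */

        tagger_flow_dissect(&p->proto, &offset);
        p->nhoff += offset;
    }
}
```

    A Tx packet that already carries proto 0x0800 and nhoff 26 passes through
    dissect() unchanged, while an Rx packet with ETH_P_XDSA is adjusted.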

    Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
    Signed-off-by: Alexander Lobakin
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexander Lobakin
     
  • [ Upstream commit c4b4c421857dc7b1cf0dccbd738472360ff2cd70 ]

    We have an interesting memory leak in the bridge when it is being
    unregistered while enslaved to a master device that changes the
    MAC address of its slaves on unregister (e.g. bond, team). This is a
    very unusual setup, but we do end up leaking one fdb entry, because
    dev_set_mac_address() causes the bridge to insert the new MAC address
    into its table after all fdbs have been flushed. That is, after
    dellink() on the bridge has finished and NETDEV_UNREGISTER is sent,
    the bond/team releases the bridge and calls dev_set_mac_address() to
    restore its original address, which in turn adds an fdb to the bridge.
    One fix is to check the bridge dev's reg_state in its
    ndo_set_mac_address callback and return an error if the bridge is not in
    NETREG_REGISTERED.
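
    A minimal model of that check, assuming -EBUSY as the error code (the
    text above only says "an error"). struct bridge, the reg_state enum and
    bridge_set_mac() are hypothetical simplifications of the bridge code.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical subset of the netdev registration states. */
enum reg_state { REG_UNINITIALIZED, REG_REGISTERED, REG_UNREGISTERING };

/* Hypothetical bridge model: fdb_entries counts local fdb entries. */
struct bridge {
    enum reg_state reg_state;
    unsigned char mac[6];
    int fdb_entries;
};

/* Refuse MAC changes unless the device is fully registered. */
static int bridge_set_mac(struct bridge *br, const unsigned char *mac)
{
    if (br->reg_state != REG_REGISTERED)
        return -EBUSY; /* assumed error code */

    memcpy(br->mac, mac, sizeof(br->mac));
    br->fdb_entries++; /* the new address gets an fdb entry */
    return 0;
}
```

    Once the bridge is unregistering, a late address restore from the master
    is rejected and no fdb entry is created after the flush.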

    Easy steps to reproduce:
    1. add bond in mode != A/B
    2. add any slave to the bond
    3. add bridge dev as a slave to the bond
    4. destroy the bridge device

    Trace:
    unreferenced object 0xffff888035c4d080 (size 128):
    comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
    hex dump (first 32 bytes):
    41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00 A..6............
    d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00 ...^?...........
    backtrace:
    kmem_cache_alloc+0x155/0x26f
    fdb_create+0x21/0x486 [bridge]
    fdb_insert+0x91/0xdc [bridge]
    br_fdb_change_mac_address+0xb3/0x175 [bridge]
    br_stp_change_bridge_id+0xf/0xff [bridge]
    br_set_mac_address+0x76/0x99 [bridge]
    dev_set_mac_address+0x63/0x9b
    __bond_release_one+0x3f6/0x455 [bonding]
    bond_netdev_event+0x2f2/0x400 [bonding]
    notifier_call_chain+0x38/0x56
    call_netdevice_notifiers+0x1e/0x23
    rollback_registered_many+0x353/0x6a4
    unregister_netdevice_many+0x17/0x6f
    rtnl_delete_link+0x3c/0x43
    rtnl_dellink+0x1dc/0x20a
    rtnetlink_rcv_msg+0x23d/0x268

    Fixes: 43598813386f ("bridge: add local MAC address to forwarding table (v2)")
    Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     
  • [ Upstream commit 9f104c7736904ac72385bbb48669e0c923ca879b ]

    When a user runs a command like
    tc qdisc add dev eth1 root mqprio
    a KASAN stack-out-of-bounds warning is emitted.
    Currently, the NLA_ALIGN macro used in mqprio_dump passes too large a
    buffer size as the argument to nla_put, and from there to memcpy down
    the call stack.
    The flow looks like this:
    1. nla_put expects the exact object size as an argument;
    2. Later it passes this size to memcpy;
    3. To calculate the correct padding for the SKB, nla_put applies the
    NLA_ALIGN macro itself.

    Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
    Otherwise it will lead to out-of-bounds memory access in memcpy.
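
    The effect can be illustrated with a simplified model of the attribute
    copy. NLA_ALIGN is written out here with 4-byte alignment; put_attr() and
    overread_bytes() are hypothetical helpers that mimic the copy-then-pad
    behaviour described in the list above, not the kernel's nla_put().

```c
#include <stddef.h>
#include <string.h>

#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))

/*
 * Simplified attribute copy: reads exactly attrlen bytes from the source
 * and zero-pads the destination up to the aligned length itself.
 * Returns the number of destination bytes consumed.
 */
static size_t put_attr(unsigned char *dst, const void *data, size_t attrlen)
{
    size_t padded = NLA_ALIGN(attrlen);

    memcpy(dst, data, attrlen);
    memset(dst + attrlen, 0, padded - attrlen);
    return padded;
}

/* Bytes that would be read past an object if 'len_arg' is passed as its size. */
static size_t overread_bytes(size_t object_size, size_t len_arg)
{
    return len_arg > object_size ? len_arg - object_size : 0;
}
```

    For a 6-byte object, passing sizeof(obj) copies 6 bytes and pads to 8 in
    the destination, whereas passing NLA_ALIGN(sizeof(obj)) would make the
    copy read 2 bytes beyond the source object.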

    Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
    Signed-off-by: Vladyslav Tarasiuk
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vladyslav Tarasiuk
     
  • [ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ]

    syzbot was once again able to crash a host by setting a very small MTU
    on the loopback device.

    Let's make inetdev_valid_mtu() available in include/net/ip.h,
    and use it in ip_setup_cork(), so that we protect both ip_append_page()
    and __ip_append_data().

    Also add a READ_ONCE() when the device mtu is read.

    This pairs the lockless read with one WRITE_ONCE() in __dev_set_mtu(),
    even if other code paths might write over this field.

    Add a big comment in include/linux/netdevice.h about dev->mtu
    needing READ_ONCE()/WRITE_ONCE() annotations.

    Hopefully we will add the missing ones in followup patches.
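
    A userspace sketch of the validation and the paired accessors. The
    minimum of 68 bytes and the -ENETUNREACH return value are assumptions for
    illustration; the READ_ONCE()/WRITE_ONCE() macros below are simplified
    volatile-cast versions (using the GNU __typeof__ extension), and
    setup_cork() is a hypothetical reduction of ip_setup_cork().

```c
#include <errno.h>
#include <stdbool.h>

#define IPV4_MIN_MTU 68 /* assumed minimum IPv4 MTU */

/* Simplified single-access macros; the kernel versions are more involved. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct net_device { unsigned int mtu; };
struct cork { unsigned int fragsize; };

static bool inetdev_valid_mtu(unsigned int mtu)
{
    return mtu >= IPV4_MIN_MTU;
}

/* Writer side: the one annotated store the reader pairs with. */
static void dev_set_mtu(struct net_device *dev, unsigned int mtu)
{
    WRITE_ONCE(dev->mtu, mtu);
}

/* Reader side: sample the mtu once, then validate the sampled value. */
static int setup_cork(struct cork *cork, struct net_device *dev)
{
    cork->fragsize = READ_ONCE(dev->mtu);
    if (!inetdev_valid_mtu(cork->fragsize))
        return -ENETUNREACH; /* assumed error code */
    return 0;
}
```

    Validating the sampled value, rather than re-reading dev->mtu, means a
    concurrent change cannot slip a too-small MTU past the check.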

    [1]

    refcount_t: saturated; leaking memory.
    WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x197/0x210 lib/dump_stack.c:118
    panic+0x2e3/0x75c kernel/panic.c:221
    __warn.cold+0x2f/0x3e kernel/panic.c:582
    report_bug+0x289/0x300 lib/bug.c:195
    fixup_bug arch/x86/kernel/traps.c:174 [inline]
    fixup_bug arch/x86/kernel/traps.c:169 [inline]
    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
    do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
    invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
    RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
    Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
    RSP: 0018:ffff88809689f550 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
    RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
    R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
    R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
    refcount_add include/linux/refcount.h:193 [inline]
    skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
    sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
    ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
    udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
    inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
    kernel_sendpage+0x92/0xf0 net/socket.c:3794
    sock_sendpage+0x8b/0xc0 net/socket.c:936
    pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
    splice_from_pipe_feed fs/splice.c:512 [inline]
    __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
    splice_from_pipe+0x108/0x170 fs/splice.c:671
    generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
    do_splice_from fs/splice.c:861 [inline]
    direct_splice_actor+0x123/0x190 fs/splice.c:1035
    splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
    do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
    do_sendfile+0x597/0xd00 fs/read_write.c:1464
    __do_sys_sendfile64 fs/read_write.c:1525 [inline]
    __se_sys_sendfile64 fs/read_write.c:1511 [inline]
    __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x441409
    Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
    RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
    RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
    R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
    R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
    Kernel Offset: disabled
    Rebooting in 86400 seconds..

    Fixes: 1470ddf7f8ce ("inet: Remove explicit write references to sk/inet in ip_append_data")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

13 Dec, 2019

2 commits

  • commit 8670b2b8b029a6650d133486be9d2ace146fd29a upstream.

    udev can create /dev/ device nodes if it finds a devnode: modalias.
    This allows auto-loading of the modules that provide the node. It
    requires a statically allocated minor number for misc character
    devices.

    However, rfkill uses dynamic minor numbers, which prevents auto-loading
    of the module. So allocate the next static misc minor number and use
    it for rfkill.

    Signed-off-by: Marcel Holtmann
    Link: https://lore.kernel.org/r/20191024174042.19851-1-marcel@holtmann.org
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Sasha Levin

    Marcel Holtmann
     
  • commit 66eb3add452aa1be65ad536da99fac4b8f620b74 upstream.

    Jon Hunter: "I have been tracking down another suspend/NFS related
    issue where again I am seeing random delays exiting suspend. The delays
    can be up to a couple minutes in the worst case and this is causing a
    suspend test we have to fail."

    Change the use of a deferrable work to a standard delayed one.

    Reported-by: Jon Hunter
    Tested-by: Jon Hunter
    Fixes: 7e0a0e38fcfea ("SUNRPC: Replace the queue timer with a delayed work function")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     

05 Dec, 2019

3 commits

  • [ Upstream commit fd567ac20cb0377ff466d3337e6e9ac5d0cb15e4 ]

    In commit 4f07b80c9733 ("tipc: check msg->req data len in
    tipc_nl_compat_bearer_disable") the same check was copied into three
    routines: tipc_nl_compat_bearer_disable(),
    tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats().
    The two link routines should instead have checked against
    the maximum link name length, not the bearer name length.
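
    The difference matters because link names can be longer than bearer
    names. In the sketch below the limits of 32 and 68 bytes are assumptions
    for illustration, and name_is_valid() is a hypothetical helper that
    checks for a terminating NUL within the allowed length.

```c
#include <stddef.h>
#include <string.h>

#define TIPC_MAX_BEARER_NAME 32 /* assumed value */
#define TIPC_MAX_LINK_NAME   68 /* assumed value */

/*
 * A name is accepted only if a terminating NUL is found within
 * min(data_len, max_name) bytes.
 */
static int name_is_valid(const char *buf, size_t data_len, size_t max_name)
{
    size_t len = data_len < max_name ? data_len : max_name;

    return memchr(buf, '\0', len) != NULL;
}
```

    A valid 40-character link name passes when checked against
    TIPC_MAX_LINK_NAME but is rejected when the bearer limit is used by
    mistake.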

    Fixes: 4f07b80c9733 ("tipc: check msg->req data len in tipc_nl_compat_bearer_disable")
    Signed-off-by: John Rutherford
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Rutherford
     
  • [ Upstream commit c5daa6cccdc2f94aca2c9b3fa5f94e4469997293 ]

    The partially sent record cleanup path increments an SG entry
    directly instead of using sg_next(). This should not be a
    problem today, as encrypted messages should always be
    allocated as arrays. But given that this is a cleanup path, it
    would be easy to miss if this were ever to change. Use sg_next(),
    and simplify the code.
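
    Why a plain increment is fragile can be seen in a simplified model of a
    chained scatterlist. struct sg_model and the helpers below are
    hypothetical; the real struct scatterlist encodes the chain and end
    markers differently, but the walking logic follows the same idea.

```c
#include <stddef.h>

/* Simplified scatterlist entry: either data, or a link to another array. */
struct sg_model {
    unsigned int length;
    struct sg_model *chain; /* non-NULL: this slot links to another array */
    int is_last;            /* marks the final data entry */
};

/* Step to the next data entry, following a chain link if one is hit. */
static struct sg_model *sg_next_model(struct sg_model *sg)
{
    if (sg->is_last)
        return NULL;
    sg++;
    if (sg->chain)
        sg = sg->chain;
    return sg;
}

/* Sum the data lengths by walking with the chain-aware step. */
static unsigned int total_length(struct sg_model *sg)
{
    unsigned int total = 0;

    for (; sg; sg = sg_next_model(sg))
        total += sg->length;
    return total;
}

/* Two arrays joined by a chain slot: lengths 1, 2, then 4, 8. */
static unsigned int demo_chained_total(void)
{
    struct sg_model second[2] = { { 4, NULL, 0 }, { 8, NULL, 1 } };
    struct sg_model first[3] = {
        { 1, NULL, 0 }, { 2, NULL, 0 }, { 0, second, 0 }
    };

    return total_length(first);
}
```

    A walk that simply did sg++ would treat the chain slot as data and then
    run off the end of the first array instead of continuing in the second.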

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     
  • [ Upstream commit 9e5ffed37df68d0ccfb2fdc528609e23a1e70ebe ]

    It looks like when BPF support was added by commit d3b18ad31f93
    ("tls: add bpf support to sk_msg handling") and
    commit d829e9c4112b ("tls: convert to generic sk_msg interface"),
    it broke/removed the support for in-place crypto added by
    commit 4e6d47206c32 ("tls: Add support for inplace records
    encryption").

    The inplace_crypto member of struct tls_rec is dead, inited
    to zero, and sometimes set to zero again. It used to be
    set to 1 when record was allocated, but the skmsg code doesn't
    seem to have been written with the idea of in-place crypto
    in mind.

    Since non-trivial effort is required to bring the feature back,
    and we don't really have the HW to measure the benefit, just
    remove the leftover support for now to avoid confusing readers.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski