22 Jul, 2018

40 commits

  • commit 3ee7e8697d5860b173132606d80a9cd35e7113ee upstream.

    syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to
    wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was
    WB_shutting_down after wb->bdi->dev became NULL. This indicates that
    unregister_bdi() failed to call wb_shutdown() on one of wb objects.

    The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus
    drops bdi's reference to wb structures before going through the list of
    wbs again and calling wb_shutdown() on each of them. This way the loop
    iterating through all wbs can easily miss a wb if that wb has already
    passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
    from cgwb_release_workfn() and as a result fully shutdown bdi although
    wb_workfn() for this wb structure is still running. In fact there are
    also other ways cgwb_bdi_unregister() can race with
    cgwb_release_workfn() leading e.g. to use-after-free issues:

    CPU1                            CPU2
                                    cgwb_bdi_unregister()
                                      cgwb_kill(*slot);

    cgwb_release()
      queue_work(cgwb_release_wq, &wb->release_work);
    cgwb_release_workfn()
                                      wb = list_first_entry(&bdi->wb_list, ...)
                                      spin_unlock_irq(&cgwb_lock);
      wb_shutdown(wb);
      ...
      kfree_rcu(wb, rcu);
                                      wb_shutdown(wb); -> oops use-after-free

    We solve these issues by synchronizing writeback structure shutdown from
    cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That
    way we also no longer need synchronization using WB_shutting_down as the
    mutex provides it for CONFIG_CGROUP_WRITEBACK case and without
    CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from
    bdi_unregister().

    Reported-by: syzbot
    Acked-by: Tejun Heo
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream.

    Eric Dumazet reports:
    Here is a reproducer of an annoying bug detected by syzkaller on our production kernel
    [..]
    ./b78305423 enable_conntrack
    Then :
    sleep 60
    dmesg | tail -10
    [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2

    Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook.
    skb gets queued until frag timer expiry -- 1 minute.

    Normally nf_conntrack_reasm gets called during prerouting, so skb has
    no dst yet which might explain why this wasn't spotted earlier.

    Reported-by: Eric Dumazet
    Reported-by: John Sperbeck
    Signed-off-by: Florian Westphal
    Tested-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit bab2c80e5a6c855657482eac9e97f5f3eedb509a upstream.

    When pulling the NSH header in nsh_gso_segment, set the mac length
    based on the encapsulated packet type.

    skb_reset_mac_len computes an offset to the network header, which
    here still points to the outer packet:

    > skb_reset_network_header(skb);
    > [...]
    > __skb_pull(skb, nsh_len);
    > skb_reset_mac_header(skb); // now mac hdr starts nsh_len == 8B after net hdr
    > skb_reset_mac_len(skb); // mac len = net hdr - mac hdr == (u16) -8 == 65528
    > [..]
    > skb_mac_gso_segment(skb, ..)
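
    The wraparound in the trace above can be reproduced in plain C. This is
    a minimal userspace sketch, not kernel code: struct hdr_offsets and both
    helper functions are hypothetical stand-ins for the sk_buff offsets
    involved, and only the 16-bit subtraction is taken from the description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff header offsets involved. */
struct hdr_offsets {
    uint16_t mac_header;      /* offset of the MAC header from the buffer head */
    uint16_t network_header;  /* offset of the network header from the buffer head */
};

/* Same arithmetic as described above: mac len = net hdr - mac hdr,
 * stored in a 16-bit field. */
static uint16_t reset_mac_len(const struct hdr_offsets *h)
{
    return (uint16_t)(h->network_header - h->mac_header);
}

/* State after the NSH header is pulled: the MAC header now starts
 * nsh_len bytes after the (stale) outer network header. */
static uint16_t mac_len_after_pull(uint16_t net_off, uint16_t nsh_len)
{
    struct hdr_offsets h = {
        .mac_header = (uint16_t)(net_off + nsh_len),
        .network_header = net_off,
    };

    return reset_mac_len(&h);
}
```

    With an 8-byte NSH header, mac_len_after_pull(64, 8) yields 65528, the
    value quoted in the trace, because -8 does not fit in an unsigned 16-bit
    field.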

    Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA@mail.gmail.com
    Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com
    Fixes: c411ed854584 ("nsh: add GSO support")
    Signed-off-by: Willem de Bruijn
    Acked-by: Jiri Benc
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • commit 02f51d45937f7bc7f4dee21e9f85b2d5eac37104 upstream.

    The autofs subsystem does not check that the "path" parameter is present
    for all cases where it is required when it is passed in via the "param"
    struct.

    In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD
    ioctl command.

    To solve it, modify the validate_dev_ioctl() function to check that a path has
    been provided for ioctl commands that require it.

    Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto.themaw.net
    Signed-off-by: Tomas Bortoli
    Signed-off-by: Ian Kent
    Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Tomas Bortoli
     
  • commit 32da12216e467dea70a09cd7094c30779ce0f9db upstream.

    In the zerocopy sendmsg() path, there are error checks to revert
    the zerocopy if we get any error code. syzkaller has discovered
    that tls_push_record can return -ECONNRESET, which is fatal, and
    happens after the point at which it is safe to revert the iter,
    as we've already passed the memory to do_tcp_sendpages.

    Previously this code could return -ENOMEM and we would want to
    revert the iter, but AFAIK this no longer returns ENOMEM after
    a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"),
    so we fail for all error codes.

    Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com
    Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
    Signed-off-by: Dave Watson
    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dave Watson
     
  • commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream.

    My recent fix for dns_resolver_preparse() printing very long strings was
    incomplete, as shown by syzbot which still managed to hit the
    WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:

    precision 50001 too large
    WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0

    The bug this time isn't just a printing bug, but also a logical error
    when multiple options ("#"-separated strings) are given in the key
    payload. Specifically, when separating an option string into name and
    value, if there is no value then the name is incorrectly considered to
    end at the end of the key payload, rather than the end of the current
    option. This bypasses validation of the option length, and also means
    that specifying multiple options is broken -- which presumably has gone
    unnoticed as there is currently only one valid option anyway.

    A similar problem also applied to option values, as the kstrtoul() when
    parsing the "dnserror" option will read past the end of the current
    option and into the next option.

    Fix these bugs by correctly computing the length of the option name and
    by copying the option value, null-terminated, into a temporary buffer.
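
    The two fixes can be sketched in userspace C as follows. This is an
    illustration under assumptions, not the patched kernel code:
    parse_one_option() and its parameters are hypothetical, the 16-byte
    temporary buffer is an arbitrary choice, and strtoul() stands in for
    kstrtoul().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse one option occupying [opt, opt_end). Returns 0 on success and -1
 * on malformed input. On success *name_len is the length of the option
 * name and, if a "=value" part is present, *value holds the number. */
static int parse_one_option(const char *opt, const char *opt_end,
                            size_t *name_len, int *has_value,
                            unsigned long *value)
{
    size_t opt_len = (size_t)(opt_end - opt);
    const char *eq;
    size_t val_len;
    char tmp[16];   /* hypothetical size for the temporary copy */
    char *end;

    if (opt_len == 0)
        return -1;

    eq = memchr(opt, '=', opt_len);
    *has_value = (eq != NULL);

    /* The name ends at '=' or at the end of THIS option, never at the
     * end of the whole payload. */
    *name_len = eq ? (size_t)(eq - opt) : opt_len;
    if (*name_len == 0)
        return -1;
    if (!eq)
        return 0;

    /* Copy the value, NUL-terminated, so the number parser cannot read
     * past the current option into the next one. */
    val_len = (size_t)(opt_end - (eq + 1));
    if (val_len == 0 || val_len >= sizeof(tmp))
        return -1;
    memcpy(tmp, eq + 1, val_len);
    tmp[val_len] = '\0';

    errno = 0;
    *value = strtoul(tmp, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;
    return 0;
}
```

    For the payload "dnserror=1#foo", the caller would pass the two ranges
    separately, so the value is parsed as "1" rather than "1#foo" and "foo"
    is seen as an option name of its own.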

    Reproducer for the WARN_ONCE() that syzbot hit:

    perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s

    Reproducer for "dnserror" option being parsed incorrectly (expected
    behavior is to fail when seeing the unknown option "foo", actual
    behavior was to read the dnserror value as "1#foo" and fail there):

    perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s

    Reported-by: syzbot
    Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
    Signed-off-by: Eric Biggers
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit fe10e398e860955bac4d28ec031b701d358465e4 upstream.

    ReiserFS prepares log messages into a 1024-byte buffer with no bounds
    checks. Long messages, such as the "unknown mount option" warning when
    userspace passes a crafted mount options string, overflow this buffer.
    This causes KASAN to report a global-out-of-bounds write.

    Fix it by truncating messages to the buffer size.
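
    As a rough userspace illustration of truncating to the buffer size, the
    sketch below formats into a fixed 1024-byte buffer with vsnprintf().
    The names msg_buf and prepare_message() are hypothetical, and the
    standard C formatter stands in for whatever bounded formatter the
    kernel patch uses.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MSG_BUF_SIZE 1024   /* same size as the buffer described above */

static char msg_buf[MSG_BUF_SIZE];

/* Format into the fixed buffer. vsnprintf() writes at most
 * sizeof(msg_buf) bytes including the terminating NUL, so an over-long
 * message is truncated instead of overflowing the buffer.
 * Returns the length of the stored (possibly truncated) message. */
static size_t prepare_message(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vsnprintf(msg_buf, sizeof(msg_buf), fmt, args);
    va_end(args);

    return strlen(msg_buf);
}
```

    A crafted 4000-character mount option then produces a stored message of
    at most 1023 characters plus the terminator.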

    Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com
    Signed-off-by: Eric Biggers
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit 11ff7288beb2b7da889a014aff0a7b80bf8efcf3 upstream.

    The ebtables evaluation loop expects targets to return
    positive values (jumps), or negative values (absolute verdicts).

    This is completely different from what xtables does.
    In xtables, targets are expected to return the standard netfilter
    verdicts, i.e. NF_DROP, NF_ACCEPT, etc.

    ebtables will consider these as jumps.

    Therefore reject any target found due to unspec fallback.
    v2: also reject watchers. ebtables ignores their return value, so
    a target that assumes skb ownership (and returns NF_STOLEN) causes
    use-after-free.

    The only watchers in the 'ebtables' front-end are log and nflog;
    both have AF_BRIDGE specific wrappers on kernel side.

    Reported-by: syzbot+2b43f681169a2a0d306a@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit 35a88a18d7ea58600e11590405bc93b08e16e7f5 upstream.

    Commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
    uses local_bh_disable()/enable(), because hv_pci_onchannelcallback() can
    also run in tasklet context as the channel event callback, so bottom halves
    should be disabled to prevent a race condition.

    With CONFIG_PROVE_LOCKING=y in the recent mainline, or old kernels that
    don't have commit f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs
    are disabled/enabled"), when the upper layer IRQ code calls
    hv_compose_msi_msg() with local IRQs disabled, we'll see a warning at the
    beginning of __local_bh_enable_ip():

    IRQs not enabled as expected
    WARNING: CPU: 0 PID: 408 at kernel/softirq.c:162 __local_bh_enable_ip

    The warning exposes an issue in de0aa7b2f97d: local_bh_enable() can
    potentially call do_softirq(), which is not supposed to run when local IRQs
    are disabled. Let's fix this by using local_irq_save()/restore() instead.

    Note: hv_pci_onchannelcallback() is not a hot path because it's only called
    when the PCI device is hot added and removed, which is infrequent.

    Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
    Signed-off-by: Dexuan Cui
    Signed-off-by: Lorenzo Pieralisi
    Signed-off-by: Bjorn Helgaas
    Reviewed-by: Haiyang Zhang
    Cc: stable@vger.kernel.org
    Cc: Stephen Hemminger
    Cc: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 upstream.

    When blk_queue_enter() waits for a queue to unfreeze, or unset the
    PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

    The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
    ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI
    device is resumed asynchronously, i.e. after un-freezing userspace tasks.

    So that commit exposed the bug as a regression in v4.15. A mysterious
    SIGBUS (or -EIO) sometimes happened during the time the device was being
    resumed. Most frequently, there was no kernel log message, and we saw Xorg
    or Xwayland killed by SIGBUS.[1]

    [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

    Without this fix, I get an IO error in this test:

    # dd if=/dev/sda of=/dev/null iflag=direct & \
    while killall -SIGUSR1 dd; do sleep 0.1; done & \
    echo mem > /sys/power/state ; \
    sleep 5; killall dd # stop after 5 seconds

    The interruptible wait was added to blk_queue_enter in
    commit 3ef28e83ab15 ("block: generic request_queue reference counting").
    Before then, the interruptible wait was only in blk-mq, but I don't think
    it could ever have been correct.

    Reviewed-by: Bart Van Assche
    Cc: stable@vger.kernel.org
    Signed-off-by: Alan Jenkins
    Signed-off-by: Jens Axboe
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Alan Jenkins
     
  • commit 3f6e6986045d47f87bd982910821b7ab9758487e upstream.

    Since commit 1bb88666775e ("mtd: nand: denali: handle timing parameters
    by setup_data_interface()"), denali_dt.c gets the clock rate from the
    clock driver. The driver expects the frequency of the bus interface
    clock, whereas the clock driver of SOCFPGA provides the core clock.
    Thus, the setup_data_interface() hook calculates timing parameters
    based on a wrong frequency.

    To make it work without relying on the clock driver, hard-code the clock
    frequency, 200MHz. This is fine for existing DT of UniPhier, and also
    fixes the issue of SOCFPGA because both platforms use 200 MHz for the
    bus interface clock.

    Fixes: 1bb88666775e ("mtd: nand: denali: handle timing parameters by setup_data_interface()")
    Cc: linux-stable #4.14+
    Reported-by: Philipp Rosenberger
    Suggested-by: Boris Brezillon
    Signed-off-by: Masahiro Yamada
    Tested-by: Richard Weinberger
    Signed-off-by: Boris Brezillon
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Masahiro Yamada
     
  • commit 2546da99212f22034aecf279da9c47cbfac6c981 upstream.

    The RX SGL in processing is already registered with the RX SGL tracking
    list to support proper cleanup. The cleanup code path uses the
    sg_num_bytes variable which must therefore be always initialized, even
    in the error code path.

    Signed-off-by: Stephan Mueller
    Reported-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com
    #syz test: https://github.com/google/kmsan.git master
    CC: #4.14
    Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
    Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management")
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Stephan Mueller
     
  • commit 5b9e886a4af97574ca3ce1147f35545da0e7afc7 upstream.

    A number of places rely on list_empty(&cs->wd_list), however the
    list_head does not get initialized. Do so upon registration, such that
    thereafter it is possible to rely on list_empty() correctly reflecting
    the list membership status.
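
    Why an uninitialized list head misleads list_empty() can be shown with
    a small userspace imitation of the kernel's circular list. The struct
    and helpers below are simplified re-implementations written for this
    example, not the kernel headers.

```c
#include <assert.h>
#include <string.h>

/* Minimal imitation of a circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized, empty list points back at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* "Empty" is defined as: the head's next pointer is the head itself.
 * A zero-filled head has next == NULL, so it is reported as non-empty. */
static int list_empty(const struct list_head *head)
{
    return head->next == head;
}
```

    A zero-filled head therefore looks like a list with members until
    init_list_head() has run, which is why the initialization has to happen
    at registration time.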

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Tested-by: Diego Viola
    Reviewed-by: Rafael J. Wysocki
    Cc: stable@vger.kernel.org
    Cc: len.brown@intel.com
    Cc: rjw@rjwysocki.net
    Cc: rui.zhang@intel.com
    Link: https://lkml.kernel.org/r/20180430100344.472662715@infradead.org
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 8d4068810d9926250dd2435719a080b889eb44c3 upstream.

    If there is IR in the raw kfifo when ir_raw_event_unregister() is called,
    then kthread_stop() causes ir_raw_event_thread to be scheduled, decode
    some scancodes and re-arm timer_keyup. The timer_keyup then fires when
    the rc device is long gone.

    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Young
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Sean Young
     
  • commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e upstream.

    Hub driver will try to disable a USB3 device twice at logical disconnect,
    racing with xhci_free_dev() callback from the first port disable.

    This can be triggered with "udisksctl power-off --block-device "
    or by writing "1" to the "remove" sysfs file for a USB3 device
    in 4.17-rc4.

    USB3 devices don't have a similar disabled link state as USB2 devices,
    and use a U3 suspended link state instead. In this state the port
    is still enabled and connected.

    hub_port_connect() first disconnects the device; later, when it notices
    that the device is still enabled (due to the U3 state), it tries to
    disable the port again (set to U3).

    The xhci_free_dev() called during device disable is async, so checking
    for existing xhci->devs[i] when setting link state to U3 the second time
    was successful, even if device was being freed.

    The regression was caused, and the whole issue revealed, by
    commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device"),
    which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returns.
    This causes a NULL pointer dereference the second time we try to set U3.

    Fix this by checking xhci->devs[i]->udev exists before setting link state.

    The original patch went to stable so this fix needs to be applied there as
    well.

    Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
    Cc:
    Reported-by: Jordan Glover
    Tested-by: Jordan Glover
    Signed-off-by: Mathias Nyman
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Mathias Nyman
     
  • commit dea39aca1d7aef1e2b95b07edeacf04cc8863a2e upstream.

    The skb size calculation in lan78xx_tx_bh races with start_xmit,
    which could lead to rare kernel oopses. So protect the whole skb walk with
    a spin lock. As a benefit we can unlink the skb directly.

    This patch was tested on Raspberry Pi 3B+

    Link: https://github.com/raspberrypi/linux/issues/2608
    Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
    Cc: stable
    Signed-off-by: Floris Bos
    Signed-off-by: Stefan Wahren
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefan Wahren
     
  • commit 9a98302de19991d51e067b88750585203b2a3ab6 upstream.

    Without this patch, firmware will not run properly on rtl8821ae, and it
    causes bad user experience. For example, bad connection performance with
    low rate, higher power consumption, and so on.

    rtl8821ae uses two kinds of firmware, for the normal and WoWlan cases, and
    each firmware has its own data buffer and size. The original code always
    overwrites the size of the normal firmware, rtlpriv->rtlhal.fwsize, and
    this mismatch causes a firmware checksum error, so the firmware can't start.

    In this situation, driver gives message "Firmware is not ready to run!".

    Fixes: fe89707f0afa ("rtlwifi: rtl8821ae: Simplify loading of WOWLAN firmware")
    Signed-off-by: Ping-Ke Shih
    Cc: Stable # 4.0+
    Reviewed-by: Larry Finger
    Signed-off-by: Kalle Valo
    Signed-off-by: Greg Kroah-Hartman

    Ping-Ke Shih
     
  • commit 12dfa2f68ab659636e092db13b5d17cf9aac82af upstream.

    When connecting to an AP, mac80211 asks the driver to enter and leave PS
    quickly, but driver deinit doesn't wait for delayed work to complete when
    entering PS, so the driver reinit procedure and the delayed work run
    simultaneously. This causes unpredictable kernel oopses or crashes like

    rtl8723be: error H2C cmd because of Fw download fail!!!
    WARNING: CPU: 3 PID: 159 at drivers/net/wireless/realtek/rtlwifi/
    rtl8723be/fw.c:227 rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be]
    CPU: 3 PID: 159 Comm: kworker/3:2 Tainted: G O 4.16.13-2-ARCH #1
    Hardware name: ASUSTeK COMPUTER INC. X556UF/X556UF, BIOS X556UF.406
    10/21/2016
    Workqueue: rtl8723be_pci rtl_c2hcmd_wq_callback [rtlwifi]
    RIP: 0010:rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be]
    RSP: 0018:ffffa6ab01e1bd70 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffffa26069071520 RCX: 0000000000000001
    RDX: 0000000080000001 RSI: ffffffff8be70e9c RDI: 00000000ffffffff
    RBP: 0000000000000000 R08: 0000000000000048 R09: 0000000000000348
    R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    R13: ffffa26069071520 R14: 0000000000000000 R15: ffffa2607d205f70
    FS: 0000000000000000(0000) GS:ffffa26081d80000(0000) knlGS:000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000443b39d3000 CR3: 000000037700a005 CR4: 00000000003606e0
    Call Trace:
    ? halbtc_send_bt_mp_operation.constprop.17+0xd5/0xe0 [btcoexist]
    ? ex_btc8723b1ant_bt_info_notify+0x3b8/0x820 [btcoexist]
    ? rtl_c2hcmd_launcher+0xab/0x110 [rtlwifi]
    ? process_one_work+0x1d1/0x3b0
    ? worker_thread+0x2b/0x3d0
    ? process_one_work+0x3b0/0x3b0
    ? kthread+0x112/0x130
    ? kthread_create_on_node+0x60/0x60
    ? ret_from_fork+0x35/0x40
    Code: 00 76 b4 e9 e2 fe ff ff 4c 89 ee 4c 89 e7 e8 56 22 86 ca e9 5e ...

    This patch ensures all delayed work is done before entering PS, as we
    expect, by using cancel_delayed_work_sync() instead. An exception is the
    delayed work ips_nic_off_wq, because the running task may be that work
    itself, so add a parameter ips_wq to the deinit function to handle this case.

    This issue is reported and fixed in below threads:
    https://github.com/lwfinger/rtlwifi_new/issues/367
    https://github.com/lwfinger/rtlwifi_new/issues/366

    Tested-by: Evgeny Kapun # 8723DE
    Tested-by: Shivam Kakkar # 8723BE on 4.18-rc1
    Signed-off-by: Ping-Ke Shih
    Fixes: cceb0a597320 ("rtlwifi: Add work queue for c2h cmd.")
    Cc: Stable # 4.11+
    Reviewed-by: Larry Finger
    Signed-off-by: Kalle Valo
    Signed-off-by: Greg Kroah-Hartman

    Ping-Ke Shih
     
  • commit 676bcfece19f83621e905aa55b5ed2d45cc4f2d3 upstream.

    t.qset_idx can be indirectly controlled by user-space, hence leading to
    a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl()
    warn: potential spectre issue 'adapter->msix_info'

    Fix this by sanitizing t.qset_idx before using it to index
    adapter->msix_info

    Notice that given that speculation windows are large, the policy is
    to kill the speculation on the first load and not worry if it can be
    completed with a dependent load/store [1].
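
    The idea of sanitizing an index without a conditional branch can be
    sketched in userspace C. In the kernel this is what array_index_nospec()
    is for; the helpers below are hypothetical re-implementations of a
    mask-based clamp and assume that right-shifting a negative signed long
    is an arithmetic shift (true for GCC and Clang) and that size is nonzero
    and below LONG_MAX.

```c
#include <assert.h>
#include <limits.h>

/* Returns all ones if index < size and zero otherwise, computed with
 * arithmetic only so there is no branch to mispredict.
 * Assumption: >> on a negative long replicates the sign bit. */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/* In-range indexes pass through unchanged; out-of-range indexes are
 * forced to 0, so even a speculative load stays inside the array. */
static unsigned long sanitize_index(unsigned long index, unsigned long size)
{
    return index & index_mask(index, size);
}
```

    The usual bounds check still has to reject out-of-range values on the
    architectural path; the mask only limits what a mispredicted branch can
    reach.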

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Cc: stable@vger.kernel.org
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     
  • [ Upstream commit e5ab564c9ebee77794842ca7d7476147b83d6a27 ]

    The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be
    used, and in fact in virtio_transport_common.c only 64 bit accessors are
    used. Using 32 bit accessors for 64 bit values breaks big endian systems.

    This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.
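
    The effect on a big-endian host can be simulated on any machine. The
    sketch below is illustrative only: all function names are hypothetical,
    and the big-endian load is modelled by assembling the bytes by hand
    rather than by calling the kernel's le32_to_cpu()/le64_to_cpu().

```c
#include <assert.h>
#include <stdint.h>

/* What a big-endian CPU sees when it loads 8 bytes as a native u64. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++)
        r = (r << 8) | p[i];
    return r;
}

static uint32_t swab32(uint32_t v)
{
    uint32_t r = 0;

    for (int i = 0; i < 4; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

static uint64_t swab64(uint64_t v)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* 32-bit accessor applied to a 64-bit little-endian field on a
 * big-endian host: the value is truncated first, then byte-swapped. */
static uint32_t cid_via_le32_on_be(const unsigned char *field)
{
    return swab32((uint32_t)load_be64(field));
}

/* 64-bit accessor on the same host: swap all eight bytes. */
static uint64_t cid_via_le64_on_be(const unsigned char *field)
{
    return swab64(load_be64(field));
}
```

    For a CID of 3 stored little-endian, the 64-bit path recovers 3 while
    the 32-bit path yields 0, since truncation keeps the bytes that hold
    the upper half of the value.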

    Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport")

    Signed-off-by: Claudio Imbrenda
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Claudio Imbrenda
     
  • [ Upstream commit b8f1f65882f07913157c44673af7ec0b308d03eb ]

    Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when
    we meet errors during ubuf allocation, the code does not check for
    NULL before calling sockfd_put(), which leads to a NULL pointer
    dereference. Fix this by checking the sock pointer first.

    Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support")
    Reported-by: Dan Carpenter
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jason Wang
     
  • [ Upstream commit 1236f22fbae15df3736ab4a984c64c0c6ee6254c ]

    If SACK is not enabled and the first cumulative ACK after the RTO
    retransmission covers more than the retransmitted skb, a spurious
    FRTO undo will trigger (assuming FRTO is enabled for that RTO).
    The reason is that any non-retransmitted segment acknowledged will
    set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
    no indication that it would have been delivered for real (the
    scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
    case so the check for that bit won't help like it does with SACK).
    Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
    in tcp_process_loss.

    We need to use a stricter condition for the non-SACK case and check
    that none of the cumulatively ACKed segments were retransmitted
    to prove that progress is due to original transmissions. Only then
    keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
    non-SACK case.

    (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
    to better indicate its purpose but to keep this change minimal, it
    will be done in another patch).

    Besides burstiness and congestion control violations, this problem
    can result in an RTO loop: when the loss recovery is prematurely
    undone, only new data will be transmitted (if available) and
    the next retransmission can occur only after a new RTO which in case
    of multiple losses (that are not for consecutive packets) requires
    one RTO per loss to recover.

    Signed-off-by: Ilpo Järvinen
    Tested-by: Neal Cardwell
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ilpo Järvinen
     
  • [ Upstream commit c860e997e9170a6d68f9d1e6e2cf61f572191aaf ]

    The Fast Open key could be stored in a different endianness depending on
    the CPU. Previously, hosts of different endianness in a server farm using
    the same key config (sysctl value) would produce different cookies.
    This patch fixes it by always storing the key as little endian, which
    keeps the same API for LE hosts.

    Reported-by: Daniele Iamartino
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yuchung Cheng
     
  • [ Upstream commit 977c7114ebda2e746a114840d3a875e0cdb826fb ]

    On receiving an incomplete message, the existing code stores the
    remaining length of the cloned skb in the early_eaten field instead of
    incrementing the value returned by __strp_recv. This defers invocation
    of sock_rfree for the current skb until the next invocation of
    __strp_recv, which returns early_eaten if early_eaten is non-zero.

    This behavior causes a stall when the current message occupies the very
    tail end of a massive skb, and strp_peek/need_bytes indicates that the
    remainder of the current message has yet to arrive on the socket. The
    TCP receive buffer is totally full, causing the TCP window to go to
    zero, so the remainder of the message will never arrive.

    Incrementing the value returned by __strp_recv by the amount otherwise
    stored in early_eaten prevents stalls of this nature.

    Signed-off-by: Doron Roberts-Kedes
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Doron Roberts-Kedes
     
  • [ Upstream commit b6cfffa7ad923c73f317ea50fd4ebcb3b4b6669c ]

    HW does not support Half-duplex mode in multi-queue
    scenario. Fix it by not advertising the Half-Duplex
    mode if multi-queue enabled.

    Signed-off-by: Bhadram Varka
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Bhadram Varka
     
  • [ Upstream commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5 ]

    When unplugging an r8152 adapter while the interface is UP, the NIC
    becomes unusable. usb->disconnect (aka rtl8152_disconnect) deletes
    napi. Then, rtl8152_disconnect calls unregister_netdev and that invokes
    netdev->ndo_stop (aka rtl8152_close). rtl8152_close tries to
    napi_disable, but the napi is already deleted by disconnect above. So
    the first while loop in napi_disable never finishes. This results in
    complete deadlock of the network layer as there is rtnl_mutex held by
    unregister_netdev.

    So avoid the call to napi_disable in rtl8152_close when the device is
    already gone.

    The other calls to usb_kill_urb, cancel_delayed_work_sync,
    netif_stop_queue etc. seem to be fine. The urb and netdev are not
    destroyed yet.

    Signed-off-by: Jiri Slaby
    Cc: linux-usb@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jiri Slaby
     
  • [ Upstream commit e7e197edd09c25774b4f12cab19f9d5462f240f4 ]

    This module exposes two USB configurations: a QMI+AT capable setup on
    USB config #1 and a MBIM capable setup on USB config #2.

    By default the kernel will choose the MBIM capable configuration as
    long as the cdc_mbim driver is available. This patch adds support for
    the QMI port in the secondary configuration.

    Signed-off-by: Aleksander Morgado
    Acked-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Aleksander Morgado
     
  • [ Upstream commit bb7858ba1102f82470a917e041fd23e6385c31be ]

    Memory size is limited in the kdump kernel environment. Allocating more
    MSI-X vectors (or queues) consumes a few tens of MB of memory, which might
    lead to kdump kernel failure.
    This patch adds changes to limit the number of MSI-X vectors in kdump
    kernel to minimum required value (i.e., 2 per engine).

    Fixes: fe56b9e6a ("qed: Add module with basic common support")
    Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Michal Kalderon
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sudarsana Reddy Kalluru
     
  • [ Upstream commit cc9b27cdf7bd3c86df73439758ac1564bc8f5bbe ]

    Use the correct size value while copying chassis/port id values.

    Fixes: 6ad8c632e ("qed: Add support for query/config dcbx.")
    Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Michal Kalderon
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sudarsana Reddy Kalluru
     
  • [ Upstream commit 538f8d00ba8bb417c4d9e76c61dee59d812d8287 ]

    By default, the driver incorrectly sets the eswitch mode to VEB (virtual
    Ethernet bridging).
    The VEB eswitch mode should be set only when SR-IOV is enabled; the
    default should be NONE. The patch incorporates this change.

    Fixes: 0fefbfbaa ("qed*: Management firmware - notifications and defaults")
    Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Michal Kalderon
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sudarsana Reddy Kalluru
     
  • [ Upstream commit 82a4e71b1565dea8387f54503e806cf374e779ec ]

    When ptp clock is not available for a PF (e.g., higher PFs in NPAR mode),
    get-tsinfo() callback should return the software timestamp capabilities
    instead of returning the error.

    Fixes: 4c55215c ("qede: Add driver support for PTP")
    Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Michal Kalderon
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sudarsana Reddy Kalluru
     
  • [ Upstream commit 8c43bd1706885ba1acfa88da02bc60a2ec16f68c ]

    Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
    need to fail if dev_match is not true. Currently, a packet to a given port
    can match a socket bound to device when it should not. In the VRF case,
    this causes the lookup to hit a VRF socket and not a global socket
    resulting in a response trying to go through the VRF when it should not.

    Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups")
    Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups")
    Reported-by: Lou Berger
    Diagnosed-by: Renato Westphal
    Tested-by: Renato Westphal
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ]

    After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
    are friends"), sungem owners reported the infamous "eth0: hw csum failure"
    message.

    CHECKSUM_COMPLETE has in fact never worked for this driver, but this
    was masked by the fact that upper stacks had to strip the FCS, and
    therefore skb->ip_summed was set back to CHECKSUM_NONE before
    my recent change.

    Driver configures a number of bytes to skip when the chip computes
    the checksum, and for some reason only half of the Ethernet header
    was skipped.

    Then a second problem is that we should strip the FCS by default,
    unless the driver is updated to eventually support NETIF_F_RXFCS in
    the future.

    Finally, a driver should check if NETIF_F_RXCSUM feature is enabled
    or not, so that the admin can turn off rx checksum if wanted.

    Many thanks to Andreas Schwab and Mathieu Malaterre for their
    help in debugging this issue.

    Signed-off-by: Eric Dumazet
    Reported-by: Meelis Roos
    Reported-by: Mathieu Malaterre
    Reported-by: Andreas Schwab
    Tested-by: Andreas Schwab
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 7e85dc8cb35abf16455f1511f0670b57c1a84608 ]

    When blackhole is used on top of a classful qdisc like hfsc, it breaks the
    qlen and backlog counters because packets disappear without notice.

    In HFSC non-zero qlen while all classes are inactive triggers warning:
    WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
    and schedules watchdog work endlessly.

    This patch returns __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS;
    this flag tells the upper layer that the packet is gone and isn't queued.
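
    The effect on a parent qdisc's accounting can be sketched in user-space
    C. The SKETCH_* constants mirror the roles of NET_XMIT_SUCCESS and
    __NET_XMIT_BYPASS, and their values are assumed for illustration; the
    structs and functions are hypothetical and much simpler than a real
    classful qdisc.

```c
/* Illustrative constants; values assumed, names are not the kernel's. */
#define SKETCH_XMIT_SUCCESS 0x00
#define SKETCH_XMIT_BYPASS  0x00020000

/* Hypothetical parent qdisc counters. */
struct parent_stats {
    unsigned int qlen;
};

/* Blackhole-style child: the packet is dropped, but no error is reported. */
static int blackhole_enqueue_sketch(void)
{
    return SKETCH_XMIT_SUCCESS | SKETCH_XMIT_BYPASS;
}

/* Parent enqueue: only a plain success means the child holds the packet. */
static int parent_enqueue_sketch(struct parent_stats *p)
{
    int ret = blackhole_enqueue_sketch();

    if (ret != SKETCH_XMIT_SUCCESS)
        return ret; /* packet is gone; do not count it as queued */

    p->qlen++;
    return ret;
}
```

    Without the bypass flag the child would return a plain success, the
    parent would increment qlen, and the counter would never drop back to
    zero because nothing can be dequeued.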

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
     
  • [ Upstream commit 945d015ee0c3095d2290e845565a23dedfd8027c ]

    We should put copy_skb in receive_queue only after
    a successful call to virtio_net_hdr_from_skb().
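
    The ordering rule can be sketched as follows. This is a user-space
    illustration only: struct buf, struct queue, build_header_sketch() and
    receive_one_sketch() are hypothetical, with build_header_sketch()
    standing in for a fallible step such as virtio_net_hdr_from_skb().

```c
#include <stddef.h>
#include <stdlib.h>

struct buf {
    struct buf *next;
    int payload;
};

struct queue {
    struct buf *head;
    size_t len;
};

static void queue_push(struct queue *q, struct buf *b)
{
    b->next = q->head;
    q->head = b;
    q->len++;
}

/* Hypothetical fallible step; fails for negative payloads. */
static int build_header_sketch(const struct buf *b)
{
    return b->payload < 0 ? -1 : 0;
}

static int receive_one_sketch(struct queue *q, int payload)
{
    struct buf *copy = malloc(sizeof(*copy));

    if (!copy)
        return -1;
    copy->next = NULL;
    copy->payload = payload;

    if (build_header_sketch(copy) != 0) {
        /* Not yet visible on the queue, so freeing it is safe. */
        free(copy);
        return -1;
    }

    /* Publish the buffer only after every fallible step has succeeded. */
    queue_push(q, copy);
    return 0;
}
```

    If the push happened before the fallible step, the error path would free
    a buffer that is still linked into the queue, which is the kind of
    use-after-free the report below shows when the queue is later purged.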

    syzbot report:

    BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
    BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
    BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
    Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553

    CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    __skb_unlink include/linux/skbuff.h:1843 [inline]
    __skb_dequeue include/linux/skbuff.h:1863 [inline]
    skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
    skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
    packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
    packet_release+0x630/0xd90 net/packet/af_packet.c:2991
    __sock_release+0xd7/0x260 net/socket.c:603
    sock_close+0x19/0x20 net/socket.c:1186
    __fput+0x35b/0x8b0 fs/file_table.c:209
    ____fput+0x15/0x20 fs/file_table.c:243
    task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x1b08/0x2750 kernel/exit.c:865
    do_group_exit+0x177/0x440 kernel/exit.c:968
    __do_sys_exit_group kernel/exit.c:979 [inline]
    __se_sys_exit_group kernel/exit.c:977 [inline]
    __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4448e9
    Code: Bad RIP value.
    RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
    RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
    RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
    R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
    R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000

    Allocated by task 4553:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
    skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
    tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
    deliver_skb net/core/dev.c:1925 [inline]
    deliver_ptype_list_skb net/core/dev.c:1940 [inline]
    __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
    netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
    netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
    tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
    tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
    tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
    call_write_iter include/linux/fs.h:1795 [inline]
    new_sync_write fs/read_write.c:474 [inline]
    __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
    vfs_write+0x1f8/0x560 fs/read_write.c:549
    ksys_write+0x101/0x260 fs/read_write.c:598
    __do_sys_write fs/read_write.c:610 [inline]
    __se_sys_write fs/read_write.c:607 [inline]
    __x64_sys_write+0x73/0xb0 fs/read_write.c:607
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 4553:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
    kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
    __kfree_skb net/core/skbuff.c:642 [inline]
    kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
    tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
    deliver_skb net/core/dev.c:1925 [inline]
    deliver_ptype_list_skb net/core/dev.c:1940 [inline]
    __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
    netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
    netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
    tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
    tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
    tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
    call_write_iter include/linux/fs.h:1795 [inline]
    new_sync_write fs/read_write.c:474 [inline]
    __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
    vfs_write+0x1f8/0x560 fs/read_write.c:549
    ksys_write+0x101/0x260 fs/read_write.c:598
    __do_sys_write fs/read_write.c:610 [inline]
    __se_sys_write fs/read_write.c:607 [inline]
    __x64_sys_write+0x73/0xb0 fs/read_write.c:607
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8801b044ecc0
    which belongs to the cache skbuff_head_cache of size 232
    The buggy address is located 0 bytes inside of
    232-byte region [ffff8801b044ecc0, ffff8801b044eda8)
    The buggy address belongs to the page:
    page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
    flags: 0x2fffc0000000100(slab)
    raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
    raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
    >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
    ^
    ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc

    Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 271f7ff5aa5a73488b7a9d8b84b5205fb5b2f7cc ]

    When using s/w buffer management, buffers are allocated and DMA mapped.
    When doing so on an arm64 platform, an offset correction is applied on
    the DMA address before storing it in an Rx descriptor. The issue is that
    this DMA address is then used later in the Rx path without removing the
    offset correction. Thus the DMA address is wrong, which can lead to
    various issues.

    This patch fixes this by removing the offset correction from the DMA
    address retrieved from the Rx descriptor before using it in the Rx path.
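
    A minimal sketch of the symmetry the fix restores, assuming the
    correction is added when the descriptor is filled. The offset value, the
    descriptor layout and both helpers are hypothetical and only illustrate
    the idea.

```c
#include <stdint.h>

/* Hypothetical offset correction; the real value is platform dependent. */
#define SKETCH_RX_OFFSET_CORRECTION 64u

/* Hypothetical Rx descriptor holding the corrected buffer address. */
struct rx_desc_sketch {
    uint64_t buf_phys_addr;
};

/* Store the mapped address with the correction applied. */
static void rx_desc_fill_sketch(struct rx_desc_sketch *d, uint64_t dma_addr)
{
    d->buf_phys_addr = dma_addr + SKETCH_RX_OFFSET_CORRECTION;
}

/* Recover the originally mapped address by undoing the correction. */
static uint64_t rx_desc_dma_addr_sketch(const struct rx_desc_sketch *d)
{
    return d->buf_phys_addr - SKETCH_RX_OFFSET_CORRECTION;
}
```

    Any later DMA operation on the buffer, such as a sync or unmap, would
    then use the value returned by rx_desc_dma_addr_sketch() rather than the
    raw descriptor field.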

    Fixes: 8d5047cf9ca2 ("net: mvneta: Convert to be 64 bits compatible")
    Signed-off-by: Antoine Tenart
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Antoine Tenart
     
  • [ Upstream commit d14fcb8d877caf1b8d6bd65d444bf62b21f2070c ]

    The driver allocates wrong size (due to wrong struct name) when issuing
    a query/set request to NIC's register.

    Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate")
    Signed-off-by: Shay Agroskin
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Shay Agroskin
     
  • [ Upstream commit f811980444ec59ad62f9e041adbb576a821132c7 ]

    Manipulating the MPFS requires eswitch manager capabilities.

    Fixes: eeb66cdb6826 ('net/mlx5: Separate between E-Switch and MPFS')
    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Eli Cohen
     
  • [ Upstream commit 603b7bcff824740500ddfa001d7a7168b0b38542 ]

    The terminating NUL character was not set correctly for the string
    containing the command length. This caused failures when reading the
    output of the command, because the parsed length was random. The fix is
    to initialize the output length string.
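
    The idea can be sketched in user-space C, assuming the length arrives as
    unterminated text of a known byte count. parse_len_sketch() and its
    buffer size are hypothetical; the kernel code copies from user space
    instead of using memcpy().

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse a decimal length from `count` bytes of unterminated text.
 * Returns 0 on success and stores the value in *out, -1 otherwise.
 */
static int parse_len_sketch(const char *src, size_t count, int *out)
{
    /* Zero-initialized, so the copied text is always NUL terminated. */
    char len_str[8] = {0};

    if (count > sizeof(len_str) - 1)
        return -1; /* leave room for the terminator */

    memcpy(len_str, src, count);
    return sscanf(len_str, "%d", out) == 1 ? 0 : -1;
}
```

    Without the initializer, the bytes after the copied text are whatever
    happened to be on the stack, so the parser could read extra digits and
    produce an arbitrary length.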

    Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
    Signed-off-by: Alex Vesker
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Alex Vesker
     
  • [ Upstream commit d412c31dae053bf30a1bc15582a9990df297a660 ]

    The command interface can work in two modes: Events and Polling.
    In the general case, each time we invoke a command, a work is
    queued to handle it.

    When working in events, the interrupt handler completes the
    command execution. On the other hand, when working in polling
    mode, the work itself completes it.

    Due to a bug in the work handler, a command could be completed by
    the interrupt handler while the work handler had not finished yet,
    causing the work handler to complete it a second time if the
    command interface mode was changed from Events to Polling after
    the interrupt handler was called.

    mlx5_unload_one()
    mlx5_stop_eqs()
    // Destroy the EQ before cmd EQ
    ...cmd_work_handler()
    write_doorbell()
    --> EVENT_TYPE_CMD
    mlx5_cmd_comp_handler() // First free
    free_ent(cmd, ent->idx)
    complete(&ent->done)

    mode = POLL;

    --> cmd_work_handler (continues)
    if (cmd->mode == POLL)
    mlx5_cmd_comp_handler() // Double free

    The solution is to store the cmd->mode before writing the doorbell.
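
    A single-threaded sketch of why the snapshot helps. Everything here is
    hypothetical: write_doorbell_sketch() simulates the whole race window
    (the event handler completes the command, then the mode flips to
    polling), and the two work handlers differ only in when they read the
    mode.

```c
#include <stdbool.h>

enum cmd_mode_sketch { SKETCH_MODE_EVENTS, SKETCH_MODE_POLLING };

struct cmd_sketch {
    enum cmd_mode_sketch mode;
    int completions; /* how many times the command was completed */
};

static void complete_cmd_sketch(struct cmd_sketch *c)
{
    c->completions++;
}

/* Simulated race window that opens once the doorbell is written. */
static void write_doorbell_sketch(struct cmd_sketch *c)
{
    if (c->mode == SKETCH_MODE_EVENTS)
        complete_cmd_sketch(c);   /* completion from the event handler */
    c->mode = SKETCH_MODE_POLLING; /* mode changed during teardown */
}

/* Buggy ordering: the mode is read after the doorbell. */
static void work_handler_buggy(struct cmd_sketch *c)
{
    write_doorbell_sketch(c);
    if (c->mode == SKETCH_MODE_POLLING)
        complete_cmd_sketch(c);   /* second completion */
}

/* Fixed ordering: the mode is stored before the doorbell. */
static void work_handler_fixed(struct cmd_sketch *c)
{
    bool poll = (c->mode == SKETCH_MODE_POLLING);

    write_doorbell_sketch(c);
    if (poll)
        complete_cmd_sketch(c);
}
```

    Starting in events mode, the buggy handler completes the command twice
    while the fixed handler completes it once, because its decision is based
    on the mode that was in effect when the command was issued.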

    Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
    Signed-off-by: Alex Vesker
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Alex Vesker