31 Aug, 2022

40 commits

  • commit 7df548840c496b0141fb2404b889c346380c2b22 upstream.

    Older Intel CPUs that are not in the affected processor list for MMIO
    Stale Data vulnerabilities currently report "Not affected" in sysfs,
    which may not be correct. Vulnerability status for these older CPUs is
    unknown.

    Add known-not-affected CPUs to the whitelist. Report "unknown"
    mitigation status for CPUs that are in neither the blacklist nor the
    whitelist and that also don't enumerate the MSR ARCH_CAPABILITIES bits
    reflecting hardware immunity to MMIO Stale Data vulnerabilities.

    Mitigation is not deployed when the status is unknown.

    [ bp: Massage, fixup. ]

    Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
    Suggested-by: Andrew Cooper
    Suggested-by: Tony Luck
    Signed-off-by: Pawan Gupta
    Signed-off-by: Borislav Petkov
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Pawan Gupta
     
  • commit fc2e426b1161761561624ebd43ce8c8d2fa058da upstream.

    When the ORC unwinder encounters an ftrace trampoline, it uses the
    address of ftrace_{regs_}call to find the ORC entry, which places the
    next frame at sp+176.

    If an IRQ hits at the sub $0xa8,%rsp instruction, the next frame should
    be at sp+8 instead of sp+176. The unwinder then skips the correct frame
    and throws warnings such as "wrong direction" or "can't access
    registers", depending on the content of the incorrect frame address.

    By adding the offset *ip - ops->trampoline* to the base address
    ftrace_{regs_}caller, we get the correct address for the ORC entry lookup.

    Also rename "caller" to "tramp_addr" so that the variable name matches
    its content.

    [ mingo: Clarified the changelog a bit. ]

    Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
    Signed-off-by: Chen Zhongjin
    Signed-off-by: Ingo Molnar
    Reviewed-by: Steven Rostedt (Google)
    Cc:
    Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Chen Zhongjin
     
  • commit 32ba156df1b1c8804a4e5be5339616945eafea22 upstream.

    On platforms with Arch LBR, the HW raw branch type encoding may leak
    to the perf tool when the SAVE_TYPE option is not set.

    In intel_pmu_store_lbr(), the HW raw branch type is stored in
    lbr_entries[].type. If the SAVE_TYPE option is set, the
    lbr_entries[].type is converted into the generic PERF_BR_* type
    in intel_pmu_lbr_filter() and exposed to the user tools.
    But if the SAVE_TYPE option is NOT set by the user, the current perf
    kernel doesn't clear the field, so the HW raw branch type leaks.

    There are two solutions to fix the issue for the Arch LBR.
    One is to clear the field if the SAVE_TYPE option is NOT set.
    The other solution is to unconditionally convert the branch type and
    expose the generic type to the user tools.

    The latter is implemented here, because
    - The branch type is valuable information. I don't see a case where
    you would not benefit from the branch type. (Stephane Eranian)
    - Not having the branch type DOES NOT save any space in the
    branch record (Stephane Eranian)
    - The Arch LBR HW can retrieve the common branch types from
    LBR_INFO. This doesn't require high-overhead software disassembly.

    Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
    Reported-by: Stephane Eranian
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Kan Liang
     
  • commit 9ea0106a7a3d8116860712e3f17cd52ce99f6707 upstream.

    In btrfs_get_dev_args_from_path(), btrfs_get_bdev_and_sb() can fail if
    the path is invalid. In this case, btrfs_get_dev_args_from_path()
    returns directly without freeing the previously allocated args->uuid
    and args->fsid, which causes a memory leak.

    To fix these possible leaks, call btrfs_put_dev_args_from_path() to
    clean up the memory when btrfs_get_bdev_and_sb() fails.

    Reported-by: TOTE Robot
    Fixes: faa775c41d655 ("btrfs: add a btrfs_get_dev_args_from_path helper")
    CC: stable@vger.kernel.org # 5.16
    Reviewed-by: Boris Burkov
    Signed-off-by: Zixuan Fu
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Zixuan Fu
     
  • commit b51111271b0352aa596c5ae8faf06939e91b3b68 upstream.

    For a filesystem whose btrfs read-only property is set to true, all
    write operations, including xattr changes, should be denied. However,
    security xattrs can still be changed even when the btrfs ro property is true.

    This happens because xattr_permission() does not have any restrictions
    on security.*, system.* and in some cases trusted.* from VFS and
    the decision is left to the underlying filesystem. See comments in
    xattr_permission() for more details.

    This patch checks whether the root is read-only before performing the
    set-xattr operation.

    Testcase:

    DEV=/dev/vdb
    MNT=/mnt

    mkfs.btrfs -f $DEV
    mount $DEV $MNT
    echo "file one" > $MNT/f1

    setfattr -n "security.one" -v 2 $MNT/f1
    btrfs property set $MNT ro true

    setfattr -n "security.one" -v 1 $MNT/f1

    umount $MNT

    CC: stable@vger.kernel.org # 4.9+
    Reviewed-by: Qu Wenruo
    Reviewed-by: Filipe Manana
    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Goldwyn Rodrigues
     
  • commit f2c3bec215694fb8bc0ef5010f2a758d1906fc2d upstream.

    If the replace target device reappears after the suspended replace is
    cancelled, it blocks the mount operation because the matching replace
    item cannot be found in the metadata, as shown below:

    BTRFS error (device sda5): replace devid present without an active replace item

    To overcome this situation, the user can run the command

    btrfs device scan --forget

    and then retry the mount. To avoid hitting the issue again, the
    superblock on the devid=0 device must also be wiped:

    wipefs -a device-path-to-devid=0

    This patch adds an informational message when this situation occurs.

    Reported-by: Samuel Greiner
    Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
    CC: stable@vger.kernel.org # 5.0+
    Signed-off-by: Anand Jain
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     
  • commit 59a3991984dbc1fc47e5651a265c5200bd85464e upstream.

    If the filesystem is mounted with a replace operation in the suspended
    state and we try to cancel the suspended replace operation, we hit the
    assert. The assert was introduced by commit fe97e2e173af ("btrfs:
    dev-replace: replace's scrub must not be running in suspended state")
    and is not actually required, so just remove it.

    $ mount /dev/sda5 /btrfs

    BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
    BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'

    $ mount -o degraded /dev/sda5 /btrfs canceled

    Fixes: fe97e2e173af ("btrfs: dev-replace: replace's scrub must not be running in suspended state")
    CC: stable@vger.kernel.org # 5.0+
    Signed-off-by: Anand Jain
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     
  • commit 47bf225a8d2cccb15f7e8d4a1ed9b757dd86afd7 upstream.

    In btrfs_del_root_ref(), if btrfs_search_slot() returns an error, we end
    up returning from the function with a value of 0 (success). This happens
    because the function returns the value stored in the variable 'err',
    which is 0, while the error value we got from btrfs_search_slot() is
    stored in the 'ret' variable.

    So fix it by setting 'err' to the error value.

    Fixes: 8289ed9f93bef2 ("btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling")
    CC: stable@vger.kernel.org # 5.16+
    Reviewed-by: Qu Wenruo
    Signed-off-by: Filipe Manana
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • [ Upstream commit a3a57bf07de23fe1ff779e0fdf710aa581c3ff73 ]

    This is a follow-up to the discussion in [0]. It seems to me that
    at least the IP version used on Amlogic SoCs sometimes has a problem
    if register MAC_CTRL_REG is written whilst the chip is still processing
    a previous write. But that's just a guess.
    Adding a delay between two writes to this register helps, but we can
    also simply omit the offending second write. This patch uses the second
    approach and is based on a suggestion from Qi Duan.
    A benefit of this approach is that it saves a few register writes, also
    on unaffected chip versions.

    [0] https://www.spinics.net/lists/netdev/msg831526.html

    Fixes: bfab27a146ed ("stmmac: add the experimental PCI support")
    Suggested-by: Qi Duan
    Suggested-by: Jerome Brunet
    Signed-off-by: Heiner Kallweit
    Link: https://lore.kernel.org/r/e99857ce-bd90-5093-ca8c-8cd480b5a0a2@gmail.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Heiner Kallweit
     
  • [ Upstream commit 19058be7c48ceb3e60fa3948e24da1059bd68ee4 ]

    Assign a random MAC address to the VF interface station
    address if it boots with a zero MAC address, in order to match
    similar behavior seen in other VF drivers. Handle the errors
    returned when older firmware does not allow the VF to set its own
    station address.

    Newer firmware will allow the VF to set the station MAC address
    if it hasn't already been set administratively through the PF.
    Setting it will also be allowed if the VF has trust.

    Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
    Signed-off-by: R Mohamed Shah
    Signed-off-by: Shannon Nelson
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    R Mohamed Shah
     
  • [ Upstream commit 0fc4dd452d6c14828eed6369155c75c0ac15bab3 ]

    When looping on FW update tests, we occasionally see the
    FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
    waiting for the FW activate step to finish inside the FW. The
    firmware complains that the done bit is set when a new
    dev_cmd is going to be processed.

    Clearing the cmd registers and doorbell before exiting the
    wait-for-done loop, and clearing the done bit before the sleep,
    prevents this from occurring.

    Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
    Signed-off-by: Shannon Nelson
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Shannon Nelson
     
  • [ Upstream commit 9cb9dadb8f45c67e4310e002c2f221b70312b293 ]

    Heavy testing uncovered a case where a link flap happens just
    before a firmware Recovery event and the driver gets stuck in the
    BROKEN state. This comes from the driver getting interrupted by a FW
    generation change when coming back up from the link flap, and the call
    to ionic_start_queues() in ionic_link_status_check() fails. This can be
    addressed by having the fw_up code clear the BROKEN bit if seen, rather
    than waiting for a user to manually force the interface down and then
    back up.

    Fixes: 9e8eaf8427b6 ("ionic: stop watchdog when in broken state")
    Signed-off-by: Shannon Nelson
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Shannon Nelson
     
  • [ Upstream commit 2624d95972dbebe5f226361bfc51a83bdb68c93b ]

    Widen the coverage of the queue_lock to ensure that the lif init
    and lif deinit actions are protected. This addresses a hang
    seen when a Tx Timeout action was attempted at the same time
    as a FW Reset was started.

    Signed-off-by: Shannon Nelson
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Shannon Nelson
     
  • [ Upstream commit b0f571ecd7943423c25947439045f0d352ca3dbf ]

    Fix three bugs in the rxrpc's sendmsg implementation:

    (1) rxrpc_new_client_call() should release the socket lock when returning
    an error from rxrpc_get_call_slot().

    (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
    held in the event that we're interrupted by a signal whilst waiting
    for tx space on the socket or relocking the call mutex afterwards.

    Fix this by: (a) moving the unlock/lock of the call mutex up to
    rxrpc_send_data() such that the lock is not held around all of
    rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
    whether we returned with the lock dropped. Note that this means
    recvmsg() will not block on this call whilst we're waiting.

    (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
    to go and recheck the state of the tx_pending buffer and the
    tx_total_len check in case we raced with another sendmsg() on the same
    call.

    Thinking on this some more, it might make sense to have different locks for
    sendmsg() and recvmsg(). There's probably no need to make recvmsg() wait
    for sendmsg(). It does mean that recvmsg() can return MSG_EOR indicating
    that a call is dead before a sendmsg() to that call returns - but that can
    currently happen anyway.

    Without fix (2), something like the following can be induced:

    WARNING: bad unlock balance detected!
    5.16.0-rc6-syzkaller #0 Not tainted
    -------------------------------------
    syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
    [] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by syz-executor011/3597.
    ...
    Call Trace:

    __dump_stack lib/dump_stack.c:88 [inline]
    dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
    print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
    __lock_release kernel/locking/lockdep.c:5306 [inline]
    lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
    __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
    rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
    rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
    sock_sendmsg_nosec net/socket.c:704 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:724
    ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
    ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
    __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

    [Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

    Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
    Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne
    Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
    cc: Hawkins Jiawei
    cc: Khalid Masum
    cc: Dan Carpenter
    cc: linux-afs@lists.infradead.org
    Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    David Howells
     
  • [ Upstream commit bcf3a156429306070afbfda5544f2b492d25e75b ]

    It was not possible to create a 1-tuple flow director
    rule for the IPv6 flow type. This was caused by checking
    the source IP address when validating the user-provided
    destination IP address.

    Fix this by replacing ip6src with the correct ip6dst address
    in the destination IP address validation for the IPv6 flow type.

    Fixes: efca91e89b67 ("i40e: Add flow director support for IPv6")
    Signed-off-by: Sylwester Dziedziuch
    Tested-by: Gurucharan (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen
    Signed-off-by: Sasha Levin

    Sylwester Dziedziuch
     
  • [ Upstream commit 25d7a5f5a6bb15a2dae0a3f39ea5dda215024726 ]

    ixgbe_ptp_start_cyclecounter() is intended to be called whenever the
    cyclecounter parameters need to be changed.

    Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x
    devices"), this function has cleared the SYSTIME registers and reset the
    TSAUXC DISABLE_SYSTIME bit.

    While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
    them during ixgbe_ptp_start_cyclecounter. This function may be called
    during both reset and link status change. When link changes, the SYSTIME
    counter is still operating normally, but the cyclecounter should be updated
    to account for the possibly changed parameters.

    Clearing SYSTIME when link changes causes the timecounter to jump because
    the cycle counter now reads zero.

    Extract the SYSTIME initialization out to a new function and call this
    during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
    an unnecessary reset of the current time.

    This also restores the original SYSTIME clearing that occurred during
    ixgbe_ptp_reset before the commit above.

    Reported-by: Steve Payne
    Reported-by: Ilya Evenbach
    Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
    Signed-off-by: Jacob Keller
    Tested-by: Gurucharan (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen
    Signed-off-by: Sasha Levin

    Jacob Keller
     
  • [ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ]

    While reading sysctl_somaxconn, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its reader.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit a5612ca10d1aa05624ebe72633e0c8c792970833 ]

    While reading sysctl_devconf_inherit_init_net, it can be changed
    concurrently. Thus, we need to add READ_ONCE() to its readers.

    Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit af67508ea6cbf0e4ea27f8120056fa2efce127dd ]

    While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
    concurrently. Thus, we need to add READ_ONCE() to its readers.

    Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit fa45d484c52c73f79db2c23b0cdfc6c6455093ad ]

    While reading netdev_budget_usecs, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its reader.

    Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit 657b991afb89d25fe6c4783b1b75a8ad4563670d ]

    While reading sysctl_max_skb_frags, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit f70cad1085d1e01d3ec73c1078405f906237feee ]

    We want to revert the skb TX cache, but MPTCP is currently
    using it unconditionally.

    Rework the MPTCP tx code so that tcp_tx_skb_cache is no longer
    needed: do the whole coalescing check, skb allocation, and
    skb initialization/update inside mptcp_sendmsg_frag(), much
    like the current TCP code.

    Reviewed-by: Mat Martineau
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Paolo Abeni
     
  • [ Upstream commit 04d8825c30b718781197c8f07b1915a11bfb8685 ]

    The tcp_skb_entail() helper is actually skb_entail(), renamed
    to provide proper scope.

    The two helpers will be used by the next patch.

    RFC -> v1:
    - rename skb_entail to tcp_skb_entail (Eric)

    Acked-by: Mat Martineau
    Signed-off-by: Paolo Abeni
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Paolo Abeni
     
  • [ Upstream commit 2e0c42374ee32e72948559d2ae2f7ba3dc6b977c ]

    While reading netdev_budget, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its reader.

    Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit e59ef36f0795696ab229569c153936bfd068d21c ]

    While reading sysctl_net_busy_read, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its reader.

    Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit c42b7cddea47503411bfb5f2f93a4154aaffa2d9 ]

    While reading sysctl_net_busy_poll, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its reader.

    Fixes: 060212928670 ("net: add low latency socket poll")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit d2154b0afa73c0159b2856f875c6b4fe7cf6a95e ]

    While reading sysctl_tstamp_allow_data, it can be changed
    concurrently. Thus, we need to add READ_ONCE() to its reader.

    Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit 7de6d09f51917c829af2b835aba8bb5040f8e86a ]

    While reading sysctl_optmem_max, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit 6bae8ceb90ba76cdba39496db936164fa672b9be ]

    While reading rs->interval and rs->burst, they can be changed
    concurrently via sysctl (e.g. net_ratelimit_state). Thus, we
    need to add READ_ONCE() to their readers.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit 61adf447e38664447526698872e21c04623afb8e ]

    While reading netdev_tstamp_prequeue, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit 5dcd08cd19912892586c6082d56718333e2d19db ]

    While reading netdev_max_backlog, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    While at it, we remove the unnecessary spaces in the doc.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit bf955b5ab8f6f7b0632cdef8e36b14e4f6e77829 ]

    While reading weight_p, it can be changed concurrently. Thus, we need
    to add READ_ONCE() to its reader.

    Also, dev_[rt]x_weight can be read and written at the same time, so we
    need to use READ_ONCE() and WRITE_ONCE() for their accesses. Moreover, to
    use the same weight_p while changing dev_[rt]x_weight, we add a mutex
    in proc_do_dev_weight().

    Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality")
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit 1227c1771dd2ad44318aa3ab9e3a293b3f34ff2a ]

    While reading sysctl_[rw]mem_(max|default), they can be changed
    concurrently. Thus, we need to add READ_ONCE() to their readers.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit 9afb4b27349a499483ae0134282cefd0c90f480f ]

    To clear the flow table on flow table free, the following sequence
    normally happens in order:

    1) gc_step work is stopped to disable any further stats/del requests.
    2) All flow table entries are set to teardown state.
    3) Run gc_step which will queue HW del work for each flow table entry.
    4) Waiting for the above del work to finish (flush).
    5) Run gc_step again, deleting all entries from the flow table.
    6) Flow table is freed.

    But if a flow table entry already has pending HW stats or HW add work,
    step 3 will not queue HW del work (it will be skipped), step 4 will wait
    for the pending add/stats to finish, and step 5 will queue HW del work,
    which might execute after the flow table is freed.

    To fix the above, this patch flushes the pending work, then it sets the
    teardown flag to all flows in the flowtable and it forces a garbage
    collector run to queue work to remove the flows from hardware, then it
    flushes this new pending work and (finally) it forces another garbage
    collector run to remove the entry from the software flowtable.

    Stack trace:
    [47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
    [47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
    [47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
    [47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
    [47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
    [47773.889727] Call Trace:
    [47773.890214] dump_stack+0xbb/0x107
    [47773.890818] print_address_description.constprop.0+0x18/0x140
    [47773.892990] kasan_report.cold+0x7c/0xd8
    [47773.894459] kasan_check_range+0x145/0x1a0
    [47773.895174] down_read+0x99/0x460
    [47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
    [47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
    [47773.913372] process_one_work+0x8ac/0x14e0
    [47773.921325]
    [47773.921325] Allocated by task 592159:
    [47773.922031] kasan_save_stack+0x1b/0x40
    [47773.922730] __kasan_kmalloc+0x7a/0x90
    [47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
    [47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct]
    [47773.925207] tcf_action_init_1+0x45b/0x700
    [47773.925987] tcf_action_init+0x453/0x6b0
    [47773.926692] tcf_exts_validate+0x3d0/0x600
    [47773.927419] fl_change+0x757/0x4a51 [cls_flower]
    [47773.928227] tc_new_tfilter+0x89a/0x2070
    [47773.936652]
    [47773.936652] Freed by task 543704:
    [47773.937303] kasan_save_stack+0x1b/0x40
    [47773.938039] kasan_set_track+0x1c/0x30
    [47773.938731] kasan_set_free_info+0x20/0x30
    [47773.939467] __kasan_slab_free+0xe7/0x120
    [47773.940194] slab_free_freelist_hook+0x86/0x190
    [47773.941038] kfree+0xce/0x3a0
    [47773.941644] tcf_ct_flow_table_cleanup_work

    Original patch description and stack trace by Paul Blakey.

    Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
    Reported-by: Paul Blakey
    Tested-by: Paul Blakey
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • [ Upstream commit 759eebbcfafcefa23b59e912396306543764bd3c ]

    Expose nf_flow_table_gc_run() to force a garbage collector run from the
    offload infrastructure.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • [ Upstream commit e02f0d3970404bfea385b6edb86f2d936db0ea2b ]

    Update nft_data_init() to report EINVAL if the chain is already bound.

    Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
    Reported-by: Gwangun Jung
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • [ Upstream commit f323ef3a0d49e147365284bc1f02212e617b7f09 ]

    Extend struct nft_data_desc to add a flag field that specifies whether
    nft_data_init() is being called for set element data.

    Use it to disallow jumps to implicit chains from set elements; only
    jumps to chains via the immediate expression are allowed.

    Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • [ Upstream commit 341b6941608762d8235f3fd1e45e4d7114ed8c2c ]

    Instead of parsing the data and then validating that type and length
    are correct, pass a description of the expected data so it can be
    validated upfront, bailing out earlier before parsing it.

    This patch adds a new .size field to specify the maximum size of the
    data area. The .len field is optional and is used as an input/output
    field: on input, it provides the specific length of the expected data.
    If the .len field is not specified, the length obtained from the
    netlink attribute is stored in it. This is required by cmp, bitwise,
    range and immediate, which provide no netlink attribute that describes
    the data length. The immediate expression uses the destination register
    type to infer the expected data type.

    Relying on open-coded validation of the expected data might lead to
    subtle bugs as described in 7e6bc1f6cabc ("netfilter: nf_tables:
    stricter validation of element data").

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • [ Upstream commit 00bd435208e5201eb935d273052930bd3b272b6f ]

    Replace two labels (`err1` and `err2`) with more informative ones.

    Signed-off-by: Jeremy Sowden
    Signed-off-by: Florian Westphal
    Signed-off-by: Sasha Levin

    Jeremy Sowden
     
  • [ Upstream commit 23f68d462984bfda47c7bf663dca347e8e3df549 ]

    Allow up to 16-byte comparisons with a new cmp fast version. Use two
    64-bit words and calculate the mask representing the bits to be
    compared. Make sure the comparison is 64-bit aligned and avoid
    out-of-bounds memory access on registers.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso