19 Aug, 2020

7 commits

  • commit 7f897acbe5d57995438c831670b7c400e9c0dc00 upstream.

    Since the patch [1], building the kernel using a toolchain built with
    binutils 2.33.1 prevents booting a sh4 system under Qemu. Apply the patch
    provided by Alan Modra [2] that fixes the alignment of rodata.

    [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
    [2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.html

    Signed-off-by: Romain Naour
    Signed-off-by: Andrew Morton
    Cc: Alan Modra
    Cc: Bin Meng
    Cc: Chen Zhou
    Cc: Geert Uytterhoeven
    Cc: John Paul Adrian Glaubitz
    Cc: Krzysztof Kozlowski
    Cc: Kuninori Morimoto
    Cc: Rich Felker
    Cc: Sam Ravnborg
    Cc: Yoshinori Sato
    Cc: Arnd Bergmann
    Cc:
    Link: https://marc.info/?l=linux-sh&m=158429470221261
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Romain Naour
     
  • [ Upstream commit 62ffc589abb176821662efc4525ee4ac0b9c3894 ]

    Refactor the fastreuse update code in inet_csk_get_port into a small
    helper function that can be called from other places.

    Acked-by: Matthieu Baerts
    Signed-off-by: Tim Froidcoeur
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tim Froidcoeur
     
  • [ Upstream commit f19008e676366c44e9241af57f331b6c6edf9552 ]

    When TFO keys are read back on big endian systems either via the global
    sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values
    don't match what was written.

    For example, on s390x:

    # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key
    # cat /proc/sys/net/ipv4/tcp_fastopen_key
    02000000-01000000-04000000-03000000

    Instead of:

    # cat /proc/sys/net/ipv4/tcp_fastopen_key
    00000001-00000002-00000003-00000004

    Fix this by converting to the correct endianness on read. This was
    reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net
    selftest on s390x, which depends on the read value matching what was
    written. I've confirmed that the test now passes on big and little endian
    systems.
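
    The mismatch can be illustrated with a minimal user-space sketch. It
    assumes the key is kept in a fixed little-endian byte layout; the helper
    names get_le64(), put_le64() and key_round_trips() are hypothetical and
    are not the kernel's accessors. Encoding with the inverse of the decode
    step reproduces the written bytes on any host byte order:

```c
#include <stdint.h>
#include <stddef.h>

/* Decode 8 bytes as a little-endian 64-bit integer, independent of host byte order. */
static uint64_t get_le64(const unsigned char *p)
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Encode a 64-bit integer as 8 little-endian bytes: the inverse of get_le64(). */
static void put_le64(uint64_t v, unsigned char *p)
{
    for (size_t i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Returns 1 if writing a key and reading it back reproduces the input bytes. */
static int key_round_trips(const unsigned char in[8])
{
    unsigned char out[8];

    put_le64(get_le64(in), out);
    for (size_t i = 0; i < 8; i++)
        if (in[i] != out[i])
            return 0;
    return 1;
}
```

    Copying the decoded integer out in native byte order instead of calling
    put_le64() would only round-trip on little-endian hosts, which matches
    the symptom described above.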

    Signed-off-by: Jason Baron
    Fixes: 438ac88009bc ("net: fastopen: robustness and endianness fixes for SipHash")
    Cc: Ard Biesheuvel
    Cc: Eric Dumazet
    Reported-and-tested-by: Colin Ian King
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jason Baron
     
  • [ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

    YangYuxi is reporting that connection reuse
    is causing a one-second delay when a SYN hits
    an existing connection in TIME_WAIT state.
    Such a delay was added to give time to expire
    both the IPVS connection and the corresponding
    conntrack. This was considered a rare case
    at that time but it is causing problems for
    some environments such as Kubernetes.

    As nf_conntrack_tcp_packet() can decide to
    release the conntrack in TIME_WAIT state and
    to replace it with a fresh NEW conntrack, we
    can use this to allow rescheduling just by
    tuning our check: if the conntrack is
    confirmed we cannot schedule it to a different
    real server and the one-second delay still
    applies, but if a new conntrack was created,
    we are free to select a new real server without
    any delays.

    YangYuxi lists some of the problem reports:

    - One second connection delay in masquerading mode:
    https://marc.info/?t=151683118100004&r=1&w=2

    - IPVS low throughput #70747
    https://github.com/kubernetes/kubernetes/issues/70747

    - Apache Bench can fill up ipvs service proxy in seconds #544
    https://github.com/cloudnativelabs/kube-router/issues/544

    - Additional 1s latency in `host -> service IP -> pod`
    https://github.com/kubernetes/kubernetes/issues/90854

    Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
    Co-developed-by: YangYuxi
    Signed-off-by: YangYuxi
    Signed-off-by: Julian Anastasov
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Julian Anastasov
     
  • [ Upstream commit 47e33c05f9f07cac3de833e531bcac9ae052c7ca ]

    When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced it had the wrong
    direction flag set. While this isn't a big deal as nothing currently
    enforces these bits in the kernel, it should be defined correctly. Fix
    the define and provide support for the old command until it is no longer
    needed for backward compatibility.
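
    A minimal sketch of the direction bits, assuming the Linux <linux/ioctl.h>
    encoding macros. The DEMO_* names, the magic character and the command
    number are chosen for illustration only and are not the seccomp UAPI
    definitions. Here the write direction (userspace passes the id in) is
    treated as the correct one and the read direction as the historical one:

```c
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical demo command: magic and number are illustrative only. */
#define DEMO_MAGIC            '!'
#define DEMO_ID_VALID         _IOW(DEMO_MAGIC, 2, uint64_t)  /* userspace writes the id */
#define DEMO_ID_VALID_OLD_DIR _IOR(DEMO_MAGIC, 2, uint64_t)  /* historical direction */

/* Accept both encodings so binaries built against the old define keep working. */
static int is_id_valid_cmd(unsigned int cmd)
{
    return cmd == DEMO_ID_VALID || cmd == DEMO_ID_VALID_OLD_DIR;
}
```

    The two values differ only in the direction field, so a handler that
    wants to stay backward compatible has to match both of them explicitly.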

    Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
    Signed-off-by: Kees Cook
    Signed-off-by: Sasha Levin

    Kees Cook
     
  • [ Upstream commit 7f3d176f5f7e3f0477bf82df0f600fcddcdcc4e4 ]

    Require that the TCG_PCR_EVENT2.digests.count value strictly matches the
    value of TCG_EfiSpecIdEvent.numberOfAlgorithms in the event field of the
    TCG_PCClientPCREvent event log header. Also require that
    TCG_EfiSpecIdEvent.numberOfAlgorithms is non-zero.
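
    The two requirements can be sketched as a single predicate. This is an
    illustration only: digest_count_ok() and its parameters are hypothetical
    stand-ins for the fields named above, not the kernel's parser code.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Returns true when an event's digest count is acceptable given the
 * algorithm count advertised in the log header. The parameters stand in
 * for TCG_PCR_EVENT2.digests.count and
 * TCG_EfiSpecIdEvent.numberOfAlgorithms.
 */
static bool digest_count_ok(uint32_t event_digest_count, uint32_t header_num_algs)
{
    if (header_num_algs == 0)   /* the header must list at least one algorithm */
        return false;
    return event_digest_count == header_num_algs;   /* strict match */
}
```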

    The TCG PC Client Platform Firmware Profile Specification section 9.1
    (Family "2.0", Level 00 Revision 1.04) states:

    For each Hash algorithm enumerated in the TCG_PCClientPCREvent entry,
    there SHALL be a corresponding digest in all TCG_PCR_EVENT2 structures.
    Note: This includes EV_NO_ACTION events which do not extend the PCR.

    Section 9.4.5.1 provides this description of
    TCG_EfiSpecIdEvent.numberOfAlgorithms:

    The number of Hash algorithms in the digestSizes field. This field MUST
    be set to a value of 0x01 or greater.

    Enforce these restrictions, as required by the above specification, in
    order to better identify and ignore invalid sequences of bytes at the
    end of an otherwise valid TPM2 event log. Firmware doesn't always have
    the means necessary to inform the kernel of the actual event log size so
    the kernel's event log parsing code should be stringent when parsing the
    event log for resiliency against firmware bugs. This is true, for
    example, when firmware passes the event log to the kernel via a reserved
    memory region described in device tree.

    POWER and some ARM systems use the "linux,sml-base" and "linux,sml-size"
    device tree properties to describe the memory region used to pass the
    event log from firmware to the kernel. Unfortunately, the
    "linux,sml-size" property describes the size of the entire reserved
    memory region rather than the size of the event log within the memory
    region and the event log format does not include information describing
    the size of the event log.

    tpm_read_log_of(), in drivers/char/tpm/eventlog/of.c, is where the
    "linux,sml-size" property is used. At the end of that function,
    log->bios_event_log_end is pointing at the end of the reserved memory
    region. That's typically 0x10000 bytes offset from "linux,sml-base",
    depending on what's defined in the device tree source.

    The firmware event log only fills a portion of those 0x10000 bytes and
    the rest of the memory region should be zeroed out by firmware. Even in
    the case of properly zeroed bytes in the remainder of the memory
    region, the only thing allowing the kernel's event log parser to detect
    the end of the event log is the following conditional in
    __calc_tpm2_event_size():

        if (event_type == 0 && event_field->event_size == 0)
                size = 0;

    If that wasn't there, __calc_tpm2_event_size() would think that a 16
    byte sequence of zeroes, following an otherwise valid event log, was
    a valid event.

    However, problems can occur if a single bit is set in the offset
    corresponding to either the TCG_PCR_EVENT2.eventType or
    TCG_PCR_EVENT2.eventSize fields, after the last valid event log entry.
    This could confuse the parser into thinking that an additional entry is
    present in the event log and exposing this invalid entry to userspace in
    the /sys/kernel/security/tpm0/binary_bios_measurements file. Such
    problems have been seen if firmware does not fully zero the memory
    region upon a warm reboot.

    This patch significantly raises the bar on how difficult it is for
    stale/invalid memory to confuse the kernel's event log parser but
    there's still, ultimately, a reliance on firmware to properly initialize
    the remainder of the memory region reserved for the event log as the
    parser cannot be expected to detect a stale but otherwise properly
    formatted firmware event log entry.

    Fixes: fd5c78694f3f ("tpm: fix handling of the TPM 2.0 event logs")
    Signed-off-by: Tyler Hicks
    Reviewed-by: Jarkko Sakkinen
    Signed-off-by: Jarkko Sakkinen
    Signed-off-by: Sasha Levin

    Tyler Hicks
     
  • commit f3751ad0116fb6881f2c3c957d66a9327f69cefb upstream.

    __tracepoint_string's have their string data stored in .rodata, and an
    address to that data stored in the "__tracepoint_str" section. Functions
    that refer to those strings refer to the symbol of the address. Compiler
    optimization can replace those address references with references
    directly to the string data. If the address doesn't appear to have other
    uses, then it appears dead to the compiler and is removed. This can
    break the /tracing/printk_formats sysfs node which iterates the
    addresses stored in the "__tracepoint_str" section.

    Like other strings stored in custom sections in this header, mark these
    __used to inform the compiler that there are other non-obvious users of
    the address, so they should still be emitted.
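
    A small sketch of the pattern, assuming GCC or Clang on an ELF target
    linked with GNU ld or a compatible linker. The section name "demo_str"
    and all identifiers are hypothetical. The pointer is never referenced by
    name in C code, so without the "used" attribute the compiler would be
    free to drop it and the section walk would come up empty:

```c
#include <stddef.h>

/* The string data itself lives in read-only data. */
static const char demo_fmt[] = "demo event";

/*
 * Pointer placed in a custom section. "used" tells the compiler to emit
 * it even though nothing in this file refers to demo_fmt_ptr by name.
 */
static const char *const demo_fmt_ptr
    __attribute__((used, section("demo_str"))) = demo_fmt;

/* The linker provides these bounds for sections named like C identifiers. */
extern const char *const __start_demo_str[];
extern const char *const __stop_demo_str[];

/* Count the pointers stored in the section, the way a section walker would. */
static size_t demo_section_entries(void)
{
    return (size_t)(__stop_demo_str - __start_demo_str);
}
```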

    Link: https://lkml.kernel.org/r/20200730224555.2142154-2-ndesaulniers@google.com

    Cc: Ingo Molnar
    Cc: Miguel Ojeda
    Cc: stable@vger.kernel.org
    Fixes: 102c9323c35a8 ("tracing: Add __tracepoint_string() to export string pointers")
    Reported-by: Tim Murray
    Reported-by: Simon MacMullen
    Suggested-by: Greg Hackmann
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Nick Desaulniers
     

11 Aug, 2020

5 commits

  • commit 412055398b9e67e07347a936fc4a6adddabe9cf4 upstream.

    svcrdma expects that the payload falls precisely into the xdr_buf
    page vector. This does not seem to be the case for
    nfsd4_encode_readv().

    This code is called only when fops->splice_read is missing or when
    RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
    common cases.

    Add new transport method: ->xpo_read_payload so that when a READ
    payload does not fit exactly in rq_res's page vector, the XDR
    encoder can inform the RPC transport exactly where that payload is,
    without the payload's XDR pad.
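
    For illustration, the relationship between a payload and its XDR pad
    can be sketched as follows. XDR encodes opaque data in 4-byte units;
    struct payload_range and both helpers are hypothetical and do not
    reflect the signature of ->xpo_read_payload.

```c
#include <stddef.h>

/* XDR rounds opaque data up to a multiple of 4; returns the pad bytes needed (0-3). */
static size_t xdr_pad_bytes(size_t payload_len)
{
    return (4 - (payload_len & 3)) & 3;
}

/* Hypothetical descriptor a transport could use to locate the payload. */
struct payload_range {
    size_t offset;   /* where the payload starts in the reply */
    size_t length;   /* payload bytes only, XDR pad excluded */
};

/* Offset of the first byte following the payload and its pad. */
static size_t payload_end_with_pad(const struct payload_range *r)
{
    return r->offset + r->length + xdr_pad_bytes(r->length);
}
```

    A 5-byte payload at offset 100 is followed by 3 pad bytes, so the next
    XDR item starts at offset 108 while the chunk itself covers only bytes
    100 to 104.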

    That way, when a Write chunk is present, the transport knows what
    byte range in the Reply message is supposed to be matched with the
    chunk.

    Note that the Linux NFS server implementation of NFS/RDMA can
    currently handle only one Write chunk per RPC-over-RDMA message.
    This simplifies the implementation of this fix.

    Fixes: b04209806384 ("nfsd4: allow exotic read compounds")
    Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
    Signed-off-by: Chuck Lever
    Cc: Timo Rothenpieler
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • [ Upstream commit 8c0de6e96c9794cb523a516c465991a70245da1c ]

    IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket
    to IPv4, particularly struct ipv6_ac_socklist. Similar to
    struct ipv6_mc_socklist, we should just close it on this path.

    This bug can be easily reproduced with the following C program:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    int main(void)
    {
            int s, value;
            struct sockaddr_in6 addr;
            struct ipv6_mreq m6;

            s = socket(AF_INET6, SOCK_DGRAM, 0);
            addr.sin6_family = AF_INET6;
            addr.sin6_port = htons(5000);
            inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr);
            connect(s, (struct sockaddr *)&addr, sizeof(addr));

            /* Join an anycast group so the socket owns an ipv6_ac_socklist entry. */
            inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr);
            m6.ipv6mr_interface = 5;
            setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6));

            /* Convert the socket to IPv4; this is the path that leaked the entry. */
            value = AF_INET;
            setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value));

            close(s);
            return 0;
    }

    Reported-by: ch3332xr@gmail.com
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • commit 08b5d5014a27e717826999ad20e394a8811aae92 upstream.

    set/removexattr on an exported filesystem should break NFS delegations.
    This is true in general, but also for the upcoming support for
    RFC 8276 (NFSv4 extended attribute support). Make sure that they do.

    Additionally, they need to grow a _locked variant, since callers might
    call this with i_rwsem held (like the NFS server code).

    Cc: stable@vger.kernel.org # v4.9+
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Al Viro
    Signed-off-by: Frank van der Linden
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

    Frank van der Linden
     
  • [ Upstream commit ddc9d357b991838c2d975e8d7e4e9db26f37a7ff ]

    When a Linux hv_sock app tries to connect to a Service GUID on which no
    host app is listening, a recent host (RS3+) sends a
    CHANNELMSG_TL_CONNECT_RESULT (23) message to Linux and this triggers such
    a warning:

    unknown msgtype=23
    WARNING: CPU: 2 PID: 0 at drivers/hv/vmbus_drv.c:1031 vmbus_on_msg_dpc

    Actually Linux can safely ignore the message because the Linux app's
    connect() will time out in 2 seconds: see VSOCK_DEFAULT_CONNECT_TIMEOUT
    and vsock_stream_connect(). We don't bother to make use of the message
    because: 1) it's only supported on recent hosts; 2) a non-trivial effort
    is required to use the message in Linux, but the benefit is small.

    So, let's not see the warning by silently ignoring the message.

    Signed-off-by: Dexuan Cui
    Reviewed-by: Michael Kelley
    Signed-off-by: Sasha Levin

    Dexuan Cui
     
  • [ Upstream commit 2a1658bf922ffd9b7907e270a7d9cdc9643fc45d ]

    Recent kernels have been reported to panic using the bochs_drm
    framebuffer under qemu-system-sparc64 which was bisected to
    commit 7a0483ac4ffc ("drm/bochs: switch to generic drm fbdev emulation").

    The backtrace indicates that the shadow framebuffer copy in
    drm_fb_helper_dirty_blit_real() is trying to access the real
    framebuffer using a virtual address rather than use an IO access
    typically implemented using a physical (ASI_PHYS) access on SPARC.

    The fix is to replace the memcpy with memcpy_toio() from io.h.

    memcpy_toio() uses writeb() where the original fbdev code
    used sbus_memcpy_toio(). The latter uses sbus_writeb().

    The difference between writeb() and sbus_writeb() is
    that writeb() writes bytes in little-endian, whereas sbus_writeb() writes
    bytes in big-endian. As endianness does not matter for byte writes, they
    are the same. So we can safely use memcpy_toio() here.
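
    The byte-order argument can be seen in a user-space stand-in. This is a
    sketch only: copy_bytes_to_io() is a hypothetical name and plain memory
    plays the role of the framebuffer. Every store is a single byte, so no
    multi-byte value is ever written and byte order cannot come into play:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte-wise copy through a volatile pointer. The volatile qualifier keeps
 * the compiler from merging the stores into wider accesses.
 */
static void copy_bytes_to_io(volatile uint8_t *dst, const void *src, size_t n)
{
    const uint8_t *s = src;

    for (size_t i = 0; i < n; i++)
        dst[i] = s[i];
}
```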

    Note that this only fixes bochs, in general fbdev helpers still have
    issues with mixing up system memory and __iomem space. Fixing that will
    require a lot more work.

    v3:
    - Improved changelog (Daniel)
    - Added FIXME to fbdev_use_iomem (Daniel)

    v2:
    - Added missing __iomem cast (kernel test robot)
    - Made changelog readable and fix typos (Mark)
    - Add flag to select iomem - and set it in the bochs driver

    Signed-off-by: Sam Ravnborg
    Reported-by: Mark Cave-Ayland
    Reported-by: kernel test robot
    Tested-by: Mark Cave-Ayland
    Reviewed-by: Daniel Vetter
    Cc: Mark Cave-Ayland
    Cc: Thomas Zimmermann
    Cc: Gerd Hoffmann
    Cc: "David S. Miller"
    Cc: sparclinux@vger.kernel.org
    Link: https://patchwork.freedesktop.org/patch/msgid/20200709193016.291267-1-sam@ravnborg.org
    Link: https://patchwork.freedesktop.org/patch/msgid/20200725191012.GA434957@ravnborg.org
    Signed-off-by: Sasha Levin

    Sam Ravnborg
     

07 Aug, 2020

5 commits

  • commit bb0de3131f4c60a9bf976681e0fe4d1e55c7a821 upstream.

    The sockmap code currently ignores the value of attach_bpf_fd when
    detaching a program. This is contrary to the usual behaviour of
    checking that attach_bpf_fd represents the currently attached
    program.

    Ensure that attach_bpf_fd is indeed the currently attached
    program. It turns out that all sockmap selftests already do this,
    which indicates that this is unlikely to cause breakage.
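
    A minimal sketch of such a check, with hypothetical types standing in
    for the BPF program and the map. The -ENOENT return value is an
    assumption of this example, not necessarily what the kernel returns:

```c
#include <stddef.h>
#include <errno.h>

struct demo_prog { int id; };                    /* stand-in for a BPF program */
struct demo_map  { struct demo_prog *attached; };

/*
 * Detach only if the caller names the program that is attached right now.
 * Returns 0 on success or -ENOENT on mismatch.
 */
static int demo_detach(struct demo_map *map, const struct demo_prog *expected)
{
    if (map->attached == NULL || map->attached != expected)
        return -ENOENT;
    map->attached = NULL;
    return 0;
}
```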

    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
    Signed-off-by: Greg Kroah-Hartman

    Lorenz Bauer
     
  • commit c0842fbc1b18c7a044e6ff3e8fa78bfa822c7d1a upstream.

    The addition of percpu.h to the list of includes in random.h revealed
    some circular dependencies on arm64 and possibly other platforms. This
    include was added solely for the pseudo-random definitions, which have
    nothing to do with the rest of the definitions in this file but are
    still there for legacy reasons.

    This patch moves the pseudo-random parts to linux/prandom.h and the
    percpu.h include with it, which is now guarded by _LINUX_PRANDOM_H and
    protected against recursive inclusion.

    A further cleanup step would be to remove this from <linux/random.h>
    entirely, and make people who use the prandom infrastructure include
    just the new header file. That's a bit of a churn patch, but grepping
    for "prandom_", "next_pseudo_random32" and "struct rnd_state" should
    catch most users.

    But it turns out that that nice cleanup step is fairly painful, because
    a _lot_ of code currently seems to depend on the implicit include of
    <linux/random.h>, which can currently come in a lot of ways, including
    via some fairly core headers.

    So the "nice cleanup" part may or may never happen.

    Fixes: 1c9df907da83 ("random: fix circular include dependency on arm64 after addition of percpu.h")
    Tested-by: Guenter Roeck
    Acked-by: Willy Tarreau
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 83bdc7275e6206f560d247be856bceba3e1ed8f2 upstream.

    It turns out that the plugin right now ends up being really unhappy
    about the change from 'static' to 'extern' storage that happened in
    commit f227e3ec3b5c ("random32: update the net random state on interrupt
    and activity").

    This is probably a trivial fix for the latent_entropy plugin, but for
    now, just remove net_rand_state from the list of things the plugin
    worries about.

    Reported-by: Stephen Rothwell
    Cc: Emese Revfy
    Cc: Kees Cook
    Cc: Willy Tarreau
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 1c9df907da83812e4f33b59d3d142c864d9da57f upstream.

    Daniel Díaz and Kees Cook independently reported that commit
    f227e3ec3b5c ("random32: update the net random state on interrupt and
    activity") broke arm64 due to a circular dependency on include files
    since the addition of percpu.h in random.h.

    The correct fix would definitely be to move all the prandom32 stuff out
    of random.h but for backporting, a smaller solution is preferred.

    This one replaces linux/percpu.h with asm/percpu.h, and this fixes the
    problem on x86_64, arm64, arm, and mips. Note that moving percpu.h
    around didn't change anything and that removing it entirely broke
    differently. When backporting, such options might still be considered
    if this patch fails to help.

    [ It turns out that an alternate fix seems to be to just remove the
    troublesome include from the arm64 header that causes the circular
    dependency.

    But we might as well do the whole belt-and-suspenders thing, and
    minimize inclusion in <linux/random.h> too. Either will fix the
    problem, and both are good changes. - Linus ]

    Reported-by: Daniel Díaz
    Reported-by: Kees Cook
    Tested-by: Marc Zyngier
    Fixes: f227e3ec3b5c
    Cc: Stephen Rothwell
    Signed-off-by: Willy Tarreau
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Willy Tarreau
     
  • commit f227e3ec3b5cad859ad15666874405e8c1bbc1d4 upstream.

    This modifies the first 32 bits out of the 128 bits of a random CPU's
    net_rand_state on interrupt or CPU activity to complicate remote
    observations that could lead to guessing the network RNG's internal
    state.

    Note that depending on some network devices' interrupt rate moderation
    or binding, this re-seeding might happen on every packet or even almost
    never.

    In addition, with NOHZ some CPUs might not even get timer interrupts,
    leaving their local state rarely updated, while they are running
    networked processes making use of the random state. For this reason, we
    also perform this update in update_process_times() in order to at least
    update the state when there is user or system activity, since it's the
    only case we care about.
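
    As a rough sketch of the idea, assuming a four-word state and a plain
    wrapping addition as the mixing step (the struct, the function name and
    the choice of addition are illustrative assumptions):

```c
#include <stdint.h>

/* 128-bit generator state held as four 32-bit words. */
struct demo_rnd_state {
    uint32_t s1, s2, s3, s4;
};

/*
 * Fold an event-derived value into the first word only. Unsigned addition
 * wraps around, and s2..s4 are left untouched.
 */
static void demo_perturb(struct demo_rnd_state *st, uint32_t noise)
{
    st->s1 += noise;
}
```

    Only 32 of the 128 bits change per event, which keeps the hot-path cost
    at a single add while still invalidating an observer's model of the
    state.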

    Reported-by: Amit Klein
    Suggested-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: "Jason A. Donenfeld"
    Cc: Andy Lutomirski
    Cc: Kees Cook
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Willy Tarreau
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Willy Tarreau
     

05 Aug, 2020

6 commits

  • [ Upstream commit 1748f6a2cbc4694523f16da1c892b59861045b9d ]

    The rcu_dereference call in rht_ptr_rcu is completely bogus because
    we've already dereferenced the value in __rht_ptr and operated on it.
    This causes potential double readings which could be fatal. The RCU
    dereference must occur prior to the comparison in __rht_ptr.

    This patch changes the order of RCU dereference so that it is done
    first and the result is then fed to __rht_ptr. The RCU marking
    changes have been minimised using casts which will be removed in
    a follow-up patch.
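
    The single-read rule can be sketched in user space with C11 atomics
    standing in for RCU. The assumption that bit 0 of the bucket word
    carries a flag, and all names below, belong to this sketch and not to
    the rhashtable code:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

struct demo_node { int key; };

/* Bit 0 of the bucket word is assumed to carry a flag in this sketch. */
#define DEMO_FLAG_BIT ((uintptr_t)1)

/*
 * Load the bucket word exactly once, then strip the flag from that single
 * snapshot. Testing one load and then returning a second load would let a
 * concurrent writer change the value between the two reads.
 */
static struct demo_node *demo_bucket_head(_Atomic uintptr_t *bucket)
{
    uintptr_t snap = atomic_load_explicit(bucket, memory_order_acquire);

    return (struct demo_node *)(snap & ~DEMO_FLAG_BIT);
}
```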

    Fixes: ba6306e3f648 ("rhashtable: Remove RCU marking from...")
    Reported-by: "Gong, Sishuai"
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Herbert Xu
     
  • [ Upstream commit 7d0314b11cdd92bca8b89684c06953bf114605fc ]

    When setting the PF interface up/down, notify the firmware to update
    uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.

    This behavior will prevent sending traffic out on the uplink port when the
    PF is down, such as traffic from a VF interface which is still up.
    Currently, when calling mlx5e_open/close(), the driver only sends the PAOS
    command to notify the firmware to set the physical port state to
    up/down; however, this is not sufficient. When a VF is in "auto" state, it
    follows the uplink state, which was not updated on mlx5e_open/close()
    before this patch.

    When switchdev mode is enabled and uplink representor is first enabled,
    set the uplink port state value back to its FW default "AUTO".

    Fixes: 63bfd399de55 ("net/mlx5e: Send PAOS command on interface up/down")
    Signed-off-by: Ron Diskin
    Reviewed-by: Roi Dayan
    Reviewed-by: Moshe Shemesh
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Sasha Levin

    Ron Diskin
     
  • [ Upstream commit 101dde4207f1daa1fda57d714814a03835dccc3f ]

    The commits "xfrm: Move dst->path into struct xfrm_dst"
    and "net: Create and use new helper xfrm_dst_child()."
    changed xfrm bundle handling under the assumption
    that xdst->path and dst->child are not a NULL pointer
    only if dst->xfrm is not a NULL pointer. That is true
    with one exception. If the xfrm hold queue is used
    to wait until a SA is installed by the key manager,
    we create a dummy bundle without a valid dst->xfrm
    pointer. The current xfrm bundle handling crashes
    in that case. Fix this by extending the NULL check
    of dst->xfrm with a test of the DST_XFRM_QUEUE flag.

    Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
    Fixes: b92cf4aab8e6 ("net: Create and use new helper xfrm_dst_child().")
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Steffen Klassert
     
  • [ Upstream commit 4f47e8ab6ab796b5380f74866fa5287aca4dcc58 ]

    Commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    made 'priority' part of what makes a policy unique, which allows
    duplicate policies that differ only in 'priority' to be added. Userland
    does not expect this, as Tobias reported in strongswan.

    To fix this duplicated policies issue, and also fix the issue in
    commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
    this patch changes add/del/get/update on the user interfaces to look up
    a policy by both mark and mask:

    mark.v == pol->mark.v && mark.m == pol->mark.m

    and leave the check:

    (mark & pol->mark.m) == pol->mark.v

    for tx/rx path only.

    Userland expects an exact mark and mask match when managing policies.
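
    The two comparisons can be put side by side in a short sketch. The
    struct and function names are hypothetical; only the two expressions
    are taken from the text above:

```c
#include <stdint.h>
#include <stdbool.h>

struct demo_mark {
    uint32_t v;   /* value */
    uint32_t m;   /* mask  */
};

/* Management path (add/del/get/update): value and mask must both be identical. */
static bool mark_exact_match(const struct demo_mark *req, const struct demo_mark *pol)
{
    return req->v == pol->v && req->m == pol->m;
}

/* Data path (tx/rx): the packet mark, filtered by the policy mask, must equal the value. */
static bool mark_masked_match(uint32_t pkt_mark, const struct demo_mark *pol)
{
    return (pkt_mark & pol->m) == pol->v;
}
```

    With a policy of value 0x10 and mask 0xf0, a packet mark of 0x1a passes
    the masked check, while a management request for value 0x10 and mask
    0xff does not count as the same policy.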

    v1->v2:
    - make xfrm_policy_mark_match inline and fix the changelog as
    Tobias suggested.

    Fixes: 295fae568885 ("xfrm: Allow user space manipulation of SPD mark")
    Fixes: ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    Reported-by: Tobias Brunner
    Tested-by: Tobias Brunner
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Xin Long
     
  • commit 6989310f5d4327e8595664954edd40a7f99ddd0d upstream.

    Use offsetof to calculate offset of a field to take advantage of
    compiler built-in version when possible, and avoid UBSAN warning when
    compiling with Clang:

    ==================================================================
    UBSAN: Undefined behaviour in net/wireless/wext-core.c:525:14
    member access within null pointer of type 'struct iw_point'
    CPU: 3 PID: 165 Comm: kworker/u16:3 Tainted: G S W 4.19.23 #43
    Workqueue: cfg80211 __cfg80211_scan_done [cfg80211]
    Call trace:
    dump_backtrace+0x0/0x194
    show_stack+0x20/0x2c
    __dump_stack+0x20/0x28
    dump_stack+0x70/0x94
    ubsan_epilogue+0x14/0x44
    ubsan_type_mismatch_common+0xf4/0xfc
    __ubsan_handle_type_mismatch_v1+0x34/0x54
    wireless_send_event+0x3cc/0x470
    ___cfg80211_scan_done+0x13c/0x220 [cfg80211]
    __cfg80211_scan_done+0x28/0x34 [cfg80211]
    process_one_work+0x170/0x35c
    worker_thread+0x254/0x380
    kthread+0x13c/0x158
    ret_from_fork+0x10/0x18
    ===================================================================
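
    For reference, a small sketch of the well-defined form. The struct
    below is a hypothetical layout used for illustration and is not the
    definition of struct iw_point:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a pointer followed by two 16-bit fields. */
struct demo_point {
    void    *pointer;
    uint16_t length;
    uint16_t flags;
};

/*
 * offsetof is evaluated by the compiler and accesses no object. The
 * hand-rolled alternative, taking the address of a member through a null
 * struct pointer, is what sanitizers report as a member access within a
 * null pointer.
 */
static size_t length_offset(void)
{
    return offsetof(struct demo_point, length);
}
```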

    Signed-off-by: Pi-Hsun Shih
    Reviewed-by: Nick Desaulniers
    Link: https://lore.kernel.org/r/20191204081307.138765-1-pihsun@chromium.org
    Signed-off-by: Johannes Berg
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Greg Kroah-Hartman

    Pi-Hsun Shih
     
  • commit 54a485e9ec084da1a4b32dcf7749c7d760ed8aa5 upstream.

    The lookaside count is improperly initialized to the size of the
    Receive Queue with the additional +1. In the traces below, the
    RQ size is 384, so the count was set to 385.

    The lookaside count is then rarely refreshed. Note the high and
    incorrect count in the trace below:

    rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9008 wr_id 55c7206d75a0 qpn c
    qpt 2 pid 3018 num_sge 1 head 1 tail 0, count 385
    rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1]
    Cc: # 5.4.x
    Reviewed-by: Kaike Wan
    Signed-off-by: Mike Marciniszyn
    Tested-by: Honggang Li
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Mike Marciniszyn
     

01 Aug, 2020

1 commit

  • [ Upstream commit 76be93fc0702322179bb0ea87295d820ee46ad14 ]

    Previously TLP may send multiple probes of new data in one
    flight. This happens when the sender is cwnd limited. After the
    initial TLP containing new data is sent, the sender receives another
    ACK that acks partial inflight. It may re-arm another TLP timer
    to send more, if no further ACK returns before the next TLP timeout
    (PTO) expires. In theory the sender may send a large number of TLPs
    until the send queue is depleted. This only happens if the sender sees
    such an irregular, uncommon ACK pattern, but it is generally undesirable
    behavior, especially during congestion.

    The original TLP design restricts the sender to one TLP probe per inflight, as
    published in "Reducing Web Latency: the Virtue of Gentle Aggression",
    SIGCOMM 2013. This patch changes TLP to send at most one probe
    per inflight.

    Note that if the sender is app-limited, TLP retransmits old data
    and does not have this issue.
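
    The "at most one probe per inflight" rule can be sketched as a small
    state machine. The struct, the field names and the use of a zero value
    to mean "no probe outstanding" are assumptions of this example rather
    than a description of the TCP code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; tlp_high_seq == 0 means no probe is outstanding. */
struct demo_tlp {
    uint32_t tlp_high_seq;
    uint32_t snd_nxt;       /* assumed non-zero in this sketch */
};

/* Try to send a loss probe; refuses if one is already in flight. */
static bool demo_send_probe(struct demo_tlp *tp)
{
    if (tp->tlp_high_seq != 0)
        return false;                /* at most one probe per flight */
    tp->tlp_high_seq = tp->snd_nxt;  /* remember where the probe was sent */
    return true;
}

/* Called once the flight containing the probe has been acknowledged. */
static void demo_probe_acked(struct demo_tlp *tp)
{
    tp->tlp_high_seq = 0;
}
```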

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yuchung Cheng
     

29 Jul, 2020

7 commits

  • commit 5df96f2b9f58a5d2dc1f30fe7de75e197f2c25f2 upstream.

    Commit adc0daad366b62ca1bce3e2958a40b0b71a8b8b3 ("dm: report suspended
    device during destroy") broke integrity recalculation.

    The problem is dm_suspended() returns true not only during suspend,
    but also during resume. So this race condition could occur:
    1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
    2. integrity_recalc (&ic->recalc_work) preempts the current thread
    3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
    4. integrity_recalc exits and no recalculating is done.

    To fix this race condition, add a function dm_post_suspending that is
    only true during the postsuspend phase and use it instead of
    dm_suspended().
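
    The distinction between the two predicates can be sketched with two
    flags. The struct and helper are hypothetical and only model the
    behaviour described above:

```c
#include <stdbool.h>

/* Hypothetical device flags for illustration. */
struct demo_dev {
    bool suspended;        /* still set while resume is in progress */
    bool post_suspending;  /* set only while postsuspend is running */
};

/*
 * Worker gate: skip the recalculation only during postsuspend. Gating on
 * "suspended" instead would also skip work that resume has just queued.
 */
static bool demo_recalc_may_run(const struct demo_dev *d)
{
    return !d->post_suspending;
}
```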

    Signed-off-by: Mikulas Patocka
    Fixes: adc0daad366b ("dm: report suspended device during destroy")
    Cc: stable@vger.kernel.org # v4.18+
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
     
  • commit 85ca6b17e2bb96b19caac3b02c003d670b66de96 upstream.

    The Lenovo Miix 2 10 has a keyboard dock with extra speakers in the dock.
    Rather than the ALC5672's GPIO1 pin being used as IRQ to the CPU, it is
    actually used to enable the amplifier for these speakers
    (the IRQ to the CPU comes directly from the jack-detect switch).

    Add a quirk for having an ext speaker-amplifier enable pin on GPIO1
    and replace the Lenovo Miix 2 10's dmi_system_id table entry's wrong
    GPIO_DEV quirk (which needs to be renamed to GPIO1_IS_IRQ) with the
    new RT5670_GPIO1_IS_EXT_SPK_EN quirk, so that we enable the external
    speaker-amplifier as necessary.

    Also update the ident field for the dmi_system_id table entry; the
    Miix models are not Thinkpads.

    Fixes: 67e03ff3f32f ("ASoC: codecs: rt5670: add Thinkpad Tablet 10 quirk")
    Signed-off-by: Hans de Goede
    BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1786723
    Link: https://lore.kernel.org/r/20200628155231.71089-4-hdegoede@redhat.com
    Signed-off-by: Mark Brown
    Signed-off-by: Greg Kroah-Hartman

    Hans de Goede
     
  • commit de2b41be8fcccb2f5b6c480d35df590476344201 upstream.

    On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
    page-aligned, but the end of the .bss..page_aligned section is not
    guaranteed to be page-aligned.

    As a result, objects from other .bss sections may end up on the same 4k
    page as the idt_table, and will accidentally get mapped read-only during
    boot, causing unexpected page-faults when the kernel writes to them.

    This could be worked around by making the objects in the page aligned
    sections page sized, but that's wrong.

    Explicit sections which store only page aligned objects have an implicit
    guarantee that the object is alone in the page in which it is placed. That
    works for all objects except the last one. That's inconsistent.

    Enforcing page sized objects for these sections would wreck memory
    sanitizers, because the object becomes artificially larger than it should
    be and out-of-bounds accesses become legitimate.

    Align the end of the .bss..page_aligned and .data..page_aligned sections to
    page size so all objects placed in these sections are guaranteed to have
    their own page.
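
    The arithmetic behind the problem can be sketched in user space. This is
    only an illustration: the 4k page size comes from the text above, the
    8-byte gate descriptor size is what 256 entries in 2048 bytes implies,
    and align_up() and exposed_tail_bytes() are hypothetical helpers, not
    kernel functions.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u   /* 4k page, as stated in the commit text */
#define IDT_ENTRIES     256u
#define GATE_DESC_BYTES 8u      /* implied by 256 entries == 2048 bytes */

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/*
 * Bytes left over on the idt_table's page. Without aligning the end of the
 * section, objects from other .bss sections can land in this tail and end
 * up mapped read-only together with the table.
 */
static size_t exposed_tail_bytes(void)
{
	size_t used = IDT_ENTRIES * GATE_DESC_BYTES;

	return align_up(used, PAGE_SIZE_BYTES) - used;
}
```

    Aligning the section end to the page size is equivalent to reserving that
    tail, without making the object itself look larger to a sanitizer.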

    [ tglx: Amended changelog ]

    Signed-off-by: Joerg Roedel
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org
    Signed-off-by: Greg Kroah-Hartman

    Joerg Roedel
     
  • commit e0b3e0b1a04367fc15c07f44e78361545b55357c upstream.

    The !ATOMIC_IOMAP version of io_mapping_init_wc will always return
    success, even when the ioremap fails.

    Since the ATOMIC_IOMAP version returns NULL when the init fails, and
    callers check for a NULL return on error, this is unexpected.

    During a device probe, where the ioremap failed, a crash can look like
    this:

    BUG: unable to handle page fault for address: 0000000000210000
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    Oops: 0002 [#1] PREEMPT SMP
    CPU: 0 PID: 177 Comm:
    RIP: 0010:fill_page_dma [i915]
    gen8_ppgtt_create [i915]
    i915_ppgtt_create [i915]
    intel_gt_init [i915]
    i915_gem_init [i915]
    i915_driver_probe [i915]
    pci_device_probe
    really_probe
    driver_probe_device

    The remap failure occurred much earlier in the probe. If it had been
    propagated, the driver would have exited with an error.

    Return NULL on ioremap failure.
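
    The shape of the fix can be modelled outside the kernel. The sketch below
    is an assumption-laden stand-in, not the kernel code: io_mapping_model,
    fake_ioremap_wc() and io_mapping_init_model() are hypothetical names, and
    the fake remap simply fails for sizes it cannot serve.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for the io-mapping metadata struct. */
struct io_mapping_model {
	void *iomem;
	unsigned long base;
	unsigned long size;
};

static char fake_window[64];

/* Hypothetical stand-in for ioremap_wc(): NULL signals a failed remap. */
static void *fake_ioremap_wc(unsigned long base, unsigned long size)
{
	(void)base;
	if (size == 0 || size > sizeof(fake_window))
		return NULL;
	return fake_window;
}

/* Check the remap result first and propagate failure as NULL. */
static struct io_mapping_model *
io_mapping_init_model(struct io_mapping_model *iomap,
		      unsigned long base, unsigned long size)
{
	iomap->iomem = fake_ioremap_wc(base, size);
	if (!iomap->iomem)
		return NULL;

	iomap->base = base;
	iomap->size = size;
	return iomap;
}
```

    With this ordering a caller that already checks for NULL sees the failure
    at init time instead of faulting later on an unmapped address.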

    [akpm@linux-foundation.org: detect ioremap_wc() errors earlier]

    Fixes: cafaf14a5d8f ("io-mapping: Always create a struct to hold metadata about the io-mapping")
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Mike Rapoport
    Cc: Andy Shevchenko
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc:
    Link: http://lkml.kernel.org/r/20200721171936.81563-1-michael.j.ruhl@intel.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • [ Upstream commit bd024e82e4cd95c7f1a475a55f99871936c2b2db ]

    Although mmiowb() is concerned only with serialising MMIO writes occurring
    in contexts where a spinlock is held, the call to mmiowb_set_pending()
    from the MMIO write accessors can occur in preemptible contexts, such
    as during driver probe() functions where ordering between CPUs is not
    usually a concern, assuming that the task migration path provides the
    necessary ordering guarantees.

    Unfortunately, the default implementation of mmiowb_set_pending() is not
    preempt-safe, as it makes use of a per-cpu variable to track its
    internal state. This has been reported to generate the following splat
    on riscv:

    | BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
    | caller is regmap_mmio_write32le+0x1c/0x46
    | CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-hfu+ #1
    | Call Trace:
    | walk_stackframe+0x0/0x7a
    | dump_stack+0x6e/0x88
    | regmap_mmio_write32le+0x18/0x46
    | check_preemption_disabled+0xa4/0xaa
    | regmap_mmio_write32le+0x18/0x46
    | regmap_mmio_write+0x26/0x44
    | regmap_write+0x28/0x48
    | sifive_gpio_probe+0xc0/0x1da

    Although it's possible to fix the driver in this case, other splats have
    been seen from other drivers, including the infamous 8250 UART, and so
    it's better to address this problem in the mmiowb core itself.

    Fix mmiowb_set_pending() by using the raw_cpu_ptr() to get at the mmiowb
    state and then only updating the 'mmiowb_pending' field if we are not
    preemptible (i.e. we have a non-zero nesting count).
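
    A minimal user-space model of that rule is shown below. It is a sketch
    under assumptions: mmiowb_state_model and mmiowb_set_pending_model() are
    hypothetical names, the per-cpu lookup is replaced by an explicit pointer
    argument, and storing the nesting count in the pending field is a choice
    made for this illustration.

```c
/* Hypothetical model of the per-cpu mmiowb bookkeeping. */
struct mmiowb_state_model {
	unsigned short nesting_count;   /* non-zero while a spinlock is held */
	unsigned short mmiowb_pending;  /* non-zero if a barrier is still owed */
};

/*
 * Only record a pending barrier when the nesting count is non-zero, i.e.
 * when the caller cannot be preempted. In preemptible context the state
 * is left untouched.
 */
static void mmiowb_set_pending_model(struct mmiowb_state_model *ms)
{
	if (ms->nesting_count)
		ms->mmiowb_pending = ms->nesting_count;
}
```

    Because the preemptible case never writes to the state, it no longer
    matters which CPU's copy was looked up.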

    Cc: Arnd Bergmann
    Cc: Paul Walmsley
    Cc: Guo Ren
    Cc: Michael Ellerman
    Reported-by: Palmer Dabbelt
    Reported-by: Emil Renner Berthing
    Tested-by: Emil Renner Berthing
    Reviewed-by: Palmer Dabbelt
    Acked-by: Palmer Dabbelt
    Link: https://lore.kernel.org/r/20200716112816.7356-1-will@kernel.org
    Signed-off-by: Will Deacon
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • [ Upstream commit c463bb2a8f8d7d97aa414bf7714fc77e9d3b10df ]

    This event code represents the state of a removable cover of a device.
    Value 0 means that the cover is open or removed, value 1 means that the
    cover is closed.

    Reviewed-by: Sebastian Reichel
    Acked-by: Tony Lindgren
    Signed-off-by: Merlijn Wajer
    Link: https://lore.kernel.org/r/20200612125402.18393-2-merlijn@wizzup.org
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Sasha Levin

    Merlijn Wajer
     
  • [ Upstream commit 6348dd291e3653534a9e28e6917569bc9967b35b ]

    There exists a sleep-while-atomic bug when accessing dmabuf->name
    under a mutex in dmabuffs_dname(). It is caused by the SELinux
    permission checks on a process, which validate the files inherited
    from fork() by traversing them through iterate_fd() (which
    traverses files under spin_lock) and calling match_file()
    (security/selinux/hooks.c), where the permission checks happen.
    The audit information is logged using dump_common_audit_data(), which
    calls d_path() to get the file path name. If the file check happens on
    the dmabuf's fd, it ends up in ->dmabuffs_dname(), which uses a mutex to
    access dmabuf->name. The flow is as follows:
    flush_unauthorized_files()
      iterate_fd()
        spin_lock() --> Start of the atomic section.
          match_file()
            file_has_perm()
              avc_has_perm()
                avc_audit()
                  slow_avc_audit()
                    common_lsm_audit()
                      dump_common_audit_data()
                        audit_log_d_path()
                          d_path()
                            dmabuffs_dname()
                              mutex_lock() --> Sleep while atomic.

    Call trace captured (on 4.19 kernels) is below:
    ___might_sleep+0x204/0x208
    __might_sleep+0x50/0x88
    __mutex_lock_common+0x5c/0x1068
    __mutex_lock_common+0x5c/0x1068
    mutex_lock_nested+0x40/0x50
    dmabuffs_dname+0xa0/0x170
    d_path+0x84/0x290
    audit_log_d_path+0x74/0x130
    common_lsm_audit+0x334/0x6e8
    slow_avc_audit+0xb8/0xf8
    avc_has_perm+0x154/0x218
    file_has_perm+0x70/0x180
    match_file+0x60/0x78
    iterate_fd+0x128/0x168
    selinux_bprm_committing_creds+0x178/0x248
    security_bprm_committing_creds+0x30/0x48
    install_exec_creds+0x1c/0x68
    load_elf_binary+0x3a4/0x14e0
    search_binary_handler+0xb0/0x1e0

    So, use spinlock to access dmabuf->name to avoid sleep-while-atomic.

    Cc: [5.3+]
    Signed-off-by: Charan Teja Kalla
    Reviewed-by: Michael J. Ruhl
    Acked-by: Christian König
    [sumits: added comment to spinlock_t definition to avoid warning]
    Signed-off-by: Sumit Semwal
    Link: https://patchwork.freedesktop.org/patch/msgid/a83e7f0d-4e54-9848-4b58-e1acdbe06735@codeaurora.org
    Signed-off-by: Sasha Levin

    Charan Teja Kalla
     

22 Jul, 2020

9 commits

  • commit aadf9dcef9d4cd68c73a4ab934f93319c4becc47 upstream.

    The trace symbol printer (__print_symbolic()) ignores symbols that map to
    an empty string and prints the hex value instead.

    Fix the symbol for rxrpc_cong_no_change to " -" instead of "" to avoid
    this.

    Fixes: b54a134a7de4 ("rxrpc: Fix handling of enums-to-string translation in tracing")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit a50ca29523b18baea548bdf5df9b4b923c2bb4f6 upstream.

    This adds more hardware IDs for Elan touchpads found in various Lenovo
    laptops.

    Signed-off-by: Dave Wang
    Link: https://lore.kernel.org/r/000201d5a8bd$9fead3f0$dfc07bd0$@emc.com.tw
    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dave Wang
     
  • commit f794db6841e5480208f0c3a3ac1df445a96b079e upstream.

    Until this commit the mainline kernel version (this version) of the
    vboxguest module contained a bug where it defined
    VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG using
    _IOC(_IOC_READ | _IOC_WRITE, 'V', ...) instead of
    _IO('V', ...) as the out of tree VirtualBox upstream version does.

    Since the VirtualBox userspace bits are always built against VirtualBox
    upstream's headers, this means that so far the mainline kernel version
    of the vboxguest module has been failing these 2 ioctls with -ENOTTY.
    I guess that VBGL_IOCTL_VMMDEV_REQUEST_BIG is never used, so we have
    not hit that one, and so far the vboxguest driver has failed to actually
    log any log messages passed to it through VBGL_IOCTL_LOG.

    This commit changes the VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG
    defines to match the out of tree VirtualBox upstream vboxguest version,
    while keeping compatibility with the old wrong request defines so as
    to not break the kernel ABI in case someone has been using the old
    request defines.
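
    Why the two encodings do not match, and how a handler can accept both, can
    be sketched with the uapi ioctl macros. This is an illustration only: the
    request number 3, the size 0, and the DEMO_* names and demo_dispatch()
    are placeholders for this sketch, not the driver's actual definitions.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define DEMO_NR 3   /* placeholder request number for this sketch */

/* Encoding without a direction, as _IO() produces. */
#define DEMO_REQ_UPSTREAM _IO('V', DEMO_NR)
/* Encoding with read and write direction bits set. */
#define DEMO_REQ_OLD      _IOC(_IOC_READ | _IOC_WRITE, 'V', DEMO_NR, 0)

/*
 * Accept both encodings for the same request, so callers built against
 * either header keep working. Anything else is rejected with -ENOTTY.
 */
static int demo_dispatch(unsigned int req)
{
	switch (req) {
	case DEMO_REQ_UPSTREAM:
	case DEMO_REQ_OLD:
		return 0;
	default:
		return -ENOTTY;
	}
}
```

    The direction bits are part of the request value, so the two macros yield
    different numbers even though type and request number are identical.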

    Fixes: f6ddd094f579 ("virt: Add vboxguest driver for Virtual Box Guest integration UAPI")
    Cc: stable@vger.kernel.org
    Acked-by: Arnd Bergmann
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Hans de Goede
    Link: https://lore.kernel.org/r/20200709120858.63928-2-hdegoede@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Hans de Goede
     
  • [ Upstream commit e8639e1c986a8a9d0f94549170f6db579376c3ae ]

    The RTC modules on am3 and am4 need quirk handling to unlock and lock
    them for reset so let's add the quirk handling based on what we already
    have for legacy platform data. In later patches we will simply drop the
    RTC related platform data and the old quirk handling.

    Signed-off-by: Tony Lindgren
    Signed-off-by: Sasha Levin

    Tony Lindgren
     
  • [ Upstream commit bfe373f608cf81b7626dfeb904001b0e867c5110 ]

    Else there may be magic numbers in /sys/kernel/debug/block/*/state.

    Signed-off-by: Hou Tao
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Hou Tao
     
  • [ Upstream commit 14b032b8f8fce03a546dcf365454bec8c4a58d7d ]

    In order for no_refcnt and is_data to be the lowest order two
    bits in the 'val' we have to pad out the bitfield of the u8.

    Fixes: ad0f75e5f57c ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()")
    Reported-by: Guenter Roeck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

    When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
    copied, so the cgroup refcnt must be taken too. And, unlike the
    sk_alloc() path, sock_update_netprioidx() is not called here.
    Therefore, it is safe and necessary to grab the cgroup refcnt
    even when cgroup_sk_alloc is disabled.

    sk_clone_lock() is in BH context anyway, so the in_interrupt() check
    would terminate this function if called there. And for sk_alloc()
    skcd->val is always zero. So it's safe to factor out the code
    to make it more readable.

    The global variable 'cgroup_sk_alloc_disabled' is used to determine
    whether to take these reference counts. It is impossible to make
    the reference counting correct unless we save this bit of information
    in skcd->val. So, add a new bit there to record whether the socket
    has already taken the reference counts. This obviously relies on
    kmalloc() aligning cgroup pointers to at least 4 bytes;
    ARCH_KMALLOC_MINALIGN is certainly larger than that.
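
    The alignment argument is the usual low-bit pointer tagging trick: if every
    pointer is at least 4-byte aligned, its two lowest bits are always zero
    and can carry flags. The user-space sketch below assumes malloc() provides
    that alignment; the TAG_* names and helper functions are hypothetical and
    only mirror the idea of keeping a "no refcount taken" bit next to the
    pointer.

```c
#include <stdint.h>
#include <stdlib.h>

#define TAG_IS_DATA   ((uintptr_t)1)   /* illustrative flag in bit 0 */
#define TAG_NO_REFCNT ((uintptr_t)2)   /* illustrative flag in bit 1 */
#define TAG_MASK      ((uintptr_t)3)

/* Pack flags into the two low bits of a 4-byte-aligned pointer. */
static uintptr_t tag_ptr(void *p, uintptr_t flags)
{
	return (uintptr_t)p | (flags & TAG_MASK);
}

/* Recover the original pointer by clearing the flag bits. */
static void *untag_ptr(uintptr_t v)
{
	return (void *)(v & ~TAG_MASK);
}

/* Test whether the "no reference taken" flag is recorded in the value. */
static int has_no_refcnt(uintptr_t v)
{
	return (v & TAG_NO_REFCNT) != 0;
}
```

    Recording the flag in the same word as the pointer lets the release path
    decide per socket whether a reference has to be dropped.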

    This bug seems to have been present since the beginning; commit
    d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
    tried to fix it, but not completely. It seems not to have been easy to
    trigger until the recent commit 090e28b229af
    ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

    Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
    Reported-by: Cameron Berkenpas
    Reported-by: Peter Geis
    Reported-by: Lu Fengqi
    Reported-by: Daniël Sonck
    Reported-by: Zhang Qiang
    Tested-by: Cameron Berkenpas
    Tested-by: Peter Geis
    Tested-by: Thomas Lamprecht
    Cc: Daniel Borkmann
    Cc: Zefan Li
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 469aceddfa3ed16e17ee30533fae45e90f62efd8 ]

    Toshiaki pointed out that we now have two very similar functions to extract
    the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
    that the unbounded parsing loop makes it possible for maliciously crafted
    packets to loop through potentially hundreds of tags.

    Fix both of these issues by consolidating the two parsing functions and
    limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
    switch over __vlan_get_protocol() to use skb_header_pointer() instead of
    pskb_may_pull(), to avoid the possible side effects of the latter and keep
    the skb pointer 'const' through all the parsing functions.
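
    Bounded tag parsing of this kind can be sketched on a plain byte buffer.
    The code below is a user-space illustration, not the kernel helper:
    l3_ethertype(), rd16() and the constant names are hypothetical, the buffer
    is assumed to start at the EtherType field of the Ethernet header, and
    only the limit of 8 tags is taken from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_8021Q  0x8100u
#define ETHERTYPE_8021AD 0x88A8u
#define VLAN_HDR_LEN     4u   /* 2 bytes TCI + 2 bytes encapsulated type */
#define MAX_VLAN_DEPTH   8u   /* limit named in the commit text */

static int is_vlan_type(uint16_t type)
{
	return type == ETHERTYPE_8021Q || type == ETHERTYPE_8021AD;
}

/* Read a big-endian 16-bit value without modifying the buffer. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Walk nested VLAN tags and return the L3 EtherType in host byte order.
 * Returns 0 if the buffer is too short or more than MAX_VLAN_DEPTH tags
 * are stacked, so a crafted packet cannot force an unbounded loop.
 */
static uint16_t l3_ethertype(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	unsigned int depth = 0;
	uint16_t type;

	if (len < 2)
		return 0;
	type = rd16(buf);

	while (is_vlan_type(type)) {
		if (++depth > MAX_VLAN_DEPTH)
			return 0;
		off += VLAN_HDR_LEN;
		if (off + 2 > len)
			return 0;
		type = rd16(buf + off);
	}
	return type;
}
```

    Reading through a const pointer with explicit bounds checks loosely
    mirrors the move away from a call that may reshape the buffer.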

    v2:
    - Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)

    Reported-by: Toshiaki Makita
    Reported-by: Daniel Borkmann
    Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the presence of VLANs")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     
  • [ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]

    There are a couple of places in net/sched/ that check skb->protocol and act
    on the value there. However, in the presence of VLAN tags, the value stored
    in skb->protocol can be inconsistent based on whether VLAN acceleration is
    enabled. The commit quoted in the Fixes tag below fixed the users of
    skb->protocol to use a helper that will always see the VLAN ethertype.

    However, most of the callers don't actually handle the VLAN ethertype, but
    expect to find the IP header type in the protocol field. This means that
    things like changing the ECN field, or parsing diffserv values, stop
    working if there's a VLAN tag, or if there are multiple nested VLAN
    tags (QinQ).

    To fix this, change the helper to take an argument that indicates whether
    the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
    make sure to skip all of them, so behaviour is consistent even in QinQ
    mode.

    To make the helper usable from the ECN code, move it to if_vlan.h instead
    of pkt_sched.h.

    v3:
    - Remove empty lines
    - Move vlan variable definitions inside loop in skb_protocol()
    - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
    bpf_skb_ecn_set_ce()

    v2:
    - Use eth_type_vlan() helper in skb_protocol()
    - Also fix code that reads skb->protocol directly
    - Change a couple of 'if/else if' statements to switch constructs to avoid
    calling the helper twice

    Reported-by: Ilya Ponetayev
    Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen