06 Dec, 2020

1 commit

  • Since insn.prefixes.nbytes can be bigger than the size of
    insn.prefixes.bytes[] when a prefix is repeated, the proper check must
    be

    insn.prefixes.bytes[i] != 0 and i < 4

    instead of using insn.prefixes.nbytes. Use the new
    for_each_insn_prefix() macro, which does this correctly.
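
    A minimal usage sketch of the new macro (the (insn, idx, prefix)
    argument order is my understanding of the helper added here; checking
    for the 0x66 prefix is just an example):

        static bool insn_has_opsize_override(struct insn *insn)
        {
                insn_byte_t prefix;
                int i;

                for_each_insn_prefix(insn, i, prefix) {
                        if (prefix == 0x66)     /* operand-size override prefix */
                                return true;
                }

                return false;
        }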

    Debugged by Kees Cook.

    [ bp: Massage commit message. ]

    Fixes: 32d0b95300db ("x86/insn-eval: Add utility functions to get segment selector")
    Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Borislav Petkov
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/160697104969.3146288.16329307586428270032.stgit@devnote2

    Masami Hiramatsu
     

04 Nov, 2020

1 commit

  • Commit

    393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions")

    added .weak directives to arch/x86/lib/mem*_64.S instead of changing the
    existing ENTRY macros to WEAK. This can lead to the assembly snippet

    .weak memcpy
    ...
    .globl memcpy

    which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy
    with LLVM's integrated assembler before LLVM 12. LLVM 12 (since
    https://reviews.llvm.org/D90108) will error on such an overridden symbol
    binding.

    Commit

    ef1e03152cb0 ("x86/asm: Make some functions local")

    changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which
    was ineffective due to the preceding .weak directive.

    Use the appropriate SYM_FUNC_START_WEAK instead.

    Fixes: 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions")
    Fixes: ef1e03152cb0 ("x86/asm: Make some functions local")
    Reported-by: Sami Tolvanen
    Signed-off-by: Fangrui Song
    Signed-off-by: Borislav Petkov
    Reviewed-by: Nick Desaulniers
    Tested-by: Nathan Chancellor
    Tested-by: Nick Desaulniers
    Cc:
    Link: https://lkml.kernel.org/r/20201103012358.168682-1-maskray@google.com

    Fangrui Song
     

23 Oct, 2020

1 commit

  • Pull initial set_fs() removal from Al Viro:
    "Christoph's set_fs base series + fixups"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Allow a NULL pos pointer to __kernel_read
    fs: Allow a NULL pos pointer to __kernel_write
    powerpc: remove address space overrides using set_fs()
    powerpc: use non-set_fs based maccess routines
    x86: remove address space overrides using set_fs()
    x86: make TASK_SIZE_MAX usable from assembly code
    x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
    lkdtm: remove set_fs-based tests
    test_bitmap: remove user bitmap tests
    uaccess: add infrastructure for kernel builds with set_fs()
    fs: don't allow splice read/write without explicit ops
    fs: don't allow kernel reads and writes without iter ops
    sysctl: Convert to iter interfaces
    proc: add a read_iter method to proc proc_ops
    proc: cleanup the compat vs no compat file ops
    proc: remove a level of indentation in proc_get_inode

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • Pull x86 SEV-ES support from Borislav Petkov:
    "SEV-ES enhances the current guest memory encryption support called SEV
    by also encrypting the guest register state, making the registers
    inaccessible to the hypervisor by en-/decrypting them on world
    switches. Thus, it adds additional protection to Linux guests against
    exfiltration, control flow and rollback attacks.

    With SEV-ES, the guest is in full control of what registers the
    hypervisor can access. This is provided by a guest-host exchange
    mechanism based on a new exception vector called VMM Communication
    Exception (#VC), a new instruction called VMGEXIT and a shared
    Guest-Host Communication Block which is a decrypted page shared
    between the guest and the hypervisor.

    Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
    so in order for that exception mechanism to work, the early x86 init
    code needed to be made able to handle exceptions, which, in itself,
    brings a bunch of very nice cleanups and improvements to the early
    boot code like an early page fault handler, allowing for on-demand
    building of the identity mapping. With that, !KASLR configurations do
    not use the EFI page table anymore but switch to a kernel-controlled
    one.

    The main part of this series adds the support for that new exchange
    mechanism. The goal has been to keep this as separate as possible
    from the core x86 code by concentrating the machinery in two
    SEV-ES-specific files:

    arch/x86/kernel/sev-es-shared.c
    arch/x86/kernel/sev-es.c

    Other interaction with core x86 code has been kept to a minimum and
    behind static keys to minimize the performance impact on !SEV-ES
    setups.

    Work by Joerg Roedel and Thomas Lendacky and others"

    * tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
    x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
    x86/sev-es: Check required CPU features for SEV-ES
    x86/efi: Add GHCB mappings when SEV-ES is active
    x86/sev-es: Handle NMI State
    x86/sev-es: Support CPU offline/online
    x86/head/64: Don't call verify_cpu() on starting APs
    x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
    x86/realmode: Setup AP jump table
    x86/realmode: Add SEV-ES specific trampoline entry point
    x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
    x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
    x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
    x86/sev-es: Handle #DB Events
    x86/sev-es: Handle #AC Events
    x86/sev-es: Handle VMMCALL Events
    x86/sev-es: Handle MWAIT/MWAITX Events
    x86/sev-es: Handle MONITOR/MONITORX Events
    x86/sev-es: Handle INVD Events
    x86/sev-es: Handle RDPMC Events
    x86/sev-es: Handle RDTSC(P) Events
    ...

    Linus Torvalds
     

13 Oct, 2020

4 commits

  • Instead of inlining the stac/mov/clac sequence (which also requires
    individual exception table entries and several asm instruction
    alternatives entries), just generate "call __put_user_nocheck_X" for the
    __put_user() cases, the same way we changed __get_user earlier.

    Unlike the get_user() case, we didn't have the same nice infrastructure
    to just generate the call with a single case, so this actually has to
    change some of the infrastructure in order to do this. But that only
    cleans up the code further.

    So now, instead of using a case statement for the sizes, we just do the
    same thing we've done on the get_user() side for a long time: use the
    size as an immediate constant to the asm, and generate the asm that way
    directly.

    In order to handle the special case of 64-bit data on a 32-bit kernel, I
    needed to change the calling convention slightly: the data is passed in
    %eax[:%edx], the pointer in %ecx, and the return value is also returned
    in %ecx. It used to be returned in %eax, but because %eax can now be a
    double-register input, we don't want to mix that with a
    single-register output.

    The actual low-level asm is easier to handle: we'll just share the code
    between the checking and non-checking case, with the non-checking case
    jumping into the middle of the function. That may sound a bit too
    special, but this code is all very very special anyway, so...

    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Instead of inlining the whole stac/lfence/mov/clac sequence (which also
    requires individual exception table entries and several asm instruction
    alternatives entries), just generate "call __get_user_nocheck_X" for the
    __get_user() cases.

    We can use all the same infrastructure that we already use for the
    regular "get_user()", and the end result is simpler source code, and
    much simpler code generation.

    It also means that when I introduce asm goto with input for
    "unsafe_get_user()", there are no nasty interactions with the
    __get_user() code.

    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull copy_and_csum cleanups from Al Viro:
    "Saner calling conventions for csum_and_copy_..._user() and friends"

    [ Removing 800+ lines of code and cleaning stuff up is good - Linus ]

    * 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    ppc: propagate the calling conventions change down to csum_partial_copy_generic()
    amd64: switch csum_partial_copy_generic() to new calling conventions
    sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
    xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
    mips: propagate the calling convention change down into __csum_partial_copy_..._user()
    mips: __csum_partial_copy_kernel() has no users left
    mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
    sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
    i386: propagate the calling conventions change down to csum_partial_copy_generic()
    sh: propage the calling conventions change down to csum_partial_copy_generic()
    m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
    arm: propagate the calling convention changes down to csum_partial_copy_from_user()
    alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
    saner calling conventions for csum_and_copy_..._user()
    csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
    csum_partial_copy_nocheck(): drop the last argument
    unify generic instances of csum_partial_copy_nocheck()
    icmp_push_reply(): reorder adding the checksum up
    skb_copy_and_csum_bits(): don't bother with the last argument

    Linus Torvalds
     
  • Pull RAS updates from Borislav Petkov:

    - Extend MCE recovery in kernel space to processes which encounter an
    MCE while copying from user memory, by sending them a SIGBUS on return
    to user space and unmapping the faulty memory, by Tony Luck and
    Youquan Song.

    - memcpy_mcsafe() rework, splitting the functionality into
    copy_mc_to_user() and copy_mc_to_kernel(). As a result, this enables
    support for new hardware which can recover from a machine check
    encountered during a fast-string copy and makes that the default,
    while letting older hardware which does not support that advanced
    recovery opt in to the old, fragile, slow variant, by Dan Williams.

    - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.

    - Do not use MSR-tracing accessors in #MC context and flag any fault
    while accessing MCA architectural MSRs as an architectural violation
    with the hope that such hw/fw misdesigns are caught early during the
    hw eval phase and they don't make it into production.

    - Misc fixes, improvements and cleanups, as always.

    * tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
    x86/mce: Decode a kernel instruction to determine if it is copying from user
    x86/mce: Recover from poison found while copying from user space
    x86/mce: Avoid tail copy when machine check terminated a copy from user
    x86/mce: Add _ASM_EXTABLE_CPY for copy user access
    x86/mce: Provide method to find out the type of an exception handler
    x86/mce: Pass pointer to saved pt_regs to severity calculation routines
    x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
    x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
    x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
    x86/mce: Add Skylake quirk for patrol scrub reported errors
    RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
    x86/mce: Annotate mce_rd/wrmsrl() with noinstr
    x86/mce/dev-mcelog: Do not update kflags on AMD systems
    x86/mce: Stop mce_reign() from re-computing severity for every CPU
    x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
    x86/mce: Increase maximum number of banks to 64
    x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
    x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
    RAS/CEC: Fix cec_init() prototype

    Linus Torvalds
     

07 Oct, 2020

2 commits

  • In the page fault case it is ok to see if a few more unaligned bytes
    can be copied from the source address. Worst case is that the page fault
    will be triggered again.

    Machine checks are more serious. Just give up at the point where the
    main copy loop triggered the #MC and return from the copy code as if
    the copy succeeded. The machine check handler will use task_work_add() to
    make sure that the task is sent a SIGBUS.

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20201006210910.21062-5-tony.luck@intel.com

    Tony Luck
     
  • _ASM_EXTABLE_UA is a general exception entry to record the exception fixup
    for all exception spots between kernel and user space access.

    To enable recovery from machine checks while copying data from user
    addresses, it is necessary to distinguish the places that loop while
    copying data from those that copy a single byte/word/etc.

    Add a new macro _ASM_EXTABLE_CPY and use it in place of _ASM_EXTABLE_UA
    in the copy functions.

    Record the exception reason number in regs->ax so that the copy code
    can check whether an MCE triggered the fixup.

    The new fixup routine ex_handler_copy() is almost an exact copy of
    ex_handler_uaccess(); the difference is that it sets regs->ax to the
    trap number. Following patches use this to avoid trying to copy the
    remaining bytes from the tail of the copy and possibly hitting the
    poison again.
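
    A sketch of what such a fixup handler looks like per the description
    above (simplified and illustrative; only the regs->ax assignment is
    the point, and the function name and exact parameters here are not
    necessarily the kernel's):

        static bool ex_handler_copy_sketch(const struct exception_table_entry *fixup,
                                           struct pt_regs *regs, int trapnr)
        {
                regs->ip = ex_fixup_addr(fixup); /* resume at the fixup label */
                regs->ax = trapnr;               /* tell the copy tail why it stopped */
                return true;
        }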

    New mce.kflags bit MCE_IN_KERNEL_COPYIN will be used by mce_severity()
    calculation to indicate that a machine check is recoverable because the
    kernel was copying from user space.

    Signed-off-by: Youquan Song
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20201006210910.21062-4-tony.luck@intel.com

    Youquan Song
     

06 Oct, 2020

2 commits

  • The motivations for reworking memcpy_mcsafe() are that the benefit of
    doing slow and careful copies is obviated on newer CPUs, and that the
    current opt-in list of CPUs to instrument recovery is broken relative to
    those CPUs. There is no need to keep an opt-in list up to date on an
    ongoing basis if pmem/dax operations are instrumented for recovery by
    default. With recovery enabled by default, the old "mcsafe_key" opt-in
    to careful copying can be made a "fragile" opt-out, where the "fragile"
    list takes steps to not consume poison across cachelines.

    The discussion with Linus made clear that the current "_mcsafe" suffix
    was imprecise to a fault. The operations that are needed by pmem/dax are
    to copy from a source address that might throw #MC to a destination that
    may write-fault, if it is a user page.

    So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate
    the separate precautions taken on source and destination.
    copy_mc_to_kernel() is introduced as a non-SMAP version that does not
    expect write-faults on the destination, but is still prepared to abort
    with an error code upon taking #MC.

    The original copy_mc_fragile() implementation had negative performance
    implications since it did not use the fast-string instruction sequence
    to perform copies. For this reason copy_mc_to_kernel() fell back to
    plain memcpy() to preserve performance on platforms that did not indicate
    the capability to recover from machine check exceptions. However, that
    capability detection was not architectural, and now that some platforms
    can recover from fast-string consumption of memory errors, the memcpy()
    fallback causes these more capable platforms to fail.

    Introduce copy_mc_enhanced_fast_string() as the fast default
    implementation of copy_mc_to_kernel() and finalize the transition of
    copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'.
    With this in place, copy_mc_to_kernel() is fast and recovery-ready by
    default regardless of hardware capability.
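
    The resulting selection logic, roughly as I understand the end state of
    this series (the copy_mc_fragile_enabled flag name is from memory and
    may differ):

        unsigned long __must_check
        copy_mc_to_kernel(void *dst, const void *src, unsigned len)
        {
                if (copy_mc_fragile_enabled)            /* 'copy-carefully' quirk */
                        return copy_mc_fragile(dst, src, len);
                if (static_cpu_has(X86_FEATURE_ERMS))   /* fast-string recovery */
                        return copy_mc_enhanced_fast_string(dst, src, len);
                memcpy(dst, src, len);
                return 0;
        }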

    Thanks to Vivek for identifying that copy_user_generic() is not suitable
    as the copy_mc_to_user() backend since the #MC handler explicitly checks
    ex_has_fault_handler(). Thanks to the 0day robot for catching a
    performance bug in the x86/copy_mc_to_user implementation.

    [ bp: Add the "why" for this change from the 0/2th message, massage. ]

    Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()")
    Reported-by: Erwin Tsaur
    Reported-by: 0day robot
    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Tested-by: Erwin Tsaur
    Cc:
    Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • In reaction to a proposal to introduce a memcpy_mcsafe_fast()
    implementation, Linus points out that memcpy_mcsafe() is poorly named
    relative to communicating the scope of the interface. Specifically what
    addresses are valid to pass as source, destination, and what faults /
    exceptions are handled.

    Of particular concern is that even though x86 might be able to handle
    the semantics of copy_mc_to_user() with its common copy_user_generic()
    implementation, other archs likely need / want an explicit path for this
    case:

    On Fri, May 1, 2020 at 11:28 AM Linus Torvalds wrote:
    >
    > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams wrote:
    > >
    > > However now I see that copy_user_generic() works for the wrong reason.
    > > It works because the exception on the source address due to poison
    > > looks no different than a write fault on the user address to the
    > > caller, it's still just a short copy. So it makes copy_to_user() work
    > > for the wrong reason relative to the name.
    >
    > Right.
    >
    > And it won't work that way on other architectures. On x86, we have a
    > generic function that can take faults on either side, and we use it
    > for both cases (and for the "in_user" case too), but that's an
    > artifact of the architecture oddity.
    >
    > In fact, it's probably wrong even on x86 - because it can hide bugs -
    > but writing those things is painful enough that everybody prefers
    > having just one function.

    Replace a single top-level memcpy_mcsafe() with either
    copy_mc_to_user(), or copy_mc_to_kernel().
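
    In prototype terms, the split looks roughly like this (signatures per
    my reading of the series; annotations such as __must_check/__user may
    differ, and both return the number of bytes not copied, like
    copy_{to,from}_user()):

        unsigned long copy_mc_to_kernel(void *dst, const void *src, unsigned len);
        unsigned long copy_mc_to_user(void __user *dst, const void *src, unsigned len);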

    Introduce an x86 copy_mc_fragile() name as the rename for the
    low-level x86 implementation formerly named memcpy_mcsafe(). It is used
    as the slow / careful backend that is supplanted by a fast
    copy_mc_generic() in a follow-on patch.

    One side-effect of this reorganization is that separating copy_mc_64.S
    to its own file means that perf no longer needs to track dependencies
    for its memcpy_64.S benchmarks.

    [ bp: Massage a bit. ]

    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Acked-by: Michael Ellerman
    Cc:
    Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
    Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

27 Sep, 2020

1 commit

  • If we copy fewer than 8 bytes and the destination crosses a cache
    line, __copy_user_flushcache() would invalidate only the first cache
    line.

    This patch makes it invalidate the second cache line as well.
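
    The fix boils down to flushing every cache line that the destination
    range touches rather than only the first one. A sketch of such a range
    flush (modelled on the existing clean_cache_range() helper as I recall
    it; treat it as an illustration, not the exact patched code):

        static void clean_cache_range(void *addr, size_t size)
        {
                u16 clsize = boot_cpu_data.x86_clflush_size;
                unsigned long mask = clsize - 1;
                void *vend = addr + size;
                void *p;

                /* flush every cache line the range [addr, addr + size) touches */
                for (p = (void *)((unsigned long)addr & ~mask); p < vend; p += clsize)
                        clwb(p);
        }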

    Fixes: 0aed55af88345b ("x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations")
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Ingo Molnar
    Cc:
    Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2009161451140.21915@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     

09 Sep, 2020

1 commit

  • Stop providing the possibility to override the address space using
    set_fs() now that there is no need for it any more. To properly
    handle the TASK_SIZE_MAX checking for 4- vs 5-level page tables on
    x86, a new alternative is introduced which, just like the one in
    entry_64.S, has to use the hardcoded virtual address bits to escape
    the fact that TASK_SIZE_MAX isn't actually a constant when 5-level
    page tables are enabled.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Kees Cook
    Signed-off-by: Al Viro

    Christoph Hellwig
     

08 Sep, 2020

4 commits

  • Add a function to check whether an instruction has a REP prefix.
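
    A sketch of such a check (the helper added here is, to my recollection,
    named insn_has_rep_prefix(); the bounded prefix walk below follows the
    convention that the Dec 2020 fix at the top of this log settles on):

        static bool insn_has_rep_prefix(struct insn *insn)
        {
                int i;

                insn_get_prefixes(insn);

                for (i = 0; i < ARRAY_SIZE(insn->prefixes.bytes); i++) {
                        insn_byte_t p = insn->prefixes.bytes[i];

                        if (!p)
                                break;
                        if (p == 0xf2 || p == 0xf3)     /* REPNE / REP */
                                return true;
                }

                return false;
        }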

    Signed-off-by: Joerg Roedel
    Signed-off-by: Borislav Petkov
    Reviewed-by: Masami Hiramatsu
    Link: https://lkml.kernel.org/r/20200907131613.12703-12-joro@8bytes.org

    Joerg Roedel
     
  • Add a function to the instruction decoder which returns the pt_regs
    offset of the register specified in the reg field of the modrm byte.
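
    Usage sketch (the helper is, to my recollection, insn_get_modrm_reg_off(),
    returning a byte offset into struct pt_regs or a negative value when the
    register cannot be resolved; the error handling here is illustrative):

        static unsigned long read_modrm_reg(struct insn *insn, struct pt_regs *regs)
        {
                int off = insn_get_modrm_reg_off(insn, regs);

                if (off < 0)
                        return 0;       /* illustrative error handling */

                return *(unsigned long *)((unsigned long)regs + off);
        }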

    Signed-off-by: Joerg Roedel
    Signed-off-by: Borislav Petkov
    Acked-by: Masami Hiramatsu
    Link: https://lkml.kernel.org/r/20200907131613.12703-11-joro@8bytes.org

    Joerg Roedel
     
  • Factor out the code used to decode an instruction with the correct
    address and operand sizes to a helper function.

    No functional changes.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20200907131613.12703-10-joro@8bytes.org

    Joerg Roedel
     
  • Factor out the code to fetch the instruction from user-space to a helper
    function.

    No functional changes.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20200907131613.12703-9-joro@8bytes.org

    Joerg Roedel
     

03 Sep, 2020

1 commit

  • When CONFIG_RETPOLINE is disabled, Clang uses a jump table for the
    switch statement in cmdline_find_option (jump tables are disabled when
    CONFIG_RETPOLINE is enabled). This function is called very early in boot
    from sme_enable() if CONFIG_AMD_MEM_ENCRYPT is enabled. At this time,
    the kernel is still executing out of the identity mapping, but the jump
    table will contain virtual addresses.

    Fix this by disabling jump tables for cmdline.c when AMD_MEM_ENCRYPT is
    enabled.

    Signed-off-by: Arvind Sankar
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200903023056.3914690-1-nivedita@alum.mit.edu

    Arvind Sankar
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
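
    An illustrative before/after of the conversion (the switch below is
    made up for this note, not a hunk from the patch):

        static void classify_prefix(unsigned char prefix, bool *repne, bool *rep_like)
        {
                switch (prefix) {
                case 0xf2:
                        *repne = true;
                        fallthrough;    /* replaces an old "fall through" comment */
                case 0xf3:
                        *rep_like = true;
                        break;
                default:
                        break;
                }
        }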

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

21 Aug, 2020

4 commits

  • ... and fold handling of the misaligned case into it.

    Implementation note: we stash the "will we need to rol8 the sum in the end"
    flag into the MSB of %rcx (the lower 32 bits are used for length); the rest
    is pretty straightforward.

    Signed-off-by: Al Viro

    Al Viro
     
  • ... and don't bother zeroing destination on error

    Signed-off-by: Al Viro

    Al Viro
     
  • All callers of these primitives will
    * discard anything we might've copied in case of error
    * ignore the csum value in case of error
    * always pass 0xffffffff as the initial sum, so the
    resulting csum value (in case of success, that is) will never be 0.

    That suggests the following calling conventions:
    * don't pass err_ptr - just return 0 on error.
    * don't bother with zeroing destination, etc. in case of error
    * don't pass the initial sum - just use 0xffffffff.
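
    In prototype terms, the end state of the series reads roughly like this
    (per my understanding; the wrappers supply the 0xffffffff seed, so a
    successful copy can never return a 0 csum and 0 can safely mean failure):

        __wsum csum_and_copy_from_user(const void __user *src, void *dst, int len);
        __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len);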

    This commit does the minimal conversion in the instances of csum_and_copy_...();
    the changes of actual asm code behind them are done later in the series.
    Note that this asm code is often shared with csum_partial_copy_nocheck();
    the difference is that csum_partial_copy_nocheck() passes 0 for initial
    sum while csum_and_copy_..._user() pass 0xffffffff. Fortunately, we are
    free to pass 0xffffffff in all cases and subsequent patches will use that
    freedom without any special comments.

    A part that could be split off: parisc and uml/i386 claimed to have
    csum_and_copy_to_user() instances of their own, but those were identical
    to the generic one, so we simply drop them. Not sure if it's worth
    a separate commit...

    Signed-off-by: Al Viro

    Al Viro
     
  • It's always 0. Note that we theoretically could use ~0U as well -
    result will be the same modulo 0xffff, _if_ the damn thing did the
    right thing for any value of initial sum; later we'll make use of
    that when convenient.

    However, unlike csum_and_copy_..._user(), there are instances that
    did not work for arbitrary initial sums; c6x is one such.

    Signed-off-by: Al Viro

    Al Viro
     

07 Jul, 2020

1 commit

  • Some Makefiles already pass -fno-stack-protector unconditionally.
    For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile.

    There have been no problem reports so far about hard-coding this option,
    so we can assume all supported compilers know -fno-stack-protector.

    GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN)

    Get rid of cc-option from -fno-stack-protector.

    Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'.

    Note:
    arch/mips/vdso/Makefile adds -fno-stack-protector twice, first
    unconditionally, and second conditionally. I removed the second one.

    Signed-off-by: Masahiro Yamada
    Reviewed-by: Kees Cook
    Acked-by: Ard Biesheuvel
    Reviewed-by: Nick Desaulniers

    Masahiro Yamada
     

29 Jun, 2020

1 commit

  • Pull x86 fixes from Borislav Petkov:

    - AMD Memory bandwidth counter width fix, by Babu Moger.

    - Use the proper length type in the 32-bit truncate() syscall variant,
    by Jiri Slaby.

    - Reinit IA32_FEAT_CTL during wakeup to fix the case where after
    resume, VMXON would #GP due to VMX not being properly enabled, by
    Sean Christopherson.

    - Fix a static checker warning in the resctrl code, by Dan Carpenter.

    - Add a CR4 pinning mask for bits which cannot change after boot, by
    Kees Cook.

    - Align the start of the loop of __clear_user() to 16 bytes, to improve
    performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.

    * tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/asm/64: Align start of __clear_user() loop to 16-bytes
    x86/cpu: Use pinning mask for CR4 bits needing to be 0
    x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get()
    x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
    syscalls: Fix offset type of ksys_ftruncate()
    x86/resctrl: Fix memory bandwidth counter width for AMD

    Linus Torvalds
     

25 Jun, 2020

1 commit

  • vmlinux.o: warning: objtool: fixup_bad_iret()+0x8e: call to memcpy() leaves .noinstr.text section

    Worse, with KASAN enabled there is no telling what memcpy() actually is.
    Force the use of __memcpy(), which is our assembly implementation.

    Reported-by: Marco Elver
    Suggested-by: Marco Elver
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200618144801.760070502@infradead.org

    Peter Zijlstra
     

20 Jun, 2020

1 commit

  • x86 CPUs can suffer severe performance drops if a tight loop, such as
    the ones in __clear_user(), straddles a 16-byte instruction fetch
    window, or worse, a 64-byte cacheline. This issue was discovered in the
    SUSE kernel with the following commit,

    1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")

    which increased the code object size from 10 bytes to 15 bytes and
    caused the 8-byte copy loop in __clear_user() to be split across a
    64-byte cacheline.

    Aligning the start of the loop to 16 bytes makes this fit neatly inside
    a single instruction fetch window again and restores the performance of
    __clear_user() which is used heavily when reading from /dev/zero.

    Here are some numbers from running libmicro's read_z* and pread_z*
    microbenchmarks which read from /dev/zero:

    Zen 1 (Naples)

    libmicro-file
                                 5.7.0-rc6           5.7.0-rc6              5.7.0-rc6
                                 (baseline)          revert-1153933703d9+   align16+
    Time mean95-pread_z100k      9.9195 (  0.00%)    5.9856 ( 39.66%)       5.9938 ( 39.58%)
    Time mean95-pread_z10k       1.1378 (  0.00%)    0.7450 ( 34.52%)       0.7467 ( 34.38%)
    Time mean95-pread_z1k        0.2623 (  0.00%)    0.2251 ( 14.18%)       0.2252 ( 14.15%)
    Time mean95-pread_zw100k     9.9974 (  0.00%)    6.0648 ( 39.34%)       6.0756 ( 39.23%)
    Time mean95-read_z100k       9.8940 (  0.00%)    5.9885 ( 39.47%)       5.9994 ( 39.36%)
    Time mean95-read_z10k        1.1394 (  0.00%)    0.7483 ( 34.33%)       0.7482 ( 34.33%)

    Note that this doesn't affect Haswell or Broadwell microarchitectures
    which seem to avoid the alignment issue by executing the loop straight
    out of the Loop Stream Detector (verified using perf events).

    Fixes: 1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
    Signed-off-by: Matt Fleming
    Signed-off-by: Borislav Petkov
    Cc: # v4.19+
    Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk

    Matt Fleming
     

12 Jun, 2020

1 commit

  • Merge the state of the locking kcsan branch before the read/write_once()
    and the atomics modifications got merged.

    Squash the fallout of the rebase on top of the read/write once and atomic
    fallback work into the merge. The history of the original branch is
    preserved in tag locking-kcsan-2020-06-02.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

04 Jun, 2020

1 commit

  • Pull x86 timer updates from Thomas Gleixner:
    "X86 timer specific updates:

    - Add TPAUSE based delay which allows the CPU to enter an optimized
    power state while waiting for the delay to pass. The delay is based
    on TSC cycles.

    - Add tsc_early_khz command line parameter to work around the problem
    that overclocked CPUs can report the wrong frequency via CPUID.16h,
    which causes the refined calibration to fail because the delta to
    the initial frequency value is too big. With the parameter, users
    can provide a halfway accurate initial value"

    * tag 'x86-timers-2020-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tsc: Add tsc_early_khz command line parameter
    x86/delay: Introduce TPAUSE delay
    x86/delay: Refactor delay_mwaitx() for TPAUSE support
    x86/delay: Preparatory code cleanup

    Linus Torvalds
     

02 Jun, 2020

1 commit

  • Pull uaccess/csum updates from Al Viro:
    "Regularize the sitation with uaccess checksum primitives:

    - fold csum_partial_... into csum_and_copy_..._user()

    - on x86 collapse several access_ok()/stac()/clac() into
    user_access_begin()/user_access_end()"

    * 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    default csum_and_copy_to_user(): don't bother with access_ok()
    take the dummy csum_and_copy_from_user() into net/checksum.h
    arm: switch to csum_and_copy_from_user()
    sh32: convert to csum_and_copy_from_user()
    m68k: convert to csum_and_copy_from_user()
    xtensa: switch to providing csum_and_copy_from_user()
    sparc: switch to providing csum_and_copy_from_user()
    parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
    alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
    ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
    ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user()
    x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}()
    x86: switch both 32bit and 64bit to providing csum_and_copy_from_user()
    x86_64: csum_..._copy_..._user(): switch to unsafe_..._user()
    get rid of csum_partial_copy_to_user()

    Linus Torvalds
     

30 May, 2020

2 commits


07 May, 2020

3 commits

  • TPAUSE instructs the processor to enter an implementation-dependent
    optimized state. The instruction execution wakes up when the time-stamp
    counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
    The instruction execution also wakes up due to the expiration of
    the operating system time limit, or due to an external interrupt or
    an exception such as a debug exception or a machine check exception.

    TPAUSE offers a choice of two lower power states:
    1. Light-weight power/performance optimized state C0.1
    2. Improved power/performance optimized state C0.2

    This way, it can save power with low wake-up latency in comparison to a
    spinloop-based delay. The selection between the two states is governed
    by the input register.

    TPAUSE is available on processors with X86_FEATURE_WAITPKG.
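
    A minimal sketch of a TPAUSE-based delay (this assumes a
    __tpause(state, edx, eax) wrapper around the instruction and a
    TPAUSE_C02_STATE constant as introduced by this series; the loop
    re-arms the wait if something wakes the CPU early):

        static void delay_tpause(u64 cycles)
        {
                u64 end = rdtsc() + cycles;

                do {
                        /* wait until TSC >= EDX:EAX, in the deeper C0.2 state */
                        __tpause(TPAUSE_C02_STATE, upper_32_bits(end), lower_32_bits(end));
                } while (rdtsc() < end);
        }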

    Co-developed-by: Fenghua Yu
    Signed-off-by: Fenghua Yu
    Signed-off-by: Kyung Min Park
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Link: https://lkml.kernel.org/r/1587757076-30337-4-git-send-email-kyung.min.park@intel.com

    Kyung Min Park
     
  • Refactor code to make it easier to add a new model specific function to
    delay for a number of cycles.

    No functional change.

    Co-developed-by: Fenghua Yu
    Signed-off-by: Fenghua Yu
    Signed-off-by: Kyung Min Park
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Link: https://lkml.kernel.org/r/1587757076-30337-3-git-send-email-kyung.min.park@intel.com

    Kyung Min Park
     
  • The naming conventions in the delay code are confusing at best.

    All delay variants use a loops argument and/or variable which originates
    from the original delay_loop() implementation. But all variants except
    delay_loop() are based on TSC cycles.

    Rename the argument to cycles and make it type u64 to avoid these weird
    expansions to u64 in the functions.

    Rename MWAITX_MAX_LOOPS to MWAITX_MAX_WAIT_CYCLES for the same reason
    and fixup the comment of delay_mwaitx() as well.

    Mark the delay_fn function pointer __ro_after_init and fixup the comment
    for it.

    No functional change and preparation for the upcoming TPAUSE based delay
    variant.

    [ Kyung Min Park: Added __init to use_tsc_delay() ]

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Kyung Min Park
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1587757076-30337-2-git-send-email-kyung.min.park@intel.com

    Thomas Gleixner
     

01 May, 2020

3 commits

  • Currently objtool cannot understand retpolines, and thus cannot
    generate ORC unwind information for them. This means that we cannot
    unwind from the middle of a retpoline.

    The recent ANNOTATE_INTRA_FUNCTION_CALL and UNWIND_HINT_RET_OFFSET
    support in objtool enables it to understand the basic retpoline
    construct. A further problem is that the ORC unwind information is
    alternative-invariant; IOW, every alternative should have the same
    ORC, and retpolines obviously violate this. This means we need to
    out-of-line them.

    Since all GCC-generated code already uses out-of-line retpolines, this
    should not affect performance much, if at all.

    This will enable objtool to generate valid ORC data for the
    out-of-line copies, which means we can correctly and reliably unwind
    through a retpoline.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200428191700.210835357@infradead.org

    Peter Zijlstra
     
  • In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line
    versions of the retpoline magic, we need to remove the '%' from the
    argument, such that we can paste it onto symbol names.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200428191700.151623523@infradead.org

    Peter Zijlstra
     
  • Because of how KSYM works, we need one declaration per line. Seeing
    how we're going to be doubling the number of retpoline symbols,
    simplify the machinery in order to avoid having to copy/paste even
    more.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200428191700.091696925@infradead.org

    Peter Zijlstra
     

23 Apr, 2020

1 commit

  • For historical reasons some architectures call their csum_and_copy_to_user()
    csum_partial_copy_to_user() instead (and supply a macro defining the
    former as the latter). Those are the last remnants of an old experiment
    that went nowhere; time to bury them. Rename those to csum_and_copy_to_user()
    and get rid of the macros.

    Signed-off-by: Al Viro

    Al Viro