06 Dec, 2020

1 commit

  • Since insn.prefixes.nbytes can be bigger than the size of
    insn.prefixes.bytes[] when a prefix is repeated, the proper check must
    be

    insn.prefixes.bytes[i] != 0 and i < 4

    instead of using insn.prefixes.nbytes. Use the new
    for_each_insn_prefix() macro, which does this correctly.
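
    A minimal usage sketch of the new macro (the (insn, idx, prefix)
    argument order is my understanding of the helper added here; checking
    for the 0x66 prefix is just an example):

        static bool insn_has_opsize_override(struct insn *insn)
        {
                insn_byte_t prefix;
                int i;

                for_each_insn_prefix(insn, i, prefix) {
                        if (prefix == 0x66)     /* operand-size override prefix */
                                return true;
                }

                return false;
        }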

    Debugged by Kees Cook.

    [ bp: Massage commit message. ]

    Fixes: 32d0b95300db ("x86/insn-eval: Add utility functions to get segment selector")
    Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Borislav Petkov
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/160697104969.3146288.16329307586428270032.stgit@devnote2

    Masami Hiramatsu
     

04 Nov, 2020

1 commit

  • Commit

    393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions")

    added .weak directives to arch/x86/lib/mem*_64.S instead of changing the
    existing ENTRY macros to WEAK. This can lead to the assembly snippet

    .weak memcpy
    ...
    .globl memcpy

    which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy
    with LLVM's integrated assembler before LLVM 12. LLVM 12 (since
    https://reviews.llvm.org/D90108) will error on such an overridden symbol
    binding.

    Commit

    ef1e03152cb0 ("x86/asm: Make some functions local")

    changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which
    was ineffective due to the preceding .weak directive.

    Use the appropriate SYM_FUNC_START_WEAK instead.

    Fixes: 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions")
    Fixes: ef1e03152cb0 ("x86/asm: Make some functions local")
    Reported-by: Sami Tolvanen
    Signed-off-by: Fangrui Song
    Signed-off-by: Borislav Petkov
    Reviewed-by: Nick Desaulniers
    Tested-by: Nathan Chancellor
    Tested-by: Nick Desaulniers
    Cc:
    Link: https://lkml.kernel.org/r/20201103012358.168682-1-maskray@google.com

    Fangrui Song
     

23 Oct, 2020

1 commit

  • Pull initial set_fs() removal from Al Viro:
    "Christoph's set_fs base series + fixups"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Allow a NULL pos pointer to __kernel_read
    fs: Allow a NULL pos pointer to __kernel_write
    powerpc: remove address space overrides using set_fs()
    powerpc: use non-set_fs based maccess routines
    x86: remove address space overrides using set_fs()
    x86: make TASK_SIZE_MAX usable from assembly code
    x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
    lkdtm: remove set_fs-based tests
    test_bitmap: remove user bitmap tests
    uaccess: add infrastructure for kernel builds with set_fs()
    fs: don't allow splice read/write without explicit ops
    fs: don't allow kernel reads and writes without iter ops
    sysctl: Convert to iter interfaces
    proc: add a read_iter method to proc proc_ops
    proc: cleanup the compat vs no compat file ops
    proc: remove a level of indentation in proc_get_inode

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • Pull x86 SEV-ES support from Borislav Petkov:
    "SEV-ES enhances the current guest memory encryption support called SEV
    by also encrypting the guest register state, making the registers
    inaccessible to the hypervisor by en-/decrypting them on world
    switches. Thus, it adds additional protection to Linux guests against
    exfiltration, control flow and rollback attacks.

    With SEV-ES, the guest is in full control of what registers the
    hypervisor can access. This is provided by a guest-host exchange
    mechanism based on a new exception vector called VMM Communication
    Exception (#VC), a new instruction called VMGEXIT and a shared
    Guest-Host Communication Block which is a decrypted page shared
    between the guest and the hypervisor.

    Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
    so in order for that exception mechanism to work, the early x86 init
    code needed to be made able to handle exceptions, which, in itself,
    brings a bunch of very nice cleanups and improvements to the early
    boot code like an early page fault handler, allowing for on-demand
    building of the identity mapping. With that, !KASLR configurations do
    not use the EFI page table anymore but switch to a kernel-controlled
    one.

    The main part of this series adds the support for that new exchange
    mechanism. The goal has been to keep this as separate as possible
    from the core x86 code by concentrating the machinery in two
    SEV-ES-specific files:

    arch/x86/kernel/sev-es-shared.c
    arch/x86/kernel/sev-es.c

    Other interaction with core x86 code has been kept to a minimum and
    behind static keys to minimize the performance impact on !SEV-ES
    setups.

    Work by Joerg Roedel and Thomas Lendacky and others"

    * tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
    x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
    x86/sev-es: Check required CPU features for SEV-ES
    x86/efi: Add GHCB mappings when SEV-ES is active
    x86/sev-es: Handle NMI State
    x86/sev-es: Support CPU offline/online
    x86/head/64: Don't call verify_cpu() on starting APs
    x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
    x86/realmode: Setup AP jump table
    x86/realmode: Add SEV-ES specific trampoline entry point
    x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
    x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
    x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
    x86/sev-es: Handle #DB Events
    x86/sev-es: Handle #AC Events
    x86/sev-es: Handle VMMCALL Events
    x86/sev-es: Handle MWAIT/MWAITX Events
    x86/sev-es: Handle MONITOR/MONITORX Events
    x86/sev-es: Handle INVD Events
    x86/sev-es: Handle RDPMC Events
    x86/sev-es: Handle RDTSC(P) Events
    ...

    Linus Torvalds
     

13 Oct, 2020

4 commits

  • Instead of inlining the stac/mov/clac sequence (which also requires
    individual exception table entries and several asm instruction
    alternatives entries), just generate "call __put_user_nocheck_X" for the
    __put_user() cases, the same way we changed __get_user earlier.

    Unlike the get_user() case, we didn't have the same nice infrastructure
    to just generate the call with a single case, so this actually has to
    change some of the infrastructure in order to do this. But that only
    cleans up the code further.

    So now, instead of using a case statement for the sizes, we just do the
    same thing we've done on the get_user() side for a long time: use the
    size as an immediate constant to the asm, and generate the asm that way
    directly.

    In order to handle the special case of 64-bit data on a 32-bit kernel, I
    needed to change the calling convention slightly: the data is passed in
    %eax[:%edx], the pointer in %ecx, and the return value is also returned
    in %ecx. It used to be returned in %eax, but because %eax can now be a
    double-register input, we don't want to mix that with a
    single-register output.

    The actual low-level asm is easier to handle: we'll just share the code
    between the checking and non-checking case, with the non-checking case
    jumping into the middle of the function. That may sound a bit too
    special, but this code is all very very special anyway, so...

    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Instead of inlining the whole stac/lfence/mov/clac sequence (which also
    requires individual exception table entries and several asm instruction
    alternatives entries), just generate "call __get_user_nocheck_X" for the
    __get_user() cases.

    We can use all the same infrastructure that we already use for the
    regular "get_user()", and the end result is simpler source code, and
    much simpler code generation.

    It also means that when I introduce asm goto with input for
    "unsafe_get_user()", there are no nasty interactions with the
    __get_user() code.

    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull copy_and_csum cleanups from Al Viro:
    "Saner calling conventions for csum_and_copy_..._user() and friends"

    [ Removing 800+ lines of code and cleaning stuff up is good - Linus ]

    * 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    ppc: propagate the calling conventions change down to csum_partial_copy_generic()
    amd64: switch csum_partial_copy_generic() to new calling conventions
    sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
    xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
    mips: propagate the calling convention change down into __csum_partial_copy_..._user()
    mips: __csum_partial_copy_kernel() has no users left
    mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
    sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
    i386: propagate the calling conventions change down to csum_partial_copy_generic()
    sh: propage the calling conventions change down to csum_partial_copy_generic()
    m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
    arm: propagate the calling convention changes down to csum_partial_copy_from_user()
    alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
    saner calling conventions for csum_and_copy_..._user()
    csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
    csum_partial_copy_nocheck(): drop the last argument
    unify generic instances of csum_partial_copy_nocheck()
    icmp_push_reply(): reorder adding the checksum up
    skb_copy_and_csum_bits(): don't bother with the last argument

    Linus Torvalds
     
  • Pull RAS updates from Borislav Petkov:

    - Extend MCE recovery in kernel space to processes which encounter an
    MCE while copying from user memory, by sending them a SIGBUS on return
    to user space and unmapping the faulty memory, by Tony Luck and
    Youquan Song.

    - memcpy_mcsafe() rework, splitting the functionality into
    copy_mc_to_user() and copy_mc_to_kernel(). As a result, this enables
    support for new hardware which can recover from a machine check
    encountered during a fast-string copy and makes that the default,
    while letting older hardware which does not support that advanced
    recovery opt in to the old, fragile, slow variant, by Dan Williams.

    - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.

    - Do not use MSR-tracing accessors in #MC context and flag any fault
    while accessing MCA architectural MSRs as an architectural violation
    with the hope that such hw/fw misdesigns are caught early during the
    hw eval phase and they don't make it into production.

    - Misc fixes, improvements and cleanups, as always.

    * tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
    x86/mce: Decode a kernel instruction to determine if it is copying from user
    x86/mce: Recover from poison found while copying from user space
    x86/mce: Avoid tail copy when machine check terminated a copy from user
    x86/mce: Add _ASM_EXTABLE_CPY for copy user access
    x86/mce: Provide method to find out the type of an exception handler
    x86/mce: Pass pointer to saved pt_regs to severity calculation routines
    x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
    x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
    x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
    x86/mce: Add Skylake quirk for patrol scrub reported errors
    RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
    x86/mce: Annotate mce_rd/wrmsrl() with noinstr
    x86/mce/dev-mcelog: Do not update kflags on AMD systems
    x86/mce: Stop mce_reign() from re-computing severity for every CPU
    x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
    x86/mce: Increase maximum number of banks to 64
    x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
    x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
    RAS/CEC: Fix cec_init() prototype

    Linus Torvalds
     

07 Oct, 2020

2 commits

  • In the page fault case it is ok to see if a few more unaligned bytes
    can be copied from the source address. Worst case is that the page fault
    will be triggered again.

    Machine checks are more serious. Just give up at the point where the
    main copy loop triggered the #MC and return from the copy code as if
    the copy succeeded. The machine check handler will use task_work_add() to
    make sure that the task is sent a SIGBUS.

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20201006210910.21062-5-tony.luck@intel.com

    Tony Luck
     
  • _ASM_EXTABLE_UA is a general exception entry to record the exception fixup
    for all exception spots between kernel and user space access.

    To enable recovery from machine checks while copying data from user
    addresses, it is necessary to distinguish the places that loop while
    copying data from those that copy a single byte/word/etc.

    Add a new macro _ASM_EXTABLE_CPY and use it in place of _ASM_EXTABLE_UA
    in the copy functions.

    Record the exception reason number in regs->ax so that the copy code
    can check whether an MCE triggered the fixup.

    The new fixup routine ex_handler_copy() is almost an exact copy of
    ex_handler_uaccess(); the difference is that it sets regs->ax to the
    trap number. Following patches use this to avoid trying to copy the
    remaining bytes from the tail of the copy and possibly hitting the
    poison again.
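
    A sketch of what such a fixup handler looks like per the description
    above (simplified and illustrative; only the regs->ax assignment is
    the point, and the function name and exact parameters here are not
    necessarily the kernel's):

        static bool ex_handler_copy_sketch(const struct exception_table_entry *fixup,
                                           struct pt_regs *regs, int trapnr)
        {
                regs->ip = ex_fixup_addr(fixup); /* resume at the fixup label */
                regs->ax = trapnr;               /* tell the copy tail why it stopped */
                return true;
        }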

    New mce.kflags bit MCE_IN_KERNEL_COPYIN will be used by mce_severity()
    calculation to indicate that a machine check is recoverable because the
    kernel was copying from user space.

    Signed-off-by: Youquan Song
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20201006210910.21062-4-tony.luck@intel.com

    Youquan Song
     

06 Oct, 2020

2 commits

  • The motivations for reworking memcpy_mcsafe() are that the benefit of
    doing slow and careful copies is obviated on newer CPUs, and that the
    current opt-in list of CPUs to instrument recovery is broken relative to
    those CPUs. There is no need to keep an opt-in list up to date on an
    ongoing basis if pmem/dax operations are instrumented for recovery by
    default. With recovery enabled by default, the old "mcsafe_key" opt-in
    to careful copying can be made a "fragile" opt-out, where the "fragile"
    list takes steps to not consume poison across cachelines.

    The discussion with Linus made clear that the current "_mcsafe" suffix
    was imprecise to a fault. The operations that are needed by pmem/dax are
    to copy from a source address that might throw #MC to a destination that
    may write-fault, if it is a user page.

    So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate
    the separate precautions taken on source and destination.
    copy_mc_to_kernel() is introduced as a non-SMAP version that does not
    expect write-faults on the destination, but is still prepared to abort
    with an error code upon taking #MC.

    The original copy_mc_fragile() implementation had negative performance
    implications since it did not use the fast-string instruction sequence
    to perform copies. For this reason copy_mc_to_kernel() fell back to
    plain memcpy() to preserve performance on platforms that did not indicate
    the capability to recover from machine check exceptions. However, that
    capability detection was not architectural, and now that some platforms
    can recover from fast-string consumption of memory errors, the memcpy()
    fallback causes these more capable platforms to fail.

    Introduce copy_mc_enhanced_fast_string() as the fast default
    implementation of copy_mc_to_kernel() and finalize the transition of
    copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'.
    With this in place, copy_mc_to_kernel() is fast and recovery-ready by
    default regardless of hardware capability.
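
    The resulting selection logic, roughly as I understand the end state of
    this series (the copy_mc_fragile_enabled flag name is from memory and
    may differ):

        unsigned long __must_check
        copy_mc_to_kernel(void *dst, const void *src, unsigned len)
        {
                if (copy_mc_fragile_enabled)            /* 'copy-carefully' quirk */
                        return copy_mc_fragile(dst, src, len);
                if (static_cpu_has(X86_FEATURE_ERMS))   /* fast-string recovery */
                        return copy_mc_enhanced_fast_string(dst, src, len);
                memcpy(dst, src, len);
                return 0;
        }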

    Thanks to Vivek for identifying that copy_user_generic() is not suitable
    as the copy_mc_to_user() backend since the #MC handler explicitly checks
    ex_has_fault_handler(). Thanks to the 0day robot for catching a
    performance bug in the x86/copy_mc_to_user implementation.

    [ bp: Add the "why" for this change from the 0/2th message, massage. ]

    Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()")
    Reported-by: Erwin Tsaur
    Reported-by: 0day robot
    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Tested-by: Erwin Tsaur
    Cc:
    Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • In reaction to a proposal to introduce a memcpy_mcsafe_fast()
    implementation, Linus points out that memcpy_mcsafe() is poorly named
    relative to communicating the scope of the interface. Specifically what
    addresses are valid to pass as source, destination, and what faults /
    exceptions are handled.

    Of particular concern is that even though x86 might be able to handle
    the semantics of copy_mc_to_user() with its common copy_user_generic()
    implementation, other archs likely need / want an explicit path for this
    case:

    On Fri, May 1, 2020 at 11:28 AM Linus Torvalds wrote:
    >
    > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams wrote:
    > >
    > > However now I see that copy_user_generic() works for the wrong reason.
    > > It works because the exception on the source address due to poison
    > > looks no different than a write fault on the user address to the
    > > caller, it's still just a short copy. So it makes copy_to_user() work
    > > for the wrong reason relative to the name.
    >
    > Right.
    >
    > And it won't work that way on other architectures. On x86, we have a
    > generic function that can take faults on either side, and we use it
    > for both cases (and for the "in_user" case too), but that's an
    > artifact of the architecture oddity.
    >
    > In fact, it's probably wrong even on x86 - because it can hide bugs -
    > but writing those things is painful enough that everybody prefers
    > having just one function.

    Replace a single top-level memcpy_mcsafe() with either
    copy_mc_to_user(), or copy_mc_to_kernel().
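
    In prototype terms, the split looks roughly like this (signatures per
    my reading of the series; annotations such as __must_check/__user may
    differ, and both return the number of bytes not copied, like
    copy_{to,from}_user()):

        unsigned long copy_mc_to_kernel(void *dst, const void *src, unsigned len);
        unsigned long copy_mc_to_user(void __user *dst, const void *src, unsigned len);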

    Introduce an x86 copy_mc_fragile() name as the rename for the
    low-level x86 implementation formerly named memcpy_mcsafe(). It is used
    as the slow / careful backend that is supplanted by a fast
    copy_mc_generic() in a follow-on patch.

    One side-effect of this reorganization is that separating copy_mc_64.S
    to its own file means that perf no longer needs to track dependencies
    for its memcpy_64.S benchmarks.

    [ bp: Massage a bit. ]

    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Acked-by: Michael Ellerman
    Cc:
    Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
    Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

27 Sep, 2020

1 commit

  • If we copy fewer than 8 bytes and the destination crosses a cache
    line, __copy_user_flushcache() would invalidate only the first cache
    line.

    This patch makes it invalidate the second cache line as well.
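
    The fix boils down to flushing every cache line that the destination
    range touches rather than only the first one. A sketch of such a range
    flush (modelled on the existing clean_cache_range() helper as I recall
    it; treat it as an illustration, not the exact patched code):

        static void clean_cache_range(void *addr, size_t size)
        {
                u16 clsize = boot_cpu_data.x86_clflush_size;
                unsigned long mask = clsize - 1;
                void *vend = addr + size;
                void *p;

                /* flush every cache line the range [addr, addr + size) touches */
                for (p = (void *)((unsigned long)addr & ~mask); p < vend; p += clsize)
                        clwb(p);
        }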

    Fixes: 0aed55af88345b ("x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations")
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Ingo Molnar
    Cc:
    Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2009161451140.21915@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     

09 Sep, 2020

1 commit

  • Stop providing the possibility to override the address space using
    set_fs() now that there is no need for it any more. To properly
    handle the TASK_SIZE_MAX checking for 4- vs 5-level page tables on
    x86, a new alternative is introduced which, just like the one in
    entry_64.S, has to use the hardcoded virtual address bits to escape
    the fact that TASK_SIZE_MAX isn't actually a constant when 5-level
    page tables are enabled.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Kees Cook
    Signed-off-by: Al Viro

    Christoph Hellwig
     

08 Sep, 2020

4 commits

  • Add a function to check whether an instruction has a REP prefix.
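
    A sketch of such a check (the helper added here is, to my recollection,
    named insn_has_rep_prefix(); the bounded prefix walk below follows the
    convention that the Dec 2020 fix at the top of this log settles on):

        static bool insn_has_rep_prefix(struct insn *insn)
        {
                int i;

                insn_get_prefixes(insn);

                for (i = 0; i < ARRAY_SIZE(insn->prefixes.bytes); i++) {
                        insn_byte_t p = insn->prefixes.bytes[i];

                        if (!p)
                                break;
                        if (p == 0xf2 || p == 0xf3)     /* REPNE / REP */
                                return true;
                }

                return false;
        }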

    Signed-off-by: Joerg Roedel
    Signed-off-by: Borislav Petkov
    Reviewed-by: Masami Hiramatsu
    Link: https://lkml.kernel.org/r/20200907131613.12703-12-joro@8bytes.org

    Joerg Roedel
     
  • Add a function to the instruction decoder which returns the pt_regs
    offset of the register specified in the reg field of the modrm byte.
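
    Usage sketch (the helper is, to my recollection, insn_get_modrm_reg_off(),
    returning a byte offset into struct pt_regs or a negative value when the
    register cannot be resolved; the error handling here is illustrative):

        static unsigned long read_modrm_reg(struct insn *insn, struct pt_regs *regs)
        {
                int off = insn_get_modrm_reg_off(insn, regs);

                if (off < 0)
                        return 0;       /* illustrative error handling */

                return *(unsigned long *)((unsigned long)regs + off);
        }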

    Signed-off-by: Joerg Roedel
    Signed-off-by: Borislav Petkov
    Acked-by: Masami Hiramatsu
    Link: https://lkml.kernel.org/r/20200907131613.12703-11-joro@8bytes.org

    Joerg Roedel
     
  • Factor out the code used to decode an instruction with the correct
    address and operand sizes to a helper function.

    No functional changes.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20200907131613.12703-10-joro@8bytes.org

    Joerg Roedel
     
  • Factor out the code to fetch the instruction from user-space to a helper
    function.

    No functional changes.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20200907131613.12703-9-joro@8bytes.org

    Joerg Roedel
     

03 Sep, 2020

1 commit

  • When CONFIG_RETPOLINE is disabled, Clang uses a jump table for the
    switch statement in cmdline_find_option (jump tables are disabled when
    CONFIG_RETPOLINE is enabled). This function is called very early in boot
    from sme_enable() if CONFIG_AMD_MEM_ENCRYPT is enabled. At this time,
    the kernel is still executing out of the identity mapping, but the jump
    table will contain virtual addresses.

    Fix this by disabling jump tables for cmdline.c when AMD_MEM_ENCRYPT is
    enabled.

    Signed-off-by: Arvind Sankar
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200903023056.3914690-1-nivedita@alum.mit.edu

    Arvind Sankar
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
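
    An illustrative before/after of the conversion (the switch below is
    made up for this note, not a hunk from the patch):

        static void classify_prefix(unsigned char prefix, bool *repne, bool *rep_like)
        {
                switch (prefix) {
                case 0xf2:
                        *repne = true;
                        fallthrough;    /* replaces an old "fall through" comment */
                case 0xf3:
                        *rep_like = true;
                        break;
                default:
                        break;
                }
        }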

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

21 Aug, 2020

4 commits

  • ... and fold handling of the misaligned case into it.

    Implementation note: we stash the "will we need to rol8 the sum in the end"
    flag into the MSB of %rcx (the lower 32 bits are used for length); the rest
    is pretty straightforward.

    Signed-off-by: Al Viro

    Al Viro
     
  • ... and don't bother zeroing destination on error

    Signed-off-by: Al Viro

    Al Viro
     
  • All callers of these primitives will
    * discard anything we might've copied in case of error
    * ignore the csum value in case of error
    * always pass 0xffffffff as the initial sum, so the
    resulting csum value (in case of success, that is) will never be 0.

    That suggests the following calling conventions:
    * don't pass err_ptr - just return 0 on error.
    * don't bother with zeroing destination, etc. in case of error
    * don't pass the initial sum - just use 0xffffffff.
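
    In prototype terms, the end state of the series reads roughly like this
    (per my understanding; the wrappers supply the 0xffffffff seed, so a
    successful copy can never return a 0 csum and 0 can safely mean failure):

        __wsum csum_and_copy_from_user(const void __user *src, void *dst, int len);
        __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len);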

    This commit does the minimal conversion in the instances of csum_and_copy_...();
    the changes of actual asm code behind them are done later in the series.
    Note that this asm code is often shared with csum_partial_copy_nocheck();
    the difference is that csum_partial_copy_nocheck() passes 0 for initial
    sum while csum_and_copy_..._user() pass 0xffffffff. Fortunately, we are
    free to pass 0xffffffff in all cases and subsequent patches will use that
    freedom without any special comments.

    A part that could be split off: parisc and uml/i386 claimed to have
    csum_and_copy_to_user() instances of their own, but those were identical
    to the generic one, so we simply drop them. Not sure if it's worth
    a separate commit...

    Signed-off-by: Al Viro

    Al Viro
     
  • It's always 0. Note that we theoretically could use ~0U as well -
    result will be the same modulo 0xffff, _if_ the damn thing did the
    right thing for any value of initial sum; later we'll make use of
    that when convenient.

    However, unlike csum_and_copy_..._user(), there are instances that
    did not work for arbitrary initial sums; c6x is one such.

    Signed-off-by: Al Viro

    Al Viro
     

07 Jul, 2020

1 commit

  • Some Makefiles already pass -fno-stack-protector unconditionally.
    For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile.

    There have been no problem reports so far about hard-coding this option,
    so we can assume all supported compilers know -fno-stack-protector.

    GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN)

    Get rid of cc-option from -fno-stack-protector.

    Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'.

    Note:
    arch/mips/vdso/Makefile adds -fno-stack-protector twice, first
    unconditionally, and second conditionally. I removed the second one.

    Signed-off-by: Masahiro Yamada
    Reviewed-by: Kees Cook
    Acked-by: Ard Biesheuvel
    Reviewed-by: Nick Desaulniers

    Masahiro Yamada
     

29 Jun, 2020

1 commit

  • Pull x86 fixes from Borislav Petkov:

    - AMD Memory bandwidth counter width fix, by Babu Moger.

    - Use the proper length type in the 32-bit truncate() syscall variant,
    by Jiri Slaby.

    - Reinit IA32_FEAT_CTL during wakeup to fix the case where after
    resume, VMXON would #GP due to VMX not being properly enabled, by
    Sean Christopherson.

    - Fix a static checker warning in the resctrl code, by Dan Carpenter.

    - Add a CR4 pinning mask for bits which cannot change after boot, by
    Kees Cook.

    - Align the start of the loop of __clear_user() to 16 bytes, to improve
    performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.

    * tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/asm/64: Align start of __clear_user() loop to 16-bytes
    x86/cpu: Use pinning mask for CR4 bits needing to be 0
    x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get()
    x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
    syscalls: Fix offset type of ksys_ftruncate()
    x86/resctrl: Fix memory bandwidth counter width for AMD

    Linus Torvalds
     

25 Jun, 2020

1 commit

  • vmlinux.o: warning: objtool: fixup_bad_iret()+0x8e: call to memcpy() leaves .noinstr.text section

    Worse, with KASAN enabled there is no telling what memcpy() actually is.
    Force the use of __memcpy(), which is our assembly implementation.

    Reported-by: Marco Elver
    Suggested-by: Marco Elver
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200618144801.760070502@infradead.org

    Peter Zijlstra
     

20 Jun, 2020

1 commit

  • x86 CPUs can suffer severe performance drops if a tight loop, such as
    the ones in __clear_user(), straddles a 16-byte instruction fetch
    window, or worse, a 64-byte cacheline. This issue was discovered in the
    SUSE kernel with the following commit,

    1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")

    which increased the code object size from 10 bytes to 15 bytes and
    caused the 8-byte copy loop in __clear_user() to be split across a
    64-byte cacheline.

    Aligning the start of the loop to 16 bytes makes this fit neatly inside
    a single instruction fetch window again and restores the performance of
    __clear_user() which is used heavily when reading from /dev/zero.

    Here are some numbers from running libmicro's read_z* and pread_z*
    microbenchmarks which read from /dev/zero:

    Zen 1 (Naples)

    libmicro-file
                                 5.7.0-rc6           5.7.0-rc6              5.7.0-rc6
                                 (baseline)          revert-1153933703d9+   align16+
    Time mean95-pread_z100k      9.9195 (  0.00%)    5.9856 ( 39.66%)       5.9938 ( 39.58%)
    Time mean95-pread_z10k       1.1378 (  0.00%)    0.7450 ( 34.52%)       0.7467 ( 34.38%)
    Time mean95-pread_z1k        0.2623 (  0.00%)    0.2251 ( 14.18%)       0.2252 ( 14.15%)
    Time mean95-pread_zw100k     9.9974 (  0.00%)    6.0648 ( 39.34%)       6.0756 ( 39.23%)
    Time mean95-read_z100k       9.8940 (  0.00%)    5.9885 ( 39.47%)       5.9994 ( 39.36%)
    Time mean95-read_z10k        1.1394 (  0.00%)    0.7483 ( 34.33%)       0.7482 ( 34.33%)

    Note that this doesn't affect Haswell or Broadwell microarchitectures
    which seem to avoid the alignment issue by executing the loop straight
    out of the Loop Stream Detector (verified using perf events).

    Fixes: 1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
    Signed-off-by: Matt Fleming
    Signed-off-by: Borislav Petkov
    Cc: # v4.19+
    Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk

    Matt Fleming
     

12 Jun, 2020

1 commit

  • Merge the state of the locking kcsan branch before the read/write_once()
    and the atomics modifications got merged.

    Squash the fallout of the rebase on top of the read/write once and atomic
    fallback work into the merge. The history of the original branch is
    preserved in tag locking-kcsan-2020-06-02.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

04 Jun, 2020

1 commit

  • Pull x86 timer updates from Thomas Gleixner:
    "X86 timer specific updates:

    - Add TPAUSE based delay which allows the CPU to enter an optimized
    power state while waiting for the delay to pass. The delay is based
    on TSC cycles.

    - Add tsc_early_khz command line parameter to work around the problem
    that overclocked CPUs can report the wrong frequency via CPUID.16h,
    which causes the refined calibration to fail because the delta to
    the initial frequency value is too big. With the parameter, users
    can provide a halfway accurate initial value"

    * tag 'x86-timers-2020-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tsc: Add tsc_early_khz command line parameter
    x86/delay: Introduce TPAUSE delay
    x86/delay: Refactor delay_mwaitx() for TPAUSE support
    x86/delay: Preparatory code cleanup

    Linus Torvalds
     

02 Jun, 2020

1 commit

  • Pull uaccess/csum updates from Al Viro:
    "Regularize the sitation with uaccess checksum primitives:

    - fold csum_partial_... into csum_and_copy_..._user()

    - on x86 collapse several access_ok()/stac()/clac() into
    user_access_begin()/user_access_end()"

    * 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    default csum_and_copy_to_user(): don't bother with access_ok()
    take the dummy csum_and_copy_from_user() into net/checksum.h
    arm: switch to csum_and_copy_from_user()
    sh32: convert to csum_and_copy_from_user()
    m68k: convert to csum_and_copy_from_user()
    xtensa: switch to providing csum_and_copy_from_user()
    sparc: switch to providing csum_and_copy_from_user()
    parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
    alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
    ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
    ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user()
    x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}()
    x86: switch both 32bit and 64bit to providing csum_and_copy_from_user()
    x86_64: csum_..._copy_..._user(): switch to unsafe_..._user()
    get rid of csum_partial_copy_to_user()

    Linus Torvalds
     

30 May, 2020

2 commits


07 May, 2020

3 commits

  • TPAUSE instructs the processor to enter an implementation-dependent
    optimized state. The instruction execution wakes up when the time-stamp
    counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
    The instruction execution also wakes up due to the expiration of
    the operating system time limit, or due to an external interrupt or
    an exception such as a debug exception or a machine check exception.

    TPAUSE offers a choice of two lower power states:
    1. Light-weight power/performance optimized state C0.1
    2. Improved power/performance optimized state C0.2

    This way, it can save power with low wake-up latency in comparison to a
    spinloop-based delay. The selection between the two states is governed
    by the input register.

    TPAUSE is available on processors with X86_FEATURE_WAITPKG.
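
    A minimal sketch of a TPAUSE-based delay (this assumes a
    __tpause(state, edx, eax) wrapper around the instruction and a
    TPAUSE_C02_STATE constant as introduced by this series; the loop
    re-arms the wait if something wakes the CPU early):

        static void delay_tpause(u64 cycles)
        {
                u64 end = rdtsc() + cycles;

                do {
                        /* wait until TSC >= EDX:EAX, in the deeper C0.2 state */
                        __tpause(TPAUSE_C02_STATE, upper_32_bits(end), lower_32_bits(end));
                } while (rdtsc() < end);
        }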

    Co-developed-by: Fenghua Yu
    Signed-off-by: Fenghua Yu
    Signed-off-by: Kyung Min Park
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Link: https://lkml.kernel.org/r/1587757076-30337-4-git-send-email-kyung.min.park@intel.com

    Kyung Min Park
     
  • Refactor code to make it easier to add a new model specific function to
    delay for a number of cycles.

    No functional change.

    Co-developed-by: Fenghua Yu
    Signed-off-by: Fenghua Yu
    Signed-off-by: Kyung Min Park
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Link: https://lkml.kernel.org/r/1587757076-30337-3-git-send-email-kyung.min.park@intel.com

    Kyung Min Park
     
  • The naming conventions in the delay code are confusing at best.

    All delay variants use a loops argument and/or variable which originates
    from the original delay_loop() implementation. But all variants except
    delay_loop() are based on TSC cycles.

    Rename the argument to cycles and make it type u64 to avoid these weird
    expansions to u64 in the functions.

    Rename MWAITX_MAX_LOOPS to MWAITX_MAX_WAIT_CYCLES for the same reason
    and fixup the comment of delay_mwaitx() as well.

    Mark the delay_fn function pointer __ro_after_init and fixup the comment
    for it.

    No functional change and preparation for the upcoming TPAUSE based delay
    variant.

    [ Kyung Min Park: Added __init to use_tsc_delay() ]

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Kyung Min Park
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1587757076-30337-2-git-send-email-kyung.min.park@intel.com

    Thomas Gleixner
     

01 May, 2020

3 commits

  • Currently objtool cannot understand retpolines, and thus cannot
    generate ORC unwind information for them. This means that we cannot
    unwind from the middle of a retpoline.

    The recent ANNOTATE_INTRA_FUNCTION_CALL and UNWIND_HINT_RET_OFFSET
    support in objtool enables it to understand the basic retpoline
    construct. A further problem is that the ORC unwind information is
    alternative-invariant; IOW, every alternative should have the same
    ORC, and retpolines obviously violate this. This means we need to
    out-of-line them.

    Since all GCC-generated code already uses out-of-line retpolines, this
    should not affect performance much, if at all.

    This will enable objtool to generate valid ORC data for the
    out-of-line copies, which means we can correctly and reliably unwind
    through a retpoline.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200428191700.210835357@infradead.org

    Peter Zijlstra
     
  • In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line
    versions of the retpoline magic, we need to remove the '%' from the
    argument, such that we can paste it onto symbol names.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200428191700.151623523@infradead.org

    Peter Zijlstra
     
  • Because of how KSYM works, we need one declaration per line. Seeing
    how we're going to be doubling the number of retpoline symbols,
    simplify the machinery in order to avoid having to copy/paste even
    more.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200428191700.091696925@infradead.org

    Peter Zijlstra
     

23 Apr, 2020

1 commit

  • For historical reasons some architectures call their csum_and_copy_to_user()
    csum_partial_copy_to_user() instead (and supply a macro defining the
    former as the latter). Those are the last remnants of an old experiment
    that went nowhere; time to bury them. Rename those to csum_and_copy_to_user()
    and get rid of the macros.

    Signed-off-by: Al Viro

    Al Viro