Eric Lee / smarc-fsl-linux-kernel

28 Apr, 2021

3 commits

55e6be657 Merge branch 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup changes from Tejun Heo:
"The only notable change is Vipin's new misc cgroup controller.

This implements generic support for resources which can be controlled
by simply counting and limiting the number of resource instances - ie
there's X number of these on the system and this cgroup subtree can
have upto Y of those.

The first user is the address space IDs used for virtual machine
memory encryption and expected future usages are similar - niche
hardware features with concrete resource limits and simple usage
models"

* 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: use tsk->in_iowait instead of delayacct_is_task_waiting_on_io()
cgroup/cpuset: fix typos in comments
cgroup: misc: mark dummy misc_cg_res_total_usage() static inline
svm/sev: Register SEV and SEV-ES ASIDs to the misc controller
cgroup: Miscellaneous cgroup documentation.
cgroup: Add misc cgroup controller

Linus Torvalds
2021-04-28 09:47:42 +0800
57fa2369a Merge tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull CFI on arm64 support from Kees Cook:
"This builds on last cycle's LTO work, and allows the arm64 kernels to
be built with Clang's Control Flow Integrity feature. This feature has
happily lived in Android kernels for almost 3 years[1], so I'm excited
to have it ready for upstream.

The wide diffstat is mainly due to the treewide fixing of mismatched
list_sort prototypes. Other things in core kernel are to address
various CFI corner cases. The largest code portion is the CFI runtime
implementation itself (which will be shared by all architectures
implementing support for CFI). The arm64 pieces are Acked by arm64
maintainers rather than coming through the arm64 tree since carrying
this tree over there was going to be awkward.

CFI support for x86 is still under development, but is pretty close.
There are a handful of corner cases on x86 that need some improvements
to Clang and objtool, but otherwise works well.

Summary:

- Clean up list_sort prototypes (Sami Tolvanen)

- Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)"

* tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
arm64: allow CONFIG_CFI_CLANG to be selected
KVM: arm64: Disable CFI for nVHE
arm64: ftrace: use function_nocfi for ftrace_call
arm64: add __nocfi to __apply_alternatives
arm64: add __nocfi to functions that jump to a physical address
arm64: use function_nocfi with __pa_symbol
arm64: implement function_nocfi
psci: use function_nocfi for cpu_resume
lkdtm: use function_nocfi
treewide: Change list_sort to use const pointers
bpf: disable CFI in dispatcher functions
kallsyms: strip ThinLTO hashes from static functions
kthread: use WARN_ON_FUNCTION_MISMATCH
workqueue: use WARN_ON_FUNCTION_MISMATCH
module: ensure __cfi_check alignment
mm: add generic function_nocfi macro
cfi: add __cficanonical
add support for Clang CFI

Linus Torvalds
2021-04-28 01:16:46 +0800
7e4910b9a Merge tag 'seccomp-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull seccomp updates from Kees Cook:

- Fix "cacheable" typo in comments (Cui GaoSheng)

- Fix CONFIG for /proc/$pid/status Seccomp_filters (Kenta.Tada@sony.com)

* tag 'seccomp-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Fix "cacheable" typo in comments
seccomp: Fix CONFIG tests for Seccomp_filters

Linus Torvalds
2021-04-28 01:03:12 +0800

09 Apr, 2021

1 commit

cf68fffb6 add support for Clang CFI ... Browse Code »

This change adds support for Clang’s forward-edge Control Flow
Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
injects a runtime check before each indirect function call to ensure
the target is a valid function with the correct static type. This
restricts possible call targets and makes it more difficult for
an attacker to exploit bugs that allow the modification of stored
function pointers. For more details, see:

https://clang.llvm.org/docs/ControlFlowIntegrity.html

Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
visibility to possible call targets. Kernel modules are supported
with Clang’s cross-DSO CFI mode, which allows checking between
independently compiled components.

With CFI enabled, the compiler injects a __cfi_check() function into
the kernel and each module for validating local call targets. For
cross-module calls that cannot be validated locally, the compiler
calls the global __cfi_slowpath_diag() function, which determines
the target module and calls the correct __cfi_check() function. This
patch includes a slowpath implementation that uses __module_address()
to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
shadow map that speeds up module look-ups by ~3x.

Clang implements indirect call checking using jump tables and
offers two methods of generating them. With canonical jump tables,
the compiler renames each address-taken function to .cfi
and points the original symbol to a jump table entry, which passes
__cfi_check() validation. This isn’t compatible with stand-alone
assembly code, which the compiler doesn’t instrument, and would
result in indirect calls to assembly code to fail. Therefore, we
default to using non-canonical jump tables instead, where the compiler
generates a local jump table entry .cfi_jt for each
address-taken function, and replaces all references to the function
with the address of the jump table entry.

Note that because non-canonical jump table addresses are local
to each component, they break cross-module function address
equality. Specifically, the address of a global function will be
different in each module, as it's replaced with the address of a local
jump table entry. If this address is passed to a different module,
it won’t match the address of the same function taken there. This
may break code that relies on comparing addresses passed from other
components.

CFI checking can be disabled in a function with the __nocfi attribute.
Additionally, CFI can be disabled for an entire compilation unit by
filtering out CC_FLAGS_CFI.

By default, CFI failures result in a kernel panic to stop a potential
exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
kernel prints out a rate-limited warning instead, and allows execution
to continue. This option is helpful for locating type mismatches, but
should only be enabled during development.

Signed-off-by: Sami Tolvanen
Reviewed-by: Kees Cook
Tested-by: Nathan Chancellor
Signed-off-by: Kees Cook
Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com

Sami Tolvanen
2021-04-09 07:04:20 +0800

08 Apr, 2021

1 commit

39218ff4c stack: Optionally randomize kernel stack offset each syscall ... Browse Code »

This provides the ability for architectures to enable kernel stack base
address offset randomization. This feature is controlled by the boot
param "randomize_kstack_offset=on/off", with its default value set by
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.

This feature is based on the original idea from the last public release
of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt
All the credit for the original idea goes to the PaX team. Note that
the design and implementation of this upstream randomize_kstack_offset
feature differs greatly from the RANDKSTACK feature (see below).

Reasoning for the feature:

This feature aims to make harder the various stack-based attacks that
rely on deterministic stack structure. We have had many such attacks in
past (just to name few):

https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html

As Linux kernel stack protections have been constantly improving
(vmap-based stack allocation with guard pages, removal of thread_info,
STACKLEAK), attackers have had to find new ways for their exploits
to work. They have done so, continuing to rely on the kernel's stack
determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
were not relevant. For example, the following recent attacks would have
been hampered if the stack offset was non-deterministic between syscalls:

https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
(page 70: targeting the pt_regs copy with linear stack overflow)

https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
(leaked stack address from one syscall as a target during next syscall)

The main idea is that since the stack offset is randomized on each system
call, it is harder for an attack to reliably land in any particular place
on the thread stack, even with address exposures, as the stack base will
change on the next syscall. Also, since randomization is performed after
placing pt_regs, the ptrace-based approach[1] to discover the randomized
offset during a long-running syscall should not be possible.

Design description:

During most of the kernel's execution, it runs on the "thread stack",
which is pretty deterministic in its structure: it is fixed in size,
and on every entry from userspace to kernel on a syscall the thread
stack starts construction from an address fetched from the per-cpu
cpu_current_top_of_stack variable. The first element to be pushed to the
thread stack is the pt_regs struct that stores all required CPU registers
and syscall parameters. Finally the specific syscall function is called,
with the stack being used as the kernel executes the resulting request.

The goal of randomize_kstack_offset feature is to add a random offset
after the pt_regs has been pushed to the stack and before the rest of the
thread stack is used during the syscall processing, and to change it every
time a process issues a syscall. The source of randomness is currently
architecture-defined (but x86 is using the low byte of rdtsc()). Future
improvements for different entropy sources is possible, but out of scope
for this patch. Further more, to add more unpredictability, new offsets
are chosen at the end of syscalls (the timing of which should be less
easy to measure from userspace than at syscall entry time), and stored
in a per-CPU variable, so that the life of the value does not stay
explicitly tied to a single task.

As suggested by Andy Lutomirski, the offset is added using alloca()
and an empty asm() statement with an output constraint, since it avoids
changes to assembly syscall entry code, to the unwinder, and provides
correct stack alignment as defined by the compiler.

In order to make this available by default with zero performance impact
for those that don't want it, it is boot-time selectable with static
branches. This way, if the overhead is not wanted, it can just be
left turned off with no performance impact.

The generated assembly for x86_64 with GCC looks like this:

...
ffffffff81003977: 65 8b 05 02 ea 00 7f mov %gs:0x7f00ea02(%rip),%eax
# 12380
ffffffff8100397e: 25 ff 03 00 00 and $0x3ff,%eax
ffffffff81003983: 48 83 c0 0f add $0xf,%rax
ffffffff81003987: 25 f8 07 00 00 and $0x7f8,%eax
ffffffff8100398c: 48 29 c4 sub %rax,%rsp
ffffffff8100398f: 48 8d 44 24 0f lea 0xf(%rsp),%rax
ffffffff81003994: 48 83 e0 f0 and $0xfffffffffffffff0,%rax
...

As a result of the above stack alignment, this patch introduces about
5 bits of randomness after pt_regs is spilled to the thread stack on
x86_64, and 6 bits on x86_32 (since its has 1 fewer bit required for
stack alignment). The amount of entropy could be adjusted based on how
much of the stack space we wish to trade for security.

My measure of syscall performance overhead (on x86_64):

lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null
randomize_kstack_offset=y Simple syscall: 0.7082 microseconds
randomize_kstack_offset=n Simple syscall: 0.7016 microseconds

So, roughly 0.9% overhead growth for a no-op syscall, which is very
manageable. And for people that don't want this, it's off by default.

There are two gotchas with using the alloca() trick. First,
compilers that have Stack Clash protection (-fstack-clash-protection)
enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to
any dynamic stack allocations. While the randomization offset is
always less than a page, the resulting assembly would still contain
(unreachable!) probing routines, bloating the resulting assembly. To
avoid this, -fno-stack-clash-protection is unconditionally added to
the kernel Makefile since this is the only dynamic stack allocation in
the kernel (now that VLAs have been removed) and it is provably safe
from Stack Clash style attacks.

The second gotcha with alloca() is a negative interaction with
-fstack-protector*, in that it sees the alloca() as an array allocation,
which triggers the unconditional addition of the stack canary function
pre/post-amble which slows down syscalls regardless of the static
branch. In order to avoid adding this unneeded check and its associated
performance impact, architectures need to carefully remove uses of
-fstack-protector-strong (or -fstack-protector) in the compilation units
that use the add_random_kstack() macro and to audit the resulting stack
mitigation coverage (to make sure no desired coverage disappears). No
change is visible for this on x86 because the stack protector is already
unconditionally disabled for the compilation unit, but the change is
required on arm64. There is, unfortunately, no attribute that can be
used to disable stack protector for specific functions.

Comparison to PaX RANDKSTACK feature:

The RANDKSTACK feature randomizes the location of the stack start
(cpu_current_top_of_stack), i.e. including the location of pt_regs
structure itself on the stack. Initially this patch followed the same
approach, but during the recent discussions[2], it has been determined
to be of a little value since, if ptrace functionality is available for
an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write
different offsets in the pt_regs struct, observe the cache behavior of
the pt_regs accesses, and figure out the random stack offset. Another
difference is that the random offset is stored in a per-cpu variable,
rather than having it be per-thread. As a result, these implementations
differ a fair bit in their implementation details and results, though
obviously the intent is similar.

[1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/
[2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/
[3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html

Co-developed-by: Elena Reshetova
Signed-off-by: Elena Reshetova
Signed-off-by: Kees Cook
Signed-off-by: Thomas Gleixner
Reviewed-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org

Kees Cook
2021-04-08 20:05:19 +0800

05 Apr, 2021

1 commit

a72232eab cgroup: Add misc cgroup controller ... Browse Code »

The Miscellaneous cgroup provides the resource limiting and tracking
mechanism for the scalar resources which cannot be abstracted like the
other cgroup resources. Controller is enabled by the CONFIG_CGROUP_MISC
config option.

A resource can be added to the controller via enum misc_res_type{} in
the include/linux/misc_cgroup.h file and the corresponding name via
misc_res_name[] in the kernel/cgroup/misc.c file. Provider of the
resource must set its capacity prior to using the resource by calling
misc_cg_set_capacity().

Once a capacity is set then the resource usage can be updated using
charge and uncharge APIs. All of the APIs to interact with misc
controller are in include/linux/misc_cgroup.h.

Miscellaneous controller provides 3 interface files. If two misc
resources (res_a and res_b) are registered then:

misc.capacity
A read-only flat-keyed file shown only in the root cgroup. It shows
miscellaneous scalar resources available on the platform along with
their quantities::

$ cat misc.capacity
res_a 50
res_b 10

misc.current
A read-only flat-keyed file shown in the non-root cgroups. It shows
the current usage of the resources in the cgroup and its children::

$ cat misc.current
res_a 3
res_b 0

misc.max
A read-write flat-keyed file shown in the non root cgroups. Allowed
maximum usage of the resources in the cgroup and its children.::

$ cat misc.max
res_a max
res_b 4

Limit can be set by::

# echo res_a 1 > misc.max

Limit can be set to max by::

# echo res_a max > misc.max

Limits can be set more than the capacity value in the misc.capacity
file.

Signed-off-by: Vipin Sharma
Reviewed-by: David Rientjes
Signed-off-by: Tejun Heo

Vipin Sharma
2021-04-05 01:34:46 +0800

31 Mar, 2021

1 commit

64bdc0244 seccomp: Fix CONFIG tests for Seccomp_filters ... Browse Code »

Strictly speaking, seccomp filters are only used
when CONFIG_SECCOMP_FILTER.
This patch fixes the condition to enable "Seccomp_filters"
in /proc/$pid/status.

Signed-off-by: Kenta Tada
Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status")
Signed-off-by: Kees Cook
Link: https://lore.kernel.org/r/OSBPR01MB26772D245E2CF4F26B76A989F5669@OSBPR01MB2677.jpnprd01.prod.outlook.com

Kenta.Tada@sony.com
2021-03-31 13:33:50 +0800

15 Mar, 2021

1 commit

50eb842fe Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc fixes from Andrew Morton:
"28 patches.

Subsystems affected by this series: mm (memblock, pagealloc, hugetlb,
highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and
zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and
ia64"

* emailed patches from Andrew Morton : (28 commits)
zram: fix broken page writeback
zram: fix return value on writeback_store
mm/memcg: set memcg when splitting page
mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
mm/userfaultfd: fix memory corruption due to writeprotect
kasan: fix KASAN_STACK dependency for HW_TAGS
kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
mm/madvise: replace ptrace attach requirement for process_madvise
include/linux/sched/mm.h: use rcu_dereference in in_vfork()
kfence: fix reports if constant function prefixes exist
kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
kfence: fix printk format for ptrdiff_t
linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
MAINTAINERS: exclude uapi directories in API/ABI section
binfmt_misc: fix possible deadlock in bm_register_write
mm/highmem.c: fix zero_user_segments() with start > end
hugetlb: do early cow when page pinned on src mm
mm: use is_cow_mapping() across tree where proper
...

Linus Torvalds
2021-03-15 03:23:34 +0800

14 Mar, 2021

1 commit

ea29b20a8 init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM ... Browse Code »

I read the commit log of the following two:

- bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML")
- 334ef6ed06fa ("init/Kconfig: make COMPILE_TEST depend on !S390")

Both are talking about HAS_IOMEM dependency missing in many drivers.

So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me.

This does not change the behavior of UML. UML still cannot enable
COMPILE_TEST because it does not provide HAS_IOMEM.

The current dependency for S390 is too strong. Under the condition of
CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST.

I also removed the meaningless 'default n'.

Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada
Cc: Heiko Carstens
Cc: Guenter Roeck
Cc: Arnd Bergmann
Cc: Kees Cook
Cc: Daniel Borkmann
Cc: Johannes Weiner
Cc: KP Singh
Cc: Nathan Chancellor
Cc: Nick Terrell
Cc: Quentin Perret
Cc: Valentin Schneider
Cc: "Enrico Weigelt, metux IT consult"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2021-03-14 03:27:30 +0800

11 Mar, 2021

1 commit

ce6ed1c4c kbuild: rebuild GCC plugins when the compiler is upgraded ... Browse Code »

Linus reported a build error due to the GCC plugin incompatibility
when the compiler is upgraded. [1]

GCC plugins are tied to a particular GCC version. So, they must be
rebuilt when the compiler is upgraded.

This seems to be a long-standing flaw since the initial support of
GCC plugins.

Extend commit 8b59cd81dc5e ("kbuild: ensure full rebuild when the
compiler is updated"), so that GCC plugins are covered by the
compiler upgrade detection.

[1]: https://lore.kernel.org/lkml/CAHk-=wieoN5ttOy7SnsGwZv+Fni3R6m-Ut=oxih6bbZ28G+4dw@mail.gmail.com/

Reported-by: Linus Torvalds
Signed-off-by: Masahiro Yamada
Reviewed-by: Kees Cook

Masahiro Yamada
2021-03-11 13:40:50 +0800

28 Feb, 2021

1 commit

a6aaeb841 kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTO ... Browse Code »

Commit fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
does not work as expected if the .config file has already specified
CONFIG_UNUSED_KSYMS_WHITELIST="my/own/white/list" before enabling
CONFIG_LTO_CLANG.

So, the user-supplied whitelist and LTO-specific white list must be
independent of each other.

I refactored the shell script so CONFIG_MODVERSIONS and CONFIG_CLANG_LTO
handle whitelists in the same way.

Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
Signed-off-by: Masahiro Yamada
Tested-by: Sedat Dilek

Masahiro Yamada
2021-02-28 14:19:21 +0800

27 Feb, 2021

7 commits

8b83369dd Merge tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux ... Browse Code »

Pull RISC-V updates from Palmer Dabbelt:
"A handful of new RISC-V related patches for this merge window:

- A check to ensure drivers are properly using uaccess. This isn't
manifesting with any of the drivers I'm currently using, but may
catch errors in new drivers.

- Some preliminary support for the FU740, along with the HiFive
Unleashed it will appear on.

- NUMA support for RISC-V, which involves making the arm64 code
generic.

- Support for kasan on the vmalloc region.

- A handful of new drivers for the Kendryte K210, along with the DT
plumbing required to boot on a handful of K210-based boards.

- Support for allocating ASIDs.

- Preliminary support for kernels larger than 128MiB.

- Various other improvements to our KASAN support, including the
utilization of huge pages when allocating the KASAN regions.

We may have already found a bug with the KASAN_VMALLOC code, but it's
passing my tests. There's a fix in the works, but that will probably
miss the merge window.

* tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits)
riscv: Improve kasan population by using hugepages when possible
riscv: Improve kasan population function
riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization
riscv: Improve kasan definitions
riscv: Get rid of MAX_EARLY_MAPPING_SIZE
soc: canaan: Sort the Makefile alphabetically
riscv: Disable KSAN_SANITIZE for vDSO
riscv: Remove unnecessary declaration
riscv: Add Canaan Kendryte K210 SD card defconfig
riscv: Update Canaan Kendryte K210 defconfig
riscv: Add Kendryte KD233 board device tree
riscv: Add SiPeed MAIXDUINO board device tree
riscv: Add SiPeed MAIX GO board device tree
riscv: Add SiPeed MAIX DOCK board device tree
riscv: Add SiPeed MAIX BiT board device tree
riscv: Update Canaan Kendryte K210 device tree
dt-bindings: add resets property to dw-apb-timer
dt-bindings: fix sifive gpio properties
dt-bindings: update sifive uart compatible string
dt-bindings: update sifive clint compatible string
...

Linus Torvalds
2021-02-27 02:28:35 +0800
dd23e8098 initramfs: panic with memory information ... Browse Code »

On systems with large amounts of reserved memory we may fail to
successfully complete unpack_to_rootfs() and be left with:

Kernel panic - not syncing: write error

this is not too helpful to understand what happened, so let's wrap the
panic() calls with a surrounding show_mem() such that we have a chance of
understanding the memory conditions leading to these allocation failures.

[akpm@linux-foundation.org: replace macro with C function]

Link: https://lkml.kernel.org/r/20210114231517.1854379-1-f.fainelli@gmail.com
Signed-off-by: Florian Fainelli
Cc: Barret Rhoden
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Florian Fainelli
2021-02-27 01:41:05 +0800
d54ce6158 kgdb: fix to kill breakpoints on initmem after boot ... Browse Code »

Currently breakpoints in kernel .init.text section are not handled
correctly while allowing to remove them even after corresponding pages
have been freed.

Fix it via killing .init.text section breakpoints just prior to initmem
pages being freed.

Doug: "HW breakpoints aren't handled by this patch but it's probably
not such a big deal".

Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.org
Signed-off-by: Sumit Garg
Suggested-by: Doug Anderson
Acked-by: Doug Anderson
Acked-by: Daniel Thompson
Tested-by: Daniel Thompson
Cc: Masami Hiramatsu
Cc: Steven Rostedt (VMware)
Cc: Jason Wessel
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sumit Garg
2021-02-27 01:41:05 +0800
f9c8bc460 init/Kconfig: fix a typo in CC_VERSION_TEXT help text ... Browse Code »

s/compier/compiler/

Link: https://lkml.kernel.org/r/20210224223325.29099-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury
Acked-by: Randy Dunlap
Reviewed-by: Nathan Chancellor
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bhaskar Chowdhury
2021-02-27 01:41:05 +0800
073a9ecb3 init/version.c: remove Version_<LINUX_VERSION_CODE> symbol ... Browse Code »

This code hunk creates a Version_ symbol if
CONFIG_KALLSYMS is disabled. For example, building the kernel v5.10 for
allnoconfig creates the following symbol:

$ nm vmlinux | grep Version_
c116b028 B Version_330240

There is no in-tree user of this symbol.

Commit 197dcffc8ba0 ("init/version.c: define version_string only if
CONFIG_KALLSYMS is not defined") mentions that Version_* is only used
with ksymoops.

However, a commit in the pre-git era [1] had added the statement,
"ksymoops is useless on 2.6. Please use the Oops in its original format".

That statement existed until commit 4eb9241127a0 ("Documentation:
admin-guide: update bug-hunting.rst") finally removed the stale
ksymoops information.

This symbol is no longer needed.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=ad68b2f085f5c79e4759ca2d13947b3c885ee831

Link: https://lkml.kernel.org/r/20210120033452.2895170-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada
Cc: Mauro Carvalho Chehab
Cc: Randy Dunlap
Cc: Daniel Guilak
Cc: Lee Revell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2021-02-27 01:41:04 +0800
e1fdc4033 lib: stackdepot: add support to disable stack depot ... Browse Code »

Add a kernel parameter stack_depot_disable to disable stack depot. So
that stack hash table doesn't consume any memory when stack depot is
disabled.

The use case is CONFIG_PAGE_OWNER without page_owner=on. Without this
patch, stackdepot will consume the memory for the hashtable. By default,
it's 8M which is never trivial.

With this option, in CONFIG_PAGE_OWNER configured system, page_owner=off,
stack_depot_disable in kernel command line, we could save the wasted
memory for the hashtable.

[akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build]

Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeaurora.org
Signed-off-by: Vinayak Menon
Signed-off-by: Vijayanand Jitta
Cc: Alexander Potapenko
Cc: Minchan Kim
Cc: Yogesh Lal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vijayanand Jitta
2021-02-27 01:41:04 +0800
0ce20dd84 mm: add Kernel Electric-Fence infrastructure ... Browse Code »

Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors. This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).

We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:

https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence

This patch (of 9):

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error. To detect out-of-bounds
writes to memory within the object's page itself, KFENCE also uses
pattern-based redzones. The following figure illustrates the page
layout:

---+-----------+-----------+-----------+-----------+-----------+---
| xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx |
| xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx |
| x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x |
| xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx |
| xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx |
| xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx |
---+-----------+-----------+-----------+-----------+-----------+---

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval, a
guarded allocation from the KFENCE object pool is returned to the main
allocator (SLAB or SLUB). At this point, the timer is reset, and the
next allocation is set up after the expiration of the interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.

For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).

[elver@google.com: fix parameter description for kfence_object_start()]
Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com
[elver@google.com: avoid stalling work queue task without allocations]
Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
[elver@google.com: fix potential deadlock due to wake_up()]
Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
[elver@google.com: add option to use KFENCE without static keys]
Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
[elver@google.com: add missing copyright and description headers]
Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com

Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
Signed-off-by: Marco Elver
Signed-off-by: Alexander Potapenko
Reviewed-by: Dmitry Vyukov
Reviewed-by: SeongJae Park
Co-developed-by: Marco Elver
Reviewed-by: Jann Horn
Cc: "H. Peter Anvin"
Cc: Paul E. McKenney
Cc: Andrey Konovalov
Cc: Andrey Ryabinin
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christopher Lameter
Cc: Dave Hansen
Cc: David Rientjes
Cc: Eric Dumazet
Cc: Greg Kroah-Hartman
Cc: Hillf Danton
Cc: Ingo Molnar
Cc: Jonathan Corbet
Cc: Joonsoo Kim
Cc: Joern Engel
Cc: Kees Cook
Cc: Mark Rutland
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Potapenko
2021-02-27 01:41:02 +0800

26 Feb, 2021

1 commit

6fbd6cf85 Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild ... Browse Code »

Pull Kbuild updates from Masahiro Yamada:

- Fix false-positive build warnings for ARCH=ia64 builds

- Optimize dictionary size for module compression with xz

- Check the compiler and linker versions in Kconfig

- Fix misuse of extra-y

- Support DWARF v5 debug info

- Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
exceeded the limit

- Add generic syscall{tbl,hdr}.sh for cleanups across arches

- Minor cleanups of genksyms

- Minor cleanups of Kconfig

* tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
kbuild: remove deprecated 'always' and 'hostprogs-y/m'
kbuild: parse C= and M= before changing the working directory
kbuild: reuse this-makefile to define abs_srctree
kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
kconfig: omit --oldaskconfig option for 'make config'
kconfig: fix 'invalid option' for help option
kconfig: remove dead code in conf_askvalue()
kconfig: clean up nested if-conditionals in check_conf()
kconfig: Remove duplicate call to sym_get_string_value()
Makefile: Remove # characters from compiler string
Makefile: reuse CC_VERSION_TEXT
kbuild: check the minimum linker version in Kconfig
kbuild: remove ld-version macro
scripts: add generic syscallhdr.sh
scripts: add generic syscalltbl.sh
arch: syscalls: remove $(srctree)/ prefix from syscall tables
arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
gen_compile_commands: prune some directories
kbuild: simplify access to the kernel's version
...

Linus Torvalds
2021-02-26 02:17:31 +0800

25 Feb, 2021

4 commits

4c48faba5 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc updates from Andrew Morton:
"A few small subsystems and some of MM.

172 patches.

Subsystems affected by this patch series: hexagon, scripts, ntfs,
ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap,
memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan,
pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
mempolicy, oom-kill, hugetlbfs, and migration)"

* emailed patches from Andrew Morton : (172 commits)
mm/migrate: remove unneeded semicolons
hugetlbfs: remove unneeded return value of hugetlb_vmtruncate()
hugetlbfs: fix some comment typos
hugetlbfs: correct some obsolete comments about inode i_mutex
hugetlbfs: make hugepage size conversion more readable
hugetlbfs: remove meaningless variable avoid_reserve
hugetlbfs: correct obsolete function name in hugetlbfs_read_iter()
hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs
hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()
hugetlbfs: remove special hugetlbfs_set_page_dirty()
mm/hugetlb: change hugetlb_reserve_pages() to type bool
mm, oom: fix a comment in dump_task()
mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk()
numa balancing: migrate on fault among multiple bound nodes
mm, compaction: make fast_isolate_freepages() stay within zone
mm/compaction: fix misbehaviors of fast_find_migrateblock()
mm/compaction: correct deferral logic for proactive compaction
mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked
mm/compaction: remove rcu_read_lock during page compaction
z3fold: simplify the zhdr initialization code in init_z3fold_page()
...

Linus Torvalds
2021-02-25 08:20:38 +0800
fe2cce15d mm, slub: remove slub_memcg_sysfs boot param and CONFIG_SLUB_MEMCG_SYSFS_ON ... Browse Code »

The boot param and config determine the value of memcg_sysfs_enabled,
which is unused since commit 10befea91b61 ("mm: memcg/slab: use a single
set of kmem_caches for all allocations") as there are no per-memcg kmem
caches anymore.

Link: https://lkml.kernel.org/r/20210127124745.7928-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Reviewed-by: David Hildenbrand
Acked-by: Roman Gushchin
Acked-by: David Rientjes
Reviewed-by: Miaohe Lin
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vlastimil Babka
2021-02-25 05:38:27 +0800
c4fbde84f Merge tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull Simple Firmware Interface (SFI) support removal from Rafael Wysocki:
"Drop support for depercated platforms using SFI, drop the entire
support for SFI that has been long deprecated too and make some
janitorial changes on top of that (Andy Shevchenko)"

* tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/platform/intel-mid: Update Copyright year and drop file names
x86/platform/intel-mid: Remove unused header inclusion in intel-mid.h
x86/platform/intel-mid: Drop unused __intel_mid_cpu_chip and Co.
x86/platform/intel-mid: Get rid of intel_scu_ipc_legacy.h
x86/PCI: Describe @reg for type1_access_ok()
x86/PCI: Get rid of custom x86 model comparison
sfi: Remove framework for deprecated firmware
cpufreq: sfi-cpufreq: Remove driver for deprecated firmware
media: atomisp: Remove unused header
mfd: intel_msic: Remove driver for deprecated platform
x86/apb_timer: Remove driver for deprecated platform
x86/platform/intel-mid: Remove unused leftovers (vRTC)
x86/platform/intel-mid: Remove unused leftovers (msic)
x86/platform/intel-mid: Remove unused leftovers (msic_thermal)
x86/platform/intel-mid: Remove unused leftovers (msic_power_btn)
x86/platform/intel-mid: Remove unused leftovers (msic_gpio)
x86/platform/intel-mid: Remove unused leftovers (msic_battery)
x86/platform/intel-mid: Remove unused leftovers (msic_ocd)
x86/platform/intel-mid: Remove unused leftovers (msic_audio)
platform/x86: intel_scu_wdt: Drop mistakenly added const

Linus Torvalds
2021-02-25 02:35:29 +0800
a555bdd0c Kbuild: enable TRIM_UNUSED_KSYMS again, with some guarding ... Browse Code »

In commit 5cf0fd591f2e ("Kbuild: disable TRIM_UNUSED_KSYMS option") I
disabled this option because it's hugely expensive at build time, and I
questioned how much use it gets.

Several people piped up and convinced me it's actually useful, so
instead of disabling it entirely, it now depends on EXPERT and gets
disabled by COMPILE_TEST builds so that 'allmodconfig' style things
don't enable it.

I still hope somebody will take a look at the build time issue, because
as Arnd also noted:

"However, the combination of thinlto and trim indeed has a steep cost
in compile time, taking almost twice as long as a normal defconfig
(gc-sections makes it slightly faster)"

Cc: Masahiro Yamada
Cc: Arnd Bergmann
Cc: Jessica Yu
Cc: Cristoph Hellwig ,
Cc: Miroslav Benes
Cc: Emil Velikov
Signed-off-by: Linus Torvalds

Linus Torvalds
2021-02-25 00:57:06 +0800

24 Feb, 2021

3 commits

5cf0fd591 Kbuild: disable TRIM_UNUSED_KSYMS option ... Browse Code »

The removal of EXPORT_UNUSED_SYMBOL() in commit 367948220fce looks like
(and was sold as) a no-op, but it actually had a rather serious and
subtle side effect: the UNUSED_SYMBOLS option not only enabled the
removed (unused) functionality, it also _disabled_ the TRIM_UNUSED_KSYMS
functionality.

And it turns out that TRIM_UNUSED_KSYMS is a huge time waste, and takes
up a third of the kernel build time for me. For no actual upside, since
no distro is likely to ever be able to enable it (because they all
support external kernel modules).

Rather than re-enable EXPORT_UNUSED_SYMBOL, this just disables the
TRIM_UNUSED_KSYMS option by marking it broken. I'm tempted to just
remove the support entirely, but maybe somebody has a use-case and can
fix the behavior of it.

I could have just disabled it for COMPILE_TEST, but it really smells
like the TRIM_UNUSED_KSYMS option is badly done and not really useful,
so this takes the more direct approach - let's see if anybody ever
actually notices or complains.

Cc: Miroslav Benes
Cc: Emil Velikov
Cc: Christoph Hellwig
Cc: Jessica Yu
Fixes: 367948220fce ("module: remove EXPORT_UNUSED_SYMBOL*")
Signed-off-by: Linus Torvalds

Linus Torvalds
2021-02-24 04:21:58 +0800
21a6ab213 Merge tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux ... Browse Code »

Pull module updates from Jessica Yu:

- Retire EXPORT_UNUSED_SYMBOL() and EXPORT_SYMBOL_GPL_FUTURE(). These
export types were introduced between 2006 - 2008. All the of the
unused symbols have been long removed and gpl future symbols were
converted to gpl quite a long time ago, and I don't believe these
export types have been used ever since. So, I think it should be safe
to retire those export types now (Christoph Hellwig)

- Refactor and clean up some aged code cruft in the module loader
(Christoph Hellwig)

- Build {,module_}kallsyms_on_each_symbol only when livepatching is
enabled, as it is the only caller (Christoph Hellwig)

- Unexport find_module() and module_mutex and fix the last module
callers to not rely on these anymore. Make module_mutex internal to
the module loader (Christoph Hellwig)

- Harden ELF checks on module load and validate ELF structures before
checking the module signature (Frank van der Linden)

- Fix undefined symbol warning for clang (Fangrui Song)

- Fix smatch warning (Dan Carpenter)

* tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: potential uninitialized return in module_kallsyms_on_each_symbol()
module: remove EXPORT_UNUSED_SYMBOL*
module: remove EXPORT_SYMBOL_GPL_FUTURE
module: move struct symsearch to module.c
module: pass struct find_symbol_args to find_symbol
module: merge each_symbol_section into find_symbol
module: remove each_symbol_in_section
module: mark module_mutex static
kallsyms: only build {,module_}kallsyms_on_each_symbol when required
kallsyms: refactor {,module_}kallsyms_on_each_symbol
module: use RCU to synchronize find_module
module: unexport find_module and module_mutex
drm: remove drm_fb_helper_modinit
powerpc/powernv: remove get_cxl_module
module: harden ELF info handling
module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols

Linus Torvalds
2021-02-24 02:15:33 +0800
79db4d229 Merge tag 'clang-lto-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull clang LTO updates from Kees Cook:
"Clang Link Time Optimization.

This is built on the work done preparing for LTO by arm64 folks,
tracing folks, etc. This includes the core changes as well as the
remaining pieces for arm64 (LTO has been the default build method on
Android for about 3 years now, as it is the prerequisite for the
Control Flow Integrity protections).

While x86 LTO enablement is done, it depends on some pending objtool
clean-ups. It's possible that I'll send a "part 2" pull request for
LTO that includes x86 support.

For merge log posterity, and as detailed in commit dc5723b02e52
("kbuild: add support for Clang LTO"), here is the lt;dr to do an LTO
build:

make LLVM=1 LLVM_IAS=1 defconfig
scripts/config -e LTO_CLANG_THIN
make LLVM=1 LLVM_IAS=1

(To do a cross-compile of arm64, add "CROSS_COMPILE=aarch64-linux-gnu-"
and "ARCH=arm64" to the "make" command lines.)

Summary:

- Clang LTO build infrastructure and arm64-specific enablement (Sami
Tolvanen)

- Recursive build CC_FLAGS_LTO fix (Alexander Lobakin)"

* tag 'clang-lto-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kbuild: prevent CC_FLAGS_LTO self-bloating on recursive rebuilds
arm64: allow LTO to be selected
arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
arm64: vdso: disable LTO
drivers/misc/lkdtm: disable LTO for rodata.o
efi/libstub: disable LTO
scripts/mod: disable LTO for empty.c
modpost: lto: strip .lto from module names
PCI: Fix PREL32 relocations for LTO
init: lto: fix PREL32 relocations
init: lto: ensure initcall ordering
kbuild: lto: add a default list of used symbols
kbuild: lto: merge module sections
kbuild: lto: limit inlining
kbuild: lto: fix module versioning
kbuild: add support for Clang LTO
tracing: move function tracer options to Kconfig

Linus Torvalds
2021-02-24 01:28:51 +0800

23 Feb, 2021

1 commit

4b5f9254e Merge tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm ... Browse Code »

Pull kcmp kconfig update from Daniel Vetter:
"Make the kcmp syscall available independently of checkpoint/restore.

drm userspaces uses this, systemd uses this, so makes sense to pull it
out from the checkpoint-restore bundle.

Kees reviewed this from security pov and is happy with the final
version"

Link: https://lwn.net/Articles/845448/

* tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm:
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE

Linus Torvalds
2021-02-23 09:15:30 +0800

22 Feb, 2021

3 commits

02aff8592 kbuild: check the minimum linker version in Kconfig ... Browse Code »

Unify the two scripts/ld-version.sh and scripts/lld-version.sh, and
check the minimum linker version like scripts/cc-version.sh did.

I tested this script for some corner cases reported in the past:

- GNU ld version 2.25-15.fc23
as reported by commit 8083013fc320 ("ld-version: Fix it on Fedora")

- GNU ld (GNU Binutils) 2.20.1.20100303
as reported by commit 0d61ed17dd30 ("ld-version: Drop the 4th and
5th version components")

This script show an error message if the linker is too old:

$ make LD=ld.lld-9
SYNC include/config/auto.conf
***
*** Linker is too old.
*** Your LLD version: 9.0.1
*** Minimum LLD version: 10.0.1
***
scripts/Kconfig.include:50: Sorry, this linker is not supported.
make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1
make[1]: *** [Makefile:600: syncconfig] Error 2
make: *** [Makefile:708: include/config/auto.conf] Error 2

I also moved the check for gold to this script, so gold is still rejected:

$ make LD=gold
SYNC include/config/auto.conf
gold linker is not supported as it is not capable of linking the kernel proper.
scripts/Kconfig.include:50: Sorry, this linker is not supported.
make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1
make[1]: *** [Makefile:600: syncconfig] Error 2
make: *** [Makefile:708: include/config/auto.conf] Error 2

Thanks to David Laight for suggesting shell script improvements.

Signed-off-by: Masahiro Yamada
Acked-by: Nick Desaulniers
Reviewed-by: Nathan Chancellor
Tested-by: Nathan Chancellor

Masahiro Yamada
2021-02-22 07:22:04 +0800
657bd90c9 Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler updates from Ingo Molnar:
"Core scheduler updates:

- Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
preempt=none/voluntary/full boot options (default: full), to allow
distros to build a PREEMPT kernel but fall back to close to
PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
a boot time selection.

There's also the /debug/sched_debug switch to do this runtime.

This feature is implemented via runtime patching (a new variant of
static calls).

The scope of the runtime patching can be best reviewed by looking
at the sched_dynamic_update() function in kernel/sched/core.c.

( Note that the dynamic none/voluntary mode isn't 100% identical,
for example preempt-RCU is available in all cases, plus the
preempt count is maintained in all models, which has runtime
overhead even with the code patching. )

The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
majority of distributions, are supposed to be unaffected.

- Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
was found via rcutorture triggering a hang. The bug is that
rcu_idle_enter() may wake up a NOCB kthread, but this happens after
the last generic need_resched() check. Some cpuidle drivers fix it
by chance but many others don't.

In true 2020 fashion the original bug fix has grown into a 5-patch
scheduler/RCU fix series plus another 16 RCU patches to address the
underlying issue of missed preemption events. These are the initial
fixes that should fix current incarnations of the bug.

- Clean up rbtree usage in the scheduler, by providing & using the
following consistent set of rbtree APIs:

partial-order; less() based:
- rb_add(): add a new entry to the rbtree
- rb_add_cached(): like rb_add(), but for a rb_root_cached

total-order; cmp() based:
- rb_find(): find an entry in an rbtree
- rb_find_add(): find an entry, and add if not found

- rb_find_first(): find the first (leftmost) matching entry
- rb_next_match(): continue from rb_find_first()
- rb_for_each(): iterate a sub-tree using the previous two

- Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
single pass. This is a 4-commit series where each commit improves
one aspect of the idle sibling scan logic.

- Improve the cpufreq cooling driver by getting the effective CPU
utilization metrics from the scheduler

- Improve the fair scheduler's active load-balancing logic by
reducing the number of active LB attempts & lengthen the
load-balancing interval. This improves stress-ng mmapfork
performance.

- Fix CFS's estimated utilization (util_est) calculation bug that can
result in too high utilization values

Misc updates & fixes:

- Fix the HRTICK reprogramming & optimization feature

- Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code

- Reduce dl_add_task_root_domain() overhead

- Fix uprobes refcount bug

- Process pending softirqs in flush_smp_call_function_from_idle()

- Clean up task priority related defines, remove *USER_*PRIO and
USER_PRIO()

- Simplify the sched_init_numa() deduplication sort

- Documentation updates

- Fix EAS bug in update_misfit_status(), which degraded the quality
of energy-balancing

- Smaller cleanups"

* tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
sched,x86: Allow !PREEMPT_DYNAMIC
entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
entry: Explicitly flush pending rcuog wakeup before last rescheduling point
rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
rcu/nocb: Perform deferred wake up before last idle's need_resched() check
rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
sched/features: Distinguish between NORMAL and DEADLINE hrtick
sched/features: Fix hrtick reprogramming
sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
uprobes: (Re)add missing get_uprobe() in __find_uprobe()
smp: Process pending softirqs in flush_smp_call_function_from_idle()
sched: Harden PREEMPT_DYNAMIC
static_call: Allow module use without exposing static_call_key
sched: Add /debug/sched_preempt
preempt/dynamic: Support dynamic preempt with preempt= boot option
preempt/dynamic: Provide irqentry_exit_cond_resched() static call
preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
preempt/dynamic: Provide cond_resched() and might_resched() static calls
preempt: Introduce CONFIG_PREEMPT_DYNAMIC
static_call: Provide DEFINE_STATIC_CALL_RET0()
...

Linus Torvalds
2021-02-22 04:35:04 +0800
24880bef4 Merge tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux ... Browse Code »

Pull oprofile and dcookies removal from Viresh Kumar:
"Remove oprofile and dcookies support

The 'oprofile' user-space tools don't use the kernel OPROFILE support
any more, and haven't in a long time. User-space has been converted to
the perf interfaces.

The dcookies stuff is only used by the oprofile code. Now that
oprofile's support is getting removed from the kernel, there is no
need for dcookies as well.

Remove kernel's old oprofile and dcookies support"

* tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux:
fs: Remove dcookies support
drivers: Remove CONFIG_OPROFILE support
arch: xtensa: Remove CONFIG_OPROFILE support
arch: x86: Remove CONFIG_OPROFILE support
arch: sparc: Remove CONFIG_OPROFILE support
arch: sh: Remove CONFIG_OPROFILE support
arch: s390: Remove CONFIG_OPROFILE support
arch: powerpc: Remove oprofile
arch: powerpc: Stop building and using oprofile
arch: parisc: Remove CONFIG_OPROFILE support
arch: mips: Remove CONFIG_OPROFILE support
arch: microblaze: Remove CONFIG_OPROFILE support
arch: ia64: Remove rest of perfmon support
arch: ia64: Remove CONFIG_OPROFILE support
arch: hexagon: Don't select HAVE_OPROFILE
arch: arc: Remove CONFIG_OPROFILE support
arch: arm: Remove CONFIG_OPROFILE support
arch: alpha: Remove CONFIG_OPROFILE support

Linus Torvalds
2021-02-22 02:40:34 +0800

19 Feb, 2021

1 commit

c72160fe0 initramfs: Provide a common initrd reserve function ... Browse Code »

Some architectures(eg, ARM and riscv) have similar logic to
check and reserve the memory of initrd, let's provide a common
function reserve_initrd_mem() to reduce duplicated code.

Signed-off-by: Kefeng Wang
Signed-off-by: Palmer Dabbelt

Kefeng Wang
2021-02-19 15:17:57 +0800

17 Feb, 2021

1 commit

ed3cd45f8 Merge tag 'v5.11' into sched/core, to pick up fixes & refresh the branch ... Browse Code »

Signed-off-by: Ingo Molnar

Ingo Molnar
2021-02-17 21:04:39 +0800

16 Feb, 2021

3 commits

bfe3911a9 kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE ... Browse Code »

Userspace has discovered the functionality offered by SYS_kcmp and has
started to depend upon it. In particular, Mesa uses SYS_kcmp for
os_same_file_description() in order to identify when two fd (e.g. device
or dmabuf) point to the same struct file. Since they depend on it for
core functionality, lift SYS_kcmp out of the non-default
CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.

Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to
deduplicate the per-service file descriptor store.

Note that some distributions such as Ubuntu are already enabling
CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046
Signed-off-by: Chris Wilson
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Will Drewry
Cc: Andrew Morton
Cc: Dave Airlie
Cc: Daniel Vetter
Cc: Lucas Stach
Cc: Rasmus Villemoes
Cc: Cyrill Gorcunov
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter # DRM depends on kcmp
Acked-by: Rasmus Villemoes # systemd uses kcmp
Reviewed-by: Cyrill Gorcunov
Reviewed-by: Kees Cook
Acked-by: Thomas Zimmermann
Signed-off-by: Daniel Vetter
Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk

Chris Wilson
2021-02-16 16:59:41 +0800
aec6c60a0 kbuild: check the minimum compiler version in Kconfig ... Browse Code »

Paul Gortmaker reported a regression in the GCC version check. [1]
If you use GCC 4.8, the build breaks before showing the error message
"error Sorry, your version of GCC is too old - please use 4.9 or newer."

I do not want to apply his fix-up since it implies we would not be able
to remove any cc-option test. Anyway, I admit checking the GCC version
in is too late.

Almost at the same time, Linus also suggested to move the compiler
version error to Kconfig time. [2]

I unified the two similar scripts, gcc-version.sh and clang-version.sh
into cc-version.sh. The old scripts invoked the compiler multiple times
(3 times for gcc-version.sh, 4 times for clang-version.sh). I refactored
the code so the new one invokes the compiler just once, and also tried
my best to use shell-builtin commands where possible.

The new script runs faster.

$ time ./scripts/clang-version.sh clang
120000

real 0m0.029s
user 0m0.012s
sys 0m0.021s

$ time ./scripts/cc-version.sh clang
Clang 120000

real 0m0.009s
user 0m0.006s
sys 0m0.004s

cc-version.sh also shows an error message if the compiler is too old:

$ make defconfig CC=clang-9
*** Default configuration is based on 'x86_64_defconfig'
***
*** Compiler is too old.
*** Your Clang version: 9.0.1
*** Minimum Clang version: 10.0.1
***
scripts/Kconfig.include:46: Sorry, this compiler is not supported.
make[1]: *** [scripts/kconfig/Makefile:81: defconfig] Error 1
make: *** [Makefile:602: defconfig] Error 2

The new script takes care of ICC because we have
although I am not sure if building the kernel with ICC is well-supported.

[1]: https://lore.kernel.org/r/20210110190807.134996-1-paul.gortmaker@windriver.com
[2]: https://lore.kernel.org/r/CAHk-=wh-+TMHPTFo1qs-MYyK7tZh-OQovA=pP3=e06aCVp6_kA@mail.gmail.com

Fixes: 87de84c9140e ("kbuild: remove cc-option test of -Werror=date-time")
Reported-by: Paul Gortmaker
Suggested-by: Linus Torvalds
Reviewed-by: Nick Desaulniers
Tested-by: Nick Desaulniers
Reviewed-by: Nathan Chancellor
Tested-by: Nathan Chancellor
Reviewed-by: Miguel Ojeda
Tested-by: Miguel Ojeda
Tested-by: Sedat Dilek
Signed-off-by: Masahiro Yamada

Masahiro Yamada
2021-02-16 11:01:32 +0800
4590d98f5 sfi: Remove framework for deprecated firmware ... Browse Code »

SFI-based platforms are gone. So does this framework.

This removes mention of SFI through the drivers and other code as well.

Signed-off-by: Andy Shevchenko
Reviewed-by: Hans de Goede
Acked-by: Linus Walleij
Signed-off-by: Rafael J. Wysocki

Andy Shevchenko
2021-02-16 03:09:46 +0800

08 Feb, 2021

1 commit

367948220 module: remove EXPORT_UNUSED_SYMBOL* ... Browse Code »

EXPORT_UNUSED_SYMBOL* is not actually used anywhere. Remove the
unused functionality as we generally just remove unused code anyway.

Reviewed-by: Miroslav Benes
Reviewed-by: Emil Velikov
Signed-off-by: Christoph Hellwig
Signed-off-by: Jessica Yu

Christoph Hellwig
2021-02-08 19:28:07 +0800

06 Feb, 2021

1 commit

55b6f763d init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov ... Browse Code »

On ARCH=um, loading a module doesn't result in its constructors getting
called, which breaks module gcov since the debugfs files are never
registered. On the other hand, in-kernel constructors have already been
called by the dynamic linker, so we can't call them again.

Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be
selected, but avoiding the in-kernel constructor calls.

Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we
really do want CONSTRUCTORS, just not kernel binary ones.

Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeid
Signed-off-by: Johannes Berg
Reviewed-by: Peter Oberparleiter
Cc: Arnd Bergmann
Cc: Jessica Yu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Berg
2021-02-06 03:03:47 +0800

30 Jan, 2021

1 commit

7e0a92204 fgraph: Initialize tracing_graph_pause at task creation ... Browse Code »

On some archs, the idle task can call into cpu_suspend(). The cpu_suspend()
will disable or pause function graph tracing, as there's some paths in
bringing down the CPU that can have issues with its return address being
modified. The task_struct structure has a "tracing_graph_pause" atomic
counter, that when set to something other than zero, the function graph
tracer will not modify the return address.

The problem is that the tracing_graph_pause counter is initialized when the
function graph tracer is enabled. This can corrupt the counter for the idle
task if it is suspended in these architectures.

CPU 1 CPU 2
----- -----
do_idle()
cpu_suspend()
pause_graph_tracing()
task_struct->tracing_graph_pause++ (0 -> 1)

start_graph_tracing()
for_each_online_cpu(cpu) {
ftrace_graph_init_idle_task(cpu)
task-struct->tracing_graph_pause = 0 (1 -> 0)

unpause_graph_tracing()
task_struct->tracing_graph_pause-- (0 -> -1)

The above should have gone from 1 to zero, and enabled function graph
tracing again. But instead, it is set to -1, which keeps it disabled.

There's no reason that the field tracing_graph_pause on the task_struct can
not be initialized at boot up.

Cc: stable@vger.kernel.org
Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
Reported-by: pierre.gondois@arm.com
Signed-off-by: Steven Rostedt (VMware)

Steven Rostedt (VMware)
2021-01-30 04:07:32 +0800

29 Jan, 2021

1 commit

f8408264c drivers: Remove CONFIG_OPROFILE support ... Browse Code »

The "oprofile" user-space tools don't use the kernel OPROFILE support
any more, and haven't in a long time. User-space has been converted to
the perf interfaces.

Remove kernel's old oprofile support.

Suggested-by: Christoph Hellwig
Suggested-by: Linus Torvalds
Signed-off-by: Viresh Kumar
Acked-by: Robert Richter
Acked-by: Paul E. McKenney #RCU
Acked-by: William Cohen
Acked-by: Al Viro
Acked-by: Thomas Gleixner

Viresh Kumar
2021-01-29 12:36:24 +0800

28 Jan, 2021

1 commit

432900f81 init/Kconfig: Correct thermal pressure help text ... Browse Code »

We're using arch_scale_thermal_pressure() to retrieve per CPU thermal
pressure.

Signed-off-by: Yue Hu
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Valentin Schneider
Link: https://lkml.kernel.org/r/20210127054451.1240-1-zbestahu@gmail.com

Yue Hu
2021-01-28 00:26:43 +0800