28 Apr, 2021

3 commits

  • Pull cgroup changes from Tejun Heo:
    "The only notable change is Vipin's new misc cgroup controller.

    This implements generic support for resources which can be controlled
    by simply counting and limiting the number of resource instances - ie
    there's X number of these on the system and this cgroup subtree can
    have up to Y of those.

    The first user is the address space IDs used for virtual machine
    memory encryption and expected future usages are similar - niche
    hardware features with concrete resource limits and simple usage
    models"

    * 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: use tsk->in_iowait instead of delayacct_is_task_waiting_on_io()
    cgroup/cpuset: fix typos in comments
    cgroup: misc: mark dummy misc_cg_res_total_usage() static inline
    svm/sev: Register SEV and SEV-ES ASIDs to the misc controller
    cgroup: Miscellaneous cgroup documentation.
    cgroup: Add misc cgroup controller

    Linus Torvalds
     
  • Pull CFI on arm64 support from Kees Cook:
    "This builds on last cycle's LTO work, and allows the arm64 kernels to
    be built with Clang's Control Flow Integrity feature. This feature has
    happily lived in Android kernels for almost 3 years[1], so I'm excited
    to have it ready for upstream.

    The wide diffstat is mainly due to the treewide fixing of mismatched
    list_sort prototypes. Other things in core kernel are to address
    various CFI corner cases. The largest code portion is the CFI runtime
    implementation itself (which will be shared by all architectures
    implementing support for CFI). The arm64 pieces are Acked by arm64
    maintainers rather than coming through the arm64 tree since carrying
    this tree over there was going to be awkward.

    CFI support for x86 is still under development, but is pretty close.
    There are a handful of corner cases on x86 that need some improvements
    to Clang and objtool, but otherwise works well.

    Summary:

    - Clean up list_sort prototypes (Sami Tolvanen)

    - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)"

    * tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    arm64: allow CONFIG_CFI_CLANG to be selected
    KVM: arm64: Disable CFI for nVHE
    arm64: ftrace: use function_nocfi for ftrace_call
    arm64: add __nocfi to __apply_alternatives
    arm64: add __nocfi to functions that jump to a physical address
    arm64: use function_nocfi with __pa_symbol
    arm64: implement function_nocfi
    psci: use function_nocfi for cpu_resume
    lkdtm: use function_nocfi
    treewide: Change list_sort to use const pointers
    bpf: disable CFI in dispatcher functions
    kallsyms: strip ThinLTO hashes from static functions
    kthread: use WARN_ON_FUNCTION_MISMATCH
    workqueue: use WARN_ON_FUNCTION_MISMATCH
    module: ensure __cfi_check alignment
    mm: add generic function_nocfi macro
    cfi: add __cficanonical
    add support for Clang CFI

    Linus Torvalds
     
  • Pull seccomp updates from Kees Cook:

    - Fix "cacheable" typo in comments (Cui GaoSheng)

    - Fix CONFIG for /proc/$pid/status Seccomp_filters (Kenta.Tada@sony.com)

    * tag 'seccomp-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    seccomp: Fix "cacheable" typo in comments
    seccomp: Fix CONFIG tests for Seccomp_filters

    Linus Torvalds
     

09 Apr, 2021

1 commit

  • This change adds support for Clang’s forward-edge Control Flow
    Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
    injects a runtime check before each indirect function call to ensure
    the target is a valid function with the correct static type. This
    restricts possible call targets and makes it more difficult for
    an attacker to exploit bugs that allow the modification of stored
    function pointers. For more details, see:

    https://clang.llvm.org/docs/ControlFlowIntegrity.html

    Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
    visibility to possible call targets. Kernel modules are supported
    with Clang’s cross-DSO CFI mode, which allows checking between
    independently compiled components.

    With CFI enabled, the compiler injects a __cfi_check() function into
    the kernel and each module for validating local call targets. For
    cross-module calls that cannot be validated locally, the compiler
    calls the global __cfi_slowpath_diag() function, which determines
    the target module and calls the correct __cfi_check() function. This
    patch includes a slowpath implementation that uses __module_address()
    to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
    shadow map that speeds up module look-ups by ~3x.

    Clang implements indirect call checking using jump tables and
    offers two methods of generating them. With canonical jump tables,
    the compiler renames each address-taken function to <function>.cfi
    and points the original symbol to a jump table entry, which passes
    __cfi_check() validation. This isn’t compatible with stand-alone
    assembly code, which the compiler doesn’t instrument, and would
    cause indirect calls to assembly code to fail. Therefore, we
    default to using non-canonical jump tables instead, where the compiler
    generates a local jump table entry <function>.cfi_jt for each
    address-taken function, and replaces all references to the function
    with the address of the jump table entry.

    Note that because non-canonical jump table addresses are local
    to each component, they break cross-module function address
    equality. Specifically, the address of a global function will be
    different in each module, as it's replaced with the address of a local
    jump table entry. If this address is passed to a different module,
    it won’t match the address of the same function taken there. This
    may break code that relies on comparing addresses passed from other
    components.

    CFI checking can be disabled in a function with the __nocfi attribute.
    Additionally, CFI can be disabled for an entire compilation unit by
    filtering out CC_FLAGS_CFI.

    By default, CFI failures result in a kernel panic to stop a potential
    exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
    kernel prints out a rate-limited warning instead, and allows execution
    to continue. This option is helpful for locating type mismatches, but
    should only be enabled during development.
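
    As a rough illustration of the check described above (a toy model in
    Python, not the compiler's instrumentation), the sketch below tags each
    function with a signature string and compares it against the signature
    expected at the call site. The names typed(), checked_call() and
    CFIViolation are hypothetical, the exception stands in for the kernel
    panic, and the warning is not rate-limited here.

```python
import warnings


class CFIViolation(Exception):
    """Stands in for the panic taken on a CFI failure in enforcing mode."""


def typed(signature):
    """Tag a function with the signature it was 'compiled' with (hypothetical)."""
    def decorate(fn):
        fn.cfi_signature = signature
        return fn
    return decorate


def checked_call(target, expected_signature, *args, permissive=False):
    """Model of the check injected before an indirect call."""
    actual = getattr(target, "cfi_signature", None)
    if actual != expected_signature:
        message = f"CFI failure: expected {expected_signature!r}, got {actual!r}"
        if not permissive:
            raise CFIViolation(message)
        # Permissive mode: report the mismatch and let the call proceed.
        warnings.warn(message, RuntimeWarning)
    return target(*args)


# Signature the call site expects (illustrative string only).
CMP_SIGNATURE = "int (void *, const node *, const node *)"


@typed("int (void *, const node *, const node *)")
def cmp_const(priv, a, b):
    return (a > b) - (a < b)


@typed("int (void *, node *, node *)")  # mismatched: non-const pointers
def cmp_nonconst(priv, a, b):
    return (a > b) - (a < b)


if __name__ == "__main__":
    print(checked_call(cmp_const, CMP_SIGNATURE, None, 1, 2))
```

    Calling checked_call(cmp_nonconst, CMP_SIGNATURE, ...) raises in the
    default mode and only warns with permissive=True, which mirrors the
    difference between the default behavior and CONFIG_CFI_PERMISSIVE.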

    Signed-off-by: Sami Tolvanen
    Reviewed-by: Kees Cook
    Tested-by: Nathan Chancellor
    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com

    Sami Tolvanen
     

08 Apr, 2021

1 commit

  • This provides the ability for architectures to enable kernel stack base
    address offset randomization. This feature is controlled by the boot
    param "randomize_kstack_offset=on/off", with its default value set by
    CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.

    This feature is based on the original idea from the last public release
    of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt
    All the credit for the original idea goes to the PaX team. Note that
    the design and implementation of this upstream randomize_kstack_offset
    feature differs greatly from the RANDKSTACK feature (see below).

    Reasoning for the feature:

    This feature aims to make the various stack-based attacks that rely on
    a deterministic stack structure harder. We have had many such attacks
    in the past (to name just a few):

    https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
    https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
    https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html

    As Linux kernel stack protections have been constantly improving
    (vmap-based stack allocation with guard pages, removal of thread_info,
    STACKLEAK), attackers have had to find new ways for their exploits
    to work. They have done so, continuing to rely on the kernel's stack
    determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
    were not relevant. For example, the following recent attacks would have
    been hampered if the stack offset was non-deterministic between syscalls:

    https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
    (page 70: targeting the pt_regs copy with linear stack overflow)

    https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
    (leaked stack address from one syscall as a target during next syscall)

    The main idea is that since the stack offset is randomized on each system
    call, it is harder for an attack to reliably land in any particular place
    on the thread stack, even with address exposures, as the stack base will
    change on the next syscall. Also, since randomization is performed after
    placing pt_regs, the ptrace-based approach[1] to discover the randomized
    offset during a long-running syscall should not be possible.

    Design description:

    During most of the kernel's execution, it runs on the "thread stack",
    which is pretty deterministic in its structure: it is fixed in size,
    and on every entry from userspace to kernel on a syscall the thread
    stack starts construction from an address fetched from the per-cpu
    cpu_current_top_of_stack variable. The first element to be pushed to the
    thread stack is the pt_regs struct that stores all required CPU registers
    and syscall parameters. Finally the specific syscall function is called,
    with the stack being used as the kernel executes the resulting request.

    The goal of randomize_kstack_offset feature is to add a random offset
    after the pt_regs has been pushed to the stack and before the rest of the
    thread stack is used during the syscall processing, and to change it every
    time a process issues a syscall. The source of randomness is currently
    architecture-defined (but x86 is using the low byte of rdtsc()). Future
    improvements using different entropy sources are possible, but out of
    scope for this patch. Furthermore, to add more unpredictability, new offsets
    are chosen at the end of syscalls (the timing of which should be less
    easy to measure from userspace than at syscall entry time), and stored
    in a per-CPU variable, so that the life of the value does not stay
    explicitly tied to a single task.

    As suggested by Andy Lutomirski, the offset is added using alloca()
    and an empty asm() statement with an output constraint, since it avoids
    changes to assembly syscall entry code, to the unwinder, and provides
    correct stack alignment as defined by the compiler.

    In order to make this available by default with zero performance impact
    for those that don't want it, it is boot-time selectable with static
    branches. This way, if the overhead is not wanted, it can just be
    left turned off with no performance impact.

    The generated assembly for x86_64 with GCC looks like this:

    ...
    ffffffff81003977: 65 8b 05 02 ea 00 7f mov %gs:0x7f00ea02(%rip),%eax # 12380
    ffffffff8100397e: 25 ff 03 00 00 and $0x3ff,%eax
    ffffffff81003983: 48 83 c0 0f add $0xf,%rax
    ffffffff81003987: 25 f8 07 00 00 and $0x7f8,%eax
    ffffffff8100398c: 48 29 c4 sub %rax,%rsp
    ffffffff8100398f: 48 8d 44 24 0f lea 0xf(%rsp),%rax
    ffffffff81003994: 48 83 e0 f0 and $0xfffffffffffffff0,%rax
    ...

    As a result of the above stack alignment, this patch introduces about
    5 bits of randomness after pt_regs is spilled to the thread stack on
    x86_64, and 6 bits on x86_32 (since it has 1 fewer bit required for
    stack alignment). The amount of entropy could be adjusted based on how
    much of the stack space we wish to trade for security.
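
    The "about 5 bits" figure can be checked with a small sketch that
    replays the and/add/and sequence from the x86_64 listing for every
    possible input. It assumes the entropy is one byte (the low byte of
    rdtsc(), as stated earlier) and looks only at the amount subtracted
    from %rsp; the function names are made up for this illustration.

```python
import math


def stack_decrement(entropy):
    """Replay the masking steps from the x86_64 listing for one input value."""
    offset = entropy & 0x3FF   # and $0x3ff,%eax
    offset += 0xF              # add $0xf,%rax
    return offset & 0x7F8      # and $0x7f8,%eax (then: sub %rax,%rsp)


def distinct_decrements(entropy_values):
    """Return the sorted set of amounts that can be subtracted from %rsp."""
    return sorted({stack_decrement(e) for e in entropy_values})


if __name__ == "__main__":
    values = distinct_decrements(range(256))  # assumed: one byte of entropy
    print(len(values), "distinct offsets,",
          round(math.log2(len(values)), 2), "bits")
```

    Under these assumptions there are 33 distinct decrements, all multiples
    of 8, which is slightly more than 5 bits.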

    My measure of syscall performance overhead (on x86_64):

    lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null
    randomize_kstack_offset=y Simple syscall: 0.7082 microseconds
    randomize_kstack_offset=n Simple syscall: 0.7016 microseconds

    So, roughly 0.9% overhead growth for a no-op syscall, which is very
    manageable. And for people that don't want this, it's off by default.
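
    The overhead figure follows directly from the two lmbench numbers
    above; the helper name below is only for illustration.

```python
def relative_overhead(with_feature, baseline):
    """Fractional slowdown of `with_feature` relative to `baseline`."""
    return (with_feature - baseline) / baseline


if __name__ == "__main__":
    # Simple syscall latencies in microseconds, taken from the measurement above.
    print(f"{relative_overhead(0.7082, 0.7016):.2%}")
```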

    There are two gotchas with using the alloca() trick. First,
    compilers that have Stack Clash protection (-fstack-clash-protection)
    enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to
    any dynamic stack allocations. While the randomization offset is
    always less than a page, the resulting assembly would still contain
    (unreachable!) probing routines, bloating the resulting assembly. To
    avoid this, -fno-stack-clash-protection is unconditionally added to
    the kernel Makefile since this is the only dynamic stack allocation in
    the kernel (now that VLAs have been removed) and it is provably safe
    from Stack Clash style attacks.

    The second gotcha with alloca() is a negative interaction with
    -fstack-protector*, in that it sees the alloca() as an array allocation,
    which triggers the unconditional addition of the stack canary function
    pre/post-amble which slows down syscalls regardless of the static
    branch. In order to avoid adding this unneeded check and its associated
    performance impact, architectures need to carefully remove uses of
    -fstack-protector-strong (or -fstack-protector) in the compilation units
    that use the add_random_kstack() macro and to audit the resulting stack
    mitigation coverage (to make sure no desired coverage disappears). No
    change is visible for this on x86 because the stack protector is already
    unconditionally disabled for the compilation unit, but the change is
    required on arm64. There is, unfortunately, no attribute that can be
    used to disable stack protector for specific functions.

    Comparison to PaX RANDKSTACK feature:

    The RANDKSTACK feature randomizes the location of the stack start
    (cpu_current_top_of_stack), i.e. including the location of pt_regs
    structure itself on the stack. Initially this patch followed the same
    approach, but during the recent discussions[2], it has been determined
    to be of little value since, if ptrace functionality is available to
    an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write
    different offsets in the pt_regs struct, observe the cache behavior of
    the pt_regs accesses, and figure out the random stack offset. Another
    difference is that the random offset is stored in a per-cpu variable,
    rather than having it be per-thread. As a result, these implementations
    differ a fair bit in their implementation details and results, though
    obviously the intent is similar.

    [1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/
    [2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/
    [3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html

    Co-developed-by: Elena Reshetova
    Signed-off-by: Elena Reshetova
    Signed-off-by: Kees Cook
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org

    Kees Cook
     

05 Apr, 2021

1 commit

  • The Miscellaneous cgroup provides the resource limiting and tracking
    mechanism for the scalar resources which cannot be abstracted like the
    other cgroup resources. The controller is enabled by the CONFIG_CGROUP_MISC
    config option.

    A resource can be added to the controller via enum misc_res_type{} in
    the include/linux/misc_cgroup.h file and the corresponding name via
    misc_res_name[] in the kernel/cgroup/misc.c file. The provider of the
    resource must set its capacity prior to using the resource by calling
    misc_cg_set_capacity().

    Once a capacity is set then the resource usage can be updated using
    charge and uncharge APIs. All of the APIs to interact with misc
    controller are in include/linux/misc_cgroup.h.

    The Miscellaneous controller provides 3 interface files. If two misc
    resources (res_a and res_b) are registered then:

    misc.capacity
    A read-only flat-keyed file shown only in the root cgroup. It shows
    miscellaneous scalar resources available on the platform along with
    their quantities::

    $ cat misc.capacity
    res_a 50
    res_b 10

    misc.current
    A read-only flat-keyed file shown in the non-root cgroups. It shows
    the current usage of the resources in the cgroup and its children::

    $ cat misc.current
    res_a 3
    res_b 0

    misc.max
    A read-write flat-keyed file shown in the non-root cgroups. It sets the
    allowed maximum usage of the resources in the cgroup and its children::

    $ cat misc.max
    res_a max
    res_b 4

    Limit can be set by::

    # echo res_a 1 > misc.max

    Limit can be set to max by::

    # echo res_a max > misc.max

    Limits can be set higher than the capacity value in the misc.capacity
    file.
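
    The counting-and-limiting semantics described above can be sketched as
    a small model. This is a simplified illustration, not the kernel
    implementation: the class and function names are hypothetical, a single
    resource is tracked, and the assumption is that a charge must fit under
    the limit of the cgroup and every ancestor as well as under the
    platform capacity.

```python
MAX = float("inf")  # stands in for the literal "max" written to misc.max


class MiscCgroup:
    """Hypothetical model of one cgroup's accounting for a single resource."""

    def __init__(self, parent=None, limit=MAX):
        self.parent = parent
        self.limit = limit  # what a user would write to misc.max
        self.usage = 0      # what misc.current would report

    def self_and_ancestors(self):
        node = self
        while node is not None:
            yield node
            node = node.parent


def try_charge(cgroup, amount, capacity):
    """Charge `amount` to the cgroup and all its ancestors, or charge nothing."""
    path = list(cgroup.self_and_ancestors())
    for node in path:
        new_usage = node.usage + amount
        if new_usage > node.limit or new_usage > capacity:
            return False  # some level would exceed its limit or the capacity
    for node in path:
        node.usage += amount
    return True


def uncharge(cgroup, amount):
    """Give `amount` back at every level it was charged to."""
    for node in cgroup.self_and_ancestors():
        node.usage -= amount


if __name__ == "__main__":
    root = MiscCgroup()
    group = MiscCgroup(parent=root, limit=4)  # like: echo res_b 4 > misc.max
    print(try_charge(group, 3, capacity=10), try_charge(group, 2, capacity=10))
```

    In this model a limit above the capacity is harmless, because the
    capacity check still rejects the charge.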

    Signed-off-by: Vipin Sharma
    Reviewed-by: David Rientjes
    Signed-off-by: Tejun Heo

    Vipin Sharma
     

31 Mar, 2021

1 commit

  • Strictly speaking, seccomp filters are only used
    when CONFIG_SECCOMP_FILTER is enabled.
    This patch fixes the condition under which "Seccomp_filters"
    is shown in /proc/$pid/status.
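
    From userspace, the effect can be observed by looking for the field in
    the status file. The helper below is a minimal sketch with a made-up
    name; it returns None when the field is absent, which is what one would
    expect on a kernel that does not expose it.

```python
def read_status_field(status_text, key):
    """Return the value of `key` from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip()
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as status_file:
            text = status_file.read()
    except OSError:
        text = ""  # not on Linux, or /proc is not mounted
    print("Seccomp_filters:", read_status_field(text, "Seccomp_filters"))
```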

    Signed-off-by: Kenta Tada
    Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status")
    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/OSBPR01MB26772D245E2CF4F26B76A989F5669@OSBPR01MB2677.jpnprd01.prod.outlook.com

    Kenta.Tada@sony.com
     

15 Mar, 2021

1 commit

  • Merge misc fixes from Andrew Morton:
    "28 patches.

    Subsystems affected by this series: mm (memblock, pagealloc, hugetlb,
    highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and
    zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and
    ia64"

    * emailed patches from Andrew Morton: (28 commits)
    zram: fix broken page writeback
    zram: fix return value on writeback_store
    mm/memcg: set memcg when splitting page
    mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
    ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
    ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
    mm/userfaultfd: fix memory corruption due to writeprotect
    kasan: fix KASAN_STACK dependency for HW_TAGS
    kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
    mm/madvise: replace ptrace attach requirement for process_madvise
    include/linux/sched/mm.h: use rcu_dereference in in_vfork()
    kfence: fix reports if constant function prefixes exist
    kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
    kfence: fix printk format for ptrdiff_t
    linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
    MAINTAINERS: exclude uapi directories in API/ABI section
    binfmt_misc: fix possible deadlock in bm_register_write
    mm/highmem.c: fix zero_user_segments() with start > end
    hugetlb: do early cow when page pinned on src mm
    mm: use is_cow_mapping() across tree where proper
    ...

    Linus Torvalds
     

14 Mar, 2021

1 commit

  • I read the commit log of the following two:

    - bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML")
    - 334ef6ed06fa ("init/Kconfig: make COMPILE_TEST depend on !S390")

    Both are talking about HAS_IOMEM dependency missing in many drivers.

    So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me.

    This does not change the behavior of UML. UML still cannot enable
    COMPILE_TEST because it does not provide HAS_IOMEM.

    The current dependency for S390 is too strong. Under the condition of
    CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST.

    I also removed the meaningless 'default n'.

    Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org
    Signed-off-by: Masahiro Yamada
    Cc: Heiko Carstens
    Cc: Guenter Roeck
    Cc: Arnd Bergmann
    Cc: Kees Cook
    Cc: Daniel Borkmann
    Cc: Johannes Weiner
    Cc: KP Singh
    Cc: Nathan Chancellor
    Cc: Nick Terrell
    Cc: Quentin Perret
    Cc: Valentin Schneider
    Cc: "Enrico Weigelt, metux IT consult"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

11 Mar, 2021

1 commit

  • Linus reported a build error due to the GCC plugin incompatibility
    when the compiler is upgraded. [1]

    GCC plugins are tied to a particular GCC version. So, they must be
    rebuilt when the compiler is upgraded.

    This seems to be a long-standing flaw since the initial support of
    GCC plugins.

    Extend commit 8b59cd81dc5e ("kbuild: ensure full rebuild when the
    compiler is updated"), so that GCC plugins are covered by the
    compiler upgrade detection.

    [1]: https://lore.kernel.org/lkml/CAHk-=wieoN5ttOy7SnsGwZv+Fni3R6m-Ut=oxih6bbZ28G+4dw@mail.gmail.com/

    Reported-by: Linus Torvalds
    Signed-off-by: Masahiro Yamada
    Reviewed-by: Kees Cook

    Masahiro Yamada
     

28 Feb, 2021

1 commit

  • Commit fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
    does not work as expected if the .config file has already specified
    CONFIG_UNUSED_KSYMS_WHITELIST="my/own/white/list" before enabling
    CONFIG_LTO_CLANG.

    So, the user-supplied whitelist and the LTO-specific whitelist must be
    independent of each other.

    I refactored the shell script so CONFIG_MODVERSIONS and CONFIG_LTO_CLANG
    handle whitelists in the same way.

    Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
    Signed-off-by: Masahiro Yamada
    Tested-by: Sedat Dilek

    Masahiro Yamada
     

27 Feb, 2021

7 commits

  • Pull RISC-V updates from Palmer Dabbelt:
    "A handful of new RISC-V related patches for this merge window:

    - A check to ensure drivers are properly using uaccess. This isn't
    manifesting with any of the drivers I'm currently using, but may
    catch errors in new drivers.

    - Some preliminary support for the FU740, along with the HiFive
    Unmatched it will appear on.

    - NUMA support for RISC-V, which involves making the arm64 code
    generic.

    - Support for kasan on the vmalloc region.

    - A handful of new drivers for the Kendryte K210, along with the DT
    plumbing required to boot on a handful of K210-based boards.

    - Support for allocating ASIDs.

    - Preliminary support for kernels larger than 128MiB.

    - Various other improvements to our KASAN support, including the
    utilization of huge pages when allocating the KASAN regions.

    We may have already found a bug with the KASAN_VMALLOC code, but it's
    passing my tests. There's a fix in the works, but that will probably
    miss the merge window."

    * tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits)
    riscv: Improve kasan population by using hugepages when possible
    riscv: Improve kasan population function
    riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization
    riscv: Improve kasan definitions
    riscv: Get rid of MAX_EARLY_MAPPING_SIZE
    soc: canaan: Sort the Makefile alphabetically
    riscv: Disable KSAN_SANITIZE for vDSO
    riscv: Remove unnecessary declaration
    riscv: Add Canaan Kendryte K210 SD card defconfig
    riscv: Update Canaan Kendryte K210 defconfig
    riscv: Add Kendryte KD233 board device tree
    riscv: Add SiPeed MAIXDUINO board device tree
    riscv: Add SiPeed MAIX GO board device tree
    riscv: Add SiPeed MAIX DOCK board device tree
    riscv: Add SiPeed MAIX BiT board device tree
    riscv: Update Canaan Kendryte K210 device tree
    dt-bindings: add resets property to dw-apb-timer
    dt-bindings: fix sifive gpio properties
    dt-bindings: update sifive uart compatible string
    dt-bindings: update sifive clint compatible string
    ...

    Linus Torvalds
     
  • On systems with large amounts of reserved memory we may fail to
    successfully complete unpack_to_rootfs() and be left with:

    Kernel panic - not syncing: write error

    This is not very helpful for understanding what happened, so let's wrap the
    panic() calls with a surrounding show_mem() such that we have a chance of
    understanding the memory conditions leading to these allocation failures.

    [akpm@linux-foundation.org: replace macro with C function]

    Link: https://lkml.kernel.org/r/20210114231517.1854379-1-f.fainelli@gmail.com
    Signed-off-by: Florian Fainelli
    Cc: Barret Rhoden
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Fainelli
     
  • Currently, breakpoints in the kernel .init.text section are not handled
    correctly: they can still be removed even after the corresponding pages
    have been freed.

    Fix it by killing .init.text section breakpoints just prior to initmem
    pages being freed.

    Doug: "HW breakpoints aren't handled by this patch but it's probably
    not such a big deal".

    Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.org
    Signed-off-by: Sumit Garg
    Suggested-by: Doug Anderson
    Acked-by: Doug Anderson
    Acked-by: Daniel Thompson
    Tested-by: Daniel Thompson
    Cc: Masami Hiramatsu
    Cc: Steven Rostedt (VMware)
    Cc: Jason Wessel
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sumit Garg
     
  • s/compier/compiler/

    Link: https://lkml.kernel.org/r/20210224223325.29099-1-unixbhaskar@gmail.com
    Signed-off-by: Bhaskar Chowdhury
    Acked-by: Randy Dunlap
    Reviewed-by: Nathan Chancellor
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaskar Chowdhury
     
  • This code hunk creates a Version_<LINUX_VERSION_CODE> symbol if
    CONFIG_KALLSYMS is disabled. For example, building the kernel v5.10 for
    allnoconfig creates the following symbol:

    $ nm vmlinux | grep Version_
    c116b028 B Version_330240
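
    The numeric suffix in the nm output is consistent with the version code
    of v5.10.0, assuming the usual KERNEL_VERSION(a, b, c) packing of
    (a << 16) + (b << 8) + c. The Python helpers below are only an
    illustration of that arithmetic.

```python
def kernel_version(major, minor, patch):
    """Pack a version triple the way KERNEL_VERSION(a, b, c) is assumed to."""
    return (major << 16) + (minor << 8) + patch


def version_symbol(major, minor, patch):
    """Name of the symbol that would be generated for this kernel version."""
    return f"Version_{kernel_version(major, minor, patch)}"


if __name__ == "__main__":
    print(version_symbol(5, 10, 0))
```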

    There is no in-tree user of this symbol.

    Commit 197dcffc8ba0 ("init/version.c: define version_string only if
    CONFIG_KALLSYMS is not defined") mentions that Version_* is only used
    with ksymoops.

    However, a commit in the pre-git era [1] had added the statement,
    "ksymoops is useless on 2.6. Please use the Oops in its original format".

    That statement existed until commit 4eb9241127a0 ("Documentation:
    admin-guide: update bug-hunting.rst") finally removed the stale
    ksymoops information.

    This symbol is no longer needed.

    [1] https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=ad68b2f085f5c79e4759ca2d13947b3c885ee831

    Link: https://lkml.kernel.org/r/20210120033452.2895170-1-masahiroy@kernel.org
    Signed-off-by: Masahiro Yamada
    Cc: Mauro Carvalho Chehab
    Cc: Randy Dunlap
    Cc: Daniel Guilak
    Cc: Lee Revell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • Add a kernel parameter stack_depot_disable to disable stack depot, so
    that the stack hash table doesn't consume any memory when stack depot is
    disabled.

    The use case is CONFIG_PAGE_OWNER without page_owner=on. Without this
    patch, stackdepot will consume the memory for the hashtable. By default,
    it's 8M, which is far from trivial.

    With this option, on a system configured with CONFIG_PAGE_OWNER, passing
    page_owner=off and stack_depot_disable on the kernel command line saves
    the memory that the hashtable would otherwise waste.

    [akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build]

    Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeaurora.org
    Signed-off-by: Vinayak Menon
    Signed-off-by: Vijayanand Jitta
    Cc: Alexander Potapenko
    Cc: Minchan Kim
    Cc: Yogesh Lal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vijayanand Jitta
     
  • Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.

    This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
    low-overhead sampling-based memory safety error detector of heap
    use-after-free, invalid-free, and out-of-bounds access errors. This
    series enables KFENCE for the x86 and arm64 architectures, and adds
    KFENCE hooks to the SLAB and SLUB allocators.

    KFENCE is designed to be enabled in production kernels, and has near
    zero performance overhead. Compared to KASAN, KFENCE gains performance
    at the cost of precision. The main motivation behind KFENCE's design is
    that with enough total uptime KFENCE will detect bugs in code paths not
    typically exercised by non-production test workloads. One way to quickly
    achieve a large enough total uptime is to deploy the tool across a large
    fleet of machines.

    KFENCE objects each reside on a dedicated page, at either the left or
    right page boundaries. The pages to the left and right of the object
    page are "guard pages", whose attributes are changed to a protected
    state, and cause page faults on any attempted access to them. Such page
    faults are then intercepted by KFENCE, which handles the fault
    gracefully by reporting a memory access error.

    Guarded allocations are set up based on a sample interval (can be set
    via kfence.sample_interval). After expiration of the sample interval,
    the next allocation through the main allocator (SLAB or SLUB) returns a
    guarded allocation from the KFENCE object pool. At this point, the timer
    is reset, and the next allocation is set up after the expiration of the
    interval.

    To enable/disable a KFENCE allocation through the main allocator's
    fast-path without overhead, KFENCE relies on static branches via the
    static keys infrastructure. The static branch is toggled to redirect the
    allocation to KFENCE.
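
    The sampling behavior can be pictured with a toy model in which a
    boolean plays the role of the static branch. This is a sketch under
    simplifying assumptions (explicit timestamps instead of a real timer,
    no pool limits), and the class and method names are hypothetical.

```python
class SampledAllocator:
    """Toy model: at most one guarded allocation per elapsed sample interval."""

    def __init__(self, sample_interval_ms):
        self.sample_interval_ms = sample_interval_ms
        self.gate_open = False  # stands in for the static branch
        self.next_open_ms = sample_interval_ms

    def timer_tick(self, now_ms):
        """Timer side: open the gate once the interval has expired."""
        if now_ms >= self.next_open_ms:
            self.gate_open = True

    def alloc(self, now_ms):
        """Allocation side: report which allocator serves this request."""
        self.timer_tick(now_ms)
        if self.gate_open:
            # Redirect this one allocation, then close the gate and re-arm.
            self.gate_open = False
            self.next_open_ms = now_ms + self.sample_interval_ms
            return "kfence"
        return "slab"


if __name__ == "__main__":
    allocator = SampledAllocator(sample_interval_ms=100)
    print([allocator.alloc(t) for t in (0, 50, 100, 150, 250)])
```

    While the gate is closed the fast path only tests one flag, which is
    the property the static branch provides without runtime overhead.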

    The KFENCE memory pool is of fixed size, and if the pool is exhausted no
    further KFENCE allocations occur. The default config is conservative
    with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
    pages).
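
    The 2 MiB figure can be reproduced with a short calculation. The
    formula of (objects + 1) * 2 pages is an assumption here: it matches a
    layout in which every object page is paired with a guard page, plus one
    extra pair at the start of the pool. The function names are
    illustrative only.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as in the text


def pool_pages(num_objects):
    """Assumed layout: one leading page pair, then (object, guard) per object."""
    return (num_objects + 1) * 2


def pool_bytes(num_objects, page_size=PAGE_SIZE):
    """Total size of the pool in bytes under the assumed layout."""
    return pool_pages(num_objects) * page_size


if __name__ == "__main__":
    print(pool_bytes(255) // (1024 * 1024), "MiB")
```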

    We have verified by running synthetic benchmarks (sysbench I/O,
    hackbench) and production server-workload benchmarks that a kernel with
    KFENCE (using sample intervals 100-500ms) is performance-neutral
    compared to a non-KFENCE baseline kernel.

    KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
    properties. The name "KFENCE" is a homage to the Electric Fence Malloc
    Debugger [2].

    For more details, see Documentation/dev-tools/kfence.rst added in the
    series -- also viewable here:

    https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

    [1] http://llvm.org/docs/GwpAsan.html
    [2] https://linux.die.net/man/3/efence

    This patch (of 9):

    This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
    low-overhead sampling-based memory safety error detector of heap
    use-after-free, invalid-free, and out-of-bounds access errors.

    KFENCE is designed to be enabled in production kernels, and has near
    zero performance overhead. Compared to KASAN, KFENCE gains performance
    at the cost of precision. The main motivation behind KFENCE's design is
    that with enough total uptime KFENCE will detect bugs in code paths not
    typically exercised by non-production test workloads. One way to quickly
    achieve a large enough total uptime is to deploy the tool across a large
    fleet of machines.

    KFENCE objects each reside on a dedicated page, at either the left or
    right page boundaries. The pages to the left and right of the object
    page are "guard pages", whose attributes are changed to a protected
    state, and cause page faults on any attempted access to them. Such page
    faults are then intercepted by KFENCE, which handles the fault
    gracefully by reporting a memory access error. To detect out-of-bounds
    writes to memory within the object's page itself, KFENCE also uses
    pattern-based redzones. The following figure illustrates the page
    layout:

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---
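
    The layout in the figure can be sketched as a small address
    calculation. This is an illustration only, not the kernel code: the
    function name, the 4 KiB page size and the align-down step for
    right-aligned objects are assumptions.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def place_object(page_start, size, align, right):
    """Return (object_address, redzone_before, redzone_after) in bytes.

    A left-aligned object starts at the page boundary. A right-aligned
    object is pushed towards the end of the page and then aligned down,
    which can leave a few redzone bytes after it as well.
    """
    if not 0 < size <= PAGE_SIZE:
        raise ValueError("object must fit in one page")
    addr = page_start
    if right:
        addr = page_start + PAGE_SIZE - size
        addr -= addr % align  # align down to the required alignment
    redzone_before = addr - page_start
    redzone_after = page_start + PAGE_SIZE - (addr + size)
    return addr, redzone_before, redzone_after


if __name__ == "__main__":
    print(place_object(0x10000, 100, 8, right=False))
    print(place_object(0x10000, 100, 8, right=True))
```

    Accesses that run off the page on the aligned side hit a guard page
    and fault; everything else inside the object page is covered by the
    pattern-based redzone.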

    Guarded allocations are set up based on a sample interval (can be set
    via kfence.sample_interval). After expiration of the sample interval, a
    guarded allocation from the KFENCE object pool is returned to the main
    allocator (SLAB or SLUB). At this point, the timer is reset, and the
    next allocation is set up after the expiration of the interval.
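
    The sampling behaviour described above can be modelled with a
    logical clock. The class below is a hypothetical simulation, not the
    kernel implementation: it serves one allocation from the guarded
    pool each time the interval has expired and falls back to the
    regular allocator otherwise, or when the fixed-size pool is
    exhausted.

```python
class SampledAllocator:
    """Toy model of interval-based sampling into a fixed-size pool."""

    def __init__(self, sample_interval_ms, pool_objects):
        self.interval = sample_interval_ms
        self.free_slots = pool_objects
        self.next_sample_at = sample_interval_ms

    def alloc(self, now_ms):
        """Return which allocator serves the request made at now_ms."""
        if now_ms >= self.next_sample_at and self.free_slots > 0:
            self.free_slots -= 1
            # Reset the timer: the next guarded allocation is set up
            # only after another full interval has passed.
            self.next_sample_at = now_ms + self.interval
            return "kfence"
        return "slab"


if __name__ == "__main__":
    a = SampledAllocator(sample_interval_ms=100, pool_objects=2)
    for t in (0, 100, 101, 200, 300):
        print(t, a.alloc(t))
```

    In this model at most one allocation per interval is redirected,
    which is why the overhead stays bounded regardless of the
    allocation rate.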

    To enable/disable a KFENCE allocation through the main allocator's
    fast-path without overhead, KFENCE relies on static branches via the
    static keys infrastructure. The static branch is toggled to redirect the
    allocation to KFENCE. To date, we have verified by running synthetic
    benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
    is performance-neutral compared to the non-KFENCE baseline.

    For more details, see Documentation/dev-tools/kfence.rst (added later in
    the series).

    [elver@google.com: fix parameter description for kfence_object_start()]
    Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com
    [elver@google.com: avoid stalling work queue task without allocations]
    Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
    Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
    [elver@google.com: fix potential deadlock due to wake_up()]
    Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
    Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
    [elver@google.com: add option to use KFENCE without static keys]
    Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
    [elver@google.com: add missing copyright and description headers]
    Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com

    Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
    Signed-off-by: Marco Elver
    Signed-off-by: Alexander Potapenko
    Reviewed-by: Dmitry Vyukov
    Reviewed-by: SeongJae Park
    Co-developed-by: Marco Elver
    Reviewed-by: Jann Horn
    Cc: "H. Peter Anvin"
    Cc: Paul E. McKenney
    Cc: Andrey Konovalov
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christopher Lameter
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Eric Dumazet
    Cc: Greg Kroah-Hartman
    Cc: Hillf Danton
    Cc: Ingo Molnar
    Cc: Jonathan Corbet
    Cc: Joonsoo Kim
    Cc: Joern Engel
    Cc: Kees Cook
    Cc: Mark Rutland
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

26 Feb, 2021

1 commit

  • Pull Kbuild updates from Masahiro Yamada:

    - Fix false-positive build warnings for ARCH=ia64 builds

    - Optimize dictionary size for module compression with xz

    - Check the compiler and linker versions in Kconfig

    - Fix misuse of extra-y

    - Support DWARF v5 debug info

    - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
    exceeded the limit

    - Add generic syscall{tbl,hdr}.sh for cleanups across arches

    - Minor cleanups of genksyms

    - Minor cleanups of Kconfig
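
    The SUBLEVEL clamp in the list above is easiest to see with the
    version-code encoding, which packs the sublevel into 8 bits. The
    following sketch uses made-up function names and shows how an
    unclamped sublevel of 256 would collide with the next minor
    release.

```python
def version_code_unclamped(major, minor, sublevel):
    """Pack a version the old way: the sublevel can overflow 8 bits."""
    return (major << 16) + (minor << 8) + sublevel


def version_code(major, minor, sublevel):
    """Pack a version with the sublevel clamped to 255."""
    return (major << 16) + (minor << 8) + min(sublevel, 255)


if __name__ == "__main__":
    # Without the clamp, 4.9.256 is indistinguishable from 4.10.0.
    print(version_code_unclamped(4, 9, 256) == version_code_unclamped(4, 10, 0))
    # With the clamp, 4.9.256 still sorts below 4.10.0.
    print(version_code(4, 9, 256) < version_code(4, 10, 0))
```

    The cost of clamping is that all sublevels from 255 upwards map to
    the same code, so they can no longer be told apart by this value
    alone.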

    * tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
    initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
    kbuild: remove deprecated 'always' and 'hostprogs-y/m'
    kbuild: parse C= and M= before changing the working directory
    kbuild: reuse this-makefile to define abs_srctree
    kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
    kconfig: omit --oldaskconfig option for 'make config'
    kconfig: fix 'invalid option' for help option
    kconfig: remove dead code in conf_askvalue()
    kconfig: clean up nested if-conditionals in check_conf()
    kconfig: Remove duplicate call to sym_get_string_value()
    Makefile: Remove # characters from compiler string
    Makefile: reuse CC_VERSION_TEXT
    kbuild: check the minimum linker version in Kconfig
    kbuild: remove ld-version macro
    scripts: add generic syscallhdr.sh
    scripts: add generic syscalltbl.sh
    arch: syscalls: remove $(srctree)/ prefix from syscall tables
    arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
    gen_compile_commands: prune some directories
    kbuild: simplify access to the kernel's version
    ...

    Linus Torvalds
     

25 Feb, 2021

4 commits

  • Merge misc updates from Andrew Morton:
    "A few small subsystems and some of MM.

    172 patches.

    Subsystems affected by this patch series: hexagon, scripts, ntfs,
    ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap,
    memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan,
    pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
    mempolicy, oom-kill, hugetlbfs, and migration)"

    * emailed patches from Andrew Morton : (172 commits)
    mm/migrate: remove unneeded semicolons
    hugetlbfs: remove unneeded return value of hugetlb_vmtruncate()
    hugetlbfs: fix some comment typos
    hugetlbfs: correct some obsolete comments about inode i_mutex
    hugetlbfs: make hugepage size conversion more readable
    hugetlbfs: remove meaningless variable avoid_reserve
    hugetlbfs: correct obsolete function name in hugetlbfs_read_iter()
    hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs
    hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()
    hugetlbfs: remove special hugetlbfs_set_page_dirty()
    mm/hugetlb: change hugetlb_reserve_pages() to type bool
    mm, oom: fix a comment in dump_task()
    mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk()
    numa balancing: migrate on fault among multiple bound nodes
    mm, compaction: make fast_isolate_freepages() stay within zone
    mm/compaction: fix misbehaviors of fast_find_migrateblock()
    mm/compaction: correct deferral logic for proactive compaction
    mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked
    mm/compaction: remove rcu_read_lock during page compaction
    z3fold: simplify the zhdr initialization code in init_z3fold_page()
    ...

    Linus Torvalds
     
  • The boot param and config determine the value of memcg_sysfs_enabled,
    which is unused since commit 10befea91b61 ("mm: memcg/slab: use a single
    set of kmem_caches for all allocations") as there are no per-memcg kmem
    caches anymore.

    Link: https://lkml.kernel.org/r/20210127124745.7928-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Reviewed-by: David Hildenbrand
    Acked-by: Roman Gushchin
    Acked-by: David Rientjes
    Reviewed-by: Miaohe Lin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Pull Simple Firmware Interface (SFI) support removal from Rafael Wysocki:
    "Drop support for deprecated platforms using SFI, drop the entire
    support for SFI that has been long deprecated too and make some
    janitorial changes on top of that (Andy Shevchenko)"

    * tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    x86/platform/intel-mid: Update Copyright year and drop file names
    x86/platform/intel-mid: Remove unused header inclusion in intel-mid.h
    x86/platform/intel-mid: Drop unused __intel_mid_cpu_chip and Co.
    x86/platform/intel-mid: Get rid of intel_scu_ipc_legacy.h
    x86/PCI: Describe @reg for type1_access_ok()
    x86/PCI: Get rid of custom x86 model comparison
    sfi: Remove framework for deprecated firmware
    cpufreq: sfi-cpufreq: Remove driver for deprecated firmware
    media: atomisp: Remove unused header
    mfd: intel_msic: Remove driver for deprecated platform
    x86/apb_timer: Remove driver for deprecated platform
    x86/platform/intel-mid: Remove unused leftovers (vRTC)
    x86/platform/intel-mid: Remove unused leftovers (msic)
    x86/platform/intel-mid: Remove unused leftovers (msic_thermal)
    x86/platform/intel-mid: Remove unused leftovers (msic_power_btn)
    x86/platform/intel-mid: Remove unused leftovers (msic_gpio)
    x86/platform/intel-mid: Remove unused leftovers (msic_battery)
    x86/platform/intel-mid: Remove unused leftovers (msic_ocd)
    x86/platform/intel-mid: Remove unused leftovers (msic_audio)
    platform/x86: intel_scu_wdt: Drop mistakenly added const

    Linus Torvalds
     
  • In commit 5cf0fd591f2e ("Kbuild: disable TRIM_UNUSED_KSYMS option") I
    disabled this option because it's hugely expensive at build time, and I
    questioned how much use it gets.

    Several people piped up and convinced me it's actually useful, so
    instead of disabling it entirely, it now depends on EXPERT and gets
    disabled by COMPILE_TEST builds so that 'allmodconfig' style things
    don't enable it.

    I still hope somebody will take a look at the build time issue, because
    as Arnd also noted:

    "However, the combination of thinlto and trim indeed has a steep cost
    in compile time, taking almost twice as long as a normal defconfig
    (gc-sections makes it slightly faster)"

    Cc: Masahiro Yamada
    Cc: Arnd Bergmann
    Cc: Jessica Yu
    Cc: Christoph Hellwig
    Cc: Miroslav Benes
    Cc: Emil Velikov
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

24 Feb, 2021

3 commits

  • The removal of EXPORT_UNUSED_SYMBOL() in commit 367948220fce looks like
    (and was sold as) a no-op, but it actually had a rather serious and
    subtle side effect: the UNUSED_SYMBOLS option not only enabled the
    removed (unused) functionality, it also _disabled_ the TRIM_UNUSED_KSYMS
    functionality.

    And it turns out that TRIM_UNUSED_KSYMS is a huge time waste, and takes
    up a third of the kernel build time for me. For no actual upside, since
    no distro is likely to ever be able to enable it (because they all
    support external kernel modules).

    Rather than re-enable EXPORT_UNUSED_SYMBOL, this just disables the
    TRIM_UNUSED_KSYMS option by marking it broken. I'm tempted to just
    remove the support entirely, but maybe somebody has a use-case and can
    fix the behavior of it.

    I could have just disabled it for COMPILE_TEST, but it really smells
    like the TRIM_UNUSED_KSYMS option is badly done and not really useful,
    so this takes the more direct approach - let's see if anybody ever
    actually notices or complains.

    Cc: Miroslav Benes
    Cc: Emil Velikov
    Cc: Christoph Hellwig
    Cc: Jessica Yu
    Fixes: 367948220fce ("module: remove EXPORT_UNUSED_SYMBOL*")
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull module updates from Jessica Yu:

    - Retire EXPORT_UNUSED_SYMBOL() and EXPORT_SYMBOL_GPL_FUTURE(). These
    export types were introduced between 2006 - 2008. All of the
    unused symbols have been long removed and gpl future symbols were
    converted to gpl quite a long time ago, and I don't believe these
    export types have been used ever since. So, I think it should be safe
    to retire those export types now (Christoph Hellwig)

    - Refactor and clean up some aged code cruft in the module loader
    (Christoph Hellwig)

    - Build {,module_}kallsyms_on_each_symbol only when livepatching is
    enabled, as it is the only caller (Christoph Hellwig)

    - Unexport find_module() and module_mutex and fix the last module
    callers to not rely on these anymore. Make module_mutex internal to
    the module loader (Christoph Hellwig)

    - Harden ELF checks on module load and validate ELF structures before
    checking the module signature (Frank van der Linden)

    - Fix undefined symbol warning for clang (Fangrui Song)

    - Fix smatch warning (Dan Carpenter)

    * tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    module: potential uninitialized return in module_kallsyms_on_each_symbol()
    module: remove EXPORT_UNUSED_SYMBOL*
    module: remove EXPORT_SYMBOL_GPL_FUTURE
    module: move struct symsearch to module.c
    module: pass struct find_symbol_args to find_symbol
    module: merge each_symbol_section into find_symbol
    module: remove each_symbol_in_section
    module: mark module_mutex static
    kallsyms: only build {,module_}kallsyms_on_each_symbol when required
    kallsyms: refactor {,module_}kallsyms_on_each_symbol
    module: use RCU to synchronize find_module
    module: unexport find_module and module_mutex
    drm: remove drm_fb_helper_modinit
    powerpc/powernv: remove get_cxl_module
    module: harden ELF info handling
    module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols

    Linus Torvalds
     
  • Pull clang LTO updates from Kees Cook:
    "Clang Link Time Optimization.

    This is built on the work done preparing for LTO by arm64 folks,
    tracing folks, etc. This includes the core changes as well as the
    remaining pieces for arm64 (LTO has been the default build method on
    Android for about 3 years now, as it is the prerequisite for the
    Control Flow Integrity protections).

    While x86 LTO enablement is done, it depends on some pending objtool
    clean-ups. It's possible that I'll send a "part 2" pull request for
    LTO that includes x86 support.

    For merge log posterity, and as detailed in commit dc5723b02e52
    ("kbuild: add support for Clang LTO"), here is the tl;dr to do an LTO
    build:

    make LLVM=1 LLVM_IAS=1 defconfig
    scripts/config -e LTO_CLANG_THIN
    make LLVM=1 LLVM_IAS=1

    (To do a cross-compile of arm64, add "CROSS_COMPILE=aarch64-linux-gnu-"
    and "ARCH=arm64" to the "make" command lines.)

    Summary:

    - Clang LTO build infrastructure and arm64-specific enablement (Sami
    Tolvanen)

    - Recursive build CC_FLAGS_LTO fix (Alexander Lobakin)"

    * tag 'clang-lto-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    kbuild: prevent CC_FLAGS_LTO self-bloating on recursive rebuilds
    arm64: allow LTO to be selected
    arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
    arm64: vdso: disable LTO
    drivers/misc/lkdtm: disable LTO for rodata.o
    efi/libstub: disable LTO
    scripts/mod: disable LTO for empty.c
    modpost: lto: strip .lto from module names
    PCI: Fix PREL32 relocations for LTO
    init: lto: fix PREL32 relocations
    init: lto: ensure initcall ordering
    kbuild: lto: add a default list of used symbols
    kbuild: lto: merge module sections
    kbuild: lto: limit inlining
    kbuild: lto: fix module versioning
    kbuild: add support for Clang LTO
    tracing: move function tracer options to Kconfig

    Linus Torvalds
     

23 Feb, 2021

1 commit

  • Pull kcmp kconfig update from Daniel Vetter:
    "Make the kcmp syscall available independently of checkpoint/restore.

    DRM userspace uses this, systemd uses this, so it makes sense to pull it
    out from the checkpoint-restore bundle.

    Kees reviewed this from security pov and is happy with the final
    version"

    Link: https://lwn.net/Articles/845448/

    * tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm:
    kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE

    Linus Torvalds
     

22 Feb, 2021

3 commits

  • Unify the two scripts/ld-version.sh and scripts/lld-version.sh, and
    check the minimum linker version like scripts/cc-version.sh did.

    I tested this script for some corner cases reported in the past:

    - GNU ld version 2.25-15.fc23
    as reported by commit 8083013fc320 ("ld-version: Fix it on Fedora")

    - GNU ld (GNU Binutils) 2.20.1.20100303
    as reported by commit 0d61ed17dd30 ("ld-version: Drop the 4th and
    5th version components")
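
    The two corner cases above come down to extracting at most three
    numeric components from the linker's version line. The sketch below
    is a Python stand-in for that idea, not the shell script itself;
    the function name and regular expression are assumptions, and the
    MAJOR * 10000 + MINOR * 100 + PATCH encoding is chosen to match
    canonical values such as 120000 for version 12.0.0.

```python
import re


def canonical_version(version_line):
    """Turn the first X.Y[.Z] found in version_line into one integer.

    Anything after the third component (a date stamp, a distro suffix
    such as "-15.fc23") is ignored.
    """
    m = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_line)
    if m is None:
        raise ValueError("no version number found: %r" % version_line)
    major, minor, patch = (int(g) if g else 0 for g in m.groups())
    return 10000 * major + 100 * minor + patch


if __name__ == "__main__":
    for line in ("GNU ld version 2.25-15.fc23",
                 "GNU ld (GNU Binutils) 2.20.1.20100303",
                 "LLD 9.0.1"):
        print(line, "->", canonical_version(line))
```

    A minimum-version check then reduces to an integer comparison, for
    example canonical_version("LLD 9.0.1") against
    canonical_version("10.0.1").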

    This script shows an error message if the linker is too old:

    $ make LD=ld.lld-9
    SYNC include/config/auto.conf
    ***
    *** Linker is too old.
    *** Your LLD version: 9.0.1
    *** Minimum LLD version: 10.0.1
    ***
    scripts/Kconfig.include:50: Sorry, this linker is not supported.
    make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1
    make[1]: *** [Makefile:600: syncconfig] Error 2
    make: *** [Makefile:708: include/config/auto.conf] Error 2

    I also moved the check for gold to this script, so gold is still rejected:

    $ make LD=gold
    SYNC include/config/auto.conf
    gold linker is not supported as it is not capable of linking the kernel proper.
    scripts/Kconfig.include:50: Sorry, this linker is not supported.
    make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1
    make[1]: *** [Makefile:600: syncconfig] Error 2
    make: *** [Makefile:708: include/config/auto.conf] Error 2

    Thanks to David Laight for suggesting shell script improvements.

    Signed-off-by: Masahiro Yamada
    Acked-by: Nick Desaulniers
    Reviewed-by: Nathan Chancellor
    Tested-by: Nathan Chancellor

    Masahiro Yamada
     
  • Pull scheduler updates from Ingo Molnar:
    "Core scheduler updates:

    - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
    preempt=none/voluntary/full boot options (default: full), to allow
    distros to build a PREEMPT kernel but fall back to close to
    PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
    a boot time selection.

    There's also the /debug/sched_preempt switch to do this at runtime.

    This feature is implemented via runtime patching (a new variant of
    static calls).

    The scope of the runtime patching can be best reviewed by looking
    at the sched_dynamic_update() function in kernel/sched/core.c.

    ( Note that the dynamic none/voluntary mode isn't 100% identical,
    for example preempt-RCU is available in all cases, plus the
    preempt count is maintained in all models, which has runtime
    overhead even with the code patching. )

    The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
    majority of distributions, are supposed to be unaffected.

    - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
    was found via rcutorture triggering a hang. The bug is that
    rcu_idle_enter() may wake up a NOCB kthread, but this happens after
    the last generic need_resched() check. Some cpuidle drivers fix it
    by chance but many others don't.

    In true 2020 fashion the original bug fix has grown into a 5-patch
    scheduler/RCU fix series plus another 16 RCU patches to address the
    underlying issue of missed preemption events. These are the initial
    fixes that should fix current incarnations of the bug.

    - Clean up rbtree usage in the scheduler, by providing & using the
    following consistent set of rbtree APIs:

    partial-order; less() based:
    - rb_add(): add a new entry to the rbtree
    - rb_add_cached(): like rb_add(), but for a rb_root_cached

    total-order; cmp() based:
    - rb_find(): find an entry in an rbtree
    - rb_find_add(): find an entry, and add if not found

    - rb_find_first(): find the first (leftmost) matching entry
    - rb_next_match(): continue from rb_find_first()
    - rb_for_each(): iterate a sub-tree using the previous two

    - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
    single pass. This is a 4-commit series where each commit improves
    one aspect of the idle sibling scan logic.

    - Improve the cpufreq cooling driver by getting the effective CPU
    utilization metrics from the scheduler

    - Improve the fair scheduler's active load-balancing logic by
    reducing the number of active LB attempts & lengthening the
    load-balancing interval. This improves stress-ng mmapfork
    performance.

    - Fix CFS's estimated utilization (util_est) calculation bug that can
    result in too high utilization values

    Misc updates & fixes:

    - Fix the HRTICK reprogramming & optimization feature

    - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code

    - Reduce dl_add_task_root_domain() overhead

    - Fix uprobes refcount bug

    - Process pending softirqs in flush_smp_call_function_from_idle()

    - Clean up task priority related defines, remove *USER_*PRIO and
    USER_PRIO()

    - Simplify the sched_init_numa() deduplication sort

    - Documentation updates

    - Fix EAS bug in update_misfit_status(), which degraded the quality
    of energy-balancing

    - Smaller cleanups"
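
    The split between less()-based insertion and cmp()-based lookup in
    the rbtree helpers listed above can be illustrated with a plain
    binary search tree. This is a simplified sketch: it does no
    red-black rebalancing, and the class and method names are
    hypothetical rather than the kernel API.

```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


class Tree:
    """Unbalanced binary search tree used only to show the lookup rules."""

    def __init__(self):
        self.root = None

    def add(self, node, less):
        """less()-based insert: equal keys go right, so duplicates are kept."""
        if self.root is None:
            self.root = node
            return
        cur = self.root
        while True:
            if less(node, cur):
                if cur.left is None:
                    cur.left = node
                    return
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = node
                    return
                cur = cur.right

    def find_first(self, key, cmp):
        """cmp()-based lookup returning the leftmost node matching key."""
        match = None
        cur = self.root
        while cur is not None:
            c = cmp(key, cur)
            if c <= 0:
                if c == 0:
                    match = cur  # remember it, but keep looking further left
                cur = cur.left
            else:
                cur = cur.right
        return match


def less(a, b):
    return a.key < b.key


def cmp_key(key, node):
    return (key > node.key) - (key < node.key)
```

    Insertion only needs a strict ordering, while lookup needs a
    three-way comparison so that it can keep descending left after a
    match to find the first of several equal entries.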

    * tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
    sched,x86: Allow !PREEMPT_DYNAMIC
    entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
    entry: Explicitly flush pending rcuog wakeup before last rescheduling point
    rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
    rcu/nocb: Perform deferred wake up before last idle's need_resched() check
    rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
    sched/features: Distinguish between NORMAL and DEADLINE hrtick
    sched/features: Fix hrtick reprogramming
    sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
    uprobes: (Re)add missing get_uprobe() in __find_uprobe()
    smp: Process pending softirqs in flush_smp_call_function_from_idle()
    sched: Harden PREEMPT_DYNAMIC
    static_call: Allow module use without exposing static_call_key
    sched: Add /debug/sched_preempt
    preempt/dynamic: Support dynamic preempt with preempt= boot option
    preempt/dynamic: Provide irqentry_exit_cond_resched() static call
    preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
    preempt/dynamic: Provide cond_resched() and might_resched() static calls
    preempt: Introduce CONFIG_PREEMPT_DYNAMIC
    static_call: Provide DEFINE_STATIC_CALL_RET0()
    ...

    Linus Torvalds
     
  • Pull oprofile and dcookies removal from Viresh Kumar:
    "Remove oprofile and dcookies support

    The 'oprofile' user-space tools don't use the kernel OPROFILE support
    any more, and haven't in a long time. User-space has been converted to
    the perf interfaces.

    The dcookies stuff is only used by the oprofile code. Now that
    oprofile's support is getting removed from the kernel, there is no
    need for dcookies either.

    Remove kernel's old oprofile and dcookies support"

    * tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux:
    fs: Remove dcookies support
    drivers: Remove CONFIG_OPROFILE support
    arch: xtensa: Remove CONFIG_OPROFILE support
    arch: x86: Remove CONFIG_OPROFILE support
    arch: sparc: Remove CONFIG_OPROFILE support
    arch: sh: Remove CONFIG_OPROFILE support
    arch: s390: Remove CONFIG_OPROFILE support
    arch: powerpc: Remove oprofile
    arch: powerpc: Stop building and using oprofile
    arch: parisc: Remove CONFIG_OPROFILE support
    arch: mips: Remove CONFIG_OPROFILE support
    arch: microblaze: Remove CONFIG_OPROFILE support
    arch: ia64: Remove rest of perfmon support
    arch: ia64: Remove CONFIG_OPROFILE support
    arch: hexagon: Don't select HAVE_OPROFILE
    arch: arc: Remove CONFIG_OPROFILE support
    arch: arm: Remove CONFIG_OPROFILE support
    arch: alpha: Remove CONFIG_OPROFILE support

    Linus Torvalds
     

19 Feb, 2021

1 commit


17 Feb, 2021

1 commit


16 Feb, 2021

3 commits

  • Userspace has discovered the functionality offered by SYS_kcmp and has
    started to depend upon it. In particular, Mesa uses SYS_kcmp for
    os_same_file_description() in order to identify when two fd (e.g. device
    or dmabuf) point to the same struct file. Since they depend on it for
    core functionality, lift SYS_kcmp out of the non-default
    CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.

    Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to
    deduplicate the per-service file descriptor store.
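
    What "point to the same struct file" means can be observed from
    userspace without calling kcmp: descriptors created with dup()
    share one open file description, and therefore one file offset,
    while a second open() of the same path does not. The sketch below
    is only an illustration of that distinction for seekable regular
    files. It is not how Mesa or systemd do it, and unlike kcmp it has
    to move the offset temporarily; the function name is made up.

```python
import os
import tempfile


def shares_offset(fd_a, fd_b):
    """Return True if fd_a and fd_b share a file offset.

    Heuristic for seekable files only: move fd_a to a probe position
    and check whether fd_b's position follows, then restore fd_a.
    """
    start_a = os.lseek(fd_a, 0, os.SEEK_CUR)
    probe = os.lseek(fd_b, 0, os.SEEK_CUR) + 1
    os.lseek(fd_a, probe, os.SEEK_SET)
    shared = os.lseek(fd_b, 0, os.SEEK_CUR) == probe
    os.lseek(fd_a, start_a, os.SEEK_SET)  # restore the original offset
    return shared


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        fd = os.open(f.name, os.O_RDONLY)
        dup_fd = os.dup(fd)                      # same open file description
        other_fd = os.open(f.name, os.O_RDONLY)  # same file, new description
        print(shares_offset(fd, dup_fd), shares_offset(fd, other_fd))
        for d in (fd, dup_fd, other_fd):
            os.close(d)
```

    A direct kernel query such as kcmp avoids both the side effect and
    the restriction to seekable files, which is why userspace has come
    to depend on it.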

    Note that some distributions such as Ubuntu are already enabling
    CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.

    References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046
    Signed-off-by: Chris Wilson
    Cc: Kees Cook
    Cc: Andy Lutomirski
    Cc: Will Drewry
    Cc: Andrew Morton
    Cc: Dave Airlie
    Cc: Daniel Vetter
    Cc: Lucas Stach
    Cc: Rasmus Villemoes
    Cc: Cyrill Gorcunov
    Cc: stable@vger.kernel.org
    Acked-by: Daniel Vetter # DRM depends on kcmp
    Acked-by: Rasmus Villemoes # systemd uses kcmp
    Reviewed-by: Cyrill Gorcunov
    Reviewed-by: Kees Cook
    Acked-by: Thomas Zimmermann
    Signed-off-by: Daniel Vetter
    Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Paul Gortmaker reported a regression in the GCC version check. [1]
    If you use GCC 4.8, the build breaks before showing the error message
    "error Sorry, your version of GCC is too old - please use 4.9 or newer."

    I do not want to apply his fix-up since it implies we would not be able
    to remove any cc-option test. Anyway, I admit checking the GCC version
    in is too late.

    Almost at the same time, Linus also suggested moving the compiler
    version error to Kconfig time. [2]

    I unified the two similar scripts, gcc-version.sh and clang-version.sh
    into cc-version.sh. The old scripts invoked the compiler multiple times
    (3 times for gcc-version.sh, 4 times for clang-version.sh). I refactored
    the code so the new one invokes the compiler just once, and also tried
    my best to use shell-builtin commands where possible.

    The new script runs faster.

    $ time ./scripts/clang-version.sh clang
    120000

    real 0m0.029s
    user 0m0.012s
    sys 0m0.021s

    $ time ./scripts/cc-version.sh clang
    Clang 120000

    real 0m0.009s
    user 0m0.006s
    sys 0m0.004s

    cc-version.sh also shows an error message if the compiler is too old:

    $ make defconfig CC=clang-9
    *** Default configuration is based on 'x86_64_defconfig'
    ***
    *** Compiler is too old.
    *** Your Clang version: 9.0.1
    *** Minimum Clang version: 10.0.1
    ***
    scripts/Kconfig.include:46: Sorry, this compiler is not supported.
    make[1]: *** [scripts/kconfig/Makefile:81: defconfig] Error 1
    make: *** [Makefile:602: defconfig] Error 2

    The new script takes care of ICC because we have
    although I am not sure if building the kernel with ICC is well-supported.

    [1]: https://lore.kernel.org/r/20210110190807.134996-1-paul.gortmaker@windriver.com
    [2]: https://lore.kernel.org/r/CAHk-=wh-+TMHPTFo1qs-MYyK7tZh-OQovA=pP3=e06aCVp6_kA@mail.gmail.com

    Fixes: 87de84c9140e ("kbuild: remove cc-option test of -Werror=date-time")
    Reported-by: Paul Gortmaker
    Suggested-by: Linus Torvalds
    Reviewed-by: Nick Desaulniers
    Tested-by: Nick Desaulniers
    Reviewed-by: Nathan Chancellor
    Tested-by: Nathan Chancellor
    Reviewed-by: Miguel Ojeda
    Tested-by: Miguel Ojeda
    Tested-by: Sedat Dilek
    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     
  • SFI-based platforms are gone. So is this framework.

    This removes mention of SFI through the drivers and other code as well.

    Signed-off-by: Andy Shevchenko
    Reviewed-by: Hans de Goede
    Acked-by: Linus Walleij
    Signed-off-by: Rafael J. Wysocki

    Andy Shevchenko
     

08 Feb, 2021

1 commit


06 Feb, 2021

1 commit

  • On ARCH=um, loading a module doesn't result in its constructors getting
    called, which breaks module gcov since the debugfs files are never
    registered. On the other hand, in-kernel constructors have already been
    called by the dynamic linker, so we can't call them again.

    Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be
    selected, but avoiding the in-kernel constructor calls.

    Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we
    really do want CONSTRUCTORS, just not kernel binary ones.

    Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeid
    Signed-off-by: Johannes Berg
    Reviewed-by: Peter Oberparleiter
    Cc: Arnd Bergmann
    Cc: Jessica Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     

30 Jan, 2021

1 commit

  • On some archs, the idle task can call into cpu_suspend(). The cpu_suspend()
    will disable or pause function graph tracing, as there are some paths in
    bringing down the CPU that can have issues with their return address being
    modified. The task_struct structure has a "tracing_graph_pause" atomic
    counter; while it is set to something other than zero, the function graph
    tracer will not modify the return address.

    The problem is that the tracing_graph_pause counter is initialized when the
    function graph tracer is enabled. This can corrupt the counter for the idle
    task if it is suspended in these architectures.

     CPU 1                               CPU 2
     -----                               -----
    do_idle()
      cpu_suspend()
        pause_graph_tracing()
            task_struct->tracing_graph_pause++ (0 -> 1)

                                        start_graph_tracing()
                                          for_each_online_cpu(cpu) {
                                            ftrace_graph_init_idle_task(cpu)
                                              task_struct->tracing_graph_pause = 0 (1 -> 0)

        unpause_graph_tracing()
            task_struct->tracing_graph_pause-- (0 -> -1)

    The above should have gone from 1 to zero, and enabled function graph
    tracing again. But instead, it is set to -1, which keeps it disabled.

    There's no reason that the field tracing_graph_pause on the task_struct can
    not be initialized at boot up.
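
    The interleaving described above can be replayed as a sequence of
    events on a single counter. This is a single-threaded model with
    made-up event names, meant only to show why the order of the reset
    matters.

```python
def replay(events, pause=0):
    """Apply events to a model of tracing_graph_pause and return it.

    "pause" and "unpause" model the increment and decrement around
    cpu_suspend(); "init" models the reset done for the idle task.
    """
    for event in events:
        if event == "pause":
            pause += 1
        elif event == "unpause":
            pause -= 1
        elif event == "init":
            pause = 0
        else:
            raise ValueError("unknown event: %r" % event)
    return pause


if __name__ == "__main__":
    # Reset lands between pause and unpause: the counter ends at -1.
    print(replay(["pause", "init", "unpause"]))
    # Reset done once up front (as at boot): the counter returns to 0.
    print(replay(["init", "pause", "unpause"]))
```

    Because the tracer only runs while the counter is exactly zero, the
    first ordering leaves it disabled for that task, while initializing
    once before any pause keeps the increments and decrements balanced.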

    Cc: stable@vger.kernel.org
    Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag")
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
    Reported-by: pierre.gondois@arm.com
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

29 Jan, 2021

1 commit

  • The "oprofile" user-space tools don't use the kernel OPROFILE support
    any more, and haven't in a long time. User-space has been converted to
    the perf interfaces.

    Remove kernel's old oprofile support.

    Suggested-by: Christoph Hellwig
    Suggested-by: Linus Torvalds
    Signed-off-by: Viresh Kumar
    Acked-by: Robert Richter
    Acked-by: Paul E. McKenney #RCU
    Acked-by: William Cohen
    Acked-by: Al Viro
    Acked-by: Thomas Gleixner

    Viresh Kumar
     

28 Jan, 2021

1 commit