17 Aug, 2022

1 commit

  • [ Upstream commit 375561bd6195a31bf4c109732bd538cb97a941f4 ]

    Fix the following Sparse warnings that got noticed when the PPC-dev
    patchwork was checking another patch (see the link below):

    init/main.c:862:1: warning: symbol 'randomize_kstack_offset' was not declared. Should it be static?
    init/main.c:864:1: warning: symbol 'kstack_offset' was not declared. Should it be static?

    These warnings are in fact triggered on all architectures that have
    HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET support (for instance x86, arm64,
    etc.).

    Link: https://lore.kernel.org/lkml/e7b0d68b-914d-7283-827c-101988923929@huawei.com/T/#m49b2d4490121445ce4bf7653500aba59eefcb67f
    Cc: Christophe Leroy
    Cc: Xiu Jianfeng
    Signed-off-by: GONG, Ruiqi
    Reviewed-by: Christophe Leroy
    Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall")
    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/20220629060423.2515693-1-gongruiqi1@huawei.com
    Signed-off-by: Sasha Levin

    GONG, Ruiqi
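Sparse prints "was not declared. Should it be static?" when a non-static global is defined with no declaration in scope. Below is a minimal userspace sketch of the usual remedy. It assumes the symbols really are shared between files, so the defining file needs to see an extern declaration, normally from a shared header. If a symbol is used in only one file, making it static is the other fix. The demo_* names are hypothetical stand-ins, not the kernel's actual symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * In a real tree these extern declarations would sit in a header included
 * by both the defining file and its users; they are inlined here so the
 * sketch is a single file. With them in scope, Sparse no longer sees the
 * definitions below as undeclared globals.
 */
extern bool demo_randomize_offset;   /* hypothetical stand-in */
extern uint32_t demo_stack_offset;   /* hypothetical stand-in */

/* Exactly one translation unit provides the definitions. */
bool demo_randomize_offset = true;
uint32_t demo_stack_offset;

/* A user elsewhere would include the header and reference the symbols. */
uint32_t demo_next_offset(uint32_t entropy)
{
	if (demo_randomize_offset)
		demo_stack_offset ^= entropy;
	return demo_stack_offset;
}
```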
     

09 Jun, 2022

1 commit

  • commit 1aa0e8b144b6474c4914439d232d15bfe883636b upstream.

    Add a config option to guard (future) usage of asm_volatile_goto() that
    includes "tied outputs", i.e. "+" constraints that specify both an input
    and output parameter. clang-13 has a bug[1] that causes compilation of
    such inline asm to fail, and KVM wants to use a "+m" constraint to
    implement a uaccess form of CMPXCHG[2]. E.g. the test code fails with

    :1:29: error: invalid operand in inline asm: '.long (${1:l}) - .'
    int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }
    ^
    :1:29: error: unknown token in expression
    :1:9: note: instantiated into assembly here
    .long () - .
    ^
    2 errors generated.

    on clang-13, but passes on gcc (with appropriate asm goto support). The
    bug is fixed in clang-14, but won't be backported to clang-13 as the
    changes are too invasive/risky.

    gcc also had a similar bug[3], fixed in gcc-11, where gcc failed to
    account for its behavior of assigning two numbers to tied outputs (one
    for input, one for output) when evaluating symbolic references.

    [1] https://github.com/ClangBuiltLinux/linux/issues/1512
    [2] https://lore.kernel.org/all/YfMruK8%2F1izZ2VHS@google.com
    [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096

    Suggested-by: Nick Desaulniers
    Reviewed-by: Nick Desaulniers
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     

30 May, 2022

2 commits

  • commit 2f14062bb14b0fcfcc21e6dc7d5b5c0d25966164 upstream.

    Currently, start_kernel() adds latent entropy and the command line to
    the entropy pool *after* the RNG has been initialized, deferring when
    it's actually used by things like stack canaries until the next time
    the pool is seeded. This surely is not intended.

    Rather than splitting up which entropy gets added where and when between
    start_kernel() and random_init(), just do everything in random_init(),
    which should eliminate these kinds of bugs in the future.

    While we're at it, rename the awkwardly titled "rand_initialize()" to
    the more standard "random_init()" nomenclature.

    Reviewed-by: Dominik Brodowski
    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Greg Kroah-Hartman

    Jason A. Donenfeld
     
  • commit fe222a6ca2d53c38433cba5d3be62a39099e708e upstream.

    Currently time_init() is called after rand_initialize(), but
    rand_initialize() makes use of the timer on various platforms, and
    sometimes this timer needs to be initialized by time_init() first. In
    order for random_get_entropy() to not return zero during early boot when
    it's potentially used as an entropy source, reverse the order of these
    two calls. The block doing random initialization already sat right
    before time_init(), so changing the order shouldn't have any
    complicated effects.

    Cc: Andrew Morton
    Reviewed-by: Stafford Horne
    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Greg Kroah-Hartman

    Jason A. Donenfeld
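Here is a toy model of why the call order matters. It assumes a platform like the one described above, where the entropy source reads a timer that returns 0 until time_init() has run. All demo_* names are hypothetical, and this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

static bool demo_timer_ready;
static uint64_t demo_cycles;          /* pretend cycle counter */

static void demo_time_init(void)
{
	demo_timer_ready = true;
	demo_cycles = 1000;           /* arbitrary non-zero start value */
}

/* Like random_get_entropy() on such platforms: 0 until the timer works. */
static uint64_t demo_get_entropy(void)
{
	return demo_timer_ready ? ++demo_cycles : 0;
}

/* Returns the first entropy sample the RNG would see during init. */
uint64_t demo_boot(bool time_init_first)
{
	uint64_t sample;

	demo_timer_ready = false;
	demo_cycles = 0;
	if (time_init_first) {
		demo_time_init();
		sample = demo_get_entropy();
	} else {
		sample = demo_get_entropy();  /* timer not running yet */
		demo_time_init();
	}
	return sample;
}
```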
     

14 Apr, 2022

2 commits

  • [ Upstream commit f9a40b0890658330c83c95511f9d6b396610defc ]

    initcall_blacklist() should return 1 to indicate that it handled its
    cmdline arguments.

    set_debug_rodata() should return 1 to indicate that it handled its
    cmdline arguments. Print a warning if the option string is invalid.

    This prevents these strings from being added to the 'init' program's
    environment as they are not init arguments/parameters.

    Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.org
    Signed-off-by: Randy Dunlap
    Reported-by: Igor Zhbanov
    Cc: Ingo Molnar
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Randy Dunlap
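The "return 1 if handled" convention can be sketched with a small handler table. This is a hypothetical model, not the kernel's parser. Each "name=" prefix maps to a callback. Any argument whose handler returns 0, or that has no handler, is forwarded to init's environment, and that is the leak the fix above closes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler table entry: a "name=" prefix and its callback. */
struct demo_setup {
	const char *prefix;
	int (*fn)(const char *value);   /* 1 = handled, 0 = not handled */
};

/* Fixed behaviour: report the option as handled even if the value is bad
 * (a real handler would also print a warning for an invalid value). */
static int demo_rodata_fixed(const char *value)
{
	(void)value;
	return 1;
}

/* Old behaviour: returning 0 makes the option look unhandled. */
static int demo_rodata_old(const char *value)
{
	(void)value;
	return 0;
}

const struct demo_setup demo_fixed_tbl[] = { { "rodata=", demo_rodata_fixed } };
const struct demo_setup demo_old_tbl[]   = { { "rodata=", demo_rodata_old } };

/*
 * Walk the arguments. Anything no handler claims is forwarded to init's
 * environment. Returns how many arguments were written to env_out.
 */
size_t demo_parse(const char *const *args, size_t nargs,
		  const struct demo_setup *tbl, size_t ntbl,
		  const char **env_out)
{
	size_t nenv = 0;

	for (size_t i = 0; i < nargs; i++) {
		int handled = 0;

		for (size_t j = 0; j < ntbl; j++) {
			size_t len = strlen(tbl[j].prefix);

			if (strncmp(args[i], tbl[j].prefix, len) == 0) {
				handled = tbl[j].fn(args[i] + len);
				break;
			}
		}
		if (!handled)
			env_out[nenv++] = args[i];
	}
	return nenv;
}
```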
     
  • [ Upstream commit 9c1be1935fb68b2413796cdc03d019b8cf35ab51 ]

    While testing a patch that will follow later
    ("net: add netns refcount tracker to struct nsproxy")
    I found that devtmpfs_init() was called before init_net
    was initialized.

    This is a bug, because devtmpfs_setup() calls
    ksys_unshare(CLONE_NEWNS);

    This has the effect of increasing init_net refcount,
    which will be later overwritten to 1, as part of setup_net(&init_net)

    We had too many prior patches [1] trying to work around the root cause.

    Really, make sure init_net is in BSS section, and that net_ns_init()
    is called earlier at boot time.

    Note that another patch ("vfs: add netns refcount tracker
    to struct fs_context") also will need net_ns_init() being called
    before vfs_caches_init()

    As a bonus, this patch saves around 4KB in .data section.

    [1]

    f8c46cb39079 ("netns: do not call pernet ops for not yet set up init_net namespace")
    b5082df8019a ("net: Initialise init_net.count to 1")
    734b65417b24 ("net: Statically initialize init_net.dev_base_head")

    v2: fixed a build error reported by kernel build bots (CONFIG_NET=n)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
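The lost reference can be modelled in a few lines. The model assumes a setup step that unconditionally sets the count to 1, as the message describes for setup_net(&init_net). The demo_* names are hypothetical, and the model ignores everything else setup_net() does.

```c
#include <stdbool.h>

/* Hypothetical model: a namespace object with a reference count. */
struct demo_net {
	int count;
};

static struct demo_net demo_init_net;   /* zero-initialised, lives in BSS */

static void demo_get_net(struct demo_net *net) { net->count++; }

/* Stand-in for the setup step: it (re)initialises the count to 1. */
static void demo_setup_net(struct demo_net *net) { net->count = 1; }

/*
 * Simulate boot: a user (like devtmpfs unsharing its mount namespace)
 * takes a reference either before or after the namespace is set up.
 * Returns the final reference count.
 */
int demo_boot(bool net_init_first)
{
	demo_init_net.count = 0;
	if (net_init_first) {
		demo_setup_net(&demo_init_net);
		demo_get_net(&demo_init_net);   /* reference is preserved */
	} else {
		demo_get_net(&demo_init_net);   /* taken too early ...    */
		demo_setup_net(&demo_init_net); /* ... and overwritten    */
	}
	return demo_init_net.count;
}
```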
     

19 Nov, 2021

1 commit

  • [ Upstream commit 8bc2b3dca7292347d8e715fb723c587134abe013 ]

    The prior message is confusing users, which is the exact opposite of the
    goal. If the message is being seen, one of the following situations is
    happening:

    1. the param is misspelled
    2. the param is not valid due to the kernel configuration
    3. the param is intended for init but isn't after the '--'
    delineator on the command line

    To make that more clear to the user, explicitly mention "kernel command
    line" and also note that the params are still passed to user space to
    avoid causing any alarm over params intended for init.

    Link: https://lkml.kernel.org/r/20211013223502.96756-1-ahalaney@redhat.com
    Fixes: 86d1919a4fb0 ("init: print out unknown kernel parameters")
    Signed-off-by: Andrew Halaney
    Suggested-by: Steven Rostedt (VMware)
    Acked-by: Randy Dunlap
    Cc: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Andrew Halaney
     

11 Oct, 2021

1 commit

  • Free unused memblock in an error case to fix a memblock leak
    in xbc_make_cmdline().

    Link: https://lkml.kernel.org/r/163177339181.682366.8713781325929549256.stgit@devnote2

    Fixes: 51887d03aca1 ("bootconfig: init: Allow admin to use bootconfig for kernel command line")
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

25 Sep, 2021

1 commit


23 Sep, 2021

1 commit


20 Sep, 2021

2 commits

  • Attempt to mount 9p file system as root gives the following kernel panic:

    9pnet_virtio: no channels available for device root
    Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2
    CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc1+ #127
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    Call Trace:
    dump_stack_lvl+0x45/0x59
    panic+0x1e2/0x44b
    ? __warn_printk+0xf3/0xf3
    ? free_unref_page+0x2d4/0x4a0
    ? trace_hardirqs_on+0x32/0x120
    ? free_unref_page+0x2d4/0x4a0
    mount_root+0x189/0x1e0
    prepare_namespace+0x136/0x165
    kernel_init_freeable+0x3b8/0x3cb
    ? rest_init+0x2e0/0x2e0
    kernel_init+0x19/0x130
    ret_from_fork+0x1f/0x30
    Kernel Offset: disabled
    ---[ end Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2 ]---

    QEMU command line:
    "qemu-system-x86_64 -append root=/dev/root rw rootfstype=9p rootflags=trans=virtio ..."

    This error occurs because prepare_namespace() truncates root_device_name
    from "/dev/root" to "root" before calling mount_nodev_root().

    As a solution, don't treat errors in mount_nodev_root() as errors that
    require a panic, and allow fallback to the mount flow that existed before
    the patch cited in the Fixes tag.

    Fixes: f9259be6a9e7 ("init: allow mounting arbitrary non-blockdevice filesystems as root")
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Leon Romanovsky
     
  • split_fs_names() currently takes a comma-separated list of filesystems
    and converts it into individual filesystem strings. It places these
    strings in the input buffer passed by the caller and returns the number
    of strings.

    If the caller passes an input string bigger than the buffer, we can
    write beyond the buffer. Even if the string just fits, we still write
    beyond the buffer because we append an extra '\0' byte at the end.

    Pass size of input buffer to split_fs_names() and put enough checks
    in place so such buffer overrun possibilities do not occur.

    This patch does a few things.

    - Add a parameter "size" to split_fs_names(). This specifies size
    of input buffer.

    - Use strlcpy() (instead of strcpy()) so that we can't go beyond
    buffer size. If input string "names" is larger than passed in
    buffer, input string will be truncated to fit in buffer.

    - Stop appending extra '\0' character at the end and avoid one
    possibility of going beyond the input buffer size.

    - Do not use extra loop to count number of strings.

    - Previously, if one passed "rootfstype=foo,,bar", split_fs_names()
    returned only 1 string, "foo" ("bar" was lost due to the extra ',').
    After this patch, split_fs_names() returns 3 strings ("foo", a
    zero-sized string, and "bar").

    Callers of split_fs_names() have been modified to check for
    zero-sized strings and skip to the next one.

    Reported-by: xu xin
    Signed-off-by: Vivek Goyal
    Reviewed-by: Jan Kara
    Signed-off-by: Al Viro

    Vivek Goyal
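Here is a userspace sketch of the bounded split and of the caller-side skip described above. It assumes strlcpy()-style truncation, done with memcpy() here because strlcpy() is not available in every C library. The function names are hypothetical, and this is not the kernel's exact code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most size-1 bytes of 'names' into 'page' (always NUL-terminated,
 * like strlcpy()), then turn every ',' into '\0'. Returns the number of
 * strings, including empty ones produced by consecutive commas.
 */
int split_fs_names_sketch(char *page, size_t size, const char *names)
{
	size_t len = strlen(names);
	int count = 1;

	if (size == 0)
		return 0;
	if (len >= size)
		len = size - 1;         /* truncate to fit, no write past end */
	memcpy(page, names, len);
	page[len] = '\0';               /* the only terminator we append */

	for (char *p = page; *p; p++) {
		if (*p == ',') {
			*p = '\0';
			count++;
		}
	}
	return count;
}

/* Caller side: walk 'count' NUL-separated names, skipping empty ones. */
int count_nonempty_fs_names(const char *page, int count)
{
	int n = 0;

	for (int i = 0; i < count; i++) {
		size_t len = strlen(page);

		if (len)
			n++;
		page += len + 1;
	}
	return n;
}
```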
     

15 Sep, 2021

1 commit

  • The boot-time allocation interface for memblock is a mess, with
    'memblock_alloc()' returning a virtual pointer, but then you are
    supposed to free it with 'memblock_free()' that takes a _physical_
    address.

    Not only is that all kinds of strange and illogical, but it actually
    causes bugs, when people then use it like a normal allocation function,
    and it fails spectacularly on a NULL pointer:

    https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/

    or just random memory corruption if the debug checks don't catch it:

    https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/

    I really don't want to apply patches that treat the symptoms, when the
    fundamental cause is this horribly confusing interface.

    I started out looking at just automating a sane replacement sequence,
    but because of this mix of virtual and physical addresses, and because
    people have used the "__pa()" macro that can take either a regular
    kernel pointer, or just the raw "unsigned long" address, it's all quite
    messy.

    So this just introduces a new saner interface for freeing a virtual
    address that was allocated using 'memblock_alloc()', and that was kept
    as a regular kernel pointer. And then it converts a couple of users
    that are obvious and easy to test, including the 'xbc_nodes' case in
    lib/bootconfig.c that caused problems.

    Reported-by: kernel test robot
    Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
    Cc: Steven Rostedt
    Cc: Mike Rapoport
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Vlastimil Babka
    Signed-off-by: Linus Torvalds

    Linus Torvalds
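Here is a toy model of the interface problem. It is not memblock's code. An old-style free that takes a physical address is easy to misuse with the pointer the allocator returned. A pointer-taking wrapper does the translation itself and treats NULL as a no-op. The fixed offset, the arena and all demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_phys_addr_t;

/* Hypothetical fixed offset between the "virtual" and "physical" views. */
#define DEMO_PHYS_BASE 0x1000u

static char demo_arena[256];             /* pretend boot memory */
static demo_phys_addr_t demo_last_base;  /* last physical base freed */
static size_t demo_freed_bytes;

/* Toy __pa(): only valid for pointers into demo_arena. */
static demo_phys_addr_t demo_pa(const void *ptr)
{
	return DEMO_PHYS_BASE + (demo_phys_addr_t)((const char *)ptr - demo_arena);
}

/* Old-style free: takes a physical address, easy to misuse with a pointer. */
void demo_free_phys(demo_phys_addr_t base, size_t size)
{
	demo_last_base = base;
	demo_freed_bytes += size;
}

/* Saner free: takes the pointer the allocator returned; NULL is a no-op. */
void demo_free_ptr(void *ptr, size_t size)
{
	if (ptr)
		demo_free_phys(demo_pa(ptr), size);
}
```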
     

10 Sep, 2021

2 commits

  • Pull more tracing updates from Steven Rostedt:

    - Add migrate-disable counter to tracing header

    - Fix error handling in event probes

    - Fix missed unlock in osnoise in error path

    - Fix merge issue with tools/bootconfig

    - Clean up bootconfig data when init memory is removed

    - Fix bootconfig to loop only on subkeys

    - Have kernel command lines override bootconfig options

    - Increase field counts for synthetic events

    - Have histograms dynamic allocate event elements to save space

    - Fixes in testing and documentation

    * tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/boot: Fix to loop on only subkeys
    selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events
    tracing: Dynamically allocate the per-elt hist_elt_data array
    tracing: synth events: increase max fields count
    tools/bootconfig: Show whole test command for each test case
    bootconfig: Fix missing return check of xbc_node_compose_key function
    tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh
    docs: bootconfig: Add how to use bootconfig for kernel parameters
    init/bootconfig: Reorder init parameter from bootconfig and cmdline
    init: bootconfig: Remove all bootconfig data when the init memory is removed
    tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()
    tracing: Fix some alloc_event_probe() error handling bugs
    tracing: Add migrate-disabled counter to tracing output.

    Linus Torvalds
     
  • Pull root filesystem type handling updates from Al Viro:
    "Teach init/do_mounts.c to handle non-block filesystems, hopefully
    preventing even more special-cased kludges (such as root=/dev/nfs,
    etc)"

    * 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: simplify get_filesystem_list / get_all_fs_names
    init: allow mounting arbitrary non-blockdevice filesystems as root
    init: split get_fs_names

    Linus Torvalds
     

09 Sep, 2021

5 commits

  • Merge more updates from Andrew Morton:
    "147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.

    Subsystems affected by this patch series: mm (memory-hotplug, rmap,
    ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
    alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
    checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
    selftests, ipc, and scripts"

    * emailed patches from Andrew Morton: (94 commits)
    scripts: check_extable: fix typo in user error message
    mm/workingset: correct kernel-doc notations
    ipc: replace costly bailout check in sysvipc_find_ipc()
    selftests/memfd: remove unused variable
    Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
    configs: remove the obsolete CONFIG_INPUT_POLLDEV
    prctl: allow to setup brk for et_dyn executables
    pid: cleanup the stale comment mentioning pidmap_init().
    kernel/fork.c: unexport get_{mm,task}_exe_file
    coredump: fix memleak in dump_vma_snapshot()
    fs/coredump.c: log if a core dump is aborted due to changed file permissions
    nilfs2: use refcount_dec_and_lock() to fix potential UAF
    nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
    nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
    nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
    nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
    nilfs2: fix NULL pointer in nilfs_##name##_attr_release
    nilfs2: fix memory leak in nilfs_sysfs_create_device_group
    trap: cleanup trap_init()
    init: move usermodehelper_enable() to populate_rootfs()
    ...

    Linus Torvalds
     
  • Reorder the init parameters from bootconfig and the kernel cmdline
    so that the kernel cmdline is always the last part of the
    parameters, as below.

    " -- "[bootconfig init params][cmdline init params]

    This change prevents bootconfig init params from overwriting the init
    params that the user gives on the command line.

    Link: https://lkml.kernel.org/r/163077085675.222577.5665176468023636160.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
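Here is a hypothetical sketch of the ordering above, not the kernel's xbc code. Bootconfig init params come first and command-line init params last. The sketch assumes the consumer lets the last occurrence of a key win, so the user's command-line value overrides the bootconfig one. The demo_* names are illustrative.

```c
#include <stdio.h>
#include <string.h>

/*
 * Assemble " -- " [bootconfig init params] [cmdline init params].
 * Returns 0 on success, -1 if the buffer is too small.
 */
int demo_build_init_args(char *buf, size_t size,
			 const char *bconf_params, const char *cmdline_params)
{
	int n = snprintf(buf, size, " -- %s %s", bconf_params, cmdline_params);

	return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/*
 * Return a pointer to the value of the LAST "key=value" token in args, or
 * NULL. The value runs until the next space or the end of the string.
 * "Last one wins" is an assumption of this sketch.
 */
const char *demo_last_value(const char *args, const char *key)
{
	size_t klen = strlen(key);
	const char *found = NULL;

	if (klen == 0)
		return NULL;
	for (const char *p = args; (p = strstr(p, key)) != NULL; p += klen) {
		/* Match only whole keys at the start of a token. */
		if ((p == args || p[-1] == ' ') && p[klen] == '=')
			found = p + klen + 1;
	}
	return found;
}
```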
     
  • Since the bootconfig is used only in the init functions,
    it doesn't need to keep the data after boot. Free it when
    the init memory is removed.

    Link: https://lkml.kernel.org/r/163077084958.222577.5924961258513004428.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • There are some empty trap_init() definitions in different architectures.
    Introduce a new weak trap_init() function to clean them up.

    Link: https://lkml.kernel.org/r/20210812123602.76356-1-wangkefeng.wang@huawei.com
    Signed-off-by: Kefeng Wang
    Acked-by: Russell King (Oracle) [arm32]
    Acked-by: Vineet Gupta [arc]
    Acked-by: Michael Ellerman [powerpc]
    Cc: Yoshinori Sato
    Cc: Ley Foon Tan
    Cc: Jonas Bonn
    Cc: Stefan Kristiansson
    Cc: Stafford Horne
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Anton Ivanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kefeng Wang
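Here is a minimal sketch of the weak-default pattern. It assumes GCC or Clang, because __attribute__((weak)) is a compiler extension that the kernel wraps as __weak. Generic code provides an empty default. An architecture that needs real setup supplies its own strong definition, and the linker picks that one instead. The demo_* names are hypothetical.

```c
/*
 * Generic default. An architecture needing real trap setup would provide a
 * non-weak definition in its own file, e.g. (hypothetical):
 *
 *     void demo_trap_init(void) { demo_install_vectors(); }
 *
 * and the linker would choose it over this empty body.
 */
void __attribute__((weak)) demo_trap_init(void)
{
	/* Nothing to do on architectures without special trap setup. */
}

/* Generic boot code calls the hook unconditionally; no #ifdef needed. */
int demo_start_kernel(void)
{
	int steps = 0;

	demo_trap_init();
	steps++;
	return steps;
}
```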
     
  • Currently, usermodehelper is enabled right before PID1 starts going
    through the initcalls. However, any call of a usermodehelper from a
    pure_, core_, postcore_, arch_, subsys_ or fs_ initcall is futile, as
    there are no filesystem contents yet.

    Up until commit e7cb072eb988 ("init/initramfs.c: do unpacking
    asynchronously"), such calls, whether via some request_module(), a
    legacy uevent "/sbin/hotplug" notification or something else, would
    just fail silently with (presumably) -ENOENT from
    kernel_execve(). However, that commit introduced the
    wait_for_initramfs() synchronization hook which must be called from
    the usermodehelper exec path right before the kernel_execve, in order
    that request_module() et al done from *after* rootfs_initcall()
    time (i.e. device_ and late_ initcalls) would continue to find a
    populated initramfs as they used to.

    Any call of wait_for_initramfs() done before the unpacking has been
    scheduled (i.e. before rootfs_initcall time) must just return
    immediately [and let the caller find an empty file system] in order
    not to deadlock the machine. I mistakenly thought, and my limited
    testing confirmed, that there were no such calls, so I added a
    pr_warn_once() in wait_for_initramfs(). It turns out that one can
    indeed hit request_module() as well as kobject_uevent_env() during
    those early init calls, leading to a user-visible warning in the
    kernel log emitted consistently for certain configurations.

    We could just remove the pr_warn_once(), but I think it's better to
    postpone enabling the usermodehelper framework until there is at least
    some chance of finding the executable. That is also a little more
    efficient in that a lot of work done in umh.c will be elided. However,
    it does change the error seen by those early callers from -ENOENT to
    -EBUSY, so there is a risk of a regression if any caller cares about
    the exact error value.

    Link: https://lkml.kernel.org/r/20210728134638.329060-1-linux@rasmusvillemoes.dk
    Fixes: e7cb072eb988 ("init/initramfs.c: do unpacking asynchronously")
    Signed-off-by: Rasmus Villemoes
    Reported-by: Alexander Egorenkov
    Reported-by: Bruno Goncalves
    Reported-by: Heiner Kallweit
    Cc: Luis Chamberlain
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
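Here is a toy model of the behaviour change, using hypothetical demo_* names. It covers only the error codes discussed above, not umh.c. Under the old ordering the helper gate is already open during early initcalls, so the call reaches exec and fails with -ENOENT. Under the new ordering the gate opens only when rootfs is populated, so early callers get -EBUSY.

```c
#include <errno.h>
#include <stdbool.h>

static bool demo_umh_enabled;
static bool demo_rootfs_populated;

/* Reset boot state; 'enable_umh_early' selects the old ordering. */
void demo_reset(bool enable_umh_early)
{
	demo_rootfs_populated = false;
	demo_umh_enabled = enable_umh_early;
}

/* New ordering: the gate opens once rootfs contents exist. */
void demo_populate_rootfs(void)
{
	demo_rootfs_populated = true;
	demo_umh_enabled = true;
}

/* Stand-in for a usermode-helper call such as request_module(). */
int demo_call_usermodehelper(void)
{
	if (!demo_umh_enabled)
		return -EBUSY;          /* gate still closed */
	if (!demo_rootfs_populated)
		return -ENOENT;         /* helper binary not found: no files yet */
	return 0;
}
```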
     

08 Sep, 2021

1 commit

  • The cross-product of the kernel's supported toolchains, architectures,
    and configuration options is large. So large, that it's generally
    accepted to be infeasible to enumerate and build+test them all
    (many compile-testers rely on randomly generated configs).

    Without the possibility to enumerate all possible combinations of
    toolchains, architectures, and configuration options, it is inevitable
    that compiler warnings in this space exist.

    With -Werror, this means that an innumerable set of kernels are now
    broken, yet had been perfectly usable before (confused compilers, code
    with warnings unused, or luck).

    Distributors will necessarily pick a point in the toolchain X arch X
    config space, and if unlucky, will have a broken build. Granted, those
    will likely disable CONFIG_WERROR and move on.

    The kernel's default configuration is unlikely to be suitable for all
    users, but it's inappropriate to force many users to set CONFIG_WERROR=n.

    This also holds for CI systems which are focused on runtime testing,
    where the odd warning in some subsystem will disrupt testing of the rest
    of the kernel. Many of those runtime-focused CI systems run tests or
    fuzz the kernel using runtime debugging tools. Runtime testing of
    different subsystems can proceed in parallel, and potentially uncover
    serious bugs; halting runtime testing of the entire kernel because of
    the odd warning (now error) in a subsystem or driver is simply
    inappropriate.

    Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n
    as well.

    The appropriate use case for -Werror is therefore compile-test focused
    builds (often done by developers or CI systems).

    Reflect this in the Kconfig option by making the default value of WERROR
    match COMPILE_TEST.

    Signed-off-by: Marco Elver
    Acked-by: Guenter Roeck
    Acked-by: Randy Dunlap
    Reviewed-by: Mark Brown
    Reviewed-by: Nathan Chancellor
    Signed-off-by: Linus Torvalds

    Marco Elver
     

06 Sep, 2021

1 commit

  • ... but make it a config option so that broken environments can disable
    it when required.

    We really should always have a clean build, and will disable specific
    over-eager warnings as required, if we can't fix them. But while I
    fairly religiously enforce that in my own tree, it doesn't get enforced
    by various build robots that don't necessarily report warnings.

    So this just makes '-Werror' a default compiler flag, but allows people
    to disable it for their configuration if they have some particular
    issues.

    Occasionally, new compiler versions end up enabling new warnings, and it
    can take a while before we have them fixed (or the warnings disabled if
    that is what it takes), so the config option allows for that situation.

    Hopefully this will mean that I get fewer pull requests that have new
    warnings that were not noticed by various automation we have in place.

    Knock wood.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

02 Sep, 2021

1 commit

  • Pull printk updates from Petr Mladek:

    - Optionally, provide an index of possible printk messages via
    /printk/index/. It can be used when monitoring important
    kernel messages on a farm of various hosts. The monitor has to be
    updated when some messages have changed or are no longer available in
    a newly deployed kernel.

    - Add printk.console_no_auto_verbose boot parameter. It allows
    generating a crash dump in a reasonable time frame even with slow
    consoles.

    - Remove printk_safe buffers. The messages are always stored directly
    in the main log buffer, even in NMI or recursive context. This also
    allows syslog operations to be serialized by a mutex instead of a
    spinlock.

    - Misc clean up and build fixes.

    * tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
    printk/index: Fix -Wunused-function warning
    lib/nmi_backtrace: Serialize even messages about idle CPUs
    printk: Add printk.console_no_auto_verbose boot parameter
    printk: Remove console_silent()
    lib/test_scanf: Handle n_bits == 0 in random tests
    printk: syslog: close window between wait and read
    printk: convert @syslog_lock to mutex
    printk: remove NMI tracking
    printk: remove safe buffers
    printk: track/limit recursion
    lib/nmi_backtrace: explicitly serialize banner and regs
    printk: Move the printk() kerneldoc comment to its new home
    printk/index: Fix warning about missing prototypes
    MIPS/asm/printk: Fix build failure caused by printk
    printk: index: Add indexing support to dev_printk
    printk: Userspace format indexing support
    printk: Rework parse_prefix into printk_parse_prefix
    printk: Straighten out log_flags into printk_info_flags
    string_helpers: Escape double quotes in escape_special
    printk/console: Check consistent sequence number when handling race in console_unlock()

    Linus Torvalds
     

01 Sep, 2021

1 commit

  • Pull networking updates from Jakub Kicinski:
    "Core:

    - Enable memcg accounting for various networking objects.

    BPF:

    - Introduce bpf timers.

    - Add perf link and opaque bpf_cookie which the program can read out
    again, to be used in libbpf-based USDT library.

    - Add bpf_task_pt_regs() helper to access user space pt_regs in
    kprobes, to help user space stack unwinding.

    - Add support for UNIX sockets for BPF sockmap.

    - Extend BPF iterator support for UNIX domain sockets.

    - Allow BPF TCP congestion control progs and bpf iterators to call
    bpf_setsockopt(), e.g. to switch to another congestion control
    algorithm.

    Protocols:

    - Support IOAM Pre-allocated Trace with IPv6.

    - Support Management Component Transport Protocol.

    - bridge: multicast: add vlan support.

    - netfilter: add hooks for the SRv6 lightweight tunnel driver.

    - tcp:
    - enable mid-stream window clamping (by user space or BPF)
    - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
    - more accurate DSACK processing for RACK-TLP

    - mptcp:
    - add full mesh path manager option
    - add partial support for MP_FAIL
    - improve use of backup subflows
    - optimize option processing

    - af_unix: add OOB notification support.

    - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the
    router.

    - mac80211: Target Wake Time support in AP mode.

    - can: j1939: extend UAPI to notify about RX status.

    Driver APIs:

    - Add page frag support in page pool API.

    - Many improvements to the DSA (distributed switch) APIs.

    - ethtool: extend IRQ coalesce uAPI with timer reset modes.

    - devlink: control which auxiliary devices are created.

    - Support CAN PHYs via the generic PHY subsystem.

    - Proper cross-chip support for tag_8021q.

    - Allow TX forwarding for the software bridge data path to be
    offloaded to capable devices.

    Drivers:

    - veth: more flexible channels number configuration.

    - openvswitch: introduce per-cpu upcall dispatch.

    - Add internet mix (IMIX) mode to pktgen.

    - Transparently handle XDP operations in the bonding driver.

    - Add LiteETH network driver.

    - Renesas (ravb):
    - support Gigabit Ethernet IP

    - NXP Ethernet switch (sja1105):
    - fast aging support
    - support for "H" switch topologies
    - traffic termination for ports under VLAN-aware bridge

    - Intel 1G Ethernet
    - support getcrosststamp() with PCIe PTM (Precision Time
    Measurement) for better time sync
    - support Credit-Based Shaper (CBS) offload, enabling HW traffic
    prioritization and bandwidth reservation

    - Broadcom Ethernet (bnxt)
    - support pulse-per-second output
    - support larger Rx rings

    - Mellanox Ethernet (mlx5)
    - support ethtool RSS contexts and MQPRIO channel mode
    - support LAG offload with bridging
    - support devlink rate limit API
    - support packet sampling on tunnels

    - Huawei Ethernet (hns3):
    - basic devlink support
    - add extended IRQ coalescing support
    - report extended link state

    - Netronome Ethernet (nfp):
    - add conntrack offload support

    - Broadcom WiFi (brcmfmac):
    - add WPA3 Personal with FT to supported cipher suites
    - support 43752 SDIO device

    - Intel WiFi (iwlwifi):
    - support scanning hidden 6GHz networks
    - support for a new hardware family (Bz)

    - Xen pv driver:
    - harden netfront against malicious backends

    - Qualcomm mobile
    - ipa: refactor power management and enable automatic suspend
    - mhi: move MBIM to WWAN subsystem interfaces

    Refactor:

    - Ambient BPF run context and cgroup storage cleanup.

    - Compat rework for ndo_ioctl.

    Old code removal:

    - prism54 remove the obsoleted driver, deprecated by the p54 driver.

    - wan: remove sbni/granch driver"

    * tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
    net: Add depends on OF_NET for LiteX's LiteETH
    ipv6: seg6: remove duplicated include
    net: hns3: remove unnecessary spaces
    net: hns3: add some required spaces
    net: hns3: clean up a type mismatch warning
    net: hns3: refine function hns3_set_default_feature()
    ipv6: remove duplicated 'net/lwtunnel.h' include
    net: w5100: check return value after calling platform_get_resource()
    net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
    net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
    net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
    fou: remove sparse errors
    ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
    octeontx2-af: Set proper errorcode for IPv4 checksum errors
    octeontx2-af: Fix static code analyzer reported issues
    octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
    octeontx2-af: Fix loop in free and unmap counter
    af_unix: fix potential NULL deref in unix_dgram_connect()
    dpaa2-eth: Replace strlcpy with strscpy
    octeontx2-af: Use NDC TX for transmit packet data
    ...

    Linus Torvalds
     

31 Aug, 2021

2 commits

  • Pull block updates from Jens Axboe:
    "Nothing major in here - lots of good cleanups and tech debt handling,
    which is also evident in the diffstats. In particular:

    - Add disk sequence numbers (Matteo)

    - Discard merge fix (Ming)

    - Relax disk zoned reporting restrictions (Niklas)

    - Bio error handling zoned leak fix (Pavel)

    - Start of proper add_disk() error handling (Luis, Christoph)

    - blk crypto fix (Eric)

    - Non-standard GPT location support (Dmitry)

    - IO priority improvements and cleanups (Damien)

    - blk-throtl improvements (Chunguang)

    - diskstats_show() stack reduction (Abd-Alrhman)

    - Loop scheduler selection (Bart)

    - Switch block layer to use kmap_local_page() (Christoph)

    - Remove obsolete disk_name helper (Christoph)

    - block_device refcounting improvements (Christoph)

    - Ensure gendisk always has a request queue reference (Christoph)

    - Misc fixes/cleanups (Shaokun, Oliver, Guoqing)"

    * tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits)
    sg: pass the device name to blk_trace_setup
    block, bfq: cleanup the repeated declaration
    blk-crypto: fix check for too-large dun_bytes
    blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
    blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
    block: mark blkdev_fsync static
    block: refine the disk_live check in del_gendisk
    mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA
    mmc: block: Support alternative_gpt_sector() operation
    partitions/efi: Support non-standard GPT location
    block: Add alternative_gpt_sector() operation
    bio: fix page leak bio_add_hw_page failure
    block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT
    block: remove a pointless call to MINOR() in device_add_disk
    null_blk: add error handling support for add_disk()
    virtio_blk: add error handling support for add_disk()
    block: add error handling for device_add_disk / add_disk
    block: return errors from disk_alloc_events
    block: return errors from blk_integrity_add
    block: call blk_register_queue earlier in device_add_disk
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:

    - The biggest change in this cycle is scheduler support for asymmetric
    scheduling affinity, to support the execution of legacy 32-bit tasks
    on AArch32 systems that also have 64-bit-only CPUs.

    Architectures can fill in this functionality by defining their own
    task_cpu_possible_mask(p). When this is done, the scheduler will make
    sure the task will only be scheduled on CPUs that support it.

    (The actual arm64 specific changes are not part of this tree.)

    For other architectures there will be no change in functionality.

    - Add cgroup SCHED_IDLE support

    - Increase node-distance flexibility & delay determining it until a CPU
    is brought online. (This enables platforms where node distance isn't
    final until the CPU is online.)

    - Deadline scheduler enhancements & fixes

    - Misc fixes & cleanups.
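
    The asymmetric-affinity rule in the first item can be modelled with plain
    bitmasks. This is a userspace illustration only, assuming at most 64 CPUs;
    possible_mask_model() and restrict_affinity_model() are hypothetical
    stand-ins for task_cpu_possible_mask() and the affinity checks, not kernel
    code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N set means CPU N is in the mask (up to 64 CPUs). */
typedef uint64_t cpumask_model_t;

/*
 * Hypothetical stand-in for task_cpu_possible_mask(p): a 32-bit task may
 * only run on CPUs that implement AArch32; a 64-bit task may run anywhere.
 */
static cpumask_model_t possible_mask_model(bool task_is_32bit,
					   cpumask_model_t all_cpus,
					   cpumask_model_t aarch32_cpus)
{
	return task_is_32bit ? (all_cpus & aarch32_cpus) : all_cpus;
}

/*
 * Intersect the requested affinity with the CPUs the task can execute on.
 * An empty result means the request leaves no usable CPU; this sketch
 * simply reports that as 0 and leaves error handling to the caller.
 */
static cpumask_model_t restrict_affinity_model(cpumask_model_t requested,
					       bool task_is_32bit,
					       cpumask_model_t all_cpus,
					       cpumask_model_t aarch32_cpus)
{
	return requested &
	       possible_mask_model(task_is_32bit, all_cpus, aarch32_cpus);
}
```

    With CPUs 0-3 present and only CPUs 0 and 1 able to run AArch32, a 32-bit
    task asking for CPUs 2-3 gets an empty mask, while asking for CPUs 1-2
    leaves it with CPU 1 only.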

    * tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    eventfd: Make signal recursion protection a task bit
    sched/fair: Mark tg_is_idle() an inline in the !CONFIG_FAIR_GROUP_SCHED case
    sched: Introduce dl_task_check_affinity() to check proposed affinity
    sched: Allow task CPU affinity to be restricted on asymmetric systems
    sched: Split the guts of sched_setaffinity() into a helper function
    sched: Introduce task_struct::user_cpus_ptr to track requested affinity
    sched: Reject CPU affinity changes based on task_cpu_possible_mask()
    cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()
    cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
    cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
    sched: Introduce task_cpu_possible_mask() to limit fallback rq selection
    sched: Cgroup SCHED_IDLE support
    sched/topology: Skip updating masks for non-online nodes
    sched: Replace deprecated CPU-hotplug functions.
    sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS
    sched: Fix UCLAMP_FLAG_IDLE setting
    sched/deadline: Fix missing clock update in migrate_task_rq_dl()
    sched/fair: Avoid a second scan of target in select_idle_cpu
    sched/fair: Use prev instead of new target as recent_used_cpu
    sched: Don't report SCHED_FLAG_SUGOV in sched_getattr()
    ...

    Linus Torvalds
     

30 Aug, 2021

1 commit


24 Aug, 2021

1 commit


23 Aug, 2021

3 commits

  • Just output the '\0'-separated list of supported file systems for block
    devices directly rather than going through a pointless round of string
    manipulation.

    Based on an earlier patch from Al Viro.

    Vivek:
    Modified list_bdev_fs_names() and split_fs_names() to return the number
    of null-terminated strings to the caller. Callers now use that
    information to loop through all the strings instead of relying on one
    extra null char being present at the end.
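
    A minimal userspace sketch of the count-based iteration described above,
    assuming a comma-separated input such as a rootfstype= value.
    split_names_model() and nth_name_model() are hypothetical helpers, not
    the kernel's split_fs_names(); the point is that the caller walks exactly
    the returned number of strings instead of looking for an extra trailing
    '\0'.

```c
#include <string.h>

/*
 * Hypothetical helper: turn "a,b,c" into "a\0b\0c" in place and return
 * the number of names. An empty input contains no names.
 */
static int split_names_model(char *buf)
{
	int count;
	char *p;

	if (buf[0] == '\0')
		return 0;

	count = 1;
	for (p = buf; *p != '\0'; p++) {
		if (*p == ',') {
			*p = '\0';	/* terminate the current name */
			count++;
		}
	}
	return count;
}

/* Return the n-th name of a '\0'-separated list of count names, or NULL. */
static const char *nth_name_model(const char *names, int count, int n)
{
	const char *p = names;

	if (n < 0 || n >= count)
		return NULL;
	for (int i = 0; i < n; i++)
		p += strlen(p) + 1;	/* skip one name and its '\0' */
	return p;
}
```

    Because the loop is bounded by the count, the buffer never needs a second
    terminator after the last name.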

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Vivek Goyal
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Currently the only non-blockdevice filesystems that can be used as the
    initial root filesystem are NFS and CIFS, which use the magic
    "root=/dev/nfs" and "root=/dev/cifs" syntax that requires the root
    device file system details to come from filesystem-specific kernel
    command-line options.

    Add a little bit of new code that allows passing arbitrary string
    mount options to any non-blockdevice filesystem so that it can be
    mounted as the root file system.

    For example a virtiofs root file system can be mounted using the
    following syntax:

    "root=myfs rootfstype=virtiofs rw"
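
    The command line above can be illustrated with a small parser. This is a
    hypothetical userspace sketch: struct root_opts_model,
    parse_root_opts_model() and is_nodev_root_model() are invented for
    illustration, and the "does not start with /dev/" test is a
    simplification, not how the kernel decides that the root is a
    non-blockdevice filesystem.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the root-related options of one command line. */
struct root_opts_model {
	char root[64];		/* value of root= */
	char fstype[64];	/* value of rootfstype= */
	bool read_write;	/* true after "rw", false after "ro" */
};

/* Copy len bytes of src into dst, truncating to fit and NUL-terminating. */
static void copy_value_model(char *dst, size_t size, const char *src,
			     size_t len)
{
	if (len >= size)
		len = size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* Split cmdline on spaces and pick out root=, rootfstype=, rw and ro. */
static void parse_root_opts_model(const char *cmdline,
				  struct root_opts_model *out)
{
	const char *p = cmdline;

	memset(out, 0, sizeof(*out));
	for (;;) {
		size_t len;

		while (*p == ' ')
			p++;
		len = strcspn(p, " ");
		if (len == 0)
			break;		/* end of string */
		if (strncmp(p, "root=", 5) == 0)
			copy_value_model(out->root, sizeof(out->root),
					 p + 5, len - 5);
		else if (strncmp(p, "rootfstype=", 11) == 0)
			copy_value_model(out->fstype, sizeof(out->fstype),
					 p + 11, len - 11);
		else if (len == 2 && strncmp(p, "rw", 2) == 0)
			out->read_write = true;
		else if (len == 2 && strncmp(p, "ro", 2) == 0)
			out->read_write = false;
		p += len;
	}
}

/*
 * Simplified heuristic for this sketch only: an explicit rootfstype= plus a
 * root= value that is not a /dev/ path means "mount the tag directly".
 */
static bool is_nodev_root_model(const struct root_opts_model *o)
{
	return o->fstype[0] != '\0' && strncmp(o->root, "/dev/", 5) != 0;
}
```

    For "root=myfs rootfstype=virtiofs rw" the sketch yields the tag "myfs",
    the type "virtiofs" and a read-write mount; for "root=/dev/sda1 ro" it
    reports an ordinary block-device root.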

    Based on an earlier patch from Vivek Goyal.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Split get_fs_names() into two functions: one that splits up the
    command-line argument, and one that gets the list of all registered
    file systems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

20 Aug, 2021

2 commits


13 Aug, 2021

1 commit

  • Since the 'bootconfig' command line parameter is handled before
    parsing the command line, it doesn't use early_param(). As a result,
    the kernel shows a spurious warning message about it:

    [ 0.013714] Kernel command line: ro console=ttyS0 bootconfig console=tty0
    [ 0.013741] Unknown command line parameters: bootconfig

    To suppress this message, add a dummy handler for 'bootconfig'.
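
    One way to picture the fix: the unknown-parameter warning is driven by a
    lookup of known names, so adding an entry that silently accepts
    'bootconfig' removes the false positive. The table and
    count_unknown_params() below are a hypothetical userspace model, not the
    kernel's early_param()/__setup() machinery.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of parameter names that have a handler. */
static const char *const known_params[] = {
	"ro",
	"console",
	"bootconfig",	/* dummy entry: already handled earlier, accept silently */
};

/* True if the first len bytes of name match a known parameter exactly. */
static bool param_is_known(const char *name, size_t len)
{
	for (size_t i = 0; i < sizeof(known_params) / sizeof(known_params[0]); i++) {
		if (strlen(known_params[i]) == len &&
		    strncmp(known_params[i], name, len) == 0)
			return true;
	}
	return false;
}

/* Count space-separated words whose name (the part before '=') is unknown. */
static int count_unknown_params(const char *cmdline)
{
	const char *p = cmdline;
	int unknown = 0;

	for (;;) {
		size_t word;

		while (*p == ' ')
			p++;
		word = strcspn(p, " ");
		if (word == 0)
			break;		/* end of string */
		if (!param_is_known(p, strcspn(p, "= ")))
			unknown++;
		p += word;
	}
	return unknown;
}
```

    For the command line in the log above the count is 0; deleting the
    "bootconfig" entry from known_params makes it 1, which corresponds to the
    spurious warning.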

    Link: https://lkml.kernel.org/r/162812945097.77369.1849780946468010448.stgit@devnote2

    Fixes: 86d1919a4fb0 ("init: print out unknown kernel parameters")
    Reviewed-by: Andrew Halaney
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

03 Aug, 2021

1 commit

  • Now that ax88796.c exports the ax_NS8390_reinit() symbol, we can
    include 8390.h instead of lib8390.c, avoiding duplication of that
    function and killing a few compile warnings in the bargain.

    Fixes: 861928f4e60e826c ("net-next: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)")

    Signed-off-by: Michael Schmitz
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Michael Schmitz
     

26 Jul, 2021

1 commit

  • All NMI contexts are handled the same as the safe context: store the
    message and defer printing. There is no need to have special NMI
    context tracking for this. Using in_nmi() is enough.

    There are several parts of the kernel that are manually calling into
    the printk NMI context tracking in order to cause general printk
    deferred printing:

    arch/arm/kernel/smp.c
    arch/powerpc/kexec/crash.c
    kernel/trace/trace.c

    For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new
    function pair printk_deferred_enter/exit that explicitly achieves the
    same objective.

    For ftrace, remove the printk context manipulation completely. It was
    added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when
    accessing the main log buffer in NMI"). The purpose was to enforce
    storing messages directly into the ring buffer even in NMI context.
    It really should have only modified the behavior in NMI context.
    There is no need for a special behavior any longer. All messages are
    always stored directly now. The console deferring is handled
    transparently in vprintk().
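
    The split described above (always store the message, defer only console
    output) can be sketched as a counter-based model. Everything here is
    hypothetical: the globals stand in for the kernel's per-CPU state and for
    in_nmi(), and the functions only mirror the names printk_deferred_enter/exit
    and vprintk(); they are not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical single-CPU state; the real state is tracked per CPU. */
static int deferred_nesting;	/* > 0 inside printk_deferred_enter/exit */
static bool in_nmi_model;	/* stands in for in_nmi() */
static int stored_messages;	/* messages appended to the log buffer */
static int console_flushes;	/* messages also printed to the console now */

static void printk_deferred_enter_model(void)
{
	deferred_nesting++;
}

static void printk_deferred_exit_model(void)
{
	deferred_nesting--;
}

/* Always store the message; only the console printing is deferred. */
static void vprintk_model(const char *msg)
{
	(void)msg;		/* message text is irrelevant to this model */
	stored_messages++;
	if (in_nmi_model || deferred_nesting > 0)
		return;		/* console output left for a later flush */
	console_flushes++;
}
```

    A message printed inside an enter/exit pair, or from NMI context, is still
    counted as stored; only the immediate console flush is skipped.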

    Signed-off-by: John Ogness
    [pmladek@suse.com: Remove special handling in ftrace.c completely.]
    Signed-off-by: Petr Mladek
    Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de

    John Ogness
     

19 Jul, 2021

1 commit

  • We have a number of systems industry-wide that have a subset of their
    functionality that works as follows:

    1. Receive a message from local kmsg, serial console, or netconsole;
    2. Apply a set of rules to classify the message;
    3. Do something based on this classification (like scheduling a
    remediation for the machine), rinse, and repeat.
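
    Steps 1-3 can be illustrated with a tiny classifier built on POSIX
    regcomp()/regexec(). The rule structure, patterns and labels are
    hypothetical; a real pipeline would compile its rules once instead of on
    every call.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical classification rule: a POSIX extended regex and its label. */
struct rule_model {
	const char *pattern;
	const char *label;
};

/*
 * Return the label of the first rule whose pattern matches msg, or NULL.
 * Rules are compiled on every call only to keep the sketch short.
 */
static const char *classify_model(const char *msg,
				  const struct rule_model *rules, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		regex_t re;
		int rc;

		if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
			continue;	/* skip a rule that does not compile */
		rc = regexec(&re, msg, 0, NULL, 0);
		regfree(&re);
		if (rc == 0)
			return rules[i].label;
	}
	return NULL;
}
```

    Note the failure mode: if the kernel message wording changes,
    classify_model() simply returns NULL, with nothing to signal that a rule
    has gone stale.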

    As a couple of examples of where we have this implemented just inside
    Facebook (although this isn't a Facebook-specific problem), we have it
    in our netconsole processing (for alarm classification) and as part
    of our machine health checking. We use these messages to determine
    fairly important metrics around production health, and it's important
    that we get them right.

    While for some kinds of issues we have counters, tracepoints, or metrics
    with a stable interface which can reliably indicate the issue, in order
    to react to production issues quickly we need to work with the interface
    which most kernel developers naturally use when developing: printk.

    Most production issues come from unexpected phenomena, and as such
    usually the code in question doesn't have easily usable tracepoints or
    other counters available for the specific problem being mitigated. We
    have a number of lines of monitoring defence against problems in
    production (host metrics, process metrics, service metrics, etc), and
    where it's not feasible to reliably monitor at another level, this kind
    of pragmatic netconsole monitoring is essential.

    As one would expect, monitoring using printk is rather brittle for a
    number of reasons -- most notably that the message might disappear
    entirely in a new version of the kernel, or that the message may change
    in some way that the regex or other classification methods start to
    silently fail.

    One factor that makes this even harder is that, under normal operation,
    many of these messages are never expected to be hit. For example, there
    may be a rare hardware bug which one wants to detect if it was to ever
    happen again, but its recurrence is not likely or anticipated. This
    precludes using something like checking whether the printk in question
    was printed somewhere fleetwide recently to determine whether the
    message in question is still present or not, since we don't anticipate
    that it should be printed anywhere, but still need to monitor for its
    future presence in the long-term.

    This class of issue has happened on a number of occasions, causing
    unhealthy machines with hardware issues to remain in production for
    longer than ideal. As a recent example, some monitoring around
    blk_update_request fell out of date and caused semi-broken machines to
    remain in production for longer than would be desirable.

    Searching through the codebase to find the message is also extremely
    fragile, because many of the messages are further constructed beyond
    their callsite (e.g. btrfs_printk and other module-specific wrappers,
    each with their own functionality). Even if they aren't, guessing the
    format and formulation of the underlying message based on the aesthetics
    of the message emitted is not a recipe for success at scale, and our
    previous issues with fleetwide machine health checking demonstrate as
    much.

    This provides a solution to the issue of silently changed or deleted
    printks: we record pointers to all printk format strings known at
    compile time into a new .printk_index section, both in vmlinux and
    modules. At runtime, this can then be iterated by looking at
    /printk/index/, which emits the following format, both
    readable by humans and able to be parsed by machines:

    $ head -1 vmlinux; shuf -n 5 vmlinux
    # filename:line function "format"
    block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
    kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
    arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
    init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
    drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"

    This mitigates the majority of cases where we have a highly-specific
    printk which we want to match on, as we can now enumerate and check
    whether the format changed or the printk callsite disappeared entirely
    in userspace. This allows us to catch changes to printks we monitor
    earlier and decide what to do about it before it becomes problematic.
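
    A hedged sketch of the userspace side: parse one line in the
    filename:line function "format" layout shown above and check whether a
    monitored format is still present verbatim. The struct and function names
    are hypothetical, and the sketch assumes that file and function names
    contain no whitespace and that formats are compared in the quoted, escaped
    form in which the index prints them.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the index, in the layout shown above. */
struct printk_index_entry_model {
	char file[256];
	unsigned int line;
	char func[128];
	const char *format;	/* points into the input, still quoted/escaped */
};

/* Parse `file:line function "format"`; reject the "#" header and junk. */
static bool parse_index_line_model(const char *text,
				   struct printk_index_entry_model *e)
{
	int consumed = 0;

	if (text[0] == '#')
		return false;
	if (sscanf(text, "%255[^:]:%u %127s %n",
		   e->file, &e->line, e->func, &consumed) != 3)
		return false;
	if (consumed == 0 || text[consumed] != '"')
		return false;
	e->format = text + consumed;
	return true;
}

/* True if index_line still carries exactly this quoted format string. */
static bool format_still_present_model(const char *index_line,
				       const char *quoted_format)
{
	struct printk_index_entry_model e;
	size_t n = strlen(quoted_format);

	if (!parse_index_line_model(index_line, &e))
		return false;
	/* Allow a trailing newline, as left by fgets(). */
	return strncmp(e.format, quoted_format, n) == 0 &&
	       (e.format[n] == '\0' || e.format[n] == '\n');
}
```

    Running format_still_present_model() over every line of the index for each
    monitored message turns "the printk silently changed or vanished" into an
    explicit, checkable condition.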

    There is no additional runtime cost for printk callers or printk itself,
    and the assembly generated is exactly the same.

    Signed-off-by: Chris Down
    Cc: Petr Mladek
    Cc: Jessica Yu
    Cc: Sergey Senozhatsky
    Cc: John Ogness
    Cc: Steven Rostedt
    Cc: Greg Kroah-Hartman
    Cc: Johannes Weiner
    Cc: Kees Cook
    Reviewed-by: Petr Mladek
    Tested-by: Petr Mladek
    Reported-by: kernel test robot
    Acked-by: Andy Shevchenko
    Acked-by: Jessica Yu # for module.{c,h}
    Signed-off-by: Petr Mladek
    Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name

    Chris Down
     

18 Jul, 2021

1 commit

  • This reverts commit 788691464c29455346dc613a3b43c2fb9e5757a4.

    It's not clear why, but it causes unexplained problems in entirely
    unrelated xfs code. The most likely explanation is some slab
    corruption, possibly triggered due to CONFIG_SLUB_DEBUG_ON. See [1].

    It ends up having a few other problems too, like build errors on
    arch/arc, and Geert reporting it using much more memory on m68k [3] (it
    probably does so elsewhere too, but it is probably just more noticeable
    on m68k).

    The architecture issues (both build and memory use) are likely just
    because this change effectively force-enabled STACKDEPOT (along with a
    very bad default value for the stackdepot hash size). But together with
    the xfs issue, this all smells like "this commit was not ready" to me.

    Link: https://lore.kernel.org/linux-xfs/YPE3l82acwgI2OiV@infradead.org/ [1]
    Link: https://lore.kernel.org/lkml/202107150600.LkGNb4Vb-lkp@intel.com/ [2]
    Link: https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ [3]
    Reported-by: Christoph Hellwig
    Reported-by: kernel test robot
    Reported-by: Geert Uytterhoeven
    Cc: Andrew Morton
    Cc: Vlastimil Babka
    Cc: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Jul, 2021

1 commit

  • Pull Kbuild updates from Masahiro Yamada:

    - Increase the -falign-functions alignment for the debug option.

    - Remove ugly libelf checks from the top Makefile.

    - Make the silent build (-s) more silent.

    - Re-compile the kernel if KBUILD_BUILD_TIMESTAMP is specified.

    - Various script cleanups

    * tag 'kbuild-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (27 commits)
    scripts: add generic syscallnr.sh
    scripts: check duplicated syscall number in syscall table
    sparc: syscalls: use pattern rules to generate syscall headers
    parisc: syscalls: use pattern rules to generate syscall headers
    nds32: add arch/nds32/boot/.gitignore
    kbuild: mkcompile_h: consider timestamp if KBUILD_BUILD_TIMESTAMP is set
    kbuild: modpost: Explicitly warn about unprototyped symbols
    kbuild: remove trailing slashes from $(KBUILD_EXTMOD)
    kconfig.h: explain IS_MODULE(), IS_ENABLED()
    kconfig: constify long_opts
    scripts/setlocalversion: simplify the short version part
    scripts/setlocalversion: factor out 12-chars hash construction
    scripts/setlocalversion: add more comments to -dirty flag detection
    scripts/setlocalversion: remove workaround for old make-kpkg
    scripts/setlocalversion: remove mercurial, svn and git-svn supports
    kbuild: clean up ${quiet} checks in shell scripts
    kbuild: sink stdout from cmd for silent build
    init: use $(call cmd,) for generating include/generated/compile.h
    kbuild: merge scripts/mkmakefile to top Makefile
    sh: move core-y in arch/sh/Makefile to arch/sh/Kbuild
    ...

    Linus Torvalds
     

09 Jul, 2021

1 commit

  • Parse the kernel's build ID at initialization so that other code can print
    a hex format string representation of the running kernel's build ID. This
    will be used in the kdump and dump_stack code so that developers can
    easily locate the vmlinux debug symbols for a crash/stacktrace.
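
    For illustration, turning raw build ID bytes into the hex string mentioned
    above takes only a nibble lookup. build_id_to_hex_model() is a
    hypothetical userspace helper, not the kernel function that performs this
    formatting.

```c
#include <stddef.h>

/*
 * Format a binary build ID as lowercase hex; out must hold 2 * len + 1
 * bytes. Build IDs are commonly 20 bytes (SHA-1), but len is a parameter.
 */
static void build_id_to_hex_model(const unsigned char *id, size_t len,
				  char *out)
{
	static const char digits[] = "0123456789abcdef";

	for (size_t i = 0; i < len; i++) {
		out[2 * i] = digits[id[i] >> 4];	/* high nibble */
		out[2 * i + 1] = digits[id[i] & 0x0f];	/* low nibble */
	}
	out[2 * len] = '\0';
}
```

    The resulting string can then be matched against the build ID recorded in
    a vmlinux with debug symbols.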

    [swboyd@chromium.org: fix implicit declaration of init_vmlinux_build_id()]
    Link: https://lkml.kernel.org/r/CAE-0n51UjTbay8N9FXAyE7_aR2+ePrQnKSRJ0gbmRsXtcLBVaw@mail.gmail.com

    Link: https://lkml.kernel.org/r/20210511003845.2429846-4-swboyd@chromium.org
    Signed-off-by: Stephen Boyd
    Acked-by: Baoquan He
    Cc: Jiri Olsa
    Cc: Alexei Starovoitov
    Cc: Jessica Yu
    Cc: Evan Green
    Cc: Hsin-Yi Wang
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Andy Shevchenko
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Ingo Molnar
    Cc: Konstantin Khlebnikov
    Cc: Matthew Wilcox
    Cc: Petr Mladek
    Cc: Rasmus Villemoes
    Cc: Sasha Levin
    Cc: Sergey Senozhatsky
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd