12 Jul, 2022

2 commits

  • [ Upstream commit 391e982bfa632b8315235d8be9c0a81374c6a19c ]

    It is trivial to craft a module to trigger OOB access in this line:

    if (info->secstrings[strhdr->sh_size - 1] != '\0') {

    BUG: unable to handle page fault for address: ffffc90000aa0fff
    PGD 100000067 P4D 100000067 PUD 100066067 PMD 10436f067 PTE 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 7 PID: 1215 Comm: insmod Not tainted 5.18.0-rc5-00007-g9bf578647087-dirty #10
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
    RIP: 0010:load_module+0x19b/0x2391

    Fixes: ec2a29593c83 ("module: harden ELF info handling")
    Signed-off-by: Alexey Dobriyan
    [rebased patch onto modules-next]
    Signed-off-by: Luis Chamberlain
    Signed-off-by: Sasha Levin

    Alexey Dobriyan
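
Editorial sketch for the commit above: the faulting expression reads index sh_size - 1, which wraps around when sh_size is 0. A minimal user-space illustration of guarding that read follows. The helper name strtab_is_terminated() is hypothetical and this is not the upstream patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when the table is non-empty and its last byte is NUL.
 * Testing size first avoids evaluating tab[size - 1] with size == 0,
 * which would index one byte before the start of the table.
 */
static bool strtab_is_terminated(const char *tab, size_t size)
{
	if (size == 0)
		return false;
	return tab[size - 1] == '\0';
}
```

An empty table is treated as invalid here; whether that is the right policy for a given loader is an assumption of this sketch.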
     
  • [ Upstream commit 7fd982f394c42f25a73fe9dfbf1e6b11fa26b40a ]

    elf_validity_check() checks ELF headers for errors and ELF Spec.
    compliance; if any check fails, it returns -ENOEXEC from every one of
    these error paths. Almost none of them print a message.

    When elf_validity_check() returns an error, load_module() prints an
    error message without error code. It is hard to determine why the
    module ELF structure is invalid, even if load_module() prints the
    error code which is -ENOEXEC in all of these cases.

    Change to print useful error messages from elf_validity_check() to
    clearly say what went wrong and why the ELF validity checks failed.

    Remove the load_module() error message which is no longer needed.
    This patch includes changes to fix build warnings on 32-bit platforms:

    warning: format '%llu' expects argument of type 'long long unsigned int',
    but argument 3 has type 'Elf32_Off' {aka 'unsigned int'}

    Reported-by: kernel test robot

    Signed-off-by: Shuah Khan
    Signed-off-by: Luis Chamberlain
    Signed-off-by: Sasha Levin

    Shuah Khan
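
Editorial sketch for the 32-bit warning quoted above: one common way to keep a single "%llu" format valid on every target is to cast the argument explicitly. The function name format_offset() and the message text are hypothetical; the upstream patch may have resolved the warning differently.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 32-bit file offset with "%llu". The explicit cast makes the
 * argument type match the format on both 32-bit and 64-bit targets.
 */
static int format_offset(char *buf, size_t len, uint32_t offset)
{
	return snprintf(buf, len, "offset %llu", (unsigned long long)offset);
}
```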
     

23 Feb, 2022

1 commit

  • [ Upstream commit 67d6212afda218d564890d1674bab28e8612170f ]

    This reverts commit 774a1221e862b343388347bac9b318767336b20b.

    We need to finish all async code before the module init sequence is
    done. In the reverted commit the PF_USED_ASYNC flag was added to mark a
    thread that called async_schedule(). Then the PF_USED_ASYNC flag was
    used to determine whether or not async_synchronize_full() needs to be
    invoked. This works when the modprobe thread is calling async_schedule(),
    but it does not work if the module dispatches init code to a worker thread
    which then calls async_schedule().

    For example, PCI driver probing is invoked from a worker thread based on
    the node where the device is attached:

    if (cpu < nr_cpu_ids)
            error = work_on_cpu(cpu, local_pci_probe, &ddi);
    else
            error = local_pci_probe(&ddi);

    We end up in a situation where a worker thread gets the PF_USED_ASYNC
    flag set instead of the modprobe thread. As a result,
    async_synchronize_full() is not invoked and modprobe completes without
    waiting for the async code to finish.

    The issue was discovered while loading the pm80xx driver:
    (scsi_mod.scan=async)

    modprobe pm80xx                 worker
    ...
      do_init_module()
      ...
        pci_call_probe()
          work_on_cpu(local_pci_probe)
                                    local_pci_probe()
                                      pm8001_pci_probe()
                                        scsi_scan_host()
                                          async_schedule()
                                          worker->flags |= PF_USED_ASYNC;
                                    ...
          < return from worker >
      ...
      if (current->flags & PF_USED_ASYNC)
    Reviewed-by: Changyuan Lyu
    Reviewed-by: Luis Chamberlain
    Acked-by: Tejun Heo
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Igor Pylypiv
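
Editorial sketch for the revert above: the problem is that a per-task flag set on the worker is invisible to the thread that later tests it. The toy model below uses hypothetical names (demo_task, DEMO_USED_ASYNC, demo_async_schedule) and plain structs instead of real kernel tasks.

```c
#include <stdbool.h>

#define DEMO_USED_ASYNC 0x1u	/* stand-in for PF_USED_ASYNC */

struct demo_task {
	unsigned int flags;
};

/* In the reverted scheme, scheduling async work marked the calling task. */
static void demo_async_schedule(struct demo_task *caller)
{
	caller->flags |= DEMO_USED_ASYNC;
}

/* The loader only synchronized if its own task carried the flag. */
static bool demo_loader_would_sync(const struct demo_task *loader)
{
	return (loader->flags & DEMO_USED_ASYNC) != 0;
}
```

When only the worker calls demo_async_schedule(), demo_loader_would_sync() stays false for the modprobe task, which mirrors the missed async_synchronize_full() call described in the commit message.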
     

28 Sep, 2021

1 commit

  • When CONFIG_MODULE_UNLOAD is disabled, the module->exit member
    is not defined, causing a build failure:

    kernel/module.c:4493:8: error: no member named 'exit' in 'struct module'
    mod->exit = *exit;

    Add an #ifdef block around this.

    Fixes: cf68fffb66d6 ("add support for Clang CFI")
    Acked-by: Kees Cook
    Reviewed-by: Sami Tolvanen
    Reviewed-by: Miroslav Benes
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jessica Yu

    Arnd Bergmann
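
Editorial sketch for the build fix above: when a struct member exists only under a configuration option, every access to it needs the same guard. DEMO_MODULE_UNLOAD, struct demo_module and demo_set_exit() are hypothetical stand-ins, not kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* DEMO_MODULE_UNLOAD is a hypothetical stand-in for CONFIG_MODULE_UNLOAD. */
struct demo_module {
	void (*init)(void);
#ifdef DEMO_MODULE_UNLOAD
	void (*exit)(void);
#endif
};

/* Record the exit hook only when the member exists in this build. */
static bool demo_set_exit(struct demo_module *mod, void (*exit_fn)(void))
{
#ifdef DEMO_MODULE_UNLOAD
	mod->exit = exit_fn;
	return true;
#else
	(void)mod;
	(void)exit_fn;
	return false;
#endif
}
```

Building with -DDEMO_MODULE_UNLOAD enables the member and the assignment together, so the two can never disagree.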
     

19 Jul, 2021

1 commit

  • We have a number of systems industry-wide that have a subset of their
    functionality that works as follows:

    1. Receive a message from local kmsg, serial console, or netconsole;
    2. Apply a set of rules to classify the message;
    3. Do something based on this classification (like scheduling a
    remediation for the machine), rinse, and repeat.

    As a couple of examples of places we have this implemented just inside
    Facebook, although this isn't a Facebook-specific problem, we have this
    inside our netconsole processing (for alarm classification), and as part
    of our machine health checking. We use these messages to determine
    fairly important metrics around production health, and it's important
    that we get them right.

    While for some kinds of issues we have counters, tracepoints, or metrics
    with a stable interface which can reliably indicate the issue, in order
    to react to production issues quickly we need to work with the interface
    which most kernel developers naturally use when developing: printk.

    Most production issues come from unexpected phenomena, and as such
    usually the code in question doesn't have easily usable tracepoints or
    other counters available for the specific problem being mitigated. We
    have a number of lines of monitoring defence against problems in
    production (host metrics, process metrics, service metrics, etc), and
    where it's not feasible to reliably monitor at another level, this kind
    of pragmatic netconsole monitoring is essential.

    As one would expect, monitoring using printk is rather brittle for a
    number of reasons -- most notably that the message might disappear
    entirely in a new version of the kernel, or that the message may change
    in some way that the regex or other classification methods start to
    silently fail.

    One factor that makes this even harder is that, under normal operation,
    many of these messages are never expected to be hit. For example, there
    may be a rare hardware bug which one wants to detect if it was to ever
    happen again, but its recurrence is not likely or anticipated. This
    precludes using something like checking whether the printk in question
    was printed somewhere fleetwide recently to determine whether the
    message in question is still present or not, since we don't anticipate
    that it should be printed anywhere, but still need to monitor for its
    future presence in the long-term.

    This class of issue has happened on a number of occasions, causing
    unhealthy machines with hardware issues to remain in production for
    longer than ideal. As a recent example, some monitoring around
    blk_update_request fell out of date and caused semi-broken machines to
    remain in production for longer than would be desirable.

    Searching through the codebase to find the message is also extremely
    fragile, because many of the messages are further constructed beyond
    their callsite (eg. btrfs_printk and other module-specific wrappers,
    each with their own functionality). Even if they aren't, guessing the
    format and formulation of the underlying message based on the aesthetics
    of the message emitted is not a recipe for success at scale, and our
    previous issues with fleetwide machine health checking demonstrate as
    much.

    This provides a solution to the issue of silently changed or deleted
    printks: we record pointers to all printk format strings known at
    compile time into a new .printk_index section, both in vmlinux and
    modules. At runtime, this can then be iterated by looking at
    /printk/index/, which emits the following format, both
    readable by humans and able to be parsed by machines:

    $ head -1 vmlinux; shuf -n 5 vmlinux
    # filename:line function "format"
    block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
    kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
    arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
    init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
    drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"

    This mitigates the majority of cases where we have a highly-specific
    printk which we want to match on, as we can now enumerate and check
    whether the format changed or the printk callsite disappeared entirely
    in userspace. This allows us to catch changes to printks we monitor
    earlier and decide what to do about it before it becomes problematic.

    There is no additional runtime cost for printk callers or printk itself,
    and the assembly generated is exactly the same.

    Signed-off-by: Chris Down
    Cc: Petr Mladek
    Cc: Jessica Yu
    Cc: Sergey Senozhatsky
    Cc: John Ogness
    Cc: Steven Rostedt
    Cc: Greg Kroah-Hartman
    Cc: Johannes Weiner
    Cc: Kees Cook
    Reviewed-by: Petr Mladek
    Tested-by: Petr Mladek
    Reported-by: kernel test robot
    Acked-by: Andy Shevchenko
    Acked-by: Jessica Yu # for module.{c,h}
    Signed-off-by: Petr Mladek
    Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name

    Chris Down
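
Editorial sketch for the commit above: a monitoring tool could split each index entry into its fields before comparing formats across kernel versions. The parser below assumes exactly the layout shown in the sample output (filename:line function "format"). The struct and function names are hypothetical and the field size limits are arbitrary.

```c
#include <stdio.h>
#include <string.h>

struct index_entry {
	char file[128];
	unsigned int line;
	char function[64];
	char format[256];
};

/*
 * Parse one entry of the form: filename:line function "format".
 * Returns 0 on success, -1 on a malformed or comment line.
 */
static int parse_index_line(const char *text, struct index_entry *out)
{
	const char *open, *close;
	size_t len;

	if (text[0] == '#')
		return -1;
	if (sscanf(text, "%127[^:]:%u %63s", out->file, &out->line,
		   out->function) != 3)
		return -1;

	/* The format string is everything between the outermost quotes. */
	open = strchr(text, '"');
	close = strrchr(text, '"');
	if (!open || close == open)
		return -1;

	len = (size_t)(close - open - 1);
	if (len >= sizeof(out->format))
		return -1;
	memcpy(out->format, open + 1, len);
	out->format[len] = '\0';
	return 0;
}
```

The format text is kept verbatim, escape sequences included, so two kernels can be compared with a plain string comparison.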
     

09 Jul, 2021

1 commit

  • Let's make kernel stacktraces easier to identify by including the build
    ID[1] of a module if the stacktrace is printing a symbol from a module.
    This makes it simpler for developers to locate a kernel module's full
    debuginfo for a particular stacktrace. Combined with
    scripts/decode_stacktrace.sh, a developer can download the matching
    debuginfo from a debuginfod[2] server and find the exact file and line
    number for the functions plus offsets in a stacktrace that match the
    module. This is especially useful for pstore crash debugging where the
    kernel crashes are recorded in something like console-ramoops and the
    recovery kernel/modules are different or the debuginfo doesn't exist on
    the device due to space concerns (the debuginfo can be too large for space
    limited devices).

    Originally, I put this on the %pS format, but that was quickly rejected
    given that %pS is used in other places such as ftrace where build IDs
    aren't meaningful. There was some discussion on the list about putting every
    module build ID into the "Modules linked in:" section of the stacktrace
    message but that quickly becomes very hard to read once you have more than
    three or four modules linked in. It also provides too much information
    when we don't expect each module to be traversed in a stacktrace. Having
    the build ID for modules that aren't important just makes things messy.
    Splitting it to multiple lines for each module quickly explodes the number
    of lines printed in an oops too, possibly wrapping the warning off the
    console. And finally, trying to stash away each module used in a
    callstack to provide the ID of each symbol printed is cumbersome and would
    require changes to each architecture to stash away modules and return
    their build IDs once unwinding has completed.

    Instead, we opt for the simpler approach of introducing new printk formats
    '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb'
    for "pointer backtrace with module build ID" and then updating the few
    places in the architecture layer where the stacktrace is printed to use
    this new format.

    Before:

    Call trace:
    lkdtm_WARNING+0x28/0x30 [lkdtm]
    direct_entry+0x16c/0x1b4 [lkdtm]
    full_proxy_write+0x74/0xa4
    vfs_write+0xec/0x2e8

    After:

    Call trace:
    lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
    direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
    full_proxy_write+0x74/0xa4
    vfs_write+0xec/0x2e8

    [akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout]
    [rdunlap@infradead.org: fix build when CONFIG_MODULES is not set]
    Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org
    [akpm@linux-foundation.org: make kallsyms_lookup_buildid() static]
    [cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled]
    Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com

    Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org
    Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
    Link: https://sourceware.org/elfutils/Debuginfod.html [2]
    Signed-off-by: Stephen Boyd
    Signed-off-by: Bixuan Cui
    Signed-off-by: Randy Dunlap
    Cc: Jiri Olsa
    Cc: Alexei Starovoitov
    Cc: Jessica Yu
    Cc: Evan Green
    Cc: Hsin-Yi Wang
    Cc: Petr Mladek
    Cc: Steven Rostedt
    Cc: Sergey Senozhatsky
    Cc: Andy Shevchenko
    Cc: Rasmus Villemoes
    Cc: Matthew Wilcox
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dave Young
    Cc: Ingo Molnar
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
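
Editorial sketch for the commit above: the "After" trace appends a hex build ID to the module name inside the brackets. The user-space helper below reproduces that text layout for a 20-byte ID. format_module_tag() and DEMO_BUILD_ID_SIZE are hypothetical names, and the kernel's own formatting code is not shown here.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_BUILD_ID_SIZE 20	/* a SHA-1 build ID is 20 bytes */

/*
 * Write "[name hexid]" into buf. Returns 0 on success, -1 if buf is too
 * small.
 */
static int format_module_tag(char *buf, size_t len, const char *name,
			     const unsigned char id[DEMO_BUILD_ID_SIZE])
{
	int n = snprintf(buf, len, "[%s ", name);
	size_t pos;
	size_t i;

	if (n < 0 || (size_t)n >= len)
		return -1;
	pos = (size_t)n;

	for (i = 0; i < DEMO_BUILD_ID_SIZE; i++) {
		/* Two hex digits plus the terminating NUL must fit. */
		if (len - pos < 3)
			return -1;
		snprintf(buf + pos, len - pos, "%02x", id[i]);
		pos += 2;
	}
	if (len - pos < 2)
		return -1;
	buf[pos++] = ']';
	buf[pos] = '\0';
	return 0;
}
```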
     

08 Jul, 2021

1 commit


23 Jun, 2021

1 commit

  • Irrespective as to whether CONFIG_MODULE_SIG is configured, specifying
    "module.sig_enforce=1" on the boot command line sets "sig_enforce".
    Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.

    This patch makes the presence of /sys/module/module/parameters/sig_enforce
    dependent on CONFIG_MODULE_SIG=y.

    Fixes: fda784e50aac ("module: export module signature enforcement status")
    Reported-by: Nayna Jain
    Tested-by: Mimi Zohar
    Tested-by: Jessica Yu
    Signed-off-by: Mimi Zohar
    Signed-off-by: Jessica Yu
    Signed-off-by: Linus Torvalds

    Mimi Zohar
     

26 May, 2021

1 commit

  • Commit 013c1667cf78 ("kallsyms: refactor
    {,module_}kallsyms_on_each_symbol") replaced the return inside the
    nested loop with a break, changing the semantics of the function: the
    break only exits the innermost loop, so the code continues iterating the
    symbols of the next module instead of exiting.

    Fixes: 013c1667cf78 ("kallsyms: refactor {,module_}kallsyms_on_each_symbol")
    Reviewed-by: Petr Mladek
    Reviewed-by: Miroslav Benes
    Signed-off-by: Jon Mediero
    Signed-off-by: Jessica Yu

    Jon Mediero
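
Editorial sketch for the fix above: in a nested loop, break leaves only the inner loop, so stopping the whole walk needs a return (or an equivalent jump out of both loops). The types and names below are hypothetical and locking is left out; this is not the kernel function.

```c
#include <stddef.h>

struct demo_module {
	const int *symbols;
	size_t count;
};

/*
 * Call fn on every symbol of every module. A non-zero result stops the
 * whole walk: returning from inside the inner loop leaves both loops,
 * whereas a break there would only leave the inner one and the walk
 * would carry on with the next module.
 */
static int demo_on_each_symbol(const struct demo_module *mods, size_t nmods,
			       int (*fn)(int symbol, void *data), void *data)
{
	size_t m, s;

	for (m = 0; m < nmods; m++) {
		for (s = 0; s < mods[m].count; s++) {
			int ret = fn(mods[m].symbols[s], data);

			if (ret != 0)
				return ret;
		}
	}
	return 0;
}

/* Example callback: count visits and stop at the symbol value 2. */
static int demo_stop_at_two(int symbol, void *data)
{
	(*(int *)data)++;
	return symbol == 2 ? -1 : 0;
}
```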
     

17 May, 2021

1 commit

  • Previously, when CONFIG_MODULE_UNLOAD=n, the module loader did not
    attempt to load exit sections, since it never expected any code in those
    sections to execute. However, dynamic code patching (alternatives,
    jump_label and static_call) can have sites in __exit code, even if __exit is
    never executed. Therefore __exit must be present at runtime, at least for as
    long as __init code is.

    Commit 33121347fb1c ("module: treat exit sections the same as init
    sections when !CONFIG_MODULE_UNLOAD") solves the requirements of
    jump_labels and static_calls by putting the exit sections in the init
    region of the module so that they are at least present at init, and
    discarded afterwards. It does this by including a check for exit
    sections in module_init_section(), so that it also returns true for exit
    sections, and the module loader will automatically sort them in the init
    region of the module.

    However, the solution there was not completely arch-independent. ARM is
    a special case where it supplies its own module_{init, exit}_section()
    functions. Instead of pushing the exit section checks into
    module_init_section(), just implement the exit section check in
    layout_sections(), so that we don't have to touch arch-dependent code.

    Fixes: 33121347fb1c ("module: treat exit sections the same as init sections when !CONFIG_MODULE_UNLOAD")
    Reviewed-by: Russell King (Oracle)
    Signed-off-by: Jessica Yu

    Jessica Yu
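
Editorial sketch for the commit above: the placement decision it describes can be pictured as a name-prefix test made by the generic layout code, so architectures with their own section helpers are unaffected. The function names and the boolean parameter are hypothetical; the real layout_sections() works on ELF section headers, not bare names.

```c
#include <stdbool.h>
#include <string.h>

/* True when name begins with prefix. */
static bool demo_starts_with(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}

/*
 * Decide whether a section belongs in the init region. When unloading
 * is not supported (unload_enabled == false), exit sections are placed
 * there too, so they exist during init and are discarded afterwards.
 */
static bool demo_goes_in_init_region(const char *name, bool unload_enabled)
{
	if (demo_starts_with(name, ".init"))
		return true;
	return !unload_enabled && demo_starts_with(name, ".exit");
}
```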
     

14 May, 2021

1 commit

  • Fix the following coccinelle report:

    kernel/module.c:1018:2-5:
    WARNING: Use BUG_ON instead of if condition followed by BUG.

    BUG_ON uses unlikely in if(). Through disassembly, we can see that
    brk #0x800 is compiled to the end of the function.
    As you can see below:
    ......
    ffffff8008660bec: d65f03c0 ret
    ffffff8008660bf0: d4210000 brk #0x800

    Usually, the condition in if () is not satisfied, so in a
    multi-stage pipeline the brk instruction does not need to be
    fetched, decoded, or executed.

    In my opinion, this can improve the efficiency of the
    multi-stage pipeline.

    Signed-off-by: zhouchuangao
    Signed-off-by: Jessica Yu

    zhouchuangao
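
Editorial sketch for the commit above: the pattern it relies on is a branch hint that tells the compiler the failure path is rare, so it can be moved out of the hot path. The macros below are simplified user-space stand-ins built on the GCC/Clang __builtin_expect builtin, with abort() in place of BUG(). The DEMO_ names are hypothetical.

```c
#include <stdlib.h>

/* Hint that the condition is rarely true (GCC and Clang builtin). */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Abort when the condition holds; the abort path is hinted as rare. */
#define DEMO_BUG_ON(cond)			\
	do {					\
		if (demo_unlikely(cond))	\
			abort();		\
	} while (0)

/* Divide a by b, treating b == 0 as a programming error. */
static int demo_checked_div(int a, int b)
{
	DEMO_BUG_ON(b == 0);
	return a / b;
}
```

Whether the hint changes the generated code depends on the compiler and optimization level, so any performance effect should be measured, not assumed.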
     

01 May, 2021

1 commit

  • Pull module updates from Jessica Yu:
    "Fix an age old bug involving jump_labels and static_calls when
    CONFIG_MODULE_UNLOAD=n.

    When CONFIG_MODULE_UNLOAD=n, it means you can't unload modules, so
    normally the __exit sections of a module are not loaded at all.
    However, dynamic code patching (jump_label, static_call, alternatives)
    can have sites in __exit sections even if __exit is never executed.

    Reported by Peter Zijlstra:
    'Alternatives, jump_labels and static_call all can have relocations
    into __exit code. Not loading it at all would be BAD.'

    Therefore, load the __exit sections even when CONFIG_MODULE_UNLOAD=n,
    and discard them after init"

    * tag 'modules-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    module: treat exit sections the same as init sections when !CONFIG_MODULE_UNLOAD

    Linus Torvalds
     

09 Apr, 2021

1 commit

  • This change adds support for Clang’s forward-edge Control Flow
    Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
    injects a runtime check before each indirect function call to ensure
    the target is a valid function with the correct static type. This
    restricts possible call targets and makes it more difficult for
    an attacker to exploit bugs that allow the modification of stored
    function pointers. For more details, see:

    https://clang.llvm.org/docs/ControlFlowIntegrity.html

    Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
    visibility to possible call targets. Kernel modules are supported
    with Clang’s cross-DSO CFI mode, which allows checking between
    independently compiled components.

    With CFI enabled, the compiler injects a __cfi_check() function into
    the kernel and each module for validating local call targets. For
    cross-module calls that cannot be validated locally, the compiler
    calls the global __cfi_slowpath_diag() function, which determines
    the target module and calls the correct __cfi_check() function. This
    patch includes a slowpath implementation that uses __module_address()
    to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
    shadow map that speeds up module look-ups by ~3x.

    Clang implements indirect call checking using jump tables and
    offers two methods of generating them. With canonical jump tables,
    the compiler renames each address-taken function to .cfi
    and points the original symbol to a jump table entry, which passes
    __cfi_check() validation. This isn’t compatible with stand-alone
    assembly code, which the compiler doesn’t instrument, and would
    result in indirect calls to assembly code to fail. Therefore, we
    default to using non-canonical jump tables instead, where the compiler
    generates a local jump table entry .cfi_jt for each
    address-taken function, and replaces all references to the function
    with the address of the jump table entry.

    Note that because non-canonical jump table addresses are local
    to each component, they break cross-module function address
    equality. Specifically, the address of a global function will be
    different in each module, as it's replaced with the address of a local
    jump table entry. If this address is passed to a different module,
    it won’t match the address of the same function taken there. This
    may break code that relies on comparing addresses passed from other
    components.

    CFI checking can be disabled in a function with the __nocfi attribute.
    Additionally, CFI can be disabled for an entire compilation unit by
    filtering out CC_FLAGS_CFI.

    By default, CFI failures result in a kernel panic to stop a potential
    exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
    kernel prints out a rate-limited warning instead, and allows execution
    to continue. This option is helpful for locating type mismatches, but
    should only be enabled during development.

    Signed-off-by: Sami Tolvanen
    Reviewed-by: Kees Cook
    Tested-by: Nathan Chancellor
    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com

    Sami Tolvanen
     

29 Mar, 2021

1 commit

  • Dynamic code patching (alternatives, jump_label and static_call) can
    have sites in __exit code, even if __exit is never executed. Therefore
    __exit must be present at runtime, at least for as long as __init code
    is.

    Additionally, for jump_label and static_call, the __exit sites must also
    identify as within_module_init(), such that the infrastructure is aware
    to never touch them after module init -- alternatives are only run once
    at init and hence don't have this particular constraint.

    By making __exit identify as __init for !MODULE_UNLOAD, the above is
    satisfied.

    So, when !CONFIG_MODULE_UNLOAD, the section ordering should look like the
    following, with the .exit sections moved to the init region of the module.

    Core section allocation order:
      .text
      .rodata
      __ksymtab_gpl
      __ksymtab_strings
      .note.* sections
      .bss
      .data
      .gnu.linkonce.this_module

    Init section allocation order:
      .init.text
      .exit.text
      .symtab
      .strtab

    [jeyu: thanks to Peter Zijlstra for most of changelog]

    Link: https://lore.kernel.org/lkml/YFiuphGw0RKehWsQ@gunter/
    Link: https://lore.kernel.org/r/20210323142756.11443-1-jeyu@kernel.org
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Jessica Yu

    Jessica Yu
     

10 Feb, 2021

1 commit


08 Feb, 2021

11 commits


19 Jan, 2021

1 commit

  • 5fdc7db644 ("module: setup load info before module_sig_check()")
    moved the ELF setup, so that it was done before the signature
    check. This made the module name available to signature error
    messages.

    However, the checks for ELF correctness in setup_load_info
    are not sufficient to prevent bad memory references due to
    corrupted offset fields, indices, etc.

    So, there's a regression in behavior here: a corrupt and unsigned
    (or badly signed) module, which might previously have been rejected
    immediately, can now cause an oops/crash.

    Harden ELF handling for module loading by doing the following:

    - Move the signature check back up so that it comes before ELF
    initialization. It's best to do the signature check to see
    if we can trust the module, before using the ELF structures
    inside it. This also makes checks against info->len
    more accurate again, as this field will be reduced by the
    length of the signature in mod_check_sig().

    The module name is now once again not available for error
    messages during the signature check, but that seems like
    a fair tradeoff.

    - Check if sections have offset / size fields that at least don't
    exceed the length of the module.

    - Check if sections have section name offsets that don't fall
    outside the section name table.

    - Add a few other sanity checks against invalid section indices,
    etc.

    This is not an exhaustive consistency check, but the idea is to
    at least get through the signature and blacklist checks without
    crashing because of corrupted ELF info, and to error out gracefully
    for most issues that would have caused problems later on.

    Fixes: 5fdc7db6448a ("module: setup load info before module_sig_check()")
    Signed-off-by: Frank van der Linden
    Signed-off-by: Jessica Yu

    Frank van der Linden
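
Editorial sketch for the commit above: the offset/size check in its list is easy to get wrong if offset + size is computed directly, because the sum can wrap around. One overflow-safe way to write such a check is shown below. demo_section_in_bounds() is a hypothetical helper, not the code added by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when [offset, offset + size) lies within a file of file_len
 * bytes. Written without computing offset + size, so the check cannot
 * be defeated by integer wrap-around.
 */
static bool demo_section_in_bounds(uint64_t offset, uint64_t size,
				   uint64_t file_len)
{
	if (offset > file_len)
		return false;
	return size <= file_len - offset;
}
```

The same shape works for checking that a section name offset falls inside the section name table: pass the name offset, a size of 1, and the table length.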
     

18 Jan, 2021

1 commit

  • clang-12 -fno-pic (since
    https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de0083333232da3f1d6)
    can emit `call __stack_chk_fail@PLT` instead of `call __stack_chk_fail`
    on x86. The two forms should have identical behaviors on x86-64 but the
    former causes GNU as
    Link: https://github.com/ClangBuiltLinux/linux/issues/1250
    Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27178
    Reported-by: Marco Elver
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Nathan Chancellor
    Tested-by: Marco Elver
    Signed-off-by: Fangrui Song
    Signed-off-by: Jessica Yu

    Fangrui Song
     

18 Dec, 2020

1 commit

  • Pull modules updates from Jessica Yu:
    "Summary of modules changes for the 5.11 merge window:

    - Fix a race condition between systemd/udev and the module loader.

    The module loader was sending a uevent before the module was fully
    initialized (i.e., before its init function has been called). This
    means udev can start processing the module uevent before the module
    has finished initializing, and some udev rules expect that the
    module has initialized already upon receiving the uevent.

    This resulted in some systemd mount units failing if udev processes
    the event faster than the module can finish init. This is fixed by
    delaying the uevent until after the module has called its init
    routine.

    - Make the linker array sections for kernel params and module version
    attributes more robust by switching to use the alignment of the
    type in question.

    Namely, linker section arrays will be constructed using the
    alignment required by the struct (using __alignof__()) as opposed
    to a specific value such as sizeof(void *) or sizeof(long). This is
    less likely to cause breakages should the size of the type ever
    change (Johan Hovold)

    - Fix module state inconsistency by setting it back to GOING when a
    module fails to load and is on its way out (Miroslav Benes)

    - Some comment and code cleanups (Sergey Shtylyov)"

    * tag 'modules-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    module: delay kobject uevent until after module init call
    module: drop semicolon from version macro
    init: use type alignment for kernel parameters
    params: clean up module-param macros
    params: use type alignment for kernel parameters
    params: drop redundant "unused" attributes
    module: simplify version-attribute handling
    module: drop version-attribute alignment
    module: fix comment style
    module: add more 'kernel-doc' comments
    module: fix up 'kernel-doc' comments
    module: only handle errors with the *switch* statement in module_sig_check()
    module: avoid *goto*s in module_sig_check()
    module: merge repetitive strings in module_sig_check()
    module: set MODULE_STATE_GOING state when a module fails to load

    Linus Torvalds
     

09 Dec, 2020

1 commit

  • Apparently there has been a longstanding race between udev/systemd and
    the module loader. Currently, the module loader sends a uevent right
    after sysfs initialization, but before the module calls its init
    function. However, some udev rules expect that the module has
    initialized already upon receiving the uevent.

    This race has been triggered recently (see link in references) in some
    systemd mount unit files. For instance, the configfs module creates the
    /sys/kernel/config mount point in its init function, however the module
    loader issues the uevent before this happens. sys-kernel-config.mount
    expects to be able to mount /sys/kernel/config upon receipt of the
    module loading uevent, but if the configfs module has not called its
    init function yet, then this directory will not exist and the mount unit
    fails. A similar situation exists for sys-fs-fuse-connections.mount, as
    the fuse sysfs mount point is created during the fuse module's init
    function. If udev is faster than module initialization then the mount
    unit would fail in a similar fashion.

    To fix this race, delay the module KOBJ_ADD uevent until after the
    module has finished calling its init routine.

    References: https://github.com/systemd/systemd/issues/17586
    Reviewed-by: Greg Kroah-Hartman
    Tested-By: Nicolas Morey-Chaisemartin
    Signed-off-by: Jessica Yu

    Jessica Yu
     

04 Dec, 2020

1 commit

  • Having the real btf_data_size stored in struct module is beneficial to quickly
    determine which kernel modules have an associated BTF object and which don't.
    There is no harm in keeping this info, as opposed to keeping an invalid pointer.

    Fixes: 607c543f939d ("bpf: Sanitize BTF data pointer after module is loaded")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201203204634.1325171-3-andrii@kernel.org

    Andrii Nakryiko
     

25 Nov, 2020

1 commit

  • Given that the .BTF section is not allocatable, it will get trimmed after the
    module is loaded. The BPF system handles that properly by creating an independent
    copy of the data. But prevent any accidental misuse by resetting the pointer to BTF data.

    Fixes: 36e68442d1af ("bpf: Load and verify kernel module BTFs")
    Suggested-by: Jessica Yu
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann
    Acked-by: Jessica Yu
    Cc: Greg Kroah-Hartman
    Link: https://lore.kernel.org/bpf/20201121070829.2612884-2-andrii@kernel.org

    Andrii Nakryiko
     

11 Nov, 2020

1 commit

  • Add kernel module listener that will load/validate and unload module BTF.
    Module BTFs get an ID generated for them, which makes it possible to iterate
    them with existing BTF iteration API. They are given their respective module's
    names, which will get reported through GET_OBJ_INFO API. They are also marked
    as in-kernel BTFs for tooling to distinguish them from user-provided BTFs.

    Also, similarly to vmlinux BTF, kernel module BTFs are exposed through
    sysfs as /sys/kernel/btf/. This is convenient for user-space
    tools to inspect module BTF contents and dump their types with existing tools:

    [vmuser@archvm bpf]$ ls -la /sys/kernel/btf
    total 0
    drwxr-xr-x 2 root root 0 Nov 4 19:46 .
    drwxr-xr-x 13 root root 0 Nov 4 19:46 ..

    ...

    -r--r--r-- 1 root root 888 Nov 4 19:46 irqbypass
    -r--r--r-- 1 root root 100225 Nov 4 19:46 kvm
    -r--r--r-- 1 root root 35401 Nov 4 19:46 kvm_intel
    -r--r--r-- 1 root root 120 Nov 4 19:46 pcspkr
    -r--r--r-- 1 root root 399 Nov 4 19:46 serio_raw
    -r--r--r-- 1 root root 4094095 Nov 4 19:46 vmlinux

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Reviewed-by: Greg Kroah-Hartman
    Link: https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org

    Andrii Nakryiko
     

09 Nov, 2020

3 commits

  • Many comments in this module do not comply with the preferred multi-line
    comment style as reported by 'scripts/checkpatch.pl':

    WARNING: Block comments use * on subsequent lines
    WARNING: Block comments use a trailing */ on a separate line

    Fix those comments, along with (unreported for some reason?) the starts
    of the multi-line comments not being /* on their own line...

    Signed-off-by: Sergey Shtylyov
    Signed-off-by: Jessica Yu

    Sergey Shtylyov
     
  • Some functions have the proper 'kernel-doc' comments but these don't start
    with proper /** -- fix that, along with adding () to the function name on
    the following lines to fully comply with the 'kernel-doc' format.

    Signed-off-by: Sergey Shtylyov
    Signed-off-by: Jessica Yu

    Sergey Shtylyov
     
  • Some 'kernel-doc' function comments do not fully comply with the specified
    format due to:

    - missing () after the function name;

    - "RETURNS:"/"Returns:" instead of "Return:" when documenting the function's
    result.

    - empty line before describing the function's arguments.

    Signed-off-by: Sergey Shtylyov
    Signed-off-by: Jessica Yu

    Sergey Shtylyov
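
Editorial sketch for the three comment-style commits above: a function comment that follows the points they list (opening line of only /**, "name() - description", no blank line before the arguments, "Return:" for the result) and a block comment in the preferred multi-line style. demo_count_set_bits() is a hypothetical example, not a function from this file.

```c
/**
 * demo_count_set_bits() - count the bits set in a word
 * @word: value to inspect
 *
 * Illustrative helper only; it is not a function from kernel/module.c.
 *
 * Return: the number of bits set in @word.
 */
static unsigned int demo_count_set_bits(unsigned long word)
{
	unsigned int count = 0;

	/*
	 * Clear the lowest set bit on each pass, so the loop runs once
	 * per set bit rather than once per bit position.
	 */
	while (word) {
		word &= word - 1;
		count++;
	}
	return count;
}
```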
     

04 Nov, 2020

3 commits

  • Let's handle the successful call of mod_verify_sig() right after that call,
    making the *switch* statement only handle the real errors, and then move
    the comment from the first *case* before *switch* itself and the comment
    before *default* after it. Fix the comment style, add article/comma/dash,
    spell out "nomem" as "lack of memory" in these comments, while at it...

    Suggested-by: Joe Perches
    Reviewed-by: Miroslav Benes
    Signed-off-by: Sergey Shtylyov
    Signed-off-by: Jessica Yu

    Sergey Shtylyov
     
  • Let's move the common handling of the non-fatal errors after the *switch*
    statement -- this avoids *goto*s inside that *switch*...

    Suggested-by: Joe Perches
    Reviewed-by: Miroslav Benes
    Signed-off-by: Sergey Shtylyov
    Signed-off-by: Jessica Yu

    Sergey Shtylyov
     
  • The 'reason' variable in module_sig_check() points to 3 strings across
    the *switch* statement, all needlessly starting with the same text.
    Let's put the starting text into the pr_notice() call -- it saves 21
    bytes of the object code (x86 gcc 10.2.1).

    Suggested-by: Joe Perches
    Reviewed-by: Miroslav Benes
    Signed-off-by: Sergey Shtylyov
    Signed-off-by: Jessica Yu

    Sergey Shtylyov
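
Editorial sketch for the three module_sig_check() commits above: success is handled before the switch, the switch only classifies errors, the shared handling of non-fatal errors follows the switch without any goto, and the common message prefix lives in one format string. The result codes, message wording and function name are hypothetical stand-ins, and snprintf() replaces pr_notice().

```c
#include <stdio.h>

/* Hypothetical result codes standing in for the kernel's errno values. */
enum demo_sig_result {
	DEMO_SIG_OK = 0,
	DEMO_SIG_UNSIGNED,
	DEMO_SIG_BAD_CRYPTO,
	DEMO_SIG_NO_KEY,
	DEMO_SIG_FATAL,
};

/*
 * Returns 0 when loading may proceed and -1 when it is rejected.
 * msg receives the notice text, or an empty string if none is printed.
 */
static int demo_sig_check(enum demo_sig_result res, int enforce,
			  char *msg, size_t len)
{
	const char *reason;

	if (len)
		msg[0] = '\0';

	/* Success is handled right away, outside the switch. */
	if (res == DEMO_SIG_OK)
		return 0;

	/* The switch only classifies real errors. */
	switch (res) {
	case DEMO_SIG_UNSIGNED:
		reason = "unsigned module";
		break;
	case DEMO_SIG_BAD_CRYPTO:
		reason = "module with unsupported crypto";
		break;
	case DEMO_SIG_NO_KEY:
		reason = "module with unavailable key";
		break;
	default:
		return -1;	/* anything else is fatal */
	}

	/* Common handling of the non-fatal errors, with the shared prefix. */
	if (enforce) {
		snprintf(msg, len, "Loading of %s is rejected", reason);
		return -1;
	}
	return 0;
}
```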
     

29 Oct, 2020

1 commit