27 Feb, 2020

1 commit

  • Pull tracing and bootconfig updates:
    "Fixes and changes to bootconfig before it goes live in a release.

    Change in API of bootconfig (before it comes live in a release):
    - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig
    exists
    - Set CONFIG_BOOT_CONFIG to 'n' by default
    - Show error if "bootconfig" on cmdline but not compiled in
    - Prevent redefining the same value
    - Have a way to append values
    - Added a SELECT BLK_DEV_INITRD to fix a build failure

    Synthetic event fixes:
    - Switch to raw_smp_processor_id() for recording CPU value in preempt
    section. (No care for what the value actually is)
    - Fix samples always recording u64 values
    - Fix endianness
    - Check number of values matches number of fields
    - Fix a printing bug

    Fix of trace_printk() breaking postponed start up tests

    Make a function static that is only used in a single file"

    * tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
    bootconfig: Add append value operator support
    bootconfig: Prohibit re-defining value on same key
    bootconfig: Print array as multiple commands for legacy command line
    bootconfig: Reject subkey and value on same parent key
    tools/bootconfig: Remove unneeded error message silencer
    bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
    bootconfig: Set CONFIG_BOOT_CONFIG=n by default
    tracing: Clear trace_state when starting trace
    bootconfig: Mark boot_config_checksum() static
    tracing: Disable trace_printk() on post poned tests
    tracing: Have synthetic event test use raw_smp_processor_id()
    tracing: Fix number printing bug in print_synth_event()
    tracing: Check that number of vals matches number of synth event fields
    tracing: Make synth_event trace functions endian-correct
    tracing: Make sure synth_event_trace() example always uses u64

    Linus Torvalds
     

21 Feb, 2020

4 commits

  • Print array values as multiple instances of the same option on
    the legacy kernel command line. With this rule, "kernel.*" and
    "init.*" array entries in bootconfig are printed out as
    repeated options, e.g.

    kernel {
    console = "ttyS0,115200"
    console += "tty0"
    }

    will be correctly converted to

    console="ttyS0,115200" console="tty0"

    in the kernel command line.
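
    As a rough illustration of the rule above, the following Python sketch
    renders a key that holds several values as repeated options. The function
    name and the dictionary form of the input are hypothetical and are not
    taken from the kernel sources.

```python
def render_legacy_cmdline(entries):
    """Render {key: [values]} as space-separated key="value" options.

    A key that holds several values is emitted once per value.
    """
    parts = []
    for key, values in entries.items():
        for value in values:
            parts.append(f'{key}="{value}"')
    return " ".join(parts)


# Hypothetical in-memory form of the "kernel { console ... }" example above.
kernel_entries = {"console": ["ttyS0,115200", "tty0"]}
print(render_legacy_cmdline(kernel_entries))
```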

    Link: http://lkml.kernel.org/r/158220118213.26565.8163300497009463916.stgit@devnote2

    Reported-by: Borislav Petkov
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add a bootconfig magic word to the end of the bootconfig on the
    initrd image to indicate explicitly that a bootconfig is there.
    tools/bootconfig now also treats a wrong size, a wrong checksum,
    or a parse error as an error, because if the bootconfig magic
    word is present, there must be a bootconfig.

    The bootconfig magic word is "#BOOTCONFIG\n", a 12-byte word.
    Thus the block image of the initrd file with bootconfig is
    as follows.

    [Initrd][bootconfig][size][csum][#BOOTCONFIG\n]
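
    The layout above can be sketched in userspace Python as follows. This is
    only an illustration under stated assumptions: the u32 fields are packed
    little-endian here (the byte order is not given in this message), the
    checksum is taken to be a plain sum of the bootconfig bytes, and both
    function names are hypothetical.

```python
import struct

BOOTCONFIG_MAGIC = b"#BOOTCONFIG\n"  # 12 bytes, as stated in the message above


def append_bootconfig(initrd: bytes, bootconfig: bytes) -> bytes:
    """Build [initrd][bootconfig][size][csum][magic]."""
    size = len(bootconfig)
    csum = sum(bootconfig) & 0xFFFFFFFF  # assumption: simple byte sum, kept to 32 bits
    footer = struct.pack("<II", size, csum)  # assumption: little-endian u32 fields
    return initrd + bootconfig + footer + BOOTCONFIG_MAGIC


def extract_bootconfig(image: bytes):
    """Return the bootconfig bytes, or None when no magic word is present."""
    if not image.endswith(BOOTCONFIG_MAGIC):
        return None
    end = len(image) - len(BOOTCONFIG_MAGIC) - 8  # start of the size/csum footer
    if end < 0:
        raise ValueError("image too small for size/checksum footer")
    size, csum = struct.unpack("<II", image[end:end + 8])
    if size > end:
        raise ValueError("bootconfig size exceeds image")
    data = image[end - size:end]
    if sum(data) & 0xFFFFFFFF != csum:
        raise ValueError("bootconfig checksum mismatch")
    return data
```

    Once the magic word is found, a bad size or checksum is reported as an
    error rather than ignored, which mirrors the reasoning given above.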

    Link: http://lkml.kernel.org/r/158220112263.26565.3944814205960612841.stgit@devnote2

    Suggested-by: Steven Rostedt
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Set CONFIG_BOOT_CONFIG=n by default. This also warns the
    user if CONFIG_BOOT_CONFIG=n but "bootconfig" is given
    on the kernel command line.

    Link: http://lkml.kernel.org/r/158220111291.26565.9036889083940367969.stgit@devnote2

    Suggested-by: Steven Rostedt
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • In fact, this function is only used in this file, so mark it with 'static'.

    Link: http://lkml.kernel.org/r/1581852511-14163-1-git-send-email-hqjagain@gmail.com

    Acked-by: Masami Hiramatsu
    Signed-off-by: Qiujun Huang
    Signed-off-by: Steven Rostedt (VMware)

    Qiujun Huang
     

12 Feb, 2020

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "Various fixes:

    - Fix an uninitialized variable

    - Fix compile bug to bootconfig userspace tool (in tools directory)

    - Suppress some error messages of bootconfig userspace tool

    - Remove unneeded CONFIG_LIBXBC from bootconfig

    - Allocate bootconfig xbc_nodes dynamically. To ease complaints about
    taking up static memory at boot up

    - Use parse_args() to find "bootconfig" instead of strstr(). This
    prevents false matches when the string appears inside double quotes

    - Fix missing ring_buffer_nest_end() on synthetic event error path

    - Return zero not -EINVAL on soft disabled synthetic event (soft
    disabling must be the same as hard disabling, which returns zero)

    - Consolidate synthetic event code (remove duplicate code)"

    * tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Consolidate trace() functions
    tracing: Don't return -EINVAL when tracing soft disabled synth events
    tracing: Add missing nest end to synth_event_trace_start() error case
    tools/bootconfig: Suppress non-error messages
    bootconfig: Allocate xbc_nodes array dynamically
    bootconfig: Use parse_args() to find bootconfig and '--'
    tracing/kprobe: Fix uninitialized variable bug
    bootconfig: Remove unneeded CONFIG_LIBXBC
    tools/bootconfig: Fix wrong __VA_ARGS__ usage

    Linus Torvalds
     

11 Feb, 2020

1 commit

  • The current implementation does a naive search for "bootconfig" on the kernel
    command line. But this could find a "bootconfig" that is part of another
    option in quotes (although highly unlikely). It also needs to find '--'
    on the kernel command line to know whether it should append a '--' when a
    bootconfig in the initrd file has an "init" section. That check uses a
    naive strstr() to see if '--' exists. But this can return a false
    positive if '--' appears inside an option, and then the "init" section in the
    initrd will not be appended properly.

    Using parse_args() to find both of these will solve both of these problems.
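
    The difference between the two approaches can be sketched in Python. This
    is not the kernel's parse_args(); it is a simplified, hypothetical
    tokenizer that keeps double-quoted text inside a single token and only
    accepts whole-token matches.

```python
def split_params(cmdline):
    """Split on whitespace, keeping double-quoted text inside one token."""
    tokens, current, in_quotes = [], [], False
    for ch in cmdline:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch.isspace() and not in_quotes:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def scan_cmdline(cmdline):
    """Return (has_bootconfig, has_double_dash) using whole-token matches."""
    has_bootconfig = has_double_dash = False
    for token in split_params(cmdline):
        if token == "--":
            has_double_dash = True
            break  # everything after "--" belongs to init
        if token == "bootconfig":
            has_bootconfig = True
    return has_bootconfig, has_double_dash


cmdline = 'root=/dev/sda1 foo="bar bootconfig -- baz" quiet'
print("bootconfig" in cmdline)  # naive substring search reports a match
print(scan_cmdline(cmdline))    # token-based scan finds neither option
```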

    Link: https://lore.kernel.org/r/202002070954.C18E7F58B@keescook

    Fixes: 7495e0926fdf3 ("bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline")
    Fixes: 1319916209ce8 ("bootconfig: init: Allow admin to use bootconfig for init command line")
    Reported-by: Kees Cook
    Reviewed-by: Kees Cook
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

06 Feb, 2020

3 commits

  • Pull tracing updates from Steven Rostedt:

    - Added new "bootconfig".

    This looks for a file appended to initrd to add boot config options,
    and has been discussed thoroughly at Linux Plumbers.

    Very useful for adding kprobes at bootup.

    Only enabled if "bootconfig" is on the real kernel command line.

    - Created dynamic event creation.

    Merges common code between creating synthetic events and kprobe
    events.

    - Rename perf "ring_buffer" structure to "perf_buffer"

    - Rename ftrace "ring_buffer" structure to "trace_buffer"

    Had to rename existing "trace_buffer" to "array_buffer"

    - Allow trace_printk() to work within (some) tracing code.

    - Sort of tracing configs to be a little better organized

    - Fixed bug where ftrace_graph hash was not being protected properly

    - Various other small fixes and clean ups

    * tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
    bootconfig: Show the number of nodes on boot message
    tools/bootconfig: Show the number of bootconfig nodes
    bootconfig: Add more parse error messages
    bootconfig: Use bootconfig instead of boot config
    ftrace: Protect ftrace_graph_hash with ftrace_sync
    ftrace: Add comment to why rcu_dereference_sched() is open coded
    tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
    tracing: Annotate ftrace_graph_hash pointer with __rcu
    bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
    tracing: Use seq_buf for building dynevent_cmd string
    tracing: Remove useless code in dynevent_arg_pair_add()
    tracing: Remove check_arg() callbacks from dynevent args
    tracing: Consolidate some synth_event_trace code
    tracing: Fix now invalid var_ref_vals assumption in trace action
    tracing: Change trace_boot to use synth_event interface
    tracing: Move tracing selftests to bottom of menu
    tracing: Move mmio tracer config up with the other tracers
    tracing: Move tracing test module configs together
    tracing: Move all function tracing configs together
    tracing: Documentation for in-kernel synthetic event API
    ...

    Linus Torvalds
     
  • Show the number of bootconfig nodes on boot message.

    Link: http://lkml.kernel.org/r/158091062297.27924.9051634676068550285.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Use "bootconfig" (1 word) instead of "boot config" (2 words)
    in the boot message.

    Link: http://lkml.kernel.org/r/158091059459.27924.14414336187441539879.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

05 Feb, 2020

1 commit

  • As the bootconfig is appended to the initrd it is not as easy to modify as
    the kernel command line. If there's some issue with the kernel, and the
    developer wants to boot a pristine kernel, it should not be needed to modify
    the initrd to remove the bootconfig for a single boot.

    As bootconfig is silently added (if the admin does not know where to look
    they may not know it's being loaded), it should be explicitly added to the
    kernel cmdline. The loading of the bootconfig is only done if "bootconfig"
    is on the kernel command line. This will let admins know that the kernel
    command line is extended.

    Note, adding printk()s for when the size is too great or the checksum
    is wrong exposed that the current method always looked for the boot config,
    and if the size and checksum matched, it would parse it (if either is
    wrong, a printk has been added to show this). It's better to only check this
    if the boot config is asked to be looked for.

    Link: https://lore.kernel.org/r/CAHk-=wjfjO+h6bQzrTf=YCZA53Y3EDyAs3Z4gEsT7icA3u_Psw@mail.gmail.com

    Acked-by: Masami Hiramatsu
    Suggested-by: Linus Torvalds
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

01 Feb, 2020

4 commits

  • This message leads to thinking that memory protection is not implemented
    for the said architecture, whereas absence of CONFIG_STRICT_KERNEL_RWX
    only means that memory protection has not been selected at compile time.

    Don't print this message when CONFIG_ARCH_HAS_STRICT_KERNEL_RWX is
    selected by the architecture. Instead, print "Kernel memory protection
    not selected by kernel config."

    Link: http://lkml.kernel.org/r/62477e446d9685459d4f27d193af6ff1bd69d55f.1578557581.git.christophe.leroy@c-s.fr
    Signed-off-by: Christophe Leroy
    Acked-by: Kees Cook
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christophe Leroy
     
  • Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2.

    unknown_bootoption passes unrecognized command line arguments to init as
    either environment variables or arguments. Some of the logic in the
    function is broken for quoted command line arguments.

    When an argument of the form param="value" is processed by parse_args
    and passed to unknown_bootoption, the command line has

    param\0"value\0

    with val pointing to the beginning of value. The helper function
    repair_env_string is then used to restore the '=' character that was
    removed by parse_args, and strip the quotes off fully. This results in

    param=value\0\0

    and val ends up pointing to the 'a' instead of the 'v' in value. This
    bug was introduced when repair_env_string was refactored into a separate
    function, and the decrement of val in repair_env_string became dead
    code.

    This causes two problems in unknown_bootoption in the two places where
    the val pointer is used as a substitute for the length of param:

    1. An argument of the form param=".value" is misinterpreted as a
    potential module parameter, with the result that it will not be
    placed in init's environment.

    2. An argument of the form param="value" is checked to see if param is
    an existing environment variable that should be overwritten, but the
    comparison is off-by-one and compares 'param=v' instead of 'param='
    against the existing environment. So passing, for example,
    TERM="vt100" on the command line results in init being passed both
    TERM=linux and TERM=vt100 in its environment.
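
    The second problem can be reproduced with a small Python model of the
    prefix comparison. This is a sketch, not the code in init/main.c: set_env
    and its prefix_len argument are hypothetical stand-ins for the
    strncmp()-style check that uses val - param as the length.

```python
def set_env(envp, param, value, prefix_len):
    """Replace the entry whose first prefix_len characters match, else append."""
    entry = f"{param}={value}"
    for i, existing in enumerate(envp):
        if existing[:prefix_len] == entry[:prefix_len]:
            envp[i] = entry
            return envp
    envp.append(entry)
    return envp


param, value = "TERM", "vt100"

# Correct prefix "TERM=": the length of param plus the '='.
fixed = set_env(["HOME=/", "TERM=linux"], param, value, len(param) + 1)

# Off-by-one prefix "TERM=v": it never matches "TERM=linux".
buggy = set_env(["HOME=/", "TERM=linux"], param, value, len(param) + 2)

print(fixed)  # TERM is overwritten
print(buggy)  # TERM appears twice
```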

    Patch 1 adds logging for the arguments and environment passed to init
    and is independent of the rest: it can be dropped if this is
    unnecessarily verbose.

    Patch 2 removes repair_env_string from initcall parameter parsing in
    do_initcall_level, as that uses a separate copy of the command line now
    and the repairing is no longer necessary.

    Patch 3 fixes the bug in unknown_bootoption by recording the length of
    param explicitly instead of implying it from val-param.

    This patch (of 3):

    Commit a99cd1125189 ("init: fix bug where environment vars can't be
    passed via boot args") introduced two minor bugs in unknown_bootoption
    by factoring out the quoted value handling into a separate function.

    When value is quoted, repair_env_string will move the value up 1 byte to
    strip the quotes, so val in unknown_bootoption no longer points to the
    actual location of the value.

    The result is that an argument of the form param=".value" is mistakenly
    treated as a potential module parameter and is not placed in init's
    environment, and an argument of the form param="value" can result in a
    duplicate environment variable: eg TERM="vt100" on the command line will
    result in both TERM=linux and TERM=vt100 being placed into init's
    environment.

    Fix this by recording the length of the param before calling
    repair_env_string instead of relying on val.

    Link: http://lkml.kernel.org/r/20191212180023.24339-4-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Chris Metcalf
    Cc: Krzysztof Mazur
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     
  • Since commit 08746a65c296 ("init: fix in-place parameter modification
    regression"), parse_args in do_initcall_level is called on a copy of
    saved_command_line. It is unnecessary to call repair_env_string during
    this parsing, as this copy is not used for anything later.

    Remove the now unnecessary arguments from repair_env_string as well.

    Link: http://lkml.kernel.org/r/20191212180023.24339-3-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Krzysztof Mazur
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     
  • Extend logging in `run_init_process` to also show the arguments and
    environment that we are passing to init.

    Link: http://lkml.kernel.org/r/20191212180023.24339-2-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Chris Metcalf
    Cc: Krzysztof Mazur
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     

30 Jan, 2020

1 commit

  • Pull driver core updates from Greg KH:
    "Here is a small set of changes for 5.6-rc1 for the driver core and
    some firmware subsystem changes.

    Included in here are:
    - device.h splitup like you asked for months ago
    - devtmpfs minor cleanups
    - firmware core minor changes
    - debugfs fix for lockdown mode
    - kernfs cleanup fix
    - cpu topology minor fix

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (22 commits)
    firmware: Rename FW_OPT_NOFALLBACK to FW_OPT_NOFALLBACK_SYSFS
    devtmpfs: factor out common tail of devtmpfs_{create,delete}_node
    devtmpfs: initify a bit
    devtmpfs: simplify initialization of mount_dev
    devtmpfs: factor out setup part of devtmpfsd()
    devtmpfs: fix theoretical stale pointer deref in devtmpfsd()
    driver core: platform: fix u32 greater or equal to zero comparison
    cpu-topology: Don't error on more than CONFIG_NR_CPUS CPUs in device tree
    debugfs: Return -EPERM when locked down
    driver core: Print device when resources present in really_probe()
    driver core: Fix test_async_driver_probe if NUMA is disabled
    driver core: platform: Prevent resouce overflow from causing infinite loops
    fs/kernfs/dir.c: Clean code by removing always true condition
    component: do not dereference opaque pointer in debugfs
    drivers/component: remove modular code
    debugfs: Fix warnings when building documentation
    device.h: move 'struct driver' stuff out to device/driver.h
    device.h: move 'struct class' stuff out to device/class.h
    device.h: move 'struct bus' stuff out to device/bus.h
    device.h: move dev_printk()-like functions to dev_printk.h
    ...

    Linus Torvalds
     

14 Jan, 2020

5 commits

  • Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable
    debugging") has introduced a static key to reduce overhead when
    debug_pagealloc is compiled in but not enabled. It relied on the
    assumption that jump_label_init() is called before parse_early_param()
    as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
    it is safe to enable the static key.

    However, it turns out multiple architectures call parse_early_param()
    earlier from their setup_arch(). x86 also calls jump_label_init() even
    earlier, so no issue was found while testing the commit, but same is not
    true for e.g. ppc64 and s390 where the kernel would not boot with
    debug_pagealloc=on as found by our QA.

    To fix this without tricky changes to init code of multiple
    architectures, this patch partially reverts the static key conversion
    from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch
    code) of debug_pagealloc_enabled() will again test a simple bool
    variable. Fastpath mm code is converted to a new
    debug_pagealloc_enabled_static() variant that relies on the static key,
    which is enabled in a well-defined point in mm_init() where it's
    guaranteed that jump_label_init() has been called, regardless of
    architecture.

    [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
    Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
    Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Stephen Rothwell
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Qian Cai
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Since the current kernel command line is too short to describe
    many long options for init (e.g. systemd command line options),
    this allows the admin to use boot config for the init command line.

    All init command line options under "init." keywords will be passed to
    init.

    For example,

    init.systemd {
    unified_cgroup_hierarchy = 1
    debug_shell
    default_timeout_start_sec = 60
    }
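
    One way to picture how such a block maps to init arguments is the Python
    sketch below, which flattens a nested tree into dotted keys. The tree
    representation, the use of None for a key without a value, and the
    flatten() helper are all hypothetical; the exact quoting the kernel
    applies to values is not reproduced here.

```python
def flatten(tree, prefix=""):
    """Flatten a nested dict into dotted "key=value" strings."""
    parts = []
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            parts.extend(flatten(value, name))
        elif value is None:
            parts.append(name)  # key given without a value
        else:
            parts.append(f"{name}={value}")
    return parts


# Hypothetical in-memory form of the subtree under "init." shown above.
init_tree = {
    "systemd": {
        "unified_cgroup_hierarchy": 1,
        "debug_shell": None,
        "default_timeout_start_sec": 60,
    }
}
print(" ".join(flatten(init_tree)))
```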

    Link: http://lkml.kernel.org/r/157867229521.17873.654222294326542349.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Since the current kernel command line is too short to describe
    the many options supported by the kernel, allow the user to use boot
    config to set up (add) command line options.

    All kernel parameters under "kernel." keywords will be used
    for setting up the extra kernel command line.

    For example,

    kernel {
    audit = on
    audit_backlog_limit = 256
    }

    Note that you cannot specify some early parameters
    (like console etc.) by this method, since it is
    loaded after early parameters are parsed.

    Link: http://lkml.kernel.org/r/157867228333.17873.11962796367032622466.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Since initcall_command_line is used as a temporary buffer,
    it can be freed after use. Allocate it in do_initcall()
    and free it after use.

    Link: http://lkml.kernel.org/r/157867227145.17873.17513760552008505454.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Load the extended boot config data from the tail of the initrd
    image. If there is SKC data there, it has a
    [(u32)size][(u32)checksum] header (in reality, this is a
    footer) at the end of the initrd. If the checksum (simple sum
    of bytes) matches, this starts parsing it from there.

    Link: http://lkml.kernel.org/r/157867222435.17873.9936667353335606867.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

03 Jan, 2020

1 commit

  • This reverts commit 8243186f0cc7 ("fs: remove ksys_dup()") and the
    subsequent fix for it in commit 2d3145f8d280 ("early init: fix error
    handling when opening /dev/console").

    Trying to use filp_open() and f_dupfd() instead of pseudo-syscalls
    caused more trouble than it is worth: it requires accessing vfs
    internals and it turns out there were other bugs in it too.

    In particular, the file reference counting was wrong - because unlike
    the original "open+2*dup" sequence it used "filp_open+3*f_dupfd" and
    thus had an extra leaked file reference.

    That in turn then caused odd problems with Androidx86 long after boot
    because of how the extra reference to the console kept the session active
    even after all file descriptors had been closed.

    Reported-by: youling 257
    Cc: Arvind Sankar
    Cc: Al Viro
    Signed-off-by: Dominik Brodowski
    Signed-off-by: Linus Torvalds

    Dominik Brodowski
     

18 Dec, 2019

1 commit

  • The comment says "this should never fail", but it definitely can fail
    when you have odd initial boot filesystems, or kernel configurations.

    So get the error handling right: filp_open() returns an error pointer.

    Reported-by: Jesse Barnes
    Reported-by: youling 257
    Fixes: 8243186f0cc7 ("fs: remove ksys_dup()")
    Cc: Dominik Brodowski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

16 Dec, 2019

1 commit

  • device.h has everything and the kitchen sink when it comes to struct
    device things, so split out the struct driver things to a
    separate .h file to make things easier to maintain and manage over time.

    Cc: "Rafael J. Wysocki"
    Cc: Suzuki K Poulose
    Cc: Saravana Kannan
    Cc: Heikki Krogerus
    Link: https://lore.kernel.org/r/20191209193303.1694546-7-gregkh@linuxfoundation.org
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

13 Dec, 2019

2 commits


28 Sep, 2019

1 commit

  • Pull kernel lockdown mode from James Morris:
    "This is the latest iteration of the kernel lockdown patchset, from
    Matthew Garrett, David Howells and others.

    From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

    There are two major changes since this was last proposed for mainline:

    - Separating lockdown from EFI secure boot. Background discussion is
    covered here: https://lwn.net/Articles/751061/

    - Implementation as an LSM, with a default stackable lockdown LSM
    module. This allows the lockdown feature to be policy-driven,
    rather than encoding an implicit policy within the mechanism.

    The new locked_down LSM hook is provided to allow LSMs to make a
    policy decision around whether kernel functionality that would allow
    tampering with or examining the runtime state of the kernel should be
    permitted.

    The included lockdown LSM provides an implementation with a simple
    policy intended for general purpose use. This policy provides a coarse
    level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

    Enable the kernel lockdown feature. If set to integrity, kernel features
    that allow userland to modify the running kernel are disabled. If set to
    confidentiality, kernel features that allow userland to extract
    confidential information from the kernel are also disabled.

    This may also be controlled via /sys/kernel/security/lockdown and
    overridden by kernel configuration.

    New or existing LSMs may implement finer-grained controls of the
    lockdown features. Refer to the lockdown_reason documentation in
    include/linux/security.h for details.

    The lockdown feature has had significant design feedback and review
    across many subsystems. This code has been in linux-next for some
    weeks, with a few fixes applied along the way.

    Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
    when kernel lockdown is in confidentiality mode") is missing a
    Signed-off-by from its author. Matthew responded that he is providing
    this under category (c) of the DCO"
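
    The two lockdown= levels described in the message above behave as an
    ordered scale: confidentiality mode restricts everything integrity mode
    does, plus more. A minimal Python sketch of that ordering follows. The
    enum, the LEVELS table, and both helper functions are hypothetical
    illustrations and do not mirror the kernel's lockdown_reason enum.

```python
from enum import IntEnum


class LockdownLevel(IntEnum):
    NONE = 0
    INTEGRITY = 1
    CONFIDENTIALITY = 2


LEVELS = {
    "integrity": LockdownLevel.INTEGRITY,
    "confidentiality": LockdownLevel.CONFIDENTIALITY,
}


def parse_lockdown(arg):
    """Parse a 'lockdown=<level>' command line argument."""
    key, _, value = arg.partition("=")
    if key != "lockdown" or value not in LEVELS:
        raise ValueError(f"unrecognised lockdown argument: {arg!r}")
    return LEVELS[value]


def is_restricted(current, feature_level):
    """A feature tagged with feature_level is blocked once current reaches it."""
    return current >= feature_level


mode = parse_lockdown("lockdown=integrity")
print(is_restricted(mode, LockdownLevel.INTEGRITY))        # modifying the kernel
print(is_restricted(mode, LockdownLevel.CONFIDENTIALITY))  # reading kernel secrets
```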

    * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
    kexec: Fix file verification on S390
    security: constify some arrays in lockdown LSM
    lockdown: Print current->comm in restriction messages
    efi: Restrict efivar_ssdt_load when the kernel is locked down
    tracefs: Restrict tracefs when the kernel is locked down
    debugfs: Restrict debugfs when the kernel is locked down
    kexec: Allow kexec_file() with appropriate IMA policy when locked down
    lockdown: Lock down perf when in confidentiality mode
    bpf: Restrict bpf when kernel lockdown is in confidentiality mode
    lockdown: Lock down tracing and perf kprobes when in confidentiality mode
    lockdown: Lock down /proc/kcore
    x86/mmiotrace: Lock down the testmmiotrace module
    lockdown: Lock down module params that specify hardware parameters (eg. ioport)
    lockdown: Lock down TIOCSSERIAL
    lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
    acpi: Disable ACPI table override if the kernel is locked down
    acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
    ACPI: Limit access to custom_method when the kernel is locked down
    x86/msr: Restrict MSR access when the kernel is locked down
    x86: Lock down IO port access when the kernel is locked down
    ...

    Linus Torvalds
     

25 Sep, 2019

3 commits

  • Replace open-coded bitmap array initialization of init_mm.cpu_bitmask with
    neat CPU_BITS_NONE macro.

    And, since init_mm.cpu_bitmask is statically set to zero, there is no need
    to clear it again in start_kernel().

    Link: http://lkml.kernel.org/r/1565703815-8584-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
    cache for page table allocations on several architectures that do not use
    PAGE_SIZE tables for one or more levels of the page table hierarchy.

    Most architectures do not implement these functions and use __weak default
    NOP implementation of pgd_cache_init(). Since there is no such default
    for pgtable_cache_init(), its empty stub is duplicated among most
    architectures.

    Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
    drop empty stubs of pgtable_cache_init().

    Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Will Deacon [arm64]
    Acked-by: Thomas Gleixner [x86]
    Cc: Catalin Marinas
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Currently kmemleak uses a static early_log buffer to trace all memory
    allocation/freeing before the slab allocator is initialised. Such early
    log is replayed during kmemleak_init() to properly initialise the kmemleak
    metadata for objects allocated up that point. With a memory pool that
    does not rely on the slab allocator, it is possible to skip this early log
    entirely.

    In order to remove the early logging, consider kmemleak_enabled == 1 by
    default while the kmem_cache availability is checked directly on the
    object_cache and scan_area_cache variables. The RCU callback is only
    invoked after object_cache has been initialised as we wouldn't have any
    concurrent list traversal before this.

    In order to reduce the number of callbacks before kmemleak is fully
    initialised, move the kmemleak_init() call to mm_init().

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: remove WARN_ON(), per Catalin]
    Link: http://lkml.kernel.org/r/20190812160642.52134-4-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Qian Cai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

20 Aug, 2019

1 commit

  • The lockdown module is intended to allow for kernels to be locked down
    early in boot - sufficiently early that we don't have the ability to
    kmalloc() yet. Add support for early initialisation of some LSMs, and
    then add them to the list of names when we do full initialisation later.
    Early LSMs are initialised in link order and cannot be overridden via
    boot parameters, and cannot make use of kmalloc() (since the allocator
    isn't initialised yet).

    (Fixed by Stephen Rothwell to include a stub to fix builds when
    !CONFIG_SECURITY)

    Signed-off-by: Matthew Garrett
    Acked-by: Kees Cook
    Acked-by: Casey Schaufler
    Cc: Stephen Rothwell
    Signed-off-by: James Morris

    Matthew Garrett
     

01 Aug, 2019

1 commit

  • CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
    CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
    functionality which today depends on CONFIG_PREEMPT.

    Switch the preemption code, scheduler and init task over to use
    CONFIG_PREEMPTION.

    That's the first step towards RT in that area. The more complex changes are
    coming separately.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20190726212124.117528401@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

13 Jul, 2019

1 commit

  • Print the currently enabled stack and heap initialization modes.

    Stack initialization is enabled by a config flag, while heap
    initialization is configured at boot time with defaults being set in the
    config. It's more convenient for the user to have all information about
    these hardening measures in one place at boot, so the user can reason
    about the expected behavior of the running system.

    The possible options for stack are:
    - "all" for CONFIG_INIT_STACK_ALL;
    - "byref_all" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL;
    - "byref" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF;
    - "__user" for CONFIG_GCC_PLUGIN_STRUCTLEAK_USER;
    - "off" otherwise.

    Depending on the values of the init_on_alloc and init_on_free boot-time
    options, we also report "heap alloc" and "heap free" as "on"/"off".
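
    A minimal user-space sketch of how such a report line could be put
    together from the options listed above. The function names
    stack_init_mode() and format_mem_init() are hypothetical, and the config
    symbols are simply left undefined here, so the stack mode falls through
    to "off".

```c
#include <stdbool.h>
#include <stdio.h>

/* Pick the stack mode string from the compile-time config symbols. */
static const char *stack_init_mode(void)
{
#if defined(CONFIG_INIT_STACK_ALL)
	return "all";
#elif defined(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL)
	return "byref_all";
#elif defined(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF)
	return "byref";
#elif defined(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER)
	return "__user";
#else
	return "off";
#endif
}

/* Heap modes are boot-time values, so they are passed in as arguments. */
static int format_mem_init(char *buf, size_t len, bool init_on_alloc,
			   bool init_on_free)
{
	return snprintf(buf, len,
			"mem auto-init: stack:%s, heap alloc:%s, heap free:%s",
			stack_init_mode(),
			init_on_alloc ? "on" : "off",
			init_on_free ? "on" : "off");
}
```

    The split mirrors the description: the stack mode is fixed when the
    kernel is built, while the two heap flags are only known once the boot
    parameters have been parsed.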

    In the init_on_free mode initializing pages at boot time may take a while,
    so print a notice about that as well. This depends on how much memory is
    installed, the memory bandwidth, etc. On a relatively modern x86 system,
    it takes about 0.75s/GB to wipe all memory:

    [ 0.418722] mem auto-init: stack:byref_all, heap alloc:off, heap free:on
    [ 0.419765] mem auto-init: clearing system memory may take some time...
    [ 12.376605] Memory: 16408564K/16776672K available (14339K kernel code, 1397K rwdata, 3756K rodata, 1636K init, 11460K bss, 368108K reserved, 0K cma-reserved)
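
    As a sanity check on the 0.75s/GB figure: the two timestamps above are
    about 11.96s apart (12.376605 - 0.419765), and the machine reports
    16776672K of memory. The sketch below redoes that arithmetic; it assumes
    "GB" means 2^20 KiB and that the whole gap is spent wiping memory, and
    estimate_wipe_seconds() is a hypothetical helper.

```c
/* Estimate the wipe time from total memory in KiB and a per-GiB rate. */
static double estimate_wipe_seconds(unsigned long long total_kib,
				    double secs_per_gib)
{
	const double kib_per_gib = 1024.0 * 1024.0;

	return (double)total_kib / kib_per_gib * secs_per_gib;
}
```

    Calling estimate_wipe_seconds(16776672ULL, 0.75) gives roughly 12
    seconds, which is in line with the gap seen in the log.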

    Link: http://lkml.kernel.org/r/20190617151050.92663-3-glider@google.com
    Signed-off-by: Alexander Potapenko
    Suggested-by: Kees Cook
    Acked-by: Kees Cook
    Cc: Christoph Lameter
    Cc: Dmitry Vyukov
    Cc: James Morris
    Cc: Jann Horn
    Cc: Kostya Serebryany
    Cc: Laura Abbott
    Cc: Mark Rutland
    Cc: Masahiro Yamada
    Cc: Matthew Wilcox
    Cc: Nick Desaulniers
    Cc: Randy Dunlap
    Cc: Sandeep Patil
    Cc: "Serge E. Hallyn"
    Cc: Souptick Joarder
    Cc: Marco Elver
    Cc: Kaiwan N Billimoria
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

05 Jul, 2019

1 commit

  • No point having two call sites (earlier in init_rootfs() from
    mnt_init() in case we are going to use shmem-style rootfs,
    later from do_basic_setup() unconditionally), along with the
    logic in shmem_init() itself to make the second call a no-op...

    Signed-off-by: Al Viro

    Al Viro
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

2 commits

  • Various architectures, including x86, poison the freed init memory. Do the
    same in the generic free_initmem implementation and switch the sparc32
    architecture, whose implementation is identical to the generic code, over
    to it.

    Link: http://lkml.kernel.org/r/1550515285-17446-4-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Palmer Dabbelt
    Cc: Richard Kuo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "provide a generic free_initmem implementation", v2.

    Many architectures implement free_initmem() in exactly the same or a very
    similar way: they wrap the call to free_initmem_default(), sometimes with
    a different 'poison' parameter.

    These patches switch those architectures to use a generic implementation
    that does free_initmem_default(POISON_FREE_INITMEM).

    This was inspired by Christoph's patches for free_initrd_mem [1] and I
    shamelessly copied changelog entries from his patches :)

    [1] https://lore.kernel.org/lkml/20190213174621.29297-1-hch@lst.de/

    This patch (of 2):

    For most architectures, free_initmem is just a wrapper for the same
    free_initmem_default(-1) call. Provide that as a generic implementation
    marked __weak.
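
    The idea of a weak generic default can be sketched in user space with
    the GCC/Clang weak attribute, which is what the kernel's __weak
    annotation builds on. Here free_initmem_default() is a stub that only
    records its argument, and last_poison is a hypothetical variable used to
    observe the call.

```c
/* Stub standing in for the real helper; it just remembers the poison value. */
static int last_poison;

static unsigned long free_initmem_default(int poison)
{
	last_poison = poison;
	return 0;
}

/*
 * Generic default. An architecture that needs something different provides
 * its own non-weak free_initmem(), and the linker prefers that definition.
 */
__attribute__((weak)) void free_initmem(void)
{
	free_initmem_default(-1);
}
```

    With no strong definition linked in, calling free_initmem() runs the
    generic body above, so architectures that were only wrapping the default
    call can drop their copies.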

    Link: http://lkml.kernel.org/r/1550515285-17446-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Palmer Dabbelt
    Cc: Richard Kuo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

08 May, 2019

2 commits

  • Pull randomness updates from Ted Ts'o:

    - initialize the random driver earlier

    - fix CRNG initialization when we trust the CPU's RNG on NUMA systems

    - other miscellaneous cleanups and fixes.

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    random: add a spinlock_t to struct batched_entropy
    random: document get_random_int() family
    random: fix CRNG initialization when random.trust_cpu=1
    random: move rand_initialize() earlier
    random: only read from /dev/random after its pool has received 128 bits
    drivers/char/random.c: make primary_crng static
    drivers/char/random.c: remove unused stuct poolinfo::poolbits
    drivers/char/random.c: constify poolinfo_table

    Linus Torvalds
     
  • Pull printk updates from Petr Mladek:

    - Allow state reset of printk_once() calls.

    - Prevent crashes when dereferencing invalid pointers in vsprintf().
    Only the first byte is checked for simplicity.

    - Make vsprintf warnings consistent and inlined.

    - Treewide conversion of obsolete %pf, %pF to %ps, %pS printf
    modifiers.

    - Some clean up of vsprintf and test_printf code.
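
    To illustrate the kind of guard the invalid-pointer item above refers
    to, here is one cheap way to reject obviously bad pointers before
    dereferencing them. This is a sketch under assumptions, not the code
    from the series: the function name, the 4096-byte page size, the errno
    range and the returned strings are all chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */
#define SKETCH_MAX_ERRNO 4095UL	/* assumed size of the error-code range */

/* Returns a message to print instead of dereferencing, or NULL if plausible. */
static const char *check_pointer_msg_sketch(const void *ptr)
{
	uintptr_t addr = (uintptr_t)ptr;

	if (!ptr)
		return "(null)";

	/* Addresses in the first page or in the error-code range are never valid. */
	if (addr < SKETCH_PAGE_SIZE || addr >= (uintptr_t)-SKETCH_MAX_ERRNO)
		return "(efault)";

	return NULL;
}
```

    A formatter would call this before following the pointer and print the
    returned message when it is non-NULL. The check is a heuristic: it
    catches common garbage values but cannot prove that an address is
    mapped.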

    * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    lib/vsprintf: Make function pointer_string static
    vsprintf: Limit the length of inlined error messages
    vsprintf: Avoid confusion between invalid address and value
    vsprintf: Prevent crash when dereferencing invalid pointers
    vsprintf: Consolidate handling of unknown pointer specifiers
    vsprintf: Factor out %pO handler as kobject_string()
    vsprintf: Factor out %pV handler as va_format()
    vsprintf: Factor out %p[iI] handler as ip_addr_string()
    vsprintf: Do not check address of well-known strings
    vsprintf: Consistent %pK handling for kptr_restrict == 0
    vsprintf: Shuffle restricted_pointer()
    printk: Tie printk_once / printk_deferred_once into .data.once for reset
    treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
    lib/test_printf: Switch to bitmap_zalloc()

    Linus Torvalds