01 Apr, 2020

2 commits

  • Pull networking updates from David Miller:
    "Highlights:

    1) Fix the iwlwifi regression, from Johannes Berg.

    2) Support BSS coloring and 802.11 encapsulation offloading in
    hardware, from John Crispin.

    3) Fix some potential Spectre issues in qtnfmac, from Sergey
    Matyukevich.

    4) Add TTL decrement action to openvswitch, from Matteo Croce.

    5) Allow parallelization through flow_action setup by not taking the
    RTNL mutex, from Vlad Buslov.

    6) A lot of zero-length array to flexible-array conversions, from
    Gustavo A. R. Silva.

    7) Align XDP statistics names across several drivers for consistency,
    from Lorenzo Bianconi.

    8) Add various pieces of infrastructure for offloading conntrack, and
    make use of it in mlx5 driver, from Paul Blakey.

    9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

    10) Lots of parallelization improvements during configuration changes
    in mlxsw driver, from Ido Schimmel.

    11) Add support to devlink for generic packet traps, which report
    packets dropped during ACL processing. And use them in mlxsw
    driver. From Jiri Pirko.

    12) Support bcmgenet on ACPI, from Jeremy Linton.

    13) Make BPF compatible with RT, from Thomas Gleixner, Alexei
    Starovoitov, and yours truly.

    14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

    15) Fix sysfs permissions when network devices change namespaces, from
    Christian Brauner.

    16) Add a flags element to ethtool_ops so that drivers can more simply
    indicate which coalescing parameters they actually support, and
    therefore the generic layer can validate the user's ethtool
    request. Use this in all drivers, from Jakub Kicinski.

    17) Offload FIFO qdisc in mlxsw, from Petr Machata.

    18) Support UDP sockets in sockmap, from Lorenz Bauer.

    19) Fix stretch ACK bugs in several TCP congestion control modules,
    from Pengcheng Yang.

    20) Support virtual functions in octeontx2 driver, from Tomasz
    Duszynski.

    21) Add region operations for devlink and use it in ice driver to dump
    NVM contents, from Jacob Keller.

    22) Add support for hw offload of MACSEC, from Antoine Tenart.

    23) Add support for BPF programs that can be attached to LSM hooks,
    from KP Singh.

    24) Support for multiple paths, path managers, and counters in MPTCP.
    From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
    and others.

    25) More progress on adding the netlink interface to ethtool, from
    Michal Kubecek"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
    net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
    cxgb4/chcr: nic-tls stats in ethtool
    net: dsa: fix oops while probing Marvell DSA switches
    net/bpfilter: remove superfluous testing message
    net: macb: Fix handling of fixed-link node
    net: dsa: ksz: Select KSZ protocol tag
    netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
    net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
    net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
    net: stmmac: create dwmac-intel.c to contain all Intel platform
    net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
    net: dsa: bcm_sf2: Add support for matching VLAN TCI
    net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
    net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
    net: dsa: bcm_sf2: Disable learning for ASP port
    net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
    net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
    net: dsa: b53: Restore VLAN entries upon (re)configuration
    net: dsa: bcm_sf2: Fix overflow checks
    hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
    ...

    Linus Torvalds
     
  • Pull Kbuild updates from Masahiro Yamada:
    "Build system:

    - add CONFIG_UNUSED_KSYMS_WHITELIST, which will be useful to define a
    fixed set of export symbols for Generic Kernel Image (GKI)

    - allow running 'make dt_binding_check' without .config

    - use full schema for checking DT examples in *.yaml files

    - make modpost fail for missing MODULE_IMPORT_NS(), which makes more
    sense because we know the produced modules are never loadable

    - remove unused 'AS' variable

    Kconfig:

    - sanitize DEFCONFIG_LIST, and remove ARCH_DEFCONFIG from Kconfig
    files

    - relax the 'imply' behavior so that symbols implied by 'y' can
    become 'm'

    - make 'imply' obey 'depends on' in order to make 'imply' really weak

    Misc:

    - add documentation on building the kernel with Clang/LLVM

    - revive __HAVE_ARCH_STRLEN for 32bit sparc to use optimized strlen()

    - fix warning from deb-pkg builds when CONFIG_DEBUG_INFO=n

    - various script and Makefile cleanups"

    * tag 'kbuild-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
    Makefile: Update kselftest help information
    kbuild: deb-pkg: fix warning when CONFIG_DEBUG_INFO is unset
    kbuild: add outputmakefile to no-dot-config-targets
    kbuild: remove AS variable
    net: wan: wanxl: refactor the firmware rebuild rule
    net: wan: wanxl: use $(M68KCC) instead of $(M68KAS) for rebuilding firmware
    net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmware
    kbuild: add comment about grouped target
    kbuild: add -Wall to KBUILD_HOSTCXXFLAGS
    kconfig: remove unused variable in qconf.cc
    sparc: revive __HAVE_ARCH_STRLEN for 32bit sparc
    kbuild: refactor Makefile.dtbinst more
    kbuild: compute the dtbs_install destination more simply
    Makefile: disallow data races on gcc-10 as well
    kconfig: make 'imply' obey the direct dependency
    kconfig: allow symbols implied by y to become m
    net: drop_monitor: use IS_REACHABLE() to guard net_dm_hw_report()
    modpost: return error if module is missing ns imports and MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS=n
    modpost: rework and consolidate logging interface
    kbuild: allow to run dt_binding_check without kernel configuration
    ...

    Linus Torvalds
     

31 Mar, 2020

4 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle are:

    - Various NUMA scheduling updates: harmonize the load-balancer and
    NUMA placement logic to not work against each other. The intended
    result is better locality, better utilization and fewer migrations.

    - Introduce Thermal Pressure tracking and optimizations, to improve
    task placement on thermally overloaded systems.

    - Implement frequency invariant scheduler accounting on (some) x86
    CPUs. This is done by observing and sampling the 'recent' CPU
    frequency average at ~tick boundaries. The CPU provides this data
    via the APERF/MPERF MSRs. This hopefully makes our capacity
    estimates more precise and keeps tasks on the same CPU better even
    if it might seem overloaded at a lower momentary frequency. (As
    usual, turbo mode is a complication that we resolve by observing
    the maximum frequency and renormalizing to it.)

    - Add asymmetric CPU capacity wakeup scan to improve capacity
    utilization on asymmetric topologies. (big.LITTLE systems)

    - PSI fixes and optimizations.

    - RT scheduling capacity awareness fixes & improvements.

    - Optimize the CONFIG_RT_GROUP_SCHED constraints code.

    - Misc fixes, cleanups and optimizations - see the changelog for
    details"
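    The frequency-invariance item above can be illustrated with a short C sketch. It assumes, for illustration only, that the per-tick APERF and MPERF deltas are already available and that max_ratio holds the maximum-to-base frequency ratio in 1024-based fixed point; freq_scale() is a hypothetical name, and this is not presented as the x86 implementation.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SCALE 1024ULL  /* 1.0 in the scheduler's fixed point */

/*
 * Hypothetical sketch: derive a frequency scale factor from APERF/MPERF
 * deltas sampled at two ticks. APERF counts at the actual frequency and
 * MPERF at the base (nominal) frequency, so dA/dM = f_actual / f_base.
 * max_ratio is f_max / f_base in 1024 fixed point (assumed input), used
 * to renormalize so that running at f_max yields 1024.
 */
static uint64_t freq_scale(uint64_t d_aperf, uint64_t d_mperf,
                           uint64_t max_ratio)
{
    if (d_mperf == 0 || max_ratio == 0)
        return SCHED_CAPACITY_SCALE;  /* no usable sample: assume full speed */

    /* scale = 1024 * (dA / dM) / (max_ratio / 1024) */
    uint64_t scale = (d_aperf * SCHED_CAPACITY_SCALE * SCHED_CAPACITY_SCALE)
                     / (d_mperf * max_ratio);

    /* never report more than full capacity */
    return scale > SCHED_CAPACITY_SCALE ? SCHED_CAPACITY_SCALE : scale;
}
```

    For example, with max_ratio = 2048 (maximum frequency twice the base), freq_scale(1000, 1000, 2048) returns 512 for a CPU running at base frequency, and freq_scale(2000, 1000, 2048) returns 1024 at the maximum.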

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
    threads: Update PID limit comment according to futex UAPI change
    sched/fair: Fix condition of avg_load calculation
    sched/rt: cpupri_find: Trigger a full search as fallback
    kthread: Do not preempt current task if it is going to call schedule()
    sched/fair: Improve spreading of utilization
    sched: Avoid scale real weight down to zero
    psi: Move PF_MEMSTALL out of task->flags
    MAINTAINERS: Add maintenance information for psi
    psi: Optimize switching tasks inside shared cgroups
    psi: Fix cpu.pressure for cpu.max and competing cgroups
    sched/core: Distribute tasks within affinity masks
    sched/fair: Fix enqueue_task_fair warning
    thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
    sched/rt: Remove unnecessary push for unfit tasks
    sched/rt: Allow pulling unfitting task
    sched/rt: Optimize cpupri_find() on non-heterogenous systems
    sched/rt: Re-instate old behavior in select_task_rq_rt()
    sched/rt: cpupri_find: Implement fallback mechanism for !fit case
    sched/fair: Fix reordering of enqueue/dequeue_task_fair()
    sched/fair: Fix runnable_avg for throttled cfs
    ...

    Linus Torvalds
     
  • LSM and tracing programs share their helpers with bpf_tracing_func_proto
    which is only defined (in bpf_trace.c) when BPF_EVENTS is enabled.

    Instead of adding a __weak symbol, make BPF_LSM depend on BPF_EVENTS so
    that both tracing and LSM programs can actually share helpers.

    Fixes: fc611f47f218 ("bpf: Introduce BPF_PROG_TYPE_LSM")
    Reported-by: Randy Dunlap
    Signed-off-by: KP Singh
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200330204059.13024-1-kpsingh@chromium.org

    KP Singh
     
  • Pull block updates from Jens Axboe:

    - Online capacity resizing (Balbir)

    - Fixes for changing the number of hardware queues (Bart)

    - null_blk fault injection addition (Bart)

    - Cleanup of queue allocation, unifying the node/no-node API
    (Christoph)

    - Cleanup of genhd, moving code to where it makes sense (Christoph)

    - Cleanup of the partition handling code (Christoph)

    - disk stat fixes/improvements (Konstantin)

    - BFQ improvements (Paolo)

    - Various fixes and improvements

    * tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits)
    block: return NULL in blk_alloc_queue() on error
    block: move bio_map_* to blk-map.c
    Revert "blkdev: check for valid request queue before issuing flush"
    block: simplify queue allocation
    bcache: pass the make_request methods to blk_queue_make_request
    null_blk: use blk_mq_init_queue_data
    block: add a blk_mq_init_queue_data helper
    block: move the ->devnode callback to struct block_device_operations
    block: move the part_stat* helpers from genhd.h to a new header
    block: move block layer internals out of include/linux/genhd.h
    block: move guard_bio_eod to bio.c
    block: unexport get_gendisk
    block: unexport disk_map_sector_rcu
    block: unexport disk_get_part
    block: mark part_in_flight and part_in_flight_rw static
    block: mark block_depr static
    block: factor out requeue handling from dispatch code
    block/diskstats: replace time_in_queue with sum of request times
    block/diskstats: accumulate all per-cpu counters in one pass
    block/diskstats: more accurate approximation of io_ticks for slow disks
    ...

    Linus Torvalds
     

30 Mar, 2020

1 commit

  • Introduce types and configs for bpf programs that can be attached to
    LSM hooks. The programs can be enabled by the config option
    CONFIG_BPF_LSM.

    Signed-off-by: KP Singh
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Brendan Jackman
    Reviewed-by: Florent Revest
    Reviewed-by: Thomas Garnier
    Acked-by: Yonghong Song
    Acked-by: Andrii Nakryiko
    Acked-by: James Morris
    Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org

    KP Singh
     

24 Mar, 2020

1 commit

  • There is no good reason for __bdevname to exist. Just open code
    printing the string in the callers. For three of them the format
    string can be trivially merged into existing printk statements,
    and in init/do_mounts.c we can at least do the scnprintf once at
    the start of the function, regardless of CONFIG_BLOCK, to
    make the output for tiny configs a little more helpful.

    Acked-by: Theodore Ts'o # for ext4
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

12 Mar, 2020

1 commit

  • The support for __uint128_t is dependent on the target bit size.

    A GCC that defaults to 32-bit can still build the 64-bit kernel
    with the -m64 flag passed.

    However, $(cc-option,-D__SIZEOF_INT128__=0) is evaluated against the
    default machine bit width, which may not match the kernel it is building.

    Theoretically, this could be evaluated separately for 64BIT/32BIT.

    config CC_HAS_INT128
        bool
        default !$(cc-option,$(m64-flag) -D__SIZEOF_INT128__=0) if 64BIT
        default !$(cc-option,$(m32-flag) -D__SIZEOF_INT128__=0)

    I simplified it more because the 32-bit compiler is unlikely to support
    __uint128_t.
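    Where a compiler does provide __uint128_t, code can guard its use on the predefined __SIZEOF_INT128__ macro, the same macro name the cc-option probe above uses. The C sketch below (mul_u64_u64() is a hypothetical name) computes a full 64x64 to 128-bit product and falls back to 32-bit limbs when the type is unavailable.

```c
#include <stdint.h>

/*
 * Hypothetical helper: 64x64 -> 128-bit unsigned multiply, returning the
 * high and low halves. Uses __uint128_t when the compiler predefines
 * __SIZEOF_INT128__, otherwise splits each operand into 32-bit limbs.
 */
static void mul_u64_u64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#ifdef __SIZEOF_INT128__
    __uint128_t p = (__uint128_t)a * b;

    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#else
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t ll = a_lo * b_lo;
    uint64_t lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo;
    uint64_t hh = a_hi * b_hi;
    /* middle column: carry out of ll plus the low halves of the cross terms */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    *lo = (mid << 32) | (uint32_t)ll;
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
#endif
}
```

    Both paths agree; for example, UINT64_MAX * UINT64_MAX yields hi = 0xFFFFFFFFFFFFFFFE and lo = 1.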

    Fixes: c12d3362a74b ("int128: move __uint128_t compiler test to Kconfig")
    Reported-by: George Spelvin
    Signed-off-by: Masahiro Yamada
    Tested-by: George Spelvin

    Masahiro Yamada
     

06 Mar, 2020

1 commit

    Building on the existing framework that tracks rt/dl utilization using
    PELT signals, add a similar mechanism to track thermal pressure. The
    difference from rt/dl utilization tracking is that, instead of
    tracking the time a CPU spends running an RT/DL task through util_avg, the
    average thermal pressure is tracked through load_avg. This is because
    the thermal pressure signal is time weighted by "delta capacity", unlike
    util_avg, which is binary. "Delta capacity" here means the delta between
    the actual capacity of a CPU and the decreased capacity of the CPU due to
    a thermal event.

    In order to track average thermal pressure, a new sched_avg variable
    avg_thermal is introduced. Function update_thermal_load_avg can be called
    to do the periodic bookkeeping (accumulate, decay and average) of the
    thermal pressure.
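    The accumulate/decay/average cycle can be sketched in C as follows. This is a simplified, floating-point illustration under stated assumptions: one update per period, a decay factor y with y^32 = 1/2 (the PELT half-life convention), and hypothetical names (struct thermal_avg, update_thermal_avg_sketch) that do not correspond to the kernel's fixed-point update_thermal_load_avg().

```c
/*
 * Hypothetical, floating-point simplification of a PELT-style average.
 * The kernel uses fixed-point arithmetic; this only shows the
 * accumulate / decay / average idea.
 */
#define PELT_Y 0.978572062087700  /* y = 2^(-1/32), so y^32 = 1/2 */

struct thermal_avg {
    double sum;  /* geometrically decayed sum of per-period inputs */
    double avg;  /* normalized average of the input signal */
};

/* Feed `periods` periods of the same capped-capacity delta. */
static void update_thermal_avg_sketch(struct thermal_avg *ta,
                                      double delta_capacity,
                                      unsigned int periods)
{
    for (unsigned int i = 0; i < periods; i++)
        ta->sum = ta->sum * PELT_Y + delta_capacity;  /* decay, accumulate */

    /* scaling by (1 - y) makes a constant input converge to itself */
    ta->avg = ta->sum * (1.0 - PELT_Y);
}
```

    Starting from zero, 32 periods at a constant delta of 100 leave avg at about 50, and 32 further periods at 0 halve it to about 25, which is the half-life behavior the decay factor encodes.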

    Reviewed-by: Vincent Guittot
    Signed-off-by: Thara Gopinath
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lkml.kernel.org/r/20200222005213.3873-2-thara.gopinath@linaro.org

    Thara Gopinath
     

03 Mar, 2020

2 commits

  • CONFIG_TRIM_UNUSED_KSYMS currently removes all unused exported symbols
    from ksymtab. This works really well when using in-tree drivers, but
    cannot be used in its current form if some of them are out-of-tree.

    Indeed, even if the list of symbols required by out-of-tree drivers is
    known at compile time, the only solution today to guarantee these don't
    get trimmed is to set CONFIG_TRIM_UNUSED_KSYMS=n. This not only wastes
    space, but also makes it difficult to control the ABI usable by vendor
    modules in distribution kernels such as Android. Being able to control
    the kernel ABI surface is particularly useful to ship a unique Generic
    Kernel Image (GKI) for all vendors, which is a first step in the
    direction of getting all vendors to contribute their code upstream.

    As such, attempt to improve the situation by enabling users to specify a
    symbol 'whitelist' at compile time. Any symbol specified in this
    whitelist will be kept exported when CONFIG_TRIM_UNUSED_KSYMS is set,
    even if it has no in-tree user. The whitelist is defined as a simple
    text file, listing symbols, one per line.
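    The keep-or-trim rule described above can be expressed as a small C sketch. It only models the decision (a symbol survives trimming if it has an in-tree user or is listed in the whitelist file, one name per line); the helper names are hypothetical, and this does not reproduce how Kbuild actually applies CONFIG_UNUSED_KSYMS_WHITELIST.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `sym` appears as a whole line in the whitelist text. */
static bool in_whitelist(const char *whitelist, const char *sym)
{
    size_t len = strlen(sym);
    const char *line = whitelist;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t line_len = end ? (size_t)(end - line) : strlen(line);

        /* exact match only: "vendor" must not match "vendor_helper" */
        if (line_len == len && strncmp(line, sym, len) == 0)
            return true;
        if (!end)
            break;
        line = end + 1;
    }
    return false;
}

/* Hypothetical decision: keep exports with an in-tree user or a whitelist entry. */
static bool keep_export(const char *sym, bool used_in_tree,
                        const char *whitelist)
{
    return used_in_tree || in_whitelist(whitelist, sym);
}
```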

    Acked-by: Jessica Yu
    Acked-by: Nicolas Pitre
    Tested-by: Matthias Maennich
    Reviewed-by: Matthias Maennich
    Signed-off-by: Quentin Perret
    Signed-off-by: Masahiro Yamada

    Quentin Perret
     
  • Most of the Kconfig commands (except defconfig and all*config) read
    the .config file as a base set of CONFIG options.

    When it does not exist, the files in DEFCONFIG_LIST are searched in
    this order and loaded if found.

    I do not see much sense in the last two lines in DEFCONFIG_LIST.

    [1] ARCH_DEFCONFIG

    The entry for DEFCONFIG_LIST is guarded by 'depends on !UML'. So, the
    ARCH_DEFCONFIG definition in arch/x86/um/Kconfig is meaningless.

    arch/{sh,sparc,x86}/Kconfig define ARCH_DEFCONFIG depending on 32 or
    64 bit variant symbols. This is a little bit strange; ARCH_DEFCONFIG
    should be a fixed string because the base config file is loaded before
    the symbol evaluation stage.

    Using KBUILD_DEFCONFIG makes more sense because it is fixed before
    Kconfig is invoked. Fortunately, arch/{sh,sparc,x86}/Makefile define it
    in the same way, and it works as expected. Hence, replace ARCH_DEFCONFIG
    with "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)".

    [2] arch/$(ARCH)/defconfig

    This file path is no longer valid. The defconfig files are always located
    in the arch configs/ directories.

    $ find arch -name defconfig | sort
    arch/alpha/configs/defconfig
    arch/arm64/configs/defconfig
    arch/csky/configs/defconfig
    arch/nds32/configs/defconfig
    arch/riscv/configs/defconfig
    arch/s390/configs/defconfig
    arch/unicore32/configs/defconfig

    The path arch/*/configs/defconfig is already covered by
    "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)". So, this file path is
    not necessary.

    I moved the default KBUILD_DEFCONFIG to the top Makefile. Otherwise,
    the 7 architectures listed above would end up with an endless loop of
    syncconfig.
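    The fallback search that DEFCONFIG_LIST describes (used only when .config is missing) amounts to "the first candidate that exists wins". A hypothetical C sketch of that lookup, using fopen() as the existence check, is below; it illustrates the search order only and is not Kconfig's own code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: try each candidate base config in order and
 * return the first one that can be opened, or NULL if none exists.
 */
static const char *first_existing(const char *const candidates[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        FILE *f = fopen(candidates[i], "r");

        if (f) {
            fclose(f);
            return candidates[i];  /* earlier entries take priority */
        }
    }
    return NULL;  /* none found */
}
```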

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

27 Feb, 2020

1 commit

  • Pull tracing and bootconfig updates:
    "Fixes and changes to bootconfig before it goes live in a release.

    Change in API of bootconfig (before it comes live in a release):
    - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig
    exists
    - Set CONFIG_BOOT_CONFIG to 'n' by default
    - Show error if "bootconfig" on cmdline but not compiled in
    - Prevent redefining the same value
    - Have a way to append values
    - Add 'select BLK_DEV_INITRD' to fix a build failure

    Synthetic event fixes:
    - Switch to raw_smp_processor_id() for recording the CPU value in a
    preemptible section (the actual value does not matter)
    - Fix samples always recording u64 values
    - Fix endianness
    - Check number of values matches number of fields
    - Fix a printing bug

    Fix trace_printk() breaking postponed startup tests

    Make a function static that is only used in a single file"

    * tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
    bootconfig: Add append value operator support
    bootconfig: Prohibit re-defining value on same key
    bootconfig: Print array as multiple commands for legacy command line
    bootconfig: Reject subkey and value on same parent key
    tools/bootconfig: Remove unneeded error message silencer
    bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
    bootconfig: Set CONFIG_BOOT_CONFIG=n by default
    tracing: Clear trace_state when starting trace
    bootconfig: Mark boot_config_checksum() static
    tracing: Disable trace_printk() on post poned tests
    tracing: Have synthetic event test use raw_smp_processor_id()
    tracing: Fix number printing bug in print_synth_event()
    tracing: Check that number of vals matches number of synth event fields
    tracing: Make synth_event trace functions endian-correct
    tracing: Make sure synth_event_trace() example always uses u64

    Linus Torvalds
     

26 Feb, 2020

1 commit

  • Since commit d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by
    default") also changed the CONFIG_BOOTTIME_TRACING to select
    CONFIG_BOOT_CONFIG to show the boot-time tracing on the menu,
    it introduced wrong dependencies with BLK_DEV_INITRD as below.

    WARNING: unmet direct dependencies detected for BOOT_CONFIG
      Depends on [n]: BLK_DEV_INITRD [=n]
      Selected by [y]:
      - BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y]

    This makes CONFIG_BOOT_CONFIG select CONFIG_BLK_DEV_INITRD to
    fix this error, and makes CONFIG_BOOTTIME_TRACING=n by default, so
    that both boot-time tracing and boot configuration are off by default
    but still appear in the menu list.

    Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2

    Fixes: d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default")
    Reported-by: Randy Dunlap
    Compiled-tested-by: Randy Dunlap
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

21 Feb, 2020

4 commits

    Print array values as multiple instances of the same option for the
    legacy kernel command line. With this rule, "kernel.*" and
    "init.*" array entries in bootconfig are printed out as
    repeated options, e.g.

    kernel {
        console = "ttyS0,115200"
        console += "tty0"
    }

    will be correctly converted to

    console="ttyS0,115200" console="tty0"

    in the kernel command line.
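    The array-to-command-line rule can be sketched in C. array_to_cmdline() is a hypothetical helper, and quoting every value is an assumption made to keep values such as "ttyS0,115200" intact; it is not the kernel's implementation.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: emit one key="value" option per array entry,
 * separated by spaces. Returns the length written, or -1 if buf is
 * too small.
 */
static int array_to_cmdline(char *buf, size_t size, const char *key,
                            const char *const vals[], size_t n)
{
    size_t pos = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* leading space for every entry after the first */
        int ret = snprintf(buf + pos, size - pos, "%s%s=\"%s\"",
                           i ? " " : "", key, vals[i]);

        if (ret < 0 || (size_t)ret >= size - pos)
            return -1;  /* output truncated */
        pos += (size_t)ret;
    }
    return (int)pos;
}
```

    With key "console" and the values "ttyS0,115200" and "tty0", it writes console="ttyS0,115200" console="tty0", matching the conversion shown above.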

    Link: http://lkml.kernel.org/r/158220118213.26565.8163300497009463916.stgit@devnote2

    Reported-by: Borislav Petkov
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
    Add a bootconfig magic word to the end of the bootconfig on the initrd
    image to indicate explicitly that a bootconfig is there.
    Also, tools/bootconfig treats a wrong size, a wrong checksum, or a
    parse error as an error, because if there is a bootconfig magic
    word, there must be a bootconfig.

    The bootconfig magic word is "#BOOTCONFIG\n", a 12-byte word.
    Thus the block image of the initrd file with bootconfig is
    as follows.

    [Initrd][bootconfig][size][csum][#BOOTCONFIG\n]
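    A reader for the trailer layout above might look like the following C sketch. Two details are assumptions not stated in this log: size and csum are taken to be 32-bit little-endian values, and csum is taken to be a plain sum of the bootconfig bytes. The function names are hypothetical. The sketch returns NULL both when there is no magic word and when the size or checksum is wrong; a real tool would report the latter as an error, as described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOTCONFIG_MAGIC     "#BOOTCONFIG\n"
#define BOOTCONFIG_MAGIC_LEN 12

/* Read a 32-bit little-endian value (assumed encoding of size/csum). */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical sketch: locate [bootconfig][size][csum][magic] at the end
 * of an initrd image. Returns the bootconfig start and stores its size,
 * or returns NULL.
 */
static const unsigned char *find_bootconfig(const unsigned char *img,
                                            size_t len, uint32_t *out_size)
{
    const unsigned char *trailer, *data;
    uint32_t size, csum, sum = 0;

    if (len < BOOTCONFIG_MAGIC_LEN + 8)
        return NULL;
    if (memcmp(img + len - BOOTCONFIG_MAGIC_LEN, BOOTCONFIG_MAGIC,
               BOOTCONFIG_MAGIC_LEN) != 0)
        return NULL;  /* no magic word: no bootconfig appended */

    trailer = img + len - BOOTCONFIG_MAGIC_LEN - 8;  /* points at [size][csum] */
    size = get_le32(trailer);
    csum = get_le32(trailer + 4);
    if (size > (size_t)(trailer - img))
        return NULL;  /* magic present but size is impossible */

    data = trailer - size;
    for (uint32_t i = 0; i < size; i++)
        sum += data[i];  /* assumed checksum: byte sum */
    if (sum != csum)
        return NULL;  /* magic present but checksum mismatch */

    *out_size = size;
    return data;
}
```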

    Link: http://lkml.kernel.org/r/158220112263.26565.3944814205960612841.stgit@devnote2

    Suggested-by: Steven Rostedt
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
    Set CONFIG_BOOT_CONFIG=n by default. This also warns the
    user if CONFIG_BOOT_CONFIG=n but "bootconfig" is given
    on the kernel command line.

    Link: http://lkml.kernel.org/r/158220111291.26565.9036889083940367969.stgit@devnote2

    Suggested-by: Steven Rostedt
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • In fact, this function is only used in this file, so mark it with 'static'.

    Link: http://lkml.kernel.org/r/1581852511-14163-1-git-send-email-hqjagain@gmail.com

    Acked-by: Masami Hiramatsu
    Signed-off-by: Qiujun Huang
    Signed-off-by: Steven Rostedt (VMware)

    Qiujun Huang
     

12 Feb, 2020

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "Various fixes:

    - Fix an uninitialized variable

    - Fix compile bug to bootconfig userspace tool (in tools directory)

    - Suppress some error messages of bootconfig userspace tool

    - Remove unneeded CONFIG_LIBXBC from bootconfig

    - Allocate bootconfig xbc_nodes dynamically, to ease complaints about
    taking up static memory at boot up

    - Use parse_args() to parse bootconfig instead of strstr(). This
    prevents issues with double-quoted values containing the string of interest

    - Fix missing ring_buffer_nest_end() on synthetic event error path

    - Return zero not -EINVAL on soft disabled synthetic event (soft
    disabling must be the same as hard disabling, which returns zero)

    - Consolidate synthetic event code (remove duplicate code)"

    * tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Consolidate trace() functions
    tracing: Don't return -EINVAL when tracing soft disabled synth events
    tracing: Add missing nest end to synth_event_trace_start() error case
    tools/bootconfig: Suppress non-error messages
    bootconfig: Allocate xbc_nodes array dynamically
    bootconfig: Use parse_args() to find bootconfig and '--'
    tracing/kprobe: Fix uninitialized variable bug
    bootconfig: Remove unneeded CONFIG_LIBXBC
    tools/bootconfig: Fix wrong __VA_ARGS__ usage

    Linus Torvalds
     

11 Feb, 2020

2 commits

  • The current implementation does a naive search of "bootconfig" on the kernel
    command line. But this could find "bootconfig" that is part of another
    option in quotes (although highly unlikely). But it also needs to find '--'
    on the kernel command line to know if it should append a '--' or not when a
    bootconfig in the initrd file has an "init" section. The check uses the
    naive strstr() to see if it exists. But this can return a false
    positive if it exists in an option, and then the "init" section in the
    initrd will not be appended properly.

    Using parse_args() to find both of these solves both problems.
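    The difference between a substring search and option-aware parsing can be shown with a small C sketch. cmdline_has_param() is hypothetical and deliberately simpler than the kernel's parse_args(): it splits only on spaces and honors double quotes, which is enough to show why strstr() gives a false positive for "bootconfig" hidden inside a quoted value.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch: true if `name` appears as its own option
 * ("name" or "name=...") on the command line. Spaces inside double
 * quotes do not split options, and quoted text is never a name.
 */
static bool cmdline_has_param(const char *cmdline, const char *name)
{
    size_t nlen = strlen(name);
    const char *p = cmdline;

    while (*p) {
        while (*p == ' ')
            p++;  /* skip separators */
        if (!*p)
            break;

        const char *start = p;
        bool in_quote = false;

        /* advance to the end of this option, honoring quotes */
        while (*p && (in_quote || *p != ' ')) {
            if (*p == '"')
                in_quote = !in_quote;
            p++;
        }
        if ((size_t)(p - start) >= nlen && strncmp(start, name, nlen) == 0 &&
            (start[nlen] == ' ' || start[nlen] == '=' || start[nlen] == '\0'))
            return true;
    }
    return false;
}
```

    For root=/dev/sda1 foo="x bootconfig y" quiet, strstr() finds "bootconfig" but cmdline_has_param() does not; the same check works for a standalone "--".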

    Link: https://lore.kernel.org/r/202002070954.C18E7F58B@keescook

    Fixes: 7495e0926fdf3 ("bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline")
    Fixes: 1319916209ce8 ("bootconfig: init: Allow admin to use bootconfig for init command line")
    Reported-by: Kees Cook
    Reviewed-by: Kees Cook
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Since there is no user except CONFIG_BOOT_CONFIG and no plan
    to use it from other functions, CONFIG_LIBXBC can be removed
    and we can use CONFIG_BOOT_CONFIG directly.

    Link: http://lkml.kernel.org/r/158098769281.939.16293492056419481105.stgit@devnote2

    Suggested-by: Geert Uytterhoeven
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

06 Feb, 2020

3 commits

  • Pull tracing updates from Steven Rostedt:

    - Added new "bootconfig".

    This looks for a file appended to initrd to add boot config options,
    and has been discussed thoroughly at Linux Plumbers.

    Very useful for adding kprobes at bootup.

    Only enabled if "bootconfig" is on the real kernel command line.

    - Added infrastructure for dynamic event creation.

    This merges common code between creating synthetic events and kprobe
    events.

    - Rename perf "ring_buffer" structure to "perf_buffer"

    - Rename ftrace "ring_buffer" structure to "trace_buffer"

    Had to rename existing "trace_buffer" to "array_buffer"

    - Allow trace_printk() to work within (some) tracing code.

    - Sort the tracing configs to be a little better organized

    - Fixed bug where ftrace_graph hash was not being protected properly

    - Various other small fixes and clean ups

    * tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
    bootconfig: Show the number of nodes on boot message
    tools/bootconfig: Show the number of bootconfig nodes
    bootconfig: Add more parse error messages
    bootconfig: Use bootconfig instead of boot config
    ftrace: Protect ftrace_graph_hash with ftrace_sync
    ftrace: Add comment to why rcu_dereference_sched() is open coded
    tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
    tracing: Annotate ftrace_graph_hash pointer with __rcu
    bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
    tracing: Use seq_buf for building dynevent_cmd string
    tracing: Remove useless code in dynevent_arg_pair_add()
    tracing: Remove check_arg() callbacks from dynevent args
    tracing: Consolidate some synth_event_trace code
    tracing: Fix now invalid var_ref_vals assumption in trace action
    tracing: Change trace_boot to use synth_event interface
    tracing: Move tracing selftests to bottom of menu
    tracing: Move mmio tracer config up with the other tracers
    tracing: Move tracing test module configs together
    tracing: Move all function tracing configs together
    tracing: Documentation for in-kernel synthetic event API
    ...

    Linus Torvalds
     
  • Show the number of bootconfig nodes on boot message.

    Link: http://lkml.kernel.org/r/158091062297.27924.9051634676068550285.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Use "bootconfig" (1 word) instead of "boot config" (2 words)
    in the boot message.

    Link: http://lkml.kernel.org/r/158091059459.27924.14414336187441539879.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

05 Feb, 2020

1 commit

    As the bootconfig is appended to the initrd, it is not as easy to modify as
    the kernel command line. If there's some issue with the kernel and the
    developer wants to boot a pristine kernel, it should not be necessary to
    modify the initrd to remove the bootconfig for a single boot.

    Because bootconfig is added silently (an admin who does not know where to
    look may not know it's being loaded), it should be explicitly requested on
    the kernel cmdline. The bootconfig is now only loaded if "bootconfig"
    is on the kernel command line. This lets admins know that the kernel
    command line is extended.

    Note: adding printk()s for when the size is too great or the checksum
    is wrong exposed that the current method always looked for the bootconfig
    and parsed it if the size and checksum matched (and, with those printk()s,
    reported a problem if either was wrong). It's better to only check this
    if the bootconfig is asked to be looked for.

    Link: https://lore.kernel.org/r/CAHk-=wjfjO+h6bQzrTf=YCZA53Y3EDyAs3Z4gEsT7icA3u_Psw@mail.gmail.com

    Acked-by: Masami Hiramatsu
    Suggested-by: Linus Torvalds
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

01 Feb, 2020

4 commits

  • This message leads to thinking that memory protection is not implemented
    for the said architecture, whereas the absence of CONFIG_STRICT_KERNEL_RWX
    only means that memory protection has not been selected at compile time.

    Don't print this message when CONFIG_ARCH_HAS_STRICT_KERNEL_RWX is
    selected by the architecture. Instead, print "Kernel memory protection
    not selected by kernel config."
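    The selection logic reduces to three cases. The C sketch below models the two Kconfig symbols as runtime booleans, whereas the kernel decides at compile time with preprocessor conditionals; rwx_message() is a hypothetical name, and the text for the no-support case is assumed to be the existing warning this change keeps for architectures without CONFIG_ARCH_HAS_STRICT_KERNEL_RWX.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch. strict_rwx models CONFIG_STRICT_KERNEL_RWX=y;
 * arch_has_strict_rwx models CONFIG_ARCH_HAS_STRICT_KERNEL_RWX being
 * selected by the architecture. Returns the warning to print, or NULL.
 */
static const char *rwx_message(bool strict_rwx, bool arch_has_strict_rwx)
{
    if (strict_rwx)
        return NULL;  /* protection enabled: no warning needed */
    if (arch_has_strict_rwx)
        return "Kernel memory protection not selected by kernel config.";
    /* assumed pre-existing text for architectures lacking support */
    return "This architecture does not have kernel memory protection.";
}
```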

    Link: http://lkml.kernel.org/r/62477e446d9685459d4f27d193af6ff1bd69d55f.1578557581.git.christophe.leroy@c-s.fr
    Signed-off-by: Christophe Leroy
    Acked-by: Kees Cook
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christophe Leroy
     
  • Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2.

    unknown_bootoption passes unrecognized command line arguments to init as
    either environment variables or arguments. Some of the logic in the
    function is broken for quoted command line arguments.

    When an argument of the form param="value" is processed by parse_args
    and passed to unknown_bootoption, the command line has

    param\0"value\0

    with val pointing to the beginning of value. The helper function
    repair_env_string is then used to restore the '=' character that was
    removed by parse_args, and strip the quotes off fully. This results in

    param=value\0\0

    and val ends up pointing to the 'a' instead of the 'v' in value. This
    bug was introduced when repair_env_string was refactored into a separate
    function, and the decrement of val in repair_env_string became dead
    code.

    This causes two problems in unknown_bootoption in the two places where
    the val pointer is used as a substitute for the length of param:

    1. An argument of the form param=".value" is misinterpreted as a
    potential module parameter, with the result that it will not be
    placed in init's environment.

    2. An argument of the form param="value" is checked to see if param is
    an existing environment variable that should be overwritten, but the
    comparison is off-by-one and compares 'param=v' instead of 'param='
    against the existing environment. So passing, for example,
    TERM="vt100" on the command line results in init being passed both
    TERM=linux and TERM=vt100 in its environment.

    Patch 1 adds logging for the arguments and environment passed to init
    and is independent of the rest: it can be dropped if this is
    unnecessarily verbose.

    Patch 2 removes repair_env_string from initcall parameter parsing in
    do_initcall_level, as that uses a separate copy of the command line now
    and the repairing is no longer necessary.

    Patch 3 fixes the bug in unknown_bootoption by recording the length of
    param explicitly instead of implying it from val-param.

    This patch (of 3):

    Commit a99cd1125189 ("init: fix bug where environment vars can't be
    passed via boot args") introduced two minor bugs in unknown_bootoption
    by factoring out the quoted value handling into a separate function.

    When value is quoted, repair_env_string will move the value up 1 byte to
    strip the quotes, so val in unknown_bootoption no longer points to the
    actual location of the value.

    The result is that an argument of the form param=".value" is mistakenly
    treated as a potential module parameter and is not placed in init's
    environment, and an argument of the form param="value" can result in a
    duplicate environment variable: e.g. TERM="vt100" on the command line will
    result in both TERM=linux and TERM=vt100 being placed into init's
    environment.

    Fix this by recording the length of the param before calling
    repair_env_string instead of relying on val.
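    The effect of the fix can be sketched in user space. The functions below
    are hypothetical helpers, not the kernel code: the old_* variants derive
    the name length from val - param as described above, while the new_*
    variants take the length recorded before repair_env_string ran.
    memchr() stands in for a length-bounded search of param.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Old check: a '.' before val means "module parameter". After a quoted
 * value is repaired, val is one byte too far, so foo=".bar" matches. */
static bool old_is_module_param(const char *param, const char *val)
{
	const char *dot = strchr(param, '.');

	return dot && (!val || dot < val);
}

/* New check: only look at the len bytes of the parameter name. */
static bool new_is_module_param(const char *param, size_t len)
{
	return memchr(param, '.', len) != NULL;
}

/* Old env match: compares val - param bytes, i.e. "TERM=v" when the
 * value was quoted, so "TERM=linux" is not seen as the same variable. */
static bool old_same_env(const char *param, const char *val, const char *env)
{
	return strncmp(param, env, (size_t)(val - param)) == 0;
}

/* New env match: the name plus its '=' (len + 1 bytes). */
static bool new_same_env(const char *param, size_t len, const char *env)
{
	assert(param[len] == '=');
	return strncmp(param, env, len + 1) == 0;
}
```

    Recording len up front makes both checks independent of where val ends
    up after the quote is stripped.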

    Link: http://lkml.kernel.org/r/20191212180023.24339-4-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Chris Metcalf
    Cc: Krzysztof Mazur
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     
  • Since commit 08746a65c296 ("init: fix in-place parameter modification
    regression"), parse_args in do_initcall_level is called on a copy of
    saved_command_line. It is unnecessary to call repair_env_string during
    this parsing, as this copy is not used for anything later.

    Remove the now unnecessary arguments from repair_env_string as well.

    Link: http://lkml.kernel.org/r/20191212180023.24339-3-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Krzysztof Mazur
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     
  • Extend logging in `run_init_process` to also show the arguments and
    environment that we are passing to init.
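    A user-space sketch of that kind of logging, assuming only that the
    arguments and environment are NULL-terminated string arrays as passed to
    execve(). log_vector() is a hypothetical helper and its output format is
    illustrative, not the kernel's exact messages.

```c
#include <assert.h>
#include <stdio.h>

/* Print each entry of a NULL-terminated vector on its own line and
 * return how many entries were printed. */
static int log_vector(FILE *out, const char *title, const char *const *vec)
{
	int n = 0;

	assert(out && title && vec);
	fprintf(out, "  with %s:\n", title);
	for (const char *const *p = vec; *p; p++, n++)
		fprintf(out, "    %s\n", *p);
	return n;
}
```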

    Link: http://lkml.kernel.org/r/20191212180023.24339-2-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Chris Metcalf
    Cc: Krzysztof Mazur
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     

30 Jan, 2020

1 commit

  • Pull driver core updates from Greg KH:
    "Here is a small set of changes for 5.6-rc1 for the driver core and
    some firmware subsystem changes.

    Included in here are:
    - device.h splitup like you asked for months ago
    - devtmpfs minor cleanups
    - firmware core minor changes
    - debugfs fix for lockdown mode
    - kernfs cleanup fix
    - cpu topology minor fix

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (22 commits)
    firmware: Rename FW_OPT_NOFALLBACK to FW_OPT_NOFALLBACK_SYSFS
    devtmpfs: factor out common tail of devtmpfs_{create,delete}_node
    devtmpfs: initify a bit
    devtmpfs: simplify initialization of mount_dev
    devtmpfs: factor out setup part of devtmpfsd()
    devtmpfs: fix theoretical stale pointer deref in devtmpfsd()
    driver core: platform: fix u32 greater or equal to zero comparison
    cpu-topology: Don't error on more than CONFIG_NR_CPUS CPUs in device tree
    debugfs: Return -EPERM when locked down
    driver core: Print device when resources present in really_probe()
    driver core: Fix test_async_driver_probe if NUMA is disabled
    driver core: platform: Prevent resouce overflow from causing infinite loops
    fs/kernfs/dir.c: Clean code by removing always true condition
    component: do not dereference opaque pointer in debugfs
    drivers/component: remove modular code
    debugfs: Fix warnings when building documentation
    device.h: move 'struct driver' stuff out to device/driver.h
    device.h: move 'struct class' stuff out to device/class.h
    device.h: move 'struct bus' stuff out to device/bus.h
    device.h: move dev_printk()-like functions to dev_printk.h
    ...

    Linus Torvalds
     

29 Jan, 2020

3 commits

  • Pull UML updates from Anton Ivanov:
    "I am sending this on behalf of Richard who is traveling.

    This contains the following changes for UML:

    - Fix for time travel mode

    - Disable CONFIG_CONSTRUCTORS again

    - A new command line option to have a non-raw serial line

    - Preparations to remove obsolete UML network drivers"

    * tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: Fix time-travel=inf-cpu with xor/raid6
    Revert "um: Enable CONFIG_CONSTRUCTORS"
    um: Mark non-vector net transports as obsolete
    um: Add an option to make serial driver non-raw

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) Add WireGuard

    2) Add HE and TWT support to ath11k driver, from John Crispin.

    3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

    4) Add variable window congestion control to TIPC, from Jon Maloy.

    5) Add BCM84881 PHY driver, from Russell King.

    6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

    7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

    8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

    9) Add BPF dynamic program extensions, from Alexei Starovoitov.

    10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

    11) Add support for macsec HW offloading, from Antoine Tenart.

    12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

    13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
    net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
    udp: segment looped gso packets correctly
    netem: change mailing list
    qed: FW 8.42.2.0 debug features
    qed: rt init valid initialization changed
    qed: Debug feature: ilt and mdump
    qed: FW 8.42.2.0 Add fw overlay feature
    qed: FW 8.42.2.0 HSI changes
    qed: FW 8.42.2.0 iscsi/fcoe changes
    qed: Add abstraction for different hsi values per chip
    qed: FW 8.42.2.0 Additional ll2 type
    qed: Use dmae to write to widebus registers in fw_funcs
    qed: FW 8.42.2.0 Parser offsets modified
    qed: FW 8.42.2.0 Queue Manager changes
    qed: FW 8.42.2.0 Expose new registers and change windows
    qed: FW 8.42.2.0 Internal ram offsets modifications
    MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
    Documentation: net: octeontx2: Add RVU HW and drivers overview
    octeontx2-pf: ethtool RSS config support
    octeontx2-pf: Add basic ethtool support
    ...

    Linus Torvalds
     
  • Pull objtool updates from Ingo Molnar:
    "The main changes are to move the ORC unwind table sorting from early
    init to build-time - this speeds up booting.

    No change in functionality intended"

    * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/unwind/orc: Fix !CONFIG_MODULES build warning
    x86/unwind/orc: Remove boot-time ORC unwind tables sorting
    scripts/sorttable: Implement build-time ORC unwind table sorting
    scripts/sorttable: Rename 'sortextable' to 'sorttable'
    scripts/sortextable: Refactor the do_func() function
    scripts/sortextable: Remove dead code
    scripts/sortextable: Clean up the code to meet the kernel coding style better
    scripts/sortextable: Rewrite error/success handling

    Linus Torvalds
     

28 Jan, 2020

1 commit

  • Pull timer updates from Thomas Gleixner:
    "The timekeeping and timers department provides:

    - Time namespace support:

    If a container migrates from one host to another then it expects
    that clocks based on MONOTONIC and BOOTTIME are not subject to
    disruption. Due to different boot time and non-suspended runtime
    these clocks can differ significantly on two hosts, in the worst
    case time goes backwards which is a violation of the POSIX
    requirements.

    The time namespace addresses this problem. It allows setting offsets
    for the MONOTONIC and BOOTTIME clocks once, after creation and before
    tasks are associated with the namespace. These offsets are taken
    into account by timers and timekeeping, including the VDSO.

    Offsets for wall clock based clocks (REALTIME/TAI) are not provided
    by this mechanism. While in theory possible, the overhead and code
    complexity would be immense and not justified by the esoteric
    potential use cases which were discussed at Plumbers '18.

    The overhead for tasks in the root namespace (i.e. where host time
    offsets = 0) is in the noise, and great effort was made to ensure
    that, especially in the VDSO. If time namespace support is disabled
    in the kernel configuration, the code is compiled out.

    Kudos to Andrei Vagin and Dmitry Safonov who implemented this
    feature and kept at it for more than a year, addressing review
    comments and finding better solutions. A pleasant experience.

    - Overhaul of the alarmtimer device dependency handling to ensure
    that the init/suspend/resume ordering is correct.

    - A new clocksource/event driver for Microchip PIT64

    - Suspend/resume support for the Hyper-V clocksource

    - The usual pile of fixes, updates and improvements mostly in the
    driver code"

    * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
    alarmtimer: Use wakeup source from alarmtimer platform device
    alarmtimer: Make alarmtimer platform device child of RTC device
    alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
    hrtimer: Add missing sparse annotation for __run_timer()
    lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
    MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
    clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
    clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
    clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
    clocksource/drivers/exynos_mct: Rename Exynos to lowercase
    clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
    clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
    clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
    clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
    clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
    clocksource/drivers/bcm2835_timer: Fix memory leak of timer
    clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
    clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
    clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
    ...

    Linus Torvalds
     

22 Jan, 2020

1 commit

  • Fix the Kconfig help message, since the bootconfig file can only be
    appended to the initramfs. Also add a reference to the documentation.

    Link: http://lkml.kernel.org/r/157949058031.25888.18399447161895787505.stgit@devnote2

    Reported-by: Randy Dunlap
    Acked-by: Randy Dunlap
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

20 Jan, 2020

2 commits

  • This reverts commit 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS").

    There are two issues with this commit, uncovered by Anton in tests
    on some (Debian) systems:

    1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS
    isn't set. Don't recall now if it just wasn't needed on my system, or
    if I never tested this case.

    2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I
    set CONFIG_CONSTRUCTORS, it fails again, which isn't totally
    unexpected since whatever wanted to run is likely to have to run
    before the kernel init etc. that calls the constructors in this case.

    Basically, some constructors that gcc emits (libc has?) need to run
    very early during init; the failure mode otherwise was that the ptrace
    fork test already failed:

    ----------------------
    $ ./linux mem=512M
    Core dump limits :
    soft - 0
    hard - NONE
    Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f
    Aborted
    ----------------------

    Thinking more about this, it's clear that we simply cannot support
    CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan)
    do not involve __attribute__((constructor)), but rather constructor
    code/entries generated by gcc. Therefore, we cannot distinguish
    between kernel constructors and system constructors.

    Thus, revert this commit.
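    In a hosted program such as the UML binary, constructors are normally
    run by the C runtime's startup code before main(). A minimal
    GCC/Clang-specific illustration, with hypothetical names:

```c
#include <assert.h>

/* Incremented by the constructor, which the C runtime runs before main(). */
static int ctor_calls;

__attribute__((constructor))
static void early_setup(void)
{
	ctor_calls++;
}

/* Code reached from main() can rely on the constructor having run once. */
int constructor_calls(void)
{
	assert(ctor_calls == 1);
	return ctor_calls;
}
```

    Compiler-generated constructors (such as those emitted for gcov) are
    typically registered through the same constructor tables, which is why
    the two kinds cannot easily be told apart.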

    Cc: stable@vger.kernel.org [5.4+]
    Fixes: 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS")
    Reported-by: Anton Ivanov
    Signed-off-by: Johannes Berg
    Acked-by: Anton Ivanov

    Signed-off-by: Richard Weinberger

    Johannes Berg
     
  • David S. Miller
     

14 Jan, 2020

3 commits

  • To support time namespaces in the vdso with minimal impact on regular tasks
    that are not affected by time namespaces, the namespace handling needs to be
    hidden in a slow path.

    The most obvious place is vdso_seq_begin(). If a task belongs to a time
    namespace then the VVAR page which contains the system wide vdso data is
    replaced with a namespace specific page which has the same layout as the
    VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
    and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
    namespace handling path.

    The extra check in the case that vdso_data->seq is odd, i.e. a concurrent
    update of the vdso data is in progress, does not really affect regular
    tasks which are not part of a time namespace, as the task is spin waiting
    for the update to finish and vdso_data->seq to become even again.
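    The reader-side decision described above can be modelled in a few lines.
    This is a hypothetical, single-threaded sketch: struct vdso_data_model,
    MODEL_CLOCK_HOST and the read_path values are invented for illustration,
    and a real reader would spin with cpu_relax() and re-check the sequence
    count instead of returning PATH_SPIN.

```c
#include <assert.h>
#include <stdint.h>

enum clock_mode { MODEL_CLOCK_HOST, VCLOCK_TIMENS };
enum read_path { PATH_FAST, PATH_SPIN, PATH_TIMENS };

struct vdso_data_model {
	uint32_t seq;			/* odd while an update is in progress */
	enum clock_mode clock_mode;
	int64_t ns;			/* stand-in for the time snapshot */
};

static enum read_path model_read(const struct vdso_data_model *vd,
				 int64_t *out)
{
	uint32_t seq;

	assert(vd && out);
	seq = vd->seq;
	if (seq & 1) {
		/* clock_mode is only consulted on the odd (slow) path. */
		if (vd->clock_mode == VCLOCK_TIMENS)
			return PATH_TIMENS;
		return PATH_SPIN;	/* concurrent update: wait and retry */
	}
	*out = vd->ns;			/* even seq: normal fast-path read */
	return PATH_FAST;
}
```

    Because the timens page pins seq at 1, regular tasks with an even
    sequence count never evaluate the clock_mode test at all.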

    If a time namespace task hits that code path, it invokes the corresponding
    time getter function which retrieves the real VVAR page, reads host time
    and then adds the offset for the requested clock which is stored in the
    special VVAR page.
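    The offset step itself is plain timespec-style arithmetic. A minimal
    sketch, assuming the host reading and the namespace offset are both
    normalized (nanoseconds in [0, 1e9), with a negative offset expressed
    through the seconds field); struct ts_model and timens_add_offset() are
    hypothetical names, not the vdso's.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

struct ts_model {
	int64_t sec;
	int64_t nsec;	/* always in [0, NSEC_PER_SEC) */
};

/* Host time plus per-namespace offset, carrying nanosecond overflow. */
static struct ts_model timens_add_offset(struct ts_model host,
					 struct ts_model offset)
{
	struct ts_model ts;

	assert(host.nsec >= 0 && host.nsec < NSEC_PER_SEC);
	assert(offset.nsec >= 0 && offset.nsec < NSEC_PER_SEC);
	ts.sec = host.sec + offset.sec;
	ts.nsec = host.nsec + offset.nsec;
	if (ts.nsec >= NSEC_PER_SEC) {
		ts.nsec -= NSEC_PER_SEC;
		ts.sec++;
	}
	return ts;
}
```

    For instance, a host reading of 100.9 s plus an offset of 86400.2 s
    yields 86501.1 s.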

    If VDSO time namespace support is disabled the whole magic is compiled out.

    Initial testing shows that the disabled case is almost identical to the
    host case which does not take the slow timens path. With the special timens
    page installed the performance hit is constant time and in the range of
    5-7%.

    For the vdso functions which are not using the sequence count an
    unconditional check for vdso_data->clock_mode is added which switches to
    the real vdso when the clock_mode is VCLOCK_TIMENS.

    [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
    pointer by CS_RAW offset.]

    Suggested-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrei Vagin
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com

    Thomas Gleixner
     
  • Time Namespace isolates clock values.

    The kernel provides access to several clocks: CLOCK_REALTIME,
    CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

    CLOCK_REALTIME
    System-wide clock that measures real (i.e., wall-clock) time.

    CLOCK_MONOTONIC
    Clock that cannot be set and represents monotonic time since
    some unspecified starting point.

    CLOCK_BOOTTIME
    Identical to CLOCK_MONOTONIC, except it also includes any time
    that the system is suspended.

    For many users, the time namespace means the ability to change date and
    time in a container (CLOCK_REALTIME). Providing per-namespace notions of
    CLOCK_REALTIME would be complex, with massive overhead, and of dubious
    value.

    But in the context of checkpoint/restore functionality, monotonic and
    boottime clocks become interesting. Both clocks are monotonic with
    unspecified starting points. These clocks are widely used to measure time
    slices and set timers. After restoring or migrating processes, it has to be
    guaranteed that these clocks never go backward. In an ideal case, the behavior of
    these clocks should be the same as for a case when a whole system is
    suspended. All this means that it is required to set CLOCK_MONOTONIC and
    CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
    offsets for clocks.

    A time namespace is similar to a pid namespace in the way it is
    created: the unshare(CLONE_NEWTIME) system call creates a new time
    namespace, but does not move the calling process into it. Then all
    children of the process will be born in the new time namespace, or a
    process can use the setns() system call to join a namespace.

    This scheme allows setting clock offsets for a namespace, before any
    processes appear in it.

    All available clone flags have been used, so CLONE_NEWTIME uses the highest
    bit of CSIGNAL. It means that it can be used only with the unshare() and
    the clone3() system calls.
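    From user space, the create-then-configure order might look like the
    sketch below. It assumes Linux 5.6 or later with time namespace support,
    a sufficiently privileged caller, and the /proc/<pid>/timens_offsets
    file, whose lines have the form "<clock> <secs> <nsecs>". The helper
    names are hypothetical, and the CLONE_NEWTIME fallback definition is
    only for libc headers that lack it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080	/* highest CSIGNAL bit */
#endif

/* One offsets line: "<clock> <secs> <nsecs>\n". Returns its length. */
static int format_offset(char *buf, size_t len, const char *clock,
			 long long secs, long nsecs)
{
	assert(nsecs >= 0 && nsecs < 1000000000L);
	return snprintf(buf, len, "%s %lld %ld\n", clock, secs, nsecs);
}

/*
 * Create a time namespace for future children and shift their
 * CLOCK_MONOTONIC by secs. Returns 0 or -errno. The caller itself stays
 * in its old namespace; children created by fork() afterwards join it.
 */
int setup_child_timens(long long secs)
{
	char line[64];
	int fd, n, err = 0;

	n = format_offset(line, sizeof(line), "monotonic", secs, 0);
	if (n < 0 || (size_t)n >= sizeof(line))
		return -EINVAL;
	if (unshare(CLONE_NEWTIME) == -1)
		return -errno;
	/* Offsets can only be set before any task enters the namespace. */
	fd = open("/proc/self/timens_offsets", O_WRONLY);
	if (fd == -1)
		return -errno;
	errno = 0;
	if (write(fd, line, (size_t)n) != n)
		err = errno ? -errno : -EIO;
	close(fd);
	return err;
}
```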

    [ tglx: Adjusted paragraph about clone3() to reality and massaged the
    changelog a bit. ]

    Co-developed-by: Dmitry Safonov
    Signed-off-by: Andrei Vagin
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Link: https://criu.org/Time_namespace
    Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
    Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com

    Andrei Vagin
     
  • Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable
    debugging") has introduced a static key to reduce overhead when
    debug_pagealloc is compiled in but not enabled. It relied on the
    assumption that jump_label_init() is called before parse_early_param()
    as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
    it is safe to enable the static key.

    However, it turns out multiple architectures call parse_early_param()
    earlier from their setup_arch(). x86 also calls jump_label_init() even
    earlier, so no issue was found while testing the commit, but the same is
    not true for, e.g., ppc64 and s390, where the kernel would not boot with
    debug_pagealloc=on, as found by our QA.

    To fix this without tricky changes to init code of multiple
    architectures, this patch partially reverts the static key conversion
    from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch
    code) of debug_pagealloc_enabled() will again test a simple bool
    variable. Fastpath mm code is converted to a new
    debug_pagealloc_enabled_static() variant that relies on the static key,
    which is enabled at a well-defined point in mm_init() where it's
    guaranteed that jump_label_init() has been called, regardless of
    architecture.
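    The resulting split can be modelled in user space. In this hypothetical
    sketch a plain bool stands in for the static key, and the *_model
    functions stand in for the early-param handler, jump_label_init() and
    the enabling point in mm_init(); only debug_pagealloc_enabled() and
    debug_pagealloc_enabled_static() are names taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>

static bool debug_pagealloc_early;	/* models _debug_pagealloc_enabled_early */
static bool static_key_model;		/* models the static key */
static bool jump_labels_ready;

/* Init-time / non-fastpath query: a plain bool, safe at any point. */
static bool debug_pagealloc_enabled(void)
{
	return debug_pagealloc_early;
}

/* Fastpath query: only meaningful once the key has been enabled. */
static bool debug_pagealloc_enabled_static(void)
{
	return static_key_model;
}

/* "debug_pagealloc=on" may be parsed before jump_label_init() on some
 * architectures, so the handler only sets the bool. */
static void parse_debug_pagealloc_model(bool on)
{
	debug_pagealloc_early = on;
}

static void jump_label_init_model(void)
{
	jump_labels_ready = true;
}

/* The well-defined point in mm_init(): jump labels are ready here. */
static void mm_init_model(void)
{
	assert(jump_labels_ready);
	if (debug_pagealloc_enabled())
		static_key_model = true;
}
```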

    [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
    Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
    Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Stephen Rothwell
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Qian Cai
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka