27 Feb, 2020

1 commit

  • Pull tracing and bootconfig updates:
    "Fixes and changes to bootconfig before it goes live in a release.

    Change in API of bootconfig (before it comes live in a release):
    - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig
    exists
    - Set CONFIG_BOOT_CONFIG to 'n' by default
    - Show error if "bootconfig" on cmdline but not compiled in
    - Prevent redefining the same value
    - Have a way to append values
    - Added a SELECT BLK_DEV_INITRD to fix a build failure

    Synthetic event fixes:
    - Switch to raw_smp_processor_id() for recording CPU value in preempt
    section. (No care for what the value actually is)
    - Fix samples always recording u64 values
    - Fix endianess
    - Check number of values matches number of fields
    - Fix a printing bug

    Fix of trace_printk() breaking postponed start up tests

    Make a function static that is only used in a single file"

    * tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
    bootconfig: Add append value operator support
    bootconfig: Prohibit re-defining value on same key
    bootconfig: Print array as multiple commands for legacy command line
    bootconfig: Reject subkey and value on same parent key
    tools/bootconfig: Remove unneeded error message silencer
    bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
    bootconfig: Set CONFIG_BOOT_CONFIG=n by default
    tracing: Clear trace_state when starting trace
    bootconfig: Mark boot_config_checksum() static
    tracing: Disable trace_printk() on post poned tests
    tracing: Have synthetic event test use raw_smp_processor_id()
    tracing: Fix number printing bug in print_synth_event()
    tracing: Check that number of vals matches number of synth event fields
    tracing: Make synth_event trace functions endian-correct
    tracing: Make sure synth_event_trace() example always uses u64

    Linus Torvalds
     

26 Feb, 2020

1 commit

  • Since commit d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by
    default") also changed the CONFIG_BOOTTIME_TRACING to select
    CONFIG_BOOT_CONFIG to show the boot-time tracing on the menu,
    it introduced wrong dependencies with BLK_DEV_INITRD as below.

    WARNING: unmet direct dependencies detected for BOOT_CONFIG
    Depends on [n]: BLK_DEV_INITRD [=n]
    Selected by [y]:
    - BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y]

    This makes the CONFIG_BOOT_CONFIG selects CONFIG_BLK_DEV_INITRD to
    fix this error and make CONFIG_BOOTTIME_TRACING=n by default, so
    that both boot-time tracing and boot configuration off but those
    appear on the menu list.

    Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2

    Fixes: d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default")
    Reported-by: Randy Dunlap
    Compiled-tested-by: Randy Dunlap
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

21 Feb, 2020

4 commits

  • Print arraied values as multiple same options for legacy
    kernel command line. With this rule, if the "kernel.*" and
    "init.*" array entries in bootconfig are printed out as
    multiple same options, e.g.

    kernel {
    console = "ttyS0,115200"
    console += "tty0"
    }

    will be correctly converted to

    console="ttyS0,115200" console="tty0"

    in the kernel command line.

    Link: http://lkml.kernel.org/r/158220118213.26565.8163300497009463916.stgit@devnote2

    Reported-by: Borislav Petkov
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add bootconfig magic word to the end of bootconfig on initrd
    image for indicating explicitly the bootconfig is there.
    Also tools/bootconfig treats wrong size or wrong checksum or
    parse error as an error, because if there is a bootconfig magic
    word, there must be a bootconfig.

    The bootconfig magic word is "#BOOTCONFIG\n", 12 bytes word.
    Thus the block image of the initrd file with bootconfig is
    as follows.

    [Initrd][bootconfig][size][csum][#BOOTCONFIG\n]

    Link: http://lkml.kernel.org/r/158220112263.26565.3944814205960612841.stgit@devnote2

    Suggested-by: Steven Rostedt
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Set CONFIG_BOOT_CONFIG=n by default. This also warns
    user if CONFIG_BOOT_CONFIG=n but "bootconfig" is given
    in the kernel command line.

    Link: http://lkml.kernel.org/r/158220111291.26565.9036889083940367969.stgit@devnote2

    Suggested-by: Steven Rostedt
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • In fact, this function is only used in this file, so mark it with 'static'.

    Link: http://lkml.kernel.org/r/1581852511-14163-1-git-send-email-hqjagain@gmail.com

    Acked-by: Masami Hiramatsu
    Signed-off-by: Qiujun Huang
    Signed-off-by: Steven Rostedt (VMware)

    Qiujun Huang
     

12 Feb, 2020

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "Various fixes:

    - Fix an uninitialized variable

    - Fix compile bug to bootconfig userspace tool (in tools directory)

    - Suppress some error messages of bootconfig userspace tool

    - Remove unneded CONFIG_LIBXBC from bootconfig

    - Allocate bootconfig xbc_nodes dynamically. To ease complaints about
    taking up static memory at boot up

    - Use of parse_args() to parse bootconfig instead of strstr() usage
    Prevents issues of double quotes containing the interested string

    - Fix missing ring_buffer_nest_end() on synthetic event error path

    - Return zero not -EINVAL on soft disabled synthetic event (soft
    disabling must be the same as hard disabling, which returns zero)

    - Consolidate synthetic event code (remove duplicate code)"

    * tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Consolidate trace() functions
    tracing: Don't return -EINVAL when tracing soft disabled synth events
    tracing: Add missing nest end to synth_event_trace_start() error case
    tools/bootconfig: Suppress non-error messages
    bootconfig: Allocate xbc_nodes array dynamically
    bootconfig: Use parse_args() to find bootconfig and '--'
    tracing/kprobe: Fix uninitialized variable bug
    bootconfig: Remove unneeded CONFIG_LIBXBC
    tools/bootconfig: Fix wrong __VA_ARGS__ usage

    Linus Torvalds
     

11 Feb, 2020

2 commits

  • The current implementation does a naive search of "bootconfig" on the kernel
    command line. But this could find "bootconfig" that is part of another
    option in quotes (although highly unlikely). But it also needs to find '--'
    on the kernel command line to know if it should append a '--' or not when a
    bootconfig in the initrd file has an "init" section. The check uses the
    naive strstr() to find to see if it exists. But this can return a false
    positive if it exists in an option and then the "init" section in the initrd
    will not be appended properly.

    Using parse_args() to find both of these will solve both of these problems.

    Link: https://lore.kernel.org/r/202002070954.C18E7F58B@keescook

    Fixes: 7495e0926fdf3 ("bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline")
    Fixes: 1319916209ce8 ("bootconfig: init: Allow admin to use bootconfig for init command line")
    Reported-by: Kees Cook
    Reviewed-by: Kees Cook
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Since there is no user except CONFIG_BOOT_CONFIG and no plan
    to use it from other functions, CONFIG_LIBXBC can be removed
    and we can use CONFIG_BOOT_CONFIG directly.

    Link: http://lkml.kernel.org/r/158098769281.939.16293492056419481105.stgit@devnote2

    Suggested-by: Geert Uytterhoeven
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

06 Feb, 2020

3 commits

  • Pull tracing updates from Steven Rostedt:

    - Added new "bootconfig".

    This looks for a file appended to initrd to add boot config options,
    and has been discussed thoroughly at Linux Plumbers.

    Very useful for adding kprobes at bootup.

    Only enabled if "bootconfig" is on the real kernel command line.

    - Created dynamic event creation.

    Merges common code between creating synthetic events and kprobe
    events.

    - Rename perf "ring_buffer" structure to "perf_buffer"

    - Rename ftrace "ring_buffer" structure to "trace_buffer"

    Had to rename existing "trace_buffer" to "array_buffer"

    - Allow trace_printk() to work withing (some) tracing code.

    - Sort of tracing configs to be a little better organized

    - Fixed bug where ftrace_graph hash was not being protected properly

    - Various other small fixes and clean ups

    * tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
    bootconfig: Show the number of nodes on boot message
    tools/bootconfig: Show the number of bootconfig nodes
    bootconfig: Add more parse error messages
    bootconfig: Use bootconfig instead of boot config
    ftrace: Protect ftrace_graph_hash with ftrace_sync
    ftrace: Add comment to why rcu_dereference_sched() is open coded
    tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
    tracing: Annotate ftrace_graph_hash pointer with __rcu
    bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
    tracing: Use seq_buf for building dynevent_cmd string
    tracing: Remove useless code in dynevent_arg_pair_add()
    tracing: Remove check_arg() callbacks from dynevent args
    tracing: Consolidate some synth_event_trace code
    tracing: Fix now invalid var_ref_vals assumption in trace action
    tracing: Change trace_boot to use synth_event interface
    tracing: Move tracing selftests to bottom of menu
    tracing: Move mmio tracer config up with the other tracers
    tracing: Move tracing test module configs together
    tracing: Move all function tracing configs together
    tracing: Documentation for in-kernel synthetic event API
    ...

    Linus Torvalds
     
  • Show the number of bootconfig nodes on boot message.

    Link: http://lkml.kernel.org/r/158091062297.27924.9051634676068550285.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Use "bootconfig" (1 word) instead of "boot config" (2 words)
    in the boot message.

    Link: http://lkml.kernel.org/r/158091059459.27924.14414336187441539879.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

05 Feb, 2020

1 commit

  • As the bootconfig is appended to the initrd it is not as easy to modify as
    the kernel command line. If there's some issue with the kernel, and the
    developer wants to boot a pristine kernel, it should not be needed to modify
    the initrd to remove the bootconfig for a single boot.

    As bootconfig is silently added (if the admin does not know where to look
    they may not know it's being loaded). It should be explicitly added to the
    kernel cmdline. The loading of the bootconfig is only done if "bootconfig"
    is on the kernel command line. This will let admins know that the kernel
    command line is extended.

    Note, after adding printk()s for when the size is too great or the checksum
    is wrong, exposed that the current method always looked for the boot config,
    and if this size and checksum matched, it would parse it (as if either is
    wrong a printk has been added to show this). It's better to only check this
    if the boot config is asked to be looked for.

    Link: https://lore.kernel.org/r/CAHk-=wjfjO+h6bQzrTf=YCZA53Y3EDyAs3Z4gEsT7icA3u_Psw@mail.gmail.com

    Acked-by: Masami Hiramatsu
    Suggested-by: Linus Torvalds
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

01 Feb, 2020

4 commits

  • This message leads to thinking that memory protection is not implemented
    for the said architecture, whereas absence of CONFIG_STRICT_KERNEL_RWX
    only means that memory protection has not been selected at compile time.

    Don't print this message when CONFIG_ARCH_HAS_STRICT_KERNEL_RWX is
    selected by the architecture. Instead, print "Kernel memory protection
    not selected by kernel config."

    Link: http://lkml.kernel.org/r/62477e446d9685459d4f27d193af6ff1bd69d55f.1578557581.git.christophe.leroy@c-s.fr
    Signed-off-by: Christophe Leroy
    Acked-by: Kees Cook
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christophe Leroy
     
  • Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2.

    unknown_bootoption passes unrecognized command line arguments to init as
    either environment variables or arguments. Some of the logic in the
    function is broken for quoted command line arguments.

    When an argument of the form param="value" is processed by parse_args
    and passed to unknown_bootoption, the command line has

    param\0"value\0

    with val pointing to the beginning of value. The helper function
    repair_env_string is then used to restore the '=' character that was
    removed by parse_args, and strip the quotes off fully. This results in

    param=value\0\0

    and val ends up pointing to the 'a' instead of the 'v' in value. This
    bug was introduced when repair_env_string was refactored into a separate
    function, and the decrement of val in repair_env_string became dead
    code.

    This causes two problems in unknown_bootoption in the two places where
    the val pointer is used as a substitute for the length of param:

    1. An argument of the form param=".value" is misinterpreted as a
    potential module parameter, with the result that it will not be
    placed in init's environment.

    2. An argument of the form param="value" is checked to see if param is
    an existing environment variable that should be overwritten, but the
    comparison is off-by-one and compares 'param=v' instead of 'param='
    against the existing environment. So passing, for example,
    TERM="vt100" on the command line results in init being passed both
    TERM=linux and TERM=vt100 in its environment.

    Patch 1 adds logging for the arguments and environment passed to init
    and is independent of the rest: it can be dropped if this is
    unnecessarily verbose.

    Patch 2 removes repair_env_string from initcall parameter parsing in
    do_initcall_level, as that uses a separate copy of the command line now
    and the repairing is no longer necessary.

    Patch 3 fixes the bug in unknown_bootoption by recording the length of
    param explicitly instead of implying it from val-param.

    This patch (of 3):

    Commit a99cd1125189 ("init: fix bug where environment vars can't be
    passed via boot args") introduced two minor bugs in unknown_bootoption
    by factoring out the quoted value handling into a separate function.

    When value is quoted, repair_env_string will move the value up 1 byte to
    strip the quotes, so val in unknown_bootoption no longer points to the
    actual location of the value.

    The result is that an argument of the form param=".value" is mistakenly
    treated as a potential module parameter and is not placed in init's
    environment, and an argument of the form param="value" can result in a
    duplicate environment variable: eg TERM="vt100" on the command line will
    result in both TERM=linux and TERM=vt100 being placed into init's
    environment.

    Fix this by recording the length of the param before calling
    repair_env_string instead of relying on val.

    Link: http://lkml.kernel.org/r/20191212180023.24339-4-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Chris Metcalf
    Cc: Krzysztof Mazur
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     
  • Since commit 08746a65c296 ("init: fix in-place parameter modification
    regression"), parse_args in do_initcall_level is called on a copy of
    saved_command_line. It is unnecessary to call repair_env_string during
    this parsing, as this copy is not used for anything later.

    Remove the now unnecessary arguments from repair_env_string as well.

    Link: http://lkml.kernel.org/r/20191212180023.24339-3-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Krzysztof Mazur
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     
  • Extend logging in `run_init_process` to also show the arguments and
    environment that we are passing to init.

    Link: http://lkml.kernel.org/r/20191212180023.24339-2-nivedita@alum.mit.edu
    Signed-off-by: Arvind Sankar
    Cc: Chris Metcalf
    Cc: Krzysztof Mazur
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Sankar
     

30 Jan, 2020

1 commit

  • Pull driver core updates from Greg KH:
    "Here is a small set of changes for 5.6-rc1 for the driver core and
    some firmware subsystem changes.

    Included in here are:
    - device.h splitup like you asked for months ago
    - devtmpfs minor cleanups
    - firmware core minor changes
    - debugfs fix for lockdown mode
    - kernfs cleanup fix
    - cpu topology minor fix

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (22 commits)
    firmware: Rename FW_OPT_NOFALLBACK to FW_OPT_NOFALLBACK_SYSFS
    devtmpfs: factor out common tail of devtmpfs_{create,delete}_node
    devtmpfs: initify a bit
    devtmpfs: simplify initialization of mount_dev
    devtmpfs: factor out setup part of devtmpfsd()
    devtmpfs: fix theoretical stale pointer deref in devtmpfsd()
    driver core: platform: fix u32 greater or equal to zero comparison
    cpu-topology: Don't error on more than CONFIG_NR_CPUS CPUs in device tree
    debugfs: Return -EPERM when locked down
    driver core: Print device when resources present in really_probe()
    driver core: Fix test_async_driver_probe if NUMA is disabled
    driver core: platform: Prevent resouce overflow from causing infinite loops
    fs/kernfs/dir.c: Clean code by removing always true condition
    component: do not dereference opaque pointer in debugfs
    drivers/component: remove modular code
    debugfs: Fix warnings when building documentation
    device.h: move 'struct driver' stuff out to device/driver.h
    device.h: move 'struct class' stuff out to device/class.h
    device.h: move 'struct bus' stuff out to device/bus.h
    device.h: move dev_printk()-like functions to dev_printk.h
    ...

    Linus Torvalds
     

29 Jan, 2020

3 commits

  • Pull UML updates from Anton Ivanov:
    "I am sending this on behalf of Richard who is traveling.

    This contains the following changes for UML:

    - Fix for time travel mode

    - Disable CONFIG_CONSTRUCTORS again

    - A new command line option to have an non-raw serial line

    - Preparations to remove obsolete UML network drivers"

    * tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: Fix time-travel=inf-cpu with xor/raid6
    Revert "um: Enable CONFIG_CONSTRUCTORS"
    um: Mark non-vector net transports as obsolete
    um: Add an option to make serial driver non-raw

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) Add WireGuard

    2) Add HE and TWT support to ath11k driver, from John Crispin.

    3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

    4) Add variable window congestion control to TIPC, from Jon Maloy.

    5) Add BCM84881 PHY driver, from Russell King.

    6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

    7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

    8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

    9) Add BPF dynamic program extensions, from Alexei Starovoitov.

    10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

    11) Add support for macsec HW offloading, from Antoine Tenart.

    12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

    13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
    net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
    udp: segment looped gso packets correctly
    netem: change mailing list
    qed: FW 8.42.2.0 debug features
    qed: rt init valid initialization changed
    qed: Debug feature: ilt and mdump
    qed: FW 8.42.2.0 Add fw overlay feature
    qed: FW 8.42.2.0 HSI changes
    qed: FW 8.42.2.0 iscsi/fcoe changes
    qed: Add abstraction for different hsi values per chip
    qed: FW 8.42.2.0 Additional ll2 type
    qed: Use dmae to write to widebus registers in fw_funcs
    qed: FW 8.42.2.0 Parser offsets modified
    qed: FW 8.42.2.0 Queue Manager changes
    qed: FW 8.42.2.0 Expose new registers and change windows
    qed: FW 8.42.2.0 Internal ram offsets modifications
    MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
    Documentation: net: octeontx2: Add RVU HW and drivers overview
    octeontx2-pf: ethtool RSS config support
    octeontx2-pf: Add basic ethtool support
    ...

    Linus Torvalds
     
  • Pull objtool updates from Ingo Molnar:
    "The main changes are to move the ORC unwind table sorting from early
    init to build-time - this speeds up booting.

    No change in functionality intended"

    * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/unwind/orc: Fix !CONFIG_MODULES build warning
    x86/unwind/orc: Remove boot-time ORC unwind tables sorting
    scripts/sorttable: Implement build-time ORC unwind table sorting
    scripts/sorttable: Rename 'sortextable' to 'sorttable'
    scripts/sortextable: Refactor the do_func() function
    scripts/sortextable: Remove dead code
    scripts/sortextable: Clean up the code to meet the kernel coding style better
    scripts/sortextable: Rewrite error/success handling

    Linus Torvalds
     

28 Jan, 2020

1 commit

  • Pull timer updates from Thomas Gleixner:
    "The timekeeping and timers departement provides:

    - Time namespace support:

    If a container migrates from one host to another then it expects
    that clocks based on MONOTONIC and BOOTTIME are not subject to
    disruption. Due to different boot time and non-suspended runtime
    these clocks can differ significantly on two hosts, in the worst
    case time goes backwards which is a violation of the POSIX
    requirements.

    The time namespace addresses this problem. It allows to set offsets
    for clock MONOTONIC and BOOTTIME once after creation and before
    tasks are associated with the namespace. These offsets are taken
    into account by timers and timekeeping including the VDSO.

    Offsets for wall clock based clocks (REALTIME/TAI) are not provided
    by this mechanism. While in theory possible, the overhead and code
    complexity would be immense and not justified by the esoteric
    potential use cases which were discussed at Plumbers '18.

    The overhead for tasks in the root namespace (ie where host time
    offsets = 0) is in the noise and great effort was made to ensure
    that especially in the VDSO. If time namespace is disabled in the
    kernel configuration the code is compiled out.

    Kudos to Andrei Vagin and Dmitry Sofanov who implemented this
    feature and kept on for more than a year addressing review
    comments, finding better solutions. A pleasant experience.

    - Overhaul of the alarmtimer device dependency handling to ensure
    that the init/suspend/resume ordering is correct.

    - A new clocksource/event driver for Microchip PIT64

    - Suspend/resume support for the Hyper-V clocksource

    - The usual pile of fixes, updates and improvements mostly in the
    driver code"

    * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
    alarmtimer: Use wakeup source from alarmtimer platform device
    alarmtimer: Make alarmtimer platform device child of RTC device
    alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
    hrtimer: Add missing sparse annotation for __run_timer()
    lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
    MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
    clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
    clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
    clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
    clocksource/drivers/exynos_mct: Rename Exynos to lowercase
    clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
    clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
    clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
    clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
    clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
    clocksource/drivers/bcm2835_timer: Fix memory leak of timer
    clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
    clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
    clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
    ...

    Linus Torvalds
     

22 Jan, 2020

1 commit

  • Fix Kconfig help message since the bootconfig file is
    only available to be appended to initramfs. And also
    add a reference to the documentation.

    Link: http://lkml.kernel.org/r/157949058031.25888.18399447161895787505.stgit@devnote2

    Reported-by: Randy Dunlap
    Acked-by: Randy Dunlap
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

20 Jan, 2020

2 commits

  • This reverts commit 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS").

    There are two issues with this commit, uncovered by Anton in tests
    on some (Debian) systems:

    1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS
    isn't set. Don't recall now if it just wasn't needed on my system, or
    if I never tested this case.

    2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I
    set CONFIG_CONSTRUCTORS, it fails again, which isn't totally
    unexpected since whatever wanted to run is likely to have to run
    before the kernel init etc. that calls the constructors in this case.

    Basically, some constructors that gcc emits (libc has?) need to run
    very early during init; the failure mode otherwise was that the ptrace
    fork test already failed:

    ----------------------
    $ ./linux mem=512M
    Core dump limits :
    soft - 0
    hard - NONE
    Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f
    Aborted
    ----------------------

    Thinking more about this, it's clear that we simply cannot support
    CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan)
    involve not use of the __attribute__((constructor)), but instead
    some constructor code/entry generated by gcc. Therefore, we cannot
    distinguish between kernel constructors and system constructors.

    Thus, revert this commit.

    Cc: stable@vger.kernel.org [5.4+]
    Fixes: 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS")
    Reported-by: Anton Ivanov
    Signed-off-by: Johannes Berg
    Acked-by: Anton Ivanov

    Signed-off-by: Richard Weinberger

    Johannes Berg
     
  • David S. Miller
     

14 Jan, 2020

8 commits

  • To support time namespaces in the vdso with a minimal impact on regular non
    time namespace affected tasks, the namespace handling needs to be hidden in
    a slow path.

    The most obvious place is vdso_seq_begin(). If a task belongs to a time
    namespace then the VVAR page which contains the system wide vdso data is
    replaced with a namespace specific page which has the same layout as the
    VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
    and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
    namespace handling path.

    The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
    update of the vdso data is in progress, is not really affecting regular
    tasks which are not part of a time namespace as the task is spin waiting
    for the update to finish and vdso_data->seq to become even again.

    If a time namespace task hits that code path, it invokes the corresponding
    time getter function which retrieves the real VVAR page, reads host time
    and then adds the offset for the requested clock which is stored in the
    special VVAR page.

    If VDSO time namespace support is disabled the whole magic is compiled out.

    Initial testing shows that the disabled case is almost identical to the
    host case which does not take the slow timens path. With the special timens
    page installed the performance hit is constant time and in the range of
    5-7%.

    For the vdso functions which are not using the sequence count an
    unconditional check for vdso_data->clock_mode is added which switches to
    the real vdso when the clock_mode is VCLOCK_TIMENS.

    [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
    pointer by CS_RAW offset.]

    Suggested-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrei Vagin
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com

    Thomas Gleixner
     
  • Time Namespace isolates clock values.

    The kernel provides access to several clocks CLOCK_REALTIME,
    CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

    CLOCK_REALTIME
    System-wide clock that measures real (i.e., wall-clock) time.

    CLOCK_MONOTONIC
    Clock that cannot be set and represents monotonic time since
    some unspecified starting point.

    CLOCK_BOOTTIME
    Identical to CLOCK_MONOTONIC, except it also includes any time
    that the system is suspended.

    For many users, the time namespace means the ability to changes date and
    time in a container (CLOCK_REALTIME). Providing per namespace notions of
    CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
    value.

    But in the context of checkpoint/restore functionality, monotonic and
    boottime clocks become interesting. Both clocks are monotonic with
    unspecified starting points. These clocks are widely used to measure time
    slices and set timers. After restoring or migrating processes, it has to be
    guaranteed that they never go backward. In an ideal case, the behavior of
    these clocks should be the same as for a case when a whole system is
    suspended. All this means that it is required to set CLOCK_MONOTONIC and
    CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
    offsets for clocks.

    A time namespace is similar to a pid namespace in the way how it is
    created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
    but doesn't set it to the current process. Then all children of the process
    will be born in the new time namespace, or a process can use the setns()
    system call to join a namespace.

    This scheme allows setting clock offsets for a namespace, before any
    processes appear in it.

    All available clone flags have been used, so CLONE_NEWTIME uses the highest
    bit of CSIGNAL. It means that it can be used only with the unshare() and
    the clone3() system calls.

    [ tglx: Adjusted paragraph about clone3() to reality and massaged the
    changelog a bit. ]

    Co-developed-by: Dmitry Safonov
    Signed-off-by: Andrei Vagin
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Link: https://criu.org/Time_namespace
    Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
    Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com

    Andrei Vagin
     
  • Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable
    debugging") has introduced a static key to reduce overhead when
    debug_pagealloc is compiled in but not enabled. It relied on the
    assumption that jump_label_init() is called before parse_early_param()
    as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
    it is safe to enable the static key.

    However, it turns out multiple architectures call parse_early_param()
    earlier from their setup_arch(). x86 also calls jump_label_init() even
    earlier, so no issue was found while testing the commit, but same is not
    true for e.g. ppc64 and s390 where the kernel would not boot with
    debug_pagealloc=on as found by our QA.

    To fix this without tricky changes to init code of multiple
    architectures, this patch partially reverts the static key conversion
    from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch
    code) of debug_pagealloc_enabled() will again test a simple bool
    variable. Fastpath mm code is converted to a new
    debug_pagealloc_enabled_static() variant that relies on the static key,
    which is enabled in a well-defined point in mm_init() where it's
    guaranteed that jump_label_init() has been called, regardless of
    architecture.

    [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
    Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
    Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Stephen Rothwell
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Qian Cai
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Since the current kernel command line is too short to describe
    long and many options for init (e.g. systemd command line options),
    this allows admin to use boot config for init command line.

    All init command line under "init." keywords will be passed to
    init.

    For example,

    init.systemd {
    unified_cgroup_hierarchy = 1
    debug_shell
    default_timeout_start_sec = 60
    }

    Link: http://lkml.kernel.org/r/157867229521.17873.654222294326542349.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Since the current kernel command line is too short to describe
    many options which supported by kernel, allow user to use boot
    config to setup (add) the command line options.

    All kernel parameters under "kernel." keywords will be used
    for setting up extra kernel command line.

    For example,

    kernel {
    audit = on
    audit_backlog_limit = 256
    }

    Note that you can not specify some early parameters
    (like console etc.) by this method, since it is
    loaded after early parameters parsed.

    Link: http://lkml.kernel.org/r/157867228333.17873.11962796367032622466.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Since initcall_command_line is used as a temporary buffer,
    it could be freed after usage. Allocate it in do_initcall()
    and free it after used.

    Link: http://lkml.kernel.org/r/157867227145.17873.17513760552008505454.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Load the extended boot config data from the tail of initrd
    image. If there is an SKC data there, it has
    [(u32)size][(u32)checksum] header (in really, this is a
    footer) at the end of initrd. If the checksum (simple sum
    of bytes) is match, this starts parsing it from there.

    Link: http://lkml.kernel.org/r/157867222435.17873.9936667353335606867.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Extra Boot Config (XBC) allows admin to pass a tree-structured
    boot configuration file when boot up the kernel. This extends
    the kernel command line in an efficient way.

    Boot config will contain some key-value commands, e.g.

    key.word = value1
    another.key.word = value2

    It can fold same keys with braces, also you can write array
    data. For example,

    key {
    word1 {
    setting1 = data
    setting2
    }
    word2.array = "val1", "val2"
    }

    User can access these key-value pair and tree structure via
    SKC APIs.

    Link: http://lkml.kernel.org/r/157867221257.17873.1775090991929862549.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

10 Jan, 2020

1 commit


03 Jan, 2020

1 commit

  • This reverts commit 8243186f0cc7 ("fs: remove ksys_dup()") and the
    subsequent fix for it in commit 2d3145f8d280 ("early init: fix error
    handling when opening /dev/console").

    Trying to use filp_open() and f_dupfd() instead of pseudo-syscalls
    caused more trouble than what is worth it: it requires accessing vfs
    internals and it turns out there were other bugs in it too.

    In particular, the file reference counting was wrong - because unlike
    the original "open+2*dup" sequence it used "filp_open+3*f_dupfd" and
    thus had an extra leaked file reference.

    That in turn then caused odd problems with Androidx86 long after boot
    becaue of how the extra reference to the console kept the session active
    even after all file descriptors had been closed.

    Reported-by: youling 257
    Cc: Arvind Sankar
    Cc: Al Viro
    Signed-off-by: Dominik Brodowski
    Signed-off-by: Linus Torvalds

    Dominik Brodowski
     

28 Dec, 2019

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2019-12-27

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 127 non-merge commits during the last 17 day(s) which contain
    a total of 110 files changed, 6901 insertions(+), 2721 deletions(-).

    There are three merge conflicts. Conflicts and resolution looks as follows:

    1) Merge conflict in net/bpf/test_run.c:

    There was a tree-wide cleanup c593642c8be0 ("treewide: Use sizeof_field() macro")
    which gets in the way with b590cb5f802d ("bpf: Switch to offsetofend in
    BPF_PROG_TEST_RUN"):

    <<<<<<< HEAD
    if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) +
    sizeof_field(struct __sk_buff, priority),
    =======
    if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority),
    >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c

    There are a few occasions that look similar to this. Always take the chunk with
    offsetofend(). Note that there is one where the fields differ in here:

    <<<<<<< HEAD
    if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) +
    sizeof_field(struct __sk_buff, tstamp),
    =======
    if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs),
    >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c

    Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to
    850a88cc4096 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN").

    2) Merge conflict in arch/riscv/net/bpf_jit_comp.c:

    (I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.)

    <<<<<<< HEAD
    if (is_13b_check(off, insn))
    return -1;
    emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx);
    =======
    emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx);
    >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c

    Result should look like:

    emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);

    3) Merge conflict in arch/riscv/include/asm/pgtable.h:

    <<<<<<< HEAD
    =======
    #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
    #define VMALLOC_END (PAGE_OFFSET - 1)
    #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)

    #define BPF_JIT_REGION_SIZE (SZ_128M)
    #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
    #define BPF_JIT_REGION_END (VMALLOC_END)

    /*
    * Roughly size the vmemmap space to be large enough to fit enough
    * struct pages to map half the virtual address space. Then
    * position vmemmap directly below the VMALLOC region.
    */
    #define VMEMMAP_SHIFT \
    (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
    #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
    #define VMEMMAP_END (VMALLOC_START - 1)
    #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)

    #define vmemmap ((struct page *)VMEMMAP_START)

    >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c

    Only take the BPF_* defines from there and move them higher up in the
    same file. Remove the rest from the chunk. The VMALLOC_* etc defines
    got moved via 01f52e16b868 ("riscv: define vmemmap before pfn_to_page
    calls"). Result:

    [...]
    #define __S101 PAGE_READ_EXEC
    #define __S110 PAGE_SHARED_EXEC
    #define __S111 PAGE_SHARED_EXEC

    #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
    #define VMALLOC_END (PAGE_OFFSET - 1)
    #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)

    #define BPF_JIT_REGION_SIZE (SZ_128M)
    #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
    #define BPF_JIT_REGION_END (VMALLOC_END)

    /*
    * Roughly size the vmemmap space to be large enough to fit enough
    * struct pages to map half the virtual address space. Then
    * position vmemmap directly below the VMALLOC region.
    */
    #define VMEMMAP_SHIFT \
    (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
    #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
    #define VMEMMAP_END (VMALLOC_START - 1)
    #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)

    [...]

    Let me know if there are any other issues.

    Anyway, the main changes are:

    1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific
    to a provided BPF object file. This provides an alternative, simplified API
    compared to standard libbpf interaction. Also, add libbpf extern variable
    resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko.

    2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by
    generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also,
    add various BPF riscv JIT improvements, from Björn Töpel.

    3) Extend bpftool to allow matching BPF programs and maps by name,
    from Paul Chaignon.

    4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI
    flag for allowing updates without service interruption, from Andrey Ignatov.

    5) Cleanup and simplification of ring access functions for AF_XDP with a
    bonus of 0-5% performance improvement, from Magnus Karlsson.

    6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of
    audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa.

    7) Move and extend test_select_reuseport into BPF program tests under
    BPF selftests, from Jakub Sitnicki.

    8) Various BPF sample improvements for xdpsock for customizing parameters
    to set up and benchmark AF_XDP, from Jay Jayatheerthan.

    9) Improve libbpf to provide a ulimit hint on permission denied errors.
    Also change XDP sample programs to attach in driver mode by default,
    from Toke Høiland-Jørgensen.

    10) Extend BPF test infrastructure to allow changing skb mark from tc BPF
    programs, from Nikita V. Shirokov.

    11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King.

    12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after
    libbpf conversion, from Jesper Dangaard Brouer.

    13) Minor misc improvements from various others.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Dec, 2019

1 commit

  • The comment says "this should never fail", but it definitely can fail
    when you have odd initial boot filesystems, or kernel configurations.

    So get the error handling right: filp_open() returns an error pointer.

    Reported-by: Jesse Barnes
    Reported-by: youling 257
    Fixes: 8243186f0cc7 ("fs: remove ksys_dup()")
    Cc: Dominik Brodowski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

17 Dec, 2019

1 commit

  • The "trivial conversion" in commit cccaa5e33525 ("init: use do_mount()
    instead of ksys_mount()") was totally broken, since it didn't handle the
    case of a NULL mount data pointer. And while I had "tested" it (and
    presumably Dominik had too) that bug was hidden by me having options.

    Cc: Dominik Brodowski
    Cc: Arnd Bergmann
    Reported-by: Ondřej Jirman
    Reported-by: Guenter Roeck
    Reported-by: Naresh Kamboju
    Reported-and-tested-by: Borislav Petkov
    Tested-by: Chris Clayton
    Tested-by: Eric Biggers
    Tested-by: Geert Uytterhoeven
    Tested-by: Guido Günther
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

16 Dec, 2019

1 commit

  • device.h has everything and the kitchen sink when it comes to struct
    device things, so split out the struct driver things things to a
    separate .h file to make things easier to maintain and manage over time.

    Cc: "Rafael J. Wysocki"
    Cc: Suzuki K Poulose
    Cc: Saravana Kannan
    Cc: Heikki Krogerus
    Link: https://lore.kernel.org/r/20191209193303.1694546-7-gregkh@linuxfoundation.org
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

13 Dec, 2019

1 commit

  • Use a more generic name for additional table sorting usecases,
    such as the upcoming ORC table sorting feature. This tool is
    not tied to exception table sorting anymore.

    No functional changes intended.

    [ mingo: Rewrote the changelog. ]

    Signed-off-by: Shile Zhang
    Acked-by: Peter Zijlstra (Intel)
    Cc: Josh Poimboeuf
    Cc: Masahiro Yamada
    Cc: Michal Marek
    Cc: linux-kbuild@vger.kernel.org
    Link: https://lkml.kernel.org/r/20191204004633.88660-6-shile.zhang@linux.alibaba.com
    Signed-off-by: Ingo Molnar

    Shile Zhang