01 Feb, 2020

3 commits

  • Pull updates from Andrew Morton:
    "Most of -mm and quite a number of other subsystems: hotfixes, scripts,
    ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov.

    MM is fairly quiet this time. Holidays, I assume"

    * emailed patches from Andrew Morton: (118 commits)
    kcov: ignore fault-inject and stacktrace
    include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
    execve: warn if process starts with executable stack
    reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
    init/main.c: fix misleading "This architecture does not have kernel memory protection" message
    init/main.c: fix quoted value handling in unknown_bootoption
    init/main.c: remove unnecessary repair_env_string in do_initcall_level
    init/main.c: log arguments and environment passed to init
    fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
    fs/binfmt_elf.c: coredump: delete duplicated overflow check
    fs/binfmt_elf.c: coredump: allocate core ELF header on stack
    fs/binfmt_elf.c: make BAD_ADDR() unlikely
    fs/binfmt_elf.c: better codegen around current->mm
    fs/binfmt_elf.c: don't copy ELF header around
    fs/binfmt_elf.c: fix ->start_code calculation
    fs/binfmt_elf.c: smaller code generation around auxv vector fill
    lib/find_bit.c: uninline helper _find_next_bit()
    lib/find_bit.c: join _find_next_bit{_le}
    uapi: rename ext2_swab() to swab() and share globally in swab.h
    lib/scatterlist.c: adjust indentation in __sg_alloc_table
    ...

    Linus Torvalds
     
  • Pull module updates from Jessica Yu:
    "Summary of modules changes for the 5.6 merge window:

    - Add "MS" (SHF_MERGE|SHF_STRINGS) section flags to __ksymtab_strings
    to indicate to the linker that it can perform string deduplication
    (i.e., duplicate strings are reduced to a single copy in the string
    table). This means any repeated namespace string would be merged to
    just one entry in __ksymtab_strings.

    - Various code cleanups and small fixes (fix small memleak in error
    path, improve moduleparam docs, silence rcu warnings, improve error
    logging)"

    * tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    module.h: Annotate mod_kallsyms with __rcu
    module: avoid setting info->name early in case we can fall back to info->mod->name
    modsign: print module name along with error message
    kernel/module: Fix memleak in module_add_modinfo_attrs()
    export.h: reduce __ksymtab_strings string duplication by using "MS" section flags
    moduleparam: fix kerneldoc
    modules: lockdep: Suppress suspicious RCU usage warning

    Linus Torvalds
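
A toy model of the string deduplication described in the module pull above may help. The sketch below is not the linker's actual algorithm; it only imitates the effect on a table of NUL-terminated strings, where merging stores each distinct string once and lets repeated entries share an offset. The function name `build_strtab` and the sample symbol and namespace names are made up for illustration.

```python
def build_strtab(strings, merge):
    """Lay out NUL-terminated strings; return (table_bytes, offsets).

    With merge=True an already-stored string is reused instead of being
    appended again, which is the effect the "MS" flags permit.
    """
    table = bytearray()
    offsets = []
    seen = {}
    for s in strings:
        if merge and s in seen:
            offsets.append(seen[s])      # point at the existing copy
            continue
        off = len(table)
        table += s.encode() + b"\0"
        seen[s] = off
        offsets.append(off)
    return bytes(table), offsets


# Hypothetical table contents: one namespace name repeated three times.
NAMES = ["USB_STORAGE", "foo_init", "USB_STORAGE", "bar_exit", "USB_STORAGE"]

if __name__ == "__main__":
    plain, _ = build_strtab(NAMES, merge=False)
    merged, offs = build_strtab(NAMES, merge=True)
    print(len(plain), len(merged))   # 54 30
    print(offs)                      # [0, 12, 0, 21, 0]
```

In this model the three copies of the namespace string collapse into one, so the table shrinks from 54 to 30 bytes while every entry still resolves to the right text.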
     
  • Don't instrument 3 more files that contain debugging facilities and
    produce large amounts of uninteresting coverage for every syscall.

    The following snippets are sprinkled all over the place in kcov traces
    in a debugging kernel. We already try to disable instrumentation of
    stack unwinding code and of most debug facilities. I guess we did not
    use fault-inject.c at the time, and stacktrace.c was somehow missed (or
    something has changed in kernel/configs). This change both speeds up
    kcov (kernel doesn't need to store these PCs, user-space doesn't need to
    process them) and frees trace buffer capacity for more useful coverage.

    should_fail
    lib/fault-inject.c:149
    fail_dump
    lib/fault-inject.c:45

    stack_trace_save
    kernel/stacktrace.c:124
    stack_trace_consume_entry
    kernel/stacktrace.c:86
    stack_trace_consume_entry
    kernel/stacktrace.c:89
    ... a hundred frames skipped ...
    stack_trace_consume_entry
    kernel/stacktrace.c:93
    stack_trace_consume_entry
    kernel/stacktrace.c:86

    Link: http://lkml.kernel.org/r/20200116111449.217744-1-dvyukov@gmail.com
    Signed-off-by: Dmitry Vyukov
    Reviewed-by: Andrey Konovalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     

30 Jan, 2020

7 commits

  • Pull mmu_notifier updates from Jason Gunthorpe:
    "This small series revises the names in mmu_notifier to make the code
    clearer and more readable"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
    mm/mmu_notifiers: Use 'interval_sub' as the variable for mmu_interval_notifier
    mm/mmu_notifiers: Use 'subscription' as the variable name for mmu_notifier
    mm/mmu_notifier: Rename struct mmu_notifier_mm to mmu_notifier_subscriptions

    Linus Torvalds
     
  • Pull thread management updates from Christian Brauner:
    "Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
    syscall.

    This syscall allows for the retrieval of file descriptors of a process
    based on its pidfd. A task needs to have ptrace_may_access()
    permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
    Andy) on the target.

    One of the main use-cases is in combination with seccomp's user
    notification feature. As a reminder, seccomp's user notification
    feature was made available in v5.0. It allows a task to retrieve a
    file descriptor for its seccomp filter. The file descriptor is usually
    handed off to a more privileged supervising process. The supervisor can
    then listen for syscall events caught by the seccomp filter of the
    supervisee and perform actions in lieu of the supervisee, usually
    emulating syscalls. pidfd_getfd() is needed to expand its uses.

    There are currently two major users that wait on pidfd_getfd() and one
    future user:

    - Netflix, Sargun said, is working on a service mesh where users
    should be able to connect to a dns-based VIP. When a user connects
    to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
    redirected to an envoy process. This service mesh uses seccomp user
    notifications and pidfd to intercept all connect calls and instead
    of connecting them to 1.2.3.4:80 connects them to e.g.
    127.0.0.1:8080.

    - LXD uses the seccomp notifier heavily to intercept and emulate
    mknod() and mount() syscalls for unprivileged containers/processes.
    With pidfd_getfd() more use-cases e.g. bridging socket connections
    will be possible.

    - The patchset has also seen some interest from the browser corner.
    Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
    broker process. In the future glibc will start blocking all signals
    during dlopen() rendering this type of sandbox impossible. Hence,
    in the future Firefox will switch to a seccomp-user-notification
    based sandbox which also makes use of file descriptor retrieval.
    The thread for this can be found at
    https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html

    With pidfd_getfd() it is e.g. possible to bridge socket connections
    for the supervisee (binding to a privileged port) and to take actions
    on file descriptors on behalf of the supervisee in general.

    Sargun's first version was using an ioctl on pidfds but various people
    pushed for it to be a proper syscall which he duly implemented as
    well over various review cycles. Selftests are of course included.
    I've also added instructions how to deal with merge conflicts below.

    There's also a small fix coming from the kernel mentee project to
    correctly annotate struct sighand_struct with __rcu to fix various
    sparse warnings. We've received a few more such fixes and even though
    they are mostly trivial I've decided to postpone them until after -rc1
    since they came in rather late and I don't want to risk introducing
    build warnings.

    Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
    needed to avoid allocation recursions triggerable by storage drivers
    that have userspace parts that run in the IO path (e.g. dm-multipath,
    iscsi, etc). These allocation recursions deadlock the device.

    The new prctl() allows such privileged userspace components to avoid
    allocation recursions by setting the PF_MEMALLOC_NOIO and
    PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
    relevant maintainers and is routed here as part of prctl()
    thread-management."

    * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
    sched.h: Annotate sighand_struct with __rcu
    test: Add test for pidfd getfd
    arch: wire up pidfd_getfd syscall
    pid: Implement pidfd_getfd syscall
    vfs, fdtable: Add fget_task helper

    Linus Torvalds
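
As a rough illustration of the pidfd_getfd() flow described in the pull above, the sketch below calls the raw syscalls through ctypes. It rests on assumptions that are not stated in the log: the syscall numbers 434 (pidfd_open) and 438 (pidfd_getfd) are taken from the unified syscall table used by x86-64 and arm64, and the helper names are illustrative. The demo targets the calling process itself, which should pass the ptrace access check trivially; a real supervisor would open a pidfd for the supervisee instead.

```python
import ctypes
import os

# Assumption: numbers from the unified syscall table (x86-64, arm64, ...);
# check <asm/unistd.h> for the target architecture.
SYS_PIDFD_OPEN = 434
SYS_PIDFD_GETFD = 438

_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def _check(ret):
    """Turn a negative return from syscall(2) into an OSError with errno."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def pidfd_open(pid, flags=0):
    return _check(_libc.syscall(ctypes.c_long(SYS_PIDFD_OPEN),
                                ctypes.c_int(pid), ctypes.c_uint(flags)))


def pidfd_getfd(pidfd, targetfd, flags=0):
    """Return a new local fd duplicating targetfd of the pidfd's process."""
    return _check(_libc.syscall(ctypes.c_long(SYS_PIDFD_GETFD),
                                ctypes.c_int(pidfd), ctypes.c_int(targetfd),
                                ctypes.c_uint(flags)))


def copy_own_pipe_end():
    """Fetch a duplicate of our own pipe write end through a pidfd."""
    r, w = os.pipe()
    try:
        pidfd = pidfd_open(os.getpid())
        try:
            dup = pidfd_getfd(pidfd, w)
            try:
                os.write(dup, b"hi")      # write via the retrieved fd
                return os.read(r, 2)      # read it back from the pipe
            finally:
                os.close(dup)
        finally:
            os.close(pidfd)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    try:
        print(copy_own_pipe_end())
    except OSError as exc:   # e.g. ENOSYS on kernels without the syscall
        print("pidfd_getfd unavailable:", exc)
```

On a kernel or sandbox without the syscall the helper raises OSError, so callers need a fallback path.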
     
  • …kernel/git/shuah/linux-kselftest

    Pull Kselftest kunit updates from Shuah Khan:
    "This kunit update consists of:

    - Support for building kunit as a module from Alan Maguire

    - AppArmor KUnit tests for policy unpack from Mike Salvatore"

    * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    kunit: building kunit as a module breaks allmodconfig
    kunit: update documentation to describe module-based build
    kunit: allow kunit to be loaded as a module
    kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
    kunit: allow kunit tests to be loaded as a module
    kunit: hide unexported try-catch interface in try-catch-impl.h
    kunit: move string-stream.h to lib/kunit
    apparmor: add AppArmor KUnit tests for policy unpack

    Linus Torvalds
     
  • …/kernel/git/arnd/playground

    Pull y2038 updates from Arnd Bergmann:
    "Core, driver and file system changes

    These are updates to device drivers and file systems that for some
    reason or another were not included in the kernel in the previous
    y2038 series.

    I've gone through all users of time_t again to make sure the kernel is
    in a long-term maintainable state, replacing all remaining references
    to time_t with safe alternatives.

    Some related parts of the series were picked up into the nfsd, xfs,
    alsa and v4l2 trees. A final set of patches in linux-mm removes the
    now unused time_t/timeval/timespec types and helper functions after
    all five branches are merged for linux-5.6, ensuring that no new users
    get merged.

    As a result, linux-5.6, or my backport of the patches to 5.4 [1],
    should be the first release that can serve as a base for a 32-bit
    system designed to run beyond year 2038, with a few remaining caveats:

    - All user space must be compiled with a 64-bit time_t, which will be
    supported in the coming musl-1.2 and glibc-2.32 releases, along
    with installed kernel headers from linux-5.6 or higher.

    - Applications that use the system call interfaces directly need to
    be ported to use the time64 syscalls added in linux-5.1 in place of
    the existing system calls. This impacts most users of futex() and
    seccomp() as well as programming languages that have their own
    runtime environment not based on libc.

    - Applications that use a private copy of kernel uapi header files or
    their contents may need to update to the linux-5.6 version, in
    particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
    linux/elfcore.h, linux/sockios.h, linux/timex.h and
    linux/can/bcm.h.

    - A few remaining interfaces cannot be changed to pass a 64-bit
    time_t in a compatible way, so they must be configured to use
    CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
    timestamps. Most importantly this impacts all users of 'struct
    input_event'.

    - All y2038 problems that are present on 64-bit machines also apply
    to 32-bit machines. In particular this affects file systems with
    on-disk timestamps using signed 32-bit seconds: ext4 with
    ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"

    [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame

    * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
    Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
    y2038: sh: remove timeval/timespec usage from headers
    y2038: sparc: remove use of struct timex
    y2038: rename itimerval to __kernel_old_itimerval
    y2038: remove obsolete jiffies conversion functions
    nfs: fscache: use timespec64 in inode auxdata
    nfs: fix timstamp debug prints
    nfs: use time64_t internally
    sunrpc: convert to time64_t for expiry
    drm/etnaviv: avoid deprecated timespec
    drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
    drm/msm: avoid using 'timespec'
    hfs/hfsplus: use 64-bit inode timestamps
    hostfs: pass 64-bit timestamps to/from user space
    packet: clarify timestamp overflow
    tsacct: add 64-bit btime field
    acct: stop using get_seconds()
    um: ubd: use 64-bit time_t where possible
    xtensa: ISS: avoid struct timeval
    dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
    ...

    Linus Torvalds
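
The 2038 and 2106 limits mentioned in the caveats above follow directly from the width of the seconds counter. The sketch below works through that arithmetic; the helper names are illustrative and are not kernel interfaces.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable(bits, signed):
    """Latest UTC time a seconds counter of the given width can hold."""
    max_seconds = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return EPOCH + timedelta(seconds=max_seconds)


def wrap_signed32(seconds):
    """Reinterpret a seconds count as a two's-complement 32-bit value."""
    return (seconds + 2 ** 31) % 2 ** 32 - 2 ** 31


if __name__ == "__main__":
    print(last_representable(32, signed=True))    # 2038-01-19 03:14:07+00:00
    print(last_representable(32, signed=False))   # 2106-02-07 06:28:15+00:00
    # One second past the signed limit wraps back to 1901.
    print(EPOCH + timedelta(seconds=wrap_signed32(2 ** 31)))  # 1901-12-13 20:45:52+00:00
```

The unsigned case buys 68 more years with the same 32 bits, which is why it appears above as an option that still carries a y2106 problem.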
     
  • Pull printk update from Petr Mladek:
    "Prevent replaying log on all consoles"

    * tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    printk: fix exclusive_console replaying

    Linus Torvalds
     
  • Pull openat2 support from Al Viro:
    "This is the openat2() series from Aleksa Sarai.

    I'm afraid that the rest of namei stuff will have to wait - it got
    zero review the last time I'd posted #work.namei, and there had been a
    leak in the posted series I'd caught only last weekend. I was going to
    repost it on Monday, but the window opened and the odds of getting any
    review during that... Oh, well.

    Anyway, openat2 part should be ready; that _did_ get sane amount of
    review and public testing, so here it comes"

    From Aleksa's description of the series:
    "For a very long time, extending openat(2) with new features has been
    incredibly frustrating. This stems from the fact that openat(2) is
    possibly the most famous counter-example to the mantra "don't silently
    accept garbage from userspace" -- it doesn't check whether unknown
    flags are present[1].

    This means that (generally) the addition of new flags to openat(2) has
    been fraught with backwards-compatibility issues (O_TMPFILE has to be
    defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
    kernels gave errors, since it's insecure to silently ignore the
    flag[2]). All new security-related flags therefore have a tough road
    to being added to openat(2).

    Furthermore, the need for some sort of control over VFS's path
    resolution (to avoid malicious paths resulting in inadvertent
    breakouts) has been a very long-standing desire of many userspace
    applications.

    This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
    (which was a variant of David Drysdale's O_BENEATH patchset[4] which
    was a spin-off of the Capsicum project[5]) with a few additions and
    changes made based on the previous discussion within [6] as well as
    others I felt were useful.

    In line with the conclusions of the original discussion of
    AT_NO_JUMPS, the flag has been split up into separate flags. However,
    instead of being an openat(2) flag it is provided through a new
    syscall openat2(2) which provides several other improvements to the
    openat(2) interface (see the patch description for more details). The
    following new LOOKUP_* flags are added:

    LOOKUP_NO_XDEV:

    Blocks all mountpoint crossings (upwards, downwards, or through
    absolute links). Absolute pathnames alone in openat(2) do not
    trigger this. Magic-link traversal which implies a vfsmount jump is
    also blocked (though magic-link jumps on the same vfsmount are
    permitted).

    LOOKUP_NO_MAGICLINKS:

    Blocks resolution through /proc/$pid/fd-style links. This is done
    by blocking the usage of nd_jump_link() during resolution in a
    filesystem. The term "magic-links" is used to match with the only
    reference to these links in Documentation/, but I'm happy to change
    the name.

    It should be noted that this is different to the scope of
    ~LOOKUP_FOLLOW in that it applies to all path components. However,
    you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
    will *not* fail (assuming that no parent component was a
    magic-link), and you will have an fd for the magic-link.

    In order to correctly detect magic-links, the introduction of a new
    LOOKUP_MAGICLINK_JUMPED state flag was required.

    LOOKUP_BENEATH:

    Disallows escapes to outside the starting dirfd's
    tree, using techniques such as ".." or absolute links. Absolute
    paths in openat(2) are also disallowed.

    Conceptually this flag is to ensure you "stay below" a certain
    point in the filesystem tree -- but this requires some additional
    work to protect against various races that would allow escape using
    "..".

    Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
    can trivially beam you around the filesystem (breaking the
    protection). In future, there might be similar safety checks done
    as in LOOKUP_IN_ROOT, but that requires more discussion.

    In addition, two new flags are added that expand on the above ideas:

    LOOKUP_NO_SYMLINKS:

    Does what it says on the tin. No symlink resolution is allowed at
    all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
    can still be used with NOFOLLOW to open an fd for the symlink as
    long as no parent path had a symlink component.

    LOOKUP_IN_ROOT:

    This is an extension of LOOKUP_BENEATH that, rather than blocking
    attempts to move past the root, forces all such movements to be
    scoped to the starting point. This provides chroot(2)-like
    protection but without the cost of a chroot(2) for each filesystem
    operation, as well as being safe against race attacks that
    chroot(2) is not.

    If a race is detected (as with LOOKUP_BENEATH) then an error is
    generated, and similar to LOOKUP_BENEATH it is not permitted to
    cross magic-links with LOOKUP_IN_ROOT.

    The primary need for this is from container runtimes, which
    currently need to do symlink scoping in userspace[7] when opening
    paths in a potentially malicious container.

    There is a long list of CVEs that could have been mitigated by
    having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
    CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
    few).

    In order to make all of the above more usable, I'm working on
    libpathrs[8] which is a C-friendly library for safe path resolution.
    It features a userspace-emulated backend if the kernel doesn't support
    openat2(2). Hopefully we can get userspace to switch to using it, and
    thus get openat2(2) support for free once it's ready.

    Future work would include implementing things like
    RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
    programs to be sure they don't hit DoSes through stale NFS handles)"

    * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    Documentation: path-lookup: include new LOOKUP flags
    selftests: add openat2(2) selftests
    open: introduce openat2(2) syscall
    namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
    namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
    namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
    namei: LOOKUP_NO_XDEV: block mountpoint crossing
    namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
    namei: LOOKUP_NO_SYMLINKS: block symlink resolution
    namei: allow set_root() to produce errors
    namei: allow nd_jump_link() to produce errors
    nsfs: clean-up ns_get_path() signature to return int
    namei: only return -ECHILD from follow_dotdot_rcu()

    Linus Torvalds
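
To make the scoped-resolution behaviour above more concrete, here is a sketch that calls openat2(2) through ctypes. Several details are assumptions not stated in this log: the syscall number 437 (unified table, x86-64 and arm64), the userspace RESOLVE_* names and values as they appear in the uapi header `<linux/openat2.h>` (the userspace counterparts of the LOOKUP_* flags listed above), and the three-field layout of `struct open_how`. The wrapper and demo names are illustrative.

```python
import ctypes
import errno
import os
import tempfile

# Assumptions: unified syscall number and uapi flag values, see lead-in.
SYS_OPENAT2 = 437
RESOLVE_NO_XDEV = 0x01
RESOLVE_NO_MAGICLINKS = 0x02
RESOLVE_NO_SYMLINKS = 0x04
RESOLVE_BENEATH = 0x08
RESOLVE_IN_ROOT = 0x10


class OpenHow(ctypes.Structure):
    """Mirror of struct open_how: three 64-bit fields."""
    _fields_ = [("flags", ctypes.c_uint64),
                ("mode", ctypes.c_uint64),
                ("resolve", ctypes.c_uint64)]


_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def openat2(dirfd, path, flags, resolve, mode=0):
    """Open path relative to dirfd; raise OSError on failure."""
    how = OpenHow(flags | os.O_CLOEXEC, mode, resolve)
    ret = _libc.syscall(ctypes.c_long(SYS_OPENAT2), ctypes.c_int(dirfd),
                        ctypes.c_char_p(os.fsencode(path)),
                        ctypes.byref(how),
                        ctypes.c_size_t(ctypes.sizeof(how)))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return ret


def beneath_demo():
    """Return the errno hit when '..' tries to leave the directory."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "inside.txt"), "w"):
            pass
        dirfd = os.open(d, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # A path that stays below dirfd resolves normally.
            os.close(openat2(dirfd, "inside.txt", os.O_RDONLY, RESOLVE_BENEATH))
            try:
                os.close(openat2(dirfd, "../escape", os.O_RDONLY, RESOLVE_BENEATH))
                return 0
            except OSError as exc:
                return exc.errno
        finally:
            os.close(dirfd)


if __name__ == "__main__":
    try:
        code = beneath_demo()
        print(errno.errorcode.get(code, code))
    except OSError as exc:   # e.g. ENOSYS on kernels without openat2
        print("openat2 unavailable:", exc)
```

On a kernel that implements openat2(2), the escaping lookup is expected to be refused with EXDEV rather than silently resolved. Swapping RESOLVE_BENEATH for RESOLVE_IN_ROOT should instead clamp ".." at the starting directory, as the description above explains.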
     
  • Pull RCU warning removal from Paul McKenney:
    "A single commit that fixes an embarrassing bug discussed here:

    https://lore.kernel.org/lkml/20200125131425.GB16136@zn.tnic/

    which apparently also affects smaller systems"

    [ This was sent to Ingo, but since I see the issue on the laptop I use for
    testing during the merge window, I'm doing the pull directly - Linus ]

    * 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
    rcu: Forgive slow expedited grace periods at boot time

    Linus Torvalds
     

29 Jan, 2020

8 commits

  • Pull UML updates from Anton Ivanov:
    "I am sending this on behalf of Richard who is traveling.

    This contains the following changes for UML:

    - Fix for time travel mode

    - Disable CONFIG_CONSTRUCTORS again

    - A new command line option to have a non-raw serial line

    - Preparations to remove obsolete UML network drivers"

    * tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: Fix time-travel=inf-cpu with xor/raid6
    Revert "um: Enable CONFIG_CONSTRUCTORS"
    um: Mark non-vector net transports as obsolete
    um: Add an option to make serial driver non-raw

    Linus Torvalds
     
  • Pull tracing fix from Steven Rostedt:
    "Kprobe events added 'ustring' to distinguish reading strings from
    kernel space or user space.

    But the creating of the event format file only checks for 'string' to
    display string formats. 'ustring' must also be handled"

    * tag 'trace-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/kprobes: Have uname use __get_str() in print_fmt

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) Add WireGuard

    2) Add HE and TWT support to ath11k driver, from John Crispin.

    3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

    4) Add variable window congestion control to TIPC, from Jon Maloy.

    5) Add BCM84881 PHY driver, from Russell King.

    6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

    7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

    8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

    9) Add BPF dynamic program extensions, from Alexei Starovoitov.

    10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

    11) Add support for macsec HW offloading, from Antoine Tenart.

    12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

    13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
    net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
    udp: segment looped gso packets correctly
    netem: change mailing list
    qed: FW 8.42.2.0 debug features
    qed: rt init valid initialization changed
    qed: Debug feature: ilt and mdump
    qed: FW 8.42.2.0 Add fw overlay feature
    qed: FW 8.42.2.0 HSI changes
    qed: FW 8.42.2.0 iscsi/fcoe changes
    qed: Add abstraction for different hsi values per chip
    qed: FW 8.42.2.0 Additional ll2 type
    qed: Use dmae to write to widebus registers in fw_funcs
    qed: FW 8.42.2.0 Parser offsets modified
    qed: FW 8.42.2.0 Queue Manager changes
    qed: FW 8.42.2.0 Expose new registers and change windows
    qed: FW 8.42.2.0 Internal ram offsets modifications
    MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
    Documentation: net: octeontx2: Add RVU HW and drivers overview
    octeontx2-pf: ethtool RSS config support
    octeontx2-pf: Add basic ethtool support
    ...

    Linus Torvalds
     
  • Pull crypto updates from Herbert Xu:
    "API:
    - Removed CRYPTO_TFM_RES flags
    - Extended spawn grabbing to all algorithm types
    - Moved hash descsize verification into API code

    Algorithms:
    - Fixed recursive pcrypt dead-lock
    - Added new 32 and 64-bit generic versions of poly1305
    - Added cryptogams implementation of x86/poly1305

    Drivers:
    - Added support for i.MX8M Mini in caam
    - Added support for i.MX8M Nano in caam
    - Added support for i.MX8M Plus in caam
    - Added support for A33 variant of SS in sun4i-ss
    - Added TEE support for Raven Ridge in ccp
    - Added in-kernel API to submit TEE commands in ccp
    - Added AMD-TEE driver
    - Added support for BCM2711 in iproc-rng200
    - Added support for AES256-GCM based ciphers for chtls
    - Added aead support on SEC2 in hisilicon"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits)
    crypto: arm/chacha - fix build failured when kernel mode NEON is disabled
    crypto: caam - add support for i.MX8M Plus
    crypto: x86/poly1305 - emit does base conversion itself
    crypto: hisilicon - fix spelling mistake "disgest" -> "digest"
    crypto: chacha20poly1305 - add back missing test vectors and test chunking
    crypto: x86/poly1305 - fix .gitignore typo
    tee: fix memory allocation failure checks on drv_data and amdtee
    crypto: ccree - erase unneeded inline funcs
    crypto: ccree - make cc_pm_put_suspend() void
    crypto: ccree - split overloaded usage of irq field
    crypto: ccree - fix PM race condition
    crypto: ccree - fix FDE descriptor sequence
    crypto: ccree - cc_do_send_request() is void func
    crypto: ccree - fix pm wrongful error reporting
    crypto: ccree - turn errors to debug msgs
    crypto: ccree - fix AEAD decrypt auth fail
    crypto: ccree - fix typo in comment
    crypto: ccree - fix typos in error msgs
    crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data
    crypto: x86/sha - Eliminate casts on asm implementations
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:
    "These were the main changes in this cycle:

    - More -rt motivated separation of CONFIG_PREEMPT and
    CONFIG_PREEMPTION.

    - Add more low level scheduling topology sanity checks and warnings
    to filter out nonsensical topologies that break scheduling.

    - Extend uclamp constraints to influence wakeup CPU placement

    - Make the RT scheduler more aware of asymmetric topologies and CPU
    capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y

    - Make idle CPU selection more consistent

    - Various fixes, smaller cleanups, updates and enhancements - please
    see the git log for details"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
    sched/fair: Define sched_idle_cpu() only for SMP configurations
    sched/topology: Assert non-NUMA topology masks don't (partially) overlap
    idle: fix spelling mistake "iterrupts" -> "interrupts"
    sched/fair: Remove redundant call to cpufreq_update_util()
    sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
    sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
    sched/fair: calculate delta runnable load only when it's needed
    sched/cputime: move rq parameter in irqtime_account_process_tick
    stop_machine: Make stop_cpus() static
    sched/debug: Reset watchdog on all CPUs while processing sysrq-t
    sched/core: Fix size of rq::uclamp initialization
    sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
    sched/fair: Load balance aggressively for SCHED_IDLE CPUs
    sched/fair : Improve update_sd_pick_busiest for spare capacity case
    watchdog: Remove soft_lockup_hrtimer_cnt and related code
    sched/rt: Make RT capacity-aware
    sched/fair: Make EAS wakeup placement consider uclamp restrictions
    sched/fair: Make task_fits_capacity() consider uclamp restrictions
    sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
    sched/uclamp: Make uclamp util helpers use and return UL values
    ...

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "Kernel side changes:

    - Ftrace is one of the last W^X violators (after this only KLP is
    left). These patches move it over to the generic text_poke()
    interface and thereby get rid of this oddity. This requires a
    surprising amount of surgery, by Peter Zijlstra.

    - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to
    count certain types of events that have a special, quirky hw ABI
    (by Kim Phillips)

    - kprobes fixes by Masami Hiramatsu

    Lots of tooling updates as well, the following subcommands were
    updated: annotate/report/top, c2c, clang, record, report/top TUI,
    sched timehist, tests; plus updates were done to the gtk ui, libperf,
    headers and the parser"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
    perf/x86/amd: Add support for Large Increment per Cycle Events
    perf/x86/amd: Constrain Large Increment per Cycle events
    perf/x86/intel/rapl: Add Comet Lake support
    tracing: Initialize ret in syscall_enter_define_fields()
    perf header: Use last modification time for timestamp
    perf c2c: Fix return type for histogram sorting comparision functions
    perf beauty sockaddr: Fix augmented syscall format warning
    perf/ui/gtk: Fix gtk2 build
    perf ui gtk: Add missing zalloc object
    perf tools: Use %define api.pure full instead of %pure-parser
    libperf: Setup initial evlist::all_cpus value
    perf report: Fix no libunwind compiled warning break s390 issue
    perf tools: Support --prefix/--prefix-strip
    perf report: Clarify in help that --children is default
    tools build: Fix test-clang.cpp with Clang 8+
    perf clang: Fix build with Clang 9
    kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
    tools lib: Fix builds when glibc contains strlcpy()
    perf report/top: Make 'e' visible in the help and make it toggle showing callchains
    perf report/top: Do not offer annotation for symbols without samples
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "Just a handful of changes in this cycle: an ARM64 performance
    optimization, a comment fix and a debug output fix"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/osq: Use optimized spinning loop for arm64
    locking/qspinlock: Fix inaccessible URL of MCS lock paper
    locking/lockdep: Fix lockdep_stats indentation problem

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The RCU changes in this cycle were:
    - Expedited grace-period updates
    - kfree_rcu() updates
    - RCU list updates
    - Preemptible RCU updates
    - Torture-test updates
    - Miscellaneous fixes
    - Documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
    rcu: Remove unused stop-machine #include
    powerpc: Remove comment about read_barrier_depends()
    .mailmap: Add entries for old paulmck@kernel.org addresses
    srcu: Apply *_ONCE() to ->srcu_last_gp_end
    rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()
    rcu: Move rcu_{expedited,normal} definitions into rcupdate.h
    rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h
    rcu: Remove the declaration of call_rcu() in tree.h
    rcu: Fix tracepoint tracking RCU CPU kthread utilization
    rcu: Fix harmless omission of "CONFIG_" from #if condition
    rcu: Avoid tick_dep_set_cpu() misordering
    rcu: Provide wrappers for uses of ->rcu_read_lock_nesting
    rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()
    rcu: Clear ->rcu_read_unlock_special only once
    rcu: Clear .exp_hint only when deferred quiescent state has been reported
    rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
    rcu: Remove kfree_call_rcu_nobatch()
    rcu: Remove kfree_rcu() special casing and lazy-callback handling
    rcu: Add support for debug_objects debugging for kfree_rcu()
    rcu: Add multiple in-flight batches of kfree_rcu() work
    ...

    Linus Torvalds
     

28 Jan, 2020

12 commits

  • There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
    and nbd that have userspace components that can run in the IO path. For
    example, iscsi and nbd's userspace daemons may need to recreate a socket
    and/or send IO on it, and dm-multipath's daemon multipathd may need to
    send SG IO or read/write IO to figure out the state of paths and re-set
    them up.

    In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
    memalloc_*_save/restore functions to control the allocation behavior,
    but for userspace we would end up hitting an allocation that ended up
    writing data back to the same device we are trying to allocate for.
    The device is then in a state of deadlock, because to execute IO the
    device needs to allocate memory, but to allocate memory the memory
    layers want to execute IO to the device.

    Here is an example with nbd using a local userspace daemon that performs
    network IO to a remote server. We are using XFS on top of the nbd device,
    but it can happen with any FS or other modules layered on top of the nbd
    device that can write out data to free memory. Here a nbd daemon helper
    thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
    a request. This kicks off a reclaim operation which results in a WRITE to
    the nbd device and the nbd thread calling back into the mm layer.

    [ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000
    [ 1626.609193] Call Trace:
    [ 1626.609195] ? __schedule+0x29b/0x630
    [ 1626.609197] ? wait_for_completion+0xe0/0x170
    [ 1626.609198] schedule+0x30/0xb0
    [ 1626.609200] schedule_timeout+0x1f6/0x2f0
    [ 1626.609202] ? blk_finish_plug+0x21/0x2e
    [ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410
    [ 1626.609206] ? wait_for_completion+0xe0/0x170
    [ 1626.609208] wait_for_completion+0x108/0x170
    [ 1626.609210] ? wake_up_q+0x70/0x70
    [ 1626.609212] ? __xfs_buf_submit+0x12e/0x250
    [ 1626.609214] ? xfs_bwrite+0x25/0x60
    [ 1626.609215] xfs_buf_iowait+0x22/0xf0
    [ 1626.609218] __xfs_buf_submit+0x12e/0x250
    [ 1626.609220] xfs_bwrite+0x25/0x60
    [ 1626.609222] xfs_reclaim_inode+0x2e8/0x310
    [ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300
    [ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40
    [ 1626.609228] super_cache_scan+0x152/0x1a0
    [ 1626.609231] do_shrink_slab+0x12c/0x2d0
    [ 1626.609233] shrink_slab+0x9c/0x2a0
    [ 1626.609235] shrink_node+0xd7/0x470
    [ 1626.609237] do_try_to_free_pages+0xbf/0x380
    [ 1626.609240] try_to_free_pages+0xd9/0x1f0
    [ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30
    [ 1626.609251] ? ___slab_alloc+0x238/0x560
    [ 1626.609254] __alloc_pages_nodemask+0x30c/0x350
    [ 1626.609259] skb_page_frag_refill+0x97/0xd0
    [ 1626.609274] sk_page_frag_refill+0x1d/0x80
    [ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0
    [ 1626.609304] tcp_sendmsg+0x27/0x40
    [ 1626.609307] sock_sendmsg+0x54/0x60
    [ 1626.609308] ___sys_sendmsg+0x29f/0x320
    [ 1626.609313] ? sock_poll+0x66/0xb0
    [ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0
    [ 1626.609320] ? ep_send_events_proc+0xe6/0x230
    [ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0
    [ 1626.609324] ? ep_read_events_proc+0xc0/0xc0
    [ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20
    [ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230
    [ 1626.609329] ? __hrtimer_init+0xb0/0xb0
    [ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20
    [ 1626.609334] ? ep_poll+0x26c/0x4a0
    [ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0
    [ 1626.609339] ? release_sock+0x43/0x90
    [ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20
    [ 1626.609342] __sys_sendmsg+0x47/0x80
    [ 1626.609347] do_syscall_64+0x5f/0x1c0
    [ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0
    [ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This patch adds a new prctl command that daemons can use after they have
    done their initial setup, and before they start to do allocations that
    are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
    flags so both userspace block and FS threads can use it to avoid the
    allocation recursion and try to prevent from being throttled while
    writing out data to free up memory.
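
    As a rough illustration of how a daemon might use this, here is a
    minimal sketch. The message above does not name the command; the
    sketch assumes it is the PR_SET_IO_FLUSHER / PR_GET_IO_FLUSHER pair
    from include/uapi/linux/prctl.h (values 57 and 58), which requires
    CAP_SYS_RESOURCE. The helper names mark_io_flusher() and
    is_io_flusher() are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers (assumed uapi values). */
#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57
#endif
#ifndef PR_GET_IO_FLUSHER
#define PR_GET_IO_FLUSHER 58
#endif

/*
 * Hypothetical helper: call after initial setup and before the daemon
 * starts allocating memory in the IO path.
 * Returns 0 on success, -1 with errno set on failure (for example
 * EPERM without CAP_SYS_RESOURCE, or EINVAL on a kernel that predates
 * the command).
 */
static int mark_io_flusher(void)
{
	if (prctl(PR_SET_IO_FLUSHER, 1UL, 0UL, 0UL, 0UL) == -1) {
		fprintf(stderr, "PR_SET_IO_FLUSHER: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}

/* Returns 1 if the caller is marked, 0 if not, -1 on error. */
static int is_io_flusher(void)
{
	return prctl(PR_GET_IO_FLUSHER, 0UL, 0UL, 0UL, 0UL);
}
```

    A daemon would typically treat a failure here as non-fatal and keep
    running without the protection, since older kernels simply reject
    the request.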

    Signed-off-by: Mike Christie
    Acked-by: Michal Hocko
    Tested-by: Masato Suzuki
    Reviewed-by: Damien Le Moal
    Reviewed-by: Bart Van Assche
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com
    Signed-off-by: Christian Brauner

    Mike Christie
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Pull irq updates from Thomas Gleixner:
    "The interrupt department provides:

    - A mechanism to shield isolated tasks from managed interrupts:

    The affinity of managed interrupts is completely controlled by the
    kernel and user space has no influence on them. The reason is that
    the automatically assigned affinity correlates to the multi-queue
    CPU handling of block devices.

    If the generated affinity mask spans both housekeeping and isolated
    CPUs the interrupt could be routed to an isolated CPU which would
    then be disturbed by I/O submitted by a housekeeping CPU.

    The new mechanism ensures that as long as one housekeeping CPU is
    online in the assigned affinity mask the interrupt is routed to a
    housekeeping CPU.

    If there is no online housekeeping CPU in the affinity mask, then
    the interrupt is routed to an isolated CPU to keep the device queue
    intact, but unless the isolated CPU submits I/O by itself these
    interrupts are not raised.

    - A small addon to the device tree irqdomain core code to avoid
    duplication in irq chip drivers

    - Conversion of the SiFive PLIC to hierarchical domains

    - The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI,
    NXP INTMUX, Meson A1 GPIO

    - The first cut of support for the new ARM GICv4.1

    - The usual pile of fixes and improvements in core and driver code"
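
    The routing rule for managed interrupts described in the first item
    can be sketched as follows. This is an illustration only: CPUs are
    modelled as bits in a 64-bit word instead of the kernel's struct
    cpumask, and managed_irq_targets() is a hypothetical name, not the
    function the series adds.

```c
#include <stdint.h>

/* Illustrative stand-in for a cpumask: bit N set means CPU N. */
typedef uint64_t cpu_bits_t;

/*
 * Pick the effective target set for a managed interrupt.
 *   affinity:     mask the kernel assigned to the interrupt
 *   housekeeping: CPUs that are not isolated
 *   online:       CPUs that are currently online
 */
static cpu_bits_t managed_irq_targets(cpu_bits_t affinity,
				      cpu_bits_t housekeeping,
				      cpu_bits_t online)
{
	cpu_bits_t hk = affinity & housekeeping & online;

	/* Prefer housekeeping CPUs as long as one of them is online. */
	if (hk)
		return hk;

	/* Otherwise fall back to isolated CPUs to keep the queue intact. */
	return affinity & online;
}
```

    With CPUs 0-1 as housekeeping and an affinity mask covering CPUs
    0-3, the result is CPUs 0-1 while either is online, and CPUs 2-3
    only once both housekeeping CPUs are offline.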

    * tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
    genirq, sched/isolation: Isolate from handling managed interrupts
    irqchip/gic-v4.1: Allow direct invalidation of VLPIs
    irqchip/gic-v4.1: Suppress per-VLPI doorbell
    irqchip/gic-v4.1: Add VPE INVALL callback
    irqchip/gic-v4.1: Add VPE eviction callback
    irqchip/gic-v4.1: Add VPE residency callback
    irqchip/gic-v4.1: Add mask/unmask doorbell callbacks
    irqchip/gic-v4.1: Plumb skeletal VPE irqchip
    irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP
    irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set
    irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP
    irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation
    irqchip/gic-v3: Add GICv4.1 VPEID size discovery
    irqchip/gic-v3: Detect GICv4.1 supporting RVPEID
    irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells
    irqdomain: Fix a memory leak in irq_domain_push_irq()
    irqchip: Add NXP INTMUX interrupt multiplexer support
    dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer
    irqchip: Define EXYNOS_IRQ_COMBINER
    irqchip/meson-gpio: Add support for meson a1 SoCs
    ...

    Linus Torvalds
     
  • Pull core SMP updates from Thomas Gleixner:
    "A small set of SMP core code changes:

    - Rework the smp function call core code to avoid the allocation of
    an additional cpumask

    - Remove the no longer required GFP argument from on_each_cpu_cond()
    and on_each_cpu_cond_mask() and fixup the callers"

    * tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    smp: Remove allocation mask from on_each_cpu_cond.*()
    smp: Add a smp_cond_func_t argument to smp_call_function_many()
    smp: Use smp_cond_func_t as type for the conditional function

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "The timekeeping and timers department provides:

    - Time namespace support:

    If a container migrates from one host to another then it expects
    that clocks based on MONOTONIC and BOOTTIME are not subject to
    disruption. Due to different boot time and non-suspended runtime
    these clocks can differ significantly on two hosts, in the worst
    case time goes backwards which is a violation of the POSIX
    requirements.

    The time namespace addresses this problem. It allows to set offsets
    for clock MONOTONIC and BOOTTIME once after creation and before
    tasks are associated with the namespace. These offsets are taken
    into account by timers and timekeeping including the VDSO.

    Offsets for wall clock based clocks (REALTIME/TAI) are not provided
    by this mechanism. While in theory possible, the overhead and code
    complexity would be immense and not justified by the esoteric
    potential use cases which were discussed at Plumbers '18.

    The overhead for tasks in the root namespace (ie where host time
    offsets = 0) is in the noise and great effort was made to ensure
    that especially in the VDSO. If time namespace is disabled in the
    kernel configuration the code is compiled out.

    Kudos to Andrei Vagin and Dmitry Safonov who implemented this
    feature and kept on for more than a year addressing review
    comments, finding better solutions. A pleasant experience.

    - Overhaul of the alarmtimer device dependency handling to ensure
    that the init/suspend/resume ordering is correct.

    - A new clocksource/event driver for Microchip PIT64

    - Suspend/resume support for the Hyper-V clocksource

    - The usual pile of fixes, updates and improvements mostly in the
    driver code"
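
    The offset idea behind the time namespace item can be modelled in a
    few lines. This is a sketch of the arithmetic only, with
    hypothetical names (struct timens_model, offset_for_migration(),
    ns_clock_monotonic()); it does not reproduce the kernel or VDSO
    implementation.

```c
#include <stdint.h>

/* Hypothetical model of a time namespace: one offset per clock, in ns. */
struct timens_model {
	int64_t monotonic_off_ns;
	int64_t boottime_off_ns;
};

/*
 * Offset to install on the destination host so that the clock seen by
 * the container continues from its last value on the source host.
 */
static int64_t offset_for_migration(int64_t src_now_ns, int64_t dst_now_ns)
{
	return src_now_ns - dst_now_ns;
}

/* CLOCK_MONOTONIC as seen from inside the namespace. */
static int64_t ns_clock_monotonic(const struct timens_model *ns,
				  int64_t host_mono_ns)
{
	return host_mono_ns + ns->monotonic_off_ns;
}
```

    In the root namespace both offsets are zero, so the namespace view
    and the host view are identical, which matches the "host time
    offsets = 0" case mentioned above.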

    * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
    alarmtimer: Use wakeup source from alarmtimer platform device
    alarmtimer: Make alarmtimer platform device child of RTC device
    alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
    hrtimer: Add missing sparse annotation for __run_timer()
    lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
    MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
    clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
    clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
    clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
    clocksource/drivers/exynos_mct: Rename Exynos to lowercase
    clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
    clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
    clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
    clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
    clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
    clocksource/drivers/bcm2835_timer: Fix memory leak of timer
    clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
    clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
    clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
    ...

    Linus Torvalds
     
  • Pull watchdog updates from Thomas Gleixner:
    "A set of watchdog/softlockup related improvements:

    - Enforce that the watchdog timestamp is always valid on boot. The
    original implementation caused a watchdog disabled gap of one
    second in the boot process due to truncation of the underlying
    sched clock.

    The sched clock is divided by 1e9 to convert nanoseconds to
    seconds. So for the first second of the boot process the result is
    0 which is at the same time the indicator to disable the watchdog.

    The trivial fix is to change the disabled indicator to ULONG_MAX.

    - Two cleanup patches removing unused and redundant code which got
    forgotten to be cleaned up in previous changes"
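
    The truncation problem in the first item is easy to reproduce in
    isolation. The sketch below follows the description above (division
    by 1e9) and uses hypothetical names; it is not the kernel's
    watchdog code.

```c
#include <limits.h>

#define NSEC_PER_SEC 1000000000ULL

/* Marker values meaning "watchdog disabled" before and after the fix. */
#define OLD_DISABLED 0UL
#define NEW_DISABLED ULONG_MAX

/* Convert a sched clock reading in nanoseconds to whole seconds. */
static unsigned long ts_seconds(unsigned long long sched_clock_ns)
{
	return (unsigned long)(sched_clock_ns / NSEC_PER_SEC);
}

/* The watchdog treats a timestamp equal to the marker as "disabled". */
static int watchdog_armed(unsigned long touch_ts, unsigned long disabled_marker)
{
	return touch_ts != disabled_marker;
}
```

    Any reading below one second truncates to 0, which collides with
    the old marker and leaves the watchdog disabled; with ULONG_MAX as
    the marker the same timestamp is treated as valid.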

    * tag 'core-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    watchdog/softlockup: Enforce that timestamp is valid on boot
    watchdog/softlockup: Remove obsolete check of last reported task
    watchdog: Remove soft_lockup_hrtimer_cnt and related code

    Linus Torvalds
     
  • Pull timer fixes from Thomas Gleixner:
    "Two fixes for the generic VDSO code which missed 5.5:

    - Make the update to the coarse timekeeper unconditional.

    This is required because the coarse timekeeper interfaces in the
    VDSO do not depend on a VDSO capable clocksource. If the system
    does not have a VDSO capable clocksource and the update is
    depending on the VDSO capable clocksource, the coarse VDSO
    interfaces would operate on stale data forever.

    - Invert the logic of __arch_update_vdso_data() to avoid further head
    scratching.

    Tripped over this several times while analyzing the update problem
    above"

    * tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lib/vdso: Update coarse timekeeper unconditionally
    lib/vdso: Make __arch_update_vdso_data() logic understandable

    Linus Torvalds
     
  • Pull audit update from Paul Moore:
    "One small audit patch for the Linux v5.6 merge window, and
    unsurprisingly it passes our test suite with flying colors"

    * tag 'audit-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
    audit: Add __rcu annotation to RCU pointer

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:

    - cgroup2 interface for hugetlb controller. I think this was the last
    remaining bit which was missing from cgroup2

    - fixes for race and a spurious warning in threaded cgroup handling

    - other minor changes

    * 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    iocost: Fix iocost_monitor.py due to helper type mismatch
    cgroup: Prevent double killing of css when enabling threaded cgroup
    cgroup: fix function name in comment
    mm: hugetlb controller for cgroups v2

    Linus Torvalds
     
  • Pull workqueue updates from Tejun Heo:
    "Just a couple tracepoint patches"

    * 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: remove workqueue_work event class
    workqueue: add worker function to workqueue_execute_end tracepoint

    Linus Torvalds
     
  • Pull power management updates from Rafael Wysocki:
    "These add ACPI support to the intel_idle driver along with an admin
    guide document for it, add support for CPR (Core Power Reduction) to
    the AVS (Adaptive Voltage Scaling) subsystem, add new hardware support
    in a few places, add some new sysfs attributes, debugfs files and
    tracepoints, fix bugs and clean up a bunch of things all over.

    Specifics:

    - Update the ACPI processor driver in order to export
    acpi_processor_evaluate_cst() to the code outside of it, add ACPI
    support to the intel_idle driver based on that and clean up that
    driver somewhat (Rafael Wysocki).

    - Add an admin guide document for the intel_idle driver (Rafael
    Wysocki).

    - Clean up cpuidle core and drivers, enable compilation testing for
    some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael
    Wysocki, Yangtao Li).

    - Fix reference counting of OPP (operating performance points) table
    structures (Viresh Kumar).

    - Add support for CPR (Core Power Reduction) to the AVS (Adaptive
    Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King,
    YueHaibing).

    - Add support for TigerLake Mobile and JasperLake to the Intel RAPL
    power capping driver (Zhang Rui).

    - Update cpufreq drivers:
    - Add i.MX8MP support to imx-cpufreq-dt (Anson Huang).
    - Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva).
    - Fix cpufreq policy reference counting issues in s3c and
    brcmstb-avs (chenqiwu).
    - Fix ACPI table reference counting issue and HiSilicon quirk
    handling in the CPPC driver (Hanjun Guo).
    - Clean up spelling mistake in intel_pstate (Harry Pan).
    - Convert the kirkwood and tegra186 drivers to using
    devm_platform_ioremap_resource() (Yangtao Li).

    - Update devfreq core:
    - Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi).
    - Clean up the handling of transition statistics and allow them to
    be reset by writing 0 to the 'trans_stat' devfreq device
    attribute in sysfs (Kamil Konieczny).
    - Add 'devfreq_summary' to debugfs (Chanwoo Choi).
    - Clean up kerneldoc comments and Kconfig indentation (Krzysztof
    Kozlowski, Randy Dunlap).

    - Update devfreq drivers:
    - Add dynamic scaling for the imx8m DDR controller and clean up
    imx8m-ddrc (Leonard Crestez, YueHaibing).
    - Fix DT node reference counting and initialization error code path
    in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency
    for it (Chanwoo Choi, Yangtao Li).
    - Fix DT node reference counting in rockchip-dfi and make it use
    devm_platform_ioremap_resource() (Yangtao Li).
    - Fix excessive stack usage in exynos-ppmu (Arnd Bergmann).
    - Fix initialization error code paths in exynos-bus (Yangtao Li).
    - Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof
    Kozlowski).

    - Add tracepoints for tracking usage_count updates unrelated to
    status changes in PM-runtime (Michał Mirosław).

    - Add sysfs attribute to control the "sync on suspend" behavior
    during system-wide suspend (Jonas Meurer).

    - Switch system-wide suspend tests over to 64-bit time (Alexandre
    Belloni).

    - Make wakeup sources statistics in debugfs cover deleted ones which
    used to be the case some time ago (zhuguangqing).

    - Clean up computations carried out during hibernation, update
    messages related to hibernation and fix a spelling mistake in one
    of them (Wen Yang, Luigi Semenzato, Colin Ian King).

    - Add mailmap entry for maintainer e-mail address that has not been
    functional for several years (Rafael Wysocki)"

    * tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits)
    cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
    intel_idle: Clean up irtl_2_usec()
    intel_idle: Move 3 functions closer to their callers
    intel_idle: Annotate initialization code and data structures
    intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit()
    intel_idle: Rearrange intel_idle_cpuidle_driver_init()
    intel_idle: Clean up NULL pointer check in intel_idle_init()
    intel_idle: Fold intel_idle_probe() into intel_idle_init()
    intel_idle: Eliminate __setup_broadcast_timer()
    cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings
    cpuidle: sysfs: fix warnings when compiling with W=1
    cpuidle: coupled: fix warnings when compiling with W=1
    cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
    PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
    PM / devfreq: Add debugfs support with devfreq_summary file
    Documentation: admin-guide: PM: Add intel_idle document
    cpuidle: arm: Enable compile testing for some of drivers
    PM-runtime: add tracepoints for usage_count changes
    cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
    PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Will Deacon:
    "The changes are a real mixed bag this time around.

    The only scary looking one from the diffstat is the uapi change to
    asm-generic/mman-common.h, but this has been acked by Arnd and is
    actually just adding a pair of comments in an attempt to prevent
    allocation of some PROT values which tend to get used for
    arch-specific purposes. We'll be using them for Branch Target
    Identification (a CFI-like hardening feature), which is currently
    under review on the mailing list.

    New architecture features:

    - Support for Armv8.5 E0PD, which benefits KASLR in the same way as
    KPTI but without the overhead. This allows KPTI to be disabled on
    CPUs that are not affected by Meltdown, even if KASLR is enabled.

    - Initial support for the Armv8.5 RNG instructions, which claim to
    provide access to a high bandwidth, cryptographically secure
    hardware random number generator. As well as exposing these to
    userspace, we also use them as part of the KASLR seed and to seed
    the crng once all CPUs have come online.

    - Advertise a bunch of new instructions to userspace, including
    support for Data Gathering Hint, Matrix Multiply and 16-bit
    floating point.

    Kexec:

    - Cleanups in preparation for relocating with the MMU enabled

    - Support for loading crash dump kernels with kexec_file_load()

    Perf and PMU drivers:

    - Cleanups and non-critical fixes for a couple of system PMU drivers

    FPU-less (aka broken) CPU support:

    - Considerable fixes to support CPUs without the FP/SIMD extensions,
    including their presence in heterogeneous systems. Good luck
    finding a 64-bit userspace that handles this.

    Modern assembly function annotations:

    - Start migrating our use of ENTRY() and ENDPROC() over to the
    new-fangled SYM_{CODE,FUNC}_{START,END} macros, which are intended
    to aid debuggers

    Kbuild:

    - Cleanup detection of LSE support in the assembler by introducing
    'as-instr'

    - Remove compressed Image files when building clean targets

    IP checksumming:

    - Implement optimised IPv4 checksumming routine when hardware offload
    is not in use. An IPv6 version is in the works, pending testing.

    Hardware errata:

    - Work around Cortex-A55 erratum #1530923

    Shadow call stack:

    - Work around some issues with Clang's integrated assembler not
    liking our perfectly reasonable assembly code

    - Avoid allocating the X18 register, so that it can be used to hold
    the shadow call stack pointer in future

    ACPI:

    - Fix ID count checking in IORT code. This may regress broken
    firmware that happened to work with the old implementation, in
    which case we'll have to revert it and try something else

    - Fix DAIF corruption on return from GHES handler with pseudo-NMIs

    Miscellaneous:

    - Whitelist some CPUs that are unaffected by Spectre-v2

    - Reduce frequency of ASID rollover when KPTI is compiled in but
    inactive

    - Reserve a couple of arch-specific PROT flags that are already used
    by Sparc and PowerPC and are planned for later use with BTI on
    arm64

    - Preparatory cleanup of our entry assembly code in preparation for
    moving more of it into C later on

    - Refactoring and cleanup"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (73 commits)
    arm64: acpi: fix DAIF manipulation with pNMI
    arm64: kconfig: Fix alignment of E0PD help text
    arm64: Use v8.5-RNG entropy for KASLR seed
    arm64: Implement archrandom.h for ARMv8.5-RNG
    arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
    arm64: entry: Avoid empty alternatives entries
    arm64: Kconfig: select HAVE_FUTEX_CMPXCHG
    arm64: csum: Fix pathological zero-length calls
    arm64: entry: cleanup sp_el0 manipulation
    arm64: entry: cleanup el0 svc handler naming
    arm64: entry: mark all entry code as notrace
    arm64: assembler: remove smp_dmb macro
    arm64: assembler: remove inherit_daif macro
    ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()
    mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use
    arm64: Use macros instead of hard-coded constants for MAIR_EL1
    arm64: Add KRYO{3,4}XX CPU cores to spectre-v2 safe list
    arm64: kernel: avoid x18 in __cpu_soft_restart
    arm64: kvm: stop treating register x18 as caller save
    arm64/lib: copy_page: avoid x18 register in assembler code
    ...

    Linus Torvalds
     

27 Jan, 2020

6 commits

  • Thomas Richter reported:

    > Test case 66 'Use vfs_getname probe to get syscall args filenames'
    > is broken on s390, but works on x86. The test case fails with:
    >
    > [root@m35lp76 perf]# perf test -F 66
    > 66: Use vfs_getname probe to get syscall args filenames
    > :Recording open file:
    > [ perf record: Woken up 1 times to write data ]
    > [ perf record: Captured and wrote 0.004 MB /tmp/__perf_test.perf.data.TCdYj\
    > (20 samples) ]
    > Looking at perf.data file for vfs_getname records for the file we touched:
    > FAILED!
    > [root@m35lp76 perf]#

    The root cause was the print_fmt of the kprobe event that referenced the
    "ustring"

    > Setting up the kprobe event using perf command:
    >
    > # ./perf probe "vfs_getname=getname_flags:72 pathname=filename:ustring"
    >
    > generates this format file:
    > [root@m35lp76 perf]# cat /sys/kernel/debug/tracing/events/probe/\
    > vfs_getname/format
    > name: vfs_getname
    > ID: 1172
    > format:
    > field:unsigned short common_type; offset:0; size:2; signed:0;
    > field:unsigned char common_flags; offset:2; size:1; signed:0;
    > field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    > field:int common_pid; offset:4; size:4; signed:1;
    >
    > field:unsigned long __probe_ip; offset:8; size:8; signed:0;
    > field:__data_loc char[] pathname; offset:16; size:4; signed:1;
    >
    > print fmt: "(%lx) pathname=\"%s\"", REC->__probe_ip, REC->pathname

    Instead of using "__get_str(pathname)" it referenced it directly.
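
    Why the direct reference fails: the pathname field (offset 16,
    size 4) does not contain the string, only a 32-bit __data_loc
    descriptor pointing at it. The sketch below assumes the usual
    ftrace encoding, with the offset from the start of the record in
    the low 16 bits and the length in the high 16 bits; the helper
    names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read the 32-bit __data_loc descriptor stored at field_off. */
static uint32_t data_loc_read(const uint8_t *rec, size_t field_off)
{
	uint32_t loc;

	memcpy(&loc, rec + field_off, sizeof(loc)); /* unaligned-safe */
	return loc;
}

/* Low 16 bits: offset of the string from the start of the record. */
static const char *data_loc_str(const uint8_t *rec, size_t field_off)
{
	return (const char *)(rec + (data_loc_read(rec, field_off) & 0xffff));
}

/* High 16 bits: length of the string data, including the NUL. */
static uint16_t data_loc_len(const uint8_t *rec, size_t field_off)
{
	return (uint16_t)(data_loc_read(rec, field_off) >> 16);
}
```

    Passing the raw field to a "%s" conversion treats the descriptor
    itself as a pointer, whereas __get_str() performs the lookup that
    data_loc_str() models here.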

    Link: http://lkml.kernel.org/r/20200124100742.4050c15e@gandalf.local.home

    Cc: stable@vger.kernel.org
    Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string")
    Acked-by: Masami Hiramatsu
    Reported-by: Thomas Richter
    Tested-by: Thomas Richter
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2020-01-27

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 20 non-merge commits during the last 5 day(s) which contain
    a total of 24 files changed, 433 insertions(+), 104 deletions(-).

    The main changes are:

    1) Make BPF trampolines and dispatcher aware for the stack unwinder, from Jiri Olsa.

    2) Improve handling of failed CO-RE relocations in libbpf, from Andrii Nakryiko.

    3) Several fixes to BPF sockmap and reuseport selftests, from Lorenz Bauer.

    4) Various cleanups in BPF devmap's XDP flush code, from John Fastabend.

    5) Fix BPF flow dissector when used with port ranges, from Yoshiki Komachi.

    6) Fix bpffs' map_seq_next callback to always inc position index, from Vasily Averin.

    7) Allow overriding LLVM tooling for runqslower utility, from Andrey Ignatov.

    8) Silence false-positive lockdep splats in devmap hash lookup, from Amol Grover.

    9) Fix fentry/fexit selftests to initialize a variable before use, from John Sperbeck.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • * pm-cpufreq:
    cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
    cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
    cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
    cpufreq: s3c: fix unbalances of cpufreq policy refcount
    cpufreq: imx-cpufreq-dt: Add i.MX8MP support
    cpufreq: Use imx-cpufreq-dt for i.MX8MP's speed grading
    cpufreq: tegra186: convert to devm_platform_ioremap_resource
    cpufreq: kirkwood: convert to devm_platform_ioremap_resource
    cpufreq: CPPC: put ACPI table after using it
    cpufreq : CPPC: Break out if HiSilicon CPPC workaround is matched

    * pm-sleep:
    PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
    PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
    PM: hibernate: Add more logging on hibernation failure
    PM: hibernate: improve arithmetic division in preallocate_highmem_fraction()
    PM: wakeup: Show statistics for deleted wakeup sources again
    PM: sleep: Switch to rtc_time64_to_tm()/rtc_tm_to_time64()

    Rafael J. Wysocki
     
    Now that we depend on call_rcu() and synchronize_rcu() to also wait
    for preempt-disabled regions to complete, the RCU read-side critical
    section in __dev_map_flush() is no longer required, except in a few
    special cases in drivers that need it for other reasons.

    These originally ensured the map reference was safe while a map was
    also being free'd. And additionally that bpf program updates via
    ndo_bpf did not happen while flush updates were in flight. But flush
    by new rules can only be called from preempt-disabled NAPI context.
    The synchronize_rcu from the map free path and the call_rcu from the
    delete path will ensure the reference there is safe. So let's remove
    the rcu_read_lock and rcu_read_unlock pair to avoid any confusion
    around how this is being protected.

    If the rcu_read_lock was required it would mean errors in the above
    logic and the original patch would also be wrong.

    Now that we have done the above, we put the rcu_read_lock in the
    driver code where it is needed, in a driver-dependent way. I think
    this helps readability of the code, so we know where and why we are
    taking read locks. Most drivers will not need rcu_read_locks here,
    and XDP drivers already have rcu_read_locks in their code paths for
    reading xdp programs on the RX side, so this makes it symmetric: we
    no longer have half of the rcu critical sections defined in the
    driver and the other half in devmap.

    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Jesper Dangaard Brouer
    Link: https://lore.kernel.org/bpf/1580084042-11598-4-git-send-email-john.fastabend@gmail.com

    John Fastabend
     
    Now that we rely on synchronize_rcu and call_rcu waiting to
    exit preempt-disable regions (NAPI), let's update the comments
    to reflect this.

    Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Björn Töpel
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/1580084042-11598-2-git-send-email-john.fastabend@gmail.com

    John Fastabend
     
  • If seq_file .next function does not change position index,
    read after some lseek can generate an unexpected output.
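
    The rule being enforced can be shown with a userspace model of an
    iterator. This is a sketch, not the bpffs code: pos_t stands in for
    the kernel's loff_t, and struct table and table_next() are
    hypothetical.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's loff_t; illustration only. */
typedef long long pos_t;

struct table {
	const int *items;
	size_t count;
};

/*
 * Shaped like a seq_file .next callback: the position index advances
 * on every call, including the call that reaches the end and returns
 * NULL.
 */
static const int *table_next(const struct table *t, pos_t *pos)
{
	++*pos;
	if (*pos < 0 || (size_t)*pos >= t->count)
		return NULL;
	return &t->items[*pos];
}
```

    If the increment were skipped on the final call, the position would
    still point at the last element, and a later read could emit that
    element a second time.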

    See also: https://bugzilla.kernel.org/show_bug.cgi?id=206283

    v1 -> v2: removed missed increment in end of function

    Signed-off-by: Vasily Averin
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/eca84fdd-c374-a154-d874-6c7b55fc3bc4@virtuozzo.com

    Vasily Averin
     

26 Jan, 2020

3 commits

  • This patch fixes the following sparse errors by annotating the
    sighand_struct with __rcu

    kernel/fork.c:1511:9: error: incompatible types in comparison expression
    kernel/exit.c:100:19: error: incompatible types in comparison expression
    kernel/signal.c:1370:27: error: incompatible types in comparison expression

    This fix introduces the following sparse error in signal.c due to
    checking the sighand pointer without rcu primitives:

    kernel/signal.c:1386:21: error: incompatible types in comparison expression

    This new sparse error is also fixed in this patch.

    Signed-off-by: Madhuparna Bhowmik
    Acked-by: Paul E. McKenney
    Link: https://lore.kernel.org/r/20200124045908.26389-1-madhuparnabhowmik10@gmail.com
    Signed-off-by: Christian Brauner

    Madhuparna Bhowmik
     
  • Minor conflict in mlx5 because changes happened to code that has
    moved meanwhile.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Boot-time processing often loops in the kernel longer than one might
    prefer, which can prevent expedited grace periods from completing in
    a timely manner. This in turn triggers a splat on nohz_full CPUs. One
    could argue that long-looping code should be fixed, but on the other hand,
    boot time is a bit special.

    This commit therefore removes the splat. Later commits will add the
    splat back in, but in a way that removes false positives.

    Reported-by: Borislav Petkov
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

25 Jan, 2020

1 commit

  • When unwinding the stack we need to identify each address
    to successfully continue. Adding latch tree to keep trampolines
    for quick lookup during the unwind.

    The patch uses the first 48 bytes for the latch tree node, leaving
    4048 bytes from the rest of the page for trampoline or dispatcher
    generated code.

    It's still enough not to affect trampoline and dispatcher progs
    maximum counts.
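
    The byte budget above assumes a 4096-byte page. A minimal layout
    check, using a hypothetical struct that only mirrors the sizes
    quoted in the message (it is not the kernel's data structure):

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumes 4 KiB pages */
#define MODEL_NODE_SIZE 48u   /* latch tree node size quoted above */

/* Hypothetical image page: lookup node first, generated code after it. */
struct image_page_model {
	unsigned char node[MODEL_NODE_SIZE];
	unsigned char code[MODEL_PAGE_SIZE - MODEL_NODE_SIZE];
};

/* Fail the build if the two regions stop adding up to one page. */
_Static_assert(sizeof(struct image_page_model) == MODEL_PAGE_SIZE,
	       "node and code must fill exactly one page");
_Static_assert(sizeof(((struct image_page_model *)0)->code) == 4048u,
	       "4096 - 48 leaves 4048 bytes for generated code");
```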

    Signed-off-by: Jiri Olsa
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200123161508.915203-3-jolsa@kernel.org

    Jiri Olsa