24 Nov, 2020

1 commit

  • We call arch_cpu_idle() with RCU disabled, but then use
    local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

    Switch all arch_cpu_idle() implementations to use
    raw_local_irq_{en,dis}able() and carefully manage the
    lockdep, RCU, and tracing state like we do in entry.

    (XXX: we really should change arch_cpu_idle() to not return with
    interrupts enabled)

    Reported-by: Sven Schnelle
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

    Peter Zijlstra
     

26 Oct, 2020

2 commits

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
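
The quoting change described in this commit can be illustrated with a small user-space sketch. It assumes an ELF target and a GCC- or clang-compatible compiler; the section name ".data.demo" and the names boot_flag and read_boot_flag are hypothetical and do not come from the kernel tree.

```c
/* Drop any definition a libc header may already provide. */
#undef __section

/*
 * Quoted form: the macro passes its argument through unchanged, so the
 * caller supplies the string literal. The older form stringified the
 * argument with the # operator, roughly:
 *     #define __section(S) __attribute__((__section__(#S)))
 */
#define __section(name) __attribute__((__section__(name)))

/* Place one variable in a custom section (hypothetical name). */
static int boot_flag __section(".data.demo") = 7;

/* Accessor so the placement can be checked from ordinary code. */
int read_boot_flag(void)
{
	return boot_flag;
}
```

With the quoted form, the same spelling works whether the argument is written directly or arrives through another macro, which is the kind of compiler-dependent difference the commit message refers to.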
     
  • Pull more parisc updates from Helge Deller:

    - During this merge window O_NONBLOCK was changed to become 000200000,
    but we missed that the syscalls timerfd_create(), signalfd4(),
    eventfd2(), pipe2(), inotify_init1() and userfaultfd() do a strict
    bit-wise check of the flags parameter.

    To provide backward compatibility with existing userspace we
    introduce parisc specific wrappers for those syscalls which filter
    out the old O_NONBLOCK value and replace it with the new one.

    - Prevent the HIL bus driver from getting stuck when a keyboard or mouse
    isn't attached

    - Improve error return codes when setting rtc time

    - Minor documentation fix in pata_ns87415.c

    * 'parisc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    ata: pata_ns87415.c: Document support on parisc with superio chip
    parisc: Add wrapper syscalls to fix O_NONBLOCK flag usage
    hil/parisc: Disable HIL driver when it gets stuck
    parisc: Improve error return codes when setting rtc time

    Linus Torvalds
     

24 Oct, 2020

2 commits

  • The commit 75ae04206a4d ("parisc: Define O_NONBLOCK to become
    000200000") changed the O_NONBLOCK constant to have only one bit set
    (like all other architectures). This change broke some existing
    userspace code (e.g. udevadm, systemd-udevd, elogind) which called
    specific syscalls which do strict value checking on their flag
    parameter.

    This patch adds wrapper functions for the relevant syscalls. The
    wrappers mask out any old invalid O_NONBLOCK flags, report in the
    syslog if the old O_NONBLOCK value was used, and then call the target
    syscall with the new O_NONBLOCK value.

    Fixes: 75ae04206a4d ("parisc: Define O_NONBLOCK to become 000200000")
    Signed-off-by: Helge Deller
    Tested-by: Meelis Roos
    Tested-by: Jeroen Roovers

    Helge Deller
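
The flag filtering that the wrappers perform might look roughly like the following user-space sketch. It is not the kernel code: the helper name fix_o_nonblock is hypothetical, and the old value 000200004 is an assumption based on the statement elsewhere in this log that one bit was dropped from the define.

```c
#include <stdio.h>

/* New parisc value, as stated in the commit message. */
#define O_NONBLOCK_NEW      000200000
/* Assumption: the old value carried one extra low bit. */
#define O_NONBLOCK_OLD      000200004
/* Bits that exist only in the old value. */
#define O_NONBLOCK_MASK_OUT (O_NONBLOCK_OLD & ~O_NONBLOCK_NEW)

/*
 * Hypothetical helper: report use of the stale bit, then clear it.
 * Because the old value contains the new bit, clearing the extra bit
 * leaves the new O_NONBLOCK value in place.
 */
static int fix_o_nonblock(int flags)
{
	if (flags & O_NONBLOCK_MASK_OUT)
		fprintf(stderr, "caller uses a deprecated O_NONBLOCK value\n");
	return flags & ~O_NONBLOCK_MASK_OUT;
}
```

A wrapper for a syscall such as pipe2() would pass its flags argument through a helper like this before calling the real implementation, so the strict bit-wise check only ever sees the new value.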
     
  • Pull arch task_work cleanups from Jens Axboe:
    "Two cleanups that don't fit other categories:

    - Finally get the task_work_add() cleanup done properly, so we don't
    have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
    all callers, and also fixes up the documentation for
    task_work_add().

    - While working on some TIF related changes for 5.11, this
    TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
    duplication for how that is handled"

    * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
    task_work: cleanup notification modes
    tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

    Linus Torvalds
     

23 Oct, 2020

1 commit


19 Oct, 2020

1 commit

  • There is a use case in which System Management Software (SMS) wants to
    give a memory hint such as MADV_[COLD|PAGEOUT] to other processes; in
    the case of Android, that software is the ActivityManagerService.

    The information required to make the reclaim decision is not known to the
    app. Instead, it is known to the centralized userspace
    daemon(ActivityManagerService), and that daemon must be able to initiate
    reclaim on its own without any app involvement.

    To solve the issue, this patch introduces a new syscall,
    process_madvise(2). It uses the pidfd of an external process to give the
    hint. It also supports a vector of address ranges, because an Android app
    has thousands of vmas due to zygote, so calling the syscall once per vma
    would waste CPU time and power. (In testing, 2000 per-vma syscalls vs. one
    vectored syscall showed a 15% performance improvement. I think the gain
    would be bigger in real practice because the test ran in a very
    cache-friendly environment.)

    Another potential use case for the vector range is to amortize the cost
    of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
    benefit users like TCP receive zerocopy and malloc implementations. In
    the future, we may find more use cases for other advice values, so let's
    provide this in the API now, since we are introducing a new syscall
    anyway. With that, an existing madvise(2) user could replace it with
    process_madvise(2) on its own pid if it wants batched address range
    support.

    Since it could affect another process's address ranges, only a privileged
    process (PTRACE_MODE_ATTACH_FSCREDS), or one that otherwise has the right
    to ptrace the target (e.g., by having the same UID), can use it
    successfully. The flags argument is reserved for future use if we need to
    extend the API.

    I think supporting every hint that madvise supports, or will support, in
    process_madvise is rather risky, because we are not sure all hints make
    sense when issued from an external process, and the implementation of a
    hint may rely on the caller being in the current context, so it could be
    error-prone. Thus, I limited the hints to MADV_[COLD|PAGEOUT] in this
    patch.

    If someone wants to add other hints, we can hear the use case and review
    each hint individually. That is safer for maintenance than introducing a
    buggy syscall that is hard to fix later.

    So finally, the API is as follows,

    ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                            unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
    The process_madvise() system call is used to give advice or directions
    to the kernel about address ranges of an external process as well as
    of the local process. It applies the advice to the address ranges of
    the process described by iovec and vlen. The goal of such advice is to
    improve system or application performance.

    The pidfd argument selects the process referred to by the PID file
    descriptor specified in pidfd. (See pidfd_open(2) for further information.)

    The pointer iovec points to an array of iovec structures, defined in
    <sys/uio.h> as:

    struct iovec {
        void *iov_base; /* starting address */
        size_t iov_len; /* number of bytes to be advised */
    };

    The iovec describes address ranges beginning at the address iov_base
    and spanning iov_len bytes.

    The vlen represents the number of elements in iovec.

    The advice is indicated in the advice argument, which is one of the
    following at this moment if the target process specified by pidfd is
    external.

    MADV_COLD
    MADV_PAGEOUT

    Permission to provide a hint to external process is governed by a
    ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

    process_madvise() supports every advice value that madvise(2) has if the
    target process is in the same thread group as the calling process, so a
    user could use process_madvise(2) to extend existing madvise(2) usage to
    support vector address ranges.

    RETURN VALUE
    On success, process_madvise() returns the number of bytes advised.
    This return value may be less than the total number of requested
    bytes, if an error occurred. The caller should check return value
    to determine whether a partial advice occurred.

    FAQ:

    Q.1 - Why does any external entity have better knowledge?

    Quote from Sandeep

    "For Android, every application (including the special SystemServer)
    are forked from Zygote. The reason of course is to share as many
    libraries and classes between the two as possible to benefit from the
    preloading during boot.

    After applications start, (almost) all of the APIs end up calling into
    this SystemServer process over IPC (binder) and back to the
    application.

    In a fully running system, the SystemServer monitors every single
    process periodically to calculate their PSS / RSS and also decides
    which process is "important" to the user for interactivity.

    So, because of how these processes start _and_ the fact that the
    SystemServer is looping to monitor each process, it does tend to *know*
    which address range of the application is not used / useful.

    Besides, we can never rely on applications to clean things up
    themselves. We've had the "hey app1, the system is low on memory,
    please trim your memory usage down" notifications for a long time[1].
    They rely on applications honoring the broadcasts and very few do.

    So, if we want to avoid the inevitable killing of the application and
    restarting it, some way to be able to tell the OS about unimportant
    memory in these applications will be useful.

    - ssp

    Q.2 - How is the race (i.e., object validation) handled between giving
    a hint from an external process and the target process changing its own
    address space?

    process_madvise operates on the target process's address space as it
    exists at the instant that process_madvise is called. If the target
    process can run between the time the calling process inspects the
    target process's address space and the time that process_madvise is
    actually called, process_madvise may operate on memory regions that
    the calling process does not expect. It's the
    responsibility of the process calling process_madvise to close this
    race condition. For example, the calling process can suspend the
    target process with ptrace, SIGSTOP, or the freezer cgroup so that it
    doesn't have an opportunity to change its own address space before
    process_madvise is called. Another option is to operate on memory
    regions that the caller knows a priori will be unchanged in the target
    process. Yet another option is to accept the race for certain
    process_madvise calls after reasoning that mistargeting will do no
    harm. The suggested API itself does not provide synchronization. The
    same applies to other APIs such as move_pages and process_vm_writev.

    The race isn't really a problem though. Why is it so wrong to require
    that callers do their own synchronization in some manner? Nobody
    objects to write(2) merely because it's possible for two processes to
    open the same file and clobber each other's writes --- instead, we tell
    people to use flock or something. Think about mmap. It never
    guarantees newly allocated address space is still valid when the user
    tries to access it because other threads could unmap the memory right
    before. That's where we need synchronization using another API or a
    userspace design. It shouldn't be part of the API itself. If someone
    needs more fine-grained synchronization than process level, two ideas
    were suggested - cookie[2] and anon-fd[3]. Both are applicable via
    the last reserved argument of the API, but I don't think either is
    necessary right now, since we already have ways to prevent the race,
    so I don't want to add complexity with a more fine-grained
    model.

    To keep the API extensible, the last argument is reserved so we could
    support this in the future if someone really needs it.

    Q.3 - Why doesn't ptrace work?

    Injecting an madvise in the target process using ptrace would not work
    for us because such an injected madvise would have to be executed by the
    target process, which means that process would have to be runnable, and
    that creates the risk of the abovementioned race and of hinting the wrong
    VMA. Furthermore, we want to act on the hint in the caller's context, not
    the callee's, because the callee is usually limited by cpuset/cgroups or
    even in a frozen state, so it can't act by itself quickly enough, which
    causes more thrashing/kills. It also doesn't work if the target process
    is already ptraced (e.g., by strace, a debugger, or minidump), because a
    process can have at most one ptracer.

    [1] https://developer.android.com/topic/performance/memory"

    [2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

    [3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

    [minchan@kernel.org: fix process_madvise build break for arm64]
    Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
    [minchan@kernel.org: fix build error for mips of process_madvise]
    Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
    [akpm@linux-foundation.org: fix patch ordering issue]
    [akpm@linux-foundation.org: fix arm64 whoops]
    [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
    [akpm@linux-foundation.org: fix i386 build]
    [sfr@canb.auug.org.au: fix syscall numbering]
    Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
    [sfr@canb.auug.org.au: madvise.c needs compat.h]
    Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
    [minchan@kernel.org: fix mips build]
    Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
    [yuehaibing@huawei.com: remove duplicate header which is included twice]
    Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
    [minchan@kernel.org: do not use helper functions for process_madvise]
    Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
    [akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
    [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
    Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

    Signed-off-by: Minchan Kim
    Signed-off-by: YueHaibing
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Reviewed-by: Suren Baghdasaryan
    Reviewed-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Alexander Duyck
    Cc: Brian Geffon
    Cc: Christian Brauner
    Cc: Daniel Colascione
    Cc: Jann Horn
    Cc: Jens Axboe
    Cc: Joel Fernandes
    Cc: Johannes Weiner
    Cc: John Dias
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Oleksandr Natalenko
    Cc: Sandeep Patil
    Cc: SeongJae Park
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Tim Murray
    Cc: Florian Weimer
    Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
    Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
    Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
    Signed-off-by: Linus Torvalds

    Minchan Kim
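
The process_madvise() interface described in this commit could be exercised from user space along these lines. This is a minimal sketch, not a reference implementation: the helpers requested_bytes and advise_cold are hypothetical, and the fallback numbers (syscall 440 for process_madvise, 20 for MADV_COLD) are assumptions for systems whose headers predate the syscall. A pidfd would typically come from pidfd_open(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumed fallbacks for older headers. */
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/* Sum of iov_len over the vector: the byte count a full success returns. */
static size_t requested_bytes(const struct iovec *iov, size_t vlen)
{
	size_t total = 0;

	for (size_t i = 0; i < vlen; i++)
		total += iov[i].iov_len;
	return total;
}

/*
 * Hint that the given ranges of the process behind pidfd are cold.
 * Returns 1 if every requested byte was advised, 0 on partial advice,
 * and -1 on error (errno is set by the syscall).
 */
static int advise_cold(int pidfd, const struct iovec *iov, size_t vlen)
{
	long ret = syscall(SYS_process_madvise, pidfd, iov, vlen,
			   MADV_COLD, 0U);

	if (ret < 0)
		return -1;
	return (size_t)ret == requested_bytes(iov, vlen);
}
```

Comparing the return value against the requested total follows the RETURN VALUE text in the commit message, which says the caller should check for partial advice. The sketch does nothing about the race discussed in Q.2; the caller remains responsible for that.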
     

18 Oct, 2020

1 commit


16 Oct, 2020

2 commits

  • Pull parisc updates from Helge Deller:

    - Added fw_cfg support for parisc on qemu

    - Added font support in sti text console driver for byte- and word-mode
    ROMs

    - Switch to more fine grained lws locks and improve spinlock handling

    - Add ioread64_hi_lo() and iowrite64_hi_lo() to avoid 0-day linking
    errors

    - Mark pointers volatile in __xchg8(), __xchg32() and __xchg64() to
    help compiler

    - Header file cleanups, mostly removal of unused HP-UX compat defines

    - Drop one bit from our O_NONBLOCK define to become now 000200000

    - Add MAP_UNINITIALIZED define to avoid userspace compile errors

    - Drop CONFIG_IDE from defconfigs

    - Speed up synchronize_caches() on UP machines

    - Rewrite tlb flush threshold calculation

    - Comment fixes and cleanups

    * 'parisc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc/sticon: Add user font support
    parisc/sticon: Always register sticon console driver
    parisc: Add MAP_UNINITIALIZED define
    parisc: Improve spinlock handling
    parisc: Install vmlinuz instead of zImage file
    parisc: Rewrite tlb flush threshold calculation
    parisc: Switch to more fine grained lws locks
    parisc: Mark pointers volatile in __xchg8(), __xchg32() and __xchg64()
    parisc: Fix comments and enable interrupts later
    parisc: Add alternative patching to synchronize_caches define
    parisc: Add ioread64_hi_lo() and iowrite64_hi_lo()
    parisc: disable CONFIG_IDE in defconfigs
    parisc: Drop useless comments in uapi/asm/signal.h
    parisc: Define O_NONBLOCK to become 000200000
    parisc: Drop HP-UX specific fcntl and signal flags
    parisc: Avoid external interrupts when IPI finishes
    parisc: Add qemu fw_cfg interface
    fw_cfg: Add support for parisc architecture

    Linus Torvalds
     
  • Pull dma-mapping updates from Christoph Hellwig:

    - rework the non-coherent DMA allocator

    - move private definitions out of

    - lower CMA_ALIGNMENT (Paul Cercueil)

    - remove the omap1 dma address translation in favor of the common code

    - make dma-direct aware of multiple dma offset ranges (Jim Quinlan)

    - support per-node DMA CMA areas (Barry Song)

    - increase the default seg boundary limit (Nicolin Chen)

    - misc fixes (Robin Murphy, Thomas Tai, Xu Wang)

    - various cleanups

    * tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
    ARM/ixp4xx: add a missing include of dma-map-ops.h
    dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
    dma-direct: factor out a dma_direct_alloc_from_pool helper
    dma-direct check for highmem pages in dma_direct_alloc_pages
    dma-mapping: merge into
    dma-mapping: move large parts of to kernel/dma
    dma-mapping: move dma-debug.h to kernel/dma/
    dma-mapping: remove
    dma-mapping: merge into
    dma-contiguous: remove dma_contiguous_set_default
    dma-contiguous: remove dev_set_cma_area
    dma-contiguous: remove dma_declare_contiguous
    dma-mapping: split
    cma: decrease CMA_ALIGNMENT lower limit to 2
    firewire-ohci: use dma_alloc_pages
    dma-iommu: implement ->alloc_noncoherent
    dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
    dma-mapping: add a new dma_alloc_pages API
    dma-mapping: remove dma_cache_sync
    53c700: convert to dma_alloc_noncoherent
    ...

    Linus Torvalds
     

15 Oct, 2020

5 commits


13 Oct, 2020

4 commits

  • Pull compat mount cleanups from Al Viro:
    "The last remnants of mount(2) compat buried by Christoph.

    Buried into NFS, that is.

    Generally I'm less enthusiastic about "let's use in_compat_syscall()
    deep in call chain" kind of approach than Christoph seems to be, but
    in this case it's warranted - that had been an NFS-specific wart,
    hopefully not to be repeated in any other filesystems (read: any new
    filesystem introducing non-text mount options will get NAKed even if
    it doesn't mess the layout up).

    IOW, not worth trying to grow an infrastructure that would avoid that
    use of in_compat_syscall()..."

    * 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: remove compat_sys_mount
    fs,nfs: lift compat nfs4 mount data handling into the nfs code
    nfs: simplify nfs4_parse_monolithic

    Linus Torvalds
     
  • Pull compat iovec cleanups from Al Viro:
    "Christoph's series around import_iovec() and compat variant thereof"

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    security/keys: remove compat_keyctl_instantiate_key_iov
    mm: remove compat_process_vm_{readv,writev}
    fs: remove compat_sys_vmsplice
    fs: remove the compat readv/writev syscalls
    fs: remove various compat readv/writev helpers
    iov_iter: transparently handle compat iovecs in import_iovec
    iov_iter: refactor rw_copy_check_uvector and import_iovec
    iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
    compat.h: fix a spelling error in

    Linus Torvalds
     
  • Pull perf/kprobes updates from Ingo Molnar:
    "This prepares to unify the kretprobe trampoline handler and make
    kretprobe lockless (those patches are still work in progress)"

    * tag 'perf-kprobes-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
    kprobes: Make local functions static
    kprobes: Free kretprobe_instance with RCU callback
    kprobes: Remove NMI context check
    sparc: kprobes: Use generic kretprobe trampoline handler
    sh: kprobes: Use generic kretprobe trampoline handler
    s390: kprobes: Use generic kretprobe trampoline handler
    powerpc: kprobes: Use generic kretprobe trampoline handler
    parisc: kprobes: Use generic kretprobe trampoline handler
    mips: kprobes: Use generic kretprobe trampoline handler
    ia64: kprobes: Use generic kretprobe trampoline handler
    csky: kprobes: Use generic kretprobe trampoline handler
    arc: kprobes: Use generic kretprobe trampoline handler
    arm64: kprobes: Use generic kretprobe trampoline handler
    arm: kprobes: Use generic kretprobe trampoline handler
    x86/kprobes: Use generic kretprobe trampoline handler
    kprobes: Add generic kretprobe trampoline handler

    Linus Torvalds
     
  • Pull orphan section checking from Ingo Molnar:
    "Orphan link sections were a long-standing source of obscure bugs,
    because the heuristics that various linkers & compilers use to handle
    them (include these bits into the output image vs discarding them
    silently) are both highly idiosyncratic and also version dependent.

    Instead of this historically problematic mess, this tree by Kees Cook
    (et al) adds build time asserts and build time warnings if there's any
    orphan section in the kernel or if a section is not sized as expected.

    And because we relied on so many silent assumptions in this area, fix
    a metric ton of dependencies and some outright bugs related to this,
    before we can finally enable the checks on the x86, ARM and ARM64
    platforms"

    * tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    x86/boot/compressed: Warn on orphan section placement
    x86/build: Warn on orphan section placement
    arm/boot: Warn on orphan section placement
    arm/build: Warn on orphan section placement
    arm64/build: Warn on orphan section placement
    x86/boot/compressed: Add missing debugging sections to output
    x86/boot/compressed: Remove, discard, or assert for unwanted sections
    x86/boot/compressed: Reorganize zero-size section asserts
    x86/build: Add asserts for unwanted sections
    x86/build: Enforce an empty .got.plt section
    x86/asm: Avoid generating unused kprobe sections
    arm/boot: Handle all sections explicitly
    arm/build: Assert for unwanted sections
    arm/build: Add missing sections
    arm/build: Explicitly keep .ARM.attributes sections
    arm/build: Refactor linker script headers
    arm64/build: Assert for unwanted sections
    arm64/build: Add missing DWARF sections
    arm64/build: Use common DISCARDS in linker script
    arm64/build: Remove .eh_frame* sections due to unwind tables
    ...

    Linus Torvalds
     

06 Oct, 2020

2 commits


03 Oct, 2020

3 commits


25 Sep, 2020

1 commit


23 Sep, 2020

1 commit


08 Sep, 2020

1 commit


01 Sep, 2020

1 commit

  • The .comment section doesn't belong in STABS_DEBUG. Split it out into a
    new macro named ELF_DETAILS. This will gain other non-debug sections
    that need to be accounted for when linking with --orphan-handling=warn.

    Signed-off-by: Kees Cook
    Signed-off-by: Ingo Molnar
    Cc: linux-arch@vger.kernel.org
    Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org

    Kees Cook
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
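
As an illustration of the fallthrough pseudo-keyword, here is a small stand-alone sketch. The macro definition approximates the kernel's by testing for the compiler attribute and falling back to an empty statement; the function count_steps is a hypothetical example, not code from the converted files.

```c
/* Use the statement attribute when the compiler offers it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Count how many stages remain, starting from the given one. */
static int count_steps(int stage)
{
	int steps = 0;

	switch (stage) {
	case 0:
		steps++;
		fallthrough;	/* was: a fall-through comment */
	case 1:
		steps++;
		fallthrough;
	case 2:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Unlike a comment, the macro expands to something the compiler can check, so a missing marker can be reported by -Wimplicit-fallthrough.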
     

15 Aug, 2020

1 commit

  • Since commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
    sys_sysctl is actually unavailable: any input can only return an error.

    We have been warning people about using the sysctl system call for years
    and believe there are no more users. Even if there are users of this
    interface, if they have not complained or fixed their code by now, they
    probably are not going to, so there is no point in warning them any
    longer.

    So completely remove sys_sysctl on all architectures.

    [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
    Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com

    Signed-off-by: Xiaoming Ni
    Signed-off-by: Andrew Morton
    Acked-by: Will Deacon [arm/arm64]
    Acked-by: "Eric W. Biederman"
    Cc: Aleksa Sarai
    Cc: Alexander Shishkin
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Bin Meng
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Catalin Marinas
    Cc: chenzefeng
    Cc: Christian Borntraeger
    Cc: Christian Brauner
    Cc: Chris Zankel
    Cc: David Howells
    Cc: David S. Miller
    Cc: Diego Elio Pettenò
    Cc: Dmitry Vyukov
    Cc: Dominik Brodowski
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Iurii Zaikin
    Cc: Ivan Kokshaysky
    Cc: James Bottomley
    Cc: Jens Axboe
    Cc: Jiri Olsa
    Cc: Kars de Jong
    Cc: Kees Cook
    Cc: Krzysztof Kozlowski
    Cc: Luis Chamberlain
    Cc: Marco Elver
    Cc: Mark Rutland
    Cc: Martin K. Petersen
    Cc: Masahiro Yamada
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miklos Szeredi
    Cc: Minchan Kim
    Cc: Namhyung Kim
    Cc: Naveen N. Rao
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Olof Johansson
    Cc: Paul Burton
    Cc: "Paul E. McKenney"
    Cc: Paul Mackerras
    Cc: Peter Zijlstra (Intel)
    Cc: Randy Dunlap
    Cc: Ravi Bangoria
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Sami Tolvanen
    Cc: Sargun Dhillon
    Cc: Stephen Rothwell
    Cc: Sudeep Holla
    Cc: Sven Schnelle
    Cc: Thiago Jung Bauermann
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vasily Gorbik
    Cc: Vlastimil Babka
    Cc: Yoshinori Sato
    Cc: Zhou Yanjie
    Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
    Signed-off-by: Linus Torvalds

    Xiaoming Ni
     

13 Aug, 2020

1 commit

  • Pull more parisc updates from Helge Deller:

    - Oscar Carter contributed a patch which fixes parisc's usage of
    dereference_function_descriptor() and thus will allow using the
    -Wcast-function-type compiler option in the top-level Makefile

    - Sven Schnelle fixed a bug in the SBA code to prevent crashes during
    kexec

    - John David Anglin provided implementations for the
    __smp_store_release() and __smp_load_acquire() barriers which avoid
    using the sync assembler instruction and thus speed up barrier paths

    - Some whitespace cleanups in parisc's atomic.h header file

    * 'parisc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Implement __smp_store_release and __smp_load_acquire barriers
    parisc: mask out enable and reserved bits from sba imask
    parisc: Whitespace cleanups in atomic.h
    parisc/kernel/ftrace: Remove function callback casts
    sections.h: dereference_function_descriptor() returns void pointer

    Linus Torvalds
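
The store-release and load-acquire pairing mentioned in this pull can be shown with a user-space analogue. The sketch below uses C11 <stdatomic.h>, not the parisc implementation; the names payload, ready, publish and try_consume are hypothetical.

```c
#include <stdatomic.h>

static int payload;		/* plain data guarded by the flag */
static atomic_int ready;	/* 0 = not published, 1 = published */

/* Writer: fill in the data, then publish it with a release store. */
static void publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Reader: an acquire load of the flag orders the later read of
 * payload after the writer's store. Returns 1 and fills *out if
 * the data was published, 0 otherwise.
 */
static int try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return 0;
	*out = payload;
	return 1;
}
```

The pairing only orders the accesses on either side of the flag, which is why an implementation can be cheaper than a full barrier.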
     

11 Aug, 2020

1 commit


10 Aug, 2020

1 commit

  • Pull regset conversion fix from Al Viro:
    "Fix a regression from an unnoticed bisect hazard in the regset series.

    A bunch of old (aout, originally) primitives used by coredumps became
    dead code after fdpic conversion to regsets. Removal of that dead code
    had been the first commit in the followups to regset series;
    unfortunately, it happened to hide the bisect hazard on sh (extern for
    fpregs_get() had not been updated in the main series when it should
    have been; followup simply made fpregs_get() static). And without that
    followup commit this bisect hazard became breakage in the mainline"

    Tested-by: John Paul Adrian Glaubitz

    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    kill unused dump_fpu() instances

    Linus Torvalds
     

08 Aug, 2020

3 commits

  • Merge misc updates from Andrew Morton:

    - a few MM hotfixes

    - kthread, tools, scripts, ntfs and ocfs2

    - some of MM

    Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
    ocfs2 and mm (hotfixes, pagealloc, slab-generic, slab, slub, kcsan,
    debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
    sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).

    * emailed patches from Andrew Morton: (162 commits)
    mm: vmscan: consistent update to pgrefill
    mm/vmscan.c: fix typo
    khugepaged: khugepaged_test_exit() check mmget_still_valid()
    khugepaged: retract_page_tables() remember to test exit
    khugepaged: collapse_pte_mapped_thp() protect the pmd lock
    khugepaged: collapse_pte_mapped_thp() flush the right range
    mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
    mm: thp: replace HTTP links with HTTPS ones
    mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
    mm/page_alloc.c: skip setting nodemask when we are in interrupt
    mm/page_alloc: fallbacks at most has 3 elements
    mm/page_alloc: silence a KASAN false positive
    mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
    mm/page_alloc.c: simplify pageblock bitmap access
    mm/page_alloc.c: extract the common part in pfn_to_bitidx()
    mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
    mm/shuffle: remove dynamic reconfiguration
    mm/memory_hotplug: document why shuffle_zone() is relevant
    mm/page_alloc: remove nr_free_pagecache_pages()
    mm: remove vm_total_pages
    ...

    Linus Torvalds
     
  • Patch series "mm: cleanup usage of "

    Most architectures have very similar versions of pXd_alloc_one() and
    pXd_free_one() for intermediate levels of page table. These patches add
    generic versions of these functions in and enable
    use of the generic functions where appropriate.

    In addition, functions declared and defined in <asm/pgalloc.h> headers are
    used mostly by core mm and early mm initialization in arch and there is no
    actual reason to have the <asm/pgalloc.h> included all over the place.
    The first patch in this series removes unneeded includes of
    <asm/pgalloc.h>.

    In the end it didn't work out as neatly as I hoped and moving
    pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
    unnecessary changes to arches that have custom page table allocations, so
    I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
    to mm/.

    This patch (of 8):

    In most cases the <asm/pgalloc.h> header is required only for allocations
    of page table memory. Most of the .c files that include that header do not
    use symbols declared in <asm/pgalloc.h> and do not require that header.

    As for the other header files that used to include <asm/pgalloc.h>, it is
    possible to move that include into the .c file that actually uses symbols
    from <asm/pgalloc.h> and drop the include from the header file.

    The process was somewhat automated using

    sed -i -E '/[
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Joerg Roedel
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Pull ptrace regset updates from Al Viro:
    "Internal regset API changes:

    - regularize copy_regset_{to,from}_user() callers

    - switch to saner calling conventions for ->get()

    - kill user_regset_copyout()

    The ->put() side of things will have to wait for the next cycle,
    unfortunately.

    The balance is about -1KLoC and replacements for ->get() instances are
    a lot saner"

    * 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    regset: kill user_regset_copyout{,_zero}()
    regset(): kill ->get_size()
    regset: kill ->get()
    csky: switch to ->regset_get()
    xtensa: switch to ->regset_get()
    parisc: switch to ->regset_get()
    nds32: switch to ->regset_get()
    nios2: switch to ->regset_get()
    hexagon: switch to ->regset_get()
    h8300: switch to ->regset_get()
    openrisc: switch to ->regset_get()
    riscv: switch to ->regset_get()
    c6x: switch to ->regset_get()
    ia64: switch to ->regset_get()
    arc: switch to ->regset_get()
    arm: switch to ->regset_get()
    sh: convert to ->regset_get()
    arm64: switch to ->regset_get()
    mips: switch to ->regset_get()
    sparc: switch to ->regset_get()
    ...

    Linus Torvalds
     

06 Aug, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Support 6GHz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in the TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Florian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     

05 Aug, 2020

3 commits

  • Pull documentation updates from Jonathan Corbet:
    "It's been a busy cycle for documentation - hopefully the busiest for a
    while to come. Changes include:

    - Some new Chinese translations

    - Progress on the battle against double words words and non-HTTPS
    URLs

    - Some block-mq documentation

    - More RST conversions from Mauro. At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again
    for a while. Unless we decide to switch to asciidoc or
    something...:)

    - Lots of typo fixes, warning fixes, and more"

    * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
    scripts/kernel-doc: optionally treat warnings as errors
    docs: ia64: correct typo
    mailmap: add entry for
    doc/zh_CN: add cpu-load Chinese version
    Documentation/admin-guide: tainted-kernels: fix spelling mistake
    MAINTAINERS: adjust kprobes.rst entry to new location
    devices.txt: document rfkill allocation
    PCI: correct flag name
    docs: filesystems: vfs: correct flag name
    docs: filesystems: vfs: correct sync_mode flag names
    docs: path-lookup: markup fixes for emphasis
    docs: path-lookup: more markup fixes
    docs: path-lookup: fix HTML entity mojibake
    CREDITS: Replace HTTP links with HTTPS ones
    docs: process: Add an example for creating a fixes tag
    doc/zh_CN: add Chinese translation prefer section
    doc/zh_CN: add clearing-warn-once Chinese version
    doc/zh_CN: add admin-guide index
    doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
    futex: MAINTAINERS: Re-add selftests directory
    ...

    Linus Torvalds
     
  • Pull parisc updates from Helge Deller:
    "The majority of the patches are reverts of previous commits regarding
    the parisc-specific low level spinlocking code and barrier handling,
    with which we tried to fix CPU stalls on our build servers. In the end
    John David Anglin found the culprit: We missed a define for
    atomic64_set_release(). This seems to have fixed our issues, so now
    it's good to remove the unnecessary code again.

    Other than that it's trivial stuff: Spelling fixes, constifications
    and such"

    * 'parisc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: make the log level string for register dumps const
    parisc: Do not use an ordered store in pa_tlb_lock()
    Revert "parisc: Revert "Release spinlocks using ordered store""
    Revert "parisc: Use ldcw instruction for SMP spinlock release barrier"
    Revert "parisc: Drop LDCW barrier in CAS code when running UP"
    Revert "parisc: Improve interrupt handling in arch_spin_lock_flags()"
    parisc: Replace HTTP links with HTTPS ones
    parisc: elf.h: delete a duplicated word
    parisc: Report bad pages as HardwareCorrupted
    parisc: Convert to BIT_MASK() and BIT_WORD()

    Linus Torvalds
     
  • Pull close_range() implementation from Christian Brauner:
    "This adds the close_range() syscall. It allows a task to efficiently
    close a range of file descriptors, up to all of its file
    descriptors.

    This is coordinated with the FreeBSD folks, who have copied our
    version of this syscall and in the meantime have already merged it in
    April 2019:

    https://reviews.freebsd.org/D21627
    https://svnweb.freebsd.org/base?view=revision&revision=359836

    The syscall originally came up in a discussion around the new mount
    API and making new file descriptor types cloexec by default. During
    this discussion, Al suggested the close_range() syscall.

    First, it helps to close all file descriptors of an exec()ing task.
    This can be done safely via (quoting Al's example from [1] verbatim):

    /* that exec is sensitive */
    unshare(CLONE_FILES);
    /* we don't want anything past stderr here */
    close_range(3, ~0U);
    execve(....);

    The code snippet above is one way of working around the problem that
    file descriptors are not cloexec by default. This is aggravated by the
    fact that we can't just switch them over without massively regressing
    userspace. For a whole class of programs having an in-kernel method of
    closing all file descriptors is very helpful (e.g. daemons, service
    managers, programming language standard libraries, container managers
    etc.).

    Second, it allows userspace to avoid implementing closing all file
    descriptors by parsing through /proc/<pid>/fd/* and calling close() on
    each file descriptor and other hacks. From looking at various
    large(ish) userspace code bases this or similar patterns are very
    common in service managers, container runtimes, and programming
    language runtimes/standard libraries such as Python or Rust.

    In addition, the syscall will also work for tasks that do not have
    procfs mounted and on kernels that do not have procfs support compiled
    in. In such situations the only way to make sure that all file
    descriptors are closed is to call close() on each file descriptor up
    to UINT_MAX, or to resort to RLIMIT_NOFILE or OPEN_MAX trickery.

    Based on Linus' suggestion close_range() also comes with a new flag
    CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping
    right before exec. This would usually be expressed in the sequence:

    unshare(CLONE_FILES);
    close_range(3, ~0U);

    As Linus pointed out, it might be desirable to have this be a part
    of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which
    gets especially handy when we're closing all file descriptors above a
    certain threshold.

    Test-suite as always included"

    * tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    tests: add CLOSE_RANGE_UNSHARE tests
    close_range: add CLOSE_RANGE_UNSHARE
    tests: add close_range() tests
    arch: wire-up close_range()
    open: add close_range()

    Linus Torvalds