14 Jun, 2020

2 commits

  • Pull more Kbuild updates from Masahiro Yamada:

    - fix build rules in binderfs sample

    - fix build errors when Kbuild recurses to the top Makefile

    - convert '---help---' in Kconfig to 'help'

    * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    treewide: replace '---help---' in Kconfig files with 'help'
    kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
    samples: binderfs: really compile this sample and fix build issues

    Linus Torvalds
     
  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While touching these lines, I
    also fixed the indentation.

    A variety of indentation styles were found:

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following command:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

12 Jun, 2020

5 commits

  • Pull the Kernel Concurrency Sanitizer from Thomas Gleixner:
    "The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector,
    which relies on compile-time instrumentation, and uses a
    watchpoint-based sampling approach to detect races.

    The feature was under development for quite some time and has already
    found legitimate bugs.

    Unfortunately it comes with a limitation, which was only understood
    late in the development cycle:

    It requires an up-to-date CLANG-11 compiler

    CLANG-11 is not yet released (scheduled for June), but it's the only
    compiler today which handles the kernel requirements and especially
    the annotations of functions to exclude them from KCSAN
    instrumentation correctly.

    These annotations really need to work so that low level entry code and
    especially int3 text poke handling can be completely isolated.

    A detailed discussion of the requirements and compiler issues can be
    found here:

    https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/

    We came to the conclusion that trying to work around compiler
    limitations and bugs again would end up in a major trainwreck, so
    requiring a working compiler seemed to be the best choice.

    For Continuous Integration purposes the compiler restriction is
    manageable and that's where most xxSAN reports come from.

    For a change this limitation might make GCC people actually look at
    their bugs. Some issues with CSAN in GCC are 7 years old and one has
    been 'fixed' 3 years ago with a half-baked solution which 'solved' the
    reported issue but not the underlying problem.

    The KCSAN developers are also considering a GCC plugin to become
    independent, but that's not something which will show up in a few
    days.

    Blocking KCSAN until widespread compiler support is available is not
    a really good alternative because the continuous growth of lockless
    optimizations in the kernel demands proper tooling support"
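
    For illustration, the data_race() annotation that several of these
    patches touch is used roughly like this (a minimal sketch, not taken
    from this tree; the 'hits' counter is made up):

    #include <linux/compiler.h>	/* data_race() */

    /* Hypothetical counter that is updated concurrently without locking. */
    static unsigned long hits;

    static unsigned long read_hits_for_stats(void)
    {
            /*
             * The unsynchronized read is intentional (statistics only);
             * data_race() tells KCSAN not to report it as a bug.
             */
            return data_race(hits);
    }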

    * tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
    compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining
    compiler.h: Move function attributes to compiler_types.h
    compiler.h: Avoid nested statement expression in data_race()
    compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()
    kcsan: Update Documentation to change supported compilers
    kcsan: Remove 'noinline' from __no_kcsan_or_inline
    kcsan: Pass option tsan-instrument-read-before-write to Clang
    kcsan: Support distinguishing volatile accesses
    kcsan: Restrict supported compilers
    kcsan: Avoid inserting __tsan_func_entry/exit if possible
    ubsan, kcsan: Don't combine sanitizer with kcov on clang
    objtool, kcsan: Add kcsan_disable_current() and kcsan_enable_current_nowarn()
    kcsan: Add __kcsan_{enable,disable}_current() variants
    checkpatch: Warn about data_race() without comment
    kcsan: Use GFP_ATOMIC under spin lock
    Improve KCSAN documentation a bit
    kcsan: Make reporting aware of KCSAN tests
    kcsan: Fix function matching in report
    kcsan: Change data_race() to no longer require marking racing accesses
    kcsan: Move kcsan_{disable,enable}_current() to kcsan-checks.h
    ...

    Linus Torvalds
     
  • An Action Required memory error should happen only when a processor is
    about to access corrupted memory, so it is synchronous and only
    affects the current process/thread.

    Recently commit 872e9a205c84 ("mm, memory_failure: don't send
    BUS_MCEERR_AO for action required error") fixed the issue that Action
    Required memory could unnecessarily send SIGBUS to the processes which
    share the error memory. But we still have another issue that we could
    send SIGBUS to a wrong thread.

    This is because collect_procs() and task_early_kill() fail to add the
    current process to the "to-kill" list, so this patch fixes that. With
    this fix, SIGBUS(BUS_MCEERR_AR) is never sent to a non-current
    process/thread.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Acked-by: Tony Luck
    Acked-by: Pankaj Gupta
    Link: http://lkml.kernel.org/r/1591321039-22141-3-git-send-email-naoya.horiguchi@nec.com
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Patch series "hwpoison: fixes signaling on memory error"

    This is a small patchset to fix issues in the memory error handler so
    that SIGBUS is sent to the proper process/thread as configured. Please
    see the descriptions in the individual patches for more details.

    This patch (of 2):

    The early-kill policy is controlled by two types of settings: the
    per-process setting prctl(PR_MCE_KILL) and the system-wide setting
    vm.memory_failure_early_kill. Users expect the per-process setting to
    override the system-wide setting, as many other settings do, but the
    early-kill setting doesn't work that way.

    For example, if a system configures vm.memory_failure_early_kill to 1
    (enabled), a process receives SIGBUS even if it's configured to
    explicitly disable PF_MCE_KILL by prctl(). That's not desirable for
    applications with their own policies.

    This patch changes the priority of these two types of settings by
    checking sysctl_memory_failure_early_kill only when a given process
    has the default kill policy.

    Note that this patch also fixes a thread-choice issue.

    Originally, collect_procs() always chooses the main thread when
    vm.memory_failure_early_kill is 1, even if the process has a dedicated
    thread for memory error handling. SIGBUS should be sent to the
    dedicated thread if early-kill is enabled via
    vm.memory_failure_early_kill as we are doing for PR_MCE_KILL_EARLY
    processes.
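
    For reference, the per-process policy discussed above is set from user
    space with prctl(); a minimal user-space sketch (invented program,
    error handling reduced to perror()) of a process choosing the
    late-kill policy, which after this change takes precedence over the
    sysctl:

    #include <stdio.h>
    #include <sys/prctl.h>

    int main(void)
    {
            /* Opt this process out of early kill; with this fix the
             * vm.memory_failure_early_kill sysctl no longer overrides it. */
            if (prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_LATE, 0, 0))
                    perror("prctl(PR_MCE_KILL)");

            /* ... memory-error-sensitive work ... */
            return 0;
    }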

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Cc: Tony Luck
    Cc: Pankaj Gupta
    Link: http://lkml.kernel.org/r/1591321039-22141-1-git-send-email-naoya.horiguchi@nec.com
    Link: http://lkml.kernel.org/r/1591321039-22141-2-git-send-email-naoya.horiguchi@nec.com
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Merge some more updates from Andrew Morton:

    - various hotfixes and minor things

    - hch's use_mm/unuse_mm cleanups

    Subsystems affected by this patch series: mm/hugetlb, scripts, kcov,
    lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc.

    * emailed patches from Andrew Morton :
    kernel: set USER_DS in kthread_use_mm
    kernel: better document the use_mm/unuse_mm API contract
    kernel: move use_mm/unuse_mm to kthread.c
    stacktrace: cleanup inconsistent variable type
    lib: test get_count_order/long in test_bitops.c
    mm: add comments on pglist_data zones
    ocfs2: fix spelling mistake and grammar
    mm/debug_vm_pgtable: fix kernel crash by checking for THP support
    lib: fix bitmap_parse() on 64-bit big endian archs
    checkpatch: correct check for kernel parameters doc
    nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
    lib/lz4/lz4_decompress.c: document deliberate use of `&'
    kcov: check kcov_softirq in kcov_remote_stop()
    scripts/spelling: add a few more typos
    khugepaged: selftests: fix timeout condition in wait_for_scan()

    Linus Torvalds
     
  • Merge the state of the locking kcsan branch before the read/write_once()
    and the atomics modifications got merged.

    Squash the fallout of the rebase on top of the read/write once and atomic
    fallback work into the merge. The history of the original branch is
    preserved in tag locking-kcsan-2020-06-02.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

11 Jun, 2020

4 commits

  • Switch the function documentation to kerneldoc comments, and add
    WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
    not have ->mm set (or has ->mm set in the case of unuse_mm).

    Also give the functions a kthread_ prefix to better document the use case.
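
    A rough sketch of the resulting calling convention (the worker
    function below is made up; 'mm' is assumed to be a user mm the caller
    already holds a reference on):

    #include <linux/kthread.h>
    #include <linux/mm_types.h>

    /* Must run in a kernel thread, per the new WARN_ON_ONCE asserts. */
    static void do_work_on_user_mm(struct mm_struct *mm)
    {
            kthread_use_mm(mm);     /* temporarily adopt the user address space */
            /* ... access the user mappings ... */
            kthread_unuse_mm(mm);   /* drop it again before returning */
    }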

    [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
    Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
    [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
    Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Acked-by: Greg Kroah-Hartman [usb]
    Acked-by: Haren Myneni
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Patch series "improve use_mm / unuse_mm", v2.

    This series improves the use_mm / unuse_mm interface by better
    documenting the assumptions, and by moving the set_fs manipulations
    spread over the callers into the core API.

    This patch (of 3):

    Use the proper API instead.

    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

    These helpers are only for use with kernel threads, and I will tie them
    more into the kthread infrastructure going forward. Also move the
    prototypes to kthread.h - mmu_context.h was a little weird to start with
    as it otherwise contains very low-level MM bits.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Cc: Greg Kroah-Hartman
    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Architectures can have CONFIG_TRANSPARENT_HUGEPAGE enabled but still
    lack THP support on particular platforms. For example, with a 4K
    PAGE_SIZE, ppc64 supports THP only with radix translation.

    This results in the crash below when running with hash translation and
    4K PAGE_SIZE.

    kernel BUG at arch/powerpc/include/asm/book3s/64/hash-4k.h:140!
    cpu 0x61: Vector: 700 (Program Check) at [c000000ff948f860]
    pc: debug_vm_pgtable+0x480/0x8b0
    lr: debug_vm_pgtable+0x474/0x8b0
    ...
    debug_vm_pgtable+0x374/0x8b0 (unreliable)
    do_one_initcall+0x98/0x4f0
    kernel_init_freeable+0x330/0x3fc
    kernel_init+0x24/0x148

    Check for THP support correctly
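
    In practice the fix boils down to gating the huge-page test cases on
    runtime THP support, roughly like this (simplified sketch, not the
    exact hunk from the patch; the wrapper name is made up):

    #include <linux/mm.h>

    /* Hypothetical wrapper around the PMD-level THP checks. */
    static void pmd_huge_tests_guarded(pmd_t *pmdp, unsigned long vaddr)
    {
            if (!has_transparent_hugepage())
                    return;   /* CONFIG_THP is on, but this platform lacks THP */
            /* ... exercise pmd_mkhuge(), pmd_trans_huge(), etc. ... */
    }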

    Link: http://lkml.kernel.org/r/20200608125252.407659-1-aneesh.kumar@linux.ibm.com
    Fixes: 399145f9eb6c ("mm/debug: add tests validating architecture page table helpers")
    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Pull virtio updates from Michael Tsirkin:

    - virtio-mem: paravirtualized memory hotplug

    - support doorbell mapping for vdpa

    - config interrupt support in ifc

    - fixes all over the place

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
    vhost/test: fix up after API change
    virtio_mem: convert device block size into 64bit
    virtio-mem: drop unnecessary initialization
    ifcvf: implement config interrupt in IFCVF
    vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
    vhost_vdpa: Support config interrupt in vdpa
    ifcvf: ignore continuous setting same status value
    virtio-mem: Don't rely on implicit compiler padding for requests
    virtio-mem: Try to unplug the complete online memory block first
    virtio-mem: Use -ETXTBSY as error code if the device is busy
    virtio-mem: Unplug subblocks right-to-left
    virtio-mem: Drop manual check for already present memory
    virtio-mem: Add parent resource for all added "System RAM"
    virtio-mem: Better retry handling
    virtio-mem: Offline and remove completely unplugged memory blocks
    mm/memory_hotplug: Introduce offline_and_remove_memory()
    virtio-mem: Allow to offline partially unplugged memory blocks
    mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
    virtio-mem: Paravirtualized memory hotunplug part 2
    virtio-mem: Paravirtualized memory hotunplug part 1
    ...

    Linus Torvalds
     

10 Jun, 2020

23 commits

  • Allow the callers to distinguish a real unmapped address vs a range
    that can't be probed.
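
    A hedged example of what a caller can now do (assuming, as in this
    series, that a not-probeable range is reported as -ERANGE while a real
    fault remains -EFAULT; the helper below is made up):

    #include <linux/kernel.h>
    #include <linux/uaccess.h>

    static int dump_kernel_word(const void *addr, unsigned long *out)
    {
            long ret = probe_kernel_read(out, addr, sizeof(*out));

            if (ret == -ERANGE)
                    pr_debug("%px cannot be probed at all\n", addr);
            else if (ret)
                    pr_debug("%px is unmapped\n", addr);
            return ret;
    }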

    Suggested-by: Masami Hiramatsu
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Reviewed-by: Masami Hiramatsu
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-24-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Provide alternative versions of probe_kernel_read, probe_kernel_write
    and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead
    use arch hooks that are modelled after unsafe_{get,put}_user to access
    kernel memory in an exception safe way.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-19-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Move kernel access vs user access routines together to ease upcoming
    ifdefs.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-18-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Except for historical confusion in the kprobes/uprobes and bpf tracers,
    which has been fixed now, there is no good reason to ever allow user
    memory accesses from probe_kernel_read. Switch probe_kernel_read to only
    read from kernel memory.

    [akpm@linux-foundation.org: update it for "mm, dump_page(): do not crash with invalid mapping pointer"]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-17-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • All users are gone now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-16-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Currently architectures have to override every routine that probes
    kernel memory, which includes a pure read and strcpy, both in strict
    and non-strict variants. Just provide a single set of arch hooks instead to
    make sure all architectures cover all the cases.

    [akpm@linux-foundation.org: fix !CONFIG_X86_64 build]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-11-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Each of the helpers has just two callers, which also differ in whether
    they deal with kernel or userspace pointers. Just open code the logic
    in the callers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-10-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This matches the naming of strnlen_user, and also makes it more clear
    what the function is supposed to do.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-9-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This matches the naming of strncpy_from_user_nofault, and also makes it
    more clear what the function is supposed to do.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-8-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This matches the naming of strncpy_from_user, and also makes it more
    clear what the function is supposed to do.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-7-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This file now also contains several helpers for accessing user memory.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-6-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Add proper kerneldoc comments for probe_kernel_read_strict and
    strncpy_from_unsafe_strict, and explain the difference versus the
    non-strict versions.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-5-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • maccess tends to define lots of underscore prefixed symbols that then
    have other weak aliases. But except for two cases they are never
    actually used, so remove them.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-3-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Patch series "clean up and streamline probe_kernel_* and friends", v4.

    This series starts cleaning up the safe kernel and user memory probing
    helpers in mm/maccess.c, and then allows architectures to implement the
    kernel probing without overriding the address space limit and temporarily
    allowing access to user memory. It then switches x86 over to this new
    mechanism by reusing the unsafe_* uaccess logic.

    This version also switches to the saner copy_{from,to}_kernel_nofault
    naming suggested by Linus.

    I kept the x86 helpers as-is without calling unsafe_{get,put}_user as
    that avoids a number of hard-to-trace casts, and it will still work
    with the asm-goto based version easily.
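
    For orientation, the renamed entry point mentioned above is used like
    this (a minimal sketch; the caller and the pointer being read are made
    up):

    #include <linux/kernel.h>
    #include <linux/uaccess.h>

    static int sample_read(const unsigned long *kernel_ptr)
    {
            unsigned long val;

            /* Never oopses: a bad pointer yields an error code instead. */
            if (copy_from_kernel_nofault(&val, kernel_ptr, sizeof(val)))
                    return -EFAULT;

            pr_info("value: %lu\n", val);
            return 0;
    }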

    This patch (of 20):

    probe_kernel_write() is not used by any modular code.

    [sfr@canb.auug.org.au: turns out that probe_user_write is used in modular code]
    Link: http://lkml.kernel.org/r/20200602195741.4faaa348@canb.auug.org.au

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200521152301.2587579-2-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Rename the mmap_sem field to mmap_lock. Any new uses of this lock should
    now go through the new mmap locking api. The mmap_lock is still
    implemented as a rwsem, though this could change in the future.

    [akpm@linux-foundation.org: fix it for mm-gup-might_lock_readmmap_sem-in-get_user_pages_fast.patch]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-11-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Add new APIs to assert that mmap_sem is held.

    Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
    makes the assertions more tolerant of future changes to the lock type.
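
    Assuming the new helpers are mmap_assert_locked() and
    mmap_assert_write_locked(), a typical use looks like this (sketch; the
    walker function is made up):

    #include <linux/mmap_lock.h>
    #include <linux/mm_types.h>

    /* Callers must hold mmap_lock at least for read. */
    static void walk_vmas_locked(struct mm_struct *mm)
    {
            mmap_assert_locked(mm);   /* replaces rwsem_is_locked()-style checks */
            /* ... iterate over the VMAs ... */
    }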

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Define a new initializer for the mmap locking api. Initially this just
    evaluates to __RWSEM_INITIALIZER as the API is defined as wrappers around
    rwsem.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-9-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
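
    Applied to a typical call site, the rule above turns code shaped like
    the first helper below into the second (illustrative C, not a hunk
    from the actual conversion; the lookup functions are made up):

    #include <linux/mm.h>

    /* Before: raw rwsem calls on mm->mmap_sem. */
    static struct vm_area_struct *lookup_old(struct mm_struct *mm, unsigned long addr)
    {
            struct vm_area_struct *vma;

            down_read(&mm->mmap_sem);
            vma = find_vma(mm, addr);
            up_read(&mm->mmap_sem);
            return vma;
    }

    /* After: the same logic through the new mmap locking API. */
    static struct vm_area_struct *lookup_new(struct mm_struct *mm, unsigned long addr)
    {
            struct vm_area_struct *vma;

            mmap_read_lock(mm);
            vma = find_vma(mm, addr);
            mmap_read_unlock(mm);
            return vma;
    }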

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • The replacement of <asm/pgtable.h> with <linux/pgtable.h> made the
    include of the latter land in the middle of asm includes. Fix this up
    with the aid of the below script and manual adjustments here and there.

    import sys
    import re

    if len(sys.argv) is not 3:
        print "USAGE: %s <file> <header>" % (sys.argv[0])
        sys.exit(1)

    hdr_to_move = "#include <linux/%s>" % sys.argv[2]
    moved = False
    in_hdrs = False

    with open(sys.argv[1], "r") as f:
        lines = f.readlines()
        for _line in lines:
            line = _line.rstrip('\n')
            if line == hdr_to_move:
                continue
            if line.startswith("#include <linux/"):
                in_hdrs = True
            elif not moved and in_hdrs:
                moved = True
                print hdr_to_move
            print line

    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The include/linux/pgtable.h is going to be the home of generic page table
    manipulation functions.

    Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
    make the latter include asm/pgtable.h.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definitions of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always
    the possibility to override the generic version with the usual ifdef
    magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of
    the functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include
    <asm/pgtable.h> in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <linux/mm.h>") ; do
        sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

09 Jun, 2020

6 commits

  • Merge still more updates from Andrew Morton:
    "Various trees. Mainly those parts of MM whose linux-next dependents
    are now merged. I'm still sitting on ~160 patches which await merges
    from -next.

    Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug,
    panic, lib, sysctl, mm/gup, mm/pagemap"

    * emailed patches from Andrew Morton : (52 commits)
    doc: cgroup: update note about conditions when oom killer is invoked
    module: move the set_fs hack for flush_icache_range to m68k
    nommu: use flush_icache_user_range in brk and mmap
    binfmt_flat: use flush_icache_user_range
    exec: use flush_icache_user_range in read_code
    exec: only build read_code when needed
    m68k: implement flush_icache_user_range
    arm: rename flush_cache_user_range to flush_icache_user_range
    xtensa: implement flush_icache_user_range
    sh: implement flush_icache_user_range
    asm-generic: add a flush_icache_user_range stub
    mm: rename flush_icache_user_range to flush_icache_user_page
    arm,sparc,unicore32: remove flush_icache_user_range
    riscv: use asm-generic/cacheflush.h
    powerpc: use asm-generic/cacheflush.h
    openrisc: use asm-generic/cacheflush.h
    m68knommu: use asm-generic/cacheflush.h
    microblaze: use asm-generic/cacheflush.h
    ia64: use asm-generic/cacheflush.h
    hexagon: use asm-generic/cacheflush.h
    ...

    Linus Torvalds
     
  • These obviously operate on user addresses.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Greg Ungerer
    Cc: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/20200515143646.3857579-29-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • All of the pin_user_pages*() API calls will cause pages to be
    dma-pinned. As such, they are all suitable for DMA, RDMA, and/or
    Direct IO.

    The documentation should say so, but it was instead saying that three of
    the API calls were only suitable for Direct IO. This was discovered
    when a reviewer wondered why an API call that specifically recommended
    against Case 2 (DMA/RDMA) was being used in a DMA situation [1].

    Fix this by simply deleting those claims. The gup.c comments already
    refer to the more extensive Documentation/core-api/pin_user_pages.rst,
    which does have the correct guidance. So let's just write it once,
    there.

    [1] https://lore.kernel.org/r/20200529074658.GM30374@kadam

    Signed-off-by: John Hubbard
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Acked-by: Pankaj Gupta
    Acked-by: Souptick Joarder
    Cc: Dan Carpenter
    Cc: Jan Kara
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/20200529084515.46259-1-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • This code was using get_user_pages*(), and all of the callers so far
    were in a "Case 2" scenario (DMA/RDMA), using the categorization from [1].

    That means that it's time to convert the get_user_pages*() + put_page()
    calls to pin_user_pages*() + unpin_user_pages() calls.

    There is some helpful background in [2]: basically, this is a small part
    of fixing a long-standing disconnect between pinning pages, and file
    systems' use of those pages.

    [1] Documentation/core-api/pin_user_pages.rst

    [2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/
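
    The shape of such a conversion, as a hedged sketch (the function and
    its arguments are made up; only the API pairing matters here):

    #include <linux/mm.h>

    /* get_user_pages() + put_page() becomes pin_user_pages() + unpin_user_pages(). */
    static long pin_dma_buffer(unsigned long start, unsigned long nr, struct page **pages)
    {
            long pinned = pin_user_pages(start, nr, FOLL_WRITE, pages, NULL);

            if (pinned <= 0)
                    return pinned;

            /* ... set up and run the DMA against the pinned pages ... */

            unpin_user_pages(pages, pinned);
            return 0;
    }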

    Signed-off-by: John Hubbard
    Signed-off-by: Andrew Morton
    Acked-by: David Hildenbrand
    Cc: Daniel Vetter
    Cc: Jérôme Glisse
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Dave Chinner
    Cc: Pankaj Gupta
    Cc: Souptick Joarder
    Link: http://lkml.kernel.org/r/20200527223243.884385-3-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • Patch series "mm/gup: introduce pin_user_pages_locked(), use it in frame_vector.c", v2.

    This adds yet one more pin_user_pages*() variant, and uses that to
    convert mm/frame_vector.c.

    With this, along with maybe 20 or 30 other recent patches in various
    trees, we are close to having the relevant gup call sites
    converted--with the notable exception of the bio/block layer.

    This patch (of 2):

    Introduce pin_user_pages_locked(), which is nearly identical to
    get_user_pages_locked() except that it sets FOLL_PIN and rejects
    FOLL_GET.

    As with other pairs of get_user_pages*() and pin_user_pages() API calls,
    it's prudent to assert that FOLL_PIN is *not* set in the
    get_user_pages*() call, so add that as part of this.

    [jhubbard@nvidia.com: v2]
    Link: http://lkml.kernel.org/r/20200531234131.770697-2-jhubbard@nvidia.com

    Signed-off-by: John Hubbard
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Acked-by: Pankaj Gupta
    Cc: Daniel Vetter
    Cc: Jérôme Glisse
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Dave Chinner
    Cc: Souptick Joarder
    Link: http://lkml.kernel.org/r/20200531234131.770697-1-jhubbard@nvidia.com
    Link: http://lkml.kernel.org/r/20200527223243.884385-1-jhubbard@nvidia.com
    Link: http://lkml.kernel.org/r/20200527223243.884385-2-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • The __get_user_pages_fast() API is renamed to get_user_pages_fast_only()
    to align with pin_user_pages_fast_only().

    As part of this we get rid of the write parameter; instead, callers
    pass FOLL_WRITE to get_user_pages_fast_only(). This does not change
    any existing functionality of the API.

    All the callers are changed to pass FOLL_WRITE.

    Also introduce get_user_page_fast_only(), and use it in a few places
    that hard-code nr_pages to 1.

    Updated the documentation of the API.
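
    A hedged sketch of a single-page caller after the rename (the wrapper
    function is made up):

    #include <linux/mm.h>

    static struct page *grab_one_writable_page(unsigned long addr)
    {
            struct page *page;

            /* Previously: __get_user_pages_fast(addr, 1, 1, &page) with a bare
             * 'write' argument; now the gup flags are passed explicitly. */
            if (!get_user_page_fast_only(addr, FOLL_WRITE, &page))
                    return NULL;

            return page;
    }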

    Signed-off-by: Souptick Joarder
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Reviewed-by: Paul Mackerras [arch/powerpc/kvm]
    Cc: Matthew Wilcox
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paolo Bonzini
    Cc: Stephen Rothwell
    Cc: Mike Rapoport
    Cc: Aneesh Kumar K.V
    Cc: Michal Suchanek
    Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.com
    Signed-off-by: Linus Torvalds

    Souptick Joarder