26 Oct, 2020

1 commit


17 Oct, 2020

1 commit

  • The preceding patches have ensured that core dumping properly takes the
    mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
    its users.

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     

11 Aug, 2020

2 commits

  • …ub/scm/linux/kernel/git/acme/linux") into android-mainline

    Tiny steps on the way to 5.9-rc1.

    Fixes conflicts in:
    fs/f2fs/inline.c

    Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
    Change-Id: I16d863ae44a51156499458e8c3486587cbe2babe

    Greg Kroah-Hartman
     
  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which causes build failures in
    various situations caused by the lockdep additions to seqcount to
    validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers contrary to seqlock writers must be externally
    serialized, which usually happens via locking - except for strict
    per CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks.
    Sequence count now has lock type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside of the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful this comes with a Trojan Horse twist: On RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well known reader preempts writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
     

06 Aug, 2020

1 commit


04 Aug, 2020

1 commit

  • Instead of waiting in a loop for the userfaultfd condition to become
    true, just wait once and return VM_FAULT_RETRY.

    We've already dropped the mmap lock, we know we can't really
    successfully handle the fault at this point and the caller will have to
    retry anyway. So there's no point in making the wait any more
    complicated than it needs to be - just schedule away.

    And once you don't have that complexity with explicit looping, you can
    also just lose all the 'userfaultfd_signal_pending()' complexity,
    because once we've set the correct process sleeping state, and don't
    loop, the act of scheduling itself will be checking if there are any
    pending signals before going to sleep.

    We can also drop the VM_FAULT_MAJOR games, since we'll be treating all
    retried faults as major soon anyway (series to regularize and share more
    of fault handling across architectures in a separate series by Peter Xu,
    and in the meantime we won't worry about the possible minor - I'll be
    here all week, try the veal - accounting difference).

    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Jul, 2020

1 commit

  • A sequence counter write side critical section must be protected by some
    form of locking to serialize writers. A plain seqcount_t does not
    contain the information of which lock must be held when entering a write
    side critical section.

    Use the new seqcount_spinlock_t data type, which allows associating a
    spinlock with the sequence counter. This enables lockdep to verify that
    the spinlock used for writer serialization is held when the write side
    critical section is entered.

    If lockdep is disabled this lock association is compiled out and has
    neither storage size nor runtime overhead.
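
As a rough userspace illustration of the idea above (not the kernel implementation), the sketch below pairs a sequence counter with a pointer to the lock that serializes its writers and checks that the lock is held on write entry. All `toy_*` names are hypothetical; the lock is a plain flag, and the code is single-threaded with no memory barriers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a spinlock: it only tracks whether it is held. */
struct toy_lock {
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

/* Sequence counter that remembers which lock serializes its writers. */
struct toy_seqcount {
    unsigned int sequence;
    struct toy_lock *lock;
};

static void toy_seqcount_init(struct toy_seqcount *s, struct toy_lock *lock)
{
    s->sequence = 0;
    s->lock = lock;
}

/* Enter the write side. Returns false (and changes nothing) when the
 * associated lock is not held; lockdep would report a violation here. */
static bool toy_write_begin(struct toy_seqcount *s)
{
    if (!s->lock->held)
        return false;
    s->sequence++; /* odd value: write in progress */
    return true;
}

static void toy_write_end(struct toy_seqcount *s)
{
    s->sequence++; /* even value: write finished */
}

static unsigned int toy_read_begin(const struct toy_seqcount *s)
{
    return s->sequence;
}

/* A reader must retry if a write was in progress or has happened since. */
static bool toy_read_retry(const struct toy_seqcount *s, unsigned int start)
{
    return (start & 1) || s->sequence != start;
}
```

The point of the sketch is only that the counter itself carries enough information to validate writer serialization; a plain counter without the lock pointer could not perform that check.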

    Signed-off-by: Ahmed S. Darwish
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200720155530.1173732-23-a.darwish@linutronix.de

    Ahmed S. Darwish
     

24 Jun, 2020

1 commit


10 Jun, 2020

4 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Add new APIs to assert that mmap_sem is held.

    Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
    makes the assertions more tolerant of future changes to the lock type.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
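
To illustrate what the rule above buys, here is a minimal userspace sketch in which call sites pass the mm to a wrapper instead of reaching into the lock field directly. The `toy_*` types and functions are hypothetical stand-ins, not kernel code, and the toy semaphore is not thread-safe.

```c
#include <stdbool.h>

/* Toy reader/writer semaphore: counts readers and flags one writer.
 * It only models lock state and is not thread-safe. */
struct toy_rwsem {
    int readers;
    bool writer;
};

static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
    if (sem->writer)
        return false;
    sem->readers++;
    return true;
}

static void toy_up_read(struct toy_rwsem *sem) { sem->readers--; }

static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
    if (sem->writer || sem->readers > 0)
        return false;
    sem->writer = true;
    return true;
}

static void toy_up_write(struct toy_rwsem *sem) { sem->writer = false; }

/* Hypothetical stand-in for struct mm_struct. */
struct toy_mm {
    struct toy_rwsem mmap_sem;
};

/* Wrappers in the spirit of the new API: callers name the mm, not the lock. */
static inline bool toy_mmap_read_trylock(struct toy_mm *mm)
{
    return toy_down_read_trylock(&mm->mmap_sem);
}

static inline void toy_mmap_read_unlock(struct toy_mm *mm)
{
    toy_up_read(&mm->mmap_sem);
}

static inline bool toy_mmap_write_trylock(struct toy_mm *mm)
{
    return toy_down_write_trylock(&mm->mmap_sem);
}

static inline void toy_mmap_write_unlock(struct toy_mm *mm)
{
    toy_up_write(&mm->mmap_sem);
}
```

Because only the wrappers touch the field, renaming it or changing the lock type later would not require editing every call site again.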

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

10 Apr, 2020

1 commit


08 Apr, 2020

4 commits

  • Only declare _UFFDIO_WRITEPROTECT if the user specified
    UFFDIO_REGISTER_MODE_WP and if all the checks passed. Then when the user
    registers regions with shmem/hugetlbfs we won't expose the new ioctl to
    them. Even with complete anonymous memory range, we'll only expose the
    new WP ioctl bit if the register mode has MODE_WP.
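
The rule described above can be sketched as a small bitmask computation. This is a simplified illustration, assuming hypothetical `TOY_*` constants rather than the real UAPI values, and it folds "all the checks passed" into a single boolean for an anonymous-only range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register modes (not the real UFFDIO_REGISTER_MODE_* values). */
enum {
    TOY_MODE_MISSING = 1u << 0,
    TOY_MODE_WP      = 1u << 1,
};

/* Hypothetical per-range ioctl bits reported back to userspace. */
enum {
    TOY_IOCTL_WAKE         = 1u << 0,
    TOY_IOCTL_COPY         = 1u << 1,
    TOY_IOCTL_WRITEPROTECT = 1u << 3,
};

/* Compute which ioctls to advertise for a registered range. The
 * write-protect bit is exposed only if the caller asked for WP mode
 * and the range is entirely anonymous memory. */
static uint64_t toy_register_ioctls(unsigned int mode, bool all_anonymous)
{
    uint64_t ioctls = TOY_IOCTL_WAKE | TOY_IOCTL_COPY;

    if ((mode & TOY_MODE_WP) && all_anonymous)
        ioctls |= TOY_IOCTL_WRITEPROTECT;
    return ioctls;
}
```

A monitor that registered only for missing faults would therefore never see the write-protect bit, even on anonymous memory.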

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • It does not make sense to try to wake up any waiting thread when we're
    write-protecting a memory region. Only wake up when resolving a write
    protected page fault.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-16-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Introduce the new uffd-wp APIs for userspace.

    Firstly, we allow UFFDIO_REGISTER with write protection tracking
    using the new UFFDIO_REGISTER_MODE_WP flag. Note that this flag can
    co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the
    userspace program can not only resolve missing page faults but also
    track page data changes along the way.

    Secondly, we introduced the new UFFDIO_WRITEPROTECT API to do page level
    write protection tracking. Note that we will need to register the memory
    region with UFFDIO_REGISTER_MODE_WP before that.

    [peterx@redhat.com: write up the commit message]
    [peterx@redhat.com: remove useless block, write commit message, check against
    VM_MAYWRITE rather than VM_WRITE when register]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Jerome Glisse
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • This allows UFFDIO_COPY to map pages write-protected.

    [peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
    around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
    commit messages]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Jerome Glisse
    Reviewed-by: Mike Rapoport
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

07 Apr, 2020

1 commit


03 Apr, 2020

3 commits

    The userfaultfd fault path was killable by default even if the caller did
    not have FAULT_FLAG_KILLABLE. That made sense earlier, because gup did not
    set FAULT_FLAG_KILLABLE properly. Now that the previous patch applies
    FAULT_FLAG_KILLABLE even in gup code, it also makes sense to let
    userfaultfd honor FAULT_FLAG_KILLABLE.

    Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code
    right now, this patch should have no functional change. It also cleaned
    the code a little bit by introducing some helpers.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
    handle_userfault() is currently the only place in the kernel page
    fault procedures that can respond to non-fatal userspace signals. It was
    trying to detect such an allowance by checking against USER & KILLABLE
    flags, which was "un-official".

    In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show
    that the fault handler allows the fault procedure to respond even to
    non-fatal signals. Meanwhile, add this new flag to the default fault
    flags so that all the page fault handlers can benefit from the new flag.
    With that, replace the userfault check with this one.

    Since the line is getting even longer, clean up the fault flags a bit too
    to ease TTY users.

    Although we've got a new flag and applied it, we shouldn't have any
    functional change with this patch so far.

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Reviewed-by: David Hildenbrand
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
    This patch removes the risky path in handle_userfault() so that we can be
    sure that the callers of handle_mm_fault() know that the VMAs might
    have changed. Meanwhile, with the previous patch we don't lose
    responsiveness either, since the core mm code can now handle nonfatal
    userspace signals even if we return VM_FAULT_RETRY.

    Suggested-by: Andrea Arcangeli
    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Reviewed-by: Jerome Glisse
    Cc: Bobby Powers
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     

09 Dec, 2019

1 commit


02 Dec, 2019

3 commits

  • Merge updates from Andrew Morton:
    "Incoming:

    - a small number of updates to scripts/, ocfs2 and fs/buffer.c

    - most of MM

    I still have quite a lot of material (mostly not MM) staged after
    linux-next due to -next dependencies. I'll send those across next week
    as the prerequisites get merged up"

    * emailed patches from Andrew Morton : (135 commits)
    mm/page_io.c: annotate refault stalls from swap_readpage
    mm/Kconfig: fix trivial help text punctuation
    mm/Kconfig: fix indentation
    mm/memory_hotplug.c: remove __online_page_set_limits()
    mm: fix typos in comments when calling __SetPageUptodate()
    mm: fix struct member name in function comments
    mm/shmem.c: cast the type of unmap_start to u64
    mm: shmem: use proper gfp flags for shmem_writepage()
    mm/shmem.c: make array 'values' static const, makes object smaller
    userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
    fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
    userfaultfd: wrap the common dst_vma check into an inlined function
    userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
    userfaultfd: use vma_pagesize for all huge page size calculation
    mm/madvise.c: use PAGE_ALIGN[ED] for range checking
    mm/madvise.c: replace with page_size() in madvise_inject_error()
    mm/mmap.c: make vma_merge() comment more easy to understand
    mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
    autonuma: reduce cache footprint when scanning page tables
    autonuma: fix watermark checking in migrate_balanced_pgdat()
    ...

    Linus Torvalds
     
  • A while ago Andy noticed
    (http://lkml.kernel.org/r/CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com)
    that UFFD_FEATURE_EVENT_FORK used by an unprivileged user may have
    security implications.

    As the first step of the solution the following patch limits the
    availability of UFFD_FEATURE_EVENT_FORK to those having CAP_SYS_PTRACE.

    The usage of CAP_SYS_PTRACE ensures compatibility with CRIU.

    Yet, if there are other users of non-cooperative userfaultfd that run
    without CAP_SYS_PTRACE, they would be broken :(

    Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file
    descriptor table from the read() implementation of uffd, which may have
    security implications for unprivileged use of the userfaultfd.

    Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have
    CAP_SYS_PTRACE.

    Link: http://lkml.kernel.org/r/1572967777-8812-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrea Arcangeli
    Cc: Daniel Colascione
    Cc: Jann Horn
    Cc: Lokesh Gidra
    Cc: Nick Kralevich
    Cc: Nosh Minwalla
    Cc: Pavel Emelyanov
    Cc: Tim Murray
    Cc: Aleksa Sarai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • If the registration is repeated without VM_UFFD_MISSING or VM_UFFD_WP they
    need to be cleared. Currently setting UFFDIO_REGISTER_MODE_WP returns
    -EINVAL, so this patch is a noop until the UFFDIO_REGISTER_MODE_WP support
    is applied.

    Link: http://lkml.kernel.org/r/20191004232834.GP13922@redhat.com
    Signed-off-by: Andrea Arcangeli
    Reported-by: Wei Yang
    Reviewed-by: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

23 Oct, 2019

1 commit

  • The .ioctl and .compat_ioctl file operations have the same prototype so
    they can both point to the same function, which works great almost all
    the time when all the commands are compatible.

    One exception is the s390 architecture, where a compat pointer is only
    31 bit wide, and converting it into a 64-bit pointer requires calling
    compat_ptr(). Most drivers here will never run in s390, but since we now
    have a generic helper for it, it's easy enough to use it consistently.
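
A minimal sketch of the s390 conversion described above, assuming a hypothetical `toy_compat_ptr_s390()` helper that returns the widened address as an integer: the top bit of the 32-bit compat value is not part of the address, so it is masked off before the value is used as a 64-bit pointer.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's compat user pointer type. */
typedef uint32_t toy_compat_uptr_t;

/* Widen a 31-bit compat pointer: drop bit 31, keep the low 31 bits. */
static uint64_t toy_compat_ptr_s390(toy_compat_uptr_t uptr)
{
    return (uint64_t)(uptr & UINT32_C(0x7fffffff));
}
```

On architectures where compat pointers use all 32 bits, the equivalent helper is a plain zero-extension, which is why pointing both file operations at the same function usually works.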

    I double-checked all these drivers to ensure that all ioctl arguments
    are used as pointers or are ignored, but are not interpreted as integer
    values.

    Acked-by: Jason Gunthorpe
    Acked-by: Daniel Vetter
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    Acked-by: David Sterba
    Acked-by: Darren Hart (VMware)
    Acked-by: Jonathan Cameron
    Acked-by: Bjorn Andersson
    Acked-by: Dan Williams
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

02 Oct, 2019

1 commit


26 Sep, 2019

1 commit

  • This patch is a part of a series that extends kernel ABI to allow to pass
    tagged user pointers (with the top byte set to something else other than
    0x00) as syscall arguments.

    userfaultfd code uses provided user pointers for vma lookups, which can
    only be done with untagged pointers.

    Untag user pointers in validate_range().
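
As a rough sketch of what untagging means here, assuming a 64-bit address whose top byte carries the tag and a hypothetical `toy_untag_user_addr()` helper: the tag byte is cleared so that the remaining bits can be compared against vma boundaries. Real architectures may implement this differently (for example via sign extension), so treat the mask below as illustrative only.

```c
#include <stdint.h>

/* Clear the top byte (bits 63..56) of a user address. */
static uint64_t toy_untag_user_addr(uint64_t addr)
{
    return addr & ~(UINT64_C(0xff) << 56);
}
```

An address that arrives without a tag passes through unchanged, so a range check can apply the helper unconditionally before any lookup.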

    Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Mike Rapoport
    Reviewed-by: Vincenzo Frascino
    Reviewed-by: Catalin Marinas
    Reviewed-by: Kees Cook
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Auger
    Cc: Felix Kuehling
    Cc: Jens Wiklander
    Cc: Khalid Aziz
    Cc: Mauro Carvalho Chehab
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

26 Aug, 2019

1 commit


25 Aug, 2019

1 commit

  • userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
    mm->core_state != NULL.

    Otherwise a page fault can see userfaultfd_missing() == T and use an
    already freed userfaultfd_ctx.

    Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
    Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
    Signed-off-by: Oleg Nesterov
    Reported-by: Kefeng Wang
    Reviewed-by: Andrea Arcangeli
    Tested-by: Kefeng Wang
    Cc: Peter Xu
    Cc: Mike Rapoport
    Cc: Jann Horn
    Cc: Jason Gunthorpe
    Cc: Michal Hocko
    Cc: Tetsuo Handa
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

08 Jul, 2019

1 commit


05 Jul, 2019

1 commit

  • When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
    and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.

    This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
    userfaultfd_ctx_read(), which in turn can be waiting for
    userfaultfd_ctx::fault_pending_wqh.lock or
    userfaultfd_ctx::event_wqh.lock.

    But elsewhere the fault_pending_wqh and event_wqh locks are taken with
    IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep
    reports that a deadlock is possible.

    Fix it by always disabling IRQs when taking the fault_pending_wqh and
    event_wqh locks.

    Commit ae62c16e105a ("userfaultfd: disable irqs when taking the
    waitqueue lock") didn't fix this because it only accounted for the
    fd_wqh lock, not the other locks nested inside it.

    Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
    Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
    Signed-off-by: Eric Biggers
    Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
    Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
    Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Andrea Arcangeli
    Cc: [4.19+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     

23 Jun, 2019

1 commit


19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2 see
    the copying file in the top level directory

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 35 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


15 May, 2019

1 commit

  • Userfaultfd can be misused to make it easier to exploit existing
    use-after-free (and similar) bugs that might otherwise only make a
    short window or race condition available. By using userfaultfd to
    stall a kernel thread, a malicious program can keep some state that it
    wrote, stable for an extended period, which it can then access using an
    existing exploit. While it doesn't cause the exploit itself, and while
    it's not the only thing that can stall a kernel thread when accessing a
    memory location, it's one of the few that never needs privilege.

    We can add a flag, allowing userfaultfd to be restricted, so that in
    general it won't be useable by arbitrary user programs, but in
    environments that require userfaultfd it can be turned back on.

    Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
    whether userfaultfd is allowed by unprivileged users. When this is
    set to zero, only privileged users (root user, or users with the
    CAP_SYS_PTRACE capability) will be able to use the userfaultfd
    syscalls.
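
The policy described above boils down to a two-input check. The sketch below is a userspace model with a hypothetical function name, where the capability test is passed in as a boolean rather than queried from the kernel.

```c
#include <errno.h>
#include <stdbool.h>

/* Model of the permission check: the knob value and the caller's
 * capability are passed in explicitly. Returns 0 if creating a
 * userfaultfd would be allowed, -EPERM otherwise. */
static int toy_userfaultfd_permitted(int unprivileged_userfaultfd,
                                     bool has_cap_sys_ptrace)
{
    if (unprivileged_userfaultfd == 0 && !has_cap_sys_ptrace)
        return -EPERM;
    return 0;
}
```

With the knob set to 1 every caller passes; with it set to 0 only callers holding the capability do.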

    Andrea said:

    : The only difference between the bpf sysctl and the userfaultfd sysctl
    : this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
    : requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
    : because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
    : already if it's doing other kind of tracking on processes runtime, in
    : addition of userfaultfd. In other words both syscalls works only for
    : root, when the two sysctl are opt-in set to 1.

    [dgilbert@redhat.com: changelog additions]
    [akpm@linux-foundation.org: documentation tweak, per Mike]
    Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
    Signed-off-by: Peter Xu
    Suggested-by: Andrea Arcangeli
    Suggested-by: Mike Rapoport
    Reviewed-by: Mike Rapoport
    Reviewed-by: Andrea Arcangeli
    Cc: Paolo Bonzini
    Cc: Hugh Dickins
    Cc: Luis Chamberlain
    Cc: Maxime Coquelin
    Cc: Maya Gokhale
    Cc: Jerome Glisse
    Cc: Pavel Emelyanov
    Cc: Johannes Weiner
    Cc: Martin Cracauer
    Cc: Denis Plotnikov
    Cc: Marty McFadden
    Cc: Mike Kravetz
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: "Kirill A . Shutemov"
    Cc: "Dr . David Alan Gilbert"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Xu
     

04 May, 2019

2 commits

  • Change-Id: I4380c68c3474026a42ffa9f95c525f9a563ba7a3

    Todd Kjos
     
  • Userspace processes often have multiple allocators that each do
    anonymous mmaps to get memory. When examining memory usage of
    individual processes or systems as a whole, it is useful to be
    able to break down the various heaps that were allocated by
    each layer and examine their size, RSS, and physical memory
    usage.

    This patch adds a user pointer to the shared union in
    vm_area_struct that points to a null terminated string inside
    the user process containing a name for the vma. vmas that
    point to the same address will be merged, but vmas that
    point to equivalent strings at different addresses will
    not be merged.

    Userspace can set the name for a region of memory by calling
    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
    Setting the name to NULL clears it.

    The names of named anonymous vmas are shown in /proc/pid/maps
    as [anon:] and in /proc/pid/smaps in a new "Name" field
    that is only present for named vmas. If the userspace pointer
    is no longer valid all or part of the name will be replaced
    with "".

    The idea to store a userspace pointer to reduce the complexity
    within mm (at the expense of the complexity of reading
    /proc/pid/mem) came from Dave Hansen. This results in no
    runtime overhead in the mm subsystem other than comparing
    the anon_name pointers when considering vma merging. The pointer
    is stored in a union with fields that are only used on file-backed
    mappings, so it does not increase memory usage.

    Includes fix from Jed Davis for typo in
    prctl_set_vma_anon_name, which could attempt to set the name
    across two vmas at the same time due to a typo, which might
    corrupt the vma list. Fix it to use tmp instead of end to limit
    the name setting to a single vma at a time.

    Bug: 120441514
    Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
    Signed-off-by: Dmitry Shmidt
    [AmitP: Fix get_user_pages_remote() call to align with upstream commit
    5b56d49fc31d ("mm: add locked parameter to get_user_pages_remote()")]
    Signed-off-by: Amit Pundir

    Colin Cross
     

20 Apr, 2019

1 commit

  • The core dumping code has always run without holding the mmap_sem for
    writing, even though that is the only way to ensure that the entire vma
    layout will not change from under it. Only using some signal
    serialization on the processes belonging to the mm is not nearly enough.
    This was pointed out earlier. For example in Hugh's post from Jul 2017:

    https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

    "Not strictly relevant here, but a related note: I was very surprised
    to discover, only quite recently, how handle_mm_fault() may be called
    without down_read(mmap_sem) - when core dumping. That seems a
    misguided optimization to me, which would also be nice to correct"

    In particular because the growsdown and growsup can move the
    vm_start/vm_end the various loops the core dump does around the vma will
    not be consistent if page faults can happen concurrently.

    Pretty much all users calling mmget_not_zero()/get_task_mm() and then
    taking the mmap_sem had the potential to introduce unexpected side
    effects in the core dumping code.

    Adding mmap_sem for writing around the ->core_dump invocation is a
    viable long term fix, but it requires removing all copy user and page
    faults and to replace them with get_dump_page() for all binary formats
    which is not suitable as a short term fix.

    For the time being this solution manually covers the places that can
    confuse the core dump either by altering the vma layout or the vma flags
    while it runs. Once ->core_dump runs under mmap_sem for writing the
    function mmget_still_valid() can be dropped.

    Allowing mmap_sem protected sections to run in parallel with the
    coredump provides some minor parallelism advantage to the swapoff code
    (which seems to be safe enough by never mangling any vma field and can
    keep doing swapins in parallel to the core dumping) and to some other
    corner case.

    In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
    however the side effect of this same race condition in /proc/pid/mem
    should be reproducible since before 2.6.12-rc2 so I couldn't add any
    other "Fixes:" because there's no hash beyond the git genesis commit.

    Because find_extend_vma() is the only location outside of the process
    context that could modify the "mm" structures under mmap_sem for
    reading, by adding the mmget_still_valid() check to it, all other cases
    that take the mmap_sem for reading don't need the new check after
    mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
    context also doesn't need the new check, because all tasks under core
    dumping are frozen.

    Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
    Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
    Signed-off-by: Andrea Arcangeli
    Reported-by: Jann Horn
    Suggested-by: Oleg Nesterov
    Acked-by: Peter Xu
    Reviewed-by: Mike Rapoport
    Reviewed-by: Oleg Nesterov
    Reviewed-by: Jann Horn
    Acked-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

29 Dec, 2018

1 commit

  • When the process being tracked does mremap() without
    UFFD_FEATURE_EVENT_REMAP on the corresponding tracking uffd file handle,
    we should not generate the remap event, and at the same time we should
    clear all the uffd flags on the new VMA. Without this patch, we can still
    have the VM_UFFD_MISSING|VM_UFFD_WP flags on the new VMA even though the
    fault handling process does not even know of the existence of the VMA.

    Link: http://lkml.kernel.org/r/20181211053409.20317-1-peterx@redhat.com
    Signed-off-by: Peter Xu
    Reviewed-by: Andrea Arcangeli
    Acked-by: Mike Rapoport
    Reviewed-by: William Kucharski
    Cc: Andrea Arcangeli
    Cc: Mike Rapoport
    Cc: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: Pavel Emelyanov
    Cc: Pravin Shedge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Xu