17 Oct, 2020

1 commit

  • The preceding patches have ensured that core dumping properly takes the
    mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
    its users.

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     

11 Aug, 2020

1 commit

  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which caused build failures in
    various situations, triggered by the lockdep additions to seqcount
    that validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers, unlike seqlock writers, must be externally
    serialized, which usually happens via locking - except for strict
    per-CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks: the
    sequence count now has lock-type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside from the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful this comes with a Trojan Horse twist: On RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well-known reader-preempts-writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
     

04 Aug, 2020

1 commit

  • Instead of waiting in a loop for the userfaultfd condition to become
    true, just wait once and return VM_FAULT_RETRY.

    We've already dropped the mmap lock, we know we can't really
    successfully handle the fault at this point and the caller will have to
    retry anyway. So there's no point in making the wait any more
    complicated than it needs to be - just schedule away.

    And once you don't have that complexity with explicit looping, you can
    also just lose all the 'userfaultfd_signal_pending()' complexity,
    because once we've set the correct process sleeping state, and don't
    loop, the act of scheduling itself will be checking if there are any
    pending signals before going to sleep.

    We can also drop the VM_FAULT_MAJOR games, since we'll be treating all
    retried faults as major soon anyway (a series by Peter Xu to regularize
    and share more of the fault handling across architectures is coming
    separately, and in the meantime we won't worry about the possible minor
    - I'll be here all week, try the veal - accounting difference).
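
    For illustration, a minimal sketch of the wait-once idiom described above
    (not the actual fs/userfaultfd.c code; the function and names are made up):

    #include <linux/sched.h>
    #include <linux/wait.h>

    /* Hypothetical helper: enqueue, sleep exactly once, no explicit signal loop. */
    static void example_wait_once(wait_queue_head_t *wqh,
                                  struct wait_queue_entry *wait)
    {
            spin_lock_irq(&wqh->lock);
            __add_wait_queue(wqh, wait);
            /* TASK_KILLABLE: a pending fatal signal keeps the task runnable. */
            set_current_state(TASK_KILLABLE);
            spin_unlock_irq(&wqh->lock);

            schedule();                     /* wait once, then let the caller retry */
            __set_current_state(TASK_RUNNING);
    }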

    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Jul, 2020

1 commit

  • A sequence counter write side critical section must be protected by some
    form of locking to serialize writers. A plain seqcount_t does not
    contain the information of which lock must be held when entering a write
    side critical section.

    Use the new seqcount_spinlock_t data type, which allows associating a
    spinlock with the sequence counter. This enables lockdep to verify that
    the spinlock used for writer serialization is held when the write side
    critical section is entered.

    If lockdep is disabled this lock association is compiled out and
    adds neither storage nor runtime overhead.
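
    For illustration, a minimal sketch of the pattern (illustrative struct,
    field and function names, not the actual fs/userfaultfd.c ones):

    #include <linux/seqlock.h>
    #include <linux/spinlock.h>

    struct example_ctx {
            spinlock_t              lock;   /* serializes writers */
            seqcount_spinlock_t     seq;    /* associated with ->lock */
            unsigned long           value;
    };

    static void example_init(struct example_ctx *ctx)
    {
            spin_lock_init(&ctx->lock);
            seqcount_spinlock_init(&ctx->seq, &ctx->lock);
    }

    static void example_write(struct example_ctx *ctx, unsigned long v)
    {
            spin_lock(&ctx->lock);
            write_seqcount_begin(&ctx->seq);        /* lockdep: ctx->lock must be held */
            ctx->value = v;
            write_seqcount_end(&ctx->seq);
            spin_unlock(&ctx->lock);
    }

    static unsigned long example_read(struct example_ctx *ctx)
    {
            unsigned long v;
            unsigned int seq;

            do {
                    seq = read_seqcount_begin(&ctx->seq);
                    v = ctx->value;
            } while (read_seqcount_retry(&ctx->seq, seq));
            return v;
    }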

    Signed-off-by: Ahmed S. Darwish
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200720155530.1173732-23-a.darwish@linutronix.de

    Ahmed S. Darwish
     

10 Jun, 2020

4 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Add new APIs to assert that mmap_sem is held.

    Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
    makes the assertions more tolerant of future changes to the lock type.
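
    For illustration, a minimal sketch of how the new assertions are meant to
    be used (hypothetical helpers; only mmap_assert_locked() and
    mmap_assert_write_locked() are the real API):

    #include <linux/mm.h>
    #include <linux/mmap_lock.h>

    static unsigned long example_count_vmas(struct mm_struct *mm)
    {
            struct vm_area_struct *vma;
            unsigned long n = 0;

            mmap_assert_locked(mm);         /* caller holds mmap_lock (read or write) */
            for (vma = mm->mmap; vma; vma = vma->vm_next)
                    n++;
            return n;
    }

    static void example_modify_layout(struct mm_struct *mm)
    {
            mmap_assert_write_locked(mm);   /* caller holds mmap_lock for writing */
            /* ... mutate the vma layout here ... */
    }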

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
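
    For illustration, the before/after shape the rule above produces (a
    hypothetical caller, not code from this patch):

    #include <linux/mm.h>
    #include <linux/mmap_lock.h>

    static int example_walk(struct mm_struct *mm)
    {
            /* Before: down_read(&mm->mmap_sem); ... up_read(&mm->mmap_sem); */
            mmap_read_lock(mm);
            /* ... inspect vmas ... */
            mmap_read_unlock(mm);

            /* Before: if (down_write_killable(&mm->mmap_sem)) return -EINTR; */
            if (mmap_write_lock_killable(mm))
                    return -EINTR;
            /* ... modify vmas ... */
            mmap_write_unlock(mm);
            return 0;
    }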

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

08 Apr, 2020

4 commits

  • Only declare _UFFDIO_WRITEPROTECT if the user specified
    UFFDIO_REGISTER_MODE_WP and if all the checks passed. Then when the user
    registers regions with shmem/hugetlbfs we won't expose the new ioctl to
    them. Even for a completely anonymous memory range, we'll only expose
    the new WP ioctl bit if the register mode has MODE_WP.
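
    For illustration, a minimal userspace sketch of how a caller can detect
    whether the WP ioctl bit was exposed ("uffd" is assumed to be an already
    created userfaultfd descriptor):

    #include <linux/userfaultfd.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    static int example_register_wp(int uffd, void *addr, unsigned long len)
    {
            struct uffdio_register reg = {
                    .range = { .start = (unsigned long)addr, .len = len },
                    .mode  = UFFDIO_REGISTER_MODE_WP,
            };

            if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
                    return -1;

            /* The WP ioctl bit is only advertised when the checks above passed. */
            if (!(reg.ioctls & ((__u64)1 << _UFFDIO_WRITEPROTECT))) {
                    fprintf(stderr, "UFFDIO_WRITEPROTECT not available here\n");
                    return -1;
            }
            return 0;
    }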

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • It does not make sense to try to wake up any waiting thread when we're
    write-protecting a memory region. Only wake up when resolving a write
    protected page fault.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-16-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Introduce the new uffd-wp APIs for userspace.

    Firstly, we allow UFFDIO_REGISTER with write protection tracking using
    the new UFFDIO_REGISTER_MODE_WP flag. Note that this flag can co-exist
    with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the
    userspace program can not only resolve missing page faults but also
    track page data changes along the way.

    Secondly, we introduce the new UFFDIO_WRITEPROTECT API to do page-level
    write protection tracking. Note that the memory region needs to be
    registered with UFFDIO_REGISTER_MODE_WP before that.
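
    For illustration, a minimal userspace sketch of the resulting flow
    (error handling trimmed; addr/len are assumed to be a page-aligned
    anonymous region):

    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int example_wp_range(void *addr, unsigned long len, int enable)
    {
            int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
            struct uffdio_api api = { .api = UFFD_API };
            struct uffdio_register reg = {
                    .range = { .start = (unsigned long)addr, .len = len },
                    .mode  = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP,
            };
            struct uffdio_writeprotect wp = {
                    .range = { .start = (unsigned long)addr, .len = len },
                    .mode  = enable ? UFFDIO_WRITEPROTECT_MODE_WP : 0,
            };

            if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) ||
                ioctl(uffd, UFFDIO_REGISTER, &reg) ||
                ioctl(uffd, UFFDIO_WRITEPROTECT, &wp))
                    return -1;
            return uffd;    /* read uffd_msg events from this fd to track writes */
    }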

    [peterx@redhat.com: write up the commit message]
    [peterx@redhat.com: remove useless block, write commit message, check against
    VM_MAYWRITE rather than VM_WRITE when register]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Jerome Glisse
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • This allows UFFDIO_COPY to map pages write-protected.
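
    For illustration, a minimal userspace sketch of the new mode bit (an
    already set up userfaultfd and registered source/destination buffers are
    assumed; "example_copy_wp" is a made-up helper):

    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>

    static long example_copy_wp(int uffd, void *dst, void *src, unsigned long len)
    {
            struct uffdio_copy copy = {
                    .dst  = (unsigned long)dst,
                    .src  = (unsigned long)src,
                    .len  = len,
                    .mode = UFFDIO_COPY_MODE_WP,    /* map the copied pages write-protected */
            };

            if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
                    return -1;
            return copy.copy;       /* number of bytes actually copied */
    }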

    [peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
    around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
    commit messages]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: Jerome Glisse
    Reviewed-by: Mike Rapoport
    Cc: Bobby Powers
    Cc: Brian Geffon
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Cc: Rik van Riel
    Cc: Shaohua Li
    Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

03 Apr, 2020

3 commits

    The userfaultfd fault path was by default killable even if the caller
    does not have FAULT_FLAG_KILLABLE. That made sense before, because the
    gup code did not have FAULT_FLAG_KILLABLE properly set. Now that the
    previous patch applies FAULT_FLAG_KILLABLE to the gup code as well, it
    makes sense to let userfaultfd honor FAULT_FLAG_KILLABLE too.

    Because we're unconditionally setting FAULT_FLAG_KILLABLE in the gup
    code right now, this patch should have no functional change. It also
    cleans the code up a little by introducing some helpers.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
    handle_userfault() is currently the only place in the kernel page
    fault procedures that can respond to non-fatal userspace signals. It
    was trying to detect such an allowance by checking against the USER &
    KILLABLE flags, which was "un-official".

    In this patch, we introduce a new flag (FAULT_FLAG_INTERRUPTIBLE) to
    show that the fault handler allows the fault procedure to respond even
    to non-fatal signals. Meanwhile, add this new flag to the default fault
    flags so that all the page fault handlers can benefit from the new
    flag. With that, the userfault check is replaced with this one.

    Since the line is getting even longer, clean up the fault flags a bit
    too to ease TTY users.

    Although we've got a new flag and applied it, we shouldn't have any
    functional change with this patch so far.
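
    For illustration, a minimal sketch of how a fault path can pick its
    blocking state from these flags (hypothetical helper, not this patch's
    code):

    #include <linux/mm.h>
    #include <linux/sched.h>

    static unsigned int example_blocking_state(unsigned int fault_flags)
    {
            if (fault_flags & FAULT_FLAG_INTERRUPTIBLE)
                    return TASK_INTERRUPTIBLE;      /* wake on any signal */
            if (fault_flags & FAULT_FLAG_KILLABLE)
                    return TASK_KILLABLE;           /* wake on fatal signals only */
            return TASK_UNINTERRUPTIBLE;
    }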

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Reviewed-by: David Hildenbrand
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
    This patch removes the risky path in handle_userfault(), so that the
    callers of handle_mm_fault() will know that the VMAs might have
    changed. Meanwhile, with the previous patch we don't lose
    responsiveness either, since the core mm code can now handle non-fatal
    userspace signals even if we return VM_FAULT_RETRY.

    Suggested-by: Andrea Arcangeli
    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Reviewed-by: Jerome Glisse
    Cc: Bobby Powers
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     

02 Dec, 2019

3 commits

  • Merge updates from Andrew Morton:
    "Incoming:

    - a small number of updates to scripts/, ocfs2 and fs/buffer.c

    - most of MM

    I still have quite a lot of material (mostly not MM) staged after
    linux-next due to -next dependencies. I'll send those across next week
    as the prerequisites get merged up"

    * emailed patches from Andrew Morton : (135 commits)
    mm/page_io.c: annotate refault stalls from swap_readpage
    mm/Kconfig: fix trivial help text punctuation
    mm/Kconfig: fix indentation
    mm/memory_hotplug.c: remove __online_page_set_limits()
    mm: fix typos in comments when calling __SetPageUptodate()
    mm: fix struct member name in function comments
    mm/shmem.c: cast the type of unmap_start to u64
    mm: shmem: use proper gfp flags for shmem_writepage()
    mm/shmem.c: make array 'values' static const, makes object smaller
    userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
    fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
    userfaultfd: wrap the common dst_vma check into an inlined function
    userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
    userfaultfd: use vma_pagesize for all huge page size calculation
    mm/madvise.c: use PAGE_ALIGN[ED] for range checking
    mm/madvise.c: replace with page_size() in madvise_inject_error()
    mm/mmap.c: make vma_merge() comment more easy to understand
    mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
    autonuma: reduce cache footprint when scanning page tables
    autonuma: fix watermark checking in migrate_balanced_pgdat()
    ...

    Linus Torvalds
     
  • A while ago Andy noticed
    (http://lkml.kernel.org/r/CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com)
    that UFFD_FEATURE_EVENT_FORK used by an unprivileged user may have
    security implications.

    As the first step of the solution, the following patch limits the
    availability of UFFD_FEATURE_EVENT_FORK to those having CAP_SYS_PTRACE.

    The usage of CAP_SYS_PTRACE ensures compatibility with CRIU.

    Yet, if there are other users of non-cooperative userfaultfd that run
    without CAP_SYS_PTRACE, they would be broken :(

    Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file
    descriptor table from the read() implementation of uffd, which may have
    security implications for unprivileged use of the userfaultfd.

    Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have
    CAP_SYS_PTRACE.
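
    For illustration, a minimal sketch of the gating check (assumed to sit in
    the UFFDIO_API feature handshake; the function name and the exact errno
    here are assumptions):

    #include <linux/capability.h>
    #include <linux/errno.h>
    #include <linux/userfaultfd_k.h>

    static int example_validate_features(__u64 features)
    {
            /* Reject the fork event feature for non-ptrace-capable callers. */
            if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
                    return -EPERM;          /* assumption: the real errno may differ */
            return 0;
    }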

    Link: http://lkml.kernel.org/r/1572967777-8812-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrea Arcangeli
    Cc: Daniel Colascione
    Cc: Jann Horn
    Cc: Lokesh Gidra
    Cc: Nick Kralevich
    Cc: Nosh Minwalla
    Cc: Pavel Emelyanov
    Cc: Tim Murray
    Cc: Aleksa Sarai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • If the registration is repeated without VM_UFFD_MISSING or VM_UFFD_WP they
    need to be cleared. Currently setting UFFDIO_REGISTER_MODE_WP returns
    -EINVAL, so this patch is a noop until the UFFDIO_REGISTER_MODE_WP support
    is applied.

    Link: http://lkml.kernel.org/r/20191004232834.GP13922@redhat.com
    Signed-off-by: Andrea Arcangeli
    Reported-by: Wei Yang
    Reviewed-by: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

23 Oct, 2019

1 commit

  • The .ioctl and .compat_ioctl file operations have the same prototype so
    they can both point to the same function, which works great almost all
    the time when all the commands are compatible.

    One exception is the s390 architecture, where a compat pointer is only
    31 bit wide, and converting it into a 64-bit pointer requires calling
    compat_ptr(). Most drivers here will never run on s390, but since we now
    have a generic helper for it, it's easy enough to use it consistently.

    I double-checked all these drivers to ensure that all ioctl arguments
    are used as pointers or are ignored, but are not interpreted as integer
    values.
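
    For illustration, a minimal sketch of wiring up the generic helper in a
    driver whose ioctl arguments are all pointers (hypothetical driver, not
    this patch's diff):

    #include <linux/fs.h>
    #include <linux/module.h>

    static long example_ioctl(struct file *file, unsigned int cmd,
                              unsigned long arg)
    {
            /* ... dispatch on cmd, treating arg as a user pointer ... */
            return -ENOTTY;
    }

    static const struct file_operations example_fops = {
            .owner          = THIS_MODULE,
            .unlocked_ioctl = example_ioctl,
            .compat_ioctl   = compat_ptr_ioctl,     /* converts the pointer on s390 */
    };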

    Acked-by: Jason Gunthorpe
    Acked-by: Daniel Vetter
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    Acked-by: David Sterba
    Acked-by: Darren Hart (VMware)
    Acked-by: Jonathan Cameron
    Acked-by: Bjorn Andersson
    Acked-by: Dan Williams
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

26 Sep, 2019

1 commit

    This patch is part of a series that extends the kernel ABI to allow
    passing tagged user pointers (with the top byte set to something other
    than 0x00) as syscall arguments.

    The userfaultfd code uses the provided user pointers for vma lookups,
    which can only be done with untagged pointers.

    Untag user pointers in validate_range().
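
    For illustration, a minimal sketch of the shape of the change (not the
    exact validate_range() code; the checks are simplified):

    #include <linux/mm.h>
    #include <linux/types.h>

    static int example_validate_range(__u64 start, __u64 len)
    {
            /* Strip tag bits so the later vma lookup sees a canonical address. */
            start = untagged_addr(start);

            if ((start & ~PAGE_MASK) || (len & ~PAGE_MASK))
                    return -EINVAL;
            if (start + len <= start)       /* empty range or overflow */
                    return -EINVAL;
            return 0;
    }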

    Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Mike Rapoport
    Reviewed-by: Vincenzo Frascino
    Reviewed-by: Catalin Marinas
    Reviewed-by: Kees Cook
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Auger
    Cc: Felix Kuehling
    Cc: Jens Wiklander
    Cc: Khalid Aziz
    Cc: Mauro Carvalho Chehab
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

25 Aug, 2019

1 commit

  • userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
    mm->core_state != NULL.

    Otherwise a page fault can see userfaultfd_missing() == T and use an
    already freed userfaultfd_ctx.

    Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
    Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
    Signed-off-by: Oleg Nesterov
    Reported-by: Kefeng Wang
    Reviewed-by: Andrea Arcangeli
    Tested-by: Kefeng Wang
    Cc: Peter Xu
    Cc: Mike Rapoport
    Cc: Jann Horn
    Cc: Jason Gunthorpe
    Cc: Michal Hocko
    Cc: Tetsuo Handa
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

05 Jul, 2019

1 commit

  • When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
    and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.

    This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
    userfaultfd_ctx_read(), which in turn can be waiting for
    userfaultfd_ctx::fault_pending_wqh.lock or
    userfaultfd_ctx::event_wqh.lock.

    But elsewhere the fault_pending_wqh and event_wqh locks are taken with
    IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep
    reports that a deadlock is possible.

    Fix it by always disabling IRQs when taking the fault_pending_wqh and
    event_wqh locks.

    Commit ae62c16e105a ("userfaultfd: disable irqs when taking the
    waitqueue lock") didn't fix this because it only accounted for the
    fd_wqh lock, not the other locks nested inside it.
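
    For illustration, a minimal sketch of the locking rule the fix enforces
    (illustrative function; the real sites are the fault/event queueing paths
    in fs/userfaultfd.c):

    #include <linux/wait.h>

    static void example_queue_event(wait_queue_head_t *fault_pending_wqh,
                                    struct wait_queue_entry *wq)
    {
            /*
             * Use the _irq variant: with aio poll these locks end up nested
             * inside locks that are also taken from IRQ context, so holding
             * them with IRQs enabled is what lockdep flags as a possible
             * deadlock.
             */
            spin_lock_irq(&fault_pending_wqh->lock);
            __add_wait_queue(fault_pending_wqh, wq);
            spin_unlock_irq(&fault_pending_wqh->lock);
    }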

    Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
    Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
    Signed-off-by: Eric Biggers
    Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
    Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
    Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Andrea Arcangeli
    Cc: [4.19+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     

19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2 see
    the copying file in the top level directory

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 35 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

1 commit

    Userfaultfd can be misused to make it easier to exploit existing
    use-after-free (and similar) bugs that might otherwise only make a
    short window or race condition available. By using userfaultfd to
    stall a kernel thread, a malicious program can keep some state that it
    wrote stable for an extended period, which it can then access using an
    existing exploit. While it doesn't cause the exploit itself, and while
    it's not the only thing that can stall a kernel thread when accessing a
    memory location, it's one of the few that never needs privilege.

    We can add a flag allowing userfaultfd to be restricted, so that in
    general it won't be usable by arbitrary user programs, but in
    environments that require userfaultfd it can be turned back on.

    Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
    whether userfaultfd is allowed by unprivileged users. When this is
    set to zero, only privileged users (root user, or users with the
    CAP_SYS_PTRACE capability) will be able to use the userfaultfd
    syscalls.

    Andrea said:

    : The only difference between the bpf sysctl and the userfaultfd sysctl
    : this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
    : requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
    : because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
    : already if it's doing other kind of tracking on processes runtime, in
    : addition of userfaultfd. In other words both syscalls works only for
    : root, when the two sysctl are opt-in set to 1.

    [dgilbert@redhat.com: changelog additions]
    [akpm@linux-foundation.org: documentation tweak, per Mike]
    Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
    Signed-off-by: Peter Xu
    Suggested-by: Andrea Arcangeli
    Suggested-by: Mike Rapoport
    Reviewed-by: Mike Rapoport
    Reviewed-by: Andrea Arcangeli
    Cc: Paolo Bonzini
    Cc: Hugh Dickins
    Cc: Luis Chamberlain
    Cc: Maxime Coquelin
    Cc: Maya Gokhale
    Cc: Jerome Glisse
    Cc: Pavel Emelyanov
    Cc: Johannes Weiner
    Cc: Martin Cracauer
    Cc: Denis Plotnikov
    Cc: Marty McFadden
    Cc: Mike Kravetz
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: "Kirill A . Shutemov"
    Cc: "Dr . David Alan Gilbert"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Xu
     

20 Apr, 2019

1 commit

    The core dumping code has always run without holding the mmap_sem for
    writing, despite that being the only way to ensure that the entire vma
    layout will not change from under it. Only using some signal
    serialization on the processes belonging to the mm is not nearly enough.
    This was pointed out earlier, for example in Hugh's post from Jul 2017:

    https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

    "Not strictly relevant here, but a related note: I was very surprised
    to discover, only quite recently, how handle_mm_fault() may be called
    without down_read(mmap_sem) - when core dumping. That seems a
    misguided optimization to me, which would also be nice to correct"

    In particular, because growsdown and growsup can move vm_start/vm_end,
    the various loops the core dump does over the vmas will not be
    consistent if page faults can happen concurrently.

    Pretty much all users calling mmget_not_zero()/get_task_mm() and then
    taking the mmap_sem had the potential to introduce unexpected side
    effects in the core dumping code.

    Adding mmap_sem for writing around the ->core_dump invocation is a
    viable long term fix, but it requires removing all copy-user and page
    faults and replacing them with get_dump_page() for all binary formats,
    which is not suitable as a short term fix.

    For the time being this solution manually covers the places that can
    confuse the core dump either by altering the vma layout or the vma flags
    while it runs. Once ->core_dump runs under mmap_sem for writing the
    function mmget_still_valid() can be dropped.

    Allowing mmap_sem protected sections to run in parallel with the
    coredump provides some minor parallelism advantage to the swapoff code
    (which seems to be safe enough by never mangling any vma field and can
    keep doing swapins in parallel to the core dumping) and to some other
    corner case.

    In order to facilitate the backporting I added "Fixes: 86039bd3b4e6";
    however, the side effect of this same race condition in /proc/pid/mem
    should be reproducible since before 2.6.12-rc2, so I couldn't add any
    other "Fixes:" because there's no hash beyond the git genesis commit.

    Because find_extend_vma() is the only location outside of the process
    context that could modify the "mm" structures under mmap_sem for
    reading, by adding the mmget_still_valid() check to it, all other cases
    that take the mmap_sem for reading don't need the new check after
    mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
    context also doesn't need the new check, because all tasks under core
    dumping are frozen.
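
    For illustration, a sketch of what the interim check boils down to (a
    hypothetical wrapper; mmget_still_valid() is the real helper being
    referenced):

    #include <linux/mm_types.h>
    #include <linux/sched/mm.h>

    static bool example_can_mangle_vmas(struct mm_struct *mm)
    {
            /* Essentially mmget_still_valid(): no core dump snapshot in progress. */
            return likely(!mm->core_state);
    }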

    Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
    Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
    Signed-off-by: Andrea Arcangeli
    Reported-by: Jann Horn
    Suggested-by: Oleg Nesterov
    Acked-by: Peter Xu
    Reviewed-by: Mike Rapoport
    Reviewed-by: Oleg Nesterov
    Reviewed-by: Jann Horn
    Acked-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

29 Dec, 2018

2 commits

  • When the process being tracked does mremap() without
    UFFD_FEATURE_EVENT_REMAP on the corresponding tracking uffd file handle,
    we should not generate the remap event, and at the same time we should
    clear all the uffd flags on the new VMA. Without this patch, we can
    still have the VM_UFFD_MISSING|VM_UFFD_WP flags on the new VMA even
    though the fault handling process does not even know of the existence
    of the VMA.

    Link: http://lkml.kernel.org/r/20181211053409.20317-1-peterx@redhat.com
    Signed-off-by: Peter Xu
    Reviewed-by: Andrea Arcangeli
    Acked-by: Mike Rapoport
    Reviewed-by: William Kucharski
    Cc: Andrea Arcangeli
    Cc: Mike Rapoport
    Cc: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: Pavel Emelyanov
    Cc: Pravin Shedge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Reference counters should use refcount_t rather than atomic_t, since the
    refcount_t implementation can prevent overflows, reducing the
    exploitability of reference leak bugs. userfaultfd_ctx::refcount is a
    reference counter with the usual semantics, so convert it to refcount_t.

    Note: I replaced the BUG() on incrementing a 0 refcount with just
    refcount_inc(), since part of the semantics of refcount_t is that
    incrementing a 0 refcount is not allowed; with CONFIG_REFCOUNT_FULL,
    refcount_inc() already checks for it and warns.
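
    For illustration, the usual shape of such a conversion (illustrative
    struct and helpers; the real counter is userfaultfd_ctx::refcount):

    #include <linux/refcount.h>
    #include <linux/slab.h>

    struct example_obj {
            refcount_t refcount;
            /* ... payload ... */
    };

    static struct example_obj *example_new(void)
    {
            struct example_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

            if (obj)
                    refcount_set(&obj->refcount, 1);
            return obj;
    }

    static void example_get(struct example_obj *obj)
    {
            refcount_inc(&obj->refcount);   /* warns instead of silently wrapping */
    }

    static void example_put(struct example_obj *obj)
    {
            if (refcount_dec_and_test(&obj->refcount))
                    kfree(obj);
    }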

    Link: http://lkml.kernel.org/r/20181115003916.63381-1-ebiggers@kernel.org
    Signed-off-by: Eric Biggers
    Reviewed-by: Andrew Morton
    Cc: Andrea Arcangeli
    Reviewed-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     

27 Dec, 2018

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The biggest RCU changes in this cycle were:

    - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

    - Replace calls of RCU-bh and RCU-sched update-side functions to
    their vanilla RCU counterparts. This series is a step towards
    complete removal of the RCU-bh and RCU-sched update-side functions.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - Documentation updates, including a number of flavor-consolidation
    updates from Joel Fernandes.

    - Miscellaneous fixes.

    - Automate generation of the initrd filesystem used for rcutorture
    testing.

    - Convert spin_is_locked() assertions to instead use lockdep.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - SRCU updates, especially including a fix from Dennis Krein for a
    bag-on-head-class bug.

    - RCU torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
    rcutorture: Don't do busted forward-progress testing
    rcutorture: Use 100ms buckets for forward-progress callback histograms
    rcutorture: Recover from OOM during forward-progress tests
    rcutorture: Print forward-progress test age upon failure
    rcutorture: Print time since GP end upon forward-progress failure
    rcutorture: Print histogram of CB invocation at OOM time
    rcutorture: Print GP age upon forward-progress failure
    rcu: Print per-CPU callback counts for forward-progress failures
    rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
    rcutorture: Dump grace-period diagnostics upon forward-progress OOM
    rcutorture: Prepare for asynchronous access to rcu_fwd_startat
    torture: Remove unnecessary "ret" variables
    rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
    rcutorture: Break up too-long rcu_torture_fwd_prog() function
    rcutorture: Remove cbflood facility
    torture: Bring any extra CPUs online during kernel startup
    rcutorture: Add call_rcu() flooding forward-progress tests
    rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
    tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
    net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
    ...

    Linus Torvalds
     

15 Dec, 2018

1 commit

  • Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
    could trigger a harmless false positive WARN_ON. Check that the vma is
    already registered before checking VM_MAYWRITE to shut off the false
    positive warning.

    Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
    Cc:
    Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
    Signed-off-by: Andrea Arcangeli
    Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com
    Acked-by: Mike Rapoport
    Acked-by: Hugh Dickins
    Acked-by: Peter Xu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

04 Dec, 2018

1 commit

  • …k/linux-rcu into core/rcu

    Pull RCU changes from Paul E. McKenney:

    - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

    - Replace calls of RCU-bh and RCU-sched update-side functions
    to their vanilla RCU counterparts. This series is a step
    towards complete removal of the RCU-bh and RCU-sched update-side
    functions.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - Documentation updates, including a number of flavor-consolidation
    updates from Joel Fernandes.

    - Miscellaneous fixes.

    - Automate generation of the initrd filesystem used for
    rcutorture testing.

    - Convert spin_is_locked() assertions to instead use lockdep.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - SRCU updates, especially including a fix from Dennis Krein
    for a bag-on-head-class bug.

    - RCU torture-test updates.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

01 Dec, 2018

1 commit

  • After the VMA to register the uffd onto is found, check that it has
    VM_MAYWRITE set before allowing registration. This way we inherit all
    common code checks before allowing to fill file holes in shmem and
    hugetlbfs with UFFDIO_COPY.

    The userfaultfd memory model is not applicable for readonly files unless
    it's a MAP_PRIVATE.

    Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
    Fixes: ff62a3421044 ("hugetlb: implement memfd sealing")
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Mike Rapoport
    Reviewed-by: Hugh Dickins
    Reported-by: Jann Horn
    Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
    Cc:
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Peter Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

13 Nov, 2018

1 commit

  • lockdep_assert_held() is better suited to checking locking requirements,
    since it only checks if the current thread holds the lock regardless of
    whether someone else does. This is also a step towards possibly removing
    spin_is_locked().
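
    For illustration, the shape of the change (illustrative function):

    #include <linux/lockdep.h>
    #include <linux/spinlock.h>

    static void example_assert_ctx_locked(spinlock_t *lock)
    {
            /*
             * A spin_is_locked() check only proves that *somebody* holds the
             * lock; lockdep_assert_held() proves the current context holds it
             * and compiles away entirely when lockdep is disabled.
             */
            lockdep_assert_held(lock);
    }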

    Signed-off-by: Lance Roy
    Cc: Alexander Viro
    Cc:
    Signed-off-by: Paul E. McKenney

    Lance Roy
     

27 Oct, 2018

1 commit

    userfaultfd contains home-grown locking of the waitqueue lock, and does
    not disable interrupts. This relies on the fact that no one else takes
    it from interrupt context and violates an invariant of the normal
    waitqueue locking scheme. With aio poll it is easy to trigger other
    locks that disable interrupts (or are called from interrupt context).

    Link: http://lkml.kernel.org/r/20181018154101.18750-1-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Andrea Arcangeli
    Reviewed-by: Andrew Morton
    Cc: [4.19.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

24 Aug, 2018

1 commit

    Use the new return type vm_fault_t for the fault handler. For now,
    this is just documenting that the function returns a VM_FAULT value
    rather than an errno. Once all instances are converted, vm_fault_t
    will become a distinct type.

    Ref: commit 1c8f422059ae ("mm: change return type to vm_fault_t")

    The aim is to change the return type of finish_fault() and
    handle_mm_fault() to vm_fault_t. As part of that cleanup, the return
    types of all other recursively called functions have been changed to
    vm_fault_t as well.

    The places from which handle_mm_fault() is invoked will be changed to
    vm_fault_t in a separate patch.

    vmf_error() is the newly introduced inline function in 4.17-rc6.
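
    For illustration, what a converted handler looks like (hypothetical fault
    handler, not this patch's diff):

    #include <linux/gfp.h>
    #include <linux/mm.h>

    static vm_fault_t example_fault(struct vm_fault *vmf)
    {
            struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);

            if (!page)
                    return vmf_error(-ENOMEM);      /* errno -> VM_FAULT_OOM */

            vmf->page = page;
            return 0;       /* success and failures are VM_FAULT_* bits, not errnos */
    }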

    [akpm@linux-foundation.org: don't shadow outer local `ret' in __do_huge_pmd_anonymous_page()]
    Link: http://lkml.kernel.org/r/20180604171727.GA20279@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     

23 Aug, 2018

1 commit

    The userfaultfd code currently uses the unlocked waitqueue helpers for
    managing fault_wqh, but instead of holding the waitqueue lock for this
    waitqueue around these calls, it holds the waitqueue lock of
    fault_pending_wqh, which is a different waitqueue instance. Given that
    the waitqueue is not exposed to the rest of the kernel this actually
    works ok at the moment, but prevents the userfaultfd locking rules from
    being enforced using lockdep.

    Switch to the internally locked waitqueue helpers instead. This means
    that the lock inside fault_wqh now nests inside the fault_pending_wqh
    lock, but that's not a problem since it was entirely unused before.
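
    For illustration, the difference between the two helper families
    (illustrative functions; wqh stands in for fault_wqh):

    #include <linux/sched.h>
    #include <linux/wait.h>

    /* Unlocked helper: only valid while wqh->lock itself is held. */
    static void example_wake_locked(wait_queue_head_t *wqh, void *key)
    {
            spin_lock_irq(&wqh->lock);
            __wake_up_locked_key(wqh, TASK_NORMAL, key);
            spin_unlock_irq(&wqh->lock);
    }

    /* Internally locked helper: takes wqh->lock itself, so lockdep sees it. */
    static void example_wake(wait_queue_head_t *wqh, void *key)
    {
            __wake_up(wqh, TASK_NORMAL, 1, key);
    }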

    [hch@lst.de: slight changelog updates]
    [rppt@linux.vnet.ibm.com: spotted changelog spellos]
    Link: http://lkml.kernel.org/r/20171214152344.6880-3-hch@lst.de
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mike Rapoport
    Cc: Al Viro
    Cc: Andrea Arcangeli
    Cc: Ingo Molnar
    Cc: Jason Baron
    Cc: Peter Zijlstra
    Cc: Davidlohr Bueso
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

18 Aug, 2018

1 commit

    Pointer uwq is being assigned but is never used, hence it is redundant
    and can be removed.

    Cleans up clang warning:
    warning: variable 'uwq' set but not used [-Wunused-but-set-variable]

    Link: http://lkml.kernel.org/r/20180717090802.18357-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     

03 Aug, 2018

1 commit

  • The fix in commit 0cbb4b4f4c44 ("userfaultfd: clear the
    vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails") cleared the
    vma->vm_userfaultfd_ctx but kept userfaultfd flags in vma->vm_flags
    that were copied from the parent process VMA.

    As a result, there is an inconsistency between the values of
    vma->vm_userfaultfd_ctx.ctx and vma->vm_flags which triggers BUG_ON
    in userfaultfd_release().

    Clearing the uffd flags from vma->vm_flags in case of UFFD_EVENT_FORK
    failure resolves the issue.

    Link: http://lkml.kernel.org/r/1532931975-25473-1-git-send-email-rppt@linux.vnet.ibm.com
    Fixes: 0cbb4b4f4c44 ("userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails")
    Signed-off-by: Mike Rapoport
    Reported-by: syzbot+121be635a7a35ddb7dcb@syzkaller.appspotmail.com
    Cc: Andrea Arcangeli
    Cc: Eric Biggers
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

04 Jul, 2018

1 commit

  • Use huge_ptep_get() to translate huge ptes to normal ptes so we can
    check them with the huge_pte_* functions. Otherwise some architectures
    will check the wrong values and will not wait for userspace to bring in
    the memory.
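
    For illustration, the shape of the fix (a simplified sketch, not the full
    userfaultfd_huge_must_wait() logic):

    #include <linux/hugetlb.h>
    #include <linux/mm.h>

    static bool example_hugetlb_must_wait(pte_t *ptep)
    {
            /* Translate through the arch helper instead of dereferencing ptep. */
            pte_t pte = huge_ptep_get(ptep);

            return huge_pte_none(pte);      /* no page yet: wait for UFFDIO_COPY */
    }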

    Link: http://lkml.kernel.org/r/20180626132421.78084-1-frankja@linux.ibm.com
    Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
    Signed-off-by: Janosch Frank
    Reviewed-by: David Hildenbrand
    Reviewed-by: Mike Kravetz
    Cc: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Janosch Frank
     

08 Jun, 2018

1 commit

    If a process monitored with userfaultfd changes its memory mappings or
    forks() at the same time as the uffd monitor fills the process memory
    with UFFDIO_COPY, the actual creation of page table entries and copying
    of the data in mcopy_atomic may happen either before or after the
    memory mapping modifications, and there is no way for the uffd monitor
    to maintain a consistent view of the process memory layout.

    For instance, let's consider fork() running in parallel with
    userfaultfd_copy():

    process                          |  uffd monitor
    ---------------------------------+------------------------------
    fork()                           |  userfaultfd_copy()
    ...                              |  ...
      dup_mmap()                     |    down_read(mmap_sem)
      down_write(mmap_sem)           |    /* create PTEs, copy data */
        dup_uffd()                   |    up_read(mmap_sem)
        copy_page_range()            |
        up_write(mmap_sem)           |
        dup_uffd_complete()          |
          /* notify monitor */       |

    If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will
    be present by the time copy_page_range() is called and they will appear
    in the child's memory mappings. However, if the fork() is the first to
    take the mmap_sem, the new pages won't be mapped in the child's address
    space.

    If the pages are not present and the child tries to access them, the
    monitor will get a page fault notification and everything is fine.
    However, if the pages *are present*, the child can access them without
    uffd noticing. And if we copy them into the child it'll see the wrong
    data. Since we are talking about background copy, we'd need to decide
    whether the pages should be copied or not regardless of #PF
    notifications.

    Since the userfaultfd monitor has no way to determine what the order
    was, let's disallow userfaultfd_copy in parallel with the
    non-cooperative events. In such a case we return -EAGAIN and the uffd
    monitor can understand that userfaultfd_copy() clashed with a
    non-cooperative event and take an appropriate action.
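
    For illustration, a minimal monitor-side sketch of handling the new
    -EAGAIN (handle_uffd_events() is a hypothetical helper that drains the
    non-cooperative event queue):

    #include <errno.h>
    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>

    void handle_uffd_events(int uffd);      /* hypothetical: process fork/remap/unmap events */

    static int example_copy_retry(int uffd, struct uffdio_copy *copy)
    {
            while (ioctl(uffd, UFFDIO_COPY, copy) == -1) {
                    if (errno != EAGAIN)
                            return -1;
                    handle_uffd_events(uffd);
                    copy->copy = 0;         /* reset the result field before retrying */
            }
            return 0;
    }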

    Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Pavel Emelyanov
    Cc: Andrea Arcangeli
    Cc: Mike Kravetz
    Cc: Andrei Vagin
    Signed-off-by: Andrew Morton

    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But the keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds