26 Oct, 2020
1 commit
-
…nux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1
Resolves conflicts in:
fs/userfaultfd.cSigned-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
17 Oct, 2020
1 commit
-
The preceding patches have ensured that core dumping properly takes the
mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
its users.Signed-off-by: Jann Horn
Signed-off-by: Andrew Morton
Acked-by: Linus Torvalds
Cc: Christoph Hellwig
Cc: Alexander Viro
Cc: "Eric W . Biederman"
Cc: Oleg Nesterov
Cc: Hugh Dickins
Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
Signed-off-by: Linus Torvalds
11 Aug, 2020
2 commits
-
…ub/scm/linux/kernel/git/acme/linux") into android-mainline
Tiny steps on the way to 5.9-rc1.
Fixes conflicts in:
fs/f2fs/inline.cSigned-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I16d863ae44a51156499458e8c3486587cbe2babe -
Pull locking updates from Thomas Gleixner:
"A set of locking fixes and updates:- Untangle the header spaghetti which causes build failures in
various situations caused by the lockdep additions to seqcount to
validate that the write side critical sections are non-preemptible.- The seqcount associated lock debug addons which were blocked by the
above fallout.seqcount writers contrary to seqlock writers must be externally
serialized, which usually happens via locking - except for strict
per CPU seqcounts. As the lock is not part of the seqcount, lockdep
cannot validate that the lock is held.This new debug mechanism adds the concept of associated locks.
sequence count has now lock type variants and corresponding
initializers which take a pointer to the associated lock used for
writer serialization. If lockdep is enabled the pointer is stored
and write_seqcount_begin() has a lockdep assertion to validate that
the lock is held.Aside of the type and the initializer no other code changes are
required at the seqcount usage sites. The rest of the seqcount API
is unchanged and determines the type at compile time with the help
of _Generic which is possible now that the minimal GCC version has
been moved up.Adding this lockdep coverage unearthed a handful of seqcount bugs
which have been addressed already independent of this.While generally useful this comes with a Trojan Horse twist: On RT
kernels the write side critical section can become preemtible if
the writers are serialized by an associated lock, which leads to
the well known reader preempts writer livelock. RT prevents this by
storing the associated lock pointer independent of lockdep in the
seqcount and changing the reader side to block on the lock when a
reader detects that a writer is in the write side critical section.- Conversion of seqcount usage sites to associated types and
initializers"* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
locking/seqlock, headers: Untangle the spaghetti monster
locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
x86/headers: Remove APIC headers from
seqcount: More consistent seqprop names
seqcount: Compress SEQCNT_LOCKNAME_ZERO()
seqlock: Fold seqcount_LOCKNAME_init() definition
seqlock: Fold seqcount_LOCKNAME_t definition
seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
hrtimer: Use sequence counter with associated raw spinlock
kvm/eventfd: Use sequence counter with associated spinlock
userfaultfd: Use sequence counter with associated spinlock
NFSv4: Use sequence counter with associated spinlock
iocost: Use sequence counter with associated spinlock
raid5: Use sequence counter with associated spinlock
vfs: Use sequence counter with associated spinlock
timekeeping: Use sequence counter with associated raw spinlock
xfrm: policy: Use sequence counters with associated lock
netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
netfilter: conntrack: Use sequence counter with associated spinlock
sched: tasks: Use sequence counter with associated spinlock
...
06 Aug, 2020
1 commit
-
…nux/kernel/git/mtd/linux") into android-mainline
steps along the way to 5.9-rc1
Change-Id: I3090afff778aaa50064b4d8cce21cd7d8bf746a4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
04 Aug, 2020
1 commit
-
Instead of waiting in a loop for the userfaultfd condition to become
true, just wait once and return VM_FAULT_RETRY.We've already dropped the mmap lock, we know we can't really
successfully handle the fault at this point and the caller will have to
retry anyway. So there's no point in making the wait any more
complicated than it needs to be - just schedule away.And once you don't have that complexity with explicit looping, you can
also just lose all the 'userfaultfd_signal_pending()' complexity,
because once we've set the correct process sleeping state, and don't
loop, the act of scheduling itself will be checking if there are any
pending signals before going to sleep.We can also drop the VM_FAULT_MAJOR games, since we'll be treating all
retried faults as major soon anyway (series to regularize and share more
of fault handling across architectures in a separate series by Peter Xu,
and in the meantime we won't worry about the possible minor - I'll be
here all week, try the veal - accounting difference).Cc: Andrea Arcangeli
Cc: Peter Xu
Signed-off-by: Linus Torvalds
29 Jul, 2020
1 commit
-
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.Signed-off-by: Ahmed S. Darwish
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20200720155530.1173732-23-a.darwish@linutronix.de
24 Jun, 2020
1 commit
-
…cm/linux/kernel/git/linkinjeon/exfat") into android-mainline
Steps on the way to 5.8-rc1.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4bc42f572167ea2f815688b4d1eb6124b6d260d4
10 Jun, 2020
4 commits
-
Convert comments that reference mmap_sem to reference mmap_lock instead.
[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]Signed-off-by: Michel Lespinasse
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Daniel Jordan
Cc: Davidlohr Bueso
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Laurent Dufour
Cc: Liam Howlett
Cc: Matthew Wilcox
Cc: Peter Zijlstra
Cc: Ying Han
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds -
Convert comments that reference old mmap_sem APIs to reference
corresponding new mmap locking APIs instead.Signed-off-by: Michel Lespinasse
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Davidlohr Bueso
Reviewed-by: Daniel Jordan
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Laurent Dufour
Cc: Liam Howlett
Cc: Matthew Wilcox
Cc: Peter Zijlstra
Cc: Ying Han
Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
Signed-off-by: Linus Torvalds -
Add new APIs to assert that mmap_sem is held.
Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
makes the assertions more tolerant of future changes to the lock type.Signed-off-by: Michel Lespinasse
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Daniel Jordan
Cc: Davidlohr Bueso
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Laurent Dufour
Cc: Liam Howlett
Cc: Matthew Wilcox
Cc: Peter Zijlstra
Cc: Ying Han
Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
Signed-off-by: Linus Torvalds -
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.The change is generated using coccinelle with the following rule:
// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .
@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)Signed-off-by: Michel Lespinasse
Signed-off-by: Andrew Morton
Reviewed-by: Daniel Jordan
Reviewed-by: Laurent Dufour
Reviewed-by: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Liam Howlett
Cc: Matthew Wilcox
Cc: Peter Zijlstra
Cc: Ying Han
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds
10 Apr, 2020
1 commit
-
…x") into android-mainline
Baby steps on the way to 5.7-rc1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I89095a90046a14eab189aab257a75b3dfdb5b1db
08 Apr, 2020
4 commits
-
Only declare _UFFDIO_WRITEPROTECT if the user specified
UFFDIO_REGISTER_MODE_WP and if all the checks passed. Then when the user
registers regions with shmem/hugetlbfs we won't expose the new ioctl to
them. Even with complete anonymous memory range, we'll only expose the
new WP ioctl bit if the register mode has MODE_WP.Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: Mike Rapoport
Cc: Andrea Arcangeli
Cc: Bobby Powers
Cc: Brian Geffon
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr . David Alan Gilbert"
Cc: Hugh Dickins
Cc: Jerome Glisse
Cc: Johannes Weiner
Cc: "Kirill A . Shutemov"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Maya Gokhale
Cc: Mel Gorman
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Cc: Rik van Riel
Cc: Shaohua Li
Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com
Signed-off-by: Linus Torvalds -
It does not make sense to try to wake up any waiting thread when we're
write-protecting a memory region. Only wake up when resolving a write
protected page fault.Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: Mike Rapoport
Cc: Andrea Arcangeli
Cc: Bobby Powers
Cc: Brian Geffon
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr . David Alan Gilbert"
Cc: Hugh Dickins
Cc: Jerome Glisse
Cc: Johannes Weiner
Cc: "Kirill A . Shutemov"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Maya Gokhale
Cc: Mel Gorman
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Cc: Rik van Riel
Cc: Shaohua Li
Link: http://lkml.kernel.org/r/20200220163112.11409-16-peterx@redhat.com
Signed-off-by: Linus Torvalds -
Introduce the new uffd-wp APIs for userspace.
Firstly, we'll allow to do UFFDIO_REGISTER with write protection tracking
using the new UFFDIO_REGISTER_MODE_WP flag. Note that this flag can
co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the
userspace program can not only resolve missing page faults, and at the
same time tracking page data changes along the way.Secondly, we introduced the new UFFDIO_WRITEPROTECT API to do page level
write protection tracking. Note that we will need to register the memory
region with UFFDIO_REGISTER_MODE_WP before that.[peterx@redhat.com: write up the commit message]
[peterx@redhat.com: remove useless block, write commit message, check against
VM_MAYWRITE rather than VM_WRITE when register]
Signed-off-by: Andrea Arcangeli
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: Jerome Glisse
Cc: Bobby Powers
Cc: Brian Geffon
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr . David Alan Gilbert"
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: "Kirill A . Shutemov"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Maya Gokhale
Cc: Mel Gorman
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Pavel Emelyanov
Cc: Rik van Riel
Cc: Shaohua Li
Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com
Signed-off-by: Linus Torvalds -
This allows UFFDIO_COPY to map pages write-protected.
[peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
commit messages]
Signed-off-by: Andrea Arcangeli
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Cc: Bobby Powers
Cc: Brian Geffon
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr . David Alan Gilbert"
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: "Kirill A . Shutemov"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Maya Gokhale
Cc: Mel Gorman
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Cc: Rik van Riel
Cc: Shaohua Li
Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com
Signed-off-by: Linus Torvalds
07 Apr, 2020
1 commit
-
…ux/kernel/git/mtd/linux") into android-mainline
Baby steps on the way to 5.7-rc1
Change-Id: I136ebb5242e3499873dcd5f5178ad7f68512d11c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
03 Apr, 2020
3 commits
-
Userfaultfd fault path was by default killable even if the caller does not
have FAULT_FLAG_KILLABLE. That makes sense before in that when with gup
we don't have FAULT_FLAG_KILLABLE properly set before. Now after previous
patch we've got FAULT_FLAG_KILLABLE applied even for gup code so it should
also make sense to let userfaultfd to honor the FAULT_FLAG_KILLABLE.Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code
right now, this patch should have no functional change. It also cleaned
the code a little bit by introducing some helpers.Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Tested-by: Brian Geffon
Cc: Andrea Arcangeli
Cc: Bobby Powers
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr . David Alan Gilbert"
Cc: Hugh Dickins
Cc: Jerome Glisse
Cc: Johannes Weiner
Cc: "Kirill A . Shutemov"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Matthew Wilcox
Cc: Maya Gokhale
Cc: Mel Gorman
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Pavel Emelyanov
Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com
Signed-off-by: Linus Torvalds -
handle_userfaultfd() is currently the only one place in the kernel page
fault procedures that can respond to non-fatal userspace signals. It was
trying to detect such an allowance by checking against USER & KILLABLE
flags, which was "un-official".In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show
that the fault handler allows the fault procedure to respond even to
non-fatal signals. Meanwhile, add this new flag to the default fault
flags so that all the page fault handlers can benefit from the new flag.
With that, replacing the userfault check to this one.Since the line is getting even longer, clean up the fault flags a bit too
to ease TTY users.Although we've got a new flag and applied it, we shouldn't have any
functional change with this patch so far.Suggested-by: Linus Torvalds
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Tested-by: Brian Geffon
Reviewed-by: David Hildenbrand
Cc: Andrea Arcangeli
Cc: Bobby Powers
Cc: Denis Plotnikov
Cc: "Dr . David Alan Gilbert"
Cc: Hugh Dickins
Cc: Jerome Glisse
Cc: Johannes Weiner
Cc: "Kirill A . Shutemov"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Matthew Wilcox
Cc: Maya Gokhale
Cc: Mel Gorman
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Pavel Emelyanov
Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com
Signed-off-by: Linus Torvalds -
This patch removes the risk path in handle_userfault() then we will be
sure that the callers of handle_mm_fault() will know that the VMAs might
have changed. Meanwhile with previous patch we don't lose responsiveness
as well since the core mm code now can handle the nonfatal userspace
signals even if we return VM_FAULT_RETRY.Suggested-by: Andrea Arcangeli
Suggested-by: Linus Torvalds
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Tested-by: Brian Geffon
Reviewed-by: Jerome Glisse
Cc: Bobby Powers
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr . David Alan Gilbert"
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: "Kirill A . Shutemov"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Matthew Wilcox
Cc: Maya Gokhale
Cc: Mel Gorman
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Pavel Emelyanov
Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com
Signed-off-by: Linus Torvalds
09 Dec, 2019
1 commit
-
Linux 5.5-rc1
Signed-off-by: Greg Kroah-Hartman
Change-Id: I6f952ebdd40746115165a2f99bab340482f5c237
02 Dec, 2019
3 commits
-
Merge updates from Andrew Morton:
"Incoming:- a small number of updates to scripts/, ocfs2 and fs/buffer.c
- most of MM
I still have quite a lot of material (mostly not MM) staged after
linux-next due to -next dependencies. I'll send those across next week
as the preprequisites get merged up"* emailed patches from Andrew Morton : (135 commits)
mm/page_io.c: annotate refault stalls from swap_readpage
mm/Kconfig: fix trivial help text punctuation
mm/Kconfig: fix indentation
mm/memory_hotplug.c: remove __online_page_set_limits()
mm: fix typos in comments when calling __SetPageUptodate()
mm: fix struct member name in function comments
mm/shmem.c: cast the type of unmap_start to u64
mm: shmem: use proper gfp flags for shmem_writepage()
mm/shmem.c: make array 'values' static const, makes object smaller
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
userfaultfd: wrap the common dst_vma check into an inlined function
userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
userfaultfd: use vma_pagesize for all huge page size calculation
mm/madvise.c: use PAGE_ALIGN[ED] for range checking
mm/madvise.c: replace with page_size() in madvise_inject_error()
mm/mmap.c: make vma_merge() comment more easy to understand
mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
autonuma: reduce cache footprint when scanning page tables
autonuma: fix watermark checking in migrate_balanced_pgdat()
... -
A while ago Andy noticed
(http://lkml.kernel.org/r/CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com)
that UFFD_FEATURE_EVENT_FORK used by an unprivileged user may have
security implications.As the first step of the solution the following patch limits the availably
of UFFD_FEATURE_EVENT_FORK only for those having CAP_SYS_PTRACE.The usage of CAP_SYS_PTRACE ensures compatibility with CRIU.
Yet, if there are other users of non-cooperative userfaultfd that run
without CAP_SYS_PTRACE, they would be broken :(Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file
descriptor table from the read() implementation of uffd, which may have
security implications for unprivileged use of the userfaultfd.Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have
CAP_SYS_PTRACE.Link: http://lkml.kernel.org/r/1572967777-8812-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Reviewed-by: Andrea Arcangeli
Cc: Daniel Colascione
Cc: Jann Horn
Cc: Lokesh Gidra
Cc: Nick Kralevich
Cc: Nosh Minwalla
Cc: Pavel Emelyanov
Cc: Tim Murray
Cc: Aleksa Sarai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the registration is repeated without VM_UFFD_MISSING or VM_UFFD_WP they
need to be cleared. Currently setting UFFDIO_REGISTER_MODE_WP returns
-EINVAL, so this patch is a noop until the UFFDIO_REGISTER_MODE_WP support
is applied.Link: http://lkml.kernel.org/r/20191004232834.GP13922@redhat.com
Signed-off-by: Andrea Arcangeli
Reported-by: Wei Yang
Reviewed-by: Wei Yang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Oct, 2019
1 commit
-
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.Acked-by: Jason Gunthorpe
Acked-by: Daniel Vetter
Acked-by: Mauro Carvalho Chehab
Acked-by: Greg Kroah-Hartman
Acked-by: David Sterba
Acked-by: Darren Hart (VMware)
Acked-by: Jonathan Cameron
Acked-by: Bjorn Andersson
Acked-by: Dan Williams
Signed-off-by: Arnd Bergmann
02 Oct, 2019
1 commit
-
To make the 5.4-rc1 merge easier, merge at a prerelease point in time
before the final release happens.Signed-off-by: Greg Kroah-Hartman
Change-Id: If613d657fd0abf9910c5bf3435a745f01b89765e
26 Sep, 2019
1 commit
-
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.userfaultfd code use provided user pointers for vma lookups, which can
only by done with untagged pointers.Untag user pointers in validate_range().
Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov
Reviewed-by: Mike Rapoport
Reviewed-by: Vincenzo Frascino
Reviewed-by: Catalin Marinas
Reviewed-by: Kees Cook
Cc: Al Viro
Cc: Dave Hansen
Cc: Eric Auger
Cc: Felix Kuehling
Cc: Jens Wiklander
Cc: Khalid Aziz
Cc: Mauro Carvalho Chehab
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Aug, 2019
1 commit
-
Linux 5.3-rc6
Signed-off-by: Greg Kroah-Hartman
Change-Id: Id10580d48d56054408b3efe0bd1866d67aba2a3d
25 Aug, 2019
1 commit
-
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
mm->core_state != NULL.Otherwise a page fault can see userfaultfd_missing() == T and use an
already freed userfaultfd_ctx.Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
Signed-off-by: Oleg Nesterov
Reported-by: Kefeng Wang
Reviewed-by: Andrea Arcangeli
Tested-by: Kefeng Wang
Cc: Peter Xu
Cc: Mike Rapoport
Cc: Jann Horn
Cc: Jason Gunthorpe
Cc: Michal Hocko
Cc: Tetsuo Handa
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Jul, 2019
1 commit
-
Linux 5.2
Signed-off-by: Greg Kroah-Hartman
05 Jul, 2019
1 commit
-
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
userfaultfd_ctx_read(), which in turn can be waiting for
userfaultfd_ctx::fault_pending_wqh.lock or
userfaultfd_ctx::event_wqh.lock.But elsewhere the fault_pending_wqh and event_wqh locks are taken with
IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep
reports that a deadlock is possible.Fix it by always disabling IRQs when taking the fault_pending_wqh and
event_wqh locks.Commit ae62c16e105a ("userfaultfd: disable irqs when taking the
waitqueue lock") didn't fix this because it only accounted for the
fd_wqh lock, not the other locks nested inside it.Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Signed-off-by: Eric Biggers
Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton
Cc: Christoph Hellwig
Cc: Andrea Arcangeli
Cc: [4.19+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Jun, 2019
1 commit
-
Linux 5.2-rc6
Signed-off-by: Greg Kroah-Hartman
19 Jun, 2019
1 commit
-
Based on 1 normalized pattern(s):
this work is licensed under the terms of the gnu gpl version 2 see
the copying file in the top level directoryextracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 35 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Kate Stewart
Reviewed-by: Enrico Weigelt
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de
Signed-off-by: Greg Kroah-Hartman
21 May, 2019
1 commit
-
Signed-off-by: Greg Kroah-Hartman
15 May, 2019
1 commit
-
Userfaultfd can be misued to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available. By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit. While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users. When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.Andrea said:
: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd. In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Signed-off-by: Peter Xu
Suggested-by: Andrea Arcangeli
Suggested-by: Mike Rapoport
Reviewed-by: Mike Rapoport
Reviewed-by: Andrea Arcangeli
Cc: Paolo Bonzini
Cc: Hugh Dickins
Cc: Luis Chamberlain
Cc: Maxime Coquelin
Cc: Maya Gokhale
Cc: Jerome Glisse
Cc: Pavel Emelyanov
Cc: Johannes Weiner
Cc: Martin Cracauer
Cc: Denis Plotnikov
Cc: Marty McFadden
Cc: Mike Kravetz
Cc: Kees Cook
Cc: Mel Gorman
Cc: "Kirill A . Shutemov"
Cc: "Dr . David Alan Gilbert"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 May, 2019
2 commits
-
Change-Id: I4380c68c3474026a42ffa9f95c525f9a563ba7a3
-
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "".The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.Includes fix from Jed Davis for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.Bug: 120441514
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
5b56d49fc31d ("mm: add locked parameter to get_user_pages_remote()")]
Signed-off-by: Amit Pundir
20 Apr, 2019
1 commit
-
The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it. Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli
Reported-by: Jann Horn
Suggested-by: Oleg Nesterov
Acked-by: Peter Xu
Reviewed-by: Mike Rapoport
Reviewed-by: Oleg Nesterov
Reviewed-by: Jann Horn
Acked-by: Jason Gunthorpe
Acked-by: Michal Hocko
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Dec, 2018
1 commit
-
When the process being tracked does mremap() without
UFFD_FEATURE_EVENT_REMAP on the corresponding tracking uffd file handle,
we should not generate the remap event, and at the same time we should
clear all the uffd flags on the new VMA. Without this patch, we can still
have the VM_UFFD_MISSING|VM_UFFD_WP flags on the new VMA even the fault
handling process does not even know the existance of the VMA.Link: http://lkml.kernel.org/r/20181211053409.20317-1-peterx@redhat.com
Signed-off-by: Peter Xu
Reviewed-by: Andrea Arcangeli
Acked-by: Mike Rapoport
Reviewed-by: William Kucharski
Cc: Andrea Arcangeli
Cc: Mike Rapoport
Cc: Kirill A. Shutemov
Cc: Hugh Dickins
Cc: Pavel Emelyanov
Cc: Pravin Shedge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds