07 Dec, 2020

1 commit

  • On success, mmap should return the start address of the newly mapped
    area, but the patch "mm: mmap: merge vma after call_mmap() if possible"
    sets the return value addr from the vm_start of the newly merged vma.
    Users of mmap therefore get a wrong address if the vma is merged after
    call_mmap(). Fix this by moving the assignment to addr before the vma
    is merged.

    We have a driver that changes vm_flags, and this bug was found by our
    test cases.

    Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible")
    Signed-off-by: Liu Zixian
    Signed-off-by: Andrew Morton
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: David Hildenbrand
    Cc: Miaohe Lin
    Cc: Hongxiang Lou
    Cc: Hu Shiyuan
    Cc: Matthew Wilcox
    Link: https://lkml.kernel.org/r/20201203085350.22624-1-liuzixian4@huawei.com
    Signed-off-by: Linus Torvalds

    Liu Zixian
     

19 Oct, 2020

2 commits

  • There are two locations that have a block of code for munmapping a vma
    range. Change those two locations to use a function and add meaningful
    comments about what happens to the arguments, which was unclear in the
    previous code.

    Signed-off-by: Liam R. Howlett
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200818154707.2515169-2-Liam.Howlett@Oracle.com
    Signed-off-by: Linus Torvalds

    Liam R. Howlett
     
  • There are three places that need the next vma, all using the same
    block of code. Replace the block with a function and add comments on
    what happens when NULL is encountered.

    Signed-off-by: Liam R. Howlett
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200818154707.2515169-1-Liam.Howlett@Oracle.com
    Signed-off-by: Linus Torvalds

    Liam R. Howlett
     

17 Oct, 2020

2 commits

  • The preceding patches have ensured that core dumping properly takes the
    mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
    its users.

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • Commit 1da177e4c3f4 ("Linux-2.6.12-rc2") introduced the helper
    put_write_access() to wrap the atomic_dec of the i_writecount field,
    but __vma_link_file() and dup_mmap() still open-code the decrement
    instead of using the helper.

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200924115235.5111-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     

14 Oct, 2020

9 commits

  • Replace do_brk with do_brk_flags in the comment of insert_vm_struct(),
    since do_brk was removed by the commit referenced below.

    Fixes: bb177a732c4369 ("mm: do not bug_on on incorrect length in __mm_populate()")
    Signed-off-by: Liao Pingfang
    Signed-off-by: Yi Wang
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/1600650778-43230-1-git-send-email-wang.yi59@zte.com.cn
    Signed-off-by: Linus Torvalds

    Liao Pingfang
     
  • Commit 1da177e4c3f4 ("Linux-2.6.12-rc2") introduced the helper
    allow_write_access() to wrap the atomic_inc of the i_writecount field,
    but __remove_shared_vm_struct() still open-codes the increment instead
    of using the helper.

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200921115814.39680-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     
  • Commit 4bb5f5d9395b ("mm: allow drivers to prevent new writable
    mappings") changed i_mmap_writable from unsigned int to atomic_t and
    added the helper mapping_allow_writable() to increment it, but forgot
    to use this helper in dup_mmap() and __vma_link_file().

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Cc: Christian Brauner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Christian Kellner
    Cc: Suren Baghdasaryan
    Cc: Adrian Reber
    Cc: Shakeel Butt
    Cc: Aleksa Sarai
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200917112736.7789-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     
  • In __vma_adjust(), the check on *root* decides whether to adjust the
    address_space. It is more meaningful to check *file* itself: we adjust
    this data precisely because the vma is file-backed.

    Since the address_space is assumed to be valid for a file-backed vma,
    just replace *root* with *file* here.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200913133631.37781-2-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • *root*, of type struct rb_root_cached, is a member of *mapping*, of
    type struct address_space. This implies that a valid *root* is always
    part of a valid *mapping*.

    So the two checks can be merged, making the code easier to read and
    saving a few CPU cycles.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200913133631.37781-1-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • Instead of converting adjust_next between bytes and page numbers,
    just store the virtual address into adjust_next.

    Also, this patch fixes one typo in the comment of vma_adjust_trans_huge().

    [vbabka@suse.cz: changelog tweak]

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Mike Kravetz
    Link: http://lkml.kernel.org/r/20200828081031.11306-1-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • These two functions share the same logic except ignore a different vma.

    Let's reuse the code.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200809232057.23477-2-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • __vma_unlink_common() and __vma_unlink() are counterparts. Since there
    is no function named __vma_unlink(), rename __vma_unlink_common() to
    __vma_unlink() to make the code more self-explanatory and easier for
    readers to understand.

    Otherwise one might expect several variants of vma_unlink() to exist,
    with __vma_unlink_common() shared by them.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200809232057.23477-1-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

13 Oct, 2020

1 commit

  • Pull arm64 updates from Will Deacon:
    "There's quite a lot of code here, but much of it is due to the
    addition of a new PMU driver as well as some arm64-specific selftests,
    an area where we've traditionally been lagging a bit.

    In terms of exciting features, this includes support for the Memory
    Tagging Extension which narrowly missed 5.9, hopefully allowing
    userspace to run with use-after-free detection in production on CPUs
    that support it. Work is ongoing to integrate the feature with KASAN
    for 5.11.

    Another change that I'm excited about (assuming they get the hardware
    right) is preparing the ASID allocator for sharing the CPU page-table
    with the SMMU. Those changes will also come in via Joerg with the
    IOMMU pull.

    We do stray outside of our usual directories in a few places, mostly
    due to core changes required by MTE. Although much of this has been
    Acked, there were a couple of places where we unfortunately didn't get
    any review feedback.

    Other than that, we ran into a handful of minor conflicts in -next,
    but nothing that should pose any issues.

    Summary:

    - Userspace support for the Memory Tagging Extension introduced by
    Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

    - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
    switching.

    - Fix and subsequent rewrite of our Spectre mitigations, including
    the addition of support for PR_SPEC_DISABLE_NOEXEC.

    - Support for the Armv8.3 Pointer Authentication enhancements.

    - Support for ASID pinning, which is required when sharing
    page-tables with the SMMU.

    - MM updates, including treating flush_tlb_fix_spurious_fault() as a
    no-op.

    - Perf/PMU driver updates, including addition of the ARM CMN PMU
    driver and also support to handle CPU PMU IRQs as NMIs.

    - Allow prefetchable PCI BARs to be exposed to userspace using normal
    non-cacheable mappings.

    - Implementation of ARCH_STACKWALK for unwinding.

    - Improve reporting of unexpected kernel traps due to BPF JIT
    failure.

    - Improve robustness of user-visible HWCAP strings and their
    corresponding numerical constants.

    - Removal of TEXT_OFFSET.

    - Removal of some unused functions, parameters and prototypes.

    - Removal of MPIDR-based topology detection in favour of firmware
    description.

    - Cleanups to handling of SVE and FPSIMD register state in
    preparation for potential future optimisation of handling across
    syscalls.

    - Cleanups to the SDEI driver in preparation for support in KVM.

    - Miscellaneous cleanups and refactoring work"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
    Revert "arm64: initialize per-cpu offsets earlier"
    arm64: random: Remove no longer needed prototypes
    arm64: initialize per-cpu offsets earlier
    kselftest/arm64: Check mte tagged user address in kernel
    kselftest/arm64: Verify KSM page merge for MTE pages
    kselftest/arm64: Verify all different mmap MTE options
    kselftest/arm64: Check forked child mte memory accessibility
    kselftest/arm64: Verify mte tag inclusion via prctl
    kselftest/arm64: Add utilities and a test to validate mte memory
    perf: arm-cmn: Fix conversion specifiers for node type
    perf: arm-cmn: Fix unsigned comparison to less than zero
    arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
    arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
    arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
    arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
    KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
    arm64: Get rid of arm64_ssbd_state
    KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
    KVM: arm64: Get rid of kvm_arm_have_ssbd()
    KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
    ...

    Linus Torvalds
     

12 Oct, 2020

1 commit

  • syzbot reported the general protection fault below:

    general protection fault, probably for non-canonical address
    0xe00eeaee0000003b: 0000 [#1] PREEMPT SMP KASAN
    KASAN: maybe wild-memory-access in range [0x00777770000001d8-0x00777770000001df]
    CPU: 1 PID: 10488 Comm: syz-executor721 Not tainted 5.9.0-rc3-syzkaller #0
    RIP: 0010:unlink_file_vma+0x57/0xb0 mm/mmap.c:164
    Call Trace:
    free_pgtables+0x1b3/0x2f0 mm/memory.c:415
    exit_mmap+0x2c0/0x530 mm/mmap.c:3184
    __mmput+0x122/0x470 kernel/fork.c:1076
    mmput+0x53/0x60 kernel/fork.c:1097
    exit_mm kernel/exit.c:483 [inline]
    do_exit+0xa8b/0x29f0 kernel/exit.c:793
    do_group_exit+0x125/0x310 kernel/exit.c:903
    get_signal+0x428/0x1f00 kernel/signal.c:2757
    arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811
    exit_to_user_mode_loop kernel/entry/common.c:136 [inline]
    exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:167
    syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:242
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This happens because the ->mmap() callback can change vma->vm_file and
    fput() the original file. But commit d70cec898324 ("mm: mmap: merge
    vma after call_mmap() if possible") failed to catch this case and
    always fput()s the original file, hence the extra fput().

    [ Thanks Hillf for pointing this extra fput() out. ]

    Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible")
    Reported-by: syzbot+c5d5a51dcbb558ca0cb5@syzkaller.appspotmail.com
    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Cc: Christian König
    Cc: Hongxiang Lou
    Cc: Chris Wilson
    Cc: Dave Airlie
    Cc: Daniel Vetter
    Cc: Sumit Semwal
    Cc: Matthew Wilcox (Oracle)
    Cc: John Hubbard
    Link: https://lkml.kernel.org/r/20200916090733.31427-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     

25 Sep, 2020

1 commit


04 Sep, 2020

1 commit

  • Similarly to arch_validate_prot() called from do_mprotect_pkey(), an
    architecture may need to sanity-check the new vm_flags.

    Define a dummy function always returning true. In addition to
    do_mprotect_pkey(), also invoke it from mmap_region() prior to updating
    vma->vm_page_prot to allow the architecture code to veto potentially
    inconsistent vm_flags.

    Signed-off-by: Catalin Marinas
    Acked-by: Andrew Morton

    Catalin Marinas
     

08 Aug, 2020

3 commits

  • The current split between do_mmap() and do_mmap_pgoff() was introduced in
    commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to
    do_mmap_pgoff()") to support MPX.

    The wrapper function do_mmap_pgoff() always passed 0 as the value of the
    vm_flags argument to do_mmap(). However, MPX support has subsequently
    been removed from the kernel and there were no more direct callers of
    do_mmap(); all calls were going via do_mmap_pgoff().

    Simplify the code by removing do_mmap_pgoff() and changing all callers to
    directly call do_mmap(), which now no longer takes a vm_flags argument.

    Signed-off-by: Peter Collingbourne
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
    Signed-off-by: Linus Torvalds

    Peter Collingbourne
     
  • The vm_flags may be changed after call_mmap() because drivers may set
    some flags for their own purposes. As a result, we fail to merge the
    adjacent vmas due to the differing vm_flags, since userspace cannot
    pass in the matching flags. Try to merge the vma after call_mmap() to
    fix this issue.

    Signed-off-by: Hongxiang Lou
    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/1594954065-23733-1-git-send-email-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     
  • Look at the pseudo code below. The judgement
    "!is_file_hugepages(file)" at 3) duplicates the one at 1), so an
    "else if" can avoid it. And the assignment "retval = -EINVAL" at 2) is
    only needed by branch 3), because "retval" is overwritten at 4).

    No functional change, but it reduces the code size:

    Before:
       text    data  bss    dec   hex  filename
      28733    1590    1  30324  7674  mm/mmap.o

    After:
       text    data  bss    dec   hex  filename
      28701    1590    1  30292  7654  mm/mmap.o

    ====pseudo code====:
    if (!(flags & MAP_ANONYMOUS)) {
            ...
    1)      if (is_file_hugepages(file))
                    len = ALIGN(len, huge_page_size(hstate_file(file)));
    2)      retval = -EINVAL;
    3)      if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
                    goto out_fput;
    } else if (flags & MAP_HUGETLB) {
            ...
    }
    ...

    4) retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
    out_fput:
    ...
    return retval;

    Signed-off-by: Zhen Lei
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200705080112.1405-1-thunder.leizhen@huawei.com
    Signed-off-by: Linus Torvalds

    Zhen Lei
     

31 Jul, 2020

1 commit


25 Jul, 2020

1 commit

  • VMAs with the VM_GROWSDOWN or VM_GROWSUP flag set can change their
    size under mmap_read_lock(). This can lead to a race with
    __do_munmap():

    Thread A                        Thread B
    __do_munmap()
      detach_vmas_to_be_unmapped()
      mmap_write_downgrade()
                                    expand_downwards()
                                      vma->vm_start = address;
                                      // The VMA now overlaps with
                                      // VMAs detached by Thread A
                                    // page fault populates expanded
                                    // part of the VMA
      unmap_region()
        // Zaps page tables partly
        // populated by Thread B

    Similar race exists for expand_upwards().

    The fix is to avoid downgrading mmap_lock in __do_munmap() if detached
    VMAs are next to VM_GROWSDOWN or VM_GROWSUP VMA.

    [akpm@linux-foundation.org: s/mmap_sem/mmap_lock/ in comment]

    Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
    Reported-by: Jann Horn
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Oleg Nesterov
    Cc: Matthew Wilcox
    Cc: [4.20+]
    Link: http://lkml.kernel.org/r/20200709105309.42495-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

30 Jun, 2020

1 commit

  • A large process running on a heavily loaded system can encounter the
    following RCU CPU stall warning:

    rcu: INFO: rcu_sched self-detected stall on CPU
    rcu: 3-....: (20998 ticks this GP) idle=4ea/1/0x4000000000000002 softirq=556558/556558 fqs=5190
    (t=21013 jiffies g=1005461 q=132576)
    NMI backtrace for cpu 3
    CPU: 3 PID: 501900 Comm: aio-free-ring-w Kdump: loaded Not tainted 5.2.9-108_fbk12_rc3_3858_gb83b75af7909 #1
    Hardware name: Wiwynn HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
    Call Trace:

    dump_stack+0x46/0x60
    nmi_cpu_backtrace.cold.3+0x13/0x50
    ? lapic_can_unplug_cpu.cold.27+0x34/0x34
    nmi_trigger_cpumask_backtrace+0xba/0xca
    rcu_dump_cpu_stacks+0x99/0xc7
    rcu_sched_clock_irq.cold.87+0x1aa/0x397
    ? tick_sched_do_timer+0x60/0x60
    update_process_times+0x28/0x60
    tick_sched_timer+0x37/0x70
    __hrtimer_run_queues+0xfe/0x270
    hrtimer_interrupt+0xf4/0x210
    smp_apic_timer_interrupt+0x5e/0x120
    apic_timer_interrupt+0xf/0x20

    RIP: 0010:kmem_cache_free+0x223/0x300
    Code: 88 00 00 00 0f 85 ca 00 00 00 41 8b 55 18 31 f6 f7 da 41 f6 45 0a 02 40 0f 94 c6 83 c6 05 9c 41 5e fa e8 a0 a7 01 00 41 56 9d 8b 47 08 a8 03 0f 85 87 00 00 00 65 48 ff 08 e9 3d fe ff ff 65
    RSP: 0018:ffffc9000e8e3da8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
    RAX: 0000000000020000 RBX: ffff88861b9de960 RCX: 0000000000000030
    RDX: fffffffffffe41e8 RSI: 000060777fe3a100 RDI: 000000000001be18
    RBP: ffffea00186e7780 R08: ffffffffffffffff R09: ffffffffffffffff
    R10: ffff88861b9dea28 R11: ffff88887ffde000 R12: ffffffff81230a1f
    R13: ffff888854684dc0 R14: 0000000000000206 R15: ffff8888547dbc00
    ? remove_vma+0x4f/0x60
    remove_vma+0x4f/0x60
    exit_mmap+0xd6/0x160
    mmput+0x4a/0x110
    do_exit+0x278/0xae0
    ? syscall_trace_enter+0x1d3/0x2b0
    ? handle_mm_fault+0xaa/0x1c0
    do_group_exit+0x3a/0xa0
    __x64_sys_exit_group+0x14/0x20
    do_syscall_64+0x42/0x100
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    And on a PREEMPT=n kernel, the "while (vma)" loop in exit_mmap() can run
    for a very long time given a large process. This commit therefore adds
    a cond_resched() to this loop, providing RCU any needed quiescent states.

    Cc: Andrew Morton
    Cc:
    Reviewed-by: Shakeel Butt
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

10 Jun, 2020

4 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Rename the mmap_sem field to mmap_lock. Any new uses of this lock should
    now go through the new mmap locking api. The mmap_lock is still
    implemented as a rwsem, though this could change in the future.

    [akpm@linux-foundation.org: fix it for mm-gup-might_lock_readmmap_sem-in-get_user_pages_fast.patch]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-11-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

05 Jun, 2020

1 commit


11 Apr, 2020

2 commits

  • There are many places where all basic VMA access flags (read, write,
    exec) are initialized or checked against as a group. One such example
    is during a page fault. The existing vma_is_accessible() wrapper
    already creates the notion of VMA accessibility as group access
    permissions.

    Hence let's just create VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC),
    which will not only reduce code duplication but also extend the VMA
    accessibility concept in general.

    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Mark Salter
    Cc: Nick Hu
    Cc: Ley Foon Tan
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Guan Xuetao
    Cc: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Rob Springer
    Cc: Greg Kroah-Hartman
    Cc: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • When passing requirements to vm_unmapped_area(),
    arch_get_unmapped_area() and arch_get_unmapped_area_topdown() did not
    set align_offset. Internally, in both unmapped_area() and
    unmapped_area_topdown(), info->align_offset is meaningless when
    info->align_mask is 0.

    But commit df529cabb7a2 ("mm: mmap: add trace point of
    vm_unmapped_area") always prints info->align_offset even though it is
    uninitialized.

    Fix this uninitialized value issue by setting it to 0 explicitly.

    Before:
    vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022

    After:
    vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0

    Signed-off-by: Jaewon Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Matthew Wilcox (Oracle)
    Cc: Michel Lespinasse
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com
    Signed-off-by: Linus Torvalds

    Jaewon Kim
     

08 Apr, 2020

2 commits

  • Convert the various /* fallthrough */ comments to the pseudo-keyword
    fallthrough;

    Done via script:
    https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Reviewed-by: Gustavo A. R. Silva
    Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Let's move the vma_is_accessible() helper to include/linux/mm.h, which
    makes it available for general use. While at it, replace all remaining
    open encodings of the VMA access check with vma_is_accessible().

    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Acked-by: Geert Uytterhoeven
    Acked-by: Guo Ren
    Acked-by: Vlastimil Babka
    Cc: Guo Ren
    Cc: Geert Uytterhoeven
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Mel Gorman
    Cc: Alexander Viro
    Cc: "Aneesh Kumar K.V"
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnd Bergmann
    Cc: Nick Piggin
    Cc: Paul Mackerras
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

03 Apr, 2020

2 commits

  • Even on a 64-bit kernel, mmap failures can happen for a 32-bit task.
    A virtual address space shortage of a task on mmap is reported to
    userspace as -ENOMEM, which can be confused with a physical memory
    shortage of the overall system.

    vm_unmapped_area() can be called by drivers or by core kernel code
    such as filesystems. On my platform, the GPU driver calls
    vm_unmapped_area() and returns -ENOMEM even for a GPU-side shortage.
    It can be hard to distinguish which code layer returned the -ENOMEM.

    Create mmap trace file and add trace point of vm_unmapped_area.

    i.e.)
    277.156599: vm_unmapped_area: addr=77e0d03000 err=0 total_vm=0x17014b flags=0x1 len=0x400000 lo=0x8000 hi=0x7878c27000 mask=0x0 ofs=0x1
    342.838740: vm_unmapped_area: addr=0 err=-12 total_vm=0xffb08 flags=0x0 len=0x100000 lo=0x40000000 hi=0xfffff000 mask=0x0 ofs=0x22

    [akpm@linux-foundation.org: prefix address printk with 0x, per Matthew]
    Signed-off-by: Jaewon Kim
    Signed-off-by: Andrew Morton
    Cc: Borislav Petkov
    Cc: Matthew Wilcox (Oracle)
    Cc: Michel Lespinasse
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/20200320055823.27089-3-jaewon31.kim@samsung.com
    Signed-off-by: Linus Torvalds

    Jaewon Kim
     
  • Patch series "mm: mmap: add mmap trace point", v3.

    Create mmap trace file and add trace point of vm_unmapped_area().

    This patch (of 2):

    In preparation for the next patch, remove the inline of
    vm_unmapped_area() and move the code to mmap.c. There is no logical
    change.

    Also remove unmapped_area[_topdown] from mm.h; there is no code
    calling them.

    Signed-off-by: Jaewon Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Cc: Matthew Wilcox (Oracle)
    Cc: Michel Lespinasse
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20200320055823.27089-2-jaewon31.kim@samsung.com
    Signed-off-by: Linus Torvalds

    Jaewon Kim
     

20 Feb, 2020

1 commit

  • Currently the arm64 kernel ignores the top address byte passed to brk(),
    mmap() and mremap(). When the user is not aware of the 56-bit address
    limit or relies on the kernel to return an error, untagging such
    pointers has the potential to create address aliases in user-space.
    Passing a tagged address to munmap() or madvise() is still permitted,
    since the tagged pointer is expected to be inside an existing mapping.

    The current behaviour breaks the existing glibc malloc() implementation,
    which relies on brk() with an address beyond the 56-bit range being
    rejected by the kernel.

    Remove untagging in the above functions by partially reverting commit
    ce18d171cb73 ("mm: untag user pointers in mmap/munmap/mremap/brk"). In
    addition, update the arm64 tagged-address-abi.rst document accordingly.

    Link: https://bugzilla.redhat.com/1797052
    Fixes: ce18d171cb73 ("mm: untag user pointers in mmap/munmap/mremap/brk")
    Cc: # 5.4.x-
    Cc: Florian Weimer
    Reviewed-by: Andrew Morton
    Reported-by: Victor Stinner
    Acked-by: Will Deacon
    Acked-by: Andrey Konovalov
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon

    Catalin Marinas
     

01 Feb, 2020

1 commit

    The jump labels try_prev and none are not really needed in
    find_mergeable_anon_vma(); eliminate them to improve readability.

    Link: http://lkml.kernel.org/r/1574079844-17493-1-git-send-email-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin
    Reviewed-by: David Hildenbrand
    Reviewed-by: John Hubbard
    Reviewed-by: Wei Yang
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     

28 Jan, 2020

1 commit

  • Pull timer updates from Thomas Gleixner:
    "The timekeeping and timers department provides:

    - Time namespace support:

    If a container migrates from one host to another then it expects
    that clocks based on MONOTONIC and BOOTTIME are not subject to
    disruption. Due to different boot time and non-suspended runtime
    these clocks can differ significantly on two hosts, in the worst
    case time goes backwards which is a violation of the POSIX
    requirements.

    The time namespace addresses this problem. It allows setting offsets
    for clock MONOTONIC and BOOTTIME once, after creation and before
    tasks are associated with the namespace. These offsets are taken
    into account by timers and timekeeping, including the VDSO.

    Offsets for wall clock based clocks (REALTIME/TAI) are not provided
    by this mechanism. While in theory possible, the overhead and code
    complexity would be immense and not justified by the esoteric
    potential use cases which were discussed at Plumbers '18.

    The overhead for tasks in the root namespace (i.e. where host time
    offsets = 0) is in the noise, and great effort was made to ensure
    that, especially in the VDSO. If the time namespace is disabled in
    the kernel configuration, the code is compiled out.

    Kudos to Andrei Vagin and Dmitry Safonov, who implemented this
    feature and kept at it for more than a year, addressing review
    comments and finding better solutions. A pleasant experience.

    - Overhaul of the alarmtimer device dependency handling to ensure
    that the init/suspend/resume ordering is correct.

    - A new clocksource/event driver for Microchip PIT64

    - Suspend/resume support for the Hyper-V clocksource

    - The usual pile of fixes, updates and improvements mostly in the
    driver code"

    * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
    alarmtimer: Use wakeup source from alarmtimer platform device
    alarmtimer: Make alarmtimer platform device child of RTC device
    alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
    hrtimer: Add missing sparse annotation for __run_timer()
    lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
    MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
    clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
    clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
    clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
    clocksource/drivers/exynos_mct: Rename Exynos to lowercase
    clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
    clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
    clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
    clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
    clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
    clocksource/drivers/bcm2835_timer: Fix memory leak of timer
    clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
    clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
    clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
    ...

    Linus Torvalds
     

14 Jan, 2020

1 commit

  • If a task belongs to a time namespace then the VVAR page which contains
    the system wide VDSO data is replaced with a namespace specific page
    which has the same layout as the VVAR page.

    Co-developed-by: Andrei Vagin
    Signed-off-by: Andrei Vagin
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20191112012724.250792-25-dima@arista.com

    Dmitry Safonov
     

07 Jan, 2020

1 commit

    The ARMv8 64-bit architecture supports execute-only user permissions by
    clearing the PTE_USER and PTE_UXN bits, effectively making it a mostly
    privileged mapping from which a user running at EL0 can still execute.

    The downside, however, is that the kernel at EL1 inadvertently reading
    such a mapping would not trip over the PAN (privileged access never)
    protection.

    Revert the relevant bits from commit cab15ce604e5 ("arm64: Introduce
    execute-only page access permissions") so that PROT_EXEC implies
    PROT_READ (and therefore PTE_USER) until the architecture gains proper
    support for execute-only user mappings.

    Fixes: cab15ce604e5 ("arm64: Introduce execute-only page access permissions")
    Cc: # 4.9.x-
    Acked-by: Will Deacon
    Signed-off-by: Catalin Marinas
    Signed-off-by: Linus Torvalds

    Catalin Marinas