11 Apr, 2020

15 commits

  • Merge yet more updates from Andrew Morton:

    - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
    gup, hugetlb, pagemap, memremap)

    - Various other things (hfs, ocfs2, kmod, misc, seqfile)

    * akpm: (34 commits)
    ipc/util.c: sysvipc_find_ipc() should increase position index
    kernel/gcov/fs.c: gcov_seq_next() should increase position index
    fs/seq_file.c: seq_read(): add info message about buggy .next functions
    drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
    change email address for Pali Rohár
    selftests: kmod: test disabling module autoloading
    selftests: kmod: fix handling test numbers above 9
    docs: admin-guide: document the kernel.modprobe sysctl
    fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
    kmod: make request_module() return an error when autoloading is disabled
    mm/memremap: set caching mode for PCI P2PDMA memory to WC
    mm/memory_hotplug: add pgprot_t to mhp_params
    powerpc/mm: thread pgprot_t through create_section_mapping()
    x86/mm: introduce __set_memory_prot()
    x86/mm: thread pgprot_t through init_memory_mapping()
    mm/memory_hotplug: rename mhp_restrictions to mhp_params
    mm/memory_hotplug: drop the flags field from struct mhp_restrictions
    mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
    mm/vma: introduce VM_ACCESS_FLAGS
    mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
    ...

    Linus Torvalds
     
  • PCI BAR IO memory should never be mapped as WB; however, prior to this
    the PAT bits were set to WB and were typically overridden by MTRR
    registers set by the firmware.

    Set PCI P2PDMA memory to be UC as this is what it currently, typically,
    ends up being mapped as on x86 after the MTRR registers override the
    cache setting.

    Future use-cases may need to generalize this by adding flags to select
    the caching type, as some P2PDMA cases may not want UC. However, those
    use-cases are not upstream yet and this can be changed when they arrive.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Jason Gunthorpe
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-8-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • devm_memremap_pages() is currently used by the PCI P2PDMA code to create
    struct page mappings for IO memory. At present, these mappings are
    created with PAGE_KERNEL which implies setting the PAT bits to be WB.
    However, on x86, an MTRR register will typically override this and force
    the cache type to be UC-. In the case the firmware doesn't set this
    register, it is effectively WB and will typically result in a machine
    check exception when it's accessed.

    Other arches are not currently likely to function correctly seeing they
    don't have any MTRR registers to fall back on.

    To solve this, provide a way to specify the pgprot value explicitly to
    arch_add_memory().

    Of the arches that support MEMORY_HOTPLUG, x86_64 and arm64 need a
    simple change to pass the pgprot_t down to their respective functions
    which set up the page tables. For x86_32, set the page tables
    explicitly using _set_memory_prot() (seeing they are already mapped).

    For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this
    should be fine, for now, seeing these architectures don't support
    ZONE_DEVICE.

    A check in __add_pages() is also added to ensure the pgprot parameter
    was set for all arches.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Acked-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Dan Williams
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • The mhp_restrictions struct really doesn't specify anything resembling a
    restriction anymore so rename it to be mhp_params as it is a list of
    extended parameters.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • There are many places where all basic VMA access flags (read, write,
    exec) are initialized or checked against as a group. One such example
    is during page fault. The existing vma_is_accessible() wrapper already
    creates the notion of VMA accessibility as a group of access permissions.

    Hence let's just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC), which
    will not only reduce code duplication but also extend the VMA
    accessibility concept in general.

    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Mark Salter
    Cc: Nick Hu
    Cc: Ley Foon Tan
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Guan Xuetao
    Cc: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Rob Springer
    Cc: Greg Kroah-Hartman
    Cc: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • Add the ability to insert multiple pages at once to a user VM with lower
    PTE spinlock operations.

    The intention of this patch-set is to reduce atomic ops for tcp zerocopy
    receives, which normally hits the same spinlock multiple times
    consecutively.

    [akpm@linux-foundation.org: pte_alloc() no longer takes the `addr' argument]
    [arjunroy@google.com: add missing page_count() check to vm_insert_pages()]
    Link: http://lkml.kernel.org/r/20200214005929.104481-1-arjunroy.kdev@gmail.com
    [arjunroy@google.com: vm_insert_pages() checks if pte_index defined]
    Link: http://lkml.kernel.org/r/20200228054714.204424-2-arjunroy.kdev@gmail.com
    Signed-off-by: Arjun Roy
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Andrew Morton
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Jason Gunthorpe
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200128025958.43490-2-arjunroy.kdev@gmail.com
    Signed-off-by: Linus Torvalds

    Arjun Roy
     
  • Add helper methods for vm_insert_page()/insert_page() to prepare for
    vm_insert_pages(), which batch-inserts pages to reduce spinlock
    operations when inserting multiple consecutive pages into the user page
    table.

    The intention of this patch-set is to reduce atomic ops for tcp zerocopy
    receives, which normally hits the same spinlock multiple times
    consecutively.

    Signed-off-by: Arjun Roy
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Andrew Morton
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Jason Gunthorpe
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200128025958.43490-1-arjunroy.kdev@gmail.com
    Signed-off-by: Linus Torvalds

    Arjun Roy
     
  • When passing their requirements to vm_unmapped_area(),
    arch_get_unmapped_area() and arch_get_unmapped_area_topdown() did not
    set align_offset. Internally, in both unmapped_area() and
    unmapped_area_topdown(), if info->align_mask is 0, then
    info->align_offset is meaningless.

    But commit df529cabb7a2 ("mm: mmap: add trace point of
    vm_unmapped_area") always prints info->align_offset even though it is
    uninitialized.

    Fix this uninitialized value issue by setting it to 0 explicitly.

    Before:
    vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022

    After:
    vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0

    Signed-off-by: Jaewon Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Matthew Wilcox (Oracle)
    Cc: Michel Lespinasse
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com
    Signed-off-by: Linus Torvalds

    Jaewon Kim
     
  • Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation
    at runtime") has added the run-time allocation of gigantic pages.

    However it actually works only at early stages of the system loading,
    when the majority of memory is free. After some time the memory gets
    fragmented by non-movable pages, so the chances to find a contiguous 1GB
    block are getting close to zero. Even dropping caches manually doesn't
    help a lot.

    At large scale rebooting servers in order to allocate gigantic hugepages
    is quite expensive and complex. At the same time keeping some constant
    percentage of memory in reserved hugepages even if the workload isn't
    using it is a big waste: not all workloads can benefit from using 1 GB
    pages.

    The following solution can solve the problem:
    1) On boot time a dedicated cma area* is reserved. The size is passed
    as a kernel argument.
    2) Run-time allocations of gigantic hugepages are performed using the
    cma allocator and the dedicated cma area

    In this case gigantic hugepages can be allocated successfully with a
    high probability, however the memory isn't completely wasted if nobody
    is using 1GB hugepages: it can be used for pagecache, anon memory, THPs,
    etc.

    * On a multi-node machine a per-node cma area is allocated on each node.
    Subsequent gigantic hugetlb allocations use the first available NUMA
    node if the mask isn't specified by the user.

    Usage:
    1) configure the kernel to allocate a cma area for hugetlb allocations:
    pass hugetlb_cma=10G as a kernel argument

    2) allocate hugetlb pages as usual, e.g.
    echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

    If the option isn't enabled or the allocation of the cma area failed,
    the current behavior of the system is preserved.

    x86 and arm-64 are covered by this patch, other architectures can be
    trivially added later.

    The patch contains clean-ups and fixes proposed and implemented by Aslan
    Bakirov and Randy Dunlap. It also contains ideas and suggestions
    proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks!

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Tested-by: Andreas Schaufler
    Acked-by: Mike Kravetz
    Acked-by: Michal Hocko
    Cc: Aslan Bakirov
    Cc: Randy Dunlap
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • I've noticed that there is no interface exposed by CMA which would let
    me declare contiguous memory on a particular NUMA node.

    This patchset adds the ability to try to allocate contiguous memory on a
    specific node. It will fallback to other nodes if the specified one
    doesn't work.

    Implement a new method for declaring contiguous memory on a particular
    node and keep cma_declare_contiguous() as a wrapper.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Aslan Bakirov
    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Acked-by: Michal Hocko
    Cc: Andreas Schaufler
    Cc: Mike Kravetz
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200407163840.92263-2-guro@fb.com
    Signed-off-by: Linus Torvalds

    Aslan Bakirov
     
  • Fix the following sparse warning:

    mm/page_alloc.c:106:1: warning: symbol 'pcpu_drain_mutex' was not declared. Should it be static?
    mm/page_alloc.c:107:1: warning: symbol '__pcpu_scope_pcpu_drain' was not declared. Should it be static?

    Reported-by: Hulk Robot
    Signed-off-by: Jason Yan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200407023925.46438-1-yanaijie@huawei.com
    Signed-off-by: Linus Torvalds

    Jason Yan
     
  • Add description of function parameter 'mt' to fix kernel-doc warning:

    mm/page_alloc.c:3246: warning: Function parameter or member 'mt' not described in '__putback_isolated_page'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Acked-by: Pankaj Gupta
    Link: http://lkml.kernel.org/r/02998bd4-0b82-2f15-2570-f86130304d1e@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • There is a typo in comment, fix it.
    s/eariler/earlier/

    Signed-off-by: Qiujun Huang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Acked-by: Christoph Lameter
    Link: http://lkml.kernel.org/r/20200405160544.1246-1-hqjagain@gmail.com
    Signed-off-by: Linus Torvalds

    Qiujun Huang
     
  • If a cgroup violates its memory.high constraints, we may end up unduly
    penalising it. For example, for the following hierarchy:

    A: max high, 20 usage
    A/B: 9 high, 10 usage
    A/C: max high, 10 usage

    We would end up doing the following calculation below when calculating
    high delay for A/B:

    A/B: 10 - 9 = 1...
    A: 20 - PAGE_COUNTER_MAX = 21, so set max_overage to 21.

    This gets worse with higher disparities in usage in the parent.

    I have no idea how this disappeared from the final version of the patch,
    but it is certainly Not Good(tm). This wasn't obvious in testing because,
    for a simple cgroup hierarchy with only one child, the result is usually
    roughly the same. It's only in more complex hierarchies that things go
    really awry (although still, the effects are limited to a maximum of 2
    seconds in schedule_timeout_killable()).

    [chris@chrisdown.name: changelog]
    Fixes: e26733e0d0ec ("mm, memcg: throttle allocators based on ancestral memory.high")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Chris Down
    Signed-off-by: Andrew Morton
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: [5.4.x]
    Link: http://lkml.kernel.org/r/20200331152424.GA1019937@chrisdown.name
    Signed-off-by: Linus Torvalds

    Jakub Kicinski
     
  • Pull block fixes from Jens Axboe:
    "Here's a set of fixes that should go into this merge window. This
    contains:

    - NVMe pull request from Christoph with various fixes

    - Better discard support for loop (Evan)

    - Only call ->commit_rqs() if we have queued IO (Keith)

    - blkcg offlining fixes (Tejun)

    - fix (and fix the fix) for busy partitions"

    * tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block:
    block: fix busy device checking in blk_drop_partitions again
    block: fix busy device checking in blk_drop_partitions
    nvmet-rdma: fix double free of rdma queue
    blk-mq: don't commit_rqs() if none were queued
    nvme-fc: Revert "add module to ops template to allow module references"
    nvme: fix deadlock caused by ANA update wrong locking
    nvmet-rdma: fix bonding failover possible NULL deref
    loop: Better discard support for block devices
    loop: Report EOPNOTSUPP properly
    nvmet: fix NULL dereference when removing a referral
    nvme: inherit stable pages constraint in the mpath stack device
    blkcg: don't offline parent blkcg first
    blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it
    nvme-tcp: fix possible crash in recv error flow
    nvme-tcp: don't poll a non-live queue
    nvme-tcp: fix possible crash in write_zeroes processing
    nvmet-fc: fix typo in comment
    nvme-rdma: Replace comma with a semicolon
    nvme-fcloop: fix deallocation of working context
    nvme: fix compat address handling in several ioctls

    Linus Torvalds
     

09 Apr, 2020

2 commits

  • Pull libnvdimm and dax updates from Dan Williams:
    "There were multiple touches outside of drivers/nvdimm/ this round to
    add cross arch compatibility to the devm_memremap_pages() interface,
    enhance numa information for persistent memory ranges, and add a
    zero_page_range() dax operation.

    This cycle I switched from the patchwork api to Konstantin's b4 script
    for collecting tags (from x86, PowerPC, filesystem, and device-mapper
    folks), and everything looks to have gone ok there. This has all
    appeared in -next with no reported issues.

    Summary:

    - Add support for region alignment configuration and enforcement to
    fix compatibility across architectures and PowerPC page size
    configurations.

    - Introduce 'zero_page_range' as a dax operation. This facilitates
    filesystem-dax operation without a block-device.

    - Introduce phys_to_target_node() to facilitate drivers that want to
    know resulting numa node if a given reserved address range was
    onlined.

    - Advertise a persistence-domain for of_pmem and papr_scm. The
    persistence domain indicates where cpu-store cycles need to reach
    in the platform-memory subsystem before the platform will consider
    them power-fail protected.

    - Promote numa_map_to_online_node() to a cross-kernel generic
    facility.

    - Save x86 numa information to allow for node-id lookups for reserved
    memory ranges, deploy that capability for the e820-pmem driver.

    - Pick up some miscellaneous minor fixes that missed v5.6-final,
    including some smatch reports in the ioctl path and some unit
    test compilation fixups.

    - Fixup some flexible-array declarations"

    * tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
    dax: Move mandatory ->zero_page_range() check in alloc_dax()
    dax,iomap: Add helper dax_iomap_zero() to zero a range
    dax: Use new dax zero page method for zeroing a page
    dm,dax: Add dax zero_page_range operation
    s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
    dax, pmem: Add a dax operation zero_page_range
    pmem: Add functions for reading/writing page to/from pmem
    libnvdimm: Update persistence domain value for of_pmem and papr_scm device
    tools/test/nvdimm: Fix out of tree build
    libnvdimm/region: Fix build error
    libnvdimm/region: Replace zero-length array with flexible-array member
    libnvdimm/label: Replace zero-length array with flexible-array member
    ACPI: NFIT: Replace zero-length array with flexible-array member
    libnvdimm/region: Introduce an 'align' attribute
    libnvdimm/region: Introduce NDD_LABELING
    libnvdimm/namespace: Enforce memremap_compat_align()
    libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
    libnvdimm: Out of bounds read in __nd_ioctl()
    acpi/nfit: improve bounds checking for 'func'
    mm/memremap_pages: Introduce memremap_compat_align()
    ...

    Linus Torvalds
     
  • __get_user_pages_locked() will return 0 instead of -EINTR after commit
    4426e945df588 ("mm/gup: allow VM_FAULT_RETRY for multiple times") which
    added extra code to allow gup detect fatal signal faster.

    Restore the original -EINTR behavior.

    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
    Reported-by: syzbot+3be1a33f04dc782e9fd5@syzkaller.appspotmail.com
    Signed-off-by: Hillf Danton
    Acked-by: Michal Hocko
    Signed-off-by: Peter Xu
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

08 Apr, 2020

23 commits

  • It's definitely incorrect to mark the lock as taken even if
    down_read_killable() failed.

    This was overlooked when we switched from down_read() to
    down_read_killable(), because down_read() won't fail while
    down_read_killable() can.

    Fixes: 71335f37c5e8 ("mm/gup: allow to react to fatal signals")
    Reported-by: syzbot+a8c70b7f3579fc0587dc@syzkaller.appspotmail.com
    Signed-off-by: Peter Xu
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • lookup_node() uses gup to pin the page and get node information. It
    checks against ret>=0 assuming the page will be filled in. However it's
    also possible that gup will return zero, for example, when the thread is
    quickly killed with a fatal signal. Teach lookup_node() to gracefully
    return an error -EFAULT if it happens.

    Meanwhile, initialize "page" to NULL to avoid potential risk of
    exploiting the pointer.

    Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
    Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
    Signed-off-by: Peter Xu
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • As done in the full WARN() handler, panic_on_warn needs to be cleared
    before calling panic() to avoid recursive panics.

    Signed-off-by: Kees Cook
    Signed-off-by: Andrew Morton
    Acked-by: Dmitry Vyukov
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Andrey Ryabinin
    Cc: Ard Biesheuvel
    Cc: Arnd Bergmann
    Cc: Dan Carpenter
    Cc: Elena Petrova
    Cc: "Gustavo A. R. Silva"
    Link: http://lkml.kernel.org/r/20200227193516.32566-6-keescook@chromium.org
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • filter_irq_stacks() can be used by other tools (e.g. KMSAN), so it needs
    to be moved to a common location. lib/stackdepot.c seems a good place, as
    filter_irq_stacks() is usually applied to the output of
    stack_trace_save().

    This patch has been previously mailed as part of KMSAN RFC patch series.

    [glider@google.com: nds32: linker script: add SOFTIRQENTRY_TEXT]
    Link: http://lkml.kernel.org/r/20200311121002.241430-1-glider@google.com
    [glider@google.com: add IRQENTRY_TEXT and SOFTIRQENTRY_TEXT to linker script]
    Link: http://lkml.kernel.org/r/20200311121124.243352-1-glider@google.com
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Cc: Vegard Nossum
    Cc: Dmitry Vyukov
    Cc: Marco Elver
    Cc: Andrey Konovalov
    Cc: Andrey Ryabinin
    Cc: Arnd Bergmann
    Cc: Sergey Senozhatsky
    Link: http://lkml.kernel.org/r/20200220141916.55455-3-glider@google.com
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Now that "struct proc_ops" exist we can start putting there stuff which
    could not fly with VFS "struct file_operations"...

    Most of fs/proc/inode.c file is dedicated to make open/read/.../close
    reliable in the event of disappearing /proc entries which usually happens
    if module is getting removed. Files like /proc/cpuinfo which never
    disappear simply do not need such protection.

    Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such
    "permanent" files.

    Enable "permanent" flag for

    /proc/cpuinfo
    /proc/kmsg
    /proc/modules
    /proc/slabinfo
    /proc/stat
    /proc/sysvipc/*
    /proc/swaps

    More will come once I figure out a foolproof way to prevent module
    authors from marking their stuff "permanent" for performance reasons
    when it is not.

    This should help with scalability: benchmark is "read /proc/cpuinfo R times
    by N threads scattered over the system".

    N     R      t, s (before)   t, s (after)
    -----------------------------------------------------
    64    4096    1.582458        1.530502       -3.2%
    256   4096    6.371926        6.125168       -3.9%
    1024  4096   25.64888        24.47528        -4.6%

    Benchmark source:

    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    #include <fcntl.h>
    #include <sched.h>
    #include <stdlib.h>
    #include <unistd.h>

    const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN);
    int N;
    const char *filename;
    int R;

    int xxx = 0;

    int glue(int n)
    {
            cpu_set_t m;
            CPU_ZERO(&m);
            CPU_SET(n, &m);
            return sched_setaffinity(0, sizeof(cpu_set_t), &m);
    }

    void f(int n)
    {
            glue(n % NR_CPUS);

            while (*(volatile int *)&xxx == 0) {
            }

            for (int i = 0; i < R; i++) {
                    int fd = open(filename, O_RDONLY);
                    char buf[4096];
                    ssize_t rv = read(fd, buf, sizeof(buf));
                    asm volatile ("" :: "g" (rv));
                    close(fd);
            }
    }

    int main(int argc, char *argv[])
    {
            if (argc < 4) {
                    std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R\n";
                    return 1;
            }

            N = atoi(argv[1]);
            filename = argv[2];
            R = atoi(argv[3]);

            for (int i = 0; i < NR_CPUS; i++) {
                    if (glue(i) == 0)
                            break;
            }

            std::vector<std::thread> T;
            T.reserve(N);
            for (int i = 0; i < N; i++) {
                    T.emplace_back(f, i);
            }

            auto t0 = std::chrono::system_clock::now();
            {
                    *(volatile int *)&xxx = 1;
                    for (auto& t: T) {
                            t.join();
                    }
            }
            auto t1 = std::chrono::system_clock::now();
            std::chrono::duration<double> dt = t1 - t0;
            std::cout << dt.count() << '\n';

            return 0;
    }

    P.S.:
    An explicit randomization marker is added because adding a non-function
    pointer would silently disable structure layout randomization.

    [akpm@linux-foundation.org: coding style fixes]
    Reported-by: kbuild test robot
    Reported-by: Dan Carpenter
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Cc: Al Viro
    Cc: Joe Perches
    Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Previously there was a check whether 'size' is aligned to 'align', and
    if not, it was aligned. This check was expensive, as both branching and
    division are expensive instructions on most architectures. The 'ALIGN'
    function applied to an already-aligned value will not change it, and as
    it is cheaper than a branch plus a division, it can be executed
    unconditionally and the branch can be removed.

    Signed-off-by: Mateusz Nosek
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200320173317.26408-1-mateusznosek0@gmail.com
    Signed-off-by: Linus Torvalds

    Mateusz Nosek
     
  • Convert the various /* fallthrough */ comments to the pseudo-keyword
    fallthrough;

    Done via script:
    https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Reviewed-by: Gustavo A. R. Silva
    Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • MAX_ZONELISTS is a compile time constant, so it should be compared using
    BUILD_BUG_ON not BUG_ON.

    Signed-off-by: Mateusz Nosek
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Wei Yang
    Link: http://lkml.kernel.org/r/20200228224617.11343-1-mateusznosek0@gmail.com
    Signed-off-by: Linus Torvalds

    Mateusz Nosek
     
  • The @pfn parameter of remap_pfn_range() passed by the caller is actually
    a page-frame number converted from the corresponding physical address of
    kernel memory; the original comment is ambiguous and may mislead users.

    Meanwhile, there is an ambiguous typo "VMM" in the comment of
    vm_area_struct. So fixing them will make the code more readable.

    Signed-off-by: chenqiwu
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/1583026921-15279-1-git-send-email-qiwuchen55@gmail.com
    Signed-off-by: Linus Torvalds

    chenqiwu
     
  • Sparse reports a warning at unpin_tag()

    warning: context imbalance in unpin_tag() - unexpected unlock

    The root cause is the missing annotation at unpin_tag()
    Add the missing __releases(bitlock) annotation

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Link: http://lkml.kernel.org/r/20200214204741.94112-14-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at pin_tag()

    warning: context imbalance in pin_tag() - wrong count at exit

    The root cause is the missing annotation at pin_tag()
    Add the missing __acquires(bitlock) annotation

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Link: http://lkml.kernel.org/r/20200214204741.94112-13-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at migrate_read_unlock()

    warning: context imbalance in migrate_read_unlock() - unexpected unlock

    The root cause is the missing annotation at migrate_read_unlock()
    Add the missing __releases(&zspage->lock) annotation

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Link: http://lkml.kernel.org/r/20200214204741.94112-12-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at migrate_read_lock()

    warning: context imbalance in migrate_read_lock() - wrong count at exit

    The root cause is the missing annotation at migrate_read_lock()
    Add the missing __acquires(&zspage->lock) annotation

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Link: http://lkml.kernel.org/r/20200214204741.94112-11-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at put_map()

    warning: context imbalance in put_map() - unexpected unlock

    The root cause is the missing annotation at put_map()
    Add the missing __releases(&object_map_lock) annotation

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200214204741.94112-10-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at get_map()

    warning: context imbalance in get_map() - wrong count at exit

    The root cause is the missing annotation at get_map()
    Add the missing __acquires(&object_map_lock) annotation

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200214204741.94112-9-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at queue_pages_pmd()

    context imbalance in queue_pages_pmd() - unexpected unlock

    The root cause is the missing annotation at queue_pages_pmd()
    Add the missing __releases(ptl)

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200214204741.94112-8-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at gather_surplus_pages()

    warning: context imbalance in hugetlb_cow() - unexpected unlock

    The root cause is the missing annotation at gather_surplus_pages()
    Add the missing __must_hold(&hugetlb_lock)

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Link: http://lkml.kernel.org/r/20200214204741.94112-7-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at compact_lock_irqsave()

    warning: context imbalance in compact_lock_irqsave() - wrong count at exit

    The root cause is the missing annotation at compact_lock_irqsave()
    Add the missing __acquires(lock) annotation.

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200214204741.94112-6-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • The compressed cache for swap pages (zswap) currently needs from 1 to 3
    extra kernel command line parameters in order to make it work: it has to
    be enabled by adding a "zswap.enabled=1" command line parameter and if one
    wants a different compressor or pool allocator than the default lzo / zbud
    combination then these choices also need to be specified on the kernel
    command line in additional parameters.

    Using a different compressor and allocator for zswap is actually pretty
    common, as guides often recommend using the lz4 / z3fold pair instead
    of the default one. In such a case it is also necessary to remember to
    enable the appropriate compression algorithm and pool allocator in the
    kernel config manually.

    Let's avoid the need for adding these kernel command line parameters and
    automatically pull in the dependencies for the selected compressor
    algorithm and pool allocator by adding appropriate default switches to
    Kconfig.
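    As a concrete illustration (parameter names as exposed by the zswap
    module), the lz4 / z3fold setup mentioned above previously required a
    boot command line along the lines of:

```
zswap.enabled=1 zswap.compressor=lz4 zswap.zpool=z3fold
```

    With the new Kconfig defaults the same choice can be baked in at build
    time, and the dependencies on the chosen compressor and allocator are
    pulled in automatically.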

    The default values for these options match what the code was using
    previously as its defaults.

    Signed-off-by: Maciej S. Szmigiero
    Signed-off-by: Andrew Morton
    Reviewed-by: Vitaly Wool
    Link: http://lkml.kernel.org/r/20200202000112.456103-1-mail@maciej.szmigiero.name
    Signed-off-by: Linus Torvalds

    Maciej S. Szmigiero
     
  • I recently built the RISC-V port with LLVM trunk, which has introduced a
    new warning when casting from a pointer to an enum of a smaller size.
    This patch simply casts to a long in the middle to stop the warning. I'd
    be surprised if this is the only one in the kernel, but it's the only
    one I saw.
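    The shape of the fix can be sketched as follows; the enum and function
    names here are made up for illustration, not the RISC-V code the patch
    touches.

```c
#include <assert.h>

/* A pointer-sized cookie that actually encodes a small enum value. */
enum page_kind { KIND_NORMAL = 1, KIND_HUGE = 2 };

/* Casting a pointer directly to an enum of smaller size is what trips
   the new clang warning; casting through 'long' first turns the
   narrowing into an ordinary integer conversion. */
static enum page_kind decode_kind(const void *cookie)
{
	return (enum page_kind)(long)cookie;
}
```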

    Signed-off-by: Palmer Dabbelt
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200227211741.83165-1-palmer@dabbelt.com
    Signed-off-by: Linus Torvalds

    Palmer Dabbelt
     
  • Yang Shi writes:

    Currently, when truncating a shmem file, if the range is partly in a THP
    (start or end is in the middle of THP), the pages actually will just get
    cleared rather than being freed, unless the range covers the whole THP.
    Even though all the subpages are truncated (randomly or sequentially), the
    THP may still be kept in page cache.

    This might be fine for some usecases which prefer preserving THP, but
    balloon inflation is handled in base page size. So when using shmem THP
    as memory backend, QEMU inflation actually doesn't work as expected since
    it doesn't free memory. But the inflation usecase really needs to get the
    memory freed. (Anonymous THP will also not get freed right away, but will
    be freed eventually when all subpages are unmapped: whereas shmem THP
    still stays in page cache.)

    Split THP right away when doing partial hole punch, and if split fails
    just clear the page so that read of the punched area will return zeroes.

    Hugh Dickins adds:

    Our earlier "team of pages" huge tmpfs implementation worked in the way
    that Yang Shi proposes; and we have been using this patch to continue to
    split the huge page when hole-punched or truncated, since converting over
    to the compound page implementation. Although huge tmpfs gives out huge
    pages when available, if the user specifically asks to truncate or punch a
    hole (perhaps to free memory, perhaps to reduce the memcg charge), then
    the filesystem should do so as best it can, splitting the huge page.

    That is not always possible: any additional reference to the huge page
    prevents split_huge_page() from succeeding, so the result can be flaky.
    But in practice it works successfully enough that we've not seen any
    problem from that.

    Add shmem_punch_compound() to encapsulate the decision of when a split is
    needed, and doing the split if so. Using this simplifies the flow in
    shmem_undo_range(); and the first (trylock) pass does not need to do any
    page clearing on failure, because the second pass will either succeed or
    do that clearing. Following the example of zero_user_segment() when
    clearing a partial page, add flush_dcache_page() and set_page_dirty() when
    clearing a hole - though I'm not certain that either is needed.

    But: split_huge_page() would be sure to fail if shmem_undo_range()'s
    pagevec holds further references to the huge page. The easiest way to fix
    that is for find_get_entries() to return early, as soon as it has put one
    compound head or tail into the pagevec. At first this felt like a hack;
    but on examination, this convention better suits all its callers - or will
    do, if the slight one-page-per-pagevec slowdown in shmem_unlock_mapping()
    and shmem_seek_hole_data() is transformed into a 512-page-per-pagevec
    speedup by checking for compound pages there.
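    The policy described above can be summarized in a schematic model;
    the helper names and return strings below are hypothetical, not the
    actual shmem_punch_compound() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A split can only succeed when no one else holds an extra reference
   to the huge page, so the outcome can be flaky. */
static bool try_split(int extra_refs)
{
	return extra_refs == 0;
}

/* If the punched range covers the whole huge page it is simply freed.
   A partial punch first tries to split the page; only when the split
   fails does it fall back to clearing the subpages, so that reads of
   the punched area return zeroes. */
static const char *punch(bool covers_whole_thp, int extra_refs)
{
	if (covers_whole_thp)
		return "free";
	if (try_split(extra_refs))
		return "split-and-free";
	return "clear";
}
```

    The find_get_entries() change exists to keep the pagevec itself from
    being one of those extra references during the second pass.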

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Cc: Yang Shi
    Cc: Alexander Duyck
    Cc: "Michael S. Tsirkin"
    Cc: David Hildenbrand
    Cc: "Kirill A. Shutemov"
    Cc: Matthew Wilcox
    Cc: Andrea Arcangeli
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2002261959020.10801@eggly.anvils
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Previously 0 was assigned to the variable 'error', but the variable was
    never read before being reassigned later, so the assignment can be
    removed.
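    A minimal illustration of the dead-store pattern being removed; the
    function is made up, only the pattern matches the patch.

```c
#include <assert.h>

static int do_lookup(int key)
{
	int error;                   /* was: int error = 0;  (dead store) */

	/* The first read of 'error' comes only after this unconditional
	   assignment, so the removed initializer had no effect. */
	error = (key < 0) ? -22 : 0; /* -EINVAL-style value, illustrative */
	return error;
}
```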

    Signed-off-by: Mateusz Nosek
    Signed-off-by: Andrew Morton
    Reviewed-by: Matthew Wilcox (Oracle)
    Acked-by: Pankaj Gupta
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200301152832.24595-1-mateusznosek0@gmail.com
    Signed-off-by: Linus Torvalds

    Mateusz Nosek
     
  • Variables declared in a switch statement before any case statements cannot
    be automatically initialized with compiler instrumentation (as they are
    not part of any execution flow). With GCC's proposed automatic stack
    variable initialization feature, this triggers a warning (and they don't
    get initialized). Clang's automatic stack variable initialization (via
    CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't
    initialize such variables[1]. Note that these warnings (or silent
    skipping) happen before the dead-store elimination optimization phase, so
    even when the automatic initializations are later elided in favor of
    direct initializations, the warnings remain.

    To avoid these problems, move such variables into the "case" where they're
    used or lift them up into the main function body.

    mm/shmem.c: In function `shmem_getpage_gfp':
    mm/shmem.c:1816:10: warning: statement will never be executed [-Wswitch-unreachable]
    1816 | loff_t i_size;
    | ^~~~~~

    [1] https://bugs.llvm.org/show_bug.cgi?id=44916
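    The problematic pattern and both fixes can be sketched as below; the
    function and variable names are illustrative, not the shmem code.

```c
#include <assert.h>

/* A declaration placed between 'switch (...) {' and the first 'case'
   label is never executed, so compiler-driven automatic stack-variable
   initialization cannot reach it. */
static long size_for(int mode, long i_size)
{
	long ret = 0;	/* fix 1: lift the variable into function scope */

	switch (mode) {
	/* before the fix, a declaration such as 'loff_t i_size;' sat
	   right here, ahead of any case label */
	case 0:
		ret = i_size;
		break;
	case 1: {
		long half = i_size / 2;	/* fix 2: scope it inside a case */
		ret = half;
		break;
	}
	default:
		break;
	}
	return ret;
}
```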

    Signed-off-by: Kees Cook
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Hugh Dickins
    Cc: Alexander Potapenko
    Link: http://lkml.kernel.org/r/20200220062312.69165-1-keescook@chromium.org
    Signed-off-by: Linus Torvalds

    Kees Cook