11 Apr, 2020
15 commits
-
Merge yet more updates from Andrew Morton:
- Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
gup, hugetlb, pagemap, memremap)- Various other things (hfs, ocfs2, kmod, misc, seqfile)
* akpm: (34 commits)
ipc/util.c: sysvipc_find_ipc() should increase position index
kernel/gcov/fs.c: gcov_seq_next() should increase position index
fs/seq_file.c: seq_read(): add info message about buggy .next functions
drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
change email address for Pali Rohár
selftests: kmod: test disabling module autoloading
selftests: kmod: fix handling test numbers above 9
docs: admin-guide: document the kernel.modprobe sysctl
fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
kmod: make request_module() return an error when autoloading is disabled
mm/memremap: set caching mode for PCI P2PDMA memory to WC
mm/memory_hotplug: add pgprot_t to mhp_params
powerpc/mm: thread pgprot_t through create_section_mapping()
x86/mm: introduce __set_memory_prot()
x86/mm: thread pgprot_t through init_memory_mapping()
mm/memory_hotplug: rename mhp_restrictions to mhp_params
mm/memory_hotplug: drop the flags field from struct mhp_restrictions
mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
mm/vma: introduce VM_ACCESS_FLAGS
mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
... -
PCI BAR IO memory should never be mapped as WB, however prior to this
the PAT bits were set WB and it was typically overridden by MTRR
registers set by the firmware.Set PCI P2PDMA memory to be UC as this is what it currently, typically,
ends up being mapped as on x86 after the MTRR registers override the
cache setting.Future use-cases may need to generalize this by adding flags to select
the caching type, as some P2PDMA cases may not want UC. However, those
use-cases are not upstream yet and this can be changed when they arrive.Signed-off-by: Logan Gunthorpe
Signed-off-by: Andrew Morton
Reviewed-by: Dan Williams
Cc: Christoph Hellwig
Cc: Jason Gunthorpe
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Eric Badger
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Michael Ellerman
Cc: Michal Hocko
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Link: http://lkml.kernel.org/r/20200306170846.9333-8-logang@deltatee.com
Signed-off-by: Linus Torvalds -
devm_memremap_pages() is currently used by the PCI P2PDMA code to create
struct page mappings for IO memory. At present, these mappings are
created with PAGE_KERNEL which implies setting the PAT bits to be WB.
However, on x86, an mtrr register will typically override this and force
the cache type to be UC-. In the case firmware doesn't set this
register it is effectively WB and will typically result in a machine
check exception when it's accessed.Other arches are not currently likely to function correctly seeing they
don't have any MTRR registers to fall back on.To solve this, provide a way to specify the pgprot value explicitly to
arch_add_memory().Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a
simple change to pass the pgprot_t down to their respective functions
which set up the page tables. For x86_32, set the page tables
explicitly using _set_memory_prot() (seeing they are already mapped).For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this
should be fine, for now, seeing these architectures don't support
ZONE_DEVICE.A check in __add_pages() is also added to ensure the pgprot parameter
was set for all arches.Signed-off-by: Logan Gunthorpe
Signed-off-by: Andrew Morton
Acked-by: David Hildenbrand
Acked-by: Michal Hocko
Acked-by: Dan Williams
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christoph Hellwig
Cc: Dave Hansen
Cc: Eric Badger
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
Signed-off-by: Linus Torvalds -
The mhp_restrictions struct really doesn't specify anything resembling a
restriction anymore so rename it to be mhp_params as it is a list of
extended parameters.Signed-off-by: Logan Gunthorpe
Signed-off-by: Andrew Morton
Reviewed-by: David Hildenbrand
Reviewed-by: Dan Williams
Acked-by: Michal Hocko
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christoph Hellwig
Cc: Dave Hansen
Cc: Eric Badger
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com
Signed-off-by: Linus Torvalds -
There are many places where all basic VMA access flags (read, write,
exec) are initialized or checked against as a group. One such example
is during page fault. Existing vma_is_accessible() wrapper already
creates the notion of VMA accessibility as a group access permissions.Hence lets just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which
will not only reduce code duplication but also extend the VMA
accessibility concept in general.Signed-off-by: Anshuman Khandual
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Cc: Russell King
Cc: Catalin Marinas
Cc: Mark Salter
Cc: Nick Hu
Cc: Ley Foon Tan
Cc: Michael Ellerman
Cc: Heiko Carstens
Cc: Yoshinori Sato
Cc: Guan Xuetao
Cc: Dave Hansen
Cc: Thomas Gleixner
Cc: Rob Springer
Cc: Greg Kroah-Hartman
Cc: Geert Uytterhoeven
Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds -
Add the ability to insert multiple pages at once to a user VM with lower
PTE spinlock operations.The intention of this patch-set is to reduce atomic ops for tcp zerocopy
receives, which normally hits the same spinlock multiple times
consecutively.[akpm@linux-foundation.org: pte_alloc() no longer takes the `addr' argument]
[arjunroy@google.com: add missing page_count() check to vm_insert_pages()]
Link: http://lkml.kernel.org/r/20200214005929.104481-1-arjunroy.kdev@gmail.com
[arjunroy@google.com: vm_insert_pages() checks if pte_index defined]
Link: http://lkml.kernel.org/r/20200228054714.204424-2-arjunroy.kdev@gmail.com
Signed-off-by: Arjun Roy
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Andrew Morton
Cc: David Miller
Cc: Matthew Wilcox
Cc: Jason Gunthorpe
Cc: Stephen Rothwell
Link: http://lkml.kernel.org/r/20200128025958.43490-2-arjunroy.kdev@gmail.com
Signed-off-by: Linus Torvalds -
Add helper methods for vm_insert_page()/insert_page() to prepare for
vm_insert_pages(), which batch-inserts pages to reduce spinlock
operations when inserting multiple consecutive pages into the user page
table.The intention of this patch-set is to reduce atomic ops for tcp zerocopy
receives, which normally hits the same spinlock multiple times
consecutively.Signed-off-by: Arjun Roy
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Andrew Morton
Cc: David Miller
Cc: Matthew Wilcox
Cc: Jason Gunthorpe
Cc: Stephen Rothwell
Link: http://lkml.kernel.org/r/20200128025958.43490-1-arjunroy.kdev@gmail.com
Signed-off-by: Linus Torvalds -
On passing requirement to vm_unmapped_area, arch_get_unmapped_area and
arch_get_unmapped_area_topdown did not set align_offset. Internally on
both unmapped_area and unmapped_area_topdown, if info->align_mask is 0,
then info->align_offset was meaningless.But commit df529cabb7a2 ("mm: mmap: add trace point of
vm_unmapped_area") always prints info->align_offset even though it is
uninitialized.Fix this uninitialized value issue by setting it to 0 explicitly.
Before:
vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022After:
vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0Signed-off-by: Jaewon Kim
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Cc: Matthew Wilcox (Oracle)
Cc: Michel Lespinasse
Cc: Borislav Petkov
Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com
Signed-off-by: Linus Torvalds -
Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation
at runtime") has added the run-time allocation of gigantic pages.However it actually works only at early stages of the system loading,
when the majority of memory is free. After some time the memory gets
fragmented by non-movable pages, so the chances to find a contiguous 1GB
block are getting close to zero. Even dropping caches manually doesn't
help a lot.At large scale rebooting servers in order to allocate gigantic hugepages
is quite expensive and complex. At the same time keeping some constant
percentage of memory in reserved hugepages even if the workload isn't
using it is a big waste: not all workloads can benefit from using 1 GB
pages.The following solution can solve the problem:
1) On boot time a dedicated cma area* is reserved. The size is passed
as a kernel argument.
2) Run-time allocations of gigantic hugepages are performed using the
cma allocator and the dedicated cma areaIn this case gigantic hugepages can be allocated successfully with a
high probability, however the memory isn't completely wasted if nobody
is using 1GB hugepages: it can be used for pagecache, anon memory, THPs,
etc.* On a multi-node machine a per-node cma area is allocated on each node.
Following gigantic hugetlb allocation are using the first available
numa node if the mask isn't specified by a user.Usage:
1) configure the kernel to allocate a cma area for hugetlb allocations:
pass hugetlb_cma=10G as a kernel argument2) allocate hugetlb pages as usual, e.g.
echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepagesIf the option isn't enabled or the allocation of the cma area failed,
the current behavior of the system is preserved.x86 and arm-64 are covered by this patch, other architectures can be
trivially added later.The patch contains clean-ups and fixes proposed and implemented by Aslan
Bakirov and Randy Dunlap. It also contains ideas and suggestions
proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks!Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Tested-by: Andreas Schaufler
Acked-by: Mike Kravetz
Acked-by: Michal Hocko
Cc: Aslan Bakirov
Cc: Randy Dunlap
Cc: Rik van Riel
Cc: Joonsoo Kim
Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
Signed-off-by: Linus Torvalds -
I've noticed that there is no interface exposed by CMA which would let
me to declare contigous memory on particular NUMA node.This patchset adds the ability to try to allocate contiguous memory on a
specific node. It will fallback to other nodes if the specified one
doesn't work.Implement a new method for declaring contigous memory on particular node
and keep cma_declare_contiguous() as a wrapper.[akpm@linux-foundation.org: build fix]
Signed-off-by: Aslan Bakirov
Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Acked-by: Michal Hocko
Cc: Andreas Schaufler
Cc: Mike Kravetz
Cc: Rik van Riel
Cc: Joonsoo Kim
Link: http://lkml.kernel.org/r/20200407163840.92263-2-guro@fb.com
Signed-off-by: Linus Torvalds -
Fix the following sparse warning:
mm/page_alloc.c:106:1: warning: symbol 'pcpu_drain_mutex' was not declared. Should it be static?
mm/page_alloc.c:107:1: warning: symbol '__pcpu_scope_pcpu_drain' was not declared. Should it be static?Reported-by: Hulk Robot
Signed-off-by: Jason Yan
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200407023925.46438-1-yanaijie@huawei.com
Signed-off-by: Linus Torvalds -
Add description of function parameter 'mt' to fix kernel-doc warning:
mm/page_alloc.c:3246: warning: Function parameter or member 'mt' not described in '__putback_isolated_page'
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Acked-by: Pankaj Gupta
Link: http://lkml.kernel.org/r/02998bd4-0b82-2f15-2570-f86130304d1e@infradead.org
Signed-off-by: Linus Torvalds -
There is a typo in comment, fix it.
s/eariler/earlier/Signed-off-by: Qiujun Huang
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Acked-by: Christoph Lameter
Link: http://lkml.kernel.org/r/20200405160544.1246-1-hqjagain@gmail.com
Signed-off-by: Linus Torvalds -
If a cgroup violates its memory.high constraints, we may end up unduly
penalising it. For example, for the following hierarchy:A: max high, 20 usage
A/B: 9 high, 10 usage
A/C: max high, 10 usageWe would end up doing the following calculation below when calculating
high delay for A/B:A/B: 10 - 9 = 1...
A: 20 - PAGE_COUNTER_MAX = 21, so set max_overage to 21.This gets worse with higher disparities in usage in the parent.
I have no idea how this disappeared from the final version of the patch,
but it is certainly Not Good(tm). This wasn't obvious in testing because,
for a simple cgroup hierarchy with only one child, the result is usually
roughly the same. It's only in more complex hierarchies that things go
really awry (although still, the effects are limited to a maximum of 2
seconds in schedule_timeout_killable at a maximum).[chris@chrisdown.name: changelog]
Fixes: e26733e0d0ec ("mm, memcg: throttle allocators based on ancestral memory.high")
Signed-off-by: Jakub Kicinski
Signed-off-by: Chris Down
Signed-off-by: Andrew Morton
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: [5.4.x]
Link: http://lkml.kernel.org/r/20200331152424.GA1019937@chrisdown.name
Signed-off-by: Linus Torvalds -
Pull block fixes from Jens Axboe:
"Here's a set of fixes that should go into this merge window. This
contains:- NVMe pull request from Christoph with various fixes
- Better discard support for loop (Evan)
- Only call ->commit_rqs() if we have queued IO (Keith)
- blkcg offlining fixes (Tejun)
- fix (and fix the fix) for busy partitions"
* tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block:
block: fix busy device checking in blk_drop_partitions again
block: fix busy device checking in blk_drop_partitions
nvmet-rdma: fix double free of rdma queue
blk-mq: don't commit_rqs() if none were queued
nvme-fc: Revert "add module to ops template to allow module references"
nvme: fix deadlock caused by ANA update wrong locking
nvmet-rdma: fix bonding failover possible NULL deref
loop: Better discard support for block devices
loop: Report EOPNOTSUPP properly
nvmet: fix NULL dereference when removing a referral
nvme: inherit stable pages constraint in the mpath stack device
blkcg: don't offline parent blkcg first
blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it
nvme-tcp: fix possible crash in recv error flow
nvme-tcp: don't poll a non-live queue
nvme-tcp: fix possible crash in write_zeroes processing
nvmet-fc: fix typo in comment
nvme-rdma: Replace comma with a semicolon
nvme-fcloop: fix deallocation of working context
nvme: fix compat address handling in several ioctls
09 Apr, 2020
2 commits
-
Pull libnvdimm and dax updates from Dan Williams:
"There were multiple touches outside of drivers/nvdimm/ this round to
add cross arch compatibility to the devm_memremap_pages() interface,
enhance numa information for persistent memory ranges, and add a
zero_page_range() dax operation.This cycle I switched from the patchwork api to Konstantin's b4 script
for collecting tags (from x86, PowerPC, filesystem, and device-mapper
folks), and everything looks to have gone ok there. This has all
appeared in -next with no reported issues.Summary:
- Add support for region alignment configuration and enforcement to
fix compatibility across architectures and PowerPC page size
configurations.- Introduce 'zero_page_range' as a dax operation. This facilitates
filesystem-dax operation without a block-device.- Introduce phys_to_target_node() to facilitate drivers that want to
know resulting numa node if a given reserved address range was
onlined.- Advertise a persistence-domain for of_pmem and papr_scm. The
persistence domain indicates where cpu-store cycles need to reach
in the platform-memory subsystem before the platform will consider
them power-fail protected.- Promote numa_map_to_online_node() to a cross-kernel generic
facility.- Save x86 numa information to allow for node-id lookups for reserved
memory ranges, deploy that capability for the e820-pmem driver.- Pick up some miscellaneous minor fixes, that missed v5.6-final,
including a some smatch reports in the ioctl path and some unit
test compilation fixups.- Fixup some flexible-array declarations"
* tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
dax: Move mandatory ->zero_page_range() check in alloc_dax()
dax,iomap: Add helper dax_iomap_zero() to zero a range
dax: Use new dax zero page method for zeroing a page
dm,dax: Add dax zero_page_range operation
s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
dax, pmem: Add a dax operation zero_page_range
pmem: Add functions for reading/writing page to/from pmem
libnvdimm: Update persistence domain value for of_pmem and papr_scm device
tools/test/nvdimm: Fix out of tree build
libnvdimm/region: Fix build error
libnvdimm/region: Replace zero-length array with flexible-array member
libnvdimm/label: Replace zero-length array with flexible-array member
ACPI: NFIT: Replace zero-length array with flexible-array member
libnvdimm/region: Introduce an 'align' attribute
libnvdimm/region: Introduce NDD_LABELING
libnvdimm/namespace: Enforce memremap_compat_align()
libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
libnvdimm: Out of bounds read in __nd_ioctl()
acpi/nfit: improve bounds checking for 'func'
mm/memremap_pages: Introduce memremap_compat_align()
... -
__get_user_pages_locked() will return 0 instead of -EINTR after commit
4426e945df588 ("mm/gup: allow VM_FAULT_RETRY for multiple times") which
added extra code to allow gup detect fatal signal faster.Restore the original -EINTR behavior.
Cc: Andrew Morton
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
Reported-by: syzbot+3be1a33f04dc782e9fd5@syzkaller.appspotmail.com
Signed-off-by: Hillf Danton
Acked-by: Michal Hocko
Signed-off-by: Peter Xu
Signed-off-by: Linus Torvalds
08 Apr, 2020
23 commits
-
It's definitely incorrect to mark the lock as taken even if
down_read_killable() failed.This wass overlooked when we switched from down_read() to
down_read_killable() because down_read() won't fail while
down_read_killable() could.Fixes: 71335f37c5e8 ("mm/gup: allow to react to fatal signals")
Reported-by: syzbot+a8c70b7f3579fc0587dc@syzkaller.appspotmail.com
Signed-off-by: Peter Xu
Signed-off-by: Linus Torvalds -
lookup_node() uses gup to pin the page and get node information. It
checks against ret>=0 assuming the page will be filled in. However it's
also possible that gup will return zero, for example, when the thread is
quickly killed with a fatal signal. Teach lookup_node() to gracefully
return an error -EFAULT if it happens.Meanwhile, initialize "page" to NULL to avoid potential risk of
exploiting the pointer.Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
Signed-off-by: Peter Xu
Signed-off-by: Linus Torvalds -
As done in the full WARN() handler, panic_on_warn needs to be cleared
before calling panic() to avoid recursive panics.Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Acked-by: Dmitry Vyukov
Cc: Alexander Potapenko
Cc: Andrey Konovalov
Cc: Andrey Ryabinin
Cc: Ard Biesheuvel
Cc: Arnd Bergmann
Cc: Dan Carpenter
Cc: Elena Petrova
Cc: "Gustavo A. R. Silva"
Link: http://lkml.kernel.org/r/20200227193516.32566-6-keescook@chromium.org
Signed-off-by: Linus Torvalds -
filter_irq_stacks() can be used by other tools (e.g. KMSAN), so it needs
to be moved to a common location. lib/stackdepot.c seems a good place, as
filter_irq_stacks() is usually applied to the output of
stack_trace_save().This patch has been previously mailed as part of KMSAN RFC patch series.
[glider@google.co: nds32: linker script: add SOFTIRQENTRY_TEXT\
Link: http://lkml.kernel.org/r/20200311121002.241430-1-glider@google.com
[glider@google.com: add IRQENTRY_TEXT and SOFTIRQENTRY_TEXT to linker script]
Link: http://lkml.kernel.org/r/20200311121124.243352-1-glider@google.com
Signed-off-by: Alexander Potapenko
Signed-off-by: Andrew Morton
Cc: Vegard Nossum
Cc: Dmitry Vyukov
Cc: Marco Elver
Cc: Andrey Konovalov
Cc: Andrey Ryabinin
Cc: Arnd Bergmann
Cc: Sergey Senozhatsky
Link: http://lkml.kernel.org/r/20200220141916.55455-3-glider@google.com
Signed-off-by: Linus Torvalds -
Now that "struct proc_ops" exist we can start putting there stuff which
could not fly with VFS "struct file_operations"...Most of fs/proc/inode.c file is dedicated to make open/read/.../close
reliable in the event of disappearing /proc entries which usually happens
if module is getting removed. Files like /proc/cpuinfo which never
disappear simply do not need such protection.Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such
"permanent" files.Enable "permanent" flag for
/proc/cpuinfo
/proc/kmsg
/proc/modules
/proc/slabinfo
/proc/stat
/proc/sysvipc/*
/proc/swapsMore will come once I figure out foolproof way to prevent out module
authors from marking their stuff "permanent" for performance reasons
when it is not.This should help with scalability: benchmark is "read /proc/cpuinfo R times
by N threads scattered over the system".N R t, s (before) t, s (after)
-----------------------------------------------------
64 4096 1.582458 1.530502 -3.2%
256 4096 6.371926 6.125168 -3.9%
1024 4096 25.64888 24.47528 -4.6%Benchmark source:
#include
#include
#include
#include#include
#include
#include
#includeconst int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN);
int N;
const char *filename;
int R;int xxx = 0;
int glue(int n)
{
cpu_set_t m;
CPU_ZERO(&m);
CPU_SET(n, &m);
return sched_setaffinity(0, sizeof(cpu_set_t), &m);
}void f(int n)
{
glue(n % NR_CPUS);while (*(volatile int *)&xxx == 0) {
}for (int i = 0; i < R; i++) {
int fd = open(filename, O_RDONLY);
char buf[4096];
ssize_t rv = read(fd, buf, sizeof(buf));
asm volatile ("" :: "g" (rv));
close(fd);
}
}int main(int argc, char *argv[])
{
if (argc < 4) {
std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R
";
return 1;
}N = atoi(argv[1]);
filename = argv[2];
R = atoi(argv[3]);for (int i = 0; i < NR_CPUS; i++) {
if (glue(i) == 0)
break;
}std::vector T;
T.reserve(N);
for (int i = 0; i < N; i++) {
T.emplace_back(f, i);
}auto t0 = std::chrono::system_clock::now();
{
*(volatile int *)&xxx = 1;
for (auto& t: T) {
t.join();
}
}
auto t1 = std::chrono::system_clock::now();
std::chrono::duration dt = t1 - t0;
std::cout << dt.count() << '
';return 0;
}P.S.:
Explicit randomization marker is added because adding non-function pointer
will silently disable structure layout randomization.[akpm@linux-foundation.org: coding style fixes]
Reported-by: kbuild test robot
Reported-by: Dan Carpenter
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Cc: Al Viro
Cc: Joe Perches
Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2
Signed-off-by: Linus Torvalds -
Previously there was a check if 'size' is aligned to 'align' and if not
then it was aligned. This check was expensive as both branch and division
are expensive instructions in most architectures. 'ALIGN' function on
already aligned value will not change it, and as it is cheaper than branch
+ division it can be executed all the time and branch can be removed.Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200320173317.26408-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
Convert the various /* fallthrough */ comments to the pseudo-keyword
fallthrough;Done via script:
https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Reviewed-by: Gustavo A. R. Silva
Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com
Signed-off-by: Linus Torvalds -
MAX_ZONELISTS is a compile time constant, so it should be compared using
BUILD_BUG_ON not BUG_ON.Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Reviewed-by: Wei Yang
Link: http://lkml.kernel.org/r/20200228224617.11343-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
The parameter of remap_pfn_range() @pfn passed from the caller is actually
a page-frame number converted by corresponding physical address of kernel
memory, the original comment is ambiguous that may mislead the users.Meanwhile, there is an ambiguous typo "VMM" in the comment of
vm_area_struct. So fixing them will make the code more readable.Signed-off-by: chenqiwu
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Link: http://lkml.kernel.org/r/1583026921-15279-1-git-send-email-qiwuchen55@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at unpin_tag()()
warning: context imbalance in unpin_tag() - unexpected unlock
The root cause is the missing annotation at unpin_tag()
Add the missing __releases(bitlock) annotationSigned-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Acked-by: Minchan Kim
Link: http://lkml.kernel.org/r/20200214204741.94112-14-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at pin_tag()()
warning: context imbalance in pin_tag() - wrong count at exit
The root cause is the missing annotation at pin_tag()
Add the missing __acquires(bitlock) annotationSigned-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Acked-by: Minchan Kim
Link: http://lkml.kernel.org/r/20200214204741.94112-13-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at migrate_read_unlock()()
warning: context imbalance in migrate_read_unlock() - unexpected unlock
The root cause is the missing annotation at migrate_read_unlock()
Add the missing __releases(&zspage->lock) annotationSigned-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Acked-by: Minchan Kim
Link: http://lkml.kernel.org/r/20200214204741.94112-12-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at migrate_read_lock()()
warning: context imbalance in migrate_read_lock() - wrong count at exit
The root cause is the missing annotation at migrate_read_lock()
Add the missing __acquires(&zspage->lock) annotationSigned-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Acked-by: Minchan Kim
Link: http://lkml.kernel.org/r/20200214204741.94112-11-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at put_map()()
warning: context imbalance in put_map() - unexpected unlock
The root cause is the missing annotation at put_map()
Add the missing __releases(&object_map_lock) annotationSigned-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200214204741.94112-10-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at get_map()()
warning: context imbalance in get_map() - wrong count at exit
The root cause is the missing annotation at get_map()
Add the missing __acquires(&object_map_lock) annotationSigned-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200214204741.94112-9-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at queue_pages_pmd()
context imbalance in queue_pages_pmd() - unexpected unlock
The root cause is the missing annotation at queue_pages_pmd()
Add the missing __releases(ptl)Signed-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200214204741.94112-8-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at gather_surplus_pages()
warning: context imbalance in hugetlb_cow() - unexpected unlock
The root cause is the missing annotation at gather_surplus_pages()
Add the missing __must_hold(&hugetlb_lock)Signed-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Reviewed-by: Mike Kravetz
Link: http://lkml.kernel.org/r/20200214204741.94112-7-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
Sparse reports a warning at compact_lock_irqsave()
warning: context imbalance in compact_lock_irqsave() - wrong count at exit
The root cause is the missing annotation at compact_lock_irqsave()
Add the missing __acquires(lock) annotation.Signed-off-by: Jules Irenge
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200214204741.94112-6-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds -
The compressed cache for swap pages (zswap) currently needs from 1 to 3
extra kernel command line parameters in order to make it work: it has to
be enabled by adding a "zswap.enabled=1" command line parameter and if one
wants a different compressor or pool allocator than the default lzo / zbud
combination then these choices also need to be specified on the kernel
command line in additional parameters.Using a different compressor and allocator for zswap is actually pretty
common as guides often recommend using the lz4 / z3fold pair instead of
the default one. In such case it is also necessary to remember to enable
the appropriate compression algorithm and pool allocator in the kernel
config manually.Let's avoid the need for adding these kernel command line parameters and
automatically pull in the dependencies for the selected compressor
algorithm and pool allocator by adding an appropriate default switches to
Kconfig.The default values for these options match what the code was using
previously as its defaults.Signed-off-by: Maciej S. Szmigiero
Signed-off-by: Andrew Morton
Reviewed-by: Vitaly Wool
Link: http://lkml.kernel.org/r/20200202000112.456103-1-mail@maciej.szmigiero.name
Signed-off-by: Linus Torvalds -
I recently build the RISC-V port with LLVM trunk, which has introduced a
new warning when casting from a pointer to an enum of a smaller size.
This patch simply casts to a long in the middle to stop the warning. I'd
be surprised this is the only one in the kernel, but it's the only one I
saw.Signed-off-by: Palmer Dabbelt
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200227211741.83165-1-palmer@dabbelt.com
Signed-off-by: Linus Torvalds -
Yang Shi writes:
Currently, when truncating a shmem file, if the range is partly in a THP
(start or end is in the middle of THP), the pages actually will just get
cleared rather than being freed, unless the range covers the whole THP.
Even though all the subpages are truncated (randomly or sequentially), the
THP may still be kept in page cache.This might be fine for some usecases which prefer preserving THP, but
balloon inflation is handled in base page size. So when using shmem THP
as memory backend, QEMU inflation actually doesn't work as expected since
it doesn't free memory. But the inflation usecase really needs to get the
memory freed. (Anonymous THP will also not get freed right away, but will
be freed eventually when all subpages are unmapped: whereas shmem THP
still stays in page cache.)Split THP right away when doing partial hole punch, and if split fails
just clear the page so that read of the punched area will return zeroes.Hugh Dickins adds:
Our earlier "team of pages" huge tmpfs implementation worked in the way
that Yang Shi proposes; and we have been using this patch to continue to
split the huge page when hole-punched or truncated, since converting over
to the compound page implementation. Although huge tmpfs gives out huge
pages when available, if the user specifically asks to truncate or punch a
hole (perhaps to free memory, perhaps to reduce the memcg charge), then
the filesystem should do so as best it can, splitting the huge page.That is not always possible: any additional reference to the huge page
prevents split_huge_page() from succeeding, so the result can be flaky.
But in practice it works successfully enough that we've not seen any
problem from that.Add shmem_punch_compound() to encapsulate the decision of when a split is
needed, and doing the split if so. Using this simplifies the flow in
shmem_undo_range(); and the first (trylock) pass does not need to do any
page clearing on failure, because the second pass will either succeed or
do that clearing. Following the example of zero_user_segment() when
clearing a partial page, add flush_dcache_page() and set_page_dirty() when
clearing a hole - though I'm not certain that either is needed.But: split_huge_page() would be sure to fail if shmem_undo_range()'s
pagevec holds further references to the huge page. The easiest way to fix
that is for find_get_entries() to return early, as soon as it has put one
compound head or tail into the pagevec. At first this felt like a hack;
but on examination, this convention better suits all its callers - or will
do, if the slight one-page-per-pagevec slowdown in shmem_unlock_mapping()
and shmem_seek_hole_data() is transformed into a 512-page-per-pagevec
speedup by checking for compound pages there.Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Cc: Yang Shi
Cc: Alexander Duyck
Cc: "Michael S. Tsirkin"
Cc: David Hildenbrand
Cc: "Kirill A. Shutemov"
Cc: Matthew Wilcox
Cc: Andrea Arcangeli
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2002261959020.10801@eggly.anvils
Signed-off-by: Linus Torvalds -
Previously 0 was assigned to variable 'error' but the variable was never
read before reassignemnt later. So the assignment can be removed.Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Reviewed-by: Matthew Wilcox (Oracle)
Acked-by: Pankaj Gupta
Cc: Hugh Dickins
Link: http://lkml.kernel.org/r/20200301152832.24595-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
Variables declared in a switch statement before any case statements cannot
be automatically initialized with compiler instrumentation (as they are
not part of any execution flow). With GCC's proposed automatic stack
variable initialization feature, this triggers a warning (and they don't
get initialized). Clang's automatic stack variable initialization (via
CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't
initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase, so
even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.To avoid these problems, move such variables into the "case" where they're
used or lift them up into the main function body.mm/shmem.c: In function `shmem_getpage_gfp':
mm/shmem.c:1816:10: warning: statement will never be executed [-Wswitch-unreachable]
1816 | loff_t i_size;
| ^~~~~~[1] https://bugs.llvm.org/show_bug.cgi?id=44916
Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Cc: Hugh Dickins
Cc: Alexander Potapenko
Link: http://lkml.kernel.org/r/20200220062312.69165-1-keescook@chromium.org
Signed-off-by: Linus Torvalds