11 Mar, 2022
1 commit
-
This is the 5.15.27 stable release
* tag 'v5.15.27': (3069 commits)
Linux 5.15.27
hamradio: fix macro redefine warning
KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
...

Signed-off-by: Jason Liu
Conflicts:
arch/arm/boot/dts/imx7ulp.dtsi
arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts
arch/arm64/boot/dts/freescale/imx8mq.dtsi
drivers/dma-buf/heaps/cma_heap.c
drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
drivers/gpu/drm/mxsfb/mxsfb_kms.c
drivers/mmc/host/sdhci-esdhc-imx.c
drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
drivers/rpmsg/rpmsg_char.c
drivers/soc/imx/gpcv2.c
drivers/thermal/imx_thermal.c
09 Mar, 2022
1 commit
-
[ Upstream commit 60115fa54ad7b913b7cb5844e6b7ffeb842d55f2 ]
Yongqiang reports a kmemleak panic on module insmod/rmmod with KASAN
enabled (without KASAN_VMALLOC) on x86 [1].

When the module area allocates memory, its kmemleak_object is created
successfully, but the KASAN shadow memory of the module allocation is not
ready, so when kmemleak scans the module's pointers, it panics because
the KASAN check finds no shadow memory.

  module_alloc
    __vmalloc_node_range
      kmemleak_vmalloc
                                kmemleak_scan
                                  update_checksum
    kasan_module_alloc
      kmemleak_ignore

Note that there is no problem if KASAN_VMALLOC is enabled, because the
module area's entire shadow memory is preallocated. Thus, the bug only
exists on architectures that support dynamic allocation of the module
area per module load; for now, only x86/arm64/s390 are involved.

Add a VM_DEFER_KMEMLEAK flag and defer the kmemleak registration of the
vmalloc'ed object in module_alloc() to fix this issue.

[1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/
[wangkefeng.wang@huawei.com: fix build]
Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com
[akpm@linux-foundation.org: simplify ifdefs, per Andrey]
Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com
Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com
Fixes: 793213a82de4 ("s390/kasan: dynamic shadow mem allocation for modules")
Fixes: 39d114ddc682 ("arm64: add KASAN support")
Fixes: bebf56a1b176 ("kasan: enable instrumentation of global variables")
Signed-off-by: Kefeng Wang
Reported-by: Yongqiang Liu
Cc: Andrey Konovalov
Cc: Andrey Ryabinin
Cc: Dmitry Vyukov
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Christian Borntraeger
Cc: Alexander Gordeev
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Alexander Potapenko
Cc: Kefeng Wang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
30 Nov, 2021
1 commit
-
* jailhouse/next: (45 commits)
LF-3330 net: ivshmem-net: include ethtool to avoid build break
MLK-25346: net: add imx-shmem-net driver
LF-2949 arm: kernel: hyp-stub: not export __hyp_stub_vectors
LF-3097/LF-3172 virtio: ivshmem: check peer_state early
LF-3016-3 tools/virtio: ivshmem-console: correct device_vector to 0
...
02 Nov, 2021
4 commits
-
Needed by Jailhouse so far, as replacement of __get_vm_area. Should be
solved there eventually (via removal of JAILHOUSE_BORROW_ROOT_PT), then
this can be dropped again.

Signed-off-by: Jan Kiszka
-
We need this in Jailhouse to map at specific virtual addresses, at
least for the moment.

Signed-off-by: Jan Kiszka
[ Aisheng: update the code change to vmalloc.c ]
Signed-off-by: Dong Aisheng
-
This reverts commit 8491502f787c4a902bd4f223b578ef47d3490264.
Jailhouse relies on the X flag to execute hypervisor entry code.
Until we find a better way to fix the Jailhouse enable issue,
revert this commit so that Jailhouse can work.

Signed-off-by: Peng Fan
-
We need this in Jailhouse to map at specific virtual addresses, at
least for the moment.

Signed-off-by: Jan Kiszka
(cherry picked from commit 94bb285491a9a9e15c82c0761505b1073d6b7a47)
29 Oct, 2021
1 commit
-
Eric Dumazet reported strange NUMA spreading info in [1], and found that
commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings") introduced
this issue [2].

Digging into the difference before and after that patch, page allocation
differs as follows:

before:
  alloc_large_system_hash
    __vmalloc
      __vmalloc_node(..., NUMA_NO_NODE, ...)
        __vmalloc_node_range
          __vmalloc_area_node
            alloc_page /* because NUMA_NO_NODE, so choose alloc_page branch */
              alloc_pages_current
                alloc_page_interleave /* can be proved by print policy mode */

after:
  alloc_large_system_hash
    __vmalloc
      __vmalloc_node(..., NUMA_NO_NODE, ...)
        __vmalloc_node_range
          __vmalloc_area_node
            alloc_pages_node /* choose nid by numa_mem_id() */
              __alloc_pages_node(nid, ....)

So after commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"),
memory is allocated on the current node instead of being interleaved
across nodes.

Link: https://lore.kernel.org/linux-mm/CANn89iL6AAyWhfxdHO+jaT075iOa3XcYn9k6JJc7JR2XYn6k_Q@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-mm/CANn89iLofTR=AK-QOZY87RdUZENCZUT4O6a0hvhu3_EwRMerOg@mail.gmail.com/ [2]
Link: https://lkml.kernel.org/r/20211021080744.874701-2-chenwandun@huawei.com
Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Chen Wandun
Reported-by: Eric Dumazet
Cc: Shakeel Butt
Cc: Nicholas Piggin
Cc: Kefeng Wang
Cc: Hanjun Guo
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
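
The two call chains above differ only in how NUMA_NO_NODE is resolved. As a
rough userspace illustration (not kernel code), the sketch below contrasts
the two behaviours. NR_NODES, node_from_policy(), node_before() and
node_after() are hypothetical names invented for this example, and the
round-robin cursor only imitates what an interleave memory policy would do.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)
#define NR_NODES 4 /* assumed node count, for illustration only */

static int interleave_cursor; /* next node the fake interleave policy hands out */

/* Hypothetical stand-in for a policy-aware allocation under an interleave policy. */
static int node_from_policy(void)
{
	int nid = interleave_cursor;

	interleave_cursor = (interleave_cursor + 1) % NR_NODES;
	return nid;
}

/* "before": NUMA_NO_NODE goes through the memory policy, so pages are spread. */
static int node_before(int nid)
{
	assert(nid == NUMA_NO_NODE || (nid >= 0 && nid < NR_NODES));
	return nid == NUMA_NO_NODE ? node_from_policy() : nid;
}

/* "after": NUMA_NO_NODE is resolved to the local node, so pages pile up there. */
static int node_after(int nid, int local_node)
{
	assert(nid == NUMA_NO_NODE || (nid >= 0 && nid < NR_NODES));
	return nid == NUMA_NO_NODE ? local_node : nid;
}
```

Calling node_before(NUMA_NO_NODE) repeatedly walks through the nodes, while
node_after(NUMA_NO_NODE, local) keeps returning the same local node, which
is the loss of spreading described in the report.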
09 Sep, 2021
3 commits
-
Merge more updates from Andrew Morton:

 "147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.

  Subsystems affected by this patch series: mm (memory-hotplug, rmap,
  ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
  alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
  checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
  selftests, ipc, and scripts"

* emailed patches from Andrew Morton: (94 commits)
scripts: check_extable: fix typo in user error message
mm/workingset: correct kernel-doc notations
ipc: replace costly bailout check in sysvipc_find_ipc()
selftests/memfd: remove unused variable
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
configs: remove the obsolete CONFIG_INPUT_POLLDEV
prctl: allow to setup brk for et_dyn executables
pid: cleanup the stale comment mentioning pidmap_init().
kernel/fork.c: unexport get_{mm,task}_exe_file
coredump: fix memleak in dump_vma_snapshot()
fs/coredump.c: log if a core dump is aborted due to changed file permissions
nilfs2: use refcount_dec_and_lock() to fix potential UAF
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
trap: cleanup trap_init()
init: move usermodehelper_enable() to populate_rootfs()
...
-
There is no need to execute from iomem (and on most platforms it is
impossible anyway), so add the pgprot_nx() call similar to vmap.

Link: https://lkml.kernel.org/r/20210824091259.1324527-3-hch@lst.de
Signed-off-by: Christoph Hellwig
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Patch series "small ioremap cleanups".

The first patch moves a little code around the vmalloc/ioremap boundary
following a bigger move by Nick earlier. The second enforces
non-executable mapping on ioremap just like we do for vmap. No driver
currently uses executable mappings anyway, as they should.

This patch (of 2):

This keeps it together with the implementation, and allows removing the
vmap_range wrapper.

Link: https://lkml.kernel.org/r/20210824091259.1324527-1-hch@lst.de
Link: https://lkml.kernel.org/r/20210824091259.1324527-2-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Nicholas Piggin
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Sep, 2021
3 commits
-
Commit f608788cd2d6 ("mm/vmalloc: use rb_tree instead of list for vread()
lookups") uses an rb_tree instead of a list to speed up lookups, but
__find_vmap_area() tries to find a vmap_area that includes the target
address. If the target address is smaller than the leftmost node in
vmap_area_root, it returns NULL, and vread() then reads nothing. This
behavior differs from the original semantics.

The correct way is to find the first vmap_area that is bigger than the
target address, which is what find_vmap_area_exceed_addr() does.

Link: https://lkml.kernel.org/r/20210714015959.3204871-1-chenwandun@huawei.com
Fixes: f608788cd2d6 ("mm/vmalloc: use rb_tree instead of list for vread() lookups")
Signed-off-by: Chen Wandun
Reported-by: Hulk Robot
Cc: Serapheim Dimitropoulos
Cc: Uladzislau Rezki (Sony)
Cc: Kefeng Wang
Cc: Wei Yongjun
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Get rid of the gfpflags_allow_blocking() check from the vmalloc() path, as
it is supposed to be sleepable anyway. Thus remove it from
alloc_vmap_area() as well as from vm_area_alloc_pages().

Link: https://lkml.kernel.org/r/20210707182639.31282-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Acked-by: Michal Hocko
Cc: Mel Gorman
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Nicholas Piggin
Cc: Hillf Danton
Cc: Oleksiy Avramchenko
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
In case of simultaneous vmalloc allocations, for example with 1GB and 12
CPUs, my system is able to hit "BUG: soft lockup" on a !CONFIG_PREEMPT
kernel.

RIP: 0010:__alloc_pages_bulk+0xa9f/0xbb0
Call Trace:
 __vmalloc_node_range+0x11c/0x2d0
 __vmalloc_node+0x4b/0x70
 fix_size_alloc_test+0x44/0x60 [test_vmalloc]
 test_func+0xe7/0x1f0 [test_vmalloc]
 kthread+0x11a/0x140
 ret_from_fork+0x22/0x30

To address this issue, invoke the bulk allocator repeatedly until all pages
are obtained, i.e. do batched page requests and call cond_resched() between
them to allow rescheduling. The batch size is hard-coded to 100 pages per
call.

Link: https://lkml.kernel.org/r/20210707182639.31282-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Acked-by: Michal Hocko
Cc: Christoph Hellwig
Cc: Hillf Danton
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Nicholas Piggin
Cc: Oleksiy Avramchenko
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
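
The batching described in this commit message can be illustrated with a
minimal userspace sketch in C. It is not the kernel implementation:
fake_bulk_alloc() and maybe_yield() are hypothetical stand-ins for the bulk
page allocator and cond_resched(), and only the batch size of 100 pages is
taken from the message.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_PAGES 100u /* hard-coded batch size named in the commit message */

static unsigned int yield_points; /* counts simulated reschedule points */

/* Hypothetical stand-in for cond_resched(); here it only counts calls. */
static void maybe_yield(void)
{
	yield_points++;
}

/*
 * Hypothetical stand-in for a bulk page allocator: fills up to nr slots
 * and returns how many it filled. This fake always satisfies the request.
 */
static unsigned int fake_bulk_alloc(unsigned int nr, int *pages)
{
	for (unsigned int i = 0; i < nr; i++)
		pages[i] = 1; /* mark the slot as populated */
	return nr;
}

/* Request nr_pages in batches, offering to reschedule after each batch. */
static unsigned int alloc_in_batches(unsigned int nr_pages, int *pages)
{
	unsigned int allocated = 0;

	assert(pages != NULL);

	while (allocated < nr_pages) {
		unsigned int want = nr_pages - allocated;
		unsigned int got;

		if (want > BATCH_PAGES)
			want = BATCH_PAGES;

		got = fake_bulk_alloc(want, pages + allocated);
		allocated += got;
		maybe_yield();

		if (got != want)
			break; /* partial batch: stop and let the caller handle it */
	}
	return allocated;
}
```

With this sketch, a request for 250 pages is served as three batches
(100, 100 and 50), so the loop offers three reschedule points instead of
running to completion in one long call.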
02 Jul, 2021
1 commit
-
make W=1 generates the following warning for mm/vmalloc.c

  mm/vmalloc.c:1599:6: warning: no previous prototype for `set_iounmap_nonlazy' [-Wmissing-prototypes]
   void set_iounmap_nonlazy(void)
        ^~~~~~~~~~~~~~~~~~~

This is an arch-generic function only used by x86. On other arches, it's
dead code. Include the header with the definition and make it x86-64
specific.

Link: https://lkml.kernel.org/r/20210520084809.8576-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Reviewed-by: Yang Shi
Acked-by: Vlastimil Babka
Cc: Dan Streetman
Cc: David Hildenbrand
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2021
2 commits
-
On some architectures like powerpc, there are huge pages that are mapped
at pte level.

Enable it in vmalloc.

For that, architectures can provide arch_vmap_pte_supported_shift() that
returns the shift for pages to map at pte level.

Link: https://lkml.kernel.org/r/2c717e3b1fba1894d890feb7669f83025bfa314d.1620795204.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Nicholas Piggin
Cc: Paul Mackerras
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
On some architectures like powerpc, there are huge pages that are mapped
at pte level.

Enable it in vmap.

For that, architectures can provide arch_vmap_pte_range_map_size() that
returns the size of pages to map at pte level.

Link: https://lkml.kernel.org/r/fb3ccc73377832ac6708181ec419128a2f98ce36.1620795204.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Nicholas Piggin
Cc: Paul Mackerras
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Jun, 2021
5 commits
-
On non-preemptible kernel builds the watchdog can complain about soft
lockups when vfree() is called against large vmalloc areas:

[ 210.851798] kvmalloc-test: vmalloc(2199023255552) succeeded
[ 238.654842] watchdog: BUG: soft lockup - CPU#181 stuck for 26s! [rmmod:5203]
[ 238.662716] Modules linked in: kvmalloc_test(OE-) ...
[ 238.772671] CPU: 181 PID: 5203 Comm: rmmod Tainted: G S OE 5.13.0-rc7+ #1
[ 238.781413] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0553.D01.1809190614 09/19/2018
[ 238.792383] RIP: 0010:free_unref_page+0x52/0x60
[ 238.797447] Code: 48 c1 fd 06 48 89 ee e8 9c d0 ff ff 84 c0 74 19 9c 41 5c fa 48 89 ee 48 89 df e8 b9 ea ff ff 41 f7 c4 00 02 00 00 74 01 fb 5b 41 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f0 29 77
[ 238.818406] RSP: 0018:ffffb4d87868fe98 EFLAGS: 00000206
[ 238.824236] RAX: 0000000000000000 RBX: 000000001da0c945 RCX: ffffb4d87868fe40
[ 238.832200] RDX: ffffd79d3beed108 RSI: ffffd7998501dc08 RDI: ffff9c6fbffd7010
[ 238.840166] RBP: 000000000d518cbd R08: ffffd7998501dc08 R09: 0000000000000001
[ 238.848131] R10: 0000000000000000 R11: ffffd79d3beee088 R12: 0000000000000202
[ 238.856095] R13: ffff9e5be3eceec0 R14: 0000000000000000 R15: 0000000000000000
[ 238.864059] FS: 00007fe082c2d740(0000) GS:ffff9f4c69b40000(0000) knlGS:0000000000000000
[ 238.873089] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 238.879503] CR2: 000055a000611128 CR3: 000000f6094f6006 CR4: 00000000007706e0
[ 238.887467] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 238.895433] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 238.903397] PKRU: 55555554
[ 238.906417] Call Trace:
[ 238.909149] __vunmap+0x17c/0x220
[ 238.912851] __x64_sys_delete_module+0x13a/0x250
[ 238.918008] ? syscall_trace_enter.isra.20+0x13c/0x1b0
[ 238.923746] do_syscall_64+0x39/0x80
[ 238.927740] entry_SYSCALL_64_after_hwframe+0x44/0xae

Like in other range zapping routines that iterate over a large list, let's
just add cond_resched() within __vunmap()'s page-releasing loop in order
to avoid the watchdog splats.

Link: https://lkml.kernel.org/r/20210622225030.478384-1-aquini@redhat.com
Signed-off-by: Rafael Aquini
Acked-by: Nicholas Piggin
Reviewed-by: Uladzislau Rezki (Sony)
Reviewed-by: Aaron Tomlin
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently for order-0 pages we use the bulk-page allocator to get a set of
pages. On the other hand, it may fail to allocate all of the requested
pages. In that case we should fall back to the single-page allocator
to get the missing pages, because it is more permissive (direct reclaim,
etc).

Introduce a vm_area_alloc_pages() function where the described logic is
implemented.

Link: https://lkml.kernel.org/r/20210521130718.GA17882@pc638.lan
Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Matthew Wilcox (Oracle)
Reviewed-by: Christoph Hellwig
Cc: Mel Gorman
Cc: Nicholas Piggin
Cc: Hillf Danton
Cc: Michal Hocko
Cc: Oleksiy Avramchenko
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The checkpatch.pl script complains about a string split across lines,
because a user who searches for the entire string will not find it.

WARNING: quoted string split across lines
+ "vmalloc size %lu allocation failure: "
+ "page order %u allocation failed",

total: 0 errors, 1 warnings, 10 lines checked

Link: https://lkml.kernel.org/r/20210521204359.19943-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Cc: Mel Gorman
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Nicholas Piggin
Cc: Hillf Danton
Cc: Michal Hocko
Cc: Oleksiy Avramchenko
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When the memory allocation for the array of pages does not succeed, emit a
warning message as a first step and then perform the further cleanup.

The reason it should be done in this order is that the cleanup function,
free_vm_area(), can potentially also follow its own error paths, which
can lead to confusion about what broke first.

Link: https://lkml.kernel.org/r/20210516202056.2120-4-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Cc: Hillf Danton
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Nicholas Piggin
Cc: Oleksiy Avramchenko
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Recently a page bulk allocator has been introduced for users that need to
get a number of pages per one call request.

For order-0 pages switch to alloc_pages_bulk_array_node() instead of
alloc_pages_node(); the reason is that the latter is not capable of
allocating a set of pages, thus one call is needed per page.

Second, according to my tests the bulk allocator uses fewer cycles even for
scenarios when only one page is requested. Running "perf" on the same
test case shows the difference below:

default:
  - 45.18% __vmalloc_node
     - __vmalloc_node_range
        - 35.60% __alloc_pages
           - get_page_from_freelist
                3.36% __list_del_entry_valid
                3.00% check_preemption_disabled
                1.42% prep_new_page

patch:
  - 31.00% __vmalloc_node
     - __vmalloc_node_range
        - 14.48% __alloc_pages_bulk
             3.22% __list_del_entry_valid
           - 0.83% __alloc_pages
                get_page_from_freelist

The "test_vmalloc.sh" also shows performance improvements:

default:
  fix_size_alloc_test_4MB loops: 1000000 avg: 89105095 usec
  fix_size_alloc_test loops: 1000000 avg: 513672 usec
  full_fit_alloc_test loops: 1000000 avg: 748900 usec
  long_busy_list_alloc_test loops: 1000000 avg: 8043038 usec
  random_size_alloc_test loops: 1000000 avg: 4028582 usec
  fix_align_alloc_test loops: 1000000 avg: 1457671 usec

patch:
  fix_size_alloc_test_4MB loops: 1000000 avg: 62083711 usec
  fix_size_alloc_test loops: 1000000 avg: 449207 usec
  full_fit_alloc_test loops: 1000000 avg: 735985 usec
  long_busy_list_alloc_test loops: 1000000 avg: 5176052 usec
  random_size_alloc_test loops: 1000000 avg: 2589252 usec
  fix_align_alloc_test loops: 1000000 avg: 1365009 usec

For example, 4MB allocations show a ~30% gain; all the rest are also
better.

Link: https://lkml.kernel.org/r/20210516202056.2120-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Acked-by: Mel Gorman
Cc: Hillf Danton
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Nicholas Piggin
Cc: Oleksiy Avramchenko
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Jun, 2021
2 commits
-
In commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"),
__vmalloc_node_range was changed such that __get_vm_area_node was no
longer called with the requested/real size of the vmalloc allocation,
but rather with a rounded-up size.

This means that __get_vm_area_node called kasan_unpoison_vmalloc() with
a rounded-up size rather than the real size. This led to it allowing
access to too much memory and so missing vmalloc OOBs and failing the
kasan kunit tests.

Pass the real size and the desired shift into __get_vm_area_node. This
allows it to round up the size for the underlying allocators while still
unpoisoning the correct quantity of shadow memory.

Adjust the other call-sites to pass in PAGE_SHIFT for the shift value.

Link: https://lkml.kernel.org/r/20210617081330.98629-1-dja@axtens.net
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213335
Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Daniel Axtens
Tested-by: David Gow
Reviewed-by: Nicholas Piggin
Reviewed-by: Uladzislau Rezki (Sony)
Tested-by: Andrey Konovalov
Acked-by: Andrey Konovalov
Cc: Dmitry Vyukov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Patch series "mm: add vmalloc_no_huge and use it", v4.

Add vmalloc_no_huge() and export it, so modules can allocate memory with
small pages.

Use the newly added vmalloc_no_huge() in KVM on s390 to get around a
hardware limitation.

This patch (of 2):

Commit 121e6f3258fe3 ("mm/vmalloc: hugepage vmalloc mappings") added
support for hugepage vmalloc mappings; it also added the flag
VM_NO_HUGE_VMAP for __vmalloc_node_range to request the allocation to be
performed with 0-order non-huge pages.

This flag is not accessible when calling vmalloc; the only option is to
call __vmalloc_node_range directly, which is not exported.

This means that a module can't vmalloc memory with small pages.

Case in point: KVM on s390x needs to vmalloc a large area, and it needs
to be mapped with non-huge pages, because of a hardware limitation.

This patch adds the function vmalloc_no_huge, which works like vmalloc,
but it is guaranteed to always back the mapping using small pages. This
new function is exported, therefore it is usable by modules.

[akpm@linux-foundation.org: whitespace fixes, per Christoph]
Link: https://lkml.kernel.org/r/20210614132357.10202-1-imbrenda@linux.ibm.com
Link: https://lkml.kernel.org/r/20210614132357.10202-2-imbrenda@linux.ibm.com
Fixes: 121e6f3258fe3 ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Claudio Imbrenda
Reviewed-by: Uladzislau Rezki (Sony)
Acked-by: Nicholas Piggin
Reviewed-by: David Hildenbrand
Acked-by: David Rientjes
Cc: Uladzislau Rezki (Sony)
Cc: Catalin Marinas
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Christoph Hellwig
Cc: Cornelia Huck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 May, 2021
3 commits
-
Fix ~94 single-word typos in locking code comments, plus a few
very obvious grammar mistakes.

Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com
Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com
Signed-off-by: Ingo Molnar
Reviewed-by: Matthew Wilcox (Oracle)
Reviewed-by: Randy Dunlap
Cc: Bhaskar Chowdhury
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The last user (/dev/kmem) is gone. Let's drop it.
Link: https://lkml.kernel.org/r/20210324102351.6932-4-david@redhat.com
Signed-off-by: David Hildenbrand
Acked-by: Michal Hocko
Cc: Linus Torvalds
Cc: Greg Kroah-Hartman
Cc: Hillf Danton
Cc: Matthew Wilcox
Cc: Oleksiy Avramchenko
Cc: Steven Rostedt
Cc: Minchan Kim
Cc: huang ying
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Patch series "drivers/char: remove /dev/kmem for good".

Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and
memory ballooning, I started questioning the existence of /dev/kmem.

Comparing it with the /proc/kcore implementation, it does not seem to be
able to deal with things like

a) Pages unmapped from the direct mapping (e.g., to be used by secretmem)
   -> kern_addr_valid(). virt_addr_valid() is not sufficient.

b) Special cases like gart aperture memory that is not to be touched
   -> mem_pfn_is_ram()

Unless I am missing something, it's at least broken in some cases and might
fault/crash the machine.

Looks like its existence has been questioned before in 2005 and 2010 [1],
after ~11 additional years, it might make sense to revive the discussion.

CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by
mistake?). All distributions disable it: in Ubuntu it has been disabled
for more than 10 years, in Debian since 2.6.31, in Fedora at least
starting with FC3, in RHEL starting with RHEL4, in SUSE starting from
15sp2, and OpenSUSE has it disabled as well.

1) /dev/kmem was popular for rootkits [2] before it got disabled
   basically everywhere. Ubuntu documents [3] "There is no modern user of
   /dev/kmem any more beyond attackers using it to load kernel rootkits.".
   RHEL documents in a BZ [5] "it served no practical purpose other than to
   serve as a potential security problem or to enable binary module drivers
   to access structures/functions they shouldn't be touching"

2) /proc/kcore is a decent interface to have a controlled way to read
   kernel memory for debugging purposes. (will need some extensions to
   deal with memory offlining/unplug, memory ballooning, and poisoned
   pages, though)

3) It might be useful for corner case debugging [1]. KDB/KGDB might be a
   better fit, especially, to write random memory; harder to shoot
   yourself into the foot.

4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems
   to be incompatible with 64bit [1]. For educational purposes,
   /proc/kcore might be used to monitor value updates -- or older
   kernels can be used.

5) It's broken on arm64, and therefore, completely disabled there.

Looks like it's essentially unused and has been replaced by better
suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's
just remove it.

[1] https://lwn.net/Articles/147901/
[2] https://www.linuxjournal.com/article/10505
[3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled
[4] https://sourceforge.net/projects/kme/
[5] https://bugzilla.redhat.com/show_bug.cgi?id=154796

Link: https://lkml.kernel.org/r/20210324102351.6932-1-david@redhat.com
Link: https://lkml.kernel.org/r/20210324102351.6932-2-david@redhat.com
Signed-off-by: David Hildenbrand
Acked-by: Michal Hocko
Acked-by: Kees Cook
Cc: Linus Torvalds
Cc: Greg Kroah-Hartman
Cc: "Alexander A. Klimov"
Cc: Alexander Viro
Cc: Alexandre Belloni
Cc: Andrew Lunn
Cc: Andrey Zhizhikin
Cc: Arnd Bergmann
Cc: Benjamin Herrenschmidt
Cc: Brian Cain
Cc: Christian Borntraeger
Cc: Christophe Leroy
Cc: Chris Zankel
Cc: Corentin Labbe
Cc: "David S. Miller"
Cc: "Eric W. Biederman"
Cc: Geert Uytterhoeven
Cc: Gerald Schaefer
Cc: Greentime Hu
Cc: Gregory Clement
Cc: Heiko Carstens
Cc: Helge Deller
Cc: Hillf Danton
Cc: huang ying
Cc: Ingo Molnar
Cc: Ivan Kokshaysky
Cc: "James E.J. Bottomley"
Cc: James Troup
Cc: Jiaxun Yang
Cc: Jonas Bonn
Cc: Jonathan Corbet
Cc: Kairui Song
Cc: Krzysztof Kozlowski
Cc: Kuninori Morimoto
Cc: Liviu Dudau
Cc: Lorenzo Pieralisi
Cc: Luc Van Oostenryck
Cc: Luis Chamberlain
Cc: Matthew Wilcox
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Mikulas Patocka
Cc: Minchan Kim
Cc: Niklas Schnelle
Cc: Oleksiy Avramchenko
Cc: openrisc@lists.librecores.org
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: "Pavel Machek (CIP)"
Cc: Pavel Machek
Cc: "Peter Zijlstra (Intel)"
Cc: Pierre Morel
Cc: Randy Dunlap
Cc: Richard Henderson
Cc: Rich Felker
Cc: Robert Richter
Cc: Rob Herring
Cc: Russell King
Cc: Sam Ravnborg
Cc: Sebastian Andrzej Siewior
Cc: Sebastian Hesselbarth
Cc: sparclinux@vger.kernel.org
Cc: Stafford Horne
Cc: Stefan Kristiansson
Cc: Steven Rostedt
Cc: Sudeep Holla
Cc: Theodore Dubois
Cc: Thomas Bogendoerfer
Cc: Thomas Gleixner
Cc: Vasily Gorbik
Cc: Viresh Kumar
Cc: William Cohen
Cc: Xiaoming Ni
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 May, 2021
1 commit
-
Various coding style tweaks to various files under mm/
[daizhiyuan@phytium.com.cn: mm/swapfile: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614223624-16055-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/sparse: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614227288-19363-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/vmscan: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614227649-19853-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/compaction: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228218-20770-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/oom_kill: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228360-21168-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/shmem: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228504-21491-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/page_alloc: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228613-21754-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/filemap: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228936-22337-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/mlock: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613956588-2453-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/frontswap: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613962668-15045-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/vmalloc: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613963379-15988-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/memory_hotplug: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613971784-24878-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/mempolicy: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613972228-25501-1-git-send-email-daizhiyuan@phytium.com.cn
Link: https://lkml.kernel.org/r/1614222374-13805-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 May, 2021
12 commits
-
Link: https://lkml.kernel.org/r/20210402202237.20334-5-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Cc: Hillf Danton
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Oleksiy Avramchenko
Cc: Shuah Khan
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Instead of keeping open-coded style, move the code related to preloading
into a separate function. Therefore introduce the preload_this_cpu_lock()
routine that preloads the current CPU with one extra vmap_area object.

There is no functional change as a result of this patch.

Link: https://lkml.kernel.org/r/20210402202237.20334-4-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Cc: Hillf Danton
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Oleksiy Avramchenko
Cc: Shuah Khan
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
A potential use after free can occur in _vm_unmap_aliases where an already
freed vmap_area could be accessed. Consider the following scenario:

Process 1                               Process 2

__vm_unmap_aliases                      __vm_unmap_aliases
  purge_fragmented_blocks_allcpus         rcu_read_lock()
    rcu_read_lock()
    list_del_rcu(&vb->free_list)
                                          list_for_each_entry_rcu(vb .. )
  __purge_vmap_area_lazy
    kmem_cache_free(va)
                                            va_start = vb->va->va_start

Here Process 1 is in the purge path: it does list_del_rcu on the vmap_block
and later frees the vmap_area. Since Process 2 was holding the RCU read
lock at this time, the vmap_block is still present and Process 2 accesses
it, and thereby tries to access the vmap_area of that vmap_block, which
was already freed by Process 1. This results in a use after free.

Fix this by adding a check for vb->dirty before accessing the vmap_area
structure. Since vb->dirty is set to VMAP_BBMAP_BITS in the purge path,
checking for this prevents the use after free.

Link: https://lkml.kernel.org/r/1616062105-23263-1-git-send-email-vjitta@codeaurora.org
Signed-off-by: Vijayanand Jitta
Reviewed-by: Uladzislau Rezki (Sony)
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There are several reasons why a vmalloc can fail: virtual space exhausted,
page array allocation failure, page allocation failure, and kernel page
table allocation failure.

Add distinct warning messages for the main causes of failure, with some
added information like page order or allocation size where applicable.

[urezki@gmail.com: print correct vmalloc allocation size]
Link: https://lkml.kernel.org/r/20210329193214.GA28602@pc638.lan
Link: https://lkml.kernel.org/r/20210322021806.892164-6-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Christoph Hellwig
Cc: Cédric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This is a shim around vunmap_range, get rid of it.

Move the main API comment from the _noflush variant to the normal
variant, and make _noflush internal to mm/.

[npiggin@gmail.com: fix nommu builds and a comment bug per sfr]
Link: https://lkml.kernel.org/r/1617292598.m6g0knx24s.astroid@bobo.none
[akpm@linux-foundation.org: move vunmap_range_noflush() stub inside !CONFIG_MMU, not !CONFIG_NUMA]
[npiggin@gmail.com: fix nommu builds]
Link: https://lkml.kernel.org/r/1617292497.o1uhq5ipxp.astroid@bobo.none
Link: https://lkml.kernel.org/r/20210322021806.892164-5-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Reviewed-by: Christoph Hellwig
Cc: Cédric Le Goater
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Patch series "mm/vmalloc: cleanup after hugepage series", v2.
Christoph pointed out some overdue cleanups required after the huge
vmalloc series, and I had another failure error message improvement as
well.

This patch (of 5):
This is a shim around vmap_pages_range, get rid of it.
Move the main API comment from the _noflush variant to the normal variant,
and make _noflush internal to mm/.

Link: https://lkml.kernel.org/r/20210322021806.892164-1-npiggin@gmail.com
Link: https://lkml.kernel.org/r/20210322021806.892164-2-npiggin@gmail.com
Link: https://lkml.kernel.org/r/20210322021806.892164-2-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Reviewed-by: Christoph Hellwig
Cc: Uladzislau Rezki
Cc: Cédric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
supports PMD-sized vmap mappings.

vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
larger, and fall back to small pages if that was unsuccessful.

Architectures must ensure that any arch specific vmalloc allocations that
require PAGE_SIZE mappings (e.g., module allocations vs strict module rwx)
use the VM_NOHUGE flag to inhibit larger mappings.

This can result in more internal fragmentation and memory overhead for a
given allocation; an option nohugevmalloc is added to disable it at boot.

[colin.king@canonical.com: fix read of uninitialized pointer area]
Link: https://lkml.kernel.org/r/20210318155955.18220-1-colin.king@canonical.com

Link: https://lkml.kernel.org/r/20210317062402.533919-14-npiggin@gmail.com
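The "try PMD-sized pages, then fall back to small pages" policy described in this commit message can be sketched as a user-space model. Everything here is an assumption for illustration: the shift values (4 KiB base pages, 2 MiB PMD mappings), the alloc_fn callback type, and the helper names are hypothetical and are not the kernel's interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12u   /* assumed 4 KiB base pages */
#define MODEL_PMD_SHIFT  21u   /* assumed 2 MiB PMD mappings */

/* Hypothetical allocator callback: returns true if backing pages were obtained. */
typedef bool (*alloc_fn)(size_t size, unsigned int shift);

/* Example callbacks for experimenting with the policy. */
static bool alloc_always_ok(size_t size, unsigned int shift)
{
	(void)size; (void)shift;
	return true;
}

static bool alloc_small_only(size_t size, unsigned int shift)
{
	(void)size;
	return shift == MODEL_PAGE_SHIFT;
}

/* Choose the mapping size to try first. */
static unsigned int first_try_shift(size_t size, bool arch_huge_ok, bool no_huge)
{
	if (arch_huge_ok && !no_huge && size >= ((size_t)1 << MODEL_PMD_SHIFT))
		return MODEL_PMD_SHIFT;
	return MODEL_PAGE_SHIFT;
}

/* Return the shift that succeeded, or 0 if every attempt failed. */
static unsigned int alloc_with_fallback(size_t size, bool arch_huge_ok,
					bool no_huge, alloc_fn try_alloc)
{
	unsigned int shift = first_try_shift(size, arch_huge_ok, no_huge);

	assert(try_alloc != NULL);
	if (try_alloc(size, shift))
		return shift;
	if (shift != MODEL_PAGE_SHIFT && try_alloc(size, MODEL_PAGE_SHIFT))
		return MODEL_PAGE_SHIFT;   /* huge attempt failed, small pages worked */
	return 0;
}
```

In this model the no_huge argument plays the role the message assigns to the VM_NOHUGE flag and the nohugevmalloc boot option: when it is set, the PMD-sized attempt is skipped entirely.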
Signed-off-by: Nicholas Piggin
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christoph Hellwig
Cc: Ding Tianhong
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Miaohe Lin
Cc: Michael Ellerman
Cc: Russell King
Cc: Thomas Gleixner
Cc: Uladzislau Rezki (Sony)
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
As a side-effect, the order of the flush_cache_vmap() and
arch_sync_kernel_mappings() calls is switched, but that now matches the
other callers in this file.

Link: https://lkml.kernel.org/r/20210317062402.533919-13-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Reviewed-by: Christoph Hellwig
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Ding Tianhong
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Miaohe Lin
Cc: Michael Ellerman
Cc: Russell King
Cc: Thomas Gleixner
Cc: Uladzislau Rezki (Sony)
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This is a generic kernel virtual memory mapper, not specific to ioremap.
Code is unchanged other than making vmap_range non-static.
Link: https://lkml.kernel.org/r/20210317062402.533919-12-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Reviewed-by: Christoph Hellwig
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Ding Tianhong
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Miaohe Lin
Cc: Michael Ellerman
Cc: Russell King
Cc: Thomas Gleixner
Cc: Uladzislau Rezki (Sony)
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The vmalloc mapper operates on a struct page * array rather than a linear
physical address, re-name it to make this distinction clear.

Link: https://lkml.kernel.org/r/20210317062402.533919-5-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Reviewed-by: Miaohe Lin
Reviewed-by: Christoph Hellwig
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Ding Tianhong
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Michael Ellerman
Cc: Russell King
Cc: Thomas Gleixner
Cc: Uladzislau Rezki (Sony)
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
Whether or not a vmap is huge depends on the architecture details,
alignments, boot options, etc., which the caller cannot be expected to
know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.

This change teaches vmalloc_to_page about larger pages, and returns the
struct page that corresponds to the offset within the large page. This
makes the API agnostic to mapping implementation details.

[*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
fail gracefully on unexpected huge vmap mappings")

[npiggin@gmail.com: sparc32: add stub pud_page define for walking huge vmalloc page tables]
Link: https://lkml.kernel.org/r/20210324232825.1157363-1-npiggin@gmail.com

Link: https://lkml.kernel.org/r/20210317062402.533919-3-npiggin@gmail.com
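The idea of returning "the struct page that corresponds to the offset within the large page" comes down to index arithmetic, which the following user-space sketch illustrates. The shift values and both helper names are assumptions chosen for the example (4 KiB base pages, 2 MiB PMD mappings); the real function walks the page tables and is not reproduced here.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12u   /* assumed 4 KiB base pages */
#define MODEL_PMD_SHIFT  21u   /* assumed 2 MiB PMD mappings */

/* Index of the base page containing addr inside its PMD-sized mapping. */
static unsigned long subpage_index(unsigned long addr)
{
	unsigned long pmd_offset_mask = (1UL << MODEL_PMD_SHIFT) - 1;

	return (addr & pmd_offset_mask) >> MODEL_PAGE_SHIFT;
}

/*
 * Page frame number backing addr, given the first page frame number of the
 * huge mapping that covers it (hypothetical helper for the model).
 */
static unsigned long pfn_for_addr(unsigned long huge_first_pfn,
				  unsigned long addr)
{
	unsigned long idx = subpage_index(addr);

	assert(idx < (1UL << (MODEL_PMD_SHIFT - MODEL_PAGE_SHIFT)));
	return huge_first_pfn + idx;
}
```

A caller in this model gets the same answer whether the address is backed by one huge mapping or by many small ones, which is the sense in which the message calls the API agnostic to mapping details.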
Signed-off-by: Nicholas Piggin
Reviewed-by: Miaohe Lin
Reviewed-by: Christoph Hellwig
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Ding Tianhong
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Michael Ellerman
Cc: Russell King
Cc: Thomas Gleixner
Cc: Uladzislau Rezki (Sony)
Cc: Will Deacon
Cc: Stephen Rothwell
Cc: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
vread() has been linearly searching vmap_area_list to look up vmalloc
areas to read from. These same areas are also tracked by an rb_tree
(vmap_area_root) which offers logarithmic lookup.

This patch modifies vread() to use the rb_tree structure instead of the
list and the speedup for heavy /proc/kcore readers can be pretty
significant. Below are the wall clock measurements of a Python
application that leverages the drgn debugging library to read and
interpret data read from /proc/kcore.

Before the patch:
-----
$ time sudo sdb -e 'dbuf | head 3000 | wc'
(unsigned long)3000

real 0m22.446s
user 0m2.321s
sys 0m20.690s
-----

With the patch:
-----
$ time sudo sdb -e 'dbuf | head 3000 | wc'
(unsigned long)3000

real 0m2.104s
user 0m2.043s
sys 0m0.921s
-----

Link: https://lkml.kernel.org/r/20210209190253.108763-1-serapheim@delphix.com
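The difference between the two lookup strategies can be sketched in user space. Note the substitution: this example uses a binary search over a sorted array as a stand-in for the rb_tree, since both give logarithmic lookup. The struct area_model type and both function names are hypothetical, and the areas are assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a mapped area covering [start, end). */
struct area_model {
	unsigned long start;
	unsigned long end;
};

/* Linear scan, analogous to walking a list: O(n) per lookup. */
static const struct area_model *find_linear(const struct area_model *areas,
					    size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= areas[i].start && addr < areas[i].end)
			return &areas[i];
	return NULL;
}

/*
 * Binary search over areas sorted by start address: O(log n) per lookup.
 * Stands in for the tree walk; requires sorted, non-overlapping areas.
 */
static const struct area_model *find_sorted(const struct area_model *areas,
					    size_t n, unsigned long addr)
{
	size_t lo = 0, hi = n;

	assert(areas != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < areas[mid].start)
			hi = mid;          /* addr lies before this area */
		else if (addr >= areas[mid].end)
			lo = mid + 1;      /* addr lies after this area */
		else
			return &areas[mid];
	}
	return NULL;
}
```

Both functions return the same area for any address; only the number of comparisons differs, which matters for a reader that issues many lookups.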
Signed-off-by: Serapheim Dimitropoulos
Reviewed-by: Uladzislau Rezki (Sony)
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds