09 Jun, 2020
40 commits
-
read_code operates on user addresses.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Link: http://lkml.kernel.org/r/20200515143646.3857579-27-hch@lst.de
Signed-off-by: Linus Torvalds -
Only build read_code when binary formats that use it are built into the
kernel.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Link: http://lkml.kernel.org/r/20200515143646.3857579-26-hch@lst.de
Signed-off-by: Linus Torvalds -
Rename the current flush_icache_range to flush_icache_user_range, as per
commit ae92ef8a4424 ("[PATCH] flush icache in correct context") there
seems to be an assumption that it operates on user addresses. Add a
flush_icache_range around it that for now is a no-op.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Geert Uytterhoeven
Cc: Geert Uytterhoeven
Link: http://lkml.kernel.org/r/20200515143646.3857579-25-hch@lst.de
Signed-off-by: Linus Torvalds -
flush_icache_user_range will be the name for a generic primitive. Move
the arm name so that arm already has an implementation.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Russell King
Link: http://lkml.kernel.org/r/20200515143646.3857579-24-hch@lst.de
Signed-off-by: Linus Torvalds -
The Xtensa implementation of flush_icache_range seems to be able to cope
with user addresses. Just define flush_icache_user_range to
flush_icache_range.
[jcmvbkbc@gmail.com: fix flush_icache_user_range in noMMU configs]
Link: http://lkml.kernel.org/r/20200525221556.4270-1-jcmvbkbc@gmail.com
Signed-off-by: Christoph Hellwig
Signed-off-by: Max Filippov
Signed-off-by: Andrew Morton
Cc: Chris Zankel
Cc: Max Filippov
Link: http://lkml.kernel.org/r/20200515143646.3857579-23-hch@lst.de
Signed-off-by: Linus Torvalds -
The SuperH implementation of flush_icache_range seems to be able to cope
with user addresses. Just define flush_icache_user_range to
flush_icache_range.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Yoshinori Sato
Cc: Rich Felker
Link: http://lkml.kernel.org/r/20200515143646.3857579-22-hch@lst.de
Signed-off-by: Linus Torvalds -
Define flush_icache_user_range to flush_icache_range unless the
architecture provides its own implementation.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Arnd Bergmann
Link: http://lkml.kernel.org/r/20200515143646.3857579-21-hch@lst.de
Signed-off-by: Linus Torvalds -
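As a rough illustration of the fallback described in the commit above, the "define it unless the architecture provides its own" pattern can be sketched in plain C. This is a user-space sketch, not the kernel header: the stand-in flush_icache_range body and the range_flush_calls counter are assumptions added only so the effect is observable.

```c
#include <assert.h>

/* Counts calls to the stand-in flush below (illustration only). */
static int range_flush_calls;

/*
 * Stand-in for an architecture's flush_icache_range(). A real one would
 * issue cache maintenance instructions for [start, end).
 */
static void flush_icache_range(unsigned long start, unsigned long end)
{
	assert(start <= end);
	range_flush_calls++;
}

/*
 * Generic fallback: an architecture that needs special handling for user
 * addresses defines flush_icache_user_range before this point. Everyone
 * else simply gets the kernel-range version under the new name.
 */
#ifndef flush_icache_user_range
#define flush_icache_user_range flush_icache_range
#endif
```

With no override defined, a call such as flush_icache_user_range(start, end) expands to flush_icache_range(start, end).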
The function currently known as flush_icache_user_range only operates on
a single page. Rename it to flush_icache_user_page as we'll need the
name flush_icache_user_range for something else soon.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Geert Uytterhoeven
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Vincent Chen
Cc: Jonas Bonn
Cc: Stefan Kristiansson
Cc: Stafford Horne
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Albert Ou
Cc: Arnd Bergmann
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/20200515143646.3857579-20-hch@lst.de
Signed-off-by: Linus Torvalds -
flush_icache_user_range is only used by , so
remove it from the architectures that implement it, but don't use
.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Russell King
Cc: "David S. Miller"
Cc: Guan Xuetao
Link: http://lkml.kernel.org/r/20200515143646.3857579-19-hch@lst.de
Signed-off-by: Linus Torvalds -
RISC-V needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Also remove the pointless __KERNEL__ ifdef while we're at it.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Reviewed-by: Palmer Dabbelt
Acked-by: Palmer Dabbelt
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Albert Ou
Link: http://lkml.kernel.org/r/20200515143646.3857579-18-hch@lst.de
Signed-off-by: Linus Torvalds -
Power needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Also remove the pointless __KERNEL__ ifdef while we're at it.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Link: http://lkml.kernel.org/r/20200515143646.3857579-17-hch@lst.de
Signed-off-by: Linus Torvalds -
OpenRISC needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Jonas Bonn
Cc: Stefan Kristiansson
Cc: Stafford Horne
Link: http://lkml.kernel.org/r/20200515143646.3857579-16-hch@lst.de
Signed-off-by: Linus Torvalds -
m68knommu needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Greg Ungerer
Cc: Greg Ungerer
Cc: Geert Uytterhoeven
Link: http://lkml.kernel.org/r/20200515143646.3857579-15-hch@lst.de
Signed-off-by: Linus Torvalds -
Microblaze needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Michal Simek
Link: http://lkml.kernel.org/r/20200515143646.3857579-14-hch@lst.de
Signed-off-by: Linus Torvalds -
IA64 needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Tony Luck
Cc: Fenghua Yu
Link: http://lkml.kernel.org/r/20200515143646.3857579-13-hch@lst.de
Signed-off-by: Linus Torvalds -
Hexagon needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Brian Cain
Link: http://lkml.kernel.org/r/20200515143646.3857579-12-hch@lst.de
Signed-off-by: Linus Torvalds -
C6x needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Mark Salter
Cc: Aurelien Jacquiot
Link: http://lkml.kernel.org/r/20200515143646.3857579-11-hch@lst.de
Signed-off-by: Linus Torvalds -
ARM64 needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Catalin Marinas
Cc: Will Deacon
Link: http://lkml.kernel.org/r/20200515143646.3857579-10-hch@lst.de
Signed-off-by: Linus Torvalds -
Alpha needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Link: http://lkml.kernel.org/r/20200515143646.3857579-9-hch@lst.de
Signed-off-by: Linus Torvalds -
There is a magic ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE cpp symbol that
guards non-stub availability of flush_dcache_page. Use that to check
if flush_dcache_page is implemented.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Arnd Bergmann
Link: http://lkml.kernel.org/r/20200515143646.3857579-8-hch@lst.de
Signed-off-by: Linus Torvalds -
This seems to lead to some crazy include loops when using
asm-generic/cacheflush.h on more architectures, so leave it to the arch
header for now.
[hch@lst.de: fix warning]
Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Will Deacon
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Jeff Dike
Cc: Richard Weinberger
Cc: Anton Ivanov
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Dan Williams
Cc: Vishal Verma
Cc: Dave Jiang
Cc: Keith Busch
Cc: Ira Weiny
Cc: Arnd Bergmann
Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de
Signed-off-by: Linus Torvalds -
cacheflush.h uses a somewhat too generic include guard name that clashes
with various arch files. Use a more specific one.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Arnd Bergmann
Link: http://lkml.kernel.org/r/20200515143646.3857579-6-hch@lst.de
Signed-off-by: Linus Torvalds -
flush_cache_user_range is an ARMism not used by any generic or unicore32
specific code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Guan Xuetao
Link: http://lkml.kernel.org/r/20200515143646.3857579-5-hch@lst.de
Signed-off-by: Linus Torvalds -
flush_icache_user_range is only used by copy_to_user_page, which is only
used by core VM code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Link: http://lkml.kernel.org/r/20200515143646.3857579-4-hch@lst.de
Signed-off-by: Linus Torvalds -
flush_icache_page is only used by mm/memory.c.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Greentime Hu
Cc: Vincent Chen
Link: http://lkml.kernel.org/r/20200515143646.3857579-3-hch@lst.de
Signed-off-by: Linus Torvalds -
Patch series "sort out the flush_icache_range mess", v2.
flush_icache_range is mostly used for kernel addresses, except for the
following cases:
- the nommu brk and mmap implementations
- the read_code helper that is only used for binfmt_flat,
  binfmt_elf_fdpic, and binfmt_aout including the broken
  ia32 compat version
- binfmt_flat itself
none of which really are used by a typical MMU enabled kernel, as a.out
can only be built for alpha and m68k to start with.
But strangely enough commit ae92ef8a4424 ("[PATCH] flush icache in
correct context") added a "set_fs(KERNEL_DS)" around the
flush_icache_range call in the module loader, because apparently m68k
assumed user pointers.
This series first cleans up the cacheflush implementations, largely by
switching as much as possible to the asm-generic version after a few
preparations, then moves the misnamed current flush_icache_user_range to
a new name, to finally introduce a real flush_icache_user_range to be
used for the above use cases to flush the instruction cache for a
userspace address range. The last patch then drops the set_fs in the
module code and moves it into the m68k implementation.
This patch (of 29):
The arguments passed look bogus, try to fix them to something that seems
to make sense.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Arnd Bergmann
Cc: Roman Zippel
Cc: Jessica Yu
Cc: Michal Simek
Cc: Albert Ou
Cc: Alexander Shishkin
Cc: Alexander Viro
Cc: Alexei Starovoitov
Cc: Anton Ivanov
Cc: Arnaldo Carvalho de Melo
Cc: Aurelien Jacquiot
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Brian Cain
Cc: Catalin Marinas
Cc: Christoph Hellwig
Cc: Chris Zankel
Cc: Daniel Borkmann
Cc: Dan Williams
Cc: Dave Jiang
Cc: "David S. Miller"
Cc: Fenghua Yu
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Greg Ungerer
Cc: Guan Xuetao
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Ivan Kokshaysky
Cc: Jeff Dike
Cc: Jiri Olsa
Cc: Jonas Bonn
Cc: Keith Busch
Cc: Mark Rutland
Cc: Mark Salter
Cc: Martin KaFai Lau
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Namhyung Kim
Cc: Nick Piggin
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Peter Zijlstra
Cc: Richard Henderson
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Russell King
Cc: Song Liu
Cc: Stafford Horne
Cc: Stefan Kristiansson
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vincent Chen
Cc: Vishal Verma
Cc: Will Deacon
Cc: Yonghong Song
Cc: Yoshinori Sato
Link: http://lkml.kernel.org/r/20200515143646.3857579-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200515143646.3857579-2-hch@lst.de
Signed-off-by: Linus Torvalds -
This code was using get_user_pages*(), in approximately a "Case 5"
scenario (accessing the data within a page), using the categorization
from [1]. That means that it's time to convert the get_user_pages*() +
put_page() calls to pin_user_pages*() + unpin_user_pages() calls.
There is some helpful background in [2]: basically, this is a small part
of fixing a long-standing disconnect between pinning pages, and file
systems' use of those pages.
[1] Documentation/core-api/pin_user_pages.rst
[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/
Signed-off-by: John Hubbard
Signed-off-by: Andrew Morton
Reviewed-by: Jan Kara
Acked-by: Michael S. Tsirkin
Acked-by: Pankaj Gupta
Cc: Jason Wang
Cc: Dave Chinner
Cc: Jérôme Glisse
Cc: Jonathan Corbet
Cc: Souptick Joarder
Cc: Vlastimil Babka
Link: http://lkml.kernel.org/r/20200529234309.484480-3-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds -
Patch series "vhost, docs: convert to pin_user_pages(), new "case 5""
It recently became clear to me that there are some get_user_pages*()
callers that don't fit neatly into any of the four cases that are so far
listed in pin_user_pages.rst. vhost.c is one of those.
Add a Case 5 to the documentation, and refer to that when converting
vhost.c.
Thanks to Jan Kara for helping me (again) in understanding the
interaction between get_user_pages() and page writeback [1].
This is based on today's mmotm, which has a nearby patch to
pin_user_pages.rst that rewords cases 3 and 4.
Note that I have only compile-tested the vhost.c patch, although that
does also include cross-compiling for a few other arches. Any run-time
testing would be greatly appreciated.
[1] https://lore.kernel.org/r/20200529070343.GL14550@quack2.suse.cz
This patch (of 2):
There are four cases listed in pin_user_pages.rst. These are intended
to help developers figure out whether to use get_user_pages*(), or
pin_user_pages*(). However, the four cases do not cover all the
situations. For example, drivers/vhost/vhost.c has a "pin, write to
page, set page dirty, unpin" case.
Add a fifth case, to help explain that there is a general pattern that
requires pin_user_pages*() API calls.
[jhubbard@nvidia.com: v2]
Link: http://lkml.kernel.org/r/20200601052633.853874-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard
Signed-off-by: Andrew Morton
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Jérôme Glisse
Cc: Dave Chinner
Cc: Jonathan Corbet
Cc: Souptick Joarder
Cc: "Michael S . Tsirkin"
Cc: Jason Wang
Link: http://lkml.kernel.org/r/20200529234309.484480-1-jhubbard@nvidia.com
Link: http://lkml.kernel.org/r/20200529234309.484480-2-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds -
All of the pin_user_pages*() API calls will cause pages to be
dma-pinned. As such, they are all suitable for DMA, RDMA, and/or
Direct IO.
The documentation should say so, but it was instead saying that three of
the API calls were only suitable for Direct IO. This was discovered
when a reviewer wondered why an API call that specifically recommended
against Case 2 (DMA/RDMA) was being used in a DMA situation [1].
Fix this by simply deleting those claims. The gup.c comments already
refer to the more extensive Documentation/core-api/pin_user_pages.rst,
which does have the correct guidance. So let's just write it once,
there.
[1] https://lore.kernel.org/r/20200529074658.GM30374@kadam
Signed-off-by: John Hubbard
Signed-off-by: Andrew Morton
Reviewed-by: David Hildenbrand
Acked-by: Pankaj Gupta
Acked-by: Souptick Joarder
Cc: Dan Carpenter
Cc: Jan Kara
Cc: Vlastimil Babka
Link: http://lkml.kernel.org/r/20200529084515.46259-1-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds -
This code was using get_user_pages*(), and all of the callers so far
were in a "Case 2" scenario (DMA/RDMA), using the categorization from [1].
That means that it's time to convert the get_user_pages*() + put_page()
calls to pin_user_pages*() + unpin_user_pages() calls.
There is some helpful background in [2]: basically, this is a small part
of fixing a long-standing disconnect between pinning pages, and file
systems' use of those pages.
[1] Documentation/core-api/pin_user_pages.rst
[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/
Signed-off-by: John Hubbard
Signed-off-by: Andrew Morton
Acked-by: David Hildenbrand
Cc: Daniel Vetter
Cc: Jérôme Glisse
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Dave Chinner
Cc: Pankaj Gupta
Cc: Souptick Joarder
Link: http://lkml.kernel.org/r/20200527223243.884385-3-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds -
Patch series "mm/gup: introduce pin_user_pages_locked(), use it in frame_vector.c", v2.
This adds yet one more pin_user_pages*() variant, and uses that to
convert mm/frame_vector.c.
With this, along with maybe 20 or 30 other recent patches in various
trees, we are close to having the relevant gup call sites
converted--with the notable exception of the bio/block layer.
This patch (of 2):
Introduce pin_user_pages_locked(), which is nearly identical to
get_user_pages_locked() except that it sets FOLL_PIN and rejects
FOLL_GET.
As with other pairs of get_user_pages*() and pin_user_pages() API calls,
it's prudent to assert that FOLL_PIN is *not* set in the
get_user_pages*() call, so add that as part of this.
[jhubbard@nvidia.com: v2]
Link: http://lkml.kernel.org/r/20200531234131.770697-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard
Signed-off-by: Andrew Morton
Reviewed-by: David Hildenbrand
Acked-by: Pankaj Gupta
Cc: Daniel Vetter
Cc: Jérôme Glisse
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Dave Chinner
Cc: Souptick Joarder
Link: http://lkml.kernel.org/r/20200531234131.770697-1-jhubbard@nvidia.com
Link: http://lkml.kernel.org/r/20200527223243.884385-1-jhubbard@nvidia.com
Link: http://lkml.kernel.org/r/20200527223243.884385-2-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds -
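The flag rules stated in the commit above (the pin variant sets FOLL_PIN and rejects FOLL_GET, the get variant refuses FOLL_PIN) can be sketched as two small flag checks. This is a user-space sketch under assumptions: the sketch_* function names are hypothetical, the flag values are illustrative, and the real functions do far more than validate flags.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits; the exact values are assumptions of this sketch. */
#define FOLL_GET 0x04u
#define FOLL_PIN 0x40000u

/*
 * pin side: refuse a caller-supplied FOLL_GET, then force FOLL_PIN on.
 * Stores the effective flags in *out and returns 0, or returns -EINVAL.
 */
static int sketch_pin_flags(unsigned int gup_flags, unsigned int *out)
{
	assert(out);
	if (gup_flags & FOLL_GET)
		return -EINVAL;
	*out = gup_flags | FOLL_PIN;
	return 0;
}

/*
 * get side: FOLL_PIN may only be set internally by the pin_user_pages*()
 * family, so a caller passing it here is rejected.
 */
static int sketch_get_flags(unsigned int gup_flags, unsigned int *out)
{
	assert(out);
	if (gup_flags & FOLL_PIN)
		return -EINVAL;
	*out = gup_flags | FOLL_GET;
	return 0;
}
```

Keeping the two checks symmetric means a page reference is always taken in exactly one of the two styles, which is what lets the matching release call (put_page() or unpin_user_pages()) be chosen unambiguously.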
Update case 3 so that it covers the use of mmu notifiers, for hardware
that does, or does not have replayable page faults.
Also, elaborate case 4 slightly, as it was quite cryptic.
Signed-off-by: John Hubbard
Signed-off-by: Andrew Morton
Cc: Daniel Vetter
Cc: Jérôme Glisse
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Dave Chinner
Cc: Jonathan Corbet
Link: http://lkml.kernel.org/r/20200527194953.11130-1-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds -
Rename the API __get_user_pages_fast() to get_user_pages_fast_only() to
align with pin_user_pages_fast_only().
As part of this, get rid of the write parameter. Instead, the caller
passes FOLL_WRITE to get_user_pages_fast_only(). This does not change
any existing functionality of the API.
All the callers are changed to pass FOLL_WRITE.
Also introduce get_user_page_fast_only(), and use it in a few places
that hard-code nr_pages to 1.
Update the documentation of the API.
Signed-off-by: Souptick Joarder
Signed-off-by: Andrew Morton
Reviewed-by: John Hubbard
Reviewed-by: Paul Mackerras [arch/powerpc/kvm]
Cc: Matthew Wilcox
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paolo Bonzini
Cc: Stephen Rothwell
Cc: Mike Rapoport
Cc: Aneesh Kumar K.V
Cc: Michal Suchanek
Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Linus Torvalds -
Users with SYS_ADMIN capability can add arbitrary taint flags to the
running kernel by writing to /proc/sys/kernel/tainted or issuing the
command 'sysctl -w kernel.tainted=...'. This interface, however, is
open for any integer value and this might cause an invalid set of flags
being committed to the tainted_mask bitset.
This patch introduces a simple way for proc_taint() to ignore any
eventual invalid bit coming from the user input before committing those
bits to the kernel tainted_mask.
Signed-off-by: Rafael Aquini
Signed-off-by: Andrew Morton
Reviewed-by: Luis Chamberlain
Cc: Kees Cook
Cc: Iurii Zaikin
Cc: "Theodore Ts'o"
Link: http://lkml.kernel.org/r/20200512223946.888020-1-aquini@redhat.com
Signed-off-by: Linus Torvalds -
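One plausible way to "ignore any invalid bit" as the commit above describes is to mask the user value against the set of defined flags before committing it. The sketch below is user-space C and makes several assumptions: the SKETCH_* names and the flag count of 18 are hypothetical, and the actual proc_taint() may filter the bits differently.

```c
#include <assert.h>
#include <limits.h>

/* Assumed number of defined taint bits; the real count varies by kernel version. */
#define SKETCH_TAINT_FLAGS_COUNT 18
#define SKETCH_TAINT_FLAGS_MAX ((1UL << SKETCH_TAINT_FLAGS_COUNT) - 1)

/* Stand-in for the kernel's tainted_mask bitset. */
static unsigned long sketch_tainted_mask;

/*
 * Commit only the bits that correspond to defined taint flags and
 * silently drop the rest. Returns the bits that were actually added.
 */
static unsigned long sketch_proc_taint(unsigned long user_value)
{
	unsigned long valid;

	/* The mask construction above requires the count to fit in a long. */
	assert(SKETCH_TAINT_FLAGS_COUNT < sizeof(unsigned long) * CHAR_BIT);

	valid = user_value & SKETCH_TAINT_FLAGS_MAX;
	sketch_tainted_mask |= valid;
	return valid;
}
```

Under these assumptions, writing a value with only bit 20 set would leave the mask unchanged, while a value mixing valid and invalid bits would commit just the valid ones.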
Usually when the kernel reaches an oops condition, it's a point of no
return; in case not enough debug information is available in the kernel
splat, one of the last resorts would be to collect a kernel crash dump
and analyze it. The problem with this approach is that in order to
collect the dump, a panic is required (to kexec-load the crash kernel).
When in an environment of multiple virtual machines, users may prefer to
try living with the oops, at least until being able to properly shut down
their VMs / finish their important tasks.
This patch implements a way to collect a bit more debug details when an
oops event is reached, by printing all the CPUs backtraces through the
usage of NMIs (on architectures that support that). The sysctl added
(and documented) here was called "oops_all_cpu_backtrace", and when set
will (as the name suggests) dump all CPUs backtraces.
Far from ideal, this may be the last option though for users that for
some reason cannot panic on oops. Most of the time oopses are clear enough
to indicate the kernel portion that must be investigated, but in virtual
environments it's possible to observe hypervisor/KVM issues that could
lead to oopses shown in other guests CPUs (like virtual APIC crashes).
This patch hence aims to help debug such complex issues without
resorting to kdump.
Signed-off-by: Guilherme G. Piccoli
Signed-off-by: Andrew Morton
Reviewed-by: Kees Cook
Cc: Luis Chamberlain
Cc: Iurii Zaikin
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc: Randy Dunlap
Cc: Matthew Wilcox
Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds -
Commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks before
panic") introduced a change in that we started to show all CPUs
backtraces when a hung task is detected _and_ the sysctl/kernel
parameter "hung_task_panic" is set. The idea is good, because usually
when observing deadlocks (that may lead to hung tasks), the culprit is
another task holding a lock and not necessarily the task detected as
hung.
The problem with this approach is that dumping backtraces is a slightly
expensive task, especially printing that on console (and especially in
many CPU machines, as servers commonly found nowadays). So, users that
plan to collect a kdump to investigate the hung tasks and narrow down
the deadlock definitely don't need the CPUs backtrace on dmesg/console,
which will delay the panic and pollute the log (crash tool would easily
grab all CPUs traces with 'bt -a' command).
Also, there's the reciprocal scenario: some users may be interested in
seeing the CPUs backtraces but not have the system panic when a hung
task is detected. The current approach hence is almost like embedding a
policy in the kernel, by forcing the CPUs backtraces' dump (only) on
hung_task_panic.
This patch decouples the panic event on hung task from the CPUs
backtraces dump, by creating (and documenting) a new sysctl called
"hung_task_all_cpu_backtrace", analogous to the approach taken on soft/hard
lockups, that have both a panic and an "all_cpu_backtrace" sysctl to
allow individual control. The new mechanism for dumping the CPUs
backtraces on hung task detection respects "hung_task_warnings" by not
dumping the traces in case there are no warnings left.
Signed-off-by: Guilherme G. Piccoli
Signed-off-by: Andrew Morton
Reviewed-by: Kees Cook
Cc: Tetsuo Handa
Link: http://lkml.kernel.org/r/20200327223646.20779-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds -
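The decoupling described in the commit above amounts to two independent decisions taken when a hung task is detected. The following user-space sketch models only that decision logic; the struct and function names are hypothetical, and the warning budget is modeled as "non-zero means warnings remain", which is an assumption about how hung_task_warnings is consulted.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the three sysctls named in the commit message. */
struct sketch_hung_task_policy {
	bool panic_on_hung;      /* "hung_task_panic" */
	bool all_cpu_backtrace;  /* "hung_task_all_cpu_backtrace" */
	int warnings_left;       /* "hung_task_warnings"; 0 means none left */
};

/* What the detector should do for one hung task. */
struct sketch_hung_task_actions {
	bool dump_all_cpus;
	bool panic;
};

static struct sketch_hung_task_actions
sketch_on_hung_task(const struct sketch_hung_task_policy *p)
{
	struct sketch_hung_task_actions a = { false, false };

	assert(p);
	/* Panic depends only on its own knob. */
	a.panic = p->panic_on_hung;
	/* The backtrace dump has its own knob and respects the warning budget. */
	a.dump_all_cpus = p->all_cpu_backtrace && p->warnings_left != 0;
	return a;
}
```

The point of the sketch is that neither output depends on the other input: a user collecting a kdump can panic without the console dump, and a user who cannot panic can still get the backtraces.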
After a recent change introduced by Vlastimil's series [0], the kernel is
now able to handle sysctl parameters on kernel command line; also, the
series introduced a simple infrastructure to convert legacy boot
parameters (that duplicate sysctls) into sysctl aliases.
This patch converts the watchdog parameters softlockup_panic and
{hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure.
It fixes the documentation too, since the alias only accepts values 0 or
1, not the full range of integers.
We also took the opportunity here to improve the documentation of the
previously converted hung_task_panic (see the patch series [0]) and put
the alias table in alphabetical order.
[0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz
Signed-off-by: Guilherme G. Piccoli
Signed-off-by: Andrew Morton
Acked-by: Vlastimil Babka
Cc: Kees Cook
Cc: Iurii Zaikin
Cc: Luis Chamberlain
Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds -
Testing is done by a new parameter debug.test_sysctl.boot_int which
defaults to 0 and it's expected that the tester passes a boot parameter
that sets it to 1. The test checks if it's set to 1.
To distinguish true failure from parameter not being set, the test
checks /proc/cmdline for the expected parameter, and whether test_sysctl
is built-in and not a module.
[vbabka@suse.cz: skip the new test if boot_int sysctl is not present]
Link: http://lkml.kernel.org/r/305af605-1e60-cf84-fada-6ce1ca37c102@suse.cz
Signed-off-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Cc: Alexey Dobriyan
Cc: Christian Brauner
Cc: David Rientjes
Cc: "Eric W . Biederman"
Cc: Greg Kroah-Hartman
Cc: "Guilherme G . Piccoli"
Cc: Iurii Zaikin
Cc: Ivan Teterevkov
Cc: Kees Cook
Cc: Luis Chamberlain
Cc: Masami Hiramatsu
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20200427180433.7029-6-vbabka@suse.cz
Signed-off-by: Linus Torvalds -
The testing script recommends CONFIG_TEST_SYSCTL=y, but actually only
works with CONFIG_TEST_SYSCTL=m. Testing of sysctl setting via boot
param however requires the test to be built-in, so make sure the test
script supports it.
Signed-off-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Acked-by: Luis Chamberlain
Cc: Alexey Dobriyan
Cc: Christian Brauner
Cc: David Rientjes
Cc: "Eric W . Biederman"
Cc: Greg Kroah-Hartman
Cc: "Guilherme G . Piccoli"
Cc: Iurii Zaikin
Cc: Ivan Teterevkov
Cc: Kees Cook
Cc: Masami Hiramatsu
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20200427180433.7029-5-vbabka@suse.cz
Signed-off-by: Linus Torvalds -
We can now handle sysctl parameters on kernel command line and have
infrastructure to convert legacy command line options that duplicate
sysctl to become a sysctl alias.
This patch converts the hung_task_panic parameter. Note that the sysctl
handler is more strict and allows only 0 and 1, while the legacy
parameter allowed any non-zero value. But there is little reason anyone
would not be using 1.
Signed-off-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Reviewed-by: Kees Cook
Acked-by: Michal Hocko
Cc: Alexey Dobriyan
Cc: Christian Brauner
Cc: David Rientjes
Cc: "Eric W . Biederman"
Cc: Greg Kroah-Hartman
Cc: "Guilherme G . Piccoli"
Cc: Iurii Zaikin
Cc: Ivan Teterevkov
Cc: Luis Chamberlain
Cc: Masami Hiramatsu
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20200427180433.7029-4-vbabka@suse.cz
Signed-off-by: Linus Torvalds
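To make the alias idea in the commit above concrete, here is a minimal user-space sketch of a lookup table that maps a legacy boot parameter to the sysctl it duplicates, plus a strict 0/1 parser. The struct, table, and function names are hypothetical and the table holds only the one entry the commit names; the kernel's own table and handlers may be organized differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Maps a legacy boot parameter name to the sysctl it duplicates. */
struct sketch_sysctl_alias {
	const char *kernel_param;
	const char *sysctl_param;
};

/* Hypothetical table for this sketch, with only the converted parameter. */
static const struct sketch_sysctl_alias sketch_aliases[] = {
	{ "hung_task_panic", "kernel.hung_task_panic" },
};

/* Return the sysctl name for a legacy parameter, or NULL if it is not an alias. */
static const char *sketch_find_alias(const char *param)
{
	size_t i;

	assert(param);
	for (i = 0; i < sizeof(sketch_aliases) / sizeof(sketch_aliases[0]); i++) {
		if (strcmp(param, sketch_aliases[i].kernel_param) == 0)
			return sketch_aliases[i].sysctl_param;
	}
	return NULL;
}

/* Strict handler: accepts exactly "0" or "1", unlike the legacy parser. */
static int sketch_parse_0_or_1(const char *val, int *out)
{
	assert(val && out);
	if (strcmp(val, "0") != 0 && strcmp(val, "1") != 0)
		return -EINVAL;
	*out = val[0] - '0';
	return 0;
}
```

In this sketch a boot argument like hung_task_panic=2, which the legacy parameter would have treated as "enabled", is rejected by the strict parser, which mirrors the behavior change the commit message points out.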