06 Jan, 2021
2 commits
-
[ Upstream commit fc6b6a872dcd48c6f39c7975836d75113db67d37 ]
Internally, UBD treats each physical IO segment as a separate command to
be submitted in the execution pipe. If the pipe returns a transient
error after a few segments have already been written, UBD will tell the
block layer to requeue the request, but there is no way to reclaim the
segments already submitted. When a new attempt to dispatch the request
is done, those segments already submitted will get duplicated, causing
the WARN_ON below in the best case, and potentially data corruption.In my system, running a UML instance with 2GB of RAM and a 50M UBD disk,
I can reproduce the WARN_ON by simply running mkfs.fvat against the
disk on a freshly booted system.There are a few ways to around this, like reducing the pressure on
the pipe by reducing the queue depth, which almost eliminates the
occurrence of the problem, increasing the pipe buffer size on the host
system, or by limiting the request to one physical segment, which causes
the block layer to submit way more requests to resolve a single
operation.Instead, this patch modifies the format of a UBD command, such that all
segments are sent through a single element in the communication pipe,
turning the command submission atomic from the point of view of the
block layer. The new format has a variable size, depending on the
number of elements, and looks like this:+------------+-----------+-----------+------------
| cmd_header | segment 0 | segment 1 | segment ...
+------------+-----------+-----------+------------With this format, we push a pointer to cmd_header in the submission
pipe.This has the advantage of reducing the memory footprint of executing a
single request, since it allow us to merge some fields in the header.
It is possible to reduce even further each segment memory footprint, by
merging bitmap_words and cow_offset, for instance, but this is not the
focus of this patch and is left as future work. One issue with the
patch is that for a big number of segments, we now perform one big
memory allocation instead of multiple small ones, but I wasn't able to
trigger any real issues or -ENOMEM because of this change, that wouldn't
be reproduced otherwise.This was tested using fio with the verify-crc32 option, and by running
an ext4 filesystem over this UBD device.The original WARN_ON was:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346
Stack:
6084eed0 6063dc77 00000009 6084ef60
00000000 604b8d9f 6084eee0 6063dcbc
6084ef40 6006ab8d e013d780 1c00000000
Call Trace:
[] ? printk+0x0/0x94
[] show_stack+0x13b/0x155
[] ? dump_stack_print_info+0xdf/0xe8
[] ? refcount_warn_saturate+0x13f/0x141
[] dump_stack+0x2a/0x2c
[] __warn+0x107/0x134
[] ? wake_up_process+0x17/0x19
[] ? blk_queue_max_discard_sectors+0x0/0xd
[] warn_slowpath_fmt+0xd1/0xdf
[] ? warn_slowpath_fmt+0x0/0xdf
[] ? raw_read_seqcount_begin.constprop.0+0x0/0x15
[] ? os_nsecs+0x1d/0x2b
[] refcount_warn_saturate+0x13f/0x141
[] refcount_sub_and_test.constprop.0+0x2f/0x37
[] blk_mq_free_request+0xf1/0x10d
[] __blk_mq_end_request+0x10c/0x114
[] ubd_intr+0xb5/0x169
[] __handle_irq_event_percpu+0x6b/0x17e
[] handle_irq_event_percpu+0x26/0x69
[] handle_irq_event+0x26/0x34
[] ? handle_irq_event+0x0/0x34
[] ? unmask_irq+0x0/0x37
[] handle_edge_irq+0xbc/0xd6
[] generic_handle_irq+0x21/0x29
[] do_IRQ+0x39/0x54
[...]
---[ end trace c6e7444e55386c0f ]---Cc: Christopher Obbard
Reported-by: Martyn Welch
Signed-off-by: Gabriel Krisman Bertazi
Tested-by: Christopher Obbard
Acked-by: Anton Ivanov
Signed-off-by: Richard Weinberger
Signed-off-by: Sasha Levin -
[ Upstream commit 72d3e093afae79611fa38f8f2cfab9a888fe66f2 ]
The UML random driver creates a dummy device under the guest,
/dev/hw_random. When this file is read from the guest, the driver
reads from the host machine's /dev/random, in-turn reading from
the host kernel's entropy pool. This entropy pool could have been
filled by a hardware random number generator or just the host
kernel's internal software entropy generator.Currently the driver does not fill the guests kernel entropy pool,
this requires a userspace tool running inside the guest (like
rng-tools) to read from the dummy device provided by this driver,
which then would fill the guest's internal entropy pool.This all seems quite pointless when we are already reading from an
entropy pool, so this patch aims to register the device as a hwrng
device using the hwrng-core framework. This not only improves and
cleans up the driver, but also fills the guest's entropy pool
without having to resort to using extra userspace tools in the guest.This is typically a nuisance when booting a guest: the random pool
takes a long time (~200s) to build up enough entropy since the dummy
hwrng is not used to fill the guest's pool.This port was originally attempted by Alexander Neville "dark" (in CC,
discussion in Link), but the conversation there stalled since the
handling of -EAGAIN errors were no removed and longer handled by the
driver. This patch attempts to use the existing method of error
handling but utilises the new hwrng core.The issue can be noticed when booting a UML guest:
[ 2.560000] random: fast init done
[ 214.000000] random: crng init doneWith the patch applied, filling the pool becomes a lot quicker:
[ 2.560000] random: fast init done
[ 12.000000] random: crng init doneCc: Alexander Neville
Link: https://lore.kernel.org/lkml/20190828204609.02a7ff70@TheDarkness/
Link: https://lore.kernel.org/lkml/20190829135001.6a5ff940@TheDarkness.local/
Cc: Sjoerd Simons
Signed-off-by: Christopher Obbard
Acked-by: Anton Ivanov
Signed-off-by: Richard Weinberger
Signed-off-by: Sasha Levin
30 Dec, 2020
5 commits
-
commit ff9632d2a66512436d616ef4c380a0e73f748db1 upstream.
Since the time-travel rework, basic time-travel mode hasn't worked
properly, but there's no longer a need for this WARN_ON() so just
remove it and thereby fix things.Cc: stable@vger.kernel.org
Fixes: 4b786e24ca80 ("um: time-travel: Rewrite as an event scheduler")
Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger
Signed-off-by: Greg Kroah-Hartman -
commit 97be7ceaf7fea68104824b6aa874cff235333ac1 upstream.
asprintf is not compatible with the existing uml memory allocation
mechanism. Its use on the "user" side of UML results in a corrupt slab
state.Fixes: 0d4e5ac7e780 ("um: remove uses of variable length arrays")
Cc: stable@vger.kernel.org
Signed-off-by: Anton Ivanov
Signed-off-by: Richard Weinberger
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 9431f7c199ab0d02da1482d62255e0b4621cb1b5 ]
xterm serial channel was leaking a fd used in setting up the
port helperThis bug is prehistoric - it predates switching to git. The "fixes"
header here is really just to mark all the versions we would like this to
apply to which is "Anything from the Cretaceous period onwards".No dinosaurs were harmed in fixing this bug.
Fixes: b40997b872cd ("um: drivers/xterm.c: fix a file descriptor leak")
Signed-off-by: Anton Ivanov
Signed-off-by: Richard Weinberger
Signed-off-by: Sasha Levin -
[ Upstream commit 9b1c0c0e25dcccafd30e7d4c150c249cc65550eb ]
Fix a logical error in tty reading. We get 0 and errno == EAGAIN
on the first attempt to read from a closed file descriptor.Compared to that a true EAGAIN is EAGAIN and -1.
If we check errno for EAGAIN first, before checking the return
value we miss the fact that the descriptor is closed.This bug is as old as the driver. It was not showing up with
the original POLL based IRQ controller, because it was
producing multiple events. Switching to EPOLL unmasked it.Fixes: ff6a17989c08 ("Epoll based IRQ controller")
Signed-off-by: Anton Ivanov
Signed-off-by: Richard Weinberger
Signed-off-by: Sasha Levin -
[ Upstream commit e3a01cbee9c5f2c6fc813dd6af007716e60257e7 ]
Ensure that file closes, connection closes, etc are propagated
as interrupts in the interrupt controller.Fixes: ff6a17989c08 ("Epoll based IRQ controller")
Signed-off-by: Anton Ivanov
Signed-off-by: Richard Weinberger
Signed-off-by: Sasha Levin
30 Nov, 2020
1 commit
-
Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.Similar to the entry path the low level idle functions have to be
non-instrumentable"* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
24 Nov, 2020
1 commit
-
We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)Reported-by: Sven Schnelle
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Mark Rutland
Tested-by: Mark Rutland
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
11 Nov, 2020
1 commit
-
Commit b2b29d6d0119 ("mm: account PMD tables like PTE tables") uncovered
a bug in uml, we forgot to call the destructor.
While we are here, give x a sane name.Reported-by: Anton Ivanov
Co-developed-by: Matthew Wilcox (Oracle)
Signed-off-by: Richard Weinberger
Tested-by: Christopher Obbard
27 Oct, 2020
1 commit
-
A couple of um files ended up not including the header file that defines
the __section() macro, and the simplest fix is to just revert the change
for those files.Fixes: 33def8498fdd treewide: Convert macro and uses of __section(foo) to __section("foo")
Reported-and-tested-by: Guenter Roeck
Cc: Joe Perches
Signed-off-by: Linus Torvalds
26 Oct, 2020
1 commit
-
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches
Reviewed-by: Nick Desaulniers
Reviewed-by: Miguel Ojeda
Signed-off-by: Linus Torvalds
24 Oct, 2020
1 commit
-
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
23 Oct, 2020
2 commits
-
Pull Kbuild updates from Masahiro Yamada:
- Support 'make compile_commands.json' to generate the compilation
database more easily, avoiding stale entries- Support 'make clang-analyzer' and 'make clang-tidy' for static checks
using clang-tidy- Preprocess scripts/modules.lds.S to allow CONFIG options in the
module linker script- Drop cc-option tests from compiler flags supported by our minimal
GCC/Clang versions- Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y
- Use sha1 build id for both BFD linker and LLD
- Improve deb-pkg for reproducible builds and rootless builds
- Remove stale, useless scripts/namespace.pl
- Turn -Wreturn-type warning into error
- Fix build error of deb-pkg when CONFIG_MODULES=n
- Replace 'hostname' command with more portable 'uname -n'
- Various Makefile cleanups
* tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
kbuild: Use uname for LINUX_COMPILE_HOST detection
kbuild: Only add -fno-var-tracking-assignments for old GCC versions
kbuild: remove leftover comment for filechk utility
treewide: remove DISABLE_LTO
kbuild: deb-pkg: clean up package name variables
kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
kbuild: enforce -Werror=return-type
scripts: remove namespace.pl
builddeb: Add support for all required debian/rules targets
builddeb: Enable rootless builds
builddeb: Pass -n to gzip for reproducible packages
kbuild: split the build log of kallsyms
kbuild: explicitly specify the build id style
scripts/setlocalversion: make git describe output more reliable
kbuild: remove cc-option test of -Werror=date-time
kbuild: remove cc-option test of -fno-stack-check
kbuild: remove cc-option test of -fno-strict-overflow
kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
kbuild: do not create built-in objects for external module builds
... -
Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode
19 Oct, 2020
1 commit
-
Pull UML updates from Richard Weinberger:
- Improve support for non-glibc systems
- Vector: Add support for scripting and dynamic tap devices
- Various fixes for the vector networking driver
- Various fixes for time travel mode
* tag 'for-linus-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: vector: Add dynamic tap interfaces and scripting
um: Clean up stacktrace dump
um: Fix incorrect assumptions about max pid length
um: Remove dead usage of TIF_IA32
um: Remove redundant NULL check
um: change sigio_spinlock to a mutex
um: time-travel: Return the sequence number in ACK messages
um: time-travel: Fix IRQ handling in time_travel_handle_message()
um: Allow static linking for non-glibc implementations
um: Some fixes to build UML with musl
um: vector: Use GFP_ATOMIC under spin lock
um: Fix null pointer dereference in vector_user_bpf
18 Oct, 2020
1 commit
-
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.Reviewed-by: Oleg Nesterov
Reviewed-by: Thomas Gleixner
Signed-off-by: Jens Axboe
14 Oct, 2020
1 commit
-
Pull seccomp updates from Kees Cook:
"The bulk of the changes are with the seccomp selftests to accommodate
some powerpc-specific behavioral characteristics. Additional cleanups,
fixes, and improvements are also included:- heavily refactor seccomp selftests (and clone3 selftests
dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
Cascardo)- fix style issue in selftests (Zou Wei)
- upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
Felker)- replace task_pt_regs(current) with current_pt_regs() (Denis
Efremov)- fix corner-case race in USER_NOTIF (Jann Horn)
- make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"
* tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
seccomp: Make duplicate listener detection non-racy
seccomp: Move config option SECCOMP to arch/Kconfig
selftests/clone3: Avoid OS-defined clone_args
selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
selftests/seccomp: Allow syscall nr and ret value to be set separately
selftests/seccomp: Record syscall during ptrace entry
selftests/seccomp: powerpc: Fix seccomp return value testing
selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
selftests/seccomp: Avoid redundant register flushes
selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
selftests/seccomp: Remove syscall setting #ifdefs
selftests/seccomp: mips: Remove O32-specific macro
selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
selftests/seccomp: Provide generic syscall setting macro
selftests/seccomp: Refactor arch register macros to avoid xtensa special case
selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
selftests/seccomp: Use bitwise instead of arithmetic operator for flags
...
13 Oct, 2020
1 commit
-
Pull orphan section checking from Ingo Molnar:
"Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle
them (include these bits into the output image vs discarding them
silently) are both highly idiosyncratic and also version dependent.Instead of this historically problematic mess, this tree by Kees Cook
(et al) adds build time asserts and build time warnings if there's any
orphan section in the kernel or if a section is not sized as expected.And because we relied on so many silent assumptions in this area, fix
a metric ton of dependencies and some outright bugs related to this,
before we can finally enable the checks on the x86, ARM and ARM64
platforms"* tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/boot/compressed: Warn on orphan section placement
x86/build: Warn on orphan section placement
arm/boot: Warn on orphan section placement
arm/build: Warn on orphan section placement
arm64/build: Warn on orphan section placement
x86/boot/compressed: Add missing debugging sections to output
x86/boot/compressed: Remove, discard, or assert for unwanted sections
x86/boot/compressed: Reorganize zero-size section asserts
x86/build: Add asserts for unwanted sections
x86/build: Enforce an empty .got.plt section
x86/asm: Avoid generating unused kprobe sections
arm/boot: Handle all sections explicitly
arm/build: Assert for unwanted sections
arm/build: Add missing sections
arm/build: Explicitly keep .ARM.attributes sections
arm/build: Refactor linker script headers
arm64/build: Assert for unwanted sections
arm64/build: Add missing DWARF sections
arm64/build: Use common DISCARDS in linker script
arm64/build: Remove .eh_frame* sections due to unwind tables
...
12 Oct, 2020
11 commits
-
Provide functionality roughly compatible with the existing qemu
ifup scripting:
* invocation of an ifup script. The interface name is passed as the
first and only argument
* allocating tap interfaces on the fly if they are not explicitly
specifiedSigned-off-by: Anton Ivanov
Signed-off-by: Richard Weinberger -
We currently get a few stray newlines, due to the interaction
between printk() and the code here. Remove a few explicit
newline prints to neaten the output.Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger -
pids are no longer limited to 16-bits, bump to 32-bits,
ie. 9 decimal characters. Additionally sizeof("/") already
returns 2 - ie. it already accounts for trailing zero.Cc: Jeff Dike
Cc: Richard Weinberger
Cc: Anton Ivanov
Cc: Linux UM Mailing List
Signed-off-by: Maciej Żenczykowski
Signed-off-by: Richard Weinberger -
Fix below warnings reported by coccicheck:
./arch/um/drivers/vector_user.c:403:2-7: WARNING: NULL check before some freeing functions is not needed.Fixes: bc8f8e4e6e7a ("um: Add a generic "fd" vector transport")
Signed-off-by: Li Heng
Signed-off-by: Richard Weinberger -
Lockdep complains at boot:
=============================
[ BUG: Invalid wait context ]
5.7.0-05093-g46d91ecd597b #98 Not tainted
-----------------------------
swapper/1 is trying to lock:
0000000060931b98 (&desc[i].request_mutex){+.+.}-{3:3}, at: __setup_irq+0x11d/0x623
other info that might help us debug this:
context-{4:4}
1 lock held by swapper/1:
#0: 000000006074fed8 (sigio_spinlock){+.+.}-{2:2}, at: sigio_lock+0x1a/0x1c
stack backtrace:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.7.0-05093-g46d91ecd597b #98
Stack:
7fa4fab0 6028dfd1 0000002a 6008bea5
7fa50700 7fa50040 7fa4fac0 6028e016
7fa4fb50 6007f6da 60959c18 00000000
Call Trace:
[] show_stack+0x13b/0x155
[] dump_stack+0x2a/0x2c
[] __lock_acquire+0x515/0x15f2
[] lock_acquire+0x245/0x273
[] __mutex_lock+0xbd/0x325
[] mutex_lock_nested+0x1d/0x1f
[] __setup_irq+0x11d/0x623
[] request_threaded_irq+0x169/0x1a6
[] um_request_irq+0x1ee/0x24b
[] write_sigio_irq+0x3b/0x76
[] sigio_broken+0x146/0x2e4
[] do_one_initcall+0xde/0x281Because we hold sigio_spinlock and then get into requesting
an interrupt with a mutex.Change the spinlock to a mutex to avoid that.
Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger -
For external time travel, the protocol says to return the
incoming sequence number in the ACK message to aid debugging,
so do that.Signed-off-by: Johannes Berg
Acked-By: Anton Ivanov
Signed-off-by: Richard Weinberger -
As the comment here indicates, we need to do the polling in the
idle loop without blocking interrupts, since interrupts can be
vhost-user messages that we must process even while in our idle
loop.I don't know why I explained one thing and implemented another,
but we have indeed observed random hangs due to this, depending
on the timing of the messages.Fixes: 88ce64249233 ("um: Implement time-travel=ext")
Signed-off-by: Johannes Berg
Acked-By: Anton Ivanov
Signed-off-by: Richard Weinberger -
It is possible to produce a statically linked UML binary with UML_NET_VECTOR,
UML_NET_VDE and UML_NET_PCAP options enabled using alternative libc
implementations, which do not rely on NSS, such as musl.Allow static linking in this case.
Signed-off-by: Ignat Korchagin
Reviewed-by: Brendan Higgins
Tested-by: Brendan Higgins
Signed-off-by: Richard Weinberger -
musl toolchain and headers are a bit more strict. These fixes enable building
UML with musl as well as seem not to break on glibc.Signed-off-by: Ignat Korchagin
Tested-by: Brendan Higgins
Signed-off-by: Richard Weinberger -
Use GFP_ATOMIC instead of GFP_KERNEL under spin lock to fix possible
sleep-in-atomic-context bugs.Fixes: 9807019a62dc ("um: Loadable BPF "Firmware" for vector drivers")
Signed-off-by: Tiezhu Yang
Acked-By: Anton Ivanov
Signed-off-by: Richard Weinberger -
The bpf_prog is being checked for !NULL after uml_kmalloc
but later its used directly for example:
bpf_prog->filter = bpf and is also later returned upon
success. Fix this, do a NULL check and return right away.Signed-off-by: Gaurav Singh
Acked-By: Anton Ivanov
Signed-off-by: Richard Weinberger
09 Oct, 2020
1 commit
-
In order to make adding configurable features into seccomp easier,
it's better to have the options at one single location, considering
especially that the bulk of seccomp code is arch-independent. An quick
look also show that many SECCOMP descriptions are outdated; they talk
about /proc rather than prctl.As a result of moving the config option and keeping it default on,
architectures arm, arm64, csky, riscv, sh, and xtensa did not have SECCOMP
on by default prior to this and SECCOMP will be default in this change.Architectures microblaze, mips, powerpc, s390, sh, and sparc have an
outdated depend on PROC_FS and this dependency is removed in this change.Suggested-by: Jann Horn
Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/
Signed-off-by: YiFei Zhu
[kees: added HAVE_ARCH_SECCOMP help text, tweaked wording]
Signed-off-by: Kees Cook
Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu
24 Sep, 2020
1 commit
-
There was a request to preprocess the module linker script like we
do for the vmlinux one. (https://lkml.org/lkml/2020/8/21/512)The difference between vmlinux.lds and module.lds is that the latter
is needed for external module builds, thus must be cleaned up by
'make mrproper' instead of 'make clean'. Also, it must be created
by 'make modules_prepare'.You cannot put it in arch/$(SRCARCH)/kernel/, which is cleaned up by
'make clean'. I moved arch/$(SRCARCH)/kernel/module.lds to
arch/$(SRCARCH)/include/asm/module.lds.h, which is included from
scripts/module.lds.S.scripts/module.lds is fine because 'make clean' keeps all the
build artifacts under scripts/.You can add arch-specific sections in .
Signed-off-by: Masahiro Yamada
Tested-by: Jessica Yu
Acked-by: Will Deacon
Acked-by: Geert Uytterhoeven
Acked-by: Palmer Dabbelt
Reviewed-by: Kees Cook
Acked-by: Jessica Yu
09 Sep, 2020
1 commit
-
Add a CONFIG_SET_FS option that is selected by architecturess that
implement set_fs, which is all of them initially. If the option is not
set stubs for routines related to overriding the address space are
provided so that architectures can start to opt out of providing set_fs.Signed-off-by: Christoph Hellwig
Reviewed-by: Kees Cook
Signed-off-by: Al Viro
01 Sep, 2020
1 commit
-
The .comment section doesn't belong in STABS_DEBUG. Split it out into a
new macro named ELF_DETAILS. This will gain other non-debug sections
that need to be accounted for when linking with --orphan-handling=warn.Signed-off-by: Kees Cook
Signed-off-by: Ingo Molnar
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org
24 Aug, 2020
1 commit
-
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva
16 Aug, 2020
1 commit
-
Pull arch/sh updates from Rich Felker:
"Cleanup, SECCOMP_FILTER support, message printing fixes, and other
changes to arch/sh"* tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
sh: landisk: Add missing initialization of sh_io_port_base
sh: bring syscall_set_return_value in line with other architectures
sh: Add SECCOMP_FILTER
sh: Rearrange blocks in entry-common.S
sh: switch to copy_thread_tls()
sh: use the generic dma coherent remap allocator
sh: don't allow non-coherent DMA for NOMMU
dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
sh: unexport register_trapped_io and match_trapped_io_handler
sh: don't include in
sh: move the ioremap implementation out of line
sh: move ioremap_fixed details out of
sh: remove __KERNEL__ ifdefs from non-UAPI headers
sh: sort the selects for SUPERH alphabetically
sh: remove -Werror from Makefiles
sh: Replace HTTP links with HTTPS ones
arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
sh: stacktrace: Remove stacktrace_ops.stack()
sh: machvec: Modernize printing of kernel messages
sh: pci: Modernize printing of kernel messages
...
15 Aug, 2020
1 commit
-
Have a single definition that architetures can select.
Signed-off-by: Christoph Hellwig
Signed-off-by: Rich Felker
13 Aug, 2020
3 commits
-
Merge more updates from Andrew Morton:
- most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
memory-hotplug, cleanups, uaccess, migration, gup, pagemap),- various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
exec, kdump, rapidio, panic, kcov, kgdb, ipc).* emailed patches from Andrew Morton : (164 commits)
mm/gup: remove task_struct pointer for all gup code
mm: clean up the last pieces of page fault accountings
mm/xtensa: use general page fault accounting
mm/x86: use general page fault accounting
mm/sparc64: use general page fault accounting
mm/sparc32: use general page fault accounting
mm/sh: use general page fault accounting
mm/s390: use general page fault accounting
mm/riscv: use general page fault accounting
mm/powerpc: use general page fault accounting
mm/parisc: use general page fault accounting
mm/openrisc: use general page fault accounting
mm/nios2: use general page fault accounting
mm/nds32: use general page fault accounting
mm/mips: use general page fault accounting
mm/microblaze: use general page fault accounting
mm/m68k: use general page fault accounting
mm/ia64: use general page fault accounting
mm/hexagon: use general page fault accounting
mm/csky: use general page fault accounting
... -
Here're the last pieces of page fault accounting that were still done
outside handle_mm_fault() where we still have regs==NULL when calling
handle_mm_fault():arch/powerpc/mm/copro_fault.c: copro_handle_mm_fault
arch/sparc/mm/fault_32.c: force_user_fault
arch/um/kernel/trap.c: handle_page_fault
mm/gup.c: faultin_page
fixup_user_fault
mm/hmm.c: hmm_vma_fault
mm/ksm.c: break_ksmSome of them has the issue of duplicated accounting for page fault
retries. Some of them didn't do the accounting at all.This patch cleans all these up by letting handle_mm_fault() to do per-task
page fault accounting even if regs==NULL (though we'll still skip the perf
event accountings). With that, we can safely remove all the outliers now.There's another functional change in that now we account the page faults
to the caller of gup, rather than the task_struct that passed into the gup
code. More information of this can be found at [1].After this patch, below things should never be touched again outside
handle_mm_fault():- task_struct.[maj|min]_flt
- PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN][1] https://lore.kernel.org/lkml/CAHk-=wj_V2Tps2QrMn20_W0OJF9xqNh52XSGA42s-ZJ8Y+GyKw@mail.gmail.com/
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Cc: Albert Ou
Cc: Alexander Gordeev
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Brian Cain
Cc: Catalin Marinas
Cc: Christian Borntraeger
Cc: Chris Zankel
Cc: Dave Hansen
Cc: David S. Miller
Cc: Geert Uytterhoeven
Cc: Gerald Schaefer
Cc: Greentime Hu
Cc: Guo Ren
Cc: Heiko Carstens
Cc: Helge Deller
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Ivan Kokshaysky
Cc: James E.J. Bottomley
Cc: John Hubbard
Cc: Jonas Bonn
Cc: Ley Foon Tan
Cc: "Luck, Tony"
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Nick Hu
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: Richard Henderson
Cc: Rich Felker
Cc: Russell King
Cc: Stafford Horne
Cc: Stefan Kristiansson
Cc: Thomas Bogendoerfer
Cc: Thomas Gleixner
Cc: Vasily Gorbik
Cc: Vincent Chen
Cc: Vineet Gupta
Cc: Will Deacon
Cc: Yoshinori Sato
Link: http://lkml.kernel.org/r/20200707225021.200906-25-peterx@redhat.com
Signed-off-by: Linus Torvalds -
Patch series "mm: Page fault accounting cleanups", v5.
This is v5 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report on an issue a week ago regarding to incorrect page fault
accountings for retried page fault after commit 4064b9827063 ("mm: allow
VM_FAULT_RETRY for multiple times"):https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
What this series did:
- Correct page fault accounting: we do accounting for a page fault
(no matter whether it's from #PF handling, or gup, or anything else)
only with the one that completed the fault. For example, page fault
retries should not be counted in page fault counters. Same to the
perf events.- Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
event is used in an adhoc way across different archs.Case (1): for many archs it's done at the entry of a page fault
handler, so that it will also cover e.g. errornous faults.Case (2): for some other archs, it is only accounted when the page
fault is resolved successfully.Case (3): there're still quite some archs that have not enabled
this perf event.Since this series will touch merely all the archs, we unify this
perf event to always follow case (1), which is the one that makes most
sense. And since we moved the accounting into handle_mm_fault, the
other two MAJ/MIN perf events are well taken care of naturally.- Unify definition of "major faults": the definition of "major
fault" is slightly changed when used in accounting (not
VM_FAULT_MAJOR). More information in patch 1.- Always account the page fault onto the one that triggered the page
fault. This does not matter much for #PF handlings, but mostly for
gup. More information on this in patch 25.Patchset layout:
Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
Patch 25: Cleanup GUP task_struct pointer since it's not needed any moreThis patch (of 25):
This is a preparation patch to move page fault accountings into the
general code in handle_mm_fault(). This includes both the per task
flt_maj/flt_min counters, and the major/minor page fault perf events. To
do this, the pt_regs pointer is passed into handle_mm_fault().PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.So far, all the pt_regs pointer that passed into handle_mm_fault() is
NULL, which means this patch should have no intented functional change.Suggested-by: Linus Torvalds
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Cc: Albert Ou
Cc: Alexander Gordeev
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Brian Cain
Cc: Catalin Marinas
Cc: Christian Borntraeger
Cc: Chris Zankel
Cc: Dave Hansen
Cc: David S. Miller
Cc: Geert Uytterhoeven
Cc: Gerald Schaefer
Cc: Greentime Hu
Cc: Guo Ren
Cc: Heiko Carstens
Cc: Helge Deller
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Ivan Kokshaysky
Cc: James E.J. Bottomley
Cc: John Hubbard
Cc: Jonas Bonn
Cc: Ley Foon Tan
Cc: "Luck, Tony"
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Nick Hu
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: Richard Henderson
Cc: Rich Felker
Cc: Russell King
Cc: Stafford Horne
Cc: Stefan Kristiansson
Cc: Thomas Bogendoerfer
Cc: Thomas Gleixner
Cc: Vasily Gorbik
Cc: Vincent Chen
Cc: Vineet Gupta
Cc: Will Deacon
Cc: Yoshinori Sato
Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
Signed-off-by: Linus Torvalds