Eric Lee / smarc-fsl-linux-kernel

23 Aug, 2019

1 commit

e0917f879 um: fix time travel mode

Unfortunately, my build fix for when time travel mode isn't
enabled broke time travel mode, because I forgot that we need
to use the timer time after the timer has been marked disabled,
and thus need to leave the time stored instead of zeroing it.

Fix that by splitting the inline into two, so we can call only
the _mode() one in the relevant code path.

Fixes: b482e48d29f1 ("um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT")
Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-08-23 06:39:53 +0800
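
A minimal C sketch of the idea in e0917f879, under stated assumptions: every name below (tt_timer_mode, tt_set_timer_mode, tt_set_timer_expiry, tt_set_timer, tt_disable_and_get_expiry) is hypothetical and not taken from the UML source. It only illustrates why the mode setter needs to be callable on its own, so that marking the timer disabled does not clobber the stored time.

```c
/* Hypothetical timer state; names are illustrative, not the kernel's. */
enum tt_timer_mode { TT_TIMER_NONE, TT_TIMER_ONESHOT };

static enum tt_timer_mode tt_mode = TT_TIMER_NONE;
static unsigned long long tt_expiry;

/* Change only the mode; the stored expiry is left untouched. */
static inline void tt_set_timer_mode(enum tt_timer_mode mode)
{
	tt_mode = mode;
}

/* Change only the expiry time. */
static inline void tt_set_timer_expiry(unsigned long long expiry)
{
	tt_expiry = expiry;
}

/* Combined helper for paths that really want to set both values. */
static inline void tt_set_timer(enum tt_timer_mode mode,
				unsigned long long expiry)
{
	tt_set_timer_mode(mode);
	tt_set_timer_expiry(expiry);
}

/* Disable path: mark the timer disabled, then still read the stored time. */
static unsigned long long tt_disable_and_get_expiry(void)
{
	tt_set_timer_mode(TT_TIMER_NONE);
	return tt_expiry;
}
```

Had the disable path called a combined setter with an expiry of zero, the value returned here would be lost, which is the kind of breakage the commit message describes.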

15 Jul, 2019

1 commit

f2772a0e4 Merge tag 'for-linus-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

- A new timer mode, time travel, for testing with UML

- Many bugfixes/improvements for the serial line driver

- Various bugfixes

* tag 'for-linus-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT
um: Fix kcov crash during startup
um: configs: Remove useless UEVENT_HELPER_PATH
um: Support time travel mode
um: Pass nsecs to os timer functions
um: Remove drivers/ssl.h
um: Don't garbage collect in deactivate_all_fds()
um: Silence lockdep complaint about mmap_sem
um: Remove locking in deactivate_all_fds()
um: Timer code cleanup
um: fix os_timer_one_shot()
um: Fix IRQ controller regression on console read

Linus Torvalds
2019-07-15 08:17:34 +0800

13 Jul, 2019

2 commits

39ceda5ce Merge tag 'kbuild-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

- remove headers_{install,check}_all targets

- remove unreasonable 'depends on !UML' from CONFIG_SAMPLES

- re-implement 'make headers_install' more cleanly

- add new header-test-y syntax to compile-test headers

- compile-test exported headers to ensure they are compilable in
user-space

- compile-test headers under include/ to ensure they are self-contained

- remove -Waggregate-return, -Wno-uninitialized, -Wno-unused-value
flags

- add -Werror=unknown-warning-option for Clang

- add 128-bit built-in types support to genksyms

- fix missed rebuild of modules.builtin

- propagate 'No space left on device' error in fixdep to Make

- allow Clang to use its integrated assembler

- improve some coccinelle scripts

- add a new flag KBUILD_ABS_SRCTREE to request Kbuild to use absolute
path for $(srctree).

- do not ignore errors when compression utility is missing

- misc cleanups

* tag 'kbuild-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (49 commits)
kbuild: use -- separater intead of $(filter-out ...) for cc-cross-prefix
kbuild: Inform user to pass ARCH= for make mrproper
kbuild: fix compression errors getting ignored
kbuild: add a flag to force absolute path for srctree
kbuild: replace KBUILD_SRCTREE with boolean building_out_of_srctree
kbuild: remove src and obj from the top Makefile
scripts/tags.sh: remove unused environment variables from comments
scripts/tags.sh: drop SUBARCH support for ARM
kbuild: compile-test kernel headers to ensure they are self-contained
kheaders: include only headers into kheaders_data.tar.xz
kheaders: remove meaningless -R option of 'ls'
kbuild: support header-test-pattern-y
kbuild: do not create wrappers for header-test-y
kbuild: compile-test exported headers to ensure they are self-contained
init/Kconfig: add CONFIG_CC_CAN_LINK
kallsyms: exclude kasan local symbols on s390
kbuild: add more hints about SUBDIRS replacement
coccinelle: api/stream_open: treat all wait_.*() calls as blocking
coccinelle: put_device: Add a cast to an expression for an assignment
coccinelle: put_device: Adjust a message construction
...

Linus Torvalds
2019-07-13 07:03:16 +0800

f32848e16 um: switch to generic version of pte allocation

um allocates PTE pages with __get_free_page() and uses
GFP_KERNEL | __GFP_ZERO for the allocations.

Switch it to the generic version that does exactly the same thing for the
kernel page tables and adds __GFP_ACCOUNT for the user PTEs.

The pte_free() and pte_free_kernel() versions are identical to the generic
ones and can be simply dropped.

Link: http://lkml.kernel.org/r/1557296232-15361-14-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Reviewed-by: Anton Ivanov
Acked-by: Anton Ivanov
Cc: Albert Ou
Cc: Anshuman Khandual
Cc: Arnd Bergmann
Cc: Catalin Marinas
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Guan Xuetao
Cc: Guo Ren
Cc: Helge Deller
Cc: Ley Foon Tan
Cc: Matthew Wilcox
Cc: Matt Turner
Cc: Michael Ellerman
Cc: Michal Hocko
Cc: Palmer Dabbelt
Cc: Paul Burton
Cc: Ralf Baechle
Cc: Richard Kuo
Cc: Richard Weinberger
Cc: Russell King
Cc: Sam Creasey
Cc: Vincent Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2019-07-13 02:05:45 +0800

10 Jul, 2019

1 commit

75dd47472 kbuild: remove src and obj from the top Makefile

Replace $(src) and $(obj) with $(srctree) and $(objtree), respectively.

Signed-off-by: Masahiro Yamada

Masahiro Yamada
2019-07-10 23:05:09 +0800

09 Jul, 2019

1 commit

5ad18b2e6 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull force_sig() argument change from Eric Biederman:
"A source of error over the years has been that force_sig has taken a
task parameter when it is only safe to use force_sig with the current
task.

The force_sig function is built for delivering synchronous signals
such as SIGSEGV where the userspace application caused a synchronous
fault (such as a page fault) and the kernel responded with a signal.

Because the name force_sig does not make this clear, and because the
force_sig takes a task parameter the function force_sig has been
abused for sending other kinds of signals over the years. Slowly those
have been fixed when the oopses have been tracked down.

This set of changes fixes the remaining abusers of force_sig and
carefully rips out the task parameter from force_sig and friends
making this kind of error almost impossible in the future"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
signal: Remove the signal number and task parameters from force_sig_info
signal: Factor force_sig_info_to_task out of force_sig_info
signal: Generate the siginfo in force_sig
signal: Move the computation of force into send_signal and correct it.
signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
signal: Remove the task parameter from force_sig_fault
signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
signal: Explicitly call force_sig_fault on current
signal/unicore32: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from ptrace_break
signal/nds32: Remove tsk parameter from send_sigtrap
signal/riscv: Remove tsk parameter from do_trap
signal/sh: Remove tsk parameter from force_sig_info_fault
signal/um: Remove task parameter from send_sigtrap
signal/x86: Remove task parameter from send_sigtrap
signal: Remove task parameter from force_sig_mceerr
signal: Remove task parameter from force_sig
signal: Remove task parameter from force_sigsegv
...

Linus Torvalds
2019-07-09 12:48:15 +0800

04 Jul, 2019

1 commit

b482e48d2 um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT

When CONFIG_UML_TIME_TRAVEL_SUPPORT isn't set, the build was broken.
Fix this.

Fixes: 065038706f77 ("um: Support time travel mode")
Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-04 15:52:18 +0800

03 Jul, 2019

11 commits

c4683cd5f um: Fix kcov crash during startup

UML fails to start when compiled with KCOV. Disable KCOV on
arch/um/kernel/skas.

$ gdb -q -ex r ./vmlinux
Program received signal SIGSEGV, Segmentation fault.
check_kcov_mode (t=<>, needed_mode=<>) at kernel/kcov.c:70
70 mode = READ_ONCE(t->kcov_mode);

Signed-off-by: Marek Majkowski
Signed-off-by: Richard Weinberger

Marek Majkowski
2019-07-03 05:27:42 +0800

80b81cdc6 um: configs: Remove useless UEVENT_HELPER_PATH

Remove the CONFIG_UEVENT_HELPER_PATH because:
1. It is disabled since commit 1be01d4a5714 ("driver: base: Disable
CONFIG_UEVENT_HELPER by default") as its dependency (UEVENT_HELPER) was
made default to 'n',
2. It is not recommended (help message: "This should not be used today
[...] creates a high system load") and was kept only for ancient
userland,
3. Certain userland specifically requests it to be disabled (systemd
README: "Legacy hotplug slows down the system and confuses udev").

Signed-off-by: Krzysztof Kozlowski
Acked-by: Geert Uytterhoeven
Signed-off-by: Richard Weinberger

Krzysztof Kozlowski
2019-07-03 05:27:41 +0800

065038706 um: Support time travel mode

Sometimes it can be useful to run with "time travel" inside the
UML instance, for example for testing. For example, some tests
for the wireless subsystem and userspace are based on hwsim, a
virtual wireless adapter. Some tests can take a long time to
run because they e.g. wait for 120 seconds to elapse for some
regulatory checks. This obviously goes faster if it need not
actually wait that long, but time inside the test environment
just "bumps up" when there's nothing to do.

Add CONFIG_UML_TIME_TRAVEL_SUPPORT to enable code to support
such modes at runtime, selected on the command line:
* just "time-travel", in which time inside the UML instance
can move faster than real time, if there's nothing to do
* "time-travel=inf-cpu" in which time also moves slower and
any CPU processing takes no time at all, which makes it possible
to implement consistent behaviour regardless of host CPU load
(or speed) or debug overhead.

An additional "time-travel-start=" parameter is also
supported in this case to start the wall clock at this time
(in unix epoch).

With this enabled, the test mentioned above goes from a runtime
of about 140 seconds (with startup overhead and all) to being
CPU bound and finishing in 15 seconds (on my slow laptop).

Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-03 05:27:36 +0800
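
As a rough illustration of the basic "time-travel" idea in 065038706, the sketch below models a simulated clock that jumps straight to the next timer expiry when there is nothing to do, instead of waiting in real time. The struct and function names (sim_clock, sim_idle) are hypothetical and the logic is a simplified assumption, not the UML implementation.

```c
/* Hypothetical simulated clock; all times are in nanoseconds. */
struct sim_clock {
	unsigned long long now;         /* current simulated time */
	unsigned long long next_expiry; /* when the pending timer fires */
	int timer_armed;                /* non-zero if a timer is pending */
};

/*
 * Idle handling in this sketch: rather than sleeping in real time until
 * the timer fires, move the simulated clock directly to the expiry.
 * Returns the amount of simulated time that was skipped.
 */
static unsigned long long sim_idle(struct sim_clock *clk)
{
	unsigned long long skipped = 0;

	if (clk->timer_armed && clk->next_expiry > clk->now) {
		skipped = clk->next_expiry - clk->now;
		clk->now = clk->next_expiry;
	}
	clk->timer_armed = 0; /* the pending timer, if any, has now fired */
	return skipped;
}
```

With a timer armed 120 simulated seconds ahead, a single call to sim_idle() advances the clock by the full 120 seconds without any real waiting, which is the effect the commit message relies on for the regulatory-check tests.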

c7c6f3b95 um: Pass nsecs to os timer functions

This makes the code clearer and lets the time travel patch have
the actual time used for these functions in just one place.

Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-03 05:27:29 +0800

b00bdd324 um: Remove drivers/ssl.h

This file just contains two unused prototypes, remove it.

Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-03 05:27:24 +0800

c7f04e87e um: Don't garbage collect in deactivate_all_fds()

My previous commit didn't actually address the whole issue with
lockdep shutdown, I had another local modification that disabled
lockdep but that wasn't sufficient alone, so had to do the other
change.

Another issue remained though - during kfree() we acquire locks
and lockdep tries to annotate those with exactly the same issue
in the other patch - we no longer have "current".

So, just remove the garbage collection. There's no value in it
since we're going to shut down anyway, and marking a slab
object as free is not very useful at that point.

Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-03 05:27:19 +0800

80bf6ceaf um: Silence lockdep complaint about mmap_sem

When we get into activate_mm(), lockdep complains that we're doing
something strange:

WARNING: possible circular locking dependency detected
5.1.0-10252-gb00152307319-dirty #121 Not tainted
------------------------------------------------------
inside.sh/366 is trying to acquire lock:
(____ptrval____) (&(&p->alloc_lock)->rlock){+.+.}, at: flush_old_exec+0x703/0x8d7

but task is already holding lock:
(____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){++++}:
[...]
__lock_acquire+0x12ab/0x139f
lock_acquire+0x155/0x18e
down_write+0x3f/0x98
flush_old_exec+0x748/0x8d7
load_elf_binary+0x2ca/0xddb
[...]

-> #0 (&(&p->alloc_lock)->rlock){+.+.}:
[...]
__lock_acquire+0x12ab/0x139f
lock_acquire+0x155/0x18e
_raw_spin_lock+0x30/0x83
flush_old_exec+0x703/0x8d7
load_elf_binary+0x2ca/0xddb
[...]

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(&(&p->alloc_lock)->rlock);
lock(&mm->mmap_sem);
lock(&(&p->alloc_lock)->rlock);

*** DEADLOCK ***

2 locks held by inside.sh/366:
#0: (____ptrval____) (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file+0x12d/0x869
#1: (____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7

stack backtrace:
CPU: 0 PID: 366 Comm: inside.sh Not tainted 5.1.0-10252-gb00152307319-dirty #121
Stack:
[...]
Call Trace:
[] show_stack+0x13b/0x155
[] dump_stack+0x2a/0x2c
[] print_circular_bug+0x332/0x343
[] check_prev_add+0x669/0xdad
[] __lock_acquire+0x12ab/0x139f
[] lock_acquire+0x155/0x18e
[] _raw_spin_lock+0x30/0x83
[] flush_old_exec+0x703/0x8d7
[] load_elf_binary+0x2ca/0xddb
[...]

I think it's because in exec_mmap() we have

down_read(&old_mm->mmap_sem);
...
task_lock(tsk);
...
activate_mm(active_mm, mm);
(which does down_write(&mm->mmap_sem))

I'm not really sure why lockdep throws in the whole knowledge
about the task lock, but it seems that old_mm and mm shouldn't
ever be the same (and it doesn't deadlock) so tell lockdep that
they're different.

Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-03 05:27:11 +0800

8eacd6fca um: Remove locking in deactivate_all_fds()

Not only does the locking contradict the comment, and as
the comment says is pointless and actually harmful (all
the actual OS threads have exited already), but it also
causes crashes when lockdep is enabled, because calling
into the spinlock calls into lockdep, which then tries
to determine the current task, which no longer exists.

Remove the locking to let UML shut down cleanly in case
lockdep is enabled.

Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-03 05:27:05 +0800

56fc18706 um: Timer code cleanup

There are some unused functions, and some others that have
unused arguments; clean up the timer code a bit.

Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-03 05:27:00 +0800

fcd242c6c um: fix os_timer_one_shot()

os_timer_one_shot() gets passed a value "unsigned long delta",
so must not have an "int ticks" as that actually ends up being
-1, and thus triggering a timer over and over again.

Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Johannes Berg
2019-07-03 05:26:57 +0800
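
The narrowing problem described in fcd242c6c can be reproduced in isolation. The helpers below are hypothetical (they are not os_timer_one_shot() itself) and assume a 32-bit int; converting an out-of-range unsigned value to int is implementation-defined in C, and GCC and Clang wrap modulo 2^32.

```c
/*
 * What a parameter declared "int ticks" ends up holding when the caller
 * passes an "unsigned long delta" that does not fit in an int.
 */
static int as_int_ticks(unsigned long delta)
{
	return (int)delta;
}

/* Keeping the parameter as wide as the caller's value preserves it. */
static unsigned long as_ulong_delta(unsigned long delta)
{
	return delta;
}
```

On such a compiler, as_int_ticks(0xFFFFFFFFUL) yields -1 while as_ulong_delta() returns the value unchanged, matching the "-1" symptom in the commit message.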

bebe4681d um: Fix IRQ controller regression on console read

The conversion of UML to use epoll based IRQ controller claimed that
clone_one_chan() can safely call um_free_irq() while starting to ignore
the delay_free_irq parameter that explicitly noted that the IRQ cannot
be freed because this is being called from chan_interrupt(). This
resulted in free_irq() getting called in interrupt context ("Trying to
free IRQ 6 from IRQ context!").

Fix this by restoring previously used delay_free_irq processing.

Fixes: ff6a17989c08 ("Epoll based IRQ controller")
Signed-off-by: Jouni Malinen
Signed-off-by: Johannes Berg
Signed-off-by: Richard Weinberger

Jouni Malinen
2019-07-03 05:26:52 +0800

19 Jun, 2019

1 commit

d2912cb15 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Enrico Weigelt
Reviewed-by: Kate Stewart
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-06-19 23:09:55 +0800

31 May, 2019

1 commit

96ac6d435 treewide: Add SPDX license identifier - Kbuild

Add SPDX license identifiers to all Make/Kconfig files which:

- Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0

Reported-by: Masahiro Yamada
Signed-off-by: Greg Kroah-Hartman
Reviewed-by: Kate Stewart
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2019-05-31 02:32:33 +0800

29 May, 2019

2 commits

2e1661d26 signal: Remove the task parameter from force_sig_fault

As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from force_sig_fault to make it explicit that is what is going
on.

The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.

The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:

force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2019-05-29 22:31:43 +0800

9d6317598 signal/um: Remove task parameter from send_sigtrap

The send_sigtrap function is always called with task == current. Make
that explicit by removing the task parameter.

This also makes it clear that the uml send_sigtrap passes current
into force_sig_fault.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2019-05-29 22:31:42 +0800

27 May, 2019

2 commits

3cf5d076f signal: Remove task parameter from force_sig

All of the remaining callers pass current into force_sig so
remove the task parameter to make this obvious and to make
misuse more difficult in the future.

This also makes it clear force_sig passes current into force_sig_info.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2019-05-27 22:36:28 +0800
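
The API change in 3cf5d076f can be pictured with a small stand-alone model. Everything here is a hypothetical stand-in (struct task, current_task, force_sig_old, force_sig_new); it is not the kernel's signal code and only contrasts the two function shapes.

```c
/* Hypothetical stand-ins; not the kernel's task_struct or signal code. */
struct task {
	int pending_sig;
};

static struct task task_a, task_b;
static struct task *current_task = &task_a; /* plays the role of "current" */

/* Old shape: any task can be passed, including one that is not current. */
static void force_sig_old(int sig, struct task *t)
{
	t->pending_sig = sig;
}

/* New shape: the target is always the current task. */
static void force_sig_new(int sig)
{
	current_task->pending_sig = sig;
}
```

With the old shape a caller can write force_sig_old(11, &task_b) and signal a task that is not running; with the new shape that call cannot be expressed at all.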

cb44c9a0a signal: Remove task parameter from force_sigsegv

The function force_sigsegv is always called on the current task
so passing in current is redundant and not passing in current
makes this fact obvious.

This also makes it clear force_sigsegv always calls force_sig
on the current task.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2019-05-27 22:36:28 +0800

21 May, 2019

1 commit

09c434b8a treewide: Add SPDX license identifier for more missed files

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have MODULE_LICENSE("GPL*") inside which was used in the initial
scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:45 +0800

20 May, 2019

1 commit

1335d9a1f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core fixes from Ingo Molnar:
"This fixes a particularly thorny munmap() bug with MPX, plus fixes a
host build environment assumption in objtool"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Allow AR to be overridden with HOSTAR
x86/mpx, mm/core: Fix recursive munmap() corruption

Linus Torvalds
2019-05-20 01:23:24 +0800

15 May, 2019

1 commit

4afd58e14 initramfs: provide a generic free_initrd_mem implementation

For most architectures free_initrd_mem just expands to the same
free_reserved_area call. Provide that as a generic implementation marked
__weak.

Link: http://lkml.kernel.org/r/20190213174621.29297-8-hch@lst.de
Signed-off-by: Christoph Hellwig
Acked-by: Geert Uytterhoeven [m68k]
Acked-by: Mike Rapoport
Cc: Catalin Marinas [arm64]
Cc: Steven Price
Cc: Alexander Viro
Cc: Guan Xuetao
Cc: Russell King
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2019-05-15 00:47:47 +0800
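
The "generic implementation marked __weak" pattern from 4afd58e14 can be sketched outside the kernel with the GCC/Clang weak attribute, which is what the kernel's __weak macro expands to. The function name demo_free_initrd_mem and the freed_bytes counter are hypothetical; the counter merely stands in for whatever free_reserved_area() would release.

```c
#include <stddef.h>

/* Stand-in for the memory that the real call would hand back. */
static size_t freed_bytes;

/*
 * Generic default. Because the symbol is weak, another object file may
 * provide a strong definition with the same name, and the linker will
 * then pick that one instead of this default.
 */
__attribute__((weak)) void demo_free_initrd_mem(unsigned long start,
						unsigned long end)
{
	freed_bytes += end - start;
}
```

When no other definition is linked in, calls resolve to this default, so only architectures that need something different have to carry their own copy.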

13 May, 2019

1 commit

983dfa4b6 Merge tag 'for-linus-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

- Kconfig cleanups

- Fix cpu_all_mask() usage

- Various bug fixes

* tag 'for-linus-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: irq: don't set the chip for all irqs
um: define set_pte_at() as a static inline function, not a macro
um: remove uses of variable length arrays
um: remove unused variable
uml: fix a boot splat wrt use of cpu_all_mask
um: Do not unlock mutex that is not hold.
hostfs: fix mismatch between link_file definition and declaration
arch: um: drivers: Kconfig: pedantic formatting
arch: um: Kconfig: pedantic indention cleanups
um: Revert to using stack for pt_regs in signal handling

Linus Torvalds
2019-05-13 05:52:13 +0800

09 May, 2019

1 commit

5a28fc94c x86/mpx, mm/core: Fix recursive munmap() corruption

This is a bit of a mess, to put it mildly. But, it's a bug
that only seems to have shown up in 4.20 but wasn't noticed
until now, because nobody uses MPX.

MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory. When
memory is unmapped, there is also a need to unmap the MPX
bounds tables. Barring this, unused bounds tables can eat 80%
of the address space.

But, the recursive do_munmap() that gets called via arch_unmap()
wreaks havoc with __do_munmap()'s state. It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.

See the "long story" further below for the gory details.

To fix this, call arch_unmap() before __do_munmap() has a chance
to do anything meaningful. Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.

== UML / unicore32 impact ==

Remove unused 'vma' argument to arch_unmap(). No functional
change.

I compile tested this on UML but not unicore32.

== powerpc impact ==

powerpc uses arch_unmap() to watch for munmap() on the
VDSO and zeroes out 'current->mm->context.vdso_base'. Moving
arch_unmap() makes this happen earlier in __do_munmap(). But,
'vdso_base' seems to only be used in perf and in the signal
delivery that happens near the return to userspace. I can not
find any likely impact to powerpc, other than the zeroing
happening a little earlier.

powerpc does not use the 'vma' argument and is unaffected by
its removal.

I compile-tested a 64-bit powerpc defconfig.

== x86 impact ==

For the common success case this is functionally identical to
what was there before. For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use. But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).

I can't imagine anyone doing this:

ptr = mmap();
// use ptr
ret = munmap(ptr);
if (ret)
// oh, there was an error, I'll
// keep using ptr.

Because if you're doing munmap(), you are *done* with the
memory. There's probably no good data in there _anyway_.

This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.

The long story:

munmap() has a couple of pieces:

1. Find the affected VMA(s)
2. Split the start/end one(s) if necessary
3. Pull the VMAs out of the rbtree
4. Actually zap the memory via unmap_region(), including
freeing page tables (or queueing them to be freed).
5. Fix up some of the accounting (like fput()) and actually
free the VMA itself.

This specific ordering was actually introduced by:

dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")

during the 4.20 merge window. The previous __do_munmap() code
was actually safe because the only thing after arch_unmap() was
remove_vma_list(). arch_unmap() could not see 'vma' in the
rbtree because it was detached, so it is not even capable of
doing operations unsafe for remove_vma_list()'s use of 'vma'.

Richard Biener reported a test that shows this in dmesg:

[1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
[1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576

What triggered this was the recursive do_munmap() called via
arch_unmap(). It was freeing page tables that have not been
properly zapped.

But, the problem was bigger than this. For one, arch_unmap()
can free VMAs. But, the calling __do_munmap() has variables
that *point* to VMAs and obviously can't handle them just
getting freed while the pointer is still in use.

I tried a couple of things here. First, I tried to fix the page
table freeing problem in isolation, but I then found the VMA
issue. I also tried having the MPX code return a flag if it
modified the rbtree which would force __do_munmap() to re-walk
to restart. That spiralled out of control in complexity pretty
fast.

Just moving arch_unmap() and accepting that the bonkers failure
case might eat some bounds tables seems like the simplest viable
fix.

This was also reported in the following kernel bugzilla entry:

https://bugzilla.kernel.org/show_bug.cgi?id=203123

There are some reports that this commit triggered this bug:

dd2283f2605 ("mm: mmap: zap pages with read mmap_sem in munmap")

While that commit certainly made the issues easier to hit, I believe
the fundamental issue has been with us as long as MPX itself, thus
the Fixes: tag below is for one of the original MPX commits.

[ mingo: Minor edits to the changelog and the patch. ]

Reported-by: Richard Biener
Reported-by: H.J. Lu
Signed-off-by: Dave Hansen
Reviewed-by: Thomas Gleixner
Reviewed-by: Yang Shi
Acked-by: Michael Ellerman
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Anton Ivanov
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Guan Xuetao
Cc: H. Peter Anvin
Cc: Jeff Dike
Cc: Linus Torvalds
Cc: Michal Hocko
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Richard Weinberger
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-um@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar

Dave Hansen
2019-05-09 16:37:17 +0800

08 May, 2019

11 commits

80f232121 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:
"Highlights:

1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

2) Add fib_sync_mem to control the amount of dirty memory we allow to
queue up between synchronize RCU calls, from David Ahern.

3) Make flow classifier more lockless, from Vlad Buslov.

4) Add PHY downshift support to aquantia driver, from Heiner
Kallweit.

5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
contention on SLAB spinlocks in heavy RPC workloads.

6) Partial GSO offload support in XFRM, from Boris Pismenny.

7) Add fast link down support to ethtool, from Heiner Kallweit.

8) Use siphash for IP ID generator, from Eric Dumazet.

9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
entries, from David Ahern.

10) Move skb->xmit_more into a per-cpu variable, from Florian
Westphal.

11) Improve eBPF verifier speed and increase maximum program size,
from Alexei Starovoitov.

12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
spinlocks. From Neil Brown.

13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

14) Improve link partner cap detection in generic PHY code, from
Heiner Kallweit.

15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
Maguire.

16) Remove SKB list implementation assumptions in SCTP, yours truly.

17) Various cleanups, optimizations, and simplifications in r8169
driver. From Heiner Kallweit.

18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

19) Switch PHY drivers over to use dynamic feature detection, from
Heiner Kallweit.

20) Support flow steering without masking in dpaa2-eth, from Ioana
Ciocoi.

21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
Pirko.

22) Increase the strict parsing of current and future netlink
attributes, also export such policies to userspace. From Johannes
Berg.

23) Allow DSA tag drivers to be modular, from Andrew Lunn.

24) Remove legacy DSA probing support, also from Andrew Lunn.

25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
Haabendal.

26) Add a generic tracepoint for TX queue timeouts to ease debugging,
from Cong Wang.

27) More indirect call optimizations, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
cxgb4: Fix error path in cxgb4_init_module
net: phy: improve pause mode reporting in phy_print_status
dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
net: macb: Change interrupt and napi enable order in open
net: ll_temac: Improve error message on error IRQ
net/sched: remove block pointer from common offload structure
net: ethernet: support of_get_mac_address new ERR_PTR error
net: usb: smsc: fix warning reported by kbuild test robot
staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
net: dsa: support of_get_mac_address new ERR_PTR error
net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
vrf: sit mtu should not be updated when vrf netdev is the link
net: dsa: Fix error cleanup path in dsa_init_module
l2tp: Fix possible NULL pointer dereference
taprio: add null check on sched_nest to avoid potential null pointer dereference
net: mvpp2: cls: fix less than zero check on a u32 variable
net_sched: sch_fq: handle non connected flows
net_sched: sch_fq: do not assume EDT packets are ordered
net: hns3: use devm_kcalloc when allocating desc_cb
net: hns3: some cleanup for struct hns3_enet_ring
...

Linus Torvalds
2019-05-08 13:03:58 +0800

1987b1b8f um: irq: don't set the chip for all irqs

Setting a chip for an interrupt marks it as allocated. Since UM doesn't
support dynamic interrupt numbers (yet), it means we cannot simply
increase NR_IRQS and then use the free irqs between LAST_IRQ and NR_IRQS
with gpio-mockup or iio testing drivers as irq_alloc_descs() will fail
after being able to neither find an unallocated range of interrupts
nor expand the range.

Only call irq_set_chip_and_handler() for irqs until LAST_IRQ.

Signed-off-by: Bartosz Golaszewski
Reviewed-by: Anton Ivanov
Acked-by: Anton Ivanov
Signed-off-by: Richard Weinberger

Bartosz Golaszewski
2019-05-08 05:18:28 +0800
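
The effect of 1987b1b8f can be modelled with a plain array. The constants and helpers below (DEMO_LAST_IRQ, DEMO_NR_IRQS, demo_set_chip, demo_init_irqs, demo_free_irqs) are hypothetical, and both the starting index of 1 and the inclusive upper bound are assumptions made for the sketch rather than details stated in the commit message.

```c
#define DEMO_LAST_IRQ 14 /* hypothetical: highest statically assigned irq */
#define DEMO_NR_IRQS  64 /* hypothetical: total number of descriptors */

/* 1 means a chip was set, which marks the irq as allocated. */
static int chip_set[DEMO_NR_IRQS];

static void demo_set_chip(int irq)
{
	chip_set[irq] = 1;
}

/* Set the chip only up to DEMO_LAST_IRQ instead of for every descriptor. */
static void demo_init_irqs(void)
{
	int i;

	for (i = 1; i <= DEMO_LAST_IRQ; i++)
		demo_set_chip(i);
}

/* Count the descriptors above DEMO_LAST_IRQ that are still unallocated. */
static int demo_free_irqs(void)
{
	int i, n = 0;

	for (i = DEMO_LAST_IRQ + 1; i < DEMO_NR_IRQS; i++)
		if (!chip_set[i])
			n++;
	return n;
}
```

Because the loop stops at DEMO_LAST_IRQ, the range above it stays unallocated and remains available to a dynamic allocator, which is what the testing drivers named in the commit message need.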

ea70d791c um: define set_pte_at() as a static inline function, not a macro

When defined as macro, the mm argument is unused and subsequently the
variable passed as mm is considered unused by the compiler. This fixes
a build warning.

Signed-off-by: Bartosz Golaszewski
Reviewed-by: Geert Uytterhoeven
Reviewed-by: Anton Ivanov
Acked-by: Anton Ivanov
Signed-off-by: Richard Weinberger

Bartosz Golaszewski
2019-05-08 05:18:28 +0800
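
A minimal userspace sketch of why the macro form triggers the warning while the inline form does not. The types and the _macro/_inline names here are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins, not the kernel's types. */
struct mm { int id; };
typedef unsigned long pte_t;

/* Macro form: mm and addr never appear in the expansion, so a variable
 * that the caller passes only as mm looks unused to the compiler. */
#define set_pte_at_macro(mm, addr, ptep, pteval) (*(ptep) = (pteval))

/* Inline form: every argument is a real function parameter, so the
 * caller's variable counts as used even though the body ignores it. */
static inline void set_pte_at_inline(struct mm *mm, unsigned long addr,
                                     pte_t *ptep, pte_t pteval)
{
    (void)mm;
    (void)addr;
    *ptep = pteval;
}

/* A caller whose mm variable exists only to be passed along. */
static pte_t demo_inline(void)
{
    struct mm m = { 1 };
    struct mm *mm = &m;
    pte_t slot = 0;

    set_pte_at_inline(mm, 0x1000UL, &slot, 42UL);
    return slot;
}
```

Swapping set_pte_at_inline() for set_pte_at_macro() in demo_inline() would leave mm unreferenced after preprocessing, which is the situation the commit fixes.
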
0d4e5ac7e um: remove uses of variable length arrays

While the affected code is run in user-mode, the build still warns
about it. Convert all uses of VLA to dynamic allocations.

Signed-off-by: Bartosz Golaszewski
Signed-off-by: Richard Weinberger

Bartosz Golaszewski
2019-05-08 05:18:28 +0800
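
The kind of conversion this commit describes might look like the following userspace sketch. The helper name is hypothetical, and malloc()/free() stand in for whatever allocator the real code uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src to dst through a
 * scratch buffer. Returns 0 on success, -1 if allocation fails. */
static int copy_via_scratch(char *dst, const char *src, size_t len)
{
    /* Before the conversion this would have been a VLA:
     *     char scratch[len];
     * which the build warns about. */
    char *scratch = malloc(len ? len : 1);

    if (!scratch)
        return -1;

    memcpy(scratch, src, len);
    memcpy(dst, scratch, len);
    free(scratch);
    return 0;
}
```

Unlike the VLA, the dynamic allocation can fail, so the converted code gains an error path that callers have to handle.
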
4b6b4c902 um: remove unused variable

The buf variable is unused. Remove it.

Signed-off-by: Bartosz Golaszewski
Reviewed-by: Anton Ivanov
Acked-by: Anton Ivanov
Signed-off-by: Richard Weinberger

Bartosz Golaszewski
2019-05-08 05:18:28 +0800
689a58605 uml: fix a boot splat wrt use of cpu_all_mask

Memory: 509108K/542612K available (3835K kernel code, 919K rwdata, 1028K rodata, 129K init, 211K bss, 33504K reserved, 0K cma-reserved)
NR_IRQS: 15
clocksource: timer: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/time/clockevents.c:458 clockevents_register_device+0x72/0x140
posix-timer cpumask == cpu_all_mask, using cpu_possible_mask instead
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-00048-ged79cc87302b #4
Stack:
604ebda0 603c5370 604ebe20 6046fd17
00000000 6006fcbb 604ebdb0 603c53b5
604ebe10 6003bfc4 604ebdd0 9000001ca
Call Trace:
[] ? printk+0x0/0x94
[] ? clockevents_register_device+0x72/0x140
[] show_stack+0x13b/0x155
[] ? dump_stack_print_info+0xe2/0xeb
[] ? printk+0x0/0x94
[] dump_stack+0x2a/0x2c
[] __warn+0x10e/0x13e
[] ? vprintk_func+0xc8/0xcf
[] ? block_signals+0x0/0x16
[] ? printk+0x0/0x94
[] warn_slowpath_fmt+0x97/0x99
[] ? set_signals+0x0/0x3f
[] ? warn_slowpath_fmt+0x0/0x99
[] ? tick_oneshot_mode_active+0x44/0x4f
[] ? block_signals+0x0/0x16
[] ? printk+0x0/0x94
[] ? __clocksource_select+0x20/0x1b1
[] ? block_signals+0x0/0x16
[] ? printk+0x0/0x94
[] clockevents_register_device+0x72/0x140
[] ? get_signals+0x0/0xf
[] ? block_signals+0x0/0x16
[] ? printk+0x0/0x94
[] um_timer_setup+0xc8/0xca
[] start_kernel+0x47f/0x57e
[] start_kernel_proc+0x49/0x4d
[] ? kmsg_dump_register+0x82/0x8a
[] new_thread_handler+0x81/0xb2
[] ? kmsg_dumper_stdout_init+0x1a/0x1c
[] uml_finishsetup+0x54/0x59

random: get_random_bytes called from init_oops_id+0x27/0x34 with crng_init=0
---[ end trace 00173d0117a88acb ]---
Calibrating delay loop... 6941.90 BogoMIPS (lpj=34709504)

Signed-off-by: Maciej Żenczykowski
Cc: Jeff Dike
Cc: Richard Weinberger
Cc: Anton Ivanov
Cc: linux-um@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Richard Weinberger

Maciej Żenczykowski
2019-05-08 05:18:28 +0800
9ca55299f um: Do not unlock mutex that is not hold.

Return an error instead of trying to unlock a mutex that is not held.

Signed-off-by: Daniel Walter
Reviewed-by: Anton Ivanov
Acked-by: Anton Ivanov
Signed-off-by: Richard Weinberger

Daniel Walter
2019-05-08 05:18:28 +0800
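
The pattern behind this fix can be sketched as follows. The toy lock and do_request() are hypothetical and do not reproduce the patched driver code. Failures that occur before the lock is taken return directly, and only later failures go through the unlock label.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock that records unlocks performed while not held. */
struct toy_mutex {
    int held;
    int bad_unlocks;
};

static void toy_lock(struct toy_mutex *m)
{
    m->held = 1;
}

static void toy_unlock(struct toy_mutex *m)
{
    if (!m->held)
        m->bad_unlocks++;
    m->held = 0;
}

/* Hypothetical operation with one early check and one locked section. */
static int do_request(struct toy_mutex *m, int arg)
{
    int err;

    if (arg < 0)
        return -EINVAL;   /* lock not taken yet: return, do not unlock */

    toy_lock(m);
    if (arg == 0) {
        err = -ENOENT;    /* lock is held: leave via the unlock label */
        goto out_unlock;
    }
    err = 0;

out_unlock:
    toy_unlock(m);
    return err;
}
```

The bug class being fixed is the early check jumping to out_unlock instead of returning, which would unlock a mutex that was never locked.
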
75f24f787 arch: um: drivers: Kconfig: pedantic formatting

Formatting of Kconfig files doesn't look so pretty, so just
take a damp cloth and clean it up. Just indentation changes.

Signed-off-by: Enrico Weigelt, metux IT consult
Signed-off-by: Richard Weinberger

Enrico Weigelt, metux IT consult
2019-05-08 05:18:28 +0800
37606596d arch: um: Kconfig: pedantic indention cleanups

Formatting of Kconfig files doesn't look so pretty, so just
take a damp cloth and clean it up.

Signed-off-by: Enrico Weigelt, metux IT consult
Signed-off-by: Richard Weinberger

Enrico Weigelt, metux IT consult
2019-05-08 05:18:28 +0800
5c2ffce1e um: Revert to using stack for pt_regs in signal handling

Reverts commit b6024b21fec8367ef961a771cc9dde31f1831965 and
adjusts default stack sizing to cope with larger size of
floating point save registers on the newer Intel CPUs.

b6024b21fec8367ef961a771cc9dde31f1831965 replaced storing the
register state on the stack with kmalloc-ed storage. That has
a number of issues, including a panic if the allocation fails:
1. kmalloc/ATOMIC can fail. There was a latent hard crash
in all interrupt and fault handling as a result.
2. kmalloc in the interrupt path introduces a considerable
performance penalty for networking (~14% on iperf).

This commit restores uml to a stable state until a better
solution is found.

Signed-off-by: Anton Ivanov
Signed-off-by: Richard Weinberger

Anton Ivanov
2019-05-08 05:18:28 +0800
41bc10cab Merge tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux

Pull stream_open conversion from Kirill Smelkov:

- remove unnecessary double nonseekable_open from drivers/char/dtlk.c
as noticed by Pavel Machek while reviewing nonseekable_open ->
stream_open mass conversion.

- the mass conversion patch promised in commit 10dce8af3422 ("fs:
stream_open - opener for stream-like files so that read and write can
run simultaneously without deadlock"); it is automatically generated
by running

$ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci

I've verified each generated change manually - that it is correct to
convert - and each other nonseekable_open instance left - that it is
either not correct to convert there, or that it is not converted due
to current stream_open.cocci limitations. More details on this in the
patch.

- finally, change VFS to pass ppos=NULL into .read/.write for files
that declare themselves streams. It was suggested by Rasmus Villemoes
and makes sure that if ppos starts to be erroneously used in a stream
file, such a bug won't go unnoticed and will produce an oops instead of
creating the illusion of a position change being taken into account.

Note: this patch does not conflict with "fuse: Add FOPEN_STREAM to
use stream_open()" that will be hopefully coming via FUSE tree,
because fs/fuse/ uses new-style .read_iter/.write_iter, and for these
accessors position is still passed as non-pointer kiocb.ki_pos .

* tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux:
vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files
*: convert stream-like files from nonseekable_open -> stream_open
dtlk: remove double call to nonseekable_open

Linus Torvalds
2019-05-08 03:15:13 +0800