Eric Lee / smarc-fsl-linux-kernel

06 Jan, 2021

1 commit

13f9eec22 s390: always clear kernel stack backchain before calling functions ... Browse Code »

[ Upstream commit 9365965db0c7ca7fc81eee27c21d8522d7102c32 ]

Clear the kernel stack backchain before potentially calling the
lockdep trace_hardirqs_off/on functions. Without this walking the
kernel backchain, e.g. during a panic, might stop too early.

Signed-off-by: Heiko Carstens
Signed-off-by: Sasha Levin

Heiko Carstens
2021-01-06 21:56:55 +0800

30 Dec, 2020

3 commits

24d9a8ef1 s390/idle: fix accounting with machine checks ... Browse Code »

commit 454efcf82ea17d7efeb86ebaa20775a21ec87d27 upstream.

When a machine check interrupt is triggered during idle, the code
is using the async timer/clock for idle time calculation. It should use
the machine check enter timer/clock which is passed to the macro.

Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Cc: # 5.8
Reviewed-by: Heiko Carstens
Signed-off-by: Sven Schnelle
Signed-off-by: Heiko Carstens
Signed-off-by: Greg Kroah-Hartman

Sven Schnelle
2020-12-30 18:54:08 +0800
d5d21549d s390/idle: add missing mt_cycles calculation ... Browse Code »

commit e259b3fafa7de362b04ecd86e7fa9a9e9273e5fb upstream.

During removal of the critical section cleanup the calculation
of mt_cycles during idle was removed. This causes invalid
accounting on systems with SMT enabled.

Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Cc: # 5.8
Reviewed-by: Heiko Carstens
Signed-off-by: Sven Schnelle
Signed-off-by: Heiko Carstens
Signed-off-by: Greg Kroah-Hartman

Sven Schnelle
2020-12-30 18:54:08 +0800
0063e1142 s390/smp: perform initial CPU reset also for SMT siblings ... Browse Code »

commit b5e438ebd7e808d1d2435159ac4742e01a94b8da upstream.

Not resetting the SMT siblings might leave them in unpredictable
state. One of the observed problems was that the CPU timer wasn't
reset and therefore large system time values where accounted during
CPU bringup.

Cc: # 4.0
Fixes: 10ad34bc76dfb ("s390: add SMT support")
Reviewed-by: Heiko Carstens
Signed-off-by: Sven Schnelle
Signed-off-by: Heiko Carstens
Signed-off-by: Greg Kroah-Hartman

Sven Schnelle
2020-12-30 18:54:08 +0800

03 Dec, 2020

1 commit

b1cae1f84 s390: fix irq state tracing ... Browse Code »

With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
tracing") common code calls arch_cpu_idle() with a lockdep state that
tells irqs are on.

This doesn't work very well for s390: psw_idle() will enable interrupts
to wait for an interrupt. As soon as an interrupt occurs the interrupt
handler will verify if the old context was psw_idle(). If that is the
case the interrupt enablement bits in the old program status word will
be cleared.

A subsequent test in both the external as well as the io interrupt
handler checks if in the old context interrupts were enabled. Due to
the above patching of the old program status word it is assumed the
old context had interrupts disabled, and therefore a call to
TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
turn makes lockdep incorrectly "think" that interrupts are enabled
within the interrupt handler.

Fix this by unconditionally calling TRACE_IRQS_OFF when entering
interrupt handlers. Also call unconditionally TRACE_IRQS_ON when
leaving interrupts handlers.

This leaves the special psw_idle() case, which now returns with
interrupts disabled, but has an "irqs on" lockdep state. So callers of
psw_idle() must adjust the state on their own, if required. This is
currently only __udelay_disabled().

Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Heiko Carstens

Heiko Carstens
2020-12-03 01:17:50 +0800

30 Nov, 2020

1 commit

f91a3aa6b Merge tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.

Similar to the entry path the low level idle functions have to be
non-instrumentable"

* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing

Linus Torvalds
2020-11-30 03:19:26 +0800

28 Nov, 2020

1 commit

3913a2bc8 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix alignment of the new HYP sections
- Fix GICR_TYPER access from userspace

S390:
- do not reset the global diag318 data for per-cpu reset
- do not mark memory as protected too early
- fix for destroy page ultravisor call

x86:
- fix for SEV debugging
- fix incorrect return code
- fix for 'noapic' with PIC in userspace and LAPIC in kernel
- fix for 5-level paging"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
KVM: x86: Fix split-irqchip vs interrupt injection window request
KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
MAINTAINERS: Update email address for Sean Christopherson
MAINTAINERS: add uv.c also to KVM/s390
s390/uv: handle destroy page legacy interface
KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
KVM: SVM: fix error return code in svm_create_vcpu()
KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
KVM: arm64: Correctly align nVHE percpu data
KVM: s390: remove diag318 reset code
KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup

Linus Torvalds
2020-11-28 03:04:13 +0800

25 Nov, 2020

1 commit

80145ac2f Merge tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux ... Browse Code »

Pull s390 fix from Heiko Carstens:
"Disable interrupts when restoring fpu and vector registers, otherwise
KVM guests might see corrupted register contents"

* tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix fpu restore in entry.S

Linus Torvalds
2020-11-25 04:15:44 +0800

24 Nov, 2020

1 commit

58c644ba5 sched/idle: Fix arch_cpu_idle() vs tracing ... Browse Code »

We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.

(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)

Reported-by: Sven Schnelle
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Mark Rutland
Tested-by: Mark Rutland
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

Peter Zijlstra
2020-11-24 23:47:35 +0800

23 Nov, 2020

1 commit

1179f170b s390: fix fpu restore in entry.S ... Browse Code »

We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.

The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.

Cc: # 5.8
Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Reported-by: Pierre Morel
Signed-off-by: Sven Schnelle
Acked-by: Christian Borntraeger
Reviewed-by: Heiko Carstens
Signed-off-by: Heiko Carstens

Sven Schnelle
2020-11-23 18:52:13 +0800

19 Nov, 2020

1 commit

79af02af1 Merge tag 'kvm-s390-master-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/… ... Browse Code »

…git/kvms390/linux into kvm-master

KVM: s390: Fix for destroy page ultravisor call

- handle response code from older firmware
- add uv.c to KVM: s390/s390 maintainer list

Paolo Bonzini
2020-11-19 01:04:05 +0800

18 Nov, 2020

2 commits

4c80d0571 s390/uv: handle destroy page legacy interface ... Browse Code »

Older firmware can return rc=0x107 rrc=0xd for destroy page if the
page is already non-secure. This should be handled like a success
as already done by newer firmware.

Signed-off-by: Christian Borntraeger
Fixes: 1a80b54d1ce1 ("s390/uv: add destroy page call")
Reviewed-by: David Hildenbrand
Acked-by: Cornelia Huck
Reviewed-by: Janosch Frank

Christian Borntraeger
2020-11-18 20:08:49 +0800
111e91a6d Merge tag 's390-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux ... Browse Code »

Pull s390 fixes from Heiko Carstens:

- fix system call exit path; avoid return to user space with any
TIF/CIF/PIF set

- fix file permission for cpum_sfb_size parameter

- another small defconfig update

* tag 's390-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_sf.c: fix file permission for cpum_sfb_size
s390: update defconfigs
s390: fix system call exit path

Linus Torvalds
2020-11-18 03:22:03 +0800

12 Nov, 2020

1 commit

78d732e1f s390/cpum_sf.c: fix file permission for cpum_sfb_size ... Browse Code »

This file is installed by the s390 CPU Measurement sampling
facility device driver to export supported minimum and
maximum sample buffer sizes.
This file is read by lscpumf tool to display the details
of the device driver capabilities. The lscpumf tool might
be invoked by a non-root user. In this case it does not
print anything because the file contents can not be read.

Fix this by allowing read access for all users. Reading
the file contents is ok, changing the file contents is
left to the root user only.

For further reference and details see:
[1] https://github.com/ibm-s390-tools/s390-tools/issues/97

Fixes: 69f239ed335a ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur")
Cc: # 3.14
Signed-off-by: Thomas Richter
Acked-by: Sumanth Korikkar
Signed-off-by: Heiko Carstens

Thomas Richter
2020-11-12 19:10:36 +0800

10 Nov, 2020

2 commits

76a4efa80 perf/arch: Remove perf_sample_data::regs_user_copy ... Browse Code »

struct perf_sample_data lives on-stack, we should be careful about it's
size. Furthermore, the pt_regs copy in there is only because x86_64 is a
trainwreck, solve it differently.

Reported-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Steven Rostedt
Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org

Peter Zijlstra
2020-11-10 01:12:34 +0800
267fb2735 perf: Reduce stack usage of perf_output_begin() ... Browse Code »

__perf_output_begin() has an on-stack struct perf_sample_data in the
unlikely case it needs to generate a LOST record. However, every call
to perf_output_begin() must already have a perf_sample_data on-stack.

Reported-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org

Peter Zijlstra
2020-11-10 01:12:33 +0800

09 Nov, 2020

1 commit

ce9dfafe2 s390: fix system call exit path ... Browse Code »

The system call exit path is running with interrupts enabled while
checking for TIF/PIF/CIF bits which require special handling. If all
bits have been checked interrupts are disabled and the kernel exits to
user space.
The problem is that after checking all bits and before interrupts are
disabled bits can be set already again, due to interrupt handling.

This means that the kernel can exit to user space with some
TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED
might be set, which might lead to additional latencies, since that bit
will only be recognized with next exit to user space.

Fix this by checking the corresponding bits only when interrupts are
disabled.

Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Cc: # 5.8
Acked-by: Sven Schnelle
Signed-off-by: Heiko Carstens

Heiko Carstens
2020-11-09 18:16:11 +0800

03 Nov, 2020

2 commits

de5d9dae1 s390/smp: move rcu_cpu_starting() earlier ... Browse Code »

The call to rcu_cpu_starting() in smp_init_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:

WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.

Call Trace:
show_stack+0x158/0x1f0
dump_stack+0x1f2/0x238
__lock_acquire+0x2640/0x4dd0
lock_acquire+0x3a8/0xd08
_raw_spin_lock_irqsave+0xc0/0xf0
clockevents_register_device+0xa8/0x528
init_cpu_timer+0x33e/0x468
smp_init_secondary+0x11a/0x328
smp_start_secondary+0x82/0x88

This is avoided by moving the call to rcu_cpu_starting up near the
beginning of the smp_init_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.

Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Signed-off-by: Qian Cai
Acked-by: Paul E. McKenney
Signed-off-by: Heiko Carstens

Qian Cai
2020-11-03 22:12:16 +0800
cfef9aa69 s390/vdso: remove unused constants ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2020-11-03 22:12:16 +0800

24 Oct, 2020

1 commit

4a22709e2 Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block ... Browse Code »

Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:

- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().

- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"

* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

Linus Torvalds
2020-10-24 01:06:38 +0800

23 Oct, 2020

1 commit

746b25b1a Merge tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild ... Browse Code »

Pull Kbuild updates from Masahiro Yamada:

- Support 'make compile_commands.json' to generate the compilation
database more easily, avoiding stale entries

- Support 'make clang-analyzer' and 'make clang-tidy' for static checks
using clang-tidy

- Preprocess scripts/modules.lds.S to allow CONFIG options in the
module linker script

- Drop cc-option tests from compiler flags supported by our minimal
GCC/Clang versions

- Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y

- Use sha1 build id for both BFD linker and LLD

- Improve deb-pkg for reproducible builds and rootless builds

- Remove stale, useless scripts/namespace.pl

- Turn -Wreturn-type warning into error

- Fix build error of deb-pkg when CONFIG_MODULES=n

- Replace 'hostname' command with more portable 'uname -n'

- Various Makefile cleanups

* tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
kbuild: Use uname for LINUX_COMPILE_HOST detection
kbuild: Only add -fno-var-tracking-assignments for old GCC versions
kbuild: remove leftover comment for filechk utility
treewide: remove DISABLE_LTO
kbuild: deb-pkg: clean up package name variables
kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
kbuild: enforce -Werror=return-type
scripts: remove namespace.pl
builddeb: Add support for all required debian/rules targets
builddeb: Enable rootless builds
builddeb: Pass -n to gzip for reproducible packages
kbuild: split the build log of kallsyms
kbuild: explicitly specify the build id style
scripts/setlocalversion: make git describe output more reliable
kbuild: remove cc-option test of -Werror=date-time
kbuild: remove cc-option test of -fno-stack-check
kbuild: remove cc-option test of -fno-strict-overflow
kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
kbuild: do not create built-in objects for external module builds
...

Linus Torvalds
2020-10-23 04:13:57 +0800

19 Oct, 2020

1 commit

ecb8ac8b1 mm/madvise: introduce process_madvise() syscall: an external memory hinting API ... Browse Code »

There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.

The information required to make the reclaim decision is not known to the
app. Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.

To solve the issue, this patch introduces a new syscall
process_madvise(2). It uses pidfd of an external process to give the
hint. It also supports vector address range because Android app has
thousands of vmas due to zygote so it's totally waste of CPU and power if
we should call the syscall one by one for each vma.(With testing 2000-vma
syscall vs 1-vector syscall, it showed 15% performance improvement. I
think it would be bigger in real practice because the testing ran very
cache friendly environment).

Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations. In
future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment. With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.

ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
UID) gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the API.

I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky. Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone. Thus,
I just limited hints as MADV_[COLD|PAGEOUT] in this patch.

If someone want to add other hints, we could hear the usecase and review
it for each hint. It's safer for maintenance rather than introducing a
buggy syscall but hard to fix it later.

So finally, the API is as follows,

ssize_t process_madvise(int pidfd, const struct iovec *iovec,
unsigned long vlen, int advice, unsigned int flags);

DESCRIPTION
The process_madvise() system call is used to give advice or directions
to the kernel about the address ranges from external process as well as
local process. It provides the advice to address ranges of process
described by iovec and vlen. The goal of such advice is to improve
system or application performance.

The pidfd selects the process referred to by the PID file descriptor
specified in pidfd. (See pidofd_open(2) for further information)

The pointer iovec points to an array of iovec structures, defined in
as:

struct iovec {
void *iov_base; /* starting address */
size_t iov_len; /* number of bytes to be advised */
};

The iovec describes address ranges beginning at address(iov_base)
and with size length of bytes(iov_len).

The vlen represents the number of elements in iovec.

The advice is indicated in the advice argument, which is one of the
following at this moment if the target process specified by pidfd is
external.

MADV_COLD
MADV_PAGEOUT

Permission to provide a hint to external process is governed by a
ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

The process_madvise supports every advice madvise(2) has if target
process is in same thread group with calling process so user could
use process_madvise(2) to extend existing madvise(2) to support
vector address ranges.

RETURN VALUE
On success, process_madvise() returns the number of bytes advised.
This return value may be less than the total number of requested
bytes, if an error occurred. The caller should check return value
to determine whether a partial advice occurred.

FAQ:

Q.1 - Why does any external entity have better knowledge?

Quote from Sandeep

"For Android, every application (including the special SystemServer)
are forked from Zygote. The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.

After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.

In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.

So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.

Besides, we can never rely on applications to clean things up
themselves. We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.

So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.

- ssp

Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?

process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called. If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect. It's the
responsibility of the process calling process_madvise to close this
race condition. For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called. Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process. Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm. The suggested API itself does not provide synchronization. It
also apply other APIs like move_pages, process_vm_write.

The race isn't really a problem though. Why is it so wrong to require
that callers do their own synchronization in some manner? Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something. Think about mmap. It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before. That's where we need synchronization by using other API or
design from userside. It shouldn't be part of API itself. If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3]. Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.

To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.

Q.3 - Why doesn't ptrace work?

Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA. Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill. It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.

[1] https://developer.android.com/topic/performance/memory"

[2] process_getinfo for getting the cookie which is updated whenever
vma of process address layout are changed - Daniel Colascione -
https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

[3] anonymous fd which is used for the object(i.e., address range)
validation - Michal Hocko -
https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

[minchan@kernel.org: fix process_madvise build break for arm64]
Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

Signed-off-by: Minchan Kim
Signed-off-by: YueHaibing
Signed-off-by: Stephen Rothwell
Signed-off-by: Andrew Morton
Reviewed-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Acked-by: David Rientjes
Cc: Alexander Duyck
Cc: Brian Geffon
Cc: Christian Brauner
Cc: Daniel Colascione
Cc: Jann Horn
Cc: Jens Axboe
Cc: Joel Fernandes
Cc: Johannes Weiner
Cc: John Dias
Cc: Kirill Tkhai
Cc: Michal Hocko
Cc: Oleksandr Natalenko
Cc: Sandeep Patil
Cc: SeongJae Park
Cc: SeongJae Park
Cc: Shakeel Butt
Cc: Sonny Rao
Cc: Tim Murray
Cc: Christian Brauner
Cc: Florian Weimer
Cc:
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds

Minchan Kim
2020-10-19 00:27:10 +0800

18 Oct, 2020

1 commit

3c532798e tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume() ... Browse Code »

All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.

Reviewed-by: Oleg Nesterov
Reviewed-by: Thomas Gleixner
Signed-off-by: Jens Axboe

Jens Axboe
2020-10-18 05:04:36 +0800

17 Oct, 2020

1 commit

847d4287a Merge tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux ... Browse Code »

Pull s390 updates from Vasily Gorbik:

- Remove address space overrides using set_fs()

- Convert to generic vDSO

- Convert to generic page table dumper

- Add ARCH_HAS_DEBUG_WX support

- Add leap seconds handling support

- Add NVMe firmware-assisted kernel dump support

- Extend NVMe boot support with memory clearing control and addition of
kernel parameters

- AP bus and zcrypt api code rework. Add adapter configure/deconfigure
interface. Extend debug features. Add failure injection support

- Add ECC secure private keys support

- Add KASan support for running protected virtualization host with
4-level paging

- Utilize destroy page ultravisor call to speed up secure guests
shutdown

- Implement ioremap_wc() and ioremap_prot() with MIO in PCI code

- Various checksum improvements

- Other small various fixes and improvements all over the code

* tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits)
s390/uaccess: fix indentation
s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()
s390/zcrypt: fix wrong format specifications
s390/kprobes: move insn_page to text segment
s390/sie: fix typo in SIGP code description
s390/lib: fix kernel doc for memcmp()
s390/zcrypt: Introduce Failure Injection feature
s390/zcrypt: move ap_msg param one level up the call chain
s390/ap/zcrypt: revisit ap and zcrypt error handling
s390/ap: Support AP card SCLP config and deconfig operations
s390/sclp: Add support for SCLP AP adapter config/deconfig
s390/ap: add card/queue deconfig state
s390/ap: add error response code field for ap queue devices
s390/ap: split ap queue state machine state from device state
s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG
s390/zcrypt: introduce msg tracking in zcrypt functions
s390/startup: correct early pgm check info formatting
s390: remove orphaned extern variables declarations
s390/kasan: make sure int handler always run with DAT on
s390/ipl: add support to control memory clearing for nvme re-IPL
...

Linus Torvalds
2020-10-17 03:36:38 +0800

16 Oct, 2020

1 commit

5a32c3413 Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping ... Browse Code »

Pull dma-mapping updates from Christoph Hellwig:

- rework the non-coherent DMA allocator

- move private definitions out of

- lower CMA_ALIGNMENT (Paul Cercueil)

- remove the omap1 dma address translation in favor of the common code

- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)

- support per-node DMA CMA areas (Barry Song)

- increase the default seg boundary limit (Nicolin Chen)

- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)

- various cleanups

* tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
ARM/ixp4xx: add a missing include of dma-map-ops.h
dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
dma-direct: factor out a dma_direct_alloc_from_pool helper
dma-direct check for highmem pages in dma_direct_alloc_pages
dma-mapping: merge into
dma-mapping: move large parts of to kernel/dma
dma-mapping: move dma-debug.h to kernel/dma/
dma-mapping: remove
dma-mapping: merge into
dma-contiguous: remove dma_contiguous_set_default
dma-contiguous: remove dev_set_cma_area
dma-contiguous: remove dma_declare_contiguous
dma-mapping: split
cma: decrease CMA_ALIGNMENT lower limit to 2
firewire-ohci: use dma_alloc_pages
dma-iommu: implement ->alloc_noncoherent
dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
dma-mapping: add a new dma_alloc_pages API
dma-mapping: remove dma_cache_sync
53c700: convert to dma_alloc_noncoherent
...

Linus Torvalds
2020-10-16 05:43:29 +0800

14 Oct, 2020

2 commits

b10d6bca8 arch, drivers: replace for_each_membock() with for_each_mem_range() ... Browse Code »

There are several occurrences of the following pattern:

for_each_memblock(memory, reg) {
start = __pfn_to_phys(memblock_region_memory_base_pfn(reg);
end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));

/* do something with start and end */
}

Using for_each_mem_range() iterator is more appropriate in such cases and
allows simpler and cleaner code.

[akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
[rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com

Signed-off-by: Mike Rapoport
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Baoquan He
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christoph Hellwig
Cc: Daniel Axtens
Cc: Dave Hansen
Cc: Emil Renner Berthing
Cc: Hari Bathini
Cc: Ingo Molnar
Cc: Ingo Molnar
Cc: Jonathan Cameron
Cc: Marek Szyprowski
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Miguel Ojeda
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Peter Zijlstra
Cc: Russell King
Cc: Stafford Horne
Cc: Thomas Bogendoerfer
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yoshinori Sato
Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
Signed-off-by: Linus Torvalds

Mike Rapoport
2020-10-14 09:38:35 +0800
87c55870f memblock: make memblock_debug and related functionality private ... Browse Code »

The only user of memblock_dbg() outside memblock was s390 setup code and
it is converted to use pr_debug() instead. This allows to stop exposing
memblock_debug and memblock_dbg() to the rest of the kernel.

[akpm@linux-foundation.org: make memblock_dbg() safer and neater]

Signed-off-by: Mike Rapoport
Signed-off-by: Andrew Morton
Reviewed-by: Baoquan He
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christoph Hellwig
Cc: Daniel Axtens
Cc: Dave Hansen
Cc: Emil Renner Berthing
Cc: Hari Bathini
Cc: Ingo Molnar
Cc: Ingo Molnar
Cc: Jonathan Cameron
Cc: Marek Szyprowski
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Miguel Ojeda
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Peter Zijlstra
Cc: Russell King
Cc: Stafford Horne
Cc: Thomas Bogendoerfer
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yoshinori Sato
Link: https://lkml.kernel.org/r/20200818151634.14343-10-rppt@kernel.org
Signed-off-by: Linus Torvalds

Mike Rapoport
2020-10-14 09:38:35 +0800

13 Oct, 2020

5 commits

22230cd2c Merge branch 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull compat mount cleanups from Al Viro:
"The last remnants of mount(2) compat buried by Christoph.

Buried into NFS, that is.

Generally I'm less enthusiastic about "let's use in_compat_syscall()
deep in call chain" kind of approach than Christoph seems to be, but
in this case it's warranted - that had been an NFS-specific wart,
hopefully not to be repeated in any other filesystems (read: any new
filesystem introducing non-text mount options will get NAKed even if
it doesn't mess the layout up).

IOW, not worth trying to grow an infrastructure that would avoid that
use of in_compat_syscall()..."

* 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove compat_sys_mount
fs,nfs: lift compat nfs4 mount data handling into the nfs code
nfs: simplify nfs4_parse_monolithic

Linus Torvalds
2020-10-13 07:44:57 +0800
85ed13e78 Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull compat iovec cleanups from Al Viro:
"Christoph's series around import_iovec() and compat variant thereof"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
security/keys: remove compat_keyctl_instantiate_key_iov
mm: remove compat_process_vm_{readv,writev}
fs: remove compat_sys_vmsplice
fs: remove the compat readv/writev syscalls
fs: remove various compat readv/writev helpers
iov_iter: transparently handle compat iovecs in import_iovec
iov_iter: refactor rw_copy_check_uvector and import_iovec
iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
compat.h: fix a spelling error in

Linus Torvalds
2020-10-13 07:35:51 +0800
1c6890707 Merge tag 'perf-kprobes-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf/kprobes updates from Ingo Molnar:
"This prepares to unify the kretprobe trampoline handler and make
kretprobe lockless (those patches are still work in progress)"

* tag 'perf-kprobes-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
kprobes: Make local functions static
kprobes: Free kretprobe_instance with RCU callback
kprobes: Remove NMI context check
sparc: kprobes: Use generic kretprobe trampoline handler
sh: kprobes: Use generic kretprobe trampoline handler
s390: kprobes: Use generic kretprobe trampoline handler
powerpc: kprobes: Use generic kretprobe trampoline handler
parisc: kprobes: Use generic kretprobe trampoline handler
mips: kprobes: Use generic kretprobe trampoline handler
ia64: kprobes: Use generic kretprobe trampoline handler
csky: kprobes: Use generic kretprobe trampoline handler
arc: kprobes: Use generic kretprobe trampoline handler
arm64: kprobes: Use generic kretprobe trampoline handler
arm: kprobes: Use generic kretprobe trampoline handler
x86/kprobes: Use generic kretprobe trampoline handler
kprobes: Add generic kretprobe trampoline handler

Linus Torvalds
2020-10-13 05:21:15 +0800
34eb62d86 Merge tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull orphan section checking from Ingo Molnar:
"Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle
them (include these bits into the output image vs discarding them
silently) are both highly idiosyncratic and also version dependent.

Instead of this historically problematic mess, this tree by Kees Cook
(et al) adds build time asserts and build time warnings if there's any
orphan section in the kernel or if a section is not sized as expected.

And because we relied on so many silent assumptions in this area, fix
a metric ton of dependencies and some outright bugs related to this,
before we can finally enable the checks on the x86, ARM and ARM64
platforms"

* tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/boot/compressed: Warn on orphan section placement
x86/build: Warn on orphan section placement
arm/boot: Warn on orphan section placement
arm/build: Warn on orphan section placement
arm64/build: Warn on orphan section placement
x86/boot/compressed: Add missing debugging sections to output
x86/boot/compressed: Remove, discard, or assert for unwanted sections
x86/boot/compressed: Reorganize zero-size section asserts
x86/build: Add asserts for unwanted sections
x86/build: Enforce an empty .got.plt section
x86/asm: Avoid generating unused kprobe sections
arm/boot: Handle all sections explicitly
arm/build: Assert for unwanted sections
arm/build: Add missing sections
arm/build: Explicitly keep .ARM.attributes sections
arm/build: Refactor linker script headers
arm64/build: Assert for unwanted sections
arm64/build: Add missing DWARF sections
arm64/build: Use common DISCARDS in linker script
arm64/build: Remove .eh_frame* sections due to unwind tables
...

Linus Torvalds
2020-10-13 04:39:19 +0800
6734e20e3 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux ... Browse Code »

Pull arm64 updates from Will Deacon:
"There's quite a lot of code here, but much of it is due to the
addition of a new PMU driver as well as some arm64-specific selftests
which is an area where we've traditionally been lagging a bit.

In terms of exciting features, this includes support for the Memory
Tagging Extension which narrowly missed 5.9, hopefully allowing
userspace to run with use-after-free detection in production on CPUs
that support it. Work is ongoing to integrate the feature with KASAN
for 5.11.

Another change that I'm excited about (assuming they get the hardware
right) is preparing the ASID allocator for sharing the CPU page-table
with the SMMU. Those changes will also come in via Joerg with the
IOMMU pull.

We do stray outside of our usual directories in a few places, mostly
due to core changes required by MTE. Although much of this has been
Acked, there were a couple of places where we unfortunately didn't get
any review feedback.

Other than that, we ran into a handful of minor conflicts in -next,
but nothing that should post any issues.

Summary:

- Userspace support for the Memory Tagging Extension introduced by
Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

- Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
switching.

- Fix and subsequent rewrite of our Spectre mitigations, including
the addition of support for PR_SPEC_DISABLE_NOEXEC.

- Support for the Armv8.3 Pointer Authentication enhancements.

- Support for ASID pinning, which is required when sharing
page-tables with the SMMU.

- MM updates, including treating flush_tlb_fix_spurious_fault() as a
no-op.

- Perf/PMU driver updates, including addition of the ARM CMN PMU
driver and also support to handle CPU PMU IRQs as NMIs.

- Allow prefetchable PCI BARs to be exposed to userspace using normal
non-cacheable mappings.

- Implementation of ARCH_STACKWALK for unwinding.

- Improve reporting of unexpected kernel traps due to BPF JIT
failure.

- Improve robustness of user-visible HWCAP strings and their
corresponding numerical constants.

- Removal of TEXT_OFFSET.

- Removal of some unused functions, parameters and prototypes.

- Removal of MPIDR-based topology detection in favour of firmware
description.

- Cleanups to handling of SVE and FPSIMD register state in
preparation for potential future optimisation of handling across
syscalls.

- Cleanups to the SDEI driver in preparation for support in KVM.

- Miscellaneous cleanups and refactoring work"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
Revert "arm64: initialize per-cpu offsets earlier"
arm64: random: Remove no longer needed prototypes
arm64: initialize per-cpu offsets earlier
kselftest/arm64: Check mte tagged user address in kernel
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Add utilities and a test to validate mte memory
perf: arm-cmn: Fix conversion specifiers for node type
perf: arm-cmn: Fix unsigned comparison to less than zero
arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
arm64: Get rid of arm64_ssbd_state
KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
KVM: arm64: Get rid of kvm_arm_have_ssbd()
KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
...

Linus Torvalds
2020-10-13 01:00:51 +0800

10 Oct, 2020

1 commit

b61e1f328 s390/kprobes: move insn_page to text segment ... Browse Code »

Move the in-kernel kprobes insn page to text segment. Rationale:
having that page in rw data segment is suboptimal, since as soon as a
kprobe is set, this will split the 1:1 kernel mapping for a single
page which get new permissions.

Note: there is always at least one kprobe present for the kretprobe
trampoline; so the mapping will always be split into smaller 4k
mappings because of this.

Moving the kprobes insn page into text segment makes sure that the
page is mapped RO/X in any case, and avoids that the 1:1 mapping is
split.

The kprobe insn_page is defined as a dummy function which is filled
with "br %r14" instructions.

Signed-off-by: Heiko Carstens
Signed-off-by: Vasily Gorbik

Heiko Carstens
2020-10-10 05:45:30 +0800

09 Oct, 2020

1 commit

a96843372 kbuild: explicitly specify the build id style ... Browse Code »

ld's --build-id defaults to "sha1" style, while lld defaults to "fast".
The build IDs are very different between the two, which may confuse
programs that reference them.

Signed-off-by: Bill Wendling
Acked-by: David S. Miller
Signed-off-by: Masahiro Yamada

Bill Wendling
2020-10-09 22:57:30 +0800

06 Oct, 2020

1 commit

0b1abd1fb dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h> ... Browse Code »

Merge dma-contiguous.h into dma-map-ops.h, after removing the comment
describing the contiguous allocator into kernel/dma/contigous.c.

Signed-off-by: Christoph Hellwig

Christoph Hellwig
2020-10-06 13:07:04 +0800

03 Oct, 2020

3 commits

c3973b401 mm: remove compat_process_vm_{readv,writev} ... Browse Code »

Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2020-10-03 12:02:15 +0800
598b3cec8 fs: remove compat_sys_vmsplice ... Browse Code »

Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2020-10-03 12:02:15 +0800
5f764d624 fs: remove the compat readv/writev syscalls ... Browse Code »

Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2020-10-03 12:02:14 +0800

02 Oct, 2020

2 commits

100a980c1 s390: remove orphaned extern variables declarations ... Browse Code »

arch/s390/kernel/entry.h: suspend_zero_pages - only declaration left
after commit 394216275c7d ("s390: remove broken hibernate / power
management support")

arch/s390/include/asm/setup.h: vmhalt_cmd - only declaration left after
commit 99ca4e582d4a ("[S390] kernel: Shutdown Actions Interface")

arch/s390/include/asm/setup.h: vmpoff_cmd - only declaration left after
commit 99ca4e582d4a ("[S390] kernel: Shutdown Actions Interface")

Signed-off-by: Vasily Gorbik

Vasily Gorbik
2020-10-02 20:40:48 +0800
21a667170 s390/kasan: make sure int handler always run with DAT on ... Browse Code »

Since commit 998f5bbe3dbd ("s390/kasan: fix early pgm check handler
execution") early pgm check handler is executed with DAT on if Kasan
is enabled.

Still there is a window between setup_lowcore_dat_off() and
setup_lowcore_dat_on() when int handlers could be executed with DAT off
under Kasan. If this happens the kernel ends up in pgm check loop due
to Kasan shadow memory access attempts.

With Kasan enabled paging is initialized much earlier and DAT flag has to
be on at all times instrumented code is executed. Make sure int handlers
are set up to be called with DAT on right away in this case.

Signed-off-by: Vasily Gorbik

Vasily Gorbik
2020-10-02 20:40:48 +0800