Eric Lee / smarc-fsl-linux-kernel

13 Jan, 2021

1 commit

2179bae04 local64.h: make <asm/local64.h> mandatory ... Browse Code »

[ Upstream commit 87dbc209ea04645fd2351981f09eff5d23f8e2e9 ]

Make mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include .

This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.

Build-tested on 21 of 25 arch-es. (tools problems on the others)

Yes, we could even rename to
and change all #includes to use
instead.

Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap
Suggested-by: Christoph Hellwig
Reviewed-by: Masahiro Yamada
Cc: Jens Axboe
Cc: Ley Foon Tan
Cc: Mark Salter
Cc: Aurelien Jacquiot
Cc: Peter Zijlstra
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

Randy Dunlap
2021-01-13 03:18:16 +0800

06 Jan, 2021

1 commit

13f9eec22 s390: always clear kernel stack backchain before calling functions ... Browse Code »

[ Upstream commit 9365965db0c7ca7fc81eee27c21d8522d7102c32 ]

Clear the kernel stack backchain before potentially calling the
lockdep trace_hardirqs_off/on functions. Without this walking the
kernel backchain, e.g. during a panic, might stop too early.

Signed-off-by: Heiko Carstens
Signed-off-by: Sasha Levin

Heiko Carstens
2021-01-06 21:56:55 +0800

30 Dec, 2020

5 commits

24d9a8ef1 s390/idle: fix accounting with machine checks ... Browse Code »

commit 454efcf82ea17d7efeb86ebaa20775a21ec87d27 upstream.

When a machine check interrupt is triggered during idle, the code
is using the async timer/clock for idle time calculation. It should use
the machine check enter timer/clock which is passed to the macro.

Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Cc: # 5.8
Reviewed-by: Heiko Carstens
Signed-off-by: Sven Schnelle
Signed-off-by: Heiko Carstens
Signed-off-by: Greg Kroah-Hartman

Sven Schnelle
2020-12-30 18:54:08 +0800
d5d21549d s390/idle: add missing mt_cycles calculation ... Browse Code »

commit e259b3fafa7de362b04ecd86e7fa9a9e9273e5fb upstream.

During removal of the critical section cleanup the calculation
of mt_cycles during idle was removed. This causes invalid
accounting on systems with SMT enabled.

Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Cc: # 5.8
Reviewed-by: Heiko Carstens
Signed-off-by: Sven Schnelle
Signed-off-by: Heiko Carstens
Signed-off-by: Greg Kroah-Hartman

Sven Schnelle
2020-12-30 18:54:08 +0800
bc8f8833e s390/kexec_file: fix diag308 subcode when loading crash kernel ... Browse Code »

commit 613775d62ec60202f98d2c5f520e6e9ba6dd4ac4 upstream.

diag308 subcode 0 performes a clear reset which inlcudes the reset of
all registers in the system. While this is the preferred behavior when
loading a normal kernel via kexec it prevents the crash kernel to store
the register values in the dump. To prevent this use subcode 1 when
loading a crash kernel instead.

Fixes: ee337f5469fd ("s390/kexec_file: Add crash support to image loader")
Cc: # 4.17
Signed-off-by: Philipp Rudo
Reported-by: Xiaoying Yan
Tested-by: Lianbo Jiang
Signed-off-by: Heiko Carstens
Signed-off-by: Greg Kroah-Hartman

Philipp Rudo
2020-12-30 18:54:08 +0800
0063e1142 s390/smp: perform initial CPU reset also for SMT siblings ... Browse Code »

commit b5e438ebd7e808d1d2435159ac4742e01a94b8da upstream.

Not resetting the SMT siblings might leave them in unpredictable
state. One of the observed problems was that the CPU timer wasn't
reset and therefore large system time values where accounted during
CPU bringup.

Cc: # 4.0
Fixes: 10ad34bc76dfb ("s390: add SMT support")
Reviewed-by: Heiko Carstens
Signed-off-by: Sven Schnelle
Signed-off-by: Heiko Carstens
Signed-off-by: Greg Kroah-Hartman

Sven Schnelle
2020-12-30 18:54:08 +0800
185640586 s390/test_unwind: fix CALL_ON_STACK tests ... Browse Code »

[ Upstream commit f22b9c219a798e1bf11110a3d2733d883e6da059 ]

The CALL_ON_STACK tests use the no_dat stack to switch to a different
stack for unwinding tests. If an interrupt or machine check happens
while using that stack, and previously being on the async stack, the
interrupt / machine check entry code (SWITCH_ASYNC) will assume that
the previous context did not use the async stack and happily use the
async stack again.

This will lead to stack corruption of the previous context.

To solve this disable both interrupts and machine checks before
switching to the no_dat stack.

Fixes: 7868249fbbc8 ("s390/test_unwind: add CALL_ON_STACK tests")
Signed-off-by: Heiko Carstens
Signed-off-by: Sasha Levin

Heiko Carstens
2020-12-30 18:53:56 +0800

03 Dec, 2020

2 commits

b1cae1f84 s390: fix irq state tracing ... Browse Code »

With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
tracing") common code calls arch_cpu_idle() with a lockdep state that
tells irqs are on.

This doesn't work very well for s390: psw_idle() will enable interrupts
to wait for an interrupt. As soon as an interrupt occurs the interrupt
handler will verify if the old context was psw_idle(). If that is the
case the interrupt enablement bits in the old program status word will
be cleared.

A subsequent test in both the external as well as the io interrupt
handler checks if in the old context interrupts were enabled. Due to
the above patching of the old program status word it is assumed the
old context had interrupts disabled, and therefore a call to
TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
turn makes lockdep incorrectly "think" that interrupts are enabled
within the interrupt handler.

Fix this by unconditionally calling TRACE_IRQS_OFF when entering
interrupt handlers. Also call unconditionally TRACE_IRQS_ON when
leaving interrupts handlers.

This leaves the special psw_idle() case, which now returns with
interrupts disabled, but has an "irqs on" lockdep state. So callers of
psw_idle() must adjust the state on their own, if required. This is
currently only __udelay_disabled().

Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Heiko Carstens

Heiko Carstens
2020-12-03 01:17:50 +0800
a2bd4097b s390/pci: fix CPU address in MSI for directed IRQ ... Browse Code »

The directed MSIs are delivered to CPUs whose address is
written to the MSI message address. The current code assumes
that a CPU logical number (as it is seen by the kernel)
is also the CPU address.

The above assumption is not correct, as the CPU address
is rather the value returned by STAP instruction. That
value does not necessarily match the kernel logical CPU
number.

Fixes: e979ce7bced2 ("s390/pci: provide support for CPU directed interrupts")
Cc: # v5.2+
Signed-off-by: Alexander Gordeev
Reviewed-by: Halil Pasic
Reviewed-by: Niklas Schnelle
Signed-off-by: Niklas Schnelle
Signed-off-by: Heiko Carstens

Alexander Gordeev
2020-12-03 01:17:50 +0800

30 Nov, 2020

1 commit

f91a3aa6b Merge tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.

Similar to the entry path the low level idle functions have to be
non-instrumentable"

* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing

Linus Torvalds
2020-11-30 03:19:26 +0800

28 Nov, 2020

1 commit

3913a2bc8 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix alignment of the new HYP sections
- Fix GICR_TYPER access from userspace

S390:
- do not reset the global diag318 data for per-cpu reset
- do not mark memory as protected too early
- fix for destroy page ultravisor call

x86:
- fix for SEV debugging
- fix incorrect return code
- fix for 'noapic' with PIC in userspace and LAPIC in kernel
- fix for 5-level paging"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
KVM: x86: Fix split-irqchip vs interrupt injection window request
KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
MAINTAINERS: Update email address for Sean Christopherson
MAINTAINERS: add uv.c also to KVM/s390
s390/uv: handle destroy page legacy interface
KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
KVM: SVM: fix error return code in svm_create_vcpu()
KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
KVM: arm64: Correctly align nVHE percpu data
KVM: s390: remove diag318 reset code
KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup

Linus Torvalds
2020-11-28 03:04:13 +0800

25 Nov, 2020

1 commit

80145ac2f Merge tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux ... Browse Code »

Pull s390 fix from Heiko Carstens:
"Disable interrupts when restoring fpu and vector registers, otherwise
KVM guests might see corrupted register contents"

* tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix fpu restore in entry.S

Linus Torvalds
2020-11-25 04:15:44 +0800

24 Nov, 2020

1 commit

58c644ba5 sched/idle: Fix arch_cpu_idle() vs tracing ... Browse Code »

We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.

(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)

Reported-by: Sven Schnelle
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Mark Rutland
Tested-by: Mark Rutland
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

Peter Zijlstra
2020-11-24 23:47:35 +0800

23 Nov, 2020

1 commit

1179f170b s390: fix fpu restore in entry.S ... Browse Code »

We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.

The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.

Cc: # 5.8
Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Reported-by: Pierre Morel
Signed-off-by: Sven Schnelle
Acked-by: Christian Borntraeger
Reviewed-by: Heiko Carstens
Signed-off-by: Heiko Carstens

Sven Schnelle
2020-11-23 18:52:13 +0800

19 Nov, 2020

1 commit

79af02af1 Merge tag 'kvm-s390-master-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/… ... Browse Code »

…git/kvms390/linux into kvm-master

KVM: s390: Fix for destroy page ultravisor call

- handle response code from older firmware
- add uv.c to KVM: s390/s390 maintainer list

Paolo Bonzini
2020-11-19 01:04:05 +0800

18 Nov, 2020

2 commits

4c80d0571 s390/uv: handle destroy page legacy interface ... Browse Code »

Older firmware can return rc=0x107 rrc=0xd for destroy page if the
page is already non-secure. This should be handled like a success
as already done by newer firmware.

Signed-off-by: Christian Borntraeger
Fixes: 1a80b54d1ce1 ("s390/uv: add destroy page call")
Reviewed-by: David Hildenbrand
Acked-by: Cornelia Huck
Reviewed-by: Janosch Frank

Christian Borntraeger
2020-11-18 20:08:49 +0800
111e91a6d Merge tag 's390-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux ... Browse Code »

Pull s390 fixes from Heiko Carstens:

- fix system call exit path; avoid return to user space with any
TIF/CIF/PIF set

- fix file permission for cpum_sfb_size parameter

- another small defconfig update

* tag 's390-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_sf.c: fix file permission for cpum_sfb_size
s390: update defconfigs
s390: fix system call exit path

Linus Torvalds
2020-11-18 03:22:03 +0800

17 Nov, 2020

1 commit

d4d3c84d7 Merge tag 'kvm-s390-master-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/… ... Browse Code »

…git/kvms390/linux into kvm-master

KVM: s390: Fixes for 5.10

- do not reset the global diag318 data for per-cpu reset
- do not mark memory as protected too early

Paolo Bonzini
2020-11-17 02:18:22 +0800

12 Nov, 2020

2 commits

78d732e1f s390/cpum_sf.c: fix file permission for cpum_sfb_size ... Browse Code »

This file is installed by the s390 CPU Measurement sampling
facility device driver to export supported minimum and
maximum sample buffer sizes.
This file is read by lscpumf tool to display the details
of the device driver capabilities. The lscpumf tool might
be invoked by a non-root user. In this case it does not
print anything because the file contents can not be read.

Fix this by allowing read access for all users. Reading
the file contents is ok, changing the file contents is
left to the root user only.

For further reference and details see:
[1] https://github.com/ibm-s390-tools/s390-tools/issues/97

Fixes: 69f239ed335a ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur")
Cc: # 3.14
Signed-off-by: Thomas Richter
Acked-by: Sumanth Korikkar
Signed-off-by: Heiko Carstens

Thomas Richter
2020-11-12 19:10:36 +0800
966e7ea43 s390: update defconfigs ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2020-11-12 19:10:36 +0800

11 Nov, 2020

2 commits

6cbf1e960 KVM: s390: remove diag318 reset code ... Browse Code »

The diag318 data must be set to 0 by VM-wide reset events
triggered by diag308. As such, KVM should not handle
resetting this data via the VCPU ioctls.

Fixes: 23a60f834406 ("s390/kvm: diagnose 0x318 sync and reset")
Signed-off-by: Collin Walling
Reviewed-by: Christian Borntraeger
Reviewed-by: Janosch Frank
Acked-by: Cornelia Huck
Signed-off-by: Christian Borntraeger
Link: https://lore.kernel.org/r/20201104181032.109800-1-walling@linux.ibm.com

Collin Walling
2020-11-11 16:31:52 +0800
1ed576a20 KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup ... Browse Code »

We can only have protected guest pages after a successful set secure
parameters call as only then the UV allows imports and unpacks.

By moving the test we can now also check for it in s390_reset_acc()
and do an early return if it is 0.

Signed-off-by: Janosch Frank
Fixes: 29b40f105ec8 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling")
Reviewed-by: Cornelia Huck
Signed-off-by: Christian Borntraeger

Janosch Frank
2020-11-11 16:31:48 +0800

10 Nov, 2020

2 commits

76a4efa80 perf/arch: Remove perf_sample_data::regs_user_copy ... Browse Code »

struct perf_sample_data lives on-stack, we should be careful about it's
size. Furthermore, the pt_regs copy in there is only because x86_64 is a
trainwreck, solve it differently.

Reported-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Steven Rostedt
Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org

Peter Zijlstra
2020-11-10 01:12:34 +0800
267fb2735 perf: Reduce stack usage of perf_output_begin() ... Browse Code »

__perf_output_begin() has an on-stack struct perf_sample_data in the
unlikely case it needs to generate a LOST record. However, every call
to perf_output_begin() must already have a perf_sample_data on-stack.

Reported-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org

Peter Zijlstra
2020-11-10 01:12:33 +0800

09 Nov, 2020

1 commit

ce9dfafe2 s390: fix system call exit path ... Browse Code »

The system call exit path is running with interrupts enabled while
checking for TIF/PIF/CIF bits which require special handling. If all
bits have been checked interrupts are disabled and the kernel exits to
user space.
The problem is that after checking all bits and before interrupts are
disabled bits can be set already again, due to interrupt handling.

This means that the kernel can exit to user space with some
TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED
might be set, which might lead to additional latencies, since that bit
will only be recognized with next exit to user space.

Fix this by checking the corresponding bits only when interrupts are
disabled.

Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Cc: # 5.8
Acked-by: Sven Schnelle
Signed-off-by: Heiko Carstens

Heiko Carstens
2020-11-09 18:16:11 +0800

03 Nov, 2020

6 commits

0b2ca2c7d s390/pci: fix hot-plug of PCI function missing bus ... Browse Code »

Under some circumstances in particular with "Reconfigure I/O Path"
a zPCI function may first appear in Standby through a PCI event with
PEC 0x0302 which initially makes it visible to the zPCI subsystem,
Only after that is it configured with a zPCI event with PEC 0x0301.
If the zbus is still missing a PCI function zero (devfn == 0) when the
PCI event 0x0301 is handled zdev->zbus->bus is still NULL and gets
dereferenced in common code.
Check for this case and enable but don't scan the zPCI function.
This matches what would happen if we immediately got the 0x0301
configuration request or the function was included in CLP List PCI.
In all cases the PCI functions with devfn != 0 will be scanned once
function 0 appears.

Fixes: 3047766bc6ec ("s390/pci: fix enabling a reserved PCI function")
Cc: # 5.8
Signed-off-by: Niklas Schnelle
Acked-by: Pierre Morel
Signed-off-by: Heiko Carstens

Niklas Schnelle
2020-11-03 22:12:16 +0800
de5d9dae1 s390/smp: move rcu_cpu_starting() earlier ... Browse Code »

The call to rcu_cpu_starting() in smp_init_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:

WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.

Call Trace:
show_stack+0x158/0x1f0
dump_stack+0x1f2/0x238
__lock_acquire+0x2640/0x4dd0
lock_acquire+0x3a8/0xd08
_raw_spin_lock_irqsave+0xc0/0xf0
clockevents_register_device+0xa8/0x528
init_cpu_timer+0x33e/0x468
smp_init_secondary+0x11a/0x328
smp_start_secondary+0x82/0x88

This is avoided by moving the call to rcu_cpu_starting up near the
beginning of the smp_init_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.

Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Signed-off-by: Qian Cai
Acked-by: Paul E. McKenney
Signed-off-by: Heiko Carstens

Qian Cai
2020-11-03 22:12:16 +0800
c3d9cdca7 s390: update defconfigs ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2020-11-03 22:12:16 +0800
cfef9aa69 s390/vdso: remove unused constants ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2020-11-03 22:12:16 +0800
e99198661 s390/vdso: remove empty unused file ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2020-11-03 22:12:15 +0800
b0e98aa9c s390/mm: make pmd/pud_deref() large page aware ... Browse Code »

pmd/pud_deref() assume that they will never operate on large pmd/pud
entries, and therefore only use the non-large _xxx_ENTRY_ORIGIN mask.
With commit 9ec8fa8dc331b ("s390/vmemmap: extend modify_pagetable()
to handle vmemmap"), that assumption is no longer true, at least for
pmd_deref().

In theory, we could end up with wrong addresses because some of the
non-address bits of a large entry would not be masked out.
In practice, this does not (yet) show any impact, because vmemmap_free()
is currently never used for s390.

Fix pmd/pud_deref() to check for the entry type and use the
_xxx_ENTRY_ORIGIN_LARGE mask for large entries.

While at it, also move pmd/pud_pfn() around, in order to avoid code
duplication, because they do the same thing.

Fixes: 9ec8fa8dc331b ("s390/vmemmap: extend modify_pagetable() to handle vmemmap")
Cc: # 5.9
Signed-off-by: Gerald Schaefer
Reviewed-by: Alexander Gordeev
Signed-off-by: Heiko Carstens

Gerald Schaefer
2020-11-03 22:12:15 +0800

26 Oct, 2020

2 commits

8e90b4b13 s390: correct __bootdata / __bootdata_preserved macros ... Browse Code »

Currently s390 build is broken.

SECTCMP .boot.data
error: section .boot.data differs between vmlinux and arch/s390/boot/compressed/vmlinux
make[2]: *** [arch/s390/boot/section_cmp.boot.data] Error 1
SECTCMP .boot.preserved.data
error: section .boot.preserved.data differs between vmlinux and arch/s390/boot/compressed/vmlinux
make[2]: *** [arch/s390/boot/section_cmp.boot.preserved.data] Error 1
make[1]: *** [bzImage] Error 2

Commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") converted all __section(foo) to __section("foo").
This is wrong for __bootdata / __bootdata_preserved macros which want
variable names to be a part of intermediate section names .boot.data.and .boot.preserved.data.. Those sections are later sorted by alignment + name and merged together into final .boot.data / .boot.preserved.data sections. Those sections must be identical in the decompressor and the decompressed kernel (that is checked during the build).

Fixes: 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")")
Signed-off-by: Vasily Gorbik
Signed-off-by: Heiko Carstens

Vasily Gorbik
2020-10-26 21:18:01 +0800
33def8498 treewide: Convert macro and uses of __section(foo) to __section("foo") ... Browse Code »

Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.

Remove the quote operator # from compiler_attributes.h __section macro.

Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.

Conversion done using the script at:

https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

Signed-off-by: Joe Perches
Reviewed-by: Nick Desaulniers
Reviewed-by: Miguel Ojeda
Signed-off-by: Linus Torvalds

Joe Perches
2020-10-26 05:51:49 +0800

24 Oct, 2020

2 commits

9313f8026 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost ... Browse Code »

Pull virtio updates from Michael Tsirkin:
"vhost, vdpa, and virtio cleanups and fixes

A very quiet cycle, no new features"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
MAINTAINERS: add URL for virtio-mem
vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
vringh: fix __vringh_iov() when riov and wiov are different
vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
s390: virtio: PV needs VIRTIO I/O device protection
virtio: let arch advertise guest's memory access restrictions
vhost_vdpa: Fix duplicate included kernel.h
vhost: reduce stack usage in log_used
virtio-mem: Constify mem_id_table
virtio_input: Constify id_table
virtio-balloon: Constify id_table
vdpa/mlx5: Fix failure to bring link up
vdpa/mlx5: Make use of a specific 16 bit endianness API

Linus Torvalds
2020-10-24 02:00:57 +0800
4a22709e2 Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block ... Browse Code »

Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:

- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().

- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"

* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

Linus Torvalds
2020-10-24 01:06:38 +0800

23 Oct, 2020

3 commits

746b25b1a Merge tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild ... Browse Code »

Pull Kbuild updates from Masahiro Yamada:

- Support 'make compile_commands.json' to generate the compilation
database more easily, avoiding stale entries

- Support 'make clang-analyzer' and 'make clang-tidy' for static checks
using clang-tidy

- Preprocess scripts/modules.lds.S to allow CONFIG options in the
module linker script

- Drop cc-option tests from compiler flags supported by our minimal
GCC/Clang versions

- Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y

- Use sha1 build id for both BFD linker and LLD

- Improve deb-pkg for reproducible builds and rootless builds

- Remove stale, useless scripts/namespace.pl

- Turn -Wreturn-type warning into error

- Fix build error of deb-pkg when CONFIG_MODULES=n

- Replace 'hostname' command with more portable 'uname -n'

- Various Makefile cleanups

* tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
kbuild: Use uname for LINUX_COMPILE_HOST detection
kbuild: Only add -fno-var-tracking-assignments for old GCC versions
kbuild: remove leftover comment for filechk utility
treewide: remove DISABLE_LTO
kbuild: deb-pkg: clean up package name variables
kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
kbuild: enforce -Werror=return-type
scripts: remove namespace.pl
builddeb: Add support for all required debian/rules targets
builddeb: Enable rootless builds
builddeb: Pass -n to gzip for reproducible packages
kbuild: split the build log of kallsyms
kbuild: explicitly specify the build id style
scripts/setlocalversion: make git describe output more reliable
kbuild: remove cc-option test of -Werror=date-time
kbuild: remove cc-option test of -fno-stack-check
kbuild: remove cc-option test of -fno-strict-overflow
kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
kbuild: do not create built-in objects for external module builds
...

Linus Torvalds
2020-10-23 04:13:57 +0800
fc996db97 Merge tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio ... Browse Code »

Pull VFIO updates from Alex Williamson:

- New fsl-mc vfio bus driver supporting userspace drivers of objects
within NXP's DPAA2 architecture (Diana Craciun)

- Support for exposing zPCI information on s390 (Matthew Rosato)

- Fixes for "detached" VFs on s390 (Matthew Rosato)

- Fixes for pin-pages and dma-rw accesses (Yan Zhao)

- Cleanups and optimize vconfig regen (Zenghui Yu)

- Fix duplicate irq-bypass token registration (Alex Williamson)

* tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages
vfio/pci: Clear token on bypass registration failure
vfio/fsl-mc: fix the return of the uninitialized variable ret
vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger
vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit
MAINTAINERS: Add entry for s390 vfio-pci
vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO
vfio/fsl-mc: Add support for device reset
vfio/fsl-mc: Add read/write support for fsl-mc devices
vfio/fsl-mc: trigger an interrupt via eventfd
vfio/fsl-mc: Add irq infrastructure for fsl-mc devices
vfio/fsl-mc: Added lock support in preparation for interrupt handling
vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions
vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call
vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl
vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind
vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO
s390/pci: track whether util_str is valid in the zpci_dev
s390/pci: stash version in the zpci_dev
vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices
...

Linus Torvalds
2020-10-23 04:00:44 +0800
f56e65dff Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"

* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode

Linus Torvalds
2020-10-23 00:59:21 +0800

21 Oct, 2020

1 commit

4ce1cf7b0 s390: virtio: PV needs VIRTIO I/O device protection ... Browse Code »

If protected virtualization is active on s390, VIRTIO has only retricted
access to the guest memory.
Define CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS and export
arch_has_restricted_virtio_memory_access to advertize VIRTIO if that's
the case.

Signed-off-by: Pierre Morel
Reviewed-by: Cornelia Huck
Reviewed-by: Halil Pasic
Link: https://lore.kernel.org/r/1599728030-17085-3-git-send-email-pmorel@linux.ibm.com
Signed-off-by: Michael S. Tsirkin
Acked-by: Christian Borntraeger

Pierre Morel
2020-10-21 22:34:13 +0800

19 Oct, 2020

1 commit

ecb8ac8b1 mm/madvise: introduce process_madvise() syscall: an external memory hinting API ... Browse Code »

There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.

The information required to make the reclaim decision is not known to the
app. Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.

To solve the issue, this patch introduces a new syscall
process_madvise(2). It uses pidfd of an external process to give the
hint. It also supports vector address range because Android app has
thousands of vmas due to zygote so it's totally waste of CPU and power if
we should call the syscall one by one for each vma.(With testing 2000-vma
syscall vs 1-vector syscall, it showed 15% performance improvement. I
think it would be bigger in real practice because the testing ran very
cache friendly environment).

Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations. In
future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment. With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.

ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
UID) gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the API.

I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky. Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone. Thus,
I just limited hints as MADV_[COLD|PAGEOUT] in this patch.

If someone want to add other hints, we could hear the usecase and review
it for each hint. It's safer for maintenance rather than introducing a
buggy syscall but hard to fix it later.

So finally, the API is as follows,

ssize_t process_madvise(int pidfd, const struct iovec *iovec,
unsigned long vlen, int advice, unsigned int flags);

DESCRIPTION
The process_madvise() system call is used to give advice or directions
to the kernel about the address ranges from external process as well as
local process. It provides the advice to address ranges of process
described by iovec and vlen. The goal of such advice is to improve
system or application performance.

The pidfd selects the process referred to by the PID file descriptor
specified in pidfd. (See pidofd_open(2) for further information)

The pointer iovec points to an array of iovec structures, defined in
as:

struct iovec {
void *iov_base; /* starting address */
size_t iov_len; /* number of bytes to be advised */
};

The iovec describes address ranges beginning at address(iov_base)
and with size length of bytes(iov_len).

The vlen represents the number of elements in iovec.

The advice is indicated in the advice argument, which is one of the
following at this moment if the target process specified by pidfd is
external.

MADV_COLD
MADV_PAGEOUT

Permission to provide a hint to external process is governed by a
ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

The process_madvise supports every advice madvise(2) has if target
process is in same thread group with calling process so user could
use process_madvise(2) to extend existing madvise(2) to support
vector address ranges.

RETURN VALUE
On success, process_madvise() returns the number of bytes advised.
This return value may be less than the total number of requested
bytes, if an error occurred. The caller should check return value
to determine whether a partial advice occurred.

FAQ:

Q.1 - Why does any external entity have better knowledge?

Quote from Sandeep

"For Android, every application (including the special SystemServer)
are forked from Zygote. The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.

After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.

In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.

So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.

Besides, we can never rely on applications to clean things up
themselves. We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.

So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.

- ssp

Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?

process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called. If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect. It's the
responsibility of the process calling process_madvise to close this
race condition. For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called. Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process. Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm. The suggested API itself does not provide synchronization. It
also apply other APIs like move_pages, process_vm_write.

The race isn't really a problem though. Why is it so wrong to require
that callers do their own synchronization in some manner? Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something. Think about mmap. It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before. That's where we need synchronization by using other API or
design from userside. It shouldn't be part of API itself. If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3]. Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.

To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.

Q.3 - Why doesn't ptrace work?

Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA. Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill. It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.

[1] https://developer.android.com/topic/performance/memory"

[2] process_getinfo for getting the cookie which is updated whenever
vma of process address layout are changed - Daniel Colascione -
https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

[3] anonymous fd which is used for the object(i.e., address range)
validation - Michal Hocko -
https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

[minchan@kernel.org: fix process_madvise build break for arm64]
Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

Signed-off-by: Minchan Kim
Signed-off-by: YueHaibing
Signed-off-by: Stephen Rothwell
Signed-off-by: Andrew Morton
Reviewed-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Acked-by: David Rientjes
Cc: Alexander Duyck
Cc: Brian Geffon
Cc: Christian Brauner
Cc: Daniel Colascione
Cc: Jann Horn
Cc: Jens Axboe
Cc: Joel Fernandes
Cc: Johannes Weiner
Cc: John Dias
Cc: Kirill Tkhai
Cc: Michal Hocko
Cc: Oleksandr Natalenko
Cc: Sandeep Patil
Cc: SeongJae Park
Cc: SeongJae Park
Cc: Shakeel Butt
Cc: Sonny Rao
Cc: Tim Murray
Cc: Christian Brauner
Cc: Florian Weimer
Cc:
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds

Minchan Kim
2020-10-19 00:27:10 +0800