Eric Lee / smarc-fsl-linux-kernel

21 Feb, 2019

1 commit

b38f6c502 XArray: Fix xa_release in allocating arrays ... Browse Code »

xa_cmpxchg() was a little too magic in turning ZERO entries into NULL,
and would leave the entry set to the ZERO entry instead of releasing
it for future use. After careful review of existing users of
xa_cmpxchg(), change the semantics so that it does not translate either
incoming argument from NULL into ZERO entries.

Add several tests to the test-suite to make sure this problem doesn't
come back.

Reported-by: Jason Gunthorpe
Signed-off-by: Matthew Wilcox

Matthew Wilcox
2019-02-21 06:08:54 +0800

09 Feb, 2019

1 commit

f818b82b8 XArray: Mark xa_insert and xa_reserve as must_check ... Browse Code »

If the user doesn't care about the return value from xa_insert(), then
they should be using xa_store() instead. The point of xa_reserve() is
to get the return value early before taking another lock, so this should
also be __must_check.

Signed-off-by: Matthew Wilcox

Matthew Wilcox
2019-02-09 13:00:49 +0800

07 Feb, 2019

4 commits

2fa044e51 XArray: Add cyclic allocation ... Browse Code »

This differs slightly from the IDR equivalent in five ways.

1. It can allocate up to UINT_MAX instead of being limited to INT_MAX,
like xa_alloc(). Also like xa_alloc(), it will write to the 'id'
pointer before placing the entry in the XArray.
2. The 'next' cursor is allocated separately from the XArray instead
of being part of the IDR. This saves memory for all the users which
do not use the cyclic allocation API and suits some users better.
3. It returns -EBUSY instead of -ENOSPC.
4. It will attempt to wrap back to the minimum value on memory allocation
failure as well as on an -EBUSY error, assuming that a user would
rather allocate a small ID than suffer an ID allocation failure.
5. It reports whether it has wrapped, which is important to some users.

Signed-off-by: Matthew Wilcox

Matthew Wilcox
2019-02-07 02:32:25 +0800
a3e4d3f97 XArray: Redesign xa_alloc API ... Browse Code »

It was too easy to forget to initialise the start index. Add an
xa_limit data structure which can be used to pass min & max, and
define a couple of special values for common cases. Also add some
more tests cribbed from the IDR test suite. Change the return value
from -ENOSPC to -EBUSY to match xa_insert().

Signed-off-by: Matthew Wilcox

Matthew Wilcox
2019-02-07 02:32:23 +0800
3ccaf57a6 XArray: Add support for 1s-based allocation ... Browse Code »

A lot of places want to allocate IDs starting at 1 instead of 0.
While the xa_alloc() API supports this, it's not very efficient if lots
of IDs are allocated, due to having to walk down to the bottom of the
tree to see if ID 1 is available, then all the way over to the next
non-allocated ID. This method marks ID 0 as being occupied which wastes
one slot in the XArray, but preserves xa_empty() as working.

Signed-off-by: Matthew Wilcox

Matthew Wilcox
2019-02-07 02:13:24 +0800
fd9dc93e3 XArray: Change xa_insert to return -EBUSY ... Browse Code »

Userspace translates EEXIST to "File exists" which isn't a very good
error message for the problem. "Device or resource busy" is a better
indication of what went wrong.

Signed-off-by: Matthew Wilcox

Matthew Wilcox
2019-02-07 02:12:15 +0800

05 Feb, 2019

2 commits

809ab9371 XArray: Update xa_erase family descriptions ... Browse Code »

xa_erase does not allocate memory and doesn't have a gfp parameter.
Update the descriptions of all four variants to be more useful.

Signed-off-by: Matthew Wilcox

Matthew Wilcox
2019-02-05 12:16:58 +0800
bd54211b8 XArray tests: RCU lock prohibits GFP_KERNEL ... Browse Code »

Drop and reacquire the RCU read lock while using GFP_KERNEL.

Reported-by: Li RongQing
Signed-off-by: Matthew Wilcox

Matthew Wilcox
2019-02-05 12:15:30 +0800

04 Feb, 2019

6 commits

8834f5600 Linux 5.0-rc5 Browse Code »

Linus Torvalds
2019-02-04 05:48:04 +0800
24b888d8d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Thomas Gleixner:
"A few updates for x86:

- Fix an unintended sign extension issue in the fault handling code

- Rename the new resource control config switch so it's less
confusing

- Avoid setting up EFI info in kexec when the EFI runtime is
disabled.

- Fix the microcode version check in the AMD microcode loader so it
only loads higher version numbers and never downgrades

- Set EFER.LME in the 32bit trampoline before returning to long mode
to handle older AMD/KVM behaviour properly.

- Add Darren and Andy as x86/platform reviewers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Avoid confusion over the new X86_RESCTRL config
x86/kexec: Don't setup EFI info if EFI runtime is not enabled
x86/microcode/amd: Don't falsely trick the late loading mechanism
MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers
x86/fault: Fix sign-extend unintended sign extension
x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode
x86/cpu: Add Atom Tremont (Jacobsville)

Linus Torvalds
2019-02-04 01:08:12 +0800
cc6810e36 Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull cpu hotplug fixes from Thomas Gleixner:
"Two fixes for the cpu hotplug machinery:

- Replace the overly clever 'SMT disabled by BIOS' detection logic as
it breaks KVM scenarios and prevents speculation control updates
when the Hyperthreads are brought online late after boot.

- Remove a redundant invocation of the speculation control update
function"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
x86/speculation: Remove redundant arch_smt_update() invocation

Linus Torvalds
2019-02-04 01:02:03 +0800
58f6d4287 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Thomas Gleixner:
"A pile of perf updates:

- Fix broken sanity check in the /proc/sys/kernel/perf_cpu_time_max_percent
write handler

- Cure a perf script crash which caused by an unitinialized data
structure

- Highlight the hottest instruction in perf top and not a random one

- Cure yet another clang issue when building perf python

- Handle topology entries with no CPU correctly in the tools

- Handle perf data which contains both tracepoints and performance
counter entries correctly.

- Add a missing NULL pointer check in perf ordered_events_free()"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf script: Fix crash when processing recorded stat data
perf top: Fix wrong hottest instruction highlighted
perf tools: Handle TOPOLOGY headers with no CPU
perf python: Remove -fstack-clash-protection when building with some clang versions
perf core: Fix perf_proc_update_handler() bug
perf script: Fix crash with printing mixed trace point and other events
perf ordered_events: Fix crash in ordered_events__free

Linus Torvalds
2019-02-04 00:59:51 +0800
89401be65 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull EFI fix from Thomas Gleixner:
"The dump info for the efi page table debugging lacks a terminator
which causes the kernel to crash when the debugfile is read"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/arm64: Fix debugfs crash by adding a terminator for ptdump marker

Linus Torvalds
2019-02-04 00:57:05 +0800
312b3a93d Merge tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fixes from David Sterba:

- regression fix: transaction commit can run away due to delayed ref
waiting heuristic, this is not necessary now because of the proper
reservation mechanism introduced in 5.0

- regression fix: potential crash due to use-before-check of an ERR_PTR
return value

- fix for transaction abort during transaction commit that needs to
properly clean up pending block groups

- fix deadlock during b-tree node/leaf splitting, when this happens on
some of the fundamental trees, we must prevent new tree block
allocation to re-enter indirectly via the block group flushing path

- potential memory leak after errors during mount

* tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: On error always free subvol_name in btrfs_mount
btrfs: clean up pending block groups when transaction commit aborts
btrfs: fix potential oops in device_list_add
btrfs: don't end the transaction for delayed refs in throttle
Btrfs: fix deadlock when allocating tree block during leaf/node split

Linus Torvalds
2019-02-04 00:48:33 +0800

03 Feb, 2019

5 commits

12491ed35 Merge tag 'devicetree-fixes-for-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux ... Browse Code »

Pull Devicetree fix from Rob Herring:
"A single fix for building DT bindings in-tree"

* tag 'devicetree-fixes-for-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: Fix dt_binding_check target for in tree builds

Linus Torvalds
2019-02-03 02:34:32 +0800
74b13e7ef Merge tag 'riscv-for-linus-5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/palmer/riscv-linux

Pull RISC-V fixes from Palmer Dabbelt:
"This contains a handful of mostly-independent patches:

- make our port respect TIF_NEED_RESCHED, which fixes
CONFIG_PREEMPT=y kernels

- fix double-put of OF nodes

- fix a misspelling of target in our Kconfig

- generic PCIe is enabled in our defconfig

- fix our SBI early console to properly handle line
endings

- fix max_low_pfn being counted in PFNs

- a change to TASK_UNMAPPED_BASE to match what other
arches do

This has passed my standard 'boot Fedora' flow"

* tag 'riscv-for-linus-5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
riscv: Adjust mmap base address at a third of task size
riscv: fixup max_low_pfn with PFN_DOWN.
tty/serial: use uart_console_write in the RISC-V SBL early console
RISC-V: defconfig: Add CRYPTO_DEV_VIRTIO=y
RISC-V: defconfig: Enable Generic PCIE by default
RISC-V: defconfig: Move CONFIG_PCI{,E_XILINX}
RISC-V: Kconfig: fix spelling mistake "traget" -> "target"
RISC-V: asm/page.h: fix spelling mistake "CONFIG_64BITS" -> "CONFIG_64BIT"
RISC-V: fix bad use of of_node_put
RISC-V: Add _TIF_NEED_RESCHED check for kernel thread when CONFIG_PREEMPT=y

Linus Torvalds
2019-02-03 02:26:14 +0800
c8864cb70 Merge tag 'for-linus-20190202' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"A few fixes that should go into this release. This contains:

- MD pull request from Song, fixing a recovery OOM issue (Alexei)

- Fix for a sync related stall (Jianchao)

- Dummy callback for timeouts (Tetsuo)

- IDE atapi sense ordering fix (me)"

* tag 'for-linus-20190202' of git://git.kernel.dk/linux-block:
ide: ensure atapi sense request aren't preempted
blk-mq: fix a hung issue when fsync
block: pass no-op callback to INIT_WORK().
md/raid5: fix 'out of memory' during raid cache recovery

Linus Torvalds
2019-02-03 02:16:28 +0800
3cde55ee7 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi ... Browse Code »

Pull SCSI fixes from James Bottomley:
"Five minor bug fixes.

The libfc one is a tiny memory leak, the zfcp one is an incorrect user
visible parameter and the rest are on error legs or obscure features"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: 53c700: pass correct "dev" to dma_alloc_attrs()
scsi: bnx2fc: Fix error handling in probe()
scsi: scsi_debug: fix write_same with virtual_gb problem
scsi: libfc: free skb when receiving invalid flogi resp
scsi: zfcp: fix sysfs block queue limit output for max_segment_size

Linus Torvalds
2019-02-03 02:12:53 +0800
b9de6efed Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc fixes from Andrew Morton:
"24 fixes"

* emailed patches from Andrew Morton : (24 commits)
autofs: fix error return in autofs_fill_super()
autofs: drop dentry reference only when it is never used
fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
psi: clarify the Kconfig text for the default-disable option
mm, memory_hotplug: __offline_pages fix wrong locking
mm: hwpoison: use do_send_sig_info() instead of force_sig()
kasan: mark file common so ftrace doesn't trace it
init/Kconfig: fix grammar by moving a closing parenthesis
lib/test_kmod.c: potential double free in error handling
mm, oom: fix use-after-free in oom_kill_process
mm/hotplug: invalid PFNs from pfn_to_online_page()
mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
psi: fix aggregation idle shut-off
mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
oom, oom_reaper: do not enqueue same task twice
mm: migrate: make buffer_migrate_page_norefs() actually succeed
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
x86_64: increase stack size for KASAN_EXTRA
...

Linus Torvalds
2019-02-03 01:32:58 +0800

02 Feb, 2019

21 commits

74c953ca5 efi/arm64: Fix debugfs crash by adding a terminator for ptdump marker ... Browse Code »

When reading 'efi_page_tables' debugfs triggers an out-of-bounds access here:

arch/arm64/mm/dump.c: 282
if (addr >= st->marker[1].start_address) {

called from:

arch/arm64/mm/dump.c: 331
note_page(st, addr, 2, pud_val(pud));

because st->marker++ is is called after "UEFI runtime end" which is the
last element in addr_marker[]. Therefore, add a terminator like the one
for kernel_page_tables, so it can be skipped to print out non-existent
markers.

Here's the KASAN bug report:

# cat /sys/kernel/debug/efi_page_tables
---[ UEFI runtime start ]---
0x0000000020000000-0x0000000020010000 64K PTE RW NX SHD AF ...
0x0000000020200000-0x0000000021340000 17664K PTE RW NX SHD AF ...
...
0x0000000021920000-0x0000000021950000 192K PTE RW x SHD AF ...
0x0000000021950000-0x00000000219a0000 320K PTE RW NX SHD AF ...
---[ UEFI runtime end ]---
---[ (null) ]---
---[ (null) ]---

BUG: KASAN: global-out-of-bounds in note_page+0x1f0/0xac0
Read of size 8 at addr ffff2000123f2ac0 by task read_all/42464
Call trace:
dump_backtrace+0x0/0x298
show_stack+0x24/0x30
dump_stack+0xb0/0xdc
print_address_description+0x64/0x2b0
kasan_report+0x150/0x1a4
__asan_report_load8_noabort+0x30/0x3c
note_page+0x1f0/0xac0
walk_pgd+0xb4/0x244
ptdump_walk_pgd+0xec/0x140
ptdump_show+0x40/0x50
seq_read+0x3f8/0xad0
full_proxy_read+0x9c/0xc0
__vfs_read+0xfc/0x4c8
vfs_read+0xec/0x208
ksys_read+0xd0/0x15c
__arm64_sys_read+0x84/0x94
el0_svc_handler+0x258/0x304
el0_svc+0x8/0xc

The buggy address belongs to the variable:
__compound_literal.0+0x20/0x800

Memory state around the buggy address:
ffff2000123f2980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff2000123f2a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
>ffff2000123f2a80: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 00
^
ffff2000123f2b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff2000123f2b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0

[ ardb: fix up whitespace ]
[ mingo: fix up some moar ]

Signed-off-by: Qian Cai
Signed-off-by: Ard Biesheuvel
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-efi@vger.kernel.org
Fixes: 9d80448ac92b ("efi/arm64: Add debugfs node to dump UEFI runtime page tables")
Link: http://lkml.kernel.org/r/20190202095017.13799-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar

Qian Cai
2019-02-02 18:27:29 +0800
e6d429313 x86/resctrl: Avoid confusion over the new X86_RESCTRL config ... Browse Code »

"Resource Control" is a very broad term for this CPU feature, and a term
that is also associated with containers, cgroups etc. This can easily
cause confusion.

Make the user prompt more specific. Match the config symbol name.

[ bp: In the future, the corresponding ARM arch-specific code will be
under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
under the CPU_RESCTRL umbrella symbol. ]

Signed-off-by: Johannes Weiner
Signed-off-by: Borislav Petkov
Cc: Babu Moger
Cc: Fenghua Yu
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: James Morse
Cc: Jonathan Corbet
Cc: "Kirill A. Shutemov"
Cc: linux-doc@vger.kernel.org
Cc: Peter Zijlstra
Cc: Pu Wen
Cc: Reinette Chatre
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: x86-ml
Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org

Johannes Weiner
2019-02-02 17:34:52 +0800
cd984a5be Merge tag 'xtensa-20190201' of git://github.com/jcmvbkbc/linux-xtensa ... Browse Code »

Pull xtensa fixes from Max Filippov:

- fix ccount_timer_shutdown for secondary CPUs

- fix secondary CPU initialization

- fix secondary CPU reset vector clash with double exception vector

- fix present CPUs when booting with 'maxcpus' parameter

- limit possible CPUs by configured NR_CPUS

- issue a warning if xtensa PIC is asked to retrigger anything other
than software IRQ

- fix masking/unmasking of the first two IRQs on xtensa MX PIC

- fix typo in Kconfig description for user space unaligned access
feature

- fix Kconfig warning for selecting BUILTIN_DTB

* tag 'xtensa-20190201' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: SMP: limit number of possible CPUs by NR_CPUS
xtensa: rename BUILTIN_DTB to BUILTIN_DTB_SOURCE
xtensa: Fix typo use space=>user space
drivers/irqchip: xtensa-mx: fix mask and unmask
drivers/irqchip: xtensa: add warning to irq_retrigger
xtensa: SMP: mark each possible CPU as present
xtensa: smp_lx200_defconfig: fix vectors clash
xtensa: SMP: fix secondary CPU initialization
xtensa: SMP: fix ccount_timer_shutdown

Linus Torvalds
2019-02-02 08:56:30 +0800
8b050fe42 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux ... Browse Code »

Pull arm64 fixes from Will Deacon:
"Although we're still debugging a few minor arm64-specific issues in
mainline, I didn't want to hold this lot up in the meantime.

We've got an additional KASLR fix after the previous one wasn't quite
complete, a fix for a performance regression when mapping executable
pages into userspace and some fixes for kprobe blacklisting. All
candidates for stable.

Summary:

- Fix module loading when KASLR is configured but disabled at runtime

- Fix accidental IPI when mapping user executable pages

- Ensure hyp-stub and KVM world switch code cannot be kprobed"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hibernate: Clean the __hyp_text to PoC after resume
arm64: hyp-stub: Forbid kprobing of the hyp-stub
arm64: kprobe: Always blacklist the KVM world-switch code
arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
arm64: Do not issue IPIs for user executable ptes

Linus Torvalds
2019-02-02 08:54:25 +0800
33640d718 Merge tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull smb3 fixes from Steve French:
"SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a
particularly important one for queryxattr (see xfstests 70 and 117)"

* tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
CIFS: fix use-after-free of the lease keys
CIFS: Do not consider -ENODATA as stat failure for reads
CIFS: Do not count -ENODATA as failure for query directory
CIFS: Fix trace command logging for SMB2 reads and writes
CIFS: Fix possible oops and memory leaks in async IO
cifs: limit amount of data we request for xattrs to CIFSMaxBufSize
cifs: fix computation for MAX_SMB2_HDR_SIZE

Linus Torvalds
2019-02-02 08:53:01 +0800
b7bd29b53 Merge tag 'apparmor-pr-2019-02-01' of git://git.kernel.org/pub/scm/linux/kernel/… ... Browse Code »

…git/jj/linux-apparmor

Pull apparmor bug fixes from John Johansen:
"Two bug fixes for apparmor:

- Fix aa_label_build() error handling for failed merges

- Fix warning about unused function apparmor_ipv6_postroute"

* tag 'apparmor-pr-2019-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix aa_label_build() error handling for failed merges
apparmor: Fix warning about unused function apparmor_ipv6_postroute

Linus Torvalds
2019-02-02 08:18:38 +0800
f585b283e autofs: fix error return in autofs_fill_super() ... Browse Code »

In autofs_fill_super() on error of get inode/make root dentry the return
should be ENOMEM as this is the only failure case of the called
functions.

Link: http://lkml.kernel.org/r/154725123240.11260.796773942606871359.stgit@pluto-themaw-net
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ian Kent
2019-02-02 07:46:24 +0800
63ce5f552 autofs: drop dentry reference only when it is never used ... Browse Code »

autofs_expire_run() calls dput(dentry) to drop the reference count of
dentry. However, dentry is read via autofs_dentry_ino(dentry) after
that. This may result in a use-free-bug. The patch drops the reference
count of dentry only when it is never used.

Link: http://lkml.kernel.org/r/154725122396.11260.16053424107144453867.stgit@pluto-themaw-net
Signed-off-by: Pan Bian
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pan Bian
2019-02-02 07:46:24 +0800
c27d82f52 fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() ... Browse Code »

When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).

Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.

Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz
Signed-off-by: Jan Kara
Acked-by: Michal Hocko
Reviewed-by: Andrew Morton
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2019-02-02 07:46:24 +0800
e0a352fab mm: migrate: don't rely on __PageMovable() of newpage after unlocking it ... Browse Code »

We had a race in the old balloon compaction code before b1123ea6d3b3
("mm: balloon: use general non-lru movable page feature") refactored it
that became visible after backporting 195a8c43e93d ("virtio-balloon:
deflate via a page list") without the refactoring.

The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction:
redesign ballooned pages management") till b1123ea6d3b3 ("mm: balloon:
use general non-lru movable page feature"). d6d86c0a7f8d
("mm/balloon_compaction: redesign ballooned pages management") was
backported to 3.12, so the broken kernels are stable kernels [3.12 -
4.7].

There was a subtle race between dropping the page lock of the newpage in
__unmap_and_move() and checking for __is_movable_balloon_page(newpage).

Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.

This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With 195a8c43e93d ("virtio-balloon: deflate via a page list")
backported, one would suddenly get corrupted lists in
release_pages_balloon():

- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().

__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable(). So the new check (__PageMovable(newpage)) will still
hold even after newpage was dequeued by virtio-balloon.

If anybody would ever change that special handling, the BUG would be
introduced again. So instead, make it explicit and use the information
of the original isolated page before migration.

This patch can be backported fairly easy to stable kernels (in contrast
to the refactoring).

Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand
Reported-by: Vratislav Bendel
Acked-by: Michal Hocko
Acked-by: Rafael Aquini
Cc: Mel Gorman
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc: Jan Kara
Cc: Andrea Arcangeli
Cc: Dominik Brodowski
Cc: Matthew Wilcox
Cc: Vratislav Bendel
Cc: Rafael Aquini
Cc: Konstantin Khlebnikov
Cc: Minchan Kim
Cc: [3.12 - 4.7]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Hildenbrand
2019-02-02 07:46:24 +0800
7b2489d37 psi: clarify the Kconfig text for the default-disable option ... Browse Code »

The current help text caused some confusion in online forums about
whether or not to default-enable or default-disable psi in vendor
kernels. This is because it doesn't communicate the reason for why we
made this setting configurable in the first place: that the overhead is
non-zero in an artificial scheduler stress test.

Since this isn't representative of real workloads, and the effect was
not measurable in scheduler-heavy real world applications such as the
webservers and memcache installations at Facebook, it's fair to point
out that this is a pretty cautious option to select.

Link: http://lkml.kernel.org/r/20190129233617.16767-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Reviewed-by: Andrew Morton
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2019-02-02 07:46:24 +0800
e3df4c6e4 mm, memory_hotplug: __offline_pages fix wrong locking ... Browse Code »

Jan has noticed that we do double unlock on some failure paths when
offlining a page range. This is indeed the case when
test_pages_in_a_zone respp. start_isolate_page_range fail. This was an
omission when forward porting the debugging patch from an older kernel.

Fix the issue by dropping mem_hotplug_done from the failure condition
and keeping the single unlock in the catch all failure path.

Link: http://lkml.kernel.org/r/20190115120307.22768-1-mhocko@kernel.org
Fixes: 7960509329c2 ("mm, memory_hotplug: print reason for the offlining failure")
Signed-off-by: Michal Hocko
Reported-by: Jan Kara
Reviewed-by: Jan Kara
Tested-by: Jan Kara
Reviewed-by: Oscar Salvador
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2019-02-02 07:46:23 +0800
6376360ec mm: hwpoison: use do_send_sig_info() instead of force_sig() ... Browse Code »

Currently memory_failure() is racy against process's exiting, which
results in kernel crash by null pointer dereference.

The root cause is that memory_failure() uses force_sig() to forcibly
kill asynchronous (meaning not in the current context) processes. As
discussed in thread https://lkml.org/lkml/2010/6/8/236 years ago for OOM
fixes, this is not a right thing to do. OOM solves this issue by using
do_send_sig_info() as done in commit d2d393099de2 ("signal:
oom_kill_task: use SEND_SIG_FORCED instead of force_sig()"), so this
patch is suggesting to do the same for hwpoison. do_send_sig_info()
properly accesses to siglock with lock_task_sighand(), so is free from
the reported race.

I confirmed that the reported bug reproduces with inserting some delay
in kill_procs(), and it never reproduces with this patch.

Note that memory_failure() can send another type of signal using
force_sig_mceerr(), and the reported race shouldn't happen on it because
force_sig_mceerr() is called only for synchronous processes (i.e.
BUS_MCEERR_AR happens only when some process accesses to the corrupted
memory.)

Link: http://lkml.kernel.org/r/20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp
Signed-off-by: Naoya Horiguchi
Reported-by: Jane Chu
Reviewed-by: Dan Williams
Reviewed-by: William Kucharski
Cc: Oleg Nesterov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2019-02-02 07:46:23 +0800
0d0c8de87 kasan: mark file common so ftrace doesn't trace it ... Browse Code »

When option CONFIG_KASAN is enabled toghether with ftrace, function
ftrace_graph_caller() gets in to a recursion, via functions
kasan_check_read() and kasan_check_write().

Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
179 mcount_get_pc x0 // function's pc
(gdb) bt
#0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
#1 0xffffff90101406c8 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
#2 0xffffff90106fd084 in kasan_check_write (p=0xffffffc06c170878, size=4) at ../mm/kasan/common.c:105
#3 0xffffff90104a2464 in atomic_add_return (v=, i=) at ./include/generated/atomic-instrumented.h:71
#4 atomic_inc_return (v=) at ./include/generated/atomic-fallback.h:284
#5 trace_graph_entry (trace=0xffffffc03f5ff380) at ../kernel/trace/trace_functions_graph.c:441
#6 0xffffff9010481774 in trace_graph_entry_watchdog (trace=) at ../kernel/trace/trace_selftest.c:741
#7 0xffffff90104a185c in function_graph_enter (ret=, func=, frame_pointer=18446743799894897728, retp=) at ../kernel/trace/trace_functions_graph.c:196
#8 0xffffff9010140628 in prepare_ftrace_return (self_addr=18446743592948977792, parent=0xffffffc03f5ff418, frame_pointer=18446743799894897728) at ../arch/arm64/kernel/ftrace.c:231
#9 0xffffff90101406f4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Rework so that the kasan implementation isn't traced.

Link: http://lkml.kernel.org/r/20181212183447.15890-1-anders.roxell@linaro.org
Signed-off-by: Anders Roxell
Acked-by: Dmitry Vyukov
Tested-by: Dmitry Vyukov
Acked-by: Steven Rostedt (VMware)
Cc: Andrey Ryabinin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anders Roxell
2019-02-02 07:46:23 +0800
980768338 init/Kconfig: fix grammar by moving a closing parenthesis ... Browse Code »

Link: http://lkml.kernel.org/r/20190129150813.15785-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Neuschäfer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jonathan Neuschäfer
2019-02-02 07:46:23 +0800
db7ddeab3 lib/test_kmod.c: potential double free in error handling ... Browse Code »

There is a copy and paste bug so we set "config->test_driver" to NULL
twice instead of setting "config->test_fs". Smatch complains that it
leads to a double free:

lib/test_kmod.c:840 __kmod_config_init() warn: 'config->test_fs' double freed

Link: http://lkml.kernel.org/r/20190121140011.GA14283@kadam
Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter
Acked-by: Luis Chamberlain
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2019-02-02 07:46:23 +0800
cefc7ef3c mm, oom: fix use-after-free in oom_kill_process ... Browse Code »

Syzbot instance running on upstream kernel found a use-after-free bug in
oom_kill_process. On further inspection it seems like the process
selected to be oom-killed has exited even before reaching
read_lock(&tasklist_lock) in oom_kill_process(). More specifically the
tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task()
and the put_task_struct within for_each_thread() frees the tsk and
for_each_thread() tries to access the tsk. The easiest fix is to do
get/put across the for_each_thread() on the selected task.

Now the next question is should we continue with the oom-kill as the
previously selected task has exited? However before adding more
complexity and heuristics, let's answer why we even look at the children
of oom-kill selected task? The select_bad_process() has already selected
the worst process in the system/memcg. Due to race, the selected
process might not be the worst at the kill time but does that matter?
The userspace can use the oom_score_adj interface to prefer children to
be killed before the parent. I looked at the history but it seems like
this is there before git history.

Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com
Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com
Fixes: 6b0c81b3be11 ("mm, oom: reduce dependency on tasklist_lock")
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Acked-by: Michal Hocko
Cc: David Rientjes
Cc: Johannes Weiner
Cc: Tetsuo Handa
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shakeel Butt
2019-02-02 07:46:23 +0800
b13bc3519 mm/hotplug: invalid PFNs from pfn_to_online_page() ... Browse Code »

On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
with CONFIG_DEBUG_VM_PGFLAGS=y due to page_to_nid() found a pfn that is
not directly mapped (MEMBLOCK_NOMAP). Hence, the page->flags is
uninitialized.

This is due to the commit 9f1eb38e0e11 ("mm, kmemleak: little
optimization while scanning") starts to use pfn_to_online_page() instead
of pfn_valid(). However, in the CONFIG_MEMORY_HOTPLUG=y case,
pfn_to_online_page() does not call memblock_is_map_memory() while
pfn_valid() does.

Historically, the commit 68709f45385a ("arm64: only consider memblocks
with NOMAP cleared for linear mapping") causes pages marked as nomap
being no long reassigned to the new zone in memmap_init_zone() by
calling __init_single_page().

Since the commit 2d070eab2e82 ("mm: consider zone which is not fully
populated to have holes") introduced pfn_to_online_page() and was
designed to return a valid pfn only, but it is clearly broken on arm64.

Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can
handle nomap thanks to the commit f52bb98f5ade ("arm64: mm: always
enable CONFIG_HOLES_IN_ZONE"), while it will be optimized away on
architectures where have no HOLES_IN_ZONE.

[1]
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
Mem abort info:
ESR = 0x96000005
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
Internal error: Oops: 96000005 [#1] SMP
CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ #8
pstate: 60400009 (nZCv daif +PAN -UAO)
pc : page_mapping+0x24/0x144
lr : __dump_page+0x34/0x3dc
sp : ffff00003a5cfd10
x29: ffff00003a5cfd10 x28: 000000000000802f
x27: 0000000000000000 x26: 0000000000277d00
x25: ffff000010791f56 x24: ffff7fe000000000
x23: ffff000010772f8b x22: ffff00001125f670
x21: ffff000011311000 x20: ffff000010772f8b
x19: fffffffffffffffe x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: ffff802698b19600
x13: ffff802698b1a200 x12: ffff802698b16f00
x11: ffff802698b1a400 x10: 0000000000001400
x9 : 0000000000000001 x8 : ffff00001121a000
x7 : 0000000000000000 x6 : ffff0000102c53b8
x5 : 0000000000000000 x4 : 0000000000000003
x3 : 0000000000000100 x2 : 0000000000000000
x1 : ffff000010772f8b x0 : ffffffffffffffff
Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____))
Call trace:
page_mapping+0x24/0x144
__dump_page+0x34/0x3dc
dump_page+0x28/0x4c
kmemleak_scan+0x4ac/0x680
kmemleak_scan_thread+0xb4/0xdc
kthread+0x12c/0x13c
ret_from_fork+0x10/0x18
Code: d503201f f9400660 36000040 d1000413 (f9400661)
---[ end trace 4d4bd7f573490c8e ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x002,20000c38
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190122132916.28360-1-cai@lca.pw
Fixes: 9f1eb38e0e11 ("mm, kmemleak: little optimization while scanning")
Signed-off-by: Qian Cai
Acked-by: Michal Hocko
Cc: Oscar Salvador
Cc: Catalin Marinas
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Qian Cai
2019-02-02 07:46:23 +0800
eeb0efd07 mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages ... Browse Code »

This is the same sort of error we saw in commit 17e2e7d7e1b8 ("mm,
page_alloc: fix has_unmovable_pages for HugePages").

Gigantic hugepages cross several memblocks, so it can be that the page
we get in scan_movable_pages() is a page-tail belonging to a
1G-hugepage. If that happens, page_hstate()->size_to_hstate() will
return NULL, and we will blow up in hugepage_migration_supported().

The splat is as follows:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 1350 Comm: bash Tainted: G E 5.0.0-rc1-mm1-1-default+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:__offline_pages+0x6ae/0x900
Call Trace:
memory_subsys_offline+0x42/0x60
device_offline+0x80/0xa0
state_store+0xab/0xc0
kernfs_fop_write+0x102/0x180
__vfs_write+0x26/0x190
vfs_write+0xad/0x1b0
ksys_write+0x42/0x90
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Modules linked in: af_packet(E) xt_tcpudp(E) ipt_REJECT(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) iptable_filter(E) ip_tables(E) x_tables(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) bochs_drm(E) ttm(E) aesni_intel(E) drm_kms_helper(E) aes_x86_64(E) crypto_simd(E) cryptd(E) glue_helper(E) drm(E) virtio_net(E) syscopyarea(E) sysfillrect(E) net_failover(E) sysimgblt(E) pcspkr(E) failover(E) i2c_piix4(E) fb_sys_fops(E) parport_pc(E) parport(E) button(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) sd_mod(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E) scsi_mod(E) autofs4(E)

[akpm@linux-foundation.org: fix brace layout, per David. Reduce indentation]
Link: http://lkml.kernel.org/r/20190122154407.18417-1-osalvador@suse.de
Signed-off-by: Oscar Salvador
Reviewed-by: Anthony Yznaga
Acked-by: Michal Hocko
Reviewed-by: David Hildenbrand
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oscar Salvador
2019-02-02 07:46:23 +0800
1b69ac6b4 psi: fix aggregation idle shut-off ... Browse Code »

psi has provisions to shut off the periodic aggregation worker when
there is a period of no task activity - and thus no data that needs
aggregating. However, while developing psi monitoring, Suren noticed
that the aggregation clock currently won't stay shut off for good.

Debugging this revealed a flaw in the idle design: an aggregation run
will see no task activity and decide to go to sleep; shortly thereafter,
the kworker thread that executed the aggregation will go idle and cause
a scheduling change, during which the psi callback will kick the
!pending worker again. This will ping-pong forever, and is equivalent
to having no shut-off logic at all (but with more code!)

Fix this by exempting aggregation workers from psi's clock waking logic
when the state change is them going to sleep. To do this, tag workers
with the last work function they executed, and if in psi we see a worker
going to sleep after aggregating psi data, we will not reschedule the
aggregation work item.

What if the worker is also executing other items before or after?

Any psi state times that were incurred by work items preceding the
aggregation work will have been collected from the per-cpu buckets
during the aggregation itself. If there are work items following the
aggregation work, the worker's last_func tag will be overwritten and the
aggregator will be kept alive to process this genuine new activity.

If the aggregation work is the last thing the worker does, and we decide
to go idle, the brief period of non-idle time incurred between the
aggregation run and the kworker's dequeue will be stranded in the
per-cpu buckets until the clock is woken by later activity. But that
should not be a problem. The buckets can hold 4s worth of time, and
future activity will wake the clock with a 2s delay, giving us 2s worth
of data we can leave behind when disabling aggregation. If it takes a
worker more than two seconds to go idle after it finishes its last work
item, we likely have bigger problems in the system, and won't notice one
sample that was averaged with a bogus per-CPU weight.

Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO")
Signed-off-by: Johannes Weiner
Reported-by: Suren Baghdasaryan
Acked-by: Tejun Heo
Cc: Peter Zijlstra
Cc: Lai Jiangshan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2019-02-02 07:46:23 +0800
24feb47c5 mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone ... Browse Code »

If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized. This may lead
to VM_BUG_ON due to uninitialized struct pages access from
test_pages_in_a_zone() function triggered by memory_hotplug sysfs
handlers.

Here are the the panic examples:
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=2050M
--------------------------
page:000003d082008000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
test_pages_in_a_zone+0xde/0x160
show_valid_zones+0x5c/0x190
dev_attr_show+0x34/0x70
sysfs_kf_seq_show+0xc8/0x148
seq_read+0x204/0x480
__vfs_read+0x32/0x178
vfs_read+0x82/0x138
ksys_read+0x5a/0xb0
system_call+0xdc/0x2d8
Last Breaking-Event-Address:
test_pages_in_a_zone+0xde/0x160
Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix this by checking whether the pfn to check is within the zone.

[mhocko@suse.com: separated this change from http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Link: http://lkml.kernel.org/r/20190128144506.15603-3-mhocko@kernel.org

[mhocko@suse.com: separated this change from
http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Signed-off-by: Michal Hocko
Signed-off-by: Mikhail Zaslonko
Tested-by: Mikhail Gavrilov
Reviewed-by: Oscar Salvador
Tested-by: Gerald Schaefer
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Mikhail Gavrilov
Cc: Pavel Tatashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mikhail Zaslonko
2019-02-02 07:46:23 +0800