Eric Lee / smarc-fsl-linux-kernel

18 Dec, 2017

1 commit

b36025b19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf ... Browse Code »

Daniel Borkmann says:

====================
pull-request: bpf 2017-12-17

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a corner case in generic XDP where we have non-linear skbs
but enough tailroom in the skb to not miss to linearizing there,
from Song.

2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when
BPF context is not skb, from Daniel.

3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper
call would use the wrong register for the skb, from Daniel.
====================

Signed-off-by: David S. Miller

David S. Miller
2017-12-18 23:49:22 +0800

16 Dec, 2017

2 commits

1f76a7556 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking fixes from Ingo Molnar:
"Misc fixes:

- Fix a S390 boot hang that was caused by the lock-break logic.
Remove lock-break to begin with, as review suggested it was
unreasonably fragile and our confidence in its continued good
health is lower than our confidence in its removal.

- Remove the lockdep cross-release checking code for now, because of
unresolved false positive warnings. This should make lockdep work
well everywhere again.

- Get rid of the final (and single) ACCESS_ONCE() straggler and
remove the API from v4.15.

- Fix a liblockdep build warning"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/lib/lockdep: Add missing declaration of 'pr_cont()'
checkpatch: Remove ACCESS_ONCE() warning
compiler.h: Remove ACCESS_ONCE()
tools/include: Remove ACCESS_ONCE()
tools/perf: Convert ACCESS_ONCE() to READ_ONCE()
locking/lockdep: Remove the cross-release locking checks
locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y
locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK

Linus Torvalds
2017-12-16 03:44:59 +0800
87ab81943 bpf: add test case for ld_abs and helper changing pkt data ... Browse Code »

Add a test that i) uses LD_ABS, ii) zeroing R6 before call, iii) calls
a helper that triggers reload of cached skb data, iv) uses LD_ABS again.
It's added for test_bpf in order to do runtime testing after JITing as
well as test_verifier to test that the sequence is allowed.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: Alexei Starovoitov

Daniel Borkmann
2017-12-16 01:19:36 +0800

15 Dec, 2017

1 commit

338f1d9d1 lib/rbtree,drm/mm: add rbtree_replace_node_cached() ... Browse Code »

Add a variant of rbtree_replace_node() that maintains the leftmost cache
of struct rbtree_root_cached when replacing nodes within the rbtree.

As drm_mm is the only rb_replace_node() being used on an interval tree,
the mistake looks fairly self-contained. Furthermore the only user of
drm_mm_replace_node() is its testsuite...

Testcase: igt/drm_mm/replace

Link: http://lkml.kernel.org/r/20171122100729.3742-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20171109212435.9265-1-chris@chris-wilson.co.uk
Fixes: f808c13fd373 ("lib/interval_tree: fast overlap detection")
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Acked-by: Davidlohr Bueso
Cc: Jérôme Glisse
Cc: Joonas Lahtinen
Cc: Daniel Vetter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chris Wilson
2017-12-15 08:00:48 +0800

12 Dec, 2017

1 commit

e966eaeeb locking/lockdep: Remove the cross-release locking checks ... Browse Code »

This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y),
while it found a number of old bugs initially, was also causing too many
false positives that caused people to disable lockdep - which is arguably
a worse overall outcome.

If we disable cross-release by default but keep the code upstream then
in practice the most likely outcome is that we'll allow the situation
to degrade gradually, by allowing entropy to introduce more and more
false positives, until it overwhelms maintenance capacity.

Another bad side effect was that people were trying to work around
the false positives by uglifying/complicating unrelated code. There's
a marked difference between annotating locking operations and
uglifying good code just due to bad lock debugging code ...

This gradual decrease in quality happened to a number of debugging
facilities in the kernel, and lockdep is pretty complex already,
so we cannot risk this outcome.

Either cross-release checking can be done right with no false positives,
or it should not be included in the upstream kernel.

( Note that it might make sense to maintain it out of tree and go through
the false positives every now and then and see whether new bugs were
introduced. )

Cc: Byungchul Park
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-12-12 19:38:51 +0800

09 Dec, 2017

1 commit

4ded3bec6 Merge tag 'keys-fixes-20171208' of git://git.kernel.org/pub/scm/linux/kernel/git… ... Browse Code »

…/dhowells/linux-fs into keys-for-linus

Assorted fixes for keyrings, ASN.1, X.509 and PKCS#7.

James Morris
2017-12-09 11:39:48 +0800

08 Dec, 2017

5 commits

8dfd2f22d 509: fix printing uninitialized stack memory when OID is empty ... Browse Code »

Callers of sprint_oid() do not check its return value before printing
the result. In the case where the OID is zero-length, -EBADMSG was
being returned without anything being written to the buffer, resulting
in uninitialized stack memory being printed. Fix this by writing
"(bad)" to the buffer in the cases where -EBADMSG is returned.

Fixes: 4f73175d0375 ("X.509: Add utility functions to render OIDs as strings")
Signed-off-by: Eric Biggers
Signed-off-by: David Howells

Eric Biggers
2017-12-08 23:13:28 +0800
47e0a208f X.509: fix buffer overflow detection in sprint_oid() ... Browse Code »

In sprint_oid(), if the input buffer were to be more than 1 byte too
small for the first snprintf(), 'bufsize' would underflow, causing a
buffer overflow when printing the remainder of the OID.

Fortunately this cannot actually happen currently, because no users pass
in a buffer that can be too small for the first snprintf().

Regardless, fix it by checking the snprintf() return value correctly.

For consistency also tweak the second snprintf() check to look the same.

Fixes: 4f73175d0375 ("X.509: Add utility functions to render OIDs as strings")
Cc: Takashi Iwai
Signed-off-by: Eric Biggers
Signed-off-by: David Howells
Reviewed-by: James Morris

Eric Biggers
2017-12-08 23:13:28 +0800
81a7be2cd ASN.1: check for error from ASN1_OP_END__ACT actions ... Browse Code »

asn1_ber_decoder() was ignoring errors from actions associated with the
opcodes ASN1_OP_END_SEQ_ACT, ASN1_OP_END_SET_ACT,
ASN1_OP_END_SEQ_OF_ACT, and ASN1_OP_END_SET_OF_ACT. In practice, this
meant the pkcs7_note_signed_info() action (since that was the only user
of those opcodes). Fix it by checking for the error, just like the
decoder does for actions associated with the other opcodes.

This bug allowed users to leak slab memory by repeatedly trying to add a
specially crafted "pkcs7_test" key (requires CONFIG_PKCS7_TEST_KEY).

In theory, this bug could also be used to bypass module signature
verification, by providing a PKCS#7 message that is misparsed such that
a signature's ->authattrs do not contain its ->msgdigest. But it
doesn't seem practical in normal cases, due to restrictions on the
format of the ->authattrs.

Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder")
Cc: # v3.7+
Signed-off-by: Eric Biggers
Signed-off-by: David Howells
Reviewed-by: James Morris

Eric Biggers
2017-12-08 23:13:27 +0800
e0058f3a8 ASN.1: fix out-of-bounds read when parsing indefinite length item ... Browse Code »

In asn1_ber_decoder(), indefinitely-sized ASN.1 items were being passed
to the action functions before their lengths had been computed, using
the bogus length of 0x80 (ASN1_INDEFINITE_LENGTH). This resulted in
reading data past the end of the input buffer, when given a specially
crafted message.

Fix it by rearranging the code so that the indefinite length is resolved
before the action is called.

This bug was originally found by fuzzing the X.509 parser in userspace
using libFuzzer from the LLVM project.

KASAN report (cleaned up slightly):

BUG: KASAN: slab-out-of-bounds in memcpy ./include/linux/string.h:341 [inline]
BUG: KASAN: slab-out-of-bounds in x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366
Read of size 128 at addr ffff880035dd9eaf by task keyctl/195

CPU: 1 PID: 195 Comm: keyctl Not tainted 4.14.0-09238-g1d3b78bbc6e9 #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0xd1/0x175 lib/dump_stack.c:53
print_address_description+0x78/0x260 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x23f/0x350 mm/kasan/report.c:409
memcpy+0x1f/0x50 mm/kasan/kasan.c:302
memcpy ./include/linux/string.h:341 [inline]
x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366
asn1_ber_decoder+0xb4a/0x1fd0 lib/asn1_decoder.c:447
x509_cert_parse+0x1c7/0x620 crypto/asymmetric_keys/x509_cert_parser.c:89
x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174
asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388
key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850
SYSC_add_key security/keys/keyctl.c:122 [inline]
SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96

Allocated by task 195:
__do_kmalloc_node mm/slab.c:3675 [inline]
__kmalloc_node+0x47/0x60 mm/slab.c:3682
kvmalloc ./include/linux/mm.h:540 [inline]
SYSC_add_key security/keys/keyctl.c:104 [inline]
SyS_add_key+0x19e/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96

Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder")
Reported-by: Alexander Potapenko
Cc: # v3.7+
Signed-off-by: Eric Biggers
Signed-off-by: David Howells

Eric Biggers
2017-12-08 23:13:27 +0800
6e237d099 netlink: Relax attr validation for fixed length types ... Browse Code »

Commit 28033ae4e0f5 ("net: netlink: Update attr validation to require
exact length for some types") requires attributes using types NLA_U* and
NLA_S* to have an exact length. This change is exposing bugs in various
userspace commands that are sending attributes with an invalid length
(e.g., attribute has type NLA_U8 and userspace sends NLA_U32). While
the commands are clearly broken and need to be fixed, users are arguing
that the sudden change in enforcement is breaking older commands on
newer kernels for use cases that otherwise "worked".

Relax the validation to print a warning mesage similar to what is done
for messages containing extra bytes after parsing.

Fixes: 28033ae4e0f5 ("net: netlink: Update attr validation to require exact length for some types")
Signed-off-by: David Ahern
Reviewed-by: Johannes Berg
Signed-off-by: David S. Miller

David Ahern
2017-12-08 03:00:57 +0800

02 Dec, 2017

2 commits

e1ba1c99d Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/li… ... Browse Code »

…nux/kernel/git/palmer/linux

Pull RISC-V cleanups and ABI fixes from Palmer Dabbelt:
"This contains a handful of small cleanups that are a result of
feedback that didn't make it into our original patch set, either
because the feedback hadn't been given yet, I missed the original
emails, or we weren't ready to submit the changes yet.

I've been maintaining the various cleanup patch sets I have as their
own branches, which I then merged together and signed. Each merge
commit has a short summary of the changes, and each branch is based on
your latest tag (4.15-rc1, in this case). If this isn't the right way
to do this then feel free to suggest something else, but it seems sane
to me.

Here's a short summary of the changes, roughly in order of how
interesting they are.

- libgcc.h has been moved from include/lib, where it's the only
member, to include/linux. This is meant to avoid tab completion
conflicts.

- VDSO entries for clock_get/gettimeofday/getcpu have been added.
These are simple syscalls now, but we want to let glibc use them
from the start so we can make them faster later.

- A VDSO entry for instruction cache flushing has been added so
userspace can flush the instruction cache.

- The VDSO symbol versions for __vdso_cmpxchg{32,64} have been
removed, as those VDSO entries don't actually exist.

- __io_writes has been corrected to respect the given type.

- A new READ_ONCE in arch_spin_is_locked().

- __test_and_op_bit_ord() is now actually ordered.

- Various small fixes throughout the tree to enable allmodconfig to
build cleanly.

- Removal of some dead code in our atomic support headers.

- Improvements to various comments in our atomic support headers"

* tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: (23 commits)
RISC-V: __io_writes should respect the length argument
move libgcc.h to include/linux
RISC-V: Clean up an unused include
RISC-V: Allow userspace to flush the instruction cache
RISC-V: Flush I$ when making a dirty page executable
RISC-V: Add missing include
RISC-V: Use define for get_cycles like other architectures
RISC-V: Provide stub of setup_profiling_timer()
RISC-V: Export some expected symbols for modules
RISC-V: move empty_zero_page definition to C and export it
RISC-V: io.h: type fixes for warnings
RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
RISC-V: use generic serial.h
RISC-V: remove spin_unlock_wait()
RISC-V: `sfence.vma` orderes the instruction cache
RISC-V: Add READ_ONCE in arch_spin_is_locked()
RISC-V: __test_and_op_bit_ord should be strongly ordered
RISC-V: Remove smb_mb__{before,after}_spinlock()
RISC-V: Remove __smp_bp__{before,after}_atomic
RISC-V: Comment on why {,cmp}xchg is ordered how it is
...

Linus Torvalds
2017-12-02 08:39:12 +0800
4db2b604c move libgcc.h to include/linux ... Browse Code »

Introducing a new include/lib directory just for this file totally
messes up tab completion for include/linux, which is highly annoying.

Move it to include/linux where we have headers for all kinds of other
lib/ code as well.

Signed-off-by: Christoph Hellwig
Signed-off-by: Palmer Dabbelt

Christoph Hellwig
2017-12-02 05:09:40 +0800

30 Nov, 2017

1 commit

ef0010a30 vsprintf: don't use 'restricted_pointer()' when not restricting ... Browse Code »

Instead, just fall back on the new '%p' behavior which hashes the
pointer.

Otherwise, '%pK' - that was intended to mark a pointer as restricted -
just ends up leaking pointers that a normal '%p' wouldn't leak. Which
just make the whole thing pointless.

I suspect we should actually get rid of '%pK' entirely, and make it just
work as '%p' regardless, but this is the minimal obvious fix. People
who actually use 'kptr_restrict' should weigh in on which behavior they
want.

Cc: Tobin Harding
Cc: Kees Cook
Signed-off-by: Linus Torvalds

Linus Torvalds
2017-11-30 03:28:09 +0800

29 Nov, 2017

3 commits

7b1924a1d vsprintf: add printk specifier %px ... Browse Code »

printk specifier %p now hashes all addresses before printing. Sometimes
we need to see the actual unmodified address. This can be achieved using
%lx but then we face the risk that if in future we want to change the
way the Kernel handles printing of pointers we will have to grep through
the already existent 50 000 %lx call sites. Let's add specifier %px as a
clear, opt-in, way to print a pointer and maintain some level of
isolation from all the other hex integer output within the Kernel.

Add printk specifier %px to print the actual unmodified address.

Signed-off-by: Tobin C. Harding

Tobin C. Harding
2017-11-29 09:13:14 +0800
ad67b74d2 printk: hash addresses printed with %p ... Browse Code »

Currently there exist approximately 14 000 places in the kernel where
addresses are being printed using an unadorned %p. This potentially
leaks sensitive information regarding the Kernel layout in memory. Many
of these calls are stale, instead of fixing every call lets hash the
address by default before printing. This will of course break some
users, forcing code printing needed addresses to be updated.

Code that _really_ needs the address will soon be able to use the new
printk specifier %px to print the address.

For what it's worth, usage of unadorned %p can be broken down as
follows (thanks to Joe Perches).

$ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c
1084 arch
20 block
10 crypto
32 Documentation
8121 drivers
1221 fs
143 include
101 kernel
69 lib
100 mm
1510 net
40 samples
7 scripts
11 security
166 sound
152 tools
2 virt

Add function ptr_to_id() to map an address to a 32 bit unique
identifier. Hash any unadorned usage of specifier %p and any malformed
specifiers.

Signed-off-by: Tobin C. Harding

Tobin C. Harding
2017-11-29 09:09:02 +0800
57e734423 vsprintf: refactor %pK code out of pointer() ... Browse Code »

Currently code to handle %pK is all within the switch statement in
pointer(). This is the wrong level of abstraction. Each of the other switch
clauses call a helper function, pK should do the same.

Refactor code out of pointer() to new function restricted_pointer().

Signed-off-by: Tobin C. Harding

Tobin C. Harding
2017-11-29 09:03:24 +0800

26 Nov, 2017

1 commit

844056fd7 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer updates from Thomas Gleixner:

- The final conversion of timer wheel timers to timer_setup().

A few manual conversions and a large coccinelle assisted sweep and
the removal of the old initialization mechanisms and the related
code.

- Remove the now unused VSYSCALL update code

- Fix permissions of /proc/timer_list. I still need to get rid of that
file completely

- Rename a misnomed clocksource function and remove a stale declaration

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
m68k/macboing: Fix missed timer callback assignment
treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
timer: Remove redundant __setup_timer*() macros
timer: Pass function down to initialization routines
timer: Remove unused data arguments from macros
timer: Switch callback prototype to take struct timer_list * argument
timer: Pass timer_list pointer to callbacks unconditionally
Coccinelle: Remove setup_timer.cocci
timer: Remove setup_*timer() interface
timer: Remove init_timer() interface
treewide: setup_timer() -> timer_setup() (2 field)
treewide: setup_timer() -> timer_setup()
treewide: init_timer() -> setup_timer()
treewide: Switch DEFINE_TIMER callbacks to struct timer_list *
s390: cmm: Convert timers to use timer_setup()
lightnvm: Convert timers to use timer_setup()
drivers/net: cris: Convert timers to use timer_setup()
drm/vc4: Convert timers to use timer_setup()
block/laptop_mode: Convert timers to use timer_setup()
net/atm/mpc: Avoid open-coded assignment of timer callback function
...

Linus Torvalds
2017-11-26 02:37:16 +0800

23 Nov, 2017

1 commit

14b661ebb Merge tag 'for-linus-20171120' of git://git.infradead.org/linux-mtd ... Browse Code »

Pull MTD updates from Richard Weinberger:
"General changes:
- Unconfuse get_unmapped_area and point/unpoint driver methods
- New partition parser: sharpslpart
- Kill GENERIC_IO
- Various fixes

NAND changes:
- Add a flag to mark NANDs that require 3 address cycles to encode a
page address
- Set a default ECC/free layout when NAND_ECC_NONE is requested
- Fix a bug in panic_nand_write()
- Another batch of cleanups for the denali driver
- Fix PM support in the atmel driver
- Remove support for platform data in the omap driver
- Fix subpage write in the omap driver
- Fix irq handling in the mtk driver
- Change link order of mtk_ecc and mtk_nand drivers to speed up boot
time
- Change log level of ECC error messages in the mxc driver
- Patch the pxa3xx driver to support Armada 8k platforms
- Add BAM DMA support to the qcom driver
- Convert gpio-nand to the GPIO desc API
- Fix ECC handling in the mt29f driver

SPI-NOR changes:
- Introduce system power management support
- New mechanism to select the proper .quad_enable() hook by JEDEC
ID, when needed, instead of only by manufacturer ID
- Add support to new memory parts from Gigadevice, Winbond, Macronix
and Everspin
- Maintainance for Cadence, Intel, Mediatek and STM32 drivers"

* tag 'for-linus-20171120' of git://git.infradead.org/linux-mtd: (85 commits)
mtd: Avoid probe failures when mtd->dbg.dfs_dir is invalid
mtd: sharpslpart: Add sharpslpart partition parser
mtd: Add sanity checks in mtd_write/read_oob()
mtd: remove the get_unmapped_area method
mtd: implement mtd_get_unmapped_area() using the point method
mtd: chips/map_rom.c: implement point and unpoint methods
mtd: chips/map_ram.c: implement point and unpoint methods
mtd: mtdram: properly handle the phys argument in the point method
mtd: mtdswap: fix spelling mistake: 'TRESHOLD' -> 'THRESHOLD'
mtd: slram: use memremap() instead of ioremap()
kconfig: kill off GENERIC_IO option
mtd: Fix C++ comment in include/linux/mtd/mtd.h
mtd: constify mtd_partition
mtd: plat-ram: Replace manual resource management by devm
mtd: nand: Fix writing mtdoops to nand flash.
mtd: intel-spi: Add Intel Lewisburg PCH SPI super SKU PCI ID
mtd: nand: mtk: fix infinite ECC decode IRQ issue
mtd: spi-nor: Add support for mr25h128
mtd: nand: mtk: change the compile sequence of mtk_nand.o and mtk_ecc.o
mtd: spi-nor: enable 4B opcodes for mx66l51235l
...

Linus Torvalds
2017-11-23 14:46:06 +0800

22 Nov, 2017

1 commit

24ed960ab treewide: Switch DEFINE_TIMER callbacks to struct timer_list * ... Browse Code »

This changes all DEFINE_TIMER() callbacks to use a struct timer_list
pointer instead of unsigned long. Since the data argument has already been
removed, none of these callbacks are using their argument currently, so
this renames the argument to "unused".

Done using the following semantic patch:

@match_define_timer@
declarer name DEFINE_TIMER;
identifier _timer, _callback;
@@

DEFINE_TIMER(_timer, _callback);

@change_callback depends on match_define_timer@
identifier match_define_timer._callback;
type _origtype;
identifier _origarg;
@@

void
-_callback(_origtype _origarg)
+_callback(struct timer_list *unused)
{ ... }

Signed-off-by: Kees Cook

Kees Cook
2017-11-22 07:57:05 +0800

18 Nov, 2017

16 commits

fa7f57807 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge more updates from Andrew Morton:

- a bit more MM

- procfs updates

- dynamic-debug fixes

- lib/ updates

- checkpatch

- epoll

- nilfs2

- signals

- rapidio

- PID management cleanup and optimization

- kcov updates

- sysvipc updates

- quite a few misc things all over the place

* emailed patches from Andrew Morton : (94 commits)
EXPERT Kconfig menu: fix broken EXPERT menu
include/asm-generic/topology.h: remove unused parent_node() macro
arch/tile/include/asm/topology.h: remove unused parent_node() macro
arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
arch/sh/include/asm/topology.h: remove unused parent_node() macro
arch/ia64/include/asm/topology.h: remove unused parent_node() macro
drivers/pcmcia/sa1111_badge4.c: avoid unused function warning
mm: add infrastructure for get_user_pages_fast() benchmarking
sysvipc: make get_maxid O(1) again
sysvipc: properly name ipc_addid() limit parameter
sysvipc: duplicate lock comments wrt ipc_addid()
sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
initramfs: use time64_t timestamps
drivers/watchdog: make use of devm_register_reboot_notifier()
kernel/reboot.c: add devm_register_reboot_notifier()
kcov: update documentation
Makefile: support flag -fsanitizer-coverage=trace-cmp
kcov: support comparison operands collection
kcov: remove pointless current != NULL check
kernel/panic.c: add TAINT_AUX
...

Linus Torvalds
2017-11-18 08:56:17 +0800
d677a4d60 Makefile: support flag -fsanitizer-coverage=trace-cmp ... Browse Code »

The flag enables Clang instrumentation of comparison operations
(currently not supported by GCC). This instrumentation is needed by the
new KCOV device to collect comparison operands.

Link: http://lkml.kernel.org/r/20171011095459.70721-2-glider@google.com
Signed-off-by: Victor Chibotaru
Signed-off-by: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Andrey Konovalov
Cc: Mark Rutland
Cc: Alexander Popov
Cc: Andrey Ryabinin
Cc: Kees Cook
Cc: Vegard Nossum
Cc: Quentin Casasnovas
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Victor Chibotaru
2017-11-18 08:10:04 +0800
4441fca0a lib: test module for find_*_bit() functions ... Browse Code »

find_bit functions are widely used in the kernel, including hot paths.
This module tests performance of those functions in 2 typical scenarios:
randomly filled bitmap with relatively equal distribution of set and
cleared bits, and sparse bitmap which has 1 set bit for 500 cleared
bits.

On ThunderX machine:

Start testing find_bit() with random-filled bitmap
find_next_bit: 240043 cycles, 164062 iterations
find_next_zero_bit: 312848 cycles, 163619 iterations
find_last_bit: 193748 cycles, 164062 iterations
find_first_bit: 177720874 cycles, 164062 iterations

Start testing find_bit() with sparse bitmap
find_next_bit: 3633 cycles, 656 iterations
find_next_zero_bit: 620399 cycles, 327025 iterations
find_last_bit: 3038 cycles, 656 iterations
find_first_bit: 691407 cycles, 656 iterations

[arnd@arndb.de: use correct format string for find-bit tests]
Link: http://lkml.kernel.org/r/20171113135605.3166307-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20171109140714.13168-1-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov
Signed-off-by: Arnd Bergmann
Reviewed-by: Clement Courbet
Cc: Alexey Dobriyan
Cc: Matthew Wilcox
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yury Norov
2017-11-18 08:10:02 +0800
0b548e33e lib/rbtree-test: lower default params ... Browse Code »

Fengguang reported soft lockups while running the rbtree and interval
tree test modules. The logic for these tests all occur in init phase,
and we currently are pounding with the default values for number of
nodes and number of iterations of each test. Reduce the latter by two
orders of magnitude. This does not influence the value of the tests in
that one thousand times by default is enough to get the picture.

Link: http://lkml.kernel.org/r/20171109161715.xai2dtwqw2frhkcm@linux-n805
Signed-off-by: Davidlohr Bueso
Reported-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2017-11-18 08:10:02 +0800
2f9b7e08c lib/nmi_backtrace.c: fix kernel text address leak ... Browse Code »

Don't leak idle function address in NMI backtrace.

Link: http://lkml.kernel.org/r/20171106165648.GA95243@sofia
Signed-off-by: Liu Changcheng
Reviewed-by: Petr Mladek
Reviewed-by: Josh Poimboeuf
Reviewed-by: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Liu, Changcheng
2017-11-18 08:10:02 +0800
36a3d1dd4 lib/genalloc.c: make the avail variable an atomic_long_t ... Browse Code »

If the amount of resources allocated to a gen_pool exceeds 2^32 then the
avail atomic overflows and this causes problems when clients try and
borrow resources from the pool. This is only expected to be an issue on
64 bit systems.

Add the header to pull in atomic_long* operations. So
that 32 bit systems continue to use atomic32_t but 64 bit systems can
use atomic64_t.

Link: http://lkml.kernel.org/r/1509033843-25667-1-git-send-email-sbates@raithlin.com
Signed-off-by: Stephen Bates
Reviewed-by: Logan Gunthorpe
Reviewed-by: Mathieu Desnoyers
Reviewed-by: Daniel Mentz
Cc: Jonathan Corbet
Cc: Andrew Morton
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Bates
2017-11-18 08:10:02 +0800
e813a6140 lib/int_sqrt: adjust comments ... Browse Code »

Our current int_sqrt() is not rough nor any approximation; it calculates
the exact value of: floor(sqrt()). Document this.

Link: http://lkml.kernel.org/r/20171020164645.001652117@infradead.org
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Linus Torvalds
Cc: Anshul Garg
Cc: Davidlohr Bueso
Cc: David Miller
Cc: Ingo Molnar
Cc: Joe Perches
Cc: Kees Cook
Cc: Matthew Wilcox
Cc: Michael Davidson
Cc: Thomas Gleixner
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2017-11-18 08:10:01 +0800
f8ae107ee lib/int_sqrt: optimize initial value compute ... Browse Code »

The initial value (@m) compute is:

m = 1UL << (BITS_PER_LONG - 2);
while (m > x)
m >>= 2;

Which is a linear search for the highest even bit smaller or equal to @x
We can implement this using a binary search using __fls() (or better when
its hardware implemented).

m = 1UL << (__fls(x) & ~1UL);

Especially for small values of @x; which are the more common arguments
when doing a CDF on idle times; the linear search is near to worst case,
while the binary search of __fls() is a constant 6 (or 5 on 32bit)
branches.

cycles: branches: branch-misses:

PRE:

hot: 43.633557 +- 0.034373 45.333132 +- 0.002277 0.023529 +- 0.000681
cold: 207.438411 +- 0.125840 45.333132 +- 0.002277 6.976486 +- 0.004219

SOFTWARE FLS:

hot: 29.576176 +- 0.028850 26.666730 +- 0.004511 0.019463 +- 0.000663
cold: 165.947136 +- 0.188406 26.666746 +- 0.004511 6.133897 +- 0.004386

HARDWARE FLS:

hot: 24.720922 +- 0.025161 20.666784 +- 0.004509 0.020836 +- 0.000677
cold: 132.777197 +- 0.127471 20.666776 +- 0.004509 5.080285 +- 0.003874

Averages computed over all values
Suggested-by: Joe Perches
Acked-by: Will Deacon
Acked-by: Linus Torvalds
Cc: Anshul Garg
Cc: Davidlohr Bueso
Cc: David Miller
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Matthew Wilcox
Cc: Michael Davidson
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2017-11-18 08:10:01 +0800
3f3295709 lib/int_sqrt: optimize small argument ... Browse Code »

The current int_sqrt() computation is sub-optimal for the case of small
@x. Which is the interesting case when we're going to do cumulative
distribution functions on idle times, which we assume to be a random
variable, where the target residency of the deepest idle state gives an
upper bound on the variable (5e6ns on recent Intel chips).

In the case of small @x, the compute loop:

while (m != 0) {
b = y + m;
y >>= 1;

if (x >= b) {
x -= b;
y += m;
}
m >>= 2;
}

can be reduced to:

while (m > x)
m >>= 2;

Because y==0, b==m and until x>=m y will remain 0.

And while this is computationally equivalent, it runs much faster
because there's less code, in particular less branches.

cycles: branches: branch-misses:

OLD:

hot: 45.109444 +- 0.044117 44.333392 +- 0.002254 0.018723 +- 0.000593
cold: 187.737379 +- 0.156678 44.333407 +- 0.002254 6.272844 +- 0.004305

PRE:

hot: 67.937492 +- 0.064124 66.999535 +- 0.000488 0.066720 +- 0.001113
cold: 232.004379 +- 0.332811 66.999527 +- 0.000488 6.914634 +- 0.006568

POST:

hot: 43.633557 +- 0.034373 45.333132 +- 0.002277 0.023529 +- 0.000681
cold: 207.438411 +- 0.125840 45.333132 +- 0.002277 6.976486 +- 0.004219

Averages computed over all values
Suggested-by: Anshul Garg
Acked-by: Linus Torvalds
Cc: Davidlohr Bueso
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Will Deacon
Cc: Joe Perches
Cc: David Miller
Cc: Matthew Wilcox
Cc: Kees Cook
Cc: Michael Davidson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2017-11-18 08:10:01 +0800
dc2bf000a lib/test: delete five error messages for failed memory allocations ... Browse Code »

Omit extra messages for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Link: http://lkml.kernel.org/r/410a4c5a-4ee0-6fcc-969c-103d8e496b78@users.sourceforge.net
Signed-off-by: Markus Elfring
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Markus Elfring
2017-11-18 08:10:01 +0800
d6b28e099 lib: add module support to string tests ... Browse Code »

Extract the string test code into its own source file, to allow
compiling it either to a loadable module, or built into the kernel.

Fixes: 03270c13c5ffaa6a ("lib/string.c: add testcases for memset16/32/64")
Link: http://lkml.kernel.org/r/1505397744-3387-1-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven
Cc: Matthew Wilcox
Cc: Shuah Khan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Geert Uytterhoeven
2017-11-18 08:10:01 +0800
1f3c790bd dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0 ... Browse Code »

line-range is supposed to treat "1-" as "1-endoffile", so
handle the special case by setting last_lineno to UINT_MAX.

Fixes this error:

dynamic_debug:ddebug_parse_query: last-line:0 < 1st-line:1
dynamic_debug:ddebug_exec_query: query parse failed

Link: http://lkml.kernel.org/r/10a6a101-e2be-209f-1f41-54637824788e@infradead.org
Signed-off-by: Randy Dunlap
Acked-by: Jason Baron
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2017-11-18 08:10:01 +0800
2a8358d8a bug: define the "cut here" string in a single place ... Browse Code »

The "cut here" string is used in a few paths. Define it in a single
place.

Link: http://lkml.kernel.org/r/1510100869-73751-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook
Cc: Arnd Bergmann
Cc: Fengguang Wu
Cc: Ingo Molnar
Cc: Josh Poimboeuf
Cc: Peter Zijlstra (Intel)
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-11-18 08:10:01 +0800
aaf5dcfb2 kernel debug: support resetting WARN_ONCE for all architectures ... Browse Code »

Some architectures store the WARN_ONCE state in the flags field of the
bug_entry. Clear that one too when resetting once state through
/sys/kernel/debug/clear_warn_once

Pointed out by Michael Ellerman

Improves the earlier patch that add clear_warn_once.

[ak@linux.intel.com: add a missing ifdef CONFIG_MODULES]
Link: http://lkml.kernel.org/r/20171020170633.9593-1-andi@firstfloor.org
[akpm@linux-foundation.org: fix unused var warning]
[akpm@linux-foundation.org: Use 0200 for clear_warn_once file, per mpe]
[akpm@linux-foundation.org: clear BUGFLAG_DONE in clear_once_table(), per mpe]
Link: http://lkml.kernel.org/r/20171019204642.7404-1-andi@firstfloor.org
Signed-off-by: Andi Kleen
Tested-by: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2017-11-18 08:10:01 +0800
3aaabbf1c lib/dma-debug.c: fix incorrect pfn calculation ... Browse Code »

dma-debug reports the following warning:

WARNING: CPU: 3 PID: 298 at kernel-4.4/lib/dma-debug.c:604
debug _dma_assert_idle+0x1a8/0x230()
DMA-API: cpu touching an active dma mapped cacheline [cln=0x00000882300]
CPU: 3 PID: 298 Comm: vold Tainted: G W O 4.4.22+ #1
Hardware name: MT6739 (DT)
Call trace:
debug_dma_assert_idle+0x1a8/0x230
wp_page_copy.isra.96+0x118/0x520
do_wp_page+0x4fc/0x534
handle_mm_fault+0xd4c/0x1310
do_page_fault+0x1c8/0x394
do_mem_abort+0x50/0xec

I found that debug_dma_alloc_coherent() and debug_dma_free_coherent()
assume that dma_alloc_coherent() always returns a linear address.

However it's possible that dma_alloc_coherent() returns a non-linear
address. In this case, page_to_pfn(virt_to_page(virt)) will return an
incorrect pfn. If the pfn is valid and mapped as a COW page, we will
hit the warning when doing wp_page_copy().

Fix this by calculating pfn for linear and non-linear addresses.

[miles.chen@mediatek.com: v4]
Link: http://lkml.kernel.org/r/1510872972-23919-1-git-send-email-miles.chen@mediatek.com
Link: http://lkml.kernel.org/r/1506484087-1177-1-git-send-email-miles.chen@mediatek.com
Signed-off-by: Miles Chen
Reviewed-by: Robin Murphy
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miles Chen
2017-11-18 08:10:00 +0800
16382e17c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull iov_iter updates from Al Viro:

- bio_{map,copy}_user_iov() series; those are cleanups - fixes from the
same pile went into mainline (and stable) in late September.

- fs/iomap.c iov_iter-related fixes

- new primitive - iov_iter_for_each_range(), which applies a function
to kernel-mapped segments of an iov_iter.

Usable for kvec and bvec ones, the latter does kmap()/kunmap() around
the callback. _Not_ usable for iovec- or pipe-backed iov_iter; the
latter is not hard to fix if the need ever appears, the former is by
design.

Another related primitive will have to wait for the next cycle - it
passes page + offset + size instead of pointer + size, and that one
will be usable for everything _except_ kvec. Unfortunately, that one
didn't get exposure in -next yet, so...

- a bit more lustre iov_iter work, including a use case for
iov_iter_for_each_range() (checksum calculation)

- vhost/scsi leak fix in failure exit

- misc cleanups and detritectomy...

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
iomap_dio_actor(): fix iov_iter bugs
switch ksocknal_lib_recv_...() to use of iov_iter_for_each_range()
lustre: switch struct ksock_conn to iov_iter
vhost/scsi: switch to iov_iter_get_pages()
fix a page leak in vhost_scsi_iov_to_sgl() error recovery
new primitive: iov_iter_for_each_range()
lnet_return_rx_credits_locked: don't abuse list_entry
xen: don't open-code iov_iter_kvec()
orangefs: remove detritus from struct orangefs_kiocb_s
kill iov_shorten()
bio_alloc_map_data(): do bmd->iter setup right there
bio_copy_user_iov(): saner bio size calculation
bio_map_user_iov(): get rid of copying iov_iter
bio_copy_from_iter(): get rid of copying iov_iter
move more stuff down into bio_copy_user_iov()
blk_rq_map_user_iov(): move iov_iter_advance() down
bio_map_user_iov(): get rid of the iov_for_each()
bio_map_user_iov(): move alignment check into the main loop
don't rely upon subsequent bio_add_pc_page() calls failing
... and with iov_iter_get_pages_alloc() it becomes even simpler
...

Linus Torvalds
2017-11-18 04:08:18 +0800

17 Nov, 2017

1 commit

b9743042b Merge tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core updates from Greg KH:
"Here is the set of driver core / debugfs patches for 4.15-rc1.

Not many here, mostly all are debugfs fixes to resolve some
long-reported problems with files going away with references to them
in userspace. There's also some SPDX cleanups for the debugfs code, as
well as a few other minor driver core changes for issues reported by
people.

All of these have been in linux-next for a week or more with no
reported issues"

* tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Fix device link deferred probe
debugfs: Remove redundant license text
debugfs: add SPDX identifiers to all debugfs files
debugfs: defer debugfs_fsdata allocation to first usage
debugfs: call debugfs_real_fops() only after debugfs_file_get()
debugfs: purge obsolete SRCU based removal protection
IB/hfi1: convert to debugfs_file_get() and -put()
debugfs: convert to debugfs_file_get() and -put()
debugfs: debugfs_real_fops(): drop __must_hold sparse annotation
debugfs: implement per-file removal protection
debugfs: add support for more elaborate ->d_fsdata
driver core: Move device_links_purge() after bus_remove_device()
arch_topology: Fix section miss match warning due to free_raw_capacity()
driver-core: pr_err() strings should end with newlines

Linus Torvalds
2017-11-17 00:55:30 +0800

16 Nov, 2017

3 commits

e60e1ee60 Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux ... Browse Code »

Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.15.

Core:
- Atomic object lifetime fixes
- Atomic iterator improvements
- Sparse/smatch fixes
- Legacy kms ioctls to be interruptible
- EDID override improvements
- fb/gem helper cleanups
- Simple outreachy patches
- Documentation improvements
- Fix dma-buf rcu races
- DRM mode object leasing for improving VR use cases.
- vgaarb improvements for non-x86 platforms.

New driver:
- tve200: Faraday Technology TVE200 block.

This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in
the StorLink SL3516 (later Cortina Systems CS3516) as well as the
Grain Media GM8180.

New bridges:
- SiI9234 support

New panels:
- S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba
LT089AC19000, Innolux AT043TN24

i915:
- Remove Coffeelake from alpha support
- Cannonlake workarounds
- Infoframe refactoring for DisplayPort
- VBT updates
- DisplayPort vswing/emph/buffer translation refactoring
- CCS fixes
- Restore GPU clock boost on missed vblanks
- Scatter list updates for userptr allocations
- Gen9+ transition watermarks
- Display IPC (Isochronous Priority Control)
- Private PAT management
- GVT: improved error handling and pci config sanitizing
- Execlist refactoring
- Transparent Huge Page support
- User defined priorities support
- HuC/GuC firmware refactoring
- DP MST fixes
- eDP power sequencing fixes
- Use RCU instead of stop_machine
- PSR state tracking support
- Eviction fixes
- BDW DP aux channel timeout fixes
- LSPCON fixes
- Cannonlake PLL fixes

amdgpu:
- Per VM BO support
- Powerplay cleanups
- CI powerplay support
- PASID mgr for kfd
- SR-IOV fixes
- initial GPU reset for vega10
- Prime mmap support
- TTM updates
- Clock query interface for Raven
- Fence to handle ioctl
- UVD encode ring support on Polaris
- Transparent huge page DMA support
- Compute LRU pipe tweaks
- BO flag to allow buffers to opt out of implicit sync
- CTX priority setting API
- VRAM lost infrastructure plumbing

qxl:
- fix flicker since atomic rework

amdkfd:
- Further improvements from internal AMD tree
- Usermode events
- Drop radeon support

nouveau:
- Pascal temperature sensor support
- Improved BAR2 handling
- MMU rework to support Pascal MMU

exynos:
- Improved HDMI/mixer support
- HDMI audio interface support

tegra:
- Prep work for tegra186
- Cleanup/fixes

msm:
- Preemption support for a5xx
- Display fixes for 8x96 (snapdragon 820)
- Async cursor plane fixes
- FW loading rework
- GPU debugging improvements

vc4:
- Prep for DSI panels
- fix T-format tiling scanout
- New madvise ioctl

Rockchip:
- LVDS support

omapdrm:
- omap4 HDMI CEC support

etnaviv:
- GPU performance counters groundwork

sun4i:
- refactor driver load + TCON backend
- HDMI improvements
- A31 support
- Misc fixes

udl:
- Probe/EDID read fixes.

tilcdc:
- Misc fixes.

pl111:
- Support more variants

adv7511:
- Improve EDID handling.
- HDMI CEC support

sii8620:
- Add remote control support"

* tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits)
drm/rockchip: analogix_dp: Use mutex rather than spinlock
drm/mode_object: fix documentation for object lookups.
drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU
drm/i915: Move init_clock_gating() back to where it was
drm/i915: Prune the reservation shared fence array
drm/i915: Idle the GPU before shinking everything
drm/i915: Lock llist_del_first() vs llist_del_all()
drm/i915: Calculate ironlake intermediate watermarks correctly, v2.
drm/i915: Disable lazy PPGTT page table optimization for vGPU
drm/i915/execlists: Remove the priority "optimisation"
drm/i915: Filter out spurious execlists context-switch interrupts
drm/amdgpu: use irq-safe lock for kiq->ring_lock
drm/amdgpu: bypass lru touch for KIQ ring submission
drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
drm/amd/powerplay: initialize a variable before using it
drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels
drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition
drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug
drm/rockchip: add CONFIG_OF dependency for lvds
...

Linus Torvalds
2017-11-16 12:42:10 +0800
7c225c69f Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge updates from Andrew Morton:

- a few misc bits

- ocfs2 updates

- almost all of MM

* emailed patches from Andrew Morton : (131 commits)
memory hotplug: fix comments when adding section
mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
mm: simplify nodemask printing
mm,oom_reaper: remove pointless kthread_run() error check
mm/page_ext.c: check if page_ext is not prepared
writeback: remove unused function parameter
mm: do not rely on preempt_count in print_vma_addr
mm, sparse: do not swamp log with huge vmemmap allocation failures
mm/hmm: remove redundant variable align_end
mm/list_lru.c: mark expected switch fall-through
mm/shmem.c: mark expected switch fall-through
mm/page_alloc.c: broken deferred calculation
mm: don't warn about allocations which stall for too long
fs: fuse: account fuse_inode slab memory as reclaimable
mm, page_alloc: fix potential false positive in __zone_watermark_ok
mm: mlock: remove lru_add_drain_all()
mm, sysctl: make NUMA stats configurable
shmem: convert shmem_init_inodecache() to void
Unify migrate_pages and move_pages access checks
mm, pagevec: rename pagevec drained field
...

Linus Torvalds
2017-11-16 11:42:40 +0800
c7df8ad29 mm, truncate: do not check mapping for every page being truncated ... Browse Code »

During truncation, the mapping has already been checked for shmem and
dax so it's known that workingset_update_node is required.

This patch avoids the checks on mapping for each page being truncated.
In all other cases, a lookup helper is used to determine if
workingset_update_node() needs to be called. The one danger is that the
API is slightly harder to use as calling workingset_update_node directly
without checking for dax or shmem mappings could lead to surprises.
However, the API rarely needs to be used and hopefully the comment is
enough to give people the hint.

sparsetruncate (tiny)
4.14.0-rc4 4.14.0-rc4
oneirq-v1r1 pickhelper-v1r1
Min Time 141.00 ( 0.00%) 140.00 ( 0.71%)
1st-qrtle Time 142.00 ( 0.00%) 141.00 ( 0.70%)
2nd-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%)
3rd-qrtle Time 143.00 ( 0.00%) 143.00 ( 0.00%)
Max-90% Time 144.00 ( 0.00%) 144.00 ( 0.00%)
Max-95% Time 147.00 ( 0.00%) 145.00 ( 1.36%)
Max-99% Time 195.00 ( 0.00%) 191.00 ( 2.05%)
Max Time 230.00 ( 0.00%) 205.00 ( 10.87%)
Amean Time 144.37 ( 0.00%) 143.82 ( 0.38%)
Stddev Time 10.44 ( 0.00%) 9.00 ( 13.74%)
Coeff Time 7.23 ( 0.00%) 6.26 ( 13.41%)
Best99%Amean Time 143.72 ( 0.00%) 143.34 ( 0.26%)
Best95%Amean Time 142.37 ( 0.00%) 142.00 ( 0.26%)
Best90%Amean Time 142.19 ( 0.00%) 141.85 ( 0.24%)
Best75%Amean Time 141.92 ( 0.00%) 141.58 ( 0.24%)
Best50%Amean Time 141.69 ( 0.00%) 141.31 ( 0.27%)
Best25%Amean Time 141.38 ( 0.00%) 140.97 ( 0.29%)

As you'd expect, the gain is marginal but it can be detected. The
differences in bonnie are all within the noise which is not surprising
given the impact on the microbenchmark.

radix_tree_update_node_t is a callback for some radix operations that
optionally passes in a private field. The only user of the callback is
workingset_update_node and as it no longer requires a mapping, the
private field is removed.

Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Reviewed-by: Jan Kara
Cc: Andi Kleen
Cc: Dave Chinner
Cc: Dave Hansen
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2017-11-16 10:21:06 +0800