Eric Lee / smarc-fsl-linux-kernel

28 Aug, 2017

3 commits

b97971bee coresight: pmu: Adds return stack option to perf coresight pmu ... Browse Code »

Return stack is a programmable option on some ETM and PTM hardware.
Adds the option flags to enable this from the perf event command line.

Signed-off-by: Mike Leach
Signed-off-by: Mathieu Poirier
Signed-off-by: Greg Kroah-Hartman

Mike Leach
2017-08-28 22:05:48 +0800
9749c3727 Merge 4.13-rc7 into char-misc-next ... Browse Code »

We want the binder fix in here as well for testing and merge issues.

Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-08-28 16:19:01 +0800
fff4e7a0e Merge tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb ... Browse Code »

Pull NTB fixes from Jon Mason:
"NTB bug fixes to address an incorrect ntb_mw_count reference in the
NTB transport, improperly bringing down the link if SPADs are
corrupted, and an out-of-order issue regarding link negotiation and
data passing"

* tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb:
ntb: ntb_test: ensure the link is up before trying to configure the mws
ntb: transport shouldn't disable link due to bogus values in SPADs
ntb: use correct mw_count function in ntb_tool and ntb_transport

Linus Torvalds
2017-08-28 08:01:54 +0800

27 Aug, 2017

1 commit

c153e6210 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Ingo Molnar:
"Two fixes: one for an ldt_struct handling bug and a cherry-picked
objtool fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix use-after-free of ldt_struct
objtool: Fix '-mtune=atom' decoding support in objtool 2.0

Linus Torvalds
2017-08-27 00:06:28 +0800

22 Aug, 2017

1 commit

e3181f2c0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix IGMP handling wrt VRF, from David Ahern.

2) Fix timer access to freed object in dccp, from Eric Dumazet.

3) Use kmalloc_array() in ptr_ring to avoid overflow cases which are
triggerable by userspace. Also from Eric Dumazet.

4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin Ian
King.

5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.

6) Fix use after free in TIPC, from Eric Dumazet.

7) When replacing a route in ipv6 we need to reset the round robin
pointer, from Wei Wang.

8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
relaxed ordering changes, from Thierry Redding. I made sure to get
an explicit ACK from Bjorn this time around :-)

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: repair fib6 tree in failure case
net_sched: fix order of queue length updates in qdisc_replace()
tools lib bpf: improve warning
switchdev: documentation: minor typo fixes
bpf, doc: also add s390x as arch to sysctl description
net: sched: fix NULL pointer dereference when action calls some targets
rxrpc: Fix oops when discarding a preallocated service call
irda: do not leak initialized list.dev to userspace
net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
PCI: Allow PCI express root ports to find themselves
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
ipv6: reset fn->rr_ptr when replacing route
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
tipc: fix use-after-free
tun: handle register_netdevice() failures properly
datagram: When peeking datagrams with offset < 0 don't skip empty skbs
bpf, doc: improve sysctl knob description
netxen: fix incorrect loop counter decrement
nfp: fix infinite loop on umapping cleanup
...

Linus Torvalds
2017-08-22 04:16:27 +0800

21 Aug, 2017

2 commits

deecd4d71 objtool: Fix '-mtune=atom' decoding support in objtool 2.0 ... Browse Code »

With '-mtune=atom', which is enabled with CONFIG_MATOM=y, GCC uses some
unusual instructions for setting up the stack.

Instead of:

mov %rsp, %rbp

it does:

lea (%rsp), %rbp

And instead of:

add imm, %rsp

it does:

lea disp(%rsp), %rsp

Add support for these instructions to the objtool decoder.

Reported-by: Arnd Bergmann
Signed-off-by: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: baa41469a7b9 ("objtool: Implement stack validation 2.0")
Link: http://lkml.kernel.org/r/4ea1db896e821226efe1f8e09f270771bde47e65.1501188854.git.jpoimboe@redhat.com
[ This is a cherry-picked version of upcoming commit 5b8de48e82ba. ]
Signed-off-by: Ingo Molnar

Josh Poimboeuf
2017-08-21 22:07:27 +0800
49bf4b36f tools lib bpf: improve warning ... Browse Code »

Signed-off-by: Eric Leblond
Acked-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Eric Leblond
2017-08-21 10:49:51 +0800

19 Aug, 2017

1 commit

768dc4e48 test_kmod: fix description for -s -and -c parameters ... Browse Code »

The descriptions were reversed, correct this.

Link: http://lkml.kernel.org/r/20170809234635.13443-4-mcgrof@kernel.org
Fixes: 64b671204afd71 ("test_sysctl: add generic script to expand on tests")
Signed-off-by: Luis R. Rodriguez
Reported-by: Daniel Mentz
Cc: "Eric W. Biederman"
Cc: Colin Ian King
Cc: Dan Carpenter
Cc: David Binderman
Cc: Dmitry Torokhov
Cc: Ingo Molnar
Cc: Jessica Yu
Cc: Josh Poimboeuf
Cc: Kees Cook
Cc: Matt Redfearn
Cc: Matt Redfearn
Cc: Michal Marek
Cc: Miroslav Benes
Cc: Peter Zijlstra (Intel)
Cc: Petr Mladek
Cc: Rusty Russell
Cc: Shuah Khan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-08-19 06:32:01 +0800

17 Aug, 2017

3 commits

3f2baa8a7 Tools: hv: update buffer handling in hv_fcopy_daemon ... Browse Code »

Currently this warning is triggered when compiling hv_fcopy_daemon:

hv_fcopy_daemon.c:216:4: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
kernel_modver = *(__u32 *)buffer;

Convert the send/receive buffer to a union and pass individual members as
needed. This also gives the correct size for the buffer.

Signed-off-by: Olaf Hering
Signed-off-by: K. Y. Srinivasan
Signed-off-by: Greg Kroah-Hartman

Olaf Hering
2017-08-17 00:16:29 +0800
3619350cf Tools: hv: fix snprintf warning in kvp_daemon ... Browse Code »

Increase buffer size so that "_{-INT_MAX}" will fit.
Spotted by the gcc7 snprintf checker.

Signed-off-by: Olaf Hering
Signed-off-by: K. Y. Srinivasan
Signed-off-by: Greg Kroah-Hartman

Olaf Hering
2017-08-17 00:16:29 +0800
ea81fdf09 Tools: hv: vss: Skip freezing filesystems backed by loop ... Browse Code »

Since a loop device is backed by a file, a backup will already result in
its parent filesystem being frozen. It's sufficient to just freeze the
parent filesystem, so we can skip the loop device.

This avoids a situation where a loop device and its parent filesystem are
both frozen and then thawed out of order. For example, if the loop device
is enumerated first, we would thaw it while its parent filesystem is still
frozen. The thaw operation fails and the loop device remains frozen.

Signed-off-by: Alex Ng
Signed-off-by: Vyronas Tsingaras
Signed-off-by: K. Y. Srinivasan
Signed-off-by: Greg Kroah-Hartman

Alex Ng
2017-08-17 00:14:42 +0800

16 Aug, 2017

1 commit

40c6d1b9e Merge tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux… ... Browse Code »

…/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
"This update consists of important compile and run-time error fixes to
timers/freq-step, kmod, and sysctl tests"

* tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: timers: freq-step: fix compile error
selftests: futex: fix run_tests target
test_sysctl: fix sysctl.sh by making it executable
test_kmod: fix kmod.sh by making it executable

Linus Torvalds
2017-08-16 03:49:43 +0800

11 Aug, 2017

1 commit

622b2fbe6 selftests: timers: freq-step: fix compile error ... Browse Code »

Fix compile error due to ksft_exit_skip() update to take var_args.

freq-step.c: In function ‘init_test’:
freq-step.c:234:3: error: too few arguments to function ‘ksft_exit_skip’
ksft_exit_skip();
^~~~~~~~~~~~~~
In file included from freq-step.c:26:0:
../kselftest.h:167:19: note: declared here
static inline int ksft_exit_skip(const char *msg, ...)
^~~~~~~~~~~~~~
: recipe for target 'freq-step' failed

Signed-off-by: Shuah Khan

Shuah Khan
2017-08-11 23:28:37 +0800

10 Aug, 2017

1 commit

7ba190be8 selftests: futex: fix run_tests target ... Browse Code »

make -C tools/testing/selftests/futex/ run_tests doesn't run the futex
tests.

Running the tests when `dirname $(OUTPUT)` == $(PWD) doesn't work when
the $(OUTPUT) is $(PWD) which is the case when the test is run using
make -C tools/testing/selftests/futex/ run_tests.

Fixes: a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT")
Signed-off-by: Shuah Khan
Reviewed-by: Darren Hart (VMware)
Signed-off-by: Shuah Khan

Shuah Khan
2017-08-10 00:16:24 +0800

08 Aug, 2017

3 commits

8b0949d40 test_sysctl: fix sysctl.sh by making it executable ... Browse Code »

We had just forogtten to do this. Without this the following test fails:

$ sudo make -C tools/testing/selftests/sysctl/ run_tests
make: Entering directory '/home/mcgrof/linux-next/tools/testing/selftests/sysctl'
/bin/sh: ./sysctl.sh: Permission denied
selftests: sysctl.sh [FAIL]
/home/mcgrof/linux-next/tools/testing/selftests/sysctl
make: Leaving directory '/home/mcgrof/linux-next/tools/testing/selftests/sysctl'

Fixes: 64b671204afd71 ("test_sysctl: add generic script to expand on tests")
Signed-off-by: Luis R. Rodriguez
Signed-off-by: Shuah Khan

Luis R. Rodriguez
2017-08-08 05:13:36 +0800
0a9c40cea test_kmod: fix kmod.sh by making it executable ... Browse Code »

We had just forgotten to do this. Without this if we run the
following we get a permission denied:

sudo make -C tools/testing/selftests/kmod/ run_tests
make: Entering directory '/home/mcgrof/linux-next/tools/testing/selftests/kmod'
/bin/sh: ./kmod.sh: Permission denied
selftests: kmod.sh [FAIL]
/home/mcgrof/linux-next/tools/testing/selftests/kmod
make: Leaving directory '/home/mcgrof/linux-next/tools/testing/selftests/kmod

Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Luis R. Rodriguez
Signed-off-by: Shuah Khan

Luis R. Rodriguez
2017-08-08 05:13:11 +0800
f9ea3225d bpf: fix selftest/bpf/test_pkt_md_access on s390x ... Browse Code »

Commit 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
introduced new eBPF test cases. One of them (test_pkt_md_access.c)
fails on s390x. The BPF verifier error message is:

[root@s8360046 bpf]# ./test_progs
test_pkt_access:PASS:ipv4 349 nsec
test_pkt_access:PASS:ipv6 212 nsec
[....]
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (71) r2 = *(u8 *)(r1 +0)
invalid bpf_context access off=0 size=1

libbpf: -- END LOG --
libbpf: failed to load program 'test1'
libbpf: failed to load object './test_pkt_md_access.o'
Summary: 29 PASSED, 1 FAILED
[root@s8360046 bpf]#

This is caused by a byte endianness issue. S390x is a big endian
architecture. Pointer access to the lowest byte or halfword of a
four byte value need to add an offset.
On little endian architectures this offset is not needed.

Fix this and use the same approach as the originator used for other files
(for example test_verifier.c) in his original commit.

With this fix the test program test_progs succeeds on s390x:
[root@s8360046 bpf]# ./test_progs
test_pkt_access:PASS:ipv4 236 nsec
test_pkt_access:PASS:ipv6 217 nsec
test_xdp:PASS:ipv4 3624 nsec
test_xdp:PASS:ipv6 1722 nsec
test_l4lb:PASS:ipv4 926 nsec
test_l4lb:PASS:ipv6 1322 nsec
test_tcp_estats:PASS: 0 nsec
test_bpf_obj_id:PASS:get-fd-by-notexist-prog-id 0 nsec
test_bpf_obj_id:PASS:get-fd-by-notexist-map-id 0 nsec
test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:check total prog id found by get_next_id 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:check total map id found by get_next_id 0 nsec
test_pkt_md_access:PASS: 277 nsec
Summary: 30 PASSED, 0 FAILED
[root@s8360046 bpf]#

Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Thomas Richter
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Thomas Richter
2017-08-08 01:06:27 +0800

05 Aug, 2017

2 commits

2c460621b bpf: fix byte order test in test_verifier ... Browse Code »

We really must check with #if __BYTE_ORDER == XYZ instead of
just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
actually running this on big endian machine, the latter test
resolves to true for user space, same for #ifdef __BIG_ENDIAN.

E.g., looking at endian.h from libc, both are also defined
there, so we really must test this against __BYTE_ORDER instead
for proper insns selection. For the kernel, such checks are
fine though e.g. see 13da9e200fe4 ("Revert "endian: #define
__BYTE_ORDER"") and 415586c9e6d3 ("UAPI: fix endianness conditionals
in M32R's asm/stat.h") for some more context, but not for
user space. Lets also make sure to properly include endian.h.
After that, suite passes for me:

./test_verifier: ELF 64-bit MSB executable, [...]

Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x GNU/Linux

Before fix: Summary: 505 PASSED, 11 FAILED
After fix: Summary: 516 PASSED, 0 FAILED

Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Daniel Borkmann
Acked-by: Yonghong
Signed-off-by: David S. Miller

Daniel Borkmann
2017-08-05 07:09:06 +0800
bad1926dd bpf, s390: fix build for libbpf and selftest suite ... Browse Code »

The BPF feature test as well as libbpf is missing the __NR_bpf
define for s390 and currently refuses to compile (selftest suite
depends on libbpf as well). Similar issue was fixed some time
ago via b0c47807d31d ("bpf: Add sparc support to tools and
samples."), just do the same and add definitions.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2017-08-05 02:18:01 +0800

02 Aug, 2017

1 commit

0eb463453 ntb: ntb_test: ensure the link is up before trying to configure the mws ... Browse Code »

After the link tests, there is a race on one side of the test for
the link coming up. It's possible, in some cases, for the test script
to write to the 'peer_trans' files before the link has come up.

To fix this, we simply use the link event file to ensure both sides
see the link as up before continuning.

Signed-off-by: Logan Gunthorpe
Acked-by: Allen Hubbe
Signed-off-by: Jon Mason
Fixes: a9c59ef77458 ("ntb_test: Add a selftest script for the NTB subsystem")

Logan Gunthorpe
2017-08-02 03:18:59 +0800

01 Aug, 2017

1 commit

bc78d646e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Handle notifier registry failures properly in tun/tap driver, from
Tonghao Zhang.

2) Fix bpf verifier handling of subtraction bounds and add a testcase
for this, from Edward Cree.

3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.

4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET,
from Cong Wang.

5) Fix SElinux regression due to recent UDP optimizations, from Paolo
Abeni.

6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
paths, fix from Stefano Brivio.

7) Fix some mem leaks in dccp, from Xin Long.

8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
Bergmann.

9) Mac address length check and buffer size fixes from Cong Wang.

10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.

11) Fix return value when copy_from_user() fails in
bpf_prog_get_info_by_fd(), from Daniel Borkmann.

12) Handle PHY_HALTED properly in phy library state machine, from
Florian Fainelli.

13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.

14) Fix truesize calculation in virtio_net which led to performance
regressions, from Michael S Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
samples/bpf: fix bpf tunnel cleanup
udp6: fix jumbogram reception
ppp: Fix a scheduling-while-atomic bug in del_chan
Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
virtio_net: fix truesize for mergeable buffers
mv643xx_eth: fix of_irq_to_resource() error check
MAINTAINERS: Add more files to the PHY LIBRARY section
ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
sunhme: fix up GREG_STAT and GREG_IMASK register offsets
bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
tcp: avoid bogus gcc-7 array-bounds warning
net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
bpf: don't indicate success when copy_from_user fails
udp6: fix socket leak on early demux
net: thunderx: Fix BGX transmit stall due to underflow
Revert "vhost: cache used event for better performance"
team: use a larger struct for mac address
net: check dev->addr_len for dev_set_mac_address()
phy: bcm-ns-usb3: fix MDIO_BUS dependency
...

Linus Torvalds
2017-08-01 13:36:42 +0800

27 Jul, 2017

3 commits

d777b2ddb bpf: don't zero out the info struct in bpf_obj_get_info_by_fd() ... Browse Code »

The buffer passed to bpf_obj_get_info_by_fd() should be initialized
to zeros. Kernel will enforce that to guarantee we can safely extend
info structures in the future.

Making the bpf_obj_get_info_by_fd() call in libbpf perform the zeroing
is problematic, however, since some members of the info structures
may need to be initialized by the callers (for instance pointers
to buffers to which kernel is to dump translated and jited images).

Remove the zeroing and fix up the in-tree callers before any kernel
has been released with this code.

As Daniel points out this seems to be the intended operation anyway,
since commit 95b9afd3987f ("bpf: Test for bpf ID") is itself setting
the buffer pointers before calling bpf_obj_get_info_by_fd().

Signed-off-by: Jakub Kicinski
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Jakub Kicinski
2017-07-27 08:02:52 +0800
67fbcd62f tools/kvm_stat: add '-f help' to get the available event list ... Browse Code »

Signed-off-by: Lin Ma
Signed-off-by: Paolo Bonzini

Lin Ma
2017-07-27 01:04:53 +0800
efcb52194 tools/kvm_stat: use variables instead of hard paths in help output ... Browse Code »

Using variables instead of hard paths makes the requirements information
more accurate.

Signed-off-by: Lin Ma
Signed-off-by: Paolo Bonzini

Lin Ma
2017-07-27 01:04:52 +0800

25 Jul, 2017

1 commit

545722cb0 selftests/bpf: subtraction bounds test ... Browse Code »

There is a bug in the verifier's handling of BPF_SUB: [a,b] - [c,d] yields
was [a-c, b-d] rather than the correct [a-d, b-c]. So here is a test
which, with the bogus handling, will produce ranges of [0,0] and thus
allowed accesses; whereas the correct handling will give a range of
[-255, 255] (and hence the right-shift will give a range of [0, 255]) and
the accesses will be rejected.

Signed-off-by: Edward Cree
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Edward Cree
2017-07-25 05:02:55 +0800

22 Jul, 2017

1 commit

bbcdea658 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"Two hw-enablement patches, two race fixes, three fixes for regressions
of semantics, plus a number of tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add proper condition to run sched_task callbacks
perf/core: Fix locking for children siblings group read
perf/core: Fix scheduling regression of pinned groups
perf/x86/intel: Fix debug_store reset field for freq events
perf/x86/intel: Add Goldmont Plus CPU PMU support
perf/x86/intel: Enable C-state residency events for Apollo Lake
perf symbols: Accept zero as the kernel base address
Revert "perf/core: Drop kernel samples even though :u is specified"
perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
perf evsel: State in the default event name if attr.exclude_kernel is set
perf evsel: Fix attr.exclude_kernel setting for default cycles:p

Linus Torvalds
2017-07-22 02:12:48 +0800

21 Jul, 2017

5 commits

96080f697 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) BPF verifier signed/unsigned value tracking fix, from Daniel
Borkmann, Edward Cree, and Josef Bacik.

2) Fix memory allocation length when setting up calls to
->ndo_set_mac_address, from Cong Wang.

3) Add a new cxgb4 device ID, from Ganesh Goudar.

4) Fix FIB refcount handling, we have to set it's initial value before
the configure callback (which can bump it). From David Ahern.

5) Fix double-free in qcom/emac driver, from Timur Tabi.

6) A bunch of gcc-7 string format overflow warning fixes from Arnd
Bergmann.

7) Fix link level headroom tests in ip_do_fragment(), from Vasily
Averin.

8) Fix chunk walking in SCTP when iterating over error and parameter
headers. From Alexander Potapenko.

9) TCP BBR congestion control fixes from Neal Cardwell.

10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger.

11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong
Wang.

12) xmit_recursion in ppp driver needs to be per-device not per-cpu,
from Gao Feng.

13) Cannot release skb->dst in UDP if IP options processing needs it.
From Paolo Abeni.

14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander
Levin and myself.

15) Revert some rtnetlink notification changes that are causing
regressions, from David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
net: bonding: Fix transmit load balancing in balance-alb mode
rds: Make sure updates to cp_send_gen can be observed
net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
ipv4: initialize fib_trie prior to register_netdev_notifier call.
rtnetlink: allocate more memory for dev_set_mac_address()
net: dsa: b53: Add missing ARL entries for BCM53125
bpf: more tests for mixed signed and unsigned bounds checks
bpf: add test for mixed signed and unsigned bounds checks
bpf: fix up test cases with mixed signed/unsigned bounds
bpf: allow to specify log level and reduce it for test_verifier
bpf: fix mixed signed/unsigned derived min/max value bounds
ipv6: avoid overflow of offset in ip6_find_1stfragopt
net: tehuti: don't process data if it has not been copied from userspace
Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
NET: dwmac: Make dwmac reset unconditional
net: Zero terminate ifr_name in dev_ifname().
wireless: wext: terminate ifr name coming from userspace
netfilter: fix netfilter_net_init() return
...

Linus Torvalds
2017-07-21 07:33:39 +0800
864125025 bpf: more tests for mixed signed and unsigned bounds checks ... Browse Code »

Add a couple of more test cases to BPF selftests that are related
to mixed signed and unsigned checks.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2017-07-21 06:20:27 +0800
b712296a4 bpf: add test for mixed signed and unsigned bounds checks ... Browse Code »

These failed due to a bug in verifier bounds handling.

Signed-off-by: Edward Cree
Acked-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Edward Cree
2017-07-21 06:20:27 +0800
a15021328 bpf: fix up test cases with mixed signed/unsigned bounds ... Browse Code »

Fix the few existing test cases that used mixed signed/unsigned
bounds and switch them only to one flavor. Reason why we need this
is that proper boundaries cannot be derived from mixed tests.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2017-07-21 06:20:27 +0800
d65549041 bpf: allow to specify log level and reduce it for test_verifier ... Browse Code »

For the test_verifier case, it's quite hard to parse log level 2 to
figure out what's causing an issue when used to log level 1. We do
want to use bpf_verify_program() in order to simulate some of the
tests with strict alignment. So just add an argument to pass the level
and put it to 1 for test_verifier.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2017-07-21 06:20:27 +0800

15 Jul, 2017

4 commits

867eacd7f Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge even more updates from Andrew Morton:

- a few leftovers

- fault-injector rework

- add a module loader test driver

* emailed patches from Andrew Morton :
kmod: throttle kmod thread limit
kmod: add test driver to stress test the module loader
MAINTAINERS: give kmod some maintainer love
xtensa: use generic fb.h
fault-inject: add /proc//fail-nth
fault-inject: simplify access check for fail-nth
fault-inject: make fail-nth read/write interface symmetric
fault-inject: parse as natural 1-based value for fail-nth write interface
fault-inject: automatically detect the number base for fail-nth write interface
kernel/watchdog.c: use better pr_fmt prefix
MAINTAINERS: move the befs tree to kernel.org
lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
mm: fix overflow check in expand_upwards()

Linus Torvalds
2017-07-15 12:57:25 +0800
6d7964a72 kmod: throttle kmod thread limit ... Browse Code »

If we reach the limit of modprobe_limit threads running the next
request_module() call will fail. The original reason for adding a kill
was to do away with possible issues with in old circumstances which would
create a recursive series of request_module() calls.

We can do better than just be super aggressive and reject calls once we've
reached the limit by simply making pending callers wait until the
threshold has been reduced, and then throttling them in, one by one.

This throttling enables requests over the kmod concurrent limit to be
processed once a pending request completes. Only the first item queued up
to wait is woken up. The assumption here is once a task is woken it will
have no other option to also kick the queue to check if there are more
pending tasks -- regardless of whether or not it was successful.

By throttling and processing only max kmod concurrent tasks we ensure we
avoid unexpected fatal request_module() calls, and we keep memory
consumption on module loading to a minimum.

With x86_64 qemu, with 4 cores, 4 GiB of RAM it takes the following run
time to run both tests:

time ./kmod.sh -t 0008
real 0m16.366s
user 0m0.883s
sys 0m8.916s

time ./kmod.sh -t 0009
real 0m50.803s
user 0m0.791s
sys 0m9.852s

Link: http://lkml.kernel.org/r/20170628223155.26472-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Reviewed-by: Petr Mladek
Cc: Jessica Yu
Cc: Shuah Khan
Cc: Rusty Russell
Cc: Michal Marek
Cc: Greg Kroah-Hartman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-15 06:05:13 +0800
d9c6a72d6 kmod: add test driver to stress test the module loader ... Browse Code »

This adds a new stress test driver for kmod: the kernel module loader.
The new stress test driver, test_kmod, is only enabled as a module right
now. It should be possible to load this as built-in and load tests
early (refer to the force_init_test module parameter), however since a
lot of test can get a system out of memory fast we leave this disabled
for now.

Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
fast with this test driver.

The test_kmod driver exposes API knobs for us to fine tune simple
request_module() and get_fs_type() calls. Since these API calls only
allow each one parameter a test driver for these is rather simple.
Other factors that can help out test driver though are the number of
calls we issue and knowing current limitations of each. This exposes
configuration as much as possible through userspace to be able to build
tests directly from userspace.

Since it allows multiple misc devices its will eventually (once we add a
knob to let us create new devices at will) also be possible to perform
more tests in parallel, provided you have enough memory.

We only enable tests we know work as of right now.

Demo screenshots:

# tools/testing/selftests/kmod/kmod.sh
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0002_driver: OK! - loading kmod test
kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0002_fs: OK! - loading kmod test
kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0003: OK! - loading kmod test
kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0004: OK! - loading kmod test
kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
XXX: add test restult for 0007
Test completed

You can also request for specific tests:

# tools/testing/selftests/kmod/kmod.sh -t 0001
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
Test completed

Lastly, the current available number of tests:

# tools/testing/selftests/kmod/kmod.sh --help
Usage: tools/testing/selftests/kmod/kmod.sh [ -t ]
Valid tests: 0001-0009

0001 - Simple test - 1 thread for empty string
0002 - Simple test - 1 thread for modules/filesystems that do not exist
0003 - Simple test - 1 thread for get_fs_type() only
0004 - Simple test - 2 threads for get_fs_type() only
0005 - multithreaded tests with default setup - request_module() only
0006 - multithreaded tests with default setup - get_fs_type() only
0007 - multithreaded tests with default setup test request_module() and get_fs_type()
0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()

The following test cases currently fail, as such they are not currently
enabled by default:

# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009

To be sure to run them as intended please unload both of the modules:

o test_module
o xfs

And ensure they are not loaded on your system prior to testing them. If
you use these paritions for your rootfs you can change the default test
driver used for get_fs_type() by exporting it into your environment. For
example of other test defaults you can override refer to kmod.sh
allow_user_defaults().

Behind the scenes this is how we fine tune at a test case prior to
hitting a trigger to run it:

cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads

Finally to trigger:

echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config

The kmod.sh script uses the above constructs to build different test cases.

A bit of interpretation of the current failures follows, first two
premises:

a) When request_module() is used userspace figures out an optimized
version of module order for us. Once it finds the modules it needs, as
per depmod symbol dep map, it will finit_module() the respective
modules which are needed for the original request_module() request.

b) We have an optimization in place whereby if a kernel uses
request_module() on a module already loaded we never bother userspace
as the module already is loaded. This is all handled by kernel/kmod.c.

A few things to consider to help identify root causes of issues:

0) kmod 19 has a broken heuristic for modules being assumed to be
built-in to your kernel and will return 0 even though request_module()
failed. Upgrade to a newer version of kmod.

1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
not for "xfs". The optimization in kernel described in b) fails to
catch if we have a lot of consecutive get_fs_type() calls. The reason
is the optimization in place does not look for aliases. This means two
consecutive get_fs_type() calls will bump kmod_concurrent, whereas
request_module() will not.

This one explanation why test case 0009 fails at least once for
get_fs_type().

2) If a module fails to load --- for whatever reason (kmod_concurrent
limit reached, file not yet present due to rootfs switch, out of
memory) we have a period of time during which module request for the
same name either with request_module() or get_fs_type() will *also*
fail to load even if the file for the module is ready.

This explains why *multiple* NULLs are possible on test 0009.

3) finit_module() consumes quite a bit of memory.

4) Filesystems typically also have more dependent modules than other
modules, its important to note though that even though a get_fs_type()
call does not incur additional kmod_concurrent bumps, since userspace
loads dependencies it finds it needs via finit_module_fd(), it *will*
take much more memory to load a module with a lot of dependencies.

Because of 3) and 4) we will easily run into out of memory failures with
certain tests. For instance test 0006 fails on qemu with 1024 MiB of RAM.
It panics a box after reaping all userspace processes and still not
having enough memory to reap.

[arnd@arndb.de: add dependencies for test module]
Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Cc: Jessica Yu
Cc: Shuah Khan
Cc: Rusty Russell
Cc: Michal Marek
Cc: Petr Mladek
Cc: Greg Kroah-Hartman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-15 06:05:13 +0800
ccd5d1b91 Merge tag 'ntb-4.13' of git://github.com/jonmason/ntb ... Browse Code »

Pull NTB updates from Jon Mason:
"The major change in the series is a rework of the NTB infrastructure
to all for IDT hardware to be supported (and resulting fallout from
that). There are also a few clean-ups, etc.

New IDT NTB driver and changes to the NTB infrastructure to allow for
this different kind of NTB HW, some style fixes (per Greg KH
recommendation), and some ntb_test tweaks"

* tag 'ntb-4.13' of git://github.com/jonmason/ntb:
ntb_netdev: set the net_device's parent
ntb: Add error path/handling to Debug FS entry creation
ntb: Add more debugfs support for ntb_perf testing options
ntb: Remove debug-fs variables from the context structure
ntb: Add a module option to control affinity of DMA channels
NTB: Add IDT 89HPESxNTx PCIe-switches support
ntb_hw_intel: Style fixes: open code macros that just obfuscate code
ntb_hw_amd: Style fixes: open code macros that just obfuscate code
NTB: Add ntb.h comments
NTB: Add PCIe Gen4 link speed
NTB: Add new Memory Windows API documentation
NTB: Add Messaging NTB API
NTB: Alter Scratchpads API to support multi-ports devices
NTB: Alter MW API to support multi-ports devices
NTB: Alter link-state API to support multi-port devices
NTB: Add indexed ports NTB API
NTB: Make link-state API being declared first
NTB: ntb_test: add parameter for doorbell bitmask
NTB: ntb_test: modprobe on remote host

Linus Torvalds
2017-07-15 04:31:52 +0800

14 Jul, 2017

3 commits

bc0f51d35 Merge tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull more tracing updates from Steven Rostedt:
"A few more minor updates:

- Show the tgid mappings for user space trace tools to use

- Fix and optimize the comm and tgid cache recording

- Sanitize derived kprobe names

- Ftrace selftest updates

- trace file header fix

- Update of Documentation/trace/ftrace.txt

- Compiler warning fixes

- Fix possible uninitialized variable"

* tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix uninitialized variable in match_records()
ftrace: Remove an unneeded NULL check
ftrace: Hide cached module code for !CONFIG_MODULES
tracing: Do note expose stack_trace_filter without DYNAMIC_FTRACE
tracing: Update Documentation/trace/ftrace.txt
tracing: Fixup trace file header alignment
selftests/ftrace: Add a testcase for kprobe event naming
selftests/ftrace: Add a test to probe module functions
selftests/ftrace: Update multiple kprobes test for powerpc
trace/kprobes: Sanitize derived event names
tracing: Attempt to record other information even if some fail
tracing: Treat recording tgid for idle task as a success
tracing: Treat recording comm for idle task as a success
tracing: Add saved_tgids file to show cached pid to tgid mappings

Linus Torvalds
2017-07-14 04:17:19 +0800
ad51271af Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge yet more updates from Andrew Morton:

- various misc things

- kexec updates

- sysctl core updates

- scripts/gdb udpates

- checkpoint-restart updates

- ipc updates

- kernel/watchdog updates

- Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature"

- "stackprotector: ascii armor the stack canary"

- more MM bits

- checkpatch updates

* emailed patches from Andrew Morton : (96 commits)
writeback: rework wb_[dec|inc]_stat family of functions
ARM: samsung: usb-ohci: move inline before return type
video: fbdev: omap: move inline before return type
video: fbdev: intelfb: move inline before return type
USB: serial: safe_serial: move __inline__ before return type
drivers: tty: serial: move inline before return type
drivers: s390: move static and inline before return type
x86/efi: move asmlinkage before return type
sh: move inline before return type
MIPS: SMP: move asmlinkage before return type
m68k: coldfire: move inline before return type
ia64: sn: pci: move inline before type
ia64: move inline before return type
FRV: tlbflush: move asmlinkage before return type
CRIS: gpio: move inline before return type
ARM: HP Jornada 7XX: move inline before return type
ARM: KVM: move asmlinkage before type
checkpatch: improve the STORAGE_CLASS test
mm, migration: do not trigger OOM killer when migrating memory
drm/i915: use __GFP_RETRY_MAYFAIL
...

Linus Torvalds
2017-07-14 03:38:49 +0800
3a00be192 Merge tag 'rtc-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux ... Browse Code »

Pull RTC updates from Alexandre Belloni:
"Here is the pull-request for the RTC subsystem for 4.13.

Subsystem:

- expose non volatile RAM using nvmem instead of open coding in many
drivers. Unfortunately, this option has to be enabled by default to
not break existing users.

- rtctest can now test for cutoff dates, showing when an RTC will
start failing to properly save time and date.

- new RTC registration functions to remove race conditions in drivers

Newly supported RTCs:

- Broadcom STB wake-timer

- Epson RX8130CE

- Maxim IC DS1308

- STMicroelectronics STM32H7

Drivers:

- ds1307: use regmap, use nvmem, more cleanups

- ds3232: temperature reading support

- gemini: renamed to ftrtc010

- m41t80: use CCF to expose the clock

- rv8803: use nvmem

- s3c: many cleanups

- st-lpc: fix y2106 bug"

* tag 'rtc-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (51 commits)
rtc: Remove wrong deprecation comment
nvmem: include linux/err.h from header
rtc: st-lpc: make it robust against y2038/2106 bug
rtc: rtctest: add check for problematic dates
tools: timer: add rtctest_setdate
rtc: ds1307: remove ds1307_remove
rtc: ds1307: use generic nvmem
rtc: ds1307: switch to rtc_register_device
rtc: rv8803: remove rv8803_remove
rtc: rv8803: use generic nvmem support
rtc: rv8803: switch to rtc_register_device
rtc: add generic nvmem support
rtc: at91rm9200: remove race condition
rtc: introduce new registration method
rtc: class separate id allocation from registration
rtc: class separate device allocation from registration
rtc: stm32: add STM32H7 RTC support
dt-bindings: rtc: stm32: add support for STM32H7
rtc: ds1307: add ds1308 variant
rtc: ds3232: add temperature support
...

Linus Torvalds
2017-07-14 03:15:06 +0800

13 Jul, 2017

2 commits

edaf38251 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix 64-bit division in mlx5 IPSEC offload support, from Ilan Tayari
and Arnd Bergmann.

2) Fix race in statistics gathering in bnxt_en driver, from Michael
Chan.

3) Can't use a mutex in RCU reader protected section on tap driver, from
Cong WANG.

4) Fix mdb leak in bridging code, from Eduardo Valentin.

5) Fix free of wrong pointer variable in nfp driver, from Dan Carpenter.

6) Buffer overflow in brcmfmac driver, from Arend van SPriel.

7) ioremap_nocache() return value needs to be checked in smsc911x
driver, from Alexey Khoroshilov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
net: stmmac: revert "support future possible different internal phy mode"
sfc: don't read beyond unicast address list
datagram: fix kernel-doc comments
socket: add documentation for missing elements
smsc911x: Add check for ioremap_nocache() return code
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
net: hns: Bugfix for Tx timeout handling in hns driver
net: ipmr: ipmr_get_table() returns NULL
nfp: freeing the wrong variable
mlxsw: spectrum_switchdev: Check status of memory allocation
mlxsw: spectrum_switchdev: Remove unused variable
mlxsw: spectrum_router: Fix use-after-free in route replace
mlxsw: spectrum_router: Add missing rollback
samples/bpf: fix a build issue
bridge: mdb: fix leak on complete_info ptr on fail path
tap: convert a mutex to a spinlock
cxgb4: fix BUG() on interrupt deallocating path of ULD
qed: Fix printk option passed when printing ipv6 addresses
net: Fix minor code bug in timestamping.txt
net: stmmac: Make 'alloc_dma_[rt]x_desc_resources()' look even closer
...

Linus Torvalds
2017-07-13 10:30:57 +0800
dcda9b047 mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic ... Browse Code »

__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
the page allocator. This has been true but only for allocations
requests larger than PAGE_ALLOC_COSTLY_ORDER. It has been always
ignored for smaller sizes. This is a bit unfortunate because there is
no way to express the same semantic for those requests and they are
considered too important to fail so they might end up looping in the
page allocator for ever, similarly to GFP_NOFAIL requests.

Now that the whole tree has been cleaned up and accidental or misled
usage of __GFP_REPEAT flag has been removed for !costly requests we can
give the original flag a better name and more importantly a more useful
semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
that the allocator would try really hard but there is no promise of a
success. This will work independent of the order and overrides the
default allocator behavior. Page allocator users have several levels of
guarantee vs. cost options (take GFP_KERNEL as an example)

- GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
attempt to free memory at all. The most light weight mode which even
doesn't kick the background reclaim. Should be used carefully because
it might deplete the memory and the next user might hit the more
aggressive reclaim

- GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
allocation without any attempt to free memory from the current
context but can wake kswapd to reclaim memory if the zone is below
the low watermark. Can be used from either atomic contexts or when
the request is a performance optimization and there is another
fallback for a slow path.

- (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
non sleeping allocation with an expensive fallback so it can access
some portion of memory reserves. Usually used from interrupt/bh
context with an expensive slow path fallback.

- GFP_KERNEL - both background and direct reclaim are allowed and the
_default_ page allocator behavior is used. That means that !costly
allocation requests are basically nofail but there is no guarantee of
that behavior so failures have to be checked properly by callers
(e.g. OOM killer victim is allowed to fail currently).

- GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
and all allocation requests fail early rather than cause disruptive
reclaim (one round of reclaim in this implementation). The OOM killer
is not invoked.

- GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
behavior and all allocation requests try really hard. The request
will fail if the reclaim cannot make any progress. The OOM killer
won't be triggered.

- GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
and all allocation requests will loop endlessly until they succeed.
This might be really dangerous especially for larger orders.

Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
because they already had their semantic. No new users are added.
__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
there is no progress and we have already passed the OOM point.

This means that all the reclaim opportunities have been exhausted except
the most disruptive one (the OOM killer) and a user defined fallback
behavior is more sensible than keep retrying in the page allocator.

[akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
[mhocko@suse.com: semantic fix]
Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
[mhocko@kernel.org: address other thing spotted by Vlastimil]
Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Alex Belits
Cc: Chris Wilson
Cc: Christoph Hellwig
Cc: Darrick J. Wong
Cc: David Daney
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: NeilBrown
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2017-07-13 07:26:03 +0800