28 Aug, 2017

3 commits


27 Aug, 2017

1 commit


22 Aug, 2017

1 commit

  • Pull networking fixes from David Miller:

    1) Fix IGMP handling wrt VRF, from David Ahern.

    2) Fix timer access to freed object in dccp, from Eric Dumazet.

    3) Use kmalloc_array() in ptr_ring to avoid overflow cases which are
    triggerable by userspace. Also from Eric Dumazet.

    4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin Ian
    King.

    5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.

    6) Fix use after free in TIPC, from Eric Dumazet.

    7) When replacing a route in ipv6 we need to reset the round robin
    pointer, from Wei Wang.

    8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
    relaxed ordering changes, from Thierry Redding. I made sure to get
    an explicit ACK from Bjorn this time around :-)
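    Item 3) is about multiplication overflow in an allocation size. Below is a minimal userspace sketch of the check kmalloc_array() performs, assuming plain malloc() as the allocator. alloc_array() is a hypothetical name, not the ptr_ring code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of kmalloc_array(): refuse the request
 * when n * size would wrap past SIZE_MAX, instead of silently allocating
 * a buffer that is too small. */
static void *alloc_array(size_t n, size_t size)
{
	/* n * size overflows exactly when size != 0 && n > SIZE_MAX / size */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

    A userspace-controlled n can no longer produce a short buffer; the call fails cleanly instead.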

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
    ipv6: repair fib6 tree in failure case
    net_sched: fix order of queue length updates in qdisc_replace()
    tools lib bpf: improve warning
    switchdev: documentation: minor typo fixes
    bpf, doc: also add s390x as arch to sysctl description
    net: sched: fix NULL pointer dereference when action calls some targets
    rxrpc: Fix oops when discarding a preallocated service call
    irda: do not leak initialized list.dev to userspace
    net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
    PCI: Allow PCI express root ports to find themselves
    tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
    ipv6: reset fn->rr_ptr when replacing route
    sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    tipc: fix use-after-free
    tun: handle register_netdevice() failures properly
    datagram: When peeking datagrams with offset < 0 don't skip empty skbs
    bpf, doc: improve sysctl knob description
    netxen: fix incorrect loop counter decrement
    nfp: fix infinite loop on umapping cleanup
    ...

    Linus Torvalds
     

21 Aug, 2017

2 commits

  • With '-mtune=atom', which is enabled with CONFIG_MATOM=y, GCC uses some
    unusual instructions for setting up the stack.

    Instead of:

    mov %rsp, %rbp

    it does:

    lea (%rsp), %rbp

    And instead of:

    add imm, %rsp

    it does:

    lea disp(%rsp), %rsp

    Add support for these instructions to the objtool decoder.
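    For reference, both lea forms can be recognized from their raw bytes. This is a hypothetical classifier, not objtool's decoder: classify_lea() and enum stack_op are illustrative names, and only the 8-bit displacement form of lea disp(%rsp), %rsp is handled (a 32-bit displacement would use ModRM 0xa4 instead of 0x64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of classifying a stack-setup instruction. */
enum stack_op { STACK_OP_UNKNOWN, STACK_OP_MOV_RSP_RBP, STACK_OP_ADD_RSP };

static enum stack_op classify_lea(const uint8_t *insn, size_t len, int *disp)
{
	/* REX.W prefix (0x48), LEA opcode (0x8d), SIB 0x24 = no index, base %rsp */
	if (len < 4 || insn[0] != 0x48 || insn[1] != 0x8d || insn[3] != 0x24)
		return STACK_OP_UNKNOWN;

	/* ModRM 0x2c: mod=00, reg=%rbp, rm=SIB -> lea (%rsp), %rbp, i.e. a mov */
	if (insn[2] == 0x2c)
		return STACK_OP_MOV_RSP_RBP;

	/* ModRM 0x64: mod=01 (disp8), reg=%rsp, rm=SIB -> lea disp8(%rsp), %rsp */
	if (insn[2] == 0x64 && len >= 5) {
		*disp = (int8_t)insn[4]; /* sign-extended, like add imm, %rsp */
		return STACK_OP_ADD_RSP;
	}
	return STACK_OP_UNKNOWN;
}
```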

    Reported-by: Arnd Bergmann
    Signed-off-by: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: baa41469a7b9 ("objtool: Implement stack validation 2.0")
    Link: http://lkml.kernel.org/r/4ea1db896e821226efe1f8e09f270771bde47e65.1501188854.git.jpoimboe@redhat.com
    [ This is a cherry-picked version of upcoming commit 5b8de48e82ba. ]
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     
  • Signed-off-by: Eric Leblond
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Eric Leblond
     

19 Aug, 2017

1 commit

  • The descriptions were reversed; correct this.

    Link: http://lkml.kernel.org/r/20170809234635.13443-4-mcgrof@kernel.org
    Fixes: 64b671204afd71 ("test_sysctl: add generic script to expand on tests")
    Signed-off-by: Luis R. Rodriguez
    Reported-by: Daniel Mentz
    Cc: "Eric W. Biederman"
    Cc: Colin Ian King
    Cc: Dan Carpenter
    Cc: David Binderman
    Cc: Dmitry Torokhov
    Cc: Ingo Molnar
    Cc: Jessica Yu
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Matt Redfearn
    Cc: Michal Marek
    Cc: Miroslav Benes
    Cc: Peter Zijlstra (Intel)
    Cc: Petr Mladek
    Cc: Rusty Russell
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     

17 Aug, 2017

3 commits

  • Currently this warning is triggered when compiling hv_fcopy_daemon:

    hv_fcopy_daemon.c:216:4: warning: dereferencing type-punned pointer will break
    strict-aliasing rules [-Wstrict-aliasing]
    kernel_modver = *(__u32 *)buffer;

    Convert the send/receive buffer to a union and pass individual members as
    needed. This also gives the correct size for the buffer.
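    A minimal sketch of the union approach, assuming a 64-byte buffer whose first word is a 32-bit version number. union msg_buffer and read_modver() are hypothetical names, not the daemon's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical send/receive buffer: one storage area, viewed either as raw
 * bytes (as read from the device) or as the version word. */
union msg_buffer {
	uint32_t kernel_modver;
	unsigned char raw[64];
};

static uint32_t read_modver(const union msg_buffer *buf)
{
	/* Reading through the union member is type punning that GCC supports
	 * and does not trigger -Wstrict-aliasing, unlike *(__u32 *)buffer. */
	return buf->kernel_modver;
}
```

    sizeof(union msg_buffer) is also the true size of the buffer, which is the other benefit the commit mentions.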

    Signed-off-by: Olaf Hering
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Olaf Hering
     
  • Increase buffer size so that "_{-INT_MAX}" will fit.
    Spotted by the gcc7 snprintf checker.
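    The arithmetic behind the new size: "_" plus the longest 32-bit int ("-2147483648", 11 characters) plus the terminating NUL needs 13 bytes. The sketch assumes a 32-bit int and a "_%d" format; SUFFIX_LEN and format_suffix() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

_Static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* "_" + 11 characters for INT_MIN/-INT_MAX + NUL; sizeof counts the NUL. */
#define SUFFIX_LEN sizeof("_-2147483648")

static int format_suffix(char *buf, size_t len, int n)
{
	/* snprintf returns the length it wanted to write, excluding the NUL */
	return snprintf(buf, len, "_%d", n);
}
```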

    Signed-off-by: Olaf Hering
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Olaf Hering
     
  • Since a loop device is backed by a file, a backup will already result in
    its parent filesystem being frozen. It's sufficient to just freeze the
    parent filesystem, so we can skip the loop device.

    This avoids a situation where a loop device and its parent filesystem are
    both frozen and then thawed out of order. For example, if the loop device
    is enumerated first, we would thaw it while its parent filesystem is still
    frozen. The thaw operation fails and the loop device remains frozen.

    Signed-off-by: Alex Ng
    Signed-off-by: Vyronas Tsingaras
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Alex Ng
     

16 Aug, 2017

1 commit

  • …/kernel/git/shuah/linux-kselftest

    Pull kselftest fixes from Shuah Khan:
    "This update consists of important compile and run-time error fixes to
    timers/freq-step, kmod, and sysctl tests"

    * tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    selftests: timers: freq-step: fix compile error
    selftests: futex: fix run_tests target
    test_sysctl: fix sysctl.sh by making it executable
    test_kmod: fix kmod.sh by making it executable

    Linus Torvalds
     

11 Aug, 2017

1 commit

  • Fix compile error due to ksft_exit_skip() update to take var_args.

    freq-step.c: In function ‘init_test’:
    freq-step.c:234:3: error: too few arguments to function ‘ksft_exit_skip’
    ksft_exit_skip();
    ^~~~~~~~~~~~~~
    In file included from freq-step.c:26:0:
    ../kselftest.h:167:19: note: declared here
    static inline int ksft_exit_skip(const char *msg, ...)
    ^~~~~~~~~~~~~~
    : recipe for target 'freq-step' failed
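    The fix is for callers to pass a message to the now-variadic function. To illustrate the calling convention without exiting the process, format_skip() below is a hypothetical stand-in with the same (const char *msg, ...) shape that formats into a buffer instead of printing and exiting; it is not the kselftest.h implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in mirroring ksft_exit_skip(const char *msg, ...):
 * formats the message into out instead of printing it and exiting. */
static int format_skip(char *out, size_t len, const char *msg, ...)
{
	va_list ap;
	int n;

	va_start(ap, msg);
	n = vsnprintf(out, len, msg, ap);
	va_end(ap);
	return n;
}
```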

    Signed-off-by: Shuah Khan

    Shuah Khan
     

10 Aug, 2017

1 commit

  • make -C tools/testing/selftests/futex/ run_tests doesn't run the futex
    tests.

    Running the tests when `dirname $(OUTPUT)` == $(PWD) doesn't work when
    the $(OUTPUT) is $(PWD) which is the case when the test is run using
    make -C tools/testing/selftests/futex/ run_tests.

    Fixes: a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT")
    Signed-off-by: Shuah Khan
    Reviewed-by: Darren Hart (VMware)
    Signed-off-by: Shuah Khan

    Shuah Khan
     

08 Aug, 2017

3 commits

  • We had just forgotten to do this. Without this the following test fails:

    $ sudo make -C tools/testing/selftests/sysctl/ run_tests
    make: Entering directory '/home/mcgrof/linux-next/tools/testing/selftests/sysctl'
    /bin/sh: ./sysctl.sh: Permission denied
    selftests: sysctl.sh [FAIL]
    /home/mcgrof/linux-next/tools/testing/selftests/sysctl
    make: Leaving directory '/home/mcgrof/linux-next/tools/testing/selftests/sysctl'

    Fixes: 64b671204afd71 ("test_sysctl: add generic script to expand on tests")
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Shuah Khan

    Luis R. Rodriguez
     
  • We had just forgotten to do this. Without this if we run the
    following we get a permission denied:

    sudo make -C tools/testing/selftests/kmod/ run_tests
    make: Entering directory '/home/mcgrof/linux-next/tools/testing/selftests/kmod'
    /bin/sh: ./kmod.sh: Permission denied
    selftests: kmod.sh [FAIL]
    /home/mcgrof/linux-next/tools/testing/selftests/kmod
    make: Leaving directory '/home/mcgrof/linux-next/tools/testing/selftests/kmod

    Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Shuah Khan

    Luis R. Rodriguez
     
  • Commit 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
    introduced new eBPF test cases. One of them (test_pkt_md_access.c)
    fails on s390x. The BPF verifier error message is:

    [root@s8360046 bpf]# ./test_progs
    test_pkt_access:PASS:ipv4 349 nsec
    test_pkt_access:PASS:ipv6 212 nsec
    [....]
    libbpf: load bpf program failed: Permission denied
    libbpf: -- BEGIN DUMP LOG ---
    libbpf:
    0: (71) r2 = *(u8 *)(r1 +0)
    invalid bpf_context access off=0 size=1

    libbpf: -- END LOG --
    libbpf: failed to load program 'test1'
    libbpf: failed to load object './test_pkt_md_access.o'
    Summary: 29 PASSED, 1 FAILED
    [root@s8360046 bpf]#

    This is caused by a byte endianness issue. S390x is a big endian
    architecture. Pointer access to the lowest byte or halfword of a
    four-byte value needs to add an offset.
    On little endian architectures this offset is not needed.

    Fix this and use the same approach as the originator used for other files
    (for example test_verifier.c) in his original commit.
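    A minimal sketch of the offset rule, assuming glibc-style <endian.h> macros. LOW_BYTE_OFF, LOW_HALF_OFF and the load helpers are hypothetical names, not what the selftest uses. The least significant byte of a 4-byte field is at offset 0 on little endian and at offset 3 on big endian; the low halfword is at offset 0 or 2.

```c
#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

#if __BYTE_ORDER == __LITTLE_ENDIAN
#define LOW_BYTE_OFF 0 /* LSB comes first in memory */
#define LOW_HALF_OFF 0
#else
#define LOW_BYTE_OFF 3 /* big endian (e.g. s390x): LSB comes last */
#define LOW_HALF_OFF 2
#endif

/* Narrow load of the lowest byte of a 4-byte field. */
static uint8_t load_low_byte(const uint32_t *field)
{
	uint8_t b;
	memcpy(&b, (const unsigned char *)field + LOW_BYTE_OFF, sizeof(b));
	return b;
}

/* Narrow load of the lowest halfword of a 4-byte field. */
static uint16_t load_low_half(const uint32_t *field)
{
	uint16_t h;
	memcpy(&h, (const unsigned char *)field + LOW_HALF_OFF, sizeof(h));
	return h;
}
```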

    With this fix the test program test_progs succeeds on s390x:
    [root@s8360046 bpf]# ./test_progs
    test_pkt_access:PASS:ipv4 236 nsec
    test_pkt_access:PASS:ipv6 217 nsec
    test_xdp:PASS:ipv4 3624 nsec
    test_xdp:PASS:ipv6 1722 nsec
    test_l4lb:PASS:ipv4 926 nsec
    test_l4lb:PASS:ipv6 1322 nsec
    test_tcp_estats:PASS: 0 nsec
    test_bpf_obj_id:PASS:get-fd-by-notexist-prog-id 0 nsec
    test_bpf_obj_id:PASS:get-fd-by-notexist-map-id 0 nsec
    test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
    test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
    test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
    test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
    test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
    test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
    test_bpf_obj_id:PASS:check total prog id found by get_next_id 0 nsec
    test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
    test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
    test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
    test_bpf_obj_id:PASS:check total map id found by get_next_id 0 nsec
    test_pkt_md_access:PASS: 277 nsec
    Summary: 30 PASSED, 0 FAILED
    [root@s8360046 bpf]#

    Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
    Signed-off-by: Thomas Richter
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Thomas Richter
     

05 Aug, 2017

2 commits

  • We really must check with #if __BYTE_ORDER == XYZ instead of
    just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
    actually running this on a big endian machine, the latter test
    resolves to true for user space, same for #ifdef __BIG_ENDIAN.

    E.g., looking at endian.h from libc, both are also defined
    there, so we really must test this against __BYTE_ORDER instead
    for proper insns selection. For the kernel, such checks are
    fine though e.g. see 13da9e200fe4 ("Revert "endian: #define
    __BYTE_ORDER"") and 415586c9e6d3 ("UAPI: fix endianness conditionals
    in M32R's asm/stat.h") for some more context, but not for
    user space. Let's also make sure to properly include endian.h.
    After that, suite passes for me:

    ./test_verifier: ELF 64-bit MSB executable, [...]

    Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x GNU/Linux

    Before fix: Summary: 505 PASSED, 11 FAILED
    After fix: Summary: 516 PASSED, 0 FAILED
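    The pitfall fits in a few lines, assuming a libc whose <endian.h> defines __LITTLE_ENDIAN, __BIG_ENDIAN and __BYTE_ORDER as glibc does: both #ifdef tests succeed at once, while comparing against __BYTE_ORDER agrees with a runtime probe. The function names are hypothetical.

```c
#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Correct compile-time check: compare __BYTE_ORDER against the constants. */
static int host_is_little_endian_compile_time(void)
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
	return 1;
#elif __BYTE_ORDER == __BIG_ENDIAN
	return 0;
#else
#error "unsupported byte order"
#endif
}

/* Runtime probe: look at the first byte in memory of the value 1. */
static int host_is_little_endian_runtime(void)
{
	uint32_t one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}

/* Why #ifdef is wrong in user space: both macros exist as constants. */
static int both_macros_defined(void)
{
#if defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN)
	return 1;
#else
	return 0;
#endif
}
```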

    Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • The BPF feature test as well as libbpf is missing the __NR_bpf
    define for s390 and currently refuses to compile (selftest suite
    depends on libbpf as well). Similar issue was fixed some time
    ago via b0c47807d31d ("bpf: Add sparc support to tools and
    samples."), just do the same and add definitions.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

02 Aug, 2017

1 commit

  • After the link tests, there is a race on one side of the test for
    the link coming up. It's possible, in some cases, for the test script
    to write to the 'peer_trans' files before the link has come up.

    To fix this, we simply use the link event file to ensure both sides
    see the link as up before continuing.

    Signed-off-by: Logan Gunthorpe
    Acked-by: Allen Hubbe
    Signed-off-by: Jon Mason
    Fixes: a9c59ef77458 ("ntb_test: Add a selftest script for the NTB subsystem")

    Logan Gunthorpe
     

01 Aug, 2017

1 commit

  • Pull networking fixes from David Miller:

    1) Handle notifier registry failures properly in tun/tap driver, from
    Tonghao Zhang.

    2) Fix bpf verifier handling of subtraction bounds and add a testcase
    for this, from Edward Cree.

    3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.

    4) Fix use after free in prb_retire_rx_blk_timer_expired() in AF_PACKET,
    from Cong Wang.

    5) Fix SElinux regression due to recent UDP optimizations, from Paolo
    Abeni.

    6) We accidentally increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
    paths, fix from Stefano Brivio.

    7) Fix some mem leaks in dccp, from Xin Long.

    8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
    Bergmann.

    9) MAC address length check and buffer size fixes, from Cong Wang.

    10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.

    11) Fix return value when copy_from_user() fails in
    bpf_prog_get_info_by_fd(), from Daniel Borkmann.

    12) Handle PHY_HALTED properly in phy library state machine, from
    Florian Fainelli.

    13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.

    14) Fix truesize calculation in virtio_net which led to performance
    regressions, from Michael S Tsirkin.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
    samples/bpf: fix bpf tunnel cleanup
    udp6: fix jumbogram reception
    ppp: Fix a scheduling-while-atomic bug in del_chan
    Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
    virtio_net: fix truesize for mergeable buffers
    mv643xx_eth: fix of_irq_to_resource() error check
    MAINTAINERS: Add more files to the PHY LIBRARY section
    ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
    net: phy: Correctly process PHY_HALTED in phy_stop_machine()
    sunhme: fix up GREG_STAT and GREG_IMASK register offsets
    bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
    tcp: avoid bogus gcc-7 array-bounds warning
    net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
    bpf: don't indicate success when copy_from_user fails
    udp6: fix socket leak on early demux
    net: thunderx: Fix BGX transmit stall due to underflow
    Revert "vhost: cache used event for better performance"
    team: use a larger struct for mac address
    net: check dev->addr_len for dev_set_mac_address()
    phy: bcm-ns-usb3: fix MDIO_BUS dependency
    ...

    Linus Torvalds
     

27 Jul, 2017

3 commits

  • The buffer passed to bpf_obj_get_info_by_fd() should be initialized
    to zeros. The kernel will enforce that to guarantee we can safely extend
    info structures in the future.

    Making the bpf_obj_get_info_by_fd() call in libbpf perform the zeroing
    is problematic, however, since some members of the info structures
    may need to be initialized by the callers (for instance pointers
    to buffers to which the kernel is to dump translated and jited images).

    Remove the zeroing and fix up the in-tree callers before any kernel
    has been released with this code.

    As Daniel points out this seems to be the intended operation anyway,
    since commit 95b9afd3987f ("bpf: Test for bpf ID") is itself setting
    the buffer pointers before calling bpf_obj_get_info_by_fd().
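    A minimal sketch of the resulting calling convention: the caller zeroes the whole struct, then fills in its input fields. struct info_v2 and mock_get_info() are hypothetical stand-ins for struct bpf_prog_info and the kernel-side check. The mock rejects the call if any byte beyond the fields it knows about is non-zero, which is the property that makes future extension safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical info struct; future_field is unknown to an older "kernel". */
struct info_v2 {
	uint32_t id;
	uint32_t xlated_len;   /* in: size of the caller's buffer */
	uint64_t xlated_insns; /* in: pointer to the caller's buffer */
	uint64_t future_field;
};

/* Mock kernel side: understands only the first known_size bytes and
 * rejects any non-zero byte after them (the real kernel returns an error). */
static int mock_get_info(const void *uinfo, size_t usize, size_t known_size)
{
	const unsigned char *p = uinfo;

	for (size_t i = known_size; i < usize; i++)
		if (p[i] != 0)
			return -1;
	return 0;
}

/* Caller pattern: zero everything first, then set the input members. */
static int caller_get_info(void *buf, uint32_t buf_len, size_t known_size)
{
	struct info_v2 info;

	memset(&info, 0, sizeof(info));
	info.xlated_insns = (uint64_t)(uintptr_t)buf;
	info.xlated_len = buf_len;
	return mock_get_info(&info, sizeof(info), known_size);
}
```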

    Signed-off-by: Jakub Kicinski
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Signed-off-by: Lin Ma
    Signed-off-by: Paolo Bonzini

    Lin Ma
     
  • Using variables instead of hard-coded paths makes the requirements information
    more accurate.

    Signed-off-by: Lin Ma
    Signed-off-by: Paolo Bonzini

    Lin Ma
     

25 Jul, 2017

1 commit

  • There is a bug in the verifier's handling of BPF_SUB: [a,b] - [c,d] yielded
    [a-c, b-d] rather than the correct [a-d, b-c]. So here is a test
    which, with the bogus handling, will produce ranges of [0,0] and thus
    allowed accesses; whereas the correct handling will give a range of
    [-255, 255] (and hence the right-shift will give a range of [0, 255]) and
    the accesses will be rejected.
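    The interval arithmetic as a minimal sketch, assuming 64-bit signed bounds and no overflow. struct range and both helpers are hypothetical names, not verifier code. Subtracting [0,255] from itself shows the difference between the two formulas.

```c
#include <assert.h>
#include <stdint.h>

struct range {
	int64_t min, max;
};

/* Correct: the smallest result pairs x.min with y.max, and vice versa. */
static struct range range_sub(struct range x, struct range y)
{
	struct range r = { x.min - y.max, x.max - y.min };
	return r;
}

/* Buggy: subtracting matching bounds gives a range that is too narrow. */
static struct range range_sub_buggy(struct range x, struct range y)
{
	struct range r = { x.min - y.min, x.max - y.max };
	return r;
}
```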

    Signed-off-by: Edward Cree
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Edward Cree
     

22 Jul, 2017

1 commit

  • Pull perf fixes from Ingo Molnar:
    "Two hw-enablement patches, two race fixes, three fixes for regressions
    of semantics, plus a number of tooling fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/intel: Add proper condition to run sched_task callbacks
    perf/core: Fix locking for children siblings group read
    perf/core: Fix scheduling regression of pinned groups
    perf/x86/intel: Fix debug_store reset field for freq events
    perf/x86/intel: Add Goldmont Plus CPU PMU support
    perf/x86/intel: Enable C-state residency events for Apollo Lake
    perf symbols: Accept zero as the kernel base address
    Revert "perf/core: Drop kernel samples even though :u is specified"
    perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
    perf evsel: State in the default event name if attr.exclude_kernel is set
    perf evsel: Fix attr.exclude_kernel setting for default cycles:p

    Linus Torvalds
     

21 Jul, 2017

5 commits

  • Pull networking fixes from David Miller:

    1) BPF verifier signed/unsigned value tracking fix, from Daniel
    Borkmann, Edward Cree, and Josef Bacik.

    2) Fix memory allocation length when setting up calls to
    ->ndo_set_mac_address, from Cong Wang.

    3) Add a new cxgb4 device ID, from Ganesh Goudar.

    4) Fix FIB refcount handling; we have to set its initial value before
    the configure callback (which can bump it). From David Ahern.

    5) Fix double-free in qcom/emac driver, from Timur Tabi.

    6) A bunch of gcc-7 string format overflow warning fixes from Arnd
    Bergmann.

    7) Fix link level headroom tests in ip_do_fragment(), from Vasily
    Averin.

    8) Fix chunk walking in SCTP when iterating over error and parameter
    headers. From Alexander Potapenko.

    9) TCP BBR congestion control fixes from Neal Cardwell.

    10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger.

    11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong
    Wang.

    12) xmit_recursion in ppp driver needs to be per-device not per-cpu,
    from Gao Feng.

    13) Cannot release skb->dst in UDP if IP options processing needs it.
    From Paolo Abeni.

    14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander
    Levin and myself.

    15) Revert some rtnetlink notification changes that are causing
    regressions, from David Ahern.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
    net: bonding: Fix transmit load balancing in balance-alb mode
    rds: Make sure updates to cp_send_gen can be observed
    net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
    ipv4: initialize fib_trie prior to register_netdev_notifier call.
    rtnetlink: allocate more memory for dev_set_mac_address()
    net: dsa: b53: Add missing ARL entries for BCM53125
    bpf: more tests for mixed signed and unsigned bounds checks
    bpf: add test for mixed signed and unsigned bounds checks
    bpf: fix up test cases with mixed signed/unsigned bounds
    bpf: allow to specify log level and reduce it for test_verifier
    bpf: fix mixed signed/unsigned derived min/max value bounds
    ipv6: avoid overflow of offset in ip6_find_1stfragopt
    net: tehuti: don't process data if it has not been copied from userspace
    Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
    net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
    dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
    NET: dwmac: Make dwmac reset unconditional
    net: Zero terminate ifr_name in dev_ifname().
    wireless: wext: terminate ifr name coming from userspace
    netfilter: fix netfilter_net_init() return
    ...

    Linus Torvalds
     
  • Add a couple of more test cases to BPF selftests that are related
    to mixed signed and unsigned checks.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • These failed due to a bug in verifier bounds handling.

    Signed-off-by: Edward Cree
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Edward Cree
     
  • Fix the few existing test cases that used mixed signed/unsigned
    bounds and switch them to only one flavor. The reason we need this
    is that proper boundaries cannot be derived from mixed tests.
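    Why a mixed check does not yield a usable bound, in a small sketch (the helper names are hypothetical): a value that passes a signed x < 16 check may be -1, which as an unsigned 64-bit value is UINT64_MAX, so no unsigned maximum follows from the signed test.

```c
#include <assert.h>
#include <stdint.h>

/* Signed upper-bound check: also true for every negative value. */
static int passes_signed_check(int64_t x)
{
	return x < 16;
}

/* Unsigned upper-bound check: bounds the value on both sides. */
static int passes_unsigned_check(uint64_t x)
{
	return x < 16;
}
```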

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • For the test_verifier case, it's quite hard to parse log level 2 to
    figure out what's causing an issue, compared to log level 1. We do
    want to use bpf_verify_program() in order to simulate some of the
    tests with strict alignment. So just add an argument to pass the level
    and put it to 1 for test_verifier.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Jul, 2017

4 commits

  • Merge even more updates from Andrew Morton:

    - a few leftovers

    - fault-injector rework

    - add a module loader test driver

    * emailed patches from Andrew Morton :
    kmod: throttle kmod thread limit
    kmod: add test driver to stress test the module loader
    MAINTAINERS: give kmod some maintainer love
    xtensa: use generic fb.h
    fault-inject: add /proc//fail-nth
    fault-inject: simplify access check for fail-nth
    fault-inject: make fail-nth read/write interface symmetric
    fault-inject: parse as natural 1-based value for fail-nth write interface
    fault-inject: automatically detect the number base for fail-nth write interface
    kernel/watchdog.c: use better pr_fmt prefix
    MAINTAINERS: move the befs tree to kernel.org
    lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
    mm: fix overflow check in expand_upwards()

    Linus Torvalds
     
  • If we reach the limit of modprobe_limit threads running, the next
    request_module() call will fail. The original reason for adding a kill
    was to do away with possible issues in old circumstances which would
    create a recursive series of request_module() calls.

    We can do better than just being super aggressive and rejecting calls once we've
    reached the limit by simply making pending callers wait until the
    threshold has been reduced, and then throttling them in, one by one.

    This throttling enables requests over the kmod concurrent limit to be
    processed once a pending request completes. Only the first item queued up
    to wait is woken up. The assumption here is that once a task is woken, it will
    have no other option than to also kick the queue to check if there are more
    pending tasks -- regardless of whether or not it was successful.

    By throttling and processing only max kmod concurrent tasks we ensure we
    avoid unexpected fatal request_module() calls, and we keep memory
    consumption on module loading to a minimum.
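    The throttling idea can be modeled in userspace, with a mutex and condition variable standing in for the kernel's wait queue. This is a hypothetical sketch, not kernel/kmod.c: MAX_CONCURRENT, throttle_enter(), throttle_exit() and fake_request_module() are illustrative names. Callers over the limit sleep instead of failing, and each finishing caller wakes one waiter.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CONCURRENT 2 /* hypothetical stand-in for the kmod limit */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t slot_free = PTHREAD_COND_INITIALIZER;
static int running, max_seen, completed;

static void throttle_enter(void)
{
	pthread_mutex_lock(&lock);
	while (running >= MAX_CONCURRENT) /* wait instead of failing */
		pthread_cond_wait(&slot_free, &lock);
	running++;
	if (running > max_seen)
		max_seen = running;
	pthread_mutex_unlock(&lock);
}

static void throttle_exit(void)
{
	pthread_mutex_lock(&lock);
	running--;
	completed++;
	pthread_cond_signal(&slot_free); /* wake one waiter (POSIX: at least one) */
	pthread_mutex_unlock(&lock);
}

static void *fake_request_module(void *arg)
{
	(void)arg;
	throttle_enter();
	/* ... the actual module load would happen here ... */
	throttle_exit();
	return NULL;
}

/* Start n requests at once; returns how many completed, or -1 on error. */
static int run_requests(int n)
{
	pthread_t tids[64];
	int created = 0;

	if (n > 64)
		return -1;
	while (created < n &&
	       pthread_create(&tids[created], NULL, fake_request_module, NULL) == 0)
		created++;
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return created == n ? completed : -1;
}
```

    Every request eventually completes, but no more than MAX_CONCURRENT ever run at the same time.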

    On x86_64 qemu with 4 cores and 4 GiB of RAM, running the two tests takes
    the following time:

    time ./kmod.sh -t 0008
    real 0m16.366s
    user 0m0.883s
    sys 0m8.916s

    time ./kmod.sh -t 0009
    real 0m50.803s
    user 0m0.791s
    sys 0m9.852s

    Link: http://lkml.kernel.org/r/20170628223155.26472-4-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Reviewed-by: Petr Mladek
    Cc: Jessica Yu
    Cc: Shuah Khan
    Cc: Rusty Russell
    Cc: Michal Marek
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • This adds a new stress test driver for kmod: the kernel module loader.
    The new stress test driver, test_kmod, is only enabled as a module right
    now. It should be possible to load this as built-in and load tests
    early (refer to the force_init_test module parameter), however since a
    lot of tests can get a system out of memory fast, we leave this disabled
    for now.

    Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
    fast with this test driver.

    The test_kmod driver exposes API knobs for us to fine tune simple
    request_module() and get_fs_type() calls. Since these API calls only
    take one parameter each, a test driver for them is rather simple.
    Other factors that can help our test driver, though, are the number of
    calls we issue and knowing current limitations of each. This exposes
    configuration as much as possible through userspace to be able to build
    tests directly from userspace.

    Since it allows multiple misc devices, it will eventually (once we add a
    knob to let us create new devices at will) also be possible to perform
    more tests in parallel, provided you have enough memory.

    We only enable tests we know work as of right now.

    Demo screenshots:

    # tools/testing/selftests/kmod/kmod.sh
    kmod_test_0001_driver: OK! - loading kmod test
    kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
    kmod_test_0001_fs: OK! - loading kmod test
    kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
    kmod_test_0002_driver: OK! - loading kmod test
    kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
    kmod_test_0002_fs: OK! - loading kmod test
    kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
    kmod_test_0003: OK! - loading kmod test
    kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0004: OK! - loading kmod test
    kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0005: OK! - loading kmod test
    kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0006: OK! - loading kmod test
    kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0005: OK! - loading kmod test
    kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0006: OK! - loading kmod test
    kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    XXX: add test restult for 0007
    Test completed

    You can also request specific tests:

    # tools/testing/selftests/kmod/kmod.sh -t 0001
    kmod_test_0001_driver: OK! - loading kmod test
    kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
    kmod_test_0001_fs: OK! - loading kmod test
    kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
    Test completed

    Lastly, the currently available tests:

    # tools/testing/selftests/kmod/kmod.sh --help
    Usage: tools/testing/selftests/kmod/kmod.sh [ -t ]
    Valid tests: 0001-0009

    0001 - Simple test - 1 thread for empty string
    0002 - Simple test - 1 thread for modules/filesystems that do not exist
    0003 - Simple test - 1 thread for get_fs_type() only
    0004 - Simple test - 2 threads for get_fs_type() only
    0005 - multithreaded tests with default setup - request_module() only
    0006 - multithreaded tests with default setup - get_fs_type() only
    0007 - multithreaded tests with default setup test request_module() and get_fs_type()
    0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
    0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()

    The following test cases currently fail, and as such they are not
    enabled by default:

    # tools/testing/selftests/kmod/kmod.sh -t 0008
    # tools/testing/selftests/kmod/kmod.sh -t 0009

    To be sure to run them as intended, please unload both of these modules:

    o test_module
    o xfs

    And ensure they are not loaded on your system prior to testing them. If
    you use these partitions for your rootfs, you can change the default test
    driver used for get_fs_type() by exporting it into your environment. For
    examples of other test defaults you can override, refer to
    allow_user_defaults() in kmod.sh.

    Behind the scenes, this is how we fine tune a test case prior to
    hitting a trigger to run it:

    cat /sys/devices/virtual/misc/test_kmod0/config
    echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
    echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
    echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
    cat /sys/devices/virtual/misc/test_kmod0/config
    echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads

    Finally to trigger:

    echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config

    The kmod.sh script uses the above constructs to build different test cases.
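    Each of those steps is a plain write with no trailing newline, as echo -n produces. A hypothetical C helper doing the same, assuming nothing beyond POSIX open()/write():

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical equivalent of: echo -n "value" > path */
static int write_knob(const char *path, const char *value)
{
	int fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -1;

	size_t len = strlen(value); /* no '\n' appended, unlike plain echo */
	ssize_t n = write(fd, value, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}
```

    For example, write_knob("/sys/devices/virtual/misc/test_kmod0/config_num_threads", "80") mirrors the third echo in the sequence above.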

    A bit of interpretation of the current failures follows. First, two
    premises:

    a) When request_module() is used, userspace figures out an optimized
    version of module order for us. Once it finds the modules it needs, as
    per depmod symbol dep map, it will finit_module() the respective
    modules which are needed for the original request_module() request.

    b) We have an optimization in place whereby if a kernel uses
    request_module() on a module already loaded we never bother userspace
    as the module already is loaded. This is all handled by kernel/kmod.c.

    A few things to consider to help identify root causes of issues:

    0) kmod 19 has a broken heuristic for modules being assumed to be
    built-in to your kernel and will return 0 even though request_module()
    failed. Upgrade to a newer version of kmod.

    1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
    not for "xfs". The optimization in kernel described in b) fails to
    catch if we have a lot of consecutive get_fs_type() calls. The reason
    is the optimization in place does not look for aliases. This means two
    consecutive get_fs_type() calls will bump kmod_concurrent, whereas
    request_module() will not.

    This is one explanation of why test case 0009 fails at least once for
    get_fs_type().

    2) If a module fails to load --- for whatever reason (kmod_concurrent
    limit reached, file not yet present due to rootfs switch, out of
    memory), we have a period of time during which module requests for the
    same name either with request_module() or get_fs_type() will *also*
    fail to load even if the file for the module is ready.

    This explains why *multiple* NULLs are possible on test 0009.

    3) finit_module() consumes quite a bit of memory.

    4) Filesystems typically also have more dependent modules than other
    modules. It's important to note, though, that even though a
    get_fs_type() call does not incur additional kmod_concurrent bumps,
    userspace loads the dependencies it needs via finit_module_fd(), so it
    *will* take much more memory to load a module with a lot of
    dependencies.

    Because of 3) and 4) we will easily run into out-of-memory failures
    with certain tests. For instance, test 0006 fails on qemu with 1024 MiB
    of RAM: it panics the box after reaping all userspace processes and
    still not having enough memory to reap.
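    Point 1) above can be illustrated with a toy userspace model. It is
    not kernel code. fs_module_alias() builds the "fs-<name>" string that
    get_fs_type() asks request_module() for. already_loaded() is a
    hypothetical stand-in for the b) shortcut, which, as described above,
    matches only real module names and not aliases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model: build the "fs-<name>" alias requested for a filesystem.
 * Returns false if `buf` is too small to hold the whole alias. */
bool fs_module_alias(char *buf, size_t size, const char *fsname)
{
	assert(buf != NULL && fsname != NULL);
	int n = snprintf(buf, size, "fs-%s", fsname);
	return n >= 0 && (size_t)n < size;
}

/* Hypothetical "already loaded?" shortcut: compares the request only
 * against loaded module names and knows nothing about aliases. */
bool already_loaded(const char *request, const char *const loaded[],
		    size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (strcmp(request, loaded[i]) == 0)
			return true;
	return false;
}
```

    With "xfs" loaded, a request for "xfs" is short-circuited, but a
    request for "fs-xfs" is not, so every get_fs_type() miss reaches the
    slower path.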

    [arnd@arndb.de: add dependencies for test module]
    Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Cc: Jessica Yu
    Cc: Shuah Khan
    Cc: Rusty Russell
    Cc: Michal Marek
    Cc: Petr Mladek
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • Pull NTB updates from Jon Mason:
    "The major change in the series is a rework of the NTB infrastructure
    to allow for IDT hardware to be supported (and the resulting fallout
    from that). There are also a few clean-ups, etc.

    New IDT NTB driver and changes to the NTB infrastructure to allow for
    this different kind of NTB HW, some style fixes (per Greg KH
    recommendation), and some ntb_test tweaks"

    * tag 'ntb-4.13' of git://github.com/jonmason/ntb:
    ntb_netdev: set the net_device's parent
    ntb: Add error path/handling to Debug FS entry creation
    ntb: Add more debugfs support for ntb_perf testing options
    ntb: Remove debug-fs variables from the context structure
    ntb: Add a module option to control affinity of DMA channels
    NTB: Add IDT 89HPESxNTx PCIe-switches support
    ntb_hw_intel: Style fixes: open code macros that just obfuscate code
    ntb_hw_amd: Style fixes: open code macros that just obfuscate code
    NTB: Add ntb.h comments
    NTB: Add PCIe Gen4 link speed
    NTB: Add new Memory Windows API documentation
    NTB: Add Messaging NTB API
    NTB: Alter Scratchpads API to support multi-ports devices
    NTB: Alter MW API to support multi-ports devices
    NTB: Alter link-state API to support multi-port devices
    NTB: Add indexed ports NTB API
    NTB: Make link-state API being declared first
    NTB: ntb_test: add parameter for doorbell bitmask
    NTB: ntb_test: modprobe on remote host

    Linus Torvalds
     

14 Jul, 2017

3 commits

  • Pull more tracing updates from Steven Rostedt:
    "A few more minor updates:

    - Show the tgid mappings for user space trace tools to use

    - Fix and optimize the comm and tgid cache recording

    - Sanitize derived kprobe names

    - Ftrace selftest updates

    - trace file header fix

    - Update of Documentation/trace/ftrace.txt

    - Compiler warning fixes

    - Fix possible uninitialized variable"

    * tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftrace: Fix uninitialized variable in match_records()
    ftrace: Remove an unneeded NULL check
    ftrace: Hide cached module code for !CONFIG_MODULES
    tracing: Do note expose stack_trace_filter without DYNAMIC_FTRACE
    tracing: Update Documentation/trace/ftrace.txt
    tracing: Fixup trace file header alignment
    selftests/ftrace: Add a testcase for kprobe event naming
    selftests/ftrace: Add a test to probe module functions
    selftests/ftrace: Update multiple kprobes test for powerpc
    trace/kprobes: Sanitize derived event names
    tracing: Attempt to record other information even if some fail
    tracing: Treat recording tgid for idle task as a success
    tracing: Treat recording comm for idle task as a success
    tracing: Add saved_tgids file to show cached pid to tgid mappings
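    The new saved_tgids file is meant for user space trace tools. The
    following is a minimal C sketch of such a consumer. It assumes the
    file contents have already been read into memory and that each line
    holds one "pid tgid" pair of decimal integers. lookup_tgid() is a
    hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the tgid recorded for `pid` in the text of
 * saved_tgids, assuming well-formed "pid tgid" lines. Returns -1 if the
 * pid is not listed. */
int lookup_tgid(const char *text, int pid)
{
	assert(text != NULL);

	for (const char *line = text; *line != '\0'; ) {
		int p, tgid;

		if (sscanf(line, "%d %d", &p, &tgid) == 2 && p == pid)
			return tgid;

		const char *nl = strchr(line, '\n');
		if (nl == NULL)
			break;		/* last line had no newline */
		line = nl + 1;
	}
	return -1;
}
```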

    Linus Torvalds
     
  • Merge yet more updates from Andrew Morton:

    - various misc things

    - kexec updates

    - sysctl core updates

    - scripts/gdb updates

    - checkpoint-restart updates

    - ipc updates

    - kernel/watchdog updates

    - Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature"

    - "stackprotector: ascii armor the stack canary"

    - more MM bits

    - checkpatch updates
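    The "ascii armor the stack canary" item refers to forcing one byte of
    the stack canary to zero. String functions that overflow or leak an
    adjacent buffer then stop at that byte. The following is a toy
    illustration. It assumes a 64-bit little-endian machine, where the
    least significant byte is the one stored at the lowest address.
    armor_canary() and its mask are illustrative and not copied from the
    kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed mask for a 64-bit little-endian machine: clear the least
 * significant byte, which is the byte stored at the lowest address. */
#define TOY_CANARY_MASK UINT64_C(0xffffffffffffff00)

/* Hypothetical helper: "armor" a random canary value. */
uint64_t armor_canary(uint64_t random_value)
{
	return random_value & TOY_CANARY_MASK;
}

/* Return the canary byte at the lowest address: the first byte an
 * overflow running upward from a buffer below the canary reaches. */
unsigned char first_byte_in_memory(uint64_t canary)
{
	unsigned char bytes[sizeof canary];
	memcpy(bytes, &canary, sizeof canary);
	return bytes[0];
}
```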

    * emailed patches from Andrew Morton: (96 commits)
    writeback: rework wb_[dec|inc]_stat family of functions
    ARM: samsung: usb-ohci: move inline before return type
    video: fbdev: omap: move inline before return type
    video: fbdev: intelfb: move inline before return type
    USB: serial: safe_serial: move __inline__ before return type
    drivers: tty: serial: move inline before return type
    drivers: s390: move static and inline before return type
    x86/efi: move asmlinkage before return type
    sh: move inline before return type
    MIPS: SMP: move asmlinkage before return type
    m68k: coldfire: move inline before return type
    ia64: sn: pci: move inline before type
    ia64: move inline before return type
    FRV: tlbflush: move asmlinkage before return type
    CRIS: gpio: move inline before return type
    ARM: HP Jornada 7XX: move inline before return type
    ARM: KVM: move asmlinkage before type
    checkpatch: improve the STORAGE_CLASS test
    mm, migration: do not trigger OOM killer when migrating memory
    drm/i915: use __GFP_RETRY_MAYFAIL
    ...

    Linus Torvalds
     
  • Pull RTC updates from Alexandre Belloni:
    "Here is the pull-request for the RTC subsystem for 4.13.

    Subsystem:

    - expose non-volatile RAM using nvmem instead of open coding it in
    many drivers. Unfortunately, this option has to be enabled by default
    so as not to break existing users.

    - rtctest can now test for cutoff dates, showing when an RTC will
    start failing to properly save time and date.

    - new RTC registration functions to remove race conditions in drivers

    Newly supported RTCs:

    - Broadcom STB wake-timer

    - Epson RX8130CE

    - Maxim IC DS1308

    - STMicroelectronics STM32H7

    Drivers:

    - ds1307: use regmap, use nvmem, more cleanups

    - ds3232: temperature reading support

    - gemini: renamed to ftrtc010

    - m41t80: use CCF to expose the clock

    - rv8803: use nvmem

    - s3c: many cleanups

    - st-lpc: fix y2106 bug"
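    The y2038/y2106 cutoffs mentioned above follow from the width of the
    seconds counter. A signed 32-bit count of seconds since 1970 runs out
    in 2038; an unsigned one runs out in 2106. The following sketch
    computes the year of the last representable second. It assumes a
    platform with a 64-bit time_t. last_representable_year() is a
    hypothetical helper, not the rtctest code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Year (UTC) of the last second a counter of seconds since
 * 1970-01-01T00:00:00Z can hold, given the counter's maximum value.
 * Assumes a 64-bit time_t; returns -1 if gmtime_r() fails. */
int last_representable_year(int64_t max_seconds)
{
	time_t t = (time_t)max_seconds;
	struct tm tm;

	if (gmtime_r(&t, &tm) == NULL)
		return -1;
	return tm.tm_year + 1900;
}
```

    INT32_MAX seconds lands on 2038-01-19 03:14:07 UTC, and UINT32_MAX
    seconds on 2106-02-07 06:28:15 UTC.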

    * tag 'rtc-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (51 commits)
    rtc: Remove wrong deprecation comment
    nvmem: include linux/err.h from header
    rtc: st-lpc: make it robust against y2038/2106 bug
    rtc: rtctest: add check for problematic dates
    tools: timer: add rtctest_setdate
    rtc: ds1307: remove ds1307_remove
    rtc: ds1307: use generic nvmem
    rtc: ds1307: switch to rtc_register_device
    rtc: rv8803: remove rv8803_remove
    rtc: rv8803: use generic nvmem support
    rtc: rv8803: switch to rtc_register_device
    rtc: add generic nvmem support
    rtc: at91rm9200: remove race condition
    rtc: introduce new registration method
    rtc: class separate id allocation from registration
    rtc: class separate device allocation from registration
    rtc: stm32: add STM32H7 RTC support
    dt-bindings: rtc: stm32: add support for STM32H7
    rtc: ds1307: add ds1308 variant
    rtc: ds3232: add temperature support
    ...

    Linus Torvalds
     

13 Jul, 2017

2 commits

  • Pull networking fixes from David Miller:

    1) Fix 64-bit division in mlx5 IPSEC offload support, from Ilan Tayari
    and Arnd Bergmann.

    2) Fix race in statistics gathering in bnxt_en driver, from Michael
    Chan.

    3) Can't use a mutex in an RCU read-side protected section in the tap
    driver, from Cong WANG.

    4) Fix mdb leak in bridging code, from Eduardo Valentin.

    5) Fix free of wrong pointer variable in nfp driver, from Dan Carpenter.

    6) Buffer overflow in brcmfmac driver, from Arend van Spriel.

    7) ioremap_nocache() return value needs to be checked in smsc911x
    driver, from Alexey Khoroshilov.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
    net: stmmac: revert "support future possible different internal phy mode"
    sfc: don't read beyond unicast address list
    datagram: fix kernel-doc comments
    socket: add documentation for missing elements
    smsc911x: Add check for ioremap_nocache() return code
    brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
    net: hns: Bugfix for Tx timeout handling in hns driver
    net: ipmr: ipmr_get_table() returns NULL
    nfp: freeing the wrong variable
    mlxsw: spectrum_switchdev: Check status of memory allocation
    mlxsw: spectrum_switchdev: Remove unused variable
    mlxsw: spectrum_router: Fix use-after-free in route replace
    mlxsw: spectrum_router: Add missing rollback
    samples/bpf: fix a build issue
    bridge: mdb: fix leak on complete_info ptr on fail path
    tap: convert a mutex to a spinlock
    cxgb4: fix BUG() on interrupt deallocating path of ULD
    qed: Fix printk option passed when printing ipv6 addresses
    net: Fix minor code bug in timestamping.txt
    net: stmmac: Make 'alloc_dma_[rt]x_desc_resources()' look even closer
    ...

    Linus Torvalds
     
  • __GFP_REPEAT was designed to allow retry-but-eventually-fail semantics
    in the page allocator. This has been true, but only for allocation
    requests larger than PAGE_ALLOC_COSTLY_ORDER. It has always been
    ignored for smaller sizes. This is a bit unfortunate because there is
    no way to express the same semantics for those requests, and they are
    considered too important to fail, so they might end up looping in the
    page allocator forever, similarly to __GFP_NOFAIL requests.

    Now that the whole tree has been cleaned up and accidental or misled
    usage of the __GFP_REPEAT flag has been removed for !costly requests, we
    can give the original flag a better name and, more importantly, more
    useful semantics. Let's rename it to __GFP_RETRY_MAYFAIL, which tells
    the user that the allocator will try really hard but there is no
    promise of success. This works independently of the order and
    overrides the default allocator behavior. Page allocator users have
    several levels of guarantee vs. cost options (taking GFP_KERNEL as an
    example):

    - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
    attempt to free memory at all. This is the most lightweight mode; it
    doesn't even kick background reclaim. It should be used carefully
    because it might deplete the memory, and the next user might hit the
    more aggressive reclaim.

    - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT) - optimistic
    allocation without any attempt to free memory from the current
    context but can wake kswapd to reclaim memory if the zone is below
    the low watermark. Can be used from either atomic contexts or when
    the request is a performance optimization and there is another
    fallback for a slow path.

    - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
    non-sleeping allocation with an expensive fallback so it can access
    some portion of memory reserves. Usually used from interrupt/bh
    context with an expensive slow path fallback.

    - GFP_KERNEL - both background and direct reclaim are allowed and the
    _default_ page allocator behavior is used. That means that !costly
    allocation requests are basically nofail but there is no guarantee of
    that behavior so failures have to be checked properly by callers
    (e.g. an OOM killer victim is currently allowed to fail).

    - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
    and all allocation requests fail early rather than cause disruptive
    reclaim (one round of reclaim in this implementation). The OOM killer
    is not invoked.

    - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
    behavior and all allocation requests try really hard. The request
    will fail if the reclaim cannot make any progress. The OOM killer
    won't be triggered.

    - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
    and all allocation requests will loop endlessly until they succeed.
    This can be really dangerous, especially for larger orders.
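    The levels above can be summarized as a toy classifier. This is a
    userspace sketch, not kernel code. The TOY_* bit values are invented
    for illustration, I/O and FS bits are omitted, and the precedence
    between the retry modifiers is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bit values for illustration only; they are NOT the kernel's. */
#define TOY_DIRECT_RECLAIM  (1u << 0)
#define TOY_KSWAPD_RECLAIM  (1u << 1)
#define TOY_HIGH            (1u << 2)
#define TOY_NORETRY         (1u << 3)
#define TOY_RETRY_MAYFAIL   (1u << 4)
#define TOY_NOFAIL          (1u << 5)

#define TOY_RECLAIM  (TOY_DIRECT_RECLAIM | TOY_KSWAPD_RECLAIM)
#define TOY_KERNEL   TOY_RECLAIM                 /* I/O and FS bits omitted */
#define TOY_NOWAIT   (TOY_KERNEL & ~TOY_DIRECT_RECLAIM)
#define TOY_ATOMIC   ((TOY_KERNEL | TOY_HIGH) & ~TOY_DIRECT_RECLAIM)

enum toy_mode {
	TOY_MODE_NO_DIRECT_RECLAIM, /* cannot reclaim from the current context */
	TOY_MODE_FAIL_EARLY,        /* NORETRY: fail rather than disrupt, no OOM */
	TOY_MODE_DEFAULT,           /* default: small orders "basically nofail" */
	TOY_MODE_TRY_HARD,          /* RETRY_MAYFAIL: fail when reclaim stalls */
	TOY_MODE_LOOP_FOREVER,      /* NOFAIL: loop until success */
};

/* Map a flag combination onto the levels listed above. */
enum toy_mode toy_gfp_mode(unsigned int flags)
{
	if (!(flags & TOY_DIRECT_RECLAIM))
		return TOY_MODE_NO_DIRECT_RECLAIM;
	if (flags & TOY_NOFAIL)
		return TOY_MODE_LOOP_FOREVER;
	if (flags & TOY_RETRY_MAYFAIL)
		return TOY_MODE_TRY_HARD;
	if (flags & TOY_NORETRY)
		return TOY_MODE_FAIL_EARLY;
	return TOY_MODE_DEFAULT;
}

/* Only the NOFAIL level promises success; every other caller must check. */
bool toy_caller_must_check(unsigned int flags)
{
	return toy_gfp_mode(flags) != TOY_MODE_LOOP_FOREVER;
}

/* Background reclaim (kswapd) may be woken unless its bit is cleared. */
bool toy_wakes_kswapd(unsigned int flags)
{
	return (flags & TOY_KSWAPD_RECLAIM) != 0;
}
```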

    Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
    because they already had these semantics. No new users are added.
    __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
    there is no progress and we have already passed the OOM point.

    This means that all the reclaim opportunities have been exhausted except
    the most disruptive one (the OOM killer), and a user-defined fallback
    behavior is more sensible than to keep retrying in the page allocator.

    [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
    [mhocko@suse.com: semantic fix]
    Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
    [mhocko@kernel.org: address other thing spotted by Vlastimil]
    Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Alex Belits
    Cc: Chris Wilson
    Cc: Christoph Hellwig
    Cc: Darrick J. Wong
    Cc: David Daney
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: NeilBrown
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko