11 Aug, 2017

12 commits

  • Commit 1203c8e6fb0a ("fault-inject: simplify access check for fail-nth")
    unintentionally broke a conditional statement in should_fail(). As a
    result, no faults are injected in task context when systematic fault
    injection is not in use.

    This change restores the previous correct behaviour.

    Link: http://lkml.kernel.org/r/1501633700-3488-1-git-send-email-akinobu.mita@gmail.com
    Fixes: 1203c8e6fb0a ("fault-inject: simplify access check for fail-nth")
    Signed-off-by: Akinobu Mita
    Reported-by: Lu Fengqi
    Tested-by: Lu Fengqi
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The break was in the wrong place so file system tests don't work as
    intended, leaking memory at each test switch.

    [mcgrof@kernel.org: massaged commit subject, noted memory leak issue without the fix]
    Link: http://lkml.kernel.org/r/20170802211450.27928-6-mcgrof@kernel.org
    Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Luis R. Rodriguez
    Reported-by: David Binderman
    Cc: Colin Ian King
    Cc: Dmitry Torokhov
    Cc: Eric W. Biederman
    Cc: Jessica Yu
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Michal Marek
    Cc: Miroslav Benes
    Cc: Petr Mladek
    Cc: Rusty Russell
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • We accidentally just drop the lock twice instead of taking it and then
    releasing it. This isn't a big issue unless you are adding more than
    one device to test on, and the kmod.sh doesn't do that yet, however this
    obviously is the correct thing to do.

    [mcgrof@kernel.org: massaged subject, explain what happens]
    Link: http://lkml.kernel.org/r/20170802211450.27928-5-mcgrof@kernel.org
    Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Luis R. Rodriguez
    Cc: Colin Ian King
    Cc: David Binderman
    Cc: Dmitry Torokhov
    Cc: Eric W. Biederman
    Cc: Jessica Yu
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Michal Marek
    Cc: Miroslav Benes
    Cc: Petr Mladek
    Cc: Rusty Russell
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • Parsing with kstrtol() enables values to be negative, and we failed to
    check for negative values when parsing with test_dev_config_update_uint_sync()
    or test_dev_config_update_uint_range().

    test_dev_config_update_uint_range() has a minimum check though so an
    issue is not present there. test_dev_config_update_uint_sync() is only
    used for the number of threads to use (config_num_threads_store()), and
    indeed this would fail with an attempt for a large allocation.

    Although the issue is only present in practice with the first, fix both
    by using kstrtoul() instead of kstrtol().

    Link: http://lkml.kernel.org/r/20170802211450.27928-4-mcgrof@kernel.org
    Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
    Signed-off-by: Luis R. Rodriguez
    Reported-by: Dan Carpenter
    Cc: Colin Ian King
    Cc: David Binderman
    Cc: Dmitry Torokhov
    Cc: Eric W. Biederman
    Cc: Jessica Yu
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Michal Marek
    Cc: Miroslav Benes
    Cc: Petr Mladek
    Cc: Rusty Russell
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
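
The signed versus unsigned parsing issue described above can be sketched in userspace C. This is only an illustration under stated assumptions: kstrtol() and kstrtoul() are kernel-only, so the sketch uses strtoul() from the C library, and parse_uint_strict() is a hypothetical helper name, not a function from the test driver. Unlike kstrtoul(), userspace strtoul() accepts a leading '-' and wraps the value, so the helper rejects it explicitly.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not a kernel API: parse a decimal string into an
 * unsigned int, rejecting negative input, trailing garbage and overflow.
 * Returns 0 on success or a negative errno value.
 */
int parse_uint_strict(const char *s, unsigned int *out)
{
    char *end;
    unsigned long val;

    /* strtoul() silently accepts a leading '-', so refuse it up front. */
    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-' || *s == '\0')
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, 10);
    if (end == s)
        return -EINVAL;         /* no digits were consumed */
    if (*end == '\n')
        end++;                  /* tolerate one trailing newline */
    if (*end != '\0')
        return -EINVAL;         /* trailing garbage */
    if (errno == ERANGE || val > UINT_MAX)
        return -ERANGE;

    *out = (unsigned int)val;
    return 0;
}
```

With a signed parse, the string "-1" yields -1, which becomes UINT_MAX once stored in an unsigned thread count; that is the kind of value that would lead to an attempt at a very large allocation. The strict helper returns -EINVAL for the same input.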
     
  • Trivial fix to spelling mistake in snprintf text

    [mcgrof@kernel.org: massaged commit message]
    Link: http://lkml.kernel.org/r/20170802211450.27928-3-mcgrof@kernel.org
    Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
    Signed-off-by: Colin Ian King
    Signed-off-by: Luis R. Rodriguez
    Cc: Dmitry Torokhov
    Cc: Kees Cook
    Cc: Jessica Yu
    Cc: Rusty Russell
    Cc: Michal Marek
    Cc: Petr Mladek
    Cc: Miroslav Benes
    Cc: Josh Poimboeuf
    Cc: Eric W. Biederman
    Cc: Shuah Khan
    Cc: Dan Carpenter
    Cc: David Binderman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     
  • huge_add_to_page_cache->add_to_page_cache implicitly unlocks the page
    before returning in case of errors.

    The error returned was -EEXIST by running UFFDIO_COPY on a non-hole
    offset of a VM_SHARED hugetlbfs mapping. It was a userland bug that
    triggered it and the kernel must cope with it returning -EEXIST from
    ioctl(UFFDIO_COPY) as expected.

    page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
    kernel BUG at mm/filemap.c:964!
    invalid opcode: 0000 [#1] SMP
    CPU: 1 PID: 22582 Comm: qemu-system-x86 Not tainted 4.11.11-300.fc26.x86_64 #1
    RIP: unlock_page+0x4a/0x50
    Call Trace:
    hugetlb_mcopy_atomic_pte+0xc0/0x320
    mcopy_atomic+0x96f/0xbe0
    userfaultfd_ioctl+0x218/0xe90
    do_vfs_ioctl+0xa5/0x600
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x1a/0xa9

    Link: http://lkml.kernel.org/r/20170802165145.22628-2-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli
    Tested-by: Maxime Coquelin
    Reviewed-by: Mike Kravetz
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Rapoport
    Cc: Alexey Perevalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • The RDMA subsystem can generate several thousand of these messages per
    second eventually leading to a kernel crash. Ratelimit these messages
    to prevent this crash.

    Doug said:
    "I've been carrying a version of this for several kernel versions. I
    don't remember when they started, but we have one (and only one) class
    of machines: Dell PE R730xd, that generate these errors. When it
    happens, without a rate limit, we get rcu timeouts and kernel oopses.
    With the rate limit, we just get a lot of annoying kernel messages but
    the machine continues on, recovers, and eventually the memory
    operations all succeed"

    And:
    "> Well... why are all these EBUSY's occurring? It sounds inefficient
    > (at least) but if it is expected, normal and unavoidable then
    > perhaps we should just remove that message altogether?

    I don't have an answer to that question. To be honest, I haven't
    looked real hard. We never had this at all, then it started out of the
    blue, but only on our Dell 730xd machines (and it hits all of them),
    but no other classes or brands of machines. And we have our 730xd
    machines loaded up with different brands and models of cards (for
    instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an
    ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines
    meant it wasn't tied to any particular brand/model of RDMA hardware.
    To me, it always smelled of a hardware oddity specific to maybe the
    CPUs or mainboard chipsets in these machines, so given that I'm not an
    mm expert anyway, I never chased it down.

    A few other relevant details: it showed up somewhere around 4.8/4.9 or
    thereabouts. It never happened before, but the printk has been there
    since the 3.18 days, so possibly the test to trigger this message was
    changed, or something else in the allocator changed such that the
    situation started happening on these machines?

    And, like I said, it is specific to our 730xd machines (but they are
    all identical, so that could mean it's something like their specific
    ram configuration is causing the allocator to hit this on these
    machine but not on other machines in the cluster, I don't want to say
    it's necessarily the model of chipset or CPU, there are other bits of
    identicalness between these machines)"

    Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com
    Signed-off-by: Jonathan Toppins
    Reviewed-by: Doug Ledford
    Tested-by: Doug Ledford
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Hillf Danton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan Toppins
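
The idea of ratelimiting a message that can fire thousands of times per second can be sketched as a small window-and-burst limiter. This is a userspace illustration only: struct msg_ratelimit and msg_ratelimit_allow() are hypothetical names, the caller supplies the current time explicitly, and the kernel's own ratelimit helpers are implemented differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state; the kernel's own ratelimit helpers differ. */
struct msg_ratelimit {
    uint64_t interval_ms;   /* length of one window */
    unsigned int burst;     /* messages allowed per window */
    uint64_t window_start;  /* start time of the current window */
    unsigned int printed;   /* messages emitted in the current window */
    unsigned int missed;    /* messages suppressed in the current window */
};

/* Returns true if a message may be emitted at time now_ms. */
bool msg_ratelimit_allow(struct msg_ratelimit *rl, uint64_t now_ms)
{
    if (now_ms - rl->window_start >= rl->interval_ms) {
        /* A new window begins: reset the counters. */
        rl->window_start = now_ms;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}
```

A caller would wrap the noisy message in `if (msg_ratelimit_allow(&rl, now))`, so that at most `burst` messages are emitted per window and the rest are only counted.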
     
  • As Tetsuo points out:
    "Commit 385386cff4c6 ("mm: vmstat: move slab statistics from zone to
    node counters") broke "Slab:" field of /proc/meminfo . It shows nearly
    0kB"

    In addition to /proc/meminfo, this problem also affects the slab
    counters in OOM/allocation failure info dumps, can cause early -ENOMEM
    from overcommit protection, and can miscalculate image size requirements
    during suspend-to-disk.

    This is because the patch in question switched the slab counters from
    the zone level to the node level, but forgot to update the global
    accessor functions to read the aggregate node data instead of the
    aggregate zone data.

    Use global_node_page_state() to access the global slab counters.

    Fixes: 385386cff4c6 ("mm: vmstat: move slab statistics from zone to node counters")
    Link: http://lkml.kernel.org/r/20170801134256.5400-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reported-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Cc: Josef Bacik
    Cc: Vladimir Davydov
    Cc: Stefan Agner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pull networking fixes from David Miller:

    1) Fix handling of initial STATE message in TIPC, from Jon Paul Maloy.

    2) Fix stats handling in bcm_sysport_get_stats(), from Florian
    Fainelli.

    3) Reject 16777215 VNI value in geneve_validate(), from Girish
    Moodalbail.

    4) Fix initial IGMP sysctl setting regression, from Nikolay Borisov.

    5) Once a UFO fragmented frame is treated as UFO, we should continue
    doing so. Likewise once a frame has been segmented, we should
    continue doing that and not try to convert it to a UFO frame. From
    Willem de Bruijn.

    6) Test the AF_PACKET RX/TX ring pg_vec state under the socket lock to
    prevent races. From Willem de Bruijn.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    packet: fix tp_reserve race in packet_set_ring
    udp: consistently apply ufo or fragmentation
    net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
    igmp: Fix regression caused by igmp sysctl namespace code.
    geneve: maximum value of VNI cannot be used
    net: systemport: Fix software statistics for SYSTEMPORT Lite
    tipc: remove premature ESTABLISH FSM event at link synchronization

    Linus Torvalds
     
  • Updates to tp_reserve can race with reads of the field in
    packet_set_ring. Avoid this by holding the socket lock during
    updates in setsockopt PACKET_RESERVE.

    This bug was discovered by syzkaller.

    Fixes: 8913336a7e8d ("packet: add PACKET_RESERVE sockopt")
    Reported-by: Andrey Konovalov
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • When iteratively building a UDP datagram with MSG_MORE and that
    datagram exceeds MTU, consistently choose UFO or fragmentation.

    Once skb_is_gso, always apply ufo. Conversely, once a datagram is
    split across multiple skbs, do not consider ufo.

    Sendpage already maintains the first invariant, only add the second.
    IPv6 does not have a sendpage implementation to modify.

    A gso skb must have a partial checksum, do not follow sk_no_check_tx
    in udp_send_skb.

    Found by syzkaller.

    Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
    Reported-by: Andrey Konovalov
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Pull sparc updates from David Miller:

    1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais.

    2) Prevent crashes when bringing up sunvdc virtual block devices in
    some environments. From Jim Quigley.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain
    sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.
    sparc64: recognize and support sparc M8 cpu type
    sparc64: properly name the cpu constants

    Linus Torvalds
     

10 Aug, 2017

12 commits

  • Commit 55917a21d0cc ("netfilter: x_tables: add context to know if
    extension runs from nft_compat") introduced a member nft_compat to
    xt_tgchk_param structure.

    But it didn't set its value for ipt_init_target. With an unexpected
    value in par.nft_compat, some targets' checkentry may return an
    unexpected result.

    This patch sets all its fields to 0 and only initializes the
    non-zero fields in ipt_init_target.

    v1->v2:
    Following Wang Cong's suggestion, fix it by setting all its fields to
    0 and only initializing the non-zero fields.

    Fixes: 55917a21d0cc ("netfilter: x_tables: add context to know if extension runs from nft_compat")
    Suggested-by: Cong Wang
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
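
The pattern of zeroing a parameter structure first and then setting only the non-zero fields can be sketched as follows. The structure and function here are hypothetical stand-ins chosen for illustration; they are not the real xt_tgchk_param layout or the ipt_init_target code.

```c
#include <string.h>

/* Hypothetical stand-in for a parameter block; not the real xt_tgchk_param. */
struct check_param {
    const char *table;
    void *target_info;
    unsigned int hook_mask;
    int family;
    int nft_compat;   /* field added later; callers may forget to set it */
};

/* Zero everything first, then set only the fields that are non-zero. */
void check_param_init(struct check_param *par, const char *table,
                      void *target_info, unsigned int hook_mask, int family)
{
    memset(par, 0, sizeof(*par));
    par->table = table;
    par->target_info = target_info;
    par->hook_mask = hook_mask;
    par->family = family;
}
```

The benefit of this shape is that a field added to the structure later starts out as 0 in every caller, instead of holding whatever happened to be on the stack.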
     
  • Commit dcd87999d415 ("igmp: net: Move igmp namespace init to correct file")
    moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This
    function is only called as part of per-namespace initialization, only if
    CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is
    compiled out, causing the igmp pernet ops to not be registered and those sysctls
    being left initialized with 0. However, there are certain functions, such as
    ip_mc_join_group which are always compiled and make use of some of those
    sysctls. Let's do a partial revert of the aforementioned commit and move the
    sysctl initialization into inet_init_net, that way they will always have
    sane values.

    Fixes: dcd87999d415 ("igmp: net: Move igmp namespace init to correct file")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595
    Reported-by: Gerardo Exequiel Pozzi
    Signed-off-by: Nikolay Borisov
    Signed-off-by: David S. Miller

    Nikolay Borisov
     
  • Geneve's Virtual Network Identifier (VNI) is 24 bit long, so the range
    of values for it would be from 0 to 16777215 (2^24 -1). However, one
    cannot create a geneve device with VNI set to 16777215. This patch fixes
    this issue.

    Signed-off-by: Girish Moodalbail
    Signed-off-by: David S. Miller

    Girish Moodalbail
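
The arithmetic behind the 24-bit VNI range can be sketched as a small validator. This is an illustration of how an off-by-one at the upper bound can arise, assuming a comparison against the maximum value; the names are hypothetical and the actual geneve driver code is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define VNI_BITS 24
#define VNI_MAX  ((1u << VNI_BITS) - 1)   /* 2^24 - 1 = 16777215 */

/* Hypothetical validator: every value in 0..VNI_MAX is legal. */
bool vni_is_valid(uint32_t vni)
{
    /* Writing "vni < VNI_MAX" here would wrongly reject VNI_MAX itself. */
    return vni <= VNI_MAX;
}
```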
     
  • With SYSTEMPORT Lite we have holes in our statistics layout that make us
    skip over the hardware MIB counters, bcm_sysport_get_stats() was not
    taking that into account resulting in reporting 0 for all SW-maintained
    statistics, fix this by skipping accordingly.

    Fixes: 44a4524c54af ("net: systemport: Add support for SYSTEMPORT Lite")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • When a link between two nodes comes up, both endpoints will initially
    send out a STATE message to the peer, to increase the probability that
    the peer endpoint also is up when the first traffic message arrives.
    Thereafter, if the establishing link is the second link between two
    nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message,
    helping the peer to perform initial synchronization between the two
    links.

    However, the initial STATE message may be lost, in which case the SYNCH
    message will be the first one arriving at the peer. This should also
    work, as the SYNCH message itself will be used to take up the link
    endpoint before initializing synchronization.

    Unfortunately the code for this case is broken. Currently, the link is
    brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH
    arrives, whereupon __tipc_node_link_up() is called to distribute the
    link slots and take the link into traffic. But, __tipc_node_link_up() is
    itself starting with a test for whether the link is up, and if true,
    returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call
    is unnecessary, since tipc_node_link_up() is itself issuing such an
    event, but also harmful, since it prevents tipc_node_link_up() from
    performing the rest of its tasks, and the link endpoint in question hence
    is never taken into traffic.

    This problem has been exposed when we set up dual links between pre-
    and post-4.4 kernels, because the former ones don't send out the
    initial STATE message described above.

    We fix this by removing the unnecessary event call.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Using mpgroup to define multiple paths for a virtual disk causes multiple
    virtual-device-port ports to be created for that virtual device.
    Each virtual-device-port port then gets a vdisk created for it by the Linux
    sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver it
    cannot handle multiple ports for a single vdisk, leading to a kernel panic
    at startup.

    This fix prevents more than one vdisk per virtual-device-port being created
    until full virtual disk multipathing (mpgroup) support is implemented.

    Signed-off-by: Jim Quigley
    Reviewed-by: Liam Merwick
    Reviewed-by: Shannon Nelson
    Reviewed-by: Alexandre Chartre
    Reviewed-by: Aaron Young
    Signed-off-by: David S. Miller

    Jim Quigley
     
  • Pull pin control fixes from Linus Walleij:
    "These are the pin control fixes I have gathered since the return from
    my vacation. They boiled in -next a while so let's get them in.

    Apart from the documentation build it is purely driver fixes. Which is
    nice. The Intel fixes seem kind of important.

    - Fix the documentation build as the docs were moved

    - Correct the UART pin list on the Intel Merrifield

    - Fix pin assignment and number of pins on the Marvell Armada 37xx
    pin controller

    - Cover the Setzer models in the Chromebook DMI quirk in the Intel
    cherryview driver so they start working

    - Add the missing "sim" function to the sunxi driver

    - Fix USB pin definitions on Uniphier Pro4

    - Smatch fix for invalid reference in the zx pin control driver"

    * tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
    pinctrl: generic: update references to Documentation/pinctrl.txt
    pinctrl: intel: merrifield: Correct UART pin lists
    pinctrl: armada-37xx: Fix number of pin in south bridge
    pinctrl: armada-37xx: Fix the pin 23 on south bridge
    pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk
    pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
    pinctrl: uniphier: fix USB3 pin assignment for Pro4
    pinctrl: zte: fix dereference of 'data' in zx_set_mux()

    Linus Torvalds
     
  • Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in
    get_futex_key()") removed an unnecessary lock_page() with the
    side-effect that page->mapping needed to be treated very carefully.

    Two defensive warnings were added in case any assumption was missed and
    the first warning assumed a correct application would not alter a
    mapping backing a futex key. Since merging, it has not triggered for
    any unexpected case but Mark Rutland reported the following bug
    triggering due to the first warning.

    kernel BUG at kernel/futex.c:679!
    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
    Hardware name: linux,dummy-virt (DT)
    task: ffff80001e271780 task.stack: ffff000010908000
    PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
    LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
    pc : [] lr : [] pstate: 80000145

    The fact that it's a bug instead of a warning was due to an unrelated
    arm64 problem, but the warning itself triggered because the underlying
    mapping changed.

    This is an application issue but from a kernel perspective it's a
    recoverable situation and the warning is unnecessary so this patch
    removes the warning. The warning may potentially be triggered with the
    following test program from Mark although it may be necessary to adjust
    NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
    system.

    #include <linux/futex.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define NR_FUTEX_THREADS 16
    pthread_t threads[NR_FUTEX_THREADS];

    void *mem;

    #define MEM_PROT (PROT_READ | PROT_WRITE)
    #define MEM_SIZE 65536

    /* Thin wrapper around the raw futex system call. */
    static int futex_wrapper(int *uaddr, int op, int val,
                             const struct timespec *timeout,
                             int *uaddr2, int val3)
    {
        return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
    }

    /* Hammer the futex while the main thread keeps replacing the mapping. */
    void *poll_futex(void *unused)
    {
        for (;;) {
            futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
        }
    }

    int main(int argc, char *argv[])
    {
        int i;

        mem = mmap(NULL, MEM_SIZE, MEM_PROT,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        printf("Mapping @ %p\n", mem);

        printf("Creating futex threads...\n");

        for (i = 0; i < NR_FUTEX_THREADS; i++)
            pthread_create(&threads[i], NULL, poll_futex, NULL);

        printf("Flipping mapping...\n");
        for (;;) {
            mmap(mem, MEM_SIZE, MEM_PROT,
                 MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        }

        return 0;
    }

    Reported-and-tested-by: Mark Rutland
    Signed-off-by: Mel Gorman
    Acked-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org # 4.7+
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull i2c fixes from Wolfram Sang:
    "The main thing is to allow empty id_tables for ACPI to make some
    drivers get probed again. It looks a bit bigger than usual because it
    needs some internal renaming, too.

    Other than that, there is a fix for broken DSTDs, a super simple
    enablement for ARM MPS, and two documentation fixes which I'd like to
    see in v4.13 already"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: rephrase explanation of I2C_CLASS_DEPRECATED
    i2c: allow i2c-versatile for ARM MPS platforms
    i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz
    i2c: designware: Print clock freq on invalid clock freq error
    i2c: core: Allow empty id_table in ACPI case as well
    i2c: mux: pinctrl: mention correct module name in Kconfig help text

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Three patches that should go into this release.

    Two of them are from Paolo and fix up some corner cases with BFQ, and
    the last patch is from Ming and fixes up a potential usage count
    imbalance regression due to the recent NOWAIT work"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed
    block, bfq: consider also in_service_entity to state whether an entity is active
    block, bfq: reset in_service_entity if it becomes idle

    Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:
    "Fix two regressions in the inside-secure driver with respect to
    hmac(sha1)"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: inside-secure - fix the sha state length in hmac_sha1_setkey
    crypto: inside-secure - fix invalidation check in hmac_sha1_setkey

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "The pull requests are getting smaller, that's progress I suppose :-)

    1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi.

    2) Fix remote checksum handling in VXLAN and GUE tunneling drivers,
    from Koichiro Den.

    3) Missing u64_stats_init() calls in several drivers, from Florian
    Fainelli.

    4) TCP can set the congestion window to an invalid ssthresh value
    after congestion window reductions, from Yuchung Cheng.

    5) Fix BPF jit branch generation on s390, from Daniel Borkmann.

    6) Correct MIPS ebpf JIT merge, from David Daney.

    7) Correct byte order test in BPF test_verifier.c, from Daniel
    Borkmann.

    8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins.

    9) Handle SCTP checksums properly in mlx4 driver, from Davide
    Caratti.

    10) We can potentially enter tcp_connect() with a cached route
    already, due to fastopen, so we have to explicitly invalidate it.

    11) skb_warn_bad_offload() can bark in legitimate situations, fix from
    Willem de Bruijn"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
    net: avoid skb_warn_bad_offload false positives on UFO
    qmi_wwan: fix NULL deref on disconnect
    ppp: fix xmit recursion detection on ppp channels
    rds: Reintroduce statistics counting
    tcp: fastopen: tcp_connect() must refresh the route
    net: sched: set xt_tgchk_param par.net properly in ipt_init_target
    net: dsa: mediatek: add adjust link support for user ports
    net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
    qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
    hysdn: fix to a race condition in put_log_buffer
    s390/qeth: fix L3 next-hop in xmit qeth hdr
    asix: Fix small memory leak in ax88772_unbind()
    asix: Ensure asix_rx_fixup_info members are all reset
    asix: Add rx->ax_skb = NULL after usbnet_skb_return()
    bpf: fix selftest/bpf/test_pkt_md_access on s390x
    netvsc: fix race on sub channel creation
    bpf: fix byte order test in test_verifier
    xgene: Always get clk source, but ignore if it's missing for SGMII ports
    MIPS: Add missing file for eBPF JIT.
    bpf, s390: fix build for libbpf and selftest suite
    ...

    Linus Torvalds
     

09 Aug, 2017

16 commits

  • skb_warn_bad_offload triggers a warning when an skb enters the GSO
    stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
    checksum offload set.

    Commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
    observed that SKB_GSO_DODGY producers can trigger the check and
    that passing those packets through the GSO handlers will fix it
    up. But, the software UFO handler will set ip_summed to
    CHECKSUM_NONE.

    When __skb_gso_segment is called from the receive path, this
    triggers the warning again.

    Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
    Tx these two are equivalent. On Rx, this better matches the
    skb state (checksum computed), as CHECKSUM_NONE here means no
    checksum computed.

    See also this thread for context:
    http://patchwork.ozlabs.org/patch/799015/

    Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • qmi_wwan_disconnect is called twice when disconnecting devices with
    separate control and data interfaces. The first invocation will set
    the interface data to NULL for both interfaces to flag that the
    disconnect has been handled. But the matching NULL check was left
    out when qmi_wwan_disconnect was added, resulting in this oops:

    usb 2-1.4: USB disconnect, device number 4
    qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' usb-0000:00:1d.0-1.4, WWAN/QMI device
    BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
    IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
    PGD 0
    P4D 0
    Oops: 0000 [#1] SMP
    Modules linked in:
    CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: G E 4.12.3-nr44-normandy-r1500619820+ #1
    Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 4.6-810-g50522254fb 07/21/2017
    Workqueue: usb_hub_wq hub_event [usbcore]
    task: ffff8c882b716040 task.stack: ffffb8e800d84000
    RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
    RSP: 0018:ffffb8e800d87b38 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: ffff8c8824f3f1d0 RDI: ffff8c8824ef6400
    RBP: ffff8c8824ef6400 R08: 0000000000000000 R09: 0000000000000000
    R10: ffffb8e800d87780 R11: 0000000000000011 R12: ffffffffc07ea0e8
    R13: ffff8c8824e2e000 R14: ffff8c8824e2e098 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff8c8835300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000e0 CR3: 0000000229ca5000 CR4: 00000000000406e0
    Call Trace:
    ? usb_unbind_interface+0x71/0x270 [usbcore]
    ? device_release_driver_internal+0x154/0x210
    ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
    ? usbnet_disconnect+0x6c/0xf0 [usbnet]
    ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
    ? usb_unbind_interface+0x71/0x270 [usbcore]
    ? device_release_driver_internal+0x154/0x210

    Reported-and-tested-by: Nathaniel Roach
    Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
    Cc: Daniele Palmas
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller

    Bjørn Mork
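
The missing NULL check for a disconnect handler that runs once per interface can be sketched as below. The types and the function name are hypothetical stand-ins for illustration; they do not reproduce the qmi_wwan driver or the USB core API.

```c
#include <stddef.h>

/* Hypothetical types standing in for a USB interface and its driver data. */
struct drv_state { int active; };
struct iface { struct drv_state *data; };

/*
 * Disconnect handler invoked once per interface. Returns 1 if teardown
 * ran, 0 if an earlier invocation had already handled it.
 */
int iface_disconnect(struct iface *intf, struct iface *sibling)
{
    struct drv_state *st = intf->data;

    /* An earlier invocation already cleared the pointer: nothing to do. */
    if (!st)
        return 0;

    st->active = 0;
    /* Flag both interfaces as handled so the second call is a no-op. */
    intf->data = NULL;
    sibling->data = NULL;
    return 1;
}
```

Without the early return, the second invocation would dereference the NULL pointer that the first invocation stored, which is the shape of the oops shown above.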
     
  • Commit e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp
    devices") dropped the xmit_recursion counter incrementation in
    ppp_channel_push() and relied on ppp_xmit_process() for this task.
    But __ppp_channel_push() can also send packets directly (using the
    .start_xmit() channel callback), in which case the xmit_recursion
    counter isn't incremented anymore. If such packets get routed back to
    the parent ppp unit, ppp_xmit_process() won't notice the recursion and
    will call ppp_channel_push() on the same channel, effectively creating
    the deadlock situation that the xmit_recursion mechanism was supposed
    to prevent.

    This patch re-introduces the xmit_recursion counter incrementation in
    ppp_channel_push(). Since the xmit_recursion variable is now part of
    the parent ppp unit, incrementation is skipped if the channel doesn't
    have any. This is fine because only packets routed through the parent
    unit may enter the channel recursively.

    Finally, we have to ensure that pch->ppp is not going to be modified
    while executing ppp_channel_push(). Instead of taking this lock only
    while calling ppp_xmit_process(), we now have to hold it for the full
    ppp_channel_push() execution. This respects the ppp locks ordering
    which requires locking ->upl before ->downl.

    Fixes: e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp devices")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
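
The recursion-counter idea described above can be sketched in a single-threaded userspace model. Everything here is a simplifying assumption: struct tx_unit and tx_unit_xmit() are hypothetical, the counter is a plain field rather than the kernel's per-unit state, and no locking is modelled.

```c
#include <stdbool.h>

/* Hypothetical unit with a recursion counter; not the kernel's struct ppp. */
struct tx_unit {
    int xmit_recursion;   /* non-zero while a transmit is in progress */
    int sent;             /* packets that completed transmission */
    int dropped;          /* packets dropped because of recursion */
};

/*
 * Transmit one packet. If loop_back is true, the "lower layer" routes the
 * packet straight back into the same unit, which is the recursive case.
 */
bool tx_unit_xmit(struct tx_unit *u, bool loop_back)
{
    if (u->xmit_recursion) {
        /* Re-entered while already transmitting: drop instead of deadlock. */
        u->dropped++;
        return false;
    }

    u->xmit_recursion++;
    if (loop_back)
        tx_unit_xmit(u, false);   /* simulated re-entry via routing */
    u->sent++;
    u->xmit_recursion--;
    return true;
}
```

The point of the sketch is that every path that can transmit must increment the counter; a path that sends without doing so leaves the re-entry check blind, which is the situation the commit message describes.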
     
  • In commit 7e3f2952eeb1 ("rds: don't let RDS shutdown a connection
    while senders are present"), refilling the receive queue was removed
    from rds_ib_recv(), along with the increment of
    s_ib_rx_refill_from_thread.

    Commit 73ce4317bf98 ("RDS: make sure we post recv buffers")
    re-introduces filling the receive queue from rds_ib_recv(), but does
    not add the statistics counter. rds_ib_recv() was later renamed to
    rds_ib_recv_path().

    This commit reintroduces the statistics counting of
    s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq.

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Reviewed-by: Wei Lin Guay
    Reviewed-by: Shamir Rabinovitch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
     
  • With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
    to call tcp_connect() while socket sk_dst_cache is either NULL
    or invalid.

    +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
    +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
    +0 connect(4, ..., ...) = 0

    << sk->sk_dst_cache becomes obsolete, or even set to NULL >>

    +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000

    We need to refresh the route otherwise bad things can happen,
    especially when syzkaller is running on the host :/

    Fixes: 19f6d3f3c8422 ("net/tcp-fastopen: Add new API support")
    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Dumazet
    Cc: Wei Wang
    Cc: Yuchung Cheng
    Acked-by: Wei Wang
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
     
    Now that xt_tgchk_param par in ipt_init_target is a local variable,
    par.net is not initialized there. Later, when xt_check_target
    calls the target's checkentry, which may access par.net, it
    causes a kernel panic.

    Jaroslav found this panic when running:

    # ip link add TestIface type dummy
    # tc qd add dev TestIface ingress handle ffff:
    # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \
    action xt -j CONNMARK --set-mark 4

    This patch passes the net param into ipt_init_target and sets
    par.net from it there.

    v1->v2:
    As Wang Cong pointed out, I missed ipt_net_id != xt_net_id, so fix
    it by also passing net_id to __tcf_ipt_init.
    v2->v3:
    Missed the fixes tag, so add it.

    Fixes: ecb2421b5ddf ("netfilter: add and use nf_ct_netns_get/put")
    Reported-by: Jaroslav Aster
    Signed-off-by: Xin Long
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Xin Long
     
  • Manually adjust the port settings of user ports once PHY polling has
    completed. This patch extends the adjust_link callback to configure the
    per port PMCR register, applying the proper values polled from the PHY.
    Without this patch, flow control was not always set up properly.

    Signed-off-by: Shashidhar Lakkavalli
    Signed-off-by: Muciri Gatimu
    Signed-off-by: John Crispin
    Signed-off-by: David S. Miller

    John Crispin
     
    If the NIC fails to validate the checksum on TCP/UDP, and validation of IP
    checksum is successful, the driver subtracts the pseudo-header checksum
    from the value obtained by the hardware and sets CHECKSUM_COMPLETE. Don't
    do that if protocol is IPPROTO_SCTP, otherwise CRC32c validation fails.
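
    The decision described above can be sketched as a small model. This is
    not the driver code: the function name, its boolean arguments, and the
    CHECKSUM_UNNECESSARY and CHECKSUM_NONE branches are assumptions made
    for illustration. Only the SCTP exclusion from the CHECKSUM_COMPLETE
    path comes from the commit message.

```python
IPPROTO_SCTP = 132  # IANA-assigned protocol number for SCTP


def rx_checksum_action(ip_csum_ok, l4_csum_ok, protocol):
    """Return the checksum status a receive path might report (hypothetical model)."""
    if ip_csum_ok and l4_csum_ok:
        # Assumption: hardware validated everything, nothing left to check.
        return "CHECKSUM_UNNECESSARY"
    if ip_csum_ok and protocol != IPPROTO_SCTP:
        # TCP/UDP-style case: the pseudo-header sum would be subtracted from
        # the hardware value and the stack finishes the verification.
        return "CHECKSUM_COMPLETE"
    # SCTP uses CRC32c, not the Internet checksum, so the hardware value is
    # not usable; assumption: fall back to software validation.
    return "CHECKSUM_NONE"
```

    With this model, an SCTP packet whose L4 checksum was not validated by
    the NIC is left for software to verify instead of being reported as
    CHECKSUM_COMPLETE.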

    V2: don't test MLX4_CQE_STATUS_IPV6 if MLX4_CQE_STATUS_IPV4 is set

    Reported-by: Shuang Li
    Fixes: f8c6455bb04b ("net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE")
    Signed-off-by: Davide Caratti
    Acked-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Davide Caratti
     
  • Pull rdma fixes from Doug Ledford:
    "Third set of -rc fixes for 4.13 cycle

    - small set of miscellaneous fixes

    - a reasonably sizable set of IPoIB fixes that deal with multiple
    long-standing issues"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
    IB/hns: checking for IS_ERR() instead of NULL
    RDMA/mlx5: Fix existence check for extended address vector
    IB/uverbs: Fix device cleanup
    RDMA/uverbs: Prevent leak of reserved field
    IB/core: Fix race condition in resolving IP to MAC
    IB/ipoib: Notify on modify QP failure only when relevant
    Revert "IB/core: Allow QP state transition from reset to error"
    IB/ipoib: Remove double pointer assigning
    IB/ipoib: Clean error paths in add port
    IB/ipoib: Add get statistics support to SRIOV VF
    IB/ipoib: Add multicast packets statistics
    IB/ipoib: Set IPOIB_NEIGH_TBL_FLUSH after flushed completion initialization
    IB/ipoib: Prevent setting negative values to max_nonsrq_conn_qp
    IB/ipoib: Make sure no in-flight joins while leaving that mcast
    IB/ipoib: Use cancel_delayed_work_sync when needed
    IB/ipoib: Fix race between light events and interface restart

    Linus Torvalds
     
  • Allow any number of command line arguments to match either the
    section header or the section contents and create new files.

    Create MAINTAINERS.new and SECTION.new.

    This allows scripting of the movement of various sections from
    MAINTAINERS.
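
    A minimal sketch of the splitting idea, written in Python rather than
    the script's Perl. The function name, the (header, body) tuple layout,
    and the use of regular expressions for matching are assumptions for
    illustration, not the script's actual interface.

```python
import re


def split_sections(sections, patterns):
    """Split (header, body) pairs into (kept, moved) lists.

    A section is moved when any pattern matches its header or its body.
    """
    regexes = [re.compile(p) for p in patterns]
    kept, moved = [], []
    for header, body in sections:
        hit = any(r.search(header) or r.search(body) for r in regexes)
        (moved if hit else kept).append((header, body))
    return kept, moved
```

    In this sketch the kept list would presumably be written to
    MAINTAINERS.new and the moved list to SECTION.new.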

    Signed-off-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Instead of reading STDIN and writing STDOUT, use specific filenames of
    MAINTAINERS and MAINTAINERS.new.

    Use hash references instead of global hash %hash so future modifications
    can read and write specific hashes to split up MAINTAINERS into multiple
    files using a script.

    Signed-off-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Section [A-Z]: patterns are not currently in any required sorting order.
    Add a specific sorting sequence to MAINTAINERS entries.
    Sort F: and X: patterns in alphabetic order.

    The preferred section ordering is:

    SECTION HEADER
    M: Maintainers
    R: Reviewers
    P: Named persons without email addresses
    L: Mailing list addresses
    S: Status of this section (Supported, Maintained, Orphan, etc...)
    W: Any relevant URLs
    T: Source code control type (git, quilt, etc)
    Q: Patchwork patch acceptance queue site
    B: Bug tracking URIs
    C: Chat URIs
    F: Files with wildcard patterns (alphabetic ordered)
    X: Excluded files with wildcard patterns (alphabetic ordered)
    N: Files with regex patterns
    K: Keyword regexes in source code for maintainership identification
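
    The ordering above can be sketched as a sort key, here in Python
    rather than the script's Perl. The function name and the assumption
    that each line looks like "X:<tab>value" are illustrative. Unknown
    tags are placed last, which is a choice of this sketch and not
    something the commit specifies.

```python
# Preferred tag order, taken from the list in the commit message.
TAG_ORDER = "MRPLSWTQBCFXNK"
RANK = {tag: i for i, tag in enumerate(TAG_ORDER)}


def sort_section(lines):
    """Return one section's tag lines (header excluded) in preferred order."""
    def key(line):
        tag = line[:1]
        rank = RANK.get(tag, len(RANK))
        # Only F: and X: patterns are ordered alphabetically. Other tags
        # keep their relative order because sorted() is stable.
        pattern = line[2:].strip() if tag in ("F", "X") else ""
        return (rank, pattern)
    return sorted(lines, key=key)
```

    For example, a section listing F: entries before M: and S: would come
    back with M: first, then S:, then the F: patterns in alphabetic order.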

    Miscellaneous Perl neatening:

    - Rename %map to %hash, map has a different meaning in Perl
    - Avoid using \& and local variables for function indirection
    - Use return for a little C-like clarity
    - Use C-like function call style instead of &function

    Signed-off-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Allow for MAINTAINERS to become a directory and if it is,
    read all the files in the directory for maintained sections.

    Optionally look for all files named MAINTAINERS in directories
    excluding the .git directory by using --find-maintainer-files.

    This optional feature adds ~0.3 seconds of CPU time on an Intel
    i5-6200 with an SSD.

    Miscellanea:

    - Create a read_maintainer_file subroutine from the existing code
    - Test only the existence of MAINTAINERS, not whether it's a file
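
    A rough sketch of the file or directory handling described above, in
    Python rather than the script's Perl. Both function names are
    hypothetical, and the sketch only collects paths; parsing the
    sections is left out.

```python
import os


def maintainer_files(path="MAINTAINERS"):
    """Return the files to parse for a MAINTAINERS path.

    If `path` is a directory, every regular file directly inside it is
    returned; otherwise the path itself is returned as a one-element list.
    """
    if os.path.isdir(path):
        full = [os.path.join(path, name) for name in sorted(os.listdir(path))]
        return [f for f in full if os.path.isfile(f)]
    return [path]


def find_maintainer_files(root="."):
    """Walk `root` for files named MAINTAINERS, skipping .git directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning in place stops os.walk from descending into .git.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        if "MAINTAINERS" in filenames:
            found.append(os.path.join(dirpath, "MAINTAINERS"))
    return sorted(found)
```

    The second function corresponds loosely to the behaviour enabled by
    --find-maintainer-files. The full tree walk is the likely source of
    the extra CPU time mentioned above.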

    Signed-off-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The openbmc mailing list is moderated for non-subscribers.

    Signed-off-by: Randy Dunlap
    Acked-by: Brendan Higgins
    Cc: Benjamin Herrenschmidt
    Cc: Joel Stanley
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Fixes: f47e07bc5f1a5c48 ("Fix up MAINTAINERS file problems")
    Cc: Joe Perches
    Signed-off-by: Sedat Dilek
    Signed-off-by: Linus Torvalds

    Sedat Dilek
     
  • Pull SCSI fixes from James Bottomley:
    "Two small fixes, one re-fix of a previous fix and five patches sorting
    out hotplug in the bnx2X class of drivers. The latter is rather
    involved, but necessary because these drivers have started dropping
    lockdep recursion warnings on the hotplug lock because of its
    conversion to a percpu rwsem"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: sg: only check for dxfer_len greater than 256M
    scsi: aacraid: reading out of bounds
    scsi: qedf: Limit number of CQs
    scsi: bnx2i: Simplify cpu hotplug code
    scsi: bnx2fc: Simplify CPU hotplug code
    scsi: bnx2i: Prevent recursive cpuhotplug locking
    scsi: bnx2fc: Prevent recursive cpuhotplug locking
    scsi: bnx2fc: Plug CPU hotplug race

    Linus Torvalds