02 Sep, 2016

2 commits

  • Pull audit fixes from Paul Moore:
    "Two small patches to fix some bugs with the audit-by-executable
    functionality we introduced back in v4.3 (both patches are marked
    for the stable folks)"

    * 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
    audit: fix exe_file access in audit_exe_compare
    mm: introduce get_task_exe_file

    Linus Torvalds
     
  • …rnel/git/dgc/linux-xfs

    Pull xfs and iomap fixes from Dave Chinner:
    "Most of these changes are small regression fixes that address problems
    introduced in the 4.8-rc1 window. The two fixes that aren't (IO
    completion fix and superblock inprogress check) are fixes for problems
    introduced some time ago and need to be pushed back to stable kernels.

    Changes in this update:
    - iomap FIEMAP_EXTENT_MERGED usage fix
    - additional mount-time feature restrictions
    - rmap btree query fixes
    - freeze/unmount io completion workqueue fix
    - memory corruption fix for deferred operations handling"

    * tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
    xfs: track log done items directly in the deferred pending work item
    iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems
    xfs: prevent dropping ioend completions during buftarg wait
    xfs: fix superblock inprogress check
    xfs: simple btree query range should look right if LE lookup fails
    xfs: fix some key handling problems in _btree_simple_query_range
    xfs: don't log the entire end of the AGF
    xfs: disallow mounting of realtime + rmap filesystems
    xfs: don't perform lookups on zero-height btrees

    Linus Torvalds
     

01 Sep, 2016

4 commits

  • Prior to the change the function would blindly dereference mm, exe_file
    and exe_file->f_inode, each of which could have been NULL or freed.

    Use get_task_exe_file to safely obtain stable exe_file.
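    The safe pattern the fix relies on (take a reference to exe_file under a lock, check every pointer on the way, compare, then drop the reference) can be sketched as a userspace model. This is not kernel code: a spinlock built on atomic_flag stands in for task_lock() plus RCU, and every name ending in _model is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for struct file, mm_struct, task_struct. */
struct file_model { atomic_int refcount; unsigned long ino; };
struct mm_model   { struct file_model *exe_file; };
struct task_model { atomic_flag lock; struct mm_model *mm; };

static void task_lock_model(struct task_model *t)
{
    while (atomic_flag_test_and_set(&t->lock))
        ;   /* spin until the lock is free */
}

static void task_unlock_model(struct task_model *t) { atomic_flag_clear(&t->lock); }

/* Return a referenced exe_file, or NULL if the task has no mm or no
 * exe_file. The reference keeps the file alive after the lock is dropped. */
static struct file_model *get_task_exe_file_model(struct task_model *t)
{
    struct file_model *f = NULL;

    task_lock_model(t);
    if (t->mm && t->mm->exe_file) {
        f = t->mm->exe_file;
        atomic_fetch_add(&f->refcount, 1);
    }
    task_unlock_model(t);
    return f;
}

static void fput_model(struct file_model *f) { atomic_fetch_sub(&f->refcount, 1); }

/* audit_exe_compare-style check: nothing is dereferenced unless it was
 * checked first and a reference is held. */
static bool exe_matches(struct task_model *t, unsigned long ino)
{
    struct file_model *f = get_task_exe_file_model(t);
    bool match;

    if (!f)
        return false;
    match = (f->ino == ino);
    fput_model(f);
    return match;
}
```

    Holding a reference, rather than the mm, is what lets the comparison run without keeping the lock held.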

    Signed-off-by: Mateusz Guzik
    Acked-by: Konstantin Khlebnikov
    Acked-by: Richard Guy Briggs
    Cc: # 4.3.x
    Signed-off-by: Paul Moore

    Mateusz Guzik
     
  • For more convenient access if one has a pointer to the task.

    As a minor nit, take advantage of the fact that only task lock + RCU are
    needed to safely grab ->exe_file. This saves an mm refcount dance.

    Use the helper in proc_exe_link.

    Signed-off-by: Mateusz Guzik
    Acked-by: Konstantin Khlebnikov
    Acked-by: Richard Guy Briggs
    Cc: # 4.3.x
    Signed-off-by: Paul Moore

    Mateusz Guzik
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes the following issues:

    - Kconfig problem that prevented mxc-rnga from being enabled

    - bogus key sizes in qat aes-xts

    - buggy aes-xts code in vmx"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: vmx - fix null dereference in p8_aes_xts_crypt
    crypto: qat - fix aes-xts key sizes
    hwrng: mxc-rnga - Fix Kconfig dependency

    Linus Torvalds
     
  • We used to delay switching to the new credentials until after we had
    mapped the executable (and possibly the ELF interpreter). That was kind of
    odd to begin with, since the new executable will actually then _run_
    with the new creds, but whatever.

    The bigger problem was that we also want to make sure that we turn off
    prof events and tracing before we start mapping the new executable
    state. So while this is a cleanup, it's also a fix for a possible
    information leak.

    Reported-by: Robert Święcki
    Tested-by: Peter Zijlstra
    Acked-by: David Howells
    Acked-by: Oleg Nesterov
    Acked-by: Andy Lutomirski
    Acked-by: Eric W. Biederman
    Cc: Willy Tarreau
    Cc: Kees Cook
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

31 Aug, 2016

7 commits

  • Pull seccomp fix from Kees Cook:
    "Fix fatal signal delivery after ptrace reordering"

    * tag 'seccomp-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    seccomp: Fix tracer exit notifications during fatal signals

    Linus Torvalds
     
  • This fixes a ptrace vs fatal pending signals bug as manifested in
    seccomp now that seccomp was reordered to happen after ptrace. The
    short version is that seccomp should not attempt to call do_exit()
    while fatal signals are pending under a tracer. The existing code was
    trying to be as defensively paranoid as possible, but it now ends up
    confusing ptrace. Instead, the syscall can just be skipped (which solves
    the original concern that the do_exit() was addressing) and normal signal
    handling, tracer notification, and process death can happen.

    Paraphrasing from the original bug report:

    If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
    after such a trap but not yet been scheduled, and another task in the
    thread-group calls exit_group(), then the tracee task exits without the
    ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
    https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7

    The bug happens because when __seccomp_filter() detects
    fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
    signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
    that task is descheduled, __schedule() notices that there is a fatal
    signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
    That prevents the ptracer's waitpid() from returning the ptrace event.
    A more detailed analysis is here:
    https://github.com/mozilla/rr/issues/1762#issuecomment-237396255.

    Reported-by: Robert O'Callahan
    Reported-by: Kyle Huey
    Tested-by: Kyle Huey
    Fixes: 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace")
    Signed-off-by: Kees Cook
    Acked-by: Oleg Nesterov
    Acked-by: James Morris

    Kees Cook
     
  • Pull MD fixes from Shaohua Li:
    "This includes several bug fixes:

    - Alexey Obitotskiy fixed a hang for faulty raid5 array with external
    management

    - Song Liu fixed two raid5 journal related bugs

    - Tomasz Majchrzak fixed a bad block recording issue and an
    accounting issue for raid10

    - ZhengYuan Liu fixed an accounting issue for raid5

    - I fixed a potential race condition and memory leak with DIF/DIX
    enabled

    - other trivial fixes"

    * tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    raid5: avoid unnecessary bio data set
    raid5: fix memory leak of bio integrity data
    raid10: record correct address of bad block
    md-cluster: fix error return code in join()
    r5cache: set MD_JOURNAL_CLEAN correctly
    md: don't print the same repeated messages about delayed sync operation
    md: remove obsolete ret in md_start_sync
    md: do not count journal as spare in GET_ARRAY_INFO
    md: Prevent IO hold during accessing to faulty raid5 array
    MD: hold mddev lock to change bitmap location
    raid5: fix incorrectly counter of conf->empty_inactive_list_nr
    raid10: increment write counter after bio is split

    Linus Torvalds
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable patches:
    - Fix a refcount leak in nfs_callback_up_net
    - Fix an Oopsable condition when the flexfile pNFS driver connection
    to the DS fails
    - Fix an Oopsable condition in NFSv4.1 server callback races
    - Ensure pNFS clients stop doing I/O to the DS if their lease has
    expired, as required by the NFSv4.1 protocol

    Bugfixes:
    - Fix potential looping in the NFSv4.x migration code
    - Patch series to close callback races for OPEN, LAYOUTGET and
    LAYOUTRETURN
    - Silence WARN_ON when NFSv4.1 over RDMA is in use
    - Fix a LAYOUTCOMMIT race in the pNFS/blocks client
    - Fix pNFS timeout issues when the DS fails"

    * tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4.x: Fix a refcount leak in nfs_callback_up_net
    NFS4: Avoid migration loops
    pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
    NFSv4.1: Remove obsolete and incorrrect assignment in nfs4_callback_sequence
    NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
    NFSv4.1: Defer bumping the slot sequence number until we free the slot
    NFSv4.1: Delay callback processing when there are referring triples
    NFSv4.1: Fix Oopsable condition in server callback races
    SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use
    pnfs/blocklayout: update last_write_offset atomically with extents
    pNFS: The client must not do I/O to the DS if it's lease has expired
    pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls
    pNFS/flexfiles: Set reasonable default retrans values for the data channel
    NFS: Allow the mount option retrans=0
    pNFS/flexfiles: Fix layoutstat periodic reporting

    Linus Torvalds
     
  • There are three usercopy warnings which are currently being silenced for
    gcc 4.6 and newer:

    1) "copy_from_user() buffer size is too small" compile warning/error

    This is a static warning which happens when object size and copy size
    are both const, and copy size > object size. I didn't see any false
    positives for this one. So the function warning attribute seems to
    be working fine here.

    Note this scenario is always a bug and so I think it should be
    changed to *always* be an error, regardless of
    CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

    2) "copy_from_user() buffer size is not provably correct" compile warning

    This is another static warning which happens when I enable
    __compiletime_object_size() for new compilers (and
    CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size
    is const, but copy size is *not*. In this case there's no way to
    compare the two at build time, so it gives the warning. (Note the
    warning is a byproduct of the fact that gcc has no way of knowing
    whether the overflow function will be called, so the call isn't dead
    code and the warning attribute is activated.)

    So this warning seems to only indicate "this is an unusual pattern,
    maybe you should check it out" rather than "this is a bug".

    I get 102(!) of these warnings with allyesconfig and the
    __compiletime_object_size() gcc check removed. I don't know if there
    are any real bugs hiding in there, but from looking at a small
    sample, I didn't see any. According to Kees, it does sometimes find
    real bugs. But the false positive rate seems high.

    3) "Buffer overflow detected" runtime warning

    This is a runtime warning where object size is const, and copy size >
    object size.

    All three warnings (both static and runtime) were completely disabled
    for gcc 4.6 with the following commit:

    2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")

    That commit mistakenly assumed that the false positives were caused by a
    gcc bug in __compiletime_object_size(). But in fact,
    __compiletime_object_size() seems to be working fine. The false
    positives were instead triggered by #2 above. (Though I don't have an
    explanation for why the warnings supposedly only started showing up in
    gcc 4.6.)

    So remove warning #2 to get rid of all the false positives, and re-enable
    warnings #1 and #3 by reverting the above commit.

    Furthermore, since #1 is a real bug which is detected at compile time,
    upgrade it to always be an error.

    Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
    needed.
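    The runtime check (#3) can be illustrated with a userspace analogy built on the GCC/Clang builtin __builtin_object_size(), which the kernel's __compiletime_object_size() wraps. This is a sketch, not the kernel's copy_from_user() implementation; checked_copy and CHECKED_COPY are hypothetical names. When the compiler cannot determine the object size, the builtin yields (size_t)-1 and no check is possible, which is the situation warning #2 used to complain about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Refuse a copy that is larger than a destination whose size is known;
 * (size_t)-1 means "size unknown", so the copy proceeds unchecked. */
static int checked_copy(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (dst_size != (size_t)-1 && n > dst_size)
        return -1;            /* analogous to "Buffer overflow detected" */
    memcpy(dst, src, n);
    return 0;
}

/* Mode 0 asks for the maximum number of bytes remaining in the object. */
#define CHECKED_COPY(dst, src, n) \
    checked_copy((dst), __builtin_object_size((dst), 0), (src), (n))
```

    With a local `char buf[8]`, copying 4 bytes succeeds and copying 16 is refused. A constant copy size larger than a constant object size (#1) could instead be rejected at build time.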

    Signed-off-by: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Andy Lutomirski
    Cc: Steven Rostedt
    Cc: Brian Gerst
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Byungchul Park
    Cc: Nilay Vaish
    Signed-off-by: Linus Torvalds

    Josh Poimboeuf
     
  • Pull libata fixes from Tejun Heo:
    "Two libata driver specific fixes for v4.8-rc4. Nothing too scary"

    * 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
    pata_ninja32: Avoid corrupting status flags
    ahci: disable correct irq for dummy ports

    Linus Torvalds
     
  • Pull cgroup fixes from Tejun Heo:
    "Two fixes for cgroup.

    - There still was a hole in enforcing cpuset rules, fixed by Li.

    - The recent switch to global percpu_rwsem for threadgroup locking
    revealed a couple of issues in how percpu_rwsem is implemented and
    used by cgroup. Balbir found that the read locking section was too
    wide unnecessarily including operations which can often depend on
    IOs. With percpu_rwsem updates (coming through a different tree)
    and reduction of read locking section, all the reported locking
    latency issues, including the android one, are resolved.

    It looks like we can keep global percpu_rwsem locking for now. If
    there actually are cases which can't be resolved, we can go back to
    more complex per-signal_struct locking"

    * 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
    cpuset: make sure new tasks conform to the current config of the cpuset

    Linus Torvalds
     

30 Aug, 2016

10 commits

  • Ninja32 needs to set some flags to indicate it does 32-bit IO. However, it currently assigns these flags, which
    loses the initializing flag and causes a warning spew. Fix it to use a bitwise OR, as was intended.
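    The difference between assigning flags and OR-ing them in is easy to show in isolation. The flag names and values below are hypothetical placeholders, not the real libata bits.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only. */
#define PFLAG_INITIALIZING 0x1u
#define PFLAG_PIO32        0x2u
#define PFLAG_PIO32CHANGE  0x4u

/* Buggy: plain assignment wipes out any flag that was already set. */
static void set_pio32_buggy(unsigned int *pflags)
{
    *pflags = PFLAG_PIO32 | PFLAG_PIO32CHANGE;
}

/* Fixed: bitwise OR keeps existing flags and adds the new ones. */
static void set_pio32_fixed(unsigned int *pflags)
{
    *pflags |= PFLAG_PIO32 | PFLAG_PIO32CHANGE;
}
```

    Starting from a port that has PFLAG_INITIALIZING set, only the fixed version leaves that bit intact.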

    Signed-off-by: Alan Cox
    Tested-by: Ellmar Stelnberger
    Signed-off-by: Tejun Heo

    Alan Cox
     
  • On error, the callers expect us to return without bumping
    nn->cb_users[].
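    One way to meet that expectation is to bump the user count only once setup has succeeded, so an error path never leaves a stale reference behind. This is a hypothetical sketch, not the actual patch; cb_state, start_service and the fail knob are illustrative.

```c
#include <assert.h>

/* Hypothetical model: users counts successful callers only. */
struct cb_state { int users; };

/* Stand-in for service setup that may fail. */
static int start_service(int fail) { return fail ? -1 : 0; }

/* Increment the count only after setup succeeds, so a failed call
 * returns with the count unchanged. */
static int callback_up(struct cb_state *s, int fail)
{
    int ret = start_service(fail);

    if (ret < 0)
        return ret;
    s->users++;
    return 0;
}
```

    An alternative with the same effect is to increment first and decrement again on every error path.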

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # v3.7+

    Trond Myklebust
     
  • If a server returns itself as a location while migrating, the client may
    end up getting stuck attempting to migrate twice to the same server. Catch
    this by checking if the nfs_client found is the same as the existing
    client. For the other two callers to nfs4_set_client, the nfs_client will
    always be ERR_PTR(-EINVAL).
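    The loop check amounts to a pointer comparison between the client found for the new location and the one already in use. The sketch below is hypothetical; in particular, the -ELOOP return value is an illustrative choice, not taken from the patch text above.

```c
#include <assert.h>
#include <errno.h>

struct client { const char *server; };

/* If the lookup for the migration target returns the client we already
 * use, the server named itself as the new location; refuse rather than
 * "migrating" to the same place again. */
static int set_client(struct client **cur, struct client *found)
{
    if (found == *cur)
        return -ELOOP;   /* error code chosen for illustration */
    *cur = found;
    return 0;
}
```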

    Signed-off-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     
  • Christoph reports slab corruption when a deferred refcount update
    aborts during _defer_finish(). The cause of this was broken log item
    state tracking in xfs_defer_pending -- upon an abort,
    _defer_trans_abort() will call abort_intent on all intent items,
    including the ones that have already had a done item attached.

    This is incorrect because each intent item holds two references: the first
    is released when the intent item is committed to the log; and the
    second is released when the _done_ item is committed to the log, or
    by the intent creator if there is no done item. In other words, once
    we log the done item, responsibility for releasing the intent item's
    second refcount is transferred to the done item and /must not/ be
    performed by anything else.

    The dfp_committed flag should have been tracking whether or not we had
    a done item so that _defer_trans_abort could decide if it needs to
    abort the intent item, but due to a thinko this was not the case. Rip
    it out and track the done item directly so that we do the right thing
    w.r.t. intent item freeing.
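    The ownership rule can be modeled by storing a pointer to the done item (rather than a separate flag) and letting abort release the intent's second reference only when that pointer is NULL. This is a simplified, hypothetical model; the struct and function names do not match XFS.

```c
#include <assert.h>
#include <stddef.h>

struct intent_item { int refcount; };            /* starts at 2 */
struct done_item   { struct intent_item *intent; };

struct defer_pending {
    struct intent_item *intent;
    struct done_item   *done;    /* non-NULL once a done item is logged */
};

static void intent_release(struct intent_item *ip) { ip->refcount--; }

/* Logging the done item transfers the intent's second reference to it. */
static void log_done(struct defer_pending *dfp, struct done_item *dp)
{
    dp->intent = dfp->intent;
    dfp->done = dp;
}

/* The done item drops the reference it owns when it is committed. */
static void done_committed(struct done_item *dp) { intent_release(dp->intent); }

/* On abort, release the second reference only if no done item owns it. */
static void trans_abort(struct defer_pending *dfp)
{
    if (!dfp->done)
        intent_release(dfp->intent);
}
```

    In both paths the intent's count reaches zero exactly once, which is the property the original flag failed to guarantee.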

    Signed-off-by: Darrick J. Wong
    Reported-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • …l/git/groeck/linux-staging

    Pull hwmon fix from Guenter Roeck:
    "Add missing sysfs attribute group terminator to it87 driver"

    * tag 'hwmon-for-linus-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
    hwmon: (it87) Add missing sysfs attribute group terminator

    Linus Torvalds
     
  • Pull ext4 fixes from Ted Ts'o:
    "Fix bugs that could cause kernel deadlocks or file system corruption
    while moving xattrs to expand the extended inode.

    Also add some sanity checks to the block group descriptors to make
    sure we don't end up overwriting the superblock"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: avoid deadlock when expanding inode size
    ext4: properly align shifted xattrs when expanding inodes
    ext4: fix xattr shifting when expanding inodes part 2
    ext4: fix xattr shifting when expanding inodes
    ext4: validate that metadata blocks do not overlap superblock
    ext4: reserve xattr index for the Hurd

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Segregate namespaces properly in conntrack dumps, from Liping Zhang.

    2) tcp listener refcount fix in netfilter tproxy, from Eric Dumazet.

    3) Fix timeouts in qed driver due to xmit_more, from Yuval Mintz.

    4) Fix use-after-free in tcp_xmit_retransmit_queue().

    5) Userspace header fixups (use of __u32, missing includes, etc.) from
    Mikko Rapeli.

    6) Further refinements to fragmentation wrt gso and tunnels, from
    Shmulik Ladkani.

    7) Trigger poll correctly for zero length UDP packets, from Eric
    Dumazet.

    8) TCP window scaling fix, also from Eric Dumazet.

    9) SLAB_DESTROY_BY_RCU is not relevant any more for UDP sockets.

    10) Module refcount leak in qdisc_create_dflt(), from Eric Dumazet.

    11) Fix deadlock in cp_rx_poll() of 8139cp driver, from Gao Feng.

    12) Memory leak in rhashtable's alloc_bucket_locks(), from Eric Dumazet.

    13) Add new device ID to alx driver, from Owen Lin.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
    Add Killer E2500 device ID in alx driver.
    net: smc91x: fix SMC accesses
    Documentation: networking: dsa: Remove platform device TODO
    net/mlx5: Increase number of ethtool steering priorities
    net/mlx5: Add error prints when validate ETS failed
    net/mlx5e: Fix memory leak if refreshing TIRs fails
    net/mlx5e: Add ethtool counter for TX xmit_more
    net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
    net/mlx5e: Don't wait for SQ completions on close
    net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
    net/mlx5e: Don't wait for RQ completions on close
    net/mlx5e: Limit UMR length to the device's limitation
    rhashtable: fix a memory leak in alloc_bucket_locks()
    sfc: fix potential stack corruption from running past stat bitmask
    team: loadbalance: push lacpdus to exact delivery
    net: hns: dereference ppe_cb->ppe_common_cb if it is non-null
    8139cp: Fix one possible deadloop in cp_rx_poll
    i40e: Change some init flow for the client
    Revert "phy: IRQ cannot be shared"
    net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
    ...

    Linus Torvalds
     
  • If the attempt to connect to a DS fails inside ff_layout_pg_init_read or
    ff_layout_pg_init_write, then we currently end up clearing the layout
    segment carried by the struct nfs_pageio_descriptor, causing an Oops
    when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist.

    The fix is to ensure we return the layout and then retry.

    Fixes: 446ca2195303 ("pNFS/flexfiles: When initing reads or writes, we...")
    Cc: stable@vger.kernel.org # v4.7+
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • …linux-platform-drivers-x86

    Pull x86 platform driver fixes from Darren Hart:
    "Remove module related code from two drivers that are only configurable
    as built-in: intel_pmic_gpio and platform/olpc"

    * tag 'platform-drivers-x86-v4.8-4' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
    intel_pmic_gpio: Make explicitly non-modular
    platform/olpc: Make ec explicitly non-modular

    Linus Torvalds
     
  • Pull powerpc fixes from Ben Herrenschmidt:
    "This was meant to be sent early last week, but I had a change pending
    on one of the fixes and other things made me forget all about it. Ugh.

    We have some misc fixes for powerpc 4.8. Some trivial bits and some
    regressions, and a trivial cleanup or two that I saw no point in
    letting rot in patchwork"

    * tag 'powerpc-4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc: signals: Discard transaction state from signal frames
    powerpc/powernv : Drop reference added by kset_find_obj()
    powerpc/tm: do not use r13 for tabort_syscall
    powerpc: move hmi.c to arch/powerpc/kvm/
    powerpc: sysdev: cpm: fix gpio save_regs functions
    powerpc/pseries: PACA save area fix for MCE vs MCE
    powerpc/pseries: PACA save area fix for general exception vs MCE
    powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
    powerpc, hotplug: Avoid to touch non-existent cpumasks.
    powerpc: migrate exception table users off module.h and onto extable.h
    powerpc/powernv/pci: fix iterator signedness
    powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
    cxl: use pcibios_free_controller_deferred() when removing vPHBs
    powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner"
    powerpc/512x: Delete unnecessary assignment for the field "owner"
    drivers/macintosh: Delete owner assignment
    powerpc: cputhreads: Add missing include file

    Linus Torvalds
     

29 Aug, 2016

17 commits

  • Attribute array it87_attributes_in lacks its NULL terminator,
    causing random behavior when operating on the attribute group.
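    Code that walks such an attribute array stops at the first NULL entry, so a missing terminator makes it read whatever memory follows the array. A minimal sketch with a simplified, hypothetical attr struct:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct attribute. */
struct attr { const char *name; };

static struct attr in0 = { "in0_input" }, in1 = { "in1_input" };

/* The trailing NULL is what tells the walker where the array ends. */
static struct attr *attrs_in[] = { &in0, &in1, NULL };

static size_t count_attrs(struct attr **list)
{
    size_t n = 0;

    while (list[n])   /* without the NULL, this runs past the array */
        n++;
    return n;
}
```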

    Fixes: 52929715634a ("hwmon: (it87) Use is_visible for voltage sensors")
    Signed-off-by: Jean Delvare
    Cc: Martin Blumenstingl
    Cc: Guenter Roeck
    Cc: stable@vger.kernel.org
    Signed-off-by: Guenter Roeck

    Jean Delvare
     
  • The Kconfig entry controlling compilation of this code is:

    drivers/platform/x86/Kconfig:config GPIO_INTEL_PMIC
    drivers/platform/x86/Kconfig: bool "Intel PMIC GPIO support"

    ...meaning that it currently is not being built as a module by anyone.

    Let's remove the couple of traces of modular infrastructure use, so that
    when reading the driver there is no doubt it is builtin-only.

    We delete the MODULE_LICENSE tag etc. since all that information
    was (or is now) contained at the top of the file in the comments.

    We don't replace module.h with init.h since the file already has that.

    Cc: Alek Du
    Cc: platform-driver-x86@vger.kernel.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Darren Hart

    Paul Gortmaker
     
  • The Kconfig entry controlling compilation of this code is:

    arch/x86/Kconfig:config OLPC
    arch/x86/Kconfig: bool "One Laptop Per Child support"

    ...meaning that it currently is not being built as a module by anyone.

    Let's remove the couple of traces of modular infrastructure use, so that
    when reading the driver there is no doubt it is builtin-only.

    We delete the MODULE_LICENSE tag etc. since all that information
    was (or is now) contained at the top of the file in the comments.

    Cc: platform-driver-x86@vger.kernel.org
    Signed-off-by: Paul Gortmaker
    Acked-by: Andres Salomon
    Signed-off-by: Darren Hart

    Paul Gortmaker
     
  • Signed-off-by: David S. Miller

    Owen Lin
     
  • Commit b70661c70830 ("net: smc91x: use run-time configuration on all ARM
    machines") broke some ARM platforms through several mistakes. Firstly,
    the access size must correspond to the following rule:

    (a) at least one of 16-bit or 8-bit access size must be supported
    (b) 32-bit accesses are optional, and may be enabled in addition to
    the above.

    Secondly, it provides no emulation of 16-bit accesses, instead blindly
    making 16-bit accesses even when the platform specifies that only 8-bit
    is supported.

    Reorganise smc91x.h so we can make use of the existing 16-bit access
    emulation already provided - if 16-bit accesses are supported, use
    16-bit accesses directly, otherwise if 8-bit accesses are supported,
    use the provided 16-bit access emulation. If neither, BUG(). This
    exactly reflects the driver behaviour prior to the commit being fixed.
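    The selection logic described above (native 16-bit if available, else 16-bit emulated with two 8-bit accesses, else a fatal error) can be sketched against a fake little-endian register window. The capability bits and helpers are hypothetical, and returning -1 stands in for the driver's BUG().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits for a platform's supported access sizes. */
#define ACC_8BIT  0x1u
#define ACC_16BIT 0x2u
#define ACC_32BIT 0x4u

static uint8_t regs[16];   /* fake little-endian register window */

static uint8_t  read8(unsigned off)  { return regs[off]; }
static uint16_t read16(unsigned off) { return (uint16_t)(regs[off] | (regs[off + 1] << 8)); }

/* Prefer a native 16-bit read; otherwise emulate it with two 8-bit reads.
 * Returns -1 where the driver would BUG(): neither 8- nor 16-bit access. */
static int read16_any(unsigned caps, unsigned off, uint16_t *val)
{
    if (caps & ACC_16BIT)
        *val = read16(off);
    else if (caps & ACC_8BIT)
        *val = (uint16_t)(read8(off) | (read8(off + 1) << 8));
    else
        return -1;
    return 0;
}
```

    A platform that declares only 32-bit access violates rule (a) and lands in the error branch.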

    Since the conversion incorrectly cut down the available access sizes on
    several platforms, we also need to go through every platform and fix up
    the overly-restrictive access size: Arnd assumed that if a platform can
    perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
    size needed to be specified - not so, all available access sizes must
    be specified.

    Doing this likely also fixes some performance regressions: if a
    platform does not support 8-bit accesses, 8-bit accesses have been
    emulated by performing a 16-bit read-modify-write access.
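    A 16-bit read-modify-write turns one byte store into a read plus a write of the containing halfword, which is where the extra cost comes from. A sketch over a fake 16-bit-only register file; the little-endian byte order is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t reg16[8];   /* fake register file, 16-bit accesses only */

/* Emulate an 8-bit write: read the halfword, replace one byte, write back.
 * Byte 0 of each halfword is taken to be the low byte. */
static void write8_via_rmw(unsigned byte_off, uint8_t val)
{
    unsigned idx = byte_off / 2;
    unsigned shift = (byte_off % 2) * 8;
    uint16_t w = reg16[idx];                                   /* read   */

    w = (uint16_t)((w & ~(0xFFu << shift)) | ((unsigned)val << shift)); /* modify */
    reg16[idx] = w;                                            /* write  */
}
```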

    Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
    accesses, which was broken by the original commit.

    Fixes: b70661c70830 ("net: smc91x: use run-time configuration on all ARM machines")
    Signed-off-by: Russell King
    Tested-by: Robert Jarzmik
    Signed-off-by: David S. Miller

    Russell King
     
  • Since commit 83c0afaec7b7 ("net: dsa: Add new binding implementation"),
    the shortcomings of the dsa platform device have been addressed, so remove
    that TODO item.

    Signed-off-by: Florian Fainelli
    Acked-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Saeed Mahameed says:

    ====================
    Mellanox 100G mlx5 fixes 2016-08-29

    This series contains some bug fixes for the mlx5 core and mlx5
    ethernet driver.

    From Saeed, Fix UMR to consider hardware translation table field
    size limitation when calculating the maximum number of MTTs required
    by the driver. Three patches to speed-up netdevice close time by
    serializing channel (SQs & RQs) destruction rather than issuing and
    waiting for hardware interrupts to free them.

    From Eran, Fix ethtool ring parameter reporting for striding RQ layout.
    Add error prints on ETS validation failure.

    From Kamal, Fix memory leak on error flow.

    From Maor, Fix ethtool steering priorities number.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Ethtool has 11 flow tables, each flow table has its own priority.
    Increase the number of priorities to be aligned with the number of flow
    tables.

    Fixes: 1174fce8d141 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
    Signed-off-by: Maor Gottlieb
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Maor Gottlieb
     
  • When setting ETS fails due to invalid user input, add error prints that
    tell the user the exact error.

    Fixes: cdcf11212b22 ('net/mlx5e: Validate BW weight values of ETS')
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Eran Ben Elisha
     
  • Free 'in' command object also when mlx5_core_modify_tir fails.
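    The fix follows the usual rule that a buffer allocated for a command must be freed on every exit path, not only on success. A hypothetical sketch; modify_tir, refresh_tir and the free counter are illustrative, not the mlx5 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int frees;   /* counts frees so the behavior can be checked */

static void free_counted(void *p) { free(p); frees++; }

/* Stand-in for a firmware command that may fail. */
static int modify_tir(void *in, int fail) { (void)in; return fail ? -EIO : 0; }

/* Free the command buffer whether or not the command succeeded. */
static int refresh_tir(int fail)
{
    void *in = calloc(1, 64);
    int err;

    if (!in)
        return -ENOMEM;
    err = modify_tir(in, fail);
    free_counted(in);   /* freed on the error path too */
    return err;
}
```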

    Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring")
    Signed-off-by: Kamal Heib
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Kamal Heib
     
  • Add a counter in ethtool for the number of times that
    TX xmit_more was used.

    Signed-off-by: Tariq Toukan
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Tariq Toukan
     
  • The driver RQ has two possible configurations: striding RQ and
    non-striding RQ. Until this patch, the driver always reported the
    number of hardware WQEs (ring descriptors). For non striding RQ
    configuration, this was OK since we have one WQE per pending packet.
    For striding RQ, multiple packets can fit into one WQE. For better
    user experience we normalize the rx_pending parameter (size of wqe/mtu)
    as the average ring size in case of striding RQ.
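    The normalization amounts to scaling the WQE count by how many MTU-sized packets fit in one WQE. The helper below is a hypothetical sketch of that arithmetic with made-up parameter names and sizes, not the driver's function; it assumes a nonzero mtu.

```c
#include <assert.h>
#include <stdint.h>

/* Report ring size in packets: one per WQE for non-striding RQ,
 * roughly wqe_bytes / mtu per WQE for striding RQ. */
static uint32_t rx_pending_report(uint32_t num_wqes, int striding,
                                  uint32_t wqe_bytes, uint32_t mtu)
{
    if (!striding)
        return num_wqes;
    return num_wqes * (wqe_bytes / mtu);
}
```

    For example, 16 striding WQEs of 64 KiB each with a 1500-byte MTU report 16 * 43 = 688 pending packets.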

    Fixes: 461017cb006a ('net/mlx5e: Support RX multi-packet WQE ...')
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Eran Ben Elisha
     
  • Instead of asking the firmware to flush the SQ (Send Queue) via
    asynchronous completions when moved to error, we handle SQ flush
    manually (mlx5e_free_tx_descs) same as we did when SQ flush got
    timed out or on tx_timeout.

    This will reduce SQ flush time and speed up the interface-down procedure.

    Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
    critical code locality.

    Fixes: 29429f3300a3 ('net/mlx5e: Timeout if SQ doesn't flush during close')
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Saeed Mahameed
     
  • ICO (Internal control operations) SQ (Send Queue) is closed/disabled
    after RQ (Receive Queue). After RQ is closed an ICO SQ completion
    might post a fragmented MPWQE (Multi Packet Work Queue Element) into
    that RQ.

    As on regular RQ post, check if we are allowed to post to that
    RQ (RQ is enabled). Cleanup in-progress UMR MPWQE on mlx5e_free_rx_descs
    if needed.

    Fixes: bc77b240b3c5 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Saeed Mahameed
     
  • This will significantly reduce receive queue flush time on interface
    down.

    Instead of asking the firmware to flush the RQ (Receive Queue) via
    asynchronous completions when moved to error, we handle RQ flush
    manually (mlx5e_free_rx_descs) same as we did when RQ flush got timed
    out.

    This will reduce RQ flush time and speed up the interface-down procedure
    (ifconfig down) from 6 sec to 0.3 sec on a 48-core system.

    Moved mlx5e_free_rx_descs to en_main.c where it is needed, to keep en_rx.c
    free from non-critical data path code for better code locality.

    Fixes: 6cd392a082de ('net/mlx5e: Handle RQ flush in error cases')
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Saeed Mahameed
     
  • ConnectX-4 UMR (User Memory Region) MTT translation table offset in WQE
    is limited to U16_MAX. Before this patch we ignored that limitation and
    requested the maximum possible UMR translation length that the netdev
    might need (MAX channels * MAX pages per channel).
    On a system with more than 32 cores, when linear WQE allocation fails,
    falling back to UMR WQEs will cause the RQ (Receive Queue) to get
    stuck.

    Here we limit the UMR length to min(U16_MAX, max required pages) (while
    considering the required alignments) on driver load. By default U16_MAX is
    sufficient, since the default RX rings value guarantees that we are in
    range. Dynamically (on set_ringparam/set_channels) we check whether the
    new required UMR length (number of MTTs) is still in range; if not, we
    fail the request.
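    The dynamic check can be sketched as: compute channels times pages per channel, round up to an alignment, and reject the request if the result exceeds U16_MAX. The 8-entry alignment and the function names are illustrative assumptions, not the driver's values.

```c
#include <assert.h>
#include <stdint.h>

#define UMR_MAX_MTTS UINT16_MAX   /* offset limit described above */
#define MTT_ALIGN    8u           /* alignment assumed for illustration */

static uint64_t align_up(uint64_t x, uint64_t a) { return (x + a - 1) / a * a; }

/* Return 0 and the aligned MTT count if it fits, or -1 so the caller
 * can fail set_ringparam/set_channels. */
static int umr_check(uint64_t channels, uint64_t pages_per_channel, uint64_t *mtts)
{
    uint64_t need = align_up(channels * pages_per_channel, MTT_ALIGN);

    if (need > UMR_MAX_MTTS)
        return -1;
    *mtts = need;
    return 0;
}
```

    Using 64-bit arithmetic keeps the product itself from overflowing before it is compared against the 16-bit limit.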

    Fixes: bc77b240b3c5 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Saeed Mahameed
     
  • Userspace can begin and suspend a transaction within the signal
    handler which means they might enter sys_rt_sigreturn() with the
    processor in suspended state.

    sys_rt_sigreturn() wants to restore process context (which may have
    been in a transaction before signal delivery). To do this it must
    restore TM SPRs. To achieve this, any transaction initiated within the
    signal frame must be discarded in order to be able to restore TM SPRs,
    as TM SPRs can only be manipulated non-transactionally.
    From the PowerPC ISA:
    TM Bad Thing Exception [Category: Transactional Memory]
    An attempt is made to execute a mtspr targeting a TM register in
    other than Non-transactional state.

    Not doing so results in a TM Bad Thing:
    [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]
    [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)
    [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
    [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
    [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE
    nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
    xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter
    ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm
    uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure
    scsi_transport_sas bnx2x ipr mdio libcrc32c
    [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34
    [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000
    [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000
    [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not tainted (4.7.0)
    [12045.222418] MSR: 9000000300201033 CR: 28444280 XER: 20000000
    [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033
    GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0
    GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000
    GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000
    GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000
    GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0
    [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
    [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
    [12045.223630] Call Trace:
    [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0
    [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108
    [12045.223806] Instruction dump:
    [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
    [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 e80304b8 7c0123a6 4e800020
    [12045.224074] ---[ end trace cb8002ee240bae76 ]---

    It isn't clear exactly if there is really a use case for userspace
    returning with a suspended transaction, however, doing so doesn't (on
    its own) constitute a bad frame. As such, this patch simply discards
    the transactional state of the context calling the sigreturn and
    continues.

    Reported-by: Laurent Dufour
    Signed-off-by: Cyril Bur
    Tested-by: Laurent Dufour
    Reviewed-by: Laurent Dufour
    Acked-by: Simon Guo
    Signed-off-by: Benjamin Herrenschmidt

    Cyril Bur