08 May, 2015

11 commits

  • Currently, atomic_cmpxchg() is used to get the lock. However, this
    is not really necessary if there is more than one task in the queue
    and the queue head doesn't need to reset the tail code. For that case,
    a simple write to set the lock bit is enough as the queue head will
    be the only one eligible to get the lock as long as it checks that
    both the lock and pending bits are not set. The current pending bit
    waiting code will ensure that the bit will not be set as soon as the
    tail code in the lock is set.

    With that change, there is a slight improvement in the performance
    of the queued spinlock in the 5M loop micro-benchmark run on a 4-socket
    Westmere-EX machine as shown in the tables below.

    [Standalone/Embedded - same node]
    # of tasks    Before patch    After patch    %Change
    ----------    ------------    -----------    -------
         3         2324/2321       2248/2265     -3%/-2%
         4         2890/2896       2819/2831     -2%/-2%
         5         3611/3595       3522/3512     -2%/-2%
         6         4281/4276       4173/4160     -3%/-3%
         7         5018/5001       4875/4861     -3%/-3%
         8         5759/5750       5563/5568     -3%/-3%

    [Standalone/Embedded - different nodes]
    # of tasks    Before patch    After patch    %Change
    ----------    ------------    -----------    -------
         3        12242/12237     12087/12093    -1%/-1%
         4        10688/10696     10507/10521    -2%/-2%

    It was also found that this change produced a much bigger performance
    improvement in the newer IvyBridge-EX chip and was essential to close
    the performance gap between the ticket spinlock and queued spinlock.

    The disk workload of the AIM7 benchmark was run on a 4-socket
    Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
    on a 3.14 based kernel. The results of the test runs were:

    AIM7 XFS Disk Test
    kernel        JPM        Real Time    Sys Time    Usr Time
    ------        ---        ---------    --------    --------
    ticketlock    5678233       3.17        96.61       5.81
    qspinlock     5750799       3.13        94.83       5.97

    AIM7 EXT4 Disk Test
    kernel        JPM        Real Time    Sys Time    Usr Time
    ------        ---        ---------    --------    --------
    ticketlock    1114551      16.15       509.72       7.11
    qspinlock     2184466       8.24       232.99       6.01

    The ext4 filesystem run had a much higher spinlock contention than
    the xfs filesystem run.

    The "ebizzy -m" test was also run with the following results:

    kernel        records/s    Real Time    Sys Time    Usr Time
    ------        ---------    ---------    --------    --------
    ticketlock      2075         10.00       216.35       3.49
    qspinlock       3023         10.00       198.20       4.80

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Daniel J Blueman
    Cc: David Vrabel
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Konrad Rzeszutek Wilk
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Raghavendra K T
    Cc: Rik van Riel
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
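
The queue-head case described in the commit above can be sketched as follows. This is a minimal illustration, not the kernel code: the struct `qlock_sketch`, its field names, and `head_claim_with_store()` are hypothetical, and the last-waiter path (which still needs an atomic compare-and-exchange to clear the tail) is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split view of the lock word; field names are assumptions. */
struct qlock_sketch {
    _Atomic uint8_t  locked;   /* lock bit, held in its own byte */
    _Atomic uint8_t  pending;  /* pending bit, held in its own byte */
    _Atomic uint16_t tail;     /* encoded tail of the wait queue */
};

/*
 * Called by the queue head after it has observed locked == 0 and
 * pending == 0. Returns true when the plain-store path was taken.
 * Returns false when the head is also the last waiter; that case must
 * clear the tail atomically and is not shown here.
 */
static bool head_claim_with_store(struct qlock_sketch *l, uint16_t my_tail)
{
    assert(atomic_load(&l->locked) == 0 && atomic_load(&l->pending) == 0);

    if (atomic_load(&l->tail) == my_tail)
        return false;   /* nobody queued behind us: tail must be reset */

    /* Others are queued, so the tail stays untouched and a store suffices. */
    atomic_store_explicit(&l->locked, 1, memory_order_relaxed);
    return true;
}
```

The relaxed store stands in for the "simple write"; in a real lock the ordering would come from the acquire load in the head's wait loop, which this sketch omits.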
     
  • When we allow for a max NR_CPUS < 2^14 we can optimize the pending
    wait-acquire and the xchg_tail() operations.

    By growing the pending bit to a byte, we reduce the tail to 16bit.
    This means we can use xchg16 for the tail part and do away with all
    the repeated cmpxchg() operations.

    This in turn allows us to unconditionally acquire; the locked state
    as observed by the wait loops cannot change. And because both locked
    and pending are now a full byte we can use simple stores for the
    state transition, obviating one atomic operation entirely.

    This optimization is needed to make the qspinlock achieve performance
    parity with ticket spinlock at light load.

    All this is horribly broken on Alpha pre EV56 (and any other arch that
    cannot do single-copy atomic byte stores).

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Daniel J Blueman
    Cc: David Vrabel
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Konrad Rzeszutek Wilk
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Raghavendra K T
    Cc: Rik van Riel
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra (Intel)
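
A rough sketch of what the byte-sized pending field buys, using C11 atomics rather than kernel primitives. The struct `qlock16_sketch` and the `*16` function names are assumptions made for illustration; the kernel overlays these fields on a single 32-bit word, which portable C11 cannot express directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define Q_LOCKED_VAL  0x0001u  /* locked byte set, pending byte clear */
#define Q_PENDING_VAL 0x0100u  /* pending byte set, locked byte clear */

/* Hypothetical layout: two 16-bit halves instead of one 32-bit word. */
struct qlock16_sketch {
    _Atomic uint16_t locked_pending; /* locked byte + pending byte */
    _Atomic uint16_t tail;           /* queue tail, now exactly 16 bits */
};

/* Publish our tail and fetch the previous one with a single exchange. */
static uint16_t xchg_tail16(struct qlock16_sketch *l, uint16_t tail)
{
    return atomic_exchange(&l->tail, tail);
}

/* Pending owner takes the lock: (pending=1, locked=0) -> (0, 1) in one store. */
static void clear_pending_set_locked16(struct qlock16_sketch *l)
{
    assert(atomic_load(&l->locked_pending) == Q_PENDING_VAL);
    atomic_store_explicit(&l->locked_pending, Q_LOCKED_VAL,
                          memory_order_relaxed);
}
```

Neither function needs a retry loop, which is the saving the commit message describes.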
     
  • This is a preparatory patch that extracts out the following 2 code
    snippets to prepare for the next performance optimization patch.

    1) the logic for the exchange of new and previous tail code words
    into a new xchg_tail() function.
    2) the logic for clearing the pending bit and setting the locked bit
    into a new clear_pending_set_locked() function.

    This patch also simplifies the trylock operation before queuing by
    calling queued_spin_trylock() directly.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Daniel J Blueman
    Cc: David Vrabel
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Konrad Rzeszutek Wilk
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Raghavendra K T
    Cc: Rik van Riel
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
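
For illustration, the two extracted helpers might look like this when built on a compare-and-exchange loop over a 32-bit word. The function names come from the commit message; the bit layout (locked in bit 0, pending in bit 8, tail in the upper 16 bits) and the use of C11 atomics are assumptions of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout for this sketch only. */
#define Q_LOCKED_VAL          0x00000001u
#define Q_PENDING_VAL         0x00000100u
#define Q_LOCKED_PENDING_MASK 0x0000ffffu
#define Q_TAIL_MASK           0xffff0000u

/* Swap in a new tail code, keep locked/pending, return the old word. */
static uint32_t xchg_tail(_Atomic uint32_t *lock, uint32_t tail)
{
    uint32_t old = atomic_load(lock);

    assert((tail & ~Q_TAIL_MASK) == 0);
    /* On failure 'old' is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(lock, &old,
                (old & Q_LOCKED_PENDING_MASK) | tail))
        ;
    return old;
}

/* Clear the pending bit and set the locked bit in one atomic update. */
static void clear_pending_set_locked(_Atomic uint32_t *lock)
{
    uint32_t old = atomic_load(lock);

    while (!atomic_compare_exchange_weak(lock, &old,
                (old & ~Q_PENDING_VAL) | Q_LOCKED_VAL))
        ;
}
```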
     
  • Because the qspinlock needs to touch a second cacheline (the per-cpu
    mcs_nodes[]); add a pending bit and allow a single in-word spinner
    before we punt to the second cacheline.

    It is possible to observe the pending bit without the locked bit when
    the last owner has just released but the pending owner has not yet
    taken ownership.

    In this case we would normally queue -- because the pending bit is
    already taken. However, in this case the pending bit is guaranteed
    to be released 'soon', therefore wait for it and avoid queueing.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Daniel J Blueman
    Cc: David Vrabel
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Konrad Rzeszutek Wilk
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Raghavendra K T
    Cc: Rik van Riel
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra (Intel)
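
The decision a contender makes from the observed lock word can be sketched as a pure function. The enum, the function name `next_step()`, and the bit positions are hypothetical; only the four outcomes follow the commit message above.

```c
#include <stdint.h>

/* Assumed bit positions for this sketch. */
#define Q_LOCKED_VAL  0x00000001u
#define Q_LOCKED_MASK 0x000000ffu
#define Q_PENDING_VAL 0x00000100u

enum qstep {
    Q_TAKE_LOCK,      /* word is free: take the lock directly */
    Q_WAIT_HANDOVER,  /* pending set, locked clear: hand-over in progress */
    Q_SET_PENDING,    /* only locked set: become the in-word spinner */
    Q_QUEUE           /* pending or tail already taken: use the queue */
};

static enum qstep next_step(uint32_t val)
{
    if (val == 0)
        return Q_TAKE_LOCK;
    if (val == Q_PENDING_VAL)
        return Q_WAIT_HANDOVER;  /* released 'soon', so wait, don't queue */
    if (val & ~Q_LOCKED_MASK)
        return Q_QUEUE;          /* touches the second cacheline */
    return Q_SET_PENDING;
}
```

After `Q_WAIT_HANDOVER` the caller would re-read the word and classify it again; that loop is omitted here.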
     
  • This patch makes the necessary changes at the x86 architecture
    specific layer to enable the use of queued spinlocks for x86-64. As
    x86-32 machines are typically not multi-socket, the benefit of queued
    spinlocks may not be apparent, so queued spinlocks are not enabled there.

    Currently, there are some incompatibilities between the para-virtualized
    spinlock code (which hard-codes the use of ticket spinlock) and the
    queued spinlocks. Therefore, the use of queued spinlocks is disabled
    when the para-virtualized spinlock is enabled.

    The arch/x86/include/asm/qspinlock.h header file includes some x86
    specific optimizations which will make the queued spinlock code
    perform better than the generic implementation.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Daniel J Blueman
    Cc: David Vrabel
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Konrad Rzeszutek Wilk
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Raghavendra K T
    Cc: Rik van Riel
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1429901803-29771-3-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • This patch introduces a new generic queued spinlock implementation that
    can serve as an alternative to the default ticket spinlock. Compared
    with the ticket spinlock, this queued spinlock should be almost as fair
    as the ticket spinlock. It has about the same speed in single-thread
    and it can be much faster in high contention situations especially when
    the spinlock is embedded within the data structure to be protected.

    Only in light to moderate contention where the average queue depth
    is around 1-3 will this queued spinlock be potentially a bit slower
    due to the higher slowpath overhead.

    This queued spinlock is especially suited to NUMA machines with a large
    number of cores as the chance of spinlock contention is much higher
    in those machines. The cost of contention is also higher because of
    slower inter-node memory traffic.

    Due to the fact that spinlocks are acquired with preemption disabled,
    the process will not be migrated to another CPU while it is trying
    to get a spinlock. Ignoring interrupt handling, a CPU can only be
    contending in one spinlock at any one time. Counting soft IRQ, hard
    IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting
    activities. By allocating a set of per-cpu queue nodes and using them
    to form a waiting queue, we can encode the queue node address into a
    much smaller 24-bit size (including CPU number and queue node index)
    leaving one byte for the lock.

    Please note that the queue node is only needed when waiting for the
    lock. Once the lock is acquired, the queue node can be released to
    be used later.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Daniel J Blueman
    Cc: David Vrabel
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Konrad Rzeszutek Wilk
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Raghavendra K T
    Cc: Rik van Riel
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
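
The tail encoding described above (CPU number plus a per-CPU node index packed above the lock byte) can be sketched like this. The offsets, the `cpu + 1` bias, and the function names are assumptions of the sketch, chosen so that a tail of 0 can mean "no waiter".

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 0-7 lock byte, 8-9 node index, 10-31 CPU number + 1. */
#define Q_TAIL_IDX_OFFSET 8
#define Q_TAIL_IDX_BITS   2   /* 4 contexts: task, soft IRQ, hard IRQ, NMI */
#define Q_TAIL_CPU_OFFSET (Q_TAIL_IDX_OFFSET + Q_TAIL_IDX_BITS)

/* Pack (cpu, idx) into the upper 24 bits of the lock word. */
static uint32_t encode_tail(unsigned int cpu, unsigned int idx)
{
    assert(idx < (1u << Q_TAIL_IDX_BITS));
    assert(cpu + 1 < (1u << (32 - Q_TAIL_CPU_OFFSET)));
    /* cpu + 1 so that an all-zero tail means the queue is empty */
    return ((uint32_t)(cpu + 1) << Q_TAIL_CPU_OFFSET) |
           ((uint32_t)idx << Q_TAIL_IDX_OFFSET);
}

/* Recover the (cpu, idx) pair that identifies a per-cpu queue node. */
static void decode_tail(uint32_t tail, unsigned int *cpu, unsigned int *idx)
{
    *cpu = (tail >> Q_TAIL_CPU_OFFSET) - 1;
    *idx = (tail >> Q_TAIL_IDX_OFFSET) & ((1u << Q_TAIL_IDX_BITS) - 1);
}
```

Because a waiter cannot migrate while spinning, the pair is enough to find its queue node without storing a full pointer in the lock.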
     
  • Looks like commit:

    43239cbe79fc ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)")

    left behind a reference to ASSIGN_ONCE(). Update this to WRITE_ONCE().

    Signed-off-by: Preeti U Murthy
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: borntraeger@de.ibm.com
    Cc: dave@stgolabs.net
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/20150430115721.22278.94082.stgit@preeti.in.ibm.com
    Signed-off-by: Ingo Molnar

    Preeti U Murthy
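
The rename swaps the argument order, which is why a stale call site does not simply keep working. A simplified stand-in for the macro, assuming the GCC/Clang `__typeof__` extension (the kernel's real definition is more involved, and `WRITE_ONCE_SKETCH` and `publish_flag()` are hypothetical names):

```c
/* Destination first, value second, as in WRITE_ONCE(x, val). */
#define WRITE_ONCE_SKETCH(x, val) \
    (*(volatile __typeof__(x) *)&(x) = (val))

/* Store through a volatile lvalue so the compiler emits exactly one write. */
static int publish_flag(int *flag)
{
    WRITE_ONCE_SKETCH(*flag, 1);
    return *flag;
}
```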
     
  • In up_write()/up_read(), rwsem_wake() will be called whenever it
    detects that some writers/readers are waiting. The rwsem_wake()
    function will take the wait_lock and call __rwsem_do_wake() to do the
    real wakeup. For a heavily contended rwsem, doing a spin_lock() on
    wait_lock will cause further contention on the heavily contended rwsem
    cacheline resulting in delay in the completion of the up_read/up_write
    operations.

    This patch makes the wait_lock taking and the call to __rwsem_do_wake()
    optional if at least one spinning writer is present. The spinning
    writer will be able to take the rwsem and call rwsem_wake() later
    when it calls up_write(). With the presence of a spinning writer,
    rwsem_wake() will now try to acquire the lock using trylock. If that
    fails, it will just quit.

    Suggested-by: Peter Zijlstra (Intel)
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Davidlohr Bueso
    Acked-by: Jason Low
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Douglas Hatch
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
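
The wake-up policy from this commit, reduced to a user-space sketch. The struct, its `has_spinner` flag, and the `wakeups` counter are stand-ins invented for illustration; an `atomic_flag` plays the role of `wait_lock`.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct rwsem_sketch {
    atomic_flag wait_lock;    /* stand-in for the wait_lock spinlock */
    atomic_bool has_spinner;  /* assumed flag: a writer is spinning */
    int wakeups;              /* counts calls to the real wakeup */
};

/* Returns true if the wakeup ran, false if it was left to the spinner. */
static bool rwsem_wake_sketch(struct rwsem_sketch *sem)
{
    if (atomic_load(&sem->has_spinner)) {
        /* Trylock: if wait_lock is busy, quit and let the spinner wake. */
        if (atomic_flag_test_and_set_explicit(&sem->wait_lock,
                                              memory_order_acquire))
            return false;
    } else {
        /* No spinner: take wait_lock unconditionally, as before. */
        while (atomic_flag_test_and_set_explicit(&sem->wait_lock,
                                                 memory_order_acquire))
            ;
    }

    sem->wakeups++;  /* stand-in for __rwsem_do_wake() */
    atomic_flag_clear_explicit(&sem->wait_lock, memory_order_release);
    return true;
}
```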
     
  • Pull power management and ACPI fixes from Rafael Wysocki:
    "These include three regression fixes (PCI resources management,
    ACPI/PNP device enumeration, ACPI SBS on MacBook) and two ACPI
    documentation fixes related to GPIO.

    Specifics:

    - Fix for a PCI resources management regression introduced during the
    4.0 cycle and related to the handling of ACPI resources'
    Producer/Consumer flags that turn out to be useless (Jiang Liu)

    - Fix for a MacBook regression related to the Smart Battery Subsystem
    (SBS) driver causing various problems (stalls on boot, failure to
    detect or report battery) to happen and introduced during the 3.18
    cycle (Chris Bainbridge)

    - Fix for an ACPI/PNP device enumeration regression introduced during
    the 3.16 cycle caused by failing to include two PNP device IDs into
    the list of IDs that PNP device objects need to be created for
    (Witold Szczeponik)

    - Fixes for two minor mistakes in the ACPI GPIO properties
    documentation (Antonio Ospite, Rafael J Wysocki)"

    * tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / PNP: add two IDs to list for PNPACPI device enumeration
    ACPI / documentation: Fix ambiguity in the GPIO properties document
    ACPI / documentation: fix a sentence about GPIO resources
    ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook
    x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus

    Linus Torvalds
     
  • * acpi-resources:
    x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus

    * acpi-battery:
    ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook

    * acpi-doc:
    ACPI / documentation: Fix ambiguity in the GPIO properties document
    ACPI / documentation: fix a sentence about GPIO resources

    * acpi-pnp:
    ACPI / PNP: add two IDs to list for PNPACPI device enumeration

    Rafael J. Wysocki
     
  • Pull f2fs fixes from Jaegeuk Kim:
    "Fix a performance regression and a bug"

    * tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
    f2fs: fix wrong error hanlder in f2fs_follow_link
    Revert "f2fs: enhance multi-threads performance"

    Linus Torvalds
     

07 May, 2015

7 commits

  • Pull pin control fixes from Linus Walleij:
    "Here is a smallish set of pin control fixes for the v4.1 cycle,
    collected the last two weeks:

    - fix a real nasty legacy bug that has screwed up the protection of
    adding pinctrl maps dynamically. Normally this didn't happen so
    much but Doug Anderson ran into it and fixed it, kudos!

    - minor driver fixes for Qualcomm spmi, mediatek and Marvell drivers"

    * tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
    pinctrl: Don't just pretend to protect pinctrl_maps, do it for real
    pinctrl: mediatek: mtk-common: initialize unmask
    pinctrl: qcom-spmi-mpp: Fix input value report
    pinctrl: qcom-spmi: Fix pin direction configuration
    pinctrl: mvebu: Fix mapping of pin 63 (gpo -> gpio)

    Linus Torvalds
     
  • Pull vfio fixes from Alex Williamson:
    "Fix some undesirable behavior with the vfio device request interface:

    - increase verbosity of device request channel (Alex Williamson)

    - fix runaway interruptible timeout (Alex Williamson)"

    * tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio:
    vfio: Fix runaway interruptible timeout
    vfio-pci: Log device requests more verbosely

    Linus Torvalds
     
  • Pull infiniband updates from Doug Ledford:
    "Minor updates for 4.1-rc

    Most of the changes are fairly small and well confined. The iWARP
    address reporting changes are the only ones that are a medium size. I
    had these queued up prior to rc1, but due to the shuffle in
    maintainers, they did not get submitted when I expected. My apologies
    for that. I feel comfortable with them however due to the testing
    they've received, so I left them in this submission"

    * tag 'for-linus' of git://github.com/dledford/linux:
    MAINTAINERS: Update InfiniBand subsystem maintainer
    MAINTAINERS: add include/rdma/ to InfiniBand subsystem
    IPoIB/CM: Fix indentation level
    iw_cxgb4: Remove negative advice dmesg warnings
    IB/core: Fix unaligned accesses
    IB/core: change rdma_gid2ip into void function as it always return zero
    IB/qib: use arch_phys_wc_add()
    IB/qib: add acounting for MTRR
    IB/core: dma unmap optimizations
    IB/core: dma map/unmap locking optimizations
    RDMA/cxgb4: Report the actual address of the remote connecting peer
    RDMA/nes: Report the actual address of the remote connecting peer
    RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients
    iw_cxgb4: enforce qp/cq id requirements
    iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs
    iw_cxgb4: 32b platform fixes
    iw_cxgb4: Cleanup register defines/MACROS
    RDMA/CMA: Canonize IPv4 on IPV6 sockets properly

    Linus Torvalds
     
  • Pull xen bug fixes from David Vrabel:

    - fix blkback regression if using persistent grants

    - fix various event channel related suspend/resume bugs

    - fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS

    - SWIOTLB on ARM now uses frames evtchn before binding the channel to CPU in __startup_pirq()
    xen/console: Update console event channel on resume
    xen/xenbus: Update xenbus event channel on resume
    xen/events: Clear cpu_evtchn_mask before resuming
    xen-pciback: Add name prefix to global 'permissive' variable
    xen: Suspend ticks on all CPUs during suspend
    xen/grant: introduce func gnttab_unmap_refs_sync()
    xen/blkback: safely unmap purge persistent grants

    Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "EFI fixes, an FPU fix, a ticket spinlock boundary condition fix and
    two build fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/fpu: Always restore_xinit_state() when use_eager_cpu()
    x86: Make cpu_tss available to external modules
    efi: Fix error handling in add_sysfs_runtime_map_entry()
    x86/spinlocks: Fix regression in spinlock contention detection
    x86/mm: Clean up types in xlate_dev_mem_ptr()
    x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
    efivarfs: Ensure VariableName is NUL-terminated

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
    PMU driver hardware-enablement addition"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf probe: Fix segfault if passed with ''.
    perf report: Fix -T/--threads option to work again
    perf bench numa: Fix immediate meeting of convergence condition
    perf bench numa: Fixes of --quiet argument
    perf bench futex: Fix hung wakeup tasks after requeueing
    perf probe: Fix bug with global variables handling
    perf top: Fix a segfault when kernel map is restricted.
    tools lib traceevent: Fix build failure on 32-bit arch
    perf kmem: Fix compiles on RHEL6/OL6
    tools lib api: Undefine _FORTIFY_SOURCE before setting it
    perf kmem: Consistently use PRIu64 for printing u64 values
    perf trace: Disable events and drain events when forked workload ends
    perf trace: Enable events when doing system wide tracing and starting a workload
    perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
    perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
    perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu

    Linus Torvalds
     
  • Pull RCU fix from Ingo Molnar:
    "An RCU Kconfig fix that eliminates an annoying interactive kconfig
    question for CONFIG_RCU_TORTURE_TEST_SLOW_INIT"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    rcu: Control grace-period delays directly from value

    Linus Torvalds
     

06 May, 2015

22 commits

  • Way back, when the world was a simpler place and there was no war, no
    evil, and no kernel bugs, there was just a single pinctrl lock. That
    was how the world was when (57291ce pinctrl: core device tree mapping
    table parsing support) was written. In that case, there were
    instances where the pinctrl mutex was already held when
    pinctrl_register_map() was called, hence a "locked" parameter was
    passed to the function to indicate that the mutex was already locked
    (so we shouldn't lock it again).

    A few years ago in (42fed7b pinctrl: move subsystem mutex to
    pinctrl_dev struct), we switched to a separate pinctrl_maps_mutex.
    ...but (oops) we forgot to re-think about the whole "locked" parameter
    for pinctrl_register_map(). Basically the "locked" parameter appears
    to still refer to whether the bigger pinctrl_dev mutex is locked, but
    we're using it to skip locks of our (now separate) pinctrl_maps_mutex.

    That's kind of a bad thing(TM). Probably nobody noticed because most
    of the calls to pinctrl_register_map happen at boot time and we've got
    synchronous device probing. ...and even cases where we're
    asynchronous don't end up actually hitting the race too often. ...but
    after banging my head against the wall for a bug that reproduced 1 out
    of 1000 reboots and lots of looking through kgdb, I finally noticed
    this.

    Anyway, we can now safely remove the "locked" parameter and go back to
    a war-free, evil-free, and kernel-bug-free world.

    Fixes: 42fed7ba44e4 ("pinctrl: move subsystem mutex to pinctrl_dev struct")
    Signed-off-by: Doug Anderson
    Signed-off-by: Linus Walleij

    Doug Anderson
     
  • Make sure that xen_swiotlb_init allocates buffers that are DMA capable
    when at least one memblock is available below 4G. Otherwise we assume
    that all devices on the SoC can cope with >4G addresses. We do this on
    ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case.

    No functional changes on x86.

    From: Chen Baozi

    Signed-off-by: Chen Baozi
    Signed-off-by: Stefano Stabellini
    Tested-by: Chen Baozi
    Acked-by: Konrad Rzeszutek Wilk
    Signed-off-by: David Vrabel

    Stefano Stabellini
     
  • The following commit:

    f893959b0898 ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")

    removed drop_init_fpu() usage from flush_thread(). This seems to break
    things for me - the Go 1.4 test suite fails all over the place with
    floating point comparison errors (offending commit found through
    bisection).

    The functional change was that flush_thread() after this commit
    only calls restore_init_xstate() when both use_eager_fpu() and
    !used_math() are true. drop_init_fpu() (now fpu_reset_state()) calls
    restore_init_xstate() regardless of whether current used_math() - apply
    the same logic here.

    Switch used_math() -> tsk_used_math(tsk) to consistently use the grabbed
    tsk instead of current, like in the rest of flush_thread().

    Tested-by: Dave Hansen
    Signed-off-by: Bobby Powers
    Signed-off-by: Borislav Petkov
    Acked-by: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Pekka Riikonen
    Cc: Quentin Casasnovas
    Cc: Rik van Riel
    Cc: Suresh Siddha
    Cc: Thomas Gleixner
    Fixes: f893959b ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")
    Link: http://lkml.kernel.org/r/1430147441-9820-1-git-send-email-bobbypowers@gmail.com
    Signed-off-by: Ingo Molnar

    Bobby Powers
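
The functional change reads most easily as two predicates. Both function names are hypothetical; they only restate the conditions given in the commit message for when restore_init_xstate() is reached.

```c
#include <stdbool.h>

/* After f893959b0898: restore only if eager FPU and the task never used math. */
static bool restores_init_state_regressed(bool eager_fpu, bool used_math)
{
    return eager_fpu && !used_math;
}

/* With this fix: restore whenever eager FPU is in use, as before the regression. */
static bool restores_init_state_fixed(bool eager_fpu, bool used_math)
{
    (void)used_math;  /* no longer part of the decision */
    return eager_fpu;
}
```

The two differ only for a task that has already used the FPU on an eager-FPU system.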
     
  • Pull EFI fixes from Matt Fleming:

    * Avoid garbage names in efivarfs due to buggy firmware by zeroing
    EFI variable name. (Ross Lagerwall)

    * Stop erroneously dropping upper 32 bits of boot command line pointer
    in EFI boot stub and stash them in ext_cmd_line_ptr. (Roy Franz)

    * Fix double-free bug in error handling code path of EFI runtime map
    code. (Dan Carpenter)

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • …it/acme/linux into perf/urgent

    Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

    - Fix 'perf probe -a' segfault if passed with '' (Wang Nan)

    - Fix report -T/--threads option (Namhyung Kim)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Pull IPMI fixes from Corey Minyard:
    "Lots of minor IPMI fixes, especially ones that have come up since
    the SSIF driver has been in the main kernel for a while"

    * tag 'for-linus-4.1-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
    ipmi: Fix multi-part message handling
    ipmi: Add alert handling to SSIF
    ipmi: Fix a problem that messages are not issued in run_to_completion mode
    ipmi: Report an error if ACPI _IFT doesn't exist
    ipmi: Remove unused including
    ipmi: Don't report err in the SI driver for SSIF devices
    ipmi: Remove incorrect use of seq_has_overflowed
    ipmi:ssif: Ignore spaces when comparing I2C adapter names
    ipmi_ssif: Fix the logic on user-supplied addresses

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "16 patches

    This includes a new rtc driver for the Abracon AB x80x and isn't very
    appropriate for -rc2. It was still being fiddled with a bit during
    the merge window and I fell asleep during -rc1"

    [ So I took the new driver, it seems small and won't regress anything.
    I'm a softy. - Linus ]

    * emailed patches from Andrew Morton :
    rtc: armada38x: fix concurrency access in armada38x_rtc_set_time
    ocfs2: dlm: fix race between purge and get lock resource
    nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()
    util_macros.h: have array pointer point to array of constants
    configfs: init configfs module earlier at boot time
    mm/hwpoison-inject: check PageLRU of hpage
    mm/hwpoison-inject: fix refcounting in no-injection case
    mm: soft-offline: fix num_poisoned_pages counting on concurrent events
    rtc: add rtc-abx80x, a driver for the Abracon AB x80x i2c rtc
    Documentation: bindings: add abracon,abx80x
    kasan: show gcc version requirements in Kconfig and Documentation
    mm/memory-failure: call shake_page() when error hits thp tail page
    lib: delete lib/find_last_bit.c
    MAINTAINERS: add co-maintainer for LED subsystem
    zram: add Designated Reviewer for zram in MAINTAINERS
    revert "zram: move compact_store() to sysfs functions area"

    Linus Torvalds
     
  • …linux-platform-drivers-x86

    Pull x86 platform driver fixes from Darren Hart:
    "This includes a trivial warning and adding a Lenovo laptop to an
    existing quirk.

    I've held off on things like the latter in the past, but I didn't feel
    it was risky enough to push out to 4.2.

    - thinkpad_acpi:
    Fix warning for static not at beginning

    - ideapad_laptop:
    Add Lenovo G40-30 to devices without radio switch"

    * tag 'platform-drivers-x86-v4.1-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
    thinkpad_acpi: Fix warning for static not at beginning
    ideapad_laptop: Add Lenovo G40-30 to devices without radio switch

    Linus Torvalds
     
  • Lots of little fixes for multi-part messages:

    The values were not being re-initialized, so if something went wrong
    while handling a multi-part message and it got left in a bad state,
    it might be an issue.

    The commands were not correct when issuing multi-part reads: the
    code was not passing in the proper value for commands. Also clean
    up some minor formatting issues.

    Get the block number from the right location, limit the maximum send
    message size to 63 bytes and explain why, and fix some minor stylistic
    issues.

    Signed-off-by: Corey Minyard

    Corey Minyard
     
  • The SSIF interface can optionally have an SMBus alert come in when
    data is ready. Unfortunately, the IPMI spec gives wiggle room to
    the implementer to allow them to always have the alert enabled,
    even if the driver doesn't enable it. So implement alerts.
    If you don't in this situation, the SMBus alert handling will
    constantly complain.

    Signed-off-by: Corey Minyard

    Corey Minyard
     
  • start_next_msg() issues a message placed in smi_info->waiting_msg
    if it is non-NULL. However, in run_to_completion mode sender()
    assigns the message to smi_info->curr_msg and sets
    smi_info->waiting_msg to NULL. As a result, the message is never
    issued, and the driver loops forever waiting for its completion
    when leaving a dying message after a kernel panic.

    sender() should set the message to smi_info->waiting_msg, not
    curr_msg.

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Corey Minyard

    Hidehiro Kawai
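
A stripped-down model of the hand-off shows why the message was never issued. The field names `curr_msg` and `waiting_msg` are from the commit message; the struct names, the `sender_buggy()`/`sender_fixed()` split, and the return values are assumptions of this sketch.

```c
#include <stddef.h>

struct msg_sketch { int id; };

struct smi_sketch {
    struct msg_sketch *curr_msg;     /* message currently being processed */
    struct msg_sketch *waiting_msg;  /* message queued for issue */
};

/* Issue the waiting message, if any. Returns 1 if one was issued. */
static int start_next_msg(struct smi_sketch *s)
{
    if (!s->waiting_msg) {
        s->curr_msg = NULL;  /* nothing to issue: go idle */
        return 0;
    }
    s->curr_msg = s->waiting_msg;
    s->waiting_msg = NULL;
    return 1;
}

/* Old behaviour in run_to_completion mode: bypasses waiting_msg. */
static void sender_buggy(struct smi_sketch *s, struct msg_sketch *m)
{
    s->curr_msg = m;
    s->waiting_msg = NULL;
}

/* Fixed behaviour: hand the message to start_next_msg() via waiting_msg. */
static void sender_fixed(struct smi_sketch *s, struct msg_sketch *m)
{
    s->waiting_msg = m;
}
```

With `sender_buggy()`, the following `start_next_msg()` finds nothing waiting and drops `curr_msg`, so a caller polling for completion never sees it.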
     
  • When probing an ACPI table, report a specific error, instead of just
    returning an error, if _IFT doesn't exist.

    Signed-off-by: Corey Minyard

    Corey Minyard
     
  • Remove an include that is not needed.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Corey Minyard

    Wei Yongjun
     
  • While setting the time, the RTC TIME register should not be accessed.
    However, due to hardware constraints, setting the RTC time involves
    sleeping for 100 ms. This sleep was done outside the critical section
    protected by the spinlock, so it was possible to read the RTC TIME
    register and get an incorrect value. This patch introduces a mutex to
    protect RTC TIME access; unlike with a spinlock, sleeping is allowed
    in a critical section protected by a mutex.

    The RTC STATUS register can still be used from the interrupt handler but
    it has no effect on setting the time.

    Signed-off-by: Gregory CLEMENT
    Acked-by: Alexandre Belloni
    Acked-by: Andrew Lunn
    Cc: Alessandro Zummo
    Cc: [4.0]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gregory CLEMENT
     
  • There is a race window in dlm_get_lock_resource(), which may return a
    lock resource which has been purged. This will cause the process to
    hang forever in dlmlock() as the ast msg can't be handled due to its
    lock resource not existing.

    dlm_get_lock_resource {
        ...
        spin_lock(&dlm->spinlock);
        tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
        if (tmpres) {
            spin_unlock(&dlm->spinlock);
            >>>>>>>> race window, dlm_run_purge_list() may run and purge
                     the lock resource
            spin_lock(&tmpres->spinlock);
            ...
            spin_unlock(&tmpres->spinlock);
        }
    }

    Signed-off-by: Junxiao Bi
    Cc: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • The range check for b-tree level parameter in nilfs_btree_root_broken()
    is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
    though the level is limited to values in the range of 0 to
    (NILFS_BTREE_LEVEL_MAX - 1).

    Since the level parameter is read from storage device and used to index
    nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
    can cause memory overrun during btree operations if the boundary value
    is set to the level parameter on device.

    This fixes the broken sanity check and adds a comment to clarify that
    the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.

    Signed-off-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
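
The off-by-one described in the nilfs entry above can be illustrated with a minimal sketch. It is not the kernel code: `LEVEL_MAX`, `level_in_range()` and `level_in_range_buggy()` are hypothetical names, and the value 8 is an arbitrary stand-in for NILFS_BTREE_LEVEL_MAX. It only shows why the upper bound must be exclusive when the level indexes an array of that many elements.

```c
#include <stdbool.h>

/* Hypothetical stand-in for NILFS_BTREE_LEVEL_MAX; the value is arbitrary. */
#define LEVEL_MAX 8

/*
 * Correct check: a level read from disk may index an array of LEVEL_MAX
 * elements only if it lies in [0, LEVEL_MAX - 1], so the upper bound is
 * exclusive.
 */
bool level_in_range(int level)
{
	return level >= 0 && level < LEVEL_MAX;
}

/*
 * Off-by-one variant of the kind the commit message describes: it wrongly
 * accepts level == LEVEL_MAX, which is one past the last valid index.
 */
bool level_in_range_buggy(int level)
{
	return level >= 0 && level <= LEVEL_MAX;
}
```

With the exclusive bound, a corrupted on-disk level equal to `LEVEL_MAX` is rejected before it is ever used as an array index.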
     
  • Using the new find_closest() macro can result in the following sparse
    warnings.

    drivers/hwmon/lm85.c:194:16: warning:
    incorrect type in initializer (different modifiers)
    drivers/hwmon/lm85.c:194:16: expected int *__fc_a
    drivers/hwmon/lm85.c:194:16: got int static const [toplevel] *
    drivers/hwmon/lm85.c:210:16: warning:
    incorrect type in initializer (different modifiers)
    drivers/hwmon/lm85.c:210:16: expected int *__fc_a
    drivers/hwmon/lm85.c:210:16: got int const *map

    This is because the array passed to find_closest() will typically be
    declared as array of constants, but the macro declares a non-constant
    pointer to it.

    Signed-off-by: Guenter Roeck
    Cc: Bartosz Golaszewski

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guenter Roeck
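
The qualifier mismatch behind the sparse warnings above can be sketched with an ordinary function instead of the macro. `closest_index()` and `example_map` are hypothetical and are not the kernel's find_closest(); the sketch assumes an ascending array and only shows that a pointer-to-const accepts constant lookup tables without discarding qualifiers.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the kernel macro. Returns the index of the
 * element closest to x in the ascending array a of n elements (n >= 1).
 * Taking "const int *" lets callers pass tables declared const.
 */
size_t closest_index(int x, const int *a, size_t n)
{
	size_t i;

	for (i = 0; i + 1 < n; i++) {
		/* Stop once x is at or below the midpoint of a[i] and a[i + 1]. */
		if (x <= (a[i] + a[i + 1]) / 2)
			break;
	}
	return i;
}

/* Example lookup table, declared const as such tables typically are. */
const int example_map[] = { 10, 20, 40, 80 };
```

Had the parameter been declared as plain `int *`, passing `example_map` would drop the `const` qualifier, which is the same class of mismatch sparse reports for the macro's internal pointer.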
     
  • We need this earlier in the boot process to allow various subsystems to
    use configfs (e.g. Industrial I/O, IIO).

    Also, debugfs is at core_initcall level and configfs should be on the same
    level from an infrastructure point of view.

    Signed-off-by: Daniel Baluta
    Suggested-by: Lars-Peter Clausen
    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Baluta
     
  • Hwpoison injector checks PageLRU of the raw target page to find out
    whether the page is an appropriate target, but current code now filters
    out thp tail pages, which prevents us from testing for such cases via this
    interface. So let's check hpage instead of p.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Dean Nelson
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Hidetoshi Seto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Hwpoison injection via debugfs:hwpoison/corrupt-pfn takes a refcount of
    the target page. But current code doesn't release it if the target page
    is not supposed to be injected, which results in a memory leak. This patch
    simply adds the refcount releasing code.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Dean Nelson
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Hidetoshi Seto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
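
The leak pattern in the refcount entry above can be modelled in a few lines. Everything here is hypothetical (`fake_page`, `get_ref()`, `put_ref()`, `handle_failure()`, `inject()`), and the sketch assumes that the handler on the success path consumes the caller's reference; it is not the kernel's injector.

```c
#include <stdbool.h>

/* Hypothetical page model: a bare reference count and an eligibility flag. */
struct fake_page {
	int refcount;
	bool eligible;
};

void get_ref(struct fake_page *p) { p->refcount++; }
void put_ref(struct fake_page *p) { p->refcount--; }

/* Stand-in for the real handler; assumed to consume the caller's reference. */
void handle_failure(struct fake_page *p) { put_ref(p); }

/* Returns true if the injection went ahead, false if the page was skipped. */
bool inject(struct fake_page *p)
{
	get_ref(p);
	if (!p->eligible) {
		/* Early exit: drop the reference taken above, or it leaks. */
		put_ref(p);
		return false;
	}
	handle_failure(p);
	return true;
}
```

Without the `put_ref()` on the early-exit path, every skipped page would keep an extra reference and could never be freed.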
     
  • If multiple soft offline events hit one free page/hugepage concurrently,
    soft_offline_page() can handle the free page/hugepage multiple times,
    which increments the num_poisoned_pages counter more than once. This
    patch fixes this wrong counting by checking TestSetPageHWPoison for normal
    pages and by checking the return value of dequeue_hwpoisoned_huge_page()
    for hugepages.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Dean Nelson
    Cc: Andi Kleen
    Cc: [3.14+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
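
The counting fix for normal pages in the entry above relies on a test-and-set: only the caller that actually flips the flag updates the counter. A minimal user-space sketch with C11 atomics follows; `fake_page`, `poisoned_count` and `mark_poisoned_once()` are hypothetical names, not kernel APIs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page model; "poisoned" plays the role of the HWPoison flag. */
struct fake_page {
	atomic_bool poisoned;
};

/* Plays the role of num_poisoned_pages; zero-initialized at file scope. */
atomic_long poisoned_count;

/*
 * Atomically sets the flag and inspects its previous value. Only the caller
 * that observed "not yet poisoned" increments the counter, so concurrent
 * events on the same page are counted once.
 */
bool mark_poisoned_once(struct fake_page *page)
{
	if (atomic_exchange(&page->poisoned, true))
		return false;	/* somebody else already handled this page */
	atomic_fetch_add(&poisoned_count, 1);
	return true;
}
```

A separate "test, then set" sequence would leave a window in which two callers both see the flag clear and both increment the counter; the single atomic exchange closes that window.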
     
  • This is a basic driver for the ultra-low-power Abracon AB x80x series of RTC
    chips. In particular, it supports the supersets AB0805 and AB1805.
    It allows reading and writing the time, and enables the supercapacitor/
    battery charger.

    [arnd@arndb.de: abx805 depends on i2c]
    [alexandre.belloni@free-electrons.com: rename buffer from date to buf in abx80x_rtc_read_time()]
    Signed-off-by: Philippe De Muyter
    Cc: Alessandro Zummo
    Signed-off-by: Alexandre Belloni
    Signed-off-by: Arnd Bergmann
    Cc: Paul Bolle
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philippe De Muyter