04 Jan, 2017

1 commit


02 Jan, 2017

2 commits

  • Linus Torvalds
     
  • Pull DAX updates from Dan Williams:
    "The completion of Jan's DAX work for 4.10.

    As I mentioned in the libnvdimm-for-4.10 pull request, these are some
    final fixes for the DAX dirty-cacheline-tracking invalidation work
    that was merged through the -mm, ext4, and xfs trees in -rc1. These
    patches were prepared prior to the merge window, but we waited for
    4.10-rc1 to have a stable merge base after all the prerequisites were
    merged.

    Quoting Jan on the overall changes in these patches:

    "So I'd like all these 6 patches to go for rc2. The first three
    patches fix invalidation of exceptional DAX entries (a bug which
    is there for a long time) - without these patches data loss can
    occur on power failure even though user called fsync(2). The other
    three patches change locking of DAX faults so that ->iomap_begin()
    is called in a more relaxed locking context and we are safe to
    start a transaction there for ext4"

    These have received a build success notification from the kbuild
    robot, and pass the latest libnvdimm unit tests. There have not been
    any -next releases since -rc1, so they have not appeared there"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    ext4: Simplify DAX fault path
    dax: Call ->iomap_begin without entry lock during dax fault
    dax: Finish fault completely when loading holes
    dax: Avoid page invalidation races and unnecessary radix tree traversals
    mm: Invalidate DAX radix tree entries only if appropriate
    ext2: Return BH_New buffers for zeroed blocks

    Linus Torvalds
     

31 Dec, 2016

2 commits

  • Pull documentation fixes from Jonathan Corbet:
    "Two small fixes:

    - A merge error on my part broke the DocBook build. I've
    requisitioned one of tglx's frozen sharks for appropriate
    disciplinary action and resolved to be more careful about testing
    the DocBook stuff as long as it's still around.

    - Fix an error in unaligned-memory-access.txt"

    * tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux:
    Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
    docs: Fix build failure

    Linus Torvalds
     
  • Pull crypto fix from Herbert Xu:
    "This fixes a boot failure on some platforms when crypto self test is
    enabled along with the new acomp interface"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: testmgr - Use heap buffer for acomp test input

    Linus Torvalds
     

30 Dec, 2016

2 commits

  • mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
    mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
    return test_bit(PG_waiters);
    ^~~~~~~~

    Fixes: b91e1302ad9b ('mm: optimize PageWaiters bit use for unlock_page()')
    Signed-off-by: Olof Johansson
    Brown-paper-bag-by: Linus Torvalds
    Signed-off-by: Linus Torvalds

    Olof Johansson
     
  • In commit 62906027091f ("mm: add PageWaiters indicating tasks are
    waiting for a page bit") Nick Piggin made our page locking no longer
    unconditionally touch the hashed page waitqueue, which not only helps
    performance in general, but is particularly helpful on NUMA machines
    where the hashed wait queues can bounce around a lot.

    However, the "clear lock bit atomically and then test the waiters bit"
    sequence turns out to be much more expensive than it needs to be,
    because you get a nasty stall when trying to access the same word that
    just got updated atomically.

    On architectures where locking is done with LL/SC, this would be trivial
    to fix with a new primitive that clears one bit and tests another
    atomically, but that ends up not working on x86, where the only atomic
    operations that return the result end up being cmpxchg and xadd. The
    atomic bit operations return the old value of the same bit we changed,
    not the value of an unrelated bit.

    On x86, we could put the lock bit in the high bit of the byte, and use
    "xadd" with that bit (where the overflow ends up not touching other
    bits), and look at the other bits of the result. However, an even
    simpler model is to just use a regular atomic "and" to clear the lock
    bit, and then the sign bit in eflags will indicate the resulting state
    of the unrelated bit #7.

    So by moving the PageWaiters bit up to bit #7, we can atomically clear
    the lock bit and test the waiters bit on x86 too. And for architectures
    with LL/SC (which is all the usual RISC suspects), the particular bit
    doesn't matter, so they are fine with this approach too.

    This avoids the extra access to the same atomic word, and thus avoids
    the costly stall at page unlock time.

    The only downside is that the interface ends up being a bit odd and
    specialized: clear a bit in a byte, and test the sign bit. Nick doesn't
    love the resulting name of the new primitive, but I'd rather make the
    name be descriptive and very clear about the limitation imposed by
    trying to work across all relevant architectures than make it be some
    generic thing that doesn't make the odd semantics explicit.

    So this introduces the new architecture primitive

    clear_bit_unlock_is_negative_byte();

    and adds the trivial implementation for x86. We have a generic
    non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
    combination) which can be overridden by any architecture that can do
    better. According to Nick, Power has the same hiccup x86 has, for
    example, but some other architectures may not even care.
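
    The semantics of the new primitive can be modelled in portable C11. The
    sketch below is a userspace illustration only, not the kernel's x86
    implementation: the sketch_ names and the bit positions in the macros are
    assumptions made for the example. It clears one low bit of a byte with
    release ordering and reports whether bit 7 of that byte is set, using a
    single atomic read-modify-write instead of a clear followed by a
    separate test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout for this sketch: lock in bit 0, waiters in bit 7. */
#define SKETCH_LOCKED_BIT  0u
#define SKETCH_WAITERS_BIT 7u

/*
 * Clear bit `nr` (0..6) in *byte with release ordering and report whether
 * bit 7, the sign bit of the byte, is set in the resulting value.
 */
static bool sketch_clear_bit_unlock_is_negative_byte(unsigned int nr,
						     _Atomic unsigned char *byte)
{
	assert(nr < 7);

	unsigned char mask = (unsigned char)(1u << nr);
	unsigned char old = atomic_fetch_and_explicit(byte, (unsigned char)~mask,
						      memory_order_release);

	/* Clearing a bit below bit 7 leaves bit 7 unchanged, so old == new there. */
	return (old & (1u << SKETCH_WAITERS_BIT)) != 0;
}
```

    With the byte holding 0x81 (bit 0 and bit 7 both set), the call clears
    bit 0, leaves 0x80 behind, and returns true, so an unlock path would know
    that it has to go and wake waiters.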

    All these optimizations mean that my page locking stress-test (which is
    just executing a lot of small short-lived shell scripts: "make test" in
    the git source tree) no longer makes our page locking look horribly bad.
    Before all these optimizations, just the unlock_page() costs were just
    over 3% of all CPU overhead on "make test". After this, it's down to
    0.66%, so just a quarter of the cost it used to be.

    (The difference on NUMA is bigger, but there this micro-optimization is
    likely less noticeable, since the big issue on NUMA was not the accesses
    to 'struct page', but the waitqueue accesses that were already removed
    by Nick's earlier commit).

    Acked-by: Nick Piggin
    Cc: Dave Hansen
    Cc: Bob Peterson
    Cc: Steven Whitehouse
    Cc: Andrew Lutomirski
    Cc: Andreas Gruenbacher
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

28 Dec, 2016

10 commits

  • Pull crypto fix from Herbert Xu:
    "This fixes a hash corruption bug in the marvell driver"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: marvell - Copy IVDIG before launching partial DMA ahash requests

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.

    The most important is to not assume the packet is RX just because
    the destination address matches that of the device. Such an
    assumption causes problems when an interface is put into loopback
    mode.

    2) If we retry when creating a new tc entry (because we dropped the
    RTNL mutex in order to load a module, for example) we end up with
    -EAGAIN and then loop trying to replay the request. But we didn't
    reset some state when looping back to the top like this, and if
    another thread meanwhile inserted the same tc entry we were trying
    to, we re-link it creating an endless loop in the tc chain. Fix from
    Daniel Borkmann.

    3) There are two different WRITE bits in the MDIO address register for
    the stmmac chip, depending upon the chip variant. Due to a bug we
    could set them both, fix from Hock Leong Kweh.

    4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: stmmac: fix incorrect bit set in gmac4 mdio addr register
    r8169: add support for RTL8168 series add-on card.
    net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
    openvswitch: upcall: Fix vlan handling.
    ipv4: Namespaceify tcp_tw_reuse knob
    net: korina: Fix NAPI versus resources freeing
    net, sched: fix soft lockup in tc_classify
    net/mlx4_en: Fix user prio field in XDP forward
    tipc: don't send FIN message from connectionless socket
    ipvlan: fix multicast processing
    ipvlan: fix various issues in ipvlan_process_multicast()

    Linus Torvalds
     
  • In the actual implementation, the ether_addr_equal() function tests for
    equality to 0 when returning. It seems that commit 0d74c4 overlooked
    changing this operator to reflect the actual function.
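
    The comparison being documented can be sketched as follows. This is a
    minimal userspace version written for illustration: the sketch_ names are
    assumptions, and memcpy() is used to sidestep the alignment question that
    the document itself is about. The point is the final operator: the XOR of
    two equal words is 0, so the addresses are equal exactly when the OR of
    all the XORs compares equal to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Return true if the two 6-byte Ethernet addresses are identical. */
static bool sketch_ether_addr_equal(const uint8_t *addr1, const uint8_t *addr2)
{
	uint16_t a[3], b[3];

	assert(addr1 != NULL && addr2 != NULL);

	/* Copy into aligned 16-bit words instead of casting the byte pointers. */
	memcpy(a, addr1, SKETCH_ETH_ALEN);
	memcpy(b, addr2, SKETCH_ETH_ALEN);

	/* Any differing bit makes the OR non-zero, so equality is "== 0". */
	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
}
```

    Writing "!= 0" here would turn the helper into an inequality test, which
    is the kind of mismatch the documentation fix addresses.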

    Signed-off-by: Cihangir Akturk
    Signed-off-by: Jonathan Corbet

    Cihangir Akturk
     
  • The 80211.tmpl DocBook file was removed in commit 819bf593767c ("docs-rst:
    sphinxify 802.11 documentation"), but the 80211.xml target was re-added to
    the Makefile by commit 7ddedebb03b7 ("ALSA: doc: ReSTize
    writing-an-alsa-driver document"), leading to a failure when building the
    documentation:

    *** No rule to make target 'Documentation/DocBook/80211.xml', needed by
    'Documentation/DocBook/80211.aux.xml'.

    cc: stable@vger.kernel.org
    Signed-off-by: John Brooks
    Mea-culpa-by: Jonathan Corbet
    Signed-off-by: Jonathan Corbet

    John Brooks
     
  • Linux 4.10-rc1

    Jonathan Corbet
     
  • Fix the gmac4 mdio write access to use only MII_GMAC4_WRITE instead of
    ORing it together with MII_WRITE.

    Signed-off-by: Kweh, Hock Leong
    Acked-By: Joao Pinto
    Signed-off-by: David S. Miller

    Kweh, Hock Leong
     
  • This chip is the same as RTL8168, but its device id is 0x8161.

    Signed-off-by: Chun-Hao Lin
    Signed-off-by: David S. Miller

    Chun-Hao Lin
     
  • After commit 73b62bd085f4737679ea9afc7867fa5f99ba7d1b ("virtio-net:
    remove the warning before XDP linearizing"), there are no users of
    bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert of
    commit f23bc46c30ca5ef58b8549434899fcbac41b2cfc.

    Cc: Daniel Borkmann
    Cc: John Fastabend
    Signed-off-by: Jason Wang
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jason Wang
     
  • The networking stack accelerates vlan tag handling by
    keeping the topmost vlan header in the skb. This works as
    long as the packet remains in the OVS datapath. But during
    an OVS upcall the vlan header is pushed onto the packet.
    When such a packet is sent back to the OVS datapath, the core
    networking stack might not handle it correctly. The following
    patch avoids this issue by accelerating the vlan tag
    during flow key extract. This simplifies the datapath by
    bringing uniform packet processing for packets from
    all code paths.

    Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
    CC: Jarno Rajahalme
    CC: Jiri Benc
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    pravin shelar
     
  • Different namespaces might have different requirements for reusing
    TIME-WAIT sockets for new connections. This can matter when
    applications running in a namespace need their TIME_WAIT sockets
    to be reduced independently of the host.

    Signed-off-by: Haishuang Yan
    Signed-off-by: David S. Miller

    Haishuang Yan
     

27 Dec, 2016

12 commits

  • Christopher Covington reported a crash on aarch64 on recent Fedora
    kernels:

    kernel BUG at ./include/linux/scatterlist.h:140!
    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc #162
    Hardware name: linux,dummy-virt (DT)
    task: ffff80007c650080 task.stack: ffff800008910000
    PC is at sg_init_one+0xa0/0xb8
    LR is at sg_init_one+0x24/0xb8
    ...
    [] sg_init_one+0xa0/0xb8
    [] test_acomp+0x10c/0x438
    [] alg_test_comp+0xb0/0x118
    [] alg_test+0x17c/0x2f0
    [] cryptomgr_test+0x44/0x50
    [] kthread+0xf8/0x128
    [] ret_from_fork+0x10/0x50

    The test vectors used for input are part of the kernel image. These
    inputs are passed as a buffer to sg_init_one which eventually blows up
    with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns
    false for the kernel image since virt_to_page will not return the
    correct page. Fix this by copying the input vectors to heap buffer
    before setting up the scatterlist.
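
    The shape of the fix (copy first, then hand the copy to the consumer) can
    be sketched in userspace C. Everything named sketch_ below is
    hypothetical, and malloc() plus memcpy() stand in for whatever kernel
    allocation helper the real patch uses; the sketch cannot reproduce the
    virt_addr_valid() check itself, only the copy that avoids it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a test vector that lives in the program image. */
static const char sketch_test_vector[] = "input data for the compression test";

/*
 * Return a heap copy of `src`, or NULL if the allocation fails.
 * The caller owns the result and must free() it.
 */
static void *sketch_dup_to_heap(const void *src, size_t len)
{
	void *copy;

	assert(src != NULL);

	copy = malloc(len);
	if (copy != NULL)
		memcpy(copy, src, len);
	return copy;
}
```

    A caller would pass the heap copy, not sketch_test_vector itself, to the
    code that builds the scatterlist, and free the copy once the test has
    finished.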

    Reported-by: Christopher Covington
    Fixes: d7db7a882deb ("crypto: acomp - update testmgr with support for acomp")
    Signed-off-by: Laura Abbott
    Signed-off-by: Herbert Xu

    Laura Abbott
     
  • Now that dax_iomap_fault() calls ->iomap_begin() without the entry lock,
    we can start the transaction in ext4_iomap_begin() and thus simplify
    ext4_dax_fault(). This also gives us proper retries in case of ENOSPC.

    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • Currently the ->iomap_begin() handler is called with the entry lock held.
    If the filesystem held any locks between ->iomap_begin() and ->iomap_end()
    (such as ext4, which will want to hold a transaction open), this would
    cause a lock inversion with iomap_apply() from the standard IO path, which
    first calls ->iomap_begin() and only then calls the ->actor() callback,
    which grabs entry locks for DAX (if it faults when copying from/to user
    provided buffers).

    Fix the problem by nesting the grabbing of the entry lock inside the
    ->iomap_begin() / ->iomap_end() pair.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • The only case when we do not finish the page fault completely is when we
    are loading hole pages into a radix tree. Avoid this special case and
    finish the fault in that case as well inside the DAX fault handler. This
    will allow easier iomap handling.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • Currently dax_iomap_rw() takes care of invalidating page tables and
    evicting hole pages from the radix tree when write(2) to the file
    happens. This invalidation is only necessary when there is some block
    allocation resulting from write(2). Furthermore, in its current place the
    invalidation races with a page fault instantiating a hole page just after
    we have invalidated it.

    So perform the page invalidation inside dax_iomap_actor() where we can
    do it only when really necessary and after blocks have been allocated so
    nobody will be instantiating new hole pages anymore.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
    just delete all exceptional radix tree entries they find. For DAX this
    is not desirable as we track cache dirtiness in these entries and when
    they are evicted, we may not flush caches although it is necessary. This
    can for example manifest when we write to the same block both via mmap
    and via write(2) (to different offsets) and fsync(2) then does not
    properly flush CPU caches when modification via write(2) was the last
    one.

    Create appropriate DAX functions to handle invalidation of DAX entries
    for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
    wire them up into the corresponding mm functions.

    Acked-by: Johannes Weiner
    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • So far we did not return BH_New buffers from ext2_get_blocks() when we
    allocated and zeroed-out a block for DAX inode to avoid racy zeroing in
    DAX code. This zeroing is gone these days so we can remove the
    workaround.

    Reviewed-by: Ross Zwisler
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • If mce_device_init() fails then the mce device pointer is NULL and the
    AMD mce code happily dereferences it.

    Add a sanity check.

    Reported-by: Markus Trippelsdorf
    Reported-by: Boris Ostrovsky
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • The attempt to prevent overwriting an active state resulted in a
    disaster which effectively disables all dynamically allocated hotplug
    states.

    Clean up the mess.

    Fixes: dc280d936239 ("cpu/hotplug: Prevent overwriting of callbacks")
    Reported-by: Markus Trippelsdorf
    Reported-by: Boris Ostrovsky
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Split asm-only parts of arm64 uaccess.h into a new header and use that
    from *.S.

    Signed-off-by: Al Viro

    Al Viro
     
  • Commit beb0babfb77e ("korina: disable napi on close and restart")
    introduced calls to napi_disable() that were missing before.
    Unfortunately, this leaves a small window during which NAPI has a chance
    to run even though we have just freed resources, since korina_free_ring()
    has already been called.

    Fix this by disabling NAPI first and then freeing resources, and make sure
    that we also cancel the restart task before doing the resource freeing.

    Fixes: beb0babfb77e ("korina: disable napi on close and restart")
    Reported-by: Alexandros C. Couloumbis
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Shahar reported a soft lockup in tc_classify(), where we run into an
    endless loop when walking the classifier chain due to tp->next == tp
    which is a state we should never run into. The issue only seems to
    trigger under load in the tc control path.

    What happens is that in tc_ctl_tfilter(), thread A allocates a new
    tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
    with it. In that classifier callback we had to unlock/lock the rtnl
    mutex and returned with -EAGAIN. One reason why we need to drop there
    is, for example, that we need to request an action module to be loaded.

    This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
    after we loaded and found the requested action, we need to redo the
    whole request so we don't race against others. While we had to unlock
    rtnl in that time, thread B's request was processed next on that CPU.
    Thread B added a new tp instance successfully to the classifier chain.
    When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
    and destroying its tp instance which never got linked, we goto replay
    and redo A's request.

    This time when walking the classifier chain in tc_ctl_tfilter() for
    checking for existing tp instances we had a priority match and found
    the tp instance that was created and linked by thread B. Now calling
    again into tp->ops->change() with that tp was successful and returned
    without error.

    tp_created was never cleared in the second round, thus kernel thinks
    that we need to link it into the classifier chain (once again). tp and
    *back point to the same object due to the match we had earlier on. Thus
    for thread B's already public tp, we reset tp->next to tp itself and
    link it into the chain, which eventually causes the mentioned endless
    loop in tc_classify() once a packet hits the data path.

    Fix is to clear tp_created at the beginning of each request, also when
    we replay it. On the paths that can cause -EAGAIN we already destroy
    the original tp instance we had and on replay we really need to start
    from scratch. It seems that this issue was first introduced in commit
    12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
    and avoid kernel panic when we use cls_cgroup").
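
    The sequence described above can be reproduced in a small single-threaded
    model. The sketch below is not the kernel code: the struct, the function,
    and the way "thread B" is simulated inline are all assumptions made to
    keep the example self-contained. It replays one request once, and the
    reset_on_replay argument selects between keeping tp_created across the
    replay (the bug) and clearing it at the top of every attempt (the fix).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_tp {
	int prio;
	struct sketch_tp *next;
};

/* Returns true if the chain ends up with a node whose next points at itself. */
static bool sketch_request_creates_self_loop(bool reset_on_replay)
{
	struct sketch_tp *chain = NULL;
	struct sketch_tp **back;
	struct sketch_tp *tp;
	bool tp_created = false;
	bool first_attempt = true;
	bool self_loop;

replay:
	if (reset_on_replay)
		tp_created = false;	/* the fix: every attempt starts from scratch */

	/* Look for an existing entry with priority 1. */
	for (back = &chain; (tp = *back) != NULL; back = &tp->next)
		if (tp->prio == 1)
			break;

	if (tp == NULL) {
		tp = calloc(1, sizeof(*tp));
		assert(tp != NULL);
		tp->prio = 1;
		tp_created = true;
	}

	if (first_attempt) {
		/* Model -EAGAIN: drop our unlinked tp while "thread B" links its own. */
		struct sketch_tp *b = calloc(1, sizeof(*b));

		assert(b != NULL);
		b->prio = 1;
		free(tp);
		chain = b;
		first_attempt = false;
		goto replay;
	}

	if (tp_created) {
		/* If tp was found in the chain, tp == *back and this sets tp->next = tp. */
		tp->next = *back;
		*back = tp;
	}

	self_loop = (chain != NULL && chain->next == chain);
	free(chain);
	return self_loop;
}
```

    Called with reset_on_replay set to false, the second pass finds thread
    B's entry, still believes it created it, and links it to itself. With
    reset_on_replay set to true the stale flag is gone and the chain stays
    well formed.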

    Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
    Reported-by: Shahar Klein
    Signed-off-by: Daniel Borkmann
    Cc: Cong Wang
    Acked-by: Eric Dumazet
    Tested-by: Shahar Klein
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

26 Dec, 2016

11 commits

  • Linus Torvalds
     
  • I am getting the following warning when I build kernel 4.9-git on my
    PowerBook G4 with a 32-bit PPC processor:

    AS arch/powerpc/kernel/misc_32.o
    arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]

    This problem is evident after commit 989cea5c14be ("kbuild: prevent
    lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
    error that has been in the code since 2005 when this source file was
    created. That was with commit 9994a33865f4 ("powerpc: Introduce
    entry_{32,64}.S, misc_{32,64}.S, systbl.S").

    The offending line does not make a lot of sense. This error does not
    seem to cause any errors in the executable, thus I am not recommending
    that it be applied to any stable versions.

    Thanks to Nicholas Piggin for suggesting this solution.

    Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
    Signed-off-by: Larry Finger
    Cc: Nicholas Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Linus Torvalds

    Larry Finger
     
  • The timer type simplifications caused a new gcc warning:

    drivers/base/power/domain.c: In function ‘genpd_runtime_suspend’:
    drivers/base/power/domain.c:562:14: warning: ‘time_start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start));

    despite the actual use of "time_start" not having changed in any way.
    It appears that simply changing the type of ktime_t from a union to a
    plain scalar type made gcc check the use.

    The variable wasn't actually used uninitialized, but gcc apparently
    failed to notice that the conditional around the use was exactly the
    same as the conditional around the initialization of that variable.

    Add an unnecessary initialization just to shut up the compiler.
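
    The pattern in question looks roughly like the sketch below, where the
    same condition guards both the initialization and the use. The clock and
    the function names are hypothetical stand-ins, not the genpd code; the
    only point is the redundant "= 0", which changes nothing at run time but
    gives the compiler a definite initial value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clock for the sketch: advances by 100 ns on every read. */
static int64_t sketch_fake_clock_ns;

static int64_t sketch_now_ns(void)
{
	sketch_fake_clock_ns += 100;
	return sketch_fake_clock_ns;
}

/* Returns the elapsed time in ns, or -1 when measuring is disabled. */
static int64_t sketch_timed_section(bool measure)
{
	int64_t time_start = 0;	/* redundant, but avoids -Wmaybe-uninitialized */
	int64_t elapsed_ns = -1;

	if (measure)
		time_start = sketch_now_ns();

	/* ... the work being timed would go here ... */

	if (measure) {
		elapsed_ns = sketch_now_ns() - time_start;
		assert(elapsed_ns >= 0);
	}

	return elapsed_ns;
}
```

    Whether a given gcc version warns without the initializer depends on the
    optimization level and on how much of the surrounding code it can see
    through, so the sketch does not claim to reproduce the warning itself.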

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull timer type cleanups from Thomas Gleixner:
    "This series does a tree wide cleanup of types related to
    timers/timekeeping.

    - Get rid of cycles_t and use a plain u64. The type is not really
    helpful and caused more confusion than clarity

    - Get rid of the ktime union. The union has become useless as we use
    the scalar nanoseconds storage unconditionally now. The 32bit
    timespec alike storage got removed due to the Y2038 limitations
    some time ago.

    That leaves the odd union access around for no reason. Clean it up.

    Both changes have been done with coccinelle and a small amount of
    manual mopping up"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ktime: Get rid of ktime_equal()
    ktime: Cleanup ktime_set() usage
    ktime: Get rid of the union
    clocksource: Use a plain u64 instead of cycle_t

    Linus Torvalds
     
  • Pull SMP hotplug notifier removal from Thomas Gleixner:
    "This is the final cleanup of the hotplug notifier infrastructure. The
    series has been reintegrated in the last two days because a new driver
    using the old infrastructure came in via the SCSI tree.

    Summary:

    - convert the last leftover drivers utilizing notifiers

    - fixup for a completely broken hotplug user

    - prevent setup of already used states

    - removal of the notifiers

    - treewide cleanup of hotplug state names

    - consolidation of state space

    Sphinx-based documentation is pending, but that needs review
    from the documentation folks"

    * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/armada-xp: Consolidate hotplug state space
    irqchip/gic: Consolidate hotplug state space
    coresight/etm3/4x: Consolidate hotplug state space
    cpu/hotplug: Cleanup state names
    cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
    staging/lustre/libcfs: Convert to hotplug state machine
    scsi/bnx2i: Convert to hotplug state machine
    scsi/bnx2fc: Convert to hotplug state machine
    cpu/hotplug: Prevent overwriting of callbacks
    x86/msr: Remove bogus cleanup from the error path
    bus: arm-ccn: Prevent hotplug callback leak
    perf/x86/intel/cstate: Prevent hotplug callback leak
    ARM/imx/mmcd: Fix broken cpu hotplug handling
    scsi: qedi: Convert to hotplug state machine

    Linus Torvalds
     
  • Pull turbostat updates from Len Brown.

    * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    tools/power turbostat: remove obsolete -M, -m, -C, -c options
    tools/power turbostat: Make extensible via the --add parameter
    tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz
    tools/power turbostat: line up headers when -M is used
    tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding
    tools/power turbostat: Support Knights Mill (KNM)
    tools/power turbostat: Display HWP OOB status
    tools/power turbostat: fix Denverton BCLK
    tools/power turbostat: use intel-family.h model strings
    tools/power/turbostat: Add Denverton RAPL support
    tools/power/turbostat: Add Denverton support
    tools/power/turbostat: split core MSR support into status + limit
    tools/power turbostat: fix error case overflow read of slm_freq_table[]
    tools/power turbostat: Allocate correct amount of fd and irq entries
    tools/power turbostat: switch to tab delimited output
    tools/power turbostat: Gracefully handle ACPI S3
    tools/power turbostat: tidy up output on Joule counter overflow

    Linus Torvalds
     
  • Add a new page flag, PageWaiters, to indicate the page waitqueue has
    tasks waiting. This can be tested rather than testing waitqueue_active
    which requires another cacheline load.

    This bit is always set when the page has tasks on page_waitqueue(page),
    and is set and cleared under the waitqueue lock. It may be set when
    there are no tasks on the waitqueue, which will cause a harmless extra
    wakeup check that will clear the bit.

    The generic bit-waitqueue infrastructure is no longer used for pages.
    Instead, waitqueues are used directly with a custom key type. The
    generic code was not flexible enough to have PageWaiters manipulation
    under the waitqueue lock (which simplifies concurrency).

    This improves the performance of page lock intensive microbenchmarks by
    2-3%.

    Putting two bits in the same word opens the opportunity to remove the
    memory barrier between clearing the lock bit and testing the waiters
    bit, after some work on the arch primitives (e.g., ensuring memory
    operand widths match and cover both bits).

    Signed-off-by: Nicholas Piggin
    Cc: Dave Hansen
    Cc: Bob Peterson
    Cc: Steven Whitehouse
    Cc: Andrew Lutomirski
    Cc: Andreas Gruenbacher
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     
  • A page is not added to the swap cache without being swap backed,
    so PageSwapBacked mappings can use PG_owner_priv_1 for PageSwapCache.

    Signed-off-by: Nicholas Piggin
    Acked-by: Hugh Dickins
    Cc: Dave Hansen
    Cc: Bob Peterson
    Cc: Steven Whitehouse
    Cc: Andrew Lutomirski
    Cc: Andreas Gruenbacher
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     
  • No point in going through loops and hoops instead of just comparing the
    values.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     
  • ktime_set(S,N) was required for the timespec storage type and is still
    useful for situations where a Seconds and Nanoseconds part of a time value
    needs to be converted. For anything where the Seconds argument is 0, this
    is pointless and can be replaced with a simple assignment.
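
    A minimal model shows why the zero-seconds case is redundant once the
    time value is a plain count of nanoseconds. The sketch_ names are
    assumptions for the example, and the helper deliberately omits any
    overflow or saturation handling that a real implementation would need.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t sketch_ktime_t;	/* scalar nanoseconds */

#define SKETCH_NSEC_PER_SEC INT64_C(1000000000)

/* Combine a seconds part and a nanoseconds part into scalar nanoseconds. */
static sketch_ktime_t sketch_ktime_set(int64_t secs, int64_t nsecs)
{
	assert(nsecs >= 0 && nsecs < SKETCH_NSEC_PER_SEC);
	return secs * SKETCH_NSEC_PER_SEC + nsecs;
}
```

    With this definition, sketch_ktime_set(0, 500) and the literal 500 are
    the same value, which is why such call sites can become simple
    assignments, while a call like sketch_ktime_set(2, 5) still does useful
    work.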

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     
  • ktime is a union because the initial implementation stored the time in
    scalar nanoseconds on 64-bit machines and in an endianness-optimized
    timespec variant for 32-bit machines. The Y2038 cleanup removed the
    timespec variant and switched everything to scalar nanoseconds. The union
    remained, but became completely pointless.

    Get rid of the union and just keep ktime_t as a simple typedef of type s64.

    The conversion was done with coccinelle and some manual mopping up.
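
    The effect on calling code can be sketched by putting the two forms side
    by side. The type names and the member name tv64 below are illustrative
    assumptions, not taken from the commit text; the sketch only shows that a
    one-member union forces a member access everywhere, while a scalar
    typedef lets ordinary operators work directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Before: a union wrapping a single scalar, accessed through its member. */
union sketch_old_ktime {
	int64_t tv64;
};

/* After: a plain signed 64-bit nanosecond count. */
typedef int64_t sketch_new_ktime_t;

/* The wrapper adds no storage, only syntax. */
static_assert(sizeof(union sketch_old_ktime) == sizeof(sketch_new_ktime_t),
	      "one-member union and scalar have the same size");

static bool sketch_old_before(union sketch_old_ktime a, union sketch_old_ktime b)
{
	return a.tv64 < b.tv64;
}

static bool sketch_new_before(sketch_new_ktime_t a, sketch_new_ktime_t b)
{
	return a < b;
}
```

    The same applies to equality and assignment: with the scalar form,
    callers can compare and assign values directly instead of going through
    accessor helpers.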

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner