05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)
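
    As a rough illustration of why the two shift rules can drop the shift
    entirely, here is a minimal userspace sketch. The macro values (4 KiB
    pages) and the helper names old_index()/new_index() are assumptions
    made for this example and are not taken from any kernel header.

```c
/* Illustrative values only: assume 4 KiB pages. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)

/* Before the patch the page-cache variants were plain aliases. */
#define PAGE_CACHE_SHIFT  PAGE_SHIFT
#define PAGE_CACHE_SIZE   PAGE_SIZE

/* Old style: the shift amount is always 0, so the value is unchanged. */
static unsigned long old_index(unsigned long pgoff)
{
    return pgoff << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New style, as produced by the first semantic-patch rule. */
static unsigned long new_index(unsigned long pgoff)
{
    return pgoff;
}
```

    Because the two constants are equal, both helpers return the same value
    for every input, which is what makes the mechanical rewrite safe.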

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

29 Mar, 2016

1 commit

  • Update the definition of memcpy_from_pmem() to return 0 or a negative
    error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().
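
    A caller-side sketch of the new convention might look like the
    following. mock_memcpy_from_pmem(), its "poisoned" parameter, and the
    choice of -EIO are hypothetical stand-ins for illustration; the real
    x86 implementation is built on memcpy_mcsafe() as stated above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in: returns 0 on success or a negative error code.
 * The "poisoned" flag simulates a media error in this userspace sketch.
 */
static int mock_memcpy_from_pmem(void *dst, const void *src, size_t n,
                                 int poisoned)
{
    if (poisoned)
        return -EIO;    /* assumed error code for the example */
    memcpy(dst, src, n);
    return 0;
}

/* Caller pattern: propagate the error instead of assuming success. */
static int read_block(void *dst, const void *src, size_t n, int poisoned)
{
    int rc = mock_memcpy_from_pmem(dst, src, n, poisoned);

    if (rc < 0)
        return rc;
    return 0;
}
```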

    Cc: Borislav Petkov
    Cc: Tony Luck
    Cc: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Acked-by: Ingo Molnar
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

27 Mar, 2016

3 commits

  • Pull Ceph updates from Sage Weil:
    "There is quite a bit here, including some overdue refactoring and
    cleanup on the mon_client and osd_client code from Ilya, scattered
    writeback support for CephFS and a pile of bug fixes from Zheng, and a
    few random cleanups and fixes from others"

    [ I already decided not to pull this because of it having been rebased
    recently, but ended up changing my mind after all. Next time I'll
    really hold people to it. Oh well. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
    libceph: use KMEM_CACHE macro
    ceph: use kmem_cache_zalloc
    rbd: use KMEM_CACHE macro
    ceph: use lookup request to revalidate dentry
    ceph: kill ceph_get_dentry_parent_inode()
    ceph: fix security xattr deadlock
    ceph: don't request vxattrs from MDS
    ceph: fix mounting same fs multiple times
    ceph: remove unnecessary NULL check
    ceph: avoid updating directory inode's i_size accidentally
    ceph: fix race during filling readdir cache
    libceph: use sizeof_footer() more
    ceph: kill ceph_empty_snapc
    ceph: fix a wrong comparison
    ceph: replace CURRENT_TIME by current_fs_time()
    ceph: scattered page writeback
    libceph: add helper that duplicates last extent operation
    libceph: enable large, variable-sized OSD requests
    libceph: osdc->req_mempool should be backed by a slab pool
    libceph: make r_request msg_size calculation clearer
    ...

    Linus Torvalds
     
  • Pull NTB bug fixes from Jon Mason:
    "NTB bug fixes for tasklet from spinning forever, link errors,
    translation window setup, NULL ptr dereference, and ntb-perf errors.

    Also, a modification to the driver API that makes _addr functions
    optional"

    * tag 'ntb-4.6' of git://github.com/jonmason/ntb:
    NTB: Remove _addr functions from ntb_hw_amd
    NTB: Make _addr functions optional in the API
    NTB: Fix incorrect clean up routine in ntb_perf
    NTB: Fix incorrect return check in ntb_perf
    ntb: fix possible NULL dereference
    ntb: add missing setup of translation window
    ntb: stop link work when we do not have memory
    ntb: stop tasklet from spinning forever during shutdown.
    ntb: perf test: fix address space confusion

    Linus Torvalds
     
  • Pull more SCSI updates from James Bottomley:
    "The only new stuff which missed the first pull request is an update to
    the UFS driver.

    The rest is an assortment of bug fixes and minor tweaks which appeared
    recently (some are fixes for recent code and some are stuff spotted
    recently by the checkers or the new gcc-6 compiler [most of Arnd's
    stuff])"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
    scsi_common: do not clobber fixed sense information
    scsi: ufs: select CONFIG_NLS
    scsi: fc: use get/put_unaligned64 for wwn access
    fnic: move printk()s outside of the critical code section.
    qla2xxx: avoid maybe_uninitialized warning
    megaraid_sas: add missing curly braces in ioctl handler
    lpfc: fix misleading indentation
    scsi_transport_sas: add 'scsi_target_id' sysfs attribute
    scsi_dh_alua: uninitialized variable in alua_check_vpd()
    scsi: ufs-qcom: add printouts of testbus debug registers
    scsi: ufs-qcom: enable/disable the device ref clock
    scsi: ufs-qcom: set PA_Local_TX_LCC_Enable before link startup
    scsi: ufs: add device quirk delay before putting UFS rails in LPM
    scsi: ufs: fix leakage during link off state
    scsi: ufs: tune UniPro parameters to optimize hibern8 exit time
    scsi: ufs: handle non spec compliant bkops behaviour by device
    scsi: ufs: add retry for query descriptors
    scsi: ufs: add error recovery after DL NAC error
    scsi: ufs: make error handling bit faster
    scsi: ufs: disable vccq if it's not needed by UFS device
    ...

    Linus Torvalds
     

26 Mar, 2016

20 commits

  • Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
    will allow KASAN to store allocation/deallocation stack traces for memory
    chunks. The stack traces are stored in a hash table and referenced by
    handles which reside in the kasan_alloc_meta and kasan_free_meta
    structures in the allocated memory chunks.
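
    The idea of storing each unique trace once and handing out a small
    handle can be sketched in userspace as below. This is a toy model
    under stated assumptions: the table size, the hash function, linear
    probing, and the names depot_save()/stack_record are all invented for
    the example and do not describe the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

#define MAX_FRAMES  8
#define TABLE_SIZE  64          /* toy size for the example */

struct stack_record {
    int used;
    unsigned int nr;
    uintptr_t frames[MAX_FRAMES];
};

static struct stack_record table[TABLE_SIZE];

/* FNV-1a style hash over the frame addresses (illustrative choice). */
static uint32_t hash_frames(const uintptr_t *frames, unsigned int nr)
{
    uint32_t h = 2166136261u;

    for (unsigned int i = 0; i < nr; i++) {
        h ^= (uint32_t)frames[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns a non-zero handle, or 0 if the trace cannot be stored. */
static uint32_t depot_save(const uintptr_t *frames, unsigned int nr)
{
    if (nr > MAX_FRAMES)
        return 0;

    uint32_t start = hash_frames(frames, nr) % TABLE_SIZE;

    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        uint32_t slot = (start + i) % TABLE_SIZE;
        struct stack_record *r = &table[slot];

        if (!r->used) {                 /* first time we see this trace */
            r->used = 1;
            r->nr = nr;
            memcpy(r->frames, frames, nr * sizeof(*frames));
            return slot + 1;
        }
        if (r->nr == nr &&
            memcmp(r->frames, frames, nr * sizeof(*frames)) == 0)
            return slot + 1;            /* duplicate: reuse the handle */
    }
    return 0;                           /* table full */
}
```

    Saving the same trace twice yields the same handle, so per-object
    metadata only needs to hold the handle rather than the whole trace.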

    IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
    duplication.

    Right now stackdepot support is only enabled in SLAB allocator. Once
    KASAN features in SLAB are on par with those in SLUB we can switch SLUB
    to stackdepot as well, thus removing the dependency on SLUB stack
    bookkeeping, which wastes a lot of memory.

    This patch is based on the "mm: kasan: stack depots" patch originally
    prepared by Dmitry Chernenkov.

    Joonsoo has said that he plans to reuse the stackdepot code for the
    mm/page_owner.c debugging facility.

    [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
    [aryabinin@virtuozzo.com: comment style fixes]
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • KASAN needs to know whether the allocation happens in an IRQ handler.
    This lets us strip everything below the IRQ entry point to reduce the
    number of unique stack traces needed to be stored.

    Move the definition of __irq_entry to so that the
    users don't need to pull in . Also introduce the
    __softirq_entry macro which is similar to __irq_entry, but puts the
    corresponding functions to the .softirqentry.text section.

    Signed-off-by: Alexander Potapenko
    Acked-by: Steven Rostedt
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Add GFP flags to KASAN hooks for future patches to use.

    This patch is based on the "mm: kasan: unified support for SLUB and SLAB
    allocators" patch originally prepared by Dmitry Chernenkov.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Add KASAN hooks to SLAB allocator.

    This patch is based on the "mm: kasan: unified support for SLUB and SLAB
    allocators" patch originally prepared by Dmitry Chernenkov.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • A leftover from commit c32b3cbe0d06 ("oom, PM: make OOM detection in the
    freezer path raceless").

    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • "oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task" tried
    to protect oom_reaper_list using MMF_OOM_KILLED flag. But we can do it
    by simply checking tsk->oom_reaper_list != NULL.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Entries are only added/removed from oom_reaper_list at head so we can
    use a single linked list and hence save a word in task_struct.
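
    A minimal sketch of the space argument, assuming a simplified task
    structure: a doubly linked list_head costs two pointers per task,
    while head-only insertion and removal need just one "next" pointer.
    The helper names push_task()/pop_task() are hypothetical.

```c
#include <stddef.h>

/* Doubly linked node, shown only for the size comparison. */
struct list_head {
    struct list_head *next, *prev;
};

/* Simplified task: a single pointer is enough for head-only access. */
struct task {
    int pid;
    struct task *oom_reaper_list;   /* next entry, or NULL */
};

static struct task *oom_reaper_list;   /* list head */

static void push_task(struct task *tsk)
{
    tsk->oom_reaper_list = oom_reaper_list;
    oom_reaper_list = tsk;
}

static struct task *pop_task(void)
{
    struct task *tsk = oom_reaper_list;

    if (tsk) {
        oom_reaper_list = tsk->oom_reaper_list;
        tsk->oom_reaper_list = NULL;
    }
    return tsk;
}
```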

    Signed-off-by: Vladimir Davydov
    Signed-off-by: Michal Hocko
    Cc: Tetsuo Handa
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Tetsuo has reported that oom_kill_allocating_task=1 will cause
    oom_reaper_list corruption because oom_kill_process doesn't follow
    standard OOM exclusion (i.e. it ignores TIF_MEMDIE) and allows the same
    task to be enqueued multiple times, e.g. by sacrificing the same child
    multiple times.

    This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
    which is set in oom_kill_process atomically and oom reaper is disabled
    if the flag was already set.

    Signed-off-by: Michal Hocko
    Reported-by: Tetsuo Handa
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • wake_oom_reaper has allowed only 1 oom victim to be queued. The main
    reason for that was the simplicity as other solutions would require some
    way of queuing. The current approach is racy and that was deemed
    sufficient as the oom_reaper is considered a best effort approach to
    help with oom handling when the OOM victim cannot terminate in a
    reasonable time. The race could lead to missing an oom victim which can
    get stuck:

    out_of_memory
      wake_oom_reaper
        cmpxchg // OK
                            oom_reaper
                              oom_reap_task
                                __oom_reap_task
    oom_victim terminates
                                  atomic_inc_not_zero // fail
    out_of_memory
      wake_oom_reaper
        cmpxchg // fails
                              task_to_reap = NULL

    This race requires 2 OOM invocations in a short time period which is not
    very likely but certainly not impossible. E.g. the original victim
    might have not released a lot of memory for some reason.

    The situation would improve considerably if wake_oom_reaper used a more
    robust queuing. This is what this patch implements. This means adding
    oom_reaper_list list_head into task_struct (eat a hole before embedded
    thread_struct for that purpose) and an oom_reaper_lock spinlock for
    queuing synchronization. wake_oom_reaper will then add the task on the
    queue and oom_reaper will dequeue it.
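
    The difference from a single slot can be sketched as a lock-protected
    queue in which a second victim is never dropped. This is a userspace
    toy under stated assumptions: the atomic_flag spin loop stands in for
    a kernel spinlock, the FIFO order is a choice made for the example,
    and the names victim, wake_reaper() and reaper_dequeue() are invented.

```c
#include <stdatomic.h>
#include <stddef.h>

struct victim {
    int pid;
    struct victim *next;
};

static atomic_flag reaper_lock = ATOMIC_FLAG_INIT;  /* toy spinlock */
static struct victim *head, *tail;

static void lock(void)
{
    while (atomic_flag_test_and_set_explicit(&reaper_lock,
                                             memory_order_acquire))
        ;   /* spin until the flag is released */
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&reaper_lock, memory_order_release);
}

/* Producer side: every victim is queued, none is lost. */
static void wake_reaper(struct victim *v)
{
    lock();
    v->next = NULL;
    if (tail)
        tail->next = v;
    else
        head = v;
    tail = v;
    unlock();
}

/* Consumer side: returns the next victim, or NULL if the queue is empty. */
static struct victim *reaper_dequeue(void)
{
    struct victim *v;

    lock();
    v = head;
    if (v) {
        head = v->next;
        if (!head)
            tail = NULL;
        v->next = NULL;
    }
    unlock();
    return v;
}
```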

    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • When oom_reaper manages to unmap all the eligible vmas there shouldn't
    be much freeable memory held by the oom victim left anymore so it
    makes sense to clear the TIF_MEMDIE flag for the victim and allow the
    OOM killer to select another task.

    The lack of TIF_MEMDIE also means that the victim cannot access memory
    reserves anymore but that shouldn't be a problem because it would get
    the access again if it needs to allocate and hits the OOM killer again
    due to the fatal_signal_pending resp. PF_EXITING check. We can safely
    hide the task from the OOM killer because it is clearly not a good
    candidate anymore as everything reclaimable has been torn down already.

    This patch makes it possible to cap the time an OOM victim can keep
    TIF_MEMDIE and thus hold off further global OOM killer actions, provided
    the oom reaper is able to take mmap_sem for the associated mm struct.
    This is not guaranteed now, but further steps should make sure that
    tasks taking mmap_sem for write block killably, which will help to
    reduce such lock contention. This is not done by this patch.

    Note that exit_oom_victim might be called on a remote task from
    __oom_reap_task now so we have to check and clear the flag atomically
    otherwise we might race and underflow oom_victims or wake up waiters too
    early.
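
    The "check and clear atomically" requirement can be sketched with C11
    atomics as below. The flag bit, the victim structure and the function
    names are assumptions for this userspace illustration; only the
    pattern (decrement the counter solely on the 1 -> 0 transition of the
    flag) reflects the description above.

```c
#include <stdatomic.h>

#define FLAG_MEMDIE 0x1u    /* illustrative flag bit */

struct victim {
    atomic_uint flags;
};

static atomic_int oom_victims;

/* Count the victim only on the 0 -> 1 transition of the flag. */
static void mark_victim(struct victim *v)
{
    if (!(atomic_fetch_or(&v->flags, FLAG_MEMDIE) & FLAG_MEMDIE))
        atomic_fetch_add(&oom_victims, 1);
}

/*
 * Test and clear in one atomic step. Returns 1 if this caller cleared
 * the flag, 0 if someone else (e.g. a remote reaper) already did, in
 * which case the counter must not be decremented again.
 */
static int exit_victim(struct victim *v)
{
    if (!(atomic_fetch_and(&v->flags, ~FLAG_MEMDIE) & FLAG_MEMDIE))
        return 0;
    atomic_fetch_sub(&oom_victims, 1);
    return 1;
}
```

    With a separate check followed by a clear, two callers could both see
    the flag set and both decrement the counter, which is the underflow
    the commit message warns about.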

    Signed-off-by: Michal Hocko
    Suggested-by: Johannes Weiner
    Suggested-by: Tetsuo Handa
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • This patch (of 5):

    This is based on the idea from Mel Gorman discussed during LSFMM 2015
    and independently brought up by Oleg Nesterov.

    The OOM killer currently kills only a single task in the hope that the
    task will terminate in a reasonable time and free up its memory. Such
    a task (oom victim) gets access to memory reserves via mark_oom_victim
    to allow forward progress should there be a need for additional memory
    during the exit path.

    It has been shown (e.g. by Tetsuo Handa) that it is not that hard to
    construct workloads which break the core assumption mentioned above and
    the OOM victim might take an unbounded amount of time to exit because it
    might be blocked in the uninterruptible state waiting for an event (e.g.
    lock) which is blocked by another task looping in the page allocator.

    This patch reduces the probability of such a lockup by introducing a
    specialized kernel thread (oom_reaper) which tries to reclaim additional
    memory by preemptively reaping the anonymous or swapped out memory owned
    by the oom victim under an assumption that such a memory won't be needed
    when its owner is killed and kicked from the userspace anyway. There is
    one notable exception to this, though, if the OOM victim was in the
    process of coredumping the result would be incomplete. This is
    considered a reasonable constraint because the overall system health is
    more important than debuggability of a particular application.

    A kernel thread has been chosen because we need a reliable way of
    invocation so workqueue context is not appropriate because all the
    workers might be busy (e.g. allocating memory). Kswapd which sounds
    like another good fit is not appropriate as well because it might get
    blocked on locks during reclaim as well.

    oom_reaper has to take mmap_sem on the target task for reading so the
    solution is not 100% because the semaphore might be held or blocked for
    write but the probability is reduced considerably wrt. basically any
    lock blocking forward progress as described above. To avoid blocking
    on the lock without any forward progress we are using only
    a trylock and retry 10 times with a short sleep in between. Users of
    mmap_sem which need it for write should be carefully reviewed to use
    _killable waiting as much as possible and reduce allocation requests
    done with the lock held to an absolute minimum to reduce the risk even
    further.
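
    The trylock-and-retry loop described above might be sketched like
    this in userspace. The toy lock (which fails a configurable number of
    times), the 1 ms nap and all function names are assumptions for the
    example; only the "trylock, at most 10 attempts, short sleep in
    between" shape comes from the text.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

#define MAX_REAP_RETRIES 10

/* Toy lock: trylock fails busy_attempts times, then succeeds. */
struct toy_lock {
    int busy_attempts;
};

static bool toy_trylock(struct toy_lock *lock)
{
    if (lock->busy_attempts > 0) {
        lock->busy_attempts--;
        return false;
    }
    return true;
}

static void short_sleep(void)
{
    struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    nanosleep(&nap, NULL);
}

/*
 * Never block on the lock: try, nap, try again. Returns the attempt
 * number that succeeded, or 0 if all attempts failed.
 */
static int reap_with_retries(struct toy_lock *lock)
{
    for (int attempt = 1; attempt <= MAX_REAP_RETRIES; attempt++) {
        if (toy_trylock(lock)) {
            /* ... the reaping work would happen here ... */
            return attempt;
        }
        short_sleep();
    }
    return 0;
}
```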

    The API between oom killer and oom reaper is quite trivial.
    wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only
    NULL->mm transition and oom_reaper clears this atomically once it is done
    with the work. This means that only a single mm_struct can be reaped at
    a time. As the operation is potentially disruptive we are trying to
    limit it to the necessary minimum and the reaper blocks any updates while
    it operates on an mm. mm_struct is pinned by mm_count to allow parallel
    exit_mmap and a race is detected by atomic_inc_not_zero(mm_users).
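
    The single-slot handoff can be sketched with a C11 compare-and-swap.
    This is a userspace approximation: struct mm, wake_reaper() and
    reaper_finish() are invented names, and the kernel's cmpxchg is
    modelled with atomic_compare_exchange_strong().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mm {
    int id;
};

static _Atomic(struct mm *) mm_to_reap;     /* the single slot */

/* Only the NULL -> mm transition is allowed; returns false if busy. */
static bool wake_reaper(struct mm *mm)
{
    struct mm *expected = NULL;

    return atomic_compare_exchange_strong(&mm_to_reap, &expected, mm);
}

/* The reaper clears the slot atomically once it is done with the work. */
static struct mm *reaper_finish(void)
{
    return atomic_exchange(&mm_to_reap, (struct mm *)NULL);
}
```

    While the slot is occupied a second wake_reaper() call simply fails,
    which is why only one mm_struct can be reaped at a time.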

    Signed-off-by: Michal Hocko
    Suggested-by: Oleg Nesterov
    Suggested-by: Mel Gorman
    Acked-by: Mel Gorman
    Acked-by: David Rientjes
    Cc: Mel Gorman
    Cc: Tetsuo Handa
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • This will be needed in the patch "mm, oom: introduce oom reaper".

    Acked-by: Michal Hocko
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • When security is enabled, security module can call filesystem's
    getxattr/setxattr callbacks during d_instantiate(). For cephfs,
    d_instantiate() is usually called by MDS' dispatch thread, while
    handling MDS reply. If the MDS reply does not include xattrs and
    corresponding caps, getxattr/setxattr need to send a new request
    to MDS and wait for the reply. This makes MDS' dispatch thread sleep,
    so nobody handles later MDS replies.

    The fix is to make sure lookup/atomic_open replies include xattrs and
    corresponding caps, so getxattr can be handled by cached xattrs.
    This requires some modification to both MDS and request message.
    (Client tells MDS what caps it wants; MDS encodes proper caps in
    the reply)

    Smack security module may call setxattr during d_instantiate().
    Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
    to us. So just make setxattr return error when called by MDS'
    dispatch thread.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • This helper duplicates last extent operation in OSD request, then
    adjusts the new extent operation's offset and length. The helper
    is for scattered page writeback, which adds nonconsecutive dirty
    pages to a single OSD request.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Turn r_ops into a flexible array member to enable large, consisting of
    up to 16 ops, OSD requests. The use case is scattered writeback in
    cephfs and, as far as the kernel client is concerned, 16 is just a made
    up number.

    r_ops had size 3 for copyup+hint+write, but copyup is really a special
    case - it can only happen once. ceph_osd_request_cache is therefore
    stuffed with num_ops=2 requests, anything bigger than that is allocated
    with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which
    means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
    users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
    that.
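
    A flexible array member with a per-request op count can be sketched
    as follows. The struct layout, the MAX_OPS name and the use of
    calloc() for every size are simplifications for a userspace example;
    the commit itself uses a slab cache for small requests and kmalloc()
    for larger ones.

```c
#include <stdlib.h>

#define MAX_OPS 16      /* upper bound mentioned in the text above */

struct op {
    int opcode;
};

struct request {
    unsigned int num_ops;
    struct op ops[];    /* flexible array member, sized at allocation */
};

/* Allocate a request with room for exactly num_ops operations. */
static struct request *request_alloc(unsigned int num_ops)
{
    struct request *req;

    if (num_ops == 0 || num_ops > MAX_OPS)
        return NULL;

    req = calloc(1, sizeof(*req) + num_ops * sizeof(req->ops[0]));
    if (req)
        req->num_ops = num_ops;
    return req;
}
```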

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • This avoids defining a large array of r_reply_op_{len,result} in
    struct ceph_osd_request.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Follow userspace nomenclature on this - the next commit adds
    outdata_len.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Unless we are in the process of setting up a client (i.e. connecting to
    the monitor cluster for the first time), apply a backoff: every time we
    want to reopen a session, increase our timeout by a multiple (currently
    2); when we complete the connection, reduce that multiplier by 50%.

    Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.
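
    The backoff rule can be sketched as below. Field and function names
    are hypothetical, and the upper cap MAX_MULT is an assumption added
    so the example stays bounded; the text above only specifies the
    factor of 2 and the 50% reduction.

```c
#define BACKOFF_FACTOR  2
#define MAX_MULT        10  /* assumed cap, not stated in the text */

struct mon_client {
    int initial_setup;      /* non-zero while connecting the first time */
    unsigned int hunt_mult; /* current timeout multiplier, at least 1 */
};

/* Called each time a session has to be reopened. */
static void reopen_session(struct mon_client *mc)
{
    if (mc->initial_setup)
        return;             /* no backoff during initial setup */

    mc->hunt_mult *= BACKOFF_FACTOR;
    if (mc->hunt_mult > MAX_MULT)
        mc->hunt_mult = MAX_MULT;
}

/* Called when the connection completes: halve the multiplier. */
static void session_established(struct mon_client *mc)
{
    mc->hunt_mult /= 2;
    if (mc->hunt_mult < 1)
        mc->hunt_mult = 1;
}

/* Effective timeout for the next attempt. */
static unsigned int hunt_timeout(const struct mon_client *mc,
                                 unsigned int base)
{
    return base * mc->hunt_mult;
}
```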

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Split ping interval and ping timeout: ping interval is 10s; keepalive
    timeout is 30s.

    Make monc_ping_timeout a constant while at it - it's not actually
    exported as a mount option (and the rest of tick-related settings won't
    be either), so it's got no place in ceph_options.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • It is currently hard-coded in the mon_client that mdsmap and monmap
    subs are continuous, while osdmap sub is always "onetime". To better
    handle full clusters/pools in the osd_client, we need to be able to
    issue continuous osdmap subs. Revamp subs code to allow us to specify
    for each sub whether it should be continuous or not.

    Although not strictly required for the above, switch to SUBSCRIBE2
    protocol while at it, eliminating the ambiguity between a request for
    "every map since X" and a request for "just the latest" when we don't
    have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now
    required - it's been supported since pre-argonaut (2010).

    Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
    it before we validate the epoch and successfully install the new map
    can mess up mon_client sub state.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

25 Mar, 2016

15 commits

  • Pull drm fixes from Dave Airlie:
    "Just a couple of dma-buf related fixes and some amdgpu fixes, along
    with a regression fix for a radeon off-by-default feature that makes my
    30" monitor happy again"

    * 'drm-next' of git://people.freedesktop.org/~airlied/linux:
    drm/radeon/mst: cleanup code indentation
    drm/radeon/mst: fix regression in lane/link handling.
    drm/amdgpu: add invalidate_page callback for userptrs
    drm/amdgpu: Revert "remove the userptr rmn->lock"
    drm/amdgpu: clean up path handling for powerplay
    drm/amd/powerplay: fix memory leak of tdp_table
    dma-buf/fence: fix fence_is_later v2
    dma-buf: Update docs for SYNC ioctl
    drm: remove excess description
    dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access()
    drm/atmel-hlcdc: use helper to get crtc state
    drm/atomic: use helper to get crtc state

    Linus Torvalds
     
  • Pull asm-generic updates from Arnd Bergmann:
    "There are only three patches this time, most other changes to files in
    include/asm-generic tend to go through the tree of whoever depends on
    the change.

    Two patches are cleanups for stuff that is no longer needed, the main
    change is to adapt the generic version of BUG_ON() for CONFIG_BUG=n to
    make it behave consistently with BUG().

    This avoids undefined behavior along with a number of warnings about
    that undefined behavior in randconfig builds when we keep going on
    after hitting a BUG_ON()"

    * tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
    asm-generic: remove old nonatomic-io wrapper files
    asm-generic: default BUG_ON(x) to if(x)BUG()
    asm-generic: page.h: Remove useless get_user_page and free_user_page
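
    The BUG_ON() change named in the shortlog above can be illustrated
    with a userspace sketch. Using abort() as the body of BUG() is an
    assumption made so the example terminates; safe_div() is a
    hypothetical caller.

```c
#include <stdlib.h>

/* Stand-in for a BUG() that does not return (assumption: abort()). */
#define BUG()           abort()

/* BUG_ON(x) defined in terms of BUG(), so both behave consistently. */
#define BUG_ON(cond)    do { if (cond) BUG(); } while (0)

static int safe_div(int a, int b)
{
    BUG_ON(b == 0);     /* execution does not continue past a hit */
    return a / b;       /* so the division never sees b == 0 */
}
```

    If BUG_ON() merely evaluated its condition and carried on, the
    division by zero after it would be reached, which is the kind of
    undefined behavior the pull message refers to.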

    Linus Torvalds
     
  • Pull more power management and ACPI updates from Rafael Wysocki:
    "The second batch of power management and ACPI updates for v4.6.

    Included are fixups on top of the previous PM/ACPI pull request and
    other material that didn't make it into that request but still should go into 4.6.

    Among other things, there's a fix for an intel_pstate driver issue
    uncovered by recent cpufreq changes, a workaround for a boot hang on
    Skylake-H related to the handling of deep C-states by the platform and
    a PCI/ACPI fix for the handling of IO port resources on non-x86
    architectures plus some new device IDs and similar.

    Specifics:

    - Fix for an intel_pstate driver issue related to the handling of MSR
    updates uncovered by the recent cpufreq rework (Rafael Wysocki).

    - cpufreq core cleanups related to starting governors and frequency
    synchronization during resume from system suspend and a locking fix
    for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).

    - acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
    Michael Neuling, Richard Cochran, Shilpasri Bhat).

    - intel_idle driver update preventing some Skylake-H systems from
    hanging during initialization by disabling deep C-states mishandled
    by the platform in the problematic configurations (Len Brown).

    - Intel Xeon Phi Processor x200 support for intel_idle
    (Dasaratharaman Chandramouli).

    - cpuidle menu governor updates to make it always honor PM QoS
    latency constraints (and prevent C1 from being used as the fallback
    C-state on x86 when they are set below its exit latency) and to
    restore the previous behavior to fall back to C1 if the next timer
    event is set far enough in the future that was changed in 4.4 which
    led to an energy consumption regression (Rik van Riel, Rafael
    Wysocki).

    - New device ID for a future AMD UART controller in the ACPI driver
    for AMD SoCs (Wang Hongcheng).

    - Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
    scaling (AVS) driver (David Wu).

    - ACPI PCI resources management fix for the handling of IO space
    resources on architectures where the IO space is memory mapped
    (IA64 and ARM64) broken by the introduction of common ACPI
    resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).

    - Fix for the ACPI backend of the generic device properties API to
    make it parse non-device (data node only) children of an ACPI
    device correctly (Irina Tirdea).

    - Fixes for the handling of global suspend flags (introduced in 4.4)
    during hibernation and resume from it (Lukas Wunner).

    - Support for obtaining configuration information from Device Trees
    in the PM clocks framework (Jon Hunter).

    - ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
    King, Geert Uytterhoeven)"

    * tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
    PM / AVS: rockchip-io: add io selectors and supplies for rk3399
    intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
    intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
    ACPI / PM: Runtime resume devices when waking from hibernate
    PM / sleep: Clear pm_suspend_global_flags upon hibernate
    cpufreq: governor: Always schedule work on the CPU running update
    cpufreq: Always update current frequency before startig governor
    cpufreq: Introduce cpufreq_update_current_freq()
    cpufreq: Introduce cpufreq_start_governor()
    cpufreq: powernv: Add sysfs attributes to show throttle stats
    cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
    PCI: ACPI: IA64: fix IO port generic range check
    ACPI / util: cast data to u64 before shifting to fix sign extension
    cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
    cpuidle: menu: Fall back to polling if next timer event is near
    cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
    intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
    cpufreq: Make cpufreq_quick_get() safe to call
    ACPI / property: fix data node parsing in acpi_get_next_subnode()
    ACPI / APD: Add device HID for future AMD UART controller
    ...

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Final round of fixes for this merge window - some of this has come up
    after the initial pull request, and some of it was put in a post-merge
    branch before the merge window.

    This contains:

    - Fix for a bad check for an error on dma mapping in the mtip32xx
    driver, from Alexey Khoroshilov.

    - A set of fixes for lightnvm, from Javier, Matias, and Wenwei.

    - An NVMe completion record corruption fix from Marta, ensuring that
    we read things in the right order.

    - Two writeback fixes from Tejun, marked for stable@ as well.

    - A blk-mq sw queue iterator fix from Thomas, fixing an oops for
    sparse CPU maps. They hit this in the hot plug/unplug rework"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    nvme: avoid cqe corruption when update at the same time as read
    writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
    writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
    blk-mq: Use proper cpumask iterator
    mtip32xx: fix checks for dma mapping errors
    lightnvm: do not load L2P table if not supported
    lightnvm: do not reserve lun on l2p loading
    nvme: lightnvm: return ppa completion status
    lightnvm: add a bitmap of luns
    lightnvm: specify target's logical address area
    null_blk: add lightnvm null_blk device to the nullb_list

    Linus Torvalds
     
  • Pull MTD updates from Brian Norris:
    "NAND:
    - Add sunxi_nand randomizer support
    - begin refactoring NAND ecclayout structs
    - fix pxa3xx_nand dmaengine usage
    - brcmnand: fix support for v7.1 controller
    - add Qualcomm NAND controller driver

    SPI NOR:
    - add new ls1021a, ls2080a support to Freescale QuadSPI
    - add new flash ID entries
    - support bottom-block protection for Winbond flash
    - support Status Register Write Protect
    - remove broken QPI support for Micron SPI flash

    JFFS2:
    - improve post-mount CRC scan efficiency

    General:
    - refactor bcm63xxpart parser, to later extend for NAND
    - add writebuf size parameter to mtdram

    Other minor code quality improvements"

    * tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd: (72 commits)
    mtd: nand: remove kerneldoc for removed function parameter
    mtd: nand: Qualcomm NAND controller driver
    dt/bindings: qcom_nandc: Add DT bindings
    mtd: nand: don't select chip in nand_chip's block_bad op
    mtd: spi-nor: support lock/unlock for a few Winbond chips
    mtd: spi-nor: add TB (Top/Bottom) protect support
    mtd: spi-nor: add SPI_NOR_HAS_LOCK flag
    mtd: spi-nor: use BIT() for flash_info flags
    mtd: spi-nor: disallow further writes to SR if WP# is low
    mtd: spi-nor: make lock/unlock bounds checks more obvious and robust
    mtd: spi-nor: silently drop lock/unlock for already locked/unlocked region
    mtd: spi-nor: wait for SR_WIP to clear on initial unlock
    mtd: nand: simplify nand_bch_init() usage
    mtd: mtdswap: remove useless if (!mtd->ecclayout) test
    mtd: create an mtd_oobavail() helper and make use of it
    mtd: kill the ecclayout->oobavail field
    mtd: nand: check status before reporting timeout
    mtd: bcm63xxpart: give width specifier an 'int', not 'size_t'
    mtd: mtdram: Add parameter for setting writebuf size
    mtd: nand: pxa3xx_nand: kill unused field 'drcmr_cmd'
    ...

    Linus Torvalds
     
  • Pull more nfsd updates from Bruce Fields:
    "Apologies for the previous request, which omitted the top 8 commits
    from my for-next branch (including the SCSI layout commits). Thanks
    to Trond for spotting my error!"

    This actually includes the new layout types, so here's that part of
    the pull message repeated:

    "Support for a new pnfs layout type from Christoph Hellwig. The new
    layout type is a variant of the block layout which uses SCSI features
    to offer improved fencing and device identification.

    Note this pull request also includes the client side of SCSI layout,
    with Trond's permission"

    * tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux:
    nfsd: use short read as well as i_size to set eof
    nfsd: better layoutupdate bounds-checking
    nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK
    nfsd: add SCSI layout support
    nfsd: move some blocklayout code
    nfsd: add a new config option for the block layout driver
    nfs/blocklayout: add SCSI layout support
    nfs4.h: add SCSI layout definitions

    Linus Torvalds
     
  • Pull kbuild updates from Michal Marek:

    - make dtbs_install fix

    - Error handling fixes for fixdep and link-vmlinux.sh

    - __UNIQUE_ID fix for clang

    - Fix for if_changed_* to suppress the "is up to date." message

    - The kernel is built with -Werror=incompatible-pointer-types

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kbuild: Add option to turn incompatible pointer check into error
    kbuild: suppress annoying "... is up to date." message
    kbuild: fixdep: Check fstat(2) return value
    scripts/link-vmlinux.sh: force error on kallsyms failure
    Kbuild: provide a __UNIQUE_ID for clang
    dtbsinstall: don't move target directory out of the way

    Linus Torvalds
     
  • Pull parisc updates from Helge Deller:
    "This patchset adds stack usage debug info for parisc and metag (on
    both, the stack grows upwards), switches to the new generic relative
    extable search and sort routines, drops the long-removed syscalls
    alloc_hugepages and free_hugepages, and wires up the new preadv2 and
    pwritev2 syscalls"

    * 'parisc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Wire up preadv2 and pwritev2 syscalls
    parisc: Use generic extable search and sort routines
    parisc: Panic immediately when panic_on_oops
    parisc,metag: Implement CONFIG_DEBUG_STACK_USAGE option
    parisc: Drop alloc_hugepages and free_hugepages syscalls

    Linus Torvalds
     
  • * pm-avs:
    PM / AVS: rockchip-io: add io selectors and supplies for rk3399

    * pm-clk:
    PM / clk: Add support for obtaining clocks from device-tree

    * pm-devfreq:
    PM / devfreq: Spelling s/frequnecy/frequency/

    * pm-sleep:
    ACPI / PM: Runtime resume devices when waking from hibernate
    PM / sleep: Clear pm_suspend_global_flags upon hibernate

    Rafael J. Wysocki
     
  • Pull tracing updates from Steven Rostedt:
    "Nothing major this round. Mostly small clean ups and fixes.

    Some visible changes:

    - A new flag was added to distinguish traces done in NMI context.

    - Preempt tracer now shows functions where preemption is disabled but
    interrupts are still enabled.

    Other notes:

    - Updates were done to function tracing to allow better performance
    with perf.

    - Infrastructure code has been added to allow for a new histogram
    feature for recording live trace event histograms that can be
    configured by simple user commands. The feature itself was just
    finished, but needs a round in linux-next before being pulled.

    This only includes some infrastructure changes that will be needed"

    * tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
    tracing: Record and show NMI state
    tracing: Fix trace_printk() to print when not using bprintk()
    tracing: Remove redundant reset per-CPU buff in irqsoff tracer
    x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
    tracing: Fix crash from reading trace_pipe with sendfile
    tracing: Have preempt(irqs)off trace preempt disabled functions
    tracing: Fix return while holding a lock in register_tracer()
    ftrace: Use kasprintf() in ftrace_profile_tracefs()
    ftrace: Update dynamic ftrace calls only if necessary
    ftrace: Make ftrace_hash_rec_enable return update bool
    tracing: Fix typoes in code comment and printk in trace_nop.c
    tracing, writeback: Replace cgroup path to cgroup ino
    tracing: Use flags instead of bool in trigger structure
    tracing: Add an unreg_all() callback to trigger commands
    tracing: Add needs_rec flag to event triggers
    tracing: Add a per-event-trigger 'paused' field
    tracing: Add get_syscall_name()
    tracing: Add event record param to trigger_ops.func()
    tracing: Make event trigger functions available
    tracing: Make ftrace_event_field checking functions available
    ...

    Linus Torvalds
     
  • Pull thermal updates from Zhang Rui:

    - Fix a regression where bogus trip points on some Lenovo laptops start
    to screw up thermal control after commit 81ad4276b505 ("Thermal:
    initialize thermal zone device correctly").

    On these Lenovo laptops, a bogus passive trip point of 0 degrees
    Celsius is reported. Without commit 81ad4276b505, the thermal zone
    fails to set cooling devices to the proper cooling state, which is a bug.
    But with commit 81ad4276b505 applied, the processors are always
    throttled on these Lenovo laptops because the current temperature is
    always higher than the passive trip point.

    Fix things to ignore such bogus trip points. (Zhang Rui)

    - Introduce Mediatek thermal driver. (Sascha Hauer)

    - Introduce devm_ versions of OF thermal sensor register API. (Laxman
    Dewangan)

    - Changes in Kconfigs to allow compile test on UM arch. (Krzysztof
    Kozlowski)

    - Introduce Skylake support in intel_pch_thermal driver. (Srinivas
    Pandruvada)

    - Several small fixes on Rockchip, TI-SoC, Tegra, RCar, and Exynos
    thermal drivers.
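
    The bogus trip point problem described in the first item above can be
    illustrated with a minimal sketch. This is not the kernel's code: the
    struct, both function names, and the "exactly 0 means bogus" rule are
    assumptions made for illustration only. It shows why a 0 degree passive
    trip point throttles permanently, and how marking such a trip point as
    disabled avoids that while leaving valid trip points in effect.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified trip point; not the kernel's data structure. */
struct trip_point {
    int temp_mc;    /* trip temperature in millidegrees Celsius */
    bool disabled;  /* set once the trip point is judged bogus */
};

/*
 * Mark every trip point that reports exactly 0 as disabled.
 * Assumption for this sketch: a reported value of 0 is treated as bogus.
 * Returns the number of trip points that were disabled.
 */
static size_t disable_bogus_trips(struct trip_point *trips, size_t n)
{
    size_t disabled = 0;

    for (size_t i = 0; i < n; i++) {
        if (trips[i].temp_mc == 0) {
            trips[i].disabled = true;
            disabled++;
        }
    }
    return disabled;
}

/*
 * Return true when the current temperature has reached any enabled
 * trip point. Disabled trip points are skipped entirely.
 */
static bool should_throttle(const struct trip_point *trips, size_t n,
                            int cur_temp_mc)
{
    for (size_t i = 0; i < n; i++) {
        if (trips[i].disabled)
            continue;
        if (cur_temp_mc >= trips[i].temp_mc)
            return true;
    }
    return false;
}
```

    With one trip point at 0 and another at 95000 millidegrees, a reading
    of 45000 throttles before disable_bogus_trips() is called and no longer
    throttles afterwards, while a reading of 96000 still does.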

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (26 commits)
    Thermal: Ignore invalid trip points
    thermal: trace: migrating thermal traces to use TRACE_DEFINE_ENUM() macros
    thermal: intel_pch_thermal: Enable Skylake PCH thermal
    thermal: doc: Add details of devm_thermal_zone_of_sensor_{register,unregister}
    thermal: of-thermal: Add devm version of thermal_zone_of_sensor_register
    thermal: doc: Add details of thermal_zone_of_sensor_{register,unregister}
    thermal: exynos: Defer probe if vtmu is present but not registered
    thermal: exynos: Use devm_regulator_get_optional() for vtmu
    thermal: exynos: List vtmu-supply as optional property in DT binding
    thermal: exynos: Print a message about exceeded number of supported trip-points
    thermal: exynos: Document number of supported trip-points
    thermal: exynos: Document compatible for Exynos5433 TMU
    thermal: mtk: allow compile testing on UM
    thermal: tegra_soctherm: fix sign bit of temperature
    thermal: Fix build error of missing devm_ioremap_resource on UM
    thermal: ti-soc-thermal: clean up the error handling a bit
    thermal: rcar: Use ARCH_RENESAS
    thermal: rcar_thermal: don't open code of_device_get_match_data()
    thermal: db8500_cpufreq_cooling: Compile with COMPILE_TEST
    thermal: rockchip: fix the tsadc sequence output on rk3228/rk3399
    ...

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "Various bugfixes, a RDMA update from Chuck Lever, and support for a
    new pnfs layout type from Christoph Hellwig. The new layout type is a
    variant of the block layout which uses SCSI features to offer improved
    fencing and device identification.

    (Also: note this pull request also includes the client side of SCSI
    layout, with Trond's permission.)"

    * tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
    sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
    nfsd: recover: fix memory leak
    nfsd: fix deadlock secinfo+readdir compound
    nfsd4: resfh unused in nfsd4_secinfo
    svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
    svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
    svcrdma: Remove close_out exit path
    svcrdma: Hook up the logic to return ERR_CHUNK
    svcrdma: Use correct XID in error replies
    svcrdma: Make RDMA_ERROR messages work
    rpcrdma: Add RPCRDMA_HDRLEN_ERR
    svcrdma: svc_rdma_post_recv() should close connection on error
    svcrdma: Close connection when a send error occurs
    nfsd: Lower NFSv4.1 callback message size limit
    svcrdma: Do not send Write chunk XDR pad with inline content
    svcrdma: Do not write xdr_buf::tail in a Write chunk
    svcrdma: Find client-provided write and reply chunks once per reply
    nfsd: Update NFS server comments related to RDMA support
    nfsd: Fix a memory leak when meeting unsupported state_protect_how4
    nfsd4: fix bad bounds checking

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "This tree contains various perf fixes on the kernel side, plus three
    hw/event-enablement late additions:

    - Intel Memory Bandwidth Monitoring events and handling
    - the AMD Accumulated Power Mechanism reporting facility
    - more IOMMU events

    ... and a final round of perf tooling updates/fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    perf llvm: Use strerror_r instead of the thread unsafe strerror one
    perf llvm: Use realpath to canonicalize paths
    perf tools: Unexport some methods unused outside strbuf.c
    perf probe: No need to use formatting strbuf method
    perf help: Use asprintf instead of adhoc equivalents
    perf tools: Remove unused perf_pathdup, xstrdup functions
    perf tools: Do not include stringify.h from the kernel sources
    tools include: Copy linux/stringify.h from the kernel
    tools lib traceevent: Remove redundant CPU output
    perf tools: Remove needless 'extern' from function prototypes
    perf tools: Simplify die() mechanism
    perf tools: Remove unused DIE_IF macro
    perf script: Remove lots of unused arguments
    perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
    perf machine: Rename perf_event__preprocess_sample to machine__resolve
    perf tools: Add cpumode to struct perf_sample
    perf tests: Forward the perf_sample in the dwarf unwind test
    perf tools: Remove misplaced __maybe_unused
    perf list: Fix documentation of :ppp
    perf bench numa: Fix assertion for nodes bitfield
    ...

    Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "Misc fixes:

    - fix hotplug bugs
    - fix irq live lock
    - fix various topology handling bugs
    - fix APIC ACK ordering
    - fix PV iopl handling
    - fix speling
    - fix/tweak memcpy_mcsafe() return value
    - fix fbcon bug
    - remove stray prototypes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/msr: Remove unused native_read_tscp()
    x86/apic: Remove declaration of unused hw_nmi_is_cpu_stuck
    x86/oprofile/nmi: Add missing hotplug FROZEN handling
    x86/hpet: Use proper mask to modify hotplug action
    x86/apic/uv: Fix the hotplug notifier
    x86/apb/timer: Use proper mask to modify hotplug action
    x86/topology: Use total_cpus not nr_cpu_ids for logical packages
    x86/topology: Fix Intel HT disable
    x86/topology: Fix logical package mapping
    x86/irq: Cure live lock in fixup_irqs()
    x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known()
    x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
    x86/iopl: Fix iopl capability check on Xen PV
    x86/iopl/64: Properly context-switch IOPL on Xen PV
    selftests/x86: Add an iopl test
    x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
    x86/video: Don't assume all FB devices are PCI devices
    arch/x86/irq: Purge useless handler declarations from hw_irq.h
    x86: Fix misspellings in comments

    Linus Torvalds
     
  • Pull locking fixes from Ingo Molnar:
    "Documentation updates and a bitops ordering fix"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    bitops: Do not default to __clear_bit() for __clear_bit_unlock()
    documentation: Clarify compiler store-fusion example
    documentation: Transitivity is not cumulativity
    documentation: Add alternative release-acquire outcome
    documentation: Distinguish between local and global transitivity
    documentation: Subsequent writes ordered by rcu_dereference()
    documentation: Remove obsolete reference to RCU-protected indexes
    documentation: Fix memory-barriers.txt section references
    documentation: Fix control dependency and identical stores

    Linus Torvalds