30 Sep, 2016

8 commits

  • Small cleanup; nothing uses the @cpu argument so make it go away.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The partial initialization of wait_queue_t in prepare_to_wait_event() looks
    ugly. This was done to shrink .text, but we can simply add the new helper
    which does the full initialization and shrink the compiled code a bit more.

    Also, this way prepare_to_wait_event() can have more users. In particular, we
    are ready to remove the signal_pending_state() checks from wait_bit_action_f
    helpers and change __wait_on_bit_lock() to use prepare_to_wait_event().

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Bart Van Assche
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160906140055.GA6167@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • __wait_on_bit_lock() doesn't need abort_exclusive_wait() either. Right
    now it can't use prepare_to_wait_event() (see the next change), but
    it can do the additional finish_wait() if action() fails.

    abort_exclusive_wait() no longer has callers, remove it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Bart Van Assche
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160906140053.GA6164@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • ___wait_event() doesn't really need abort_exclusive_wait(), we can simply
    change prepare_to_wait_event() to remove the waiter from q->task_list if
    it was interrupted.

    This simplifies the code/logic, and this way prepare_to_wait_event() can
    have more users, see the next change.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Bart Van Assche
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160908164815.GA18801@redhat.com
    Signed-off-by: Ingo Molnar
    --
    include/linux/wait.h | 7 +------
    kernel/sched/wait.c | 35 +++++++++++++++++++++++++----------
    2 files changed, 26 insertions(+), 16 deletions(-)

    Oleg Nesterov
     
  • Otherwise this logic only works if mode is "compatible" with another
    exclusive waiter.

    If some wq has both TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE waiters,
    abort_exclusive_wait() won't wake an uninterruptible waiter.

    The main user is __wait_on_bit_lock() and currently it is fine but only
    because TASK_KILLABLE includes TASK_UNINTERRUPTIBLE and we do not have
    lock_page_interruptible() yet.

    Just use TASK_NORMAL and remove the "mode" arg from abort_exclusive_wait().
    Yes, this means that (say) wake_up_interruptible() can wake up the non-
    interruptible waiter(s), but I think this is fine. And in fact I think
    that abort_exclusive_wait() must die, see the next change.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Bart Van Assche
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160906140047.GA6157@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Since commit:

    2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")

    we now have two different fixed-point units for load:

    - 'shares' in calc_cfs_shares() has a 20-bit fixed-point unit on 64-bit
    kernels. Therefore use scale_load() on MIN_SHARES.

    - 'wl' in effective_load() has a 10-bit fixed-point unit. Therefore use
    scale_load_down() on tg->shares, which has a 20-bit fixed-point unit on
    64-bit kernels.

    Signed-off-by: Dietmar Eggemann
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1471874441-24701-1-git-send-email-dietmar.eggemann@arm.com
    Signed-off-by: Ingo Molnar

    Dietmar Eggemann
     
  • Current code can call set_cpu_sibling_map() and invoke sched_set_topology()
    more than once (e.g. on CPU hot plug). When this happens after
    sched_init_smp() has been called, we lose the NUMA topology extension to
    sched_domain_topology in sched_init_numa(). This results in incorrect
    topology when the sched domain is rebuilt.

    This patch fixes the bug and issues a warning if we call sched_set_topology()
    after sched_init_smp().

    Signed-off-by: Tim Chen
    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: bp@suse.de
    Cc: jolsa@redhat.com
    Cc: rjw@rjwysocki.net
    Link: http://lkml.kernel.org/r/1474485552-141429-2-git-send-email-srinivas.pandruvada@linux.intel.com
    Signed-off-by: Ingo Molnar

    Tim Chen
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
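
The TASK_NORMAL change to abort_exclusive_wait() above turns on how a wakeup mode is matched against a sleeper's state: a waiter is eligible only if its state shares a bit with the mode. The sketch below models that test in plain C. The bit values are modelled on the TASK_* definitions in include/linux/sched.h of the v4.8 era but should be read as illustrative, and wakeup_matches() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Task state bits, modelled on include/linux/sched.h (v4.8 era). */
#define TASK_INTERRUPTIBLE   0x0001u
#define TASK_UNINTERRUPTIBLE 0x0002u
#define TASK_WAKEKILL        0x0080u

#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_NORMAL   (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/*
 * Hypothetical helper: a sleeper in @state can be woken by a wakeup
 * issued with @mode only if the two share at least one bit.
 */
static bool wakeup_matches(unsigned int state, unsigned int mode)
{
	return (state & mode) != 0;
}
```

With mode TASK_INTERRUPTIBLE an uninterruptible sleeper is skipped, while TASK_NORMAL matches both kinds. TASK_KILLABLE matches TASK_UNINTERRUPTIBLE sleepers only because it contains that bit, which is the coincidence the commit says kept __wait_on_bit_lock() safe.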
     
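
The two load units in the calc_cfs_shares()/effective_load() commit above can be checked with a little arithmetic. This sketch assumes a 64-bit kernel, where SCHED_FIXEDPOINT_SHIFT is 10 and scale_load()/scale_load_down() shift by that amount, and takes MIN_SHARES as 2. clamp_shares() is a hypothetical stand-in for the lower-bound check in calc_cfs_shares(), not the kernel function itself.

```c
#include <assert.h>

/* Assumed 64-bit configuration: 10 extra bits of load resolution. */
#define SCHED_FIXEDPOINT_SHIFT 10
#define scale_load(w)      ((w) << SCHED_FIXEDPOINT_SHIFT)
#define scale_load_down(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)

#define MIN_SHARES 2UL  /* expressed in the 10-bit (user-visible) unit */

/*
 * Hypothetical helper: @shares is in the 20-bit unit, so the lower
 * bound must be scaled up before the comparison.
 */
static unsigned long clamp_shares(unsigned long shares)
{
	if (shares < scale_load(MIN_SHARES))
		return scale_load(MIN_SHARES);
	return shares;
}
```

Here scale_load(MIN_SHARES) is 2048, so a 20-bit shares value of 100 is raised to 2048; comparing against the unscaled MIN_SHARES (2) would have left it at 100. In the other direction, scale_load_down(scale_load(1024UL)) returns 1024, the kind of conversion effective_load() needs before combining tg->shares with its 10-bit 'wl'.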

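The ___wait_event() commit above has prepare_to_wait_event() take the waiter off the queue itself when a signal interrupts the wait, so the caller no longer needs abort_exclusive_wait(). The user-space model below captures only that control flow. The list, the signal_pending flag and the error value are simplified stand-ins (the kernel uses list_head, signal_pending_state() and -ERESTARTSYS under q->lock), and every function name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ERESTARTSYS 512  /* kernel-internal errno value; illustrative here */

/* Minimal circular doubly linked list in the style of list_head. */
struct node { struct node *prev, *next; };

static void node_init(struct node *n) { n->prev = n->next = n; }
static bool node_unlinked(const struct node *n) { return n->next == n; }

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink @n and reinitialise it; harmless if @n was never queued. */
static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/*
 * Hypothetical model of prepare_to_wait_event(): with a signal pending
 * the waiter removes itself and reports failure; otherwise it queues
 * itself once and the caller goes on to sleep.
 */
static long model_prepare_to_wait(struct node *queue, struct node *wait,
				  bool signal_pending)
{
	if (signal_pending) {
		node_del_init(wait);
		return -ERESTARTSYS;
	}
	if (node_unlinked(wait))
		node_add_tail(wait, queue);
	return 0;
}
```

In this shape the wait loop only has to propagate the error value; removing the waiter has already happened inside the prepare step.
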
29 Sep, 2016

7 commits

  • Merge fixes from Andrew Morton:
    "4 fixes"

    * emailed patches from Andrew Morton:
    mem-hotplug: use nodes that contain memory as mask in new_node_page()
    scripts/recordmcount.c: account for .softirqentry.text
    dma-mapping.h: preserve unmap info for CONFIG_DMA_API_DEBUG
    mm,ksm: fix endless looping in allocating memory when ksm enable

    Linus Torvalds
     
  • 9bb627be47a5 ("mem-hotplug: don't clear the only node in new_node_page()")
    prevents allocating from an empty nodemask, but as David points out, it is
    still wrong. As node_online_map may include memoryless nodes, only
    allocating from these nodes is meaningless.

    This patch uses node_states[N_MEMORY] mask to prevent the above case.

    Fixes: 9bb627be47a5 ("mem-hotplug: don't clear the only node in new_node_page()")
    Fixes: 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest neighbor node when mem-offline")
    Link: http://lkml.kernel.org/r/1474447117.28370.6.camel@TP420
    Signed-off-by: Li Zhong
    Suggested-by: David Rientjes
    Acked-by: Vlastimil Babka
    Cc: Michal Hocko
    Cc: John Allen
    Cc: Xishi Qiu
    Cc: Joonsoo Kim
    Cc: Naoya Horiguchi
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zhong
     
  • be7635e7287e ("arch, ftrace: for KASAN put hard/soft IRQ entries into
    separate sections") added .softirqentry.text section, but it was not added
    to recordmcount. So functions in the section are untraceable. Add the
    section to scripts/recordmcount.c and scripts/recordmcount.pl.

    Fixes: be7635e7287e ("arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections")
    Link: http://lkml.kernel.org/r/1474902626-73468-1-git-send-email-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Acked-by: Steve Rostedt
    Cc: [4.6+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • When CONFIG_DMA_API_DEBUG is enabled we need to preserve the unmapping address
    even if "unmap" is a no-op for our architecture, because we need
    debug_dma_unmap_page() to correctly clean up all of the debug bookkeeping.
    Failing to do so results in false positive warnings about previously
    mapped areas never being unmapped.

    Link: http://lkml.kernel.org/r/1474387125-3713-1-git-send-email-andrew.smirnov@gmail.com
    Signed-off-by: Andrey Smirnov
    Reviewed-by: Robin Murphy
    Cc: Joerg Roedel
    Cc: Will Deacon
    Cc: Zhen Lei
    Cc: "Luis R. Rodriguez"
    Cc: Christian Borntraeger
    Cc: Geliang Tang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Smirnov
     
  • I hit the following hung task when running an OOM LTP test case with a 4.1
    kernel.

    Call trace:
    __switch_to+0x74/0x8c
    __schedule+0x23c/0x7bc
    schedule+0x3c/0x94
    rwsem_down_write_failed+0x214/0x350
    down_write+0x64/0x80
    __ksm_exit+0x90/0x19c
    mmput+0x118/0x11c
    do_exit+0x2dc/0xa74
    do_group_exit+0x4c/0xe4
    get_signal+0x444/0x5e0
    do_signal+0x1d8/0x450
    do_notify_resume+0x70/0x78

    The oom victim cannot terminate because it needs to take mmap_sem for
    write while the lock is held for read by ksmd, which loops in the page
    allocator:

    ksm_do_scan
    scan_get_next_rmap_item
    down_read
    get_next_rmap_item
    alloc_rmap_item #ksmd will loop permanently.

    There is no way forward because the oom victim cannot release any memory
    in a 4.1-based kernel. Since 4.6 we have the oom reaper, which would solve
    this problem because it would release the memory asynchronously.
    Nevertheless we can relax alloc_rmap_item requirements and use
    __GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
    would just retry later after the lock got dropped.

    Such a patch would also be easy to backport to older stable kernels which
    do not have oom_reaper.

    While we are at it, add __GFP_NOWARN so the admin doesn't have to be
    alarmed by the allocation failure.

    Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang
    Suggested-by: Hugh Dickins
    Suggested-by: Michal Hocko
    Acked-by: Michal Hocko
    Acked-by: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
     
  • Pull late MTD fixes from Brian Norris:
    "Another round of MTD fixes for v4.8

    My apologies for sending this so late. I've been fairly absent as a
    maintainer this cycle, but I did queue these up weeks ago. In the
    meantime, Richard was able to handle some other fixes (thanks!) but
    didn't pick these up.

    On the bright side, these are very simple changes that should carry
    little risk.

    Summary:

    - Davinci NAND: fix a long-standing bug in how we clear/prep 4-bit ECC

    - OMAP NAND: an error-handling fix that made it into v4.8-rc1 caused
    problems in the error-handling paths of other configurations; this
    fixes the fix"

    * tag 'for-linus-20160928' of git://git.infradead.org/linux-mtd:
    mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl
    mtd: nand: omap2: Don't call dma_release_channel() if dma_request_chan() failed

    Linus Torvalds
     
  • I will be starting employment at Versity next week and would like to update
    my MAINTAINERS e-mail to reflect that change. My Versity e-mail is already
    activated so I shouldn't get any bounces on the new one. My ability to help
    with Ocfs2 kernel maintenance won't change as a result of the new job.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Linus Torvalds

    Mark Fasheh
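
The new_node_page() fix above builds the candidate mask from nodes that actually have memory, drops the node being offlined, and falls back to that node only when nothing else is left. A plain-bitmask sketch of that selection follows; the names are hypothetical (the kernel works with nodemask_t, node_states[N_MEMORY], node_clear() and nodes_empty()).

```c
#include <assert.h>

typedef unsigned long node_bits;  /* bit n set => node n is in the mask */

/*
 * Hypothetical helper: allowed target nodes when migrating pages off
 * @nid.  @memory_nodes plays the role of node_states[N_MEMORY], so
 * memoryless nodes can never be chosen.
 */
static node_bits target_nodes(node_bits memory_nodes, int nid)
{
	node_bits mask = memory_nodes & ~(1UL << nid);

	if (mask == 0)  /* no other node has memory: keep @nid itself */
		mask = 1UL << nid;
	return mask;
}
```

For example, with nodes 0, 1 and 2 online but node 1 memoryless, the memory mask is 0x5 and offlining node 0 leaves only node 2 (0x4); starting from the online mask (0x7) would also have offered node 1, which is the case the commit calls meaningless.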
     
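
The recordmcount fix above comes down to a list of section names whose functions get their mcount call sites recorded: a section missing from the list leaves its functions untraceable. A minimal sketch of that membership test, assuming a simple string table; the real lists in scripts/recordmcount.c and scripts/recordmcount.pl contain more entries than shown, and is_traced_section() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Abbreviated, illustrative list of text sections that are scanned. */
static const char *const traced_sections[] = {
	".text",
	".irqentry.text",
	".softirqentry.text",  /* the kind of entry the fix adds */
};

/* Hypothetical helper: is @name one of the scanned text sections? */
static bool is_traced_section(const char *name)
{
	size_t n = sizeof(traced_sections) / sizeof(traced_sections[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(name, traced_sections[i]) == 0)
			return true;
	return false;
}
```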

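The CONFIG_DMA_API_DEBUG fix above is about conditional storage: the unmap bookkeeping fields normally exist only when the architecture needs them, and the fix keeps them whenever the debug code needs them too. The sketch models that with two local switches. The macro shapes echo the DEFINE_DMA_UNMAP_ADDR()/dma_unmap_addr_set() family in include/linux/dma-mapping.h, but the NEED_UNMAP_STATE/API_DEBUG switches and struct rx_buf are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switches standing in for CONFIG_NEED_DMA_MAP_STATE and
 * CONFIG_DMA_API_DEBUG: the arch needs no state, but debugging is on. */
#define NEED_UNMAP_STATE 0
#define API_DEBUG        1

#if NEED_UNMAP_STATE || API_DEBUG
/* Keep the address so the unmap path (and its debug hook) can see it. */
#define DEFINE_UNMAP_ADDR(name)        uint64_t name
#define unmap_addr(ptr, name)          ((ptr)->name)
#define unmap_addr_set(ptr, name, val) (((ptr)->name) = (val))
#else
/* No-op unmap: the field disappears and reads back as 0. */
#define DEFINE_UNMAP_ADDR(name)
#define unmap_addr(ptr, name)          (0)
#define unmap_addr_set(ptr, name, val) do { } while (0)
#endif

struct rx_buf {  /* hypothetical driver structure */
	void *data;
	DEFINE_UNMAP_ADDR(mapping);
};
```

With API_DEBUG set to 0, unmap_addr() collapses to 0, which mirrors the situation the commit describes: the debug code is handed an address that does not match the original mapping, so that mapping looks as if it was never unmapped.
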
28 Sep, 2016

1 commit

  • Pull cgroup fixes from Tejun Heo:
    "Three late fixes for cgroup: Two cpuset ones, one trivial and the
    other pretty obscure, and a cgroup core fix for a bug which impacts
    cgroup v2 namespace users"

    * 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: fix invalid controller enable rejections with cgroup namespace
    cpuset: fix non static symbol warning
    cpuset: handle race between CPU hotplug and cpuset_hotplug_work

    Linus Torvalds
     

26 Sep, 2016

9 commits

  • Linus Torvalds
     
  • Pull tracefs fixes from Steven Rostedt:
    "Al Viro has been looking at the tracefs code, and has pointed out some
    issues. This contains one fix by me and one by Al. I'm sure that
    he'll come up with more but for now I tested these patches and they
    don't appear to have any negative impact on tracing"

    * tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    fix memory leaks in tracing_buffers_splice_read()
    tracing: Move mutex to protect against resetting of seq data

    Linus Torvalds
     
  • When building XFS with -Werror, it now fails with:

    include/linux/pagemap.h: In function 'fault_in_multipages_readable':
    include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable]
    volatile char c;
    ^

    This is a regression caused by commit e23d4159b109 ("fix
    fault_in_multipages_...() on architectures with no-op access_ok()").
    Fix it by re-adding the "(void)c" trick that was previously used to make
    the compiler think the variable is used.

    Signed-off-by: Dave Chinner
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Dave Chinner
     
  • The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
    defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
    PMDs respectively as requiring balancing upon a subsequent page fault.
    User-defined PROT_NONE memory regions which also have this flag set will
    not normally invoke the NUMA balancing code as do_page_fault() will send
    a segfault to the process before handle_mm_fault() is even called.

    However if access_remote_vm() is invoked to access a PROT_NONE region of
    memory, handle_mm_fault() is called via faultin_page() and
    __get_user_pages() without any access checks being performed, meaning
    the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
    region.

    A simple means of triggering this problem is to access PROT_NONE mmap'd
    memory using /proc/self/mem which reliably results in the NUMA handling
    functions being invoked when CONFIG_NUMA_BALANCING is set.

    This issue was reported in bugzilla (issue 99101) which includes some
    simple repro code.

    There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page()
    added at commit c0e7cad to avoid accidentally provoking strange
    behaviour by attempting to apply NUMA balancing to pages that are in
    fact PROT_NONE. The BUG_ON()s are consistently triggered by the repro.

    This patch moves the PROT_NONE check into mm/memory.c rather than
    invoking BUG_ON() as faulting in these pages via faultin_page() is a
    valid reason for reaching the NUMA check with the PROT_NONE page table
    flag set and is therefore not always a bug.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
    Reported-by: Trevor Saunders
    Signed-off-by: Lorenzo Stoakes
    Acked-by: Rik van Riel
    Cc: Andrew Morton
    Cc: Mel Gorman
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • Pull MIPS fixes from Ralf Baechle:
    "A round of 4.8 fixes:

    MIPS generic code:
    - Add a missing ".set pop" in an early commit
    - Fix memory regions reaching top of physical
    - MAAR: Fix address alignment
    - vDSO: Fix Malta EVA mapping to vDSO page structs
    - uprobes: fix incorrect uprobe brk handling
    - uprobes: select HAVE_REGS_AND_STACK_ACCESS_API
    - Avoid a BUG warning during PR_SET_FP_MODE prctl
    - SMP: Fix possibility of deadlock when bringing CPUs online
    - R6: Remove compact branch policy Kconfig entries
    - Fix size calc when avoiding IPIs for small icache flushes
    - Fix pre-r6 emulation FPU initialisation
    - Fix delay slot emulation count in debugfs

    ATH79:
    - Fix test for error return of clk_register_fixed_factor.

    Octeon:
    - Fix kernel header to work for VDSO build.
    - Fix initialization of platform device probing.

    paravirt:
    - Fix undefined reference to smp_bootstrap"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
    MIPS: Fix delay slot emulation count in debugfs
    MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
    MIPS: Fix pre-r6 emulation FPU initialisation
    MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
    MIPS: Select HAVE_REGS_AND_STACK_ACCESS_API
    MIPS: Octeon: Fix platform bus probing
    MIPS: Octeon: mangle-port: fix build failure with VDSO code
    MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)
    MIPS: c-r4k: Fix size calc when avoiding IPIs for small icache flushes
    MIPS: Add a missing ".set pop" in an early commit
    MIPS: paravirt: Fix undefined reference to smp_bootstrap
    MIPS: Remove compact branch policy Kconfig entries
    MIPS: MAAR: Fix address alignment
    MIPS: Fix memory regions reaching top of physical
    MIPS: uprobes: fix incorrect uprobe brk handling
    MIPS: ath79: Fix test for error return of clk_register_fixed_factor().

    Linus Torvalds
     
  • Pull one more powerpc fix from Michael Ellerman:
    "powernv/pci: Fix m64 checks for SR-IOV and window alignment from
    Russell Currey"

    * tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment

    Linus Torvalds
     
  • The fixes to the radix tree test suite show that the multi-order case is
    broken. The basic reason is that the radix tree code uses tagged
    pointers with the "internal" bit in the low bits, and calculating the
    pointer indices was supposed to mask off those bits. But gcc will
    notice that we then use the index to re-create the pointer, and will
    avoid doing the arithmetic and use the tagged pointer directly.

    This cleans the code up, using the existing is_sibling_entry() helper to
    validate the sibling pointer range (instead of open-coding it), and
    using entry_to_node() to mask off the low tag bit from the pointer. And
    once you do that, you might as well just use the now cleaned-up pointer
    directly.

    [ Side note: the multi-order code isn't actually ever used in the kernel
    right now, and the only reason I didn't just delete all that code is
    that Kirill Shutemov piped up and said:

    "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
    It also converts shmem-with-huge-pages and hugetlb to them.

    I'm okay with converting it to other mechanism, but I need
    something. (I looked into Konstantin's RFC patchset[2]. It looks
    okay, but I don't feel myself qualified to review it as I don't
    know much about radix-tree internals.)"

    [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
    [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ]

    Reported-by: Matthew Wilcox
    Cc: Andrew Morton
    Cc: Ross Zwisler
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Cedric Blancher
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • When we replace a multiorder entry, check that all indices reflect the
    new value.

    Also, compile the test suite with -O2, which shows other problems with
    the code due to some dodgy pointer operations in the radix tree code.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
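
The -Werror breakage in the pagemap.h commit above is a general C idiom problem: a variable written only to force a memory access looks "set but not used" to GCC. A user-space sketch of the pattern, assuming a plain buffer in place of the user pointer that fault_in_multipages_readable() probes with __get_user(); touch_pages() and SKETCH_PAGE_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096  /* assumed page size for the example */

/*
 * Hypothetical helper: read one byte per page of @buf so every page is
 * touched.  Storing into a volatile local keeps the reads from being
 * optimised away.
 */
static size_t touch_pages(const char *buf, size_t len)
{
	volatile char c = 0;
	size_t touched = 0;

	for (size_t off = 0; off < len; off += SKETCH_PAGE_SIZE) {
		c = buf[off];
		touched++;
	}
	(void)c;  /* the trick: mark c as used for -Wunused-but-set-variable */
	return touched;
}
```

Whether a given GCC version warns about a volatile local without the cast can vary, so treat this as an illustration of the idiom rather than a reproducer of the build failure.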
     
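
The NUMA-balancing commit above replaces the BUG_ON()s with a check in mm/memory.c, so that a PROT_NONE page table entry is treated as a NUMA hint only in a mapping that is otherwise accessible. A minimal model of that decision, assuming the VMA permission bits VM_READ, VM_WRITE and VM_EXEC (values as in include/linux/mm.h); both helpers are hypothetical, and the exact form of the kernel-side check is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/* VMA permission bits (same values as include/linux/mm.h). */
#define VM_READ  0x00000001UL
#define VM_WRITE 0x00000002UL
#define VM_EXEC  0x00000004UL

/* Hypothetical: does the mapping grant any access at all? */
static bool vma_accessible(unsigned long vm_flags)
{
	return (vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) != 0;
}

/*
 * Hypothetical fault-path decision: PROT_NONE in the page table means
 * "NUMA hinting fault" only inside an accessible VMA.  A user PROT_NONE
 * mapping reached via /proc/self/mem must not be handed to NUMA code.
 */
static bool is_numa_hint_fault(bool pte_protnone, unsigned long vm_flags)
{
	return pte_protnone && vma_accessible(vm_flags);
}
```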

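The radix-tree commit above relies on pointer tagging: internal node pointers carry a flag in the low bit, and entry_to_node() masks it off before the pointer is used. A user-space sketch of the tag/untag pair; the flag value of 1 is modelled on RADIX_TREE_INTERNAL_NODE, and struct tnode and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTERNAL_NODE_FLAG 1UL  /* low bit marks an internal node pointer */

struct tnode { int slot_count; };  /* hypothetical node type */

/* Tag a node pointer; relies on nodes being at least 2-byte aligned. */
static void *node_to_entry(struct tnode *node)
{
	return (void *)((uintptr_t)node | INTERNAL_NODE_FLAG);
}

static bool is_internal(const void *entry)
{
	return ((uintptr_t)entry & INTERNAL_NODE_FLAG) != 0;
}

/* Strip the tag before dereferencing, in the spirit of entry_to_node(). */
static struct tnode *entry_to_tnode(void *entry)
{
	return (struct tnode *)((uintptr_t)entry & ~INTERNAL_NODE_FLAG);
}
```

Masking once and then using the cleaned pointer directly, rather than deriving an index from the tagged value and rebuilding a pointer from that index, leaves the compiler no arithmetic to fold away, which is the failure mode the commit describes.
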
25 Sep, 2016

11 commits

  • The iter->seq can be reset outside the protection of the mutex. So can
    reading of user data. Move the mutex up to the beginning of the function.

    Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants")
    Cc: stable@vger.kernel.org # 2.6.30+
    Reported-by: Al Viro
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Commit 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot
    instructions") accidentally removed use of the MIPS_FPU_EMU_INC_STATS
    macro from do_dsemulret, leading to the ds_emul file in debugfs always
    returning zero even though we perform delay slot emulations.

    Fix this by re-adding the use of the MIPS_FPU_EMU_INC_STATS macro.

    Signed-off-by: Paul Burton
    Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions")
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/14301/
    Signed-off-by: Ralf Baechle

    Paul Burton
     
  • This patch fixes the possibility of a deadlock when bringing up
    secondary CPUs.
    The deadlock occurs because set_cpu_online() is called before
    synchronise_count_slave(). This can cause a deadlock if the boot CPU,
    having scheduled another thread, attempts to send an IPI to the
    secondary CPU, which it sees has been marked online. The secondary is
    blocked in synchronise_count_slave() waiting for the boot CPU to enter
    synchronise_count_master(), but the boot cpu is blocked in
    smp_call_function_many() waiting for the secondary to respond to its
    IPI request.

    Fix this by marking the CPU online in cpu_callin_map and synchronising
    counters before declaring the CPU online and calculating the maps for
    IPIs.

    Signed-off-by: Matt Redfearn
    Reported-by: Justin Chen
    Tested-by: Justin Chen
    Cc: Florian Fainelli
    Cc: stable@vger.kernel.org # v4.1+
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/14302/
    Signed-off-by: Ralf Baechle

    Matt Redfearn
     
  • Pull perf fixes from Thomas Gleixner:
    "Three fixlets for perf:

    - add a missing NULL pointer check in the intel BTS driver

    - make BTS an exclusive PMU because BTS can only handle one event at
    a time

    - ensure that exclusive events are limited to one PMU so that several
    exclusive events can be scheduled on different PMU instances"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/core: Limit matching exclusive events to one PMU
    perf/x86/intel/bts: Make it an exclusive PMU
    perf/x86/intel/bts: Make sure debug store is valid

    Linus Torvalds
     
  • Pull locking fixes from Thomas Gleixner:
    "Two smallish fixes:

    - use the proper asm constraint in the Super-H atomic_fetch_ops

    - a trivial typo fix in the Kconfig help text"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
    locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()

    Linus Torvalds
     
  • Pull EFI fixes from Thomas Gleixner:
    "Two fixes for EFI/PAT:

    - a 32-bit overflow bug in the PAT code which was unearthed by the
    large EFI mappings

    - prevent a boot hang on large systems when EFI mixed mode is enabled
    but not used"

    * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/efi: Only map RAM into EFI page tables if in mixed-mode
    x86/mm/pat: Prevent hang during boot when mapping pages

    Linus Torvalds
     
  • Pull irq fixes from Thomas Gleixner:
    "Three fixes for irq core and irq chip drivers:

    - Do not set the irq type if type is NONE. Fixes a boot regression
    on various SoCs

    - Use the proper cpu for setting up the GIC target list. Discovered
    by the cpumask debugging code.

    - A rather large fix for the MIPS-GIC so per cpu local interrupts
    work again. This was discovered late because the code falls back
    to slower timers which use normal device interrupts"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/mips-gic: Fix local interrupts
    irqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning
    genirq: Skip chained interrupt trigger setup if type is IRQ_TYPE_NONE

    Linus Torvalds
     
  • Merge VM fixes from Hugh Dickins:
    "I get the impression that Andrew is away or busy at the moment, so I'm
    going to send you three independent uncontroversial little mm fixes
    directly - though none is strictly a 4.8 regression fix.

    - shmem: fix tmpfs to handle the huge= option properly from Toshi
    Kani is a one-liner to fix a major embarrassment in 4.8's hugepages
    on tmpfs feature: although Hillf pointed it out in June, somehow
    both Kirill and I repeatedly dropped the ball on this one. You
    might wonder if the feature got tested at all with that bug in:
    yes, it did, but for wider testing coverage, Kirill and I had each
    relied too much on an override which bypasses that condition.

    - huge tmpfs: fix Committed_AS leak is just a run-of-the-mill accounting
    fix in the same feature.

    - mm: delete unnecessary and unsafe init_tlb_ubc() is an unrelated
    fix to 4.3's TLB flush batching in reclaim: the bug would be rare,
    and none of us will be shamed if this one misses 4.8; but it got
    such a quick ack from Mel today that I'm inclined to offer it along
    with the first two"

    * emailed patches from Hugh Dickins:
    mm: delete unnecessary and unsafe init_tlb_ubc()
    huge tmpfs: fix Committed_AS leak
    shmem: fix tmpfs to handle the huge= option properly

    Linus Torvalds
     
  • init_tlb_ubc() looked unnecessary to me: tlb_ubc is statically
    initialized with zeroes in the init_task, and copied from parent to
    child while it is quiescent in arch_dup_task_struct(); so I went to
    delete it.

    But I inserted temporary debug WARN_ONs in place of init_tlb_ubc() to
    check that it was always empty at that point, and found them firing:
    because memcg reclaim can recurse into global reclaim (when allocating
    biosets for swapout in my case), and arrive back at the init_tlb_ubc()
    in shrink_node_memcg().

    Resetting tlb_ubc.flush_required at that point is wrong: if the upper
    level needs a deferred TLB flush, but the lower level turns out not to,
    we miss a TLB flush. But fortunately, that's the only part of the
    protocol that does not nest: with the initialization removed, cpumask
    collects bits from upper and lower levels, and flushes TLB when needed.

    Fixes: 72b252aed506 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: stable@vger.kernel.org # 4.3+
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Under swapping load on huge tmpfs, /proc/meminfo's Committed_AS grows
    bigger and bigger: just a cosmetic issue for most users, but disabling
    for those who run without overcommit (/proc/sys/vm/overcommit_memory 2).

    shmem_uncharge() was forgetting to unaccount __vm_enough_memory's
    charge, and shmem_charge() was forgetting it on the filesystem-full
    error path.

    Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • shmem_get_unmapped_area() checks SHMEM_SB(sb)->huge incorrectly, which
    leads to a reversed effect of "huge=" mount option.

    Fix the check in shmem_get_unmapped_area().

    Note, the default value of SHMEM_SB(sb)->huge remains as
    SHMEM_HUGE_NEVER. Users will need to specify the "huge=" option to enable
    huge page mappings.

    Reported-by: Hillf Danton
    Signed-off-by: Toshi Kani
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Toshi Kani
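
The init_tlb_ubc() removal above is about state that must survive nested reclaim: a nested level may add CPUs to flush, but must never clear a flush the outer level still owes. A simplified model follows, with a hypothetical struct and helpers standing in for the per-task tlb_ubc batch and a plain bitmask in place of the cpumask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task deferred-flush batch. */
struct flush_batch {
	unsigned long cpumask;  /* CPUs that may still cache unmapped entries */
	bool flush_required;    /* a deferred flush is owed */
};

/* Unmap a page that was mapped on @cpu, deferring the TLB flush. */
static void defer_flush(struct flush_batch *b, int cpu)
{
	b->cpumask |= 1UL << cpu;
	b->flush_required = true;
}

/* The removed pattern: a reclaim level resetting the flag on entry. */
static void buggy_level_init(struct flush_batch *b)
{
	b->flush_required = false;  /* cpumask is not cleared here */
}

/* Flush if required; returns the CPUs flushed and empties the batch. */
static unsigned long try_flush(struct flush_batch *b)
{
	unsigned long flushed = 0;

	if (b->flush_required) {
		flushed = b->cpumask;
		b->cpumask = 0;
		b->flush_required = false;
	}
	return flushed;
}
```

If the outer level defers a flush for CPU 1 and the nested level re-initialises without unmapping anything itself, try_flush() returns 0 and the flush is lost; without the reset, the nested level's CPUs simply accumulate with the outer ones.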
     
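
The Committed_AS leak above is an accounting-symmetry bug: every charge against the overcommit counter needs a matching unaccount, both when the pages are later uncharged and on the error path right after the charge. A minimal model with hypothetical names, where a global counter stands in for the accounting done by __vm_enough_memory() and undone by vm_unacct_memory():

```c
#include <assert.h>
#include <stdbool.h>

static long committed_pages;  /* hypothetical stand-in for Committed_AS */

static void acct(long pages)   { committed_pages += pages; }
static void unacct(long pages) { committed_pages -= pages; }

/* Hypothetical charge path: account first, then check filesystem space. */
static bool model_shmem_charge(long pages, bool fs_full)
{
	acct(pages);
	if (fs_full) {
		unacct(pages);  /* undo the charge on the error path */
		return false;
	}
	return true;
}

/* Hypothetical uncharge path: must give back what was charged. */
static void model_shmem_uncharge(long pages)
{
	unacct(pages);
}
```

Dropping either unacct() call makes committed_pages drift upward under repeated charge/uncharge cycles, which is the growth the commit reports in /proc/meminfo.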

24 Sep, 2016

4 commits

  • Pull i2c fixes from Wolfram Sang:
    "Three driver bugfixes: fixing uninitialized memory pointers (eg20t),
    pm/clock imbalance (qup), and a wrongly set cached variable (pca954x)"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
    i2c: mux: pca954x: retry updating the mux selection on failure
    i2c-eg20t: fix race between i2c init and interrupt enable

    Linus Torvalds
     
  • Pull input updates from Dmitry Torokhov:
    "Just a fix up for the firmware handling to the Silead driver (which is
    a new driver in this release)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: silead_gsl1680 - use "silead/" prefix for firmware loading
    Input: silead_gsl1680 - document firmware-name, fix implementation

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Three fixes, two regressions and one that poses a problem in blk-mq
    with the new nvmef code"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx
    nvme-rdma: only clear queue flags after successful connect
    blk-throttle: Extend slice if throttle group is not empty

    Linus Torvalds
     
  • On the v2 hierarchy, "cgroup.subtree_control" rejects controller
    enables if the cgroup has processes in it. The enforcement of this
    logic assumes that the cgroup wouldn't have any css_sets associated
    with it if there are no tasks in the cgroup, which is no longer true
    since a79a908fd2b0 ("cgroup: introduce cgroup namespaces").

    When a cgroup namespace is created, it pins the css_set of the
    creating task to use it as the root css_set of the namespace. This
    extra reference stays as long as the namespace is around and makes
    "cgroup.subtree_control" think that the namespace root cgroup is not
    empty even when it is and thus reject controller enables.

    Fix it by making cgroup_subtree_control() walk and test emptiness of
    each css_set instead of testing whether the list_head is empty.

    While at it, update the comment of cgroup_task_count() to indicate
    that the returned value may be higher than the number of tasks, which
    has always been true due to temporary references and doesn't break
    anything.

    Signed-off-by: Tejun Heo
    Reported-by: Evgeny Vereshchagin
    Cc: Serge E. Hallyn
    Cc: Aditya Kali
    Cc: Eric W. Biederman
    Cc: stable@vger.kernel.org # v4.6+
    Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
    Link: https://github.com/systemd/systemd/pull/3589#issuecomment-249089541

    Tejun Heo
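
The cgroup namespace fix above replaces "is the css_set list empty?" with "does any linked css_set still have tasks?". A reduced model follows, using a fixed array instead of the kernel's linked lists and hypothetical names throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced css_set: only the task count matters. */
struct model_css_set { int nr_tasks; };

struct model_cgroup {
	const struct model_css_set *csets[4];  /* css_sets linked to the cgroup */
	size_t nr_csets;
};

/* Old test: any linked css_set counts as "has processes". */
static bool has_tasks_old(const struct model_cgroup *cgrp)
{
	return cgrp->nr_csets != 0;
}

/* New test: walk the css_sets and look at each one's tasks. */
static bool has_tasks_new(const struct model_cgroup *cgrp)
{
	for (size_t i = 0; i < cgrp->nr_csets; i++)
		if (cgrp->csets[i]->nr_tasks > 0)
			return true;
	return false;
}
```

A namespace root that only holds the css_set pinned at namespace creation, with no tasks in it, is "busy" under the old test and empty under the new one, so controller enables are no longer rejected.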