04 Oct, 2017

31 commits

  • MADV_FREE clears the pte dirty bit and then marks the page lazyfree
    (clears SwapBacked). There is no lock to prevent page reclaim from
    adding the page to the swap cache between these two steps. Page reclaim
    could add the page to the swap cache and unmap it. After page reclaim,
    the page is added back to the lru. At that point, we probably start
    draining the per-cpu pagevec and mark the page lazyfree. So the page
    could be in a state with SwapBacked cleared and PG_swapcache set. The
    next time there is a refault at the virtual address, do_swap_page can
    find the page in the swap cache, but PageSwapCache is false for the page
    because SwapBacked isn't set, so do_swap_page will bail out and do
    nothing. The task will keep running into the fault handler.

    Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
    Link: http://lkml.kernel.org/r/6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com
    Signed-off-by: Shaohua Li
    Reported-by: Artem Savkov
    Tested-by: Artem Savkov
    Reviewed-by: Rik van Riel
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Minchan Kim
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: [4.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
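
The inconsistent page state described in the commit above can be replayed in a small userspace model. This is only a sketch: the flag bit positions and all demo_* names are hypothetical, and the combined check is modeled on the commit message's statement that PageSwapCache is false whenever SwapBacked is not set.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the real PG_* values differ. */
#define DEMO_PG_SWAPBACKED (1UL << 0)
#define DEMO_PG_SWAPCACHE  (1UL << 1)

struct demo_page {
    unsigned long flags;
};

/* The swap-cache test only succeeds while SwapBacked is also set. */
static bool demo_page_swap_cache(const struct demo_page *page)
{
    return (page->flags & DEMO_PG_SWAPBACKED) &&
           (page->flags & DEMO_PG_SWAPCACHE);
}

/* Step taken by page reclaim: the page enters the swap cache. */
static void demo_add_to_swap_cache(struct demo_page *page)
{
    page->flags |= DEMO_PG_SWAPCACHE;
}

/* Deferred lazyfree marking, as done when the per-cpu pagevec drains. */
static void demo_mark_lazyfree(struct demo_page *page)
{
    page->flags &= ~DEMO_PG_SWAPBACKED;
}

/*
 * Replays the racy ordering: reclaim runs between the two MADV_FREE
 * steps. Returns what a later swap-in fault would see for the page.
 */
bool demo_race_leaves_page_visible(void)
{
    struct demo_page page = { .flags = DEMO_PG_SWAPBACKED };

    demo_add_to_swap_cache(&page);  /* reclaim wins the race */
    demo_mark_lazyfree(&page);      /* late lazyfree marking */
    return demo_page_swap_cache(&page);
}
```

In this model the page ends up with the swap-cache bit set but fails the combined check, which matches the state in which do_swap_page bails out.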
     
  • Eryu noticed that he could sometimes get a leftover error reported when
    it shouldn't be on fsync with ext2 and non-journalled ext4.

    The problem is that writeback_single_inode still uses filemap_fdatawait.
    That picks up a previously set AS_EIO flag, which would ordinarily have
    been cleared before.

    Since we're mostly using this function as a replacement for
    filemap_check_errors, have filemap_check_and_advance_wb_err clear AS_EIO
    and AS_ENOSPC when reporting an error. That should allow the new
    function to better emulate the behavior of the old with respect to these
    flags.

    Link: http://lkml.kernel.org/r/20170922133331.28812-1-jlayton@kernel.org
    Signed-off-by: Jeff Layton
    Reported-by: Eryu Guan
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
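
The leftover-error behaviour in the commit above can be illustrated with a heavily simplified model. It does not reproduce the kernel's error-tracking code: the sequence counter, the flag values and every demo_* name are assumptions made for the sketch. It only shows why a flag-based check reports a stale error unless the new-style check also clears the legacy flags.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the legacy AS_EIO / AS_ENOSPC flags. */
#define DEMO_AS_EIO    (1u << 0)
#define DEMO_AS_ENOSPC (1u << 1)

struct demo_mapping {
    unsigned int flags;   /* legacy sticky error flags */
    int wb_err;           /* last writeback error, negative errno */
    unsigned int wb_seq;  /* bumped each time an error is recorded */
};

struct demo_file {
    unsigned int seen_seq; /* last error sequence this file reported */
};

/* Writeback failure: record the error in both schemes. */
void demo_record_error(struct demo_mapping *m, int err)
{
    m->flags |= (err == -ENOSPC) ? DEMO_AS_ENOSPC : DEMO_AS_EIO;
    m->wb_err = err;
    m->wb_seq++;
}

/* Old-style check: report and clear the sticky flags. */
int demo_check_flags(struct demo_mapping *m)
{
    int ret = 0;

    if (m->flags & DEMO_AS_ENOSPC)
        ret = -ENOSPC;
    if (m->flags & DEMO_AS_EIO)
        ret = -EIO;
    m->flags &= ~(DEMO_AS_EIO | DEMO_AS_ENOSPC);
    return ret;
}

/* New-style check: report once per file, optionally clearing the flags. */
int demo_check_and_advance(struct demo_mapping *m, struct demo_file *f,
                           bool clear_flags)
{
    if (f->seen_seq == m->wb_seq)
        return 0;
    f->seen_seq = m->wb_seq;
    if (clear_flags)
        m->flags &= ~(DEMO_AS_EIO | DEMO_AS_ENOSPC);
    return m->wb_err;
}

/* Returns the error a later flag-based wait would still report. */
int demo_leftover_after_fsync(bool clear_flags)
{
    struct demo_mapping m = {0};
    struct demo_file f = {0};

    demo_record_error(&m, -EIO);
    (void)demo_check_and_advance(&m, &f, clear_flags); /* first fsync */
    return demo_check_flags(&m);                       /* later wait */
}
```

With clear_flags set to false the later flag-based check still sees the old error; with true it sees none, which is the behaviour the commit aims for.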
     
  • The build of m32r allmodconfig is giving lots of build warnings about:

    include/linux/byteorder/big_endian.h:7:2:
    warning: #warning inconsistent configuration,
    needs CONFIG_CPU_BIG_ENDIAN [-Wcpp]
    #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN

    Define CPU_BIG_ENDIAN like the way CPU_LITTLE_ENDIAN is defined.

    Link: http://lkml.kernel.org/r/1505678083-10320-1-git-send-email-sudipm.mukherjee@gmail.com
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sudip Mukherjee
     
  • In testing I found that the handle passed to zs_map_object in
    __zram_bvec_read is NULL, so the kernel oopses in pin_object().

    The reason is that nothing checks whether the slot has been freed after
    taking the slot's lock. This patch fixes it.

    [minchan@kernel.org: v2]
    Link: http://lkml.kernel.org/r/1505887347-10881-1-git-send-email-minchan@kernel.org
    Link: http://lkml.kernel.org/r/1505788488-26723-1-git-send-email-minchan@kernel.org
    Fixes: 1f7319c74275 ("zram: partial IO refactoring")
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • On powerpc, RODATA_TEST fails with the following messages:

    Freeing unused kernel memory: 528K
    rodata_test: test data was not read only

    This is because GCC allocates it to .data section:

    c0695034 g O .data 00000004 rodata_test_data

    Since commit 056b9d8a7692 ("mm: remove rodata_test_data export, add
    pr_fmt"), rodata_test_data is used only inside rodata_test.c. By
    declaring it static, it gets properly allocated into .rodata section
    instead of .data:

    c04df710 l O .rodata 00000004 rodata_test_data

    Fixes: 056b9d8a7692 ("mm: remove rodata_test_data export, add pr_fmt")
    Link: http://lkml.kernel.org/r/20170921093729.1080368AC1@po15668-vm-win7.idsi0.si.c-s.fr
    Signed-off-by: Christophe Leroy
    Cc: Kees Cook
    Cc: Jinbum Park
    Cc: Segher Boessenkool
    Cc: David Laight
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christophe Leroy
     
  • Locking of config and doorbell operations should be done only if the
    underlying hardware requires it.

    This patch removes the global spinlocks from the rapidio subsystem and
    moves them to the mport drivers (fsl_rio and tsi721), only to the
    necessary places. For example, local config space read and write
    operations (lcread/lcwrite) are atomic in all existing drivers, so there
    should be no need for locking, while the cread/cwrite operations which
    generate maintenance transactions need to be synchronized with a lock.

    Later, each driver could choose to use a per-port lock instead of a
    global one, or even more granular locking.

    Link: http://lkml.kernel.org/r/20170824113023.GD50104@nokia.com
    Signed-off-by: Ioan Nicu
    Signed-off-by: Frank Kunz
    Acked-by: Alexandre Bounine
    Cc: Matt Porter
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ioan Nicu
     
  • The function is called from __meminit context and calls other __meminit
    functions but isn't itself marked as such today:

    WARNING: vmlinux.o(.text.unlikely+0x4516): Section mismatch in reference from the function init_reserved_page() to the function .meminit.text:early_pfn_to_nid()
    The function init_reserved_page() references the function __meminit early_pfn_to_nid().
    This is often because init_reserved_page lacks a __meminit annotation or the annotation of early_pfn_to_nid is wrong.

    On most compilers, we don't notice this because the function gets
    inlined all the time. Adding __meminit here fixes the harmless warning
    for the old versions and is generally the correct annotation.

    Link: http://lkml.kernel.org/r/20170915193149.901180-1-arnd@arndb.de
    Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
    Signed-off-by: Arnd Bergmann
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Fix the situation when clear_bit() is called for page->private before
    the page pointer is actually assigned. While at it, remove work_busy()
    check because it is costly and does not give 100% guarantee anyway.

    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Andrea brought to my attention that the L->{L,S} guarantees are
    completely bogus for this case. I was looking at the diagram, from the
    offending commit, when that _is_ the race, we had the load reordered
    already.

    What we need is at least S->L semantics, thus simply use
    wq_has_sleeper() to serialize the call for good.

    Link: http://lkml.kernel.org/r/20170914175313.GB811@linux-80c1.suse
    Fixes: 46acef048a6 (mm,compaction: serialize waitqueue_active() checks)
    Signed-off-by: Davidlohr Bueso
    Reported-by: Andrea Parri
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
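
The S->L ordering requirement in the commit above can be sketched with C11 atomics in userspace. This is an analogy rather than kernel code: the demo_* types are hypothetical, and the full fence stands in for the barrier that, as far as I understand it, wq_has_sleeper() issues before checking the waitqueue.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a waitqueue: just a waiter count. */
struct demo_waitqueue {
    atomic_int nr_waiters;
};

struct demo_state {
    atomic_bool work_available;  /* the condition waiters sleep on */
    struct demo_waitqueue wq;
};

/* Unordered check, analogous to a bare waitqueue_active(). */
static bool demo_waitqueue_active(struct demo_waitqueue *wq)
{
    return atomic_load_explicit(&wq->nr_waiters,
                                memory_order_relaxed) > 0;
}

/* Ordered check: a full fence first, then the same load. */
static bool demo_wq_has_sleeper(struct demo_waitqueue *wq)
{
    atomic_thread_fence(memory_order_seq_cst);
    return demo_waitqueue_active(wq);
}

/*
 * Waker side: store the condition (S), then load the waiter state (L).
 * The fence keeps the load from being satisfied before the store is
 * visible, so a concurrent waiter cannot be missed on this side.
 */
bool demo_publish_and_check(struct demo_state *s)
{
    atomic_store_explicit(&s->work_available, true,
                          memory_order_relaxed);
    return demo_wq_has_sleeper(&s->wq);
}
```

The waiter needs the mirror-image ordering (register as a waiter, full fence, then re-check the condition) for the pairing to be complete; that side is omitted here.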
     
  • Drop the global lru lock in isolate callback before calling
    zap_page_range which calls cond_resched, and re-acquire the global lru
    lock before returning. Also change return code to LRU_REMOVED_RETRY.

    Use mmput_async when failing to acquire mmap_sem in an atomic context.

    Fix "BUG: sleeping function called from invalid context"
    errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

    Also restore mmput_async, which was initially introduced in commit
    ec8d7c14ea14 ("mm, oom_reaper: do not mmput synchronously from the oom
    reaper context"), and was removed in commit 212925802454 ("mm: oom: let
    oom_reap_task and exit_mmap run concurrently").

    Link: http://lkml.kernel.org/r/20170914182231.90908-1-sherryy@android.com
    Fixes: f2517eb76f1f2 ("android: binder: Add global lru shrinker to binder")
    Signed-off-by: Sherry Yang
    Signed-off-by: Greg Kroah-Hartman
    Reported-by: Kyle Yan
    Acked-by: Arve Hjønnevåg
    Acked-by: Michal Hocko
    Cc: Martijn Coenen
    Cc: Todd Kjos
    Cc: Riley Andrews
    Cc: Ingo Molnar
    Cc: Vlastimil Babka
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Oleg Nesterov
    Cc: Hoeun Ryu
    Cc: Christopher Lameter
    Cc: Vegard Nossum
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sherry Yang
     
  • Fix for 4.14: zone device pages always have an elevated refcount of one,
    so the page count sanity check in uncharge_page() is inappropriate for
    them.

    [mhocko@suse.com: nano-optimize VM_BUG_ON in uncharge_page]
    Link: http://lkml.kernel.org/r/20170914190011.5217-1-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Michal Hocko
    Reported-by: Evgeny Baskakov
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • The following lockdep splat has been noticed during LTP testing

    ======================================================
    WARNING: possible circular locking dependency detected
    4.13.0-rc3-next-20170807 #12 Not tainted
    ------------------------------------------------------
    a.out/4771 is trying to acquire lock:
    (cpu_hotplug_lock.rw_sem){++++++}, at: [] drain_all_stock.part.35+0x18/0x140

    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: [] __do_page_fault+0x175/0x530

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #3 (&mm->mmap_sem){++++++}:
    lock_acquire+0xc9/0x230
    __might_fault+0x70/0xa0
    _copy_to_user+0x23/0x70
    filldir+0xa7/0x110
    xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
    xfs_readdir+0x1fa/0x2c0 [xfs]
    xfs_file_readdir+0x30/0x40 [xfs]
    iterate_dir+0x17a/0x1a0
    SyS_getdents+0xb0/0x160
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    -> #2 (&type->i_mutex_dir_key#3){++++++}:
    lock_acquire+0xc9/0x230
    down_read+0x51/0xb0
    lookup_slow+0xde/0x210
    walk_component+0x160/0x250
    link_path_walk+0x1a6/0x610
    path_openat+0xe4/0xd50
    do_filp_open+0x91/0x100
    file_open_name+0xf5/0x130
    filp_open+0x33/0x50
    kernel_read_file_from_path+0x39/0x80
    _request_firmware+0x39f/0x880
    request_firmware_direct+0x37/0x50
    request_microcode_fw+0x64/0xe0
    reload_store+0xf7/0x180
    dev_attr_store+0x18/0x30
    sysfs_kf_write+0x44/0x60
    kernfs_fop_write+0x113/0x1a0
    __vfs_write+0x37/0x170
    vfs_write+0xc7/0x1c0
    SyS_write+0x58/0xc0
    do_syscall_64+0x6c/0x1f0
    return_from_SYSCALL_64+0x0/0x7a

    -> #1 (microcode_mutex){+.+.+.}:
    lock_acquire+0xc9/0x230
    __mutex_lock+0x88/0x960
    mutex_lock_nested+0x1b/0x20
    microcode_init+0xbb/0x208
    do_one_initcall+0x51/0x1a9
    kernel_init_freeable+0x208/0x2a7
    kernel_init+0xe/0x104
    ret_from_fork+0x2a/0x40

    -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
    __lock_acquire+0x153c/0x1550
    lock_acquire+0xc9/0x230
    cpus_read_lock+0x4b/0x90
    drain_all_stock.part.35+0x18/0x140
    try_charge+0x3ab/0x6e0
    mem_cgroup_try_charge+0x7f/0x2c0
    shmem_getpage_gfp+0x25f/0x1050
    shmem_fault+0x96/0x200
    __do_fault+0x1e/0xa0
    __handle_mm_fault+0x9c3/0xe00
    handle_mm_fault+0x16e/0x380
    __do_page_fault+0x24a/0x530
    do_page_fault+0x30/0x80
    page_fault+0x28/0x30

    other info that might help us debug this:

    Chain exists of:
    cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem

    Possible unsafe locking scenario:

    CPU0                           CPU1
    ----                           ----
    lock(&mm->mmap_sem);
                                   lock(&type->i_mutex_dir_key#3);
                                   lock(&mm->mmap_sem);
    lock(cpu_hotplug_lock.rw_sem);

    *** DEADLOCK ***

    2 locks held by a.out/4771:
    #0: (&mm->mmap_sem){++++++}, at: [] __do_page_fault+0x175/0x530
    #1: (percpu_charge_mutex){+.+...}, at: [] try_charge+0x397/0x6e0

    The problem is very similar to the one fixed by commit a459eeb7b852
    ("mm, page_alloc: do not depend on cpu hotplug locks inside the
    allocator"). We are taking hotplug locks while we can be sitting on top
    of basically arbitrary locks. This just calls for problems.

    We can get rid of {get,put}_online_cpus, fortunately. We do not have to
    be worried about races with memory hotplug because drain_local_stock,
    which is called from both the WQ draining and the memory hotplug
    contexts, is always operating on the local cpu stock with IRQs disabled.

    The only thing to be careful about is that the target memcg doesn't
    vanish while we are still in drain_all_stock so take a reference on it.

    Link: http://lkml.kernel.org/r/20170913090023.28322-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reported-by: Artem Savkov
    Tested-by: Artem Savkov
    Cc: Johannes Weiner
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Andrea has noticed that the oom_reaper doesn't invalidate the range via
    mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can
    corrupt the memory of the kvm guest for example.

    tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not
    sufficient as per Andrea:

    "mmu_notifier_invalidate_range cannot be used in replacement of
    mmu_notifier_invalidate_range_start/end. For KVM
    mmu_notifier_invalidate_range is a noop and rightfully so. A MMU
    notifier implementation has to implement either ->invalidate_range
    method or the invalidate_range_start/end methods, not both. And if you
    implement invalidate_range_start/end like KVM is forced to do, calling
    mmu_notifier_invalidate_range in common code is a noop for KVM.

    For those MMU notifiers that can get away only implementing
    ->invalidate_range, the ->invalidate_range is implicitly called by
    mmu_notifier_invalidate_range_end(). And only those secondary MMUs
    that share the same pagetable with the primary MMU (like AMD iommuv2)
    can get away only implementing ->invalidate_range"

    As the callback is allowed to sleep and the implementation is out of
    hand of the MM it is safer to simply bail out if there is an mmu
    notifier registered. In order to not fail too early make the
    mm_has_notifiers check under the oom_lock and have a little nap before
    failing to give the current oom victim some more time to exit.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170913113427.2291-1-mhocko@kernel.org
    Fixes: aac453635549 ("mm, oom: introduce oom reaper")
    Signed-off-by: Michal Hocko
    Reported-by: Andrea Arcangeli
    Reviewed-by: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • It is possible that on a (partially) unsuccessful page reclaim,
    kref_put() called in z3fold_reclaim_page() does not yield page release,
    but the page is released shortly afterwards by another thread. Then
    z3fold_reclaim_page() would try to list_add() that (released) page again
    which is obviously a bug.

    To avoid that, spin_lock() has to be taken earlier, before the
    kref_put() call mentioned earlier.

    Link: http://lkml.kernel.org/r/20170913162937.bfff21c7d12b12a5f47639fd@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
    array initializers, where the GPIO_* enums serve as indices. If enum
    values are defined, but never used, pinmux_pins[] contains (zero-filled)
    holes. Such entries are treated as pin zero, which was registered
    before, thus leading to pinctrl registration failures, as seen on
    sh7722:

    sh-pfc pfc-sh7722: pin 0 already registered
    sh-pfc pfc-sh7722: error during pin registration
    sh-pfc pfc-sh7722: could not register: -22
    sh-pfc: probe of pfc-sh7722 failed with error -22

    Remove GPIO_PH[0-7] from the enum to fix this.

    Link: http://lkml.kernel.org/r/1505205657-18012-5-git-send-email-geert+renesas@glider.be
    Fixes: ef0fa5331a73e479 ("sh: Add pinmux for sh7269")
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Laurent Pinchart
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: Magnus Damm
    Cc: Yoshihiro Shimoda
    Cc: Jacopo Mondi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
    array initializers, where the GPIO_* enums serve as indices. If enum
    values are defined, but never used, pinmux_pins[] contains (zero-filled)
    holes. Such entries are treated as pin zero, which was registered
    before, thus leading to pinctrl registration failures, as seen on
    sh7722:

    sh-pfc pfc-sh7722: pin 0 already registered
    sh-pfc pfc-sh7722: error during pin registration
    sh-pfc pfc-sh7722: could not register: -22
    sh-pfc: probe of pfc-sh7722 failed with error -22

    Remove GPIO_PH[0-7] from the enum to fix this.

    Link: http://lkml.kernel.org/r/1505205657-18012-4-git-send-email-geert+renesas@glider.be
    Fixes: 41797f75486d8ca3 ("sh: Add pinmux for sh7264")
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Laurent Pinchart
    Cc: Jacopo Mondi
    Cc: Magnus Damm
    Cc: Rich Felker
    Cc: Yoshihiro Shimoda
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Commit 3810e96056ff ("sh: modify pinmux for SH7757 2nd cut") renamed
    GPIO_PT[JLNQ]7 to GPIO_PT[JLNQ]7_RESV, and removed the existing users
    from the pinmux_pins[] array.

    However, pinmux_pins[] is initialized through PINMUX_GPIO(), using
    designated array initializers, where the GPIO_* enums serve as indices.
    Hence entries were not really removed, but replaced by (zero-filled)
    holes. Such entries are treated as pin zero, which was registered
    before, thus leading to pinctrl registration failures, as seen on
    sh7722:

    sh-pfc pfc-sh7722: pin 0 already registered
    sh-pfc pfc-sh7722: error during pin registration
    sh-pfc pfc-sh7722: could not register: -22
    sh-pfc: probe of pfc-sh7722 failed with error -22

    Remove GPIO_PT[JLNQ]7_RESV from the enum to fix this.

    Link: http://lkml.kernel.org/r/1505205657-18012-3-git-send-email-geert+renesas@glider.be
    Fixes: 3810e96056ffddf6 ("sh: modify pinmux for SH7757 2nd cut")
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Laurent Pinchart
    Cc: Jacopo Mondi
    Cc: Magnus Damm
    Cc: Rich Felker
    Cc: Yoshihiro Shimoda
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Patch series "sh: sh7722/sh7757i/sh7264/sh7269: Fix pinctrl registration",
    v2.

    Magnus Damm reported that on sh7722/Migo-R, pinctrl registration fails
    with:

    sh-pfc pfc-sh7722: pin 0 already registered
    sh-pfc pfc-sh7722: error during pin registration
    sh-pfc pfc-sh7722: could not register: -22
    sh-pfc: probe of pfc-sh7722 failed with error -22

    pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
    array initializers, where the GPIO_* enums serve as indices. Apparently
    GPIO_PTQ7 was defined in the enum, but never used. If enum values are
    defined, but never used, pinmux_pins[] contains (zero-filled) holes.
    Hence such entries are treated as pin zero, which was registered before,
    and pinctrl registration fails.

    I can't see how this ever worked, as at the time of commit f5e25ae52fef
    ("sh-pfc: Add sh7722 pinmux support"), pinmux_gpios[] in
    drivers/pinctrl/sh-pfc/pfc-sh7722.c already had the hole, and
    drivers/pinctrl/core.c already had the check.

    Some scripting revealed a few more broken drivers:
    - sh7757 has four holes, due to nonexistent GPIO_PT[JLNQ]7_RESV.
    - sh7264 and sh7269 define GPIO_PH[0-7], but don't use it with
    PINMUX_GPIO().

    Patch 1 fixes the issue on sh7722, and was tested. Patches 2-4 should
    fix the issue on the other 3 SoCs, but were untested due to lack of
    hardware.

    This patch (of 4):

    On sh7722/Migo-R, pinctrl registration fails with:

    sh-pfc pfc-sh7722: pin 0 already registered
    sh-pfc pfc-sh7722: error during pin registration
    sh-pfc pfc-sh7722: could not register: -22
    sh-pfc: probe of pfc-sh7722 failed with error -22

    pinmux_pins[] is initialized through PINMUX_GPIO(), using designated array
    initializers, where the GPIO_* enums serve as indices. As GPIO_PTQ7 is
    defined in the enum, but never used, pinmux_pins[] contains a
    (zero-filled) hole. Hence this entry is treated as pin zero, which was
    registered before, and pinctrl registration fails.

    According to the datasheet, port PTQ7 does not exist. Hence remove
    GPIO_PTQ7 from the enum to fix this.

    Link: http://lkml.kernel.org/r/1505205657-18012-2-git-send-email-geert+renesas@glider.be
    Fixes: 8d7b5b0af7e070b9 ("sh: Add sh7722 pinmux code")
    Signed-off-by: Geert Uytterhoeven
    Reported-by: Magnus Damm
    Reviewed-by: Laurent Pinchart
    Tested-by: Jacopo Mondi
    Cc: Rich Felker
    Cc: Yoshihiro Shimoda
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
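
The zero-filled holes described in this series are a general property of C designated array initializers. The sketch below uses a made-up enum, struct and DEMO_PINMUX_GPIO() macro (not the real sh-pfc definitions) to show how an enum value that is defined but never initialized produces an entry that looks like pin zero.

```c
#include <stddef.h>

/* Hypothetical GPIO enum; DEMO_GPIO_UNUSED is defined but never used. */
enum demo_gpio {
    DEMO_GPIO_A,
    DEMO_GPIO_B,
    DEMO_GPIO_UNUSED,
    DEMO_GPIO_C,
};

struct demo_pin {
    unsigned int pin;
    const char *name;
};

/* Illustrative macro: the enum value serves as the array index. */
#define DEMO_PINMUX_GPIO(gpio, nr) [gpio] = { .pin = (nr), .name = #gpio }

static const struct demo_pin demo_pins[] = {
    DEMO_PINMUX_GPIO(DEMO_GPIO_A, 0),
    DEMO_PINMUX_GPIO(DEMO_GPIO_B, 1),
    /* No entry for DEMO_GPIO_UNUSED: index 2 is implicitly zero-filled. */
    DEMO_PINMUX_GPIO(DEMO_GPIO_C, 3),
};

/* Counts how many table entries claim the given pin number. */
size_t demo_count_entries_claiming_pin(unsigned int pin)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof(demo_pins) / sizeof(demo_pins[0]); i++)
        if (demo_pins[i].pin == pin)
            n++;
    return n;
}

/* Returns the name stored in a slot, or NULL for a zero-filled hole. */
const char *demo_pin_name(enum demo_gpio gpio)
{
    return demo_pins[gpio].name;
}
```

Here pin 0 is claimed twice, once by the real entry and once by the hole, which is the kind of duplicate that a registration check rejects. Removing the unused enum value closes the hole.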
     
  • This fixes a bug in madvise() where, if you tried to soft offline a
    hugepage via madvise(), the walk over the address range would end up
    using the wrong page offset. The offset was derived from the compound
    order of a page that used to be compound but no longer is, because the
    huge page has been dissolved (since commit c3114a84f7f9: "mm: hugetlb:
    soft-offline: dissolve source hugepage after successful migration").

    As a result I ended up with all my free pages except one being offlined.

    Link: http://lkml.kernel.org/r/20170912204306.GA12053@gmail.com
    Fixes: c3114a84f7f9 ("mm: hugetlb: soft-offline: dissolve source hugepage after successful migration")
    Signed-off-by: Alexandru Moise
    Cc: Anshuman Khandual
    Cc: Michal Hocko
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Hillf Danton
    Cc: Shaohua Li
    Cc: Mike Rapoport
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Rik van Riel
    Cc: Naoya Horiguchi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandru Moise
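
The stepping problem in the commit above can be shown with a toy model of the address-range walk. Everything here is an assumption made for illustration (the demo_* names, a 4 KiB base page, an order-9 huge page); it only demonstrates why the step size has to be sampled before the page is dissolved.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE   4096UL
#define DEMO_HPAGE_ORDER 9U      /* assumed: 2 MiB huge page */

struct demo_page {
    unsigned int order;          /* compound order; 0 once dissolved */
};

/* A successful soft offline dissolves the source huge page. */
static void demo_soft_offline(struct demo_page *page)
{
    page->order = 0;
}

/*
 * Walks [start, end) and returns how many offline calls were made.
 * sample_order_first selects the fixed behaviour; otherwise the order
 * is read after the page has already been dissolved.
 */
size_t demo_walk(unsigned long start, unsigned long end,
                 bool sample_order_first)
{
    size_t calls = 0;

    while (start < end) {
        /* Whatever backs 'start' right now is a huge page. */
        struct demo_page page = { .order = DEMO_HPAGE_ORDER };
        unsigned int order = page.order;  /* sampled before offlining */

        demo_soft_offline(&page);
        calls++;

        if (!sample_order_first)
            order = page.order;           /* stale read: now 0 */

        start += DEMO_PAGE_SIZE << order;
    }
    return calls;
}
```

Over a range covering one huge page, the fixed walk makes a single call, while the stale read advances one base page at a time and makes 512 calls, consistent with many more pages being offlined than intended.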
     
  • In this place mm is unlocked, so vmas or list may change. Down read
    mmap_sem to protect them from modifications.

    Link: http://lkml.kernel.org/r/150512788393.10691.8868381099691121308.stgit@localhost.localdomain
    Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
    Signed-off-by: Kirill Tkhai
    Acked-by: Michal Hocko
    Reviewed-by: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: zhong jiang
    Cc: Ingo Molnar
    Cc: Claudio Imbrenda
    Cc: "Kirill A. Shutemov"
    Cc: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • There's a typo in recent change of VM_MPX definition. We want it to be
    VM_HIGH_ARCH_4, not VM_HIGH_ARCH_BIT_4.

    This bug does cause visible regressions. In arch_vma_name the vmflags
    are tested against VM_MPX. With the incorrect value of VM_MPX, a number
    of vmas (such as the stack) test positive and end up being marked as
    "[mpx]" in /proc/N/maps instead of their correct names.

    This confuses tools like rr which expect to be able to find familiar
    vmas.

    Fixes: df3735c5b40f ("x86,mpx: make mpx depend on x86-64 to free up VMA flag")
    Link: http://lkml.kernel.org/r/20170918140253.36856-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Rik van Riel
    Cc: Dave Hansen
    Cc: Kyle Huey
    Cc: [4.14+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
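
The effect of using a bit index where a bit mask was intended can be checked in isolation. The flag values below are assumptions based on the usual VM_* layout (low bits for read/write/exec and the may-variants, high arch bit 4 at index 36); the demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values; treat them as illustrative. */
#define DEMO_VM_READ     UINT64_C(0x01)
#define DEMO_VM_WRITE    UINT64_C(0x02)
#define DEMO_VM_MAYREAD  UINT64_C(0x10)
#define DEMO_VM_MAYWRITE UINT64_C(0x20)

#define DEMO_VM_HIGH_ARCH_BIT_4 36
#define DEMO_VM_HIGH_ARCH_4     (UINT64_C(1) << DEMO_VM_HIGH_ARCH_BIT_4)

/* The typo uses the bit index (36 == 0x24) as if it were a mask. */
#define DEMO_VM_MPX_TYPO  ((uint64_t)DEMO_VM_HIGH_ARCH_BIT_4)
#define DEMO_VM_MPX_FIXED DEMO_VM_HIGH_ARCH_4

/* Same shape as a flags test against VM_MPX. */
bool demo_looks_like_mpx(uint64_t vm_flags, uint64_t vm_mpx)
{
    return (vm_flags & vm_mpx) != 0;
}

/* Flags a typical readable and writable stack mapping might carry. */
uint64_t demo_stack_flags(void)
{
    return DEMO_VM_READ | DEMO_VM_WRITE |
           DEMO_VM_MAYREAD | DEMO_VM_MAYWRITE;
}
```

Under these assumed values, 0x24 overlaps the 0x20 bit that a stack mapping carries, so the typo'd mask matches an ordinary stack while the corrected mask does not.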
     
  • Here are some more of the spelling mistakes and typos that I've found
    while fixing up spelling mistakes in kernel error message text over the
    past eight weeks.

    [akpm@linux-foundation.org: s/|/||/, per Joe]
    Link: http://lkml.kernel.org/r/20170919090818.5989-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Acked-by: Kees Cook
    Cc: Masahiro Yamada
    Cc: Stephen Boyd
    Cc: Joe Perches
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     
  • This parameter is named kp, so the documentation should use that.

    Fixes: 9b473de87209 ("param: Fix duplicate module prefixes")
    Link: http://lkml.kernel.org/r/20170919142656.64aea59e@endymion
    Signed-off-by: Jean Delvare
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
     
  • The build of alpha allmodconfig is giving error:

    arch/alpha/include/asm/mmu_context.h: In function 'ev5_switch_mm':
    arch/alpha/include/asm/mmu_context.h:160:2: error:
    implicit declaration of function 'task_thread_info';
    did you mean 'init_thread_info'? [-Werror=implicit-function-declaration]

    The file 'mmu_context.h' needed an extra header file.

    Link: http://lkml.kernel.org/r/1505668810-7497-1-git-send-email-sudipm.mukherjee@gmail.com
    Signed-off-by: Sudip Mukherjee
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sudip Mukherjee
     
  • Pull workqueue fixlet from Tejun Heo:
    "Minor documentation update"

    * 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    Documentation: core-api: minor workqueue.rst cleanups

    Linus Torvalds
     
  • Pull cgroup fix from Tejun Heo:
    "The recent migration code updates assumed that migrations always
    execute from the top to the bottom once and didn't clean up internal
    states after each migration round; however, cgroup_transfer_tasks()
    repeats the inner steps multiple times and the garbage internal states
    from the previous iteration led to OOPS.

    Waiman fixed the bug by reinitializing the relevant states at the end
    of each migration round"

    * 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: Reinit cgroup_taskset structure before cgroup_migrate_execute() returns

    Linus Torvalds
     
  • Pull percpu fixes from Tejun Heo:
    "Rather important fixes this time.

    - The new percpu area allocator had a subtle bug in how it iterates
    the memory regions and could skip viable areas, which led to
    allocation failures for module static percpu variables. Dennis
    fixed the bug and another non-critical one in stat calculation.

    - Mark noticed that the generic implementations of percpu local
    atomic reads aren't properly protected against irqs and there's a
    (slim) chance for split reads on some 32bit systems. Generic
    implementations are updated to disable irq when read size is larger
    than ulong size. This may have made some 32bit archs which can do
    atomic local 64bit accesses generate sub-optimal code. We need to
    find them out and implement arch-specific overrides"

    * 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: fix iteration to prevent skipping over block
    percpu: fix starting offset for chunk statistics traversal
    percpu: make this_cpu_generic_read() atomic w.r.t. interrupts

    Linus Torvalds
     
  • Pull libata fixes from Tejun Heo:
    "Nothing too interesting.

    Arnd's gcc-7 warning fixes that slipped through the cracks for two
    release cycles (my bad), and two minor low level driver updates"

    * 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
    ahci: don't ignore result code of ahci_reset_controller()
    ata_piix: Add Fujitsu-Siemens Lifebook S6120 to short cable IDs
    ata: avoid gcc-7 warning in ata_timing_quantize

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are a number of USB fixes for 4.14-rc4 to resolve reported
    issues.

    There's a bunch of stuff in here based on the great work Andrey
    Konovalov is doing in fuzzing the USB stack. Lots of bug fixes when
    dealing with corrupted USB descriptors that we've never seen in
    "normal" operation, but is now ensuring the stack is much more
    hardened overall.

    There's also the usual XHCI and gadget driver fixes as well, and a
    build error fix, and a few other minor things, full details in the
    shortlog.

    All of these have been in linux-next with no reported issues"

    * tag 'usb-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (38 commits)
    usb: dwc3: of-simple: Add compatible for Spreadtrum SC9860 platform
    usb: gadget: udc: atmel: set vbus irqflags explicitly
    usb: gadget: ffs: handle I/O completion in-order
    usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
    usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
    usb: gadget: udc: renesas_usb3: Fix return value of usb3_write_pipe()
    usb: gadget: udc: renesas_usb3: fix Pn_RAMMAP.Pn_MPKT value
    usb: gadget: udc: renesas_usb3: fix for no-data control transfer
    USB: dummy-hcd: Fix erroneous synchronization change
    USB: dummy-hcd: fix infinite-loop resubmission bug
    USB: dummy-hcd: fix connection failures (wrong speed)
    USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse
    USB: devio: Don't corrupt user memory
    USB: devio: Prevent integer overflow in proc_do_submiturb()
    USB: g_mass_storage: Fix deadlock when driver is unbound
    USB: gadgetfs: Fix crash caused by inadequate synchronization
    USB: gadgetfs: fix copy_to_user while holding spinlock
    USB: uas: fix bug in handling of alternate settings
    usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
    usb-storage: fix bogus hardware error messages for ATA pass-thru devices
    ...

    Linus Torvalds
     
  • Pull tty/serial fixes from Greg KH:
    "Here are a small number (5) of patches for some reported TTY and
    serial issues. Nothing major, a documentation update, timing fix,
    error handling fix, name reporting fix, and a timeout issue resolved.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'tty-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    serial: sccnxp: Fix error handling in sccnxp_probe()
    tty: serial: lpuart: avoid report NULL interrupt
    serial: bcm63xx: fix timing issue.
    mxser: fix timeout calculation for low rates
    serial: sh-sci: document R8A77970 bindings

    Linus Torvalds
     
  • Pull staging/IIO fixes from Greg KH:
    "Here are some small staging/IIO driver fixes for 4.14-rc4

    Most of these have been in my tree for a while due to travels, sorry
    for the delay. They resolve a number of small issues reported by
    people, mostly for the iio drivers. Nothing major in here, full
    details are in the shortlog.

    All have been in linux-next for a few weeks with no reported issues"

    * tag 'staging-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
    staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
    iio: core: Return error for failed read_reg
    iio: ad7793: Fix the serial interface reset
    iio: ad_sigma_delta: Implement a dedicated reset function
    IIO: BME280: Updates to Humidity readings need ctrl_reg write!
    iio: adc: mcp320x: Fix readout of negative voltages
    iio: adc: mcp320x: Fix oops on module unload
    iio: adc: stm32: fix bad error check on max_channels
    iio: trigger: stm32-timer: fix a corner case to write preset
    iio: trigger: stm32-timer: preset shouldn't be buffered
    iio: adc: twl4030: Return an error if we can not enable the vusb3v1 regulator in 'twl4030_madc_probe()'
    iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
    iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
    staging: rtl8723bs: avoid null pointer dereference on pmlmepriv
    staging: rtl8723bs: add missing range check on id
    staging: vchiq_2835_arm: Fix NULL ptr dereference in free_pagelist
    staging: speakup: fix speakup-r empty line lockup
    staging: pi433: Move limit check to switch default to kill warning
    staging: r8822be: fix null pointer dereferences with a null driver_adapter
    staging: mt29f_spinand: Enable the read ECC before program the page
    ...

    Linus Torvalds
     

03 Oct, 2017

3 commits

  • Pull driver core fixes from Greg KH:
    "Here are a few small fixes for 4.14-rc4.

    The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
    straggler made it in through some other tree (odds are, one of
    mine...) So there's a simple removal of the last user, and then
    finally the macro is removed from the tree.

    There's a fix for old crazy udev instances that insist on reloading a
    module when it is removed from the kernel due to the new uevents for
    bind/unbind. This fixes the reported regression, hopefully some year
    in the future we can drop the workaround, once users update to the
    latest version, but I'm not holding my breath.

    And then there's a build fix for a linker warning, and a buffer
    overflow fix to match the PCI fixes you took through the PCI tree in
    the same area.

    All of these have been in linux-next for a few weeks while I've been
    traveling, sorry for the delay"

    * tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    driver core: remove DRIVER_ATTR
    fpga: altera-cvp: remove DRIVER_ATTR() usage
    driver core: platform: Don't read past the end of "driver_override" buffer
    base: arch_topology: fix section mismatch build warnings
    driver core: suppress sending MODALIAS in UNBIND uevents

    Linus Torvalds
     
  • Pull char/misc fixes from Greg KH:
    "Here are a handful of char/misc driver fixes for 4.14-rc4.

    Nothing major, some binder fixups, hyperv fixes, and other tiny
    things.

    All of these have been sitting in my tree for way too long, sorry for
    the delay in getting them to you. All have been in linux-next for a
    few weeks, and despite some people's feeling about if linux-next
    actually tests things, I think it's a good "soak test" for patches"

    * tag 'char-misc-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    Drivers: hv: fcopy: restore correct transfer length
    vmbus: don't acquire the mutex in vmbus_hvsock_device_unregister()
    intel_th: pci: Add Lewisburg PCH support
    intel_th: pci: Add Cedar Fork PCH support
    stm class: Fix a use-after-free
    nvmem: add missing of_node_put() in of_nvmem_cell_get()
    nvmem: core: return EFBIG on out-of-range write
    auxdisplay: charlcd: properly restore atomic counter on error path
    binder: fix memory corruption in binder_transaction binder
    binder: fix an ret value override
    android: binder: fix type mismatch warning

    Linus Torvalds
     
  • ahci_pci_reset_controller() calls ahci_reset_controller(), which may
    fail, but ignores the result code and always returns success. This
    may result in failures like below

    ahci 0000:02:00.0: version 3.0
    ahci 0000:02:00.0: enabling device (0000 -> 0003)
    ahci 0000:02:00.0: SSS flag set, parallel bus scan disabled
    ahci 0000:02:00.0: controller reset failed (0xffffffff)
    ahci 0000:02:00.0: failed to stop engine (-5)
    ... repeated many times ...
    ahci 0000:02:00.0: failed to stop engine (-5)
    Unable to handle kernel paging request at virtual address ffff0000093f9018
    ...
    PC is at ahci_stop_engine+0x5c/0xd8 [libahci]
    LR is at ahci_deinit_port.constprop.12+0x1c/0xc0 [libahci]
    ...
    ahci_stop_engine+0x5c/0xd8 [libahci]
    ahci_deinit_port.constprop.12+0x1c/0xc0 [libahci]
    ahci_init_controller+0x80/0x168 [libahci]
    ahci_pci_init_controller+0x60/0x68 [ahci]
    ahci_init_one+0x75c/0xd88 [ahci]
    local_pci_probe+0x3c/0xb8
    pci_device_probe+0x138/0x170
    driver_probe_device+0x2dc/0x458
    __driver_attach+0x114/0x118
    bus_for_each_dev+0x60/0xa0
    driver_attach+0x20/0x28
    bus_add_driver+0x1f0/0x2a8
    driver_register+0x60/0xf8
    __pci_register_driver+0x3c/0x48
    ahci_pci_driver_init+0x1c/0x1000 [ahci]
    do_one_initcall+0x38/0x120

    where an obvious hardware-level failure results in an unnecessary
    15-second delay and a subsequent crash.

    So record the result code of ahci_reset_controller() and relay it, rather
    than ignoring it.
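
    The pattern described above can be sketched in userspace C. This is
    only an illustration, not the driver code: every demo_* name is
    hypothetical, and the hw_ok flag stands in for real hardware state.

```c
#include <errno.h>

/* Hypothetical stand-in for a low-level reset that can fail. */
static int demo_reset_controller(int hw_ok)
{
    return hw_ok ? 0 : -EIO;
}

/* Before: the result is dropped, so the caller always sees success. */
static int demo_pci_reset_ignoring(int hw_ok)
{
    demo_reset_controller(hw_ok);
    /* ...device-specific fixups would run here regardless... */
    return 0;
}

/* After: record the result and relay a failure to the caller. */
static int demo_pci_reset_relaying(int hw_ok)
{
    int rc = demo_reset_controller(hw_ok);

    if (rc)
        return rc;
    /* ...fixups run only after a successful reset... */
    return 0;
}
```

    With a failing reset, the first variant still returns 0 and lets the
    caller continue initialization, while the second returns -EIO so the
    caller can stop early.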

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Tejun Heo

    Ard Biesheuvel
     

02 Oct, 2017

6 commits

  • Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "This contains the following fixes and improvements:

    - Avoid dereferencing an unprotected VMA pointer in the fault signal
    generation code

    - Fix inline asm call constraints for GCC 4.4

    - Use existing register variable to retrieve the stack pointer
    instead of forcing the compiler to create another indirect access
    which results in excessive extra 'mov %rsp, %' instructions

    - Disable branch profiling for the memory encryption code to prevent
    an early boot crash

    - Fix a sparse warning caused by casting the __user annotation in
    __get_user_asm_u64() away

    - Fix an off by one error in the loop termination of the error path
    in the x86 sysfs init code

    - Add missing CPU IDs to various Intel specific drivers to enable the
    functionality on recent hardware

    - More (init) constification in the numachip code"
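
    As an illustration of the off-by-one item above, here is a minimal
    userspace sketch of an error path that unwinds earlier setup steps in
    reverse. The summary only says the loop termination was off by one;
    the exact bounds, the demo_* names, and the slot array are assumptions
    made for this example.

```c
#define DEMO_NR 4

/* Hypothetical bookkeeping: 1 = slot set up, 0 = torn down. */
static int demo_slot[DEMO_NR];

/*
 * Set up slots 0..fail_at-1, then pretend the next step failed and
 * unwind. Returns the number of slots left set up (leaked).
 * fail_at must not exceed DEMO_NR.
 */
static int demo_init(int fail_at, int buggy)
{
    int i, j, leaked = 0;

    for (i = 0; i < DEMO_NR; i++)
        demo_slot[i] = 0;

    for (i = 0; i < fail_at; i++)
        demo_slot[i] = 1;

    /* Error path: undo the completed steps in reverse order. */
    if (buggy) {
        for (j = i - 1; j > 0; j--)     /* stops before index 0 */
            demo_slot[j] = 0;
    } else {
        for (j = i - 1; j >= 0; j--)    /* includes index 0 */
            demo_slot[j] = 0;
    }

    for (i = 0; i < DEMO_NR; i++)
        leaked += demo_slot[i];
    return leaked;
}
```

    In this sketch the buggy bound leaves slot 0 set up whenever at least
    one step completed before the failure.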

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/asm: Use register variable to get stack pointer value
    x86/mm: Disable branch profiling in mem_encrypt.c
    x86/asm: Fix inline asm call constraints for GCC 4.4
    perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
    perf/x86/intel/rapl: Add missing CPU IDs
    perf/x86/msr: Add missing CPU IDs
    perf/x86/intel/cstate: Add missing CPU IDs
    x86: Don't cast away the __user in __get_user_asm_u64()
    x86/sysfs: Fix off-by-one error in loop termination
    x86/mm: Fix fault error path using unsafe vma pointer
    x86/numachip: Add const and __initconst to numachip2_clockevent

    Linus Torvalds
     
  • Pull timer fixes from Thomas Gleixner:
    "This adds a new timer wheel function which is required for the
    conversion of the timer callback function from the 'unsigned long
    data' argument to 'struct timer_list *timer'. This conversion has two
    benefits:

    1) It makes struct timer_list smaller

    2) Many callers hand in a pointer to the timer or to the structure
    containing the timer, which happens via type casting both at setup
    and in the callback. This change gets rid of the typecasts.

    Once the conversion is complete, which is planned for 4.15, the old
    setup function and the intermediate typecast in the new setup function
    go away along with the data field in struct timer_list.

    Merging this now into mainline allows a smooth queueing of the actual
    conversion in the affected maintainer trees without creating
    dependencies"
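
    The second benefit can be sketched in userspace C: when the callback
    receives the timer pointer itself, the containing structure is
    recovered from the embedded member instead of from a casted
    'unsigned long'. The types and demo_* names below are stand-ins
    invented for this example, not the kernel definitions.

```c
#include <stddef.h>

/* Userspace stand-in for a timer object; not the kernel's struct. */
struct demo_timer {
    void (*function)(struct demo_timer *t);
};

/* Recover the enclosing structure from a pointer to its member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_device {
    int ticks;
    struct demo_timer timer;    /* timer embedded in its owner */
};

/* New-style callback: takes the timer, derives the owner from it. */
static void demo_timeout(struct demo_timer *t)
{
    struct demo_device *dev =
        demo_container_of(t, struct demo_device, timer);

    dev->ticks++;
}

/* Setup stores only the callback; no separate data word is needed. */
static void demo_timer_setup(struct demo_timer *t,
                             void (*fn)(struct demo_timer *))
{
    t->function = fn;
}

/* Invoke the callback once, as a timer core would on expiry. */
static void demo_fire(struct demo_timer *t)
{
    t->function(t);
}
```

    Because the owner is derived from the timer's address, neither the
    setup call nor the callback needs a cast to or from an integer, and
    the timer object does not have to carry a data field.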

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    um/time: Fixup namespace collision
    timer: Prepare to change timer callback argument type

    Linus Torvalds
     
  • Pull smp/hotplug fixes from Thomas Gleixner:
    "This addresses the fallout of the new lockdep mechanism which covers
    completions in the CPU hotplug code.

    The lockdep splats are false positives, but there is no way to
    annotate that reliably. The solution is to split the completions for
    CPU up and down, which requires some reshuffling of the failure
    rollback handling as well"

    * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    smp/hotplug: Hotplug state fail injection
    smp/hotplug: Differentiate the AP completion between up and down
    smp/hotplug: Differentiate the AP-work lockdep class between up and down
    smp/hotplug: Callback vs state-machine consistency
    smp/hotplug: Rewrite AP state machine core
    smp/hotplug: Allow external multi-instance rollback
    smp/hotplug: Add state diagram

    Linus Torvalds
     
  • Pull scheduler fixes from Thomas Gleixner:
    "The scheduler pull request comes with the following updates:

    - Prevent a divide by zero issue by validating the input value of
    sysctl_sched_time_avg

    - Make task state printing consistent all over the place and have
    explicit state characters for IDLE and PARKED so they won't be
    displayed as 'D' state which confuses tools"
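
    The first item amounts to rejecting a tunable value that would later
    be used as a zero divisor. A minimal userspace sketch, assuming a
    lower bound of 1; the demo_* names and the default value are
    hypothetical and do not reflect how the kernel wires up the sysctl.

```c
#include <errno.h>

/* Hypothetical tunable that is later used as a divisor. */
static int demo_time_avg = 1000;

/* Reject values that would make the later division undefined. */
static int demo_set_time_avg(int value)
{
    if (value < 1)
        return -EINVAL;
    demo_time_avg = value;
    return 0;
}

/* Consumer of the tunable; safe because the setter enforces >= 1. */
static long demo_scale(long total)
{
    return total / demo_time_avg;
}
```

    Validating at the point where the value is written keeps every
    reader of the tunable free of its own zero check.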

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/sysctl: Check user input value of sysctl_sched_time_avg
    sched/debug: Add explicit TASK_PARKED printing
    sched/debug: Ignore TASK_IDLE for SysRq-W
    sched/debug: Add explicit TASK_IDLE printing
    sched/tracing: Use common task-state helpers
    sched/tracing: Fix trace_sched_switch task-state printing
    sched/debug: Remove unused variable
    sched/debug: Convert TASK_state to hex
    sched/debug: Implement consistent task-state printing

    Linus Torvalds
     
  • Pull perf fixes from Thomas Gleixner:

    - Prevent a division by zero in the perf aux buffer handling

    - Sync kernel headers with perf tool headers

    - Fix a build failure in the syscalltbl code

    - Make the debug messages of perf report --call-graph work correctly

    - Make sure that all required perf files are in the MANIFEST for
    container builds

    - Fix the attr.exclude_kernel handling so it respects the
    perf_event_paranoid and the user permissions

    - Make perf test on s390x work correctly

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/aux: Only update ->aux_wakeup in non-overwrite mode
    perf test: Fix vmlinux failure on s390x part 2
    perf test: Fix vmlinux failure on s390x
    perf tools: Fix syscalltbl build failure
    perf report: Fix debug messages with --call-graph option
    perf evsel: Fix attr.exclude_kernel setting for default cycles:p
    tools include: Sync kernel ABI headers with tooling headers
    perf tools: Get all of tools/{arch,include}/ in the MANIFEST

    Linus Torvalds