20 May, 2020

1 commit

  • [ Upstream commit ea0dfeb4209b4eab954d6e00ed136bc6b48b380d ]

    Recent commit 71725ed10c40 ("mm: huge tmpfs: try to split_huge_page()
    when punching hole") has allowed syzkaller to probe deeper, uncovering a
    long-standing lockdep issue between the irq-unsafe shmlock_user_lock,
    the irq-safe xa_lock on mapping->i_pages, and shmem inode's info->lock
    which nests inside xa_lock (or tree_lock) since 4.8's shmem_uncharge().

    user_shm_lock(), servicing SysV shmctl(SHM_LOCK), wants
    shmlock_user_lock while its caller shmem_lock() holds info->lock with
    interrupts disabled; but hugetlbfs_file_setup() calls user_shm_lock()
    with interrupts enabled, and might be interrupted by a writeback endio
    wanting xa_lock on i_pages.

    This may not risk an actual deadlock, since shmem inodes do not take
    part in writeback accounting, but there are several easy ways to avoid
    it.

    Requiring interrupts disabled for shmlock_user_lock would be easy, but
    it's a high-level global lock for which that seems inappropriate.
    Instead, recall that the use of info->lock to guard info->flags in
    shmem_lock() dates from pre-3.1 days, when races with SHMEM_PAGEIN and
    SHMEM_TRUNCATE could occur: nowadays it serves no purpose, the only flag
    added or removed is VM_LOCKED itself, and calls to shmem_lock() on an
    inode are already serialized by the caller.

    Take info->lock out of the chain and the possibility of deadlock or
    lockdep warning goes away.
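
    A sketch of shmem_lock() after the change, assuming the upstream shape
    of the function at the time (serialization of info->flags now relies
    solely on the caller, so info->lock is gone from the path):

        int shmem_lock(struct file *file, int lock, struct user_struct *user)
        {
                struct inode *inode = file_inode(file);
                struct shmem_inode_info *info = SHMEM_I(inode);
                int retval = -ENOMEM;

                /*
                 * What serializes the accesses to info->flags?
                 * ipc_lock_object() when called from shmctl_do_lock(),
                 * no serialization needed when called from shm_destroy().
                 */
                if (lock && !(info->flags & VM_LOCKED)) {
                        if (!user_shm_lock(inode->i_size, user))
                                goto out_nomem;
                        info->flags |= VM_LOCKED;
                        mapping_set_unevictable(file->f_mapping);
                }
                if (!lock && (info->flags & VM_LOCKED) && user) {
                        user_shm_unlock(inode->i_size, user);
                        info->flags &= ~VM_LOCKED;
                        mapping_clear_unevictable(file->f_mapping);
                }
                retval = 0;
        out_nomem:
                return retval;
        }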

    Fixes: 4595ef88d136 ("shmem: make shmem_inode_info::lock irq-safe")
    Reported-by: syzbot+c8a8197c8852f566b9d9@syzkaller.appspotmail.com
    Reported-by: syzbot+40b71e145e73f78f81ad@syzkaller.appspotmail.com
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Acked-by: Yang Shi
    Cc: Yang Shi
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004161707410.16322@eggly.anvils
    Link: https://lore.kernel.org/lkml/000000000000e5838c05a3152f53@google.com/
    Link: https://lore.kernel.org/lkml/0000000000003712b305a331d3b1@google.com/
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Hugh Dickins
     

02 May, 2020

1 commit

  • commit 94b7cc01da5a3cc4f3da5e0ff492ef008bb555d6 upstream.

    Syzbot reported the below lockdep splat:

    WARNING: possible irq lock inversion dependency detected
    5.6.0-rc7-syzkaller #0 Not tainted
    --------------------------------------------------------
    syz-executor.0/10317 just changed the state of lock:
    ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
    ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: shmem_mfill_atomic_pte+0x1012/0x21c0 mm/shmem.c:2407
    but this lock was taken by another, SOFTIRQ-safe lock in the past:
    (&(&xa->xa_lock)->rlock#5){..-.}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
    Possible interrupt unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&(&info->lock)->rlock);
                                   local_irq_disable();
                                   lock(&(&xa->xa_lock)->rlock#5);
                                   lock(&(&info->lock)->rlock);
      <Interrupt>
        lock(&(&xa->xa_lock)->rlock#5);

    *** DEADLOCK ***

    The full report is quite lengthy, please see:

    https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2004152007370.13597@eggly.anvils/T/#m813b412c5f78e25ca8c6c7734886ed4de43f241d

    This is because CPU 0 holds info->lock with IRQs enabled in the
    userfaultfd_copy path, while CPU 1 is splitting a THP, holding xa_lock
    and info->lock in IRQ-disabled context at the same time. If a softirq
    comes in on CPU 0 to acquire xa_lock, the deadlock is triggered.

    The fix is to acquire/release info->lock with *_irq version instead of
    plain spin_{lock,unlock} to make it softirq safe.
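
    A minimal sketch of the shape of the fix in shmem_mfill_atomic_pte()
    (the statements shown under the lock follow mm/shmem.c of that era and
    are assumptions here; the point is only the _irq variants):

        spin_lock_irq(&info->lock);             /* was: spin_lock() */
        info->alloced++;
        inode->i_blocks += BLOCKS_PER_PAGE;
        shmem_recalc_inode(inode);
        spin_unlock_irq(&info->lock);           /* was: spin_unlock() */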

    Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
    Reported-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com
    Signed-off-by: Yang Shi
    Signed-off-by: Andrew Morton
    Tested-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com
    Acked-by: Hugh Dickins
    Cc: Andrea Arcangeli
    Link: http://lkml.kernel.org/r/1587061357-122619-1-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Yang Shi
     

23 Jan, 2020

1 commit

  • commit 991589974d9c9ecb24ee3799ec8c415c730598a2 upstream.

    Shmem/tmpfs tries to provide THP-friendly mappings if huge pages are
    enabled. But it doesn't work well with above-47-bit hint addresses.

    Normally, the kernel doesn't create userspace mappings above 47 bits,
    even if the machine allows this (such as with 5-level paging on x86-64),
    because not all user space is ready to handle wide addresses. It's known
    that at least some JIT compilers use higher bits in pointers to encode
    their information.

    Userspace can ask for an allocation from the full address space by
    specifying a hint address (with or without MAP_FIXED) above 47 bits. If
    the application doesn't need a particular address, but wants to allocate
    from the whole address space, it can specify -1 as the hint address.

    Unfortunately, this trick breaks THP alignment in shmem/tmpfs:
    shmem_get_unmapped_area() would not try to allocate a PMD-aligned area
    if *any* hint address is specified.

    This can be fixed by requesting the aligned area if we failed to
    allocate at the user-specified hint address. The request with inflated
    length will also take the user-specified hint address. This way we will
    not lose an allocation request from the full address space.
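
    A diff-style sketch of the described logic in shmem_get_unmapped_area()
    (the context line is an assumption from mm/shmem.c of that era):

        addr = get_area(file, uaddr, len, pgoff, flags);
    -   if (uaddr)          /* before: any hint disabled the PMD-aligned retry */
    +   if (uaddr == addr)  /* after: only a hint that was actually satisfied does */
                return addr;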

    [kirill@shutemov.name: fold in a fixup]
    Link: http://lkml.kernel.org/r/20191223231309.t6bh5hkbmokihpfu@box
    Link: http://lkml.kernel.org/r/20191220142548.7118-3-kirill.shutemov@linux.intel.com
    Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace")
    Signed-off-by: Kirill A. Shutemov
    Cc: "Willhalm, Thomas"
    Cc: Dan Williams
    Cc: "Bruggeman, Otto G"
    Cc: "Aneesh Kumar K . V"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     

09 Jan, 2020

1 commit

  • [ Upstream commit 8897c1b1a1795cab23d5ac13e4e23bf0b5f4e0c6 ]

    syzbot found the following crash:

    BUG: KASAN: use-after-free in perf_trace_lock_acquire+0x401/0x530 include/trace/events/lock.h:13
    Read of size 8 at addr ffff8880a5cf2c50 by task syz-executor.0/26173

    CPU: 0 PID: 26173 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #146
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    perf_trace_lock_acquire+0x401/0x530 include/trace/events/lock.h:13
    trace_lock_acquire include/trace/events/lock.h:13 [inline]
    lock_acquire+0x2de/0x410 kernel/locking/lockdep.c:4411
    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
    _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
    spin_lock include/linux/spinlock.h:338 [inline]
    shmem_fault+0x5ec/0x7b0 mm/shmem.c:2034
    __do_fault+0x111/0x540 mm/memory.c:3083
    do_shared_fault mm/memory.c:3535 [inline]
    do_fault mm/memory.c:3613 [inline]
    handle_pte_fault mm/memory.c:3840 [inline]
    __handle_mm_fault+0x2adf/0x3f20 mm/memory.c:3964
    handle_mm_fault+0x1b5/0x6b0 mm/memory.c:4001
    do_user_addr_fault arch/x86/mm/fault.c:1441 [inline]
    __do_page_fault+0x536/0xdd0 arch/x86/mm/fault.c:1506
    do_page_fault+0x38/0x590 arch/x86/mm/fault.c:1530
    page_fault+0x39/0x40 arch/x86/entry/entry_64.S:1202

    It happens if the VMA got unmapped under us while we dropped mmap_sem
    and the inode got freed.

    Pinning the file if we drop mmap_sem fixes the issue.
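
    A sketch of the shape of the fix in shmem_fault(), assuming the
    maybe_unlock_mmap_for_io() helper the page-cache fault path used at the
    time (the hole-punch wait itself is omitted):

        struct file *fpin;

        /* Pin the file before dropping mmap_sem: the vma, and with it
         * the inode, may be gone by the time we wake up. */
        fpin = maybe_unlock_mmap_for_io(vmf, NULL);
        if (fpin)
                ret = VM_FAULT_RETRY;

        /* ... wait for the hole-punch range to be released ... */

        if (fpin)
                fput(fpin);
        return ret;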

    Link: http://lkml.kernel.org/r/20190927083908.rhifa4mmaxefc24r@box
    Signed-off-by: Kirill A. Shutemov
    Reported-by: syzbot+03ee87124ee05af991bd@syzkaller.appspotmail.com
    Acked-by: Johannes Weiner
    Reviewed-by: Matthew Wilcox (Oracle)
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Josef Bacik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Kirill A. Shutemov
     

18 Dec, 2019

2 commits

  • commit aa71ecd8d86500da6081a72da6b0b524007e0627 upstream.

    On a 64-bit system, sb->s_maxbytes of the shmem filesystem is
    MAX_LFS_FILESIZE, which equals LLONG_MAX.

    If offset > LLONG_MAX - PAGE_SIZE while offset + len < LLONG_MAX, the
    request passes the check in vfs_fallocate:

    /* Check for wrap through zero too */
    if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
            return -EFBIG;

    but loff_t unmap_start = round_up(offset, PAGE_SIZE) in shmem_fallocate
    then overflows.

    Syzkaller reports an overflow problem in mm/shmem:

    UBSAN: Undefined behaviour in mm/shmem.c:2014:10
    signed integer overflow: '9223372036854775807 + 1' cannot be represented in type 'long long int'
    CPU: 0 PID:17076 Comm: syz-executor0 Not tainted 4.1.46+ #1
    Hardware name: linux, dummy-virt (DT)
    Call trace:
    dump_backtrace+0x0/0x2c8 arch/arm64/kernel/traps.c:100
    show_stack+0x20/0x30 arch/arm64/kernel/traps.c:238
    __dump_stack lib/dump_stack.c:15 [inline]
    ubsan_epilogue+0x18/0x70 lib/ubsan.c:164
    handle_overflow+0x158/0x1b0 lib/ubsan.c:195
    shmem_fallocate+0x6d0/0x820 mm/shmem.c:2104
    vfs_fallocate+0x238/0x428 fs/open.c:312
    SYSC_fallocate fs/open.c:335 [inline]
    SyS_fallocate+0x54/0xc8 fs/open.c:239

    Because unmap_start has overflowed to a negative value, its sign bit 1
    is replicated into the high bits when calculating shmem_falloc.start:

    shmem_falloc.start = unmap_start >> PAGE_SHIFT.

    Fix it by casting unmap_start to u64 before the right shift.
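
    The one-line shape of the fix as described (shifting as u64 keeps the
    overflowed sign bit from being replicated into the high bits):

        shmem_falloc.start = (u64)unmap_start >> PAGE_SHIFT;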

    This bug is found in LTS Linux 4.1. It also seems to exist in mainline.

    Link: http://lkml.kernel.org/r/1573867464-5107-1-git-send-email-chenjun102@huawei.com
    Signed-off-by: Chen Jun
    Reviewed-by: Andrew Morton
    Cc: Hugh Dickins
    Cc: Qian Cai
    Cc: Kefeng Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Chen Jun
     
  • commit 05d351102dbe4e103d6bdac18b1122cd3cd04925 upstream.

    F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
    A private mapping created after the memfd file that gets sealed with
    F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
    children and parent share the same memory, even though the mapping is
    private.

    The reason for this is due to the code below:

    static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
    {
            struct shmem_inode_info *info = SHMEM_I(file_inode(file));

            if (info->seals & F_SEAL_FUTURE_WRITE) {
                    /*
                     * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
                     * "future write" seal active.
                     */
                    if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
                            return -EPERM;

                    /*
                     * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
                     * read-only mapping, take care to not allow mprotect to revert
                     * protections.
                     */
                    vma->vm_flags &= ~(VM_MAYWRITE);
            }
            ...
    }

    And for the mm to know if a mapping is copy-on-write:

    static inline bool is_cow_mapping(vm_flags_t flags)
    {
            return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
    }

    The patch fixes the issue by making the mprotect revert protection
    happen only for shared mappings. For private mappings, using mprotect
    will have no effect on the seal behavior.
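
    A sketch of the fix as described, clearing VM_MAYWRITE only for shared
    mappings so private ones stay copy-on-write writable:

        /*
         * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
         * MAP_SHARED read-only, take care not to allow mprotect to
         * revert protections on such mappings. Do this only for
         * shared mappings: private mappings must keep VM_MAYWRITE.
         */
        if (vma->vm_flags & VM_SHARED)
                vma->vm_flags &= ~(VM_MAYWRITE);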

    The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so v5.3.x stable
    kernels would need a backport.

    [akpm@linux-foundation.org: reflow comment, per Christoph]
    Link: http://lkml.kernel.org/r/20191107195355.80608-1-joel@joelfernandes.org
    Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
    Signed-off-by: Nicolas Geoffray
    Signed-off-by: Joel Fernandes (Google)
    Cc: Hugh Dickins
    Cc: Shuah Khan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Geoffray
     

10 Oct, 2019

2 commits


29 Sep, 2019

2 commits

  • Merge hugepage allocation updates from David Rientjes:
    "We (mostly Linus, Andrea, and myself) have been discussing offlist how
    to implement a sane default allocation strategy for hugepages on NUMA
    platforms.

    With these reverts in place, the page allocator will happily allocate
    a remote hugepage immediately rather than try to make a local hugepage
    available. This incurs a substantial performance degradation when
    memory compaction would have otherwise made a local hugepage
    available.

    This series reverts those reverts and attempts to propose a more sane
    default allocation strategy specifically for hugepages. Andrea
    acknowledges this is likely to fix the swap storms that he originally
    reported that resulted in the patches that removed __GFP_THISNODE from
    hugepage allocations.

    The immediate goal is to return 5.3 to the behavior the kernel has
    implemented over the past several years so that remote hugepages are
    not immediately allocated when local hugepages could have been made
    available because the increased access latency is untenable.

    The next goal is to introduce a sane default allocation strategy for
    hugepages allocations in general regardless of the configuration of
    the system so that we prevent thrashing of local memory when
    compaction is unlikely to succeed and can prefer remote hugepages over
    remote native pages when the local node is low on memory."

    Note on timing: this reverts the hugepage VM behavior changes that got
    introduced fairly late in the 5.3 cycle, and that fixed a huge
    performance regression for certain loads that had been around since
    4.18.

    Andrea had this note:

    "The regression of 4.18 was that it was taking hours to start a VM
    where 3.10 was only taking a few seconds, I reported all the details
    on lkml when it was finally tracked down in August 2018.

    https://lore.kernel.org/linux-mm/20180820032640.9896-2-aarcange@redhat.com/

    __GFP_THISNODE in MADV_HUGEPAGE made the above enterprise vfio
    workload degrade like in the "current upstream" above. And it still
    would have been that bad as above until 5.3-rc5"

    where the bad behavior ends up happening as you fill up a local node,
    and without that change, you'd get into the nasty swap storm behavior
    due to compaction working overtime to make room for more memory on the
    nodes.

    As a result 5.3 got the two performance fix reverts in rc5.

    However, David Rientjes then noted that those performance fixes in turn
    regressed performance for other loads - although not quite to the same
    degree. He suggested reverting the reverts and instead replacing them
    with two small changes to how hugepage allocations are done (patch
    descriptions rephrased by me):

    - "avoid expensive reclaim when compaction may not succeed": just admit
    that the allocation failed when you're trying to allocate a huge-page
    and compaction wasn't successful.

    - "allow hugepage fallback to remote nodes when madvised": when that
    node-local huge-page allocation failed, retry without forcing the
    local node.

    but by then I judged it too late to replace the fixes for a 5.3 release.
    So 5.3 was released with behavior that harked back to the pre-4.18 logic.

    But now we're in the merge window for 5.4, and we can see if this
    alternate model fixes not just the horrendous swap storm behavior, but
    also restores the performance regression that the late reverts caused.

    Fingers crossed.

    * emailed patches from David Rientjes :
    mm, page_alloc: allow hugepage fallback to remote nodes when madvised
    mm, page_alloc: avoid expensive reclaim when compaction may not succeed
    Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""
    Revert "Revert "mm, thp: restore node-local hugepage allocations""

    Linus Torvalds
     
  • This reverts commit 92717d429b38e4f9f934eed7e605cc42858f1839.

    Since commit a8282608c88e ("Revert "mm, thp: restore node-local hugepage
    allocations"") is reverted in this series, it is better to restore the
    previous 5.2 behavior between the thp allocation and the page allocator
    rather than to attempt any consolidation or cleanup for a policy that is
    now reverted. It's less risky during an rc cycle and subsequent patches
    in this series further modify the same policy that the pre-5.3 behavior
    implements.

    Consolidation and cleanup can be done subsequent to a sane default page
    allocation strategy, so this patch reverts a cleanup done on a strategy
    that is now reverted and thus is the least risky option.

    Signed-off-by: David Rientjes
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Vlastimil Babka
    Cc: Stefan Priebe - Profihost AG
    Cc: "Kirill A. Shutemov"
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

25 Sep, 2019

3 commits

  • Replace "fault_mm" with "vmf" in code comment because commit cfda05267f7b
    ("userfaultfd: shmem: add userfaultfd hook for shared memory faults") has
    changed the prototpye of shmem_getpage_gfp() - pass vmf instead of
    fault_mm to the function.

    Before:
    static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
    struct page **pagep, enum sgp_type sgp,
    gfp_t gfp, struct mm_struct *fault_mm, int *fault_type);
    After:
    static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
    struct page **pagep, enum sgp_type sgp,
    gfp_t gfp, struct vm_area_struct *vma,
    struct vm_fault *vmf, vm_fault_t *fault_type);

    Link: http://lkml.kernel.org/r/20190816100204.9781-1-miles.chen@mediatek.com
    Signed-off-by: Miles Chen
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miles Chen
     
  • Transparent Huge Pages are currently stored in i_pages as pointers to
    consecutive subpages. This patch changes that to storing consecutive
    pointers to the head page in preparation for storing huge pages more
    efficiently in i_pages.

    Large parts of this are "inspired" by Kirill's patch
    https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

    Kirill and Huang Ying contributed several fixes.
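
    A sketch of the lookup convention after this change; find_subpage() is
    the helper the patch introduces (body shown as in the patch, modulo
    detail):

        /* i_pages now yields the head page; callers derive the subpage */
        static inline struct page *find_subpage(struct page *head, pgoff_t index)
        {
                /* HugeTLBfs wants the head page regardless */
                if (PageHuge(head))
                        return head;

                return head + (index & (compound_nr(head) - 1));
        }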

    [willy@infradead.org: use compound_nr, squish uninit-var warning]
    Link: http://lkml.kernel.org/r/20190731210400.7419-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jan Kara
    Reviewed-by: Kirill Shutemov
    Reviewed-by: Song Liu
    Tested-by: Song Liu
    Tested-by: William Kucharski
    Reviewed-by: William Kucharski
    Tested-by: Qian Cai
    Tested-by: Mikhail Gavrilov
    Cc: Hugh Dickins
    Cc: Chris Wilson
    Cc: Song Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Replace 1 << compound_order(page) with compound_nr(page). Minor
    improvements in readability.
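
    The two forms are equivalent; compound_nr() reads the stored compound
    order and converts it to a page count in one step:

        unsigned long nr;

        nr = 1UL << compound_order(page);       /* before */
        nr = compound_nr(page);                 /* after: same value */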

    Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

13 Sep, 2019

5 commits

  • Convert the ramfs, shmem, tmpfs, devtmpfs and rootfs filesystems to the new
    internal mount API as the old one will be obsoleted and removed. This
    allows greater flexibility in communication of mount parameters between
    userspace, the VFS and the filesystem.

    See Documentation/filesystems/mount_api.txt for more information.

    Note that tmpfs is slightly tricky as it can contain embedded commas, so it
    can't be trivially split up using strsep() to break on commas in
    generic_parse_monolithic(). Instead, tmpfs has to supply its own generic
    parser.

    However, if tmpfs changes, then devtmpfs and rootfs, which are wrappers
    around tmpfs or ramfs, must change too - and thus so must ramfs, so these
    had to be converted also.

    [AV: rewritten]

    Signed-off-by: David Howells
    cc: Hugh Dickins
    cc: linux-mm@kvack.org
    Signed-off-by: Al Viro

    David Howells
     
  • This thing will eventually become our ->parse_param(), while
    shmem_parse_options() - ->parse_monolithic(). At that point
    shmem_parse_options() will start calling vfs_parse_fs_string(),
    rather than calling shmem_parse_one() directly.

    Signed-off-by: Al Viro

    Al Viro
     
  • mechanical move.

    Signed-off-by: Al Viro

    Al Viro
     
  • just use ctx->mpol (note that callers always set ctx->mpol to NULL when
    calling that).

    Signed-off-by: Al Viro

    Al Viro
     
    ... and copy the data from it into sbinfo in the callers.
    For use by remount we need to keep track of whether there'd
    been options setting max_inodes, max_blocks and huge resp.,
    and do the sanity checks (and copying) only if such options
    had been seen. uid/gid/mode is ignored by remount and
    NULL mpol is already explicitly treated as "ignore it",
    so we don't need to keep track of those.

    Note: theoretically, mpol_parse_string() may return NULL
    not in case of error (for default policy), so the assumption
    that NULL mpol means "change nothing" is incorrect. However,
    that's the mainline behaviour and any changes belong in
    a separate patch. If we go for that, we'll need to keep
    track of having encountered mpol= option too.

    [changes in remount logics from Hugh Dickins folded]

    Signed-off-by: Al Viro

    Al Viro
     

06 Sep, 2019

1 commit


14 Aug, 2019

1 commit

  • Patch series "reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings".

    The fixes for what was originally reported as "pathological THP
    behavior" were rightfully reverted, to be sure not to introduce
    regressions at the end of a merge window after a severe regression
    report from the kernel bot. We can safely re-apply them now that we
    have had time to analyze the problem.

    The mm process worked fine, because the good fixes were eventually
    committed upstream without excessive delay.

    The regression reported by the kernel bot however forced us to revert
    the good fixes, to be sure not to introduce regressions and to give us
    the time to analyze the issue further. The silver lining is that this
    extra time allowed us to think more about the issue and also to plan a
    future direction for further improving THP NUMA locality.

    This patch (of 2):

    This reverts commit 356ff8a9a78fb35d ("Revert "mm, thp: consolidate THP
    gfp handling into alloc_hugepage_direct_gfpmask"). So it reapplies
    89c83fb539f954 ("mm, thp: consolidate THP gfp handling into
    alloc_hugepage_direct_gfpmask").

    Consolidating the THP allocation flags in one place was meant to be a
    cleanup, making it easier to handle otherwise scattered code that
    imposes a maintenance burden. There were no real problems observed
    with the gfp mask consolidation, but the reversion was rushed through
    without a larger consensus regardless.

    This patch brings the consolidation back because this should make the
    long term maintainability easier as well as it should allow future
    changes to be less error prone.

    [mhocko@kernel.org: changelog additions]
    Link: http://lkml.kernel.org/r/20190503223146.2312-2-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Zi Yan
    Cc: Stefan Priebe - Profihost AG
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

19 Jul, 2019

1 commit

    Commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
    vma") introduced a THPeligible bit in processes' smaps. But, when
    checking the eligibility of a shmem vma, __transparent_hugepage_enabled()
    is called and overrides the result from shmem_huge_enabled(). It may
    result in the anonymous vma's THP flag overriding shmem's. For example,
    running a simple test which creates THP for shmem, but with anonymous
    THP disabled, reading the process's smaps may show:

    7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
    Size: 4096 kB
    ...
    [snip]
    ...
    ShmemPmdMapped: 4096 kB
    ...
    [snip]
    ...
    THPeligible: 0

    And, /proc/meminfo does show THP allocated and PMD mapped too:

    ShmemHugePages: 4096 kB
    ShmemPmdMapped: 4096 kB

    This doesn't make much sense. The shmem objects should be treated
    separately from anonymous THP. Calling shmem_huge_enabled() with an
    MMF_DISABLE_THP check sounds good enough. And we can skip the stack
    and dax vma checks since we have already established that the vma is
    shmem.

    Also check whether the vma is suitable for THP by calling
    transhuge_vma_suitable().

    And minor fix to smaps output format and documentation.
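
    A sketch of the described check order (names follow mm of that era and
    are assumptions here; only the shape of the dispatch is shown):

        if (!transhuge_vma_suitable(vma, addr))
                return false;
        if (vma_is_anonymous(vma))
                return __transparent_hugepage_enabled(vma);
        if (vma_is_shmem(vma))
                return shmem_huge_enabled(vma); /* shmem's own knobs decide */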

    Link: http://lkml.kernel.org/r/1560401041-32207-3-git-send-email-yang.shi@linux.alibaba.com
    Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
    Signed-off-by: Yang Shi
    Acked-by: Hugh Dickins
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

17 Jul, 2019

1 commit

  • When CONFIG_SYSFS is disabled but CONFIG_TMPFS is enabled, we get a
    warning about shmem_parse_huge() never being called:

    mm/shmem.c:417:12: error: unused function 'shmem_parse_huge' [-Werror,-Wunused-function]
    static int shmem_parse_huge(const char *str)

    Change the #ifdef so we no longer build this function in that configuration.

    Link: http://lkml.kernel.org/r/20190712091141.673355-1-arnd@arndb.de
    Fixes: 144df3b288c4 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
    Signed-off-by: Arnd Bergmann
    Cc: Hugh Dickins
    Cc: Arnd Bergmann
    Cc: David Howells
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Vineeth Remanan Pillai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

06 Jul, 2019

1 commit

  • This reverts commit 5fd4ca2d84b249f0858ce28cf637cf25b61a398f.

    Mikhail Gavrilov reports that it causes the VM_BUG_ON_PAGE() in
    __delete_from_swap_cache() to trigger:

    page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689 index:0xfecec363
    anon
    flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
    raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689
    raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000
    page dumped because: VM_BUG_ON_PAGE(entry != page)
    page->mem_cgroup:ffff978433ace000
    ------------[ cut here ]------------
    kernel BUG at mm/swap_state.c:170!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1
    Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2202 04/11/2019
    RIP: 0010:__delete_from_swap_cache+0x20d/0x240
    Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff 0b 48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f
    RSP: 0018:ffffa982036e7980 EFLAGS: 00010046
    RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900
    RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535
    R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120
    R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000
    FS: 0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0
    Call Trace:
    delete_from_swap_cache+0x46/0xa0
    try_to_free_swap+0xbc/0x110
    swap_writepage+0x13/0x70
    pageout.isra.0+0x13c/0x350
    shrink_page_list+0xc14/0xdf0
    shrink_inactive_list+0x1e5/0x3c0
    shrink_node_memcg+0x202/0x760
    shrink_node+0xe0/0x470
    balance_pgdat+0x2d1/0x510
    kswapd+0x220/0x420
    kthread+0xfb/0x130
    ret_from_fork+0x22/0x40

    and it's not immediately obvious why it happens. It's too late in the
    rc cycle to do anything but revert for now.

    Link: https://lore.kernel.org/lkml/CABXGCsN9mYmBD-4GaaeW_NrDu+FDXLzr_6x+XNxfmFV6QkYCDg@mail.gmail.com/
    Reported-and-bisected-by: Mikhail Gavrilov
    Suggested-by: Jan Kara
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Kirill Shutemov
    Cc: William Kucharski
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

05 Jul, 2019

1 commit

    No point having two call sites (earlier in init_rootfs() from
    mnt_init() in case we are going to use shmem-style rootfs,
    later from do_basic_setup() unconditionally), along with the
    logic in shmem_init() itself to make the second call a no-op...

    Signed-off-by: Al Viro

    Al Viro
     

15 May, 2019

1 commit

  • Transparent Huge Pages are currently stored in i_pages as pointers to
    consecutive subpages. This patch changes that to storing consecutive
    pointers to the head page in preparation for storing huge pages more
    efficiently in i_pages.

    Large parts of this are "inspired" by Kirill's patch
    https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

    [willy@infradead.org: fix swapcache pages]
    Link: http://lkml.kernel.org/r/20190324155441.GF10344@bombadil.infradead.org
    [kirill@shutemov.name: hugetlb stores pages in page cache differently]
    Link: http://lkml.kernel.org/r/20190404134553.vuvhgmghlkiw2hgl@kshutemo-mobl1
    Link: http://lkml.kernel.org/r/20190307153051.18815-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jan Kara
    Reviewed-by: Kirill Shutemov
    Reviewed-and-tested-by: Song Liu
    Tested-by: William Kucharski
    Reviewed-by: William Kucharski
    Tested-by: Qian Cai
    Cc: Hugh Dickins
    Cc: Song Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

08 May, 2019

1 commit

  • Pull vfs inode freeing updates from Al Viro:
    "Introduction of separate method for RCU-delayed part of
    ->destroy_inode() (if any).

    Pretty much as posted, except that destroy_inode() stashes
    ->free_inode into the victim (anon-unioned with ->i_fops) before
    scheduling i_callback() and the last two patches (sockfs conversion
    and folding struct socket_wq into struct socket) are excluded - that
    pair should go through netdev once davem reopens his tree"

    * 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
    orangefs: make use of ->free_inode()
    shmem: make use of ->free_inode()
    hugetlb: make use of ->free_inode()
    overlayfs: make use of ->free_inode()
    jfs: switch to ->free_inode()
    fuse: switch to ->free_inode()
    ext4: make use of ->free_inode()
    ecryptfs: make use of ->free_inode()
    ceph: use ->free_inode()
    btrfs: use ->free_inode()
    afs: switch to use of ->free_inode()
    dax: make use of ->free_inode()
    ntfs: switch to ->free_inode()
    securityfs: switch to ->free_inode()
    apparmor: switch to ->free_inode()
    rpcpipe: switch to ->free_inode()
    bpf: switch to ->free_inode()
    mqueue: switch to ->free_inode()
    ufs: switch to ->free_inode()
    coda: switch to ->free_inode()
    ...

    Linus Torvalds
     

02 May, 2019

1 commit


20 Apr, 2019

2 commits

  • The igrab() in shmem_unuse() looks good, but we forgot that it gives no
    protection against concurrent unmounting: a point made by Konstantin
    Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893ae78
    ("tmpfs: fix race between umount and swapoff"). The current 5.1-rc
    swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs.
    Self-destruct in 5 seconds. Have a nice day..." followed by GPF.

    Once again, give up on using igrab(); but don't go back to making such
    heavy-handed use of shmem_swaplist_mutex as last time: that would spoil
    the new design, and I expect could deadlock inside shmem_swapin_page().

    Instead, shmem_unuse() just raises a "stop_eviction" count in the shmem-
    specific inode, and shmem_evict_inode() waits for that to go down to 0.
    Call it "stop_eviction" rather than "swapoff_busy" because it can be put
    to use for others later (huge tmpfs patches expect to use it).

    That simplifies shmem_unuse(), protecting it from both unlink and
    unmount; and in practice lets it locate all the swap in its first try.
    But do not rely on that: there's still a theoretical case, when
    shmem_writepage() might have been preempted after its get_swap_page(),
    before making the swap entry visible to swapoff.
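
    A sketch of the scheme, assuming the atomic counter and var-wait
    helpers the description implies:

        /* shmem_unuse(), while working on one inode: */
        atomic_inc(&info->stop_eviction);
        /* ... swap this inode's pages back in ... */
        if (atomic_dec_and_test(&info->stop_eviction))
                wake_up_var(&info->stop_eviction);

        /* shmem_evict_inode(), before tearing the inode down: */
        wait_var_event(&info->stop_eviction,
                       !atomic_read(&info->stop_eviction));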

    [hughd@google.com: remove incorrect list_del()]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904091133570.1898@eggly.anvils
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081259400.1523@eggly.anvils
    Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity")
    Signed-off-by: Hugh Dickins
    Cc: "Alex Xu (Hello71)"
    Cc: Huang Ying
    Cc: Kelley Nielsen
    Cc: Konstantin Khlebnikov
    Cc: Rik van Riel
    Cc: Vineeth Pillai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Swapfile "type" was passed all the way down to shmem_unuse_inode(), but
    then forgotten from shmem_find_swap_entries(): with the result that
    removing one swapfile would try to free up all the swap from shmem - no
    problem when only one swapfile anyway, but counter-productive when more,
    causing swapoff to be unnecessarily OOM-killed when it should succeed.

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081254470.1523@eggly.anvils
    Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity")
    Signed-off-by: Hugh Dickins
    Cc: Konstantin Khlebnikov
    Cc: "Alex Xu (Hello71)"
    Cc: Vineeth Pillai
    Cc: Kelley Nielsen
    Cc: Rik van Riel
    Cc: Huang Ying
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

06 Mar, 2019

3 commits

  • Android uses ashmem for sharing memory regions. We are looking forward
    to migrating all usecases of ashmem to memfd so that we can possibly
    remove the ashmem driver in the future from staging while also
    benefiting from using memfd and contributing to it. Note that staging
    drivers are also not ABI and generally can be removed at any time.

    One of the main usecases Android has is the ability to create a region
    and mmap it as writeable, then add protection against making any
    "future" writes while keeping the existing already mmap'ed
    writeable-region active. This allows us to implement a usecase where
    receivers of the shared memory buffer can get a read-only view, while
    the sender continues to write to the buffer. See CursorWindow
    documentation in Android for more details:

    https://developer.android.com/reference/android/database/CursorWindow

    This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
    To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
    which prevents any future mmap and write syscalls from succeeding while
    keeping the existing mmap active.
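
    A runnable userspace sketch of that usecase (the fallback #define
    covers pre-5.1 headers; error handling trimmed for brevity):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #ifndef F_SEAL_FUTURE_WRITE
        #define F_SEAL_FUTURE_WRITE 0x0010      /* uapi value, for old headers */
        #endif

        int main(void)
        {
                int fd = memfd_create("cursor-window",
                                      MFD_CLOEXEC | MFD_ALLOW_SEALING);
                if (fd < 0 || ftruncate(fd, 4096) < 0)
                        return 1;

                /* A writable mapping taken *before* sealing stays writable. */
                char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);

                if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) < 0)
                        perror("F_ADD_SEALS");

                strcpy(buf, "sender keeps writing");  /* still fine */

                /* But a new writable shared mapping is refused (EPERM). */
                if (mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0) == MAP_FAILED)
                        perror("mmap after seal");    /* expected */
                return 0;
        }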

    A better way to implement the F_SEAL_FUTURE_WRITE seal was discussed [1]
    last week, where we don't need to modify core VFS structures to get the
    same behavior of the seal. This solves several side effects pointed out
    by Andy. Self-tests are provided in a later patch to verify the expected
    semantics.

    [1] https://lore.kernel.org/lkml/20181111173650.GA256781@google.com/

    Thanks a lot to Andy for suggestions to improve code.

    Link: http://lkml.kernel.org/r/20190112203816.85534-2-joel@joelfernandes.org
    Signed-off-by: Joel Fernandes (Google)
    Acked-by: John Stultz
    Cc: Andy Lutomirski
    Cc: Minchan Kim
    Cc: Jann Horn
    Cc: Al Viro
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: J. Bruce Fields
    Cc: Jeff Layton
    Cc: Marc-André Lureau
    Cc: Matthew Wilcox
    Cc: Mike Kravetz
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joel Fernandes (Google)
     
  • This patch was initially posted by Kelley Nielsen. Reposting the patch
    with all review comments addressed and with minor modifications and
    optimizations. Also, folding in the fixes offered by Hugh Dickins and
    Huang Ying. Tests were rerun and commit message updated with new
    results.

    try_to_unuse() is of quadratic complexity, with a lot of wasted effort.
    It unuses swap entries one by one, potentially iterating over all the
    page tables for all the processes in the system for each one.

    This new proposed implementation of try_to_unuse simplifies its
    complexity to linear. It iterates over the system's mms once, unusing
    all the affected entries as it walks each set of page tables. It also
    makes similar changes to shmem_unuse.

    Improvement

    swapoff was called on a swap partition containing about 6G of data, in a
    VM(8cpu, 16G RAM), and calls to unuse_pte_range() were counted.

    Present implementation....about 1200M calls(8min, avg 80% cpu util).
    Prototype.................about 9.0K calls(3min, avg 5% cpu util).

    Details

    In shmem_unuse(), iterate over the shmem_swaplist and, for each
    shmem_inode_info that contains a swap entry, pass it to
    shmem_unuse_inode(), along with the swap type. In shmem_unuse_inode(),
    iterate over its associated xarray, and store the index and value of
    each swap entry in an array for passing to shmem_swapin_page() outside
    of the RCU critical section.

    In try_to_unuse(), instead of iterating over the entries in the type and
    unusing them one by one, perhaps walking all the page tables for all the
    processes for each one, iterate over the mmlist, making one pass. Pass
    each mm to unuse_mm() to begin its page table walk, and during the walk,
    unuse all the ptes that have backing store in the swap type received by
    try_to_unuse(). After the walk, check the type for orphaned swap
    entries with find_next_to_unuse(), and remove them from the swap cache.
    If find_next_to_unuse() starts over at the beginning of the type, repeat
    the check of the shmem_swaplist and the walk a maximum of three times.

    Change unuse_mm() and the intervening walk functions down to
    unuse_pte_range() to take the type as a parameter, and to iterate over
    their entire range, calling the next function down on every iteration.
    In unuse_pte_range(), make a swap entry from each pte in the range using
    the passed in type. If it has backing store in the type, call
    swapin_readahead() to retrieve the page and pass it to unuse_pte().

    Pass the count of pages_to_unuse down the page table walks in
    try_to_unuse(), and return from the walk when the desired number of
    pages has been swapped back in.
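
    A structural sketch of the shmem side of that single pass (simplified:
    the real code drops shmem_swaplist_mutex around the per-inode work and
    restarts the walk safely):

        list_for_each_entry(info, &shmem_swaplist, swaplist) {
                if (!info->swapped)
                        continue;       /* no swap entries in this inode */
                error = shmem_unuse_inode(&info->vfs_inode, type);
                if (error)
                        break;
        }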

    Link: http://lkml.kernel.org/r/20190114153129.4852-2-vpillai@digitalocean.com
    Signed-off-by: Vineeth Remanan Pillai
    Signed-off-by: Kelley Nielsen
    Signed-off-by: Huang Ying
    Acked-by: Hugh Dickins
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vineeth Remanan Pillai
     
    The swapin logic can be reused independently of the rest of the logic in
    shmem_getpage_gfp(). So let's refactor it out as an independent function.

    Link: http://lkml.kernel.org/r/20190114153129.4852-1-vpillai@digitalocean.com
    Signed-off-by: Vineeth Remanan Pillai
    Reviewed-by: Andrew Morton
    Cc: Huang Ying
    Cc: Hugh Dickins
    Cc: Kelley Nielsen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vineeth Remanan Pillai
     

26 Feb, 2019

1 commit

  • When we made the shmem_reserve_inode call in shmem_link conditional, we
    forgot to update the declaration for ret so that it always has a known
    value. Dan Carpenter pointed out this deficiency in the original patch.
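
    The shape of the fix as described:

        -       int ret;
        +       int ret = 0;    /* known value even when reservation is skipped */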

    Fixes: 1062af920c07 ("tmpfs: fix link accounting when a tmpfile is linked in")
    Reported-by: Dan Carpenter
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Hugh Dickins
    Cc: Matej Kupljen
    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     

22 Feb, 2019

1 commit

  • tmpfs has a peculiarity of accounting hard links as if they were
    separate inodes: so that when the number of inodes is limited, as it is
    by default, a user cannot soak up an unlimited amount of unreclaimable
    dcache memory just by repeatedly linking a file.

    But when v3.11 added O_TMPFILE, and the ability to use linkat() on the
    fd, we missed accommodating this new case in tmpfs: "df -i" shows that
    an extra "inode" remains accounted after the file is unlinked and the fd
    closed and the actual inode evicted. If a user repeatedly links
    tmpfiles into a tmpfs, the limit will be hit (ENOSPC) even after they
    are deleted.

    Just skip the extra reservation from shmem_link() in this case: there's
    a sense in which this first link of a tmpfile is then cheaper than a
    hard link of another file, but the accounting works out, and there's
    still good limiting, so no need to do anything more complicated.
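
    A sketch of the described change in shmem_link(); the i_nlink test
    identifies the first link of a tmpfile, whose own inode reservation is
    still held:

        if (inode->i_nlink) {
                ret = shmem_reserve_inode(inode->i_sb);
                if (ret)
                        goto out;
        }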

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182134370.7035@eggly.anvils
    Fixes: f4e0c30c191 ("allow the temp files created by open() to be linked to")
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Hugh Dickins
    Reported-by: Matej Kupljen
    Acked-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     

29 Dec, 2018

2 commits

    totalram_pages and totalhigh_pages are made static inline functions.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemed
    better to remove the lock and convert the variables to atomic, with
    preventing potential store-to-read tearing as a bonus.
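
    A sketch of the conversion (names follow the series description):

        extern atomic_long_t _totalram_pages;

        static inline unsigned long totalram_pages(void)
        {
                return (unsigned long)atomic_long_read(&_totalram_pages);
        }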

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Suggested-by: Michal Hocko
    Suggested-by: Vlastimil Babka
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     
  • Patch series "mm: convert totalram_pages, totalhigh_pages and managed
    pages to atomic", v5.

    This series converts totalram_pages, totalhigh_pages and
    zone->managed_pages to atomic variables.

    totalram_pages, zone->managed_pages and totalhigh_pages updates are
    protected by managed_page_count_lock, but readers never care about it.
    Convert these variables to atomic to avoid readers potentially seeing a
    store tear.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemed better
    to remove the lock and convert the variables to atomic. With the change,
    preventing potential store-to-read tearing comes as a bonus.

    This patch (of 4):

    This is in preparation for a later patch which converts totalram_pages
    and zone->managed_pages to atomic variables. Please note that re-reading
    the value might yield a different result and as such could lead to
    unexpected behavior. There are no known bugs as a result of the current
    code, but it is better to prevent them in principle.

    Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Reviewed-by: Pavel Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     

26 Dec, 2018

1 commit

  • Pull drm updates from Dave Airlie:
    "Core:
    - shared fencing staging removal
    - drop transactional atomic helpers and move helpers to new location
    - DP/MST atomic cleanup
    - Leasing cleanups and drop EXPORT_SYMBOL
    - Convert drivers to atomic helpers and generic fbdev.
    - removed deprecated obj_ref/unref in favour of get/put
    - Improve dumb callback documentation
    - MODESET_LOCK_BEGIN/END helpers

    panels:
    - CDTech panels, Banana Pi Panel, DLC1010GIG,
    - Olimex LCD-O-LinuXino, Samsung S6D16D0, Truly NT35597 WQXGA,
    - Himax HX8357D, simulated RTSM AEMv8.
    - GPD Win2 panel
    - AUO G101EVN010

    vgem:
    - render node support

    ttm:
    - move global init out of drivers
    - fix LRU handling for ghost objects
    - Support for simultaneous submissions to multiple engines

    scheduler:
    - timeout/fault handling changes to help GPU recovery
    - helpers for hw with preemption support

    i915:
    - Scaler/Watermark fixes
    - DP MST + powerwell fixes
    - PSR fixes
    - Break long get/put shmemfs pages
    - Icelake fixes
    - Icelake DSI video mode enablement
    - Engine workaround improvements

    amdgpu:
    - freesync support
    - GPU reset enabled on CI, VI, SOC15 dGPUs
    - ABM support in DC
    - KFD support for vega12/polaris12
    - SDMA paging queue on vega
    - More amdkfd code sharing
    - DCC scanout on GFX9
    - DC kerneldoc
    - Updated SMU firmware for GFX8 chips
    - XGMI PSP + hive reset support
    - GPU reset
    - DC trace support
    - Powerplay updates for newer Polaris
    - Cursor plane update fast path
    - kfd dma-buf support

    virtio-gpu:
    - add EDID support

    vmwgfx:
    - pageflip with damage support

    nouveau:
    - Initial Turing TU104/TU106 modesetting support

    msm:
    - a2xx gpu support for apq8060 and imx5
    - a2xx gpummu support
    - mdp4 display support for apq8060
    - DPU fixes and cleanups
    - enhanced profiling support
    - debug object naming interface
    - get_iova/page pinning decoupling

    tegra:
    - Tegra194 host1x, VIC and display support enabled
    - Audio over HDMI for Tegra186 and Tegra194

    exynos:
    - DMA/IOMMU refactoring
    - plane alpha + blend mode support
    - Color format fixes for mixer driver

    rcar-du:
    - R8A7744 and R8A77470 support
    - R8A77965 LVDS support

    imx:
    - fbdev emulation fix
    - multi-tiled scalling fixes
    - SPDX identifiers

    rockchip
    - dw_hdmi support
    - dw-mipi-dsi + dual dsi support
    - mailbox read size fix

    qxl:
    - fix cursor pinning

    vc4:
    - YUV support (scaling + cursor)

    v3d:
    - enable TFU (Texture Formatting Unit)

    mali-dp:
    - add support for linear tiled formats

    sun4i:
    - Display Engine 3 support
    - H6 DE3 mixer 0 support
    - H6 display engine support
    - dw-hdmi support
    - H6 HDMI phy support
    - implicit fence waiting
    - BGRX8888 support

    meson:
    - Overlay plane support
    - implicit fence waiting
    - HDMI 1.4 4k modes

    bridge:
    - i2c fixes for sii902x"

    * tag 'drm-next-2018-12-14' of git://anongit.freedesktop.org/drm/drm: (1403 commits)
    drm/amd/display: Add fast path for cursor plane updates
    drm/amdgpu: Enable GPU recovery by default for CI
    drm/amd/display: Fix duplicating scaling/underscan connector state
    drm/amd/display: Fix unintialized max_bpc state values
    Revert "drm/amd/display: Set RMX_ASPECT as default"
    drm/amdgpu: Fix stub function name
    drm/msm/dpu: Fix clock issue after bind failure
    drm/msm/dpu: Clean up dpu_media_info.h static inline functions
    drm/msm/dpu: Further cleanups for static inline functions
    drm/msm/dpu: Cleanup the debugfs functions
    drm/msm/dpu: Remove dpu_irq and unused functions
    drm/msm: Make irq_postinstall optional
    drm/msm/dpu: Cleanup callers of dpu_hw_blk_init
    drm/msm/dpu: Remove unused functions
    drm/msm/dpu: Remove dpu_crtc_is_enabled()
    drm/msm/dpu: Remove dpu_crtc_get_mixer_height
    drm/msm/dpu: Remove dpu_dbg
    drm/msm: dpu: Remove crtc_lock
    drm/msm: dpu: Remove vblank_requested flag from dpu_crtc
    drm/msm: dpu: Separate crtc assignment from vblank enable
    ...

    Linus Torvalds
     

14 Dec, 2018

1 commit

  • Pull XArray fixes from Matthew Wilcox:
    "Two bugfixes, each with test-suite updates, two improvements to the
    test-suite without associated bugs, and one patch adding a missing
    API"

    * tag 'xarray-4.20-rc7' of git://git.infradead.org/users/willy/linux-dax:
    XArray: Fix xa_alloc when id exceeds max
    XArray tests: Check iterating over multiorder entries
    XArray tests: Handle larger indices more elegantly
    XArray: Add xa_cmpxchg_irq and xa_cmpxchg_bh
    radix tree: Don't return retry entries from lookup

    Linus Torvalds
     

09 Dec, 2018

1 commit

  • This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317.

    This should have been done as part of 2f0799a0ffc0 ("mm, thp: restore
    node-local hugepage allocations"). The movement of the thp allocation
    policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was
    intended to only set __GFP_THISNODE for mempolicies that are not
    MPOL_BIND whereas the revert could set this regardless of mempolicy.

    While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask()
    and alloc_pages_vma() was racy, that has since been removed since the
    revert. What is left is the possibility to use __GFP_THISNODE in
    policy_node() when it is unexpected because the special handling for
    hugepages in alloc_pages_vma() was removed as part of the consolidation.

    Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat
    different policy for hugepage allocations, which were allocated through
    alloc_hugepage_vma(). For hugepage allocations, if the allocating
    process's node is in the set of allowed nodes, allocate with
    __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with
    __GFP_THISNODE instead). This was changed for shmem_alloc_hugepage() to
    allow fallback to other nodes in 89c83fb539f9 as it did for new_page() in
    mm/mempolicy.c which is functionally different behavior and removes the
    requirement to only allocate hugepages locally.

    So this commit does a full revert of 89c83fb539f9 instead of the partial
    revert that was done in 2f0799a0ffc0. The result is the same thp
    allocation policy for 4.20 that was in 4.19.

    Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
    Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
    Signed-off-by: David Rientjes
    Acked-by: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes