30 Dec, 2020

2 commits

  • [ Upstream commit 1e8aaedb182d6ddffc894b832e4962629907b3e0 ]

    madvise_inject_error() uses get_user_pages_fast to translate the address
    we specified to a page. After [1], we drop the extra reference count for
    the memory_failure() path. That commit says that memory_failure wanted to
    keep the pin in order to take the page out of circulation.

    The truth is that we need to keep the page pinned, otherwise the page
    might be re-used after the put_page() and we can end up messing with
    someone else's memory.

    E.g.:

    CPU0 (process X)              CPU1
    madvise_inject_error
      get_user_pages
      put_page
                                  page gets reclaimed
                                  process Y allocates the page
    memory_failure
      // We mess with process Y memory

    madvise() is meant to operate on the caller's own address space, so
    messing with pages that do not belong to us seems the wrong thing to do.
    To avoid that, let us keep the page pinned for memory_failure as well.

    Pages for DAX mappings will release this extra refcount in
    memory_failure_dev_pagemap.
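
    A minimal sketch of the fixed flow, assuming the upstream names
    (get_user_pages_fast(), MF_COUNT_INCREASED) and simplified error
    handling; this is an illustration, not the exact diff:

        struct page *page;
        unsigned long pfn;
        int ret;

        ret = get_user_pages_fast(start, 1, 0, &page);
        if (ret != 1)
                return ret;
        pfn = page_to_pfn(page);

        /*
         * No put_page() here: the extra reference travels into
         * memory_failure(), which releases it once the page is
         * contained (or hands it to memory_failure_dev_pagemap()
         * for DAX mappings).
         */
        ret = memory_failure(pfn, MF_COUNT_INCREASED);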

    [1] ("23e7b5c2e271: mm, madvise_inject_error:
    Let memory_failure() optionally take a page reference")

    Link: https://lkml.kernel.org/r/20201207094818.8518-1-osalvador@suse.de
    Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
    Signed-off-by: Oscar Salvador
    Suggested-by: Vlastimil Babka
    Acked-by: Naoya Horiguchi
    Cc: Vlastimil Babka
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Oscar Salvador
     
  • [ Upstream commit 013339df116c2ee0d796dd8bfb8f293a2030c063 ]

    Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic
    v2"), the code that checks the secondary MMU's page table access bit is
    broken for !(TTU_IGNORE_ACCESS) because the page is unmapped from the
    secondary MMU's page table before the check. This specifically affects
    secondary MMUs which unmap the memory in
    mmu_notifier_invalidate_range_start(), like KVM.

    However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS), i.e. the
    absence of TTU_IGNORE_ACCESS, and it explicitly performs the page table
    access check before trying to unmap the page. So, at worst, reclaim
    will miss accesses in a very short window if we remove the page table
    access check from the unmapping code.

    There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
    reclaim. In memcg reclaim, page_referenced() only accounts for accesses
    from processes in the same memcg as the target page, but the unmapping
    code considers accesses from all processes, which decreases the
    effectiveness of memcg reclaim.

    The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping
    code.
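
    For illustration, this is the kind of access check in try_to_unmap_one()
    that becomes unnecessary once TTU_IGNORE_ACCESS is always assumed (a
    simplified excerpt; the surrounding page walk loop is omitted):

        /*
         * Removed by this change: unmapping no longer aborts when the
         * (secondary) MMU access bit is set, because page_referenced()
         * already performed the access check before the unmap.
         */
        if (!(flags & TTU_IGNORE_ACCESS)) {
                if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) {
                        ret = false;    /* page was referenced: give up */
                        page_vma_mapped_walk_done(&pvmw);
                        break;
                }
        }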

    Link: https://lkml.kernel.org/r/20201104231928.1494083-1-shakeelb@google.com
    Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2")
    Signed-off-by: Shakeel Butt
    Acked-by: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Andrea Arcangeli
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Shakeel Butt
     

15 Nov, 2020

1 commit

  • Qian Cai reported the following BUG in [1]:

    LTP: starting move_pages12
    BUG: unable to handle page fault for address: ffffffffffffffe0
    ...
    RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63
    Call Trace:
    rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864
    try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763
    migrate_pages+0x1005/0x1fb0
    move_pages_and_store_status.isra.47+0xd7/0x1a0
    __x64_sys_move_pages+0xa5c/0x1100
    do_syscall_64+0x5f/0x310
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Hugh Dickins diagnosed this as a migration bug caused by code introduced
    to use i_mmap_rwsem for pmd sharing synchronization. Specifically, the
    routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
    flag to try_to_unmap() while holding i_mmap_rwsem. This is wrong for
    anon pages, as the anon_vma lock should be held in this case. Further
    analysis suggested that i_mmap_rwsem was not required to be held at all
    when calling try_to_unmap for anon pages, as an anon page could never be
    part of a shared pmd mapping.

    Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
    to drop the page lock and acquire i_mmap_rwsem is wrong. There is no way
    to keep the mapping valid while dropping the page lock.

    This patch does the following:

    - Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
    calling try_to_unmap.

    - Remove the hacky code in hugetlb_page_mapping_lock_write. The routine
    will now simply do a 'trylock' while still holding the page lock. If
    the trylock fails, it will return NULL. This could impact the
    callers:

    - migration calling code will receive -EAGAIN and retry up to the
    hard coded limit (10).

    - memory error code will treat the page as BUSY. This will force
    killing (SIGKILL) of any mapping tasks instead of SIGBUS.

    Do note that this change in behavior only happens when there is a
    race. None of the standard kernel testing suites actually hit this
    race, but it is possible.
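
    A sketch of the simplified routine, on the assumption that it reduces to
    a plain trylock under the page lock (details of the upstream version may
    differ):

        struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage)
        {
                struct address_space *mapping = page_mapping(hpage);

                /* Caller still holds the page lock, keeping mapping stable. */
                if (!mapping)
                        return mapping;

                /* Trylock instead of the old drop-lock/retake dance. */
                if (i_mmap_trylock_write(mapping))
                        return mapping;

                return NULL;
        }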

    [1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
    [2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/

    Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
    Reported-by: Qian Cai
    Suggested-by: Hugh Dickins
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc:
    Link: https://lkml.kernel.org/r/20201105195058.78401-1-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

19 Oct, 2020

1 commit

  • There is a well-defined standard migration target callback. Use it
    directly.
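
    Presumably this means replacing the local allocation callback in
    mm/memory-failure.c with alloc_migration_target() plus a
    migration_target_control, along these lines (a hedged sketch):

        struct migration_target_control mtc = {
                .nid = NUMA_NO_NODE,    /* follow the source page's node */
                .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
        };

        ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
                            (unsigned long)&mtc, MIGRATE_SYNC,
                            MR_MEMORY_FAILURE);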

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Christoph Hellwig
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Naoya Horiguchi
    Cc: Roman Gushchin
    Link: http://lkml.kernel.org/r/1594622517-20681-9-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

17 Oct, 2020

12 commits

  • Aristeu Rozanski reported that a customer test case started to report
    -EBUSY after the hwpoison rework patchset.

    There is a race window between spotting a free page and taking it off its
    buddy freelist, so it might be that, by the time we try to take it off,
    the page has already been allocated.

    This patch handles such a race window by retrying: if the page was
    allocated under us, we handle its new page type again.
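
    A sketch of the retry in memory_failure(), assuming the series uses a
    one-shot retry label of this shape (simplified; action_result()
    reporting is omitted):

        bool retry = true;
        int res;

try_again:
        if (!get_hwpoison_page(p)) {
                if (is_free_buddy_page(p)) {
                        if (take_page_off_buddy(p)) {
                                page_ref_inc(p);
                                res = MF_RECOVERED;
                        } else if (retry) {
                                /*
                                 * We lost the race: the page was allocated
                                 * under us, so handle its new type once.
                                 */
                                retry = false;
                                goto try_again;
                        }
                }
        }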

    Reported-by: Aristeu Rozanski
    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Tested-by: Aristeu Rozanski
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-15-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • Soft offlining could fail with EIO due to a race condition with hugepage
    migration. This issue became visible due to the change by the previous
    patch that makes the soft offline handler take the page refcount on its
    own. We have no way to directly pin a zero-refcount page, and a page
    considered to have zero refcount could be allocated just after the first
    check.

    This patch adds a second check to find the race and gives us a chance to
    handle it more reliably.
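
    The second check is presumably of this shape in the page-grabbing helper
    (a loose sketch; the exact branch placement differs upstream):

        if (!get_hwpoison_page(p)) {
                if (PageHuge(p) || is_free_buddy_page(p)) {
                        ret = 0;        /* really free, handled by caller */
                } else if (page_count(p)) {
                        /*
                         * Second check: the page looked free above but was
                         * allocated in the meantime, so report the race.
                         */
                        ret = -EBUSY;
                } else {
                        ret = -EIO;     /* unrecognized zero-refcount page */
                }
        }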

    Reported-by: Qian Cai
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-14-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • memory_failure() is supposed to call action_result() when it handles a
    memory error event, but there's one missing case. So let's add it.

    I find that include/ras/ras_event.h has some other MF_MSG_* undefined, so
    this patch also adds them.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-13-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Currently, there is an inconsistency when calling soft-offline from
    different paths on a page that is already poisoned.

    1) madvise:

    madvise_inject_error skips any poisoned page and continues
    the loop.
    If that was the only page to madvise, it returns 0.

    2) /sys/devices/system/memory/:

    When calling soft_offline_page_store()->soft_offline_page(),
    we return -EBUSY in case the page is already poisoned.
    This is inconsistent with a) the above example and b)
    memory_failure, where we return 0 if the page was poisoned.

    Fix this by dropping the PageHWPoison() check in madvise_inject_error, and
    letting soft_offline_page return 0 if it finds the page already poisoned.

    Please note that this represents a user-API change, since the error
    returned when calling soft_offline_page_store()->soft_offline_page()
    will now be different.
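
    The aligned behaviour presumably reduces to an early exit like this in
    soft_offline_page() (a sketch):

        if (PageHWPoison(page)) {
                pr_info("soft offline: %#lx page already poisoned\n", pfn);
                if (flags & MF_COUNT_INCREASED)
                        put_page(page);
                return 0;       /* used to be -EBUSY on the sysfs path */
        }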

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-12-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • Merging soft_offline_huge_page and __soft_offline_page lets us get rid of
    quite some duplicated code, and makes the code much easier to follow.

    Now, __soft_offline_page will handle both normal and hugetlb pages.

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-11-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • This patch changes the way we set and handle in-use poisoned pages. Until
    now, poisoned pages were released to the buddy allocator, trusting that
    the checks that take place at allocation time would act as a safe net and
    would skip that page.

    This has proved to be wrong, as there are pfn walkers out there, like
    compaction, that only care about the page being in a buddy freelist.

    Although this might not be the only user, having poisoned pages in the
    buddy allocator seems a bad idea as we should only have free pages that
    are ready and meant to be used as such.

    Before explaining the taken approach, let us break down the kind of pages
    we can soft offline.

    - Anonymous THP (after the split, they end up being 4K pages)
    - Hugetlb
    - Order-0 pages (that can be either migrated or invalidated)

    * Normal pages (order-0 and anon-THP)

    - If they are clean and unmapped page cache pages, we invalidate
    them by means of invalidate_inode_page().
    - If they are mapped/dirty, we do the isolate-and-migrate dance.

    Either way, we do not call put_page directly from those paths. Instead, we
    keep the page and send it to page_handle_poison to perform the right
    handling.

    page_handle_poison sets the HWPoison flag and does the last put_page.

    Down the chain, we placed a check for HWPoison page in
    free_pages_prepare, that just skips any poisoned page, so those pages
    do not end up in any pcplist/freelist.

    After that, we set the refcount on the page to 1 and we increment
    the poisoned pages counter.
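
    A sketch of the two pieces described above, as this series presumably
    shapes them (simplified; the in-use and hugetlb variants take extra
    arguments):

        static void page_handle_poison(struct page *page)
        {
                SetPageHWPoison(page);
                page_ref_inc(page);             /* pin the refcount at 1 */
                num_poisoned_pages_inc();
        }

        /* In free_pages_prepare(): never let a poisoned page reach a
         * pcplist or freelist. */
        if (unlikely(PageHWPoison(page)) && !order)
                return false;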

    If we see that the check in free_pages_prepare creates trouble, we can
    always do what we do for free pages:

    - wait until the page hits buddy's freelists
    - take it off, and flag it

    The downside of the above approach is that we could race with an
    allocation, so by the time we want to take the page off the buddy, the
    page may have already been allocated, and we cannot soft offline it.
    But the user could always retry.

    * Hugetlb pages

    - We isolate-and-migrate them

    After the migration has been successful, we call dissolve_free_huge_page,
    and we set HWPoison on the page if we succeed.
    Hugetlb has a slightly different handling though.

    While for non-hugetlb pages we cared about closing the race with an
    allocation, doing so for hugetlb pages requires quite some additional
    and intrusive code (we would need to hook in free_huge_page and some other
    places).
    So I decided to not make the code overly complicated and just fail
    normally if the page was allocated in the meantime.

    We can always build on top of this.

    As a bonus, because of the way we now handle in-use pages, we no longer
    need the put-as-isolation-migratetype dance, which guarded against
    poisoned pages ending up in pcplists.

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-10-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • When trying to soft-offline a free page, we need to first take it off the
    buddy allocator. Once we know it is out of reach, we can safely flag it
    as poisoned.

    take_page_off_buddy will be used to take a page meant to be poisoned off
    the buddy allocator. take_page_off_buddy calls break_down_buddy_pages,
    which splits a higher-order page in case our page belongs to one.

    Once the page is under our control, we call page_handle_poison to set it
    as poisoned and grab a refcount on it.
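
    A condensed sketch of take_page_off_buddy() as described (assuming the
    usual zone->lock protocol; break_down_buddy_pages() is the splitting
    helper introduced by this patch):

        bool take_page_off_buddy(struct page *page)
        {
                struct zone *zone = page_zone(page);
                unsigned long pfn = page_to_pfn(page);
                unsigned long flags;
                unsigned int order;
                bool ret = false;

                spin_lock_irqsave(&zone->lock, flags);
                for (order = 0; order < MAX_ORDER; order++) {
                        /* Buddy head that would contain our page, if any. */
                        struct page *head = page - (pfn & ((1 << order) - 1));
                        unsigned int head_order = page_order(head);

                        if (PageBuddy(head) && head_order >= order) {
                                int mt = get_pfnblock_migratetype(head,
                                                page_to_pfn(head));

                                del_page_from_free_list(head, zone, head_order);
                                /* Split down until 'page' stands alone. */
                                break_down_buddy_pages(zone, head, page, 0,
                                                       head_order, mt);
                                ret = true;
                                break;
                        }
                }
                spin_unlock_irqrestore(&zone->lock, flags);
                return ret;
        }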

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-9-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • Place the THP's page handling in a helper and use it from both hard and
    soft-offline machinery, so we get rid of some duplicated code.
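
    The helper is presumably shaped like this (a sketch modeled on the
    series' try_to_split_thp_page(); the msg argument distinguishes the
    hard and soft offline callers):

        static int try_to_split_thp_page(struct page *page, const char *msg)
        {
                lock_page(page);
                if (!PageAnon(page) || unlikely(split_huge_page(page))) {
                        unsigned long pfn = page_to_pfn(page);

                        unlock_page(page);
                        if (!PageAnon(page))
                                pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
                        else
                                pr_info("%s: %#lx: thp split failed\n", msg, pfn);
                        put_page(page);
                        return -EBUSY;
                }
                unlock_page(page);
                return 0;
        }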

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-8-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • After commit 4e41a30c6d50 ("mm: hwpoison: adjust for new thp
    refcounting"), put_hwpoison_page got reduced to a put_page. Let us just
    use put_page instead.

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-7-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • Since get_hwpoison_page is only used in memory-failure code now, let us
    un-export it and make it private to that code.

    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-5-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • hpage is never used after try_to_split_thp_page() in memory_failure(), so
    we don't have to update it. Let's not recalculate/use hpage.

    Suggested-by: "Aneesh Kumar K.V"
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Cc: Aneesh Kumar K.V
    Cc: Aristeu Rozanski
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Dmitry Yakunin
    Cc: Michal Hocko
    Cc: Oscar Salvador
    Cc: Qian Cai
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200922135650.1634-3-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Patch series "HWPOISON: soft offline rework", v7.

    This patchset fixes a couple of issues that the patchset Naoya sent [1]
    contained due to rebasing problems and a misunderstanding.

    The main focus of this series is to stabilize soft offline. Historically,
    soft offlined pages have suffered from racy conditions because PageHWPoison
    is used a little too aggressively, which (directly or indirectly) invades
    other mm code which cares little about hwpoison. This results in
    unexpected behavior or kernel panic, which is very far from soft offline's
    "do not disturb userspace or other kernel components" policy. An example
    of this can be found here [2].

    Along with several cleanups, this series refactors and changes the way
    soft offline works. The main point of this change set is to contain the
    target page "via buddy allocator" or in the migration path. For the former
    we first free the target page as we do for normal pages, and once it has
    reached the buddy allocator and has been taken off the freelists, we flag
    it as HWpoison. For the latter we never get to release the page in
    unmap_and_move, so the page is under our control and we can handle it in
    hwpoison code.

    [1] https://patchwork.kernel.org/cover/11704083/
    [2] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u

    This patch (of 14):

    Drop the PageHuge check, which is dead code since memory_failure() forks
    into memory_failure_hugetlb() for hugetlb pages.

    memory_failure() and memory_failure_hugetlb() share some functions like
    hwpoison_user_mappings() and identify_page_state(), so they should
    properly handle 4kB pages, THPs, and hugetlb pages.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Oscar Salvador
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Cc: Michal Hocko
    Cc: Tony Luck
    Cc: David Hildenbrand
    Cc: Aneesh Kumar K.V
    Cc: Dmitry Yakunin
    Cc: Qian Cai
    Cc: Dave Hansen
    Cc: "Aneesh Kumar K.V"
    Cc: Aristeu Rozanski
    Cc: Oscar Salvador
    Link: https://lkml.kernel.org/r/20200922135650.1634-1-osalvador@suse.de
    Link: https://lkml.kernel.org/r/20200922135650.1634-2-osalvador@suse.de
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

14 Oct, 2020

2 commits

  • Unlike others we don't use the macro "writeback", so let's remove it to
    tame the gcc warning:

    mm/memory-failure.c:827: warning: macro "writeback" is not used
    [-Wunused-macros]

    Signed-off-by: Alex Shi
    Signed-off-by: Andrew Morton
    Cc: Naoya Horiguchi
    Link: https://lkml.kernel.org/r/1599715096-20369-1-git-send-email-alex.shi@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Alex Shi
     
  • There is no need to calculate pgoff in each iteration of
    for_each_process(), so move the calculation ahead of the loop, which can
    save some CPU cycles.
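
    For example, in collect_procs_anon() the hoisted computation presumably
    looks like this (a sketch; helper signatures follow mm/memory-failure.c
    of this period):

        struct anon_vma_chain *vmac;
        struct task_struct *tsk;
        pgoff_t pgoff = page_to_pgoff(page);    /* loop invariant, hoisted */

        read_lock(&tasklist_lock);
        for_each_process(tsk) {
                struct task_struct *t = task_early_kill(tsk, force_early);

                if (!t)
                        continue;
                anon_vma_interval_tree_foreach(vmac, &av->rb_root,
                                               pgoff, pgoff) {
                        struct vm_area_struct *vma = vmac->vma;

                        if (vma->vm_mm == t->mm)
                                add_to_kill(t, page, vma, to_kill);
                }
        }
        read_unlock(&tasklist_lock);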

    Signed-off-by: Xianting Tian
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Link: http://lkml.kernel.org/r/20200818082647.34322-1-tian.xianting@h3c.com
    Signed-off-by: Linus Torvalds

    Xianting Tian
     

13 Aug, 2020

1 commit

  • There are some similar functions for migration target allocation. Since
    there is no fundamental difference, it's better to keep just one rather
    than keeping all variants. This patch implements base migration target
    allocation function. In the following patches, variants will be converted
    to use this function.

    Changes should be mechanical, but, unfortunately, there are some
    differences. First, some callers' nodemask is assigned NULL, since a NULL
    nodemask is considered to mean all available nodes, that is,
    &node_states[N_MEMORY]. Second, for hugetlb page allocation, gfp_mask is
    redefined as the regular hugetlb allocation gfp_mask plus __GFP_THISNODE
    if the user-provided gfp_mask has it. This is because a future caller of
    this function needs to set this node constraint. Lastly, if the provided
    nodeid is NUMA_NO_NODE, nodeid is set to the node where the migration
    source lives. This helps remove simple wrappers for setting up the
    nodeid.

    Note that the PageHighMem() call in the previous function is changed to
    open-coded "is_highmem_idx()" since it provides more readability.

    [akpm@linux-foundation.org: tweak patch title, per Vlastimil]
    [akpm@linux-foundation.org: fix typo in comment]

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Christoph Hellwig
    Cc: Mike Kravetz
    Cc: Naoya Horiguchi
    Cc: Roman Gushchin
    Link: http://lkml.kernel.org/r/1594622517-20681-6-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

12 Jun, 2020

2 commits

  • An Action Required memory error should happen only when a processor is
    about to access corrupted memory, so it's synchronous and only
    affects the current process/thread.

    Recently commit 872e9a205c84 ("mm, memory_failure: don't send
    BUS_MCEERR_AO for action required error") fixed the issue that Action
    Required memory could unnecessarily send SIGBUS to the processes which
    share the error memory. But we still have another issue that we could
    send SIGBUS to a wrong thread.

    This is because collect_procs() and task_early_kill() fail to add the
    current process to the "to-kill" list. So this patch fixes that. With
    this fix, SIGBUS(BUS_MCEERR_AR) is never sent to a non-current
    process/thread.
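
    The fix presumably lands in task_early_kill(), making the forced (Action
    Required) case pick only the current thread (a sketch, comment
    paraphrased from upstream):

        static struct task_struct *task_early_kill(struct task_struct *tsk,
                                                   int force_early)
        {
                if (!tsk->mm)
                        return NULL;
                if (force_early) {
                        /*
                         * Compare ->mm because current may be a subthread,
                         * while tsk always points at the main thread.
                         */
                        if (tsk->mm == current->mm)
                                return current;
                        return NULL;
                }
                return find_early_kill_thread(tsk);
        }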

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Acked-by: Tony Luck
    Acked-by: Pankaj Gupta
    Link: http://lkml.kernel.org/r/1591321039-22141-3-git-send-email-naoya.horiguchi@nec.com
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Patch series "hwpoison: fixes signaling on memory error"

    This is a small patchset to solve issues in memory error handler to send
    SIGBUS to proper process/thread as expected in configuration. Please
    see descriptions in individual patches for more details.

    This patch (of 2):

    Early-kill policy is controlled by two types of settings: one is the
    per-process setting prctl(PR_MCE_KILL) and the other is the system-wide
    setting vm.memory_failure_early_kill. Users expect the per-process
    setting to override the system-wide setting, as many other settings do,
    but the early-kill setting doesn't work that way.

    For example, if a system configures vm.memory_failure_early_kill to 1
    (enabled), a process receives SIGBUS even if it's configured to
    explicitly disable PF_MCE_KILL by prctl(). That's not desirable for
    applications with their own policies.

    This patch is suggesting to change the priority of these two types of
    settings, by checking sysctl_memory_failure_early_kill only when a given
    process has the default kill policy.

    Note that this patch is solving a thread choice issue too.

    Originally, collect_procs() always chooses the main thread when
    vm.memory_failure_early_kill is 1, even if the process has a dedicated
    thread for memory error handling. SIGBUS should be sent to the
    dedicated thread if early-kill is enabled via
    vm.memory_failure_early_kill as we are doing for PR_MCE_KILL_EARLY
    processes.
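
    Under the described priority, the check order in task_early_kill()
    presumably becomes (a sketch of the non-forced path):

        struct task_struct *t;

        if (!tsk->mm)
                return NULL;
        if (force_early)
                return tsk;

        /* Per-process prctl() policy wins ... */
        t = find_early_kill_thread(tsk);
        if (t)
                return t;
        /* ... and the sysctl applies only to default-policy processes. */
        if (sysctl_memory_failure_early_kill)
                return tsk;
        return NULL;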

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Cc: Tony Luck
    Cc: Pankaj Gupta
    Link: http://lkml.kernel.org/r/1591321039-22141-1-git-send-email-naoya.horiguchi@nec.com
    Link: http://lkml.kernel.org/r/1591321039-22141-2-git-send-email-naoya.horiguchi@nec.com
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

03 Jun, 2020

2 commits

  • Pull ACPI updates from Rafael Wysocki:
    "These update the ACPICA code in the kernel to upstream revision
    20200430, fix several reference counting errors related to ACPI
    tables, add _Exx / _Lxx support to the GED driver, add a new
    acpi_evaluate_reg() helper, add new DPTF battery participant driver
    and extend the DPFT power participant driver, improve the handling of
    memory failures in the APEI code, add a blacklist entry to the
    backlight driver, update the PMIC driver and the processor idle
    driver, fix two kobject reference count leaks, and make a few janitorial
    changes.

    Specifics:

    - Update the ACPICA code in the kernel to upstream revision 20200430:

    - Move acpi_gbl_next_cmd_num definition (Erik Kaneda).

    - Ignore AE_ALREADY_EXISTS status in the disassembler when parsing
    create operators (Erik Kaneda).

    - Add status checks to the dispatcher (Erik Kaneda).

    - Fix required parameters for _NIG and _NIH (Erik Kaneda).

    - Make acpi_protocol_lengths static (Yue Haibing).

    - Fix ACPI table reference counting errors in several places, mostly
    in error code paths (Hanjun Guo).

    - Extend the Generic Event Device (GED) driver to support _Exx and
    _Lxx handler methods (Ard Biesheuvel).

    - Add new acpi_evaluate_reg() helper and modify the ACPI PCI hotplug
    code to use it (Hans de Goede).

    - Add new DPTF battery participant driver and make the DPFT power
    participant driver create more sysfs device attributes (Srinivas
    Pandruvada).

    - Improve the handling of memory failures in APEI (James Morse).

    - Add new blacklist entry for Acer TravelMate 5735Z to the backlight
    driver (Paul Menzel).

    - Add i2c address for thermal control to the PMIC driver (Mauro
    Carvalho Chehab).

    - Allow the ACPI processor idle driver to work on platforms with only
    one ACPI C-state present (Zhang Rui).

    - Fix kobject reference count leaks in error code paths in two places
    (Qiushi Wu).

    - Delete unused proc filename macros and make some symbols static
    (Pascal Terjan, Zheng Zengkai, Zou Wei)"

    * tag 'acpi-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
    ACPI: CPPC: Fix reference count leak in acpi_cppc_processor_probe()
    ACPI: sysfs: Fix reference count leak in acpi_sysfs_add_hotplug_profile()
    ACPI: GED: use correct trigger type field in _Exx / _Lxx handling
    ACPI: DPTF: Add battery participant driver
    ACPI: DPTF: Additional sysfs attributes for power participant driver
    ACPI: video: Use native backlight on Acer TravelMate 5735Z
    arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
    ACPI: APEI: Kick the memory_failure() queue for synchronous errors
    mm/memory-failure: Add memory_failure_queue_kick()
    ACPI / PMIC: Add i2c address for thermal control
    ACPI: GED: add support for _Exx / _Lxx handler methods
    ACPI: Delete unused proc filename macros
    ACPI: hotplug: PCI: Use the new acpi_evaluate_reg() helper
    ACPI: utils: Add acpi_evaluate_reg() helper
    ACPI: debug: Make two functions static
    ACPI: sleep: Put the FACS table after using it
    ACPI: scan: Put SPCR and STAO table after using it
    ACPI: EC: Put the ACPI table after using it
    ACPI: APEI: Put the HEST table for error path
    ACPI: APEI: Put the error record serialization table for error path
    ...

    Linus Torvalds
     
  • Some processes don't want to be killed early, but in the "Action
    Required" case, those may also be killed by BUS_MCEERR_AO when sharing
    memory with another process that is accessing the failed memory. And
    sending SIGBUS with BUS_MCEERR_AO for an action required error is
    strange, so ignore non-current processes here.
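
    In kill_proc(), the change presumably amounts to sending nothing to
    non-current tasks in the Action Required case (a sketch):

        if (flags & MF_ACTION_REQUIRED) {
                if (t->mm == current->mm)
                        ret = force_sig_mceerr(BUS_MCEERR_AR,
                                               (void __user *)tk->addr,
                                               addr_lsb);
                /* Send no signal to non-current processes. */
        } else {
                /*
                 * "Action optional" errors: sharers of the page may still
                 * get a catchable BUS_MCEERR_AO.
                 */
                ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
                                      addr_lsb, t);
        }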

    Suggested-by: Naoya Horiguchi
    Signed-off-by: Wetp Zhang
    Signed-off-by: Andrew Morton
    Acked-by: Naoya Horiguchi
    Acked-by: Pankaj Gupta
    Link: http://lkml.kernel.org/r/1590817116-21281-1-git-send-email-wetp.zy@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wetp Zhang
     

20 May, 2020

1 commit

  • The GHES code calls memory_failure_queue() from IRQ context to schedule
    work on the current CPU so that memory_failure() can sleep.

    For synchronous memory errors the arch code needs to know any signals
    that memory_failure() will trigger are pending before it returns to
    user-space, possibly when exiting from the IRQ.

    Add a helper to kick the memory failure queue, to ensure the scheduled
    work has happened. This has to be called from process context, so may
    have been migrated from the original cpu. Pass the cpu the work was
    queued on.

    Change memory_failure_work_func() to permit being called on the 'wrong'
    cpu.
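
    The helper itself is small; a sketch consistent with the description
    (the per-CPU structure name is assumed from mm/memory-failure.c):

        void memory_failure_queue_kick(int cpu)
        {
                struct memory_failure_cpu *mf_cpu;

                mf_cpu = &per_cpu(memory_failure_cpu, cpu);
                /* Flush cpu's queued entries from this (process) context. */
                cancel_work_sync(&mf_cpu->work);
                memory_failure_work_func(&mf_cpu->work);
        }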

    Signed-off-by: James Morse
    Tested-by: Tyler Baicar
    Acked-by: Naoya Horiguchi
    Signed-off-by: Rafael J. Wysocki

    James Morse
     

08 Apr, 2020

1 commit

  • Some comments for MADV_FREE are revised and added to help people
    understand the MADV_FREE code, especially the page flag, PG_swapbacked.
    This makes page_is_file_cache() inconsistent with its comments, so the
    function is renamed to page_is_file_lru() to make them consistent again.
    All these are put in one patch as one logical change.

    Suggested-by: David Hildenbrand
    Suggested-by: Johannes Weiner
    Suggested-by: David Rientjes
    Signed-off-by: "Huang, Ying"
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Acked-by: Pankaj Gupta
    Acked-by: Vlastimil Babka
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Link: http://lkml.kernel.org/r/20200317100342.2730705-1-ying.huang@intel.com
    Signed-off-by: Linus Torvalds

    Huang Ying
     

03 Apr, 2020

1 commit

  • Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2.

    While discussing the issue with huge_pte_offset [1], I remembered that
    there were more outstanding hugetlb races. These issues are:

    1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
    invalid via a call to huge_pmd_unshare by another thread.
    2) hugetlbfs page faults can race with truncation causing invalid global
    reserve counts and state.

    A previous attempt was made to use i_mmap_rwsem in this manner as
    described at [2]. However, those patches were reverted starting with [3]
    due to locking issues.

    To effectively use i_mmap_rwsem to address the above issues it needs to be
    held (in read mode) during page fault processing. However, during fault
    processing we need to lock the page we will be adding. Lock ordering
    requires we take page lock before i_mmap_rwsem. Waiting until after
    taking the page lock is too late in the fault process for the
    synchronization we want to do.

    To address this lock ordering issue, the following patches change the lock
    ordering for hugetlb pages. This is not too invasive as hugetlbfs
    processing is done separate from core mm in many places. However, I don't
    really like this idea. Much ugliness is contained in the new routine
    hugetlb_page_mapping_lock_write() of patch 1.

    The only other way I can think of to address these issues is by catching
    all the races. After catching a race, cleanup, backout, retry ... etc,
    as needed. This can get really ugly, especially for huge page
    reservations. At one time, I started writing some of the reservation
    backout code for page faults and it got so ugly and complicated I went
    down the path of adding synchronization to avoid the races. Any other
    suggestions would be welcome.

    [1] https://lore.kernel.org/linux-mm/1582342427-230392-1-git-send-email-longpeng2@huawei.com/
    [2] https://lore.kernel.org/linux-mm/20181222223013.22193-1-mike.kravetz@oracle.com/
    [3] https://lore.kernel.org/linux-mm/20190103235452.29335-1-mike.kravetz@oracle.com
    [4] https://lore.kernel.org/linux-mm/1584028670.7365.182.camel@lca.pw/
    [5] https://lore.kernel.org/lkml/20200312183142.108df9ac@canb.auug.org.au/

    This patch (of 2):

    While looking at BUGs associated with invalid huge page map counts, it was
    discovered and observed that a huge pte pointer could become 'invalid' and
    point to another task's page table. Consider the following:

    A task takes a page fault on a shared hugetlbfs file and calls
    huge_pte_alloc to get a ptep. Suppose the returned ptep points to a
    shared pmd.

    Now, another task truncates the hugetlbfs file. As part of truncation, it
    unmaps everyone who has the file mapped. If the range being truncated is
    covered by a shared pmd, huge_pmd_unshare will be called. For all but the
    last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
    to the pmd. If the task in the middle of the page fault is not the last
    user, the ptep returned by huge_pte_alloc now points to another task's
    page table or worse. This leads to bad things such as incorrect page
    map/reference counts or invalid memory references.

    To fix, expand the use of i_mmap_rwsem as follows:
    - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
    huge_pmd_share is only called via huge_pte_alloc, so callers of
    huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers
    of huge_pte_alloc continue to hold the semaphore until finished with
    the ptep.
    - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.

    One problem with this scheme is that it requires taking i_mmap_rwsem
    before taking the page lock during page faults. This is not the order
    specified in the rest of mm code. Handling of hugetlbfs pages is mostly
    isolated today. Therefore, we use this alternative locking order for
    PageHuge() pages.

    mapping->i_mmap_rwsem
      hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
        page->flags PG_locked (lock_page)

    To help with lock ordering issues, hugetlb_page_mapping_lock_write() is
    introduced to write lock the i_mmap_rwsem associated with a page.

    In most cases it is easy to get address_space via vma->vm_file->f_mapping.
    However, in the case of migration or memory errors for anon pages we do
    not have an associated vma. A new routine _get_hugetlb_page_mapping()
    will use anon_vma to get address_space in these cases.
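
    A minimal sketch of the fault-path convention described above
    (simplified from hugetlb_fault(); assuming the usual i_mmap helpers):

        /*
         * Take i_mmap_rwsem in read mode before huge_pte_alloc() and hold
         * it until done with the ptep, so huge_pmd_unshare() cannot
         * invalidate it underneath us.
         */
        mapping = vma->vm_file->f_mapping;
        i_mmap_lock_read(mapping);
        ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
        if (!ptep) {
                i_mmap_unlock_read(mapping);
                return VM_FAULT_OOM;
        }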

    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K . V"
    Cc: Andrea Arcangeli
    Cc: "Kirill A . Shutemov"
    Cc: Davidlohr Bueso
    Cc: Prakash Sangappa
    Link: http://lkml.kernel.org/r/20200316205756.146666-2-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

02 Dec, 2019

3 commits

  • page_shift() has been available since commit 94ad9338109f ("mm: introduce
    page_shift()").

    So replace the open-coded calculation with page_shift() in add_to_kill()
    for readability.
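
    Presumably a one-line change of this shape:

        - tk->size_shift = compound_order(compound_head(p)) + PAGE_SHIFT;
        + tk->size_shift = page_shift(compound_head(p));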

    Link: http://lkml.kernel.org/r/543d8bc9-f2e7-3023-7c35-2e7ed67c0e82@huawei.com
    Signed-off-by: Yunfeng Ye
    Reviewed-by: David Hildenbrand
    Acked-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yunfeng Ye
     
  • Currently soft_offline_page() receives struct page, and its sibling
    memory_failure() receives pfn. This discrepancy looks weird and makes
    precheck on pfn validity tricky. So let's align them.
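
    The alignment is presumably just a signature change at the API boundary:

        -extern int soft_offline_page(struct page *page, int flags);
        +extern int soft_offline_page(unsigned long pfn, int flags);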

    Link: http://lkml.kernel.org/r/20191016234706.GA5493@www9186uo.sakura.ne.jp
    Signed-off-by: Naoya Horiguchi
    Acked-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Cc: Oscar Salvador
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • add_to_kill() expects the first 'tk' to be pre-allocated; it makes
    subsequent allocations on an as-needed basis, which makes the code a bit
    difficult to read.

    Move all the allocation internal to add_to_kill() and drop the **tk
    argument.
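
    After the change, add_to_kill() presumably allocates unconditionally (a
    sketch; field setup is abbreviated to the essentials):

        static void add_to_kill(struct task_struct *tsk, struct page *p,
                                struct vm_area_struct *vma,
                                struct list_head *to_kill)
        {
                struct to_kill *tk;

                tk = kmalloc(sizeof(struct to_kill), GFP_ATOMIC);
                if (!tk) {
                        pr_err("Memory failure: Out of memory while machine check handling\n");
                        return;
                }

                tk->addr = page_address_in_vma(p, vma);
                tk->size_shift = page_shift(compound_head(p));
                get_task_struct(tsk);
                tk->tsk = tsk;
                list_add_tail(&tk->nd, to_kill);
        }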

    Link: http://lkml.kernel.org/r/1565112345-28754-2-git-send-email-jane.chu@oracle.com
    Signed-off-by: Jane Chu
    Reviewed-by: Dan Williams
    Acked-by: Naoya Horiguchi
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jane Chu
     

19 Oct, 2019

1 commit

  • We should check pfn_to_online_page() so as not to access uninitialized
    memmaps. Reshuffle the code so we don't have to duplicate the error
    message.
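
    In memory_failure(), the reshuffled entry check presumably looks like
    this (a sketch; the device pagemap branch keeps handling ZONE_DEVICE
    pfns):

        p = pfn_to_online_page(pfn);
        if (!p) {
                if (pfn_valid(pfn)) {
                        pgmap = get_dev_pagemap(pfn, NULL);
                        if (pgmap)
                                return memory_failure_dev_pagemap(pfn, flags,
                                                                  pgmap);
                }
                pr_err("Memory failure: %#lx: memory outside kernel control\n",
                       pfn);
                return -ENXIO;
        }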

    Link: http://lkml.kernel.org/r/20191009142435.3975-3-david@redhat.com
    Signed-off-by: David Hildenbrand
    Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
    Acked-by: Naoya Horiguchi
    Cc: Michal Hocko
    Cc: [4.13+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

15 Oct, 2019

1 commit

  • Mmap /dev/dax more than once, then read the poison location using the
    address from one of the mappings. The other mappings, due to not having
    the page mapped in, will cause SIGKILLs to be delivered to the process.
    SIGKILL succeeds over SIGBUS, so the user process loses the opportunity
    to handle the UE.

    Although one may add MAP_POPULATE to mmap(2) to work around the issue,
    MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
    isn't always an option.

    Details -

    ndctl inject-error --block=10 --count=1 namespace6.0

    ./read_poison -x dax6.0 -o 5120 -m 2
    mmaped address 0x7f5bb6600000
    mmaped address 0x7f3cf3600000
    doing local read at address 0x7f3cf3601400
    Killed

    Console messages in instrumented kernel -

    mce: Uncorrected hardware memory error in user-access at edbe201400
    Memory failure: tk->addr = 7f5bb6601000
    Memory failure: address edbe201: call dev_pagemap_mapping_shift
    dev_pagemap_mapping_shift: page edbe201: no PUD
    Memory failure: tk->size_shift == 0
    Memory failure: Unable to find user space address edbe201 in read_poison
    Memory failure: tk->addr = 7f3cf3601000
    Memory failure: address edbe201: call dev_pagemap_mapping_shift
    Memory failure: tk->size_shift = 21
    Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
    => to deliver SIGKILL
    Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
    => to deliver SIGBUS

    Link: http://lkml.kernel.org/r/1565112345-28754-3-git-send-email-jane.chu@oracle.com
    Signed-off-by: Jane Chu
    Suggested-by: Naoya Horiguchi
    Reviewed-by: Dan Williams
    Acked-by: Naoya Horiguchi
    Cc: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jane Chu
     

15 Jul, 2019

1 commit

  • Pull HMM updates from Jason Gunthorpe:
    "Improvements and bug fixes for the hmm interface in the kernel:

    - Improve clarity, locking and APIs related to the 'hmm mirror'
    feature merged last cycle. In linux-next we now see AMDGPU and
    nouveau to be using this API.

    - Remove old or transitional hmm APIs. These are hold overs from the
    past with no users, or APIs that existed only to manage cross tree
    conflicts. There are still a few more of these cleanups that didn't
    make the merge window cut off.

    - Improve some core mm APIs:
    - export alloc_pages_vma() for driver use
    - refactor into devm_request_free_mem_region() to manage
    DEVICE_PRIVATE resource reservations
    - refactor duplicative driver code into the core dev_pagemap
    struct

    - Remove hmm wrappers of improved core mm APIs, instead have drivers
    use the simplified API directly

    - Remove DEVICE_PUBLIC

    - Simplify the kconfig flow for the hmm users and core code"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
    mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
    mm: remove the HMM config option
    mm: sort out the DEVICE_PRIVATE Kconfig mess
    mm: simplify ZONE_DEVICE page private data
    mm: remove hmm_devmem_add
    mm: remove hmm_vma_alloc_locked_page
    nouveau: use devm_memremap_pages directly
    nouveau: use alloc_page_vma directly
    PCI/P2PDMA: use the dev_pagemap internal refcount
    device-dax: use the dev_pagemap internal refcount
    memremap: provide an optional internal refcount in struct dev_pagemap
    memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
    memremap: remove the data field in struct dev_pagemap
    memremap: add a migrate_to_ram method to struct dev_pagemap_ops
    memremap: lift the devmap_enable manipulation into devm_memremap_pages
    memremap: pass a struct dev_pagemap to ->kill and ->cleanup
    memremap: move dev_pagemap callbacks into a separate structure
    memremap: validate the pagemap type passed to devm_memremap_pages
    mm: factor out a devm_request_free_mem_region helper
    mm: export alloc_pages_vma
    ...

    Linus Torvalds
     

13 Jul, 2019

1 commit

  • Some users who install a SIGBUS handler that does longjmp out, thereby
    keeping the process alive, are confused by the error message

    "[188988.765862] Memory failure: 0x1840200: Killing cellsrv:33395 due to hardware memory corruption"

    Slightly modify the error message to improve clarity.

    Link: http://lkml.kernel.org/r/1558403523-22079-1-git-send-email-jane.chu@oracle.com
    Signed-off-by: Jane Chu
    Acked-by: Naoya Horiguchi
    Acked-by: Pankaj Gupta
    Reviewed-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jane Chu
     

09 Jul, 2019

1 commit

  • Merge branch 'siginfo-linus' of
    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

    Pull force_sig() argument change from Eric Biederman:
    "A source of error over the years has been that force_sig has taken a
    task parameter when it is only safe to use force_sig with the current
    task.

    The force_sig function is built for delivering synchronous signals
    such as SIGSEGV where the userspace application caused a synchronous
    fault (such as a page fault) and the kernel responded with a signal.

    Because the name force_sig does not make this clear, and because the
    force_sig takes a task parameter the function force_sig has been
    abused for sending other kinds of signals over the years. Slowly those
    have been fixed when the oopses have been tracked down.

    This set of changes fixes the remaining abusers of force_sig and
    carefully rips out the task parameter from force_sig and friends
    making this kind of error almost impossible in the future"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
    signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
    signal: Remove the signal number and task parameters from force_sig_info
    signal: Factor force_sig_info_to_task out of force_sig_info
    signal: Generate the siginfo in force_sig
    signal: Move the computation of force into send_signal and correct it.
    signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
    signal: Remove the task parameter from force_sig_fault
    signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
    signal: Explicitly call force_sig_fault on current
    signal/unicore32: Remove tsk parameter from __do_user_fault
    signal/arm: Remove tsk parameter from __do_user_fault
    signal/arm: Remove tsk parameter from ptrace_break
    signal/nds32: Remove tsk parameter from send_sigtrap
    signal/riscv: Remove tsk parameter from do_trap
    signal/sh: Remove tsk parameter from force_sig_info_fault
    signal/um: Remove task parameter from send_sigtrap
    signal/x86: Remove task parameter from send_sigtrap
    signal: Remove task parameter from force_sig_mceerr
    signal: Remove task parameter from force_sig
    signal: Remove task parameter from force_sigsegv
    ...

    Linus Torvalds
     

03 Jul, 2019

1 commit

  • The code hasn't been used since it was added to the tree, and doesn't
    appear to actually be usable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

29 Jun, 2019

2 commits

  • madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
    for hugepages with overcommitting enabled. That is caused by suboptimal
    code in the current soft-offline implementation. See the following part:

        ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
                            MIGRATE_SYNC, MR_MEMORY_FAILURE);
        if (ret) {
                ...
        } else {
                /*
                 * We set PG_hwpoison only when the migration source hugepage
                 * was successfully dissolved, because otherwise hwpoisoned
                 * hugepage remains on free hugepage list, then userspace will
                 * find it as SIGBUS by allocation failure. That's not expected
                 * in soft-offlining.
                 */
                ret = dissolve_free_huge_page(page);
                if (!ret) {
                        if (set_hwpoison_free_buddy_page(page))
                                num_poisoned_pages_inc();
                }
        }
        return ret;

    Here dissolve_free_huge_page() returns -EBUSY if the migration source page
    was freed into buddy in migrate_pages(), but even in that case we actually
    have a chance that set_hwpoison_free_buddy_page() succeeds. So that means
    the current code gives up offlining too early.

    dissolve_free_huge_page() checks that a given hugepage is suitable for
    dissolving, where we should return success for the !PageHuge() case
    because the given hugepage is considered as already dissolved.

    This change also affects other callers of dissolve_free_huge_page(), which
    are cleaned up together.
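
    The relaxed check presumably takes this shape at the top of
    dissolve_free_huge_page() (a sketch; __dissolve_free_huge_page() here is
    a hypothetical stand-in for the existing dissolve logic):

        int dissolve_free_huge_page(struct page *page)
        {
                int rc = 0;

                /* An already-dissolved page counts as success. */
                if (!PageHuge(page))
                        return 0;

                spin_lock(&hugetlb_lock);
                if (PageHuge(page))     /* re-check under the lock */
                        rc = __dissolve_free_huge_page(page);   /* hypothetical */
                spin_unlock(&hugetlb_lock);
                return rc;
        }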

    [n-horiguchi@ah.jp.nec.com: v3]
    Link: http://lkml.kernel.org/r/1560761476-4651-3-git-send-email-n-horiguchi@ah.jp.nec.com
    Link: http://lkml.kernel.org/r/1560154686-18497-3-git-send-email-n-horiguchi@ah.jp.nec.com
    Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
    Signed-off-by: Naoya Horiguchi
    Reported-by: Chen, Jerry T
    Tested-by: Chen, Jerry T
    Reviewed-by: Mike Kravetz
    Reviewed-by: Oscar Salvador
    Cc: Michal Hocko
    Cc: Xishi Qiu
    Cc: "Chen, Jerry T"
    Cc: "Zhuo, Qiuxu"
    Cc: [4.19+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • The pass/fail of soft offline should be judged by checking whether the
    raw error page was finally contained or not (i.e. by the result of
    set_hwpoison_free_buddy_page()), but the current code does not work like
    that. It might lead us to misjudge the test result when
    set_hwpoison_free_buddy_page() fails.

    Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may
    not offline the original page and will not return an error.
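
    Presumably the fix propagates the containment failure, along these
    lines:

        ret = dissolve_free_huge_page(page);
        if (!ret) {
                if (set_hwpoison_free_buddy_page(page))
                        num_poisoned_pages_inc();
                else
                        ret = -EBUSY;   /* containment failed: report it */
        }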

    Link: http://lkml.kernel.org/r/1560154686-18497-2-git-send-email-n-horiguchi@ah.jp.nec.com
    Signed-off-by: Naoya Horiguchi
    Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
    Reviewed-by: Mike Kravetz
    Reviewed-by: Oscar Salvador
    Cc: Michal Hocko
    Cc: Xishi Qiu
    Cc: "Chen, Jerry T"
    Cc: "Zhuo, Qiuxu"
    Cc: [4.19+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this software may be redistributed and or modified under the terms
    of the gnu general public license gpl version 2 only as published by
    the free software foundation

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 1 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Richard Fontana
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141333.676969322@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

27 May, 2019

1 commit