04 Oct, 2022

40 commits

  • Create a new routine, remove_inode_single_folio, that removes a single
    folio from a file. This is refactored code from remove_inode_hugepages.
    It checks for the uncommon case in which the folio is still mapped and,
    if so, unmaps it.

    No functional change. This refactoring will be put to use and expanded
    upon in subsequent patches.

    Link: https://lkml.kernel.org/r/20220914221810.95771-5-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Miaohe Lin
    Cc: Andrea Arcangeli
    Cc: "Aneesh Kumar K.V"
    Cc: Axel Rasmussen
    Cc: David Hildenbrand
    Cc: Davidlohr Bueso
    Cc: James Houghton
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Muchun Song
    Cc: Naoya Horiguchi
    Cc: Pasha Tatashin
    Cc: Peter Xu
    Cc: Prakash Sangappa
    Cc: Sven Schnelle
    Signed-off-by: Andrew Morton

    Mike Kravetz
     
  • remove_huge_page removes a hugetlb page from the page cache. Rename it
    to hugetlb_delete_from_page_cache, which is a more descriptive name.
    huge_add_to_page_cache is global in scope but only deals with hugetlb
    pages; for consistency and clarity, rename it to hugetlb_add_to_page_cache.

    Link: https://lkml.kernel.org/r/20220914221810.95771-4-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Miaohe Lin
    Cc: Andrea Arcangeli
    Cc: "Aneesh Kumar K.V"
    Cc: Axel Rasmussen
    Cc: David Hildenbrand
    Cc: Davidlohr Bueso
    Cc: James Houghton
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Muchun Song
    Cc: Naoya Horiguchi
    Cc: Pasha Tatashin
    Cc: Peter Xu
    Cc: Prakash Sangappa
    Cc: Sven Schnelle
    Signed-off-by: Andrew Morton

    Mike Kravetz
     
  • Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
    synchronization") added code to take i_mmap_rwsem in read mode for the
    duration of fault processing. However, this has been shown to cause
    performance/scaling issues. Revert the code and go back to only taking
    the semaphore in huge_pmd_share during the fault path.

    Keep the code that takes i_mmap_rwsem in write mode before calling
    try_to_unmap as this is required if huge_pmd_unshare is called.

    NOTE: Reverting this code does expose the following race condition.

    Faulting thread                     Unsharing thread
    ...                                 ...
    ptep = huge_pte_offset()
          or
    ptep = huge_pte_alloc()
    ...
                                        i_mmap_lock_write
                                        lock page table
    ptep invalid
    Reviewed-by: Miaohe Lin
    Cc: Andrea Arcangeli
    Cc: "Aneesh Kumar K.V"
    Cc: Axel Rasmussen
    Cc: David Hildenbrand
    Cc: Davidlohr Bueso
    Cc: James Houghton
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Muchun Song
    Cc: Naoya Horiguchi
    Cc: Pasha Tatashin
    Cc: Peter Xu
    Cc: Prakash Sangappa
    Cc: Sven Schnelle
    Signed-off-by: Andrew Morton

    Mike Kravetz
     
  • Patch series "hugetlb: Use new vma lock for huge pmd sharing
    synchronization", v2.

    hugetlb fault scalability regressions have recently been reported [1].
    This is not the first such report, as regressions were also noted when
    commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
    synchronization") was added [2] in v5.7. At that time, a proposal to
    address the regression was suggested [3] but went nowhere.

    The regression and the benefit of this patch series are not evident when
    using the vm_scalability benchmark reported in [2] on a recent kernel.
    Results from running
    "./usemem -n 48 --prealloc --prefault -O -U 3448054972":

                              48 sample Avg
    next-20220913      next-20220913             next-20220913
     unmodified        revert i_mmap_sema        vma sema locking,
                       locking                   this series
    -----------------------------------------------------------------
    498150 KB/s        501934 KB/s               504793 KB/s

    The recent regression report [1] notes page fault and fork latency of
    shared hugetlb mappings. To measure this, I created two simple programs:

    1) map a shared hugetlb area, write fault all pages, unmap the area.
       Do this in a continuous loop to measure faults per second.
    2) map a shared hugetlb area, write fault a few pages, fork and exit.
       Do this in a continuous loop to measure forks per second.

    These programs were run on a 48 CPU VM with 320GB memory. The shared
    mapping size was 250GB. For comparison, a single instance of the program
    was run. Then, multiple instances were run in parallel to introduce
    lock contention. Changing the locking scheme results in a significant
    performance benefit.

    test              instances   unmodified   revert      vma
    --------------------------------------------------------------
    faults per sec        1         393043     395680     389932
    faults per sec       24          71405      81191      79048
    forks per sec         1           2802       2747       2725
    forks per sec        24            439        536        500
    Combined faults      24           1621      68070      53662
    Combined forks       24            358         67        142

    Combined test is when running both faulting program and forking program
    simultaneously.

    Patches 1 and 2 of this series revert c0d0381ade79 and 87bf91d39bb5 which
    depends on c0d0381ade79. Acquisition of i_mmap_rwsem is still required in
    the fault path to establish pmd sharing, so this is moved back to
    huge_pmd_share. With c0d0381ade79 reverted, this race is exposed:

    Faulting thread                     Unsharing thread
    ...                                 ...
    ptep = huge_pte_offset()
          or
    ptep = huge_pte_alloc()
    ...
                                        i_mmap_lock_write
                                        lock page table
    ptep invalid
    Reviewed-by: Miaohe Lin
    Cc: Andrea Arcangeli
    Cc: "Aneesh Kumar K.V"
    Cc: Axel Rasmussen
    Cc: David Hildenbrand
    Cc: Davidlohr Bueso
    Cc: James Houghton
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Muchun Song
    Cc: Naoya Horiguchi
    Cc: Pasha Tatashin
    Cc: Peter Xu
    Cc: Prakash Sangappa
    Cc: Sven Schnelle
    Signed-off-by: Andrew Morton

    Mike Kravetz
     
  • The pointer variables are assigned by the allocation call and only then
    checked, so there is no need to initialize them at declaration.

    Link: https://lkml.kernel.org/r/20220914012113.6271-1-xupengfei@nfschina.com
    Signed-off-by: XU pengfei
    Reviewed-by: Muchun Song
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton

    XU pengfei
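    The pattern this cleanup targets can be sketched in userspace C (the
    struct and function names here are illustrative, not from the patch):
    the pointer is overwritten by the allocation before it is ever read,
    so a `= NULL` initializer at declaration is dead code.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    struct item { int val; };

    /* Before: the initializer adds nothing, because the very next
     * statement overwrites the pointer. */
    struct item *make_item_old(void)
    {
        struct item *p = NULL;   /* redundant initialization */
        p = malloc(sizeof(*p));
        if (!p)
            return NULL;
        p->val = 0;
        return p;
    }

    /* After: allocate first, then judge. */
    struct item *make_item_new(void)
    {
        struct item *p = malloc(sizeof(*p));
        if (!p)
            return NULL;
        p->val = 0;
        return p;
    }
    ```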
     
  • It's only used in mm/filemap.c, since commit
    ("mm/migrate.c: rework migration_entry_wait() to not take a pageref").

    Make it static.

    Link: https://lkml.kernel.org/r/20220914021738.3228011-1-sunke@kylinos.cn
    Signed-off-by: Ke Sun
    Reported-by: k2ci
    Signed-off-by: Andrew Morton

    Ke Sun
     
  • The memory-notify-based approach aims to handle memory-less nodes;
    however, it just adds complexity to the code, as pointed out by David in
    thread [1]. The handling of memory-less nodes was introduced by commit
    4faf8d950ec4 ("hugetlb: handle memory hot-plug events"). From its commit
    message, we cannot find any necessity for handling this case, so we can
    simply register/unregister the sysfs entries in
    register_node/unregister_node to simplify the code.

    By the way, the hotplug callback was added because
    hugetlb_register_all_nodes() registers sysfs nodes only for N_MEMORY
    nodes (see commit 9b5e5d0fdc91, which described it as preparation for
    handling memory-less nodes via memory hotplug). Since we want to remove
    the memory hotplug handling, make sure we only register per-node sysfs
    for online (N_ONLINE) nodes in hugetlb_register_all_nodes().

    https://lore.kernel.org/linux-mm/60933ffc-b850-976c-78a0-0ee6e0ea9ef0@redhat.com/ [1]
    Link: https://lkml.kernel.org/r/20220914072603.60293-3-songmuchun@bytedance.com
    Suggested-by: David Hildenbrand
    Signed-off-by: Muchun Song
    Acked-by: David Hildenbrand
    Cc: Andi Kleen
    Cc: Greg Kroah-Hartman
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Rafael J. Wysocki
    Signed-off-by: Andrew Morton

    Muchun Song
     
  • Patch series "simplify handling of per-node sysfs creation and removal",
    v4.

    This patch (of 2):

    The following commit offloaded per-node sysfs creation and removal to a
    kworker without saying why it is needed. It even said "I don't know
    that this is absolutely required", so it seems the author was not sure
    either. Since it only complicates the code, this patch reverts the
    changes to simplify it:

    39da08cb074c ("hugetlb: offload per node attribute registrations")

    We could use a memory hotplug notifier to do per-node sysfs creation and
    removal instead of inserting those operations into node registration
    and unregistration. This reduces the coupling between node.c and
    hugetlb.c and also simplifies the code.

    Link: https://lkml.kernel.org/r/20220914072603.60293-1-songmuchun@bytedance.com
    Link: https://lkml.kernel.org/r/20220914072603.60293-2-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Acked-by: Mike Kravetz
    Acked-by: David Hildenbrand
    Cc: Andi Kleen
    Cc: Greg Kroah-Hartman
    Cc: Muchun Song
    Cc: Oscar Salvador
    Cc: Rafael J. Wysocki
    Signed-off-by: Andrew Morton

    Muchun Song
     
  • Replace the simple calculation with PAGE_ALIGN.

    Link: https://lkml.kernel.org/r/20220913015505.1998958-1-zuoze1@huawei.com
    Signed-off-by: ze zuo
    Reviewed-by: Muchun Song
    Signed-off-by: Andrew Morton

    ze zuo
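    For reference, PAGE_ALIGN is the standard kernel round-up-to-page macro
    (defined via include/linux/mm.h). A minimal userspace sketch, assuming a
    4K page size for illustration, showing the open-coded calculation it
    replaces:

    ```c
    #include <assert.h>

    #define PAGE_SIZE 4096UL
    #define PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

    /* The open-coded round-up that PAGE_ALIGN replaces. */
    static unsigned long open_coded_align(unsigned long addr)
    {
        return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    }
    ```

    Both forms compute the same value; the macro just states the intent.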
     
  • Cc: Catalin Marinas
    Cc: ke.wang
    Cc: Matthew Wilcox
    Cc: Zhaoyang Huang
    Signed-off-by: Andrew Morton

    Andrew Morton
     
  • The name "check_free_page()" provides no information regarding its return
    value when the page is indeed found to be bad.

    Renaming it to "free_page_is_bad()" makes it clear that a `true' return
    value means the page was bad.

    And make it return a bool, not an int.

    [akpm@linux-foundation.org: don't use bool as int]
    Cc: Catalin Marinas
    Cc: ke.wang
    Cc: Matthew Wilcox
    Cc: Zhaoyang Huang
    Signed-off-by: Andrew Morton

    Andrew Morton
     
  • Use kstrtobool, which is more powerful and can handle all kinds of
    parameters, such as 'Yy1Nn0' or [oO][NnFf] for "on" and "off".

    Link: https://lkml.kernel.org/r/20220913071358.1812206-1-liushixin2@huawei.com
    Signed-off-by: Liu Shixin
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Jonathan Corbet
    Cc: Kefeng Wang
    Cc: Muchun Song
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton

    Liu Shixin
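    A simplified userspace model of the parsing behavior kstrtobool
    provides (the real implementation lives in lib/kstrtox.c and returns
    -EINVAL on failure; this sketch uses -1 and omits some details):

    ```c
    #include <assert.h>

    /* Sketch of kstrtobool(): the first character decides, with
     * "on"/"off" distinguished by the second character. */
    static int sketch_strtobool(const char *s, int *res)
    {
        if (!s)
            return -1;
        switch (s[0]) {
        case 'y': case 'Y': case 't': case 'T': case '1':
            *res = 1;
            return 0;
        case 'n': case 'N': case 'f': case 'F': case '0':
            *res = 0;
            return 0;
        case 'o': case 'O':
            switch (s[1]) {
            case 'n': case 'N':
                *res = 1;
                return 0;
            case 'f': case 'F':
                *res = 0;
                return 0;
            }
            return -1;
        }
        return -1;
    }
    ```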
     
  • When the 'kdamond_wait_activation()' function or the 'after_sampling()'
    or 'after_aggregation()' DAMON callbacks return an error, it is
    unnecessary to use the bool 'done' to check whether kdamond should
    finish. This commit simplifies the kdamond stop mechanism by removing
    'done' and breaking out of the while loop directly in those cases.

    Link: https://lkml.kernel.org/r/1663060287-30201-4-git-send-email-kaixuxia@tencent.com
    Signed-off-by: Kaixu Xia
    Reviewed-by: SeongJae Park
    Signed-off-by: Andrew Morton

    Kaixu Xia
     
  • We can initialize the variable 'pid' to '-1' in pid_show() to simplify
    the variable assignment and make the code more readable.

    Link: https://lkml.kernel.org/r/1663060287-30201-3-git-send-email-kaixuxia@tencent.com
    Signed-off-by: Kaixu Xia
    Reviewed-by: SeongJae Park
    Signed-off-by: Andrew Morton

    Kaixu Xia
     
  • Patch series "mm/damon: code simplifications and cleanups".

    This patchset contains some code simplifications and cleanups for DAMON.

    This patch (of 4):

    The parameter 'struct damon_ctx *ctx' isn't used in the functions
    __damon_{p,v}a_prepare_access_check(), so we can remove it and simplify
    the parameter passing.

    Link: https://lkml.kernel.org/r/1663060287-30201-1-git-send-email-kaixuxia@tencent.com
    Link: https://lkml.kernel.org/r/1663060287-30201-2-git-send-email-kaixuxia@tencent.com
    Signed-off-by: Kaixu Xia
    Reviewed-by: SeongJae Park
    Signed-off-by: Andrew Morton

    Kaixu Xia
     
  • damon_lru_sort_new_{hot,cold}_scheme() contain quite a lot of duplicated
    code. This commit factors the duplicate out into a separate function and
    uses it to reduce the duplication.

    Link: https://lkml.kernel.org/r/20220913174449.50645-23-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • This commit makes DAMON_LRU_SORT generate the module parameters for
    DAMOS watermarks using the generator macro to simplify the code and reduce
    duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-22-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • This commit makes DAMON_RECLAIM generate the module parameters for
    DAMOS quotas using the generator macro to simplify the code and reduce
    duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-21-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • DAMON_LRU_SORT has module parameters for the DAMOS time quota only, but
    not for the size quota. This commit implements a macro for generating
    the module parameters so that they can be reused later.

    Link: https://lkml.kernel.org/r/20220913174449.50645-20-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • DAMON_RECLAIM and DAMON_LRU_SORT have module parameters for DAMOS quotas
    that have the same names. This commit implements a macro for generating
    such module parameters so that they can be reused later.

    Link: https://lkml.kernel.org/r/20220913174449.50645-19-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • This commit makes DAMON_LRU_SORT generate the module parameters for
    DAMOS statistics using the generator macro to simplify the code and reduce
    duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-18-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • This commit makes DAMON_RECLAIM generate the module parameters for
    DAMOS statistics using the generator macro to simplify the code and
    reduce duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-17-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • DAMON_RECLAIM and DAMON_LRU_SORT have module parameters for DAMOS
    statistics that have the same names. This commit implements a macro for
    generating such module parameters so that they can be reused later.

    Link: https://lkml.kernel.org/r/20220913174449.50645-16-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • This commit makes DAMON_RECLAIM generate the module parameters for
    DAMOS watermarks using the generator macro to simplify the code and reduce
    duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-15-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • This commit makes DAMON_LRU_SORT generate the module parameters for
    DAMOS watermarks using the generator macro to simplify the code and reduce
    duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-14-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • DAMON_RECLAIM and DAMON_LRU_SORT have module parameters for watermarks
    that have the same names. This commit implements a macro for generating
    such module parameters so that they can be reused later.

    Link: https://lkml.kernel.org/r/20220913174449.50645-13-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
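    The generator-macro idea behind this sub-series can be sketched in
    userspace like so. The macro name, variable names, and fields are
    hypothetical, not the actual DAMON macros; in the kernel the macro body
    would additionally emit module_param() declarations:

    ```c
    #include <assert.h>

    /* Hypothetical generator: one macro stamps out the same trio of
     * watermark variables for each module prefix, instead of each module
     * declaring them by hand. */
    #define DEFINE_WATERMARK_PARAMS(prefix)          \
        static unsigned long prefix##_wmarks_high;   \
        static unsigned long prefix##_wmarks_mid;    \
        static unsigned long prefix##_wmarks_low;

    /* Each user expands to its own independent set of variables. */
    DEFINE_WATERMARK_PARAMS(reclaim)
    DEFINE_WATERMARK_PARAMS(lru_sort)
    ```

    Token pasting (`##`) gives each module distinctly named parameters while
    the definition itself is written only once.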
     
  • This commit makes DAMON_RECLAIM generate the module parameters for
    DAMON monitoring attributes using the generator macro to simplify the code
    and reduce duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-12-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • This commit makes DAMON_LRU_SORT generate the module parameters for
    DAMON monitoring attributes using the generator macro to simplify the code
    and reduce duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-11-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • DAMON_RECLAIM and DAMON_LRU_SORT have module parameters for monitoring
    attributes that have the same names. This commit implements a macro for
    generating such module parameters so that they can be reused later.

    Link: https://lkml.kernel.org/r/20220913174449.50645-10-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • DAMON_LRU_SORT receives monitoring attributes via parameters one by one
    into separate variables, and then combines those into a 'struct
    damon_attrs'. This commit makes the module store the parameter values
    directly in a static 'struct damon_attrs' variable and use it, to
    simplify the code.

    Link: https://lkml.kernel.org/r/20220913174449.50645-9-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • DAMON_RECLAIM receives monitoring attributes via parameters one by one
    into separate variables, and then combines those into a 'struct
    damon_attrs'. This commit makes the module store the parameter values
    directly in a static 'struct damon_attrs' variable and use it, to
    simplify the code.

    Link: https://lkml.kernel.org/r/20220913174449.50645-8-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • 'damon_set_attrs()' takes six parameters. As this can be confusing and
    verbose, this commit reduces the number by receiving a single pointer to
    a 'struct damon_attrs'.

    Link: https://lkml.kernel.org/r/20220913174449.50645-7-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
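    The signature change can be illustrated with a hypothetical sketch. The
    field names follow DAMON's sampling/aggregation terminology, but the
    exact struct layout and names here are assumptions for illustration:

    ```c
    #include <assert.h>

    struct damon_attrs_sketch {
        unsigned long sample_interval;
        unsigned long aggr_interval;
        unsigned long ops_update_interval;
        unsigned long min_nr_regions;
        unsigned long max_nr_regions;
    };

    struct ctx_sketch {
        struct damon_attrs_sketch attrs;
    };

    /* Before: set_attrs(ctx, sample, aggr, ops_update, min, max) -- six
     * parameters.  After: one pointer to a struct carrying all five. */
    static void set_attrs_sketch(struct ctx_sketch *ctx,
                                 const struct damon_attrs_sketch *attrs)
    {
        ctx->attrs = *attrs;   /* single struct-to-struct copy */
    }
    ```

    Callers now build one struct and pass its address, which also makes it
    harder to swap two same-typed arguments by accident.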
     
  • DAMON monitoring attributes are directly defined as fields of 'struct
    damon_ctx'. This makes 'struct damon_ctx' a little long and complicated.
    This commit defines and uses a struct, 'struct damon_attrs', which is
    dedicated for only the monitoring attributes to make the purpose of the
    five values clearer and simplify 'struct damon_ctx'.

    Link: https://lkml.kernel.org/r/20220913174449.50645-6-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • The 'struct damos' creation function, 'damon_new_scheme()', initializes
    the private fields of 'struct damos_quota' inline. As this is verbose
    and makes the function unnecessarily long, this commit factors it out
    into a separate function.

    Link: https://lkml.kernel.org/r/20220913174449.50645-5-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • The function for creating a new 'struct damos', 'damon_new_scheme()',
    copies each field of the struct one by one, though it could simply be
    copied struct to struct. This commit replaces the unnecessarily verbose
    field-to-field copies with struct-to-struct copies to make the code
    simple and short.

    Link: https://lkml.kernel.org/r/20220913174449.50645-4-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
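    The simplification is a generic C idiom and can be sketched with a
    hypothetical struct (the names here are illustrative, not the real
    'struct damos' fields):

    ```c
    #include <assert.h>

    struct quota_sketch { unsigned long ms, sz; };

    /* Verbose: copy each field by hand. */
    static void copy_fields(struct quota_sketch *dst,
                            const struct quota_sketch *src)
    {
        dst->ms = src->ms;
        dst->sz = src->sz;
    }

    /* Simple: one struct assignment does the same thing, and keeps
     * working when fields are added later. */
    static void copy_struct(struct quota_sketch *dst,
                            const struct quota_sketch *src)
    {
        *dst = *src;
    }
    ```

    Besides brevity, the struct assignment cannot silently miss a newly
    added field the way a field-by-field copy can.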
     
  • The bodies of damon_pa_{mark_accessed,deactivate_pages}() contain
    duplicated code. This commit factors out the common part into a separate
    function and removes the duplicates.

    Link: https://lkml.kernel.org/r/20220913174449.50645-3-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
     
  • Patch series "mm/damon: cleanup code".

    DAMON code was not so clean from the beginning, but it has become too
    messy nowadays, especially due to the duplication between DAMON_RECLAIM
    and DAMON_LRU_SORT. This patchset cleans up some of the mess.

    This patch (of 22):

    The 'switch-case' statement in the 'damon_va_apply_scheme()' function
    provides a 'case' for every supported DAMOS action, while all
    not-yet-supported DAMOS actions fall through to the 'default' case,
    which is commented so that people can easily know which actions are
    supported. Its counterpart in 'paddr', 'damon_pa_apply_scheme()',
    however, doesn't. This commit makes the 'paddr' side function follow
    the pattern of 'vaddr' for better readability and consistency.

    Link: https://lkml.kernel.org/r/20220913174449.50645-1-sj@kernel.org
    Link: https://lkml.kernel.org/r/20220913174449.50645-2-sj@kernel.org
    Signed-off-by: SeongJae Park
    Signed-off-by: Andrew Morton

    SeongJae Park
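    The commented-default pattern being unified can be sketched like this
    (the action names are hypothetical placeholders, not the real DAMOS
    action list):

    ```c
    #include <assert.h>

    enum action_sketch { ACT_PAGEOUT, ACT_MARK_ACCESSED, ACT_FUTURE };

    static int apply_scheme_sketch(enum action_sketch act)
    {
        switch (act) {
        /* Explicit cases for every supported action ... */
        case ACT_PAGEOUT:
            return 1;
        case ACT_MARK_ACCESSED:
            return 2;
        default:
            /* ... and a commented default for not-yet-supported actions,
             * so readers can tell at a glance what is handled. */
            break;
        }
        return 0;
    }
    ```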
     
  • In damon_lru_sort_apply_parameters(), we can use damon_set_schemes() to
    replace the way the first 'scheme' is created in the original code;
    this makes the code look cleaner.

    Link: https://lkml.kernel.org/r/20220911005917.835-1-xhao@linux.alibaba.com
    Signed-off-by: Xin Hao
    Reviewed-by: SeongJae Park
    Signed-off-by: Andrew Morton

    Xin Hao
     
  • Several trivial fixups (that I should have spotted during review).

    Link: https://lkml.kernel.org/r/20220914052033.838050-1-senozhatsky@chromium.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton

    Sergey Senozhatsky
     
  • zram_table_entry::flags stores the object size in the lower bits and
    zram pageflags in the upper bits. However, for some reason, we use the
    24 lower bits for the size, while the maximum zram object size is
    PAGE_SIZE, which requires only PAGE_SHIFT bits (up to 16 on arm64).
    This wastes 24 - PAGE_SHIFT bits that we could use for additional zram
    pageflags instead.

    Also add a BUILD_BUG_ON() to alert us should we run out of bits in
    zram_table_entry::flags.

    Link: https://lkml.kernel.org/r/20220912152744.527438-1-senozhatsky@chromium.org
    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Brian Geffon
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton

    Sergey Senozhatsky
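    The bit-budget idea can be sketched as follows. The constants are
    illustrative (the real layout lives in drivers/block/zram/zram_drv.h and
    uses the target architecture's PAGE_SHIFT), and the compile-time check
    here uses C11 _Static_assert as a stand-in for the kernel's
    BUILD_BUG_ON():

    ```c
    #include <assert.h>

    #define SKETCH_PAGE_SHIFT 12                 /* 4K pages, for illustration */
    #define ZRAM_FLAG_SHIFT   SKETCH_PAGE_SHIFT  /* was a hard-coded 24 */

    /* Pageflags start right above the size bits, so shrinking the size
     * field from 24 bits to PAGE_SHIFT bits frees room for more flags. */
    enum sketch_flags {
        SKETCH_ZRAM_SAME = ZRAM_FLAG_SHIFT,
        SKETCH_ZRAM_WB,
        SKETCH_ZRAM_LAST,
    };

    /* Compile-time guard: all flags must fit in an unsigned long,
     * analogous to the BUILD_BUG_ON() the patch adds. */
    _Static_assert(SKETCH_ZRAM_LAST < 8 * sizeof(unsigned long),
                   "ran out of bits in zram_table_entry::flags");
    ```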