02 Oct, 2020

1 commit

  • CMA allocations will fail if 'pinned' pages are in a CMA area, since
    we cannot migrate pinned pages. The _refcount of a struct page being
    greater than _mapcount for that page can cause pinning for anonymous
    pages. This is because try_to_unmap(), which (1) is called in the CMA
    allocation path, and (2) decrements both _refcount and _mapcount for a
    page, will stop unmapping a page from VMAs once the _mapcount for a
    page reaches 0. This implies that after try_to_unmap() has finished
    successfully for a page where _refcount > _mapcount, _refcount will
    still be greater than 0. Later in the CMA allocation path, in
    migrate_page_move_mapping(), we will see one more reference than
    expected for anonymous pages, meaning the allocation will fail for
    that page.

    One example of where _refcount can be greater than _mapcount for a
    page we would not expect to be pinned is inside of copy_one_pte(),
    which is called during a fork. For ptes for which pte_present(pte) ==
    true, copy_one_pte() will increment the _refcount field followed by
    the _mapcount field of a page. If the process doing copy_one_pte() is
    context switched out after incrementing _refcount but before
    incrementing _mapcount, then the page will be temporarily pinned.

    So, inside of cma_alloc(), instead of giving up when
    alloc_contig_range() returns -EBUSY after having scanned a whole
    CMA-region bitmap, perform retries with sleeps to give the system an
    opportunity to unpin any pinned pages (a simplified sketch of this
    retry loop follows this entry).

    Additionally, based on feedback from Minchan Kim, add the ability to
    exit early if a fatal signal is pending (this is a delta from the
    mailing-list version of this patch).

    Bug: 168521646
    Link: https://lore.kernel.org/lkml/1596682582-29139-2-git-send-email-cgoldswo@codeaurora.org/
    Signed-off-by: Chris Goldsworthy
    Co-developed-by: Susheel Khiani
    Signed-off-by: Susheel Khiani
    Co-developed-by: Vinayak Menon
    Signed-off-by: Vinayak Menon
    Change-Id: I2f0c8388f9163e0decd631d9ae07bb6ad9ab79c8

    Chris Goldsworthy
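
    A rough sketch of the retry loop described in the entry above; the
    attempt limit and the 100 ms sleep are illustrative values, not taken
    from the actual patch:

        /* Inside cma_alloc()'s scan loop, after the whole bitmap of the
         * current CMA region has been searched without success. */
        if (ret == -EBUSY) {
            if (fatal_signal_pending(current))
                break;                          /* exit early */
            if (++num_attempts <= max_retries) {        /* illustrative */
                /*
                 * Transient pins such as the copy_one_pte() race above
                 * should disappear shortly; sleep and rescan the bitmap
                 * from the beginning.
                 */
                msleep(100);
                start = 0;
                continue;
            }
        }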
     

13 Aug, 2020

4 commits

  • …ernel/git/abelloni/linux") into android-mainline

    Steps on the way to 5.9-rc1.

    Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
    Change-Id: Iceded779988ff472863b7e1c54e22a9fa6383a30

    Greg Kroah-Hartman
     
  • The routine cma_init_reserved_areas is designed to activate all
    reserved cma areas. It quits when it first encounters an error.
    This can leave some areas in a state where they are reserved but
    not activated. There is no feedback to code which performed the
    reservation. Attempting to allocate memory from areas in such a
    state will result in a BUG.

    Modify cma_init_reserved_areas to always attempt to activate all
    areas. The called routine, cma_activate_area, is responsible for
    leaving each area in a valid state. No one is making active use of
    the returned error codes, so change the routine to void (a sketch of
    the resulting init loop follows this entry).

    How to reproduce: This example uses kernelcore, hugetlb and cma
    as an easy way to reproduce. However, this is a more general cma
    issue.

    Two node x86 VM 16GB total, 8GB per node
    Kernel command line parameters, kernelcore=4G hugetlb_cma=8G
    Related boot time messages,
    hugetlb_cma: reserve 8192 MiB, up to 4096 MiB per node
    cma: Reserved 4096 MiB at 0x0000000100000000
    hugetlb_cma: reserved 4096 MiB on node 0
    cma: Reserved 4096 MiB at 0x0000000300000000
    hugetlb_cma: reserved 4096 MiB on node 1
    cma: CMA area hugetlb could not be activated

    # echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    ...
    Call Trace:
    bitmap_find_next_zero_area_off+0x51/0x90
    cma_alloc+0x1a5/0x310
    alloc_fresh_huge_page+0x78/0x1a0
    alloc_pool_huge_page+0x6f/0xf0
    set_max_huge_pages+0x10c/0x250
    nr_hugepages_store_common+0x92/0x120
    ? __kmalloc+0x171/0x270
    kernfs_fop_write+0xc1/0x1a0
    vfs_write+0xc7/0x1f0
    ksys_write+0x5f/0xe0
    do_syscall_64+0x4d/0x90
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: c64be2bb1c6e ("drivers: add Contiguous Memory Allocator")
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Reviewed-by: Roman Gushchin
    Acked-by: Barry Song
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: Kyungmin Park
    Cc: Joonsoo Kim
    Cc:
    Link: http://lkml.kernel.org/r/20200730163123.6451-1-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds

    Mike Kravetz
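
    A sketch of what the reworked init path might look like after this
    change, with cma_activate_area() returning void and marking a failed
    area unusable itself:

        static int __init cma_init_reserved_areas(void)
        {
            int i;

            /* Try every reserved area; one failure no longer stops the
             * loop or leaves later areas reserved but not activated. */
            for (i = 0; i < cma_area_count; i++)
                cma_activate_area(&cma_areas[i]);

            return 0;
        }
        core_initcall(cma_init_reserved_areas);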
     
  • Patch series "mm: fix the names of general cma and hugetlb cma", v2.

    The current CMA code only works when users pass a const string as the
    name parameter, so we need to fix the way names are handled in CMA.
    On the other hand, to avoid name conflicts after enabling CMA_DEBUGFS,
    each hugetlb CMA area should get a different name.

    This patch (of 2):

    If users pass a name saved on the stack, the current code ends up
    holding a dangling pointer. If users don't give a name (NULL),
    kasprintf() will always return NULL because we are at an early boot
    stage, which means cma_init_reserved_mem() will return -ENOMEM when
    the name parameter is NULL (see the sketch after this entry).

    [natechancellor@gmail.com: return cma->name directly in cma_get_name]
    Link: https://github.com/ClangBuiltLinux/linux/issues/1063
    Link: http://lkml.kernel.org/r/20200623015840.621964-1-natechancellor@gmail.com

    Signed-off-by: Barry Song
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Acked-by: Roman Gushchin
    Link: http://lkml.kernel.org/r/20200616223131.33828-2-song.bao.hua@hisilicon.com
    Signed-off-by: Linus Torvalds

    Barry Song
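
    A sketch of the fix, assuming struct cma gains an embedded name array
    of CMA_MAX_NAME bytes instead of storing the caller's pointer:

        /* in cma_init_reserved_mem(): copy the name, never keep the pointer */
        if (name)
            snprintf(cma->name, CMA_MAX_NAME, "%s", name);
        else
            snprintf(cma->name, CMA_MAX_NAME, "cma%d", cma_area_count);

        const char *cma_get_name(const struct cma *cma)
        {
            /* Always valid now, even if the caller's string lived on the
             * stack or was NULL. */
            return cma->name;
        }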
     
  • In some cases the CMA area cannot be activated, yet cma_alloc() may
    still be called on it; the kernel will then crash with a NULL pointer
    dereference.

    Add a bitmap validity check in cma_alloc() to avoid this issue, as
    sketched after this entry.

    Signed-off-by: Jianqun Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Link: http://lkml.kernel.org/r/20200615010123.15596-1-jay.xu@rock-chips.com
    Signed-off-by: Linus Torvalds

    Jianqun Xu
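
    A minimal sketch of the added check:

        /* At the top of cma_alloc(): an area that failed activation has
         * no bitmap, so bail out instead of dereferencing NULL. */
        if (!cma || !cma->count || !cma->bitmap)
            return NULL;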
     

06 Jul, 2020

1 commit


04 Jul, 2020

1 commit

  • Calling cma_declare_contiguous_nid() with false exact_nid for per-numa
    reservation can easily cause cma leak and various confusion. For example,
    mm/hugetlb.c is trying to reserve per-numa cma for gigantic pages. But it
    can easily leak cma and make users confused when system has memoryless
    nodes.

    Consider a system with 4 NUMA nodes where only node0 has memory. If
    we set hugetlb_cma=4G in the boot arguments, mm/hugetlb.c will create
    4 cma areas for the 4 nodes. Since exact_nid=false in the current
    code, all 4 areas will successfully reserve memory from node0, but
    hugetlb_cma[1] through hugetlb_cma[3] will never be usable for
    hugepages, because mm/hugetlb.c will only allocate memory from
    hugetlb_cma[0].

    Now consider a system with 4 NUMA nodes where only node0 and node2
    have memory. If we set hugetlb_cma=4G in the boot arguments,
    mm/hugetlb.c will again create 4 cma areas. Since exact_nid=false in
    the current code, all 4 areas will successfully reserve memory from
    node0 or node2, but hugetlb_cma[1] and hugetlb_cma[3] will never be
    usable for hugepages, as mm/hugetlb.c will only allocate memory from
    hugetlb_cma[0] and hugetlb_cma[2]. This permanently leaks the cma
    areas that were reserved on behalf of the memoryless nodes.

    Of course we could work around the issue by letting mm/hugetlb.c scan
    all cma areas in alloc_gigantic_page() even when node_mask includes
    only node0; we could then get pages from hugetlb_cma[1] to
    hugetlb_cma[3]. But this would cause a kernel crash in
    free_gigantic_page() when it frees the page by:
    cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)

    On the other hand, exact_nid=false does not consider NUMA distance,
    so leveraging cma areas on remote nodes might not be that useful
    anyway. It is much simpler to make exact_nid true and make everything
    clear. After that, memoryless nodes will no longer reserve per-numa
    CMA from other nodes which have memory.

    Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
    Signed-off-by: Barry Song
    Signed-off-by: Andrew Morton
    Acked-by: Roman Gushchin
    Cc: Jonathan Cameron
    Cc: Aslan Bakirov
    Cc: Michal Hocko
    Cc: Andreas Schaufler
    Cc: Mike Kravetz
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Cc: Robin Murphy
    Cc:
    Link: http://lkml.kernel.org/r/20200628074345.27228-1-song.bao.hua@hisilicon.com
    Signed-off-by: Linus Torvalds

    Barry Song
     

11 Apr, 2020

2 commits

  • Steps along the way to the 5.7-rc1 merge.

    Signed-off-by: Greg Kroah-Hartman
    Change-Id: Iaf237a174205979344cfa76274198e87e2ba7799

    Greg Kroah-Hartman
     
  • I've noticed that there is no interface exposed by CMA which would
    let me declare contiguous memory on a particular NUMA node.

    This patchset adds the ability to try to allocate contiguous memory
    on a specific node. It will fall back to other nodes if the specified
    one doesn't work.

    Implement a new method for declaring contiguous memory on a
    particular node and keep cma_declare_contiguous() as a wrapper (see
    the sketch after this entry).

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Aslan Bakirov
    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Acked-by: Michal Hocko
    Cc: Andreas Schaufler
    Cc: Mike Kravetz
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200407163840.92263-2-guro@fb.com
    Signed-off-by: Linus Torvalds

    Aslan Bakirov
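
    The wrapper might look roughly like this, with the old interface
    simply forwarding to the node-aware variant using NUMA_NO_NODE:

        static inline int __init cma_declare_contiguous(phys_addr_t base,
                phys_addr_t size, phys_addr_t limit,
                phys_addr_t alignment, unsigned int order_per_bit,
                bool fixed, const char *name, struct cma **res_cma)
        {
            /* Existing callers are unchanged; only the nid argument is new. */
            return cma_declare_contiguous_nid(base, size, limit, alignment,
                    order_per_bit, fixed, name, res_cma, NUMA_NO_NODE);
        }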
     

09 Dec, 2019

1 commit


02 Dec, 2019

1 commit

  • kzalloc() is used for the cma bitmap allocation in
    cma_activate_area(); switch to bitmap_zalloc() for clarity (see the
    sketch after this entry).

    Link: http://lkml.kernel.org/r/895d4627-f115-c77a-d454-c0a196116426@huawei.com
    Signed-off-by: Yunfeng Ye
    Reviewed-by: Andrew Morton
    Cc: Mike Rapoport
    Cc: Yue Hu
    Cc: Peng Fan
    Cc: Andrey Ryabinin
    Cc: Ryohei Suzuki
    Cc: Andrey Konovalov
    Cc: Doug Berger
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yunfeng Ye
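
    A sketch of the change; bitmap_zalloc() takes a bit count directly,
    so the BITS_TO_LONGS() arithmetic goes away:

        /* before */
        cma->bitmap = kzalloc(BITS_TO_LONGS(cma_bitmap_maxno(cma)) *
                              sizeof(long), GFP_KERNEL);

        /* after */
        cma->bitmap = bitmap_zalloc(cma_bitmap_maxno(cma), GFP_KERNEL);

        /* and the matching free side */
        bitmap_free(cma->bitmap);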
     

25 Sep, 2019

1 commit


17 Jul, 2019

2 commits

  • The description of cma_declare_contiguous() indicates that if the
    'fixed' argument is true the reserved contiguous area must be exactly at
    the address of the 'base' argument.

    However, the function currently allows the 'base', 'size', and
    'limit' arguments to be silently adjusted to meet alignment
    constraints. This commit enforces the documented behavior through
    explicit checks that return an error if the reservation cannot be
    honored exactly as specified (see the sketch after this entry).

    Link: http://lkml.kernel.org/r/1561422051-16142-1-git-send-email-opendmb@gmail.com
    Fixes: 5ea3b1b2f8ad ("cma: add placement specifier for "cma=" kernel parameter")
    Signed-off-by: Doug Berger
    Acked-by: Michal Nazarewicz
    Cc: Yue Hu
    Cc: Mike Rapoport
    Cc: Laura Abbott
    Cc: Peng Fan
    Cc: Thomas Gleixner
    Cc: Marek Szyprowski
    Cc: Andrey Konovalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Berger
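
    A sketch of the kind of checks this adds to cma_declare_contiguous(),
    around the existing alignment handling:

        if (fixed && base & (alignment - 1)) {
            ret = -EINVAL;
            pr_err("Region at %pa must be aligned to %pa bytes\n",
                   &base, &alignment);
            goto err;
        }

        /* ... existing rounding of base, size and limit ... */

        if (fixed && base + size > limit) {
            ret = -EINVAL;
            pr_err("Size (%pa) of region at %pa exceeds limit (%pa)\n",
                   &size, &base, &limit);
            goto err;
        }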
     
  • A comment referred to a non-existent function alloc_cma(), which should
    have been cma_alloc().

    Link: http://lkml.kernel.org/r/20190712085549.5920-1-ryh.szk.cmnty@gmail.com
    Signed-off-by: Ryohei Suzuki
    Reviewed-by: Andrew Morton
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryohei Suzuki
     

24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your optional any later version of the license

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Richard Fontana
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520075212.713472955@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

2 commits

  • f022d8cb7ec7 ("mm: cma: Don't crash on allocation if CMA area can't be
    activated") fixes the crash when activation fails by setting
    cma->count to 0; apply the same logic when the bitmap allocation
    fails (see the sketch after this entry).

    Link: http://lkml.kernel.org/r/20190325081309.6004-1-zbestahu@gmail.com
    Signed-off-by: Yue Hu
    Reviewed-by: Anshuman Khandual
    Cc: Joonsoo Kim
    Cc: Laura Abbott
    Cc: Mike Rapoport
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yue Hu
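
    A sketch of the corresponding error path in cma_activate_area():

        cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
        if (!cma->bitmap) {
            /* As in f022d8cb7ec7: a zero count keeps cma_alloc() from
             * ever using this half-initialized area. */
            cma->count = 0;
            return -ENOMEM;
        }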
     
  • Currently one bit in the cma bitmap represents a number of pages
    (1 << order_per_bit) rather than one page, and cma->count is the cma
    size in pages. So when walking the bitmap via
    find_next_zero_bit()/find_next_bit() we should use the cma size in
    bits, not in pages. The reported number of free pages is only correct
    today because order_per_bit happens to be zero; once order_per_bit
    changes, the reported bitmap status becomes incorrect.

    The size passed into cma_debug_show_areas() is therefore wrong, which
    distorts the per-position availability information used to debug
    allocation failures (see the sketch after this entry).

    This is an example with order_per_bit = 1

    Before this change:
    [ 4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages

    After this change:
    [ 4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages

    Obviously the bitmap status before is incorrect.

    Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.com
    Signed-off-by: Yue Hu
    Reviewed-by: Andrew Morton
    Cc: Joonsoo Kim
    Cc: Ingo Molnar
    Cc: Vlastimil Babka
    Cc: Mike Rapoport
    Cc: Randy Dunlap
    Cc: Laura Abbott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yue Hu
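
    A condensed sketch of cma_debug_show_areas() after the fix: the walk
    is bounded by the bitmap size in bits, and each zero run is converted
    back to pages via order_per_bit when printed:

        unsigned long nbits = cma_bitmap_maxno(cma);    /* bits, not pages */
        unsigned long start = 0, nr_total = 0;

        for (;;) {
            unsigned long zero, set, nr_part;

            zero = find_next_zero_bit(cma->bitmap, nbits, start);
            if (zero >= nbits)
                break;
            set = find_next_bit(cma->bitmap, nbits, zero);
            nr_part = (set - zero) << cma->order_per_bit;  /* bits -> pages */
            pr_cont("%s%lu@%lu", nr_total ? "+" : "", nr_part, zero);
            nr_total += nr_part;
            start = set;
        }
        pr_cont("=> %lu free of %lu total pages\n", nr_total, cma->count);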
     

13 Mar, 2019

1 commit

  • Rename memblock_alloc_range() to memblock_phys_alloc_range() to
    emphasize that it returns a physical address.

    While at it, remove the 'enum memblock_flags' parameter from this
    function, as its only user sets it to MEMBLOCK_NONE anyway, which is
    the default for most memblock allocations (see the sketch after this
    entry).

    Link: http://lkml.kernel.org/r/1548057848-15136-6-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
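
    The call site in cma_declare_contiguous() would change roughly like
    this; the new name makes the physical return value explicit and the
    flags argument disappears:

        /* before */
        addr = memblock_alloc_range(size, alignment, highmem_start,
                                    limit, MEMBLOCK_NONE);

        /* after */
        addr = memblock_phys_alloc_range(size, alignment, highmem_start,
                                         limit);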
     

06 Mar, 2019

1 commit

  • If cma_init_reserved_mem() fails, we need to free the memblock region
    allocated by memblock_reserve() or memblock_alloc_range() (see the
    sketch after this entry).

    Quote Catalin's comments:
    https://lkml.org/lkml/2019/2/26/482

    Kmemleak is supposed to work with the memblock_{alloc,free} pair and it
    ignores the memblock_reserve() as a memblock_alloc() implementation
    detail. It is, however, tolerant to memblock_free() being called on
    a sub-range or just a different range from a previous memblock_alloc().
    So the original patch looks fine to me. FWIW:

    Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com
    Signed-off-by: Peng Fan
    Reviewed-by: Catalin Marinas
    Reviewed-by: Mike Rapoport
    Cc: Laura Abbott
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Marek Szyprowski
    Cc: Andrey Konovalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peng Fan
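
    A sketch of the error handling this adds to cma_declare_contiguous():

        ret = cma_init_reserved_mem(base, size, order_per_bit, name, res_cma);
        if (ret) {
            /* Don't leak the just-reserved range; as noted above,
             * kmemleak tolerates freeing a range that originally came
             * from memblock_reserve(). */
            memblock_free(base, size);
            return ret;
        }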
     

29 Dec, 2018

1 commit

  • Tag-based KASAN doesn't check memory accesses through pointers tagged with
    0xff. When page_address is used to get pointer to memory that corresponds
    to some page, the tag of the resulting pointer gets set to 0xff, even
    though the allocated memory might have been tagged differently.

    For slab pages it's impossible to recover the correct tag to return from
    page_address, since the page might contain multiple slab objects tagged
    with different values, and we can't know in advance which one of them is
    going to get accessed. For non slab pages however, we can recover the tag
    in page_address, since the whole page was marked with the same tag.

    This patch adds tagging to non slab memory allocated with pagealloc. To
    set the tag of the pointer returned from page_address, the tag gets stored
    to page->flags when the memory gets allocated.

    Link: http://lkml.kernel.org/r/d758ddcef46a5abc9970182b9137e2fbee202a2c.1544099024.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Andrey Ryabinin
    Reviewed-by: Dmitry Vyukov
    Acked-by: Will Deacon
    Cc: Christoph Lameter
    Cc: Mark Rutland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

18 Aug, 2018

1 commit

  • cma_alloc() doesn't really support gfp flags other than __GFP_NOWARN, so
    convert gfp_mask parameter to boolean no_warn parameter.

    This will help avoid giving the false impression that this function
    supports standard gfp flags and that callers can pass __GFP_ZERO to
    get a zeroed buffer, which has already been an issue: see commit
    dd65a941f6ba ("arm64: dma-mapping: clear buffers allocated with
    FORCE_CONTIGUOUS flag"). A sketch of the new prototype follows this
    entry.

    Link: http://lkml.kernel.org/r/20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@eucas1p2.samsung.com
    Signed-off-by: Marek Szyprowski
    Acked-by: Michal Hocko
    Acked-by: Michał Nazarewicz
    Acked-by: Laura Abbott
    Acked-by: Vlastimil Babka
    Reviewed-by: Christoph Hellwig
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marek Szyprowski
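
    A sketch of the interface change; the boolean is translated back into
    __GFP_NOWARN at the one place that still needs a gfp mask:

        /* before */
        struct page *cma_alloc(struct cma *cma, size_t count,
                               unsigned int align, gfp_t gfp_mask);

        /* after */
        struct page *cma_alloc(struct cma *cma, size_t count,
                               unsigned int align, bool no_warn);

        /* inside cma_alloc() */
        ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
                                 GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));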
     

25 May, 2018

1 commit

  • This reverts the following commits that change CMA design in MM.

    3d2054ad8c2d ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")

    1d47a3ec09b5 ("mm/cma: remove ALLOC_CMA")

    bad8c6c0b114 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")

    Ville reported the following error on i386.

    Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
    microcode: microcode updated early to revision 0x4, date = 2013-06-28
    Initializing CPU#0
    Initializing HighMem for node 0 (000377fe:00118000)
    Initializing Movable for node 0 (00000001:00118000)
    BUG: Bad page state in process swapper pfn:377fe
    page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
    flags: 0x80000000()
    raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
    page dumped because: nonzero mapcount
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
    Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
    Call Trace:
    dump_stack+0x60/0x96
    bad_page+0x9a/0x100
    free_pages_check_bad+0x3f/0x60
    free_pcppages_bulk+0x29d/0x5b0
    free_unref_page_commit+0x84/0xb0
    free_unref_page+0x3e/0x70
    __free_pages+0x1d/0x20
    free_highmem_page+0x19/0x40
    add_highpages_with_active_regions+0xab/0xeb
    set_highmem_pages_init+0x66/0x73
    mem_init+0x1b/0x1d7
    start_kernel+0x17a/0x363
    i386_start_kernel+0x95/0x99
    startup_32_smp+0x164/0x168

    The reason for this error is that the span of the MOVABLE zone is
    extended to the whole node span for future CMA initialization, and
    normal memory is wrongly freed here. I submitted a fix and it seems
    to work, but another problem happened.

    It's too late in the cycle to fix the latter problem, so I decided to
    revert the series.

    Reported-by: Ville Syrjälä
    Acked-by: Laura Abbott
    Acked-by: Michal Hocko
    Cc: Andrew Morton
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

12 Apr, 2018

1 commit

  • Patch series "mm/cma: manage the memory of the CMA area by using the
    ZONE_MOVABLE", v2.

    0. History

    This patchset is the follow-up of the discussion about the "Introduce
    ZONE_CMA (v7)" [1]. Please reference it if more information is needed.

    1. What does this patch do?

    This patch changes how the memory of the CMA area is managed in the
    MM subsystem. Currently the memory of the CMA area is managed by the
    zone its pfns belong to. However, this approach has some problems,
    since the MM subsystem doesn't have enough logic to handle memory
    with different characteristics within a single zone. To solve this
    issue, this patch manages all the memory of the CMA area via the
    MOVABLE zone. From the MM subsystem's point of view, the
    characteristics of memory in the MOVABLE zone and of memory in the
    CMA area are the same, so managing the CMA area's memory through the
    MOVABLE zone will not cause any problem.

    2. Motivation

    There are some problems with the current approach, listed below.
    Although these problems are not inherent and could be fixed without
    this conceptual change, doing so would require adding many hooks in
    various code paths, which would be intrusive to core MM and really
    error-prone. Therefore, I try to solve them with this new approach.
    The problems of the current implementation are as follows.

    o CMA memory utilization

    First, following is the freepage calculation logic in MM.

    - For movable allocation: freepage = total freepage
    - For unmovable allocation: freepage = total freepage - CMA freepage

    Freepages in the CMA area are used only after the normal freepages in
    the zone the CMA area belongs to are exhausted. At that moment the
    number of normal freepages is zero, so

    - For movable allocation: freepage = total freepage = CMA freepage
    - For unmovable allocation: freepage = 0

    If an unmovable allocation comes at this moment, the request fails
    the watermark check and reclaim is started. After reclaim there are
    normal freepages again, so the freepages in the CMA area still go
    unused.

    FYI, there is another attempt [2] trying to solve this problem in lkml.
    And, as far as I know, Qualcomm also has out-of-tree solution for this
    problem.

    Useless reclaim:

    There is no logic to distinguish CMA pages in the reclaim path. Hence,
    CMA page is reclaimed even if the system just needs the page that can be
    usable for the kernel allocation.

    Atomic allocation failure:

    This is also related to the fallback allocation policy for the memory
    of the CMA area. Consider the situation where the number of normal
    freepages is *zero* because a bunch of movable allocation requests
    have come in. Kswapd would not be woken up, due to the following
    freepage calculation logic.

    - For movable allocation: freepage = total freepage = CMA freepage

    If an atomic unmovable allocation request comes at this moment, it
    fails due to the following logic.

    - For unmovable allocation: freepage = total freepage - CMA freepage = 0

    It was reported by Aneesh [3].

    Useless compaction:

    A typical high-order allocation request is an unmovable allocation
    request and cannot be served from the memory of the CMA area. During
    compaction, the migration scanner tries to migrate pages in the CMA
    area and build high-order pages there. As mentioned above, those
    pages cannot be used for unmovable allocation requests, so the work
    is simply wasted.

    3. Current approach and new approach

    The current approach is that the memory of the CMA area is managed by
    the zone its pfns belong to. However, this memory needs to be
    distinguishable since it has a strong limitation, so it is marked as
    MIGRATE_CMA in the pageblock flags and handled specially. However, as
    mentioned in section 2, the MM subsystem doesn't have enough logic to
    deal with this special pageblock, so many problems arise.

    The new approach is that the memory of the CMA area is managed by the
    MOVABLE zone. MM already has enough logic to deal with special zones
    like HIGHMEM and MOVABLE. So managing the memory of the CMA area
    through the MOVABLE zone naturally works well, because the constraint
    on the memory of the CMA area (that it must always be migratable) is
    the same as the constraint on the MOVABLE zone.

    There is one side effect for the usability of the memory of the CMA
    area. The MOVABLE zone can only serve requests with GFP_HIGHMEM &&
    GFP_MOVABLE, so the memory of the CMA area is now also restricted to
    such requests. Before this patchset, any request with GFP_MOVABLE
    could use it. IMO, this is not a big issue, since most GFP_MOVABLE
    requests also carry the GFP_HIGHMEM flag, for example file cache
    pages and anonymous pages. However, file cache pages for blockdev
    files are an exception: requests for them carry no GFP_HIGHMEM flag.
    There are pros and cons to this exception. In my experience, blockdev
    file cache pages are one of the top reasons that cma_alloc() fails
    temporarily, so we get a stronger guarantee of cma_alloc() success by
    excluding this case.

    Note that there is no change from the admin's point of view, since
    this patchset is just an internal implementation change in the MM
    subsystem. The one minor difference for admins is that the memory
    statistics for the CMA area will be reported under the MOVABLE zone.
    That's all.

    4. Result

    Following is the experimental result related to utilization problem.

    8 CPUs, 1024 MB, VIRTUAL MACHINE
    make -j16

    Without the patchset:
    CMA area:       0 MB     512 MB
    Elapsed-time:   92.4     186.5
    pswpin:         82       18647
    pswpout:        160      69839

    With the patchset:
    CMA area:       0 MB     512 MB
    Elapsed-time:   93.1     93.4
    pswpin:         84       46
    pswpout:        183      92

    akpm: "kernel test robot" reported a 26% improvement in
    vm-scalability.throughput:
    http://lkml.kernel.org/r/20180330012721.GA3845@yexl-desktop

    [1]: lkml.kernel.org/r/1491880640-9944-1-git-send-email-iamjoonsoo.kim@lge.com
    [2]: https://lkml.org/lkml/2014/10/15/623
    [3]: http://www.spinics.net/lists/linux-mm/msg100562.html

    Link: http://lkml.kernel.org/r/1512114786-5085-2-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Reviewed-by: Aneesh Kumar K.V
    Tested-by: Tony Lindgren
    Acked-by: Vlastimil Babka
    Cc: Johannes Weiner
    Cc: Laura Abbott
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

06 Apr, 2018

2 commits

  • Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
    reason. It looks like it's only a convenience, so remove kmemleak.h
    from slab.h and add <linux/kmemleak.h> to any users of kmemleak_*
    that don't already #include it. Also remove <linux/kmemleak.h> from
    source files that do not use it.

    This is tested on i386 allmodconfig and x86_64 allmodconfig. It would
    be good to run it through the 0day bot for other $ARCHes. I have
    neither the horsepower nor the storage space for the other $ARCHes.

    Update: This patch has been extensively build-tested by both the 0day
    bot & kisskb/ozlabs build farms. Both of them reported 2 build failures
    for which patches are included here (in v2).

    [ slab.h is the second most used header file after module.h; kernel.h is
    right there with slab.h. There could be some minor error in the
    counting due to some #includes having comments after them and I didn't
    combine all of those. ]

    [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
    Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
    Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
    Signed-off-by: Randy Dunlap
    Reviewed-by: Ingo Molnar
    Reported-by: Michael Ellerman [2 build failures]
    Reported-by: Fengguang Wu [2 build failures]
    Reviewed-by: Andrew Morton
    Cc: Wei Yongjun
    Cc: Luis R. Rodriguez
    Cc: Greg Kroah-Hartman
    Cc: Mimi Zohar
    Cc: John Johansen
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Link: http://lkml.kernel.org/r/1519585191-10180-4-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

16 Nov, 2017

1 commit

  • It was observed that the cma_alloc failure log used pr_info instead
    of pr_err. This is a problem if the printk log level is set below 7:
    the cma_alloc failure message will not be captured in the log, making
    it difficult to debug.

    Simply replace the pr_info with pr_err to capture failure log.

    Link: http://lkml.kernel.org/r/1507650633-4430-1-git-send-email-pintu.ping@gmail.com
    Signed-off-by: Pintu Agarwal
    Cc: Laura Abbott
    Cc: Greg Kroah-Hartman
    Cc: Jaewon Kim
    Cc: Doug Berger
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pintu Agarwal
     

14 Oct, 2017

1 commit

  • cma_alloc() unconditionally prints an INFO message when the CMA
    allocation fails. Make this message conditional on the absence of
    __GFP_NOWARN in gfp_mask (see the sketch after this entry).

    This patch aims at removing INFO messages that are displayed when the
    VC4 driver tries to allocate buffer objects. From the driver
    perspective an allocation failure is acceptable, and the driver can
    possibly do something to make following allocation succeed (like
    flushing the VC4 internal cache).

    Link: http://lkml.kernel.org/r/20171004125447.15195-1-boris.brezillon@free-electrons.com
    Signed-off-by: Boris Brezillon
    Acked-by: Laura Abbott
    Cc: Jaewon Kim
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Eric Anholt
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Boris Brezillon
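
    A sketch of the resulting failure path in cma_alloc():

        if (ret && !(gfp_mask & __GFP_NOWARN)) {
            /* Callers that expect failures pass __GFP_NOWARN and get no
             * log noise; everyone else still sees the message. */
            pr_info("%s: alloc failed, req-size: %zu pages, ret: %d\n",
                    __func__, count, ret);
            cma_debug_show_areas(cma);
        }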
     

11 Jul, 2017

2 commits

  • The align_offset parameter is used by bitmap_find_next_zero_area_off()
    to represent the offset of map's base from the previous alignment
    boundary; the function ensures that the returned index, plus the
    align_offset, honors the specified align_mask.

    The logic introduced by commit b5be83e308f7 ("mm: cma: align to physical
    address, not CMA region position") has the cma driver calculate the
    offset to the *next* alignment boundary. In most cases, the base
    alignment is greater than that specified when making allocations,
    resulting in a zero offset whether we align up or down. In the example
    given with the commit, the base alignment (8MB) was half the requested
    alignment (16MB) so the math also happened to work since the offset is
    8MB in both directions. However, when requesting allocations with an
    alignment greater than twice that of the base, the returned index would
    not be correctly aligned.

    Also, the align_order arguments of cma_bitmap_aligned_mask() and
    cma_bitmap_aligned_offset() should never be negative, so the argument
    type was made unsigned. A sketch of the corrected offset helper
    follows this entry.

    Fixes: b5be83e308f7 ("mm: cma: align to physical address, not CMA region position")
    Link: http://lkml.kernel.org/r/20170628170742.2895-1-opendmb@gmail.com
    Signed-off-by: Angus Clark
    Signed-off-by: Doug Berger
    Acked-by: Gregory Fong
    Cc: Doug Berger
    Cc: Angus Clark
    Cc: Laura Abbott
    Cc: Vlastimil Babka
    Cc: Greg Kroah-Hartman
    Cc: Lucas Stach
    Cc: Catalin Marinas
    Cc: Shiraz Hashim
    Cc: Jaewon Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Berger
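
    A sketch of the corrected helper; the offset is simply the region
    base's distance from the previous alignment boundary, scaled to
    bitmap granules:

        static unsigned long cma_bitmap_aligned_offset(const struct cma *cma,
                                                       unsigned int align_order)
        {
            return (cma->base_pfn & ((1UL << align_order) - 1))
                    >> cma->order_per_bit;
        }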
     
  • While activating a CMA area we check to make sure that all the PFNs in
    the range are inside the same zone. This is a requirement for
    alloc_contig_range() to work. Any CMA area failing the check is
    disabled for good. This happens silently right now, making failure of
    all future cma_alloc() calls on that area inevitable.

    Here we add an error message stating that the CMA area could not be
    activated which makes it easier to explain any future cma_alloc()
    failures on it. While in there, change the bail out goto label from
    'err' to 'not_in_zone' which makes more sense.

    Link: http://lkml.kernel.org/r/20170605023729.26303-1-khandual@linux.vnet.ibm.com
    Signed-off-by: Anshuman Khandual
    Cc: "Aneesh Kumar K.V"
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

19 Apr, 2017

2 commits


25 Feb, 2017

3 commits

  • There are many reasons a CMA allocation can fail, such as EBUSY,
    ENOMEM, or EINTR, but so far we did not know which one. This patch
    prints the error value.

    Additionally, if CONFIG_CMA_DEBUG is enabled, this patch shows the
    bitmap status so the available pages are visible. CMA internally
    tries all available regions because some regions can fail with EBUSY.
    The bitmap status is useful for understanding both ENOMEM and EBUSY
    in detail:

    ENOMEM: nothing was tried at all because no region was available;
    the total region may be too small, or fragmentation may be the issue
    EBUSY: some regions were tried but all of them failed

    This is an ENOMEM example with this patch.

    [2: Binder:714_1: 744] cma: cma_alloc: alloc failed, req-size: 256 pages, ret: -12

    If CONFIG_CMA_DEBUG is enabled, the available pages will also be
    shown in a concatenated size@position format. So 4@572 means that
    there are 4 available pages at position 572, counting from position
    0.

    [2: Binder:714_1: 744] cma: number of available pages: 4@572+7@585+7@601+8@632+38@730+166@1114+127@1921=> 357 free of 2048 total pages

    Link: http://lkml.kernel.org/r/1485909785-3952-1-git-send-email-jaewon31.kim@samsung.com
    Signed-off-by: Jaewon Kim
    Acked-by: Michal Nazarewicz
    Cc: Laura Abbott
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaewon Kim
     
  • Most users of this interface just want to use it with the default
    GFP_KERNEL flags, but for cases where DMA memory is allocated it may be
    called from a different context.

    No functional change yet, just passing through the flag to the
    underlying alloc_contig_range function.

    Link: http://lkml.kernel.org/r/20170127172328.18574-2-l.stach@pengutronix.de
    Signed-off-by: Lucas Stach
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Radim Krcmar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Chris Zankel
    Cc: Ralf Baechle
    Cc: Paolo Bonzini
    Cc: Alexander Graf
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas Stach
     
  • Currently alloc_contig_range assumes that the compaction should be done
    with the default GFP_KERNEL flags. This is probably right for all
    current uses of this interface, but may change as CMA is used in more
    use-cases (including being the default DMA memory allocator on some
    platforms).

    Change the function prototype, to allow for passing through the GFP mask
    set by upper layers.

    Also respect global restrictions by applying memalloc_noio_flags to the
    passed in flags.

    Link: http://lkml.kernel.org/r/20170127172328.18574-1-l.stach@pengutronix.de
    Signed-off-by: Lucas Stach
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Radim Krcmar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Chris Zankel
    Cc: Ralf Baechle
    Cc: Paolo Bonzini
    Cc: Alexander Graf
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas Stach
     

11 Jan, 2017

1 commit

  • 6b101e2a3ce4 ("mm/CMA: fix boot regression due to physical address of
    high_memory") added checks to use __pa_nodebug on x86, since
    CONFIG_DEBUG_VIRTUAL complains about high_memory not being linearly
    mapped. arm64 is now getting support for CONFIG_DEBUG_VIRTUAL as
    well. Rather than adding an explosion of arches to the #ifdef, switch
    to an alternate method: calculate the physical start of highmem using
    the page just before highmem starts (sketched after this entry). This
    avoids the need for the #ifdef and the extra __pa_nodebug calls.

    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Signed-off-by: Laura Abbott
    Signed-off-by: Will Deacon

    Laura Abbott
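
    The replacement calculation might be as simple as this, assuming
    high_memory points just past the last lowmem page:

        /*
         * Derive the physical start of highmem from the last lowmem
         * page, so no arch-specific __pa_nodebug() #ifdef is needed.
         */
        highmem_start = __pa(high_memory - 1) + 1;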
     

12 Nov, 2016

1 commit

  • The CMA allocation request size is represented by a size_t that gets
    truncated when it is passed as an int to
    bitmap_find_next_zero_area_off.

    We observed during fuzz testing that when the cma allocation request
    is too large, bitmap_find_next_zero_area_off still returns success
    due to the truncation. This leads to a kernel crash, as subsequent
    code assumes that the requested memory is available.

    Fail the cma allocation when the request exceeds the corresponding
    cma region size (see the sketch after this entry).

    Link: http://lkml.kernel.org/r/1478189211-3467-1-git-send-email-shashim@codeaurora.org
    Signed-off-by: Shiraz Hashim
    Cc: Catalin Marinas
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shiraz Hashim
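
    A sketch of the added guard in cma_alloc(), before the bitmap is
    scanned:

        bitmap_maxno = cma_bitmap_maxno(cma);
        bitmap_count = cma_bitmap_pages_to_bits(cma, count);

        /* A request larger than the whole region can only fail; refuse
         * it here rather than letting the int truncation "succeed". */
        if (bitmap_count > bitmap_maxno)
            return NULL;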
     

12 Oct, 2016

1 commit

  • Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
    physical address to a virtual one using __va(). However, such physical
    addresses may sometimes be located in highmem and using __va() is
    incorrect, leading to inconsistent object tracking in kmemleak.

    The following functions have been added to the kmemleak API and they take
    a physical address as the object pointer. They only perform the
    corresponding action if the address has a lowmem mapping:

    kmemleak_alloc_phys
    kmemleak_free_part_phys
    kmemleak_not_leak_phys
    kmemleak_ignore_phys

    The affected calling places have been updated to use the new kmemleak
    API.

    Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Reported-by: Vignesh R
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
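
    In mm/cma.c the conversion might look roughly like this:

        /* before: phys_to_virt() is wrong if 'addr' lives in highmem */
        kmemleak_ignore(phys_to_virt(addr));

        /* after: the _phys variant only acts if 'addr' has a lowmem mapping */
        kmemleak_ignore_phys(addr);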
     

28 May, 2016

1 commit

  • pageblock_order can be (at least) an unsigned int or an unsigned long
    depending on the kernel config and architecture, so use max_t(unsigned
    long, ...) when comparing it.

    fixes these warnings:

    In file included from include/asm-generic/bug.h:13:0,
    from arch/powerpc/include/asm/bug.h:127,
    from include/linux/bug.h:4,
    from include/linux/mmdebug.h:4,
    from include/linux/mm.h:8,
    from include/linux/memblock.h:18,
    from mm/cma.c:28:
    mm/cma.c: In function 'cma_init_reserved_mem':
    include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
    (void) (&_max1 == &_max2); ^
    mm/cma.c:186:27: note: in expansion of macro 'max'
    alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
    ^
    mm/cma.c: In function 'cma_declare_contiguous':
    include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
    (void) (&_max1 == &_max2); ^
    include/linux/kernel.h:747:9: note: in definition of macro 'max'
    typeof(y) _max2 = (y); ^
    mm/cma.c:270:29: note: in expansion of macro 'max'
    (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
    ^
    include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
    (void) (&_max1 == &_max2); ^
    include/linux/kernel.h:747:21: note: in definition of macro 'max'
    typeof(y) _max2 = (y); ^
    mm/cma.c:270:29: note: in expansion of macro 'max'
    (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
    ^

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20160526150748.5be38a4f@canb.auug.org.au
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     

06 Nov, 2015

1 commit

  • mm/cma.c: In function 'cma_alloc':
    mm/cma.c:366: warning: 'pfn' may be used uninitialized in this function

    The patch actually improves the tracing a bit: if alloc_contig_range()
    fails, tracing will display the offending pfn rather than -1.

    Cc: Stefan Strogin
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Cc: Laurent Pinchart
    Cc: Thierry Reding
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

23 Oct, 2015

1 commit

  • This was found during a userspace fuzzing test when a large dma cma
    allocation was made by a driver (like ion) on behalf of userspace.

    show_stack+0x10/0x1c
    dump_stack+0x74/0xc8
    kasan_report_error+0x2b0/0x408
    kasan_report+0x34/0x40
    __asan_storeN+0x15c/0x168
    memset+0x20/0x44
    __dma_alloc_coherent+0x114/0x18c

    Signed-off-by: Rohit Vaswani
    Acked-by: Greg Kroah-Hartman
    Cc: Marek Szyprowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rohit Vaswani