08 Apr, 2018

1 commit

  • commit 47504ee04b9241548ae2c28be7d0b01cff3b7aa6 upstream.

    Percpu memory using the vmalloc area based chunk allocator lazily
    populates chunks by first requesting the full virtual address space
    required for the chunk and subsequently adding pages as allocations come
    through. To ensure atomic allocations can succeed, a workqueue item is
    used to maintain a minimum number of empty pages. In certain scenarios,
    such as reported in [1], it is possible that physical memory becomes
    quite scarce, which can result in either a rather long time spent
    trying to find free pages or, worse, a kernel panic.

    This patch adds support for __GFP_NORETRY and __GFP_NOWARN, passing them
    through to the underlying allocators. This should prevent unnecessary
    panics potentially caused by the workqueue item. gfp is passed around
    as additional flags rather than as a full set of flags; the next patch
    will change these to caller-passed semantics.
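    A minimal userspace sketch of the "additional flags" semantics: the
    allocator keeps its own base mask and only ORs in whatever bits the
    caller adds, so the balance path can request fail-fast, quiet
    allocations without restating everything else. The flag values, the
    base mask (assumed here to be KERNEL | HIGHMEM) and the function names
    are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for gfp_t values; the real
 * values live in include/linux/gfp.h and differ. */
typedef uint32_t sketch_gfp_t;
#define SKETCH_GFP_KERNEL   0x01u
#define SKETCH_GFP_HIGHMEM  0x02u
#define SKETCH_GFP_NORETRY  0x04u
#define SKETCH_GFP_NOWARN   0x08u

/* "Additional flags" semantics: the allocator owns its base mask and
 * ORs in what the caller adds; the caller can add bits, never drop them. */
static sketch_gfp_t pcpu_sketch_alloc_gfp(sketch_gfp_t extra)
{
	const sketch_gfp_t base = SKETCH_GFP_KERNEL | SKETCH_GFP_HIGHMEM;

	return base | extra;
}

/* The background balance path asks the page allocator to give up
 * early and stay quiet instead of retrying hard or warning. */
static sketch_gfp_t pcpu_sketch_balance_gfp(void)
{
	const sketch_gfp_t extra = SKETCH_GFP_NORETRY | SKETCH_GFP_NOWARN;

	return pcpu_sketch_alloc_gfp(extra);
}
```

    Under caller-passed semantics the caller would instead supply the full
    mask, which is what the follow-up patch moves to.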

    V2:
    Added const modifier to gfp flags in the balance path.
    Removed an extra whitespace.

    [1] https://lkml.org/lkml/2018/2/12/551

    Signed-off-by: Dennis Zhou
    Suggested-by: Daniel Borkmann
    Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
    Acked-by: Christoph Lameter
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Dennis Zhou
     

29 Jun, 2017

1 commit


21 Jun, 2017

2 commits

  • Add support for tracepoints to the following events: chunk allocation,
    chunk free, area allocation, area free, and area allocation failure.
    This should let us replay percpu memory requests and evaluate
    corresponding decisions.

    Signed-off-by: Dennis Zhou
    Signed-off-by: Tejun Heo

    Dennis Zhou
     
  • There is limited visibility into the use of percpu memory, leaving us
    unable to reason about the correctness of parameters and the overall
    use of percpu memory. These counters and statistics aim to help
    understand basic properties of percpu memory such as the number of
    allocations over the lifetime, allocation sizes, and fragmentation.

    New Config: PERCPU_STATS
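    A rough sketch of the kind of counters described above, covering
    lifetime allocation counts, live and peak allocation counts, and the
    range of request sizes. The struct and function names are hypothetical
    and fragmentation tracking is omitted; this only illustrates the
    bookkeeping, not the PERCPU_STATS implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global counters in the spirit of PERCPU_STATS. */
struct sketch_pcpu_stats {
	unsigned long long nr_alloc;    /* allocations over the lifetime */
	unsigned long long nr_dealloc;  /* frees over the lifetime */
	unsigned long nr_cur_alloc;     /* currently live allocations */
	unsigned long nr_max_alloc;     /* peak number of live allocations */
	size_t min_alloc_size;          /* smallest request seen, 0 = none yet */
	size_t max_alloc_size;          /* largest request seen */
};

static void sketch_stats_alloc(struct sketch_pcpu_stats *s, size_t size)
{
	s->nr_alloc++;
	s->nr_cur_alloc++;
	if (s->nr_cur_alloc > s->nr_max_alloc)
		s->nr_max_alloc = s->nr_cur_alloc;
	if (s->min_alloc_size == 0 || size < s->min_alloc_size)
		s->min_alloc_size = size;
	if (size > s->max_alloc_size)
		s->max_alloc_size = size;
}

static void sketch_stats_free(struct sketch_pcpu_stats *s)
{
	s->nr_dealloc++;
	s->nr_cur_alloc--;
}
```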

    Signed-off-by: Dennis Zhou
    Signed-off-by: Tejun Heo

    Dennis Zhou
     

07 Mar, 2017

1 commit


03 Sep, 2014

4 commits

  • Previously, pcpu_[de]populate_chunk() were called with a range which
    may contain multiple target regions, and pcpu_[de]populate_chunk()
    iterated over those regions. This has the
    benefit of batching up cache flushes for all the regions; however,
    we're planning to add more bookkeeping logic around [de]population to
    support atomic allocations and this delegation of iterations gets in
    the way.

    This patch moves the region iterations out of
    pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
    pcpu_reclaim() - so that we can later add logic to track more states
    around them. This change may make cache and TLB flushes more frequent
    but multi-region [de]populations are rare anyway and if this actually
    becomes a problem, it's not difficult to factor out cache flushes as
    separate callbacks which are directly invoked from percpu.c.
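    A userspace sketch of the caller-side iteration: the caller walks the
    unpopulated runs inside [start, end) and issues one populate call per
    run, which gives it a place to add per-region bookkeeping. The
    bool-array bitmap and all names are hypothetical simplifications of
    the chunk's populated bitmap and its region iterator.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_PAGES 16

/* Find the next run of unpopulated pages at or after *rs, limited to
 * end. On return [*rs, *re) is that run; *rs == end means none left. */
static void sketch_next_unpop_region(const bool *populated, int *rs, int *re,
				     int end)
{
	while (*rs < end && populated[*rs])
		(*rs)++;
	*re = *rs;
	while (*re < end && !populated[*re])
		(*re)++;
}

/* Stand-in for a single-region populate call; it marks the pages and
 * counts invocations so the per-region behaviour is observable. */
static int sketch_populate_calls;

static void sketch_populate_region(bool *populated, int rs, int re)
{
	sketch_populate_calls++;
	for (int i = rs; i < re; i++)
		populated[i] = true;
}

/* The iteration lives in the caller: one call per unpopulated region. */
static void sketch_populate_range(bool *populated, int start, int end)
{
	int rs = start, re;

	for (;;) {
		sketch_next_unpop_region(populated, &rs, &re, end);
		if (rs >= end)
			break;
		sketch_populate_region(populated, rs, re);
		rs = re;
	}
}
```

    With pages 2, 3 and 6 already populated, populating [0, 8) results in
    three separate calls, for [0, 2), [4, 6) and [7, 8).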

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • percpu-vm and percpu-km implement separate versions of
    pcpu_[de]populate_chunk(), and some parts which are or should be
    common currently live in the specific implementations. Make the
    following changes.

    * Clearing of the allocated area is moved from the
    pcpu_populate_chunk() implementations to pcpu_alloc(). This makes
    percpu-km's version a no-op.

    * Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
    to their respective callers so that they are applied to percpu-km
    too. This doesn't make any meaningful difference as both functions
    are no-ops for percpu-km; however, it is more consistent and will
    help implement atomic allocation support.
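    A simplified sketch of the resulting split: the caller performs the
    quick-exit check, calls the backend only when needed, and then clears
    the allocated area in every unit itself. The flat unit layout, the
    already_populated flag and all names are hypothetical; the real code
    derives unit addresses and population state from the chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NR_UNITS  4
#define SKETCH_UNIT_SIZE 64

/* Hypothetical chunk: one buffer split into equally sized per-CPU units. */
struct sketch_chunk {
	unsigned char base[SKETCH_NR_UNITS * SKETCH_UNIT_SIZE];
	int populate_calls;
};

/* Backend hook; once clearing moves out, a km-style backend has
 * nothing left to do here. */
static int sketch_populate_chunk(struct sketch_chunk *c, size_t off, size_t size)
{
	(void)off;
	(void)size;
	c->populate_calls++;
	return 0;
}

/* Common code now owned by the caller: quick exit when the range is
 * already populated, then zero the area in each unit. */
static int sketch_alloc_area(struct sketch_chunk *c, size_t off, size_t size,
			     int already_populated)
{
	if (!already_populated) {
		int ret = sketch_populate_chunk(c, off, size);

		if (ret)
			return ret;
	}
	for (int unit = 0; unit < SKETCH_NR_UNITS; unit++)
		memset(c->base + unit * SKETCH_UNIT_SIZE + off, 0, size);
	return 0;
}
```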

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • pcpu_get_pages() creates the temp pages array if not already allocated
    and returns the pointer to it. As the function is called from both
    [de]population paths and depopulation can only happen after at least
    one successful population, the @may_alloc param doesn't make any
    difference - the allocation will always happen on the population path
    anyway.

    Remove @may_alloc from pcpu_get_pages(). Also, add a lockdep
    assertion on pcpu_alloc_mutex instead of vaguely stating that the
    exclusion is the caller's responsibility.
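    A userspace sketch of the simplified helper: the shared temp array is
    allocated on first use and reused afterwards, and mutual exclusion is
    asserted rather than merely documented. The boolean flag is a
    hypothetical stand-in for lockdep_assert_held(&pcpu_alloc_mutex), and
    the array size and names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_NR_PAGES_TOTAL 32

/* Hypothetical stand-in for lockdep: set while "holding" the mutex. */
static bool sketch_alloc_mutex_held;

static void **sketch_pages;  /* shared temp array, allocated once */

/* No @may_alloc: the first caller (always a population) allocates the
 * array and every later caller, depopulation included, reuses it.
 * Returns NULL only if the first allocation fails. */
static void **sketch_get_pages(void)
{
	assert(sketch_alloc_mutex_held);  /* exclusion is checked, not assumed */

	if (!sketch_pages)
		sketch_pages = calloc(SKETCH_NR_PAGES_TOTAL, sizeof(*sketch_pages));
	return sketch_pages;
}
```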

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • percpu-vm uses pcpu_get_pages_and_bitmap() to acquire temp pages array
    and populated bitmap and uses the two during [de]population. The temp
    bitmap is used only to build the new bitmap that is copied to
    chunk->populated after the operation succeeds; however, the new bitmap
    can be trivially set after success without using the temp bitmap.

    This patch removes the temp populated bitmap usage from percpu-vm.c.

    * pcpu_get_pages_and_bitmap() is renamed to pcpu_get_pages() and no
    longer hands out the temp bitmap.

    * The @populated argument is dropped from all the related functions.
    @populated updates in pcpu_[un]map_pages() are dropped.

    * Two loops in pcpu_map_pages() are merged.

    * pcpu_[de]populate_chunk() update the chunk->populated bitmap directly
    from @page_start and @page_end after success.
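    A minimal sketch of updating the populated bitmap only after the whole
    operation succeeds, which is what makes the temp bitmap unnecessary.
    The bool array, the fail_at failure injection and all names are
    hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_CHUNK_PAGES 16

struct sketch_bm_chunk {
	bool populated[SKETCH_CHUNK_PAGES];
};

/* Hypothetical backend step; fails if fail_at lies inside the range. */
static int sketch_bm_map_pages(int page_start, int page_end, int fail_at)
{
	if (fail_at >= page_start && fail_at < page_end)
		return -1;
	return 0;
}

/* The bitmap is written only once [page_start, page_end) has been fully
 * set up, so a failure leaves it untouched and no scratch copy is needed. */
static int sketch_bm_populate_chunk(struct sketch_bm_chunk *c, int page_start,
				    int page_end, int fail_at)
{
	if (sketch_bm_map_pages(page_start, page_end, fail_at))
		return -1;
	for (int i = page_start; i < page_end; i++)
		c->populated[i] = true;
	return 0;
}
```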

    Signed-off-by: Tejun Heo
    Acked-by: Christoph Lameter

    Tejun Heo
     

16 Aug, 2014

2 commits

  • If pcpu_map_pages() fails midway, it unmaps the already mapped pages.
    Currently, it doesn't flush the TLB after the partial unmapping. This
    may be okay in most cases as the established mapping hasn't been used
    at that point, but it can go wrong, and when it goes wrong it'd be
    extremely difficult to track down.

    Flush the TLB after the partial unmapping.
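    A userspace model of the fixed error path: undo only the mappings that
    were established, then flush so no stale translation survives. The
    mapped[] array, the flush counter and the fail_at injection are
    hypothetical stand-ins for page-table state and the real flush helper.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_TLB_PAGES 8

static bool sketch_tlb_mapped[SKETCH_TLB_PAGES];
static int sketch_tlb_flushes;

/* Hypothetical per-page map that fails at index fail_at. */
static int sketch_tlb_map_one(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	sketch_tlb_mapped[i] = true;
	return 0;
}

static void sketch_tlb_unmap_one(int i)
{
	sketch_tlb_mapped[i] = false;
}

/* Stand-in for a TLB flush over [start, end); it only counts calls. */
static void sketch_tlb_flush(int start, int end)
{
	(void)start;
	(void)end;
	sketch_tlb_flushes++;
}

static int sketch_tlb_map_pages(int nr, int fail_at)
{
	int i;

	for (i = 0; i < nr; i++)
		if (sketch_tlb_map_one(i, fail_at))
			goto err;
	return 0;
err:
	/* Tear down the partial mapping, then flush the affected range. */
	for (int j = 0; j < i; j++)
		sketch_tlb_unmap_one(j);
	sketch_tlb_flush(0, i);
	return -1;
}
```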

    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Tejun Heo
     
  • When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to
    free what has already been allocated. The invocation is across the
    whole requested range and pcpu_free_pages() will try to free all
    non-NULL pages; unfortunately, this is incorrect as
    pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't
    clear the pages array and thus the array may have entries from the
    previous invocations making the partial failure path free incorrect
    pages.

    Fix it by open-coding the partial freeing of the already allocated
    pages.
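    A userspace sketch of the open-coded partial free: on failure, only
    the slots filled by this call, [page_start, i), are released, so a
    stale pointer left elsewhere in the never-cleared array is never
    touched. The helper names, the fail_at injection and the use of
    malloc()/free() are hypothetical; unlike the fix, the sketch also
    resets freed slots to NULL for hygiene.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PF_NR_PAGES 8

/* Hypothetical page allocator that fails at index fail_at. */
static void *sketch_pf_alloc_page(int i, int fail_at)
{
	return i == fail_at ? NULL : malloc(16);
}

/* Free only what this invocation allocated; do not sweep every
 * non-NULL slot, since other slots may hold leftovers from earlier calls. */
static int sketch_pf_alloc_pages(void **pages, int page_start, int page_end,
				 int fail_at)
{
	int i;

	for (i = page_start; i < page_end; i++) {
		pages[i] = sketch_pf_alloc_page(i, fail_at);
		if (!pages[i])
			goto err;
	}
	return 0;
err:
	while (--i >= page_start) {
		free(pages[i]);
		pages[i] = NULL;
	}
	return -1;
}
```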

    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Tejun Heo
     

21 Jun, 2012

1 commit

  • Fix kernel-doc warnings such as

    Warning(../mm/page_cgroup.c:432): No description found for parameter 'id'
    Warning(../mm/page_cgroup.c:432): Excess function parameter 'mem' description in 'swap_cgroup_record'

    Signed-off-by: Wanpeng Li
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

21 Jan, 2012

1 commit


23 Nov, 2011

2 commits

  • Percpu allocator recorded the cpus which map to the first and last
    units in pcpu_first/last_unit_cpu respectively and used them to
    determine the address range of a chunk - e.g. it assumed that the
    first unit has the lowest address in a chunk while the last unit has
    the highest address.

    This simply isn't true. Groups in a chunk can have arbitrary positive
    or negative offsets from the previous one, and there is no guarantee
    that the first unit occupies the lowest offset while the last one
    occupies the highest.

    Fix it by actually comparing unit offsets to determine the cpus
    occupying the lowest and highest offsets. Also, rename
    pcpu_first/last_unit_cpu to pcpu_low/high_unit_cpu to avoid confusion.

    The chunk address range is used to flush cache on vmalloc area
    map/unmap and decide whether a given address is in the first chunk by
    per_cpu_ptr_to_phys() and the bug was discovered by invalid
    per_cpu_ptr_to_phys() translation for crash_note.
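    A minimal sketch of the fix: scan every CPU's unit offset and keep the
    CPUs holding the minimum and maximum, instead of trusting unit order.
    The flat unit_off[] array indexed by CPU and the function name are
    assumptions made for illustration.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4

/* Pick the CPUs whose units sit at the lowest and highest offsets by
 * comparing offsets directly; unit order need not match address order. */
static void sketch_low_high_unit_cpu(const long *unit_off, int nr_cpus,
				     int *low_cpu, int *high_cpu)
{
	*low_cpu = 0;
	*high_cpu = 0;
	for (int cpu = 1; cpu < nr_cpus; cpu++) {
		if (unit_off[cpu] < unit_off[*low_cpu])
			*low_cpu = cpu;
		if (unit_off[cpu] > unit_off[*high_cpu])
			*high_cpu = cpu;
	}
}
```

    With offsets {0x2000, 0x0, 0x5000, 0x1000}, the lowest unit belongs to
    CPU 1 and the highest to CPU 2, so neither the first nor the last CPU
    bounds the chunk's address range.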

    Kudos to Dave Young for tracking down the problem.

    Signed-off-by: Tejun Heo
    Reported-by: WANG Cong
    Reported-by: Dave Young
    Tested-by: Dave Young
    LKML-Reference:
    Cc: stable@kernel.org

    Tejun Heo
     
  • pcpu_mem_alloc() is currently implemented to always return zeroed
    memory, so rename it so that users such as pcpu_get_pages_and_bitmap()
    know not to reinitialize it.
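    A small sketch of the contract the rename is meant to convey: a
    "zalloc" helper guarantees zeroed memory, so its callers skip their
    own memset(). The in-kernel helper is backed by kernel allocators;
    this sketch substitutes calloc(), and both function names are
    hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* "zalloc" naming promises zero-filled memory (calloc here). */
static void *sketch_mem_zalloc(size_t size)
{
	return calloc(1, size);
}

/* A caller relying on the contract: no re-initialization needed. */
static unsigned long *sketch_alloc_bitmap(size_t nr_longs)
{
	return sketch_mem_zalloc(nr_longs * sizeof(unsigned long));
}
```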

    Signed-off-by: Bob Liu
    Reviewed-by: Pekka Enberg
    Reviewed-by: Michal Hocko
    Signed-off-by: Tejun Heo

    Bob Liu
     

14 Jan, 2011

1 commit


01 May, 2010

1 commit

  • Separate out and move chunk management (creation/destruction and
    [de]population) code into percpu-vm.c which is included by percpu.c
    and compiled together. The interface for chunk management is defined
    as follows.

    * pcpu_populate_chunk - populate the specified range of a chunk
    * pcpu_depopulate_chunk - depopulate the specified range of a chunk
    * pcpu_create_chunk - create a new chunk
    * pcpu_destroy_chunk - destroy a chunk, always preceded by full depop
    * pcpu_addr_to_page - translate address to its backing struct page
    * pcpu_verify_alloc_info - check alloc_info is acceptable during init

    Other than wrapping vmalloc_to_page() inside pcpu_addr_to_page() and
    adding a dummy pcpu_verify_alloc_info() implementation, this patch only
    moves code around. This separation is to allow alternate chunk
    management implementations.
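    A sketch of the six-operation interface listed above, gathered into a
    function-pointer table with a trivial backend. This is only an
    illustration of the contract: per the message, the kernel binds the
    backend at build time by compiling percpu-vm.c together with percpu.c,
    not through a runtime table. The chunk struct, the table and the dummy
    backend are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical chunk; the real struct pcpu_chunk carries far more state. */
struct sketch_chunk {
	int nr_populated;
};

/* The chunk management operations, as a table for illustration only. */
struct sketch_chunk_ops {
	int (*populate_chunk)(struct sketch_chunk *c, int off, int size);
	void (*depopulate_chunk)(struct sketch_chunk *c, int off, int size);
	struct sketch_chunk *(*create_chunk)(void);
	void (*destroy_chunk)(struct sketch_chunk *c);  /* after full depop */
	void *(*addr_to_page)(void *addr);
	int (*verify_alloc_info)(const void *ai);
};

static int dummy_populate(struct sketch_chunk *c, int off, int size)
{
	(void)off;
	c->nr_populated += size;
	return 0;
}

static void dummy_depopulate(struct sketch_chunk *c, int off, int size)
{
	(void)off;
	c->nr_populated -= size;
}

static struct sketch_chunk *dummy_create(void)
{
	return calloc(1, sizeof(struct sketch_chunk));
}

static void dummy_destroy(struct sketch_chunk *c)
{
	free(c);
}

/* Identity mapping; a real backend would look up the backing page. */
static void *dummy_addr_to_page(void *addr)
{
	return addr;
}

/* Accepts any layout, like a dummy verification hook. */
static int dummy_verify(const void *ai)
{
	(void)ai;
	return 0;
}

static const struct sketch_chunk_ops dummy_ops = {
	.populate_chunk    = dummy_populate,
	.depopulate_chunk  = dummy_depopulate,
	.create_chunk      = dummy_create,
	.destroy_chunk     = dummy_destroy,
	.addr_to_page      = dummy_addr_to_page,
	.verify_alloc_info = dummy_verify,
};
```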

    Signed-off-by: Tejun Heo
    Reviewed-by: David Howells
    Cc: Graff Yang
    Cc: Sonic Zhang

    Tejun Heo