23 Sep, 2020

1 commit

  • commit b3b33d3c43bbe0177d70653f4e889c78cc37f097 upstream.

    The populated bitmap, a member of struct pcpu_chunk, is sized in units
    of unsigned long.
    However, its size was miscounted, so fix the calculation.

    Fixes: 8ab16c43ea79 ("percpu: change the number of pages marked in the first_chunk pop bitmap")
    Cc: # 4.14+
    Signed-off-by: Sunghyun Jin
    Signed-off-by: Dennis Zhou
    Signed-off-by: Greg Kroah-Hartman

    Sunghyun Jin
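
As a rough illustration of the sizing involved, the sketch below reimplements a BITS_TO_LONGS-style helper for userspace and contrasts the byte count of a one-bit-per-page bitmap with the smaller value obtained when the multiplication by sizeof(unsigned long) is left out. The helper names and the exact form of the miscount are assumptions for illustration and are not taken from the patch itself.

```c
#include <limits.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's BITS_TO_LONGS(): longs needed for nbits. */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_SKETCH(nbits) \
	(((nbits) + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH)

/* Bytes needed for a bitmap holding one bit per page. */
static size_t populated_bytes(size_t nr_pages)
{
	return BITS_TO_LONGS_SKETCH(nr_pages) * sizeof(unsigned long);
}

/* Hypothetical miscount: the number of longs mistaken for a byte count. */
static size_t populated_bytes_miscounted(size_t nr_pages)
{
	return BITS_TO_LONGS_SKETCH(nr_pages);
}
```

With one page, the first helper reserves a whole unsigned long while the second reserves a single byte, which is the kind of shortfall the message describes.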
     

05 Sep, 2019

1 commit

  • One of the more common cases of allocation size calculations is finding
    the size of a structure that has a zero-sized array at the end, along
    with memory for some number of elements for that array. For example:

    struct pcpu_alloc_info {
    ...
    struct pcpu_group_info groups[];
    };

    Make use of the struct_size() helper instead of an open-coded version
    in order to avoid any potential type mistakes.

    So, replace the following form:

    sizeof(*ai) + nr_groups * sizeof(ai->groups[0])

    with:

    struct_size(ai, groups, nr_groups)

    This code was detected with the help of Coccinelle.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Dennis Zhou

    Gustavo A. R. Silva
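
The idea behind struct_size() can be sketched in userspace as an overflow-checked size calculation for a structure ending in a flexible array. The structure layout and the helper names below are hypothetical stand-ins, and returning SIZE_MAX on overflow is this sketch's own choice so that the following allocation fails cleanly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct group_info {
	int nr_units;
};

struct alloc_info {
	int nr_groups;
	struct group_info groups[];	/* flexible array member */
};

/* Header plus nr_groups elements; SIZE_MAX if the sum would overflow. */
static size_t alloc_info_size(size_t nr_groups)
{
	size_t elem = sizeof(struct group_info);
	size_t base = sizeof(struct alloc_info);

	if (nr_groups > (SIZE_MAX - base) / elem)
		return SIZE_MAX;
	return base + nr_groups * elem;
}

/* Allocate a zeroed alloc_info with room for nr_groups entries. */
static struct alloc_info *alloc_info_new(size_t nr_groups)
{
	struct alloc_info *ai = calloc(1, alloc_info_size(nr_groups));

	if (ai)
		ai->nr_groups = (int)nr_groups;
	return ai;
}
```

Keeping the arithmetic in one helper means the element type is named once, which is the kind of type mistake the open-coded form invites.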
     

24 Jul, 2019

1 commit


04 Jul, 2019

1 commit


05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under the gplv2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 68 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Armijn Hemel
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

14 May, 2019

1 commit

  • Pull percpu updates from Dennis Zhou:

    - scan hint update which helps address performance issues with heavily
    fragmented blocks

    - lockdep fix when freeing an allocation causes balance work to be
    scheduled

    * 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
    percpu: remove spurious lock dependency between percpu and sched
    percpu: use chunk scan_hint to skip some scanning
    percpu: convert chunk hints to be based on pcpu_block_md
    percpu: make pcpu_block_md generic
    percpu: use block scan_hint to only scan forward
    percpu: remember largest area skipped during allocation
    percpu: add block level scan_hint
    percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE
    percpu: relegate chunks unusable when failing small allocations
    percpu: manage chunks based on contig_bits instead of free_bytes
    percpu: introduce helper to determine if two regions overlap
    percpu: do not search past bitmap when allocating an area
    percpu: update free path with correct new free region

    Linus Torvalds
     

09 May, 2019

1 commit

  • In free_percpu() we sometimes call pcpu_schedule_balance_work() to
    queue a work item (which does a wakeup) while holding pcpu_lock.
    This creates an unnecessary lock dependency between pcpu_lock and
    the scheduler's pi_lock. There are other places where we call
    pcpu_schedule_balance_work() without holding pcpu_lock, and this case
    doesn't need to be different.

    Moving the call outside the lock prevents the following lockdep splat
    when running tools/testing/selftests/bpf/{test_maps,test_progs} in
    sequence with lockdep enabled:

    ======================================================
    WARNING: possible circular locking dependency detected
    5.1.0-dbg-DEV #1 Not tainted
    ------------------------------------------------------
    kworker/23:255/18872 is trying to acquire lock:
    000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520

    but task is already holding lock:
    00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #4 (pcpu_lock){..-.}:
    lock_acquire+0x9e/0x180
    _raw_spin_lock_irqsave+0x3a/0x50
    pcpu_alloc+0xfa/0x780
    __alloc_percpu_gfp+0x12/0x20
    alloc_htab_elem+0x184/0x2b0
    __htab_percpu_map_update_elem+0x252/0x290
    bpf_percpu_hash_update+0x7c/0x130
    __do_sys_bpf+0x1912/0x1be0
    __x64_sys_bpf+0x1a/0x20
    do_syscall_64+0x59/0x400
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    -> #3 (&htab->buckets[i].lock){....}:
    lock_acquire+0x9e/0x180
    _raw_spin_lock_irqsave+0x3a/0x50
    htab_map_update_elem+0x1af/0x3a0

    -> #2 (&rq->lock){-.-.}:
    lock_acquire+0x9e/0x180
    _raw_spin_lock+0x2f/0x40
    task_fork_fair+0x37/0x160
    sched_fork+0x211/0x310
    copy_process.part.43+0x7b1/0x2160
    _do_fork+0xda/0x6b0
    kernel_thread+0x29/0x30
    rest_init+0x22/0x260
    arch_call_rest_init+0xe/0x10
    start_kernel+0x4fd/0x520
    x86_64_start_reservations+0x24/0x26
    x86_64_start_kernel+0x6f/0x72
    secondary_startup_64+0xa4/0xb0

    -> #1 (&p->pi_lock){-.-.}:
    lock_acquire+0x9e/0x180
    _raw_spin_lock_irqsave+0x3a/0x50
    try_to_wake_up+0x41/0x600
    wake_up_process+0x15/0x20
    create_worker+0x16b/0x1e0
    workqueue_init+0x279/0x2ee
    kernel_init_freeable+0xf7/0x288
    kernel_init+0xf/0x180
    ret_from_fork+0x24/0x30

    -> #0 (&(&pool->lock)->rlock){-.-.}:
    __lock_acquire+0x101f/0x12a0
    lock_acquire+0x9e/0x180
    _raw_spin_lock+0x2f/0x40
    __queue_work+0xb2/0x520
    queue_work_on+0x38/0x80
    free_percpu+0x221/0x260
    pcpu_freelist_destroy+0x11/0x20
    stack_map_free+0x2a/0x40
    bpf_map_free_deferred+0x3c/0x50
    process_one_work+0x1f7/0x580
    worker_thread+0x54/0x410
    kthread+0x10f/0x150
    ret_from_fork+0x24/0x30

    other info that might help us debug this:

    Chain exists of:
    &(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock

    Possible unsafe locking scenario:

    CPU0 CPU1
    ---- ----
    lock(pcpu_lock);
    lock(&htab->buckets[i].lock);
    lock(pcpu_lock);
    lock(&(&pool->lock)->rlock);

    *** DEADLOCK ***

    3 locks held by kworker/23:255/18872:
    #0: 00000000b36a6e16 ((wq_completion)events){+.+.},
    at: process_one_work+0x17a/0x580
    #1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
    at: process_one_work+0x17a/0x580
    #2: 00000000e3e7a6aa (pcpu_lock){..-.},
    at: free_percpu+0x36/0x260

    stack backtrace:
    CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
    Hardware name: ...
    Workqueue: events bpf_map_free_deferred
    Call Trace:
    dump_stack+0x67/0x95
    print_circular_bug.isra.38+0x1c6/0x220
    check_prev_add.constprop.50+0x9f6/0xd20
    __lock_acquire+0x101f/0x12a0
    lock_acquire+0x9e/0x180
    _raw_spin_lock+0x2f/0x40
    __queue_work+0xb2/0x520
    queue_work_on+0x38/0x80
    free_percpu+0x221/0x260
    pcpu_freelist_destroy+0x11/0x20
    stack_map_free+0x2a/0x40
    bpf_map_free_deferred+0x3c/0x50
    process_one_work+0x1f7/0x580
    worker_thread+0x54/0x410
    kthread+0x10f/0x150
    ret_from_fork+0x24/0x30

    Signed-off-by: John Sperbeck
    Signed-off-by: Dennis Zhou

    John Sperbeck
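
The general pattern of the fix, deciding under the lock but acting after it is released, can be sketched with a userspace mutex. Everything here is a stand-in: pool_lock plays the role of pcpu_lock, schedule_balance_work() is a hypothetical placeholder for queueing the work item, and the threshold is arbitrary.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static int nr_empty_chunks;	/* protected by pool_lock */
static int balance_requests;	/* demo counter, single-threaded use only */

/* Placeholder for queueing work; the real thing may take other locks. */
static void schedule_balance_work(void)
{
	balance_requests++;
}

static void free_area(void)
{
	bool need_balance = false;

	pthread_mutex_lock(&pool_lock);
	nr_empty_chunks++;
	if (nr_empty_chunks > 1)	/* arbitrary threshold for the sketch */
		need_balance = true;	/* only record the decision here */
	pthread_mutex_unlock(&pool_lock);

	/* Act after dropping the lock so no new lock ordering is created. */
	if (need_balance)
		schedule_balance_work();
}
```

Because the wakeup path is entered with no lock held, the lock checker never records an ordering from the pool lock to whatever locks the work queueing takes.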
     

19 Mar, 2019

1 commit

  • Since commit ad67b74d2469d9b8 ("printk: hash addresses printed with %p"),
    at boot "____ptrval____" is printed instead of actual addresses:

    percpu: Embedded 38 pages/cpu @(____ptrval____) s124376 r0 d31272 u524288

    Instead of changing the print to "%px", and leaking kernel addresses,
    just remove the print completely, cfr. e.g. commit 071929dbdd865f77
    ("arm64: Stop printing the virtual memory layout").

    Signed-off-by: Matteo Croce
    Signed-off-by: Dennis Zhou

    Matteo Croce
     

14 Mar, 2019

12 commits

  • Just like blocks, chunks now maintain a scan_hint. This can be used to
    skip some scanning by promoting the scan_hint to be the contig_hint.
    The chunk's scan_hint is primarily updated on the backside and relies on
    full scanning when a block becomes free or the free region spans across
    blocks.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
     
  • As mentioned in the last patch, a chunk's hints are no different from a
    block's, just responsible for more bits. This converts chunk level hints to
    use a pcpu_block_md to maintain them. This lets us reuse the same hint
    helper functions as a block. The left_free and right_free are unused by
    the chunk's pcpu_block_md.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
     
  • In reality, a chunk is just a block covering a larger number of bits.
    The hints themselves are one and the same. Rather than maintaining the
    hints separately, first introduce nr_bits to genericize
    pcpu_block_update() to correctly maintain block->right_free. The next
    patch will convert chunk hints to be managed as a pcpu_block_md.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
     
  • Blocks now remember the latest scan_hint. This can be used on the
    allocation path: when a contig_hint is broken, we can promote the
    scan_hint to the contig_hint and scan forward from there. This works
    because pcpu_block_refresh_hint() is only called on the allocation path
    while block free regions are updated manually in
    pcpu_block_update_hint_free().

    Signed-off-by: Dennis Zhou

    Dennis Zhou
     
  • Percpu allocations attempt to do first fit by scanning forward from the
    first_free of a block. However, fragmentation from allocation requests
    can cause holes not seen by block hint update functions. To address
    this, create a local version of bitmap_find_next_zero_area_off() that
    remembers the largest area skipped over. The caveat is that it only sees
    regions skipped over due to not fitting, not regions skipped due to
    alignment.

    Prior to updating the scan_hint, a scan backwards is done to try and
    recover free bits skipped due to alignment. While this can cause
    scanning to miss earlier possible free areas, smaller allocations will
    eventually fill those holes due to first fit.

    Signed-off-by: Dennis Zhou

    Dennis Zhou
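
A simplified model of "first fit that remembers the largest area skipped" might look like the following. It is not the kernel's bitmap code: it uses one byte per slot for readability, ignores alignment entirely, and the names find_fit and struct skipped are invented for the sketch.

```c
#include <stddef.h>

struct skipped {
	size_t start;
	size_t len;
};

/*
 * First-fit search for `want` consecutive free slots in used[0..n).
 * used[i] != 0 means slot i is allocated. Returns the start of the fit,
 * or n if there is none. *largest records the biggest free run that was
 * passed over because it was too small.
 */
static size_t find_fit(const unsigned char *used, size_t n, size_t want,
		       struct skipped *largest)
{
	size_t i = 0;

	largest->start = 0;
	largest->len = 0;

	while (i < n) {
		size_t start, len;

		if (used[i]) {
			i++;
			continue;
		}
		start = i;
		while (i < n && !used[i])
			i++;
		len = i - start;

		if (len >= want)
			return start;
		if (len > largest->len) {
			largest->start = start;
			largest->len = len;
		}
	}
	return n;
}
```

The recorded run is what a later, smaller request could be pointed at without rescanning from the beginning.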
     
  • Fragmentation can cause both blocks and chunks to have an early
    first_free bit available, but only able to satisfy allocations much
    later on. This patch introduces a scan_hint to help mitigate some
    unnecessary scanning.

    The scan_hint remembers the largest area prior to the contig_hint. If
    the contig_hint == scan_hint, then scan_hint_start > contig_hint_start.
    This is necessary for scan_hint discovery when refreshing a block.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
     
  • Previously, block size was flexible based on the constraint that the
    GCD(PCPU_BITMAP_BLOCK_SIZE, PAGE_SIZE) > 1. However, this carried the
    overhead that keeping a floating number of populated free pages required
    scanning over the free regions of a chunk.

    Setting the block size to be fixed at PAGE_SIZE lets us know when an
    empty page becomes used as we will break a full contig_hint of a block.
    This means we no longer have to scan the whole chunk upon breaking a
    contig_hint which empty page management piggybacked off. A later patch
    takes advantage of this to optimize the allocation path by only scanning
    forward using the scan_hint introduced later too.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
     
  • In certain cases, requestors of percpu memory may want specific
    alignments. However, it is possible to end up in situations where the
    contig_hint matches, but the alignment does not. This causes excess
    scanning of chunks that will fail. To prevent this, if a small
    allocation fails (< 32B), the chunk is moved to the empty list. Once an
    allocation is freed from that chunk, it is placed back into rotation.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
     
  • When a chunk becomes fragmented, it can end up having a large number of
    small allocation areas free. The free_bytes sorting of chunks leads to
    unnecessary checking of chunks that cannot satisfy the allocation.
    Switch to contig_bits sorting to prevent scanning chunks that may not be
    able to service the allocation request.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
     
  • While block hints were always accurate, it's possible when spanning
    across blocks that we miss updating the chunk's contig_hint. Rather than
    rely on correctness of the boundaries of hints, do a full overlap
    comparison.

    A future patch introduces the scan_hint which makes the contig_hint
    slightly fuzzy as they can at times be smaller than the actual hint.

    Signed-off-by: Dennis Zhou

    Dennis Zhou
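
A full overlap comparison between two regions reduces to a two-sided check. The sketch below assumes half-open ranges [a, b) and [x, y) and uses an invented helper name; the helper added by the patch may differ in naming and conventions.

```c
#include <stdbool.h>

/* True if half-open ranges [a, b) and [x, y) share at least one bit. */
static bool region_overlap(int a, int b, int x, int y)
{
	return a < y && x < b;
}
```

Ranges that merely touch, such as [0, 4) and [4, 8), do not overlap under this convention.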
     
  • pcpu_find_block_fit() guarantees that a fit is found within
    PCPU_BITMAP_BLOCK_BITS. Iteration is used to determine the first fit as
    it compares against the block's contig_hint. This can lead to
    incorrectly scanning past the end of the bitmap. The behavior was okay
    given the check after for bit_off >= end and the correctness of the
    hints from pcpu_find_block_fit().

    This patch fixes this by bounding the end offset by the number of bits
    in a chunk.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
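
Bounding the end offset can be sketched as a simple clamp. BLOCK_BITS is a hypothetical stand-in for PCPU_BITMAP_BLOCK_BITS with an arbitrary value, and scan_end is an invented name, so this shows only the shape of the calculation.

```c
#include <stddef.h>

#define BLOCK_BITS 1024	/* hypothetical block size in bits */

/* Exclusive end of the search window, never past the chunk's bitmap. */
static size_t scan_end(size_t start, size_t alloc_bits, size_t chunk_bits)
{
	size_t end = start + alloc_bits + BLOCK_BITS;

	return end < chunk_bits ? end : chunk_bits;
}
```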
     
  • When updating the chunk's contig_hint on the free path of a hint that
    does not touch the page boundaries, it was incorrectly using the
    starting offset of the free region and the block's contig_hint. This
    could lead to incorrect assumptions about fit given a size and better
    alignment of the start. Fix this by using (end - start) as this is only
    called when updating a hint within a block.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Peng Fan

    Dennis Zhou
     

13 Mar, 2019

2 commits

  • As all the memblock allocation functions return NULL in case of error
    rather than panic(), the duplicates with _nopanic suffix can be removed.

    Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Petr Mladek [printk]
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Add panic() calls if memblock_alloc() returns NULL.

    The panic() format duplicates the one used by memblock itself and in
    order to avoid explosion with long parameters list replace open coded
    allocation size calculations with a local variable.

    Link: http://lkml.kernel.org/r/1548057848-15136-17-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

24 Feb, 2019

1 commit

  • The group_cnt array is defined with NR_CPUS entries, but nr_groups
    normally does not reach NR_CPUS, so the current code has no issue.

    Other parts of pcpu_build_alloc_info() use nr_groups as the check
    condition, so use 'group < nr_groups' as the for loop check here as
    well for consistency. Should nr_groups ever equal NR_CPUS, this also
    avoids an out-of-bounds memory access.

    Signed-off-by: Peng Fan
    Signed-off-by: Dennis Zhou

    Peng Fan
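
The benefit of bounding the loop by nr_groups shows up when every entry of the array is in use, so no zero entry exists to stop a sentinel-style loop. The sketch below is a userspace model with invented names; MAX_CPUS stands in for NR_CPUS, and the loop condition that the patch replaces is not quoted in the message, so none is reproduced here.

```c
#define MAX_CPUS 8	/* stand-in for NR_CPUS */

/* Sum the per-group unit counts, visiting only groups that are in use. */
static int total_units(const int group_cnt[MAX_CPUS], int nr_groups)
{
	int group, total = 0;

	for (group = 0; group < nr_groups; group++)
		total += group_cnt[group];
	return total;
}
```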
     

02 Nov, 2018

1 commit


31 Oct, 2018

3 commits

  • When memblock allocation APIs are called with align = 0, the alignment
    is implicitly set to SMP_CACHE_BYTES.

    Implicit alignment is done deep in the memblock allocator and it can
    come as a surprise. Not that such an alignment would be wrong even
    when used incorrectly but it is better to be explicit for the sake of
    clarity and the principle of least surprise.

    Replace all such uses of memblock APIs with the 'align' parameter
    explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
    in the memblock internal allocation functions.

    For the case when memblock APIs are used via helper functions, e.g. like
    iommu_arena_new_node() in Alpha, the helper functions were detected with
    Coccinelle's help and then manually examined and updated where
    appropriate.

    The direct memblock APIs users were updated using the semantic patch below:

    @@
    expression size, min_addr, max_addr, nid;
    @@
    (
    |
    - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
    nid)
    |
    - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
    nid)
    |
    - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc(size, 0)
    + memblock_alloc(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_raw(size, 0)
    + memblock_alloc_raw(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from(size, 0, min_addr)
    + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_nopanic(size, 0)
    + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low(size, 0)
    + memblock_alloc_low(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low_nopanic(size, 0)
    + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from_nopanic(size, 0, min_addr)
    + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_node(size, 0, nid)
    + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
    )

    [mhocko@suse.com: changelog update]
    [akpm@linux-foundation.org: coding-style fixes]
    [rppt@linux.ibm.com: fix missed uses of implicit alignment]
    Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
    Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Suggested-by: Michal Hocko
    Acked-by: Paul Burton [MIPS]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: Matt Turner
    Cc: Michal Simek
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include

    @@
    @@
    - #include
    + #include

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The conversion is done using

    sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
    $(git grep -l memblock_virt_alloc)

    Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

08 Oct, 2018

1 commit

  • The commit ca460b3c9627 ("percpu: introduce bitmap metadata blocks")
    introduced bitmap metadata blocks. These metadata blocks are allocated
    whenever a new chunk is created, but they are never freed. Fix it.

    Fixes: ca460b3c9627 ("percpu: introduce bitmap metadata blocks")
    Signed-off-by: Mike Rapoport
    Cc: stable@vger.kernel.org
    Signed-off-by: Dennis Zhou

    Mike Rapoport
     

13 Sep, 2018

1 commit

  • WARN_ON() already contains an unlikely(), so it's not necessary to
    wrap it into another.

    Signed-off-by: Igor Stoppa
    Cc: Tejun Heo
    Cc: zijun_hu
    Cc: Christoph Lameter
    Cc: linux-mm@kvack.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Dennis Zhou

    Igor Stoppa
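
Why the extra wrapper is redundant can be seen from a userspace imitation of the macro. This is a sketch in GNU C (it relies on statement expressions and __builtin_expect), WARN_ON_SKETCH and checked_div are invented names, and the kernel's real WARN_ON() does considerably more than print a line.

```c
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

static int warn_count;

/*
 * Evaluates cond once, reports it if true, and yields its truth value
 * already wrapped in unlikely().
 */
#define WARN_ON_SKETCH(cond) ({					\
	int ret_warn_ = !!(cond);				\
	if (unlikely(ret_warn_)) {				\
		warn_count++;					\
		fprintf(stderr, "warning: %s\n", #cond);	\
	}							\
	unlikely(ret_warn_);					\
})

static int checked_div(int a, int b)
{
	if (WARN_ON_SKETCH(b == 0))	/* no extra unlikely() needed */
		return 0;
	return a / b;
}
```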
     

23 Aug, 2018

1 commit

  • Currently, percpu memory only exposes allocation and utilization
    information via debugfs. This more or less is only really useful for
    understanding the fragmentation and allocation information at a per-chunk
    level with a few global counters. This is also gated behind a config.
    BPF and cgroup, for example, have seen an increase in use causing
    increased use of percpu memory. Let's make it easier for someone to
    identify how much memory is being used.

    This patch adds the "Percpu" stat to meminfo to more easily look up how
    much percpu memory is in use. This number includes the cost for all
    allocated backing pages and not just insight at the per unit, per chunk
    level. Metadata is excluded. I think excluding metadata is fair because
    the backing memory scales with the number of cpus and can quickly
    outweigh the metadata. It also makes this calculation light.

    Link: http://lkml.kernel.org/r/20180807184723.74919-1-dennisszhou@gmail.com
    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Acked-by: Roman Gushchin
    Reviewed-by: Andrew Morton
    Acked-by: David Rientjes
    Acked-by: Vlastimil Babka
    Cc: Johannes Weiner
    Cc: Christoph Lameter
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dennis Zhou (Facebook)
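
A small userspace helper for picking the "Percpu" value out of meminfo-style text could look like this. It is a sketch: meminfo_lookup is an invented name, and it assumes the usual "Key:   value kB" line layout without validating the unit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "<key>:" at the start of a line in meminfo-style text and parse
 * the number that follows. Returns 0 on success, -1 if the key is absent
 * or no number follows it.
 */
static int meminfo_lookup(const char *text, const char *key, unsigned long *kb)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return sscanf(line + klen + 1, "%lu", kb) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

In practice the text would come from reading /proc/meminfo into a buffer and calling meminfo_lookup(buf, "Percpu", &kb).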
     

03 Apr, 2018

1 commit

  • Pull removal of obsolete architecture ports from Arnd Bergmann:
    "This removes the entire architecture code for blackfin, cris, frv,
    m32r, metag, mn10300, score, and tile, including the associated device
    drivers.

    I have been working with the (former) maintainers for each one to
    ensure that my interpretation was right and the code is definitely
    unused in mainline kernels. Many had fond memories of working on the
    respective ports to start with and getting them included in upstream,
    but also saw no point in keeping the port alive without any users.

    In the end, it seems that while the eight architectures are extremely
    different, they all suffered the same fate: There was one company in
    charge of an SoC line, a CPU microarchitecture and a software
    ecosystem, which was more costly than licensing newer off-the-shelf
    CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
    seems that all the SoC product lines are still around, but have not
    used the custom CPU architectures for several years at this point. In
    contrast, CPU instruction sets that remain popular and have actively
    maintained kernel ports tend to all be used across multiple licensees.

    [ See the new nds32 port merged in the previous commit for the next
    generation of "one company in charge of an SoC line, a CPU
    microarchitecture and a software ecosystem" - Linus ]

    The removal came out of a discussion that is now documented at
    https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
    marking any ports as deprecated but remove them all at once after I
    made sure that they are all unused. Some architectures (notably tile,
    mn10300, and blackfin) are still being shipped in products with old
    kernels, but those products will never be updated to newer kernel
    releases.

    After this series, we still have a few architectures without mainline
    gcc support:

    - unicore32 and hexagon both have very outdated gcc releases, but the
    maintainers promised to work on providing something newer. At least
    in case of hexagon, this will only be llvm, not gcc.

    - openrisc, risc-v and nds32 are still in the process of finishing
    their support or getting it added to mainline gcc in the first
    place. They all have patched gcc-7.3 ports that work to some
    degree, but complete upstream support won't happen before gcc-8.1.
    Csky posted their first kernel patch set last week, their situation
    will be similar

    [ Palmer Dabbelt points out that RISC-V support is in mainline gcc
    since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"

    This really says it all:

    2498 files changed, 95 insertions(+), 467668 deletions(-)

    * tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
    MAINTAINERS: UNICORE32: Change email account
    staging: iio: remove iio-trig-bfin-timer driver
    tty: hvc: remove tile driver
    tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
    serial: remove tile uart driver
    serial: remove m32r_sio driver
    serial: remove blackfin drivers
    serial: remove cris/etrax uart drivers
    usb: Remove Blackfin references in USB support
    usb: isp1362: remove blackfin arch glue
    usb: musb: remove blackfin port
    usb: host: remove tilegx platform glue
    pwm: remove pwm-bfin driver
    i2c: remove bfin-twi driver
    spi: remove blackfin related host drivers
    watchdog: remove bfin_wdt driver
    can: remove bfin_can driver
    mmc: remove bfin_sdh driver
    input: misc: remove blackfin rotary driver
    input: keyboard: remove bf54x driver
    ...

    Linus Torvalds
     

26 Mar, 2018

1 commit


20 Mar, 2018

2 commits

  • In case of memory deficit and low percpu memory pages,
    pcpu_balance_workfn() takes pcpu_alloc_mutex for a long
    time (as it makes memory allocations itself and waits
    for memory reclaim). If tasks doing pcpu_alloc() are
    chosen by the OOM killer, they can't exit, because they
    are waiting for the mutex.

    The patch makes pcpu_alloc() honor a pending kill signal by
    using mutex_lock_killable() when the GFP flags allow it.
    This guarantees that a task does not miss SIGKILL
    from the OOM killer.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Tejun Heo

    Kirill Tkhai
     
  • microblaze build broke due to missing declaration of the
    cond_resched() invocation added recently. Let's include linux/sched.h
    explicitly.

    Signed-off-by: Tejun Heo
    Reported-by: kbuild test robot

    Tejun Heo
     

24 Feb, 2018

1 commit

  • When a large BPF percpu map is destroyed, I have seen
    pcpu_balance_workfn() holding cpu for hundreds of milliseconds.

    On KASAN config and 112 hyperthreads, average time to destroy a chunk
    is ~4 ms.

    [ 2489.841376] destroy chunk 1 in 4148689 ns
    ...
    [ 2490.093428] destroy chunk 32 in 4072718 ns

    Signed-off-by: Eric Dumazet
    Signed-off-by: Tejun Heo

    Eric Dumazet
     

18 Feb, 2018

3 commits

  • The prior patch added support for passing gfp flags through to the
    underlying allocators. This patch allows users to pass along gfp flags
    (currently only __GFP_NORETRY and __GFP_NOWARN) to the underlying
    allocators. This should allow users to decide if they are ok with
    failing allocations recovering in a more graceful way.

    Additionally, gfp passing was done as additional flags in the previous
    patch. Instead, change this to caller passed semantics. GFP_KERNEL is
    also removed as the default flag. It continues to be used for internally
    caused underlying percpu allocations.

    V2:
    Removed gfp_percpu_mask in favor of doing it inline.
    Removed GFP_KERNEL as a default flag for __alloc_percpu_gfp.

    Signed-off-by: Dennis Zhou
    Suggested-by: Daniel Borkmann
    Acked-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Dennis Zhou
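
Caller-passed semantics with a whitelist can be modelled as a mask applied to whatever the caller supplies. The flag values below are hypothetical and do not match the kernel's, the SK_ names are invented, and including a GFP_KERNEL-like flag in the allowed set is an assumption of this sketch rather than something stated in the message.

```c
/* Hypothetical flag values for illustration; the kernel's differ. */
#define SK_GFP_KERNEL	0x01u
#define SK_GFP_NORETRY	0x02u
#define SK_GFP_NOWARN	0x04u
#define SK_GFP_ZERO	0x08u

/* Keep only the caller flags that may reach the underlying allocators. */
static unsigned int percpu_backing_gfp(unsigned int caller_gfp)
{
	return caller_gfp & (SK_GFP_KERNEL | SK_GFP_NORETRY | SK_GFP_NOWARN);
}
```

Since nothing is ORed in by default, a caller that omits a flag does not get it implicitly.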
     
  • Percpu memory using the vmalloc area based chunk allocator lazily
    populates chunks by first requesting the full virtual address space
    required for the chunk and subsequently adding pages as allocations come
    through. To ensure atomic allocations can succeed, a workqueue item is
    used to maintain a minimum number of empty pages. In certain scenarios,
    such as reported in [1], it is possible that physical memory becomes
    quite scarce which can result in either a rather long time spent trying
    to find free pages or worse, a kernel panic.

    This patch adds support for __GFP_NORETRY and __GFP_NOWARN passing them
    through to the underlying allocators. This should prevent any
    unnecessary panics potentially caused by the workqueue item. The passing
    of gfp around is as additional flags rather than a full set of flags.
    The next patch will change these to caller passed semantics.

    V2:
    Added const modifier to gfp flags in the balance path.
    Removed an extra whitespace.

    [1] https://lkml.org/lkml/2018/2/12/551

    Signed-off-by: Dennis Zhou
    Suggested-by: Daniel Borkmann
    Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
    Acked-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Dennis Zhou
     
  • At some point the function declaration parameters got out of sync with
    the function definitions in percpu-vm.c and percpu-km.c. This patch
    makes them match again.

    Signed-off-by: Dennis Zhou
    Acked-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Dennis Zhou
     

28 Nov, 2017

1 commit

  • Commit 438a506180 ("percpu: don't forget to free the temporary struct
    pcpu_alloc_info") uncovered a problem on the CRIS architecture where
    the bootmem allocator is initialized with virtual addresses. Given it
    has:

    #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))

    then things just work out because the end result is the same whether you
    give this a physical or a virtual address.

    Until you call memblock_free_early(__pa(address)) that is, because
    values from __pa() don't match with the virtual addresses stuffed in the
    bootmem allocator anymore.

    Avoid freeing the temporary pcpu_alloc_info memory on that architecture
    until they fix things up to let the kernel boot like it did before.

    Signed-off-by: Nicolas Pitre
    Signed-off-by: Tejun Heo
    Fixes: 438a506180 ("percpu: don't forget to free the temporary struct pcpu_alloc_info")

    Nicolas Pitre
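
The reason the __va() definition quoted above hides the mix-up is that ORing in the top bit is idempotent. The sketch below models it on 32-bit integers; the pa() helper, which clears the top bit, is an assumed inverse for illustration and is not quoted from the CRIS sources.

```c
#include <stdint.h>

/* 32-bit model of the __va() macro quoted in the message. */
static uint32_t va(uint32_t x)
{
	return x | 0x80000000u;
}

/* Assumed inverse for illustration: clear the top bit. */
static uint32_t pa(uint32_t x)
{
	return x & 0x7fffffffu;
}
```

Applying va() to a physical or an already-virtual address gives the same result, whereas pa() of a virtual address yields a value that no longer matches what was handed to the allocator.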
     

16 Nov, 2017

1 commit