26 Mar, 2017

1 commit

  • Since commit 383776fa7527 ("locking/lockdep: Handle statically initialized
    PER_CPU locks properly") we try to collapse per-cpu locks into a single
    class by giving them all the same key. For this key we choose the canonical
    address of the per-cpu object, which would be the offset into the per-cpu
    area.

    This has two problems:

    - there is a case where we run !0 lock->key through static_obj() and
    expect this to pass; it doesn't for canonical pointers.

    - 0 is a valid canonical address.

    Cure both issues by redefining the canonical address as the address of the
    per-cpu variable on the boot CPU.

    Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at
    all, track the boot CPU in a variable.
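
    A minimal sketch of the idea (not the upstream hunk): the per-cpu offset
    that used to serve as the key is rebased onto the boot CPU's copy of the
    variable, so the key is a real kernel address, is never 0, and passes
    static_obj(). The helper name and the boot-CPU variable below are
    illustrative only.

        #include <linux/percpu.h>

        /* Illustrative sketch -- names are not the upstream ones. */
        static unsigned int lockdep_boot_cpu;   /* recorded once during early boot */

        static const void *canonical_percpu_key(unsigned long pcpu_offset)
        {
                /* address of the variable in the boot CPU's per-cpu area */
                return per_cpu_ptr((void *)pcpu_offset, lockdep_boot_cpu);
        }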

    Fixes: 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly")
    Reported-by: kernel test robot
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Borislav Petkov
    Cc: Sebastian Andrzej Siewior
    Cc: linux-mm@kvack.org
    Cc: wfg@linux.intel.com
    Cc: kernel test robot
    Cc: LKP
    Link: http://lkml.kernel.org/r/20170320114108.kbvcsuepem45j5cr@hirez.programming.kicks-ass.net
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

16 Mar, 2017

1 commit

  • If a PER_CPU struct which contains a spin_lock is statically initialized
    via:

    DEFINE_PER_CPU(struct foo, bla) = {
            .lock = __SPIN_LOCK_UNLOCKED(bla.lock)
    };

    then lockdep assigns a separate key to each lock because the logic for
    assigning a key to statically initialized locks is to use the address as
    the key. With per CPU locks the address is obviously different on each CPU.

    That's wrong, because all locks should have the same key.

    To solve this the following modifications are required:

    1) Extend the is_kernel/module_percpu_addr() functions to hand back the
    canonical address of the per CPU address, i.e. the per CPU address
    minus the per CPU offset.

    2) Check the lock address with these functions and if the per CPU check
    matches use the returned canonical address as the lock key, so all per
    CPU locks have the same key.

    3) Move the static_obj(key) check into look_up_lock_class() so this check
    can be avoided for statically initialized per CPU locks. That's
    required because the canonical address fails the static_obj(key) check
    for obvious reasons.
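
    The sketch below models steps 1) and 2): find which CPU's static per-cpu
    area the lock address falls into and strip that CPU's offset, so every
    copy of a statically initialized per-cpu lock collapses to one canonical
    value. The helper is illustrative, not the extended kernel function.

        #include <linux/percpu.h>
        #include <asm/sections.h>       /* __per_cpu_start, __per_cpu_end */

        static bool percpu_lock_canonical(unsigned long addr, unsigned long *canonical)
        {
                unsigned int cpu;

                for_each_possible_cpu(cpu) {
                        unsigned long start = (unsigned long)__per_cpu_start + per_cpu_offset(cpu);
                        unsigned long end   = (unsigned long)__per_cpu_end   + per_cpu_offset(cpu);

                        if (addr >= start && addr < end) {
                                *canonical = addr - per_cpu_offset(cpu);  /* same on every CPU */
                                return true;
                        }
                }
                return false;   /* not a static per-cpu address: use the normal key */
        }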

    Reported-by: Mike Galbraith
    Signed-off-by: Thomas Gleixner
    [ Merged Dan's fixups for !MODULES and !SMP into this patch. ]
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Dan Murphy
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20170227143736.pectaimkjkan5kow@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

07 Mar, 2017

1 commit

  • The update of pcpu_nr_empty_pop_pages in pcpu_alloc() is currently done
    without holding pcpu_lock, which can lead to lost or inconsistent updates
    of the counter. Add the missing lock calls (see the sketch below).
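
    A minimal sketch of the fix pattern, assuming nothing beyond what the
    message states: writes to pcpu_nr_empty_pop_pages must happen under
    pcpu_lock, the spinlock that already guards the allocator's index state.
    The helper name is invented for illustration.

        /* Illustrative helper, not the actual hunk in pcpu_alloc(). */
        static void pcpu_adjust_empty_pop_pages(int nr)
        {
                unsigned long flags;

                spin_lock_irqsave(&pcpu_lock, flags);   /* was updated lockless before */
                pcpu_nr_empty_pop_pages += nr;
                spin_unlock_irqrestore(&pcpu_lock, flags);
        }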

    Fixes: b539b87fed37 ("percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated")
    Signed-off-by: Tahsin Erdogan
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org # v3.18+

    Tahsin Erdogan
     

28 Feb, 2017

1 commit

  • Fix typos and add the following to the scripts/spelling.txt:

    followings||following

    While we are here, add a missing colon in the boilerplate in DT binding
    documents. The "you SoC" in allwinner,sunxi-pinctrl.txt was fixed as
    well.

    I reworded "as the followings:" to "as follows:" for
    drivers/usb/gadget/udc/renesas_usb3.c.

    Link: http://lkml.kernel.org/r/1481573103-11329-32-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     


13 Dec, 2016

1 commit

  • As shown by pcpu_build_alloc_info(), the number of units within a percpu
    group is derived by rounding the number of CPUs within the group up to
    the @upa boundary. Therefore, the number of CPUs is normally not equal to
    the number of units unless it happens to be aligned to @upa. However,
    pcpu_page_first_chunk() uses BUG_ON() to assert that the two numbers are
    equal, so the BUG_ON() can trigger a spurious panic.

    Fix this by rounding the number of CPUs up before comparing it with the
    number of units, and by replacing the BUG_ON() with a warning plus an
    error return, to keep the system alive as much as possible (see the
    worked example below).
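
    A small userspace model of the arithmetic (the numbers are made up): with
    6 CPUs in a group and @upa == 4 the group carries 8 units, so an exact
    "units == CPUs" assertion fails on a perfectly healthy configuration;
    comparing against the rounded-up CPU count and warning instead keeps the
    machine up.

        #include <stdio.h>

        /* round n up to the next multiple of boundary, as pcpu_build_alloc_info() does */
        static unsigned int round_up_to(unsigned int n, unsigned int boundary)
        {
                return ((n + boundary - 1) / boundary) * boundary;
        }

        int main(void)
        {
                unsigned int nr_cpus = 6, upa = 4;
                unsigned int nr_units = round_up_to(nr_cpus, upa);      /* 8 */

                /* old check: BUG_ON(nr_units != nr_cpus) would panic on 8 != 6 */
                if (nr_units != round_up_to(nr_cpus, upa))
                        fprintf(stderr, "percpu: cpu/unit mismatch\n"); /* warn + error, no panic */
                else
                        printf("%u CPUs -> %u units with upa=%u\n", nr_cpus, upa, nr_units);
                return 0;
        }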

    Link: http://lkml.kernel.org/r/57FCF07C.2020103@zoho.com
    Signed-off-by: zijun_hu
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zijun_hu
     

20 Oct, 2016

1 commit

  • The percpu allocator assumes, as expected, that the requested alignment
    is a power of two, but it has not been verifying the input. If the
    specified alignment isn't a power of two, the allocator can malfunction.
    Add the sanity check.

    The following is a detailed analysis of the effects of alignments that
    aren't powers of two.

    The alignment must at least be even, since the LSB of a chunk->map
    element is used as the free/in-use flag of an area; furthermore, it must
    be a power of two, because pcpu_fit_in_area() relies on ALIGN(), which
    only behaves correctly for power-of-two alignments. In other words, the
    current allocator only works correctly for power-of-two aligned area
    allocations.

    See the counterexample below for why an odd alignment doesn't work.
    Assume area [16, 36) is free but the previous area is in use, and we want
    to allocate an area with @size == 8 and @align == 7. The larger area
    [16, 36) ends up split into three areas [16, 21), [21, 29) and [29, 36).
    However, because of how a chunk->map element is used, the actual offset
    of the target area [21, 29) is 21 but it is recorded in the relevant
    element as 20; moreover, the residual tail free area [29, 36) is mistaken
    as in-use and is silently lost.

    Unlike the roundup() macro, ALIGN(x, a) doesn't work if @a isn't a power
    of two; for example, roundup(10, 6) == 12 but ALIGN(10, 6) == 10, and the
    latter is obviously not the desired result (see the demo below).
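
    A short userspace demo of the last paragraph (macros written in the
    kernel's style): ALIGN() only matches roundup() when the alignment is a
    power of two.

        #include <stdio.h>

        #define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))  /* power-of-two only */
        #define roundup(x, y)   ((((x) + (y) - 1) / (y)) * (y))

        int main(void)
        {
                printf("roundup(10, 6) = %d, ALIGN(10, 6) = %d\n",
                       roundup(10, 6), ALIGN(10, 6));   /* 12 vs 10 */
                printf("roundup(10, 8) = %d, ALIGN(10, 8) = %d\n",
                       roundup(10, 8), ALIGN(10, 8));   /* 16 vs 16 */
                return 0;
        }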

    tj: Code style and patch description updates.

    Signed-off-by: zijun_hu
    Suggested-by: Tejun Heo
    Signed-off-by: Tejun Heo

    zijun_hu
     

05 Oct, 2016

2 commits

  • In order to ensure that the percpu group areas within a chunk aren't
    distributed too sparsely, pcpu_embed_first_chunk() takes the error
    handling path when a chunk spans more than 3/4 of the VMALLOC area.
    However, during that error handling it forgets to free the memory
    allocated for the percpu groups, because it jumps to label @out_free
    instead of @out_free_areas.

    This causes a memory leak if that rare situation actually occurs. To fix
    it, check the spanned area immediately after the memory for all percpu
    groups has been allocated, and jump to label @out_free_areas to free that
    memory before returning if the check fails (sketched below).

    To verify the approach, all allocated memory was dumped, the jump was
    forced, and all freed memory was dumped; the result confirms that
    everything allocated in this function is freed again.

    BTW, the approach was chosen after considering the following alternatives:
    - we don't jump to label @out_free directly, since that could free
    several allocated memory blocks twice
    - the purpose of the jump after pcpu_setup_first_chunk() is to skip
    freeing memory that is still in use, not to handle errors; moreover, the
    function never returns an error code, it either panics due to BUG_ON()
    or returns 0.
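
    A compact userspace model of the control flow described above (the label
    names mirror the text, everything else is illustrative): once the group
    areas exist, every failure must leave through @out_free_areas; only the
    success path may skip to @out_free.

        #include <stdlib.h>

        int embed_first_chunk_model(size_t nr_groups, size_t spanned, size_t vmalloc_total)
        {
                void **areas = calloc(nr_groups, sizeof(*areas));
                size_t i;
                int rc = -1;

                if (!areas)
                        return rc;

                for (i = 0; i < nr_groups; i++) {
                        areas[i] = malloc(4096);        /* one area per percpu group */
                        if (!areas[i])
                                goto out_free_areas;
                }

                /* check the spanned range right after the allocations ... */
                if (spanned > vmalloc_total / 4 * 3)
                        goto out_free_areas;            /* ... so the areas are not leaked */

                rc = 0;                 /* in the real code the areas are handed off here */
                goto out_free;

        out_free_areas:
                for (i = 0; i < nr_groups; i++)
                        free(areas[i]);                 /* free(NULL) is harmless */
        out_free:
                free(areas);
                return rc;
        }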

    Signed-off-by: zijun_hu
    Tested-by: zijun_hu
    Signed-off-by: Tejun Heo

    zijun_hu
     
  • pcpu_embed_first_chunk() calculates the range a percpu chunk spans into
    @max_distance and uses it to ensure that a chunk is not too big compared
    to the total vmalloc area. However, during the calculation it used an
    incorrect top address, obtained by adding a single unit size to the
    highest group's base address.

    This can make the calculated max_distance slightly smaller than the
    actual distance, although given the scale of the values involved the
    error is very unlikely to have an actual impact.

    Fix this by adding the group's size instead of a unit size (see the
    numeric sketch below).

    The type of @max_distance is also changed from size_t to unsigned long,
    for the following reasons:
    - unsigned long usually has the same width as the CPU's registers and
    fits this use well
    - it makes the type of @max_distance consistent with the operands it is
    calculated against, such as @ai->groups[i].base_offset and the macro
    VMALLOC_TOTAL
    - unsigned long is more universal than size_t, which is typedef'd to
    unsigned int or unsigned long depending on the architecture
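
    A numeric sketch with made-up sizes (none of these values are from the
    patch): adding one unit's size to the highest group's base underestimates
    the span whenever that group holds more than one unit.

        #include <stdio.h>

        int main(void)
        {
                unsigned long base_offset = 1UL << 20;  /* highest group's base_offset */
                unsigned long unit_size   = 64UL << 10; /* size of one unit            */
                unsigned long nr_units    = 4;          /* units in the highest group  */

                unsigned long old_top = base_offset + unit_size;             /* old code */
                unsigned long new_top = base_offset + nr_units * unit_size;  /* fixed    */

                printf("old top: %lu, new top: %lu\n", old_top, new_top);
                return 0;
        }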

    Signed-off-by: zijun_hu
    Signed-off-by: Tejun Heo

    zijun_hu
     

25 May, 2016

2 commits

  • For non-atomic allocations, pcpu_alloc() can try to extend the area
    map synchronously after dropping pcpu_lock; however, the extension
    wasn't synchronized against chunk destruction and the chunk might get
    freed while extension is in progress.

    This patch fixes the bug by putting most of non-atomic allocations
    under pcpu_alloc_mutex to synchronize against pcpu_balance_work which
    is responsible for async chunk management including destruction.

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Alexei Starovoitov
    Reported-by: Vlastimil Babka
    Reported-by: Sasha Levin
    Cc: stable@vger.kernel.org # v3.18+
    Fixes: 1a4d76076cda ("percpu: implement asynchronous chunk population")

    Tejun Heo
     
  • Atomic allocations can trigger async map extensions which is serviced
    by chunk->map_extend_work. pcpu_balance_work which is responsible for
    destroying idle chunks wasn't synchronizing properly against
    chunk->map_extend_work and may end up freeing the chunk while the work
    item is still in flight.

    This patch fixes the bug by rolling async map extension operations
    into pcpu_balance_work.

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Alexei Starovoitov
    Reported-by: Vlastimil Babka
    Reported-by: Sasha Levin
    Cc: stable@vger.kernel.org # v3.18+
    Fixes: 9c824b6a172c ("percpu: make sure chunk->map array has available space")

    Tejun Heo
     

18 Mar, 2016

4 commits

  • Use the normal pr_fmt mechanism to make the logging output consistently
    "percpu:" instead of a mix of "PERCPU:" and "percpu:".

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Most of the mm subsystem uses pr_<level> so make it consistent.

    Miscellanea:

    - Realign arguments
    - Add missing newline to format
    - kmemleak-test.c has a "kmemleak: " prefix added to the
    "Kmemleak testing" logging message via pr_fmt

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Kernel style prefers a single string over split strings when the string is
    'user-visible'.

    Miscellanea:

    - Add a missing newline
    - Realign arguments
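
    A small before/after sketch of that rule (the message text is invented):
    keeping the user-visible string on one line keeps it greppable even when
    it exceeds 80 columns.

        #include <linux/printk.h>
        #include <linux/types.h>

        static void warn_alloc_failure(size_t size, size_t align)
        {
                /* old, split string -- hard to grep for:
                 *   pr_warn("percpu: allocation failed, "
                 *           "size=%zu align=%zu\n", size, align);
                 */

                /* preferred: a single user-visible string, arguments realigned */
                pr_warn("percpu: allocation failed, size=%zu align=%zu\n", size, align);
        }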

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • There are a mixture of pr_warning and pr_warn uses in mm. Use pr_warn
    consistently.

    Miscellanea:

    - Coalesce formats
    - Realign arguments

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

23 Jan, 2016

1 commit

  • There are many locations that do

    if (memory_was_allocated_by_vmalloc)
            vfree(ptr);
    else
            kfree(ptr);

    but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
    using is_vmalloc_addr(). Unless callers have special reasons, we can
    replace this branch with kvfree(). Please check and reply if you found
    problems.
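
    A sketch of what a converted call site looks like (the wrapper is
    illustrative): kvfree() performs the is_vmalloc_addr() dispatch itself.

        #include <linux/mm.h>   /* kvfree() */

        static void release_buffer(void *ptr)
        {
                /* replaces: if (is_vmalloc_addr(ptr)) vfree(ptr); else kfree(ptr); */
                kvfree(ptr);
        }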

    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Acked-by: Jan Kara
    Acked-by: Russell King
    Reviewed-by: Andreas Dilger
    Acked-by: "Rafael J. Wysocki"
    Acked-by: David Rientjes
    Cc: "Luck, Tony"
    Cc: Oleg Drokin
    Cc: Boris Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     


25 Jun, 2015

1 commit

  • Beginning at commit d52d3997f843 ("ipv6: Create percpu rt6_info"), the
    following INFO splat is logged:

    ===============================
    [ INFO: suspicious RCU usage. ]
    4.1.0-rc7-next-20150612 #1 Not tainted
    -------------------------------
    kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
    other info that might help us debug this:
    rcu_scheduler_active = 1, debug_locks = 0
    3 locks held by systemd/1:
    #0: (rtnl_mutex){+.+.+.}, at: [] rtnetlink_rcv+0x1f/0x40
    #1: (rcu_read_lock_bh){......}, at: [] ipv6_add_addr+0x62/0x540
    #2: (addrconf_hash_lock){+...+.}, at: [] ipv6_add_addr+0x184/0x540
    stack backtrace:
    CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
    Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20 04/17/2014
    Call Trace:
    dump_stack+0x4c/0x6e
    lockdep_rcu_suspicious+0xe7/0x120
    ___might_sleep+0x1d5/0x1f0
    __might_sleep+0x4d/0x90
    kmem_cache_alloc+0x47/0x250
    create_object+0x39/0x2e0
    kmemleak_alloc_percpu+0x61/0xe0
    pcpu_alloc+0x370/0x630

    Additional backtrace lines are truncated. In addition, the above splat
    is followed by several "BUG: sleeping function called from invalid
    context at mm/slub.c:1268" outputs. As suggested by Martin KaFai Lau,
    these are the clue to the fix. Routine kmemleak_alloc_percpu() always
    uses GFP_KERNEL for its allocations, whereas it should follow the gfp
    from its callers.
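
    A hedged sketch of the caller side of the fix: the gfp given to
    pcpu_alloc() is forwarded to the leak tracker instead of letting the
    tracking allocation fall back to a sleeping GFP_KERNEL allocation. The
    wrapper name is invented; kmemleak_alloc_percpu() taking a gfp argument
    reflects the interface after this change.

        #include <linux/kmemleak.h>
        #include <linux/gfp.h>

        /* Illustrative wrapper, not the literal hunk in pcpu_alloc(). */
        static void note_percpu_alloc(void __percpu *ptr, size_t size, gfp_t gfp)
        {
                /* before the fix the tracker always allocated with GFP_KERNEL */
                kmemleak_alloc_percpu(ptr, size, gfp);
        }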

    Reviewed-by: Catalin Marinas
    Reviewed-by: Kamalesh Babulal
    Acked-by: Martin KaFai Lau
    Signed-off-by: Larry Finger
    Cc: Martin KaFai Lau
    Cc: Catalin Marinas
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: [3.18+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Larry Finger
     


14 Feb, 2015

1 commit

  • printk and friends can now format bitmaps using '%*pb[l]'. cpumask
    and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
    respectively which can be used to generate the two printf arguments
    necessary to format the specified cpu/nodemask.

    Signed-off-by: Tejun Heo
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     


09 Oct, 2014

1 commit

  • When @gfp is specified, the percpu allocator is interested in whether
    it contains all of GFP_KERNEL or not. If it does, the normal
    allocation path is taken; otherwise, the atomic allocation path.
    Unfortunately, pcpu_alloc() was incorrectly testing for whether @gfp
    contains any part of GFP_KERNEL.

    Fix it by testing "(gfp & GFP_KERNEL) != GFP_KERNEL" instead of
    "!(gfp & GFP_KERNEL)" to decide whether the allocation should be
    atomic or not.
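
    A userspace model of the predicate (the flag values are invented, only
    the subset relationship matters): a mask such as GFP_NOFS shares bits
    with GFP_KERNEL, so the old "any bit set" test wrongly treated it as a
    full GFP_KERNEL context.

        #include <stdio.h>

        #define __GFP_RECLAIM   0x1u    /* invented values for the demo */
        #define __GFP_IO        0x2u
        #define __GFP_FS        0x4u
        #define GFP_KERNEL      (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
        #define GFP_NOFS        (__GFP_RECLAIM | __GFP_IO)

        int main(void)
        {
                unsigned int gfp = GFP_NOFS;

                printf("old test says GFP_KERNEL context: %d\n",
                       !!(gfp & GFP_KERNEL));                   /* 1 -- wrong  */
                printf("new test says GFP_KERNEL context: %d\n",
                       (gfp & GFP_KERNEL) == GFP_KERNEL);       /* 0 -- right */
                return 0;
        }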

    Signed-off-by: Tejun Heo

    Tejun Heo
     

22 Sep, 2014

1 commit

  • This reverts commit 3189eddbcafc ("percpu: free percpu allocation info for
    uniprocessor system").

    The commit causes a hang with a crisv32 image. This may be an architecture
    problem, but at least for now the revert is necessary to be able to boot a
    crisv32 image.

    Cc: Tejun Heo
    Cc: Honggang Li
    Signed-off-by: Guenter Roeck
    Signed-off-by: Tejun Heo
    Fixes: 3189eddbcafc ("percpu: free percpu allocation info for uniprocessor system")
    Cc: stable@vger.kernel.org # Please don't apply 3189eddbcafc

    Guenter Roeck
     


03 Sep, 2014

10 commits

  • The percpu allocator now supports atomic allocations by only
    allocating from already populated areas but the mechanism to ensure
    that there's adequate amount of populated areas was missing.

    This patch expands pcpu_balance_work so that in addition to freeing
    excess free chunks it also populates chunks to maintain an adequate
    level of populated areas. pcpu_alloc() schedules pcpu_balance_work if
    the amount of free populated areas is too low or after an atomic
    allocation failure.

    * PERCPU_DYNAMIC_RESERVE is increased by two pages to account for
    PCPU_EMPTY_POP_PAGES_LOW.

    * pcpu_async_enabled is added to gate both async jobs -
    chunk->map_extend_work and pcpu_balance_work - so that we don't end
    up scheduling them while the needed subsystems aren't up yet.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • pcpu_reclaim_work will also be used to populate chunks asynchronously.
    Rename it to pcpu_balance_work in preparation. pcpu_reclaim() is
    renamed to pcpu_balance_workfn() and some of its local variables are
    renamed too.

    This is a pure rename.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • pcpu_nr_empty_pop_pages counts the number of empty populated pages
    across all chunks and chunk->nr_populated counts the number of
    populated pages in a chunk. Both will be used to implement pre/async
    population for atomic allocations.

    pcpu_chunk_[de]populated() are added to update chunk->populated,
    chunk->nr_populated and pcpu_nr_empty_pop_pages together. All
    successful chunk [de]populations should be followed by the
    corresponding pcpu_chunk_[de]populated() calls.
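
    A sketch of what the populated-side helper keeps consistent (the field
    and counter names come from the text above; the body is illustrative,
    not the exact mm/percpu.c function):

        static void pcpu_chunk_populated(struct pcpu_chunk *chunk,
                                         int page_start, int page_end)
        {
                int nr = page_end - page_start;

                lockdep_assert_held(&pcpu_lock);

                bitmap_set(chunk->populated, page_start, nr);   /* chunk->populated        */
                chunk->nr_populated += nr;                      /* chunk->nr_populated     */
                pcpu_nr_empty_pop_pages += nr;                  /* global empty-page count */
        }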

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • An allocation attempt may require extending chunk->map array which
    requires GFP_KERNEL context which isn't available for atomic
    allocations. This patch ensures that chunk->map array usually keeps
    some amount of available space by directly allocating buffer space
    during GFP_KERNEL allocations and scheduling async extension during
    atomic ones. This should make atomic allocation failures from map
    space exhaustion rare.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Now that pcpu_alloc_area() can allocate only from populated areas,
    it's easy to add atomic allocation support to [__]alloc_percpu().
    Update pcpu_alloc() so that it accepts @gfp and skips all the blocking
    operations and allocates only from the populated areas if @gfp doesn't
    contain GFP_KERNEL. New interface functions [__]alloc_percpu_gfp()
    are added.

    While this means that atomic allocations are possible, this isn't
    complete yet as there's no mechanism to ensure that certain amount of
    populated areas is kept available and atomic allocations may keep
    failing under certain conditions.
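
    A usage sketch of the new interface (the struct and function are
    examples, not from the patch): passing a gfp that does not contain
    GFP_KERNEL selects the atomic path, which only carves out of
    already-populated areas and may fail instead of blocking.

        #include <linux/percpu.h>
        #include <linux/gfp.h>
        #include <linux/errno.h>
        #include <linux/types.h>

        struct foo_stats {
                u64 hits;
                u64 misses;
        };

        static struct foo_stats __percpu *foo_stats;

        static int foo_stats_init_atomic(void)
        {
                /* GFP_NOWAIT lacks GFP_KERNEL -> no blocking, allocation may fail */
                foo_stats = alloc_percpu_gfp(struct foo_stats, GFP_NOWAIT);
                return foo_stats ? 0 : -ENOMEM;
        }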

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • The next patch will conditionalize the population block in
    pcpu_alloc() which will end up making a rather large indentation
    change obfuscating the actual logic change. This patch puts the block
    under "if (true)" so that the next patch can avoid indentation
    changes. The definitions of the local variables which are used only in
    the block are moved into the block.

    This patch is purely cosmetic.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Update pcpu_alloc_area() so that it can skip unpopulated areas if the
    new parameter @pop_only is true. This is implemented by a new
    function, pcpu_fit_in_area(), which determines the amount of head
    padding considering the alignment and populated state.

    @pop_only is currently always false but this will be used to implement
    atomic allocation.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • At first, the percpu allocator required a sleepable context for both
    alloc and free paths and used pcpu_alloc_mutex to protect everything.
    Later, pcpu_lock was introduced to protect the index data structure so
    that the free path can be invoked from atomic contexts. The
    conversion only updated what's necessary and left most of the
    allocation path under pcpu_alloc_mutex.

    The percpu allocator is planned to add support for atomic allocation
    and this patch restructures locking so that the coverage of
    pcpu_alloc_mutex is further reduced.

    * pcpu_alloc() now grabs pcpu_alloc_mutex only while creating a new
    chunk and populating the allocated area. Everything else is now
    protected solely by pcpu_lock.

    After this change, multiple instances of pcpu_extend_area_map() may
    race but the function already implements sufficient synchronization
    using pcpu_lock.

    This also allows multiple allocators to arrive at new chunk
    creation. To avoid creating multiple empty chunks back-to-back, a
    new chunk is created iff there is no other empty chunk after
    grabbing pcpu_alloc_mutex.

    * pcpu_lock is now held while modifying chunk->populated bitmap.
    After this, all data structures are protected by pcpu_lock.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Previously, pcpu_[de]populate_chunk() were called with the range which
    may contain multiple target regions in it and
    pcpu_[de]populate_chunk() iterated over the regions. This has the
    benefit of batching up cache flushes for all the regions; however,
    we're planning to add more bookkeeping logic around [de]population to
    support atomic allocations and this delegation of iterations gets in
    the way.

    This patch moves the region iterations out of
    pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
    pcpu_reclaim() - so that we can later add logic to track more states
    around them. This change may make cache and tlb flushes more frequent
    but multi-region [de]populations are rare anyway and if this actually
    becomes a problem, it's not difficult to factor out cache flushes as
    separate callbacks which are directly invoked from percpu.c.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • percpu-vm and percpu-km implement separate versions of
    pcpu_[de]populate_chunk() and some part which is or should be common
    are currently in the specific implementations. Make the following
    changes.

    * Allocate area clearing is moved from the pcpu_populate_chunk()
    implementations to pcpu_alloc(). This makes percpu-km's version
    noop.

    * Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
    to their respective callers so that they are applied to percpu-km
    too. This doesn't make any meaningful difference as both functions
    are noop for percpu-km; however, this is more consistent and will
    help implementing atomic allocation support.

    Signed-off-by: Tejun Heo

    Tejun Heo
     


15 Apr, 2014

1 commit

  • pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
    BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)

    It could hardly ever be bigger than PAGE_SIZE even on a large-scale
    machine, but for consistency with its counterpart pcpu_mem_zalloc(),
    use pcpu_mem_free() instead.

    Commit b4916cb17c26 ("percpu: make pcpu_free_chunk() use
    pcpu_mem_free() instead of kfree()") addressed this problem, but
    missed this one.

    tj: commit message updated

    Signed-off-by: Jianyu Zhan
    Signed-off-by: Tejun Heo
    Fixes: 099a19d91ca4 ("percpu: allow limited allocation before slab is online")
    Cc: stable@vger.kernel.org

    Jianyu Zhan
     
