10 May, 2019

2 commits

  • Release percpu memory after finishing the switch to the atomic mode
    only if PERCPU_REF_ALLOW_REINIT isn't set.

    Signed-off-by: Roman Gushchin
    Acked-by: Tejun Heo
    Signed-off-by: Dennis Zhou

    Roman Gushchin
     
  • In most cases percpu reference counters are not switched to the
    percpu mode after they reach the atomic mode. Some obvious exceptions
    are reference counters which are initialized into the atomic
    mode (using PERCPU_REF_INIT_ATOMIC and PERCPU_REF_INIT_DEAD flags),
    and there are a few other exceptions.

    But in most cases there is no way back, and once the reference counter
    is switched to the atomic mode, there is no reason to wait for
    percpu_ref_exit() to release the percpu memory. Of course, the size
    of a single counter is not so big, but because it can pin the whole
    percpu block in memory, the memory footprint can be noticeable
    (e.g. on my 32-CPU machine a percpu block is 8 MB).

    To release the percpu memory as early as possible, let's
    introduce the PERCPU_REF_ALLOW_REINIT flag with the following semantics:
    it has to be set in order to switch a percpu reference counter to the
    percpu mode after the initialization. PERCPU_REF_INIT_ATOMIC and
    PERCPU_REF_INIT_DEAD flags will implicitly assume PERCPU_REF_ALLOW_REINIT.

    This patch doesn't introduce any functional change to avoid any
    regressions. It will be done later in the patchset after adjusting
    all call sites, which are reviving percpu counters.

    Signed-off-by: Roman Gushchin
    Acked-by: Tejun Heo
    Signed-off-by: Dennis Zhou

    Roman Gushchin
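
The flag semantics described in the commit above can be modelled in userspace. This is only a sketch: the struct, the helper names, and the numeric flag values are assumptions made for illustration; only the flag names come from the commit message.

```c
#include <stdbool.h>

/* Flag names are from the commit message; the bit values are assumed. */
enum {
	PERCPU_REF_INIT_ATOMIC  = 1 << 0,
	PERCPU_REF_INIT_DEAD    = 1 << 1,
	PERCPU_REF_ALLOW_REINIT = 1 << 2,
};

/* Hypothetical, heavily simplified stand-in for struct percpu_ref. */
struct ref_model {
	bool allow_reinit;
};

static void ref_model_init(struct ref_model *ref, unsigned int flags)
{
	ref->allow_reinit = flags & PERCPU_REF_ALLOW_REINIT;

	/* INIT_ATOMIC and INIT_DEAD implicitly assume ALLOW_REINIT. */
	if (flags & (PERCPU_REF_INIT_ATOMIC | PERCPU_REF_INIT_DEAD))
		ref->allow_reinit = true;
}

/*
 * The percpu memory may be released right after the switch to atomic
 * mode only when the counter can never go back to percpu mode.
 */
static bool can_release_percpu_early(const struct ref_model *ref)
{
	return !ref->allow_reinit;
}
```

In this model a counter initialized with no flags is eligible for early release, while one initialized with PERCPU_REF_ALLOW_REINIT, PERCPU_REF_INIT_ATOMIC, or PERCPU_REF_INIT_DEAD keeps its percpu memory until exit.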
     

27 Sep, 2018

1 commit

  • This function will be used in a later patch to switch the struct
    request_queue q_usage_counter from killed back to live. In contrast
    to percpu_ref_reinit(), this new function does not require that the
    refcount is zero.

    Signed-off-by: Bart Van Assche
    Acked-by: Tejun Heo
    Reviewed-by: Ming Lei
    Cc: Christoph Hellwig
    Cc: Jianchao Wang
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

20 Mar, 2018

1 commit

  • percpu_ref internally uses sched-RCU to implement the percpu -> atomic
    mode switching and the documentation suggested that this could be
    depended upon. This doesn't seem like a good idea.

    * percpu_ref uses sched-RCU which has different grace periods than
    regular RCU. Users may combine percpu_ref with regular RCU usage and
    incorrectly believe that regular RCU grace periods are performed by
    percpu_ref. This can lead to, for example, use-after-free due to
    premature freeing.

    * percpu_ref has a grace period when switching from percpu to atomic
    mode. It doesn't have one between the last put and release. This
    distinction is subtle and can lead to surprising bugs.

    * percpu_ref allows starting in and switching to atomic mode manually
    for debugging and other purposes. This means that there may not be
    any grace periods from kill to release.

    This patch makes it clear that the grace periods are percpu_ref's
    internal implementation detail and can't be depended upon by the
    users.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Linus Torvalds
    Signed-off-by: Tejun Heo

    Tejun Heo
     

05 Dec, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

23 Mar, 2017

1 commit

  • percpu_ref_switch_to_atomic_sync() schedules the switch to atomic mode, then
    waits for it to complete.

    Also export percpu_ref_switch_to_* so they can be used from modules.

    This will be used in md/raid to count the number of pending write
    requests to an array.
    We occasionally need to check if the count is zero, but most often
    we don't care.
    We always want updates to the counter to be fast, as in some cases
    we count every 4K page.

    Signed-off-by: NeilBrown
    Acked-by: Tejun Heo
    Signed-off-by: Shaohua Li

    NeilBrown
     

28 Jan, 2017

1 commit

  • percpu_ref_tryget() and percpu_ref_tryget_live() should return
    "true" IFF they acquire a reference. But the return value from
    atomic_long_inc_not_zero() is a long and may have high bits set,
    e.g. PERCPU_COUNT_BIAS, and the return value of the tryget routines
    is bool so the reference may actually be acquired but the routines
    return "false" which results in a reference leak since the caller
    assumes it does not need to do a corresponding percpu_ref_put().

    This was seen when performing CPU hotplug during I/O, as hangs in
    blk_mq_freeze_queue_wait where percpu_ref_kill (blk_mq_freeze_queue_start)
    raced with percpu_ref_tryget (blk_mq_timeout_work).
    Sample stack trace:

    __switch_to+0x2c0/0x450
    __schedule+0x2f8/0x970
    schedule+0x48/0xc0
    blk_mq_freeze_queue_wait+0x94/0x120
    blk_mq_queue_reinit_work+0xb8/0x180
    blk_mq_queue_reinit_prepare+0x84/0xa0
    cpuhp_invoke_callback+0x17c/0x600
    cpuhp_up_callbacks+0x58/0x150
    _cpu_up+0xf0/0x1c0
    do_cpu_up+0x120/0x150
    cpu_subsys_online+0x64/0xe0
    device_online+0xb4/0x120
    online_store+0xb4/0xc0
    dev_attr_store+0x68/0xa0
    sysfs_kf_write+0x80/0xb0
    kernfs_fop_write+0x17c/0x250
    __vfs_write+0x6c/0x1e0
    vfs_write+0xd0/0x270
    SyS_write+0x6c/0x110
    system_call+0x38/0xe0

    Examination of the queue showed a single reference (no PERCPU_COUNT_BIAS,
    and __PERCPU_REF_DEAD, __PERCPU_REF_ATOMIC set) and no requests.
    However, conditions at the time of the race are count of PERCPU_COUNT_BIAS + 0
    and __PERCPU_REF_DEAD and __PERCPU_REF_ATOMIC set.

    The fix is to make the tryget routines use an actual boolean internally instead
    of the atomic long result truncated to an int.

    Fixes: e625305b3907 percpu-refcount: make percpu_ref based on longs instead of ints
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=190751
    Signed-off-by: Douglas Miller
    Reviewed-by: Jens Axboe
    Signed-off-by: Tejun Heo
    Fixes: e625305b3907 ("percpu-refcount: make percpu_ref based on longs instead of ints")
    Cc: stable@vger.kernel.org # v3.18+

    Douglas Miller
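
The truncation described in the commit above can be reproduced in userspace. This is a sketch under assumptions: `raw` stands in for the value returned by the underlying increment primitive, the bias is modelled as a 64-bit value with only the top bit set, and the function names are hypothetical. Narrowing an out-of-range value to `int` is implementation-defined in C; GCC and Clang keep the low 32 bits, which is the behaviour the bug report implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of PERCPU_COUNT_BIAS on a 64-bit machine: only the top bit set. */
#define COUNT_BIAS_64 (UINT64_C(1) << 63)

/* Buggy pattern: the nonzero 64-bit result is narrowed to int first. */
static bool tryget_result_buggy(uint64_t raw)
{
	int ret = (int)raw;	/* the low 32 bits of the bias are all zero */

	return ret;
}

/* Fixed pattern: the result is converted straight to a boolean. */
static bool tryget_result_fixed(uint64_t raw)
{
	bool ret = raw;		/* any nonzero value becomes true */

	return ret;
}
```

With `COUNT_BIAS_64` as input the buggy variant reports failure even though the reference was taken, which is the leak the commit describes; the fixed variant reports success.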
     

03 Jun, 2016

1 commit

  • lockless_dereference() is planned to grow a sanity check to ensure
    that the input parameter is a pointer. __ref_is_percpu() passes in an
    unsigned long value which is a combination of a pointer and a flag.
    While it can be cast to a pointer lvalue, the casting looks messy
    and it's a special case anyway. Let's revert back to open-coding
    READ_ONCE() and explicit barrier.

    This doesn't cause any functional changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Pranith Kumar
    Cc: Thomas Gleixner
    Cc: kernel-team@fb.com
    Link: http://lkml.kernel.org/g/20160522185040.GA23664@p183.telecom.by
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

16 Nov, 2015

1 commit


06 Jan, 2015

2 commits

  • Implement percpu_ref_is_dying() which tests whether the ref is dying
    or dead. This is useful to determine the current state when a
    percpu_ref is used as a cyclic on/off switch via kill and reinit.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet

    Tejun Heo
     
  • __ref_is_percpu() needs the implied ACCESS_ONCE() in
    lockless_dereference() on @ref->percpu_count_ptr because the value is
    tested for !__PERCPU_REF_ATOMIC, which may be set asynchronously, and
    then used as a pointer. If the compiler generates a separate fetch
    when using it as a pointer, __PERCPU_REF_ATOMIC may be set in between
    contaminating the pointer value.

    percpu_ref_tryget_live() also uses ACCESS_ONCE() to test
    __PERCPU_REF_DEAD; however, there's no reason for this. I just copied
    ACCESS_ONCE() usage blindly from __ref_is_percpu(). All it does is
    confuse people trying to understand what's going on.

    This patch removes the unnecessary ACCESS_ONCE() usage from
    percpu_ref_tryget_live() and adds a comment explaining why
    __ref_is_percpu() needs it.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet

    Tejun Heo
     

12 Dec, 2014

1 commit

  • Pull percpu updates from Tejun Heo:
    "Nothing interesting. A patch to convert the remaining __get_cpu_var()
    users, another to fix non-critical off-by-one in an assertion and a
    cosmetic conversion to lockless_dereference() in percpu-ref.

    The back-merge from mainline is to receive lockless_dereference()"

    * 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: Replace smp_read_barrier_depends() with lockless_dereference()
    percpu: Convert remaining __get_cpu_var uses in 3.18-rcX
    percpu: off by one in BUG_ON()

    Linus Torvalds
     

11 Dec, 2014

1 commit

  • Charges currently pin the css indirectly by playing tricks during
    css_offline(): user pages stall the offlining process until all of them
    have been reparented, whereas kmemcg acquires a keep-alive reference if
    outstanding kernel pages are detected at that point.

    In preparation for removing all this complexity, make the pinning explicit
    and acquire a css reference for every charged page.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: David Rientjes
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

24 Nov, 2014

1 commit

  • While decoupling ATOMIC and DEAD flags, f47ad4578461 ("percpu_ref:
    decouple switching to percpu mode and reinit") updated
    __ref_is_percpu() so that it only tests ATOMIC flag to determine
    whether the ref is in percpu mode or not; however, while DEAD implies
    ATOMIC, the two flags are set separately during percpu_ref_kill() and
    if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o
    ATOMIC. Because __ref_is_percpu() returns @ref->percpu_count_ptr
    value verbatim as the percpu pointer after testing ATOMIC, the pointer
    may now be contaminated with the DEAD flag.

    This can be fixed by clearing the flag bits before returning the
    pointer which was the fix proposed by Shaohua; however, as DEAD
    implies ATOMIC, we can just test for both flags at once and avoid the
    explicit masking.

    Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD
    are clear before returning @ref->percpu_count_ptr as the percpu
    pointer.

    Signed-off-by: Tejun Heo
    Reported-and-Reviewed-by: Shaohua Li
    Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org
    Fixes: f47ad4578461 ("percpu_ref: decouple switching to percpu mode and reinit")

    Tejun Heo
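
The check described in the commit above can be sketched in userspace as follows. The flag bit values, the macro names without the leading underscores, and the function name are assumptions for illustration; a `volatile` read stands in for the kernel's single-fetch primitive so that the tested value and the returned pointer come from the same load.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow the commit message; the bit values are assumed. */
#define REF_ATOMIC	((uintptr_t)1 << 0)
#define REF_DEAD	((uintptr_t)1 << 1)
#define REF_ATOMIC_DEAD	(REF_ATOMIC | REF_DEAD)

/*
 * Return true and store the counter pointer only when both flag bits are
 * clear, so the returned pointer can never carry a stray DEAD bit.
 */
static bool ref_is_percpu_model(const volatile uintptr_t *count_ptr,
				unsigned long **countp)
{
	uintptr_t v = *count_ptr;	/* one read: test and use the same value */

	if (v & REF_ATOMIC_DEAD)
		return false;

	*countp = (unsigned long *)v;
	return true;
}
```

Testing both bits with one mask covers the window in which a racing kill has set DEAD but not yet ATOMIC, without needing to mask the flags out of the pointer.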
     

22 Nov, 2014

1 commit


10 Oct, 2014

1 commit

  • Pull percpu updates from Tejun Heo:
    "A lot of activities on percpu front. Notable changes are...

    - percpu allocator now can take @gfp. If @gfp doesn't contain
    GFP_KERNEL, it tries to allocate from what's already available to
    the allocator and a work item tries to keep the reserve around
    certain level so that these atomic allocations usually succeed.

    This will replace the ad-hoc percpu memory pool used by
    blk-throttle and also be used by the planned blkcg support for
    writeback IOs.

    Please note that I noticed a bug in how @gfp is interpreted while
    preparing this pull request and applied the fix 6ae833c7fe0c
    ("percpu: fix how @gfp is interpreted by the percpu allocator")
    just now.

    - percpu_ref now uses longs for percpu and global counters instead of
    ints. It leads to more sparse packing of the percpu counters on
    64bit machines but the overhead should be negligible and this
    allows using percpu_ref for refcnting pages and in-memory objects
    directly.

    - The switching between percpu and single counter modes of a
    percpu_ref is made independent of putting the base ref and a
    percpu_ref can now optionally be initialized in single or killed
    mode. This allows avoiding percpu shutdown latency for cases where
    the refcounted objects may be synchronously created and destroyed
    in rapid succession with only a fraction of them reaching fully
    operational status (SCSI probing does this when combined with
    blk-mq support). It's also planned to be used to implement forced
    single mode to detect underflow more timely for debugging.

    There's a separate branch percpu/for-3.18-consistent-ops which cleans
    up the duplicate percpu accessors. That branch causes a number of
    conflicts with s390 and other trees. I'll send a separate pull
    request w/ resolutions once other branches are merged"

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
    percpu: fix how @gfp is interpreted by the percpu allocator
    blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
    percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
    percpu_ref: add PERCPU_REF_INIT_* flags
    percpu_ref: decouple switching to percpu mode and reinit
    percpu_ref: decouple switching to atomic mode and killing
    percpu_ref: add PCPU_REF_DEAD
    percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
    percpu_ref: replace pcpu_ prefix with percpu_
    percpu_ref: minor code and comment updates
    percpu_ref: relocate percpu_ref_reinit()
    Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
    Revert "percpu: free percpu allocation info for uniprocessor system"
    percpu-refcount: make percpu_ref based on longs instead of ints
    percpu-refcount: improve WARN messages
    percpu: fix locking regression in the failure path of pcpu_alloc()
    percpu-refcount: add @gfp to percpu_ref_init()
    proportions: add @gfp to init functions
    percpu_counter: add @gfp to percpu_counter_init()
    percpu_counter: make percpu_counters_lock irq-safe
    ...

    Linus Torvalds
     

08 Oct, 2014

1 commit

  • Pull "trivial tree" updates from Jiri Kosina:
    "Usual pile from trivial tree everyone is so eagerly waiting for"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Remove MN10300_PROC_MN2WS0038
    mei: fix comments
    treewide: Fix typos in Kconfig
    kprobes: update jprobe_example.c for do_fork() change
    Documentation: change "&" to "and" in Documentation/applying-patches.txt
    Documentation: remove obsolete pcmcia-cs from Changes
    Documentation: update links in Changes
    Documentation: Docbook: Fix generated DocBook/kernel-api.xml
    score: Remove GENERIC_HAS_IOMAP
    gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
    tty: doc: Fix grammar in serial/tty
    dma-debug: modify check_for_stack output
    treewide: fix errors in printk
    genirq: fix reference in devm_request_threaded_irq comment
    treewide: fix synchronize_rcu() in comments
    checkstack.pl: port to AArch64
    doc: queue-sysfs: minor fixes
    init/do_mounts: better syntax description
    MIPS: fix comment spelling
    powerpc/simpleboot: fix comment
    ...

    Linus Torvalds
     

25 Sep, 2014

11 commits

  • Currently, a percpu_ref which is initialized with
    PERCPU_REF_INIT_ATOMIC or switched to atomic mode via
    switch_to_atomic() automatically reverts to percpu mode on the first
    percpu_ref_reinit(). This makes the atomic mode difficult to use for
    cases where a percpu_ref is used as a persistent on/off switch which
    may be cycled multiple times.

    This patch makes such atomic state sticky so that it survives through
    kill/reinit cycles. After this patch, atomic state is cleared only by
    an explicit percpu_ref_switch_to_percpu() call.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Johannes Weiner

    Tejun Heo
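
The sticky behaviour described in the commit above can be modelled as a small state machine. This is a single-threaded sketch: the struct, its field names, and the `model_` helpers are assumptions for illustration, and reference counting itself is left out.

```c
#include <stdbool.h>

/* Hypothetical model of the mode-related state of a percpu_ref. */
struct switch_model {
	bool force_atomic;	/* sticky request to stay in atomic mode */
	bool percpu_mode;	/* current operating mode */
	bool dead;		/* killed and not yet reinitialized */
};

static void model_switch_to_atomic(struct switch_model *s)
{
	s->force_atomic = true;
	s->percpu_mode = false;
}

static void model_switch_to_percpu(struct switch_model *s)
{
	s->force_atomic = false;
	if (!s->dead)		/* a dead ref must go through reinit first */
		s->percpu_mode = true;
}

static void model_kill(struct switch_model *s)
{
	s->dead = true;
	s->percpu_mode = false;	/* DEAD implies ATOMIC */
}

static void model_reinit(struct switch_model *s)
{
	s->dead = false;
	if (!s->force_atomic)	/* the atomic state survives kill/reinit */
		s->percpu_mode = true;
}
```

A ref that was switched to atomic mode stays atomic across any number of kill/reinit cycles in this model and returns to percpu mode only after an explicit `model_switch_to_percpu()`.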
     
  • With the recent addition of percpu_ref_reinit(), percpu_ref now can be
    used as a persistent switch which can be turned on and off repeatedly
    where turning off maps to killing the ref and waiting for it to drain;
    however, there currently isn't a way to initialize a percpu_ref in its
    off (killed and drained) state, which can be inconvenient for certain
    persistent switch use cases.

    Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
    selection of operation mode; however, currently a newly initialized
    percpu_ref is always in percpu mode making it impossible to avoid the
    latency overhead of switching to atomic mode.

    This patch adds @flags to percpu_ref_init() and implements the
    following flags.

    * PERCPU_REF_INIT_ATOMIC : start ref in atomic mode
    * PERCPU_REF_INIT_DEAD : start ref killed and drained

    These flags should be able to serve the above two use cases.

    v2: target_core_tpg.c conversion was missing. Fixed.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Johannes Weiner

    Tejun Heo
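
How the two init flags could translate into an initial state is sketched below. This is a userspace model under assumptions: the flag bit values, the struct, and `model_init()` are illustrative, and the bias is modelled as the top bit of an unsigned long, which keeps the shared counter away from zero while the ref operates in percpu mode.

```c
#include <limits.h>
#include <stdbool.h>

/* Flag names are from the commit message; the bit values are assumed. */
enum {
	PERCPU_REF_INIT_ATOMIC = 1 << 0,
	PERCPU_REF_INIT_DEAD   = 1 << 1,
};

#define COUNT_BIAS (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))

struct init_state {
	bool atomic;		/* starts in atomic mode */
	bool dead;		/* starts killed and drained */
	unsigned long count;	/* initial value of the shared counter */
};

static struct init_state model_init(unsigned int flags)
{
	struct init_state st = { false, false, 0 };

	if (flags & (PERCPU_REF_INIT_ATOMIC | PERCPU_REF_INIT_DEAD))
		st.atomic = true;	/* DEAD implies ATOMIC */
	else
		st.count += COUNT_BIAS;	/* percpu mode: bias the shared count */

	if (flags & PERCPU_REF_INIT_DEAD)
		st.dead = true;		/* drained: no base reference */
	else
		st.count++;		/* the base reference */

	return st;
}
```

In this model a ref started with PERCPU_REF_INIT_ATOMIC never pays for a later percpu-to-atomic switch, and one started with PERCPU_REF_INIT_DEAD begins in the "off" position of a persistent switch.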
     
  • percpu_ref has treated the dropping of the base reference and
    switching to atomic mode as an integral operation; however, there's
    nothing inherent tying the two together.

    The use cases for percpu_ref have been expanding continuously. While
    the current init/kill/reinit/exit model can cover a lot, the coupling
    of kill/reinit with atomic/percpu mode switching is turning out to be
    too restrictive for use cases where many percpu_refs are created and
    destroyed back-to-back with only some of them reaching extended
    operation. The coupling also makes implementing always-atomic debug
    mode difficult.

    This patch separates out percpu mode switching into
    percpu_ref_switch_to_percpu() and reimplements percpu_ref_reinit() on
    top of it.

    * DEAD still requires ATOMIC. A dead ref can't be switched to percpu
    mode w/o going through reinit.

    v2: __percpu_ref_switch_to_percpu() was missing static. Fixed.
    Reported by Fengguang aka kbuild test robot.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Johannes Weiner
    Cc: kbuild test robot

    Tejun Heo
     
  • percpu_ref has treated the dropping of the base reference and
    switching to atomic mode as an integral operation; however, there's
    nothing inherent tying the two together.

    The use cases for percpu_ref have been expanding continuously. While
    the current init/kill/reinit/exit model can cover a lot, the coupling
    of kill/reinit with atomic/percpu mode switching is turning out to be
    too restrictive for use cases where many percpu_refs are created and
    destroyed back-to-back with only some of them reaching extended
    operation. The coupling also makes implementing always-atomic debug
    mode difficult.

    This patch separates out atomic mode switching into
    percpu_ref_switch_to_atomic() and reimplements
    percpu_ref_kill_and_confirm() on top of it.

    * The handling of __PERCPU_REF_ATOMIC and __PERCPU_REF_DEAD is now
    differentiated. Among get/put operations, percpu_ref_tryget_live()
    is the only one which cares about DEAD.

    * percpu_ref_switch_to_atomic() can be called multiple times on the
    same ref. This means that multiple @confirm_switch may get queued
    up which we can't do reliably without extra memory area. This is
    handled by making the later invocation synchronously wait for the
    completion of the previous one. This isn't particularly desirable
    but such synchronous waits shouldn't happen in most cases.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Johannes Weiner

    Tejun Heo
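
The split between switching to atomic mode and killing can be illustrated with a single-threaded model. Everything here is an assumption made for illustration (the struct, the `model_` names, a fixed CPU count), and the grace period that the real switch needs is omitted entirely. Per-CPU deltas are unsigned and may wrap, which is fine because only their sum matters.

```c
#include <limits.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4
#define COUNT_BIAS (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))

struct ref_model {
	unsigned long percpu[MODEL_NR_CPUS];	/* per-CPU deltas */
	unsigned long count;			/* shared counter */
	bool atomic;
};

static void model_init(struct ref_model *ref)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		ref->percpu[cpu] = 0;
	ref->count = COUNT_BIAS + 1;	/* bias plus the base reference */
	ref->atomic = false;
}

static void model_get(struct ref_model *ref, int cpu)
{
	if (ref->atomic)
		ref->count++;
	else
		ref->percpu[cpu]++;
}

static void model_put(struct ref_model *ref, int cpu)
{
	if (ref->atomic)
		ref->count--;
	else
		ref->percpu[cpu]--;
}

/* Fold the per-CPU deltas into the shared counter and drop the bias. */
static void model_switch_to_atomic(struct ref_model *ref)
{
	unsigned long sum = 0;

	if (ref->atomic)
		return;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		sum += ref->percpu[cpu];
		ref->percpu[cpu] = 0;
	}
	ref->atomic = true;
	ref->count += sum - COUNT_BIAS;
}

/* Killing is built on top of the switch: switch, then drop the base ref. */
static void model_kill(struct ref_model *ref)
{
	model_switch_to_atomic(ref);
	model_put(ref, 0);
}
```

Because the switch is its own operation in this model, it can be called without killing the ref, and calling it again on an already atomic ref is a no-op.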
     
  • percpu_ref will be restructured so that percpu/atomic mode switching
    and reference killing are decoupled. In preparation, add
    PCPU_REF_DEAD and PCPU_REF_ATOMIC_DEAD which is OR of ATOMIC and DEAD.
    For now, ATOMIC and DEAD are changed together and all PCPU_REF_ATOMIC
    uses are converted to PCPU_REF_ATOMIC_DEAD without causing any
    behavior changes.

    percpu_ref_init() now specifies an explicit alignment when allocating
    the percpu counters so that the pointer has enough unused low bits to
    accommodate the flags. Note that one flag was fine as min alignment
    for percpu memory is 2 bytes but two flags are already too many for
    the natural alignment of unsigned longs on archs like cris and m68k.

    v2: The original patch had BUILD_BUG_ON() which triggers if unsigned
    long's alignment isn't enough to accommodate the flags, which
    triggered on cris and m68k. percpu_ref_init() updated to specify
    the required alignment explicitly. Reported by Fengguang.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet
    Cc: kbuild test robot

    Tejun Heo
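
The alignment requirement from the commit above can be shown with a userspace analogue. This is a sketch, not the kernel allocation path: C11 `aligned_alloc()` stands in for the percpu allocator, and `FLAG_BITS`, `alloc_counter()`, and `low_bits_free()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

#define FLAG_BITS 2	/* room for the ATOMIC and DEAD flags */
#define FLAG_MASK (((uintptr_t)1 << FLAG_BITS) - 1)

/* Allocate one counter whose address leaves the low FLAG_BITS bits clear. */
static unsigned long *alloc_counter(void)
{
	size_t align = (size_t)1 << FLAG_BITS;
	size_t size;

	/* Never ask for less than the natural alignment of the type. */
	if (align < _Alignof(unsigned long))
		align = _Alignof(unsigned long);

	/* aligned_alloc() expects the size to be a multiple of the alignment. */
	size = (sizeof(unsigned long) + align - 1) / align * align;
	return aligned_alloc(align, size);
}

/* True when the flag bits can be stored in the pointer value. */
static int low_bits_free(const void *p)
{
	return ((uintptr_t)p & FLAG_MASK) == 0;
}
```

Requesting the alignment explicitly, rather than asserting that the natural alignment is large enough, is what lets the same code work on architectures where unsigned long is only 2-byte aligned.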
     
  • percpu_ref will be restructured so that percpu/atomic mode switching
    and reference killing are decoupled. In preparation, do the following
    renames.

    * percpu_ref->confirm_kill -> percpu_ref->confirm_switch
    * __PERCPU_REF_DEAD -> __PERCPU_REF_ATOMIC
    * __percpu_ref_alive() -> __ref_is_percpu()

    This patch is pure rename and doesn't introduce any functional
    changes.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet

    Tejun Heo
     
  • percpu_ref uses pcpu_ prefix for internal stuff and percpu_ for
    externally visible ones. This is the same convention used in the
    percpu allocator implementation. It works fine there but percpu_ref
    doesn't have too much internal-only stuff and scattered usages of
    pcpu_ prefix are more confusing than helpful.

    This patch replaces all pcpu_ prefixes with percpu_. This is pure
    rename and there's no functional change. Note that PCPU_REF_DEAD is
    renamed to __PERCPU_REF_DEAD to signify that the flag is internal.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet

    Tejun Heo
     
  • * Some comments became stale. Updated.
    * percpu_ref_tryget() unnecessarily initializes @ret. Removed.
    * A blank line removed from percpu_ref_kill_rcu().
    * Explicit function name in a WARN format string replaced with __func__.
    * WARN_ON() in percpu_ref_reinit() converted to WARN_ON_ONCE().

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet

    Tejun Heo
     
  • percpu_ref is gonna go through restructuring. Move
    percpu_ref_reinit() after percpu_ref_kill_and_confirm(). This will
    make later changes easier to follow and result in cleaner
    organization.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet

    Tejun Heo
     
  • This reverts commit 0a30288da1aec914e158c2d7a3482a85f632750f, which
    was a temporary fix for SCSI blk-mq stall issue. The following
    patches will fix the issue properly by introducing atomic mode to
    percpu_ref.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Jens Axboe
    Cc: Christoph Hellwig

    Tejun Heo
     
  • …linux-block into for-3.18

    This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a
    kludge for SCSI blk-mq stall during probe") which implements
    __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The
    commit will be reverted and patches implementing a proper fix will be added.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Kent Overstreet <kmo@daterainc.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Christoph Hellwig <hch@lst.de>

    Tejun Heo
     

24 Sep, 2014

1 commit

  • blk-mq uses percpu_ref for its usage counter which tracks the number
    of in-flight commands and is used to synchronously drain the queue on
    freeze. percpu_ref shutdown takes measurable wallclock time as it
    involves a sched RCU grace period. This means that draining a blk-mq
    queue takes measurable wallclock time. One would think that this shouldn't
    matter as queue shutdown should be a rare event which takes place
    asynchronously w.r.t. userland.

    Unfortunately, SCSI probing involves synchronously setting up and then
    tearing down a lot of request_queues back-to-back for non-existent
    LUNs. This means that SCSI probing may take more than ten seconds
    when scsi-mq is used.

    This will be properly fixed by implementing a mechanism to keep
    q->mq_usage_counter in atomic mode till genhd registration; however,
    that involves rather big updates to percpu_ref which is difficult to
    apply late in the devel cycle (v3.17-rc6 at the moment). As a
    stop-gap measure till the proper fix can be implemented in the next
    cycle, this patch introduces __percpu_ref_kill_expedited() and makes
    blk_mq_freeze_queue() use it. This is heavy-handed but should work
    for testing the experimental SCSI blk-mq implementation.

    Signed-off-by: Tejun Heo
    Reported-by: Christoph Hellwig
    Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
    Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count")
    Cc: Kent Overstreet
    Cc: Jens Axboe
    Tested-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     

20 Sep, 2014

1 commit

  • percpu_ref is currently based on ints and the number of refs it can
    cover is (1 << 31). This makes it impossible to use a percpu_ref to
    count memory objects or pages on 64bit machines as it may overflow.
    This forces those users to somehow aggregate the references before
    contributing to the percpu_ref which is often cumbersome and sometimes
    challenging to get the same level of performance as using the
    percpu_ref directly.

    While using ints for the percpu counters makes them pack tighter on
    64bit machines, the possible gain from using ints instead of longs is
    extremely small compared to the overall gain from per-cpu operation.
    This patch makes percpu_ref based on longs so that it can be used to
    directly count memory objects or pages.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Johannes Weiner

    Tejun Heo
     

08 Sep, 2014

1 commit

  • Percpu allocator now supports allocation mask. Add @gfp to
    percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
    with percpu_refs too.

    This patch doesn't make any functional difference.

    v2: blk-mq conversion was missing. Updated.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Benjamin LaHaise
    Cc: Li Zefan
    Cc: Nicholas A. Bellinger
    Cc: Jens Axboe

    Tejun Heo
     

28 Aug, 2014

1 commit


28 Jun, 2014

5 commits

  • Now that explicit invocation of percpu_ref_exit() is necessary to free
    the percpu counter, we can implement percpu_ref_reinit() which
    reinitializes a released percpu_ref. This can be used to implement a
    scalable gating switch which can be drained and then re-opened without
    worrying about memory allocation failures.

    percpu_ref_is_zero() is added to be used in a sanity check in
    percpu_ref_exit(). As this function will be useful for other purposes
    too, make it a public interface.

    v2: Use smp_read_barrier_depends() instead of smp_load_acquire(). We
    only need data dep barrier and smp_load_acquire() is stronger and
    heavier on some archs. Spotted by Lai Jiangshan.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Christoph Lameter
    Cc: Lai Jiangshan

    Tejun Heo
     
  • Currently, a percpu_ref undoes percpu_ref_init() automatically by
    freeing the allocated percpu area when the percpu_ref is killed.
    While seemingly convenient, this has the following niggles.

    * It's impossible to re-init a released reference counter without
    going through re-allocation.

    * In a similar vein, it's impossible to initialize a percpu_ref
    count with static percpu variables.

    * We need and have an explicit destructor anyway for failure paths -
    percpu_ref_cancel_init().

    This patch removes the automatic percpu counter freeing in
    percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
    generic destructor now named percpu_ref_exit(). percpu_ref_destroy()
    is considered but it gets confusing with percpu_ref_kill() while
    "exit" clearly indicates that it's the counterpart of
    percpu_ref_init().

    All percpu_ref_cancel_init() users are updated to invoke
    percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
    added to the destruction path of all percpu_ref users.

    Signed-off-by: Tejun Heo
    Acked-by: Benjamin LaHaise
    Cc: Kent Overstreet
    Cc: Christoph Lameter
    Cc: Benjamin LaHaise
    Cc: Nicholas A. Bellinger
    Cc: Li Zefan

    Tejun Heo
     
  • percpu_ref->pcpu_count is a percpu pointer with a status flag in its
    lowest bit. As such, it always goes through arithmetic operations
    which is very cumbersome to do on a pointer. It first has to be
    cast to unsigned long and then back.

    Let's just make the field unsigned long so that we can skip the first
    casts. While at it, rename it to pcpu_counter_ptr to clarify that
    it's a pointer value.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Christoph Lameter

    Tejun Heo
     
  • * All four percpu_ref_*() operations implemented in the header file
    perform the same operation to determine whether the percpu_ref is
    alive and extract the percpu pointer. Factor out the common logic
    into __pcpu_ref_alive(). This doesn't change the generated code.

    * There are a couple of places in percpu-refcount.c which mask out
    PCPU_REF_DEAD to obtain the percpu pointer. Factor it out into
    pcpu_count_ptr().

    * The above changes make the WARN_ON_ONCE() conditional at the top of
    percpu_ref_kill_and_confirm() the only user of REF_STATUS(). Test
    PCPU_REF_DEAD directly and remove REF_STATUS().

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Christoph Lameter

    Tejun Heo
     
  • percpu-refcount currently reserves two lowest bits of its percpu
    pointer to indicate its state; however, only one bit is used for
    PCPU_REF_DEAD.

    Simplify it by removing PCPU_STATUS_BITS/MASK and testing
    PCPU_REF_DEAD directly. This also allows the compiler to choose a
    more efficient instruction depending on the architecture.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Christoph Lameter

    Tejun Heo
     

05 Jun, 2014

2 commits

  • …j/percpu.git into for-3.16

    Pull percpu/for-3.15-fixes into percpu/for-3.16 to receive
    0c36b390a546 ("percpu-refcount: fix usage of this_cpu_ops").

    The merge doesn't produce any conflict but the automatic merge is
    still incorrect because 4fb6e25049cb ("percpu-refcount: implement
    percpu_ref_tryget()") added another use of __this_cpu_inc() which
    should also be converted to this_cpu_inc().

    This commit pulls in percpu/for-3.15-fixes and converts the newly
    added __this_cpu_inc() to this_cpu_inc().

    Signed-off-by: Tejun Heo <tj@kernel.org>

    Tejun Heo
     
  • The percpu-refcount infrastructure uses the underscore variants of
    this_cpu_ops in order to modify percpu reference counters.
    (e.g. __this_cpu_inc()).

    However the underscore variants do not atomically update the percpu
    variable, instead they may be implemented using read-modify-write
    semantics (more than one instruction). Therefore it is only safe to
    use the underscore variant if the context is always the same (process,
    softirq, or hardirq). Otherwise it is possible to lose updates.

    This problem is something that Sebastian has seen within the aio
    subsystem which uses percpu refcounters both in process and softirq
    context leading to reference counts that never dropped to zero; even
    though the number of "get" and "put" calls matched.

    Fix this by using the non-underscore this_cpu_ops variant which
    provides correct per cpu atomic semantics and fixes the corrupted
    reference counts.

    Cc: Kent Overstreet
    Cc: # v3.11+
    Reported-by: Sebastian Ott
    Signed-off-by: Heiko Carstens
    Signed-off-by: Tejun Heo
    References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett

    Sebastian Ott
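
The lost update described in the commit above can be replayed deterministically in userspace. This is a simulation under assumptions: the softirq is modelled as a callback invoked at a chosen point, and all names are hypothetical. It does not use the kernel's this_cpu operations; it only shows why a multi-instruction read-modify-write is unsafe when an interrupting context touches the same counter.

```c
/* Model of one CPU's counter, shared by the two simulated contexts. */
static long cpu_counter;

/* A "put" arriving in softirq context, executed in one step. */
static void softirq_put(void)
{
	cpu_counter--;
}

/* A "get" implemented as separate load and store instructions. */
static void get_read_modify_write(void (*irq)(void))
{
	long tmp = cpu_counter;		/* load */

	if (irq)
		irq();			/* softirq fires between load and store */
	cpu_counter = tmp + 1;		/* store discards the softirq's update */
}

/* A "get" modelled as a single indivisible step. */
static void get_single_step(void (*irq)(void))
{
	cpu_counter++;
	if (irq)
		irq();			/* softirq can only run before or after */
}
```

Starting from zero, one get plus one put should leave the counter at zero. The read-modify-write variant leaves it at one, so the count never drains even though gets and puts match, while the single-step variant ends at zero.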