08 Aug, 2020

1 commit

  • A percpu_counter's accuracy is related to its batch size. For a
    percpu_counter with a big batch, the deviation from the true value
    could be big, so when the counter's batch is changed at runtime to a
    smaller value for better accuracy, there could also be a requirement
    to reduce the accumulated deviation.

    So add a percpu-counter sync function to be run on each CPU.
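
    A minimal sketch of such a sync helper, assuming the interface is a
    percpu_counter_sync() run on each CPU (e.g. via on_each_cpu()); the
    name and locking here follow the description above, not a confirmed
    implementation:

        void percpu_counter_sync(struct percpu_counter *fbc)
        {
                unsigned long flags;
                s64 count;

                /* fold this CPU's local delta into the global count */
                raw_spin_lock_irqsave(&fbc->lock, flags);
                count = __this_cpu_read(*fbc->counters);
                fbc->count += count;
                __this_cpu_sub(*fbc->counters, count);
                raw_spin_unlock_irqrestore(&fbc->lock, flags);
        }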

    Reported-by: kernel test robot
    Signed-off-by: Feng Tang
    Signed-off-by: Andrew Morton
    Cc: Dennis Zhou
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: Michal Hocko
    Cc: Qian Cai
    Cc: Andi Kleen
    Cc: Huang Ying
    Cc: Dave Hansen
    Cc: Haiyang Zhang
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: "K. Y. Srinivasan"
    Cc: Matthew Wilcox (Oracle)
    Cc: Mel Gorman
    Cc: Tim Chen
    Link: http://lkml.kernel.org/r/1594389708-60781-4-git-send-email-feng.tang@intel.com
    Signed-off-by: Linus Torvalds

    Feng Tang
     

08 Apr, 2020

1 commit

  • "vm_committed_as.count" could be accessed concurrently as reported by
    KCSAN,

    BUG: KCSAN: data-race in __vm_enough_memory / percpu_counter_add_batch

    write to 0xffffffff9451c538 of 8 bytes by task 65879 on cpu 35:
    percpu_counter_add_batch+0x83/0xd0
    percpu_counter_add_batch at lib/percpu_counter.c:91
    __vm_enough_memory+0xb9/0x260
    dup_mm+0x3a4/0x8f0
    copy_process+0x2458/0x3240
    _do_fork+0xaa/0x9f0
    __do_sys_clone+0x125/0x160
    __x64_sys_clone+0x70/0x90
    do_syscall_64+0x91/0xb05
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    read to 0xffffffff9451c538 of 8 bytes by task 66773 on cpu 19:
    __vm_enough_memory+0x199/0x260
    percpu_counter_read_positive at include/linux/percpu_counter.h:81
    (inlined by) __vm_enough_memory at mm/util.c:839
    mmap_region+0x1b2/0xa10
    do_mmap+0x45c/0x700
    vm_mmap_pgoff+0xc0/0x130
    ksys_mmap_pgoff+0x6e/0x300
    __x64_sys_mmap+0x33/0x40
    do_syscall_64+0x91/0xb05
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The read is outside the percpu_counter::lock critical section, which
    results in a data race. Fix it by adding a READ_ONCE() in
    percpu_counter_read_positive(), which can also serve as the existing
    compiler memory barrier.
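
    A sketch of the fixed reader; READ_ONCE() both marks the lockless read
    for KCSAN and, like the barrier it stands in for, prevents the compiler
    from reloading fbc->count:

        static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
        {
                /* annotated racy read; prevents reloads of fbc->count */
                s64 ret = READ_ONCE(fbc->count);

                if (ret >= 0)
                        return ret;
                return 0;
        }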

    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Acked-by: Marco Elver
    Link: http://lkml.kernel.org/r/1582302724-2804-1-git-send-email-cai@lca.pw
    Signed-off-by: Linus Torvalds

    Qian Cai
     

15 Dec, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
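
    For a C source file, the identifier is a single comment line at the
    top of the file:

        // SPDX-License-Identifier: GPL-2.0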

    This patch is based on work done by Thomas Gleixner, Kate Stewart, and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be
    applied to a file was done in a spreadsheet of side-by-side results
    from the output of two independent scanners (ScanCode & Windriver)
    producing SPDX tag:value files created by Philippe Ombredanne.
    Philippe prepared the base worksheet, and did an initial spot review
    of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source.
    - File already had some variant of a license header in it (even if <5
    lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

21 Jun, 2017

1 commit

  • Currently, percpu_counter_add is a wrapper around __percpu_counter_add
    which is preempt safe due to explicit calls to preempt_disable. Given
    how __ prefix is used in percpu related interfaces, the naming
    unfortunately creates the false sense that __percpu_counter_add is
    less safe than percpu_counter_add. In terms of context-safety,
    they're equivalent. The only difference is that the __ version takes
    a batch parameter.

    Make this a bit more explicit by just renaming __percpu_counter_add to
    percpu_counter_add_batch.

    This patch doesn't cause any functional changes.
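
    After the rename, the relationship between the two helpers is roughly
    the following sketch (percpu_counter_batch is the global default batch
    size):

        static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
        {
                percpu_counter_add_batch(fbc, amount, percpu_counter_batch);
        }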

    tj: Minor updates to patch description for clarity. Cosmetic
    indentation updates.

    Signed-off-by: Nikolay Borisov
    Signed-off-by: Tejun Heo
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: David Sterba
    Cc: Darrick J. Wong
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: linux-mm@kvack.org
    Cc: "David S. Miller"

    Nikolay Borisov
     

29 May, 2015

1 commit

  • XFS uses non-standard batch sizes to avoid frequent global counter
    updates on its allocated inode counters, as they increment or
    decrement in batches of 64 inodes. Hence the standard percpu counter
    batch of 32 means that the counter is effectively a global counter.
    Currently XFS uses a batch size of 128 so that it doesn't take the
    global lock on every single modification.

    However, XFS also needs to compare accurately against zero, which
    means we need to use percpu_counter_compare(). That has a hard-coded
    batch size of 32, so it spuriously fails to detect when precise
    comparisons are required, and the accounting goes wrong.

    Add __percpu_counter_compare() to take a custom batch size so we can
    use it sanely in XFS and factor percpu_counter_compare() to use it.
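
    The factoring might look like this sketch: the existing interface
    keeps the default batch, while callers such as XFS pass their own:

        int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
        {
                /* same semantics as before, now via the batch-aware helper */
                return __percpu_counter_compare(fbc, rhs, percpu_counter_batch);
        }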

    Signed-off-by: Dave Chinner
    Acked-by: Tejun Heo
    Signed-off-by: Dave Chinner

    Dave Chinner
     

08 Sep, 2014

1 commit

  • Percpu allocator now supports allocation mask. Add @gfp to
    percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
    with percpu_counters too.

    We could have left percpu_counter_init() alone and added
    percpu_counter_init_gfp(); however, the number of users isn't that
    high and introducing _gfp variants to all percpu data structures would
    be quite ugly, so let's just do the conversion. This is the one with
    the most users. Other percpu data structures are a lot easier to
    convert.

    This patch doesn't make any functional difference.
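
    A sketch of the converted interface and a typical process-context
    caller (the counter name here is illustrative):

        int percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp);

        /* callers now pass the allocation mask explicitly */
        int err = percpu_counter_init(&sbi->s_freeblocks_counter, 0, GFP_KERNEL);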

    Signed-off-by: Tejun Heo
    Acked-by: Jan Kara
    Acked-by: "David S. Miller"
    Cc: x86@kernel.org
    Cc: Jens Axboe
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andrew Morton

    Tejun Heo
     

05 Feb, 2013

1 commit


13 Sep, 2011

1 commit

  • The percpu_counter::lock can be taken in atomic context and therefore
    cannot be preempted on -rt - annotate it.

    In mainline this change documents the low level nature of
    the lock - otherwise there's no functional difference. Lockdep
    and Sparse checking will work as usual.
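
    The annotation amounts to switching the lock type and its accessors,
    roughly:

        -       spinlock_t lock;
        +       raw_spinlock_t lock;

        -       spin_lock(&fbc->lock);
        +       raw_spin_lock(&fbc->lock);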

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

25 May, 2011

1 commit

  • In the UP case, the percpu_counter_*_positive() API doesn't check
    whether the return value is positive. Add comments to explain why we
    don't. Also, if count < 0, return 0 instead of 1 from
    *_read_positive().

    [akpm@linux-foundation.org: tweak comment]
    Signed-off-by: Shaohua Li
    Acked-by: Eric Dumazet
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

28 Oct, 2010

1 commit

  • Commit 84061e0 fixed an accounting bug only to introduce the
    possibility of a kernel OOPS if the journal has a non-zero j_errno
    field indicating that the file system had detected a fs inconsistency.
    After the journal replay, if the journal superblock indicates that the
    file system has an error, this indication is transferred to the file
    system and then ext4_commit_super() is called to write this to the
    disk.

    But since the percpu counters are now initialized after the journal
    replay, the call to ext4_commit_super() will cause a kernel oops, since
    it needs to use the percpu counters in the ext4 superblock structure.

    The fix is to skip setting the ext4 free block and free inode fields
    if the percpu counter has not been set.
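
    A sketch of the guard in ext4_commit_super(), assuming the check is
    done with percpu_counter_initialized(), which tests whether the
    counter's per-cpu storage has been allocated:

        if (percpu_counter_initialized(&EXT4_SB(sb)->s_freeblocks_counter))
                ext4_free_blocks_count_set(es, percpu_counter_sum_positive(
                                &EXT4_SB(sb)->s_freeblocks_counter));
        if (percpu_counter_initialized(&EXT4_SB(sb)->s_freeinodes_counter))
                es->s_free_inodes_count = cpu_to_le32(percpu_counter_sum_positive(
                                &EXT4_SB(sb)->s_freeinodes_counter));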

    Thanks to Ken Sumrall for reporting and analyzing the root causes of
    this bug.

    Addresses-Google-Bug: #3054080

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

10 Aug, 2010

1 commit

  • Add percpu_counter_compare(), which allows a quick but accurate
    comparison of a percpu_counter with a given value.

    A rough count is provided by the count field in the percpu_counter
    structure, without accounting for the values stored in the individual
    per-cpu counters.

    The actual count is the sum of count and all the per-cpu counters.
    However, the count field never differs from the actual value by more
    than batch*num_online_cpus(). So if count differs from the given value
    by more than this margin, we do not need the actual count for the
    comparison, which allows a quick comparison without summing up all the
    per-cpu counters.
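
    A sketch of that logic; batch*num_online_cpus() bounds the maximum
    drift of the count field:

        int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
        {
                s64 count = percpu_counter_read(fbc);

                /* the rough count suffices when outside the error margin */
                if (abs(count - rhs) > (percpu_counter_batch * num_online_cpus()))
                        return count > rhs ? 1 : -1;

                /* too close to call: fall back to the precise (slow) sum */
                count = percpu_counter_sum(fbc);
                if (count > rhs)
                        return 1;
                if (count < rhs)
                        return -1;
                return 0;
        }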

    Signed-off-by: Tim Chen
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
     

03 Mar, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: add __percpu sparse annotations to what's left
    percpu: add __percpu sparse annotations to fs
    percpu: add __percpu sparse annotations to core kernel subsystems
    local_t: Remove leftover local.h
    this_cpu: Remove pageset_notifier
    this_cpu: Page allocator conversion
    percpu, x86: Generic inc / dec percpu instructions
    local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c
    module: Use this_cpu_xx to dynamically allocate counters
    local_t: Remove cpu_local_xx macros
    percpu: refactor the code in pcpu_[de]populate_chunk()
    percpu: remove compile warnings caused by __verify_pcpu_ptr()
    percpu: make accessors check for percpu pointer in sparse
    percpu: add __percpu for sparse.
    percpu: make access macros universal
    percpu: remove per_cpu__ prefix.

    Linus Torvalds
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to core subsystems.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.
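
    For percpu_counter itself this is a one-line annotation on the per-cpu
    pointer, roughly:

        struct percpu_counter {
                spinlock_t lock;
                s64 count;
                s32 __percpu *counters;         /* was: s32 *counters */
        };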

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Acked-by: Paul E. McKenney
    Cc: Jens Axboe
    Cc: linux-mm@kvack.org
    Cc: Rusty Russell
    Cc: Dipankar Sarma
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Eric Biederman

    Tejun Heo
     

08 Feb, 2010

1 commit

  • Even though batch isn't used on UP, we may want to pass one in to
    keep the SMP and UP code paths similar. Convert __percpu_counter_add
    to an inline function so we won't get unused-variable warnings if we
    do.
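
    A sketch of the UP variant after the conversion; the inline function
    consumes the batch argument without warnings, keeping callers
    identical on SMP and UP:

        static inline void
        __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
        {
                /* batch is intentionally unused on UP */
                percpu_counter_add(fbc, amount);
        }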

    Signed-off-by: Anton Blanchard
    Cc: KOSAKI Motohiro
    Cc: Peter Zijlstra
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     

07 Jan, 2009

2 commits

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: fix rcutorture bug
    rcu: eliminate synchronize_rcu_xxx macro
    rcu: make treercu safe for suspend and resume
    rcu: fix rcutree grace-period-latency bug on small systems
    futex: catch certain assymetric (get|put)_futex_key calls
    futex: make futex_(get|put)_key() calls symmetric
    locking, percpu counters: introduce separate lock classes
    swiotlb: clean up EXPORT_SYMBOL usage
    swiotlb: remove unnecessary declaration
    swiotlb: replace architecture-specific swiotlb.h with linux/swiotlb.h
    swiotlb: add support for systems with highmem
    swiotlb: store phys address in io_tlb_orig_addr array
    swiotlb: add hwdev to swiotlb_phys_to_bus() / swiotlb_sg_to_bus()

    Linus Torvalds
     
  • For NR_CPUS >= 16 values, FBC_BATCH is 2*NR_CPUS

    Considering more and more distros are using high NR_CPUS values, it makes
    sense to use a more sensible value for FBC_BATCH, and get rid of NR_CPUS.

    A sensible value is 2*num_online_cpus(), with a minimum value of 32
    (this minimum helps branch prediction in __percpu_counter_add()).

    We already have a hotcpu notifier, so we can adjust FBC_BATCH dynamically.

    We rename FBC_BATCH to percpu_counter_batch since it's not a constant
    anymore.
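
    The recomputation might look like this sketch, re-run from the hotcpu
    notifier whenever the set of online CPUs changes:

        s32 percpu_counter_batch = 32;          /* no longer a constant */

        static void compute_batch_value(void)
        {
                int nr = num_online_cpus();

                percpu_counter_batch = max(32, nr * 2);
        }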

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

29 Dec, 2008

1 commit

  • Impact: fix lockdep false positives

    Classify percpu_counter instances similar to regular lock objects --
    that is, per instantiation site.

    The networking code has increased its use of percpu_counters, which
    leads to false positives if they are treated as a single class.
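
    One way to get a class per instantiation site is to allocate a static
    lock_class_key at each init call site, roughly:

        #define percpu_counter_init(fbc, value)                         \
                ({                                                      \
                        static struct lock_class_key __key;            \
                                                                        \
                        __percpu_counter_init(fbc, value, &__key);      \
                })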

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Dec, 2008

2 commits

  • Revert

    commit e8ced39d5e8911c662d4d69a342b9d053eaaac4e
    Author: Mingming Cao
    Date: Fri Jul 11 19:27:31 2008 -0400

    percpu_counter: new function percpu_counter_sum_and_set

    As described in

    revert "percpu counter: clean up percpu_counter_sum_and_set()"

    the new percpu_counter_sum_and_set() is racy against updates to the
    cpu-local accumulators on other CPUs. Revert that change.

    This means that ext4 will be slow again. But correct.

    Reported-by: Eric Dumazet
    Cc: "David S. Miller"
    Cc: Peter Zijlstra
    Cc: Mingming Cao
    Cc:
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert

    commit 1f7c14c62ce63805f9574664a6c6de3633d4a354
    Author: Mingming Cao
    Date: Thu Oct 9 12:50:59 2008 -0400

    percpu counter: clean up percpu_counter_sum_and_set()

    Before this patch we had the following:

    percpu_counter_sum(): return the percpu_counter's value

    percpu_counter_sum_and_set(): return the percpu_counter's value, copying
    that value into the central value and zeroing the per-cpu counters before
    returning.

    After this patch, percpu_counter_sum_and_set() has gone, and
    percpu_counter_sum() gets the old percpu_counter_sum_and_set()
    functionality.

    Problem is, as Eric points out, the old percpu_counter_sum_and_set()
    functionality was racy and wrong. It zeroes out counters on "other"
    cpus, without holding any locks which would prevent races against
    updates from those other CPUs.

    This patch reverts 1f7c14c62ce63805f9574664a6c6de3633d4a354. This means
    that percpu_counter_sum_and_set() still has the race, but
    percpu_counter_sum() does not.

    Note that this is not a simple revert - ext4 has since started using
    percpu_counter_sum() for its dirty_blocks counter as well.

    Note that this revert patch changes percpu_counter_sum() semantics.

    Before the patch, a call to percpu_counter_sum() will bring the counter's
    central counter mostly up-to-date, so a following percpu_counter_read()
    will return a close value.

    After this patch, a call to percpu_counter_sum() will leave the counter's
    central accumulator unaltered, so a subsequent call to
    percpu_counter_read() can now return a significantly inaccurate result.

    If there is any code in the tree which was introduced after
    e8ced39d5e8911c662d4d69a342b9d053eaaac4e was merged, and which depends
    upon the new percpu_counter_sum() semantics, that code will break.
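
    For reference, the racy part of the reverted behaviour looked roughly
    like this sketch; zeroing another CPU's accumulator is not serialized
    against that CPU's concurrent percpu_counter_add():

        s64 percpu_counter_sum_and_set(struct percpu_counter *fbc)
        {
                s64 ret = 0;
                int cpu;

                spin_lock(&fbc->lock);
                for_each_online_cpu(cpu) {
                        s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
                        ret += *pcount;
                        *pcount = 0;    /* racy: updaters don't hold fbc->lock */
                }
                fbc->count = ret;
                spin_unlock(&fbc->lock);
                return ret;
        }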

    Reported-by: Eric Dumazet
    Cc: "David S. Miller"
    Cc: Peter Zijlstra
    Cc: Mingming Cao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

10 Oct, 2008

1 commit

  • percpu_counter_sum_and_set() and percpu_counter_sum() are the same
    except that the former updates the global counter after accounting.
    Since we are taking fbc->lock to calculate the precise value of the
    counter in percpu_counter_sum() anyway, it should simply set fbc->count
    too, as percpu_counter_sum_and_set() does.

    This patch merges these two interfaces into one.

    Signed-off-by: Mingming Cao
    Acked-by: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

12 Jul, 2008

1 commit

  • Delayed allocation needs to check free blocks at every write.
    percpu_counter_read_positive() is not quite accurate enough, but
    calling percpu_counter_sum_positive() frequently is quite expensive.

    This patch adds a new function that updates the central counter while
    summing the per-cpu counters, which improves the accuracy of the next
    percpu_counter_read() and requires fewer calls to the expensive
    percpu_counter_sum().

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

17 Oct, 2007

9 commits


17 Jul, 2007

1 commit

  • per-cpu counters presently must iterate over all possible CPUs in the
    exhaustive percpu_counter_sum().

    But it can be much better to only iterate over the presently-online CPUs. To
    do this, we must arrange for an offlined CPU's count to be spilled into the
    counter's central count.

    We can do this for all percpu_counters in the machine by linking them into a
    single global list and walking that list at CPU_DEAD time.

    (I hope. Might have race windows in which the percpu_counter_sum() count is
    inaccurate?)
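
    A sketch of the CPU_DEAD handler implied above; the global list and
    lock names are assumptions:

        static int percpu_counter_hotcpu_callback(struct notifier_block *nb,
                                                  unsigned long action, void *hcpu)
        {
                unsigned int cpu = (unsigned long)hcpu;
                struct percpu_counter *fbc;

                if (action != CPU_DEAD)
                        return NOTIFY_OK;

                mutex_lock(&percpu_counters_lock);
                list_for_each_entry(fbc, &percpu_counters, list) {
                        unsigned long flags;
                        s32 *pcount;

                        /* spill the dead CPU's delta into the central count */
                        spin_lock_irqsave(&fbc->lock, flags);
                        pcount = per_cpu_ptr(fbc->counters, cpu);
                        fbc->count += *pcount;
                        *pcount = 0;
                        spin_unlock_irqrestore(&fbc->lock, flags);
                }
                mutex_unlock(&percpu_counters_lock);
                return NOTIFY_OK;
        }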

    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

23 Jun, 2006

1 commit

  • The percpu counter data type is changed in this set of patches to
    support more users like ext3, which need more than 32 bits to store
    the free-blocks total in the filesystem.

    - Generic percpu counter data type changes. The sizes of the global
    counter and local counter are explicitly specified using s64 and s32.
    The global counter is changed from long to s64, while the local counter
    is changed from long to s32, so we can avoid doing 64-bit updates in
    most cases.

    - Users of the percpu counters are updated to make use of the new
    percpu_counter_init() routine now taking an additional parameter to allow
    users to pass the initial value of the global counter.
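
    The resulting layout, roughly: a 64-bit central count with 32-bit
    per-cpu deltas, so the hot path usually avoids 64-bit updates:

        struct percpu_counter {
                spinlock_t lock;
                s64 count;              /* central counter, was long */
                s32 *counters;          /* per-cpu deltas, were long */
        };

        struct percpu_counter fbc;

        percpu_counter_init(&fbc, 0);   /* initial value now passed explicitly */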

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     

26 Apr, 2006

1 commit


09 Mar, 2006

1 commit

  • Implement percpu_counter_sum(). This is a more accurate but slower version of
    percpu_counter_read_positive().

    We need this for Alex's speedup-ext3_statfs patch and for the nr_file
    accounting fix. Otherwise these things would be too inaccurate on large CPU
    counts.
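
    A sketch of the slow, accurate path: take the lock and add every CPU's
    delta to the central count (clamped like the _positive() reader it
    backs up):

        s64 percpu_counter_sum(struct percpu_counter *fbc)
        {
                s64 ret;
                int cpu;

                spin_lock(&fbc->lock);
                ret = fbc->count;
                for_each_possible_cpu(cpu)
                        ret += *per_cpu_ptr(fbc->counters, cpu);
                spin_unlock(&fbc->lock);
                return ret < 0 ? 0 : ret;
        }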

    Cc: Ravikiran G Thirumalai
    Cc: Alex Tomas
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds