20 Mar, 2015

1 commit


27 Feb, 2015

2 commits

  • This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
    that emulates a very early boot rcu_expedite_gp(). A late-boot
    call to rcu_end_inkernel_boot() will provide the corresponding
    rcu_unexpedite_gp(). The late-boot call to rcu_end_inkernel_boot()
    should be made just before init is spawned.

    According to Arjan:

    > To show the boot time, I'm using the timestamp of the "Write protecting"
    > line, that's pretty much the last thing we print prior to ring 3 execution.
    >
    > A kernel with default RCU behavior (inside KVM, only virtual devices)
    > looks like this:
    >
    > [ 0.038724] Write protecting the kernel read-only data: 10240k
    >
    > a kernel with expedited RCU (using the command line option, so that I
    > don't have to recompile between measurements and thus am completely
    > oranges-to-oranges)
    >
    > [ 0.031768] Write protecting the kernel read-only data: 10240k
    >
    > which, in percentage, is an 18% improvement.

    Reported-by: Arjan van de Ven
    Signed-off-by: Paul E. McKenney
    Tested-by: Arjan van de Ven

    Paul E. McKenney
     
  • Currently, expediting of normal synchronous grace-period primitives
    (synchronize_rcu() and friends) is controlled by the rcu_expedited
    boot/sysfs parameter. This works well, but does not handle nesting.
    This commit therefore provides rcu_expedite_gp() to enable expediting
    and rcu_unexpedite_gp() to cancel a prior rcu_expedite_gp(), both of
    which support nesting.

    Reported-by: Arjan van de Ven
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

26 Feb, 2015

1 commit


14 Nov, 2014

2 commits


04 Nov, 2014

1 commit

  • Add early boot self tests for RCU under CONFIG_PROVE_RCU.

    Currently the only test is adding a dummy callback which increments a counter
    which we then later verify after calling rcu_barrier*().

    Signed-off-by: Pranith Kumar
    Signed-off-by: Paul E. McKenney

    Pranith Kumar
     

30 Oct, 2014

1 commit


17 Sep, 2014

1 commit


08 Sep, 2014

12 commits

  • The rcu_preempt_note_context_switch() function is on a scheduling fast
    path, so it would be good to avoid disabling irqs. The reason that irqs
    are disabled is to synchronize process-level and irq-handler access to
    the task_struct ->rcu_read_unlock_special bitmask. This commit therefore
    makes ->rcu_read_unlock_special instead be a union of bools with a short
    allowing single-access checks in RCU's __rcu_read_unlock(). This results
    in the process-level and irq-handler accesses being simple loads and
    stores, so that irqs need no longer be disabled. This commit therefore
    removes the irq disabling from rcu_preempt_note_context_switch().

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The grace-period-wait loop in rcu_tasks_kthread() is under (unnecessary)
    RCU protection, and therefore has no preemption points in a PREEMPT=n
    kernel. This commit therefore removes the RCU protection and inserts
    cond_resched().

    Reported-by: Frederic Weisbecker
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
    usermode execution. There would be neither a context switch nor a
    scheduling-clock interrupt to tell TASKS_RCU that the task in question
    had passed through a quiescent state. The grace period would therefore
    extend indefinitely. This commit therefore makes RCU's dyntick-idle
    subsystem record the task_struct structure of the task that is running
    in dyntick-idle mode on each CPU. The TASKS_RCU grace period can
    then access this information and record a quiescent state on
    behalf of any CPU running in dyntick-idle usermode.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • It is expected that many sites will have CONFIG_TASKS_RCU=y, but
    will never actually invoke call_rcu_tasks(). For such sites, creating
    rcu_tasks_kthread() at boot is wasteful. This commit therefore defers
    creation of this kthread until the time of the first call_rcu_tasks().

    This of course means that the first call_rcu_tasks() must be invoked
    from process context after the scheduler is fully operational.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current RCU-tasks implementation uses strict polling to detect
    callback arrivals. This works quite well, but is not so good for
    energy efficiency. This commit therefore replaces the strict polling
    with a wait queue.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds a ten-minute RCU-tasks stall warning. The actual
    time is controlled by the boot/sysfs parameter rcu_task_stall_timeout,
    with values less than or equal to zero disabling the stall warnings.
    The default value is ten minutes, which means that the tasks that have
    not yet responded will get their stacks dumped every ten minutes, until
    they pass through a voluntary context switch.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit exports the RCU-tasks synchronous APIs,
    synchronize_rcu_tasks() and rcu_barrier_tasks(), to
    GPL-licensed kernel modules.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Steven Rostedt
     
  • Once a task has passed exit_notify() in the do_exit() code path, it
    is no longer on the task lists, and is therefore no longer visible
    to rcu_tasks_kthread(). This means that an almost-exited task might
    be preempted while within a trampoline, and this task won't be waited
    on by rcu_tasks_kthread(). This commit fixes this bug by adding an
    srcu_struct. An exiting task does srcu_read_lock() just before calling
    exit_notify(), and does the corresponding srcu_read_unlock() after
    doing the final preempt_disable(). This means that rcu_tasks_kthread()
    can do synchronize_srcu() to wait for all mostly-exited tasks to reach
    their final preempt_disable() region, and then use synchronize_sched()
    to wait for those tasks to finish exiting.

    Reported-by: Oleg Nesterov
    Suggested-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • It turns out to be easier to add the synchronous grace-period waiting
    functions to RCU-tasks than to work around their absence in rcutorture,
    so this commit adds them. The key point is that the existence of
    call_rcu_tasks() means that rcutorture needs an rcu_barrier_tasks().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds a new RCU-tasks flavor of RCU, which provides
    call_rcu_tasks(). This RCU flavor's quiescent states are voluntary
    context switch (not preemption!) and userspace execution (not the idle
    loop -- use some sort of schedule_on_each_cpu() if you need to handle the
    idle tasks). Note that unlike other RCU flavors, these quiescent states
    occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt.

    This RCU flavor is assumed to have very infrequent latency-tolerant
    updaters. This assumption permits significant simplifications, including
    a single global callback list protected by a single global lock, along
    with a single task-private linked list containing all tasks that have not
    yet passed through a quiescent state. If experience shows this assumption
    to be incorrect, the required additional complexity will be added.

    Suggested-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit uninlines rcu_read_lock_held(). According to "size vmlinux"
    this saves 28549 bytes of .text:

    - 5541731 3014560 14757888 23314179
    + 5513182 3026848 14757888 23297918

    Note: it looks as if the data grows by 12288 bytes but this is not true,
    it does not actually grow. But .data starts with ALIGN(THREAD_SIZE) and
    since .text shrinks the padding grows, and thus .data grows too as
    seen by /bin/size. diff System.map:

    - ffffffff81510000 D _sdata
    - ffffffff81510000 D init_thread_union
    + ffffffff81509000 D _sdata
    + ffffffff8150c000 D init_thread_union

    Perhaps we can change vmlinux.lds.S to align .data itself, so that
    /bin/size can't "wrongly" report that .data grows if .text shrinks.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Paul E. McKenney

    Oleg Nesterov
     

10 Jul, 2014

1 commit

  • The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
    effective at finding race conditions, so this commit removes it.

    Signed-off-by: Paul E. McKenney
    Cc: Andi Kleen
    [ paulmck: Remove definition and uses as noted by Paul Bolle. ]

    Paul E. McKenney
     

24 Jun, 2014

2 commits

  • Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
    fixed a problem where a CPU looping in the kernel with but one runnable
    task would give RCU CPU stall warnings, even if the in-kernel loop
    contained cond_resched() calls. Unfortunately, in so doing, it introduced
    performance regressions in Anton Blanchard's will-it-scale "open1" test.
    The problem appears to be not so much the increased cond_resched() path
    length as an increase in the rate at which grace periods complete, which
    increased per-update grace-period overhead.

    This commit takes a different approach to fixing this bug, mainly by
    moving the RCU-visible quiescent state from cond_resched() to
    rcu_note_context_switch(), and by further reducing the check to a
    simple non-zero test of a single per-CPU variable. However, this
    approach requires that the force-quiescent-state processing send
    resched IPIs to the offending CPUs. These will be sent only once
    the grace period has reached an age specified by the boot/sysfs
    parameter rcutree.jiffies_till_sched_qs, or once the grace period
    reaches an age halfway to the point at which RCU CPU stall warnings
    will be emitted, whichever comes first.

    Reported-by: Dave Hansen
    Signed-off-by: Paul E. McKenney
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Mike Galbraith
    Cc: Eric Dumazet
    Reviewed-by: Josh Triplett
    [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
    ktest build robot. Also fixed smp_mb() comment as noted by
    Oleg Nesterov. ]

    Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, call_rcu() relies on implicit allocation and initialization
    for the debug-objects handling of RCU callbacks. If you hammer the
    kernel hard enough with Sasha's modified version of trinity, you can end
    up with the sl*b allocators recursing into themselves via this implicit
    call_rcu() allocation.

    This commit therefore exports the debug_init_rcu_head() and
    debug_rcu_head_free() functions, which permits the allocators to allocate
    and pre-initialize the debug-objects information, so that there is no
    longer any need for call_rcu() to do that initialization, which in turn
    prevents the recursion into the memory allocators.

    Reported-by: Sasha Levin
    Suggested-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Acked-by: Thomas Gleixner
    Looks-good-to: Christoph Lameter

    Paul E. McKenney
     

20 May, 2014

1 commit

  • Some sysrq handlers can run for a long time, because they dump a lot
    of data onto a serial console. Having RCU stall warnings pop up in
    the middle of them only makes the problem worse.

    This commit provides rcu_sysrq_start() and rcu_sysrq_end() APIs to
    temporarily suppress RCU CPU stall warnings while a sysrq request is
    handled.

    Signed-off-by: Rik van Riel
    [ paulmck: Fix TINY_RCU build error. ]
    Signed-off-by: Paul E. McKenney

    Rik van Riel
     

15 May, 2014

1 commit

  • Given a CPU running a loop containing cond_resched(), with no
    other tasks runnable on that CPU, RCU will eventually report RCU
    CPU stall warnings due to lack of quiescent states. Fortunately,
    every call to cond_resched() is a perfectly good quiescent state.
    Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
    for cond_resched(), especially given the need to disable preemption,
    and, for RCU-preempt, interrupts as well.

    This commit therefore maintains a per-CPU counter that causes
    cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
    rcu_note_context_switch(), but only about once per 256 invocations.
    This ratio was chosen in keeping with the relative time constants of
    RCU grace periods.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

26 Feb, 2014

1 commit

  • The kbuild test bot uncovered an implicit dependence on the
    trace header being present before rcu.h in ia64 allmodconfig
    that looks like this:

    In file included from kernel/ksysfs.c:22:0:
    kernel/rcu/rcu.h: In function '__rcu_reclaim':
    kernel/rcu/rcu.h:107:3: error: implicit declaration of function 'trace_rcu_invoke_kfree_callback' [-Werror=implicit-function-declaration]
    kernel/rcu/rcu.h:112:3: error: implicit declaration of function 'trace_rcu_invoke_callback' [-Werror=implicit-function-declaration]
    cc1: some warnings being treated as errors

    Looking at other rcu.h users, we can find that they all
    were sourcing the trace header in advance of rcu.h itself,
    as seen in the context of this diff. There were also some
    inconsistencies as to whether it was or wasn't sourced based
    on the parent tracing Kconfig.

    Rather than "fix" it at each use site, and have inconsistent
    use based on whether "#ifdef CONFIG_RCU_TRACE" was used or not,
    let's just source the trace header once, in the actual consumer
    of it, which is rcu.h itself. We include it unconditionally, as
    build testing shows us that is a hard requirement for some files.

    Reported-by: kbuild test robot
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney

    Paul Gortmaker
     

18 Feb, 2014

1 commit

  • All of the RCU source files have the usual GPL header, which contains a
    long-obsolete postal address for FSF. To avoid the need to track the
    FSF office's movements, this commit substitutes the URL where GPL may
    be found.

    Reported-by: Greg KH
    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

29 Jan, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
    assorted cleanups and fixes all over the place...

    There will be another pile later this week"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
    __dentry_path() fixes
    vfs: Remove second variable named error in __dentry_path
    vfs: Is mounted should be testing mnt_ns for NULL or error.
    Fix race when checking i_size on direct i/o read
    hfsplus: remove can_set_xattr
    nfsd: use get_acl and ->set_acl
    fs: remove generic_acl
    nfs: use generic posix ACL infrastructure for v3 Posix ACLs
    gfs2: use generic posix ACL infrastructure
    jfs: use generic posix ACL infrastructure
    xfs: use generic posix ACL infrastructure
    reiserfs: use generic posix ACL infrastructure
    ocfs2: use generic posix ACL infrastructure
    jffs2: use generic posix ACL infrastructure
    hfsplus: use generic posix ACL infrastructure
    f2fs: use generic posix ACL infrastructure
    ext2/3/4: use generic posix ACL infrastructure
    btrfs: use generic posix ACL infrastructure
    fs: make posix_acl_create more useful
    fs: make posix_acl_chmod more useful
    ...

    Linus Torvalds
     

25 Jan, 2014

1 commit

  • rcu_dereference_check_fdtable() looks very wrong,

    1. rcu_my_thread_group_empty() was added by 844b9a8707f1 "vfs: fix
    RCU-lockdep false positive due to /proc" but it doesn't really
    fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
    hit the same race with get_files_struct().

    And otoh rcu_my_thread_group_empty() can suppress the correct
    warning if the caller is the CLONE_FILES (without CLONE_THREAD)
    task.

    2. files->count == 1 check is not really right too. Even if this
    files_struct is not shared it is not safe to access it lockless
    unless the caller is the owner.

    Otoh, this check is sub-optimal. files->count == 0 always means
    it is safe to use it lockless even if files != current->files,
    but put_files_struct() has to take rcu_read_lock(). See the next
    patch.

    This patch removes the buggy checks and turns fcheck_files() into
    __fcheck_files() which uses rcu_dereference_raw(), the "unshared"
    callers, fget_light() and fget_raw_light(), can use it to avoid
    the warning from RCU-lockdep.

    fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
    plus __fcheck_files().

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Al Viro

    Oleg Nesterov
     

10 Dec, 2013

1 commit


16 Oct, 2013

1 commit