29 Jan, 2013

2 commits

  • …' and 'tiny.2013.01.29b' into HEAD

    doctorture.2013.01.11a: Changes to rcutorture and to RCU documentation.

    fixes.2013.01.26a: Miscellaneous fixes.

    tagcb.2013.01.24a: Tag RCU callbacks with grace-period number to
    simplify callback advancement.

    tiny.2013.01.29b: Enhancements to uniprocessor handling in tiny RCU.

    Paul E. McKenney
     
  • Tiny RCU has historically omitted RCU CPU stall warnings in order to
    reduce memory requirements. However, the lack of these warnings
    recently caused Thomas Gleixner some debugging pain. Therefore, this
    commit adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This
    keeps the memory footprint small, while still enabling CPU stall
    warnings in kernels built to enable them.

    Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON
    config variable to simplify #if expressions.

    Reported-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

09 Jan, 2013

1 commit


14 Nov, 2012

1 commit


23 Sep, 2012

3 commits

  • TINY_RCU's rcu_idle_enter_common() invokes rcu_sched_qs() in order
    to inform the RCU core of the quiescent state implied by idle entry.
    Of course, idle is also an extended quiescent state, so that the call
    to rcu_sched_qs() speeds up RCU's invoking of any callbacks that might
    be queued. This speed-up is important when entering into dyntick-idle
    mode -- if there are no further scheduling-clock interrupts, the callbacks
    might never be invoked, which could result in a system hang.

    However, processing callbacks does event tracing, which in turn
    implies RCU read-side critical sections, which are illegal in extended
    quiescent states. This patch therefore moves the call to rcu_sched_qs()
    so that it precedes the point at which we inform lockdep that RCU has
    entered an extended quiescent state.

    Signed-off-by: Li Zhong
    Signed-off-by: Paul E. McKenney

    Li Zhong
     
  • There is a need to use RCU from interrupt context, but either before
    rcu_irq_enter() is called or after rcu_irq_exit() is called. If the
    interrupt occurs from idle, then lockdep-RCU will complain about such
    uses, as they appear to be illegal uses of RCU from the idle loop.
    In other environments, RCU_NONIDLE() could be used to properly protect
    the use of RCU, but RCU_NONIDLE() currently cannot be invoked except
    from process context.

    This commit therefore modifies RCU_NONIDLE() to permit its use more
    globally.

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because TINY_RCU's idle detection keys directly off of the nesting
    level, rather than from a separate variable as in TREE_RCU, the
    TINY_RCU dyntick-idle tracing on transition to idle must happen
    before the change to the nesting level. This commit therefore makes
    this change by passing the desired new value (rather than the old value)
    of the nesting level in to rcu_idle_enter_common().

    [ paulmck: Add fix for wrong-variable bug spotted by Michael Wang. ]

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

03 Jul, 2012

1 commit

  • The rcu_is_cpu_idle() function is used when CONFIG_DEBUG_LOCK_ALLOC=y,
    but TINY_RCU defines it only when CONFIG_PROVE_RCU=y. This causes
    build failures when CONFIG_DEBUG_LOCK_ALLOC=y but CONFIG_PROVE_RCU=n.
    This commit therefore adjusts the #ifdefs for rcu_is_cpu_idle() so
    that it is defined when CONFIG_DEBUG_LOCK_ALLOC=y.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

22 Feb, 2012

4 commits

  • RCU, RCU-bh, and RCU-sched read-side critical sections are forbidden
    in the inner idle loop, that is, between rcu_idle_enter() and
    rcu_idle_exit() -- RCU will happily ignore any such read-side critical
    sections. However, things like powertop need tracepoints in the inner
    idle loop.

    This commit therefore provides an RCU_NONIDLE() macro that can be used to
    wrap code in the idle loop that requires RCU read-side critical sections.

    Suggested-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Acked-by: Deepthi Dharwar

    Paul E. McKenney
     
  • Use of RCU in the idle loop is incorrect, but quite a few instances of
    just that have made their way into mainline, primarily in event
    tracing. The problem with RCU read-side critical sections on CPUs that
    RCU believes to be idle is that RCU completely ignores the CPU, along
    with any attempted RCU read-side critical sections.

    The approaches of eliminating the offending uses and of pushing the
    definition of idle down beyond the offending uses have both proved
    impractical. The new approach is to encapsulate offending uses of RCU
    with rcu_idle_exit() and rcu_idle_enter(), but this requires nesting
    for code that is invoked both during idle and during normal execution.
    Therefore, this commit modifies rcu_idle_enter() and rcu_idle_exit()
    to permit nesting.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Acked-by: Deepthi Dharwar

    Paul E. McKenney
     
  • When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
    enter dyntick-idle mode even if it still has RCU callbacks queued.
    RCU avoids system hangs in this case by scheduling a timer for several
    jiffies in the future. However, if all of the callbacks on that CPU
    are from kfree_rcu(), there is no reason to wake the CPU up, as it is
    not a problem to defer freeing of memory.

    This commit therefore tracks the number of callbacks on a given CPU
    that are from kfree_rcu(), and avoids scheduling the timer if all of
    a given CPU's callbacks are from kfree_rcu().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • It is illegal to have a grace period within a same-flavor RCU read-side
    critical section, so this commit adds lockdep-RCU checks to splat when
    such abuse is encountered. This commit does not detect more elaborate
    RCU deadlock situations. These situations might be a job for lockdep
    enhancements.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

12 Dec, 2011

7 commits

  • The current rcu_batch_end event trace records only the name of the RCU
    flavor and the total number of callbacks that remain queued on the
    current CPU. This is insufficient for testing and tuning the new
    dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along
    with whether or not any of the callbacks that were ready to invoke
    at the beginning of rcu_do_batch() are still queued.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • RCU has traditionally relied on idle_cpu() to determine whether a given
    CPU is running in the context of an idle task, but commit 908a3283
    (Fix idle_cpu()) has invalidated this approach. After commit 908a3283,
    idle_cpu() will return true if the current CPU is currently running the
    idle task, and will be doing so for the foreseeable future. RCU instead
    needs to know whether or not the current CPU is currently running the
    idle task, regardless of what the near future might bring.

    This commit therefore switches from idle_cpu() to "current->pid != 0".

    Reported-by: Wu Fengguang
    Suggested-by: Carsten Emde
    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Tested-by: Wu Fengguang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current code just complains if the current task is not the idle task.
    This commit therefore adds printing of the identity of the idle task.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The trace_rcu_dyntick() trace event did not print both the old and
    the new value of the nesting level, and furthermore printed only
    the low-order 32 bits of it. This could result in some confusion
    when interpreting trace-event dumps, so this commit prints both
    the old and the new value, prints the full 64 bits, and also selects
    the process-entry/exit increment to print nicely in hexadecimal.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Report that none of the rcu read lock maps are held while in an RCU
    extended quiescent state (the section between rcu_idle_enter()
    and rcu_idle_exit()). This helps detect any use of rcu_dereference()
    and friends from within the section in idle where RCU is not allowed.

    This way we can guarantee an extended quiescent window in which the
    CPU can be put in dyntick-idle mode, or can simply avoid being part of
    any global grace-period completion while in the idle loop.

    Uses of RCU from such a mode are totally ignored by RCU, hence the
    importance of these checks.

    Signed-off-by: Frederic Weisbecker
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • Earlier versions of RCU used the scheduling-clock tick to detect idleness
    by checking for the idle task, but handled idleness differently for
    CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side
    critical sections in the idle task, for example, for tracing. A more
    fine-grained detection of idleness is therefore required.

    This commit presses the old dyntick-idle code into full-time service,
    so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
    always invoked at the beginning of an idle loop iteration. Similarly,
    rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
    at the end of an idle-loop iteration. This allows the idle task to
    use RCU everywhere except between consecutive rcu_idle_enter() and
    rcu_idle_exit() calls, in turn allowing architecture maintainers to
    specify exactly where in the idle loop that RCU may be used.

    Because some of the userspace upcall uses can result in what looks
    to RCU like half of an interrupt, it is not possible to expect that
    the irq_enter() and irq_exit() hooks will give exact counts. This
    patch therefore expands the ->dynticks_nesting counter to 64 bits
    and uses two separate bitfields to count process/idle transitions
    and interrupt entry/exit transitions. It is presumed that userspace
    upcalls do not happen in the idle loop or from usermode execution
    (though usermode might do a system call that results in an upcall).
    The counter is hard-reset on each process/idle transition, which
    prevents the interrupt entry/exit error from accumulating. Overflow
    is avoided by the 64-bitness of the ->dynticks_nesting counter.

    This commit also adds warnings if a non-idle task asks RCU to enter
    idle state (these checks will need some adjustment before applying
    Frederic's OS-jitter patches, http://lkml.org/lkml/2011/10/7/246).
    In addition, validation of ->dynticks and ->dynticks_nesting is added.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

31 Oct, 2011

2 commits

  • The file rcutiny.c does not need the moduleparam.h header, as there
    are no modparams in this file.

    However rcutiny_plugin.h does define a module_init() and
    a module_exit() and it uses the various MODULE_ macros, so
    it really does need module.h included.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

29 Sep, 2011

6 commits

  • Add trace events to record grace-period start and end, quiescent states,
    CPUs noticing grace-period start and end, grace-period initialization,
    call_rcu() invocation, tasks blocking in RCU read-side critical sections,
    tasks exiting those same critical sections, force_quiescent_state()
    detection of dyntick-idle and offline CPUs, CPUs entering and leaving
    dyntick-idle mode (except from NMIs), CPUs coming online and going
    offline, and CPUs being kicked for staying in dyntick-idle mode for too
    long (as in many weeks, even on 32-bit systems).

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    rcu: Add the rcu flavor to callback trace events

    The earlier trace events for registering RCU callbacks and for invoking
    them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched).
    This commit adds the RCU flavor to those trace events.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This patch #ifdefs TINY_RCU kthreads out of the kernel unless RCU_BOOST=y,
    thus eliminating context-switch overhead if RCU priority boosting has
    not been configured.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add a string to the rcu_batch_start() and rcu_batch_end() trace
    messages that indicates the RCU type ("rcu_sched", "rcu_bh", or
    "rcu_preempt"). The trace messages for the actual invocations
    themselves are not marked, as it should be clear from the
    rcu_batch_start() and rcu_batch_end() events before and after.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In order to allow event tracing to distinguish between flavors of
    RCU, we need those names in the relevant RCU data structures. TINY_RCU
    has avoided them for memory-footprint reasons, so add them only if
    CONFIG_RCU_TRACE=y.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • There was recently some controversy about the overhead of invoking RCU
    callbacks. Add TRACE_EVENT()s to obtain fine-grained timings for the
    start and stop of a batch of callbacks and also for each callback invoked.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Pull the code that waits for an RCU grace period into a single function,
    which is then called by synchronize_rcu() and friends in the case of
    TREE_RCU and TREE_PREEMPT_RCU, and from rcu_barrier() and friends in
    the case of TINY_RCU and TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

21 May, 2011

1 commit

  • Commit e66eed651fd1 ("list: remove prefetching from regular list
    iterators") removed the include of prefetch.h from list.h, which
    uncovered several cases that had apparently relied on that rather
    obscure header file dependency.

    So this fixes things up a bit, using

    grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
    grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

    to guide us in finding files that either need <linux/prefetch.h>
    inclusion, or have it despite not needing it.

    There are more of them around (mostly network drivers), but this gets
    many core ones.

    Reported-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 May, 2011

2 commits

  • rcu_sched_qs() currently calls local_irq_save()/local_irq_restore() up
    to three times.

    Remove irq masking from rcu_qsctr_help() / invoke_rcu_kthread()
    and do it once in rcu_sched_qs() / rcu_bh_qs()

    This generates smaller code as well.

    text data bss dec hex filename
    2314 156 24 2494 9be kernel/rcutiny.old.o
    2250 156 24 2430 97e kernel/rcutiny.new.o

    Also fix an outdated comment for rcu_qsctr_help(), and move the
    invoke_rcu_kthread() definition before its use.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Eric Dumazet
     
  • Many rcu callbacks functions just call kfree() on the base structure.
    These functions are trivial, but their size adds up, and furthermore
    when they are used in a kernel module, that module must invoke the
    high-latency rcu_barrier() function at module-unload time.

    The kfree_rcu() function introduced by this commit addresses this issue.
    Rather than encoding a function address in the embedded rcu_head
    structure, kfree_rcu() instead encodes the offset of the rcu_head
    structure within the base structure. Because the functions are not
    allowed in the low-order 4096 bytes of kernel virtual memory, offsets
    up to 4095 bytes can be accommodated. If the offset is larger than
    4095 bytes, a compile-time error will be generated in __kfree_rcu().
    If this error is triggered, you can either fall back to use of call_rcu()
    or rearrange the structure to position the rcu_head structure into the
    first 4096 bytes.

    Note that the allowable offset might decrease in the future, for example,
    to allow something like kmem_cache_free_rcu().

    The new kfree_rcu() function can replace code as follows:

    call_rcu(&p->rcu, simple_kfree_callback);

    where "simple_kfree_callback()" might be defined as follows:

    void simple_kfree_callback(struct rcu_head *p)
    {
            struct foo *q = container_of(p, struct foo, rcu);

            kfree(q);
    }

    with the following:

    kfree_rcu(p, rcu);

    Note that "rcu" is the name of the rcu_head field in the structure
    being freed. The reason for passing the base pointer and the field
    name, rather than a pointer to the rcu_head structure itself, is that
    this approach allows better type checking.

    This commit is based on earlier work by Lai Jiangshan and Manfred Spraul:

    Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
    Manfred's patch: http://lkml.org/lkml/2009/1/2/115

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Manfred Spraul
    Signed-off-by: Paul E. McKenney
    Reviewed-by: David Howells
    Reviewed-by: Josh Triplett

    Lai Jiangshan
     

14 Jan, 2011

1 commit

  • If the RCU callback-processing kthread has nothing to do, it parks in
    a wait_event(). If RCU remains idle for more than two minutes, the
    kernel complains about this. This commit changes from wait_event()
    to wait_event_interruptible() to prevent the kernel from complaining
    just because RCU is idle.

    Reported-by: Russell King
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Tested-by: Thomas Weber
    Tested-by: Russell King

    Paul E. McKenney
     

30 Nov, 2010

2 commits

  • Add tracing for the tiny RCU implementations, including statistics on
    boosting in the case of TINY_PREEMPT_RCU and RCU_BOOST.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add priority boosting, but only for TINY_PREEMPT_RCU. This is enabled
    by the default-off RCU_BOOST kernel parameter. The priority to which to
    boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel
    parameter (defaulting to real-time priority 1) and the time to wait
    before boosting the readers blocking a given grace period is controlled
    by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds).

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

18 Nov, 2010

1 commit

  • If RCU priority boosting is to be meaningful, callback invocation must
    be boosted in addition to preempted RCU readers. Otherwise, in the
    presence of CPU-bound real-time threads, the grace period ends, but
    the callbacks don't get invoked. If the callbacks don't get invoked,
    the associated memory doesn't get freed, so the system is still
    subject to OOM.

    But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
    moves the callback invocations to a kthread, which can be boosted easily.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

20 Aug, 2010

1 commit

  • Implement a small-memory-footprint uniprocessor-only implementation of
    preemptible RCU. This implementation uses but a single blocked-tasks
    list rather than the combinatorial number used per leaf rcu_node by
    TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
    processing. This version also takes advantage of uniprocessor execution
    to accelerate grace periods in the case where there are no readers.

    The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.

    This implementation is a step towards having RCU implementation driven
    off of the SMP and PREEMPT kernel configuration variables, which can
    happen once this implementation has accumulated sufficient experience.

    Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
    suggested by Steve Rostedt in order to avoid the compiler-reordering
    issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).

    As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte
    savings compared to CONFIG_TREE_PREEMPT_RCU. Of course, for non-real-time
    workloads, CONFIG_TINY_RCU is even better.

    CONFIG_TREE_PREEMPT_RCU

    text data bss dec filename
    13 0 0 13 kernel/rcupdate.o
    6170 825 28 7023 kernel/rcutree.o
    ----
    7026 Total

    CONFIG_TINY_PREEMPT_RCU

    text data bss dec filename
    13 0 0 13 kernel/rcupdate.o
    2081 81 8 2170 kernel/rcutiny.o
    ----
    2183 Total

    CONFIG_TINY_RCU (non-preemptible)

    text data bss dec filename
    13 0 0 13 kernel/rcupdate.o
    719 25 0 744 kernel/rcutiny.o
    ---
    757 Total

    Requested-by: Loïc Minier
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

15 Jun, 2010

1 commit

  • This commit helps find racy users of call_rcu(), which result in
    hangs because list entries are overwritten and/or skipped.

    Changelog since v4:
    - Bisectability is now OK
    - Now generate a WARN_ON_ONCE() for non-initialized rcu_head passed to
    call_rcu(). Statically initialized objects are detected with
    object_is_static().
    - Rename rcu_head_init_on_stack to init_rcu_head_on_stack.
    - Remove init_rcu_head() completely.

    Changelog since v3:
    - Include comments from Lai Jiangshan

    This new patch version is based on the debugobjects with the newly introduced
    "active state" tracker.

    Non-initialized entries are all considered as "statically initialized". An
    activation fixup (triggered by call_rcu()) takes care of performing the debug
    object initialization without issuing any warning. Since we cannot increase the
    size of struct rcu_head, I don't see much room to put an identifier for
    statically initialized rcu_head structures. So for now, we have to live without
    "activation without explicit init" detection. But the main purpose of this debug
    option is to detect double-activations (double call_rcu() use of a rcu_head
    before the callback is executed), which is correctly addressed here.

    This also detects potential internal RCU callback corruption, which would cause
    the callbacks to be executed twice.

    Signed-off-by: Mathieu Desnoyers
    CC: David S. Miller
    CC: "Paul E. McKenney"
    CC: akpm@linux-foundation.org
    CC: mingo@elte.hu
    CC: laijs@cn.fujitsu.com
    CC: dipankar@in.ibm.com
    CC: josh@joshtriplett.org
    CC: dvhltc@us.ibm.com
    CC: niv@us.ibm.com
    CC: tglx@linutronix.de
    CC: peterz@infradead.org
    CC: rostedt@goodmis.org
    CC: Valdis.Kletnieks@vt.edu
    CC: dhowells@redhat.com
    CC: eric.dumazet@gmail.com
    CC: Alexey Dobriyan
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Lai Jiangshan

    Mathieu Desnoyers
     

12 May, 2010

1 commit


11 May, 2010

3 commits