21 Oct, 2010

1 commit

  • With the addition of trace_softirq_raise() the softirq tracepoint got
    even more convoluted. Why the tracepoints take two pointers to assign
    an integer is beyond my comprehension.

    But adding an extra case which treats the first pointer as an unsigned
    long when the second pointer is NULL including the back and forth
    type casting is just horrible.

    Convert the softirq tracepoints to take a single unsigned int argument
    for the softirq vector number and fix the call sites.

    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Acked-by: Peter Zijlstra
    Acked-by: mathieu.desnoyers@efficios.com
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt

    Thomas Gleixner
     

05 Jun, 2010

1 commit

  • The commit 80b5184cc537718122e036afe7e62d202b70d077 ("kernel/: convert cpu
    notifier to return encapsulate errno value") changed the return value of
    cpu notifier callbacks.

    Those callbacks don't return NOTIFY_BAD on failures anymore. But there
    are a few callbacks which are called directly at init time and checking
    the return value.

    I forgot to change BUG_ON checking by the direct callers in the commit.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

28 May, 2010

1 commit


11 May, 2010

1 commit

  • The addition of preemptible RCU to treercu resulted in a bit of
    confusion and inefficiency surrounding the handling of context switches
    for RCU-sched and for RCU-preempt. For RCU-sched, a context switch
    is a quiescent state, pure and simple, just like it always has been.
    For RCU-preempt, a context switch is in no way a quiescent state, but
    special handling is required when a task blocks in an RCU read-side
    critical section.

    However, the callout from the scheduler and the outer loop in ksoftirqd
    still calls something named rcu_sched_qs(), whose name is no longer
    accurate. Furthermore, when rcu_check_callbacks() notes an RCU-sched
    quiescent state, it ends up unnecessarily (though harmlessly, aside
    from the performance hit) enqueuing the current task if it happens to
    be running in an RCU-preempt read-side critical section. This not only
    increases the maximum latency of scheduler_tick(), it also needlessly
    increases the overhead of the next outermost rcu_read_unlock() invocation.

    This patch addresses this situation by separating the notion of RCU's
    context-switch handling from that of RCU-sched's quiescent states.
    The context-switch handling is covered by rcu_note_context_switch() in
    general and by rcu_preempt_note_context_switch() for preemptible RCU.
    This permits rcu_sched_qs() to handle quiescent states and only quiescent
    states. It also reduces the maximum latency of scheduler_tick(), though
    probably by much less than a microsecond. Finally, it means that tasks
    within preemptible-RCU read-side critical sections avoid incurring the
    overhead of queuing unless there really is a context switch.

    Suggested-by: Lai Jiangshan
    Acked-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Paul E. McKenney
     

04 Feb, 2010

1 commit

  • hrtimers callbacks are always done from hardirq context, either the
    jiffy tick interrupt or the hrtimer device interrupt.

    [ there is currently one exception that can still call a hrtimer
    callback from softirq, but even in that case this will still
    work correctly. ]

    Reported-by: Wei Yongjun
    Signed-off-by: Peter Zijlstra
    Cc: Yury Polyanskiy
    Tested-by: Wei Yongjun
    Acked-by: David S. Miller
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

15 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

02 Nov, 2009

1 commit

  • Currently, rcu_irq_exit() is invoked only for CONFIG_NO_HZ,
    while rcu_irq_enter() is invoked unconditionally. This patch
    moves rcu_irq_exit() out from under CONFIG_NO_HZ so that the
    calls are balanced.

    This patch has no effect on the behavior of the kernel because
    both rcu_irq_enter() and rcu_irq_exit() are empty for
    !CONFIG_NO_HZ, but the code is easier to understand if the calls
    are obviously balanced in all cases.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     

29 Oct, 2009

1 commit

  • This patch updates percpu related symbols under kernel/ and mm/ such
    that percpu symbols are unique and don't clash with local symbols.
    This serves two purposes of decreasing the possibility of global
    percpu symbol collision and allowing dropping per_cpu__ prefix from
    percpu symbols.

    * kernel/lockdep.c: s/lock_stats/cpu_lock_stats/

    * kernel/sched.c: s/init_rq_rt/init_rt_rq_var/ (any better idea?)
    s/sched_group_cpus/sched_groups/

    * kernel/softirq.c: s/ksoftirqd/run_ksoftirqd/a

    * kernel/softlockup.c: s/(*)_timestamp/softlockup_\1_ts/
    s/watchdog_task/softlockup_watchdog/
    s/timestamp/ts/ for local variables

    * kernel/time/timer_stats: s/lookup_lock/tstats_lookup_lock/

    * mm/slab.c: s/reap_work/slab_reap_work/
    s/reap_node/slab_reap_node/

    * mm/vmstat.c: local variable changed to avoid collision with vmstat_work

    Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
    which cause name clashes" patch.

    Signed-off-by: Tejun Heo
    Acked-by: (slab/vmstat) Christoph Lameter
    Reviewed-by: Christoph Lameter
    Cc: Rusty Russell
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Nick Piggin

    Tejun Heo
     

18 Sep, 2009

1 commit


23 Aug, 2009

1 commit

  • Make RCU-sched, RCU-bh, and RCU-preempt be underlying
    implementations, with "RCU" defined in terms of one of the
    three. Update the outdated rcu_qsctr_inc() names, as these
    functions no longer increment anything.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

22 Jul, 2009

1 commit

  • commit ca109491f (hrtimer: removing all ur callback modes) moved all
    hrtimer callbacks into hard interrupt context when high resolution
    timers are active. That breaks code which relied on the assumption
    that the callback happens in softirq context.

    Provide a generic infrastructure which combines tasklets and hrtimers
    together to provide an in-softirq hrtimer experience.

    Signed-off-by: Peter Zijlstra
    Cc: torvalds@linux-foundation.org
    Cc: kaber@trash.net
    Cc: David Miller
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

19 Jun, 2009

1 commit

  • Statistics for softirq doesn't exist.
    It will be helpful like statistics for interrupts.
    This patch introduces counting the number of softirq,
    which will be exported in /proc/softirqs.

    When softirq handler consumes much CPU time,
    /proc/stat is like the following.

    $ while :; do cat /proc/stat | head -n1 ; sleep 10 ; done
    cpu 88 0 408 739665 583 28 2 0 0
    cpu 450 0 1090 740970 594 28 1294 0 0
    ^^^^
    softirq

    In such a situation,
    /proc/softirqs shows us which softirq handler is invoked.
    We can see the increase rate of softirqs.

    $ cat /proc/softirqs
    CPU0 CPU1 CPU2 CPU3
    HI 0 0 0 0
    TIMER 462850 462805 462782 462718
    NET_TX 0 0 0 365
    NET_RX 2472 2 2 40
    BLOCK 0 0 381 1164
    TASKLET 0 0 0 224
    SCHED 462654 462689 462698 462427
    RCU 3046 2423 3367 3173

    $ cat /proc/softirqs
    CPU0 CPU1 CPU2 CPU3
    HI 0 0 0 0
    TIMER 463361 465077 465056 464991
    NET_TX 53 0 1 365
    NET_RX 3757 2 2 40
    BLOCK 0 0 398 1170
    TASKLET 0 0 0 224
    SCHED 463074 464318 464612 463330
    RCU 3505 2948 3947 3673

    When CPU TIME of softirq is high,
    the rates of increase is the following.
    TIMER : 220/sec : CPU1-3
    NET_TX : 5/sec : CPU0
    NET_RX : 120/sec : CPU0
    SCHED : 40-200/sec : all CPU
    RCU : 45-58/sec : all CPU

    The rates of increase in an idle mode is the following.
    TIMER : 250/sec
    SCHED : 250/sec
    RCU : 2/sec

    It seems many softirqs for receiving packets and rcu are invoked. This
    gives us help for checking system.

    Signed-off-by: Keika Kobayashi
    Reviewed-by: Hiroshi Shimamoto
    Reviewed-by: KOSAKI Motohiro
    Cc: Ingo Molnar
    Cc: Eric Dumazet
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     

13 Jun, 2009

1 commit

  • Rationale: kmemcheck needs to be able to schedule a tasklet without
    touching any dynamically allocated memory _at_ _all_ (since that would
    lead to a recursive page fault). This tasklet is used for writing the
    error reports to the kernel log.

    The new scheduling function avoids touching any other tasklets by
    inserting the new tasklist as the head of the "tasklet_hi" list instead
    of on the tail.

    Also don't wake up the softirq thread lest the scheduler access some
    tracked memory and we go down with a recursive page fault.

    In this case, we'd better just wait for the maximum time of 1/HZ for the
    message to appear.

    Signed-off-by: Vegard Nossum

    Vegard Nossum
     

11 Jun, 2009

1 commit

  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
    Revert "x86, bts: reenable ptrace branch trace support"
    tracing: do not translate event helper macros in print format
    ftrace/documentation: fix typo in function grapher name
    tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
    tracing: add protection around module events unload
    tracing: add trace_seq_vprint interface
    tracing: fix the block trace points print size
    tracing/events: convert block trace points to TRACE_EVENT()
    ring-buffer: fix ret in rb_add_time_stamp
    ring-buffer: pass in lockdep class key for reader_lock
    tracing: add annotation to what type of stack trace is recorded
    tracing: fix multiple use of __print_flags and __print_symbolic
    tracing/events: fix output format of user stack
    tracing/events: fix output format of kernel stack
    tracing/trace_stack: fix the number of entries in the header
    ring-buffer: discard timestamps that are at the start of the buffer
    ring-buffer: try to discard unneeded timestamps
    ring-buffer: fix bug in ring_buffer_discard_commit
    ftrace: do not profile functions when disabled
    tracing: make trace pipe recognize latency format flag
    ...

    Linus Torvalds
     

07 May, 2009

1 commit


29 Apr, 2009

1 commit

  • "tracing: create automated trace defines" causes this compile error on s390,
    as reported by Sachin Sant against linux-next:

    kernel/built-in.o: In function `__do_softirq':
    (.text+0x1c680): undefined reference to `__tracepoint_softirq_entry'

    This happens because the definitions of the softirq tracepoints were moved
    from kernel/softirq.c to kernel/irq/handle.c. Since s390 doesn't support
    generic hardirqs handle.c doesn't get compiled and the definitions are
    missing.

    So move the tracepoints to softirq.c again.

    [ Impact: fix build failure on s390 ]

    Reported-by: Sachin Sant
    Signed-off-by: Heiko Carstens
    Cc: Steven Rostedt
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

28 Apr, 2009

1 commit

  • This simplifies the node awareness of the code. All our allocators
    only deal with a NUMA node ID locality not with CPU ids anyway - so
    there's no need to maintain (and transform) a CPU id all across the
    IRq layer.

    v2: keep move_irq_desc related

    [ Impact: cleanup, prepare IRQ code to be NUMA-aware ]

    Signed-off-by: Yinghai Lu
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

17 Apr, 2009

1 commit


15 Apr, 2009

2 commits

  • Impact: clean up

    Create a sub directory in include/trace called events to keep the
    trace point headers in their own separate directory. Only headers that
    declare trace points should be defined in this directory.

    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Neil Horman
    Cc: Zhao Lei
    Cc: Eduard - Gabriel Munteanu
    Cc: Pekka Enberg
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • This patch lowers the number of places a developer must modify to add
    new tracepoints. The current method to add a new tracepoint
    into an existing system is to write the trace point macro in the
    trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
    DECLARE_TRACE, then they must add the same named item into the C file
    with the macro DEFINE_TRACE(name) and then add the trace point.

    This change cuts out the needing to add the DEFINE_TRACE(name).
    Every file that uses the tracepoint must still include the trace/.h
    file, but the one C file must also add a define before the including
    of that file.

    #define CREATE_TRACE_POINTS
    #include

    This will cause the trace/mytrace.h file to also produce the C code
    necessary to implement the trace point.

    Note, if more than one trace/.h is used to create the C code
    it is best to list them all together.

    #define CREATE_TRACE_POINTS
    #include
    #include
    #include

    Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
    the cleaner solution of the define above the includes over my first
    design to have the C code include a "special" header.

    This patch converts sched, irq and lockdep and skb to use this new
    method.

    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Neil Horman
    Cc: Zhao Lei
    Cc: Eduard - Gabriel Munteanu
    Cc: Pekka Enberg
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

07 Apr, 2009

1 commit


06 Apr, 2009

1 commit

  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
    tracing, net: fix net tree and tracing tree merge interaction
    tracing, powerpc: fix powerpc tree and tracing tree interaction
    ring-buffer: do not remove reader page from list on ring buffer free
    function-graph: allow unregistering twice
    trace: make argument 'mem' of trace_seq_putmem() const
    tracing: add missing 'extern' keywords to trace_output.h
    tracing: provide trace_seq_reserve()
    blktrace: print out BLK_TN_MESSAGE properly
    blktrace: extract duplidate code
    blktrace: fix memory leak when freeing struct blk_io_trace
    blktrace: fix blk_probes_ref chaos
    blktrace: make classic output more classic
    blktrace: fix off-by-one bug
    blktrace: fix the original blktrace
    blktrace: fix a race when creating blk_tree_root in debugfs
    blktrace: fix timestamp in binary output
    tracing, Text Edit Lock: cleanup
    tracing: filter fix for TRACE_EVENT_FORMAT events
    ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
    x86: kretprobe-booster interrupt emulation code fix
    ...

    Fix up trivial conflicts in
    arch/parisc/include/asm/ftrace.h
    include/linux/memory.h
    kernel/extable.c
    kernel/module.c

    Linus Torvalds
     

04 Apr, 2009

1 commit


31 Mar, 2009

2 commits

  • It appears I inadvertly introduced rq->lock recursion to the
    hrtimer_start() path when I delegated running already expired
    timers to softirq context.

    This patch fixes it by introducing a __hrtimer_start_range_ns()
    method that will not use raise_softirq_irqoff() but
    __raise_softirq_irqoff() which avoids the wakeup.

    It then also changes schedule() to check for pending softirqs and
    do the wakeup then, I'm not quite sure I like this last bit, nor
    am I convinced its really needed.

    Signed-off-by: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: paulus@samba.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Conflicts:
    lib/Kconfig.debug

    Ingo Molnar
     

28 Mar, 2009

1 commit


13 Mar, 2009

7 commits


10 Mar, 2009

2 commits


06 Mar, 2009

1 commit


05 Mar, 2009

1 commit

  • If a machine is flooded by network frames, a cpu can loop
    100% of its time inside ksoftirqd() without calling schedule().
    This can delay RCU grace period to insane values.

    Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem.

    Paul: "This regression was a result of the recent change from
    "schedule()" to "cond_resched()", which got rid of that quiescent
    state in the common case where a reschedule is not needed".

    Signed-off-by: Eric Dumazet
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     

25 Feb, 2009

1 commit

  • Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework
    the code so that we can use CSD_FLAG_LOCK for both purposes.

    Signed-off-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Linus Torvalds
    Cc: Nick Piggin
    Cc: Jens Axboe
    Cc: "Paul E. McKenney"
    Cc: Rusty Russell
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

23 Jan, 2009

1 commit

  • Impact: fix to preempt trace triggering lockdep check_flag failure

    In local_bh_disable, the use of add_preempt_count causes the
    preempt tracer to start recording the time preemption is off.
    But because it already modified the preempt_count to show
    softirqs disabled, and before it called the lockdep code to
    handle this, it causes a state that lockdep can not handle.

    The preempt tracer will reset the ring buffer on start of a trace,
    and the ring buffer reset code does a spin_lock_irqsave. This
    calls into lockdep and lockdep will fail when it detects the
    invalid state of having softirqs disabled but the internal
    current->softirqs_enabled is still set.

    The fix is to manually add the SOFTIRQ_OFFSET to preempt count
    and call the preempt tracer code outside the lockdep critical
    area.

    Thanks to Peter Zijlstra for suggesting this solution.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

13 Jan, 2009

1 commit