21 Mar, 2012

3 commits

  • Pull timer changes for v3.4 from Ingo Molnar

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    ntp: Fix integer overflow when setting time
    math: Introduce div64_long
    cs5535-clockevt: Allow the MFGPT IRQ to be shared
    cs5535-clockevt: Don't ignore MFGPT on SMP-capable kernels
    x86/time: Eliminate unused irq0_irqs counter
    clocksource: scx200_hrt: Fix the build
    x86/tsc: Reduce the TSC sync check time for core-siblings
    timer: Fix bad idle check on irq entry
    nohz: Remove ts->inidle checks before restarting the tick
    nohz: Remove update_ts_time_stat from tick_nohz_start_idle
    clockevents: Leave the broadcast device in shutdown mode when not needed
    clocksource: Load the ACPI PM clocksource asynchronously
    clocksource: scx200_hrt: Convert scx200 to use clocksource_register_hz
    clocksource: Get rid of clocksource_calc_mult_shift()
    clocksource: dbx500: convert to clocksource_register_hz()
    clocksource: scx200_hrt: use pr_ instead of printk
    time: Move common updates to a function
    time: Reorder so the hot data is together
    time: Remove most of xtime_lock usage in timekeeping.c
    ntp: Add ntp_lock to replace xtime_locking
    ...

    Linus Torvalds
     
  • Pull scheduler changes for v3.4 from Ingo Molnar

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    printk: Make it compile with !CONFIG_PRINTK
    sched/x86: Fix overflow in cyc2ns_offset
    sched: Fix nohz load accounting -- again!
    sched: Update yield() docs
    printk/sched: Introduce special printk_sched() for those awkward moments
    sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
    sched: Cleanup cpu_active madness
    sched: Fix load-balance wreckage
    sched: Clean up parameter passing of proc_sched_autogroup_set_nice()
    sched: Ditch per cgroup task lists for load-balancing
    sched: Rename load-balancing fields
    sched: Move load-balancing arguments into helper struct
    sched/rt: Do not submit new work when PI-blocked
    sched/rt: Prevent idle task boosting
    sched/wait: Add __wake_up_all_locked() API
    sched/rt: Document scheduler related skip-resched-check sites
    sched/rt: Use schedule_preempt_disabled()
    sched/rt: Add schedule_preempt_disabled()
    sched/rt: Do not throttle when PI boosting
    sched/rt: Keep period timer ticking when rt throttling is active
    ...

    Linus Torvalds
     
  • Pull perf events changes for v3.4 from Ingo Molnar:

    - New "hardware based branch profiling" feature both on the kernel and
    the tooling side, on CPUs that support it. (modern x86 Intel CPUs
    with the 'LBR' hardware feature currently.)

    This new feature is basically a sophisticated 'magnifying glass' for
    branch execution - something that is pretty difficult to extract from
    regular, function histogram centric profiles.

    The simplest mode is activated via 'perf record -b', and the result
    looks like this in perf report:

    $ perf record -b any_call,u -e cycles:u branchy

    $ perf report -b --sort=symbol
    52.34% [.] main [.] f1
    24.04% [.] f1 [.] f3
    23.60% [.] f1 [.] f2
    0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
    0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
    0.01% [k] _IO_vfprintf_internal [k] strchrnul
    0.01% [k] __printf [k] _IO_vfprintf_internal
    0.01% [k] main [k] __printf

    This output shows from/to branch columns and shows the highest
    percentage (from,to) jump combinations - i.e. the most likely taken
    branches in the system. "branches" can also include function calls
    and any other synchronous and asynchronous transitions of the
    instruction pointer that are not 'next instruction' - such as system
    calls, traps, interrupts, etc.

    This feature comes with (hopefully intuitive) flat ascii and TUI
    support in perf report.

    - Various 'perf annotate' visual improvements for us assembly junkies.
    It will now recognize function calls in the TUI and by hitting enter
    you can follow the call (recursively) and back, amongst other
    improvements.

    - Multiple threads/processes recording support in perf record, perf
    stat, perf top - which is activated via a comma-list of PIDs:

    perf top -p 21483,21485
    perf stat -p 21483,21485 -ddd
    perf record -p 21483,21485

    - Support for per UID views, via the --uid parameter to perf top, perf
    report, etc. For example 'perf top --uid mingo' will only show the
    tasks that I am running, excluding other users, root, etc.

    - Jump label restructurings and improvements - this includes the
    factoring out of the (hopefully much clearer) include/linux/static_key.h
    generic facility:

    struct static_key key = STATIC_KEY_INIT_FALSE;

    ...

    if (static_key_false(&key))
            do unlikely code
    else
            do likely code

    ...
    static_key_slow_inc(&key);
    ...
    static_key_slow_dec(&key);
    ...

    The static_key_false() branch will be generated into the code with as
    little impact to the likely code path as possible. The static_key_slow_*()
    APIs flip the branch via live kernel code patching (a fuller sketch of the
    pattern appears at the end of this entry).

    This facility can now be used more widely within the kernel to
    micro-optimize hot branches whose likelihood matches the static-key
    usage and fast/slow cost patterns.

    - SW function tracer improvements: perf support and filtering support.

    - Various hardenings of the perf.data ABI, to make older perf.data's
    smoother on newer tool versions, to make new features integrate more
    smoothly, to support cross-endian recording/analyzing workflows
    better, etc.

    - Restructuring of the kprobes code, the splitting out of 'optprobes',
    and a corner case bugfix.

    - Allow the tracing of kernel console output (printk).

    - Improvements/fixes to user-space RDPMC support, allowing user-space
    self-profiling code to extract PMU counts without performing any
    system calls, while playing nice with the kernel side.

    - 'perf bench' improvements

    - ... and lots of internal restructurings, cleanups and fixes that made
    these features possible. And, as usual, this list is incomplete, as there
    were also lots of other improvements.

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
    perf report: Fix annotate double quit issue in branch view mode
    perf report: Remove duplicate annotate choice in branch view mode
    perf/x86: Prettify pmu config literals
    perf report: Enable TUI in branch view mode
    perf report: Auto-detect branch stack sampling mode
    perf record: Add HEADER_BRANCH_STACK tag
    perf record: Provide default branch stack sampling mode option
    perf tools: Make perf able to read files from older ABIs
    perf tools: Fix ABI compatibility bug in print_event_desc()
    perf tools: Enable reading of perf.data files from different ABI rev
    perf: Add ABI reference sizes
    perf report: Add support for taken branch sampling
    perf record: Add support for sampling taken branch
    perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
    x86/kprobes: Split out optprobe related code to kprobes-opt.c
    x86/kprobes: Fix a bug which can modify kernel code permanently
    x86/kprobes: Fix instruction recovery on optimized path
    perf: Add callback to flush branch_stack on context switch
    perf: Disable PERF_SAMPLE_BRANCH_* when not supported
    perf/x86: Add LBR software filter support for Intel CPUs
    ...

    Linus Torvalds
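
    To make the static_key pattern above concrete, here is a minimal sketch of
    how a kernel-side user might wire it up. The tracing_enabled key and the
    do_tracing()/do_fast_path() helpers are illustrative placeholders, not code
    from this pull; only the static_key API calls are taken from the text above.

        #include <linux/types.h>
        #include <linux/static_key.h>

        static struct static_key tracing_enabled = STATIC_KEY_INIT_FALSE;

        static void hot_path(void)
        {
                if (static_key_false(&tracing_enabled))
                        do_tracing();           /* unlikely branch, patched in at runtime */
                else
                        do_fast_path();         /* likely branch, falls straight through */
        }

        /* A slow path (e.g. a sysctl or debugfs write) flips the branch by
         * live-patching the jump instruction: */
        static void set_tracing(bool on)
        {
                if (on)
                        static_key_slow_inc(&tracing_enabled);
                else
                        static_key_slow_dec(&tracing_enabled);
        }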
     

06 Mar, 2012

1 commit


01 Mar, 2012

2 commits

  • Create a distinction between scheduler related preempt_enable_no_resched()
    calls and the nearly one hundred other places in the kernel that do not
    want to reschedule, for one reason or another.

    This distinction matters for -rt, where the scheduler and the non-scheduler
    preempt models (and checks) are different. For upstream it's purely
    documentational.

    Signed-off-by: Thomas Gleixner
    Link: http://lkml.kernel.org/n/tip-gs88fvx2mdv5psnzxnv575ke@git.kernel.org
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Coccinelle based conversion.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-24swm5zut3h9c4a6s46x8rws@git.kernel.org
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
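
    The two entries above are easier to follow with the helper spelled out. A
    sketch of the open-coded pattern the Coccinelle conversion replaces, and of
    schedule_preempt_disabled() itself as described in the changelog (the exact
    in-tree implementation may differ in detail):

        /* Open-coded pattern before the conversion: */
        preempt_enable_no_resched();
        schedule();
        preempt_disable();

        /* After the conversion, call sites collapse to: */
        schedule_preempt_disabled();

        /* ...which is essentially: */
        void __sched schedule_preempt_disabled(void)
        {
                sched_preempt_enable_no_resched();      /* the scheduler-related variant */
                schedule();
                preempt_disable();
        }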
     

15 Feb, 2012

1 commit

  • idle_cpu() is called on irq entry to guess if we need to call
    tick_check_idle(). This way we can catch up with jiffies if the tick
    was stopped, stop accounting idle time during the interrupt and
    maintain the sched clock if it is unstable.

    But if we are going to exit the idle loop to schedule a new task (ie:
    if we have a task in the runqueue or a remotely enqueued ttwu to
    perform), the idle_cpu() check will return 0 such that we miss the
    call to tick_check_idle() for all interrupts happening before we
    schedule the new task.

    As a result these interrupts and the softirqs coming along may deal
    with stale jiffies values, bad sched clock values, and won't subtract
    their time from the idle time accounting.

    Fix this by using is_idle_task() instead, which strictly checks that we
    are running the idle task, without caring about the fact that we are
    going to schedule a task soon.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: John Stultz
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/1327427984-23282-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Thomas Gleixner

    Frederic Weisbecker
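
    A sketch of the resulting check in irq_enter(), per the description above
    (abridged; the surrounding code and comments are approximate, not quoted
    from the patch):

        void irq_enter(void)
        {
                int cpu = smp_processor_id();

                rcu_irq_enter();
                if (is_idle_task(current) && !in_interrupt()) {
                        /* was: idle_cpu(cpu), which already reads 0 once a task
                         * is enqueued, even though we are still the idle task */
                        local_bh_disable();
                        tick_check_idle(cpu);
                        _local_bh_enable();
                }

                __irq_enter();
        }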
     

03 Feb, 2012

1 commit

    The __raise_softirq_irqoff() function contains a tracepoint. As tracepoints
    in headers can cause issues, not to mention bloat the kernel when they live
    in a static inline, it is best to move the function that contains the
    tracepoint out of the header and into softirq.c.

    Link: http://lkml.kernel.org/r/20120118120711.GB14863@elte.hu

    Suggested-by: Ingo Molnar
    Signed-off-by: Steven Rostedt

    Steven Rostedt
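
    The moved function is tiny; a sketch of what now lives out of line in
    kernel/softirq.c rather than in the header:

        void __raise_softirq_irqoff(unsigned int nr)
        {
                trace_softirq_raise(nr);
                or_softirq_pending(1UL << nr);
        }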
     

12 Dec, 2011

2 commits

    On the irq exit path, tick_nohz_irq_exit() may raise a softirq, which
    leads to the wakeup path and to select_task_rq_fair(), which uses RCU
    to iterate over the scheduler domains.

    This is an illegal use of RCU because we may be in RCU
    extended quiescent state if we interrupted an RCU-idle
    window in the idle loop:

    [ 132.978883] ===============================
    [ 132.978883] [ INFO: suspicious RCU usage. ]
    [ 132.978883] -------------------------------
    [ 132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
    [ 132.978883]
    [ 132.978883] other info that might help us debug this:
    [ 132.978883]
    [ 132.978883]
    [ 132.978883] rcu_scheduler_active = 1, debug_locks = 0
    [ 132.978883] RCU used illegally from extended quiescent state!
    [ 132.978883] 2 locks held by swapper/0:
    [ 132.978883] #0: (&p->pi_lock){-.-.-.}, at: [] try_to_wake_up+0x39/0x2f0
    [ 132.978883] #1: (rcu_read_lock){.+.+..}, at: [] select_task_rq_fair+0x6a/0xec0
    [ 132.978883]
    [ 132.978883] stack backtrace:
    [ 132.978883] Pid: 0, comm: swapper Tainted: G W 3.0.0+ #178
    [ 132.978883] Call Trace:
    [ 132.978883] [] lockdep_rcu_suspicious+0xe6/0x100
    [ 132.978883] [] select_task_rq_fair+0x749/0xec0
    [ 132.978883] [] ? select_task_rq_fair+0x6a/0xec0
    [ 132.978883] [] ? do_raw_spin_lock+0x54/0x150
    [ 132.978883] [] ? trace_hardirqs_on+0xd/0x10
    [ 132.978883] [] try_to_wake_up+0xd3/0x2f0
    [ 132.978883] [] ? ktime_get+0x68/0xf0
    [ 132.978883] [] wake_up_process+0x15/0x20
    [ 132.978883] [] raise_softirq_irqoff+0x65/0x110
    [ 132.978883] [] __hrtimer_start_range_ns+0x415/0x5a0
    [ 132.978883] [] ? do_raw_spin_unlock+0x5e/0xb0
    [ 132.978883] [] hrtimer_start+0x18/0x20
    [ 132.978883] [] tick_nohz_stop_sched_tick+0x393/0x450
    [ 132.978883] [] irq_exit+0xd2/0x100
    [ 132.978883] [] do_IRQ+0x66/0xe0
    [ 132.978883] [] common_interrupt+0x13/0x13
    [ 132.978883] [] ? native_safe_halt+0xb/0x10
    [ 132.978883] [] ? trace_hardirqs_on+0xd/0x10
    [ 132.978883] [] default_idle+0xba/0x370
    [ 132.978883] [] amd_e400_idle+0x5e/0x130
    [ 132.978883] [] cpu_idle+0xb6/0x120
    [ 132.978883] [] rest_init+0xef/0x150
    [ 132.978883] [] ? rest_init+0x52/0x150
    [ 132.978883] [] start_kernel+0x3da/0x3e5
    [ 132.978883] [] x86_64_start_reservations+0x131/0x135
    [ 132.978883] [] x86_64_start_kernel+0x103/0x112

    Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • The tick_nohz_stop_sched_tick() function, which tries to delay
    the next timer tick as long as possible, can be called from two
    places:

    - From the idle loop to start the dyntick idle mode
    - From interrupt exit if we have interrupted the dyntick
    idle mode, so that we reprogram the next tick event in
    case the irq changed some internal state that requires this
    action.

    There are only a few minor differences between the two, handled by that
    function and driven by the per-cpu ts->inidle variable and the inidle
    parameter. Together these guarantee that we only update the dyntick mode
    on irq exit if we actually interrupted the dyntick idle mode, and that we
    enter the RCU extended quiescent state from idle loop entry only.

    Split this function into:

    - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
    dynticks idle mode unconditionally if it can, and enters into RCU
    extended quiescent state.

    - tick_nohz_irq_exit() which only updates the dynticks idle mode
    when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

    To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
    into tick_nohz_idle_exit().

    This simplifies the code and micro-optimizes the irq exit path (no need
    for local_irq_save there). It also prepares for the split between the
    dynticks and RCU extended quiescent state logic, which we'll need to
    further fix illegal uses of RCU in extended quiescent states in the idle
    loop (the resulting idle-loop calling convention is sketched at the end
    of this entry).

    Signed-off-by: Frederic Weisbecker
    Cc: Mike Frysinger
    Cc: Guan Xuetao
    Cc: David Miller
    Cc: Chris Metcalf
    Cc: Hans-Christian Egtvedt
    Cc: Ralf Baechle
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Russell King
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
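
    After this split, an arch idle loop is expected to pair the calls roughly
    as sketched below. This is a sketch of the calling convention described
    above, not quoted code: arch loops differ, arch_safe_halt() stands in for
    the arch idle instruction, and whether tick_nohz_idle_enter() itself enters
    the RCU extended quiescent state depends on the follow-up split mentioned
    in the entry.

        /* arch cpu_idle() loop, sketched: */
        while (1) {
                tick_nohz_idle_enter();         /* sets ts->inidle, stops the tick if possible */
                rcu_idle_enter();
                while (!need_resched())
                        arch_safe_halt();       /* stand-in for the arch idle instruction */
                rcu_idle_exit();
                tick_nohz_idle_exit();          /* restarts the tick, accounts idle time */
                schedule();
        }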
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

21 Jul, 2011

1 commit

    The rcu_read_unlock_special() function relies on in_irq() to exclude
    scheduler activity from interrupt level. This fails because irq_exit()
    can invoke the scheduler after clearing the preempt_count() bits that
    in_irq() uses to determine that it is at interrupt level. This situation
    can result in failures as follows:

    $task                       IRQ                     SoftIRQ

    rcu_read_lock()

    /* do stuff */

    <preempt> |= UNLOCK_BLOCKED

    rcu_read_unlock()
      --t->rcu_read_lock_nesting

                                irq_enter();
                                /* do stuff, don't use RCU */
                                irq_exit();
                                  sub_preempt_count(IRQ_EXIT_OFFSET);
                                invoke_softirq()

                                                        ttwu();
                                                          spin_lock_irq(&pi->lock)
                                                          rcu_read_lock();
                                                          /* do stuff */
                                                          rcu_read_unlock();
                                                            rcu_read_unlock_special()
                                                              rcu_report_exp_rnp()
                                                                ttwu()
                                                                  spin_lock_irq(&pi->lock) /* deadlock */

      rcu_read_unlock_special(t);

    Ed can trigger this 'easily' because invoke_softirq() immediately
    does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff
    first, but even without that the above happens.

    Cure this by also excluding softirqs from the
    rcu_read_unlock_special() handler and ensuring the force_irqthreads
    ksoftirqd/# wakeup is done from full softirq context.

    [ Alternatively, delaying the ->rcu_read_lock_nesting decrement
    until after the special handling would make the thing more robust
    in the face of interrupts as well. And there is a separate patch
    for that. ]

    Cc: Thomas Gleixner
    Reported-and-tested-by: Ed Tomlinson
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Peter Zijlstra
     

15 Jun, 2011

1 commit

    Commit a26ac2455ffcf3 (rcu: move TREE_RCU from softirq to kthread)
    introduced a performance regression. In an AIM7 test, this commit degraded
    performance by about 40%.

    The commit runs rcu callbacks in a kthread instead of softirq. We observed
    a high rate of context switching caused by this. Our test system has
    64 CPUs and HZ is 1000, so we saw more than 64k context switches per
    second caused by RCU's per-CPU kthread. A trace showed that most of
    the time the RCU per-CPU kthread doesn't actually handle any callbacks,
    but instead just does a very small amount of work handling grace periods.
    This means that RCU's per-CPU kthreads are making the scheduler do quite
    a bit of work in order to allow a very small amount of RCU-related
    processing to be done.

    Alex Shi's analysis determined that this slowdown is due to lock
    contention within the scheduler. Unfortunately, as Peter Zijlstra points
    out, the scheduler's real-time semantics require global action, which
    means that this contention is inherent in real-time scheduling. (Yes,
    perhaps someone will come up with a workaround -- otherwise, -rt is not
    going to do well on large SMP systems -- but this patch will work around
    this issue in the meantime. And "the meantime" might well be forever.)

    This patch therefore re-introduces softirq processing to RCU, but only
    for core RCU work. RCU callbacks are still executed in kthread context,
    so that only a small amount of RCU work runs in softirq context in the
    common case. This should minimize ksoftirqd execution, allowing us to
    skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

    Signed-off-by: Shaohua Li
    Tested-by: "Alex,Shi"
    Signed-off-by: Paul E. McKenney

    Shaohua Li
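
    The mechanism is small: core grace-period work is handed back to softirq
    while the callbacks themselves stay in the kthread. A sketch of the helper
    this change introduces, as described above (exact in-tree code may differ):

        /* Kick the RCU core machinery in softirq context instead of waking
         * the per-CPU kthread and paying a context switch for it: */
        static void invoke_rcu_core(void)
        {
                raise_softirq(RCU_SOFTIRQ);
        }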
     

06 May, 2011

1 commit

  • If RCU priority boosting is to be meaningful, callback invocation must
    be boosted in addition to preempted RCU readers. Otherwise, in presence
    of CPU real-time threads, the grace period ends, but the callbacks don't
    get invoked. If the callbacks don't get invoked, the associated memory
    doesn't get freed, so the system is still subject to OOM.

    But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
    moves the callback invocations to a kthread, which can be boosted easily.

    Also add comments and properly synchronize all accesses to
    rcu_cpu_kthread_task, as suggested by Lai Jiangshan.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • ksoftirqd, kworker, migration, and pktgend kthreads can be created with
    kthread_create_on_node(), to get proper NUMA affinities for their stack and
    task_struct.

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Reviewed-by: Andi Kleen
    Acked-by: Rusty Russell
    Acked-by: Tejun Heo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
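
    For ksoftirqd this amounts to creating the thread on the node of the CPU it
    will serve; a sketch of the call (argument names approximate, taken from
    the CPU-hotplug callback context, not quoted verbatim):

        struct task_struct *p;

        p = kthread_create_on_node(run_ksoftirqd, hcpu,
                                   cpu_to_node(hotcpu),
                                   "ksoftirqd/%d", hotcpu);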
     

16 Mar, 2011

1 commit

  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
    x86: Enable forced interrupt threading support
    x86: Mark low level interrupts IRQF_NO_THREAD
    x86: Use generic show_interrupts
    x86: ioapic: Avoid redundant lookup of irq_cfg
    x86: ioapic: Use new move_irq functions
    x86: Use the proper accessors in fixup_irqs()
    x86: ioapic: Use irq_data->state
    x86: ioapic: Simplify irq chip and handler setup
    x86: Cleanup the genirq name space
    genirq: Add chip flag to force mask on suspend
    genirq: Add desc->irq_data accessor
    genirq: Add comments to Kconfig switches
    genirq: Fixup fasteoi handler for oneshot mode
    genirq: Provide forced interrupt threading
    sched: Switch wait_task_inactive to schedule_hrtimeout()
    genirq: Add IRQF_NO_THREAD
    genirq: Allow shared oneshot interrupts
    genirq: Prepare the handling of shared oneshot interrupts
    genirq: Make warning in handle_percpu_event useful
    x86: ioapic: Move trigger defines to io_apic.h
    ...

    Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
    space changes clashing with the Xen cleanups. The set_irq_msi() had
    moved to xen_bind_pirq_msi_to_irq().

    Linus Torvalds
     

26 Feb, 2011

1 commit

  • Add a commandline parameter "threadirqs" which forces all interrupts except
    those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to
    allow retrieving better debug data from crashing interrupt handlers. If
    "threadirqs" is not enabled on the kernel command line, then there is no
    impact in the interrupt hotpath.

    Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
    marking the interrupts which can't be threaded IRQF_NO_THREAD. All
    interrupts which have IRQF_TIMER set are implicitly marked
    IRQF_NO_THREAD. Also all PER_CPU interrupts are excluded.

    Forced threading hard interrupts also forces all soft interrupt
    handling into thread context.

    When enabled it might slow down things a bit, but for debugging problems in
    interrupt code it's a reasonable penalty as it does not immediately
    crash and burn the machine when an interrupt handler is buggy.

    Some test results on a Core2Duo machine:

    Cache cold run of:
    # time git grep irq_desc

                 non-threaded        threaded
    real         1m18.741s           1m19.061s
    user         0m1.874s            0m1.757s
    sys          0m5.843s            0m5.427s

    # iperf -c server
    non-threaded
    [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec
    threaded
    [ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    LKML-Reference:

    Thomas Gleixner
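
    Booting with threadirqs on the kernel command line enables the forced
    mode; a driver whose handler must never run in a thread opts out when
    requesting the line. A sketch (the handler, name and device pointer are
    illustrative, not from this commit):

        /* A timer-tick style interrupt that must stay in hard irq context
         * even when "threadirqs" is given: */
        ret = request_irq(irq, my_timer_isr, IRQF_TIMER | IRQF_NO_THREAD,
                          "my-timer", dev);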
     

09 Feb, 2011

1 commit

  • ksoftirqd() calls do_softirq() which switches stacks on several
    architectures. That makes no sense at all. ksoftirqd's stack is
    sufficient.

    Call __do_softirq() directly.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Acked-by: David Miller
    Cc: Paul Mundt
    Reviewed-by: Frank Rowand
    LKML-Reference:

    Thomas Gleixner
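
    The change itself is a single call site inside run_ksoftirqd()'s main
    loop, roughly:

        /* in run_ksoftirqd(), sketched: */
        if (local_softirq_pending())
                __do_softirq();         /* was do_softirq(), which may switch stacks */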
     

26 Jan, 2011

1 commit


14 Jan, 2011

1 commit

    An arch which needs USE_GENERIC_SMP_HELPERS has to select
    USE_GENERIC_SMP_HELPERS rather than leaving the choice to the user,
    since such arches don't provide their own implementations.

    Also, move on_each_cpu() to kernel/smp.c; it is strange to put it in
    kernel/softirq.c.

    For an arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g. blackfin,
    only on_each_cpu() is compiled.

    Signed-off-by: Amerigo Wang
    Cc: David Howells
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     

08 Jan, 2011

1 commit

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
    gameport: use this_cpu_read instead of lookup
    x86: udelay: Use this_cpu_read to avoid address calculation
    x86: Use this_cpu_inc_return for nmi counter
    x86: Replace uses of current_cpu_data with this_cpu ops
    x86: Use this_cpu_ops to optimize code
    vmstat: User per cpu atomics to avoid interrupt disable / enable
    irq_work: Use per cpu atomics instead of regular atomics
    cpuops: Use cmpxchg for xchg to avoid lock semantics
    x86: this_cpu_cmpxchg and this_cpu_xchg operations
    percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
    percpu,x86: relocate this_cpu_add_return() and friends
    connector: Use this_cpu operations
    xen: Use this_cpu_inc_return
    taskstats: Use this_cpu_ops
    random: Use this_cpu_inc_return
    fs: Use this_cpu_inc_return in buffer.c
    highmem: Use this_cpu_xx_return() operations
    vmstat: Use this_cpu_inc_return for vm statistics
    x86: Support for this_cpu_add, sub, dec, inc_return
    percpu: Generic support for this_cpu_add, sub, dec, inc_return
    ...

    Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
    as per Tejun.

    Linus Torvalds
     

07 Jan, 2011

1 commit


17 Dec, 2010

1 commit

  • __get_cpu_var() can be replaced with this_cpu_read and will then use a
    single read instruction with implied address calculation to access the
    correct per cpu instance.

    However, the address of a per cpu variable passed to __this_cpu_read()
    cannot be determined (since it's an implied address conversion through
    segment prefixes). Therefore apply this only to uses of __get_cpu_var
    where the address of the variable is not used.

    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: Thomas Gleixner
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
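
    The conversion pattern described above, using the per-cpu ksoftirqd task
    pointer from kernel/softirq.c as an example (the surrounding function is
    abridged; only the changed line is shown):

        /* before: */
        struct task_struct *tsk = __get_cpu_var(ksoftirqd);

        /* after: a single percpu-addressed read, no address computation */
        struct task_struct *tsk = __this_cpu_read(ksoftirqd);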
     

18 Nov, 2010

1 commit


28 Oct, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf python scripting: Add futex-contention script
    perf python scripting: Fixup cut'n'paste error in sctop script
    perf scripting: Shut up 'perf record' final status
    perf record: Remove newline character from perror() argument
    perf python scripting: Support fedora 11 (audit 1.7.17)
    perf python scripting: Improve the syscalls-by-pid script
    perf python scripting: print the syscall name on sctop
    perf python scripting: Improve the syscalls-counts script
    perf python scripting: Improve the failed-syscalls-by-pid script
    kprobes: Remove redundant text_mutex lock in optimize
    x86/oprofile: Fix uninitialized variable use in debug printk
    tracing: Fix 'faild' -> 'failed' typo
    perf probe: Fix format specified for Dwarf_Off parameter
    perf trace: Fix detection of script extension
    perf trace: Use $PERF_EXEC_PATH in canned report scripts
    perf tools: Document event modifiers
    perf tools: Remove direct slang.h include
    perf_events: Fix for transaction recovery in group_sched_in()
    perf_events: Revert: Fix transaction recovery in group_sched_in()
    perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
    ...

    Linus Torvalds
     

24 Oct, 2010

1 commit


23 Oct, 2010

2 commits

    Andrew Morton pointed out that almost all sched_setscheduler() callers
    use fixed parameters and can be converted to static. It reduces runtime
    memory use a little (the resulting caller pattern is sketched at the end
    of this list).

    Signed-off-by: KOSAKI Motohiro
    Reported-by: Andrew Morton
    Acked-by: James Morris
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     
  • … 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    softirqs: Make wakeup_softirqd static

    * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, asm: Restore parentheses around one pushl_cfi argument
    x86, asm: Fix ancient-GAS workaround
    x86, asm: Fix CFI macro invocations to deal with shortcomings in gas

    * 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA

    * 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: HPET force enable for CX700 / VIA Epia LT

    * 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, setup: Use string copy operation to optimze copy in kernel compression

    * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()

    * 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.

    Linus Torvalds
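
    Returning to the sched_setscheduler() entry above, the resulting caller
    pattern looks roughly like this (the priority value is illustrative):

        static const struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };

        sched_setscheduler(p, SCHED_FIFO, &param);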
     

22 Oct, 2010

1 commit

  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits)
    apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
    apic, x86: Check if EILVT APIC registers are available (AMD only)
    x86: ioapic: Call free_irte only if interrupt remapping enabled
    arm: Use ARCH_IRQ_INIT_FLAGS
    genirq, ARM: Fix boot on ARM platforms
    genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build
    x86: Switch sparse_irq allocations to GFP_KERNEL
    genirq: Switch sparse_irq allocator to GFP_KERNEL
    genirq: Make sparse_lock a mutex
    x86: lguest: Use new irq allocator
    genirq: Remove the now unused sparse irq leftovers
    genirq: Sanitize dynamic irq handling
    genirq: Remove arch_init_chip_data()
    x86: xen: Sanitise sparse_irq handling
    x86: Use sane enumeration
    x86: uv: Clean up the direct access to irq_desc
    x86: Make io_apic.c local functions static
    genirq: Remove irq_2_iommu
    x86: Speed up the irq_remapped check in hot pathes
    intr_remap: Simplify the code further
    ...

    Fix up trivial conflicts in arch/x86/Kconfig

    Linus Torvalds
     

21 Oct, 2010

1 commit

  • With the addition of trace_softirq_raise() the softirq tracepoint got
    even more convoluted. Why the tracepoints take two pointers to assign
    an integer is beyond my comprehension.

    But adding an extra case which treats the first pointer as an unsigned
    long when the second pointer is NULL including the back and forth
    type casting is just horrible.

    Convert the softirq tracepoints to take a single unsigned int argument
    for the softirq vector number and fix the call sites.

    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Acked-by: Peter Zijlstra
    Acked-by: mathieu.desnoyers@efficios.com
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt

    Thomas Gleixner
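
    After the conversion, the call sites in __do_softirq() pass the vector
    number directly, roughly as sketched below (vec_nr is the index of the
    handler h in the softirq vector table; details approximate):

        unsigned int vec_nr = h - softirq_vec;

        trace_softirq_entry(vec_nr);
        h->action(h);
        trace_softirq_exit(vec_nr);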
     

19 Oct, 2010

3 commits

    When the CPU is idle, irq_enter() calls tick_check_idle() on the first
    interrupt to note the interruption from idle. But there is a problem if
    this call is done after __irq_enter(), as all routines in __irq_enter()
    may then see stale time because tick_check_idle() has not run yet.

    Specifically, this affects trace calls in __irq_enter() that use the
    global clock, and also the account_system_vtime() change in this patch,
    which wants to use sched_clock_cpu() to do proper irq timing.

    But tick_check_idle() was intentionally moved after __irq_enter() to
    prevent unneeded ksoftirqd wakeups, by commit ee5f80a:

    irq: call __irq_enter() before calling the tick_idle_check
    Impact: avoid spurious ksoftirqd wakeups

    Moving tick_check_idle() before __irq_enter() and wrapping it with
    local_bh_disable/enable solves both problems.

    Fixed-by: Yong Zhang
    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     
    To account softirq time cleanly in the scheduler, we need to identify
    whether a softirq is invoked in ksoftirqd context or at the tail of a
    hardirq. Add PF_KSOFTIRQD for that purpose.

    As all PF flag bits are currently taken, create space by moving one of the
    infrequently used bits (PF_THREAD_BOUND) down in task_struct to be along
    with some other state fields.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     
  • Peter Zijlstra found a bug in the way softirq time is accounted in
    VIRT_CPU_ACCOUNTING on this thread:

    http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html

    The problem is, softirq processing uses local_bh_disable internally. There
    is no way, later in the flow, to differentiate between softirq actually
    being processed and bh merely having been disabled. So a hardirq arriving
    while bh is disabled results in time being wrongly accounted as softirq.

    Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
    as well, since account_system_time() in normal tick-based accounting also
    uses softirq_count, which will be set even when not in softirq but with bh
    disabled.

    Peter also suggested the solution of using 2*SOFTIRQ_OFFSET as the irq
    count for local_bh_{disable,enable} and just SOFTIRQ_OFFSET while
    processing softirqs. The patch below does that and adds the API
    in_serving_softirq(), which returns whether we are currently processing a
    softirq or not (the resulting convention is sketched at the end of this
    entry).

    It also changes one of the usages of softirq_count in
    net/sched/cls_cgroup.c to in_serving_softirq.

    Looks like many usages of in_softirq really want in_serving_softirq. Those
    changes can be made individually on a case by case basis.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
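
    A sketch of the resulting preempt-count convention described above (the
    exact macro layout in include/linux/hardirq.h is assumed from the
    description, not quoted):

        /* local_bh_disable()/enable() step the softirq count by 2*SOFTIRQ_OFFSET,
         * while actual softirq processing adds SOFTIRQ_OFFSET itself, so the low
         * softirq bit distinguishes "serving a softirq" from "bh merely disabled": */
        #define SOFTIRQ_DISABLE_OFFSET  (2 * SOFTIRQ_OFFSET)
        #define in_serving_softirq()    (softirq_count() & SOFTIRQ_OFFSET)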
     

12 Oct, 2010

2 commits


22 Sep, 2010

1 commit


05 Jun, 2010

1 commit

  • The commit 80b5184cc537718122e036afe7e62d202b70d077 ("kernel/: convert cpu
    notifier to return encapsulate errno value") changed the return value of
    cpu notifier callbacks.

    Those callbacks don't return NOTIFY_BAD on failures anymore. But there
    are a few callbacks which are called directly at init time and check
    the return value.

    I forgot to change the BUG_ON() checks done by those direct callers in
    that commit.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
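
    The kind of fix this describes, sketched for one direct init-time caller
    in kernel/softirq.c; the exact form of the updated check is my assumption,
    not quoted from the commit:

        err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
        BUG_ON(err != NOTIFY_OK);       /* was BUG_ON(err == NOTIFY_BAD), which can no
                                           longer trigger with errno-encoded returns */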
     

28 May, 2010

1 commit


11 May, 2010

1 commit

  • The addition of preemptible RCU to treercu resulted in a bit of
    confusion and inefficiency surrounding the handling of context switches
    for RCU-sched and for RCU-preempt. For RCU-sched, a context switch
    is a quiescent state, pure and simple, just like it always has been.
    For RCU-preempt, a context switch is in no way a quiescent state, but
    special handling is required when a task blocks in an RCU read-side
    critical section.

    However, the callout from the scheduler and the outer loop in ksoftirqd
    still call something named rcu_sched_qs(), whose name is no longer
    accurate. Furthermore, when rcu_check_callbacks() notes an RCU-sched
    quiescent state, it ends up unnecessarily (though harmlessly, aside
    from the performance hit) enqueuing the current task if it happens to
    be running in an RCU-preempt read-side critical section. This not only
    increases the maximum latency of scheduler_tick(), it also needlessly
    increases the overhead of the next outermost rcu_read_unlock() invocation.

    This patch addresses this situation by separating the notion of RCU's
    context-switch handling from that of RCU-sched's quiescent states.
    The context-switch handling is covered by rcu_note_context_switch() in
    general and by rcu_preempt_note_context_switch() for preemptible RCU.
    This permits rcu_sched_qs() to handle quiescent states and only quiescent
    states. It also reduces the maximum latency of scheduler_tick(), though
    probably by much less than a microsecond. Finally, it means that tasks
    within preemptible-RCU read-side critical sections avoid incurring the
    overhead of queuing unless there really is a context switch.

    Suggested-by: Lai Jiangshan
    Acked-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Paul E. McKenney
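
    The separation is easiest to see as the pair of entry points this change
    creates; a sketch matching the description above (details approximate):

        void rcu_note_context_switch(int cpu)
        {
                rcu_sched_qs(cpu);                      /* RCU-sched: a context switch is a QS */
                rcu_preempt_note_context_switch(cpu);   /* RCU-preempt: only blocked-reader handling */
        }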