25 Jan, 2014

1 commit


21 Jan, 2014

1 commit

  • Pull timer changes from Ingo Molnar:
    - ARM clocksource/clockevent improvements and fixes
    - generic timekeeping updates: TAI fixes/improvements, cleanups
    - Posix cpu timer cleanups and improvements
    - dynticks updates: full dynticks bugfixes, optimizations and cleanups

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    clocksource: Timer-sun5i: Switch to sched_clock_register()
    timekeeping: Remove comment that's mostly out of date
    rtc-cmos: Add an alarm disable quirk
    timekeeper: fix comment typo for tk_setup_internals()
    timekeeping: Fix missing timekeeping_update in suspend path
    timekeeping: Fix CLOCK_TAI timer/nanosleep delays
    tick/timekeeping: Call update_wall_time outside the jiffies lock
    timekeeping: Avoid possible deadlock from clock_was_set_delayed
    timekeeping: Fix potential lost pv notification of time change
    timekeeping: Fix lost updates to tai adjustment
    clocksource: sh_cmt: Add clk_prepare/unprepare support
    clocksource: bcm_kona_timer: Remove unused bcm_timer_ids
    clocksource: vt8500: Remove deprecated IRQF_DISABLED
    clocksource: tegra: Remove deprecated IRQF_DISABLED
    clocksource: misc drivers: Remove deprecated IRQF_DISABLED
    clocksource: sh_mtu2: Remove unnecessary platform_set_drvdata()
    clocksource: sh_tmu: Remove unnecessary platform_set_drvdata()
    clocksource: armada-370-xp: Enable timer divider only when needed
    clocksource: clksrc-of: Warn if no clock sources are found
    clocksource: orion: Switch to sched_clock_register()
    ...

    Linus Torvalds
     

16 Jan, 2014

3 commits

  • In the kernel, code conventionally starts with a tab instead of seven spaces.

    Signed-off-by: Alex Shi
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1386074112-30754-2-git-send-email-alex.shi@linaro.org
    Signed-off-by: Frederic Weisbecker

    Alex Shi
     
  • We don't need to fetch the timekeeping max deferment under the
    jiffies_lock seqlock.

    If the clocksource is updated concurrently while we stop the tick,
    stop_machine() is called and the tick will be re-evaluated along with
    up-to-date jiffies and its related values (see the sketch below).

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1387320692-28460-9-git-send-email-fweisbec@gmail.com
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
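
    A minimal sketch of the resulting pattern, assuming the names used in
    kernel/time/tick-sched.c of that era (the exact surrounding code may
    differ):

        u64 max_deferment;
        unsigned long seq, last_jiffies;
        ktime_t last_update;

        /* Snapshot jiffies state under the seqlock retry loop. */
        do {
                seq = read_seqbegin(&jiffies_lock);
                last_update = last_jiffies_update;
                last_jiffies = jiffies;
        } while (read_seqretry(&jiffies_lock, seq));

        /*
         * Moved outside the loop: a concurrent clocksource update goes
         * through stop_machine(), which forces the tick to be
         * re-evaluated with fresh values anyway.
         */
        max_deferment = timekeeping_max_deferment();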
     
  • This makes the code more symmetric with the existing tick functions
    called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

    These functions are also symmetric in that they mirror each other's
    actions: we start to account idle time on irq exit and we stop this
    accounting on irq entry. Also the tick is stopped on irq exit and
    timekeeping catches up with the tickless time elapsed until we reach
    irq entry (see the sketch below).

    This rename was suggested by Peter Zijlstra a long while ago but it
    got lost in the shuffle.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
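
    A condensed, hedged view of the symmetry this rename produces, based
    on the kernel/softirq.c callers of that era (simplified; details may
    differ):

        void irq_enter(void)
        {
                rcu_irq_enter();
                if (is_idle_task(current) && !in_interrupt()) {
                        /*
                         * Catch up timekeeping with the tickless time
                         * elapsed while idle; was tick_check_idle().
                         */
                        local_bh_disable();
                        tick_irq_enter();
                        local_bh_enable();
                }
                __irq_enter();
        }

        static inline void tick_irq_exit(void)
        {
                int cpu = smp_processor_id();

                /* Restart idle accounting, maybe stop the tick. */
                if ((idle_cpu(cpu) && !need_resched()) ||
                    tick_nohz_full_cpu(cpu)) {
                        if (!in_interrupt())
                                tick_nohz_irq_exit();
                }
        }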
     

13 Jan, 2014

1 commit

  • In order to avoid the runtime conditional and variable load, turn
    sched_clock_stable into a static_key (see the sketch below).

    Also provide a shorter implementation of local_clock() and
    cpu_clock(int) when sched_clock_stable==1.

                            MAINLINE       PRE      POST

    sched_clock_stable:            1         1         1
    (cold) sched_clock:       329841    221876    215295
    (cold) local_clock:       301773    234692    220773
    (warm) sched_clock:        38375     25602     25659
    (warm) local_clock:       100371     33265     27242
    (warm) rdtsc:              27340     24214     24208
    sched_clock_stable:            0         0         0
    (cold) sched_clock:       382634    235941    237019
    (cold) local_clock:       396890    297017    294819
    (warm) sched_clock:        38194     25233     25609
    (warm) local_clock:       143452     71234     71232
    (warm) rdtsc:              27345     24245     24243

    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
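
    A minimal sketch of the static_key technique, using the jump-label
    API of that era; the key name and setter below are hypothetical, and
    the actual patch may invert the key's sense so the stable case is
    the straight-line fast path:

        #include <linux/jump_label.h>

        /* Key disabled (the common case) means the clock is stable. */
        static struct static_key __sched_clock_unstable = STATIC_KEY_INIT_FALSE;

        int sched_clock_stable(void)
        {
                /* Patched branch: no memory load on the likely path. */
                return !static_key_false(&__sched_clock_unstable);
        }

        void set_sched_clock_unstable(void)
        {
                static_key_slow_inc(&__sched_clock_unstable);
        }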
     

12 Jan, 2014

2 commits


24 Dec, 2013

1 commit

  • Since the xtime lock was split into the timekeeping lock and
    the jiffies lock, we no longer need to call update_wall_time()
    while holding the jiffies lock.

    Thus, this patch splits update_wall_time() out from do_timer()
    (see the sketch below).

    This allows us to get away from calling clock_was_set_delayed()
    in update_wall_time() and instead use the standard clock_was_set()
    call, which previously would deadlock because it causes the jiffies
    lock to be acquired.

    Cc: Sasha Levin
    Cc: Thomas Gleixner
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Ingo Molnar
    Signed-off-by: John Stultz

    John Stultz
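
    A hedged sketch of the split, based on kernel/time/tick-common.c of
    that era (condensed; the highres path does the same):

        static void tick_periodic(int cpu)
        {
                if (tick_do_timer_cpu == cpu) {
                        write_seqlock(&jiffies_lock);

                        /* Keep track of the next tick event */
                        tick_next_period = ktime_add(tick_next_period,
                                                     tick_period);

                        do_timer(1);
                        write_sequnlock(&jiffies_lock);

                        /* Now done outside the jiffies lock: */
                        update_wall_time();
                }

                update_process_times(user_mode(get_irq_regs()));
                profile_tick(CPU_PROFILING);
        }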
     

03 Dec, 2013

1 commit

  • A few functions use remote per-CPU access APIs when they
    deal with local values.

    Just do the right conversion to improve performance, code
    readability and debug checks (see the sketch below).

    While at it, let's extend some of these function names with a
    *_this_cpu() suffix in order to display their purpose more clearly.

    Signed-off-by: Frederic Weisbecker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steven Rostedt

    Frederic Weisbecker
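
    An illustrative before/after of the conversion, using the
    tick_cpu_sched per-CPU data as a hypothetical example:

        /* Before: remote-style access to a value that is always local */
        struct tick_sched *ts = &per_cpu(tick_cpu_sched, smp_processor_id());

        /* After: local accessor; cheaper and covered by debug checks */
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);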
     

29 Nov, 2013

1 commit

  • If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ.

    If CONFIG_NO_HZ=y and the nohz functionality is disabled via the
    command line option "nohz=off" or not enabled due to missing hardware
    support, then tick_nohz_get_sleep_length() returns 0. That happens
    because ts->sleep_length is never set in that case.

    Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive (see the
    sketch below).

    Reported-by: Michal Hocko
    Reported-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
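
    A hedged sketch of the fix; the placement in can_stop_idle_tick()
    and the field names are assumptions based on the commit text:

        static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
        {
                /* ... earlier offline/jiffies checks elided ... */

                if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE)) {
                        /* Report one tick period instead of a stale 0. */
                        ts->sleep_length = ktime_set(0, NSEC_PER_SEC / HZ);
                        return false;
                }

                return true;
        }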
     

19 Nov, 2013

1 commit

  • RCU and the fine grained idle time accounting functions check
    tick_nohz_enabled. But that variable merely tells us that NOHZ has
    been enabled in the config and not been disabled on the command line.

    It does not tell us anything about nohz being active. That's what all
    this should check for.

    Matthew reported that the idle accounting on his old P1 machine
    showed bogus values when he enabled NOHZ in the config and did not
    disable it on the kernel command line. The reason is that his machine
    uses (refined) jiffies as a clocksource, which explains why the "fine"
    grained accounting went into lala land, because it depends on when the
    system enters and leaves idle relative to the jiffies increment.

    Provide a tick_nohz_active indicator and let RCU and the accounting
    code use this instead of tick_nohz_enabled (see the sketch below).

    Reported-and-tested-by: Matthew Whitehead
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Steven Rostedt
    Reviewed-by: Paul E. McKenney
    Cc: john.stultz@linaro.org
    Cc: mwhitehe@redhat.com
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311132052240.30673@ionos.tec.linutronix.de

    Thomas Gleixner
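
    A condensed sketch of the distinction, assuming the variable names
    from the commit (surrounding code simplified):

        /* Set from config/command line: NOHZ is allowed to run. */
        int tick_nohz_enabled __read_mostly = 1;

        /* Set only once NOHZ mode has actually been activated. */
        int tick_nohz_active;

        u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
        {
                if (!tick_nohz_active)    /* was: !tick_nohz_enabled */
                        return -1;

                /* ... compute and return the accounted idle time ... */
        }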
     

16 Aug, 2013

1 commit

  • tick_nohz_full_kick_all() is useful to notify all full dynticks
    CPUs that there is a system state change to check out before
    re-evaluating the need for the tick.

    Unfortunately this is implemented using smp_call_function_many(),
    which ignores the local CPU. This CPU also needs to re-evaluate
    the tick.

    on_each_cpu_mask() is not useful either because we don't want to
    re-evaluate the tick state in place but asynchronously from an IPI,
    to avoid messing with any random locking scenario.

    So let's call tick_nohz_full_kick() from tick_nohz_full_kick_all()
    so that the usual irq work takes care of it (see the sketch below).

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1375460996-16329-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
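
    A sketch of the resulting function, following the
    kernel/time/tick-sched.c implementation of that era:

        static void nohz_full_kick_ipi(void *info)
        {
                tick_nohz_full_kick();
        }

        /*
         * Kick all full dynticks CPUs, including this one, to force
         * them to re-evaluate their dependency on the tick.
         */
        void tick_nohz_full_kick_all(void)
        {
                if (!tick_nohz_full_running)
                        return;

                preempt_disable();
                /* Remote CPUs get the IPI... */
                smp_call_function_many(tick_nohz_full_mask,
                                       nohz_full_kick_ipi, NULL, false);
                /* ...and the local CPU goes through the irq work kick. */
                tick_nohz_full_kick();
                preempt_enable();
        }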
     

14 Aug, 2013

4 commits

  • …rederic/linux-dynticks into timers/nohz

    Pull nohz improvements from Frederic Weisbecker:

    " It mostly contains fixes and full dynticks off-case optimizations. I believe that
    distros want to enable this feature so it seems important to optimize the case
    where the "nohz_full=" parameter is empty. ie: I'm trying to remove any performance
    regression that comes with NO_HZ_FULL=y when the feature is not used.

    This patchset improves the current situation a lot (off-case appears to be around 11% faster
    with hackbench, although I guess it may vary depending on the configuration but it should be
    significantly faster in any case) now there is still some work to do: I can still observe a
    remaining loss of 1.6% throughput seen with hackbench compared to CONFIG_NO_HZ_FULL=n. "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Scheduler IPIs and task context switches are serious fast paths.
    Let's try to hide as much as we can of the impact of the full
    dynticks APIs' off case at these call sites, through the use of
    static keys.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • These APIs are frequently accessed and priority is given
    to optimizing the full dynticks off case in order to let
    distros enable this feature without suffering from
    significant performance regressions.

    Let's inline these APIs and optimize them with static keys.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • Rename the full dynticks' cpumask and cpumask state variables
    to more exportable names.

    These will be used later from global headers to optimize
    the main full dynticks APIs in conjunction with static keys.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     

13 Aug, 2013

1 commit

  • The context tracking subsystem has the ability to selectively
    enable the tracking on any defined subset of CPUs. This means that
    we can define a CPU range that doesn't run the context tracking
    and another range that does.

    Now what we want in practice is to enable the tracking on full
    dynticks CPUs only. In order to perform this, we just need to pass
    our full dynticks CPU range selection from the full dynticks
    subsystem to the context tracking subsystem (see the sketch below).

    This way we can spare the overhead of the RCU user extended quiescent
    state and vtime maintenance on the CPUs that are outside the
    full dynticks range. Just keep in mind that the raw context tracking
    itself is still necessary everywhere.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
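
    A minimal sketch of how the range is passed along, assuming the
    tick_nohz_init() flow and pre-rename mask names of that era
    (simplified):

        void __init tick_nohz_init(void)
        {
                int cpu;

                if (!have_nohz_full_mask)
                        return;

                /* Hand the full dynticks range to context tracking. */
                for_each_cpu(cpu, nohz_full_mask)
                        context_tracking_cpu_set(cpu);

                cpu_notifier(tick_nohz_cpu_down_callback, 0);
        }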
     

29 Jul, 2013

1 commit

  • Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
    repeat mode), because it has been identified as the source of a
    significant performance regression in v3.8 and later as explained by
    Jeremy Eder:

    We believe we've identified a particular commit to the cpuidle code
    that seems to be impacting performance of a variety of workloads.
    The simplest way to reproduce is using the netperf TCP_RR test, so
    we're using that, on a pair of Sandy Bridge based servers. We also
    have data from a large database setup where performance is also
    measurably/positively impacted, though that test data isn't easily
    share-able.

    Included below are test results from 3 test kernels:

    kernel reverts
    -----------------------------------------------------------
    1) vanilla upstream (no reverts)

    2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c

    3) test reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    e11538d1f03914eb92af5a1a378375c05ae8520c

    In summary, netperf TCP_RR numbers improve by approximately 4%
    after reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. When
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency
    never seems to get above 40%. Taking that patch out gets C0 near
    100% quite often, and performance increases.

    The below data are histograms representing the %c0 residency @
    1-second sample rates (using turbostat), while under netperf test.

    - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
    - The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4,
    shows %c0 in the 80,90,100% bins.

    Below each kernel name are netperf TCP_RR trans/s numbers for the
    particular kernel that can be disclosed publicly, comparing the 3
    test kernels. We ran a 4th test with the vanilla kernel where
    we've also set /dev/cpu_dma_latency=0 to show overall impact
    boosting single-threaded TCP_RR performance over 11% above
    baseline.

    3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
    TCP_RR trans/s 54323.78

    -----------------------------------------------------------
    3.10-rc2 vanilla RX (no reverts)
    TCP_RR trans/s 48192.47

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 1]: *
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 49]: *************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 perfteam2 RX (reverts commit
    e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 49698.69

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 2]: **
    40.0000 - 50.0000 [ 58]: **********************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    and e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 47766.95

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 27]: ***************************
    40.0000 - 50.0000 [ 2]: **
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 2]: **
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 28]: ****************************

    Sender:
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 1]: *
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 3]: ***
    80.0000 - 90.0000 [ 7]: *******
    90.0000 - 100.0000 [ 38]: **************************************

    These results demonstrate gaining back the tendency of the CPU to
    stay in more responsive, performant C-states (and thus yield
    measurably better performance), by reverting commit
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4.

    Requested-by: Jeremy Eder
    Tested-by: Len Brown
    Cc: 3.8+
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

25 Jul, 2013

2 commits

  • The 'cpu' variable is no longer used after commit
    5b8621a68fdcd2baf1d3b413726f913a5254d46a, so drop it.

    Signed-off-by: Li Zhong
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman
    Signed-off-by: Frederic Weisbecker

    Li Zhong
     
  • If the user enables CONFIG_NO_HZ_FULL and runs the kernel on a machine
    with an unstable TSC, it will produce a WARN_ON dump as well as taint
    the kernel. This is a bit extreme for a kernel that just enables a
    feature but doesn't use it.

    The warning should only happen if the user tries to use the feature by
    either adding nohz_full to the kernel command line, or by enabling
    CONFIG_NO_HZ_FULL_ALL that makes nohz used on all CPUs at boot up. Note,
    this second feature should not (yet) be used by distros or anyone that
    doesn't care if NO_HZ is used or not.

    Signed-off-by: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman
    Signed-off-by: Frederic Weisbecker

    Steven Rostedt
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer; a representative hunk
    is sketched below.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
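
    An illustrative hunk of the mechanical change (hypothetical example
    function; the sweep touched many such hotplug notifiers):

        -static int __cpuinit timer_cpu_notify(struct notifier_block *self,
        -                                      unsigned long action, void *hcpu)
        +static int timer_cpu_notify(struct notifier_block *self,
        +                            unsigned long action, void *hcpu)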
     

10 Jul, 2013

1 commit

  • …eric/linux-dynticks into timers/urgent

    Pull nohz updates/fixes from Frederic Weisbecker:

    ' Note that "watchdog: Boot-disable by default on full dynticks" is a temporary
    solution to solve the issue with the watchdog that prevents the tick from
    stopping. This is to make sure that 3.11 doesn't have that problem as several
    people complained about it.

    A proper and longer term solution has been proposed by Peterz:

    http://lkml.kernel.org/r/20130618103632.GO3204@twins.programming.kicks-ass.net
    '

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

20 Jun, 2013

2 commits

  • Building full dynticks now implies that all CPUs are forced
    into RCU nocb mode through CONFIG_RCU_NOCB_CPU_ALL.

    The dynamic check has become useless.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Li Zhong
    Cc: Borislav Petkov

    Frederic Weisbecker
     
  • If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
    kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
    due to the machine having an unstable clock, warn about it.

    We do not want users thinking that they are getting the benefit
    of nohz when their machine can not support it.

    Signed-off-by: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Signed-off-by: Frederic Weisbecker

    Steven Rostedt
     

31 May, 2013

1 commit

  • In tick_nohz_cpu_down_callback(), if the cpu is the one handling
    timekeeping, we must return something that stops the CPU_DOWN_PREPARE
    notifiers and then starts CPU_DOWN_FAILED notification on the notifier
    callbacks that were already called.

    However, traditional errno values are not handled by the notifier
    layer unless they are encapsulated using notifier_from_errno().

    Hence the current -EINVAL is misinterpreted and converted to junk after
    notifier_to_errno(), leaving the notifier subsystem open to random
    behaviour, such as eventually allowing the cpu to go down.

    Fix this by using the standard NOTIFY_BAD instead (see the sketch
    below).

    Signed-off-by: Li Zhong
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Li Zhong
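
    A condensed sketch of the fixed callback, following the
    kernel/time/tick-sched.c code of that era:

        static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
                                               unsigned long action,
                                               void *hcpu)
        {
                unsigned int cpu = (unsigned long)hcpu;

                switch (action & ~CPU_TASKS_FROZEN) {
                case CPU_DOWN_PREPARE:
                        /*
                         * If we handle the timekeeping duty for full
                         * dynticks CPUs, we can't safely shut down
                         * that CPU.
                         */
                        if (have_nohz_full_mask && tick_do_timer_cpu == cpu)
                                return NOTIFY_BAD;   /* was: -EINVAL */
                        break;
                }
                return NOTIFY_OK;
        }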
     

16 May, 2013

1 commit

  • Pull timer fixes from Thomas Gleixner:

    - Cure for not using zalloc in the first place, which leads to random
    crashes with CPUMASK_OFF_STACK.

    - Revert a user space visible change which broke udev

    - Add a missing cpu_online early return introduced by the new full
    dyntick conversions

    - Plug a long standing race in the timer wheel cpu hotplug code.
    Sigh...

    - Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu
    up.

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
    timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
    tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
    tick: Cleanup NOHZ per cpu data on cpu down
    tick: Use zalloc_cpumask_var for allocating offstack cpumasks

    Linus Torvalds
     

14 May, 2013

1 commit

  • Commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict
    idle logic) moved code out of tick_nohz_stop_sched_tick() and failed
    to bail out when the cpu is offline. That's causing subsequent
    failures, as an offline CPU is supposed to die and not to fiddle with
    nohz magic.

    Return false in can_stop_idle_tick() if the cpu is offline (see the
    sketch below).

    Reported-and-tested-by: Jiri Kosina
    Reported-and-tested-by: Prarit Bhargava
    Cc: Frederic Weisbecker
    Cc: Borislav Petkov
    Cc: Tony Luck
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
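
    A sketch of the fix, following the can_stop_idle_tick() code of that
    era (condensed):

        static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
        {
                /*
                 * If this cpu is offline and it is the one which updates
                 * jiffies, give up the assignment so the next tick can
                 * take it over. Then bail out: an offlined CPU must not
                 * fiddle with the tick.
                 */
                if (unlikely(!cpu_online(cpu))) {
                        if (cpu == tick_do_timer_cpu)
                                tick_do_timer_cpu = TICK_DO_TIMER_BOOT;
                        return false;   /* the missing bail-out */
                }

                /* ... remaining checks elided ... */
                return true;
        }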
     

12 May, 2013

1 commit

  • Prarit reported a crash on CPU offline/online. The reason is that on
    CPU down the NOHZ related per-cpu data of the dead cpu is not cleaned
    up. If an interrupt happens at cpu online before the per-cpu tick
    device is registered, the irq_enter() check potentially sees stale
    data and dereferences a NULL pointer.

    Clean up the data after the cpu is dead (see the sketch below).

    Reported-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Cc: Mike Galbraith
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
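
    A hedged sketch of the cleanup, assuming it lands in
    tick_cancel_sched_timer() on the hotplug path (placement is an
    assumption based on the commit text):

        void tick_cancel_sched_timer(int cpu)
        {
                struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

        # ifdef CONFIG_HIGH_RES_TIMERS
                if (ts->sched_timer.base)
                        hrtimer_cancel(&ts->sched_timer);
        # endif

                /*
                 * Clear everything, not just nohz_mode, so a later
                 * online doesn't see stale per-cpu state.
                 */
                memset(ts, 0, sizeof(*ts));
        }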
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

04 May, 2013

1 commit

  • The scheduler doesn't yet fully support environments
    with a single task running without a periodic tick.

    In order to ensure we still maintain the duties of scheduler_tick(),
    keep at least 1 tick per second (see the sketch below).

    This makes sure that we keep the progression of various scheduler
    accounting and background maintenance even with a very low granularity.
    Examples include cpu load, sched average, CFS entity vruntime,
    avenrun and events such as load balancing, amongst other details
    handled in sched_class::task_tick().

    This limitation will be removed in the future once we get
    these individual items to work in full dynticks CPUs.

    Suggested-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
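
    A sketch of the mechanism, following the scheduler_tick_max_deferment()
    helper added around this time (condensed; return value in nanoseconds):

        u64 scheduler_tick_max_deferment(void)
        {
                struct rq *rq = this_rq();
                unsigned long next, now = ACCESS_ONCE(jiffies);

                /* Allow the tick to be deferred at most one second. */
                next = rq->last_sched_tick + HZ;

                if (time_before_eq(next, now))
                        return 0;

                return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
        }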
     

29 Apr, 2013

1 commit

  • I saw the following error when testing the latest nohz code on
    Power:

    [ 85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
    [ 85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
    [ 85.295402] Call Trace:
    [ 85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
    [ 85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
    [ 85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
    [ 85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
    [ 85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
    [ 85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
    [ 85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54

    The change moves the test into a local_irq_save/restore section to
    avoid the above complaint (see the sketch below).

    Signed-off-by: Li Zhong
    Acked-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Link: http://lkml.kernel.org/r/1367119558.6391.34.camel@ThinkPad-T5421.cn.ibm.com
    Signed-off-by: Ingo Molnar

    Li Zhong
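
    A condensed sketch of the fixed function, following the
    tick_nohz_task_switch() code of that era:

        void tick_nohz_task_switch(struct task_struct *tsk)
        {
                unsigned long flags;

                /* Disable irqs so smp_processor_id() is safe here. */
                local_irq_save(flags);

                if (!tick_nohz_full_cpu(smp_processor_id()))
                        goto out;

                if (tick_nohz_tick_stopped() && !can_stop_full_tick())
                        tick_nohz_full_kick();

        out:
                local_irq_restore(flags);
        }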
     

26 Apr, 2013

1 commit

  • One testbox of mine (Intel Nehalem, 16-way) uses MWAIT for its idle
    routine, which apparently can break out of its idle loop rather
    frequently.

    In that case NO_HZ_FULL=y kernels show high ksoftirqd overhead and
    constant context switching, because tick_nohz_stop_sched_tick() will,
    if delta_jiffies == 0, mis-identify this as a timer event, activating
    the TIMER_SOFTIRQ, which wakes up ksoftirqd.

    Fix this by treating delta_jiffies == 0 the same way we treat other
    short wakeups, delta_jiffies == 1 (see the sketch below).

    Cc: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Ingo Molnar
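
    The change itself appears to be a one-liner in
    tick_nohz_stop_sched_tick() (illustrative hunk):

        	/* Do not stop the tick if we are only one off (or less) */
        -	if (!ts->tick_stopped && delta_jiffies == 1)
        +	if (!ts->tick_stopped && delta_jiffies <= 1)
        		goto out;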
     

23 Apr, 2013

5 commits

  • It's not obvious why the full dynticks subsystem
    doesn't always stop the tick: whether this is due to kthreads,
    posix timers, perf events, etc...

    These new tracepoints are here to help the user diagnose
    the failures and test this feature.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • When a task is scheduled in, it may have some properties
    of its own that could make the CPU reconsider the need for
    the tick: posix cpu timers, perf events, ...

    So notify the full dynticks subsystem when a task gets
    scheduled in and re-check the tick dependency at this
    stage. This is done through a self IPI to avoid messing
    with any current locking scenario.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Interrupt exit is a natural place to stop the tick: it runs after
    all events that happened before and during the irq and that are
    liable to update the tick dependency have occurred. It also makes
    sure that any check on the tick dependency is well ordered against
    dynticks kick IPIs.

    Bring in the infrastructure that performs the tick dependency
    checks on irq exit and shut it down if these checks show that we
    can do it safely.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Implement the full dynticks kick that is performed from
    IPIs sent by various subsystems (scheduler, posix timers, ...)
    when they want to notify about a new event that may
    require reconsidering the dependency on the tick.

    Most of the time, such an event ends up restarting the tick
    (see the sketch below).

    (Part of the design, with subsystems providing *_can_stop_tick()
    helpers, was suggested by Peter Zijlstra a while ago.)

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
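
    A sketch of the irq work based kick, following the tick-sched.c code
    of that era (condensed; __get_cpu_var was the per-CPU accessor then):

        static void nohz_full_kick_work_func(struct irq_work *work)
        {
                /* Empty: the tick gets re-evaluated on irq exit. */
        }

        static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
                .func = nohz_full_kick_work_func,
        };

        /*
         * Kick the current CPU if it's full dynticks in order to force
         * it to re-evaluate its dependency on the tick and restart it
         * if necessary.
         */
        void tick_nohz_full_kick(void)
        {
                if (tick_nohz_full_cpu(smp_processor_id()))
                        irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
        }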
     
  • The scheduler IPI is used by the scheduler to kick
    full dynticks CPUs asynchronously when more than one
    task is running or when a new timer list timer is
    enqueued. This way the destination CPU can decide
    to restart the tick to handle this new situation.

    Now let's call that kick in the scheduler IPI.

    (Reusing the scheduler IPI rather than implementing
    a new IPI was suggested by Peter Zijlstra a while ago)

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

21 Apr, 2013

1 commit


19 Apr, 2013

1 commit

  • Provide a new kernel config that defaults all CPUs to be part
    of the full dynticks range, except the boot one for timekeeping.

    This default setting is overridden by the nohz_full= boot option
    if passed by the user (see the sketch below).

    This is helpful for those who don't need a fine-grained range
    of full dynticks CPUs and also for automated testing.

    Suggested-by: Ingo Molnar
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
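
    A condensed sketch of the default-all initialization, assuming the
    pre-rename mask names of that era:

        static int tick_nohz_init_all(void)
        {
                int err = -1;

        #ifdef CONFIG_NO_HZ_FULL_ALL
                if (!alloc_cpumask_var(&nohz_full_mask, GFP_KERNEL)) {
                        pr_err("NO_HZ: Can't allocate full dynticks cpumask\n");
                        return err;
                }
                err = 0;
                /* Every CPU but the boot CPU is full dynticks. */
                cpumask_setall(nohz_full_mask);
                cpumask_clear_cpu(smp_processor_id(), nohz_full_mask);
                have_nohz_full_mask = true;
        #endif
                return err;
        }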