02 Sep, 2015

1 commit

  • The problem addressed by this patch is the affinity of unpinned
    timers. Adaptive (full) dynticks CPUs are currently disturbed by
    unnecessary jitter when such timers fire on them.

    This patch affines timers to online CPUs that are not full
    dynticks on NOHZ_FULL-configured systems. Thanks to static keys,
    it should not introduce overhead when nohz full is off.
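
    A minimal sketch of the idea, assuming the housekeeping_mask and
    tick_nohz_full_enabled() helpers from <linux/tick.h>; the function
    name below is illustrative, not the upstream one:

        #include <linux/smp.h>
        #include <linux/cpumask.h>
        #include <linux/tick.h>

        /* Illustrative: pick a target CPU for an unpinned timer,
         * preferring online housekeeping (non full dynticks) CPUs. */
        static int pick_timer_target_cpu(void)
        {
        #ifdef CONFIG_NO_HZ_FULL
                /* static-key backed: the off case is a patched-out branch */
                if (tick_nohz_full_enabled())
                        return cpumask_any_and(housekeeping_mask, cpu_online_mask);
        #endif
                return smp_processor_id();
        }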

    Signed-off-by: Vatika Harlalka
    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Preeti U Murthy
    Acked-by: Thomas Gleixner
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1441119060-2230-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Vatika Harlalka

29 Jul, 2015

3 commits

  • Leftover from early code.

    Cc: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
  • Restart the tick when necessary from the irq exit path. This makes
    nohz full more flexible, simplifies the related IPIs, and doesn't
    add significant overhead on irq exit.

    In the longer term, this will allow us to piggyback the nohz kick
    on the scheduler IPI instead of sending a dedicated IPI that often
    doubles the scheduler IPI on task wakeup. That will require more
    changes, though, including a careful review of resched_curr()
    callers to include nohz full needs.
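
    A rough sketch of that irq-exit re-evaluation (simplified; the
    stop_tick()/restart_tick() wrappers stand in for the internal
    tick-sched helpers and are not the upstream names):

        #include <linux/smp.h>
        #include <linux/tick.h>

        /* On irq exit, a full dynticks CPU keeps its tick stopped when it
         * can, and restarts it when a new dependency has appeared. */
        static void nohz_full_update_tick_on_irq_exit(struct tick_sched *ts)
        {
                if (!tick_nohz_full_cpu(smp_processor_id()))
                        return;

                if (can_stop_full_tick())       /* no task/timer/perf dependency */
                        stop_tick(ts);          /* hypothetical wrapper */
                else if (ts->tick_stopped)
                        restart_tick(ts);       /* hypothetical wrapper */
        }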

    Reviewed-by: Rik van Riel
    Cc: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
  • Normally the tilegx networking shim sends irqs to all the cores
    to distribute the load of processing incoming-packet interrupts,
    so that inbound traffic can reach multiple Gb/s.

    However, in nohz_full mode we don't want to interrupt the
    nohz_full cores by default, so we limit the set of cores we use
    to only the online housekeeping cores.

    To make client code easier to read, we introduce a new nohz_full
    accessor, housekeeping_cpumask(), which returns a pointer to
    housekeeping_mask if nohz_full is enabled and to cpu_possible_mask
    otherwise.
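
    Per the description above, the accessor has essentially this shape
    (a sketch, not necessarily the verbatim upstream code):

        static inline const struct cpumask *housekeeping_cpumask(void)
        {
        #ifdef CONFIG_NO_HZ_FULL
                if (tick_nohz_full_enabled())
                        return housekeeping_mask;
        #endif
                return cpu_possible_mask;
        }

    Client code can then restrict its irq distribution with something
    like cpumask_and(dst, src, housekeeping_cpumask()).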

    Signed-off-by: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker

    Chris Metcalf

08 Jul, 2015

2 commits

  • Making tick_broadcast_oneshot_control() independent from
    CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
    CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
    there.

    Provide a proper stub inline.
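
    The fix boils down to a no-op inline for the
    CONFIG_GENERIC_CLOCKEVENTS=n case, along these lines (a sketch of
    the shape of the stub):

        #ifdef CONFIG_GENERIC_CLOCKEVENTS
        extern int tick_broadcast_oneshot_control(enum tick_broadcast_state state);
        #else
        static inline int tick_broadcast_oneshot_control(enum tick_broadcast_state state)
        {
                return 0;       /* no clockevents: nothing can be busy */
        }
        #endif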

    Fixes: f32dd1170511 'tick/broadcast: Make idle check independent from mode and config'
    Reported-by: kbuild test robot
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
  • Currently the broadcast busy check, which prevents the idle code from
    going into deep idle, works only in oneshot mode.

    If NOHZ and HIGHRES are off (in the config or on the command line)
    there is no sanity check at all, so under certain conditions CPUs
    are allowed to go into deep idle, where the local timer stops, and
    are never woken up again because no broadcast timer is installed or
    an hrtimer-based broadcast device is not evaluated.

    Move tick_broadcast_oneshot_control() into the common code and provide
    proper subfunctions for the various config combinations.

    The common check in tick_broadcast_oneshot_control() is for the
    C3STOP misfeature flag of the local clock event device: if it's not
    set, idle can proceed; if it is, further checks are necessary, as
    sketched after the list below.

    Provide checks for the trivial cases:

    - If broadcast is disabled in the config, then return busy

    - If oneshot mode (NOHZ/HIGHRES) is disabled in the config, return
    busy if the broadcast device is hrtimer based.

    - If oneshot mode is enabled in the config, call the original
    tick_broadcast_oneshot_control() function. That function needs
    extra checks which will be implemented in separate patches.
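
    Sketched, the common entry point then looks roughly like this (the
    __tick_broadcast_oneshot_control() name follows the split described
    above):

        int tick_broadcast_oneshot_control(enum tick_broadcast_state state)
        {
                struct tick_device *td = this_cpu_ptr(&tick_cpu_device);

                if (!(td->evtdev->features & CLOCK_EVT_FEAT_C3STOP))
                        return 0;       /* local timer survives deep idle */

                return __tick_broadcast_oneshot_control(state);
        }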

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner

24 Jun, 2015

1 commit

  • Pull power management and ACPI updates from Rafael Wysocki:
    "The rework of backlight interface selection API from Hans de Goede
    stands out from the number of commits and the number of affected
    places perspective. The cpufreq core fixes from Viresh Kumar are
    quite significant too as far as the number of commits goes and because
    they should reduce CPU online/offline overhead quite a bit in the
    majority of cases.

    From the new features point of view, the ACPICA update (to upstream
    revision 20150515) adding support for new ACPI 6 material to ACPICA is
    the one that matters the most as some new significant features will be
    based on it going forward. Also included is an update of the ACPI
    device power management core to follow ACPI 6 (which in turn reflects
    the Windows' device PM implementation), a PM core extension to support
    wakeup interrupts in a more generic way and support for the ACPI _CCA
    device configuration object.

    The rest is mostly fixes and cleanups all over and some documentation
    updates, including new DT bindings for Operating Performance Points.

    There is one fix for a regression introduced in the 4.1 cycle, but it
    adds quite a number of lines of code, it wasn't really ready before
    Thursday and you were on vacation, so I refrained from pushing it at
    the last minute for 4.1.

    Specifics:

    - ACPICA update to upstream revision 20150515 including basic support
    for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO,
    XENV, WPBT, NFIT, IORT), changes related to the other tables (DRTM,
    FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI,
    _MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore,
    Lv Zheng).

    - ACPI device power management core code update to follow ACPI 6
    which reflects the ACPI device power management implementation in
    Windows (Rafael J Wysocki).

    - rework of the backlight interface selection logic to reduce the
    number of kernel command line options and improve the handling of
    DMI quirks that may be involved in that and to make the code
    generally more straightforward (Hans de Goede).

    - fixes for the ACPI Embedded Controller (EC) driver related to the
    handling of EC transactions (Lv Zheng).

    - fix for a regression related to the ACPI resources management and
    resulting from a recent change of ACPI initialization code ordering
    (Rafael J Wysocki).

    - fix for a system initialization regression related to ACPI
    introduced during the 3.14 cycle and caused by running the code
    that switches the platform over to the ACPI mode too early in the
    initialization sequence (Rafael J Wysocki).

    - support for the ACPI _CCA device configuration object related to
    DMA cache coherence (Suravee Suthikulpanit).

    - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).

    - ACPI battery driver cleanups (Luis Henriques, Mathias Krause).

    - ACPI processor driver cleanups (Hanjun Guo).

    - cleanups and documentation update related to the ACPI device
    properties interface based on _DSD (Rafael J Wysocki).

    - ACPI device power management fixes (Rafael J Wysocki).

    - assorted cleanups related to ACPI (Dominik Brodowski, Fabian
    Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).

    - fix for a long-standing issue causing General Protection Faults to
    be generated occasionally on return to user space after resume from
    ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).

    - fix to make the suspend core code return -EBUSY consistently in all
    cases when system suspend is aborted due to wakeup detection (Ruchi
    Kandoi).

    - support for automated device wakeup IRQ handling allowing drivers
    to make their PM support more straightforward (Tony Lindgren).

    - new tracepoints for suspend-to-idle tracing and rework of the
    prepare/complete callbacks tracing in the PM core (Todd E Brandt,
    Rafael J Wysocki).

    - wakeup sources framework enhancements (Jin Qian).

    - new macro for noirq system PM callbacks (Grygorii Strashko).

    - assorted cleanups related to system suspend (Rafael J Wysocki).

    - cpuidle core cleanups to make the code more efficient (Rafael J
    Wysocki).

    - powernv/pseries cpuidle driver update (Shilpasri G Bhat).

    - cpufreq core fixes related to CPU online/offline that should reduce
    the overhead of these operations quite a bit, unless the CPU in
    question is physically going away (Viresh Kumar, Saravana Kannan).

    - serialization of cpufreq governor callbacks to avoid race
    conditions in some cases (Viresh Kumar).

    - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
    Bhargava, Joe Konno).

    - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
    Holla, Felipe Balbi, Tang Yuantian).

    - assorted cleanups in cpufreq drivers and core (Shailendra Verma,
    Fabian Frederick, Wang Long).

    - new Device Tree bindings for representing Operating Performance
    Points (Viresh Kumar).

    - updates for the common clock operations support code in the PM core
    (Rajendra Nayak, Geert Uytterhoeven).

    - PM domains core code update (Geert Uytterhoeven).

    - Intel Knights Landing support for the RAPL (Running Average Power
    Limit) power capping driver (Dasaratharaman Chandramouli).

    - fixes related to the floor frequency setting on Atom SoCs in the
    RAPL power capping driver (Ajay Thomas).

    - runtime PM framework documentation update (Ben Dooks).

    - cpupower tool fix (Herton R Krzesinski)"

    * tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits)
    cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
    x86: Load __USER_DS into DS/ES after resume
    PM / OPP: Add binding for 'opp-suspend'
    PM / OPP: Allow multiple OPP tables to be passed via DT
    PM / OPP: Add new bindings to address shortcomings of existing bindings
    ACPI: Constify ACPI device IDs in documentation
    ACPI / enumeration: Document the rules regarding the PRP0001 device ID
    ACPI / video: Make acpi_video_unregister_backlight() private
    acpi-video-detect: Remove old API
    toshiba-acpi: Port to new backlight interface selection API
    thinkpad-acpi: Port to new backlight interface selection API
    sony-laptop: Port to new backlight interface selection API
    samsung-laptop: Port to new backlight interface selection API
    msi-wmi: Port to new backlight interface selection API
    msi-laptop: Port to new backlight interface selection API
    intel-oaktrail: Port to new backlight interface selection API
    ideapad-laptop: Port to new backlight interface selection API
    fujitsu-laptop: Port to new backlight interface selection API
    eeepc-laptop: Port to new backlight interface selection API
    dell-wmi: Port to new backlight interface selection API
    ...

    Linus Torvalds

07 May, 2015

1 commit

  • This API is useful for modifying a cpumask indicating some special
    nohz-type functionality, so that the nohz cores are automatically
    added to that set.
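
    In spirit the helper is a one-liner, roughly:

        static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
        {
                if (tick_nohz_full_enabled())
                        cpumask_or(mask, mask, tick_nohz_full_mask);
        }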

    Signed-off-by: Chris Metcalf
    Signed-off-by: Frederic Weisbecker
    Acked-by: Peter Zijlstra (Intel)
    Cc: Borislav Petkov
    Cc: Dave Jones
    Cc: H. Peter Anvin
    Cc: Martin Schwidefsky
    Cc: Mike Galbraith
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Rafael J. Wysocki
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1429024675-18938-1-git-send-email-cmetcalf@ezchip.com
    Link: http://lkml.kernel.org/r/1430928266-24888-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Chris Metcalf

03 Apr, 2015

4 commits

  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off with explicit
    calls instead of this monstrosity.

    Split out the cleanup function for a dead cpu and invoke it
    directly from the cpu down code. Make it conditional on
    CPU_HOTPLUG as well.

    Temporary change, will be refined in the future.

    Signed-off-by: Thomas Gleixner
    [ Rebased, added clockevents_notify() removal ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off with explicit
    calls instead of this monstrosity.

    Split out the tick_handover call and invoke it explicitly from
    the hotplug code. This temporary solution will be cleaned up in
    later patches.

    Signed-off-by: Thomas Gleixner
    [ Rebase ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off with explicit
    calls instead of this monstrosity.

    Split out the broadcast oneshot control into a separate function
    and provide inline helpers. Switch clockevents_notify() over.
    This will go away once all callers are converted.

    This also gets rid of the nested locking of clockevents_lock and
    broadcast_lock. The broadcast oneshot control functions do not
    require clockevents_lock; only the managing functions
    (setup/shutdown/suspend/resume of the broadcast device) require
    clockevents_lock.
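
    The inline helpers simply pass the explicit state, roughly:

        static inline int tick_broadcast_enter(void)
        {
                return tick_broadcast_oneshot_control(TICK_BROADCAST_ENTER);
        }

        static inline void tick_broadcast_exit(void)
        {
                tick_broadcast_oneshot_control(TICK_BROADCAST_EXIT);
        }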

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Alexandre Courbot
    Cc: Daniel Lezcano
    Cc: Len Brown
    Cc: Peter Zijlstra
    Cc: Stephen Warren
    Cc: Thierry Reding
    Cc: Tony Lindgren
    Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off with explicit
    calls instead of this monstrosity.

    Split out the broadcast control into a separate function and
    provide inline helpers. Switch clockevents_notify() over. This
    will go away once all callers are converted.

    This also gets rid of the nested locking of clockevents_lock and
    broadcast_lock. The broadcast control functions do not require
    clockevents_lock; only the managing functions
    (setup/shutdown/suspend/resume of the broadcast device) require
    clockevents_lock.
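
    The resulting helpers have this shape (a sketch; the enum values
    follow the on/off/force modes that clockevents_notify() used to
    multiplex):

        static inline void tick_broadcast_enable(void)
        {
                tick_broadcast_control(TICK_BROADCAST_ON);
        }

        static inline void tick_broadcast_disable(void)
        {
                tick_broadcast_control(TICK_BROADCAST_OFF);
        }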

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Cc: Daniel Lezcano
    Cc: Len Brown
    Cc: Peter Zijlstra
    Cc: Tony Lindgren
    Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner

02 Apr, 2015

1 commit

  • It was found, when doing a hotplug stress test on POWER, that the
    machine hit either softlockups or rcu_sched stall warnings. The
    issue was traced to commit:

    7cba160ad789 ("powernv/cpuidle: Redesign idle states management")

    which exposed the cpu_down() race with hrtimer based broadcast mode:

    5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")

    The race is the following:

    Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
    before it is taken down.

    CPU0                                  CPU1

    cpu_down()                            take_cpu_down()
                                          disable_interrupts()
    cpu_die()

    while (CPU1 != CPU_DEAD) {
            msleep(100);
            switch_to_idle();
            stop_cpu_timer();
            schedule_broadcast();
    }

    tick_cleanup_cpu_dead()
            take_over_broadcast()

    After CPU1 has disabled interrupts it cannot handle the broadcast
    hrtimer anymore, so CPU0 will be stuck forever.

    Fix this by explicitly taking over broadcast duty before cpu_die().

    This is a temporary workaround. What we really want is a callback
    in the clockevent device which allows us to do that from the dying
    CPU by pushing the hrtimer onto a different cpu. That might involve
    an IPI and is definitely more complex than this immediate fix.

    Changelog was picked up from:

    https://lkml.org/lkml/2015/2/16/213

    Suggested-by: Thomas Gleixner
    Tested-by: Nicolas Pitre
    Signed-off-by: Preeti U. Murthy
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: mpe@ellerman.id.au
    Cc: nicolas.pitre@linaro.org
    Cc: peterz@infradead.org
    Cc: rjw@rjwysocki.net
    Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
    Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com
    [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ]
    Signed-off-by: Ingo Molnar

    Preeti U Murthy

01 Apr, 2015

4 commits

  • Use the new tick_suspend_local()/tick_resume_local() functions and
    get rid of the homebrew implementation of these in the ARM bL
    switcher. The check for the cpumask is completely pointless: there
    is no harm in suspending a per-cpu tick device unconditionally. If
    that were a real issue, we would fix it properly at the core level,
    not with some completely undocumented hacks in some random core
    code.

    Move the tick internals to the core code, now that this nuisance
    is gone.

    Signed-off-by: Thomas Gleixner
    [ rjw: Rebase, changelog ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Nicolas Pitre
    Cc: Peter Zijlstra
    Cc: Russell King
    Link: http://lkml.kernel.org/r/1655112.Ws17YsMfN7@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
  • Xen calls into tick_resume() on every cpu, which is just wrong.
    tick_resume() is for the syscore global suspend/resume
    invocation. What Xen really wants is a per-cpu local resume
    function.

    Provide a tick_resume_local() function and use it in Xen.

    Also provide a complementary tick_suspend_local() and modify
    tick_unfreeze() and tick_freeze(), respectively, to use the
    new local tick resume/suspend functions.
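
    On the freeze side, the per-cpu suspend slots in roughly like this
    (a simplified sketch of tick_freeze(); the lock and depth-counter
    names may differ from upstream):

        static DEFINE_RAW_SPINLOCK(tick_freeze_lock);
        static unsigned int tick_freeze_depth;

        void tick_freeze(void)
        {
                raw_spin_lock(&tick_freeze_lock);

                tick_freeze_depth++;
                if (tick_freeze_depth == num_online_cpus())
                        timekeeping_suspend();  /* last CPU stops timekeeping too */
                else
                        tick_suspend_local();   /* others stop their own tick only */

                raw_spin_unlock(&tick_freeze_lock);
        }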

    Signed-off-by: Thomas Gleixner
    [ Combined two patches, rebased, modified subject/changelog. ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Konrad Rzeszutek Wilk
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1698741.eezk9tnXtG@vostro.rjw.lan
    [ Merged to latest timers/core. ]
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call.

    We are way better off with explicit calls instead of this
    monstrosity. Split out the suspend/resume() calls and invoke
    them directly from the call sites.

    No locking required at this point because these calls happen
    with interrupts disabled and a single cpu online.

    Signed-off-by: Thomas Gleixner
    [ Rebased on top of 4.0-rc5. ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan
    [ Rebased on top of latest timers/core. ]
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
  • No point in exposing everything to the world. People just believe
    such functions can be abused for whatever purpose. Sigh.

    Signed-off-by: Thomas Gleixner
    [ Rebased on top of 4.0-rc5 ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Nicolas Pitre
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/28017337.VbCUc39Gme@vostro.rjw.lan
    [ Merged to latest timers/core ]
    Signed-off-by: Ingo Molnar

    Thomas Gleixner

16 Feb, 2015

1 commit

  • The efficiency of suspend-to-idle depends on being able to keep CPUs
    in the deepest available idle states for as much time as possible.
    Ideally, they should only be brought out of idle by system wakeup
    interrupts.

    However, timer interrupts occurring periodically prevent that from
    happening and it is not practical to chase all of the "misbehaving"
    timers in a whack-a-mole fashion. A much more effective approach is
    to suspend the local ticks for all CPUs and the entire timekeeping
    along the lines of what is done during full suspend, which also
    helps to keep suspend-to-idle and full suspend reasonably similar.

    The idea is to suspend the local tick on each CPU executing
    cpuidle_enter_freeze() and to make the last of them suspend the
    entire timekeeping. That should prevent timer interrupts from
    triggering until an IO interrupt wakes up one of the CPUs. It
    needs to be done with interrupts disabled on all of the CPUs,
    though, because otherwise the suspended clocksource might be
    accessed by an interrupt handler which might lead to fatal
    consequences.

    Unfortunately, the existing ->enter callbacks provided by cpuidle
    drivers generally cannot be used for implementing that, because some
    of them re-enable interrupts temporarily and some idle entry methods
    cause interrupts to be re-enabled automatically on exit. Also some
    of these callbacks manipulate local clock event devices of the CPUs
    which really shouldn't be done after suspending their ticks.

    To overcome that difficulty, introduce a new cpuidle state callback,
    ->enter_freeze, that will be guaranteed (1) to keep interrupts
    disabled all the time (and return with interrupts disabled) and (2)
    not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to
    look for the deepest available idle state with ->enter_freeze present
    and to make the CPU execute that callback with suspended tick (and the
    last of the online CPUs to execute it with suspended timekeeping).
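
    A hypothetical driver wiring for the new callback (all my_* names
    are made up; only the ->enter/->enter_freeze signatures follow the
    description above):

        #include <linux/cpuidle.h>

        extern void my_platform_deep_idle(void);        /* made-up hook */

        static int my_deep_enter(struct cpuidle_device *dev,
                                 struct cpuidle_driver *drv, int index)
        {
                my_platform_deep_idle();
                return index;
        }

        /* Suspend-to-idle path: must keep interrupts disabled and must
         * not touch the local clock event device. */
        static void my_deep_enter_freeze(struct cpuidle_device *dev,
                                         struct cpuidle_driver *drv, int index)
        {
                my_platform_deep_idle();
        }

        static struct cpuidle_driver my_cpuidle_driver = {
                .name           = "my_cpuidle",
                .state_count    = 1,
                .states[0]      = {
                        .name           = "DEEP",
                        .enter          = my_deep_enter,
                        .enter_freeze   = my_deep_enter_freeze,
                },
        };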

    Suggested-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki

14 Oct, 2014

1 commit

  • Pull s390 updates from Martin Schwidefsky:
    "This patch set contains the main portion of the changes for 3.18 in
    regard to the s390 architecture. It is a bit bigger than usual,
    mainly because of a new driver and the vector extension patches.

    The interesting bits are:
    - Quite a bit of work on the tracing front. Uprobes is enabled and
    the ftrace code is reworked to get some of the lost performance
    back if CONFIG_FTRACE is enabled.
    - To improve boot time with CONFIG_DEBUG_PAGEALLOC, support for the
    IPTE range facility is added.
    - The rwlock code is re-factored to improve writer fairness and to be
    able to use the interlocked-access instructions.
    - The kernel part for the support of the vector extension is added.
    - The device driver to access the CD/DVD on the HMC is added, this
    will hopefully come in handy to improve the installation process.
    - Add support for control-unit initiated reconfiguration.
    - The crypto device driver is enhanced to enable the additional AP
    domains and to allow the new crypto hardware to be used.
    - Bug fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
    s390/ftrace: simplify enabling/disabling of ftrace_graph_caller
    s390/ftrace: remove 31 bit ftrace support
    s390/kdump: add support for vector extension
    s390/disassembler: add vector instructions
    s390: add support for vector extension
    s390/zcrypt: Toleration of new crypto hardware
    s390/idle: consolidate idle functions and definitions
    s390/nohz: use a per-cpu flag for arch_needs_cpu
    s390/vtime: do not reset idle data on CPU hotplug
    s390/dasd: add support for control unit initiated reconfiguration
    s390/dasd: fix infinite loop during format
    s390/mm: make use of ipte range facility
    s390/setup: correct 4-level kernel page table detection
    s390/topology: call set_sched_topology early
    s390/uprobes: architecture backend for uprobes
    s390/uprobes: common library for kprobes and uprobes
    s390/rwlock: use the interlocked-access facility 1 instructions
    s390/rwlock: improve writer fairness
    s390/rwlock: remove interrupt-enabling rwlock variant.
    s390/mm: remove change bit override support
    ...

    Linus Torvalds

14 Sep, 2014

1 commit

  • This way we unbloat main.c a bit and, more importantly, we
    initialize nohz full after init_IRQ(). This dependency will be
    needed in further patches because nohz full needs irq work to raise
    its own IRQ. Information about the support for this ability on
    ARM64 is obtained in init_IRQ(), which initializes the pointer to
    __smp_call_function.

    Since tick_init() is called right after init_IRQ(), this is a good place
    to call tick_nohz_init() and prepare for that dependency.
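
    After the move, the ordering is simply (the shape of the resulting
    code in kernel/time/tick-common.c):

        void __init tick_init(void)
        {
                tick_broadcast_init();
                tick_nohz_init();       /* now guaranteed to run after init_IRQ() */
        }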

    Acked-by: Peter Zijlstra (Intel)
    Cc: Ingo Molnar
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker

05 Sep, 2014

1 commit

  • The local nohz kick is currently used by perf, which needs it to be
    NMI-safe. A recent commit, though (7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9),
    changed its implementation to fire the local kick using the remote
    kick API. That made the code more generic, but the remote kick isn't
    NMI-safe.

    As a result:

    WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
    CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
    0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
    0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
    0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
    Call Trace:
    [] dump_stack+0x4e/0x7a
    [] warn_slowpath_common+0x7d/0xa0
    [] warn_slowpath_null+0x1a/0x20
    [] irq_work_queue_on+0x11e/0x140
    [] tick_nohz_full_kick_cpu+0x57/0x90
    [] __perf_event_overflow+0x275/0x350
    [] ? perf_event_task_disable+0xa0/0xa0
    [] ? x86_perf_event_set_period+0xbf/0x150
    [] perf_event_overflow+0x14/0x20
    [] intel_pmu_handle_irq+0x206/0x410
    [] ? arch_vtime_task_switch+0x63/0x130
    [] perf_event_nmi_handler+0x2b/0x50
    [] nmi_handle+0xd2/0x390
    [] ? nmi_handle+0x5/0x390
    [] ? lock_release+0xab/0x330
    [] default_do_nmi+0x72/0x1c0
    [] ? cpuacct_account_field+0xcf/0x200
    [] do_nmi+0xb8/0x100

    Let's fix this by restoring the use of local irq work for the nohz
    local kick.
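
    The restored local kick, in sketch form (nohz_full_kick_work being
    the per-cpu irq_work used by tick-sched.c):

        /* irq_work_queue() is NMI-safe on the local CPU, unlike
         * irq_work_queue_on(), so the local kick must use it. */
        void tick_nohz_full_kick(void)
        {
                if (!tick_nohz_full_cpu(smp_processor_id()))
                        return;

                irq_work_queue(this_cpu_ptr(&nohz_full_kick_work));
        }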

    Reported-by: Catalin Iacob
    Reported-and-tested-by: Dave Jones
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker

05 Aug, 2014

1 commit

  • Pull scheduler updates from Ingo Molnar:

    - Move the nohz kick code out of the scheduler tick to a dedicated IPI,
    from Frederic Weisbecker.

    This necessitated quite some background infrastructure rework,
    including:

    * Clean up some irq-work internals
    * Implement remote irq-work
    * Implement nohz kick on top of remote irq-work
    * Move full dynticks timer enqueue notification to new kick
    * Move multi-task notification to new kick
    * Remove unnecessary barriers on multi-task notification

    - Remove proliferation of wait_on_bit() action functions and allow
    wait_on_bit_action() functions to support a timeout. (Neil Brown)

    - Another round of sched/numa improvements, cleanups and fixes. (Rik
    van Riel)

    - Implement fast idling of CPUs when the system is partially loaded,
    for better scalability. (Tim Chen)

    - Restructure and fix the CPU hotplug handling code that may leave
    cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
    cpu. (Kirill Tkhai)

    - Robustify the sched topology setup code. (Peter Zijlstra)

    - Improve sched_feat() handling wrt. static_keys (Jason Baron)

    - Misc fixes.

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    sched/fair: Fix 'make xmldocs' warning caused by missing description
    sched: Use macro for magic number of -1 for setparam
    sched: Robustify topology setup
    sched: Fix sched_setparam() policy == -1 logic
    sched: Allow wait_on_bit_action() functions to support a timeout
    sched: Remove proliferation of wait_on_bit() action functions
    sched/numa: Revert "Use effective_load() to balance NUMA loads"
    sched: Fix static_key race with sched_feat()
    sched: Remove extra static_key*() function indirection
    sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
    sched: Transform resched_task() into resched_curr()
    sched/deadline: Kill task_struct->pi_top_task
    sched: Rework check_for_tasks()
    sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
    sched/fair: Disable runtime_enabled on dying rq
    sched/numa: Change scan period code to match intent
    sched/numa: Rework best node setting in task_numa_migrate()
    sched/numa: Examine a task move when examining a task swap
    sched/numa: Simplify task_numa_compare()
    sched/numa: Use effective_load() to balance NUMA loads
    ...

    Linus Torvalds

10 Jul, 2014

1 commit

  • Binding the grace-period kthreads to the timekeeping CPU resulted in
    significant performance decreases for some workloads. For more detail,
    see:

    https://lkml.org/lkml/2014/6/3/395 for benchmark numbers

    https://lkml.org/lkml/2014/6/4/218 for CPU statistics

    It turns out that it is necessary to bind the grace-period kthreads
    to the timekeeping CPU only when every CPU except CPU 0 is a
    nohz_full CPU, or when CONFIG_NO_HZ_FULL_SYSIDLE=y.
    In other cases, it suffices to bind the grace-period kthreads to the
    set of non-nohz_full CPUs.

    This commit therefore creates a tick_nohz_not_full_mask that is the
    complement of tick_nohz_full_mask, and then binds the grace-period
    kthread to the set of CPUs indicated by this new mask, which covers
    the CONFIG_NO_HZ_FULL_SYSIDLE=n case. The CONFIG_NO_HZ_FULL_SYSIDLE=y
    case still binds the grace-period kthreads to the timekeeping CPU.
    This commit also includes the tick_nohz_full_enabled() check suggested
    by Frederic Weisbecker.
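
    The resulting helper looks roughly like this (sketched from the
    description above):

        static inline void housekeeping_affine(struct task_struct *t)
        {
        #ifdef CONFIG_NO_HZ_FULL
                if (tick_nohz_full_enabled())
                        set_cpus_allowed_ptr(t, housekeeping_mask);
        #endif
        }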

    Reported-by: Jet Chen
    Signed-off-by: Paul E. McKenney
    [ paulmck: Created housekeeping_affine() and housekeeping_mask per
    fweisbec feedback. ]

    Paul E. McKenney

16 Jun, 2014

1 commit

  • Remotely kicking a full nohz CPU in order to make it re-evaluate its
    next tick is currently implemented using the scheduler IPI.

    However, this bloats a scheduler fast path with an off-topic feature.
    The scheduler IPI was abused here for its cool "callable
    anywhere/anytime" properties.

    But now that the irq work subsystem can queue remote callbacks, it's
    a perfect fit to safely queue IPIs when interrupts are disabled
    without worrying about concurrent callers.

    So let's implement the remote kick on top of irq work. This is going
    to be used when a new event requires the next tick to be recalculated:
    more than one task competing for the CPU, a timer being armed, ...
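
    In sketch form, the remote kick then becomes (with
    nohz_full_kick_work as the per-cpu irq_work):

        /* Kick a remote full dynticks CPU so it re-evaluates its tick.
         * Safe with interrupts disabled thanks to remote irq work. */
        void tick_nohz_full_kick_cpu(int cpu)
        {
                if (!tick_nohz_full_cpu(cpu))
                        return;

                irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
        }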

    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker

16 Jan, 2014

1 commit

  • This makes the code more symmetric with the existing tick functions
    called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

    These functions are also symmetric in that they mirror each other's
    actions: we start accounting idle time on irq exit and stop that
    accounting on irq entry. Likewise, the tick is stopped on irq exit
    and timekeeping catches up with the tickless time elapsed when we
    reach irq entry.

    This rename was suggested by Peter Zijlstra a long while ago, but it
    got lost in the shuffle.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker

14 Aug, 2013

3 commits

  • …rederic/linux-dynticks into timers/nohz

    Pull nohz improvements from Frederic Weisbecker:

    " It mostly contains fixes and full dynticks off-case optimizations. I believe that
    distros want to enable this feature so it seems important to optimize the case
    where the "nohz_full=" parameter is empty. ie: I'm trying to remove any performance
    regression that comes with NO_HZ_FULL=y when the feature is not used.

    This patchset improves the current situation a lot (off-case appears to be around 11% faster
    with hackbench, although I guess it may vary depending on the configuration but it should be
    significantly faster in any case) now there is still some work to do: I can still observe a
    remaining loss of 1.6% throughput seen with hackbench compared to CONFIG_NO_HZ_FULL=n. "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
  • Scheduler IPIs and task context switches are serious fast paths.
    Let's hide, as much as we can, the off-case impact of the full
    dynticks APIs called from these sites, through the use of static
    keys.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
  • These APIs are frequently accessed, and priority is given to
    optimizing the full dynticks off-case so that distros can enable
    this feature without suffering significant performance
    regressions.

    Let's inline these APIs and optimize them with static keys.
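
    The pattern, sketched (the static key name here is illustrative;
    upstream reuses an existing key rather than defining a new one):

        #include <linux/jump_label.h>
        #include <linux/cpumask.h>

        extern struct static_key nohz_full_key;         /* illustrative */
        extern cpumask_var_t tick_nohz_full_mask;

        static inline bool tick_nohz_full_enabled(void)
        {
                /* off case: a patched-out jump, essentially free */
                return static_key_false(&nohz_full_key);
        }

        static inline bool tick_nohz_full_cpu(int cpu)
        {
                if (!tick_nohz_full_enabled())
                        return false;

                return cpumask_test_cpu(cpu, tick_nohz_full_mask);
        }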

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker

29 Jul, 2013

1 commit

  • Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
    repeat mode), because it has been identified as the source of a
    significant performance regression in v3.8 and later, as explained
    by Jeremy Eder:

    We believe we've identified a particular commit to the cpuidle code
    that seems to be impacting performance of a variety of workloads.
    The simplest way to reproduce is using the netperf TCP_RR test, so
    we're using that, on a pair of Sandy Bridge based servers. We also
    have data from a large database setup where performance is also
    measurably/positively impacted, though that test data isn't easily
    share-able.

    Included below are test results from 3 test kernels:

    kernel reverts
    -----------------------------------------------------------
    1) vanilla upstream (no reverts)

    2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c

    3) test reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    e11538d1f03914eb92af5a1a378375c05ae8520c

    In summary, netperf TCP_RR numbers improve by approximately 4%
    after reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. When
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency
    never seems to get above 40%. Taking that patch out gets C0 near
    100% quite often, and performance increases.

    The below data are histograms representing the %c0 residency @
    1-second sample rates (using turbostat), while under netperf test.

    - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
    - The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4,
    shows %c0 in the 80,90,100% bins.

    Below each kernel name are netperf TCP_RR trans/s numbers for the
    particular kernel that can be disclosed publicly, comparing the 3
    test kernels. We ran a 4th test with the vanilla kernel where
    we've also set /dev/cpu_dma_latency=0 to show overall impact
    boosting single-threaded TCP_RR performance over 11% above
    baseline.

    3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
    TCP_RR trans/s 54323.78

    -----------------------------------------------------------
    3.10-rc2 vanilla RX (no reverts)
    TCP_RR trans/s 48192.47

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 1]: *
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 49]: *************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 perfteam2 RX (reverts commit e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 49698.69

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 2]: **
    40.0000 - 50.0000 [ 58]: **********************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    and e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 47766.95

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 27]: ***************************
    40.0000 - 50.0000 [ 2]: **
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 2]: **
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 28]: ****************************

    Sender:
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 1]: *
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 3]: ***
    80.0000 - 90.0000 [ 7]: *******
    90.0000 - 100.0000 [ 38]: **************************************

    These results demonstrate gaining back the tendency of the CPU to
    stay in more responsive, performant C-states (and thus yield
    measurably better performance), by reverting commit
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4.

    Requested-by: Jeremy Eder
    Tested-by: Len Brown
    Cc: 3.8+
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki

23 Apr, 2013

2 commits

  • When a task is scheduled in, it may have some properties
    of its own that could make the CPU reconsider the need for
    the tick: posix cpu timers, perf events, ...

    So notify the full dynticks subsystem when a task gets
    scheduled in and re-check the tick dependency at this
    stage. This is done through a self-IPI to avoid messing
    with any current lock scenario.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
  • The scheduler IPI is used by the scheduler to kick
    full dynticks CPUs asynchronously when more than one
    task is running or when a new timer list timer is
    enqueued. This way the destination CPU can decide
    to restart the tick to handle this new situation.

    Now let's issue that kick from the scheduler IPI.

    (Reusing the scheduler IPI rather than implementing
    a new IPI was suggested by Peter Zijlstra a while ago)

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker

19 Apr, 2013

2 commits

  • We need full dynticks CPUs to also be RCU nocb CPUs so
    that we don't have to keep the tick around to handle RCU
    callbacks.

    Make sure the range passed to the nohz_full= boot
    parameter is a subset of rcu_nocbs=.

    The CPUs that fail to meet this requirement will be
    excluded from the nohz_full range. This is checked
    early at boot time, before any CPU has the opportunity
    to stop its tick.
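
    The boot-time check amounts to something like this (a sketch,
    assuming tick_nohz_full_mask and rcu_nocb_mask have already been
    parsed; the function name is illustrative):

        static void __init nohz_full_trim_to_nocb(void)
        {
                if (cpumask_subset(tick_nohz_full_mask, rcu_nocb_mask))
                        return;

                pr_warn("NO_HZ: nohz_full CPUs not in rcu_nocbs=, excluding them\n");
                cpumask_and(tick_nohz_full_mask, tick_nohz_full_mask,
                            rcu_nocb_mask);
        }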

    Suggested-by: Steven Rostedt
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
  • Provide two new helpers in order to notify the full dynticks CPUs
    about some internal system changes against which they may need to
    reconsider the state of their tick. Practical examples include:
    posix cpu timers, the perf tick and the sched clock tick.

    For now the notifying handler, implemented through IPIs, is a stub
    that will be filled in when we get the tick stop/restart
    infrastructure in place.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker

16 Apr, 2013

1 commit

  • "Extended nohz" was used as a naming base for the full dynticks
    API and Kconfig symbols. It reflects the fact the system tries
    to stop the tick in more places than just idle.

    But that "extended" name is a bit opaque and vague. Rename it to
    "full" makes it clearer what the system tries to do under this
    config: try to shutdown the tick anytime it can. The various
    constraints that prevent that to happen shouldn't be considered
    as fundamental properties of this feature but rather technical
    issues that may be solved in the future.

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker

03 Apr, 2013

1 commit

  • We are planning to convert the dynticks Kconfig options layout
    into a choice menu. The user must be able to easily pick
    any of the following implementations: constant periodic tick,
    idle dynticks, full dynticks.

    As this implies mutual exclusion, the two dynticks implementations
    need to converge on the selection of a common Kconfig option in order
    to ease the sharing of a common infrastructure.

    It would thus seem pretty natural to reuse CONFIG_NO_HZ to
    that end. It already implements all the idle dynticks code,
    and the full dynticks code depends on all of that for now.
    So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
    CONFIG_NO_HZ_EXTENDED, and both would select CONFIG_NO_HZ.

    On the other hand we want to stay backward compatible: if
    CONFIG_NO_HZ is set in an older config file, we want to
    enable CONFIG_NO_HZ_IDLE by default.

    But we can't afford both at the same time or we run into
    a circular dependency:

    1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
    CONFIG_NO_HZ
    2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

    We might be able to support that from Kconfig/Kbuild but it
    may not be wise to introduce such a confusing behaviour.

    So to solve this, create a new CONFIG_NO_HZ_COMMON option
    which gathers the common code between idle and full dynticks
    (that common code for now is simply the idle dynticks code)
    and select it from their referring Kconfig.

    Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
    to it for backward compatibility.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker