Eric Lee / smarc-fsl-linux-kernel

15 Jun, 2020

1 commit

9cc5b8656 isolcpus: Affine unbound kernel threads to housekeeping cpus

This is a kernel enhancement that configures the cpu affinity of kernel
threads via kernel boot option nohz_full=.

When this option is specified, the cpumask is immediately applied upon
kthread launch. This does not affect kernel threads that specify cpu
and node.

This allows CPU isolation (that is not allowing certain threads
to execute on certain CPUs) without using the isolcpus=domain parameter,
making it possible to enable load balancing on such CPUs
during runtime (see kernel-parameters.txt).
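
As a rough illustration of the mask arithmetic described above (not the
kernel's implementation), the following Python sketch parses a cpulist of the
kind passed to nohz_full= and derives the housekeeping set as its complement.
The function names and the fixed CPU count are assumptions made for the
example, and only plain numbers and ranges are handled.

```python
def parse_cpulist(cpulist):
    """Parse a cpulist such as "1-3,5" into a set of CPU numbers."""
    cpus = set()
    for chunk in cpulist.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


def housekeeping_cpus(nr_cpus, nohz_full):
    """Return the CPUs left for housekeeping: all CPUs minus the nohz_full ones."""
    return set(range(nr_cpus)) - parse_cpulist(nohz_full)


# Hypothetical machine with 8 CPUs booted with nohz_full=2-7: the remaining
# CPUs are the ones unbound kernel threads would be affined to.
kthread_mask = housekeeping_cpus(8, "2-7")
```

Kernel threads created for a specific cpu or node are outside this model, as
the text above notes.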

Note-1: this is based on Wind River's patch at
https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch

The difference is that this patch is limited to modifying the kernel thread
cpumask. The behaviour of other threads can be controlled via cgroups or
sched_setaffinity.

Note-2: Wind River's patch was based off Christoph Lameter's patch at
https://lwn.net/Articles/565932/ with the only difference being
the kernel parameter changed from kthread to kthread_cpus.

Signed-off-by: Frederic Weisbecker
Signed-off-by: Marcelo Tosatti
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20200527142909.23372-3-frederic@kernel.org

Marcelo Tosatti
2020-06-15 20:10:03 +0800

15 Apr, 2020

1 commit

3662daf02 sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters

The "isolcpus=" parameter allows sub-parameters before the cpulist is
specified, and if the parser detects an unknown sub-parameter the whole
parameter will be ignored.

This design is incompatible with itself when new sub-parameters are added.
An older kernel will not recognize the new sub-parameter and will
invalidate the whole parameter so the CPU isolation will not take
effect. It emits a warning:

isolcpus: Error, unknown flag

The better and compatible way is to allow "isolcpus=" to skip unknown
sub-parameters, so that even if new sub-parameters are added an older
kernel will still be able to behave as usual, even with the new
sub-parameter specified on the command line.
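
The skipping behaviour can be sketched in a few lines of Python. This is a
simplified model, not the kernel parser: the set of known flags, the function
name and the "futureflag" word are assumptions for the example, and the
fallback to "domain" when no known flag is present follows the default
described elsewhere in this history.

```python
KNOWN_FLAGS = {"nohz", "domain", "managed_irq"}  # assumed set for the example


def parse_isolcpus(value):
    """Split an "isolcpus=" value into (known flags, skipped flags, cpulist).

    Leading comma-separated words that start with a letter are treated as
    sub-parameters. Unknown ones are remembered and skipped instead of
    invalidating the whole parameter. The remainder is the cpulist.
    """
    flags, skipped = [], []
    rest = value
    while rest and rest[0].isalpha():
        word, _, rest = rest.partition(",")
        if word in KNOWN_FLAGS:
            flags.append(word)
        else:
            skipped.append(word)
    if not flags:
        # No recognised flag: keep the historical default.
        flags.append("domain")
    return flags, skipped, rest


# "futureflag" is a hypothetical sub-parameter an older kernel does not know.
result = parse_isolcpus("managed_irq,futureflag,4-7")
```

Here the cpulist 4-7 is still honoured and only the unknown word is dropped.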

Ideally this should have been there when the first sub-parameter for
"isolcpus=" was introduced.

Suggested-by: Thomas Gleixner
Signed-off-by: Peter Xu
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20200403223517.406353-1-peterx@redhat.com

Peter Xu
2020-04-15 16:38:26 +0800

22 Jan, 2020

1 commit

11ea68f55 genirq, sched/isolation: Isolate from handling managed interrupts

The affinity of managed interrupts is completely handled in the kernel and
cannot be changed via the /proc/irq/* interfaces from user space. As the
kernel tries to spread out interrupts evenly across CPUs on x86 to prevent
vector exhaustion, it can happen that a managed interrupt whose affinity
mask contains both isolated and housekeeping CPUs is routed to an isolated
CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts
on the isolated CPU.

Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding
logic in the interrupt affinity selection code.

The subparameter indicates to the interrupt affinity selection logic that
it should try to avoid the above scenario.

This isolation is best effort and only effective if the automatically
assigned interrupt mask of a device queue contains isolated and
housekeeping CPUs. If housekeeping CPUs are online then such interrupts are
directed to the housekeeping CPU so that IO submitted on the housekeeping
CPU cannot disturb the isolated CPU.

If a queue's affinity mask contains only isolated CPUs then this parameter
has no effect on the interrupt routing decision, though interrupts are only
happening when tasks running on those isolated CPUs submit IO. IO submitted
on housekeeping CPUs has no influence on those queues.

If the affinity mask contains both housekeeping and isolated CPUs, but none
of the contained housekeeping CPUs is online, then the interrupt is also
routed to an isolated CPU. Interrupts are only delivered when one of the
isolated CPUs in the affinity mask submits IO. If one of the contained
housekeeping CPUs comes online, the CPU hotplug logic migrates the
interrupt automatically back to the upcoming housekeeping CPU. Depending on
the type of interrupt controller, this can require that at least one
interrupt is delivered to the isolated CPU in order to complete the
migration.
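
The three cases above can be modelled with plain set operations. The Python
sketch below is an illustration under assumptions (the function name and the
use of sets for cpumasks are invented here); it is not the interrupt affinity
code itself.

```python
def effective_irq_mask(affinity, housekeeping, online):
    """Pick the CPUs a managed interrupt may be routed to.

    All arguments are sets of CPU numbers. The housekeeping CPUs contained
    in the affinity mask are preferred, but only if at least one of them is
    online. Otherwise the full affinity mask is used.
    """
    preferred = affinity & housekeeping
    if preferred & online:
        return preferred
    return affinity


housekeeping = {0, 1}

# Mixed mask with an online housekeeping CPU: route to the housekeeping CPU.
mixed = effective_irq_mask({1, 2}, housekeeping, online={0, 1, 2, 3})

# Mask with isolated CPUs only: the sub-parameter has no effect.
isolated_only = effective_irq_mask({2, 3}, housekeeping, online={0, 1, 2, 3})

# Mixed mask, but the housekeeping CPU is offline: fall back to the full mask.
hk_offline = effective_irq_mask({1, 2}, housekeeping, online={2, 3})
```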

[ tglx: Removed unused parameter, added and edited comments/documentation
and rephrased the changelog so it contains more details. ]

Signed-off-by: Ming Lei
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20200120091625.17912-1-ming.lei@redhat.com

Ming Lei
2020-01-22 23:29:49 +0800

25 Jul, 2019

1 commit

e0e8d4911 sched/isolation: Prefer housekeeping CPU in local node

In a real product setup, there will be housekeeping CPUs in each node. It
is preferable to do housekeeping from the local node, falling back to the
global online cpumask if no housekeeping CPU is found in the local node.
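
A minimal Python sketch of that preference order, assuming a simple mapping
from CPU to NUMA node. The function name, the node_of mapping and the choice
of the lowest-numbered candidate are assumptions for the example, and the
real code may weigh NUMA distance rather than only same-node membership.

```python
def pick_housekeeping_cpu(current_cpu, housekeeping, online, node_of):
    """Return a housekeeping CPU, preferring one on the caller's node.

    housekeeping and online are sets of CPU numbers, node_of maps a CPU
    number to its node. Returns None if no housekeeping CPU is online.
    """
    candidates = housekeeping & online
    local_node = node_of[current_cpu]
    local = sorted(cpu for cpu in candidates if node_of[cpu] == local_node)
    if local:
        return local[0]
    # No housekeeping CPU on the local node: fall back to any online one.
    fallback = sorted(candidates)
    return fallback[0] if fallback else None


# Hypothetical two-node machine: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
node_of = {0: 0, 1: 0, 2: 1, 3: 1}
online = {0, 1, 2, 3}
local_choice = pick_housekeeping_cpu(3, {0, 2}, online, node_of)
remote_choice = pick_housekeeping_cpu(3, {0}, online, node_of)
```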

Signed-off-by: Wanpeng Li
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Frederic Weisbecker
Reviewed-by: Srikar Dronamraju
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: https://lkml.kernel.org/r/1561711901-4755-2-git-send-email-wanpengli@tencent.com
Signed-off-by: Ingo Molnar

Wanpeng Li
2019-07-25 21:51:55 +0800

20 Jul, 2019

1 commit

0c5f81dad KVM: LAPIC: Inject timer interrupt via posted interrupt

Dedicated instances are currently disturbed by unnecessary jitter due
to the emulated lapic timers firing on the same pCPUs where the
vCPUs reside. Unlike ARM, Intel has no hardware virtual timer for guests,
so both programming the timer in the guest and the firing of the emulated
timer incur vmexits. This patch tries to avoid the vmexit when the emulated
timer fires, at least in the dedicated instance scenario when nohz_full is
enabled.

In that case, the emulated timers can be offloaded to the nearest busy
housekeeping cpus, since APICv has been available in server processors
for several years. The guest timer interrupt can then be injected via
posted interrupts, which are delivered by the housekeeping cpu once the
emulated timer fires.

The host should be tuned so that vCPUs are placed on isolated physical
processors, with several surplus pCPUs for busy housekeeping.
With mwait/hlt/pause vmexits disabled so that the vCPUs stay in non-root
mode, a ~3% redis performance benefit can be observed on a Skylake server,
and the number of external interrupt vmexits drops substantially. Without
the patch:

VM-EXIT             Samples  Samples%   Time%  Min Time  Max Time  Avg time
EXTERNAL_INTERRUPT    42916    49.43%  39.30%    0.47us  106.09us  0.71us ( +- 1.09% )

While with patch:

VM-EXIT             Samples  Samples%   Time%  Min Time  Max Time  Avg time
EXTERNAL_INTERRUPT     6871     9.29%   2.96%    0.44us   57.88us  0.72us ( +- 4.02% )
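
To put the two sample counts side by side, the relative drop can be worked
out as below. This assumes both traces cover a comparable workload and
duration, which the message does not state. The helper name is made up for
the example.

```python
def relative_drop(before, after):
    """Fraction by which a count decreased, relative to the starting value."""
    return (before - after) / before


# EXTERNAL_INTERRUPT sample counts quoted above.
drop = relative_drop(42916, 6871)
summary = f"{drop:.1%} fewer EXTERNAL_INTERRUPT exits"
```

Under that assumption the sample count falls by roughly 84%, while the
average time per exit stays about the same.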

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Marcelo Tosatti
Signed-off-by: Wanpeng Li
Signed-off-by: Paolo Bonzini

Wanpeng Li
2019-07-20 15:00:40 +0800

21 May, 2019

1 commit

457c89965 treewide: Add SPDX license identifier for missed files

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:45 +0800

04 May, 2019

1 commit

9219565aa sched/isolation: Require a present CPU in housekeeping mask

During housekeeping mask setup, currently a possible CPU is required.
That does not guarantee the CPU would be available at boot time, so
check to ensure that at least one present CPU is in the mask.
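
The difference between the two checks can be shown with sets standing in for
cpumasks. This Python sketch is only a model: the function names and the
example masks are assumptions, and what the kernel does when the check fails
is not covered here.

```python
def has_possible_housekeeping_cpu(possible, isolated):
    """Old check: some possible CPU remains outside the isolated set."""
    return bool(possible - isolated)


def has_present_housekeeping_cpu(present, isolated):
    """New check: some present CPU remains outside the isolated set."""
    return bool(present - isolated)


# Hypothetical system: 4 possible CPUs, but only CPUs 0 and 1 are present.
possible = {0, 1, 2, 3}
present = {0, 1}
isolated = {0, 1}

old_ok = has_possible_housekeeping_cpu(possible, isolated)
new_ok = has_present_housekeeping_cpu(present, isolated)
```

With these masks the old check passes because CPUs 2 and 3 are possible, even
though neither is present, while the new check rejects the configuration.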

Signed-off-by: Nicholas Piggin
Signed-off-by: Peter Zijlstra (Intel)
Cc: Frederic Weisbecker
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Rafael J . Wysocki
Cc: Thomas Gleixner
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-5-npiggin@gmail.com
Signed-off-by: Ingo Molnar

Nicholas Piggin
2019-05-04 01:42:58 +0800

13 Feb, 2019

1 commit

c89d92edd sched/fair: Use non-atomic cpumask_{set,clear}_cpu()

The cpumasks updated here are not subject to concurrency and using
atomic bitops for them is pointless and expensive. Use the non-atomic
variants instead.

Suggested-by: Peter Zijlstra
Signed-off-by: Viresh Kumar
Cc: Linus Torvalds
Cc: Thomas Gleixner
Cc: Vincent Guittot
Link: http://lkml.kernel.org/r/2e2a10f84b9049a81eef94ed6d5989447c21e34a.1549963617.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar

Viresh Kumar
2019-02-13 15:34:13 +0800

03 Dec, 2018

1 commit

dfcb245e2 sched: Fix various typos in comments

Go over the scheduler source code and fix common typos
in comments - and a typo in an actual variable name.

No change in functionality intended.

Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Linus Torvalds
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2018-12-03 18:55:42 +0800

04 Mar, 2018

1 commit

325ea10c0 sched/headers: Simplify and clean up header usage in the scheduler

Do the following cleanups and simplifications:

- sched/sched.h already includes , so no need to
include it in sched/core.c again.

- order the headers alphabetically

- add all headers to kernel/sched/sched.h

- remove all unnecessary includes from the .c files that
are already included in kernel/sched/sched.h.

Finally, make all scheduler .c files use a single common header:

#include "sched.h"

... which now contains a union of the relied upon headers.

This makes the various .c files easier to read and easier to handle.

Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2018-03-04 19:39:29 +0800

03 Mar, 2018

1 commit

97fb7a0a8 sched: Clean up and harmonize the coding style of the scheduler code base

A good number of small style inconsistencies have accumulated
in the scheduler core, so do a pass over them to harmonize
all these details:

- fix speling in comments,

- use curly braces for multi-line statements,

- remove unnecessary parentheses from integer literals,

- capitalize consistently,

- remove stray newlines,

- add comments where necessary,

- remove invalid/unnecessary comments,

- align structure definitions and other data types vertically,

- add missing newlines for increased readability,

- fix vertical tabulation where it's misaligned,

- harmonize preprocessor conditional block labeling
and vertical alignment,

- remove line-breaks where they uglify the code,

- add newline after local variable definitions,

No change in functionality:

md5:
1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm
1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm

Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2018-03-03 22:50:21 +0800

21 Feb, 2018

2 commits

d84b31313 sched/isolation: Offload residual 1Hz scheduler tick

When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
keep the scheduler stats alive. However this residual tick is a burden
for bare metal tasks that can't stand any interruption at all, or want
to minimize them.

The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now
outsource these scheduler ticks to the global workqueue so that a
housekeeping CPU handles those remotely. The sched_class::task_tick()
implementations have been audited and look safe to be called remotely
as the target runqueue and its current task are passed as parameters
and don't seem to be accessed locally.

Note that in the case of using isolcpus, it's still up to the user to
affine the global workqueues to the housekeeping CPUs through
/sys/devices/virtual/workqueue/cpumask or domains isolation
"isolcpus=nohz,domain".

Signed-off-by: Frederic Weisbecker
Reviewed-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1519186649-3242-6-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2018-02-21 16:49:09 +0800
1bda3f808 sched/isolation: Isolate workqueues when "nohz_full=" is set

As we prepare for offloading the residual 1hz scheduler ticks to
workqueue, let's affine those to housekeepers so that they don't
interrupt the CPUs that don't want to be disturbed.

Signed-off-by: Frederic Weisbecker
Reviewed-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1519186649-3242-5-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2018-02-21 16:49:08 +0800

27 Oct, 2017

7 commits

150dfee95 sched/isolation: Add basic isolcpus flags

Add flags to control NOHZ and domain isolation from "isolcpus=", in
order to centralize the isolation features to a common interface. Domain
isolation remains the default so as not to break the existing isolcpus
boot parameter behaviour.

Further flags in the future may include 0hz (1hz tick offload) and timers,
workqueue, RCU, kthread, watchdog, likely all merged together in a
common flag ("async"?). In any case, this will have to be modifiable by
cpusets.

Signed-off-by: Frederic Weisbecker
Acked-by: Thomas Gleixner
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1509072159-31808-12-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-10-27 15:55:31 +0800
edb938217 sched/isolation: Move isolcpus= handling to the housekeeping code

We want to centralize the isolation features in the housekeeping
subsystem, and scheduler domain isolation is a significant part of them.

No intended behaviour change, we just reuse the housekeeping cpumask
and core code.

Signed-off-by: Frederic Weisbecker
Acked-by: Thomas Gleixner
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1509072159-31808-11-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-10-27 15:55:30 +0800
6f1982fed sched/isolation: Handle the nohz_full= parameter

We want to centralize the isolation management, done by the housekeeping
subsystem. Therefore we need to handle the nohz_full= parameter from
there.

Since nohz_full= so far has involved unbound timers, watchdog, RCU
and tilegx NAPI isolation, we keep that default behaviour.

nohz_full= will be deprecated in the future. We want to control
the isolation features from the isolcpus= parameter.

Signed-off-by: Frederic Weisbecker
Acked-by: Thomas Gleixner
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1509072159-31808-10-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-10-27 15:55:30 +0800
de201559d sched/isolation: Introduce housekeeping flags

Before we implement isolcpus under housekeeping, we need the isolation
features to be more fine-grained. For example, some people want NOHZ_FULL
without the full scheduler isolation, while others want full scheduler
isolation without NOHZ_FULL.

So let's cut all these isolation features piecewise, at the risk of
overcutting it right now. We can still merge some flags later if they
always make sense together.
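
One way to picture per-feature flags is a bitmask that each subsystem queries
for its own feature. The Python sketch below is illustrative only: the class
names, the flag names and the rule that a feature which is not isolated
treats every CPU as a housekeeper are assumptions made for the example.

```python
from enum import IntFlag


class HK(IntFlag):
    """Illustrative isolation features, one bit each."""
    TIMER = 1
    RCU = 2
    MISC = 4
    SCHED = 8


class Housekeeping:
    def __init__(self, cpus, flags):
        self.cpus = set(cpus)   # CPUs that do housekeeping work
        self.flags = flags      # features for which isolation is enabled

    def test_cpu(self, cpu, feature):
        """May this CPU do housekeeping work for the given feature?"""
        if self.flags & feature:
            return cpu in self.cpus
        # Feature not isolated: every CPU counts as a housekeeper.
        return True


# Timers and RCU are kept away from CPUs 2-3, scheduler work is not.
hk = Housekeeping({0, 1}, HK.TIMER | HK.RCU)
timer_on_cpu3 = hk.test_cpu(3, HK.TIMER)
sched_on_cpu3 = hk.test_cpu(3, HK.SCHED)
```

Splitting the features this way lets one setup isolate timers without
touching the scheduler, and another do the opposite.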

Signed-off-by: Frederic Weisbecker
Acked-by: Thomas Gleixner
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-10-27 15:55:29 +0800
e179f5a04 sched/isolation: Use its own static key

Housekeeping code still depends on the nohz_full static key. Since we want
to decouple housekeeping from NOHZ, let's create a housekeeping specific
static key.

It's mostly relevant for calls to is_housekeeping_cpu() from the scheduler.

Signed-off-by: Frederic Weisbecker
Acked-by: Thomas Gleixner
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1509072159-31808-6-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-10-27 15:55:27 +0800
7e56a1cf4 sched/isolation: Make the housekeeping cpumask private

Nobody needs to access this detail. housekeeping_cpumask() already
takes care of it.

Signed-off-by: Frederic Weisbecker
Acked-by: Thomas Gleixner
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1509072159-31808-5-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-10-27 15:55:26 +0800
786340614 sched/isolation: Move housekeeping related code to its own file

The housekeeping code is currently tied to the NOHZ code. As we are
planning to make housekeeping independent from it, start with moving
the relevant code to its own file.

Signed-off-by: Frederic Weisbecker
Acked-by: Thomas Gleixner
Acked-by: Paul E. McKenney
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Linus Torvalds
Cc: Luiz Capitulino
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-10-27 15:55:24 +0800