05 Sep, 2019

1 commit

  • Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:

    $ chrt -p $$
    chrt: failed to get pid 26306's policy: Argument list too long

    and he root-caused the bug to the following commit, which increased the
    sched_attr size and made sched_read_attr() return -EFBIG:

    a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")

    The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
    logic of checking non-zero bits in new ABI components is arguably broken,
    and pretty much any extension of the ABI will spuriously break the ABI.
    That's way too fragile.

    Instead, implement the perf syscall's extensible ABI approach, which we
    already use on the sched_setattr() side:

    - if user-attributes have the same size as kernel attributes then the
    logic is unchanged.

    - if user-attributes are larger than the kernel knows about then simply
    skip the extra bits, but set attr->size to the (smaller) kernel size
    so that tooling can (in principle) handle older kernels as well.

    - if user-attributes are smaller than the kernel knows about then just
    copy whatever user-space can accept.

    Also clean up the whole logic:

    - Simplify the code flow - there's no need for 'ret' for example.

    - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
    always know which side we are dealing with.

    - Why is it called 'read' when what it does is to copy to user? This
    code is so far away from VFS read() semantics that the naming is
    actively confusing. Name it sched_attr_copy_to_user() instead, which
    mirrors other copy_to_user() functionality.

    - Move the attr->size assignment from the head of sched_getattr() to the
    sched_attr_copy_to_user() function. Nothing else within the kernel
    should care about the size of the structure.

    With these fixes the sched_getattr() syscall now nicely supports an
    extensible ABI in both a forward and backward compatible fashion, and
    will also fix the chrt bug.

    As an added bonus the bogus -EFBIG return is removed as well, which as
    Thadeu noted should have been -E2BIG to begin with.
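
    A minimal sketch of the copy-to-user side described above; the function
    name comes from the text, the rest is illustrative and not the exact
    upstream diff:

      static int
      sched_attr_copy_to_user(struct sched_attr __user *uattr,
                              struct sched_attr *kattr,
                              unsigned int usize)
      {
              unsigned int ksize = sizeof(*kattr);

              if (!access_ok(uattr, usize))
                      return -EFAULT;

              /*
               * usize == ksize: copy everything, nothing changes.
               * usize <  ksize: older user-space, copy only what it can hold.
               * usize >  ksize: newer user-space, the extra bits are ignored.
               * attr->size reports back how much was actually copied, so
               * tooling can detect which ABI version the kernel implements.
               */
              kattr->size = min(usize, ksize);

              if (copy_to_user(uattr, kattr, kattr->size))
                      return -EFAULT;

              return 0;
      }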

    Reported-by: Thadeu Lima de Souza Cascardo
    Tested-by: Dietmar Eggemann
    Tested-by: Thadeu Lima de Souza Cascardo
    Acked-by: Thadeu Lima de Souza Cascardo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Patrick Bellasi
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
    Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Sep, 2019

1 commit

  • do_sched_cfs_period_timer() refills cfs_b runtime and calls
    distribute_cfs_runtime() to unthrottle cfs_rqs. Sometimes all of
    cfs_b->runtime gets incorrectly allocated to a single cfs_rq, and then
    the other cfs_rqs attached to this cfs_b cannot get runtime and stay
    throttled.

    We found that a throttled cfs_rq can have a non-negative
    cfs_rq->runtime_remaining, which causes an unexpected cast from s64 to
    u64 in this snippet:

    distribute_cfs_runtime() {
            runtime = -cfs_rq->runtime_remaining + 1;
    }

    The runtime here then becomes a huge number and consumes all of
    cfs_b->runtime for this period.
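
    A small user-space illustration of the cast problem (not kernel code),
    assuming a positive remaining runtime of 100:

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
              /* A throttled cfs_rq that still has runtime left. */
              int64_t runtime_remaining = 100;

              /* Mirrors: runtime = -cfs_rq->runtime_remaining + 1;
               * -99 converted to an unsigned 64-bit value becomes huge. */
              uint64_t runtime = -runtime_remaining + 1;

              printf("%llu\n", (unsigned long long)runtime);
              /* prints 18446744073709551517 */
              return 0;
      }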

    According to Ben Segall, the throttled cfs_rq can have
    account_cfs_rq_runtime called on it because it is throttled before
    idle_balance, and the idle_balance calls update_rq_clock to add time
    that is accounted to the task.

    This commit prevents a throttled cfs_rq from being assigned new runtime
    until distribute_cfs_runtime() is called for it.

    Signed-off-by: Liangyan
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Valentin Schneider
    Reviewed-by: Ben Segall
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: shanpeic@linux.alibaba.com
    Cc: stable@vger.kernel.org
    Cc: xlpang@linux.alibaba.com
    Fixes: d3d9dc330236 ("sched: Throttle entities exceeding their allowed bandwidth")
    Link: https://lkml.kernel.org/r/20190826121633.6538-1-liangyan.peng@linux.alibaba.com
    Signed-off-by: Ingo Molnar

    Liangyan
     

26 Aug, 2019

1 commit


25 Aug, 2019

1 commit

  • Only the first call to the poll syscall receives POLLPRI correctly;
    after that, the user always fails to acquire the event signal.

    Steps to reproduce:
    1. Get the monitor code in Documentation/accounting/psi.txt
    2. Run it, and wait for the event triggered.
    3. Kill and restart the process.

    The question is why we can end up with poll_scheduled = 1 but the work
    not running (which would reset it to 0). And the answer is because the
    scheduling side sees group->poll_kworker under RCU protection and then
    schedules it, but here we cancel the work and destroy the worker. The
    cancel needs to pair with resetting the poll_scheduled flag.
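
    A sketch of that pairing, assuming the psi_group fields used by the
    trigger code (poll_scheduled, poll_work) and the kthread delayed-work
    API:

      /* In the trigger teardown path, when tearing down the worker: */

      /*
       * A scheduler-side reader may have seen the kworker under RCU, set
       * poll_scheduled and queued the work just before we cancel it. That
       * work will never run and clear the flag, so reset it here; otherwise
       * no future trigger can ever be scheduled again.
       */
      atomic_set(&group->poll_scheduled, 0);
      kthread_cancel_delayed_work_sync(&group->poll_work);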

    Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com
    Signed-off-by: Jason Xing
    Signed-off-by: Joseph Qi
    Reviewed-by: Caspar Zhang
    Reviewed-by: Suren Baghdasaryan
    Acked-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Xing
     

19 Aug, 2019

1 commit

  • If a task is PI-blocked (blocking on a sleeping spinlock) then we don't
    want to schedule a new kworker when we schedule out due to lock
    contention, because !RT does not do that either: a spinning spinlock
    disables preemption and a worker does not schedule out on lock
    contention (it spins instead).

    On RT the RW-semaphore implementation uses an rtmutex, so
    tsk_is_pi_blocked() will return true if a task blocks on it. In that
    case we do not start a new worker, which may deadlock if one worker is
    waiting on progress from another worker. Since an RW-semaphore starts a
    new worker on !RT, we should do the same on RT.

    XFS is able to trigger this deadlock.

    Allow scheduling a new worker even if the current worker is PI-blocked.
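
    A sketch of the resulting sched_submit_work() flow (not the exact
    upstream diff; the ordering is illustrative):

      static inline void sched_submit_work(struct task_struct *tsk)
      {
              if (!tsk->state)
                      return;

              /*
               * Tell the workqueue core that this worker is going to sleep
               * even when it is PI-blocked on an rtmutex-based lock, so a new
               * worker can be started and other work items can make progress.
               */
              if (tsk->flags & PF_WQ_WORKER)
                      wq_worker_sleeping(tsk);

              /*
               * Flushing the block plug can sleep; skip it while PI-blocked,
               * mirroring the !RT behaviour where spinlock waiters never get
               * here at all.
               */
              if (tsk_is_pi_blocked(tsk))
                      return;

              if (blk_needs_flush_plug(tsk))
                      blk_schedule_flush_plug(tsk);
      }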

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20190816160626.12742-1-bigeasy@linutronix.de
    Signed-off-by: Ingo Molnar

    Sebastian Andrzej Siewior
     

16 Aug, 2019

1 commit


10 Aug, 2019

1 commit

  • To avoid reducing the frequency of a CPU prematurely, we skip reducing
    the frequency if the CPU had been busy recently.

    This should not be done when the limits of the policy are changed, for
    example due to thermal throttling. We should always get the frequency
    within the new limits as soon as possible.

    Fixing this with only one flag, i.e. need_freq_update, can lead to a
    race condition where the flag gets cleared without forcing us to change
    the frequency at least once. So this patch introduces a second flag to
    avoid that race condition.
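
    A sketch of the two-flag scheme described above; the name limits_changed
    for the second flag is an assumption for illustration, alongside the
    existing need_freq_update:

      /* Called when the policy limits change (e.g. thermal throttling). */
      static void sugov_limits(struct cpufreq_policy *policy)
      {
              struct sugov_policy *sg_policy = policy->governor_data;

              /* ... existing fast/slow switch handling ... */
              sg_policy->limits_changed = true;
      }

      static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
      {
              s64 delta_ns;

              if (unlikely(sg_policy->limits_changed)) {
                      sg_policy->limits_changed = false;
                      /* Force a frequency change, bypassing the "CPU was busy" skip. */
                      sg_policy->need_freq_update = true;
                      return true;
              }

              /* Regular rate-limit check otherwise. */
              delta_ns = time - sg_policy->last_freq_update_time;
              return delta_ns >= sg_policy->freq_update_delay_ns;
      }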

    Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
    Cc: v4.18+ # v4.18+
    Reported-by: Doug Smythies
    Tested-by: Doug Smythies
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

06 Aug, 2019

3 commits

  • When a process creates a new trigger by writing into /proc/pressure/*
    files, permission to write such a file should be what determines whether
    the process is allowed to do so. The current implementation also
    requires such a process to have the setsched capability. Setting the psi
    trigger thread's scheduling policy is an implementation detail and
    should not be exposed at the user level. Remove the permission check by
    using the _nocheck version of the function.
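
    A sketch of the change: the trigger-creation path sets the kworker's
    policy via the in-kernel _nocheck variant, so the capability check
    against the writing process is skipped (the priority value shown is only
    illustrative):

      struct sched_param param = { .sched_priority = 1 };

      /* was: sched_setscheduler(kworker->task, SCHED_FIFO, &param); */
      sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, &param);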

    Suggested-by: Nick Kralevich
    Signed-off-by: Suren Baghdasaryan
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: lizefan@huawei.com
    Cc: mingo@redhat.com
    Cc: akpm@linux-foundation.org
    Cc: kernel-team@android.com
    Cc: dennisszhou@gmail.com
    Cc: dennis@kernel.org
    Cc: hannes@cmpxchg.org
    Cc: axboe@kernel.dk
    Link: https://lkml.kernel.org/r/20190730013310.162367-1-surenb@google.com

    Suren Baghdasaryan
     
  • PSI defaults to a FIFO-99 thread; reduce this to FIFO-1.

    FIFO-99 is the very highest priority available to SCHED_FIFO and is not
    a suitable default; it would indicate the psi work is the most important
    work on the machine.

    Since Real-Time tasks will have pre-allocated memory and locked it in
    place, Real-Time tasks do not care about PSI. All PSI needs is to be
    above OTHER.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Johannes Weiner
    Tested-by: Suren Baghdasaryan
    Cc: Thomas Gleixner

    Peter Zijlstra
     
  • {push,pull}_dl_task() always calls {de,}activate_task() with .flags=0
    which sets p->on_rq=TASK_ON_RQ_MIGRATING.

    {push,pull}_dl_task()->{de,}activate_task()->{de,en}queue_task()->
    {de,en}queue_task_dl() calls {sub,add}_{running,rq}_bw() since
    p->on_rq==TASK_ON_RQ_MIGRATING.
    So {sub,add}_{running,rq}_bw() in {push,pull}_dl_task() is
    double-accounting for that task.

    Fix it by removing the rq/running bw accounting in {push,pull}_dl_task().

    Fixes: 7dd778841164 ("sched/core: Unify p->on_rq updates")
    Signed-off-by: Dietmar Eggemann
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Valentin Schneider
    Cc: Ingo Molnar
    Cc: Luca Abeni
    Cc: Daniel Bristot de Oliveira
    Cc: Juri Lelli
    Cc: Qais Yousef
    Link: https://lkml.kernel.org/r/20190802145945.18702-2-dietmar.eggemann@arm.com

    Dietmar Eggemann
     

25 Jul, 2019

2 commits

  • The old code used RCU annotations and accessors inconsistently for
    ->numa_group, which can lead to use-after-frees and NULL dereferences.

    Let all accesses to ->numa_group use proper RCU helpers to prevent such
    issues.
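
    A reader-side sketch of the pattern this enforces (the accessor and
    field use are for illustration only):

      struct numa_group *ng;
      int nr_tasks = 0;

      rcu_read_lock();
      /* Never dereference p->numa_group directly; always go through RCU. */
      ng = rcu_dereference(p->numa_group);
      if (ng)
              nr_tasks = ng->nr_tasks;
      rcu_read_unlock();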

    Signed-off-by: Jann Horn
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Petr Mladek
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
    Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
    Signed-off-by: Ingo Molnar

    Jann Horn
     
  • When going through execve(), zero out the NUMA fault statistics instead of
    freeing them.

    During execve, the task is reachable through procfs and the scheduler. A
    concurrent /proc/*/sched reader can read data from a freed ->numa_faults
    allocation (confirmed by KASAN) and write it back to userspace.
    I believe that it would also be possible for a use-after-free read to occur
    through a race between a NUMA fault and execve(): task_numa_fault() can
    lead to task_numa_compare(), which invokes task_weight() on the currently
    running task of a different CPU.

    Another way to fix this would be to make ->numa_faults RCU-managed or add
    extra locking, but it seems easier to wipe the NUMA fault statistics on
    execve.
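
    A sketch of the resulting task_numa_free() behaviour, assuming the
    'final' flag described by the fix (only the last, unreachable reference
    really frees the statistics):

      void task_numa_free(struct task_struct *p, bool final)
      {
              unsigned long *numa_faults = p->numa_faults;
              int i;

              if (!numa_faults)
                      return;

              /* ... detach from the numa_group as before ... */

              if (final) {
                      /* Called from the final put_task_struct(): really free. */
                      p->numa_faults = NULL;
                      kfree(numa_faults);
              } else {
                      /*
                       * Called from execve(): the task is still reachable via
                       * procfs and the scheduler, so only wipe the statistics.
                       */
                      p->total_numa_faults = 0;
                      for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
                              numa_faults[i] = 0;
              }
      }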

    Signed-off-by: Jann Horn
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Petr Mladek
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
    Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
    Signed-off-by: Ingo Molnar

    Jann Horn
     

21 Jul, 2019

1 commit

  • Pull more KVM updates from Paolo Bonzini:
    "Mostly bugfixes, but also:

    - s390 support for KVM selftests

    - LAPIC timer offloading to housekeeping CPUs

    - Extend an s390 optimization for overcommitted hosts to all
    architectures

    - Debugging cleanups and improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
    KVM: x86: Add fixed counters to PMU filter
    KVM: nVMX: do not use dangling shadow VMCS after guest reset
    KVM: VMX: dump VMCS on failed entry
    KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
    KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
    KVM: Boost vCPUs that are delivering interrupts
    KVM: selftests: Remove superfluous define from vmx.c
    KVM: SVM: Fix detection of AMD Errata 1096
    KVM: LAPIC: Inject timer interrupt via posted interrupt
    KVM: LAPIC: Make lapic timer unpinned
    KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
    KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
    kvm: x86: ioapic and apic debug macros cleanup
    kvm: x86: some tsc debug cleanup
    kvm: vmx: fix coccinelle warnings
    x86: kvm: avoid constant-conversion warning
    x86: kvm: avoid -Wsometimes-uninitized warning
    KVM: x86: expose AVX512_BF16 feature to guest
    KVM: selftests: enable pgste option for the linker on s390
    KVM: selftests: Move kvm_create_max_vcpus test to generic code
    ...

    Linus Torvalds
     

20 Jul, 2019

1 commit

  • Dedicated instances are currently disturbed by unnecessary jitter due
    to the emulated LAPIC timers firing on the same pCPUs where the
    vCPUs reside. Unlike ARM, Intel has no hardware virtual timer for the
    guest, so both programming the timer in the guest and the firing of the
    emulated timer incur vmexits. This patch tries to avoid the vmexit when
    the emulated timer fires, at least in the dedicated-instance scenario
    when nohz_full is enabled.

    In that case, the emulated timers can be offloaded to the nearest busy
    housekeeping CPUs, since APICv has been available in server processors
    for several years. The guest timer interrupt is then injected via posted
    interrupts, which are delivered by the housekeeping CPU once the
    emulated timer fires.

    The host should be tuned so that vCPUs are placed on isolated physical
    processors, with several surplus pCPUs for busy housekeeping.
    With mwait/hlt/pause vmexits disabled so the vCPUs stay in non-root
    mode, a ~3% redis performance benefit can be observed on a Skylake
    server, and the number of external-interrupt vmexits drops
    substantially. Without the patch:

    VM-EXIT              Samples  Samples%  Time%  Min Time  Max Time  Avg time
    EXTERNAL_INTERRUPT     42916    49.43%  39.30%   0.47us  106.09us  0.71us ( +- 1.09% )

    While with the patch:

    VM-EXIT              Samples  Samples%  Time%  Min Time  Max Time  Avg time
    EXTERNAL_INTERRUPT      6871     9.29%   2.96%   0.44us   57.88us  0.72us ( +- 4.02% )

    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Marcelo Tosatti
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini

    Wanpeng Li
     

13 Jul, 2019

1 commit

  • John reported a DEBUG_PREEMPT warning caused by commit:

    aacedf26fb76 ("sched/core: Optimize try_to_wake_up() for local wakeups")

    I overlooked that ttwu_stat() requires preemption disabled.

    Reported-by: John Stultz
    Tested-by: John Stultz
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: aacedf26fb76 ("sched/core: Optimize try_to_wake_up() for local wakeups")
    Link: https://lkml.kernel.org/r/20190710105736.GK3402@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Jul, 2019

1 commit

  • Pull Documentation updates from Jonathan Corbet:
    "It's been a relatively busy cycle for docs:

    - A fair pile of RST conversions, many from Mauro. These create more
    than the usual number of simple but annoying merge conflicts with
    other trees, unfortunately. He has a lot more of these waiting on
    the wings that, I think, will go to you directly later on.

    - A new document on how to use merges and rebases in kernel repos,
    and one on Spectre vulnerabilities.

    - Various improvements to the build system, including automatic
    markup of function() references because some people, for reasons I
    will never understand, were of the opinion that
    :c:func:``function()`` is unattractive and not fun to type.

    - We now recommend using sphinx 1.7, but still support back to 1.4.

    - Lots of smaller improvements, warning fixes, typo fixes, etc"

    * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
    docs: automarkup.py: ignore exceptions when seeking for xrefs
    docs: Move binderfs to admin-guide
    Disable Sphinx SmartyPants in HTML output
    doc: RCU callback locks need only _bh, not necessarily _irq
    docs: format kernel-parameters -- as code
    Doc : doc-guide : Fix a typo
    platform: x86: get rid of a non-existent document
    Add the RCU docs to the core-api manual
    Documentation: RCU: Add TOC tree hooks
    Documentation: RCU: Rename txt files to rst
    Documentation: RCU: Convert RCU UP systems to reST
    Documentation: RCU: Convert RCU linked list to reST
    Documentation: RCU: Convert RCU basic concepts to reST
    docs: filesystems: Remove uneeded .rst extension on toctables
    scripts/sphinx-pre-install: fix out-of-tree build
    docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
    Documentation: PGP: update for newer HW devices
    Documentation: Add section about CPU vulnerabilities for Spectre
    Documentation: platform: Delete x86-laptop-drivers.txt
    docs: Note that :c:func: should no longer be used
    ...

    Linus Torvalds
     

09 Jul, 2019

2 commits

  • Pull scheduler updates from Ingo Molnar:

    - Remove the unused per rq load array and all its infrastructure, by
    Dietmar Eggemann.

    - Add utilization clamping support by Patrick Bellasi. This is a
    refinement of the energy aware scheduling framework with support for
    boosting of interactive and capping of background workloads: to make
    sure critical GUI threads get maximum frequency ASAP, and to make
    sure background processing doesn't unnecessarily push the cpufreq
    governor to higher frequencies and less energy-efficient CPU modes.

    - Add the bare minimum of tracepoints required for LISA EAS regression
    testing, by Qais Yousef - which allows automated testing of various
    power management features, including energy aware scheduling.

    - Restructure the former tsk_nr_cpus_allowed() facility that the -rt
    kernel used to modify the scheduler's CPU affinity logic such as
    migrate_disable() - introduce the task->cpus_ptr value instead of
    taking the address of &task->cpus_allowed directly - by Sebastian
    Andrzej Siewior.

    - Misc optimizations, fixes, cleanups and small enhancements - see the
    Git log for details.

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
    sched/uclamp: Add uclamp support to energy_compute()
    sched/uclamp: Add uclamp_util_with()
    sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
    sched/uclamp: Set default clamps for RT tasks
    sched/uclamp: Reset uclamp values on RESET_ON_FORK
    sched/uclamp: Extend sched_setattr() to support utilization clamping
    sched/core: Allow sched_setattr() to use the current policy
    sched/uclamp: Add system default clamps
    sched/uclamp: Enforce last task's UCLAMP_MAX
    sched/uclamp: Add bucket local max tracking
    sched/uclamp: Add CPU's clamp buckets refcounting
    sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
    sched/debug: Export the newly added tracepoints
    sched/debug: Add sched_overutilized tracepoint
    sched/debug: Add new tracepoint to track PELT at se level
    sched/debug: Add new tracepoints to track PELT at rq level
    sched/debug: Add a new sched_trace_*() helper functions
    sched/autogroup: Make autogroup_path() always available
    sched/wait: Deduplicate code with do-while
    sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle are:

    - rwsem scalability improvements, phase #2, by Waiman Long, which are
    rather impressive:

    "On a 2-socket 40-core 80-thread Skylake system with 40 reader
    and writer locking threads, the min/mean/max locking operations
    done in a 5-second testing window before the patchset were:

    40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
    40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

    After the patchset, they became:

    40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
    40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

    There's a lot of changes to the locking implementation that makes
    it similar to qrwlock, including owner handoff for more fair
    locking.

    Another microbenchmark shows how the improvements hold across the
    spectrum:

    "With a locking microbenchmark running on 5.1 based kernel, the
    total locking rates (in kops/s) on a 2-socket Skylake system
    with equal numbers of readers and writers (mixed) before and
    after this patchset were:

    # of Threads   Before Patch   After Patch
    ------------   ------------   -----------
               2          2,618         4,193
               4          1,202         3,726
               8            802         3,622
              16            729         3,359
              32            319         2,826
              64            102         2,744

    The changes are extensive and the patch-set has been through
    several iterations addressing various locking workloads. There
    might be more regressions, but unless they are pathological I
    believe we want to use this new implementation as the baseline
    going forward.

    - jump-label optimizations by Daniel Bristot de Oliveira: the primary
    motivation was to remove IPI disturbance of isolated RT-workload
    CPUs, which resulted in the implementation of batched jump-label
    updates. Beyond the improvement of the kernel's real-time
    characteristics, in one test this patchset improved static key update
    overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
    as well.

    - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
    ~10 years of atomic64_t existence the various types used by the
    APIs only had to be self-consistent within each architecture -
    which means they became wildly inconsistent across architectures.
    Mark puts an end to this by reworking all the atomic64
    implementations to use 's64' as the base type for atomic64_t, and
    to ensure that this type is consistently used for parameters and
    return values in the API, avoiding further problems in this area.

    - A large set of small improvements to lockdep by Yuyang Du: type
    cleanups, output cleanups, function return type and other cleanups
    all around the place.

    - A set of percpu ops cleanups and fixes by Peter Zijlstra.

    - Misc other changes - please see the Git log for more details"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
    locking/lockdep: increase size of counters for lockdep statistics
    locking/atomics: Use sed(1) instead of non-standard head(1) option
    locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
    x86/jump_label: Make tp_vec_nr static
    x86/percpu: Optimize raw_cpu_xchg()
    x86/percpu, sched/fair: Avoid local_clock()
    x86/percpu, x86/irq: Relax {set,get}_irq_regs()
    x86/percpu: Relax smp_processor_id()
    x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
    locking/rwsem: Guard against making count negative
    locking/rwsem: Adaptive disabling of reader optimistic spinning
    locking/rwsem: Enable time-based spinning on reader-owned rwsem
    locking/rwsem: Make rwsem->owner an atomic_long_t
    locking/rwsem: Enable readers spinning on writer
    locking/rwsem: Clarify usage of owner's nonspinaable bit
    locking/rwsem: Wake up almost all readers in wait queue
    locking/rwsem: More optimal RT task handling of null owner
    locking/rwsem: Always release wait_lock before waking up tasks
    locking/rwsem: Implement lock handoff to prevent lock starvation
    locking/rwsem: Make rwsem_spin_on_owner() return owner state
    ...

    Linus Torvalds
     

25 Jun, 2019

21 commits

  • The Energy Aware Scheduler (EAS) estimates the energy impact of waking
    up a task on a given CPU. This estimation is based on:

    a) an (active) power consumption defined for each CPU frequency
    b) an estimation of which frequency will be used on each CPU
    c) an estimation of the busy time (utilization) of each CPU

    Utilization clamping can affect both b) and c).

    A CPU is expected to run:

    - at a higher-than-required frequency, but for a shorter time, in case
    its estimated utilization is smaller than the minimum utilization
    enforced by uclamp
    - at a lower-than-required frequency, but for a longer time, in case
    its estimated utilization is bigger than the maximum utilization
    enforced by uclamp

    While compute_energy() already accounts for clamping effects on busy
    time, the clamping effects on frequency selection are currently ignored.

    Fix it by considering how CPU clamp values will be affected by a
    task waking up and being RUNNABLE on that CPU.

    Do that by refactoring schedutil_freq_util() to take an additional
    task_struct* which allows EAS to evaluate the impact on clamp values of
    a task being eventually queued in a CPU. Clamp values are applied to the
    RT+CFS utilization only when a FREQUENCY_UTIL is required by
    compute_energy().

    Do note that switching from ENERGY_UTIL to FREQUENCY_UTIL in the
    computation of the cpu_util signal implies that we are more likely to
    estimate the highest OPP when a RT task is running in another CPU of
    the same performance domain. This can have an impact on energy
    estimation but:

    - it's not easy to say which approach is better, since it depends on
    the use case
    - the original approach could still be obtained by setting a smaller
    task-specific util_min whenever required

    While we are at it:

    - rename schedutil_freq_util() to schedutil_cpu_util(),
    since it is no longer used only for frequency selection.
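
    The resulting helper signature, as described above (parameter order is
    illustrative; the extra task_struct pointer is what lets EAS evaluate
    the clamps of a task being placed on that CPU):

      unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
                                       unsigned long max, enum schedutil_type type,
                                       struct task_struct *p);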

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-12-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • So far uclamp_util() allows clamping a specified utilization considering
    the clamp values requested by the RUNNABLE tasks on a CPU. For the
    Energy Aware Scheduler (EAS) it is interesting to test how clamp values
    will change when a task becomes RUNNABLE on a given CPU.
    For example, EAS is interested in comparing the energy impact of
    different scheduling decisions and the clamp values can play a role on
    that.

    Add uclamp_util_with(), which clamps a given utilization while
    considering the possible impact of a specified task on the CPU's clamp
    values.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-11-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Each time a frequency update is required via schedutil, a frequency is
    selected to (possibly) satisfy the utilization reported by each
    scheduling class and irqs. However, when utilization clamping is in use,
    the frequency selection should consider userspace utilization clamping
    hints. This will allow, for example, to:

    - boost tasks which are directly affecting the user experience
    by running them at least at a minimum "requested" frequency

    - cap low priority tasks not directly affecting the user experience
    by running them only up to a maximum "allowed" frequency

    These constraints are meant to support per-task tuning of the frequency
    selection, thus supporting a fine-grained definition of
    performance-boosting vs. energy-saving strategies in kernel space.

    Add support to clamp the utilization of RUNNABLE FAIR and RT tasks
    within the boundaries defined by their aggregated utilization clamp
    constraints.

    Do that by considering the max(min_util, max_util) to give boosted tasks
    the performance they need even when they happen to be co-scheduled with
    other capped tasks.
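
    A minimal standalone sketch of that aggregation rule (illustrative
    names, not the kernel helper itself):

      static unsigned int clamp_util(unsigned int util,
                                     unsigned int min_util,
                                     unsigned int max_util)
      {
              /*
               * When a boosted task is co-scheduled with capped tasks, the
               * aggregated min can exceed the aggregated max.  In that case
               * the boost wins, i.e. the result is max(min_util, max_util).
               */
              if (min_util >= max_util)
                      return min_util;

              if (util < min_util)
                      return min_util;
              if (util > max_util)
                      return max_util;
              return util;
      }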

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-10-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • By default FAIR tasks start without clamps, i.e. neither boosted nor
    capped, and they run at the best frequency matching their utilization
    demand. This default behavior does not fit RT tasks which instead are
    expected to run at the maximum available frequency, if not otherwise
    required by explicitly capping them.

    Enforce the correct behavior for RT tasks by setting util_min to max
    whenever:

    1. the task is switched to the RT class and it does not already have a
    user-defined clamp value assigned.

    2. an RT task is forked from a parent with RESET_ON_FORK set.

    NOTE: utilization clamp values are cross scheduling class attributes and
    thus they are never changed/reset once a value has been explicitly
    defined from user-space.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-9-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • A forked task gets the same clamp values as its parent; however, when
    the RESET_ON_FORK flag is set on the parent, e.g. via:

    sys_sched_setattr()
    sched_setattr()
    __sched_setscheduler(attr::SCHED_FLAG_RESET_ON_FORK)

    the newly forked task is expected to start with all attributes reset to
    their default values.

    Do that for utilization clamp values too by checking the reset request
    from the existing uclamp_fork() call which already provides the required
    initialization for other uclamp related bits.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-8-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • The SCHED_DEADLINE scheduling class provides an advanced and formal
    model to define tasks requirements that can translate into proper
    decisions for both task placements and frequencies selections. Other
    classes have a more simplified model based on the POSIX concept of
    priorities.

    Such a simple priority-based model, however, does not allow exploiting
    the most advanced features of the Linux scheduler, such as driving
    frequency selection via the schedutil cpufreq governor. Still, even for
    non-SCHED_DEADLINE tasks it is interesting to define task properties
    that can support scheduler decisions.

    Utilization clamping exposes to user-space a new set of per-task
    attributes the scheduler can use as hints about the expected/required
    utilization for a task. This allows implementing a "proactive" per-task
    frequency control policy, a more advanced policy than the current one
    based just on the "passive" measured task utilization. For example, it's
    possible to boost interactive tasks (e.g. to get better performance) or
    cap background tasks (e.g. to be more energy/thermal efficient).

    Introduce a new API to set utilization clamping values for a specified
    task by extending sched_setattr(), a syscall which already allows
    defining task-specific properties for different scheduling classes. A
    new pair of attributes allows specifying a minimum and maximum
    utilization the scheduler can consider for a task.

    Do that by validating the requested clamp values first, and then
    applying the required changes using the same pattern already in use for
    __setscheduler(). This ensures that the task is re-enqueued with the new
    clamp values.
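
    A user-space sketch of the extended syscall; the struct layout and flag
    values follow the uapi headers of this series and should be treated as
    assumptions if your headers differ:

      #define _GNU_SOURCE
      #include <stdint.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/syscall.h>

      struct sched_attr {
              uint32_t size;
              uint32_t sched_policy;
              uint64_t sched_flags;
              int32_t  sched_nice;
              uint32_t sched_priority;
              uint64_t sched_runtime;
              uint64_t sched_deadline;
              uint64_t sched_period;
              uint32_t sched_util_min;   /* new: requested minimum utilization */
              uint32_t sched_util_max;   /* new: requested maximum utilization */
      };

      #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
      #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40

      int main(void)
      {
              struct sched_attr attr;

              memset(&attr, 0, sizeof(attr));
              attr.size = sizeof(attr);
              attr.sched_policy = 0;                        /* SCHED_OTHER */
              attr.sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN;
              attr.sched_util_min = 256;                    /* ~25% of 1024 */

              /* Apply to the calling task (pid 0), no extra syscall flags. */
              return syscall(SYS_sched_setattr, 0, &attr, 0);
      }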

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-7-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • The sched_setattr() syscall mandates that a policy is always specified.
    This requires always knowing which policy a task will have when
    attributes are configured, and it makes it impossible to add more
    generic task attributes valid across different scheduling policies.
    Reading the policy before setting generic task attributes is racy, since
    we cannot be sure it is not changed concurrently.

    Introduce the required support to change generic task attributes without
    affecting the current task policy. This is done by adding an attribute flag
    (SCHED_FLAG_KEEP_POLICY) to enforce the usage of the current policy.

    Add support for the SETPARAM_POLICY policy, which is already used by the
    sched_setparam() POSIX syscall, to the sched_setattr() non-POSIX
    syscall.
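
    A sketch of the flag handling on the syscall side, roughly as described
    above (SETPARAM_POLICY is the same -1 sentinel sched_setparam() already
    uses internally):

      /* In sys_sched_setattr(), after copying the attributes from user-space: */
      if ((int)attr.sched_policy < 0)
              return -EINVAL;

      if (attr.sched_flags & SCHED_FLAG_KEEP_POLICY)
              attr.sched_policy = SETPARAM_POLICY;    /* keep the current policy */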

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-6-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Tasks without a user-defined clamp value are considered not clamped
    and by default their utilization can have any value in the
    [0..SCHED_CAPACITY_SCALE] range.

    Tasks with a user-defined clamp value are allowed to request any value
    in that range, and the required clamp is unconditionally enforced.
    However, a "System Management Software" could be interested in limiting
    the range of clamp values allowed for all tasks.

    Add a privileged interface to define a system default configuration via:

    /proc/sys/kernel/sched_uclamp_util_{min,max}

    which works as an unconditional clamp range restriction for all tasks.

    With the default configuration, the full SCHED_CAPACITY_SCALE range of
    values is allowed for each clamp index. Otherwise, the task-specific
    clamp is capped by the corresponding system default value.

    Do that by tracking, for each task, the "effective" clamp value and the
    bucket the task has been refcounted in at enqueue time. This allows
    lazily aggregating the "requested" and "system default" values at
    enqueue time and simplifies refcounting updates at dequeue time.

    The cached bucket ids are used to avoid (relatively) more expensive
    integer divisions every time a task is enqueued.

    An active flag is used to report when the "effective" value is valid and
    thus the task is actually refcounted in the corresponding rq's bucket.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-5-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • When a task sleeps it removes its max utilization clamp from its CPU.
    However, the blocked utilization on that CPU can be higher than the max
    clamp value enforced while the task was running. This allows undesired
    CPU frequency increases while a CPU is idle, for example, when another
    CPU on the same frequency domain triggers a frequency update, since
    schedutil can now see the full not clamped blocked utilization of the
    idle CPU.

    Fix this by using:

    uclamp_rq_dec_id(p, rq, UCLAMP_MAX)
    uclamp_rq_max_value(rq, UCLAMP_MAX, clamp_value)

    to detect when a CPU has no more RUNNABLE clamped tasks and to flag this
    condition.

    Don't track any minimum utilization clamps since an idle CPU never
    requires a minimum frequency. The decay of the blocked utilization is
    good enough to reduce the CPU frequency.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-4-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Because of bucketization, different task-specific clamp values are
    tracked in the same bucket. For example, with 20% bucket size and
    assuming to have:

    Task1: util_min=25%
    Task2: util_min=35%

    both tasks will be refcounted in the [20..39]% bucket and always boosted
    only up to 20% thus implementing a simple floor aggregation normally
    used in histograms.

    In systems with only few and well-defined clamp values, it would be
    useful to track the exact clamp value required by a task whenever
    possible. For example, if a system requires only 23% and 47% boost
    values then it's possible to track the exact boost required by each
    task using only 3 buckets of ~33% size each.

    Introduce a mechanism to max aggregate the requested clamp values of
    RUNNABLE tasks in the same bucket. Keep it simple by resetting the
    bucket value to its base value only when a bucket becomes inactive.
    Allow a limited and controlled overboosting margin for tasks refcounted
    in the same bucket.

    In systems where the boost values are not known in advance, it is still
    possible to control the maximum acceptable overboosting margin by tuning
    the number of clamp groups. For example, 20 groups ensure a 5% maximum
    overboost.

    Remove the rq bucket initialization code since a correct bucket value
    is now computed when a task is refcounted into a CPU's rq.
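
    A sketch of the enqueue-side aggregation described above, using the
    bucket fields named in this series (illustrative, not the full
    refcounting path):

      bucket = &rq->uclamp[clamp_id].bucket[uc_se->bucket_id];
      bucket->tasks++;
      uc_se->active = true;

      /*
       * Local max tracking: the bucket always holds the maximum "requested"
       * clamp value of its RUNNABLE tasks; it falls back to the bucket's
       * nominal value only once the bucket becomes inactive again.
       */
      if (bucket->tasks == 1 || uc_se->value > bucket->value)
              bucket->value = uc_se->value;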

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-3-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Utilization clamping allows to clamp the CPU's utilization within a
    [util_min, util_max] range, depending on the set of RUNNABLE tasks on
    that CPU. Each task references two "clamp buckets" defining its minimum
    and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
    bucket is active if there is at least one RUNNABLE task enqueued on
    that CPU and refcounting that bucket.

    When a task is {en,de}queued {on,from} a rq, the set of active clamp
    buckets on that CPU can change. If the set of active clamp buckets
    changes for a CPU a new "aggregated" clamp value is computed for that
    CPU. This is because each clamp bucket enforces a different utilization
    clamp value.

    Clamp values are always MAX aggregated for both util_min and util_max.
    This ensures that no task can affect the performance of other
    co-scheduled tasks which are more boosted (i.e. with higher util_min
    clamp) or less capped (i.e. with higher util_max clamp).

    A task has:
    task_struct::uclamp[clamp_id]::bucket_id
    to track the "bucket index" of the CPU's clamp bucket it refcounts while
    enqueued, for each clamp index (clamp_id).

    A runqueue has:
    rq::uclamp[clamp_id]::bucket[bucket_id].tasks
    to track how many RUNNABLE tasks on that CPU refcount each
    clamp bucket (bucket_id) of a clamp index (clamp_id).
    It also has a:
    rq::uclamp[clamp_id]::bucket[bucket_id].value
    to track the clamp value of each clamp bucket (bucket_id) of a clamp
    index (clamp_id).

    The rq::uclamp::bucket[clamp_id][] array is scanned every time we need
    to find a new MAX aggregated clamp value for a clamp_id. This operation
    is required only when the last task of a clamp bucket tracking the
    current MAX aggregated clamp value is dequeued. In that case, the CPU is
    either entering IDLE or going to schedule a less boosted or more clamped
    task.
    The expected number of different clamp values configured at build time
    is small enough to fit the full unordered array into a single cache
    line, for configurations of up to 7 buckets.

    Add to struct rq the basic data structures required to refcount the
    number of RUNNABLE tasks for each clamp bucket. Add also the max
    aggregation required to update the rq's clamp value at each
    enqueue/dequeue event.

    Use a simple linear mapping of clamp values into clamp buckets.
    Pre-compute and cache bucket_id to avoid integer divisions at
    enqueue/dequeue time.
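
    A standalone sketch of the linear mapping and the cached division; a
    SCHED_CAPACITY_SCALE of 1024 and a 5-bucket build are assumptions for
    illustration:

      #define SCHED_CAPACITY_SCALE 1024
      #define UCLAMP_BUCKETS       5
      #define UCLAMP_BUCKET_DELTA  ((SCHED_CAPACITY_SCALE + UCLAMP_BUCKETS - 1) / UCLAMP_BUCKETS)

      /* Computed once when a clamp value is set, then cached in the task. */
      static unsigned int uclamp_bucket_id(unsigned int clamp_value)
      {
              unsigned int id = clamp_value / UCLAMP_BUCKET_DELTA;

              /* clamp_value == SCHED_CAPACITY_SCALE still maps to the last bucket. */
              return id < UCLAMP_BUCKETS ? id : UCLAMP_BUCKETS - 1;
      }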

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • The term 'weighted' is not needed since there is no 'unweighted' load.
    Instead use the term 'runnable' to distinguish 'runnable' load
    (avg.runnable_load_avg) used in load balance from load (avg.load_avg)
    which is the sum of 'runnable' and 'blocked' load.

    Signed-off-by: Dietmar Eggemann
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Patrick Bellasi
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Valentin Schneider
    Cc: Vincent Guittot
    Link: https://lkml.kernel.org/r/57f27a7f-2775-d832-e965-0f4d51bb1954@arm.com
    Signed-off-by: Ingo Molnar

    Dietmar Eggemann
     
  • So that external modules can hook into them and extract the info they
    need. Since these new tracepoints have no events associated with them,
    exporting them makes them useful for external modules to perform testing
    and debugging. There is otherwise no way to access them.

    BPF doesn't have infrastructure to access these bare tracepoints either.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-7-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The new tracepoint allows us to track the changes in overutilized
    status.

    Overutilized status is associated with EAS. It indicates that the system
    is in a high-performance state. EAS is disabled when the system is in
    this state, since there aren't many energy savings to be had while
    high-performance tasks push the system to its limit and it's better to
    default to the spreading behavior of the scheduler.

    This tracepoint helps understanding and debugging the conditions under
    which this happens.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-6-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The new tracepoint allows tracking PELT signals at the sched_entity
    level, which is supported only for CFS tasks and task groups.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-5-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The new tracepoints allow tracking PELT signals at rq level for all
    scheduling classes + irq.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-4-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The new functions allow modules to access internal data structures of
    unexported struct cfs_rq and struct rq to extract important information
    from the tracepoints to be introduced in later patches.

    While at it, fix the alphabetical order of struct declarations in sched.h.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-3-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • Remove the #ifdef CONFIG_SCHED_DEBUG.

    Some of the tracepoints to be introduced in later patches need to access
    this function. Hence make it always available since the tracepoints are
    not protected by CONFIG_SCHED_DEBUG.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-2-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The statements in the loop's body and before it are identical. Use a
    do-while loop so they are not repeated.
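
    A generic illustration of the transformation (not the exact sched/wait
    code):

      void step(void);
      int done(void);

      /* Before: the same statement appears before the loop and in its body. */
      void before(void)
      {
              step();
              while (!done())
                      step();
      }

      /* After: a do-while runs the statement once per iteration, including the first. */
      void after(void)
      {
              do {
                      step();
              } while (!done());
      }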

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/43ffea6ee2152b90dedf962eac851609e4197218.1560256112.git.asml.silence@gmail.com
    Signed-off-by: Ingo Molnar

    Pavel Begunkov
     
  • The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is
    unused since commit:

    765d0af19f5f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'")

    Remove it.

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Viresh Kumar
    Reviewed-by: Valentin Schneider
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: gregkh@linuxfoundation.org
    Cc: linux@armlinux.org.uk
    Cc: quentin.perret@arm.com
    Cc: rafael@kernel.org
    Link: https://lkml.kernel.org/r/1560783617-5827-1-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner