05 Sep, 2019

1 commit

  • Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:

    $ chrt -p $$
    chrt: failed to get pid 26306's policy: Argument list too long

    and he root-caused the bug to the following commit, which increased the
    sched_attr size and made sched_read_attr() return -EFBIG:

    a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")

    The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
    logic of checking non-zero bits in new ABI components is arguably broken,
    and pretty much any extension of the ABI will spuriously break the ABI.
    That's way too fragile.

    Instead, implement the perf syscall's extensible ABI approach, which we
    already use on the sched_setattr() side:

    - if user-attributes have the same size as kernel attributes then the
    logic is unchanged.

    - if user-attributes are larger than the kernel knows about then simply
    skip the extra bits, but set attr->size to the (smaller) kernel size
    so that tooling can (in principle) handle older kernels as well.

    - if user-attributes are smaller than the kernel knows about then just
    copy whatever user-space can accept.

    Also clean up the whole logic:

    - Simplify the code flow - there's no need for 'ret' for example.

    - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
    always know which side we are dealing with.

    - Why is it called 'read' when what it does is to copy to user? This
    code is so far away from VFS read() semantics that the naming is
    actively confusing. Name it sched_attr_copy_to_user() instead, which
    mirrors other copy_to_user() functionality.

    - Move the attr->size assignment from the head of sched_getattr() to the
    sched_attr_copy_to_user() function. Nothing else within the kernel
    should care about the size of the structure.

    With these fixes the sched_getattr() syscall now nicely supports an
    extensible ABI in both a forward and backward compatible fashion, and
    will also fix the chrt bug.

    As an added bonus the bogus -EFBIG return is removed as well, which as
    Thadeu noted should have been -E2BIG to begin with.
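
    A minimal sketch of the copy-to-user side described above; the function
    name comes from the text, the rest is illustrative and not the exact
    upstream diff:

      static int
      sched_attr_copy_to_user(struct sched_attr __user *uattr,
                              struct sched_attr *kattr,
                              unsigned int usize)
      {
              unsigned int ksize = sizeof(*kattr);

              if (!access_ok(uattr, usize))
                      return -EFAULT;

              /*
               * usize == ksize: copy everything, nothing changes.
               * usize <  ksize: older user-space, copy only what it can hold.
               * usize >  ksize: newer user-space, the extra bits are ignored.
               * attr->size reports back how much was actually copied, so
               * tooling can detect which ABI version the kernel implements.
               */
              kattr->size = min(usize, ksize);

              if (copy_to_user(uattr, kattr, kattr->size))
                      return -EFAULT;

              return 0;
      }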

    Reported-by: Thadeu Lima de Souza Cascardo
    Tested-by: Dietmar Eggemann
    Tested-by: Thadeu Lima de Souza Cascardo
    Acked-by: Thadeu Lima de Souza Cascardo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Patrick Bellasi
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
    Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Sep, 2019

1 commit

  • do_sched_cfs_period_timer() refills cfs_b runtime and calls
    distribute_cfs_runtime() to unthrottle cfs_rqs. Sometimes all of
    cfs_b->runtime gets incorrectly allocated to a single cfs_rq, and then
    the other cfs_rqs attached to this cfs_b cannot get runtime and stay
    throttled.

    We found that a throttled cfs_rq can have a non-negative
    cfs_rq->runtime_remaining, which causes an unexpected cast from s64 to
    u64 in this snippet:

    distribute_cfs_runtime() {
            runtime = -cfs_rq->runtime_remaining + 1;
    }

    The runtime here then becomes a huge number and consumes all of
    cfs_b->runtime for this period.
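
    A small user-space illustration of the cast problem (not kernel code),
    assuming a positive remaining runtime of 100:

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
              /* A throttled cfs_rq that still has runtime left. */
              int64_t runtime_remaining = 100;

              /* Mirrors: runtime = -cfs_rq->runtime_remaining + 1;
               * -99 converted to an unsigned 64-bit value becomes huge. */
              uint64_t runtime = -runtime_remaining + 1;

              printf("%llu\n", (unsigned long long)runtime);
              /* prints 18446744073709551517 */
              return 0;
      }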

    According to Ben Segall, the throttled cfs_rq can have
    account_cfs_rq_runtime called on it because it is throttled before
    idle_balance, and the idle_balance calls update_rq_clock to add time
    that is accounted to the task.

    This commit prevents a throttled cfs_rq from being assigned new runtime
    until distribute_cfs_runtime() is called for it.

    Signed-off-by: Liangyan
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Valentin Schneider
    Reviewed-by: Ben Segall
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: shanpeic@linux.alibaba.com
    Cc: stable@vger.kernel.org
    Cc: xlpang@linux.alibaba.com
    Fixes: d3d9dc330236 ("sched: Throttle entities exceeding their allowed bandwidth")
    Link: https://lkml.kernel.org/r/20190826121633.6538-1-liangyan.peng@linux.alibaba.com
    Signed-off-by: Ingo Molnar

    Liangyan
     

26 Aug, 2019

1 commit


25 Aug, 2019

1 commit

  • Only the first call to the poll syscall receives POLLPRI correctly;
    after that, the user always fails to acquire the event signal.

    Steps to reproduce:
    1. Get the monitor code in Documentation/accounting/psi.txt
    2. Run it, and wait for the event triggered.
    3. Kill and restart the process.

    The question is why we can end up with poll_scheduled = 1 but the work
    not running (which would reset it to 0). And the answer is because the
    scheduling side sees group->poll_kworker under RCU protection and then
    schedules it, but here we cancel the work and destroy the worker. The
    cancel needs to pair with resetting the poll_scheduled flag.
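
    A sketch of that pairing, assuming the psi_group fields used by the
    trigger code (poll_scheduled, poll_work) and the kthread delayed-work
    API:

      /* In the trigger teardown path, when tearing down the worker: */

      /*
       * A scheduler-side reader may have seen the kworker under RCU, set
       * poll_scheduled and queued the work just before we cancel it. That
       * work will never run and clear the flag, so reset it here; otherwise
       * no future trigger can ever be scheduled again.
       */
      atomic_set(&group->poll_scheduled, 0);
      kthread_cancel_delayed_work_sync(&group->poll_work);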

    Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com
    Signed-off-by: Jason Xing
    Signed-off-by: Joseph Qi
    Reviewed-by: Caspar Zhang
    Reviewed-by: Suren Baghdasaryan
    Acked-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Xing
     

19 Aug, 2019

1 commit

  • If a task is PI-blocked (blocking on a sleeping spinlock) then we don't
    want to schedule a new kworker when we schedule out due to lock
    contention, because !RT does not do that either: a spinning spinlock
    disables preemption and a worker does not schedule out on lock
    contention (it spins instead).

    On RT the RW-semaphore implementation uses an rtmutex, so
    tsk_is_pi_blocked() will return true if a task blocks on it. In that
    case we do not start a new worker, which may deadlock if one worker is
    waiting on progress from another worker. Since an RW-semaphore starts a
    new worker on !RT, we should do the same on RT.

    XFS is able to trigger this deadlock.

    Allow scheduling a new worker even if the current worker is PI-blocked.
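
    A sketch of the resulting sched_submit_work() flow (not the exact
    upstream diff; the ordering is illustrative):

      static inline void sched_submit_work(struct task_struct *tsk)
      {
              if (!tsk->state)
                      return;

              /*
               * Tell the workqueue core that this worker is going to sleep
               * even when it is PI-blocked on an rtmutex-based lock, so a new
               * worker can be started and other work items can make progress.
               */
              if (tsk->flags & PF_WQ_WORKER)
                      wq_worker_sleeping(tsk);

              /*
               * Flushing the block plug can sleep; skip it while PI-blocked,
               * mirroring the !RT behaviour where spinlock waiters never get
               * here at all.
               */
              if (tsk_is_pi_blocked(tsk))
                      return;

              if (blk_needs_flush_plug(tsk))
                      blk_schedule_flush_plug(tsk);
      }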

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20190816160626.12742-1-bigeasy@linutronix.de
    Signed-off-by: Ingo Molnar

    Sebastian Andrzej Siewior
     

16 Aug, 2019

1 commit


10 Aug, 2019

1 commit

  • To avoid reducing the frequency of a CPU prematurely, we skip reducing
    the frequency if the CPU had been busy recently.

    This should not be done when the limits of the policy are changed, for
    example due to thermal throttling. We should always get the frequency
    within the new limits as soon as possible.

    Fixing this with only one flag, i.e. need_freq_update, can lead to a
    race condition where the flag gets cleared without forcing us to change
    the frequency at least once. So this patch introduces a second flag to
    avoid that race condition.
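
    A sketch of the two-flag scheme described above; the name limits_changed
    for the second flag is an assumption for illustration, alongside the
    existing need_freq_update:

      /* Called when the policy limits change (e.g. thermal throttling). */
      static void sugov_limits(struct cpufreq_policy *policy)
      {
              struct sugov_policy *sg_policy = policy->governor_data;

              /* ... existing fast/slow switch handling ... */
              sg_policy->limits_changed = true;
      }

      static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
      {
              s64 delta_ns;

              if (unlikely(sg_policy->limits_changed)) {
                      sg_policy->limits_changed = false;
                      /* Force a frequency change, bypassing the "CPU was busy" skip. */
                      sg_policy->need_freq_update = true;
                      return true;
              }

              /* Regular rate-limit check otherwise. */
              delta_ns = time - sg_policy->last_freq_update_time;
              return delta_ns >= sg_policy->freq_update_delay_ns;
      }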

    Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
    Cc: v4.18+ # v4.18+
    Reported-by: Doug Smythies
    Tested-by: Doug Smythies
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

06 Aug, 2019

3 commits

  • When a process creates a new trigger by writing into /proc/pressure/*
    files, permission to write such a file should be what determines whether
    the process is allowed to do so. The current implementation also
    requires such a process to have the setsched capability. Setting the psi
    trigger thread's scheduling policy is an implementation detail and
    should not be exposed at the user level. Remove the permission check by
    using the _nocheck version of the function.
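
    A sketch of the change: the trigger-creation path sets the kworker's
    policy via the in-kernel _nocheck variant, so the capability check
    against the writing process is skipped (the priority value shown is only
    illustrative):

      struct sched_param param = { .sched_priority = 1 };

      /* was: sched_setscheduler(kworker->task, SCHED_FIFO, &param); */
      sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, &param);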

    Suggested-by: Nick Kralevich
    Signed-off-by: Suren Baghdasaryan
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: lizefan@huawei.com
    Cc: mingo@redhat.com
    Cc: akpm@linux-foundation.org
    Cc: kernel-team@android.com
    Cc: dennisszhou@gmail.com
    Cc: dennis@kernel.org
    Cc: hannes@cmpxchg.org
    Cc: axboe@kernel.dk
    Link: https://lkml.kernel.org/r/20190730013310.162367-1-surenb@google.com

    Suren Baghdasaryan
     
  • PSI defaults to a FIFO-99 thread; reduce this to FIFO-1.

    FIFO-99 is the very highest priority available to SCHED_FIFO and is not
    a suitable default; it would indicate the psi work is the most important
    work on the machine.

    Since Real-Time tasks will have pre-allocated memory and locked it in
    place, Real-Time tasks do not care about PSI. All PSI needs is to be
    above OTHER.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Johannes Weiner
    Tested-by: Suren Baghdasaryan
    Cc: Thomas Gleixner

    Peter Zijlstra
     
  • {push,pull}_dl_task() always calls {de,}activate_task() with .flags=0
    which sets p->on_rq=TASK_ON_RQ_MIGRATING.

    {push,pull}_dl_task()->{de,}activate_task()->{de,en}queue_task()->
    {de,en}queue_task_dl() calls {sub,add}_{running,rq}_bw() since
    p->on_rq==TASK_ON_RQ_MIGRATING.
    So {sub,add}_{running,rq}_bw() in {push,pull}_dl_task() is
    double-accounting for that task.

    Fix it by removing the rq/running bw accounting in {push,pull}_dl_task().

    Fixes: 7dd778841164 ("sched/core: Unify p->on_rq updates")
    Signed-off-by: Dietmar Eggemann
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Valentin Schneider
    Cc: Ingo Molnar
    Cc: Luca Abeni
    Cc: Daniel Bristot de Oliveira
    Cc: Juri Lelli
    Cc: Qais Yousef
    Link: https://lkml.kernel.org/r/20190802145945.18702-2-dietmar.eggemann@arm.com

    Dietmar Eggemann
     

25 Jul, 2019

2 commits

  • The old code used RCU annotations and accessors inconsistently for
    ->numa_group, which can lead to use-after-frees and NULL dereferences.

    Let all accesses to ->numa_group use proper RCU helpers to prevent such
    issues.
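
    A reader-side sketch of the pattern this enforces (the accessor and
    field use are for illustration only):

      struct numa_group *ng;
      int nr_tasks = 0;

      rcu_read_lock();
      /* Never dereference p->numa_group directly; always go through RCU. */
      ng = rcu_dereference(p->numa_group);
      if (ng)
              nr_tasks = ng->nr_tasks;
      rcu_read_unlock();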

    Signed-off-by: Jann Horn
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Petr Mladek
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
    Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
    Signed-off-by: Ingo Molnar

    Jann Horn
     
  • When going through execve(), zero out the NUMA fault statistics instead of
    freeing them.

    During execve, the task is reachable through procfs and the scheduler. A
    concurrent /proc/*/sched reader can read data from a freed ->numa_faults
    allocation (confirmed by KASAN) and write it back to userspace.
    I believe that it would also be possible for a use-after-free read to occur
    through a race between a NUMA fault and execve(): task_numa_fault() can
    lead to task_numa_compare(), which invokes task_weight() on the currently
    running task of a different CPU.

    Another way to fix this would be to make ->numa_faults RCU-managed or add
    extra locking, but it seems easier to wipe the NUMA fault statistics on
    execve.
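
    A sketch of the resulting task_numa_free() behaviour, assuming the
    'final' flag described by the fix (only the last, unreachable reference
    really frees the statistics):

      void task_numa_free(struct task_struct *p, bool final)
      {
              unsigned long *numa_faults = p->numa_faults;
              int i;

              if (!numa_faults)
                      return;

              /* ... detach from the numa_group as before ... */

              if (final) {
                      /* Called from the final put_task_struct(): really free. */
                      p->numa_faults = NULL;
                      kfree(numa_faults);
              } else {
                      /*
                       * Called from execve(): the task is still reachable via
                       * procfs and the scheduler, so only wipe the statistics.
                       */
                      p->total_numa_faults = 0;
                      for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
                              numa_faults[i] = 0;
              }
      }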

    Signed-off-by: Jann Horn
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Petr Mladek
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
    Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
    Signed-off-by: Ingo Molnar

    Jann Horn
     

21 Jul, 2019

1 commit

  • Pull more KVM updates from Paolo Bonzini:
    "Mostly bugfixes, but also:

    - s390 support for KVM selftests

    - LAPIC timer offloading to housekeeping CPUs

    - Extend an s390 optimization for overcommitted hosts to all
    architectures

    - Debugging cleanups and improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
    KVM: x86: Add fixed counters to PMU filter
    KVM: nVMX: do not use dangling shadow VMCS after guest reset
    KVM: VMX: dump VMCS on failed entry
    KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
    KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
    KVM: Boost vCPUs that are delivering interrupts
    KVM: selftests: Remove superfluous define from vmx.c
    KVM: SVM: Fix detection of AMD Errata 1096
    KVM: LAPIC: Inject timer interrupt via posted interrupt
    KVM: LAPIC: Make lapic timer unpinned
    KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
    KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
    kvm: x86: ioapic and apic debug macros cleanup
    kvm: x86: some tsc debug cleanup
    kvm: vmx: fix coccinelle warnings
    x86: kvm: avoid constant-conversion warning
    x86: kvm: avoid -Wsometimes-uninitized warning
    KVM: x86: expose AVX512_BF16 feature to guest
    KVM: selftests: enable pgste option for the linker on s390
    KVM: selftests: Move kvm_create_max_vcpus test to generic code
    ...

    Linus Torvalds
     

20 Jul, 2019

1 commit

  • Dedicated instances are currently disturbed by unnecessary jitter due
    to the emulated LAPIC timers firing on the same pCPUs where the
    vCPUs reside. Unlike ARM, Intel has no hardware virtual timer for the
    guest, so both programming the timer in the guest and the firing of the
    emulated timer incur vmexits. This patch tries to avoid the vmexit when
    the emulated timer fires, at least in the dedicated-instance scenario
    when nohz_full is enabled.

    In that case, the emulated timers can be offloaded to the nearest busy
    housekeeping CPUs, since APICv has been available in server processors
    for several years. The guest timer interrupt is then injected via posted
    interrupts, which are delivered by the housekeeping CPU once the
    emulated timer fires.

    The host should be tuned so that vCPUs are placed on isolated physical
    processors, with several surplus pCPUs for busy housekeeping.
    With mwait/hlt/pause vmexits disabled so the vCPUs stay in non-root
    mode, a ~3% redis performance benefit can be observed on a Skylake
    server, and the number of external-interrupt vmexits drops
    substantially. Without the patch:

    VM-EXIT              Samples  Samples%  Time%  Min Time  Max Time  Avg time
    EXTERNAL_INTERRUPT     42916    49.43%  39.30%   0.47us  106.09us  0.71us ( +- 1.09% )

    While with the patch:

    VM-EXIT              Samples  Samples%  Time%  Min Time  Max Time  Avg time
    EXTERNAL_INTERRUPT      6871     9.29%   2.96%   0.44us   57.88us  0.72us ( +- 4.02% )

    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Marcelo Tosatti
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini

    Wanpeng Li
     

13 Jul, 2019

1 commit

  • John reported a DEBUG_PREEMPT warning caused by commit:

    aacedf26fb76 ("sched/core: Optimize try_to_wake_up() for local wakeups")

    I overlooked that ttwu_stat() requires preemption disabled.

    Reported-by: John Stultz
    Tested-by: John Stultz
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: aacedf26fb76 ("sched/core: Optimize try_to_wake_up() for local wakeups")
    Link: https://lkml.kernel.org/r/20190710105736.GK3402@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Jul, 2019

1 commit

  • Pull Documentation updates from Jonathan Corbet:
    "It's been a relatively busy cycle for docs:

    - A fair pile of RST conversions, many from Mauro. These create more
    than the usual number of simple but annoying merge conflicts with
    other trees, unfortunately. He has a lot more of these waiting on
    the wings that, I think, will go to you directly later on.

    - A new document on how to use merges and rebases in kernel repos,
    and one on Spectre vulnerabilities.

    - Various improvements to the build system, including automatic
    markup of function() references because some people, for reasons I
    will never understand, were of the opinion that
    :c:func:``function()`` is unattractive and not fun to type.

    - We now recommend using sphinx 1.7, but still support back to 1.4.

    - Lots of smaller improvements, warning fixes, typo fixes, etc"

    * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
    docs: automarkup.py: ignore exceptions when seeking for xrefs
    docs: Move binderfs to admin-guide
    Disable Sphinx SmartyPants in HTML output
    doc: RCU callback locks need only _bh, not necessarily _irq
    docs: format kernel-parameters -- as code
    Doc : doc-guide : Fix a typo
    platform: x86: get rid of a non-existent document
    Add the RCU docs to the core-api manual
    Documentation: RCU: Add TOC tree hooks
    Documentation: RCU: Rename txt files to rst
    Documentation: RCU: Convert RCU UP systems to reST
    Documentation: RCU: Convert RCU linked list to reST
    Documentation: RCU: Convert RCU basic concepts to reST
    docs: filesystems: Remove uneeded .rst extension on toctables
    scripts/sphinx-pre-install: fix out-of-tree build
    docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
    Documentation: PGP: update for newer HW devices
    Documentation: Add section about CPU vulnerabilities for Spectre
    Documentation: platform: Delete x86-laptop-drivers.txt
    docs: Note that :c:func: should no longer be used
    ...

    Linus Torvalds
     

09 Jul, 2019

2 commits

  • Pull scheduler updates from Ingo Molnar:

    - Remove the unused per rq load array and all its infrastructure, by
    Dietmar Eggemann.

    - Add utilization clamping support by Patrick Bellasi. This is a
    refinement of the energy aware scheduling framework with support for
    boosting of interactive and capping of background workloads: to make
    sure critical GUI threads get maximum frequency ASAP, and to make
    sure background processing doesn't unnecessarily push the cpufreq
    governor to higher frequencies and less energy-efficient CPU modes.

    - Add the bare minimum of tracepoints required for LISA EAS regression
    testing, by Qais Yousef - which allows automated testing of various
    power management features, including energy aware scheduling.

    - Restructure the former tsk_nr_cpus_allowed() facility that the -rt
    kernel used to modify the scheduler's CPU affinity logic such as
    migrate_disable() - introduce the task->cpus_ptr value instead of
    taking the address of &task->cpus_allowed directly - by Sebastian
    Andrzej Siewior.

    - Misc optimizations, fixes, cleanups and small enhancements - see the
    Git log for details.

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
    sched/uclamp: Add uclamp support to energy_compute()
    sched/uclamp: Add uclamp_util_with()
    sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
    sched/uclamp: Set default clamps for RT tasks
    sched/uclamp: Reset uclamp values on RESET_ON_FORK
    sched/uclamp: Extend sched_setattr() to support utilization clamping
    sched/core: Allow sched_setattr() to use the current policy
    sched/uclamp: Add system default clamps
    sched/uclamp: Enforce last task's UCLAMP_MAX
    sched/uclamp: Add bucket local max tracking
    sched/uclamp: Add CPU's clamp buckets refcounting
    sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
    sched/debug: Export the newly added tracepoints
    sched/debug: Add sched_overutilized tracepoint
    sched/debug: Add new tracepoint to track PELT at se level
    sched/debug: Add new tracepoints to track PELT at rq level
    sched/debug: Add a new sched_trace_*() helper functions
    sched/autogroup: Make autogroup_path() always available
    sched/wait: Deduplicate code with do-while
    sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle are:

    - rwsem scalability improvements, phase #2, by Waiman Long, which are
    rather impressive:

    "On a 2-socket 40-core 80-thread Skylake system with 40 reader
    and writer locking threads, the min/mean/max locking operations
    done in a 5-second testing window before the patchset were:

    40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
    40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

    After the patchset, they became:

    40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
    40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

    There's a lot of changes to the locking implementation that makes
    it similar to qrwlock, including owner handoff for more fair
    locking.

    Another microbenchmark shows how the improvements hold across the
    spectrum:

    "With a locking microbenchmark running on 5.1 based kernel, the
    total locking rates (in kops/s) on a 2-socket Skylake system
    with equal numbers of readers and writers (mixed) before and
    after this patchset were:

    # of Threads   Before Patch   After Patch
    ------------   ------------   -----------
               2          2,618         4,193
               4          1,202         3,726
               8            802         3,622
              16            729         3,359
              32            319         2,826
              64            102         2,744

    The changes are extensive and the patch-set has been through
    several iterations addressing various locking workloads. There
    might be more regressions, but unless they are pathological I
    believe we want to use this new implementation as the baseline
    going forward.

    - jump-label optimizations by Daniel Bristot de Oliveira: the primary
    motivation was to remove IPI disturbance of isolated RT-workload
    CPUs, which resulted in the implementation of batched jump-label
    updates. Beyond the improvement of the kernel's real-time
    characteristics, in one test this patchset improved static key update
    overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
    as well.

    - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
    ~10 years of atomic64_t existence the various types used by the
    APIs only had to be self-consistent within each architecture -
    which means they became wildly inconsistent across architectures.
    Mark puts an end to this by reworking all the atomic64
    implementations to use 's64' as the base type for atomic64_t, and
    to ensure that this type is consistently used for parameters and
    return values in the API, avoiding further problems in this area.

    - A large set of small improvements to lockdep by Yuyang Du: type
    cleanups, output cleanups, function return type and other cleanups
    all around the place.

    - A set of percpu ops cleanups and fixes by Peter Zijlstra.

    - Misc other changes - please see the Git log for more details"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
    locking/lockdep: increase size of counters for lockdep statistics
    locking/atomics: Use sed(1) instead of non-standard head(1) option
    locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
    x86/jump_label: Make tp_vec_nr static
    x86/percpu: Optimize raw_cpu_xchg()
    x86/percpu, sched/fair: Avoid local_clock()
    x86/percpu, x86/irq: Relax {set,get}_irq_regs()
    x86/percpu: Relax smp_processor_id()
    x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
    locking/rwsem: Guard against making count negative
    locking/rwsem: Adaptive disabling of reader optimistic spinning
    locking/rwsem: Enable time-based spinning on reader-owned rwsem
    locking/rwsem: Make rwsem->owner an atomic_long_t
    locking/rwsem: Enable readers spinning on writer
    locking/rwsem: Clarify usage of owner's nonspinaable bit
    locking/rwsem: Wake up almost all readers in wait queue
    locking/rwsem: More optimal RT task handling of null owner
    locking/rwsem: Always release wait_lock before waking up tasks
    locking/rwsem: Implement lock handoff to prevent lock starvation
    locking/rwsem: Make rwsem_spin_on_owner() return owner state
    ...

    Linus Torvalds
     

25 Jun, 2019

21 commits

  • The Energy Aware Scheduler (EAS) estimates the energy impact of waking
    up a task on a given CPU. This estimation is based on:

    a) an (active) power consumption defined for each CPU frequency
    b) an estimation of which frequency will be used on each CPU
    c) an estimation of the busy time (utilization) of each CPU

    Utilization clamping can affect both b) and c).

    A CPU is expected to run:

    - at a higher-than-required frequency, but for a shorter time, in case
    its estimated utilization is smaller than the minimum utilization
    enforced by uclamp
    - at a lower-than-required frequency, but for a longer time, in case
    its estimated utilization is bigger than the maximum utilization
    enforced by uclamp

    While compute_energy() already accounts for clamping effects on busy
    time, the clamping effects on frequency selection are currently ignored.

    Fix it by considering how CPU clamp values will be affected by a
    task waking up and being RUNNABLE on that CPU.

    Do that by refactoring schedutil_freq_util() to take an additional
    task_struct* which allows EAS to evaluate the impact on clamp values of
    a task being eventually queued in a CPU. Clamp values are applied to the
    RT+CFS utilization only when a FREQUENCY_UTIL is required by
    compute_energy().

    Do note that switching from ENERGY_UTIL to FREQUENCY_UTIL in the
    computation of the cpu_util signal implies that we are more likely to
    estimate the highest OPP when a RT task is running in another CPU of
    the same performance domain. This can have an impact on energy
    estimation but:

    - it's not easy to say which approach is better, since it depends on
    the use case
    - the original approach could still be obtained by setting a smaller
    task-specific util_min whenever required

    While we are at it:

    - rename schedutil_freq_util() to schedutil_cpu_util(),
    since it is no longer used only for frequency selection.
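
    The resulting helper signature, as described above (parameter order is
    illustrative; the extra task_struct pointer is what lets EAS evaluate
    the clamps of a task being placed on that CPU):

      unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
                                       unsigned long max, enum schedutil_type type,
                                       struct task_struct *p);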

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-12-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • So far uclamp_util() allows clamping a specified utilization considering
    the clamp values requested by the RUNNABLE tasks on a CPU. For the
    Energy Aware Scheduler (EAS) it is interesting to test how clamp values
    will change when a task becomes RUNNABLE on a given CPU.
    For example, EAS is interested in comparing the energy impact of
    different scheduling decisions and the clamp values can play a role on
    that.

    Add uclamp_util_with(), which clamps a given utilization while
    considering the possible impact of a specified task on the CPU's clamp
    values.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-11-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Each time a frequency update is required via schedutil, a frequency is
    selected to (possibly) satisfy the utilization reported by each
    scheduling class and irqs. However, when utilization clamping is in use,
    the frequency selection should consider userspace utilization clamping
    hints. This will allow, for example, to:

    - boost tasks which are directly affecting the user experience
    by running them at least at a minimum "requested" frequency

    - cap low priority tasks not directly affecting the user experience
    by running them only up to a maximum "allowed" frequency

    These constraints are meant to support per-task tuning of the frequency
    selection, thus supporting a fine-grained definition of
    performance-boosting vs. energy-saving strategies in kernel space.

    Add support to clamp the utilization of RUNNABLE FAIR and RT tasks
    within the boundaries defined by their aggregated utilization clamp
    constraints.

    Do that by considering the max(min_util, max_util) to give boosted tasks
    the performance they need even when they happen to be co-scheduled with
    other capped tasks.
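
    A minimal standalone sketch of that aggregation rule (illustrative
    names, not the kernel helper itself):

      static unsigned int clamp_util(unsigned int util,
                                     unsigned int min_util,
                                     unsigned int max_util)
      {
              /*
               * When a boosted task is co-scheduled with capped tasks, the
               * aggregated min can exceed the aggregated max.  In that case
               * the boost wins, i.e. the result is max(min_util, max_util).
               */
              if (min_util >= max_util)
                      return min_util;

              if (util < min_util)
                      return min_util;
              if (util > max_util)
                      return max_util;
              return util;
      }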

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-10-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • By default FAIR tasks start without clamps, i.e. neither boosted nor
    capped, and they run at the best frequency matching their utilization
    demand. This default behavior does not fit RT tasks which instead are
    expected to run at the maximum available frequency, if not otherwise
    required by explicitly capping them.

    Enforce the correct behavior for RT tasks by setting util_min to max
    whenever:

    1. the task is switched to the RT class and it does not already have a
    user-defined clamp value assigned.

    2. an RT task is forked from a parent with RESET_ON_FORK set.

    NOTE: utilization clamp values are cross scheduling class attributes and
    thus they are never changed/reset once a value has been explicitly
    defined from user-space.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-9-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • A forked task gets the same clamp values as its parent; however, when
    the RESET_ON_FORK flag is set on the parent, e.g. via:

    sys_sched_setattr()
    sched_setattr()
    __sched_setscheduler(attr::SCHED_FLAG_RESET_ON_FORK)

    the newly forked task is expected to start with all attributes reset to
    their default values.

    Do that for utilization clamp values too by checking the reset request
    from the existing uclamp_fork() call which already provides the required
    initialization for other uclamp related bits.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-8-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • The SCHED_DEADLINE scheduling class provides an advanced and formal
    model to define tasks requirements that can translate into proper
    decisions for both task placements and frequencies selections. Other
    classes have a more simplified model based on the POSIX concept of
    priorities.

    Such a simple priority-based model, however, does not allow exploiting
    the most advanced features of the Linux scheduler, such as driving
    frequency selection via the schedutil cpufreq governor. Still, even for
    non-SCHED_DEADLINE tasks it is interesting to define task properties
    that can support scheduler decisions.

    Utilization clamping exposes to user-space a new set of per-task
    attributes the scheduler can use as hints about the expected/required
    utilization for a task. This allows implementing a "proactive" per-task
    frequency control policy, a more advanced policy than the current one
    based just on the "passive" measured task utilization. For example, it's
    possible to boost interactive tasks (e.g. to get better performance) or
    cap background tasks (e.g. to be more energy/thermal efficient).

    Introduce a new API to set utilization clamping values for a specified
    task by extending sched_setattr(), a syscall which already allows
    defining task-specific properties for different scheduling classes. A
    new pair of attributes allows specifying a minimum and maximum
    utilization the scheduler can consider for a task.

    Do that by validating the requested clamp values first, and then
    applying the required changes using the same pattern already in use for
    __setscheduler(). This ensures that the task is re-enqueued with the new
    clamp values.
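
    A user-space sketch of the extended syscall; the struct layout and flag
    values follow the uapi headers of this series and should be treated as
    assumptions if your headers differ:

      #define _GNU_SOURCE
      #include <stdint.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/syscall.h>

      struct sched_attr {
              uint32_t size;
              uint32_t sched_policy;
              uint64_t sched_flags;
              int32_t  sched_nice;
              uint32_t sched_priority;
              uint64_t sched_runtime;
              uint64_t sched_deadline;
              uint64_t sched_period;
              uint32_t sched_util_min;   /* new: requested minimum utilization */
              uint32_t sched_util_max;   /* new: requested maximum utilization */
      };

      #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
      #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40

      int main(void)
      {
              struct sched_attr attr;

              memset(&attr, 0, sizeof(attr));
              attr.size = sizeof(attr);
              attr.sched_policy = 0;                        /* SCHED_OTHER */
              attr.sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN;
              attr.sched_util_min = 256;                    /* ~25% of 1024 */

              /* Apply to the calling task (pid 0), no extra syscall flags. */
              return syscall(SYS_sched_setattr, 0, &attr, 0);
      }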

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-7-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • The sched_setattr() syscall mandates that a policy is always specified.
    This requires always knowing which policy a task will have when
    attributes are configured, and it makes it impossible to add more
    generic task attributes valid across different scheduling policies.
    Reading the policy before setting generic task attributes is racy, since
    we cannot be sure it is not changed concurrently.

    Introduce the required support to change generic task attributes without
    affecting the current task policy. This is done by adding an attribute flag
    (SCHED_FLAG_KEEP_POLICY) to enforce the usage of the current policy.

    Add support for the SETPARAM_POLICY policy, which is already used by the
    sched_setparam() POSIX syscall, to the sched_setattr() non-POSIX
    syscall.
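
    A sketch of the flag handling on the syscall side, roughly as described
    above (SETPARAM_POLICY is the same -1 sentinel sched_setparam() already
    uses internally):

      /* In sys_sched_setattr(), after copying the attributes from user-space: */
      if ((int)attr.sched_policy < 0)
              return -EINVAL;

      if (attr.sched_flags & SCHED_FLAG_KEEP_POLICY)
              attr.sched_policy = SETPARAM_POLICY;    /* keep the current policy */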

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-6-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Tasks without a user-defined clamp value are considered not clamped
    and by default their utilization can have any value in the
    [0..SCHED_CAPACITY_SCALE] range.

    Tasks with a user-defined clamp value are allowed to request any value
    in that range, and the required clamp is unconditionally enforced.
    However, a "System Management Software" could be interested in limiting
    the range of clamp values allowed for all tasks.

    Add a privileged interface to define a system default configuration via:

    /proc/sys/kernel/sched_uclamp_util_{min,max}

    which works as an unconditional clamp range restriction for all tasks.

    With the default configuration, the full SCHED_CAPACITY_SCALE range of
    values is allowed for each clamp index. Otherwise, the task-specific
    clamp is capped by the corresponding system default value.

    Do that by tracking, for each task, the "effective" clamp value and the
    bucket the task has been refcounted in at enqueue time. This allows
    lazily aggregating the "requested" and "system default" values at
    enqueue time and simplifies refcounting updates at dequeue time.

    The cached bucket ids are used to avoid (relatively) more expensive
    integer divisions every time a task is enqueued.

    An active flag is used to report when the "effective" value is valid and
    thus the task is actually refcounted in the corresponding rq's bucket.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-5-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • When a task sleeps it removes its max utilization clamp from its CPU.
    However, the blocked utilization on that CPU can be higher than the max
    clamp value enforced while the task was running. This allows undesired
    CPU frequency increases while a CPU is idle, for example, when another
    CPU on the same frequency domain triggers a frequency update, since
    schedutil can now see the full not clamped blocked utilization of the
    idle CPU.

    Fix this by using:

    uclamp_rq_dec_id(p, rq, UCLAMP_MAX)
    uclamp_rq_max_value(rq, UCLAMP_MAX, clamp_value)

    to detect when a CPU has no more RUNNABLE clamped tasks and to flag this
    condition.

    Don't track any minimum utilization clamps since an idle CPU never
    requires a minimum frequency. The decay of the blocked utilization is
    good enough to reduce the CPU frequency.

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-4-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Because of bucketization, different task-specific clamp values are
    tracked in the same bucket. For example, with 20% bucket size and
    assuming to have:

    Task1: util_min=25%
    Task2: util_min=35%

    both tasks will be refcounted in the [20..39]% bucket and always boosted
    only up to 20% thus implementing a simple floor aggregation normally
    used in histograms.

    In systems with only few and well-defined clamp values, it would be
    useful to track the exact clamp value required by a task whenever
    possible. For example, if a system requires only 23% and 47% boost
    values then it's possible to track the exact boost required by each
    task using only 3 buckets of ~33% size each.

    Introduce a mechanism to max aggregate the requested clamp values of
    RUNNABLE tasks in the same bucket. Keep it simple by resetting the
    bucket value to its base value only when a bucket becomes inactive.
    Allow a limited and controlled overboosting margin for tasks refcounted
    in the same bucket.

    In systems where the boost values are not known in advance, it is still
    possible to control the maximum acceptable overboosting margin by tuning
    the number of clamp groups. For example, 20 groups ensure a 5% maximum
    overboost.

    Remove the rq bucket initialization code since a correct bucket value
    is now computed when a task is refcounted into a CPU's rq.
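
    A sketch of the enqueue-side aggregation described above, using the
    bucket fields named in this series (illustrative, not the full
    refcounting path):

      bucket = &rq->uclamp[clamp_id].bucket[uc_se->bucket_id];
      bucket->tasks++;
      uc_se->active = true;

      /*
       * Local max tracking: the bucket always holds the maximum "requested"
       * clamp value of its RUNNABLE tasks; it falls back to the bucket's
       * nominal value only once the bucket becomes inactive again.
       */
      if (bucket->tasks == 1 || uc_se->value > bucket->value)
              bucket->value = uc_se->value;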

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-3-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Utilization clamping allows to clamp the CPU's utilization within a
    [util_min, util_max] range, depending on the set of RUNNABLE tasks on
    that CPU. Each task references two "clamp buckets" defining its minimum
    and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
    bucket is active if there is at least one RUNNABLE task enqueued on
    that CPU and refcounting that bucket.

    When a task is {en,de}queued {on,from} a rq, the set of active clamp
    buckets on that CPU can change. If the set of active clamp buckets
    changes for a CPU a new "aggregated" clamp value is computed for that
    CPU. This is because each clamp bucket enforces a different utilization
    clamp value.

    Clamp values are always MAX aggregated for both util_min and util_max.
    This ensures that no task can affect the performance of other
    co-scheduled tasks which are more boosted (i.e. with higher util_min
    clamp) or less capped (i.e. with higher util_max clamp).

    A task has:
    task_struct::uclamp[clamp_id]::bucket_id
    to track the "bucket index" of the CPU's clamp bucket it refcounts while
    enqueued, for each clamp index (clamp_id).

    A runqueue has:
    rq::uclamp[clamp_id]::bucket[bucket_id].tasks
    to track how many RUNNABLE tasks on that CPU refcount each
    clamp bucket (bucket_id) of a clamp index (clamp_id).
    It also has a:
    rq::uclamp[clamp_id]::bucket[bucket_id].value
    to track the clamp value of each clamp bucket (bucket_id) of a clamp
    index (clamp_id).

    The rq::uclamp::bucket[clamp_id][] array is scanned every time we need
    to find a new MAX aggregated clamp value for a clamp_id. This operation
    is required only when the last task of a clamp bucket tracking the
    current MAX aggregated clamp value is dequeued. In that case, the CPU is
    either entering IDLE or going to schedule a less boosted or more clamped
    task.
    The expected number of different clamp values configured at build time
    is small enough to fit the full unordered array into a single cache
    line, for configurations of up to 7 buckets.

    Add to struct rq the basic data structures required to refcount the
    number of RUNNABLE tasks for each clamp bucket. Add also the max
    aggregation required to update the rq's clamp value at each
    enqueue/dequeue event.

    Use a simple linear mapping of clamp values into clamp buckets.
    Pre-compute and cache bucket_id to avoid integer divisions at
    enqueue/dequeue time.
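
    A standalone sketch of the linear mapping and the cached division; a
    SCHED_CAPACITY_SCALE of 1024 and a 5-bucket build are assumptions for
    illustration:

      #define SCHED_CAPACITY_SCALE 1024
      #define UCLAMP_BUCKETS       5
      #define UCLAMP_BUCKET_DELTA  ((SCHED_CAPACITY_SCALE + UCLAMP_BUCKETS - 1) / UCLAMP_BUCKETS)

      /* Computed once when a clamp value is set, then cached in the task. */
      static unsigned int uclamp_bucket_id(unsigned int clamp_value)
      {
              unsigned int id = clamp_value / UCLAMP_BUCKET_DELTA;

              /* clamp_value == SCHED_CAPACITY_SCALE still maps to the last bucket. */
              return id < UCLAMP_BUCKETS ? id : UCLAMP_BUCKETS - 1;
      }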

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J . Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • The term 'weighted' is not needed since there is no 'unweighted' load.
    Instead use the term 'runnable' to distinguish 'runnable' load
    (avg.runnable_load_avg) used in load balance from load (avg.load_avg)
    which is the sum of 'runnable' and 'blocked' load.

    Signed-off-by: Dietmar Eggemann
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Patrick Bellasi
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Valentin Schneider
    Cc: Vincent Guittot
    Link: https://lkml.kernel.org/r/57f27a7f-2775-d832-e965-0f4d51bb1954@arm.com
    Signed-off-by: Ingo Molnar

    Dietmar Eggemann
     
  • So that external modules can hook into them and extract the info they
    need. Since these new tracepoints have no events associated with them,
    exporting them makes them useful for external modules to perform testing
    and debugging. There is otherwise no way to access them.

    BPF doesn't have infrastructure to access these bare tracepoints either.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-7-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The new tracepoint allows us to track the changes in overutilized
    status.

    Overutilized status is associated with EAS. It indicates that the system
    is in a high-performance state. EAS is disabled when the system is in
    this state, since there aren't many energy savings to be had while
    high-performance tasks push the system to its limit and it's better to
    default to the spreading behavior of the scheduler.

    This tracepoint helps understanding and debugging the conditions under
    which this happens.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-6-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The new tracepoint allows tracking PELT signals at the sched_entity
    level, which is supported only for CFS tasks and task groups.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-5-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The new tracepoints allow tracking PELT signals at rq level for all
    scheduling classes + irq.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-4-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The new functions allow modules to access internal data structures of
    unexported struct cfs_rq and struct rq to extract important information
    from the tracepoints to be introduced in later patches.

    While at it, fix the alphabetical order of struct declarations in sched.h.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-3-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • Remove the #ifdef CONFIG_SCHED_DEBUG.

    Some of the tracepoints to be introduced in later patches need to access
    this function. Hence make it always available since the tracepoints are
    not protected by CONFIG_SCHED_DEBUG.

    Signed-off-by: Qais Yousef
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Pavankumar Kondeti
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Uwe Kleine-Konig
    Link: https://lkml.kernel.org/r/20190604111459.2862-2-qais.yousef@arm.com
    Signed-off-by: Ingo Molnar

    Qais Yousef
     
  • The statements in the loop's body and before it are identical. Use a
    do-while loop so they are not repeated.
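
    A generic illustration of the transformation (not the exact sched/wait
    code):

      void step(void);
      int done(void);

      /* Before: the same statement appears before the loop and in its body. */
      void before(void)
      {
              step();
              while (!done())
                      step();
      }

      /* After: a do-while runs the statement once per iteration, including the first. */
      void after(void)
      {
              do {
                      step();
              } while (!done());
      }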

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/43ffea6ee2152b90dedf962eac851609e4197218.1560256112.git.asml.silence@gmail.com
    Signed-off-by: Ingo Molnar

    Pavel Begunkov
     
  • The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is
    unused since commit:

    765d0af19f5f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'")

    Remove it.

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Viresh Kumar
    Reviewed-by: Valentin Schneider
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: gregkh@linuxfoundation.org
    Cc: linux@armlinux.org.uk
    Cc: quentin.perret@arm.com
    Cc: rafael@kernel.org
    Link: https://lkml.kernel.org/r/1560783617-5827-1-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner