18 Mar, 2016

1 commit


09 Feb, 2016

1 commit

  • schedstats is very useful during debugging and performance tuning but it
    incurs overhead to calculate the stats. As such, even though it can be
    disabled at build time, it is often enabled as the information is useful.

    This patch adds a kernel command-line and sysctl tunable to enable or
    disable schedstats on demand (when it's built in). It is disabled
    by default as someone who knows they need it can also learn to enable
    it when necessary.

    The benefits are dependent on how scheduler-intensive the workload is.
    If it is then the patch reduces the number of cycles spent calculating
    the stats with a small benefit from reducing the cache footprint of the
    scheduler.

    These measurements were taken from a 48-core 2-socket
    machine with Xeon(R) E5-2670 v3 CPUs, although they were also tested on a
    single-socket 8-core machine with an Intel i7-3770 processor.

    netperf-tcp
                                    4.5.0-rc1             4.5.0-rc1
                                      vanilla          nostats-v3r1
    Hmean    64          560.45 (  0.00%)      575.98 (  2.77%)
    Hmean    128         766.66 (  0.00%)      795.79 (  3.80%)
    Hmean    256         950.51 (  0.00%)      981.50 (  3.26%)
    Hmean    1024       1433.25 (  0.00%)     1466.51 (  2.32%)
    Hmean    2048       2810.54 (  0.00%)     2879.75 (  2.46%)
    Hmean    3312       4618.18 (  0.00%)     4682.09 (  1.38%)
    Hmean    4096       5306.42 (  0.00%)     5346.39 (  0.75%)
    Hmean    8192      10581.44 (  0.00%)    10698.15 (  1.10%)
    Hmean    16384     18857.70 (  0.00%)    18937.61 (  0.42%)

    Small gains here. UDP_STREAM showed nothing interesting and neither did
    the TCP_RR tests. The gains on the 8-core machine were very similar.

    tbench4
                                       4.5.0-rc1             4.5.0-rc1
                                         vanilla          nostats-v3r1
    Hmean    mb/sec-1         500.85 (  0.00%)      522.43 (  4.31%)
    Hmean    mb/sec-2         984.66 (  0.00%)     1018.19 (  3.41%)
    Hmean    mb/sec-4        1827.91 (  0.00%)     1847.78 (  1.09%)
    Hmean    mb/sec-8        3561.36 (  0.00%)     3611.28 (  1.40%)
    Hmean    mb/sec-16       5824.52 (  0.00%)     5929.03 (  1.79%)
    Hmean    mb/sec-32      10943.10 (  0.00%)    10802.83 ( -1.28%)
    Hmean    mb/sec-64      15950.81 (  0.00%)    16211.31 (  1.63%)
    Hmean    mb/sec-128     15302.17 (  0.00%)    15445.11 (  0.93%)
    Hmean    mb/sec-256     14866.18 (  0.00%)    15088.73 (  1.50%)
    Hmean    mb/sec-512     15223.31 (  0.00%)    15373.69 (  0.99%)
    Hmean    mb/sec-1024    14574.25 (  0.00%)    14598.02 (  0.16%)
    Hmean    mb/sec-2048    13569.02 (  0.00%)    13733.86 (  1.21%)
    Hmean    mb/sec-3072    12865.98 (  0.00%)    13209.23 (  2.67%)

    Small gains of 2-4% at low thread counts and otherwise flat. The
    gains on the 8-core machine were slightly different:

    tbench4 on 8-core i7-3770 single socket machine
    Hmean    mb/sec-1         442.59 (  0.00%)      448.73 (  1.39%)
    Hmean    mb/sec-2         796.68 (  0.00%)      794.39 ( -0.29%)
    Hmean    mb/sec-4        1322.52 (  0.00%)     1343.66 (  1.60%)
    Hmean    mb/sec-8        2611.65 (  0.00%)     2694.86 (  3.19%)
    Hmean    mb/sec-16       2537.07 (  0.00%)     2609.34 (  2.85%)
    Hmean    mb/sec-32       2506.02 (  0.00%)     2578.18 (  2.88%)
    Hmean    mb/sec-64       2511.06 (  0.00%)     2569.16 (  2.31%)
    Hmean    mb/sec-128      2313.38 (  0.00%)     2395.50 (  3.55%)
    Hmean    mb/sec-256      2110.04 (  0.00%)     2177.45 (  3.19%)
    Hmean    mb/sec-512      2072.51 (  0.00%)     2053.97 ( -0.89%)

    In contrast, this shows a relatively steady 2-3% gain at higher thread
    counts. Due to the nature of the patch and the type of workload, it's
    not a surprise that the result will depend on the CPU used.

    hackbench-pipes
                             4.5.0-rc1             4.5.0-rc1
                               vanilla          nostats-v3r1
    Amean    1        0.0637 (  0.00%)      0.0660 ( -3.59%)
    Amean    4        0.1229 (  0.00%)      0.1181 (  3.84%)
    Amean    7        0.1921 (  0.00%)      0.1911 (  0.52%)
    Amean    12       0.3117 (  0.00%)      0.2923 (  6.23%)
    Amean    21       0.4050 (  0.00%)      0.3899 (  3.74%)
    Amean    30       0.4586 (  0.00%)      0.4433 (  3.33%)
    Amean    48       0.5910 (  0.00%)      0.5694 (  3.65%)
    Amean    79       0.8663 (  0.00%)      0.8626 (  0.43%)
    Amean    110      1.1543 (  0.00%)      1.1517 (  0.22%)
    Amean    141      1.4457 (  0.00%)      1.4290 (  1.16%)
    Amean    172      1.7090 (  0.00%)      1.6924 (  0.97%)
    Amean    192      1.9126 (  0.00%)      1.9089 (  0.19%)

    Some small gains and losses and, while the variance data is not included,
    it's close to the noise. The UMA machine did not show anything particularly
    different.

    pipetest
                                   4.5.0-rc1             4.5.0-rc1
                                     vanilla          nostats-v2r2
    Min         Time        4.13 (  0.00%)        3.99 (  3.39%)
    1st-qrtle   Time        4.38 (  0.00%)        4.27 (  2.51%)
    2nd-qrtle   Time        4.46 (  0.00%)        4.39 (  1.57%)
    3rd-qrtle   Time        4.56 (  0.00%)        4.51 (  1.10%)
    Max-90%     Time        4.67 (  0.00%)        4.60 (  1.50%)
    Max-93%     Time        4.71 (  0.00%)        4.65 (  1.27%)
    Max-95%     Time        4.74 (  0.00%)        4.71 (  0.63%)
    Max-99%     Time        4.88 (  0.00%)        4.79 (  1.84%)
    Max         Time        4.93 (  0.00%)        4.83 (  2.03%)
    Mean        Time        4.48 (  0.00%)        4.39 (  1.91%)
    Best99%Mean Time        4.47 (  0.00%)        4.39 (  1.91%)
    Best95%Mean Time        4.46 (  0.00%)        4.38 (  1.93%)
    Best90%Mean Time        4.45 (  0.00%)        4.36 (  1.98%)
    Best50%Mean Time        4.36 (  0.00%)        4.25 (  2.49%)
    Best10%Mean Time        4.23 (  0.00%)        4.10 (  3.13%)
    Best5%Mean  Time        4.19 (  0.00%)        4.06 (  3.20%)
    Best1%Mean  Time        4.13 (  0.00%)        4.00 (  3.39%)

    Small improvement and similar gains were seen on the UMA machine.

    The gain is small but it stands to reason that doing less work in the
    scheduler is a good thing. The downside is that the lack of schedstats and
    tracepoints may be surprising to experts doing performance analysis until
    they find the existence of the schedstats= parameter or schedstats sysctl.
    It will be automatically activated for latencytop and sleep profiling to
    alleviate the problem. For tracepoints, there is a simple warning as it's
    not safe to activate schedstats in the context when it's known the tracepoint
    may be wanted but is unavailable.

    Signed-off-by: Mel Gorman
    Reviewed-by: Matt Fleming
    Reviewed-by: Srikar Dronamraju
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Ingo Molnar

    Mel Gorman
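The on-demand switch described in the commit above can be pictured with a small userspace sketch. Everything here is illustrative: `schedstats_enabled`, `parse_schedstats()`, `struct wait_stats` and `account_wait()` are hypothetical names, and the accepted strings `enable` / `disable` are an assumption about the `schedstats=` parameter rather than something the commit message spells out. The point is only that accounting sites become cheap no-ops while the switch is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the runtime switch; disabled by default. */
static bool schedstats_enabled = false;

/* Parse the value of a "schedstats=" style parameter.
 * Returns 1 if the value was recognised, 0 otherwise (state unchanged).
 * The strings "enable"/"disable" are an assumption for this sketch. */
static int parse_schedstats(const char *val)
{
    if (val == NULL)
        return 0;
    if (strcmp(val, "enable") == 0) {
        schedstats_enabled = true;
        return 1;
    }
    if (strcmp(val, "disable") == 0) {
        schedstats_enabled = false;
        return 1;
    }
    return 0;
}

/* Hypothetical per-task statistics. */
struct wait_stats {
    unsigned long long wait_count;
    unsigned long long wait_sum;
};

/* Accounting site: the bookkeeping cost is only paid when enabled. */
static void account_wait(struct wait_stats *st, unsigned long long delta)
{
    assert(st != NULL);
    if (!schedstats_enabled)
        return;
    st->wait_count++;
    st->wait_sum += delta;
}
```

In this sketch a plain boolean is tested on every call; a real kernel implementation would more likely patch the branch out entirely, which is outside what a userspace example can show.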
     

23 Sep, 2015

1 commit

  • Move dl_time_before() static definition in include/linux/sched/deadline.h
    so that it can be used by different parties without being re-defined.

    Reported-by: Luca Abeni
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1441188096-23021-3-git-send-email-juri.lelli@arm.com
    Signed-off-by: Ingo Molnar

    Juri Lelli
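For readers unfamiliar with the helper being moved, the following is a userspace sketch of a wraparound-safe "a is earlier than b" comparison in the style of dl_time_before(). It uses `<stdint.h>` types in place of the kernel's u64/s64, and the demo function is an addition for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if timestamp a is before timestamp b.
 * The unsigned difference is reinterpreted as signed, so the result
 * stays correct when the 64-bit counter wraps around, as long as the
 * two values are less than 2^63 apart. */
static inline bool dl_time_before(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}

/* Illustration only: a value just below the wrap point counts as
 * "before" a small value just after it. */
static void dl_time_before_demo(void)
{
    assert(dl_time_before(UINT64_MAX, 3));
    assert(!dl_time_before(3, UINT64_MAX));
}
```

A plain `a < b` would get the wrapped case backwards, which is why the subtraction form is preferred for free-running clocks.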
     

19 Jun, 2015

1 commit

  • Eric reported that the timer_migration sysctl is not really nice
    performance wise as it needs to check at every timer insertion whether
    the feature is enabled or not. Further the check does not live in the
    timer code, so we have an extra function call which checks an extra
    cache line to figure out that it is disabled.

    We can do better and store that information in the per cpu (hr)timer
    bases. I considered using a static key, but that's a nightmare to
    update from the nohz code and the timer base cache line is hot anyway
    when we select a timer base.

    The old logic enabled the timer migration unconditionally if
    CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
    line.

    With this modification, we start off with migration disabled. The user
    visible sysctl is still set to enabled. If the kernel switches to NOHZ,
    migration is enabled, provided the user did not disable it via the sysctl
    prior to the switch. If nohz=off is on the kernel command line,
    migration stays disabled no matter what.

    Before:
    47.76% hog [.] main
    14.84% [kernel] [k] _raw_spin_lock_irqsave
    9.55% [kernel] [k] _raw_spin_unlock_irqrestore
    6.71% [kernel] [k] mod_timer
    6.24% [kernel] [k] lock_timer_base.isra.38
    3.76% [kernel] [k] detach_if_pending
    3.71% [kernel] [k] del_timer
    2.50% [kernel] [k] internal_add_timer
    1.51% [kernel] [k] get_nohz_timer_target
    1.28% [kernel] [k] __internal_add_timer
    0.78% [kernel] [k] timerfn
    0.48% [kernel] [k] wake_up_nohz_cpu

    After:
    48.10% hog [.] main
    15.25% [kernel] [k] _raw_spin_lock_irqsave
    9.76% [kernel] [k] _raw_spin_unlock_irqrestore
    6.50% [kernel] [k] mod_timer
    6.44% [kernel] [k] lock_timer_base.isra.38
    3.87% [kernel] [k] detach_if_pending
    3.80% [kernel] [k] del_timer
    2.67% [kernel] [k] internal_add_timer
    1.33% [kernel] [k] __internal_add_timer
    0.73% [kernel] [k] timerfn
    0.54% [kernel] [k] wake_up_nohz_cpu

    Reported-by: Eric Dumazet
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Frederic Weisbecker
    Cc: Viresh Kumar
    Cc: John Stultz
    Cc: Joonwoo Park
    Cc: Wenbo Wang
    Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
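The decision logic described in the commit above can be sketched in userspace as follows. The struct and function names (`timer_base_sketch`, `update_migration()`, `pick_target_cpu()`) are hypothetical and heavily simplified; the sketch only shows the idea of caching "sysctl enabled AND nohz active" in each per-CPU base so the hot path reads a flag from a cache line it already touches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-CPU timer base. */
struct timer_base_sketch {
    bool migration_enabled;   /* cached decision, read on the hot path */
};

static bool sysctl_timer_migration = true; /* user-visible knob, default on */
static bool nohz_active = false;           /* set when switching to NOHZ */

/* Slow path: recompute the cached flag whenever the sysctl changes or
 * the kernel switches to NOHZ mode. */
static void update_migration(struct timer_base_sketch *bases, int ncpus)
{
    assert(ncpus >= 0);
    bool on = sysctl_timer_migration && nohz_active;

    for (int i = 0; i < ncpus; i++)
        bases[i].migration_enabled = on;
}

/* Hot path: no extra function call into other code, just a flag test
 * on the base that is being selected anyway. */
static int pick_target_cpu(const struct timer_base_sketch *base,
                           int this_cpu, int busy_cpu)
{
    return base->migration_enabled ? busy_cpu : this_cpu;
}
```

With `nohz_active` left false, as with nohz=off, the flag never turns on regardless of the sysctl value, matching the behaviour the message describes.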
     

08 May, 2015

1 commit

  • Ronny reported that the following scenario is not handled correctly:

    T1 (prio = 10)
    lock(rtmutex);

    T2 (prio = 20)
    lock(rtmutex)
    boost T1

    T1 (prio = 20)
    sys_set_scheduler(prio = 30)
    T1 prio = 30
    ....
    sys_set_scheduler(prio = 10)
    T1 prio = 30

    The last step is wrong as T1 should now be back at prio 20.

    Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()")
    only handles the case where a boosted task tries to lower its
    priority.

    Fix it by taking the new effective priority into account for the
    decision whether a change of the priority is required.

    Reported-by: Ronny Meeus
    Tested-by: Steven Rostedt
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt
    Cc:
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Mike Galbraith
    Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()")
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
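The scenario in the commit above can be replayed with a toy model. This is not the kernel's code: `struct task_sketch`, `effective_prio()` and `set_scheduler()` are hypothetical, and the sketch follows the scenario's convention that a larger number means a higher priority. It shows why the requeue decision has to be based on the new effective priority rather than on the requested one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task model; larger number = higher priority, as in the
 * scenario above. */
struct task_sketch {
    int normal_prio;  /* priority requested via setscheduler */
    int boost_prio;   /* priority inherited from the top waiter, -1 if none */
    int prio;         /* effective priority used for scheduling */
};

/* Effective priority is the higher of the requested and boosted value. */
static int effective_prio(const struct task_sketch *t)
{
    return t->boost_prio > t->normal_prio ? t->boost_prio : t->normal_prio;
}

/* Store the new request, then recompute the effective priority.
 * Returns 1 if the effective priority changed (a requeue would be
 * needed), 0 if the task can stay where it is. */
static int set_scheduler(struct task_sketch *t, int new_prio)
{
    assert(t != NULL);
    int old = t->prio;

    t->normal_prio = new_prio;
    t->prio = effective_prio(t);
    return t->prio != old;
}
```

Replaying the report: T1 starts at 10 and is boosted to 20, a request for 30 raises it to 30, and a later request for 10 drops it back to the boosted value 20 instead of leaving it at 30.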
     

05 Jun, 2014

1 commit


22 May, 2014

1 commit


23 Feb, 2014

3 commits

  • Currently there is a lot of hard-coding of 19 and -20 to represent
    the maximum and minimum nice values.

    This patch adds three macros in prio.h for the maximum, minimum and width
    of the nice value, and uses them to remove hardcoded values in prio.h.

    Signed-off-by: Dongsheng Yang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/3994e89327b2b15f992277cdf9f409c516f87d1b.1392103744.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Thomas Gleixner
    [ Collapsed two small patches. ]
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • There is already a macro named DEFAULT_PRIO in prio.h; we can use it
    to define NICE_TO_PRIO and PRIO_TO_NICE rather than hard-coding
    (MAX_RT_PRIO + 20).

    Signed-off-by: Dongsheng Yang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4e28ec36fb49e8906027cbbdd900ab26a149905e.1392103744.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
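The two prio.h cleanups above fit together roughly as in the sketch below. The macro names come from the commit messages; the value 100 for MAX_RT_PRIO and the exact form of DEFAULT_PRIO are assumptions made so the example is self-contained, and `nice_to_prio()` is a hypothetical checked wrapper added for illustration.

```c
#include <assert.h>

#define MAX_NICE    19
#define MIN_NICE    (-20)
#define NICE_WIDTH  (MAX_NICE - MIN_NICE + 1)   /* 40 nice levels */

/* Assumed value for this sketch; not stated in the commit messages. */
#define MAX_RT_PRIO  100

/* Nice 0 sits in the middle of the nice range, above the RT range. */
#define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2)

/* Conversions expressed via DEFAULT_PRIO instead of a literal 20. */
#define NICE_TO_PRIO(nice) ((nice) + DEFAULT_PRIO)
#define PRIO_TO_NICE(prio) ((prio) - DEFAULT_PRIO)

/* Hypothetical checked wrapper around NICE_TO_PRIO. */
static int nice_to_prio(int nice)
{
    assert(nice >= MIN_NICE && nice <= MAX_NICE);
    return NICE_TO_PRIO(nice);
}
```

Under these assumptions the nice range [-20, 19] maps onto static priorities [100, 139], and the mapping round-trips through PRIO_TO_NICE.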
     
  • If a PI boosted task policy/priority is modified by a setscheduler()
    call we unconditionally dequeue and requeue the task if it is on the
    runqueue even if the new priority is lower than the current effective
    boosted priority. This can result in undesired reordering of the
    priority bucket list.

    If the new priority is less than or equal to the current effective
    priority, we just store the new parameters in the task struct and leave
    the scheduler class and the runqueue untouched. This is handled when the
    task deboosts itself. Only if the new priority is higher than the
    effective boosted priority do we apply the change immediately.

    Signed-off-by: Thomas Gleixner
    [ Rebase on top of v3.14-rc1. ]
    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Dario Faggioli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1391803122-4425-7-git-send-email-bigeasy@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

09 Feb, 2014

3 commits

  • As patch "sched: Move the priority specific bits into a new header file" exposes
    the priority related macros in linux/sched/prio.h, we don't have to implement
    task_nice() in kernel/sched/core.c any more.

    This patch implements it in linux/sched/sched.h as a static inline function,
    saving kernel stack and enhancing performance a bit.

    Signed-off-by: Dongsheng Yang
    Cc: clark.williams@gmail.com
    Cc: rostedt@goodmis.org
    Cc: raistlin@linux.it
    Cc: juri.lelli@gmail.com
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1390878045-7096-1-git-send-email-yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • Some macros in kernel/sched/sched.h about priority are
    private to kernel/sched. But they are useful to other
    parts of the core kernel.

    This patch moves these macros from kernel/sched/sched.h to
    include/linux/sched/prio.h so that they are available to
    other subsystems.

    Signed-off-by: Dongsheng Yang
    Cc: raistlin@linux.it
    Cc: juri.lelli@gmail.com
    Cc: clark.williams@gmail.com
    Cc: rostedt@goodmis.org
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/2b022810905b52d13238466807f4b2a691577180.1390859827.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • Some bits about priority are defined in linux/sched/rt.h, but
    some of them are not only for rt scheduler, such as MAX_PRIO.

    This patch moves them all into a new header file, linux/sched/prio.h.

    Signed-off-by: Dongsheng Yang
    Cc: clark.williams@gmail.com
    Cc: rostedt@goodmis.org
    Cc: raistlin@linux.it
    Cc: juri.lelli@gmail.com
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/f7549508a1588da2c613d601748ca9de30fa5dcf.1390859827.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     

01 Feb, 2014

1 commit

  • Pull core debug changes from Ingo Molnar:
    "This contains mostly kernel debugging related updates:

    - make hung_task detection more configurable to distros
    - add final bits for x86 UV NMI debugging, with related KGDB changes
    - update the mailing-list of MAINTAINERS entries I'm involved with"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    hung_task: Display every hung task warning
    sysctl: Add neg_one as a standard constraint
    x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
    x86/uv/nmi: Fix Sparse warnings
    kgdb/kdb: Fix no KDB config problem
    MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries

    Linus Torvalds
     

25 Jan, 2014

1 commit

  • When khungtaskd detects hung tasks, it prints out
    backtraces from a number of those tasks.

    Limiting the number of backtraces being printed
    out can result in the user not seeing the information
    necessary to debug the issue. The hung_task_warnings
    sysctl controls this feature.

    This patch makes it possible for hung_task_warnings
    to accept a special value to print an unlimited
    number of backtraces when khungtaskd detects hung
    tasks.

    The special value is -1. To use this value it is
    necessary to change types from ulong to int.

    Signed-off-by: Aaron Tomlin
    Reviewed-by: Rik van Riel
    Acked-by: David Rientjes
    Cc: oleg@redhat.com
    Link: http://lkml.kernel.org/r/1390239253-24030-3-git-send-email-atomlin@redhat.com
    [ Build warning fix. ]
    Signed-off-by: Ingo Molnar

    Aaron Tomlin
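The effect of the special value can be sketched as below. This is a userspace model, not the kernel source: `hung_task_warnings` stands in for the sysctl and `should_report_hung_task()` is a hypothetical helper. It illustrates why the type had to become signed: -1 must be representable and must never be decremented.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the sysctl: number of reports still allowed,
 * or -1 for "unlimited". */
static int hung_task_warnings = 10;

/* Returns true if a backtrace should be printed for this hung task. */
static bool should_report_hung_task(void)
{
    assert(hung_task_warnings >= -1);

    if (hung_task_warnings == 0)
        return false;             /* budget exhausted: stay silent */
    if (hung_task_warnings > 0)
        hung_task_warnings--;     /* finite budget: consume one report */
    return true;                  /* -1 is left untouched: unlimited */
}
```

With an unsigned type there is no value that can mean "never run out" without also being a very large finite budget, which is the reason for the switch to int mentioned above.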
     

24 Jan, 2014

1 commit

  • Add a working sysctl to enable/disable automatic numa memory balancing
    at runtime.

    This allows us to track down performance problems with this feature and
    is generally a good idea.

    This was possible earlier through debugfs, but only with special
    debugging options set. Also fix the boot message.

    [akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/]
    Signed-off-by: Andi Kleen
    Acked-by: Mel Gorman
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

13 Jan, 2014

4 commits

  • Remove the deadline specific sysctls for now. The problem with them is
    that the interaction with the existing rt knobs is nearly impossible
    to get right.

    The current (as per before this patch) situation is that the rt and dl
    bandwidth is completely separate and we enforce rt+dl < 100%. This is
    undesirable because this means that the rt default of 95% leaves us
    hardly any room, even though dl tasks are safer than rt tasks.

    Another proposed solution was (a discarded patch) to have the dl
    bandwidth be a fraction of the rt bandwidth. This is highly
    confusing imo.

    Furthermore neither proposal is consistent with the situation we
    actually want; which is rt tasks run from a dl server. In which case
    the rt bandwidth is a direct subset of dl.

    So whichever way we go, the introduction of dl controls at this point
    is painful. Therefore remove them and instead share the rt budget.

    This means that for now the rt knobs are used for dl admission control
    and the dl runtime is accounted against the rt runtime. I realise that
    this isn't entirely desirable either; but whatever we do we appear to
    need to change the interface later, so better have a small interface
    for now.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • For deadline scheduling to be effective and useful, it is
    important to have some method of keeping the allocation of the available
    CPU bandwidth to tasks and task groups under control.
    This is usually called "admission control" and if it is not performed
    at all, no guarantee can be given on the actual scheduling of the
    -deadline tasks.

    Since RT-throttling was introduced, each task group has had a
    bandwidth associated with it, calculated as a certain amount of
    runtime over a period. Moreover, to make it possible to manipulate
    such bandwidth, readable/writable controls have been added to both
    procfs (for system wide settings) and cgroupfs (for per-group
    settings).

    Therefore, the same interface is being used for controlling the
    bandwidth distribution to -deadline tasks and task groups, i.e.,
    new controls but with similar names, equivalent meaning and with
    the same usage paradigm are added.

    However, more discussion is needed in order to figure out how
    we want to manage SCHED_DEADLINE bandwidth at the task group level.
    Therefore, this patch adds a less sophisticated, but actually
    very sensible, mechanism to ensure that a certain utilization
    cap is not exceeded in each root_domain (the single rq for !SMP
    configurations).

    Another main difference between deadline bandwidth management and
    RT-throttling is that -deadline tasks have bandwidth on their own
    (while -rt ones don't!), and thus we don't need a higher level
    throttling mechanism to enforce the desired bandwidth.

    This patch, therefore:

    - adds system wide deadline bandwidth management by means of:
        * /proc/sys/kernel/sched_dl_runtime_us,
        * /proc/sys/kernel/sched_dl_period_us,
      that determine (i.e., runtime / period) the total bandwidth
      available on each CPU of each root_domain for -deadline tasks;

    - couples the RT and deadline bandwidth management, i.e., enforces
      that the sum of the bandwidth devoted to -rt and
      -deadline tasks stays below 100%.

    This means that, for a root_domain comprising M CPUs, -deadline tasks
    can be created as long as the sum of their bandwidths stays below:

    M * (sched_dl_runtime_us / sched_dl_period_us)

    It is also possible to disable this bandwidth management logic, and
    thus be free to oversubscribe the system up to any arbitrary level.

    Signed-off-by: Dario Faggioli
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Dario Faggioli
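The admission test described in the commit above, M * (sched_dl_runtime_us / sched_dl_period_us), can be sketched in userspace with fixed-point arithmetic. The names `dl_root_domain_sketch`, `to_ratio()` and `dl_admit()`, the 20-bit shift, and the choice to admit a task that lands exactly on the cap are all assumptions of this sketch, not details given in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT 20   /* fixed-point fraction bits (assumed) */

/* runtime / period as a fixed-point fraction; 0 if period is 0. */
static uint64_t to_ratio(uint64_t period_us, uint64_t runtime_us)
{
    if (period_us == 0)
        return 0;
    return (runtime_us << BW_SHIFT) / period_us;
}

/* Hypothetical per-root_domain accounting. */
struct dl_root_domain_sketch {
    int cpus;           /* M: CPUs in this root_domain */
    uint64_t cap_bw;    /* per-CPU cap: dl_runtime_us / dl_period_us */
    uint64_t total_bw;  /* sum of bandwidths of admitted tasks */
};

/* Admit a new -deadline task only if the summed bandwidth would not
 * exceed M * cap. Landing exactly on the cap is admitted here, which
 * is a choice made for this sketch. */
static bool dl_admit(struct dl_root_domain_sketch *rd,
                     uint64_t runtime_us, uint64_t period_us)
{
    assert(rd->cpus > 0);
    uint64_t bw = to_ratio(period_us, runtime_us);

    if (rd->total_bw + bw > (uint64_t)rd->cpus * rd->cap_bw)
        return false;
    rd->total_bw += bw;
    return true;
}
```

As an example, with 2 CPUs and a cap of 500000/1000000 per CPU, ten tasks of 10000 us every 100000 us fit and an eleventh is rejected.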
     
  • Some method to deal with rt-mutexes and make sched_dl interact with
    the current PI code is needed, raising far from trivial issues that
    need (according to us) to be solved with some restructuring of
    the pi-code (i.e., going toward a proxy execution-ish implementation).

    This is under development. In the meanwhile, as a temporary solution,
    what this commit does is:

    - ensure a pi-lock owner with waiters is never throttled down. Instead,
    when it runs out of runtime, it immediately gets replenished and its
    deadline is postponed;

    - the scheduling parameters (relative deadline and default runtime)
    used for those replenishments --during the whole period it holds the
    pi-lock-- are the ones of the waiting task with earliest deadline.

    Acting this way, we provide some kind of boosting to the lock-owner,
    still by using the existing (actually, slightly modified by the previous
    commit) pi-architecture.

    We would stress the fact that this is only a surely needed, far from
    clean solution to the problem. In the end it's only a way to re-start
    discussion within the community. So, as always, comments, ideas, rants,
    etc. are welcome! :-)

    Signed-off-by: Dario Faggioli
    Signed-off-by: Juri Lelli
    [ Added !RT_MUTEXES build fix. ]
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Dario Faggioli
     
  • Introduces the data structures, constants and symbols needed for
    SCHED_DEADLINE implementation.

    Core data structures of SCHED_DEADLINE are defined, along with their
    initializers. Hooks for checking if a task belongs to the new policy
    are also added where they are needed.

    Adds a scheduling class, in sched/dl.c and a new policy called
    SCHED_DEADLINE. It is an implementation of the Earliest Deadline
    First (EDF) scheduling algorithm, augmented with a mechanism (called
    Constant Bandwidth Server, CBS) that makes it possible to isolate
    the behaviour of tasks from each other.

    The typical -deadline task will be made up of a computation phase
    (instance) which is activated in a periodic or sporadic fashion. The
    expected (maximum) duration of such computation is called the task's
    runtime; the time interval by which each instance needs to be completed
    is called the task's relative deadline. The task's absolute deadline
    is dynamically calculated as the time instant a task (better, an
    instance) activates plus the relative deadline.

    The EDF algorithm selects the task with the smallest absolute
    deadline as the one to be executed first, while the CBS ensures each
    task runs for at most its runtime every (relative) deadline
    length time interval, avoiding any interference between different
    tasks (bandwidth isolation).
    Thanks to this feature, tasks that do not strictly comply with
    the computational model sketched above can also effectively use the new
    policy.

    To summarize, this patch:
    - introduces the data structures, constants and symbols needed;
    - implements the core logic of the scheduling algorithm in the new
    scheduling class file;
    - provides all the glue code between the new scheduling class and
    the core scheduler and refines the interactions between sched/dl
    and the other existing scheduling classes.

    Signed-off-by: Dario Faggioli
    Signed-off-by: Michael Trimarchi
    Signed-off-by: Fabio Checconi
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Dario Faggioli
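The EDF selection rule from the commit above (absolute deadline = activation time + relative deadline; run the task with the smallest absolute deadline) can be sketched as follows. `struct dl_task_sketch`, `dl_activate()` and `edf_pick()` are hypothetical names, a linear scan is used purely for brevity, and CBS runtime enforcement is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task deadline parameters. */
struct dl_task_sketch {
    uint64_t runtime;       /* expected maximum computation per instance */
    uint64_t rel_deadline;  /* interval in which an instance must finish */
    uint64_t abs_deadline;  /* activation time + relative deadline */
};

/* A new instance activates at time 'now'. */
static void dl_activate(struct dl_task_sketch *t, uint64_t now)
{
    t->abs_deadline = now + t->rel_deadline;
}

/* EDF: pick the task with the earliest absolute deadline.
 * The signed-difference comparison keeps working across counter
 * wraparound. Returns NULL when there are no tasks. */
static struct dl_task_sketch *edf_pick(struct dl_task_sketch *tasks, size_t n)
{
    assert(tasks != NULL || n == 0);
    struct dl_task_sketch *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL ||
            (int64_t)(tasks[i].abs_deadline - best->abs_deadline) < 0)
            best = &tasks[i];
    }
    return best;
}
```

Note that a task activated later can still win: one activated at time 50 with a relative deadline of 30 (absolute 80) is picked ahead of one activated at time 0 with a relative deadline of 100.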
     

17 Dec, 2013

1 commit

  • Commit 887c290e (sched/numa: Decide whether to favour task or group weights
    based on swap candidate relationships) dropped the check against
    sysctl_numa_balancing_settle_count; this patch removes the sysctl.

    Signed-off-by: Wanpeng Li
    Acked-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Acked-by: David Rientjes
    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Naoya Horiguchi
    Link: http://lkml.kernel.org/r/1386833006-6600-1-git-send-email-liwanp@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Wanpeng Li
     

06 Nov, 2013

1 commit


09 Oct, 2013

1 commit

  • With scan rate adaptations based on whether the workload has properly
    converged or not, there should be no need for the scan period reset
    hammer. Get rid of it.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Srikar Dronamraju
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1381141781-10992-60-git-send-email-mgorman@suse.de
    Signed-off-by: Ingo Molnar

    Mel Gorman
     

23 Sep, 2013

1 commit

  • As 'sysctl_hung_task_check_count' is 'unsigned long', when this
    value is assigned to max_count in check_hung_uninterruptible_tasks(),
    it's truncated to 'int' type.

    This causes a minor artifact: if we write 2^32 to sysctl.hung_task_check_count,
    hung task detection will be effectively disabled.

    With this fix, it will still truncate the user input to 32 bits, but
    reading sysctl.hung_task_check_count reflects the actual truncated value.

    Signed-off-by: Li Zefan
    Acked-by: Ingo Molnar
    Link: http://lkml.kernel.org/r/523FFF4E.9050401@huawei.com
    Signed-off-by: Ingo Molnar

    Li Zefan
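The artifact described above can be reproduced in a few lines of userspace C. `truncate_to_int()` and `tasks_scanned()` are hypothetical helpers: the first models assigning a 64-bit sysctl value to an int by keeping only the low 32 bits, the second models a scan loop that stops once its budget reaches zero. The loop shape is an assumption about how max_count is consumed.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits, as happens when a 64-bit value is stored
 * in a 32-bit int on common platforms. Values used here stay within
 * the non-negative int range after truncation. */
static int truncate_to_int(uint64_t v)
{
    return (int)(uint32_t)v;
}

/* Scan up to max_count tasks out of ntasks; returns how many were
 * actually examined. A budget of 0 stops before the first task. */
static int tasks_scanned(int max_count, int ntasks)
{
    assert(ntasks >= 0);
    int scanned = 0;

    for (int i = 0; i < ntasks; i++) {
        if (!max_count--)
            break;
        scanned++;
    }
    return scanned;
}
```

Writing 2^32 therefore yields a budget of 0 and no task is ever examined, which is the "effectively disabled" behaviour the message mentions.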
     

23 Feb, 2013

1 commit


08 Feb, 2013

3 commits

  • Move rt scheduler definitions out of include/linux/sched.h into
    new file include/linux/sched/rt.h

    Signed-off-by: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
    Signed-off-by: Ingo Molnar

    Clark Williams
     
  • Add a /proc/sys/kernel scheduler knob named
    sched_rr_timeslice_ms that allows global changing of the
    SCHED_RR timeslice value. The user-visible value is in milliseconds
    but is stored as jiffies. Setting to 0 (zero) resets to the
    default (currently 100ms).

    Signed-off-by: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20130207094704.13751796@riff.lan
    Signed-off-by: Ingo Molnar

    Clark Williams
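The milliseconds-to-jiffies handling described above can be sketched as below. The tick rate `HZ_SKETCH` of 250, the round-up conversion, and treating negative input like zero are assumptions made for this example; `rr_timeslice_from_sysctl()` is a hypothetical name.

```c
#include <assert.h>

#define HZ_SKETCH 250   /* assumed tick rate: 250 jiffies per second */

/* Default SCHED_RR timeslice of 100 ms, expressed in jiffies. */
#define RR_TIMESLICE_DEFAULT (100 * HZ_SKETCH / 1000)

/* Convert milliseconds to jiffies, rounding up so that a small
 * non-zero request never becomes a zero-length slice. */
static int ms_to_jiffies(int ms)
{
    assert(ms >= 0);
    return (ms * HZ_SKETCH + 999) / 1000;
}

/* Value written by the user (ms) to value stored internally (jiffies).
 * Zero resets to the default; negative input is treated the same way
 * in this sketch. */
static int rr_timeslice_from_sysctl(int ms)
{
    return ms <= 0 ? RR_TIMESLICE_DEFAULT : ms_to_jiffies(ms);
}
```

Because the stored unit is the jiffy, the effective granularity depends on the tick rate: at 250 Hz one jiffy is 4 ms, so a request of 10 ms is stored as 3 jiffies.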
     
  • Move the sysctl-related bits from include/linux/sched.h into
    a new file: include/linux/sched/sysctl.h. Then update source
    files requiring access to those bits by including the new
    header file.

    Signed-off-by: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20130207094659.06dced96@riff.lan
    Signed-off-by: Ingo Molnar

    Clark Williams