23 Feb, 2014

3 commits

  • Currently there is lots of hard coding to 19 and -20, to represent
    maximum and minimum of nice values.

    This patch add three macros in prio.h for maximum, minimum and width
    of nice value, and uses it to remove hardcoded values in prio.h.

    Signed-off-by: Dongsheng Yang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/3994e89327b2b15f992277cdf9f409c516f87d1b.1392103744.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Thomas Gleixner
    [ Collapsed two small patches. ]
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • There is already a macro named DEFAULT_PRIO in prio.h, we can use it
    to define NICE_TO_PRIO and PRIO_TO_NICE rather than use hard coding
    of (MAX_RT_PRIO + 20).

    Signed-off-by: Dongsheng Yang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4e28ec36fb49e8906027cbbdd900ab26a149905e.1392103744.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • If a PI boosted task policy/priority is modified by a setscheduler()
    call we unconditionally dequeue and requeue the task if it is on the
    runqueue even if the new priority is lower than the current effective
    boosted priority. This can result in undesired reordering of the
    priority bucket list.

    If the new priority is less or equal than the current effective we
    just store the new parameters in the task struct and leave the
    scheduler class and the runqueue untouched. This is handled when the
    task deboosts itself. Only if the new priority is higher than the
    effective boosted priority we apply the change immediately.

    Signed-off-by: Thomas Gleixner
    [ Rebase ontop of v3.14-rc1. ]
    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Dario Faggioli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1391803122-4425-7-git-send-email-bigeasy@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

09 Feb, 2014

3 commits

  • As patch "sched: Move the priority specific bits into a new header file" exposes
    the priority related macros in linux/sched/prio.h, we don't have to implement
    task_nice() in kernel/sched/core.c any more.

    This patch implements it in linux/sched/sched.h as static inline function,
    saving the kernel stack and enhancing performance a bit.

    Signed-off-by: Dongsheng Yang
    Cc: clark.williams@gmail.com
    Cc: rostedt@goodmis.org
    Cc: raistlin@linux.it
    Cc: juri.lelli@gmail.com
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1390878045-7096-1-git-send-email-yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • Some macros in kernel/sched/sched.h about priority are
    private to kernel/sched. But they are useful to other
    parts of the core kernel.

    This patch moves these macros from kernel/sched/sched.h to
    include/linux/sched/prio.h so that they are available to
    other subsystems.

    Signed-off-by: Dongsheng Yang
    Cc: raistlin@linux.it
    Cc: juri.lelli@gmail.com
    Cc: clark.williams@gmail.com
    Cc: rostedt@goodmis.org
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/2b022810905b52d13238466807f4b2a691577180.1390859827.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • Some bits about priority are defined in linux/sched/rt.h, but
    some of them are not only for rt scheduler, such as MAX_PRIO.

    This patch move them all into a new header file, linux/sched/prio.h.

    Signed-off-by: Dongsheng Yang
    Cc: clark.williams@gmail.com
    Cc: rostedt@goodmis.org
    Cc: raistlin@linux.it
    Cc: juri.lelli@gmail.com
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/f7549508a1588da2c613d601748ca9de30fa5dcf.1390859827.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     

01 Feb, 2014

1 commit

  • Pull core debug changes from Ingo Molnar:
    "This contains mostly kernel debugging related updates:

    - make hung_task detection more configurable to distros
    - add final bits for x86 UV NMI debugging, with related KGDB changes
    - update the mailing-list of MAINTAINERS entries I'm involved with"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    hung_task: Display every hung task warning
    sysctl: Add neg_one as a standard constraint
    x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
    x86/uv/nmi: Fix Sparse warnings
    kgdb/kdb: Fix no KDB config problem
    MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries

    Linus Torvalds
     

25 Jan, 2014

1 commit

  • When khungtaskd detects hung tasks, it prints out
    backtraces from a number of those tasks.

    Limiting the number of backtraces being printed
    out can result in the user not seeing the information
    necessary to debug the issue. The hung_task_warnings
    sysctl controls this feature.

    This patch makes it possible for hung_task_warnings
    to accept a special value to print an unlimited
    number of backtraces when khungtaskd detects hung
    tasks.

    The special value is -1. To use this value it is
    necessary to change types from ulong to int.

    Signed-off-by: Aaron Tomlin
    Reviewed-by: Rik van Riel
    Acked-by: David Rientjes
    Cc: oleg@redhat.com
    Link: http://lkml.kernel.org/r/1390239253-24030-3-git-send-email-atomlin@redhat.com
    [ Build warning fix. ]
    Signed-off-by: Ingo Molnar

    Aaron Tomlin
     

24 Jan, 2014

1 commit

  • Add a working sysctl to enable/disable automatic numa memory balancing
    at runtime.

    This allows us to track down performance problems with this feature and
    is generally a good idea.

    This was possible earlier through debugfs, but only with special
    debugging options set. Also fix the boot message.

    [akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/]
    Signed-off-by: Andi Kleen
    Acked-by: Mel Gorman
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

13 Jan, 2014

4 commits

  • Remove the deadline specific sysctls for now. The problem with them is
    that the interaction with the exisiting rt knobs is nearly impossible
    to get right.

    The current (as per before this patch) situation is that the rt and dl
    bandwidth is completely separate and we enforce rt+dl < 100%. This is
    undesirable because this means that the rt default of 95% leaves us
    hardly any room, even though dl tasks are saver than rt tasks.

    Another proposed solution was (a discarted patch) to have the dl
    bandwidth be a fraction of the rt bandwidth. This is highly
    confusing imo.

    Furthermore neither proposal is consistent with the situation we
    actually want; which is rt tasks ran from a dl server. In which case
    the rt bandwidth is a direct subset of dl.

    So whichever way we go, the introduction of dl controls at this point
    is painful. Therefore remove them and instead share the rt budget.

    This means that for now the rt knobs are used for dl admission control
    and the dl runtime is accounted against the rt runtime. I realise that
    this isn't entirely desirable either; but whatever we do we appear to
    need to change the interface later, so better have a small interface
    for now.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In order of deadline scheduling to be effective and useful, it is
    important that some method of having the allocation of the available
    CPU bandwidth to tasks and task groups under control.
    This is usually called "admission control" and if it is not performed
    at all, no guarantee can be given on the actual scheduling of the
    -deadline tasks.

    Since when RT-throttling has been introduced each task group have a
    bandwidth associated to itself, calculated as a certain amount of
    runtime over a period. Moreover, to make it possible to manipulate
    such bandwidth, readable/writable controls have been added to both
    procfs (for system wide settings) and cgroupfs (for per-group
    settings).

    Therefore, the same interface is being used for controlling the
    bandwidth distrubution to -deadline tasks and task groups, i.e.,
    new controls but with similar names, equivalent meaning and with
    the same usage paradigm are added.

    However, more discussion is needed in order to figure out how
    we want to manage SCHED_DEADLINE bandwidth at the task group level.
    Therefore, this patch adds a less sophisticated, but actually
    very sensible, mechanism to ensure that a certain utilization
    cap is not overcome per each root_domain (the single rq for !SMP
    configurations).

    Another main difference between deadline bandwidth management and
    RT-throttling is that -deadline tasks have bandwidth on their own
    (while -rt ones doesn't!), and thus we don't need an higher level
    throttling mechanism to enforce the desired bandwidth.

    This patch, therefore:

    - adds system wide deadline bandwidth management by means of:
    * /proc/sys/kernel/sched_dl_runtime_us,
    * /proc/sys/kernel/sched_dl_period_us,
    that determine (i.e., runtime / period) the total bandwidth
    available on each CPU of each root_domain for -deadline tasks;

    - couples the RT and deadline bandwidth management, i.e., enforces
    that the sum of how much bandwidth is being devoted to -rt
    -deadline tasks to stay below 100%.

    This means that, for a root_domain comprising M CPUs, -deadline tasks
    can be created until the sum of their bandwidths stay below:

    M * (sched_dl_runtime_us / sched_dl_period_us)

    It is also possible to disable this bandwidth management logic, and
    be thus free of oversubscribing the system up to any arbitrary level.

    Signed-off-by: Dario Faggioli
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Dario Faggioli
     
  • Some method to deal with rt-mutexes and make sched_dl interact with
    the current PI-coded is needed, raising all but trivial issues, that
    needs (according to us) to be solved with some restructuring of
    the pi-code (i.e., going toward a proxy execution-ish implementation).

    This is under development, in the meanwhile, as a temporary solution,
    what this commits does is:

    - ensure a pi-lock owner with waiters is never throttled down. Instead,
    when it runs out of runtime, it immediately gets replenished and it's
    deadline is postponed;

    - the scheduling parameters (relative deadline and default runtime)
    used for that replenishments --during the whole period it holds the
    pi-lock-- are the ones of the waiting task with earliest deadline.

    Acting this way, we provide some kind of boosting to the lock-owner,
    still by using the existing (actually, slightly modified by the previous
    commit) pi-architecture.

    We would stress the fact that this is only a surely needed, all but
    clean solution to the problem. In the end it's only a way to re-start
    discussion within the community. So, as always, comments, ideas, rants,
    etc.. are welcome! :-)

    Signed-off-by: Dario Faggioli
    Signed-off-by: Juri Lelli
    [ Added !RT_MUTEXES build fix. ]
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Dario Faggioli
     
  • Introduces the data structures, constants and symbols needed for
    SCHED_DEADLINE implementation.

    Core data structure of SCHED_DEADLINE are defined, along with their
    initializers. Hooks for checking if a task belong to the new policy
    are also added where they are needed.

    Adds a scheduling class, in sched/dl.c and a new policy called
    SCHED_DEADLINE. It is an implementation of the Earliest Deadline
    First (EDF) scheduling algorithm, augmented with a mechanism (called
    Constant Bandwidth Server, CBS) that makes it possible to isolate
    the behaviour of tasks between each other.

    The typical -deadline task will be made up of a computation phase
    (instance) which is activated on a periodic or sporadic fashion. The
    expected (maximum) duration of such computation is called the task's
    runtime; the time interval by which each instance need to be completed
    is called the task's relative deadline. The task's absolute deadline
    is dynamically calculated as the time instant a task (better, an
    instance) activates plus the relative deadline.

    The EDF algorithms selects the task with the smallest absolute
    deadline as the one to be executed first, while the CBS ensures each
    task to run for at most its runtime every (relative) deadline
    length time interval, avoiding any interference between different
    tasks (bandwidth isolation).
    Thanks to this feature, also tasks that do not strictly comply with
    the computational model sketched above can effectively use the new
    policy.

    To summarize, this patch:
    - introduces the data structures, constants and symbols needed;
    - implements the core logic of the scheduling algorithm in the new
    scheduling class file;
    - provides all the glue code between the new scheduling class and
    the core scheduler and refines the interactions between sched/dl
    and the other existing scheduling classes.

    Signed-off-by: Dario Faggioli
    Signed-off-by: Michael Trimarchi
    Signed-off-by: Fabio Checconi
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Dario Faggioli
     

17 Dec, 2013

1 commit

  • commit 887c290e (sched/numa: Decide whether to favour task or group weights
    based on swap candidate relationships) drop the check against
    sysctl_numa_balancing_settle_count, this patch remove the sysctl.

    Signed-off-by: Wanpeng Li
    Acked-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Acked-by: David Rientjes
    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Naoya Horiguchi
    Link: http://lkml.kernel.org/r/1386833006-6600-1-git-send-email-liwanp@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Wanpeng Li
     

06 Nov, 2013

1 commit


09 Oct, 2013

1 commit

  • With scan rate adaptions based on whether the workload has properly
    converged or not there should be no need for the scan period reset
    hammer. Get rid of it.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Srikar Dronamraju
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1381141781-10992-60-git-send-email-mgorman@suse.de
    Signed-off-by: Ingo Molnar

    Mel Gorman
     

23 Sep, 2013

1 commit

  • As 'sysctl_hung_task_check_count' is 'unsigned long' when this
    value is assigned to max_count in check_hung_uninterruptible_tasks(),
    it's truncated to 'int' type.

    This causes a minor artifact: if we write 2^32 to sysctl.hung_task_check_count,
    hung task detection will be effectively disabled.

    With this fix, it will still truncate the user input to 32 bits, but
    reading sysctl.hung_task_check_count reflects the actual truncated value.

    Signed-off-by: Li Zefan
    Acked-by: Ingo Molnar
    Link: http://lkml.kernel.org/r/523FFF4E.9050401@huawei.com
    Signed-off-by: Ingo Molnar

    Li Zefan
     

23 Feb, 2013

1 commit


08 Feb, 2013

3 commits

  • Move rt scheduler definitions out of include/linux/sched.h into
    new file include/linux/sched/rt.h

    Signed-off-by: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
    Signed-off-by: Ingo Molnar

    Clark Williams
     
  • Add a /proc/sys/kernel scheduler knob named
    sched_rr_timeslice_ms that allows global changing of the
    SCHED_RR timeslice value. User visable value is in milliseconds
    but is stored as jiffies. Setting to 0 (zero) resets to the
    default (currently 100ms).

    Signed-off-by: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20130207094704.13751796@riff.lan
    Signed-off-by: Ingo Molnar

    Clark Williams
     
  • Move the sysctl-related bits from include/linux/sched.h into
    a new file: include/linux/sched/sysctl.h. Then update source
    files requiring access to those bits by including the new
    header file.

    Signed-off-by: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20130207094659.06dced96@riff.lan
    Signed-off-by: Ingo Molnar

    Clark Williams