24 Nov, 2014

1 commit

  • Chris bisected a NULL pointer dereference in task_sched_runtime() to
    commit 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
    inconsistency'.

    Chris observed crashes in atop or other /proc walking programs when he
    started fork bombs on his machine. He assumed that this is a new exit
    race, but that does not make any sense when looking at that commit.

    What's interesting is that the commit provides update_curr callbacks
    for all scheduling classes except stop_task and idle_task.

    While nothing can ever hit that via the clock_nanosleep() and
    clock_gettime() interfaces, which have been the target of the commit in
    question, the author obviously forgot that there are other code paths
    which invoke task_sched_runtime():

    do_task_stat()
      thread_group_cputime_adjusted()
        thread_group_cputime()
          task_cputime()
            task_sched_runtime()
              if (task_current(rq, p) && task_on_rq_queued(p)) {
                      update_rq_clock(rq);
                      p->sched_class->update_curr(rq);
              }

    If the stats are read for a stop machine task, aka 'migration/N', and
    that task is current on its cpu, this will happily call the NULL pointer
    of stop_task->update_curr. Ooops.

    Chris' observation that this happens faster when he runs the fork bomb
    makes sense, as the fork bomb kicks the migration threads more often and
    so increases the probability of hitting the issue.

    Add the missing update_curr callbacks to the scheduler classes stop_task
    and idle_task. While idle tasks cannot be monitored via /proc, we have
    other means to hit the idle case.
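
    As a minimal user-space sketch of the idea (not the kernel code itself):
    struct rq and struct sched_class below are simplified stand-ins, and the
    names update_curr_noop(), refresh_runtime() and the update_calls counter
    are hypothetical. It shows why giving every class a callback, even an
    empty one, makes the unconditional call safe.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct rq. */
struct rq {
    unsigned long update_calls;   /* hypothetical counter, for illustration */
};

/* Simplified stand-in for the kernel's struct sched_class. */
struct sched_class {
    const char *name;
    void (*update_curr)(struct rq *rq);
};

/* A class that does real runtime accounting. */
static void update_curr_fair(struct rq *rq)
{
    rq->update_calls++;
}

/* No-op callback: nothing to account, but safe to call. */
static void update_curr_noop(struct rq *rq)
{
    (void)rq;
}

static const struct sched_class fair_class = { "fair", update_curr_fair };
static const struct sched_class stop_class = { "stop", update_curr_noop };
static const struct sched_class idle_class = { "idle", update_curr_noop };

/* Mirrors the unconditional indirect call in task_sched_runtime(). */
static void refresh_runtime(const struct sched_class *sc, struct rq *rq)
{
    sc->update_curr(rq);
}
```

    With a NULL update_curr in stop_class, the call in refresh_runtime()
    would jump through a NULL function pointer; with the no-op it simply
    returns.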

    Fixes: 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
    Reported-by: Chris Mason
    Reported-and-tested-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Stanislaw Gruszka
    Cc: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

20 Aug, 2014

1 commit

  • Implement task_on_rq_queued() and use it everywhere instead of
    open-coded on_rq checks. No functional changes.

    The only exception is check_for_tasks(), where we do not use the
    wrapper because that would require exporting task_on_rq_queued()
    in global header files. The next patch in the series brings it
    back, so we avoid moving it back and forth.
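
    A minimal sketch of such a wrapper, assuming it reduces to a test of
    the on_rq field; struct task_struct here is a one-field stand-in, not
    the kernel definition.

```c
/* Simplified stand-in for the kernel's struct task_struct. */
struct task_struct {
    int on_rq;   /* non-zero while the task sits on a runqueue */
};

/* Wrapper so that callers no longer test p->on_rq directly. */
static inline int task_on_rq_queued(const struct task_struct *p)
{
    return p->on_rq != 0;
}
```

    Routing every check through one helper means the encoding of on_rq can
    change later without touching the callers.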

    Signed-off-by: Kirill Tkhai
    Cc: Peter Zijlstra
    Cc: Paul Turner
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Mike Galbraith
    Cc: Kirill Tkhai
    Cc: Tim Chen
    Cc: Nicolas Pitre
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhai
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     

22 May, 2014

1 commit

  • Sometimes ->nr_running may cross 2 without an interrupt being
    sent to the rq's cpu. In that case the timer is not re-enabled.
    This looks like a possible reason for rare unexpected effects
    when nohz is enabled.

    The patch replaces all direct modifications of nr_running and
    makes add_nr_running() take care of the border crossing.
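
    The border-crossing check can be sketched in user space as follows.
    This is an illustration under assumptions: struct rq is a stand-in,
    and kick_tick() with its tick_kicks counter is a hypothetical
    placeholder for whatever re-enables the timer on the rq's cpu.

```c
/* Simplified stand-in for the kernel's struct rq. */
struct rq {
    unsigned int nr_running;
    int tick_kicks;   /* hypothetical: counts requests to re-enable the tick */
};

/* Hypothetical placeholder for notifying the rq's cpu. */
static void kick_tick(struct rq *rq)
{
    rq->tick_kicks++;
}

/* Single place that raises nr_running and notices the 1 -> 2 crossing. */
static void add_nr_running(struct rq *rq, unsigned int count)
{
    unsigned int prev_nr = rq->nr_running;

    rq->nr_running = prev_nr + count;
    if (prev_nr < 2 && rq->nr_running >= 2)
        kick_tick(rq);
}

/* Counterpart for removing tasks; no crossing check is needed here. */
static void sub_nr_running(struct rq *rq, unsigned int count)
{
    rq->nr_running -= count;
}
```

    Because every increment goes through add_nr_running(), no caller can
    cross the threshold without the check running.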

    Signed-off-by: Kirill Tkhai
    Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140508225830.2469.97461.stgit@localhost
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     

22 Feb, 2014

1 commit

  • Dan Carpenter reported:

    > kernel/sched/rt.c:1347 pick_next_task_rt() warn: variable dereferenced before check 'prev' (see line 1338)
    > kernel/sched/deadline.c:1011 pick_next_task_dl() warn: variable dereferenced before check 'prev' (see line 1005)

    Kirill also spotted that migrate_tasks() will have an instant NULL
    deref because pick_next_task() will immediately deref prev.

    Instead of fixing all the corner cases because migrate_tasks() can
    pass in a NULL prev task in the unlikely case of hot-un-plug, provide
    a fake task such that we can remove all the NULL checks from the far
    more common paths.

    A further problem, not previously spotted, is that because we pushed
    pre_schedule() and idle_balance() into pick_next_task() we now need to
    avoid those getting called and pulling more tasks onto our dying CPU.

    We avoid pull_{dl,rt}_task() by setting fake_task.prio to MAX_PRIO+1.
    We also note that since we call pick_next_task() exactly as many
    times as we have runnable tasks present, we should never land in
    idle_balance().

    Fixes: 38033c37faab ("sched: Push down pre_schedule() and idle_balance()")
    Cc: Juri Lelli
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Reported-by: Kirill Tkhai
    Reported-by: Dan Carpenter
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140212094930.GB3545@laptop.programming.kicks-ass.net
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

10 Feb, 2014

1 commit

  • In order to avoid having to do put/set on a whole cgroup hierarchy
    when we context switch, push the put into pick_next_task() so that
    both operations are in the same function. Further changes then allow
    us to possibly optimize away redundant work.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Jan, 2014

1 commit

  • Introduces the data structures, constants and symbols needed for
    SCHED_DEADLINE implementation.

    The core data structures of SCHED_DEADLINE are defined, along with their
    initializers. Hooks for checking whether a task belongs to the new policy
    are also added where they are needed.

    Adds a scheduling class, in sched/dl.c, and a new policy called
    SCHED_DEADLINE. It is an implementation of the Earliest Deadline
    First (EDF) scheduling algorithm, augmented with a mechanism (called
    Constant Bandwidth Server, CBS) that makes it possible to isolate
    the behaviour of tasks from each other.

    The typical -deadline task will be made up of a computation phase
    (instance) which is activated in a periodic or sporadic fashion. The
    expected (maximum) duration of such computation is called the task's
    runtime; the time interval within which each instance needs to be
    completed is called the task's relative deadline. The task's absolute
    deadline is dynamically calculated as the time instant a task (better,
    an instance) activates plus the relative deadline.

    The EDF algorithm selects the task with the smallest absolute
    deadline as the one to be executed first, while the CBS ensures that
    each task runs for at most its runtime in every time interval of
    (relative) deadline length, avoiding any interference between
    different tasks (bandwidth isolation).
    Thanks to this feature, tasks that do not strictly comply with
    the computational model sketched above can also use the new policy
    effectively.
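
    The EDF selection rule and the CBS budget rule described above can be
    sketched as follows. This is a simplified illustration, not the
    sched/dl.c implementation: struct dl_task, abs_deadline(), edf_pick()
    and cbs_charge() are hypothetical names, and replenishment of the
    budget is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task parameters, all in the same time unit. */
struct dl_task {
    uint64_t activation;     /* instant the current instance activated */
    uint64_t rel_deadline;   /* relative deadline */
    uint64_t budget;         /* runtime still available to this instance */
};

/* Absolute deadline = activation instant + relative deadline. */
static uint64_t abs_deadline(const struct dl_task *t)
{
    return t->activation + t->rel_deadline;
}

/* EDF: pick the task with the smallest absolute deadline (NULL if none). */
static const struct dl_task *edf_pick(const struct dl_task *tasks, size_t n)
{
    const struct dl_task *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!best || abs_deadline(&tasks[i]) < abs_deadline(best))
            best = &tasks[i];
    }
    return best;
}

/*
 * CBS, simplified: charge executed time against the budget.
 * Returns 1 once the budget is exhausted, meaning the task must not
 * run again until its budget is replenished.
 */
static int cbs_charge(struct dl_task *t, uint64_t ran)
{
    t->budget = (ran >= t->budget) ? 0 : t->budget - ran;
    return t->budget == 0;
}
```

    The budget check is what provides isolation in this sketch: a task that
    overruns its declared runtime is stopped rather than being allowed to
    delay tasks with later deadlines.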

    To summarize, this patch:
    - introduces the data structures, constants and symbols needed;
    - implements the core logic of the scheduling algorithm in the new
      scheduling class file;
    - provides all the glue code between the new scheduling class and
      the core scheduler and refines the interactions between sched/dl
      and the other existing scheduling classes.

    Signed-off-by: Dario Faggioli
    Signed-off-by: Michael Trimarchi
    Signed-off-by: Fabio Checconi
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Dario Faggioli
     

09 Oct, 2013

1 commit

  • Use the new stop_two_cpus() to implement migrate_swap(), a function that
    flips two tasks between their respective cpus.

    I'm fairly sure there's a less crude way than employing the stop_two_cpus()
    method, but everything I tried either got horribly fragile and/or complex. So
    keep it simple for now.

    The notable detail is how we 'migrate' tasks that aren't runnable
    anymore. We'll make it appear like we migrated them before they went to
    sleep. The sole difference is the previous cpu in the wakeup path, so we
    override this.
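
    The handling of runnable versus sleeping tasks can be sketched like
    this. It is an illustration only: struct task is a stand-in, the field
    names cpu, wake_cpu and on_rq are assumptions made for the example, and
    the stop_two_cpus() synchronisation is omitted entirely.

```c
/* Simplified stand-in for a task; field names are assumptions. */
struct task {
    int cpu;        /* cpu the task runs on, or last ran on */
    int wake_cpu;   /* cpu the wakeup path treats as the previous cpu */
    int on_rq;      /* non-zero if the task is currently runnable */
};

/* Move one task: really migrate it, or only redirect its next wakeup. */
static void migrate_swap_task(struct task *p, int cpu)
{
    if (p->on_rq)
        p->cpu = cpu;        /* runnable: move it to the other cpu */
    else
        p->wake_cpu = cpu;   /* sleeping: pretend it slept over there */
}

/* Flip two tasks between their respective cpus. */
static void migrate_swap(struct task *a, struct task *b)
{
    int a_cpu = a->cpu;
    int b_cpu = b->cpu;

    migrate_swap_task(a, b_cpu);
    migrate_swap_task(b, a_cpu);
}
```

    For a sleeping task only the previous cpu seen at wakeup changes, which
    matches the "sole difference" mentioned above.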

    Signed-off-by: Peter Zijlstra
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Srikar Dronamraju
    Signed-off-by: Mel Gorman
    Link: http://lkml.kernel.org/r/1381141781-10992-39-git-send-email-mgorman@suse.de
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 May, 2013

1 commit

  • Read the runqueue clock through an accessor. This
    prepares for adding a debugging infrastructure to
    detect missing or redundant calls to update_rq_clock()
    between a scheduler's entry and exit point.
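
    A minimal sketch of what such accessors could look like, assuming the
    runqueue keeps its clock values in plain fields; struct rq here is a
    two-field stand-in for the kernel structure.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct rq. */
struct rq {
    uint64_t clock;        /* runqueue clock */
    uint64_t clock_task;   /* clock as seen by task accounting */
};

/* Readers go through these helpers instead of touching the fields. */
static inline uint64_t rq_clock(const struct rq *rq)
{
    return rq->clock;
}

static inline uint64_t rq_clock_task(const struct rq *rq)
{
    return rq->clock_task;
}
```

    Funnelling all reads through one place is what later allows a debug
    check to be added there without touching the callers.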

    Signed-off-by: Frederic Weisbecker
    Cc: Li Zhong
    Cc: Steven Rostedt
    Cc: Paul Turner
    Cc: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1365724262-20142-6-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

14 Aug, 2012

1 commit

  • Make stop scheduler class do the same accounting as other classes.

    Migration threads can be caught in the act while doing exec balancing,
    leading to the below due to use of unmaintained ->se.exec_start. The
    load that triggered this particular instance was an apparently out of
    control heavily threaded application that does system monitoring in
    what equated to an exec bomb, with one of the VERY frequently migrated
    tasks being ps.

    %CPU PID USER CMD
    99.3 45 root [migration/10]
    97.7 53 root [migration/12]
    97.0 57 root [migration/13]
    90.1 49 root [migration/11]
    89.6 65 root [migration/15]
    88.7 17 root [migration/3]
    80.4 37 root [migration/8]
    78.1 41 root [migration/9]
    44.2 13 root [migration/2]
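
    The accounting in question can be sketched as follows, as a user-space
    illustration rather than the actual patch: struct sched_entity is
    reduced to two fields, and stop_task_picked() and stop_task_put() are
    hypothetical names for the points where the task is switched in and
    out.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct sched_entity. */
struct sched_entity {
    uint64_t exec_start;         /* clock value when the task last started */
    uint64_t sum_exec_runtime;   /* total accounted runtime */
};

/* Task is switched in: record the start of this execution slice. */
static void stop_task_picked(struct sched_entity *se, uint64_t now)
{
    se->exec_start = now;
}

/* Task is switched out: account the slice that just ended. */
static void stop_task_put(struct sched_entity *se, uint64_t now)
{
    int64_t delta_exec = (int64_t)(now - se->exec_start);

    if (delta_exec < 0)
        delta_exec = 0;   /* guard against a clock that appears to go back */

    se->sum_exec_runtime += (uint64_t)delta_exec;
    se->exec_start = now;
}
```

    If exec_start is never refreshed when the task is switched in, each
    delta is computed against a stale timestamp, which is consistent with
    the inflated %CPU figures shown above.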

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     

17 Nov, 2011

1 commit