24 Oct, 2012

7 commits

  • Now that the machinery is in place to compute contributed load in a
    bottom-up fashion, replace the shares distribution code within
    update_shares() accordingly.

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141507.061208672@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
     
  • Entities of equal weight should receive an equitable distribution of cpu time.
    This is challenging in the case of a task_group's shares as execution may be
    occurring on multiple cpus simultaneously.

    To handle this we divide up the shares into weights proportionate with the
    load on each cfs_rq. This does not, however, account for the fact that the
    sum of the parts may be less than one cpu, so we need to normalize:

    load(tg) = min(runnable_avg(tg), 1) * tg->shares

    where runnable_avg is the aggregate time in which the task_group had runnable
    children. (A small sketch of this normalization follows this entry.)

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.930124292@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
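
    A minimal userspace C sketch of the normalization above; the function and
    field names, the 1024 fixed-point scale, and the two-cpu example are
    assumptions for illustration, not the kernel's actual shares code:

        /*
         * Distribute a task_group's shares across its per-cpu cfs_rq's in
         * proportion to each cfs_rq's load, then scale the total by
         * min(runnable_avg(tg), 1) so a group that cannot fill one cpu does
         * not receive a full cpu's worth of weight.
         */
        #include <stdio.h>

        #define RUNNABLE_AVG_SCALE 1024   /* fixed point: 1024 == one full cpu */

        static unsigned long tg_effective_load(unsigned long tg_shares,
                                               unsigned long tg_runnable_avg)
        {
            /* load(tg) = min(runnable_avg(tg), 1) * tg->shares */
            if (tg_runnable_avg > RUNNABLE_AVG_SCALE)
                tg_runnable_avg = RUNNABLE_AVG_SCALE;
            return tg_shares * tg_runnable_avg / RUNNABLE_AVG_SCALE;
        }

        static unsigned long cfs_rq_share(unsigned long tg_shares,
                                          unsigned long tg_runnable_avg,
                                          unsigned long cfs_rq_load,
                                          unsigned long tg_total_load)
        {
            unsigned long load = tg_effective_load(tg_shares, tg_runnable_avg);

            if (!tg_total_load)
                return 0;
            /* weight this cfs_rq in proportion to the load it carries */
            return load * cfs_rq_load / tg_total_load;
        }

        int main(void)
        {
            /* a group with shares=1024 that is runnable only half the time */
            unsigned long shares = 1024, runnable = 512;
            unsigned long loads[2] = { 300, 100 };   /* per-cpu cfs_rq load */
            unsigned long total = loads[0] + loads[1];

            for (int cpu = 0; cpu < 2; cpu++)
                printf("cpu%d share = %lu\n", cpu,
                       cfs_rq_share(shares, runnable, loads[cpu], total));
            return 0;
        }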
     
  • Maintain a global running sum of the average load seen on each cfs_rq belonging
    to each task group, so that it may be used in calculating an appropriate
    shares:weight distribution. (A sketch of this bookkeeping follows this entry.)

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.792901086@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
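
    A hedged sketch of the bookkeeping described above: each cfs_rq folds the
    delta of its average load into a group-wide running sum that the shares
    code can read cheaply. The names are illustrative, not the kernel's:

        #include <stdatomic.h>
        #include <stdio.h>

        struct task_group_sk {
            atomic_long load_avg;        /* \Sum of all member cfs_rq load_avg */
        };

        struct cfs_rq_sk {
            long load_avg_contrib;       /* last value folded into the tg sum */
            struct task_group_sk *tg;
        };

        /* fold any change in this cfs_rq's average load into the group sum */
        static void update_tg_load_contrib(struct cfs_rq_sk *cfs_rq, long load_avg)
        {
            long delta = load_avg - cfs_rq->load_avg_contrib;

            if (delta) {
                atomic_fetch_add(&cfs_rq->tg->load_avg, delta);
                cfs_rq->load_avg_contrib = load_avg;
            }
        }

        int main(void)
        {
            struct task_group_sk tg = { 0 };
            struct cfs_rq_sk rq0 = { 0, &tg }, rq1 = { 0, &tg };

            update_tg_load_contrib(&rq0, 700);
            update_tg_load_contrib(&rq1, 300);
            update_tg_load_contrib(&rq0, 500);   /* cpu0's load dropped by 200 */

            printf("tg load_avg = %ld\n", atomic_load(&tg.load_avg));  /* 800 */
            return 0;
        }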
     
  • We are currently maintaining:

    runnable_load(cfs_rq) = \Sum task_load(t)

    for all running children t of cfs_rq. While this can be naturally updated for
    tasks in a runnable state (as they are scheduled), it does not account for
    the load contributed by blocked task entities.

    This can be solved by introducing a separate accounting for blocked load:

    blocked_load(cfs_rq) = \Sum runnable(b) * weight(b)

    Obviously we do not want to iterate over all blocked entities to account for
    their decay; instead we observe that:

    runnable_load(t) = \Sum p_i*y^i

    and that to account for an additional idle period we only need to compute:

    y*runnable_load(t).

    This means that we can compute all blocked entities at once by evaluating:

    blocked_load(cfs_rq)' = y * blocked_load(cfs_rq)

    Finally, we maintain a decay counter so that when a sleeping entity re-awakens
    we can determine how much of its load should be removed from the blocked sum.
    (A rough model of this decay scheme follows this entry.)

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.585389902@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
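
    A rough userspace model of the blocked-load scheme above; the names, the
    use of floating point, and the constant for y are assumptions for
    illustration only:

        #include <stdio.h>
        #include <math.h>

        #define DECAY_Y 0.97857206        /* ~= 0.5^(1/32), so y^32 = 1/2 */

        struct blocked_sk {
            double blocked_load;          /* \Sum decayed load of blocked entities */
            unsigned long decay_count;    /* periods elapsed, for waking entities */
        };

        /* 'periods' further 1024us windows passed with the entities still blocked;
         * decaying the single sum is equivalent to decaying each entity */
        static void decay_blocked(struct blocked_sk *b, unsigned int periods)
        {
            b->blocked_load *= pow(DECAY_Y, periods);
            b->decay_count += periods;
        }

        /* entity goes to sleep: its current load joins the blocked sum */
        static unsigned long entity_block(struct blocked_sk *b, double load)
        {
            b->blocked_load += load;
            return b->decay_count;        /* remember when it blocked */
        }

        /* entity wakes: remove what is left of its decayed contribution */
        static double entity_wake(struct blocked_sk *b, double load,
                                  unsigned long blocked_at)
        {
            double left = load * pow(DECAY_Y, (double)(b->decay_count - blocked_at));

            b->blocked_load -= left;
            return left;                  /* load the entity carries back with it */
        }

        int main(void)
        {
            struct blocked_sk b = { 0.0, 0 };
            unsigned long stamp = entity_block(&b, 1024.0);

            decay_blocked(&b, 32);        /* 32 periods: the blocked sum halves */
            printf("blocked_load = %.1f\n", b.blocked_load);           /* ~512 */
            printf("woken load   = %.1f\n", entity_wake(&b, 1024.0, stamp));
            return 0;
        }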
     
  • For a given task t, we can compute its contribution to load as:

    task_load(t) = runnable_avg(t) * weight(t)

    On a parenting cfs_rq we can then aggregate:

    runnable_load(cfs_rq) = \Sum task_load(t), for all runnable children t

    Maintain this bottom-up, with task entities adding their contributed load to
    the parenting cfs_rq sum. When a task entity's load changes, we add the same
    delta to the maintained sum. (A minimal sketch of this delta propagation
    follows this entry.)

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.514678907@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
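
    A minimal sketch of the delta propagation described above; the structures
    and names are illustrative, not the kernel's:

        #include <stdio.h>

        struct cfs_rq_sk {
            long runnable_load;        /* \Sum task_load(t) over runnable t */
        };

        struct task_sk {
            long weight;
            long runnable_avg;         /* fixed point: 1024 == always runnable */
            long load_contrib;         /* last value added to the parent sum */
            struct cfs_rq_sk *parent;
        };

        static void update_task_load(struct task_sk *t)
        {
            long contrib = t->weight * t->runnable_avg / 1024;
            long delta = contrib - t->load_contrib;

            t->parent->runnable_load += delta;   /* propagate only the delta */
            t->load_contrib = contrib;
        }

        int main(void)
        {
            struct cfs_rq_sk rq = { 0 };
            struct task_sk a = { 1024, 1024, 0, &rq };   /* always runnable */
            struct task_sk b = { 1024,  256, 0, &rq };   /* runnable 25% of the time */

            update_task_load(&a);
            update_task_load(&b);
            printf("runnable_load = %ld\n", rq.runnable_load);   /* 1280 */

            b.runnable_avg = 512;                        /* b became busier */
            update_task_load(&b);
            printf("runnable_load = %ld\n", rq.runnable_load);   /* 1536 */
            return 0;
        }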
     
  • Since runqueues do not have a corresponding sched_entity, we instead embed a
    sched_avg structure directly. (An illustrative layout follows this entry.)

    Signed-off-by: Ben Segall
    Reviewed-by: Paul Turner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.442637130@google.com
    Signed-off-by: Ingo Molnar

    Ben Segall
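
    An illustrative layout only; the field names are assumptions rather than
    the kernel's definitions:

        struct sched_avg_sk {
            unsigned long runnable_avg_sum;
            unsigned long runnable_avg_period;
        };

        struct sched_entity_sk {
            unsigned long weight;
            struct sched_avg_sk avg;    /* tasks track their average here */
        };

        struct cfs_rq_sk {
            unsigned long runnable_load;
            struct sched_avg_sk avg;    /* no backing sched_entity, so embed one */
        };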
     
  • Instead of tracking and averaging the load parented by a cfs_rq, we can track
    entity load directly, with the load for a given cfs_rq then being the sum of
    its children.

    To do this we represent the historical contribution to runnable average
    within each trailing 1024us of execution as the coefficients of a
    geometric series.

    We can express this for a given task t as:

    runnable_sum(t) = \Sum u_i * y^i
    runnable_avg_period(t) = \Sum 1024 * y^i
    load(t) = weight_t * runnable_sum(t) / runnable_avg_period(t)

    where u_i is the usage in the i-th most recent 1024us period (approximately
    1ms) and y is chosen such that y^k = 1/2. We currently choose k to be 32,
    which roughly translates to about a sched period. (A small sketch of this
    accumulation follows this entry.)

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.372695337@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
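
    A small userspace sketch of this accumulation; the names and the use of
    floating point are illustrative assumptions, not the kernel's fixed-point
    implementation:

        #include <stdio.h>

        #define DECAY_Y 0.97857206        /* ~= 0.5^(1/32), so y^32 = 1/2 */

        struct entity_avg_sk {
            double runnable_sum;          /* \Sum u_i * y^i  */
            double runnable_period;       /* \Sum 1024 * y^i */
        };

        /* account one 1024us period in which the entity ran for 'usage' us */
        static void accumulate_period(struct entity_avg_sk *a, double usage)
        {
            a->runnable_sum    = a->runnable_sum    * DECAY_Y + usage;
            a->runnable_period = a->runnable_period * DECAY_Y + 1024.0;
        }

        static double entity_load(const struct entity_avg_sk *a, double weight)
        {
            if (!a->runnable_period)
                return 0.0;
            return weight * a->runnable_sum / a->runnable_period;
        }

        int main(void)
        {
            struct entity_avg_sk a = { 0.0, 0.0 };

            /* run 512us of every 1024us period; the average converges to 50% */
            for (int i = 0; i < 200; i++)
                accumulate_period(&a, 512.0);

            printf("load = %.1f\n", entity_load(&a, 1024.0));    /* ~512 */
            return 0;
        }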
     

14 May, 2012

1 commit

  • Some numbers like nr_running and nr_uninterruptible are fundamentally
    unsigned since it's impossible to have a negative number of tasks, yet
    we still print them as signed so that an underflow condition is easy to
    recognise.

    rq->nr_uninterruptible has 'special' accounting and can in fact very
    easily become negative on a per-cpu basis.

    It was noted that since the P() macro assumes things are long long, and
    the promotion of unsigned 'int/long' to long long on 32bit doesn't
    sign-extend, we print silly large numbers instead of the easier-to-read
    signed numbers.

    Therefore extend the P() macro so that it does not require the sign
    extension. (A small demo of the promotion issue follows this entry.)

    Reported-by: Diwakar Tundlam
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-gk5tm8t2n4ix2vkpns42uqqp@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
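
    A standalone demo of the promotion issue (not the kernel macro itself);
    the variable name is illustrative:

        #include <stdio.h>

        int main(void)
        {
            unsigned int nr_uninterruptible = 0;

            nr_uninterruptible -= 2;    /* per-cpu underflow: wraps around */

            /* widening the unsigned value zero-extends: a silly large number */
            printf("unsigned promotion: %lld\n", (long long)nr_uninterruptible);

            /* going through the signed type of the same width sign-extends,
             * printing the easier-to-read -2 (on two's-complement machines) */
            printf("signed conversion:  %lld\n",
                   (long long)(int)nr_uninterruptible);
            return 0;
        }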
     

09 May, 2012

1 commit

  • Since there's a PID space limit of 30bits (see
    futex.h:FUTEX_TID_MASK) and allocating that many tasks (assuming a
    lower bound of 2 pages per task) would still take 8T of memory, it
    seems reasonable to say that unsigned int is sufficient for
    rq->nr_running. (A quick check of that arithmetic follows this entry.)

    When we do get anywhere near that number of tasks I suspect other
    things would go funny; load-balancer load computations would really
    need to be hoisted to 128bit, etc.

    So save a few bytes and convert rq->nr_running and friends to
    unsigned int.

    Suggested-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-y3tvyszjdmbibade5bw8zl81@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
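
    A quick check of that arithmetic, assuming 4KiB pages:

        #include <stdio.h>

        int main(void)
        {
            /* 2^30 tasks * 2 pages per task * 4096 bytes per page = 2^43 bytes */
            unsigned long long bytes = (1ULL << 30) * 2 * 4096;

            printf("%llu TiB\n", bytes >> 40);    /* prints 8 */
            return 0;
        }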
     

27 Jan, 2012

1 commit

  • We do not utilize the sched_switch field anymore.

    But simply removing the sched_switch field from the middle of the
    sched_stat output would break tools.

    So, to stay compatible, we hardcode it to zero in the output and remove
    the field from the scheduler data structures. (An illustration of this
    compatibility trick follows this entry.)

    Update the schedstat documentation accordingly.

    Signed-off-by: Rakib Mullick
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1327422836.27181.5.camel@localhost.localdomain
    Signed-off-by: Ingo Molnar

    Rakib Mullick
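
    An illustration of keeping a stable text format while retiring a field;
    this is a hedged sketch, not the kernel's actual schedstat code, and the
    field names are assumptions:

        #include <stdio.h>

        struct rq_stats_sk {            /* sched_switch intentionally absent */
            unsigned int yld_count;
            unsigned int sched_count;
            unsigned int sched_goidle;
        };

        static void print_cpu_stats(int cpu, const struct rq_stats_sk *st)
        {
            /* the literal 0 stands in where sched_switch used to be printed,
             * so the column positions that tools parse do not shift */
            printf("cpu%d %u 0 %u %u\n",
                   cpu, st->yld_count, st->sched_count, st->sched_goidle);
        }

        int main(void)
        {
            struct rq_stats_sk st = { 12, 345, 67 };

            print_cpu_stats(0, &st);
            return 0;
        }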
     

17 Nov, 2011

1 commit