15 Oct, 2007

40 commits

  • save ~300 bytes; activate_idle_task() was moved to avoid
    a warning.

    bloat-o-meter output:

    add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295)
    Signed-off-by: Ingo Molnar

    Alexey Dobriyan
     
  • tweak wakeup granularity.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • optimize schedule() a bit on SMP, by moving the rq-clock update
    outside the rq lock.

    code size is the same:

    text data bss dec hex filename
    25725 2666 96 28487 6f47 sched.o.before
    25725 2666 96 28487 6f47 sched.o.after

    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • The thing is that __pick_next_entity() must never be called when
    first_fair(cfs_rq) == NULL. It wouldn't be a problem if 'run_node'
    were the very first field of 'struct sched_entity' -- but it is the
    second.

    The 'nr_running != 0' check is _not_ enough, due to the fact that
    'current' is not within the tree. Generic paths are ok (e.g. schedule(),
    as put_prev_task() is called beforehand)... I'm more worried about e.g.
    migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks(): if
    'current' == rq->idle there is no problem, but if it's one of the
    SCHED_NORMAL tasks (or, imagine, some other use-case in the future --
    i.e. we should not make the outer world dependent on internal details
    of the sched_fair class) it may be a "Houston, we've got a problem"
    case.

    it's +16 bytes to the ".text". Another variant is to make 'run_node'
    the first data member of 'struct sched_entity', but an additional
    check (se != NULL) is still needed in pick_next_entity().

    Signed-off-by: Dmitry Adamushko
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Dmitry Adamushko
     
  • Make vslice accurate wrt nice levels, and add some comments
    while we're at it.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • more whitespace cleanups. No code changed:

    text data bss dec hex filename
    26553 2790 288 29631 73bf sched.o.before
    26553 2790 288 29631 73bf sched.o.after

    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • mark scheduling classes as const. This speeds up the code
    a bit and shrinks it:

    text data bss dec hex filename
    40027 4018 292 44337 ad31 sched.o.before
    40190 3842 292 44324 ad24 sched.o.after

    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • There is a possibility that, because of a task of a group moving from
    one cpu to another, it may gain more cpu time than desired. See
    http://marc.info/?l=linux-kernel&m=119073197730334 for details.

    This is an attempt to fix that problem. Basically it simulates the
    dequeue of higher-level entities as if they were going to sleep.
    Similarly, it simulates the wakeup of higher-level entities as if
    they were waking up from sleep.

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • Recent fix to check_preempt_wakeup() to check for preemption at higher
    levels caused a size bloat for !CONFIG_FAIR_GROUP_SCHED.

    Fix the problem.

    text data bss dec hex filename
    42277 10598 320 53195 cfcb kernel/sched.o-before_this_patch
    42216 10598 320 53134 cf8e kernel/sched.o-after_this_patch

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • Fix coding style issues reported by Randy Dunlap and others.

    Signed-off-by: Dhaval Giani
    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • cleanup, remove stale comment.

    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • speed up and simplify vslice calculations.

    [ From: Mike Galbraith : build fix ]

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • clean up min_vruntime use.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • group scheduler SMP migration fix: use task_cfs_rq(p) to get
    to the relevant fair-scheduling runqueue of a task, rq->cfs
    is not the right one.

    Signed-off-by: Ingo Molnar

    Srivatsa Vaddagiri
     
  • rename all 'cnt' fields and variables to the less yucky 'count' name.

    yuckage noticed by Andrew Morton.

    no change in code, other than the /proc/sched_debug bkl_count string got
    a bit larger:

    text data bss dec hex filename
    38236 3506 24 41766 a326 sched.o.before
    38240 3506 24 41770 a32a sched.o.after

    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • fix yield bugs due to the current-not-in-rbtree changes: the task is
    not in the rbtree so rbtree-removal is a no-no.

    [ From: Srivatsa Vaddagiri : build fix. ]

    also, nice code size reduction:

    kernel/sched.o:
    text data bss dec hex filename
    38323 3506 24 41853 a37d sched.o.before
    38236 3506 24 41766 a326 sched.o.after

    Signed-off-by: Ingo Molnar
    Signed-off-by: Dmitry Adamushko
    Reviewed-by: Thomas Gleixner

    Dmitry Adamushko
     
  • group scheduler wakeup latency fix: when checking for preemption
    we must check cross-group too, not just intra-group.

    Signed-off-by: Ingo Molnar

    Srivatsa Vaddagiri
     
  • Lee Schermerhorn noticed that set_leftmost() contains dead code,
    remove this.

    Reported-by: Lee Schermerhorn
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • Adjusting the sched_class is a missing part of the already existing
    "do not leak PI boosting priority to the child" logic in sched_fork().
    This patch moves the sched_class adjustment from wake_up_new_task()
    to sched_fork().

    this also shrinks the code a bit:

    text data bss dec hex filename
    40111 4018 292 44421 ad85 sched.o.before
    40102 4018 292 44412 ad7c sched.o.after

    Signed-off-by: Hiroshi Shimamoto
    Signed-off-by: Dmitry Adamushko
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Hiroshi Shimamoto
     
  • max_vruntime() simplification.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Peter Zijlstra
     
  • fix sched_fork(): large latencies at new task creation time because
    the ->vruntime was not fixed up cross-CPU, if the parent got migrated
    after the child's CPU got set up.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • fix sign check error in place_entity() - we'd get excessive
    latencies due to negatives being converted to large u64's.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • undo some of the recent changes that are not needed after all,
    such as last_min_vruntime.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • remove last_min_vruntime use - prepare to remove it.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • remove condition from set_task_cpu(). Now that ->vruntime
    is not global anymore, it should (and does) work fine without
    it too.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • entity_key() fix - we'd occasionally end up with a 0 vruntime
    in the !initial case.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • debug feature: check how well we schedule within a reasonable
    vruntime 'spread' range. (note that CPU overload can increase
    the spread, so this is not a hard condition, but normal loads
    should be within the spread.)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Peter Zijlstra
     
  • more width for parameter printouts in /proc/sched_debug.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • add vslice: the load-dependent "virtual slice" a task should
    run ideally, so that the observed latency stays within the
    sched_latency window.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Peter Zijlstra
     
  • print the current value of all tunables in /proc/sched_debug output.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • remove unneeded tunables.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • build fix for the SCHED_DEBUG && !SCHEDSTATS case.

    Signed-off-by: S.Caglar Onur
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    S.Caglar Onur
     
  • add per task and per rq BKL usage statistics.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • enable CONFIG_FAIR_GROUP_SCHED=y by default.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • fair-group sched, cleanups.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • Enable user-id based fair group scheduling. This is useful for anyone
    who wants to test the group scheduler w/o having to enable
    CONFIG_CGROUPS.

    A separate scheduling group (i.e. struct task_grp) is automatically
    created for every new user added to the system. Upon an uid change,
    a task is moved to the corresponding scheduling group.

    A /proc tunable (/proc/root_user_share) is also provided to tune root
    user's quota of cpu bandwidth.

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • With the view of supporting user-id based fair scheduling (and not just
    container-based fair scheduling), this patch renames several functions
    and makes them independent of whether they are being used for container
    or user-id based fair scheduling.

    Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating a
    smaller-than-needed array for tg->cfs_rq[] and tg->se[]).

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • Print &rq->cfs statistics as well (useful for group scheduling).

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • print nr_running and load information for cfs_rq in /proc/sched_debug.

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • fix a minor bug in yield (seen with CONFIG_FAIR_GROUP_SCHED):
    group scheduling would skew when yield was called.

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri