02 Aug, 2009

2 commits

  • We need to add the new prio to the cpupri accounting before
    removing the old prio. This is because removing the old prio
    first will open a race window where the cpu will be removed
    from pri_active. In this case the cpu will not be visible for
    RT push and pulls. This could cause an RT task to fail to
    migrate appropriately, creating a very large latency.

    This bug was found with the use of ftrace sched events and
    trace_printk.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Background:

    Several race conditions in the scheduler have cropped up
    recently, which Steven and I have tracked down using ftrace.
    The most recent one turns out to be a race in how the scheduler
    determines a suitable migration target for RT tasks, introduced
    recently with commit:

    commit 68e74568fbe5854952355e942acca51f138096d9
    Date: Tue Nov 25 02:35:13 2008 +1030

    sched: convert struct cpupri_vec to cpumask_var_t.

    The original design of cpupri allowed lockless readers to
    quickly determine a best-estimate target. Races between the
    pri_active bitmap and the vec->mask were handled in the
    original code because we would detect and return "0" when this
    occurred. The design was predicated on the *effective*
    atomicity (*) of caching the result of cpus_and() between the
    cpus_allowed and the vec->mask.

    Commit 68e74568 changed the behavior such that vec->mask is
    accessed multiple times. This introduces a subtle race whose
    result is that cpupri_find() can return "1" yet hand back an
    empty bitmap.

    *) yes, we know cpus_and() is not a locked operator across the
    entire composite array, but it is implicitly atomic on a
    per-word basis which is all the design required to work.

    Implementation:

    Rather than forgoing the lockless design, or reverting to a
    stack-based cpumask_t, we simply check for when the race has
    been encountered and continue processing in the event that the
    race is hit. This renders the removal race as if the priority
    bit had been atomically cleared as well, and allows the
    algorithm to execute correctly.

    Signed-off-by: Gregory Haskins
    CC: Rusty Russell
    CC: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Gregory Haskins
     

17 Jun, 2009

1 commit

  • Those two functions no longer call alloc_bootmem_cpumask_var(),
    so no need to tag them with __init_refok.

    Signed-off-by: Li Zefan
    Acked-by: Pekka Enberg
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

12 Jun, 2009

1 commit


09 Jun, 2009

1 commit


01 Apr, 2009

1 commit


06 Jan, 2009

1 commit


25 Nov, 2008

1 commit

  • Impact: stack usage reduction, (future) size reduction for large NR_CPUS.

    Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
    space for small nr_cpu_ids but big CONFIG_NR_CPUS.

    The fact that cpupri_init is called both before and after the
    slab is available unfortunately makes for an ugly parameter.

    We also use cpumask_any_and to get rid of a temporary in cpupri_find.

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
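    The on-stack/off-stack distinction can be sketched in user-space
    C. This is a simplified model of the kernel pattern, not its
    implementation: CPUMASK_WORDS and the calloc-based allocator here
    are illustrative stand-ins.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    #define NR_CPUS 4096    /* big compile-time limit */
    #define CPUMASK_WORDS (NR_CPUS / (8 * (int)sizeof(unsigned long)))
    #define CONFIG_CPUMASK_OFFSTACK 1

    #if CONFIG_CPUMASK_OFFSTACK
    /* Off-stack: the variable is just a pointer and the mask lives on
     * the heap, so a struct embedding several masks stays small even
     * when NR_CPUS is huge but nr_cpu_ids is small. */
    typedef unsigned long *cpumask_var_t;

    static bool alloc_cpumask_var(cpumask_var_t *mask)
    {
        *mask = calloc(CPUMASK_WORDS, sizeof(unsigned long));
        return *mask != NULL;
    }

    static void free_cpumask_var(cpumask_var_t mask)
    {
        free(mask);
    }
    #else
    /* On-stack: a fixed array; "alloc" only zeroes it and cannot
     * fail. Simple, but stack-hungry for large NR_CPUS. */
    typedef unsigned long cpumask_var_t[CPUMASK_WORDS];

    static bool alloc_cpumask_var(cpumask_var_t *mask)
    {
        memset(*mask, 0, sizeof(*mask));
        return true;
    }

    static void free_cpumask_var(cpumask_var_t mask) { (void)mask; }
    #endif
    ```

    Because the off-stack variant needs a real allocator, code that
    runs before the slab is up must fall back to a bootmem-based
    allocation, which is where cpupri_init's extra parameter comes
    from.
    
    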

06 Jun, 2008

1 commit

  • The current code uses a linear algorithm which causes scaling issues
    on larger SMP machines. This patch replaces that algorithm with a
    2-dimensional bitmap to reduce latencies in the wake-up path.

    Signed-off-by: Gregory Haskins
    Acked-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Gregory Haskins
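    The payoff of the 2-dimensional bitmap is that one axis indexes
    priority levels and the other indexes CPUs, so a lookup is a
    find-first-bit over the priority bitmap plus one mask AND, rather
    than a scan over every CPU. A minimal sketch, assuming a small
    illustrative NR_PRIO and a hypothetical lowest_pri_target()
    helper (the kernel uses ~102 levels and real cpumask operations):

    ```c
    #include <assert.h>

    #define NR_PRIO 8   /* illustrative; fits one unsigned long here */

    /* Lowest set bit index, or -1 if the word is empty. */
    static int find_first_bit_sketch(unsigned long w)
    {
        return w ? __builtin_ctzl(w) : -1;
    }

    /* One 2-D lookup: walk active priority levels from lowest up and
     * return a CPU whose per-level mask intersects the task's
     * affinity. Cost is O(NR_PRIO) bit operations, independent of the
     * number of CPUs. */
    static int lowest_pri_target(const unsigned long *vec_mask, /* [NR_PRIO] */
                                 unsigned long pri_active,
                                 unsigned long allowed)
    {
        while (pri_active) {
            int pri = find_first_bit_sketch(pri_active);
            unsigned long hit = vec_mask[pri] & allowed;
            if (hit)
                return find_first_bit_sketch(hit);  /* a suitable CPU */
            pri_active &= pri_active - 1;   /* clear level, try next */
        }
        return -1;
    }
    ```

    With the old linear approach the same question required inspecting
    every runqueue, which is what hurt wake-up latency on large SMP
    machines.
    
    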