14 Apr, 2011

8 commits

  • In preparation for calling select_task_rq() without rq->lock held, drop
    the dependency on the rq argument.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152729.031077745@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently p->pi_lock already serializes p->sched_class; also put
    p->cpus_allowed and try_to_wake_up() under it. This prepares the way
    to do the first part of ttwu() without holding rq->lock.

    By having p->sched_class and p->cpus_allowed serialized by p->pi_lock,
    we prepare the way to call select_task_rq() without holding rq->lock
    (a simplified sketch of the resulting locking order follows this entry).

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152728.990364093@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
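
    A simplified sketch of the locking order these two patches work toward
    (illustrative only, not the actual try_to_wake_up() body; the
    select_task_rq() call is shown with the post-series signature, without
    the rq argument):

    static int ttwu_sketch(struct task_struct *p, unsigned int state,
                           int wake_flags)
    {
            unsigned long flags;
            struct rq *rq;
            int cpu, success = 0;

            /* p->pi_lock stabilizes p->state, p->sched_class and
             * p->cpus_allowed without touching any rq->lock. */
            raw_spin_lock_irqsave(&p->pi_lock, flags);
            if (!(p->state & state))
                    goto out;

            success = 1;
            cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);

            /* Only now do we know which rq to lock for the enqueue. */
            rq = cpu_rq(cpu);
            raw_spin_lock(&rq->lock);
            /* ... activate_task(), mark p TASK_RUNNING ... */
            raw_spin_unlock(&rq->lock);
    out:
            raw_spin_unlock_irqrestore(&p->pi_lock, flags);
            return success;
    }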
     
  • Provide a generic p->on_rq because the p->se.on_rq semantics are
    unfavourable for lockless wakeups but needed for sched_fair.

    In particular, p->on_rq is only cleared when we actually dequeue the
    task in schedule() and not on any random dequeue as done by things
    like __migrate_task() and __sched_setscheduler().

    This also allows us to remove p->se usage from !sched_fair code.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.949545047@chello.nl

    Peter Zijlstra
     
  • Collect all ttwu() stat code into a single function and ensure it is
    always called for an actual wakeup (changing p->state to
    TASK_RUNNING).

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.908177058@chello.nl

    Peter Zijlstra
     
  • try_to_wake_up() would only return success when it had to place a task
    on a rq; change that to return success every time we change p->state
    to TASK_RUNNING, because that is the real measure of a wakeup.

    As a result, success is always true for the tracepoints.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.866866929@chello.nl

    Peter Zijlstra
     
  • wq_worker_waking_up() needs to match wq_worker_sleeping(); since the
    latter is only called on deactivate, move the former near activate.

    Signed-off-by: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/n/top-t3m7n70n9frmv4pv2n5fwmov@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since we now have p->on_cpu unconditionally available, use it to
    re-implement mutex_spin_on_owner.

    Requested-by: Thomas Gleixner
    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.826338173@chello.nl

    Peter Zijlstra
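
    A rough sketch of the owner-spinning idea, assuming a task_struct with
    the now-unconditional on_cpu field; barriers and owner re-validation
    are omitted, and the function name is illustrative:

    static bool spin_on_owner_sketch(struct mutex *lock,
                                     struct task_struct *owner)
    {
            while (lock->owner == owner) {
                    if (!owner->on_cpu)     /* owner preempted or asleep */
                            return false;   /* stop spinning, go block   */
                    if (need_resched())     /* we should yield the cpu   */
                            return false;
                    cpu_relax();            /* polite busy-wait          */
            }
            return true;                    /* owner released the lock   */
    }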
     
  • Always provide p->on_cpu so that we can determine whether a task is on
    a cpu without having to lock the rq.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152728.785452014@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Apr, 2011

1 commit

  • We really only want to unplug the pending IO when the process actually
    goes to sleep. So move the test for flushing the plug up to the place
    where we actually deactivate the task - where we have properly checked
    for preemption and for the process really sleeping.

    Acked-by: Jens Axboe
    Acked-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Linus Torvalds
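
    A hedged sketch of the placement described above (simplified; the real
    code also drops rq->lock around the flush): the plug is only flushed on
    the path where the task is genuinely deactivated, not on preemption.

    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
            /* really going to sleep, not merely preempted */
            deactivate_task(rq, prev, DEQUEUE_SLEEP);
            if (blk_needs_flush_plug(prev))
                    blk_schedule_flush_plug(prev);
    }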
     

08 Apr, 2011

2 commits

  • …-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86-32, fpu: Fix FPU exception handling on non-SSE systems
    x86, hibernate: Initialize mmu_cr4_features during boot
    x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
    x86: visws: Fixup irq overhaul fallout

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Clean up rebalance_domains() load-balance interval calculation

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
    rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()

    * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Fix cpumask leak in __setup_irq()

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf probe: Fix listing incorrect line number with inline function
    perf probe: Fix to find recursively inlined function
    perf probe: Fix multiple --vars options behavior
    perf probe: Fix to remove redundant close
    perf probe: Fix to ensure function declared file

    Linus Torvalds
     
  • * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
    Fix common misspellings

    Linus Torvalds
     

05 Apr, 2011

1 commit

  • Instead of the possible multiple evaluation of num_online_cpus() in
    rebalance_domains() that Linus reported, avoid it altogether in the
    normal case, since it is implemented with a Hamming weight function
    over a cpu bitmask, which can be darn expensive for those with big
    iron.

    This also makes the code cleaner, smaller and better documented (a
    sketch of the idea follows this entry).

    Reported-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
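
    A sketch of the idea (names simplified, illustrative only): compute
    the expensive bound once, e.g. on cpu hotplug, instead of evaluating
    num_online_cpus() for every domain on every rebalance pass.

    static unsigned long max_load_balance_interval;

    static void update_max_interval(void)
    {
            max_load_balance_interval = HZ * num_online_cpus() / 10;
    }

    /* ... inside rebalance_domains(), per sched domain ... */
    interval = msecs_to_jiffies(sd->balance_interval);
    interval = clamp(interval, 1UL, max_load_balance_interval);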
     

31 Mar, 2011

2 commits

  • Fixes generated by 'codespell' and manually reviewed.

    Signed-off-by: Lucas De Marchi

    Lucas De Marchi
     
  • sched_setscheduler() (in sched.c) is called to change the scheduling
    policy and/or the real-time priority of a task. Thus, if we find out
    that neither of those is actually being modified, it is possible to
    return earlier and save the overhead of a full deactivate+activate
    cycle of the task in question (a sketch follows this entry).

    Besides that, if we have more than one SCHED_FIFO task with the same
    priority on the same rq (which means they share the same priority queue),
    having one of them change its position in the priority queue because of
    a sched_setscheduler() (as happens by means of the deactivate+activate)
    that does not actually change the priority violates POSIX, which states,
    for SCHED_FIFO:

    "If a thread whose policy or priority has been modified by
    pthread_setschedprio() is a running thread or is runnable, the effect on
    its position in the thread list depends on the direction of the
    modification, as follows: a. [...] b. If the priority is unchanged, the
    thread does not change position in the thread list. c. [...]"

    http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html

    (ed: And the POSIX specification here does, briefly and somewhat
    unexpectedly, match what common sense tells us as well.)

    Signed-off-by: Dario Faggioli
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Dario Faggioli
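
    A hedged sketch of the early-out described above (locking and helper
    names simplified): if neither the policy nor the effective priority
    changes, skip the dequeue/enqueue cycle entirely.

    /* inside __sched_setscheduler(), after validating the request */
    if (unlikely(policy == p->policy &&
                 (!rt_policy(policy) ||
                  param->sched_priority == p->rt_priority))) {
            task_rq_unlock(rq, &flags);     /* nothing to change */
            return 0;
    }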
     

26 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

24 Mar, 2011

1 commit

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid apply to uids in the caller's own namespace, so again the
    checks can be against current_user_ns().

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
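
    An illustrative sketch of the pattern (the ipc namespace pointer and
    surrounding context are hypothetical; ns_capable()/nsown_capable()
    are the interfaces named in the changelog):

    /* IPC: check the capability against the namespace owning the object */
    if (!ns_capable(ipc_ns->user_ns, CAP_IPC_LOCK))
            return -EPERM;

    /* setuid/setgid: uids live in the caller's own user namespace */
    if (!nsown_capable(CAP_SETUID))
            return -EPERM;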
     

23 Mar, 2011

1 commit


20 Mar, 2011

1 commit

  • Add missing function parameters for yield_to():

    Warning(kernel/sched.c:5470): No description found for parameter 'p'
    Warning(kernel/sched.c:5470): No description found for parameter 'preempt'

    Signed-off-by: Randy Dunlap
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
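
    The shape of the kernel-doc block that silences such warnings (the
    wording here is illustrative, not necessarily the exact text added):

    /**
     * yield_to - yield the current processor to another thread in the
     *            caller's thread group
     * @p:       target task to hand the cpu to
     * @preempt: whether task preemption due to this yield is allowed
     */
    bool yield_to(struct task_struct *p, bool preempt);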
     

17 Mar, 2011

2 commits


16 Mar, 2011

4 commits

  • Fix kernel-doc warning for runqueue_is_locked():

    Warning(kernel/sched.c:664): missing initial short description

    Signed-off-by: Randy Dunlap
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     
  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
    x86: Enable forced interrupt threading support
    x86: Mark low level interrupts IRQF_NO_THREAD
    x86: Use generic show_interrupts
    x86: ioapic: Avoid redundant lookup of irq_cfg
    x86: ioapic: Use new move_irq functions
    x86: Use the proper accessors in fixup_irqs()
    x86: ioapic: Use irq_data->state
    x86: ioapic: Simplify irq chip and handler setup
    x86: Cleanup the genirq name space
    genirq: Add chip flag to force mask on suspend
    genirq: Add desc->irq_data accessor
    genirq: Add comments to Kconfig switches
    genirq: Fixup fasteoi handler for oneshot mode
    genirq: Provide forced interrupt threading
    sched: Switch wait_task_inactive to schedule_hrtimeout()
    genirq: Add IRQF_NO_THREAD
    genirq: Allow shared oneshot interrupts
    genirq: Prepare the handling of shared oneshot interrupts
    genirq: Make warning in handle_percpu_event useful
    x86: ioapic: Move trigger defines to io_apic.h
    ...

    Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
    space changes clashing with the Xen cleanups. The set_irq_msi() had
    moved to xen_bind_pirq_msi_to_irq().

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
    sched: Resched proper CPU on yield_to()
    sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
    sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
    sched: Clean up the IRQ_TIME_ACCOUNTING code
    sched: Add #ifdef around irq time accounting functions
    sched, autogroup: Stop claiming ownership of the root task group
    sched, autogroup: Stop going ahead if autogroup is disabled
    sched, autogroup, sysctl: Use proc_dointvec_minmax() instead
    sched: Fix the group_imb logic
    sched: Clean up some f_b_g() comments
    sched: Clean up remnants of sd_idle
    sched: Wholesale removal of sd_idle logic
    sched: Add yield_to(task, preempt) functionality
    sched: Use a buddy to implement yield_task_fair()
    sched: Limit the scope of clear_buddies
    sched: Check the right ->nr_running in yield_task_fair()
    sched: Avoid expensive initial update_cfs_load(), on UP too
    sched: Fix switch_from_fair()
    sched: Simplify the idle scheduling class
    softirqs: Account ksoftirqd time as cpustat softirq
    ...

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
    perf probe: Clean up probe_point_lazy_walker() return value
    tracing: Fix irqoff selftest expanding max buffer
    tracing: Align 4 byte ints together in struct tracer
    tracing: Export trace_set_clr_event()
    tracing: Explain about unstable clock on resume with ring buffer warning
    ftrace/graph: Trace function entry before updating index
    ftrace: Add .ref.text as one of the safe areas to trace
    tracing: Adjust conditional expression latency formatting.
    tracing: Fix event alignment: skb:kfree_skb
    tracing: Fix event alignment: mce:mce_record
    tracing: Fix event alignment: kvm:kvm_hv_hypercall
    tracing: Fix event alignment: module:module_request
    tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
    tracing: Remove lock_depth from event entry
    perf header: Stop using 'self'
    perf session: Use evlist/evsel for managing perf.data attributes
    perf top: Don't let events to eat up whole header line
    perf top: Fix events overflow in top command
    ring-buffer: Remove unused #include <linux/trace_irq.h>
    tracing: Add an 'overwrite' trace_option.
    ...

    Linus Torvalds
     

11 Mar, 2011

1 commit

  • Although they run as rpciod background tasks, under normal operation
    (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
    and nfs4_do_close() want to be fully synchronous. This means that when we
    exit, we want all references to the rpc_task to be gone, and we want
    any dentry references etc. held by that task to be released.

    For this reason these functions call __rpc_wait_for_completion_task(),
    followed by rpc_put_task() in the expectation that the latter will be
    releasing the last reference to the rpc_task, and thus ensuring that the
    callback_ops->rpc_release() has been called synchronously.

    This patch fixes a race which exists due to the fact that
    rpciod calls rpc_complete_task() (in order to wake up the callers of
    __rpc_wait_for_completion_task()) and then subsequently calls
    rpc_put_task() without ensuring that these two steps are done atomically.

    In order to avoid adding new spin locks, the patch uses the existing
    waitqueue spin lock to order the rpc_task reference count releases between
    the waiting process and rpciod.
    The common case where nobody is waiting for completion is optimised for by
    checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
    reference count is 1: in those cases we skip taking the spin lock and
    immediately free up the rpc_task.

    Those few processes that need to put the rpc_task from inside an
    asynchronous context and that do not care about ordering are given a new
    helper: rpc_put_task_async().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

10 Mar, 2011

1 commit

  • This patch adds support for creating a queuing context outside
    of the queue itself. This enables us to batch up pieces of IO
    before grabbing the block device queue lock and submitting them to
    the IO scheduler.

    The context is created on the stack of the process and assigned in
    the task structure, so that we can auto-unplug it if we hit a schedule
    event.

    The current queue plugging happens implicitly if IO is submitted to
    an empty device, yet callers have to remember to unplug that IO when
    they are going to wait for it. This is an ugly API and has caused bugs
    in the past. Additionally, it requires hacks in the vm (->sync_page()
    callback) to handle that logic. By switching to an explicit plugging
    scheme we make the API a lot nicer and can get rid of the ->sync_page()
    hack in the vm.

    Signed-off-by: Jens Axboe

    Jens Axboe
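
    A hedged usage sketch of the interface this series introduces
    (blk_start_plug()/blk_finish_plug(); submit_my_bios() is a
    hypothetical helper standing in for whatever issues the IO):

    struct blk_plug plug;

    blk_start_plug(&plug);          /* requests collect on the on-stack plug */
    submit_my_bios();               /* hypothetical: issue a burst of IO     */
    blk_finish_plug(&plug);         /* hand the whole batch to the queue     */

    /* If the task blocks before blk_finish_plug(), the scheduler flushes
     * the plug automatically, as described above. */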
     

05 Mar, 2011

1 commit

  • This removes the implementation of the big kernel lock,
    at last. A lot of people have worked on this in the
    past, so the credit for this patch should be with
    everyone who participated in the hunt.

    The names on the Cc list are the people that were the
    most active in this, according to the recorded git
    history, in alphabetical order.

    Signed-off-by: Arnd Bergmann
    Acked-by: Alan Cox
    Cc: Alessio Igor Bogani
    Cc: Al Viro
    Cc: Andrew Hendry
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Eric W. Biederman
    Cc: Frederic Weisbecker
    Cc: Hans Verkuil
    Acked-by: Ingo Molnar
    Cc: Jan Blunck
    Cc: John Kacur
    Cc: Jonathan Corbet
    Cc: Linus Torvalds
    Cc: Matthew Wilcox
    Cc: Oliver Neukum
    Cc: Paul Menage
    Acked-by: Thomas Gleixner
    Cc: Trond Myklebust

    Arnd Bergmann
     

04 Mar, 2011

2 commits

  • yield_to_task_fair() has code to resched the CPU of the yielding task
    when the intention is to resched the CPU of the task that is being
    yielded to.

    The change here fixes the problem and also makes the resched
    conditional on rq != p_rq.

    Signed-off-by: Venkatesh Pallipadi
    Reviewed-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     
  • The current scheduler implementation returns -EPERM when trying to
    change from SCHED_IDLE to SCHED_OTHER or SCHED_BATCH. Since SCHED_IDLE
    is considered to be a nice 20 on steroids, changing to another policy
    should be allowed provided the RLIMIT_NICE is accounted for.

    This patch allows the following test-case to pass with RLIMIT_NICE=40,
    but still fail with RLIMIT_NICE=10 when the calling process is run
    from a typical shell (nice 0, or 20 in rlimit terms).

    int main()
    {
        int ret;
        struct sched_param sp;
        sp.sched_priority = 0;

        /* switch to SCHED_IDLE */
        ret = sched_setscheduler(0, SCHED_IDLE, &sp);
        printf("setscheduler IDLE: %d\n", ret);
        if (ret)
            return ret;

        /* switch back to SCHED_OTHER */
        ret = sched_setscheduler(0, SCHED_OTHER, &sp);
        printf("setscheduler OTHER: %d\n", ret);

        return ret;
    }

    $ ulimit -e
    40
    $ ./test
    setscheduler IDLE: 0
    setscheduler OTHER: 0

    $ ulimit -e 10
    $ ulimit -e
    10
    $ ./test
    setscheduler IDLE: 0
    setscheduler OTHER: -1

    Signed-off-by: Darren Hart
    Signed-off-by: Peter Zijlstra
    Cc: Richard Purdie
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     

26 Feb, 2011

2 commits

  • Fix this warning:

    lkml.org/lkml/2011/1/30/124

    kernel/sched.c:3719: warning: 'irqtime_account_idle_ticks' defined but not used
    kernel/sched.c:3720: warning: 'irqtime_account_process_tick' defined but not used

    In a cleaner way than:

    7e9498705e81: sched: Add #ifdef around irq time accounting functions

    This patch will not have any functional impact.

    Signed-off-by: Venkatesh Pallipadi
    Cc: heiko.carstens@de.ibm.com
    Cc: a.p.zijlstra@chello.nl
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     
  • When we force-thread hard and soft interrupts, the startup of ksoftirqd
    would hang in kthread_bind() when wait_task_inactive() calls
    schedule_timeout_uninterruptible(), because there is no softirq yet
    which will wake us up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    LKML-Reference:

    Thomas Gleixner
     

25 Feb, 2011

1 commit


16 Feb, 2011

1 commit

  • Make the ::exit method act like ::attach; it is, after all, very nearly
    the same thing.

    The bug had no effect on correctness - fixing it is an optimization for
    the scheduler. Also, later perf-cgroups patches rely on it.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul Menage
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Feb, 2011

1 commit


12 Feb, 2011

1 commit

  • When the function graph tracer starts, it needs to make a special
    stack for each task to save the real return values of the tasks.
    All running tasks have this stack created, as well as any new
    tasks.

    On CPU hotplug, the new idle task will allocate a stack as well
    when init_idle() is called. The problem is that CPU hotplug does
    not create a new idle_task. Instead it uses the idle task that
    existed when the cpu went down.

    ftrace_graph_init_task() will add a new ret_stack to the task
    that is given to it. Because a clone inherits the stack of its
    parent, that function does not check whether the task's
    ret_stack is already NULL or not. When the CPU hotplug code
    starts a CPU up again, it will allocate a new stack even
    though one already existed for it.

    The solution is to treat the idle_task specially. In fact, the
    function_graph code already does, just not at init_idle().
    Instead of using ftrace_graph_init_task() for the idle task
    (that function expects the task to be a clone), have a
    separate ftrace_graph_init_idle_task(), sketched after this
    entry. Also, we will create a per_cpu ret_stack that is used
    by the idle task. When we call ftrace_graph_init_idle_task() it
    will check if the idle task's ret_stack is NULL; if it is, it
    will assign it the per_cpu ret_stack.

    Reported-by: Benjamin Herrenschmidt
    Suggested-by: Peter Zijlstra
    Cc: Stable Tree
    Signed-off-by: Steven Rostedt

    Steven Rostedt
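
    A loose sketch of the idea (internals elided; names follow the
    description above): idle tasks reuse a per-cpu ret_stack instead of
    having ftrace_graph_init_task() allocate a fresh one on every cpu-up.

    void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
    {
            if (!t->ret_stack)
                    t->ret_stack = per_cpu(idle_ret_stack, cpu);
            /* ... plus the usual per-task index/counter initialisation ... */
    }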
     

03 Feb, 2011

3 commits

  • Currently only implemented for fair class tasks.

    Add a yield_to_task() method to the fair scheduling class, allowing the
    caller of yield_to() to accelerate another thread in its thread group /
    task group (a usage sketch follows this entry).

    Implemented via a scheduler hint, using cfs_rq->next to encourage the
    target being selected. We can rely on pick_next_entity to keep things
    fair, so no one can accelerate a thread that has already used its fair
    share of CPU time.

    This also means callers should only call yield_to() when they really
    mean it. Calling it too often can result in the scheduler just
    ignoring the hint.

    Signed-off-by: Rik van Riel
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
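
    A hypothetical usage sketch (the caller and task pointer are made up;
    the classic consumer is a hypervisor boosting the vcpu thread that
    holds a spinlock another vcpu is waiting on):

    /* Donate our timeslice to the thread we are waiting on. */
    if (!yield_to(lock_holder_task, true /* allow preemption */))
            cpu_relax();    /* hint was ignored; keep waiting politely */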
     
  • Use the buddy mechanism to implement yield_task_fair. This
    allows us to skip onto the next highest priority se at every
    level in the CFS tree, unless doing so would introduce gross
    unfairness in CPU time distribution.

    We order the buddy selection in pick_next_entity to check
    yield first, then last, then next. We need next to be able
    to override yield, because it is possible for the "next" and
    "yield" tasks to be different processes in the same sub-tree
    of the CFS tree. When they are, we need to go into that
    sub-tree regardless of the "yield" hint, and pick the correct
    entity once we get to the right level.

    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Oleg reported that on architectures with
    __ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from
    task_oncpu_function_call() can land before perf_event_task_sched_in()
    and cause interesting situations for e.g. perf_install_in_context().

    This patch reworks the task_oncpu_function_call() interface to give a
    more usable primitive, and reworks all its users to hopefully be
    more obvious as well as to remove the races.

    While looking at the code I also found a number of races against
    perf_event_task_sched_out(), which can flip contexts between tasks, so
    plug those too.

    Reported-and-reviewed-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

27 Jan, 2011

1 commit