14 Apr, 2011

25 commits

  • Now that we've removed the rq->lock requirement from the first part of
    ttwu() and can compute placement without holding any rq->lock, ensure
    we execute the second half of ttwu() on the actual cpu we want the
    task to run on.

    This avoids having to take rq->lock and doing the task enqueue
    remotely, saving lots on cacheline transfers.

    As measured using: http://oss.oracle.com/~mason/sembench.c

    $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
    $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
    $ ./sembench -t 2048 -w 1900 -o 0

    unpatched: run time 30 seconds 647278 worker burns per second
    patched: run time 30 seconds 816715 worker burns per second

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152729.515897185@chello.nl

    Peter Zijlstra
     
  • Factor out helper functions to make the inner workings of try_to_wake_up()
    more obvious; this also allows for adding remote queues.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152729.475848012@chello.nl

    Peter Zijlstra
     
  • The ttwu_post_activation() code does the core wakeup: it sets TASK_RUNNING
    and performs wakeup-preemption, so give it a more descriptive name.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152729.434609705@chello.nl

    Peter Zijlstra
     
  • In order to call ttwu_stat() without holding rq->lock we must remove
    its rq argument. Since we need to change rq stats, account to the
    local rq instead of the task rq; this is safe because we have IRQs
    disabled.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152729.394638826@chello.nl

    Peter Zijlstra
     
  • Currently ttwu() does two rq->lock acquisitions, once on the task's
    old rq, holding it over the p->state fiddling and load-balance pass.
    Then it drops the old rq->lock to acquire the new rq->lock.

    By having ttwu(), p->sched_class and p->cpus_allowed serialized by
    p->pi_lock, we can now drop the whole first rq->lock acquisition.

    The p->pi_lock serializing concurrent ttwu() calls protects p->state,
    which we will set to TASK_WAKING to bridge possible p->pi_lock to
    rq->lock gaps and serialize set_task_cpu() calls against
    task_rq_lock().

    The p->pi_lock serialization of p->sched_class allows us to call
    scheduling class methods without holding the rq->lock, and the
    serialization of p->cpus_allowed allows us to do the load-balancing
    bits without races.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152729.354401150@chello.nl

    Peter Zijlstra
     
  • Since we can now call select_task_rq() and set_task_cpu() with only
    p->pi_lock held, and sched_exec() load-balancing has always been
    optimistic, drop all rq->lock usage.

    Oleg also noted that need_migrate_task() will always be true for
    current, so don't bother calling that at all.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152729.314204889@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since p->pi_lock now protects everything needed to call
    select_task_rq(), avoid the double remote rq->lock acquisition and
    rely on p->pi_lock instead.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152729.273362517@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In order to be able to call set_task_cpu() while holding either
    p->pi_lock or task_rq(p)->lock, we need to hold both locks in order to
    stabilize task_rq().

    This makes task_rq_lock() acquire both locks, and have
    __task_rq_lock() validate that p->pi_lock is held. This increases the
    locking overhead for most scheduler syscalls but allows reduction of
    rq->lock contention for some scheduler hot paths (ttwu).

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152729.232781355@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since we now serialize ttwu() using p->pi_lock, we also need to
    serialize ttwu_local() with it; otherwise, once we drop the rq->lock
    in ttwu(), it can race with ttwu_local().

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152729.192366907@chello.nl

    Peter Zijlstra
     
  • In preparation for having to call task_contributes_to_load() without
    holding rq->lock, we need to store the result until we do hold it and
    can update the rq accounting accordingly.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152729.151523907@chello.nl

    Peter Zijlstra
     
  • In order to avoid reading partially updated min_vruntime values on
    32-bit, implement a seqcount-like solution.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152729.111378493@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In preparation for calling this without rq->lock held, remove the
    dependency on the rq argument.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152729.071474242@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In preparation for calling select_task_rq() without rq->lock held,
    drop the dependency on the rq argument.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152729.031077745@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently p->pi_lock already serializes p->sched_class; also put
    p->cpus_allowed and try_to_wake_up() under it. This prepares the way
    to do the first part of ttwu() without holding rq->lock.

    By having p->sched_class and p->cpus_allowed serialized by p->pi_lock,
    we prepare the way to call select_task_rq() without holding rq->lock.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152728.990364093@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide a generic p->on_rq because the p->se.on_rq semantics are
    unfavourable for lockless wakeups but needed for sched_fair.

    In particular, p->on_rq is only cleared when we actually dequeue the
    task in schedule() and not on any random dequeue as done by things
    like __migrate_task() and __sched_setscheduler().

    This also allows us to remove p->se usage from !sched_fair code.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.949545047@chello.nl

    Peter Zijlstra
     
  • Collect all ttwu() stat code into a single function and ensure it is
    always called for an actual wakeup (changing p->state to
    TASK_RUNNING).

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.908177058@chello.nl

    Peter Zijlstra
     
  • try_to_wake_up() used to report success only when it actually had to
    place a task on a rq; change that to report success every time we
    change p->state to TASK_RUNNING, because that is the real measure of
    a wakeup.

    As a result, success is always true for the tracepoints.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.866866929@chello.nl

    Peter Zijlstra
     
  • wq_worker_waking_up() needs to match wq_worker_sleeping(); since the
    latter is only called on deactivate, move the former near activate.

    Signed-off-by: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/n/top-t3m7n70n9frmv4pv2n5fwmov@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since we now have p->on_cpu unconditionally available, use it to
    re-implement mutex_spin_on_owner.

    Requested-by: Thomas Gleixner
    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.826338173@chello.nl

    Peter Zijlstra
     
  • Always provide p->on_cpu so that we can determine whether a task is
    on a cpu without having to lock the rq.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152728.785452014@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • For future rework of try_to_wake_up() we'd like to push part of that
    function onto the CPU the task is actually going to run on.

    In order to do so we need a generic callback from the existing scheduler IPI.

    This patch introduces such a generic callback: scheduler_ipi() and
    implements it as a NOP.

    BenH notes: PowerPC might use this IPI on offline CPUs under rare conditions!

    Acked-by: Russell King
    Acked-by: Martin Schwidefsky
    Acked-by: Chris Metcalf
    Acked-by: Jesper Nilsson
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Ralf Baechle
    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.744338123@chello.nl

    Peter Zijlstra
     
  • Merge reason: Pick up this upstream commit:

    6631e635c65d: block: don't flush plugged IO on forced preemtion scheduling

    As it modifies the scheduler and we'll queue up dependent patches.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86:
    x86 platform drivers: Build fix for intel_pmic_gpio

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/avr32-2.6:
    avr32: add ATAG_BOARDINFO
    don't check platform_get_irq's return value against zero
    avr32: init cannot ignore signals sent by force_sig_info()
    avr32: fix deadlock when reading clock list in debugfs
    avr32: Fix .size directive for cpu_enter_idle
    avr32: At32ap: pio fix typo "))" on gpio_irq_unmask prototype
    fix the wrong argument of the functions definition

    Linus Torvalds
     
  • * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (22 commits)
    Revert "i915: restore only the mode of this driver on lastclose"
    Revert "ttm: Utilize the DMA API for pages that have TTM_PAGE_FLAG_DMA32 set."
    i915: select VIDEO_OUTPUT_CONTROL for ACPI_VIDEO
    drm/radeon/kms: properly program vddci on evergreen+
    drm/radeon/kms: add voltage type to atom set voltage function
    drm/radeon/kms: fix pcie_p callbacks on btc and cayman
    drm/radeon/kms: fix suspend on rv530 asics
    drm/radeon/kms: clean up gart dummy page handling
    drm/radeon/kms: make radeon i2c put/get bytes less noisy
    drm/radeon/kms: pll tweaks for rv6xx
    drm/radeon: Fix KMS legacy backlight support if CONFIG_BACKLIGHT_CLASS_DEVICE=m.
    radeon: Fix KMS CP writeback on big endian machines.
    i915: restore only the mode of this driver on lastclose
    drm/nvc0: improve vm flush function
    drm/nv50-nvc0: remove some code that doesn't belong here
    drm/nv50: use "nv86" tlb flush method on everything except 0x50/0xac
    drm/nouveau: quirk for XFX GT-240X-YA
    drm/nv50-nvc0: work around an evo channel hang that some people see
    drm/nouveau: implement init table opcode 0x5c
    drm/nouveau: fix oops on unload with disabled LVDS panel
    ...

    Linus Torvalds
     

13 Apr, 2011

15 commits