31 Mar, 2009

1 commit

  • * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits)
    lockdep: fix deadlock in lockdep_trace_alloc
    lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB
    lockdep: annotate reclaim context (__GFP_NOFS), fix
    lockdep: build fix for !PROVE_LOCKING
    lockstat: warn about disabled lock debugging
    lockdep: use stringify.h
    lockdep: simplify check_prev_add_irq()
    lockdep: get_user_chars() redo
    lockdep: simplify get_user_chars()
    lockdep: add comments to mark_lock_irq()
    lockdep: remove macro usage from mark_held_locks()
    lockdep: fully reduce mark_lock_irq()
    lockdep: merge the !_READ mark_lock_irq() helpers
    lockdep: merge the _READ mark_lock_irq() helpers
    lockdep: simplify mark_lock_irq() helpers #3
    lockdep: further simplify mark_lock_irq() helpers
    lockdep: simplify the mark_lock_irq() helpers
    lockdep: split up mark_lock_irq()
    lockdep: generate usage strings
    lockdep: generate the state bit definitions
    ...

    Linus Torvalds
     

15 Jan, 2009

2 commits

  • Prefer tasks that wake other tasks to preempt quickly. This improves
    performance because more work is available sooner.

    The workload that prompted this patch was a kernel build over NFS4 (for some
    curious and not yet understood reason we had to revert commit
    18de9735300756e3ca9c361ef58409d8561dfe0d to make any progress at all).

    Without this patch a make -j8 bzImage (of x86-64 defconfig) would take
    3m30-ish, with this patch we're down to 2m50-ish.

    psql-sysbench/mysql-sysbench show a slight improvement in peak performance as
    well; tbench and vmark did not seem to care.

    It is possible to improve the build time further (to 2m20-ish), but that
    seriously hurts other benchmarks (which just shows there is more room for
    tinkering).

    Much thanks to Mike who put in a lot of effort to benchmark things and proved
    a worthy opponent with a competing patch.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Change mutex contention behaviour such that it will sometimes busy wait on
    acquisition - moving its behaviour closer to that of spinlocks.

    This concept got ported to mainline from the -rt tree, where it was originally
    implemented for rtmutexes by Steven Rostedt, based on work by Gregory Haskins.

    Testing with Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
    gave a 345% boost for VFS scalability on my testbox:

    # ./test-mutex-shm V 16 10 | grep "^avg ops"
    avg ops/sec: 296604

    # ./test-mutex-shm V 16 10 | grep "^avg ops"
    avg ops/sec: 85870

    The key criterion for the busy wait is that the lock owner has to be running
    on a (different) CPU. The idea is that as long as the owner is running, there
    is a fair chance it will release the lock soon, and thus we are better off
    spinning instead of blocking/scheduling (a rough sketch of this idea follows
    at the end of this entry).

    Since regular mutexes (as opposed to rtmutexes) do not atomically track the
    owner, we add the owner in a non-atomic fashion and deal with the races in
    the slowpath.

    Furthermore, to ease testing of the performance impact of this new code,
    there is a means to disable this behaviour at runtime (without having to
    reboot the system) when scheduler debugging is enabled (CONFIG_SCHED_DEBUG=y),
    by issuing the following command:

    # echo NO_OWNER_SPIN > /debug/sched_features

    The following command re-enables spinning (this is also the default):

    # echo OWNER_SPIN > /debug/sched_features

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
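    Below is a minimal user-space sketch of the owner-spin idea described in
    this entry. The sketch_mutex type, the owner_running() stub and block_on()
    are hypothetical stand-ins, not the mainline mutex slowpath; the real code
    inspects the owner's runqueue state and falls back to schedule().

        #include <sched.h>
        #include <stdatomic.h>
        #include <stdbool.h>

        struct sketch_mutex {
            atomic_int      locked;  /* 0 = free, 1 = held */
            _Atomic(void *) owner;   /* recorded loosely, may race with unlock */
        };

        /* Stub: in this sketch any recorded owner is treated as running;
         * the kernel would check the owner's runqueue state instead. */
        static bool owner_running(void *owner)
        {
            return owner != NULL;
        }

        /* Stub: stands in for enqueueing ourselves and calling schedule(). */
        static void block_on(struct sketch_mutex *m)
        {
            (void)m;
            sched_yield();
        }

        static void sketch_mutex_lock(struct sketch_mutex *m, void *self)
        {
            for (;;) {
                int expected = 0;

                if (atomic_compare_exchange_weak(&m->locked, &expected, 1)) {
                    atomic_store(&m->owner, self);
                    return;
                }

                /* Owner is running on another CPU: keep spinning, it will
                 * probably release the lock soon. */
                if (owner_running(atomic_load(&m->owner)))
                    continue;

                /* Owner appears to be asleep: blocking beats burning CPU. */
                block_on(m);
            }
        }

    Unlocking (not shown) would clear the owner before releasing the lock word,
    which is exactly where the races mentioned above have to be handled.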
     

05 Nov, 2008

1 commit

  • Impact: improve/change/fix wakeup-buddy scheduling

    Currently we only have a forward-looking buddy, that is, we prefer to
    schedule to the task we last woke up, under the presumption that it is
    going to consume the data we just produced, and therefore will have
    cache-hot benefits.

    This allows co-waking producer/consumer task pairs to run ahead of the
    pack for a little while, keeping their cache warm. Without this, we
    would interleave all pairs, utterly thrashing the cache.

    This patch introduces a backward-looking buddy: suppose that in the above
    scenario the consumer preempts the producer before the producer can go to
    sleep. We then miss the wakeup from consumer to producer (it is already
    running, after all), breaking the cycle and reverting to the cache-thrashing
    interleaved schedule pattern.

    The backward buddy will try to schedule back to the task that woke us
    up in case the forward buddy is not available, under the assumption
    that, barring current, the task that woke us is the most cache-hot
    task around.

    This basically allows a task to continue after it has been preempted (a
    rough sketch of the buddy preference follows at the end of this entry).

    In order to avoid starvation, we allow either buddy to get wakeup_gran
    ahead of the pack.

    Signed-off-by: Peter Zijlstra
    Acked-by: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
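    A rough sketch of the buddy preference described above (invented names, not
    the actual pick_next_entity() code): the forward buddy is tried first, the
    backward buddy second, and either one is used only if it is within
    wakeup_gran of the fair pick.

        #include <stddef.h>

        struct task;

        struct runqueue_sketch {
            struct task *next;   /* forward buddy: the task we last woke  */
            struct task *last;   /* backward buddy: the task that woke us */
        };

        /* Placeholder: the real code picks the leftmost entity in the rbtree. */
        static struct task *fair_pick(struct runqueue_sketch *rq)
        {
            (void)rq;
            return NULL;
        }

        /* Placeholder: the real check compares vruntimes against wakeup_gran. */
        static int within_wakeup_gran(struct task *buddy, struct task *fair)
        {
            (void)buddy;
            (void)fair;
            return 1;
        }

        static struct task *pick_next_sketch(struct runqueue_sketch *rq)
        {
            struct task *fair = fair_pick(rq);

            /* Prefer the task we woke: it likely consumes data we produced. */
            if (rq->next && within_wakeup_gran(rq->next, fair))
                return rq->next;

            /* Otherwise fall back to the task that woke us: after current,
             * it is assumed to be the most cache-hot task around. */
            if (rq->last && within_wakeup_gran(rq->last, fair))
                return rq->last;

            return fair;
        }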
     

20 Oct, 2008

1 commit

  • David Miller reported that hrtick update overhead has tripled the
    wakeup overhead on Sparc64.

    That is too much - disable the HRTICK feature for now by default,
    until a faster implementation is found.

    Reported-by: David Miller
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

22 Sep, 2008

2 commits

  • WAKEUP_OVERLAP is not a winner on a 16way box, running psql+sysbench:

             .27-rc7-NO_WAKEUP_OVERLAP   .27-rc7-WAKEUP_OVERLAP
    ----------------------------------------------------------------------
      1:                           694                      811    +14.39%
      2:                          1454                     1427     -1.86%
      4:                          3017                     3070     +1.70%
      8:                          5694                     5808     +1.96%
     16:                         10592                    10612     +0.19%
     32:                          9693                     9647     -0.48%
     64:                          8507                     8262     -2.97%
    128:                          8402                     7087    -18.55%
    256:                          8419                     5124    -64.30%
    512:                          7990                     3671   -117.62%
    ----------------------------------------------------------------------
    SUM:                         64466                    55524    -16.11%

    ... so turn it off by default.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Lin Ming reported a 10% OLTP regression against 2.6.27-rc4.

    The difference seems to come from different preemption aggressiveness,
    which affects the cache footprint of the workload and its effective
    cache thrashing.

    Aggressively preempt a task if its avg overlap is very small; this should
    avoid the task going to sleep and then finding it still running when we
    schedule back to it, saving a wakeup (a rough sketch of the check follows
    at the end of this entry).

    Reported-by: Lin Ming
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
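    A rough sketch of that check (the names and the nanosecond threshold are
    assumptions, not the exact kernel condition):

        #include <stdbool.h>
        #include <stdint.h>

        /*
         * Preempt right away when both the current task and the wakee tend to
         * run only very briefly around wakeups (small average overlap), so the
         * wakee does not go to sleep only to be found still runnable later.
         */
        static bool preempt_on_small_overlap(uint64_t curr_avg_overlap_ns,
                                             uint64_t wakee_avg_overlap_ns,
                                             uint64_t threshold_ns)
        {
            return curr_avg_overlap_ns < threshold_ns &&
                   wakee_avg_overlap_ns < threshold_ns;
        }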
     

21 Aug, 2008

1 commit

  • Yanmin reported a significant regression on his 16-core machine due to:

    commit 93b75217df39e6d75889cc6f8050343286aff4a5
    Author: Peter Zijlstra
    Date: Fri Jun 27 13:41:33 2008 +0200

    Flip back to the old behaviour.

    Reported-by: "Zhang, Yanmin"
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

27 Jun, 2008

5 commits

    Measurements show that the difference between the cgroup:/ and cgroup:/foo
    wake_affine() results is that the latter succeeds significantly more often.

    Therefore bias the calculations towards failing the test.

    Signed-off-by: Peter Zijlstra
    Cc: Srivatsa Vaddagiri
    Cc: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    We found that the affine wakeup code needs rather accurate load figures
    to be effective. The trouble is that updating the load figures is fairly
    expensive with group scheduling. Therefore rate-limit the updates (a rough
    sketch of such rate limiting follows at the end of this entry).

    Signed-off-by: Peter Zijlstra
    Cc: Srivatsa Vaddagiri
    Cc: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
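    A minimal sketch of such rate limiting (the names and the nanosecond
    interface are assumptions; the actual patch gates the load update on a
    tunable interval):

        #include <stdbool.h>
        #include <stdint.h>

        /* Skip the expensive load update unless at least min_interval_ns has
         * passed since the last one. */
        static bool should_update_load(uint64_t now_ns, uint64_t *last_update_ns,
                                       uint64_t min_interval_ns)
        {
            if (now_ns - *last_update_ns < min_interval_ns)
                return false;

            *last_update_ns = now_ns;
            return true;
        }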
     
    The bias given by the source/target_load functions can be very large; disable
    it by default to get faster convergence.

    Signed-off-by: Peter Zijlstra
    Cc: Srivatsa Vaddagiri
    Cc: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    calc_delta_asym() is supposed to do the same as calc_delta_fair() except
    linearly shrink the result for negative-nice processes - this gives them a
    smaller preemption threshold so that they are more easily preempted.

    The problem is that for task groups se->load.weight is the per-CPU share of
    the actual task group weight; take that into account (a rough sketch of the
    underlying weighting follows at the end of this entry).

    Also provide a debug switch to disable the asymmetry (which I still don't
    like - but it does greatly benefit some workloads)

    This would explain the interactivity issues reported against group scheduling.

    Signed-off-by: Peter Zijlstra
    Cc: Srivatsa Vaddagiri
    Cc: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
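    To make the weighting concrete, here is a simplified sketch of the fair
    scaling that calc_delta_asym() builds on (NICE_0_LOAD_SKETCH and the lack of
    rounding are assumptions; the asymmetric variant additionally shrinks the
    result for negative-nice tasks, and for a group entity the weight is the
    group's per-CPU share):

        #include <stdint.h>

        #define NICE_0_LOAD_SKETCH 1024ULL  /* assumed weight of a nice-0 task */

        /* Scale a runtime delta inversely with the entity's weight: heavier
         * entities see a smaller weighted delta. */
        static uint64_t calc_delta_fair_sketch(uint64_t delta, uint64_t weight)
        {
            return delta * NICE_0_LOAD_SKETCH / weight;
        }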
     
  • Try again..

    initial commit: 8f1bc385cfbab474db6c27b5af1e439614f3025c
    revert: f9305d4a0968201b2818dbed0dc8cb0d4ee7aeb3

    Signed-off-by: Peter Zijlstra
    Cc: Srivatsa Vaddagiri
    Cc: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Jun, 2008

1 commit


20 Apr, 2008

1 commit