16 Jun, 2011

1 commit

  • Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
    that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
    Xeon E5520 + Intel S5520UR MB). He is using Cyrus IMAPd and it's built on
    a very traditional single-process model.

    * a master process which reads config files and manages the other
    processes
    * multiple imapd processes, one per connection
    * multiple pop3d processes, one per connection
    * multiple lmtpd processes, one per connection
    * periodic "cleanup" processes.

    There are thousands of independent processes. The problem is that recent
    Intel motherboards turn on zone_reclaim_mode by default, and traditional
    prefork-model software doesn't work well on it. Unfortunately, such models
    are still typical even in the 21st century. We can't ignore them.

    This patch raises the zone_reclaim_mode threshold (RECLAIM_DISTANCE) to
    30. The value 30 has no specific meaning, but the old threshold of 20
    catches one-hop QPI/HyperTransport distances, and such relatively cheap
    2-4 socket machines are often used for traditional servers as above. The
    intention is that these machines don't use zone_reclaim_mode.

    Note: ia64 and Power have arch-specific RECLAIM_DISTANCE definitions, so
    this patch doesn't change the behavior of such high-end NUMA machines.

    Dave Hansen said:

    : I know specifically of pieces of x86 hardware that set the information
    : in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
    : behavior which that implies.
    :
    : They've done performance testing and run very large and scary benchmarks
    : to make sure that they _want_ this turned on. What this means for them
    : is that they'll probably be de-optimized, at least on newer versions of
    : the kernel.
    :
    : If you want to do this for particular systems, maybe _that_'s what we
    : should do. Have a list of specific configurations that need the
    : defaults overridden either because they're buggy, or they have an
    : unusual hardware configuration not really reflected in the distance
    : table.

    And later said:

    : The original change in the hardware tables was for the benefit of a
    : benchmark. Said benchmark isn't going to get run on mainline until the
    : next batch of enterprise distros drops, at which point the hardware where
    : this was done will be irrelevant for the benchmark. I'm sure any new
    : hardware will just set this distance to another yet arbitrary value to
    : make the kernel do what it wants. :)
    :
    : Also, when the hardware got _set_ to this initially, I complained. So, I
    : guess I'm getting my way now, with this patch. I'm cool with it.

    Reported-by: Robert Mueller
    Signed-off-by: KOSAKI Motohiro
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Acked-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
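
    The gating that the new threshold feeds is tiny. Below is a minimal
    standalone C sketch of the idea -- not the kernel source -- with a
    hypothetical node_distance() standing in for the firmware-provided SLIT
    table; the real check lives in the page allocator's zonelist setup.

        #include <stdio.h>

        #define RECLAIM_DISTANCE 30     /* raised from 20 by this patch */

        /* Hypothetical stand-in for the firmware SLIT distances:
         * 10 for local, 21 for a one-hop QPI/HyperTransport neighbour. */
        static int node_distance(int a, int b)
        {
                return a == b ? 10 : 21;
        }

        int main(void)
        {
                int zone_reclaim_mode = 0;
                int local_node = 0, node;

                /* If any remote node is "far enough away", prefer reclaiming
                 * locally over allocating from it. */
                for (node = 0; node < 2; node++)
                        if (node_distance(local_node, node) > RECLAIM_DISTANCE)
                                zone_reclaim_mode = 1;

                /* With the old threshold of 20, this two-socket box prints 1;
                 * with 30 it prints 0 and stays off zone reclaim. */
                printf("zone_reclaim_mode = %d\n", zone_reclaim_mode);
                return 0;
        }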
     

10 Sep, 2010

1 commit

  • On top of the SMT and MC scheduling domains this adds the BOOK scheduling
    domain. This is useful for NUMA-like machines which do not have an
    interface that tells which piece of memory is attached to which node,
    or where the hardware performs striping.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
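
    For orientation, a purely illustrative C enum (assumed ordering and
    invented names, not a copy of the kernel's sched_domain_level) showing
    where the new BOOK level slots in between the cache-sharing and per-node
    levels:

        /* Illustrative only: scheduling-domain levels from smallest to
         * largest span once BOOK is added. */
        enum sd_level_sketch {
                SD_LV_SIBLING_SKETCH,   /* SMT threads of one core               */
                SD_LV_MC_SKETCH,        /* cores sharing a last-level cache      */
                SD_LV_BOOK_SKETCH,      /* "books": groups of sockets, e.g. s390 */
                SD_LV_CPU_SKETCH,       /* all CPUs of one node                  */
                SD_LV_NODE_SKETCH,      /* NUMA nodes                            */
        };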
     

10 Aug, 2010

1 commit

  • Define stubs for the numa_*_id() generic percpu-related functions for
    non-NUMA configurations, next to the other non-NUMA stubs.

    Fixes the ia64 !NUMA build breakage -- e.g., tiger_defconfig.

    Back out the now-unneeded '#ifndef CONFIG_NUMA' guards from ia64
    smpboot.c.

    Signed-off-by: Lee Schermerhorn
    Tested-by: Tony Luck
    Acked-by: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
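
    A rough sketch of what such non-NUMA stubs look like (assumed shape, not
    the exact hunk): on a single-node build everything is node 0 and the
    setters are no-ops.

        /* Sketch of !CONFIG_NUMA fallbacks: one node, nothing to track. */
        static inline int numa_node_id(void)            { return 0; }
        static inline void set_numa_node(int node)      { (void)node; }
        static inline void set_cpu_numa_node(int cpu, int node)
        {
                (void)cpu;
                (void)node;
        }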
     

09 Jun, 2010

1 commit

  • Check to see if the group is packed in a sched domain.

    This is primarily intended to be used at the sibling level. Some cores
    like POWER7 prefer to use lower-numbered SMT threads. In the case of
    POWER7, it can move to lower SMT modes only when higher threads are
    idle. When in lower SMT modes, the threads perform better since they
    share fewer core resources. Hence when we have idle threads, we want
    them to be the higher-numbered ones.

    This adds a hook into f_b_g() called check_asym_packing() to check the
    packing. This packing function is run on idle threads. It checks whether
    the busiest CPU in this domain (core in the P7 case) has a higher CPU
    number than the CPU the packing function is being run on. If it does,
    it calculates the imbalance and returns the busier, higher-numbered
    thread as the busiest group to f_b_g(). Here we are assuming a lower
    CPU number is equivalent to a lower SMT thread number.

    It also creates a new SD_ASYM_PACKING flag to enable this feature at
    any scheduler domain level.

    It also creates an arch hook to enable this feature at the sibling
    level. The default function doesn't enable this feature.

    Based heavily on patch from Peter Zijlstra.
    Fixes from Srivatsa Vaddagiri.

    Signed-off-by: Michael Neuling
    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Michael Neuling
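
    The decision itself is simple; here is a standalone C sketch of the idea
    (hypothetical helper name and CPU numbers, not the scheduler's code): an
    idle, lower-numbered thread pulls work from a higher-numbered one when
    the domain has SD_ASYM_PACKING set.

        #include <stdbool.h>
        #include <stdio.h>

        /* Illustration of the asymmetric-packing rule: pack work onto
         * low-numbered SMT threads so the high-numbered ones go idle. */
        static bool asym_packing_wants_pull(bool sd_asym_packing,
                                            int this_cpu, int busiest_cpu)
        {
                if (!sd_asym_packing)
                        return false;
                /* Only a lower-numbered (preferred) thread pulls downwards. */
                return this_cpu < busiest_cpu;
        }

        int main(void)
        {
                printf("%d\n", asym_packing_wants_pull(true, 0, 2)); /* 1: thread 0 pulls     */
                printf("%d\n", asym_packing_wants_pull(true, 3, 1)); /* 0: thread 3 leaves it */
                return 0;
        }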
     

28 May, 2010

2 commits

  • Introduce numa_mem_id(), based on generic percpu variable infrastructure
    to track "nearest node with memory" for archs that support memoryless
    nodes.

    Define the API in this header when CONFIG_HAVE_MEMORYLESS_NODES is
    defined, else provide stubs. Architectures will define
    HAVE_MEMORYLESS_NODES if/when they support memoryless nodes.

    Archs can override definitions of:

    numa_mem_id() - returns the node number of the "local memory" node
    set_numa_mem() - initialize [this cpu's] per-cpu variable 'numa_mem'
    cpu_to_mem() - return numa_mem for the specified cpu; may be used as an lvalue

    Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
    This will initialize the boot cpu at boot time, and all cpus on change of
    numa_zonelist_order, or when node or memory hot-plug requires zonelist
    rebuild. Archs that support memoryless nodes will need to initialize
    'numa_mem' for secondary cpus as they're brought on-line.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Christoph Lameter
    Cc: Tejun Heo
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Nick Piggin
    Cc: David Rientjes
    Cc: Eric Whitney
    Cc: KAMEZAWA Hiroyuki
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: "Luck, Tony"
    Cc: Pekka Enberg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
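
    A standalone toy model (hypothetical four-node topology, not the kernel
    implementation) of what the cached per-cpu 'numa_mem' value represents:
    the nearest node that actually has memory, which is what numa_mem_id()
    and cpu_to_mem() hand back on memoryless-node systems.

        #include <stdio.h>

        #define NR_NODES 4

        /* Assume node 1 is memoryless; its CPUs must fall back elsewhere. */
        static const int node_has_memory[NR_NODES] = { 1, 0, 1, 1 };

        /* Hypothetical SLIT-style distances: same node 10, adjacent pair of
         * nodes 21, otherwise 31. */
        static int node_distance(int a, int b)
        {
                if (a == b)
                        return 10;
                return (a / 2 == b / 2) ? 21 : 31;
        }

        /* What set_numa_mem() would cache per cpu at bring-up time. */
        static int nearest_node_with_memory(int node)
        {
                int best = -1, n;

                for (n = 0; n < NR_NODES; n++)
                        if (node_has_memory[n] &&
                            (best < 0 ||
                             node_distance(node, n) < node_distance(node, best)))
                                best = n;
                return best;
        }

        int main(void)
        {
                int node;

                for (node = 0; node < NR_NODES; node++)
                        printf("numa_mem for node %d -> node %d\n",
                               node, nearest_node_with_memory(node));
                return 0;
        }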
     
  • Rework the generic version of the numa_node_id() function to use the new
    generic percpu variable infrastructure.

    Guard the new implementation with a new config option:

    CONFIG_USE_PERCPU_NUMA_NODE_ID.

    Archs which support this new implementation will default this option to 'y'
    when NUMA is configured. This config option could be removed if/when all
    archs switch over to the generic percpu implementation of numa_node_id().
    Arch support involves:

    1) converting any existing per cpu variable implementations to use
    this implementation. x86_64 is an instance of such an arch.
    2) archs that don't use a per cpu variable for numa_node_id() will
    need to initialize the new per cpu variable "numa_node" as cpus
    are brought on-line. ia64 is an example.
    3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
    when NUMA is configured. This is required because I have
    retained the old implementation by default to allow archs to
    be modified incrementally, as desired.

    Subsequent patches will convert x86_64 and ia64 to use this implementation.

    Signed-off-by: Lee Schermerhorn
    Cc: Tejun Heo
    Cc: Mel Gorman
    Reviewed-by: Christoph Lameter
    Cc: Nick Piggin
    Cc: David Rientjes
    Cc: Eric Whitney
    Cc: KAMEZAWA Hiroyuki
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: "Luck, Tony"
    Cc: Pekka Enberg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
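
    Conceptually, the generic numa_node_id() becomes a read of a per-cpu
    integer that the architecture fills in as CPUs come online. A standalone
    sketch of that shape (a plain array stands in for the per-cpu variable;
    names are illustrative):

        #include <stdio.h>

        #define NR_CPUS 4

        /* Stand-in for the per-cpu variable 'numa_node'. */
        static int numa_node_of_cpu[NR_CPUS];

        /* Arch bring-up code records each CPU's node once... */
        static void set_cpu_numa_node_sketch(int cpu, int node)
        {
                numa_node_of_cpu[cpu] = node;
        }

        /* ...and numa_node_id()/cpu_to_node() become cheap reads. */
        static int cpu_to_node_sketch(int cpu)
        {
                return numa_node_of_cpu[cpu];
        }

        int main(void)
        {
                int cpu;

                for (cpu = 0; cpu < NR_CPUS; cpu++)      /* two CPUs per node */
                        set_cpu_numa_node_sketch(cpu, cpu / 2);

                for (cpu = 0; cpu < NR_CPUS; cpu++)
                        printf("cpu %d -> node %d\n", cpu, cpu_to_node_sketch(cpu));
                return 0;
        }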
     

21 Jan, 2010

1 commit

  • SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
    enabled, leading to many cache misses on large machines as we traverse
    looking for an idle shared cache to wake to. Change the enabler of
    select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
    sibling domain level.

    Reported-by: Lin Ming
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

14 Oct, 2009

1 commit

  • Yanmin reported that both tbench and hackbench were significantly
    hurt by trying to keep tasks local on these domains, esp on small
    cache machines.

    So disable it in order to promote spreading outside of the cache
    domains.

    Reported-by: "Zhang, Yanmin"
    Signed-off-by: Peter Zijlstra
    CC: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

16 Sep, 2009

2 commits

  • Sysbench thinks SD_BALANCE_WAKE is too aggressive, and kbuild doesn't
    really mind too much; SD_BALANCE_NEWIDLE picks up most of the slack.

    On a dual socket, quad core, dual thread nehalem system:

    sysbench (--num_threads=16):

    SD_BALANCE_WAKE-: 13982 tx/s
    SD_BALANCE_WAKE+: 15688 tx/s

    kbuild (-j16):

    SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% )
    SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% )

    (same within noise)

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • And turn it on for NUMA and MC domains. This improves locality in
    balancing decisions by keeping up to a capacity's worth of tasks local
    before looking for idle CPUs (and twice the capacity if
    SD_POWERSAVINGS_BALANCE is set).

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Sep, 2009

5 commits

  • If we're looking to place a new task, we might as well find the
    idlest position _now_, not 1 tick ago.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Make the idle balancer more aggressive, to improve an x264 encoding
    workload provided by Jason Garrett-Glaser:

    NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 252.82 fps, 22096.60 kb/s
    encoded 600 frames, 250.69 fps, 22096.60 kb/s
    encoded 600 frames, 245.76 fps, 22096.60 kb/s

    NO_NEXT_BUDDY LB_BIAS
    encoded 600 frames, 344.44 fps, 22096.60 kb/s
    encoded 600 frames, 346.66 fps, 22096.60 kb/s
    encoded 600 frames, 352.59 fps, 22096.60 kb/s

    NO_NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 425.75 fps, 22096.60 kb/s
    encoded 600 frames, 425.45 fps, 22096.60 kb/s
    encoded 600 frames, 422.49 fps, 22096.60 kb/s

    Peter pointed out that this is better done via newidle_idx,
    not via LB_BIAS, newidle balancing should look for where
    there is load _now_, not where there was load 2 ticks ago.

    Worst-case latencies are improved as well, since having no buddies
    means less vruntime spread (as per prior lkml discussions).

    This change improves kbuild-peak parallelism as well.

    Reported-by: Jason Garrett-Glaser
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • CPU level should have WAKE_AFFINE, whereas ALLNODES is dubious.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • When merging select_task_rq_fair() and sched_balance_self() we lost
    the use of wake_idx, restore that and set them to 0 to make wake
    balancing more aggressive.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The problem with wake_idle() is that it doesn't respect things like
    cpu_power, which means it doesn't deal well with SMT or the recent
    RT interaction.

    To cure this, it needs to do what sched_balance_self() does, which
    leads to the possibility of merging select_task_rq_fair() and
    sched_balance_self().

    Modify sched_balance_self() to:

    - update_shares() when walking up the domain tree,
    (it only called it for the top domain, but it should
    have done this anyway), which allows us to remove
    this ugly bit from try_to_wake_up().

    - do wake_affine() on the smallest domain that contains
    both this (the waking) and the prev (the wakee) cpu for
    WAKE invocations.

    Then use the top-down balance steps it had to replace wake_idle().

    This leads to the disappearance of SD_WAKE_BALANCE and
    SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced by SD_BALANCE_WAKE.

    SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.

    Touch all topology bits to replace the old SD flags with the new ones --
    platforms might need re-tuning. Enabling SD_BALANCE_WAKE conditionally on
    NUMA distance seems like a good additional feature; Magny-Cours and small
    Nehalem systems would want this enabled, while systems with slow
    interconnects would not.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
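
    One piece of the rework is easy to picture: pick the smallest scheduling
    domain whose span contains both the waking CPU and the wakee's previous
    CPU, and make the wake_affine()/SD_BALANCE_WAKE decision there. A toy
    standalone sketch with made-up spans (bitmasks for an 8-CPU, two-package
    box), not the kernel's data structures:

        #include <stdio.h>

        /* Domain spans for CPU 0, smallest to largest:
         * sibling {0,1}, MC {0-3}, node {0-7}. */
        static const unsigned int spans[] = { 0x03, 0x0f, 0xff };

        static int smallest_common_domain(int waking_cpu, int prev_cpu)
        {
                unsigned int need = (1u << waking_cpu) | (1u << prev_cpu);
                int level;

                for (level = 0; level < 3; level++)
                        if ((spans[level] & need) == need)
                                return level;
                return -1;
        }

        int main(void)
        {
                printf("%d\n", smallest_common_domain(0, 1)); /* 0: same core     */
                printf("%d\n", smallest_common_domain(0, 3)); /* 1: same package  */
                printf("%d\n", smallest_common_domain(0, 6)); /* 2: cross package */
                return 0;
        }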
     

04 Sep, 2009

3 commits

  • Start the re-tuning of the balancer by turning on newidle.

    It improves hackbench performance and parallelism on a 4x4 box.
    The "perf stat --repeat 10" measurements give us:

    domain0 domain1
    .......................................
    -SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
    2041.273208 task-clock-msecs # 9.354 CPUs ( +- 0.363% )

    +SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
    2086.326925 task-clock-msecs # 11.934 CPUs ( +- 0.301% )

    +SD_BALANCE_NEWIDLE +SD_BALANCE_NEWIDLE:
    2115.289791 task-clock-msecs # 12.158 CPUs ( +- 0.263% )

    Acked-by: Peter Zijlstra
    Cc: Andreas Herrmann
    Cc: Andreas Herrmann
    Cc: Gautham R Shenoy
    Cc: Balbir Singh
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Re-organize the flag settings so that it's visible at a glance
    which sched-domains flags are set and which not.

    With the new balancer code we'll need to re-tune these details
    anyway, so make it cleaner to make fewer mistakes down the
    road ;-)

    Cc: Peter Zijlstra
    Cc: Andreas Herrmann
    Cc: Andreas Herrmann
    Cc: Gautham R Shenoy
    Cc: Balbir Singh
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The idea is that multi-threading a core yields more work capacity than
    a single thread; provide a way to express a static gain for the extra
    threads.

    Signed-off-by: Peter Zijlstra
    Tested-by: Andreas Herrmann
    Acked-by: Andreas Herrmann
    Acked-by: Gautham R Shenoy
    Cc: Balbir Singh
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
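
    The gain is plain arithmetic: treat the whole multi-threaded core as
    worth a bit more than one thread, then split that capacity among its
    threads. A standalone sketch using an assumed ~15% gain figure
    (1178 vs. the usual 1024 load scale); the exact default is not quoted
    in this log entry.

        #include <stdio.h>

        #define SCHED_LOAD_SCALE 1024

        int main(void)
        {
                unsigned int smt_gain = 1178;   /* assumed: ~15% over one thread */
                unsigned int nr_threads;

                for (nr_threads = 2; nr_threads <= 4; nr_threads++)
                        printf("%u SMT threads -> per-thread capacity %u (single thread = %u)\n",
                               nr_threads, smt_gain / nr_threads, SCHED_LOAD_SCALE);
                return 0;
        }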
     

13 Mar, 2009

2 commits

  • Impact: cleanup, potential bugfix

    Not sure what changed to expose this, but clearly numa_node_id() doesn't
    belong in mmzone.h (the inline in gfp.h is probably overkill, too).

    In file included from include/linux/topology.h:34,
    from arch/x86/mm/numa.c:2:
    /home/rusty/patches-cpumask/linux-2.6/arch/x86/include/asm/topology.h:64:1: warning: "numa_node_id" redefined
    In file included from include/linux/topology.h:32,
    from arch/x86/mm/numa.c:2:
    include/linux/mmzone.h:770:1: warning: this is the location of the previous definition

    Signed-off-by: Rusty Russell
    Cc: Mike Travis
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
  • Impact: cleanup

  • node_to_cpumask (and the blecherous node_to_cpumask_ptr, which contained
    a declaration) are replaced now that everyone implements
    cpumask_of_node.

    Signed-off-by: Rusty Russell

    Rusty Russell
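
    The shape of the replacement API is the interesting part: instead of
    returning a whole cpumask_t by value (or smuggling a declaration through
    node_to_cpumask_ptr), callers get a pointer to a constant mask and
    iterate it. A toy standalone sketch with a bitmask standing in for
    cpumask_t (invented names, not kernel code):

        #include <stdio.h>

        /* Toy stand-in: a "cpumask" is just a bitmask of 8 CPUs. */
        static const unsigned long node_masks[2] = { 0x0f, 0xf0 };

        /* cpumask_of_node()-style accessor: hand back a pointer, copy nothing. */
        static const unsigned long *cpumask_of_node_sketch(int node)
        {
                return &node_masks[node];
        }

        int main(void)
        {
                const unsigned long *mask = cpumask_of_node_sketch(1);
                int cpu;

                for (cpu = 0; cpu < 8; cpu++)
                        if (*mask & (1ul << cpu))
                                printf("cpu %d is on node 1\n", cpu);
                return 0;
        }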
     

19 Dec, 2008

2 commits

  • Impact: change task balancing to save power more aggressively

    Add SD_BALANCE_NEWIDLE flag at MC level and CPU level
    if sched_mc is set. This helps power savings and
    will not affect performance when sched_mc=0

    Ingo and Mike Galbraith have optimised the SD flags by
    removing SD_BALANCE_NEWIDLE at MC and CPU level. This
    helps performance but hurts power savings since this
    slows down task consolidation by reducing the number
    of times load_balance is run.

    sched: fine-tune SD_MC_INIT
    commit 14800984706bf6936bbec5187f736e928be5c218
    Author: Mike Galbraith
    Date: Fri Nov 7 15:26:50 2008 +0100

    sched: re-tune balancing -- revert
    commit 9fcd18c9e63e325dbd2b4c726623f760788d5aa8
    Author: Ingo Molnar
    Date: Wed Nov 5 16:52:08 2008 +0100

    This patch selectively enables SD_BALANCE_NEWIDLE flag
    only when sched_mc is set to 1 or 2. This helps power savings
    by task consolidation and also does not hurt performance at
    sched_mc=0 where all power saving optimisations are turned off.

    Signed-off-by: Vaidyanathan Srinivasan
    Acked-by: Balbir Singh
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Vaidyanathan Srinivasan
     
  • Impact: cleanup

    BALANCE_FOR_MC_POWER and similar macros defined in sched.h are not
    constants; they have various condition checks and a significant amount
    of code that is not suitable to be contained in a macro. Also, there
    could be side effects on the expressions passed to some of them, like
    test_sd_parent().

    This patch converts all complex macros related to power savings
    balance to inline functions.

    Signed-off-by: Vaidyanathan Srinivasan
    Acked-by: Balbir Singh
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Vaidyanathan Srinivasan
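
    The motivation is the usual one for function-like macros; a minimal
    invented example (not the actual sched.h definitions) of the hazard and
    of the inline-function form that replaces it:

        static int sched_mc_power_savings_sketch = 1;

        /* Macro form: 'cpu' is evaluated twice, so an argument with side
         * effects (e.g. next_cpu++) would be applied twice. */
        #define POWERSAVINGS_BALANCE_SKETCH(cpu) \
                (((cpu) >= 0) && sched_mc_power_savings_sketch && !((cpu) & 1))

        /* Inline-function form: the argument is evaluated exactly once and
         * is type-checked, with no loss of efficiency. */
        static inline int powersavings_balance_sketch(int cpu)
        {
                return cpu >= 0 && sched_mc_power_savings_sketch && !(cpu & 1);
        }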
     

07 Nov, 2008

2 commits

  • Fine-tune the HT sched-domains parameters as well.

    On a HT capable box, this increases lat_ctx performance from 23.87
    usecs to 1.49 usecs:

    # before

    $ ./lat_ctx -s 0 2

    "size=0k ovr=1.89
    2 23.87

    # after

    $ ./lat_ctx -s 0 2

    "size=0k ovr=1.84
    2 1.49

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Tune SD_MC_INIT the same way as SD_CPU_INIT:
    unset SD_BALANCE_NEWIDLE, and set SD_WAKE_BALANCE.

    This improves vmark by 5%:

    vmark 132102 125968 125497 messages/sec avg 127855.66 .984
    vmark 139404 131719 131272 messages/sec avg 134131.66 1.033

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    # *DOCUMENTATION*

    Mike Galbraith
     

06 Nov, 2008

1 commit

  • Impact: improve wakeup affinity on NUMA systems, tweak SMP systems

    Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
    balancing defaults on NUMA and SMP systems.

    Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
    why we would not want to have wakeup affinity across nodes as well.
    (we already do this in the standard NUMA template.)

    lat_ctx on a NUMA box is particularly happy about this change:

    before:

    | phoenix:~/l> ./lat_ctx -s 0 2
    | "size=0k ovr=2.60
    | 2 5.70

    after:

    | phoenix:~/l> ./lat_ctx -s 0 2
    | "size=0k ovr=2.65
    | 2 2.07

    a 2.75x speedup.

    pipe-test is similarly happy about it too:

    | phoenix:~/sched-tests> ./pipe-test
    | 18.26 usecs/loop.
    | 14.70 usecs/loop.
    | 14.38 usecs/loop.
    | 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1
    | 8.63 usecs/loop.
    | 8.59 usecs/loop.
    | 9.03 usecs/loop.
    | 8.94 usecs/loop.
    | 8.96 usecs/loop.
    | 8.63 usecs/loop.

    Also:

    - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings)
    - enable SD_WAKE_BALANCE on SMP domains

    Sysbench+postgresql improves all around the board, quite significantly:

    .28-rc3-11474e2c .28-rc3-11474e2c-tune
    -------------------------------------------------
    1: 571 688 +17.08%
    2: 1236 1206 -2.55%
    4: 2381 2642 +9.89%
    8: 4958 5164 +3.99%
    16: 9580 9574 -0.07%
    32: 7128 8118 +12.20%
    64: 7342 8266 +11.18%
    128: 7342 8064 +8.95%
    256: 7519 7884 +4.62%
    512: 7350 7731 +4.93%
    -------------------------------------------------
    SUM: 55412 59341 +6.62%

    So it's a win both for the runup portion, the peak area and the tail.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

13 Jun, 2008

1 commit

  • This can result in an empty topology directory in sysfs, and requires
    in-kernel users to protect all uses with #ifdef.

    The documentation of CPU topology specifies what the defaults should be if
    only partial information is available from the hardware. So we can
    provide these defaults as a fallback.

    This patch:

    - Adds default definitions of the 4 topology macros to this header
    - Changes drivers/base/topology.c to use the topology macros
    unconditionally and to cope with definitions that aren't lvalues
    - Updates the documentation accordingly

    [ From: Andrew Morton
    - fold now-duplicated code
    - fix layout
    ]

    Signed-off-by: Ben Hutchings
    Cc: Vegard Nossum
    Cc: Nick Piggin
    Cc: Chandra Seetharaman
    Cc: Suresh Siddha
    Cc: Mike Travis
    Cc: Christoph Lameter
    Cc: John Hawkes
    Cc: Zhang, Yanmin
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Ben Hutchings
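
    A sketch of the fallback pattern being described (assumed macro bodies;
    the real defaults may differ): each macro is defined only if the
    architecture has not already provided it, with a conservative default.

        /* Only fill in what the arch left undefined. */
        #ifndef topology_physical_package_id
        #define topology_physical_package_id(cpu)  ((void)(cpu), -1)  /* unknown */
        #endif

        #ifndef topology_core_id
        #define topology_core_id(cpu)              ((void)(cpu), 0)
        #endif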
     

29 May, 2008

1 commit

  • Improve the sysbench ramp-up phase and its peak throughput on a 16-way
    NUMA box by turning on WAKE_AFFINE:

    tip/sched tip/sched+wake-affine
    -------------------------------------------------
    1: 700 830 +15.65%
    2: 1465 1391 -5.28%
    4: 3017 3105 +2.81%
    8: 5100 6021 +15.30%
    16: 10725 10745 +0.19%
    32: 10135 10150 +0.16%
    64: 9338 9240 -1.06%
    128: 8599 8252 -4.21%
    256: 8475 8144 -4.07%
    -------------------------------------------------
    SUM: 57558 57882 +0.56%

    this change also improves lat_ctx from 6.69 usecs to 1.11 usec:

    $ ./lat_ctx -s 0 2
    "size=0k ovr=1.19
    2 1.11

    $ ./lat_ctx -s 0 2
    "size=0k ovr=1.22
    2 6.69

    in sysbench it's an overall win with some weakness at the lots-of-clients
    side. That happens because we now under-balance this workload
    a bit. To counter that effect, turn on NEWIDLE:

    wake-idle wake-idle+newidle
    -------------------------------------------------
    1: 830 834 +0.43%
    2: 1391 1401 +0.65%
    4: 3105 3091 -0.43%
    8: 6021 6046 +0.42%
    16: 10745 10736 -0.08%
    32: 10150 10206 +0.55%
    64: 9240 9533 +3.08%
    128: 8252 8355 +1.24%
    256: 8144 8384 +2.87%
    -------------------------------------------------
    SUM: 57882 58591 +1.21%

    as a bonus this not only improves the many-clients case but
    also improves the (more important) rampup phase.

    sysbench is a workload that quickly breaks down if the
    scheduler over-balances, so since it showed an improvement
    under NEWIDLE this change is definitely good.

    Ingo Molnar
     

20 Apr, 2008

1 commit

  • * Remove empty cpumask_t (and all non-zero/non-null) variables
    in SD_*_INIT macros. Use memset(0) to clear. Also, don't
    inline the initializer functions to save on stack space in
    build_sched_domains().

    * Merge change to include/linux/topology.h that uses the new
    node_to_cpumask_ptr function in the nr_cpus_node macro into
    this patch.

    Depends on:
    [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
    [sched-devel]: sched: add new set_cpus_allowed_ptr function

    Cc: H. Peter Anvin
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     
