Commit 8932a63d5edb02f714d50c26583152fe0a97a69c

Authored by Paul E. McKenney
1 parent d8169d4c36

rcu: Reduce cache-miss initialization latencies for large systems

Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper
limit of 16 on the leaf-level fanout for the rcu_node tree.  This was
needed to reduce lock contention that was induced by the synchronization
of scheduling-clock interrupts, which was in turn needed to improve
energy efficiency for moderate-sized lightly loaded servers.

However, reducing the leaf-level fanout means that there are more
leaf-level rcu_node structures in the tree, which in turn means that
RCU's grace-period initialization incurs more cache misses.  This is
not a problem on moderate-sized servers with only a few tens of CPUs,
but becomes a major source of real-time latency spikes on systems with
many hundreds of CPUs.  In addition, the workloads running on these large
systems tend to be CPU-bound, which eliminates the energy-efficiency
advantages of synchronizing scheduling-clock interrupts.  Therefore,
these systems need maximal values for the rcu_node leaf-level fanout.

This commit addresses this problem by introducing a new kernel
configuration parameter named RCU_FANOUT_LEAF that directly controls
the leaf-level fanout.  This parameter defaults to 16 to handle the
common case of moderate-sized lightly loaded servers, but may be set
higher on larger systems.

Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Showing 3 changed files with 31 additions and 8 deletions

@@ -458,6 +458,33 @@
 	  Select a specific number if testing RCU itself.
 	  Take the default if unsure.
 
+config RCU_FANOUT_LEAF
+	int "Tree-based hierarchical RCU leaf-level fanout value"
+	range 2 RCU_FANOUT if 64BIT
+	range 2 RCU_FANOUT if !64BIT
+	depends on TREE_RCU || TREE_PREEMPT_RCU
+	default 16
+	help
+	  This option controls the leaf-level fanout of hierarchical
+	  implementations of RCU, and allows trading off cache misses
+	  against lock contention. Systems that synchronize their
+	  scheduling-clock interrupts for energy-efficiency reasons will
+	  want the default because the smaller leaf-level fanout keeps
+	  lock contention levels acceptably low. Very large systems
+	  (hundreds or thousands of CPUs) will instead want to set this
+	  value to the maximum value possible in order to reduce the
+	  number of cache misses incurred during RCU's grace-period
+	  initialization. These systems tend to run CPU-bound, and thus
+	  are not helped by synchronized interrupts, and thus tend to
+	  skew them, which reduces lock contention enough that large
+	  leaf-level fanouts work well.
+
+	  Select a specific number if testing RCU itself.
+
+	  Select the maximum permissible value for large systems.
+
+	  Take the default if unsure.
+
 config RCU_FANOUT_EXACT
 	bool "Disable tree-based hierarchical RCU auto-balancing"
 	depends on TREE_RCU || TREE_PREEMPT_RCU
@@ -2418,7 +2418,7 @@
 
 	for (i = NUM_RCU_LVLS - 1; i > 0; i--)
 		rsp->levelspread[i] = CONFIG_RCU_FANOUT;
-	rsp->levelspread[0] = RCU_FANOUT_LEAF;
+	rsp->levelspread[0] = CONFIG_RCU_FANOUT_LEAF;
 }
 #else /* #ifdef CONFIG_RCU_FANOUT_EXACT */
 static void __init rcu_init_levelspread(struct rcu_state *rsp)
@@ -29,18 +29,14 @@
 #include <linux/seqlock.h>
 
 /*
- * Define shape of hierarchy based on NR_CPUS and CONFIG_RCU_FANOUT.
+ * Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
+ * CONFIG_RCU_FANOUT_LEAF.
  * In theory, it should be possible to add more levels straightforwardly.
  * In practice, this did work well going from three levels to four.
  * Of course, your mileage may vary.
  */
 #define MAX_RCU_LVLS 4
-#if CONFIG_RCU_FANOUT > 16
-#define RCU_FANOUT_LEAF 16
-#else /* #if CONFIG_RCU_FANOUT > 16 */
-#define RCU_FANOUT_LEAF (CONFIG_RCU_FANOUT)
-#endif /* #else #if CONFIG_RCU_FANOUT > 16 */
-#define RCU_FANOUT_1 (RCU_FANOUT_LEAF)
+#define RCU_FANOUT_1 (CONFIG_RCU_FANOUT_LEAF)
 #define RCU_FANOUT_2 (RCU_FANOUT_1 * CONFIG_RCU_FANOUT)
 #define RCU_FANOUT_3 (RCU_FANOUT_2 * CONFIG_RCU_FANOUT)
 #define RCU_FANOUT_4 (RCU_FANOUT_3 * CONFIG_RCU_FANOUT)