Commit 723aae25d5cdb09962901d36d526b44d4be1051c

Authored by Milton Miller
Committed by Linus Torvalds
1 parent 45a5791920

smp_call_function_many: handle concurrent clearing of mask

Mike Galbraith reported finding a lockup ("perma-spin bug") where the
cpumask passed to smp_call_function_many was cleared by other cpu(s)
while a cpu was preparing its call_data block, resulting in no cpu to
clear the last ref and unlock the block.
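
For context, the block is only released by the last cpu to drop
data->refs on the receive side.  A condensed, illustrative sketch of
that path follows (the wrapper name is made up and details are elided;
the real logic lives in generic_smp_call_function_interrupt()).  If
refs is initialised from an empty mask it starts at zero, nobody ever
reaches the unlock, and the next sender spins forever on the csd lock.

/*
 * Condensed sketch of the receive side of this era; not the literal code.
 * The wrapper name is invented purely for illustration.
 */
void receive_side_sketch(int cpu)
{
	struct call_function_data *data;
	int refs;

	list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
		if (!cpumask_test_cpu(cpu, data->cpumask))
			continue;	/* entry not aimed at this cpu */

		data->csd.func(data->csd.info);

		refs = atomic_dec_return(&data->refs);
		if (refs)
			continue;	/* other cpus still hold references */

		/* Last reference: dequeue the entry and unlock the block
		 * (done under call_function.lock in the real code). */
		list_del_rcu(&data->csd.list);
		csd_unlock(&data->csd);
	}
}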

Having cpus clear their bit asynchronously could be useful on a mask of
cpus that might have a translation context, or cpus that need a push to
complete an rcu window.
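
As an illustration of that usage, a caller might let each target cpu
retract itself from the very mask it passes in.  This is a hypothetical
example, not code from this commit: pending_mask, push_rcu_window() and
kick_pending_cpus() are made-up names.

/* Hypothetical caller sketch -- names are illustrative only. */
static struct cpumask pending_mask;

static void push_rcu_window(void *info)
{
	/* ... per-cpu work ... */

	/*
	 * Each target cpu clears its own bit, possibly while a later
	 * smp_call_function_many(&pending_mask, ...) is still setting up
	 * its call_data block on another cpu.
	 */
	cpumask_clear_cpu(smp_processor_id(), &pending_mask);
}

static void kick_pending_cpus(void)
{
	/*
	 * The mask may go empty between reading it here and publishing
	 * the call_data block; that is the race this commit handles.
	 */
	smp_call_function_many(&pending_mask, push_rcu_window, NULL, false);
}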

Instead of adding a BUG_ON and requiring yet another cpumask copy, just
detect the race and handle it.

Note: arch_send_call_function_ipi_mask must still handle an empty
cpumask because the data block is globally visible before that arch
callback is made.  And (obviously) there are no guarantees as to which
cpus are notified if the mask is changed during the call; only cpus
that were online and had their mask bit set during the whole call are
guaranteed to be called.
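
For instance, an arch callback built around for_each_cpu() tolerates an
empty mask for free.  The sketch below is only illustrative:
send_ipi_single() and IPI_CALL_FUNC stand in for whatever IPI plumbing a
given architecture actually uses.

/* Sketch only -- the per-cpu IPI send is a made-up placeholder. */
void arch_send_call_function_ipi_mask(const struct cpumask *mask)
{
	int cpu;

	/* An empty mask simply means no iterations, hence no IPIs sent. */
	for_each_cpu(cpu, mask)
		send_ipi_single(cpu, IPI_CALL_FUNC);
}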

Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Jan Beulich <JBeulich@novell.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: stable@kernel.org
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 1 changed file with 10 additions and 3 deletions

@@ -450,7 +450,7 @@
 {
 	struct call_function_data *data;
 	unsigned long flags;
-	int cpu, next_cpu, this_cpu = smp_processor_id();
+	int refs, cpu, next_cpu, this_cpu = smp_processor_id();
 
 	/*
 	 * Can deadlock when called with interrupts disabled.
@@ -461,7 +461,7 @@
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 		     && !oops_in_progress && !early_boot_irqs_disabled);
 
-	/* So, what's a CPU they want? Ignoring this one. */
+	/* Try to fastpath. So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)
 		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
@@ -519,7 +519,14 @@
 	/* We rely on the "and" being processed before the store */
 	cpumask_and(data->cpumask, mask, cpu_online_mask);
 	cpumask_clear_cpu(this_cpu, data->cpumask);
+	refs = cpumask_weight(data->cpumask);
 
+	/* Some callers race with other cpus changing the passed mask */
+	if (unlikely(!refs)) {
+		csd_unlock(&data->csd);
+		return;
+	}
+
 	raw_spin_lock_irqsave(&call_function.lock, flags);
 	/*
 	 * Place entry at the _HEAD_ of the list, so that any cpu still
@@ -532,7 +539,7 @@
 	 * to the cpumask before this write to refs, which indicates
 	 * data is on the list and is ready to be processed.
 	 */
-	atomic_set(&data->refs, cpumask_weight(data->cpumask));
+	atomic_set(&data->refs, refs);
 	raw_spin_unlock_irqrestore(&call_function.lock, flags);
 
 	/*