Commit 8852aac25e79e38cc6529f20298eed154f60b574

Authored by Tejun Heo
1 parent 412d32e6c9

workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay

8376fe22c7 ("workqueue: implement mod_delayed_work[_on]()")
implemented mod_delayed_work[_on]() using the improved
try_to_grab_pending().  The function is later used, among others, to
replace [__]candel_delayed_work() + queue_delayed_work() combinations.

Unfortunately, a delayed_work item w/ zero @delay is handled slightly
differently by mod_delayed_work_on() compared to
queue_delayed_work_on().  The latter skips timer altogether and
directly queues it using queue_work_on() while the former schedules
timer which will expire on the closest tick.  This means, when @delay
is zero, that [__]cancel_delayed_work() + queue_delayed_work_on()
makes the target item immediately executable while
mod_delayed_work_on() may induce delay of upto a full tick.

This somewhat subtle difference breaks some of the converted users.
e.g. block queue plugging uses delayed_work for deferred processing
and uses mod_delayed_work_on() when the queue needs to be immediately
unplugged.  The above problem manifested as noticeably higher number
of context switches under certain circumstances.

The difference in behavior was caused by missing special case handling
for 0 delay in mod_delayed_work_on() compared to
queue_delayed_work_on().  Joonsoo Kim posted a patch to add it -
("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1].
The patch was queued for 3.8 but it was described as optimization and
I missed that it was a correctness issue.

As both queue_delayed_work_on() and mod_delayed_work_on() use
__queue_delayed_work() for queueing, it seems that the better approach
is to move the 0 delay special handling to the function instead of
duplicating it in mod_delayed_work_on().

Fix the problem by moving 0 delay special case handling from
queue_delayed_work_on() to __queue_delayed_work().  This replaces
Joonsoo's patch.

[1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Anders Kaseorg <andersk@MIT.EDU>
Reported-and-tested-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
LKML-Reference: <alpine.DEB.2.00.1211280953350.26602@dr-wily.mit.edu>
LKML-Reference: <50A78AA9.5040904@iskon.hr>
Cc: Joonsoo Kim <js1304@gmail.com>

Showing 1 changed file with 11 additions and 3 deletions Side-by-side Diff

... ... @@ -1364,6 +1364,17 @@
1364 1364 BUG_ON(timer_pending(timer));
1365 1365 BUG_ON(!list_empty(&work->entry));
1366 1366  
  1367 + /*
  1368 + * If @delay is 0, queue @dwork->work immediately. This is for
  1369 + * both optimization and correctness. The earliest @timer can
  1370 + * expire is on the closest next tick and delayed_work users depend
  1371 + * on that there's no such delay when @delay is 0.
  1372 + */
  1373 + if (!delay) {
  1374 + __queue_work(cpu, wq, &dwork->work);
  1375 + return;
  1376 + }
  1377 +
1367 1378 timer_stats_timer_set_start_info(&dwork->timer);
1368 1379  
1369 1380 /*
... ... @@ -1416,9 +1427,6 @@
1416 1427 struct work_struct *work = &dwork->work;
1417 1428 bool ret = false;
1418 1429 unsigned long flags;
1419   -
1420   - if (!delay)
1421   - return queue_work_on(cpu, wq, &dwork->work);
1422 1430  
1423 1431 /* read the comment in __queue_work() */
1424 1432 local_irq_save(flags);