20 Jun, 2017

1 commit

  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
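
    The rename amounts to the typedef change sketched below (declarations
    only; the struct members are not touched by this patch, so they are
    elided here):

        /* before: the name suggests a queue, but this is an entry */
        struct __wait_queue;
        typedef struct __wait_queue wait_queue_t;

        /* after: the entry is named as what it is, and the struct can
         * drop its double underscore */
        struct wait_queue_entry;
        typedef struct wait_queue_entry wait_queue_entry_t;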
     

02 May, 2017

2 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - another round of rq-clock handling debugging, robustization and
    fixes

    - PELT accounting improvements

    - CPU hotplug related ->cpus_allowed affinity handling fixes all
    around the tree

    - ... plus misc fixes, cleanups and updates"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
    sched/x86: Update reschedule warning text
    crypto: N2 - Replace racy task affinity logic
    cpufreq/sparc-us2e: Replace racy task affinity logic
    cpufreq/sparc-us3: Replace racy task affinity logic
    cpufreq/sh: Replace racy task affinity logic
    cpufreq/ia64: Replace racy task affinity logic
    ACPI/processor: Replace racy task affinity logic
    ACPI/processor: Fix error handling in __acpi_processor_start()
    sparc/sysfs: Replace racy task affinity logic
    powerpc/smp: Replace open coded task affinity logic
    ia64/sn/hwperf: Replace racy task affinity logic
    ia64/salinfo: Replace racy task affinity logic
    workqueue: Provide work_on_cpu_safe()
    ia64/topology: Remove cpus_allowed manipulation
    sched/fair: Move the PELT constants into a generated header
    sched/fair: Increase PELT accuracy for small tasks
    sched/fair: Fix comments
    sched/Documentation: Add 'sched-pelt' tool
    sched/fair: Fix corner case in __accumulate_sum()
    sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()
    ...

    Linus Torvalds
     
  • Pull workqueue update from Tejun Heo:
    "One trivial patch to use setup_deferrable_timer() instead of
    open-coding the initialization"

    * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: use setup_deferrable_timer

    Linus Torvalds
     

15 Apr, 2017

1 commit

  • work_on_cpu() is not protected against CPU hotplug. Code which must either
    run on an online CPU or fail when the CPU is not available would otherwise
    have to protect against CPU hotplug at every call site.

    Provide a function which does get/put_online_cpus() around the call to
    work_on_cpu() and fails the call with -ENODEV if the target CPU is not
    online.

    Preparatory patch to convert several racy task affinity manipulations.

    Signed-off-by: Thomas Gleixner
    Acked-by: Tejun Heo
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Herbert Xu
    Cc: "Rafael J. Wysocki"
    Cc: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Sebastian Siewior
    Cc: Lai Jiangshan
    Cc: Viresh Kumar
    Cc: Michael Ellerman
    Cc: "David S. Miller"
    Cc: Len Brown
    Link: http://lkml.kernel.org/r/20170412201042.262610721@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
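
    A minimal sketch of such a wrapper, reconstructed from the description
    above (mainline names the helper work_on_cpu_safe(); the body below is
    illustrative rather than the exact patch):

        #include <linux/cpu.h>
        #include <linux/workqueue.h>

        /* Run @fn on @cpu, or fail with -ENODEV if @cpu is not online.
         * get/put_online_cpus() keep @cpu from being unplugged while
         * the work runs. */
        long work_on_cpu_safe(int cpu, long (*fn)(void *), void *arg)
        {
                long ret = -ENODEV;

                get_online_cpus();
                if (cpu_online(cpu))
                        ret = work_on_cpu(cpu, fn, arg);
                put_online_cpus();
                return ret;
        }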
     

10 Feb, 2017

1 commit

  • Currently CONFIG_TIMER_STATS exposes process information across namespaces:

    kernel/time/timer_list.c print_timer():

    SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);

    /proc/timer_list:

    #11: , hrtimer_wakeup, S:01, do_nanosleep, cron/2570

    Given that the tracer can give the same information, this patch entirely
    removes CONFIG_TIMER_STATS.

    Suggested-by: Thomas Gleixner
    Signed-off-by: Kees Cook
    Acked-by: John Stultz
    Cc: Nicolas Pitre
    Cc: linux-doc@vger.kernel.org
    Cc: Lai Jiangshan
    Cc: Shuah Khan
    Cc: Xing Gao
    Cc: Jonathan Corbet
    Cc: Jessica Frazelle
    Cc: kernel-hardening@lists.openwall.com
    Cc: Nicolas Iooss
    Cc: "Paul E. McKenney"
    Cc: Petr Mladek
    Cc: Richard Cochran
    Cc: Tejun Heo
    Cc: Michal Marek
    Cc: Josh Poimboeuf
    Cc: Dmitry Vyukov
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Olof Johansson
    Cc: Andrew Morton
    Cc: linux-api@vger.kernel.org
    Cc: Arjan van de Ven
    Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
    Signed-off-by: Thomas Gleixner

    Kees Cook
     

20 Oct, 2016

1 commit

  • While splitting up workqueue initialization into two parts,
    ac8f73400782 ("workqueue: make workqueue available early during boot")
    put wq_numa_init() into workqueue_init_early(). Unfortunately, on
    some archs, including power and arm64, the cpu-to-node mapping isn't
    yet established by the time the early init is called, leading to
    incorrect NUMA initialization and, subsequently, the following oops
    due to a zero cpumask on node-specific unbound pools.

    Unable to handle kernel paging request for data at address 0x00000038
    Faulting instruction address: 0xc0000000000fc0cc
    Oops: Kernel access of bad area, sig: 11 [#1]
    SMP NR_CPUS=2048 NUMA PowerNV
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161005 #94
    task: c0000007f5400000 task.stack: c000001ffc084000
    NIP: c0000000000fc0cc LR: c0000000000ed928 CTR: c0000000000fbfd0
    REGS: c000001ffc087780 TRAP: 0300 Not tainted (4.8.0-compiler_gcc-6.2.0-next-20161005)
    MSR: 9000000002009033 CR: 48000424 XER: 00000000
    CFAR: c0000000000089dc DAR: 0000000000000038 DSISR: 40000000 SOFTE: 0
    GPR00: c0000000000ed928 c000001ffc087a00 c000000000e63200 c000000010d6d600
    GPR04: c0000007f5409200 0000000000000021 000000000748e08c 000000000000001f
    GPR08: 0000000000000000 0000000000000021 000000000748f1f8 0000000000000000
    GPR12: 0000000028000422 c00000000fb80000 c00000000000e0c8 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000021 0000000000000001
    GPR20: ffffffffafb50401 0000000000000000 c000000010d6d600 000000000000ba7e
    GPR24: 000000000000ba7e c000000000d8bc58 afb504000afb5041 0000000000000001
    GPR28: 0000000000000000 0000000000000004 c0000007f5409280 0000000000000000
    NIP [c0000000000fc0cc] enqueue_task_fair+0xfc/0x18b0
    LR [c0000000000ed928] activate_task+0x78/0xe0
    Call Trace:
    [c000001ffc087a00] [c0000007f5409200] 0xc0000007f5409200 (unreliable)
    [c000001ffc087b10] [c0000000000ed928] activate_task+0x78/0xe0
    [c000001ffc087b50] [c0000000000ede58] ttwu_do_activate+0x68/0xc0
    [c000001ffc087b90] [c0000000000ef1b8] try_to_wake_up+0x208/0x4f0
    [c000001ffc087c10] [c0000000000d3484] create_worker+0x144/0x250
    [c000001ffc087cb0] [c000000000cd72d0] workqueue_init+0x124/0x150
    [c000001ffc087d00] [c000000000cc0e74] kernel_init_freeable+0x158/0x360
    [c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
    [c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
    Instruction dump:
    62940401 3b800000 3aa00000 7f17c378 3a600001 3b600001 60000000 60000000
    60420000 72490021 ebfe0150 2f890001 419e0de0 7fbee840 419e0e58
    ---[ end trace 0000000000000000 ]---

    Fix it by moving wq_numa_init() to workqueue_init(). As this means
    that the early initialization may not have full NUMA info for per-cpu
    pools and ignores NUMA affinity for unbound pools, fix them up from
    workqueue_init() after wq_numa_init().

    Signed-off-by: Tejun Heo
    Reported-by: Michael Ellerman
    Link: http://lkml.kernel.org/r/87twck5wqo.fsf@concordia.ellerman.id.au
    Fixes: ac8f73400782 ("workqueue: make workqueue available early during boot")
    Signed-off-by: Tejun Heo

    Tejun Heo
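
    The shape of the fix, per the description above, is roughly the
    following (simplified sketch; the locking and the unbound-pool
    fix-up of the real patch are elided):

        void __init workqueue_init(void)
        {
                struct worker_pool *pool;
                int cpu;

                /* cpu-to-node mappings are reliable by now, so do the
                 * NUMA setup here rather than in workqueue_init_early() */
                wq_numa_init();

                /* fix up per-cpu pools created early with incomplete
                 * NUMA info */
                for_each_possible_cpu(cpu)
                        for_each_cpu_worker_pool(pool, cpu)
                                pool->node = cpu_to_node(cpu);

                /* ... then create initial workers and bring workqueues
                 * online as before ... */
        }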
     

12 Oct, 2016

1 commit

  • Patch series "kthread: Kthread worker API improvements"

    The intention of this patchset is to make it easier to manipulate and
    maintain kthreads. Especially, I want to replace all the custom main
    cycles with a generic one. Also I want to make the kthreads sleep in a
    consistent state in a common place when there is no work.

    This patch (of 11):

    It is good practice to prefix function names with the name of the
    subsystem.

    This patch fixes the name of probe_kthread_data(). The other wrongly
    named functions are part of the kthread worker API and will be fixed
    separately.

    Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Suggested-by: Andrew Morton
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
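
    For illustration, the convention turns the name around like this
    (the post-rename name shown, kthread_probe_data(), is the one used
    in mainline):

        /* before: the subsystem name is buried in the middle */
        void *probe_kthread_data(struct task_struct *task);

        /* after: prefixed with the subsystem, like other kthread_*() APIs */
        void *kthread_probe_data(struct task_struct *task);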
     

18 Sep, 2016

2 commits

  • keventd_up() no longer has in-kernel users. Remove it and make
    wq_online static.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Workqueue is currently initialized in an early init call; however,
    there are cases where early boot code has to be split and reordered
    to come after workqueue initialization, or where the same code path
    is used both before and after workqueue initialization. The latter
    cases have to gate workqueue usage with keventd_up() tests, which is
    nasty and easy to get wrong.

    Workqueue usage has become widespread and it'd be a lot more
    convenient if it could be used very early during boot. This patch
    splits workqueue initialization into two steps: workqueue_init_early(),
    which sets up the basic data structures so that workqueues can be
    created and work items queued, and workqueue_init(), which actually
    brings workqueues online and starts executing queued work items. The
    former step can be done very early during boot, once memory
    allocation, cpumasks and idr are initialized; the latter, right after
    kthreads become available.

    This allows work item queueing and canceling from very early boot
    which is what most of these use cases want.

    * As system_wq being initialized no longer indicates that workqueue
    is fully online, update keventd_up() to test wq_online instead.
    The follow-up patches will get rid of all its usages and the
    function itself.

    * Flushing doesn't make sense before workqueue is fully initialized.
    The flush functions trigger a WARN and return immediately when
    called before fully online.

    * Work items are never in-flight before fully online. Canceling can
    always succeed by skipping the flush step.

    * Some code paths can no longer assume they are called with irqs
    enabled, as irqs are disabled during early boot. Use irqsave/restore
    operations instead.

    v2: Watchdog init, which requires timer to be running, moved from
    workqueue_init_early() to workqueue_init().

    Signed-off-by: Tejun Heo
    Suggested-by: Linus Torvalds
    Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com

    Tejun Heo
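
    The resulting boot ordering, sketched from the description above
    (call sites simplified; the oops trace in the 20 Oct entry shows
    workqueue_init() being reached from kernel_init_freeable()):

        /* init/main.c, simplified sketch */
        asmlinkage __visible void __init start_kernel(void)
        {
                /* ... memory allocation, cpumasks and idr are up ... */
                workqueue_init_early(); /* wqs can be created, work queued */
                /* ... */
        }

        static noinline void __init kernel_init_freeable(void)
        {
                /* ... kthreads are available here ... */
                workqueue_init();       /* workers start executing work */
                /* ... */
        }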
     

16 Sep, 2016

1 commit

  • destroy_workqueue() performs a number of sanity checks to ensure that
    the workqueue is empty before proceeding with destruction. However,
    it's not always easy to tell what's going on just from the warning
    message. Let's dump workqueue state after sanity check failures to
    help debugging.

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/CACT4Y+Zs6vkjHo9qHb4TrEiz3S4+quvvVQ9VWvj2Mx6pETGb9Q@mail.gmail.com
    Cc: Dmitry Vyukov

    Tejun Heo
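
    Conceptually the change is a one-liner in the failure path, along
    these lines (sketch; show_workqueue_state() is the dumper workqueue
    already uses elsewhere):

        /* in destroy_workqueue()'s sanity checks, sketch */
        if (WARN_ON(pwq->refcnt > 1) ||
            WARN_ON(pwq->nr_active) ||
            WARN_ON(!list_empty(&pwq->delayed_works))) {
                show_workqueue_state(); /* dump state to aid debugging */
                return;
        }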
     

30 Jul, 2016

1 commit

  • Pull smp hotplug updates from Thomas Gleixner:
    "This is the next part of the hotplug rework.

    - Convert all notifiers with a priority assigned

    - Convert all CPU_STARTING/DYING notifiers

    The final removal of the STARTING/DYING infrastructure will happen
    when the merge window closes.

    Another 700 lines of impenetrable maze gone :)"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
    timers/core: Correct callback order during CPU hot plug
    leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
    powerpc/numa: Convert to hotplug state machine
    arm/perf: Fix hotplug state machine conversion
    irqchip/armada: Avoid unused function warnings
    ARC/time: Convert to hotplug state machine
    clocksource/atlas7: Convert to hotplug state machine
    clocksource/armada-370-xp: Convert to hotplug state machine
    clocksource/exynos_mct: Convert to hotplug state machine
    clocksource/arm_global_timer: Convert to hotplug state machine
    rcu: Convert rcutree to hotplug state machine
    KVM/arm/arm64/vgic-new: Convert to hotplug state machine
    smp/cfd: Convert core to hotplug state machine
    x86/x2apic: Convert to CPU hotplug state machine
    profile: Convert to hotplug state machine
    timers/core: Convert to hotplug state machine
    hrtimer: Convert to hotplug state machine
    x86/tboot: Convert to hotplug state machine
    arm64/armv8 deprecated: Convert to hotplug state machine
    hwtracing/coresight-etm4x: Convert to hotplug state machine
    ...

    Linus Torvalds
     

25 Jul, 2016

1 commit

  • * pm-sleep:
    PM / hibernate: Introduce test_resume mode for hibernation
    x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
    PM / hibernate: Image data protection during restoration
    PM / hibernate: Add missing braces in __register_nosave_region()
    PM / hibernate: Clean up comments in snapshot.c
    PM / hibernate: Clean up function headers in snapshot.c
    PM / hibernate: Add missing braces in hibernate_setup()
    PM / hibernate: Recycle safe pages after image restoration
    PM / hibernate: Simplify mark_unsafe_pages()
    PM / hibernate: Do not free preallocated safe pages during image restore
    PM / suspend: show workqueue state in suspend flow
    PM / sleep: make PM notifiers called symmetrically
    PM / sleep: Make pm_prepare_console() return void
    PM / Hibernate: Don't let kasan instrument snapshot.c

    * pm-tools:
    PM / tools: scripts: AnalyzeSuspend v4.2
    tools/turbostat: allow user to alter DESTDIR and PREFIX

    Rafael J. Wysocki
     

14 Jul, 2016

1 commit

  • Get rid of the prio ordering of the separate notifiers and use a proper state
    callback pair.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Acked-by: Tejun Heo
    Cc: Andrew Morton
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Nicolas Iooss
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rasmus Villemoes
    Cc: Rusty Russell
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153335.197083890@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
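
    With the state machine, the two prio-ordered notifiers become one
    explicit, symmetric registration pair, roughly as below (sketch
    using the cpuhp API; the callback names follow the pattern of the
    conversion series and are illustrative):

        ret = cpuhp_setup_state_nocalls(CPUHP_WORKQUEUE_PREP,
                                        "workqueue:prepare",
                                        workqueue_prepare_cpu, NULL);
        /* online/offline form the symmetric callback pair */
        ret = cpuhp_setup_state_nocalls(CPUHP_AP_WORKQUEUE_ONLINE,
                                        "workqueue:online",
                                        workqueue_online_cpu,
                                        workqueue_offline_cpu);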
     

17 Jun, 2016

1 commit

  • With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to
    run on online && !active"), __set_cpus_allowed_ptr() expects that only
    strict per-cpu kernel threads can have affinity to an online CPU which
    is not yet active.

    This assumption is currently broken in the CPU_ONLINE notification
    handler for the workqueues, where restore_unbound_workers_cpumask()
    calls set_cpus_allowed_ptr() when the first cpu in the unbound
    worker's pool->attrs->cpumask comes online. Since
    set_cpus_allowed_ptr() is called with a pool->attrs->cpumask in which
    only one CPU is online, and that CPU is not yet active, we get the
    following WARN_ON during a CPU online operation.

    ------------[ cut here ]------------
    WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
    __set_cpus_allowed_ptr+0x228/0x2e0
    Modules linked in:
    CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4

    Call Trace:
    [c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
    [c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
    [c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
    [c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
    [c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
    [c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
    [c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
    [c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
    [c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
    [c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
    [c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
    ---[ end trace 00f1456578b2a3b2 ]---

    This patch fixes this by limiting the mask to the intersection of
    the pool affinity and online CPUs.

    Changelog-cribbed-from: Gautham R. Shenoy
    Reported-by: Abdul Haleem
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Tejun Heo

    Peter Zijlstra
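
    The fix described in the last paragraph boils down to the following
    (sketch of restore_unbound_workers_cpumask(); locking and the
    first-online-CPU check are elided):

        static void restore_unbound_workers_cpumask(struct worker_pool *pool,
                                                    int cpu)
        {
                static cpumask_t cpumask;
                struct worker *worker;

                if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
                        return;

                /* use only the intersection with online CPUs, so the
                 * mask can never name an online-but-not-yet-active CPU
                 * alone */
                cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);

                for_each_pool_worker(worker, pool)
                        WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
                                                          &cpumask) < 0);
        }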
     

20 May, 2016

2 commits

  • When activating a static object we need to make sure that the object
    is tracked in the object tracker. If it is a non-static object, the
    activation is illegal.

    In the previous implementation, each subsystem had to take care of
    this in its fixup callbacks. We can instead put it into the
    debugobjects core. That way we save duplicated code and get *pure*
    fixup callbacks.

    To achieve this, a new callback "is_static_object" is introduced to
    let the type-specific code decide whether an object is static or not.
    If it is, we take it into the object tracker; otherwise we warn and
    invoke the fixup callback.

    This change has passed the debugobjects selftest, and I also did some
    testing with all debugobjects support enabled.

    Finally, I have a concern about the fixups: can a fixup change an
    object which is in an incorrect state? Since 'addr' may not point to
    any valid object if a non-static object is not tracked, changing such
    an object can overwrite someone else's memory and cause unexpected
    behaviour. For example, timer_fixup_activate binds a timer to the
    function stub_timer.

    Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
    [changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
    Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
    Signed-off-by: Du, Changbin
    Cc: Jonathan Corbet
    Cc: Josh Triplett
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Tejun Heo
    Cc: Christian Borntraeger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Du, Changbin
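
    What such a callback looks like for one subsystem, sketched here for
    workqueue's work_struct debugobjects support (the STATIC bit test
    follows workqueue's existing debug code; treat the details as
    illustrative):

        static bool work_is_static_object(void *addr)
        {
                struct work_struct *work = addr;

                /* statically initialized work items carry a static bit
                 * in their data word */
                return test_bit(WORK_STRUCT_STATIC_BIT,
                                work_data_bits(work));
        }

        static struct debug_obj_descr work_debug_descr = {
                .name             = "work_struct",
                .is_static_object = work_is_static_object,
                /* ... the fixup callbacks stay "pure" ... */
        };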
     
  • Update the return type to use bool instead of int, corresponding to
    the change (debugobjects: make fixup functions return bool instead of int).

    Signed-off-by: Du, Changbin
    Cc: Jonathan Corbet
    Cc: Josh Triplett
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Tejun Heo
    Cc: Christian Borntraeger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Du, Changbin
     

14 May, 2016

1 commit

  • Pull workqueue fix from Tejun Heo:
    "CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding
    DOWN_PREPARE which can trigger a WARN_ON() in workqueue.

    The bug has been there for a very long time. It only triggers if CPU
    down fails at a specific point and I don't think it has adverse
    effects other than the warning messages. The fix is very low impact"

    * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: fix rebind bound workers warning

    Linus Torvalds
     

13 May, 2016

1 commit

  • ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
    Modules linked in:
    CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
    Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
    0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
    0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
    ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
    Call Trace:
    dump_stack+0x89/0xd4
    __warn+0xfd/0x120
    warn_slowpath_null+0x1d/0x20
    rebind_workers+0x1c0/0x1d0
    workqueue_cpu_up_callback+0xf5/0x1d0
    notifier_call_chain+0x64/0x90
    ? trace_hardirqs_on_caller+0xf2/0x220
    ? notify_prepare+0x80/0x80
    __raw_notifier_call_chain+0xe/0x10
    __cpu_notify+0x35/0x50
    notify_down_prepare+0x5e/0x80
    ? notify_prepare+0x80/0x80
    cpuhp_invoke_callback+0x73/0x330
    ? __schedule+0x33e/0x8a0
    cpuhp_down_callbacks+0x51/0xc0
    cpuhp_thread_fun+0xc1/0xf0
    smpboot_thread_fn+0x159/0x2a0
    ? smpboot_create_threads+0x80/0x80
    kthread+0xef/0x110
    ? wait_for_completion+0xf0/0x120
    ? schedule_tail+0x35/0xf0
    ret_from_fork+0x22/0x50
    ? __init_kthread_worker+0x70/0x70
    ---[ end trace eb12ae47d2382d8f ]---
    notify_down_prepare: attempt to take down CPU 0 failed

    This bug can be reproduced with the config below, with nohz_full=
    covering all CPUs:

    CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
    CONFIG_DEBUG_HOTPLUG_CPU0=y
    CONFIG_NO_HZ_FULL=y

    As Thomas pointed out:

    | If a down prepare callback fails, then DOWN_FAILED is invoked for all
    | callbacks which have successfully executed DOWN_PREPARE.
    |
    | But, workqueue has actually two notifiers. One which handles
    | UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
    |
    | Now look at the priorities of those callbacks:
    |
    | CPU_PRI_WORKQUEUE_UP = 5
    | CPU_PRI_WORKQUEUE_DOWN = -5
    |
    | So the call order on DOWN_PREPARE is:
    |
    | CB 1
    | CB ...
    | CB workqueue_up() -> Ignores DOWN_PREPARE
    | CB ...
    | CB X ---> Fails
    |
    | So we call up to CB X with DOWN_FAILED
    |
    | CB 1
    | CB ...
    | CB workqueue_up() -> Handles DOWN_FAILED
    | CB ...
    | CB X-1
    |
    | So the problem is that the workqueue stuff handles DOWN_FAILED in the up
    | callback, while it should do it in the down callback. Which is not a good idea
    | either because it wants to be called early on rollback...
    |
    | Brilliant stuff, isn't it? The hotplug rework will solve this problem because
    | the callbacks become symmetric, but for the existing mess, we need some
    | workaround in the workqueue code.

    The boot CPU handles housekeeping duty (unbound timers, workqueues,
    timekeeping, ...) on behalf of full dynticks CPUs. It must remain
    online when nohz full is enabled. Each notifier_block has a priority,
    giving the order:

    workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down

    So the tick_nohz_cpu_down callback fails during down-prepare of CPU 0,
    and the notifier_blocks behind tick_nohz_cpu_down are not called any
    more, which means the workers are never actually unbound. The hotplug
    state machine then falls back to undo and onlines CPU 0 again. The
    workers are rebound unconditionally even though they were never
    unbound, triggering the warning in the process.

    This patch fixes it by catching !DISASSOCIATED to avoid rebinding
    already-bound workers.

    Cc: Tejun Heo
    Cc: Lai Jiangshan
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Frédéric Weisbecker
    Cc: stable@vger.kernel.org
    Suggested-by: Lai Jiangshan
    Signed-off-by: Wanpeng Li

    Wanpeng Li
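
    The workaround amounts to bailing out of rebind_workers() when the
    pool was never disassociated, roughly (sketch; the check sits under
    pool->lock as in the mainline fix):

        static void rebind_workers(struct worker_pool *pool)
        {
                /* ... restore each worker's cpumask ... */

                spin_lock_irq(&pool->lock);

                /* DOWN_FAILED can arrive without a preceding
                 * DOWN_PREPARE, in which case the workers were never
                 * unbound and there is nothing to rebind */
                if (!(pool->flags & POOL_DISASSOCIATED)) {
                        spin_unlock_irq(&pool->lock);
                        return;
                }

                pool->flags &= ~POOL_DISASSOCIATED;
                /* ... rebind and restore concurrency management ... */
        }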
     

28 Apr, 2016

1 commit

  • Pull workqueue fix from Tejun Heo:
    "So, it turns out we had a silly bug in the most fundamental part of
    workqueue for a very long time. AFAICS, this dates back to pre-git
    era and has quite likely been there from the time workqueue was first
    introduced.

    A work item uses its PENDING bit to synchronize multiple queuers.
    Anyone who wins the PENDING bit owns the pending state of the work
    item. Whether a queuer wins or loses the race, one thing should be
    guaranteed - there will soon be at least one execution of the work
    item "after" the queueing attempt - where "after" means that the
    execution instance would be able to see all the changes that the
    queuer has made prior to the queueing attempt.

    Unfortunately, we were missing a smp_mb() after clearing PENDING for
    execution, so nothing guaranteed visibility of the changes that a
    queueing loser has made, which manifested as a reproducible blk-mq
    stall.

    Lots of kudos to Roman for debugging the problem. The patch for
    -stable is the minimal one. For v4.7, Peter is working on a patch to
    make the code path slightly more efficient and less fragile"

    * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: fix ghost PENDING flag while doing MQ IO

    Linus Torvalds
     

26 Apr, 2016

1 commit

  • The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
    with the following backtrace:

    [ 601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
    [ 601.347574] Tainted: G O 4.4.5-1-storage+ #6
    [ 601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 601.348142] kworker/u129:5 D ffff880803077988 0 1636 2 0x00000000
    [ 601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
    [ 601.348999] ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
    [ 601.349662] ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
    [ 601.350333] ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
    [ 601.350965] Call Trace:
    [ 601.351203] [] ? bit_wait+0x60/0x60
    [ 601.351444] [] schedule+0x35/0x80
    [ 601.351709] [] schedule_timeout+0x192/0x230
    [ 601.351958] [] ? blk_flush_plug_list+0xc7/0x220
    [ 601.352208] [] ? ktime_get+0x37/0xa0
    [ 601.352446] [] ? bit_wait+0x60/0x60
    [ 601.352688] [] io_schedule_timeout+0xa4/0x110
    [ 601.352951] [] ? _raw_spin_unlock_irqrestore+0xe/0x10
    [ 601.353196] [] bit_wait_io+0x1b/0x70
    [ 601.353440] [] __wait_on_bit+0x5d/0x90
    [ 601.353689] [] wait_on_page_bit+0xc0/0xd0
    [ 601.353958] [] ? autoremove_wake_function+0x40/0x40
    [ 601.354200] [] __filemap_fdatawait_range+0xe4/0x140
    [ 601.354441] [] filemap_fdatawait_range+0x14/0x30
    [ 601.354688] [] filemap_write_and_wait_range+0x3f/0x70
    [ 601.354932] [] blkdev_fsync+0x1b/0x50
    [ 601.355193] [] vfs_fsync_range+0x49/0xa0
    [ 601.355432] [] blkdev_write_iter+0xca/0x100
    [ 601.355679] [] __vfs_write+0xaa/0xe0
    [ 601.355925] [] vfs_write+0xa9/0x1a0
    [ 601.356164] [] kernel_write+0x38/0x50

    The underlying device is a null_blk, with default parameters:

    queue_mode = MQ
    submit_queues = 1

    Verification that nullb0 has something inflight:

    root@pserver8:~# cat /sys/block/nullb0/inflight
    0 1
    root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
    ...
    /sys/block/nullb0/mq/0/cpu2/rq_list
    CTX pending:
    ffff8838038e2400
    ...

    During debugging it became clear that the stalled request is always
    inserted into the rq_list from the following path:

    save_stack_trace_tsk + 34
    blk_mq_insert_requests + 231
    blk_mq_flush_plug_list + 281
    blk_flush_plug_list + 199
    wait_on_page_bit + 192
    __filemap_fdatawait_range + 228
    filemap_fdatawait_range + 20
    filemap_write_and_wait_range + 63
    blkdev_fsync + 27
    vfs_fsync_range + 73
    blkdev_write_iter + 202
    __vfs_write + 170
    vfs_write + 169
    kernel_write + 56

    So blk_flush_plug_list() was called with from_schedule == true.

    If from_schedule is true, that means blk_mq_insert_requests()
    offloads execution of __blk_mq_run_hw_queue() to the kblockd
    workqueue, i.e. it calls kblockd_schedule_delayed_work_on().

    That means we race with another CPU, which is about to execute
    __blk_mq_run_hw_queue() work.

    Further debugging shows the following traces from different CPUs:

    CPU#0                                  CPU#1
    ----------------------------------     -------------------------------
    request A inserted
    STORE hctx->ctx_map[0] bit marked
    kblockd_schedule...() returns 1
                                           request B inserted
                                           STORE hctx->ctx_map[1] bit marked
                                           kblockd_schedule...() returns 0
    *** WORK PENDING bit is cleared ***
    flush_busy_ctxs() is executed, but
    bit 1, set by CPU#1, is not observed

    As a result, request B remained pending forever.

    This behaviour can be explained by a speculative LOAD of
    hctx->ctx_map on CPU#0, which is reordered with the clear of the
    PENDING bit and executed _before_ the actual STORE of bit 1 on CPU#1.

    The proper fix is an explicit full barrier, smp_mb(), which
    guarantees that the clear of the PENDING bit is executed before any
    speculative LOADs or STOREs inside the actual work function.

    Signed-off-by: Roman Pen
    Cc: Gioh Kim
    Cc: Michael Wang
    Cc: Tejun Heo
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Signed-off-by: Tejun Heo

    Roman Pen
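
    The barrier lands where PENDING is cleared before execution; a
    sketch of the fixed helper, reconstructed from the description
    (compare set_work_pool_and_clear_pending() in mainline):

        static void set_work_pool_and_clear_pending(struct work_struct *work,
                                                    int pool_id)
        {
                /* updates made by the queuer must be visible before
                 * PENDING is handed to the next owner */
                smp_wmb();
                set_work_data(work,
                              (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT,
                              0);
                /* full barrier: the clear of the PENDING bit must not be
                 * reordered with speculative LOADs or STOREs inside the
                 * work function that runs next */
                smp_mb();
        }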
     

16 Mar, 2016

1 commit

  • $ make tags
    GEN tags
    ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
    ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
    ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
    ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
    ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
    ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
    ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
    ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
    ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

    Which are all the result of the DEFINE_PER_CPU pattern:

    scripts/tags.sh:200: '/\
    Acked-by: David S. Miller
    Acked-by: Rafael J. Wysocki
    Cc: Tejun Heo
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

11 Feb, 2016

1 commit

  • When looking up the pool_workqueue to use for an unbound workqueue,
    workqueue assumes that the target CPU is always bound to a valid NUMA
    node. However, currently, when a CPU goes offline, the mapping is
    destroyed and cpu_to_node() returns NUMA_NO_NODE.

    This has always been broken but hadn't triggered often enough before
    874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
    After that commit, workqueue forcefully assigns the local CPU to
    delayed work items without an explicit target CPU, to fix a different
    issue. This widens the window in which a CPU can go offline while a
    delayed work item is pending, causing delayed work items to be
    dispatched with their target CPU set to an already-offlined CPU. The
    resulting NUMA_NO_NODE mapping makes workqueue try to queue the work
    item on a NULL pool_workqueue and thus crash.

    While 874bbfe600a6 has been reverted for a different reason making the
    bug less visible again, it can still happen. Fix it by mapping
    NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
    This is a temporary workaround. The long term solution is keeping CPU
    -> NODE mapping stable across CPU off/online cycles which is being
    worked on.

    Signed-off-by: Tejun Heo
    Reported-by: Mike Galbraith
    Cc: Tang Chen
    Cc: Rafael J. Wysocki
    Cc: Len Brown
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
    Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com

    Tejun Heo
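
    The workaround is a guard in the lookup helper, along these lines
    (sketch of unbound_pwq_by_node()):

        static struct pool_workqueue *
        unbound_pwq_by_node(struct workqueue_struct *wq, int node)
        {
                /* an offlined CPU may map to NUMA_NO_NODE; fall back to
                 * the default pwq instead of dereferencing a NULL table
                 * slot */
                if (unlikely(node == NUMA_NO_NODE))
                        return wq->dfl_pwq;

                return rcu_dereference_raw(wq->numa_pwq_tbl[node]);
        }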
     

10 Feb, 2016

3 commits

  • Workqueue used to guarantee local execution for work items queued
    without an explicit target CPU. That guarantee is gone now, which can
    break some usages in subtle ways. To flush out those cases, this
    patch implements a debug feature which forces round-robin CPU
    selection for all such work items.

    The debug feature defaults to off and can be enabled with a kernel
    parameter. The default can be flipped with a debug config option.

    If you hit this commit during bisection, please refer to 041bd12e272c
    ("Revert "workqueue: make sure delayed work run in local cpu"") for
    more information and ping me.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • WORK_CPU_UNBOUND work items queued to a bound workqueue always run
    locally. This is a good thing normally, but not when the user has
    asked us to keep unbound work away from certain CPUs. Round robin
    these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
    trumps performance.

    tj: Cosmetic and comment changes. WARN_ON_ONCE() dropped from empty
    (wq_unbound_cpumask AND cpu_online_mask). If we want that, it
    should be done when config changes.

    Signed-off-by: Mike Galbraith
    Signed-off-by: Tejun Heo

    Mike Galbraith
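
    This entry and the debug feature above funnel into one selection
    helper; a sketch of the round-robin logic (names follow mainline's
    wq_select_unbound_cpu(); details are simplified):

        static DEFINE_PER_CPU(int, wq_rr_cpu_last);

        static int wq_select_unbound_cpu(int cpu)
        {
                int new_cpu;

                /* without the debug knob, local queueing is fine as
                 * long as the local CPU isn't excluded by the mask */
                if (likely(!wq_debug_force_rr_cpu) &&
                    cpumask_test_cpu(cpu, wq_unbound_cpumask))
                        return cpu;

                /* otherwise round-robin over allowed online CPUs */
                new_cpu = __this_cpu_read(wq_rr_cpu_last);
                new_cpu = cpumask_next_and(new_cpu, wq_unbound_cpumask,
                                           cpu_online_mask);
                if (unlikely(new_cpu >= nr_cpu_ids)) {
                        new_cpu = cpumask_first_and(wq_unbound_cpumask,
                                                    cpu_online_mask);
                        if (unlikely(new_cpu >= nr_cpu_ids))
                                return cpu; /* nothing suitable online */
                }
                __this_cpu_write(wq_rr_cpu_last, new_cpu);
                return new_cpu;
        }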
     
  • This reverts commit 874bbfe600a660cba9c776b3957b1ce393151b76.

    Workqueue used to implicitly guarantee that work items queued without
    an explicit CPU specified are put on the local CPU. Recent changes in
    the timer code broke the guarantee and led to vmstat breakage, which
    was fixed by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work
    on the CPU we need it to run on").

    vmstat is the most likely to expose the issue and it's quite possible
    that there are other similar problems which are a lot more difficult
    to trigger. As a preventive measure, 874bbfe600a6 ("workqueue: make
    sure delayed work run in local cpu") was applied to restore the local
    CPU guarantee. Unfortunately, the change exposed a bug in timer code
    which got fixed by 22b886dd1018 ("timers: Use proper base migration in
    add_timer_on()"). Due to code restructuring, the commit couldn't be
    backported beyond a certain point, and stable kernels which only had
    874bbfe600a6 started crashing.

    The local CPU guarantee was accidental more than anything else and we
    want to get rid of it anyway. As, with the vmstat case fixed,
    874bbfe600a6 is causing more problems than it's fixing, it has been
    decided to take the chance and officially break the guarantee by
    reverting the commit. A debug feature will be added to force foreign
    CPU assignment to expose cases relying on the guarantee and fixes for
    the individual cases will be backported to stable as necessary.

    Signed-off-by: Tejun Heo
    Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu")
    Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
    Cc: stable@vger.kernel.org
    Cc: Mike Galbraith
    Cc: Henrique de Moraes Holschuh
    Cc: Daniel Bilik
    Cc: Jan Kara
    Cc: Shaohua Li
    Cc: Sasha Levin
    Cc: Ben Hutchings
    Cc: Thomas Gleixner
    Cc: Daniel Bilik
    Cc: Jiri Slaby
    Cc: Michal Hocko

    Tejun Heo
     

30 Jan, 2016

1 commit

  • fca839c00a12 ("workqueue: warn if memory reclaim tries to flush
    !WQ_MEM_RECLAIM workqueue") implemented a flush dependency warning
    which triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue
    tries to flush a !WQ_MEM_RECLAIM workqueue.

    The assumption is that workqueues marked with WQ_MEM_RECLAIM sit in
    the memory reclaim path, so making them depend on something which may
    itself need more memory to make forward progress can lead to
    deadlocks. Unfortunately, workqueues created with the legacy
    create*_workqueue() interface always have WQ_MEM_RECLAIM, regardless
    of whether memory reclaim actually depends on them or not. These
    spurious WQ_MEM_RECLAIM markings cause spurious triggering of the
    flush dependency checks.

    WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
    workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
    ...
    Workqueue: deferwq deferred_probe_work_func
    [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
    [] (show_stack) from [] (dump_stack+0x94/0xd4)
    [] (dump_stack) from [] (warn_slowpath_common+0x80/0xb0)
    [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x30/0x40)
    [] (warn_slowpath_fmt) from [] (check_flush_dependency+0x138/0x144)
    [] (check_flush_dependency) from [] (flush_work+0x50/0x15c)
    [] (flush_work) from [] (lru_add_drain_all+0x130/0x180)
    [] (lru_add_drain_all) from [] (migrate_prep+0x8/0x10)
    [] (migrate_prep) from [] (alloc_contig_range+0xd8/0x338)
    [] (alloc_contig_range) from [] (cma_alloc+0xe0/0x1ac)
    [] (cma_alloc) from [] (__alloc_from_contiguous+0x38/0xd8)
    [] (__alloc_from_contiguous) from [] (__dma_alloc+0x240/0x278)
    [] (__dma_alloc) from [] (arm_dma_alloc+0x54/0x5c)
    [] (arm_dma_alloc) from [] (dmam_alloc_coherent+0xc0/0xec)
    [] (dmam_alloc_coherent) from [] (ahci_port_start+0x150/0x1dc)
    [] (ahci_port_start) from [] (ata_host_start.part.3+0xc8/0x1c8)
    [] (ata_host_start.part.3) from [] (ata_host_activate+0x50/0x148)
    [] (ata_host_activate) from [] (ahci_host_activate+0x44/0x114)
    [] (ahci_host_activate) from [] (ahci_platform_init_host+0x1d8/0x3c8)
    [] (ahci_platform_init_host) from [] (tegra_ahci_probe+0x448/0x4e8)
    [] (tegra_ahci_probe) from [] (platform_drv_probe+0x50/0xac)
    [] (platform_drv_probe) from [] (driver_probe_device+0x214/0x2c0)
    [] (driver_probe_device) from [] (bus_for_each_drv+0x60/0x94)
    [] (bus_for_each_drv) from [] (__device_attach+0xb0/0x114)
    [] (__device_attach) from [] (bus_probe_device+0x84/0x8c)
    [] (bus_probe_device) from [] (deferred_probe_work_func+0x68/0x98)
    [] (deferred_probe_work_func) from [] (process_one_work+0x120/0x3f8)
    [] (process_one_work) from [] (worker_thread+0x38/0x55c)
    [] (worker_thread) from [] (kthread+0xdc/0xf4)
    [] (kthread) from [] (ret_from_fork+0x14/0x3c)

    Fix it by marking workqueues created via create*_workqueue() with
    __WQ_LEGACY and disabling flush dependency checks on them.

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Thierry Reding
    Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
    Fixes: fca839c00a12 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")

    Tejun Heo
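
    Conceptually the fix is two small pieces, sketched from the commit
    text (flag and macro names as described above):

        /* legacy interface: always got WQ_MEM_RECLAIM, needed or not,
         * so tag such workqueues and let the checker ignore the tag */
        #define create_workqueue(name)                                  \
                alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

        /* ... and check_flush_dependency() only warns when the flushing
         * workqueue has WQ_MEM_RECLAIM but not __WQ_LEGACY */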
     

09 Dec, 2015

2 commits

  • Workqueue stalls can happen from a variety of usage bugs such as
    missing WQ_MEM_RECLAIM flag or concurrency managed work item
    indefinitely staying RUNNING. These stalls can be extremely difficult
    to hunt down because the usual warning mechanisms can't detect
    workqueue stalls and the internal state is pretty opaque.

    To alleviate the situation, this patch implements a workqueue lockup
    detector. It periodically monitors all worker_pools and, if any pool
    fails to make forward progress for longer than the threshold
    duration, triggers a warning and dumps workqueue state as follows.

    BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
    Showing busy workqueues and worker pools:
    workqueue events: flags=0x0
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
    pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
    workqueue events_power_efficient: flags=0x80
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
    pending: check_lifetime, neigh_periodic_work
    workqueue cgroup_pidlist_destroy: flags=0x0
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
    pending: cgroup_pidlist_destroy_work_fn
    ...

    The detection mechanism is controlled through the kernel parameter
    workqueue.watchdog_thresh and can be updated at runtime through the
    sysfs module parameter file.

    v2: Decoupled from softlockup control knobs.

    Signed-off-by: Tejun Heo
    Acked-by: Don Zickus
    Cc: Ulrich Obergfell
    Cc: Michal Hocko
    Cc: Chris Mason
    Cc: Andrew Morton

    Tejun Heo
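
    A compressed sketch of the detection loop described above (mainline
    implements it as a self-rearming timer; the per-pool timestamp below
    is a simplification of the real bookkeeping):

        static void wq_watchdog_timer_fn(unsigned long data)
        {
                unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
                struct worker_pool *pool;
                bool lockup_detected = false;
                int pi;

                rcu_read_lock();
                for_each_pool(pool, pi) {
                        /* pool->watchdog_ts advances whenever the pool
                         * makes forward progress */
                        if (time_after(jiffies,
                                       pool->watchdog_ts + thresh)) {
                                lockup_detected = true;
                                pr_emerg("BUG: workqueue lockup - pool stuck for %us!\n",
                                         jiffies_to_msecs(jiffies - pool->watchdog_ts) / 1000);
                        }
                }
                rcu_read_unlock();

                if (lockup_detected)
                        show_workqueue_state();

                mod_timer(&wq_watchdog_timer, jiffies + thresh / 2);
        }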
     
  • Task or work item involved in memory reclaim trying to flush a
    non-WQ_MEM_RECLAIM workqueue or one of its work items can lead to
    deadlock. Trigger WARN_ONCE() if such conditions are detected.

    Signed-off-by: Tejun Heo
    Cc: Peter Zijlstra

    Tejun Heo
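
    The check boils down to comparing the flusher's reclaim status with
    the target workqueue, roughly (sketch of check_flush_dependency()):

        static void check_flush_dependency(struct workqueue_struct *target_wq,
                                           struct work_struct *target_work)
        {
                struct worker *worker;

                /* flushing into a reclaim-safe workqueue is always fine */
                if (target_wq->flags & WQ_MEM_RECLAIM)
                        return;

                worker = current_wq_worker();

                /* a task in memory reclaim must not wait on a
                 * !WQ_MEM_RECLAIM workqueue ... */
                WARN_ONCE(current->flags & PF_MEMALLOC,
                          "workqueue: PF_MEMALLOC task is flushing !WQ_MEM_RECLAIM %s\n",
                          target_wq->name);
                /* ... and neither may a WQ_MEM_RECLAIM work item */
                WARN_ONCE(worker &&
                          (worker->current_pwq->wq->flags & WQ_MEM_RECLAIM),
                          "workqueue: WQ_MEM_RECLAIM %s is flushing !WQ_MEM_RECLAIM %s\n",
                          worker->current_pwq->wq->name, target_wq->name);
        }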
     

06 Nov, 2015

1 commit

  • Pull workqueue update from Tejun Heo:
    "This pull request contains one patch to make an unbound worker pool
    allocated from the NUMA node containing it if such node exists. As
    unbound worker pools are node-affine by default, this makes most pools
    allocated on the right node"

    * 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: Allocate the unbound pool using local node memory

    Linus Torvalds
     

13 Oct, 2015

1 commit