19 Mar, 2016
1 commit
-
Pull workqueue updates from Tejun Heo:
"Three trivial workqueue changes"

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix comment for work_on_cpu()
sched/core: Get rid of 'cpu' argument in wq_worker_sleeping()
workqueue: Replace usage of init_name with dev_set_name()
16 Mar, 2016
1 commit
-
$ make tags
GEN tags
ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

These are all the result of the DEFINE_PER_CPU pattern:
scripts/tags.sh:200: '/\
Acked-by: David S. Miller
Acked-by: Rafael J. Wysocki
Cc: Tejun Heo
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Mar, 2016
1 commit
-
Function is processed in thread context, not in user context.
Cc: Tejun Heo
Cc: Lai Jiangshan
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Tejun Heo
02 Mar, 2016
1 commit
-
Given that wq_worker_sleeping() can only be called for the
CPU it is running on, we do not need to pass a CPU ID as an
argument.

Suggested-by: Oleg Nesterov
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Alexander Gordeev
Signed-off-by: Tejun Heo
18 Feb, 2016
1 commit
-
The init_name property of the device struct is sort of a hack and should
only be used for statically allocated devices. Since the device is
dynamically allocated here, it is safe to use the proper way to set a
device's name by calling dev_set_name().

Signed-off-by: Lars-Peter Clausen
Signed-off-by: Tejun Heo
11 Feb, 2016
1 commit
-
When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node. However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.

This has always been broken but hadn't triggered often enough before
874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
After that commit, workqueue forcibly assigns the local CPU for
delayed work items without an explicit target CPU to fix a different
issue. This widens the window in which a CPU can go offline while a
delayed work item is pending, causing delayed work items to be dispatched
with the target CPU set to an already offlined CPU. The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.

While 874bbfe600a6 has been reverted for a different reason, making the
bug less visible again, it can still happen. Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround. The long-term solution, which is being
worked on, is keeping the CPU -> NODE mapping stable across CPU
off/online cycles.

Signed-off-by: Tejun Heo
Reported-by: Mike Galbraith
Cc: Tang Chen
Cc: Rafael J. Wysocki
Cc: Len Brown
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
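One way to picture the fix is as a guard in the node-to-pwq lookup. The C below is a minimal userspace sketch, not the kernel code. `struct pwq`, `struct wq_sketch`, `pwq_by_node()` and `NR_NODES` are hypothetical stand-ins. Only the value of `NUMA_NO_NODE` (-1) mirrors the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)  /* same value the kernel uses for "no node" */
#define NR_NODES 4         /* hypothetical node count for this sketch */

/* Hypothetical stand-in for a pool_workqueue. */
struct pwq {
	int id;
};

/* Hypothetical stand-in for an unbound workqueue's per-node pwq table. */
struct wq_sketch {
	struct pwq *numa_pwq_tbl[NR_NODES];
	struct pwq *dfl_pwq;   /* default pwq, always valid */
};

/*
 * Look up the pwq for @node. A CPU that went offline may report
 * NUMA_NO_NODE; instead of indexing the table with -1, fall back to
 * the default pwq (the temporary workaround described above).
 */
static struct pwq *pwq_by_node(struct wq_sketch *wq, int node)
{
	if (node == NUMA_NO_NODE)
		return wq->dfl_pwq;
	assert(node >= 0 && node < NR_NODES);
	return wq->numa_pwq_tbl[node];
}
```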
10 Feb, 2016
3 commits
-
Workqueue used to guarantee local execution for work items queued
without an explicit target CPU. The guarantee is gone now, which can
break some usages in subtle ways. To flush out those cases, this
patch implements a debug feature which forces round-robin CPU
selection for all such work items.

The debug feature defaults to off and can be enabled with a kernel
parameter. The default can be flipped with a debug config option.

If you hit this commit during bisection, please refer to 041bd12e272c
("Revert "workqueue: make sure delayed work run in local cpu"") for
more information and ping me.

Signed-off-by: Tejun Heo
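Forced round-robin selection comes down to handing out the next CPU from an allowed set on every queueing, instead of the local CPU. This is a minimal userspace sketch that uses a 64-bit bitmap in place of a cpumask. `struct rr_state`, `next_rr_cpu()` and `NR_CPUS_SKETCH` are hypothetical names, and the kernel's actual selection logic is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 64   /* hypothetical: one bit per CPU in a uint64_t */

/* Hypothetical round-robin cursor. */
struct rr_state {
	int last;   /* last CPU handed out, -1 before the first pick */
};

/*
 * Return the next CPU after rr->last whose bit is set in @allowed,
 * wrapping around to the lowest set bit. If @allowed is empty, fall
 * back to @local_cpu so the caller always gets a usable CPU.
 */
static int next_rr_cpu(struct rr_state *rr, uint64_t allowed, int local_cpu)
{
	assert(local_cpu >= 0 && local_cpu < NR_CPUS_SKETCH);
	if (allowed == 0)
		return local_cpu;

	/* Scan at most one full lap, starting just after the last pick. */
	for (int i = 1; i <= NR_CPUS_SKETCH; i++) {
		int cpu = (rr->last + i) % NR_CPUS_SKETCH;

		if (allowed & (UINT64_C(1) << cpu)) {
			rr->last = cpu;
			return cpu;
		}
	}
	return local_cpu;   /* not reached: @allowed is non-empty */
}
```

Starting from `last = -1` with CPUs 1 and 3 allowed, successive calls return 1, 3, 1, and so on. An empty mask falls back to the local CPU.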
-
WORK_CPU_UNBOUND work items queued to a bound workqueue always run
locally. This is a good thing normally, but not when the user has
asked us to keep unbound work away from certain CPUs. Round-robin
these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
trumps performance.

tj: Cosmetic and comment changes. WARN_ON_ONCE() dropped from empty
(wq_unbound_cpumask AND cpu_online_mask). If we want that, it
should be done when config changes.

Signed-off-by: Mike Galbraith
Signed-off-by: Tejun Heo
-
This reverts commit 874bbfe600a660cba9c776b3957b1ce393151b76.

Workqueue used to implicitly guarantee that work items queued without
an explicit CPU specified are put on the local CPU. Recent changes in
timer broke the guarantee and led to vmstat breakage which was fixed
by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work on the CPU
we need it to run on").

vmstat is the most likely to expose the issue and it's quite possible
that there are other similar problems which are a lot more difficult
to trigger. As a preventive measure, 874bbfe600a6 ("workqueue: make
sure delayed work run in local cpu") was applied to restore the local
CPU guarantee. Unfortunately, the change exposed a bug in timer code
which got fixed by 22b886dd1018 ("timers: Use proper base migration in
add_timer_on()"). Due to code restructuring, the commit couldn't be
backported beyond a certain point and stable kernels which only had
874bbfe600a6 started crashing.

The local CPU guarantee was accidental more than anything else and we
want to get rid of it anyway. As, with the vmstat case fixed,
874bbfe600a6 is causing more problems than it's fixing, it has been
decided to take the chance and officially break the guarantee by
reverting the commit. A debug feature will be added to force foreign
CPU assignment to expose cases relying on the guarantee and fixes for
the individual cases will be backported to stable as necessary.

Signed-off-by: Tejun Heo
Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu")
Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
Cc: stable@vger.kernel.org
Cc: Mike Galbraith
Cc: Henrique de Moraes Holschuh
Cc: Daniel Bilik
Cc: Jan Kara
Cc: Shaohua Li
Cc: Sasha Levin
Cc: Ben Hutchings
Cc: Thomas Gleixner
Cc: Jiri Slaby
Cc: Michal Hocko
30 Jan, 2016
1 commit
-
fca839c00a12 ("workqueue: warn if memory reclaim tries to flush
!WQ_MEM_RECLAIM workqueue") implemented a flush dependency warning which
triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to
flush a !WQ_MEM_RECLAIM workqueue.

This assumes that workqueues marked with WQ_MEM_RECLAIM sit in the memory
reclaim path, and making them depend on something which may need more
memory to make forward progress can lead to deadlocks. Unfortunately,
workqueues created with the legacy create*_workqueue() interface
always have WQ_MEM_RECLAIM regardless of whether memory reclaim
depends on them or not. These spurious WQ_MEM_RECLAIM markings
cause spurious triggering of the flush dependency checks.

WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
...
Workqueue: deferwq deferred_probe_work_func
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x94/0xd4)
[] (dump_stack) from [] (warn_slowpath_common+0x80/0xb0)
[] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x30/0x40)
[] (warn_slowpath_fmt) from [] (check_flush_dependency+0x138/0x144)
[] (check_flush_dependency) from [] (flush_work+0x50/0x15c)
[] (flush_work) from [] (lru_add_drain_all+0x130/0x180)
[] (lru_add_drain_all) from [] (migrate_prep+0x8/0x10)
[] (migrate_prep) from [] (alloc_contig_range+0xd8/0x338)
[] (alloc_contig_range) from [] (cma_alloc+0xe0/0x1ac)
[] (cma_alloc) from [] (__alloc_from_contiguous+0x38/0xd8)
[] (__alloc_from_contiguous) from [] (__dma_alloc+0x240/0x278)
[] (__dma_alloc) from [] (arm_dma_alloc+0x54/0x5c)
[] (arm_dma_alloc) from [] (dmam_alloc_coherent+0xc0/0xec)
[] (dmam_alloc_coherent) from [] (ahci_port_start+0x150/0x1dc)
[] (ahci_port_start) from [] (ata_host_start.part.3+0xc8/0x1c8)
[] (ata_host_start.part.3) from [] (ata_host_activate+0x50/0x148)
[] (ata_host_activate) from [] (ahci_host_activate+0x44/0x114)
[] (ahci_host_activate) from [] (ahci_platform_init_host+0x1d8/0x3c8)
[] (ahci_platform_init_host) from [] (tegra_ahci_probe+0x448/0x4e8)
[] (tegra_ahci_probe) from [] (platform_drv_probe+0x50/0xac)
[] (platform_drv_probe) from [] (driver_probe_device+0x214/0x2c0)
[] (driver_probe_device) from [] (bus_for_each_drv+0x60/0x94)
[] (bus_for_each_drv) from [] (__device_attach+0xb0/0x114)
[] (__device_attach) from [] (bus_probe_device+0x84/0x8c)
[] (bus_probe_device) from [] (deferred_probe_work_func+0x68/0x98)
[] (deferred_probe_work_func) from [] (process_one_work+0x120/0x3f8)
[] (process_one_work) from [] (worker_thread+0x38/0x55c)
[] (worker_thread) from [] (kthread+0xdc/0xf4)
[] (kthread) from [] (ret_from_fork+0x14/0x3c)

Fix it by marking workqueues created via create*_workqueue() with
__WQ_LEGACY and disabling flush dependency checks on them.

Signed-off-by: Tejun Heo
Reported-and-tested-by: Thierry Reding
Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
Fixes: fca839c00a12 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
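The check and its legacy exemption can be written as a small predicate. This is a hedged sketch. The flag bit values, `flush_dep_should_warn()` and its parameters are hypothetical. The real check_flush_dependency() emits WARN_ONCE() messages rather than returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
#define SK_WQ_MEM_RECLAIM (1u << 0)
#define SK_WQ_LEGACY      (1u << 1)   /* models __WQ_LEGACY */

/*
 * Should flushing a workqueue with @target_flags trigger a warning?
 * @task_memalloc:    the flusher is a PF_MEMALLOC task
 * @flusher_wq_flags: flags of the workqueue the flushing worker belongs
 *                    to, or 0 if the flusher is not a workqueue worker
 */
static bool flush_dep_should_warn(unsigned int target_flags, bool task_memalloc,
				  unsigned int flusher_wq_flags)
{
	/* Legacy workqueues always carry WQ_MEM_RECLAIM (see above). */
	assert(!(flusher_wq_flags & SK_WQ_LEGACY) ||
	       (flusher_wq_flags & SK_WQ_MEM_RECLAIM));

	/* Flushing a reclaim-safe workqueue is always fine. */
	if (target_flags & SK_WQ_MEM_RECLAIM)
		return false;

	if (task_memalloc)
		return true;

	/*
	 * A WQ_MEM_RECLAIM worker flushing !WQ_MEM_RECLAIM work is suspect,
	 * unless its WQ_MEM_RECLAIM came implicitly from create*_workqueue().
	 */
	return (flusher_wq_flags & (SK_WQ_MEM_RECLAIM | SK_WQ_LEGACY)) ==
	       SK_WQ_MEM_RECLAIM;
}
```

Under this model, the deferwq case in the trace above (a legacy workqueue flushing `events`) no longer warns, while a workqueue that explicitly requested WQ_MEM_RECLAIM still does.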
08 Jan, 2016
1 commit
-
If apply_wqattrs_prepare() returns NULL, it has already cleaned up
the related resources, so the caller can return directly and avoid calling the
clean-up function again.

This doesn't introduce any functional changes.
Signed-off-by: wanghaibin
Signed-off-by: Tejun Heo
09 Dec, 2015
2 commits
-
Workqueue stalls can happen from a variety of usage bugs such as a
missing WQ_MEM_RECLAIM flag or a concurrency-managed work item
indefinitely staying RUNNING. These stalls can be extremely difficult
to hunt down because the usual warning mechanisms can't detect
workqueue stalls and the internal state is pretty opaque.

To alleviate the situation, this patch implements a workqueue lockup
detector. It periodically monitors all worker_pools and, if any pool
fails to make forward progress for longer than the threshold
duration, triggers a warning and dumps workqueue state as follows.

BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
workqueue events_power_efficient: flags=0x80
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
pending: check_lifetime, neigh_periodic_work
workqueue cgroup_pidlist_destroy: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
pending: cgroup_pidlist_destroy_work_fn
...

The detection mechanism is controlled through the kernel parameter
workqueue.watchdog_thresh and can be updated at runtime through the
sysfs module parameter file.

v2: Decoupled from softlockup control knobs.
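At its core, a detector like this is a periodic scan that compares each pool's last-progress timestamp against the threshold. This is a minimal sketch under several assumptions. `struct pool_sketch`, `find_stuck_pool()` and whole-second timestamps are hypothetical. Idle pools are skipped because they have nothing to make progress on. A threshold of 0 is treated as disabling the check, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pool state for this sketch. */
struct pool_sketch {
	unsigned long last_progress;  /* time (s) a work item last started */
	bool has_pending;             /* is anything waiting on the worklist? */
};

/*
 * Return the index of the first pool that has pending work but has not
 * made forward progress for more than @thresh seconds as of @now, or -1
 * if every pool is fine. A threshold of 0 disables the check.
 */
static int find_stuck_pool(const struct pool_sketch *pools, size_t n,
			   unsigned long now, unsigned long thresh)
{
	if (thresh == 0)
		return -1;

	for (size_t i = 0; i < n; i++) {
		assert(pools[i].last_progress <= now);
		if (!pools[i].has_pending)
			continue;    /* an idle pool cannot be stuck */
		if (now - pools[i].last_progress > thresh)
			return (int)i;
	}
	return -1;
}
```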
Signed-off-by: Tejun Heo
Acked-by: Don Zickus
Cc: Ulrich Obergfell
Cc: Michal Hocko
Cc: Chris Mason
Cc: Andrew Morton
-
A task or work item involved in memory reclaim trying to flush a
non-WQ_MEM_RECLAIM workqueue or one of its work items can lead to
deadlock. Trigger WARN_ONCE() if such conditions are detected.

Signed-off-by: Tejun Heo
Cc: Peter Zijlstra
06 Nov, 2015
1 commit
-
Pull workqueue update from Tejun Heo:
"This pull request contains one patch to make an unbound worker pool
allocated from the NUMA node containing it if such node exists. As
unbound worker pools are node-affine by default, this makes most pools
allocated on the right node"

* 'for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Allocate the unbound pool using local node memory
13 Oct, 2015
1 commit
-
Currently, get_unbound_pool() uses kzalloc() to allocate the
worker pool. Actually, we can use the right node to do the
allocation, achieving local memory access.

This patch selects the target node first, and uses kzalloc_node()
instead.

Signed-off-by: Xunlei Pang
Signed-off-by: Tejun Heo
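Selecting the target node amounts to finding a node whose CPUs contain all of the pool's CPUs. This is a minimal sketch, with bitmaps standing in for cpumasks. `pick_target_node()` is a hypothetical helper. In the kernel, the chosen node would then be passed to kzalloc_node(), where NUMA_NO_NODE means no node preference.

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/*
 * Pick the NUMA node whose CPU set fully contains @pool_cpus. Returns
 * NUMA_NO_NODE if the pool spans several nodes, in which case the
 * allocation is not bound to a node. @node_cpus[i] is a hypothetical
 * bitmap of the CPUs on node i.
 */
static int pick_target_node(uint64_t pool_cpus, const uint64_t *node_cpus,
			    int nr_nodes)
{
	assert(nr_nodes >= 0);
	for (int node = 0; node < nr_nodes; node++) {
		/* subset test: no CPU of the pool lies outside this node */
		if ((pool_cpus & ~node_cpus[node]) == 0)
			return node;
	}
	return NUMA_NO_NODE;
}
```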
01 Oct, 2015
1 commit
-
My system keeps crashing with the message below. vmstat_update() schedules delayed
work on the current CPU and expects the work to run on that CPU.
schedule_delayed_work() is expected to make delayed work run on the local CPU. The
problem is that the timer can be migrated with NO_HZ. __queue_work() queues work in
the timer handler, which could run on a different CPU than the one where the delayed
work was scheduled. The end result is that the delayed work runs on a different CPU.
The patch makes __queue_delayed_work() record the local CPU earlier, so where the timer
runs no longer changes where the work runs.

[ 28.010131] ------------[ cut here ]------------
[ 28.010609] kernel BUG at ../mm/vmstat.c:1392!
[ 28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[ 28.011860] Modules linked in:
[ 28.011860] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G W 4.3.0-rc3+ #634
[ 28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[ 28.014160] Workqueue: events vmstat_update
[ 28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
[ 28.015445] RIP: 0010:[] [] vmstat_update+0x31/0x80
[ 28.016282] RSP: 0018:ffff8800ba42fd80 EFLAGS: 00010297
[ 28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX: 0000000000000000
[ 28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI: ffffffff81f4df8d
[ 28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09: 0000000000000000
[ 28.019169] R10: 0000000000000000 R11: 0000000000000121 R12: ffff8800baa9f640
[ 28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15: 0000000000000000
[ 28.020071] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[ 28.020071] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4: 00000000000006f0
[ 28.020071] Stack:
[ 28.020071] ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00 ffffffff8106bd88
[ 28.020071] ffffffff8106bd0b 0000000000000096 0000000000000000 ffffffff82f9b1e8
[ 28.020071] ffffffff829f0b10 0000000000000000 ffffffff81f18460 ffff88011a81e340
[ 28.020071] Call Trace:
[ 28.020071] [] process_one_work+0x1c8/0x540
[ 28.020071] [] ? process_one_work+0x14b/0x540
[ 28.020071] [] worker_thread+0x114/0x460
[ 28.020071] [] ? process_one_work+0x540/0x540
[ 28.020071] [] kthread+0xf8/0x110
[ 28.020071] [] ? kthread_create_on_node+0x200/0x200
[ 28.020071] [] ret_from_fork+0x3f/0x70
[ 28.020071] [] ? kthread_create_on_node+0x200/0x200

Signed-off-by: Shaohua Li
Signed-off-by: Tejun Heo
Cc: stable@vger.kernel.org # v2.6.31+
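The idea of recording the target CPU when the work is queued, rather than when the timer fires, can be sketched as follows. All names are hypothetical, and `SK_CPU_UNBOUND` only stands in for WORK_CPU_UNBOUND. Note that this log also records this kernel change being reverted later (see the revert of 874bbfe600a6).

```c
#include <assert.h>

#define SK_CPU_UNBOUND (-1)   /* hypothetical stand-in for WORK_CPU_UNBOUND */

/* Hypothetical, stripped-down delayed work item. */
struct dwork_sketch {
	int cpu;   /* target CPU, fixed when the work is queued */
};

/* Queue time: resolve "no preference" to the CPU doing the queueing. */
static void queue_delayed_sketch(struct dwork_sketch *dw, int req_cpu,
				 int queueing_cpu)
{
	assert(queueing_cpu >= 0);
	dw->cpu = (req_cpu == SK_CPU_UNBOUND) ? queueing_cpu : req_cpu;
}

/*
 * Timer expiry: under NO_HZ the timer may fire on another CPU
 * (@timer_cpu), but the work still goes to the CPU recorded above.
 */
static int timer_fire_target_sketch(const struct dwork_sketch *dw, int timer_cpu)
{
	(void)timer_cpu;   /* deliberately not used to pick the target */
	return dw->cpu;
}
```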
02 Sep, 2015
1 commit
-
Pull workqueue updates from Tejun Heo:
"Only three trivial changes for workqueue this time - doc, MAINTAINERS
and EXPORT_SYMBOL updates"

* 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix some docbook warnings
workqueue: Make flush_workqueue() available again to non GPL modules
workqueue: add myself as a dedicated reviwer
01 Sep, 2015
1 commit
-
Pull scheduler updates from Ingo Molnar:
"The biggest change in this cycle is the rewrite of the main SMP load
balancing metric: the CPU load/utilization. The main goal was to make
the metric more precise and more representative - see the changelog of
this commit for the gory details:

9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

It is done in a way that significantly reduces complexity of the code:

5 files changed, 249 insertions(+), 494 deletions(-)

and the performance testing results are encouraging. Nevertheless we
need to keep an eye on potential regressions, since this potentially
affects every SMP workload in existence.

This work comes from Yuyang Du.

Other changes:
- SCHED_DL updates. (Andrea Parri)
- Simplify architecture callbacks by removing finish_arch_switch().
  (Peter Zijlstra et al)
- cputime accounting: guarantee stime + utime == rtime. (Peter
  Zijlstra)
- optimize idle CPU wakeups some more - inspired by Facebook server
  loads. (Mike Galbraith)
- stop_machine fixes and updates. (Oleg Nesterov)
- Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra)
- sched/numa tweaks. (Srikar Dronamraju)
- misc fixes and small cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
sched/deadline: Fix comment in enqueue_task_dl()
sched/deadline: Fix comment in push_dl_tasks()
sched: Change the sched_class::set_cpus_allowed() calling context
sched: Make sched_class::set_cpus_allowed() unconditional
sched: Fix a race between __kthread_bind() and sched_setaffinity()
sched: Ensure a task has a non-normalized vruntime when returning back to CFS
sched/numa: Fix NUMA_DIRECT topology identification
tile: Reorganize _switch_to()
sched, sparc32: Update scheduler comments in copy_thread()
sched: Remove finish_arch_switch()
sched, tile: Remove finish_arch_switch
sched, sh: Fold finish_arch_switch() into switch_to()
sched, score: Remove finish_arch_switch()
sched, avr32: Remove finish_arch_switch()
sched, MIPS: Get rid of finish_arch_switch()
sched, arm: Remove finish_arch_switch()
sched/fair: Clean up load average references
sched/fair: Provide runnable_load_avg back to cfs_rq
sched/fair: Remove task and group entity load when they are dead
sched/fair: Init cfs_rq's sched_entity load average
...
12 Aug, 2015
1 commit
-
Because sched_setscheduler() checks p->flags & PF_NO_SETAFFINITY
without locks, a caller might observe an old value and race with the
set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo
it:

  __kthread_bind()
    do_set_cpus_allowed()
                                  sched_setaffinity()
                                    if (p->flags & PF_NO_SETAFFINITY)
                                    set_cpus_allowed_ptr()
    p->flags |= PF_NO_SETAFFINITY

Fix the bug by putting everything under the regular scheduler locks.
This also closes a hole in the serialization of task_struct::{nr_,}cpus_allowed.
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Tejun Heo
Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dedekind1@gmail.com
Cc: juri.lelli@arm.com
Cc: mgorman@suse.de
Cc: riel@redhat.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20150515154833.545640346@infradead.org
Signed-off-by: Ingo Molnar
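The essence of the fix is that the flag test in the affinity path and the flag update in the bind path run under the same lock. This is a hedged userspace sketch. A C11 `atomic_flag` spinlock stands in for the scheduler locks, and `struct task_sketch`, `SK_PF_NO_SETAFFINITY` and both functions are hypothetical. Returning -EINVAL for a bound task is an assumption modeled on sched_setaffinity().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define SK_PF_NO_SETAFFINITY 0x1u   /* hypothetical flag value */

/* Hypothetical task with only the fields involved in the race. */
struct task_sketch {
	atomic_flag lock;        /* stands in for the scheduler locks */
	unsigned int flags;
	uint64_t cpus_allowed;   /* bitmap stand-in for the allowed-CPU mask */
};

static void sk_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;   /* spin until the holder releases it */
}

static void sk_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Bind a kthread: set the mask and the flag in one critical section. */
static void kthread_bind_sketch(struct task_sketch *p, uint64_t mask)
{
	assert(mask != 0);
	sk_lock(&p->lock);
	p->cpus_allowed = mask;
	p->flags |= SK_PF_NO_SETAFFINITY;
	sk_unlock(&p->lock);
}

/*
 * Affinity change: the flag test and the mask update happen under the
 * same lock, so a stale flag can no longer let it undo the binding.
 */
static int setaffinity_sketch(struct task_sketch *p, uint64_t mask)
{
	int ret = 0;

	assert(mask != 0);
	sk_lock(&p->lock);
	if (p->flags & SK_PF_NO_SETAFFINITY)
		ret = -EINVAL;   /* bound kthread: refuse */
	else
		p->cpus_allowed = mask;
	sk_unlock(&p->lock);
	return ret;
}
```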
05 Aug, 2015
1 commit
-
Commit 37b1ef31a568fc02e53587620226e5f3c66454c8 ("workqueue: move
flush_scheduled_work() to workqueue.h") moved the exported non GPL
flush_scheduled_work() from a function to an inline wrapper.
Unfortunately, it directly calls flush_workqueue() which is a GPL function.
This has the effect of changing the licensing requirement for this function
and makes it unavailable to non GPL modules.

See commit ad7b1f841f8a54c6d61ff181451f55b68175e15a ("workqueue: Make
schedule_work() available again to non GPL modules") for precedent.

Signed-off-by: Tim Gardner
Signed-off-by: Tejun Heo
23 Jul, 2015
1 commit
-
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros. This also requires
inverting the sense of the conditional, which this commit also does.

Reported-by: Ingo Molnar
Signed-off-by: Paul E. McKenney
Reviewed-by: Ingo Molnar
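Inverting the sense of the conditional means that every caller's condition gets negated. The sketch below uses two hypothetical macros, `SK_LOCKDEP_ASSERT()` and `SK_LOCKDEP_WARN()`. They model the old and new semantics: complain when the condition is false, versus complain when it is true. They are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int sk_warnings;      /* number of warnings emitted so far */
static bool sk_lock_held;    /* hypothetical "is the protecting lock held?" */

/* Old semantics: @c is what must hold; complain when it does not. */
#define SK_LOCKDEP_ASSERT(c, s)                                 \
	do {                                                    \
		if (!(c)) {                                     \
			sk_warnings++;                          \
			fprintf(stderr, "suspicious: %s\n", s); \
		}                                               \
	} while (0)

/* New semantics, like WARN(): @c is the bad condition; complain if true. */
#define SK_LOCKDEP_WARN(c, s)                                   \
	do {                                                    \
		if (c) {                                        \
			sk_warnings++;                          \
			fprintf(stderr, "suspicious: %s\n", s); \
		}                                               \
	} while (0)

/* The same check written both ways: note the inverted condition. */
static void deref_old_style(void)
{
	SK_LOCKDEP_ASSERT(sk_lock_held, "deref without lock (old)");
}

static void deref_new_style(void)
{
	SK_LOCKDEP_WARN(!sk_lock_held, "deref without lock (new)");
}
```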
02 Jul, 2015
1 commit
-
Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.

A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...
29 May, 2015
1 commit
-
tj: dropped the iff -> if change; "iff" means "if and only if" and is
not a typo. Spotted by Randy Dunlap.

Signed-off-by: Shailendra Verma
Signed-off-by: Tejun Heo
Cc: Randy Dunlap
28 May, 2015
1 commit
-
We can avoid an ifdef over wq_power_efficient's declaration
by just using IS_ENABLED().

Cc: Rusty Russell
Cc: Jani Nikula
Cc: Andrew Morton
Cc: Kees Cook
Cc: Tejun Heo
Cc: Ingo Molnar
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez
Signed-off-by: Rusty Russell
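IS_ENABLED() can replace an #ifdef because it works in ordinary C expressions. It relies on a preprocessor trick that turns "defined to 1" into 1 and "not defined" into 0. Below is a simplified sketch of that trick, using hypothetical `SK_`/`sk_` names and made-up config symbols. The real macro also handles modular (=m) options. The initializer at the end is written in the spirit of this change: the variable always exists and takes its default from a config option, with no #ifdef around it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified version of the trick behind IS_ENABLED(). If the option is
 * defined to 1, pasting yields SK_ARG_PLACEHOLDER_1, which expands to
 * "0," and shifts 1 into the second-argument slot; otherwise the junk
 * token stays glued to the 1 and the second argument is 0.
 */
#define SK_ARG_PLACEHOLDER_1 0,
#define sk_take_second_arg(ignored, val, ...) val
#define sk_is_defined_3(arg1_or_junk) sk_take_second_arg(arg1_or_junk 1, 0)
#define sk_is_defined_2(val) sk_is_defined_3(SK_ARG_PLACEHOLDER_##val)
#define SK_IS_ENABLED(option) sk_is_defined_2(option)

/* Hypothetical Kconfig output: one option set, the other left undefined. */
#define CONFIG_SK_POWER_EFFICIENT_DEFAULT 1
/* CONFIG_SK_UNSET_OPTION is intentionally not defined. */

/* No #ifdef needed: both variables always exist. */
static bool wq_power_efficient = SK_IS_ENABLED(CONFIG_SK_POWER_EFFICIENT_DEFAULT);
static bool sk_unset_flag = SK_IS_ENABLED(CONFIG_SK_UNSET_OPTION);
```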
22 May, 2015
3 commits
-
flush_scheduled_work() is just a simple call to flush_work().
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
-
Reading wq->unbound_attrs requires protection of either wq_pool_mutex
or wq->mutex, and wq_sysfs_prep_attrs() is called with wq_pool_mutex held,
so we don't need to grab wq->mutex here.

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
-
This pre-declaration has been unneeded since the earlier refactoring patch
6ba94429c8e7 ("workqueue: Reorder sysfs code").

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
20 May, 2015
2 commits
-
Currently, modification of attrs via sysfs is not fully synchronized.

  Process A (change cpumask)     | Process B (change numa affinity)
  wq_cpumask_store()             |
    wq_sysfs_prep_attrs()        |
                                 | apply_workqueue_attrs()
    apply_workqueue_attrs()      |

As a result, Process B's operation is silently reverted
without any notification, which is buggy behavior. So this patch
moves wq_sysfs_prep_attrs() under the protection of wq_pool_mutex
to ensure attrs changes are properly synchronized.

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
-
Applying attrs requires two locks: get_online_cpus() and wq_pool_mutex,
and this code is duplicated in two places (apply_workqueue_attrs() and
workqueue_set_unbound_cpumask()). So we separate out this locking
code into apply_wqattrs_[un]lock() and do a minor refactor of
apply_workqueue_attrs().

apply_wqattrs_[un]lock() will also be used in a later patch to
ensure attrs changes are properly synchronized.

tj: minor updates to comments

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
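Taken together with the sysfs synchronization commit above, the pattern is: take the apply lock, copy the current attrs, modify the copy, install it, and only then drop the lock. This is a hedged sketch. A single C11 `atomic_flag` spinlock stands in for get_online_cpus() plus wq_pool_mutex. Every name here (`sk_apply_lock()`, `sk_store_cpumask()`, and so on) is hypothetical and does not reflect the signature of the kernel's apply_wqattrs_[un]lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attrs and workqueue for this sketch. */
struct attrs_sketch {
	uint64_t cpumask;
	bool no_numa;
};

struct wq_attr_sketch {
	struct attrs_sketch attrs;
};

/* One lock stands in for get_online_cpus() + wq_pool_mutex. */
static atomic_flag sk_pool_lock = ATOMIC_FLAG_INIT;

static void sk_apply_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&sk_pool_lock, memory_order_acquire))
		;   /* spin */
}

static void sk_apply_unlock(void)
{
	atomic_flag_clear_explicit(&sk_pool_lock, memory_order_release);
}

/*
 * Copy ("prep") and install ("apply") inside one critical section. If
 * the copy were taken before locking, a concurrent writer's change
 * could be overwritten by the stale copy, as in the diagram above.
 */
static void sk_store_cpumask(struct wq_attr_sketch *wq, uint64_t mask)
{
	assert(mask != 0);
	sk_apply_lock();
	struct attrs_sketch a = wq->attrs;
	a.cpumask = mask;
	wq->attrs = a;
	sk_apply_unlock();
}

static void sk_store_no_numa(struct wq_attr_sketch *wq, bool no_numa)
{
	sk_apply_lock();
	struct attrs_sketch a = wq->attrs;
	a.no_numa = no_numa;
	wq->attrs = a;
	sk_apply_unlock();
}
```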
19 May, 2015
2 commits
-
wq_update_unbound_numa() is known to be called with wq_pool_mutex held.
But wq_update_unbound_numa() acquires wq->mutex before reading
wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq. These fields
have since been changed so that they may be read with wq_pool_mutex
held, so we simply remove the mutex_lock(&wq->mutex).

Without the dependence on mutex_lock(&wq->mutex), the test
of wq->unbound_attrs->no_numa can also be moved upward.

The old code needed a long comment to describe the stability of
@wq->unbound_attrs, which is now also guaranteed by wq_pool_mutex,
so we no longer need that comment.

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
-
Currently wq_pool_mutex doesn't protect the attrs-installation, so
->unbound_attrs, ->numa_pwq_tbl[] and ->dfl_pwq can only be accessed
under wq->mutex, which causes some inconveniences. For example,
wq_update_unbound_numa() has to acquire wq->mutex before fetching
wq->unbound_attrs->no_numa and the old_pwq.

The attrs-installation is a short operation, so this change will not cause any
latency for other operations which also acquire wq_pool_mutex.

The only unprotected attrs-installation code is in apply_workqueue_attrs(),
so this patch changes less code than comments.

It is also a preparation patch for the next several patches, which read
wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq with
only wq_pool_mutex held.

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
13 May, 2015
1 commit
-
s/detemined/determined
Signed-off-by: Chen Hanxiao
Signed-off-by: Tejun Heo
11 May, 2015
1 commit
-
modify wq_calc_node_mask to wq_calc_node_cpumask
Signed-off-by: Gong Zhaogang
Signed-off-by: Tejun Heo
30 Apr, 2015
1 commit
-
Allow modifying the low-level unbound workqueues cpumask through
sysfs. This is performed by traversing the entire workqueue list
and calling apply_wqattrs_prepare() on the unbound workqueues
with the new low-level mask. Only after all the preparations are done
do we commit them all together.

Ordered workqueues are ignored by the low-level unbound workqueue
cpumask; they will be handled in the near future.

All the (default & per-node) pwqs are mandatorily controlled by
the low-level cpumask. If the user-configured cpumask doesn't overlap
with the low-level cpumask, the low-level cpumask will be used for the
wq instead.

The comment of wq_calc_node_cpumask() is updated and explicitly
requires that its first argument should be the attrs of the default
pwq.

The default wq_unbound_cpumask is cpu_possible_mask. The workqueue
subsystem doesn't know its best default value, so it lets the system
manager or another subsystem set it when needed.

Changed from V8:
merge the calculating code for the attrs of the default pwq together.
minor changes to the code & comments for saving the user-configured attrs.
remove unnecessary list_del().
minor update to the comment of wq_calc_node_cpumask().
update the comment of workqueue_set_unbound_cpumask().

Cc: Christoph Lameter
Cc: Kevin Hilman
Cc: Lai Jiangshan
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Tejun Heo
Cc: Viresh Kumar
Cc: Frederic Weisbecker
Original-patch-by: Frederic Weisbecker
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
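The resulting per-workqueue mask is an intersection with a fallback. This is a minimal sketch with 64-bit bitmaps in place of cpumasks. `effective_unbound_mask()` is a hypothetical helper, and the assertion that the low-level mask is non-empty is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective CPUs for an unbound workqueue: the user's mask restricted to
 * the low-level wq_unbound_cpumask, falling back to the low-level mask
 * when the two do not overlap.
 */
static uint64_t effective_unbound_mask(uint64_t user_mask, uint64_t unbound_mask)
{
	uint64_t eff;

	/* Assumption of this sketch: the low-level mask is never empty. */
	assert(unbound_mask != 0);
	eff = user_mask & unbound_mask;
	/* No overlap: use the low-level mask for this wq instead. */
	return eff ? eff : unbound_mask;
}
```

For example, a user mask of CPUs 0-3 (0x0F) under a low-level mask of CPUs 2-5 (0x3C) yields CPUs 2-3 (0x0C). Disjoint masks fall back to the low-level mask.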
27 Apr, 2015
2 commits
-
Create a cpumask that limits the affinity of all unbound workqueues.
This cpumask is controlled through a file at the root of the workqueue
sysfs directory.

It works on a lower level than the per-WQ_SYSFS workqueue cpumask files,
such that the effective cpumask applied for a given unbound workqueue is
the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
the new /sys/devices/virtual/workqueue/cpumask file.

This patch implements the basic infrastructure and the read interface.
wq_unbound_cpumask is initially set to cpu_possible_mask.

Cc: Christoph Lameter
Cc: Kevin Hilman
Cc: Lai Jiangshan
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Tejun Heo
Cc: Viresh Kumar
Signed-off-by: Frederic Weisbecker
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
-
The current apply_workqueue_attrs() includes pwqs-allocation and pwqs-installation,
so when we batch multiple apply_workqueue_attrs()s as a transaction, we can't
ensure the transaction succeeds or fails as a complete unit.

To solve this, we split apply_workqueue_attrs() into three stages.
The first stage does the preparation: allocating memory and pwqs.
The second stage does the attrs-installation and pwqs-installation.
The third stage frees the allocated memory and (old or unused) pwqs.

As a result, batching multiple apply_workqueue_attrs()s can
succeed or fail as a complete unit:
1) do the first stage for all the workqueues in the batch
2) only commit all when all the above succeed.

This patch is a preparation for the next patch ("Allow modifying low level
unbound workqueue cpumask") which will do multiple apply_workqueue_attrs().

The patch doesn't change functionality except for two minor adjustments:
1) free_unbound_pwq() for the error path is removed; we use the
heavier version put_pwq_unlocked() instead since the error path
is rare. This adjustment simplifies the code.
2) the memory allocation is also moved under wq_pool_mutex.
This is needed to avoid further splitting.

tj: minor updates to comments.

Suggested-by: Tejun Heo
Cc: Christoph Lameter
Cc: Kevin Hilman
Cc: Lai Jiangshan
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Tejun Heo
Cc: Viresh Kumar
Cc: Frederic Weisbecker
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
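The three-stage split makes all-or-nothing batching possible because everything that can fail happens before anything visible changes. This is a hedged sketch with hypothetical types (`struct txwq`, `struct txctx`) and functions. A heap-allocated int stands in for a workqueue's installed attrs and pwqs, so this is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TX_MAX 8   /* sketch limit on batch size */

/* Hypothetical workqueue: one heap-allocated "installed config". */
struct txwq {
	int *cfg;        /* currently installed config, may be NULL */
	bool fail_prep;  /* test hook: make stage 1 fail for this wq */
};

/* Per-workqueue transaction state. */
struct txctx {
	struct txwq *wq;
	int *new_cfg;    /* prepared, not yet visible */
	int *old_cfg;    /* replaced config, freed in stage 3 */
};

/* Stage 1: do everything that can fail; change nothing visible. */
static bool tx_prepare(struct txctx *ctx, struct txwq *wq, int value)
{
	ctx->wq = wq;
	ctx->old_cfg = NULL;
	ctx->new_cfg = wq->fail_prep ? NULL : malloc(sizeof(*ctx->new_cfg));
	if (!ctx->new_cfg)
		return false;
	*ctx->new_cfg = value;
	return true;
}

/* Stage 2: install the prepared state; cannot fail. */
static void tx_commit(struct txctx *ctx)
{
	ctx->old_cfg = ctx->wq->cfg;
	ctx->wq->cfg = ctx->new_cfg;
	ctx->new_cfg = NULL;
}

/* Stage 3: free leftovers (old config after commit, or unused new one). */
static void tx_cleanup(struct txctx *ctx)
{
	free(ctx->new_cfg);
	free(ctx->old_cfg);
}

/* Apply @value to all @n workqueues, or leave every one of them untouched. */
static bool tx_apply_all(struct txwq **wqs, size_t n, int value)
{
	struct txctx ctx[TX_MAX];
	size_t prepared = 0;
	bool ok = true;

	assert(n <= TX_MAX);
	while (prepared < n) {
		if (!tx_prepare(&ctx[prepared], wqs[prepared], value)) {
			ok = false;
			break;
		}
		prepared++;
	}

	if (ok) {
		for (size_t i = 0; i < n; i++)
			tx_commit(&ctx[i]);
	}

	for (size_t i = 0; i < prepared; i++)
		tx_cleanup(&ctx[i]);
	return ok;
}
```

If preparation fails for any workqueue, the contexts already prepared are cleaned up and no workqueue sees a change.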
06 Apr, 2015
1 commit
-
The sysfs code usually belongs at the bottom of the file since it deals
with high level objects. In the workqueue code it's misplaced, such
that we'd need to work around function references to allow the sysfs
code to call APIs like apply_workqueue_attrs().

Let's move that block further down in the file, almost to the bottom,
and declare workqueue_sysfs_unregister() just before destroy_workqueue(),
which references it.

tj: Moved the workqueue_sysfs_unregister() forward declaration to where
the other forward declarations are.

Suggested-by: Tejun Heo
Cc: Christoph Lameter
Cc: Kevin Hilman
Cc: Lai Jiangshan
Cc: Mike Galbraith
Cc: Paul E. McKenney
Cc: Tejun Heo
Cc: Viresh Kumar
Signed-off-by: Frederic Weisbecker
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
09 Mar, 2015
3 commits
-
Workqueues are used extensively throughout the kernel but sometimes
it's difficult to debug stalls involving work items because visibility
into their inner workings is fairly limited. Although the sysrq-t task dump
annotates each active worker task with information on the work
item being executed, it is challenging to find out which work items
are pending or delayed on which queues and how pools are being
managed.

This patch implements show_workqueue_state() which dumps all busy
workqueues and pools and is called from the sysrq-t handler. At the
end of the sysrq-t dump, something like the following is printed.

Showing busy workqueues and worker pools:
...
workqueue filler_wq: flags=0x0
pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
in-flight: 491:filler_workfn, 507:filler_workfn
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
in-flight: 501:filler_workfn
pending: filler_workfn
...
workqueue test_wq: flags=0x8
pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
delayed: test_workfn1 BAR(492), test_workfn2
...
pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62

The above shows that test_wq is executing test_workfn() on pid 510,
which is the rescuer, and also that there are two tasks, 69 and 500,
waiting for the work item to finish in flush_work(). As test_wq has
max_active of 1, there are two work items for test_workfn1() and
test_workfn2() which are delayed till the current work item is
finished. In addition, pid 492 is flushing test_workfn1().

The work item for test_workfn() is being executed on the pwq of pool 2,
which is the normal priority per-cpu pool for CPU 1. The pool has
three workers, two of which are executing filler_workfn() for
filler_wq, and the last one is assuming the manager role, trying to
create more workers.

This extra workqueue state dump will hopefully help chase down hangs
involving workqueues.

v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.
v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
printk()'s replaced with pr_info()'s, and cpumask printing now
uses cpulist_pr_cont().

Signed-off-by: Tejun Heo
Cc: Lai Jiangshan
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Ingo Molnar
-
Add wq_barrier->task and worker_pool->manager to keep track of the
flushing task and pool manager respectively. These are purely
informational and will be used to implement sysrq dump of workqueues.

Signed-off-by: Tejun Heo
-
The workqueues list is protected by wq_pool_mutex and a workqueue and
its subordinate data structures are freed directly on destruction. We
want to add the ability to dump workqueues from a sysrq callback, which
requires walking all workqueues without grabbing wq_pool_mutex. This
patch makes freeing of workqueues RCU protected and makes the
workqueues list walkable while holding the RCU read lock.

Note that pool_workqueues and pools are already sched-RCU protected.
For consistency, workqueues are also protected with sched-RCU.

While at it, reverse the workqueues list so that a workqueue which is
created earlier comes first. The order of the list isn't significant
functionally, but this makes the planned sysrq dump list system
workqueues first.

Signed-off-by: Tejun Heo