16 Apr, 2019

1 commit

  • The worker accounting for CPU bound workers is plugged into the core
    scheduler code and the wakeup code. This is not a hard requirement and
    can be avoided by keeping track of the state in the workqueue code
    itself.

    Keep track of the sleeping state in the worker itself and call the
    notifier before entering the core scheduler. There might be false
    positives when the task is woken between that call and actually
    scheduling, but that's not really different from scheduling and being
    woken immediately after switching away. nr_running is now updated when
    the task returns from schedule(), which is later than when it was done
    from ttwu().

    [ bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de Oliveira ]
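
    As a sketch of the resulting pattern (condensed from the upstream
    hooks; wq_worker_sleeping()/wq_worker_running() are the workqueue-side
    notifiers), schedule() now notifies the workqueue code directly:

        static void sched_submit_work(struct task_struct *tsk)
        {
                /*
                 * If a worker goes to sleep, notify the workqueue code so
                 * it can wake another task to maintain concurrency.
                 * Preemption is disabled because the notifier may wake a
                 * kworker and must not re-enter schedule() here.
                 */
                if (tsk->flags & PF_WQ_WORKER) {
                        preempt_disable();
                        wq_worker_sleeping(tsk);
                        preempt_enable_no_resched();
                }
        }

        static void sched_update_worker(struct task_struct *tsk)
        {
                /* on return from schedule(); this is where nr_running
                 * gets updated now */
                if (tsk->flags & PF_WQ_WORKER)
                        wq_worker_running(tsk);
        }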

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Daniel Bristot de Oliveira
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/ad2b29b5715f970bffc1a7026cabd6ff0b24076a.1532952814.git.bristot@redhat.com
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

02 Feb, 2019

1 commit

  • psi has provisions to shut off the periodic aggregation worker when
    there is a period of no task activity - and thus no data that needs
    aggregating. However, while developing psi monitoring, Suren noticed
    that the aggregation clock currently won't stay shut off for good.

    Debugging this revealed a flaw in the idle design: an aggregation run
    will see no task activity and decide to go to sleep; shortly thereafter,
    the kworker thread that executed the aggregation will go idle and cause
    a scheduling change, during which the psi callback will kick the
    !pending worker again. This will ping-pong forever, and is equivalent
    to having no shut-off logic at all (but with more code!)

    Fix this by exempting aggregation workers from psi's clock waking logic
    when the state change is them going to sleep. To do this, tag workers
    with the last work function they executed, and if in psi we see a worker
    going to sleep after aggregating psi data, we will not reschedule the
    aggregation work item.

    What if the worker is also executing other items before or after?

    Any psi state times that were incurred by work items preceding the
    aggregation work will have been collected from the per-cpu buckets
    during the aggregation itself. If there are work items following the
    aggregation work, the worker's last_func tag will be overwritten and the
    aggregator will be kept alive to process this genuine new activity.

    If the aggregation work is the last thing the worker does, and we decide
    to go idle, the brief period of non-idle time incurred between the
    aggregation run and the kworker's dequeue will be stranded in the
    per-cpu buckets until the clock is woken by later activity. But that
    should not be a problem. The buckets can hold 4s worth of time, and
    future activity will wake the clock with a 2s delay, giving us 2s worth
    of data we can leave behind when disabling aggregation. If it takes a
    worker more than two seconds to go idle after it finishes its last work
    item, we likely have bigger problems in the system, and won't notice one
    sample that was averaged with a bogus per-CPU weight.
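
    A sketch of the resulting check, assuming the upstream names
    wq_worker_last_func() (reads the worker's last_func tag) and
    psi_update_work (the aggregation work function):

        /* in psi_task_change(): don't kick the clock if the state
         * change is the aggregation worker itself going to sleep */
        if (unlikely((clear & TSK_RUNNING) &&
                     (task->flags & PF_WQ_WORKER) &&
                     wq_worker_last_func(task) == psi_update_work))
                wake_clock = false;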

    Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
    Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO")
    Signed-off-by: Johannes Weiner
    Reported-by: Suren Baghdasaryan
    Acked-by: Tejun Heo
    Cc: Peter Zijlstra
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

18 May, 2018

2 commits

  • Work functions can use set_worker_desc() to improve the visibility of
    what the worker task is doing. Currently, the desc field is cleared at
    the beginning of each execution, and a separate field tracks whether
    desc has been set during the current execution.

    Instead of leaving worker->desc empty until desc is set, use it by
    default to remember the last workqueue the worker worked on; users of
    set_worker_desc() can override it with something more informative as
    necessary.

    This simplifies desc handling and helps track the last workqueue the
    worker executed on, improving visibility.
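
    A minimal sketch of the default, assuming the update lands in
    process_one_work():

        /* default desc to the workqueue name; set_worker_desc() may
         * overwrite it during this execution */
        strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);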

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • For historical reasons, the worker attach/detach functions don't
    currently manage worker->pool and the callers are manually and
    inconsistently updating it.

    This patch moves worker->pool updates into the worker attach/detach
    functions. This makes worker->pool consistent and clearly defines how
    worker->pool updates are synchronized.

    This will help later workqueue visibility improvements by allowing
    safe access to workqueue information from worker->task.
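
    A hedged sketch of the attach side (lock naming follows the
    attach_mutex rename further down this log; the detach helper clears
    worker->pool symmetrically):

        static void worker_attach_to_pool(struct worker *worker,
                                          struct worker_pool *pool)
        {
                mutex_lock(&pool->attach_mutex);

                /* ... apply pool cpumask / UNBOUND state to the worker ... */

                list_add_tail(&worker->node, &pool->workers);
                worker->pool = pool;    /* updated here, consistently */

                mutex_unlock(&pool->attach_mutex);
        }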

    Signed-off-by: Tejun Heo

    Tejun Heo
     

07 Nov, 2017

1 commit

  • Pull workqueue fix from Tejun Heo:
    "Another fix for a really old bug.

    It only affects drain_workqueue() which isn't used often and even then
    triggers only during a pretty small race window, so it isn't too
    surprising that it stayed hidden for so long.

    The fix is straightforward and low-risk. Kudos to Li Bin for
    reporting and fixing the bug"

    * 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: Fix NULL pointer dereference

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
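
    For a C source file the identifier goes on the first line, e.g.:

        // SPDX-License-Identifier: GPL-2.0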

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
      lines of source.
    - File already had some variant of a license header in it (even if <5
      lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

30 Oct, 2017

1 commit

  • When queue_work() is used in irq context (not in task context), there
    is a potential case that triggers a NULL pointer dereference.
    ----------------------------------------------------------------
    worker_thread()
    |-spin_lock_irq()
    |-process_one_work()
    |       |-worker->current_pwq = pwq
    |       |-spin_unlock_irq()
    |       |-worker->current_func(work)
    |       |-spin_lock_irq()
    |       |-worker->current_pwq = NULL
    |-spin_unlock_irq()
    |
    |               //interrupt here
    |               |-irq_handler
    |                       |-__queue_work()
    |                               //assuming that the wq is draining
    |                               |-is_chained_work(wq)
    |                                       |-current_wq_worker()
    |                                       //Here, 'current' is the interrupted worker!
    |                                       |-current->current_pwq is NULL here!
    |-schedule()
    ----------------------------------------------------------------

    Avoid this by checking for task context in current_wq_worker(): when
    not in task context, 'current' must not be used to check the
    condition.
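
    A sketch of the fix (the helper lives in workqueue_internal.h
    upstream):

        static inline struct worker *current_wq_worker(void)
        {
                if (in_task() && (current->flags & PF_WQ_WORKER))
                        return kthread_data(current);
                return NULL;
        }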

    Reported-by: Xiaofei Tan
    Signed-off-by: Li Bin
    Reviewed-by: Lai Jiangshan
    Signed-off-by: Tejun Heo
    Fixes: 8d03ecfe4718 ("workqueue: reimplement is_chained_work() using current_wq_worker()")
    Cc: stable@vger.kernel.org # v3.9+

    Li Bin
     

20 May, 2014

2 commits

  • manager_mutex is only used to protect attaching to the pool and the
    pool->workers list. It protects pool->workers and the operations based
    on this list, such as:

    - CPU binding for the workers in pool->workers
    - setting/clearing WORKER_UNBOUND

    So let's rename manager_mutex to attach_mutex to better reflect its
    role. This patch is a pure rename.

    tj: Minor comment and description updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • worker_idr has two duties: iterating over attached workers and
    allocating worker IDs. These duties don't have to be tied together;
    we can separate them and use a list for tracking attached workers and
    iterating over them.

    Before this separation, rescuer workers couldn't be added to
    worker_idr because they can't allocate IDs dynamically: ID allocation
    depends on memory allocation, which a rescuer must not depend on.

    After the separation, rescuer workers can easily be added to the list
    for iteration without any memory allocation. This is required when we
    attach the rescuer worker to the pool in a later patch.

    tj: Minor description updates.
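
    A minimal sketch of the list-based tracking (field names as merged;
    the lockdep annotation in the real iteration macro is omitted):

        struct worker_pool {
                /* ... */
                struct list_head        workers;  /* all attached workers */
        };

        struct worker {
                /* ... */
                struct list_head        node;     /* anchored at pool->workers */
        };

        #define for_each_pool_worker(worker, pool) \
                list_for_each_entry((worker), &(pool)->workers, node)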

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

19 Jun, 2013

1 commit

  • Most of the code from kernel/sched.c was moved to kernel/sched/core.c a
    long time back and the comments/Documentation never got updated.

    I noticed this while going through sched-domains.txt, and so thought of
    fixing it globally.

    I haven't cross-checked whether everything these files reference in
    sched/core.c is still present and unchanged, as that wasn't the motive
    behind this patch.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/cdff76a265326ab8d71922a1db5be599f20aad45.1370329560.git.viresh.kumar@linaro.org
    Signed-off-by: Ingo Molnar

    Viresh Kumar
     

01 May, 2013

1 commit

  • One of the problems that arises when converting a dedicated custom
    threadpool to workqueue is that the shared worker pool used by workqueue
    anonymizes each worker, making it more difficult to identify what the
    worker was doing on which target from the output of sysrq-t or a debug
    dump from oops, BUG() and friends.

    This patch implements set_worker_desc() which can be called from any
    workqueue work function to set its description. When the worker task is
    dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion
    and so on - the description will be printed out together with the
    workqueue name and the worker function pointer.

    The printing side is implemented by print_worker_info() which is called
    from functions in task dump paths - sched_show_task() and
    dump_stack_print_info(). print_worker_info() can be safely called on
    any task in any state as long as the task struct itself is accessible.
    It uses probe_*() functions to access worker fields. It may print
    garbage if something went very wrong, but it wouldn't cause (another)
    oops.

    The description is currently limited to 24 bytes including the
    terminating \0. worker->desc_valid and worker->desc[] are added, and
    the 64-byte marker, which was already incorrect before adding the new
    fields, is moved to the correct position.
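
    As a usage sketch (modeled on the writeback conversion that produces
    the dump below; exact field names are illustrative):

        static void bdi_writeback_workfn(struct work_struct *work)
        {
                struct bdi_writeback *wb = container_of(to_delayed_work(work),
                                                        struct bdi_writeback,
                                                        dwork);

                /* label this worker for sysrq-t/WARN/oops dumps */
                set_worker_desc("flush-%s", dev_name(wb->bdi->dev));

                /* ... perform the actual writeback ... */
        }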

    Here's an example dump with writeback updated to set the bdi name as
    worker desc.

    Hardware name: Bochs
    Modules linked in:
    Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1
    Workqueue: writeback bdi_writeback_workfn (flush-8:0)
    ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8
    ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0
    ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] warn_slowpath_common+0x7f/0xc0
    [] warn_slowpath_null+0x1a/0x20
    [] bdi_writeback_workfn+0x2a0/0x3b0
    ...

    Signed-off-by: Tejun Heo
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Acked-by: Jan Kara
    Cc: Oleg Nesterov
    Cc: Jens Axboe
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

20 Mar, 2013

1 commit

  • Rebinding workers of a per-cpu pool after a CPU comes online involves
    a lot of back-and-forth mostly because only the task itself could
    adjust CPU affinity if PF_THREAD_BOUND was set.

    As CPU_ONLINE itself couldn't adjust affinity, it had to somehow
    coerce the workers themselves to perform set_cpus_allowed_ptr(). Due
    to the various states a worker can be in, this led to three different
    paths through which a worker may be rebound: worker->rebind_work is
    queued to busy workers; idle ones are signaled by unlinking
    worker->entry and then call idle_worker_rebind() themselves; the
    manager isn't covered by either and implements its own mechanism.

    PF_THREAD_BOUND has been replaced with PF_NO_SETAFFINITY and CPU_ONLINE
    itself can now manipulate the CPU affinity of workers. This patch
    replaces the existing rebind mechanism with direct one where
    CPU_ONLINE iterates over all workers using for_each_pool_worker(),
    restores CPU affinity, and clears WORKER_UNBOUND.

    There are a couple subtleties. All bound idle workers should have
    their runqueues set to that of the bound CPU; however, if the target
    task isn't running, set_cpus_allowed_ptr() just updates the
    cpus_allowed mask deferring the actual migration to when the task
    wakes up. This is worked around by waking up idle workers after
    restoring CPU affinity before any workers can become bound.

    Another subtlety stems from matching @pool->nr_running with the
    number of running unbound workers. While DISASSOCIATED, all workers
    are unbound and nr_running is zero. As workers become bound again,
    nr_running needs to be adjusted accordingly; however, there is no good
    way to tell whether a given worker is running without poking into
    scheduler internals. Instead of clearing UNBOUND directly,
    rebind_workers() replaces UNBOUND with another new NOT_RUNNING flag -
    REBOUND, which will later be cleared by the workers themselves while
    preparing for the next round of work item execution. The only change
    needed for the workers is clearing REBOUND along with PREP.

    * This patch leaves for_each_busy_worker() without any user. Removed.

    * idle_worker_rebind(), busy_worker_rebind_fn(), worker->rebind_work
    and rebind logic in manage_workers() removed.

    * worker_thread() now looks at WORKER_DIE instead of testing whether
    @worker->entry is empty to determine whether it needs to do
    something special as dying is the only special thing now.
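
    A condensed sketch of the new direct mechanism (close to the merged
    rebind_workers(); locking assertions and sanity checks trimmed):

        static void rebind_workers(struct worker_pool *pool)
        {
                struct worker *worker;

                /* restore CPU affinity of all workers first */
                for_each_pool_worker(worker, pool)
                        WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
                                                pool->attrs->cpumask) < 0);

                spin_lock_irq(&pool->lock);
                for_each_pool_worker(worker, pool) {
                        unsigned int worker_flags = worker->flags;

                        /* wake idle workers so the restored affinity is
                         * applied before they can become bound */
                        if (worker_flags & WORKER_IDLE)
                                wake_up_process(worker->task);

                        /* swap UNBOUND for REBOUND; the worker clears
                         * REBOUND itself along with PREP */
                        worker_flags |= WORKER_REBOUND;
                        worker_flags &= ~WORKER_UNBOUND;
                        ACCESS_ONCE(worker->flags) = worker_flags;
                }
                spin_unlock_irq(&pool->lock);
        }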

    Signed-off-by: Tejun Heo
    Reviewed-by: Lai Jiangshan

    Tejun Heo
     

13 Mar, 2013

1 commit

  • Workqueue is mixing unsigned int and int for @cpu variables. There's
    no point in using unsigned int for cpus - many cpu-related APIs
    take int anyway. Consistently use int for @cpu variables so that we
    can use negative values to mark special ones.

    This patch doesn't introduce any visible behavior changes.

    Signed-off-by: Tejun Heo
    Reviewed-by: Lai Jiangshan

    Tejun Heo
     

05 Mar, 2013

1 commit

  • Rescuers visit different worker_pools to process work items from pools
    under pressure. Currently, rescuer->pool is updated outside any
    locking and when an outsider looks at a rescuer, there's no way to
    tell when and whether rescuer->pool is going to change. While this
    doesn't currently cause any problem, it is nasty.

    With recent worker_maybe_bind_and_lock() changes, we can move
    rescuer->pool updates inside pool locks such that if rescuer->pool
    equals a locked pool, it's guaranteed to stay that way until the pool
    is unlocked.

    Move rescuer->pool updates inside pool->lock.

    This patch doesn't introduce any visible behavior difference.

    tj: Updated the description.
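
    A sketch of the resulting pattern in rescuer_thread() (condensed;
    worker_maybe_bind_and_lock() returns with pool->lock held):

        /* migrate to the pool's CPU if possible; returns with
         * pool->lock held, so the update is covered by the lock */
        worker_maybe_bind_and_lock(pool);
        rescuer->pool = pool;

        /* ... process the pool's mayday'd work items ... */

        rescuer->pool = NULL;
        spin_unlock_irq(&pool->lock);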

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

14 Feb, 2013

1 commit

  • workqueue has moved away from global_cwqs to worker_pools, and with the
    scheduled custom worker pools, workqueues will be associated with
    pools which don't have anything to do with CPUs. The workqueue code
    went through a significant amount of change recently and mass renaming
    isn't likely to hurt much additionally. Let's replace 'cpu' with
    'pool' so that the naming reflects the current design.

    * s/struct cpu_workqueue_struct/struct pool_workqueue/
    * s/cpu_wq/pool_wq/
    * s/cwq/pwq/

    This patch is purely cosmetic.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

25 Jan, 2013

1 commit

  • global_cwq is now nothing but a container for per-cpu standard
    worker_pools. Declare the worker pools directly as
    cpu/unbound_std_worker_pools[] and remove global_cwq.

    * ____cacheline_aligned_in_smp moved from global_cwq to worker_pool.
    This probably would have made sense even before this change as we
    want each pool to be aligned.

    * get_gcwq() is replaced with std_worker_pools() which returns the
    pointer to the standard pool array for a given CPU.

    * __alloc_workqueue_key() updated to use get_std_worker_pool() instead
    of open-coding pool determination.

    This is part of an effort to remove global_cwq and make worker_pool
    the top level abstraction, which in turn will help implementing worker
    pools with user-specified attributes.

    v2: Joonsoo pointed out that it'd be better to align struct worker_pool
    rather than the array so that every pool is aligned.
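
    A sketch of the replacement accessor (close to the merged version):

        static struct worker_pool *std_worker_pools(int cpu)
        {
                if (cpu != WORK_CPU_UNBOUND)
                        return per_cpu(cpu_std_worker_pools, cpu);
                else
                        return unbound_std_worker_pools;
        }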

    Signed-off-by: Tejun Heo
    Reviewed-by: Lai Jiangshan
    Cc: Joonsoo Kim

    Tejun Heo
     

19 Jan, 2013

3 commits