15 Jan, 2021

1 commit

  • With CFI, a callback function passed to __kthread_queue_delayed_work
    from a module can point to a jump table entry defined in the module
    instead of the one used in the core kernel, which breaks this test:

    WARN_ON_ONCE(timer->function != kthread_delayed_work_timer_fn);

    To work around the problem, disable the warning when CFI and modules
    are both enabled.
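
    As a rough illustration of the idea (a userspace sketch, not the exact
    kernel patch; the config macros are real kernel symbols but the helper
    function is invented for the demo):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: when CFI and modules are both enabled, a callback handed in
 * from a module may resolve to the module's own jump-table entry, so
 * pointer inequality is not necessarily a bug and the warning must be
 * suppressed rather than fire spuriously. */
#define CONFIG_CFI_CLANG 1
#define CONFIG_MODULES 1

typedef void (*timer_fn_t)(void);

static void kthread_delayed_work_timer_fn(void) {}
static void module_jump_table_entry(void) {}  /* stands in for the CFI alias */

static bool timer_fn_check_warns(timer_fn_t fn)
{
#if defined(CONFIG_CFI_CLANG) && defined(CONFIG_MODULES)
    (void)fn;
    return false;  /* inequality may be a benign CFI alias: stay quiet */
#else
    return fn != kthread_delayed_work_timer_fn;
#endif
}
```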

    Bug: 145210207
    Change-Id: I5b0a60bb69ce8e2bc0d8e4bf6736457b6425b6cf
    Signed-off-by: Sami Tolvanen

    Sami Tolvanen
     

04 Dec, 2020

1 commit


03 Nov, 2020

1 commit

  • There is a small race window when a delayed work is being canceled and
    the work still might be queued from the timer_fn:

    CPU0                                    CPU1

    kthread_cancel_delayed_work_sync()
      __kthread_cancel_work_sync()
        __kthread_cancel_work()
          work->canceling++;
                                            kthread_delayed_work_timer_fn()
                                              kthread_insert_work();

    BUG: kthread_insert_work() should not get called when work->canceling is
    set.
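
    A compressed model of the rule the BUG line states, assuming the check
    happens under the worker lock (names mirror the kernel's, but this is a
    userspace sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the fix: the timer callback must bail out instead
 * of calling kthread_insert_work() while a cancel is in flight, i.e.
 * while work->canceling is non-zero. */
struct kthread_work {
    int canceling;   /* incremented by __kthread_cancel_work() */
};

/* returns true only if the timer callback may queue the work */
static bool timer_fn_may_queue(const struct kthread_work *work)
{
    return work->canceling == 0;
}
```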

    Signed-off-by: Zqiang
    Signed-off-by: Andrew Morton
    Reviewed-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc:
    Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com
    Signed-off-by: Linus Torvalds

    Zqiang
     

17 Oct, 2020

1 commit

  • Fix multiple occurrences of duplicated words in kernel/.

    Fix one typo/spello on the same line as a duplicate word. Change one
    instance of "the the" to "that the". Otherwise just drop one of the
    repeated words.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

13 Aug, 2020

1 commit

  • Add helpers to wrap the get_fs/set_fs magic for undoing any damage
    done by set_fs(KERNEL_DS). There is no real functional benefit, but
    this documents the intent of these calls better, and will allow
    stubbing the functions out easily for kernel builds that do not allow
    address space overrides in the future.

    [hch@lst.de: drop two incorrect hunks, fix a commit log typo]
    Link: http://lkml.kernel.org/r/20200714105505.935079-6-hch@lst.de

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Acked-by: Mark Rutland
    Acked-by: Greentime Hu
    Acked-by: Geert Uytterhoeven
    Cc: Nick Hu
    Cc: Vincent Chen
    Cc: Paul Walmsley
    Cc: Palmer Dabbelt
    Link: http://lkml.kernel.org/r/20200710135706.537715-6-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

08 Aug, 2020

2 commits

  • Originally kthread_create_on_cpu() parked and woke up the new thread.
    However, since commit a65d40961dc7 ("kthread/smpboot: do not park in
    kthread_create_on_cpu()") this is no longer the case. This patch removes
    the comment that has been left behind and is now incorrect / stale.

    Fixes: a65d40961dc7 ("kthread/smpboot: do not park in kthread_create_on_cpu()")
    Signed-off-by: Ilias Stamatis
    Signed-off-by: Andrew Morton
    Reviewed-by: Petr Mladek
    Link: http://lkml.kernel.org/r/20200611135920.240551-1-stamatis.iliass@gmail.com
    Signed-off-by: Linus Torvalds

    Ilias Stamatis
     
  • For SMP systems using IPI based TLB invalidation, looking at
    current->active_mm is entirely reasonable. This then presents the
    following race condition:

    CPU0                          CPU1

    flush_tlb_mm(mm)              use_mm(mm)
                                    tsk->active_mm = mm;
                                    <IPI arrives>
                                      if (tsk->active_mm == mm)
                                        // flush TLBs
                                    switch_mm(old_mm, mm, tsk);

    Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
    because the IPI lands before we actually switched.

    Avoid this by disabling IRQs across changing ->active_mm and
    switch_mm().

    Of the (SMP) architectures that have IPI based TLB invalidate:

    Alpha - checks active_mm
    ARC - ASID specific
    IA64 - checks active_mm
    MIPS - ASID specific flush
    OpenRISC - shoots down world
    PARISC - shoots down world
    SH - ASID specific
    SPARC - ASID specific
    x86 - N/A
    xtensa - checks active_mm

    So at the very least Alpha, IA64 and Xtensa are suspect.

    On top of this, for scheduler consistency we need at least preemption
    disabled across changing tsk->mm and doing switch_mm(), which is
    currently provided by task_lock(), but that's not sufficient for
    PREEMPT_RT.
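
    The ordering being enforced can be sketched like this (a userspace
    model with stubbed primitives; the real kthread_use_mm() operates on
    `current` and the signature here is simplified):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the fix's ordering: the ->active_mm update and switch_mm()
 * form one unit with IRQs off, so a TLB-flush IPI can only observe the
 * task fully before or fully after the switch, never in between. */
struct mm { int id; };
struct task { struct mm *mm, *active_mm; };

static bool irqs_enabled = true;
static void local_irq_disable(void) { irqs_enabled = false; }
static void local_irq_enable(void)  { irqs_enabled = true; }

static void switch_mm(struct mm *prev, struct mm *next, struct task *tsk)
{
    (void)prev; (void)next; (void)tsk;
    assert(!irqs_enabled);   /* must run inside the IRQ-off section */
}

static void kthread_use_mm(struct task *tsk, struct mm *mm)
{
    struct mm *active_mm = tsk->active_mm;

    local_irq_disable();
    tsk->active_mm = mm;
    tsk->mm = mm;
    switch_mm(active_mm, mm, tsk);
    local_irq_enable();
}
```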

    [akpm@linux-foundation.org: add comment]

    Reported-by: Andy Lutomirski
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Cc: Nicholas Piggin
    Cc: Jens Axboe
    Cc: Kees Cook
    Cc: Jann Horn
    Cc: Will Deacon
    Cc: Christoph Hellwig
    Cc: Mathieu Desnoyers
    Cc:
    Link: http://lkml.kernel.org/r/20200721154106.GE10769@hirez.programming.kicks-ass.net
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

08 Jul, 2020

1 commit


18 Jun, 2020

1 commit


15 Jun, 2020

2 commits

  • This is a kernel enhancement that configures the CPU affinity of
    kernel threads via the kernel boot option nohz_full=.

    When this option is specified, the cpumask is immediately applied upon
    kthread launch. It does not affect kernel threads that specify a
    specific CPU and node.

    This allows CPU isolation (that is, not allowing certain threads to
    execute on certain CPUs) without using the isolcpus=domain parameter,
    making it possible to enable load balancing on such CPUs during
    runtime (see kernel-parameters.txt).

    Note-1: this is based on Wind River's patch at
    https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch

    Difference being that this patch is limited to modifying kernel thread
    cpumask. Behaviour of other threads can be controlled via cgroups or
    sched_setaffinity.

    Note-2: Wind River's patch was based off Christoph Lameter's patch at
    https://lwn.net/Articles/565932/ with the only difference being
    the kernel parameter changed from kthread to kthread_cpus.
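
    The policy can be modelled in a few lines (a toy sketch: plain integer
    bitmasks stand in for struct cpumask, and the helper name is invented):

```c
#include <assert.h>

/* Toy model of the policy: a freshly launched kthread inherits the
 * housekeeping cpumask derived from nohz_full= unless it was created
 * for a specific CPU or NUMA node. */
typedef unsigned long cpumask_t;

static cpumask_t kthread_default_mask(cpumask_t requested,
                                      cpumask_t housekeeping,
                                      cpumask_t possible)
{
    if (requested)               /* bound to an explicit cpu/node: keep it */
        return requested;
    /* otherwise fall back to housekeeping CPUs, or all possible CPUs */
    return housekeeping ? housekeeping : possible;
}
```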

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200527142909.23372-3-frederic@kernel.org

    Marcelo Tosatti
     
  • The next patch will switch the unbound kernel threads mask to
    housekeeping_cpumask(), a subset of cpu_possible_mask. So in order to
    ease bisection, let's first switch kthreads' default affinity from
    cpu_all_mask to cpu_possible_mask.

    It looks safe to do so, as cpu_possible_mask seems to be initialized
    at setup_arch() time, well before kthreadd is created.

    Suggested-by: Frederic Weisbecker
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200527142909.23372-2-frederic@kernel.org

    Marcelo Tosatti
     

12 Jun, 2020

1 commit

  • Merge some more updates from Andrew Morton:

    - various hotfixes and minor things

    - hch's use_mm/unuse_mm cleanups

    Subsystems affected by this patch series: mm/hugetlb, scripts, kcov,
    lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc.

    * emailed patches from Andrew Morton :
    kernel: set USER_DS in kthread_use_mm
    kernel: better document the use_mm/unuse_mm API contract
    kernel: move use_mm/unuse_mm to kthread.c
    stacktrace: cleanup inconsistent variable type
    lib: test get_count_order/long in test_bitops.c
    mm: add comments on pglist_data zones
    ocfs2: fix spelling mistake and grammar
    mm/debug_vm_pgtable: fix kernel crash by checking for THP support
    lib: fix bitmap_parse() on 64-bit big endian archs
    checkpatch: correct check for kernel parameters doc
    nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
    lib/lz4/lz4_decompress.c: document deliberate use of `&'
    kcov: check kcov_softirq in kcov_remote_stop()
    scripts/spelling: add a few more typos
    khugepaged: selftests: fix timeout condition in wait_for_scan()

    Linus Torvalds
     

11 Jun, 2020

3 commits

  • Some architectures like arm64 and s390 require USER_DS to be set for
    kernel threads to access user address space, which is the whole purpose
    of kthread_use_mm, but others like x86 don't. That has led to a huge
    mess where some callers are fixed up once they are tested on said
    architectures, while others linger around and yet others like io_uring
    try to do "clever" optimizations for what usually is just a trivial
    assignment to a member in the thread_struct for most architectures.

    Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the
    previous value instead.
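
    The save/restore shape of the change might look roughly like this (a
    userspace sketch; the kernel stores the saved limit in the task and the
    constants here are illustrative):

```c
#include <assert.h>

/* Sketch of the addressing-limit handling this patch centralizes:
 * kthread_use_mm() forces USER_DS and kthread_unuse_mm() restores
 * whatever limit was active before, instead of assuming KERNEL_DS. */
enum fs { USER_DS, KERNEL_DS };

static enum fs current_fs = KERNEL_DS;   /* kthreads start in KERNEL_DS */
static enum fs saved_fs;

static enum fs get_fs(void) { return current_fs; }
static void set_fs(enum fs fs) { current_fs = fs; }

static void kthread_use_mm(void)
{
    saved_fs = get_fs();   /* remember the previous limit */
    set_fs(USER_DS);       /* user-space accesses now behave normally */
}

static void kthread_unuse_mm(void)
{
    set_fs(saved_fs);      /* restore, rather than assuming KERNEL_DS */
}
```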

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Michael S. Tsirkin
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Felix Kuehling
    Cc: Jason Wang
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Cc: Greg Kroah-Hartman
    Link: http://lkml.kernel.org/r/20200404094101.672954-7-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Switch the function documentation to kerneldoc comments, and add
    WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
    not have ->mm set (or has ->mm set in the case of unuse_mm).

    Also give the functions a kthread_ prefix to better document the use case.
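
    The preconditions the new WARN_ON_ONCE asserts encode can be modelled
    like this (PF_KTHREAD matches the kernel's value; the task struct is a
    simplified stand-in):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the sanity checks: kthread_use_mm() is only legal for a
 * kernel thread (PF_KTHREAD) with no mm set yet; kthread_unuse_mm()
 * requires an mm to already be set. */
#define PF_KTHREAD 0x00200000

struct task { unsigned int flags; void *mm; };

static bool use_mm_precondition(const struct task *tsk)
{
    return (tsk->flags & PF_KTHREAD) && tsk->mm == NULL;
}

static bool unuse_mm_precondition(const struct task *tsk)
{
    return (tsk->flags & PF_KTHREAD) && tsk->mm != NULL;
}
```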

    [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
    Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
    [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
    Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Acked-by: Greg Kroah-Hartman [usb]
    Acked-by: Haren Myneni
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Patch series "improve use_mm / unuse_mm", v2.

    This series improves the use_mm / unuse_mm interface by better
    documenting the assumptions, and by moving the set_fs manipulations
    spread over the callers into the core API.

    This patch (of 3):

    Use the proper API instead.

    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

    These helpers are only for use with kernel threads, and I will tie them
    more into the kthread infrastructure going forward. Also move the
    prototypes to kthread.h - mmu_context.h was a little weird to start with
    as it otherwise contains very low-level MM bits.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Cc: Greg Kroah-Hartman
    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

09 May, 2020

1 commit

  • It's handy to keep the kthread_fn just as a unique cookie to identify
    classes of kthreads. E.g. if you can verify that a given task is
    running your thread_fn, then you know what type of data kthread_data()
    points to.

    We'll use this in nfsd to pass some information into the vfs. Note it
    will need kthread_data() exported too.
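
    The cookie idea in miniature (the lookup structure and the nfsd helper
    are invented for the demo; only the pattern of comparing the stored
    threadfn before trusting kthread_data() comes from the commit):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: a saved threadfn pointer identifies the class of a kthread,
 * so the per-thread data is only interpreted when the function matches. */
typedef int (*threadfn_t)(void *);

struct kthread { threadfn_t threadfn; void *data; };

static int nfsd_thread_fn(void *data)  { (void)data; return 0; }
static int other_thread_fn(void *data) { (void)data; return 0; }

/* return the per-thread data only if this really is one of ours */
static void *nfsd_data_of(const struct kthread *k)
{
    return k->threadfn == nfsd_thread_fn ? k->data : NULL;
}
```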

    Original-patch-by: Tejun Heo
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

20 Mar, 2020

1 commit

  • When we create a kthread with kthread_create_on_cpu(), the child
    thread entry is kthread.c:kthread(), which can be preempted by the
    parent right after it calls complete(done) but before schedule() is
    called. The parent then calls wait_task_inactive(child), but the child
    is still on the runqueue, so the parent ends up in schedule_hrtimeout()
    for 1 jiffy, which wastes a lot of time, especially on startup.

    parent                              child
    kthread_create_on_cpu()
      wait_for_completion(&done) ----> kthread.c:kthread()
                                   |---- complete(done); -- wakeup and
                                   |     preempted by parent
      kthread_bind()                |  schedule(); -- dequeue here
      wait_task_inactive(child)     |
      schedule_hrtimeout(1 jiffy) --|

    So we want the child to wake the parent without being preempted by it;
    since the child is about to call schedule() anyway, the parent then no
    longer needs schedule_hrtimeout(1 jiffy), as the child has already
    been dequeued.

    The same issue affects kthread_park() and kthread_parkme().
    This patch saves 120ms on rk312x startup with CONFIG_HZ=300.
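
    One way to picture the desired ordering is a preemption-disabled
    window that lasts until the child actually blocks (a single-threaded
    sketch with stub primitives; the exact mechanism in the patch may
    differ):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: the child signals the parent only while preemption is
 * disabled, and the window extends until the child dequeues itself,
 * so by the time the parent runs, wait_task_inactive() succeeds
 * immediately instead of retrying for a jiffy. */
static int preempt_count;
static bool parent_woken;
static bool child_on_rq = true;

static void preempt_disable(void) { preempt_count++; }
static void complete(void)        { parent_woken = true; }

static void schedule_preempt_disabled(void)
{
    child_on_rq = false;   /* dequeue first... */
    preempt_count--;       /* ...then allow preemption again */
}

static void kthread_child_body(void)
{
    preempt_disable();
    complete();                   /* parent cannot preempt us here */
    schedule_preempt_disabled();  /* off the runqueue before parent runs */
}
```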

    Signed-off-by: Liang Chen
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Link: https://lkml.kernel.org/r/20200306070133.18335-2-cl@rock-chips.com

    Liang Chen
     

17 Oct, 2019

1 commit

  • The __kthread_queue_delayed_work() function is not exported, so make
    it static to avoid the following sparse warning:

    kernel/kthread.c:869:6: warning: symbol '__kthread_queue_delayed_work' was not declared. Should it be static?

    Signed-off-by: Ben Dooks
    Signed-off-by: Linus Torvalds

    Ben Dooks
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

1 commit

  • kthread.h can't be included in psi_types.h because that creates a
    circular inclusion, with kthread.h eventually including psi_types.h
    and complaining about kthread structures not being defined because
    they are defined later in kthread.h. Resolve this by removing the
    psi_types.h inclusion from the headers included from kthread.h.

    Link: http://lkml.kernel.org/r/20190319235619.260832-7-surenb@google.com
    Signed-off-by: Suren Baghdasaryan
    Acked-by: Johannes Weiner
    Cc: Dennis Zhou
    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: Li Zefan
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suren Baghdasaryan
     

07 Mar, 2019

2 commits

  • Merge misc updates from Andrew Morton:

    - a few misc things

    - ocfs2 updates

    - most of MM

    * emailed patches from Andrew Morton : (159 commits)
    tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
    proc: more robust bulk read test
    proc: test /proc/*/maps, smaps, smaps_rollup, statm
    proc: use seq_puts() everywhere
    proc: read kernel cpu stat pointer once
    proc: remove unused argument in proc_pid_lookup()
    fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
    fs/proc/self.c: code cleanup for proc_setup_self()
    proc: return exit code 4 for skipped tests
    mm,mremap: bail out earlier in mremap_to under map pressure
    mm/sparse: fix a bad comparison
    mm/memory.c: do_fault: avoid usage of stale vm_area_struct
    writeback: fix inode cgroup switching comment
    mm/huge_memory.c: fix "orig_pud" set but not used
    mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
    mm/memcontrol.c: fix bad line in comment
    mm/cma.c: cma_declare_contiguous: correct err handling
    mm/page_ext.c: fix an imbalance with kmemleak
    mm/compaction: pass pgdat to too_many_isolated() instead of zone
    mm: remove zone_lru_lock() function, access ->lru_lock directly
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - refcount conversions

    - Solve the rq->leaf_cfs_rq_list can of worms for real.

    - improve power-aware scheduling

    - add sysctl knob for Energy Aware Scheduling

    - documentation updates

    - misc other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
    kthread: Do not use TIMER_IRQSAFE
    kthread: Convert worker lock to raw spinlock
    sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
    sched/fair: Remove unused 'sd' parameter from select_idle_smt()
    sched/wait: Use freezable_schedule() when possible
    sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block
    sched/fair: Explain LLC nohz kick condition
    sched/fair: Simplify nohz_balancer_kick()
    sched/topology: Fix percpu data types in struct sd_data & struct s_data
    sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument
    sched/fair: Fix O(nr_cgroups) in the load balancing path
    sched/fair: Optimize update_blocked_averages()
    sched/fair: Fix insertion in rq->leaf_cfs_rq_list
    sched/fair: Add tmp_alone_branch assertion
    sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
    sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
    sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
    sched/fair: Update scale invariance of PELT
    sched/fair: Move the rq_of() helper function
    sched/core: Convert task_struct.stack_refcount to refcount_t
    ...

    Linus Torvalds
     

06 Mar, 2019

1 commit

  • Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

    All these places for replacement were found by running the following
    grep patterns on the entire kernel code. Please let me know if this
    might have missed some instances. This might also have replaced some
    false positives. I will appreciate suggestions, inputs and review.

    1. git grep "nid == -1"
    2. git grep "node == -1"
    3. git grep "nid = -1"
    4. git grep "node = -1"

    This patch (of 2):

    At present there are multiple places where invalid node number is
    encoded as -1. Even though implicitly understood it is always better to
    have macros in there. Replace these open encodings for an invalid node
    number with the global macro NUMA_NO_NODE. This helps remove NUMA
    related assumptions like 'invalid node' from various places redirecting
    them to a common definition.
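
    The replacement in a nutshell (NUMA_NO_NODE's value matches the
    kernel's; the helper is just for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* The same sentinel, spelled with the shared macro instead of an
 * open-coded -1. */
#define NUMA_NO_NODE (-1)

static bool node_is_valid(int nid)
{
    return nid != NUMA_NO_NODE;   /* was: nid != -1 */
}
```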

    Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: David Hildenbrand
    Acked-by: Jeff Kirsher [ixgbe]
    Acked-by: Jens Axboe [mtip32xx]
    Acked-by: Vinod Koul [dmaengine.c]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Doug Ledford [drivers/infiniband]
    Cc: Joseph Qi
    Cc: Hans Verkuil
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

28 Feb, 2019

2 commits

  • The TIMER_IRQSAFE usage was introduced in commit 22597dc3d97b1 ("kthread:
    initial support for delayed kthread work") which modelled the delayed
    kthread code after workqueue's code. The workqueue code requires the flag
    TIMER_IRQSAFE for synchronisation purpose. This is not true for kthread's
    delay timer since all operations occur under a lock.

    Remove TIMER_IRQSAFE from the timer initialisation and use
    timer_setup(), the official initialisation function.
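
    The shape of the change, with stubbed types (TIMER_IRQSAFE's value is
    the kernel's, shown for illustration; the init helper is invented):

```c
#include <assert.h>

/* Sketch: initialize the worker's delay timer with timer_setup() and
 * no TIMER_IRQSAFE flag, since the kthread worker's own lock already
 * serializes all operations on the timer. */
#define TIMER_IRQSAFE 0x00200000u   /* kernel value, for illustration */

struct timer_list {
    void (*function)(struct timer_list *);
    unsigned int flags;
};

static void kthread_delayed_work_timer_fn(struct timer_list *t) { (void)t; }

static void timer_setup(struct timer_list *t,
                        void (*fn)(struct timer_list *), unsigned int flags)
{
    t->function = fn;
    t->flags = flags;
}

static void init_delayed_work_timer(struct timer_list *t)
{
    timer_setup(t, kthread_delayed_work_timer_fn, 0 /* no TIMER_IRQSAFE */);
}
```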

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Petr Mladek
    Link: https://lkml.kernel.org/r/20190212162554.19779-2-bigeasy@linutronix.de

    Sebastian Andrzej Siewior
     
  • In order to enable the queuing of kthread work items from hardirq context
    even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
    raw_spin_lock.

    This is only acceptable to do because the work performed under the lock is
    well-bounded and minimal.

    Reported-by: Steffen Trumtrar
    Reported-by: Tim Sander
    Signed-off-by: Julia Cartwright
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Tested-by: Steffen Trumtrar
    Reviewed-by: Petr Mladek
    Cc: Guenter Roeck
    Link: https://lkml.kernel.org/r/20190212162554.19779-1-bigeasy@linutronix.de

    Julia Cartwright
     

11 Feb, 2019

1 commit

  • kthread_should_park() is used to check if the calling kthread ('current')
    should park, but there is no function to check whether an arbitrary kthread
    should be parked. The latter is required to plug a CPU hotplug race vs. a
    parking ksoftirqd thread.

    The new __kthread_should_park() receives a task_struct as parameter to
    check if the corresponding kernel thread should be parked.

    Call __kthread_should_park() from kthread_should_park() to avoid code
    duplication.
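
    The refactor in sketch form (structures and the flag encoding are
    simplified stand-ins; the kernel uses test_bit() on a bit number):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: the bit test moves into a helper that takes an explicit
 * task, and the old current-based API becomes a thin wrapper. */
#define KTHREAD_SHOULD_PARK (1UL << 1)

struct kthread { unsigned long flags; };
struct task_struct { struct kthread *kthread; };

static struct task_struct *current_task;   /* stands in for `current` */

static bool __kthread_should_park(struct task_struct *k)
{
    return k->kthread->flags & KTHREAD_SHOULD_PARK;
}

static bool kthread_should_park(void)
{
    return __kthread_should_park(current_task);  /* no duplication */
}
```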

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E . McKenney"
    Cc: Sebastian Andrzej Siewior
    Cc: Douglas Anderson
    Cc: Stephen Boyd
    Link: https://lkml.kernel.org/r/20190128234625.78241-2-mka@chromium.org

    Matthias Kaehlcke
     

14 Aug, 2018

1 commit

  • Pull scheduler updates from Thomas Gleixner:

    - Cleanup and improvement of NUMA balancing

    - Refactoring and improvements to the PELT (Per Entity Load Tracking)
    code

    - Watchdog simplification and related cleanups

    - The usual pile of small incremental fixes and improvements

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
    watchdog: Reduce message verbosity
    stop_machine: Reflow cpu_stop_queue_two_works()
    sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
    sched/numa: Use group_weights to identify if migration degrades locality
    sched/numa: Update the scan period without holding the numa_group lock
    sched/numa: Remove numa_has_capacity()
    sched/numa: Modify migrate_swap() to accept additional parameters
    sched/numa: Remove unused task_capacity from 'struct numa_stats'
    sched/numa: Skip nodes that are at 'hoplimit'
    sched/debug: Reverse the order of printing faults
    sched/numa: Use task faults only if numa_group is not yet set up
    sched/numa: Set preferred_node based on best_cpu
    sched/numa: Simplify load_too_imbalanced()
    sched/numa: Evaluate move once per node
    sched/numa: Remove redundant field
    sched/debug: Show the sum wait time of a task group
    sched/fair: Remove #ifdefs from scale_rt_capacity()
    sched/core: Remove get_cpu() from sched_fork()
    sched/cpufreq: Clarify sugov_get_util()
    sched/sysctl: Remove unused sched_time_avg_ms sysctl
    ...

    Linus Torvalds
     

26 Jul, 2018

1 commit

  • There is a window for racing when printing directly to task->comm,
    allowing other threads to see a non-terminated string. The vsnprintf
    function fills the buffer, counts the truncated chars, then finally
    writes the \0 at the end.

    creator                              other
    vsnprintf:
      fill (not terminated)
      count the rest                     trace_sched_waking(p):
      ...                                  memcpy(comm, p->comm, TASK_COMM_LEN)
      write \0

    The consequences depend on how 'other' uses the string. In our case,
    it was copied into the tracing system's saved cmdlines, a buffer of
    adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):

    crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
    0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"

    ...and a strcpy out of there would cause stack corruption:

    [224761.522292] Kernel panic - not syncing: stack-protector:
    Kernel stack is corrupted in: ffffff9bf9783c78

    crash-arm64> kbt | grep 'comm\|trace_print_context'
    #6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
    comm (char [16]) = "irq/497-pwr_even"

    crash-arm64> rd 0xffffffd4d0e17d14 8
    ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
    ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
    ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
    ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........

    The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
    likely needed because of this same bug.

    Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
    This way, there won't be a window where comm is not terminated.
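
    A userspace sketch of the fix (TASK_COMM_LEN matches the kernel;
    the globals and helper names are simplified stand-ins, and the real
    set_task_comm() also takes the task lock):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Sketch: format into a stack buffer first, then publish with a
 * set_task_comm()-style copy, so readers never observe comm without
 * its terminating NUL. */
#define TASK_COMM_LEN 16

static char task_comm[TASK_COMM_LEN] = "";

static void set_task_comm(const char *buf)
{
    /* single publish point, always leaving a NUL terminator */
    strncpy(task_comm, buf, TASK_COMM_LEN - 1);
    task_comm[TASK_COMM_LEN - 1] = '\0';
}

static void kthread_set_comm(const char *fmt, ...)
{
    char name[TASK_COMM_LEN];
    va_list args;

    va_start(args, fmt);
    vsnprintf(name, sizeof(name), fmt, args);  /* truncate locally */
    va_end(args);
    set_task_comm(name);                       /* publish terminated string */
}
```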

    Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com

    Cc: stable@vger.kernel.org
    Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure")
    Reviewed-by: Steven Rostedt (VMware)
    Signed-off-by: Snild Dolkow
    Signed-off-by: Steven Rostedt (VMware)

    Snild Dolkow
     

03 Jul, 2018

2 commits

  • Oleg explains the reason we could hit park+park is that
    smpboot_update_cpumask_percpu_thread()'s

    for_each_cpu_and(cpu, &tmp, cpu_online_mask)
    smpboot_park_kthread();

    turns into:

    for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and)
    smpboot_park_kthread();

    on UP, ignoring the mask. But since we just completely removed that
    function, this is no longer relevant.

    So revert commit:

    b1f5b378e126 ("kthread: Allow kthread_park() on a parked kthread")

    Suggested-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Gaurav reports that commit:

    85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")

    isn't working for him. Because of the following race:

    > controller Thread                     CPUHP Thread
    > takedown_cpu
    > kthread_park
    > kthread_parkme
    > Set KTHREAD_SHOULD_PARK
    >                                       smpboot_thread_fn
    >                                       set Task interruptible
    >
    > wake_up_process
    > if (!(p->state & state))
    >         goto out;
    >
    >                                       Kthread_parkme
    >                                       SET TASK_PARKED
    >                                       schedule
    >                                       raw_spin_lock(&rq->lock)
    > ttwu_remote
    > waiting for __task_rq_lock
    >                                       context_switch
    >
    >                                       finish_lock_switch
    >
    >                                       Case TASK_PARKED
    >                                       kthread_park_complete
    >
    > SET Running

    Furthermore, Oleg noticed that the whole scheduler TASK_PARKED
    handling is buggered: unlike the TASK_DEAD case, which is done with
    preemption disabled, the current code can still complete early on
    preemption :/

    So basically revert that earlier fix and go with a variant of the
    alternative mentioned in the commit. Promote TASK_PARKED to special
    state to avoid the store-store issue on task->state leading to the
    WARN in kthread_unpark() -> __kthread_bind().

    But in addition, add wait_task_inactive() to kthread_park() to ensure
    the task really is PARKED when we return from kthread_park(). This
    avoids the whole kthread still gets migrated nonsense -- although it
    would be really good to get this done differently.

    Reported-by: Gaurav Kohli
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 May, 2018

1 commit

  • The following commit:

    85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")

    added a WARN() in the case where we call kthread_park() on an already
    parked thread, because the old code wasn't doing the right thing there
    and it wasn't at all clear that would happen.

    It turns out, this does in fact happen, so we have to deal with it.

    Instead of potentially returning early, also wait for the completion.
    This does however mean we have to use complete_all() and re-initialize
    the completion on re-use.
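
    A toy model of why both halves of the change are needed (this is not
    the kernel's completion implementation; completion_done() here just
    reports whether a waiter would return immediately):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy completion: complete_all() leaves the event set, so a second
 * kthread_park() on an already-parked thread also sees it, and
 * kthread_unpark() must re-arm the completion before re-use. */
struct completion { bool done; };

static void complete_all(struct completion *c)      { c->done = true; }
static void reinit_completion(struct completion *c) { c->done = false; }

/* would kthread_park()'s wait return immediately? */
static bool completion_done(const struct completion *c) { return c->done; }
```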

    Reported-by: LKP
    Tested-by: Meelis Roos
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: kernel test robot
    Cc: wfg@linux.intel.com
    Cc: Thomas Gleixner
    Fixes: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")
    Link: http://lkml.kernel.org/r/20180504091142.GI12235@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 May, 2018

2 commits

  • Even with the wait-loop fixed, there is a further issue with
    kthread_parkme(). Upon hotplug, when we do takedown_cpu(),
    smpboot_park_threads() can return before all those threads are in fact
    blocked, due to the placement of the complete() in __kthread_parkme().

    When that happens, sched_cpu_dying() -> migrate_tasks() can end up
    migrating such a still runnable task onto another CPU.

    Normally the task will have hit schedule() and gone to sleep by the
    time we do kthread_unpark(), which will then do __kthread_bind() to
    re-bind the task to the correct CPU.

    However, when we lose the initial TASK_PARKED store to the concurrent
    wakeup issue described previously, do the complete(), and get migrated,
    it is possible to either:

    - observe kthread_unpark()'s clearing of SHOULD_PARK and terminate
    the park and set TASK_RUNNING, or

    - __kthread_bind()'s wait_task_inactive() to observe the competing
    TASK_RUNNING store.

    Either way the WARN() in __kthread_bind() will trigger and fail to
    correctly set the CPU affinity.

    Fix this by only issuing the complete() when the kthread has scheduled
    out. This does away with all the icky 'still running' nonsense.

    The alternative is to promote TASK_PARKED to a special state; this
    guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING
    and we'll end up doing the right thing, but it preserves the whole
    icky business of potentially migrating the still runnable thing.

    Reported-by: Gaurav Kohli
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Gaurav reported a problem with __kthread_parkme() where a concurrent
    try_to_wake_up() could result in competing stores to ->state which,
    when the TASK_PARKED store got lost, meant bad things would happen.

    The comment near set_current_state() actually mentions this competing
    store, but only mentions the case against TASK_RUNNING. This same
    store, with different timing, can happen against a subsequent !RUNNING
    store.

    This normally is not a problem, because as per that same comment, the
    !RUNNING state store is inside a condition based wait-loop:

    for (;;) {
        set_current_state(TASK_UNINTERRUPTIBLE);
        if (!need_sleep)
            break;
        schedule();
    }
    __set_current_state(TASK_RUNNING);

    If we lose the (first) TASK_UNINTERRUPTIBLE store to a previous
    (concurrent) wakeup, the schedule() will NO-OP and we'll go around the
    loop once more.

    The problem here is that the TASK_PARKED store is not inside the
    KTHREAD_SHOULD_PARK condition wait-loop.

    There is a genuine issue with sleeps that do not have a condition;
    this is addressed in a subsequent patch.

    Reported-by: Gaurav Kohli
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Oleg Nesterov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

22 Nov, 2017

1 commit

  • With all callbacks converted, and the timer callback prototype
    switched over, the TIMER_FUNC_TYPE cast is no longer needed,
    so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
        $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
        $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

    The now unused macros are also dropped from include/linux/timer.h.

    Signed-off-by: Kees Cook

    Kees Cook
     

15 Nov, 2017

1 commit

  • Pull core block layer updates from Jens Axboe:
    "This is the main pull request for block storage for 4.15-rc1.

    Nothing out of the ordinary in here, and no API changes or anything
    like that. Just various new features for drivers, core changes, etc.
    In particular, this pull request contains:

    - A patch series from Bart, closing the hole on blk/scsi-mq queue
    quiescing.

    - A series from Christoph, building towards hidden gendisks (for
    multipath) and ability to move bio chains around.

    - NVMe
    - Support for native multipath for NVMe (Christoph).
    - Userspace notifications for AENs (Keith).
    - Command side-effects support (Keith).
    - SGL support (Chaitanya Kulkarni)
    - FC fixes and improvements (James Smart)
    - Lots of fixes and tweaks (Various)

    - bcache
    - New maintainer (Michael Lyle)
    - Writeback control improvements (Michael)
    - Various fixes (Coly, Elena, Eric, Liang, et al)

    - lightnvm updates, mostly centered around the pblk interface
    (Javier, Hans, and Rakesh).

    - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

    - Writeback series that fix the much discussed hundreds of millions
    of sync-all units. This goes all the way, as discussed previously
    (me).

    - Fix for missing wakeup on writeback timer adjustments (Yafang
    Shao).

    - Fix laptop mode on blk-mq (me).

    - {mq,name} tuple lookup for IO schedulers, allowing us to have
    alias names. This means you can use 'deadline' on both !mq and on
    mq (where it's called mq-deadline). (me).

    - blktrace race fix, oopsing on sg load (me).

    - blk-mq optimizations (me).

    - Obscure waitqueue race fix for kyber (Omar).

    - NBD fixes (Josef).

    - Disable writeback throttling by default on bfq, like we do on cfq
    (Luca Miccio).

    - Series from Ming that enable us to treat flush requests on blk-mq
    like any other request. This is a really nice cleanup.

    - Series from Ming that improves merging on blk-mq with schedulers,
    getting us closer to flipping the switch on scsi-mq again.

    - BFQ updates (Paolo).

    - blk-mq atomic flags memory ordering fixes (Peter Z).

    - Loop cgroup support (Shaohua).

    - Lots of minor fixes from lots of different folks, both for core and
    driver code"

    * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
    nvme: fix visibility of "uuid" ns attribute
    blk-mq: fixup some comment typos and lengths
    ide: ide-atapi: fix compile error with defining macro DEBUG
    blk-mq: improve tag waiting setup for non-shared tags
    brd: remove unused brd_mutex
    blk-mq: only run the hardware queue if IO is pending
    block: avoid null pointer dereference on null disk
    fs: guard_bio_eod() needs to consider partitions
    xtensa/simdisk: fix compile error
    nvme: expose subsys attribute to sysfs
    nvme: create 'slaves' and 'holders' entries for hidden controllers
    block: create 'slaves' and 'holders' entries for hidden gendisks
    nvme: also expose the namespace identification sysfs files for mpath nodes
    nvme: implement multipath access to nvme subsystems
    nvme: track shared namespaces
    nvme: introduce a nvme_ns_ids structure
    nvme: track subsystems
    block, nvme: Introduce blk_mq_req_flags_t
    block, scsi: Make SCSI quiesce and resume work reliably
    block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
    ...

    Linus Torvalds
     

11 Nov, 2017

1 commit

  • kthread() could bail out early before we initialize blkcg_css (if the
    kthread is killed very early; see the xchg() statement in kthread()),
    which confuses free_kthread_struct(). Instead of moving the blkcg_css
    initialization earlier, we simply zero the whole 'self' data structure,
    which shouldn't add much overhead.

    Reported-by: syzbot
    Fixes: 05e3db95ebfc ("kthread: add a mechanism to store cgroup info")
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Dmitry Vyukov
    Acked-by: Tejun Heo
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

05 Oct, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer
    to all timer callbacks, switch kthread to use from_timer() and pass the
    timer pointer explicitly.

    Signed-off-by: Kees Cook
    Signed-off-by: Thomas Gleixner
    Cc: linux-mips@linux-mips.org
    Cc: Len Brown
    Cc: Benjamin Herrenschmidt
    Cc: Lai Jiangshan
    Cc: Sebastian Reichel
    Cc: Kalle Valo
    Cc: Paul Mackerras
    Cc: Pavel Machek
    Cc: linux1394-devel@lists.sourceforge.net
    Cc: Chris Metcalf
    Cc: linux-s390@vger.kernel.org
    Cc: linux-wireless@vger.kernel.org
    Cc: "James E.J. Bottomley"
    Cc: Wim Van Sebroeck
    Cc: Michael Ellerman
    Cc: Ursula Braun
    Cc: Geert Uytterhoeven
    Cc: Viresh Kumar
    Cc: Harish Patil
    Cc: Stephen Boyd
    Cc: Guenter Roeck
    Cc: Manish Chopra
    Cc: Petr Mladek
    Cc: Arnd Bergmann
    Cc: linux-pm@vger.kernel.org
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Julian Wiedmann
    Cc: John Stultz
    Cc: Mark Gross
    Cc: linux-watchdog@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: "Martin K. Petersen"
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Oleg Nesterov
    Cc: Ralf Baechle
    Cc: Stefan Richter
    Cc: Michael Reed
    Cc: netdev@vger.kernel.org
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Sudip Mukherjee
    Link: https://lkml.kernel.org/r/1507159627-127660-13-git-send-email-keescook@chromium.org

    Kees Cook
     

27 Sep, 2017

1 commit

  • The code is only for blkcg, not for all cgroups.

    Fixes: d4478e92d618 ("block/loop: make loop cgroup aware")
    Reported-by: kbuild test robot
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

26 Sep, 2017

1 commit

  • kthread usually runs jobs on behalf of other threads. The jobs should be
    charged to the cgroup of the original threads, but because the jobs run
    in a kthread, we lose the cgroup context of the original threads. The
    patch adds a mechanism to record the cgroup info of the original threads
    in kthread context. Later we can retrieve the cgroup info and attach it
    to jobs.

    Since this mechanism is only required by kthread, we store the cgroup
    info in kthread data instead of generic task_struct.

    Acked-by: Tejun Heo
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

01 Sep, 2017

1 commit

  • If the worker thread keeps getting work, it will hog the CPU and
    trigger RCU stall complaints. Make it a good citizen. This is triggered
    by a loop block device test.

    Link: http://lkml.kernel.org/r/5de0a179b3184e1a2183fc503448b0269f24d75b.1503697127.git.shli@fb.com
    Signed-off-by: Shaohua Li
    Cc: Petr Mladek
    Cc: Thomas Gleixner
    Cc: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li