01 May, 2013

1 commit

  • One of the problems that arise when converting a dedicated custom threadpool
    to workqueue is that the shared worker pool used by workqueue anonymizes
    each worker, making it more difficult to identify what the worker was doing
    on which target from the output of sysrq-t or a debug dump from oops, BUG()
    and friends.

    For example, after writeback is converted to use workqueue instead of a
    private thread pool, there's no easy way to tell which backing device a
    writeback work item was working on at the time of a task dump, which,
    according to our writeback brethren, is important in tracking down issues
    with a lot of mounted file systems on a lot of different devices.

    This patchset implements a way for a work function to mark its execution
    instance so that task dump of the worker task includes information to
    indicate what the work item was doing.

    An example WARN dump would look like the following.

    WARNING: at fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0()
    Modules linked in:
    CPU: 0 Pid: 28 Comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24
    Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
    Workqueue: writeback bdi_writeback_workfn (flush-8:16)
    ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8
    ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0
    ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] warn_slowpath_common+0x70/0xa0
    ...

    This patch:

    Implement probe_kthread_data(), which returns the kthread data if accessible.
    The function is equivalent to kthread_data() except that the specified
    @task may not be a kthread, or its vfork_done may already be cleared,
    rendering struct kthread inaccessible. In the former case,
    probe_kthread_data() may return any value; in the latter, NULL.

    This will be used to safely print debug information without affecting
    synchronization in the normal paths. Workqueue debug info printing on
    dump_stack() and friends will make use of it.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Acked-by: Jan Kara
    Cc: Dave Chinner
    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
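
    To make the semantics above concrete, a fault-tolerant accessor along these
    lines can be built on probe_kernel_read(); struct kthread and to_kthread()
    are kthread.c internals, so treat this as a sketch rather than the exact
    mainline code:

    void *probe_kthread_data(struct task_struct *task)
    {
            struct kthread *kthread = to_kthread(task);
            void *data = NULL;

            /* copy ->data with a read that is allowed to fault and fail */
            probe_kernel_read(&data, &kthread->data, sizeof(data));
            return data;
    }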
     

30 Apr, 2013

3 commits

  • Pull workqueue updates from Tejun Heo:
    "A lot of activities on workqueue side this time. The changes achieve
    the followings.

    - WQ_UNBOUND workqueues - the workqueues which are not bound to any
    specific CPU - are updated to be able to interface with multiple backend
    worker pools. This involved a lot of churning but the end result seems
    actually neater as unbound workqueues are now a lot closer to per-cpu ones.

    - The ability to interface with multiple backend worker pools is
    used to implement unbound workqueues with custom attributes.
    Currently the supported attributes are the nice level and CPU
    affinity. It may be expanded to include cgroup association in
    future. The attributes can be specified either by calling
    apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
    the workqueue in question is exported through sysfs.

    The backend worker pools are keyed by the actual attributes and
    shared by any workqueues which share the same attributes. When
    attributes of a workqueue are changed, the workqueue binds to the
    worker pool with the specified attributes while leaving the work
    items which are already executing in its previous worker pools
    alone.

    This allows converting custom worker pool implementations which
    want worker attribute tuning to use workqueues. The writeback pool
    is already converted in the block tree and a couple of others are
    likely to follow, including btrfs io workers.

    - WQ_UNBOUND's ability to bind to multiple worker pools is also used
    to make it NUMA-aware. Because there's no association between work
    item issuer and the specific worker assigned to execute it, before
    this change, using unbound workqueue led to unnecessary cross-node
    bouncing and it couldn't be helped by autonuma as it requires tasks
    to have implicit node affinity and workers are assigned randomly.

    After these changes, an unbound workqueue now binds to multiple
    NUMA-affine worker pools so that queued work items are executed in
    the same node. This is turned on by default but can be disabled
    system-wide or for individual workqueues.

    Crypto was requesting NUMA affinity because encrypting data across
    different nodes can contribute noticeable overhead. Doing it
    per-cpu was too limiting for certain cases, where IO throughput could
    be bottlenecked by one CPU being fully occupied while others have
    idle cycles.

    While the new features required a lot of changes including
    restructuring locking, it didn't complicate the execution paths much.
    The unbound workqueue handling is now closer to per-cpu ones and the
    new features are implemented by simply associating a workqueue with
    different sets of backend worker pools without changing queue,
    execution or flush paths.

    As such, even though the amount of change is very high, I feel
    relatively safe in that it isn't likely to cause subtle issues with
    basic correctness of work item execution and handling. If something
    is wrong, it's likely to show up as work items associated with worker
    pools that have the wrong attributes, or as an OOPS while workqueue
    attributes are being changed or during CPU hotplug.

    While this creates more backend worker pools, it doesn't add too many
    more workers unless, of course, there are many workqueues with unique
    combinations of attributes. Assuming everything else is the same,
    NUMA awareness costs an extra worker pool per NUMA node with online
    CPUs.

    There are also a couple things which are being routed outside the
    workqueue tree.

    - block tree pulled in workqueue for-3.10 so that writeback worker
    pool can be converted to unbound workqueue with sysfs control
    exposed. This simplifies the code, makes writeback workers
    NUMA-aware and allows tuning nice level and CPU affinity via sysfs.

    - The conversion to workqueue means that there's no longer a 1:1
    association between a specific worker and what it is working on,
    which makes writeback folks unhappy as they want to be able to tell
    which filesystem caused a problem from a backtrace on systems with
    many filesystems mounted. This is resolved by allowing work items to
    set a debug info string which is
    printed when the task is dumped. As this change involves unifying
    implementations of dump_stack() and friends in arch codes, it's
    being routed through Andrew's -mm tree."

    * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
    workqueue: use kmem_cache_free() instead of kfree()
    workqueue: avoid false negative WARN_ON() in destroy_workqueue()
    workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
    workqueue: implement NUMA affinity for unbound workqueues
    workqueue: introduce put_pwq_unlocked()
    workqueue: introduce numa_pwq_tbl_install()
    workqueue: use NUMA-aware allocation for pool_workqueues
    workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
    workqueue: map an unbound workqueues to multiple per-node pool_workqueues
    workqueue: move hot fields of workqueue_struct to the end
    workqueue: make workqueue->name[] fixed len
    workqueue: add workqueue->unbound_attrs
    workqueue: determine NUMA node of workers accourding to the allowed cpumask
    workqueue: drop 'H' from kworker names of unbound worker pools
    workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
    workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
    workqueue: fix memory leak in apply_workqueue_attrs()
    workqueue: fix unbound workqueue attrs hashing / comparison
    workqueue: fix race condition in unbound workqueue free path
    workqueue: remove pwq_lock which is no longer used
    ...

    Linus Torvalds
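
    To make the attribute interface described in this entry concrete, here is a
    hedged usage sketch (the workqueue name and attribute values are made up;
    error unwinding is omitted):

    #include <linux/workqueue.h>

    static struct workqueue_struct *my_wq;

    static int my_wq_setup(void)
    {
            struct workqueue_attrs *attrs;
            int ret;

            /* unbound workqueue, exported via sysfs because of WQ_SYSFS */
            my_wq = alloc_workqueue("my_wq", WQ_UNBOUND | WQ_SYSFS, 0);
            if (!my_wq)
                    return -ENOMEM;

            attrs = alloc_workqueue_attrs(GFP_KERNEL);
            if (!attrs)
                    return -ENOMEM;

            attrs->nice = -5;                           /* worker nice level */
            cpumask_copy(attrs->cpumask, cpu_online_mask);
            ret = apply_workqueue_attrs(my_wq, attrs);  /* rebind to matching pools */
            free_workqueue_attrs(attrs);
            return ret;
    }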
     
  • task_get_live_kthread() looks confusing and unneeded. It does
    get_task_struct(), but only kthread_stop() needs this; it can be called
    even if the caller doesn't have a reference, as long as we know that this
    kthread can't exit until we do kthread_stop().

    kthread_park() and kthread_unpark() do not need get_task_struct(); the
    callers already have the reference. And it cannot help if we race with
    the exiting kthread anyway - kthread_park() can hang forever in this
    case.

    Change kthread_park() and kthread_unpark() to use to_live_kthread(),
    change kthread_stop() to do get_task_struct() by hand and remove
    task_get_live_kthread().

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Namhyung Kim
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: "Srivatsa S. Bhat"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
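
    A simplified sketch of the resulting kthread_stop() shape (the park/unpark
    interaction is omitted; flag and field names follow the usual kthread.c
    internals):

    int kthread_stop(struct task_struct *k)
    {
            struct kthread *kthread;
            int ret;

            get_task_struct(k);                     /* pin the task by hand */
            kthread = to_live_kthread(k);
            if (kthread) {
                    set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
                    wake_up_process(k);
                    wait_for_completion(&kthread->exited);
            }
            ret = k->exit_code;
            put_task_struct(k);
            return ret;
    }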
     
  • "k->vfork_done != NULL" with a barrier() after to_kthread(k) in
    task_get_live_kthread(k) looks unclear, and sub-optimal because we load
    ->vfork_done twice.

    All we need is to ensure that we do not return to_kthread(NULL). Add a
    new trivial helper which loads/checks ->vfork_done once, this also looks
    more understandable.

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Namhyung Kim
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: "Srivatsa S. Bhat"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
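
    A sketch of such a helper, assuming the usual layout where ->vfork_done
    points at the 'exited' completion embedded in struct kthread:

    static struct kthread *to_live_kthread(struct task_struct *k)
    {
            struct completion *vfork = ACCESS_ONCE(k->vfork_done);

            /* load ->vfork_done once; NULL means struct kthread is gone */
            return vfork ? container_of(vfork, struct kthread, exited) : NULL;
    }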
     

12 Apr, 2013

1 commit

  • The smpboot threads rely on the park/unpark mechanism which binds per-cpu
    threads to a particular core. However, the functionality is racy:

    CPU0                       CPU1                                  CPU2
    unpark(T)                                                        wake_up_process(T)
      clear(SHOULD_PARK)       T runs
                               leave parkme() due to !SHOULD_PARK
      bind_to(CPU2)            BUG_ON(wrong CPU)

    We cannot let the tasks move themself to the target CPU as one of
    those tasks is actually the migration thread itself, which requires
    that it starts running on the target cpu right away.

    The solution to this problem is to prevent wakeups in park mode which
    are not from unpark(). That way we can guarantee that the association
    of the task to the target cpu is working correctly.

    Add a new task state (TASK_PARKED) which prevents other wakeups and
    use this state explicitly for the unpark wakeup.

    Peter noticed: Also, since the task state is visible to userspace and
    all the parked tasks are still in the PID space, its a good hint in ps
    and friends that these tasks aren't really there for the moment.

    The migration thread has another related issue.

    CPU0                          CPU1
    Bring up CPU2
    create_thread(T)
    park(T)
      wait_for_completion()
                                  parkme()
                                  complete()
    sched_set_stop_task()
                                  schedule(TASK_PARKED)

    The sched_set_stop_task() call is issued while the task is on the
    runqueue of CPU1 and that confuses the hell out of the stop_task class
    on that cpu. So we need the same synchronization before
    sched_set_stop_task().

    Reported-by: Dave Jones
    Reported-and-tested-by: Dave Hansen
    Reported-and-tested-by: Borislav Petkov
    Acked-by: Peter Zijlstra
    Cc: Srivatsa S. Bhat
    Cc: dhillf@gmail.com
    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
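
    A hedged sketch of how the new state is used on both sides of the
    park/unpark handshake described above (bit and field names follow the
    kthread.c convention; the real code has a few more details, e.g. the
    per-cpu check before rebinding):

    static void __kthread_parkme(struct kthread *self)
    {
            __set_current_state(TASK_PARKED);
            while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
                    if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
                            complete(&self->parked);
                    schedule();
                    __set_current_state(TASK_PARKED);  /* re-arm before re-checking */
            }
            clear_bit(KTHREAD_IS_PARKED, &self->flags);
            __set_current_state(TASK_RUNNING);
    }

    void kthread_unpark(struct task_struct *k)
    {
            struct kthread *kthread = to_live_kthread(k);

            if (!kthread)
                    return;

            clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
            if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
                    __kthread_bind(k, kthread->cpu);   /* rebind while still parked */
                    wake_up_state(k, TASK_PARKED);     /* only a parked sleeper wakes */
            }
    }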
     

20 Mar, 2013

1 commit

  • PF_THREAD_BOUND was originally used to mark kernel threads which were
    bound to a specific CPU using kthread_bind() and a task with the flag
    set allows cpus_allowed modifications only to itself. Workqueue is
    currently abusing it to prevent userland from meddling with
    cpus_allowed of workqueue workers.

    What we need is a flag to prevent userland from messing with
    cpus_allowed of certain kernel tasks. In the kernel, anyone can
    (incorrectly) squash the flag, and, for worker-type usages,
    restricting cpus_allowed modification to the task itself doesn't
    provide meaningful extra protection as other tasks can inject work
    items to the task anyway.

    This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY.
    sched_setaffinity() checks the flag and returns -EINVAL if it is set.
    set_cpus_allowed_ptr() is no longer affected by the flag.

    This will allow simplifying workqueue worker CPU affinity management.

    Signed-off-by: Tejun Heo
    Acked-by: Ingo Molnar
    Reviewed-by: Lai Jiangshan
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner

    Tejun Heo
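
    The userland-facing side of this boils down to one early check in
    sched_setaffinity(); roughly (excerpt-style sketch, not the full function):

    /* in sched_setaffinity(), after looking up and pinning the target task */
    if (p->flags & PF_NO_SETAFFINITY) {
            retval = -EINVAL;
            goto out_put_task;
    }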
     

13 Dec, 2012

1 commit

  • N_HIGH_MEMORY stands for the nodes that have normal or high memory.
    N_MEMORY stands for the nodes that have any memory.

    The code here needs to handle the nodes which have memory, so we should
    use N_MEMORY instead.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Cc: Christoph Lameter
    Cc: Hillf Danton
    Cc: Lin Feng
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
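
    For kthread.c this is essentially a one-line substitution in kthreadd();
    roughly:

    -       set_mems_allowed(node_states[N_HIGH_MEMORY]);
    +       set_mems_allowed(node_states[N_MEMORY]);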
     

13 Oct, 2012

2 commits

  • Pull third pile of kernel_execve() patches from Al Viro:
    "The last bits of infrastructure for kernel_thread() et.al., with
    alpha/arm/x86 use of those. Plus sanitizing the asm glue and
    do_notify_resume() on alpha, fixing the "disabled irq while running
    task_work stuff" breakage there.

    At that point the rest of the kernel_thread/kernel_execve/sys_execve work
    can be done independently for different architectures. The only
    pending bits that do depend on having all architectures converted are
    restricted to fs/* and kernel/* - that'll obviously have to wait for
    the next cycle.

    I thought we'd have to wait for all of them done before we start
    eliminating the longjump-style insanity in kernel_execve(), but it
    turned out there's a very simple way to do that without flagday-style
    changes."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    alpha: switch to saner kernel_execve() semantics
    arm: switch to saner kernel_execve() semantics
    x86, um: convert to saner kernel_execve() semantics
    infrastructure for saner ret_from_kernel_thread semantics
    make sure that kernel_thread() callbacks call do_exit() themselves
    make sure that we always have a return path from kernel_execve()
    ppc: eeh_event should just use kthread_run()
    don't bother with kernel_thread/kernel_execve for launching linuxrc
    alpha: get rid of switch_stack argument of do_work_pending()
    alpha: don't bother passing switch_stack separately from regs
    alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c
    alpha: simplify TIF_NEED_RESCHED handling

    Linus Torvalds
     
  • * allow kernel_execve() to leave the actual return to userland to the
    caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers updated
    accordingly.
    * an architecture that selects GENERIC_KERNEL_EXECVE in its
    Kconfig should have its ret_from_kernel_thread() do this:
        call schedule_tail
        call the callback left for it by copy_thread(); if it ever
          returns, that's because it has just done a successful kernel_execve()
        jump to return from syscall
    IOW, its only difference from ret_from_fork() is that it does call the
    callback.
    * such an architecture should also get rid of ret_from_kernel_execve()
    and __ARCH_WANT_KERNEL_EXECVE

    This is the last part of infrastructure patches in that area - from
    that point on work on different architectures can live independently.

    Signed-off-by: Al Viro

    Al Viro
     

13 Aug, 2012

1 commit

  • To avoid the full teardown/setup of per-cpu kthreads in the case of
    cpu hot(un)plug, provide a facility which allows putting the kthread
    into a park position and unparking it when the cpu comes online again.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Namhyung Kim
    Cc: Peter Zijlstra
    Reviewed-by: Srivatsa S. Bhat
    Cc: Rusty Russell
    Reviewed-by: Paul E. McKenney
    Link: http://lkml.kernel.org/r/20120716103948.236618824@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
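
    A hedged sketch of how a per-cpu kthread main loop cooperates with the new
    facility (the work in the loop body is a placeholder):

    static int my_percpu_thread(void *data)
    {
            while (!kthread_should_stop()) {
                    /* sit in the park position across CPU offline/online */
                    if (kthread_should_park())
                            kthread_parkme();

                    /* ... do the per-cpu work ... */
                    schedule_timeout_interruptible(HZ);
            }
            return 0;
    }

    /* hotplug side (illustrative): */
    kthread_park(tsk);      /* before the CPU goes down    */
    kthread_unpark(tsk);    /* once the CPU is back online */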
     

23 Jul, 2012

2 commits

  • kthread_worker provides a minimalistic workqueue-like interface for
    users which need a dedicated worker thread (e.g. for realtime
    priority). It has basic queue, flush_work and flush_worker operations
    which mostly match their workqueue counterparts; however, due to the way
    flush_work() is implemented, it has the noticeable difference of not
    allowing work items to be freed while they are being executed.

    While the current users of kthread_worker are okay with the current
    behavior, the restriction does impede some valid use cases. Also,
    removing this difference isn't difficult and actually makes the code
    easier to understand.

    This patch reimplements flush_kthread_work() such that it uses a
    flush_work item instead of queue/done sequence numbers.

    Signed-off-by: Tejun Heo

    Tejun Heo
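
    The flush work item the description refers to looks roughly like this;
    flush_kthread_work() queues one of these right behind the work being
    flushed and waits on ->done:

    struct kthread_flush_work {
            struct kthread_work     work;
            struct completion       done;
    };

    static void kthread_flush_work_fn(struct kthread_work *work)
    {
            struct kthread_flush_work *fwork =
                    container_of(work, struct kthread_flush_work, work);

            /* runs after the flushed work item, so completing here is enough */
            complete(&fwork->done);
    }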
     
  • Make the following two non-functional changes.

    * Separate out insert_kthread_work() from queue_kthread_work().

    * Relocate struct kthread_flush_work and kthread_flush_work_fn()
    definitions above flush_kthread_work().

    v2: Added lockdep_assert_held() in insert_kthread_work() as suggested
    by Andy Walls.

    Signed-off-by: Tejun Heo
    Acked-by: Andy Walls

    Tejun Heo
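
    A sketch of the separated helper (per-work bookkeeping is elided; the
    lockdep assertion is the v2 addition mentioned above):

    static void insert_kthread_work(struct kthread_worker *worker,
                                    struct kthread_work *work,
                                    struct list_head *pos)
    {
            lockdep_assert_held(&worker->lock);

            list_add_tail(&work->node, pos);
            /* ... per-work bookkeeping elided ... */
            if (likely(worker->task))
                    wake_up_process(worker->task);
    }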
     

24 Nov, 2011

1 commit

  • There's no in-kernel user of set_freezable_with_signal() left. Mixing
    TIF_SIGPENDING with kernel threads can lead to nasty corner cases as
    kernel threads never travel the signal delivery path on their own.

    e.g. the current implementation is buggy in the cancelation path of
    __thaw_task(). It calls recalc_sigpending_and_wake() in an attempt to
    clear TIF_SIGPENDING but the function never clears it regardless of
    sigpending state. This means that signallable freezable kthreads may
    continue executing with !freezing() && stuck TIF_SIGPENDING, which can
    be troublesome.

    This patch removes set_freezable_with_signal() along with
    PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer. User
    tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious
    sigpending is dealt with in the usual signal delivery path.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     

22 Nov, 2011

1 commit

  • Writeback and thinkpad_acpi have been using thaw_process() to prevent
    deadlock between the freezer and kthread_stop(); unfortunately, this
    is inherently racy - nothing prevents freezing from happening between
    thaw_process() and kthread_stop().

    This patch implements kthread_freezable_should_stop() which enters the
    refrigerator if necessary but is guaranteed to return if
    kthread_stop() is invoked. Both thaw_process() users are converted to
    use the new function.

    Note that this deadlock condition exists for many freezable
    kthreads. They need to be converted to use the new should_stop or a
    freezable workqueue.

    Tested with synthetic test case.

    Signed-off-by: Tejun Heo
    Acked-by: Henrique de Moraes Holschuh
    Cc: Jens Axboe
    Cc: Oleg Nesterov

    Tejun Heo
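
    A usage sketch for a freezable kthread main loop (my_do_work() is a
    placeholder):

    static int my_freezable_thread(void *data)
    {
            bool was_frozen;

            set_freezable();
            while (!kthread_freezable_should_stop(&was_frozen)) {
                    /* the call above may enter the refrigerator, but it is
                     * guaranteed to return once kthread_stop() is invoked */
                    my_do_work();
                    schedule_timeout_interruptible(HZ);
            }
            return 0;
    }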
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

28 May, 2011

1 commit


31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • Since all kthreads are created from a single helper task, they all use
    memory from a single node for their kernel stack and task struct.

    This patch suite creates kthread_create_on_node(), adding a 'node' parameter
    to the parameters already used by kthread_create().

    This parameter serves in allocating memory for the new kthread on its
    memory node if possible.

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Reviewed-by: Andi Kleen
    Acked-by: Rusty Russell
    Cc: Tejun Heo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
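
    A usage sketch (my_thread_fn, my_data and cpu are placeholders):

    struct task_struct *tsk;

    /* allocate the stack and task_struct on the node the thread will run on */
    tsk = kthread_create_on_node(my_thread_fn, my_data,
                                 cpu_to_node(cpu), "my_thread/%d", cpu);
    if (!IS_ERR(tsk)) {
            kthread_bind(tsk, cpu);
            wake_up_process(tsk);
    }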
     

07 Jan, 2011

1 commit


05 Jan, 2011

1 commit


22 Dec, 2010

1 commit

  • The spinlock in kthread_worker and the wait_queue_head in kthread_work
    both should be lockdep sensible, so change the interface to make it
    suitable for CONFIG_LOCKDEP.

    tj: comment update

    Reported-by: Nicolas
    Signed-off-by: Yong Zhang
    Signed-off-by: Andy Walls
    Tested-by: Andy Walls
    Cc: Tejun Heo
    Cc: Andrew Morton
    Signed-off-by: Tejun Heo

    Yong Zhang
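
    The usual way to make such objects lockdep-aware is to hand each
    initialization site its own lock_class_key; a sketch of the resulting
    init wrapper:

    #define init_kthread_worker(worker)                                       \
            do {                                                              \
                    static struct lock_class_key __key;                       \
                    __init_kthread_worker((worker), "("#worker")->lock", &__key); \
            } while (0)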
     

23 Oct, 2010

1 commit


29 Jun, 2010

2 commits

  • Implement kthread_data() which takes @task pointing to a kthread and
    returns @data specified when creating the kthread. The caller is
    responsible for ensuring the validity of @task when calling this
    function.

    Signed-off-by: Tejun Heo

    Tejun Heo
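
    Since ->vfork_done points into the kthread's struct kthread, the stashed
    data can be recovered from the task pointer alone; a sketch:

    void *kthread_data(struct task_struct *task)
    {
            return to_kthread(task)->data;  /* @data passed at creation time */
    }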
     
  • Implement a simple work processor for kthread. This is to ease using
    kthread. A single-threaded workqueue used to be used for things like
    this, but workqueue won't guarantee fixed kthread association anymore
    in order to enable worker sharing.

    This can be used in cases where specific kthread association is
    necessary, for example, when it should have RT priority or be assigned
    to certain cgroup.

    Signed-off-by: Tejun Heo
    Cc: Andrew Morton

    Tejun Heo
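
    A hedged usage sketch with the API names of that time (my_work_fn is a
    placeholder):

    static struct kthread_worker my_worker;
    static struct kthread_work my_work;

    static int my_setup(void)
    {
            struct task_struct *t;

            init_kthread_worker(&my_worker);
            t = kthread_run(kthread_worker_fn, &my_worker, "my_worker");
            if (IS_ERR(t))
                    return PTR_ERR(t);
            /* the dedicated thread can now get RT priority, a cgroup, ... */

            init_kthread_work(&my_work, my_work_fn);
            queue_kthread_work(&my_worker, &my_work);
            flush_kthread_work(&my_work);
            return 0;
    }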
     

25 Mar, 2010

1 commit

  • cpuset_mem_spread_node() returns an offline node, and causes an oops.

    This patch fixes it by initializing task->mems_allowed to
    node_states[N_HIGH_MEMORY], and updating task->mems_allowed when doing
    memory hotplug.

    Signed-off-by: Miao Xie
    Acked-by: David Rientjes
    Reported-by: Nick Piggin
    Tested-by: Nick Piggin
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     

09 Feb, 2010

1 commit


17 Dec, 2009

1 commit


03 Nov, 2009

1 commit

  • Eric Paris reported that commit
    f685ceacab07d3f6c236f04803e2f2f0dbcc5afb causes boot time
    PREEMPT_DEBUG complaints.

    [ 4.590699] BUG: using smp_processor_id() in preemptible [00000000] code: rmmod/1314
    [ 4.593043] caller is task_hot+0x86/0xd0

    Since kthread_bind() messes with scheduler internals, move the
    body to sched.c, and lock the runqueue.

    Reported-by: Eric Paris
    Signed-off-by: Mike Galbraith
    Tested-by: Eric Paris
    Cc: Peter Zijlstra
    LKML-Reference:
    [ v2: fix !SMP build and clean up ]
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

09 Sep, 2009

1 commit


28 Jul, 2009

1 commit

  • Commit 63706172f332fd3f6e7458ebfb35fa6de9c21dc5 ("kthreads: rework
    kthread_stop()") removed the limitation that the thread function must
    not call do_exit() itself, but forgot to update the comment.

    Since that commit it is OK to use kthread_stop() even if the kthread
    can exit on its own.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

19 Jun, 2009

2 commits

  • Based on Eric's patch which in turn was based on my patch.

    kthread_stop() has several nasty problems:

    - it runs unpredictably long with the global semaphore held.

    - it deadlocks if kthread itself does kthread_stop() before it obeys
    the kthread_should_stop() request.

    - it is not usable if the kthread exits on its own, see for example the
    ugly "wait_to_die:" hack in migration_thread()

    - it is not possible to just tell the kthread it should stop; we must
    always wait for its exit.

    With this patch kthread() allocates all necessary data (struct kthread) on
    its own stack, and the kthread_stop_xxx globals are deleted. ->vfork_done
    is used as a pointer into "struct kthread", which means kthread_stop() can
    easily wait for the kthread's exit.

    Signed-off-by: Oleg Nesterov
    Cc: Christoph Hellwig
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Pavel Emelyanov
    Cc: Rusty Russell
    Cc: Vitaliy Gusev
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
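
    A sketch of the resulting layout: everything kthread_stop() needs lives on
    the kthread's own stack and is reached through ->vfork_done:

    struct kthread {
            int should_stop;
            struct completion exited;
    };

    #define to_kthread(tsk) \
            container_of((tsk)->vfork_done, struct kthread, exited)

    static int kthread(void *_create)
    {
            struct kthread self;

            self.should_stop = 0;
            init_completion(&self.exited);
            current->vfork_done = &self.exited;

            /* ... wait to be woken up, run threadfn() ... */
            do_exit(0);     /* exiting completes ->exited via vfork_done */
    }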
     
  • We use two completions to create the kernel thread; this is a bit ugly.
    kthread() wakes up create_kthread() via ->started, then create_kthread()
    wakes up the caller kthread_create() via ->done. But create_kthread() does
    not need to wait for kthread(); it can just return. Instead, kthread()
    itself can wake up the caller of kthread_create().

    Kill kthread_create_info->started, ->done is enough. This improves the
    scalability a bit and simplifies the code.

    The only problem is when kernel_thread() fails; in that case create_kthread()
    must do complete(&create->done) itself.

    Signed-off-by: Oleg Nesterov
    Cc: Christoph Hellwig
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Pavel Emelyanov
    Cc: Rusty Russell
    Cc: Vitaliy Gusev
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
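
    The handoff then reduces to the tail of kthread()'s startup (excerpt-style
    sketch):

    /* inside kthread(), before waiting to be woken up by the creator: */
    __set_current_state(TASK_UNINTERRUPTIBLE);
    create->result = current;
    complete(&create->done);        /* wakes kthread_create() directly;      */
    schedule();                     /* ->started is gone and create_kthread()
                                     * simply returns after kernel_thread() */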
     

17 Jun, 2009

1 commit

  • Fix allocating page cache/slab objects on a disallowed node when memory
    spread is set, by updating tasks' mems_allowed after their cpuset's mems
    is changed.

    In order to update tasks' mems_allowed in time, we must modify the memory
    policy code, because the memory policy was originally applied in the
    process's own context. After applying this patch, one task directly
    manipulates another's mems_allowed, and we use alloc_lock in the
    task_struct to protect the mems_allowed and memory policy of the task.

    In the fast path, however, we don't use a lock to protect them, because
    adding a lock may lead to a performance regression. Without a lock, the
    task might see an empty nodemask when the cpuset's mems_allowed is changed
    to some non-overlapping set. In order to avoid this, we set all newly
    allowed nodes first, then clear the newly disallowed ones.

    [lee.schermerhorn@hp.com:
    The rework of mpol_new() to extract the adjusting of the node mask to
    apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
    with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
    allocation. Fix this by adding the check for MPOL_PREFERRED and empty
    node mask to mpol_new_mempolicy().

    Remove the now unneeded 'nodes = NULL' from mpol_new().

    Note that mpol_new_mempolicy() is always called with a non-NULL
    'nodes' parameter now that it has been removed from mpol_new().
    Therefore, we don't need to test nodes for NULL before testing it for
    'empty'. However, just to be extra paranoid, add a VM_BUG_ON() to
    verify this assumption.]
    [lee.schermerhorn@hp.com:

    I don't think the function name 'mpol_new_mempolicy' is descriptive
    enough to differentiate it from mpol_new().

    This function applies cpuset set context, usually constraining nodes
    to those allowed by the cpuset. However, when the 'RELATIVE_NODES' flag
    is set, it also translates the nodes. So I settled on
    'mpol_set_nodemask()', because the comment block for mpol_new() mentions
    that we need to call this function to "set nodes".

    Some additional minor line length, whitespace and typo cleanup.]
    Signed-off-by: Miao Xie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Paul Menage
    Cc: Nick Piggin
    Cc: Yasunori Goto
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
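
    The "never see an empty nodemask" ordering described above boils down to
    an update of this shape (excerpt-style sketch):

    nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);   /* old | new */
    /* ... rebind the task's memory policy to the new nodes ... */
    tsk->mems_allowed = *newmems;                               /* new only  */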
     

15 Apr, 2009

2 commits

  • Impact: clean up

    Create a sub directory in include/trace called events to keep the
    trace point headers in their own separate directory. Only headers that
    declare trace points should be defined in this directory.

    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Neil Horman
    Cc: Zhao Lei
    Cc: Eduard - Gabriel Munteanu
    Cc: Pekka Enberg
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • This patch lowers the number of places a developer must modify to add
    new tracepoints. The current method to add a new tracepoint
    into an existing system is to write the trace point macro in the
    trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
    DECLARE_TRACE, then add an item with the same name into the C file
    with the macro DEFINE_TRACE(name), and then add the trace point call itself.

    This change cuts out the need to add the DEFINE_TRACE(name).
    Every file that uses the tracepoint must still include the trace/<header>.h
    file, but the one C file must also add a define before including
    that file.

    #define CREATE_TRACE_POINTS
    #include <trace/mytrace.h>

    This will cause the trace/mytrace.h file to also produce the C code
    necessary to implement the trace point.

    Note, if more than one trace/<header>.h is used to create the C code
    it is best to list them all together.

    #define CREATE_TRACE_POINTS
    #include
    #include
    #include

    Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
    the cleaner solution of the define above the includes over my first
    design to have the C code include a "special" header.

    This patch converts sched, irq and lockdep and skb to use this new
    method.

    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Neil Horman
    Cc: Zhao Lei
    Cc: Eduard - Gabriel Munteanu
    Cc: Pekka Enberg
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

09 Apr, 2009

2 commits


30 Mar, 2009

1 commit

  • Impact: cleanup

    (Thanks to Al Viro for reminding me of this, via Ingo)

    CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:

    #define CPU_MASK_ALL (cpumask_t) { { ... } }

    Taking the address of such a temporary is questionable at best,
    unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
    CPU_MASK_ALL_PTR:

    #define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

    Which formalizes this practice. One day gcc could bite us over this
    usage (though we seem to have gotten away with it so far).

    So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR
    with the modern "cpu_all_mask" (a real const struct cpumask *).

    Signed-off-by: Rusty Russell
    Acked-by: Ingo Molnar
    Reported-by: Al Viro
    Cc: Mike Travis

    Rusty Russell
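
    In kthread.c the conversion amounts to a change of this shape (illustrative
    diff):

    -       set_cpus_allowed_ptr(tsk, CPU_MASK_ALL_PTR);
    +       set_cpus_allowed_ptr(tsk, cpu_all_mask);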
     

16 Nov, 2008

1 commit

  • Impact: API *CHANGE*. Must update all tracepoint users.

    Add DEFINE_TRACE() to tracepoints to let them declare the tracepoint
    structure in a single spot for all the kernel. It helps reduce memory
    consumption, especially when declaring a lot of tracepoints, e.g. for
    kmalloc tracing.

    *API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
    tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
    to do it. The name previously used was misleading.

    Updates scheduler instrumentation to follow this API change.

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers
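
    A sketch of the resulting split, using a tracepoint name from kthread.c
    (the prototype macros shown are the ones of that era):

    /* in the shared header: declaration only */
    DECLARE_TRACE(sched_kthread_stop,
            TPPROTO(struct task_struct *t),
            TPARGS(t));

    /* in exactly one .c file: emit the tracepoint structure */
    DEFINE_TRACE(sched_kthread_stop);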
     

21 Oct, 2008

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
    tracing/fastboot: improve help text
    tracing/stacktrace: improve help text
    tracing/fastboot: fix initcalls disposition in bootgraph.pl
    tracing/fastboot: fix bootgraph.pl initcall name regexp
    tracing/fastboot: fix issues and improve output of bootgraph.pl
    tracepoints: synchronize unregister static inline
    tracepoints: tracepoint_synchronize_unregister()
    ftrace: make ftrace_test_p6nop disassembler-friendly
    markers: fix synchronize marker unregister static inline
    tracing/fastboot: add better resolution to initcall debug/tracing
    trace: add build-time check to avoid overrunning hex buffer
    ftrace: fix hex output mode of ftrace
    tracing/fastboot: fix initcalls disposition in bootgraph.pl
    tracing/fastboot: fix printk format typo in boot tracer
    ftrace: return an error when setting a nonexistent tracer
    ftrace: make some tracers reentrant
    ring-buffer: make reentrant
    ring-buffer: move page indexes into page headers
    tracing/fastboot: only trace non-module initcalls
    ftrace: move pc counter in irqtrace
    ...

    Manually fix conflicts:
    - init/main.c: initcall tracing
    - kernel/module.c: verbose level vs tracepoints
    - scripts/bootgraph.pl: fallout from cherry-picking commits.

    Linus Torvalds