13 Dec, 2012

1 commit

  • N_HIGH_MEMORY stands for the nodes that have normal or high memory.
    N_MEMORY stands for the nodes that have any memory.

    The code here needs to handle the nodes which have memory, so we
    should use N_MEMORY instead.
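
    For illustration, a minimal sketch of the intended pattern (the call
    site and helper below are hypothetical):

        int nid;

        /* Visit every node that has memory, not only those with
         * normal/high memory. */
        for_each_node_state(nid, N_MEMORY)      /* was: N_HIGH_MEMORY */
                init_node_state(nid);           /* hypothetical helper */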

    Signed-off-by: Lai Jiangshan
    Acked-by: Hillf Danton
    Signed-off-by: Wen Congyang
    Cc: Christoph Lameter
    Cc: Lin Feng
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

20 Nov, 2012

2 commits

  • Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone(). Now
    that clone_children is cpuset specific, there's no reason to have this
    rather odd option activation mechanism in cgroup core. cpuset can
    check the flag from its ->css_alloc() and take the necessary
    action.

    Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and
    remove cgroup_subsys->post_clone().

    Loosely based on Glauber's "generalize post_clone into post_create"
    patch.
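
    A rough sketch of the resulting shape (simplified; the helpers shown
    are elided stand-ins, not the exact mainline code):

        static struct cgroup_subsys_state *cpuset_css_alloc(struct cgroup *cgrp)
        {
                struct cpuset *cs = cpuset_alloc_and_init(cgrp); /* elided */

                /* Formerly cpuset_post_clone(): inherit the parent's
                 * configuration when clone_children is set. */
                if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags))
                        cpuset_inherit_parent(cs);               /* elided */

                return &cs->css;
        }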

    Signed-off-by: Tejun Heo
    Original-patch-by: Glauber Costa
    Original-patch:
    Acked-by: Serge E. Hallyn
    Acked-by: Li Zefan
    Cc: Glauber Costa

    Tejun Heo
     
  • Rename cgroup_subsys css lifetime related callbacks to better describe
    what their roles are. Also, update documentation.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

24 Jul, 2012

4 commits

  • cpuset_track_online_cpus() is no longer present. So remove the
    outdated comment and replace it with a reference to
    cpuset_update_active_cpus(), which is its equivalent.

    Also, we don't lack memory hot-unplug handling anymore, and David
    Rientjes pointed out how it is dealt with. So update that comment as
    well.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20120524141700.3692.98192.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     
  • Separate out the cpuset related handling for CPU/Memory online/offline.
    This also helps us exploit the most obvious and basic level of optimization
    that any notification mechanism (CPU/Mem online/offline) has to offer us:
    "We *know* why we have been invoked. So stop pretending that we are lost,
    and do only the necessary amount of processing!".

    And while at it, rename scan_for_empty_cpusets() to
    scan_cpusets_upon_hotplug(), which is more appropriate considering how
    it is restructured.
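
    A hedged sketch of the idea -- the handler is told why it was
    invoked, so it does only the matching work (names illustrative):

        enum hotplug_event {
                CPUSET_CPU_OFFLINE,
                CPUSET_MEM_OFFLINE,
        };

        static void scan_cpusets_upon_hotplug(struct cpuset *root,
                                              enum hotplug_event event)
        {
                switch (event) {
                case CPUSET_CPU_OFFLINE:
                        /* fix up only the cpus_allowed masks */
                        break;
                case CPUSET_MEM_OFFLINE:
                        /* fix up only the mems_allowed masks */
                        break;
                }
        }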

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20120524141650.3692.48637.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     
  • At present, the functions that deal with cpusets during CPU/Mem hotplug
    are quite messy, since a lot of the functionality is mixed up without clear
    separation. And this takes a toll on optimization as well. For example,
    the function cpuset_update_active_cpus() is called on both CPU offline and CPU
    online events; and it invokes scan_for_empty_cpusets(), which makes sense
    only for CPU offline events. And hence, the current code ends up unnecessarily
    traversing the cpuset tree during CPU online also.

    As a first step towards cleaning up those functions, encapsulate the cpuset
    tree traversal in a helper function, so as to facilitate upcoming changes.
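
    For context, a sketch of the pre-cleanup call path that motivates the
    helper (simplified):

        /* Before: the same path ran on both online and offline events,
         * so the whole cpuset tree was scanned even on CPU online,
         * where no cpuset can become empty. */
        void cpuset_update_active_cpus(void)
        {
                /* ... */
                scan_for_empty_cpusets(&top_cpuset); /* offline-only work */
        }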

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20120524141635.3692.893.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     
  • In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
    masks as and when necessary to ensure that the tasks belonging to the cpusets
    have some place (online CPUs) to run on. And regular CPU hotplug is
    destructive in the sense that the kernel doesn't remember the original cpuset
    configurations set by the user, across hotplug operations.

    However, suspend/resume (which uses CPU hotplug) is a special case in which
    the kernel has the responsibility to restore the system (during resume), to
    exactly the same state it was in before suspend.

    In order to achieve that, do the following:

    1. Don't modify cpusets during suspend/resume. At all.
    In particular, don't move the tasks from one cpuset to another, and
    don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
    during the CPU hotplug operations that are carried out in the
    suspend/resume path.

    2. However, cpusets and sched domains are related. We just want to avoid
    altering cpusets alone. So, to keep the sched domains updated, build
    a single sched domain (containing all active cpus) during each of the
    CPU hotplug operations carried out in s/r path, effectively ignoring
    the cpusets' cpus_allowed masks.

    (Since userspace is frozen while doing all this, it will go unnoticed.)

    3. During the last CPU online operation during resume, build the sched
    domains by looking up the (unaltered) cpusets' cpus_allowed masks.
    That will bring back the system to the same original state as it was in
    before suspend.

    Ultimately, this will not only solve the cpuset problem related to
    suspend/resume (i.e., it restores the cpusets to exactly what they were
    before suspend, by not touching them at all) but also speed up
    suspend/resume, because we avoid running the cpuset update code for
    every CPU being offlined/onlined.
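
    A minimal sketch of the special-casing, assuming a
    cpuset_cpu_inactive()-style hotplug callback (names illustrative):

        static int cpuset_cpu_inactive(struct notifier_block *nfb,
                                       unsigned long action, void *hcpu)
        {
                if (action & CPU_TASKS_FROZEN) {
                        /* suspend/resume: leave cpusets alone, keep the
                         * scheduler sane with one all-spanning domain */
                        partition_sched_domains(1, NULL, NULL);
                } else {
                        cpuset_update_active_cpus();
                }
                return NOTIFY_OK;
        }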

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     

23 May, 2012

1 commit

  • Pull cgroup updates from Tejun Heo:
    "cgroup file type addition / removal is updated so that file types are
    added and removed instead of individual files so that dynamic file
    type addition / removal can be implemented by cgroup and used by
    controllers. blkio controller changes which will come through block
    tree are dependent on this. Other changes include res_counter cleanup
    and disallowing kthread / PF_THREAD_BOUND threads to be attached to
    non-root cgroups.

    There's a reported bug with the file type addition / removal handling
    which can lead to oops on cgroup umount. The issue is being looked
    into. It shouldn't cause problems for most setups and isn't a
    security concern."

    Fix up trivial conflict in Documentation/feature-removal-schedule.txt

    * 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    res_counter: Account max_usage when calling res_counter_charge_nofail()
    res_counter: Merge res_counter_charge and res_counter_charge_nofail
    cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
    cgroup: remove cgroup_subsys->populate()
    cgroup: get rid of populate for memcg
    cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
    cgroup: make css->refcnt clearing on cgroup removal optional
    cgroup: use negative bias on css->refcnt to block css_tryget()
    cgroup: implement cgroup_rm_cftypes()
    cgroup: introduce struct cfent
    cgroup: relocate __d_cgrp() and __d_cft()
    cgroup: remove cgroup_add_file[s]()
    cgroup: convert memcg controller to the new cftype interface
    memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
    cgroup: convert all non-memcg controllers to the new cftype interface
    cgroup: relocate cftype and cgroup_subsys definitions in controllers
    cgroup: merge cft_release_agent cftype array into the base files array
    cgroup: implement cgroup_add_cftypes() and friends
    cgroup: build list of all cgroups under a given cgroupfs_root
    cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()
    ...

    Linus Torvalds
     

02 Apr, 2012

2 commits

  • Pull cpumask cleanups from Rusty Russell:
    "(Somehow forgot to send this out; it's been sitting in linux-next, and
    if you don't want it, it can sit there another cycle)"

    I'm a sucker for things that actually delete lines of code.

    Fix up trivial conflict in arch/arm/kernel/kprobes.c, where Rusty fixed
    a user of &cpu_online_map to be cpu_online_mask, but that code got
    deleted by commit b21d55e98ac2 ("ARM: 7332/1: extract out code patch
    function from kprobes").

    * tag 'for-linus' of git://github.com/rustyrussell/linux:
    cpumask: remove old cpu_*_map.
    documentation: remove references to cpu_*_map.
    drivers/cpufreq/db8500-cpufreq: remove references to cpu_*_map.
    remove references to cpu_*_map in arch/

    Linus Torvalds
     
  • Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
    net_cls and device controllers to use the new cftype based interface.
    Termination entry is added to cftype arrays and populate callbacks are
    replaced with cgroup_subsys->base_cftypes initializations.

    This is a functionally identical transformation. There shouldn't be
    any visible behavior change.

    memcg is rather special and will be converted separately.
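
    In rough terms, the conversion pattern looks like this (a sketch with
    an illustrative subsystem; field details elided):

        static struct cftype files[] = {
                {
                        .name = "example",
                        .read_u64 = example_read_u64,
                },
                { }     /* terminating entry */
        };

        struct cgroup_subsys example_subsys = {
                .name         = "example",
                .base_cftypes = files,
                /* no ->populate() callback anymore */
        };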

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: "David S. Miller"
    Cc: Vivek Goyal

    Tejun Heo
     

28 Mar, 2012

1 commit

  • We don't use "cpu" any more after 2baab4e904 "sched: Fix
    select_fallback_rq() vs cpu_active/cpu_online".

    Signed-off-by: Dan Carpenter
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120328104608.GD29022@elgon.mountain
    Signed-off-by: Ingo Molnar

    Dan Carpenter
     

27 Mar, 2012

1 commit

  • Commit 5fbd036b55 ("sched: Cleanup cpu_active madness"), which was
    supposed to finally sort the cpu_active mess, instead uncovered more.

    Since CPU_STARTING is run before setting the cpu online, there's a
    (small) window where the cpu has active,!online.

    If during this time there's a wakeup of a task that used to reside on
    that cpu select_task_rq() will use select_fallback_rq() to compute an
    alternative cpu to run on since we find !online.

    select_fallback_rq() however will compute the new cpu against
    cpu_active, this means that it can return the same cpu it started out
    with, the !online one, since that cpu is in fact marked active.

    This results in us trying to schedule a task on an offline cpu and
    triggering a WARN in the IPI code.

    The solution proposed by Chuansheng Liu of setting cpu_active in
    set_cpu_online() is buggy: firstly, not all archs actually use
    set_cpu_online(); secondly, not all archs call set_cpu_online() with
    IRQs disabled. This means we would introduce either the same race or
    the race from fd8a7de17 ("x86: cpu-hotplug: Prevent softirq wakeup on
    wrong CPU") -- albeit much narrower.

    [ By setting online first and active later we have a window of
    online,!active, fresh and bound kthreads have task_cpu() of 0 and
    since cpu0 isn't in tsk_cpus_allowed() we end up in
    select_fallback_rq() which excludes !active, resulting in a reset
    of ->cpus_allowed and the thread running all over the place. ]

    The solution is to re-work select_fallback_rq() to require active
    _and_ online. This makes the active,!online case work as expected,
    OTOH archs running CPU_STARTING after setting online are now
    vulnerable to the issue from fd8a7de17 -- these are alpha and
    blackfin.
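
    The gist of the fix, as a hedged sketch of the fallback loop (not
    line-for-line):

        /* A fallback cpu must be allowed, online AND active. */
        for_each_cpu(dest_cpu, nodemask) {
                if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
                        continue;
                if (!cpu_online(dest_cpu))
                        continue;
                if (!cpu_active(dest_cpu))
                        continue;
                return dest_cpu;
        }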

    Reported-by: Chuansheng Liu
    Signed-off-by: Peter Zijlstra
    Cc: Mike Frysinger
    Cc: linux-alpha@vger.kernel.org
    Link: http://lkml.kernel.org/n/tip-hubqk1i10o4dpvlm06gq7v6j@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

22 Mar, 2012

1 commit

  • Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
    changing cpuset's mems") wins a super prize for the largest number of
    memory barriers entered into fast paths for one commit.

    [get|put]_mems_allowed is incredibly heavy with pairs of full memory
    barriers inserted into a number of hot paths. This was detected while
    investigating a large page allocator slowdown introduced some time
    after 2.6.32. The largest portion of this overhead was shown by
    oprofile to be at an mfence introduced by this commit into the page
    allocator hot path.

    For extra style points, the commit introduced the use of yield() in an
    implementation of what looks like a spinning mutex.

    This patch replaces the full memory barriers on both read and write
    sides with a sequence counter with just read barriers on the fast path
    side. This is much cheaper on some architectures, including x86. The
    main bulk of the patch is the retry logic if the nodemask changes in a
    manner that can cause a false failure.

    While updating the nodemask, a check is made to see if a false failure
    is a risk. If it is, the sequence number gets bumped and parallel
    allocators will briefly stall while the nodemask update takes place.
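
    The read side then becomes a classic seqcount retry loop; a sketch
    using the mems_allowed_seq counter this patch introduces:

        unsigned int seq;
        nodemask_t mask;

        do {
                seq = read_seqcount_begin(&current->mems_allowed_seq);
                mask = current->mems_allowed;
        } while (read_seqcount_retry(&current->mems_allowed_seq, seq));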

    In a page fault test microbenchmark, oprofile samples from
    __alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
    actual results were

                                3.3.0-rc3           3.3.0-rc3
                                rc3-vanilla         nobarrier-v2r1
    Clients 1 UserTime           0.07 (  0.00%)      0.08 (-14.19%)
    Clients 2 UserTime           0.07 (  0.00%)      0.07 (  2.72%)
    Clients 4 UserTime           0.08 (  0.00%)      0.07 (  3.29%)
    Clients 1 SysTime            0.70 (  0.00%)      0.65 (  6.65%)
    Clients 2 SysTime            0.85 (  0.00%)      0.82 (  3.65%)
    Clients 4 SysTime            1.41 (  0.00%)      1.41 (  0.32%)
    Clients 1 WallTime           0.77 (  0.00%)      0.74 (  4.19%)
    Clients 2 WallTime           0.47 (  0.00%)      0.45 (  3.73%)
    Clients 4 WallTime           0.38 (  0.00%)      0.37 (  1.58%)
    Clients 1 Flt/sec/cpu   497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients 2 Flt/sec/cpu   414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients 4 Flt/sec/cpu   257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients 1 Flt/sec       495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients 2 Flt/sec       820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients 4 Flt/sec      1020068.93 (  0.00%) 1022674.06 ( 0.26%)

    MMTests Statistics: duration
    Sys Time Running Test (seconds)            135.68    132.17
    User+Sys Time Running Test (seconds)        164.2    160.13
    Total Elapsed Time (seconds)               123.46    120.87

    The overall improvement is small but the System CPU time is much
    improved and roughly in correlation to what oprofile reported (these
    performance figures are without profiling so skew is expected). The
    actual number of page faults is noticeably improved.

    For benchmarks like kernel builds, the overall benefit is marginal but
    the system CPU time is slightly reduced.

    To test the actual bug the commit fixed I opened two terminals. The
    first ran within a cpuset and continually ran a small program that
    faulted 100M of anonymous data. In a second window, the nodemask of the
    cpuset was continually randomised in a loop.

    Without the commit, the program would fail every so often (usually
    within 10 seconds) and obviously with the commit everything worked fine.
    With this patch applied, it also worked fine so the fix should be
    functionally equivalent.

    Signed-off-by: Mel Gorman
    Cc: Miao Xie
    Cc: David Rientjes
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

03 Feb, 2012

1 commit

  • The argument is not used at all, and it's not necessary, because
    a specific callback handler of course knows which subsys it
    belongs to.

    Now only ->populate() takes this argument, because the handlers of
    this callback always call cgroup_add_file()/cgroup_add_files().

    So we reduce a few lines of code, though the shrinking of object size
    is minimal.

    16 files changed, 113 insertions(+), 162 deletions(-)

       text    data     bss      dec    hex filename
    5486240  656987 7039960 13183187 c928d3 vmlinux.o.orig
    5486170  656987 7039960 13183117 c9288d vmlinux.o
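
    The signature change, sketched diff-style on the ->create() callback
    (illustrative):

        -	struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
        -					      struct cgroup *cgrp);
        +	struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);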

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

10 Jan, 2012

1 commit

  • * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    cgroup: fix to allow mounting a hierarchy by name
    cgroup: move assignement out of condition in cgroup_attach_proc()
    cgroup: Remove task_lock() from cgroup_post_fork()
    cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
    cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
    cgroup: only need to check oldcgrp==newgrp once
    cgroup: remove redundant get/put of task struct
    cgroup: remove redundant get/put of old css_set from migrate
    cgroup: Remove unnecessary task_lock before fetching css_set on migration
    cgroup: Drop task_lock(parent) on cgroup_fork()
    cgroups: remove redundant get/put of css_set from css_set_check_fetched()
    resource cgroups: remove bogus cast
    cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
    cgroup, cpuset: don't use ss->pre_attach()
    cgroup: don't use subsys->can_attach_task() or ->attach_task()
    cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
    cgroup: improve old cgroup handling in cgroup_attach_proc()
    cgroup: always lock threadgroup during migration
    threadgroup: extend threadgroup_lock() to cover exit and exec
    threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
    ...

    Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
    fix a css_set not found bug in cgroup_attach_proc" that already
    mentioned that the bug is fixed (differently) in Tejun's cgroup
    patchset. This one, in other words.

    Linus Torvalds
     

21 Dec, 2011

1 commit

  • Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
    nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
    new set of allowed cpuset nodes where the two nodemasks, as a result of
    the remap, are now disjoint.

    c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
    cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
    nodes from changing for a thread. This causes any update to a set of
    allowed nodes to stall until put_mems_allowed() is called.

    This stall is unnecessary, however, if at least one node remains unchanged
    in the update to the set of allowed nodes. This was addressed by
    89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
    node remains set"), but it's still possible that an empty nodemask may be
    read from a mempolicy because the old nodemask may be remapped to the new
    nodemask during rebind. To prevent this, only avoid the stall if there is
    no mempolicy for the thread being changed.

    This is a temporary solution until all reads from mempolicy nodemasks can
    be guaranteed to not be empty without the get_mems_allowed()
    synchronization.

    Also moves the check for nodemask intersection inside task_lock() so that
    tsk->mems_allowed cannot change. This ensures that nothing can set this
    tsk's mems_allowed out from under us and also protects tsk->mempolicy.
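
    A hedged sketch of the resulting check, taken inside task_lock() as
    described above (task_has_mempolicy() shown as a trivial accessor):

        bool need_loop;

        task_lock(tsk);
        /* stall only for a mempolicy or fully disjoint nodemasks */
        need_loop = task_has_mempolicy(tsk) ||
                    !nodes_intersects(*newmems, tsk->mems_allowed);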

    Reported-by: Miao Xie
    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Paul Menage
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

13 Dec, 2011

3 commits

  • ->pre_attach() is supposed to be called before migration, which is
    observed during process migration but task migration does it the other
    way around. The only ->pre_attach() user is cpuset which can do the
    same operations in ->can_attach(). Collapse cpuset_pre_attach() into
    cpuset_can_attach().

    -v2: Patch contamination from later patch removed. Spotted by Paul
    Menage.

    Signed-off-by: Tejun Heo
    Reviewed-by: Frederic Weisbecker
    Acked-by: Paul Menage
    Cc: Li Zefan

    Tejun Heo
     
  • Now that subsys->can_attach() and attach() take @tset instead of
    @task, they can handle per-task operations. Convert
    ->can_attach_task() and ->attach_task() users to use ->can_attach()
    and ->attach() instead. Most conversions are straightforward.
    Noteworthy changes are,

    * In cgroup_freezer, remove unnecessary NULL assignments to unused
    methods. It's useless and very prone to get out of sync, which
    already happened.

    * In cpuset, PF_THREAD_BOUND test is checked for each task. This
    doesn't make any practical difference but is conceptually cleaner.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Frederic Weisbecker
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: James Morris
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Tejun Heo
     
  • Currently, there's no way to pass multiple tasks to cgroup_subsys
    methods necessitating the need for separate per-process and per-task
    methods. This patch introduces cgroup_taskset which can be used to
    pass multiple tasks and their associated cgroups to cgroup_subsys
    methods.

    Three methods - can_attach(), cancel_attach() and attach() - are
    converted to use cgroup_taskset. This unifies passed parameters so
    that all methods have access to all information. Conversions in this
    patchset are identical and don't introduce any behavior change.
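
    A minimal sketch of consuming a taskset in ->can_attach(), using the
    first/next iterator pair (arguments simplified):

        static int example_can_attach(struct cgroup *cgrp,
                                      struct cgroup_taskset *tset)
        {
                struct task_struct *task;

                /* every task being migrated in this attach operation */
                for (task = cgroup_taskset_first(tset); task;
                     task = cgroup_taskset_next(tset)) {
                        if (task->flags & PF_THREAD_BOUND)
                                return -EINVAL;
                }
                return 0;
        }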

    -v2: documentation updated as per Paul Menage's suggestion.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Frederic Weisbecker
    Acked-by: Paul Menage
    Acked-by: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: James Morris

    Tejun Heo
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

03 Nov, 2011

1 commit

  • {get,put}_mems_allowed() exist so that general kernel code may locklessly
    access a task's set of allowable nodes without having the chance that a
    concurrent write will cause the nodemask to be empty on configurations
    where MAX_NUMNODES > BITS_PER_LONG.

    This could incur a significant delay, however, especially in low memory
    conditions because the page allocator is blocking and reclaim requires
    get_mems_allowed() itself. It is not atypical to see writes to
    cpuset.mems take over 2 seconds to complete, for example. In low memory
    conditions, this is problematic because it's one of the most important
    times to change cpuset.mems in the first place!

    The only way a task's set of allowable nodes may change is through
    cpusets, by writing to cpuset.mems or by attaching the task to a
    different cpuset. The update is done by first setting all the new
    nodes, ensuring generic code is not reading the nodemask with
    get_mems_allowed() at the same time, and then clearing all the old
    nodes. This prevents the possibility that a reader will see an empty
    nodemask at the same time the writer is storing a new nodemask.

    If at least one node remains unchanged, though, it's possible to simply
    set all new nodes and then clear all the old nodes. Changing a task's
    nodemask is protected by cgroup_mutex so it's guaranteed that two threads
    are not changing the same task's nodemask at the same time, so the
    nodemask is guaranteed to be stored before another thread changes it and
    determines whether a node remains set or not.
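
    The write-side protocol described above, sketched (the mask is never
    observably empty):

        if (nodes_intersects(*newmems, tsk->mems_allowed)) {
                /* fast path: set new nodes, then clear old ones; at
                 * least one node is set at every instant */
                nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
                tsk->mems_allowed = *newmems;
        } else {
                /* disjoint masks: fall back to the stalling update */
        }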

    Signed-off-by: David Rientjes
    Cc: Miao Xie
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

27 Jul, 2011

2 commits

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     
  • [ This patch has already been accepted as commit 0ac0c0d0f837 but later
    reverted (commit 35926ff5fba8) because it introduced arch specific
    __node_random which was defined only for x86 code so it broke other
    archs. This is a followup without any arch specific code. Other than
    that there are no functional changes.]

    Some workloads that create a large number of small files tend to assign
    too many pages to node 0 (multi-node systems). Part of the reason is
    that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
    at node 0 for newly created tasks.

    This patch changes the rotor to be initialized to a random node number
    of the cpuset.
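
    A hedged sketch of the initialization (node_random() per the
    arch-independent follow-up; exact placement differs in mainline):

        /* Start the mem-spread rotor at a random allowed node instead
         * of always starting at node 0. */
        if (current->cpuset_mem_spread_rotor == NUMA_NO_NODE)
                current->cpuset_mem_spread_rotor =
                        node_random(&current->mems_allowed);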

    [akpm@linux-foundation.org: fix layout]
    [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
    [mhocko@suse.cz: Make it arch independent]
    [akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
    Signed-off-by: Jack Steiner
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Michal Hocko
    Reviewed-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Paul Menage
    Cc: Jack Steiner
    Cc: Robin Holt
    Cc: David Rientjes
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Jack Steiner
    Cc: KOSAKI Motohiro
    Cc: Lee Schermerhorn
    Cc: Michal Hocko
    Cc: Paul Menage
    Cc: Pekka Enberg
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

27 May, 2011

2 commits

  • The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
    leads to some problems:

    * cgroup creation is out-of-control
    * cgroup name can conflict when pids are looping
    * it is not possible to have a single process handling a lot of
    namespaces without falling into exponential creation time
    * we may want to create a namespace without creating a cgroup

    The ns_cgroup was replaced by a compatibility flag 'clone_children',
    where a newly created cgroup will copy the parent cgroup's values.
    Userspace has to manually create a cgroup and add a task to
    the 'tasks' file.

    This patch removes the ns_cgroup as suggested in the following thread:

    https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

    The 'cgroup_clone' function is removed because it is no longer used.

    This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify
    ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
    printk warning users that the feature is planned for removal. Since that
    time we have heard from XXX users who were affected by this.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Cc: Jamal Hadi Salim
    Reviewed-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     
  • Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

    Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
    for the cgroup subsystem interface. Unlike can_attach and attach, these
    are for per-thread operations, to be called potentially many times when
    attaching an entire threadgroup.

    Also, the old "bool threadgroup" interface is removed, as replaced by
    this. All subsystems are modified for the new interface - of note is
    cpuset, which requires from/to nodemasks for attach to be globally scoped
    (though per-cpuset would work too) to persist from its pre_attach to
    attach_task and attach.
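
    The new hooks, roughly as they sit in the subsystem interface (a
    sketch of the signatures):

        struct cgroup_subsys {
                /* ... */
                int  (*can_attach_task)(struct cgroup *cgrp,
                                        struct task_struct *tsk);
                void (*pre_attach)(struct cgroup *cgrp);
                void (*attach_task)(struct cgroup *cgrp,
                                    struct task_struct *tsk);
                /* ... */
        };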

    This is a pre-patch for cgroup-procs-writable.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

11 Apr, 2011

1 commit

  • Remove the SD_LV_ enum and use dynamic level assignments.

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110407122942.969433965@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

24 Mar, 2011

4 commits

  • Changing cpuset->mems/cpuset->cpus should be protected under
    callback_mutex.

    cpuset_clone() doesn't follow this rule. It's ok because it's
    called when creating and initializing a cgroup, but we'd better
    hold the lock to avoid a subtle break in the future.
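
    The fix essentially brackets the copies (sketch):

        mutex_lock(&callback_mutex);
        cs->mems_allowed = parent->mems_allowed;
        cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
        mutex_unlock(&callback_mutex);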

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Those functions that use NODEMASK_ALLOC() can't propagate errno
    to users, but will fail silently.

    Fix it by using a static nodemask_t variable for each function, and
    those variables are protected by cgroup_mutex.

    [akpm@linux-foundation.org: fix comment spelling, strengthen cgroup_lock comment]
    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • oldcs->mems_allowed is not modified during cpuset_attach(), so we don't
    have to copy it to a buffer allocated by NODEMASK_ALLOC(). Just pass it
    to cpuset_migrate_mm().

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • It's not necessary to copy cpuset->mems_allowed to a buffer allocated by
    NODEMASK_ALLOC(). Just pass it to nodelist_scnprintf().

    As spotted by Paul, a side effect is that we fix a bug where the
    function could return -ENOMEM but the caller doesn't expect a negative
    return value. Therefore change the return value of cpuset_sprintf_cpulist()
    and cpuset_sprintf_memlist() from int to size_t.

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

21 Oct, 2010

1 commit

  • Security modules shouldn't change the sched_param parameter of
    security_task_setscheduler(). This is not only meaningless, but can
    also produce a harmful result if the caller passes a static variable.

    This patch removes the policy and sched_param parameters from
    security_task_setscheduler() because none of the security modules
    use them.
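
    The interface change, sketched diff-style:

        -int security_task_setscheduler(struct task_struct *p,
        -                               int policy, struct sched_param *lp);
        +int security_task_setscheduler(struct task_struct *p);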

    Cc: James Morris
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: James Morris

    KOSAKI Motohiro
     

07 Aug, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
    sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
    sched: No need for bootmem special cases
    sched: Revert nohz_ratelimit() for now
    sched: Reduce update_group_power() calls
    sched: Update rq->clock for nohz balanced cpus
    sched: Fix spelling of sibling
    sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
    sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
    sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
    sched: thread_group_cputime: Simplify, document the "alive" check
    sched: Remove the obsolete exit_state/signal hacks
    sched: task_tick_rt: Remove the obsolete ->signal != NULL check
    sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
    sched: Fix comments to make them DocBook happy
    sched: Fix fix_small_capacity
    powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
    powerpc: Enable asymmetric SMT scheduling on POWER7
    sched: Add asymmetric group packing option for sibling domain
    sched: Fix capacity calculations for SMT4
    sched: Change nohz idle load balancing logic to push model
    ...

    Linus Torvalds
     

22 Jun, 2010

1 commit

  • Commit 3a101d05 (sched: adjust when cpu_active and cpuset
    configurations are updated during cpu on/offlining) added
    hotplug notifiers marked with __cpuexit; however, ia64 drops
    text in __cpuexit during link unlike x86.

    This means that functions which are referenced during init but used
    only for cpu hot unplugging afterwards shouldn't be marked with
    __cpuexit. Drop __cpuexit from those functions.
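
    Diff-style, the change is essentially (function name illustrative):

        -static int __cpuexit cpuset_cpu_inactive(...)
        +static int cpuset_cpu_inactive(...)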

    Reported-by: Tony Luck
    Signed-off-by: Tejun Heo
    Acked-by: Tony Luck
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

17 Jun, 2010

1 commit