13 Dec, 2012

1 commit

  • N_HIGH_MEMORY stands for the nodes that have normal or high memory.
    N_MEMORY stands for the nodes that have any memory.

    The code here needs to handle the nodes which have memory, so we
    should use N_MEMORY instead.
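
    For illustration, a minimal sketch of the intended pattern (the call
    site and helper below are hypothetical):

        int nid;

        /* Visit every node that has memory, not only those with
         * normal/high memory. */
        for_each_node_state(nid, N_MEMORY)      /* was: N_HIGH_MEMORY */
                init_node_state(nid);           /* hypothetical helper */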

    Signed-off-by: Lai Jiangshan
    Acked-by: Hillf Danton
    Signed-off-by: Wen Congyang
    Cc: Christoph Lameter
    Cc: Lin Feng
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

20 Nov, 2012

2 commits

  • Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone(). Now
    that clone_children is cpuset specific, there's no reason to have this
    rather odd option activation mechanism in cgroup core. cpuset can
    check the flag from its ->css_alloc() and take the necessary
    action.

    Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and
    remove cgroup_subsys->post_clone().

    Loosely based on Glauber's "generalize post_clone into post_create"
    patch.
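
    A rough sketch of the resulting shape (simplified; the helpers shown
    are elided stand-ins, not the exact mainline code):

        static struct cgroup_subsys_state *cpuset_css_alloc(struct cgroup *cgrp)
        {
                struct cpuset *cs = cpuset_alloc_and_init(cgrp); /* elided */

                /* Formerly cpuset_post_clone(): inherit the parent's
                 * configuration when clone_children is set. */
                if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags))
                        cpuset_inherit_parent(cs);               /* elided */

                return &cs->css;
        }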

    Signed-off-by: Tejun Heo
    Original-patch-by: Glauber Costa
    Original-patch:
    Acked-by: Serge E. Hallyn
    Acked-by: Li Zefan
    Cc: Glauber Costa

    Tejun Heo
     
  • Rename cgroup_subsys css lifetime related callbacks to better describe
    what their roles are. Also, update documentation.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

24 Jul, 2012

4 commits

  • cpuset_track_online_cpus() is no longer present. So remove the
    outdated comment and replace it with a reference to
    cpuset_update_active_cpus(), which is its equivalent.

    Also, we don't lack memory hot-unplug handling anymore, and David
    Rientjes pointed out how it is dealt with. So update that comment as
    well.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20120524141700.3692.98192.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     
  • Separate out the cpuset related handling for CPU/Memory online/offline.
    This also helps us exploit the most obvious and basic level of optimization
    that any notification mechanism (CPU/Mem online/offline) has to offer us:
    "We *know* why we have been invoked. So stop pretending that we are lost,
    and do only the necessary amount of processing!".

    And while at it, rename scan_for_empty_cpusets() to
    scan_cpusets_upon_hotplug(), which is more appropriate considering how
    it is restructured.
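
    A hedged sketch of the idea -- the handler is told why it was
    invoked, so it does only the matching work (names illustrative):

        enum hotplug_event {
                CPUSET_CPU_OFFLINE,
                CPUSET_MEM_OFFLINE,
        };

        static void scan_cpusets_upon_hotplug(struct cpuset *root,
                                              enum hotplug_event event)
        {
                switch (event) {
                case CPUSET_CPU_OFFLINE:
                        /* fix up only the cpus_allowed masks */
                        break;
                case CPUSET_MEM_OFFLINE:
                        /* fix up only the mems_allowed masks */
                        break;
                }
        }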

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20120524141650.3692.48637.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     
  • At present, the functions that deal with cpusets during CPU/Mem hotplug
    are quite messy, since a lot of the functionality is mixed up without clear
    separation. And this takes a toll on optimization as well. For example,
    the function cpuset_update_active_cpus() is called on both CPU offline and CPU
    online events; and it invokes scan_for_empty_cpusets(), which makes sense
    only for CPU offline events. And hence, the current code ends up unnecessarily
    traversing the cpuset tree during CPU online also.

    As a first step towards cleaning up those functions, encapsulate the cpuset
    tree traversal in a helper function, so as to facilitate upcoming changes.
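
    For context, a sketch of the pre-cleanup call path that motivates the
    helper (simplified):

        /* Before: the same path ran on both online and offline events,
         * so the whole cpuset tree was scanned even on CPU online,
         * where no cpuset can become empty. */
        void cpuset_update_active_cpus(void)
        {
                /* ... */
                scan_for_empty_cpusets(&top_cpuset); /* offline-only work */
        }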

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20120524141635.3692.893.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     
  • In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
    masks as and when necessary to ensure that the tasks belonging to the cpusets
    have some place (online CPUs) to run on. And regular CPU hotplug is
    destructive in the sense that the kernel doesn't remember the original cpuset
    configurations set by the user, across hotplug operations.

    However, suspend/resume (which uses CPU hotplug) is a special case in which
    the kernel has the responsibility to restore the system (during resume), to
    exactly the same state it was in before suspend.

    In order to achieve that, do the following:

    1. Don't modify cpusets during suspend/resume. At all.
    In particular, don't move the tasks from one cpuset to another, and
    don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
    during the CPU hotplug operations that are carried out in the
    suspend/resume path.

    2. However, cpusets and sched domains are related. We just want to avoid
    altering cpusets alone. So, to keep the sched domains updated, build
    a single sched domain (containing all active cpus) during each of the
    CPU hotplug operations carried out in s/r path, effectively ignoring
    the cpusets' cpus_allowed masks.

    (Since userspace is frozen while doing all this, it will go unnoticed.)

    3. During the last CPU online operation during resume, build the sched
    domains by looking up the (unaltered) cpusets' cpus_allowed masks.
    That will bring back the system to the same original state as it was in
    before suspend.

    Ultimately, this will not only solve the cpuset problem related to
    suspend/resume (i.e., it restores the cpusets to exactly what they were
    before suspend, by not touching them at all) but also speed up
    suspend/resume, because we avoid running the cpuset update code for
    every CPU being offlined/onlined.
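
    A minimal sketch of the special-casing, assuming a
    cpuset_cpu_inactive()-style hotplug callback (names illustrative):

        static int cpuset_cpu_inactive(struct notifier_block *nfb,
                                       unsigned long action, void *hcpu)
        {
                if (action & CPU_TASKS_FROZEN) {
                        /* suspend/resume: leave cpusets alone, keep the
                         * scheduler sane with one all-spanning domain */
                        partition_sched_domains(1, NULL, NULL);
                } else {
                        cpuset_update_active_cpus();
                }
                return NOTIFY_OK;
        }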

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     

23 May, 2012

1 commit

  • Pull cgroup updates from Tejun Heo:
    "cgroup file type addition / removal is updated so that file types are
    added and removed instead of individual files so that dynamic file
    type addition / removal can be implemented by cgroup and used by
    controllers. blkio controller changes which will come through block
    tree are dependent on this. Other changes include res_counter cleanup
    and disallowing kthread / PF_THREAD_BOUND threads to be attached to
    non-root cgroups.

    There's a reported bug with the file type addition / removal handling
    which can lead to oops on cgroup umount. The issue is being looked
    into. It shouldn't cause problems for most setups and isn't a
    security concern."

    Fix up trivial conflict in Documentation/feature-removal-schedule.txt

    * 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    res_counter: Account max_usage when calling res_counter_charge_nofail()
    res_counter: Merge res_counter_charge and res_counter_charge_nofail
    cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
    cgroup: remove cgroup_subsys->populate()
    cgroup: get rid of populate for memcg
    cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
    cgroup: make css->refcnt clearing on cgroup removal optional
    cgroup: use negative bias on css->refcnt to block css_tryget()
    cgroup: implement cgroup_rm_cftypes()
    cgroup: introduce struct cfent
    cgroup: relocate __d_cgrp() and __d_cft()
    cgroup: remove cgroup_add_file[s]()
    cgroup: convert memcg controller to the new cftype interface
    memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
    cgroup: convert all non-memcg controllers to the new cftype interface
    cgroup: relocate cftype and cgroup_subsys definitions in controllers
    cgroup: merge cft_release_agent cftype array into the base files array
    cgroup: implement cgroup_add_cftypes() and friends
    cgroup: build list of all cgroups under a given cgroupfs_root
    cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()
    ...

    Linus Torvalds
     

02 Apr, 2012

2 commits

  • Pull cpumask cleanups from Rusty Russell:
    "(Somehow forgot to send this out; it's been sitting in linux-next, and
    if you don't want it, it can sit there another cycle)"

    I'm a sucker for things that actually delete lines of code.

    Fix up trivial conflict in arch/arm/kernel/kprobes.c, where Rusty fixed
    a user of &cpu_online_map to be cpu_online_mask, but that code got
    deleted by commit b21d55e98ac2 ("ARM: 7332/1: extract out code patch
    function from kprobes").

    * tag 'for-linus' of git://github.com/rustyrussell/linux:
    cpumask: remove old cpu_*_map.
    documentation: remove references to cpu_*_map.
    drivers/cpufreq/db8500-cpufreq: remove references to cpu_*_map.
    remove references to cpu_*_map in arch/

    Linus Torvalds
     
  • Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
    net_cls and device controllers to use the new cftype based interface.
    Termination entry is added to cftype arrays and populate callbacks are
    replaced with cgroup_subsys->base_cftypes initializations.

    This is a functionally identical transformation. There shouldn't be
    any visible behavior change.

    memcg is rather special and will be converted separately.
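
    In rough terms, the conversion pattern looks like this (a sketch with
    an illustrative subsystem; field details elided):

        static struct cftype files[] = {
                {
                        .name = "example",
                        .read_u64 = example_read_u64,
                },
                { }     /* terminating entry */
        };

        struct cgroup_subsys example_subsys = {
                .name         = "example",
                .base_cftypes = files,
                /* no ->populate() callback anymore */
        };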

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: "David S. Miller"
    Cc: Vivek Goyal

    Tejun Heo
     

28 Mar, 2012

1 commit

  • We don't use "cpu" any more after 2baab4e904 "sched: Fix
    select_fallback_rq() vs cpu_active/cpu_online".

    Signed-off-by: Dan Carpenter
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120328104608.GD29022@elgon.mountain
    Signed-off-by: Ingo Molnar

    Dan Carpenter
     

27 Mar, 2012

1 commit

  • Commit 5fbd036b55 ("sched: Cleanup cpu_active madness"), which was
    supposed to finally sort the cpu_active mess, instead uncovered more.

    Since CPU_STARTING is run before setting the cpu online, there's a
    (small) window where the cpu has active,!online.

    If during this time there's a wakeup of a task that used to reside on
    that cpu select_task_rq() will use select_fallback_rq() to compute an
    alternative cpu to run on since we find !online.

    select_fallback_rq() however will compute the new cpu against
    cpu_active, this means that it can return the same cpu it started out
    with, the !online one, since that cpu is in fact marked active.

    This results in us trying to schedule a task on an offline cpu and
    triggering a WARN in the IPI code.

    The solution proposed by Chuansheng Liu of setting cpu_active in
    set_cpu_online() is buggy: firstly, not all archs actually use
    set_cpu_online(); secondly, not all archs call set_cpu_online() with
    IRQs disabled. This means we would introduce either the same race or
    the race from fd8a7de17 ("x86: cpu-hotplug: Prevent softirq wakeup on
    wrong CPU") -- albeit much narrower.

    [ By setting online first and active later we have a window of
    online,!active, fresh and bound kthreads have task_cpu() of 0 and
    since cpu0 isn't in tsk_cpus_allowed() we end up in
    select_fallback_rq() which excludes !active, resulting in a reset
    of ->cpus_allowed and the thread running all over the place. ]

    The solution is to re-work select_fallback_rq() to require active
    _and_ online. This makes the active,!online case work as expected,
    OTOH archs running CPU_STARTING after setting online are now
    vulnerable to the issue from fd8a7de17 -- these are alpha and
    blackfin.
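
    The gist of the fix, as a hedged sketch of the fallback loop (not
    line-for-line):

        /* A fallback cpu must be allowed, online AND active. */
        for_each_cpu(dest_cpu, nodemask) {
                if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
                        continue;
                if (!cpu_online(dest_cpu))
                        continue;
                if (!cpu_active(dest_cpu))
                        continue;
                return dest_cpu;
        }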

    Reported-by: Chuansheng Liu
    Signed-off-by: Peter Zijlstra
    Cc: Mike Frysinger
    Cc: linux-alpha@vger.kernel.org
    Link: http://lkml.kernel.org/n/tip-hubqk1i10o4dpvlm06gq7v6j@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

22 Mar, 2012

1 commit

  • Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
    changing cpuset's mems") wins a super prize for the largest number of
    memory barriers entered into fast paths for one commit.

    [get|put]_mems_allowed is incredibly heavy with pairs of full memory
    barriers inserted into a number of hot paths. This was detected while
    investigating a large page allocator slowdown introduced some time
    after 2.6.32. The largest portion of this overhead was shown by
    oprofile to be at an mfence introduced by this commit into the page
    allocator hot path.

    For extra style points, the commit introduced the use of yield() in an
    implementation of what looks like a spinning mutex.

    This patch replaces the full memory barriers on both read and write
    sides with a sequence counter with just read barriers on the fast path
    side. This is much cheaper on some architectures, including x86. The
    main bulk of the patch is the retry logic if the nodemask changes in a
    manner that can cause a false failure.

    While updating the nodemask, a check is made to see if a false failure
    is a risk. If it is, the sequence number gets bumped and parallel
    allocators will briefly stall while the nodemask update takes place.
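
    The read side then becomes a classic seqcount retry loop; a sketch
    using the mems_allowed_seq counter this patch introduces:

        unsigned int seq;
        nodemask_t mask;

        do {
                seq = read_seqcount_begin(&current->mems_allowed_seq);
                mask = current->mems_allowed;
        } while (read_seqcount_retry(&current->mems_allowed_seq, seq));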

    In a page fault test microbenchmark, oprofile samples from
    __alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
    actual results were

                                3.3.0-rc3           3.3.0-rc3
                                rc3-vanilla         nobarrier-v2r1
    Clients 1 UserTime           0.07 (  0.00%)      0.08 (-14.19%)
    Clients 2 UserTime           0.07 (  0.00%)      0.07 (  2.72%)
    Clients 4 UserTime           0.08 (  0.00%)      0.07 (  3.29%)
    Clients 1 SysTime            0.70 (  0.00%)      0.65 (  6.65%)
    Clients 2 SysTime            0.85 (  0.00%)      0.82 (  3.65%)
    Clients 4 SysTime            1.41 (  0.00%)      1.41 (  0.32%)
    Clients 1 WallTime           0.77 (  0.00%)      0.74 (  4.19%)
    Clients 2 WallTime           0.47 (  0.00%)      0.45 (  3.73%)
    Clients 4 WallTime           0.38 (  0.00%)      0.37 (  1.58%)
    Clients 1 Flt/sec/cpu   497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients 2 Flt/sec/cpu   414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients 4 Flt/sec/cpu   257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients 1 Flt/sec       495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients 2 Flt/sec       820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients 4 Flt/sec      1020068.93 (  0.00%) 1022674.06 ( 0.26%)

    MMTests Statistics: duration
    Sys Time Running Test (seconds)            135.68    132.17
    User+Sys Time Running Test (seconds)        164.2    160.13
    Total Elapsed Time (seconds)               123.46    120.87

    The overall improvement is small but the System CPU time is much
    improved and roughly in correlation to what oprofile reported (these
    performance figures are without profiling so skew is expected). The
    actual number of page faults is noticeably improved.

    For benchmarks like kernel builds, the overall benefit is marginal but
    the system CPU time is slightly reduced.

    To test the actual bug the commit fixed I opened two terminals. The
    first ran within a cpuset and continually ran a small program that
    faulted 100M of anonymous data. In a second window, the nodemask of the
    cpuset was continually randomised in a loop.

    Without the commit, the program would fail every so often (usually
    within 10 seconds) and obviously with the commit everything worked fine.
    With this patch applied, it also worked fine so the fix should be
    functionally equivalent.

    Signed-off-by: Mel Gorman
    Cc: Miao Xie
    Cc: David Rientjes
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

03 Feb, 2012

1 commit

  • The argument is not used at all, and it's not necessary, because
    a specific callback handler of course knows which subsys it
    belongs to.

    Now only ->populate() takes this argument, because the handlers of
    this callback always call cgroup_add_file()/cgroup_add_files().

    So we reduce a few lines of code, though the shrinking of object size
    is minimal.

    16 files changed, 113 insertions(+), 162 deletions(-)

       text    data     bss      dec    hex filename
    5486240  656987 7039960 13183187 c928d3 vmlinux.o.orig
    5486170  656987 7039960 13183117 c9288d vmlinux.o
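
    The signature change, sketched diff-style on the ->create() callback
    (illustrative):

        -	struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
        -					      struct cgroup *cgrp);
        +	struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);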

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

10 Jan, 2012

1 commit

  • * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    cgroup: fix to allow mounting a hierarchy by name
    cgroup: move assignement out of condition in cgroup_attach_proc()
    cgroup: Remove task_lock() from cgroup_post_fork()
    cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
    cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
    cgroup: only need to check oldcgrp==newgrp once
    cgroup: remove redundant get/put of task struct
    cgroup: remove redundant get/put of old css_set from migrate
    cgroup: Remove unnecessary task_lock before fetching css_set on migration
    cgroup: Drop task_lock(parent) on cgroup_fork()
    cgroups: remove redundant get/put of css_set from css_set_check_fetched()
    resource cgroups: remove bogus cast
    cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
    cgroup, cpuset: don't use ss->pre_attach()
    cgroup: don't use subsys->can_attach_task() or ->attach_task()
    cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
    cgroup: improve old cgroup handling in cgroup_attach_proc()
    cgroup: always lock threadgroup during migration
    threadgroup: extend threadgroup_lock() to cover exit and exec
    threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
    ...

    Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
    fix a css_set not found bug in cgroup_attach_proc" that already
    mentioned that the bug is fixed (differently) in Tejun's cgroup
    patchset. This one, in other words.

    Linus Torvalds
     

21 Dec, 2011

1 commit

  • Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
    nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
    new set of allowed cpuset nodes where the two nodemasks, as a result of
    the remap, are now disjoint.

    c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
    cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
    nodes from changing for a thread. This causes any update to a set of
    allowed nodes to stall until put_mems_allowed() is called.

    This stall is unnecessary, however, if at least one node remains unchanged
    in the update to the set of allowed nodes. This was addressed by
    89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
    node remains set"), but it's still possible that an empty nodemask may be
    read from a mempolicy because the old nodemask may be remapped to the new
    nodemask during rebind. To prevent this, only avoid the stall if there is
    no mempolicy for the thread being changed.

    This is a temporary solution until all reads from mempolicy nodemasks can
    be guaranteed to not be empty without the get_mems_allowed()
    synchronization.

    Also moves the check for nodemask intersection inside task_lock() so that
    tsk->mems_allowed cannot change. This ensures that nothing can set this
    tsk's mems_allowed out from under us and also protects tsk->mempolicy.
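
    A hedged sketch of the resulting check, taken inside task_lock() as
    described above (task_has_mempolicy() shown as a trivial accessor):

        bool need_loop;

        task_lock(tsk);
        /* stall only for a mempolicy or fully disjoint nodemasks */
        need_loop = task_has_mempolicy(tsk) ||
                    !nodes_intersects(*newmems, tsk->mems_allowed);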

    Reported-by: Miao Xie
    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Paul Menage
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

13 Dec, 2011

3 commits

  • ->pre_attach() is supposed to be called before migration, which is
    observed during process migration but task migration does it the other
    way around. The only ->pre_attach() user is cpuset which can do the
    same operations in ->can_attach(). Collapse cpuset_pre_attach() into
    cpuset_can_attach().

    -v2: Patch contamination from later patch removed. Spotted by Paul
    Menage.

    Signed-off-by: Tejun Heo
    Reviewed-by: Frederic Weisbecker
    Acked-by: Paul Menage
    Cc: Li Zefan

    Tejun Heo
     
  • Now that subsys->can_attach() and attach() take @tset instead of
    @task, they can handle per-task operations. Convert
    ->can_attach_task() and ->attach_task() users to use ->can_attach()
    and ->attach() instead. Most conversions are straightforward.
    Noteworthy changes are,

    * In cgroup_freezer, remove unnecessary NULL assignments to unused
    methods. It's useless and very prone to get out of sync, which
    already happened.

    * In cpuset, PF_THREAD_BOUND test is checked for each task. This
    doesn't make any practical difference but is conceptually cleaner.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Frederic Weisbecker
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: James Morris
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Tejun Heo
     
  • Currently, there's no way to pass multiple tasks to cgroup_subsys
    methods necessitating the need for separate per-process and per-task
    methods. This patch introduces cgroup_taskset which can be used to
    pass multiple tasks and their associated cgroups to cgroup_subsys
    methods.

    Three methods - can_attach(), cancel_attach() and attach() - are
    converted to use cgroup_taskset. This unifies passed parameters so
    that all methods have access to all information. Conversions in this
    patchset are identical and don't introduce any behavior change.
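
    A minimal sketch of consuming a taskset in ->can_attach(), using the
    first/next iterator pair (arguments simplified):

        static int example_can_attach(struct cgroup *cgrp,
                                      struct cgroup_taskset *tset)
        {
                struct task_struct *task;

                /* every task being migrated in this attach operation */
                for (task = cgroup_taskset_first(tset); task;
                     task = cgroup_taskset_next(tset)) {
                        if (task->flags & PF_THREAD_BOUND)
                                return -EINVAL;
                }
                return 0;
        }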

    -v2: documentation updated as per Paul Menage's suggestion.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Frederic Weisbecker
    Acked-by: Paul Menage
    Acked-by: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: James Morris

    Tejun Heo
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

03 Nov, 2011

1 commit

  • {get,put}_mems_allowed() exist so that general kernel code may locklessly
    access a task's set of allowable nodes without having the chance that a
    concurrent write will cause the nodemask to be empty on configurations
    where MAX_NUMNODES > BITS_PER_LONG.

    This could incur a significant delay, however, especially in low memory
    conditions because the page allocator is blocking and reclaim requires
    get_mems_allowed() itself. It is not atypical to see writes to
    cpuset.mems take over 2 seconds to complete, for example. In low memory
    conditions, this is problematic because it's one of the most important
    times to change cpuset.mems in the first place!

    The only way a task's set of allowable nodes may change is through
    cpusets, by writing to cpuset.mems or by attaching the task to a
    different cpuset. The update is done by first setting all the new
    nodes, ensuring generic code is not reading the nodemask with
    get_mems_allowed() at the same time, and then clearing all the old
    nodes. This prevents the possibility that a reader will see an empty
    nodemask at the same time the writer is storing a new nodemask.

    If at least one node remains unchanged, though, it's possible to simply
    set all new nodes and then clear all the old nodes. Changing a task's
    nodemask is protected by cgroup_mutex so it's guaranteed that two threads
    are not changing the same task's nodemask at the same time, so the
    nodemask is guaranteed to be stored before another thread changes it and
    determines whether a node remains set or not.
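
    The write-side protocol described above, sketched (the mask is never
    observably empty):

        if (nodes_intersects(*newmems, tsk->mems_allowed)) {
                /* fast path: set new nodes, then clear old ones; at
                 * least one node is set at every instant */
                nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
                tsk->mems_allowed = *newmems;
        } else {
                /* disjoint masks: fall back to the stalling update */
        }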

    Signed-off-by: David Rientjes
    Cc: Miao Xie
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

27 Jul, 2011

2 commits

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     
  • [ This patch has already been accepted as commit 0ac0c0d0f837 but later
    reverted (commit 35926ff5fba8) because it introduced arch specific
    __node_random which was defined only for x86 code so it broke other
    archs. This is a followup without any arch specific code. Other than
    that there are no functional changes.]

    Some workloads that create a large number of small files tend to assign
    too many pages to node 0 (multi-node systems). Part of the reason is
    that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
    at node 0 for newly created tasks.

    This patch changes the rotor to be initialized to a random node number
    of the cpuset.
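
    A hedged sketch of the initialization (node_random() per the
    arch-independent follow-up; exact placement differs in mainline):

        /* Start the mem-spread rotor at a random allowed node instead
         * of always starting at node 0. */
        if (current->cpuset_mem_spread_rotor == NUMA_NO_NODE)
                current->cpuset_mem_spread_rotor =
                        node_random(&current->mems_allowed);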

    [akpm@linux-foundation.org: fix layout]
    [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
    [mhocko@suse.cz: Make it arch independent]
    [akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
    Signed-off-by: Jack Steiner
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Michal Hocko
    Reviewed-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Paul Menage
    Cc: Jack Steiner
    Cc: Robin Holt
    Cc: David Rientjes
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Jack Steiner
    Cc: KOSAKI Motohiro
    Cc: Lee Schermerhorn
    Cc: Michal Hocko
    Cc: Paul Menage
    Cc: Pekka Enberg
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

27 May, 2011

2 commits

  • The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
    leads to some problems:

    * cgroup creation is out-of-control
    * cgroup name can conflict when pids are looping
    * it is not possible to have a single process handling a lot of
    namespaces without falling into exponential creation time
    * we may want to create a namespace without creating a cgroup

    The ns_cgroup was replaced by a compatibility flag 'clone_children',
    where a newly created cgroup will copy the parent cgroup's values.
    Userspace has to manually create a cgroup and add a task to
    the 'tasks' file.

    This patch removes the ns_cgroup as suggested in the following thread:

    https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

    The 'cgroup_clone' function is removed because it is no longer used.

    This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify
    ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
    printk warning users that the feature is planned for removal. Since that
    time we have heard from XXX users who were affected by this.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Cc: Jamal Hadi Salim
    Reviewed-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     
  • Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

    Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
    for the cgroup subsystem interface. Unlike can_attach and attach, these
    are for per-thread operations, to be called potentially many times when
    attaching an entire threadgroup.

    Also, the old "bool threadgroup" interface is removed, as replaced by
    this. All subsystems are modified for the new interface - of note is
    cpuset, which requires from/to nodemasks for attach to be globally scoped
    (though per-cpuset would work too) to persist from its pre_attach to
    attach_task and attach.
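
    The new hooks, roughly as they sit in the subsystem interface (a
    sketch of the signatures):

        struct cgroup_subsys {
                /* ... */
                int  (*can_attach_task)(struct cgroup *cgrp,
                                        struct task_struct *tsk);
                void (*pre_attach)(struct cgroup *cgrp);
                void (*attach_task)(struct cgroup *cgrp,
                                    struct task_struct *tsk);
                /* ... */
        };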

    This is a pre-patch for cgroup-procs-writable.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

11 Apr, 2011

1 commit

  • Remove the SD_LV_ enum and use dynamic level assignments.

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110407122942.969433965@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

24 Mar, 2011

4 commits

  • Changing cpuset->mems/cpuset->cpus should be protected under
    callback_mutex.

    cpuset_clone() doesn't follow this rule. It's ok because it's
    called when creating and initializing a cgroup, but we'd better
    hold the lock to avoid a subtle break in the future.
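
    The fix essentially brackets the copies (sketch):

        mutex_lock(&callback_mutex);
        cs->mems_allowed = parent->mems_allowed;
        cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
        mutex_unlock(&callback_mutex);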

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Those functions that use NODEMASK_ALLOC() can't propagate errno
    to users, but will fail silently.

    Fix it by using a static nodemask_t variable for each function, and
    those variables are protected by cgroup_mutex.

    [akpm@linux-foundation.org: fix comment spelling, strengthen cgroup_lock comment]
    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • oldcs->mems_allowed is not modified during cpuset_attach(), so we don't
    have to copy it to a buffer allocated by NODEMASK_ALLOC(). Just pass it
    to cpuset_migrate_mm().

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • It's not necessary to copy cpuset->mems_allowed to a buffer allocated by
    NODEMASK_ALLOC(). Just pass it to nodelist_scnprintf().

    As spotted by Paul, a side effect is that we fix a bug where the
    function could return -ENOMEM but the caller doesn't expect a negative
    return value. Therefore change the return value of cpuset_sprintf_cpulist()
    and cpuset_sprintf_memlist() from int to size_t.

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

21 Oct, 2010

1 commit

  • Security modules shouldn't change the sched_param parameter of
    security_task_setscheduler(). This is not only meaningless, but can
    also produce a harmful result if the caller passes a static variable.

    This patch removes the policy and sched_param parameters from
    security_task_setscheduler() because none of the security modules
    use them.
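
    The interface change, sketched diff-style:

        -int security_task_setscheduler(struct task_struct *p,
        -                               int policy, struct sched_param *lp);
        +int security_task_setscheduler(struct task_struct *p);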

    Cc: James Morris
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: James Morris

    KOSAKI Motohiro
     

07 Aug, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
    sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
    sched: No need for bootmem special cases
    sched: Revert nohz_ratelimit() for now
    sched: Reduce update_group_power() calls
    sched: Update rq->clock for nohz balanced cpus
    sched: Fix spelling of sibling
    sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
    sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
    sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
    sched: thread_group_cputime: Simplify, document the "alive" check
    sched: Remove the obsolete exit_state/signal hacks
    sched: task_tick_rt: Remove the obsolete ->signal != NULL check
    sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
    sched: Fix comments to make them DocBook happy
    sched: Fix fix_small_capacity
    powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
    powerpc: Enable asymmetric SMT scheduling on POWER7
    sched: Add asymmetric group packing option for sibling domain
    sched: Fix capacity calculations for SMT4
    sched: Change nohz idle load balancing logic to push model
    ...

    Linus Torvalds
     

22 Jun, 2010

1 commit

  • Commit 3a101d05 (sched: adjust when cpu_active and cpuset
    configurations are updated during cpu on/offlining) added
    hotplug notifiers marked with __cpuexit; however, ia64 drops
    text in __cpuexit during link unlike x86.

    This means that functions which are referenced during init but used
    only for cpu hot unplugging afterwards shouldn't be marked with
    __cpuexit. Drop __cpuexit from those functions.
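
    Diff-style, the change is essentially (function name illustrative):

        -static int __cpuexit cpuset_cpu_inactive(...)
        +static int cpuset_cpu_inactive(...)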

    Reported-by: Tony Luck
    Signed-off-by: Tejun Heo
    Acked-by: Tony Luck
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

17 Jun, 2010

1 commit