04 Apr, 2014

3 commits

  • Merge first patch-bomb from Andrew Morton:
    - Various misc bits
    - kmemleak fixes
    - small befs, codafs, cifs, efs, freevxfs, hfsplus, minixfs, reiserfs things
    - fanotify
    - I appear to have become SuperH maintainer
    - ocfs2 updates
    - direct-io tweaks
    - a bit of the MM queue
    - printk updates
    - MAINTAINERS maintenance
    - some backlight things
    - lib/ updates
    - checkpatch updates
    - the rtc queue
    - nilfs2 updates
    - Small Documentation/ updates

    * emailed patches from Andrew Morton : (237 commits)
    Documentation/SubmittingPatches: remove references to patch-scripts
    Documentation/SubmittingPatches: update some dead URLs
    Documentation/filesystems/ntfs.txt: remove changelog reference
    Documentation/kmemleak.txt: updates
    fs/reiserfs/super.c: add __init to init_inodecache
    fs/reiserfs: move prototype declaration to header file
    fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache()
    fs/hfsplus/extents.c: fix concurrent access of alloc_blocks
    fs/hfsplus/extents.c: remove unused variable in hfsplus_get_block
    nilfs2: update project's web site in nilfs2.txt
    nilfs2: update MAINTAINERS file entries fix
    nilfs2: verify metadata sizes read from disk
    nilfs2: add FITRIM ioctl support for nilfs2
    nilfs2: add nilfs_sufile_trim_fs to trim clean segs
    nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
    nilfs2: add nilfs_sufile_set_suinfo to update segment usage
    nilfs2: add struct nilfs_suinfo_update and flags
    nilfs2: update MAINTAINERS file entries
    fs/coda/inode.c: add __init to init_inodecache()
    BEFS: logging cleanup
    ...

    Linus Torvalds
     
  • Since put_mems_allowed() is strictly optional (it's a seqcount retry), we
    don't need to evaluate the function if the allocation was in fact
    successful, saving an smp_rmb, some loads and comparisons on some
    relatively fast paths.

    Since the naming, get/put_mems_allowed(), does suggest a mandatory
    pairing, rename the interface, as suggested by Mel, to resemble the
    seqcount interface.

    This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
    where it is important to note that the return value of the latter call
    is inverted from its previous incarnation.
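
    The begin/retry pairing described above follows the usual seqcount
    read-side pattern. A minimal userspace sketch, assuming a hypothetical
    struct mems_state with a plain counter instead of the kernel's seqcount
    and no memory barriers; every name other than read_mems_allowed_begin()
    and read_mems_allowed_retry() is made up for illustration:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state; not the kernel's layout. */
struct mems_state {
	unsigned int seq;       /* bumped by the writer on every update */
	unsigned long allowed;  /* toy nodemask, one bit per node */
};

/* Snapshot the sequence number before reading the mask. */
static unsigned int read_mems_allowed_begin(const struct mems_state *s)
{
	return s->seq;
}

/* Returns true when a writer intervened and the read should be retried. */
static bool read_mems_allowed_retry(const struct mems_state *s,
				    unsigned int seq)
{
	return s->seq != seq;
}

/* Toy allocator: succeeds only if the requested node is in the mask. */
static bool try_alloc(const struct mems_state *s, int node)
{
	return (s->allowed >> node) & 1UL;
}

/* The retry check is only evaluated when the allocation failed. */
static bool alloc_on_node(const struct mems_state *s, int node)
{
	unsigned int seq;
	bool ok;

	do {
		seq = read_mems_allowed_begin(s);
		ok = try_alloc(s, node);
	} while (!ok && read_mems_allowed_retry(s, seq));

	return ok;
}
```

    Because of the short-circuit in the loop condition, a successful
    allocation never pays for the retry check, which is the saving the
    message refers to.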

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull cgroup updates from Tejun Heo:
    "A lot of updates for cgroup:

    - The biggest one is cgroup's conversion to kernfs. cgroup took
    after the long abandoned vfs-entangled sysfs implementation and
    made it even more convoluted over time. cgroup's internal objects
    were fused with vfs objects which also brought in vfs locking and
    object lifetime rules. Naturally, there are places where vfs rules
    don't fit and nasty hacks, such as credential switching or lock
    dance interleaving inode mutex and cgroup_mutex with object serial
    number comparison thrown in to decide whether the operation is
    actually necessary, needed to be employed.

    After conversion to kernfs, internal object lifetime and locking
    rules are mostly isolated from vfs interactions allowing shedding
    of several nasty hacks and overall simplification. This will also
    allow implementation of operations which may affect multiple cgroups
    which weren't possible before as it would have required nesting
    i_mutexes.

    - Various simplifications including dropping of module support,
    easier cgroup name/path handling, simplified cgroup file type
    handling and task_cg_lists optimization.

    - Preparatory changes for the planned unified hierarchy, which is still
    a patchset away from being actually operational. The dummy
    hierarchy is updated to serve as the default unified hierarchy.
    Controllers which aren't claimed by other hierarchies are
    associated with it, which BTW was what the dummy hierarchy was for
    anyway.

    - Various fixes from Li and others. This pull request includes some
    patches to add missing slab.h to various subsystems. This was
    triggered by xattr.h include removal from cgroup.h. cgroup.h
    indirectly got included in a lot of files which brought in xattr.h
    which brought in slab.h.

    There are several merge commits - one to pull in kernfs updates
    necessary for converting cgroup (already in upstream through
    driver-core), others for interfering changes in the fixes branch"

    * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
    cgroup: remove useless argument from cgroup_exit()
    cgroup: fix spurious lockdep warning in cgroup_exit()
    cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
    cgroup: break kernfs active_ref protection in cgroup directory operations
    cgroup: fix cgroup_taskset walking order
    cgroup: implement CFTYPE_ONLY_ON_DFL
    cgroup: make cgrp_dfl_root mountable
    cgroup: drop const from @buffer of cftype->write_string()
    cgroup: rename cgroup_dummy_root and related names
    cgroup: move ->subsys_mask from cgroupfs_root to cgroup
    cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
    cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
    cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
    cgroup: reorganize cgroup bootstrapping
    cgroup: relocate setting of CGRP_DEAD
    cpuset: use rcu_read_lock() to protect task_cs()
    cgroup_freezer: document freezer_fork() subtleties
    cgroup: update cgroup_transfer_tasks() to either succeed or fail
    cgroup: drop task_lock() protection around task->cgroups
    cgroup: update how a newly forked task gets associated with css_set
    ...

    Linus Torvalds
     

19 Mar, 2014

1 commit

  • cftype->write_string() just passes on the writeable buffer from kernfs
    and there's no reason to add const restriction on the buffer. The
    only thing const achieves is unnecessarily complicating parsing of the
    buffer. Drop const from @buffer.
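
    One way a writeable buffer simplifies parsing is that a handler can
    tokenize it in place. A small sketch, assuming a hypothetical helper
    split_in_place() that is not part of the cgroup API:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: splits a comma-separated buffer in place by
 * overwriting each ',' with '\0', which requires a non-const buffer.
 * Stores up to max token pointers in out and returns how many were stored.
 */
static size_t split_in_place(char *buffer, char **out, size_t max)
{
	size_t n = 0;
	char *p = buffer;

	while (p && *p && n < max) {
		char *comma = strchr(p, ',');

		out[n++] = p;
		if (!comma)
			break;
		*comma = '\0';   /* terminate the current token */
		p = comma + 1;   /* continue after the separator */
	}
	return n;
}
```

    With a const buffer the same parsing would need a copy or manual
    length tracking for every token.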

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki

    Tejun Heo
     

04 Mar, 2014

1 commit


27 Feb, 2014

2 commits

  • It's not safe to access task's cpuset after releasing task_lock().
    Holding callback_mutex won't help.

    Cc:
    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     
  • I can trigger a lockdep warning:

    # mount -t cgroup -o cpuset xxx /cgroup
    # mkdir /cgroup/cpuset
    # mkdir /cgroup/tmp
    # echo 0 > /cgroup/tmp/cpuset.cpus
    # echo 0 > /cgroup/tmp/cpuset.mems
    # echo 1 > /cgroup/tmp/cpuset.memory_migrate
    # echo $$ > /cgroup/tmp/tasks
    # echo 1 > /cgroup/tmp/cpuset.mems

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.14.0-rc1-0.1-default+ #32 Not tainted
    -------------------------------
    include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage!
    ...
    [] dump_stack+0x72/0x86
    [] lockdep_rcu_suspicious+0x101/0x140
    [] cpuset_migrate_mm+0xb1/0xe0
    ...

    We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now
    we hold cpuset_mutex, which causes task_css() to complain.

    This is not a false-positive but a real issue.

    Holding cpuset_mutex won't prevent a task from migrating to another
    cpuset, and it won't prevent the original task->cgroup from being
    destroyed during this change.

    Fixes: 5d21cc2db040 (cpuset: replace cgroup_mutex locking with cpuset internal locking)
    Cc: # 3.9+
    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

13 Feb, 2014

4 commits

  • cgroup_taskset_cur_css() will be removed during the planned
    restructuring of migration path. The only use of
    cgroup_taskset_cur_css() is finding out the old cgroup_subsys_state of
    the leader in cpuset_attach(). This usage can easily be removed by
    remembering the old value from cpuset_can_attach().

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • If !NULL, @skip_css makes cgroup_taskset_for_each() skip the matching
    css. The intention of the interface is to make it easy to skip css's
    (cgroup_subsys_states) which already match the migration target;
    however, this is entirely unnecessary as migration taskset doesn't
    include tasks which are already in the target cgroup. Drop @skip_css
    from cgroup_taskset_for_each().

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann

    Tejun Heo
     
  • Now that css_task_iter_start/next/end() supports blocking while
    iterating, there's no reason to use css_scan_tasks() which is more
    cumbersome to use and scheduled to be removed.

    Convert all css_scan_tasks() usages in cpuset to
    css_task_iter_start/next/end(). This simplifies the code by removing
    heap allocation and callbacks.
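
    The start/next/end shape lets the caller keep its loop body inline
    instead of packing state into a callback. A userspace sketch of that
    shape, assuming hypothetical struct task and struct task_iter types
    that only mimic the naming of the css_task_iter interface:

```c
#include <stddef.h>

/* Hypothetical toy types; the kernel iterator is considerably richer. */
struct task {
	int pid;
	struct task *next;
};

struct task_iter {
	struct task *pos;
};

static void task_iter_start(struct task *head, struct task_iter *it)
{
	it->pos = head;
}

/* Returns the current task and advances, or NULL when exhausted. */
static struct task *task_iter_next(struct task_iter *it)
{
	struct task *t = it->pos;

	if (t)
		it->pos = t->next;
	return t;
}

static void task_iter_end(struct task_iter *it)
{
	it->pos = NULL;
}

/* Caller-side loop: local state (sum) lives on the stack, no callback. */
static int sum_pids(struct task *head)
{
	struct task_iter it;
	struct task *t;
	int sum = 0;

	task_iter_start(head, &it);
	while ((t = task_iter_next(&it)))
		sum += t->pid;
	task_iter_end(&it);

	return sum;
}
```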

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • cgroup_task_count() read-locks css_set_lock and walks all tasks to
    count them and then returns the result. The only thing all the users
    want is determining whether the cgroup is empty or not. This patch
    implements cgroup_has_tasks() which tests whether cgroup->cset_links
    is empty, replaces all cgroup_task_count() usages and unexports it.

    Note that the test isn't synchronized. This is the same as before.
    The test has always been racy.
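
    The difference between counting and an emptiness test can be sketched
    in userspace as follows. This assumes a toy circular list modelled on
    the kernel's list_head; the counting helper walks links rather than
    tasks and is purely illustrative:

```c
#include <stdbool.h>

/* Toy circular doubly linked list in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Hypothetical, stripped-down cgroup with only the field of interest. */
struct cgroup {
	struct list_head cset_links;
};

/* O(n): walks every link just to produce a number. */
static int cgroup_link_count(const struct cgroup *cgrp)
{
	const struct list_head *p;
	int n = 0;

	for (p = cgrp->cset_links.next; p != &cgrp->cset_links; p = p->next)
		n++;
	return n;
}

/* O(1): the only question callers ask is "empty or not?". */
static bool cgroup_has_tasks(const struct cgroup *cgrp)
{
	return cgrp->cset_links.next != &cgrp->cset_links;
}
```

    Neither helper takes a lock here, which mirrors the note above that
    the test is unsynchronized.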

    This will help planned css_set locking update.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki

    Tejun Heo
     

12 Feb, 2014

1 commit

  • cgroup->name handling became quite complicated over time involving
    dedicated struct cgroup_name for RCU protection. Now that cgroup is
    on kernfs, we can drop all of it and simply use kernfs_name/path() and
    friends. Replace cgroup->name and all related code with kernfs
    name/path constructs.

    * Reimplement cgroup_name() and cgroup_path() as thin wrappers on top
    of kernfs counterparts, which involves semantic changes.
    pr_cont_cgroup_name() and pr_cont_cgroup_path() added.

    * cgroup->name handling dropped from cgroup_rename().

    * All users of cgroup_name/path() updated to the new semantics. Users
    which were formatting the string just to printk them are converted
    to use pr_cont_cgroup_name/path() instead, which simplifies things
    quite a bit. As cgroup_name() no longer requires RCU read lock
    around it, RCU lockings which were protecting only cgroup_name() are
    removed.

    v2: Comment above oom_info_lock updated as suggested by Michal.

    v3: dummy_top doesn't have a kn associated and
    pr_cont_cgroup_name/path() ended up calling the matching kernfs
    functions with NULL kn leading to oops. Test for NULL kn and
    print "/" if so. This issue was reported by Fengguang Wu.

    v4: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to
    cgroup_idr with cgroup_mutex").

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Acked-by: Michal Hocko
    Acked-by: Li Zefan
    Cc: Fengguang Wu
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki

    Tejun Heo
     

08 Feb, 2014

1 commit

  • cgroup_subsys is a bit messier than it needs to be.

    * The name of a subsys can be different from its internal identifier
    defined in cgroup_subsys.h. Most subsystems use the matching name
    but three - cpu, memory and perf_event - use different ones.

    * cgroup_subsys_id enums are postfixed with _subsys_id and each
    cgroup_subsys is postfixed with _subsys. cgroup.h is widely
    included throughout various subsystems; it doesn't and shouldn't
    have a claim on such generic names which don't have any qualifier
    indicating that they belong to cgroup.

    * cgroup_subsys->subsys_id should always equal the matching
    cgroup_subsys_id enum; however, we require each controller to
    initialize it and then BUG if they don't match, which is a bit
    silly.

    This patch cleans up cgroup_subsys names and initialization by doing
    the following.

    * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
    cgroup_subsys with _cgrp_subsys.

    * With the above, renaming subsys identifiers to match the userland
    visible names doesn't cause any naming conflicts. All non-matching
    identifiers are renamed to match the official names.

    cpu_cgroup -> cpu
    mem_cgroup -> memory
    perf -> perf_event

    * controllers no longer need to initialize ->subsys_id and ->name.
    They're generated in cgroup core and set automatically during boot.

    * Redundant cgroup_subsys declarations removed.

    * While updating BUG_ON()s in cgroup_init_early(), convert them to
    WARN()s. BUGging that early during boot is stupid - the kernel
    can't print anything, even through serial console and the trap
    handler doesn't even link stack frame properly for back-tracing.

    This patch doesn't introduce any behavior changes.
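
    Generating both the _cgrp_id enums and the names from a single list
    can be sketched with an X-macro. This is an illustration only: the
    SUBSYS_LIST macro and subsys_id_by_name() are hypothetical, and the
    kernel keeps its list in cgroup_subsys.h rather than in a macro like
    this one:

```c
#include <string.h>

/* Hypothetical stand-in for the subsystem list. */
#define SUBSYS_LIST(X) \
	X(cpuset)      \
	X(cpu)         \
	X(memory)      \
	X(perf_event)

/* Expand the list once into enum constants: cpuset_cgrp_id, ... */
#define X_ID(name) name##_cgrp_id,
enum cgroup_subsys_id {
	SUBSYS_LIST(X_ID)
	CGROUP_SUBSYS_COUNT
};
#undef X_ID

/* Expand it again into a name table indexed by the same ids. */
#define X_NAME(name) [name##_cgrp_id] = #name,
static const char *const cgroup_subsys_name[] = {
	SUBSYS_LIST(X_NAME)
};
#undef X_NAME

/* Look up an id by its userland-visible name, or -1 if unknown. */
static int subsys_id_by_name(const char *name)
{
	int i;

	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++)
		if (strcmp(cgroup_subsys_name[i], name) == 0)
			return i;
	return -1;
}
```

    Since ids and names come from the same expansion, they cannot drift
    apart, so no runtime consistency check is needed.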

    v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
    classid handling into core").

    Signed-off-by: Tejun Heo
    Acked-by: Neil Horman
    Acked-by: "David S. Miller"
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra
    Acked-by: Aristeu Rozanski
    Acked-by: Ingo Molnar
    Acked-by: Li Zefan
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Serge E. Hallyn
    Cc: Vivek Goyal
    Cc: Thomas Graf

    Tejun Heo
     

06 Dec, 2013

2 commits

  • In preparation of conversion to kernfs, cgroup file handling is
    updated so that it can be easily mapped to kernfs. This patch
    replaces cftype->read_seq_string() with cftype->seq_show() which is
    not limited to single_open() operation and will map directly to
    kernfs seq_file interface.

    The conversions are mechanical. As ->seq_show() doesn't have @css and
    @cft, the functions which make use of them are converted to use
    seq_css() and seq_cft() respectively. In several occasions, e.g. if
    it has seq_string in its name, the function name is updated to fit the
    new method better.

    This patch does not introduce any behavior changes.

    Signed-off-by: Tejun Heo
    Acked-by: Aristeu Rozanski
    Acked-by: Vivek Goyal
    Acked-by: Michal Hocko
    Acked-by: Daniel Wagner
    Acked-by: Li Zefan
    Cc: Jens Axboe
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Neil Horman

    Tejun Heo
     
  • In preparation of conversion to kernfs, cgroup file handling is being
    consolidated so that it can be easily mapped to the seq_file based
    interface of kernfs.

    All users of cftype->read() can be easily served, usually better, by
    seq_file and other methods. Rename cpuset_common_file_read() to
    cpuset_common_read_seq_string() and convert it to use
    read_seq_string() interface instead. This not only simplifies the
    code but also makes it more versatile. Before, the file couldn't
    output if the result is longer than PAGE_SIZE. After the conversion,
    seq_file automatically grows the buffer until the output can fit.

    This patch doesn't make any visible behavior changes except for being
    able to handle output larger than PAGE_SIZE.
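
    The "grow until it fits" behaviour can be approximated in userspace
    by re-running a formatter with a larger buffer whenever the output
    was truncated. A sketch under stated assumptions: show_fn,
    render_growing() and show_range() are hypothetical names, and the
    doubling policy is an assumption of this example rather than a
    statement about seq_file internals:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback: formats into buf, returns the length it needs. */
typedef int (*show_fn)(char *buf, size_t size, void *data);

/*
 * Calls show() repeatedly, doubling the buffer until the whole output
 * fits. Returns a malloc()ed string the caller must free, or NULL.
 */
static char *render_growing(show_fn show, void *data, size_t initial)
{
	size_t size = initial ? initial : 1;
	char *buf = malloc(size);

	while (buf) {
		int need = show(buf, size, data);
		char *bigger;

		if (need < 0) {
			free(buf);
			return NULL;
		}
		if ((size_t)need < size)
			return buf;     /* output and NUL both fit */

		size *= 2;
		bigger = realloc(buf, size);
		if (!bigger)
			free(buf);
		buf = bigger;
	}
	return NULL;
}

/* Example formatter: prints a CPU range ending at *data. */
static int show_range(char *buf, size_t size, void *data)
{
	return snprintf(buf, size, "cpus=0-%d", *(int *)data);
}
```

    A fixed single-page buffer would instead have to truncate or fail
    once the formatted list exceeded its size.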

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

28 Nov, 2013

1 commit

  • Juri hit the below lockdep report:

    [ 4.303391] ======================================================
    [ 4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
    [ 4.303394] 3.12.0-dl-peterz+ #144 Not tainted
    [ 4.303395] ------------------------------------------------------
    [ 4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
    [ 4.303399] (&p->mems_allowed_seq){+.+...}, at: [] new_slab+0x6c/0x290
    [ 4.303417]
    [ 4.303417] and this task is already holding:
    [ 4.303418] (&(&q->__queue_lock)->rlock){..-...}, at: [] blk_execute_rq_nowait+0x5b/0x100
    [ 4.303431] which would create a new lock dependency:
    [ 4.303432] (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
    [ 4.303436]

    [ 4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
    [ 4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
    [ 4.303922] HARDIRQ-ON-W at:
    [ 4.303923] [] __lock_acquire+0x65a/0x1ff0
    [ 4.303926] [] lock_acquire+0x93/0x140
    [ 4.303929] [] kthreadd+0x86/0x180
    [ 4.303931] [] ret_from_fork+0x7c/0xb0
    [ 4.303933] SOFTIRQ-ON-W at:
    [ 4.303933] [] __lock_acquire+0x68c/0x1ff0
    [ 4.303935] [] lock_acquire+0x93/0x140
    [ 4.303940] [] kthreadd+0x86/0x180
    [ 4.303955] [] ret_from_fork+0x7c/0xb0
    [ 4.303959] INITIAL USE at:
    [ 4.303960] [] __lock_acquire+0x344/0x1ff0
    [ 4.303963] [] lock_acquire+0x93/0x140
    [ 4.303966] [] kthreadd+0x86/0x180
    [ 4.303969] [] ret_from_fork+0x7c/0xb0
    [ 4.303972] }

    Which reports that we take mems_allowed_seq with interrupts enabled. A
    little digging found that this can only be from
    cpuset_change_task_nodemask().

    This is an actual deadlock because an interrupt doing an allocation will
    hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin
    forever waiting for the write side to complete.

    Cc: John Stultz
    Cc: Mel Gorman
    Reported-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Tested-by: Juri Lelli
    Acked-by: Li Zefan
    Acked-by: Mel Gorman
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Peter Zijlstra
     

04 Sep, 2013

1 commit

  • Pull cgroup updates from Tejun Heo:
    "A lot of activities on the cgroup front. Most changes aren't visible
    to userland at all at this point and are laying foundation for the
    planned unified hierarchy.

    - The biggest change is decoupling the lifetime management of css
    (cgroup_subsys_state) from that of cgroup's. Because controllers
    (cpu, memory, block and so on) will need to be dynamically enabled
    and disabled, css which is the association point between a cgroup
    and a controller may come and go dynamically across the lifetime of
    a cgroup. Till now, css's were created when the associated cgroup
    was created and stayed till the cgroup got destroyed.

    Assumptions around this tight coupling permeated through cgroup
    core and controllers. These assumptions are gradually removed,
    which constitutes the bulk of the patches, and css destruction path is
    completely decoupled from cgroup destruction path. Note that
    decoupling of creation path is relatively easy on top of these
    changes and the patchset is pending for the next window.

    - cgroup has its own event mechanism cgroup.event_control, which is
    only used by memcg. It is overly complex trying to achieve high
    flexibility whose benefits seem dubious at best. Going forward,
    new events will simply generate file modified event and the
    existing mechanism is being made specific to memcg. This pull
    request contains preparatory patches for such change.

    - Various fixes and cleanups"

    Fixed up conflict in kernel/cgroup.c as per Tejun.

    * 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits)
    cgroup: fix cgroup_css() invocation in css_from_id()
    cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp()
    cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup
    cgroup: implement CFTYPE_NO_PREFIX
    cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys
    cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax
    cgroup: fix cgroup_write_event_control()
    cgroup: fix subsystem file accesses on the root cgroup
    cgroup: change cgroup_from_id() to css_from_id()
    cgroup: use css_get() in cgroup_create() to check CSS_ROOT
    cpuset: remove an unnecessary forward declaration
    cgroup: RCU protect each cgroup_subsys_state release
    cgroup: move subsys file removal to kill_css()
    cgroup: factor out kill_css()
    cgroup: decouple cgroup_subsys_state destruction from cgroup destruction
    cgroup: replace cgroup->css_kill_cnt with ->nr_css
    cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item
    cgroup: move cgroup->subsys[] assignment to online_css()
    cgroup: reorganize css init / exit paths
    cgroup: add __rcu modifier to cgroup->subsys[]
    ...

    Linus Torvalds
     

21 Aug, 2013

1 commit

  • It's not allowed to clear masks of a cpuset if there're tasks in it,
    but it's broken:

    # mkdir /cgroup/sub
    # echo 0 > /cgroup/sub/cpuset.cpus
    # echo 0 > /cgroup/sub/cpuset.mems
    # echo $$ > /cgroup/sub/tasks
    # echo > /cgroup/sub/cpuset.cpus
    (should fail)

    This bug was introduced by commit 88fa523bff295f1d60244a54833480b02f775152
    ("cpuset: allow to move tasks to empty cpusets").

    tj: Dropped temp bool variables and nested the conditionals directly.
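
    The rule being restored, with the conditionals nested directly, can
    be sketched as below. This is a simplified illustration: struct
    cpuset_sketch, its bitmask fields and the -ENOSPC return value are
    assumptions of the example, not a copy of the kernel function:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, flattened view of a cpuset: plain bitmasks and a flag. */
struct cpuset_sketch {
	unsigned long cpus;
	unsigned long mems;
	bool has_tasks;
};

/*
 * Rejects a change that would empty a populated mask while tasks are
 * attached. Returns 0 if the change is acceptable.
 */
static int validate_change(const struct cpuset_sketch *cur,
			   const struct cpuset_sketch *trial)
{
	if (cur->has_tasks) {
		if (cur->cpus && !trial->cpus)
			return -ENOSPC;
		if (cur->mems && !trial->mems)
			return -ENOSPC;
	}
	return 0;
}
```

    An empty cpuset without tasks can still be modified freely, which
    keeps moving tasks into initially empty cpusets possible.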

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

14 Aug, 2013

1 commit


13 Aug, 2013

1 commit


09 Aug, 2013

11 commits

  • Previously, all css descendant iterators didn't include the origin
    (root of subtree) css in the iteration. The reasons were maintaining
    consistency with css_for_each_child() and that at the time of
    introduction more use cases needed skipping the origin anyway;
    however, given that css_is_descendant() considers self to be a
    descendant, omitting the origin css has become more confusing and
    looking at the accumulated use cases rather clearly indicates that
    including origin would result in simpler code overall.

    While this is a change which can easily lead to subtle bugs, cgroup
    API including the iterators has recently gone through major
    restructuring and no out-of-tree changes will be applicable without
    adjustments making this a relatively acceptable opportunity for this
    type of change.

    The conversions are mostly straight-forward. If the iteration block
    had explicit origin handling before or after, it's moved inside the
    iteration. If not, if (pos == origin) continue; is added. Some
    conversions add extra reference get/put around origin handling by
    consolidating origin handling and the rest. While the extra ref
    operations aren't strictly necessary, this shouldn't cause any
    noticeable difference.
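
    A pre-order walk that yields the origin first, plus the
    "if (pos == origin) continue;" idiom for callers that still want to
    skip it, might look like this in userspace. struct node and both
    functions are hypothetical and only mirror the shape of the css
    descendant iterators:

```c
#include <stddef.h>

/* Hypothetical tree node with parent, first-child and sibling links. */
struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
};

/* Pre-order successor of pos within the subtree rooted at origin. */
static struct node *next_descendant_pre(struct node *pos, struct node *origin)
{
	if (!pos)
		return origin;          /* the walk now starts at the origin */
	if (pos->first_child)
		return pos->first_child;

	/* No children: climb until a sibling exists or origin is reached. */
	while (pos != origin) {
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}

/* Caller that wants the old behaviour skips the origin explicitly. */
static int count_excluding_origin(struct node *origin)
{
	struct node *pos;
	int n = 0;

	for (pos = next_descendant_pre(NULL, origin); pos;
	     pos = next_descendant_pre(pos, origin)) {
		if (pos == origin)
			continue;
		n++;
	}
	return n;
}
```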

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Michal Hocko
    Cc: Jens Axboe
    Cc: Matt Helsley
    Cc: Johannes Weiner
    Cc: Balbir Singh

    Tejun Heo
     
  • cgroup is in the process of converting to css (cgroup_subsys_state)
    from cgroup as the principal subsystem interface handle. This is
    mostly to prepare for the unified hierarchy support where css's will
    be created and destroyed dynamically but also helps cleaning up
    subsystem implementations as css is usually what they are interested
    in anyway.

    cgroup_taskset which is used by the subsystem attach methods is the
    last cgroup subsystem API which isn't using css as the handle. Update
    cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and
    cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp.

    The conversions are pretty mechanical. One exception is
    cpuset::cgroup_cs(), which lost its last user and got removed.

    This patch shouldn't introduce any functional changes.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Daniel Wagner
    Cc: Ingo Molnar
    Cc: Matt Helsley
    Cc: Steven Rostedt

    Tejun Heo
     
  • cgroup is in the process of converting to css (cgroup_subsys_state)
    from cgroup as the principal subsystem interface handle. This is
    mostly to prepare for the unified hierarchy support where css's will
    be created and destroyed dynamically but also helps cleaning up
    subsystem implementations as css is usually what they are interested
    in anyway.

    This patch converts task iterators to deal with css instead of cgroup.
    Note that under unified hierarchy, different sets of tasks will be
    considered belonging to a given cgroup depending on the subsystem in
    question and making the iterators deal with css instead of cgroup
    provides them with enough information about the iteration.

    While at it, fix several function comment formats in cpuset.c.

    This patch doesn't introduce any behavior differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley

    Tejun Heo
     
  • cgroup_scan_tasks() takes a pointer to struct cgroup_scanner as its
    sole argument and the only function of that struct is packing the
    arguments of the function call which consist of five fields.
    It's not too unusual to pack parameters into a struct when the number
    of arguments gets excessive or the whole set needs to be passed around
    a lot, but neither holds here making it just weird.

    Drop struct cgroup_scanner and pass the params directly to
    cgroup_scan_tasks(). Note that struct cpuset_change_nodemask_arg was
    added to cpuset.c to pass both ->cs and ->newmems pointer to
    cpuset_change_nodemask() using single data pointer.

    This doesn't make any functional differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • cgroup is currently in the process of transitioning to using css
    (cgroup_subsys_state) as the primary handle instead of cgroup in
    subsystem API. For hierarchy iterators, this is beneficial because

    * In most cases, css is the only thing subsystems care about anyway.

    * On the planned unified hierarchy, iterations for different
    subsystems will need to skip over different subtrees of the
    hierarchy depending on which subsystems are enabled on each cgroup.
    Passing around css makes it unnecessary to explicitly specify the
    subsystem in question as css is the intersection between a cgroup
    and a subsystem.

    * For the planned unified hierarchy, css's would need to be created
    and destroyed dynamically independent from cgroup hierarchy. Having
    cgroup core manage css iteration makes enforcing deref rules a lot
    easier.

    Most subsystem conversions are straight-forward. Noteworthy changes
    are

    * blkio: cgroup_to_blkcg() is no longer used. Removed.

    * freezer: cgroup_freezer() is no longer used. Removed.

    * devices: cgroup_to_devcgroup() is no longer used. Removed.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe

    Tejun Heo
     
  • cgroup is currently in the process of transitioning to using struct
    cgroup_subsys_state * as the primary handle instead of struct cgroup.
    Please see the previous commit which converts the subsystem methods
    for rationale.

    This patch converts all cftype file operations to take @css instead of
    @cgroup. cftypes for the cgroup core files don't have their subsystem
    pointer set. These will automatically use the dummy_css added by the
    previous patch and can be converted the same way.

    Most subsystem conversions are straightforward but there are some
    interesting ones.

    * freezer: update_if_frozen() is also converted to take @css instead
    of @cgroup for consistency. This will make the code look simpler
    too once iterators are converted to use css.

    * memory/vmpressure: mem_cgroup_from_css() needs to be exported to
    vmpressure while mem_cgroup_from_cont() can be made static.
    Updated accordingly.

    * cpu: cgroup_tg() doesn't have any user left. Removed.

    * cpuacct: cgroup_ca() doesn't have any user left. Removed.

    * hugetlb: hugetlb_cgroup_from_cgroup() doesn't have any user left.
    Removed.

    * net_cls: cgrp_cls_state() doesn't have any user left. Removed.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Daniel Wagner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe
    Cc: Steven Rostedt

    Tejun Heo
     
  • cgroup is currently in the process of transitioning to using struct
    cgroup_subsys_state * as the primary handle instead of struct cgroup *
    in subsystem implementations for the following reasons.

    * With unified hierarchy, subsystems will be dynamically bound and
    unbound from cgroups and thus css's (cgroup_subsys_state) may be
    created and destroyed dynamically over the lifetime of a cgroup,
    which is different from the current state where all css's are
    allocated and destroyed together with the associated cgroup. This
    in turn means that cgroup_css() should be synchronized and may
    return NULL, making it more cumbersome to use.

    * Differing levels of per-subsystem granularity in the unified
    hierarchy means that the task and descendant iterators should behave
    differently depending on the specific subsystem the iteration is
    being performed for.

    * In the majority of cases, subsystems only care about their part in
    the cgroup hierarchy - i.e. the hierarchy of css's. Subsystem methods
    often obtain the matching css pointer from the cgroup and don't
    bother with the cgroup pointer itself. Passing around css fits
    much better.

    This patch converts all cgroup_subsys methods to take @css instead of
    @cgroup. The conversions are mostly straight-forward. A few
    noteworthy changes are

    * ->css_alloc() now takes css of the parent cgroup rather than the
    pointer to the new cgroup as the css for the new cgroup doesn't
    exist yet. Knowing the parent css is enough for all the existing
    subsystems.

    * In kernel/cgroup.c::offline_css(), unnecessary open coded css
    dereference is replaced with local variable access.

    This patch shouldn't cause any behavior differences.

    v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
    with local variable @css as suggested by Li Zefan.

    Rebased on top of new for-3.12 which includes for-3.11-fixes so
    that ->css_free() invocation added by da0a12caff ("cgroup: fix a
    leak when percpu_ref_init() fails") is converted too. Suggested
    by Li Zefan.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Daniel Wagner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe
    Cc: Steven Rostedt

    Tejun Heo
     
  • Currently, controllers have to explicitly follow the cgroup hierarchy
    to find the parent of a given css. cgroup is moving towards using
    cgroup_subsys_state as the main controller interface construct, so
    let's provide a way to climb the hierarchy using just csses.

    This patch implements css_parent() which, given a css, returns its
    parent. The function is guaranteed to return a valid non-NULL parent css as
    long as the target css is not at the top of the hierarchy.

    freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
    are converted to use css_parent() instead of accessing cgroup->parent
    directly.

    * __parent_ca() is dropped from cpuacct and its usage is replaced with
    parent_ca(). The only difference between the two was NULL test on
    cgroup->parent which is now embedded in css_parent() making the
    distinction moot. Note that eventually a css->parent field will be
    added to css and the NULL check in css_parent() will go away.
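
    The lookup described above can be sketched roughly as follows. This is
    a simplified user-space illustration, not the kernel code: the struct
    layouts, SKETCH_SUBSYS_COUNT, the subsys_id field placement and the
    sketch_css_parent() name are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define SKETCH_SUBSYS_COUNT 4

struct cgroup;

struct cgroup_subsys_state {
    struct cgroup *cgroup;   /* cgroup this css is attached to */
    int subsys_id;           /* index of the owning subsystem (assumed field) */
};

struct cgroup {
    struct cgroup *parent;   /* NULL at the top of the hierarchy */
    struct cgroup_subsys_state *subsys[SKETCH_SUBSYS_COUNT];
};

/*
 * Given a css, return the css of the same subsystem in the parent
 * cgroup, or NULL if @css already sits at the top of the hierarchy.
 */
static struct cgroup_subsys_state *
sketch_css_parent(struct cgroup_subsys_state *css)
{
    struct cgroup *parent_cgrp = css->cgroup->parent;

    if (!parent_cgrp)
        return NULL;
    return parent_cgrp->subsys[css->subsys_id];
}
```

    With a helper of this shape, a controller no longer needs separate
    NULL-checking and non-checking parent lookups, which is why the
    distinction between __parent_ca() and parent_ca() becomes moot.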

    This patch shouldn't cause any behavior differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • css (cgroup_subsys_state) is usually embedded in a subsys specific
    data structure. Subsystems either use container_of() directly to cast
    from css to such a data structure or have an accessor function wrapping
    such a cast. As cgroup as a whole is moving towards using css as the
    main interface handle, add and update such accessors to ease dealing
    with css's.

    All accessors explicitly handle NULL input and return NULL in those
    cases. While this looks like an extra branch in the code, as all
    controller-specific data structures have css as the first field, the
    casting doesn't involve any offsetting and the compiler can trivially
    optimize out the branch.

    * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such an
    accessor. Added.

    * memory, hugetlb and devices already had one but didn't explicitly
    handle NULL input. Updated.
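
    A minimal sketch of such a NULL-safe accessor, written as stand-alone
    user-space C. The sketch_container_of() macro, struct sketch_freezer
    and sketch_css_freezer() are hypothetical names for illustration only
    and are not the accessors added by this patch.

```c
#include <stddef.h>

/* Simplified equivalent of the kernel's container_of(), for illustration. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Placeholder for the real cgroup_subsys_state. */
struct cgroup_subsys_state {
    int placeholder;
};

/* Hypothetical controller-specific structure embedding the css. */
struct sketch_freezer {
    struct cgroup_subsys_state css;  /* first field, so its offset is 0 */
    unsigned int state;
};

/* Cast from css to the containing structure; NULL in, NULL out. */
static struct sketch_freezer *
sketch_css_freezer(struct cgroup_subsys_state *css)
{
    return css ? sketch_container_of(css, struct sketch_freezer, css) : NULL;
}
```

    Because css is the first member, the subtraction inside the macro is
    by zero, so the NULL test and the cast both reduce to returning the
    input pointer unchanged.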

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • cpuset uses "const" qualifiers on struct cpuset in some functions;
    however, it doesn't work well when a value derived from a returned
    const pointer has to be passed to an accessor. It's C after all.

    Drop the "const" qualifiers except for the trivially leaf ones. This
    patch doesn't make any functional changes.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • The names of the two struct cgroup_subsys_state accessors -
    cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
    The former clashes with the type name and the latter doesn't even
    indicate it's somehow related to cgroup.

    We're about to revamp a large portion of the cgroup API, so let's rename
    them so that they're less awkward. Most per-controller usages of the
    accessors are localized in accessor wrappers and given the amount of
    scheduled changes, this isn't gonna add any noticeable headache.

    Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
    to task_css(). This patch is a pure rename.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

31 Jul, 2013

1 commit


30 Jul, 2013

2 commits


03 Jul, 2013

1 commit

  • Pull cpuset changes from Tejun Heo:
    "cpuset has always been rather odd about its configurations - a cgroup
    right after creation didn't allow any task executions before
    configuration, changing configuration in the parent modifies the
    descendants irreversibly and so on. These behaviors are inherently
    nasty and almost hostile against sharing the hierarchy with other
    controllers making it very difficult to use in unified hierarchy.

    Li is currently in the process of updating the behaviors for
    __DEVEL__sane_behavior which is the bulk of changes in this pull
    request. It isn't complete yet and the behaviors will change further
    but all changes are gated behind sane_behavior. In the process, the
    rather hairy work-item punting which was used to work around the
    limitations of cgroup descendant iterator was simplified."

    * 'for-3.11-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cpuset: rename @cont to @cgrp
    cpuset: fix to migrate mm correctly in a corner case
    cpuset: allow to move tasks to empty cpusets
    cpuset: allow to keep tasks in empty cpusets
    cpuset: introduce effective_{cpumask|nodemask}_cpuset()
    cpuset: record old_mems_allowed in struct cpuset
    cpuset: remove async hotplug propagation work
    cpuset: let hotplug propagation work wait for task attaching
    cpuset: re-structure update_cpumask() a bit
    cpuset: remove cpuset_test_cpumask()
    cpuset: remove unnecessary variable in cpuset_attach()
    cpuset: cleanup guarantee_online_{cpus|mems}()
    cpuset: remove redundant check in cpuset_cpus_allowed_fallback()

    Linus Torvalds
     

19 Jun, 2013

1 commit

  • Most of the stuff from kernel/sched.c was moved to kernel/sched/core.c a long
    time ago and the comments/Documentation never got updated.

    I figured it out when I was going through sched-domains.txt and so thought of
    fixing it globally.

    I haven't cross-checked whether the stuff that is referenced in sched/core.c
    by all these files is still present and hasn't changed, as that wasn't the
    motive behind this patch.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/cdff76a265326ab8d71922a1db5be599f20aad45.1370329560.git.viresh.kumar@linaro.org
    Signed-off-by: Ingo Molnar

    Viresh Kumar
     

14 Jun, 2013

4 commits

  • Cont is short for container. Control group was named process container
    at first, but then people found that container already has a meaning in
    the Linux kernel.

    Clean up the leftover variable name @cont.

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     
  • Before moving tasks out of empty cpusets, update_tasks_nodemask()
    is called, which calls do_migrate_pages(xx, from, to). Then those
    tasks are moved to an ancestor, and do_migrate_pages() is called
    again.

    The first time: from = node_to_be_offlined, to = empty.
    The second time: from = empty, to = ancestor's nodemask.

    So it looks like no pages will be migrated.

    Fix this by:

    - Don't call update_tasks_nodemask() on empty cpusets.
    - Pass cs->old_mems_allowed to do_migrate_pages().
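
    The problem and the fix can be illustrated with a toy model. This is
    not do_migrate_pages() itself: sketch_nodemask, sketch_migrate_pages()
    and the "move to the lowest node in @to" policy are assumptions made
    only to show why an empty mask on either side migrates nothing.

```c
#include <stddef.h>

typedef unsigned long sketch_nodemask;   /* bit n set => node n is in the mask */

/*
 * Toy model: move every page that lives on a node in @from to the
 * lowest-numbered node in @to. Returns the number of pages moved.
 * page_node[i] holds the node id of page i.
 */
static size_t sketch_migrate_pages(int *page_node, size_t npages,
                                   sketch_nodemask from, sketch_nodemask to)
{
    size_t moved = 0;
    int target = -1;

    /* Pick the first node present in the destination mask. */
    for (int n = 0; n < (int)(8 * sizeof(to)); n++) {
        if (to & (1UL << n)) {
            target = n;
            break;
        }
    }

    /* An empty source or destination mask means there is nothing to do. */
    if (target < 0 || from == 0)
        return 0;

    for (size_t i = 0; i < npages; i++) {
        if ((from & (1UL << page_node[i])) && page_node[i] != target) {
            page_node[i] = target;
            moved++;
        }
    }
    return moved;
}
```

    In this model, calling it with (offlined node, empty) and then with
    (empty, ancestor's nodemask) moves no pages, while a single call with
    the remembered old mask as @from and the ancestor's nodemask as @to
    does.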

    v4: added comment in cpuset_hotplug_update_tasks() and rephrased comment
    in cpuset_attach().

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     
  • Currently some cpuset behaviors are not friendly when cpuset is co-mounted
    with other cgroup controllers.

    Now with this patchset, if cpuset is mounted with the sane_behavior option,
    it behaves differently:

    - Tasks will be kept in empty cpusets when hotplug happens and take
    masks of ancestors with non-empty cpus/mems, instead of being moved to
    an ancestor.

    - A task can be moved into an empty cpuset, and again it takes masks of
    ancestors, so the user can drop a task into a newly created cgroup without
    having to do anything for it.

    As tasks can reside in empty cpusets, here are some rules:

    - They can be moved to another cpuset, regardless of whether it's empty
    or not.

    - Though they take masks from ancestors, they take other configs from
    the empty cpuset.

    - If the ancestors' masks are changed, those tasks will also be updated
    to take new masks.
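
    The "take masks of ancestors" rule can be sketched as a walk up the
    hierarchy until a non-empty mask is found. struct sketch_cpuset and
    sketch_effective_cpumask_cpuset() below are simplified, hypothetical
    stand-ins that use a plain bitmask instead of a real cpumask.

```c
#include <stddef.h>

/* Hypothetical, simplified cpuset: only what the sketch needs. */
struct sketch_cpuset {
    struct sketch_cpuset *parent;   /* NULL for the top cpuset */
    unsigned long cpus_allowed;     /* bit n set => CPU n allowed; 0 = empty */
};

/*
 * Return the nearest cpuset, starting at @cs and walking towards the
 * root, whose cpus_allowed is non-empty. Tasks in an empty cpuset use
 * that cpuset's mask. The top cpuset is returned if nothing else matches.
 */
static struct sketch_cpuset *
sketch_effective_cpumask_cpuset(struct sketch_cpuset *cs)
{
    while (cs->parent && cs->cpus_allowed == 0)
        cs = cs->parent;
    return cs;
}
```

    Since the mask is looked up through the ancestor rather than copied,
    a later change to the ancestor's mask is what tasks in the empty
    descendants need to be updated against.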

    v2: add documentation in include/linux/cgroup.h

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     
  • To achieve this:

    - We call update_tasks_cpumask/nodemask() for empty cpusets when
    hotplug happens, instead of moving tasks out of them.

    - When a cpuset's masks are changed by writing cpuset.cpus/mems,
    we also update tasks in child cpusets which are empty.

    v3:
    - do propagation work in one place for both hotplug and unplug

    v2:
    - drop rcu_read_lock before calling update_task_nodemask() and
    update_task_cpumask(), instead of using workqueue.
    - add documentation in include/linux/cgroup.h

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan