13 Dec, 2011

1 commit

  • Currently, there's no way to pass multiple tasks to cgroup_subsys
    methods, which forces separate per-process and per-task methods.
    This patch introduces cgroup_taskset, which can be used to pass
    multiple tasks and their associated cgroups to cgroup_subsys
    methods.

    Three methods - can_attach(), cancel_attach() and attach() - are
    converted to use cgroup_taskset. This unifies passed parameters so
    that all methods have access to all information. Conversions in this
    patchset are identical and don't introduce any behavior change.

    -v2: documentation updated as per Paul Menage's suggestion.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Frederic Weisbecker
    Acked-by: Paul Menage
    Acked-by: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: James Morris

    Tejun Heo
     

21 Jul, 2011

1 commit


20 Jun, 2011

1 commit


27 May, 2011

1 commit

  • Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

    Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
    for the cgroups subsystem interface. Unlike can_attach and attach, these
    are for per-thread operations, to be called potentially many times when
    attaching an entire threadgroup.

    Also, the old "bool threadgroup" interface is removed, as it is replaced
    by these callbacks. All subsystems are modified for the new interface. Of
    note is cpuset, which requires the from/to nodemasks for attach to be
    globally scoped (though per-cpuset would work too) so that they persist
    from its pre_attach to attach_task and attach.

    This is a pre-patch for cgroup-procs-writable.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

06 May, 2010

1 commit


23 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
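
The first rule above (gfp.h if only gfp is used, slab.h if slab is used) can be sketched roughly as follows. This is not the slabh-sweep.py script itself: the identifier patterns and the helper name needed_include() are assumptions made for illustration, and the real script's detection rules are likely broader.

```python
import re

# Hypothetical, deliberately small identifier sets (assumptions for this
# sketch); the real script's detection rules are not reproduced here.
GFP_PATTERN = re.compile(r"\b(GFP_[A-Z_]+|__GFP_[A-Z_]+|gfp_t)\b")
SLAB_PATTERN = re.compile(r"\b(kmalloc|kzalloc|kcalloc|kfree|kmem_cache_\w+)\b")


def needed_include(source):
    """Return the header a source file should include, or None."""
    # Drop existing #include lines so they never count as usage.
    body = "\n".join(
        line for line in source.splitlines()
        if not line.lstrip().startswith("#include")
    )
    if SLAB_PATTERN.search(body):
        # Slab usage: slab.h, which in turn includes gfp.h.
        return "linux/slab.h"
    if GFP_PATTERN.search(body):
        # Only gfp usage: gfp.h is enough.
        return "linux/gfp.h"
    return None
```

Checking for slab usage first matters: a file that calls kmalloc() almost always mentions a GFP_* flag as well, and slab.h already covers that case.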

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

24 Sep, 2009

1 commit

  • Alter the ss->can_attach and ss->attach functions to be able to deal with
    a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a
    pre-patch to cgroup-procs-writable.patch.)

    Currently, the new mode of the attach function can only tell the
    subsystem about the old cgroup of the threadgroup leader. No subsystem
    currently needs that information for each thread that's being moved, but
    if one were to be added (for example, one that counts tasks within a
    group), this part would need to be reworked to tell the subsystem the
    right information.

    [hidave.darkstar@gmail.com: fix build]
    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Acked-by: Li Zefan
    Reviewed-by: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

19 Jun, 2009

1 commit


03 Apr, 2009

1 commit

  • There is nothing special that has to be protected by cgroup_lock,
    so introduce devcgroup_mutex for its own use.

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Acked-by: Serge Hallyn
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

09 Jan, 2009

2 commits

  • The devcgroup_inode_permission() hook in the devices whitelist cgroup has
    always bypassed access checks on fifos. But the mknod hook did not. The
    devices whitelist is only about block and char devices, and fifos can't
    even be added to the whitelist, so fifos can't be created at all except by
    tasks which have 'a' in their whitelist (meaning they have access to all
    devices).

    Fix the behavior by bypassing access checks to mkfifo.

    Signed-off-by: Serge E. Hallyn
    Cc: Li Zefan
    Cc: Pavel Emelyanov
    Cc: Paul Menage
    Cc: Lai Jiangshan
    Cc: KOSAKI Motohiro
    Cc: James Morris
    Reported-by: Daniel Lezcano
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
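
A minimal userspace sketch of the rule this fix describes, assuming the mknod hook simply skips the whitelist for anything that is not a block or char device. The function name is hypothetical and this is a Python model, not the kernel code.

```python
import stat


def mknod_needs_whitelist_check(mode):
    """Return True if a mknod with this mode should consult the whitelist."""
    # The whitelist is only about block and char devices, so FIFOs
    # (and any other non-device file type) bypass the check.
    return stat.S_ISBLK(mode) or stat.S_ISCHR(mode)
```

With this rule, a mkfifo request (S_IFIFO) is never checked, while S_IFCHR and S_IFBLK requests still are.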
     
  • We should use list_for_each_entry_rcu on the RCU read side.

    Signed-off-by: Lai Jiangshan
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

20 Oct, 2008

3 commits

  • Since we introduced RCU for the read side, spin_lock is used only for
    updates. But we always hold cgroup_lock() when updating, so spin_lock()
    is not needed.

    Additional cleanup:
    1) include linux/rcupdate.h explicitly
    2) remove unused variable cur_devcgroup in devcgroup_update_access()

    Signed-off-by: Lai Jiangshan
    Acked-by: "Serge E. Hallyn"
    Cc: Paul Menage
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • This saves 40 bytes on my x86_32 box.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

03 Sep, 2008

1 commit

  • During the use of a dev_cgroup, we should guarantee the corresponding
    cgroup won't be deleted (i.e. via rmdir). This can be done through
    css_get(&dev_cgroup->css), but here we can just get and use the dev_cgroup
    under rcu_read_lock.

    Also remove the NULL check on dev_cgroup; it won't be NULL since a task
    always belongs to a cgroup.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

26 Jul, 2008

3 commits

  • - clean up set_majmin()
    - use simple_strtoul() to parse major/minor

    [akpm@linux-foundation.org: fix simple_strtoul() usage]
    [kosaki.motohiro@jp.fujitsu.com: fix warnings]
    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Serge Hallyn
    Cc: Paul Menage
    Cc: Pavel Emelyanov
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Currently this list is protected with a simple spinlock, even for reading
    from one. This is OK, but can be better.

    Actually I want it to be better very much, since after replacing the
    OpenVZ device permissions engine with the cgroup-based one I noticed that
    we set 12 default device permissions for each newly created container (for
    /dev/null, full, terminals, etc.), and people sometimes have up to 20
    perms more, so traversing a list of ~30-40 elements under a spinlock
    doesn't seem very good.

    Here's the RCU protection for white-list - dev_whitelist_item-s are added
    and removed under the devcg->lock, but are looked up in permissions
    checking under the rcu_read_lock.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Balbir Singh
    Cc: Paul Menage
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This patch converts devcgroup_access_write() from a raw file handler
    into a handler for the cgroup write_string() method. This allows some
    boilerplate copying/locking/checking to be removed and simplifies the
    cleanup path, since these functions are performed by the cgroups
    framework before calling the handler.

    Signed-off-by: Paul Menage
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Acked-by: Serge Hallyn
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

14 Jul, 2008

2 commits

  • # cat devices.list
    c 1:3 r
    # echo 'c 1:3 w' > sub/devices.allow
    # cat sub/devices.list
    c 1:3 w

    As illustrated, the parent group has no write permission to /dev/null, so
    its child should not be allowed to add this write permission.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Serge Hallyn
    Cc: Paul Menage
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
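
The check implied above can be modelled roughly as follows. This is a simplified sketch, not the kernel implementation: entries are plain (type, major, minor, access) tuples, '*' stands for any major or minor, and the helper name parent_allows() is an assumption.

```python
def parent_allows(parent_whitelist, dev_type, major, minor, access):
    """Return True if one parent entry covers the device and every access bit."""
    for p_type, p_major, p_minor, p_access in parent_whitelist:
        if p_type != "a" and p_type != dev_type:
            continue  # entry is for another device type
        if p_major != "*" and p_major != major:
            continue
        if p_minor != "*" and p_minor != minor:
            continue
        # Simplification: all requested bits must come from a single entry.
        if set(access) <= set(p_access):
            return True
    return False
```

In the transcript above the parent holds only `c 1:3 r`, so a request to add `c 1:3 w` to the child is refused under this model, while a parent holding `a *:* rwm` would permit it.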
     
  • # echo "b $((0x7fffffff)):$((0x80000000)) rwm" > devices.allow
    # cat devices.list
    b 214748364:-21474836 rwm

    Though a major/minor number of 0x80000000 is meaningless, we
    should not cast it to a negative value.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Serge Hallyn
    Cc: Paul Menage
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
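
The sign problem can be reproduced in userspace. The sketch below uses Python's ctypes fixed-width integer types, which wrap without overflow checking, to show how the same 32-bit pattern reads as signed versus unsigned; the helper names are hypothetical and this only models the cast, not the kernel's printing code.

```python
import ctypes


def as_signed_32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    return ctypes.c_int32(value).value


def as_unsigned_32(value):
    """Interpret the low 32 bits of value as an unsigned integer."""
    return ctypes.c_uint32(value).value
```

Here as_signed_32(0x80000000) gives -2147483648, which is the kind of negative value seen in devices.list above, while as_unsigned_32(0x80000000) gives 2147483648.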
     

05 Jul, 2008

1 commit

  • # cat /devcg/devices.list
    a *:* rwm
    # echo a > devices.allow
    # cat /devcg/devices.list
    a *:* rwm
    a 0:0 rwm

    This is odd and maybe confusing. With this patch, writing 'a' to
    devices.allow will add 'a *:* rwm' to the whitelist.

    Also a few fixes and updates to the document.

    Signed-off-by: Li Zefan
    Cc: Pavel Emelyanov
    Cc: Serge E. Hallyn
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: James Morris
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

07 Jun, 2008

3 commits

  • Consider you added a 'c foo:bar r' permission to some cgroup and then (a
    bit later) 'c foo:bar w' for it. After this you'll see the

    c foo:bar r
    c foo:bar w

    lines in a devices.list file.

    Another example - consider you added 10 'c foo:bar r' permissions to some
    cgroup (e.g. by mistake). After this you'll see 10 c foo:bar r lines in
    a list file.

    This is weird. This situation also has one more annoying consequence.
    Having many items in a white list makes permissions checking slower, since
    it has to walk a longer list.

    The proposal is to merge permissions for items, that correspond to the
    same device.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
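
The proposed merging can be sketched as below, assuming the whitelist is modelled as a dict keyed by (type, major, minor) with a set of access letters as the value. Both helper names are hypothetical and this is not the kernel's list-based implementation.

```python
def whitelist_add(whitelist, dev_type, major, minor, access):
    """Add access bits for a device, merging into any existing entry."""
    key = (dev_type, major, minor)
    whitelist[key] = whitelist.get(key, set()) | set(access)
    return whitelist


def format_entries(whitelist):
    """Render entries the way a devices.list line looks, access in rwm order."""
    lines = []
    for (dev_type, major, minor), access in whitelist.items():
        letters = "".join(ch for ch in "rwm" if ch in access)
        lines.append(f"{dev_type} {major}:{minor} {letters}")
    return lines
```

Adding 'r' and then 'w' for the same device leaves a single 'rw' entry, and adding the same permission ten times leaves the list at one entry, so the permission check has fewer items to walk.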
     
  • Two functions, that need to get a device_cgroup from a task (they are
    devcgroup_inode_permission and devcgroup_inode_mknod) make it in a strange
    way:

    They get a css_set from the task, then a subsys_state from the css_set,
    then a cgroup from the state and then a subsys_state again from the
    cgroup. Besides, the devices_subsys_id is read from memory, whilst
    there's an enum-ed constant for it.

    Optimize this part a bit:
    1. Get the subsys_state from the task and be done - no 2 extra
    dereferences,
    2. Use the devices_subsys_id constant, not the value from memory
    (i.e. one less dereference).

    Found while preparing 2.6.26 OpenVZ port.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Acked-by: Paul Menage
    Cc: Balbir Singh
    Cc: James Morris
    Cc: Chris Wright
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This is just picking the container_of out of cgroup_to_devcgroup into a
    separate function.

    This new css_to_devcgroup will be used in the 2nd patch.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: James Morris
    Cc: Chris Wright
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

29 Apr, 2008

2 commits

  • Introduce a read_seq() helper in cftype, which uses seq_file to print out
    lists. Use it in the devices cgroup. Also split devices.allow into two
    files, so now devices.deny and devices.allow are the ones to use to manipulate
    the whitelist, while devices.list outputs the cgroup's current whitelist.

    Signed-off-by: Serge E. Hallyn
    Acked-by: Paul Menage
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Implement a cgroup to track and enforce open and mknod restrictions on device
    files. A device cgroup associates a device access whitelist with each cgroup.
    A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block).
    'all' means it applies to all types and all major and minor numbers. Major
    and minor are either an integer or * for all. Access is a composition of r
    (read), w (write), and m (mknod).

    The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of
    the parent. Admins can then remove devices from the whitelist or add new
    entries. A child cgroup can never receive a device access which is denied its
    parent. However when a device access is removed from a parent it will not
    also be removed from the child(ren).

    An entry is added using devices.allow, and removed using
    devices.deny. For instance

    echo 'c 1:3 mr' > /cgroups/1/devices.allow

    allows cgroup 1 to read and mknod the device usually known as
    /dev/null. Doing

    echo a > /cgroups/1/devices.deny

    will remove the default 'a *:* mrw' entry.

    CAP_SYS_ADMIN is needed to change permissions or move another task to a new
    cgroup. A cgroup may not be granted more permissions than the cgroup's parent
    has. Any task can move itself between cgroups. This won't be sufficient, but
    we can decide the best way to adequately restrict movement later.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
    Signed-off-by: Serge E. Hallyn
    Acked-by: James Morris
    Looks-good-to: Pavel Emelyanov
    Cc: Daniel Hokka Zakrisson
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
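
The entry format described in this commit (type, major, minor, access) can be illustrated with a small parser. This is a sketch under assumptions: the regular expression, the function name parse_entry() and the treatment of a bare 'a' as "all devices, rwm" are choices made here for illustration, not the kernel's parsing code.

```python
import re

# type (b or c), major:minor (integer or *), access (some of r, w, m)
ENTRY_RE = re.compile(
    r"^(?P<type>[bc])\s+(?P<major>\*|\d+):(?P<minor>\*|\d+)\s+(?P<access>[rwm]{1,3})$"
)


def parse_entry(text):
    """Parse a line such as 'c 1:3 mr' into (type, major, minor, access)."""
    text = text.strip()
    if text == "a":
        # Assumption: a bare 'a' stands for all types and all numbers.
        return ("a", "*", "*", "rwm")
    match = ENTRY_RE.match(text)
    if match is None:
        raise ValueError(f"malformed whitelist entry: {text!r}")

    def number(field):
        return "*" if field == "*" else int(field)

    # Normalise the access letters to r, w, m order.
    access = "".join(ch for ch in "rwm" if ch in match.group("access"))
    return (match.group("type"), number(match.group("major")),
            number(match.group("minor")), access)
```

For the example in the commit message, parse_entry('c 1:3 mr') yields ('c', 1, 3, 'rm'), i.e. read and mknod access to the char device 1:3.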