29 Apr, 2008

40 commits

  • Hold the memory hotplug chain's mutex for a shorter time: when memory is
    offlined or onlined, a work item is added to the global workqueue. When
    the work item runs, it notifies the ipcns notifier chain with the
    IPCNS_MEMCHANGED event.
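
    A minimal sketch of that mechanism, simplified from the actual ipc code
    (the hotplug callback only queues the work; the notification itself runs
    later from the workqueue):

    static void ipc_memory_notifier(struct work_struct *work)
    {
            /* runs from the global workqueue, outside the hotplug mutex */
            ipcns_notify(IPCNS_MEMCHANGED);
    }

    static DECLARE_WORK(ipc_memory_wq, ipc_memory_notifier);

    static int ipc_memory_callback(struct notifier_block *self,
                                   unsigned long action, void *arg)
    {
            switch (action) {
            case MEM_ONLINE:
            case MEM_OFFLINE:
                    schedule_work(&ipc_memory_wq);
                    break;
            }
            return NOTIFY_OK;
    }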

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce the registration of a callback routine that recomputes msg_ctlmni
    upon memory add / remove.

    A single notifier block is registered in the hotplug memory chain for all the
    ipc namespaces.

    Since the ipc namespaces are not linked together, they have their own
    notification chain: one notifier_block is defined per ipc namespace.

    Each time an ipc namespace is created (removed), it registers (unregisters)
    its notifier block in (from) the ipcns chain. The callback routine
    registered in the memory chain invokes the ipcns notifier chain with the
    IPCNS_MEMCHANGED event. Each callback routine registered in the ipcns
    chain, in turn, recomputes msgmni for the owning namespace.
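
    A sketch of the per-namespace registration, assuming the notifier_block is
    embedded in struct ipc_namespace as ipcns_nb and that ipcns_callback is
    the recompute routine:

    static BLOCKING_NOTIFIER_HEAD(ipcns_chain);

    int register_ipcns_notifier(struct ipc_namespace *ns)
    {
            memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
            ns->ipcns_nb.notifier_call = ipcns_callback;
            return blocking_notifier_chain_register(&ipcns_chain,
                                                    &ns->ipcns_nb);
    }

    int unregister_ipcns_notifier(struct ipc_namespace *ns)
    {
            return blocking_notifier_chain_unregister(&ipcns_chain,
                                                      &ns->ipcns_nb);
    }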

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This is a trivial patch that defines the priority of slab_memory_callback
    in the callback chain as a constant. This is to prepare for the next patch
    in the series.
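
    The idea, sketched (the constant's value here is illustrative):

    /* callbacks on the memory hotplug chain run highest priority first */
    #define SLAB_CALLBACK_PRI       1

    hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);

    A later callback can then pick its priority relative to this named
    constant, guaranteeing it runs after the slab one.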

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Since all the namespaces see the same amount of memory (the total one),
    this patch introduces a new variable that counts the ipc namespaces and
    divides msg_ctlmni by this counter.
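
    Sketched, with nr_ipc_ns as the new atomic counter:

    /* incremented on ipc namespace creation, decremented on removal */
    static atomic_t nr_ipc_ns = ATOMIC_INIT(1);

    /* when (re)computing the per-namespace limit */
    ns->msg_ctlmni = allowed / atomic_read(&nr_ipc_ns);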

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • On large systems we'd like to allow a larger number of message queues, in
    some cases up to 32K. However, simply setting MSGMNI to a larger value may
    cause problems for smaller systems.

    The first patch of this series introduces a default maximum number of message
    queue ids that scales with the amount of lowmem.

    Since msgmni is per namespace and there is no amount of memory dedicated to
    each namespace so far, the second patch of this series scales msgmni to the
    number of ipc namespaces too.

    Since msgmni depends on the amount of memory, it becomes necessary to
    recompute it upon memory add/remove. In the 4th patch, memory hotplug
    management is added: a notifier block is registered into the memory hotplug
    notifier chain for the ipc subsystem. Since the ipc namespaces are not linked
    together, they have their own notification chain: one notifier_block is
    defined per ipc namespace. Each time an ipc namespace is created (removed),
    it registers (unregisters) its notifier block in (from) the ipcns chain.
    The callback routine registered in the memory chain invokes the ipcns
    notifier chain with the IPCNS_MEMCHANGED event. Each callback routine
    registered in the ipcns chain, in turn, recomputes msgmni for the owning
    namespace.

    The 5th patch makes it possible to hold the memory hotplug notifier chain's
    lock for a shorter time: instead of directly notifying the ipcns notifier
    chain upon memory add/remove, a work item is added to the global
    workqueue. When activated, this work item is the one that notifies the
    ipcns notifier chain.

    Since msgmni depends on the number of ipc namespaces, it becomes necessary to
    recompute it upon ipc namespace creation / removal. The 6th patch uses the
    ipc namespace notifier chain for that purpose: that chain is notified each
    time an ipc namespace is created or removed. This makes it possible to
    recompute msgmni for all the namespaces each time one of them is created or
    removed.

    When msgmni is explicitly set from userspace, we should avoid recomputing
    it upon memory add/remove or ipcns creation/removal. This is what the 7th
    patch does: it simply unregisters the ipcns callback routine as soon as
    msgmni has been changed from procfs or sysctl().

    Even if msgmni has been set by hand, it should be possible to switch it
    back to automatic recomputation upon memory add/remove or ipcns
    creation/removal. This is what patch 8 achieves: if msgmni is set to a
    negative value, its notifier is added back to the ipcns chain, making it
    automatically recomputed again.
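
    A heavily simplified sketch of the behaviour described in patches 7 and 8
    (the helper and its call site are hypothetical; the real sysctl plumbing
    is more involved):

    /* called after msgmni has been written via procfs/sysctl */
    static void msgmni_written(struct ipc_namespace *ns, int value)
    {
            if (value < 0) {
                    /* negative: switch back to automatic recompute */
                    register_ipcns_notifier(ns);
                    recompute_msgmni(ns);
            } else {
                    /* set by hand: stop recomputing automatically */
                    unregister_ipcns_notifier(ns);
            }
    }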

    This patch:

    Compute msg_ctlmni to make it scale with the amount of lowmem. msg_ctlmni is
    now set to make the message queues occupy 1/32 of the available lowmem.
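
    A sketch of that computation, assuming MSG_MEM_SCALE is the 1/32 factor
    and MSGMNB the size of one queue:

    struct sysinfo i;
    unsigned long allowed;

    si_meminfo(&i);
    /* lowmem in bytes, 1/32 of it for queues, MSGMNB bytes per queue */
    allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
              / MSGMNB;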

    Some cleanup has also been done for the MSGPOOL constant: the msgctl man
    page says it is not used, but it also defines it as a size in bytes, while
    the code expresses it in Kbytes.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Continuing the consolidation of the IPC code: each id can now be built
    directly in ipc_addid() instead of being built by each caller of
    ipc_addid().

    shm_addid() is also removed, so that shm/sem/msg share as much code as
    possible.
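
    Roughly, the id construction moves inside ipc_addid() (a simplified
    sketch, not the exact code):

    int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int size)
    {
            int id, err;

            err = idr_get_new(&ids->ipcs_idr, new, &id);
            if (err)
                    return err;

            new->seq = ids->seq++;
            /* built here once, instead of by every caller */
            new->id = ipc_buildid(id, new->seq);
            return id;
    }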

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Fix kernel bugzilla #10388.

    DMA-API.txt uses the wrong argument type for some functions: it uses struct
    device where it should use struct pci_dev.
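
    For example, the pci_* helpers documented there take a pci_dev:

    /* corrected style: struct pci_dev, not struct device */
    void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
                               dma_addr_t *dma_handle);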

    Signed-off-by: Randy Dunlap
    Acked-by: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1
    when mapping user-allocated CQs with ib_umem_get().
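
    The extended prototype, per the description (dmasync being the new, final
    argument):

    struct ib_umem *ib_umem_get(struct ib_ucontext *context,
                                unsigned long addr, size_t size,
                                int access, int dmasync);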

    Signed-off-by: Arthur Kepner
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
     
  • Change all ia64 machvecs to use the new dma_*map*_attrs() interfaces.
    Implement the old dma_*map_*() interfaces in terms of the corresponding new
    interfaces. For ia64/sn, make use of one dma attribute,
    DMA_ATTR_WRITE_BARRIER. Introduce swiotlb_*map*_attrs() functions.
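
    The pattern, sketched: each old interface becomes a thin wrapper that
    passes a NULL attrs argument to its *_attrs() counterpart.

    static inline dma_addr_t dma_map_single(struct device *dev, void *ptr,
                                            size_t size,
                                            enum dma_data_direction dir)
    {
            return dma_map_single_attrs(dev, ptr, size, dir, NULL);
    }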

    Signed-off-by: Arthur Kepner
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
     
  • Document the new dma_*map*_attrs() functions.

    [markn@au1.ibm.com: fix up for dma-add-dma_map_attrs-interfaces and update docs]
    Signed-off-by: Arthur Kepner
    Acked-by: David S. Miller
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Mark Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
     
  • Introduce new interfaces, dma_*map*_attrs(), for passing architecture-specific
    attributes when memory is mapped and unmapped for DMA. Give the interfaces
    default implementations which ignore attributes. Also introduce the
    dma_{set|get}_attr() interfaces for setting and retrieving individual
    attributes. Define one attribute, DMA_ATTR_WRITE_BARRIER, in anticipation of
    its use by ia64/sn. Select whether architectures implement arch-specific
    versions of the dma_*map*_attrs() interfaces via HAVE_DMA_ATTRS in Kconfig.
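
    A usage sketch putting the pieces together:

    DEFINE_DMA_ATTRS(attrs);

    /* request write-barrier semantics for this mapping */
    dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
    dma_handle = dma_map_single_attrs(dev, cpu_addr, size,
                                      DMA_BIDIRECTIONAL, &attrs);
    /* later */
    dma_unmap_single_attrs(dev, dma_handle, size, DMA_BIDIRECTIONAL, &attrs);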

    [markn@au1.ibm.com: dma_{set,get}_attr() have to be static inline]
    Signed-off-by: Arthur Kepner
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Mark Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
     
  • cpu_hotplug_begin() must always be called under cpu_add_remove_lock, which
    means that only one process can be cpu_hotplug.active_writer. So we don't
    need cpu_hotplug.writer_queue; we can wake up the ->active_writer
    directly.

    Also, fix the comment.
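
    The reader-side unlock then wakes the writer directly; a simplified sketch
    of the resulting path:

    void put_online_cpus(void)
    {
            if (cpu_hotplug.active_writer == current)
                    return;
            mutex_lock(&cpu_hotplug.lock);
            /* at most one writer can exist, so wake it directly */
            if (!--cpu_hotplug.refcount && cpu_hotplug.active_writer)
                    wake_up_process(cpu_hotplug.active_writer);
            mutex_unlock(&cpu_hotplug.lock);
    }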

    Signed-off-by: Oleg Nesterov
    Cc: Dipankar Sarma
    Acked-by: Gautham R Shenoy
    Cc: Ingo Molnar
    Cc: Srivatsa Vaddagiri
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • cleanup_workqueue_thread() doesn't need the second argument, remove it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • When cpu_populated_map was introduced, it was assumed that cwq->thread
    could survive after CPU_DEAD; that is why we never shrink
    cpu_populated_map.

    This is not very nice; we can safely remove the already dead CPU from the
    map. The only required change is that destroy_workqueue() must hold the
    hotplug lock until it destroys all cwq->thread's, to protect the
    cpu_populated_map. We could make a local copy of the cpu mask and drop the
    lock, but sizeof(cpumask_t) may be very large.

    Also, fix the comment near queue_work(). Unless _cpu_down() happens, we do
    guarantee the cpu-affinity of the work_struct, and we have users that rely
    on this.

    [akpm@linux-foundation.org: repair comment]
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This flag provides the hardwalling properties of mem_exclusive, without
    enforcing the exclusivity. Either mem_hardwall or mem_exclusive is sufficient
    to prevent GFP_KERNEL allocations from passing outside the cpuset's assigned
    nodes.

    Signed-off-by: Paul Menage
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Currently the cpusets mem_exclusive flag is overloaded to mean both
    "no-overlapping" and "no GFP_KERNEL allocations outside this cpuset".

    These patches add a new mem_hardwall flag with just the allocation restriction
    part of the mem_exclusive semantics, without breaking backwards-compatibility
    for those who continue to use just mem_exclusive. Additionally, the cgroup
    control file registration for cpusets is cleaned up to reduce boilerplate.

    This patch:

    This change tidies up the cpusets control file definitions, and reduces the
    amount of boilerplate required to add/change control files in the future.

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Make the following needlessly global functions static:

    - cpuset_test_cpumask()
    - cpuset_change_cpumask()
    - cpuset_do_move_task()

    Signed-off-by: Adrian Bunk
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • *mem has been zeroed, which means mem->info has already been filled with 0.

    Signed-off-by: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • On ia64, this kmalloc() requires order-4 pages, but the allocation does not
    need to be physically contiguous. For a big mem_cgroup, vmalloc is better;
    for small ones, kmalloc is used.
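
    The allocation helper, sketched (details may differ from the actual
    patch):

    static struct mem_cgroup *mem_cgroup_alloc(void)
    {
            struct mem_cgroup *mem;

            if (sizeof(*mem) < PAGE_SIZE)
                    mem = kmalloc(sizeof(*mem), GFP_KERNEL);
            else
                    mem = vmalloc(sizeof(*mem));

            if (mem)
                    memset(mem, 0, sizeof(*mem));
            return mem;
    }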

    [akpm@linux-foundation.org: simplification]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Pavel Emelyanov
    Cc: Li Zefan
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch makes the memory controller more responsive on my desktop.

    1. Set all cached pages as inactive. We were marking all pages as active
    by default, thus forcing us to go through two passes when reclaiming pages.

    2. Remove congestion_wait(), since we already have that logic in
    do_try_to_free_pages().

    Signed-off-by: Balbir Singh
    Reviewed-by: KOSAKI Motohiro
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • remove_list/add_list use page_cgroup_zoneinfo() internally, so it is called
    twice, before and after taking the lock:

    mz = page_cgroup_zoneinfo();
    lock();
    mz = page_cgroup_zoneinfo();
    ....
    unlock();

    Yet the address of mz never changes.

    This is not good. This patch fixes this behavior.
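
    After the fix, the lookup is done once and the result reused; a sketch:

    mz = page_cgroup_zoneinfo(pc);
    spin_lock_irqsave(&mz->lru_lock, flags);
    /* ... manipulate the per-zone LRU through mz ... */
    spin_unlock_irqrestore(&mz->lru_lock, flags);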

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This is a very common requirement from people using the resource accounting
    facilities (not only memcgroup but also OpenVZ beancounters). They want to
    put the cgroup in an initial state without re-creating it.

    For example after re-configuring a group people want to observe how this new
    configuration fits the group needs without saving the previous failcnt value.

    Merge the two resets into one mem_cgroup_reset() function to demonstrate
    how the multiplexing works.

    Besides, I have plans to move the files that correspond to res_counter into
    the res_counter.c file and somehow "import" them into the controller. I
    don't know how to do this gracefully yet, but merging the resets of
    max_usage and failcnt into one function will be there for sure.
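
    A sketch of the merged reset, with the event multiplexed through
    cftype->private:

    static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
    {
            struct mem_cgroup *mem = mem_cgroup_from_cont(cont);

            switch (event) {
            case RES_MAX_USAGE:
                    res_counter_reset_max(&mem->res);
                    break;
            case RES_FAILCNT:
                    res_counter_reset_failcnt(&mem->res);
                    break;
            }
            return 0;
    }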

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • These two files are essentially event callbacks. They do not care about the
    contents of the string, but only about the fact of the write itself.

    Signed-off-by: Pavel Emelyanov
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Move the memory controller data structure page_cgroup to its own slab
    cache. It saves space on the system, since allocations are no longer
    necessarily pushed up to a power-of-2 size, and it should provide
    performance benefits. Users who disable the memory controller can also
    double-check that the memory controller is not allocating page_cgroup's.

    NOTE: Hugh Dickins brought up the issue of whether we want to mark page_cgroup
    as __GFP_MOVABLE or __GFP_RECLAIMABLE. I don't think there is an easy answer
    at the moment. page_cgroup's are associated with user pages, they can be
    reclaimed once the user page has been reclaimed, so it might make sense to
    mark them as __GFP_RECLAIMABLE. For now, I am leaving the marking to default
    values that the slab allocator uses.
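
    The cache itself is a one-liner at init time; a sketch:

    static struct kmem_cache *page_cgroup_cache;

    /* in the controller's init path */
    page_cgroup_cache = kmem_cache_create("page_cgroup",
                                          sizeof(struct page_cgroup), 0,
                                          SLAB_PANIC, NULL);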

    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Hugh Dickins
    Cc: Sudhir Kumar
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • The resource counter is supposed to facilitate the resource accounting of
    an arbitrary resource (and it already does this for the memory controller).

    However, it is about to be used in other resources controllers (swap, kernel
    memory, networking, etc), so provide a doc describing how to work with it.
    This will eliminate all the possible future duplications in the appropriate
    controllers' docs.

    Fixed errors pointed out by Randy.

    [akpm@linux-foundation.org: fix documentation tpyo]
    Signed-off-by: Pavel Emelyanov
    Cc: Randy Dunlap
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This field holds the maximal value of the usage counter since the counter's
    creation (or since the latest reset).

    To reset it to the current usage value, simply write anything to the
    appropriate cgroup file.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Remove the mem_cgroup member from mm_struct and instead add an owner.

    This approach was suggested by Paul Menage. The advantage of this approach
    is that, once the mm->owner is known, the cgroup can be determined using
    the subsystem id. It also allows several control groups that are virtually
    grouped by mm_struct to exist independently of the memory controller,
    i.e., without adding a mem_cgroup pointer to mm_struct for each
    controller.

    A new config option CONFIG_MM_OWNER is added and the memory resource
    controller selects this config option.

    This patch also adds cgroup callbacks to notify subsystems when mm->owner
    changes. The mm_cgroup_changed callback is called with the task_lock() of
    the new task held and is called just prior to changing the mm->owner.

    I am indebted to Paul Menage for the several reviews of this patchset and
    helping me make it lighter and simpler.

    This patch was tested on a powerpc box, it was compiled with both the
    MM_OWNER config turned on and off.

    After the thread group leader exits, it's moved to init_css_set by
    cgroup_exit(), thus all future charges from running threads will be
    redirected to the init_css_set's subsystem.
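
    With mm->owner in place, a controller derives its group from the owning
    task, roughly like this (sketch):

    static struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
    {
            return container_of(task_subsys_state(p, mem_cgroup_subsys_id),
                                struct mem_cgroup, css);
    }

    /* at charge time */
    rcu_read_lock();
    mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
    rcu_read_unlock();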

    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Hugh Dickins
    Cc: Sudhir Kumar
    Cc: YAMAMOTO Takashi
    Cc: Hirokazu Takahashi
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Pekka Enberg
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Introduce a read_seq() helper in cftype, which uses seq_file to print out
    lists. Use it in the devices cgroup. Also split devices.allow into two
    files, so now devices.deny and devices.allow are the ones to use to manipulate
    the whitelist, while devices.list outputs the cgroup's current whitelist.
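
    The resulting file set, sketched (the cftype member names here approximate
    the description and may not match the patch exactly):

    static struct cftype dev_cgroup_files[] = {
            { .name = "allow", .write = devcgroup_access_write, },
            { .name = "deny",  .write = devcgroup_access_write, },
            { .name = "list",  .read_seq = devcgroup_seq_read, },
    };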

    Signed-off-by: Serge E. Hallyn
    Acked-by: Paul Menage
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Now we can run through the hash table instead of running through the
    linked-list.

    Signed-off-by: Li Zefan
    Reviewed-by: Paul Menage
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • We are at system boot and there is only one css_set (i.e., init_css_set),
    so we don't need to run through the css_set linked list. Neither do we
    need to run through the task list, since no processes have been created
    yet.

    Also referring to a comment in cgroup.h:

    struct css_set {
            ...
            /*
             * Set of subsystem states, one for each subsystem. This array
             * is immutable after creation apart from the init_css_set
             * during subsystem registration (at boot time).
             */
            struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
    };

    Signed-off-by: Li Zefan
    Reviewed-by: Paul Menage
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • When we attach a process to a different cgroup, the css_set linked-list will
    be run through to find a suitable existing css_set to use. This patch
    implements a hash table for better performance.
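
    The hash key is derived from the set of subsystem state pointers, roughly
    as follows (a sketch, simplified from the patch):

    #define CSS_SET_HASH_BITS       7
    static struct hlist_head css_set_table[1 << CSS_SET_HASH_BITS];

    static struct hlist_head *css_set_hash(struct cgroup_subsys_state *css[])
    {
            unsigned long key = 0;
            int i;

            for (i = 0; i < CGROUP_SUBSYS_COUNT; i++)
                    key += (unsigned long)css[i];
            key = (key >> 16) ^ key;

            return &css_set_table[hash_long(key, CSS_SET_HASH_BITS)];
    }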

    The following benchmarks have been tested:

    For N in 1, 5, 10, 50, 100, 500, 1000, create N cgroups with one sleeping
    task in each, and then move an additional task through each cgroup in
    turn.

    Here is a test result:

       N    Loop   orig - Time(s)   hash - Time(s)
    ------------------------------------------------
       1   10000    1.201231728      1.196311177
       5    2000    1.065743872      1.040566424
      10    1000    0.991054735      0.986876440
      50     200    0.976554203      0.969608733
     100     100    0.998504680      0.969218270
     500      20    1.157347764      0.962602963
    1000      10    1.619521852      1.085140172

    Signed-off-by: Li Zefan
    Reviewed-by: Paul Menage
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Implement a cgroup to track and enforce open and mknod restrictions on device
    files. A device cgroup associates a device access whitelist with each cgroup.
    A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block).
    'all' means it applies to all types and all major and minor numbers. Major
    and minor are either an integer or * for all. Access is a composition of r
    (read), w (write), and m (mknod).

    The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of
    the parent. Admins can then remove devices from the whitelist or add new
    entries. A child cgroup can never receive a device access which is denied
    by its parent. However, when a device access is removed from a parent, it
    will not also be removed from the child(ren).

    An entry is added using devices.allow, and removed using
    devices.deny. For instance

    echo 'c 1:3 mr' > /cgroups/1/devices.allow

    allows cgroup 1 to read and mknod the device usually known as
    /dev/null. Doing

    echo a > /cgroups/1/devices.deny

    will remove the default 'a *:* mrw' entry.

    CAP_SYS_ADMIN is needed to change permissions or move another task to a new
    cgroup. A cgroup may not be granted more permissions than the cgroup's parent
    has. Any task can move itself between cgroups. This won't be sufficient, but
    we can decide the best way to adequately restrict movement later.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
    Signed-off-by: Serge E. Hallyn
    Acked-by: James Morris
    Looks-good-to: Pavel Emelyanov
    Cc: Daniel Hokka Zakrisson
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • The trigger callback can be used to receive a kick from user space. The
    string written is ignored.

    The cftype->private is used for multiplexing events.
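
    A usage sketch, wiring the memory controller's reset handler to two
    trigger files:

    static struct cftype files[] = {
            {
                    .name = "max_usage_in_bytes",
                    .private = RES_MAX_USAGE,
                    .trigger = mem_cgroup_reset,
            },
            {
                    .name = "failcnt",
                    .private = RES_FAILCNT,
                    .trigger = mem_cgroup_reset,
            },
    };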

    Signed-off-by: Pavel Emelyanov
    Acked-by: Paul Menage
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • There is a race between create_proc_entry() and the assignment of file ops;
    proc_create() was invented to fix it.
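
    The race and the fix, sketched:

    /* racy: the entry is visible before its fops are set */
    entry = create_proc_entry("cgroups", 0, NULL);
    if (entry)
            entry->proc_fops = &proc_cgroupstats_operations;

    /* fixed: the fops are supplied atomically at creation */
    entry = proc_create("cgroups", 0, NULL, &proc_cgroupstats_operations);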

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • It is called by cgroup_init() and cgroup_init_early() only, which are
    annotated with __init.

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • This removes some filesystem boilerplate from the CFS cgroup subsystem.

    Signed-off-by: Paul Menage
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • These patches add cgroups read_s64 and write_s64 control file methods (the
    signed equivalent of read_u64/write_u64) and use them to implement the
    cpu.rt_runtime_us control file in the CFS cgroup subsystem.

    This patch:

    These are the signed equivalents of the read_u64/write_u64 methods.
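
    Their signatures mirror the unsigned variants:

    s64 (*read_s64)(struct cgroup *cgrp, struct cftype *cft);
    int (*write_s64)(struct cgroup *cgrp, struct cftype *cft, s64 val);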

    Signed-off-by: Paul Menage
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • The cgroup debug subsystem isn't generally useful for users. It should
    default to "n".

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • The "releasable" control file provided by the cgroup framework exports the
    state of a per-cgroup flag that's related to the notify-on-release feature.
    This isn't really generally useful, unless you're trying to debug this
    particular feature of cgroups.

    This patch moves the "releasable" file to the cgroup_debug subsystem.

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • This function isn't needed - a NULL pointer in the cftype read function will
    result in the same EINVAL response to userspace.

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage