30 May, 2012

1 commit

  • We call the destroy function when a cgroup starts to be removed, such as
    by a rmdir event.

    However, because of our reference counters, some objects are still
    in flight. Right now, we are decrementing the static_keys at destroy()
    time, meaning that if we get rid of the last static_key reference, some
    objects will still have charges, but the code to properly uncharge them
    won't be run.

    This becomes a problem especially if the key is ever enabled again,
    because new charges would then be added on top of the stale ones,
    making correct bookkeeping pretty much impossible.

    We just need to be careful with the static branch activation: since there
    is no particular preferred order of their activation, we need to make sure
    that we only start using it after all call sites are active. This is
    achieved by having a per-memcg flag that is only updated after
    static_key_slow_inc() returns. At this time, we are sure all sites are
    active.

    This is made per-memcg, not global, for a reason: it also has the effect
    of making socket accounting more consistent. The first memcg to be
    limited will trigger static_key() activation, therefore, accounting. But
    all the others will then be accounted no matter what. After this patch,
    only limited memcgs will have their sockets accounted.
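    The ordering can be sketched in a small userspace illustration.
    memcg_proto_active() and tcp_update_limit() are named in this
    changelog; everything else here is a simplified stand-in, not the
    actual kernel implementation:

```c
#include <assert.h>

struct static_key { int enabled; };

struct mem_cgroup {
	struct static_key *key;
	int active;                  /* per-memcg flag, set only after inc returns */
};

/* Stand-in for the real (expensive) call-site patching. */
static void static_key_slow_inc(struct static_key *key)
{
	key->enabled++;
}

/* Charge paths test this flag; it stays 0 until every call site is active. */
static int memcg_proto_active(const struct mem_cgroup *memcg)
{
	return memcg->active;
}

static void tcp_update_limit(struct mem_cgroup *memcg)
{
	static_key_slow_inc(memcg->key); /* activate all call sites first... */
	memcg->active = 1;               /* ...only then start accounting */
}
```

    In the real code the flag update and its readers need proper memory
    ordering; the sketch only shows the sequencing idea.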

    [akpm@linux-foundation.org: move enum sock_flag_bits into sock.h,
    document enum sock_flag_bits,
    convert memcg_proto_active() and memcg_proto_activated() to test_bit(),
    redo tcp_update_limit() comment to 80 cols]
    Signed-off-by: Glauber Costa
    Cc: Tejun Heo
    Cc: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Acked-by: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     

11 Apr, 2012

1 commit

  • The only reason cgroup was used was to be consistent with the populate()
    interface. Now that we're getting rid of it, not only do we no longer need
    it, but we also *can't* call it this way.

    Since we will no longer rely on populate(), this will be called from
    create(). During create(), the association between struct mem_cgroup
    and struct cgroup does not yet exist, since the cgroup internals have
    not yet initialized their bookkeeping. This means we would not be able
    to derive the memcg pointer from the cgroup pointer in these
    functions, which is highly undesirable.

    Signed-off-by: Glauber Costa
    Acked-by: Kamezawa Hiroyuki
    Signed-off-by: Tejun Heo
    CC: Li Zefan
    CC: Johannes Weiner
    CC: Michal Hocko

    Glauber Costa
     

02 Apr, 2012

2 commits

  • Convert memcg to use the new cftype based interface. kmem support
    abuses ->populate() for mem_cgroup_sockets_init() so it can't be
    removed at the moment.

    tcp_memcontrol is updated so that tcp_files[] is registered via an
    __initcall, which also allows the forward declaration of tcp_files[]
    to be removed.

    Signed-off-by: Tejun Heo
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Li Zefan
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Balbir Singh
    Cc: Glauber Costa
    Cc: Hugh Dickins
    Cc: Greg Thelen

    Tejun Heo
     
  • blk-cgroup, netprio_cgroup, cls_cgroup and tcp_memcontrol
    unnecessarily define cftype array and cgroup_subsys structures at the
    top of the file, which is unconventional and necessitates forward
    declaration of methods.

    This patch relocates those below the definitions of the methods and
    removes the forward declarations. Note that forward declaration of
    tcp_files[] is added in tcp_memcontrol.c for tcp_init_cgroup(). This
    will be removed soon by another patch.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

21 Mar, 2012

1 commit

  • Pull cgroup changes from Tejun Heo:
    "Out of the 8 commits, one fixes a long-standing locking issue around
    tasklist walking and others are cleanups."

    * 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
    cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
    cgroup: remove cgroup_subsys argument from callbacks
    cgroup: remove extra calls to find_existing_css_set
    cgroup: replace tasklist_lock with rcu_read_lock
    cgroup: simplify double-check locking in cgroup_attach_proc
    cgroup: move struct cgroup_pidlist out from the header file
    cgroup: remove cgroup_attach_task_current_cg()

    Linus Torvalds
     

24 Feb, 2012

1 commit

  • …_key_slow_[inc|dec]()

    So here's a boot tested patch on top of Jason's series that does
    all the cleanups I talked about and turns jump labels into a
    more intuitive to use facility. It should also address the
    various misconceptions and confusions that surround jump labels.

    Typical usage scenarios:

    #include <linux/static_key.h>

    struct static_key key = STATIC_KEY_INIT_TRUE;

    if (static_key_false(&key))
            do unlikely code
    else
            do likely code

    Or:

    if (static_key_true(&key))
            do likely code
    else
            do unlikely code

    The static key is modified via:

    static_key_slow_inc(&key);
    ...
    static_key_slow_dec(&key);

    The 'slow' prefix makes it abundantly clear that this is an
    expensive operation.

    I've updated all in-kernel code to use this everywhere. Note
    that I (intentionally) have not pushed the rename blindly
    through to the lowest levels: the actual jump-label
    patching arch facility should be named like that, so we want to
    decouple jump labels from the static-key facility a bit.

    On non-jump-label enabled architectures static keys default to
    likely()/unlikely() branches.
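    Roughly, the fallback can be pictured like this (a simplified
    userspace sketch of the API surface, not the kernel's actual
    fallback definitions):

```c
#include <assert.h>

/* Fallback: a static key is just a counter, and the branch helpers are
   plain conditionals (the real kernel wraps them in likely()/unlikely()). */
struct static_key { int enabled; };

#define STATIC_KEY_INIT_TRUE  { .enabled = 1 }
#define STATIC_KEY_INIT_FALSE { .enabled = 0 }

static int static_key_false(struct static_key *key)  /* branch is unlikely */
{
	return key->enabled > 0;
}

static int static_key_true(struct static_key *key)   /* branch is likely */
{
	return key->enabled > 0;
}

static void static_key_slow_inc(struct static_key *key) { key->enabled++; }
static void static_key_slow_dec(struct static_key *key) { key->enabled--; }
```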

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Acked-by: Jason Baron <jbaron@redhat.com>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Cc: a.p.zijlstra@chello.nl
    Cc: mathieu.desnoyers@efficios.com
    Cc: davem@davemloft.net
    Cc: ddaney.cavm@gmail.com
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    Ingo Molnar
     

03 Feb, 2012

1 commit

  • The argument is not used at all, and it's not necessary, because
    a specific callback handler of course knows which subsys it
    belongs to.

    Now only ->populate() takes this argument, because the handlers of
    this callback always call cgroup_add_file()/cgroup_add_files().

    So we reduce a few lines of code, though the shrinking of object size
    is minimal.

    16 files changed, 113 insertions(+), 162 deletions(-)

    text data bss dec hex filename
    5486240 656987 7039960 13183187 c928d3 vmlinux.o.orig
    5486170 656987 7039960 13183117 c9288d vmlinux.o

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

13 Jan, 2012

1 commit

  • The logic of the current code is that whenever we destroy
    a cgroup that had its limit set (where "set" means different from
    the maximum), we should decrement the jump_label counter.
    Otherwise we assume it was never incremented.

    But what the code actually does is test RES_USAGE
    instead of RES_LIMIT. Usage being different from the maximum
    is likely to be true most of the time.

    The effect of this is that the key can become negative,
    and since the jump_label test says:

    !!atomic_read(&key->enabled);

    we'll have the jump labels still on when no one else is
    using this functionality.
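    In sketch form, the fix is to test the limit, not the usage, before
    decrementing (RES_LIMIT/RES_USAGE and the jump label are from the
    changelog; the surrounding code is a simplified userspace stand-in):

```c
#include <assert.h>
#include <limits.h>

#define RESOURCE_MAX ULONG_MAX   /* "no limit set" */

struct res_counter {
	unsigned long usage;
	unsigned long limit;
};

static int jump_label_count;     /* stand-in for the static key refcount */

static void tcp_set_limit(struct res_counter *res, unsigned long limit)
{
	res->limit = limit;
	jump_label_count++;      /* incremented once when a limit is set */
}

static void tcp_destroy_cgroup(const struct res_counter *res)
{
	/* The buggy version tested res->usage, which is almost always
	 * different from RESOURCE_MAX, so the count could go negative.
	 * The fix tests whether a limit was actually set: */
	if (res->limit != RESOURCE_MAX)
		jump_label_count--;
}
```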

    Signed-off-by: Glauber Costa
    CC: David S. Miller
    Signed-off-by: David S. Miller

    Glauber Costa
     

13 Dec, 2011

6 commits

  • This patch introduces kmem.tcp.max_usage_in_bytes file, living in the
    kmem_cgroup filesystem. The root cgroup will display a value equal
    to RESOURCE_MAX. This is to avoid introducing any locking schemes in
    the network paths when cgroups are not being actively used.

    All others will see the maximum memory ever used by this cgroup.
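    The described read-side behavior might look like this (hypothetical
    names; the point is the root special case, which keeps tracking out
    of the network paths):

```c
#include <assert.h>
#include <limits.h>

#define RESOURCE_MAX ULONG_MAX

struct tcp_memcontrol {
	int is_root;
	unsigned long max_usage; /* high-water mark, tracked for non-root only */
};

/* Reading kmem.tcp.max_usage_in_bytes: the root cgroup reports
 * RESOURCE_MAX so the network fast paths never need to update it. */
static unsigned long tcp_read_max_usage(const struct tcp_memcontrol *tcp)
{
	if (tcp->is_root)
		return RESOURCE_MAX;
	return tcp->max_usage;
}
```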

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyuki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch introduces the kmem.tcp.failcnt file, living in the
    kmem_cgroup filesystem. Following the pattern of the other
    memcg resources, this file keeps a counter of how many times
    allocation failed due to limits being hit in this cgroup.
    The root cgroup will always show a failcnt of 0.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyuki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch introduces kmem.tcp.usage_in_bytes file, living in the
    kmem_cgroup filesystem. It is a simple read-only file that displays the
    amount of kernel memory currently consumed by the cgroup.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyuki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to
    effectively control the amount of kernel memory pinned by a cgroup.

    This value is ignored in the root cgroup; in all others, it
    caps the value specified by the admin in the net namespace's
    view of tcp_sysctl_mem.

    If namespaces are being used, the admin is allowed to set a
    value bigger than the cgroup's maximum, the same way it is allowed
    to set pretty much unlimited values in a real box.
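    The capping rule can be sketched as follows (illustrative name and
    signature; per the description, the cgroup limit caps the
    namespace's view of tcp_sysctl_mem and the root cgroup is uncapped):

```c
#include <assert.h>
#include <limits.h>

#define RESOURCE_MAX ULONG_MAX

/* Effective tcp memory threshold seen by a net namespace: whatever the
 * admin set via sysctl, capped by the cgroup's limit (root is uncapped). */
static unsigned long effective_tcp_mem(unsigned long sysctl_val,
				       unsigned long cgroup_limit,
				       int is_root)
{
	if (is_root || cgroup_limit == RESOURCE_MAX)
		return sysctl_val;
	return sysctl_val < cgroup_limit ? sysctl_val : cgroup_limit;
}
```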

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyuki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch allows each namespace to independently set up
    its levels for tcp memory pressure thresholds. This patch
    alone does not buy much: we need to make these values
    per group of processes somehow. That is achieved in the
    patches that follow in this patchset.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch introduces memory pressure controls for the tcp
    protocol. It uses the generic socket memory pressure code
    introduced in earlier patches, and fills in the
    necessary data in the cg_proto struct.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa