09 Jul, 2016

3 commits

  • In current code, we can get cpuacct data from several files,
    but each file has various limitations.

    For example:

    - We can get CPU usage in user and kernel mode via cpuacct.stat,
    but we can't get detailed data about each CPU.

    - We can get each CPU's kernel mode usage in cpuacct.usage_percpu_sys,
    but we can't get user mode usage data at the same time.

    This patch introduces cpuacct.usage_all, to show all detailed CPU
    accounting data together:

    # cat cpuacct.usage_all
    cpu user system
    0 3809760299 5807968992
    1 3250329855 454612211
    ..

    Signed-off-by: Zhao Lei
    Cc: KOSAKI Motohiro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/7744460969edd7caaf0e903592ee52353ed9bdd6.1466415271.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhao Lei
     
  • In cpuacct_stats_show() we currently we have copies of similar code,
    for each cpustat(system/user) variant.

    Use a loop instead to consolidate the code. This will also work better
    if we extend the CPUACCT_STAT_NSTATS type.

    Signed-off-by: Zhao Lei
    Cc: KOSAKI Motohiro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/b0597d4224655e9f333f1a6224ed9654c7d7d36a.1466415271.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhao Lei
     
  • These two types have similar function, no need to separate them.

    Signed-off-by: Zhao Lei
    Cc: KOSAKI Motohiro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/436748885270d64363c7dc67167507d486c2057a.1466415271.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhao Lei
     

13 Apr, 2016

1 commit

  • task_pt_regs() can return NULL for kernel threads, so add a check.
    This fixes an oops at boot on ppc64.

    Reported-and-Tested-by: Srikar Dronamraju
    Tested-by: Zhao Lei
    Signed-off-by: Anton Blanchard
    Acked-by: Zhao Lei
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Peter Zijlstra
    Cc: Stephen Rothwell
    Cc: Thomas Gleixner
    Cc: efault@gmx.de
    Cc: htejun@gmail.com
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: tj@kernel.org
    Cc: yangds.fnst@cn.fujitsu.com
    Link: http://lkml.kernel.org/r/20160406215950.04bc3f0b@kryten
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     

31 Mar, 2016

2 commits

  • Sometimes, cpuacct.usage is not detailed enough to see how much CPU
    usage a group had. We want to know how much time it used in user mode
    and how much in kernel mode.

    This patch introduces more files to give this information:

    # ls /sys/fs/cgroup/cpuacct/cpuacct.usage*
    /sys/fs/cgroup/cpuacct/cpuacct.usage
    /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu
    /sys/fs/cgroup/cpuacct/cpuacct.usage_user
    /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu_user
    /sys/fs/cgroup/cpuacct/cpuacct.usage_sys
    /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu_sys

    ... while keeping the ABI with the existing counter.

    Signed-off-by: Dongsheng Yang
    [ Ported to newer kernels. ]
    Signed-off-by: Zhao Lei
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/aa171da036b520b51c79549e9b3215d29473f19d.1458635566.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     
  • Current code show stats of online CPUs in cpuacct.statcpus,
    show stats of present cpus in cpuacct.usage(_percpu), and using
    present CPUs for setting cpuacct.usage.

    It will cause inconsistent result when a CPU is online or offline
    or hotpluged.

    We should always use possible CPUs to avoid above problem.

    Here are the contents of a cpuacct.usage_percpu sysfs file,
    on a 4 CPU system with maxcpus=32:

    Before the patch:
    # cat cpuacct.usage_percpu
    2456565 411435 1052897 832584

    After the patch:
    # cat cpuacct.usage_percpu
    2456565 411435 1052897 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    Suggested-by: Peter Zijlstra
    Signed-off-by: Zhao Lei
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/a11d56cef12d0b4807f8be3a46bf9798c3014d59.1458635566.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhao Lei
     

25 Mar, 2016

1 commit

  • Pull scheduler fixes from Ingo Molnar:
    "Misc fixes: a cgroup fix, a fair-scheduler migration accounting fix, a
    cputime fix and two cpuacct cleanups"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/cpuacct: Simplify the cpuacct code
    sched/cpuacct: Rename parameter in cpuusage_write() for readability
    sched/fair: Add comments to explain select_idle_sibling()
    sched/fair: Fix fairness issue on migration
    sched/cgroup: Fix/cleanup cgroup teardown/init
    sched/cputime: Fix steal time accounting vs. CPU hotplug

    Linus Torvalds
     

21 Mar, 2016

2 commits

  • - Use for() instead of while() loop in some functions
    to make the code simpler.

    - Use this_cpu_ptr() instead of per_cpu_ptr() to make the code
    cleaner and a bit faster.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Zhao Lei
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/d8a7ef9592f55224630cb26dea239f05b6398a4e.1458187654.git.zhaolei@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhao Lei
     
  • The name of the 'reset' parameter to cpuusage_write() is quite confusing,
    because the only valid value we allow is '0', so !reset is actually the
    case that resets ...

    Rename it to 'val' and explain it in a comment that we only allow 0.

    Signed-off-by: Dongsheng Yang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: cgroups@vger.kernel.org
    Cc: tj@kernel.org
    Link: http://lkml.kernel.org/r/1450696483-2864-1-git-send-email-yangds.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dongsheng Yang
     

23 Feb, 2016

1 commit


15 Jul, 2014

1 commit

  • Currently, cgroup_subsys->base_cftypes is used for both the unified
    default hierarchy and legacy ones and subsystems can mark each file
    with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
    only on one of them. This is quite hairy and error-prone. Also, we
    may end up exposing interface files to the default hierarchy without
    thinking it through.

    cgroup_subsys will grow two separate cftype arrays and apply each only
    on the hierarchies of the matching type. This will allow organizing
    cftypes in a lot clearer way and encourage subsystems to scrutinize
    the interface which is being exposed in the new default hierarchy.

    In preparation, this patch renames cgroup_subsys->base_cftypes to
    cgroup_subsys->legacy_cftypes. This patch is pure rename.

    Signed-off-by: Tejun Heo
    Acked-by: Neil Horman
    Acked-by: Li Zefan
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Vivek Goyal
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Aristeu Rozanski
    Cc: Aneesh Kumar K.V

    Tejun Heo
     

17 May, 2014

1 commit

  • cgroup in general is moving towards using cgroup_subsys_state as the
    fundamental structural component and css_parent() was introduced to
    convert from using cgroup->parent to css->parent. It was quite some
    time ago and we're moving forward with making css more prominent.

    This patch drops the trivial wrapper css_parent() and let the users
    dereference css->parent. While at it, explicitly mark fields of css
    which are public and immutable.

    v2: New usage from device_cgroup.c converted.

    Signed-off-by: Tejun Heo
    Acked-by: Michal Hocko
    Acked-by: Neil Horman
    Acked-by: "David S. Miller"
    Acked-by: Li Zefan
    Cc: Vivek Goyal
    Cc: Jens Axboe
    Cc: Peter Zijlstra
    Cc: Johannes Weiner

    Tejun Heo
     

08 Feb, 2014

1 commit

  • cgroup_subsys is a bit messier than it needs to be.

    * The name of a subsys can be different from its internal identifier
    defined in cgroup_subsys.h. Most subsystems use the matching name
    but three - cpu, memory and perf_event - use different ones.

    * cgroup_subsys_id enums are postfixed with _subsys_id and each
    cgroup_subsys is postfixed with _subsys. cgroup.h is widely
    included throughout various subsystems, it doesn't and shouldn't
    have claim on such generic names which don't have any qualifier
    indicating that they belong to cgroup.

    * cgroup_subsys->subsys_id should always equal the matching
    cgroup_subsys_id enum; however, we require each controller to
    initialize it and then BUG if they don't match, which is a bit
    silly.

    This patch cleans up cgroup_subsys names and initialization by doing
    the followings.

    * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
    cgroup_subsys with _cgrp_subsys.

    * With the above, renaming subsys identifiers to match the userland
    visible names doesn't cause any naming conflicts. All non-matching
    identifiers are renamed to match the official names.

    cpu_cgroup -> cpu
    mem_cgroup -> memory
    perf -> perf_event

    * controllers no longer need to initialize ->subsys_id and ->name.
    They're generated in cgroup core and set automatically during boot.

    * Redundant cgroup_subsys declarations removed.

    * While updating BUG_ON()s in cgroup_init_early(), convert them to
    WARN()s. BUGging that early during boot is stupid - the kernel
    can't print anything, even through serial console and the trap
    handler doesn't even link stack frame properly for back-tracing.

    This patch doesn't introduce any behavior changes.

    v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
    classid handling into core").

    Signed-off-by: Tejun Heo
    Acked-by: Neil Horman
    Acked-by: "David S. Miller"
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra
    Acked-by: Aristeu Rozanski
    Acked-by: Ingo Molnar
    Acked-by: Li Zefan
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Serge E. Hallyn
    Cc: Vivek Goyal
    Cc: Thomas Graf

    Tejun Heo
     

06 Dec, 2013

2 commits

  • In preparation of conversion to kernfs, cgroup file handling is
    updated so that it can be easily mapped to kernfs. This patch
    replaces cftype->read_seq_string() with cftype->seq_show() which is
    not limited to single_open() operation and will map directcly to
    kernfs seq_file interface.

    The conversions are mechanical. As ->seq_show() doesn't have @css and
    @cft, the functions which make use of them are converted to use
    seq_css() and seq_cft() respectively. In several occassions, e.f. if
    it has seq_string in its name, the function name is updated to fit the
    new method better.

    This patch does not introduce any behavior changes.

    Signed-off-by: Tejun Heo
    Acked-by: Aristeu Rozanski
    Acked-by: Vivek Goyal
    Acked-by: Michal Hocko
    Acked-by: Daniel Wagner
    Acked-by: Li Zefan
    Cc: Jens Axboe
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Neil Horman

    Tejun Heo
     
  • In preparation of conversion to kernfs, cgroup file handling is being
    consolidated so that it can be easily mapped to the seq_file based
    interface of kernfs.

    cftype->read_map() doesn't add any value and being replaced with
    ->read_seq_string(). Update cpu_stats_show() and cpuacct_stats_show()
    accordingly.

    This patch doesn't make any visible behavior changes.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Tejun Heo
     

09 Aug, 2013

5 commits

  • cgroup is currently in the process of transitioning to using struct
    cgroup_subsys_state * as the primary handle instead of struct cgroup.
    Please see the previous commit which converts the subsystem methods
    for rationale.

    This patch converts all cftype file operations to take @css instead of
    @cgroup. cftypes for the cgroup core files don't have their subsytem
    pointer set. These will automatically use the dummy_css added by the
    previous patch and can be converted the same way.

    Most subsystem conversions are straight forwards but there are some
    interesting ones.

    * freezer: update_if_frozen() is also converted to take @css instead
    of @cgroup for consistency. This will make the code look simpler
    too once iterators are converted to use css.

    * memory/vmpressure: mem_cgroup_from_css() needs to be exported to
    vmpressure while mem_cgroup_from_cont() can be made static.
    Updated accordingly.

    * cpu: cgroup_tg() doesn't have any user left. Removed.

    * cpuacct: cgroup_ca() doesn't have any user left. Removed.

    * hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
    Removed.

    * net_cls: cgrp_cls_state() doesn't have any user left. Removed.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Daniel Wagner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe
    Cc: Steven Rostedt

    Tejun Heo
     
  • cgroup is currently in the process of transitioning to using struct
    cgroup_subsys_state * as the primary handle instead of struct cgroup *
    in subsystem implementations for the following reasons.

    * With unified hierarchy, subsystems will be dynamically bound and
    unbound from cgroups and thus css's (cgroup_subsys_state) may be
    created and destroyed dynamically over the lifetime of a cgroup,
    which is different from the current state where all css's are
    allocated and destroyed together with the associated cgroup. This
    in turn means that cgroup_css() should be synchronized and may
    return NULL, making it more cumbersome to use.

    * Differing levels of per-subsystem granularity in the unified
    hierarchy means that the task and descendant iterators should behave
    differently depending on the specific subsystem the iteration is
    being performed for.

    * In majority of the cases, subsystems only care about its part in the
    cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods
    often obtain the matching css pointer from the cgroup and don't
    bother with the cgroup pointer itself. Passing around css fits
    much better.

    This patch converts all cgroup_subsys methods to take @css instead of
    @cgroup. The conversions are mostly straight-forward. A few
    noteworthy changes are

    * ->css_alloc() now takes css of the parent cgroup rather than the
    pointer to the new cgroup as the css for the new cgroup doesn't
    exist yet. Knowing the parent css is enough for all the existing
    subsystems.

    * In kernel/cgroup.c::offline_css(), unnecessary open coded css
    dereference is replaced with local variable access.

    This patch shouldn't cause any behavior differences.

    v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
    with local variable @css as suggested by Li Zefan.

    Rebased on top of new for-3.12 which includes for-3.11-fixes so
    that ->css_free() invocation added by da0a12caff ("cgroup: fix a
    leak when percpu_ref_init() fails") is converted too. Suggested
    by Li Zefan.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Michal Hocko
    Acked-by: Vivek Goyal
    Acked-by: Aristeu Rozanski
    Acked-by: Daniel Wagner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: Matt Helsley
    Cc: Jens Axboe
    Cc: Steven Rostedt

    Tejun Heo
     
  • Currently, controllers have to explicitly follow the cgroup hierarchy
    to find the parent of a given css. cgroup is moving towards using
    cgroup_subsys_state as the main controller interface construct, so
    let's provide a way to climb the hierarchy using just csses.

    This patch implements css_parent() which, given a css, returns its
    parent. The function is guarnateed to valid non-NULL parent css as
    long as the target css is not at the top of the hierarchy.

    freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
    are converted to use css_parent() instead of accessing cgroup->parent
    directly.

    * __parent_ca() is dropped from cpuacct and its usage is replaced with
    parent_ca(). The only difference between the two was NULL test on
    cgroup->parent which is now embedded in css_parent() making the
    distinction moot. Note that eventually a css->parent field will be
    added to css and the NULL check in css_parent() will go away.

    This patch shouldn't cause any behavior differences.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • css (cgroup_subsys_state) is usually embedded in a subsys specific
    data structure. Subsystems either use container_of() directly to cast
    from css to such data structure or has an accessor function wrapping
    such cast. As cgroup as whole is moving towards using css as the main
    interface handle, add and update such accessors to ease dealing with
    css's.

    All accessors explicitly handle NULL input and return NULL in those
    cases. While this looks like an extra branch in the code, as all
    controllers specific data structures have css as the first field, the
    casting doesn't involve any offsetting and the compiler can trivially
    optimize out the branch.

    * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
    accessor. Added.

    * memory, hugetlb and devices already had one but didn't explicitly
    handle NULL input. Updated.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     
  • The names of the two struct cgroup_subsys_state accessors -
    cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
    The former clashes with the type name and the latter doesn't even
    indicate it's somehow related to cgroup.

    We're about to revamp large portion of cgroup API, so, let's rename
    them so that they're less awkward. Most per-controller usages of the
    accessors are localized in accessor wrappers and given the amount of
    scheduled changes, this isn't gonna add any noticeable headache.

    Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
    to task_css(). This patch is pure rename.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
     

10 Apr, 2013

11 commits