11 Jul, 2017

1 commit

  • By default, vmpressure events are not pass-through, i.e. they propagate
    up through the memcg hierarchy until an event notifier is found for any
    threshold level.

    This presents a difficulty when a thread waiting on a read(2) for a
    vmpressure event cannot distinguish between local memory pressure and
    memory pressure in a descendant memcg, especially when that thread may
    not control the memcg hierarchy.

    Consider a user-controlled child memcg with a smaller limit than a
    top-level memcg controlled by the "Activity Manager" specified in
    Documentation/cgroup-v1/memory.txt. It may register for memory pressure
    notification for descendant memcgs to make a policy decision: oom kill a
    low priority job, increase the limit, decrease other limits, etc. If it
    registers for memory pressure notification on the top-level memcg, it
    currently cannot distinguish between memory pressure in its own memcg or
    a descendant memcg, which is user-controlled.

    Conversely, if a user registers for memory pressure notification on
    their own descendant memcg, the Activity Manager does not receive any
    pressure notification for that child memcg hierarchy. Vmpressure events
    are not received for ancestor memcgs if the memcg experiencing pressure
    have notifiers registered, perhaps outside the knowledge of the thread
    waiting on read(2) at the top level.

    Both of these are consequences of vmpressure notification not being
    pass-through.

    This implements a pass-through behavior for vmpressure events. When
    writing to control.event_control, vmpressure event handlers may
    optionally specify a mode. There are two new modes:

    - "hierarchy": always propagate memory pressure events up the hierarchy
    regardless if descendant memcgs have their own notifiers registered,
    and

    - "local": only receive notifications when the memcg for which the
    event is registered experiences memory pressure.

    Of course, processes may register for one notification of "low,local",
    for example, and another for "low".

    If no mode is specified, the current behavior is maintained for
    backwards compatibility.

    See the change to Documentation/cgroup-v1/memory.txt for full
    specification.

    [dan.carpenter@oracle.com: free the same pointer we allocated]
    Link: http://lkml.kernel.org/r/20170613191820.GA20003@elgon.mountain
    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705311421320.8946@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Signed-off-by: Dan Carpenter
    Cc: Johannes Weiner
    Cc: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Jonathan Corbet
    Cc: Anton Vorontsov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

28 Feb, 2017

1 commit

  • Pull cgroup updates from Tejun Heo:
    "Several noteworthy changes.

    - Parav's rdma controller is finally merged. It is very straight
    forward and can limit the abosolute numbers of common rdma
    constructs used by different cgroups.

    - kernel/cgroup.c got too chubby and disorganized. Created
    kernel/cgroup/ subdirectory and moved all cgroup related files
    under kernel/ there and reorganized the core code. This hurts for
    backporting patches but was long overdue.

    - cgroup v2 process listing reimplemented so that it no longer
    depends on allocating a buffer large enough to cache the entire
    result to sort and uniq the output. v2 has always mangled the sort
    order to ensure that users don't depend on the sorted output, so
    this shouldn't surprise anybody. This makes the pid listing
    functions use the same iterators that are used internally, which
    have to have the same iterating capabilities anyway.

    - perf cgroup filtering now works automatically on cgroup v2. This
    patch was posted a long time ago but somehow fell through the
    cracks.

    - misc fixes asnd documentation updates"

    * 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits)
    kernfs: fix locking around kernfs_ops->release() callback
    cgroup: drop the matching uid requirement on migration for cgroup v2
    cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
    cgroup: misc cleanups
    cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
    cgroup: track migration context in cgroup_mgctx
    cgroup: cosmetic update to cgroup_taskset_add()
    rdmacg: Fixed uninitialized current resource usage
    cgroup: Add missing cgroup-v2 PID controller documentation.
    rdmacg: Added documentation for rdmacg
    IB/core: added support to use rdma cgroup controller
    rdmacg: Added rdma cgroup controller
    cgroup: fix a comment typo
    cgroup: fix RCU related sparse warnings
    cgroup: move namespace code to kernel/cgroup/namespace.c
    cgroup: rename functions for consistency
    cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
    cgroup: separate out cgroup1_kf_syscall_ops
    cgroup: refactor mount path and clearly distinguish v1 and v2 paths
    cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
    ...

    Linus Torvalds
     

14 Jan, 2017

1 commit


11 Jan, 2017

1 commit


24 Oct, 2016

1 commit


07 Aug, 2016

1 commit

  • Pull documentation fixes from Jonathan Corbet:
    "Three fixes for the docs build, including removing an annoying warning
    on 'make help' if sphinx isn't present"

    * tag 'doc-4.8-fixes' of git://git.lwn.net/linux:
    DocBook: use DOCBOOKS="" to ignore DocBooks instead of IGNORE_DOCBOOKS=1
    Documenation: update cgroup's document path
    Documentation/sphinx: do not warn about missing tools in 'make help'

    Linus Torvalds
     

04 Aug, 2016

1 commit


29 Jul, 2016

1 commit

  • Node-based reclaim requires node-based LRUs and locking. This is a
    preparation patch that just moves the lru_lock to the node so later
    patches are easier to review. It is a mechanical change but note this
    patch makes contention worse because the LRU lock is hotter and direct
    reclaim and kswapd can contend on the same lock even when reclaiming
    from different zones.

    Link: http://lkml.kernel.org/r/1467970510-21195-3-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

15 May, 2016

1 commit

  • The restriction of kmem setting is not there anymore because the
    accounting is enabled by default even in the cgroup v1 - see
    b313aeee2509 ("mm: memcontrol: enable kmem accounting for all
    cgroups in the legacy hierarchy").

    Update docs accordingly.

    Signed-off-by: Qiang Huang
    Acked-by: Michal Hocko
    Signed-off-by: Jonathan Corbet

    Qiang Huang
     

19 Mar, 2016

1 commit

  • Pull cgroup updates from Tejun Heo:
    "cgroup changes for v4.6-rc1. No userland visible behavior changes in
    this pull request. I'll send out a separate pull request for the
    addition of cgroup namespace support.

    - The biggest change is the revamping of cgroup core task migration
    and controller handling logic. There are quite a few places where
    controllers and tasks are manipulated. Previously, many of those
    places implemented custom operations for each specific use case
    assuming specific starting conditions. While this worked, it makes
    the code fragile and difficult to follow.

    The bulk of this pull request restructures these operations so that
    most related operations are performed through common helpers which
    implement recursive (subtrees are always processed consistently)
    and idempotent (they make cgroup hierarchy converge to the target
    state rather than performing operations assuming specific starting
    conditions). This makes the code a lot easier to understand,
    verify and extend.

    - Implicit controller support is added. This is primarily for using
    perf_event on the v2 hierarchy so that perf can match cgroup v2
    path without requiring the user to do anything special. The kernel
    portion of perf_event changes is acked but userland changes are
    still pending review.

    - cgroup_no_v1= boot parameter added to ease testing cgroup v2 in
    certain environments.

    - There is a regression introduced during v4.4 devel cycle where
    attempts to migrate zombie tasks can mess up internal object
    management. This was fixed earlier this week and included in this
    pull request w/ stable cc'd.

    - Misc non-critical fixes and improvements"

    * 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (44 commits)
    cgroup: avoid false positive gcc-6 warning
    cgroup: ignore css_sets associated with dead cgroups during migration
    Documentation: cgroup v2: Trivial heading correction.
    cgroup: implement cgroup_subsys->implicit_on_dfl
    cgroup: use css_set->mg_dst_cgrp for the migration target cgroup
    cgroup: make cgroup[_taskset]_migrate() take cgroup_root instead of cgroup
    cgroup: move migration destination verification out of cgroup_migrate_prepare_dst()
    cgroup: fix incorrect destination cgroup in cgroup_update_dfl_csses()
    cgroup: Trivial correction to reflect controller.
    cgroup: remove stale item in cgroup-v1 document INDEX file.
    cgroup: update css iteration in cgroup_update_dfl_csses()
    cgroup: allocate 2x cgrp_cset_links when setting up a new root
    cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask
    cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends
    cgroup: use cgroup_apply_enable_control() in cgroup creation path
    cgroup: combine cgroup_mutex locking and offline css draining
    cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write()
    cgroup: introduce cgroup_{save|propagate|restore}_control()
    cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive
    cgroup: factor out cgroup_apply_control_enable() from cgroup_subtree_control_write()
    ...

    Linus Torvalds
     

18 Mar, 2016

1 commit


04 Mar, 2016

1 commit


12 Jan, 2016

1 commit