20 Nov, 2012

2 commits


10 Nov, 2012

6 commits

  • Up until now, cgroup_freezer didn't implement hierarchy properly.
    cgroups could be arranged in hierarchy but it didn't make any
    difference in how each cgroup_freezer behaved. They all operated
    separately.

    This patch implements proper hierarchy support. If a cgroup is
    frozen, all its descendants are frozen. A cgroup is thawed iff it and
    all its ancestors are THAWED. freezer.self_freezing shows the current
    freezing state for the cgroup itself. freezer.parent_freezing shows
    whether the cgroup is freezing because any of its ancestors is
    freezing.

    freezer_post_create() locks the parent and new cgroup and inherits the
    parent's state and freezer_change_state() applies new state top-down
    using cgroup_for_each_descendant_pre() which guarantees that no child
    can escape its parent's state. update_if_frozen() uses
    cgroup_for_each_descendant_post() to propagate frozen states
    bottom-up.

    Synchronization could be coarser and easier by using a single mutex to
    protect all hierarchy operations. A finer-grained approach was used
    because it wasn't too difficult for cgroup_freezer, and I think it's
    beneficial to have an example implementation; cgroup_freezer is
    rather simple and can serve as a good one.

    As this makes cgroup_freezer properly hierarchical,
    freezer_subsys.broken_hierarchy marking is removed.

    Note that this patch changes userland visible behavior - freezing a
    cgroup now freezes all its descendants too. This behavior change is
    intended and has been warned via .broken_hierarchy.

    v2: Michal spotted a bug in freezer_change_state() - descendants were
    inheriting from the wrong ancestor. Fixed.

    v3: Documentation/cgroups/freezer-subsystem.txt updated.

    Signed-off-by: Tejun Heo
    Reviewed-by: Michal Hocko

    Tejun Heo
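
The propagation rule in this commit (a cgroup is freezing if it or any ancestor is freezing) can be sketched in user-space C. This is a minimal model under stated assumptions, not the kernel implementation: `struct node`, `change_state()`, `propagate()`, `add_child()` and the bit values are hypothetical, and plain recursion stands in for cgroup_for_each_descendant_pre().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, named after the two files the commit exposes. */
enum {
	FREEZING_SELF   = 1 << 0,	/* this node itself was asked to freeze */
	FREEZING_PARENT = 1 << 1,	/* some ancestor is freezing */
};

#define MAX_CHILDREN 4

struct node {
	unsigned int state;
	struct node *child[MAX_CHILDREN];
	size_t nr_children;
};

/* A node is freezing if either flag is set; otherwise it is thawed. */
static bool node_freezing(const struct node *n)
{
	return n->state & (FREEZING_SELF | FREEZING_PARENT);
}

/* Recompute FREEZING_PARENT top-down: each parent is visited before its children. */
static void propagate(struct node *n)
{
	for (size_t i = 0; i < n->nr_children; i++) {
		struct node *c = n->child[i];

		if (node_freezing(n))
			c->state |= FREEZING_PARENT;
		else
			c->state &= ~FREEZING_PARENT;
		propagate(c);
	}
}

/* Model of a state write: update FREEZING_SELF, then push the result down. */
static void change_state(struct node *n, bool freeze)
{
	assert(n != NULL);
	if (freeze)
		n->state |= FREEZING_SELF;
	else
		n->state &= ~FREEZING_SELF;
	propagate(n);
}

/* A newly created node inherits its parent's freezing state. */
static void add_child(struct node *parent, struct node *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	if (node_freezing(parent))
		child->state |= FREEZING_PARENT;
	parent->child[parent->nr_children++] = child;
}
```

In this model, thawing a grandparent leaves a grandchild freezing as long as the node in between still has FREEZING_SELF set, which matches the "thawed iff it and all its ancestors are THAWED" rule.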
     
  • A cgroup is online and visible to iteration between ->post_create()
    and ->pre_destroy(). This patch introduces CGROUP_FREEZER_ONLINE and
    toggles it from the newly added freezer_post_create() and
    freezer_pre_destroy() while holding freezer->lock such that a
    cgroup_freezer can be reliably distinguished to be online. This will
    be used by full hierarchy support.

    ONLINE test is added to freezer_apply_state() but it currently doesn't
    make any difference as freezer_write() can only be called for an
    online cgroup.

    Adjusting system_freezing_cnt on destruction is moved from
    freezer_destroy() to the new freezer_pre_destroy() for consistency.

    This patch doesn't introduce any noticeable behavior change.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko

    Tejun Heo
     
  • Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
    the two flags. This is to prepare for full hierarchy support.

    freezer_apply_state() is updated such that it can handle setting and
    clearing of both flags. The two flags are also exposed to userland
    via read-only files self_freezing and parent_freezing.

    Other than the added cgroupfs files, this patch doesn't introduce any
    behavior change.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko

    Tejun Heo
     
  • freezer->state was an enum value - one of THAWED, FREEZING and FROZEN.
    As the scheduled full hierarchy support requires more than one
    freezing condition, switch it to mask of flags. If FREEZING is not
    set, it's thawed. FREEZING is set if freezing or frozen. If frozen,
    both FREEZING and FROZEN are set. Now that tasks can be attached to
    an already frozen cgroup, this also makes freezing condition checks
    more natural.

    This patch doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko

    Tejun Heo
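
A small sketch of how a flag mask maps back onto the three user-visible states described in this commit. The bit values and the names `STATE_FREEZING`, `STATE_FROZEN` and `state_name()` are assumptions made for illustration; the text does not give the kernel's actual values.

```c
#include <assert.h>

/* Illustrative bit values; the real ones are not stated in the commit text. */
enum {
	STATE_FREEZING = 1 << 0,	/* set while freezing or frozen */
	STATE_FROZEN   = 1 << 1,	/* set only once fully frozen */
};

/* Map a flag mask onto the name shown to userland. */
static const char *state_name(unsigned int state)
{
	/* Per the commit text, FROZEN is always accompanied by FREEZING. */
	assert(!(state & STATE_FROZEN) || (state & STATE_FREEZING));

	if (state & STATE_FROZEN)
		return "FROZEN";
	if (state & STATE_FREEZING)
		return "FREEZING";
	return "THAWED";
}
```

With a mask, "should this task freeze?" becomes a single test of the FREEZING bit, regardless of whether the group has already reached FROZEN.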
     
  • * Make freezer_change_state() take bool @freeze instead of enum
    freezer_state.

    * Separate out freezer_apply_state() out of freezer_change_state().
    This makes freezer_change_state() a rather silly thin wrapper. It
    will be filled with hierarchy handling later on.

    This patch doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko

    Tejun Heo
     
  • * Clean-up indentation and line-breaks. Drop the invalid comment
    about freezer->lock.

    * Make all internal functions take @freezer instead of both @cgroup
    and @freezer.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko

    Tejun Heo
     

27 Oct, 2012

1 commit

  • try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
    to ensure that a task doing STOPPED/TRACED -> RUNNING transition
    can't escape freezing. This mostly works, but ptrace_stop() does
    not necessarily call schedule(), it can change task->state back to
    RUNNING and check freezing() without any lock/barrier in between.

    We could add the necessary barrier, but this patch changes
    ptrace_stop() and do_signal_stop() to use freezable_schedule().
    This fixes the race because freezer_count() and freezer_should_skip()
    carefully avoid it.

    And this simplifies the code, try_to_freeze_tasks/update_if_frozen
    no longer need to use task_is_stopped_or_traced() checks with the
    non trivial assumptions. We can rely on the mechanism which was
    specially designed to mark the sleeping task as "frozen enough".

    v2: As Tejun pointed out, we can also change get_signal_to_deliver()
    and move try_to_freeze() up before 'relock' label.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Tejun Heo

    Oleg Nesterov
     

21 Oct, 2012

3 commits

  • freezer_read/write() used cgroup_lock_live_group() to synchronize
    against task migration into and out of the target cgroup.
    cgroup_lock_live_group() grabs the internal cgroup lock and using it
    from outside cgroup core leads to complex and fragile locking
    dependency issues which are difficult to resolve.

    Now that freezer_can_attach() is replaced with freezer_attach() and
    update_if_frozen() updated, nothing requires excluding migration
    against freezer state reads and changes.

    This patch removes cgroup_lock_live_group() and the matching
    cgroup_unlock() usages. The prone-to-bitrot, already outdated and
    unnecessary global lock hierarchy documentation is replaced with
    documentation in local scope.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Rafael J. Wysocki
    Cc: Li Zefan

    Tejun Heo
     
  • Locking will change such that migration can happen while
    freezer_read/write() is in progress. This means that
    update_if_frozen() can no longer assume that all tasks in the cgroup
    conform to the current freezer state - newly migrated tasks which
    haven't finished freezer_attach() yet might be in any state.

    This patch updates update_if_frozen() such that it no longer verifies
    task states against freezer state. It now simply decides whether
    FREEZING stage is complete.

    This removal of verification makes it meaningless to call from
    freezer_change_state(). Drop it and move the fast exit test from
    freezer_read() - the only left caller - to update_if_frozen().

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Rafael J. Wysocki
    Cc: Li Zefan

    Tejun Heo
     
  • cgroup_freezer is one of the few users of cgroup_subsys->can_attach()
    and uses it to prevent tasks from being migrated into or out of a
    frozen cgroup. This makes cgroup_freezer cumbersome to use especially
    when co-mounted with other controllers.

    ->can_attach() is problematic in general as it can make co-mounting
    multiple cgroups difficult - migrating tasks may fail for reasons
    completely irrelevant for other controllers. freezer_can_attach() in
    particular is more problematic because it messes with cgroup internal
    locking to ensure that the state verification performed at
    freezer_can_attach() stays valid until migration is complete.

    This patch replaces freezer_can_attach() with freezer_attach() so that
    tasks are always allowed to migrate - they are nudged into the
    conforming state from freezer_attach(). This means that there can be
    tasks which are being migrated which don't conform to the current
    cgroup_freezer state until freezer_attach() is complete. Under the
    current locking scheme, the only such place is freezer_fork() which is
    updated to handle such window.

    While this patch doesn't remove the use of internal cgroup locking
    from freezer_read/write() paths, it removes the requirement to keep
    the freezer state constant while migrating and enables such change.

    Note that this creates a userland visible behavior change - FROZEN
    cgroup can no longer be used to lock migrations in and out of the
    cgroup. This behavior change is intended. I don't think the feature
    is necessary - userland should coordinate accesses to cgroup fs anyway
    - and even if the feature is needed cgroup_freezer is the completely
    wrong place to implement it.

    Signed-off-by: Tejun Heo
    LKML-Reference:
    Cc: Matt Helsley
    Cc: Oleg Nesterov
    Cc: Rafael J. Wysocki
    Cc: Li Zefan

    Tejun Heo
     

17 Oct, 2012

3 commits

  • cgroup_freezer doesn't transition from FREEZING to FROZEN if the
    cgroup contains PF_NOFREEZE tasks or tasks sleeping with
    PF_FREEZER_SKIP set.

    Only kernel tasks can be non-freezable (PF_NOFREEZE) and there's
    nothing cgroup_freezer or userland can do about or to it. It's
    pointless to stall the transition for PF_NOFREEZE tasks.

    PF_FREEZER_SKIP indicates that the task can be skipped when
    determining whether frozen state is reached. A task with
    PF_FREEZER_SKIP is guaranteed to perform try_to_freeze() after it
    wakes up and can be considered frozen much like stopped or traced
    tasks. Note that a vfork parent uses PF_FREEZER_SKIP while waiting
    for the child.

    This updates update_if_frozen() such that it only considers freezable
    tasks and treats %true freezer_should_skip() tasks as frozen.

    This allows cgroups w/ kthreads and vfork parents to successfully reach
    FROZEN state.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Rafael J. Wysocki

    Tejun Heo
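
The completion check described in this commit can be modelled as follows. This is a user-space sketch under stated assumptions: `struct task` and its three booleans are hypothetical stand-ins for PF_NOFREEZE, the frozen state and a true freezer_should_skip(), and `group_frozen()` is not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the per-task bits the commit talks about. */
struct task {
	bool nofreeze;		/* models PF_NOFREEZE: cannot be frozen at all */
	bool frozen;		/* task has actually entered the frozen state */
	bool should_skip;	/* models a true freezer_should_skip() */
};

/* Return true if a FREEZING group may be reported as FROZEN. */
static bool group_frozen(const struct task *tasks, size_t n)
{
	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct task *t = &tasks[i];

		if (t->nofreeze)
			continue;	/* non-freezable tasks never stall the transition */
		if (!t->frozen && !t->should_skip)
			return false;	/* a freezable task has not settled yet */
	}
	return true;
}
```

A group holding only a non-freezable task, a frozen task and a skippable sleeper counts as frozen in this model; one freezable task that is neither frozen nor skippable is enough to keep it in FREEZING.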
     
  • try_to_freeze_cgroup() has condition checks which are intended to fail
    the write operation to freezer.state if there are tasks which can't be
    frozen. The condition checks have been broken for quite some time
    now. freeze_task() returns %false if the target task can't be frozen,
    so num_cant_freeze_now is never incremented.

    In addition, strangely, cgroup freezing proceeds even after the write
    is failed, which is rather broken.

    This patch rips out the non-working code intended to fail the write to
    freezer.state when the cgroup contains non-freezable tasks and makes
    it official that writes to freezer.state succeed whether there are
    non-freezable tasks in the cgroup or not.

    This leaves is_task_frozen_enough() with only one user -
    update_if_frozen(). Collapse it into the caller. Note that this
    removes an extra call to freezing().

    This doesn't cause any userland behavior changes.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Rafael J. Wysocki

    Tejun Heo
     
  • cgroup core has a bug which violates a basic rule about event
    notifications - when a new entity needs to be added, you add that to
    the notification list first and then make the new entity conform to
    the current state. If done in the reverse order, an event happening
    in between will be lost.

    cgroup_subsys->fork() is invoked way before the new task is added to
    the css_set. Currently, cgroup_freezer is the only user of ->fork()
    and uses it to make new tasks conform to the current state of the
    freezer. If FROZEN state is requested while fork is in progress
    between cgroup_fork_callbacks() and cgroup_post_fork(), the child
    could escape freezing - the cgroup isn't frozen when ->fork() is
    called and the freezer couldn't see the new task on the css_set.

    This patch moves cgroup_subsys->fork() invocation to
    cgroup_post_fork() after the new task is added to the css_set.
    cgroup_fork_callbacks() is removed.

    Because now a task may be migrated during cgroup_subsys->fork(),
    freezer_fork() is updated so that it adheres to the usual RCU locking
    and the rather pointless comment on why locking can be different there
    is removed (if it doesn't make anything simpler, why even bother?).

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Rafael J. Wysocki
    Cc: stable@vger.kernel.org

    Tejun Heo
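
The ordering rule from the first paragraph of this commit (register with the notification list first, then conform to the current state) can be illustrated with a sequential model. Everything here is hypothetical: `struct group`, `group_register()`, `group_set_frozen()` and `join_safe()` are illustrative names, and the "race" is simulated by calling the steps in a chosen order rather than by real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MEMBERS 8

struct group {
	bool frozen;			/* current group-wide state */
	bool *member[MAX_MEMBERS];	/* per-member copies kept in sync */
	size_t nr;
};

/* An event: update the group state and notify every registered member. */
static void group_set_frozen(struct group *g, bool frozen)
{
	g->frozen = frozen;
	for (size_t i = 0; i < g->nr; i++)
		*g->member[i] = frozen;
}

/* Add a member to the notification list. */
static void group_register(struct group *g, bool *member)
{
	assert(g->nr < MAX_MEMBERS);
	g->member[g->nr++] = member;
}

/* Correct order: register first, then conform to the current state. */
static void join_safe(struct group *g, bool *member)
{
	group_register(g, member);
	*member = g->frozen;
}
```

If a caller instead samples `g->frozen`, an event then fires, and only afterwards the caller registers and applies the stale sample, the member stays thawed while the group is frozen. With the order used in `join_safe()`, an event arriving between the two steps is either delivered through the list or picked up by the final read.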
     

15 Sep, 2012

1 commit

  • Currently, cgroup hierarchy support is a mess. cpu related subsystems
    behave correctly - configuration, accounting and control on a parent
    properly cover its children. blkio and freezer completely ignore
    hierarchy and treat all cgroups as if they're directly under the root
    cgroup. Others show yet different behaviors.

    These differing interpretations of cgroup hierarchy make using cgroup
    confusing and make it impossible to co-mount controllers into the same
    hierarchy and obtain sane behavior.

    Eventually, we want full hierarchy support from all subsystems and
    probably a unified hierarchy. Users using separate hierarchies
    expecting completely different behaviors depending on the mounted
    subsystem is detrimental to making any progress on this front.

    This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
    for controllers which are lacking in hierarchy support. The goal of
    this patch is two-fold.

    * Move users away from using hierarchy on currently non-hierarchical
    subsystems, so that implementing proper hierarchy support on those
    doesn't surprise them.

    * Keep track of which controllers are broken how and nudge the
    subsystems to implement proper hierarchy support.

    For now, start with a single warning message. We can whine louder
    later on.

    v2: Fixed a typo spotted by Michal. Warning message updated.

    v3: Updated memcg part so that it doesn't generate warning in the
    cases where .use_hierarchy=false doesn't make the behavior
    different from root.use_hierarchy=true. Fixed a typo spotted by
    Glauber.

    v4: Check ->broken_hierarchy after cgroup creation is complete so that
    ->create() can affect the result per Michal. Dropped unnecessary
    memcg root handling per Michal.

    Signed-off-by: Tejun Heo
    Acked-by: Michal Hocko
    Acked-by: Li Zefan
    Acked-by: Serge E. Hallyn
    Cc: Glauber Costa
    Cc: Peter Zijlstra
    Cc: Paul Turner
    Cc: Johannes Weiner
    Cc: Thomas Graf
    Cc: Vivek Goyal
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Neil Horman
    Cc: Aneesh Kumar K.V

    Tejun Heo
     

02 Apr, 2012

1 commit

  • Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
    net_cls and device controllers to use the new cftype based interface.
    Termination entry is added to cftype arrays and populate callbacks are
    replaced with cgroup_subsys->base_cftypes initializations.

    This is functionally identical transformation. There shouldn't be any
    visible behavior change.

    memcg is rather special and will be converted separately.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: "David S. Miller"
    Cc: Vivek Goyal

    Tejun Heo
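
The "termination entry" idea mentioned in this commit can be sketched as an array that ends with an empty-name element instead of carrying a separate length. This is a user-space illustration only: `struct cftype_model`, `files[]`, `read_state()` and `count_files()` are hypothetical and much smaller than the kernel's cftype.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NAME 64

/* Cut-down model of a control-file description. */
struct cftype_model {
	char name[MAX_NAME];
	int (*read)(void);
};

/* Placeholder read handler used by both example entries. */
static int read_state(void)
{
	return 0;
}

/* Statically initialized table; the last entry has an empty name. */
static const struct cftype_model files[] = {
	{ .name = "state", .read = read_state },
	{ .name = "self_freezing", .read = read_state },
	{ .name = "" }	/* terminate */
};

/* Count entries by walking until the empty-name terminator. */
static size_t count_files(const struct cftype_model *cft)
{
	size_t n = 0;

	assert(cft != NULL);
	for (; cft->name[0] != '\0'; cft++)
		n++;
	return n;
}
```

A table like this can be handed to the core as a single pointer, which is what makes replacing a populate callback with a static initializer possible.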
     

03 Feb, 2012

1 commit

  • The argument is not used at all, and it's not necessary, because
    a specific callback handler of course knows which subsys it
    belongs to.

    Now only ->populate() takes this argument, because the handlers of
    this callback always call cgroup_add_file()/cgroup_add_files().

    So we reduce a few lines of code, though the shrinking of object size
    is minimal.

    16 files changed, 113 insertions(+), 162 deletions(-)

    text data bss dec hex filename
    5486240 656987 7039960 13183187 c928d3 vmlinux.o.orig
    5486170 656987 7039960 13183117 c9288d vmlinux.o

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

10 Jan, 2012

1 commit

  • * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    cgroup: fix to allow mounting a hierarchy by name
    cgroup: move assignement out of condition in cgroup_attach_proc()
    cgroup: Remove task_lock() from cgroup_post_fork()
    cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
    cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
    cgroup: only need to check oldcgrp==newgrp once
    cgroup: remove redundant get/put of task struct
    cgroup: remove redundant get/put of old css_set from migrate
    cgroup: Remove unnecessary task_lock before fetching css_set on migration
    cgroup: Drop task_lock(parent) on cgroup_fork()
    cgroups: remove redundant get/put of css_set from css_set_check_fetched()
    resource cgroups: remove bogus cast
    cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
    cgroup, cpuset: don't use ss->pre_attach()
    cgroup: don't use subsys->can_attach_task() or ->attach_task()
    cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
    cgroup: improve old cgroup handling in cgroup_attach_proc()
    cgroup: always lock threadgroup during migration
    threadgroup: extend threadgroup_lock() to cover exit and exec
    threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
    ...

    Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
    fix a css_set not found bug in cgroup_attach_proc" that already
    mentioned that the bug is fixed (differently) in Tejun's cgroup
    patchset. This one, in other words.

    Linus Torvalds
     

22 Dec, 2011

1 commit

  • * master: (848 commits)
    SELinux: Fix RCU deref check warning in sel_netport_insert()
    binary_sysctl(): fix memory leak
    mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
    ipmi_watchdog: restore settings when BMC reset
    oom: fix integer overflow of points in oom_badness
    memcg: keep root group unchanged if creation fails
    nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
    nilfs2: unbreak compat ioctl
    cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
    evm: prevent racing during tfm allocation
    evm: key must be set once during initialization
    mmc: vub300: fix type of firmware_rom_wait_states module parameter
    Revert "mmc: enable runtime PM by default"
    mmc: sdhci: remove "state" argument from sdhci_suspend_host
    x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
    IB/qib: Correct sense on freectxts increment and decrement
    RDMA/cma: Verify private data length
    cgroups: fix a css_set not found bug in cgroup_attach_proc
    oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
    Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
    ...

    Conflicts:
    kernel/cgroup_freezer.c

    Rafael J. Wysocki
     

13 Dec, 2011

2 commits

  • Now that subsys->can_attach() and attach() take @tset instead of
    @task, they can handle per-task operations. Convert
    ->can_attach_task() and ->attach_task() users to use ->can_attach()
    and attach() instead. Most conversions are straight-forward.
    Noteworthy changes are,

    * In cgroup_freezer, remove unnecessary NULL assignments to unused
    methods. It's useless and very prone to get out of sync, which
    already happened.

    * In cpuset, PF_THREAD_BOUND test is checked for each task. This
    doesn't make any practical difference but is conceptually cleaner.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Frederic Weisbecker
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: James Morris
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Tejun Heo
     
  • Currently, there's no way to pass multiple tasks to cgroup_subsys
    methods necessitating the need for separate per-process and per-task
    methods. This patch introduces cgroup_taskset which can be used to
    pass multiple tasks and their associated cgroups to cgroup_subsys
    methods.

    Three methods - can_attach(), cancel_attach() and attach() - are
    converted to use cgroup_taskset. This unifies passed parameters so
    that all methods have access to all information. Conversions in this
    patchset are identical and don't introduce any behavior change.

    -v2: documentation updated as per Paul Menage's suggestion.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Frederic Weisbecker
    Acked-by: Paul Menage
    Acked-by: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: James Morris

    Tejun Heo
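
A rough sketch of why passing a set of tasks removes the need for separate per-task methods: one callback can walk every task being migrated. The types and helpers below (`struct taskset_model`, `taskset_first()`, `taskset_next()`, `can_attach_all()`) are hypothetical user-space stand-ins and do not reproduce the kernel's cgroup_taskset interface.

```c
#include <assert.h>
#include <stddef.h>

struct task_model {
	int pid;
};

/* Hypothetical stand-in for a set of tasks being migrated together. */
struct taskset_model {
	struct task_model *tasks;
	size_t nr;
	size_t cur;	/* iteration cursor */
};

/* Reset the cursor and return the first task, or NULL if the set is empty. */
static struct task_model *taskset_first(struct taskset_model *ts)
{
	ts->cur = 0;
	return ts->nr ? &ts->tasks[0] : NULL;
}

/* Advance the cursor and return the next task, or NULL at the end. */
static struct task_model *taskset_next(struct taskset_model *ts)
{
	assert(ts->cur < ts->nr);
	ts->cur++;
	return ts->cur < ts->nr ? &ts->tasks[ts->cur] : NULL;
}

/* A can_attach()-style check that sees every task in a single call. */
static int can_attach_all(struct taskset_model *ts, int banned_pid)
{
	for (struct task_model *t = taskset_first(ts); t; t = taskset_next(ts))
		if (t->pid == banned_pid)
			return -1;	/* reject the whole migration */
	return 0;
}
```

Because the check receives the whole set, it can accept or reject a migration atomically instead of being called once per thread.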
     

25 Nov, 2011

1 commit

  • 2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state
    transitions) removed is_task_frozen_enough and replaced it with a simple
    frozen call. This, however, breaks freezing for a group with stopped tasks
    because those cannot be frozen and so the group remains in CGROUP_FREEZING
    state (update_if_frozen doesn't count stopped tasks) and never reaches
    CGROUP_FROZEN.

    Let's add is_task_frozen_enough back and use it at the original locations
    (update_if_frozen and try_to_freeze_cgroup). Semantically we consider
    stopped tasks as frozen enough so we should consider both cases when
    testing frozen tasks.

    Testcase:
    mkdir /dev/freezer
    mount -t cgroup -o freezer none /dev/freezer
    mkdir /dev/freezer/foo
    sleep 1h &
    pid=$!
    kill -STOP $pid
    echo $pid > /dev/freezer/foo/tasks
    echo FROZEN > /dev/freezer/foo/freezer.state
    while true
    do
        cat /dev/freezer/foo/freezer.state
        [ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break
        sleep 1
    done
    echo OK

    Signed-off-by: Michal Hocko
    Acked-by: Li Zefan
    Cc: Tomasz Buchert
    Cc: Paul Menage
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Tejun Heo

    Michal Hocko
     

22 Nov, 2011

5 commits

  • After "freezer: make freezing() test freeze conditions in effect
    instead of TIF_FREEZE", freezing() returns authoritative answer on
    whether the current task should freeze or not and freeze_task()
    doesn't need or use @sig_only. Remove it.

    While at it, rewrite function comment for freeze_task() and rename
    @sig_only to @user_only in try_to_freeze_tasks().

    This patch doesn't cause any functional change.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • Using TIF_FREEZE for freezing worked when there was only single
    freezing condition (the PM one); however, now there is also the
    cgroup_freezer and single bit flag is getting clumsy.
    thaw_processes() is already testing whether cgroup freezing is in
    effect to avoid thawing tasks which were frozen by both PM and cgroup
    freezers.

    This is racy (nothing prevents race against cgroup freezing) and
    fragile. A much simpler way is to test actual freeze conditions from
    freezing() - ie. directly test whether PM or cgroup freezing is in
    effect.

    This patch adds variables to indicate whether and what type of
    freezing conditions are in effect and reimplements freezing() such
    that it directly tests whether any of the two freezing conditions is
    active and the task should freeze. On fast path, freezing() is still
    very cheap - it only tests system_freezing_cnt.

    This makes the clumsy dancing around TIF_FREEZE unnecessary and
    freeze/thaw operations more usual - updating state variables for the
    new state and nudging target tasks so that they notice the new state
    and comply. As long as the nudging happens after state update, it's
    race-free.

    * This allows use of freezing() in freeze_task(). Replace the open
    coded tests with freezing().

    * p != current test is added to warning printing conditions in
    try_to_freeze_tasks() failure path. This is necessary as freezing()
    is now true for the task which initiated freezing too.

    -v2: Oleg pointed out that re-freezing FROZEN cgroup could increment
    system_freezing_cnt. Fixed.

    Signed-off-by: Tejun Heo
    Acked-by: Paul Menage (for the cgroup portions)

    Tejun Heo
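
The fast-path/slow-path split described in this commit can be sketched with C11 atomics. This is a simplified user-space model, assuming a single global PM flag and a per-task cgroup flag: `struct task`, `cgroup_start_freezing()` and `cgroup_stop_freezing()` are hypothetical, and only the name system_freezing_cnt comes from the text above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of freezing conditions currently in effect, system-wide. */
static atomic_int system_freezing_cnt;

/* Whether the PM freezer is active (reduced to one flag in this model). */
static bool pm_freezing;

struct task {
	bool nofreeze;		/* task is exempt from freezing */
	bool cgroup_freezing;	/* task's cgroup is freezing or frozen */
};

/* Slow path: test the actual freezing conditions for this task. */
static bool freezing_slow_path(const struct task *p)
{
	if (p->nofreeze)
		return false;
	return pm_freezing || p->cgroup_freezing;
}

/* Fast path: a single counter read when nothing is freezing anywhere. */
static bool freezing(const struct task *p)
{
	if (atomic_load(&system_freezing_cnt) == 0)
		return false;
	return freezing_slow_path(p);
}

/* Re-freezing an already freezing group must not bump the counter again. */
static void cgroup_start_freezing(struct task *p)
{
	if (!p->cgroup_freezing) {
		p->cgroup_freezing = true;
		atomic_fetch_add(&system_freezing_cnt, 1);
	}
}

static void cgroup_stop_freezing(struct task *p)
{
	if (p->cgroup_freezing) {
		int old = atomic_fetch_sub(&system_freezing_cnt, 1);

		assert(old > 0);
		p->cgroup_freezing = false;
	}
}
```

The guard in `cgroup_start_freezing()` mirrors the -v2 note: without it, writing the frozen state twice would leave the counter permanently elevated and defeat the fast path.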
     
  • TIF_FREEZE will be removed soon and freezing() will directly test
    whether any freezing condition is in effect. Make the following
    changes in preparation.

    * Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it
    return bool.

    * Make cgroup_freezing() access task_freezer() under rcu read lock
    instead of task_lock(). This makes the state dereferencing racy
    against task moving to another cgroup; however, it was already racy
    without this change as ->state dereference wasn't synchronized.
    This will be later dealt with using attach hooks.

    * freezer->state is now set before trying to push tasks into the
    target state.

    -v2: Oleg pointed out that freezer_change_state() was setting
    freezer->state incorrectly to CGROUP_FROZEN instead of
    CGROUP_FREEZING. Fixed.

    -v3: Matt pointed out that setting CGROUP_FROZEN used to always invoke
    try_to_freeze_cgroup() regardless of the current state. Patch
    updated such that the actual freeze/thaw operations are always
    performed on invocation. This shouldn't make any difference
    unless something is broken.

    Signed-off-by: Tejun Heo
    Acked-by: Paul Menage
    Cc: Li Zefan
    Cc: Oleg Nesterov

    Tejun Heo
     
  • Currently freezing (TIF_FREEZE) and frozen (PF_FROZEN) states are
    interlocked - freezing is set to request freeze and when the task
    actually freezes, it clears freezing and sets frozen.

    This interlocking makes things more complex than necessary - freezing
    doesn't mean there's freezing condition in effect and frozen doesn't
    match the task actually entering and leaving frozen state (it's
    cleared by the thawing task).

    This patch makes freezing indicate that freeze condition is in effect.
    A task enters and stays frozen if freezing. This makes PF_FROZEN
    manipulation done only by the task itself and prevents wakeup from
    __thaw_task() leaking outside of refrigerator.

    The only place which needs to tell freezing && !frozen is
    try_to_freeze_tasks() to whine about tasks which don't enter frozen.
    It's updated to test the condition explicitly.

    With the change, frozen() state may linger after __thaw_task() until
    the task wakes up and exits fridge. This can trigger BUG_ON() in
    update_if_frozen(). Work it around by testing freezing() && frozen()
    instead of frozen().

    -v2: Oleg pointed out missing re-check of freezing() when trying to
    clear FROZEN and possible spurious BUG_ON() trigger in
    update_if_frozen(). Both fixed.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Paul Menage

    Tejun Heo
     
  • thaw_process() now has only internal users - system and cgroup
    freezers. Remove the unnecessary return value, rename, unexport and
    collapse __thaw_process() into it. This will help further updates to
    the freezer code.

    -v3: oom_kill grew a use of thaw_process() while this patch was
    pending. Convert it to use __thaw_task() for now. In the longer
    term, this should be handled by allowing tasks to die if killed
    even if they're frozen.

    -v2: minor style update as suggested by Matt.

    Signed-off-by: Tejun Heo
    Cc: Paul Menage
    Cc: Matt Helsley

    Tejun Heo
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

27 May, 2011

1 commit

  • Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

    Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
    for cgroups's subsystem interface. Unlike can_attach and attach, these
    are for per-thread operations, to be called potentially many times when
    attaching an entire threadgroup.

    Also, the old "bool threadgroup" interface is removed, as replaced by
    this. All subsystems are modified for the new interface - of note is
    cpuset, which requires from/to nodemasks for attach to be globally scoped
    (though per-cpuset would work too) to persist from its pre_attach to
    attach_task and attach.

    This is a pre-patch for cgroup-procs-writable.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

28 Oct, 2010

3 commits

    There are 4 state transitions possible for a freezer. Only the FREEZING ->
    FROZEN transition is done lazily. This patch allows update_freezer_state
    to perform only this transition and renames the function to
    update_if_frozen.

    Moreover, the is_task_frozen_enough function is removed and every
    occurrence of it is replaced with frozen(). Therefore for a group to
    become FROZEN every task must be frozen.

    The previous version could trigger the following bug: When a cgroup is in
    the process of freezing (but none of its tasks are frozen yet),
    update_freezer_state() (called from freezer_read or freezer_write) would
    incorrectly report that a group is 'THAWED' (because nfrozen = 0),
    allowing the transition FREEZING -> THAWED without writing anything to
    'freezer.state'. This is incorrect according to the documentation. This
    could result in a 'THAWED' cgroup with frozen tasks inside.

    Code to reproduce this bug is available here:
    http://pentium.hopto.org/~thinred/repos/linux-misc/freezer_bug2.c

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Tomasz Buchert
    Cc: Matt Helsley
    Cc: Paul Menage
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomasz Buchert
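
The rule in this commit, that the read-side helper may only complete FREEZING -> FROZEN while the other transitions require an explicit write, can be sketched as a tiny state machine. The model is hypothetical: `struct group`, its counters and `write_state()` are illustrative, and only the name update_if_frozen is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

enum gstate { G_THAWED, G_FREEZING, G_FROZEN };

struct group {
	enum gstate state;
	size_t nr_tasks;
	size_t nr_frozen;
};

/* Lazy step run on read: only FREEZING -> FROZEN is ever performed here. */
static enum gstate update_if_frozen(struct group *g)
{
	assert(g->nr_frozen <= g->nr_tasks);
	if (g->state == G_FREEZING && g->nr_frozen == g->nr_tasks)
		g->state = G_FROZEN;
	return g->state;
}

/* Explicit write: the remaining transitions happen only here. */
static void write_state(struct group *g, int freeze)
{
	if (!freeze)
		g->state = G_THAWED;	/* FREEZING or FROZEN -> THAWED */
	else if (g->state == G_THAWED)
		g->state = G_FREEZING;	/* THAWED -> FREEZING */
}
```

In this sketch a FREEZING group with zero frozen tasks stays FREEZING on read, which is the case the earlier code misreported as THAWED.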
     
  • It is possible to move a task from its cgroup even if this group is
    'FREEZING'. This results in a nasty bug - the moved task will become
    frozen OUTSIDE its original cgroup and will remain in a permanent 'D'
    state.

    This patch allows a task to migrate only between THAWED cgroups.

    This behavior was observed and easily reproduced on a single core laptop.
    Notice that reproducibility depends highly on the machine used. A program
    and instructions on how to reproduce the bug can be fetched from:
    http://pentium.hopto.org/~thinred/repos/linux-misc/freezer_bug.c

    Signed-off-by: Tomasz Buchert
    Cc: Matt Helsley
    Cc: Paul Menage
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomasz Buchert
     
  • The root freezer_state is always CGROUP_THAWED so we can remove the
    special case from the code. The test itself can be handy and is extracted
    to a static function.

    Signed-off-by: Tomasz Buchert
    Cc: Matt Helsley
    Cc: Paul Menage
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomasz Buchert
     

11 May, 2010

1 commit


30 Apr, 2010

1 commit

  • Add an RCU read-side critical section to suppress this false
    positive.

    Located-by: Eric Paris
    Signed-off-by: Paul E. McKenney
    Acked-by: Li Zefan
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: eric.dumazet@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

05 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
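
    The script itself is not reproduced here. As a rough illustration of
    the order-detection idea in the second point, a check might look like
    the sketch below. Everything in it is an assumption:
    classify_include_order() is a hypothetical name, and "Christmas tree"
    is taken to mean non-decreasing line length, "rev-Xmas-tree"
    non-increasing line length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum include_order {
	ORDER_ALPHABETICAL,
	ORDER_XMAS_TREE,	/* assumed: names never get shorter */
	ORDER_REV_XMAS_TREE,	/* assumed: names never get longer */
	ORDER_NONE,		/* no pattern found: append at the end */
};

/* Classify how an existing include block is ordered (hypothetical helper). */
enum include_order classify_include_order(const char *const *names, size_t n)
{
	int alpha = 1, growing = 1, shrinking = 1;
	size_t i;

	assert(names != NULL || n == 0);
	for (i = 1; i < n; i++) {
		size_t prev = strlen(names[i - 1]);
		size_t cur = strlen(names[i]);

		if (strcmp(names[i - 1], names[i]) > 0)
			alpha = 0;
		if (prev > cur)
			growing = 0;
		if (prev < cur)
			shrinking = 0;
	}
	if (alpha)
		return ORDER_ALPHABETICAL;
	if (growing)
		return ORDER_XMAS_TREE;
	if (shrinking)
		return ORDER_REV_XMAS_TREE;
	return ORDER_NONE;
}
```

    Alphabetical order is tested first, so a block that satisfies several
    patterns at once (for example a single include) is treated as
    alphabetical.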

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Mar, 2010

1 commit

  • When the cgroup freezer is used to freeze tasks we do not want to thaw
    those tasks during resume. Currently we test the cgroup freezer
    state of the resuming tasks to see if the cgroup is FROZEN. If so
    then we don't thaw the task. However, the FREEZING state also indicates
    that the task should remain frozen.

    This also avoids a problem pointed out by Oren Ladaan: the freezer state
    transition from FREEZING to FROZEN is updated lazily when userspace reads
    or writes the freezer.state file in the cgroup filesystem. This means that
    resume will thaw tasks in cgroups which should be in the FROZEN state if
    there is no read/write of the freezer.state file to trigger this
    transition before suspend.

    NOTE: Another "simple" solution would be to always update the cgroup
    freezer state during resume. However it's a bad choice for several reasons:
    Updating the cgroup freezer state is somewhat expensive because it requires
    walking all the tasks in the cgroup and checking if they are each frozen.
    Worse, this could easily make resume run in N^2 time where N is the number
    of tasks in the cgroup. Finally, updating the freezer state from this code
    path requires trickier locking because of the way locks must be ordered.

    Instead of updating the freezer state we rely on the fact that lazy
    updates only manage the transition from FREEZING to FROZEN. We know that
    a cgroup with the FREEZING state may actually be FROZEN so test for that
    state too. This makes sense in the resume path even for partially-frozen
    cgroups -- those that really are FREEZING but not FROZEN.
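
    The changed test can be modelled in userspace as follows. This is a
    minimal sketch, not the kernel code: skip_thaw_old() and
    skip_thaw_new() are hypothetical names for the check the resume path
    makes before thawing a task.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; function names here are hypothetical. */
enum freezer_state { CGROUP_THAWED, CGROUP_FREEZING, CGROUP_FROZEN };

/* Old test: only a cgroup already marked FROZEN keeps its tasks frozen. */
bool skip_thaw_old(enum freezer_state state)
{
	return state == CGROUP_FROZEN;
}

/*
 * New test: FREEZING counts too, because the lazy FREEZING -> FROZEN
 * update may simply not have run before suspend.
 */
bool skip_thaw_new(enum freezer_state state)
{
	return state == CGROUP_FREEZING || state == CGROUP_FROZEN;
}

/* Usage: the two tests differ only for the FREEZING state. */
void skip_thaw_example(void)
{
	assert(!skip_thaw_old(CGROUP_FREEZING));
	assert(skip_thaw_new(CGROUP_FREEZING));
	assert(skip_thaw_old(CGROUP_FROZEN) == skip_thaw_new(CGROUP_FROZEN));
	assert(skip_thaw_old(CGROUP_THAWED) == skip_thaw_new(CGROUP_THAWED));
}
```

    The check stays a constant-time comparison, so it avoids the task
    walk that updating the freezer state during resume would need.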

    Reported-by: Oren Ladaan
    Signed-off-by: Matt Helsley
    Cc: stable@kernel.org
    Signed-off-by: Rafael J. Wysocki

    Matt Helsley
     

24 Sep, 2009

1 commit

  • Alter the ss->can_attach and ss->attach functions to be able to deal with
    a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a
    pre-patch to cgroup-procs-writable.patch.)

    Currently, the new mode of the attach function can only tell the
    subsystem about the old cgroup of the threadgroup leader. No subsystem
    currently needs that information for each thread that's being moved, but
    if one were to be added (for example, one that counts tasks within a
    group) this part would need to be reworked to tell the subsystem the
    right information.

    [hidave.darkstar@gmail.com: fix build]
    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Acked-by: Li Zefan
    Reviewed-by: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

13 Nov, 2008

1 commit

    With this change, the control file 'freezer.state' doesn't exist in the
    root cgroup, making the root cgroup unfreezable.

    I think it's reasonable to disallow freezing tasks in the root cgroup. And
    then we can avoid fork overhead when the freezer subsystem is compiled in
    but not used.

    Also make writing an invalid value to freezer.state return EINVAL rather
    than EIO. This is more consistent with other cgroup subsystems.
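
    The error-code convention can be sketched as a userspace model. This
    is an illustration only: parse_freezer_state() is a hypothetical
    name, and the model assumes that "THAWED" and "FROZEN" are the only
    values accepted on write.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; the function name is hypothetical. */
enum freezer_state { CGROUP_THAWED, CGROUP_FREEZING, CGROUP_FROZEN };

/* Returns 0 and stores the goal state, or -EINVAL for any other string. */
int parse_freezer_state(const char *buf, enum freezer_state *goal)
{
	assert(buf != NULL && goal != NULL);
	if (strcmp(buf, "THAWED") == 0) {
		*goal = CGROUP_THAWED;
		return 0;
	}
	if (strcmp(buf, "FROZEN") == 0) {
		*goal = CGROUP_FROZEN;
		return 0;
	}
	return -EINVAL;	/* a bad argument, not an I/O failure */
}
```

    On failure the goal state is left untouched, so a rejected write has
    no side effects in this model.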

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Cc: Cedric Le Goater
    Cc: Paul Menage
    Cc: Matt Helsley
    Cc: "Serge E. Hallyn"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan