08 Oct, 2016

1 commit

  • Commit 74070542099c ("oom, suspend: fix oom_reaper vs.
    oom_killer_disable race") has worked around an existing race between
    oom_killer_disable and oom_reaper by adding another round of
    try_to_freeze_tasks after the oom killer was disabled. This was the
    easiest thing to do for a late 4.7 fix. Let's fix it properly now.

    After "oom: keep mm of the killed task available" we no longer have to
    call exit_oom_victim from the oom reaper because we have stable mm
    available and hide the oom_reaped mm by MMF_OOM_SKIP flag. So let's
    remove exit_oom_victim and the race described in the above commit
    doesn't exist anymore.

    Unfortunately this alone is not sufficient for the oom_killer_disable
    usecase because now we do not have any reliable way to reach
    exit_oom_victim (the victim might get stuck on a way to exit for an
    unbounded amount of time). OOM killer can cope with that by checking mm
    flags and move on to another victim but we cannot do the same for
    oom_killer_disable as we would lose the guarantee of no further
    interference of the victim with the rest of the system. What we can do
    instead is to cap the maximum time the oom_killer_disable waits for
    victims. The only current user of this function (pm suspend) already
    has a concept of timeout for back off so we can reuse the same value
    there.
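
    The bounded wait described above can be sketched in user space as
    follows. This is only a model under stated assumptions: the struct,
    the tick-based timeout and the exits_per_tick parameter are
    hypothetical stand-ins, not the kernel's data structures or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the OOM state; field names are illustrative only. */
struct oom_state_sketch {
    int victims;    /* OOM victims that have not finished exiting yet */
    bool disabled;  /* plays the role of oom_killer_disabled */
};

/*
 * Disable the OOM killer and wait at most timeout_ticks for victims to exit.
 * exits_per_tick simulates how many victims exit during one wait interval.
 * Returns true if all victims are gone. On timeout the OOM killer is
 * re-enabled and false is returned so the caller can back off.
 */
static bool oom_killer_disable_sketch(struct oom_state_sketch *s,
                                      int timeout_ticks, int exits_per_tick)
{
    assert(s && exits_per_tick >= 0);
    s->disabled = true;
    for (int tick = 0; tick < timeout_ticks && s->victims > 0; tick++) {
        int exited = exits_per_tick < s->victims ? exits_per_tick : s->victims;
        s->victims -= exited;
    }
    if (s->victims > 0) {
        s->disabled = false;    /* timed out: give up and re-enable */
        return false;
    }
    return true;
}
```

    The point of the design is that the caller gets a definite answer in
    bounded time, even when a victim is stuck on its way to exit.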

    Let's drop set_freezable for the oom_reaper kthread because it is no
    longer needed as the reaper doesn't wake or thaw any processes.

    Link: http://lkml.kernel.org/r/1472119394-11342-7-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Tetsuo Handa
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

09 Jul, 2016

1 commit


02 Jul, 2016

1 commit


25 Jun, 2016

1 commit

  • Tetsuo has reported the following potential oom_killer_disable vs.
    oom_reaper race:

    (1) freeze_processes() starts freezing user space threads.
    (2) Somebody (maybe a kernel thread) calls out_of_memory().
    (3) The OOM killer calls mark_oom_victim() on a user space thread
    P1 which is already in __refrigerator().
    (4) oom_killer_disable() sets oom_killer_disabled = true.
    (5) P1 leaves __refrigerator() and enters do_exit().
    (6) The OOM reaper calls exit_oom_victim(P1) before P1 can call
    exit_oom_victim(P1).
    (7) oom_killer_disable() returns while P1 has not yet finished.
    (8) P1 performs IO/interferes with the freezer.

    This situation is unfortunate. We cannot move oom_killer_disable after
    all the freezable kernel threads are frozen because the oom victim might
    depend on some of those kthreads to make a forward progress to exit so
    we could deadlock. It is also far from trivial to teach the oom_reaper
    to not call exit_oom_victim() because then we would lose a guarantee of
    the OOM killer and oom_killer_disable forward progress because
    exit_mm->mmput might block and never call exit_oom_victim.

    It seems the easiest way forward is to work around this race by calling
    try_to_freeze_tasks again after oom_killer_disable. This will make sure
    that all the tasks are frozen or it bails out.

    Fixes: 449d777d7ad6 ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper")
    Link: http://lkml.kernel.org/r/1466597634-16199-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reported-by: Tetsuo Handa
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

11 Feb, 2016

1 commit

  • Wall time obtained from do_gettimeofday gives a 32-bit timeval which can only
    represent time until January 2038. This patch moves to ktime_t, a 64-bit time.

    Also, wall time is susceptible to sudden jumps when the user sets the time or
    NTP adjusts it. Boot time increases constantly, so it is better suited for
    subtracting two timestamps.
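
    A user-space analogue of the idea, as a rough sketch: subtract two
    timestamps in 64-bit arithmetic. The helper name is hypothetical, and
    the samples are assumed to come from a clock that never jumps (for
    example a monotonic or boot-time clock) rather than from wall time.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Difference end - start in nanoseconds, computed entirely in 64 bits.
 * Both samples are assumed to come from the same non-jumping clock.
 */
static int64_t elapsed_ns_sketch(const struct timespec *start,
                                 const struct timespec *end)
{
    assert(start && end);
    return ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
           + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);
}
```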

    Signed-off-by: Abhilash Jindal
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Abhilash Jindal
     

12 Feb, 2015

2 commits

  • Commit 5695be142e20 ("OOM, PM: OOM killed task shouldn't escape PM
    suspend") has left a race window when OOM killer manages to
    note_oom_kill after freeze_processes checks the counter. The race
    window is quite small and really unlikely to be hit, so the partial
    solution was deemed sufficient at the time of submission.

    Tejun wasn't happy about this partial solution though and insisted on a
    full solution. That requires the full OOM and freezer's task freezing
    exclusion, though. This is done by this patch which introduces oom_sem
    RW lock and turns oom_killer_disable() into a full OOM barrier.

    oom_killer_disabled check is moved from the allocation path to the OOM
    level and we take oom_sem for reading for both the check and the whole
    OOM invocation.

    oom_killer_disable() takes oom_sem for writing so it waits for all
    currently running OOM killer invocations. Then it disables all further
    OOMs by setting oom_killer_disabled and checks for any oom victims.
    Victims are counted via mark_tsk_oom_victim resp. unmark_oom_victim. The
    last victim wakes up all waiters enqueued by oom_killer_disable().
    Therefore this function acts as the full OOM barrier.
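
    The victim accounting can be illustrated with a small user-space
    sketch using C11 atomics. The names below are hypothetical and the
    wake-up itself is reduced to a return value; the kernel code uses its
    own atomic and wait-queue primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of tasks currently marked as OOM victims (illustrative model). */
static atomic_int oom_victims_sketch;

/* Called when a task is selected as an OOM victim. */
static void mark_victim_sketch(void)
{
    atomic_fetch_add(&oom_victims_sketch, 1);
}

/*
 * Called when a victim is done. Returns true when the caller was the
 * last victim, i.e. the point at which waiters would be woken up.
 */
static bool unmark_victim_sketch(void)
{
    int before = atomic_fetch_sub(&oom_victims_sketch, 1);
    assert(before > 0);     /* unmark without a matching mark is a bug */
    return before == 1;
}
```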

    The page fault path is covered now as well although it was assumed to be
    safe before. As per Tejun, "We used to have freezing points deep in file
    system code which may be reacheable from page fault." so it would be
    better and more robust to not rely on freezing points here. Same applies
    to the memcg OOM killer.

    out_of_memory tells the caller whether the OOM was allowed to trigger and
    the callers are supposed to handle the situation. The page allocation
    path simply fails the allocation same as before. The page fault path will
    retry the fault (more on that later) and Sysrq OOM trigger will simply
    complain to the log.

    Normally there wouldn't be any unfrozen user tasks after
    try_to_freeze_tasks so the function will not block. But if there was an
    OOM killer racing with try_to_freeze_tasks and the OOM victim didn't
    finish yet then we have to wait for it. This should complete in a finite
    time, though, because

    - the victim cannot loop in the page fault handler (it would die
    on the way out from the exception)
    - it cannot loop in the page allocator because all the further
    allocation would fail and __GFP_NOFAIL allocations are not
    acceptable at this stage
    - it shouldn't be blocked on any locks held by frozen tasks
    (try_to_freeze expects lockless context) and kernel threads and
    work queues are not frozen yet

    Signed-off-by: Michal Hocko
    Suggested-by: Tejun Heo
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Cong Wang
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • While touching this area let's convert printk to pr_*. This also makes
    the printing of continuation lines done properly.

    Signed-off-by: Michal Hocko
    Acked-by: Tejun Heo
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Cong Wang
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

23 Oct, 2014

1 commit


22 Oct, 2014

2 commits

  • as per 0c740d0afc3b (introduce for_each_thread() to replace the buggy
    while_each_thread()) get rid of do_each_thread { } while_each_thread()
    construct and replace it by a less error prone for_each_thread.

    This patch doesn't introduce any user visible change.

    Suggested-by: Oleg Nesterov
    Signed-off-by: Michal Hocko
    Signed-off-by: Rafael J. Wysocki

    Michal Hocko
     
  • PM freezer relies on having all tasks frozen by the time devices are
    getting frozen so that no task will touch them while they are getting
    frozen. But OOM killer is allowed to kill an already frozen task in
    order to handle an OOM situation. In order to protect from late wake ups
    OOM killer is disabled after all tasks are frozen. This, however, still
    keeps a window open when a killed task didn't manage to die by the time
    freeze_processes finishes.

    Reduce the race window by checking all tasks after OOM killer has been
    disabled. This is still not race free completely unfortunately because
    oom_killer_disable cannot stop an already ongoing OOM killer so a task
    might still wake up from the fridge and get killed without
    freeze_processes noticing. Full synchronization of OOM and freezer is,
    however, too heavy weight for this highly unlikely case.

    Introduce and check oom_kills counter which gets incremented early when
    the allocator enters __alloc_pages_may_oom path and only check all the
    tasks if the counter changes during the freezing attempt. The counter
    is updated so early to reduce the race window since allocator checked
    oom_killer_disabled which is set by PM-freezing code. A false positive
    will push the PM-freezer into a slow path but that is not a big deal.
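
    The counter check can be sketched as below. This is a simplified
    user-space model with hypothetical names: the freezer snapshots the
    counter before it starts and only takes the slow re-check path if
    the value has changed by the time freezing finishes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Incremented whenever the allocator enters the OOM path (model only). */
static atomic_int oom_kills_sketch;

static void note_oom_kill_sketch(void)
{
    atomic_fetch_add(&oom_kills_sketch, 1);
}

static int oom_kills_count_sketch(void)
{
    return atomic_load(&oom_kills_sketch);
}

/*
 * saved is the counter value sampled before the freezing attempt.
 * Returns true if an OOM kill may have raced with freezing, in which
 * case all tasks have to be checked again.
 */
static bool oom_kill_raced_sketch(int saved)
{
    int now = oom_kills_count_sketch();
    assert(now >= saved);   /* the counter only ever grows */
    return now != saved;
}
```

    A false positive only costs an extra pass over the task list, which
    is why the counter can be bumped early and unconditionally.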

    Changes since v1
    - push the re-check loop out of freeze_processes into
    check_frozen_processes and invert the condition to make the code more
    readable as per Rafael

    Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring)
    Cc: 3.2+ # 3.2+
    Signed-off-by: Michal Hocko
    Signed-off-by: Rafael J. Wysocki

    Michal Hocko
     

01 Sep, 2014

1 commit

  • It sometimes may be necessary to abort a system suspend in
    progress or wake up the system from suspend-to-idle even if the
    pm_wakeup_event()/pm_stay_awake() mechanism is not enabled.

    For this purpose, introduce a new global variable pm_abort_suspend
    and make pm_wakeup_pending() check its value. Also add routines
    for manipulating that variable.
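
    A minimal sketch of how such a flag could combine with the existing
    wakeup-event check. All names here are hypothetical models, not the
    routines added by the patch.

```c
#include <assert.h>
#include <stdbool.h>

static bool pm_abort_suspend_sketch;      /* models the new global flag */
static bool events_check_enabled_sketch;  /* wakeup-event mechanism enabled? */
static int wakeups_in_progress_sketch;    /* hypothetical pending-event count */

/* Routines for manipulating the flag. */
static void abort_suspend_sketch(void) { pm_abort_suspend_sketch = true; }
static void clear_abort_sketch(void)   { pm_abort_suspend_sketch = false; }

/*
 * True if the transition should be aborted: either the wakeup-event
 * mechanism is enabled and reports an event, or the abort flag is set.
 */
static bool wakeup_pending_sketch(void)
{
    assert(wakeups_in_progress_sketch >= 0);
    bool pending = events_check_enabled_sketch &&
                   wakeups_in_progress_sketch > 0;
    return pending || pm_abort_suspend_sketch;
}
```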

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

15 Jul, 2014

1 commit

  • The commit [247bc037: PM / Sleep: Mitigate race between the freezer
    and request_firmware()] introduced the finer state control, but it
    also leads to a new bug; for example, a bug report regarding the
    firmware loading of intel BT device at suspend/resume:
    https://bugzilla.novell.com/show_bug.cgi?id=873790

    The root cause seems to be a small window between the process resume
    and the clear of usermodehelper lock. The request_firmware() function
    checks the UMH lock and gives up when it's in UMH_DISABLE state. This
    is for avoiding the invalid f/w loading during suspend/resume phase.
    The problem is, however, that usermodehelper_enable() is called at the
    end of thaw_processes(). Thus, a thawed process in between can kick
    off the f/w loader code path (in this case, via btusb_setup_intel())
    even before the call of usermodehelper_enable(). Then
    usermodehelper_read_trylock() returns an error and request_firmware()
    spews WARN_ON() in the end.

    This oneliner patch fixes the issue just by setting to UMH_FREEZING
    state again before restarting tasks, so that the call of
    request_firmware() will be blocked until the end of this function
    instead of returning an error.

    Fixes: 247bc0374254 (PM / Sleep: Mitigate race between the freezer and request_firmware())
    Link: https://bugzilla.novell.com/show_bug.cgi?id=873790
    Cc: 3.4+ # 3.4+
    Signed-off-by: Takashi Iwai
    Signed-off-by: Rafael J. Wysocki

    Takashi Iwai
     

07 Jun, 2014

1 commit

  • Adds trace events that give finer resolution into suspend/resume. These
    events are graphed in the timelines generated by the analyze_suspend.py
    script. They represent large areas of time consumed that are typical to
    suspend and resume.

    The event is triggered by calling the function "trace_suspend_resume"
    with three arguments: a string (the name of the event to be displayed
    in the timeline), an integer (case specific number, such as the power
    state or cpu number), and a boolean (where true is used to denote the start
    of the timeline event, and false to denote the end).
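
    The start/end pairing can be illustrated with a user-space stand-in
    that records events into an array instead of emitting a tracepoint.
    Everything below (the struct, the log, the helper names) is a
    hypothetical model of the three-argument convention described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS_SKETCH 16

struct suspend_event_sketch {
    const char *action; /* name shown in the timeline */
    int val;            /* case-specific number, e.g. power state or CPU */
    bool start;         /* true = start of the event, false = end */
};

static struct suspend_event_sketch event_log_sketch[MAX_EVENTS_SKETCH];
static size_t event_count_sketch;

/* Stand-in for the tracepoint: records the event instead of emitting it. */
static void record_suspend_resume_sketch(const char *action, int val,
                                         bool start)
{
    assert(event_count_sketch < MAX_EVENTS_SKETCH);
    event_log_sketch[event_count_sketch++] =
        (struct suspend_event_sketch){ action, val, start };
}

/* Bracket one phase with a start and an end event; phase may be NULL. */
static int traced_phase_sketch(const char *name, int val, int (*phase)(void))
{
    record_suspend_resume_sketch(name, val, true);
    int error = phase ? phase() : 0;
    record_suspend_resume_sketch(name, val, false);
    return error;
}
```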

    The suspend_resume trace event reproduces the data that the machine_suspend
    trace event did, so the latter has been removed.

    Signed-off-by: Todd Brandt
    Acked-by: Steven Rostedt
    Signed-off-by: Rafael J. Wysocki

    Todd E Brandt
     

30 Jul, 2013

1 commit

  • Calling freeze_processes sets a global flag that will cause any
    process that calls try_to_freeze to enter the refrigerator. It
    skips sending a signal to the current task, but if the current
    task ever hits try_to_freeze, all threads will be frozen and the
    system will deadlock.

    Set a new flag, PF_SUSPEND_TASK, on the task that calls
    freeze_processes. The flag notifies the freezer that the thread
    is involved in suspend and should not be frozen. Also add a
    WARN_ON in thaw_processes if the caller does not have the
    PF_SUSPEND_TASK flag set to catch if a different task calls
    thaw_processes than the one that called freeze_processes, leaving
    a task with PF_SUSPEND_TASK permanently set on it.

    Threads that spawn off a task with PF_SUSPEND_TASK set (which
    swsusp does) will also have PF_SUSPEND_TASK set, preventing them
    from freezing while they are helping with suspend, but they need
    to be dead by the time suspend is triggered, otherwise they may
    run when userspace is expected to be frozen. Add a WARN_ON in
    thaw_processes if more than one thread has the PF_SUSPEND_TASK
    flag set.
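
    The flag handling can be modelled in user space roughly as follows.
    The bit value, the task struct and the function names are
    illustrative assumptions; the WARN_ON is reduced to a return value.

```c
#include <assert.h>
#include <stdbool.h>

#define PF_SUSPEND_TASK_SKETCH 0x1u  /* illustrative bit, not the kernel's */

struct task_sketch {
    unsigned int flags;
};

static bool global_freezing_sketch;

/* The caller marks itself as the suspend task before freezing starts. */
static void freeze_processes_sketch(struct task_sketch *caller)
{
    assert(caller);
    caller->flags |= PF_SUSPEND_TASK_SKETCH;
    global_freezing_sketch = true;
}

/* Would this task enter the refrigerator if it hit try_to_freeze now? */
static bool would_freeze_sketch(const struct task_sketch *t)
{
    return global_freezing_sketch && !(t->flags & PF_SUSPEND_TASK_SKETCH);
}

/* Returns true if the warning would fire (caller is not the suspend task). */
static bool thaw_processes_sketch(struct task_sketch *caller)
{
    bool warn = !(caller->flags & PF_SUSPEND_TASK_SKETCH);
    caller->flags &= ~PF_SUSPEND_TASK_SKETCH;
    global_freezing_sketch = false;
    return warn;
}
```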

    Reported-and-tested-by: Michael Leun
    Signed-off-by: Colin Cross
    Signed-off-by: Rafael J. Wysocki

    Colin Cross
     

12 May, 2013

1 commit


10 Feb, 2013

1 commit

  • At present, the timeout for freezing is 20s, which is pointless
    when one thread is frozen while holding a mutex and another thread
    is trying to take that mutex: that freezing attempt is bound to
    fail.
    If no new wakeup event is registered in the meantime, the system
    wastes up to 20s on such a doomed freezing attempt.

    With this patch, the timeout can be configured to a smaller value,
    so such a doomed attempt is aborted earlier and the next freezing
    attempt can be triggered earlier as well, which saves more power.
    Normally, on a mobile phone, freezing processes takes very little
    time. On some platforms it takes only about 20ms to freeze
    user space processes and 10ms to freeze freezable kernel threads.

    Signed-off-by: Liu Chuansheng
    Signed-off-by: Li Fei
    Signed-off-by: Rafael J. Wysocki

    Li Fei
     

27 Oct, 2012

1 commit

  • try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
    to ensure that a task doing STOPPED/TRACED -> RUNNING transition
    can't escape freezing. This mostly works, but ptrace_stop() does
    not necessarily call schedule(), it can change task->state back to
    RUNNING and check freezing() without any lock/barrier in between.

    We could add the necessary barrier, but this patch changes
    ptrace_stop() and do_signal_stop() to use freezable_schedule().
    This fixes the race, freezer_count() and freezer_should_skip()
    carefully avoid the race.

    And this simplifies the code, try_to_freeze_tasks/update_if_frozen
    no longer need to use task_is_stopped_or_traced() checks with the
    non trivial assumptions. We can rely on the mechanism which was
    specially designed to mark the sleeping task as "frozen enough".

    v2: As Tejun pointed out, we can also change get_signal_to_deliver()
    and move try_to_freeze() up before 'relock' label.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Tejun Heo

    Oleg Nesterov
     

23 Aug, 2012

1 commit


29 Mar, 2012

2 commits

  • There is a race condition between the freezer and request_firmware()
    such that if request_firmware() is run on one CPU and
    freeze_processes() is run on another CPU and usermodehelper_disable()
    called by it succeeds to grab umhelper_sem for writing before
    usermodehelper_read_trylock() called from request_firmware()
    acquires it for reading, the request_firmware() will fail and
    trigger a WARN_ON() complaining that it was called at a wrong time.
    However, in fact, it wasn't called at a wrong time and
    freeze_processes() simply happened to be executed simultaneously.

    To avoid this race, at least in some cases, modify
    usermodehelper_read_trylock() so that it doesn't fail if the
    freezing of tasks has just started and hasn't been completed yet.
    Instead, during the freezing of tasks, it will try to freeze the
    task that has called it so that it can wait until user space is
    thawed without triggering the scary warning.

    For this purpose, change usermodehelper_disabled so that it can
    take three different values, UMH_ENABLED (0), UMH_FREEZING and
    UMH_DISABLED. The first one means that usermode helpers are
    enabled, the last one means "hard disable" (i.e. the system is not
    ready for usermode helpers to be used) and the second one
    is reserved for the freezer. Namely, when freeze_processes() is
    started, it sets usermodehelper_disabled to UMH_FREEZING which
    tells usermodehelper_read_trylock() that it shouldn't fail just
    yet and should call try_to_freeze() if woken up and cannot
    return immediately. This way all freezable tasks that happen
    to call request_firmware() right before freeze_processes() is
    started and lose the race for umhelper_sem with it will be
    frozen and will sleep until thaw_processes() unsets
    usermodehelper_disabled. [For the non-freezable callers of
    request_firmware() the race for umhelper_sem against
    freeze_processes() is unfortunately unavoidable.]
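
    The three-state decision can be summarised in a short sketch. The
    state names come from the description above; the action enum and the
    helper are hypothetical and only model what a caller of
    usermodehelper_read_trylock() would experience in each state.

```c
#include <assert.h>

enum umh_state_sketch { UMH_ENABLED = 0, UMH_FREEZING, UMH_DISABLED };

enum umh_action_sketch {
    UMH_PROCEED,          /* helpers are enabled, go ahead */
    UMH_WAIT_AND_FREEZE,  /* freezer is running: wait, freezing if asked */
    UMH_FAIL              /* hard disable: return an error */
};

/* What a reader of the usermode helper state should do in each state. */
static enum umh_action_sketch
umh_read_trylock_decision_sketch(enum umh_state_sketch state)
{
    switch (state) {
    case UMH_ENABLED:
        return UMH_PROCEED;
    case UMH_FREEZING:
        return UMH_WAIT_AND_FREEZE;
    case UMH_DISABLED:
        return UMH_FAIL;
    }
    assert(!"unknown usermode helper state");
    return UMH_FAIL;
}
```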

    Reported-by: Stephen Boyd
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Greg Kroah-Hartman
    Cc: stable@vger.kernel.org

    Rafael J. Wysocki
     
  • The core suspend/hibernation code calls usermodehelper_disable() to
    avoid race conditions between the freezer and the starting of
    usermode helpers and each code path has to do that on its own.
    However, it is always called right before freeze_processes()
    and usermodehelper_enable() is always called right after
    thaw_processes(). For this reason, to avoid code duplication and
    to make the connection between usermodehelper_disable() and the
    freezer more visible, make freeze_processes() call it and remove the
    direct usermodehelper_disable() and usermodehelper_enable() calls
    from all suspend/hibernation code paths.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Greg Kroah-Hartman
    Cc: stable@vger.kernel.org

    Rafael J. Wysocki
     

05 Mar, 2012

1 commit

  • This patch removes all the references in the code about the TIF_FREEZE
    flag removed by commit a3201227f803ad7fd43180c5195dbe5a2bf998aa

    freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE

    There still are some references to TIF_FREEZE in
    Documentation/power/freezing-of-tasks.txt, but it looks like that
    documentation needs more thorough work to reflect how the new
    freezer works, and hence merely removing the references to TIF_FREEZE
    won't really help. So I have not touched that part in this patch.

    Suggested-by: Srivatsa S. Bhat
    Signed-off-by: Marcos Paulo de Souza
    Signed-off-by: Rafael J. Wysocki

    Marcos Paulo de Souza
     

13 Feb, 2012

1 commit


05 Feb, 2012

1 commit

  • If freezing of kernel threads fails, we are expected to automatically
    thaw tasks in the error recovery path. However, at times, we encounter
    situations in which we would like the automatic error recovery path
    to thaw only the kernel threads, because we want to be able to do
    some more cleanup before we thaw userspace. Something like:

        error = freeze_kernel_threads();
        if (error) {
                /* Do some cleanup */

                /* Only then thaw userspace tasks */
                thaw_processes();
        }

    An example of such a situation is where we freeze/thaw filesystems
    during suspend/hibernation. There, if freezing of kernel threads
    fails, we would like to thaw the frozen filesystems before thawing
    the userspace tasks.

    So, modify freeze_kernel_threads() to thaw only kernel threads in
    case of freezing failure. And change suspend_freeze_processes()
    accordingly. (At the same time, let us also get rid of the rather
    cryptic usage of the conditional operator (?:) in that function.)

    [rjw: In fact, this patch fixes a regression introduced during the
    3.3 merge window, because without it thaw_processes() may be called
    before swsusp_free() in some situations and that may lead to massive
    memory allocation failures.]

    Signed-off-by: Srivatsa S. Bhat
    Acked-by: Tejun Heo
    Acked-by: Nigel Cunningham
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

30 Jan, 2012

1 commit

  • Commit 2aede851ddf08666f68ffc17be446420e9d2a056

    PM / Hibernate: Freeze kernel threads after preallocating memory

    introduced a mechanism by which kernel threads were frozen after
    the preallocation of hibernate image memory to avoid problems with
    frozen kernel threads not responding to memory freeing requests.
    However, it overlooked the s2disk code path in which the
    SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
    which caused freeze_workqueues_begin() to BUG(), because it saw
    that workqueues had been already frozen.

    Although in principle this issue might be addressed by removing
    the relevant BUG_ON() from freeze_workqueues_begin(), that would
    reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
    attempted to avoid into that particular code path. For this reason,
    to fix the issue at hand, introduce thaw_kernel_threads() and make
    the SNAPSHOT_FREE ioctl execute it.

    Special thanks to Srivatsa S. Bhat for detailed analysis of the
    problem.

    Reported-and-tested-by: Jiri Slaby
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Srivatsa S. Bhat
    Cc: stable@kernel.org

    Rafael J. Wysocki
     

22 Nov, 2011

10 commits

  • After "freezer: make freezing() test freeze conditions in effect
    instead of TIF_FREEZE", freezing() returns authoritative answer on
    whether the current task should freeze or not and freeze_task()
    doesn't need or use @sig_only. Remove it.

    While at it, rewrite function comment for freeze_task() and rename
    @sig_only to @user_only in try_to_freeze_tasks().

    This patch doesn't cause any functional change.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • Using TIF_FREEZE for freezing worked when there was only single
    freezing condition (the PM one); however, now there is also the
    cgroup_freezer and single bit flag is getting clumsy.
    thaw_processes() is already testing whether cgroup freezing is in
    effect to avoid thawing tasks which were frozen by both PM and cgroup
    freezers.

    This is racy (nothing prevents race against cgroup freezing) and
    fragile. A much simpler way is to test actual freeze conditions from
    freezing() - ie. directly test whether PM or cgroup freezing is in
    effect.

    This patch adds variables to indicate whether and what type of
    freezing conditions are in effect and reimplements freezing() such
    that it directly tests whether any of the two freezing conditions is
    active and the task should freeze. On fast path, freezing() is still
    very cheap - it only tests system_freezing_cnt.
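
    A simplified user-space model of this structure is shown below. The
    counter and freezing() follow the description above; the task
    struct, the flag value and the exact slow-path conditions are
    assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PF_NOFREEZE_SKETCH 0x1u  /* illustrative bit value */

struct frz_task_sketch {
    unsigned int flags;
    bool in_freezing_cgroup;     /* models the cgroup freezer condition */
};

/* Number of freezing conditions currently in effect (model). */
static atomic_int system_freezing_cnt_sketch;
static bool pm_freezing_sketch;  /* models the PM freezer condition */

/* Slow path: look at the actual conditions for this task. */
static bool freezing_slow_path_sketch(const struct frz_task_sketch *p)
{
    if (p->flags & PF_NOFREEZE_SKETCH)
        return false;
    return pm_freezing_sketch || p->in_freezing_cgroup;
}

/* Fast path: a single counter test when nothing is freezing. */
static bool freezing_sketch(const struct frz_task_sketch *p)
{
    assert(p);
    if (atomic_load(&system_freezing_cnt_sketch) == 0)
        return false;
    return freezing_slow_path_sketch(p);
}
```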

    This makes the clumsy dancing around TIF_FREEZE unnecessary and
    freeze/thaw operations more usual - updating state variables for the
    new state and nudging target tasks so that they notice the new state
    and comply. As long as the nudging happens after state update, it's
    race-free.

    * This allows use of freezing() in freeze_task(). Replace the open
    coded tests with freezing().

    * p != current test is added to warning printing conditions in
    try_to_freeze_tasks() failure path. This is necessary as freezing()
    is now true for the task which initiated freezing too.

    -v2: Oleg pointed out that re-freezing FROZEN cgroup could increment
    system_freezing_cnt. Fixed.

    Signed-off-by: Tejun Heo
    Acked-by: Paul Menage (for the cgroup portions)

    Tejun Heo
     
  • TIF_FREEZE will be removed soon and freezing() will directly test
    whether any freezing condition is in effect. Make the following
    changes in preparation.

    * Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it
    return bool.

    * Make cgroup_freezing() access task_freezer() under rcu read lock
    instead of task_lock(). This makes the state dereferencing racy
    against task moving to another cgroup; however, it was already racy
    without this change as ->state dereference wasn't synchronized.
    This will be later dealt with using attach hooks.

    * freezer->state is now set before trying to push tasks into the
    target state.

    -v2: Oleg pointed out that freeze_change_state() was setting
    freeze->state incorrectly to CGROUP_FROZEN instead of
    CGROUP_FREEZING. Fixed.

    -v3: Matt pointed out that setting CGROUP_FROZEN used to always invoke
    try_to_freeze_cgroup() regardless of the current state. Patch
    updated such that the actual freeze/thaw operations are always
    performed on invocation. This shouldn't make any difference
    unless something is broken.

    Signed-off-by: Tejun Heo
    Acked-by: Paul Menage
    Cc: Li Zefan
    Cc: Oleg Nesterov

    Tejun Heo
     
  • freeze_processes() failure path is rather messy. Freezing is canceled
    for workqueues and tasks which aren't frozen yet but frozen tasks are
    left alone and should be thawed by the caller and of course some
    callers (xen and kexec) didn't do it.

    This patch updates __thaw_task() to handle cancelation correctly and
    makes freeze_processes() and freeze_kernel_threads() call
    thaw_processes() on failure instead so that the system is fully thawed
    on failure. Unnecessary [suspend_]thaw_processes() calls are removed
    from kernel/power/hibernate.c, suspend.c and user.c.

    While at it, restructure error checking if clause in suspend_prepare()
    to be less weird.

    -v2: Srivatsa spotted missing removal of suspend_thaw_processes() in
    suspend_prepare() and error in commit message. Updated.

    Signed-off-by: Tejun Heo
    Acked-by: Srivatsa S. Bhat

    Tejun Heo
     
  • try_to_freeze_tasks() and thaw_processes() use freezable() and
    frozen() as preliminary tests before initiating operations on a task.
    These are done without any synchronization and hinder
    synchronization cleanup without any real performance benefits.

    In try_to_freeze_tasks(), open code self test and move PF_NOFREEZE and
    frozen() tests inside freezer_lock in freeze_task().

    thaw_processes() can simply drop freezable() test as frozen() test in
    __thaw_task() is enough.

    Note: This used to be a part of larger patch to fix set_freezable()
    race. Separated out to satisfy ordering among dependent fixes.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • Currently freezing (TIF_FREEZE) and frozen (PF_FROZEN) states are
    interlocked - freezing is set to request freeze and when the task
    actually freezes, it clears freezing and sets frozen.

    This interlocking makes things more complex than necessary - freezing
    doesn't mean there's freezing condition in effect and frozen doesn't
    match the task actually entering and leaving frozen state (it's
    cleared by the thawing task).

    This patch makes freezing indicate that freeze condition is in effect.
    A task enters and stays frozen if freezing. This makes PF_FROZEN
    manipulation done only by the task itself and prevents wakeup from
    __thaw_task() leaking outside of refrigerator.

    The only place which needs to tell freezing && !frozen is
    try_to_freeze_tasks() to whine about tasks which don't enter frozen.
    It's updated to test the condition explicitly.

    With the change, frozen() state may linger after __thaw_task() until
    the task wakes up and exits fridge. This can trigger BUG_ON() in
    update_if_frozen(). Work it around by testing freezing() && frozen()
    instead of frozen().

    -v2: Oleg pointed out missing re-check of freezing() when trying to
    clear FROZEN and possible spurious BUG_ON() trigger in
    update_if_frozen(). Both fixed.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Paul Menage

    Tejun Heo
     
  • Freezer synchronization is needlessly complicated - it's by no means a
    hot path and the priority is staying unintrusive and safe. This patch
    makes it simply use a dedicated lock instead of piggy-backing on
    task_lock() and playing with memory barriers.

    On the failure path of try_to_freeze_tasks(), locking is moved from it
    to cancel_freezing(). This makes the frozen() test racy but the race
    here is a non-issue as the warning is printed for tasks which failed
    to enter frozen for 20 seconds and race on PF_FROZEN at the last
    moment doesn't change anything.

    This simplifies freezer implementation and eases further changes
    including some race fixes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • There's no point in thawing nosig tasks before others. There's no
    ordering requirement between the two groups on thaw, which the staged
    thawing can't guarantee anyway. Simplify thaw_processes() by removing
    the distinction and collapsing thaw_tasks() into thaw_processes().
    This will help further updates to freezer.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • clear_freeze_flag() in exit_mm() is racy. Freezing can start
    afterwards. Remove it. Skipping freezer for exiting task will be
    properly implemented later.

    Also, freezable() was testing exit_state directly to make system
    freezer ignore dead tasks. Let the exiting task set PF_NOFREEZE after
    entering TASK_DEAD instead.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • thaw_process() now has only internal users - system and cgroup
    freezers. Remove the unnecessary return value, rename, unexport and
    collapse __thaw_process() into it. This will help further updates to
    the freezer code.

    -v3: oom_kill grew a use of thaw_process() while this patch was
    pending. Convert it to use __thaw_task() for now. In the longer
    term, this should be handled by allowing tasks to die if killed
    even if it's frozen.

    -v2: minor style update as suggested by Matt.

    Signed-off-by: Tejun Heo
    Cc: Paul Menage
    Cc: Matt Helsley

    Tejun Heo
     

17 Oct, 2011

1 commit

  • There is a problem with the current ordering of hibernate code which
    leads to deadlocks in some filesystems' memory shrinkers. Namely,
    some filesystems use freezable kernel threads that are inactive when
    the hibernate memory preallocation is carried out. Those same
    filesystems use memory shrinkers that may be triggered by the
    hibernate memory preallocation. If those memory shrinkers wait for
    the frozen kernel threads, the hibernate process deadlocks (this
    happens with XFS, for one example).

    Apparently, it is not technically viable to redesign the filesystems
    in question to avoid the situation described above, so the only
    possible solution of this issue is to defer the freezing of kernel
    threads until the hibernate memory preallocation is done, which is
    implemented by this change.

    Unfortunately, this requires the memory preallocation to be done
    before the "prepare" stage of device freeze, so after this change the
    only way drivers can allocate additional memory for their freeze
    routines in a clean way is to use PM notifiers.

    Reported-by: Christoph
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

17 Feb, 2011

1 commit

  • There are two spellings in use for 'freeze' + 'able' - 'freezable' and
    'freezeable'. The former is the more prominent one. The latter is
    mostly used by workqueue and in a few other odd places. Unify the
    spelling to 'freezable'.

    Signed-off-by: Tejun Heo
    Reported-by: Alan Stern
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Greg Kroah-Hartman
    Acked-by: Dmitry Torokhov
    Cc: David Woodhouse
    Cc: Alex Dubov
    Cc: "David S. Miller"
    Cc: Steven Whitehouse

    Tejun Heo
     

24 Dec, 2010

2 commits

  • To avoid confusion with the meaning and return value of
    pm_check_wakeup_events() replace it with pm_wakeup_pending() that
    will work the other way around (ie. return true when system-wide
    power transition should be aborted).

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • After calling freeze_task(), try_to_freeze_tasks() sees whether the
    task is stopped or traced and if so, considers it to be frozen;
    however, nothing guarantees that either the task being frozen sees
    TIF_FREEZE or the freezer sees TASK_STOPPED -> TASK_RUNNING
    transition. The task being frozen may wake up and not see TIF_FREEZE
    while the freezer fails to notice the transition and believes the task
    is still stopped.

    This patch fixes the race by making freeze_task() always go through
    fake_signal_wake_up() for applicable tasks. The function goes through
    the target task's scheduler lock and thus guarantees that either the
    target sees TIF_FREEZE or try_to_freeze_tasks() sees TASK_RUNNING.

    Signed-off-by: Tejun Heo
    Signed-off-by: Rafael J. Wysocki

    Tejun Heo
     

17 Oct, 2010

1 commit

  • If there is a wakeup event during the freezing of tasks, suspend or
    hibernation will fail anyway. Since try_to_freeze_tasks() can take
    up to 20 seconds to complete or fail, aborting it as soon as a wakeup
    event is detected improves the worst case wakeup latency.
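
    The early abort can be sketched as a polling loop with three
    outcomes. This is a user-space model under stated assumptions: time
    is counted in ticks, frozen_per_tick simulates progress, and
    wakeup_at_tick (negative for "never") simulates a wakeup event.

```c
#include <assert.h>

enum freeze_result_sketch { FREEZE_OK, FREEZE_TIMEOUT, FREEZE_ABORTED };

/*
 * Poll until all todo tasks are frozen, the timeout expires, or a
 * wakeup event is seen, whichever comes first.
 */
static enum freeze_result_sketch
freeze_tasks_loop_sketch(int todo, int frozen_per_tick,
                         int timeout_ticks, int wakeup_at_tick)
{
    assert(todo >= 0 && frozen_per_tick >= 0);
    for (int tick = 0; ; tick++) {
        todo -= frozen_per_tick < todo ? frozen_per_tick : todo;
        if (todo == 0)
            return FREEZE_OK;
        if (tick >= timeout_ticks)
            return FREEZE_TIMEOUT;
        /* Abort right away instead of waiting out the timeout. */
        if (wakeup_at_tick >= 0 && tick >= wakeup_at_tick)
            return FREEZE_ABORTED;
    }
}
```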

    Based on a patch from Arve Hjønnevåg.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek

    Rafael J. Wysocki
     

29 Jun, 2010

1 commit

  • Currently, workqueue freezing is implemented by marking the worker
    freezeable and calling try_to_freeze() from dispatch loop.
    Reimplement it using cwq->limit so that the workqueue is frozen
    instead of the worker.

    * workqueue_struct->saved_max_active is added which stores the
    specified max_active on initialization.

    * On freeze, all cwq->max_active's are quenched to zero. Freezing is
    complete when nr_active on all cwqs reaches zero.

    * On thaw, all cwq->max_active's are restored to wq->saved_max_active
    and the worklist is repopulated.
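
    The three steps above can be sketched for a single queue as follows.
    This is a user-space model only: the struct collapses the per-cpu
    cwq state into one object, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct wq_sketch {
    int saved_max_active;  /* max_active given at initialization */
    int max_active;        /* current limit, zero while frozen */
    int nr_active;         /* work items currently executing */
};

static void wq_init_sketch(struct wq_sketch *wq, int max_active)
{
    assert(wq && max_active > 0);
    wq->saved_max_active = max_active;
    wq->max_active = max_active;
    wq->nr_active = 0;
}

/* Start a work item if the limit allows; otherwise it stays queued. */
static bool wq_try_activate_sketch(struct wq_sketch *wq)
{
    if (wq->nr_active >= wq->max_active)
        return false;
    wq->nr_active++;
    return true;
}

static void wq_finish_one_sketch(struct wq_sketch *wq)
{
    assert(wq->nr_active > 0);
    wq->nr_active--;
}

/* Freeze: stop new work from becoming active. */
static void wq_freeze_begin_sketch(struct wq_sketch *wq)
{
    wq->max_active = 0;
}

/* Freezing is complete once in-flight work has drained. */
static bool wq_frozen_sketch(const struct wq_sketch *wq)
{
    return wq->max_active == 0 && wq->nr_active == 0;
}

/* Thaw: restore the saved limit so queued work can run again. */
static void wq_thaw_sketch(struct wq_sketch *wq)
{
    wq->max_active = wq->saved_max_active;
}
```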

    This new implementation allows having single shared pool of workers
    per cpu.

    Signed-off-by: Tejun Heo

    Tejun Heo