12 Sep, 2010

2 commits

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PM / Hibernate: Avoid hitting OOM during preallocation of memory
    PM QoS: Correct pr_debug() misuse and improve parameter checks
    PM: Prevent waiting forever on asynchronous resume after failing suspend

    Linus Torvalds
     
  • There is a problem in hibernate_preallocate_memory() that it calls
    preallocate_image_memory() with an argument that may be greater than
    the total number of available non-highmem memory pages. If that's
    the case, the OOM condition is guaranteed to trigger, which in turn
    can cause significant slowdown to occur during hibernation.

    To avoid that, make preallocate_image_memory() adjust its argument
    before calling preallocate_image_pages(), so that the total number of
    saveable non-highmem pages left is not less than the minimum size of
    a hibernation image. Change hibernate_preallocate_memory() to try to
    allocate from highmem if the number of pages allocated by
    preallocate_image_memory() is too low.

    Modify free_unnecessary_pages() to take all possible memory
    allocation patterns into account.

    Reported-by: KOSAKI Motohiro
    Signed-off-by: Rafael J. Wysocki
    Tested-by: M. Vefa Bicakci

    Rafael J. Wysocki
     

10 Sep, 2010

1 commit

  • Please revert 2.6.36-rc commit d2997b1042ec150616c1963b5e5e919ffd0b0ebf
    "hibernation: freeze swap at hibernation". It complicated matters by
    adding a second swap allocation path, just for hibernation; without in any
    way fixing the issue that it was intended to address - page reclaim after
    fixing the hibernation image might free swap from a page already imaged as
    swapcache, letting its swap be reallocated to store a different page of
    the image: resulting in data corruption if the imaged page were freed as
    clean then swapped back in. Pages freed to si->swap_map were still in
    danger of being reallocated by the alternative allocation path.

    I guess it inadvertently fixed slow SSD swap allocation for hibernation,
    as reported by Nigel Cunningham: by missing out the discards that occur on
    the usual swap allocation path; but that was unintentional, and needs a
    separate fix.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: "Rafael J. Wysocki"
    Cc: Ondrej Zary
    Cc: Andrea Gelmini
    Cc: Balbir Singh
    Cc: Andrea Arcangeli
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

20 Aug, 2010

1 commit


11 Aug, 2010

1 commit

  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcomming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     

10 Aug, 2010

1 commit

  • When taking a memory snapshot in hibernate_snapshot(), all (directly
    called) memory allocations use GFP_ATOMIC. Hence swap misusage during
    hibernation never occurs.

    But from a pessimistic point of view, there is no guarantee that no page
    allocation has __GFP_WAIT. It is better to have a global indication "we
    enter hibernation, don't use swap!".

    This patch tries to freeze new-swap-allocation during hibernation. (All
    user processes are frozen, so swapin is not a concern.)

    This way, no updates will happen to swap_map[] between
    hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free()
    is called. We can be assured that swap corruption will not occur.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Rafael J. Wysocki"
    Cc: Hugh Dickins
    Cc: KOSAKI Motohiro
    Cc: Ondrej Zary
    Cc: Balbir Singh
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

08 Aug, 2010

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
    workqueue: mark init_workqueues() as early_initcall()
    workqueue: explain for_each_*cwq_cpu() iterators
    fscache: fix build on !CONFIG_SYSCTL
    slow-work: kill it
    gfs2: use workqueue instead of slow-work
    drm: use workqueue instead of slow-work
    cifs: use workqueue instead of slow-work
    fscache: drop references to slow-work
    fscache: convert operation to use workqueue instead of slow-work
    fscache: convert object to use workqueue instead of slow-work
    workqueue: fix how cpu number is stored in work->data
    workqueue: fix mayday_mask handling on UP
    workqueue: fix build problem on !CONFIG_SMP
    workqueue: fix locking in retry path of maybe_create_worker()
    async: use workqueue for worker pool
    workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
    workqueue: implement unbound workqueue
    workqueue: prepare for WQ_UNBOUND implementation
    libata: take advantage of cmwq and remove concurrency limitations
    workqueue: fix worker management invocation without pending works
    ...

    Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
    include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c

    Linus Torvalds
     
  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

05 Aug, 2010

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (48 commits)
    Documentation: update broken web addresses.
    fix comment typo "choosed" -> "chosen"
    hostap:hostap_hw.c Fix typo in comment
    Fix spelling contorller -> controller in comments
    Kconfig.debug: FAIL_IO_TIMEOUT: typo Faul -> Fault
    fs/Kconfig: Fix typo Userpace -> Userspace
    Removing dead MACH_U300_BS26
    drivers/infiniband: Remove unnecessary casts of private_data
    fs/ocfs2: Remove unnecessary casts of private_data
    libfc: use ARRAY_SIZE
    scsi: bfa: use ARRAY_SIZE
    drm: i915: use ARRAY_SIZE
    drm: drm_edid: use ARRAY_SIZE
    synclink: use ARRAY_SIZE
    block: cciss: use ARRAY_SIZE
    comment typo fixes: charater => character
    fix comment typos concerning "challenge"
    arm: plat-spear: fix typo in kerneldoc
    reiserfs: typo comment fix
    update email address
    ...

    Linus Torvalds
     

04 Aug, 2010

1 commit


19 Jul, 2010

6 commits

  • pavel@suse.cz no longer works, replace it with working address.

    Signed-off-by: Pavel Machek
    Signed-off-by: Jiri Kosina

    Pavel Machek
     
  • The ACPI suspend code calls suspend_nvs_free() at a wrong place,
    which may lead to a memory leak if there's an error executing
    acpi_pm_prepare(), because acpi_pm_finish() will not be called in
    that case. However, the root cause of this problem is the
    apparently confusing ordering of calls in suspend error paths that
    needs to be fixed.

    In addition to that, fix a typo in a label name in suspend.c.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Len Brown

    Rafael J. Wysocki
     
  • There is an inconsistency between hibernation_platform_enter()
    and hibernation_snapshot(), because the latter calls
    hibernation_ops->end() after failing hibernation_ops->begin(), while
    the former doesn't do that. Make hibernation_snapshot() behave in
    the same way as hibernation_platform_enter() in that respect.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Len Brown

    Rafael J. Wysocki
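
    The begin()/end() pairing described in the commit above can be
    sketched as a small user-space model. This is only an illustration
    under stated assumptions: struct hib_ops_model, model_snapshot() and
    the stub callbacks are hypothetical names, not the kernel's code.

```c
/* Hypothetical stand-in for the platform hibernation callbacks. */
struct hib_ops_model {
    int (*begin)(void);
    void (*end)(void);
};

static int model_end_calls;  /* counts how often end() was invoked */

static int model_begin_ok(void) { return 0; }
static int model_begin_fail(void) { return -1; }
static void model_end(void) { model_end_calls++; }

/*
 * Model of the behavior after the change: end() is reached only
 * when begin() has succeeded.
 */
static int model_snapshot(const struct hib_ops_model *ops)
{
    int error = ops->begin();

    if (error)
        return error;  /* begin() failed, so end() is skipped */

    /* ... device suspend and image creation would happen here ... */

    ops->end();
    return 0;
}
```

    With a failing begin() the model returns the error and leaves
    model_end_calls untouched; with a succeeding begin() it calls end()
    exactly once.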
     
  • The hibernation_platform_enter() function calls dpm_suspend_noirq()
    instead of dpm_resume_noirq() by mistake. Fix this.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Len Brown

    Rafael J. Wysocki
     
  • One of the arguments during the suspend blockers discussion was that
    the mainline kernel didn't contain any mechanisms making it possible
    to avoid races between wakeup and system suspend.

    Generally, there are two problems in that area. First, if a wakeup
    event occurs exactly when /sys/power/state is being written to, it
    may be delivered to user space right before the freezer kicks in, so
    the user space consumer of the event may not be able to process it
    before the system is suspended. Second, if a wakeup event occurs
    after user space has been frozen, it is not generally guaranteed that
    the ongoing transition of the system into a sleep state will be
    aborted.

    To address these issues introduce a new global sysfs attribute,
    /sys/power/wakeup_count, associated with a running counter of wakeup
    events and three helper functions, pm_stay_awake(), pm_relax(), and
    pm_wakeup_event(), that may be used by kernel subsystems to control
    the behavior of this attribute and to request the PM core to abort
    system transitions into a sleep state already in progress.

    The /sys/power/wakeup_count file may be read from or written to by
    user space. Reads will always succeed (unless interrupted by a
    signal) and return the current value of the wakeup events counter.
    Writes, however, will only succeed if the written number is equal to
    the current value of the wakeup events counter. If a write is
    successful, it will cause the kernel to save the current value of the
    wakeup events counter and to abort the subsequent system transition
    into a sleep state if any wakeup events are reported after the write
    has returned.

    [The assumption is that before writing to /sys/power/state user space
    will first read from /sys/power/wakeup_count. Next, user space
    consumers of wakeup events will have a chance to acknowledge or
    veto the upcoming system transition to a sleep state. Finally, if
    the transition is allowed to proceed, /sys/power/wakeup_count will
    be written to and if that succeeds, /sys/power/state will be written
    to as well. Still, if any wakeup events are reported to the PM core
    by kernel subsystems after that point, the transition will be
    aborted.]

    Additionally, put a wakeup events counter into struct dev_pm_info and
    make these per-device wakeup event counters available via sysfs,
    so that it's possible to check the activity of various wakeup event
    sources within the kernel.

    To illustrate how subsystems can use pm_wakeup_event(), make the
    low-level PCI runtime PM wakeup-handling code use it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Jesse Barnes
    Acked-by: Greg Kroah-Hartman
    Acked-by: markgross
    Reviewed-by: Alan Stern

    Rafael J. Wysocki
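
    The read/write rules for /sys/power/wakeup_count described in the
    commit above can be modelled in user space roughly as follows. This
    is a toy model of the semantics only, not the kernel implementation:
    struct wakeup_model and all model_* functions are hypothetical names,
    and pm_stay_awake()/pm_relax() are not modelled.

```c
#include <stdbool.h>

/* Toy model of the wakeup events counter and the saved value. */
struct wakeup_model {
    unsigned long event_count;  /* running counter of wakeup events */
    unsigned long saved_count;  /* value stored by the last good write */
    bool check_enabled;         /* set once a write has succeeded */
};

/* Models a kernel subsystem reporting one wakeup event. */
static void model_wakeup_event(struct wakeup_model *m)
{
    m->event_count++;
}

/* Models a read of wakeup_count: returns the current counter. */
static unsigned long model_read_count(const struct wakeup_model *m)
{
    return m->event_count;
}

/*
 * Models a write to wakeup_count: it succeeds only if the written
 * number equals the current counter, and then saves that value.
 */
static bool model_write_count(struct wakeup_model *m, unsigned long value)
{
    if (value != m->event_count)
        return false;
    m->saved_count = value;
    m->check_enabled = true;
    return true;
}

/* True if events arrived after the write, so suspend must abort. */
static bool model_should_abort(const struct wakeup_model *m)
{
    return m->check_enabled && m->event_count != m->saved_count;
}
```

    In this model, user space would call model_read_count(), let its
    wakeup event consumers acknowledge or veto the transition, and then
    call model_write_count() with the value it read. A failed write means
    an event arrived in between, and the sequence has to start over.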
     
  • There are a few typos in kernel/power/swap.c. Fix them.

    Signed-off-by: Cesar Eduardo Barros
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Cesar Eduardo Barros
     

29 Jun, 2010

1 commit

  • Currently, workqueue freezing is implemented by marking the worker
    freezeable and calling try_to_freeze() from dispatch loop.
    Reimplement it using cwq->limit so that the workqueue is frozen
    instead of the worker.

    * workqueue_struct->saved_max_active is added which stores the
    specified max_active on initialization.

    * On freeze, all cwq->max_active's are quenched to zero. Freezing is
    complete when nr_active on all cwqs reach zero.

    * On thaw, all cwq->max_active's are restored to wq->saved_max_active
    and the worklist is repopulated.

    This new implementation allows having single shared pool of workers
    per cpu.

    Signed-off-by: Tejun Heo

    Tejun Heo
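
    The freeze/thaw scheme described in the commit above can be
    illustrated with a small user-space model. It is a sketch under
    stated assumptions, not the workqueue code: all *_model names,
    MODEL_NR_CWQS and the nr_delayed counter are hypothetical stand-ins
    for the real data structures.

```c
#include <stdbool.h>

#define MODEL_NR_CWQS 2  /* illustrative number of per-CPU queues */

struct cwq_model {
    int max_active;  /* limit on concurrently active work items */
    int nr_active;   /* work items currently active */
    int nr_delayed;  /* work items held back by the limit */
};

struct wq_model {
    int saved_max_active;  /* max_active given at initialization */
    struct cwq_model cwq[MODEL_NR_CWQS];
};

static void wq_model_init(struct wq_model *wq, int max_active)
{
    wq->saved_max_active = max_active;
    for (int i = 0; i < MODEL_NR_CWQS; i++)
        wq->cwq[i] = (struct cwq_model){ .max_active = max_active };
}

/* Move delayed items to active for as long as the limit allows. */
static void cwq_model_activate(struct cwq_model *cwq)
{
    while (cwq->nr_delayed > 0 && cwq->nr_active < cwq->max_active) {
        cwq->nr_delayed--;
        cwq->nr_active++;
    }
}

static void cwq_model_queue(struct cwq_model *cwq)
{
    cwq->nr_delayed++;
    cwq_model_activate(cwq);
}

static void cwq_model_complete(struct cwq_model *cwq)
{
    if (cwq->nr_active > 0)
        cwq->nr_active--;
    cwq_model_activate(cwq);
}

/* On freeze, quench every max_active to zero. */
static void wq_model_freeze(struct wq_model *wq)
{
    for (int i = 0; i < MODEL_NR_CWQS; i++)
        wq->cwq[i].max_active = 0;
}

/* Freezing is complete once nr_active is zero on all queues. */
static bool wq_model_frozen(const struct wq_model *wq)
{
    for (int i = 0; i < MODEL_NR_CWQS; i++)
        if (wq->cwq[i].nr_active != 0)
            return false;
    return true;
}

/* On thaw, restore max_active and reactivate delayed items. */
static void wq_model_thaw(struct wq_model *wq)
{
    for (int i = 0; i < MODEL_NR_CWQS; i++) {
        wq->cwq[i].max_active = wq->saved_max_active;
        cwq_model_activate(&wq->cwq[i]);
    }
}
```

    The point of the model is that work queued while frozen is never
    lost: it accumulates as delayed work and becomes active again on
    thaw, while already active items are allowed to drain.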
     

10 Jun, 2010

1 commit


11 May, 2010

5 commits


11 Apr, 2010

1 commit

  • When CONFIG_DEBUG_BLOCK_EXT_DEVT is set we decode the device
    improperly by old_decode_dev and it results in an error while
    hibernating with s2disk.

    All users already pass the new device number, so switch to
    new_decode_dev().

    Signed-off-by: Jiri Slaby
    Reported-and-tested-by: Jiri Kosina
    Signed-off-by: "Rafael J. Wysocki"

    Jiri Slaby
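
    Why the old decoder fails on a new-style device number can be shown
    with a user-space copy of the two helpers. The sketch below follows
    the layout of the helpers in include/linux/kdev_t.h as far as I know
    it (20 minor bits, old format limited to 8-bit major and minor); the
    k_ and K_ prefixed names are hypothetical, and the device number
    259:300 is just an example value.

```c
#include <stdint.h>

#define K_MINORBITS 20
#define K_MINORMASK ((1u << K_MINORBITS) - 1)
#define K_MAJOR(dev) ((uint32_t)(dev) >> K_MINORBITS)
#define K_MINOR(dev) ((uint32_t)(dev) & K_MINORMASK)
#define K_MKDEV(ma, mi) (((uint32_t)(ma) << K_MINORBITS) | (uint32_t)(mi))

/* Old format: 16 bits, 8-bit major and 8-bit minor. */
static uint32_t k_old_decode_dev(uint16_t val)
{
    return K_MKDEV((val >> 8) & 255, val & 255);
}

/* New format: 32 bits, 12-bit major and 20-bit minor. */
static uint32_t k_new_encode_dev(uint32_t dev)
{
    uint32_t major = K_MAJOR(dev);
    uint32_t minor = K_MINOR(dev);

    return (minor & 0xff) | (major << 8) | ((minor & ~0xffu) << 12);
}

static uint32_t k_new_decode_dev(uint32_t dev)
{
    uint32_t major = (dev & 0xfff00) >> 8;
    uint32_t minor = (dev & 0xff) | ((dev >> 12) & 0xfff00);

    return K_MKDEV(major, minor);
}
```

    Encoding 259:300 with k_new_encode_dev() and decoding it with
    k_new_decode_dev() gives 259:300 back. Passing the same value through
    k_old_decode_dev() truncates it to 16 bits and yields 3:44, which is
    a different device.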
     

05 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Mar, 2010

2 commits

  • When the cgroup freezer is used to freeze tasks we do not want to thaw
    those tasks during resume. Currently we test the cgroup freezer
    state of the resuming tasks to see if the cgroup is FROZEN. If so
    then we don't thaw the task. However, the FREEZING state also indicates
    that the task should remain frozen.

    This also avoids a problem pointed out by Oren Ladaan: the freezer state
    transition from FREEZING to FROZEN is updated lazily when userspace reads
    or writes the freezer.state file in the cgroup filesystem. This means that
    resume will thaw tasks in cgroups which should be in the FROZEN state if
    there is no read/write of the freezer.state file to trigger this
    transition before suspend.

    NOTE: Another "simple" solution would be to always update the cgroup
    freezer state during resume. However it's a bad choice for several reasons:
    Updating the cgroup freezer state is somewhat expensive because it requires
    walking all the tasks in the cgroup and checking if they are each frozen.
    Worse, this could easily make resume run in N^2 time where N is the number
    of tasks in the cgroup. Finally, updating the freezer state from this code
    path requires trickier locking because of the way locks must be ordered.

    Instead of updating the freezer state we rely on the fact that lazy
    updates only manage the transition from FREEZING to FROZEN. We know that
    a cgroup with the FREEZING state may actually be FROZEN so test for that
    state too. This makes sense in the resume path even for partially-frozen
    cgroups -- those that really are FREEZING but not FROZEN.

    Reported-by: Oren Ladaan
    Signed-off-by: Matt Helsley
    Cc: stable@kernel.org
    Signed-off-by: Rafael J. Wysocki

    Matt Helsley
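
    The resume-time test described in the commit above boils down to a
    two-state check. A minimal sketch, assuming a hypothetical enum and
    function name rather than the cgroup freezer's real ones:

```c
#include <stdbool.h>

/* Hypothetical mirror of the three cgroup freezer states. */
enum cg_freezer_state_model {
    CG_MODEL_THAWED,
    CG_MODEL_FREEZING,
    CG_MODEL_FROZEN,
};

/*
 * A cgroup reported as FREEZING may in fact already be FROZEN,
 * because the transition is only recorded lazily. Treat both
 * states as "do not thaw this task on resume".
 */
static bool cg_model_keep_frozen(enum cg_freezer_state_model state)
{
    return state == CG_MODEL_FREEZING || state == CG_MODEL_FROZEN;
}
```

    Testing only for the FROZEN state would wrongly thaw tasks in a
    cgroup whose state was never refreshed before suspend.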
     
  • show_state() dumps the state of all tasks, so if the freezer fails to
    freeze any task, the kernel dumps the state of every task and floods
    the dmesg log. This patch makes the freezer show only the state of
    tasks refusing to freeze.

    Signed-off-by: Xiaotian Feng
    Acked-by: Pavel Machek
    Acked-by: David Rientjes
    Signed-off-by: Rafael J. Wysocki

    Xiaotian Feng
     

07 Mar, 2010

1 commit

  • There are quite a few GFP_KERNEL memory allocations made during
    suspend/hibernation and resume that may cause the system to hang, because
    the I/O operations they depend on cannot be completed due to the
    underlying devices being suspended.

    Avoid this problem by clearing the __GFP_IO and __GFP_FS bits in
    gfp_allowed_mask before suspend/hibernation and restoring the original
    values of these bits in gfp_allowed_mask during the subsequent resume.

    [akpm@linux-foundation.org: fix CONFIG_PM=n linkage]
    Signed-off-by: Rafael J. Wysocki
    Reported-by: Maxim Levitsky
    Cc: Sebastian Ott
    Cc: Benjamin Herrenschmidt
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
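
    The masking idea from the commit above can be sketched in user space
    as follows. This is an illustration only: the model_* names are
    hypothetical, and the flag bit values are placeholders rather than
    the definitions from gfp.h.

```c
typedef unsigned int model_gfp_t;

/* Placeholder bit values, chosen only for this illustration. */
#define MODEL_GFP_WAIT   0x10u
#define MODEL_GFP_IO     0x40u
#define MODEL_GFP_FS     0x80u
#define MODEL_GFP_IOFS   (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IOFS)

static model_gfp_t model_gfp_allowed_mask = ~0u;
static model_gfp_t model_saved_iofs_bits;

/* Before suspend: remember the I/O and FS bits, then clear them. */
static void model_restrict_gfp_mask(void)
{
    model_saved_iofs_bits = model_gfp_allowed_mask & MODEL_GFP_IOFS;
    model_gfp_allowed_mask &= ~MODEL_GFP_IOFS;
}

/* During resume: put back the original values of those bits. */
static void model_restore_gfp_mask(void)
{
    model_gfp_allowed_mask |= model_saved_iofs_bits;
    model_saved_iofs_bits = 0;
}

/* Every allocation request is filtered through the allowed mask. */
static model_gfp_t model_effective_gfp(model_gfp_t requested)
{
    return requested & model_gfp_allowed_mask;
}
```

    While the mask is restricted, a request made with MODEL_GFP_KERNEL
    is silently reduced to MODEL_GFP_WAIT, so it can no longer start I/O
    that a suspended device would never complete.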
     

27 Feb, 2010

7 commits

  • The hibernate memory preallocation code allocates memory to push some
    user space data out of physical RAM, so that the hibernation image is
    not too large. It allocates more memory than necessary for creating
    the image, so it has to release some pages to make room for
    allocations made while suspending devices and disabling nonboot CPUs,
    or the system will hang due to the lack of free pages to allocate
    from. Unfortunately, the function used for freeing these pages,
    free_unnecessary_pages(), contains a bug that prevents it from doing
    the job on all systems without highmem.

    Fix this problem, which is a regression from the 2.6.30 kernel, by
    using the right condition for the termination of the loop in
    free_unnecessary_pages().

    Signed-off-by: Rafael J. Wysocki
    Reported-and-tested-by: Alan Jenkins
    Cc: stable@kernel.org

    Rafael J. Wysocki
     
  • Its contents and entry in Makefile were already removed in
    8e60c6a1348e17e68ad73589a52a03876e7059be
    (Shift remaining code from swsusp.c to hibernate.c)
    but somehow it remained in-place (rjw: which most likely was my
    mistake).

    Signed-off-by: Jiri Slaby
    Acked-by: Nigel Cunningham
    Signed-off-by: Rafael J. Wysocki

    Jiri Slaby
     
  • Remove a trailing space from a message in swsusp_save().

    Signed-off-by: Frans Pop
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Frans Pop
     
  • It will never reach here if the sws_resume_bdev is erratic.
    swsusp_read() is called only from software_resume(), but after
    swsusp_check() which would catch the error state.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Rafael J. Wysocki

    Jiri Slaby
     
  • They were deprecated and removed from exported headers more than 2
    years ago. Inform users about their removal in the future now.

    (Switch cases needed to be reordered for an easy fall through.)

    And add an entry to feature-removal-schedule.

    Signed-off-by: Jiri Slaby
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Jiri Slaby
     
  • Add configuration switch CONFIG_PM_ADVANCED_DEBUG for compiling in
    extra PM debugging/testing code allowing one to access some
    PM-related attributes of devices from the user space via sysfs.

    If CONFIG_PM_ADVANCED_DEBUG is set, add sysfs attribute power/async
    for every device allowing the user space to access the device's
    power.async_suspend flag and modify it, if desired.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Add sysfs attribute /sys/power/pm_async allowing the user space to
    disable/enable asynchronous suspend/resume of devices.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

23 Feb, 2010

1 commit

  • Introduce run-time PM callbacks for the PCI bus type. Make the new
    callbacks work in analogy with the existing system sleep PM
    callbacks, so that the drivers already converted to struct dev_pm_ops
    can use their suspend and resume routines for run-time PM without
    modifications.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Jesse Barnes

    Rafael J. Wysocki
     

16 Dec, 2009

1 commit

  • With the TIOCL_GETKMSGREDIRECT ioctl(), the kernel offers the
    possibility to redirect the kernel messages to a specific console.

    However, since it's not possible to switch to the kernel message console
    after a panic(), it would be nice if the kernel would print the panic
    message on the current console.

    This patch series adds a new interface to access the global kmsg_redirect
    variable by a function to be able to use it in code where
    CONFIG_VT_CONSOLE is not set (kernel/panic.c).

    This patch:

    Instead of using and exporting a global value kmsg_redirect, introduce a
    function vt_kmsg_redirect() that both can set and return the console where
    messages are printed.

    Change all users of kmsg_redirect (the VT code itself and kernel/power.c)
    to the new interface.

    The main advantage is that vt_kmsg_redirect() can also be used when
    CONFIG_VT_CONSOLE is not set.

    Signed-off-by: Bernhard Walle
    Cc: Alan Cox
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
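
    A single function that both sets and returns the redirect console
    could look roughly like the sketch below. The name
    model_vt_kmsg_redirect() is hypothetical, and the convention that
    passing -1 means "query only" is an assumption made for this
    illustration; the commit message does not spell it out.

```c
/*
 * Sets the console that kernel messages are redirected to and
 * returns the previous setting. Passing -1 leaves the setting
 * unchanged, so the call acts as a plain query.
 */
static int model_vt_kmsg_redirect(int new_console)
{
    static int kmsg_console;  /* 0 means no redirection */
    int old = kmsg_console;

    if (new_console != -1)
        kmsg_console = new_console;

    return old;
}
```

    Keeping the variable private to the function means callers that are
    built without the VT code only need the function's prototype (or a
    stub), not an exported global.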
     

06 Dec, 2009

2 commits