11 Jan, 2012

1 commit

  • When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder >
    0, we have lot of free pages that are not marked so. Snapshot code
    account them as savable, what cause hibernate memory preallocation
    failure.

    It is pretty hard to make hibernate allocation succeed with
    debug_guardpage_minorder=1. This change at least make it possible when
    system has relatively big amount of RAM.

    Signed-off-by: Stanislaw Gruszka
    Acked-by: Rafael J. Wysocki
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislaw Gruszka
     

17 Oct, 2011

1 commit

  • For s390 there is one additional byte associated with each page,
    the storage key. This byte contains the referenced and changed
    bits and needs to be included into the hibernation image.
    If the storage keys are not restored to their previous state all
    original pages would appear to be dirty. This can cause
    inconsistencies e.g. with read-only filesystems.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Rafael J. Wysocki

    Martin Schwidefsky
     

07 Jul, 2011

1 commit

  • There is a bug in free_unnecessary_pages() that causes it to
    attempt to free too many pages in some cases, which triggers the
    BUG_ON() in memory_bm_clear_bit() for copy_bm. Namely, if
    count_data_pages() is initially greater than alloc_normal, we get
    to_free_normal equal to 0 and "save" greater from 0. In that case,
    if the sum of "save" and count_highmem_pages() is greater than
    alloc_highmem, we subtract a positive number from to_free_normal.
    Hence, since to_free_normal was 0 before the subtraction and is
    an unsigned int, the result is converted to a huge positive number
    that is used as the number of pages to free.

    Fix this bug by checking if to_free_normal is actually greater
    than or equal to the number we're going to subtract from it.

    Signed-off-by: Rafael J. Wysocki
    Reported-and-tested-by: Matthew Garrett
    Cc: stable@kernel.org

    Rafael J. Wysocki
     

18 May, 2011

2 commits

  • This reverts commit bea3864fb627d110933cfb8babe048b63c4fc76e
    (PM / Hibernate: Reduce autotuned default image size), because users
    are now able to resolve the issue this commit was supposed to address
    in a different way (i.e. by using the new /sys/power/reserved_size
    interface).

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Martin reports that on his system hibernation occasionally fails due
    to the lack of memory, because the radeon driver apparently allocates
    too much of it during the device freeze stage. It turns out that the
    amount of memory allocated by radeon during hibernation (and
    presumably during system suspend too) depends on the utilization of
    the GPU (e.g. hibernating while there are two KDE 4 sessions with
    compositing enabled causes radeon to allocate more memory than for
    one KDE 4 session).

    In principle it should be possible to use image_size to make the
    memory preallocation mechanism free enough memory for the radeon
    driver, but in practice it is not easy to guess the right value
    because of the way the preallocation code uses image_size. For this
    reason, it seems reasonable to allow users to control the amount of
    memory reserved for driver allocations made after the hibernate
    preallocation, which currently is constant and amounts to 1 MB.

    Introduce a new sysfs file, /sys/power/reserved_size, whose value
    will be used as the amount of memory to reserve for the
    post-preallocation reservations made by device drivers, in bytes.
    For backwards compatibility, set its default (and initial) value to
    the currently used number (1 MB).

    References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
    Reported-and-tested-by: Martin Steigerwald
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

15 Mar, 2011

1 commit

  • The hibernate image size autotuning mechanism sets the default
    image size to 5/2 of the total system RAM, but it is reported
    that on some systems device drivers allocate substantial
    amounts of memory during suspend and the creation of the image
    fails as a result (too little memory is preallocated).

    Modify the autotuning mechanism to use 1/3 instead of 2/5 of RAM
    as the default image size, which is reported to be sufficient for
    the affected systems.

    References: https://bugzilla.kernel.org/show_bug.cgi?id=30482
    Reported-and-tested-by: Martin Steigerwald
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

17 Feb, 2011

1 commit


27 Oct, 2010

2 commits


17 Oct, 2010

2 commits


12 Sep, 2010

2 commits

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PM / Hibernate: Avoid hitting OOM during preallocation of memory
    PM QoS: Correct pr_debug() misuse and improve parameter checks
    PM: Prevent waiting forever on asynchronous resume after failing suspend

    Linus Torvalds
     
  • There is a problem in hibernate_preallocate_memory() that it calls
    preallocate_image_memory() with an argument that may be greater than
    the total number of available non-highmem memory pages. If that's
    the case, the OOM condition is guaranteed to trigger, which in turn
    can cause significant slowdown to occur during hibernation.

    To avoid that, make preallocate_image_memory() adjust its argument
    before calling preallocate_image_pages(), so that the total number of
    saveable non-highem pages left is not less than the minimum size of
    a hibernation image. Change hibernate_preallocate_memory() to try to
    allocate from highmem if the number of pages allocated by
    preallocate_image_memory() is too low.

    Modify free_unnecessary_pages() to take all possible memory
    allocation patterns into account.

    Reported-by: KOSAKI Motohiro
    Signed-off-by: Rafael J. Wysocki
    Tested-by: M. Vefa Bicakci

    Rafael J. Wysocki
     

10 Sep, 2010

1 commit

  • Please revert 2.6.36-rc commit d2997b1042ec150616c1963b5e5e919ffd0b0ebf
    "hibernation: freeze swap at hibernation". It complicated matters by
    adding a second swap allocation path, just for hibernation; without in any
    way fixing the issue that it was intended to address - page reclaim after
    fixing the hibernation image might free swap from a page already imaged as
    swapcache, letting its swap be reallocated to store a different page of
    the image: resulting in data corruption if the imaged page were freed as
    clean then swapped back in. Pages freed to si->swap_map were still in
    danger of being reallocated by the alternative allocation path.

    I guess it inadvertently fixed slow SSD swap allocation for hibernation,
    as reported by Nigel Cunningham: by missing out the discards that occur on
    the usual swap allocation path; but that was unintentional, and needs a
    separate fix.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: "Rafael J. Wysocki"
    Cc: Ondrej Zary
    Cc: Andrea Gelmini
    Cc: Balbir Singh
    Cc: Andrea Arcangeli
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

10 Aug, 2010

1 commit

  • When taking a memory snapshot in hibernate_snapshot(), all (directly
    called) memory allocations use GFP_ATOMIC. Hence swap misusage during
    hibernation never occurs.

    But from a pessimistic point of view, there is no guarantee that no page
    allcation has __GFP_WAIT. It is better to have a global indication "we
    enter hibernation, don't use swap!".

    This patch tries to freeze new-swap-allocation during hibernation. (All
    user processes are frozenm so swapin is not a concern).

    This way, no updates will happen to swap_map[] between
    hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free()
    is called. We can be assured that swap corruption will not occur.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Rafael J. Wysocki"
    Cc: Hugh Dickins
    Cc: KOSAKI Motohiro
    Cc: Ondrej Zary
    Cc: Balbir Singh
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

19 Jul, 2010

1 commit


11 May, 2010

1 commit

  • Remove support of reads with offset. This means snapshot_read/write_next
    now does not accept count parameter. It allows to clean up the functions
    and snapshot handle which no longer needs to care about offsets.

    /dev/snapshot handler is converted to simple_{read_from,write_to}_buffer
    which take care of offsets.

    Signed-off-by: Jiri Slaby
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Jiri Slaby
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Feb, 2010

2 commits

  • The hibernate memory preallocation code allocates memory to push some
    user space data out of physical RAM, so that the hibernation image is
    not too large. It allocates more memory than necessary for creating
    the image, so it has to release some pages to make room for
    allocations made while suspending devices and disabling nonboot CPUs,
    or the system will hang due to the lack of free pages to allocate
    from. Unfortunately, the function used for freeing these pages,
    free_unnecessary_pages(), contains a bug that prevents it from doing
    the job on all systems without highmem.

    Fix this problem, which is a regression from the 2.6.30 kernel, by
    using the right condition for the termination of the loop in
    free_unnecessary_pages().

    Signed-off-by: Rafael J. Wysocki
    Reported-and-tested-by: Alan Jenkins
    Cc: stable@kernel.org

    Rafael J. Wysocki
     
  • Remove a trailing space from a message in swsusp_save().

    Signed-off-by: Frans Pop
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Frans Pop
     

22 Sep, 2009

1 commit

  • Since alloc_bootmem() will never return inaccessible (via virtual
    addressing) memory anyway, using the ..._low() variant only makes sense
    when the physical address range of the allocated memory must fulfill
    further constraints, espacially since on 64-bits (or more generally in all
    cases where the pools the two variants allocate from are than the full
    available range.

    Probably the use in alloc_tce_table() could also be eliminated (based on
    code inspection of pci-calgary_64.c), but that seems too risky given I
    know nothing about that hardware and have no way to test it.

    Signed-off-by: Jan Beulich
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

15 Sep, 2009

5 commits

  • Fix the definition of BM_BITS_PER_BLOCK and kerneldoc
    description of create_bm_block_list().

    [rjw: Added changelog.]

    Signed-off-by: Wu Fengguang
    Signed-off-by: Rafael J. Wysocki

    Wu Fengguang
     
  • Use for_each_populated_zone() instead of for_each_zone() in hibernation
    code. This fixes a bug on s390, where we allow both config options
    HIBERNATION and MEMORY_HOTPLUG, so that we also have a ZONE_MOVABLE
    here. We only allow hibernation if no memory hotplug operation was
    performed, so in fact both features can only be used exclusively, but
    this way we don't need 2 differently configured (distribution) kernels.

    If we have an unpopulated ZONE_MOVABLE, we allow hibernation but run
    into a BUG_ON() in memory_bm_test/set/clear_bit() because hibernation
    code iterates through all zones, not only the populated zones, in
    several places. For example, swsusp_free() does for_each_zone() and
    then checks for pfn_valid(), which is true even if the zone is not
    populated, resulting in a BUG_ON() later because the pfn cannot be
    found in the memory bitmap.

    Replacing all occurences of for_each_zone() in hibernation code with
    for_each_populated_zone() would fix this issue.

    [rjw: Rebased on top of linux-next hibernation patches.]

    Signed-off-by: Gerald Schaefer
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Rafael J. Wysocki

    Gerald Schaefer
     
  • We want to avoid attempting to free too much memory too hard during
    hibernation, so estimate the minimum size of the image to use as the
    lower limit for preallocating memory.

    The approach here is based on the (experimental) observation that we
    can't free more page frames than the sum of:

    * global_page_state(NR_SLAB_RECLAIMABLE)
    * global_page_state(NR_ACTIVE_ANON)
    * global_page_state(NR_INACTIVE_ANON)
    * global_page_state(NR_ACTIVE_FILE)
    * global_page_state(NR_INACTIVE_FILE)

    minus

    * global_page_state(NR_FILE_MAPPED)

    Namely, if this number is subtracted from the number of saveable
    pages in the system, we get a good estimate of the minimum reasonable
    size of a hibernation image.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Wu Fengguang

    Rafael J. Wysocki
     
  • Since the hibernation code is now going to use allocations of memory
    to make enough room for the image, it can also use the page frames
    allocated at this stage as image page frames. The low-level
    hibernation code needs to be rearranged for this purpose, but it
    allows us to avoid freeing a great number of pages and allocating
    these same pages once again later, so it generally is worth doing.

    [rev. 2: Take highmem into account correctly.]

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
    just once to make some room for the image and then allocates memory
    to apply more pressure to the memory management subsystem, if
    necessary.

    Unfortunately, we don't seem to be able to drop shrink_all_memory()
    entirely just yet, because that would lead to huge performance
    regressions in some test cases.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek

    Rafael J. Wysocki
     

13 Jun, 2009

1 commit

  • A future patch is going to modify the memory shrinking code so that
    it will make memory allocations to free memory instead of using an
    artificial memory shrinking mechanism for that. For this purpose it
    is convenient to move swsusp_shrink_memory() from
    kernel/power/swsusp.c to kernel/power/snapshot.c, because the new
    memory-shrinking code is going to use things that are local to
    kernel/power/snapshot.c .

    [rev. 2: Make some functions static and remove their headers from
    kernel/power/power.h]

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Acked-by: Wu Fengguang

    Rafael J. Wysocki
     

01 Apr, 2009

1 commit

  • Impact: cleanup

    In almost cases, for_each_zone() is used with populated_zone(). It's
    because almost function doesn't need memoryless node information.
    Therefore, for_each_populated_zone() can help to make code simplify.

    This patch has no functional change.

    [akpm@linux-foundation.org: small cleanup]
    Signed-off-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Reviewed-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

19 Dec, 2008

3 commits

  • Replace one evaluation of pfn_to_page() in copy_data_pages() with
    the value of a local variable containing the right number already.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     
  • It has been requested to make hibernation work with memory
    hotplugging enabled and for this purpose the hibernation code has to
    be reworked to take the possible overlapping of zones into account.
    Thus, rework the hibernation memory bitmaps code to prevent
    duplication of PFNs from occuring and add checks to make sure that
    one page frame will not be marked as saveable many times.

    Additionally, use list.h lists instead of open-coded lists to
    implement the memory bitmaps.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     
  • During resume from hibernation using the userland interface image
    data are being passed from the used space process to the kernel.
    These data need not be valid, but currently the kernel sometimes
    oopses if it gets invalid image data, which is wrong. Make the
    kernel return error codes to the user space in such cases.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     

25 Jul, 2008

1 commit

  • This patch simplifies the memory bitmap manipulations.

    - remove the member size in struct bm_block

    It is not necessary for struct bm_block to have the number of bit chunks that
    can be calculated by using end_pfn and start_pfn.

    - use find_next_bit() for memory_bm_next_pfn

    No need to invent the bitmap library only for the memory bitmap.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

12 Mar, 2008

1 commit

  • There is a problem in the hibernation code that triggers on some NUMA
    systems on which pfn_valid() returns 'true' for some PFNs that don't
    belong to any zone. Namely, there is a BUG_ON() in
    memory_bm_find_bit() that triggers for PFNs not belonging to any
    zone and passing the pfn_valid() test. On the affected systems it
    triggers when we mark PFNs reported by the platform as not saveable,
    because the PFNs in question belong to a region mapped directly using
    iorepam() (i.e. the ACPI data area) and they pass the pfn_valid()
    test.

    Modify memory_bm_find_bit() so that it returns an error if given PFN
    doesn't belong to any zone instead of crashing the kernel and ignore
    the result returned by it in mark_nosave_pages(), while marking the
    "nosave" memory regions.

    This doesn't affect the hibernation functionality, as we won't touch
    the PFNs in question anyway.

    http://bugzilla.kernel.org/show_bug.cgi?id=9966 .

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     

21 Feb, 2008

1 commit

  • Make hibernation work with CONFIG_DEBUG_PAGEALLOC set on x86, by
    checking if the pages to be copied are marked as present in the
    kernel mapping and temporarily marking them as present if that's not
    the case. No functional modifications are introduced if
    CONFIG_DEBUG_PAGEALLOC is unset.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     

06 Feb, 2008

1 commit

  • - Add comments explaing how drain_pages() works.

    - Eliminate useless functions

    - Rename drain_all_local_pages to drain_all_pages(). It does drain
    all pages not only those of the local processor.

    - Eliminate useless interrupt off / on sequences. drain_pages()
    disables interrupts on its own. The execution thread is
    pinned to processor by the caller. So there is no need to
    disable interrupts.

    - Put drain_all_pages() declaration in gfp.h and remove the
    declarations from suspend.h and from mm/memory_hotplug.c

    - Make software suspend call drain_all_pages(). The draining
    of processor local pages is may not the right approach if
    software suspend wants to support SMP. If they call drain_all_pages
    then we can make drain_pages() static.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Christoph Lameter
    Acked-by: Mel Gorman
    Cc: "Rafael J. Wysocki"
    Cc: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

02 Feb, 2008

3 commits

  • Make hibernation messages start with one common prefix "PM: " and use
    the word "hibernation" in the messages as a synonym of "suspend to
    disk".

    Turn some KERN_INFO messages into debug ones.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     
  • This patch moves the prototypes of count_highmem_pages() and
    restore_highmem() to kernel/power/power.h

    Signed-off-by: Adrian Bunk
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Adrian Bunk
     
  • Add a new ioctl, SNAPSHOT_GET_IMAGE_SIZE, returning the size of the (just
    created) hibernation image, to the hibernation userland interface.

    This ioctl is necessary so that the userland utilities using the interface need
    not access the hibernation image header, owned by the kernel, in order to obtain
    the size of the image.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     

20 Oct, 2007

1 commit


19 Oct, 2007

1 commit

  • Add the bits needed for supporting arbitrary boot kernels to the common
    hibernation code.

    To support arbitrary boot kernels, make it possible to replace the 'struct
    new_utsname' and the kernel version in the hibernation image header by some
    architecture specific data that will be used to verify if the image is valid
    and to restore the image.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki