20 Jul, 2007

40 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • Implement the cpu_clock(cpu) interface for kernel-internal use:
    high-speed (but slightly incorrect) per-cpu clock constructed from
    sched_clock().

    This API, unused at the moment, will be used in the future by blktrace,
    by the softlockup-watchdog, by printk and by lockstat.
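
    The "slightly incorrect" caveat comes from sched_clock() being fast but
    unsynchronized across cpus; a minimal userspace sketch of one filtering
    idea, locally-monotonic clamping (names and details are hypothetical,
    not the kernel implementation):

    ```c
    #include <assert.h>

    #define NR_CPUS 4

    /* Last value handed out on each cpu (hypothetical per-cpu state). */
    static long long prev_clock[NR_CPUS];

    /*
     * Take a fast but possibly-unstable raw timestamp (think sched_clock())
     * and make it locally monotonic: never return a value smaller than the
     * previous one handed out on this cpu.
     */
    long long cpu_clock_sketch(int cpu, long long raw)
    {
        if (raw < prev_clock[cpu])
            raw = prev_clock[cpu];  /* clamp backward jumps */
        prev_clock[cpu] = raw;
        return raw;
    }
    ```

    Each cpu's clock stays monotonic, but different cpus can still disagree,
    which is the "slightly incorrect" trade-off the summary mentions.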

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • nr_moved is not the correct check for triggering the all-pinned logic.
    Fix the all-pinned logic in the case of load_balance_newidle().

    Signed-off-by: Suresh Siddha
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • In the presence of SMT, newly idle balance was never happening for
    multi-core and SMP domains, even when both logical siblings were idle.

    If thread 0 is already idle and thread 1 is about to go idle, the newly
    idle load balance always thinks that one of the threads is not idle and
    skips doing the newly idle load balance for multi-core and SMP domains.

    This is because of the idle_cpu() macro, which checks if the current
    process on a cpu is an idle process. But this is not the case for the
    thread doing the load_balance_newidle().

    Fix this by using runqueue's nr_running field instead of idle_cpu(). And
    also skip the logic of 'only one idle cpu in the group will be doing
    load balancing' during newly idle case.
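
    A sketch of why the check matters, using a toy runqueue (fields are
    hypothetical stand-ins): idle_cpu() asks whether the idle task is
    current, which is never true for the thread running
    load_balance_newidle() on its own cpu, while nr_running == 0 is:

    ```c
    #include <assert.h>

    /* Toy runqueue; fields are hypothetical stand-ins. */
    struct rq {
        int nr_running;         /* runnable tasks on this cpu */
        int curr_is_idle_task;  /* is the idle task running right now? */
    };

    /* Old check: only true once the idle task is actually current. */
    int cpu_looks_idle_old(const struct rq *rq)
    {
        return rq->curr_is_idle_task;
    }

    /* Fixed check: no runnable tasks means the cpu is going idle. */
    int cpu_looks_idle_fixed(const struct rq *rq)
    {
        return rq->nr_running == 0;
    }
    ```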

    Signed-off-by: Suresh Siddha
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • I've been chasing these comments around this file all week. Hopefully we're
    straight now.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This is the code for the "lg.ko" module, which allows lguest guests to
    be launched.

    [akpm@linux-foundation.org: update for futex-new-private-futexes]
    [akpm@linux-foundation.org: build fix]
    [jmorris@namei.org: lguest: use hrtimers]
    [akpm@linux-foundation.org: x86_64 build fix]
    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Eric Dumazet
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • lguest does some fairly lowlevel things to support a host, which
    normal modules don't need:

    math_state_restore:
    When the guest triggers a Device Not Available fault, we need
    to be able to restore the FPU

    __put_task_struct:
    We need to hold a reference to another task for inter-guest
    I/O, and put_task_struct() is an inline function which calls
    __put_task_struct.

    access_process_vm:
    We need to access another task for inter-guest I/O.

    map_vm_area & __get_vm_area:
    We need to map the switcher shim (i.e. the monitor) at 0xFFC01000.

    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • clocksource_adjust() has a clock argument, which shadows the file global clock
    variable. Fix this up.

    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • When I started adding support for lockdep to 64-bit powerpc, I got a
    lockdep_init_error and with this patch was able to pinpoint why and where
    to put lockdep_init(). Let's support this generally for others adding
    lockdep support to their architecture.

    Signed-off-by: Johannes Berg
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • optionally add class->name_version and class->subclass to the class name

    Signed-off-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • __acquire
       |
      lock _____
       |        \
       |    __contended
       |         |
       |        wait
       | _______/
       |/
       |
    __acquired
       |
    __release
       |
      unlock

    We measure acquisition and contention bouncing.

    This is done by recording a cpu stamp in each lock instance.

    Contention bouncing requires the cpu stamp to be set on acquisition. Hence we
    move __acquired into the generic path.

    __acquired is then used to measure acquisition bouncing by comparing the
    current cpu with the old stamp before replacing it.

    __contended is used to measure contention bouncing (only useful for
    preemptable locks).

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • - update the copyright notices
    - use the default hash function
    - fix a thinko in a BUILD_BUG_ON
    - add a WARN_ON to spot inconsistent naming
    - fix a termination issue in /proc/lock_stat

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Call the new lockstat tracking functions from the various lock primitives.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Present all this fancy new lock statistics information:

    *warning, _wide_ output ahead*

    (output edited for purpose of brevity)

    # cat /proc/lock_stat
    lock_stat version 0.1
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
    class name contentions waittime-min waittime-max waittime-total acquisitions holdtime-min holdtime-max holdtime-total
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------

    &inode->i_mutex: 14458 6.57 398832.75 2469412.23 6768876 0.34 11398383.65 339410830.89
    ---------------
    &inode->i_mutex 4486 [] pipe_wait+0x86/0x8d
    &inode->i_mutex 0 [] pipe_write_fasync+0x29/0x5d
    &inode->i_mutex 0 [] pipe_read+0x74/0x3a5
    &inode->i_mutex 0 [] do_lookup+0x81/0x1ae

    .................................................................................................................................................................

    &inode->i_data.tree_lock-W: 491 0.27 62.47 493.89 2477833 0.39 468.89 1146584.25
    &inode->i_data.tree_lock-R: 65 0.44 4.27 48.78 26288792 0.36 184.62 10197458.24
    --------------------------
    &inode->i_data.tree_lock 46 [] __do_page_cache_readahead+0x69/0x24f
    &inode->i_data.tree_lock 31 [] add_to_page_cache+0x31/0xba
    &inode->i_data.tree_lock 0 [] __do_page_cache_readahead+0xc2/0x24f
    &inode->i_data.tree_lock 0 [] find_get_page+0x1a/0x58

    .................................................................................................................................................................

    proc_inum_idr.lock: 0 0.00 0.00 0.00 36 0.00 65.60 148.26
    proc_subdir_lock: 0 0.00 0.00 0.00 3049859 0.00 106.81 1563212.42
    shrinker_rwsem-W: 0 0.00 0.00 0.00 5 0.00 1.73 3.68
    shrinker_rwsem-R: 0 0.00 0.00 0.00 633 2.57 246.57 10909.76

    'contentions' and 'acquisitions' are the number of such events measured (since
    the last reset). The waittime- and holdtime- (min, max, total) numbers are
    presented in microseconds.

    If there are any contention points, the lock class is presented in the block
    format (as i_mutex and tree_lock above), otherwise a single line of output is
    presented.

    The output is sorted on the absolute number of contentions (read + write);
    this should present the worst offenders first, so that:

    # grep : /proc/lock_stat | head

    will quickly show who's bad.

    The stats can be reset using:

    # echo 0 > /proc/lock_stat

    [bunk@stusta.de: make 2 functions static]
    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Introduce the core lock statistics code.

    Lock statistics provides lock wait-time and hold-time (as well as the count
    of corresponding contention and acquisitions events). Also, the first few
    call-sites that encounter contention are tracked.

    Lock wait-time is the time spent waiting on the lock. This provides insight
    into the locking scheme; that is, a heavily contended lock indicates too
    coarse a locking scheme.

    Lock hold-time is the duration the lock was held; this provides a reference
    for the wait-time numbers, so they can be put into perspective.

    1)
      lock
    2)
      ... do stuff ...
      unlock
    3)

    The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
    hold-time.

    The lockdep held-lock tracking code is reused, because it already collects locks
    into meaningful groups (classes), and because it is an existing infrastructure
    for lock instrumentation.

    Currently lockdep tracks lock acquisition with two hooks:

    lock()
      lock_acquire()
      _lock()

    ... code protected by lock ...

    unlock()
      lock_release()
      _unlock()

    We need to extend this with two more hooks, in order to measure contention.

    lock_contended() - used to measure contention events
    lock_acquired() - completion of the contention

    These are then placed the following way:

    lock()
      lock_acquire()
      if (!_try_lock())
        lock_contended()
        _lock()
      lock_acquired()

    ... do locked stuff ...

    unlock()
      lock_release()
      _unlock()

    (Note: the try_lock() 'trick' is used to avoid instrumenting all platform
    dependent lock primitive implementations.)
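
    The try_lock trick maps directly onto any lock with a non-blocking
    acquire; a hedged sketch with a toy single-threaded flag lock and
    counters standing in for the real hooks and timestamps:

    ```c
    #include <assert.h>

    static int contended_events;  /* slow-path (had to wait) count */
    static int acquired_events;   /* every successful acquisition */

    /* Toy lock: a plain flag stands in for the arch lock primitive. */
    static int lock_flag;

    static int _try_lock(void) { return lock_flag ? 0 : (lock_flag = 1); }
    static void _lock(void)    { lock_flag = 1; /* real code would wait */ }

    /*
     * The trick: the fast path (trylock succeeds) stays uninstrumented;
     * only when the trylock fails do we record a contention event and
     * fall back to the blocking acquire. No arch primitive changes.
     */
    void instrumented_lock(void)
    {
        if (!_try_lock()) {
            contended_events++;   /* lock_contended() */
            _lock();
        }
        acquired_events++;        /* lock_acquired() */
    }
    ```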

    It is also possible to toggle the two lockdep features at runtime using:

    /proc/sys/kernel/prove_locking
    /proc/sys/kernel/lock_stat

    (esp. turning off the O(n^2) prove_locking functionality can help)

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: nuke unneeded ifdefs]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Move code around to get fewer but larger #ifdef sections. Break some
    in-function #ifdefs out into their own functions.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Ensure that all of the lock dependency tracking code is under
    CONFIG_PROVE_LOCKING. This allows us to use the held lock tracking code for
    other purposes.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • This patch adds the /sys/kernel/notes magic file. Reading this delivers the
    contents of the kernel's .notes section. This lets userland easily glean any
    detailed information about the running kernel's build that was stored there at
    compile time.

    Signed-off-by: Roland McGrath
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Signed-off-by: Adrian Bunk
    Cc: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch adds an interface to set/reset flags which determine whether
    each memory segment should be dumped when a core file is generated.

    The /proc/<pid>/coredump_filter file is provided to access the flags. You
    can read the current flags from the file and change them for a particular
    process by writing to it.

    The flag status is inherited by the child process when it is created.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch changes mm_struct.dumpable to a pair of bit flags.

    set_dumpable() converts the three-value dumpable into two flags and stores
    them in the lower two bits of mm_struct.flags instead of mm_struct.dumpable.
    get_dumpable() performs the reverse conversion.
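
    A userspace sketch of the two-bit encoding (bit assignments are
    hypothetical, and the kernel also orders the bit operations carefully
    for concurrent readers, which is omitted here):

    ```c
    #include <assert.h>

    #define MMF_DUMPABLE      0  /* bit 0: core dump allowed */
    #define MMF_DUMP_SECURELY 1  /* bit 1: dump readable by root only */

    /* Encode the old three-value dumpable into the low two bits of flags. */
    void set_dumpable_sketch(unsigned long *flags, int value)
    {
        *flags &= ~3UL;          /* clear both bits first */
        switch (value) {
        case 1:
            *flags |= 1UL << MMF_DUMPABLE;
            break;
        case 2:
            *flags |= (1UL << MMF_DUMPABLE) | (1UL << MMF_DUMP_SECURELY);
            break;
        }
    }

    /* Decode back to the three-value form. */
    int get_dumpable_sketch(unsigned long flags)
    {
        if (flags & (1UL << MMF_DUMP_SECURELY))
            return 2;
        return (flags & (1UL << MMF_DUMPABLE)) ? 1 : 0;
    }
    ```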

    [akpm@linux-foundation.org: export set_dumpable]
    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch series is version 5 of the core dump masking feature, which
    controls which VMAs should be dumped based on their memory types and
    per-process flags.

    I adopted most of Andrew's suggestions on the previous version. He also
    suggested using a system call instead of the /proc/<pid>/ interface, but I
    decided to keep the latter because adding a new system call with a pid
    argument would have a big impact on the kernel.

    You can access the per-process flags via the /proc/<pid>/coredump_filter
    interface. coredump_filter represents a bitmask of memory types, and if a
    bit is set, VMAs of the corresponding memory type are written into a core
    file when the process is dumped. The bitmask is inherited from the parent
    process when a process is created.

    The original purpose is to avoid long system slowdowns when a number of
    processes which share a huge shared memory region are dumped at the same
    time. To achieve this purpose, this patch series adds the ability to
    suppress dumping of anonymous shared memory for specified processes. In
    this version, three other memory types are also supported.

    Here are the coredump_filter bits:
    bit 0: anonymous private memory
    bit 1: anonymous shared memory
    bit 2: file-backed private memory
    bit 3: file-backed shared memory

    The default value of coredump_filter is 0x3. This means the new core dump
    routine behaves the same as the conventional one by default.
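
    The dump decision is a simple bit test against the filter; a hedged
    sketch using the bit assignments listed above (helper and macro names
    are made up):

    ```c
    #include <assert.h>

    /* Filter bits, as listed in the summary above. */
    #define CDF_ANON_PRIVATE (1u << 0)
    #define CDF_ANON_SHARED  (1u << 1)
    #define CDF_FILE_PRIVATE (1u << 2)
    #define CDF_FILE_SHARED  (1u << 3)

    /* Decide whether a VMA of the given type goes into the core file. */
    int vma_dumped(unsigned int filter, int file_backed, int shared)
    {
        unsigned int bit;

        if (file_backed)
            bit = shared ? CDF_FILE_SHARED : CDF_FILE_PRIVATE;
        else
            bit = shared ? CDF_ANON_SHARED : CDF_ANON_PRIVATE;
        return (filter & bit) != 0;
    }
    ```

    With the default filter of 0x3, both anonymous types are dumped and
    both file-backed types are skipped, matching conventional behavior.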

    In this version, coredump_filter bits and mm.dumpable are merged into
    mm.flags, and it is accessed by atomic bitops.

    The supported core file formats are ELF and ELF-FDPIC. ELF has been tested,
    but ELF-FDPIC has not been built and tested because I don't have the test
    environment.

    This patch limits a value of suid_dumpable sysctl to the range of 0 to 2.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
    the old mm into the new mm.

    We create the new mm before the binfmt code runs, and place the new stack at
    the very top of the address space. Once the binfmt code runs and figures out
    where the stack should be, we move it downwards.

    It is a bit peculiar in that we have one task with two mm's, one of which is
    inactive.

    [a.p.zijlstra@chello.nl: limit stack size]
    Signed-off-by: Ollie Wild
    Signed-off-by: Peter Zijlstra
    Cc:
    Cc: Hugh Dickins
    [bunk@stusta.de: unexport bprm_mm_init]
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ollie Wild
     
  • The purpose of audit_bprm() is to log the argv array to a userspace daemon at
    the end of the execve system call. Since user-space hasn't had time to run,
    this array is still in pristine state on the process' stack; so no need to
    copy it, we can just grab it from there.

    In order to minimize the damage to audit_log_*() copy each string into a
    temporary kernel buffer first.

    Currently the audit code requires that the full argument vector fits in a
    single packet. So currently it does clip the argv size to a (sysctl) limit,
    but only when execve auditing is enabled.

    If the audit protocol gets extended to allow for multiple packets this check
    can be removed.
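
    A userspace sketch of the per-string copy into a bounded temporary
    buffer (buffer size and helper name are made up; the real code also
    accounts the total argv size against the sysctl limit):

    ```c
    #include <assert.h>
    #include <string.h>

    #define ARG_COPY_MAX 16  /* stand-in for the per-string copy limit */

    /*
     * Copy one argv string into a bounded temporary buffer before
     * logging, so the log formatter never touches the source directly
     * and oversized strings are clipped rather than overflowing.
     */
    size_t copy_arg_sketch(char *tmp, const char *src)
    {
        size_t len = strnlen(src, ARG_COPY_MAX - 1);

        memcpy(tmp, src, len);
        tmp[len] = '\0';
        return len;
    }
    ```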

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ollie Wild
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Currently most of the per cpu data, which is accessed by different cpus,
    has a ____cacheline_aligned_in_smp attribute. Move all this data to the
    new per cpu shared data section: .data.percpu.shared_aligned.

    This will separate the percpu data which is referenced frequently by
    other cpus from the local-only percpu data.
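
    A userspace analogue of the alignment part (64 bytes is an assumed
    cache-line size; the actual section placement is handled by the linker
    script and isn't shown):

    ```c
    #include <assert.h>

    /* Assumed cache-line size for the sketch. */
    #define CACHELINE 64

    /*
     * Analogue of ____cacheline_aligned_in_smp: give data that many cpus
     * touch its own cache line, so writes to it don't false-share with
     * unrelated cpu-local data packed next to it.
     */
    struct shared_counter {
        long value;
    } __attribute__((aligned(CACHELINE)));
    ```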

    Signed-off-by: Fenghua Yu
    Acked-by: Suresh Siddha
    Cc: Rusty Russell
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fenghua Yu
     
  • I realise jprobes are a razor-blades-included type of interface, but that
    doesn't mean we can't try and make them safer to use. This guy I know once
    wrote code like this:

    struct jprobe jp = { .kp.symbol_name = "foo", .entry = "jprobe_foo" };

    And then his kernel exploded. Oops.

    This patch adds an arch hook, arch_deref_entry_point() (I don't like it
    either) which takes the void * in a struct jprobe, and gives back the text
    address that it represents.

    We can then use that in register_jprobe() to check that the entry point we're
    passed is actually in the kernel text, rather than just some random value.
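
    The check itself is just a range test against the text section; a
    sketch with made-up bounds (arch_deref_entry_point() exists because on
    some architectures a function pointer is really a descriptor that must
    be unwrapped before the address can be range-checked):

    ```c
    #include <assert.h>

    /* Hypothetical kernel text-section bounds for the sketch. */
    static const unsigned long text_start = 0x1000;
    static const unsigned long text_end   = 0x9000;

    /* Registration-time sanity check: does the entry point look like code? */
    int entry_in_kernel_text(unsigned long addr)
    {
        return addr >= text_start && addr < text_end;
    }
    ```

    A string literal's address (the bug in the example above) would fall
    outside the text range and be rejected at register_jprobe() time
    instead of exploding later.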

    Signed-off-by: Michael Ellerman
    Cc: Prasanna S Panchamukhi
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     
  • Move "debug during resume from s2ram" into the variable we already use
    for real-mode flags to simplify the code. This also closes a nasty trap
    for the user in acpi_sleep_setup: the order of parameters actually
    mattered there, with acpi_sleep=s3_bios,s3_mode doing something
    different from acpi_sleep=s3_mode,s3_bios.

    Signed-off-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Add a feature allowing the user to make the system beep during a resume from
    suspend to RAM, on x86_64 and i386.

    This is useful for users with broken resume from RAM, so that they can
    verify whether control reaches the kernel after a wake-up event.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nigel Cunningham
     
  • Introduce the pm_power_off_prepare() callback that can be registered by the
    interested platforms in analogy with pm_idle() and pm_power_off(), used for
    preparing the system to power off (needed by ACPI).

    This allows us to drop acpi_sysclass and device_acpi, which are only
    defined in order to register the ACPI power off preparation callback;
    pm_power_off() itself is registered in a much different way.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The SNAPSHOT_S2RAM ioctl code is outdated and it should not duplicate the
    suspend code in kernel/power/main.c. Fix that.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • At present, if a user mode helper is running while
    usermodehelper_pm_callback() is executed, the helper may be frozen and the
    completion in call_usermodehelper_exec() won't be completed until user
    space processes are thawed. As a result, the freezing of kernel threads
    may fail, which is not desirable.

    Prevent this from happening by introducing a counter of running user mode
    helpers and allowing usermodehelper_pm_callback() to succeed for action =
    PM_HIBERNATION_PREPARE or action = PM_SUSPEND_PREPARE only if there are no
    helpers running. [Namely, usermodehelper_pm_callback() waits for at most
    RUNNING_HELPERS_TIMEOUT for the number of running helpers to become zero
    and fails if that doesn't happen.]
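
    A toy model of the counting scheme (names are hypothetical, and the
    kernel version waits on a completion with a real timeout rather than
    polling):

    ```c
    #include <assert.h>

    /* Toy model of the running-helpers counter. */
    static int running_helpers;

    void helper_started(void)  { running_helpers++; }
    void helper_finished(void) { running_helpers--; }

    /*
     * The PM callback succeeds only when no helpers are running. The
     * "wait up to RUNNING_HELPERS_TIMEOUT" step is modeled here as a
     * bounded number of polls.
     */
    int freeze_helpers_sketch(int max_polls)
    {
        int i;

        for (i = 0; i < max_polls; i++) {
            if (running_helpers == 0)
                return 0;   /* safe to proceed with freezing */
            /* real code: wait_for_completion_timeout(...) */
        }
        return -1;          /* helpers still running: fail the freeze */
    }
    ```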

    Special thanks to Uli Luckas, Pavel Machek and Oleg Nesterov for
    reviewing the previous versions of this patch and for very useful
    comments.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Uli Luckas
    Acked-by: Nigel Cunningham
    Acked-by: Pavel Machek
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Use a hibernation and suspend notifier to disable the user mode helper before
    a hibernation/suspend and enable it after the operation.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Acked-by: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Make it possible to register hibernation and suspend notifiers, so that
    subsystems can perform hibernation-related or suspend-related operations that
    should not be carried out by device drivers' .suspend() and .resume()
    routines.

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • We don't need to check if todo is positive before calling time_after() in
    try_to_freeze_tasks(), because if todo is zero at this point, the loop will be
    broken anyway due to the while () condition being false.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Make try_to_freeze_tasks() and freeze_processes() return -EBUSY on failure
    instead of the number of unfrozen tasks (none of the callers actually uses
    this number).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Use __set_current_state() as appropriate in refrigerator() instead of
    accessing current->state directly.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Kernel threads should not have TIF_FREEZE set when user space processes are
    being frozen, since otherwise some of them might be frozen prematurely.
    To prevent this from happening we can (1) make exit_mm() unset TIF_FREEZE
    unconditionally just after clearing tsk->mm and (2) make try_to_freeze_tasks()
    check if p->mm is different from NULL and PF_BORROWED_MM is unset in p->flags
    when user space processes are to be frozen.

    Namely, when user space processes are being frozen, we only should set
    TIF_FREEZE for tasks that have p->mm different from NULL and don't have
    PF_BORROWED_MM set in p->flags. For this reason task_lock() must be used to
    prevent try_to_freeze_tasks() from racing with use_mm()/unuse_mm(), in which
    p->mm and p->flags.PF_BORROWED_MM are changed under task_lock(p). Also, we
    need to prevent the following scenario from happening:

    * daemonize() is called by a task spawned from a user space code path
    * freezer checks if the task has p->mm set and the result is positive
    * task enters exit_mm() and clears its TIF_FREEZE
    * freezer sets TIF_FREEZE for the task
    * task calls try_to_freeze() and goes to the refrigerator, which is wrong at
    that point

    This requires us to acquire task_lock(p) before p->flags.PF_BORROWED_MM and
    p->mm are examined and release it after TIF_FREEZE is set for p (or it turns
    out that TIF_FREEZE should not be set).

    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Cc: Nigel Cunningham
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • During hibernation we call hibernation_ops->prepare() before creating the image,
    but then, before saving it, we cancel the power transition by calling
    hibernation_ops->finish(). Thus prior to calling hibernation_ops->enter() we
    should let the platform firmware know that we're going to enter the low power
    state after all.

    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Cc: Nigel Cunningham
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Change the code ordering so that hibernation_ops->prepare() is called after
    device_suspend(). This is needed so that we don't violate the ACPI
    specification, which states that the _PTS and _GTS system-control methods,
    executed from acpi_sleep_prepare(), ought to be called after devices have been
    put in low power states.

    The "Finish" label in hibernation_restore() is moved, because device_suspend()
    resumes devices if the suspending of them fails and the restore code ordering
    should reflect the hibernation code ordering.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki