20 Jul, 2007

40 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • Implement the cpu_clock(cpu) interface for kernel-internal use:
    high-speed (but slightly incorrect) per-cpu clock constructed from
    sched_clock().

    This API, unused at the moment, will be used in the future by blktrace,
    by the softlockup-watchdog, by printk and by lockstat.
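
    The "slightly incorrect" caveat comes from sched_clock() being fast but
    unsynchronized across cpus; a minimal userspace sketch of one filtering
    idea, locally-monotonic clamping (names and details are hypothetical,
    not the kernel implementation):

    ```c
    #include <assert.h>

    #define NR_CPUS 4

    /* Last value handed out on each cpu (hypothetical per-cpu state). */
    static long long prev_clock[NR_CPUS];

    /*
     * Take a fast but possibly-unstable raw timestamp (think sched_clock())
     * and make it locally monotonic: never return a value smaller than the
     * previous one handed out on this cpu.
     */
    long long cpu_clock_sketch(int cpu, long long raw)
    {
        if (raw < prev_clock[cpu])
            raw = prev_clock[cpu];  /* clamp backward jumps */
        prev_clock[cpu] = raw;
        return raw;
    }
    ```

    Each cpu's clock stays monotonic, but different cpus can still disagree,
    which is the "slightly incorrect" trade-off the summary mentions.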

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • nr_moved is not the correct check for triggering the all-pinned logic.
    Fix the all-pinned logic in the case of load_balance_newidle().

    Signed-off-by: Suresh Siddha
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • In the presence of SMT, newly idle balance was never happening for
    multi-core and SMP domains, even when both logical siblings were idle.

    If thread 0 is already idle and thread 1 is about to go idle, the newly
    idle load balance always thinks that one of the threads is not idle and
    skips doing the newly idle load balance for multi-core and SMP domains.

    This is because of the idle_cpu() macro, which checks if the current
    process on a cpu is an idle process. But this is not the case for the
    thread doing the load_balance_newidle().

    Fix this by using runqueue's nr_running field instead of idle_cpu(). And
    also skip the logic of 'only one idle cpu in the group will be doing
    load balancing' during newly idle case.
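
    A sketch of why the check matters, using a toy runqueue (fields are
    hypothetical stand-ins): idle_cpu() asks whether the idle task is
    current, which is never true for the thread running
    load_balance_newidle() on its own cpu, while nr_running == 0 is:

    ```c
    #include <assert.h>

    /* Toy runqueue; fields are hypothetical stand-ins. */
    struct rq {
        int nr_running;         /* runnable tasks on this cpu */
        int curr_is_idle_task;  /* is the idle task running right now? */
    };

    /* Old check: only true once the idle task is actually current. */
    int cpu_looks_idle_old(const struct rq *rq)
    {
        return rq->curr_is_idle_task;
    }

    /* Fixed check: no runnable tasks means the cpu is going idle. */
    int cpu_looks_idle_fixed(const struct rq *rq)
    {
        return rq->nr_running == 0;
    }
    ```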

    Signed-off-by: Suresh Siddha
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • I've been chasing these comments around this file all week. Hopefully we're
    straight now.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This is the code for the "lg.ko" module, which allows lguest guests to
    be launched.

    [akpm@linux-foundation.org: update for futex-new-private-futexes]
    [akpm@linux-foundation.org: build fix]
    [jmorris@namei.org: lguest: use hrtimers]
    [akpm@linux-foundation.org: x86_64 build fix]
    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Eric Dumazet
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • lguest does some fairly lowlevel things to support a host, which
    normal modules don't need:

    math_state_restore:
    When the guest triggers a Device Not Available fault, we need
    to be able to restore the FPU

    __put_task_struct:
    We need to hold a reference to another task for inter-guest
    I/O, and put_task_struct() is an inline function which calls
    __put_task_struct.

    access_process_vm:
    We need to access another task for inter-guest I/O.

    map_vm_area & __get_vm_area:
    We need to map the switcher shim (i.e. the monitor) at 0xFFC01000.

    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • clocksource_adjust() has a clock argument, which shadows the file global clock
    variable. Fix this up.

    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • When I started adding support for lockdep to 64-bit powerpc, I got a
    lockdep_init_error and with this patch was able to pinpoint why and where
    to put lockdep_init(). Let's support this generally for others adding
    lockdep support to their architecture.

    Signed-off-by: Johannes Berg
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • optionally add class->name_version and class->subclass to the class name

    Signed-off-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • __acquire
       |
      lock _____
       |        \
       |    __contended
       |         |
       |        wait
       | _______/
       |/
       |
    __acquired
       |
    __release
       |
      unlock

    We measure acquisition and contention bouncing.

    This is done by recording a cpu stamp in each lock instance.

    Contention bouncing requires the cpu stamp to be set on acquisition. Hence we
    move __acquired into the generic path.

    __acquired is then used to measure acquisition bouncing by comparing the
    current cpu with the old stamp before replacing it.

    __contended is used to measure contention bouncing (only useful for
    preemptable locks).

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • - update the copyright notices
    - use the default hash function
    - fix a thinko in a BUILD_BUG_ON
    - add a WARN_ON to spot inconsistent naming
    - fix a termination issue in /proc/lock_stat

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Call the new lockstat tracking functions from the various lock primitives.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Present all this fancy new lock statistics information:

    *warning, _wide_ output ahead*

    (output edited for purpose of brevity)

    # cat /proc/lock_stat
    lock_stat version 0.1
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
    class name contentions waittime-min waittime-max waittime-total acquisitions holdtime-min holdtime-max holdtime-total
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------

    &inode->i_mutex: 14458 6.57 398832.75 2469412.23 6768876 0.34 11398383.65 339410830.89
    ---------------
    &inode->i_mutex 4486 [] pipe_wait+0x86/0x8d
    &inode->i_mutex 0 [] pipe_write_fasync+0x29/0x5d
    &inode->i_mutex 0 [] pipe_read+0x74/0x3a5
    &inode->i_mutex 0 [] do_lookup+0x81/0x1ae

    .................................................................................................................................................................

    &inode->i_data.tree_lock-W: 491 0.27 62.47 493.89 2477833 0.39 468.89 1146584.25
    &inode->i_data.tree_lock-R: 65 0.44 4.27 48.78 26288792 0.36 184.62 10197458.24
    --------------------------
    &inode->i_data.tree_lock 46 [] __do_page_cache_readahead+0x69/0x24f
    &inode->i_data.tree_lock 31 [] add_to_page_cache+0x31/0xba
    &inode->i_data.tree_lock 0 [] __do_page_cache_readahead+0xc2/0x24f
    &inode->i_data.tree_lock 0 [] find_get_page+0x1a/0x58

    .................................................................................................................................................................

    proc_inum_idr.lock: 0 0.00 0.00 0.00 36 0.00 65.60 148.26
    proc_subdir_lock: 0 0.00 0.00 0.00 3049859 0.00 106.81 1563212.42
    shrinker_rwsem-W: 0 0.00 0.00 0.00 5 0.00 1.73 3.68
    shrinker_rwsem-R: 0 0.00 0.00 0.00 633 2.57 246.57 10909.76

    'contentions' and 'acquisitions' are the number of such events measured (since
    the last reset). The waittime- and holdtime- (min, max, total) numbers are
    presented in microseconds.

    If there are any contention points, the lock class is presented in the block
    format (as i_mutex and tree_lock above), otherwise a single line of output is
    presented.

    The output is sorted on the absolute number of contentions (read + write);
    this should present the worst offenders first, so that:

    # grep : /proc/lock_stat | head

    will quickly show who's bad.

    The stats can be reset using:

    # echo 0 > /proc/lock_stat

    [bunk@stusta.de: make 2 functions static]
    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Introduce the core lock statistics code.

    Lock statistics provides lock wait-time and hold-time (as well as the count
    of corresponding contention and acquisitions events). Also, the first few
    call-sites that encounter contention are tracked.

    Lock wait-time is the time spent waiting on the lock. This provides insight
    into the locking scheme; that is, a heavily contended lock indicates too
    coarse a locking scheme.

    Lock hold-time is the duration the lock was held; this provides a reference
    for the wait-time numbers, so they can be put into perspective.

    1)
      lock
    2)
      ... do stuff ...
      unlock
    3)

    The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
    hold-time.

    The lockdep held-lock tracking code is reused, because it already collects locks
    into meaningful groups (classes), and because it is an existing infrastructure
    for lock instrumentation.

    Currently lockdep tracks lock acquisition with two hooks:

    lock()
      lock_acquire()
      _lock()

    ... code protected by lock ...

    unlock()
      lock_release()
      _unlock()

    We need to extend this with two more hooks, in order to measure contention.

    lock_contended() - used to measure contention events
    lock_acquired() - completion of the contention

    These are then placed the following way:

    lock()
      lock_acquire()
      if (!_try_lock())
        lock_contended()
        _lock()
      lock_acquired()

    ... do locked stuff ...

    unlock()
      lock_release()
      _unlock()

    (Note: the try_lock() 'trick' is used to avoid instrumenting all platform
    dependent lock primitive implementations.)
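
    The try_lock trick maps directly onto any lock with a non-blocking
    acquire; a hedged sketch with a toy single-threaded flag lock and
    counters standing in for the real hooks and timestamps:

    ```c
    #include <assert.h>

    static int contended_events;  /* slow-path (had to wait) count */
    static int acquired_events;   /* every successful acquisition */

    /* Toy lock: a plain flag stands in for the arch lock primitive. */
    static int lock_flag;

    static int _try_lock(void) { return lock_flag ? 0 : (lock_flag = 1); }
    static void _lock(void)    { lock_flag = 1; /* real code would wait */ }

    /*
     * The trick: the fast path (trylock succeeds) stays uninstrumented;
     * only when the trylock fails do we record a contention event and
     * fall back to the blocking acquire. No arch primitive changes.
     */
    void instrumented_lock(void)
    {
        if (!_try_lock()) {
            contended_events++;   /* lock_contended() */
            _lock();
        }
        acquired_events++;        /* lock_acquired() */
    }
    ```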

    It is also possible to toggle the two lockdep features at runtime using:

    /proc/sys/kernel/prove_locking
    /proc/sys/kernel/lock_stat

    (esp. turning off the O(n^2) prove_locking functionality can help)

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: nuke unneeded ifdefs]
    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Move code around to get fewer but larger #ifdef sections. Break some
    in-function #ifdefs out into their own functions.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Ensure that all of the lock dependency tracking code is under
    CONFIG_PROVE_LOCKING. This allows us to use the held lock tracking code for
    other purposes.

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • This patch adds the /sys/kernel/notes magic file. Reading this delivers the
    contents of the kernel's .notes section. This lets userland easily glean any
    detailed information about the running kernel's build that was stored there at
    compile time.

    Signed-off-by: Roland McGrath
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Signed-off-by: Adrian Bunk
    Cc: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch adds an interface to set/reset flags which determine whether
    each memory segment should be dumped when a core file is generated.

    The /proc/<pid>/coredump_filter file is provided to access the flags. You
    can read the current flags from the file and change them for a particular
    process by writing to it.

    The flag status is inherited by the child process when it is created.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch changes mm_struct.dumpable to a pair of bit flags.

    set_dumpable() converts the three-value dumpable into two flags and stores
    them in the lower two bits of mm_struct.flags instead of mm_struct.dumpable.
    get_dumpable() performs the reverse conversion.
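
    A userspace sketch of the two-bit encoding (bit assignments are
    hypothetical, and the kernel also orders the bit operations carefully
    for concurrent readers, which is omitted here):

    ```c
    #include <assert.h>

    #define MMF_DUMPABLE      0  /* bit 0: core dump allowed */
    #define MMF_DUMP_SECURELY 1  /* bit 1: dump readable by root only */

    /* Encode the old three-value dumpable into the low two bits of flags. */
    void set_dumpable_sketch(unsigned long *flags, int value)
    {
        *flags &= ~3UL;          /* clear both bits first */
        switch (value) {
        case 1:
            *flags |= 1UL << MMF_DUMPABLE;
            break;
        case 2:
            *flags |= (1UL << MMF_DUMPABLE) | (1UL << MMF_DUMP_SECURELY);
            break;
        }
    }

    /* Decode back to the three-value form. */
    int get_dumpable_sketch(unsigned long flags)
    {
        if (flags & (1UL << MMF_DUMP_SECURELY))
            return 2;
        return (flags & (1UL << MMF_DUMPABLE)) ? 1 : 0;
    }
    ```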

    [akpm@linux-foundation.org: export set_dumpable]
    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch series is version 5 of the core dump masking feature, which
    controls which VMAs should be dumped based on their memory types and
    per-process flags.

    I adopted most of Andrew's suggestions on the previous version. He also
    suggested using a system call instead of the /proc/<pid>/ interface, but I
    decided to keep the latter because adding a new system call with a pid
    argument would have a big impact on the kernel.

    You can access the per-process flags via the /proc/<pid>/coredump_filter
    interface. coredump_filter represents a bitmask of memory types, and if a
    bit is set, VMAs of the corresponding memory type are written into a core
    file when the process is dumped. The bitmask is inherited from the parent
    process when a process is created.

    The original purpose is to avoid long system slowdowns when a number of
    processes which share a huge shared memory region are dumped at the same
    time. To achieve this purpose, this patch series adds the ability to
    suppress dumping of anonymous shared memory for specified processes. In
    this version, three other memory types are also supported.

    Here are the coredump_filter bits:
    bit 0: anonymous private memory
    bit 1: anonymous shared memory
    bit 2: file-backed private memory
    bit 3: file-backed shared memory

    The default value of coredump_filter is 0x3. This means the new core dump
    routine behaves the same as the conventional one by default.
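
    The dump decision is a simple bit test against the filter; a hedged
    sketch using the bit assignments listed above (helper and macro names
    are made up):

    ```c
    #include <assert.h>

    /* Filter bits, as listed in the summary above. */
    #define CDF_ANON_PRIVATE (1u << 0)
    #define CDF_ANON_SHARED  (1u << 1)
    #define CDF_FILE_PRIVATE (1u << 2)
    #define CDF_FILE_SHARED  (1u << 3)

    /* Decide whether a VMA of the given type goes into the core file. */
    int vma_dumped(unsigned int filter, int file_backed, int shared)
    {
        unsigned int bit;

        if (file_backed)
            bit = shared ? CDF_FILE_SHARED : CDF_FILE_PRIVATE;
        else
            bit = shared ? CDF_ANON_SHARED : CDF_ANON_PRIVATE;
        return (filter & bit) != 0;
    }
    ```

    With the default filter of 0x3, both anonymous types are dumped and
    both file-backed types are skipped, matching conventional behavior.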

    In this version, coredump_filter bits and mm.dumpable are merged into
    mm.flags, and it is accessed by atomic bitops.

    The supported core file formats are ELF and ELF-FDPIC. ELF has been tested,
    but ELF-FDPIC has not been built and tested because I don't have the test
    environment.

    This patch limits a value of suid_dumpable sysctl to the range of 0 to 2.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
    the old mm into the new mm.

    We create the new mm before the binfmt code runs, and place the new stack at
    the very top of the address space. Once the binfmt code runs and figures out
    where the stack should be, we move it downwards.

    It is a bit peculiar in that we have one task with two mm's, one of which is
    inactive.

    [a.p.zijlstra@chello.nl: limit stack size]
    Signed-off-by: Ollie Wild
    Signed-off-by: Peter Zijlstra
    Cc:
    Cc: Hugh Dickins
    [bunk@stusta.de: unexport bprm_mm_init]
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ollie Wild
     
  • The purpose of audit_bprm() is to log the argv array to a userspace daemon at
    the end of the execve system call. Since user-space hasn't had time to run,
    this array is still in pristine state on the process' stack; so no need to
    copy it, we can just grab it from there.

    In order to minimize the damage to audit_log_*() copy each string into a
    temporary kernel buffer first.

    Currently the audit code requires that the full argument vector fits in a
    single packet. So currently it does clip the argv size to a (sysctl) limit,
    but only when execve auditing is enabled.

    If the audit protocol gets extended to allow for multiple packets this check
    can be removed.
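
    A userspace sketch of the per-string copy into a bounded temporary
    buffer (buffer size and helper name are made up; the real code also
    accounts the total argv size against the sysctl limit):

    ```c
    #include <assert.h>
    #include <string.h>

    #define ARG_COPY_MAX 16  /* stand-in for the per-string copy limit */

    /*
     * Copy one argv string into a bounded temporary buffer before
     * logging, so the log formatter never touches the source directly
     * and oversized strings are clipped rather than overflowing.
     */
    size_t copy_arg_sketch(char *tmp, const char *src)
    {
        size_t len = strnlen(src, ARG_COPY_MAX - 1);

        memcpy(tmp, src, len);
        tmp[len] = '\0';
        return len;
    }
    ```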

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ollie Wild
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Currently most of the per cpu data, which is accessed by different cpus,
    has a ____cacheline_aligned_in_smp attribute. Move all this data to the
    new per cpu shared data section: .data.percpu.shared_aligned.

    This will separate the percpu data which is referenced frequently by
    other cpus from the local-only percpu data.
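
    A userspace analogue of the alignment part (64 bytes is an assumed
    cache-line size; the actual section placement is handled by the linker
    script and isn't shown):

    ```c
    #include <assert.h>

    /* Assumed cache-line size for the sketch. */
    #define CACHELINE 64

    /*
     * Analogue of ____cacheline_aligned_in_smp: give data that many cpus
     * touch its own cache line, so writes to it don't false-share with
     * unrelated cpu-local data packed next to it.
     */
    struct shared_counter {
        long value;
    } __attribute__((aligned(CACHELINE)));
    ```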

    Signed-off-by: Fenghua Yu
    Acked-by: Suresh Siddha
    Cc: Rusty Russell
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fenghua Yu
     
  • I realise jprobes are a razor-blades-included type of interface, but that
    doesn't mean we can't try and make them safer to use. This guy I know once
    wrote code like this:

    struct jprobe jp = { .kp.symbol_name = "foo", .entry = "jprobe_foo" };

    And then his kernel exploded. Oops.

    This patch adds an arch hook, arch_deref_entry_point() (I don't like it
    either) which takes the void * in a struct jprobe, and gives back the text
    address that it represents.

    We can then use that in register_jprobe() to check that the entry point we're
    passed is actually in the kernel text, rather than just some random value.
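
    The check itself is just a range test against the text section; a
    sketch with made-up bounds (arch_deref_entry_point() exists because on
    some architectures a function pointer is really a descriptor that must
    be unwrapped before the address can be range-checked):

    ```c
    #include <assert.h>

    /* Hypothetical kernel text-section bounds for the sketch. */
    static const unsigned long text_start = 0x1000;
    static const unsigned long text_end   = 0x9000;

    /* Registration-time sanity check: does the entry point look like code? */
    int entry_in_kernel_text(unsigned long addr)
    {
        return addr >= text_start && addr < text_end;
    }
    ```

    A string literal's address (the bug in the example above) would fall
    outside the text range and be rejected at register_jprobe() time
    instead of exploding later.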

    Signed-off-by: Michael Ellerman
    Cc: Prasanna S Panchamukhi
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     
  • Move "debug during resume from s2ram" into the variable we already use
    for real-mode flags to simplify the code. This also closes a nasty trap
    for the user in acpi_sleep_setup: the order of parameters actually
    mattered there, with acpi_sleep=s3_bios,s3_mode doing something
    different from acpi_sleep=s3_mode,s3_bios.

    Signed-off-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Add a feature allowing the user to make the system beep during a resume from
    suspend to RAM, on x86_64 and i386.

    This is useful for users with broken resume from RAM, so that they can
    verify whether control reaches the kernel after a wake-up event.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nigel Cunningham
     
  • Introduce the pm_power_off_prepare() callback that can be registered by the
    interested platforms in analogy with pm_idle() and pm_power_off(), used for
    preparing the system to power off (needed by ACPI).

    This allows us to drop acpi_sysclass and device_acpi, which are only
    defined in order to register the ACPI power off preparation callback;
    pm_power_off() itself is registered in a much different way.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The SNAPSHOT_S2RAM ioctl code is outdated and it should not duplicate the
    suspend code in kernel/power/main.c. Fix that.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • At present, if a user mode helper is running while
    usermodehelper_pm_callback() is executed, the helper may be frozen and the
    completion in call_usermodehelper_exec() won't be completed until user
    space processes are thawed. As a result, the freezing of kernel threads
    may fail, which is not desirable.

    Prevent this from happening by introducing a counter of running user mode
    helpers and allowing usermodehelper_pm_callback() to succeed for action =
    PM_HIBERNATION_PREPARE or action = PM_SUSPEND_PREPARE only if there are no
    helpers running. [Namely, usermodehelper_pm_callback() waits for at most
    RUNNING_HELPERS_TIMEOUT for the number of running helpers to become zero
    and fails if that doesn't happen.]
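
    A toy model of the counting scheme (names are hypothetical, and the
    kernel version waits on a completion with a real timeout rather than
    polling):

    ```c
    #include <assert.h>

    /* Toy model of the running-helpers counter. */
    static int running_helpers;

    void helper_started(void)  { running_helpers++; }
    void helper_finished(void) { running_helpers--; }

    /*
     * The PM callback succeeds only when no helpers are running. The
     * "wait up to RUNNING_HELPERS_TIMEOUT" step is modeled here as a
     * bounded number of polls.
     */
    int freeze_helpers_sketch(int max_polls)
    {
        int i;

        for (i = 0; i < max_polls; i++) {
            if (running_helpers == 0)
                return 0;   /* safe to proceed with freezing */
            /* real code: wait_for_completion_timeout(...) */
        }
        return -1;          /* helpers still running: fail the freeze */
    }
    ```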

    Special thanks to Uli Luckas, Pavel Machek and Oleg Nesterov for
    reviewing the previous versions of this patch and for very useful
    comments.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Uli Luckas
    Acked-by: Nigel Cunningham
    Acked-by: Pavel Machek
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Use a hibernation and suspend notifier to disable the user mode helper before
    a hibernation/suspend and enable it after the operation.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Acked-by: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Make it possible to register hibernation and suspend notifiers, so that
    subsystems can perform hibernation-related or suspend-related operations that
    should not be carried out by device drivers' .suspend() and .resume()
    routines.

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • We don't need to check if todo is positive before calling time_after() in
    try_to_freeze_tasks(), because if todo is zero at this point, the loop will be
    broken anyway due to the while () condition being false.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Make try_to_freeze_tasks() and freeze_processes() return -EBUSY on failure
    instead of the number of unfrozen tasks (none of the callers actually uses
    this number).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Use __set_current_state() as appropriate in refrigerator() instead of
    accessing current->state directly.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Kernel threads should not have TIF_FREEZE set when user space processes are
    being frozen, since otherwise some of them might be frozen prematurely.
    To prevent this from happening we can (1) make exit_mm() unset TIF_FREEZE
    unconditionally just after clearing tsk->mm and (2) make try_to_freeze_tasks()
    check if p->mm is different from NULL and PF_BORROWED_MM is unset in p->flags
    when user space processes are to be frozen.

    Namely, when user space processes are being frozen, we only should set
    TIF_FREEZE for tasks that have p->mm different from NULL and don't have
    PF_BORROWED_MM set in p->flags. For this reason task_lock() must be used to
    prevent try_to_freeze_tasks() from racing with use_mm()/unuse_mm(), in which
    p->mm and p->flags.PF_BORROWED_MM are changed under task_lock(p). Also, we
    need to prevent the following scenario from happening:

    * daemonize() is called by a task spawned from a user space code path
    * freezer checks if the task has p->mm set and the result is positive
    * task enters exit_mm() and clears its TIF_FREEZE
    * freezer sets TIF_FREEZE for the task
    * task calls try_to_freeze() and goes to the refrigerator, which is wrong at
    that point

    This requires us to acquire task_lock(p) before p->flags.PF_BORROWED_MM and
    p->mm are examined and release it after TIF_FREEZE is set for p (or it turns
    out that TIF_FREEZE should not be set).

    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Cc: Nigel Cunningham
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • During hibernation we call hibernation_ops->prepare() before creating the image,
    but then, before saving it, we cancel the power transition by calling
    hibernation_ops->finish(). Thus prior to calling hibernation_ops->enter() we
    should let the platform firmware know that we're going to enter the low power
    state after all.

    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Cc: Nigel Cunningham
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Change the code ordering so that hibernation_ops->prepare() is called after
    device_suspend(). This is needed so that we don't violate the ACPI
    specification, which states that the _PTS and _GTS system-control methods,
    executed from acpi_sleep_prepare(), ought to be called after devices have been
    put in low power states.

    The "Finish" label in hibernation_restore() is moved, because device_suspend()
    resumes devices if the suspending of them fails and the restore code ordering
    should reflect the hibernation code ordering.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki