Eric Lee / smarc-fsl-linux-kernel

18 Aug, 2010

1 commit

d7627467b Make do_execve() take a const filename pointer ... Browse Code »

Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:

arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().

do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.

Further kernel_execve() and sys_execve() need to be changed to match.

This has been test built on x86_64, frv, arm and mips.

Signed-off-by: David Howells
Tested-by: Ralf Baechle
Acked-by: Russell King
Signed-off-by: Linus Torvalds

David Howells
2010-08-18 09:07:43 +0800

14 Aug, 2010

1 commit

c7dcf87a6 time: Workaround gcc loop optimization that causes 64bit div errors ... Browse Code »

Early 4.3 versions of gcc apparently aggressively optimize the raw
time accumulation loop, replacing it with a divide.

On 32bit systems, this causes the following link errors:
undefined reference to `__umoddi3'
undefined reference to `__udivdi3'

The gcc issue has been fixed in 4.4 and greater.

This patch replaces the accumulation loop with a do_div, as suggested
by Linus.

Signed-off-by: John Stultz
CC: Jason Wessel
CC: Larry Finger
CC: Ingo Molnar
CC: Linus Torvalds
Signed-off-by: Linus Torvalds

John Stultz
2010-08-14 03:03:24 +0800

13 Aug, 2010

4 commits

2069601b3 Revert "fsnotify: store struct file not struct path" ... Browse Code »

This reverts commit 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e (and the
accompanying commit c1e5c954020e "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).

The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.

Fix up various conflicts due to later fsnotify work.

Signed-off-by: Linus Torvalds

Linus Torvalds
2010-08-13 05:23:04 +0800
26df0766a Merge branch 'params' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus ... Browse Code »

* 'params' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (22 commits)
param: don't deref arg in __same_type() checks
param: update drivers/acpi/debug.c to new scheme
param: use module_param in drivers/message/fusion/mptbase.c
ide: use module_param_named rather than module_param_call
param: update drivers/char/ipmi/ipmi_watchdog.c to new scheme
param: lock if_sdio's lbs_helper_name and lbs_fw_name against sysfs changes.
param: lock myri10ge_fw_name against sysfs changes.
param: simple locking for sysfs-writable charp parameters
param: remove unnecessary writable charp
param: add kerneldoc to moduleparam.h
param: locking for kernel parameters
param: make param sections const.
param: use free hook for charp (fix leak of charp parameters)
param: add a free hook to kernel_param_ops.
param: silence .init.text references from param ops
Add param ops struct for hvc_iucv driver.
nfs: update for module_param_named API change
AppArmor: update for module_param_named API change
param: use ops in struct kernel_param, rather than get and set fns directly
param: move the EXPORT_SYMBOL to after the definitions.
...

Linus Torvalds
2010-08-13 01:01:59 +0800
deda2e819 timekeeping: Fix overflow in rawtime tv_nsec on 32 bit archs ... Browse Code »

The tv_nsec is a long and when added to the shifted interval it can wrap
and become negative which later causes looping problems in the
getrawmonotonic(). The edge case occurs when the system has slept for
a short period of time of ~2 seconds.

A trace printk of the values in this patch illustrate the problem:

ftrace time stamp: log
43.716079: logarithmic_accumulation: raw: 3d0913 tv_nsec d687faa
43.718513: logarithmic_accumulation: raw: 3d0913 tv_nsec da588bd
43.722161: logarithmic_accumulation: raw: 3d0913 tv_nsec de291d0
46.349925: logarithmic_accumulation: raw: 7a122600 tv_nsec e1f9ae3
46.349930: logarithmic_accumulation: raw: 1e848980 tv_nsec 8831c0e3

The kernel starts looping at 46.349925 in the getrawmonotonic() due to
the negative value from adding the raw value to tv_nsec.

A simple solution is to accumulate into a u64, and then normalize it
to a timespec_t.

Signed-off-by: Jason Wessel
[ Reworked variable names and simplified some of the code. - John ]
Signed-off-by: John Stultz
Cc: Thomas Gleixner
Cc: H. Peter Anvin
Signed-off-by: Linus Torvalds

Jason Wessel
2010-08-13 00:53:39 +0800
12fdff3fc Add a dummy printk function for the maintenance of unused printks ... Browse Code »

Add a dummy printk function for the maintenance of unused printks through gcc
format checking, and also so that side-effect checking is maintained too.

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2010-08-13 00:51:35 +0800

12 Aug, 2010

3 commits

8d57a98cc block: add secure discard ... Browse Code »

Secure discard is the same as discard except that all copies of the
discarded sectors (perhaps created by garbage collection) must also be
erased.

Signed-off-by: Adrian Hunter
Acked-by: Jens Axboe
Cc: Kyungmin Park
Cc: Madhusudhan Chikkature
Cc: Christoph Hellwig
Cc: Ben Gardiner
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Hunter
2010-08-12 23:43:30 +0800
d78a3eda6 kernel/kfifo.c: add handling of chained scatterlists ... Browse Code »

The current kfifo scatterlist implementation will not work with chained
scatterlists. It assumes that struct scatterlist arrays are allocated
contiguously, which is not the case when chained scatterlists (struct
sg_table) are in use.

Signed-off-by: Stefani Seibold
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stefani Seibold
2010-08-12 23:43:29 +0800
5af568cbd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
isofs: Fix lseek() to position beyond 4 GB
vfs: remove unused MNT_STRICTATIME
vfs: show unreachable paths in getcwd and proc
vfs: only add " (deleted)" where necessary
vfs: add prepend_path() helper
vfs: __d_path: dont prepend the name of the root dentry
ia64: perfmon: add d_dname method
vfs: add helpers to get root and pwd
cachefiles: use path_get instead of lone dget
fs/sysv/super.c: add support for non-PDP11 v7 filesystems
V7: Adjust sanity checks for some volumes
Add v7 alias
v9fs: fixup for inode_setattr being removed

Manual merge to take Al's version of the fs/sysv/super.c file: it merged
cleanly, but Al had removed an unnecessary header include, so his side
was better.

Linus Torvalds
2010-08-12 00:23:32 +0800

11 Aug, 2010

22 commits

2e956fb32 kfifo: replace the old non generic API ... Browse Code »

Simply replace the whole kfifo.c and kfifo.h files with the new generic
version and fix the kerneldoc API template file.

Signed-off-by: Stefani Seibold
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stefani Seibold
2010-08-11 23:59:23 +0800
4201d9a8e kfifo: add the new generic kfifo API ... Browse Code »

Add the new version of the kfifo API files kfifo.c and kfifo.h.

Signed-off-by: Stefani Seibold
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stefani Seibold
2010-08-11 23:59:23 +0800
f65a03f6a kexec: return -EFAULT on copy_to_user() failures ... Browse Code »

copy_to/from_user() returns the number of bytes remaining to be copied.
It never returns a negative value. The correct return code is -EFAULT and
not -EIO.

All the callers check for non-zero returns so that's Ok, but the return
code is passed to the user so we should fix this.

Signed-off-by: Dan Carpenter
Cc: Hidetoshi Seto
Cc: "Paul E. McKenney"
Cc: "Eric W. Biederman"
Cc: Simon Kagstrom
Acked-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2010-08-11 23:59:22 +0800
863a60492 lib/bug.c: add oops end marker to WARN implementation ... Browse Code »

We are missing the oops end marker for the exception based WARN implementation
in lib/bug.c. This is useful for logfile analysis tools.

Signed-off-by: Anton Blanchard
Cc: Ingo Molnar
Cc: Arjan van de Ven
Cc: "Kirill A. Shutemov"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anton Blanchard
2010-08-11 23:59:22 +0800
c7ff0d9c9 panic: keep blinking in spite of long spin timer mode ... Browse Code »

To keep panic_timeout accuracy when running under a hypervisor, the
current implementation only spins on long time (1 second) calls to mdelay.
That brings a good effect, but the problem is the keyboard LEDs don't
blink at all on that situation.

This patch changes to call to panic_blink_enter() between every mdelay and
keeps blinking in spite of long spin timer mode.

The time to call to mdelay is now 100ms. Even this change will keep
panic_timeout accuracy enough when running under a hypervisor.

Signed-off-by: TAMUKI Shoichi
Cc: Ben Dooks
Cc: Russell King
Acked-by: Dmitry Torokhov
Cc: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

TAMUKI Shoichi
2010-08-11 23:59:22 +0800
c52b0b91b pids: alloc_pidmap: remove the unnecessary boundary checks ... Browse Code »

alloc_pidmap() calculates max_scan so that if the initial offset != 0 we
inspect the first map->page twice. This is correct, we want to find the
unused bits < offset in this bitmap block. Add the comment.

But it doesn't make any sense to stop the find_next_offset() loop when we
are looking into this map->page for the second time. We have already
already checked the bits >= offset during the first attempt, it is fine to
do this again, no matter if we succeed this time or not.

Remove this hard-to-understand code. It optimizes the very unlikely case
when we are going to fail, but slows down the more likely case.

Signed-off-by: Oleg Nesterov
Cc: Salman Qazi
Cc: Ingo Molnar
Cc: Sukadev Bhattiprolu
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2010-08-11 23:59:20 +0800
5fdee8c4a pids: fix a race in pid generation that causes pids to be reused immediately ... Browse Code »

A program that repeatedly forks and waits is susceptible to having the
same pid repeated, especially when it competes with another instance of
the same program. This is really bad for bash implementation.
Furthermore, many shell scripts assume that pid numbers will not be used
for some length of time.

Race Description:

A B

// pid == offset == n // pid == offset == n + 1
test_and_set_bit(offset, map->page)
test_and_set_bit(offset, map->page);
pid_ns->last_pid = pid;
pid_ns->last_pid = pid;
// pid == n + 1 is freed (wait())

// Next fork()...
last = pid_ns->last_pid; // == n
pid = last + 1;

Code to reproduce it (Running multiple instances is more effective):

#include
#include
#include
#include
#include
#include

// The distance mod 32768 between two pids, where the first pid is expected
// to be smaller than the second.
int PidDistance(pid_t first, pid_t second) {
return (second + 32768 - first) % 32768;
}

int main(int argc, char* argv[]) {
int failed = 0;
pid_t last_pid = 0;
int i;
printf("%d\n", sizeof(pid_t));
for (i = 0; i < 10000000; ++i) {
if (i % 32786 == 0)
printf("Iter: %d\n", i/32768);
int child_exit_code = i % 256;
pid_t pid = fork();
if (pid == -1) {
fprintf(stderr, "fork failed, iteration %d, errno=%d", i, errno);
exit(1);
}
if (pid == 0) {
// Child
exit(child_exit_code);
} else {
// Parent
if (i > 0) {
int distance = PidDistance(last_pid, pid);
if (distance == 0 || distance > 30000) {
fprintf(stderr,
"Unexpected pid sequence: previous fork: pid=%d, "
"current fork: pid=%d for iteration=%d.\n",
last_pid, pid, i);
failed = 1;
}
}
last_pid = pid;
int status;
int reaped = wait(&status);
if (reaped != pid) {
fprintf(stderr,
"Wait return value: expected pid=%d, "
"got %d, iteration %d\n",
pid, reaped, i);
failed = 1;
} else if (WEXITSTATUS(status) != child_exit_code) {
fprintf(stderr,
"Unexpected exit status %x, iteration %d\n",
WEXITSTATUS(status), i);
failed = 1;
}
}
}
exit(failed);
}

Thanks to Ted Tso for the key ideas of this implementation.

Signed-off-by: Salman Qazi
Cc: Ingo Molnar
Cc: Theodore Ts'o
Cc: Peter Zijlstra
Cc: Sukadev Bhattiprolu
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Salman
2010-08-11 23:59:20 +0800
c7e49c148 ptrace: optimize exit_ptrace() for the likely case ... Browse Code »

exit_ptrace() takes tasklist_lock unconditionally. We need this lock to
avoid the race with ptrace_traceme(), it acts as a barrier.

Change its caller, forget_original_parent(), to call exit_ptrace() under
tasklist_lock. Change exit_ptrace() to drop and reacquire this lock if
needed.

This allows us to add the fastpath list_empty(ptraced) check. In the
likely no-tracees case exit_ptrace() just returns and we avoid the lock()
+ unlock() sequence.

"Zhang, Yanmin" suggested to add this
check, and he reports that this change adds about 11% improvement in some
tests.

Suggested-and-tested-by: "Zhang, Yanmin"
Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2010-08-11 23:59:19 +0800
e400c2852 cgroups: save space for the terminator ... Browse Code »

The original code didn't leave enough space for a NULL terminator. These
strings are copied with strcpy() into fixed length buffers in
cgroup_root_from_opts().

Signed-off-by: Dan Carpenter
Acked-by: Serge E. Hallyn
Reviewd-by: KAMEZAWA Hiroyuki
Cc: Paul Menage
Cc: Li Zefan
Cc: Ben Blum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2010-08-11 23:59:18 +0800
907b29eb4 param: locking for kernel parameters ... Browse Code »

There may be cases (most obviously, sysfs-writable charp parameters) where
a module needs to prevent sysfs access to parameters.

Rather than express this in terms of a big lock, the functions are
expressed in terms of what they protect against. This is clearer, esp.
if the implementation changes to a module-level or even param-level lock.

Signed-off-by: Rusty Russell
Reviewed-by: Takashi Iwai
Tested-by: Phil Carmody

Rusty Russell
2010-08-11 21:34:20 +0800
914dcaa84 param: make param sections const. ... Browse Code »

Since this section can be read-only (they're in .rodata), they should
always have been const. Minor flow-through various functions.

Signed-off-by: Rusty Russell
Tested-by: Phil Carmody

Rusty Russell
2010-08-11 21:34:19 +0800
a1054322a param: use free hook for charp (fix leak of charp parameters) ... Browse Code »

Instead of using a "I kmalloced this" flag, we keep track of the kmalloced
strings and use that list to check if we need to kfree (in practice, the
list is very short).

This means that kparams can be const again, and plugs a leak. This
is important for drivers/usb/gadget/nokia.c which gets modprobe/rmmod'ed
frequently on the N9000.

Signed-off-by: Rusty Russell
Reviewed-by: Takashi Iwai
Cc: Artem Bityutskiy
Tested-by: Phil Carmody

Rusty Russell
2010-08-11 21:34:18 +0800
e6df34a44 param: add a free hook to kernel_param_ops. ... Browse Code »

This allows us to generalize the KPARAM_KMALLOCED flag, by calling a function
on every parameter when a module is unloaded.

Signed-off-by: Rusty Russell
Reviewed-by: Takashi Iwai
Tested-by: Phil Carmody

Rusty Russell
2010-08-11 21:34:18 +0800
9bbb9e5a3 param: use ops in struct kernel_param, rather than get and set fns directly ... Browse Code »

This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.

The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).

Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are). This causes some
harmless warnings until we fix them (in following patches).

To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.

Signed-off-by: Rusty Russell
Reviewed-by: Takashi Iwai
Tested-by: Phil Carmody
Cc: "David S. Miller"
Cc: Ville Syrjala
Cc: Dmitry Torokhov
Cc: Alessandro Rubini
Cc: Michal Januszewski
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Neil Brown
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org

Rusty Russell
2010-08-11 21:34:13 +0800
a14fe249a param: move the EXPORT_SYMBOL to after the definitions. ... Browse Code »

This is modern style, and good to do before we start changing things.

Signed-off-by: Rusty Russell
Reviewed-by: Takashi Iwai
Tested-by: Phil Carmody

Rusty Russell
2010-08-11 21:34:12 +0800
2e9fb9953 params: don't hand NULL values to param.set callbacks. ... Browse Code »

An audit by Dongdong Deng revealed that most driver-author-written param
calls don't handle val == NULL (which happens when parameters are specified
with no =, eg "foo" instead of "foo=1").

The only real case to use this is boolean, so handle it specially for that
case and remove a source of bugs for everyone else.

Signed-off-by: Rusty Russell
Cc: Dongdong Deng
Cc: Américo Wang

Rusty Russell
2010-08-11 21:34:11 +0800
f7ad3c6be vfs: add helpers to get root and pwd ... Browse Code »

Add three helpers that retrieve a refcounted copy of the root and cwd
from the supplied fs_struct.

get_fs_root()
get_fs_pwd()
get_fs_root_and_pwd()

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2010-08-11 12:28:20 +0800
0caa62106 kernel/timer.c: fix kernel-doc function parameter warning ... Browse Code »

Fix kernel-doc warning, add @timer description:

Warning(kernel/timer.c:335): No description found for parameter 'timer'

Signed-off-by: Randy Dunlap
Cc: Thomas Gleixner
Signed-off-by: Linus Torvalds

Randy Dunlap
2010-08-11 06:33:09 +0800
2f9e825d3 Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.

Linus Torvalds
2010-08-11 06:22:42 +0800
b34d8915c Merge branch 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux ... Browse Code »

* 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
unistd: add __NR_prlimit64 syscall numbers
rlimits: implement prlimit64 syscall
rlimits: switch more rlimit syscalls to do_prlimit
rlimits: redo do_setrlimit to more generic do_prlimit
rlimits: add rlimit64 structure
rlimits: do security check under task_lock
rlimits: allow setrlimit to non-current tasks
rlimits: split sys_setrlimit
rlimits: selinux, do rlimits changes under task_lock
rlimits: make sure ->rlim_max never grows in sys_setrlimit
rlimits: add task_struct to update_rlimit_cpu
rlimits: security, add task_struct to setrlimit

Fix up various system call number conflicts. We not only added fanotify
system calls in the meantime, but asm-generic/unistd.h added a wait4
along with a range of reserved per-architecture system calls.

Linus Torvalds
2010-08-11 03:07:51 +0800
8c8946f50 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify ... Browse Code »

* 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
fanotify: use both marks when possible
fsnotify: pass both the vfsmount mark and inode mark
fsnotify: walk the inode and vfsmount lists simultaneously
fsnotify: rework ignored mark flushing
fsnotify: remove global fsnotify groups lists
fsnotify: remove group->mask
fsnotify: remove the global masks
fsnotify: cleanup should_send_event
fanotify: use the mark in handler functions
audit: use the mark in handler functions
dnotify: use the mark in handler functions
inotify: use the mark in handler functions
fsnotify: send fsnotify_mark to groups in event handling functions
fsnotify: Exchange list heads instead of moving elements
fsnotify: srcu to protect read side of inode and vfsmount locks
fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
fsnotify: use _rcu functions for mark list traversal
fsnotify: place marks on object in order of group memory address
vfs/fsnotify: fsnotify_close can delay the final work in fput
fsnotify: store struct file not struct path
...

Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.

Linus Torvalds
2010-08-11 02:39:13 +0800
5f248c9c2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...

Fix up trivial conflicts in fs/nilfs2/super.c

Linus Torvalds
2010-08-11 02:26:52 +0800

10 Aug, 2010

8 commits

8c4af38e9 gcc-4.6: printk: use stable variable to dump kmsg buffer ... Browse Code »

kmsg_dump takes care to sample the global variables
inside a spinlock, but then goes on to use the same
variables outside the spinlock region too.

Use the correct variable. This will make the race
window smaller.

Found by gcc 4.6's new warnings.

Signed-off-by: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2010-08-10 11:45:06 +0800
878ae1274 stop_machine: struct cpu_stopper, remove alignment padding on 64 bits ... Browse Code »

Reorder elements in structure cpu_stopper to remove alignment padding on
64 bit builds, this shrinks its size from 40 to 32 bytes saving 8 bytes
per cpu.

Signed-off-by: Richard Kennedy
Acked-by: Tejun Heo
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Richard Kennedy
2010-08-10 11:45:06 +0800
459b37d42 kernel/range: remove unused definition of ARRAY_SIZE() ... Browse Code »

Remove duplicate definition of ARRAY_SIZE(), which was never used anyway.

Signed-off-by: Geert Uytterhoeven
Cc: Yinghai Lu
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Geert Uytterhoeven
2010-08-10 11:45:06 +0800
2ee7c922f sys_personality: remove the bogus checks in sys_personality()->__set_personality() path ... Browse Code »

Cleanup, no functional changes.

- __set_personality() always changes ->exec_domain/personality, the
special case when ->exec_domain remains the same buys nothing but
complicates the code. Unify both cases to simplify the code.

- The -EINVAL check in sys_personality() was never right. If we assume
that set_personality() can fail we should check the value it returns
instead of verifying that task->personality was actually changed.

Remove it. Before the previous patch it was possible to hit this case
due to overflow problems, but this -EINVAL just indicated the kernel
bug.

OTOH, probably it makes sense to change lookup_exec_domain() to return
ERR_PTR() instead of default_exec_domain if the search in exec_domains
list fails, and report this error to the user-space. But this means
another user-space change, and we have in-kernel users which need fixes.
For example, PER_OSF4 falls into PER_MASK for unkown reason and nobody
cares to register this domain.

Signed-off-by: Oleg Nesterov
Cc: Wenming Zhang
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2010-08-10 11:45:05 +0800
d2997b104 hibernation: freeze swap at hibernation ... Browse Code »

When taking a memory snapshot in hibernate_snapshot(), all (directly
called) memory allocations use GFP_ATOMIC. Hence swap misusage during
hibernation never occurs.

But from a pessimistic point of view, there is no guarantee that no page
allcation has __GFP_WAIT. It is better to have a global indication "we
enter hibernation, don't use swap!".

This patch tries to freeze new-swap-allocation during hibernation. (All
user processes are frozenm so swapin is not a concern).

This way, no updates will happen to swap_map[] between
hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free()
is called. We can be assured that swap corruption will not occur.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: "Rafael J. Wysocki"
Cc: Hugh Dickins
Cc: KOSAKI Motohiro
Cc: Ondrej Zary
Cc: Balbir Singh
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-08-10 11:45:04 +0800
a63d83f42 oom: badness heuristic rewrite ... Browse Code »

This a complete rewrite of the oom killer's badness() heuristic which is
used to determine which task to kill in oom conditions. The goal is to
make it as simple and predictable as possible so the results are better
understood and we end up killing the task which will lead to the most
memory freeing while still respecting the fine-tuning from userspace.

Instead of basing the heuristic on mm->total_vm for each task, the task's
rss and swap space is used instead. This is a better indication of the
amount of memory that will be freeable if the oom killed task is chosen
and subsequently exits. This helps specifically in cases where KDE or
GNOME is chosen for oom kill on desktop systems instead of a memory
hogging task.

The baseline for the heuristic is a proportion of memory that each task is
currently using in memory plus swap compared to the amount of "allowable"
memory. "Allowable," in this sense, means the system-wide resources for
unconstrained oom conditions, the set of mempolicy nodes, the mems
attached to current's cpuset, or a memory controller's limit. The
proportion is given on a scale of 0 (never kill) to 1000 (always kill),
roughly meaning that if a task has a badness() score of 500 that the task
consumes approximately 50% of allowable memory resident in RAM or in swap
space.

The proportion is always relative to the amount of "allowable" memory and
not the total amount of RAM systemwide so that mempolicies and cpusets may
operate in isolation; they shall not need to know the true size of the
machine on which they are running if they are bound to a specific set of
nodes or mems, respectively.

Root tasks are given 3% extra memory just like __vm_enough_memory()
provides in LSMs. In the event of two tasks consuming similar amounts of
memory, it is generally better to save root's task.

Because of the change in the badness() heuristic's baseline, it is also
necessary to introduce a new user interface to tune it. It's not possible
to redefine the meaning of /proc/pid/oom_adj with a new scale since the
ABI cannot be changed for backward compatability. Instead, a new tunable,
/proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may
be used to polarize the heuristic such that certain tasks are never
considered for oom kill while others may always be considered. The value
is added directly into the badness() score so a value of -500, for
example, means to discount 50% of its memory consumption in comparison to
other tasks either on the system, bound to the mempolicy, in the cpuset,
or sharing the same memory controller.

/proc/pid/oom_adj is changed so that its meaning is rescaled into the
units used by /proc/pid/oom_score_adj, and vice versa. Changing one of
these per-task tunables will rescale the value of the other to an
equivalent meaning. Although /proc/pid/oom_adj was originally defined as
a bitshift on the badness score, it now shares the same linear growth as
/proc/pid/oom_score_adj but with different granularity. This is required
so the ABI is not broken with userspace applications and allows oom_adj to
be deprecated for future removal.

Signed-off-by: David Rientjes
Cc: Nick Piggin
Cc: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: Oleg Nesterov
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2010-08-10 11:45:02 +0800
8e4228e1e oom: move sysctl declarations to oom.h ... Browse Code »

The three oom killer sysctl variables (sysctl_oom_dump_tasks,
sysctl_oom_kill_allocating_task, and sysctl_panic_on_oom) are better
declared in include/linux/oom.h rather than kernel/sysctl.c.

Signed-off-by: David Rientjes
Acked-by: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2010-08-10 11:44:57 +0800
ebabe9a90 pass a struct path to vfs_statfs ... Browse Code »

We'll need the path to implement the flags field for statvfs support.
We do have it available in all callers except:

- ecryptfs_statfs. This one doesn't actually need vfs_statfs but just
needs to do a caller to the lower filesystem statfs method.
- sys_ustat. Add a non-exported statfs_by_dentry helper for it which
doesn't won't be able to fill out the flags field later on.

In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
of the misleading vfs prefix.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-08-10 04:48:42 +0800

09 Aug, 2010

1 commit

f6500947a workqueue: workqueue_cpu_callback() should be cpu_notifier instead of hotcpu_notifier ... Browse Code »

Commit 6ee0578b (workqueue: mark init_workqueues as early_initcall)
made workqueue SMP initialization depend on workqueue_cpu_callback(),
which however was registered as hotcpu_notifier() and didn't get
called if CONFIG_HOTPLUG_CPU is not set. This made gcwqs on non-boot
CPUs not create their initial workers leading to boot failures. Fix
it by making it a cpu_notifier.

Signed-off-by: Tejun Heo
Reported-and-bisected-by: walt
Tested-by: Markus Trippelsdorf

Tejun Heo
2010-08-09 17:50:34 +0800