Eric Lee / smarc-fsl-linux-kernel

01 May, 2013

40 commits

31c3a3fe0 kexec: Use min() and min_t() to simplify logic

Simplify the logic of variable assignments.

[akpm@linux-foundation.org: replace min_t with min, remove unneeded casts]
Signed-off-by: Zhang Yanfei
Cc: "Eric W. Biederman"
Reviewed-by: Simon Horman
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhang Yanfei
2013-05-01 08:04:07 +0800
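
Below is a minimal userspace sketch of the kind of assignment that 31c3a3fe0 collapses into min(). It assumes a 4096-byte page. PAGE_SIZE, MIN(), chunk_len_branchy() and chunk_len() are hypothetical names chosen for illustration; this is not the kexec code. MIN() also lacks the type checking of the kernel's min().

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL   /* assumed page size for this sketch */

/* Hypothetical stand-in for the kernel's min(): evaluates its arguments
 * twice and does no type checking. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Before: how many bytes fit in the page containing maddr, computed
 * with an explicit if/else. */
static size_t chunk_len_branchy(unsigned long maddr, size_t mbytes)
{
	size_t room = PAGE_SIZE - (maddr & (PAGE_SIZE - 1));
	size_t chunk;

	if (mbytes > room)
		chunk = room;
	else
		chunk = mbytes;
	return chunk;
}

/* After: the same assignment expressed with a single min(). */
static size_t chunk_len(unsigned long maddr, size_t mbytes)
{
	return MIN(mbytes, PAGE_SIZE - (maddr & (PAGE_SIZE - 1)));
}
```

The kernel's min() also warns at build time when its two operands have different types. That case is where min_t() with an explicit type is needed. Per the [akpm] note above, the merged patch ended up using plain min().
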
310faaa9b kexec: fix wrong types of some local variables

The types of the following local variables:

- ubytes/mbytes in kimage_load_crash_segment()/kimage_load_normal_segment()

- r in vmcoreinfo_append_str()

are wrong, so fix them.

Signed-off-by: Zhang Yanfei
Cc: "Eric W. Biederman"
Cc: Simon Horman
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhang Yanfei
2013-05-01 08:04:07 +0800
e56fb2874 exec: do not abuse ->cred_guard_mutex in threadgroup_lock()

threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice: the scope of
this lock in do_execve() is huge.

And, as Dave pointed out, this can lead to a deadlock; we have the
following dependencies:

do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.

Note that de_thread() can't sleep with ->group_rwsem held: this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().

Reported-by: Dave Jones
Acked-by: Tejun Heo
Acked-by: Li Zefan
Signed-off-by: Oleg Nesterov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:07 +0800
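
The three orderings listed in e56fb2874 form a cycle, and that cycle is what makes the deadlock possible. The toy lock-order graph below detects the cycle with a depth-first search. It is a hypothetical model, not lockdep. It assumes the fix removes the cgroup_mutex -> cred_guard_mutex edge, because attach_task_by_pid no longer reaches ->cred_guard_mutex through threadgroup_lock().

```c
#include <stdbool.h>

/* The locks named in the changelog. */
enum lock_id { CRED_GUARD_MUTEX, I_MUTEX, CGROUP_MUTEX, NR_LOCKS };

/* order[a][b] is true when some path takes lock a and then, while still
 * holding it, takes lock b. */
struct lock_graph {
	bool order[NR_LOCKS][NR_LOCKS];
};

/* DFS colouring: 0 = unvisited, 1 = on the current path, 2 = done. */
static bool visit(const struct lock_graph *g, int n, int state[NR_LOCKS])
{
	state[n] = 1;
	for (int m = 0; m < NR_LOCKS; m++) {
		if (!g->order[n][m])
			continue;
		if (state[m] == 1)
			return true;            /* back edge: circular lock order */
		if (state[m] == 0 && visit(g, m, state))
			return true;
	}
	state[n] = 2;
	return false;
}

/* True if the recorded orderings contain a cycle, i.e. can deadlock. */
static bool has_lock_cycle(const struct lock_graph *g)
{
	int state[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (state[n] == 0 && visit(g, n, state))
			return true;
	return false;
}
```

Removing any one edge breaks the cycle. The patch removes the edge that was easiest to give up.
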
12eaaf309 set_task_comm: kill the pointless memset() + wmb()

set_task_comm() does memset() + wmb() before strlcpy(). This buys
nothing and, to add to the confusion, the comment is wrong.

- We do not need memset() to be "safe from non-terminating string
reads", the final char is always zero and we never change it.

- wmb() is paired with nothing; it cannot prevent printing a mixture
of the old and new data unless the reader takes the lock.

Signed-off-by: Oleg Nesterov
Cc: Andi Kleen
Cc: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:07 +0800
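
This small sketch shows why the memset() in 12eaaf309 buys nothing: a bounded copy with strlcpy() semantics always writes the terminating NUL, even into a buffer full of garbage. comm_strlcpy() is a hypothetical userspace re-implementation, since older glibc versions lack strlcpy(). The sketch does not model concurrent readers. That race is the part the wmb() could not fix anyway.

```c
#include <stddef.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Hypothetical userspace equivalent of strlcpy(): copies at most
 * size - 1 bytes, always NUL-terminates, and returns strlen(src). */
static size_t comm_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```
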
830e0fc96 fs, proc: truncate /proc/pid/comm writes to first TASK_COMM_LEN bytes

Currently, a write to a procfs file will return the number of bytes
successfully written. If the actual string is longer than this, the
remainder of the string will not be written and userspace will
complete the operation by issuing additional write()s.

Hence

$ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

results in

$ cat /proc/$$/comm
pqrs

because TASK_COMM_LEN == 16, so the first write() stores only 15 bytes
and the final four bytes are written by a second write(). This is
obviously an undesired result and not equivalent to prctl(PR_SET_NAME).
Userspace should not need to know the definition of TASK_COMM_LEN.

This patch truncates the string to fit in TASK_COMM_LEN bytes (15
characters plus the terminating NUL) and returns the full length of the
input as the number of bytes written, so the second write() is suppressed.

$ cat /proc/$$/comm
abcdefghijklmno

Signed-off-by: David Rientjes
Acked-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2013-05-01 08:04:07 +0800
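
The behaviour described in 830e0fc96 can be reproduced with a simplified model. A writer that retries short writes, like the shell's echo, ends up with "pqrs" when the handler reports only the bytes it stored. It ends up with "abcdefghijklmno" when the handler reports the whole input as consumed. task_model, store_comm(), comm_write_old(), comm_write_new() and write_all() are hypothetical names; this is not the procfs code.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define TASK_COMM_LEN 16

struct task_model {
	char comm[TASK_COMM_LEN];
};

/* Stores at most TASK_COMM_LEN - 1 bytes plus a NUL; returns bytes stored. */
static size_t store_comm(struct task_model *t, const char *buf, size_t count)
{
	size_t len = count < TASK_COMM_LEN - 1 ? count : TASK_COMM_LEN - 1;

	memcpy(t->comm, buf, len);
	t->comm[len] = '\0';
	return len;
}

/* Old behaviour: a short write, so userspace retries with the remainder. */
static ssize_t comm_write_old(struct task_model *t, const char *buf, size_t count)
{
	return (ssize_t)store_comm(t, buf, count);
}

/* New behaviour: truncate, but report the whole input as consumed. */
static ssize_t comm_write_new(struct task_model *t, const char *buf, size_t count)
{
	store_comm(t, buf, count);
	return (ssize_t)count;
}

typedef ssize_t (*comm_write_fn)(struct task_model *, const char *, size_t);

/* Mimics a userspace writer that loops until every byte is accepted. */
static void write_all(struct task_model *t, comm_write_fn fn,
		      const char *buf, size_t count)
{
	while (count > 0) {
		ssize_t n = fn(t, buf, count);

		buf += n;
		count -= (size_t)n;
	}
}
```
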
dc7ee2aac coredump: change wait_for_dump_helpers() to use wait_event_interruptible()

wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
wait_event-like loop. This is not needed and in fact is not strictly
correct: we can (and should) do this only once, after we change
pipe->writers. We could even check if it becomes zero.

Change this code to use wait_event_interruptible(); this can also
help to make this wait freezable.

With this patch we check pipe->readers without pipe_lock(), which is
fine. Once we see pipe->readers == 1 we know that the handler
decremented the counter, and this is all we need.

Signed-off-by: Oleg Nesterov
Acked-by: Mandeep Singh Baines
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
079148b91 coredump: factor out the setting of PF_DUMPCORE

Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE; move this into
zap_threads(), which is called by do_coredump().

Signed-off-by: Oleg Nesterov
Acked-by: Mandeep Singh Baines
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
528f827ee coredump: introduce dump_interrupted()

Following a discussion with Mandeep.

Change dump_write(), dump_seek() and do_coredump() to check
signal_pending() and abort if it is true. dump_seek() does this only
before f_op->llseek(), otherwise it relies on dump_write().

We need this change to ensure that the coredump won't delay suspend, and
that it reacts to SIGKILL "quickly enough"; a core dump can take a lot of
time. In particular, this can help the oom-killer.

We add a new trivial helper, dump_interrupted(), to hold the comments and
to simplify the potential freezer changes. Perhaps it will have more
callers.

Ideally it should do try_to_freeze(), but then we need unpleasant
changes in dump_write() and wait_for_dump_helpers(). It is not trivial to
change dump_write() to restart if f_op->write() fails because of
freezing(). We need to handle short writes, and we need to clear
TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
it to check PF_DUMPCORE). And if a buggy f_op->write() sets
TIF_SIGPENDING, we cannot distinguish this case from the race with
freeze_task() + __thaw_task().

So we simply accept the fact that the freezer can truncate a core dump,
but at least you can reliably suspend. Hopefully we can tolerate this
unlikely case, and the necessary complications are not worth the trouble.
But if we decide to make coredumping freezable later, we can do this on
top of this change.

Signed-off-by: Oleg Nesterov
Acked-by: Mandeep Singh Baines
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
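
The dump_interrupted() idea in 528f827ee can be modeled roughly as follows. The writer checks a single predicate before every chunk and stops early, accepting a truncated dump. fake_signal_pending stands in for signal_pending(current), and dump_write_all() is a hypothetical loop, not the kernel's dump_write().

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for signal_pending(current). */
static volatile sig_atomic_t fake_signal_pending;

/* One place that decides whether the dump should stop, so that later
 * freezer handling has a single hook. */
static bool dump_interrupted(void)
{
	return fake_signal_pending != 0;
}

/* Copies a core image into out[] chunk by chunk, checking for interruption
 * before each chunk. Returns bytes written; a short count means the dump
 * was truncated. */
static size_t dump_write_all(char *out, const char *src, size_t len, size_t chunk)
{
	size_t done = 0;

	while (done < len) {
		size_t n = len - done < chunk ? len - done : chunk;

		if (dump_interrupted())
			break;
		memcpy(out + done, src + done, n);
		done += n;
	}
	return done;
}
```
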
acdedd99b coredump: sanitize the setting of signal->group_exit_code

Now that the coredumping process can be SIGKILL'ed, the setting of
->group_exit_code in do_coredump() can race with complete_signal() and
SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
SIGKILL | 0x80.

But the main problem is that it is not clear to me what we should do if
binfmt->core_dump() succeeds but SIGKILL was sent; that is why this patch
comes as a separate change.

This patch adds 0x80 if ->core_dump() succeeds and the process was not
killed. But perhaps we can (should?) reset ->group_exit_code, changed by
SIGKILL, back to "siginfo->si_signo |= 0x80" when core_dumped is true.

Signed-off-by: Oleg Nesterov
Tested-by: Mandeep Singh Baines
Cc: Ingo Molnar
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Roland McGrath
Cc: Tejun Heo
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
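
The exit-code rule in acdedd99b can be written as a small function. This is a hypothetical model. killed stands in for the kernel's check for a pending fatal signal, and 0x80 is the "core dumped" bit that a parent decodes with WCOREDUMP() from <sys/wait.h>.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical model of the final ->group_exit_code; signo is the signal
 * that triggered the dump. */
static int coredump_exit_code(int signo, bool core_dumped, bool killed)
{
	/* A SIGKILL that arrived during the dump is what gets reported. */
	int code = killed ? SIGKILL : signo;

	/* Set "core dumped" only if the dump succeeded and nobody killed
	 * the dumper, so wait() never sees SIGKILL | 0x80. */
	if (core_dumped && !killed)
		code |= 0x80;
	return code;
}
```

A parent reaping such a process sees WTERMSIG(status) == SIGSEGV and a nonzero WCOREDUMP(status) in the first case exercised below, and a plain SIGKILL death in the second.
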
6cd8f0aca coredump: ensure that SIGKILL always kills the dumping thread

prepare_signal() blesses SIGKILL sent to the dumping process but this
signal can be "lost" anyway. The problem is that complete_signal() sees
SIGNAL_GROUP_EXIT and skips the "kill them all" logic. And even if the
dumping process is single-threaded (so the target is always "correct"),
the group-wide SIGKILL is not recorded in task->pending and thus
__fatal_signal_pending() won't be true. A multi-threaded case has even
more problems.

And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
right to me. This coredumping process is not exiting yet, it can do a lot
of work dumping the core.

With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT; we set
signal->group_exit_task instead. This makes signal_group_exit() true and
thus should equally close the races with exit/exec/stop, but it allows
the dumping thread to be killed reliably.

Notes:
- It is not clear what we should do with ->group_exit_code
if the dumper was killed, see the next change.

- we need more (hopefully straightforward) changes to ensure
that SIGKILL actually interrupts the coredump. Basically we
need to check __fatal_signal_pending() in dump_write() and
dump_seek().

Signed-off-by: Oleg Nesterov
Tested-by: Mandeep Singh Baines
Cc: Ingo Molnar
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Roland McGrath
Cc: Tejun Heo
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
403bad72b coredump: only SIGKILL should interrupt the coredumping task

There are two well-known and ancient problems with coredump/signals, and a
lot of related bug reports:

- do_coredump() clears TIF_SIGPENDING but of course this can't help
if, say, SIGCHLD comes after that.

In this case the coredump can fail unexpectedly. See, for example, the
signal_pending() check in wait_for_dump_helpers(); there are other
reasons as well.

- At the same time, dumping a huge core on slow media can take a
lot of time/resources, and there is no way to kill the coredumping
task reliably. In particular, this is not oom_kill-friendly.

This patch tries to fix the first problem and prepares for the
next changes.

We add a new SIGNAL_GROUP_COREDUMP flag, set by zap_threads(), to indicate
that this process is dumping core. prepare_signal() checks this flag and
nacks any signal except SIGKILL.

Note that this check tries to be conservative; in the long term we should
probably treat the SIGNAL_GROUP_EXIT case equally, but this needs more
discussion. See marc.info/?l=linux-kernel&m=120508897917439

Notes:
- recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
The patch assumes that dump_write/etc paths should never
call it, but we can change it as well.

- There is another source of TIF_SIGPENDING, freezer. This
will be addressed separately.

Signed-off-by: Oleg Nesterov
Tested-by: Mandeep Singh Baines
Cc: Ingo Molnar
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Roland McGrath
Cc: Tejun Heo
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
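
The check that 403bad72b adds to prepare_signal() reduces to the predicate below. GROUP_COREDUMP_FLAG is a hypothetical bit standing in for SIGNAL_GROUP_COREDUMP. accept_signal() is an illustrative name, not the kernel function.

```c
#include <signal.h>
#include <stdbool.h>

#define GROUP_COREDUMP_FLAG 0x1u   /* hypothetical; real flag value differs */

/* While the thread group is dumping core, every signal except SIGKILL
 * is nacked. Otherwise the usual delivery rules (not modeled) apply. */
static bool accept_signal(unsigned int group_flags, int sig)
{
	if (group_flags & GROUP_COREDUMP_FLAG)
		return sig == SIGKILL;
	return true;
}
```
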
66e5b7e19 kmod: remove call_usermodehelper_fns()

When this function returns -ENOMEM, the caller cannot determine whether
the cleanup function was called. Nobody is using it anymore, so let's
remove it.

Signed-off-by: Lucas De Marchi
Cc: Oleg Nesterov
Cc: David Howells
Cc: James Morris
Cc: Al Viro
Cc: Tejun Heo
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lucas De Marchi
2013-05-01 08:04:06 +0800
907ed1328 usermodehelper: split remaining calls to call_usermodehelper_fns()

These are the only users of call_usermodehelper_fns(). With this
function, the caller cannot determine whether the cleanup function was
called. Even though the cleanup pointer is NULL in these places, convert
them to use the separate call_usermodehelper_setup() +
call_usermodehelper_exec() functions so we can remove the _fns variant.

Signed-off-by: Lucas De Marchi
Cc: Oleg Nesterov
Cc: David Howells
Cc: James Morris
Cc: Al Viro
Cc: Tejun Heo
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lucas De Marchi
2013-05-01 08:04:06 +0800
fb96c475f coredump: remove trailling whitespace

Signed-off-by: Lucas De Marchi
Cc: Oleg Nesterov
Cc: David Howells
Cc: James Morris
Cc: Al Viro
Cc: Tejun Heo
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lucas De Marchi
2013-05-01 08:04:06 +0800
93997f6dd KEYS: split call to call_usermodehelper_fns()

Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns(). If the latter fails with an OOM, the
cleanup function may not be called, in which case we would miss a call
to key_put().

Signed-off-by: Lucas De Marchi
Cc: Oleg Nesterov
Acked-by: David Howells
Acked-by: James Morris
Cc: Al Viro
Cc: Tejun Heo
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lucas De Marchi
2013-05-01 08:04:06 +0800
f634460c9 kmod: split call to call_usermodehelper_fns()

Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns(). If the latter returns -ENOMEM, the
cleanup function may not have been called, in which case we would not
free argv and module_name.

Signed-off-by: Lucas De Marchi
Cc: Oleg Nesterov
Cc: David Howells
Cc: James Morris
Cc: Al Viro
Cc: Tejun Heo
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lucas De Marchi
2013-05-01 08:04:06 +0800
938e4b22e usermodehelper: export call_usermodehelper_exec() and call_usermodehelper_setup()

call_usermodehelper_setup() + call_usermodehelper_exec() need to be
called instead of call_usermodehelper_fns() when the cleanup function
needs to be called even when an ENOMEM error occurs. With
call_usermodehelper_fns(), the caller can't tell whether the cleanup
function was called or not.

[akpm@linux-foundation.org: export call_usermodehelper_setup() to modules]
Signed-off-by: Lucas De Marchi
Reviewed-by: Oleg Nesterov
Cc: David Howells
Cc: James Morris
Cc: Al Viro
Cc: Tejun Heo
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lucas De Marchi
2013-05-01 08:04:05 +0800
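
The ownership rule behind the call_usermodehelper_setup() + call_usermodehelper_exec() split (938e4b22e and the follow-up commits listed above) can be modeled in userspace. If setup fails, cleanup has not run and the caller frees its own data. Once setup succeeds, exec always runs cleanup exactly once. helper_info, helper_setup(), helper_exec(), run_helper() and the fail_setup knob are hypothetical. The real kernel functions take more arguments.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct subprocess_info. */
struct helper_info {
	void (*cleanup)(void *data);
	void *data;
};

static int fail_setup;     /* test knob: simulate an allocation failure */
static int cleanups_run;   /* how many times the cleanup callback ran */

/* Modeled on call_usermodehelper_setup(): on allocation failure it returns
 * NULL without calling cleanup, so the caller still owns data. */
static struct helper_info *helper_setup(void (*cleanup)(void *), void *data)
{
	struct helper_info *info = fail_setup ? NULL : malloc(sizeof(*info));

	if (!info)
		return NULL;
	info->cleanup = cleanup;
	info->data = data;
	return info;
}

/* Modeled on call_usermodehelper_exec(): always consumes info and runs
 * cleanup exactly once. */
static int helper_exec(struct helper_info *info)
{
	if (info->cleanup)
		info->cleanup(info->data);
	free(info);
	return 0;   /* pretend the helper ran successfully */
}

static void count_cleanup(void *data)
{
	(void)data;
	cleanups_run++;
}

/* Caller pattern: release our own resources only when setup failed, the
 * one case in which cleanup is guaranteed not to have run. */
static int run_helper(void *data, int *freed_by_caller)
{
	struct helper_info *info = helper_setup(count_cleanup, data);

	*freed_by_caller = 0;
	if (!info) {
		*freed_by_caller = 1;   /* e.g. key_put() or freeing argv here */
		return -ENOMEM;
	}
	return helper_exec(info);
}
```
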
17afab1de selftest: add a test case for PTRACE_PEEKSIGINFO

* Dump signals from process-wide and per-thread queues with
different buffer sizes.
* Check error paths for buffers with restricted permissions, where part
of the buffer or the whole buffer is read-only.
* Try to get a nonexistent signal.

Signed-off-by: Andrew Vagin
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Cc: David Howells
Cc: Dave Jones
Cc: "Michael Kerrisk (man-pages)"
Cc: Pavel Emelyanov
Cc: Linus Torvalds
Cc: Pedro Alves
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Vagin
2013-05-01 08:04:05 +0800
84c751bd4 ptrace: add ability to retrieve signals without removing from a queue (v4)

This patch adds a new ptrace request PTRACE_PEEKSIGINFO.

This request is used to retrieve information about pending signals
starting with the specified sequence number. The siginfo_t structures are
copied from the child into the buffer starting at "data".

The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.

	struct ptrace_peeksiginfo_args {
		u64 off;	/* from which siginfo to start */
		u32 flags;
		s32 nr;		/* how many siginfos to take */
	};

"nr" has type "s32" because ptrace() returns "long", which has 32 bits on
i386, and negative values are used for errors.

Currently there is only one flag, PTRACE_PEEKSIGINFO_SHARED, for dumping
signals from the process-wide queue. If this flag is not set, signals are
read from a per-thread queue.

The request PTRACE_PEEKSIGINFO returns the number of dumped signals. If a
signal with the specified sequence number doesn't exist, ptrace returns
zero. An error is returned only if it occurs before any signal has been
dumped.

Errors:
EINVAL - one or more specified flags are not supported or nr is negative
EFAULT - buf or addr is outside your accessible address space.

The resulting siginfo contains the kernel part of si_code, which is usually
stripped, but it is required for queuing the same siginfo back when
restoring pending signals.

This functionality is required for checkpointing pending signals. Pedro
Alves suggested using it in "gdb" to peek at pending signals. gdb already
uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
dequeued. This functionality allows gdb to look at pending signals
which have not been reported yet.

The prototype of this code was developed by Oleg Nesterov.

Signed-off-by: Andrew Vagin
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Cc: David Howells
Cc: Dave Jones
Cc: "Michael Kerrisk (man-pages)"
Cc: Pavel Emelyanov
Cc: Linus Torvalds
Cc: Pedro Alves
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Vagin
2013-05-01 08:04:05 +0800
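
Below is a userspace sketch of PTRACE_PEEKSIGINFO as described in 84c751bd4. The tracee blocks SIGUSR1, sends it to itself so it lands on the process-wide queue, and stops. The tracer then peeks at the first shared pending signal without dequeuing it. The request number 0x4209 and the flag value 1 come from the Linux UAPI <linux/ptrace.h>. They are spelled out under hypothetical local names to avoid mixing that header with glibc's <sys/ptrace.h>. peek_first_shared_signal() is illustrative only. The call fails wherever ptrace is disabled, for example by a strict Yama setting or a sandbox.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PEEKSIGINFO_REQ    0x4209      /* PTRACE_PEEKSIGINFO */
#define PEEKSIGINFO_SHARED (1u << 0)   /* PTRACE_PEEKSIGINFO_SHARED */

/* Same layout as struct ptrace_peeksiginfo_args in the changelog. */
struct peeksiginfo_args {
	uint64_t off;    /* from which siginfo to start */
	uint32_t flags;
	int32_t nr;      /* how many siginfos to take */
};

/* Returns si_signo of the first shared pending signal of a stopped
 * tracee, or -1 if tracing is not permitted or the request fails. */
static int peek_first_shared_signal(void)
{
	int status, ret = -1;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		sigset_t set;

		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
			_exit(1);
		sigemptyset(&set);
		sigaddset(&set, SIGUSR1);
		sigprocmask(SIG_BLOCK, &set, NULL);
		kill(getpid(), SIGUSR1);   /* process-directed: shared queue */
		raise(SIGSTOP);            /* hand control to the tracer */
		_exit(0);
	}

	if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
		return -1;                 /* child exited: no tracing allowed */

	struct peeksiginfo_args args = {
		.off = 0, .flags = PEEKSIGINFO_SHARED, .nr = 1,
	};
	siginfo_t si;

	if (ptrace(PEEKSIGINFO_REQ, pid, &args, &si) == 1)
		ret = si.si_signo;

	kill(pid, SIGKILL);                /* tear down the stopped tracee */
	waitpid(pid, &status, 0);
	return ret;
}
```

SIGUSR1 stays queued after the peek, which is what distinguishes this request from PTRACE_GETSIGINFO on an already dequeued signal.
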
865f38a3a hfsplus: remove duplicated message prefix in hfsplus_block_free()

Signed-off-by: Vyacheslav Dubeyko
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Hin-Tak Leung
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vyacheslav Dubeyko
2013-05-01 08:04:05 +0800
d7a475d0c hfsplus: add error propagation to __hfsplus_ext_write_extent()

__hfsplus_ext_write_extent() suppresses errors coming from
hfs_brec_find(). The patch implements error code propagation.

Signed-off-by: Alexey Khoroshilov
Reviewed-by: Vyacheslav Dubeyko
Cc: Hin-Tak Leung
Cc: Al Viro
Cc: Artem Bityutskiy
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Khoroshilov
2013-05-01 08:04:05 +0800
d61426732 hfs/hfsplus: convert printks to pr_<level>

Use a more current logging style.

Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
hfsplus now uses "hfsplus: " for all messages.
Coalesce formats.
Prefix debugging messages too.

Signed-off-by: Joe Perches
Cc: Vyacheslav Dubeyko
Cc: Hin-Tak Leung
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2013-05-01 08:04:05 +0800
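
The pr_fmt convention from d61426732 relies on string-literal concatenation, so every pr_<level>() format string picks up the module prefix at compile time. In this sketch KBUILD_MODNAME is defined by hand, because the kernel build system normally supplies it. pr_err(), pr_fmt_to_buf() and has_module_prefix() are hypothetical userspace stand-ins. ##__VA_ARGS__ is a GNU C extension, as in the kernel.

```c
#include <stdio.h>
#include <string.h>

#define KBUILD_MODNAME "hfsplus"   /* normally set by the kernel build */

/* Adjacent string literals are pasted together, so "fmt" gains the
 * "hfsplus: " prefix without any runtime cost. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical userspace stand-ins for the printk helpers. */
#define pr_err(fmt, ...) fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)
#define pr_fmt_to_buf(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)

/* True when a formatted line starts with the module prefix. */
static int has_module_prefix(const char *line)
{
	return strncmp(line, KBUILD_MODNAME ": ",
		       sizeof(KBUILD_MODNAME ": ") - 1) == 0;
}
```
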
c2b3e1f76 hfs/hfsplus: convert dprint to hfs_dbg

Use a more current logging style.

Rename macro and uses.
Add do {} while (0) to macro.
Add DBG_ to macro.
Add and use hfs_dbg_cont variant where appropriate.

Signed-off-by: Joe Perches
Cc: Vyacheslav Dubeyko
Cc: Hin-Tak Leung
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2013-05-01 08:04:05 +0800
5f3726f94 hfsplus: fix warnings in fs/hfsplus/bfind.c

fs/hfsplus/bfind.c: In function 'hfs_find_1st_rec_by_cnid':
(1) include/uapi/linux/swab.h:60:2: warning: 'search_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]
(2) include/uapi/linux/swab.h:60:2: warning: 'cur_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]

[akpm@linux-foundation.org: make the workaround more explicit]
Signed-off-by: Vyacheslav Dubeyko
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vyacheslav Dubeyko
2013-05-01 08:04:05 +0800
9509f1785 hfs: add error checking for hfs_find_init()

hfs_find_init() may fail with ENOMEM, but there are places where the
returned value is not checked. The consequences can be very unpleasant,
e.g. kfree() of an uninitialized pointer and inappropriate mutex unlocking.

The patch adds checks for errors in hfs_find_init().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov
Reviewed-by: Vyacheslav Dubeyko
Cc: Hin-Tak Leung
Cc: Al Viro
Cc: Artem Bityutskiy
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Khoroshilov
2013-05-01 08:04:05 +0800
eb53b6db7 nilfs2: remove unneeded test in nilfs_writepage()

page->mapping->host cannot be NULL in nilfs_writepage(), so remove the
unneeded test.

This fixes the smatch warning: "fs/nilfs2/inode.c:211 nilfs_writepage()
error: we previously assumed 'inode' could be null (see line 195)".

Reported-by: Dan Carpenter
Signed-off-by: Vyacheslav Dubeyko
Cc: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vyacheslav Dubeyko
2013-05-01 08:04:05 +0800
dc33f5f3c nilfs2: fix using of PageLocked() in nilfs_clear_dirty_page()

Change test_bit(PG_locked, &page->flags) to PageLocked().

Signed-off-by: Vyacheslav Dubeyko
Cc: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vyacheslav Dubeyko
2013-05-01 08:04:04 +0800
8c26c4e26 nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption

The NILFS2 driver remounts itself in RO mode in the case of discovering
metadata corruption (for example, discovering a broken bmap). But
usually, this takes place when there have been file system operations
before remounting in RO mode.

Thereby, the NILFS2 driver can be in RO mode while dirty pages are present
in modified inodes' address spaces. This results in the flush kernel
thread endlessly trying to flush dirty pages in RO mode. As a result, it
is possible to see such side effects as: (1) the flush kernel thread
occupies 50% - 99% of CPU time; (2) the system can't be shut down without
a manual power-off.

SYMPTOMS:
(1) System log contains error message: "Remounting filesystem read-only".
(2) The flush kernel thread occupies 50% - 99% of CPU time.
(3) The system can't be shut down without a manual power-off.

REPRODUCTION PATH:
(1) Create a volume group named "unencrypted" with the vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

----------------[BEGIN SCRIPT]--------------------
#!/bin/bash

VG=unencrypted
#apt-get install nilfs-tools darcs
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------

(3) Try to shut down the system.

REPRODUCIBILITY: 100%

FIX:

This patch implements checking the mount state of the NILFS2 driver in the
nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page()
methods. If an RO mount state is detected, all dirty pages are simply
discarded and a warning message is written to the system log.

[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Anthony Doggett <Anthony2486@interfaces.org.uk>
Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
Cc: Elmer Zhang <freeboy6716@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Vyacheslav Dubeyko
2013-05-01 08:04:04 +0800
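
A simplified model shows why the flusher in 8c26c4e26 spun forever. If write-back on a read-only mount leaves the page dirty, a flusher that retries dirty pages never finishes. Discarding the dirty state ends the loop after one pass. page_model, writepage_old(), writepage_new() and flush_passes() are hypothetical. Modeling the old path as returning -EROFS is an assumption. The cap on passes exists only so the model terminates.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct page_model {
	bool dirty;
};

typedef int (*writepage_fn)(struct page_model *page, bool sb_rdonly);

/* Pre-fix behaviour (assumed, simplified): write-back fails on a
 * read-only mount and the page stays dirty. */
static int writepage_old(struct page_model *page, bool sb_rdonly)
{
	if (sb_rdonly)
		return -EROFS;
	page->dirty = false;   /* pretend write-back succeeded */
	return 0;
}

/* Post-fix behaviour (simplified): on a read-only mount the dirty state
 * is dropped with a warning, leaving nothing to flush. */
static int writepage_new(struct page_model *page, bool sb_rdonly)
{
	if (sb_rdonly)
		fprintf(stderr, "warning: discarding dirty page on RO mount\n");
	page->dirty = false;
	return 0;
}

/* Retries while the page is dirty, up to max_passes; returns passes made. */
static int flush_passes(struct page_model *page, bool sb_rdonly,
			writepage_fn writepage, int max_passes)
{
	int passes = 0;

	while (page->dirty && passes < max_passes) {
		writepage(page, sb_rdonly);
		passes++;
	}
	return passes;
}
```
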
9151b3982 i2o: check copy_from_user() size parameter

Limit the size of the copy so we don't corrupt memory. Hopefully this
can only be called by root, but fixing this makes the static checkers
happier.

Signed-off-by: Dan Carpenter
Cc: Jiri Kosina
Cc: Masanari Iida
Cc: Alan Cox
Cc: Guenter Roeck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2013-05-01 08:04:04 +0800
79bae42d5 dmi_scan: refactor dmi_scan_machine(), {smbios,dmi}_present()

Move the calls to memcpy_fromio() up into the loop in
dmi_scan_machine(), and move the signature checks back down into
dmi_decode(). We need to check at 16-byte intervals but keep a 32-byte
buffer for an SMBIOS entry, so shift the buffer after each iteration.

Merge smbios_present() into dmi_present(), so we look for an SMBIOS
signature at the beginning of the given buffer and then for a DMI
signature at an offset of 16 bytes.

[artem.savkov@gmail.com: use proper buf type in dmi_present()]
Signed-off-by: Ben Hutchings
Reported-by: Tim McGrath
Tested-by: Tim Mcgrath
Cc: Zhenzhong Duan
Signed-off-by: Artem Savkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Hutchings
2013-05-01 08:04:04 +0800
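
The scanning scheme in 79bae42d5 uses 16-byte steps, a 32-byte window, and a shift after each step. It can be sketched as below. The sketch only matches the "_SM_" and "_DMI_" anchors. It skips the checksum and length validation a real parser needs. classify_window(), scan_region() and enum entry_kind are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

enum entry_kind { ENTRY_NONE, ENTRY_DMI, ENTRY_SMBIOS };

/* An SMBIOS entry point has "_SM_" at offset 0 and its "_DMI_" anchor
 * 16 bytes later; a legacy DMI entry point has "_DMI_" alone. */
static enum entry_kind classify_window(const unsigned char buf[32])
{
	if (memcmp(buf + 16, "_DMI_", 5) != 0)
		return ENTRY_NONE;
	return memcmp(buf, "_SM_", 4) == 0 ? ENTRY_SMBIOS : ENTRY_DMI;
}

/* Scans region in 16-byte steps; on success *offset is the start of the
 * entry point. */
static enum entry_kind scan_region(const unsigned char *region, size_t len,
				   size_t *offset)
{
	unsigned char buf[32];

	memset(buf, 0, 16);   /* nothing precedes the first paragraph */
	for (size_t q = 0; q + 16 <= len; q += 16) {
		enum entry_kind kind;

		memcpy(buf + 16, region + q, 16);   /* newest 16 bytes */
		kind = classify_window(buf);
		if (kind != ENTRY_NONE) {
			*offset = kind == ENTRY_SMBIOS ? q - 16 : q;
			return kind;
		}
		memcpy(buf, buf + 16, 16);   /* shift the window by 16 bytes */
	}
	return ENTRY_NONE;
}
```

Keeping the previous 16 bytes in the lower half of the window lets an SMBIOS entry be recognised once its "_DMI_" anchor arrives, without re-reading memory.
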
c1d025e22 binfmt_elf: PIE: make PF_RANDOMIZE check comment more accurate

The comment I originally added in commit a3defbe5c337 ("binfmt_elf: fix
PIE execution with randomization disabled") is not really 100% accurate
-- sysctl is not the only way PF_RANDOMIZE could be forcibly unset
at runtime.

Another option, of course, is direct modification of personality flags
(i.e. running through the setarch wrapper).

Make the comment more explicit and accurate.

Signed-off-by: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Kosina
2013-05-01 08:04:04 +0800
2535e0d72 fs: make binfmt support for #! scripts modular and removable

Add a new configuration option CONFIG_BINFMT_SCRIPT to configure support
for interpreted scripts starting with "#!"; allow compiling out that
support, or building it as a module. Embedded systems running exclusively
compiled binaries could leave this support out, and systems that don't
need scripts before mounting the root filesystem can build this as a
module.

Signed-off-by: Josh Triplett
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josh Triplett
2013-05-01 08:04:04 +0800
d6d67e723 epoll: cleanup: use RCU_INIT_POINTER when nulling

It is always safe to use RCU_INIT_POINTER to NULL a pointer. This results
in slightly smaller/faster code.

Signed-off-by: Eric Wong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Wong
2013-05-01 08:04:04 +0800
450d89ec0 epoll: cleanup: hoist out f_op->poll calls

This reduces the amount of code inside the ready list iteration loops for
better readability IMHO.

Signed-off-by: Eric Wong
Cc: Davide Libenzi
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Wong
2013-05-01 08:04:04 +0800
ddf676c38 epoll: lock ep->mtx in ep_free to silence lockdep

Technically we do not need to hold ep->mtx during ep_free since we are
certain there are no other users of ep at that point. However, lockdep
complains with a "suspicious rcu_dereference_check() usage!" message; so
lock the mutex before ep_remove to silence the warning.

Signed-off-by: Eric Wong
Cc: Al Viro
Cc: Arve Hjønnevåg
Cc: Davide Libenzi
Cc: Eric Dumazet
Cc: NeilBrown
Cc: Rafael J. Wysocki
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Wong
2013-05-01 08:04:04 +0800
eea1d5859 epoll: use RCU to protect wakeup_source in epitem

This prevents wakeup_source destruction when a user hits the item with
EPOLL_CTL_MOD while ep_poll_callback is running.

Tested with CONFIG_SPARSE_RCU_POINTER=y and "make fs/eventpoll.o C=2"

Signed-off-by: Eric Wong
Cc: Alexander Viro
Cc: Arve Hjønnevåg
Cc: Davide Libenzi
Cc: Eric Dumazet
Cc: NeilBrown
Cc: "Rafael J. Wysocki"
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Wong
2013-05-01 08:04:04 +0800
39732ca5a epoll: trim epitem by one cache line

It is common for epoll users to have thousands of epitems, so saving a
cache line on every allocation leads to large memory savings.

Since epitem allocations are cache-aligned, reducing sizeof(struct
epitem) from 136 bytes to 128 bytes will allow it to squeeze under a
cache line boundary on x86_64.

Via /sys/kernel/slab/eventpoll_epi, I see the following changes on my
x86_64 Core2 Duo (which has 64-byte cache alignment):

object_size : 192 => 128
objs_per_slab: 21 => 32

Also, add a BUILD_BUG_ON() to check for future accidental breakage.

[akpm@linux-foundation.org: use __packed, for all architectures]
Signed-off-by: Eric Wong
Cc: Davide Libenzi
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Wong
2013-05-01 08:04:04 +0800
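
The numbers in 39732ca5a follow from two steps: round each object up to the 64-byte alignment, then divide a slab into objects. The sketch assumes a one-page (4096-byte) slab and ignores allocator metadata. aligned_object_size(), objs_per_slab() and struct demo_item are hypothetical. _Static_assert plays the role that BUILD_BUG_ON() plays in the kernel.

```c
#include <stddef.h>

/* Rounds size up to a multiple of align (a power of two), as a
 * cache-aligned slab cache pads each object. */
static size_t aligned_object_size(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Objects that fit in one slab, ignoring per-slab metadata. */
static size_t objs_per_slab(size_t slab_bytes, size_t object_size)
{
	return slab_bytes / object_size;
}

/* Build-time guard against accidental growth past 128 bytes. */
struct demo_item {
	char payload[128];
};
_Static_assert(sizeof(struct demo_item) <= 128, "demo_item outgrew 128 bytes");
```

With these helpers, 136 bytes pads to 192 and yields 21 objects per 4096-byte slab. 128 bytes needs no padding and yields 32, matching the object_size and objs_per_slab figures quoted above.
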
4a22f1663 kernel/timer.c: move some non timer related syscalls to kernel/sys.c

Andrew Morton noted:

akpm3:/usr/src/25> grep SYSCALL kernel/timer.c
SYSCALL_DEFINE1(alarm, unsigned int, seconds)
SYSCALL_DEFINE0(getpid)
SYSCALL_DEFINE0(getppid)
SYSCALL_DEFINE0(getuid)
SYSCALL_DEFINE0(geteuid)
SYSCALL_DEFINE0(getgid)
SYSCALL_DEFINE0(getegid)
SYSCALL_DEFINE0(gettid)
SYSCALL_DEFINE1(sysinfo, struct sysinfo __user *, info)
COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info)

Only one of those should be in kernel/timer.c. Who wrote this thing?

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stephen Rothwell
Acked-by: Thomas Gleixner
Cc: Guenter Roeck
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Rothwell
2013-05-01 08:04:03 +0800
1043f65a5 kernel/timer.c: convert compat_sys_sysinfo to COMPAT_SYSCALL_DEFINE

Signed-off-by: Stephen Rothwell
Cc: Thomas Gleixner
Cc: Guenter Roeck
Cc: Al Viro
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Rothwell
2013-05-01 08:04:03 +0800
1a0df5944 kernel/compat.c: make do_sysinfo() static

The only use outside of kernel/timer.c was in kernel/compat.c, so move
compat_sys_sysinfo() next to sys_sysinfo() in kernel/timer.c.

Signed-off-by: Stephen Rothwell
Cc: Thomas Gleixner
Cc: Guenter Roeck
Cc: Al Viro
Acked-by: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Rothwell
2013-05-01 08:04:03 +0800