Eric Lee / smarc-fsl-linux-kernel

25 Mar, 2012

1 commit

11bcb3284 Merge tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux ... Browse Code »

Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
"Fix up files in fs/ and lib/ dirs to only use module.h if they really
need it.

These are trivial in scope vs the work done previously. We now have
things where any few remaining cleanups can be farmed out to arch or
subsystem maintainers, and I have done so when possible. What is
remaining here represents the bits that don't clearly lie within a
single arch/subsystem boundary, like the fs dir and the lib dir.

Some duplicate includes arising from overlapping fixes from
independent subsystem maintainer submissions are also quashed."

Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).

* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
lib: reduce the use of module.h wherever possible
fs: reduce the use of module.h wherever possible
includecheck: delete any duplicate instances of module.h

Linus Torvalds
2012-03-25 01:24:31 +0800

24 Mar, 2012

20 commits

f1d38e423 Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl ... Browse Code »

Pull sysctl updates from Eric Biederman:

- Rewrite of sysctl for speed and clarity.

Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and
are no longer bottlenecks in the process of adding and removing
network devices.

sysctl is now focused on being a filesystem instead of system call
and the code can all be found in fs/proc/proc_sysctl.c. Hopefully
this means the code is now approachable.

Much thanks is owed to Lucian Grinjincu for keeping at this until
something was found that was usable.

- The recent proc_sys_poll oops found by the fuzzer during hibernation
is fixed.

* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
sysctl: protect poll() in entries that may go away
sysctl: Don't call sysctl_follow_link unless we are a link.
sysctl: Comments to make the code clearer.
sysctl: Correct error return from get_subdir
sysctl: An easier to read version of find_subdir
sysctl: fix memset parameters in setup_sysctl_set()
sysctl: remove an unused variable
sysctl: Add register_sysctl for normal sysctl users
sysctl: Index sysctl directories with rbtrees.
sysctl: Make the header lists per directory.
sysctl: Move sysctl_check_dups into insert_header
sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
sysctl: Replace root_list with links between sysctl_table_sets.
sysctl: Add sysctl_print_dir and use it in get_subdir
sysctl: Stop requiring explicit management of sysctl directories
sysctl: Add a root pointer to ctl_table_set
sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
sysctl: Normalize the root_table data structure.
sysctl: Factor out insert_header and erase_header
...

Linus Torvalds
2012-03-24 09:08:58 +0800
1cc684ab7 kmod: make __request_module() killable ... Browse Code »

As Tetsuo Handa pointed out, request_module() can stress the system
while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.

The task T uses "almost all" memory, then it does something which
triggers request_module(). Say, it can simply call sys_socket(). This
in turn needs more memory and leads to OOM. oom-killer correctly
chooses T and kills it, but this can't help because it sleeps in
TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
TIF_MEMDIE task T.

Make __request_module() killable. The only necessary change is that
call_modprobe() should kmalloc argv and module_name, they can't live in
the stack if we use UMH_KILLABLE. This memory is freed via
call_usermodehelper_freeinfo()->cleanup.

Reported-by: Tetsuo Handa
Signed-off-by: Oleg Nesterov
Cc: Rusty Russell
Cc: Tejun Heo
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:42 +0800
3e63a93b9 kmod: introduce call_modprobe() helper ... Browse Code »

No functional changes. Move the call_usermodehelper code from
__request_module() into the new simple helper, call_modprobe().

Signed-off-by: Oleg Nesterov
Cc: Tetsuo Handa
Cc: Rusty Russell
Cc: Tejun Heo
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:42 +0800
5b9bd473e usermodehelper: ____call_usermodehelper() doesn't need do_exit() ... Browse Code »

Minor cleanup. ____call_usermodehelper() can simply return, no need to
call do_exit() explicitely.

Signed-off-by: Oleg Nesterov
Cc: Tetsuo Handa
Cc: Rusty Russell
Cc: Tejun Heo
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:41 +0800
9d944ef32 usermodehelper: kill umh_wait, renumber UMH_* constants ... Browse Code »
44

No functional changes. It is not sane to use UMH_KILLABLE with enum
umh_wait, but obviously we do not want another argument in
call_usermodehelper_* helpers. Kill this enum, use the plain int.

Signed-off-by: Oleg Nesterov
Cc: Tetsuo Handa
Cc: Rusty Russell
Cc: Tejun Heo
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:41 +0800
d0bd587a8 usermodehelper: implement UMH_KILLABLE ... Browse Code »

Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC.
The caller must ensure that subprocess_info->path/etc can not go away
until call_usermodehelper_freeinfo().

call_usermodehelper_exec(UMH_KILLABLE) does
wait_for_completion_killable. If it fails, it uses
xchg(&sub_info->complete, NULL) to serialize with umh_complete() which
does the same xhcg() to access sub_info->complete.

If call_usermodehelper_exec wins, it can safely return. umh_complete()
should get NULL and call call_usermodehelper_freeinfo().

Otherwise we know that umh_complete() was already called, in this case
call_usermodehelper_exec() falls back to wait_for_completion() which
should succeed "very soon".

Note: UMH_NO_WAIT == -1 but it obviously should not be used with
UMH_KILLABLE. We delay the neccessary cleanup to simplify the back
porting.

Signed-off-by: Oleg Nesterov
Cc: Tetsuo Handa
Cc: Rusty Russell
Cc: Tejun Heo
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:41 +0800
b34499225 usermodehelper: introduce umh_complete(sub_info) ... Browse Code »

Preparation. Add the new trivial helper, umh_complete(). Currently it
simply does complete(sub_info->complete).

Signed-off-by: Oleg Nesterov
Cc: Tetsuo Handa
Cc: Rusty Russell
Cc: Tejun Heo
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:41 +0800
a02d6fd64 signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/ ... Browse Code »

Change zap_pid_ns_processes() to use SEND_SIG_FORCED, it looks more
clear compared to SEND_SIG_NOINFO which relies on from_ancestor_ns logic
send_signal().

It is also more efficient if we need to kill a lot of tasks because it
doesn't alloc sigqueue.

While at it, add the __fatal_signal_pending(task) check as a minor
optimization.

Signed-off-by: Oleg Nesterov
Cc: Tejun Heo
Cc: Anton Vorontsov
Cc: "Eric W. Biederman"
Cc: KOSAKI Motohiro
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:41 +0800
def8cf725 signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths ... Browse Code »

Cosmetic, rename the from_ancestor_ns argument in prepare_signal()
paths. After the previous change it doesn't match the reality.

Signed-off-by: Oleg Nesterov
Cc: Tejun Heo
Cc: Anton Vorontsov
Cc: "Eric W. Biederman"
Cc: KOSAKI Motohiro
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:41 +0800
629d362b9 signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE ... Browse Code »

force_sig_info() and friends have the special semantics for synchronous
signals, this interface should not be used if the target is not current.
And it needs the fixes, in particular the clearing of SIGNAL_UNKILLABLE
is not exactly right.

However there are callers which have to use force_ exactly because it
clears SIGNAL_UNKILLABLE and thus it can kill the CLONE_NEWPID tasks,
although this is almost always is wrong by various reasons.

With this patch SEND_SIG_FORCED ignores SIGNAL_UNKILLABLE, like we do if
the signal comes from the ancestor namespace.

This makes the naming in prepare_signal() paths insane, fixed by the
next cleanup.

Note: this only affects SIGKILL/SIGSTOP, but this is enough for
force_sig() abusers.

Signed-off-by: Oleg Nesterov
Cc: Tejun Heo
Cc: Anton Vorontsov
Cc: "Eric W. Biederman"
Cc: KOSAKI Motohiro
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-03-24 07:58:41 +0800
ee00560c7 ptrace: remove PTRACE_SEIZE_DEVEL bit ... Browse Code »

PTRACE_SEIZE code is tested and ready for production use, remove the
code which requires special bit in data argument to make PTRACE_SEIZE
work.

Strace team prepares for a new release of strace, and we would like to
ship the code which uses PTRACE_SEIZE, preferably after this change goes
into released kernel.

Signed-off-by: Denys Vlasenko
Acked-by: Tejun Heo
Acked-by: Oleg Nesterov
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denys Vlasenko
2012-03-24 07:58:41 +0800
aa9147c98 ptrace: make PTRACE_SEIZE set ptrace options specified in 'data' parameter ... Browse Code »

This can be used to close a few corner cases in strace where we get
unwanted racy behavior after attach, but before we have a chance to set
options (the notorious post-execve SIGTRAP comes to mind), and removes
the need to track "did we set opts for this task" state in strace
internals.

While we are at it:

Make it possible to extend SEIZE in the future with more functionality
by passing non-zero 'addr' parameter. To that end, error out if 'addr'
is non-zero. PTRACE_ATTACH did not (and still does not) have such
check, and users (strace) do pass garbage there... let's avoid
repeating this mistake with SEIZE.

Set all task->ptrace bits in one operation - before this change, we were
adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops. This
was probably ok (not a bug), but let's be on a safer side.

Changes since v2: use (unsigned long) casts instead of (long) ones, move
PTRACE_SEIZE_DEVEL-related code to separate lines of code.

Signed-off-by: Denys Vlasenko
Acked-by: Tejun Heo
Cc: Pedro Alves
Reviewed-by: Oleg Nesterov
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denys Vlasenko
2012-03-24 07:58:40 +0800
86b6c1f30 ptrace: simplify PTRACE_foo constants and PTRACE_SETOPTIONS code ... Browse Code »

Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
PT_option bits contiguous and therefore makes code in
ptrace_setoptions() much simpler.

Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
instead of using explicit numeric constants, to ensure we don't mess up
relationship between bit positions and event ids.

PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).

Signed-off-by: Denys Vlasenko
Acked-by: Tejun Heo
Reviewed-by: Oleg Nesterov
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denys Vlasenko
2012-03-24 07:58:40 +0800
8c5cf9e5c ptrace: don't modify flags on PTRACE_SETOPTIONS failure ... Browse Code »

On ptrace(PTRACE_SETOPTIONS, pid, 0, ), we used to set those
option bits which are known, and then fail with -EINVAL if there are
some unknown bits in .

This is inconsistent with typical error handling, which does not change
any state if input is invalid.

This patch changes PTRACE_SETOPTIONS behavior so that in this case, we
return -EINVAL and don't change any bits in task->ptrace.

It's very unlikely that there is userspace code in the wild which will
be affected by this change: it should have the form

ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_BOGUSOPT)

where PTRACE_O_BOGUSOPT is a constant unknown to the kernel. But kernel
headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only
way userspace can use one if it defines one itself. I can't see why
anyone would do such a thing deliberately.

Signed-off-by: Denys Vlasenko
Acked-by: Tejun Heo
Reviewed-by: Oleg Nesterov
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denys Vlasenko
2012-03-24 07:58:40 +0800
b60f796c4 kernel/watchdog.c: add comment to watchdog() exit path ... Browse Code »

Revelation from Peter.

Cc: Peter Zijlstra
Cc: Don Zickus
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2012-03-24 07:58:32 +0800
4501980aa kernel/watchdog.c: convert to pr_foo() ... Browse Code »

It fixes some 80-col wordwrappings and adds some consistency.

Cc: Ingo Molnar
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2012-03-24 07:58:32 +0800
7a05c0f7b watchdog: make sure the watchdog thread gets CPU on loaded system ... Browse Code »

If the system is loaded while hotplugging a CPU we might end up with a
bogus hardlockup detection. This has been seen during LTP pounder test
executed in parallel with hotplug test.

The main problem is that enable_watchdog (called when CPU is brought up)
registers perf event which periodically checks per-cpu counter
(hrtimer_interrupts), updated from a hrtimer callback, but the hrtimer
is fired from the kernel thread.

This means that while we already do check for the hard lockup the kernel
thread might be sitting on the runqueue with zillions of tasks so there
is nobody to update the value we rely on and so we KABOOM.

Let's fix this by boosting the watchdog thread priority before we wake
it up rather than when it's already running. This still doesn't handle
a case where we have the same amount of high prio FIFO tasks but that
doesn't seem to be common. The current implementation doesn't handle
that case anyway so this is not worse at least.

Unfortunately, we cannot start perf counter from the watchdog thread
because we could miss a real lock up and also we cannot start the
hrtimer watchdog_enable because we there is no way (at least I don't
know any) to start a hrtimer from a different CPU.

[dzickus@redhat.com: fix compile issue with param]
Cc: Ingo Molnar
Cc: Peter Zijlstra
Reviewed-by: Mandeep Singh Baines
Signed-off-by: Michal Hocko
Signed-off-by: Don Zickus
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2012-03-24 07:58:32 +0800
397a21f24 kernel/exit.c: if init dies, log a signal which killed it, if any ... Browse Code »

I just received another user's pleas for help when their init
mysteriously died. I again explained that they need to check whether it
died because of bad instruction, a segv, or something else. Which was
an annoying detour into writing a trivial C program to spawn his init
and print its exit code:

http://lists.busybox.net/pipermail/busybox/2012-January/077172.html

I hear you saying "just test it under /bin/sh". Well, the crashing init
_was_ /bin/sh.

Which prompted me to make kernel do this first step automatically. We can
print exit code, which makes it possible to see that death was from e.g.
SIGILL without writing test programs.

[akpm@linux-foundation.org: add 0x to hex number output]
Signed-off-by: Denys Vlasenko
Acked-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denys Vlasenko
2012-03-24 07:58:32 +0800
ebec18a6d prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision ... Browse Code »

Userspace service managers/supervisors need to track their started
services. Many services daemonize by double-forking and get implicitly
re-parented to PID 1. The service manager will no longer be able to
receive the SIGCHLD signals for them, and is no longer in charge of
reaping the children with wait(). All information about the children is
lost at the moment PID 1 cleans up the re-parented processes.

With this prctl, a service manager process can mark itself as a sort of
'sub-init', able to stay as the parent for all orphaned processes
created by the started services. All SIGCHLD signals will be delivered
to the service manager.

Receiving SIGCHLD and doing wait() is in cases of a service-manager much
preferred over any possible asynchronous notification about specific
PIDs, because the service manager has full access to the child process
data in /proc and the PID can not be re-used until the wait(), the
service-manager itself is in charge of, has happened.

As a side effect, the relevant parent PID information does not get lost
by a double-fork, which results in a more elaborate process tree and
'ps' output:

before:
# ps afx
253 ? Ss 0:00 /bin/dbus-daemon --system --nofork
294 ? Sl 0:00 /usr/libexec/polkit-1/polkitd
328 ? S 0:00 /usr/sbin/modem-manager
608 ? Sl 0:00 /usr/libexec/colord
658 ? Sl 0:00 /usr/libexec/upowerd
819 ? Sl 0:00 /usr/libexec/imsettings-daemon
916 ? Sl 0:00 /usr/libexec/udisks-daemon
917 ? S 0:00 \_ udisks-daemon: not polling any devices

after:
# ps afx
294 ? Ss 0:00 /bin/dbus-daemon --system --nofork
426 ? Sl 0:00 \_ /usr/libexec/polkit-1/polkitd
449 ? S 0:00 \_ /usr/sbin/modem-manager
635 ? Sl 0:00 \_ /usr/libexec/colord
705 ? Sl 0:00 \_ /usr/libexec/upowerd
959 ? Sl 0:00 \_ /usr/libexec/udisks-daemon
960 ? S 0:00 | \_ udisks-daemon: not polling any devices
977 ? Sl 0:00 \_ /usr/libexec/packagekitd

This prctl is orthogonal to PID namespaces. PID namespaces are isolated
from each other, while a service management process usually requires the
services to live in the same namespace, to be able to talk to each
other.

Users of this will be the systemd per-user instance, which provides
init-like functionality for the user's login session and D-Bus, which
activates bus services on-demand. Both need init-like capabilities to
be able to properly keep track of the services they start.

Many thanks to Oleg for several rounds of review and insights.

[akpm@linux-foundation.org: fix comment layout and spelling]
[akpm@linux-foundation.org: add lengthy code comment from Oleg]
Reviewed-by: Oleg Nesterov
Signed-off-by: Lennart Poettering
Signed-off-by: Kay Sievers
Acked-by: Valdis Kletnieks
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lennart Poettering
2012-03-24 07:58:32 +0800
a20ae85ab Merge tag 'for_linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb ... Browse Code »

Pull KGDB/KDB updates from Jason Wessel:
"Fixes:
- Fix KDB keyboard repeat scan codes and leaked keyboard events
- Fix kernel crash with kdb_printf() for users who compile new
kdb_printf()'s in early code
- Return all segment registers to gdb on x86_64

Features:
- KDB/KGDB hook the reboot notifier and end user can control if it
stops, detaches or does nothing (updated docs as well)
- Notify users who use CONFIG_DEBUG_RODATA to use hw breakpoints"

* tag 'for_linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
kdb: Add message about CONFIG_DEBUG_RODATA on failure to install breakpoint
kdb: Avoid using dbg_io_ops until it is initialized
kgdb,debug_core: add the ability to control the reboot notifier
KDB: Fix usability issues relating to the 'enter' key.
kgdb,debug-core,gdbstub: Hook the reboot notifier for debugger detach
kgdb: Respect that flush op is optional
kgdb: x86: Return all segment registers also in 64-bit mode

Linus Torvalds
2012-03-24 00:29:44 +0800

23 Mar, 2012

8 commits

7bfe0e66d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input ... Browse Code »

Pull input subsystem updates from Dmitry Torokhov:
"- we finally merged driver for USB version of Synaptics touchpads
(I guess most commonly found in IBM/Lenovo keyboard/touchpad combo);

- a bunch of new drivers for embedded platforms (Cypress
touchscreens, DA9052 OnKey, MAX8997-haptic, Ilitek ILI210x
touchscreens, TI touchscreen);

- input core allows clients to specify desired clock source for
timestamps on input events (EVIOCSCLOCKID ioctl);

- input core allows querying state of all MT slots for given event
code via EVIOCGMTSLOTS ioctl;

- various driver fixes and improvements."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits)
Input: ili210x - add support for Ilitek ILI210x based touchscreens
Input: altera_ps2 - use of_match_ptr()
Input: synaptics_usb - switch to module_usb_driver()
Input: convert I2C drivers to use module_i2c_driver()
Input: convert SPI drivers to use module_spi_driver()
Input: omap4-keypad - move platform_data to
Input: kxtj9 - who_am_i check value and initial data rate fixes
Input: add driver support for MAX8997-haptic
Input: tegra-kbc - revise device tree support
Input: of_keymap - add device tree bindings for simple key matrices
Input: wacom - fix physical size calculation for 3rd-gen Bamboo
Input: twl4030-vibra - really switch from #if to #ifdef
Input: hp680_ts_input - ensure arguments to request_irq and free_irq are compatible
Input: max8925_onkey - avoid accessing input device too early
Input: max8925_onkey - allow to be used as a wakeup source
Input: atmel-wm97xx - convert to dev_pm_ops
Input: atmel-wm97xx - set driver owner
Input: add cyttsp touchscreen maintainer entry
Input: cyttsp - remove useless checks in cyttsp_probe()
Input: usbtouchscreen - add support for Data Modul EasyTouch TP 72037
...

Linus Torvalds
2012-03-23 11:20:18 +0800
1ba0c1720 kdb: Add message about CONFIG_DEBUG_RODATA on failure to install breakpoint ... Browse Code »

On x86, if CONFIG_DEBUG_RODATA is set, one cannot set breakpoints
via KDB. Apparently this is a well-known problem, as at least one distribution
now ships with both KDB enabled and CONFIG_DEBUG_RODATA=y for security reasons.

This patch adds an printk message to the breakpoint failure case,
in order to provide suggestions about how to use the debugger.

Reported-by: Tim Bird
Signed-off-by: Jason Wessel
Acked-by: Tim Bird

Jason Wessel
2012-03-23 04:07:16 +0800
b8adde8dd kdb: Avoid using dbg_io_ops until it is initialized ... Browse Code »
44

This fixes a bug with setting a breakpoint during kdb initialization
(from kdb_cmds). Any call to kdb_printf() before the initialization
of the kgdboc serial console driver (which happens much later during
bootup than kdb_init), results in kernel panic due to the use of
dbg_io_ops before it is initialized.

Signed-off-by: Tim Bird
Signed-off-by: Jason Wessel

Tim Bird
2012-03-23 04:07:16 +0800
bec4d62ea kgdb,debug_core: add the ability to control the reboot notifier ... Browse Code »

Sometimes it is desirable to stop the kernel debugger before allowing
a system to reboot either with kdb or kgdb. This patch adds the
ability to turn the reboot notifier on and off or enter the debugger
and stop kernel execution before rebooting.

It is possible to change the setting after booting the kernel with the
following:

echo 1 > /sys/module/debug_core/parameters/kgdbreboot

It is also possible to change this setting using kdb / kgdb to
manipulate the variable directly.

Using KDB:
mm kgdbreboot 1

Using gdb:
set kgdbreboot=1

Reported-by: Jan Kiszka
Signed-off-by: Jason Wessel

Jason Wessel
2012-03-23 04:07:16 +0800
8f30d4117 KDB: Fix usability issues relating to the 'enter' key. ... Browse Code »

This fixes the following problems:
1) Typematic-repeat of 'enter' gives warning message
and leaks make/break if KDB exits. Repeats
look something like 0x1c 0x1c .... 0x9c
2) Use of 'keypad enter' gives warning message and
leaks the ENTER break/make code out if KDB exits.
KP ENTER repeats look someting like 0xe0 0x1c
0xe0 0x1c ... 0xe0 0x9c.
3) Lag on the order of seconds between "break" and "make" when
expecting the enter "break" code. Seen under virtualized
environments such as VMware ESX.

The existing special enter handler tries to glob the enter break code,
but this fails if the other (KP) enter was used, or if there was a key
repeat. It also fails if you mashed some keys along with enter, and
you ended up with a non-enter make or non-enter break code coming
after the enter make code. So first, we modify the handler to handle
these cases. But performing these actions on every enter is annoying
since now you can't hold ENTER down to scroll d messages in
KDB. Since this special behaviour is only necessary to handle the
exiting KDB ('g' + ENTER) without leaking scancodes to the OS. This
cleanup needs to get executed anytime the kdb_main loop exits.

Tested on QEMU. Set a bp on atkbd.c to verify no scan code was leaked.

Cc: Andrei Warkentin
[jason.wessel@windriver.com: move cleanup calls to kdb_main.c]
Signed-off-by: Andrei Warkentin
Signed-off-by: Jason Wessel

Andrei Warkentin
2012-03-23 04:07:15 +0800
2366e0478 kgdb,debug-core,gdbstub: Hook the reboot notifier for debugger detach ... Browse Code »

The gdbstub and kdb should get detached if the system is rebooting.
Calling gdbstub_exit() will set the proper debug core state and send a
message to any debugger that is connected to correctly detach.

An attached debugger will receive the exit code from
include/linux/reboot.h based on SYS_HALT, SYS_REBOOT, etc...

Reported-by: Jan Kiszka
Signed-off-by: Jason Wessel

Jason Wessel
2012-03-23 04:07:15 +0800
9fbe465ef kgdb: Respect that flush op is optional ... Browse Code »

Not all kgdb I/O drivers implement a flush operation. Adjust
gdbstub_exit accordingly.

Signed-off-by: Jan Kiszka
Signed-off-by: Jason Wessel

Jan Kiszka
2012-03-23 04:07:15 +0800
95211279c Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge first batch of patches from Andrew Morton:
"A few misc things and all the MM queue"

* emailed from Andrew Morton : (92 commits)
memcg: avoid THP split in task migration
thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
memcg: clean up existing move charge code
mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
mm/memcontrol.c: s/stealed/stolen/
memcg: fix performance of mem_cgroup_begin_update_page_stat()
memcg: remove PCG_FILE_MAPPED
memcg: use new logic for page stat accounting
memcg: remove PCG_MOVE_LOCK flag from page_cgroup
memcg: simplify move_account() check
memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
memcg: kill dead prev_priority stubs
memcg: remove PCG_CACHE page_cgroup flag
memcg: let css_get_next() rely upon rcu_read_lock()
cgroup: revert ss_id_lock to spinlock
idr: make idr_get_next() good for rcu_read_lock()
memcg: remove unnecessary thp check in page stat accounting
memcg: remove redundant returns
memcg: enum lru_list lru
...

Linus Torvalds
2012-03-23 00:04:48 +0800

22 Mar, 2012

11 commits

ca464d69b memcg: let css_get_next() rely upon rcu_read_lock() ... Browse Code »

Remove lock and unlock around css_get_next()'s call to idr_get_next().
memcg iterators (only users of css_get_next) already did rcu_read_lock(),
and its comment demands that; but add a WARN_ON_ONCE to make sure of it.

Signed-off-by: Hugh Dickins
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Li Zefan
Cc: Eric Dumazet
Acked-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-03-22 08:55:01 +0800
42aee6c49 cgroup: revert ss_id_lock to spinlock ... Browse Code »

Commit c1e2ee2dc436 ("memcg: replace ss->id_lock with a rwlock") has now
been seen to cause the unfair behavior we should have expected from
converting a spinlock to an rwlock: softlockup in cgroup_mkdir(), whose
get_new_cssid() is waiting for the wlock, while there are 19 tasks using
the rlock in css_get_next() to get on with their memcg workload (in an
artificial test, admittedly). Yet lib/idr.c was made suitable for RCU
way back: revert that commit, restoring ss->id_lock to a spinlock.

Signed-off-by: Hugh Dickins
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Li Zefan
Cc: Eric Dumazet
Acked-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-03-22 08:55:01 +0800
05af2e104 mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() ... Browse Code »

sync_mm_rss() can only be used for current to avoid race conditions in
iterating and clearing its per-task counters. Remove the task argument
for it and its helper function, __sync_task_rss_stat(), to avoid thinking
it can be used safely for anything other than current.

Signed-off-by: David Rientjes
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2012-03-22 08:54:59 +0800
cc9a6c877 cpuset: mm: reduce large amounts of memory barrier related damage v3 ... Browse Code »
44

Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.

[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths. This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32. The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.

For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.

This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side. This is much cheaper on some architectures, including x86. The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.

While updating the nodemask, a check is made to see if a false failure
is a risk. If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.

In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
actual results were

3.3.0-rc3 3.3.0-rc3
rc3-vanilla nobarrier-v2r1
Clients 1 UserTime 0.07 ( 0.00%) 0.08 (-14.19%)
Clients 2 UserTime 0.07 ( 0.00%) 0.07 ( 2.72%)
Clients 4 UserTime 0.08 ( 0.00%) 0.07 ( 3.29%)
Clients 1 SysTime 0.70 ( 0.00%) 0.65 ( 6.65%)
Clients 2 SysTime 0.85 ( 0.00%) 0.82 ( 3.65%)
Clients 4 SysTime 1.41 ( 0.00%) 1.41 ( 0.32%)
Clients 1 WallTime 0.77 ( 0.00%) 0.74 ( 4.19%)
Clients 2 WallTime 0.47 ( 0.00%) 0.45 ( 3.73%)
Clients 4 WallTime 0.38 ( 0.00%) 0.37 ( 1.58%)
Clients 1 Flt/sec/cpu 497620.28 ( 0.00%) 520294.53 ( 4.56%)
Clients 2 Flt/sec/cpu 414639.05 ( 0.00%) 429882.01 ( 3.68%)
Clients 4 Flt/sec/cpu 257959.16 ( 0.00%) 258761.48 ( 0.31%)
Clients 1 Flt/sec 495161.39 ( 0.00%) 517292.87 ( 4.47%)
Clients 2 Flt/sec 820325.95 ( 0.00%) 850289.77 ( 3.65%)
Clients 4 Flt/sec 1020068.93 ( 0.00%) 1022674.06 ( 0.26%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 135.68 132.17
User+Sys Time Running Test (seconds) 164.2 160.13
Total Elapsed Time (seconds) 123.46 120.87

The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected). The
actual number of page faults is noticeably improved.

For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.

To test the actual bug the commit fixed I opened two terminals. The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data. In a second window, the nodemask of the
cpuset was continually randomised in a loop.

Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.

Signed-off-by: Mel Gorman
Cc: Miao Xie
Cc: David Rientjes
Cc: Peter Zijlstra
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-03-22 08:54:59 +0800
c3f0327f8 mm: add rss counters consistency check ... Browse Code »
132

Warn about non-zero rss counters at final mmdrop.

This check will prevent reoccurences of bugs such as that fixed in "mm:
fix rss count leakage during migration".

I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
rss counters cover whole page-table management, so this is a good
invariant.

Signed-off-by: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2012-03-22 08:54:55 +0800
e2a0883e4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs pile 1 from Al Viro:
"This is _not_ all; in particular, Miklos' and Jan's stuff is not there
yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
ext4: initialization of ext4_li_mtx needs to be done earlier
debugfs-related mode_t whack-a-mole
hfsplus: add an ioctl to bless files
hfsplus: change finder_info to u32
hfsplus: initialise userflags
qnx4: new helper - try_extent()
qnx4: get rid of qnx4_bread/qnx4_getblk
take removal of PF_FORKNOEXEC to flush_old_exec()
trim includes in inode.c
um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
um: embed ->stub_pages[] into mmu_context
gadgetfs: list_for_each_safe() misuse
ocfs2: fix leaks on failure exits in module_init
ecryptfs: make register_filesystem() the last potential failure exit
ntfs: forgets to unregister sysctls on register_filesystem() failure
logfs: missing cleanup on register_filesystem() failure
jfs: mising cleanup on register_filesystem() failure
make configfs_pin_fs() return root dentry on success
configfs: configfs_create_dir() has parent dentry in dentry->d_parent
configfs: sanitize configfs_create()
...

Linus Torvalds
2012-03-22 04:36:41 +0800
3556485f1 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security ... Browse Code »

Pull security subsystem updates for 3.4 from James Morris:
"The main addition here is the new Yama security module from Kees Cook,
which was discussed at the Linux Security Summit last year. Its
purpose is to collect miscellaneous DAC security enhancements in one
place. This also marks a departure in policy for LSM modules, which
were previously limited to being standalone access control systems.
Chromium OS is using Yama, and I believe there are plans for Ubuntu,
at least.

This patchset also includes maintenance updates for AppArmor, TOMOYO
and others."

Fix trivial conflict in due to the jumo_label->static_key
rename.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
AppArmor: Fix location of const qualifier on generated string tables
TOMOYO: Return error if fails to delete a domain
AppArmor: add const qualifiers to string arrays
AppArmor: Add ability to load extended policy
TOMOYO: Return appropriate value to poll().
AppArmor: Move path failure information into aa_get_name and rename
AppArmor: Update dfa matching routines.
AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
AppArmor: Add const qualifiers to generated string tables
AppArmor: Fix oops in policy unpack auditing
AppArmor: Fix error returned when a path lookup is disconnected
KEYS: testing wrong bit for KEY_FLAG_REVOKED
TOMOYO: Fix mount flags checking order.
security: fix ima kconfig warning
AppArmor: Fix the error case for chroot relative path name lookup
AppArmor: fix mapping of META_READ to audit and quiet flags
AppArmor: Fix underflow in xindex calculation
AppArmor: Fix dropping of allowed operations that are force audited
AppArmor: Add mising end of structure test to caps unpacking
...

Linus Torvalds
2012-03-22 04:25:04 +0800
b8716614a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 ... Browse Code »

Pull crypto update from Herbert Xu:
"* sha512 bug fixes (already in your tree).
* SHA224/SHA384 AEAD support in caam.
* X86-64 optimised version of Camellia.
* Tegra AES support.
* Bulk algorithm registration interface to make driver registration easier.
* padata race fixes.
* Misc fixes."

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (31 commits)
padata: Fix race on sequence number wrap
padata: Fix race in the serialization path
crypto: camellia - add assembler implementation for x86_64
crypto: camellia - rename camellia.c to camellia_generic.c
crypto: camellia - fix checkpatch warnings
crypto: camellia - rename camellia module to camellia_generic
crypto: tcrypt - add more camellia tests
crypto: testmgr - add more camellia test vectors
crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro
crypto: twofish-x86_64/i586 - set alignmask to zero
crypto: blowfish-x86_64 - set alignmask to zero
crypto: serpent-sse2 - combine ablk_*_init functions
crypto: blowfish-x86_64 - use crypto_[un]register_algs
crypto: twofish-x86_64-3way - use crypto_[un]register_algs
crypto: serpent-sse2 - use crypto_[un]register_algs
crypto: serpent-sse2 - remove dead code from serpent_sse2_glue.c::serpent_sse2_init()
crypto: twofish-x86 - Remove dead code from twofish_glue_3way.c::init()
crypto: In crypto_add_alg(), 'exact' wants to be initialized to 0
crypto: caam - fix gcc 4.6 warning
crypto: Add bulk algorithm registration interface
...

Linus Torvalds
2012-03-22 04:20:43 +0800
c207f3a43 Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6 ... Browse Code »

Pull irq_domain support for all architectures from Grant Likely:
"Generialize powerpc's irq_host as irq_domain

This branch takes the PowerPC irq_host infrastructure (reverse mapping
from Linux IRQ numbers to hardware irq numbering), generalizes it,
renames it to irq_domain, and makes it available to all architectures.

Originally the plan has been to create an all-new irq_domain
implementation which addresses some of the powerpc shortcomings such
as not handling 1:1 mappings well, but doing that proved to be far
more difficult and invasive than generalizing the working code and
refactoring it in-place. So, this branch rips out the 'new'
irq_domain and replaces it with the modified powerpc version (in a
fully bisectable way of course). It converts all users over to the
new API and makes irq_domain selectable on any architecture.

No architecture is forced to enable irq_domain, but the infrastructure
is required for doing OpenFirmware style irq translations. It will
even work on SPARC even though SPARC has it's own mechanism for
translating irqs at boot time. MIPS, microblaze, embedded x86 and c6x
are converted too.

The resulting irq_domain code is probably still too verbose and can be
optimized more, but that can be done incrementally and is a task for
follow-on patches."

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: (31 commits)
dt: fix twl4030 for non-dt compile on x86
mfd: twl-core: Add IRQ_DOMAIN dependency
devicetree: Add empty of_platform_populate() for !CONFIG_OF_ADDRESS (sparc)
irq_domain: Centralize definition of irq_dispose_mapping()
irq_domain/mips: Allow irq_domain on MIPS
irq_domain/x86: Convert x86 (embedded) to use common irq_domain
ppc-6xx: fix build failure in flipper-pic.c and hlwd-pic.c
irq_domain/microblaze: Convert microblaze to use irq_domains
irq_domain/powerpc: Replace custom xlate functions with library functions
irq_domain/powerpc: constify irq_domain_ops
irq_domain/c6x: Use library of xlate functions
irq_domain/c6x: constify irq_domain structures
irq_domain/c6x: Convert c6x to use generic irq_domain support.
irq_domain: constify irq_domain_ops
irq_domain: Create common xlate functions that device drivers can use
irq_domain: Remove irq_domain_add_simple()
irq_domain: Remove 'new' irq_domain in favour of the ppc one
mfd: twl-core.c: Fix the number of interrupts managed by twl4030
of/address: add empty static inlines for !CONFIG_OF
irq_domain: Add support for base irq and hwirq in legacy mappings
...

Linus Torvalds
2012-03-22 01:27:19 +0800
c7c66c0cb Merge tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management updates for 3.4 from Rafael Wysocki:
"Assorted extensions and fixes including:

* Introduction of early/late suspend/hibernation device callbacks.
* Generic PM domains extensions and fixes.
* devfreq updates from Axel Lin and MyungJoo Ham.
* Device PM QoS updates.
* Fixes of concurrency problems with wakeup sources.
* System suspend and hibernation fixes."

* tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (43 commits)
PM / Domains: Check domain status during hibernation restore of devices
PM / devfreq: add relation of recommended frequency.
PM / shmobile: Make MTU2 driver use pm_genpd_dev_always_on()
PM / shmobile: Make CMT driver use pm_genpd_dev_always_on()
PM / shmobile: Make TMU driver use pm_genpd_dev_always_on()
PM / Domains: Introduce "always on" device flag
PM / Domains: Fix hibernation restore of devices, v2
PM / Domains: Fix handling of wakeup devices during system resume
sh_mmcif / PM: Use PM QoS latency constraint
tmio_mmc / PM: Use PM QoS latency constraint
PM / QoS: Make it possible to expose PM QoS latency constraints
PM / Sleep: JBD and JBD2 missing set_freezable()
PM / Domains: Fix include for PM_GENERIC_DOMAINS=n case
PM / Freezer: Remove references to TIF_FREEZE in comments
PM / Sleep: Add more wakeup source initialization routines
PM / Hibernate: Enable usermodehelpers in hibernate() error path
PM / Sleep: Make __pm_stay_awake() delete wakeup source timers
PM / Sleep: Fix race conditions related to wakeup source timer function
PM / Sleep: Fix possible infinite loop during wakeup source destruction
PM / Hibernate: print physical addresses consistently with other parts of kernel
...

Linus Torvalds
2012-03-22 01:15:51 +0800
9f3938346 Merge branch 'kmap_atomic' of git://github.com/congwang/linux ... Browse Code »

Pull kmap_atomic cleanup from Cong Wang.

It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().

Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.

* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
drbd: remove the second argument of k[un]map_atomic()
zcache: remove the second argument of k[un]map_atomic()
gma500: remove the second argument of k[un]map_atomic()
dm: remove the second argument of k[un]map_atomic()
tomoyo: remove the second argument of k[un]map_atomic()
sunrpc: remove the second argument of k[un]map_atomic()
rds: remove the second argument of k[un]map_atomic()
net: remove the second argument of k[un]map_atomic()
mm: remove the second argument of k[un]map_atomic()
lib: remove the second argument of k[un]map_atomic()
power: remove the second argument of k[un]map_atomic()
kdb: remove the second argument of k[un]map_atomic()
udf: remove the second argument of k[un]map_atomic()
ubifs: remove the second argument of k[un]map_atomic()
squashfs: remove the second argument of k[un]map_atomic()
reiserfs: remove the second argument of k[un]map_atomic()
ocfs2: remove the second argument of k[un]map_atomic()
ntfs: remove the second argument of k[un]map_atomic()
...

Linus Torvalds
2012-03-22 00:40:26 +0800