12 May, 2013

2 commits

  • Pull tracing/kprobes update from Steven Rostedt:
    "The majority of these changes are from Masami Hiramatsu bringing
    kprobes up to par with the latest changes to ftrace (multi buffering
    and the new function probes).

    He also discovered and fixed some bugs in doing so. When pulling in
    his patches, I also found a few minor bugs as well and fixed them.

    This also includes a compile fix for some archs that select the ring
    buffer but not tracing.

    I based this off of the last patch you took from me that fixed the
    merge conflict error, as that was the commit that had all the changes
    I needed for this set of changes."

    * tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/kprobes: Support soft-mode disabling
    tracing/kprobes: Support ftrace_event_file base multibuffer
    tracing/kprobes: Pass trace_probe directly from dispatcher
    tracing/kprobes: Increment probe hit-count even if it is used by perf
    tracing/kprobes: Use bool for retprobe checker
    ftrace: Fix function probe when more than one probe is added
    ftrace: Fix the output of enabled_functions debug file
    ftrace: Fix locking in register_ftrace_function_probe()
    tracing: Add helper function trace_create_new_event() to remove duplicate code
    tracing: Modify soft-mode only if there's no other referrer
    tracing: Indicate enabled soft-mode in enable file
    tracing/kprobes: Fix to increment return event probe hit-count
    ftrace: Cleanup regex_lock and ftrace_lock around hash updating
    ftrace, kprobes: Fix a deadlock on ftrace_regex_lock
    ftrace: Have ftrace_regex_write() return either read or error
    tracing: Return error if register_ftrace_function_probe() fails for event_enable_func()
    tracing: Don't succeed if event_enable_func did not register anything
    ring-buffer: Select IRQ_WORK

    Linus Torvalds
     
  • Pull audit changes from Eric Paris:
    "Al used to send pull requests every couple of years but he told me to
    just start pushing them to you directly.

    Our touching outside of core audit code is pretty straightforward. A
    couple of interface changes which hit net/. A simple argument bug
    calling audit functions in namei.c and the removal of some assembly
    branch prediction code on ppc"

    * git://git.infradead.org/users/eparis/audit: (31 commits)
    audit: fix message spacing printing auid
    Revert "audit: move kaudit thread start from auditd registration to kaudit init"
    audit: vfs: fix audit_inode call in O_CREAT case of do_last
    audit: Make testing for a valid loginuid explicit.
    audit: fix event coverage of AUDIT_ANOM_LINK
    audit: use spin_lock in audit_receive_msg to process tty logging
    audit: do not needlessly take a lock in tty_audit_exit
    audit: do not needlessly take a spinlock in copy_signal
    audit: add an option to control logging of passwords with pam_tty_audit
    audit: use spin_lock_irqsave/restore in audit tty code
    helper for some session id stuff
    audit: use a consistent audit helper to log lsm information
    audit: push loginuid and sessionid processing down
    audit: stop pushing loginid, uid, sessionid as arguments
    audit: remove the old depricated kernel interface
    audit: make validity checking generic
    audit: allow checking the type of audit message in the user filter
    audit: fix build break when AUDIT_DEBUG == 2
    audit: remove duplicate export of audit_enabled
    Audit: do not print error when LSMs disabled
    ...

    Linus Torvalds
     

11 May, 2013

2 commits

  • Pull stray syscall bits from Al Viro:
    "Several syscall-related commits that were missing from the original"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
    unicore32: just use mmap_pgoff()...
    unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
    x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)

    Linus Torvalds
     
  • Pull misc fixes from David Woodhouse:
    "This is some miscellaneous cleanups that don't really belong anywhere
    else (or were ignored), that have been sitting in linux-next for some
    time. Two of them are fixes resulting from my audit of krealloc()
    usage that don't seem to have elicited any response when I posted
    them, and the other three are patches from Artem removing dead code."

    * tag 'for-linus-20130509' of git://git.infradead.org/~dwmw2/random-2.6:
    pcmcia: remove RPX board stuff
    m68k: remove rpxlite stuff
    pcmcia: remove Motorola MBX860 support
    params: Fix potential memory leak in add_sysfs_param()
    dell-laptop: Fix krealloc() misuse in parse_da_table()

    Linus Torvalds
     

10 May, 2013

16 commits

  • Support soft-mode disabling on kprobe-based dynamic events.
    Soft-disabling simply skips recording when the soft-disabled
    flag is set.
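
    A minimal sketch of that idea follows; the names are simplified and
    invented for illustration, this is not the actual kprobe code.

    /* Illustrative sketch only -- simplified names, not the real code. */
    #define FL_SOFT_DISABLED (1u << 0)

    struct event_file { unsigned int flags; };

    static void kprobe_hit(struct event_file *file)
    {
        if (file->flags & FL_SOFT_DISABLED)
            return;             /* probe stays armed, recording is skipped */
        /* ... write the trace entry into the buffer ... */
    }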

    Link: http://lkml.kernel.org/r/20130509054454.30398.7237.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Support multi-buffer on kprobe-based dynamic events by
    using ftrace_event_file.

    Link: http://lkml.kernel.org/r/20130509054449.30398.88343.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Pass the struct trace_probe pointer directly from the probe
    dispatcher to the handlers. This removes redundant container_of()
    uses. The same thing has already been done in trace_uprobe.
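
    Roughly what that shape looks like -- a hand-written sketch with
    simplified types, not the kernel code: the dispatcher resolves the
    enclosing object once and hands the trace_probe pointer down.

    struct trace_probe  { unsigned long nhit; };
    struct trace_kprobe { struct trace_probe tp; /* kprobe fields ... */ };

    static void trace_handler(struct trace_probe *tp) { tp->nhit++; /* record */ }
    static void perf_handler(struct trace_probe *tp)  { (void)tp;   /* record */ }

    static void dispatcher(struct trace_kprobe *tk)
    {
        struct trace_probe *tp = &tk->tp;  /* no container_of() in handlers */

        trace_handler(tp);
        perf_handler(tp);
    }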

    Link: http://lkml.kernel.org/r/20130509054441.30398.69112.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Increment the probe hit-count for profiling even if the probe is
    used by the perf tool. The same thing has already been done in
    trace_uprobe.

    Link: http://lkml.kernel.org/r/20130509054436.30398.21133.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Use bool instead of int for kretprobe checker.

    Link: http://lkml.kernel.org/r/20130509054431.30398.38561.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • When the first function probe is added and the function tracer
    is updated, the functions are modified to call the probe.
    But when a second probe is added, the function records are
    updated to include it, while the actual functions themselves are
    not updated.

    This prevents the second (or third or fourth and so on) probes
    from having their functions called.

    # echo vfs_symlink:enable_event:sched:sched_switch > set_ftrace_filter
    # echo vfs_unlink:enable_event:sched:sched_switch > set_ftrace_filter
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 0/0 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    # touch /tmp/a
    # rm /tmp/a
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 0/0 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    # ln -s /tmp/a
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 414/414 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    <idle>-0 [000] d..3 2847.923031: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=2786 next_prio=120
    <...>-3114 [001] d..4 2847.923035: sched_switch: prev_comm=ln prev_pid=3114 prev_prio=120 prev_state=x ==> next_comm=swapper/1 next_pid=0 next_prio=120
    bash-2786 [000] d..3 2847.923535: sched_switch: prev_comm=bash prev_pid=2786 prev_prio=120 prev_state=S ==> next_comm=kworker/0:1 next_pid=34 next_prio=120
    kworker/0:1-34 [000] d..3 2847.923552: sched_switch: prev_comm=kworker/0:1 prev_pid=34 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
    <idle>-0 [002] d..3 2847.923554: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=2783 next_prio=120
    sshd-2783 [002] d..3 2847.923660: sched_switch: prev_comm=sshd prev_pid=2783 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120

    We still need to update the functions even though the probe itself
    does not need to be registered again when a new probe is added.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • The enabled_functions debugfs file was created to be able to see
    what functions have been modified from nops to calling a tracer.

    The current method uses the counter in the function record:
    when an ftrace_ops is registered to a function, its count
    increases. But that doesn't mean the function is actively
    being traced. /proc/sys/kernel/ftrace_enabled can be set to
    zero, which would disable it, or something can go wrong and
    we can think it's enabled when only the counter is set.

    The record's FTRACE_FL_ENABLED flag is set or cleared when its
    function is modified. That is a much more accurate way of knowing
    which functions are enabled or not.
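
    A simplified sketch of the distinction (illustrative layout, not the
    real dyn_ftrace record): a registration count says somebody wants the
    function traced, while the ENABLED bit is only set once the call site
    has actually been patched.

    #define REC_FL_ENABLED (1ul << 31)

    struct func_rec { unsigned long ip; unsigned long flags; };

    static int show_in_enabled_functions(const struct func_rec *rec)
    {
        return (rec->flags & REC_FL_ENABLED) != 0;   /* not: "count != 0" */
    }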

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • The iteration of the ftrace function list and the call to
    ftrace_match_record() need to be protected by the ftrace_lock.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Both __trace_add_new_event() and __trace_early_add_new_event() do
    basically the same thing, except that __trace_add_new_event() does
    a little more.

    Instead of having duplicate code between the two functions, add
    a helper function trace_create_new_event() that both can use.
    This helps avoid having a bug fixed in one function but not
    the other.
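
    A hypothetical sketch of the refactoring shape, with simplified names:
    both entry points call one helper for the shared allocation and
    initialization, and only the regular path does the extra work.

    #include <stdlib.h>

    struct event_call { const char *name; };
    struct event_file { struct event_call *call; void *tr; };

    static struct event_file *create_new_event(struct event_call *call, void *tr)
    {
        struct event_file *file = calloc(1, sizeof(*file));

        if (!file)
            return NULL;
        file->call = call;
        file->tr = tr;
        return file;
    }

    static int add_new_event(struct event_call *call, void *tr)
    {
        struct event_file *file = create_new_event(call, tr);

        if (!file)
            return -1;
        /* ... extra work that only the regular (non-early) path does ... */
        return 0;
    }

    static int early_add_new_event(struct event_call *call, void *tr)
    {
        return create_new_event(call, tr) ? 0 : -1;
    }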

    Cc: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Modify the soft-mode flag only if there is no other soft-mode
    referrer (currently only the ftrace triggers), by using a
    reference counter in each ftrace_event_file.

    Without this fix, adding and removing several different
    enable/disable_event triggers on the same event clears the
    soft-mode bit from the ftrace_event_file. This also happens
    with a typo in the glob when setting triggers.

    e.g.

    # echo vfs_symlink:enable_event:net:netif_rx > set_ftrace_filter
    # cat events/net/netif_rx/enable
    0*
    # echo typo_func:enable_event:net:netif_rx > set_ftrace_filter
    # cat events/net/netif_rx/enable
    0
    # cat set_ftrace_filter
    #### all functions enabled ####
    vfs_symlink:enable_event:net:netif_rx:unlimited

    As above, we still have a trigger, but soft-mode is gone.
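
    An illustrative sketch of the reference-counting idea, with invented
    names: each trigger takes a reference, and the soft-mode flag is only
    cleared when the last referrer goes away.

    #define FL_SOFT_MODE (1u << 0)

    struct event_file { unsigned int flags; int soft_mode_refs; };

    static void soft_mode_get(struct event_file *file)
    {
        if (file->soft_mode_refs++ == 0)
            file->flags |= FL_SOFT_MODE;
    }

    static void soft_mode_put(struct event_file *file)
    {
        if (--file->soft_mode_refs == 0)
            file->flags &= ~FL_SOFT_MODE;  /* only the last put drops soft mode */
    }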

    Link: http://lkml.kernel.org/r/20130509054429.30398.7464.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: David Sharp
    Cc: Hiraku Toyooka
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Indicate an enabled soft-mode event as "1*" in the "enable" file
    for each event, because it can be soft-disabled when the
    disable_event trigger is hit.

    Link: http://lkml.kernel.org/r/20130509054426.30398.28202.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Fix to increment probe hit-count for function return event.

    Link: http://lkml.kernel.org/r/20130509054424.30398.34058.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Clean up the regex_lock and ftrace_lock locking points around
    the ftrace_ops hash update code.

    The new rule is that regex_lock protects the ops->*_hash
    read-update-write code for each ftrace_ops. Usually, a hash
    update is done by the following sequence:

    1. allocate a new local hash and copy the original hash.
    2. update the local hash.
    3. move (actually, copy) back the local hash to ftrace_ops.
    4. update ftrace entries if needed.
    5. release the local hash.

    This makes regex_lock protect #1-#4, and ftrace_lock protect
    #3 and #4 as well as the adding and removing of ftrace_ops from
    the ftrace_ops_list. ftrace_lock covers #3 as well because the
    move functions update the entries too.
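
    A user-space sketch of the locking shape only (pthread mutexes stand
    in for the kernel mutexes; this is not the kernel code): the per-ops
    regex_lock wraps the whole read-update-write, and the global
    ftrace_lock wraps publishing the hash and patching the entries.

    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    struct hash { int entries[8]; };

    struct ops {
        pthread_mutex_t regex_lock;   /* per-ops lock */
        struct hash *filter_hash;
    };

    static pthread_mutex_t ftrace_lock = PTHREAD_MUTEX_INITIALIZER;  /* global */

    static void set_filter(struct ops *ops, int new_entry)
    {
        struct hash *new_hash, *old_hash;

        pthread_mutex_lock(&ops->regex_lock);

        new_hash = malloc(sizeof(*new_hash));             /* 1. copy the hash */
        if (!new_hash)
            goto out;
        memcpy(new_hash, ops->filter_hash, sizeof(*new_hash));

        new_hash->entries[0] = new_entry;                 /* 2. update the copy */

        pthread_mutex_lock(&ftrace_lock);                 /* 3.-4. under both locks */
        old_hash = ops->filter_hash;
        ops->filter_hash = new_hash;                      /* 3. move the hash back */
        /* ... 4. update the traced call sites here ... */
        pthread_mutex_unlock(&ftrace_lock);

        free(old_hash);                                   /* 5. release the old hash */
    out:
        pthread_mutex_unlock(&ops->regex_lock);
    }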

    Link: http://lkml.kernel.org/r/20130509054421.30398.83411.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Fix a deadlock on ftrace_regex_lock which happens when setting
    an enable_event trigger on a dynamic kprobe event, as shown below.

    ----
    sh-2.05b# echo p vfs_symlink > kprobe_events
    sh-2.05b# echo vfs_symlink:enable_event:kprobes:p_vfs_symlink_0 > set_ftrace_filter

    =============================================
    [ INFO: possible recursive locking detected ]
    3.9.0+ #35 Not tainted
    ---------------------------------------------
    sh/72 is trying to acquire lock:
    (ftrace_regex_lock){+.+.+.}, at: [] ftrace_set_hash+0x81/0x1f0

    but task is already holding lock:
    (ftrace_regex_lock){+.+.+.}, at: [] ftrace_regex_write.isra.29.part.30+0x3d/0x220

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(ftrace_regex_lock);
    lock(ftrace_regex_lock);

    *** DEADLOCK ***
    ----

    To fix that, introduce a finer-grained regex_lock for each
    ftrace_ops. ftrace_regex_lock is too big a lock, protecting all
    filter/notrace_hash operations; it no longer needs to be global
    now that multiple ftrace_ops are supported, because each
    ftrace_ops has its own filter/notrace_hash.

    Link: http://lkml.kernel.org/r/20130509054417.30398.84254.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    [ Added initialization flag and automate mutex initialization for
    non ftrace.c ftrace_probes. ]
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     

09 May, 2013

5 commits

  • As ftrace_regex_write() reads the result of ftrace_process_regex(),
    which can sometimes return a positive number, only consider it a
    failure if the return is negative. Otherwise, it will skip possible
    other registered probes, and by returning a positive number that
    wasn't read, it will confuse the user process doing the writing.

    Cc: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • register_ftrace_function_probe() returns the number of functions
    it registered, which can be zero; it can also return a negative
    number if something went wrong. But event_enable_func() only checks
    for the case that it didn't register anything; it also needs to
    check for the case that something went wrong and return that error
    code as well.

    Added some comments about the code as well, to make it more
    understandable.
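
    A sketch of the intended return-value handling; the helper name below
    is a hypothetical stand-in for register_ftrace_function_probe(), not
    the real call.

    #include <errno.h>

    static int register_probe_on_matches(void) { return 1; }  /* stand-in */

    static int enable_func(void)
    {
        int ret = register_probe_on_matches();  /* >0: matched, 0: none, <0: error */

        if (ret < 0)
            return ret;        /* propagate the real error code */
        if (ret == 0)
            return -ENOENT;    /* registering nothing is a failure, not success */
        return 0;              /* success: don't leak the count into write()'s return */
    }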

    Cc: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Return 0 instead of the number of activated ftrace function probes
    if event_enable_func() succeeded, and return an error code if it
    failed or did not register any functions. Currently it returns the
    number of registered functions, and if it didn't register anything
    it returns 0, which is treated as success.

    This also fixes the return value: on success, the number of enabled
    functions was returned back to the user in ftrace_regex_write() (the
    write() return code). If only one function is enabled, then the
    return code of the write is one, which can confuse the user program
    into thinking it only wrote 1 byte.

    Link: http://lkml.kernel.org/r/20130509054413.30398.55650.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    [ Rewrote change log to reflect that this fixes two bugs - SR ]
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Pull block driver updates from Jens Axboe:
    "It might look big in volume, but when categorized, not a lot of
    drivers are touched. The pull request contains:

    - mtip32xx fixes from Micron.

    - A slew of drbd updates, this time in a nicer series.

    - bcache, a flash/ssd caching framework from Kent.

    - Fixes for cciss"

    * 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
    bcache: Use bd_link_disk_holder()
    bcache: Allocator cleanup/fixes
    cciss: bug fix to prevent cciss from loading in kdump crash kernel
    cciss: add cciss_allow_hpsa module parameter
    drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
    mtip32xx: Workaround for unaligned writes
    bcache: Make sure blocksize isn't smaller than device blocksize
    bcache: Fix merge_bvec_fn usage for when it modifies the bvm
    bcache: Correctly check against BIO_MAX_PAGES
    bcache: Hack around stuff that clones up to bi_max_vecs
    bcache: Set ra_pages based on backing device's ra_pages
    bcache: Take data offset from the bdev superblock.
    mtip32xx: mtip32xx: Disable TRIM support
    mtip32xx: fix a smatch warning
    bcache: Disable broken btree fuzz tester
    bcache: Fix a format string overflow
    bcache: Fix a minor memory leak on device teardown
    bcache: Documentation updates
    bcache: Use WARN_ONCE() instead of __WARN()
    bcache: Add missing #include
    ...

    Linus Torvalds
     
  • Pull block core updates from Jens Axboe:

    - The major bit is Kent's prep work for immutable bio vecs.

    - Stable candidate fix for a scheduling-while-atomic in the queue
    bypass operation.

    - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
    discard bios.

    - Tejun's changes to convert the writeback thread pool to the generic
    workqueue mechanism.

    - Runtime PM framework; SCSI patches exist on top of these in James'
    tree.

    - A few random fixes.

    * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
    relay: move remove_buf_file inside relay_close_buf
    partitions/efi.c: replace useless kzalloc's by kmalloc's
    fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
    block: fix max discard sectors limit
    blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
    Documentation: cfq-iosched: update documentation help for cfq tunables
    writeback: expose the bdi_wq workqueue
    writeback: replace custom worker pool implementation with unbound workqueue
    writeback: remove unused bdi_pending_list
    aoe: Fix unitialized var usage
    bio-integrity: Add explicit field for owner of bip_buf
    block: Add an explicit bio flag for bios that own their bvec
    block: Add bio_alloc_pages()
    block: Convert some code to bio_for_each_segment_all()
    block: Add bio_for_each_segment_all()
    bounce: Refactor __blk_queue_bounce to not use bi_io_vec
    raid1: use bio_copy_data()
    pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
    pktcdvd: use bio_copy_data()
    block: Add bio_copy_data()
    ...

    Linus Torvalds
     

08 May, 2013

5 commits

  • The helper function didn't include a leading space, so it was jammed
    against the previous text in the audit record.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • Merge more incoming from Andrew Morton:

    - Various fixes which were stalled or which I picked up recently

    - A large rotorooting of the AIO code. Allegedly to improve
    performance but I don't really have good performance numbers (I might
    have lost the email) and I can't raise Kent today. I held this out
    of 3.9 and we could give it another cycle if it's all too late/scary.

    I ended up taking only the first two thirds of the AIO rotorooting. I
    left the percpu parts and the batch completion for later. - Linus

    * emailed patches from Andrew Morton : (33 commits)
    aio: don't include aio.h in sched.h
    aio: kill ki_retry
    aio: kill ki_key
    aio: give shared kioctx fields their own cachelines
    aio: kill struct aio_ring_info
    aio: kill batch allocation
    aio: change reqs_active to include unreaped completions
    aio: use cancellation list lazily
    aio: use flush_dcache_page()
    aio: make aio_read_evt() more efficient, convert to hrtimers
    wait: add wait_event_hrtimeout()
    aio: refcounting cleanup
    aio: make aio_put_req() lockless
    aio: do fget() after aio_get_req()
    aio: dprintk() -> pr_debug()
    aio: move private stuff out of aio.h
    aio: add kiocb_cancel()
    aio: kill return value of aio_complete()
    char: add aio_{read,write} to /dev/{null,zero}
    aio: remove retry-based AIO
    ...

    Linus Torvalds
     
  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • This reverts commit 6ff5e45985c2fcb97947818f66d1eeaf9d6600b2.

    Conflicts:
    kernel/audit.c

    The reverted patch started a kthread that ran all the time. Since the
    follow-on patches that required it didn't get finished in time for
    3.10, we shouldn't ship this change in 3.10.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • Audit rule additions containing "-F auid!=4294967295" were failing
    with EINVAL because of a regression caused by e1760bd.

    Apparently some userland audit rule sets want to know if the
    loginuid has been set and are using a test for auid != 4294967295
    to determine that.

    In practice that is a horrible way to ask if a value has been set,
    because it relies on subtle implementation details and will break
    every time the uid implementation in the kernel changes.

    So add a clean way to test if the audit loginuid has been set, and
    silently convert the old idiom to the cleaner and more comprehensible
    new idiom.
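
    The new test is essentially an explicit validity check on the stored
    loginuid. Roughly, as a sketch relying on the existing uid_valid() and
    audit_get_loginuid() helpers (not necessarily identical to the in-tree
    version):

    #include <linux/audit.h>

    /* sketch of the explicit "has the loginuid been set?" question */
    static inline bool audit_loginuid_set(struct task_struct *tsk)
    {
        return uid_valid(audit_get_loginuid(tsk));
    }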

    Cc: # 3.7
    Reported-By: Richard Guy Briggs
    Signed-off-by: "Eric W. Biederman"
    Tested-by: Richard Guy Briggs
    Signed-off-by: Eric Paris

    Eric W. Biederman
     

06 May, 2013

4 commits

  • Some interrupt controllers refuse to map interrupts marked as
    "protected" by firmware. Since we try to map everything in the
    device-tree on some platforms, we end up with a lot of nasty
    WARNs in the boot log for what is a normal situation on those
    machines.

    This defines a specific return code (-EPERM) from the host map()
    callback which causes irqdomain to fail silently.

    MPIC is updated to return this when hitting a protected source,
    printing only a single-line message for diagnostic purposes.
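
    A hedged sketch of what a host's map() callback looks like under the
    new convention; hw_is_protected() is a made-up placeholder for the
    controller's firmware check, not a real kernel function.

    #include <linux/errno.h>
    #include <linux/irqdomain.h>

    static bool hw_is_protected(irq_hw_number_t hw);  /* hypothetical check */

    static int my_host_map(struct irq_domain *d, unsigned int virq,
                           irq_hw_number_t hw)
    {
        if (hw_is_protected(hw))
            return -EPERM;      /* irqdomain now fails this mapping silently */
        /* ... normal chip/handler setup for virq ... */
        return 0;
    }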

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Misc fixes plus a small hw-enablement patch for Intel IB model 58
    uncore events"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
    perf/x86/intel/lbr: Fix LBR filter
    perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge
    perf: Fix vmalloc ring buffer pages handling
    perf/x86/intel: Fix unintended variable name reuse
    perf/x86/intel: Add support for IvyBridge model 58 Uncore
    perf/x86/intel: Fix typo in perf_event_intel_uncore.c
    x86: Eliminate irq_mis_count counted in arch_irq_stat

    Linus Torvalds
     
  • Pull module updates from Rusty Russell:
    "We get rid of the general module prefix confusion with a binary config
    option, fix a remove/insert race which Never Happens, and (my
    favorite) handle the case when we have too many modules for a single
    commandline. Seriously, the kernel is full, please go away!"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    modpost: fix unwanted VMLINUX_SYMBOL_STR expansion
    X.509: Support parse long form of length octets in Authority Key Identifier
    module: don't unlink the module until we've removed all exposure.
    kernel: kallsyms: memory override issue, need check destination buffer length
    MODSIGN: do not send garbage to stderr when enabling modules signature
    modpost: handle huge numbers of modules.
    modpost: add -T option to read module names from file/stdin.
    modpost: minor cleanup.
    genksyms: pass symbol-prefix instead of arch
    module: fix symbol versioning with symbol prefixes
    CONFIG_SYMBOL_PREFIX: cleanup.

    Linus Torvalds
     

05 May, 2013

3 commits

  • Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     
  • Pull second round of VFS updates from Al Viro:
    "Assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    xtensa simdisk: fix braino in "xtensa simdisk: switch to proc_create_data()"
    hostfs: use kmalloc instead of kzalloc
    hostfs: move HOSTFS_SUPER_MAGIC to
    hostfs: remove "will unlock" comment
    vfs: use list_move instead of list_del/list_add
    proc_devtree: Replace include linux/module.h with linux/export.h
    create_mnt_ns: unidiomatic use of list_add()
    fs: remove dentry_lru_prune()
    Removed unused typedef to avoid "unused local typedef" warnings.
    kill fs/read_write.h
    fs: Fix hang with BSD accounting on frozen filesystem
    sun3_scsi: add ->show_info()
    nubus: Kill nubus_proc_detach_device()
    more mode_t whack-a-mole...
    do_coredump(): don't wait for thaw if coredump has already been interrupted
    do_mount(): fix a leak introduced in 3.9 ("mount: consolidate permission checks")

    Linus Torvalds
     
  • When BSD process accounting is enabled and logs information to a
    filesystem which gets frozen, the system easily becomes unusable
    because each attempt to account process information blocks. Thus,
    e.g., every task gets blocked in exit.

    It seems better to drop accounting information (which can already
    happen when the filesystem is running out of space) instead of
    locking the system up. So we just skip the write if the filesystem
    is frozen.
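
    One way to express that, assuming the filesystem freeze-protection
    helpers file_start_write_trylock()/file_end_write(); the exact calls
    used by the patch may differ.

    /* sketch: take the write protection without blocking, or drop the record */
    if (!file_start_write_trylock(file))
        return;                 /* filesystem is frozen: skip this record */
    /* ... write the struct acct record to the file ... */
    file_end_write(file);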

    Reported-by: Nikola Ciprich
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

04 May, 2013

3 commits

  • The scheduler doesn't yet fully support environments
    with a single task running without a periodic tick.

    In order to ensure we still maintain the duties of scheduler_tick(),
    keep at least 1 tick per second.

    This makes sure that we keep the progression of various scheduler
    accounting and background maintenance even with a very low granularity.
    Examples include cpu load, sched average, CFS entity vruntime,
    avenrun and events such as load balancing, amongst other details
    handled in sched_class::task_tick().

    This limitation will be removed in the future once we get
    these individual items to work in full dynticks CPUs.

    Suggested-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Commit 0637e029392386e6996f5d6574aadccee8315efa
    ("nohz: Select wide RCU nocb for full dynticks") intended
    to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is
    enabled.

    However this option is part of a choice menu and Kconfig's
    "select" instruction has no effect on such targets.

    Fix this by using reverse dependencies on the targets we
    don't want instead.

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • As the wake-up logic for waiters on the buffer has been moved
    from the tracing code to the ring buffer, the ring buffer also
    needs to select IRQ_WORK, since the wake-up code is performed via
    irq_work.

    This fixes compile breakage when a user of the ring buffer is
    selected but tracing and irq_work are not.

    Link: http://lkml.kernel.org/r/20130503115332.GT8356@rric.localhost

    Cc: Arnd Bergmann
    Reported-by: Robert Richter
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)