Eric Lee / smarc-fsl-linux-kernel

13 Sep, 2011

6 commits

8292c9e15 locking, semaphores: Annotate inner lock as raw ... Browse Code »

There is no reason to have the spin_lock protecting the semaphore
preemptible on -rt. Annotate it as a raw_spinlock.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

( On rt this also solves lockdep complaining about the
rt_mutex.wait_lock being not initialized. )

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-09-13 17:11:57 +0800
ee30a7b2f locking, sched: Annotate thread_group_cputimer as raw ... Browse Code »

The thread_group_cputimer lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-09-13 17:11:55 +0800
07354eb1a locking, printk: Annotate logbuf_lock as raw ... Browse Code »

The logbuf_lock lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner
[ merged and fixed it ]
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-09-13 17:11:54 +0800
5389f6fad locking, tracing: Annotate tracing locks as raw ... Browse Code »

The tracing locks can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-09-13 17:11:52 +0800
cdcc136ff locking, sched, cgroups: Annotate release_list_lock as raw ... Browse Code »

The release_list_lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-09-13 17:11:49 +0800
ec484608c locking, kprobes: Annotate the hash locks and kretprobe.lock as raw ... Browse Code »

The kprobe locks can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-09-13 17:11:45 +0800

12 Sep, 2011

1 commit

0fa914c63 rtmutex: Cleanup the debug code ... Browse Code »

Use the existing lock debugging macros.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2011-09-12 19:42:32 +0800

08 Sep, 2011

2 commits

79016f648 Merge branch 'timers-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip ... Browse Code »

* 'timers-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
rtc: twl: Fix registration vs. init order
rtc: Initialized rtc_time->tm_isdst
rtc: Fix RTC PIE frequency limit
rtc: rtc-twl: Remove lockdep related local_irq_enable()
rtc: rtc-twl: Switch to using threaded irq
rtc: ep93xx: Fix 'rtc' may be used uninitialized warning
alarmtimers: Avoid possible denial of service with high freq periodic timers
alarmtimers: Memset itimerspec passed into alarm_timer_get
alarmtimers: Avoid possible null pointer traversal

Linus Torvalds
2011-09-08 04:03:48 +0800
e81b693c0 Merge branch 'sched-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip ... Browse Code »

* 'sched-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
sched: Fix a memory leak in __sdt_free()
sched: Move blk_schedule_flush_plug() out of __schedule()
sched: Separate the scheduler entry for preemption

Linus Torvalds
2011-09-08 04:01:34 +0800

31 Aug, 2011

1 commit

7f310a5d4 perf_event: Fix broken calc_timer_values() ... Browse Code »

We detected a serious issue with PERF_SAMPLE_READ and
timing information when events were being multiplexing.

Samples would have time_running > time_enabled. That
was easy to reproduce with a libpfm4 example (ran 3
times to cause multiplexing on Core 2):

$ syst_smpl -e uops_retired:freq=1 &
$ syst_smpl -e uops_retired:freq=1 &
$ syst_smpl -e uops_retired:freq=1 &
IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184
syst_smpl: WARNING: time_running > time_enabled
63277537998 uops_retired:freq=1 , scaled

The bug was not present in kernel up to (and including) 3.0. It turns
out the bug was introduced by the following commit:

commit c4794295917ebeda8013b6cb9c8d71ab4f74a1fa

events: Move lockless timer calculation into helper function

The parameters of the function got reversed yet the call sites
were not updated to reflect the change. That lead to time_running
and time_enabled being swapped. That had no effect when there was
no multiplexing because in that case time_running = time_enabled
but it would show up in any other scenario.

Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110829124112.GA4828@quad
Signed-off-by: Ingo Molnar

Eric B Munson
2011-08-31 21:56:29 +0800

29 Aug, 2011

4 commits

a8d757ef0 perf events: Fix slow and broken cgroup context switch code ... Browse Code »

The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

$ ./pong
Both processes pinned to CPU1, running for 10s
10684.51 ctxsw/s

Now start a cgroup perf stat:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100

$ ./pong
Both processes pinned to CPU1, running for 10s
6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100

Performance counter stats for 'sleep 100':

CPU1 cycles test
CPU1 16,984,189,138 cycles # 0.000 GHz

The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
$ ./pong
Both processes pinned to CPU1, running for 10s
10775.30 ctxsw/s

Start perf stat with cgroup:

$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

Run pong outside the cgroup:
$ /pong
Both processes pinned to CPU1, running for 10s
10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

Performance counter stats for 'sleep 10':

CPU1 cycles test # 0.000 GHz
CPU1 23,933,981,448 cycles # 0.000 GHz

Now perf stat reports the correct counts for
for the non cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

Performance counter stats for 'sleep 10':

CPU1 22,297,726,205 cycles test # 0.000 GHz
CPU1 23,933,981,448 cycles # 0.000 GHz

10.001457237 seconds time elapsed

Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar

Stephane Eranian
2011-08-29 18:28:33 +0800
feff8fa00 sched: Fix a memory leak in __sdt_free() ... Browse Code »
1

This patch fixes the following memory leak:

unreferenced object 0xffff880107266800 (size 512):
comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] create_object+0x187/0x28b
[] kmemleak_alloc+0x73/0x98
[] __kmalloc_node+0x104/0x159
[] kzalloc_node.clone.97+0x15/0x17
[] build_sched_domains+0xb7/0x7f3
[] partition_sched_domains+0x1db/0x24a
[] do_rebuild_sched_domains+0x3b/0x47
[] rebuild_sched_domains+0x10/0x12
[] sched_power_savings_store+0x6c/0x7b
[] sched_mc_power_savings_store+0x16/0x18
[] sysdev_class_store+0x20/0x22
[] sysfs_write_file+0x108/0x144
[] vfs_write+0xaf/0x102
[] sys_write+0x4d/0x74
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

Signed-off-by: WANG Cong
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org # 3.0
Link: http://lkml.kernel.org/r/1313671017-4112-1-git-send-email-amwang@redhat.com
Signed-off-by: Ingo Molnar

WANG Cong
2011-08-29 18:27:01 +0800
9c40cef2b sched: Move blk_schedule_flush_plug() out of __schedule() ... Browse Code »
1

There is no real reason to run blk_schedule_flush_plug() with
interrupts and preemption disabled.

Move it into schedule() and call it when the task is going voluntarily
to sleep. There might be false positives when the task is woken
between that call and actually scheduling, but that's not really
different from being woken immediately after switching away.

This fixes a deadlock in the scheduler where the
blk_schedule_flush_plug() callchain enables interrupts and thereby
allows a wakeup to happen of the task that's going to sleep.

Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra
Cc: Tejun Heo
Cc: Jens Axboe
Cc: Linus Torvalds
Cc: stable@kernel.org # 2.6.39+
Link: http://lkml.kernel.org/n/tip-dwfxtra7yg1b5r65m32ywtct@git.kernel.org
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-08-29 18:26:59 +0800
c259e01a1 sched: Separate the scheduler entry for preemption ... Browse Code »
2

Block-IO and workqueues call into notifier functions from the
scheduler core code with interrupts and preemption disabled. These
calls should be made before entering the scheduler core.

To simplify this, separate the scheduler core code into
__schedule(). __schedule() is directly called from the places which
set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
checks into schedule(), so they are only called when a task voluntary
goes to sleep.

Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra
Cc: Tejun Heo
Cc: Jens Axboe
Cc: Linus Torvalds
Cc: stable@kernel.org # 2.6.39+
Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.de
Signed-off-by: Ingo Molnar

Thomas Gleixner
2011-08-29 18:26:57 +0800

27 Aug, 2011

1 commit

f5b940997 All Arch: remove linkage for sys_nfsservctl system call ... Browse Code »

The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Linus Torvalds

NeilBrown
2011-08-27 06:09:58 +0800

26 Aug, 2011

2 commits

4c30c6f56 kernel/printk: do not turn off bootconsole in printk_late_init() if keep_bootcon ... Browse Code »
1

It seems that 7bf693951a8e ("console: allow to retain boot console via
boot option keep_bootcon") doesn't always achieve what it aims, as when
printk_late_init() runs it unconditionally turns off all boot consoles.
With this patch, I am able to see more messages on the boot console in
KVM guests than I can without, when keep_bootcon is specified.

I think it is appropriate for the relevant -stable trees. However, it's
more of an annoyance than a serious bug (ideally you don't need to keep
the boot console around as console handover should be working -- I was
encountering a situation where the console handover wasn't working and
not having the boot console available meant I couldn't see why).

Signed-off-by: Nishanth Aravamudan
Cc: David S. Miller
Cc: Alan Cox
Cc: Greg KH
Acked-by: Fabio M. Di Nitto
Cc: [2.6.39.x, 3.0.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nishanth Aravamudan
2011-08-26 07:25:34 +0800
be27425dc Add a personality to report 2.6.x version numbers ... Browse Code »
1

I ran into a couple of programs which broke with the new Linux 3.0
version. Some of those were binary only. I tried to use LD_PRELOAD to
work around it, but it was quite difficult and in one case impossible
because of a mix of 32bit and 64bit executables.

For example, all kind of management software from HP doesnt work, unless
we pretend to run a 2.6 kernel.

$ uname -a
Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux

$ hpacucli ctrl all show

Error: No controllers detected.

$ rpm -qf /usr/sbin/hpacucli
hpacucli-8.75-12.0

Another notable case is that Python now reports "linux3" from
sys.platform(); which in turn can break things that were checking
sys.platform() == "linux2":

https://bugzilla.mozilla.org/show_bug.cgi?id=664564

It seems pretty clear to me though it's a bug in the apps that are using
'==' instead of .startswith(), but this allows us to unbreak broken
programs.

This patch adds a UNAME26 personality that makes the kernel report a
2.6.40+x version number instead. The x is the x in 3.x.

I know this is somewhat ugly, but I didn't find a better workaround, and
compatibility to existing programs is important.

Some programs also read /proc/sys/kernel/osrelease. This can be worked
around in user space with mount --bind (and a mount namespace)

To use:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
gcc -o uname26 uname26.c
./uname26 program

Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds

Andi Kleen
2011-08-26 01:17:28 +0800

24 Aug, 2011

2 commits

35a177a08 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs ... Browse Code »

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix tracing builds inside the source tree
xfs: remove subdirectories
xfs: don't expect xfs headers to be in subdirectories

Linus Torvalds
2011-08-24 02:41:44 +0800
69dd3d8e2 Revert "irq: Always set IRQF_ONESHOT if no primary handler is specified" ... Browse Code »

This reverts commit f3637a5f2e2eb391ff5757bc83fb5de8f9726464.

It turns out that this breaks several drivers, one example being OMAP
boards which use the on-board OMAP UARTs and the omap-serial driver that
will not boot to userspace after the commit.

Paul Walmsley reports that enabling CONFIG_DEBUG_SHIRQ reveals 'IRQ
handler type mismatch' errors:

IRQ handler type mismatch for IRQ 74
current handler: serial idle
...

and the reason is that setting IRQF_ONESHOT will now result in those
interrupt handlers having different IRQF flags, and thus being
unsharable. So the commit log in the reverted commit:

"Since it is required for those users and
there is no difference for others it makes sense to add this flag
unconditionally."

is simply not true: there may not be any difference from a "actions at
irq time", but there is a *big* difference wrt this flag testing irq
management (see __setup_irq() in kernel/irq/manage.c).

One solution may be to stop verifying IRQF_ONESHOT in __setup_irq(), but
right now the safe course of action is to revert the change. Let's
revisit this in a later merge window.

Reported-by: Paul Walmsley
Cc: Sebastian Andrzej Siewior
Requested-by: Alan Cox
Acked-by: Thomas Gleixner
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-08-24 01:36:51 +0800

20 Aug, 2011

1 commit

5ccc38740 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
Revert "cfq: Remove special treatment for metadata rqs."
block: fix flush machinery for stacking drivers with differring flush flags
block: improve rq_affinity placement
blktrace: add FLUSH/FUA support
Move some REQ flags to the common bio/request area
allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
xen/blkback: Make description more obvious.
cfq-iosched: Add documentation about idling
block: Make rq_affinity = 1 work as expected
block: swim3: fix unterminated of_device_id table
block/genhd.c: remove useless cast in diskstats_show()
drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
bsg-lib: add module.h include
cfq-iosched: Reduce linked group count upon group destruction
blk-throttle: correctly determine sync bio
loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
loop: add management interface for on-demand device allocation
loop: replace linked list of allocated devices with an idr index
...

Linus Torvalds
2011-08-20 01:47:07 +0800

19 Aug, 2011

1 commit

d522a0d17 irqdesc: fix new kernel-doc warning ... Browse Code »

Fix kernel-doc warning in irqdesc.c:

Warning(kernel/irq/irqdesc.c:353): No description found for parameter 'owner'

Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Randy Dunlap
2011-08-19 05:12:48 +0800

18 Aug, 2011

3 commits

b4fd4ae6c Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: Fix build for CONFIG_PM_RUNTIME unset

Linus Torvalds
2011-08-18 04:15:25 +0800
2da9f365f Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Fix wrong assumption in match_held_lock

Linus Torvalds
2011-08-18 01:25:08 +0800
950d0a10d Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tip/linux-2.6-tip

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
irq: Track the owner of irq descriptor
irq: Always set IRQF_ONESHOT if no primary handler is specified
genirq: Fix wrong bit operation

Linus Torvalds
2011-08-18 01:23:50 +0800

14 Aug, 2011

1 commit

17f2ae7f6 PM / Domains: Fix build for CONFIG_PM_RUNTIME unset ... Browse Code »

Function genpd_queue_power_off_work() is not defined for
CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build
error to happen in that case. Fix the problem by making
pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too.

Signed-off-by: Rafael J. Wysocki

Rafael J. Wysocki
2011-08-14 19:34:31 +0800

13 Aug, 2011

1 commit

c59d87c46 xfs: remove subdirectories ... Browse Code »

Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
annoying subdirectories in the XFS source code. Besides the large
amount of file rename the only changes are to the Makefile, a few
files including headers with the subdirectory prefix, and the binary
sysctl compat code that includes a header under fs/xfs/ from
kernel/.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2011-08-13 05:21:35 +0800

12 Aug, 2011

2 commits

72fa59970 move RLIMIT_NPROC check from set_user() to do_execve_common() ... Browse Code »

The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only. But the check created new security threat: many poorly
written programs simply don't check setuid() return code and believe it
cannot fail if executed with root privileges. So, the check is removed
in this patch because of too often privilege escalations related to
buggy programs.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes. Most of daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag. With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested to re-check whether NPROC limit is still
exceeded at the moment of execve(). If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter step down
under the limit, the defered execve() failure because NPROC limit was
exceeded days ago would be unexpected. If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.

Similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

Reviewed-by: James Morris
Signed-off-by: Vasiliy Kulikov
Acked-by: NeilBrown
Signed-off-by: Linus Torvalds

Vasiliy Kulikov
2011-08-12 02:24:42 +0800
1d229d54d Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf symbols: Check '/tmp/perf-' symbol file ownership
perf sched: Usage leftover from trace -> script rename
perf sched: Do not delete session object prematurely
perf tools: Check $HOME/.perfconfig ownership
perf, x86: Add model 45 SandyBridge support
perf tools: Add support to install perf python extension
perf tools: do not look at ./config for configuration
perf tools: Make clean leaves some files
perf lock: Dropping unsupported ':r' modifier
perf probe: Fix coredump introduced by probe module option
jump label: Reduce the cycle count by changing the link order
perf report: Use ui__warning in some more places
perf python: Add PERF_RECORD_{LOST,READ,SAMPLE} routine tables
perf evlist: Introduce 'disable' method
trace events: Update version number reference to new 3.x scheme for EVENT_POWER_TRACING_DEPRECATED
perf buildid-cache: Zero out buffer of filenames when adding/removing buildid

Linus Torvalds
2011-08-12 00:03:48 +0800

11 Aug, 2011

2 commits

c09c47cae blktrace: add FLUSH/FUA support ... Browse Code »

Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
FUA follows WRITE, use the same 'F' flag for both cases and
distinguish them by their (relative) position. The end results
look like (other flags might be shown also):

- WRITE: W
- WRITE_FLUSH: FW
- WRITE_FUA: WF
- WRITE_FLUSH_FUA: FWF

Note that we reuse TC_BARRIER due to lack of bit space of act_mask
so that the older versions of blktrace tools will report flush
requests as barriers from now on.

Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Signed-off-by: Namhyung Kim
Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe

Namhyung Kim
2011-08-11 16:36:05 +0800
6af7e471e alarmtimers: Avoid possible denial of service with high freq periodic timers ... Browse Code »
1

Its possible to jam up the alarm timers by setting very small interval
timers, which will cause the alarmtimer subsystem to spend all of its time
firing and restarting timers. This can effectivly lock up a box.

A deeper fix is needed, closely mimicking the hrtimer code, but for now
just cap the interval to 100us to avoid userland hanging the system.

CC: Thomas Gleixner
CC: stable@kernel.org
Signed-off-by: John Stultz

John Stultz
2011-08-11 01:26:09 +0800

10 Aug, 2011

3 commits

ea7802f63 alarmtimers: Memset itimerspec passed into alarm_timer_get ... Browse Code »
1

Following common_timer_get, zero out the itimerspec passed in.

CC: Thomas Gleixner
CC: stable@kernel.org
Signed-off-by: John Stultz

John Stultz
2011-08-10 22:10:09 +0800
971c90bfa alarmtimers: Avoid possible null pointer traversal ... Browse Code »
1

We don't check if old_setting is non null before assigning it, so
correct this.

CC: Thomas Gleixner
CC: stable@kernel.org
Signed-off-by: John Stultz

John Stultz
2011-08-10 22:09:53 +0800
f2c0d0266 cap_syslog: don't use WARN_ONCE for CAP_SYS_ADMIN deprecation warning ... Browse Code »
1

syslog-ng versions before 3.3.0beta1 (2011-05-12) assume that
CAP_SYS_ADMIN is sufficient to access syslog, so ever since CAP_SYSLOG
was introduced (2010-11-25) they have triggered a warning.

Commit ee24aebffb75 ("cap_syslog: accept CAP_SYS_ADMIN for now")
improved matters a little by making syslog-ng work again, just keeping
the WARN_ONCE(). But still, this is a warning that writes a stack trace
we don't care about to syslog, sets a taint flag, and alarms sysadmins
when nothing worse has happened than use of an old userspace with a
recent kernel.

Convert the WARN_ONCE to a printk_once to avoid that while continuing to
give userspace developers a hint that this is an unwanted
backward-compatibility feature and won't be around forever.

Reported-by: Ralf Hildebrandt
Reported-by: Niels
Reported-by: Paweł Sikora
Signed-off-by: Jonathan Nieder
Liked-by: Gergely Nagy
Acked-by: Serge Hallyn
Acked-by: James Morris
Signed-off-by: Linus Torvalds

Jonathan Nieder
2011-08-10 09:22:22 +0800

09 Aug, 2011

1 commit

80e0401e3 lockdep: Fix wrong assumption in match_held_lock ... Browse Code »

match_held_lock() was assuming it was being called on a lock class
that had already seen usage.

This condition was true for bug-free code using lockdep_assert_held(),
since you're in fact holding the lock when calling it. However the
assumption fails the moment you assume the assertion can fail, which
is the whole point of having the assertion in the first place.

Anyway, now that there's more lockdep_is_held() users, notably
__rcu_dereference_check(), its much easier to trigger this since we
test for a number of locks and we only need to hold any one of them to
be good.

Reported-by: Sergey Senozhatsky
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1312547787.28695.2.camel@twins
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-08-09 17:57:35 +0800

06 Aug, 2011

1 commit

b77f0f3c1 jump label: Reduce the cycle count by changing the link order ... Browse Code »

In the course of testing jump labels for use with the CFS
bandwidth controller, Paul Turner, discovered that using jump
labels reduced the branch count and the instruction count, but
did not reduce the cycle count or wall time.

I noticed that having the jump_label.o included in the kernel
but not used in any way still caused this increase in cycle
count and wall time. Thus, I moved jump_label.o in the
kernel/Makefile, thus changing the link order, and presumably
moving it out of hot icache areas. This brought down the cycle
count/time as expected.

In addition to Paul's testing, I've tested the patch using a
single 'static_branch()' in the getppid() path, and basically
running tight loops of calls to getppid(). Here are my results
for the branch disabled case:

With jump labels turned on (CONFIG_JUMP_LABEL), branch disabled:

Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):

3,969,510,217 instructions # 0.864 IPC ( +-0.000% )
4,592,334,954 cycles ( +- 0.046% )
751,634,470 branches ( +- 0.000% )

1.722635797 seconds time elapsed ( +- 0.046% )

Jump labels turned off (CONFIG_JUMP_LABEL not set), branch
disabled:

Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):

4,009,611,846 instructions # 0.867 IPC ( +-0.000% )
4,622,210,580 cycles ( +- 0.012% )
771,662,904 branches ( +- 0.000% )

1.734341454 seconds time elapsed ( +- 0.022% )

Signed-off-by: Jason Baron
Cc: rth@redhat.com
Cc: a.p.zijlstra@chello.nl
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20110805204040.GG2522@redhat.com
Signed-off-by: Ingo Molnar
Tested-by: Paul Turner

Jason Baron
2011-08-06 05:57:33 +0800

05 Aug, 2011

2 commits

3272cab40 Merge branch 'linus' into perf/urgent ... Browse Code »

Merge reason: Include most of the merge window trees, to do fixes on top.

Signed-off-by: Ingo Molnar

Ingo Molnar
2011-08-05 16:33:55 +0800
f03683b8f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
slab, lockdep: Annotate the locks before using them
lockdep: Clear whole lockdep_map on initialization
slab, lockdep: Annotate slab -> rcu -> debug_object -> slab
lockdep: Fix up warning
lockdep: Fix trace_hardirqs_on_caller()
futex: Fix regression with read only mappings

Linus Torvalds
2011-08-05 10:44:04 +0800

04 Aug, 2011

3 commits

f59de8992 lockdep: Clear whole lockdep_map on initialization ... Browse Code »

lockdep_init_map() only initializes parts of lockdep_map and triggers
kmemcheck warning when it is copied as a whole. There isn't anything
to be gained by clearing selectively. memset() the whole structure
and remove loop for ->class_cache[] clearing.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=35532

Signed-off-by: Tejun Heo
Reported-and-tested-by: Christian Casteyde
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=35532
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110714131909.GJ3455@htj.dyndns.org
Signed-off-by: Ingo Molnar

Tejun Heo
2011-08-04 16:17:56 +0800
70a0686a7 lockdep: Fix up warning ... Browse Code »

On Sun, 2011-07-24 at 21:06 -0400, Arnaud Lacombe wrote:

> /src/linux/linux/kernel/lockdep.c: In function 'mark_held_locks':
> /src/linux/linux/kernel/lockdep.c:2471:31: warning: comparison of
> distinct pointer types lacks a cast

The warning is harmless in this case, but the below makes it go away.

Reported-by: Arnaud Lacombe
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1311588599.2617.56.camel@laptop
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-08-04 16:17:41 +0800
7d36b26be lockdep: Fix trace_hardirqs_on_caller() ... Browse Code »

Commit dd4e5d3ac4a ("lockdep: Fix trace_[soft,hard]irqs_[on,off]()
recursion") made a bit of a mess of the various checks and error
conditions.

In particular it moved the check for !irqs_disabled() before the
spurious enable test, resulting in some warnings.

Reported-by: Arnaud Lacombe
Reported-by: Dave Jones
Reported-and-tested-by: Sergey Senozhatsky
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1311679697.24752.28.camel@twins
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-08-04 16:17:36 +0800