09 May, 2009
2 commits
-
Signed-off-by: Al Viro
-
Fix kprobes to lock text_mutex around some arch_arm/disarm_kprobe() which
are newly added by commit de5bd88d5a5cce3cacea904d3503e5ebdb3852a2.
Signed-off-by: Masami Hiramatsu
Acked-by: Ananth N Mavinakayanahalli
Cc: Mathieu Desnoyers
Cc: Jim Keniston
Cc: Ingo Molnar
Signed-off-by: Linus Torvalds
07 May, 2009
2 commits
-
When building with gcc 3.2 I get thousands of warnings such as
include/linux/gfp.h: In function `allocflags_to_migratetype':
include/linux/gfp.h:105: warning: null format string
due to passing a NULL format string to warn_slowpath() in
#define __WARN() warn_slowpath(__FILE__, __LINE__, NULL)
Split this case out into a separate call. This also shrinks the kernel
slightly:
   text    data     bss     dec    hex filename
4802274  707668  712704 6222646 5ef336 vmlinux
   text    data     bss     dec    hex filename
4799027  703572  712704 6215303 5ed687 vmlinux
due to removing one argument from the commonly-called __WARN().
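The split described above can be sketched in userspace C. This is only an illustration under stated assumptions: warn_with_fmt(), warn_no_fmt(), MY_WARN() and MY_WARN_MSG() are hypothetical names, not the kernel's, and the format attribute is GCC/Clang specific. The point is that the message-less macro calls a function with no format parameter at all, so the compiler never sees a NULL format string and one argument disappears from every call site.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these names are kernel API. */
int warn_count;   /* number of warnings emitted so far */

/* Reporting path shared by both variants. */
static void warn_header(const char *file, int line)
{
	warn_count++;
	fprintf(stderr, "WARNING: at %s:%d\n", file, line);
}

/* Variant for callers that supply a message; fmt is checked like printf. */
void warn_with_fmt(const char *file, int line, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

void warn_with_fmt(const char *file, int line, const char *fmt, ...)
{
	va_list args;

	warn_header(file, line);
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
}

/* Variant for the message-less case: no format argument is passed. */
void warn_no_fmt(const char *file, int line)
{
	warn_header(file, line);
}

#define MY_WARN()        warn_no_fmt(__FILE__, __LINE__)
#define MY_WARN_MSG(...) warn_with_fmt(__FILE__, __LINE__, __VA_ARGS__)
```

A caller would write MY_WARN() where no text is needed and MY_WARN_MSG("value=%d\n", v) otherwise.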
[akpm@linux-foundation.org: reduce scope of `empty']
Acked-by: Jesper Nilsson
Acked-by: Johannes Weiner
Acked-by: Arjan van de Ven
Signed-off-by: Andi Kleen
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There is what we believe to be a false positive reported by lockdep.
inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock
The plan is to fix this via lockdep annotation, but that is proving to be
quite involved.
The patch flips the allocation over to GFP_NOFS to shut the warning up, for
the 2.6.30 release.
Hopefully we will fix this for real in 2.6.31. I'll queue a patch in -mm
to switch it back to GFP_KERNEL so we don't forget.
=================================
[ INFO: inconsistent lock state ]
2.6.30-rc2-next-20090417 #203
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&inode->inotify_mutex){+.+.?.}, at: [] inotify_inode_is_dead+0x35/0xb0
{RECLAIM_FS-ON-W} state was registered at:
[] mark_held_locks+0x68/0x90
[] lockdep_trace_alloc+0xf5/0x100
[] __kmalloc_node+0x31/0x1e0
[] kernel_event+0xe2/0x190
[] inotify_dev_queue_event+0x126/0x230
[] inotify_inode_queue_event+0xc6/0x110
[] vfs_create+0xcd/0x140
[] do_filp_open+0x88d/0xa20
[] do_sys_open+0x98/0x140
[] sys_open+0x20/0x30
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff
irq event stamp: 690455
hardirqs last enabled at (690455): [] _spin_unlock_irqrestore+0x44/0x80
hardirqs last disabled at (690454): [] _spin_lock_irqsave+0x32/0xa0
softirqs last enabled at (690178): [] __do_softirq+0x202/0x220
softirqs last disabled at (690157): [] call_softirq+0x1c/0x50
other info that might help us debug this:
2 locks held by kswapd0/380:
#0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x37/0x180
#1: (&type->s_umount_key#17){++++..}, at: [] shrink_dcache_memory+0x11f/0x1e0
stack backtrace:
Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
Call Trace:
[] print_usage_bug+0x19f/0x200
[] ? save_stack_trace+0x2f/0x50
[] mark_lock+0x4bb/0x6d0
[] ? check_usage_forwards+0x0/0xc0
[] __lock_acquire+0xc62/0x1ae0
[] ? slob_free+0x10c/0x370
[] lock_acquire+0xe1/0x120
[] ? inotify_inode_is_dead+0x35/0xb0
[] mutex_lock_nested+0x63/0x420
[] ? inotify_inode_is_dead+0x35/0xb0
[] ? inotify_inode_is_dead+0x35/0xb0
[] ? sched_clock+0x9/0x10
[] ? lock_release_holdtime+0x35/0x1c0
[] inotify_inode_is_dead+0x35/0xb0
[] dentry_iput+0xbc/0xe0
[] d_kill+0x33/0x60
[] __shrink_dcache_sb+0x2d3/0x350
[] shrink_dcache_memory+0x15a/0x1e0
[] shrink_slab+0x125/0x180
[] kswapd+0x560/0x7a0
[] ? isolate_pages_global+0x0/0x2c0
[] ? autoremove_wake_function+0x0/0x40
[] ? trace_hardirqs_on+0xd/0x10
[] ? kswapd+0x0/0x7a0
[] kthread+0x5b/0xa0
[] child_rip+0xa/0x20
[] ? restore_args+0x0/0x30
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
[eparis@redhat.com: fix audit too]
Cc: Al Viro
Cc: Matt Mackall
Cc: Christoph Lameter
Signed-off-by: Wu Fengguang
Signed-off-by: Eric Paris
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 May, 2009
5 commits
-
* 'timers/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clockevents: prevent endless loop in tick_handle_periodic()
-
* 'irq/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "genirq: assert that irq handlers are indeed running in hardirq context"
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: account system time properly
-
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kernel/posix-cpu-timers.c: fix sparse warning
dma-debug: remove broken dma memory leak detection for 2.6.30
locking: Documentation: lockdep-design.txt, fix note of state bits
-
…nel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: x86, mmiotrace: fix range test
tracing: fix ref count in splice pages
03 May, 2009
1 commit
-
Avoid setting less than two pages for vm_dirty_bytes: this is necessary to
avoid potential division by 0 (like the following) in get_dirty_limits().
[ 49.951610] divide error: 0000 [#1] PREEMPT SMP
[ 49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent
[ 49.952195] CPU 1
[ 49.952195] Modules linked in: pcspkr
[ 49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1
[ 49.952195] RIP: 0010:[] [] get_dirty_limits+0xe9/0x2c0
[ 49.952195] RSP: 0018:ffff88001de03a98 EFLAGS: 00010202
[ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
[ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
[ 49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000
[ 49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001
[ 49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78
[ 49.952195] FS: 00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000
[ 49.952195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0
[ 49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250)
[ 49.952195] Stack:
[ 49.952195] ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000
[ 49.952195] 00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218
[ 49.952195] 0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7
[ 49.952195] Call Trace:
[ 49.952195] [] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0
[ 49.952195] [] ? ext3_writeback_write_end+0x9e/0x120
[ 49.952195] [] generic_file_buffered_write+0x12f/0x330
[ 49.952195] [] __generic_file_aio_write_nolock+0x26d/0x460
[ 49.952195] [] ? generic_file_aio_write+0x52/0xd0
[ 49.952195] [] generic_file_aio_write+0x69/0xd0
[ 49.952195] [] ext3_file_write+0x26/0xc0
[ 49.952195] [] do_sync_write+0xf1/0x140
[ 49.952195] [] ? get_lock_stats+0x2a/0x60
[ 49.952195] [] ? autoremove_wake_function+0x0/0x40
[ 49.952195] [] vfs_write+0xcb/0x190
[ 49.952195] [] sys_write+0x50/0x90
[ 49.952195] [] system_call_fastpath+0x16/0x1b
[ 49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02
[ 49.952195] RIP [] get_dirty_limits+0xe9/0x2c0
[ 49.952195] RSP
[ 50.096523] ---[ end trace 008d7aa02f244d7b ]---
Signed-off-by: Andrea Righi
Cc: Peter Zijlstra
Cc: David Rientjes
Cc: Dave Chinner
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
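A minimal sketch of the lower bound described in the vm_dirty_bytes entry above, written as userspace C. Everything here is an assumption made for illustration: the 4 KiB page size, the names set_dirty_bytes() and dirty_bytes_setting, and the choice of -EINVAL as the rejection code are not taken from the commit message.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vm_dirty_bytes setting. */
unsigned long dirty_bytes_setting;

/*
 * Accept a new value only if it covers at least two pages; otherwise
 * leave the previous setting untouched and report an error.
 */
int set_dirty_bytes(unsigned long bytes)
{
	const unsigned long min_bytes = 2 * SKETCH_PAGE_SIZE;

	if (bytes < min_bytes)
		return -EINVAL;

	dirty_bytes_setting = bytes;
	return 0;
}
```

Rejecting the value at the point where it is set keeps the limit calculation itself free of special cases.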
02 May, 2009
1 commit
-
tick_handle_periodic() can lock up hard when a one shot clock event
device is used in combination with jiffies clocksource.
Avoid an endless loop issue by requiring that a highres valid
clocksource be installed before we call tick_periodic() in a loop when
using ONESHOT mode. The result is we will only increment jiffies once
per interrupt until a continuous hardware clocksource is available.
Without this, we can run into an endless loop, where each cycle through
the loop, jiffies is updated which increments time by tick_period or
more (due to clock steering), which can cause the event programming to
think the next event was before the newly incremented time and fail
causing tick_periodic() to be called again and the whole process loops
forever.
[ Impact: prevent hard lock up ]
Signed-off-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner
Cc: stable@kernel.org
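The loop described in the tick_handle_periodic() entry above can be modelled in a few lines of userspace C. This is a toy model, not the kernel code: struct tick_sim, program_event(), periodic_tick() and handle_periodic_oneshot() are hypothetical names, and only the jiffies-driven case is modelled, where every tick update also moves the current time forward by at least one period.

```c
#include <stdbool.h>

/* Toy model of a one-shot event device whose time source is jiffies. */
struct tick_sim {
	long long now;      /* current time, derived from the jiffies count */
	long long period;   /* nominal tick period */
	long long advance;  /* how far one jiffies update moves "now" (>= period) */
};

/* Programming succeeds only when the requested expiry lies in the future. */
static bool program_event(long long next, long long now)
{
	return next > now;
}

/* Stand-in for tick_periodic(): updating jiffies also advances time. */
static void periodic_tick(struct tick_sim *s)
{
	s->now += s->advance;
}

/*
 * Returns the number of programming attempts needed, or -1 if the loop
 * did not terminate within max_attempts. With guarded == true the extra
 * tick inside the loop is skipped, as it would be when no highres-valid
 * clocksource is installed.
 */
int handle_periodic_oneshot(struct tick_sim *s, bool guarded, int max_attempts)
{
	long long next = s->now;   /* the event that just fired was due now */
	int attempt;

	periodic_tick(s);
	for (attempt = 1; attempt <= max_attempts; attempt++) {
		next += s->period;
		if (program_event(next, s->now))
			return attempt;
		if (!guarded)
			periodic_tick(s);
	}
	return -1;
}
```

Without the guard, each failed attempt pushes the time at least as far ahead as the next expiry, so the expiry never catches up. With the guard, time stands still inside the loop and the expiry overtakes it after a few iterations.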
01 May, 2009
1 commit
-
This reverts commit 044d408409cc4e1bc75c886e27ca85c270db104c.
The commit added a warning when handle_IRQ_event() is called outside
of hard interrupt context. This breaks the generic tasklet based
interrupt resend mechanism which is used when the hardware has no way
to retrigger the interrupt. So we get a warning for a use case which
is correct and worked for years. Remove it.
Signed-off-by: Thomas Gleixner
30 Apr, 2009
1 commit
-
Sparse reports the following in kernel/posix-cpu-timers.c:
warning: symbol 'firing' shadows an earlier one
Signed-off-by: H Hartley Sweeten
Cc: Subrata Modak
LKML-Reference:
Signed-off-by: Ingo Molnar
29 Apr, 2009
2 commits
-
Andrew Gallatin reported that IRQ and SOFTIRQ times were
sometimes not reported correctly on recent kernels, and even
bisected to commit 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4
([PATCH] fix scaled & unscaled cputime accounting) as the first
bad commit.
Further analysis showed that commit
79741dd35713ff4f6fd0eafd59fa94e8a4ba922d ([PATCH] idle cputime
accounting) was the real cause of the problem.
account_process_tick() was not taking into account timer IRQ
interrupting the idle task servicing a hard or soft irq.
On mostly idle cpu, irqs were thus not accounted and top or
mpstat could tell user/admin that cpu was 100 % idle, 0.00 %
irq, 0.00 % softirq, while it was not.
[ Impact: fix occasionally incorrect CPU statistics in top/mpstat ]
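The decision this fix changes can be sketched as a small classification function. This is an illustration only: enum tick_class and classify_tick() are hypothetical names, and the three boolean inputs stand in for whatever the kernel actually inspects at tick time. The nested_irq input is the case the message describes, where the timer interrupt arrives while the idle task is already servicing a hard or soft irq.

```c
#include <stdbool.h>

/* Hypothetical buckets a timer tick can be charged to. */
enum tick_class { TICK_USER, TICK_SYSTEM, TICK_IDLE };

/*
 * user_mode:    the tick interrupted user-space code
 * is_idle_task: the interrupted task is the idle task
 * nested_irq:   the tick interrupted a hard or soft irq handler
 *
 * A tick that lands on the idle task only counts as idle time when the
 * idle task was not busy servicing an interrupt; otherwise it goes down
 * the system path, where irq and softirq time would be attributed.
 */
enum tick_class classify_tick(bool user_mode, bool is_idle_task, bool nested_irq)
{
	if (user_mode)
		return TICK_USER;
	if (!is_idle_task || nested_irq)
		return TICK_SYSTEM;
	return TICK_IDLE;
}
```

Dropping the nested_irq test reproduces the reported symptom in this model: every tick on the idle task is charged as idle, however much interrupt work it was doing.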
Reported-by: Andrew Gallatin
Re-reported-by: Andrew Morton
Signed-off-by: Eric Dumazet
Acked-by: Martin Schwidefsky
Cc: rick.jones2@hp.com
Cc: brice@myri.com
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
The pages allocated for the splice binary buffer did not initialize
the ref count correctly. This caused pages not to be freed and causes
a drastic memory leak.
Thanks to logdev I was able to trace the tracer to find where the leak
was.
[ Impact: stop memory leak when using splice ]
Signed-off-by: Steven Rostedt
Signed-off-by: Ingo Molnar
27 Apr, 2009
4 commits
-
…s/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
ptrace: ptrace_attach: fix the usage of ->cred_exec_mutex
-
ptrace_attach() needs task->cred_exec_mutex, not current->cred_exec_mutex.
Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Acked-by: David Howells
Signed-off-by: James Morris
-
…git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/irq: mark NUMA_MIGRATE_IRQ_DESC broken
x86, irq: Remove IRQ_DISABLED check in process context IRQ move
-
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking: clarify kernel-taint warning message
lockdep, x86: account for irqs enabled in paranoid_exit
lockdep: more robust lockdep_map init sequence
25 Apr, 2009
1 commit
-
Commit c751085943362143f84346d274e0011419c84202 ("PM/Hibernate: Wait for
SCSI devices scan to complete during resume") added a call to
scsi_complete_async_scans() to software_resume(), so that it waited for
the SCSI scanning to complete, but the call was added at a wrong place.
Namely, it should have been added after wait_for_device_probe(), which
is called only if the image partition hasn't been specified yet. Also,
it's reasonable to check if the image partition is present and only wait
for the device probing and SCSI scanning to complete if it is not the
case.
Additionally, since noresume is checked right at the beginning of
software_resume() and the function returns immediately if it's set, it
doesn't make sense to check it once again later.
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds
24 Apr, 2009
1 commit
-
Slow-work appears to delete its timer as soon as the first user
unregisters, even though other users could be active. At the same time, it
never seems to delete slow_work_oom_timer. Arrange for both to happen in
the shutdown path.
Signed-off-by: Jonathan Corbet
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
23 Apr, 2009
1 commit
-
Andi Kleen reported this message triggering on non-lockdep kernels:
Disabling lockdep due to kernel taint
Clarify the message to say 'lock debugging' - debug_locks_off()
turns off all things lock debugging, not just lockdep.
[ Impact: change kernel warning message text ]
Reported-by: Andi Kleen
Cc: Peter Zijlstra
Cc: Andrew Morton
Signed-off-by: Ingo Molnar
22 Apr, 2009
2 commits
-
Add enable() and disable() callbacks for clocksources.
This allows us to put unused clocksources in power save mode. The
functions clocksource_enable() and clocksource_disable() wrap the
callbacks and are inserted in the timekeeping code to enable before use
and disable after switching to a new clocksource.
Signed-off-by: Magnus Damm
Acked-by: John Stultz
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Pass clocksource pointer to the read() callback for clocksources. This
allows us to share the callback between multiple instances.
[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Magnus Damm
Acked-by: John Stultz
Cc: Thomas Gleixner
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
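The two clocksource entries above (optional enable()/disable() callbacks, and read() receiving the clocksource pointer) can be sketched together in userspace C. All names below are hypothetical stand-ins, and the exact callback signatures are an assumption; the entries only say that the callbacks exist, that wrappers call them, and that read() gets the clocksource pointer so one callback can serve several instances.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified clocksource with optional power callbacks. */
struct clocksource_sketch {
	const char *name;
	uint64_t (*read)(struct clocksource_sketch *cs);
	int (*enable)(struct clocksource_sketch *cs);    /* optional */
	void (*disable)(struct clocksource_sketch *cs);  /* optional */
};

/* Hypothetical device carrying per-instance state around the clocksource. */
struct fake_timer {
	struct clocksource_sketch cs;
	uint64_t counter;
	int powered;
};

/* Recover the enclosing device from the embedded clocksource pointer. */
#define timer_of(ptr) \
	((struct fake_timer *)((char *)(ptr) - offsetof(struct fake_timer, cs)))

/* One set of callbacks shared by every fake_timer instance. */
uint64_t fake_timer_read(struct clocksource_sketch *cs)
{
	return timer_of(cs)->counter;
}

int fake_timer_enable(struct clocksource_sketch *cs)
{
	timer_of(cs)->powered = 1;
	return 0;
}

void fake_timer_disable(struct clocksource_sketch *cs)
{
	timer_of(cs)->powered = 0;
}

/* Wrappers: a missing callback means there is nothing to do. */
int cs_enable(struct clocksource_sketch *cs)
{
	return cs->enable ? cs->enable(cs) : 0;
}

void cs_disable(struct clocksource_sketch *cs)
{
	if (cs->disable)
		cs->disable(cs);
}

/* Enable the new source before use; disable the old one after switching. */
int cs_switch(struct clocksource_sketch **active, struct clocksource_sketch *new_cs)
{
	struct clocksource_sketch *old = *active;
	int ret = cs_enable(new_cs);

	if (ret)
		return ret;
	*active = new_cs;
	if (old)
		cs_disable(old);
	return 0;
}
```

Because the callbacks receive the clocksource pointer, two fake_timer instances can share fake_timer_read() and still return their own counters.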
21 Apr, 2009
1 commit
-
is_under() will DTRT anyway. And yes, is_subdir() behaviour
is intentional.
Signed-off-by: Al Viro
20 Apr, 2009
1 commit
-
Commit 900af0d973856d6feb6fc088c2d0d3fde57707d3 (PM: Change suspend
code ordering) changed the ordering of suspend code in such a way
that the platform .prepare() callback is now executed after the
device drivers' late suspend callbacks have run. Unfortunately, this
turns out to break ARM platforms that need to talk via I2C to power
control devices during the .prepare() callback.
For this reason introduce two new platform suspend callbacks,
.prepare_late() and .wake(), that will be called just prior to
disabling non-boot CPUs and right after bringing them back on line,
respectively, and use them instead of .prepare() and .finish() for
ACPI suspend. Make the PM core execute the .prepare() and .finish()
platform suspend callbacks where they were executed previously (that
is, right after calling the regular suspend methods provided by
device drivers and right before executing their regular resume
methods, respectively).
It is not necessary to make analogous changes to the hibernation
code and data structures at the moment, because they are only used
by ACPI platforms.
Signed-off-by: Rafael J. Wysocki
Reported-by: Russell King
Acked-by: Len Brown
19 Apr, 2009
1 commit
-
This function is not actually used right now, since the original use
case for it was done with insert_resource_expand_to_fit() instead.
However, we now have another usage case that wants to basically do a
"reserve IO resource, splitting around existing resources", however that
one doesn't actually want the "recurse into the conflicting resource"
logic at all.
And since recursing into the conflicting resource was the most complex
part, and isn't wanted, just remove it. Maybe we'll some day want both
versions, but we can just resurrect the logic then.
Tested-by: Yinghai Lu
Signed-off-by: Linus Torvalds
18 Apr, 2009
1 commit
-
Steven Rostedt reported:
> OK, I think I figured this bug out. This is a lockdep issue with respect
> to tracepoints.
>
> The trace points in lockdep are called all the time. Outside the lockdep
> logic. But if lockdep were to trigger an error / warning (which this run
> did) we might be in trouble. For new locks, like the dentry->d_lock, that
> are created, they will not get a name:
>
> void lockdep_init_map(struct lockdep_map *lock, const char *name,
> struct lock_class_key *key, int subclass)
> {
> if (unlikely(!debug_locks))
> return;
>
> When a problem is found by lockdep, debug_locks becomes false. Thus we
> stop allocating names for locks. This dentry->d_lock I had, now has no
> name. Worse yet, I have CONFIG_DEBUG_VM set, that scrambles non
> initialized memory. Thus, when the trace point was hit, it had junk for
> the lock->name, and the machine crashed.
Ah, nice catch. I think we should put at least the name in regardless.
Ensure we at least initialize the trivial entries of the depmap so that
they can be relied upon, even when lockdep itself decided to pack up and
go home.
[ Impact: fix lock tracing after lockdep warnings. ]
Reported-by: Steven Rostedt
Signed-off-by: Peter Zijlstra
Acked-by: Steven Rostedt
Cc: Andrew Morton
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
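The idea in the lockdep entry above, initializing the trivial fields before the early return, can be sketched as follows. This is a userspace illustration with hypothetical names (debug_locks_sketch, struct lockdep_map_sketch, init_map_sketch()); the registered field merely stands in for the heavier bookkeeping that is skipped once lock debugging has been switched off.

```c
#include <stddef.h>

/* Hypothetical stand-in for the global "lock debugging is on" flag. */
int debug_locks_sketch = 1;

struct lockdep_map_sketch {
	const char *name;
	const void *key;
	int registered;   /* stand-in for the heavier lockdep bookkeeping */
};

void init_map_sketch(struct lockdep_map_sketch *lock, const char *name,
		     const void *key)
{
	/* Trivial fields first, so they are valid whatever happens next. */
	lock->name = name;
	lock->key = key;
	lock->registered = 0;

	/* Only the expensive part is skipped once debugging is off. */
	if (!debug_locks_sketch)
		return;

	lock->registered = 1;
}
```

A tracepoint that reads lock->name after debugging has been disabled then sees the real name instead of uninitialized memory.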
17 Apr, 2009
5 commits
-
…nel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix branch tracer header
tracing: Fix power tracer header
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Avoid printing sched_group::__cpu_power for default case
tracing, sched: mark get_parent_ip() notrace
-
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kernel/softirq.c: fix sparse warning
rcu: Make hierarchical RCU less IPI-happy
-
Fix sparse warning in kernel/softirq.c.
warning: do-while statement is not a compound statement
Signed-off-by: H Hartley Sweeten
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Commit 46e0bb9c12f4 ("sched: Print sched_group::__cpu_power
in sched_domain_debug") produces a messy dmesg output while
attempting to print the sched_group::__cpu_power for each
group in the sched_domain hierarchy.
Fix this by avoiding printing the __cpu_power for default cases.
(i.e, __cpu_power == SCHED_LOAD_SCALE).
[ Impact: reduce syslog clutter ]
Reported-by: Tony Luck
Signed-off-by: Gautham R Shenoy
Fixed-by: Tony Luck
Cc: a.p.zijlstra@chello.nl
LKML-Reference:
Signed-off-by: Ingo Molnar
16 Apr, 2009
1 commit
-
Don't try and predeclare inline funcs like this:
static inline void wait_migrated_callbacks(void)
...
static void _rcu_barrier(enum rcu_barrier type)
{
...
wait_migrated_callbacks();
}
...
static inline void wait_migrated_callbacks(void)
{
wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
}
as it upsets some versions of gcc under some circumstances:
kernel/rcupdate.c: In function `_rcu_barrier':
kernel/rcupdate.c:125: sorry, unimplemented: inlining failed in call to 'wait_migrated_callbacks': function body not available
kernel/rcupdate.c:152: sorry, unimplemented: called from here
This can be dealt with by simply putting the static variables (rcu_migrate_*)
at the top, and moving the implementation of the function up so that it
replaces its forward declaration.
Signed-off-by: David Howells
Cc: Dipankar Sarma
Cc: Paul E. McKenney
Signed-off-by: Linus Torvalds
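The reordering described in the entry above, shown as a userspace analogue. The names migrate_type_count, migrated_callbacks_pending() and barrier_sketch() are hypothetical, and the helper only tests a counter instead of sleeping on a wait queue. What matters is the layout: the state the helper needs is declared first, and the helper's body appears before its first caller, so no forward declaration of an inline function is required.

```c
/* State used by the helper comes first ... */
static int migrate_type_count;

/* ... then the full definition, in place of the forward declaration. */
static inline int migrated_callbacks_pending(void)
{
	return migrate_type_count != 0;
}

/* The caller now sees the function body and can inline it. */
int barrier_sketch(void)
{
	return migrated_callbacks_pending() ? -1 : 0;
}
```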
15 Apr, 2009
1 commit
-
Remove code handling bio_alloc failure with __GFP_WAIT.
Signed-off-by: Nikanth Karthikesan
Signed-off-by: Jens Axboe
14 Apr, 2009
5 commits
-
As discussed in the thread here:
http://marc.info/?l=linux-kernel&m=123964468521142&w=2
Eric W. Biederman observed:
> It looks like some additional bugs have slipped in since last I looked.
>
> set_irq_affinity does this:
> #ifdef CONFIG_GENERIC_PENDING_IRQ
> if (desc->status & IRQ_MOVE_PCNTXT || desc->status & IRQ_DISABLED) {
> cpumask_copy(desc->affinity, cpumask);
> desc->chip->set_affinity(irq, cpumask);
> } else {
> desc->status |= IRQ_MOVE_PENDING;
> cpumask_copy(desc->pending_mask, cpumask);
> }
> #else
>
> That IRQ_DISABLED case is a software state and as such it has nothing to
> do with how safe it is to move an irq in process context.
[...]
>
> The only reason we migrate MSIs in interrupt context today is that there
> wasn't infrastructure for support migration both in interrupt context
> and outside of it.
Yes. The idea here was to force the MSI migration to happen in process
context. One of the patches in the series did
disable_irq(dev->irq);
irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
enable_irq(dev->irq);
with the above patch adding irq/manage code check for interrupt disabled
and moving the interrupt in process context.
IIRC, there was no IRQ_MOVE_PCNTXT when we were developing this HPET
code and we ended up having this ugly hack. IRQ_MOVE_PCNTXT was there
when we eventually submitted the patch upstream. But, looks like I did a
blind rebasing instead of using IRQ_MOVE_PCNTXT in hpet MSI code.
Below patch fixes this. i.e., revert commit 932775a4ab622e3c99bd59f14cc
and add PCNTXT to HPET MSI setup. Also removes copying of desc->affinity
in generic code as set_affinity routines are doing it internally.
Reported-by: "Eric W. Biederman"
Signed-off-by: Venkatesh Pallipadi
Acked-by: "Eric W. Biederman"
Cc: "Li Shaohua"
Cc: Gary Hade
Cc: "lcm@us.ibm.com"
Cc: suresh.b.siddha@intel.com
LKML-Reference:
Signed-off-by: Ingo Molnar
-
This patch fixes a hierarchical-RCU performance bug located by Anton
Blanchard. The problem stems from a misguided attempt to provide a
work-around for jiffies-counter failure. This work-around uses a per-CPU
n_rcu_pending counter, which is incremented on each call to rcu_pending(),
which in turn is called from each scheduling-clock interrupt. Each CPU
then treats this counter as a surrogate for the jiffies counter, so
that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
counter will cause RCU to invoke force_quiescent_state(), which in turn
will (among other things) send resched IPIs to CPUs that have thus far
failed to pass through an RCU quiescent state.
Unfortunately, each CPU resets only its own counter after sending a
batch of IPIs. This means that the other CPUs will also (needlessly)
send -another- round of IPIs, for a full N-squared set of IPIs in the
worst case every three scheduler-clock ticks until the grace period
finally ends. It is not reasonable for a given CPU to reset each and
every n_rcu_pending for all the other CPUs, so this patch instead simply
disables the jiffies-counter "training wheels", thus eliminating the
excessive IPIs.
Note that the jiffies-counter IPIs do not have this problem due to
the fact that the jiffies counter is global, so that the CPU sending
the IPIs can easily reset things, thus preventing the other CPUs from
sending redundant IPIs.
Note also that the n_rcu_pending counter remains, as it will continue to
be used for tracing. It may also see use to update the jiffies counter,
should an appropriate kick-the-jiffies-counter API appear.
Located-by: Anton Blanchard
Tested-by: Anton Blanchard
Signed-off-by: Paul E. McKenney
Cc: anton@samba.org
Cc: akpm@linux-foundation.org
Cc: dipankar@in.ibm.com
Cc: manfred@colorfullife.com
Cc: cl@linux-foundation.org
Cc: josht@linux.vnet.ibm.com
Cc: schamp@sgi.com
Cc: niv@us.ibm.com
Cc: dvhltc@us.ibm.com
Cc: ego@in.ibm.com
Cc: laijs@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: peterz@infradead.org
Cc: penberg@cs.helsinki.fi
Cc: andi@firstfloor.org
Cc: "Paul E. McKenney"
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Before patch:
# tracer: branch
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
-2981 [000] 24008.872738: [ ok ] trace_irq_handler_exit:irq_event_types.h:41
-2981 [000] 24008.872742: [ ok ] note_interrupt:spurious.c:229
...
After patch:
# tracer: branch
#
# TASK-PID CPU# TIMESTAMP CORRECT FUNC:FILE:LINE
# | | | | | |
-2985 [000] 26329.142970: [ ok ] slab_free:slub.c:1776
-2985 [000] 26329.142972: [ ok ] trace_kmem_cache_free:kmem_event_types.h:191
...
Signed-off-by: Zhao Lei
Acked-by: Frederic Weisbecker
Cc: Steven Rostedt
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Impact: remove overly redundant tracing entries
When tracer is "function" or "function_graph", way too many
"get_parent_ip" entries are recorded in ring_buffer.
Signed-off-by: Lai Jiangshan
Acked-by: Frederic Weisbecker
Acked-by: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Impact: cleanup, fix
Clean up sys_shutdown() exit path. Factor out common code. Return
correct error code instead of always 0 on failure.
Signed-off-by: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds