19 May, 2010
4 commits
-
Add get_online_cpus/put_online_cpus to ensure that no cpu goes
offline during the flushing of the padata percpu queues.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
yield was used to wait until all references of the internal control
structure in use are dropped before it is freed. This patch implements
padata_flush_queues which actively flushes the padata percpu queues
in this case.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
padata_get_next needs to check whether the next object that
need serialization must be parallel processed by the local cpu.
This check was wrong implemented and returned always true,
so the try_again loop in padata_reorder was never taken. This
can lead to object leaks in some rare cases due to a race that
appears with the trylock in padata_reorder. The try_again loop
was not a good idea after all, because a cpu could take that
loop frequently, so we handle this with a timer instead.This patch adds a timer to handle the race that appears with
the trylock. If cpu1 queues an object to the reorder queue while
cpu2 holds the pd->lock but left the while loop in padata_reorder
already, cpu2 can't care for this object and cpu1 exits because
it can't get the lock. Usually the next cpu that takes the lock
cares for this object too. We need the timer just if this object
was the last one that arrives to the reorder queues. The timer
function sends it out in this case.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu
03 May, 2010
6 commits
-
This patch puts get_online_cpus/put_online_cpus around the places
we modify the padata cpumask to ensure that no cpu goes offline
during this operation.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
padata_alloc_pd set up queues for all possible cpus.
This patch changes this to set up the queues just for
the used cpus.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
might_sleep() was placed before mutex_lock() in some places.
We remove them because mutex_lock() does might_sleep() too.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
This patch makes the padata cpu hotplug code dependend on CONFIG_HOTPLUG_CPU.
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Scaling the maximum number of objects in the parallel
codepath can lead to out of memory problems on bigsmp
machines.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu
25 Apr, 2010
1 commit
-
On ppc64 you get this error:
$ setarch ppc -R true
setarch: ppc: Unrecognized architecturebecause uname still reports ppc64 as the machine.
So mask off the personality flags when checking for PER_LINUX32.
Signed-off-by: Andreas Schwab
Reviewed-by: Christoph Hellwig
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Apr, 2010
1 commit
-
creds_are_invalid() reads both cred->usage and cred->subscribers and then
compares them to make sure the number of processes subscribed to a cred struct
never exceeds the refcount of that cred struct.The problem is that this can cause a race with both copy_creds() and
exit_creds() as the two counters, whilst they are of atomic_t type, are only
atomic with respect to themselves, and not atomic with respect to each other.This means that if creds_are_invalid() can read the values on one CPU whilst
they're being modified on another CPU, and so can observe an evolving state in
which the subscribers count now is greater than the usage count a moment
before.Switching the order in which the counts are read cannot help, so the thing to
do is to remove that particular check.I had considered rechecking the values to see if they're in flux if the test
fails, but I can't guarantee they won't appear the same, even if they've
changed several times in the meantime.Note that this can only happen if CONFIG_DEBUG_CREDENTIALS is enabled.
The problem is only likely to occur with multithreaded programs, and can be
tested by the tst-eintr1 program from glibc's "make check". The symptoms look
like:CRED: Invalid credentials
CRED: At include/linux/cred.h:240
CRED: Specified credentials: ffff88003dda5878 [real][eff]
CRED: ->magic=43736564, put_addr=(null)
CRED: ->usage=766, subscr=766
CRED: ->*uid = { 0,0,0,0 }
CRED: ->*gid = { 0,0,0,0 }
CRED: ->security is ffff88003d72f538
CRED: ->security {359, 359}
------------[ cut here ]------------
kernel BUG at kernel/cred.c:850!
...
RIP: 0010:[] [] __invalid_creds+0x4e/0x52
...
Call Trace:
[] copy_creds+0x6b/0x23fNote the ->usage=766 and subscr=766. The values appear the same because
they've been re-read since the check was made.Reported-by: Roland McGrath
Signed-off-by: David Howells
Signed-off-by: James Morris
21 Apr, 2010
1 commit
-
Patch 570b8fb505896e007fd3bb07573ba6640e51851d:
Author: Mathieu Desnoyers
Date: Tue Mar 30 00:04:00 2010 +0100
Subject: CRED: Fix memory leak in error handlingattempts to fix a memory leak in the error handling by making the offending
return statement into a jump down to the bottom of the function where a
kfree(tgcred) is inserted.This is, however, incorrect, as it does a kfree() after doing put_cred() if
security_prepare_creds() fails. That will result in a double free if 'error'
is jumped to as put_cred() will also attempt to free the new tgcred record by
virtue of it being pointed to by the new cred record.Signed-off-by: David Howells
Signed-off-by: James Morris
19 Apr, 2010
1 commit
-
The lockdep facility temporarily disables lockdep checking by
incrementing the current->lockdep_recursion variable. Such
disabling happens in NMIs and in other situations where lockdep
might expect to recurse on itself.This patch therefore checks current->lockdep_recursion, disabling RCU
lockdep splats when this variable is non-zero. In addition, this patch
removes the "likely()", as suggested by Lai Jiangshan.Reported-by: Frederic Weisbecker
Reported-by: David Miller
Tested-by: Frederic Weisbecker
Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference:
Signed-off-by: Ingo Molnar
11 Apr, 2010
1 commit
-
When CONFIG_DEBUG_BLOCK_EXT_DEVT is set we decode the device
improperly by old_decode_dev and it results in an error while
hibernating with s2disk.All users already pass the new device number, so switch to
new_decode_dev().Signed-off-by: Jiri Slaby
Reported-and-tested-by: Jiri Kosina
Signed-off-by: "Rafael J. Wysocki"
08 Apr, 2010
1 commit
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix sched_getaffinity()
07 Apr, 2010
2 commits
-
- We weren't zeroing p->rss_stat[] at fork()
- Consequently sync_mm_rss() was dereferencing tsk->mm for kernel
threads and was oopsing.- Make __sync_task_rss_stat() static, too.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15648
[akpm@linux-foundation.org: remove the BUG_ON(!mm->rss)]
Reported-by: Troels Liebe Bentsen
Signed-off-by: KAMEZAWA Hiroyuki
"Michael S. Tsirkin"
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Force MSI irq handlers to run with interrupts disabled
06 Apr, 2010
5 commits
-
taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with
the following error:sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument)
This box has 128 threads and 16 bytes is enough to cover it.
Commit cd3d8031eb4311e516329aee03c79a08333141f1 (sched:
sched_getaffinity(): Allow less than NR_CPUS length) is
comparing this 16 bytes agains nr_cpu_ids.Fix it by comparing nr_cpu_ids to the number of bits in the
cpumask we pass in.Signed-off-by: Anton Blanchard
Reviewed-by: KOSAKI Motohiro
Cc: Sharyathi Nagesh
Cc: Ulrich Drepper
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Jack Steiner
Cc: Russ Anderson
Cc: Mike Travis
LKML-Reference:
Signed-off-by: Ingo Molnar -
Module refcounting is implemented with a per-cpu counter for speed.
However there is a race when tallying the counter where a reference may
be taken by one CPU and released by another. Reference count summation
may then see the decrement without having seen the previous increment,
leading to lower than expected count. A module which never has its
actual reference drop below 1 may return a reference count of 0 due to
this race.Module removal generally runs under stop_machine, which prevents this
race causing bugs due to removal of in-use modules. However there are
other real bugs in module.c code and driver code (module_refcount is
exported) where the callers do not run under stop_machine.Fix this by maintaining running per-cpu counters for the number of
module refcount increments and the number of refcount decrements. The
increments are tallied after the decrements, so any decrement seen will
always have its corresponding increment counted. The final refcount is
the difference of the total increments and decrements, preventing a
low-refcount from being returned.Signed-off-by: Nick Piggin
Acked-by: Rusty Russell
Signed-off-by: Linus Torvalds -
There have been a number of reports of people seeing the message:
"name_count maxed, losing inode data: dev=00:05, inode=3185"
in dmesg. These usually lead to people reporting problems to the filesystem
group who are in turn clueless what they mean.Eventually someone finds me and I explain what is going on and that
these come from the audit system. The basics of the problem is that the
audit subsystem never expects a single syscall to 'interact' (for some
wish washy meaning of interact) with more than 20 inodes. But in fact
some operations like loading kernel modules can cause changes to lots of
inodes in debugfs.There are a couple real fixes being bandied about including removing the
fixed compile time limit of 20 or not auditing changes in debugfs (or
both) but neither are small and obvious so I am not sending them for
immediate inclusion (I hope Al forwards a real solution next devel
window).In the meantime this patch simply adds 'audit' to the beginning of the
crap message so if a user sees it, they come blame me first and we can
talk about what it means and make sure we understand all of the reasons
it can happen and make sure this gets solved correctly in the long run.Signed-off-by: Eric Paris
Signed-off-by: Linus Torvalds -
* 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
eeepc-wmi: include slab.h
staging/otus: include slab.h from usbdrv.h
percpu: don't implicitly include slab.h from percpu.h
kmemcheck: Fix build errors due to missing slab.h
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
iwlwifi: don't include iwl-dev.h from iwl-devtrace.h
x86: don't include slab.h from arch/x86/include/asm/pgtable_32.hFix up trivial conflicts in include/linux/percpu.h due to
is_kernel_percpu_address() having been introduced since the slab.h
cleanup with the percpu_up.c splitup. -
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
module: add stub for is_module_percpu_address
percpu, module: implement and use is_kernel/module_percpu_address()
module: encapsulate percpu handling better and record percpu_size
05 Apr, 2010
4 commits
-
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Always build the powerpc perf_arch_fetch_caller_regs version
perf: Always build the stub perf_arch_fetch_caller_regs version
perf, probe-finder: Build fix on Debian
perf/scripts: Tuple was set from long in both branches in python_process_event()
perf: Fix 'perf sched record' deadlock
perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels
perf, x86: Fix AMD hotplug & constraint initialization
x86: Move notify_cpu_starting() callback to a later stage
x86,kgdb: Always initialize the hw breakpoint attribute
perf: Use hot regs with software sched switch/migrate events
perf: Correctly align perf event tracing buffer -
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: set_cpus_allowed_ptr(): Don't use rq->migration_thread after unlock
sched: Fix proc_sched_set_task() -
…nel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ring-buffer: Add missing unlock
tracing: Fix lockdep warning in global_clock()
03 Apr, 2010
11 commits
-
Now that software events use perf_arch_fetch_caller_regs() too, we
need the stub version to be always built in for archs that don't
implement it.Fixes the following build error in PARISC:
kernel/built-in.o: In function `perf_event_task_sched_out':
(.text.perf_event_task_sched_out+0x54): undefined reference to `perf_arch_fetch_caller_regs'Reported-by: Ingo Molnar
Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras -
* 'kgdb-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: Turn off tracing while in the debugger
kgdb: use atomic_inc and atomic_dec instead of atomic_set
kgdb: eliminate kgdb_wait(), all cpus enter the same way
kgdbts,sh: Add in breakpoint pc offset for superh
kgdb: have ebin2mem call probe_kernel_write once -
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
Freezer: Fix buggy resume test for tasks frozen with cgroup freezer
Freezer: Only show the state of tasks refusing to freeze -
The kernel debugger should turn off kernel tracing any time the
debugger is active and restore it on resume.Signed-off-by: Jason Wessel
Reviewed-by: Steven Rostedt -
Memory barriers should be used for the kgdb cpu synchronization. The
atomic_set() does not imply a memory barrier.Reported-by: Will Deacon
Signed-off-by: Jason Wessel -
This is a kgdb architectural change to have all the cpus (master or
slave) enter the same function.A cpu that hits an exception (wants to be the master cpu) will call
kgdb_handle_exception() from the trap handler and then invoke a
kgdb_roundup_cpu() to synchronize the other cpus and bring them into
the kgdb_handle_exception() as well.A slave cpu will enter kgdb_handle_exception() from the
kgdb_nmicallback() and set the exception state to note that the
processor is a slave.Previously the salve cpu would have called kgdb_wait(). This change
allows the debug core to change cpus without resuming the system in
order to inspect arch specific cpu information.Signed-off-by: Jason Wessel
-
Rather than call probe_kernel_write() one byte at a time, process the
whole buffer locally and pass the entire result in one go. This way,
architectures that need to do special handling based on the length can
do so, or we only end up calling memcpy() once.[sonic.zhang@analog.com: Reported original problem and preliminary patch]
Signed-off-by: Jason Wessel
Signed-off-by: Sonic Zhang
Signed-off-by: Mike Frysinger -
Trivial typo fix. rq->migration_thread can be NULL after
task_rq_unlock(), this is why we have "mt" which should be
used instead.Signed-off-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks
task_times(). Other places in kernel use nvcsw and nivcsw, which are
being cleared as well, Clear task statistics only.Reported-by: Török Edwin
Signed-off-by: Mike Galbraith
Cc: Hidetoshi Seto
Cc: Arjan van de Ven
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
perf sched record can deadlock a box should the holder of
handle->data->lock take an interrupt, and then attempt to
acquire an rq lock held by a CPU trying to acquire the
same lock. Disable interrupts.CPU0 CPU1
sched event with rq->lock held
grab handle->data->lock
spin on handle->data->lock
interrupt
try to grab rq->lockReported-by: Li Zefan
Signed-off-by: Mike Galbraith
Tested-by: Li Zefan
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
…eric/random-tracing into perf/urgent
01 Apr, 2010
2 commits
-
Scheduler's task migration events don't work because they always
pass NULL regs perf_sw_event(). The event hence gets filtered
in perf_swevent_add().Scheduler's context switches events use task_pt_regs() to get
the context when the event occured which is a wrong thing to
do as this won't give us the place in the kernel where we went
to sleep but the place where we left userspace. The result is
even more wrong if we switch from a kernel thread.Use the hot regs snapshot for both events as they belong to the
non-interrupt/exception based events family. Unlike page faults
or so that provide the regs matching the exact origin of the event,
we need to save the current context.This makes the task migration event working and fix the context
switch callchains and origin ip.Example: perf record -a -e cs
Before:
10.91% ksoftirqd/0 0 [k] 0000000000000000
|
--- (nil)
perf_callchain
perf_prepare_sample
__perf_event_overflow
perf_swevent_overflow
perf_swevent_add
perf_swevent_ctx_event
do_perf_sw_event
__perf_sw_event
perf_event_task_sched_out
schedule
run_ksoftirqd
kthread
kernel_thread_helperAfter:
23.77% hald-addon-stor [kernel.kallsyms] [k] schedule
|
--- schedule
|
|--60.00%-- schedule_timeout
| wait_for_common
| wait_for_completion
| blk_execute_rq
| scsi_execute
| scsi_execute_req
| sr_test_unit_ready
| |
| |--66.67%-- sr_media_change
| | media_changed
| | cdrom_media_changed
| | sr_block_media_changed
| | check_disk_change
| | cdrom_openv2: Always build perf_arch_fetch_caller_regs() now that software
events need that too. They don't need it from modules, unlike trace
events, so we keep the EXPORT_SYMBOL in trace_event_perf.cSigned-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: David Miller -
The trace event buffer used by perf to record raw sample events
is typed as an array of char and may then not be aligned to 8
by alloc_percpu().But we need it to be aligned to 8 in sparc64 because we cast
this buffer into a random structure type built by the TRACE_EVENT()
macro to store the traces. So if a random 64 bits field is accessed
inside, it may be not under an expected good alignment.Use an array of long instead to force the appropriate alignment, and
perform a compile time check to ensure the size in byte of the buffer
is a multiple of sizeof(long) so that its actual size doesn't get
shrinked under us.This fixes unaligned accesses reported while using perf lock
in sparc 64.Suggested-by: David Miller
Suggested-by: Tejun Heo
Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: David Miller
Cc: Steven Rostedt