07 Jun, 2014
23 commits
-
+ fix small typo
Signed-off-by: Fabian Frederick
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
trailing whitespace
Signed-off-by: Paul McQuade
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use #include instead of
Use #include instead ofSigned-off-by: Paul McQuade
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
schedstr, sleepstr and kvmstr are only used in strcmp & strlen
Signed-off-by: Fabian Frederick
Cc: Paul Gortmaker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Fabian Frederick
Cc: Paul Gortmaker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
-uid->gid
-split some function declarations
-if/then/else warningSigned-off-by: Fabian Frederick
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When writing to a sysctl string, each write, regardless of VFS position,
begins writing the string from the start. This means the contents of
the last write to the sysctl controls the string contents instead of the
first:open("/proc/sys/kernel/modprobe", O_WRONLY) = 1
write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
write(1, "/bin/true", 9) = 9
close(1) = 0$ cat /proc/sys/kernel/modprobe
/bin/trueExpected behaviour would be to have the sysctl be "AAAA..." capped at
maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
contents of the second write. Similarly, multiple short writes would
not append to the sysctl.The old behavior is unlike regular POSIX files enough that doing audits
of software that interact with sysctls can end up in unexpected or
dangerous situations. For example, "as long as the input starts with a
trusted path" turns out to be an insufficient filter, as what must also
happen is for the input to be entirely contained in a single write
syscall -- not a common consideration, especially for high level tools.This provides kernel.sysctl_writes_strict as a way to make this behavior
act in a less surprising manner for strings, and disallows non-zero file
position when writing numeric sysctls (similar to what is already done
when reading from non-zero file positions). For now, the default (0) is
to warn about non-zero file position use, but retain the legacy
behavior. Setting this to -1 disables the warning, and setting this to
1 enables the file position respecting behavior.[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: move misplaced hunk, per Randy]
Signed-off-by: Kees Cook
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Consolidate buffer length checking with new-line/end-of-line checking.
Additionally, instead of reading user memory twice, just do the
assignment during the loop.This change doesn't affect the potential races here. It was already
possible to read a sysctl that was in the middle of a write. In both
cases, the string will always be NULL terminated. The pre-existing race
remains a problem to be solved.Signed-off-by: Kees Cook
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When writing to a sysctl string, each write, regardless of VFS position,
began writing the string from the start. This meant the contents of the
last write to the sysctl controlled the string contents instead of the
first.This misbehavior was featured in an exploit against Chrome OS. While
it's not in itself a vulnerability, it's a weirdness that isn't on the
mind of most auditors: "This filter looks correct, the first line
written would not be meaningful to sysctl" doesn't apply here, since the
size of the write and the contents of the final write are what matter
when writing to sysctls.This adds the sysctl kernel.sysctl_writes_strict to control the write
behavior. The default (0) reports when VFS position is non-0 on a
write, but retains legacy behavior, -1 disables the warning, and 1
enables the position-respecting behavior.The long-term plan here is to wait for userspace to be fixed in response
to the new warning and to then switch the default kernel behavior to the
new position-respecting behavior.This patch (of 4):
The char buffer arguments are needlessly cast in weird places. Clean it
up so things are easier to read.Signed-off-by: Kees Cook
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
+ some pr_warning -> pr_warn and checkpatch warning fixes
Signed-off-by: Fabian Frederick
Cc: Eric Biederman
Cc: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg. This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).Usage: add "crash_kexec_post_notifiers" to kernel boot option.
Note that this actually increases risks of the failure of kdump. This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.Signed-off-by: Masami Hiramatsu
Acked-by: Motohiro Kosaki
Acked-by: Vivek Goyal
Cc: Eric Biederman
Cc: Yoshihiro YUNOMAE
Cc: Satoru MORIYA
Cc: Tomoki Sekiyama
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There is a longstanding problem related to CPU hotplug which causes IPIs
to be delivered to offline CPUs, and the smp-call-function IPI handler
code prints out a warning whenever this is detected. Every once in a
while this (usually harmless) warning gets reported on LKML, but so far
it has not been completely fixed. Usually the solution involves finding
out the IPI sender and fixing it by adding appropriate synchronization
with CPU hotplug.However, while going through one such internal bug reports, I found that
there is a significant bug in the receiver side itself (more
specifically, in stop-machine) that can lead to this problem even when
the sender code is perfectly fine. This patchset fixes that
synchronization problem in the CPU hotplug stop-machine code.Patch 1 adds some additional debug code to the smp-call-function
framework, to help debug such issues easily.Patch 2 modifies the stop-machine code to ensure that any IPIs that were
sent while the target CPU was online, would be noticed and handled by
that CPU without fail before it goes offline. Thus, this avoids
scenarios where IPIs are received on offline CPUs (as long as the sender
uses proper hotplug synchronization).In fact, I debugged the problem by using Patch 1, and found that the
payload of the IPI was always the block layer's trigger_softirq()
function. But I was not able to find anything wrong with the block
layer code. That's when I started looking at the stop-machine code and
realized that there is a race-window which makes the IPI _receiver_ the
culprit, not the sender. Patch 2 fixes that race and hence this should
put an end to most of the hard-to-debug IPI-to-offline-CPU issues.This patch (of 2):
Today the smp-call-function code just prints a warning if we get an IPI
on an offline CPU. This info is sufficient to let us know that
something went wrong, but often it is very hard to debug exactly who
sent the IPI and why, from this info alone.In most cases, we get the warning about the IPI to an offline CPU,
immediately after the CPU going offline comes out of the stop-machine
phase and reenables interrupts. Since all online CPUs participate in
stop-machine, the information regarding the sender of the IPI is already
lost by the time we exit the stop-machine loop. So even if we dump the
stack on each CPU at this point, we won't find anything useful since all
of them will show the stack-trace of the stopper thread. So we need a
better way to figure out who sent the IPI and why.To achieve this, when we detect an IPI targeted to an offline CPU, loop
through the call-single-data linked list and print out the payload
(i.e., the name of the function which was supposed to be executed by the
target CPU). This would give us an insight as to who might have sent
the IPI and help us debug this further.[akpm@linux-foundation.org: correctly suppress warning output on second and later occurrences]
Signed-off-by: Srivatsa S. Bhat
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Tejun Heo
Cc: Rusty Russell
Cc: Frederic Weisbecker
Cc: Christoph Hellwig
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Borislav Petkov
Cc: Steven Rostedt
Cc: Mike Galbraith
Cc: Gautham R Shenoy
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Rafael J. Wysocki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Now that we have kernel_sigaction() we can change wait_for_helper() to
use it and cleans up the code a bit.Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Now that allow_signal() is really trivial we can unify it with
disallow_signal(). Add the new helper, kernel_sigaction(), and
reimplement allow_signal/disallow_signal as a trivial wrappers.This saves one EXPORT_SYMBOL() and the new helper can have more users.
Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
disallow_signal() simply sets SIG_IGN, this is not enough and
recalc_sigpending() is simply pointless because in can never change the
state of TIF_SIGPENDING.If we ignore a signal, we also need to do flush_sigqueue_mask() for the
case when this signal is pending, this way recalc_sigpending() can
actually clear TIF_SIGPENDING and we do not "leak" the allocated
siginfo's.Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
allow_signal() does sigdelset(current->blocked) due to historic reason,
previously it could be called by a daemonize()'ed kthread, and
daemonize() played with current->blocked.Now that daemonize() has gone away we can remove sigdelset() and
recalc_sigpending(). If a user really wants to unblock a signal, it
must use sigprocmask() or set_current_block() explicitely.Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move the declaration/definition of allow_signal/disallow_signal to
signal.h/signal.c. The new place is more logical and allows to use the
static helpers in signal.c (see the next changes).While at it, make them return void and remove the valid_signal() check.
Nobody checks the returned value, and in-kernel users must not pass the
wrong signal number.Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The usage of "task_struct *t" and "current" in do_sigaction() looks really
annoying and chaotic. Initially "t" is used as a cached value of current
but not consistently, then it is reused as a loop variable and we have to
use "current" again.Clean up this mess and also convert the code to use for_each_thread().
Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
"rm_from_queue_full" looks ugly and misleading, especially now that
rm_from_queue() has gone away. Rename it to flush_sigqueue_mask(), this
matches flush_sigqueue() we already have.Also remove the obsolete comment which explains the difference with
rm_from_queue() we already killed.Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
rm_from_queue() doesn't make sense. The only caller, prepare_signal(),
can use rm_from_queue_full() with the same effect.While at it, change prepare_signal() to use for_each_thread() instead of
do/while_each_thread.Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cosmetic, but siginitset(0) looks a bit strange, sigemptyset() is what
do_sigtimedwait() needs.Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Al Viro
Cc: David Woodhouse
Cc: Frederic Weisbecker
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Cc: Richard Weinberger
Cc: Steven Rostedt
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__wake_up_bit() checks waitqueue_active() and thus the caller needs mb()
as wake_up_bit() documents, fix task_clear_jobctl_trapping().Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When tracing a process in another pid namespace, it's important for fork
event messages to contain the child's pid as seen from the tracer's pid
namespace, not the parent's. Otherwise, the tracer won't be able to
correlate the fork event with later SIGTRAP signals it receives from the
child.We still risk a race condition if a ptracer from a different pid
namespace attaches after we compute the pid_t value. However, sending a
bogus fork event message in this unlikely scenario is still a vast
improvement over the status quo where we always send bogus fork event
messages to debuggers in a different pid namespace than the forking
process.Signed-off-by: Matthew Dempsky
Acked-by: Oleg Nesterov
Cc: Kees Cook
Cc: Julien Tinnes
Cc: Roland McGrath
Cc: Jan Kratochvil
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Jun, 2014
1 commit
-
Pull ARM updates from Russell King:
- Major clean-up of the L2 cache support code. The existing mess was
becoming rather unmaintainable through all the additions that others
have done over time. This turns it into a much nicer structure, and
implements a few performance improvements as well.- Clean up some of the CP15 control register tweaks for alignment
support, moving some code and data into alignment.c- DMA properties for ARM, from Santosh and reviewed by DT people. This
adds DT properties to specify bus translations we can't discover
automatically, and to indicate whether devices are coherent.- Hibernation support for ARM
- Make ftrace work with read-only text in modules
- add suspend support for PJ4B CPUs
- rework interrupt masking for undefined instruction handling, which
allows us to enable interrupts earlier in the handling of these
exceptions.- support for big endian page tables
- fix stacktrace support to exclude stacktrace functions from the
trace, and add save_stack_trace_regs() implementation so that kprobes
can record stack traces.- Add support for the Cortex-A17 CPU.
- Remove last vestiges of ARM710 support.
- Removal of ARM "meminfo" structure, finally converting us solely to
memblock to handle the early memory initialisation.* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (142 commits)
ARM: ensure C page table setup code follows assembly code (part II)
ARM: ensure C page table setup code follows assembly code
ARM: consolidate last remaining open-coded alignment trap enable
ARM: remove global cr_no_alignment
ARM: remove CPU_CP15 conditional from alignment.c
ARM: remove unused adjust_cr() function
ARM: move "noalign" command line option to alignment.c
ARM: provide common method to clear bits in CPU control register
ARM: 8025/1: Get rid of meminfo
ARM: 8060/1: mm: allow sub-architectures to override PCI I/O memory type
ARM: 8066/1: correction for ARM patch 8031/2
ARM: 8049/1: ftrace/add save_stack_trace_regs() implementation
ARM: 8065/1: remove last use of CONFIG_CPU_ARM710
ARM: 8062/1: Modify ldrt fixup handler to re-execute the userspace instruction
ARM: 8047/1: rwsem: use asm-generic rwsem implementation
ARM: l2c: trial at enabling some Cortex-A9 optimisations
ARM: l2c: add warnings for stuff modifying aux_ctrl register values
ARM: l2c: print a warning with L2C-310 caches if the cache size is modified
ARM: l2c: remove old .set_debug method
ARM: l2c: kill L2X0_AUX_CTRL_MASK before anyone else makes use of this
...
05 Jun, 2014
16 commits
-
Pull x86 cdso updates from Peter Anvin:
"Vdso cleanups and improvements largely from Andy Lutomirski. This
makes the vdso a lot less ''special''"* 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso, build: Make LE access macros clearer, host-safe
x86/vdso, build: Fix cross-compilation from big-endian architectures
x86/vdso, build: When vdso2c fails, unlink the output
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
x86, mm: Improve _install_special_mapping and fix x86 vdso naming
mm, fs: Add vm_ops->name as an alternative to arch_vma_name
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
x86, vdso: Move the 32-bit vdso special pages after the text
x86, vdso: Reimplement vdso.so preparation in build-time C
x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
x86, vdso: Clean up 32-bit vs 64-bit vdso params
x86, mm: Ensure correct alignment of the fixmap -
Merge misc updates from Andrew Morton:
- a few fixes for 3.16. Cc'ed to stable so they'll get there somehow.
- various misc fixes and cleanups
- most of the ocfs2 queue. Review is slow...
- most of MM. The MM queue is pretty huge this time, but not much in
the way of feature work.- some tweaks under kernel/
- printk maintenance work
- updates to lib/
- checkpatch updates
- tweaks to init/
* emailed patches from Andrew Morton : (276 commits)
fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init
fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul
init/main.c: remove an ifdef
kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND
init/main.c: add initcall_blacklist kernel parameter
init/main.c: don't use pr_debug()
fs/binfmt_flat.c: make old_reloc() static
fs/binfmt_elf.c: fix bool assignements
fs/efs: convert printk(KERN_DEBUG to pr_debug
fs/efs: add pr_fmt / use __func__
fs/efs: convert printk to pr_foo()
scripts/checkpatch.pl: device_initcall is not the only __initcall substitute
checkpatch: check stable email address
checkpatch: warn on unnecessary void function return statements
checkpatch: prefer kstrto to sscanf(buf, "%", &bar);
checkpatch: add warning for kmalloc/kzalloc with multiply
checkpatch: warn on #defines ending in semicolon
checkpatch: make --strict a default for files in drivers/net and net/
checkpatch: always warn on missing blank line after variable declaration block
checkpatch: fix wildcard DT compatible string checking
... -
Fix 4 checkpatch warnings
WARNING: sizeof *tv should be sizeof(*tv)Signed-off-by: Fabian Frederick
Cc: "H. Peter Anvin"
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
... instead of naked numbers.
Stuff in sysrq.c used to set it to 8 which is supposed to mean above
default level so set it to DEBUG instead as we're terminating/killing all
tasks and we want to be verbose there.Also, correct the check in x86_64_start_kernel which should be >= as
we're clearly issuing the string there for all debug levels, not only
the magical 10.Signed-off-by: Borislav Petkov
Acked-by: Kees Cook
Acked-by: Randy Dunlap
Cc: Joe Perches
Cc: Valdis Kletnieks
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the log ring buffer becomes full, we silently overwrite old messages
with new data. console_unlock will detect this case and fast-forward the
console_* pointers to skip over the corrupted data, but nothing will be
reported to the user.This patch hijacks the first valid log message after detecting that we
dropped messages and prefixes it with a note detailing how many messages
were dropped. For long (~1000 char) messages, this will result in some
truncation of the real message, but given that we're dropping things
anyway, that doesn't seem to be the end of the world.Signed-off-by: Will Deacon
Acked-by: Peter Zijlstra
Cc: Kay Sievers
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Jiri Bohac pointed out that there are rare but potential deadlock
possibilities when calling printk while holding the timekeeping
seqlock.This is due to printk() triggering console sem wakeup, which can
cause scheduling code to trigger hrtimers which may try to read
the time.Specifically, as Jiri pointed out, that path is:
printk
vprintk_emit
console_unlock
up(&console_sem)
__up
wake_up_process
try_to_wake_up
ttwu_do_activate
ttwu_activate
activate_task
enqueue_task
enqueue_task_fair
hrtick_update
hrtick_start_fair
hrtick_start_fair
get_time
ktime_get
--> endless loop on
read_seqcount_retry(&timekeeper_seq, ...)This patch tries to avoid this issue by using printk_deferred (previously
named printk_sched) which should defer printing via a irq_work_queue.Signed-off-by: John Stultz
Reported-by: Jiri Bohac
Reviewed-by: Steven Rostedt
Cc: Jan Kara
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Two of the three prink_deferred uses are really printk_once style
uses, so add a printk_deferred_once macro to simplify those call
sites.Signed-off-by: John Stultz
Reviewed-by: Steven Rostedt
Reviewed-by: Jan Kara
Cc: Peter Zijlstra
Cc: Jiri Bohac
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.This only changes the function name. No logic changes.
Signed-off-by: John Stultz
Reviewed-by: Steven Rostedt
Cc: Jan Kara
Cc: Peter Zijlstra
Cc: Jiri Bohac
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
An earlier change in -mm (printk: remove separate printk_sched
buffers...), removed the printk_sched irqsave/restore lines since it was
safe for current users. Since we may be expanding usage of
printk_sched(), disable preepmtion for this function to make it more
generally safe to call.Signed-off-by: John Stultz
Reviewed-by: Jan Kara
Cc: Peter Zijlstra
Cc: Jiri Bohac
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.There's a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.2) The temporary buffer is 512 bytes and is per cpu. This is a quite a
bit of space wasted for something that is seldom used.In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.Signed-off-by: Steven Rostedt
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We need interrupts disabled when calling console_trylock_for_printk()
only so that cpu id we pass to can_use_console() remains valid (for
other things console_sem provides all the exclusion we need and
deadlocks on console_sem due to interrupts are impossible because we use
down_trylock()). However if we are rescheduled, we are guaranteed to
run on an online cpu so we can easily just get the cpu id in
can_use_console().We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH
it can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Printk calls mutex_acquire() / mutex_release() by hand to instrument
lockdep about console_sem. However in some corner cases the
instrumentation is missing. Fix the problem by creating helper functions
for locking / unlocking console_sem which take care of lockdep
instrumentation as well.Signed-off-by: Jan Kara
Reported-by: Fabio Estevam
Reported-by: Andy Shevchenko
Tested-by: Fabio Estevam
Tested-By: Valdis Kletnieks
Cc: Steven Rostedt
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There's no reason to hold lockbuf_lock when entering
console_trylock_for_printk().The first thing this function does is to call down_trylock(console_sem)
and if that fails it immediately unlocks lockbuf_lock. So lockbuf_lock
isn't needed for that branch. When down_trylock() succeeds, the rest of
console_trylock() is OK without lockbuf_lock (it is called without it
from other places), and the only remaining thing in
console_trylock_for_printk() is can_use_console() call. For that call
console_sem is enough (it iterates all consoles and checks CON_ANYTIME
flag).So we drop logbuf_lock before entering console_trylock_for_printk() which
simplifies the code.[akpm@linux-foundation.org: fix have_callable_console() comment]
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Comment about interesting interlocking between lockbuf_lock and
console_sem is outdated.It was added in 2002 by commit a880f45a48be during conversion of
console_lock to console_sem + lockbuf_lock.At that time release_console_sem() (today's equivalent is
console_unlock()) was indeed using lockbuf_lock to avoid races between
trylock on console_sem in printk() and unlock of console_sem. However
these days the interlocking is gone and the races are avoided by
rechecking logbuf state after releasing console_sem.Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I wonder if anyone uses printk return value but it is there and should be
counted correctly.This patch modifies log_store() to return the number of really stored
bytes from the 'text' part. Also it handles the return value in
vprintk_emit().Note that log_store() is used also in cont_flush() but we could ignore the
return value there. The function works with characters that were already
counted earlier. In addition, the store could newer fail here because the
length of the printed text is limited by the "cont" buffer and "dict" is
NULL.Signed-off-by: Petr Mladek
Cc: Jan Kara
Cc: Jiri Kosina
Cc: Kay Sievers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds