Eric Lee / smarc-fsl-linux-kernel

15 Nov, 2016

1 commit

f5c9f9c72 Revert "printk: make reading the kernel log flush pending lines" ... Browse Code »

This reverts commit bfd8d3f23b51018388be0411ccbc2d56277fe294.

It turns out that this flushes things much too aggressiverly, and causes
lines to break up when the system logger races with new continuation
lines being printed.

There's a pending patch to make printk() flushing much more
straightforward, but it's too invasive for 4.9, so in the meantime let's
just not make the system message logging flush continuation lines.
They'll be flushed by the final newline anyway.

Suggested-by: Petr Mladek
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-11-15 01:31:52 +0800

12 Nov, 2016

1 commit

c6c7d83b9 Revert "console: don't prefer first registered if DT specifies stdout-path" ... Browse Code »

This reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path").

The reverted commit changes existing behavior on which many ARM boards
rely. Many ARM small-board-computers, like e.g. the Raspberry Pi have
both a video output and a serial console. Depending on whether the user
is using the device as a more regular computer; or as a headless device
we need to have the console on either one or the other.

Many users rely on the kernel behavior of the console being present on
both outputs, before the reverted commit the console setup with no
console= kernel arguments on an ARM board which sets stdout-path in dt
would look like this:

[root@localhost ~]# cat /proc/consoles
ttyS0 -W- (EC p a) 4:64
tty0 -WU (E p ) 4:1

Where as after the reverted commit, it looks like this:

[root@localhost ~]# cat /proc/consoles
ttyS0 -W- (EC p a) 4:64

This commit reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path") restoring the original
behavior.

Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path")
Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.com
Signed-off-by: Hans de Goede
Cc: Paul Burton
Cc: Rob Herring
Cc: Frank Rowand
Cc: Thorsten Leemhuis
Cc: Greg Kroah-Hartman
Cc: Tejun Heo
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hans de Goede
2016-11-12 00:12:37 +0800

20 Oct, 2016

1 commit

8835ca59d printk: suppress empty continuation lines ... Browse Code »

We have a fairly common pattern where you print several things as
continuations on one single line in a loop, and then at the end you do

printk(KERN_CONT "\n");

to flush the buffered output.

But if the output was flushed by something else (concurrent printk
activity, or just system logging), we don't want that final flushing to
just print an empty line.

So just suppress empty continuation lines when they couldn't be merged
into the line they are a continuation of.

Signed-off-by: Linus Torvalds

Linus Torvalds
2016-10-20 00:11:24 +0800

11 Oct, 2016

1 commit

563873318 Merge branch 'printk-cleanups' ... Browse Code »

Merge my system logging cleanups, triggered by the broken '\n' patches.

The line continuation handling has been broken basically forever, and
the code to handle the system log records was both confusing and
dubious. And it would do entirely the wrong thing unless you always had
a terminating newline, partly because it couldn't actually see whether a
message was marked KERN_CONT or not (but partly because the LOG_CONT
handling in the recording code was rather confusing too).

This re-introduces a real semantically meaningful KERN_CONT, and fixes
the few places I noticed where it was missing. There are probably more
missing cases, since KERN_CONT hasn't actually had any semantic meaning
for at least four years (other than the checkpatch meaning of "no log
level necessary, this is a continuation line").

This also allows the combination of KERN_CONT and a log level. In that
case the log level will be ignored if the merging with a previous line
is successful, but if a new record is needed, that new record will now
get the right log level.

That also means that you can at least in theory combine KERN_CONT with
the "pr_info()" style helpers, although any use of pr_fmt() prefixing
would make that just result in a mess, of course (the prefix would end
up in the middle of a continuing line).

* printk-cleanups:
printk: make reading the kernel log flush pending lines
printk: re-organize log_output() to be more legible
printk: split out core logging code into helper function
printk: reinstate KERN_CONT for printing continuation lines

Linus Torvalds
2016-10-11 00:29:50 +0800

10 Oct, 2016

4 commits

bfd8d3f23 printk: make reading the kernel log flush pending lines ... Browse Code »

That will mean that any possible subsequent continuation will now be
broken up onto a line of its own (since reading the log has finalized
the beginning og the line), but if user space has activated system
logging (or if there's a kernel message dump going on) that is the right
thing to do.

And now that we actually get the continuation flags _right_ for this
all, the user space logger that is reading the kernel messages can
actually see the continuation marker. Not that anybody seems to really
bother with it (or care), but in theory user space can do its own
message stitching.

Signed-off-by: Linus Torvalds

Linus Torvalds
2016-10-10 03:23:40 +0800
5e467652f printk: re-organize log_output() to be more legible ... Browse Code »

Avoid some duplicate logic now that we can return early, and update the
comments for the new LOG_CONT world order.

This also stops the continuation flushing from just using random record
flags for the flushing action, instead taking the flags from the proper
original line and updating them as we add continuations to it.

Signed-off-by: Linus Torvalds

Linus Torvalds
2016-10-10 03:23:40 +0800
c362c7ff8 printk: split out core logging code into helper function ... Browse Code »

The code that actually decides how to log the message (whether to put it
directly into the record log, whether to append it to an existing
buffered log, or whether to start a new buffered log) is fairly
non-obvious code in the middle of the vprintk_emit() function.

Splitting that code up into a helper function makes it easier to
understand, but perhaps more importantly also allows for the code to
just return early out of the helper function once it has made the
decision about where the new log content goes.

Signed-off-by: Linus Torvalds

Linus Torvalds
2016-10-10 03:23:39 +0800
4bcc595cc printk: reinstate KERN_CONT for printing continuation lines ... Browse Code »

Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space. It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.

Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other. Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.

To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:

474925277671 ("printk: add KERN_CONT annotation").

That continuation marker was initially an empty string, and didn't
actuall make any semantic difference. But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.

To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.

5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")

and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.

Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.

You can see the beginning of that conversion in commits

e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")

with a number of follow-up commits to fix some painful fallout from that
conversion. Over all, it took a couple of months to sort out most of
it. But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.

And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks. In
order to still log them as one recrod, the continuation lines need to be
attached to the previous record properly.

However the explicit continuation record marker that is actually useful
for this exact case was actually removed in aroundm the same time by commit

61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")

due to the incorrect belief that KERN_CONT wasn't meaningful. The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.

This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.

But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.

For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.

Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel). So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.

Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.

So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.

There are probably other cases where KERN_CONT will be needed to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.

That said, we should strive to avoid the need for KERN_CONT. It does
result in real problems for logging, and should generally not be seen as
a good feature. If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.

But until that point, let's at mark the code that relies on the hacky
multi-fragment kernel printk's. Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".

(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited. Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).

Signed-off-by: Linus Torvalds

Linus Torvalds
2016-10-10 03:23:38 +0800

08 Oct, 2016

1 commit

05fd007e4 console: don't prefer first registered if DT specifies stdout-path ... Browse Code »

If a device tree specifies a preferred device for kernel console output
via the stdout-path or linux,stdout-path chosen node properties or the
stdout alias then the kernel ought to honor it & output the kernel
console to that device. As it stands, this isn't the case. Whilst we
parse the stdout-path properties & set an of_stdout variable from
of_alias_scan(), and use that from of_console_check() to determine
whether to add a console device as a preferred console whilst
registering it, we also prefer the first registered console if no other
has been selected at the time of its registration.

This means that if a console other than the one the device tree selects
via stdout-path is registered first, we will switch to using it & when
the stdout-path console is later registered the call to
add_preferred_console() via of_console_check() is too late to do
anything useful. In practice this seems to mean that we switch to the
dummy console device fairly early & see no further console output:

Console: colour dummy device 80x25
console [tty0] enabled
bootconsole [ns16550a0] disabled

Fix this by not automatically preferring the first registered console if
one is specified by the device tree. This allows consoles to be
registered but not enabled, and once the driver for the console selected
by stdout-path calls of_console_check() the driver will be added to the
list of preferred consoles before any other console has been enabled.
When that console is then registered via register_console() it will be
enabled as expected.

Link: http://lkml.kernel.org/r/20160809151937.26118-1-paul.burton@imgtec.com
Signed-off-by: Paul Burton
Cc: Ralf Baechle
Cc: Paul Burton
Cc: Tejun Heo
Cc: Sergey Senozhatsky
Cc: Jiri Slaby
Cc: Daniel Vetter
Cc: Ivan Delalande
Cc: Thierry Reding
Cc: Borislav Petkov
Cc: Jan Kara
Cc: Petr Mladek
Cc: Joe Perches
Cc: Greg Kroah-Hartman
Cc: Rob Herring
Cc: Frank Rowand
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Burton
2016-10-08 09:46:30 +0800

02 Sep, 2016

1 commit

19feeff18 printk/nmi: avoid direct printk()-s from __printk_nmi_flush() ... Browse Code »

__printk_nmi_flush() can be called from nmi_panic(), therefore it has to
test whether it's executed in NMI context and thus must route the
messages through deferred printk() or via direct printk().

This is to avoid potential deadlocks, as described in commit
cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic").

However there remain two places where __printk_nmi_flush() does
unconditional direct printk() calls:

- pr_err("printk_nmi_flush: internal error ...")
- pr_cont("\n")

Factor out print_nmi_seq_line() parts into a new printk_nmi_flush_line()
function, which takes care of in_nmi(), and use it in
__printk_nmi_flush() for printing and error-reporting.

Link: http://lkml.kernel.org/r/20160830161354.581-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky
Cc: Petr Mladek
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sergey Senozhatsky
2016-09-02 08:52:01 +0800

27 Aug, 2016

1 commit

ae6c33ba6 printk: fix parsing of "brl=" option ... Browse Code »

Commit bbeddf52adc1 ("printk: move braille console support into separate
braille.[ch] files") moved the parsing of braille-related options into
_braille_console_setup(), changing the type of variable str from char*
to char**. In this commit, memcmp(str, "brl,", 4) was correctly updated
to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4).

Update the code to make "brl=" option work again and replace memcmp()
with strncmp() to make the compiler able to detect such an issue.

Fixes: bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files")
Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Iooss
2016-08-27 08:39:35 +0800

10 Aug, 2016

1 commit

a0cba2179 Revert "printk: create pr_<level> functions" ... Browse Code »

This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142.

Geert Uytterhoeven reports:
"This change seems to have an (unintendent?) side-effect.

Before, pr_*() calls without a trailing newline characters would be
printed with a newline character appended, both on the console and in
the output of the dmesg command.

After this commit, no new line character is appended, and the output
of the next pr_*() call of the same type may be appended, like in:

- Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
- Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
+ Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"

Joe Perches says:
"No, that is not intentional.

The newline handling code inside vprintk_emit is a bit involved and
for now I suggest a revert until this has all the same behavior as
earlier"

Reported-by: Geert Uytterhoeven
Requested-by: Joe Perches
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-08-10 01:48:18 +0800

09 Aug, 2016

1 commit

574673c23 printk: Remove unnecessary #ifdef CONFIG_PRINTK ... Browse Code »

In commit 874f9c7da9a4 ("printk: create pr_ functions"), new
pr_level defines were added to printk.c.

These new defines are guarded by an #ifdef CONFIG_PRINTK - however,
there is already a surrounding #ifdef CONFIG_PRINTK starting a lot
earlier in line 249 which means the newly introduced #ifdef is
unnecessary.

Let's remove it to avoid confusion.

Signed-off-by: Andreas Ziegler
Cc: Joe Perches
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Andreas Ziegler
2016-08-09 02:29:39 +0800

03 Aug, 2016

5 commits

750afe7ba printk: add kernel parameter to control writes to /dev/kmsg ... Browse Code »

Add a "printk.devkmsg" kernel command line parameter which controls how
userspace writes into /dev/kmsg. It has three options:

* ratelimit - ratelimit logging from userspace.
* on - unlimited logging from userspace
* off - logging from userspace gets ignored

The default setting is to ratelimit the messages written to it.

This changes the kernel default setting of "on" to "ratelimit" and we do
that because we want to keep userspace spamming /dev/kmsg to sane
levels. This is especially moot when a small kernel log buffer wraps
around and messages get lost. So the ratelimiting setting should be a
sane setting where kernel messages should have a bit higher chance of
survival from all the spamming.

It additionally does not limit logging to /dev/kmsg while the system is
booting if we haven't disabled it on the command line.

Furthermore, we can control the logging from a lower priority sysctl
interface - kernel.printk_devkmsg.

That interface will succeed only if printk.devkmsg *hasn't* been
supplied on the command line. If it has, then printk.devkmsg is a
one-time setting which remains for the duration of the system lifetime.
This "locking" of the setting is to prevent userspace from changing the
logging on us through sysctl(2).

This patch is based on previous patches from Linus and Steven.

[bp@suse.de: fixes]
Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
Signed-off-by: Borislav Petkov
Cc: Dave Young
Cc: Franck Bui
Cc: Greg Kroah-Hartman
Cc: Ingo Molnar
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Uwe Kleine-König
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Borislav Petkov
2016-08-03 07:35:06 +0800
40a7d9f5f printk: include <asm/sections.h> instead of <asm-generic/sections.h> ... Browse Code »

asm-generic headers are generic implementations for architecture
specific code and should not be included by common code. Thus use the
asm/ version of sections.h to get at the linker sections.

Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2016-08-03 07:35:05 +0800
cf7754441 printk: introduce suppress_message_printing() ... Browse Code »

Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.

Schematically, console_unlock() does the following:

console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);

if (msg->flags & LOG_NOCONS) {
...
goto skip;
}

level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.

if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);

call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;

console->write(...);
}

local_irq_restore(flags);
}
...
}

The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.

This can be huge when we register a new CON_PRINTBUFFER console, for
instance. For every such a console register_console() resets the

console_seq, console_idx, console_prev

and sets a `exclusive console' pointer to replay the log buffer to that
just-registered console. And there can be a lot of messages to replay,
in the worst case most of which can be dropped after console_loglevel
test.

We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.

The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.

Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky
Reviewed-by: Petr Mladek
Cc: Tejun Heo
Cc: Jan Kara
Cc: Calvin Owens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sergey Senozhatsky
2016-08-03 07:35:04 +0800
874f9c7da printk: create pr_<level> functions ... Browse Code »

Using functions instead of macros can reduce overall code size by
eliminating unnecessary "KERN_SOH" prefixes from format strings.

defconfig x86-64:

$ size vmlinux*
text data bss dec hex filename
10193570 4331464 1105920 15630954 ee826a vmlinux.new
10192623 4335560 1105920 15634103 ee8eb7 vmlinux.old

As the return value are unimportant and unused in the kernel tree, these
new functions return void.

Miscellanea:

- change pr_ macros to call new __pr_ functions
- change vprintk_nmi and vprintk_default to add LOGLEVEL_ argument

[akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe]
Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com
Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2016-08-03 07:35:04 +0800
bebca0528 printk: do not include interrupt.h ... Browse Code »

A trivial cosmetic change: interrupt.h header is redundant since commit
6b898c07cb1d ("console: use might_sleep in console_lock").

Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sergey Senozhatsky
2016-08-03 07:35:03 +0800

29 Jul, 2016

1 commit

8b70ca656 printk: when dumping regs, show the stack, not thread_info ... Browse Code »

We currently show:

task: ti: task.ti: "

"ti" and "task.ti" are redundant, and neither is actually what we want
to show, which the the base of the thread stack. Change the display to
show the stack pointer explicitly.

Link: http://lkml.kernel.org/r/543ac5bd66ff94000a57a02e11af7239571a3055.1468523549.git.luto@kernel.org
Signed-off-by: Andy Lutomirski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Lutomirski
2016-07-29 07:07:41 +0800

21 May, 2016

4 commits

cf9b1106c printk/nmi: flush NMI messages on the system panic ... Browse Code »

In NMI context, printk() messages are stored into per-CPU buffers to
avoid a possible deadlock. They are normally flushed to the main ring
buffer via an IRQ work. But the work is never called when the system
calls panic() in the very same NMI handler.

This patch tries to flush NMI buffers before the crash dump is
generated. In this case it does not risk a double release and bails out
when the logbuf_lock is already taken. The aim is to get the messages
into the main ring buffer when possible. It makes them better
accessible in the vmcore.

Then the patch tries to flush the buffers second time when other CPUs
are down. It might be more aggressive and reset logbuf_lock. The aim
is to get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.

The patch causes vprintk_emit() to be called even in NMI context again.
But it is done via printk_deferred() so that the console handling is
skipped. Consoles use internal locks and we could not prevent a
deadlock easily. They are explicitly called later when the crash dump
is not generated, see console_flush_on_panic().

Signed-off-by: Petr Mladek
Cc: Benjamin Herrenschmidt
Cc: Daniel Thompson
Cc: David Miller
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jiri Kosina
Cc: Martin Schwidefsky
Cc: Peter Zijlstra
Cc: Ralf Baechle
Cc: Russell King
Cc: Steven Rostedt
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Mladek
2016-05-21 08:58:30 +0800
427934b87 printk/nmi: increase the size of NMI buffer and make it configurable ... Browse Code »

Testing has shown that the backtrace sometimes does not fit into the 4kB
temporary buffer that is used in NMI context. The warnings are gone
when I double the temporary buffer size.

This patch doubles the buffer size and makes it configurable.

Note that this problem existed even in the x86-specific implementation
that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI
stack trace on all CPUs"). Nobody noticed it because it did not print
any warnings.

Signed-off-by: Petr Mladek
Cc: Jan Kara
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Russell King
Cc: Daniel Thompson
Cc: Jiri Kosina
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: David Miller
Cc: Daniel Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Mladek
2016-05-21 08:58:30 +0800
b522deabc printk/nmi: warn when some message has been lost in NMI context ... Browse Code »

We could not resize the temporary buffer in NMI context. Let's warn if
a message is lost.

This is rather theoretical. printk() should not be used in NMI. The
only sensible use is when we want to print backtrace from all CPUs. The
current buffer should be enough for this purpose.

[akpm@linux-foundation.org: whitespace fixlet]
Signed-off-by: Petr Mladek
Cc: Jan Kara
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Russell King
Cc: Daniel Thompson
Cc: Jiri Kosina
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: David Miller
Cc: Daniel Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Mladek
2016-05-21 08:58:30 +0800
42a0bb3f7 printk/nmi: generic solution for safe printk in NMI ... Browse Code »

printk() takes some locks and could not be used a safe way in NMI
context.

The chance of a deadlock is real especially when printing stacks from
all CPUs. This particular problem has been addressed on x86 by the
commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
CPUs").

The patchset brings two big advantages. First, it makes the NMI
backtraces safe on all architectures for free. Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited. We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers. These are not easy to avoid.

This patch reuses most of the code and makes it generic. It is useful
for all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context. It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers. There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock. It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven
Rostedt. It needs a per-CPU buffer and is compiled only on
architectures that call nmi_enter(). This is achieved by the new
HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures. We need to clean up NMI
handling there first. Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see

https://lkml.org/lkml/2015/6/10/327

[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek
Suggested-by: Peter Zijlstra
Suggested-by: Steven Rostedt
Cc: Jan Kara
Acked-by: Russell King [arm part]
Cc: Daniel Thompson
Cc: Jiri Kosina
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: David Miller
Cc: Daniel Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Mladek
2016-05-21 08:58:30 +0800

18 Mar, 2016

4 commits

f468908bb printk: add clear_idx symbol to vmcoreinfo ... Browse Code »

This allows us to extract from the vmcore only the messages emitted
since the last time the ring buffer was cleared. We just have to make
sure its value is always up-to-date, when old messages are discarded to
free space in log_make_free_space() for example.

Signed-off-by: Zeyu Zhao
Signed-off-by: Ivan Delalande
Cc: Kay Sievers
Cc: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ivan Delalande
2016-03-18 06:09:34 +0800
adaf6590e printk: check CON_ENABLED in have_callable_console() ... Browse Code »

have_callable_console() must also test CON_ENABLED bit, not just
CON_ANYTIME. We may have disabled CON_ANYTIME console so printk can
wrongly assume that it's safe to call_console_drivers().

Signed-off-by: Sergey Senozhatsky
Reviewed-by: Petr Mladek
Cc: Jan Kara
Cc: Tejun Heo
Cc: Kyle McMartin
Cc: Dave Jones
Cc: Calvin Owens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sergey Senozhatsky
2016-03-18 06:09:34 +0800
6b97a20d3 printk: set may_schedule for some of console_trylock() callers ... Browse Code »

console_unlock() allows to cond_resched() if its caller has set
`console_may_schedule' to 1, since 8d91f8b15361 ("printk: do
cond_resched() between lines while outputting to consoles").

The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0

However, console_trylock() callers (among them is printk()) do not
always call printk() from atomic contexts, and some of them can
cond_resched() in console_unlock(), so console_trylock() can set
`console_may_schedule' to 1 for such processes.

For !CONFIG_PREEMPT_COUNT kernels, however, console_trylock() always
sets `console_may_schedule' to 0.

It's possible to drop explicit preempt_disable()/preempt_enable() in
vprintk_emit(), because console_unlock() and console_trylock() are now
smart enough:
a) console_unlock() does not cond_resched() when it's unsafe
(console_trylock() takes care of that)
b) console_unlock() does can_use_console() check.

Signed-off-by: Sergey Senozhatsky
Reviewed-by: Petr Mladek
Cc: Jan Kara
Cc: Tejun Heo
Cc: Kyle McMartin
Cc: Dave Jones
Cc: Calvin Owens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sergey Senozhatsky
2016-03-18 06:09:34 +0800
a8199371a printk: move can_use_console() out of console_trylock_for_printk() ... Browse Code »

console_unlock() allows to cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality is present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles").

The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0

printk() calls console_unlock() with preemption desabled, which
basically can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequent enough (IOW,
console_sem owner always has new data to send to console divers and
can't leave console_unlock() for a long time).

printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in consoe_unlock()) when it's safe.

This patch (of 3):

vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- can_use_console()
check. The thing is that vprintl_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which potentially can cause
problems if console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from !online cpu
by setting CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates all the console to find out if
there is a CON_ANYTIME console, otherwise console_unlock() must be
avoided.

can_use_console() ensures that console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
messages loss.

Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.

CPU0 online CPU1 !online
console_trylock()
...
console_unlock()
console_cont_flush
spin_lock logbuf_lock
if (!cont.len) {
spin_unlock logbuf_lock
return
}
for (;;) {
vprintk_emit
spin_lock logbuf_lock
log_store
spin_unlock logbuf_lock
spin_lock logbuf_lock
!console_trylock_for_printk msg_print_text
return console_idx = log_next()
console_seq++
console_prev = msg->flags
spin_unlock logbuf_lock

call_console_drivers()
for_each_console(con) {
if (!cpu_online() &&
!(con->flags & CON_ANYTIME))
continue;
}
/*
* no message printed, we lost it
*/
vprintk_emit
spin_lock logbuf_lock
log_store
spin_unlock logbuf_lock
!console_trylock_for_printk
return
/*
* go to the beginning of the loop,
* find out there are new messages,
* lose it
*/
}

console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not getting
unregistered when CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notification sent from offline CPU to itself

[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify

while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online

So grabbing the console_sem lock while CPU is !online is possible,
in theory.

This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.

Signed-off-by: Sergey Senozhatsky
Reviewed-by: Petr Mladek
Cc: Jan Kara
Cc: Tejun Heo
Cc: Kyle McMartin
Cc: Dave Jones
Cc: Calvin Owens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sergey Senozhatsky
2016-03-18 06:09:34 +0800

21 Jan, 2016

1 commit

5c9cf8af2 kernel: printk: specify alignment for struct printk_log ... Browse Code »

On architectures that have support for efficient unaligned access struct
printk_log has 4-byte alignment. Specify alignment attribute in type
declaration.

The whole point of this patch is to fix deadlock which happening when
UBSAN detects unaligned access in printk() thus UBSAN recursively calls
printk() with logbuf_lock held by top printk() call.

Signed-off-by: Andrey Ryabinin
Cc: Peter Zijlstra
Cc: Sasha Levin
Cc: Randy Dunlap
Cc: Rasmus Villemoes
Cc: Jonathan Corbet
Cc: Michal Marek
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Yury Gribov
Cc: Dmitry Vyukov
Cc: Konstantin Khlebnikov
Cc: Kostya Serebryany
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Ryabinin
2016-01-21 09:09:18 +0800

17 Jan, 2016

3 commits

06b031de2 printk: change recursion_bug type to bool ... Browse Code »

`recursion_bug' is used as recursion_bug toggle, so make it `bool'.

Signed-off-by: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sergey Senozhatsky
2016-01-17 03:17:31 +0800
8d91f8b15 printk: do cond_resched() between lines while outputting to consoles ... Browse Code »

@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.

However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.

If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.

Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.

Signed-off-by: Tejun Heo
Reported-by: Calvin Owens
Acked-by: Jan Kara
Cc: Dave Jones
Cc: Kyle McMartin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2016-01-17 03:17:25 +0800
81cc26f2b printk: only unregister boot consoles when necessary ... Browse Code »

Boot consoles are typically replaced by proper consoles during the boot
process. This can be problematic if the boot console data is part of
the init section that is reclaimed late during boot. If the proper
console does not register before this point in time, the boot console
will need to be removed (so that the freed memory is not accessed),
leaving the system without output for some time.

There are various reasons why the proper console may not register early
enough, such as deferred probe or the driver being a loadable module.
If that happens, there is some amount of time where no console messages
are visible to the user, which in turn can mean that they won't see
crashes or other potentially useful information.

To avoid this situation, only remove the boot console when it resides in
the init section. Code exists to replace the boot console by the proper
console when it is registered, keeping a seamless transition between the
boot and proper consoles.

Signed-off-by: Thierry Reding
Cc: Arnd Bergmann
Cc: Greg Kroah-Hartman
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thierry Reding
2016-01-17 03:17:25 +0800

07 Nov, 2015

1 commit

3824657c5 printk: prevent userland from spoofing kernel messages ... Browse Code »

The following statement of ABI/testing/dev-kmsg is not quite right:

It is not possible to inject messages from userspace with the
facility number LOG_KERN (0), to make sure that the origin of the
messages can always be reliably determined.

Userland actually can inject messages with a facility of 0 by abusing the
fact that the facility is stored in a u8 data type. By using a facility
which is a multiple of 256 the assignment of msg->facility in log_store()
implicitly truncates it to 0, i.e. LOG_KERN, allowing users of /dev/kmsg
to spoof kernel messages as shown below:

The following call...
# printf 'Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
...leads to the following log entry (dmesg -x | tail -n 1):
user :emerg : [ 66.137758] Kernel panic - not syncing: beer empty

However, this call...
# printf 'Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
...leads to the slightly different log entry (note the kernel facility):
kern :emerg : [ 74.177343] Kernel panic - not syncing: beer empty

Fix that by limiting the user provided facility to 8 bit right from the
beginning and catch the truncation early.

Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...")
Signed-off-by: Mathias Krause
Cc: Greg Kroah-Hartman
Cc: Petr Mladek
Cc: Alex Elder
Cc: Joe Perches
Cc: Kay Sievers
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathias Krause
2015-11-07 09:50:42 +0800

22 Oct, 2015

1 commit

ee1d26742 pstore: add pstore unregister ... Browse Code »

pstore doesn't support unregistering yet. It was marked as TODO.
This patch adds some code to fix it:
1) Add functions to unregister kmsg/console/ftrace/pmsg.
2) Add a function to free compression buffer.
3) Unmap the memory and free it.
4) Add a function to unregister pstore filesystem.

Signed-off-by: Geliang Tang
Acked-by: Kees Cook
[Removed __exit annotation from ramoops_remove(). Reported by Arnd Bergmann]
Signed-off-by: Tony Luck

Geliang Tang
2015-10-22 23:59:18 +0800

11 Sep, 2015

1 commit

2965faa5e kexec: split kexec_load syscall from kexec core code ... Browse Code »

There are two kexec load syscalls, kexec_load another and kexec_file_load.
kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young
Cc: Eric W. Biederman
Cc: Vivek Goyal
Cc: Petr Tesarik
Cc: Theodore Ts'o
Cc: Josh Boyer
Cc: David Howells
Cc: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Young
2015-09-11 04:29:01 +0800

01 Jul, 2015

1 commit

b389645f0 printk: improve the description of /dev/kmsg line format ... Browse Code »

The comment about /dev/kmsg does not mention the additional values which
may actually be exported, fix that.

Also move up the part of the comment instructing the users to ignore these
additional values, this way the reading is more fluent and logically
compact.

Signed-off-by: Antonio Ospite
Cc: Joe Perches
Cc: Jonathan Corbet
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Antonio Ospite
2015-07-01 10:44:59 +0800

26 Jun, 2015

5 commits

3ea4331c6 check_syslog_permissions() cleanup ... Browse Code »

Patch fixes drawbacks in heck_syslog_permissions() noticed by AKPM:
"from_file handling makes me cry.

That's not a boolean - it's an enumerated value with two values
currently defined.

But the code in check_syslog_permissions() treats it as a boolean and
also hardwires the knowledge that SYSLOG_FROM_PROC == 1 (or == `true`).

And the name is wrong: it should be called from_proc to match
SYSLOG_FROM_PROC."

Signed-off-by: Vasily Averin
Cc: Kees Cook
Cc: Josh Boyer
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vasily Averin
2015-06-26 08:00:39 +0800
d194e5d66 security_syslog() should be called once only ... Browse Code »

The final version of commit 637241a900cb ("kmsg: honor dmesg_restrict
sysctl on /dev/kmsg") lost few hooks, as result security_syslog() are
processed incorrectly:

- open of /dev/kmsg checks syslog access permissions by using
check_syslog_permissions() where security_syslog() is not called if
dmesg_restrict is set.

- syslog syscall and /proc/kmsg calls do_syslog() where security_syslog
can be executed twice (inside check_syslog_permissions() and then
directly in do_syslog())

With this patch security_syslog() is called once only in all
syslog-related operations regardless of dmesg_restrict value.

Fixes: 637241a900cb ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
Signed-off-by: Vasily Averin
Cc: Kees Cook
Cc: Josh Boyer
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vasily Averin
2015-06-26 08:00:39 +0800
6fe29354b printk: implement support for extended console drivers ... Browse Code »

printk log_buf keeps various metadata for each message including its
sequence number and timestamp. The metadata is currently available only
through /dev/kmsg and stripped out before passed onto console drivers. We
want this metadata to be available to console drivers too so that console
consumers can get full information including the metadata and dictionary,
which among other things can be used to detect whether messages got lost
in transit.

This patch implements support for extended console drivers. Consoles can
indicate that they want extended messages by setting the new CON_EXTENDED
flag and they'll be fed messages formatted the same way as /dev/kmsg.

",,,;\n"

If extended consoles exist, in-kernel fragment assembly is disabled. This
ensures that all messages emitted to consoles have full metadata including
sequence number. The contflag carries enough information to reassemble
the fragments from the reader side trivially. Note that this only affects
/dev/kmsg. Regular console and /proc/kmsg outputs are not affected by
this change.

* Extended message formatting for console drivers is enabled iff there
are registered extended consoles.

* Comment describing /dev/kmsg message format updated to add missing
contflag field and help distinguishing variable from verbatim terms.

Signed-off-by: Tejun Heo
Cc: David Miller
Cc: Kay Sievers
Reviewed-by: Petr Mladek
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2015-06-26 08:00:38 +0800
0a295e67e printk: factor out message formatting from devkmsg_read() ... Browse Code »

The extended message formatting used for /dev/kmsg will be used implement
extended consoles. Factor out msg_print_ext_header() and
msg_print_ext_body() from devkmsg_read().

This is pure restructuring.

Signed-off-by: Tejun Heo
Cc: David Miller
Cc: Kay Sievers
Reviewed-by: Petr Mladek
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2015-06-26 08:00:38 +0800
d43ff430f printk: guard the amount written per line by devkmsg_read() ... Browse Code »

This patchset updates netconsole so that it can emit messages with the
same header as used in /dev/kmsg which gives neconsole receiver full log
information which enables things like structured logging and detection
of lost messages.

This patch (of 7):

devkmsg_read() uses 8k buffer and assumes that the formatted output
message won't overrun which seems safe given LOG_LINE_MAX, the current use
of dict and the escaping method being used; however, we're planning to use
devkmsg formatting wider and accounting for the buffer size properly isn't
that complicated.

This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates devkmsg_read()
so that it limits output accordingly.

Signed-off-by: Tejun Heo
Cc: David Miller
Cc: Kay Sievers
Reviewed-by: Petr Mladek
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2015-06-26 08:00:38 +0800