15 Nov, 2016

1 commit

  • This reverts commit bfd8d3f23b51018388be0411ccbc2d56277fe294.

    It turns out that this flushes things much too aggressiverly, and causes
    lines to break up when the system logger races with new continuation
    lines being printed.

    There's a pending patch to make printk() flushing much more
    straightforward, but it's too invasive for 4.9, so in the meantime let's
    just not make the system message logging flush continuation lines.
    They'll be flushed by the final newline anyway.

    Suggested-by: Petr Mladek
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Nov, 2016

1 commit

  • This reverts commit 05fd007e4629 ("console: don't prefer first
    registered if DT specifies stdout-path").

    The reverted commit changes existing behavior on which many ARM boards
    rely. Many ARM small-board-computers, like e.g. the Raspberry Pi have
    both a video output and a serial console. Depending on whether the user
    is using the device as a more regular computer; or as a headless device
    we need to have the console on either one or the other.

    Many users rely on the kernel behavior of the console being present on
    both outputs, before the reverted commit the console setup with no
    console= kernel arguments on an ARM board which sets stdout-path in dt
    would look like this:

    [root@localhost ~]# cat /proc/consoles
    ttyS0 -W- (EC p a) 4:64
    tty0 -WU (E p ) 4:1

    Where as after the reverted commit, it looks like this:

    [root@localhost ~]# cat /proc/consoles
    ttyS0 -W- (EC p a) 4:64

    This commit reverts commit 05fd007e4629 ("console: don't prefer first
    registered if DT specifies stdout-path") restoring the original
    behavior.

    Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path")
    Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.com
    Signed-off-by: Hans de Goede
    Cc: Paul Burton
    Cc: Rob Herring
    Cc: Frank Rowand
    Cc: Thorsten Leemhuis
    Cc: Greg Kroah-Hartman
    Cc: Tejun Heo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans de Goede
     

20 Oct, 2016

1 commit

  • We have a fairly common pattern where you print several things as
    continuations on one single line in a loop, and then at the end you do

    printk(KERN_CONT "\n");

    to flush the buffered output.

    But if the output was flushed by something else (concurrent printk
    activity, or just system logging), we don't want that final flushing to
    just print an empty line.

    So just suppress empty continuation lines when they couldn't be merged
    into the line they are a continuation of.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Oct, 2016

1 commit

  • Merge my system logging cleanups, triggered by the broken '\n' patches.

    The line continuation handling has been broken basically forever, and
    the code to handle the system log records was both confusing and
    dubious. And it would do entirely the wrong thing unless you always had
    a terminating newline, partly because it couldn't actually see whether a
    message was marked KERN_CONT or not (but partly because the LOG_CONT
    handling in the recording code was rather confusing too).

    This re-introduces a real semantically meaningful KERN_CONT, and fixes
    the few places I noticed where it was missing. There are probably more
    missing cases, since KERN_CONT hasn't actually had any semantic meaning
    for at least four years (other than the checkpatch meaning of "no log
    level necessary, this is a continuation line").

    This also allows the combination of KERN_CONT and a log level. In that
    case the log level will be ignored if the merging with a previous line
    is successful, but if a new record is needed, that new record will now
    get the right log level.

    That also means that you can at least in theory combine KERN_CONT with
    the "pr_info()" style helpers, although any use of pr_fmt() prefixing
    would make that just result in a mess, of course (the prefix would end
    up in the middle of a continuing line).

    * printk-cleanups:
    printk: make reading the kernel log flush pending lines
    printk: re-organize log_output() to be more legible
    printk: split out core logging code into helper function
    printk: reinstate KERN_CONT for printing continuation lines

    Linus Torvalds
     

10 Oct, 2016

4 commits

  • That will mean that any possible subsequent continuation will now be
    broken up onto a line of its own (since reading the log has finalized
    the beginning og the line), but if user space has activated system
    logging (or if there's a kernel message dump going on) that is the right
    thing to do.

    And now that we actually get the continuation flags _right_ for this
    all, the user space logger that is reading the kernel messages can
    actually see the continuation marker. Not that anybody seems to really
    bother with it (or care), but in theory user space can do its own
    message stitching.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Avoid some duplicate logic now that we can return early, and update the
    comments for the new LOG_CONT world order.

    This also stops the continuation flushing from just using random record
    flags for the flushing action, instead taking the flags from the proper
    original line and updating them as we add continuations to it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The code that actually decides how to log the message (whether to put it
    directly into the record log, whether to append it to an existing
    buffered log, or whether to start a new buffered log) is fairly
    non-obvious code in the middle of the vprintk_emit() function.

    Splitting that code up into a helper function makes it easier to
    understand, but perhaps more importantly also allows for the code to
    just return early out of the helper function once it has made the
    decision about where the new log content goes.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Long long ago the kernel log buffer was a buffered stream of bytes, very
    much like stdio in user space. It supported log levels by scanning the
    stream and noticing the log level markers at the beginning of each line,
    but if you wanted to print a partial line in multiple chunks, you just
    did multiple printk() calls, and it just automatically worked.

    Except when it didn't, and you had very confusing output when different
    lines got all mixed up with each other. Then you got fragment lines
    mixing with each other, or with non-fragment lines, because it was
    traditionally impossible to tell whether a printk() call was a
    continuation or not.

    To at least help clarify the issue of continuation lines, we added a
    KERN_CONT marker back in 2007 to mark continuation lines:

    474925277671 ("printk: add KERN_CONT annotation").

    That continuation marker was initially an empty string, and didn't
    actuall make any semantic difference. But it at least made it possible
    to annotate the source code, and have check-patch notice that a printk()
    didn't need or want a log level marker, because it was a continuation of
    a previous line.

    To avoid the ambiguity between a continuation line that had that
    KERN_CONT marker, and a printk with no level information at all, we then
    in 2009 made KERN_CONT be a real log level marker which meant that we
    could now reliably tell the difference between the two cases.

    5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")

    and we could take advantage of that to make sure we didn't mix up
    continuation lines with lines that just didn't have any loglevel at all.

    Then, in 2012, the kernel log buffer was changed to be a "record" based
    log, where each line was a record that has a loglevel and a timestamp.

    You can see the beginning of that conversion in commits

    e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
    7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")

    with a number of follow-up commits to fix some painful fallout from that
    conversion. Over all, it took a couple of months to sort out most of
    it. But the upside was that you could have concurrent readers (and
    writers) of the kernel log and not have lines with mixed output in them.

    And one particular pain-point for the record-based kernel logging was
    exactly the fragmentary lines that are generated in smaller chunks. In
    order to still log them as one recrod, the continuation lines need to be
    attached to the previous record properly.

    However the explicit continuation record marker that is actually useful
    for this exact case was actually removed in aroundm the same time by commit

    61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")

    due to the incorrect belief that KERN_CONT wasn't meaningful. The
    ambiguity between "is this a continuation line" or "is this a plain
    printk with no log level information" was reintroduced, and in fact
    became an even bigger pain point because there was now the whole
    record-level merging of kernel messages going on.

    This patch reinstates the KERN_CONT as a real non-empty string marker,
    so that the ambiguity is fixed once again.

    But it's not a plain revert of that original removal: in the four years
    since we made KERN_CONT an empty string again, not only has the format
    of the log level markers changed, we've also had some usage changes in
    this area.

    For example, some ACPI code seems to use KERN_CONT _together_ with a log
    level, and now uses both the KERN_CONT marker and (for example) a
    KERN_INFO marker to show that it's an informational continuation of a
    line.

    Which is actually not a bad idea - if the continuation line cannot be
    attached to its predecessor, without the log level information we don't
    know what log level to assign to it (and we traditionally just assigned
    it the default loglevel). So having both a log level and the KERN_CONT
    marker is not necessarily a bad idea, but it does mean that we need to
    actually iterate over potentially multiple markers, rather than just a
    single one.

    Also, since KERN_CONT was still conceptually needed, and encouraged, but
    didn't actually _do_ anything, we've also had the reverse problem:
    rather than having too many annotations it has too few, and there is bit
    rot with code that no longer marks the continuation lines with the
    KERN_CONT marker.

    So this patch not only re-instates the non-empty KERN_CONT marker, it
    also fixes up the cases of bit-rot I noticed in my own logs.

    There are probably other cases where KERN_CONT will be needed to be
    added, either because it is new code that never dealt with the need for
    KERN_CONT, or old code that has bitrotted without anybody noticing.

    That said, we should strive to avoid the need for KERN_CONT. It does
    result in real problems for logging, and should generally not be seen as
    a good feature. If we some day can get rid of the feature entirely,
    because nobody does any fragmented printk calls, that would be lovely.

    But until that point, let's at mark the code that relies on the hacky
    multi-fragment kernel printk's. Not only does it avoid the ambiguity,
    it also annotates code as "maybe this would be good to fix some day".

    (That said, particularly during single-threaded bootup, the downsides of
    KERN_CONT are very limited. Things get much hairier when you have
    multiple threads going on and user level reading and writing logs too).

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Oct, 2016

1 commit

  • If a device tree specifies a preferred device for kernel console output
    via the stdout-path or linux,stdout-path chosen node properties or the
    stdout alias then the kernel ought to honor it & output the kernel
    console to that device. As it stands, this isn't the case. Whilst we
    parse the stdout-path properties & set an of_stdout variable from
    of_alias_scan(), and use that from of_console_check() to determine
    whether to add a console device as a preferred console whilst
    registering it, we also prefer the first registered console if no other
    has been selected at the time of its registration.

    This means that if a console other than the one the device tree selects
    via stdout-path is registered first, we will switch to using it & when
    the stdout-path console is later registered the call to
    add_preferred_console() via of_console_check() is too late to do
    anything useful. In practice this seems to mean that we switch to the
    dummy console device fairly early & see no further console output:

    Console: colour dummy device 80x25
    console [tty0] enabled
    bootconsole [ns16550a0] disabled

    Fix this by not automatically preferring the first registered console if
    one is specified by the device tree. This allows consoles to be
    registered but not enabled, and once the driver for the console selected
    by stdout-path calls of_console_check() the driver will be added to the
    list of preferred consoles before any other console has been enabled.
    When that console is then registered via register_console() it will be
    enabled as expected.

    Link: http://lkml.kernel.org/r/20160809151937.26118-1-paul.burton@imgtec.com
    Signed-off-by: Paul Burton
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: Tejun Heo
    Cc: Sergey Senozhatsky
    Cc: Jiri Slaby
    Cc: Daniel Vetter
    Cc: Ivan Delalande
    Cc: Thierry Reding
    Cc: Borislav Petkov
    Cc: Jan Kara
    Cc: Petr Mladek
    Cc: Joe Perches
    Cc: Greg Kroah-Hartman
    Cc: Rob Herring
    Cc: Frank Rowand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Burton
     

02 Sep, 2016

1 commit

  • __printk_nmi_flush() can be called from nmi_panic(), therefore it has to
    test whether it's executed in NMI context and thus must route the
    messages through deferred printk() or via direct printk().

    This is to avoid potential deadlocks, as described in commit
    cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic").

    However there remain two places where __printk_nmi_flush() does
    unconditional direct printk() calls:

    - pr_err("printk_nmi_flush: internal error ...")
    - pr_cont("\n")

    Factor out print_nmi_seq_line() parts into a new printk_nmi_flush_line()
    function, which takes care of in_nmi(), and use it in
    __printk_nmi_flush() for printing and error-reporting.

    Link: http://lkml.kernel.org/r/20160830161354.581-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Cc: Petr Mladek
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

27 Aug, 2016

1 commit

  • Commit bbeddf52adc1 ("printk: move braille console support into separate
    braille.[ch] files") moved the parsing of braille-related options into
    _braille_console_setup(), changing the type of variable str from char*
    to char**. In this commit, memcmp(str, "brl,", 4) was correctly updated
    to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4).

    Update the code to make "brl=" option work again and replace memcmp()
    with strncmp() to make the compiler able to detect such an issue.

    Fixes: bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files")
    Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org
    Signed-off-by: Nicolas Iooss
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Iooss
     

10 Aug, 2016

1 commit

  • This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142.

    Geert Uytterhoeven reports:
    "This change seems to have an (unintendent?) side-effect.

    Before, pr_*() calls without a trailing newline characters would be
    printed with a newline character appended, both on the console and in
    the output of the dmesg command.

    After this commit, no new line character is appended, and the output
    of the next pr_*() call of the same type may be appended, like in:

    - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
    - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
    + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"

    Joe Perches says:
    "No, that is not intentional.

    The newline handling code inside vprintk_emit is a bit involved and
    for now I suggest a revert until this has all the same behavior as
    earlier"

    Reported-by: Geert Uytterhoeven
    Requested-by: Joe Perches
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

09 Aug, 2016

1 commit

  • In commit 874f9c7da9a4 ("printk: create pr_ functions"), new
    pr_level defines were added to printk.c.

    These new defines are guarded by an #ifdef CONFIG_PRINTK - however,
    there is already a surrounding #ifdef CONFIG_PRINTK starting a lot
    earlier in line 249 which means the newly introduced #ifdef is
    unnecessary.

    Let's remove it to avoid confusion.

    Signed-off-by: Andreas Ziegler
    Cc: Joe Perches
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Ziegler
     

03 Aug, 2016

5 commits

  • Add a "printk.devkmsg" kernel command line parameter which controls how
    userspace writes into /dev/kmsg. It has three options:

    * ratelimit - ratelimit logging from userspace.
    * on - unlimited logging from userspace
    * off - logging from userspace gets ignored

    The default setting is to ratelimit the messages written to it.

    This changes the kernel default setting of "on" to "ratelimit" and we do
    that because we want to keep userspace spamming /dev/kmsg to sane
    levels. This is especially moot when a small kernel log buffer wraps
    around and messages get lost. So the ratelimiting setting should be a
    sane setting where kernel messages should have a bit higher chance of
    survival from all the spamming.

    It additionally does not limit logging to /dev/kmsg while the system is
    booting if we haven't disabled it on the command line.

    Furthermore, we can control the logging from a lower priority sysctl
    interface - kernel.printk_devkmsg.

    That interface will succeed only if printk.devkmsg *hasn't* been
    supplied on the command line. If it has, then printk.devkmsg is a
    one-time setting which remains for the duration of the system lifetime.
    This "locking" of the setting is to prevent userspace from changing the
    logging on us through sysctl(2).

    This patch is based on previous patches from Linus and Steven.

    [bp@suse.de: fixes]
    Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
    Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
    Signed-off-by: Borislav Petkov
    Cc: Dave Young
    Cc: Franck Bui
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Uwe Kleine-König
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • asm-generic headers are generic implementations for architecture
    specific code and should not be included by common code. Thus use the
    asm/ version of sections.h to get at the linker sections.

    Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Messages' levels and console log level are inspected when the actual
    printing occurs, which may provoke console_unlock() and
    console_cont_flush() to waste CPU cycles on every message that has
    loglevel above the current console_loglevel.

    Schematically, console_unlock() does the following:

    console_unlock()
    {
    ...
    for (;;) {
    ...
    raw_spin_lock_irqsave(&logbuf_lock, flags);
    skip:
    msg = log_from_idx(console_idx);

    if (msg->flags & LOG_NOCONS) {
    ...
    goto skip;
    }

    level = msg->level;
    len += msg_print_text(); >> sprintfs
    memcpy,
    etc.

    if (nr_ext_console_drivers) {
    ext_len = msg_print_ext_header(); >> scnprintf
    ext_len += msg_print_ext_body(); >> scnprintfs
    etc.
    }
    ...
    raw_spin_unlock(&logbuf_lock);

    call_console_drivers(level, ext_text, ext_len, text, len)
    {
    if (level >= console_loglevel && >> drop the message
    !ignore_loglevel)
    return;

    console->write(...);
    }

    local_irq_restore(flags);
    }
    ...
    }

    The thing here is this deferred `level >= console_loglevel' check. We
    are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
    that we will eventually drop.

    This can be huge when we register a new CON_PRINTBUFFER console, for
    instance. For every such a console register_console() resets the

    console_seq, console_idx, console_prev

    and sets a `exclusive console' pointer to replay the log buffer to that
    just-registered console. And there can be a lot of messages to replay,
    in the worst case most of which can be dropped after console_loglevel
    test.

    We know messages' levels long before we call msg_print_text() and
    friends, so we can just move console_loglevel check out of
    call_console_drivers() and format a new message only if we are sure that
    it won't be dropped.

    The patch factors out loglevel check into suppress_message_printing()
    function and tests message->level and console_loglevel before formatting
    functions in console_unlock() and console_cont_flush() are getting
    executed. This improves things not only for exclusive CON_PRINTBUFFER
    consoles, but for every console_unlock() that attempts to print a
    message of level above the console_loglevel.

    Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Tejun Heo
    Cc: Jan Kara
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Using functions instead of macros can reduce overall code size by
    eliminating unnecessary "KERN_SOH" prefixes from format strings.

    defconfig x86-64:

    $ size vmlinux*
    text data bss dec hex filename
    10193570 4331464 1105920 15630954 ee826a vmlinux.new
    10192623 4335560 1105920 15634103 ee8eb7 vmlinux.old

    As the return value are unimportant and unused in the kernel tree, these
    new functions return void.

    Miscellanea:

    - change pr_ macros to call new __pr_ functions
    - change vprintk_nmi and vprintk_default to add LOGLEVEL_ argument

    [akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe]
    Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • A trivial cosmetic change: interrupt.h header is redundant since commit
    6b898c07cb1d ("console: use might_sleep in console_lock").

    Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

29 Jul, 2016

1 commit

  • We currently show:

    task: ti: task.ti: "

    "ti" and "task.ti" are redundant, and neither is actually what we want
    to show, which the the base of the thread stack. Change the display to
    show the stack pointer explicitly.

    Link: http://lkml.kernel.org/r/543ac5bd66ff94000a57a02e11af7239571a3055.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

21 May, 2016

4 commits

  • In NMI context, printk() messages are stored into per-CPU buffers to
    avoid a possible deadlock. They are normally flushed to the main ring
    buffer via an IRQ work. But the work is never called when the system
    calls panic() in the very same NMI handler.

    This patch tries to flush NMI buffers before the crash dump is
    generated. In this case it does not risk a double release and bails out
    when the logbuf_lock is already taken. The aim is to get the messages
    into the main ring buffer when possible. It makes them better
    accessible in the vmcore.

    Then the patch tries to flush the buffers second time when other CPUs
    are down. It might be more aggressive and reset logbuf_lock. The aim
    is to get the messages available for the consequent kmsg_dump() and
    console_flush_on_panic() calls.

    The patch causes vprintk_emit() to be called even in NMI context again.
    But it is done via printk_deferred() so that the console handling is
    skipped. Consoles use internal locks and we could not prevent a
    deadlock easily. They are explicitly called later when the crash dump
    is not generated, see console_flush_on_panic().

    Signed-off-by: Petr Mladek
    Cc: Benjamin Herrenschmidt
    Cc: Daniel Thompson
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jiri Kosina
    Cc: Martin Schwidefsky
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Testing has shown that the backtrace sometimes does not fit into the 4kB
    temporary buffer that is used in NMI context. The warnings are gone
    when I double the temporary buffer size.

    This patch doubles the buffer size and makes it configurable.

    Note that this problem existed even in the x86-specific implementation
    that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI
    stack trace on all CPUs"). Nobody noticed it because it did not print
    any warnings.

    Signed-off-by: Petr Mladek
    Cc: Jan Kara
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Russell King
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • We could not resize the temporary buffer in NMI context. Let's warn if
    a message is lost.

    This is rather theoretical. printk() should not be used in NMI. The
    only sensible use is when we want to print backtrace from all CPUs. The
    current buffer should be enough for this purpose.

    [akpm@linux-foundation.org: whitespace fixlet]
    Signed-off-by: Petr Mladek
    Cc: Jan Kara
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Russell King
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • printk() takes some locks and could not be used a safe way in NMI
    context.

    The chance of a deadlock is real especially when printing stacks from
    all CPUs. This particular problem has been addressed on x86 by the
    commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
    CPUs").

    The patchset brings two big advantages. First, it makes the NMI
    backtraces safe on all architectures for free. Second, it makes all NMI
    messages almost safe on all architectures (the temporary buffer is
    limited. We still should keep the number of messages in NMI context at
    minimum).

    Note that there already are several messages printed in NMI context:
    WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
    handlers. These are not easy to avoid.

    This patch reuses most of the code and makes it generic. It is useful
    for all messages and architectures that support NMI.

    The alternative printk_func is set when entering and is reseted when
    leaving NMI context. It queues IRQ work to copy the messages into the
    main ring buffer in a safe context.

    __printk_nmi_flush() copies all available messages and reset the buffer.
    Then we could use a simple cmpxchg operations to get synchronized with
    writers. There is also used a spinlock to get synchronized with other
    flushers.

    We do not longer use seq_buf because it depends on external lock. It
    would be hard to make all supported operations safe for a lockless use.
    It would be confusing and error prone to make only some operations safe.

    The code is put into separate printk/nmi.c as suggested by Steven
    Rostedt. It needs a per-CPU buffer and is compiled only on
    architectures that call nmi_enter(). This is achieved by the new
    HAVE_NMI Kconfig flag.

    The are MN10300 and Xtensa architectures. We need to clean up NMI
    handling there first. Let's do it separately.

    The patch is heavily based on the draft from Peter Zijlstra, see

    https://lkml.org/lkml/2015/6/10/327

    [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
    [akpm@linux-foundation.org: min_t->min - all types are size_t here]
    Signed-off-by: Petr Mladek
    Suggested-by: Peter Zijlstra
    Suggested-by: Steven Rostedt
    Cc: Jan Kara
    Acked-by: Russell King [arm part]
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

18 Mar, 2016

4 commits

  • This allows us to extract from the vmcore only the messages emitted
    since the last time the ring buffer was cleared. We just have to make
    sure its value is always up-to-date, when old messages are discarded to
    free space in log_make_free_space() for example.

    Signed-off-by: Zeyu Zhao
    Signed-off-by: Ivan Delalande
    Cc: Kay Sievers
    Cc: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ivan Delalande
     
  • have_callable_console() must also test CON_ENABLED bit, not just
    CON_ANYTIME. We may have disabled CON_ANYTIME console so printk can
    wrongly assume that it's safe to call_console_drivers().

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • console_unlock() allows to cond_resched() if its caller has set
    `console_may_schedule' to 1, since 8d91f8b15361 ("printk: do
    cond_resched() between lines while outputting to consoles").

    The rules are:
    -- console_lock() always sets `console_may_schedule' to 1
    -- console_trylock() always sets `console_may_schedule' to 0

    However, console_trylock() callers (among them is printk()) do not
    always call printk() from atomic contexts, and some of them can
    cond_resched() in console_unlock(), so console_trylock() can set
    `console_may_schedule' to 1 for such processes.

    For !CONFIG_PREEMPT_COUNT kernels, however, console_trylock() always
    sets `console_may_schedule' to 0.

    It's possible to drop explicit preempt_disable()/preempt_enable() in
    vprintk_emit(), because console_unlock() and console_trylock() are now
    smart enough:
    a) console_unlock() does not cond_resched() when it's unsafe
    (console_trylock() takes care of that)
    b) console_unlock() does can_use_console() check.

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • console_unlock() allows to cond_resched() if its caller has set
    `console_may_schedule' to 1 (this functionality is present since
    8d91f8b15361 ("printk: do cond_resched() between lines while outputting
    to consoles").

    The rules are:
    -- console_lock() always sets `console_may_schedule' to 1
    -- console_trylock() always sets `console_may_schedule' to 0

    printk() calls console_unlock() with preemption desabled, which
    basically can lead to RCU stalls, watchdog soft lockups, etc. if
    something is simultaneously calling printk() frequent enough (IOW,
    console_sem owner always has new data to send to console divers and
    can't leave console_unlock() for a long time).

    printk()->console_trylock() callers do not necessarily execute in atomic
    contexts, and some of them can cond_resched() in console_unlock().
    console_trylock() can set `console_may_schedule' to 1 (allow
    cond_resched() later in consoe_unlock()) when it's safe.

    This patch (of 3):

    vprintk_emit() disables preemption around console_trylock_for_printk()
    and console_unlock() calls for a strong reason -- can_use_console()
    check. The thing is that vprintl_emit() can be called on a CPU that is
    not fully brought up yet (!cpu_online()), which potentially can cause
    problems if console driver wants to access per-cpu data. A console
    driver can explicitly state that it's safe to call it from !online cpu
    by setting CON_ANYTIME bit in console ->flags. That's why for
    !cpu_online() can_use_console() iterates all the console to find out if
    there is a CON_ANYTIME console, otherwise console_unlock() must be
    avoided.

    can_use_console() ensures that console_unlock() call is safe in
    vprintk_emit() only; console_lock() and console_trylock() are not
    covered by this check. Even though call_console_drivers(), invoked from
    console_cont_flush() and console_unlock(), tests `!cpu_online() &&
    CON_ANYTIME' for_each_console(), it may be too late, which can result in
    messages loss.

    Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
    CON_ANYTIME consoles available.

    CPU0 online CPU1 !online
    console_trylock()
    ...
    console_unlock()
    console_cont_flush
    spin_lock logbuf_lock
    if (!cont.len) {
    spin_unlock logbuf_lock
    return
    }
    for (;;) {
    vprintk_emit
    spin_lock logbuf_lock
    log_store
    spin_unlock logbuf_lock
    spin_lock logbuf_lock
    !console_trylock_for_printk msg_print_text
    return console_idx = log_next()
    console_seq++
    console_prev = msg->flags
    spin_unlock logbuf_lock

    call_console_drivers()
    for_each_console(con) {
    if (!cpu_online() &&
    !(con->flags & CON_ANYTIME))
    continue;
    }
    /*
    * no message printed, we lost it
    */
    vprintk_emit
    spin_lock logbuf_lock
    log_store
    spin_unlock logbuf_lock
    !console_trylock_for_printk
    return
    /*
    * go to the beginning of the loop,
    * find out there are new messages,
    * lose it
    */
    }

    console_trylock()/console_lock() call on CPU1 may come from cpu
    notifiers registered on that CPU. Since notifiers are not getting
    unregistered when CPU is going DOWN, all of the notifiers receive
    notifications during CPU UP. For example, on my x86_64, I see around 50
    notification sent from offline CPU to itself

    [swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify

    while doing
    echo 0 > /sys/devices/system/cpu/cpu2/online
    echo 1 > /sys/devices/system/cpu/cpu2/online

    So grabbing the console_sem lock while CPU is !online is possible,
    in theory.

    This patch moves can_use_console() check out of
    console_trylock_for_printk(). Instead it calls it in console_unlock(),
    so now console_lock()/console_unlock() are also 'protected' by
    can_use_console(). This also means that console_trylock_for_printk() is
    not really needed anymore and can be removed.

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

21 Jan, 2016

1 commit

  • On architectures that have support for efficient unaligned access struct
    printk_log has 4-byte alignment. Specify alignment attribute in type
    declaration.

    The whole point of this patch is to fix deadlock which happening when
    UBSAN detects unaligned access in printk() thus UBSAN recursively calls
    printk() with logbuf_lock held by top printk() call.

    Signed-off-by: Andrey Ryabinin
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: Randy Dunlap
    Cc: Rasmus Villemoes
    Cc: Jonathan Corbet
    Cc: Michal Marek
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yury Gribov
    Cc: Dmitry Vyukov
    Cc: Konstantin Khlebnikov
    Cc: Kostya Serebryany
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

17 Jan, 2016

3 commits

  • `recursion_bug' is used as recursion_bug toggle, so make it `bool'.

    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • @console_may_schedule tracks whether console_sem was acquired through
    lock or trylock. If the former, we're inside a sleepable context and
    console_conditional_schedule() performs cond_resched(). This allows
    console drivers which use console_lock for synchronization to yield
    while performing time-consuming operations such as scrolling.

    However, the actual console outputting is performed while holding
    irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
    before starting outputting lines. Also, only a few drivers call
    console_conditional_schedule() to begin with. This means that when a
    lot of lines need to be output by console_unlock(), for example on a
    console registration, the task doing console_unlock() may not yield for
    a long time on a non-preemptible kernel.

    If this happens with a slow console devices, for example a serial
    console, the outputting task may occupy the cpu for a very long time.
    Long enough to trigger softlockup and/or RCU stall warnings, which in
    turn pile more messages, sometimes enough to trigger the next cycle of
    warnings incapacitating the system.

    Fix it by making console_unlock() insert cond_resched() between lines if
    @console_may_schedule.

    Signed-off-by: Tejun Heo
    Reported-by: Calvin Owens
    Acked-by: Jan Kara
    Cc: Dave Jones
    Cc: Kyle McMartin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Boot consoles are typically replaced by proper consoles during the boot
    process. This can be problematic if the boot console data is part of
    the init section that is reclaimed late during boot. If the proper
    console does not register before this point in time, the boot console
    will need to be removed (so that the freed memory is not accessed),
    leaving the system without output for some time.

    There are various reasons why the proper console may not register early
    enough, such as deferred probe or the driver being a loadable module.
    If that happens, there is some amount of time where no console messages
    are visible to the user, which in turn can mean that they won't see
    crashes or other potentially useful information.

    To avoid this situation, only remove the boot console when it resides in
    the init section. Code exists to replace the boot console by the proper
    console when it is registered, keeping a seamless transition between the
    boot and proper consoles.

    Signed-off-by: Thierry Reding
    Cc: Arnd Bergmann
    Cc: Greg Kroah-Hartman
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thierry Reding
     

07 Nov, 2015

1 commit

  • The following statement of ABI/testing/dev-kmsg is not quite right:

    It is not possible to inject messages from userspace with the
    facility number LOG_KERN (0), to make sure that the origin of the
    messages can always be reliably determined.

    Userland actually can inject messages with a facility of 0 by abusing the
    fact that the facility is stored in a u8 data type. By using a facility
    which is a multiple of 256 the assignment of msg->facility in log_store()
    implicitly truncates it to 0, i.e. LOG_KERN, allowing users of /dev/kmsg
    to spoof kernel messages as shown below:

    The following call...
    # printf 'Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
    ...leads to the following log entry (dmesg -x | tail -n 1):
    user :emerg : [ 66.137758] Kernel panic - not syncing: beer empty

    However, this call...
    # printf 'Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
    ...leads to the slightly different log entry (note the kernel facility):
    kern :emerg : [ 74.177343] Kernel panic - not syncing: beer empty

    Fix that by limiting the user provided facility to 8 bit right from the
    beginning and catch the truncation early.

    Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...")
    Signed-off-by: Mathias Krause
    Cc: Greg Kroah-Hartman
    Cc: Petr Mladek
    Cc: Alex Elder
    Cc: Joe Perches
    Cc: Kay Sievers
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     

22 Oct, 2015

1 commit

  • pstore doesn't support unregistering yet. It was marked as TODO.
    This patch adds some code to fix it:
    1) Add functions to unregister kmsg/console/ftrace/pmsg.
    2) Add a function to free compression buffer.
    3) Unmap the memory and free it.
    4) Add a function to unregister pstore filesystem.

    Signed-off-by: Geliang Tang
    Acked-by: Kees Cook
    [Removed __exit annotation from ramoops_remove(). Reported by Arnd Bergmann]
    Signed-off-by: Tony Luck

    Geliang Tang
     

11 Sep, 2015

1 commit

  • There are two kexec load syscalls, kexec_load another and kexec_file_load.
    kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
    split kexec_load syscall code to kernel/kexec.c.

    And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
    use kexec_file_load only, or vice verse.

    The original requirement is from Ted Ts'o, he want kexec kernel signature
    being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
    kexec_load syscall can bypass the checking.

    Vivek Goyal proposed to create a common kconfig option so user can compile
    in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
    KEXEC_CORE so that old config files still work.

    Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
    architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
    KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
    kexec_load syscall.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Young
    Cc: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Petr Tesarik
    Cc: Theodore Ts'o
    Cc: Josh Boyer
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

01 Jul, 2015

1 commit

  • The comment about /dev/kmsg does not mention the additional values which
    may actually be exported, fix that.

    Also move up the part of the comment instructing the users to ignore these
    additional values, this way the reading is more fluent and logically
    compact.

    Signed-off-by: Antonio Ospite
    Cc: Joe Perches
    Cc: Jonathan Corbet
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Antonio Ospite
     

26 Jun, 2015

5 commits

  • Patch fixes drawbacks in heck_syslog_permissions() noticed by AKPM:
    "from_file handling makes me cry.

    That's not a boolean - it's an enumerated value with two values
    currently defined.

    But the code in check_syslog_permissions() treats it as a boolean and
    also hardwires the knowledge that SYSLOG_FROM_PROC == 1 (or == `true`).

    And the name is wrong: it should be called from_proc to match
    SYSLOG_FROM_PROC."

    Signed-off-by: Vasily Averin
    Cc: Kees Cook
    Cc: Josh Boyer
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • The final version of commit 637241a900cb ("kmsg: honor dmesg_restrict
    sysctl on /dev/kmsg") lost few hooks, as result security_syslog() are
    processed incorrectly:

    - open of /dev/kmsg checks syslog access permissions by using
    check_syslog_permissions() where security_syslog() is not called if
    dmesg_restrict is set.

    - syslog syscall and /proc/kmsg calls do_syslog() where security_syslog
    can be executed twice (inside check_syslog_permissions() and then
    directly in do_syslog())

    With this patch security_syslog() is called once only in all
    syslog-related operations regardless of dmesg_restrict value.

    Fixes: 637241a900cb ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
    Signed-off-by: Vasily Averin
    Cc: Kees Cook
    Cc: Josh Boyer
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • printk log_buf keeps various metadata for each message including its
    sequence number and timestamp. The metadata is currently available only
    through /dev/kmsg and stripped out before passed onto console drivers. We
    want this metadata to be available to console drivers too so that console
    consumers can get full information including the metadata and dictionary,
    which among other things can be used to detect whether messages got lost
    in transit.

    This patch implements support for extended console drivers. Consoles can
    indicate that they want extended messages by setting the new CON_EXTENDED
    flag and they'll be fed messages formatted the same way as /dev/kmsg.

    ",,,;\n"

    If extended consoles exist, in-kernel fragment assembly is disabled. This
    ensures that all messages emitted to consoles have full metadata including
    sequence number. The contflag carries enough information to reassemble
    the fragments from the reader side trivially. Note that this only affects
    /dev/kmsg. Regular console and /proc/kmsg outputs are not affected by
    this change.

    * Extended message formatting for console drivers is enabled iff there
    are registered extended consoles.

    * Comment describing /dev/kmsg message format updated to add missing
    contflag field and help distinguishing variable from verbatim terms.

    Signed-off-by: Tejun Heo
    Cc: David Miller
    Cc: Kay Sievers
    Reviewed-by: Petr Mladek
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • The extended message formatting used for /dev/kmsg will be used implement
    extended consoles. Factor out msg_print_ext_header() and
    msg_print_ext_body() from devkmsg_read().

    This is pure restructuring.

    Signed-off-by: Tejun Heo
    Cc: David Miller
    Cc: Kay Sievers
    Reviewed-by: Petr Mladek
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • This patchset updates netconsole so that it can emit messages with the
    same header as used in /dev/kmsg which gives neconsole receiver full log
    information which enables things like structured logging and detection
    of lost messages.

    This patch (of 7):

    devkmsg_read() uses 8k buffer and assumes that the formatted output
    message won't overrun which seems safe given LOG_LINE_MAX, the current use
    of dict and the escaping method being used; however, we're planning to use
    devkmsg formatting wider and accounting for the buffer size properly isn't
    that complicated.

    This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates devkmsg_read()
    so that it limits output accordingly.

    Signed-off-by: Tejun Heo
    Cc: David Miller
    Cc: Kay Sievers
    Reviewed-by: Petr Mladek
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo