05 Sep, 2018

1 commit

  • commit 03fc7f9c99c1e7ae2925d459e8487f1a6f199f79 upstream.

    The commit 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI
    when logbuf_lock is available") brought back the possible deadlocks
    in printk() and NMI.

    The check of logbuf_lock is done only in printk_nmi_enter() to prevent
    mixed output. But another CPU might take the lock later, enter NMI, and:

    + Both NMIs might be serialized by yet another lock, for example,
    the one in nmi_cpu_backtrace().

    + The other CPU might get stopped in NMI, see smp_send_stop()
    in panic().

    The only safe solution is to use trylock when storing the message
    into the main log-buffer. It might cause reordering when some lines
    go to the main log buffer directly and others are delayed via
    the per-CPU buffer. It means that it is not useful in general.

    This patch replaces the problematic NMI deferred context with NMI
    direct context. It can be used to mark code that might produce
    many messages in NMI and the risk of losing them is more critical
    than problems with eventual reordering.

    The context is then used when dumping trace buffers on oops. It was
    the primary motivation for the original fix. Also, the reordering is
    an even smaller issue there because some traces have their own time stamps.

    Finally, nmi_cpu_backtrace() no longer needs to be serialized because
    it will always use the per-CPU buffers again.
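
    The trade-off described above can be illustrated with a small userspace
    sketch. This is not kernel code: the buffers, the boolean standing in for
    logbuf_lock, and all function names are hypothetical. It only shows why a
    trylock with a per-CPU fallback is safe but may reorder lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
#define BUF_SIZE 256

bool main_log_locked;            /* models logbuf_lock being held elsewhere */
char main_log[BUF_SIZE];         /* models the main log buffer */
char percpu_buf[BUF_SIZE];       /* models one CPU's NMI buffer */

/* Append msg to a NUL-terminated buffer, truncating when it is full. */
void append(char *buf, const char *msg)
{
	size_t used = strlen(buf);

	assert(used < BUF_SIZE);
	snprintf(buf + used, BUF_SIZE - used, "%s", msg);
}

/* Returns true when the message went straight to the main log. */
bool nmi_direct_print(const char *msg)
{
	if (main_log_locked) {
		/* trylock failed: never spin in NMI, delay the message */
		append(percpu_buf, msg);
		return false;
	}
	main_log_locked = true;
	append(main_log, msg);
	main_log_locked = false;
	return true;
}

/* Later, from a safe context, move the delayed messages to the main log. */
void flush_percpu(void)
{
	append(main_log, percpu_buf);
	percpu_buf[0] = '\0';
}
```

    If the second of three messages is printed while the lock is held, it
    lands in the main log after the third one once the per-CPU buffer is
    flushed. That is the reordering the commit message accepts.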

    Fixes: 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180627142028.11259-1-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 May, 2017

1 commit

  • Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
    reuse crashkernel parameter for fadump", v4.

    Traditionally, kdump is used to save vmcore in case of a crash. Some
    architectures like powerpc can save vmcore using architecture specific
    support instead of kexec/kdump mechanism. Such architecture specific
    support also needs to reserve memory, to be used by dump capture kernel.
    crashkernel parameter can be reused, for memory reservation, by such
    architecture specific infrastructure.

    This patchset removes dependency with CONFIG_KEXEC for crashkernel
    parameter and vmcoreinfo related code as it can be reused without kexec
    support. Also, crashkernel parameter is reused instead of
    fadump_reserve_mem to reserve memory for fadump.

    The first patch moves crashkernel parameter parsing and vmcoreinfo
    related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The
    second patch reuses the definitions of append_elf_note() & final_note()
    functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch
    removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
    in powerpc. The next patch reuses crashkernel parameter for reserving
    memory for fadump, instead of the fadump_reserve_mem parameter. This
    has the advantage of using all syntaxes crashkernel parameter supports,
    for fadump as well. The last patch updates fadump kernel documentation
    about use of crashkernel parameter.

    This patch (of 5):

    Traditionally, kdump is used to save vmcore in case of a crash. Some
    architectures like powerpc can save vmcore using architecture specific
    support instead of kexec/kdump mechanism. Such architecture specific
    support also needs to reserve memory, to be used by dump capture kernel.
    crashkernel parameter can be reused, for memory reservation, by such
    architecture specific infrastructure.

    But currently, code related to vmcoreinfo and parsing of crashkernel
    parameter is built under CONFIG_KEXEC_CORE. This patch introduces
    CONFIG_CRASH_CORE and moves the above mentioned code under this config,
    allowing code reuse without dependency on CONFIG_KEXEC. There is no
    functional change with this patch.

    Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Acked-by: Dave Young
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Eric Biederman
    Cc: Mahesh Salgaonkar
    Cc: Vivek Goyal
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     

08 Feb, 2017

2 commits

  • This patch extends the idea of NMI per-cpu buffers to regions
    that may cause recursive printk() calls and possible deadlocks.
    Namely, printk() can't handle printk calls from schedule code
    or printk() calls from lock debugging code (spin_dump() for instance);
    because those may be called with `sem->lock' already taken or any
    other `critical' locks (p->pi_lock, etc.). An example of deadlock
    can be

    vprintk_emit()
     console_unlock()
      up()                        << raw_spin_lock_irqsave(&sem->lock, flags);
       wake_up_process()
        try_to_wake_up()
         ttwu_queue()
          ttwu_activate()
           activate_task()
            enqueue_task()
             enqueue_task_fair()
              cfs_rq_of()
               task_of()
                WARN_ON_ONCE(!entity_is_task(se))
                 vprintk_emit()
                  console_trylock()
                   down_trylock()
                    raw_spin_lock_irqsave(&sem->lock, flags)
                    ^^^^ deadlock

    and some other cases.

    Just like in NMI implementation, the solution uses a per-cpu
    `printk_func' pointer to 'redirect' printk() calls to a 'safe'
    callback, that stores messages in a per-cpu buffer and flushes
    them back to logbuf buffer later.

    Usage example:

    printk()
     printk_safe_enter_irqsave(flags)
     //
     //  any printk() call from here will end up in vprintk_safe(),
     //  that stores messages in a special per-CPU buffer.
     //
     printk_safe_exit_irqrestore(flags)

    The 'redirection' mechanism, though, has been reworked, as suggested
    by Petr Mladek. Instead of using a per-cpu @print_func callback we now
    keep a per-cpu printk-context variable and call either default or nmi
    vprintk function depending on its value. printk_nmi_enter/exit and
    printk_safe_enter/exit, thus, just set/clear corresponding bits in
    the printk-context variable.
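
    A minimal sketch of that context-variable dispatch, written as plain
    userspace C. The bit layout, the nesting counter for the safe context,
    and every name below are assumptions made for illustration; in the kernel
    the variable would be per-CPU.

```c
#include <assert.h>

/* Hypothetical bit layout and names, chosen only for this sketch. */
#define CTX_SAFE_MASK 0x0fu   /* nesting count of "safe" sections */
#define CTX_NMI_MASK  0x10u   /* set while handling an NMI */

enum route { ROUTE_DEFAULT, ROUTE_SAFE, ROUTE_NMI };

unsigned int printk_ctx;      /* would be a per-CPU variable in the kernel */

void ctx_safe_enter(void)
{
	/* refuse to overflow the nesting counter into the NMI bit */
	assert((printk_ctx & CTX_SAFE_MASK) < CTX_SAFE_MASK);
	printk_ctx++;
}

void ctx_safe_exit(void)
{
	assert(printk_ctx & CTX_SAFE_MASK);
	printk_ctx--;
}

void ctx_nmi_enter(void) { printk_ctx |= CTX_NMI_MASK; }
void ctx_nmi_exit(void)  { printk_ctx &= ~CTX_NMI_MASK; }

/* Pick the printing path from the context value alone. */
enum route route_printk(void)
{
	if (printk_ctx & CTX_NMI_MASK)
		return ROUTE_NMI;
	if (printk_ctx & CTX_SAFE_MASK)
		return ROUTE_SAFE;
	return ROUTE_DEFAULT;
}
```

    Compared with swapping a function pointer, the dispatch here is a pure
    function of one integer, so nested contexts restore themselves when their
    bits are cleared.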

    The patch only adds printk_safe support, we don't use it yet.

    Link: http://lkml.kernel.org/r/20161227141611.940-4-sergey.senozhatsky@gmail.com
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Calvin Owens
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Peter Hurley
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Reviewed-by: Steven Rostedt (VMware)

    Sergey Senozhatsky
     
  • A preparation patch for printk_safe work. No functional change.
    - rename nmi.c to printk_safe.c
    - add `printk_safe' prefix to some of the exported functions (those
    used by both printk-safe and printk-nmi).

    Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Calvin Owens
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Peter Hurley
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek

    Sergey Senozhatsky
     

13 Dec, 2016

3 commits

  • Add a configuration option to set the default console loglevel. This
    is, as before, still possible to override at runtime through bootargs
    (loglevel=), sysrq and /proc/printk.

    There are cases where adding additional arguments on the commandline is
    impractical, and changing the default for the kernel when being built
    makes more sense. Provide such a method here, for those who choose to
    do so.

    Also, while touching this code, clarify the difference between
    MESSAGE_LOGLEVEL_DEFAULT and CONSOLE_LOGLEVEL_DEFAULT.
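
    The difference between the two defaults can be sketched as follows. The
    numeric values and the helper function are assumptions for illustration
    (real values come from the kernel configuration): one default is the
    level assigned to a message that carries none, the other is the threshold
    below which messages reach the console.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed build-time defaults for this sketch; real values come from Kconfig. */
#define MESSAGE_LOGLEVEL_DEFAULT 4   /* level given to messages without one */
#define CONSOLE_LOGLEVEL_DEFAULT 7   /* console shows levels below this */

int console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;  /* overridable at runtime */

/* msg_level < 0 means "the caller gave no level". */
bool shown_on_console(int msg_level)
{
	assert(msg_level <= 7);
	if (msg_level < 0)
		msg_level = MESSAGE_LOGLEVEL_DEFAULT;
	return msg_level < console_loglevel;
}
```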

    Link: http://lkml.kernel.org/r/1479676829-30031-1-git-send-email-olof@lixom.net
    Signed-off-by: Olof Johansson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olof Johansson
     
  • Commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing
    continuation lines") allows defining more message headers for a single
    message. The motivation is that continuous lines might get mixed.
    Therefore it makes sense to define the right log level for every piece of
    a cont line.

    The current btrfs_printk() macros do not support continuous lines at the
    moment. But better be prepared for custom messages and avoid
    potential "lvl" buffer overflow.

    This patch iterates over the entire message header. It is interested
    only in the message level like the original code.

    This patch also introduces PRINTK_MAX_SINGLE_HEADER_LEN. Two bytes
    are enough for the message level header at the moment. But it used to
    be three, see the commit 04d2c8c83d0e ("printk: convert the format for
    KERN_ to a 2 byte pattern").

    Also I fixed the default ratelimit level. It looked very strange when it
    was different from the default log level.

    [pmladek@suse.com: Fix a check of the valid message level]
    Link: http://lkml.kernel.org/r/20161111183236.GD2145@dhcp128.suse.cz
    Link: http://lkml.kernel.org/r/1478695291-12169-4-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: David Sterba
    Cc: Joe Perches
    Cc: Sergey Senozhatsky
    Cc: Steven Rostedt
    Cc: Jason Wessel
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: Chris Mason
    Cc: Josef Bacik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing
    continuation lines") allows defining more message headers for a single
    message. The motivation is that continuous lines might get mixed.
    Therefore it makes sense to define the right log level for every piece of
    a cont line.

    This patch introduces printk_skip_headers() that will skip all headers
    and uses it in the kdb code instead of printk_skip_level().

    This approach helps to fix other printk_skip_level() users
    independently.
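
    A userspace sketch of skipping every header rather than only the first.
    The function names are hypothetical, and the set of accepted header
    characters ('0' to '7' plus 'c' for a continuation) is an assumption of
    this sketch; each header is taken to be a start-of-header byte followed
    by one character.

```c
#include <assert.h>
#include <stddef.h>

#define SOH '\001'   /* start-of-header byte that introduces each header */

/* Return the header character, or 0 if s does not start with a header. */
int get_level(const char *s)
{
	assert(s != NULL);
	if (s[0] == SOH && s[1]) {
		char c = s[1];

		if ((c >= '0' && c <= '7') || c == 'c')
			return c;
	}
	return 0;
}

/* Step over all leading two-byte headers and return the message text. */
const char *skip_headers(const char *s)
{
	while (get_level(s))
		s += 2;
	return s;
}
```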

    Link: http://lkml.kernel.org/r/1478695291-12169-3-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Cc: Joe Perches
    Cc: Sergey Senozhatsky
    Cc: Steven Rostedt
    Cc: Jason Wessel
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: David Sterba
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

10 Oct, 2016

1 commit

  • Long long ago the kernel log buffer was a buffered stream of bytes, very
    much like stdio in user space. It supported log levels by scanning the
    stream and noticing the log level markers at the beginning of each line,
    but if you wanted to print a partial line in multiple chunks, you just
    did multiple printk() calls, and it just automatically worked.

    Except when it didn't, and you had very confusing output when different
    lines got all mixed up with each other. Then you got fragment lines
    mixing with each other, or with non-fragment lines, because it was
    traditionally impossible to tell whether a printk() call was a
    continuation or not.

    To at least help clarify the issue of continuation lines, we added a
    KERN_CONT marker back in 2007 to mark continuation lines:

    474925277671 ("printk: add KERN_CONT annotation").

    That continuation marker was initially an empty string, and didn't
    actually make any semantic difference. But it at least made it possible
    to annotate the source code, and have check-patch notice that a printk()
    didn't need or want a log level marker, because it was a continuation of
    a previous line.

    To avoid the ambiguity between a continuation line that had that
    KERN_CONT marker, and a printk with no level information at all, we then
    in 2009 made KERN_CONT be a real log level marker which meant that we
    could now reliably tell the difference between the two cases.

    5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")

    and we could take advantage of that to make sure we didn't mix up
    continuation lines with lines that just didn't have any loglevel at all.

    Then, in 2012, the kernel log buffer was changed to be a "record" based
    log, where each line was a record that has a loglevel and a timestamp.

    You can see the beginning of that conversion in commits

    e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
    7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")

    with a number of follow-up commits to fix some painful fallout from that
    conversion. Over all, it took a couple of months to sort out most of
    it. But the upside was that you could have concurrent readers (and
    writers) of the kernel log and not have lines with mixed output in them.

    And one particular pain-point for the record-based kernel logging was
    exactly the fragmentary lines that are generated in smaller chunks. In
    order to still log them as one record, the continuation lines need to be
    attached to the previous record properly.

    However the explicit continuation record marker that is actually useful
    for this exact case was actually removed at around the same time by commit

    61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")

    due to the incorrect belief that KERN_CONT wasn't meaningful. The
    ambiguity between "is this a continuation line" or "is this a plain
    printk with no log level information" was reintroduced, and in fact
    became an even bigger pain point because there was now the whole
    record-level merging of kernel messages going on.

    This patch reinstates the KERN_CONT as a real non-empty string marker,
    so that the ambiguity is fixed once again.

    But it's not a plain revert of that original removal: in the four years
    since we made KERN_CONT an empty string again, not only has the format
    of the log level markers changed, we've also had some usage changes in
    this area.

    For example, some ACPI code seems to use KERN_CONT _together_ with a log
    level, and now uses both the KERN_CONT marker and (for example) a
    KERN_INFO marker to show that it's an informational continuation of a
    line.

    Which is actually not a bad idea - if the continuation line cannot be
    attached to its predecessor, without the log level information we don't
    know what log level to assign to it (and we traditionally just assigned
    it the default loglevel). So having both a log level and the KERN_CONT
    marker is not necessarily a bad idea, but it does mean that we need to
    actually iterate over potentially multiple markers, rather than just a
    single one.
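
    The need to iterate over several markers can be sketched like this. The
    struct, the function name, and the choice that the last level seen wins
    are assumptions of this illustration, not a description of what the
    kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SOH '\001'   /* start-of-header byte that introduces each marker */

struct parsed {
	int level;          /* 0..7, or -1 when no level marker was seen */
	bool cont;          /* true when a continuation marker was seen */
	const char *text;   /* first byte after all markers */
};

/* Consume every leading marker, recording both level and continuation. */
struct parsed parse_headers(const char *s)
{
	struct parsed p = { .level = -1, .cont = false, .text = s };

	assert(s != NULL);
	while (p.text[0] == SOH && p.text[1]) {
		char c = p.text[1];

		if (c >= '0' && c <= '7')
			p.level = c - '0';      /* last level seen wins (sketch choice) */
		else if (c == 'c')
			p.cont = true;
		else
			break;                  /* not a marker this sketch knows */
		p.text += 2;
	}
	return p;
}
```

    With both markers parsed, a continuation that cannot be attached to its
    predecessor still carries a usable log level.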

    Also, since KERN_CONT was still conceptually needed, and encouraged, but
    didn't actually _do_ anything, we've also had the reverse problem:
    rather than having too many annotations it has too few, and there is bit
    rot with code that no longer marks the continuation lines with the
    KERN_CONT marker.

    So this patch not only re-instates the non-empty KERN_CONT marker, it
    also fixes up the cases of bit-rot I noticed in my own logs.

    There are probably other cases where KERN_CONT will be needed to be
    added, either because it is new code that never dealt with the need for
    KERN_CONT, or old code that has bitrotted without anybody noticing.

    That said, we should strive to avoid the need for KERN_CONT. It does
    result in real problems for logging, and should generally not be seen as
    a good feature. If we some day can get rid of the feature entirely,
    because nobody does any fragmented printk calls, that would be lovely.

    But until that point, let's at least mark the code that relies on the hacky
    multi-fragment kernel printk's. Not only does it avoid the ambiguity,
    it also annotates code as "maybe this would be good to fix some day".

    (That said, particularly during single-threaded bootup, the downsides of
    KERN_CONT are very limited. Things get much hairier when you have
    multiple threads going on and user level reading and writing logs too).

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Aug, 2016

1 commit

  • This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142.

    Geert Uytterhoeven reports:
    "This change seems to have an (unintended?) side-effect.

    Before, pr_*() calls without a trailing newline character would be
    printed with a newline character appended, both on the console and in
    the output of the dmesg command.

    After this commit, no new line character is appended, and the output
    of the next pr_*() call of the same type may be appended, like in:

    - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
    - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
    + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"

    Joe Perches says:
    "No, that is not intentional.

    The newline handling code inside vprintk_emit is a bit involved and
    for now I suggest a revert until this has all the same behavior as
    earlier"

    Reported-by: Geert Uytterhoeven
    Requested-by: Joe Perches
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 Aug, 2016

3 commits

  • Add a "printk.devkmsg" kernel command line parameter which controls how
    userspace writes into /dev/kmsg. It has three options:

    * ratelimit - ratelimit logging from userspace.
    * on - unlimited logging from userspace
    * off - logging from userspace gets ignored

    The default setting is to ratelimit the messages written to it.

    This changes the kernel default setting of "on" to "ratelimit" and we do
    that because we want to keep userspace spamming /dev/kmsg to sane
    levels. This is especially moot when a small kernel log buffer wraps
    around and messages get lost. So the ratelimiting setting should be a
    sane setting where kernel messages should have a bit higher chance of
    survival from all the spamming.

    It additionally does not limit logging to /dev/kmsg while the system is
    booting if we haven't disabled it on the command line.

    Furthermore, we can control the logging from a lower priority sysctl
    interface - kernel.printk_devkmsg.

    That interface will succeed only if printk.devkmsg *hasn't* been
    supplied on the command line. If it has, then printk.devkmsg is a
    one-time setting which remains for the duration of the system lifetime.
    This "locking" of the setting is to prevent userspace from changing the
    logging on us through sysctl(2).
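
    The precedence between the command line and the sysctl can be sketched in
    userspace C. The enum, struct, and function names are hypothetical; only
    the three option strings and the locking rule come from the description
    above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum devkmsg_mode { DEVKMSG_RATELIMIT, DEVKMSG_ON, DEVKMSG_OFF };

struct devkmsg_ctl {
	enum devkmsg_mode mode;   /* zero value is the "ratelimit" default */
	bool locked;              /* true once set from the command line */
};

/* Returns 0 and stores the mode, or -1 for an unknown option string. */
int devkmsg_parse(const char *opt, enum devkmsg_mode *mode)
{
	assert(opt && mode);
	if (!strcmp(opt, "ratelimit"))
		*mode = DEVKMSG_RATELIMIT;
	else if (!strcmp(opt, "on"))
		*mode = DEVKMSG_ON;
	else if (!strcmp(opt, "off"))
		*mode = DEVKMSG_OFF;
	else
		return -1;
	return 0;
}

/* A command-line setting applies and then locks out later changes. */
int devkmsg_set_cmdline(struct devkmsg_ctl *ctl, const char *opt)
{
	if (devkmsg_parse(opt, &ctl->mode))
		return -1;
	ctl->locked = true;
	return 0;
}

/* The sysctl path succeeds only when the command line did not set it. */
int devkmsg_set_sysctl(struct devkmsg_ctl *ctl, const char *opt)
{
	if (ctl->locked)
		return -1;
	return devkmsg_parse(opt, &ctl->mode);
}
```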

    This patch is based on previous patches from Linus and Steven.

    [bp@suse.de: fixes]
    Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
    Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
    Signed-off-by: Borislav Petkov
    Cc: Dave Young
    Cc: Franck Bui
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Uwe Kleine-König
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • Using functions instead of macros can reduce overall code size by
    eliminating unnecessary "KERN_SOH" prefixes from format strings.

    defconfig x86-64:

    $ size vmlinux*
        text     data      bss       dec     hex filename
    10193570  4331464  1105920  15630954  ee826a vmlinux.new
    10192623  4335560  1105920  15634103  ee8eb7 vmlinux.old

    As the return values are unimportant and unused in the kernel tree, these
    new functions return void.

    Miscellanea:

    - change pr_ macros to call new __pr_ functions
    - change vprintk_nmi and vprintk_default to add LOGLEVEL_ argument

    [akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe]
    Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The kernel.h header doesn't directly use dynamic debug; instead we can
    include it in module.c (which used it via kernel.h). printk.h only uses
    it if CONFIG_DYNAMIC_DEBUG is on, so change the inclusion to only happen
    in that case.

    Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com
    [luisbg@osg.samsung.com: include dynamic_debug.h in drbd_int.h]
    Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com
    Signed-off-by: Luis de Bethencourt
    Cc: Rusty Russell
    Cc: Hidehiro Kawai
    Cc: Borislav Petkov
    Cc: Michal Nazarewicz
    Cc: Rasmus Villemoes
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis de Bethencourt
     

08 Jul, 2016

1 commit

  • Have printk*once() return a bool which denotes whether the string was
    printed or not so that calling code can react accordingly.
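
    A function-based sketch of the idea, for illustration only. The name and
    the caller-supplied flag are assumptions; this is not the printk*once()
    macro itself, only the "report whether this call printed" behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print msg the first time only. The return value tells the caller
 * whether this particular call did the printing.
 */
bool print_once(bool *printed, const char *msg)
{
	assert(printed && msg);
	if (*printed)
		return false;
	*printed = true;
	fputs(msg, stdout);
	return true;
}
```

    A caller can then attach extra one-time work, such as dumping additional
    state, to the call that actually printed.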

    Signed-off-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1467671487-10344-3-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

21 May, 2016

2 commits

  • In NMI context, printk() messages are stored into per-CPU buffers to
    avoid a possible deadlock. They are normally flushed to the main ring
    buffer via an IRQ work. But the work is never called when the system
    calls panic() in the very same NMI handler.

    This patch tries to flush NMI buffers before the crash dump is
    generated. In this case it does not risk a double release and bails out
    when the logbuf_lock is already taken. The aim is to get the messages
    into the main ring buffer when possible. It makes them more
    accessible in the vmcore.

    Then the patch tries to flush the buffers a second time when other CPUs
    are down. It might be more aggressive and reset logbuf_lock. The aim
    is to get the messages available for the subsequent kmsg_dump() and
    console_flush_on_panic() calls.

    The patch causes vprintk_emit() to be called even in NMI context again.
    But it is done via printk_deferred() so that the console handling is
    skipped. Consoles use internal locks and we could not prevent a
    deadlock easily. They are explicitly called later when the crash dump
    is not generated, see console_flush_on_panic().

    Signed-off-by: Petr Mladek
    Cc: Benjamin Herrenschmidt
    Cc: Daniel Thompson
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jiri Kosina
    Cc: Martin Schwidefsky
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • printk() takes some locks and cannot be used in a safe way in NMI
    context.

    The chance of a deadlock is real especially when printing stacks from
    all CPUs. This particular problem has been addressed on x86 by the
    commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
    CPUs").

    The patchset brings two big advantages. First, it makes the NMI
    backtraces safe on all architectures for free. Second, it makes all NMI
    messages almost safe on all architectures (the temporary buffer is
    limited, so we should still keep the number of messages in NMI context
    to a minimum).

    Note that there already are several messages printed in NMI context:
    WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
    handlers. These are not easy to avoid.

    This patch reuses most of the code and makes it generic. It is useful
    for all messages and architectures that support NMI.

    The alternative printk_func is set when entering and is reset when
    leaving NMI context. It queues IRQ work to copy the messages into the
    main ring buffer in a safe context.

    __printk_nmi_flush() copies all available messages and resets the buffer.
    Then we can use simple cmpxchg operations to get synchronized with
    writers. A spinlock is also used to get synchronized with other
    flushers.
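
    A hedged sketch of a lockless writer built on compare-and-exchange, using
    C11 atomics in userspace. The struct, the names, the buffer size, and the
    reserve-then-copy ordering are all assumptions of this illustration and
    may differ from what the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NMI_BUF_SIZE 64   /* arbitrary size for this sketch */

struct nmi_buf {
	atomic_size_t len;            /* bytes reserved so far */
	char data[NMI_BUF_SIZE];
};

void nmi_buf_init(struct nmi_buf *b)
{
	atomic_init(&b->len, 0);
	memset(b->data, 0, sizeof(b->data));
}

/* Reserve space with compare-and-exchange, then copy; returns bytes stored. */
size_t nmi_buf_write(struct nmi_buf *b, const char *msg, size_t n)
{
	size_t old = atomic_load(&b->len);
	size_t add;

	assert(b && msg);
	do {
		size_t room = NMI_BUF_SIZE - old;

		add = n < room ? n : room;     /* truncate when nearly full */
		if (add == 0)
			return 0;                  /* buffer full: message is lost */
		/* on failure, old is reloaded and the sizes are recomputed */
	} while (!atomic_compare_exchange_weak(&b->len, &old, old + add));

	memcpy(b->data + old, msg, add);
	return add;
}
```

    No writer ever waits for another: a failed exchange only means someone
    else reserved space first, so the loop retries with the new length.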

    We no longer use seq_buf because it depends on external lock. It
    would be hard to make all supported operations safe for a lockless use.
    It would be confusing and error prone to make only some operations safe.

    The code is put into separate printk/nmi.c as suggested by Steven
    Rostedt. It needs a per-CPU buffer and is compiled only on
    architectures that call nmi_enter(). This is achieved by the new
    HAVE_NMI Kconfig flag.

    The exceptions are the MN10300 and Xtensa architectures. We need to clean up NMI
    handling there first. Let's do it separately.

    The patch is heavily based on the draft from Peter Zijlstra, see

    https://lkml.org/lkml/2015/6/10/327

    [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
    [akpm@linux-foundation.org: min_t->min - all types are size_t here]
    Signed-off-by: Petr Mladek
    Suggested-by: Peter Zijlstra
    Suggested-by: Steven Rostedt
    Cc: Jan Kara
    Acked-by: Russell King [arm part]
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

17 Jan, 2016

1 commit

  • Currently, pr_debug and pr_devel will not elide function call arguments
    appearing in calls to the no_printk for these macros. This is because
    all side effects must be honored before proceeding to the 0-value
    assignment in no_printk.

    The behavior is contrary to documentation found in the CodingStyle and
    the header file where these functions are declared.

    This patch corrects that behavior by shunting out the call to no_printk
    completely. The format string is still checked by gcc for correctness,
    but no code seems to be emitted in common cases.
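
    The difference between the two shapes can be reproduced in userspace C.
    The names below are hypothetical stand-ins for the kernel macros: an
    empty variadic function still evaluates its arguments, while a call
    placed in a dead branch is type-checked but never executed.

```c
#include <assert.h>
#include <stdio.h>

int calls;   /* counts how often the "expensive" argument is evaluated */

int expensive(void)
{
	calls++;
	return 42;
}

/* Old shape: a real (empty) function call, so arguments are evaluated. */
int noop_print(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

/* New shape: dead branch; the compiler still checks the format string. */
#define debug_off(...) do { if (0) printf(__VA_ARGS__); } while (0)

void old_way(void) { noop_print("%d\n", expensive()); }
void new_way(void) { debug_off("%d\n", expensive()); }
```

    Calling new_way() leaves the counter untouched, while old_way()
    increments it even though nothing is printed.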

    [akpm@linux-foundation.org: remove braces, per Joe]
    Fixes: 5264f2f75d86 ("include/linux/printk.h: use and neaten no_printk")
    Signed-off-by: Aaron Conole
    Reported-by: Dmitry Vyukov
    Cc: Joe Perches
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aaron Conole
     

11 Sep, 2015

2 commits

  • print_hex_dump_debug() is likely supposed to be analogous to pr_debug() or
    dev_dbg() & friends. Currently it will adhere to dynamic debug, but will
    not stub out prints if CONFIG_DEBUG is not set. Let's make it do the
    right thing, because I am tired of having my dmesg buffer full of hex
    dumps on production systems.

    Signed-off-by: Linus Walleij
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Walleij
     
  • The other two implementations of pr_debug_ratelimited include pr_fmt,
    along with every other pr_* function. But pr_debug_ratelimited forgot to
    add it with the CONFIG_DYNAMIC_DEBUG implementation.

    This patch unifies the behavior.

    Signed-off-by: Jason A. Donenfeld
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason A. Donenfeld
     

18 Jul, 2015

1 commit

  • Using __printf attributes helps to detect several format string issues
    at compile time (even though -Wformat-security is currently disabled in
    Makefile). For example it can detect when formatting a pointer as a
    number, like the issue fixed in commit a3fa71c40f18 ("wl18xx: show
    rx_frames_per_rates as an array as it really is"), or when the arguments
    do not match the format string, c.f. for example commit 5ce1aca81435
    ("reiserfs: fix __RASSERT format string").

    To prevent similar bugs in the future, add a __printf attribute to every
    function prototype which needs one in include/linux/ and lib/. These
    functions were mostly found by using gcc's -Wsuggest-attribute=format
    flag.
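
    As an illustration of what such an annotation looks like outside the
    kernel, the sketch below wraps the GCC/Clang format attribute in a
    hypothetical PRINTF_LIKE macro (the kernel spells it __printf). The
    helper function is also invented for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper macro; expands to nothing on other compilers. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, args_idx) \
	__attribute__((format(printf, fmt_idx, args_idx)))
#else
#define PRINTF_LIKE(fmt_idx, args_idx)
#endif

/* Argument 3 is the format string; variadic arguments start at 4. */
PRINTF_LIKE(3, 4)
int log_to_buf(char *out, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(out && fmt);
	va_start(ap, fmt);
	n = vsnprintf(out, size, fmt, ap);
	va_end(ap);
	return n;
}
```

    With the attribute in place, a call such as log_to_buf(buf, size, "%d",
    "text") can be diagnosed at compile time when format warnings are
    enabled.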

    Signed-off-by: Nicolas Iooss
    Cc: Greg Kroah-Hartman
    Cc: Felipe Balbi
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Iooss
     

26 Jun, 2015

1 commit

  • This patchset updates netconsole so that it can emit messages with the
    same header as used in /dev/kmsg which gives netconsole receiver full log
    information which enables things like structured logging and detection
    of lost messages.

    This patch (of 7):

    devkmsg_read() uses an 8k buffer and assumes that the formatted output
    message won't overrun, which seems safe given LOG_LINE_MAX, the current use
    of dict and the escaping method being used; however, we're planning to use
    devkmsg formatting more widely and accounting for the buffer size properly
    isn't that complicated.

    This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates devkmsg_read()
    so that it limits output accordingly.

    Signed-off-by: Tejun Heo
    Cc: David Miller
    Cc: Kay Sievers
    Reviewed-by: Petr Mladek
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

16 Apr, 2015

1 commit

  • KERN_CONT is nicely commented in kern_levels.h, but pr_cont() is now used
    more often, and it lacks the comment stating what it is used for. It can
    be confused as continuing the log level, but that is not its purpose. Its
    purpose is to continue a line that had no newline enclosed. This should
    be documented by pr_cont() as well.

    Signed-off-by: Steven Rostedt
    Acked-by: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

13 Feb, 2015

1 commit

  • This patch makes hexdump return the number of bytes placed in the buffer
    excluding trailing NUL. In the case of overflow it returns the desired
    amount of bytes to produce the entire dump. Thus, it mimics snprintf().

    This will be useful for users that would like to repeat with a bigger
    buffer.
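
    The snprintf()-style convention can be sketched in userspace as follows.
    hex_to_buf() is a hypothetical helper written for this illustration, not
    the kernel's hexdump code: it always returns the length the full dump
    would need, even when the output was truncated.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format bytes as "xx xx xx" into out. Returns the
 * number of characters the complete dump needs, excluding the trailing NUL,
 * whether or not it fit (the snprintf() convention).
 */
int hex_to_buf(const unsigned char *data, size_t len, char *out, size_t outsize)
{
	size_t needed = len ? len * 3 - 1 : 0;
	size_t pos = 0;
	size_t i;

	for (i = 0; i < len && pos < outsize; i++) {
		/* snprintf() NUL-terminates and reports the untruncated length. */
		int n = snprintf(out + pos, outsize - pos,
				 i ? " %02x" : "%02x", (unsigned int)data[i]);
		if (n < 0)
			break;
		pos += (size_t)n;
	}
	return (int)needed;
}
```

    A caller can compare the return value with its buffer size and, if the
    value is greater than or equal to that size, retry with a buffer of
    return value + 1 bytes.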

    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

27 Jan, 2015

1 commit

  • There are missing dummy routines for log_buf_addr_get() and
    log_buf_len_get() for when CONFIG_PRINTK is not set causing build
    failures.

    This patch adds these dummy routines at the appropriate location.
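
    The usual pattern for such dummy routines is sketched below. The return
    types and stub values are assumptions for illustration (unsigned int
    stands in for the kernel's u32), not a copy of the patch.

```c
#include <stddef.h>

#ifdef CONFIG_PRINTK
/* Real implementations live elsewhere when printk support is built in. */
char *log_buf_addr_get(void);
unsigned int log_buf_len_get(void);
#else
/* Stubs keep callers compiling and linking when CONFIG_PRINTK is not set. */
static inline char *log_buf_addr_get(void)
{
	return NULL;
}

static inline unsigned int log_buf_len_get(void)
{
	return 0;
}
#endif
```

    Callers then need no #ifdef of their own; they only have to cope with a
    NULL address and a zero length.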

    Signed-off-by: Pranith Kumar
    Cc: Michael Ellerman
    Reviewed-by: Petr Mladek
    Acked-by: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pranith Kumar
     

11 Dec, 2014

2 commits

  • Pull nmi-safe seq_buf printk update from Steven Rostedt:
    "This code is a fork from the trace-3.19 pull as it needed the
    trace_seq clean ups from that branch.

    This code solves the issue of performing stack dumps from NMI context.
    The issue is that printk() is not safe from NMI context as if the NMI
    were to trigger when a printk() was being performed, the NMI could
    deadlock from the printk() internal locks. This has been seen in
    practice.

    With lots of review from Petr Mladek, this code went through several
    iterations, and we feel that it is now at a point of quality to be
    accepted into mainline.

    Here's what is contained in this patch set:

    - Creates a "seq_buf" generic buffer utility that allows a descriptor
    to be passed around where functions can write their own "printk()"
    formatted strings into it. The generic version was pulled out of
    the trace_seq() code that was made specifically for tracing.

    - The seq_buf code was changed to model the seq_file code. I have a
    patch (not included for 3.19) that converts the seq_file.c code
    over to use seq_buf.c like the trace_seq.c code does. This was
    done to make sure that seq_buf.c is compatible with seq_file.c. I
    may try to get that patch in for 3.20.

    - The seq_buf.c file was moved to lib/ to remove it from being
    dependent on CONFIG_TRACING.

    - The printk() was updated to allow for a per_cpu "override" of the
    internal calls. That is, instead of writing to the console, a call
    to printk() may do something else. This made it easier to allow
    the NMI to change what printk() does in order to call dump_stack()
    without needing to update that code as well.

    - Finally, the dump_stack from all CPUs via NMI code was converted to
    use the seq_buf code. The caller to trigger the NMI code would
    wait till all the NMIs finished, and then it would print the
    seq_buf data to the console safely from a non-NMI context.

    One added bonus is that this code also makes the NMI dump stack work
    on PREEMPT_RT kernels. As printk() includes sleeping locks on
    PREEMPT_RT, printk() only writes to console if the console does not
    use any rt_mutex converted spin locks. Which a lot do"

    * tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    x86/nmi: Fix use of unallocated cpumask_var_t
    printk/percpu: Define printk_func when printk is not defined
    x86/nmi: Perform a safe NMI stack trace on all CPUs
    printk: Add per_cpu printk func to allow printk to be diverted
    seq_buf: Move the seq_buf code to lib/
    seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
    tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
    tracing: Have seq_buf use full buffer
    seq_buf: Add seq_buf_can_fit() helper function
    tracing: Add paranoid size check in trace_printk_seq()
    tracing: Use trace_seq_used() and seq_buf_used() instead of len
    tracing: Clean up tracing_fill_pipe_page()
    seq_buf: Create seq_buf_used() to find out how much was written
    tracing: Add a seq_buf_clear() helper and clear len and readpos in init
    tracing: Convert seq_buf fields to be like seq_file fields
    tracing: Convert seq_buf_path() to be like seq_path()
    tracing: Create seq_buf layer in trace_seq

    Linus Torvalds
     
  • Eliminate the unlikely possibility of message interleaving for
    early_printk/early_vprintk use.

    early_vprintk can be done via the %pV extension so remove this
    unnecessary function and change early_printk to have the equivalent
    vprintk code.

    All uses of early_printk already end with a newline so also remove the
    unnecessary newline from the early_printk function.

    Signed-off-by: Joe Perches
    Acked-by: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

22 Nov, 2014

1 commit

  • To avoid include hell, the per_cpu variable printk_func was declared
    in percpu.h. But it is only defined if printk is defined.

    As users of printk may also use the printk_func variable, it needs to
    be defined even if CONFIG_PRINTK is not.

    Also add a printk.h include in percpu.h just to be safe.

    Link: http://lkml.kernel.org/r/20141121183215.01ba539c@canb.auug.org.au

    Reported-by: Stephen Rothwell
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

20 Nov, 2014

1 commit

  • Being able to divert printk to call another function besides the normal
    logging is useful for things like NMI handling. If functions that call
    printk() are to be called from NMI, it is possible to lock up the box
    if the NMI handler triggers while another printk is happening.

    One example of this use is to perform a stack trace on all CPUs via NMI.
    But if the NMI is to do the printk() it can cause the system to lock up.
    By allowing the printk to be diverted to another function that can safely
    record the printk output and then print it when in a safe context,
    NMIs will be safe to call functions like show_regs().
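
    The diversion mechanism can be sketched in userspace with a function
    pointer. This is a simplification under stated assumptions: the kernel
    variable is per-CPU, while the sketch uses a single global, and every
    name here (my_printk, current_print, buffered_print, safe_buf) is
    hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*print_func_t)(const char *fmt, va_list args);

static char safe_buf[256];	/* holds output recorded in the "unsafe" context */
static size_t safe_len;

/* Normal path: print immediately. */
static int default_print(const char *fmt, va_list args)
{
	return vprintf(fmt, args);
}

/* Diverted path: only record the text, to be printed later from a safe context. */
static int buffered_print(const char *fmt, va_list args)
{
	size_t room = sizeof(safe_buf) - safe_len - 1;
	int n = vsnprintf(safe_buf + safe_len, room + 1, fmt, args);

	if (n > 0)
		safe_len += (size_t)n < room ? (size_t)n : room;
	return n;
}

/* Single global stand-in for the per-CPU function pointer. */
static print_func_t current_print = default_print;

int my_printk(const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = current_print(fmt, args);
	va_end(args);
	return n;
}
```

    A caller would set current_print to buffered_print before entering the
    risky context, restore default_print afterwards, and then flush safe_buf
    through the normal path.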

    Link: http://lkml.kernel.org/p/20140619213952.209176403@goodmis.org

    Tested-by: Jiri Kosina
    Acked-by: Jiri Kosina
    Acked-by: Paul E. McKenney
    Reviewed-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

13 Aug, 2014

1 commit

  • Platforms like IBM Power Systems support service processor
    assisted dump. It provides an interface to add memory regions to
    be captured when the system crashes.

    During initialization or while running, we can add kernel memory
    regions to be collected.

    Presently we don't have a way to get the log buffer base address
    and size. This patch adds support to return log buffer address
    and size.

    Signed-off-by: Vasant Hegde
    Signed-off-by: Benjamin Herrenschmidt
    Acked-by: Andrew Morton

    Vasant Hegde
     

07 Aug, 2014

1 commit

  • Commit a8fe19ebfbfd ("kernel/printk: use symbolic defines for console
    loglevels") makes consistent use of symbolic values for printk() log
    levels.

    The naming scheme used is different from the one used for
    DEFAULT_MESSAGE_LOGLEVEL though. Change that symbol name to be
    MESSAGE_LOGLEVEL_DEFAULT for consistency. And because the value of that
    symbol comes from a similarly-named config option, rename
    CONFIG_DEFAULT_MESSAGE_LOGLEVEL as well.

    Signed-off-by: Alex Elder
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc: Jan Kara
    Cc: John Stultz
    Cc: Petr Mladek
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     

05 Jun, 2014

4 commits

  • ... instead of naked numbers.

    Stuff in sysrq.c used to set it to 8 which is supposed to mean above
    default level so set it to DEBUG instead as we're terminating/killing all
    tasks and we want to be verbose there.

    Also, correct the check in x86_64_start_kernel which should be >= as
    we're clearly issuing the string there for all debug levels, not only
    the magical 10.

    Signed-off-by: Borislav Petkov
    Acked-by: Kees Cook
    Acked-by: Randy Dunlap
    Cc: Joe Perches
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • The pr_debug() and related debug print macros all differ from the normal
    pr_XXX() macros, in that the normal ones print unconditionally, while
    the debug macros are compiled out unless DEBUG is defined or
    CONFIG_DYNAMIC_DEBUG is set. This isn't obvious, and the only way to
    find this out is either to review the actual printk.h code or to read
    CodingStyle, and the message there doesn't highlight the fact.

    Change Documentation/CodingStyle to clearly indicate that pr_debug() and
    related debug printing macros behave differently than all other pr_XXX()
    macros, and attempt to clarify when and where the different debug
    printing methods might be used.

    Add short comment to printk.h above the pr_XXX() macros indicating that
    while these macros print unconditionally, pr_debug() does not.
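
    The difference can be illustrated with a small userspace sketch. It
    covers only the DEBUG case, not CONFIG_DYNAMIC_DEBUG, and the names
    emit(), my_pr_info() and my_pr_debug() are hypothetical stand-ins for
    the kernel macros.

```c
#include <stdarg.h>
#include <stdio.h>

static int emitted;	/* counts messages that were actually printed */

static int emit(const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vprintf(fmt, args);
	va_end(args);
	emitted++;
	return n;
}

/* Prints unconditionally, like the normal pr_XXX() macros. */
#define my_pr_info(fmt, ...) emit(fmt, ##__VA_ARGS__)

#ifdef DEBUG
#define my_pr_debug(fmt, ...) emit(fmt, ##__VA_ARGS__)
#else
/* Never runs, but "if (0)" keeps the arguments type-checked by the compiler. */
#define my_pr_debug(fmt, ...) \
	do { if (0) emit(fmt, ##__VA_ARGS__); } while (0)
#endif
```

    Built without -DDEBUG, a my_pr_debug() call produces no output and no
    code, which is the behavior the updated documentation is meant to
    highlight.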

    Signed-off-by: Dan Streetman
    Cc: Joe Perches
    Cc: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Two of the three printk_deferred uses are really printk_once style
    uses, so add a printk_deferred_once macro to simplify those call
    sites.

    Signed-off-by: John Stultz
    Reviewed-by: Steven Rostedt
    Reviewed-by: Jan Kara
    Cc: Peter Zijlstra
    Cc: Jiri Bohac
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Stultz
     
  • After learning we'll need some sort of deferred printk functionality in
    the timekeeping core, Peter suggested we rename the printk_sched function
    so it can be reused by needed subsystems.

    This only changes the function name. No logic changes.

    Signed-off-by: John Stultz
    Reviewed-by: Steven Rostedt
    Cc: Jan Kara
    Cc: Peter Zijlstra
    Cc: Jiri Bohac
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Stultz
     

04 Apr, 2014

2 commits


26 Jan, 2014

1 commit

  • Pull networking updates from David Miller:

    1) BPF debugger and asm tool by Daniel Borkmann.

    2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

    3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

    4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match. From Ben Hutchings.

    5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

    6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

    7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

    8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

    9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

    10) Fix ipv6 router reachability probing, from Jiri Benc.

    11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

    12) Support tunneling in GRO layer, from Jerry Chu.

    13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

    14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI. From Atzm Watanabe.

    15) New "Heavy Hitter" qdisc, from Terry Lam.

    16) Significantly improve the IPSEC support in pktgen, from Fan Du.

    17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
    Herbert.

    18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

    19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

    20) Key TCP metrics blobs also by source address, not just destination
    address. From Christoph Paasch.

    21) Support 10G in generic phylib. From Andy Fleming.

    22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided. From Tom Herbert.

    The wireless and netfilter folks have been busy little bees too.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
    net/cxgb4: Fix referencing freed adapter
    ipv6: reallocate addrconf router for ipv6 address when lo device up
    fib_frontend: fix possible NULL pointer dereference
    rtnetlink: remove IFLA_BOND_SLAVE definition
    rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
    qlcnic: update version to 5.3.55
    qlcnic: Enhance logic to calculate msix vectors.
    qlcnic: Refactor interrupt coalescing code for all adapters.
    qlcnic: Update poll controller code path
    qlcnic: Interrupt code cleanup
    qlcnic: Enhance Tx timeout debugging.
    qlcnic: Use bool for rx_mac_learn.
    bonding: fix u64 division
    rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
    sfc: Use the correct maximum TX DMA ring size for SFC9100
    Add Shradha Shah as the sfc driver maintainer.
    net/vxlan: Share RX skb de-marking and checksum checks with ovs
    tulip: cleanup by using ARRAY_SIZE()
    ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
    net/cxgb4: Don't retrieve stats during recovery
    ...

    Linus Torvalds
     

24 Jan, 2014

1 commit

  • Add #include <linux/cache.h> to define __read_mostly.

    Convert cache.h to use uapi/linux/kernel.h instead
    of linux/kernel.h to avoid recursive #includes.

    Convert the ALIGN macro to __ALIGN_KERNEL.

    printk_once only sets the bool variable tested
    once so mark it __read_mostly.

    Neaten the alignment so it matches the rest of the
    pr_<level>_once #defines too.
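
    A print-once macro of this kind can be sketched in userspace as below.
    The flag is written once and afterwards only read, which is why the
    kernel version benefits from a read-mostly annotation; that annotation
    is kernel-specific and omitted here. my_print_once() and print_count are
    hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

static int print_count;	/* counts how many times a message was printed */

/* Each expansion gets its own static flag, so each call site prints once. */
#define my_print_once(fmt, ...)				\
	do {						\
		static bool already_done;		\
		if (!already_done) {			\
			already_done = true;		\
			print_count++;			\
			printf(fmt, ##__VA_ARGS__);	\
		}					\
	} while (0)
```

    Calling the macro in a loop prints on the first iteration only; every
    later pass just reads the flag.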

    Signed-off-by: Joe Perches
    Reviewed-by: James Hogan
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

01 Jan, 2014

1 commit

  • sctp has several points in its setsockopt path in which it issues deprecation
    warnings. It seems like it might be handy to macrotize such a warning so other
    subsystems can use it easily.

    Signed-off-by: Neil Horman
    CC: Greg Kroah-Hartman
    CC: "David S. Miller"
    CC: linux-kernel@vger.kernel.org
    CC: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Neil Horman
     

30 Oct, 2013

1 commit

  • pr_debug_ratelimited should be coded similarly to dev_dbg_ratelimited
    to reduce the "callbacks suppressed" messages.

    Add #include to printk.h. Unfortunately, this
    new #include must be after the prototype/declaration of function printk.

    It may be better to split out these _ratelimited declarations into
    a separate file one day.

    Any use of these pr__ratelimited functions must also have another
    specific #include . Most users have this done indirectly
    via #include

    printk.h may not #include as it causes circular
    dependencies and compilation failures.

    Signed-off-by: Joe Perches
    Tested-by: Krzysztof Mazur
    Signed-off-by: Greg Kroah-Hartman

    Joe Perches