25 Dec, 2016

1 commit


17 Dec, 2016

1 commit

  • Pull vfs updates from Al Viro:

    - more ->d_init() stuff (work.dcache)

    - pathname resolution cleanups (work.namei)

    - a few missing iov_iter primitives - copy_from_iter_full() and
    friends. Either copy the full requested amount, advance the iterator
    and return true, or fail, return false and do _not_ advance the
    iterator. Quite a few open-coded callers converted (and became more
    readable and harder to fuck up that way) (work.iov_iter)

    - several assorted patches, the big one being logfs removal

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    logfs: remove from tree
    vfs: fix put_compat_statfs64() does not handle errors
    namei: fold should_follow_link() with the step into not-followed link
    namei: pass both WALK_GET and WALK_MORE to should_follow_link()
    namei: invert WALK_PUT logics
    namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
    namei: saner calling conventions for mountpoint_last()
    namei.c: get rid of user_path_parent()
    switch getfrag callbacks to ..._full() primitives
    make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
    [iov_iter] new primitives - copy_from_iter_full() and friends
    don't open-code file_inode()
    ceph: switch to use of ->d_init()
    ceph: unify dentry_operations instances
    lustre: switch to use of ->d_init()

    Linus Torvalds
     

16 Dec, 2016

1 commit

  • If CONFIG_PRINTK=n:

    kernel/printk/printk.c:1893: warning: ‘cont’ defined but not used

    Note that there are actually two different struct cont definitions and
    objects: the first one is used if CONFIG_PRINTK=y, the second one became
    unused by removing console_cont_flush().

    Fixes: 5c2992ee7fd8 ("printk: remove console flushing special cases for partial buffered lines")
    Signed-off-by: Geert Uytterhoeven
    Acked-by: Petr Mladek
    [ I do the occasional "allnoconfig" builds, but apparently not often
    enough - Linus ]
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

15 Dec, 2016

3 commits

  • It actively hurts proper merging, and makes for a lot of special cases.
    There was a good(ish) reason for doing it originally, but it's getting
    too painful to maintain. And most of the original reasons for it are
    long gone.

    So instead of having special code to flush partial lines to the console
    (as opposed to the record buffers), do _all_ the console writing from
    the record buffer, and be done with it.

    If an oops happens (or some other synchronous event), we will flush the
    partial lines due to the oops printing activity, so this does not affect
    that. It does mean that if you have a completely hung machine, a
    partial preceding line may not have been printed out.

    That was some of the original reason for this complexity, in fact, back
    when we used to test for the historical i386 "halt" instruction problem
    by doing

        pr_info("Checking 'hlt' instruction... ");

        if (!boot_cpu_data.hlt_works_ok) {
                pr_cont("disabled\n");
                return;
        }
        halt();
        halt();
        halt();
        halt();
        pr_cont("OK\n");

    and that model no longer works (if the 'hlt' instruction kills the
    machine, the partial line won't have been flushed, so you won't even see
    it).

    Of course, that was also back in the days when people actually had
    textual console output rather than a graphical splash-screen at bootup.
    How times change..

    Cc: Sergey Senozhatsky
    Cc: Joe Perches
    Cc: Steven Rostedt
    Tested-by: Petr Mladek
    Tested-by: Geert Uytterhoeven
    Tested-by: Mark Rutland
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The record logging code looks at the previous record flags in various
    ways, and they are all wrong.

    You can't use the previous record flags to determine anything about the
    next record, because they may simply not be related. In particular, the
    reason the previous record was a continuation record may well be exactly
    _because_ the new record was printed by a different process, which is
    why the previous record was flushed.

    So all those games are simply wrong, and make the code hard to
    understand (because the code fundamentally does not make sense).

    So remove it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
    kdb_trap_printk allows passing normal printk() messages to kdb via
    vkdb_printf(). For example, it is used to get a backtrace using the
    classic show_stack(); see kdb_show_stack().

    vkdb_printf() tries to avoid a potential infinite loop by disabling the
    trap. But this approach is racy, for example:

    CPU1                                    CPU2

    vkdb_printf()
      // assume that kdb_trap_printk == 0
      saved_trap_printk = kdb_trap_printk;
      kdb_trap_printk = 0;

                                            kdb_show_stack()
                                              kdb_trap_printk++;

    Problem1: Now, a nested printk() on CPU1 calls vkdb_printf()
              even when it should have been disabled. It will not
              cause a deadlock but...

      // using the outdated saved value: 0
      kdb_trap_printk = saved_trap_printk;

                                              kdb_trap_printk--;

    Problem2: Now, kdb_trap_printk == -1 and will stay like this.
              It means that all messages will get passed to kdb from
              now on.

    This patch removes the racy saved_trap_printk handling. Instead, the
    recursion is prevented by a check for the locked CPU.

    The solution is still kind of racy. A non-related printk(), from
    another process, might get trapped by vkdb_printf(). And the wanted
    printk() might not get trapped because kdb_printf_cpu is assigned. But
    this problem existed even with the original code.

    A proper solution would be to get_cpu() before setting kdb_trap_printk
    and trap messages only from this CPU. I am not sure if it is worth the
    effort, though.

    In fact, the race is very theoretical. When kdb is running any of the
    commands that use kdb_trap_printk there is a single active CPU and the
    other CPUs should be in a holding pen inside kgdb_cpu_enter().

    The only time this is violated is when there is a timeout waiting for
    the other CPUs to report to the holding pen.

    Finally, note that the situation is a bit schizophrenic. vkdb_printf()
    explicitly allows recursion but only from KDB code that calls
    kdb_printf() directly. On the other hand, the generic printk()
    recursion is not allowed because it might cause an infinite loop. This
    is why we could not hide the decision inside vkdb_printf() easily.

    Link: http://lkml.kernel.org/r/1480412276-16690-4-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Cc: Daniel Thompson
    Cc: Jason Wessel
    Cc: Peter Zijlstra
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
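
The interleaving described in the commit message above can be replayed step by step in plain userspace C. This is only a sketch: `trap_printk` and `replay_race()` are hypothetical stand-ins for kdb_trap_printk and the two code paths, and the "CPUs" are simulated by running their steps in the racy order on one thread.

```c
/* Hypothetical stand-in for the shared kdb_trap_printk counter. */
static int trap_printk;

/*
 * Replays the racy interleaving from the commit message sequentially.
 * Returns the final counter value.
 */
static int replay_race(void)
{
	int saved;

	trap_printk = 0;

	/* CPU1: vkdb_printf() entry saves the counter and disables the trap */
	saved = trap_printk;	/* saved == 0 */
	trap_printk = 0;

	/* CPU2: kdb_show_stack() enables the trap */
	trap_printk++;		/* now 1 */

	/* CPU1: vkdb_printf() exit restores the stale saved value */
	trap_printk = saved;	/* back to 0, CPU2's increment is lost */

	/* CPU2: kdb_show_stack() undoes its own increment */
	trap_printk--;		/* now -1 */

	return trap_printk;
}
```

The counter ends up at -1 rather than 0, which matches "Problem2": a save/restore pair on one side and an increment/decrement pair on the other do not compose.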
     

13 Dec, 2016

4 commits

  • Merge updates from Andrew Morton:

    - various misc bits

    - most of MM (quite a lot of MM material is awaiting the merge of
    linux-next dependencies)

    - kasan

    - printk updates

    - procfs updates

    - MAINTAINERS

    - /lib updates

    - checkpatch updates

    * emailed patches from Andrew Morton : (123 commits)
    init: reduce rootwait polling interval time to 5ms
    binfmt_elf: use vmalloc() for allocation of vma_filesz
    checkpatch: don't emit unified-diff error for rename-only patches
    checkpatch: don't check c99 types like uint8_t under tools
    checkpatch: avoid multiple line dereferences
    checkpatch: don't check .pl files, improve absolute path commit log test
    scripts/checkpatch.pl: fix spelling
    checkpatch: don't try to get maintained status when --no-tree is given
    lib/ida: document locking requirements a bit better
    lib/rbtree.c: fix typo in comment of ____rb_erase_color
    lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
    MAINTAINERS: add drm and drm/i915 irc channels
    MAINTAINERS: add "C:" for URI for chat where developers hang out
    MAINTAINERS: add drm and drm/i915 bug filing info
    MAINTAINERS: add "B:" for URI where to file bugs
    get_maintainer: look for arbitrary letter prefixes in sections
    printk: add Kconfig option to set default console loglevel
    printk/sound: handle more message headers
    printk/btrfs: handle more message headers
    printk/kdb: handle more message headers
    ...

    Linus Torvalds
     
  • Pull smp hotplug updates from Thomas Gleixner:
    "This is the final round of converting the notifier mess to the state
    machine. The removal of the notifiers and the related infrastructure
    will happen around rc1, as there are conversions outstanding in other
    trees.

    The whole exercise removed about 2000 lines of code in total and in
    course of the conversion several dozen bugs got fixed. The new
    mechanism allows to test almost every hotplug step standalone, so
    usage sites can exercise all transitions extensively.

    There is more room for improvement, like integrating all the
    pointlessly different architecture mechanisms of synchronizing,
    setting cpus online etc into the core code"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    tracing/rb: Init the CPU mask on allocation
    soc/fsl/qbman: Convert to hotplug state machine
    soc/fsl/qbman: Convert to hotplug state machine
    zram: Convert to hotplug state machine
    KVM/PPC/Book3S HV: Convert to hotplug state machine
    arm64/cpuinfo: Convert to hotplug state machine
    arm64/cpuinfo: Make hotplug notifier symmetric
    mm/compaction: Convert to hotplug state machine
    iommu/vt-d: Convert to hotplug state machine
    mm/zswap: Convert pool to hotplug state machine
    mm/zswap: Convert dst-mem to hotplug state machine
    mm/zsmalloc: Convert to hotplug state machine
    mm/vmstat: Convert to hotplug state machine
    mm/vmstat: Avoid on each online CPU loops
    mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
    tracing/rb: Convert to hotplug state machine
    oprofile/nmi timer: Convert to hotplug state machine
    net/iucv: Use explicit clean up labels in iucv_init()
    x86/pci/amd-bus: Convert to hotplug state machine
    x86/oprofile/nmi: Convert to hotplug state machine
    ...

    Linus Torvalds
     
  • Commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing
    continuation lines") added back KERN_CONT message header. As a result
    it might appear in the middle of the line when the parts are squashed
    via the temporary NMI buffer.

    A reasonable solution seems to be to split the text in the NMI temporary
    buffer not only by newlines but also by the message headers.

    Another solution would be to filter out KERN_CONT when writing to the
    temporary buffer. But this would complicate the lockless handling.
    Also it would not solve problems with a missing newline that was there
    even before the KERN_CONT stuff.

    This patch moves the temporary buffer handling into a separate function.
    I played with it and it seems that using the char pointers makes the code
    easier to read.

    Also it prints the final newline as a continuous line.

    Finally, it moves handling of the s->len overflow into the paranoid
    check, and allows recovering from the disaster.

    Link: http://lkml.kernel.org/r/1478695291-12169-2-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Reviewed-by: Sergey Senozhatsky
    Cc: Joe Perches
    Cc: Steven Rostedt
    Cc: Jason Wessel
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: David Sterba
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • vsnprintf() adds the trailing '\0' but it does not count it into the
    number of printed characters. The result is that there is one byte less
    space for the real characters in the buffer.

    The broken check for the free space might cause us to repeatedly try
    to print 1 character into the buffer, never reach the full buffer,
    and never count the messages as missed.

    Also vsnprintf() returns the number of characters that would be printed
    if the buffer was big enough. As a result, s->len might be bigger than
    the size of the buffer[*]. And the printk() function might return
    bigger len than it really printed. Both problems are fixed by using
    vscnprintf() instead.

    Note that I thought about increasing the number of missed messages even
    when the message was shrunken. But it made the code even more
    complicated. I think that it is not worth it. Shrunken messages are
    usually easy to recognize. And it should be a corner case.

    [*] The overflown s->len value is crazy and unexpected. I "made a
    mistake" and reported this situation as an internal error when fixed
    handling of PR_CONT headers in some other patch.

    Link: http://lkml.kernel.org/r/20161208174912.GA17042@linux.suse
    Signed-off-by: Petr Mladek
    Cc: Sergey Senozhatsky
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Jason Wessel
    Cc: Josef Bacik
    Cc: Joe Perches
    Cc: Jaroslav Kysela
    Cc: Steven Rostedt
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
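
The difference between the two return conventions in the commit message above can be seen in userspace. The following is a minimal sketch, not the kernel implementation: `raw_len()` and `clamped_len()` are hypothetical names, the first returning what the C library's vsnprintf() reports and the second clamping it the way the message describes vscnprintf() behaving.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Returns vsnprintf()'s result unchanged: the length that WOULD have
 * been written, which can exceed the buffer size on truncation. */
static int raw_len(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	return n;
}

/* vscnprintf()-style result: the number of characters actually stored,
 * not counting the trailing '\0'. */
static int clamped_len(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	if ((size_t)n >= size)
		return (int)(size - 1);	/* truncated: one byte went to '\0' */
	return n;
}
```

With an 8-byte buffer and a 10-character message, the first helper reports 10 while the second reports 7, so a running length built from the second can never point past the end of the buffer.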
     

06 Dec, 2016

1 commit

  • copy_from_iter_full(), copy_from_iter_full_nocache() and
    csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
    et.al., advancing iterator only in case of successful full copy
    and returning whether it had been successful or not.

    Convert some obvious users. *NOTE* - do not blindly assume that
    something is a good candidate for those unless you are sure that
    not advancing iov_iter in failure case is the right thing in
    this case. Anything that does short read/short write kind of
    stuff (or is in a loop, etc.) is unlikely to be a good one.

    Signed-off-by: Al Viro

    Al Viro
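
The all-or-nothing contract described above can be modeled in userspace. This is a sketch under loud assumptions: `struct toy_iter` is a hypothetical flat-buffer stand-in for an iov_iter, and "failure" here means simply running out of source bytes, whereas the real primitives deal with faults while copying.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an iov_iter: a flat buffer plus a cursor. */
struct toy_iter {
	const char *data;	/* source bytes */
	size_t len;		/* total bytes available */
	size_t pos;		/* bytes consumed so far */
};

/* Partial-copy style: copies what it can and advances by that amount. */
static size_t toy_copy_from_iter(void *dst, size_t bytes, struct toy_iter *it)
{
	size_t avail = it->len - it->pos;
	size_t n = bytes < avail ? bytes : avail;

	memcpy(dst, it->data + it->pos, n);
	it->pos += n;
	return n;
}

/* _full style: copy everything and advance, or fail and leave the
 * iterator exactly where it was. */
static bool toy_copy_from_iter_full(void *dst, size_t bytes,
				    struct toy_iter *it)
{
	if (bytes > it->len - it->pos)
		return false;	/* iterator untouched */
	memcpy(dst, it->data + it->pos, bytes);
	it->pos += bytes;
	return true;
}
```

A caller that treats any short copy as an error reads more simply with the second form, since there is no partial advance to account for. A caller doing short reads in a loop needs the first form, as the note above warns.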
     

18 Nov, 2016

1 commit

  • The recent conversion of the console hotplug notifier to the state machine
    missed the fact, that the notifier only operated on the non frozen
    transitions. As a consequence the console_lock/unlock() pair is also
    invoked during suspend, which results in a lockdep warning.

    Restore the previous state by making the lock/unlock conditional on
    !tasks_frozen.

    Fixes: 90b14889d2f9 ("kernel/printk: Convert to hotplug state machine")
    Reported-and-tested-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611171729320.3645@nanos
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior

    Thomas Gleixner
     

15 Nov, 2016

1 commit

  • This reverts commit bfd8d3f23b51018388be0411ccbc2d56277fe294.

    It turns out that this flushes things much too aggressively, and causes
    lines to break up when the system logger races with new continuation
    lines being printed.

    There's a pending patch to make printk() flushing much more
    straightforward, but it's too invasive for 4.9, so in the meantime let's
    just not make the system message logging flush continuation lines.
    They'll be flushed by the final newline anyway.

    Suggested-by: Petr Mladek
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Nov, 2016

1 commit

  • This reverts commit 05fd007e4629 ("console: don't prefer first
    registered if DT specifies stdout-path").

    The reverted commit changes existing behavior on which many ARM boards
    rely. Many ARM small-board computers, e.g. the Raspberry Pi, have
    both a video output and a serial console. Depending on whether the user
    is using the device as a more regular computer or as a headless device,
    we need to have the console on either one or the other.

    Many users rely on the kernel behavior of the console being present on
    both outputs. Before the reverted commit, the console setup with no
    console= kernel arguments on an ARM board which sets stdout-path in dt
    would look like this:

    [root@localhost ~]# cat /proc/consoles
    ttyS0 -W- (EC p a) 4:64
    tty0 -WU (E p ) 4:1

    Whereas after the reverted commit, it looks like this:

    [root@localhost ~]# cat /proc/consoles
    ttyS0 -W- (EC p a) 4:64

    This commit reverts commit 05fd007e4629 ("console: don't prefer first
    registered if DT specifies stdout-path") restoring the original
    behavior.

    Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path")
    Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.com
    Signed-off-by: Hans de Goede
    Cc: Paul Burton
    Cc: Rob Herring
    Cc: Frank Rowand
    Cc: Thorsten Leemhuis
    Cc: Greg Kroah-Hartman
    Cc: Tejun Heo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans de Goede
     

10 Nov, 2016

1 commit


20 Oct, 2016

1 commit

  • We have a fairly common pattern where you print several things as
    continuations on one single line in a loop, and then at the end you do

    printk(KERN_CONT "\n");

    to flush the buffered output.

    But if the output was flushed by something else (concurrent printk
    activity, or just system logging), we don't want that final flushing to
    just print an empty line.

    So just suppress empty continuation lines when they couldn't be merged
    into the line they are a continuation of.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Oct, 2016

1 commit

  • Merge my system logging cleanups, triggered by the broken '\n' patches.

    The line continuation handling has been broken basically forever, and
    the code to handle the system log records was both confusing and
    dubious. And it would do entirely the wrong thing unless you always had
    a terminating newline, partly because it couldn't actually see whether a
    message was marked KERN_CONT or not (but partly because the LOG_CONT
    handling in the recording code was rather confusing too).

    This re-introduces a real semantically meaningful KERN_CONT, and fixes
    the few places I noticed where it was missing. There are probably more
    missing cases, since KERN_CONT hasn't actually had any semantic meaning
    for at least four years (other than the checkpatch meaning of "no log
    level necessary, this is a continuation line").

    This also allows the combination of KERN_CONT and a log level. In that
    case the log level will be ignored if the merging with a previous line
    is successful, but if a new record is needed, that new record will now
    get the right log level.

    That also means that you can at least in theory combine KERN_CONT with
    the "pr_info()" style helpers, although any use of pr_fmt() prefixing
    would make that just result in a mess, of course (the prefix would end
    up in the middle of a continuing line).

    * printk-cleanups:
    printk: make reading the kernel log flush pending lines
    printk: re-organize log_output() to be more legible
    printk: split out core logging code into helper function
    printk: reinstate KERN_CONT for printing continuation lines

    Linus Torvalds
     

10 Oct, 2016

4 commits

  • That will mean that any possible subsequent continuation will now be
    broken up onto a line of its own (since reading the log has finalized
    the beginning of the line), but if user space has activated system
    logging (or if there's a kernel message dump going on) that is the right
    thing to do.

    And now that we actually get the continuation flags _right_ for this
    all, the user space logger that is reading the kernel messages can
    actually see the continuation marker. Not that anybody seems to really
    bother with it (or care), but in theory user space can do its own
    message stitching.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Avoid some duplicate logic now that we can return early, and update the
    comments for the new LOG_CONT world order.

    This also stops the continuation flushing from just using random record
    flags for the flushing action, instead taking the flags from the proper
    original line and updating them as we add continuations to it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The code that actually decides how to log the message (whether to put it
    directly into the record log, whether to append it to an existing
    buffered log, or whether to start a new buffered log) is fairly
    non-obvious code in the middle of the vprintk_emit() function.

    Splitting that code up into a helper function makes it easier to
    understand, but perhaps more importantly also allows for the code to
    just return early out of the helper function once it has made the
    decision about where the new log content goes.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Long long ago the kernel log buffer was a buffered stream of bytes, very
    much like stdio in user space. It supported log levels by scanning the
    stream and noticing the log level markers at the beginning of each line,
    but if you wanted to print a partial line in multiple chunks, you just
    did multiple printk() calls, and it just automatically worked.

    Except when it didn't, and you had very confusing output when different
    lines got all mixed up with each other. Then you got fragment lines
    mixing with each other, or with non-fragment lines, because it was
    traditionally impossible to tell whether a printk() call was a
    continuation or not.

    To at least help clarify the issue of continuation lines, we added a
    KERN_CONT marker back in 2007 to mark continuation lines:

    474925277671 ("printk: add KERN_CONT annotation").

    That continuation marker was initially an empty string, and didn't
    actually make any semantic difference. But it at least made it possible
    to annotate the source code, and have check-patch notice that a printk()
    didn't need or want a log level marker, because it was a continuation of
    a previous line.

    To avoid the ambiguity between a continuation line that had that
    KERN_CONT marker, and a printk with no level information at all, we then
    in 2009 made KERN_CONT be a real log level marker which meant that we
    could now reliably tell the difference between the two cases.

    5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")

    and we could take advantage of that to make sure we didn't mix up
    continuation lines with lines that just didn't have any loglevel at all.

    Then, in 2012, the kernel log buffer was changed to be a "record" based
    log, where each line was a record that has a loglevel and a timestamp.

    You can see the beginning of that conversion in commits

    e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
    7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")

    with a number of follow-up commits to fix some painful fallout from that
    conversion. Over all, it took a couple of months to sort out most of
    it. But the upside was that you could have concurrent readers (and
    writers) of the kernel log and not have lines with mixed output in them.

    And one particular pain-point for the record-based kernel logging was
    exactly the fragmentary lines that are generated in smaller chunks. In
    order to still log them as one record, the continuation lines need to be
    attached to the previous record properly.

    However the explicit continuation record marker that is actually useful
    for this exact case was actually removed at around the same time by commit

    61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")

    due to the incorrect belief that KERN_CONT wasn't meaningful. The
    ambiguity between "is this a continuation line" or "is this a plain
    printk with no log level information" was reintroduced, and in fact
    became an even bigger pain point because there was now the whole
    record-level merging of kernel messages going on.

    This patch reinstates the KERN_CONT as a real non-empty string marker,
    so that the ambiguity is fixed once again.

    But it's not a plain revert of that original removal: in the four years
    since we made KERN_CONT an empty string again, not only has the format
    of the log level markers changed, we've also had some usage changes in
    this area.

    For example, some ACPI code seems to use KERN_CONT _together_ with a log
    level, and now uses both the KERN_CONT marker and (for example) a
    KERN_INFO marker to show that it's an informational continuation of a
    line.

    Which is actually not a bad idea - if the continuation line cannot be
    attached to its predecessor, without the log level information we don't
    know what log level to assign to it (and we traditionally just assigned
    it the default loglevel). So having both a log level and the KERN_CONT
    marker is not necessarily a bad idea, but it does mean that we need to
    actually iterate over potentially multiple markers, rather than just a
    single one.

    Also, since KERN_CONT was still conceptually needed, and encouraged, but
    didn't actually _do_ anything, we've also had the reverse problem:
    rather than having too many annotations it has too few, and there is bit
    rot with code that no longer marks the continuation lines with the
    KERN_CONT marker.

    So this patch not only re-instates the non-empty KERN_CONT marker, it
    also fixes up the cases of bit-rot I noticed in my own logs.

    There are probably other cases where KERN_CONT will be needed to be
    added, either because it is new code that never dealt with the need for
    KERN_CONT, or old code that has bitrotted without anybody noticing.

    That said, we should strive to avoid the need for KERN_CONT. It does
    result in real problems for logging, and should generally not be seen as
    a good feature. If we some day can get rid of the feature entirely,
    because nobody does any fragmented printk calls, that would be lovely.

    But until that point, let's at least mark the code that relies on the hacky
    multi-fragment kernel printk's. Not only does it avoid the ambiguity,
    it also annotates code as "maybe this would be good to fix some day".

    (That said, particularly during single-threaded bootup, the downsides of
    KERN_CONT are very limited. Things get much hairier when you have
    multiple threads going on and user level reading and writing logs too).

    Signed-off-by: Linus Torvalds

    Linus Torvalds
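
The need to "iterate over potentially multiple markers" mentioned above can be sketched in userspace C. Everything prefixed `toy_`/`TOY_` is a hypothetical name for illustration. The marker encoding (a \001 byte followed by 'c' for continuation or '0'..'7' for a level) follows the kernel's KERN_SOH convention, but the loop is an illustrative model, not the code from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SOH			'\001'
#define TOY_CONT		"\001" "c"	/* continuation marker */
#define TOY_INFO		"\001" "6"	/* informational level */
#define TOY_LEVEL_DEFAULT	(-1)

struct toy_header {
	int level;		/* 0..7, or TOY_LEVEL_DEFAULT if none given */
	bool cont;		/* true if a continuation marker was seen */
	const char *text;	/* first byte after all markers */
};

/* Consume every leading marker, not just the first one. */
static struct toy_header toy_parse_header(const char *s)
{
	struct toy_header h = { TOY_LEVEL_DEFAULT, false, s };

	while (h.text[0] == TOY_SOH && h.text[1] != '\0') {
		char c = h.text[1];

		if (c >= '0' && c <= '7') {
			/* the first explicit level wins */
			if (h.level == TOY_LEVEL_DEFAULT)
				h.level = c - '0';
		} else if (c == 'c') {
			h.cont = true;
		} else if (c != 'd') {
			break;	/* not a marker we know: treat as text */
		}
		h.text += 2;
	}
	return h;
}
```

With both markers present, the parser keeps the continuation flag and still remembers level 6, so a continuation that cannot be merged into its predecessor can start a new record at the right level.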
     

08 Oct, 2016

1 commit

  • If a device tree specifies a preferred device for kernel console output
    via the stdout-path or linux,stdout-path chosen node properties or the
    stdout alias then the kernel ought to honor it & output the kernel
    console to that device. As it stands, this isn't the case. Whilst we
    parse the stdout-path properties & set an of_stdout variable from
    of_alias_scan(), and use that from of_console_check() to determine
    whether to add a console device as a preferred console whilst
    registering it, we also prefer the first registered console if no other
    has been selected at the time of its registration.

    This means that if a console other than the one the device tree selects
    via stdout-path is registered first, we will switch to using it & when
    the stdout-path console is later registered the call to
    add_preferred_console() via of_console_check() is too late to do
    anything useful. In practice this seems to mean that we switch to the
    dummy console device fairly early & see no further console output:

    Console: colour dummy device 80x25
    console [tty0] enabled
    bootconsole [ns16550a0] disabled

    Fix this by not automatically preferring the first registered console if
    one is specified by the device tree. This allows consoles to be
    registered but not enabled, and once the driver for the console selected
    by stdout-path calls of_console_check() the driver will be added to the
    list of preferred consoles before any other console has been enabled.
    When that console is then registered via register_console() it will be
    enabled as expected.

    Link: http://lkml.kernel.org/r/20160809151937.26118-1-paul.burton@imgtec.com
    Signed-off-by: Paul Burton
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: Tejun Heo
    Cc: Sergey Senozhatsky
    Cc: Jiri Slaby
    Cc: Daniel Vetter
    Cc: Ivan Delalande
    Cc: Thierry Reding
    Cc: Borislav Petkov
    Cc: Jan Kara
    Cc: Petr Mladek
    Cc: Joe Perches
    Cc: Greg Kroah-Hartman
    Cc: Rob Herring
    Cc: Frank Rowand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Burton
     

02 Sep, 2016

1 commit

  • __printk_nmi_flush() can be called from nmi_panic(), therefore it has to
    test whether it's executed in NMI context and thus must route the
    messages through deferred printk() or via direct printk().

    This is to avoid potential deadlocks, as described in commit
    cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic").

    However there remain two places where __printk_nmi_flush() does
    unconditional direct printk() calls:

    - pr_err("printk_nmi_flush: internal error ...")
    - pr_cont("\n")

    Factor out print_nmi_seq_line() parts into a new printk_nmi_flush_line()
    function, which takes care of in_nmi(), and use it in
    __printk_nmi_flush() for printing and error-reporting.

    Link: http://lkml.kernel.org/r/20160830161354.581-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Cc: Petr Mladek
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

27 Aug, 2016

1 commit

  • Commit bbeddf52adc1 ("printk: move braille console support into separate
    braille.[ch] files") moved the parsing of braille-related options into
    _braille_console_setup(), changing the type of variable str from char*
    to char**. In this commit, memcmp(str, "brl,", 4) was correctly updated
    to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4).

    Update the code to make "brl=" option work again and replace memcmp()
    with strncmp() to make the compiler able to detect such an issue.

    Fixes: bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files")
    Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org
    Signed-off-by: Nicolas Iooss
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Iooss
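
A small userspace sketch of why strncmp() is the safer choice in the fix above. `toy_match_brl()` is a hypothetical, simplified helper and not the kernel's _braille_console_setup(); it only shows the char ** dereference that the original conversion missed.

```c
#include <string.h>

/*
 * Returns 1 and advances *str past the prefix when the option starts
 * with "brl=", otherwise returns 0 and leaves *str alone.
 *
 * strncmp() takes const char *, so passing `str` (a char **) instead of
 * `*str` would draw an incompatible-pointer diagnostic from the compiler.
 * memcmp() takes const void * and would accept either one silently,
 * comparing the bytes of the pointer itself in the buggy case.
 */
static int toy_match_brl(char **str)
{
	if (strncmp(*str, "brl=", 4) == 0) {
		*str += 4;
		return 1;
	}
	return 0;
}
```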
     

10 Aug, 2016

1 commit

  • This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142.

    Geert Uytterhoeven reports:
    "This change seems to have an (unintendent?) side-effect.

    Before, pr_*() calls without a trailing newline character would be
    printed with a newline character appended, both on the console and in
    the output of the dmesg command.

    After this commit, no new line character is appended, and the output
    of the next pr_*() call of the same type may be appended, like in:

    - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
    - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
    + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"

    Joe Perches says:
    "No, that is not intentional.

    The newline handling code inside vprintk_emit is a bit involved and
    for now I suggest a revert until this has all the same behavior as
    earlier"

    Reported-by: Geert Uytterhoeven
    Requested-by: Joe Perches
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

09 Aug, 2016

1 commit

  • In commit 874f9c7da9a4 ("printk: create pr_ functions"), new
    pr_level defines were added to printk.c.

    These new defines are guarded by an #ifdef CONFIG_PRINTK - however,
    there is already a surrounding #ifdef CONFIG_PRINTK starting a lot
    earlier in line 249 which means the newly introduced #ifdef is
    unnecessary.

    Let's remove it to avoid confusion.

    Signed-off-by: Andreas Ziegler
    Cc: Joe Perches
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Ziegler
     

03 Aug, 2016

5 commits

  • Add a "printk.devkmsg" kernel command line parameter which controls how
    userspace writes into /dev/kmsg. It has three options:

    * ratelimit - ratelimit logging from userspace.
    * on - unlimited logging from userspace
    * off - logging from userspace gets ignored

    The default setting is to ratelimit the messages written to it.

    This changes the kernel default setting of "on" to "ratelimit" and we do
    that because we want to keep userspace spamming /dev/kmsg to sane
    levels. This matters most when a small kernel log buffer wraps
    around and messages get lost. So the ratelimiting setting should be a
    sane setting where kernel messages should have a bit higher chance of
    survival from all the spamming.

    It additionally does not limit logging to /dev/kmsg while the system is
    booting if we haven't disabled it on the command line.

    Furthermore, we can control the logging from a lower priority sysctl
    interface - kernel.printk_devkmsg.

    That interface will succeed only if printk.devkmsg *hasn't* been
    supplied on the command line. If it has, then printk.devkmsg is a
    one-time setting which remains for the duration of the system lifetime.
    This "locking" of the setting is to prevent userspace from changing the
    logging on us through sysctl(2).
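
    The precedence between the command line and sysctl can be sketched
    roughly as follows. All identifiers here (devkmsg_state, parse_mode()
    and the two setters) are assumptions made for illustration and need not
    match the names used in printk.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names; the kernel's own identifiers may differ. */
enum devkmsg_mode { DEVKMSG_RATELIMIT, DEVKMSG_ON, DEVKMSG_OFF };

struct devkmsg_state {
	enum devkmsg_mode mode;
	bool locked;	/* set once the command line has chosen a mode */
};

/* Translate an option string; *out is written only on success. */
static int parse_mode(const char *s, enum devkmsg_mode *out)
{
	if (!strcmp(s, "ratelimit"))
		*out = DEVKMSG_RATELIMIT;
	else if (!strcmp(s, "on"))
		*out = DEVKMSG_ON;
	else if (!strcmp(s, "off"))
		*out = DEVKMSG_OFF;
	else
		return -1;
	return 0;
}

/* Command-line path: apply the mode and lock it for the system lifetime. */
static int devkmsg_set_cmdline(struct devkmsg_state *st, const char *s)
{
	if (parse_mode(s, &st->mode))
		return -1;
	st->locked = true;
	return 0;
}

/* sysctl path: lower priority, refused once the command line has spoken. */
static int devkmsg_set_sysctl(struct devkmsg_state *st, const char *s)
{
	enum devkmsg_mode mode;

	if (st->locked || parse_mode(s, &mode))
		return -1;
	st->mode = mode;
	return 0;
}
```

    In this model a sysctl write succeeds until devkmsg_set_cmdline() has
    run; afterwards every sysctl write fails and the mode stays unchanged.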

    This patch is based on previous patches from Linus and Steven.

    [bp@suse.de: fixes]
    Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
    Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
    Signed-off-by: Borislav Petkov
    Cc: Dave Young
    Cc: Franck Bui
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Uwe Kleine-König
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • asm-generic headers are generic implementations for architecture
    specific code and should not be included by common code. Thus use the
    asm/ version of sections.h to get at the linker sections.

    Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Messages' levels and console log level are inspected when the actual
    printing occurs, which may provoke console_unlock() and
    console_cont_flush() to waste CPU cycles on every message that has
    loglevel above the current console_loglevel.

    Schematically, console_unlock() does the following:

    console_unlock()
    {
        ...
        for (;;) {
            ...
            raw_spin_lock_irqsave(&logbuf_lock, flags);
    skip:
            msg = log_from_idx(console_idx);

            if (msg->flags & LOG_NOCONS) {
                ...
                goto skip;
            }

            level = msg->level;
            len += msg_print_text();                 >> sprintfs
                                                        memcpy,
                                                        etc.

            if (nr_ext_console_drivers) {
                ext_len = msg_print_ext_header();    >> scnprintf
                ext_len += msg_print_ext_body();     >> scnprintfs
                                                        etc.
            }
            ...
            raw_spin_unlock(&logbuf_lock);

            call_console_drivers(level, ext_text, ext_len, text, len)
            {
                if (level >= console_loglevel &&     >> drop the message
                        !ignore_loglevel)
                    return;

                console->write(...);
            }

            local_irq_restore(flags);
        }
        ...
    }

    The thing here is this deferred `level >= console_loglevel' check. We
    are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
    that we will eventually drop.

    This can be huge when we register a new CON_PRINTBUFFER console, for
    instance. For every such console, register_console() resets

    console_seq, console_idx, console_prev

    and sets an `exclusive console' pointer to replay the log buffer to that
    just-registered console. There can be a lot of messages to replay, and
    in the worst case most of them are dropped after the console_loglevel
    test.

    We know messages' levels long before we call msg_print_text() and
    friends, so we can just move console_loglevel check out of
    call_console_drivers() and format a new message only if we are sure that
    it won't be dropped.

    The patch factors out loglevel check into suppress_message_printing()
    function and tests message->level and console_loglevel before formatting
    functions in console_unlock() and console_cont_flush() are getting
    executed. This improves things not only for exclusive CON_PRINTBUFFER
    consoles, but for every console_unlock() that attempts to print a
    message of level above the console_loglevel.
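
    The reordering can be sketched as below. suppress_message_printing()
    mirrors the check quoted above; struct msg, format_message() and
    flush_messages() are simplified stand-ins invented for this example,
    and the initial console_loglevel value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

static int console_loglevel = 4;	/* illustrative value */
static bool ignore_loglevel;

/* The level test, factored out so it can run before any formatting. */
static bool suppress_message_printing(int level)
{
	return level >= console_loglevel && !ignore_loglevel;
}

/* Simplified stand-in for a log record. */
struct msg {
	int level;
};

static int format_calls;	/* counts the expensive formatting work */

/* Stand-in for msg_print_text() and friends. */
static void format_message(const struct msg *m)
{
	(void)m;
	format_calls++;
}

/* Returns how many messages would have reached the console. */
static int flush_messages(const struct msg *msgs, int n)
{
	int printed = 0;

	for (int i = 0; i < n; i++) {
		/* Decide first, so dropped messages cost no formatting. */
		if (suppress_message_printing(msgs[i].level))
			continue;
		format_message(&msgs[i]);
		printed++;
	}
	return printed;
}
```

    For messages of levels 3, 4, 7 and 0, only the two below
    console_loglevel are formatted in this model; setting ignore_loglevel
    lets all four through.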

    Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Tejun Heo
    Cc: Jan Kara
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Using functions instead of macros can reduce overall code size by
    eliminating unnecessary "KERN_SOH" prefixes from format strings.

    defconfig x86-64:

    $ size vmlinux*
    text data bss dec hex filename
    10193570 4331464 1105920 15630954 ee826a vmlinux.new
    10192623 4335560 1105920 15634103 ee8eb7 vmlinux.old

    As the return values are unimportant and unused in the kernel tree, these
    new functions return void.

    Miscellanea:

    - change pr_ macros to call new __pr_ functions
    - change vprintk_nmi and vprintk_default to add LOGLEVEL_ argument

    [akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe]
    Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • A trivial cosmetic change: interrupt.h header is redundant since commit
    6b898c07cb1d ("console: use might_sleep in console_lock").

    Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

29 Jul, 2016

1 commit

  • We currently show:

    task: ti: task.ti: "

    "ti" and "task.ti" are redundant, and neither is actually what we want
    to show, which is the base of the thread stack. Change the display to
    show the stack pointer explicitly.

    Link: http://lkml.kernel.org/r/543ac5bd66ff94000a57a02e11af7239571a3055.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

21 May, 2016

4 commits

  • In NMI context, printk() messages are stored into per-CPU buffers to
    avoid a possible deadlock. They are normally flushed to the main ring
    buffer via an IRQ work. But the work is never called when the system
    calls panic() in the very same NMI handler.

    This patch tries to flush NMI buffers before the crash dump is
    generated. In this case it does not risk a double release and bails out
    when the logbuf_lock is already taken. The aim is to get the messages
    into the main ring buffer when possible. This makes them more
    accessible in the vmcore.

    Then the patch tries to flush the buffers a second time when other CPUs
    are down. It might be more aggressive and reset logbuf_lock. The aim
    is to make the messages available for the subsequent kmsg_dump() and
    console_flush_on_panic() calls.

    The patch causes vprintk_emit() to be called even in NMI context again.
    But it is done via printk_deferred() so that the console handling is
    skipped. Consoles use internal locks and we could not prevent a
    deadlock easily. They are explicitly called later when the crash dump
    is not generated, see console_flush_on_panic().

    Signed-off-by: Petr Mladek
    Cc: Benjamin Herrenschmidt
    Cc: Daniel Thompson
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jiri Kosina
    Cc: Martin Schwidefsky
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Testing has shown that the backtrace sometimes does not fit into the 4kB
    temporary buffer that is used in NMI context. The warnings are gone
    when I double the temporary buffer size.

    This patch doubles the buffer size and makes it configurable.

    Note that this problem existed even in the x86-specific implementation
    that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI
    stack trace on all CPUs"). Nobody noticed it because it did not print
    any warnings.

    Signed-off-by: Petr Mladek
    Cc: Jan Kara
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Russell King
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • We cannot resize the temporary buffer in NMI context. Let's warn if
    a message is lost.

    This is rather theoretical. printk() should not be used in NMI. The
    only sensible use is when we want to print backtrace from all CPUs. The
    current buffer should be enough for this purpose.

    [akpm@linux-foundation.org: whitespace fixlet]
    Signed-off-by: Petr Mladek
    Cc: Jan Kara
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Russell King
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • printk() takes some locks and cannot be used in a safe way in NMI
    context.

    The chance of a deadlock is real especially when printing stacks from
    all CPUs. This particular problem has been addressed on x86 by the
    commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
    CPUs").

    The patchset brings two big advantages. First, it makes the NMI
    backtraces safe on all architectures for free. Second, it makes all NMI
    messages almost safe on all architectures (the temporary buffer is
    limited, so we should still keep the number of messages in NMI context
    to a minimum).

    Note that there already are several messages printed in NMI context:
    WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
    handlers. These are not easy to avoid.

    This patch reuses most of the code and makes it generic. It is useful
    for all messages and architectures that support NMI.

    The alternative printk_func is set when entering and is reset when
    leaving NMI context. It queues IRQ work to copy the messages into the
    main ring buffer in a safe context.

    __printk_nmi_flush() copies all available messages and resets the
    buffer. Then we can use simple cmpxchg operations to synchronize with
    writers. A spinlock is also used to synchronize with other flushers.
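
    A rough userspace model of this write/flush protocol is sketched below
    with C11 atomics standing in for the kernel's cmpxchg. The structure
    and function names are hypothetical, the buffer size is arbitrary, and
    the spinlock that serializes concurrent flushers is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NMI_BUF_SIZE 64	/* illustrative; far smaller than a real buffer */

/* Hypothetical per-CPU buffer: `len` is the only shared control word. */
struct nmi_buf {
	atomic_size_t len;
	char data[NMI_BUF_SIZE];
};

static void nmi_buf_init(struct nmi_buf *b)
{
	atomic_init(&b->len, 0);
}

/* Writer: copy at the current end, then publish with compare-and-swap. */
static size_t nmi_buf_write(struct nmi_buf *b, const char *msg)
{
	size_t len, add;

	do {
		len = atomic_load(&b->len);
		if (len >= NMI_BUF_SIZE)
			return 0;	/* buffer full: the message is lost */
		add = strlen(msg);
		if (add > NMI_BUF_SIZE - len)
			add = NMI_BUF_SIZE - len;	/* truncate to fit */
		memcpy(b->data + len, msg, add);
		/* Retry from scratch if a flusher reset `len` meanwhile. */
	} while (!atomic_compare_exchange_strong(&b->len, &len, len + add));

	return add;
}

/*
 * Flusher: copy what is there, then reset `len` to 0 only if no writer
 * appended in between; otherwise copy the new tail and try again.
 * `out` must hold at least NMI_BUF_SIZE bytes.
 */
static size_t nmi_buf_flush(struct nmi_buf *b, char *out)
{
	size_t len, done = 0;

	do {
		len = atomic_load(&b->len);
		memcpy(out + done, b->data + done, len - done);
		done = len;
	} while (!atomic_compare_exchange_strong(&b->len, &len, 0));

	return done;
}
```

    The point of the design is that neither side ever blocks: a writer that
    loses the race simply rewrites its message at the new end of the
    buffer, and a flusher that loses the race copies the extra bytes before
    retrying the reset.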

    We no longer use seq_buf because it depends on an external lock. It
    would be hard to make all supported operations safe for lockless use.
    It would be confusing and error prone to make only some operations safe.

    The code is put into separate printk/nmi.c as suggested by Steven
    Rostedt. It needs a per-CPU buffer and is compiled only on
    architectures that call nmi_enter(). This is achieved by the new
    HAVE_NMI Kconfig flag.

    The exceptions are the MN10300 and Xtensa architectures. We need to
    clean up NMI handling there first. Let's do it separately.

    The patch is heavily based on the draft from Peter Zijlstra, see

    https://lkml.org/lkml/2015/6/10/327

    [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
    [akpm@linux-foundation.org: min_t->min - all types are size_t here]
    Signed-off-by: Petr Mladek
    Suggested-by: Peter Zijlstra
    Suggested-by: Steven Rostedt
    Cc: Jan Kara
    Acked-by: Russell King [arm part]
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

18 Mar, 2016

4 commits

  • This allows us to extract from the vmcore only the messages emitted
    since the last time the ring buffer was cleared. We just have to make
    sure its value is always up-to-date, when old messages are discarded to
    free space in log_make_free_space() for example.

    Signed-off-by: Zeyu Zhao
    Signed-off-by: Ivan Delalande
    Cc: Kay Sievers
    Cc: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ivan Delalande
     
  • have_callable_console() must also test CON_ENABLED bit, not just
    CON_ANYTIME. We may have disabled CON_ANYTIME console so printk can
    wrongly assume that it's safe to call_console_drivers().
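
    A minimal model of the corrected test is shown here. The flag values
    and the pared-down struct console are assumptions for illustration
    only; the real definitions live in the kernel's console header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values, not the kernel's. */
#define CON_ENABLED	(1u << 0)
#define CON_ANYTIME	(1u << 1)

/* Simplified stand-in for the kernel's console list entry. */
struct console {
	unsigned int flags;
	struct console *next;
};

/* A console counts only if it is both enabled and callable at any time. */
static bool have_callable_console(const struct console *head)
{
	for (const struct console *con = head; con; con = con->next)
		if ((con->flags & CON_ENABLED) &&
		    (con->flags & CON_ANYTIME))
			return true;
	return false;
}
```

    A list holding only a disabled CON_ANYTIME console and an enabled
    console without CON_ANYTIME yields false in this model, which is the
    case the fix is about.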

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • console_unlock() may cond_resched() if its caller has set
    `console_may_schedule' to 1, since 8d91f8b15361 ("printk: do
    cond_resched() between lines while outputting to consoles").

    The rules are:
    -- console_lock() always sets `console_may_schedule' to 1
    -- console_trylock() always sets `console_may_schedule' to 0

    However, console_trylock() callers (among them is printk()) do not
    always call printk() from atomic contexts, and some of them can
    cond_resched() in console_unlock(), so console_trylock() can set
    `console_may_schedule' to 1 for such processes.

    For !CONFIG_PREEMPT_COUNT kernels, however, console_trylock() always
    sets `console_may_schedule' to 0.

    It's possible to drop explicit preempt_disable()/preempt_enable() in
    vprintk_emit(), because console_unlock() and console_trylock() are now
    smart enough:
    a) console_unlock() does not cond_resched() when it's unsafe
    (console_trylock() takes care of that)
    b) console_unlock() does can_use_console() check.
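
    The rules for `console_may_schedule' can be modelled as follows. The
    call_ctx structure and the model_* helpers are hypothetical; they only
    encode the rules stated above and do not reproduce how the kernel
    inspects its own context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the calling context; not a kernel structure. */
struct call_ctx {
	bool have_preempt_count;	/* models CONFIG_PREEMPT_COUNT=y */
	bool atomic;			/* caller must not sleep */
};

static int console_may_schedule;

/* Models console_lock(): the caller may sleep, so rescheduling is fine. */
static void model_console_lock(void)
{
	console_may_schedule = 1;
}

/* Models a successful console_trylock() under the new rule. */
static void model_console_trylock(const struct call_ctx *ctx)
{
	/* Without a preempt count the context cannot be judged: stay at 0. */
	console_may_schedule = ctx->have_preempt_count && !ctx->atomic;
}
```

    In this model only a trylock from a non-atomic caller on a kernel with
    a preempt count ends up with `console_may_schedule' set to 1.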

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • console_unlock() may cond_resched() if its caller has set
    `console_may_schedule' to 1 (this functionality has been present since
    8d91f8b15361 ("printk: do cond_resched() between lines while outputting
    to consoles")).

    The rules are:
    -- console_lock() always sets `console_may_schedule' to 1
    -- console_trylock() always sets `console_may_schedule' to 0

    printk() calls console_unlock() with preemption disabled, which
    basically can lead to RCU stalls, watchdog soft lockups, etc. if
    something is simultaneously calling printk() frequently enough (IOW,
    the console_sem owner always has new data to send to console drivers
    and can't leave console_unlock() for a long time).

    printk()->console_trylock() callers do not necessarily execute in atomic
    contexts, and some of them can cond_resched() in console_unlock().
    console_trylock() can set `console_may_schedule' to 1 (allow
    cond_resched() later in console_unlock()) when it's safe.

    This patch (of 3):

    vprintk_emit() disables preemption around console_trylock_for_printk()
    and console_unlock() calls for a strong reason -- the can_use_console()
    check. The thing is that vprintk_emit() can be called on a CPU that is
    not fully brought up yet (!cpu_online()), which potentially can cause
    problems if a console driver wants to access per-cpu data. A console
    driver can explicitly state that it's safe to call it from a !online cpu
    by setting the CON_ANYTIME bit in console ->flags. That's why for
    !cpu_online() can_use_console() iterates over all the consoles to find
    out if there is a CON_ANYTIME console; otherwise console_unlock() must
    be avoided.

    can_use_console() ensures that console_unlock() call is safe in
    vprintk_emit() only; console_lock() and console_trylock() are not
    covered by this check. Even though call_console_drivers(), invoked from
    console_cont_flush() and console_unlock(), tests `!cpu_online() &&
    CON_ANYTIME' for_each_console(), it may be too late, which can result in
    message loss.

    Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
    CON_ANYTIME consoles available.

    CPU0 online                       CPU1 !online
                                      console_trylock()
                                      ...
                                      console_unlock()
                                        console_cont_flush
                                          spin_lock logbuf_lock
                                          if (!cont.len) {
                                            spin_unlock logbuf_lock
                                            return
                                          }
                                        for (;;) {
    vprintk_emit
      spin_lock logbuf_lock
      log_store
      spin_unlock logbuf_lock
                                          spin_lock logbuf_lock
      !console_trylock_for_printk         msg_print_text
      return                              console_idx = log_next()
                                          console_seq++
                                          console_prev = msg->flags
                                          spin_unlock logbuf_lock

                                          call_console_drivers()
                                            for_each_console(con) {
                                              if (!cpu_online() &&
                                                  !(con->flags & CON_ANYTIME))
                                                continue;
                                            }
                                        /*
                                         * no message printed, we lost it
                                         */
    vprintk_emit
      spin_lock logbuf_lock
      log_store
      spin_unlock logbuf_lock
      !console_trylock_for_printk
      return
                                        /*
                                         * go to the beginning of the loop,
                                         * find out there are new messages,
                                         * lose it
                                         */
                                        }

    console_trylock()/console_lock() call on CPU1 may come from cpu
    notifiers registered on that CPU. Since notifiers are not getting
    unregistered when CPU is going DOWN, all of the notifiers receive
    notifications during CPU UP. For example, on my x86_64, I see around 50
    notifications sent from an offline CPU to itself

    [swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify

    while doing
    echo 0 > /sys/devices/system/cpu/cpu2/online
    echo 1 > /sys/devices/system/cpu/cpu2/online

    So grabbing the console_sem lock while CPU is !online is possible,
    in theory.

    This patch moves can_use_console() check out of
    console_trylock_for_printk(). Instead it calls it in console_unlock(),
    so now console_lock()/console_unlock() are also 'protected' by
    can_use_console(). This also means that console_trylock_for_printk() is
    not really needed anymore and can be removed.

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky