14 Dec, 2014

1 commit


11 Dec, 2014

6 commits

  • As printk_func will either be the default function, or a per_cpu function
    for the current CPU, there's no reason to disable preemption to access
    it from printk. That's because if the printk_func is not the default,
    then the caller had better have disabled preemption, as they were the
    one to change it.

    Link: http://lkml.kernel.org/r/CA+55aFz5-_LKW4JHEBoWinN9_ouNcGRWAF2FUA35u46FRN-Kxw@mail.gmail.com

    Suggested-by: Linus Torvalds
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Pull nmi-safe seq_buf printk update from Steven Rostedt:
    "This code is a fork from the trace-3.19 pull as it needed the
    trace_seq clean ups from that branch.

    This code solves the issue of performing stack dumps from NMI context.
    The issue is that printk() is not safe from NMI context, as if the NMI
    were to trigger while a printk() was being performed, the NMI could
    deadlock on the printk() internal locks. This has been seen in
    practice.

    With lots of review from Petr Mladek, this code went through several
    iterations, and we feel that it is now at a point of quality to be
    accepted into mainline.

    Here's what is contained in this patch set:

    - Creates a "seq_buf" generic buffer utility that allows a descriptor
    to be passed around where functions can write their own "printk()"
    formatted strings into it. The generic version was pulled out of
    the trace_seq() code that was made specifically for tracing.

    - The seq_buf code was changed to model the seq_file code. I have a
    patch (not included for 3.19) that converts the seq_file.c code
    over to use seq_buf.c like the trace_seq.c code does. This was
    done to make sure that seq_buf.c is compatible with seq_file.c. I
    may try to get that patch in for 3.20.

    - The seq_buf.c file was moved to lib/ to remove it from being
    dependent on CONFIG_TRACING.

    - The printk() was updated to allow for a per_cpu "override" of the
    internal calls. That is, instead of writing to the console, a call
    to printk() may do something else. This made it easier to allow
    the NMI to change what printk() does in order to call dump_stack()
    without needing to update that code as well.

    - Finally, the dump_stack from all CPUs via NMI code was converted to
    use the seq_buf code. The caller to trigger the NMI code would
    wait till all the NMIs finished, and then it would print the
    seq_buf data to the console safely from a non-NMI context.

    One added bonus is that this code also makes the NMI dump stack work
    on PREEMPT_RT kernels. As printk() includes sleeping locks on
    PREEMPT_RT, printk() only writes to the console if the console does not
    use any rt_mutex-converted spin locks, which a lot do."
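
    A minimal user-space sketch of the seq_buf idea described above: a
    descriptor wrapping a caller-owned buffer that writers can append
    printf-style output to, with overflow tracked instead of silently lost.
    The names sketch_seq_buf, sb_init(), sb_printf() and sb_overflowed() are
    hypothetical and are not the kernel's lib/seq_buf.c API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor: caller-owned buffer plus a write position. */
struct sketch_seq_buf {
	char *buffer;
	size_t size;
	size_t len;   /* bytes used; set to size + 1 once an append overflows */
};

static void sb_init(struct sketch_seq_buf *s, char *buf, size_t size)
{
	s->buffer = buf;
	s->size = size;
	s->len = 0;
	if (size > 0)
		buf[0] = '\0';
}

static bool sb_overflowed(const struct sketch_seq_buf *s)
{
	return s->len > s->size;
}

/* Append formatted text. If it does not fit (including the NUL), the
 * descriptor is marked as overflowed and later appends are ignored. */
static int sb_printf(struct sketch_seq_buf *s, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (sb_overflowed(s))
		return -1;
	va_start(ap, fmt);
	n = vsnprintf(s->buffer + s->len, s->size - s->len, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= s->size - s->len) {
		s->len = s->size + 1;
		return -1;
	}
	s->len += (size_t)n;
	return 0;
}
```

    In the NMI use case, each CPU would fill such a buffer from NMI context
    and the non-NMI caller would print the collected text afterwards.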

    * tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    x86/nmi: Fix use of unallocated cpumask_var_t
    printk/percpu: Define printk_func when printk is not defined
    x86/nmi: Perform a safe NMI stack trace on all CPUs
    printk: Add per_cpu printk func to allow printk to be diverted
    seq_buf: Move the seq_buf code to lib/
    seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
    tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
    tracing: Have seq_buf use full buffer
    seq_buf: Add seq_buf_can_fit() helper function
    tracing: Add paranoid size check in trace_printk_seq()
    tracing: Use trace_seq_used() and seq_buf_used() instead of len
    tracing: Clean up tracing_fill_pipe_page()
    seq_buf: Create seq_buf_used() to find out how much was written
    tracing: Add a seq_buf_clear() helper and clear len and readpos in init
    tracing: Convert seq_buf fields to be like seq_file fields
    tracing: Convert seq_buf_path() to be like seq_path()
    tracing: Create seq_buf layer in trace_seq

    Linus Torvalds
     
  • Merge first patchbomb from Andrew Morton:
    - a few minor cifs fixes
    - dma-debug updates
    - ocfs2
    - slab
    - about half of MM
    - procfs
    - kernel/exit.c
    - panic.c tweaks
    - printk updates
    - lib/ updates
    - checkpatch updates
    - fs/binfmt updates
    - the drivers/rtc tree
    - nilfs
    - kmod fixes
    - more kernel/exit.c
    - various other misc tweaks and fixes

    * emailed patches from Andrew Morton: (190 commits)
    exit: pidns: fix/update the comments in zap_pid_ns_processes()
    exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
    exit: exit_notify: re-use "dead" list to autoreap current
    exit: reparent: call forget_original_parent() under tasklist_lock
    exit: reparent: avoid find_new_reaper() if no children
    exit: reparent: introduce find_alive_thread()
    exit: reparent: introduce find_child_reaper()
    exit: reparent: document the ->has_child_subreaper checks
    exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
    exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
    exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
    exit: proc: don't try to flush /proc/tgid/task/tgid
    exit: release_task: fix the comment about group leader accounting
    exit: wait: drop tasklist_lock before psig->c* accounting
    exit: wait: don't use zombie->real_parent
    exit: wait: cleanup the ptrace_reparented() checks
    usermodehelper: kill the kmod_thread_locker logic
    usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
    fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
    nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
    ...

    Linus Torvalds
     
  • Pranith Kumar posted a patch which removed the "volatile"
    qualifier for the "logbuf_cpu" variable in vprintk_emit().
    https://lkml.org/lkml/2014/11/13/894
    In his patch, he used ACCESS_ONCE() for all references to
    that symbol to provide whatever protection was intended.

    There was some discussion that followed, and in the end Steven Rostedt
    concluded that not only was "volatile" not needed, neither was it
    required to use ACCESS_ONCE(). I offered an elaborate description that
    concluded Steven was right, and Pranith asked me to submit an
    alternative patch. And this is it.

    The basic reason "volatile" is not needed is that "logbuf_cpu" has
    static storage duration, and vprintk_emit() is an exported
    interface. This means that the value of logbuf_cpu must be read
    from memory the first time it is used in a particular call of
    vprintk_emit(). The variable's value is read only once in that
    function, so when it is read it will be the copy from memory (or cache).

    In addition, the value of "logbuf_cpu" is only ever written under
    protection of a spinlock. So the value that is read is the "real"
    value (and not an out-of-date cached one). If its value is not
    UINT_MAX, it is the current CPU's processor id, and it will have
    been last written by the running CPU.
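
    The recursion check built on logbuf_cpu can be sketched in user space to
    show where the single read and the lock-protected writes sit. This is a
    single-threaded illustration only: the names logbuf_cpu_sketch,
    enter_log_section() and leave_log_section() are hypothetical, the
    spinlock is represented by comments, and the sketch does not model the
    memory-ordering argument itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for logbuf_cpu: UINT_MAX means no CPU is inside
 * the logging critical section. Deliberately declared without volatile. */
static unsigned int logbuf_cpu_sketch = UINT_MAX;

/* Returns true if this CPU is already inside the critical section, i.e. a
 * printk() issued from within printk(). The variable is read once. */
static bool enter_log_section(unsigned int this_cpu)
{
	if (logbuf_cpu_sketch == this_cpu)
		return true;              /* recursion: do not take the lock again */
	/* raw_spin_lock(&logbuf_lock) would be taken here in the kernel. */
	logbuf_cpu_sketch = this_cpu;     /* only ever written under the lock */
	return false;
}

static void leave_log_section(void)
{
	logbuf_cpu_sketch = UINT_MAX;     /* reset before the lock is dropped */
	/* raw_spin_unlock(&logbuf_lock) would follow here. */
}
```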

    Signed-off-by: Alex Elder
    Reported-by: Pranith Kumar
    Suggested-by: Steven Rostedt
    Reviewed-by: Jan Kara
    Cc: Petr Mladek
    Cc: Luis R. Rodriguez
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • Use #defines instead of magic values.

    Signed-off-by: Joe Perches
    Acked-by: Greg Kroah-Hartman
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Eliminate the unlikely possibility of message interleaving for
    early_printk/early_vprintk use.

    early_vprintk can be done via the %pV extension so remove this
    unnecessary function and change early_printk to have the equivalent
    vprintk code.

    All uses of early_printk already end with a newline so also remove the
    unnecessary newline from the early_printk function.
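
    A rough user-space sketch of the approach: format the complete message
    once, then hand it to the console in a single write call. The kernel
    forwards the format and va_list through the %pV extension; here
    vsnprintf() stands in for that, and early_console_write(),
    console_out and early_printk_sketch() are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink standing in for an early console's write(). */
static char console_out[256];
static size_t console_used;

static void early_console_write(const char *s, size_t n)
{
	size_t room = sizeof(console_out) - 1 - console_used;

	if (n > room)
		n = room;
	memcpy(console_out + console_used, s, n);
	console_used += n;
	console_out[console_used] = '\0';
}

/* Format the whole message first, then emit it with one write call. No
 * newline is appended: callers are expected to supply their own. */
static void early_printk_sketch(const char *fmt, ...)
{
	char buf[128];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	if ((size_t)n >= sizeof(buf))
		n = (int)sizeof(buf) - 1;   /* message was truncated */
	early_console_write(buf, (size_t)n);
}
```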

    Signed-off-by: Joe Perches
    Acked-by: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

22 Nov, 2014

1 commit

  • To avoid include hell, the per_cpu variable printk_func was declared
    in percpu.h. But it is only defined if printk is defined.

    As users of printk may also use the printk_func variable, it needs to
    be defined even if CONFIG_PRINTK is not.

    Also add a printk.h include in percpu.h just to be safe.

    Link: http://lkml.kernel.org/r/20141121183215.01ba539c@canb.auug.org.au

    Reported-by: Stephen Rothwell
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

20 Nov, 2014

1 commit

  • Being able to divert printk to call another function besides the normal
    logging is useful for things such as NMI handling. If functions that
    call printk() are to be called from NMI context, it is possible to lock
    up the box if the NMI handler triggers while another printk is happening.

    One example of this use is to perform a stack trace on all CPUs via NMI.
    But if the NMI is to do the printk() it can cause the system to lock up.
    By allowing the printk to be diverted to another function that can safely
    record the printk output and then print it when it is in a safe context,
    then NMIs will be safe to call these functions like show_regs().
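
    The diversion can be pictured as a per-CPU function pointer that printk
    calls through. In this user-space sketch a plain array indexed by a
    hypothetical current_cpu variable stands in for DEFINE_PER_CPU and
    smp_processor_id(), and all names are hypothetical. The real code
    requires the caller to stay on one CPU (preemption disabled) while the
    pointer is switched, which this single-threaded sketch does not model.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS_SKETCH 4

typedef int (*printk_func_t)(const char *fmt, va_list args);

static char console_log[256];                /* the normal log path */
static char nmi_buf[NR_CPUS_SKETCH][128];    /* per-CPU safe buffers */
static unsigned int current_cpu;             /* hypothetical "this CPU" */

static int default_printk_func(const char *fmt, va_list args)
{
	size_t used = strlen(console_log);

	return vsnprintf(console_log + used, sizeof(console_log) - used, fmt, args);
}

/* Records output for later printing from a safe context. */
static int nmi_printk_func(const char *fmt, va_list args)
{
	char *buf = nmi_buf[current_cpu];
	size_t used = strlen(buf);

	return vsnprintf(buf + used, sizeof(nmi_buf[0]) - used, fmt, args);
}

/* One pointer per CPU, each starting at the default function. */
static printk_func_t printk_func_sketch[NR_CPUS_SKETCH] = {
	default_printk_func, default_printk_func,
	default_printk_func, default_printk_func,
};

static int printk_sketch(const char *fmt, ...)
{
	va_list args;
	int r;

	va_start(args, fmt);
	r = printk_func_sketch[current_cpu](fmt, args);
	va_end(args);
	return r;
}
```

    An NMI handler would set its CPU's pointer to nmi_printk_func, call
    code such as show_regs(), and restore the default afterwards.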

    Link: http://lkml.kernel.org/p/20140619213952.209176403@goodmis.org

    Tested-by: Jiri Kosina
    Acked-by: Jiri Kosina
    Acked-by: Paul E. McKenney
    Reviewed-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

06 Nov, 2014

1 commit

  • When the kernel.dmesg_restrict restriction is in place, only users with
    CAP_SYSLOG should be able to access crash dumps (for example: an attacker
    tries to exploit a bug, the watchdog reboots the machine, and the
    attacker can then happily read the crash dumps and logs).

    This puts the restriction on console-* types as well, as sensitive
    information could have leaked there.

    Other log types are unaffected.
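
    The resulting policy can be summarized as a small predicate. This is a
    hypothetical sketch of the rule as described, not the pstore code: the
    enum values and may_read_record() are made-up names, and the capability
    check is passed in as a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record types: crash dumps, console logs, everything else. */
enum rec_type { REC_DMESG, REC_CONSOLE, REC_OTHER };

/* Returns true if a reader may open a record of the given type. */
static bool may_read_record(enum rec_type type, bool dmesg_restrict,
			    bool has_cap_syslog)
{
	if (!dmesg_restrict || has_cap_syslog)
		return true;
	/* Under kernel.dmesg_restrict, dmesg and console records need
	 * CAP_SYSLOG; other log types are unaffected. */
	return type != REC_DMESG && type != REC_CONSOLE;
}
```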

    Signed-off-by: Sebastian Schmidt
    Acked-by: Kees Cook
    Signed-off-by: Tony Luck

    Sebastian Schmidt
     

15 Oct, 2014

1 commit

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the two now-duplicate sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
     

14 Oct, 2014

2 commits

  • Commit 458df9fd4815 ("printk: remove separate printk_sched buffers and use
    printk buf instead") hardcodes printk_deferred() to KERN_WARNING and
    inserts the string "[sched_delayed] " before the actual message. However
    it doesn't take into account the KERN_* prefix of the message, that now
    ends up in the middle of the output:

    [sched_delayed] ^A4CE: hpet increased min_delta_ns to 20115 nsec

    Fix this by just getting rid of the "[sched_delayed] " scnprintf(). The
    prefix has been useless since 458df9fd4815 anyway, because from that
    moment printk_deferred() inserts the message into the kernel printk buffer
    immediately. So if the message eventually gets printed to console, it is
    printed in the correct order with other messages and there's no need for
    any special prefix. And if the kernel crashes before the message makes it
    to console, then the prefix in the printk buffer doesn't make the situation
    any better.
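
    The byte-level problem can be reproduced in user space, assuming
    KERN_SOH is the ASCII SOH character ("\001", the control character shown
    in the output above) followed by a level digit. skip_level_header() and
    tag_message() are hypothetical helpers: the first recognizes the level
    header only at the start of the text, the second mimics prepending a tag
    to the whole string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KERN_SOH "\001"             /* ASCII Start Of Header */
#define KERN_WARNING KERN_SOH "4"

/* Strip a leading "<SOH><level>" header; a header anywhere else is kept. */
static const char *skip_level_header(const char *s)
{
	if (s[0] == KERN_SOH[0] && s[1] != '\0')
		return s + 2;
	return s;
}

/* Mimics the removed behaviour: the tag goes in front of the header. */
static void tag_message(char *out, size_t n, const char *msg)
{
	snprintf(out, n, "[sched_delayed] %s", msg);
}
```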

    Link: http://lkml.org/lkml/2014/9/14/4

    Signed-off-by: Markus Trippelsdorf
    Acked-by: Jan Kara
    Acked-by: Steven Rostedt
    Cc: Geert Uytterhoeven
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Trippelsdorf
     
  • When configuring a uniprocessor kernel, don't bother the user with an
    irrelevant LOG_CPU_MAX_BUF_SHIFT question, and don't build the unused
    code.

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Luis R. Rodriguez
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

09 Oct, 2014

1 commit


11 Sep, 2014

1 commit

  • We shouldn't set text_len in the code path that detects printk recursion
    because text_len corresponds to the length of the string inside textbuf.
    A few lines down from the line

    text_len = strlen(recursion_msg);

    is the line

    text_len += vscnprintf(text + text_len, ...);

    So if printk detects recursion, it sets text_len to 29 (the length of
    recursion_msg) and logs an error. Then the message supplied by the
    caller of printk is stored inside textbuf but offset by 29 bytes. This
    means that the output of the recursive call to printk will contain 29
    bytes of garbage in front of it.

    This defect is caused by commit 458df9fd4815 ("printk: remove separate
    printk_sched buffers and use printk buf instead") which turned the line

    text_len = vscnprintf(text, ...);

    into

    text_len += vscnprintf(text + text_len, ...);

    To fix this, this patch avoids setting text_len when logging the printk
    recursion error. This patch also marks unlikely() the branch leading up
    to this code.
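
    A reduced user-space reproduction of the offset bug. format_message()
    is a hypothetical stand-in for the relevant part of vprintk_emit(); the
    recursion message is assumed here to be "BUG: recent printk recursion!",
    which is 29 bytes long. With buggy set, the caller's text lands 29 bytes
    into a buffer that never received recursion_msg.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static const char recursion_msg[] = "BUG: recent printk recursion!";

/* Formats the caller's message into text. In the buggy variant, the
 * recursion path also sets text_len, so the message is written at an
 * offset and the bytes before it are whatever text already held. */
static size_t format_message(char *text, size_t size, int recursed,
			     int buggy, const char *fmt, ...)
{
	size_t text_len = 0;
	va_list ap;
	int n;

	/* The recursion message itself is logged separately (not shown). */
	if (recursed && buggy)
		text_len = strlen(recursion_msg);

	va_start(ap, fmt);
	n = vsnprintf(text + text_len, size - text_len, fmt, ap);
	va_end(ap);
	if (n > 0)
		text_len += (size_t)n;
	return text_len;
}
```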

    Fixes: 458df9fd4815b478 ("printk: remove separate printk_sched buffers and use printk buf instead")
    Signed-off-by: Patrick Palka
    Reviewed-by: Petr Mladek
    Reviewed-by: Jan Kara
    Acked-by: Steven Rostedt
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Patrick Palka
     

27 Aug, 2014

1 commit


13 Aug, 2014

1 commit

  • Platforms like IBM Power Systems support service-processor-assisted
    dump. The service processor provides an interface to add a memory
    region to be captured when the system crashes.

    During initialization/running we can add a kernel memory region
    to be collected.

    Presently we don't have a way to get the log buffer base address
    and size. This patch adds support to return log buffer address
    and size.

    Signed-off-by: Vasant Hegde
    Signed-off-by: Benjamin Herrenschmidt
    Acked-by: Andrew Morton

    Vasant Hegde
     

07 Aug, 2014

11 commits

  • Fix coccinelle warnings.

    Signed-off-by: Neil Zhang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Zhang
     
  • We need interrupts disabled when calling console_trylock_for_printk()
    only so that the cpu id we pass to can_use_console() remains valid (for
    other things console_sem provides all the exclusion we need and
    deadlocks on console_sem due to interrupts are impossible because we use
    down_trylock()). However if we are rescheduled, we are guaranteed to
    run on an online cpu so we can easily just get the cpu id in
    can_use_console().

    We can lose a bit of performance when we enable interrupts in
    vprintk_emit() and then disable them again in console_unlock() but OTOH
    it can somewhat reduce interrupt latency caused by console_unlock().

    We differ from (reverted) commit 939f04bec1a4 in that we avoid calling
    console_unlock() from vprintk_emit() with lockdep enabled as that has
    uncovered quite a few bugs leading to system freezes during boot (e.g.
    https://lkml.org/lkml/2014/5/30/242,
    https://lkml.org/lkml/2014/6/28/521).

    Signed-off-by: Jan Kara
    Tested-by: Andreas Bombe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Some small cleanups to kernel/printk/printk.c. None of them should
    cause any change in behavior.

    - When CONFIG_PRINTK is defined, parenthesize the value of LOG_LINE_MAX.
    - When CONFIG_PRINTK is *not* defined, there is an extra LOG_LINE_MAX
    definition; delete it.
    - Pull an assignment out of a conditional expression in console_setup().
    - Use isdigit() in console_setup() rather than open coding it.
    - In update_console_cmdline(), drop a NUL-termination assignment;
    the strlcpy() call that precedes it guarantees it's not needed.
    - Simplify some logic in printk_timed_ratelimit().

    Signed-off-by: Alex Elder
    Reviewed-by: Petr Mladek
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc: Jan Kara
    Cc: John Stultz
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • Use the IS_ENABLED() macro rather than #ifdef blocks to set certain
    global values.
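
    A user-space imitation of the preprocessor trick behind IS_ENABLED(),
    to show why it can replace #ifdef blocks: it turns "defined to 1" or
    "not defined" into an ordinary 1 or 0 in an expression. All macro and
    config names here are made up, and the kernel version additionally
    checks the _MODULE variant of the option, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Config symbols are either defined to 1 or left undefined. */
#define CONFIG_SKETCH_TIME 1
/* CONFIG_SKETCH_VERBOSE is deliberately not defined. */

/* PLACEHOLDER_1 expands to "0," which shifts a 1 into the second slot;
 * an undefined symbol yields an unknown token and the 0 stays second. */
#define PLACEHOLDER_1 0,
#define take_second(ignored, val, ...) val
#define is_set_3(arg1_or_junk) take_second(arg1_or_junk 1, 0, 0)
#define is_set_2(val) is_set_3(PLACEHOLDER_##val)
#define IS_ENABLED_SKETCH(cfg) is_set_2(cfg)

/* No #ifdef around each initialiser: the value is a constant expression. */
static bool sketch_time = IS_ENABLED_SKETCH(CONFIG_SKETCH_TIME);
static bool sketch_verbose = IS_ENABLED_SKETCH(CONFIG_SKETCH_VERBOSE);
```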

    Signed-off-by: Alex Elder
    Acked-by: Borislav Petkov
    Reviewed-by: Petr Mladek
    Cc: Andi Kleen
    Cc: Jan Kara
    Cc: John Stultz
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • Fix a few comments that don't accurately describe their corresponding
    code. It also fixes some minor typographical errors.

    Signed-off-by: Alex Elder
    Reviewed-by: Petr Mladek
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc: Jan Kara
    Cc: John Stultz
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • Commit a8fe19ebfbfd ("kernel/printk: use symbolic defines for console
    loglevels") makes consistent use of symbolic values for printk() log
    levels.

    The naming scheme used is different from the one used for
    DEFAULT_MESSAGE_LOGLEVEL though. Change that symbol name to be
    MESSAGE_LOGLEVEL_DEFAULT for consistency. And because the value of that
    symbol comes from a similarly-named config option, rename
    CONFIG_DEFAULT_MESSAGE_LOGLEVEL as well.

    Signed-off-by: Alex Elder
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc: Jan Kara
    Cc: John Stultz
    Cc: Petr Mladek
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • In do_syslog() there's a path used by kmsg_poll() and kmsg_read() that
    only needs to know whether there's any data available to read (and not
    its size). These callers only check for non-zero return. As a
    shortcut, do_syslog() returns the difference between what has been
    logged and what has been "seen."

    The comments say that the "count of records" should be returned but it's
    not. Instead it returns (log_next_idx - syslog_idx), which is a
    difference between buffer offsets--and the result could be negative.

    The behavior is the same (it'll be zero or not in the same cases), but
    the count of records is more meaningful and it matches what the comments
    say. So change the code to return that.
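
    The difference between the two return values can be shown with a
    hypothetical snapshot in which the writer has wrapped around the ring
    while the reader has not. The struct groupings and function names are
    made up; only the field names echo the variables mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical grouping of the writer's position in the ring. */
struct log_state {
	uint64_t log_next_seq;   /* sequence number of the next record */
	uint32_t log_next_idx;   /* byte offset where it will be stored */
};

/* Hypothetical grouping of the syslog reader's position. */
struct reader_state {
	uint64_t syslog_seq;     /* next record the reader has not seen */
	uint32_t syslog_idx;     /* byte offset of that record */
};

/* Old shortcut: a difference of byte offsets. Only zero vs non-zero is
 * meaningful, and the value is negative once the writer has wrapped. */
static long unread_offset_diff(const struct log_state *l,
			       const struct reader_state *r)
{
	return (long)l->log_next_idx - (long)r->syslog_idx;
}

/* Count of unread records, matching what the comments describe. */
static uint64_t unread_records(const struct log_state *l,
			       const struct reader_state *r)
{
	return l->log_next_seq - r->syslog_seq;
}
```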

    Signed-off-by: Alex Elder
    Cc: Petr Mladek
    Cc: Jan Kara
    Cc: Joe Perches
    Cc: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • The default size of the ring buffer is too small for machines with a
    large number of CPUs under heavy load. What ends up happening when
    debugging is that the ring buffer wraps around and chews up old
    messages, making debugging impossible unless the size is passed as a
    kernel parameter.
    An idle system upon boot up will on average spew out only about one or
    two extra lines but where this really matters is on heavy load and that
    will vary widely depending on the system and environment.

    There are mechanisms to help increase the kernel ring buffer for tracing
    through debugfs, and those interfaces even allow growing the kernel ring
    buffer per CPU. We also have a static value which can be passed upon
    boot. Relying on debugfs, however, is not ideal for production, and
    the value passed upon bootup can only be used *after* an issue has
    crept up. Instead of being reactive, this adds a proactive
    measure which lets you scale the amount of contributions you'd expect to
    the kernel ring buffer under load by each CPU in the worst case
    scenario.

    We use num_possible_cpus() to avoid complexities which could be
    introduced by dynamically changing the ring buffer size at run time.
    num_possible_cpus() gives us the upper limit on the possible number of
    CPUs, so we avoid having to deal with hotplugging CPUs on and off.
    This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT,
    which is used to specify the maximum amount of contributions from each
    CPU to the kernel ring buffer in the worst case before the kernel ring
    buffer flips over; the size is specified as a power of 2. The combined
    contributions of all possible CPUs but one must be greater than half of
    the default kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order
    to trigger an increase upon bootup. The kernel ring buffer is increased
    to the next power of two that would fit the required minimum kernel ring
    buffer size plus the additional CPU contribution. For example, if
    LOG_BUF_SHIFT is 18 (256 KB), you'd require more than 128 KB of
    contributions by other CPUs in order to trigger an increase of the
    kernel ring buffer. With a LOG_CPU_MAX_BUF_SHIFT of 12 (4 KB), you'd
    need more than 33 possible CPUs to trigger an increase. If you had 128
    possible CPUs, the minimum required kernel ring buffer size bumps to:

    ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB

    Since we require the ring buffer to be a power of two, the new required
    size would be 1024 KB.

    These CPU contributions are ignored when the "log_buf_len" kernel
    parameter is used as it forces the exact size of the ring buffer to an
    expected power of two value.
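
    The sizing rule can be checked with a small user-space sketch. It
    assumes the rule exactly as stated (grow only when the extra exceeds
    half of 1 << LOG_BUF_SHIFT, then round the sum up to a power of two);
    ring_buffer_size() and roundup_pow2() are hypothetical names, and the
    log_buf_len= override is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= v (v must be at most 2^31). */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Boot-time size: extra = (possible_cpus - 1) << cpu_shift; the buffer
 * only grows if extra exceeds half the default size. */
static uint32_t ring_buffer_size(unsigned int log_buf_shift,
				 unsigned int cpu_shift,
				 unsigned int possible_cpus)
{
	uint32_t base = UINT32_C(1) << log_buf_shift;
	uint32_t extra;

	if (possible_cpus <= 1)
		return base;
	extra = (possible_cpus - 1) * (UINT32_C(1) << cpu_shift);
	if (extra <= base / 2)
		return base;
	return roundup_pow2(base + extra);
}
```

    With LOG_BUF_SHIFT=18 and a 12-bit per-CPU shift, 128 CPUs give
    764 KB rounded to 1024 KB, and the growth threshold falls between 33
    and 34 possible CPUs.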

    [pmladek@suse.cz: fix build]
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Petr Mladek
    Tested-by: Davidlohr Bueso
    Tested-by: Petr Mladek
    Reviewed-by: Davidlohr Bueso
    Cc: Andrew Lunn
    Cc: Stephen Warren
    Cc: Michal Hocko
    Cc: Petr Mladek
    Cc: Joe Perches
    Cc: Arun KS
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Chris Metcalf
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • Signed-off-by: Luis R. Rodriguez
    Suggested-by: Davidlohr Bueso
    Cc: Andrew Lunn
    Cc: Stephen Warren
    Cc: Greg Kroah-Hartman
    Cc: Michal Hocko
    Cc: Petr Mladek
    Cc: Joe Perches
    Cc: Arun KS
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Chris Metcalf
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • In practice, using a power of 2 for the size of the kernel ring buffer
    is purely historical and not a requirement, especially now that we have
    LOG_ALIGN and use it for both static and dynamic allocations. It could
    have helped with implicit alignment back in the days, given that even
    the dynamically sized ring buffer was guaranteed to be aligned so long
    as CONFIG_LOG_BUF_SHIFT was set to produce a __LOG_BUF_LEN which is
    architecture aligned: log_buf_len=n was allowed only if it was larger
    than __LOG_BUF_LEN, and we always ended up rounding log_buf_len=n up to
    the next power of 2 with roundup_pow_of_two(), so any such power of 2
    was also a multiple of __LOG_BUF_LEN and therefore architecture aligned.
    These assumptions of course relied heavily on CONFIG_LOG_BUF_SHIFT
    producing an aligned value, but users can always change this.

    We now have precise alignment requirements set for the log buffer size
    for both static and dynamic allocations, but let's keep the old
    practice of using powers of 2 for its size to help with easy expected
    scalable values and the allocators for dynamic allocations. We'll reuse
    this later, so move this into a helper.

    Signed-off-by: Luis R. Rodriguez
    Cc: Andrew Lunn
    Cc: Stephen Warren
    Cc: Greg Kroah-Hartman
    Cc: Michal Hocko
    Cc: Petr Mladek
    Cc: Joe Perches
    Cc: Arun KS
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Chris Metcalf
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • We have to consider alignment for the ring buffer both for the default
    static size, and then also for when a dynamic allocation is made when
    the log_buf_len=n kernel parameter is passed to set the size
    specifically to a size larger than the default size set by the
    architecture through CONFIG_LOG_BUF_SHIFT.

    The default static kernel ring buffer can be aligned properly if
    architectures set CONFIG_LOG_BUF_SHIFT properly. We provide ranges for
    the size, though, so even if CONFIG_LOG_BUF_SHIFT has a sensible aligned
    value, it can be reduced to a non-aligned value. Commit 6ebb017de9
    ("printk: Fix alignment of buf causing crash on ARM EABI") by Andrew
    Lunn ensures the static buffer is always aligned and the decision of
    alignment is done by the compiler by using __alignof__(struct log).

    When log_buf_len=n is used we allocate the ring buffer dynamically.
    Dynamic allocation varies: for the early allocation called before
    setup_arch(), memblock_virt_alloc() requests page alignment, and for the
    default kernel allocation, memblock_virt_alloc_nopanic() requests no
    special alignment, which in turn ends up aligning the allocation to
    SMP_CACHE_BYTES, which is L1 cache aligned.

    Since we already know the required alignment for the kernel ring
    buffer, though, we can do better and request explicit LOG_ALIGN alignment.
    This does that to be safe and make dynamic allocation alignment
    explicit.
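
    The same idea in user space: derive the alignment from the record type
    with _Alignof (the kernel uses __alignof__(struct log)) and ask the
    allocator for it explicitly. struct log_hdr, LOG_ALIGN_SKETCH and
    alloc_log_buf() are hypothetical, and C11 aligned_alloc() stands in for
    the memblock calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record header standing in for struct log. */
struct log_hdr {
	uint64_t ts_nsec;
	uint16_t len;
	uint16_t text_len;
	uint8_t level;
};

#define LOG_ALIGN_SKETCH _Alignof(struct log_hdr)

/* Request record alignment explicitly instead of relying on whatever the
 * allocator happens to return. aligned_alloc() requires the size to be a
 * multiple of the alignment, so round it up first. */
static char *alloc_log_buf(size_t len)
{
	size_t rounded = (len + LOG_ALIGN_SKETCH - 1) / LOG_ALIGN_SKETCH
			 * LOG_ALIGN_SKETCH;

	return aligned_alloc(LOG_ALIGN_SKETCH, rounded);
}
```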

    Signed-off-by: Luis R. Rodriguez
    Tested-by: Petr Mladek
    Acked-by: Petr Mladek
    Cc: Andrew Lunn
    Cc: Stephen Warren
    Cc: Greg Kroah-Hartman
    Cc: Michal Hocko
    Cc: Petr Mladek
    Cc: Joe Perches
    Cc: Arun KS
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Chris Metcalf
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     

04 Jul, 2014

1 commit

  • …_trylock_for_printk()"

    Revert commit 939f04bec1a4 ("printk: enable interrupts before calling
    console_trylock_for_printk()").

    Andreas reported:

    : None of the post 3.15 kernel boot for me. They all hang at the GRUB
    : screen telling me it loaded and started the kernel, but the kernel
    : itself stops before it prints anything (or even replaces the GRUB
    : background graphics).

    939f04bec1a4 is a modest latency reduction. Revert it until we understand
    the reason for these failures.

    Reported-by: Andreas Bombe <aeb@debian.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Andrew Morton
     

05 Jun, 2014

11 commits

  • ... instead of naked numbers.

    Stuff in sysrq.c used to set it to 8 which is supposed to mean above
    default level so set it to DEBUG instead as we're terminating/killing all
    tasks and we want to be verbose there.

    Also, correct the check in x86_64_start_kernel which should be >= as
    we're clearly issuing the string there for all debug levels, not only
    the magical 10.

    Signed-off-by: Borislav Petkov
    Acked-by: Kees Cook
    Acked-by: Randy Dunlap
    Cc: Joe Perches
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • If the log ring buffer becomes full, we silently overwrite old messages
    with new data. console_unlock will detect this case and fast-forward the
    console_* pointers to skip over the corrupted data, but nothing will be
    reported to the user.

    This patch hijacks the first valid log message after detecting that we
    dropped messages and prefixes it with a note detailing how many messages
    were dropped. For long (~1000 char) messages, this will result in some
    truncation of the real message, but given that we're dropping things
    anyway, that doesn't seem to be the end of the world.
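
    A user-space sketch of the idea: when the reader's sequence number has
    fallen behind the oldest record still in the ring, prefix the next
    message with the number of records lost and fast-forward. The function
    name and the exact wording of the notice are assumptions for
    illustration, not quoted from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format msg for the console; if records were overwritten before the
 * console saw them, prepend a notice and skip the reader past the gap.
 * Text that does not fit in out is truncated. */
static void format_for_console(char *out, size_t size, uint64_t *console_seq,
			       uint64_t log_first_seq, const char *msg)
{
	size_t used = 0;

	if (size == 0)
		return;
	if (*console_seq < log_first_seq) {
		int n = snprintf(out, size, "** %llu printk messages dropped ** ",
				 (unsigned long long)(log_first_seq - *console_seq));
		if (n > 0)
			used = (size_t)n < size ? (size_t)n : size - 1;
		*console_seq = log_first_seq;   /* fast-forward past the gap */
	}
	snprintf(out + used, size - used, "%s", msg);
}
```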

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra
    Cc: Kay Sievers
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • After learning we'll need some sort of deferred printk functionality in
    the timekeeping core, Peter suggested we rename the printk_sched function
    so it can be reused by needed subsystems.

    This only changes the function name. No logic changes.

    Signed-off-by: John Stultz
    Reviewed-by: Steven Rostedt
    Cc: Jan Kara
    Cc: Peter Zijlstra
    Cc: Jiri Bohac
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Stultz
     
  • An earlier change in -mm (printk: remove separate printk_sched
    buffers...) removed the printk_sched irqsave/restore lines since it was
    safe for current users. Since we may be expanding usage of
    printk_sched(), disable preemption for this function to make it more
    generally safe to call.

    Signed-off-by: John Stultz
    Reviewed-by: Jan Kara
    Cc: Peter Zijlstra
    Cc: Jiri Bohac
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Stultz
     
  • To prevent deadlocks with doing a printk inside the scheduler,
    printk_sched() was created. The issue is that printk has a console_sem
    that it can grab and release. The release does a wake up if there's a
    task pending on the sem, and this wake up grabs the rq lock that is
    held in the scheduler. This leads to a possible deadlock if the wake up
    uses the same rq as the one with the rq lock held already.

    printk_sched() saves the printk write in a per cpu buffer
    and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
    set, the printk() is done against the buffer.

    There are a couple of issues with this approach.

    1) If two printk_sched()s are called before the tick, the second one
    will overwrite the first one.

    2) The temporary buffer is 512 bytes and is per cpu. This is quite a
    bit of space wasted for something that is seldom used.

    In order to remove this, the printk_sched() can use the printk buffer
    instead, and delay the console_trylock()/console_unlock() to the queued
    work.

    Because printk_sched() would then be taking the logbuf_lock, the
    logbuf_lock must not be held while doing anything that may call into the
    scheduler functions, which includes wake ups. Unfortunately, printk()
    also has a console_sem that it uses, and on release, the up(&console_sem)
    may do a wake up of any pending waiters. This must be avoided while
    holding the logbuf_lock.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
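The deferral described above can be illustrated with a toy, single-threaded userspace model. It is a sketch under stated assumptions, not the kernel's code: the names deferred_printk(), tick_flush(), pending_output and the 256-byte linear log_buf are all hypothetical. The message is stored in the one shared log buffer immediately, and only the console output is postponed to a later "tick", so two calls made before the tick both survive (unlike the old per-cpu buffer, where the second overwrote the first).

```c
#include <assert.h>
#include <string.h>

/* Toy model with hypothetical names; not kernel code. */
#define LOG_BUF_LEN 256

static char log_buf[LOG_BUF_LEN];      /* stands in for the shared printk buffer */
static size_t log_len;                 /* bytes stored so far */
static size_t console_seq;             /* bytes already written to the "console" */
static int pending_output;             /* stands in for a pending-output flag */
static char console_out[LOG_BUF_LEN];  /* what the "console" has shown */
static size_t console_len;

/* Called where waking anyone is forbidden: store now, print later. */
static void deferred_printk(const char *msg)
{
    size_t n = strlen(msg);
    if (n > LOG_BUF_LEN - log_len)
        n = LOG_BUF_LEN - log_len;     /* toy simplification: truncate, no ring */
    memcpy(log_buf + log_len, msg, n);
    log_len += n;
    pending_output = 1;                /* console flush is deferred */
}

/* Called later from a safe context (the "tick" or queued work). */
static void tick_flush(void)
{
    if (!pending_output)
        return;
    pending_output = 0;
    assert(console_seq <= log_len);
    memcpy(console_out + console_len, log_buf + console_seq, log_len - console_seq);
    console_len += log_len - console_seq;
    console_seq = log_len;
}
```

Calling deferred_printk("a\n") and deferred_printk("b\n") and then tick_flush() leaves both lines on the console, because nothing is kept in a single per-cpu slot.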
     
  • We need interrupts disabled when calling console_trylock_for_printk()
    only so that the cpu id we pass to can_use_console() remains valid (for
    other things console_sem provides all the exclusion we need, and
    deadlocks on console_sem due to interrupts are impossible because we use
    down_trylock()). However, even if we are rescheduled, we are guaranteed
    to run on an online cpu, so we can simply get the cpu id inside
    can_use_console().

    We may lose a bit of performance when we enable interrupts in
    vprintk_emit() and then disable them again in console_unlock(), but OTOH
    it can somewhat reduce interrupt latency caused by console_unlock(),
    especially since later in the patch series we will want to spin on
    console_sem in console_trylock_for_printk().

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Printk calls mutex_acquire() / mutex_release() by hand to inform
    lockdep about console_sem. However, in some corner cases the
    instrumentation is missing. Fix the problem by creating helper functions
    for locking / unlocking console_sem which take care of the lockdep
    instrumentation as well.

    Signed-off-by: Jan Kara
    Reported-by: Fabio Estevam
    Reported-by: Andy Shevchenko
    Tested-by: Fabio Estevam
    Tested-By: Valdis Kletnieks
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
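The idea of routing every acquire and release through helpers can be sketched with a toy userspace model. All names here are hypothetical: an int stands in for console_sem, a counter stands in for lockdep's record of the lock, and the helpers are not the kernel's actual functions. The point is that the annotation happens inside the helper, only on a successful acquire, so no call site can forget it.

```c
#include <assert.h>

/* Toy model with hypothetical names; not kernel code. */
static int console_sem_count = 1;   /* 1 = free, 0 = taken */
static int lockdep_held;            /* what the "lockdep" annotation believes */

/* Returns 0 on success, 1 on contention (same convention as down_trylock()). */
static int sem_down_trylock(int *sem)
{
    if (*sem > 0) {
        (*sem)--;
        return 0;
    }
    return 1;
}

static void sem_up(int *sem)
{
    (*sem)++;
}

/* Helper: acquire and annotate together. */
static int down_trylock_console_sem_model(void)
{
    if (sem_down_trylock(&console_sem_count))
        return 1;                   /* failed: nothing to annotate */
    lockdep_held++;                 /* annotate only on success */
    return 0;
}

/* Helper: annotate the release, then actually release. */
static void up_console_sem_model(void)
{
    assert(lockdep_held > 0);
    lockdep_held--;
    sem_up(&console_sem_count);
}
```

A failed trylock leaves the annotation untouched, and every release drops it, so the "lockdep" view and the semaphore state cannot drift apart.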
     
  • There's no reason to hold logbuf_lock when entering
    console_trylock_for_printk().

    The first thing this function does is to call down_trylock(console_sem),
    and if that fails it immediately unlocks logbuf_lock. So logbuf_lock
    isn't needed for that branch. When down_trylock() succeeds, the rest of
    console_trylock() is OK without logbuf_lock (it is called without it
    from other places), and the only remaining thing in
    console_trylock_for_printk() is can_use_console() call. For that call
    console_sem is enough (it iterates all consoles and checks CON_ANYTIME
    flag).

    So we drop logbuf_lock before entering console_trylock_for_printk() which
    simplifies the code.

    [akpm@linux-foundation.org: fix have_callable_console() comment]
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • The comment about the interesting interlocking between logbuf_lock and
    console_sem is outdated.

    It was added in 2002 by commit a880f45a48be during the conversion of
    console_lock to console_sem + logbuf_lock.

    At that time release_console_sem() (today's equivalent is
    console_unlock()) was indeed using logbuf_lock to avoid races between
    the trylock on console_sem in printk() and the unlock of console_sem.
    However, these days the interlocking is gone and the races are avoided
    by rechecking the logbuf state after releasing console_sem.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
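The "recheck after release" pattern can be shown with a deterministic, single-threaded userspace model. It is a sketch with hypothetical names: sequence counters stand in for the log buffer state, an int stands in for console_sem, and one_racing_store() simulates another CPU that stores a record while the semaphore is still held, so its own trylock fails. Without the recheck, that record would sit unprinted until the next printk().

```c
#include <assert.h>

/* Toy model with hypothetical names; not kernel code. */
static unsigned long log_next_seq;   /* next record to be stored */
static unsigned long console_seq;    /* next record to print */
static unsigned long printed;        /* records written to the "console" */
static int console_sem_free = 1;     /* 1 = free */

/* Returns 1 on success, like console_trylock(). */
static int console_trylock_model(void)
{
    if (!console_sem_free)
        return 0;
    console_sem_free = 0;
    return 1;
}

/* Optional hook simulating another CPU acting just before the release. */
static void (*racing_store)(void);

static void one_racing_store(void)
{
    log_next_seq++;                      /* a new record arrives */
    racing_store = 0;                    /* race only once */
    int got = console_trylock_model();   /* fails: the sem is still held */
    assert(got == 0);
    (void)got;
}

static void console_unlock_model(void)
{
again:
    while (console_seq < log_next_seq) { /* flush everything visible */
        console_seq++;
        printed++;
    }
    if (racing_store)
        racing_store();
    console_sem_free = 1;                /* up(&console_sem) */
    /* Recheck: pick up records stored by someone whose trylock failed. */
    if (console_seq != log_next_seq && console_trylock_model())
        goto again;
}
```

With two records pending and one more arriving in the race window, console_unlock_model() prints all three and leaves the semaphore free.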
     
  • I wonder whether anyone uses the printk return value, but it is there
    and should be counted correctly.

    This patch modifies log_store() to return the number of bytes from the
    'text' part that were actually stored. It also handles the return value
    in vprintk_emit().

    Note that log_store() is also used in cont_flush(), but we can ignore
    the return value there. That function works with characters that were
    already counted earlier. In addition, the store can never fail there
    because the length of the printed text is limited by the "cont" buffer
    and "dict" is NULL.

    Signed-off-by: Petr Mladek
    Cc: Jan Kara
    Cc: Jiri Kosina
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
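The return-value change can be sketched with a toy userspace model. The names log_store_model() and vprintk_emit_model() are hypothetical, and the linear 64-byte buffer that simply stops when full is a simplification (the real log buffer is a ring). What it shows is the counting rule: the caller reports the number of text bytes that were actually stored, not the length it was asked to print.

```c
#include <assert.h>
#include <string.h>

/* Toy model with hypothetical names; not kernel code. */
#define LOG_BUF_LEN 64

static char log_buf[LOG_BUF_LEN];
static size_t log_used;

/* Store as much of text as fits; return the number of bytes stored. */
static size_t log_store_model(const char *text, size_t text_len)
{
    size_t room = LOG_BUF_LEN - log_used;
    size_t n = text_len < room ? text_len : room;
    memcpy(log_buf + log_used, text, n);
    log_used += n;
    assert(log_used <= LOG_BUF_LEN);
    return n;
}

/* The printk-style return value is what log_store_model() reports. */
static int vprintk_emit_model(const char *text)
{
    return (int)log_store_model(text, strlen(text));
}
```

With only 4 bytes of room left, printing "hello\n" returns 4 rather than 6.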
     
  • For messages that are too long, we might want to print at least part of
    them and add a warning for debugging purposes.

    The question is how long the shrunken message should be. If we use the
    whole buffer, it might get rotated too soon. Let's try to use only 1/4 of
    the buffer for now.

    Also drop the whole dictionary. We do not want to parse it or break it
    in the middle of a key/value pair. It would not cause any real harm,
    but still.

    Signed-off-by: Petr Mladek
    Cc: Jan Kara
    Cc: Jiri Kosina
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
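The truncation policy above (keep at most 1/4 of the buffer for the text, drop the dictionary entirely, append a warning) can be sketched as a small length calculation. This is a toy model under stated assumptions: truncate_msg_model(), the 64-byte LOG_BUF_LEN, MAX_TAKE_PART and the "<truncated>" marker text are hypothetical illustrations, not the kernel's definitions.

```c
#include <assert.h>
#include <string.h>

/* Toy model with hypothetical names; not kernel code. */
#define LOG_BUF_LEN   64   /* tiny buffer for illustration */
#define MAX_TAKE_PART 4    /* a message may use at most 1/4 of the buffer */

static const char trunc_marker[] = "<truncated>";   /* hypothetical warning */

/* Shrink an oversized message: cap the text, drop the whole dictionary
 * (never cut it mid-pair), and report how long the warning marker is. */
static void truncate_msg_model(size_t *text_len, size_t *dict_len,
                               size_t *marker_len)
{
    size_t max_text = LOG_BUF_LEN / MAX_TAKE_PART;
    if (*text_len > max_text)
        *text_len = max_text;
    *dict_len = 0;
    *marker_len = strlen(trunc_marker);
    assert(*text_len <= max_text);
}
```

Using only a quarter of the buffer keeps one oversized message from rotating out most of the older records.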