22 Feb, 2013

1 commit

  • Pull pstore patches from Tony Luck:
    "A few fixes to reduce places where pstore might hang a system in the
    crash path. Plus a new mountpoint (/sys/fs/pstore ... makes more
    sense than /dev/pstore)."

    Fix up trivial conflict in drivers/firmware/efivars.c

    * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
    pstore: Create a convenient mount point for pstore
    efi_pstore: Introducing workqueue updating sysfs
    efivars: Disable external interrupt while holding efivars->lock
    efi_pstore: Avoid deadlock in non-blocking paths
    pstore: Avoid deadlock in panic and emergency-restart path

    Linus Torvalds
     

13 Feb, 2013

1 commit

  • Using /dev/pstore as a mount point for the pstore filesystem is slightly
    awkward. We don't normally mount filesystems in /dev/ and the /dev/pstore
    file isn't created automatically by anything. While this method will
    still work, we can create a persistent mount point in sysfs. This will
    put pstore on par with things like cgroups and efivarfs.

    Signed-off-by: Josh Boyer
    Acked-by: Kees Cook
    Signed-off-by: Tony Luck

    Josh Boyer
     

04 Feb, 2013

1 commit


16 Jan, 2013

1 commit

  • The pstore RAM backend can get called during resume, and must be defensive
    against a suspended time source. Expose getnstimeofday logic that returns
    an error instead of a WARN. This can be detected and the timestamp can
    be zeroed out.
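
    The error-instead-of-WARN idea can be sketched in user space as follows.
    This is a minimal model, not the kernel code: the flag
    timekeeping_suspended_flag and the helpers try_get_time() and
    record_timestamp() are hypothetical names invented for this example.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of a time source that may be suspended. */
static bool timekeeping_suspended_flag;

/* Returns 0 and fills *ts, or -1 (without warning) if the source is suspended. */
static int try_get_time(struct timespec *ts)
{
    if (timekeeping_suspended_flag)
        return -1;
    ts->tv_sec = time(NULL);
    ts->tv_nsec = 0;
    return 0;
}

/* Backend-side use: fall back to a zeroed timestamp when the read fails. */
static struct timespec record_timestamp(void)
{
    struct timespec ts;

    if (try_get_time(&ts) != 0) {
        ts.tv_sec = 0;
        ts.tv_nsec = 0;
    }
    return ts;
}
```

    A caller that stamps records during resume would call record_timestamp()
    and treat an all-zero result as "time unknown".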

    Reported-by: Doug Anderson
    Cc: John Stultz
    Cc: Anton Vorontsov
    Signed-off-by: Kees Cook
    Signed-off-by: John Stultz

    Kees Cook
     

12 Jan, 2013

1 commit

  • [Issue]

    In the panic and emergency-restart paths, pstore may block
    because it simply takes a spin_lock.

    Here is an example scenario in which pstore may hang in a panic path:

    - cpuA grabs psinfo->buf_lock
    - cpuB panics and calls smp_send_stop
    - smp_send_stop sends IRQ to cpuA
    - after 1 second, cpuB gives up on cpuA and sends an NMI instead
    - cpuA is now in an NMI handler while still holding buf_lock
    - cpuB is deadlocked

    This can happen if the firmware has a bug and
    cpuA is stuck talking to it for more than one second.

    A similar scenario exists in an emergency-restart path:

    - cpuA grabs psinfo->buf_lock and gets stuck in the firmware
    - cpuB kicks emergency-restart via either sysrq-b or the hangcheck timer
    - cpuB then deadlocks when it tries to take psinfo->buf_lock

    [Solution]

    This patch avoids the deadlocks in both the panic and emergency_restart
    paths by introducing a function, is_non_blocking_path(), that checks whether
    a cpu can be blocked in the current path.

    With this patch, pstore no longer blocks in those paths even if another cpu
    holds the spin_lock, because it uses spin_trylock_irqsave there instead of
    spin_lock_irqsave.

    In addition, according to the comment on emergency_restart() in kernel/sys.c,
    spin_lock shouldn't be taken in an emergency_restart path, to avoid
    deadlock. This patch is consistent with the comment below.

    /**
    * emergency_restart - reboot the system
    *
    * Without shutting down any hardware or taking any locks
    * reboot the system. This is called when we know we are in
    * trouble so this is our best effort to reboot. This is
    * safe to call in interrupt context.
    */
    void emergency_restart(void)
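
    The trylock approach described in this message can be illustrated with a
    small user-space sketch. It is not the kernel implementation: the C11
    atomic_flag stands in for psinfo->buf_lock, the non_blocking argument
    stands in for the result of is_non_blocking_path(), and all helper names
    are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for psinfo->buf_lock: a minimal test-and-set lock. */
static atomic_flag buf_lock = ATOMIC_FLAG_INIT;

/* Try once; returns true if the lock was acquired. */
static bool buf_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&buf_lock, memory_order_acquire);
}

/* Spin until acquired; only acceptable where blocking is allowed. */
static void buf_lock_blocking(void)
{
    while (!buf_trylock())
        ;
}

static void buf_unlock(void)
{
    atomic_flag_clear_explicit(&buf_lock, memory_order_release);
}

/*
 * Sketch of a dump routine. Returns 0 if the record was written,
 * -1 if it was skipped because the lock was busy in a non-blocking path.
 */
static int dump_record(bool non_blocking, int *records_written)
{
    if (non_blocking) {
        if (!buf_trylock())
            return -1;          /* give up rather than deadlock */
    } else {
        buf_lock_blocking();
    }
    (*records_written)++;       /* placeholder for the real write */
    buf_unlock();
    return 0;
}
```

    The trade-off is visible in the return value: in a non-blocking path a
    record may be lost, which is preferable to hanging the crash path.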

    Signed-off-by: Seiji Aguchi
    Acked-by: Don Zickus
    Signed-off-by: Tony Luck

    Seiji Aguchi
     

04 Jan, 2013

1 commit

  • CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
    markings need to be removed.

    This change removes the use of __devinit from the pstore filesystem.

    Based on patches originally written by Bill Pemberton, but redone by me
    in order to handle some of the coding style issues better, by hand.

    Cc: Bill Pemberton
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Kees Cook
    Cc: Tony Luck
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

18 Dec, 2012

1 commit


16 Dec, 2012

1 commit

  • Pull pstore update from Anton Vorontsov:
    "Here are just a few fixups for the pstore subsystem, nothing special
    this time"

    * tag 'for-v3.8' of git://git.infradead.org/users/cbou/linux-pstore:
    pstore/ftrace: Adjust for ftrace_ops->func prototype change
    pstore/ram: Fix bounds checks for mem_size, record_size, console_size and ftrace_size
    pstore/ram: Fix undefined usage of rounddown_pow_of_two(0)
    pstore/ram: Fixup section annotations

    Linus Torvalds
     

13 Dec, 2012

2 commits


27 Nov, 2012

2 commits

  • [Issue]

    Currently, a variable name, which identifies each entry, consists of type, id and ctime.
    But if multiple events happen in a short time, a second or third event may fail to be logged because
    efi_pstore can't distinguish the events by the current variable name.

    [Solution]

    A reasonable way to identify all events precisely is to introduce a sequence counter into
    the variable name.

    The sequence counter is already supported in the pstore layer as "oopscount",
    so this patch adds it to the variable name.
    It is also passed to the read/erase callbacks of platform drivers, in accordance with
    the modification of the variable name.


    a variable name of first event: dump-type0-1-12345678
    a variable name of second event: dump-type0-1-12345678

    type:0
    id:1
    ctime:12345678

    If multiple events happen in a short time, efi_pstore can't distinguish them because
    their variable names are the same.

    They become distinguishable once a sequence counter is added, as follows.

    a variable name of first event: dump-type0-1-1-12345678
    a variable name of second event: dump-type0-1-2-12345678

    type:0
    id:1
    sequence counter: 1(first event), 2(second event)
    ctime:12345678

    For the write callback executed in pstore_console_write(), "0" is passed as
    the sequence counter argument, because that path just logs all kernel messages and
    doesn't need to distinguish multiple events.
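
    The naming scheme can be sketched with plain snprintf/sscanf. The field
    order follows the dump-type0-1-1-12345678 example above; the function
    names make_var_name() and parse_var_name() and the argument types are
    assumptions for this illustration, not the efi_pstore code itself.

```c
#include <stdio.h>

/* Build a name of the form dump-type<type>-<id>-<seq>-<ctime>. */
static int make_var_name(char *buf, size_t len, unsigned int type,
                         unsigned int id, int seq, unsigned long ctime)
{
    return snprintf(buf, len, "dump-type%u-%u-%d-%lu", type, id, seq, ctime);
}

/* Parse a name back into its fields; returns 0 on success, -1 otherwise. */
static int parse_var_name(const char *name, unsigned int *type,
                          unsigned int *id, int *seq, unsigned long *ctime)
{
    if (sscanf(name, "dump-type%u-%u-%d-%lu", type, id, seq, ctime) != 4)
        return -1;
    return 0;
}
```

    Two events with the same type, id and ctime now produce different names
    as long as their sequence counters differ, and a reader can recover the
    counter to hand it to an erase callback.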

    Signed-off-by: Seiji Aguchi
    Acked-by: Rafael J. Wysocki
    Acked-by: Mike Waychison
    Signed-off-by: Tony Luck

    Seiji Aguchi
     
  • [Issue]

    Currently, a variable name, which is used to identify each log entry, consists of type,
    id and ctime. But an erase callback does not use ctime.

    If efi_pstore supported just one log, type and id would be enough.
    However, with multiple logs this doesn't work, because
    efi_pstore can't distinguish the entries without ctime at erase time.

    As you can see below, efi_pstore can't differentiate the first event from the second one without ctime.

    a variable name of first event: dump-type0-1-12345678
    a variable name of second event: dump-type0-1-23456789

    type:0
    id:1
    ctime:12345678, 23456789

    [Solution]

    This patch adds ctime as an argument of the erase callback.

    It works across reboots because the ctime of a pstore record is the date on which the record was originally stored.
    To do this, efi_pstore saves the ctime in the variable name at write time and passes it to pstore
    at read time.

    Signed-off-by: Seiji Aguchi
    Acked-by: Mike Waychison
    Signed-off-by: Tony Luck

    Seiji Aguchi
     

18 Nov, 2012

1 commit

  • record_size / console_size / ftrace_size can be 0 (this is how you disable
    the feature), but rounddown_pow_of_two(0) is undefined. As suggested by
    Kees Cook, use !is_power_of_2() as a condition to call
    rounddown_pow_of_two and avoid its undefined behavior on the value 0. This
    issue has been present since commit 1894a253 (ramoops: Move to
    fs/pstore/ram.c).
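
    A user-space sketch of the guarded rounding is shown below. The helper
    names are invented for the example. The explicit zero check is an
    assumption of this sketch: it is written out so that the zero case is
    skipped regardless of how a given power-of-two test treats 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* True only for non-zero values with exactly one bit set. */
static bool is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Largest power of two <= n. The caller must guarantee n != 0. */
static size_t rounddown_pow2(size_t n)
{
    while (n & (n - 1))
        n &= n - 1;     /* clear the lowest set bit */
    return n;
}

/* 0 means "feature disabled" and is passed through untouched. */
static size_t sanitize_size(size_t size)
{
    if (size != 0 && !is_pow2(size))
        size = rounddown_pow2(size);
    return size;
}
```

    With this shape, a size of 0 never reaches the rounding helper, and
    values that are already powers of two are left as they are.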

    Cc: stable@vger.kernel.org
    Signed-off-by: Maxime Bizon
    Signed-off-by: Florian Fainelli
    Acked-by: Kees Cook
    Signed-off-by: Anton Vorontsov

    Maxime Bizon
     

17 Nov, 2012

1 commit


15 Nov, 2012

1 commit

  • Passing a NULL id causes a NULL pointer dereference in writers such as
    erst_writer and efi_pstore_write because they expect to update this id.
    Pass a dummy id instead.

    This avoids a cascade of oopses: the initial pstore_console_write passes
    a NULL and oopses, the oops is written to the console, and those console
    writes cause further oopses in subsequent pstore_console_write calls.
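
    The dummy-id pattern looks roughly like the sketch below. backend_write()
    is a hypothetical writer that, like the writers named above, stores the
    new record id through the pointer it is given; console_write() is an
    equally hypothetical caller that has no use for the id.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend writer: always updates *id with the new record id. */
static int backend_write(const char *buf, size_t len, uint64_t *id)
{
    static uint64_t next_id = 1;

    (void)buf;
    (void)len;
    *id = next_id++;        /* would fault if id were NULL */
    return 0;
}

/* A caller that does not need the id still passes valid storage for it. */
static int console_write(const char *buf, size_t len)
{
    uint64_t id;            /* dummy id, discarded after the call */

    return backend_write(buf, len, &id);
}
```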

    Signed-off-by: Colin Ian King
    Acked-by: Kees Cook
    Cc: stable@vger.kernel.org
    Signed-off-by: Anton Vorontsov

    Colin Ian King
     

21 Sep, 2012

1 commit


07 Sep, 2012

1 commit

  • With this patch we no longer reuse the function tracer infrastructure;
    instead we register our own tracer back-end via a debugfs knob.

    It's a bit more code, but that is the only downside. On the bright side we
    have:

    - Ability to make persistent_ram module removable (when needed, we can
    move ftrace_ops struct into a module). Note that persistent_ram is still
    not removable for other reasons, but with this patch it's just one
    thing less to worry about;

    - The pstore part is more isolated from the generic function tracer. We tried
    it already by registering our own tracer in available_tracers, but that
    way we lose the ability to see the traces while we record them to
    pstore. This solution is somewhere in the middle: we only register
    an "internal ftracer" back-end, but not the "front-end";

    - When there is only pstore tracing enabled, the kernel will only write
    to the pstore buffer, omitting function tracer buffer (which, of course,
    still can be enabled via 'echo function > current_tracer').

    Suggested-by: Steven Rostedt
    Signed-off-by: Anton Vorontsov

    Anton Vorontsov
     

01 Sep, 2012

1 commit


05 Aug, 2012

3 commits

  • write_buf() should be marked as notrace, otherwise it is prone to
    recursion.

    So far the issue has never been triggered in real life, because we run
    inside the function tracer, where ftrace does its own recursion protection.

    But it's still no good, and soon we might switch to our own tracer ops,
    at which point the issue would be fatal. So, let's fix it.

    Signed-off-by: Anton Vorontsov

    Anton Vorontsov
     
  • Fix printk format warning (on i386) in pstore:

    fs/pstore/ram.c:409:3: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'size_t'

    Signed-off-by: Randy Dunlap
    Acked-by: Kees Cook
    Signed-off-by: Anton Vorontsov

    Randy Dunlap
     
  • We can dereference 'cxt->cprz' if console and dump logging are disabled
    (which is unlikely, but still possible to do). This patch fixes the issue
    by changing the code so that we don't dereference przs at all; we can
    just calculate bufsize from the console_size and record_size values.

    Plus, while at it, the patch improves the buffer size calculation.

    After Kay's printk rework, we know the optimal buffer size for console
    logging -- it is LOG_LINE_MAX (defined privately in printk.c). Previously,
    if only console logging was enabled, we would allocate an unnecessarily large
    buffer in pstore, while we only need LOG_LINE_MAX. (Pstore console logging
    is still capable of handling buffers > LOG_LINE_MAX, it will just do
    multiple calls to psinfo->write).

    Note that I don't export the constant, since we will do an even better
    thing soon: we will switch console logging to a new write_buf API, which
    will eliminate the need for the additional buffer; and so we won't need
    the constant.

    Reported-by: Dan Carpenter
    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook

    Anton Vorontsov
     

18 Jul, 2012

9 commits

  • Decoding the binary trace w/ a different kernel might be troublesome
    since we convert addresses to symbols. For kernels with minimal changes,
    the mappings would probably match, but it's not guaranteed at all.
    (But still we could convert the addresses by hand, since we do print
    raw addresses.)

    If we use modules, the symbols could be loaded at different addresses
    from the previously booted kernel, and so this would also fail, but
    there's nothing we can do about it.

    Also, the binary data format that pstore/ram is using in its ringbuffer
    may change between the kernels, so here we too must ensure that we're
    running the same kernel.

    So, there are two questions really:

    1. How to compute the unique kernel tag;
    2. Where to store it.

    In this patch we're using LINUX_VERSION_CODE, just as hibernation
    (suspend-to-disk) does. This way we are protecting from the kernel
    version mismatch, making sure that we're running the same kernel
    version and patch level. We could use CRC of a symbol table (as
    suggested by Tony Luck), but for now let's not be that strict.

    And as for storing, we are using a small trick here. Instead of
    allocating a dedicated buffer for the tag (i.e. another prz), or
    hacking ram_core routines to "reserve" some control data in the
    buffer, we are just encoding the tag into the buffer signature
    (and XOR'ing it with the actual signature value, so that buffers
    not needing a tag can just pass zero, which will result in the
    plain old PRZ signature).
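
    The XOR trick can be sketched as follows. PRZ_SIG is a placeholder value
    chosen for this example, and version_code(), tagged_sig() and
    sig_matches() are hypothetical helpers. version_code() mirrors the usual
    (a << 16) + (b << 8) + c encoding of a kernel version.

```c
#include <stdint.h>

/* Placeholder base signature for this sketch. */
#define PRZ_SIG 0x43474244u

/* Encode a version triple the way a version code is usually built. */
static uint32_t version_code(uint32_t a, uint32_t b, uint32_t c)
{
    return (a << 16) + (b << 8) + c;
}

/* A tag of 0 leaves the plain signature unchanged. */
static uint32_t tagged_sig(uint32_t tag)
{
    return PRZ_SIG ^ tag;
}

/* A buffer is accepted only if it was written with the same tag. */
static int sig_matches(uint32_t stored_sig, uint32_t tag)
{
    return stored_sig == tagged_sig(tag);
}
```

    A buffer written by one kernel version therefore fails the signature
    check under a different version and is treated like any other buffer
    with an unrecognized signature.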

    Suggested-by: Steven Rostedt
    Suggested-by: Tony Luck
    Suggested-by: Colin Cross
    Signed-off-by: Anton Vorontsov
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • Headers should really include all the needed prototypes, types, defines
    etc. to be self-contained. This is a long-standing issue, but apparently
    the new tracing code unearthed it (SMP=n is also a prerequisite):

    In file included from fs/pstore/internal.h:4:0,
    from fs/pstore/ftrace.c:21:
    include/linux/pstore.h:43:15: error: field ‘read_mutex’ has incomplete type

    While at it, I also added the following:

    linux/types.h -> size_t, phys_addr_t, uXX and friends
    linux/spinlock.h -> spinlock_t
    linux/errno.h -> Exxxx
    linux/time.h -> struct timespec (struct passed by value)
    struct module and rs_control forward declaration (passed via pointers).

    Signed-off-by: Anton Vorontsov
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • The ftrace log size is configurable via ramoops.ftrace_size
    module option, and the log itself is available via
    /ftrace-ramoops file.

    Signed-off-by: Anton Vorontsov
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • Don't use pstore.buf directly, instead convert the code to write_buf callback
    which passes a pointer to a buffer as an argument.

    Signed-off-by: Anton Vorontsov
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • With this support, the kernel can save a function call chain log into a
    persistent ram buffer that can be decoded and dumped after reboot
    through the pstore filesystem. It can be used to determine what function
    was last called before a reset or panic.

    We store the log in a binary format and then decode it at read time.

    p.s.
    Mostly the code comes from trace_persistent.c driver found in the
    Android git tree, written by Colin Cross
    (according to sign-off history). I reworked the driver a little bit,
    and ported it to pstore.

    Signed-off-by: Anton Vorontsov
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • For function tracing we need to stop using pstore.buf directly, since
    in a tracing callback we can't use spinlocks, and thus we can't safely
    use the global buffer.

    With write_buf callback, backends no longer need to access pstore.buf
    directly, and thus we can pass any buffers (e.g. allocated on stack).

    Signed-off-by: Anton Vorontsov
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • Nowadays we can use prz->ecc_size as a flag, no need for the special
    member in the prz struct.

    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • This is now pretty straightforward: instead of using bool, just pass
    an integer. For backwards compatibility ramoops.ecc=1 means 16 bytes
    ECC (using 1 byte for ECC isn't of much use anyway).
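
    The compatibility rule amounts to a one-line mapping, sketched here with
    a hypothetical helper name. Treating 0 as "ECC disabled" is an assumption
    carried over from the earlier boolean meaning of the option.

```c
/* Map the ramoops.ecc module parameter to an ECC size in bytes. */
static int ecc_param_to_size(int ecc)
{
    if (ecc == 1)
        return 16;  /* legacy boolean "on" keeps its old 16-byte meaning */
    return ecc;     /* 0 is assumed to disable ECC; other values are byte counts */
}
```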

    Suggested-by: Arve Hjønnevåg
    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • The struct members were never used anywhere outside of
    persistent_ram_init_ecc(), so there's actually no need for them
    to be in the struct.

    If we ever want to make polynomial or symbol size configurable,
    it would make more sense to just pass initialized rs_decoder
    to the persistent_ram init functions.

    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     

26 Jun, 2012

1 commit


21 Jun, 2012

5 commits

  • - Instead of exploiting unsigned overflows (which doesn't work for all
    sizes), use straightforward checking for ECC total size not exceeding
    initial buffer size;

    - Printing overflowed buffer_size is not informative. Instead, print
    ecc_size and buffer_size;

    - No need for buffer_size argument in persistent_ram_init_ecc(),
    we can address prz->buffer_size directly.
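
    The difference between relying on wraparound and checking up front can be
    sketched as below. reserve_ecc() is a hypothetical helper; how ecc_total
    is derived from the block layout is left out. Rejecting the case where
    the ECC data would consume the entire buffer is a choice made for this
    sketch.

```c
#include <stddef.h>

/*
 * Reserve ecc_total bytes out of *buffer_size. Returns 0 on success,
 * -1 if the ECC data would leave no room for the log itself.
 */
static int reserve_ecc(size_t *buffer_size, size_t ecc_total)
{
    if (ecc_total >= *buffer_size)
        return -1;              /* compare first; never let the subtraction wrap */
    *buffer_size -= ecc_total;
    return 0;
}
```

    On failure the buffer size is left untouched, so an error message can
    print the original buffer size next to the ECC size.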

    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • We will implement variable-sized ECC buffers soon, so the post_init routine
    will be much more likely to fail, and we'd better check for its errors.

    To make error handling simple, modify persistent_ram_free() so that it is
    safe to call at all times.

    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • persistent_ram_new() returns ERR_PTR() value on errors, so during
    freeing of the przs we should check for both NULL and IS_ERR() entries,
    otherwise bad things will happen.
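
    The need to check for both NULL and error-encoded pointers can be shown
    with user-space imitations of the error-pointer helpers. err_ptr(),
    is_err(), is_err_or_null() and free_zones() are written for this example
    only and are not the kernel definitions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

static int is_err_or_null(const void *p)
{
    return p == NULL || is_err(p);
}

/* Free every valid entry, skipping NULL and error slots. Returns the count freed. */
static int free_zones(void **zones, int n)
{
    int freed = 0;

    for (int i = 0; i < n; i++) {
        if (is_err_or_null(zones[i]))
            continue;
        free(zones[i]);
        zones[i] = NULL;
        freed++;
    }
    return freed;
}
```

    Checking only for NULL would pass the error-encoded slot to free(), which
    is the kind of "bad thing" the message refers to.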

    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • Registering the platform driver before module_init allows us to log oopses
    that happen during device probing.

    This requires changing module_init to postcore_initcall, and switching
    from platform_driver_probe to platform_driver_register because the
    platform device is not registered when the platform driver is registered.
    And because we use driver_register, we now can't use create_bundle() (since
    it would try to register the same driver once again), so we have to switch
    to platform_device_register_data().

    Also, some __init -> __devinit changes were needed.

    Overall, the registration logic is now much clearer, since we have only
    one driver registration point, and just an optional dummy device, which
    is created from the module parameters.

    Suggested-by: Colin Cross
    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • Pull staging tree fixes from Greg Kroah-Hartman:
    "Here are a number of small fixes for the drivers/staging tree, as well
    as iio and pstore drivers (which came from the staging tree in the
    3.5-rc1 merge). All of these are tiny, but resolve issues that people
    have been reporting.

    There's also a documentation update to reflect what the iio drivers
    really are doing, which is good to get straightened out.

    Signed-off-by: Greg Kroah-Hartman"

    * tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    staging: r8712u: Add new USB IDs
    staging: gdm72xx: Release netlink socket properly
    iio: drop wrong reference from Kconfig
    pstore/inode: Make pstore_fill_super() static
    pstore/ram: Should zap persistent zone on unlink
    pstore/ram_core: Factor persistent_ram_zap() out of post_init()
    pstore/ram_core: Do not reset restored zone's position and size
    pstore/ram: Should update old dmesg buffer before reading
    staging:iio:ad7298: Fix linker error due to missing IIO kfifo buffer
    Revert "staging: usbip: bugfix for stack corruption on 64-bit architectures"
    staging: usbip: bugfix for stack corruption on 64-bit architectures
    staging/comedi: fix build for USB not enabled
    staging: omapdrm: fix crash when freeing bad fb
    staging:iio:ad7606: Re-add missing scale attribute
    iio: Fix potential use after free
    staging:iio: remove num_interrupt_lines from documentation
    iio: documentation: Add out_altvoltage and friends

    Linus Torvalds
     

16 Jun, 2012

1 commit

  • Provide an iterator to receive the log buffer content, and convert all
    kmsg_dump() users to it.

    The structured data in the kmsg buffer now contains binary data, which
    should no longer be copied verbatim to the kmsg_dump() users.

    The iterator should provide reliable access to the buffer data, and also
    supports proper log line-aware chunking of data while iterating.

    Signed-off-by: Kay Sievers
    Tested-by: Tony Luck
    Reported-by: Anton Vorontsov
    Tested-by: Anton Vorontsov
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

14 Jun, 2012

3 commits

  • Having automatic updates seems pointless for a production system, and
    even dangerous and thus counter-productive:

    1. If we can mount pstore, or read files, we can as well read
    /proc/kmsg. So, there's little point in duplicating the
    functionality and presenting the same information via another
    userland ABI;

    2. Expecting the kernel to behave sanely after oops/panic is naive.
    It might work, but you'd rather not try it. A screwed up kernel
    can do rather bad things, like recursive faults[1]; and pstore
    rather provokes bad things to happen. It uses:

    1. Timers (assumes sane interrupts state);
    2. Workqueues and mutexes (assumes scheduler in a sane state);
    3. kzalloc (a working slab allocator);

    That's too much for a dead kernel, so the debugging facility
    itself might just make debugging harder, which is not what
    we want.

    Maybe for non-oops message types it would make sense to re-enable
    automatic updates, but so far I don't see any use case for this.
    Even for tracing, it has its own run-time/normal ABI, so we're
    only interested in pstore upon next boot, to retrieve what has
    gone wrong with HW or SW.

    So, let's disable the updates by default.

    [1]
    BUG: unable to handle kernel paging request at fffffffffffffff8
    IP: [] kthread_data+0xb/0x20
    [...]
    Process kworker/0:1 (pid: 14, threadinfo ffff8800072c0000, task ffff88000725b100)
    [...]
    Call Trace:
    [] wq_worker_sleeping+0x10/0xa0
    [] __schedule+0x568/0x7d0
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? call_rcu_sched+0x12/0x20
    [] ? release_task+0x156/0x2d0
    [] ? release_task+0x1e/0x2d0
    [] ? trace_hardirqs_on+0xd/0x10
    [] schedule+0x24/0x70
    [] do_exit+0x1f8/0x370
    [] oops_end+0x77/0xb0
    [] no_context+0x1a6/0x1b5
    [] __bad_area_nosemaphore+0x1ce/0x1ed
    [] ? ttwu_queue+0xc6/0xe0
    [] bad_area_nosemaphore+0xe/0x10
    [] do_page_fault+0x2c7/0x450
    [] ? __lock_release+0x6b/0xe0
    [] ? mark_held_locks+0x61/0x140
    [] ? __wake_up+0x4e/0x70
    [] ? trace_hardirqs_off_thunk+0x3a/0x3c
    [] ? pstore_register+0x120/0x120
    [] page_fault+0x1f/0x30
    [] ? pstore_register+0x120/0x120
    [] ? memcpy+0x68/0x110
    [] ? pstore_get_records+0x3a/0x130
    [] ? persistent_ram_copy_old+0x64/0x90
    [] ramoops_pstore_read+0x84/0x130
    [] pstore_get_records+0x79/0x130
    [] ? process_one_work+0x116/0x450
    [] ? pstore_register+0x120/0x120
    [] pstore_dowork+0xe/0x10
    [] process_one_work+0x174/0x450
    [] ? process_one_work+0x116/0x450
    [] worker_thread+0x123/0x2d0
    [] ? manage_workers.isra.28+0x120/0x120
    [] kthread+0x8e/0xa0
    [] kernel_thread_helper+0x4/0x10
    [] ? retint_restore_args+0xe/0xe
    [] ? __init_kthread_worker+0x70/0x70
    [] ? gs_change+0xb/0xb
    Code: be e2 00 00 00 48 c7 c7 d1 2a 4e 81 e8 bf fb fd ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 48 8b 87 08 02 00 00 55 48 89 e5 8b 40 f8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
    RIP [] kthread_data+0xb/0x20
    RSP
    CR2: fffffffffffffff8
    ---[ end trace 996a332dc399111d ]---
    Fixing recursive fault but reboot is needed!

    Signed-off-by: Anton Vorontsov
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • There is no behavioural change, the default value is still 60 seconds.

    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov
     
  • The code tried to maintain the global list of persistent ram zones,
    which isn't a great idea overall, plus since Android's ram_console
    is no longer there, we can remove some unused functions.

    Signed-off-by: Anton Vorontsov
    Acked-by: Kees Cook
    Acked-by: Colin Cross
    Signed-off-by: Greg Kroah-Hartman

    Anton Vorontsov