01 Apr, 2013

1 commit

  • This reverts commit 6aa9707099c4b25700940eb3d016f16c4434360d.

    Commit 6aa9707099c4 ("lockdep: check that no locks held at freeze time")
    causes problems with NFS root filesystems. The failures were noticed on
    OMAP2 and 3 boards during kernel init:

    [ BUG: swapper/0/1 still has locks held! ]
    3.9.0-rc3-00344-ga937536 #1 Not tainted
    -------------------------------------
    1 lock held by swapper/0/1:
    #0: (&type->s_umount_key#13/1){+.+.+.}, at: [] sget+0x248/0x574

    stack backtrace:
    rpc_wait_bit_killable
    __wait_on_bit
    out_of_line_wait_on_bit
    __rpc_execute
    rpc_run_task
    rpc_call_sync
    nfs_proc_get_root
    nfs_get_root
    nfs_fs_mount_common
    nfs_try_mount
    nfs_fs_mount
    mount_fs
    vfs_kern_mount
    do_mount
    sys_mount
    do_mount_root
    mount_root
    prepare_namespace
    kernel_init_freeable
    kernel_init

    Although the rootfs mounts, the system is unstable. Here's a transcript
    from a PM test:

    http://www.pwsan.com/omap/testlogs/test_v3.9-rc3/20130317194234/pm/37xxevm/37xxevm_log.txt

    Here's what the test log should look like:

    http://www.pwsan.com/omap/testlogs/test_v3.8/20130218214403/pm/37xxevm/37xxevm_log.txt

    Mailing list discussion is here:

    http://lkml.org/lkml/2013/3/4/221

    Deal with this for v3.9 by reverting the problem commit, until folks can
    figure out the right long-term course of action.

    Signed-off-by: Paul Walmsley
    Cc: Mandeep Singh Baines
    Cc: Jeff Layton
    Cc: Shawn Guo
    Cc:
    Cc: Fengguang Wu
    Cc: Trond Myklebust
    Cc: Ingo Molnar
    Cc: Ben Chan
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Rafael J. Wysocki
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Walmsley
     

29 Mar, 2013

1 commit

  • Pull userns fixes from Eric W Biederman:
    "The bulk of the changes are fixing the worst consequences of the user
    namespace design oversight in not considering what happens when one
    namespace starts off as a clone of another namespace, as happens with
    the mount namespace.

    The rest of the changes are just plain bug fixes.

    Many thanks to Andy Lutomirski for pointing out many of these issues."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Restrict when proc and sysfs can be mounted
    ipc: Restrict mounting the mqueue filesystem
    vfs: Carefully propogate mounts across user namespaces
    vfs: Add a mount flag to lock read only bind mounts
    userns: Don't allow creation if the user is chrooted
    yama: Better permission check for ptraceme
    pid: Handle the exit of a multi-threaded init.
    scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.

    Linus Torvalds
     

27 Mar, 2013

2 commits

  • Only allow unprivileged mounts of proc and sysfs if they are already
    mounted when the user namespace is created.

    proc and sysfs are interesting because they have content that is
    per namespace, and so fresh mounts are needed when new namespaces
    are created while at the same time proc and sysfs have content that
    is shared between every instance.

    Respect the policy of who may see the shared content of proc and sysfs
    by only allowing new mounts if there was an existing mount at the time
    the user namespace was created.

    In practice there are only two interesting cases: proc and sysfs are
    mounted at their usual places, or proc and sysfs are not mounted at all
    (some form of mount namespace jail).

    Cc: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
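
    The policy in the commit above can be modeled in user space: record, at
    user namespace creation, whether proc and sysfs were already mounted, and
    consult that record when a fresh unprivileged mount is requested. This is
    a minimal sketch; the struct and function names are hypothetical and are
    not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace: remembers what was mounted
 * in the creator's mount namespace at creation time. */
struct userns_sketch {
    bool may_mount_proc;
    bool may_mount_sysfs;
};

/* Snapshot the policy when the namespace is created. */
struct userns_sketch userns_create(bool proc_mounted, bool sysfs_mounted)
{
    struct userns_sketch ns = { proc_mounted, sysfs_mounted };
    return ns;
}

/* Returns 0 if a fresh unprivileged mount is allowed, -1 otherwise. */
int userns_may_mount(const struct userns_sketch *ns, bool want_proc)
{
    assert(ns != NULL);
    if (want_proc)
        return ns->may_mount_proc ? 0 : -1;
    return ns->may_mount_sysfs ? 0 : -1;
}
```

    A namespace created inside a mount namespace jail (neither filesystem
    mounted) is refused both mounts; one created in the usual environment is
    allowed both.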
     
  • Guarantee that the policy of which files may be accessed, which is
    established by setting the root directory, will not be violated
    by user namespaces, by verifying that the root directory points
    to the root of the mount namespace at the time of user namespace
    creation.

    Changing the root is a privileged operation, and as a matter of policy
    it serves to limit unprivileged processes to files below the current
    root directory.

    For reasons of simplicity and comprehensibility the privilege to
    change the root directory is gated solely on the CAP_SYS_CHROOT
    capability in the user namespace. Therefore when creating a user
    namespace we must ensure that the policy of which files may be accessed
    cannot be violated by changing the root directory.

    Anyone who runs a process in a chroot and would like to use user
    namespaces can set up the same view of filesystems with a mount
    namespace instead, so this is not a practical limitation for using
    user namespaces.

    Cc: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
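
    The check described above amounts to comparing the caller's root with the
    root of its mount namespace before a user namespace may be created. A
    minimal sketch, assuming roots can be modeled as path strings (the kernel
    compares path structures instead); the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Returns 0 if creation may proceed, -EPERM if the caller is chrooted,
 * i.e. its root differs from the root of its mount namespace. */
int userns_create_allowed(const char *task_root, const char *mntns_root)
{
    assert(task_root != NULL && mntns_root != NULL);
    if (strcmp(task_root, mntns_root) != 0)
        return -EPERM;
    return 0;
}
```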
     

26 Mar, 2013

2 commits

  • When a multi-threaded init exits and the initial thread is not the
    last thread to exit the initial thread hangs around as a zombie
    until the last thread exits. In that case zap_pid_ns_processes
    needs to wait until there are only 2 hashed pids in the pid
    namespace not one.

    v2. Replace thread_pid_vnr(me) == 1 with the test thread_group_leader(me)
    as suggested by Oleg.

    Cc: stable@vger.kernel.org
    Cc: Oleg Nesterov
    Reported-by: Caj Larsson
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
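
    The counting rule from the commit above, as a small sketch: if the thread
    running the teardown is the thread group leader, only its own pid should
    remain hashed; otherwise the zombie leader's pid is still hashed as well,
    so the expected count is two. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of hashed pids that should remain once all other processes
 * in the pid namespace are gone. */
int init_pids_expected(bool is_thread_group_leader)
{
    return is_thread_group_leader ? 1 : 2;
}

/* True when the wait in the teardown path may stop. */
bool zap_wait_done(int hashed_pids, bool is_thread_group_leader)
{
    assert(hashed_pids >= 1);
    return hashed_pids == init_pids_expected(is_thread_group_leader);
}
```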
     
  • Pull timer fix from Thomas Gleixner:
    "A single bugfix which prevents that a non functional timer device is
    selected to provide the fallback device, which is supposed to serve
    timer interrupts on behalf of non functional devices ..."

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clockevents: Don't allow dummy broadcast timers

    Linus Torvalds
     

23 Mar, 2013

2 commits

  • David said:

    Commit 6c0c0d4d1080 ("poweroff: fix bug in orderly_poweroff()")
    apparently fixes one bug in orderly_poweroff(), but introduces
    another. The comments on orderly_poweroff() claim it can be called
    from any context - and indeed we call it from interrupt context in
    arch/powerpc/platforms/pseries/ras.c for example. But since that
    commit this is no longer safe, since call_usermodehelper_fns() is not
    safe in interrupt context without the UMH_NO_WAIT option.

    orderly_poweroff() can be used from any context but UMH_WAIT_EXEC is
    sleepable. Move the "force" logic into __orderly_poweroff() and change
    orderly_poweroff() to use the global poweroff_work which simply calls
    __orderly_poweroff().

    While at it, remove the unneeded "int argc" and change argv_split() to
    use GFP_KERNEL.

    We use the global "bool poweroff_force" to pass the argument, this can
    obviously affect the previous request if it is pending/running. So we
    only allow the "false => true" transition assuming that the pending
    "true" should succeed anyway. If schedule_work() fails after that we
    know that work->func() was not called yet, it must see the new value.

    This means that orderly_poweroff() becomes async even if we do not run
    the command and always succeeds, schedule_work() can only fail if the
    work is already pending. We can export __orderly_poweroff() and change
    the non-atomic callers which want the old semantics.

    Signed-off-by: Oleg Nesterov
    Reported-by: Benjamin Herrenschmidt
    Reported-by: David Gibson
    Cc: Lucas De Marchi
    Cc: Feng Hong
    Cc: Kees Cook
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
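
    The "false => true" rule and the always-succeeding asynchronous call
    described above can be sketched in user space as follows. The work queue
    is modeled by a single pending flag, and all names with a _sketch suffix
    (plus run_pending_work) are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

static bool poweroff_force;   /* argument shared with the pending work */
static bool work_pending;     /* models a queued poweroff_work */

/* Models schedule_work(): fails only if the work is already pending. */
static bool schedule_work_sketch(void)
{
    if (work_pending)
        return false;
    work_pending = true;
    return true;
}

/* Never sleeps and always succeeds, so it is usable from any context. */
int orderly_poweroff_sketch(bool force)
{
    if (force)                /* never override a pending "true" */
        poweroff_force = true;
    schedule_work_sketch();   /* a failure means the work has not run yet */
    return 0;
}

/* Models the worker running later; returns the force value it sees. */
bool run_pending_work(void)
{
    assert(work_pending);
    work_pending = false;
    return poweroff_force;
}
```

    A later call with force=false cannot downgrade an earlier forced request,
    because the flag is only ever set, never cleared, by the caller.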
     
  • wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
    nor printk_sched() is in use and there are no waiters on the
    log_wait waitqueue. It should be a stub in this case for users like
    bust_spinlocks().

    Otherwise this results in this warning when CONFIG_PRINTK=n and
    CONFIG_IRQ_WORK=n:

    kernel/built-in.o In function `wake_up_klogd':
    (.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'

    To fix this, provide an off-case for wake_up_klogd() when
    CONFIG_PRINTK=n.

    There is much more from console_unlock() and other console related code
    in printk.c that should be moved under CONFIG_PRINTK. But for now,
    focus on a minimal fix as we have already passed the merge window.

    [akpm@linux-foundation.org: include printk.h in bust_spinlocks.c]
    Signed-off-by: Frederic Weisbecker
    Reported-by: James Hogan
    Cc: James Hogan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
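
    The "off-case" pattern from the commit above, sketched in isolation: when
    the feature is configured out, callers get an empty inline stub and need
    no #ifdef of their own. SKETCH_CONFIG_PRINTK is a hypothetical stand-in
    for CONFIG_PRINTK, and the caller is a simplified model of
    bust_spinlocks(), not its real body.

```c
#include <assert.h>

/* SKETCH_CONFIG_PRINTK is left undefined here, so the empty stub
 * below is the variant that gets compiled. */
static int klogd_wakeups;   /* counts real wakeups, for illustration only */

#ifdef SKETCH_CONFIG_PRINTK
static inline void wake_up_klogd_sketch(void) { klogd_wakeups++; }
#else
static inline void wake_up_klogd_sketch(void) { }   /* off-case: no-op */
#endif

/* Caller that works unchanged with either variant. */
int bust_spinlocks_sketch(int yes)
{
    assert(yes == 0 || yes == 1);
    if (!yes)
        wake_up_klogd_sketch();
    return klogd_wakeups;
}
```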
     

21 Mar, 2013

1 commit

  • Pull perf fixes from Ingo Molnar:
    "A fair chunk of the linecount comes from a fix for a tracing bug that
    corrupts latency tracing buffers when the overwrite mode is changed on
    the fly - the rest is mostly assorted fewliner fixlets."

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event
    kprobes/x86: Check Interrupt Flag modifier when registering probe
    kprobes: Make hash_64() as always inlined
    perf: Generate EXIT event only once per task context
    perf: Reset hwc->last_period on sw clock events
    tracing: Prevent buffer overwrite disabled for latency tracers
    tracing: Keep overwrite in sync between regular and snapshot buffers
    tracing: Protect tracer flags with trace_types_lock
    perf tools: Fix LIBNUMA build with glibc 2.12 and older.
    tracing: Fix free of probe entry by calling call_rcu_sched()
    perf/POWER7: Create a sysfs format entry for Power7 events
    perf probe: Fix segfault
    libtraceevent: Remove hard coded include to /usr/local/include in Makefile
    perf record: Fix -C option
    perf tools: check if -DFORTIFY_SOURCE=2 is allowed
    perf report: Fix build with NO_NEWT=1
    perf annotate: Fix build with NO_NEWT=1
    tracing: Fix race in snapshot swapping

    Linus Torvalds
     

19 Mar, 2013

1 commit


18 Mar, 2013

3 commits

  • …t/rostedt/linux-trace into perf/urgent

    Pull tracing fixes from Steven Rostedt.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • perf_event_task_event() iterates the pmu list and generates events
    for each eligible pmu context. But if task_event has a task_ctx,
    as in EXIT, it'll generate events even though the pmu doesn't
    have an eligible one. Fix it by moving the code to the proper
    places.

    Before this patch:

    $ perf record -n true
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.006 MB perf.data (~248 samples) ]

    $ perf report -D | tail
    Aggregated stats:
    TOTAL events: 73
    MMAP events: 67
    COMM events: 2
    EXIT events: 4
    cycles stats:
    TOTAL events: 73
    MMAP events: 67
    COMM events: 2
    EXIT events: 4

    After this patch:

    $ perf report -D | tail
    Aggregated stats:
    TOTAL events: 70
    MMAP events: 67
    COMM events: 2
    EXIT events: 1
    cycles stats:
    TOTAL events: 70
    MMAP events: 67
    COMM events: 2
    EXIT events: 1

    Signed-off-by: Namhyung Kim
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1363332433-7637-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Ingo Molnar

    Namhyung Kim
     
  • When cpu/task clock events are initialized, their sampling
    frequencies are converted to a fixed period. However, the
    conversion failed to update hwc->last_period, which was set to 1
    for the initial sampling frequency calibration.

    Because this hwc->last_period value is used as the period in
    perf_swevent_hrtimer(), every recorded sample will have an
    incorrect period of 1.

    $ perf record -e task-clock noploop 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.158 MB perf.data (~6919 samples) ]

    $ perf report -n --show-total-period --stdio
    # Samples: 4K of event 'task-clock'
    # Event count (approx.): 4000
    #
    # Overhead Samples Period Command Shared Object Symbol
    # ........ ............ ............ ....... ............. ..................
    #
    99.95% 3998 3998 noploop noploop [.] main
    0.03% 1 1 noploop libc-2.15.so [.] init_cacheinfo
    0.03% 1 1 noploop ld-2.15.so [.] open_verify

    Note that it doesn't affect the non-sampling event so that the
    perf stat still gets correct value with or without this patch.

    $ perf stat -e task-clock noploop 1

    Performance counter stats for 'noploop 1':

    1000.272525 task-clock # 1.000 CPUs utilized

    1.000560605 seconds time elapsed

    Signed-off-by: Namhyung Kim
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1363574507-18808-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Ingo Molnar

    Namhyung Kim
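
    A minimal sketch of the frequency-to-period conversion discussed above,
    including the assignment that the fix adds. The struct is a hypothetical
    two-field model of the hardware event state; only the field names
    sample_period and last_period follow the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical, reduced model of the per-event hardware state. */
struct hw_event_sketch {
    uint64_t sample_period;
    uint64_t last_period;
};

/* Convert a sampling frequency (Hz) to a fixed period in nanoseconds. */
void swevent_init_period(struct hw_event_sketch *hwc, uint64_t sample_freq)
{
    assert(hwc != 0 && sample_freq > 0);
    hwc->sample_period = NSEC_PER_SEC / sample_freq;
    /* Without this line last_period would stay at its calibration
     * value of 1 and every sample would report a period of 1. */
    hwc->last_period = hwc->sample_period;
}
```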
     

15 Mar, 2013

3 commits

  • The latency tracers require the buffers to be in overwrite mode,
    otherwise they get screwed up. Force the buffers to stay in overwrite
    mode when latency tracers are enabled.

    Added a flag_changed() method to the tracer structure to allow
    the tracers to see what flags are being changed, and also be able
    to prevent the change from happening.

    Cc: stable@vger.kernel.org
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
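
    The veto mechanism described above can be sketched as an optional
    callback that the flag-setting path consults before applying a change.
    Only the name flag_changed() comes from the commit message; the struct
    layout, the SKETCH_ITER_OVERWRITE bit, and the other function names are
    assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ITER_OVERWRITE 0x1u   /* hypothetical overwrite option bit */

struct tracer_sketch {
    int enabled;
    /* Optional hook; a nonzero return vetoes the change. */
    int (*flag_changed)(struct tracer_sketch *t, unsigned int mask, int set);
};

/* A latency tracer refuses to leave overwrite mode while enabled. */
int latency_flag_changed(struct tracer_sketch *t, unsigned int mask, int set)
{
    if ((mask & SKETCH_ITER_OVERWRITE) && !set && t->enabled)
        return -1;
    return 0;
}

/* Apply a flag change unless the current tracer objects. */
int set_tracer_flag_sketch(struct tracer_sketch *t, unsigned int *flags,
                           unsigned int mask, int set)
{
    assert(t != NULL && flags != NULL);
    if (t->flag_changed && t->flag_changed(t, mask, set))
        return -1;
    if (set)
        *flags |= mask;
    else
        *flags &= ~mask;
    return 0;
}
```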
     
  • Changing the overwrite mode for the ring buffer via the trace
    option only sets the normal buffer. But the snapshot buffer could
    swap with it, and then the snapshot would be in non overwrite mode
    and the normal buffer would be in overwrite mode, even though the
    option flag states otherwise.

    Keep the two buffers overwrite modes in sync.

    Cc: stable@vger.kernel.org
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Seems that the tracer flags have never been protected from
    synchronous writes. Luckily, admins don't usually modify the
    tracing flags via two different tasks. But if scripts were to
    be used to modify them, then they could get corrupted.

    Move the trace_types_lock that protects against tracers changing
    to also protect the flags being set.

    Cc: stable@vger.kernel.org
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

14 Mar, 2013

7 commits

  • …t/rostedt/linux-trace into perf/urgent

    Pull tracing fixes from Steven Rostedt.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Merge misc fixes from Andrew Morton:

    - A bunch of fixes

    - Finish off the idr API conversions before someone starts to use the
    old interfaces again.

    * emailed patches from Andrew Morton:
    idr: idr_alloc() shouldn't trigger lowmem warning when preloaded
    UAPI: fix endianness conditionals in M32R's asm/stat.h
    UAPI: fix endianness conditionals in linux/raid/md_p.h
    UAPI: fix endianness conditionals in linux/acct.h
    UAPI: fix endianness conditionals in linux/aio_abi.h
    decompressors: fix typo "POWERPC"
    mm/fremap.c: fix oops on error path
    idr: deprecate idr_pre_get() and idr_get_new[_above]()
    tidspbridge: convert to idr_alloc()
    zcache: convert to idr_alloc()
    mlx4: remove leftover idr_pre_get() call
    workqueue: convert to idr_alloc()
    nfsd: convert to idr_alloc()
    nfsd: remove unused get_new_stid()
    kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead of SA_RESTORER
    signal: always clear sa_restorer on execve
    mm: remove_memory(): fix end_pfn setting
    include/linux/res_counter.h needs errno.h

    Linus Torvalds
     
  • idr_get_new*() and friends are about to be deprecated. Convert to the
    new idr_alloc() interface.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • __ARCH_HAS_SA_RESTORER is the preferred conditional for use in 3.9 and
    later kernels, per Kees.

    Cc: Emese Revfy
    Cc: PaX Team
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc: Julien Tinnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • When the new signal handlers are set up, the location of sa_restorer is
    not cleared, leaking a parent process's address space location to
    children. This allows for a potential bypass of the parent's ASLR by
    examining the sa_restorer value returned when calling sigaction().

    Based on what should be considered "secret" about addresses, it only
    matters across the exec not the fork (since the VMAs haven't changed
    until the exec). But since exec sets SIG_DFL and keeps sa_restorer,
    this is where it should be fixed.

    Given the few uses of sa_restorer, a "set" function was not written
    since this would be the only use. Instead, we use
    __ARCH_HAS_SA_RESTORER, as already done in other places.

    Example of the leak before applying this patch:

    $ cat /proc/$$/maps
    ...
    7fb9f3083000-7fb9f3238000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
    ...
    $ ./leak
    ...
    7f278bc74000-7f278be29000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
    ...
    1 0 (nil) 0x7fb9f30b94a0
    2 4000000 (nil) 0x7f278bcaa4a0
    3 4000000 (nil) 0x7f278bcaa4a0
    4 0 (nil) 0x7fb9f30b94a0
    ...

    [akpm@linux-foundation.org: use SA_RESTORER for backportability]
    Signed-off-by: Kees Cook
    Reported-by: Emese Revfy
    Cc: Emese Revfy
    Cc: PaX Team
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc: Julien Tinnes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
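
    A user-space model of the fix above: when handlers are reset at exec
    time, the restorer pointer is cleared together with the handler so that
    no address from the old image survives. This is a simplified sketch; it
    resets every handler unconditionally, and the types and names are
    hypothetical rather than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*handler_t)(int);
#define SKETCH_SIG_DFL ((handler_t)0)   /* stand-in for SIG_DFL */

struct sigaction_sketch {
    handler_t handler;
    void (*restorer)(void);
};

/* Example callbacks, used only to populate a table before the flush. */
void sample_handler(int sig) { (void)sig; }
void sample_restorer(void) { }

/* Reset all actions to their defaults, as exec would. */
void flush_signal_handlers_sketch(struct sigaction_sketch *actions, size_t n)
{
    assert(actions != NULL);
    for (size_t i = 0; i < n; i++) {
        actions[i].handler = SKETCH_SIG_DFL;
        actions[i].restorer = NULL;   /* previously left intact: the leak */
    }
}
```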
     
  • Don't allow sharing the root directory with processes in a
    different user namespace. There doesn't seem to be any point, and to
    allow it would require the overhead of putting a user namespace
    reference in fs_struct (for permission checks) and incrementing that
    reference count on practically every call to fork.

    So just perform the inexpensive test of forbidding sharing fs_struct
    across processes in different user namespaces. We already disallow
    other forms of threading when unsharing a user namespace so this
    should be no real burden in practice.

    This updates setns, clone, and unshare to disallow multiple user
    namespaces sharing an fs_struct.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
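
    The "inexpensive test" mentioned above can be sketched as two checks: a
    clone request may not ask for a new user namespace and a shared fs_struct
    at once, and entering a user namespace is refused while the fs_struct is
    shared. The SKETCH_ constants are local definitions assumed to mirror the
    corresponding clone flag values, and the struct and function names are
    hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_CLONE_FS      0x00000200UL
#define SKETCH_CLONE_NEWUSER 0x10000000UL

struct fs_sketch {
    int users;   /* number of tasks sharing this fs_struct */
};

/* clone(): a new user namespace and a shared fs_struct are incompatible. */
int clone_flags_check(unsigned long flags)
{
    const unsigned long both = SKETCH_CLONE_FS | SKETCH_CLONE_NEWUSER;
    return (flags & both) == both ? -EINVAL : 0;
}

/* setns() into a user namespace: refuse if the fs_struct is shared. */
int setns_userns_check(const struct fs_sketch *fs)
{
    assert(fs != NULL);
    return fs->users != 1 ? -EINVAL : 0;
}
```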
     
  • Because function tracing is very invasive, and can even trace
    calls to rcu_read_lock(), RCU access in function tracing is done
    with preempt_disable_notrace(). This requires a synchronize_sched()
    for updates and not a synchronize_rcu().

    Function probes (traceon, traceoff, etc) must be freed after
    a synchronize_sched() after its entry has been removed from the
    hash. But call_rcu() is used. Fix this by using call_rcu_sched().

    Also fix the usage to use hlist_del_rcu() instead of hlist_del().

    Cc: stable@vger.kernel.org
    Cc: Paul McKenney
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

13 Mar, 2013

2 commits

  • Fix kernel-doc warning in futex.c and convert 'Returns' to the new Return:
    kernel-doc notation format.

    Warning(kernel/futex.c:2286): Excess function parameter 'clockrt' description in 'futex_wait_requeue_pi'

    Fix one spello.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Fix new kernel-doc warnings in kernel/signal.c:

    Warning(kernel/signal.c:2689): No description found for parameter 'uset'
    Warning(kernel/signal.c:2689): Excess function parameter 'set' description in 'sys_rt_sigpending'

    Signed-off-by: Randy Dunlap
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

12 Mar, 2013

1 commit

  • Although the swap is wrapped with a spin_lock, the assignment
    of the temp buffer used to swap is not within that lock.
    It needs to be moved into that lock, otherwise two swaps
    happening on two different CPUs can end up using the wrong
    temp buffer to assign in the swap.

    Luckily, all current callers of the swap function appear to have
    their own locks. But in case something is added that allows two
    different callers to call the swap, then there's a chance that
    this race can trigger and corrupt the buffers.

    New code is coming soon that will allow for this race to trigger.

    I've Cc'd stable, so this bug will not show up if someone backports
    one of the changes that can trigger this bug.

    Cc: stable@vger.kernel.org
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

11 Mar, 2013

1 commit


09 Mar, 2013

2 commits

  • Since multiple pools per cpu have been introduced, wq_unbind_fn() has
    a subtle bug which may theoretically stall work item processing. The
    problem is two-fold.

    * wq_unbind_fn() depends on the worker executing wq_unbind_fn() itself
    to start unbound chain execution, which works fine when there was
    only single pool. With multiple pools, only the pool which is
    running wq_unbind_fn() - the highpri one - is guaranteed to have
    such kick-off. The other pool could stall when its busy workers
    block.

    * The current code is setting WORKER_UNBIND / POOL_DISASSOCIATED of
    the two pools in succession without initiating work execution
    in between. Because setting the flags requires grabbing assoc_mutex
    which is held while new workers are created, this could lead to
    stalls if a pool's manager is waiting for the previous pool's work
    items to release memory. This is almost purely theoretical tho.

    Update wq_unbind_fn() such that it sets WORKER_UNBIND /
    POOL_DISASSOCIATED, goes over schedule() and explicitly kicks off
    execution for a pool and then moves on to the next one.

    tj: Updated comments and description.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Lai Jiangshan
     
  • Commit b67bfe0d42ca ("hlist: drop the node parameter from iterators")
    did a lot of nice changes but also contains two small hunks that seem to
    have slipped in accidentally and have no apparent connection to the
    intent of the patch.

    This reverts the two extraneous changes.

    Signed-off-by: Arnd Bergmann
    Cc: Peter Senna Tschudin
    Cc: Paul E. McKenney
    Cc: Sasha Levin
    Cc: Thomas Gleixner
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

08 Mar, 2013

1 commit

  • Currently tick_check_broadcast_device doesn't reject clock_event_devices
    with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real
    hardware if they have a higher rating value. In this situation, the
    dummy timer is responsible for broadcasting to itself, and the core
    clockevents code may attempt to call non-existent callbacks for
    programming the dummy, eventually leading to a panic.

    This patch makes tick_check_broadcast_device always reject dummy timers,
    preventing this problem.

    Signed-off-by: Mark Rutland
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Jon Medhurst (Tixy)
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner

    Mark Rutland
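
    The selection rule from the commit above, reduced to a sketch: a dummy
    device is rejected outright, before its rating is even compared with the
    current broadcast device. The struct, the SKETCH_FEAT_DUMMY bit, and the
    function name are hypothetical, and other criteria the kernel applies are
    omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FEAT_DUMMY 0x1u   /* stand-in for CLOCK_EVT_FEAT_DUMMY */

struct clock_dev_sketch {
    unsigned int features;
    int rating;
};

/* Returns true if cand should become the broadcast device.
 * cur is the current broadcast device, or NULL if there is none. */
bool accept_broadcast_device(const struct clock_dev_sketch *cur,
                             const struct clock_dev_sketch *cand)
{
    assert(cand != NULL);
    if (cand->features & SKETCH_FEAT_DUMMY)
        return false;             /* never broadcast via a dummy */
    if (cur != NULL && cur->rating >= cand->rating)
        return false;             /* keep the better-rated device */
    return true;
}
```

    With this ordering, a dummy timer with a higher rating than the real
    hardware can no longer win the comparison.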
     

07 Mar, 2013

2 commits

  • To use the tracing snapshot feature, a '1' is written into the snapshot
    file. This allocates the snapshot buffer if it has not already
    been allocated and does a 'swap' with the main buffer, so that the
    snapshot now contains what was in the main buffer, and the main buffer
    now writes to what was the snapshot buffer.

    To free the snapshot buffer, a '0' is written into the snapshot file.

    To clear the snapshot buffer, any number but a '0' or '1' is written
    into the snapshot file. But if the snapshot buffer is not allocated,
    the write returns an -EINVAL error code. This is rather pointless. It is
    better just to do nothing and return success.

    Acked-by: Hiraku Toyooka
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
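
    The three write values described above, modeled as a small state machine.
    Buffers are reduced to entry counts, and the struct and function names
    are hypothetical; the point is only that a "clear" on an unallocated
    snapshot now succeeds without doing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct snapshot_sketch {
    bool allocated;
    int entries;     /* entries currently held by the snapshot buffer */
};

/* main_entries is the entry count of the main buffer (in/out).
 * Returns 0 on success for every value, matching the new behavior. */
int snapshot_write_sketch(struct snapshot_sketch *s, int *main_entries,
                          long val)
{
    assert(s != NULL && main_entries != NULL);
    switch (val) {
    case 0:                       /* free the snapshot buffer */
        s->allocated = false;
        s->entries = 0;
        break;
    case 1: {                     /* allocate if needed, then swap */
        int tmp = s->entries;
        s->allocated = true;
        s->entries = *main_entries;
        *main_entries = tmp;
        break;
    }
    default:                      /* clear, but only if allocated */
        if (s->allocated)
            s->entries = 0;
        break;                    /* not allocated: do nothing, no -EINVAL */
    }
    return 0;
}
```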
     
  • When cat'ing the snapshot file, instead of showing an empty trace
    header like the trace file does, show how to use the snapshot
    feature.

    Also, this is a good place to show if the snapshot has been allocated
    or not. Users may want to "pre allocate" the snapshot to have a fast
    "swap" of the current buffer. Otherwise, a swap would be slow and might
    fail as it would need to allocate the snapshot buffer, and that might
    fail under tight memory constraints.

    Here's what it looked like before:

    # tracer: nop
    #
    # entries-in-buffer/entries-written: 0/0 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |

    Here's what it looks like now:

    # tracer: nop
    #
    #
    # * Snapshot is freed *
    #
    # Snapshot commands:
    # echo 0 > snapshot : Clears and frees snapshot buffer
    # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
    # Takes a snapshot of the main buffer.
    # echo 2 > snapshot : Clears snapshot buffer (but does not allocate)
    # (Doesn't have to be '2' works with any number that
    # is not a '0' or '1')

    Acked-by: Hiraku Toyooka
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

06 Mar, 2013

2 commits

  • Pull irq fixes and cleanups from Thomas Gleixner:
    "Commit e5ab012c3271 ("nohz: Make tick_nohz_irq_exit() irq safe") is
    the first commit in the series and the minimal necessary bugfix, which
    needs to go back into stable.

    The remaining commits enforce irq disabling in irq_exit(), sanitize
    the hardirq/softirq preempt count transition and remove a bunch of no
    longer necessary conditionals."

    I personally love getting rid of the very subtle and confusing
    IRQ_EXIT_OFFSET thing. Even apart from the whole "more lines removed
    than added" thing.

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irq: Don't re-enable interrupts at the end of irq_exit
    irq: Remove IRQ_EXIT_OFFSET workaround
    Revert "nohz: Make tick_nohz_irq_exit() irq safe"
    irq: Sanitize invoke_softirq
    irq: Ensure irq_exit() code runs with interrupts disabled
    nohz: Make tick_nohz_irq_exit() irq safe

    Linus Torvalds
     
  • Pull smpboot bugfix from Thomas Gleixner:
    "A single bugfix for a regression introduced with the conversion of the
    stop machine threads to the generic smpboot thread management
    facility"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    stop_machine: Mark per cpu stopper enabled early

    Linus Torvalds
     

04 Mar, 2013

2 commits

  • Pull more VFS bits from Al Viro:
    "Unfortunately, it looks like xattr series will have to wait until the
    next cycle ;-/

    This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
    etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
    more file_inode() work"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    constify path_get/path_put and fs_struct.c stuff
    fix nommu breakage in shmem.c
    cache the value of file_inode() in struct file
    9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
    9p: make sure ->lookup() adds fid to the right dentry
    9p: untangle ->lookup() a bit
    9p: double iput() in ->lookup() if d_materialise_unique() fails
    9p: v9fs_fid_add() can't fail now
    v9fs: get rid of v9fs_dentry
    9p: turn fid->dlist into hlist
    9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
    more file_inode() open-coded instances
    selinux: opened file can't have NULL or negative ->f_path.dentry

    (In the meantime, the hlist traversal macros have changed, so this
    required a semantic conflict fixup for the newly hlistified fid->dlist)

    Linus Torvalds
     
  • Pull new ImgTec Meta architecture from James Hogan:
    "This adds core architecture support for Imagination's Meta processor
    cores, followed by some later miscellaneous arch/metag cleanups and
    fixes which I kept separate to ease review:

    - Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
    - A few fixes all over, particularly for symbol prefixes
    - A few privilege protection fixes
    - Several cleanups (setup.c includes, split out a lot of
    metag_ksyms.c)
    - Fix some missing exports
    - Convert hugetlb to use vm_unmapped_area()
    - Copy device tree to non-init memory
    - Provide dma_get_sgtable()"

    * tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits)
    metag: Provide dma_get_sgtable()
    metag: prom.h: remove declaration of metag_dt_memblock_reserve()
    metag: copy devicetree to non-init memory
    metag: cleanup metag_ksyms.c includes
    metag: move mm/init.c exports out of metag_ksyms.c
    metag: move usercopy.c exports out of metag_ksyms.c
    metag: move setup.c exports out of metag_ksyms.c
    metag: move kick.c exports out of metag_ksyms.c
    metag: move traps.c exports out of metag_ksyms.c
    metag: move irq enable out of irqflags.h on SMP
    genksyms: fix metag symbol prefix on crc symbols
    metag: hugetlb: convert to vm_unmapped_area()
    metag: export clear_page and copy_page
    metag: export metag_code_cache_flush_all
    metag: protect more non-MMU memory regions
    metag: make TXPRIVEXT bits explicit
    metag: kernel/setup.c: sort includes
    perf: Enable building perf tools for Meta
    metag: add boot time LNKGET/LNKSET check
    metag: add __init to metag_cache_probe()
    ...

    Linus Torvalds
     

03 Mar, 2013

4 commits

  • Pull sigprocmask compat fix from Al Viro:
    "generic compat_sys_rt_sigprocmask() had a very dumb braino; I'd spent
    quite a while staring at the offending commit before finally managing
    to spot the idiocy ;-/"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    fix compat_sys_rt_sigprocmask()

    Linus Torvalds
     
  • Converting bitmask to 32bit granularity is fine, but we'd better
    _do_ something with the result. Such as "copy it to userland"...

    Signed-off-by: Al Viro

    Al Viro
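
    To illustrate the remark above, here is a sketch of converting a signal
    mask held in 64-bit words to 32-bit granularity and actually delivering
    the result to the caller's buffer. The function name is hypothetical, the
    low-half-first word order is an assumption of this sketch, and a plain
    array write stands in for the copy to userland.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split nwords 64-bit mask words into 2 * nwords 32-bit words,
 * low half first, storing them in the caller-provided buffer. */
void sigset_to_compat_sketch(const uint64_t *set, size_t nwords,
                             uint32_t *out)
{
    assert(set != NULL && out != NULL);
    for (size_t i = 0; i < nwords; i++) {
        out[2 * i]     = (uint32_t)(set[i] & 0xffffffffu);
        out[2 * i + 1] = (uint32_t)(set[i] >> 32);
    }
}
```

    The bug class the commit describes corresponds to computing the converted
    words into a local variable and then never writing them to out.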
     
  • Some 32 bit architectures require 64 bit values to be aligned (for
    example Meta which has 64 bit read/write instructions). These require 8
    byte alignment of event data too, so use
    !CONFIG_HAVE_64BIT_ALIGNED_ACCESS instead of !CONFIG_64BIT ||
    CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to decide alignment, and align
    buffer_data_page::data accordingly.

    Signed-off-by: James Hogan
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Acked-by: Steven Rostedt (previous version subtly different)

    James Hogan
     
  • Pull KGDB/KDB fixes and cleanups from Jason Wessel:
    "For a change we removed more code than we added. If people aren't
    using it we shouldn't be carrying it. :-)

    Cleanups:
    - Remove kdb ssb command - there is no in kernel disassembler to
    support it

    - Remove kdb ll command - Always caused a kernel oops and there were
    no bug reports so no one was using this command

    - Use kernel ARRAY_SIZE macro instead of array computations

    Fixes:
    - Stop oops in kdb if user executes kdb_defcmd with args

    - kdb help command truncated text

    - ppc64 support for kgdbts

    - Add missing kconfig option from original kdb port for dealing with
    catastrophic kernel crashes such that you can reboot automatically
    on continue from kdb"

    * tag 'for_linux-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
    kdb: Remove unhandled ssb command
    kdb: Prevent kernel oops with kdb_defcmd
    kdb: Remove the ll command
    kdb_main: fix help print
    kdb: Fix overlap in buffers with strcpy
    Fixed dead ifdef block by adding missing Kconfig option.
    kdb: Setup basic kdb state before invoking commands via kgdb
    kdb: use ARRAY_SIZE where possible
    kgdb/kgdbts: support ppc64
    kdb: A fix for kdb command table expansion

    Linus Torvalds