06 Jul, 2014

2 commits

  • Pull irq fixes from Thomas Gleixner:
    "A few minor fixlets in ARM SoC irq drivers and a fix for a memory leak
    which I introduced in the last round of cleanups :("

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Fix memory leak when calling irq_free_hwirqs()
    irqchip: spear_shirq: Fix interrupt offset
    irqchip: brcmstb-l2: Level-2 interrupts are edge sensitive
    irqchip: armada-370-xp: Mask all interrupts during initialization.

    Linus Torvalds
     
  • irq_free_hwirqs() always calls irq_free_descs() with cnt == 0, which
    makes the call a no-op, because irq_free_hwirqs() itself has already
    decremented the count of interrupts to free down to zero.

    Fixes: 7b6ef1262549f6afc5c881aaef80beb8fd15f908

    Signed-off-by: Keith Busch
    Acked-by: David Rientjes
    Link: http://lkml.kernel.org/r/1404167084-8070-1-git-send-email-keith.busch@intel.com
    Signed-off-by: Thomas Gleixner

    Keith Busch
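
The counting mistake described in this commit message can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `free_descs()`, `teardown_one()`, `free_hwirqs_buggy()` and `free_hwirqs_fixed()` are hypothetical stand-ins, and the loop shape is inferred from the commit text rather than copied from the kernel source.

```c
#include <assert.h>

static int descs_freed;   /* descriptors released by the stand-in below */
static int torn_down;     /* interrupts torn down by the loops */

/* Hypothetical stand-in for irq_free_descs(): frees cnt descriptors. */
static void free_descs(unsigned int from, int cnt)
{
	(void)from;
	assert(cnt >= 0);
	descs_freed += cnt;
}

/* Hypothetical per-interrupt teardown step. */
static void teardown_one(unsigned int irq)
{
	(void)irq;
	torn_down++;
}

/* Buggy shape: the loop consumes cnt, so the final call sees cnt == 0. */
static int free_hwirqs_buggy(unsigned int from, int cnt)
{
	unsigned int i;

	descs_freed = 0;
	for (i = from; cnt > 0; i++, cnt--)
		teardown_one(i);
	free_descs(from, cnt);	/* cnt is 0 here, nothing is freed */
	return descs_freed;
}

/* Fixed shape: iterate with a separate counter and keep cnt intact. */
static int free_hwirqs_fixed(unsigned int from, int cnt)
{
	unsigned int i;
	int j;

	descs_freed = 0;
	for (i = from, j = cnt; j > 0; i++, j--)
		teardown_one(i);
	free_descs(from, cnt);	/* cnt still holds the original count */
	return descs_freed;
}
```

In this model, `free_hwirqs_buggy(16, 4)` reports 0 freed descriptors while `free_hwirqs_fixed(16, 4)` reports 4, which is the leak the commit describes.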
     

04 Jul, 2014

2 commits

  • …it/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Oleg Nesterov found and fixed a bug in the perf/ftrace/uprobes code
    where running:

    # perf probe -x /lib/libc.so.6 syscall
    # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
    # perf record -e probe_libc:syscall whatever

    kills the uprobe. Along the way he found some other minor bugs and
    clean ups that he fixed up making it a total of 4 patches.

    Doing unrelated work, I found that the reading of the ftrace trace
    file disables all function tracer callbacks. This was fine when
    ftrace was the only user, but now that it's used by perf and kprobes,
    this is a bug where reading trace can disable kprobes and perf. A
    very unexpected side effect and should be fixed"

    * tag 'trace-fixes-v3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Remove ftrace_stop/start() from reading the trace file
    tracing/uprobes: Fix the usage of uprobe_buffer_enable() in probe_event_enable()
    tracing/uprobes: Kill the bogus UPROBE_HANDLER_REMOVE code in uprobe_dispatcher()
    uprobes: Change unregister/apply to WARN() if uprobe/consumer is gone
    tracing/uprobes: Revert "Support mix of ftrace and perf"

    Linus Torvalds
     
  • …_trylock_for_printk()"

    Revert commit 939f04bec1a4 ("printk: enable interrupts before calling
    console_trylock_for_printk()").

    Andreas reported:

    : None of the post-3.15 kernels boot for me. They all hang at the GRUB
    : screen telling me it loaded and started the kernel, but the kernel
    : itself stops before it prints anything (or even replaces the GRUB
    : background graphics).

    939f04bec1a4 is a modest latency reduction. Revert it until we understand
    the reason for these failures.

    Reported-by: Andreas Bombe <aeb@debian.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Andrew Morton
     

02 Jul, 2014

1 commit


01 Jul, 2014

4 commits

  • The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong:

    1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced,
    _enable() should be called only if !enabled.

    2. If uprobe_buffer_enable() fails probe_event_enable() should clear
    tp.flags and free event_file_link.

    3. If uprobe_register() fails it should do uprobe_buffer_disable().

    Link: http://lkml.kernel.org/p/20140627170146.GA18332@redhat.com

    Acked-by: Namhyung Kim
    Acked-by: Srikar Dronamraju
    Reviewed-by: Masami Hiramatsu
    Fixes: dcad1a204f72 "tracing/uprobes: Fetch args before reserving a ring buffer"
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
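
The three points in this commit message describe a conventional "acquire once, unwind on failure" pattern. The sketch below models it in user-space C. All names (`buffer_enable()`, `do_register()`, `probe_event_enable_sketch()`, the boolean state variables) are hypothetical stand-ins, not the kernel's functions or data layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state. */
static int buffer_refs;        /* balance of buffer enable/disable calls */
static bool probe_enabled;     /* stands in for the tp.flags bits */
static bool file_linked;       /* stands in for the event_file_link */

/* Failure injection switches for the model. */
static bool fail_buffer_enable;
static bool fail_register;

static int buffer_enable(void)
{
	if (fail_buffer_enable)
		return -1;
	buffer_refs++;
	return 0;
}

static void buffer_disable(void)
{
	assert(buffer_refs > 0);
	buffer_refs--;
}

static int do_register(void)
{
	return fail_register ? -1 : 0;
}

static int probe_event_enable_sketch(void)
{
	bool was_enabled = probe_enabled;
	int ret;

	file_linked = true;
	probe_enabled = true;

	/* Point 1: take the buffer reference only on the first enable. */
	if (was_enabled)
		return 0;

	ret = buffer_enable();
	if (ret)
		goto err_flags;		/* Point 2: undo flags and link. */

	ret = do_register();
	if (ret)
		goto err_buffer;	/* Point 3: drop the buffer reference. */

	return 0;

err_buffer:
	buffer_disable();
err_flags:
	file_linked = false;
	probe_enabled = false;
	return ret;
}
```

With this shape, a failed enable leaves `buffer_refs` at 0 and the flags cleared, and enabling an already-enabled probe does not take a second buffer reference.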
     
  • I do not know why dd9fa555d7bb "tracing/uprobes: Move argument fetching
    to uprobe_dispatcher()" added the UPROBE_HANDLER_REMOVE, but it looks
    wrong.

    OK, perhaps it makes sense to avoid store_trace_args() if the tracee is
    nacked by uprobe_perf_filter(). But then we should kill the same code
    in uprobe_perf_func() and unify the TRACE/PROFILE filtering (we need to
    do this anyway to mix perf/ftrace). Until then this code is actually a
    pessimization, because uprobe_perf_filter() will be called twice and
    will return true in the likely case.

    Link: http://lkml.kernel.org/p/20140627170143.GA18329@redhat.com

    Acked-by: Namhyung Kim
    Acked-by: Srikar Dronamraju
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • Add WARN_ON's into uprobe_unregister() and uprobe_apply() to ensure
    that nobody tries to play with the dead uprobe/consumer. This helps
    to catch the bugs like the one fixed by the previous patch.

    In the longer term we should fix this poorly designed interface.
    uprobe_register() should return "struct uprobe *" which should be
    passed to apply/unregister. Plus other semantic changes, see the
    changelog in commit 41ccba029e94.

    Link: http://lkml.kernel.org/p/20140627170140.GA18322@redhat.com

    Acked-by: Namhyung Kim
    Acked-by: Srikar Dronamraju
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • This reverts commit 43fe98913c9f67e3b523615ee3316f9520a623e0.

    This patch is very wrong. Firstly, this change leads to unbalanced
    uprobe_unregister(). Just for example,

    # perf probe -x /lib/libc.so.6 syscall
    # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
    # perf record -e probe_libc:syscall whatever

    after that uprobe is dead (unregistered) but the user of ftrace/perf
    can't know this, and it looks as if nobody hits this probe.

    This would be easy to fix, but there are other reasons why it is not
    simple to mix ftrace and perf. If nothing else, they can't share the
    same ->consumer.filter. This is fixable too, but probably we need to
    fix the poorly designed uprobe_register() interface first. At least
    "register" and "apply" should be clearly separated.

    Link: http://lkml.kernel.org/p/20140627170136.GA18319@redhat.com

    Cc: Tom Zanussi
    Cc: "zhangwei(Jovi)"
    Cc: stable@vger.kernel.org # v3.14
    Acked-by: Namhyung Kim
    Acked-by: Srikar Dronamraju
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     

25 Jun, 2014

1 commit

  • …l/git/rostedt/linux-trace

    Pull tracing cleanups and fixes from Steven Rostedt:
    "This includes three patches from Oleg Nesterov. The first is a fix to
    a race condition that happens between enabling/disabling syscall
    tracepoints and new process creations (the check to go into the ptrace
    path for a process can be set when it shouldn't, or not set when it
    should). Not a major bug but one that should be fixed and even
    applied to stable.

    The other two patches are cleanup/fixes that are not that critical,
    but for an -rc1 release would be nice to have. They both deal with
    syscall tracepoints.

    It also includes a patch to introduce a new macro for the
    TRACE_EVENT() format called __field_struct(). Originally, __field()
    was used to record any variable into a trace event, but with the
    addition of setting the "is signed" attribute, the check causes
    anything but a primitive variable to fail to compile. That is,
    structs and unions can't be used as they once were. When the "is
    signed" check was introduced, there were only primitive variables being
    recorded. But that will change soon and it was reported that
    __field() causes build failures.

    To solve the __field() issue, __field_struct() is introduced to allow
    trace_events to be able to record complex types too"

    * tag 'trace-fixes-v3.16-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Add __field_struct macro for TRACE_EVENT()
    tracing: syscall_regfunc() should not skip kernel threads
    tracing: Change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()
    tracing: Fix syscall_*regfunc() vs copy_process() race

    Linus Torvalds
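
Why an "is signed" check rejects structs can be seen in a small user-space model. This is a sketch under assumptions: the `is_signed_type()` definition below is the commonly used cast-and-compare form and is assumed here rather than quoted from the patch, and `FIELD()`, `FIELD_STRUCT()`, `struct field_desc` and `demo_field()` are hypothetical names that only mimic the idea of `__field()` and `__field_struct()`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed form of the signedness test: cast -1 to the type and compare.
 * The cast only compiles for arithmetic types, not for structs or unions. */
#define is_signed_type(type) (((type)(-1)) < (type)1)

struct field_desc {
	const char *name;
	size_t size;
	int is_signed;
};

/* Hypothetical analogue of __field(): records signedness via the cast. */
#define FIELD(type, item)        { #item, sizeof(type), is_signed_type(type) }

/* Hypothetical analogue of __field_struct(): skips the cast entirely and
 * records the field as not signed. */
#define FIELD_STRUCT(type, item) { #item, sizeof(type), 0 }

struct pair {
	int lo;
	int hi;
};

static const struct field_desc demo_fields[] = {
	FIELD(int, count),
	FIELD(unsigned int, flags),
	/* FIELD(struct pair, range) would not compile: a struct cannot be
	 * the target of a cast from -1. */
	FIELD_STRUCT(struct pair, range),
};

static const struct field_desc *demo_field(int idx)
{
	assert(idx >= 0 && idx < 3);
	return &demo_fields[idx];
}
```

Here `demo_field(0)` describes a signed field, `demo_field(1)` an unsigned one, and `demo_field(2)` a struct-typed field that carries only its name and size.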
     

24 Jun, 2014

5 commits

  • A 'softlockup' is defined as a bug that causes the kernel to loop in
    kernel mode for more than a predefined period of time, without giving
    other tasks a chance to run.

    Currently, upon detection of this condition by the per-cpu watchdog
    task, debug information (including a stack trace) is sent to the system
    log.

    On some occasions, we have observed that the "victim" rather than the
    actual "culprit" (i.e. the owner/holder of the contended resource) is
    reported to the user. Often this information has proven to be
    insufficient to assist debugging efforts.

    To avoid loss of useful debug information, for architectures which
    support NMI, this patch makes it possible to improve soft lockup
    reporting. This is accomplished by issuing an NMI to each cpu to obtain
    a stack trace.

    If NMI is not supported we just revert to the old method. A sysctl
    and a boot-time parameter are available to toggle this feature.

    [dzickus@redhat.com: add CONFIG_SMP in certain areas]
    [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
    [mq@suse.cz: fix warning]
    Signed-off-by: Aaron Tomlin
    Signed-off-by: Don Zickus
    Cc: David S. Miller
    Cc: Mateusz Guzik
    Cc: Oleg Nesterov
    Signed-off-by: Jan Moskyto Matejka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aaron Tomlin
     
  • Oleg reports a division by zero error on zero-length write() to the
    percpu_pagelist_fraction sysctl:

    divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
    RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
    RSP: 0018:ffff8800d87a3e78 EFLAGS: 00010246
    RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
    RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
    R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
    R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
    FS: 00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
    Call Trace:
    proc_sys_call_handler+0xb3/0xc0
    proc_sys_write+0x14/0x20
    vfs_write+0xba/0x1e0
    SyS_write+0x46/0xb0
    tracesys+0xe1/0xe6

    However, if the percpu_pagelist_fraction sysctl is set by the user, it
    is also impossible to restore it to the kernel default since the user
    cannot write 0 to the sysctl.

    This patch allows the user to write 0 to restore the default behavior.
    It still requires a fraction equal to or larger than 8, however, as
    stated by the documentation for sanity. If a value in the range [1, 7]
    is written, the sysctl will return EINVAL.

    This successfully solves the divide by zero issue at the same time.

    Signed-off-by: David Rientjes
    Reported-by: Oleg Drokin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
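
The validation rule described in this commit message (0 restores the default, 1 to 7 is rejected, 8 or more is accepted) can be sketched as follows. This is a user-space model under assumptions: `validate_pagelist_fraction()` and `pageset_high()` are hypothetical helpers, the constant name and the `default_high` parameter are illustrative, and the division only shows where a zero fraction would previously have caused the divide error.

```c
#include <assert.h>
#include <errno.h>

/* Smallest non-zero fraction accepted, per the commit text. */
#define MIN_PERCPU_PAGELIST_FRACTION 8

/* Returns 0 if the written value is acceptable, -EINVAL otherwise. */
static int validate_pagelist_fraction(int value)
{
	if (value == 0)
		return 0;	/* 0 means "restore the kernel default" */
	if (value < MIN_PERCPU_PAGELIST_FRACTION)
		return -EINVAL;	/* covers 1..7 and negative values */
	return 0;
}

/* Hypothetical computation of the per-cpu high mark from a validated
 * fraction. The division is skipped when the fraction is 0. */
static long pageset_high(unsigned long managed_pages, int fraction,
			 long default_high)
{
	assert(validate_pagelist_fraction(fraction) == 0);
	if (fraction == 0)
		return default_high;
	return (long)(managed_pages / (unsigned long)fraction);
}
```

Because 0 is handled before any division takes place, the same check that lets the user restore the default also removes the divide-by-zero path.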
     
  • Peter Wu noticed the following splat on his machine when updating
    /proc/sys/kernel/watchdog_thresh:

    BUG: sleeping function called from invalid context at mm/slub.c:965
    in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
    3 locks held by init/1:
    #0: (sb_writers#3){.+.+.+}, at: [] vfs_write+0x143/0x180
    #1: (watchdog_proc_mutex){+.+.+.}, at: [] proc_dowatchdog+0x33/0x110
    #2: (cpu_hotplug.lock){.+.+.+}, at: [] get_online_cpus+0x32/0x80
    Preemption disabled at:[] proc_dowatchdog+0xe4/0x110

    CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    dump_stack+0x4e/0x7a
    __might_sleep+0x11d/0x190
    kmem_cache_alloc_trace+0x4e/0x1e0
    perf_event_alloc+0x55/0x440
    perf_event_create_kernel_counter+0x26/0xe0
    watchdog_nmi_enable+0x75/0x140
    update_timers_all_cpus+0x53/0xa0
    proc_dowatchdog+0xe4/0x110
    proc_sys_call_handler+0xb3/0xc0
    proc_sys_write+0x14/0x20
    vfs_write+0xad/0x180
    SyS_write+0x49/0xb0
    system_call_fastpath+0x16/0x1b
    NMI watchdog: disabled (cpu0): hardware events not enabled

    What happened is after updating the watchdog_thresh, the lockup detector
    is restarted to utilize the new value. Part of this process involved
    disabling preemption. Once preemption was disabled, perf tried to
    allocate a new event (as part of the restart). This caused the above
    BUG_ON as you can't sleep with preemption disabled.

    The preemption restriction seemed aggressive as we are not doing anything
    on that particular cpu, but with all the online cpus (which are
    protected by the get_online_cpus lock). Remove the restriction and the
    BUG_ON goes away.

    Signed-off-by: Don Zickus
    Acked-by: Michal Hocko
    Reported-by: Peter Wu
    Tested-by: Peter Wu
    Acked-by: David Rientjes
    Cc: [3.13+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Zickus
     
  • To allow filtering of huge pages, makedumpfile must be able to identify
    them in the dump. This can be done by checking the appropriate page
    flag, so communicate its value to makedumpfile through the VMCOREINFO
    interface.

    There's only one small catch. Depending on how many page flags are
    available on a given architecture, this bit can be called PG_head or
    PG_compound.

    I sent a similar patch back in 2012, but Eric Biederman did not like
    using an #ifdef. So, this time I'm adding a common symbol
    (PG_head_mask) instead.

    See https://lkml.org/lkml/2012/11/28/91 for the previous version.

    Signed-off-by: Petr Tesarik
    Acked-by: Vivek Goyal
    Cc: Eric Biederman
    Cc: Paul Mackerras
    Cc: Fengguang Wu
    Cc: Benjamin Herrenschmidt
    Cc: Shaohua Li
    Cc: Alexey Kardashevskiy
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Tesarik
     
  • There is a race between the CPU offline code (within stop-machine) and
    the smp-call-function code, which can lead to getting IPIs on the
    outgoing CPU, *after* it has gone offline.

    Specifically, this can happen when using
    smp_call_function_single_async() to send the IPI, since this API allows
    sending asynchronous IPIs from IRQ disabled contexts. The exact race
    condition is described below.

    During CPU offline, in stop-machine, we don't enforce any rule in the
    _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
    the other CPUs disable their local interrupts. Due to this, we can
    encounter a situation in which an IPI is sent by one of the other CPUs
    to the outgoing CPU (while it is *still* online), but the outgoing CPU
    ends up noticing it only *after* it has gone offline.

    CPU 1                                    CPU 2
    (Online CPU)                             (CPU going offline)

    Enter _PREPARE stage                     Enter _PREPARE stage

                                             Enter _DISABLE_IRQ stage

                                           =
    Got a device interrupt, and            | Didn't notice the IPI
    the interrupt handler sent an          | since interrupts were
    IPI to CPU 2 using                     | disabled on this CPU.
    smp_call_function_single_async()       |
                                           =

    Enter _DISABLE_IRQ stage

    Enter _RUN stage                         Enter _RUN stage

                                           =
    Busy loop with interrupts              | Invoke take_cpu_down()
    disabled.                              | and take CPU 2 offline
                                           =

    Enter _EXIT stage                        Enter _EXIT stage

    Re-enable interrupts                     Re-enable interrupts

                                             The pending IPI is noted
                                             immediately, but alas,
                                             the CPU is offline at
                                             this point.

    This, of course, makes the smp-call-function IPI handler code running on
    CPU 2 unhappy and it complains about "receiving an IPI on an offline
    CPU".

    One real example of the scenario on CPU 1 is the block layer's
    complete-request call-path:

    __blk_complete_request() [interrupt-handler]
        raise_blk_irq()
            smp_call_function_single_async()

    However, if we look closely, the block layer does check that the target
    CPU is online before firing the IPI. So in this case, it is actually
    the unfortunate ordering/timing of events in the stop-machine phase that
    leads to receiving IPIs after the target CPU has gone offline.

    In reality, getting a late IPI on an offline CPU is not too bad by
    itself (this can happen even due to hardware latencies in IPI
    send-receive). It is a bug only if the target CPU really went offline
    without executing all the callbacks queued on its list. (Note that a
    CPU is free to execute its pending smp-call-function callbacks in a
    batch, without waiting for the corresponding IPIs to arrive for each one
    of those callbacks).

    So, fixing this issue can be broken up into two parts:

    1. Ensure that a CPU goes offline only after executing all the
    callbacks queued on it.

    2. Modify the warning condition in the smp-call-function IPI handler
    code such that it warns only if an offline CPU got an IPI *and* that
    CPU had gone offline with callbacks still pending in its queue.

    Achieving part 1 is straight-forward - just flush (execute) all the
    queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
    including those callbacks for which the source CPU's IPIs might not have
    been received on the outgoing CPU yet. Once we do this, an IPI that
    arrives late on the CPU going offline (either due to the race mentioned
    above, or due to hardware latencies) will be completely harmless, since
    the outgoing CPU would have executed all the queued callbacks before
    going offline.

    Overall, this fix (parts 1 and 2 put together) additionally guarantees
    that we will see a warning only when the *IPI-sender code* is buggy -
    that is, if it queues the callback _after_ the target CPU has gone
    offline.

    [1]. The CPU_DYING part needs a little more explanation: by the time we
    execute the CPU_DYING notifier callbacks, the CPU would have already
    been marked offline. But we want to flush out the pending callbacks at
    this stage, ignoring the fact that the CPU is offline. So restructure
    the IPI handler code so that we can by-pass the "is-cpu-offline?" check
    in this particular case. (Of course, the right solution here is to fix
    CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING
    notifiers, but this requires a lot of audit to ensure that this change
    doesn't break any existing code; hence let's go with the solution
    proposed above until that is done).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Srivatsa S. Bhat
    Suggested-by: Frederic Weisbecker
    Cc: "Paul E. McKenney"
    Cc: Borislav Petkov
    Cc: Christoph Hellwig
    Cc: Frederic Weisbecker
    Cc: Gautham R Shenoy
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Mike Galbraith
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Tested-by: Sachin Kamat
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
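
The two-part fix in this commit message can be modelled with one flush routine that takes a flag saying whether an offline CPU should trigger a warning. The sketch below is a user-space model only: `struct cpu_model`, `flush_queue()`, `cpu_dying()`, `ipi_handler()` and `bump()` are hypothetical names, and the real code works on per-cpu lists rather than a plain linked list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct callback {
	void (*func)(void *arg);
	void *arg;
	struct callback *next;
};

struct cpu_model {
	bool online;
	struct callback *queue;	/* callbacks queued for this CPU */
	int warnings;		/* how often the warning fired */
};

/* Run every queued callback. Warn only if the CPU is offline *and*
 * callbacks were still pending (part 2 of the fix). */
static int flush_queue(struct cpu_model *cpu, bool warn_if_offline)
{
	struct callback *cb = cpu->queue;
	int ran = 0;

	cpu->queue = NULL;
	if (warn_if_offline && !cpu->online && cb != NULL)
		cpu->warnings++;

	while (cb != NULL) {
		struct callback *next = cb->next;

		assert(cb->func != NULL);
		cb->func(cb->arg);
		cb = next;
		ran++;
	}
	return ran;
}

/* Part 1: flush from the dying path, bypassing the offline check. */
static int cpu_dying(struct cpu_model *cpu)
{
	return flush_queue(cpu, false);
}

/* Normal IPI path: a late IPI with an empty queue is harmless. */
static int ipi_handler(struct cpu_model *cpu)
{
	return flush_queue(cpu, true);
}

/* Example callback for experiments: increments the int it is given. */
static void bump(void *arg)
{
	(*(int *)arg)++;
}
```

In this model, a late IPI that arrives after `cpu_dying()` finds an empty queue and stays silent. The warning fires only when something was queued on a CPU that is already offline, which points at the sender.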
     

22 Jun, 2014

2 commits

  • Pull perf fixes from Ingo Molnar:
    "This is larger than usual: the main reason is the ARM symbol lookup
    speedups that came in late and were hard to resist.

    There's also a kprobes fix and various tooling fixes, plus the minimal
    re-enablement of the mmap2 support interface"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    x86/kprobes: Fix build errors and blacklist context_track_user
    perf tests: Add test for closing dso objects on EMFILE error
    perf tests: Add test for caching dso file descriptors
    perf tests: Allow reuse of test_file function
    perf tests: Spawn child for each test
    perf tools: Add dso__data_* interface descriptons
    perf tools: Allow to close dso fd in case of open failure
    perf tools: Add file size check and factor dso__data_read_offset
    perf tools: Cache dso data file descriptor
    perf tools: Add global count of opened dso objects
    perf tools: Add global list of opened dso objects
    perf tools: Add data_fd into dso object
    perf tools: Separate dso data related variables
    perf tools: Cache register accesses for unwind processing
    perf record: Fix to honor user freq/interval properly
    perf timechart: Reflow documentation
    perf probe: Improve error messages in --line option
    perf probe: Improve an error message of perf probe --vars mode
    perf probe: Show error code and description in verbose mode
    perf probe: Improve error message for unknown member of data structure
    ...

    Linus Torvalds
     
  • …nux/kernel/git/tip/tip

    Pull rtmutex fixes from Thomas Gleixner:
    "Another three patches to make the rtmutex code more robust. That's
    the last urgent fallout from the big futex/rtmutex investigation"

    * 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    rtmutex: Plug slow unlock race
    rtmutex: Detect changes in the pi lock chain
    rtmutex: Handle deadlock detection smarter

    Linus Torvalds
     

21 Jun, 2014

3 commits

  • syscall_regfunc() ignores the kernel threads because "it has no effect",
    see cc3b13c1 "Don't trace kernel thread syscalls" which added this check.

    However, this means that a user-space task spawned by call_usermodehelper()
    will run without TIF_SYSCALL_TRACEPOINT if sys_tracepoint_refcount != 0.

    Remove this check. The unnecessary report from ret_from_fork path mentioned
    by cc3b13c1 is no longer possible, see commit fb45550d76bb5 "make sure
    that kernel_thread() callbacks call do_exit() themselves".

    A kernel_thread() callback can only return and take the int_ret_from_sys_call
    path after do_execve() succeeds, otherwise the kernel will crash. But in this
    case it is no longer a kernel thread and thus it needs TIF_SYSCALL_TRACEPOINT.

    Link: http://lkml.kernel.org/p/20140413185938.GD20668@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • 1. Remove _irqsafe from syscall_regfunc/syscall_unregfunc,
    read_lock(tasklist) doesn't need to disable irqs.

    2. Change this code to avoid the deprecated do_each_thread()
    and use for_each_process_thread() (stolen from the patch
    from Frederic).

    3. Change syscall_regfunc() to check PF_KTHREAD to skip
    the kernel threads, ->mm != NULL is the common mistake.

    Note: probably this check should be simply removed, needs
    another patch.

    [fweisbec@gmail.com: s/do_each_thread/for_each_process_thread/]
    Link: http://lkml.kernel.org/p/20140413185918.GC20668@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • syscall_regfunc() and syscall_unregfunc() should set/clear
    TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
    with copy_process() and miss the new child which was not added to
    the process/thread lists yet.

    Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
    under tasklist.

    Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com

    Cc: stable@vger.kernel.org # 2.6.33
    Fixes: a871bd33a6c0 "tracing: Add syscall tracepoints"
    Acked-by: Frederic Weisbecker
    Acked-by: Paul E. McKenney
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     

20 Jun, 2014

2 commits

  • Pull ACPI and power management fixes from Rafael Wysocki:
    "These are fixes mostly (ia64 regression related to the ACPI
    enumeration of devices, cpufreq regressions, fix for I2C controllers
    included in Intel SoCs, mvebu cpuidle driver fix related to sysfs)
    plus additional kernel command line arguments from Kees to make it
    possible to build kernel images with hibernation and the kernel
    address space randomization included simultaneously, a new ACPI
    battery driver quirk for a system with a broken BIOS and a couple of
    ACPI core cleanups.

    Specifics:

    - Fix for an ia64 regression introduced during the 3.11 cycle by a
    commit that modified the hardware initialization ordering and made
    device discovery fail on some systems.

    - Fix for a build problem on systems where the cpufreq-cpu0 driver is
    built-in and the cpu-thermal driver is modular from Arnd Bergmann.

    - Fix for a recently introduced computational mistake in the
    intel_pstate driver that leads to excessive rounding errors from
    Doug Smythies.

    - Fix for a failure code path in cpufreq_update_policy() that fails
    to unlock the locks acquired previously from Aaron Plattner.

    - Fix for the cpuidle mvebu driver to use shorter state names which
    will prevent the sysfs interface from returning mangled strings.
    From Gregory Clement.

    - ACPI LPSS driver fix to make sure that the I2C controllers included
    in BayTrail SoCs are not held in the reset state while they are
    being probed from Mika Westerberg.

    - New kernel command line arguments making it possible to build
    kernel images with hibernation and kASLR included at the same time
    and to select which of them will be used via the command line (they
    are still functionally mutually exclusive, though). From Kees
    Cook.

    - ACPI battery driver quirk for Acer Aspire V5-573G that fails to
    send battery status change notifications in a timely manner, from
    Alexander Mezin.

    - Two ACPI core cleanups from Christoph Jaeger and Fabian Frederick"

    * tag 'pm+acpi-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    cpuidle: mvebu: Fix the name of the states
    cpufreq: unlock when failing cpufreq_update_policy()
    intel_pstate: Correct rounding in busy calculation
    ACPI: use kstrto*() instead of simple_strto*()
    ACPI / processor replace __attribute__((packed)) by __packed
    ACPI / battery: add quirk for Acer Aspire V5-573G
    ACPI / battery: use callback for setting up quirks
    ACPI / LPSS: Take I2C host controllers out of reset
    x86, kaslr: boot-time selectable with hibernation
    PM / hibernate: introduce "nohibernate" boot parameter
    cpufreq: cpufreq-cpu0: fix CPU_THERMAL dependency
    ACPI / ia64 / sba_iommu: Restore the working initialization ordering

    Linus Torvalds
     
  • Pull sparc fixes from David Miller:
    "Sparc sparse fixes from Sam Ravnborg"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (67 commits)
    sparc64: fix sparse warnings in int_64.c
    sparc64: fix sparse warning in ftrace.c
    sparc64: fix sparse warning in kprobes.c
    sparc64: fix sparse warning in kgdb_64.c
    sparc64: fix sparse warnings in compat_audit.c
    sparc64: fix sparse warnings in init_64.c
    sparc64: fix sparse warnings in aes_glue.c
    sparc: fix sparse warnings in smp_32.c + smp_64.c
    sparc64: fix sparse warnings in perf_event.c
    sparc64: fix sparse warnings in kprobes.c
    sparc64: fix sparse warning in tsb.c
    sparc64: clean up compat_sigset_t.seta handling
    sparc64: fix sparse "Should it be static?" warnings in signal32.c
    sparc64: fix sparse warnings in sys_sparc32.c
    sparc64: fix sparse warning in pci.c
    sparc64: fix sparse warnings in smp_64.c
    sparc64: fix sparse warning in prom_64.c
    sparc64: fix sparse warning in btext.c
    sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c
    sparc64: fix sparse warning in process_64.c
    ...

    Conflicts:
    arch/sparc/include/asm/pgtable_64.h

    Linus Torvalds
     

17 Jun, 2014

2 commits

  • Changes kASLR from being compile-time selectable (blocked by
    CONFIG_HIBERNATION), to being boot-time selectable (with hibernation
    available by default) via the "kaslr" kernel command line.

    Signed-off-by: Kees Cook
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Kees Cook
     
  • To support using kernel features that are not compatible with hibernation,
    this creates the "nohibernate" kernel boot parameter to disable both
    hibernation and resume. This allows hibernation support to be a boot-time
    choice instead of only a compile-time choice.

    Signed-off-by: Kees Cook
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Kees Cook
     

16 Jun, 2014

1 commit

  • When the rtmutex fast path is enabled the slow unlock function can
    create the following situation:

    spin_lock(foo->m->wait_lock);
    foo->m->owner = NULL;
    rt_mutex_lock(foo->m); [...] refcnt);
    rt_mutex_unlock(foo->m); [...] m->wait_lock); [...] owner */
    clear_rt_mutex_waiters(m);
    owner = rt_mutex_owner(m);
    spin_unlock(m->wait_lock);
    if (cmpxchg(m->owner, owner, 0) == owner)
    return;
    spin_lock(m->wait_lock);
    }

    So in case of a new waiter incoming while the owner tries the slow
    path unlock we have two situations:

    unlock(wait_lock);
                                        lock(wait_lock);
    cmpxchg(p, owner, 0) == owner
                                        mark_rt_mutex_waiters(lock);
                                        acquire(lock);

    Or:

    unlock(wait_lock);
                                        lock(wait_lock);
                                        mark_rt_mutex_waiters(lock);
    cmpxchg(p, owner, 0) != owner
                                        enqueue_waiter();
                                        unlock(wait_lock);
    lock(wait_lock);
    wakeup_next_waiter();
    unlock(wait_lock);
                                        lock(wait_lock);
                                        acquire(lock);

    If the fast path is disabled, then the simple

    m->owner = NULL;
    unlock(m->wait_lock);

    is sufficient as all access to m->owner is serialized via
    m->wait_lock;

    Also document and clarify the wakeup_next_waiter function as suggested
    by Oleg Nesterov.

    Reported-by: Steven Rostedt
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Steven Rostedt
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
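
The unlock scheme discussed in this commit message (clear the waiters bit, drop wait_lock, then try to release ownership with a compare-and-exchange) can be sketched with C11 atomics. This is a user-space model under assumptions: the loop condition is a guess, since the loop header is missing from the text above, and `struct model_mutex`, `HAS_WAITERS` and all function names are hypothetical, not the rtmutex implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HAS_WAITERS ((uintptr_t)1)	/* low bit of the owner word */

struct model_mutex {
	atomic_flag wait_lock;		/* stands in for the wait_lock spinlock */
	_Atomic uintptr_t owner;	/* owner id, low bit flags pending waiters */
	int nr_waiters;			/* only touched with wait_lock held */
};

static void model_init(struct model_mutex *m, uintptr_t owner, int nr_waiters)
{
	atomic_flag_clear(&m->wait_lock);
	atomic_init(&m->owner, owner);
	m->nr_waiters = nr_waiters;
}

static void lock_wait_lock(struct model_mutex *m)
{
	while (atomic_flag_test_and_set_explicit(&m->wait_lock,
						 memory_order_acquire))
		;	/* spin until the flag is free */
}

static void unlock_wait_lock(struct model_mutex *m)
{
	atomic_flag_clear_explicit(&m->wait_lock, memory_order_release);
}

/*
 * Called with wait_lock held. Returns true if the mutex was released with
 * no waiters (wait_lock has been dropped), false if a waiter is queued and
 * must be woken by the caller (wait_lock is still held).
 */
static bool slow_unlock_sketch(struct model_mutex *m)
{
	assert(atomic_load(&m->owner) != 0);

	while (m->nr_waiters == 0) {
		/* Clear the waiters bit in the owner word. */
		uintptr_t owner = atomic_load(&m->owner) & ~HAS_WAITERS;

		atomic_store(&m->owner, owner);
		unlock_wait_lock(m);

		/* The structure is not touched again if this succeeds. */
		if (atomic_compare_exchange_strong(&m->owner, &owner,
						   (uintptr_t)0))
			return true;

		/* A new waiter changed the owner word; retry under the lock. */
		lock_wait_lock(m);
	}
	return false;
}
```

The point of the ordering is that wait_lock is released before ownership is given up, so the releasing task never touches the lock structure after another task may have acquired and freed it.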
     

14 Jun, 2014

2 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • This essentially reverts commit:

    ecd50f714c42 ("kprobes, x86: Call exception_enter after kprobes handled")

    since it causes build errors with CONFIG_CONTEXT_TRACKING and was
    based on a misunderstanding: context_track_user_*() does very little
    in interrupt context; it just returns if in_interrupt() is true.

    Instead of changing do_debug/int3(), this just adds
    context_track_user_*() to the kprobes blacklist, since those functions
    can still be called right before kprobes handles int3 and debug
    exceptions, and probing them would cause an infinite loop.

    Reported-by: Frederic Weisbecker
    Signed-off-by: Masami Hiramatsu
    Cc: Borislav Petkov
    Cc: Kees Cook
    Cc: Jiri Kosina
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Seiji Aguchi
    Cc: Andrew Morton
    Cc: Kees Cook
    Link: http://lkml.kernel.org/r/20140614064711.7865.45957.stgit@kbuild-fedora.novalocal
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

13 Jun, 2014

6 commits

  • Pull tracing cleanups and bugfixes from Steven Rostedt:
    "One bug fix that goes back to 3.10. Accessing a non existent buffer
    if "possible cpus" is greater than actual CPUs (including offline
    CPUs).

    Namhyung Kim did some reviews of the patches I sent this merge window
    and found a memory leak and had a few clean ups"

    * tag 'trace-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix check of ftrace_trace_arrays list_empty() check
    tracing: Fix leak of per cpu max data in instances
    tracing: Cleanup saved_cmdlines_size changes
    ring-buffer: Check if buffer exists before polling

    Linus Torvalds
     
  • Pull more scheduler updates from Ingo Molnar:
    "Second round of scheduler changes:
    - try-to-wakeup and IPI reduction speedups, from Andy Lutomirski
    - continued power scheduling cleanups and refactorings, from Nicolas
    Pitre
    - misc fixes and enhancements"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/deadline: Delete extraneous extern for to_ratio()
    sched/idle: Optimize try-to-wake-up IPI
    sched/idle: Simplify wake_up_idle_cpu()
    sched/idle: Clear polling before descheduling the idle thread
    sched, trace: Add a tracepoint for IPI-less remote wakeups
    cpuidle: Set polling in poll_idle
    sched: Remove redundant assignment to "rt_rq" in update_curr_rt(...)
    sched: Rename capacity related flags
    sched: Final power vs. capacity cleanups
    sched: Remove remaining dubious usage of "power"
    sched: Let 'struct sched_group_power' care about CPU capacity
    sched/fair: Disambiguate existing/remaining "capacity" usage
    sched/fair: Change "has_capacity" to "has_free_capacity"
    sched/fair: Remove "power" from 'struct numa_stats'
    sched: Fix signedness bug in yield_to()
    sched/fair: Use time_after() in record_wakee()
    sched/balancing: Reduce the rate of needless idle load balancing
    sched/fair: Fix unlocked reads of some cfs_b->quota/period

    Linus Torvalds
     
  • Pull more perf updates from Ingo Molnar:
    "A second round of perf updates:

    - wide reaching kprobes sanitization and robustization, with the hope
    of fixing all 'probe this function crashes the kernel' bugs, by
    Masami Hiramatsu.

    - uprobes updates from Oleg Nesterov: tmpfs support, corner case
    fixes and robustization work.

    - perf tooling updates and fixes from Jiri Olsa, Namhyung Kim, Arnaldo
    et al:
    * Add support to accumulate hist periods (Namhyung Kim)
    * various fixes, refactorings and enhancements"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
    perf: Differentiate exec() and non-exec() comm events
    perf: Fix perf_event_comm() vs. exec() assumption
    uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates
    perf/documentation: Add description for conditional branch filter
    perf/x86: Add conditional branch filtering support
    perf/tool: Add conditional branch filter 'cond' to perf record
    perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND'
    uprobes: Teach copy_insn() to support tmpfs
    uprobes: Shift ->readpage check from __copy_insn() to uprobe_register()
    perf/x86: Use common PMU interrupt disabled code
    perf/ARM: Use common PMU interrupt disabled code
    perf: Disable sampled events if no PMU interrupt
    perf: Fix use after free in perf_remove_from_context()
    perf tools: Fix 'make help' message error
    perf record: Fix poll return value propagation
    perf tools: Move elide bool into perf_hpp_fmt struct
    perf tools: Remove elide setup for SORT_MODE__MEMORY mode
    perf tools: Fix "==" into "=" in ui_browser__warning assignment
    perf tools: Allow overriding sysfs and proc finding with env var
    perf tools: Consider header files outside perf directory in tags target
    ...

    Linus Torvalds
     
  • Pull more locking changes from Ingo Molnar:
    "This is the second round of locking tree updates for v3.16, offering
    large system scalability improvements:

    - optimistic spinning for rwsems, from Davidlohr Bueso.

    - 'qrwlocks' core code and x86 enablement, from Waiman Long and PeterZ"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, locking/rwlocks: Enable qrwlocks on x86
    locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks
    locking/mutexes: Documentation update/rewrite
    locking/rwsem: Fix checkpatch.pl warnings
    locking/rwsem: Fix warnings for CONFIG_RWSEM_GENERIC_SPINLOCK
    locking/rwsem: Support optimistic spinning

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

    2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Bennieston.

    3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

    4) BPF now has a "random" opcode, from Chema Gonzalez.

    5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

    6) Support TCP fastopen over ipv6, from Daniel Lee.

    7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.

    8) Support software TSO in fec driver too, from Nimrod Andy.

    9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

    10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

    11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

    12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

    13) Support busy polling in SCTP, from Neil Horman.

    14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

    15) Bridge promisc mode handling improvements from Vlad Yasevich.

    16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
    rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
    tcp: fixing TLP's FIN recovery
    net: fec: Add software TSO support
    net: fec: Add Scatter/gather support
    net: fec: Increase buffer descriptor entry number
    net: fec: Factorize feature setting
    net: fec: Enable IP header hardware checksum
    net: fec: Factorize the .xmit transmit function
    bridge: fix compile error when compiling without IPv6 support
    bridge: fix smatch warning / potential null pointer dereference
    via-rhine: fix full-duplex with autoneg disable
    bnx2x: Enlarge the dorq threshold for VFs
    bnx2x: Check for UNDI in uncommon branch
    bnx2x: Fix 1G-baseT link
    bnx2x: Fix link for KR with swapped polarity lane
    sctp: Fix sk_ack_backlog wrap-around problem
    net/core: Add VF link state control policy
    net/fsl: xgmac_mdio is dependent on OF_MDIO
    net/fsl: Make xgmac_mdio read error message useful
    net_sched: drr: warn when qdisc is not work conserving
    ...

    Linus Torvalds
     
  • Pull more ACPI and power management updates from Rafael Wysocki:
    "These are fixups on top of the previous PM+ACPI pull request,
    regression fixes (ACPI hotplug, cpufreq ppc-corenet), other bug fixes
    (ACPI reset, cpufreq), new PM trace points for system suspend
    profiling and a copyright notice update.

    Specifics:

    - I misremembered: Hans de Goede's ACPI video patches did not
    actually flip the video.use_native_backlight default, although we
    had discussed that and decided to do it. Since I said we would do
    that in the previous PM+ACPI pull request, make that change for
    real now.

    - ACPI bus check notifications for PCI host bridges don't cause the
    bus below the host bridge to be checked for changes as they should
    because of a mistake in the ACPI-based PCI hotplug (ACPIPHP)
    subsystem that forgets to add hotplug contexts to PCI host bridge
    ACPI device objects. Create hotplug contexts for PCI host bridges
    too as appropriate.

    - Revert recent cpufreq commit related to the big.LITTLE cpufreq
    driver that breaks arm64 builds.

    - Fix for a regression in the ppc-corenet cpufreq driver introduced
    during the 3.15 cycle and causing the driver to use the remainder
    from do_div instead of the quotient. From Ed Swarthout.

    - Resets triggered by panic activate a BUG_ON() in vmalloc.c on
    systems where the ACPI reset register is located in memory address
    space. Fix from Randy Wright.

    - Fix for a problem with cpufreq governors whose decisions may be
    suboptimal because they use deferrable timers for CPU load
    sampling. From Srivatsa S Bhat.

    - Fix for a problem with the Tegra cpufreq driver where the CPU
    frequency is temporarily switched to a "stable" level that is
    different from both the initial and target frequencies during
    transitions which causes udelay() to expire earlier than it should
    sometimes. From Viresh Kumar.

    - New trace points and rework of some existing trace points for
    system suspend/resume profiling from Todd Brandt.

    - Assorted cpufreq fixes and cleanups from Stratos Karafotis and
    Viresh Kumar.

    - Copyright notice update for suspend-and-cpuhotplug.txt from
    Srivatsa S Bhat"

    * tag 'pm+acpi-3.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / hotplug / PCI: Add hotplug contexts to PCI host bridges
    PM / sleep: trace events for device PM callbacks
    cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR
    cpufreq: tegra: update comment for clarity
    cpufreq: intel_pstate: Remove duplicate CPU ID check
    cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag
    PM / Documentation: Update copyright in suspend-and-cpuhotplug.txt
    cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info'
    cpufreq: governor: Be friendly towards latency-sensitive bursty workloads
    PM / sleep: trace events for suspend/resume
    cpufreq: ppc-corenet-cpu-freq: do_div use quotient
    Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64"
    cpufreq: Tegra: implement intermediate frequency callbacks
    cpufreq: add support for intermediate (stable) frequencies
    ACPI / video: Change the default for video.use_native_backlight to 1
    ACPI: Fix bug when ACPI reset register is implemented in system memory

    Linus Torvalds
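
The ppc-corenet item above turns on how do_div() reports its results: the kernel macro stores the quotient back into its first argument and returns the remainder. The sketch below is a user-space model of that convention, not the driver code; model_do_div(), ratio_buggy() and ratio_fixed() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of the do_div() convention:
 * *n is replaced by the quotient, the remainder is returned.
 * (The real do_div() is a macro that takes an lvalue, not a pointer.)
 */
static uint32_t model_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Mistaken pattern: uses the return value, which is the remainder. */
static uint64_t ratio_buggy(uint64_t num, uint32_t den)
{
    return model_do_div(&num, den);
}

/* Intended pattern: ignores the return value and reads the updated operand. */
static uint64_t ratio_fixed(uint64_t num, uint32_t den)
{
    model_do_div(&num, den);
    return num;
}
```

With these definitions, dividing 1000 by 300 gives 100 from ratio_buggy() (the remainder) and 3 from ratio_fixed() (the quotient).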
     

12 Jun, 2014

3 commits

  • Fix this dependency on the locking tree's smp_mb*() API changes:

    kernel/sched/idle.c:247:3: error: implicit declaration of function ‘smp_mb__after_atomic’ [-Werror=implicit-function-declaration]

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • * pm-sleep:
    PM / sleep: trace events for device PM callbacks
    PM / sleep: trace events for suspend/resume

    Rafael J. Wysocki
     
  • Pull module updates from Rusty Russell:
    "Most of this is cleaning up various driver sysfs permissions so we can
    re-add the perm check (we unified the module param and sysfs checks,
    but the module ones were stronger so we weakened them temporarily).

    Param parsing gets documented, and also "--" now forces args to be
    handed to init (and ignored by the kernel).

    Module NX/RO protections get tightened: we now set them before calling
    parse_args()"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    module: set nx before marking module MODULE_STATE_COMING.
    samples/kobject/: avoid world-writable sysfs files.
    drivers/hid/hid-picolcd_fb: avoid world-writable sysfs files.
    drivers/staging/speakup/: avoid world-writable sysfs files.
    drivers/regulator/virtual: avoid world-writable sysfs files.
    drivers/scsi/pm8001/pm8001_ctl.c: avoid world-writable sysfs files.
    drivers/hid/hid-lg4ff.c: avoid world-writable sysfs files.
    drivers/video/fbdev/sm501fb.c: avoid world-writable sysfs files.
    drivers/mtd/devices/docg3.c: avoid world-writable sysfs files.
    speakup: fix incorrect perms on speakup_acntsa.c
    cpumask.h: silence warning with -Wsign-compare
    Documentation: Update kernel-parameters.tx
    param: hand arguments after -- straight to init
    modpost: Fix resource leak in read_dump()

    Linus Torvalds
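
As a rough illustration of the "--" behaviour described in the module pull above, the following sketch finds the first standalone "--" token in a command line and returns everything after it, which is the part that would be handed to init. It is a simplification under stated assumptions: args_after_double_dash() is a hypothetical helper, tokens are split on spaces only, and quoting is not handled as the kernel's real parser does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the text following the first standalone "--"
 * token in cmdline, or NULL if there is no such token.
 * Hypothetical helper; splits on spaces only and ignores quoting.
 */
static const char *args_after_double_dash(const char *cmdline)
{
    const char *p = cmdline;

    while (*p) {
        size_t len;

        while (*p == ' ')
            p++;                      /* skip separators */
        len = strcspn(p, " ");        /* length of the current token */
        if (len == 2 && p[0] == '-' && p[1] == '-') {
            p += 2;
            while (*p == ' ')
                p++;                  /* skip separators after "--" */
            return p;                 /* remainder goes to init */
        }
        p += len;
    }
    return NULL;
}
```

For "root=/dev/sda1 quiet -- single debug" the helper returns "single debug"; a string such as "foo--bar" contains no standalone "--" token and yields NULL.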
     

11 Jun, 2014

4 commits

  • Merge leftovers from Andrew Morton:
    "A few leftovers: ocfs2, gcov, RTC"

    * emailed patches from Andrew Morton:
    rtc: s5m: consolidate two device type switch statements
    rtc: s5m: add support for S2MPS14 RTC
    rtc: s5m: support different register layout
    rtc: s5m: use shorter time of register update
    rtc: s5m: remove undocumented time init on first boot
    mfd/rtc: sec/s5m: rename SEC* symbols to S5M
    gcov: add support for GCC 4.9
    ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one

    Linus Torvalds
     
  • This patch handles the gcov-related changes in GCC 4.9:

    A new counter (time profile) is added. The total number is 9 now.

    A new profile merge function __gcov_merge_time_profile is added.

    See gcc/gcov-io.h and libgcc/libgcov-merge.c

    For the first change, the layout of struct gcov_info is affected.

    For the second one, a dummy function is added to kernel/gcov/base.c
    similarly.

    Signed-off-by: Yuan Pengfei
    Acked-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yuan Pengfei
     
  • The kernel has no concept of capabilities with respect to inodes; inodes
    exist independently of namespaces. For example, inode_capable(inode,
    CAP_LINUX_IMMUTABLE) would be nonsense.

    This patch changes inode_capable to check for uid and gid mappings and
    renames it to capable_wrt_inode_uidgid, which should make it more
    obvious what it does.

    Fixes CVE-2014-4014.

    Cc: Theodore Ts'o
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Chinner
    Cc: stable@vger.kernel.org
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
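
The check described in the commit above can be pictured as "the caller holds the capability in its user namespace, and the inode's uid and gid are both mapped into that namespace". The sketch below is a simplified user-space model of that idea, not the kernel implementation: all type and function names are hypothetical, each ID map is reduced to a single contiguous range, and the has_cap flag stands in for the namespace capability check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one contiguous range of kernel IDs mapped into a namespace. */
struct id_range {
    uint32_t first;   /* first kernel-side ID that is mapped */
    uint32_t count;   /* number of consecutive IDs mapped */
};

struct model_userns {
    struct id_range uid_map;
    struct id_range gid_map;
    bool has_cap;     /* stands in for "caller has the capability in this namespace" */
};

struct model_inode {
    uint32_t uid;
    uint32_t gid;
};

/* True if id falls inside the mapped range. */
static bool id_has_mapping(const struct id_range *map, uint32_t id)
{
    return id >= map->first && id - map->first < map->count;
}

/* Capability is honoured only if both the inode's uid and gid are mapped. */
static bool model_capable_wrt_inode_uidgid(const struct model_userns *ns,
                                           const struct model_inode *inode)
{
    return ns->has_cap &&
           id_has_mapping(&ns->uid_map, inode->uid) &&
           id_has_mapping(&ns->gid_map, inode->gid);
}
```

In this model, a namespace that maps IDs 100000 to 165535 is refused for an inode owned by uid 0 and gid 0, even when has_cap is set, because those IDs have no mapping in the namespace.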
     
  • The check that tests if ftrace_trace_arrays is empty in
    top_trace_array(), uses the .prev pointer:

    if (list_empty(ftrace_trace_arrays.prev))

    instead of testing the variable itself:

    if (list_empty(&ftrace_trace_arrays))

    Although it is technically correct, it is awkward and confusing.
    Use the proper method.

    Link: http://lkml.kernel.org/r/87oay1bas8.fsf@sejong.aot.lge.com

    Reported-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
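
To see why the .prev form in the commit above is "technically correct", here is a minimal user-space stand-in for the kernel's circular list. It is a sketch for illustration only: init_list_head() and the trimmed-down helpers are written here rather than taken from the kernel headers, though list_empty() uses the same next-points-to-self test.

```c
#include <stdbool.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head whose pointers refer back to itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Same test as the kernel helper: does next point back at the node itself? */
static bool list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Append entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}
```

When the list is empty, head.prev is the head itself, so list_empty(head.prev) and list_empty(&head) test the same node. When it is not empty, head.prev is the last entry, whose next pointer refers to the head rather than to itself, so both forms return false. The results agree, but only the &head form states the intent directly.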