01 Mar, 2012

3 commits

  • commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream.

    This patch is intentionally incomplete to simplify the review.
    It ignores ep_unregister_pollwait() which plays with the same wqh.
    See the next change.

    epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
    f_op->poll() needs. In particular it assumes that the wait queue
    can't go away until eventpoll_release(). This is not true in case
    of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
    which is not connected to the file.

    This patch adds the special event, POLLFREE, currently only for
    epoll. It expects that init_poll_funcptr()'ed hook should do the
    necessary cleanup. Perhaps it should be defined as EPOLLFREE in
    eventpoll.

    __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
    ->signalfd_wqh is not empty, we add the new signalfd_cleanup()
    helper.

    ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
    This makes this poll entry inconsistent, but we don't care. If you
    share epoll fd which contains our sigfd with another process you
    should blame yourself. signalfd is "really special". I simply do
    not know how we can define the "right" semantics if it is used with
    epoll.

    The main problem is, epoll calls signalfd_poll() once to establish
    the connection with the wait queue, after that signalfd_poll(NULL)
    returns the different/inconsistent results depending on who does
    EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
    has nothing to do with the file, it works with the current thread.

    In short: this patch is the hack which tries to fix the symptoms.
    It also assumes that nobody can take tasklist_lock under epoll
    locks, this seems to be true.

    Note:

    - we do not have wake_up_all_poll() but wake_up_poll()
    is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

    - signalfd_cleanup() uses POLLHUP along with POLLFREE,
    we need a couple of simple changes in eventpoll.c to
    make sure it can't be "lost".
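
    The POLLFREE idea described above can be modelled in user space. The
    following is only a sketch under stated assumptions: the sk_* names,
    the list layout and the event bit values are invented for
    illustration and are not the kernel's code. The owner of the wait
    queue wakes every waiter with a "free" event, and each waiter's
    callback unlinks itself, in the spirit of list_del_init().

```c
#include <stdbool.h>

/* Illustrative event bits for this sketch; not taken from kernel headers. */
#define SK_POLLHUP  0x0010u
#define SK_POLLFREE 0x4000u

/* Circular doubly linked wait queue; the head is a plain entry. */
struct sk_wait_entry {
    struct sk_wait_entry *prev, *next;
    unsigned int seen;          /* events delivered to this waiter */
};

static void sk_init(struct sk_wait_entry *e)
{
    e->prev = e->next = e;
    e->seen = 0;
}

static bool sk_empty(const struct sk_wait_entry *head)
{
    return head->next == head;
}

static void sk_add(struct sk_wait_entry *head, struct sk_wait_entry *e)
{
    e->next = head->next;
    e->prev = head;
    head->next->prev = e;
    head->next = e;
}

/* Counterpart of list_del_init(): unlink and leave the entry self-linked. */
static void sk_del_init(struct sk_wait_entry *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = e;
}

/* Waiter callback: record ordinary events, detach on SK_POLLFREE. */
static void sk_poll_callback(struct sk_wait_entry *e, unsigned int key)
{
    e->seen |= key & ~SK_POLLFREE;
    if (key & SK_POLLFREE)
        sk_del_init(e);
}

/* Owner-side cleanup: tell every waiter that the queue is going away. */
static void sk_cleanup(struct sk_wait_entry *head)
{
    struct sk_wait_entry *e = head->next;

    while (e != head) {
        struct sk_wait_entry *next = e->next;  /* e may unlink itself */

        sk_poll_callback(e, SK_POLLHUP | SK_POLLFREE);
        e = next;
    }
}
```

    After sk_cleanup() the queue is empty and every waiter has seen the
    hang-up bit, so the memory behind the head can be released without
    leaving dangling links. Locking is deliberately left out of the
    sketch.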

    Reported-by: Maxime Bizon
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit b4bc724e82e80478cba5fe9825b62e71ddf78757 upstream.

    An interrupt might be pending when irq_startup() is called, but the
    startup code does not invoke the resend logic. In some cases this
    prevents the device from issuing another interrupt which renders the
    device non functional.

    Call the resend function in irq_startup() to keep things going.

    Reported-and-tested-by: Russell King
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit ac5637611150281f398bb7a47e3fcb69a09e7803 upstream.

    When the primary handler of an interrupt which is marked IRQ_ONESHOT
    returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not
    woken and the unmask logic of the interrupt line is never
    invoked. This keeps the interrupt masked forever.

    This was not noticed as most IRQ_ONESHOT users wake the thread
    unconditionally (usually because they cannot access the underlying
    device from hard interrupt context). Though this behaviour was nowhere
    documented and not necessarily intentional. Some drivers can avoid the
    thread wakeup in certain cases and run into the situation where the
    interrupt line is kept masked.

    Handle it gracefully.
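
    As a rough user-space model of the behaviour described above (the
    sk_* types and functions are hypothetical and this is not the genirq
    code): when a oneshot line's primary handler does not wake the
    thread, the line has to be unmasked right away, because no thread
    will run to do it later.

```c
#include <stdbool.h>

/* Return values of the primary handler, named after the kernel's. */
enum sk_irqreturn { SK_IRQ_NONE, SK_IRQ_HANDLED, SK_IRQ_WAKE_THREAD };

struct sk_irq {
    bool oneshot;         /* line stays masked until the thread is done */
    bool masked;
    bool thread_pending;  /* the handler thread was woken */
};

typedef enum sk_irqreturn (*sk_primary_fn)(void);

static void sk_handle_irq(struct sk_irq *irq, sk_primary_fn primary)
{
    enum sk_irqreturn ret;

    if (irq->oneshot)
        irq->masked = true;

    ret = primary();

    if (ret == SK_IRQ_WAKE_THREAD) {
        /* The thread unmasks the line once it has finished. */
        irq->thread_pending = true;
        return;
    }

    /* No thread will run, so nobody else would unmask the line. */
    if (irq->oneshot)
        irq->masked = false;
}

/* What the handler thread does when it completes. */
static void sk_thread_done(struct sk_irq *irq)
{
    irq->thread_pending = false;
    if (irq->oneshot)
        irq->masked = false;
}

/* Two sample primary handlers for trying the sketch out. */
static enum sk_irqreturn sk_primary_handled(void)
{
    return SK_IRQ_HANDLED;
}

static enum sk_irqreturn sk_primary_wake(void)
{
    return SK_IRQ_WAKE_THREAD;
}
```

    Without the final unmask in sk_handle_irq(), a handler returning
    SK_IRQ_HANDLED or SK_IRQ_NONE would leave masked set for good.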

    Reported-and-tested-by: Lothar Wassmann
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 Feb, 2012

1 commit


14 Feb, 2012

7 commits

  • commit 9ec84acee1e221d99dc33237bff5e82839d10cc0 upstream.

    We do want to allow lock debugging for GPL-compatible modules
    that are not (yet) built in-tree. This was disabled as a
    side-effect of commit 2449b8ba0745327c5fa49a8d9acffe03b2eded69
    ('module,bug: Add TAINT_OOT_MODULE flag for modules not built
    in-tree'). Lock debug warnings now include taint flags, so
    kernel developers should still be able to deflect warnings
    caused by out-of-tree modules.

    The TAINT_PROPRIETARY_MODULE flag for non-GPL-compatible modules
    will still disable lock debugging.

    Signed-off-by: Ben Hutchings
    Cc: Nick Bowler
    Cc: Dave Jones
    Cc: Rusty Russell
    Cc: Randy Dunlap
    Cc: Debian kernel maintainers
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Link: http://lkml.kernel.org/r/1323268258.18450.11.camel@deadeye
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • commit df754e6af2f237a6c020c0daff55a1a609338e31 upstream.

    It's unlikely that TAINT_FIRMWARE_WORKAROUND causes false
    lockdep messages, so do not disable lockdep in that case.
    We still want to keep lockdep disabled in the
    TAINT_OOT_MODULE case:

    - bin-only modules can cause various instabilities in
    their and in unrelated kernel code

    - they are impossible to debug for kernel developers

    - they also typically do not have the copyright license
    permission to link to the GPL-ed lockdep code.

    Suggested-by: Ben Hutchings
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-xopopjjens57r0i13qnyh2yo@git.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit fe9161db2e6053da21e4649d77bbefaf3030b11d upstream.

    In the SNAPSHOT_CREATE_IMAGE ioctl, if the call to hibernation_snapshot()
    fails, the frozen tasks are not thawed.

    And in the case of success, if we happen to exit due to a successful freezer
    test, all tasks (including those of userspace) are thawed, whereas actually
    we should have thawed only the kernel threads at that point. Fix both these
    issues.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Srivatsa S. Bhat
     
  • commit 97819a26224f019e73d88bb2fd4eb5a614860461 upstream.

    Commit 2aede851ddf08666f68ffc17be446420e9d2a056 (PM / Hibernate: Freeze
    kernel threads after preallocating memory) moved the freezing of kernel
    threads to hibernation_snapshot() function.

    So now, if the call to hibernation_snapshot() returns early due to a
    successful hibernation test, the caller has to thaw processes to ensure
    that the system gets back to its original state.

    But in SNAPSHOT_CREATE_IMAGE hibernation ioctl, the caller does not thaw
    processes in case hibernation_snapshot() returned due to a successful
    freezer test. Fix this issue. But note we still send the value of 'in_suspend'
    (which is now 0) to userspace, because we are not in an error path per-se,
    and moreover, the value of in_suspend correctly depicts the situation here.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Srivatsa S. Bhat
     
  • commit cb297a3e433dbdcf7ad81e0564e7b804c941ff0d upstream.

    This issue happens under the following conditions:

    1. preemption is off
    2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
    3. RT scheduling class
    4. SMP system

    Sequence is as follows:

    1. Suppose the current task is A, which starts schedule().
    2. Task A is enqueued as a pushable task at the entry of schedule().
         __schedule
           prev = rq->curr;
           ...
           put_prev_task
             put_prev_task_rt
               enqueue_pushable_task
    3. Task B is picked as the next task.
         next = pick_next_task(rq);
    4. rq->curr is set to task B and context_switch is started.
         rq->curr = next;
    5. At the entry of context_switch, this cpu's rq->lock is released.
         context_switch
           prepare_task_switch
             prepare_lock_switch
               raw_spin_unlock_irq(&rq->lock);
    6. Shortly after rq->lock is released, an interrupt occurs and IRQ
       context starts.
    7. try_to_wake_up(), which is called by the ISR, acquires rq->lock.
         try_to_wake_up
           ttwu_remote
             rq = __task_rq_lock(p)
             ttwu_do_wakeup(rq, p, wake_flags);
               task_woken_rt
    8. push_rt_task picks task A, which was enqueued before.
         task_woken_rt
           push_rt_tasks(rq)
             next_task = pick_next_pushable_task(rq)
    9. In find_lock_lowest_rq(), if double_lock_balance() returns 0,
       lowest_rq can be the remote rq.
       (But if preemption is on, double_lock_balance() always returns 1
       and this doesn't happen.)
         push_rt_task
           find_lock_lowest_rq
             if (double_lock_balance(rq, lowest_rq))..
    10. find_lock_lowest_rq returns the available rq. Task A is migrated
        to the remote cpu/rq.
         push_rt_task
           ...
           deactivate_task(rq, next_task, 0);
           set_task_cpu(next_task, lowest_rq->cpu);
           activate_task(lowest_rq, next_task, 0);
    11. But task A is in IRQ context on this cpu.
        So task A is scheduled by two cpus at the same time until it
        returns from the IRQ. Task A's stack is corrupted.

    To fix it, don't migrate an RT task if it's still running.
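
    The rule "don't migrate an RT task if it's still running" can be
    illustrated with a small stand-alone sketch. The struct and function
    names below are hypothetical and the real patch works on the
    scheduler's own data structures; the sketch only shows the filtering
    step, where a task that is still executing on a CPU is never offered
    as a push candidate.

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_task {
    const char *name;
    bool on_cpu;    /* still executing on some CPU (e.g. mid context switch) */
};

/*
 * Return the first task in the pushable list that is safe to migrate,
 * or NULL if there is none. Running tasks are skipped.
 */
static struct sk_task *sk_pick_pushable(struct sk_task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].on_cpu)
            continue;
        return &tasks[i];
    }
    return NULL;
}
```

    In the sequence above, task A would have on_cpu set while its
    context switch is in flight, so the wakeup path would not pick it.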

    Signed-off-by: Chanho Min
    Signed-off-by: Peter Zijlstra
    Acked-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Chanho Min
     
  • commit 181e9bdef37bfcaa41f3ab6c948a2a0d60a268b5 upstream.

    Commit 2aede851ddf08666f68ffc17be446420e9d2a056

    PM / Hibernate: Freeze kernel threads after preallocating memory

    introduced a mechanism by which kernel threads were frozen after
    the preallocation of hibernate image memory to avoid problems with
    frozen kernel threads not responding to memory freeing requests.
    However, it overlooked the s2disk code path in which the
    SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
    which caused freeze_workqueues_begin() to BUG(), because it saw
    that workqueues had been already frozen.

    Although in principle this issue might be addressed by removing
    the relevant BUG_ON() from freeze_workqueues_begin(), that would
    reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
    attempted to avoid into that particular code path. For this reason,
    to fix the issue at hand, introduce thaw_kernel_threads() and make
    the SNAPSHOT_FREE ioctl execute it.

    Special thanks to Srivatsa S. Bhat for detailed analysis of the
    problem.

    Reported-and-tested-by: Jiri Slaby
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Srivatsa S. Bhat
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • commit 55ca6140e9bb307efc97a9301a4f501de02a6fd6 upstream.

    In function pre_handler_kretprobe(), the allocated kretprobe_instance
    object will get leaked if the entry_handler callback returns non-zero.
    This may cause all the preallocated kretprobe_instance objects to be exhausted.

    This issue can be reproduced by changing
    samples/kprobes/kretprobe_example.c to probe "mutex_unlock". And the fix
    is straightforward: just put the allocated kretprobe_instance object back
    onto the free_instances list.
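
    A minimal user-space sketch of the fix, assuming a fixed pool and a
    singly linked free list (all sk_* names are made up for this
    example, and the locking mentioned in the note below is omitted):
    when the entry handler returns non-zero, the instance that was just
    taken goes straight back onto the free list.

```c
#include <stddef.h>

#define SK_MAX_INSTANCES 4

struct sk_instance {
    struct sk_instance *next;
};

struct sk_retprobe {
    struct sk_instance pool[SK_MAX_INSTANCES];
    struct sk_instance *free_list;
    int (*entry_handler)(struct sk_instance *ri);  /* may be NULL */
    int nmissed;                                   /* hits with no free instance */
};

static void sk_retprobe_init(struct sk_retprobe *rp,
                             int (*entry_handler)(struct sk_instance *ri))
{
    rp->free_list = NULL;
    for (size_t i = 0; i < SK_MAX_INSTANCES; i++) {
        rp->pool[i].next = rp->free_list;
        rp->free_list = &rp->pool[i];
    }
    rp->entry_handler = entry_handler;
    rp->nmissed = 0;
}

static size_t sk_free_count(const struct sk_retprobe *rp)
{
    size_t n = 0;

    for (const struct sk_instance *ri = rp->free_list; ri; ri = ri->next)
        n++;
    return n;
}

/* Returns the instance now in use, or NULL if this hit is not tracked. */
static struct sk_instance *sk_pre_handler(struct sk_retprobe *rp)
{
    struct sk_instance *ri = rp->free_list;

    if (!ri) {
        rp->nmissed++;
        return NULL;
    }
    rp->free_list = ri->next;

    if (rp->entry_handler && rp->entry_handler(ri)) {
        /* Handler declined: give the instance back instead of leaking it. */
        ri->next = rp->free_list;
        rp->free_list = ri;
        return NULL;
    }
    return ri;
}

/* Sample entry handler that always declines. */
static int sk_reject(struct sk_instance *ri)
{
    (void)ri;
    return 1;
}
```

    With the put-back step removed, SK_MAX_INSTANCES declined hits would
    drain the pool and every later hit would only bump nmissed.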

    [akpm@linux-foundation.org: use raw_spin_lock/unlock]
    Signed-off-by: Jiang Liu
    Acked-by: Jim Keniston
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Masami Hiramatsu
    Cc: Anil S Keshavamurthy
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jiang Liu
     

26 Jan, 2012

3 commits

  • commit d496aab567e7e52b3e974c9192a5de6e77dce32c upstream.

    Commit ef53d9c5e ("kprobes: improve kretprobe scalability with hashed
    locking") introduced a bug where we can potentially leak
    kretprobe_instances since we initialize a hlist head after having used
    it.

    Initialize the hlist head before using it.

    Reported-by: Jim Keniston
    Acked-by: Jim Keniston
    Signed-off-by: Ananth N Mavinakayanahalli
    Acked-by: Masami Hiramatsu
    Cc: Srinivasa D S
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Ananth N Mavinakayanahalli
     
  • commit c10076c4304083af15a41f6bc5e657e781c1f9a6 upstream.

    Tracepoints are disabled for tainted modules, which is usually because the
    module is either proprietary or was forced, and we don't want either of them
    using kernel tracepoints.

    But, a module can also be tainted by being in the staging directory or
    compiled out of tree. Either is fine for use with tracepoints, no need
    to punish them. I found this out when I noticed that my sample trace event
    module, when done out of tree, stopped working.
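
    The policy described above amounts to a bitmask test. The sketch
    below is an illustration only: the SK_TAINT_* bit positions are
    hypothetical stand-ins, not the kernel's TAINT_* values, and the
    function name is invented. A module may use tracepoints if it
    carries no taint other than "staging" or "out of tree".

```c
#include <stdbool.h>

/* Hypothetical bit positions; the kernel defines its own TAINT_* values. */
enum {
    SK_TAINT_PROPRIETARY_MODULE = 0,
    SK_TAINT_FORCED_MODULE      = 1,
    SK_TAINT_CRAP               = 2,   /* staging drivers */
    SK_TAINT_OOT_MODULE         = 3    /* built out of tree */
};

#define SK_BIT(n) (1u << (n))

/* Taints that should not stop a module from using tracepoints. */
#define SK_HARMLESS_TAINTS (SK_BIT(SK_TAINT_CRAP) | SK_BIT(SK_TAINT_OOT_MODULE))

static bool sk_module_may_use_tracepoints(unsigned int taints)
{
    return (taints & ~SK_HARMLESS_TAINTS) == 0;
}
```

    Masking out the harmless bits first keeps the check a single
    comparison and leaves every other taint as a reason to refuse.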

    Cc: Mathieu Desnoyers
    Cc: Ben Hutchings
    Cc: Dave Jones
    Cc: Greg Kroah-Hartman
    Cc: Rusty Russell
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt
     
  • commit 30fb6aa74011dcf595f306ca2727254d708b786e upstream.

    Multiple users of the function tracer can register their functions
    with the ftrace_ops structure. The accounting within ftrace will
    update the counter on each function record that is being traced.
    When the ftrace_ops filtering adds or removes functions, the
    function records will be updated accordingly if the ftrace_ops is
    still registered.

    When a ftrace_ops is removed, the counter of the function records,
    that the ftrace_ops traces, are decremented. When they reach zero
    the functions that they represent are modified to stop calling the
    mcount code.

    When changes are made, the code is updated via stop_machine() with
    a command passed to the function to tell it what to do. There is an
    ENABLE and DISABLE command that tells the called function to enable
    or disable the functions. But the ENABLE is really a misnomer as it
    should just update the records, as records that have been enabled
    and now have a count of zero should be disabled.

    The DISABLE command is used to disable all functions regardless of
    their counter values. This is the big off switch and is not the
    complement of the ENABLE command.

    To make matters worse, when a ftrace_ops is unregistered and there
    is another ftrace_ops registered, neither the DISABLE nor the
    ENABLE command are set when calling into the stop_machine() function
    and the records will not be updated to match their counter. A command
    is passed to that function that will update the mcount code to call
    the registered callback directly if it is the only one left. This
    means that the ftrace_ops that is still registered will have its callback
    called by all functions that have been set for it as well as the ftrace_ops
    that was just unregistered.
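
    The accounting problem can be pictured with a toy model. Everything
    below is an assumption made for illustration (the record layout, the
    command names SK_UPDATE and SK_DISABLE_ALL, and the helper names are
    not from the kernel source): each function record has a reference
    count, an update pass makes the call site match the count, and
    unregistering an ops always ends with such a pass.

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_func_record {
    unsigned int count;  /* number of registered ops tracing this function */
    bool calls_tracer;   /* whether the call site currently calls the tracer */
};

enum sk_command {
    SK_UPDATE,       /* make every call site match its counter */
    SK_DISABLE_ALL   /* the big off switch, regardless of counters */
};

static void sk_run_command(struct sk_func_record *recs, size_t n,
                           enum sk_command cmd)
{
    for (size_t i = 0; i < n; i++) {
        if (cmd == SK_DISABLE_ALL)
            recs[i].calls_tracer = false;
        else
            recs[i].calls_tracer = recs[i].count > 0;
    }
}

/*
 * Unregister an ops: drop its reference on every record it traced, then
 * sync the call sites so records that reached zero stop calling the tracer.
 */
static void sk_unregister(struct sk_func_record *recs, size_t n,
                          const bool *traced_by_ops)
{
    for (size_t i = 0; i < n; i++) {
        if (traced_by_ops[i] && recs[i].count > 0)
            recs[i].count--;
    }
    sk_run_command(recs, n, SK_UPDATE);
}
```

    If the final sk_run_command() call were skipped, a record whose
    count dropped to zero would keep calls_tracer set, which is the
    stale state the message describes.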

    Here's a way to trigger this bug. Compile the kernel with
    CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:

    CONFIG_FUNCTION_PROFILER=y
    # CONFIG_FUNCTION_GRAPH is not set

    This will force the function profiler to use the function tracer instead
    of the function graph tracer.

    # cd /sys/kernel/debug/tracing
    # echo schedule > set_ftrace_filter
    # echo function > current_tracer
    # cat set_ftrace_filter
    schedule
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 692/68108025 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    kworker/0:2-909 [000] .... 531.235574: schedule
    <idle>-0 [001] .N.. 531.235575: schedule
    # echo 1 > function_profile_enabled
    # echo 0 > function_profile_enabled
    # cat set_ftrace_filter
    schedule
    # cat trace
    # tracer: function
    #
    # entries-in-buffer/entries-written: 159701/118821262 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    <idle>-0 [002] ...1 604.870655: local_touch_nmi
    <idle>-0 [002] d..1 604.870655: enter_idle
    <idle>-0 [002] d..1 604.870656: atomic_notifier_call_chain
    <idle>-0 [002] d..1 604.870656: __atomic_notifier_call_chain

    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
     

13 Jan, 2012

1 commit

  • commit 0d19ea866562e46989412a0676412fa0983c9ce7 upstream.

    If we mount a hierarchy with a specified name, the name is unique,
    and we can use it to mount the hierarchy without specifying its
    set of subsystem names. This feature is documented in
    Documentation/cgroups/cgroups.txt section 2.3

    Here's an example:

    # mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
    # mount -t cgroup -o name=myhier xxx /cgroup2

    But it was broken by commit 32a8cf235e2f192eb002755076994525cdbaa35a
    (cgroup: make the mount options parsing more accurate)

    This fixes the regression.

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Li Zefan
     

05 Jan, 2012

2 commits

  • This is the temporary simple fix for 3.2, we need more changes in this
    area.

    1. do_signal_stop() assumes that the running untraced thread in the
    stopped thread group is not possible. This was our goal but it is
    not yet achieved: a stopped-but-resumed tracee can clone the running
    thread which can initiate another group-stop.

    Remove WARN_ON_ONCE(!current->ptrace).

    2. A new thread always starts with ->jobctl = 0. If it is auto-attached
    and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
    but JOBCTL_STOP_SIGMASK part is zero, this triggers WARN_ON(!signr)
    in do_jobctl_trap() if another debugger attaches.

    Change __ptrace_unlink() to set the artificial SIGSTOP for report.

    Alternatively we could change ptrace_init_task() to copy signr from
    current, but this means we can copy it for no reason and hide the
    possible similar problems.

    Acked-by: Tejun Heo
    Cc: [3.1]
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Test-case:

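    #include <assert.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void)
    {
        int pid, status;

        pid = fork();
        if (!pid) {
            /* Tracee: keep forking children and reaping them. */
            for (;;) {
                if (!fork())
                    return 0;
                if (waitpid(-1, &status, 0) < 0) {
                    printf("ERR!! wait: %m\n");
                    return 0;
                }
            }
        }

        /* Tracer: attach and ask for fork events. */
        assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
        assert(waitpid(-1, NULL, 0) == pid);

        assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
                      PTRACE_O_TRACEFORK) == 0);

        /* Resume whichever task reported last, until waitpid() fails. */
        do {
            ptrace(PTRACE_CONT, pid, 0, 0);
            pid = waitpid(-1, NULL, 0);
        } while (pid > 0);

        return 1;
    }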

    It fails because ->real_parent sees its child in EXIT_DEAD state
    while the tracer is going to change the state back to EXIT_ZOMBIE
    in wait_task_zombie().

    The offending commit is 823b018e which moved the EXIT_DEAD check,
    but in fact we should not blame it. The original code was not
    correct as well because it didn't take ptrace_reparented() into
    account and because we can't really trust ->ptrace.

    This patch adds the additional check to close this particular
    race but it doesn't solve the whole problem. We simply can't
    rely on ->ptrace in this case, it can be cleared if the tracer
    is multithreaded by the exiting ->parent.

    I think we should kill EXIT_DEAD altogether, we should always
    remove the soon-to-be-reaped child from ->children or at least
    we should never do the DEAD->ZOMBIE transition. But this is too
    complex for 3.2.

    Reported-and-tested-by: Denys Vlasenko
    Tested-by: Lukasz Michalik
    Acked-by: Tejun Heo
    Cc: [3.0+]
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

04 Jan, 2012

1 commit

  • vfork parent uninterruptibly and unkillably waits for its child to
    exec/exit. This wait is of unbounded length. Ignore such waits
    in the hung_task detector.

    Signed-off-by: Mandeep Singh Baines
    Reported-by: Sasha Levin
    LKML-Reference:
    Cc: Linus Torvalds
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: John Kacur
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     

01 Jan, 2012

1 commit

  • It was found (by Sasha) that if you use a futex located in the gate
    area we get stuck in an uninterruptible infinite loop, much like the
    ZERO_PAGE issue.

    While looking at this problem, PeterZ realized you'll get into similar
    trouble when hitting any install_special_pages() mapping. And are there
    still drivers setting up their own special mmaps without page->mapping,
    and without special VM or pte flags to make get_user_pages fail?

    In most cases, if page->mapping is NULL, we do not need to retry at all:
    Linus points out that even /proc/sys/vm/drop_caches poses no problem,
    because it ends up using remove_mapping(), which takes care not to
    interfere when the page reference count is raised.

    But there is still one case which does need a retry: if memory pressure
    called shmem_writepage in between get_user_pages_fast dropping page
    table lock and our acquiring page lock, then the page gets switched from
    filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
    Fault it back in to get the page->mapping needed for key->shared.inode.

    Reported-by: Sasha Levin
    Signed-off-by: Hugh Dickins
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

31 Dec, 2011

1 commit

  • This reverts commit de28f25e8244c7353abed8de0c7792f5f883588c.

    It results in resume problems for various people. See for example

    http://thread.gmane.org/gmane.linux.kernel/1233033
    http://thread.gmane.org/gmane.linux.kernel/1233389
    http://thread.gmane.org/gmane.linux.kernel/1233159
    http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877

    and the fedora and ubuntu bug reports

    https://bugzilla.redhat.com/show_bug.cgi?id=767248
    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569

    which got bisected down to the stable version of this commit.

    Reported-by: Jonathan Nieder
    Reported-by: Phil Miller
    Reported-by: Philip Langdale
    Reported-by: Tim Gardner
    Cc: Thomas Gleixner
    Cc: Greg KH
    Cc: stable@kernel.org # for stable kernels that applied the original
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Dec, 2011

4 commits

  • * 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroups: fix a css_set not found bug in cgroup_attach_proc

    Linus Torvalds
     
  • * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time/clocksource: Fix kernel-doc warnings
    rtc: m41t80: Workaround broken alarm functionality
    rtc: Expire alarms after the time is set.

    Linus Torvalds
     
  • binary_sysctl() calls sysctl_getname() which allocates from names_cache
    slab using __getname().

    The matching function to free the name is __putname(), and not putname()
    which should be used only to match getname() allocations.

    This is because when auditing is enabled, putname() calls audit_putname
    *instead* (not in addition) to __putname(). Then, if a syscall is in
    progress, audit_putname does not release the name - instead, it expects
    the name to get released when the syscall completes, but that will happen
    only if audit_getname() was called previously, i.e. if the name was
    allocated with getname() rather than the naked __getname(). So,
    __getname() followed by putname() ends up leaking memory.
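
    The pairing rule can be demonstrated with a simplified user-space
    model. This is a sketch under assumptions, not the VFS or audit
    code: the sk_* functions are invented, a syscall is assumed to be in
    progress for the whole run, and the "audit list" is a small array.
    The audited getter registers the name for release at syscall exit,
    and the audited putter defers to that release, so a raw allocation
    handed to the audited putter is never freed.

```c
#include <stdlib.h>

#define SK_NAME_LEN  64
#define SK_MAX_AUDIT 8

static int   sk_live_names;             /* buffers allocated, not yet freed */
static char *sk_audited[SK_MAX_AUDIT];  /* names released at syscall exit */
static int   sk_n_audited;

/* Raw pair: plain allocate and free. */
static char *sk_raw_getname(void)
{
    char *name = malloc(SK_NAME_LEN);

    if (name)
        sk_live_names++;
    return name;
}

static void sk_raw_putname(char *name)
{
    free(name);
    sk_live_names--;
}

/* Audited getter: also registers the name for release at syscall exit. */
static char *sk_getname(void)
{
    char *name;

    if (sk_n_audited == SK_MAX_AUDIT)
        return NULL;
    name = sk_raw_getname();
    if (name)
        sk_audited[sk_n_audited++] = name;
    return name;
}

/* Audited putter: a syscall is in progress, so the release is deferred. */
static void sk_putname(char *name)
{
    (void)name;
}

/* End of the syscall: release every name the audited getter registered. */
static void sk_syscall_exit(void)
{
    while (sk_n_audited > 0)
        sk_raw_putname(sk_audited[--sk_n_audited]);
}
```

    Calling sk_raw_getname() and then sk_putname() leaves sk_live_names
    at 1 even after sk_syscall_exit(), which mirrors the leak; pairing
    sk_raw_getname() with sk_raw_putname() brings it back to 0.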

    Signed-off-by: Michel Lespinasse
    Acked-by: Al Viro
    Cc: Christoph Hellwig
    Cc: Eric Paris
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
    nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
    new set of allowed cpuset nodes where the two nodemasks, as a result of
    the remap, are now disjoint.

    c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
    cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
    nodes from changing for a thread. This causes any update to a set of
    allowed nodes to stall until put_mems_allowed() is called.

    This stall is unnecessary, however, if at least one node remains unchanged
    in the update to the set of allowed nodes. This was addressed by
    89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
    node remains set"), but it's still possible that an empty nodemask may be
    read from a mempolicy because the old nodemask may be remapped to the new
    nodemask during rebind. To prevent this, only avoid the stall if there is
    no mempolicy for the thread being changed.

    This is a temporary solution until all reads from mempolicy nodemasks can
    be guaranteed to not be empty without the get_mems_allowed()
    synchronization.

    Also moves the check for nodemask intersection inside task_lock() so that
    tsk->mems_allowed cannot change. This ensures that nothing can set this
    tsk's mems_allowed out from under us and also protects tsk->mempolicy.
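
    The decision described above can be written as a small predicate.
    This is a sketch with assumptions: the node mask is squeezed into a
    single 64-bit word for brevity (the case in the message concerns
    masks wider than one long), and the struct and function names are
    hypothetical. The update may skip the stall only when the old and
    new masks share a node and the task has no mempolicy.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_task {
    uint64_t mems_allowed;   /* one bit per memory node */
    bool has_mempolicy;
};

/* Decide whether an update of the allowed nodes must wait for readers. */
static bool sk_need_stall(const struct sk_task *tsk, uint64_t newmems)
{
    bool disjoint = (tsk->mems_allowed & newmems) == 0;

    return tsk->has_mempolicy || disjoint;
}
```

    In the kernel the check is made while holding task_lock(), as the
    message notes; the sketch leaves locking out.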

    Reported-by: Miao Xie
    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Paul Menage
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

20 Dec, 2011

1 commit

  • There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
    is not called for the PF_EXITING case, find_existing_css_set() will return
    NULL inside cgroup_task_migrate() causing a BUG.

    This bug is easy to reproduce. Create a zombie and echo its pid to
    cgroup.procs.

    $ cat zombie.c
    #include <unistd.h>

    int main()
    {
        if (fork())
            pause();
        return 0;
    }
    $

    We are hitting this bug pretty regularly on ChromeOS.

    This bug is already fixed by Tejun Heo's cgroup patchset which is
    targeted for the next merge window:

    https://lkml.org/lkml/2011/11/1/356

    I've created a smaller patch here which just fixes this bug so that a
    fix can be merged into the current release and stable.

    Signed-off-by: Mandeep Singh Baines
    Downstream-Bug-Report: http://crosbug.com/23953
    Reviewed-by: Li Zefan
    Signed-off-by: Tejun Heo
    Cc: containers@lists.linux-foundation.org
    Cc: cgroups@vger.kernel.org
    Cc: stable@kernel.org
    Cc: KAMEZAWA Hiroyuki
    Cc: Frederic Weisbecker
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Paul Menage
    Cc: Olof Johansson

    Mandeep Singh Baines
     

19 Dec, 2011

1 commit


18 Dec, 2011

1 commit


16 Dec, 2011

1 commit

  • Mike Galbraith reported that this recent commit:

    commit 4dcfe1025b513c2c1da5bf5586adb0e80148f612
    Author: Peter Zijlstra
    Date: Thu Nov 10 13:01:10 2011 +0100

    sched: Avoid SMT siblings in select_idle_sibling() if possible

    stopped selecting an idle SMT sibling when there are no idle
    cores in a single socket system.

    The intent of select_idle_sibling() was to fall back to an idle
    SMT sibling if it fails to identify an idle core. But this
    fallback was not happening on systems where all the scheduler
    domains had `SD_SHARE_PKG_RESOURCES' flag set.

    Fix it. A slightly bigger patch that cleans up all these gotos etc.
    is queued up for the next release.

    Reported-by: Mike Galbraith
    Reported-by: Alex Shi
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Suresh Siddha
    Link: http://lkml.kernel.org/r/1323978421.1984.244.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Dec, 2011

1 commit

  • Commit 10c6db11 ("perf: Fix loss of notification with multi-event")
    seems to unconditionally dereference event->rb in the wakeup handler.
    This is wrong: there might not be a buffer attached.

    Signed-off-by: Will Deacon
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111213152651.GP20297@mudshark.cambridge.arm.com
    [ minor edits ]
    Signed-off-by: Ingo Molnar

    Will Deacon
     

10 Dec, 2011

1 commit


09 Dec, 2011

3 commits


07 Dec, 2011

3 commits

  • perf_event_sched_in() shouldn't try to schedule task events if there
    are none; otherwise the task's ctx->is_active will be set and will not be
    cleared during sched_out. This will prevent newly added events from
    being scheduled into the task context.

    Fixes a boo-boo in commit 1d5f003f5a9 ("perf: Do not set task_ctx
    pointer in cpuctx if there are no events in the context").

    Signed-off-by: Gleb Natapov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111122140821.GF2557@redhat.com
    Signed-off-by: Ingo Molnar

    Gleb Natapov
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ftrace: Fix hash record accounting bug
    perf: Fix parsing of __print_flags() in TP_printk()
    jump_label: jump_label_inc may return before the code is patched
    ftrace: Remove force undef config value left for testing
    tracing: Restore system filter behavior
    tracing: fix event_subsystem ref counting

    Linus Torvalds
     
  • Since commit f59de89 ("lockdep: Clear whole lockdep_map on initialization"),
    lockdep_init_map() clears the whole struct. But this breaks
    lock_set_class()/lock_set_subclass(). A typical race condition
    looks like this:

    CPU A                                   CPU B
    lock_set_subclass(lockA);
      lock_set_class(lockA);
        lockdep_init_map(lockA);
          /* lockA->name is cleared */
          memset(lockA);
                                            __lock_acquire(lockA);
                                              /* lockA->class_cache[] is cleared */
                                              register_lock_class(lockA);
                                                look_up_lock_class(lockA);
                                                  WARN_ON_ONCE(class->name !=
                                                               lock->name);

          lock->name = name;

    So restore to what we have done before commit f59de89 but annotate
    ->lock with kmemcheck_mark_initialized() to suppress the kmemcheck
    warning reported in commit f59de89.

    Reported-by: Sergey Senozhatsky
    Reported-by: Borislav Petkov
    Suggested-by: Vegard Nossum
    Signed-off-by: Yong Zhang
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc:
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111109080451.GB8124@zhy
    Signed-off-by: Ingo Molnar

    Yong Zhang
     

06 Dec, 2011

4 commits

  • The expiry function compares the timer against current time and does
    not expire the timer when the expiry time is >= now. That's wrong. If
    the timer is set for now, then it must expire.

    Make the condition expiry > now for breaking out of the loop.
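
    The loop condition above can be shown with a tiny stand-alone
    sketch. It assumes a sorted array of expiry times and a hypothetical
    function name, and is not the actual timer code: the loop stops only
    at the first timer whose expiry is strictly greater than now, so a
    timer set for exactly now fires.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the timers that must fire at time 'now'. 'expiry' is sorted in
 * ascending order; the loop breaks at the first timer still in the future.
 */
static size_t sk_expire_timers(const int64_t *expiry, size_t n, int64_t now)
{
    size_t fired = 0;

    while (fired < n) {
        if (expiry[fired] > now)
            break;
        fired++;
    }
    return fired;
}
```

    With expiry times 5, 10 and 15 and now equal to 10, the first two
    timers fire. Breaking on expiry >= now instead would leave the timer
    set for 10 pending.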

    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: stable@kernel.org

    Thomas Gleixner
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Fix loss of notification with multi-event
    perf, x86: Force IBS LVT offset assignment for family 10h
    perf, x86: Disable PEBS on SandyBridge chips
    trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter
    perf session: Fix crash with invalid CPU list
    perf python: Fix undefined symbol problem
    perf/x86: Enable raw event access to Intel offcore events
    perf: Don't use -ENOSPC for out of PMU resources
    perf: Do not set task_ctx pointer in cpuctx if there are no events in the context
    perf/x86: Fix PEBS instruction unwind
    oprofile, x86: Fix crash when unloading module (nmi timer mode)
    oprofile: Fix crash when unloading module (hr timer mode)

    Linus Torvalds
     
  • * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clockevents: Set noop handler in clockevents_exchange_device()
    tick-broadcast: Stop active broadcast device when replacing it
    clocksource: Fix bug with max_deferment margin calculation
    rtc: Fix some bugs that allowed accumulating time drift in suspend/resume
    rtc: Disable the alarm in the hardware

    Linus Torvalds
     
  • …ernel.org/pub/scm/linux/kernel/git/tip/tip

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    slab, lockdep: Fix silly bug

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Fix race condition when stopping the irq thread

    Linus Torvalds