31 Aug, 2013

1 commit

  • Pull networking fixes from David Miller:

    1) There was a simplification in the ipv6 ndisc packet sending
    attempted here, which avoided using memory accounting on the
    per-netns ndisc socket for sending NDISC packets. It did fix some
    important issues, but it causes regressions so it gets reverted here
    too. Specifically, the problem with this change is that the IPV6
    output path really depends upon there being a valid skb->sk
    attached.

    The reason we want to do this change in some form when we figure out
    how to do it right, is that if a device goes down the ndisc_sk
    socket send queue will fill up and block NDISC packets that we want
    to send to other devices too. That's really bad behavior.

    Hopefully Thomas can come up with a better version of this change.

    2) Fix a severe TCP performance regression by reverting a change made
    to dev_pick_tx() quite some time ago. From Eric Dumazet.

    3) TIPC returns wrongly signed error codes, fix from Erik Hugne.

    4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the
    skb->sk too early. Fix from Li Hongjun.

    5) RAW ipv4 sockets can use the wrong routing key during lookup, from
    Chris Clark.

    6) Similar to #1 revert an older change that tried to use plain
    alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner
    mark which needs to see the skb->sk for such frames. From Phil
    Oester.

    7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz,
    specifically in the handling of virtual functions.

    8) IPSEC path error propagation to sockets is not done properly when
    we have v4 in v6, and v6 in v4 type rules. Fix from Hannes Frederic
    Sowa.

    9) Fix missing channel context release in mac80211, from Johannes Berg.

    10) Fix network namespace handling wrt. SCM_RIGHTS, from Andy
    Lutomirski.

    11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic
    drivers. From Michal Schmidt.

    12) Hopefully a complete and correct fix for the genetlink dump locking
    and module reference counting. From Pravin B Shelar.

    13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir.

    14) Fix handling of timestamp offset when restoring a snapshotted TCP
    socket. From Andrew Vagin.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
    net: fec: fix time stamping logic after napi conversion
    net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
    mISDN: return -EINVAL on error in dsp_control_req()
    net: revert 8728c544a9c ("net: dev_pick_tx() fix")
    Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages"
    ipv4 tunnels: fix an oops when using ipip/sit with IPsec
    tipc: set sk_err correctly when connection fails
    tcp: tcp_make_synack() should use sock_wmalloc
    bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones
    ipv6: Don't depend on per socket memory for neighbour discovery messages
    ipv4: sendto/hdrincl: don't use destination address found in header
    tcp: don't apply tsoffset if rcv_tsecr is zero
    tcp: initialize rcv_tstamp for restored sockets
    net: xilinx: fix memleak
    net: usb: Add HP hs2434 device to ZLP exception table
    net: add cpu_relax to busy poll loop
    net: stmmac: fixed the pbl setting with DT
    genl: Hold reference on correct module while netlink-dump.
    genl: Fix genl dumpit() locking.
    xfrm: Fix potential null pointer dereference in xdst_queue_output
    ...

    Linus Torvalds
     

30 Aug, 2013

1 commit

  • Pull cgroup fix from Tejun Heo:
    "During the percpu reference counting update which was merged during
    v3.11-rc1, the cgroup destruction path was updated so that a cgroup in
    the process of dying may linger on the children list, which was
    necessary as the cgroup should still be included in child/descendant
    iteration while percpu ref is being killed.

    Unfortunately, I forgot to update cgroup destruction path accordingly
    and cgroup destruction may fail spuriously with -EBUSY due to
    lingering dying children even when there's no live child left - e.g.
    "rmdir parent/child parent" will usually fail.

    This can be easily fixed by iterating through the children list to
    verify that there's no live child left. While this is very late in
    the release cycle, this bug is very visible to userland and I believe
    the fix is relatively safe.

    Thanks Hugh for spotting and providing fix for the issue"

    * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: fix rmdir EBUSY regression in 3.11

    Linus Torvalds
     

29 Aug, 2013

3 commits

  • On 3.11-rc we are seeing cgroup directories left behind when they should
    have been removed. Here's a trivial reproducer:

    cd /sys/fs/cgroup/memory
    mkdir parent parent/child; rmdir parent/child parent
    rmdir: failed to remove `parent': Device or resource busy

    It's because cgroup_destroy_locked() (step 1 of destruction) leaves
    cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
    destruction) remove it; but step 2 is run by work queue, which may not
    yet have removed the children when parent destruction checks the list.

    Fix that by checking through a non-empty list of children: if every one
    of them has already been marked CGRP_DEAD, then it's safe to proceed:
    those children are invisible to userspace, and should not obstruct rmdir.

    (I didn't see any reason to keep the cgrp->children checks under the
    unrelated css_set_lock, so moved them out.)
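
    A minimal sketch of the check described above, in plain userspace C. The
    struct toy_cgroup type, its dead flag (standing in for CGRP_DEAD), and
    the -1 return value (standing in for -EBUSY) are assumptions made for
    illustration; they are not the kernel's actual data structures:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup; only what the check needs. */
struct toy_cgroup {
    bool dead;                   /* plays the role of the CGRP_DEAD flag */
    struct toy_cgroup *sibling;  /* next entry on the parent's children list */
    struct toy_cgroup *children; /* head of this cgroup's children list */
};

/* Return true if at least one child is not yet marked dead. */
bool toy_has_live_child(const struct toy_cgroup *parent)
{
    for (const struct toy_cgroup *c = parent->children; c; c = c->sibling)
        if (!c->dead)
            return true;
    return false;
}

/*
 * rmdir-style check: 0 when removal may proceed, -1 (standing in for
 * -EBUSY) when a live child remains. A non-empty list of dead children
 * does not block removal.
 */
int toy_rmdir_check(const struct toy_cgroup *parent)
{
    return toy_has_live_child(parent) ? -1 : 0;
}
```

    With a parent whose only child is already marked dead, toy_rmdir_check()
    returns 0, which is the case the reproducer above was tripping over.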

    tj: Flattened nested ifs a bit and updated comment so that it's
    correct on both for-3.11-fixes and for-3.12.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Tejun Heo

    Hugh Dickins
     
  • If !PREEMPT, a kworker running work items back to back can hog CPU.
    This becomes dangerous when a self-requeueing work item which is
    waiting for something to happen races against stop_machine. Such
    self-requeueing work item would requeue itself indefinitely hogging
    the kworker and CPU it's running on while stop_machine would wait for
    that CPU to enter stop_machine while preventing anything else from
    happening on all other CPUs. The two would deadlock.

    Jamie Liu reports that this deadlock scenario exists around
    scsi_requeue_run_queue() and libata port multiplier support, where one
    port may exclude command processing from other ports. With the right
    timing, scsi_requeue_run_queue() can end up requeueing itself trying
    to execute an IO which is asked to be retried while another device has
    an exclusive access, which in turn can't make forward progress due to
    stop_machine.

    Fix it by invoking cond_resched() after executing each work item.
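
    As a rough userspace analogy only, a worker loop that yields after each
    item might look like the sketch below. The toy_* names are hypothetical,
    and sched_yield() is merely a stand-in: unlike cond_resched(), it yields
    unconditionally rather than only when a reschedule is pending.

```c
#include <sched.h>
#include <stddef.h>

typedef void (*toy_work_fn)(void *arg);

/* Hypothetical work item: a callback plus its argument. */
struct toy_work {
    toy_work_fn fn;
    void *arg;
};

/* Sample work item that increments the int it is handed. */
void toy_count_item(void *arg)
{
    ++*(int *)arg;
}

/*
 * Run the items back to back, giving up the CPU after each one so a
 * long run of items cannot monopolise it. Returns the number of times
 * the loop yielded.
 */
size_t toy_worker_run(const struct toy_work *items, size_t n)
{
    size_t yields = 0;

    for (size_t i = 0; i < n; i++) {
        items[i].fn(items[i].arg);
        sched_yield();  /* userspace stand-in for cond_resched() */
        yields++;
    }
    return yields;
}
```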

    Signed-off-by: Tejun Heo
    Reported-by: Jamie Liu
    References: http://thread.gmane.org/gmane.linux.kernel/1552567
    Cc: stable@vger.kernel.org
    --
    kernel/workqueue.c | 9 +++++++++
    1 file changed, 9 insertions(+)

    Tejun Heo
     
  • Correct an issue with /proc/timer_list reported by Holger.

    When reading from the proc file with a fairly small buffer (2k, so not
    really that small), a reader could hang while trying to read the file a
    chunk at a time.

    The timer_list_start function failed to account for the possibility that
    the offset was adjusted outside the timer_list_next.

    Signed-off-by: Nathan Zimmer
    Reported-by: Holger Hans Peter Freyther
    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Berke Durak
    Cc: Jeff Layton
    Tested-by: Al Viro
    Cc: # 3.10.x
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Zimmer
     

28 Aug, 2013

1 commit


24 Aug, 2013

1 commit

  • Pull cgroup fix from Tejun Heo:
    "A late fix for cgroup.

    This fixes a behavior regression visible to userland which was created
    by a commit merged during -rc1. While the behavior change isn't too
    likely to be noticeable, the fix is relatively low risk and we'll need
    to backport it through -stable anyway if the bug gets released"

    * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cpuset: fix a regression in validating config change

    Linus Torvalds
     

21 Aug, 2013

1 commit

  • It's not allowed to clear masks of a cpuset if there're tasks in it,
    but it's broken:

    # mkdir /cgroup/sub
    # echo 0 > /cgroup/sub/cpuset.cpus
    # echo 0 > /cgroup/sub/cpuset.mems
    # echo $$ > /cgroup/sub/tasks
    # echo > /cgroup/sub/cpuset.cpus
    (should fail)
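
    The rule being enforced can be sketched as a small validation function.
    This is an illustration under assumptions: toy_validate_change(), the
    bitmask representation, and the -1 error value are hypothetical and do
    not mirror the kernel's cpuset code.

```c
/*
 * Reject a configuration change that would leave a cpuset with tasks
 * attached but an empty cpus or mems mask. Returns 0 if the change is
 * acceptable, -1 otherwise.
 */
int toy_validate_change(unsigned long new_cpus, unsigned long new_mems,
                        int nr_tasks)
{
    if (nr_tasks > 0 && (new_cpus == 0 || new_mems == 0))
        return -1;
    return 0;
}
```

    In this sketch, clearing the cpus mask of a cpuset that holds one task
    is refused, while an empty cpuset with no tasks is accepted.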

    This bug was introduced by commit 88fa523bff295f1d60244a54833480b02f775152
    ("cpuset: allow to move tasks to empty cpusets").

    tj: Dropped temp bool variables and nested the conditionals directly.

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

20 Aug, 2013

2 commits


18 Aug, 2013

1 commit


17 Aug, 2013

1 commit


15 Aug, 2013

1 commit

  • Merge a bunch of fixes from Andrew Morton.

    * emailed patches from Andrew Morton:
    fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
    arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
    ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
    x86 get_unmapped_area(): use proper mmap base for bottom-up direction
    ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
    ocfs2: Revert 40bd62e to avoid regression in extended allocation
    drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
    hugetlb: fix lockdep splat caused by pmd sharing
    aoe: adjust ref of head for compound page tails
    microblaze: fix clone syscall
    mm: save soft-dirty bits on file pages
    mm: save soft-dirty bits on swapped pages
    memcg: don't initialize kmem-cache destroying work for root caches

    Linus Torvalds
     

14 Aug, 2013

3 commits

  • Fix inadvertent breakage in the clone syscall ABI for Microblaze that
    was introduced in commit f3268edbe6fe ("microblaze: switch to generic
    fork/vfork/clone").

    The Microblaze syscall ABI for clone takes the parent tid address in the
    4th argument; the third argument slot is used for the stack size. The
    incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
    slot.

    This commit restores the original ABI so that existing userspace libc
    code will work correctly.

    All kernel versions from v3.8-rc1 were affected.

    Signed-off-by: Michal Simek
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Simek
     
  • Pull scheduler fixes from Ingo Molnar:
    "Docbook fixes that make 99% of the diffstat, plus a oneliner fix"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Ensure update_cfs_shares() is called for parents of continuously-running tasks
    sched: Fix some kernel-doc warnings

    Linus Torvalds
     
  • pm_qos_update_request_timeout() updates a qos and then schedules
    a delayed work item to bring the qos back down to the default
    after the timeout. When the work item runs, pm_qos_work_fn() will
    call pm_qos_update_request() and deadlock because it tries to
    cancel itself via cancel_delayed_work_sync(). Future callers of
    that qos will also hang waiting to cancel the work that is
    canceling itself. Let's extract the little bit of code that does
    the real work of pm_qos_update_request() and call it from the
    work function so that we don't deadlock.

    Before ed1ac6e (PM: don't use [delayed_]work_pending()) this didn't
    happen because the work function wouldn't try to cancel itself.
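
    The shape of the fix, splitting out the part that does the real work so
    the timeout handler never cancels itself, can be sketched as follows.
    All toy_* names and the cancel_calls counter are assumptions made for
    illustration, not the pm_qos implementation:

```c
/* Hypothetical request: current value, pending-timeout flag, and a
 * counter recording how often the timeout was cancelled. */
struct toy_req {
    int value;
    int timer_armed;
    int cancel_calls;
};

/* The real work. It performs no cancellation, so it is safe to call
 * from the timeout handler itself. */
void toy_apply(struct toy_req *r, int v)
{
    r->value = v;
}

/* Normal update path: cancel any pending timeout, then apply. */
void toy_update(struct toy_req *r, int v)
{
    r->cancel_calls++;
    r->timer_armed = 0;
    toy_apply(r, v);
}

/* Timeout handler: goes straight to the helper instead of through
 * toy_update(), so it never tries to cancel (and wait for) itself. */
void toy_timeout_fn(struct toy_req *r, int default_value)
{
    r->timer_armed = 0;
    toy_apply(r, default_value);
}
```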

    Signed-off-by: Stephen Boyd
    Reviewed-by: Tejun Heo
    Cc: 3.9+ # 3.9+
    Signed-off-by: Rafael J. Wysocki

    Stephen Boyd
     

13 Aug, 2013

4 commits

  • This is only theoretical, but after try_to_wake_up(p) was changed
    to check p->state under p->pi_lock the code like

    __set_current_state(TASK_INTERRUPTIBLE);
    schedule();

    can miss a signal. This is the special case of wait-for-condition,
    it relies on try_to_wake_up/schedule interaction and thus it does
    not need mb() between __set_current_state() and if(signal_pending).

    However, this __set_current_state() can move into the critical
    section protected by rq->lock, now that try_to_wake_up() takes
    another lock we need to ensure that it can't be reordered with
    "if (signal_pending(current))" check inside that section.

    The patch is actually a one-liner, it simply adds smp_wmb() before
    spin_lock_irq(rq->lock). This is what try_to_wake_up() already
    does for the same reason.

    We turn this wmb() into the new helper, smp_mb__before_spinlock(),
    for better documentation and to allow the architectures to change
    the default implementation.

    While at it, kill smp_mb__after_lock(), it has no callers.

    Perhaps we can also add smp_mb__before/after_spinunlock() for
    prepare_to_wait().

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Writing to this file always returns -ENODEV:

    # echo 1 > cpuset.memory_pressure_enabled
    -bash: echo: write error: No such device

    Signed-off-by: Li Zefan
    Cc: # 3.9+
    Signed-off-by: Tejun Heo

    Li Zefan
     
  • Pull w/w mutex deadlock injection fix from Ingo Molnar.

    This bug made the CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y option largely
    useless, but wouldn't affect normal users.

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    mutex: Fix w/w mutex deadlock injection

    Linus Torvalds
     
  • Pull small fix for v3.11 from John Stultz.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

09 Aug, 2013

1 commit


08 Aug, 2013

1 commit

  • …t/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Oleg Nesterov has been working hard in closing all the holes that can
    lead to race conditions between deleting an event and accessing an
    event debugfs file. This included a fix to the debugfs system (acked
    by Greg Kroah-Hartman). We think that all the holes have been patched
    and hopefully we don't find more. I haven't marked all of them for
    stable because I need to examine them more to figure out how far back
    some of the changes need to go.

    Along the way, some other fixes have been made. Alexander Z Lam fixed
    some logic where the wrong buffer was being modified.

    Andrew Vagin found a possible corruption for machines that actually
    allocate cpumask, as a reference to one was being zeroed out by
    mistake.

    Dhaval Giani found a bad prototype when tracing is not configured.

    And I not only had some changes to help Oleg, but also finally fixed a
    long standing bug that Dave Jones and others have been hitting, where
    a module unload and reload can cause the function tracing accounting
    to get screwed up"

    * tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix reset of time stamps during trace_clock changes
    tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer
    tracing: Fix trace_dump_stack() proto when CONFIG_TRACING is not set
    tracing: Fix fields of struct trace_iterator that are zeroed by mistake
    tracing/uprobes: Fail to unregister if probe event files are in use
    tracing/kprobes: Fail to unregister if probe event files are in use
    tracing: Add comment to describe special break case in probe_remove_event_call()
    tracing: trace_remove_event_call() should fail if call/file is in use
    debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)
    ftrace: Check module functions being traced on reload
    ftrace: Consolidate some duplicate code for updating ftrace ops
    tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private
    tracing: Introduce remove_event_file_dir()
    tracing: Change f_start() to take event_mutex and verify i_private != NULL
    tracing: Change event_filter_read/write to verify i_private != NULL
    tracing: Change event_enable/disable_read() to verify i_private != NULL
    tracing: Turn event/id->i_private into call->event.type

    Linus Torvalds
     

07 Aug, 2013

5 commits

  • Pull cgroup fix from Tejun Heo:
    "Fix for a minor memory leak bug in the cgroup init failure path"

    * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: fix a leak when percpu_ref_init() fails

    Linus Torvalds
     
  • Pull two workqueue fixes from Tejun Heo:
    "A lockdep notation update so that nested work_on_cpu() invocations
    don't lead to spurious lockdep warnings and fix for an unbound attr
    bug which made what's shown in sysfs deviate from the actual ones.
    Both patches have pretty limited scope"

    * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: copy workqueue_attrs with all fields
    workqueue: allow work_on_cpu() to be called recursively

    Linus Torvalds
     
  • Some of my configs I test with have CONFIG_A11Y_BRAILLE_CONSOLE set.
    When I started testing against v3.11-rc4 my console went bonkers. Using
    ktest to bisect the issue, it came down to:

    commit bbeddf52a "printk: move braille console support into separate
    braille.[ch] files"

    Looking into the patch I found the problem. It's with the return value
    of braille_register_console(), as anything other than NULL is considered
    a failure.

    But for those of us that have CONFIG_A11Y_BRAILLE_CONSOLE set but do not
    define a "brl" or "brl=" on the command line, we still may want a
    console that those with sight can still use.

    Return NULL (success) if "brl" or "brl=" is not on the console line.

    Signed-off-by: Steven Rostedt
    Acked-by: Joe Perches
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • This reverts commit fab840fc2d542fabcab903db8e03589a6702ba5f.

    This commit even has the test-case to prove that the tracee
    can be killed by SIGTRAP if the debugger does not remove the
    breakpoints before PTRACE_DETACH.

    However, this is exactly what wineserver deliberately does,
    set_thread_context() calls PTRACE_ATTACH + PTRACE_DETACH just
    for PTRACE_POKEUSER(DR*) in between.

    So we should revert this fix and document that PTRACE_DETACH
    should keep the breakpoints.

    Reported-by: Felipe Contreras
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • unshare_userns(new_cred) does *new_cred = prepare_creds() before
    create_user_ns() which can fail. However, the caller expects that
    it doesn't need to take care of new_cred if unshare_userns() fails.

    We could change the single caller, sys_unshare(), but I think it
    would be more clean to avoid the side effects on failure, so with
    this patch unshare_userns() does put_cred() itself and initializes
    *new_cred only if create_user_ns() succeeds.
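
    The "no side effects on failure" pattern described here can be sketched
    in userspace C. struct toy_cred, toy_create_ns() and toy_unshare() are
    hypothetical stand-ins, with malloc()/free() playing the role of
    prepare_creds()/put_cred():

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct cred. */
struct toy_cred {
    int refs;
};

/* Hypothetical step that may fail, like create_user_ns() in the text. */
int toy_create_ns(struct toy_cred *cred, int should_fail)
{
    (void)cred;
    return should_fail ? -1 : 0;
}

/*
 * Writes *out only on success. On failure the function releases its own
 * allocation and leaves *out untouched, so the caller has nothing to
 * clean up.
 */
int toy_unshare(struct toy_cred **out, int should_fail)
{
    struct toy_cred *cred = malloc(sizeof(*cred));
    int err;

    if (!cred)
        return -1;
    cred->refs = 1;

    err = toy_create_ns(cred, should_fail);
    if (err) {
        free(cred);
        return err;
    }

    *out = cred;
    return 0;
}
```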

    Cc: stable@vger.kernel.org
    Signed-off-by: Oleg Nesterov
    Reviewed-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

03 Aug, 2013

4 commits

  • Fixed two issues with changing the timestamp clock with trace_clock:

    - The global buffer was reset on instance clock changes. Change this to pass
    the correct per-instance buffer
    - ftrace_now() is used to set buf->time_start in tracing_reset_online_cpus().
    This was incorrect because ftrace_now() used the global buffer's clock to
    return the current time. Change this to use buffer_ftrace_now() which
    returns the current time for the correct per-instance buffer.

    Also removed tracing_reset_current() because it is not used anywhere

    Link: http://lkml.kernel.org/r/1375493777-17261-2-git-send-email-azl@google.com

    Cc: Vaibhav Nagarnaik
    Cc: David Sharp
    Cc: Alexander Z Lam
    Cc: stable@vger.kernel.org # 3.10
    Signed-off-by: Alexander Z Lam
    Signed-off-by: Steven Rostedt

    Alexander Z Lam
     
  • Releasing the free_buffer file in an instance causes the global buffer
    to be stopped when TRACE_ITER_STOP_ON_FREE is enabled. Operate on the
    correct buffer.

    Link: http://lkml.kernel.org/r/1375493777-17261-1-git-send-email-azl@google.com

    Cc: Vaibhav Nagarnaik
    Cc: David Sharp
    Cc: Alexander Z Lam
    Cc: stable@vger.kernel.org # 3.10
    Signed-off-by: Alexander Z Lam
    Signed-off-by: Steven Rostedt

    Alexander Z Lam
     
  • tracing_read_pipe zeros all fields below "seq". The declaration contains
    a comment about that, but it doesn't help.

    The first field is "snapshot", it's true when current open file is
    snapshot. Looks obvious, that it should not be zeroed.

    The second field is "started". It was converted from cpumask_t to
    cpumask_var_t (v2.6.28-4983-g4462344), in other words it was
    converted from a cpumask to a pointer to a cpumask.

    Currently the reference to the "started" memory is lost after the first
    read from tracing_read_pipe and the object will never be freed.

    The "started" is never dereferenced for trace_pipe, because trace_pipe
    can't have the TRACE_FILE_ANNOTATE options.
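
    To illustrate why field placement matters with this kind of reset, here
    is a small sketch. struct toy_iter and toy_reset_tail() are hypothetical
    and much simpler than struct trace_iterator; the sketch shows the
    failure mode only, not the actual kernel change:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator: the reset zeroes "seq" and everything after it. */
struct toy_iter {
    int  *kept;     /* declared above "seq": survives the reset */
    char  seq[16];  /* zeroing starts here */
    int  *started;  /* declared below "seq": the pointer is wiped */
    long  pos;
};

/* Zero every field from "seq" to the end of the structure. */
void toy_reset_tail(struct toy_iter *it)
{
    size_t off = offsetof(struct toy_iter, seq);

    memset((char *)it + off, 0, sizeof(*it) - off);
}
```

    After toy_reset_tail(), the pointer stored in "started" is gone, so
    whatever it referred to can no longer be freed through the iterator,
    while "kept" is unaffected.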

    Link: http://lkml.kernel.org/r/1375463803-3085183-1-git-send-email-avagin@openvz.org

    Cc: stable@vger.kernel.org # 2.6.30
    Signed-off-by: Andrew Vagin
    Signed-off-by: Steven Rostedt

    Andrew Vagin
     
  • Pull ACPI and power management fixes from Rafael Wysocki:

    - Revert two cpuidle commits added during the 3.8 development cycle
    that turn out to have introduced a significant performance regression
    as requested by Jeremy Eder.

    - The recent patches that made the freezer less heavy-weight introduced
    a regression causing user-space-driven hibernation using the ioctl()
    interface to block indefinitely when the hibernate process executes
    try_to_freeze(). Fix from Colin Cross addresses this by adding a
    process flag to mark the hibernate/suspend process to inform the
    freezer that that process should be ignored.

    - One of the recent cpufreq reverts uncovered a problem in the core
    causing the cpufreq driver module refcount to become negative after a
    system suspend-resume cycle. Fix from Rafael J Wysocki.

    - The evaluation of the ACPI battery _BIX method has never worked
    correctly, because the commit that added support for it forgot to
    take the "Revision" field in the return package into account. As a
    result, the reading of battery info doesn't work at all on some
    systems, which is addressed by a fix from Lan Tianyu.

    * tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
    ACPI / battery: Fix parsing _BIX return value
    cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
    Revert "cpuidle: Quickly notice prediction failure for repeat mode"
    Revert "cpuidle: Quickly notice prediction failure in general case"

    Linus Torvalds
     

02 Aug, 2013

1 commit

  • Uprobes suffer the same problem that kprobes have. There's a race between
    writing to the "enable" file and removing the probe. The probe checks for
    it being in use and if it is not, goes about deleting the probe and the
    event that represents it. But the problem with that is, after it checks
    if it is in use it can be enabled, and the deletion of the event (access
    to the probe) will fail, as it is in use. But the uprobe will still be
    deleted. This is a problem as the event can reference the uprobe that
    was deleted.

    The fix is to remove the event first, and check to make sure the event
    removal succeeds. Then it is safe to remove the probe.

    When the event exists, either ftrace or perf can enable the probe and
    prevent the event from being removed.
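
    The ordering described above (remove the event first, and only delete
    the probe if that succeeded) might be sketched like this. struct
    toy_probe and both functions are hypothetical, with -1 standing in for
    -EBUSY:

```c
#include <stdbool.h>

/* Hypothetical probe with the event that represents it. */
struct toy_probe {
    bool event_in_use;      /* event is enabled or one of its files is open */
    bool event_registered;
    bool probe_registered;
};

/* Removing the event fails while something is still using it. */
int toy_unregister_event(struct toy_probe *tp)
{
    if (tp->event_in_use)
        return -1;
    tp->event_registered = false;
    return 0;
}

/*
 * Remove the event first. If that fails, bail out and leave the probe
 * intact so the still-live event never references a deleted probe.
 */
int toy_unregister_probe(struct toy_probe *tp)
{
    int ret = toy_unregister_event(tp);

    if (ret)
        return ret;
    tp->probe_registered = false;
    return 0;
}
```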

    Link: http://lkml.kernel.org/r/20130704034038.991525256@goodmis.org

    Acked-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

01 Aug, 2013

8 commits

  • $ echo '0' > /sys/bus/workqueue/devices/xxx/numa
    $ cat /sys/bus/workqueue/devices/xxx/numa

    I got 1, but it should be 0. The reason is that copy_workqueue_attrs(),
    called in apply_workqueue_attrs(), doesn't copy the no_numa field.

    Fix it by making copy_workqueue_attrs() copy ->no_numa too. This
    would also make get_unbound_pool() set a pool's ->no_numa attribute
    according to the workqueue attributes used when the pool was created.
    While harmless, as ->no_numa isn't a pool attribute, this is a bit
    confusing. Clear it explicitly.
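
    A sketch of the two halves of this change, using a simplified,
    hypothetical struct toy_attrs rather than the real workqueue_attrs (the
    cpumask is reduced to a plain bitmask here):

```c
#include <stdbool.h>

/* Hypothetical, simplified attribute set. */
struct toy_attrs {
    int nice;
    unsigned long cpumask;
    bool no_numa;
};

/* Copy every field, including no_numa, which the text says was skipped. */
void toy_copy_attrs(struct toy_attrs *to, const struct toy_attrs *from)
{
    to->nice = from->nice;
    to->cpumask = from->cpumask;
    to->no_numa = from->no_numa;
}

/*
 * Build a pool's attributes from a workqueue's. no_numa is not a pool
 * attribute, so it is cleared explicitly after the full copy.
 */
void toy_init_pool_attrs(struct toy_attrs *pool, const struct toy_attrs *wq)
{
    toy_copy_attrs(pool, wq);
    pool->no_numa = false;
}
```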

    tj: Updated description and comments a bit.

    Signed-off-by: Shaohua Li
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Shaohua Li
     
  • When a probe is being removed, it cleans up the event files that correspond
    to the probe. But there is a race between writing to one of these files
    and deleting the probe. This is especially true for the "enable" file.

    CPU 0                               CPU 1
    -----                               -----

                                        fd = open("enable",O_WRONLY);

    probes_open()
    release_all_trace_probes()
    unregister_trace_probe()
    if (trace_probe_is_enabled(tp))
        return -EBUSY

                                        write(fd, "1", 1)
                                        __ftrace_set_clr_event()
                                        call->class->reg()
                                        (kprobe_register)
                                        enable_trace_probe(tp)

    __unregister_trace_probe(tp);
    list_del(&tp->list)
    unregister_probe_event(tp)
                                        class->unreg
                                        (kprobe_register)
                                        disable_trace_probe(tp)

    ] probes_open+0x3b/0xa7
    PGD 7808a067 PUD 0
    Oops: 0000 [#1] PREEMPT SMP
    Dumping ftrace buffer:
    ---------------------------------
    Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6
    CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47
    Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
    task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000
    RIP: 0010:[] [] probes_open+0x3b/0xa7
    RSP: 0018:ffff880076e53c38 EFLAGS: 00010203
    RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003
    RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000
    RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000
    R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35
    R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450
    FS: 00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0
    Stack:
    ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35
    ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0
    ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440
    Call Trace:
    [] ? security_file_open+0x2c/0x30
    [] ? unregister_trace_probe+0x4b/0x4b
    [] do_dentry_open+0x162/0x226
    [] finish_open+0x46/0x54
    [] do_last+0x7f6/0x996
    [] ? inode_permission+0x42/0x44
    [] path_openat+0x232/0x496
    [] do_filp_open+0x3a/0x8a
    [] ? __alloc_fd+0x168/0x17a
    [] do_sys_open+0x70/0x102
    [] ? trace_hardirqs_on_caller+0x160/0x197
    [] SyS_open+0x1e/0x20
    [] system_call_fastpath+0x16/0x1b
    Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7
    RIP [] probes_open+0x3b/0xa7
    RSP
    CR2: 00000005000000f9
    ---[ end trace 35f17d68fc569897 ]---

    The unregister_trace_probe() must be done first, and if it fails it must
    fail the removal of the kprobe.

    Several changes have already been made by Oleg Nesterov and Masami Hiramatsu
    to allow moving the unregister_probe_event() before the removal of
    the probe and exit the function if it fails. This prevents the tp
    structure from being used after it is freed.

    Link: http://lkml.kernel.org/r/20130704034038.819592356@goodmis.org

    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Merge more patches from Andrew Morton:
    "A bunch of fixes.

    Plus Joe's printk move and rework. It's not a -rc3 thing but now
    would be a nice time to offload it, while things are quiet. I've been
    sitting on it all for a couple of weeks, no issues"

    * emailed patches from Andrew Morton:
    vmpressure: make sure there are no events queued after memcg is offlined
    vmpressure: do not check for pending work to prevent from new work
    vmpressure: change vmpressure::sr_lock to spinlock
    printk: rename struct log to struct printk_log
    printk: use pointer for console_cmdline indexing
    printk: move braille console support into separate braille.[ch] files
    printk: add console_cmdline.h
    printk: move to separate directory for easier modification
    drivers/rtc/rtc-twl.c: fix: rtcX/wakealarm attribute isn't created
    mm: zbud: fix condition check on allocation size
    thp, mm: avoid PageUnevictable on active/inactive lru lists
    mm/swap.c: clear PageActive before adding pages onto unevictable list
    arch/x86/platform/ce4100/ce4100.c: include reboot.h
    mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG
    rapidio: fix use after free in rio_unregister_scan()
    .gitignore: ignore *.lz4 files
    MAINTAINERS: dynamic debug: Jason's not there...
    dmi_scan: add comments on dmi_present() and the loop in dmi_scan_machine()
    ocfs2/refcounttree: add the missing NULL check of the return value of find_or_create_page()
    mm: mempolicy: fix mbind_range() && vma_adjust() interaction

    Linus Torvalds
     
  • Rename the struct to enable moving portions of
    printk.c to separate files.

    The rename changes output of /proc/vmcoreinfo.

    Signed-off-by: Joe Perches
    Cc: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the code a bit more compact by always using a pointer for the active
    console_cmdline.

    Move overly indented code to correct indent level.

    Signed-off-by: Joe Perches
    Cc: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Create files with prototypes and static inlines for braille support. Make
    braille_console functions return 1 on success.

    Corrected CONFIG_A11Y_BRAILLE_CONSOLE=n _braille_console_setup
    return value to NULL.

    Signed-off-by: Joe Perches
    Reviewed-by: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Add an include file for the console_cmdline struct so that the braille
    console driver can be separated.

    Signed-off-by: Joe Perches
    Cc: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make it easier to break up printk into bite-sized chunks.

    Remove printk path/filename from comment.

    Signed-off-by: Joe Perches
    Cc: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches