22 Jul, 2011

1 commit

  • PMU type ids can now be allocated dynamically, so the perf_event_attr::type
    check performed when copying the attribute from userspace to the kernel
    is no longer valid (a sketch of the idea follows this entry).

    Signed-off-by: Lin Ming
    Cc: Robert Richter
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309421396-17438-4-git-send-email-ming.m.lin@intel.com
    Signed-off-by: Ingo Molnar

    Lin Ming
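    A minimal, self-contained sketch of the point above (pmu_sketch and
    type_is_valid() are illustrative names, not the kernel's): once PMU type
    ids can be allocated dynamically, validating attr->type against a static
    upper bound is wrong; the type has to be validated by looking it up among
    the registered PMUs instead.

    #define PERF_TYPE_MAX 6   /* last static type id, for illustration only */

    struct pmu_sketch { int type; struct pmu_sketch *next; };

    static int type_is_valid(const struct pmu_sketch *pmus, int type)
    {
        /* old idea, now invalid: if (type >= PERF_TYPE_MAX) reject */
        for (const struct pmu_sketch *p = pmus; p; p = p->next)
            if (p->type == type)
                return 1;     /* dynamic ids beyond PERF_TYPE_MAX are fine */
        return 0;
    }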
     

21 Jul, 2011

2 commits


16 Jul, 2011

4 commits

  • Since the address of a module-local variable can only be
    resolved after the target module is loaded, the symbol
    fetch-argument must be updated when the target module is
    loaded.

    Signed-off-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20110627072703.6528.75042.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • To support probing module init functions, kprobe-tracer allows the
    user to define a probe on a function that does not exist yet, as long
    as a module name is given. This also enables the user to set a probe
    on a function in a specific module, even if a function with the same
    name (but a different one) is defined locally in another module.

    The module name must come before the function name, separated
    by a ':'. e.g. btrfs:btrfs_init_sysfs

    Signed-off-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20110627072656.6528.89970.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Return -ENOENT if the probe point doesn't exist, but still return
    -EINVAL if both kprobe->addr and kprobe->symbol_name are specified,
    or if neither is specified (see the sketch after this entry).

    Acked-by: Ananth N Mavinakayanahalli
    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Anil S Keshavamurthy
    Cc: "David S. Miller"
    Link: http://lkml.kernel.org/r/20110627072650.6528.67329.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
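    A minimal sketch of the validation logic described above; the real checks
    live in register_kprobe() and differ in detail, and lookup_symbol() here
    is a hypothetical stand-in for the kernel's symbol resolution.

    #include <errno.h>
    #include <stddef.h>

    struct kprobe_sketch {
        void *addr;              /* explicit probe address, or NULL */
        const char *symbol_name; /* symbol to resolve, or NULL      */
    };

    extern void *lookup_symbol(const char *name);  /* hypothetical resolver */

    static int check_probe_point(struct kprobe_sketch *kp)
    {
        /* exactly one of addr / symbol_name may be given */
        if ((kp->addr && kp->symbol_name) || (!kp->addr && !kp->symbol_name))
            return -EINVAL;

        if (kp->symbol_name) {
            kp->addr = lookup_symbol(kp->symbol_name);
            if (!kp->addr)
                return -ENOENT;  /* probe point does not exist */
        }
        return 0;
    }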
     
  • Merge redundant enable/disable functions into enable_trace_probe()
    and disable_trace_probe().

    Signed-off-by: Masami Hiramatsu
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: yrl.pp-manager.tt@hitachi.com
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/20110627072644.6528.26910.stgit@fedora15

    [ converted kprobe selftest to use enable_trace_probe ]

    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     

15 Jul, 2011

4 commits

  • Enabling the function tracer to trace all functions, then loading a module
    and then disabling function tracing will cause ftrace to fail.

    This can also happen by enabling function tracing on the command line:

    ftrace=function

    If modules are loaded during boot up and you then disable function tracing
    with 'echo nop > current_tracer', you will trigger a bug in ftrace that
    causes it to shut itself down.

    The reason is that the new ftrace code keeps ref counts of all ftrace_ops that
    are registered for tracing. When one or more ftrace_ops are registered,
    all the records that represent the functions that the ftrace_ops will
    trace have a ref count incremented. If this ref count is not zero,
    when the code modification runs, that function will be enabled for tracing.
    If the ref count is zero, that function will be disabled from tracing.

    To make sure the accounting was working, FTRACE_WARN_ON()s were added
    to the ref count updates.

    If the ref count hits its max (> 2^30 ftrace_ops added), or if
    the ref count goes below zero, a FTRACE_WARN_ON() is triggered which
    disables all modification of code.

    Since it is common for an ftrace_ops to trace all functions in the kernel,
    instead of creating > 20,000 hash items for the ftrace_ops, the hash
    count is just set to zero, which indicates that the ftrace_ops is
    to trace all functions. This is where the issue arises.

    If you enable function tracing to trace all functions, and then add
    a module, the module's function records do not get their ref counts updated.
    When the function tracer is disabled, all function record ref counts
    are decremented. Since the module's records never had their ref counts
    incremented, they go below zero and the FTRACE_WARN_ON() is triggered.

    The solution to this is rather simple. When modules are loaded, and
    their functions are added to the ftrace pool, look to see if any
    ftrace_ops are registered that trace all functions, and for those,
    update the ref count for the module's function records (a sketch
    follows this entry).

    Reported-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
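    A self-contained sketch of the fix (ops_list, refs and the hash layout
    are illustrative, not the kernel's actual structures): when a module's
    function records are added, bump their ref counts once for every
    registered ftrace_ops whose hash is empty, i.e. that traces all
    functions, so the later global decrement balances out.

    struct func_rec_sketch    { unsigned long ip; int refs; };
    struct filter_hash_sketch { int count; /* 0 means "match all functions" */ };
    struct ftrace_ops_sketch {
        struct filter_hash_sketch *filter;
        struct ftrace_ops_sketch *next;
    };

    static void account_new_module_records(struct ftrace_ops_sketch *ops_list,
                                            struct func_rec_sketch *recs,
                                            int nr_recs)
    {
        for (struct ftrace_ops_sketch *ops = ops_list; ops; ops = ops->next) {
            if (ops->filter->count != 0)
                continue;            /* this ops traces a specific set only */
            for (int i = 0; i < nr_recs; i++)
                recs[i].refs++;      /* record is traced by an "all" ops    */
        }
    }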
     
  • Rename probe_* to trace_probe_* to avoid namespace conflicts. This also
    renames the poorly named find_probe_event() and cleanup_all_probes()
    to find_trace_probe() and release_all_trace_probes() respectively.

    Signed-off-by: Masami Hiramatsu
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20110627072636.6528.60374.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Instead of the hw_nmi_watchdog_set_attr() weak function
    and the corresponding x86_pmu::hw_watchdog_set_attr() call,
    we introduce an event alias mechanism which allows us
    to drop these routines completely and isolate the quirks
    of the Netburst architecture inside the P4 PMU code only
    (a sketch of the idea follows this entry).

    The main idea remains the same though -- to allow the
    nmi-watchdog and perf top to run simultaneously.

    Note the aliasing mechanism applies only to the generic
    PERF_COUNT_HW_CPU_CYCLES event, because an arbitrary
    event (say, one passed as RAW initially) might have some
    additional bits set inside the ESCR register changing
    the behaviour of the event, and we can no longer guarantee
    that the alias event will give the same result.

    P.S. A huge thanks to Don and Steven for testing
    and early review.

    Acked-by: Don Zickus
    Tested-by: Steven Rostedt
    Signed-off-by: Cyrill Gorcunov
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Stephane Eranian
    CC: Lin Ming
    CC: Arnaldo Carvalho de Melo
    CC: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun
    Signed-off-by: Steven Rostedt

    Cyrill Gorcunov
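    A hedged sketch of the aliasing idea (the struct and helper are invented
    for illustration and do not mirror the kernel's P4 binding tables): if
    the canonical cycles encoding cannot get its counter/ESCR pair,
    substitute an alternative encoding that counts the same thing on other
    Netburst resources; only the generic cycles event is aliased, since a
    RAW event may carry extra ESCR bits an alias would not preserve.

    struct p4_alias_sketch {
        unsigned long long original;    /* canonical config for HW_CPU_CYCLES */
        unsigned long long alternative; /* alias config using spare resources */
    };

    static unsigned long long
    pick_config(const struct p4_alias_sketch *alias,
                unsigned long long config, int original_unavailable)
    {
        if (config == alias->original && original_unavailable)
            return alias->alternative;  /* fall back to the alias encoding */
        return config;                  /* anything else is left untouched */
    }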
     
  • Currently the stack trace per event in ftrace is only 8 frames.
    This can be quite limiting and sometimes useless, especially when
    the "ignore frames" count is wrong and we also use up stack frames for
    the event processing itself.

    Change this to be dynamic by adding a percpu buffer that we can
    write a large stack trace into and then copy into the ring buffer
    (a sketch follows this entry).

    Interrupts and NMIs that come in while another event is being
    processed will only get to use the 8-frame stack. That should be enough,
    as the task they interrupted will have the full stack trace anyway.

    Requested-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
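    A self-contained sketch of the scheme (sizes, names and the capture/write
    helpers are illustrative stand-ins for save_stack_trace() and the ring
    buffer API): each CPU gets one large buffer for deep stack traces, while
    nested users that arrive while the buffer is busy fall back to a small
    8-entry on-stack array.

    #define DEEP_STACK_ENTRIES    128
    #define SHALLOW_STACK_ENTRIES 8
    #define NR_CPUS_SKETCH        4

    static unsigned long deep_stack[NR_CPUS_SKETCH][DEEP_STACK_ENTRIES];
    static int deep_stack_in_use[NR_CPUS_SKETCH];

    extern int  capture_stack(unsigned long *entries, int max_entries);
    extern void ring_buffer_write(const unsigned long *entries, int nr);

    static void record_stack(int cpu)
    {
        unsigned long shallow[SHALLOW_STACK_ENTRIES];
        unsigned long *buf = deep_stack[cpu];
        int max = DEEP_STACK_ENTRIES, nr;

        if (deep_stack_in_use[cpu]) {   /* nested irq/NMI: small buffer only */
            buf = shallow;
            max = SHALLOW_STACK_ENTRIES;
        } else {
            deep_stack_in_use[cpu] = 1;
        }

        nr = capture_stack(buf, max);
        ring_buffer_write(buf, nr);     /* copy the trace into the ring buffer */

        if (buf != shallow)
            deep_stack_in_use[cpu] = 0; /* release the deep per-CPU buffer */
    }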
     

14 Jul, 2011

3 commits

  • Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will
    fail the dynamic ftrace selftest.

    The function tracer has a quick 'off' variable that will prevent
    the call back functions from being called. This variable is called
    function_trace_stop. In x86, this is implemented directly in the mcount
    assembly, but for other archs, an intermediate function called
    ftrace_test_stop_func() is used.

    In dynamic ftrace, the function pointer variable ftrace_trace_function is
    used to update the caller code in the mcount caller. But for archs that
    do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls
    ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function.

    When more than one ftrace_ops is registered, the function it calls is
    ftrace_ops_list_func(), which will iterate over all registered ftrace_ops
    and call the callbacks that have their hash matching.

    The issue happens when two ftrace_ops are registered for different functions
    and one is then unregistered. The __ftrace_trace_function is then pointed
    directly to the remaining ftrace_ops callback function. This means it will
    be called for all functions that either of the two registered ftrace_ops
    was set to trace.

    This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST,
    because the update of ftrace_trace_function doesn't happen until after all
    functions have been updated, and then the mcount caller is updated. But
    for those archs that do use the ftrace_test_stop_func(), the update is
    immediate.

    The dynamic selftest fails because it hits this situation, and the
    ftrace_ops that it registers fails to trace only what it was supposed to
    and instead traces all other functions.

    The solution is to delay the setting of __ftrace_trace_function until
    after all the functions have been updated according to the registered
    ftrace_ops (see the sketch after this entry). Also, function_trace_stop
    is set during the update to prevent function tracing from calling code
    triggered by the function tracer itself.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
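    A hedged sketch of the ordering fix (names are simplified; the real code
    manipulates __ftrace_trace_function and the ftrace records): raise the
    stop flag, update all per-function records first, and only then publish
    the callback that ftrace_test_stop_func() dereferences.

    typedef void (*trace_func_t)(unsigned long ip);

    static volatile int function_trace_stop_sketch;
    static trace_func_t current_trace_func;  /* plays __ftrace_trace_function */

    static void ftrace_test_stop_func_sketch(unsigned long ip)
    {
        if (function_trace_stop_sketch)
            return;                 /* quick 'off' switch for archs without
                                       the MCOUNT_TEST in assembly */
        current_trace_func(ip);
    }

    extern void update_all_function_records(void);  /* stand-in helper */

    static void switch_callback(trace_func_t new_func)
    {
        function_trace_stop_sketch = 1;  /* silence tracing during the switch */
        update_all_function_records();   /* enable/disable records per ops    */
        current_trace_func = new_func;   /* publish only after records match  */
        function_trace_stop_sketch = 0;
    }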
     
  • Currently, if set_ftrace_filter() is called when the ftrace_ops is
    active, the function filters will not be updated. They will only be updated
    when tracing is disabled and re-enabled.

    Update the functions immediately during set_ftrace_filter().

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Whenever the hash of the ftrace_ops is updated, the record counts
    must be balanced. This requires disabling the records that are set
    in the original hash, and then enabling the records that are set
    in the updated hash.

    Moving the update into ftrace_hash_move() removes the bug where the
    hash was updated but the records were not. Without it, if the ftrace_ops
    filter is updated while the ftrace_ops is registered, ftrace triggers
    a warning and disables itself when the ftrace_ops is later unregistered.

    The current code will not trigger this bug, but new code will.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Jul, 2011

2 commits

  • When I mounted an NFS directory, it caused several modules to be loaded. At the
    time I was running the preemptirqsoff tracer, and it showed the following
    output:

    # tracer: preemptirqsoff
    #
    # preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test
    # --------------------------------------------------------------------
    # latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
    # -----------------
    # | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    # => started at: ftrace_module_notify
    # => ended at: ftrace_module_notify
    #
    #
    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| /_--=> lock-depth
    # |||||/ delay
    # cmd pid |||||| time | caller
    # \ / |||||| \ | /
    modprobe-19370 3d.... 0us!: ftrace_process_locs
    => ftrace_process_locs
    => ftrace_module_notify
    => notifier_call_chain
    => __blocking_notifier_call_chain
    => blocking_notifier_call_chain
    => sys_init_module
    => system_call_fastpath

    That's over 1ms that interrupts are disabled on a Real-Time kernel!

    Looking at the cause (being the ftrace author helped), I found that the
    interrupts are disabled before the code modification of mcounts into nops. The
    interrupts only need to be disabled on start up around this code, not when
    modules are being loaded.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If a function is set to be traced by set_graph_function, but the
    option funcgraph-irqs is zero, and the traced function happens to be
    called from an interrupt, it will not be traced.

    The point of funcgraph-irqs is to not trace interrupts when we are
    preempted by an irq, not to skip functions we want to trace that
    happen to be *in* an irq.

    Luckily, the current->trace_recursion element is a perfect place to add
    a flag that lets us trace functions within an interrupt even when
    we are not tracing the interrupts that preempt the trace.

    Reported-by: Heiko Carstens
    Tested-by: Heiko Carstens
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

07 Jul, 2011

3 commits

  • The new code that allows different utilities to pick and choose
    what functions they trace broke the :mod: hook that allows users
    to trace only functions of a particular module.

    The reason is that the :mod: hook bypasses the hash that is set up
    to allow individual users to trace their own functions and uses
    the global hash directly. But if the global hash has not been
    set up, it will cause a bug:

    echo '*:mod:radeon' > /sys/kernel/debug/set_ftrace_filter

    produces:

    [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
    [drm:radeon_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
    BUG: unable to handle kernel paging request at ffffffff8160ec90
    IP: [] add_hash_entry+0x66/0xd0
    PGD 1a05067 PUD 1a09063 PMD 80000000016001e1
    Oops: 0003 [#1] SMP Jul 7 04:02:28 phyllis kernel: [55303.858604] CPU 1
    Modules linked in: cryptd aes_x86_64 aes_generic binfmt_misc rfcomm bnep ip6table_filter hid radeon r8169 ahci libahci mii ttm drm_kms_helper drm video i2c_algo_bit intel_agp intel_gtt

    Pid: 10344, comm: bash Tainted: G WC 3.0.0-rc5 #1 Dell Inc. Inspiron N5010/0YXXJJ
    RIP: 0010:[] [] add_hash_entry+0x66/0xd0
    RSP: 0018:ffff88003a96bda8 EFLAGS: 00010246
    RAX: ffff8801301735c0 RBX: ffffffff8160ec80 RCX: 0000000000306ee0
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880137c92940
    RBP: ffff88003a96bdb8 R08: ffff880137c95680 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81c9df78
    R13: ffff8801153d1000 R14: 0000000000000000 R15: 0000000000000000
    FS: 00007f329c18a700(0000) GS:ffff880137c80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffff8160ec90 CR3: 000000003002b000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process bash (pid: 10344, threadinfo ffff88003a96a000, task ffff88012fcfc470)
    Stack:
    0000000000000fd0 00000000000000fc ffff88003a96be38 ffffffff810d92f5
    ffff88011c4c4e00 ffff880000000000 000000000b69f4d0 ffffffff8160ec80
    ffff8800300e6f06 0000000081130295 0000000000000282 ffff8800300e6f00
    Call Trace:
    [] match_records+0x155/0x1b0
    [] ftrace_mod_callback+0xbc/0x100
    [] ftrace_regex_write+0x16f/0x210
    [] ftrace_filter_write+0xf/0x20
    [] vfs_write+0xc8/0x190
    [] sys_write+0x51/0x90
    [] system_call_fastpath+0x16/0x1b
    Code: 48 8b 33 31 d2 48 85 f6 75 33 49 89 d4 4c 03 63 08 49 8b 14 24 48 85 d2 48 89 10 74 04 48 89 42 08 49 89 04 24 4c 89 60 08 31 d2
    RIP [] add_hash_entry+0x66/0xd0
    RSP
    CR2: ffffffff8160ec90
    ---[ end trace a5d031828efdd88e ]---

    Reported-by: Brian Marete
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The "enable" file for the event system can be removed when a module
    is unloaded and the event system only has events from that module.
    As the event system nr_events count goes to zero, it may be freed
    if its ref_count is also set to zero.

    Like the "filter" file, the "enable" file may be opened by a task and
    referenced later, after a module has been unloaded and the events for
    that event system have been removed.

    Although the "filter" file referenced the event system structure,
    the "enable" file only references a pointer to the event system
    name. Since the name is freed when the event system is removed,
    it is possible that an access to the "enable" file may reference
    a freed pointer.

    Update the "enable" file to use the subsystem_open() routine that
    the "filter" file uses, to keep a reference to the event system
    structure while the "enable" file is opened.

    Cc:
    Reported-by: Johannes Berg
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The event system is freed when its nr_events is set to zero. This happens
    when a module created an event system and then later the module is
    removed. Modules may share systems, so the system is allocated when
    it is created and freed when the modules are unloaded and all the
    events under the system are removed (nr_events set to zero).

    The problem arises when a task opened the "filter" file for the
    system. If the module is unloaded and it removed the last event for
    that system, the system structure is freed. If the task that opened
    the filter file accesses the "filter" file after the system has
    been freed, the system will access an invalid pointer.

    By adding a ref_count, and using it to keep track of what
    is using the event system, we can free it after all users
    are finished with it (a sketch follows this entry).

    Cc:
    Reported-by: Johannes Berg
    Signed-off-by: Steven Rostedt

    Steven Rostedt
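    A self-contained sketch of the ref-counting scheme: the subsystem is only
    freed once both the event count and the reference count from open files
    drop to zero. Field and function names are illustrative, not the
    kernel's.

    #include <stdlib.h>

    struct event_subsystem_sketch {
        char *name;
        int nr_events;   /* events currently registered under this system  */
        int ref_count;   /* open "filter"/"enable" files holding a reference */
    };

    static void subsystem_free_if_unused(struct event_subsystem_sketch *s)
    {
        if (s->nr_events == 0 && s->ref_count == 0) {
            free(s->name);
            free(s);
        }
    }

    static void subsystem_get(struct event_subsystem_sketch *s)
    {
        s->ref_count++;             /* taken in the open() path */
    }

    static void subsystem_put(struct event_subsystem_sketch *s)
    {
        s->ref_count--;             /* dropped in the release() path */
        subsystem_free_if_unused(s);
    }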
     

05 Jul, 2011

1 commit


01 Jul, 2011

10 commits

  • KVM needs one-shot samples, since a PMC programmed to -X will fire after X
    events and then again after 2^40 events (i.e. variable period).

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • The perf_event overflow handler does not receive any caller-derived
    argument, so many callers need to resort to looking up the perf_event
    in their local data structure. This is ugly and doesn't scale if a
    single callback services many perf_events.

    Fix by adding a context parameter to perf_event_create_kernel_counter()
    (and the derived hardware breakpoint APIs) and storing it in the
    perf_event. The field can be accessed from the callback as
    event->overflow_handler_context. All callers are updated (see the
    sketch after this entry).

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
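    A hedged sketch of the pattern this patch introduces (types and names are
    simplified, not the kernel's API): the creator of a kernel counter passes
    an opaque context pointer, the core stores it in the event, and the
    overflow callback reads it back instead of searching its own data
    structures.

    #include <stdio.h>

    struct event_sketch;
    typedef void (*overflow_handler_t)(struct event_sketch *ev);

    struct event_sketch {
        overflow_handler_t overflow_handler;
        void *overflow_handler_context;    /* the new field */
    };

    static struct event_sketch
    create_counter(overflow_handler_t handler, void *context)
    {
        struct event_sketch ev = {
            .overflow_handler = handler,
            .overflow_handler_context = context,  /* stored at creation time */
        };
        return ev;
    }

    struct my_vcpu { int id; };

    static void my_overflow(struct event_sketch *ev)
    {
        struct my_vcpu *vcpu = ev->overflow_handler_context; /* no lookup */
        printf("overflow for vcpu %d\n", vcpu->id);
    }

    int main(void)
    {
        struct my_vcpu vcpu = { .id = 3 };
        struct event_sketch ev = create_counter(my_overflow, &vcpu);
        ev.overflow_handler(&ev);     /* simulate an overflow */
        return 0;
    }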
     
  • Since only samples call perf_output_sample(), it's much saner (and more
    correct) to put the sample logic in there than in the
    perf_output_begin()/perf_output_end() pair.

    Saves a useless argument, reduces conditionals and shrinks
    struct perf_output_handle, win!

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The nmi parameter indicated if we could do wakeups from the current
    context, if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Due to restrictions and specifics of the Netburst PMU we need a separate
    event for the NMI watchdog. In particular, every Netburst event
    consumes not just a counter and a config register, but also an
    additional ESCR register.

    Since ESCR registers are grouped upon counters (i.e. if an ESCR is occupied
    by some event there is no room for another event to enter until it is
    released) we need to pick the "least" used ESCR (or the most available
    one) for nmi-watchdog purposes -- so MSR_P4_CRU_ESCR2/3 was chosen.

    With this patch the nmi-watchdog and perf top should be able to run simultaneously.

    Signed-off-by: Cyrill Gorcunov
    CC: Lin Ming
    CC: Arnaldo Carvalho de Melo
    CC: Frederic Weisbecker
    Tested-and-reviewed-by: Don Zickus
    Tested-and-reviewed-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110623124918.GC13050@sun
    Signed-off-by: Ingo Molnar

    Cyrill Gorcunov
     
  • The event tracing infrastructure exposes two timers which should be updated
    each time the value of the counter is updated. Currently, they are
    only updated when userspace calls read() on the fd associated with an event.
    This means that counters which are read exclusively via the mmap'd page
    never have their timers updated. This patch ensures that the timers are updated
    each time the values in the mmap'd page are updated.

    Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308932786-5111-1-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Take the timer calculation from perf_output_read and move it to a helper
    function for any place that needs timer values but cannot take the ctx->lock.

    Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308861279-15216-2-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308861279-15216-1-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Since 2.6.36 (specifically commit d57e34fdd60b ("perf: Simplify the
    ring-buffer logic: make perf_buffer_alloc() do everything needed")),
    the perf_buffer_init code has been mis-setting the buffer watermark
    if perf_event_attr.wakeup_events has a non-zero value.

    This is because perf_event_attr.wakeup_events is a union with
    perf_event_attr.wakeup_watermark.

    This commit re-enables the check for perf_event_attr.watermark being
    set before continuing with setting a non-default watermark (a sketch
    follows this entry).

    This bug is most noticeable when you are trying to use
    PERF_EVENT_IOC_REFRESH with a value larger than one while
    perf_event_attr.wakeup_events is set to one. In this case the buffer
    watermark will be set to 1 and you will get extraneous POLL_IN
    overflows rather than POLL_HUP as expected.

    [ avoid using attr.wakeup_events when attr.watermark is set ]

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Cc:
    Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
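    A self-contained sketch of the union pitfall and the fix (attr_sketch is
    illustrative, not the real perf_event_attr): wakeup_events and
    wakeup_watermark share storage, so the watermark path must only be taken
    when attr.watermark says so.

    #include <stdio.h>

    struct attr_sketch {
        int watermark;              /* 1: wakeup_watermark is meaningful */
        union {
            unsigned int wakeup_events;     /* wake up every N events */
            unsigned int wakeup_watermark;  /* wake up every N bytes  */
        };
    };

    static unsigned int pick_watermark(const struct attr_sketch *attr,
                                       unsigned int default_watermark)
    {
        /* broken: treating a non-zero union value as a watermark even when
         * the caller actually set wakeup_events */
        /* if (attr->wakeup_watermark) return attr->wakeup_watermark; */

        /* fixed: only honour the field the caller said it set */
        if (attr->watermark && attr->wakeup_watermark)
            return attr->wakeup_watermark;
        return default_watermark;
    }

    int main(void)
    {
        struct attr_sketch a = { .watermark = 0, .wakeup_events = 1 };
        printf("watermark = %u\n", pick_watermark(&a, 4096)); /* stays 4096 */
        return 0;
    }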
     
  • Merge reason: Pick up the latest fixes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

29 Jun, 2011

1 commit

  • The jump label entries for modules do not stop at __stop___jump_table,
    but at mod->jump_entries + mod->num_jump_entries.

    By checking the wrong end point, module trace events never get enabled
    (see the sketch after this entry).

    Cc: Ingo Molnar
    Acked-by: Jason Baron
    Tested-by: Avi Kivity
    Tested-by: Johannes Berg
    Signed-off-by: Xiao Guangrong
    Link: http://lkml.kernel.org/r/4E00038B.2060404@cn.fujitsu.com
    Signed-off-by: Steven Rostedt

    Xiao Guangrong
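    A sketch of the range fix (structures modeled on, but not identical to,
    the kernel's jump label code): a module's entries live in its own table,
    so iteration must stop at mod->jump_entries + mod->num_jump_entries, not
    at the core kernel's __stop___jump_table.

    struct jump_entry_sketch { unsigned long code, target, key; };

    struct module_sketch {
        struct jump_entry_sketch *jump_entries;
        unsigned int num_jump_entries;
    };

    static void
    update_module_jump_entries(struct module_sketch *mod,
                               void (*update)(struct jump_entry_sketch *))
    {
        struct jump_entry_sketch *iter = mod->jump_entries;
        struct jump_entry_sketch *stop = mod->jump_entries + mod->num_jump_entries;

        for (; iter < stop; iter++)   /* correct end point for module entries */
            update(iter);
    }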
     

28 Jun, 2011

1 commit

  • Currently a single process may register exit handlers an unlimited number
    of times. This may lead to a bloated listeners chain and very slow process
    terminations.

    E.g. after 10 million TASKSTATS_CMD_ATTR_REGISTER_CPUMASK requests, ~300 MB
    of kernel memory is consumed by the handlers chain and "time id" takes 2-7
    seconds instead of the normal 0.003. This makes it possible to exhaust all
    kernel memory and to eat much CPU time by triggering numerous exits
    on a single CPU.

    The patch limits the number of times a single process may register
    itself on a single CPU to one (a sketch follows this entry).

    One little issue is left unfixed - as taskstats_exit() is called before
    exit_files() in do_exit(), the orphaned listener entry (if it was not
    explicitly deregistered) is kept until someone else's next exit() and
    the implicit deregistration in send_cpu_listeners(). So, if a process
    registered as a listener exits and the next spawned process gets
    the same pid, it would inherit the taskstats attributes.

    Signed-off-by: Vasiliy Kulikov
    Cc: Balbir Singh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
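    A minimal sketch of the limit: before adding a listener for this CPU,
    walk the per-CPU list and refuse a duplicate registration for the same
    pid. Names are illustrative, and the real patch's handling of a
    duplicate may differ in detail.

    #include <errno.h>

    struct listener_sketch { int pid; struct listener_sketch *next; };

    static int add_listener_once(struct listener_sketch **head, int pid,
                                 struct listener_sketch *new_entry)
    {
        for (struct listener_sketch *l = *head; l; l = l->next)
            if (l->pid == pid)
                return -EEXIST;   /* already registered on this CPU */

        new_entry->pid = pid;
        new_entry->next = *head;
        *head = new_entry;
        return 0;
    }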
     

25 Jun, 2011

1 commit


22 Jun, 2011

3 commits

  • Toralf Förster and Richard Weinberger noted that if there is
    no RTC device, the alarm timers core prints out an annoying
    "ALARM timers will not wake from suspend" message.

    This warning has been removed in a previous patch; however,
    the issue still remains: the original idea was to support
    alarm timers even if there was no RTC device, as long as the
    system didn't go into suspend.

    However, after further consideration, communicating to the application
    that alarmtimers are not fully functional seems like the better
    solution.

    So this patch makes it so we return -ENOTSUPP for any posix _ALARM
    clockid calls if there is no backing RTC device on the system
    (a sketch follows this entry).

    Further, when there is no RTC device we now check for one in
    clock_getres, clock_gettime, timer_create, and timer_nsleep
    instead of at suspend time.

    CC: Toralf Förster
    CC: Richard Weinberger
    CC: Thomas Gleixner
    Reported-by: Toralf Förster
    Reported-by: Richard Weinberger
    Signed-off-by: John Stultz

    John Stultz
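    A minimal sketch of the behaviour change (ENOTSUPP is the kernel-internal
    errno value, defined here by hand; the RTC lookup helper is a
    hypothetical stand-in): every _ALARM clockid entry point first checks
    whether an RTC device is available and bails out if not.

    #include <stddef.h>

    #define ENOTSUPP 524    /* kernel-internal, not in userspace errno.h */

    struct rtc_device_sketch;
    extern struct rtc_device_sketch *alarmtimer_rtcdev_sketch(void);

    static int alarm_clock_getres_sketch(long *res_ns)
    {
        if (!alarmtimer_rtcdev_sketch())
            return -ENOTSUPP;   /* no RTC: alarm clockids not supported */
        *res_ns = 1;            /* illustrative resolution value */
        return 0;
    }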
     
  • The alarmtimers code currently picks an RTC device to use at
    late init time. However, if your RTC driver is loaded as a module,
    it may be registered after the alarmtimers late init code runs, leaving
    the alarmtimers nonfunctional.

    This patch moves the RTC device selection to the point where we actually
    try to use it, allowing us to make use of RTC modules that may have been
    loaded at any point since bootup.

    CC: Thomas Gleixner
    CC: Meelis Roos
    Reported-by: Meelis Roos
    Signed-off-by: John Stultz

    John Stultz
     
  • When the /dev/snapshot device is opened, snapshot_open() creates memory
    bitmaps which are freed in snapshot_release(). But if any of the
    callbacks called by pm_notifier_call_chain() returns NOTIFY_BAD, open()
    fails, snapshot_release() is never called and the bitmaps are not freed.
    The next attempt to open /dev/snapshot then triggers the BUG_ON() check in
    create_basic_memory_bitmaps(). This happens e.g. when the vmwatchdog module
    is active on s390x (a sketch of the fix follows this entry).

    Signed-off-by: Michal Kubecek
    Signed-off-by: Rafael J. Wysocki
    Cc: stable@kernel.org

    Michal Kubecek
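    A sketch of one way to plug the leak described above (the helpers are
    stand-ins for create_basic_memory_bitmaps(), free_basic_memory_bitmaps()
    and pm_notifier_call_chain(); the actual patch may order the calls
    differently): if the notifier chain rejects the open, the bitmaps
    created earlier must be freed before returning the error.

    extern int  create_bitmaps_sketch(void);
    extern void free_bitmaps_sketch(void);
    extern int  notify_pm_chain_sketch(void);

    static int snapshot_open_sketch(void)
    {
        int error = create_bitmaps_sketch();
        if (error)
            return error;

        error = notify_pm_chain_sketch();
        if (error) {
            free_bitmaps_sketch();  /* the missing cleanup on the error path */
            return error;
        }
        return 0;
    }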
     

20 Jun, 2011

1 commit

  • …-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tools/perf: Fix static build of perf tool
    tracing: Fix regression in printk_formats file

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clocksource: Make watchdog robust vs. interruption
    timerfd: Fix wakeup of processes when timer is cancelled on clock change

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, MAINTAINERS: Add x86 MCE people
    x86, efi: Do not reserve boot services regions within reserved areas

    Linus Torvalds
     

19 Jun, 2011

1 commit


18 Jun, 2011

1 commit

  • ____call_usermodehelper() now erases any credentials set by the
    subprocess_info::init() function. The problem is that commit
    17f60a7da150 ("capabilites: allow the application of capability limits
    to usermode helpers") creates and commits new credentials with
    prepare_kernel_cred() after the call to the init() function. This wipes
    all keyrings after umh_keys_init() is called.

    The best way to deal with this is to put the init() call just prior to
    the commit_creds() call, and pass the cred pointer to init(). That
    means that umh_keys_init() and suchlike can modify the credentials
    _before_ they are published and potentially in use by the rest of the
    system.

    The bug prevents request_key() from working, as request_key() is kept from
    passing the session keyring it set up with the authorisation token to
    /sbin/request-key, and so the latter can't assume the authority to
    instantiate the key. This causes the in-kernel DNS resolver to fail
    with ENOKEY unconditionally.

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Tested-by: Jeff Layton
    Signed-off-by: Linus Torvalds

    David Howells
     

17 Jun, 2011

1 commit

  • There is a problem where kdump (the 2nd kernel) sometimes hangs due
    to a pending IPI from the 1st kernel. A kernel panic occurs because the
    IPI arrives before call_single_queue is initialized.

    To fix the crash, rename init_call_single_data() to call_function_init()
    and call it in start_kernel() so that call_single_queue can be
    initialized before enabling interrupts (a sketch follows this entry).

    The details of the crash are:

    (1) The 2nd kernel boots up.

    (2) A pending IPI from the 1st kernel arrives when irqs are first enabled
    in start_kernel().

    (3) The kernel tries to handle the interrupt, but call_single_queue
    is not initialized yet at this point. As a result, in
    generic_smp_call_function_single_interrupt(), a NULL pointer
    dereference occurs when list_replace_init() tries to access
    &q->list.next.

    Therefore this patch changes the name of init_call_single_data()
    to call_function_init() and calls it before local_irq_enable()
    in start_kernel().

    Signed-off-by: Takao Indoh
    Reviewed-by: WANG Cong
    Acked-by: Neil Horman
    Acked-by: Vivek Goyal
    Acked-by: Peter Zijlstra
    Cc: Milton Miller
    Cc: Jens Axboe
    Cc: Paul E. McKenney
    Cc: kexec@lists.infradead.org
    Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
    Signed-off-by: Ingo Molnar

    Takao Indoh
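    A hedged sketch of the ordering (start_kernel() is reduced to the two
    calls that matter here; local_irq_enable_sketch() stands in for the real
    local_irq_enable()): the per-CPU call_single_queue must exist before the
    first IPI can be handled, so its initialization moves ahead of enabling
    interrupts.

    extern void call_function_init(void);     /* was init_call_single_data() */
    extern void local_irq_enable_sketch(void);

    static void start_kernel_sketch(void)
    {
        call_function_init();       /* set up call_single_queue first ...   */
        local_irq_enable_sketch();  /* ... so a pending IPI from the 1st
                                       kernel finds a valid queue to use */
    }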