13 Oct, 2010

1 commit

  • Time stamps for the ring buffer are created by the difference between
    two events. Each page of the ring buffer holds a full 64 bit timestamp.
    Each event has a 27 bit delta stamp from the last event. The unit of time
    is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
    happen more than 134 milliseconds apart, a time extend is inserted
    to add more bits for the delta. The time extend has 59 bits, which
    is good for ~18 years.
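    The ranges quoted above follow directly from the bit widths if one unit
    equals one nanosecond. This is a minimal C check. The helper names are
    illustrative and are not taken from the ring buffer code.

```c
#include <stdint.h>

/* Largest delta (in ns) representable in `bits` bits, i.e. 2^bits - 1.
 * Illustrative helper; not part of kernel/trace/ring_buffer.c. */
static uint64_t max_delta_ns(unsigned bits)
{
    return (bits >= 64) ? UINT64_MAX : ((uint64_t)1 << bits) - 1;
}

/* Convert nanoseconds to milliseconds. */
static double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1e6;
}

/* Convert nanoseconds to years of 365.25 days. */
static double ns_to_years(uint64_t ns)
{
    return (double)ns / 1e9 / (365.25 * 24.0 * 3600.0);
}
```

    With these helpers, ns_to_ms(max_delta_ns(27)) comes out at about 134.2 ms.
    ns_to_years(max_delta_ns(59)) comes out at about 18.3 years. Both match
    the figures in the text.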

    Currently the time extend is committed separately from the event.
    If an event is discarded before it is committed, due to filtering,
    the time extend still exists. If all events are being filtered, then
    after ~134 milliseconds a new time extend will be added to the buffer.

    This can only happen until the end of the page. Since each page holds
    a full timestamp, there is no reason to add a time extend to the
    beginning of a page. Time extends can only fill a page that has actual
    data at the beginning, so there is no fear that time extends will fill
    more than a page without any data.

    When reading an event, a loop is made to skip over time extends
    since they are only used to maintain the time stamp and are never
    given to the caller. As a paranoid check to prevent the loop running
    forever, with the knowledge that time extends may only fill a page,
    a check is made that tests the iteration of the loop, and if the
    iteration is more than the number of time extends that can fit in a page
    a warning is printed and the ring buffer is disabled (all of ftrace
    is also disabled with it).

    There is another event type that is called a TIMESTAMP which can
    hold 64 bits of data in the theoretical case that two events happen
    18 years apart. This code has not been implemented, but the name
    of this event exists, as well as the structure for it. The
    size of a TIMESTAMP is 16 bytes, whereas a time extend is only
    8 bytes. The macro used to calculate how many time extends can fit on
    a page used the TIMESTAMP size instead of the time extend size,
    cutting the amount in half.

    The following test case can easily trigger the warning since we only
    need to have half the page filled with time extends to trigger the
    warning:

    # cd /sys/kernel/debug/tracing/
    # echo function > current_tracer
    # echo 'common_pid < 0' > events/ftrace/function/filter
    # echo > trace
    # echo 1 > trace_marker
    # sleep 120
    # cat trace

    The test enables the function tracer and sets the filter to trace only
    functions whose process id is negative (so no events are recorded). It
    then clears the trace buffer to make sure it is empty, writes to
    trace_marker to add an event to the beginning of a page, sleeps for
    2 minutes (only about 35 seconds is probably needed, but this
    guarantees the bug), and finally reads the trace, which triggers
    the bug.
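    The halved limit and the "35 seconds" estimate can be reproduced with some
    simple arithmetic. This sketch makes several assumptions: a 4096-byte page,
    a 16-byte page header, 8-byte time extends, 16-byte TIMESTAMPs, and one
    extend per 2^27 ns delta overflow. The TOY_* names are hypothetical and
    only mirror the description above.

```c
#include <stdint.h>

/* Assumed sizes (x86_64, 4 KB pages); names are illustrative only. */
#define TOY_PAGE_SIZE       4096u
#define TOY_PAGE_HDR_SIZE   16u   /* assumed: u64 time stamp + commit field */
#define TOY_DATA_SIZE       (TOY_PAGE_SIZE - TOY_PAGE_HDR_SIZE)
#define TOY_LEN_TIME_EXTEND 8u
#define TOY_LEN_TIME_STAMP  16u

/* Buggy limit: divides by the TIMESTAMP size, halving the count. */
#define TOY_EXTENDS_PER_PAGE_BUGGY (TOY_DATA_SIZE / TOY_LEN_TIME_STAMP)
/* Fixed limit: divides by the size of a time extend. */
#define TOY_EXTENDS_PER_PAGE_FIXED (TOY_DATA_SIZE / TOY_LEN_TIME_EXTEND)

/* Seconds of fully filtered activity needed to insert `n` time extends,
 * assuming one extend every 2^27 ns (~134 ms). */
static double seconds_for_extends(unsigned n)
{
    return n * (double)((uint64_t)1 << 27) / 1e9;
}
```

    Under these assumptions the buggy limit is 255 extends and the fixed limit
    is 510. Exceeding 255 extends takes roughly 34 seconds, which is consistent
    with the estimate in the test description.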

    This patch fixes the typo and prevents the false positive of that warning.

    Reported-by: Hans J. Koch
    Tested-by: Hans J. Koch
    Cc: Thomas Gleixner
    Cc: Stable Kernel
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

10 Sep, 2010

3 commits

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
    perf symbols: Fix multiple initialization of symbol system
    perf: Fix CPU hotplug
    perf, trace: Fix module leak
    tracing/kprobe: Fix handling of C-unlike argument names
    tracing/kprobes: Fix handling of argument names
    perf probe: Fix handling of arguments names
    perf probe: Fix return probe support
    tracing/kprobe: Fix a memory leak in error case
    tracing: Do not allow llseek to set_ftrace_filter

    Linus Torvalds
     
  • Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
    having properly started the iterator to iterate the hash. This case is
    degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
    to misuse a pointer. This causes a NULL ptr deref with possible security
    implications. Tracked as CVE-2010-3079.
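    The following toy model illustrates how a stale iterator flag can lead to a
    bad pointer. It is purely hypothetical and is not the ftrace code. A
    two-phase iterator first lists functions and then hash entries. If a
    reposition (seek/pread) calls start() without clearing the phase flag,
    show() takes the hash path with no hash pointer. Here show() returns NULL
    where the kernel would dereference a bad pointer.

```c
#include <stddef.h>

#define TOY_ITER_HASH 0x1u  /* hypothetical stand-in for FTRACE_ITER_HASH */

struct toy_iter {
    unsigned flags;
    size_t pos;
    const char *const *funcs;       /* phase 1 entries */
    size_t nfuncs;
    const char *const *hash;        /* phase 2 cursor; NULL until started */
    const char *const *hash_table;  /* backing storage for phase 2 */
};

/* start(): called for every read()/pread() at position pos. */
static void toy_start(struct toy_iter *it, size_t pos, int reset_hash_flag)
{
    if (reset_hash_flag)
        it->flags &= ~TOY_ITER_HASH;  /* the fix: drop stale phase state */
    it->pos = pos;
    it->hash = NULL;
    if (pos >= it->nfuncs) {          /* properly enter phase 2 */
        it->flags |= TOY_ITER_HASH;
        it->hash = it->hash_table;
    }
}

/* show(): NULL marks where a real iterator would misuse a pointer. */
static const char *toy_show(const struct toy_iter *it)
{
    if (it->flags & TOY_ITER_HASH)
        return it->hash ? it->hash[it->pos - it->nfuncs] : NULL;
    return it->funcs[it->pos];
}
```

    Suppose you read into the hash phase and then restart at position 0. With
    reset_hash_flag == 0, show() hits the NULL case. With reset_hash_flag == 1,
    it shows the first function again.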

    Cc: Robert Swiecki
    Cc: Eugene Teo
    Cc:
    Signed-off-by: Chris Wright
    Signed-off-by: Steven Rostedt

    Chris Wright
     
  • Commit 1c024eca (perf, trace: Optimize tracepoints by using
    per-tracepoint-per-cpu hlist to track events) caused a module
    refcount leak.

    Reported-And-Tested-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

09 Sep, 2010

2 commits

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    gcc-4.6: kernel/*: Fix unused but set warnings
    mutex: Fix annotations to include it in kernel-locking docbook
    pid: make setpgid() system call use RCU read-side critical section
    MAINTAINERS: Add RCU's public git tree

    Linus Torvalds
     
  • Reading the file set_ftrace_filter does three things.

    1) shows whether or not filters are set for the function tracer
    2) shows what functions are set for the function tracer
    3) shows what triggers are set on any functions

    Item 3 is independent of items 1 and 2.

    The way this file currently works is that it is a state machine,
    and as you read it, it may change state. But this assumption breaks
    when you use lseek() on the file. The state machine gets out of sync
    and the t_show() may use the wrong pointer and cause a kernel oops.

    Luckily, this will only kill the app that does the lseek, but the app
    dies while holding a mutex. This prevents anyone else from using the
    set_ftrace_filter file (or any other function tracing file for that matter).

    A real fix for this is to rewrite the code, but that is too much for
    a -rc release or stable. This patch simply disables llseek on the
    set_ftrace_filter file for now, and we can do the proper fix for the
    next major release.

    Reported-by: Robert Swiecki
    Cc: Chris Wright
    Cc: Tavis Ormandy
    Cc: Eugene Teo
    Cc: vendor-sec@lst.de
    Cc:
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Sep, 2010

3 commits

  • Check whether each argument name is valid (i.e. a C-like symbol name).
    This keeps the event format simple.
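    A C-like symbol name is a letter or underscore followed by letters, digits,
    or underscores. This is a minimal sketch of such a check. The function name
    is our own and is not necessarily the kernel's.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if `name` is a valid C identifier. Illustrative helper. */
static bool is_c_symbol_name(const char *name)
{
    if (!isalpha((unsigned char)*name) && *name != '_')
        return false;               /* also rejects the empty string */
    while (*++name)
        if (!isalnum((unsigned char)*name) && *name != '_')
            return false;
    return true;
}
```

    For example, "IP" and "_arg1" pass, while "%ax" and "1st" are rejected.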

    Reported-by: Srikar Dronamraju
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Set an "argN" name for each argument automatically if no name is specified.
    Since dynamic trace events (kprobe_events) accept special characters in
    their arguments, the event format can show those special characters
    (e.g. '$', '%', '+'). However, perf can't parse such formats because those
    characters (especially '%') mess up the format. This patch sets an "argN"
    name for each argument whose name the user omitted.

    E.g.
    # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events
    # cat tracing/kprobe_events
    p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack
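    The naming rule in the example can be sketched as follows. A user-supplied
    NAME=FETCHARG keeps NAME. Otherwise the 1-based position N yields "argN".
    The helper is hypothetical and simply mirrors the output shown above.

```c
#include <stdio.h>
#include <string.h>

/* Write the name for the 1-based argument `idx` into `out`: the text
 * before '=' if the user named it, else "argN". Hypothetical helper. */
static void kprobe_arg_name(const char *arg, int idx, char *out, size_t outlen)
{
    const char *eq = strchr(arg, '=');

    if (eq)
        snprintf(out, outlen, "%.*s", (int)(eq - arg), arg);
    else
        snprintf(out, outlen, "arg%d", idx);
}
```

    Applied to "%ax", "IP=%ip", and "$stack" at positions 1 to 3, it produces
    arg1, IP, and arg3, as in the kprobe_events output above.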

    Reported-by: Srikar Dronamraju
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Fix a memory leak which happens when a field name conflicts with others. In
    the error case, free_trace_probe() frees all arguments up to nr_args, so this
    patch increments nr_args at the beginning of the loop instead of at the end.
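    The ordering issue is a general cleanup pattern. If the error path frees
    entries [0, nr_args), the counter must cover an entry before anything can
    fail after that entry is allocated. This is a minimal sketch with
    hypothetical names (toy_probe, toy_parse_args, toy_free_probe). It is not
    the trace_kprobe code.

```c
#include <stdlib.h>
#include <string.h>

struct toy_probe {
    int nr_args;
    char *names[8];
};

static char *dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);

    if (p)
        memcpy(p, s, len);
    return p;
}

/* Error-path cleanup: frees exactly names[0 .. nr_args-1]. */
static void toy_free_probe(struct toy_probe *tp)
{
    for (int i = 0; i < tp->nr_args; i++)
        free(tp->names[i]);
    tp->nr_args = 0;
}

/* nr_args is bumped first, so the entry allocated in this iteration is
 * still freed by toy_free_probe() if a conflict is detected. */
static int toy_parse_args(struct toy_probe *tp, const char *const *argv, int n)
{
    for (int i = 0; i < n && i < 8; i++) {
        tp->nr_args++;                     /* the fix: count first */
        tp->names[i] = dup_str(argv[i]);
        if (!tp->names[i])
            return -1;                     /* free(NULL) is harmless */
        for (int j = 0; j < i; j++)
            if (strcmp(tp->names[j], tp->names[i]) == 0)
                return -1;                 /* conflict: names[i] not leaked */
    }
    return 0;
}
```

    If the increment were moved to the end of the loop body, the conflicting
    name allocated in the failing iteration would never be freed.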

    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     

05 Sep, 2010

1 commit


01 Sep, 2010

1 commit

  • While we are reading trace_stat/functionX, if someone disables
    function_profile at the same time, we can trigger this:

    divide error: 0000 [#1] PREEMPT SMP
    ...
    EIP is at function_stat_show+0x90/0x230
    ...

    This fix just takes the ftrace_profile_lock and checks if
    rec->counter is 0. If it's 0, we know the profile buffer
    has been reset.
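    The shape of the fix is to take the same lock as the reset path and re-check
    the divisor before dividing. This is a user-space sketch that uses a
    pthread mutex in place of ftrace_profile_lock. The record layout and
    function names are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>

struct toy_profile_rec {
    unsigned long counter;     /* number of calls */
    unsigned long long time;   /* total time in ns */
};

static pthread_mutex_t toy_profile_lock = PTHREAD_MUTEX_INITIALIZER;

/* Format "N calls, avg T ns" into buf. Returns 1 if printed, 0 if the
 * record was reset (counter == 0) and must be skipped. */
static int toy_stat_show(const struct toy_profile_rec *rec, char *buf, size_t len)
{
    int ret = 0;

    pthread_mutex_lock(&toy_profile_lock);
    if (rec->counter != 0) {              /* 0 means the buffer was reset */
        unsigned long long avg = rec->time / rec->counter;
        snprintf(buf, len, "%lu calls, avg %llu ns", rec->counter, avg);
        ret = 1;
    }
    pthread_mutex_unlock(&toy_profile_lock);
    return ret;
}
```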

    Signed-off-by: Li Zefan
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

25 Aug, 2010

1 commit

  • save_stack_trace() stores the instruction pointer, not the
    function descriptor. On ppc64 the trace stack code currently
    dereferences the instruction pointer and shows 8 bytes of
    instructions in our backtraces:

    # cat /sys/kernel/debug/tracing/stack_trace
    Depth Size Location (26 entries)
    ----- ---- --------
    0) 5424 112 0x6000000048000004
    1) 5312 160 0x60000000ebad01b0
    2) 5152 160 0x2c23000041c20030
    3) 4992 240 0x600000007c781b79
    4) 4752 160 0xe84100284800000c
    5) 4592 192 0x600000002fa30000
    6) 4400 256 0x7f1800347b7407e0
    7) 4144 208 0xe89f0108f87f0070
    8) 3936 272 0xe84100282fa30000

    Since we aren't dealing with function descriptors, use %pS
    instead of %pF to fix it:

    # cat /sys/kernel/debug/tracing/stack_trace
    Depth Size Location (26 entries)
    ----- ---- --------
    0) 5424 112 ftrace_call+0x4/0x8
    1) 5312 160 .current_io_context+0x28/0x74
    2) 5152 160 .get_io_context+0x48/0xa0
    3) 4992 240 .cfq_set_request+0x94/0x4c4
    4) 4752 160 .elv_set_request+0x60/0x84
    5) 4592 192 .get_request+0x2d4/0x468
    6) 4400 256 .get_request_wait+0x7c/0x258
    7) 4144 208 .__make_request+0x49c/0x610
    8) 3936 272 .generic_make_request+0x390/0x434

    Signed-off-by: Anton Blanchard
    Cc: rostedt@goodmis.org
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     

16 Aug, 2010

1 commit


14 Aug, 2010

1 commit

  • When userspace code writes a string that is not newline-terminated to the
    trace_marker file, the write handler appends a newline and returns the
    number of bytes written to the trace buffer, so
    write(fd, "abc", 3) will return 4

    That's unexpected and unfortunately it confuses glibc's fprintf function.

    Example:
    #include <stdio.h>

    int main(void) {
        fprintf(stderr, "abc");
        return 0;
    }

    $ gcc test.c -o test
    $ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
    $ ./test 2>/sys/kernel/debug/tracing/trace_marker

    results in an infinite loop:
    write(fd, "abc", 3) = 4
    write(fd, "", 1) = 0
    write(fd, "", 1) = 0
    write(fd, "", 1) = 0
    write(fd, "", 1) = 0
    write(fd, "", 1) = 0
    write(fd, "", 1) = 0
    write(fd, "", 1) = 0
    (...)

    ...and a kernel trace buffer full of empty markers.

    Fix it by sanitizing write return value.
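    The principle is that a write handler must never report more bytes than the
    caller passed in, even if it stores extra bytes of its own. This
    user-space sketch is hypothetical (toy_marker_write) and is not the
    kernel's tracing_mark_write().

```c
#include <string.h>

/* Copy cnt bytes into the trace buffer, append '\n' if missing, and
 * report only the bytes consumed from the caller. Returns -1 if tbuf
 * is too small for the data, the newline and a terminating NUL. */
static long toy_marker_write(char *tbuf, size_t tlen, const char *ubuf, size_t cnt)
{
    size_t written;

    if (cnt + 2 > tlen)
        return -1;
    memcpy(tbuf, ubuf, cnt);
    written = cnt;
    if (cnt == 0 || ubuf[cnt - 1] != '\n')
        tbuf[written++] = '\n';
    tbuf[written] = '\0';

    /* The fix: never claim more than the caller asked us to write. */
    if (written > cnt)
        written = cnt;
    return (long)written;
}
```

    With this clamp, writing "abc" returns 3 even though the buffer holds
    "abc\n". As a result, stdio has nothing left to retry.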

    Signed-off-by: Marcin Slusarz
    LKML-Reference:
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Signed-off-by: Steven Rostedt

    Marcin Slusarz
     

13 Aug, 2010

1 commit

  • Two new events were added that broke the current format output.

    Both from the SCSI system: scsi_dispatch_cmd_done and scsi_dispatch_cmd_timeout

    The reason is that their print_fmt exceeded a page size. Since the output
    of the format used simple_read_from_buffer and trace_seq, it was limited
    to a page size in output.

    This patch converts the printing of the format of an event into seq_file,
    which allows greater than a page size to be shown.
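    This sketch shows why a single page was a hard limit. It uses a
    hypothetical fixed one-page buffer in the spirit of trace_seq, which
    refuses an append that does not fit, so an oversized print_fmt can only
    fall back to a marker. seq_file avoids this by retrying show() with a
    larger buffer.

```c
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_seq {
    char buf[TOY_PAGE_SIZE];
    size_t len;
};

/* Append str; return 0 (and append nothing) if it does not fit. */
static int toy_seq_puts(struct toy_seq *s, const char *str)
{
    size_t n = strlen(str);

    if (s->len + n >= sizeof(s->buf))   /* keep room for the NUL */
        return 0;
    memcpy(s->buf + s->len, str, n + 1);
    s->len += n;
    return 1;
}

/* Render an event's print_fmt, falling back to a marker on overflow. */
static const char *toy_show_format(struct toy_seq *s, const char *print_fmt)
{
    s->len = 0;
    s->buf[0] = '\0';
    if (!toy_seq_puts(s, "print fmt: ") || !toy_seq_puts(s, print_fmt))
        return "FORMAT TOO BIG";
    return s->buf;
}
```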

    I diffed all event formats comparing the output with and without this
    patch. All matched except for the above two, which showed just:

    FORMAT TOO BIG

    without this patch, but now properly displays the output with this patch.

    v2: Remove updating *pos in seq start function.
    [ Thanks to Li Zefan for pointing that out ]

    Reviewed-by: Li Zefan
    Cc: Martin K. Petersen
    Cc: Kei Tokunaga
    Cc: James Bottomley
    Cc: Tomohiro Kusumi
    Cc: Xiao Guangrong
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

12 Aug, 2010

1 commit

  • Secure discard is the same as discard except that all copies of the
    discarded sectors (perhaps created by garbage collection) must also be
    erased.

    Signed-off-by: Adrian Hunter
    Acked-by: Jens Axboe
    Cc: Kyungmin Park
    Cc: Madhusudhan Chikkature
    Cc: Christoph Hellwig
    Cc: Ben Gardiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Hunter
     

11 Aug, 2010

1 commit

  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcomming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     

08 Aug, 2010

5 commits

  • * 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
    do_coredump: Do not take BKL
    init: Remove the BKL from startup code

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
    workqueue: mark init_workqueues() as early_initcall()
    workqueue: explain for_each_*cwq_cpu() iterators
    fscache: fix build on !CONFIG_SYSCTL
    slow-work: kill it
    gfs2: use workqueue instead of slow-work
    drm: use workqueue instead of slow-work
    cifs: use workqueue instead of slow-work
    fscache: drop references to slow-work
    fscache: convert operation to use workqueue instead of slow-work
    fscache: convert object to use workqueue instead of slow-work
    workqueue: fix how cpu number is stored in work->data
    workqueue: fix mayday_mask handling on UP
    workqueue: fix build problem on !CONFIG_SMP
    workqueue: fix locking in retry path of maybe_create_worker()
    async: use workqueue for worker pool
    workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
    workqueue: implement unbound workqueue
    workqueue: prepare for WQ_UNBOUND implementation
    libata: take advantage of cmwq and remove concurrency limitations
    workqueue: fix worker management invocation without pending works
    ...

    Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
    include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c

    Linus Torvalds
     
  • The blktrace driver currently needs the BKL, but
    we should not need to take that in the block layer,
    so just push it down into the driver itself.

    It is quite likely that the BKL is not actually
    required in blktrace code and could be removed
    in a follow-on patch.

    Signed-off-by: Arnd Bergmann
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
    struct request. This allows much easier grepping for different request
    types instead of unwinding through macros.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

07 Aug, 2010

5 commits

  • …x/kernel/git/tip/linux-2.6-tip

    * 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    um: Fix read_persistent_clock fallout
    kgdb: Do not access xtime directly
    powerpc: Clean up obsolete code relating to decrementer and timebase
    powerpc: Rework VDSO gettimeofday to prevent time going backwards
    clocksource: Add __clocksource_updatefreq_hz/khz methods
    x86: Convert common clocksources to use clocksource_register_hz/khz
    timekeeping: Make xtime and wall_to_monotonic static
    hrtimer: Cleanup direct access to wall_to_monotonic
    um: Convert to use read_persistent_clock
    timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
    powerpc: Cleanup xtime usage
    powerpc: Simplify update_vsyscall
    time: Kill off CONFIG_GENERIC_TIME
    time: Implement timespec_add
    x86: Fix vtime/file timestamp inconsistencies

    Trivial conflicts in Documentation/feature-removal-schedule.txt

    Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as
    per Thomas' earlier merge commit 47916be4e28c ("Merge branch
    'powerpc.cherry-picks' into timers/clocksource")

    Linus Torvalds
     
  • With the configuration CONFIG_DEBUG_PAGEALLOC=y and Shaohua's patch:

    [PATCH]x86: make spurious_fault check correct pte bit

    running the function graph tracer as follows triggers a page fault.

    # cd /sys/kernel/debug/tracing/
    # echo function_graph > current_tracer
    # cat per_cpu/cpu1/trace_pipe_raw > /dev/null

    BUG: unable to handle kernel paging request at ffff880006e99000
    IP: [] rb_event_length+0x1/0x3f
    PGD 1b19063 PUD 1b1d063 PMD 3f067 PTE 6e99160
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    last sysfs file: /sys/devices/virtual/net/lo/operstate
    CPU 1
    Modules linked in:

    Pid: 1982, comm: cat Not tainted 2.6.35-rc6-aes+ #300 /Bochs
    RIP: 0010:[] [] rb_event_length+0x1/0x3f
    RSP: 0018:ffff880006475e38 EFLAGS: 00010006
    RAX: 0000000000000ff0 RBX: ffff88000786c630 RCX: 000000000000001d
    RDX: ffff880006e98000 RSI: 0000000000000ff0 RDI: ffff880006e99000
    RBP: ffff880006475eb8 R08: 000000145d7008bd R09: 0000000000000000
    R10: 0000000000008000 R11: ffffffff815d9336 R12: ffff880006d08000
    R13: ffff880006e605d8 R14: 0000000000000000 R15: 0000000000000018
    FS: 00007f2b83e456f0(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffff880006e99000 CR3: 00000000064a8000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process cat (pid: 1982, threadinfo ffff880006474000, task ffff880006e40770)
    Stack:
    ffff880006475eb8 ffffffff8108730f 0000000000000ff0 000000145d7008bd
    ffff880006e98010 ffff880006d08010 0000000000000296 ffff88000786c640
    ffffffff81002956 0000000000000000 ffff8800071f4680 ffff8800071f4680
    Call Trace:
    [] ? ring_buffer_read_page+0x15a/0x24a
    [] ? return_to_handler+0x15/0x2f
    [] tracing_buffers_read+0xb9/0x164
    [] vfs_read+0xaf/0x150
    [] return_to_handler+0x0/0x2f
    [] __bad_area_nosemaphore+0x17e/0x1a1
    [] return_to_handler+0x0/0x2f
    [] bad_area_nosemaphore+0x13/0x15
    Code: 80 25 b2 16 b3 00 fe c9 c3 55 48 89 e5 f0 80 0d a4 16 b3 00 02 c9 c3 55 31 c0 48 89 e5 48 83 3d 94 16 b3 00 01 c9 0f 94 c0 c3 55 0f 48 89 e5 83 e1 1f b8 08 00 00 00 0f b6 d1 83 fa 1e 74 27
    RIP [] rb_event_length+0x1/0x3f
    RSP
    CR2: ffff880006e99000
    ---[ end trace a6877bb92ccb36bb ]---

    The root cause is that ring_buffer_read_page() may read out of page
    boundary, because the boundary checking is done after reading. This is
    fixed via doing boundary checking before reading.
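    The fix is about the order of operations. First check that a header lies
    inside the page, then read it, then check that the whole event fits. This
    sketch uses a hypothetical layout (one length byte followed by the payload)
    that is much simpler than the real ring buffer event header.

```c
#include <stddef.h>
#include <stdint.h>

/* Count complete events in page[0..size). Each event is assumed to be
 * one length byte followed by that many payload bytes. */
static size_t toy_count_events(const uint8_t *page, size_t size)
{
    size_t off = 0, n = 0;

    while (off < size) {              /* bound check BEFORE reading page[off] */
        size_t len = 1u + page[off];  /* header + payload */
        if (len > size - off)         /* event would cross the page end */
            break;
        off += len;
        n++;
    }
    return n;
}
```

    A version that read page[off] first and checked off afterwards would touch
    the byte just past the page. With CONFIG_DEBUG_PAGEALLOC, that byte is
    exactly what faults.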

    Reported-by: Shaohua Li
    Cc:
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Huang Ying
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
    sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
    sched: No need for bootmem special cases
    sched: Revert nohz_ratelimit() for now
    sched: Reduce update_group_power() calls
    sched: Update rq->clock for nohz balanced cpus
    sched: Fix spelling of sibling
    sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
    sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
    sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
    sched: thread_group_cputime: Simplify, document the "alive" check
    sched: Remove the obsolete exit_state/signal hacks
    sched: task_tick_rt: Remove the obsolete ->signal != NULL check
    sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
    sched: Fix comments to make them DocBook happy
    sched: Fix fix_small_capacity
    powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
    powerpc: Enable asymmetric SMT scheduling on POWER7
    sched: Add asymmetric group packing option for sibling domain
    sched: Fix capacity calculations for SMT4
    sched: Change nohz idle load balancing logic to push model
    ...

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
    tracing/kprobes: unregister_trace_probe needs to be called under mutex
    perf: expose event__process function
    perf events: Fix mmap offset determination
    perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
    perf, powerpc: Convert the FSL driver to use local64_t
    perf tools: Don't keep unreferenced maps when unmaps are detected
    perf session: Invalidate last_match when removing threads from rb_tree
    perf session: Free the ref_reloc_sym memory at the right place
    x86,mmiotrace: Add support for tracing STOS instruction
    perf, sched migration: Librarize task states and event headers helpers
    perf, sched migration: Librarize the GUI class
    perf, sched migration: Make the GUI class client agnostic
    perf, sched migration: Make it vertically scrollable
    perf, sched migration: Parameterize cpu height and spacing
    perf, sched migration: Fix key bindings
    perf, sched migration: Ignore unhandled task states
    perf, sched migration: Handle ignored migrate out events
    perf: New migration tool overview
    tracing: Drop cpparg() macro
    perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
    ...

    Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c

    Linus Torvalds
     
  • With CONFIG_DEBUG_PAGEALLOC, I observed an unallocated memory access in
    the function_graph tracer. It appears that a small entry in the ring buffer
    is accessed as if it were a large entry. The access overflows the page
    and touches an unallocated page.

    Cc:
    Signed-off-by: Shaohua Li
    LKML-Reference:
    [ Added a comment to explain the problem - SDR ]
    Signed-off-by: Steven Rostedt

    Shaohua Li
     

05 Aug, 2010

2 commits


04 Aug, 2010

1 commit

  • The comment in unregister_trace_probe() says probe_lock will be held when it
    gets called. However, there is a case where it might be called without
    probe_lock being held. Also, since we are traversing probe_list and
    deleting an element from it, probe_lock should be held.

    This was first pointed out by Frederic Weisbecker in the uprobes
    traceevent review (http://lkml.org/lkml/2010/5/12/106).
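    The rule is that every path that walks or edits the list takes the lock
    first. This user-space sketch follows that convention with a pthread mutex
    standing in for probe_lock. All names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

struct toy_probe_node {
    int id;
    struct toy_probe_node *next;
};

static pthread_mutex_t toy_probe_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_probe_node *toy_probe_list;

static int toy_register(int id)
{
    struct toy_probe_node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->id = id;
    pthread_mutex_lock(&toy_probe_lock);
    n->next = toy_probe_list;
    toy_probe_list = n;
    pthread_mutex_unlock(&toy_probe_lock);
    return 0;
}

/* Caller must hold toy_probe_lock: traverses and unlinks. */
static void toy_unregister_locked(int id)
{
    struct toy_probe_node **pp = &toy_probe_list;

    while (*pp) {
        if ((*pp)->id == id) {
            struct toy_probe_node *victim = *pp;
            *pp = victim->next;
            free(victim);
            return;
        }
        pp = &(*pp)->next;
    }
}

static void toy_unregister(int id)
{
    pthread_mutex_lock(&toy_probe_lock);
    toy_unregister_locked(id);
    pthread_mutex_unlock(&toy_probe_lock);
}
```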

    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Acked-by: Masami Hiramatsu
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Srikar Dronamraju
    Signed-off-by: Arnaldo Carvalho de Melo

    Srikar Dronamraju
     

02 Aug, 2010

1 commit


27 Jul, 2010

1 commit

  • Now that all arches have been converted over to use generic time via
    clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
    config option and simplify the generic code.

    Signed-off-by: John Stultz
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    John Stultz
     

23 Jul, 2010

2 commits


22 Jul, 2010

1 commit


21 Jul, 2010

4 commits

  • Documentation/trace/ftrace.txt says

    buffer_size_kb:

    This sets or displays the number of kilobytes each CPU
    buffer can hold. The tracer buffers are the same size
    for each CPU. The displayed number is the size of the
    CPU buffer and not total size of all buffers. The
    trace buffers are allocated in pages (blocks of memory
    that the kernel uses for allocation, usually 4 KB in size).
    If the last page allocated has room for more bytes
    than requested, the rest of the page will be used,
    making the actual allocation bigger than requested.
    ( Note, the size may not be a multiple of the page size
    due to buffer management overhead. )

    This can only be updated when the current_tracer
    is set to "nop".

    But this is incorrect. Currently the total memory consumption is
    'buffer_size_kb x CPUs x 2'.

    Why the factor of two? Because ftrace implicitly allocates the
    buffer for max latency too.

    That gives a sad result when an admin wants to use a large buffer (for
    example, to get full logging for detailed analysis). If an admin has a
    24-CPU machine and writes 200MB to buffer_size_kb, the system
    consumes ~10GB of memory (200MB x 24 x 2). Umm.. wasting 5GB of memory
    is usually unacceptable.

    Fortunately, almost all users don't use the max latency feature,
    and the max latency buffer can be disabled easily.

    This patch shrinks the max latency buffer when it is
    unnecessary.
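    The figures above follow from a one-line formula. This sketch assumes that
    the max latency buffer mirrors the main buffer exactly, and it ignores page
    rounding and management overhead. The function name is hypothetical.

```c
/* Total ring buffer memory in MB: per-CPU size x CPUs, doubled when the
 * max latency buffer is allocated at the same size (assumption). */
static unsigned long total_buffer_mb(unsigned long per_cpu_mb,
                                     unsigned long ncpus,
                                     int max_latency_buffer)
{
    return per_cpu_mb * ncpus * (max_latency_buffer ? 2 : 1);
}
```

    For 200MB on 24 CPUs, this gives 9600MB (about 9.4GB) with the max latency
    buffer. It gives 4800MB without it, so the saving is about 4.7GB.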

    Signed-off-by: KOSAKI Motohiro
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    KOSAKI Motohiro
     
  • __print_flags() and __print_symbolic() use percpu trace_seq:

    1) Its memory is allocated at compile time, so it wastes memory if we don't use tracing.
    2) It is per-CPU data, so it wastes even more memory on multi-CPU systems.
    3) It disables preemption while it executes its core routine
    "trace_seq_printf(s, "%s: ", #call);", which introduces latency.

    So we move this trace_seq to struct trace_iterator.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     
  • Reorder the structure to remove 8 bytes of padding on 64-bit builds.
    This shrinks the size to 128 bytes, allowing allocation from a smaller
    slab and needing one fewer cache line.
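    This is how reordering removes padding on a 64-bit (LP64) build. The two
    structures below are illustrative only and are not the structure changed by
    this commit. Each int placed before a long forces 4 bytes of alignment
    padding. Grouping the ints together lets them share one 8-byte slot.

```c
#include <stddef.h>

struct padded {        /* LP64: a@0, pad, b@8, c@16, pad, d@24 -> 32 bytes */
    int  a;
    long b;
    int  c;
    long d;
};

struct reordered {     /* LP64: b@0, d@8, a@16, c@20 -> 24 bytes */
    long b;
    long d;
    int  a;
    int  c;
};
```

    On LP64, sizeof(struct padded) - sizeof(struct reordered) is 8.
    offsetof(struct reordered, a) confirms that the ints now follow the
    longs directly.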

    Signed-off-by: Richard Kennedy
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Richard Kennedy
     
  • We found that even enabling a single trace event that will rarely be
    triggered can add big overhead to context switch.

    (lmbench context switch test)
    -------------------------------------------------
    2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
    ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
    ------ ------ ------ ------ ------ ------- -------
    2.19   2.3    2.21   2.56   2.13   2.54    2.07
    2.39   2.51   2.35   2.75   2.27   2.81    2.24

    The overhead is 6% ~ 11%.
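    The quoted range can be recomputed from the table. This sketch assumes the
    first row is the baseline and the second row has the trace event enabled,
    which we infer because the second row is higher in every column.

```c
#include <stddef.h>

/* lmbench ctxsw times from the table; row meaning is inferred. */
static const double base[]       = {2.19, 2.30, 2.21, 2.56, 2.13, 2.54, 2.07};
static const double with_event[] = {2.39, 2.51, 2.35, 2.75, 2.27, 2.81, 2.24};

/* Relative overhead in percent for column i (0..6). */
static double overhead_pct(size_t i)
{
    return (with_event[i] / base[i] - 1.0) * 100.0;
}
```

    The smallest overhead is about 6.3% (2p/64K) and the largest about 10.6%
    (16p/16K), which agrees with the stated 6% ~ 11%.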

    This is because when a trace event is enabled, 3 tracepoints (sched_switch,
    sched_wakeup, sched_wakeup_new) are activated to map pids to command names.

    We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
    to allow disabling cmdline recording.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

20 Jul, 2010

1 commit

  • The default for llseek will change to no_llseek,
    so the tracing debugfs files need to add explicit
    .llseek assignments. Since we're dealing with regular
    files from a VFS perspective, use generic_file_llseek.

    Signed-off-by: Arnd Bergmann
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: John Kacur
    Cc: Li Zefan
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Arnd Bergmann