22 Jan, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: x86: Add support for the ANY bit
    perf: Change the is_software_event() definition
    perf: Honour event state for aux stream data
    perf: Fix perf_event_do_pending() fallback callsite
    perf kmem: Print usage help for unknown commands
    perf kmem: Increase "Hit" column length
    hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change
    perf timechart: Use tid not pid for COMM change

    Linus Torvalds
     

21 Jan, 2010

4 commits

  • Anton reported that perf record kept receiving events even after calling
    ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK, COMM and MMAP
    events didn't respect the disabled state and kept flowing in.

    Reported-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    Tested-by: Anton Blanchard
    LKML-Reference:
    CC: stable@kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Paul questioned the context in which we should call
    perf_event_do_pending(). After looking at that I found that it should be
    called from IRQ context these days, however the fallback call-site is
    placed in softirq context. Amend this by placing the callback in the IRQ
    timer path.

    Reported-by: Paul Mackerras
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Assume an A->B schedule is in progress, and B acquired the BKL earlier
    and needs to reschedule this time. Then, in B's context, it will go to
    need_resched_nonpreemptible to reschedule. But at this point prev and
    switch_count still refer to A. This is wrong and leads to incorrect
    scheduler statistics.

    Signed-off-by: Yong Zhang
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yong Zhang
     
  • SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
    enabled, leading to many cache misses on large machines as we traverse
    looking for an idle shared cache to wake to. Change the enabler of
    select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
    sibling domain level.

    Reported-by: Lin Ming
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

17 Jan, 2010

7 commits

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futexes: Remove rw parameter from get_futex_key()

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing/filters: Add comment for match callbacks
    tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING
    tracing/filters: Fix MATCH_MIDDLE_ONLY filter matching
    lib: Introduce strnstr()
    tracing/filters: Fix MATCH_END_ONLY filter matching
    tracing/filters: Fix MATCH_FRONT_ONLY filter matching
    ftrace: Fix MATCH_END_ONLY function filter
    tracing/x86: Derive arch from bits argument in recordmcount.pl
    ring-buffer: Add rb_list_head() wrapper around new reader page next field
    ring-buffer: Wrap a list.next reference with rb_list_head()

    Linus Torvalds
     
  • The change in acpi_cpufreq to use smp_call_function_any causes a warning
    when it is called since the function erroneously passes the cpu id to
    cpumask_of_node rather than the node that the cpu is on. Fix this.

    cpumask_of_node(3): node > nr_node_ids(1)
    Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223
    Call Trace:
    [] cpumask_of_node+0x23/0x58
    [] smp_call_function_any+0x65/0xfa
    [] ? do_drv_read+0x0/0x2f
    [] get_cur_val+0xb0/0x102
    [] get_cur_freq_on_cpu+0x74/0xc5
    [] acpi_cpufreq_cpu_init+0x417/0x515
    [] ? __down_write+0xb/0xd
    [] cpufreq_add_dev+0x278/0x922

    Signed-off-by: David John
    Cc: Suresh Siddha
    Cc: Rusty Russell
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David John
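
The cpu-versus-node mix-up can be illustrated with a minimal sketch. The topology table and the `toy_*` helpers below are hypothetical stand-ins, not the kernel's cpumask API; they only model a machine with four CPUs on a single node, as in the warning above.

```c
#define TOY_NR_CPUS  4
#define TOY_NR_NODES 1

/* Hypothetical topology: every CPU sits on node 0. */
static const int toy_cpu_node[TOY_NR_CPUS] = { 0, 0, 0, 0 };

/* Translate a CPU id into the id of the node it belongs to. */
static int toy_cpu_to_node(int cpu)
{
	return toy_cpu_node[cpu];
}

/* Return a bitmask of the CPUs on a node, or 0 for an invalid node id. */
static unsigned int toy_cpumask_of_node(int node)
{
	unsigned int mask = 0;
	int cpu;

	if (node < 0 || node >= TOY_NR_NODES)
		return 0; /* the real kernel prints a warning in this case */

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_node[cpu] == node)
			mask |= 1u << cpu;
	return mask;
}
```

Passing CPU id 3 directly asks for a node that does not exist in this model, while `toy_cpumask_of_node(toy_cpu_to_node(3))` yields the mask of all four CPUs.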
     
  • On my first try using them I missed that the fifo size needs to be a
    power of two, resulting in a runtime bug. Document that requirement
    everywhere (and fix one grammar bug).

    Signed-off-by: Andi Kleen
    Acked-by: Stefani Seibold
    Cc: Roland Dreier
    Cc: Dmitry Torokhov
    Cc: Andy Walls
    Cc: Vikram Dhillon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
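
A minimal sketch of why a power-of-two size matters, assuming the FIFO maps a free-running index to a slot with a bit mask (a common ring-buffer technique). The `toy_*` helpers are hypothetical and are not part of the kfifo API.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if n is a nonzero power of two. */
static bool toy_is_power_of_2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Map a free-running index to a slot using a mask. */
static size_t toy_slot_mask(size_t index, size_t size)
{
	return index & (size - 1);
}

/* Reference mapping using the modulo operator. */
static size_t toy_slot_mod(size_t index, size_t size)
{
	return index % size;
}
```

The mask and the modulo agree only when the size is a power of two. With a size of 6, index 13 maps to slot 5 via the mask but to slot 1 via the modulo, so slots would be skipped or overwritten.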
     
  • In some upcoming code it's useful to peek into a FIFO without permanently
    removing data. This patch implements a new kfifo_out_peek() to do this.

    Signed-off-by: Andi Kleen
    Acked-by: Stefani Seibold
    Cc: Roland Dreier
    Cc: Dmitry Torokhov
    Cc: Andy Walls
    Cc: Vikram Dhillon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
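
The difference between peeking and removing can be sketched with a toy byte FIFO. Everything below (`struct toy_fifo` and the `toy_fifo_*` functions) is a hypothetical illustration and does not reproduce the kfifo implementation or the kfifo_out_peek() signature.

```c
#define TOY_FIFO_SIZE 8 /* power of two, so the index mask below is valid */

struct toy_fifo {
	unsigned char buf[TOY_FIFO_SIZE];
	unsigned int in;  /* free-running write index */
	unsigned int out; /* free-running read index */
};

static void toy_fifo_init(struct toy_fifo *f)
{
	f->in = 0;
	f->out = 0;
}

static unsigned int toy_fifo_len(const struct toy_fifo *f)
{
	return f->in - f->out;
}

/* Append up to n bytes; returns the number of bytes stored. */
static unsigned int toy_fifo_in(struct toy_fifo *f, const void *src, unsigned int n)
{
	const unsigned char *p = src;
	unsigned int avail = TOY_FIFO_SIZE - toy_fifo_len(f);
	unsigned int i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		f->buf[(f->in + i) & (TOY_FIFO_SIZE - 1)] = p[i];
	f->in += n;
	return n;
}

/* Copy up to n bytes out without advancing the read index. */
static unsigned int toy_fifo_peek(const struct toy_fifo *f, void *dst, unsigned int n)
{
	unsigned char *p = dst;
	unsigned int len = toy_fifo_len(f);
	unsigned int i;

	if (n > len)
		n = len;
	for (i = 0; i < n; i++)
		p[i] = f->buf[(f->out + i) & (TOY_FIFO_SIZE - 1)];
	return n;
}

/* Copy up to n bytes out and consume them. */
static unsigned int toy_fifo_out(struct toy_fifo *f, void *dst, unsigned int n)
{
	unsigned int copied = toy_fifo_peek(f, dst, n);

	f->out += copied;
	return copied;
}
```

In this sketch the consuming read is simply a peek followed by advancing the read index, so the two paths cannot drift apart.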
     
  • Right now for kfifo_*_user it's not easily possible to distinguish between
    a user copy failing and the FIFO not containing enough data. The problem
    is that both conditions are multiplexed into the same return code.

    Avoid this by moving the "copy length" into a separate output parameter
    and only return 0/-EFAULT in the main return value.

    I didn't fully adapt the weird "record" variants, those seem
    to be unused anyways and were rather messy (should they be just removed?)

    I would appreciate some double checking if I did all the conversions
    correctly.

    Signed-off-by: Andi Kleen
    Cc: Stefani Seibold
    Cc: Roland Dreier
    Cc: Dmitry Torokhov
    Cc: Andy Walls
    Cc: Vikram Dhillon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
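
The calling convention described above can be sketched as follows. The function names and the NULL-pointer "fault" are hypothetical stand-ins chosen so that the example runs in user space; only the convention itself (0 or -EFAULT as the return value, byte count through an output parameter) is taken from the commit message.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for a user copy: it "faults" when dst is NULL.
 * Like copy_to_user(), it returns the number of bytes NOT copied.
 */
static unsigned long toy_copy_to_user(void *dst, const void *src, unsigned long n)
{
	if (!dst)
		return n;
	memcpy(dst, src, n);
	return 0;
}

/* Return 0 or -EFAULT; report the copied byte count through *copied. */
static int toy_fifo_to_user(const char *data, unsigned int avail,
			    void *dst, unsigned int want, unsigned int *copied)
{
	unsigned int n = want < avail ? want : avail;

	*copied = 0;
	if (toy_copy_to_user(dst, data, n))
		return -EFAULT;
	*copied = n;
	return 0;
}
```

With this shape a short read (return 0, `*copied` smaller than requested) is no longer confused with a failed copy (return -EFAULT).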
     
  • The pointers to user buffers are currently unsigned char *, which requires
    a lot of casting in the caller for any non-char typed buffers. Use void *
    instead.

    Signed-off-by: Andi Kleen
    Acked-by: Stefani Seibold
    Cc: Roland Dreier
    Cc: Dmitry Torokhov
    Cc: Andy Walls
    Cc: Vikram Dhillon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

15 Jan, 2010

6 commits

  • We should be clear on 2 things:

    - the length parameter of a match callback includes
    trailing '\0'.

    - the string to be searched might not be NULL-terminated.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     
  • MATCH_FULL matching for PTR_STRING is not working correctly:

    # echo 'func == vt' > events/bkl/lock_kernel/filter
    # echo 1 > events/bkl/lock_kernel/enable
    ...
    # cat trace
    Xorg-1484 [000] 1973.392586: lock_kernel: ... func=vt_ioctl()
    gpm-1402 [001] 1974.027740: lock_kernel: ... func=vt_ioctl()

    We should pass to regex.match(..., len) the length (including '\0')
    of the source string instead of the length of the pattern string.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
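
A minimal sketch of the length mix-up, assuming the full match is implemented with a bounded comparison. The `toy_*` functions are hypothetical and only mirror the description above, not the actual filter code.

```c
#include <string.h>

/* Buggy variant: bounding by the pattern length turns this into a prefix match. */
static int toy_match_full_buggy(const char *str, const char *pattern)
{
	return strncmp(str, pattern, strlen(pattern)) == 0;
}

/* Fixed variant: the bound is the source string length including its '\0'. */
static int toy_match_full(const char *str, const char *pattern)
{
	size_t len = strlen(str) + 1;

	return strncmp(str, pattern, len) == 0;
}
```

With the buggy bound, the pattern `vt` matches `vt_ioctl`, which is the behaviour shown in the trace output above. With the source length as the bound, only `vt` itself matches.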
     
  • The @str might not be NULL-terminated if it's of type
    DYN_STRING or STATIC_STRING, so we should use strnstr()
    instead of strstr().

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
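
A bounded substring search of the kind referred to here can be sketched as below. `toy_strnstr` is a hypothetical illustration written for this note and is not claimed to match the implementation added to lib/.

```c
#include <stddef.h>
#include <string.h>

/*
 * Find needle within the first len bytes of s.
 * s does not need to be NUL-terminated; needle does.
 */
static const char *toy_strnstr(const char *s, const char *needle, size_t len)
{
	size_t nlen = strlen(needle);

	if (nlen == 0)
		return s;
	while (len >= nlen) {
		if (memcmp(s, needle, nlen) == 0)
			return s;
		s++;
		len--;
	}
	return NULL;
}
```

Because the loop never reads past `s + len`, the search is safe on buffers that lack a terminator, which is the case strstr() cannot handle.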
     
  • For '*foo' pattern, we should allow any string ending with
    'foo', but event filtering incorrectly disallows strings
    like bar_foo_foo.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
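
One plausible way such a bug arises is checking only the first occurrence of the pattern. The sketch below is an assumption about the shape of the problem, using hypothetical `toy_*` names. It contrasts that approach with comparing the tail of the string directly.

```c
#include <stddef.h>
#include <string.h>

/* Buggy variant: only the first occurrence of the pattern is examined. */
static int toy_match_end_buggy(const char *str, const char *pattern)
{
	const char *p = strstr(str, pattern);

	return p != NULL && p[strlen(pattern)] == '\0';
}

/* Fixed variant: compare the last strlen(pattern) bytes of the string. */
static int toy_match_end(const char *str, size_t str_len, const char *pattern)
{
	size_t plen = strlen(pattern);

	return str_len >= plen &&
	       memcmp(str + str_len - plen, pattern, plen) == 0;
}
```

For `bar_foo_foo` the first `foo` is followed by `_`, so the buggy variant rejects the string even though it ends with `foo`.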
     
  • MATCH_FRONT_ONLY actually performs a full match:

    # ./perf record -R -f -a -e lock:lock_acquire \
    --filter 'name ~rcu_*' sleep 1
    # ./perf trace
    (no output)

    We should pass the length of the pattern string to strncmp().

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
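
The front-only case is the mirror image of the full-match case: here the bound has to be the pattern length. A minimal sketch with hypothetical `toy_*` names:

```c
#include <string.h>

/* Buggy variant: bounding by the source length (with '\0') demands a full match. */
static int toy_match_front_buggy(const char *str, const char *pattern)
{
	return strncmp(str, pattern, strlen(str) + 1) == 0;
}

/* Fixed variant: compare only the first strlen(pattern) bytes. */
static int toy_match_front(const char *str, const char *pattern)
{
	return strncmp(str, pattern, strlen(pattern)) == 0;
}
```

In this sketch the pattern argument is the literal prefix (for example `rcu_`) with the wildcard already stripped.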
     
  • For '*foo' pattern, we should allow any string ending with
    'foo', but ftrace filter incorrectly disallows strings
    like bar_foo_foo:

    # echo '*io' > set_ftrace_filter
    # cat set_ftrace_filter | grep 'req_bio_endio'
    # cat available_filter_functions | grep 'req_bio_endio'
    req_bio_endio

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
     

13 Jan, 2010

1 commit

  • Currently, futexes have two problems:

    A) The current futex code doesn't handle private file mappings properly.

    get_futex_key() uses PageAnon() to distinguish file and
    anon, which can cause the following bad scenario:

    1) thread-A calls futex(private-mapping, FUTEX_WAIT) and
    sleeps on the file mapping object.
    2) thread-B writes to the variable, which triggers COW.
    3) thread-B calls futex(private-mapping, FUTEX_WAKE), which
    tries to wake a thread blocked on the anonymous page (but none is there).

    B) Current futex code doesn't handle zero page properly.

    Read mode get_user_pages() can return the zero page, but the current
    futex code doesn't handle it at all. The zero page then causes an
    infinite loop internally.

    The solution is to always use write mode get_user_pages() for
    page lookup. It prevents the lookup of both file pages of private
    mappings and the zero page.

    Performance concerns:

    Probably very little, because glibc always initializes variables
    for a futex before calling futex(). It means glibc users never see
    the overhead of this patch.

    Compatibility concerns:

    This patch has few compatibility issues. After this patch,
    FUTEX_WAIT requires writable access to futex variables (read-only
    mappings return EFAULT). But in practice it's not a problem:
    glibc always initializes variables for futexes explicitly, so nobody
    uses read-only mappings.

    Reported-by: Hugh Dickins
    Signed-off-by: KOSAKI Motohiro
    Acked-by: Peter Zijlstra
    Acked-by: Darren Hart
    Cc:
    Cc: Linus Torvalds
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: Ulrich Drepper
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     

12 Jan, 2010

3 commits

  • When print-fatal-signals is enabled it's possible to dump any memory
    reachable by the kernel to the log by simply jumping to that address from
    user space.

    Or crash the system if there's some hardware with read side effects.

    The fatal signals handler will dump 16 bytes at the execution address,
    which is fully controlled by ring 3.

    In addition when something jumps to an unmapped address there will be up to
    16 additional useless page faults, which might be potentially slow (and at
    least is not very efficient)

    Fortunately this option is off by default and only there on i386.

    But fix it by checking for kernel addresses and also stopping when there's
    a page fault.

    Signed-off-by: Andi Kleen
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!"
    here in cgroup_diput():

    /*
    * if we're getting rid of the cgroup, refcount should ensure
    * that there are no pidlists left.
    */
    BUG_ON(!list_empty(&cgrp->pidlists));

    The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused
    when pidlist_array_load() calls cgroup_pidlist_find():

    (1) if a matching cgroup_pidlist is found, it down_write's the mutex of the
    pre-existing cgroup_pidlist, and increments its use_count.
    (2) if no matching cgroup_pidlist is found, then a new one is allocated, it
    down_write's its mutex, and the use_count is set to 0.
    (3) the matching, or new, cgroup_pidlist gets returned back to pidlist_array_load(),
    which increments its use_count -- regardless whether new or pre-existing --
    and up_write's the mutex.

    So if a matching list is ever encountered by cgroup_pidlist_find() during
    the life of a cgroup directory, it results in an inflated use_count value,
    preventing it from ever getting released by cgroup_release_pid_array().
    Then if the directory is subsequently removed, cgroup_diput() hits the
    BUG_ON() when it finds that the directory's cgroup is still populated with
    a pidlist.

    The patch simply removes the use_count increment when a matching pidlist
    is found by cgroup_pidlist_find(), because it gets bumped by the calling
    pidlist_array_load() function while still protected by the list's mutex.

    Signed-off-by: Dave Anderson
    Reviewed-by: Li Zefan
    Acked-by: Ben Blum
    Cc: Paul Menage
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Anderson
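
The double increment described in steps (1) to (3) can be modelled with a toy reference counter. The structure and function names below are hypothetical; locking is left out, and the `bump_on_match` flag exists only to switch between the old and the patched behaviour.

```c
struct toy_pidlist {
	int use_count;
	int exists; /* nonzero while the list is attached to the cgroup */
};

/* Find or create the list. The old code also bumped use_count on a match. */
static struct toy_pidlist *toy_pidlist_find(struct toy_pidlist *l, int bump_on_match)
{
	if (l->exists) {
		if (bump_on_match)
			l->use_count++;
	} else {
		l->exists = 1;
		l->use_count = 0;
	}
	return l;
}

/* The caller always takes its own reference, for new and existing lists. */
static void toy_pidlist_load(struct toy_pidlist *l, int bump_on_match)
{
	toy_pidlist_find(l, bump_on_match)->use_count++;
}

/* Drop a reference and free the list when the count reaches zero. */
static void toy_pidlist_release(struct toy_pidlist *l)
{
	if (--l->use_count == 0)
		l->exists = 0;
}
```

Two loads followed by two releases leave the list attached with a count of 1 under the old behaviour. With the extra increment removed, the count returns to 0 and the list is released.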
     
  • Fix resource (write-pipe file) leak in call_usermodehelper_pipe().

    When call_usermodehelper_exec() fails, the write-pipe file is left open and
    call_usermodehelper_pipe() just returns an error. Since it is hard for the
    caller to determine whether the error occurred when opening the pipe or
    executing the helper, the caller cannot close the pipe itself.

    I found this resource leak when testing coredump. You can check how
    the resource leaks as below:

    $ echo "|nocommand" > /proc/sys/kernel/core_pattern
    $ ulimit -c unlimited
    $ while [ 1 ]; do ./segv; done &> /dev/null &
    $ cat /proc/meminfo (
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     

07 Jan, 2010

2 commits

  • If the very unlikely case happens where the writer moves the head by one
    between where the head page is read and where the new reader page
    is assigned _and_ the writer then writes and wraps the entire ring buffer
    so that the head page is back to what was originally read as the head page,
    the page to be swapped will have a corrupted next pointer.

    Simple solution is to wrap the assignment of the next pointer with a
    rb_list_head().

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • This reference at the end of rb_get_reader_page() was causing off-by-one
    writes to the prev pointer of the page after the reader page when that
    page is the head page, and therefore the reader page has the RB_PAGE_HEAD
    flag in its list.next pointer. This eventually results in a GPF in a
    subsequent call to rb_set_head_page() (usually from rb_get_reader_page())
    when that prev pointer is dereferenced. The dereferenced register would
    characteristically have an address that appears shifted left by one byte
    (eg, ffxxxxxxxxxxxxyy instead of ffffxxxxxxxxxxxx) due to being written at
    an address one byte too high.

    Signed-off-by: David Sharp
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    David Sharp
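
Both ring-buffer fixes deal with a list pointer that carries a flag in its low bits. The sketch below shows the general tagged-pointer technique under that assumption. The `toy_*` names and constants are hypothetical, not the ring buffer's actual definitions.

```c
#include <stdint.h>

#define TOY_PAGE_HEAD 1UL /* hypothetical flag stored in the low pointer bits */
#define TOY_FLAG_MASK 3UL

struct toy_list {
	struct toy_list *next;
	struct toy_list *prev;
};

/* Store a flag in the low bits of a pointer to an aligned list node. */
static struct toy_list *toy_set_flag(struct toy_list *p, unsigned long flag)
{
	return (struct toy_list *)((uintptr_t)p | flag);
}

/* Strip the flag bits to recover the real node address before dereferencing. */
static struct toy_list *toy_list_head(struct toy_list *p)
{
	return (struct toy_list *)((uintptr_t)p & ~(uintptr_t)TOY_FLAG_MASK);
}
```

Dereferencing a flagged pointer without masking would touch memory one byte past the real node, which matches the off-by-one writes described above.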
     

06 Jan, 2010

1 commit

  • Commit 35dead4 "modules: don't export section names of empty sections
    via sysfs" changed the set of sections that have attributes, but did
    not change the iteration over these attributes in add_notes_attrs().
    This can lead to add_notes_attrs() creating attributes with the wrong
    names or with null name pointers.

    Introduce a sect_empty() function and use it in both add_sect_attrs()
    and add_notes_attrs().

    Reported-by: Martin Michlmayr
    Signed-off-by: Ben Hutchings
    Tested-by: Martin Michlmayr
    Cc: stable@kernel.org
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Ben Hutchings
     

01 Jan, 2010

3 commits

  • …el/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Fix NULL deref in inheritance code
    perf: Pass appropriate frame pointer to dump_trace()

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf kmem: Fix statistics typo
    kprobes: Fix distinct type warning
    perf: Rename perf_event_hw_event in design document
    perf tools: Add missing header files to LIB_H Makefile variable
    perf record: We should fork only if a program was specified to run
    perf diff: Fix usage array, it must end with a NULL entry

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: Fix sign fields in ftrace_define_fields_##call()
    tracing/syscalls: Fix typo in SYSCALL_DEFINE0
    tracing/kprobe: Show sign of fields in trace_kprobe format files
    ksym_tracer: Remove trace_stat
    ksym_tracer: Fix race when incrementing count
    ksym_tracer: Fix to allow writing newline to ksym_trace_filter
    ksym_tracer: Fix to make the tracer work
    tracing: Kconfig spelling fixes and cleanups
    tracing: Fix setting tracer specific options
    Documentation: Update ftrace-design.txt
    Documentation: Update tracepoint-analysis.txt
    Documentation: Update mmiotrace.txt

    Linus Torvalds
     

31 Dec, 2009

1 commit

  • Liming found a NULL deref when a task has a perf context but no
    counters when it forks.

    This can occur in two cases, a race during construction where
    the fork hits after installing the context but before the first
    counter gets inserted, or more reproducibly, a fork after the
    last counter is closed (which leaves the context around).

    Reported-by: Wang Liming
    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    CC:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

30 Dec, 2009

6 commits

  • Add is_signed_type() call to trace_define_field() in ftrace macros.

    The code previously just passed in 0 (false), disregarding whether
    or not the field was actually a signed type.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
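
A compile-time signedness test of this kind can be sketched with a cast of -1. The macro and structure names below are hypothetical and only illustrate the idea of recording signedness per field instead of hard-coding 0.

```c
#include <stddef.h>

/* (type)-1 is negative for signed types and the maximum value for unsigned ones. */
#define TOY_IS_SIGNED_TYPE(type) (((type)(-1)) < (type)1)

/* Hypothetical description of one event field. */
struct toy_field {
	const char *name;
	size_t size;
	int is_signed;
};

/* Build a field description, deriving size and signedness from the type. */
#define TOY_FIELD(type, item) \
	{ #item, sizeof(type), TOY_IS_SIGNED_TYPE(type) }
```

For example, `TOY_FIELD(int, prio)` records the field as signed, while `TOY_FIELD(unsigned long, addr)` records it as unsigned.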
     
  • The format files of trace_kprobe do not show the sign of the fields.
    The other format files show the signedness of the fields and
    this patch makes the trace_kprobe formats consistent with the others.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     
  • trace_stat is problematic. Don't use it, use seqfile instead.

    This fixes a race: reading the stat file is not protected by
    any lock, which can lead to a use after free.

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: K.Prasad
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • We are under rcu read section but not holding the write lock, so
    count++ is not atomic. Use atomic64_t instead.

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: K.Prasad
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • It used to work, but now doesn't:

    # echo > ksym_filter
    bash: echo: write error: Invalid argument

    It's caused by d954fbf0ff6b5fdfb32350e85a2f15d3db976506
    ("tracing: Fix wrong usage of strstrip in trace_ksyms").

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: K.Prasad
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • ksym tracer doesn't work:

    # echo tasklist_lock:rw- > ksym_trace_filter
    -bash: echo: write error: No such device

    It's because we pass to perf_event_create_kernel_counter()
    a cpu number which is not present.

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: K.Prasad
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

28 Dec, 2009

2 commits

  • Fix filename reference (ftrace-implementation.txt ->
    ftrace-design.txt).

    Fix spelling, punctuation, grammar.

    Fix help text indentation and line lengths to reduce need for
    horizontal scrolling or larger window sizes.

    Signed-off-by: Randy Dunlap
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     
  • Every time I see this:

    kernel/kprobes.c: In function 'register_kretprobe':
    kernel/kprobes.c:1038: warning: comparison of distinct pointer types lacks a cast

    I'm wondering if something changed in common code and we need to
    do something for s390. Apparently that's not the case.
    Let's get rid of this annoying warning.

    Signed-off-by: Heiko Carstens
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Masami Hiramatsu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

25 Dec, 2009

1 commit


24 Dec, 2009

2 commits

  • When printing legacy sysctls print the warning message
    for each of them only once. This way there is a guarantee
    the syslog won't be flooded for any sane program.

    The original attempt at this made the tables non const and stored
    the flag inline.

    Linus suggested using a separate hash table for this; this is based on a
    code snippet from him.

    The hash implies this is not exact and can sometimes not print a
    new sysctl due to a hash collision, but in practice this should not
    be a problem.

    I used a FNV32 hash over the binary string with a 32-byte bitmap. This
    gives relatively few collisions when all the predefined binary sysctls
    are hashed:

    size 256

    bucket length   number
    0:              [25]
    1:              [67]
    2:              [88]
    3:              [47]
    4:              [22]
    5:              [6]
    6:              [1]

    The worst case is a single collision of 6 hash values.

    Signed-off-by: Andi Kleen

    Andi Kleen
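
The warn-once scheme can be sketched as a hash into a 256-bit bitmap. The commit message does not say which FNV variant is used, so the FNV-1a function below is an assumption, and all `toy_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_WARN_BITS 256 /* 256 bits = 32-byte bitmap */

static unsigned char toy_warned[TOY_WARN_BITS / 8];

/* 32-bit FNV-1a over a byte string (assumed variant). */
static uint32_t toy_fnv32(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u; /* FNV offset basis */

	while (len--) {
		h ^= *p++;
		h *= 16777619u; /* FNV prime */
	}
	return h;
}

/*
 * Return 1 the first time a key's bucket is seen, 0 afterwards.
 * A different key that hashes to the same bucket also gets 0.
 */
static int toy_warn_once(const void *key, size_t len)
{
	uint32_t bit = toy_fnv32(key, len) % TOY_WARN_BITS;
	unsigned char mask = (unsigned char)(1u << (bit % 8));

	if (toy_warned[bit / 8] & mask)
		return 0;
	toy_warned[bit / 8] |= mask;
	return 1;
}
```

The trade-off matches the one stated above: the bitmap is tiny and needs no per-sysctl storage, at the cost of occasionally suppressing the first warning for a sysctl whose hash collides with an earlier one.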
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Revert 738d2be, simplify set_task_cpu()

    Linus Torvalds