22 Jan, 2010
1 commit
-
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: x86: Add support for the ANY bit
perf: Change the is_software_event() definition
perf: Honour event state for aux stream data
perf: Fix perf_event_do_pending() fallback callsite
perf kmem: Print usage help for unknown commands
perf kmem: Increase "Hit" column length
hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change
perf timechart: Use tid not pid for COMM change
21 Jan, 2010
4 commits
-
Anton reported that perf record kept receiving events even after calling
ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK, COMM and MMAP
events didn't respect the disabled state and kept flowing in.

Reported-by: Anton Blanchard
Signed-off-by: Peter Zijlstra
Tested-by: Anton Blanchard
LKML-Reference:
CC: stable@kernel.org
Signed-off-by: Ingo Molnar
-
Paul questioned the context in which we should call
perf_event_do_pending(). After looking at that I found that it should be
called from IRQ context these days, however the fallback call-site is
placed in softirq context. Amend this by placing the callback in the IRQ
timer path.

Reported-by: Paul Mackerras
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Assume an A->B context switch is in progress. If B acquired the BKL
earlier and needs to reschedule this time, then in B's context it goes to
need_resched_nonpreemptible to reschedule. But at this point, prev and
switch_count still refer to A. That is wrong and leads to incorrect
scheduler statistics.

Signed-off-by: Yong Zhang
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
enabled, leading to many cache misses on large machines as we traverse
looking for an idle shared cache to wake to. Change the enabler of
select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
sibling domain level.

Reported-by: Lin Ming
Signed-off-by: Mike Galbraith
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
17 Jan, 2010
7 commits
-
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futexes: Remove rw parameter from get_futex_key()
-
…nel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing/filters: Add comment for match callbacks
tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING
tracing/filters: Fix MATCH_MIDDLE_ONLY filter matching
lib: Introduce strnstr()
tracing/filters: Fix MATCH_END_ONLY filter matching
tracing/filters: Fix MATCH_FRONT_ONLY filter matching
ftrace: Fix MATCH_END_ONLY function filter
tracing/x86: Derive arch from bits argument in recordmcount.pl
ring-buffer: Add rb_list_head() wrapper around new reader page next field
ring-buffer: Wrap a list.next reference with rb_list_head()
-
The change in acpi_cpufreq to use smp_call_function_any causes a warning
when it is called since the function erroneously passes the cpu id to
cpumask_of_node rather than the node that the cpu is on. Fix this.

cpumask_of_node(3): node > nr_node_ids(1)
Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223
Call Trace:
[] cpumask_of_node+0x23/0x58
[] smp_call_function_any+0x65/0xfa
[] ? do_drv_read+0x0/0x2f
[] get_cur_val+0xb0/0x102
[] get_cur_freq_on_cpu+0x74/0xc5
[] acpi_cpufreq_cpu_init+0x417/0x515
[] ? __down_write+0xb/0xd
[] cpufreq_add_dev+0x278/0x922

Signed-off-by: David John
Cc: Suresh Siddha
Cc: Rusty Russell
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
On my first try using them I missed that the FIFO size needs to be a power
of two, resulting in a runtime bug. Document that requirement everywhere
(and fix one grammar bug).

Signed-off-by: Andi Kleen
Acked-by: Stefani Seibold
Cc: Roland Dreier
Cc: Dmitry Torokhov
Cc: Andy Walls
Cc: Vikram Dhillon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
In some upcoming code it's useful to peek into a FIFO without permanently
removing data. This patch implements a new kfifo_out_peek() to do this.

Signed-off-by: Andi Kleen
Acked-by: Stefani Seibold
Cc: Roland Dreier
Cc: Dmitry Torokhov
Cc: Andy Walls
Cc: Vikram Dhillon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Right now for kfifo_*_user it's not easily possible to distinguish between
a user copy failing and the FIFO not containing enough data. The problem
is that both conditions are multiplexed into the same return code.

Avoid this by moving the "copy length" into a separate output parameter
and only returning 0/-EFAULT in the main return value.

I didn't fully adapt the weird "record" variants, those seem
to be unused anyways and were rather messy (should they be just removed?)

I would appreciate some double checking if I did all the conversions
correctly.

Signed-off-by: Andi Kleen
Cc: Stefani Seibold
Cc: Roland Dreier
Cc: Dmitry Torokhov
Cc: Andy Walls
Cc: Vikram Dhillon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The pointers to user buffers are currently unsigned char *, which requires
a lot of casting in the caller for any non-char typed buffers. Use void *
instead.

Signed-off-by: Andi Kleen
Acked-by: Stefani Seibold
Cc: Roland Dreier
Cc: Dmitry Torokhov
Cc: Andy Walls
Cc: Vikram Dhillon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
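The kfifo changes listed for this day (power-of-two size, peeking without consuming, void * buffers) can be illustrated with a toy user-space FIFO. This is only a sketch under stated assumptions: it is not the kernel kfifo API, and every name in it (toy_fifo, toy_fifo_in, toy_fifo_peek, toy_fifo_out) is hypothetical.

```c
/* Toy user-space FIFO; NOT the kernel kfifo API. All names are hypothetical. */
struct toy_fifo {
    unsigned char *buf;
    unsigned int size;  /* must be a power of two */
    unsigned int in;    /* free-running write index */
    unsigned int out;   /* free-running read index */
};

/* Returns 0 on success, -1 if size is not a power of two. */
static int toy_fifo_init(struct toy_fifo *f, void *storage, unsigned int size)
{
    if (size == 0 || (size & (size - 1)) != 0)
        return -1;
    f->buf = storage;
    f->size = size;
    f->in = 0;
    f->out = 0;
    return 0;
}

/* Number of bytes currently stored. */
static unsigned int toy_fifo_len(const struct toy_fifo *f)
{
    return f->in - f->out;
}

/* Copy up to len bytes in; returns the number of bytes stored. */
static unsigned int toy_fifo_in(struct toy_fifo *f, const void *from,
                                unsigned int len)
{
    const unsigned char *src = from;
    unsigned int avail = f->size - toy_fifo_len(f);
    unsigned int i;

    if (len > avail)
        len = avail;
    /* Masking with size - 1 only wraps correctly for power-of-two sizes. */
    for (i = 0; i < len; i++)
        f->buf[(f->in + i) & (f->size - 1)] = src[i];
    f->in += len;
    return len;
}

/* Copy up to len bytes out without consuming them. */
static unsigned int toy_fifo_peek(const struct toy_fifo *f, void *to,
                                  unsigned int len)
{
    unsigned char *dst = to;
    unsigned int i;

    if (len > toy_fifo_len(f))
        len = toy_fifo_len(f);
    for (i = 0; i < len; i++)
        dst[i] = f->buf[(f->out + i) & (f->size - 1)];
    return len;
}

/* Copy up to len bytes out and consume them. */
static unsigned int toy_fifo_out(struct toy_fifo *f, void *to,
                                 unsigned int len)
{
    len = toy_fifo_peek(f, to, len);
    f->out += len;
    return len;
}
```

Because the buffer arguments are void *, a caller can pass the address of an int or a struct without a cast, which is the convenience the last commit above describes.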
15 Jan, 2010
6 commits
-
We should be clear on 2 things:
- the length parameter of a match callback includes the
trailing '\0'.
- the string to be searched might not be NUL-terminated.
Signed-off-by: Li Zefan
LKML-Reference:
Signed-off-by: Steven Rostedt
-
MATCH_FULL matching for PTR_STRING is not working correctly:
# echo 'func == vt' > events/bkl/lock_kernel/filter
# echo 1 > events/bkl/lock_kernel/enable
...
# cat trace
Xorg-1484 [000] 1973.392586: lock_kernel: ... func=vt_ioctl()
gpm-1402 [001] 1974.027740: lock_kernel: ... func=vt_ioctl()

We should pass to regex.match(..., len) the length (including '\0')
of the source string instead of the length of the pattern string.

Signed-off-by: Li Zefan
LKML-Reference:
Acked-by: Frederic Weisbecker
Signed-off-by: Steven Rostedt
-
The @str might not be NUL-terminated if it's of type
DYN_STRING or STATIC_STRING, so we should use strnstr()
instead of strstr().

Signed-off-by: Li Zefan
LKML-Reference:
Acked-by: Frederic Weisbecker
Signed-off-by: Steven Rostedt
-
For '*foo' pattern, we should allow any string ending with
'foo', but event filtering incorrectly disallows strings
like bar_foo_foo:

Signed-off-by: Li Zefan
LKML-Reference:
Acked-by: Frederic Weisbecker
Signed-off-by: Steven Rostedt
-
MATCH_FRONT_ONLY actually performs a full match:
# ./perf record -R -f -a -e lock:lock_acquire \
--filter 'name ~rcu_*' sleep 1
# ./perf trace
(no output)

We should pass the length of the pattern string to strncmp().
Signed-off-by: Li Zefan
LKML-Reference:
Acked-by: Frederic Weisbecker
Signed-off-by: Steven Rostedt
-
For '*foo' pattern, we should allow any string ending with
'foo', but ftrace filter incorrectly disallows strings
like bar_foo_foo:

# echo '*io' > set_ftrace_filter
# cat set_ftrace_filter | grep 'req_bio_endio'
# cat available_filter_functions | grep 'req_bio_endio'
req_bio_endio

Signed-off-by: Li Zefan
LKML-Reference:
Acked-by: Frederic Weisbecker
Signed-off-by: Steven Rostedt
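The matching rules described in this day's filter fixes can be sketched in user-space C. This is a minimal illustration, not the kernel's filter code: the helper names (bounded_len, match_front, match_end, match_middle, match_full) are hypothetical, and it assumes the caller passes the buffer size as max, so a string without a terminating '\0' is never read past its end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers for illustration; not the kernel's filter callbacks. */

/* Length of str, examining at most max bytes (str may lack a '\0'). */
static size_t bounded_len(const char *str, size_t max)
{
    size_t n = 0;

    while (n < max && str[n] != '\0')
        n++;
    return n;
}

/* 'foo*': compare only the pattern's length at the start of str. */
static int match_front(const char *str, size_t max, const char *pat)
{
    size_t slen = bounded_len(str, max);
    size_t plen = strlen(pat);

    return slen >= plen && memcmp(str, pat, plen) == 0;
}

/* '*foo': compare against the tail, so "bar_foo_foo" matches. */
static int match_end(const char *str, size_t max, const char *pat)
{
    size_t slen = bounded_len(str, max);
    size_t plen = strlen(pat);

    return slen >= plen && memcmp(str + slen - plen, pat, plen) == 0;
}

/* '*foo*': length-bounded substring search, in the spirit of strnstr(). */
static int match_middle(const char *str, size_t max, const char *pat)
{
    size_t slen = bounded_len(str, max);
    size_t plen = strlen(pat);
    size_t i;

    for (i = 0; i + plen <= slen; i++)
        if (memcmp(str + i, pat, plen) == 0)
            return 1;
    return 0;
}

/* Exact match: lengths must agree, so "vt" does not match "vt_ioctl". */
static int match_full(const char *str, size_t max, const char *pat)
{
    size_t slen = bounded_len(str, max);
    size_t plen = strlen(pat);

    return slen == plen && memcmp(str, pat, plen) == 0;
}
```

Comparing against the tail of the string, rather than against the first occurrence of the pattern, is what lets names such as req_bio_endio match '*io' in this sketch.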
13 Jan, 2010
1 commit
-
Currently, futexes have two problems:

A) The current futex code doesn't handle private file mappings properly.
get_futex_key() uses PageAnon() to distinguish file and
anon, which can cause the following bad scenario:

1) thread-A calls futex(private-mapping, FUTEX_WAIT) and
sleeps on the file mapping object.
2) thread-B writes the variable, which triggers a COW.
3) thread-B calls futex(private-mapping, FUTEX_WAKE), which
tries to wake up threads blocked on the anonymous page (but there are none).

B) The current futex code doesn't handle the zero page properly.
Read mode get_user_pages() can return the zero page, but the current
futex code doesn't handle it at all. The zero page then causes an
infinite loop internally.

The solution is to always use write mode get_user_pages() for
page lookup. It prevents the lookup of both file pages of private
mappings and the zero page.

Performance concerns:
Probably very little, because glibc always initializes futex variables
before calling futex(). It means glibc users never see
the overhead of this patch.

Compatibility concerns:
This patch has a few compatibility issues. After this patch,
FUTEX_WAIT requires writable access to futex variables (read-only
mappings return EFAULT). But practically it's not a problem:
glibc always initializes variables for futexes explicitly, so nobody
uses read-only mappings.

Reported-by: Hugh Dickins
Signed-off-by: KOSAKI Motohiro
Acked-by: Peter Zijlstra
Acked-by: Darren Hart
Cc:
Cc: Linus Torvalds
Cc: KAMEZAWA Hiroyuki
Cc: Nick Piggin
Cc: Ulrich Drepper
LKML-Reference:
Signed-off-by: Ingo Molnar
12 Jan, 2010
3 commits
-
When print-fatal-signals is enabled it's possible to dump any memory
reachable by the kernel to the log by simply jumping to that address from
user space.

Or crash the system if there's some hardware with read side effects.

The fatal signals handler will dump 16 bytes at the execution address,
which is fully controlled by ring 3.

In addition, when something jumps to an unmapped address there will be up to
16 additional useless page faults, which might be potentially slow (and at
least is not very efficient).

Fortunately this option is off by default and only there on i386.

But fix it by checking for kernel addresses and also stopping when there's
a page fault.

Signed-off-by: Andi Kleen
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!"
here in cgroup_diput():

    /*
     * if we're getting rid of the cgroup, refcount should ensure
     * that there are no pidlists left.
     */
    BUG_ON(!list_empty(&cgrp->pidlists));

The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused
when pidlist_array_load() calls cgroup_pidlist_find():

(1) if a matching cgroup_pidlist is found, it down_write's the mutex of the
pre-existing cgroup_pidlist, and increments its use_count.
(2) if no matching cgroup_pidlist is found, then a new one is allocated, it
down_write's its mutex, and the use_count is set to 0.
(3) the matching, or new, cgroup_pidlist gets returned back to pidlist_array_load(),
which increments its use_count -- regardless whether new or pre-existing --
and up_write's the mutex.

So if a matching list is ever encountered by cgroup_pidlist_find() during
the life of a cgroup directory, it results in an inflated use_count value,
preventing it from ever getting released by cgroup_release_pid_array().
Then if the directory is subsequently removed, cgroup_diput() hits the
BUG_ON() when it finds that the directory's cgroup is still populated with
a pidlist.

The patch simply removes the use_count increment when a matching pidlist
is found by cgroup_pidlist_find(), because it gets bumped by the calling
pidlist_array_load() function while still protected by the list's mutex.

Signed-off-by: Dave Anderson
Reviewed-by: Li Zefan
Acked-by: Ben Blum
Cc: Paul Menage
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Fix resource (write-pipe file) leak in call_usermodehelper_pipe().

When call_usermodehelper_exec() fails, the write-pipe file is left open and
call_usermodehelper_pipe() just returns an error. Since it is hard for the
caller to determine whether the error occurred when opening the pipe or
executing the helper, the caller cannot close the pipe by itself.

I found this resource leak when testing coredump. You can check how
the resource leaks as below:

$ echo "|nocommand" > /proc/sys/kernel/core_pattern
$ ulimit -c unlimited
$ while [ 1 ]; do ./segv; done &> /dev/null &
$ cat /proc/meminfo (
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jan, 2010
2 commits
-
If the very unlikely case happens where the writer moves the head by one
between where the head page is read and where the new reader page
is assigned _and_ the writer then writes and wraps the entire ring buffer
so that the head page is back to what was originally read as the head page,
the page to be swapped will have a corrupted next pointer.

Simple solution is to wrap the assignment of the next pointer with a
rb_list_head().

Signed-off-by: Steven Rostedt
-
This reference at the end of rb_get_reader_page() was causing off-by-one
writes to the prev pointer of the page after the reader page when that
page is the head page, and therefore the reader page has the RB_PAGE_HEAD
flag in its list.next pointer. This eventually results in a GPF in a
subsequent call to rb_set_head_page() (usually from rb_get_reader_page())
when that prev pointer is dereferenced. The dereferenced register would
characteristically have an address that appears shifted left by one byte
(eg, ffxxxxxxxxxxxxyy instead of ffffxxxxxxxxxxxx) due to being written at
an address one byte too high.

Signed-off-by: David Sharp
LKML-Reference:
Signed-off-by: Steven Rostedt
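Both ring-buffer fixes above come down to stripping flag bits from a list pointer before using it. The sketch below illustrates that idea in plain C; it is not the kernel's ring-buffer code. The names page_link, tag_head and list_head_clean are hypothetical, and the flag layout (a head flag in bit 0, two low bits reserved) is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical names and flag layout, for illustration only. */
#define HEAD_FLAG ((uintptr_t)1)  /* assumed: "head page" flag lives in bit 0 */
#define FLAG_MASK ((uintptr_t)3)  /* assumed: two low bits reserved for flags */

struct page_link {
    struct page_link *next;
    struct page_link *prev;
};

/* Store the head flag in the low bits of a suitably aligned pointer. */
static struct page_link *tag_head(struct page_link *p)
{
    return (struct page_link *)((uintptr_t)p | HEAD_FLAG);
}

/* Strip the flag bits to recover the real pointer before dereferencing. */
static struct page_link *list_head_clean(struct page_link *p)
{
    return (struct page_link *)((uintptr_t)p & ~FLAG_MASK);
}
```

With a tagged next pointer, writing through next->prev directly would land one byte past the intended field, which matches the shifted addresses described above; going through list_head_clean(next)->prev avoids that in this sketch.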
06 Jan, 2010
1 commit
-
Commit 35dead4 "modules: don't export section names of empty sections
via sysfs" changed the set of sections that have attributes, but did
not change the iteration over these attributes in add_notes_attrs().
This can lead to add_notes_attrs() creating attributes with the wrong
names or with null name pointers.

Introduce a sect_empty() function and use it in both add_sect_attrs()
and add_notes_attrs().

Reported-by: Martin Michlmayr
Signed-off-by: Ben Hutchings
Tested-by: Martin Michlmayr
Cc: stable@kernel.org
Signed-off-by: Rusty Russell
Signed-off-by: Linus Torvalds
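The idea of sharing one "is this section empty" predicate between two loops can be sketched as follows. This is a hedged illustration, not the module loader's code: toy_sect, toy_sect_empty, count_attrs and collect_names are hypothetical names, and the definition of "empty" used here (not allocated, or zero size) is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for an ELF section header. */
struct toy_sect {
    const char *name;
    int allocated;       /* assumed stand-in for an "allocated" flag */
    unsigned long size;
};

/* Single definition of "this section gets no attribute" (assumed rule). */
static int toy_sect_empty(const struct toy_sect *s)
{
    return !s->allocated || s->size == 0;
}

/* First pass: count the attributes that will exist. */
static size_t count_attrs(const struct toy_sect *s, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (!toy_sect_empty(&s[i]))
            count++;
    return count;
}

/* Second pass: same predicate, so the output indices line up with the count. */
static size_t collect_names(const struct toy_sect *s, size_t n,
                            const char **out)
{
    size_t i, k = 0;

    for (i = 0; i < n; i++)
        if (!toy_sect_empty(&s[i]))
            out[k++] = s[i].name;
    return k;
}
```

If the two passes used different rules, the second loop could pair attributes with the wrong names, which is the class of bug the commit above describes.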
01 Jan, 2010
3 commits
-
…el/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix NULL deref in inheritance code
perf: Pass appropriate frame pointer to dump_trace()
-
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf kmem: Fix statistics typo
kprobes: Fix distinct type warning
perf: Rename perf_event_hw_event in design document
perf tools: Add missing header files to LIB_H Makefile variable
perf record: We should fork only if a program was specified to run
perf diff: Fix usage array, it must end with a NULL entry
-
…nel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix sign fields in ftrace_define_fields_##call()
tracing/syscalls: Fix typo in SYSCALL_DEFINE0
tracing/kprobe: Show sign of fields in trace_kprobe format files
ksym_tracer: Remove trace_stat
ksym_tracer: Fix race when incrementing count
ksym_tracer: Fix to allow writing newline to ksym_trace_filter
ksym_tracer: Fix to make the tracer work
tracing: Kconfig spelling fixes and cleanups
tracing: Fix setting tracer specific options
Documentation: Update ftrace-design.txt
Documentation: Update tracepoint-analysis.txt
Documentation: Update mmiotrace.txt
31 Dec, 2009
1 commit
-
Liming found a NULL deref when a task has a perf context but no
counters when it forks.

This can occur in two cases: a race during construction where
the fork hits after installing the context but before the first
counter gets inserted, or more reproducibly, a fork after the
last counter is closed (which leaves the context around).

Reported-by: Wang Liming
Signed-off-by: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: Paul Mackerras
CC:
LKML-Reference:
Signed-off-by: Ingo Molnar
30 Dec, 2009
6 commits
-
Add is_signed_type() call to trace_define_field() in ftrace macros.
The code previously just passed in 0 (false), disregarding whether
or not the field was actually a signed type.

Signed-off-by: Lai Jiangshan
LKML-Reference:
Signed-off-by: Steven Rostedt
-
The format files of trace_kprobe do not show the sign of the fields.
The other format files show whether each field is signed, and
this patch makes the trace_kprobe formats consistent with the others.

Signed-off-by: Lai Jiangshan
LKML-Reference:
Acked-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt
-
trace_stat is problematic. Don't use it, use seqfile instead.
This fixes a race: reading the stat file is not protected by
any lock, which can lead to a use after free.

Signed-off-by: Li Zefan
Cc: Steven Rostedt
Cc: K.Prasad
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
-
We are in an RCU read-side section but not holding the write lock, so
count++ is not atomic. Use atomic64_t instead.

Signed-off-by: Li Zefan
Cc: Steven Rostedt
Cc: K.Prasad
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
-
It used to work, but now doesn't:
# echo > ksym_filter
bash: echo: write error: Invalid argument

It's caused by d954fbf0ff6b5fdfb32350e85a2f15d3db976506
("tracing: Fix wrong usage of strstrip in trace_ksyms").

Signed-off-by: Li Zefan
Cc: Steven Rostedt
Cc: K.Prasad
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
-
ksym tracer doesn't work:
# echo tasklist_lock:rw- > ksym_trace_filter
-bash: echo: write error: No such device

It's because we pass to perf_event_create_kernel_counter()
a cpu number which is not present.

Signed-off-by: Li Zefan
Cc: Steven Rostedt
Cc: K.Prasad
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
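One of the ksym_tracer fixes listed for this day replaces a plain count++ with an atomic 64-bit counter. A rough user-space analogue using C11 atomics is sketched below; it is not the kernel's atomic64_t API, and toy_entry, toy_hit and toy_read are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical entry type; the kernel change uses atomic64_t instead. */
struct toy_entry {
    _Atomic uint64_t count;
};

/* Safe to call from several threads at once: the add is one atomic step. */
static void toy_hit(struct toy_entry *e)
{
    atomic_fetch_add_explicit(&e->count, 1, memory_order_relaxed);
}

/* Read the current value of the counter. */
static uint64_t toy_read(struct toy_entry *e)
{
    return atomic_load(&e->count);
}
```

A plain count++ is a separate load, add and store, so two concurrent updaters can lose an increment; the atomic add closes that window without taking a lock.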
28 Dec, 2009
2 commits
-
Fix filename reference (ftrace-implementation.txt ->
ftrace-design.txt).

Fix spelling, punctuation, grammar.

Fix help text indentation and line lengths to reduce need for
horizontal scrolling or larger window sizes.

Signed-off-by: Randy Dunlap
Cc: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Every time I see this:
kernel/kprobes.c: In function 'register_kretprobe':
kernel/kprobes.c:1038: warning: comparison of distinct pointer types lacks a cast

I'm wondering if something changed in common code and we need to
do something for s390. Apparently that's not the case.
Let's get rid of this annoying warning.

Signed-off-by: Heiko Carstens
Acked-by: Ananth N Mavinakayanahalli
Cc: Masami Hiramatsu
LKML-Reference:
Signed-off-by: Ingo Molnar
25 Dec, 2009
1 commit
-
* 'sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6:
SYSCTL: Add a mutex to the page_alloc zone order sysctl
SYSCTL: Print binary sysctl warnings (nearly) only once
24 Dec, 2009
2 commits
-
When printing legacy sysctls print the warning message
for each of them only once. This way there is a guarantee
the syslog won't be flooded for any sane program.

The original attempt at this made the tables non const and stored
the flag inline.

Linus suggested using a separate hash table for this, this is based on a
code snippet from him.

The hash implies this is not exact and can sometimes not print a
new sysctl due to a hash collision, but in practice this should not
be a problem.

I used a FNV32 hash over the binary string with a 32byte bitmap. This
gives relatively few collisions when all the predefined binary sysctls
are hashed:

size 256
bucket
length      number
0:          [25]
1:          [67]
2:          [88]
3:          [47]
4:          [22]
5:          [6]
6:          [1]

The worst case is a single collision of 6 hash values.

Signed-off-by: Andi Kleen
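The warn-once scheme described above (an FNV32 hash folded into a 256-bit bitmap) can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: should_warn and the macro names are hypothetical, the name is assumed to be an array of ints, and the xor-then-multiply ordering is one common FNV variant chosen for the example.

```c
#include <stdint.h>

#define FNV32_OFFSET 2166136261U  /* standard 32-bit FNV offset basis */
#define FNV32_PRIME  16777619U    /* standard 32-bit FNV prime */
#define WARN_BITS    256          /* 32-byte bitmap */

/* One bit per hash bucket; zero-initialized, so nothing has warned yet. */
static unsigned char warned[WARN_BITS / 8];

/* Returns 1 the first time a name hashes to its bit, 0 afterwards. */
static int should_warn(const int *name, int nlen)
{
    uint32_t hash = FNV32_OFFSET;
    int i;

    for (i = 0; i < nlen; i++)
        hash = (hash ^ (uint32_t)name[i]) * FNV32_PRIME;
    hash %= WARN_BITS;

    if (warned[hash / 8] & (1u << (hash % 8)))
        return 0;  /* already warned, or a colliding name did */
    warned[hash / 8] |= (unsigned char)(1u << (hash % 8));
    return 1;
}
```

The trade-off is the one the commit message names: two different names that land on the same bit share a single warning, in exchange for a fixed 32 bytes of state and const tables.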
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Revert 738d2be, simplify set_task_cpu()