31 May, 2010
4 commits
-
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mutex: Fix optimistic spinning vs. BKL -
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tui: Fix last use_browser problem related to .perfconfig
perf symbols: Add the build id cache to the vmlinux path
perf tui: Reset use_browser if stdout is not a tty
ring-buffer: Move zeroing out excess in page to ring buffer code
ring-buffer: Reset "real_end" when page is filled -
If there's only one CPU online when disable_nonboot_cpus() is called,
the error variable will not be initialized and that may lead to
erroneous behavior. Fix this issue by initializing error in
disable_nonboot_cpus() as appropriate.

Signed-off-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds -
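The single-CPU case fixed by the disable_nonboot_cpus() commit above can be illustrated in user space. This is a minimal sketch, not the kernel code: the `_sketch` names, the `online[]` array and the helper that takes a CPU down are all hypothetical stand-ins. It only shows why the result variable needs an initial value when the loop body never runs.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 8	/* hypothetical array size for the example */

/* Hypothetical stand-in for taking one CPU offline; 0 means success. */
static int take_cpu_down_sketch(bool online[], int cpu)
{
	online[cpu] = false;
	return 0;
}

/*
 * Take every online CPU except the first one offline.
 * 'error' starts at 0 so the return value is well defined even when
 * the loop never calls take_cpu_down_sketch(), i.e. when only one CPU
 * is online.
 */
static int disable_nonboot_cpus_sketch(bool online[], int ncpus)
{
	int first_cpu = -1;
	int error = 0;	/* without this, the one-CPU case returns garbage */

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (!online[cpu])
			continue;
		if (first_cpu < 0) {
			first_cpu = cpu;	/* keep the first CPU running */
			continue;
		}
		error = take_cpu_down_sketch(online, cpu);
		if (error)
			break;
	}
	return error;
}
```

With a single online CPU the function returns 0 instead of whatever happened to be on the stack.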
This reverts commit 0ac0c0d0f837c499afd02a802f9cf52d3027fa3b, which
caused cross-architecture build problems for all the wrong reasons.
IA64 already added its own version of __node_random(), but the fact is,
there is nothing architectural about the function, and the original
commit was just badly done. Revert it, since no fix is forthcoming.

Requested-by: Stephen Rothwell
Signed-off-by: Linus Torvalds
30 May, 2010
2 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: clean up on forwarded aborted mds request
ceph: fix leak of osd authorizer
ceph: close out mds, osd connections before stopping auth
ceph: make lease code DN specific
fs/ceph: Use ERR_CAST
ceph: renew auth tickets before they expire
ceph: do not resend mon requests on auth ticket renewal
ceph: removed duplicated #includes
ceph: avoid possible null dereference
ceph: make mds requests killable, not interruptible
sched: add wait_for_completion_killable_timeout -
Add missing _killable_timeout variant for wait_for_completion that will
return when a timeout expires or the task is killed.

CC: Ingo Molnar
CC: Andreas Herrmann
CC: Thomas Gleixner
CC: Mike Galbraith
Acked-by: Peter Zijlstra
Signed-off-by: Sage Weil
29 May, 2010
1 commit
-
…el/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix_timer: Fix error path in timer_create
hrtimer: Avoid double seqlock
timers: Move local variable into else section
timers: Fix slack calculation really
28 May, 2010
33 commits
-
once anon_inode_getfd() is called, you can't expect *anything* about
struct file that descriptor points to - another thread might be doing
whatever it likes with descriptor table at that point.

Cc: stable
Signed-off-by: Al Viro -
…git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
tracing: Add __used annotation to event variable
perf, trace: Fix !x86 build bug
perf report: Support multiple events on the TUI
perf annotate: Fix up usage of the build id cache
x86/mmiotrace: Remove redundant instruction prefix checks
perf annotate: Add TUI interface
perf tui: Remove annotate from popup menu after failure
perf report: Don't start the TUI if -D is used
perf: Fix getline undeclared
perf: Optimize perf_tp_event_match()
perf: Remove more code from the fastpath
perf: Optimize the !vmalloc backed buffer
perf: Optimize perf_output_copy()
perf: Fix wakeup storm for RO mmap()s
perf-record: Share per-cpu buffers
perf-record: Remove -M
perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
... -
Move CLOCK_DISPATCH(which_clock, timer_create, (new_timer)) after all
possible EFAULT errors.

*_timer_create may allocate/get resources.
(for example posix_cpu_timer_create does get_task_struct)

[ tglx: fold the remove crappy comment patch into this ]
Signed-off-by: Andrey Vagin
Cc: Oleg Nesterov
Cc: Pavel Emelyanov
Cc:
Reviewed-by: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner -
Commit e9fb7631ebcd ("cpu-hotplug: introduce cpu_notify(),
__cpu_notify(), cpu_notify_nofail()") also introduced this annoying
warning:

kernel/cpu.c:157: warning: 'cpu_notify_nofail' defined but not used

when CONFIG_HOTPLUG_CPU wasn't set.

So move that helper inside the #ifdef CONFIG_HOTPLUG_CPU region, and
simplify it while at it.

Signed-off-by: Linus Torvalds
-
In kernel profiling requires that we be able to allocate "local" memory
for each cpu. Use "cpu_to_mem()" instead of "cpu_to_node()" to support
memoryless nodes.

Depends on the "numa_mem_id()" patch.
Signed-off-by: Lee Schermerhorn
Cc: Tejun Heo
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Nick Piggin
Cc: David Rientjes
Cc: Eric Whitney
Cc: KAMEZAWA Hiroyuki
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: "Luck, Tony"
Cc: Pekka Enberg
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Most distros turn the console verbosity down and that means a backtrace
after a panic never makes it to the console. I assume we haven't seen
this because a panic is often preceded by an oops which will have called
console_verbose. There are however a lot of places we call panic
directly, and they are broken.

Use console_verbose like we do in the oops path to ensure a directly
called panic will print a backtrace.

Signed-off-by: Anton Blanchard
Acked-by: Greg Kroah-Hartman
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
copy_process(pid => &init_struct_pid) doesn't do attach_pid/etc.
It shouldn't, but this means that the idle threads run with the wrong
pids copied from the caller's task_struct. In x86 case the caller is
either kernel_init() thread or keventd.

In particular, this means that after the series of cpu_up/cpu_down an
idle thread (which never exits) can run with .pid pointing to nowhere.

Change fork_idle() to initialize idle->pids[] correctly. We only set
.pid = &init_struct_pid but do not add .node to list, INIT_TASK() does
the same for the boot-cpu idle thread (swapper).

Signed-off-by: Oleg Nesterov
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Eric Biederman
Cc: Herbert Poetzl
Cc: Mathias Krause
Acked-by: Roland McGrath
Acked-by: Serge Hallyn
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On a system with a substantial number of processors, the early default
pid_max of 32k will not be enough. On a system with 1664 CPUs, there are
25163 processes started before the login prompt. It's estimated that with
2048 CPUs we will pass the 32k limit. With 4096, we'll reach that limit
very early during the boot cycle, and processes would stall waiting for an
available pid.

This patch increases the early maximum number of pids available, and
increases the minimum number of pids that can be set during runtime.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Hedi Berriche
Signed-off-by: Mike Travis
Signed-off-by: Robin Holt
Acked-by: Linus Torvalds
Cc: Ingo Molnar
Cc: Pavel Machek
Cc: Alan Cox
Cc: Greg KH
Cc: Rik van Riel
Cc: John Stoffel
Cc: Jack Steiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
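One way to picture the pid_max change above is a default that grows with the number of possible CPUs, clamped to a hard upper bound. The sketch below is only an illustration under stated assumptions: the commit message gives no formula, so the per-CPU allowance, the upper limit and the `_SKETCH` names are hypothetical; only the 32k early default comes from the text.

```c
/* Illustrative values; the commit message does not state the real ones. */
#define PID_MAX_DEFAULT_SKETCH	32768			/* the "32k" early default */
#define PID_MAX_LIMIT_SKETCH	(4 * 1024 * 1024)	/* hypothetical hard upper bound */
#define PIDS_PER_CPU_SKETCH	1024			/* hypothetical per-CPU allowance */

/* Pick an early pid_max that grows with the CPU count but stays bounded. */
static int early_pid_max_sketch(int possible_cpus)
{
	long scaled = (long)PIDS_PER_CPU_SKETCH * possible_cpus;
	long pid_max = PID_MAX_DEFAULT_SKETCH;

	if (scaled > pid_max)
		pid_max = scaled;		/* large machine: raise the default */
	if (pid_max > PID_MAX_LIMIT_SKETCH)
		pid_max = PID_MAX_LIMIT_SKETCH;	/* never exceed the hard limit */
	return (int)pid_max;
}
```

Under these assumed constants a small machine keeps the 32k default, while a machine with thousands of CPUs gets a proportionally larger pid space from the start of boot.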
When CONFIG_HOTPLUG_CPU=n, get_online_cpus() does nothing, so we don't
need cpu_hotplug_begin() either.

This patch moves cpu_hotplug_begin()/cpu_hotplug_done() into the code
block of CONFIG_HOTPLUG_CPU=y.

Signed-off-by: Lai Jiangshan
Cc: Gautham R Shenoy
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With the previous modification, the cpu notifier can return an
encapsulated errno value. This converts the cpu notifiers for kernel/*.c

Signed-off-by: Akinobu Mita
Cc: Ingo Molnar
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, when onlining or offlining a CPU fails because one of the cpu
notifiers returns an error, the result is always -EINVAL. (i.e. writing
0 or 1 to /sys/devices/system/cpu/cpuX/online gets EINVAL)

To get better error reporting rather than always getting -EINVAL, this
changes cpu_notify() to return a -errno value with notifier_to_errno() and
fixes the callers. Now cpu notifiers can return an encapsulated errno
value.

Currently, all cpu hotplug notifiers return NOTIFY_OK, NOTIFY_BAD, or
NOTIFY_DONE. So cpu_notify() can return 0 or -EPERM with this change for
now.

(notifier_to_errno(NOTIFY_OK) == 0, notifier_to_errno(NOTIFY_DONE) == 0,
notifier_to_errno(NOTIFY_BAD) == -EPERM)

Forthcoming patches convert several cpu notifiers to return an
encapsulated errno value with notifier_from_errno().

Signed-off-by: Akinobu Mita
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
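The errno encapsulation that the cpu_notify() commit above relies on can be tried out in user space. The following is a sketch modeled on the helpers in include/linux/notifier.h; the constant values and function bodies are reproduced as assumptions rather than quoted from the tree, and are chosen so that the three mappings listed in the commit message hold.

```c
#include <errno.h>

/* Assumed notifier return codes (user-space copies for illustration). */
#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)

/* Pack a negative errno (or 0) into a notifier return value. */
static int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Recover 0 or a negative errno from a notifier return value. */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}
```

With this encoding a notifier that fails with, say, -ENOMEM can pass that exact value back to the caller, while the legacy NOTIFY_BAD keeps decoding to -EPERM.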
No functional change. These are just wrappers of
raw_cpu_notifier_call_chain.

Signed-off-by: Akinobu Mita
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
No functional changes, just s/atomic_t count/int nr_threads/.
With the recent changes this counter has a single user, get_nr_threads().
And none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit. It is only used to
report this value to user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal->live and kill ->nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov
Cc: Alexey Dobriyan
Cc: "Eric W. Biederman"
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Trivial, use get_nr_threads() helper to read signal->count which we are
going to change.

Like other callers, proc_sched_show_task() doesn't need the exactly
precise nr_threads.

David said:
: Note that get_nr_threads() isn't completely equivalent (it can return 0
: where proc_sched_show_task() will display a 1). But I don't think this
: should be a problem.

Signed-off-by: Oleg Nesterov
Acked-by: David Howells
Cc: Peter Zijlstra
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
check_unshare_flags(CLONE_SIGHAND) adds CLONE_THREAD to *flags_ptr if the
task is multithreaded to ensure unshare_thread() will fail.

Not only is this a bit strange way to return the error, this is absolutely
meaningless. If signal->count > 1 then sighand->count must be also > 1,
and unshare_sighand() will fail anyway.

In fact, all CLONE_THREAD/SIGHAND/VM checks inside sys_unshare() do not
look right. Fortunately this code doesn't really work anyway.

Signed-off-by: Oleg Nesterov
Cc: Balbir Singh
Acked-by: Roland McGrath
Cc: Veaceslav Falico
Cc: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move taskstats_tgid_free() from __exit_signal() to free_signal_struct().
This way signal->stats never points to nowhere and we can read ->stats
lockless.

Signed-off-by: Oleg Nesterov
Cc: Balbir Singh
Cc: Roland McGrath
Cc: Veaceslav Falico
Cc: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Kill the empty thread_group_cputime_free() helper. It was needed to free
the per-cpu data which we no longer have.

Signed-off-by: Oleg Nesterov
Cc: Balbir Singh
Cc: Roland McGrath
Cc: Veaceslav Falico
Cc: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cleanup:
- Add the boolean, group_dead = thread_group_leader(), for clarity.
- Do not test/set sig == NULL to detect the all-dead case, use this
  boolean.
- Pass this boolean to __unhash_process() and use it instead of another
  thread_group_leader() call which needs ->group_leader.

This can be considered as microoptimization, but hopefully this also
allows us to do other cleanups later.

Signed-off-by: Oleg Nesterov
Cc: Balbir Singh
Cc: Roland McGrath
Cc: Veaceslav Falico
Cc: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Now that task->signal can't go away we can revert the horrible hack added
by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for
account_group_exec_runtime(), make sure ->signal can't be freed under
rq->lock").

And we can do more cleanups in sched_stats.h/posix-cpu-timers.c later.
Signed-off-by: Oleg Nesterov
Cc: Alan Cox
Cc: Ingo Molnar
Cc: Peter Zijlstra
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When the last thread exits signal->tty is freed, but the pointer is not
cleared and points to nowhere.

This is OK. Nobody should use signal->tty lockless, and it is no longer
possible to take ->siglock. However this looks wrong even if correct, and
the nice OOPS is better than subtle and hard to find bugs.

Change __exit_signal() to clear signal->tty under ->siglock.

Note: __exit_signal() needs more cleanups. It should not check "sig !=
NULL" to detect the all-dead case and we have the same issues with
signal->stats.

Signed-off-by: Oleg Nesterov
Cc: Alan Cox
Cc: Ingo Molnar
Acked-by: Peter Zijlstra
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have a lot of problems with accessing task_struct->signal, it can
"disappear" at any moment. Even current can't use its ->signal safely
after exit_notify(). ->siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task->signal even
after this task is already dead.

This patch adds the reference counter, sigcnt, into signal_struct. This
reference is owned by task_struct and it is dropped in
__put_task_struct(). Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it. With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free ->signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
- when the last thread exits signal->tty can point to nowhere, see
  the next patch.
- with or without this patch signal_struct->count should go away,
  or at least it should be "int nr_threads" for fs/proc. This will
  be addressed later.

Signed-off-by: Oleg Nesterov
Cc: Alan Cox
Cc: Ingo Molnar
Cc: Peter Zijlstra
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
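The lifetime rule introduced by the sigcnt commit above (each task owns one reference on the shared signal structure and drops it only when the task itself is freed) can be sketched in user space with C11 atomics. All names ending in `_sketch`, the `payload` field and the `freed_sig_refs_sketch` counter are hypothetical; the kernel uses its own atomic_t and slab allocator rather than what is shown here.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct sig_refs_sketch {
	atomic_int sigcnt;	/* one reference per task sharing this struct */
	int payload;		/* stand-in for the real per-group state */
};

struct task_refs_sketch {
	struct sig_refs_sketch *signal;	/* stays valid for the task's lifetime */
};

static int freed_sig_refs_sketch;	/* counts frees, for illustration only */

/* Allocate a zeroed structure holding the first task's reference. */
static struct sig_refs_sketch *alloc_sig_refs_sketch(void)
{
	struct sig_refs_sketch *sig = calloc(1, sizeof(*sig));

	if (sig)
		atomic_init(&sig->sigcnt, 1);
	return sig;
}

/* Taken when another thread joins the group. */
static void get_sig_refs_sketch(struct sig_refs_sketch *sig)
{
	atomic_fetch_add(&sig->sigcnt, 1);
}

/* Free the structure when the last reference goes away. */
static void put_sig_refs_sketch(struct sig_refs_sketch *sig)
{
	if (atomic_fetch_sub(&sig->sigcnt, 1) == 1) {
		free(sig);
		freed_sig_refs_sketch++;
	}
}

/* Counterpart of dropping the reference when the task struct is freed. */
static void put_task_refs_sketch(struct task_refs_sketch *t)
{
	put_sig_refs_sketch(t->signal);
	t->signal = NULL;
}
```

Because the structure is freed only by the final put, a surviving task can keep dereferencing its ->signal pointer after other threads in the group have gone away.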
tty_kref_put() has two callsites in copy_process() paths,
1. if copy_process() succeeds it is called before we copy
   signal->tty from parent
2. otherwise it is called from __cleanup_signal() under
   bad_fork_cleanup_signal: label

In both cases tty_kref_put() is not right and unneeded because we don't
have the balancing tty_kref_get(). Fortunately, this is harmless because
this can only happen without CLONE_THREAD, and in this case signal->tty
must be NULL.

Remove tty_kref_put() from copy_process() and __cleanup_signal(), and
change another caller of __cleanup_signal(), __exit_signal(), to call
tty_kref_put() by hand.

I hope this change makes sense by itself, but it is also needed to make
->signal refcountable.

Signed-off-by: Oleg Nesterov
Acked-by: Alan Cox
Acked-by: Roland McGrath
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Preparation to make task->signal immutable, no functional changes.
posix-cpu-timers.c checks task->signal != NULL to ensure this task is
alive and didn't pass __exit_signal(). This is correct but we are going
to change the lifetime rules for ->signal and never reset this pointer.

Change the code to check ->sighand instead, it doesn't matter which
pointer we check under tasklist, they both are cleared simultaneously.

As Roland pointed out, some of these changes are not strictly needed and
probably it makes sense to revert them later, when ->signal will be pinned
to task_struct. But this patch tries to ensure the subsequent changes in
fork/exit can't make any visible impact on posix cpu timers.

Signed-off-by: Oleg Nesterov
Cc: Fenghua Yu
Acked-by: Roland McGrath
Cc: Stanislaw Gruszka
Cc: Tony Luck
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change __exit_signal() to check thread_group_leader() instead of
atomic_dec_and_test(&sig->count). This must be equivalent, the group
leader must be released only after all other threads have exited and
passed __exit_signal().

Henceforth sig->count is not actually used, except in fs/proc for
get_nr_threads/etc.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Veaceslav Falico
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
de_thread() and __exit_signal() use signal_struct->count/notify_count for
synchronization. We can simplify the code and use ->notify_count only.
Instead of comparing these two counters, we can change de_thread() to set
->notify_count = nr_of_sub_threads, then change __exit_signal() to
dec-and-test this counter and notify group_exit_task.

Note that __exit_signal() checks "notify_count > 0" just for symmetry with
exit_notify(), we could just check it is != 0.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Veaceslav Falico
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
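The ->notify_count scheme from the de_thread()/__exit_signal() commit above boils down to a countdown: the exec'ing thread records how many sub-threads must exit, and each exiting sub-thread decrements the counter and notifies the waiter when it reaches zero. The sketch below is a single-threaded illustration with hypothetical names; the real code runs under the kernel's locks and wakes ->group_exit_task rather than setting a flag.

```c
#include <stdbool.h>

struct exec_sync_sketch {
	int notify_count;	/* sub-threads the exec'ing thread still waits for */
	bool waiter_notified;	/* stand-in for waking ->group_exit_task */
};

/* de_thread() side: record how many other threads must exit first. */
static void de_thread_begin_sketch(struct exec_sync_sketch *s, int nr_sub_threads)
{
	s->notify_count = nr_sub_threads;
	s->waiter_notified = false;
}

/* __exit_signal() side: run once by each exiting sub-thread. */
static void exit_signal_sketch(struct exec_sync_sketch *s)
{
	/* dec-and-test: only the last exiting sub-thread notifies the waiter */
	if (s->notify_count > 0 && --s->notify_count == 0)
		s->waiter_notified = true;
}
```

With a single counter there is nothing left to compare against signal_struct->count, which is the simplification the commit describes.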
Change zap_other_threads() to return the number of other sub-threads found
on ->thread_group list.

Other changes are cosmetic:
- change the code to use while_each_thread() helper
- remove the obsolete comment about SIGKILL/SIGSTOP
Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Veaceslav Falico
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
signal_struct->count in its current form must die.
- it has no reason to be atomic_t
- it looks like a reference counter, but it is not
- otoh, we really need to make task->signal refcountable, just look at
  the extremely ugly task_rq_unlock_wait() called from __exit_signal().
- we should change the lifetime rules for task->signal, it should be
  pinned to task_struct. We have a lot of code which can be simplified
  after that.
- it is not needed! while the code is correct, any usage of this
  counter is artificial, except fs/proc uses it correctly to show the
  number of threads.

This series removes the usage of sig->count from exit paths.

This patch:

Now that Veaceslav changed copy_signal() to use zalloc(), exit_notify()
can just check notify_count < 0 to ensure the execing sub-thread needs
the notification from us. No need to do other checks, notify_count != 0
must always mean ->group_exit_task != NULL is waiting for us.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Veaceslav Falico
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
UMH_WAIT_EXEC should report the error if kernel_thread() fails, like
UMH_WAIT_PROC does.

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__call_usermodehelper(UMH_NO_WAIT) has 2 problems:
- if kernel_thread() fails, call_usermodehelper_freeinfo()
  is not called.
- for unknown reason UMH_NO_WAIT has UMH_WAIT_PROC logic,
  we spawn yet another thread which waits until the user
  mode application exits.

Change the UMH_NO_WAIT code to use ____call_usermodehelper() instead of
wait_for_helper(), and do call_usermodehelper_freeinfo() unconditionally.
We can rely on CLONE_VFORK, do_fork(CLONE_VFORK) waits until the child
exits or execs.

With or without this patch UMH_NO_WAIT does not report the error if
kernel_thread() fails, this is correct since the caller doesn't wait for
result.

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
1. wait_for_helper() calls allow_signal(SIGCHLD) to ensure the child
   can't autoreap itself.

   However, this means that a spurious SIGCHLD from user-space can
   set TIF_SIGPENDING and:

   - kernel_thread() or sys_wait4() can fail due to signal_pending()
   - worse, wait4() can fail before ____call_usermodehelper() execs
     or exits. In this case the caller may kfree(subprocess_info)
     while the child still uses this memory.

   Change the code to use SIG_DFL instead of magic "(void __user *)2"
   set by allow_signal(). This means that SIGCHLD won't be delivered,
   yet the child won't autoreap itself.

   The problem is minor, only root can send a signal to this kthread.

2. If sys_wait4(&ret) fails it doesn't populate "ret", in this case
   wait_for_helper() reports a random value from uninitialized var.

   With this patch sys_wait4() should never fail, but still it makes
   sense to initialize ret = -ECHILD so that the caller can notice
   the problem.

Signed-off-by: Oleg Nesterov
Acked-by: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
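The second point of the wait_for_helper() commit above, a wait call that leaves its output untouched on failure, can be shown with a small user-space sketch. Both functions are hypothetical stand-ins and not the kernel code: `wait_sketch()` plays the role of sys_wait4() and only writes the status on success.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for sys_wait4(): writes *status only on success. */
static int wait_sketch(bool fail, int *status)
{
	if (fail)
		return -1;	/* *status is deliberately left untouched */
	*status = 0;
	return 0;
}

/* Report the child's status, or -ECHILD if the wait itself failed. */
static int wait_for_helper_sketch(bool wait_fails)
{
	int ret = -ECHILD;	/* sentinel instead of an uninitialized value */

	wait_sketch(wait_fails, &ret);
	return ret;
}
```

Pre-loading the sentinel means a failed wait shows up as a recognisable error code rather than as leftover stack contents.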
____call_usermodehelper() correctly calls flush_signal_handlers() to set
SIG_DFL, but sigemptyset(->blocked) and recalc_sigpending() are not
needed.

This kthread was forked by workqueue thread, all signals must be unblocked
and ignored, no pending signal is possible.

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Now that nobody ever changes subprocess_info->cred we can kill this member
and related code. ____call_usermodehelper() always runs in the context of
freshly forked kernel thread, it has the proper ->cred copied from its
parent kthread, keventd.

Signed-off-by: Oleg Nesterov
Acked-by: Neil Horman
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change
subprocess_info->cred in advance. Now that we have info->init() we can
change this code to set tgcred->session_keyring in context of execing
kernel thread.

Note: since currently call_usermodehelper_keys() is never called with
UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup()
are not really needed, we could rely on install_session_keyring_to_cred()
which does key_get() on success.

Signed-off-by: Oleg Nesterov
Acked-by: Neil Horman
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds