18 Oct, 2007
2 commits
-
Doh, I completely missed that devices marked DUMMY are not running
the set_mode function. So we force broadcasting, but we keep the
local APIC timer running.Let the clock event layer mark the device _after_ switching it off.
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar -
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
sched: fix new task startup crash
sched: fix !SYSFS build breakage
sched: fix improper load balance across sched domain
sched: more robust sd-sysctl entry freeing
17 Oct, 2007
38 commits
-
This patch contains the following cleanups that are now possible:
- remove the unused security_operations->inode_xattr_getsuffix
- remove the no longer used security_operations->unregister_security
- remove some no longer required exit code
- remove a bunch of no longer used exportsSigned-off-by: Adrian Bunk
Acked-by: James Morris
Cc: Chris Wright
Cc: Stephen Smalley
Cc: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For those who don't care about CONFIG_SECURITY.
Signed-off-by: Alexey Dobriyan
Cc: "Serge E. Hallyn"
Cc: Casey Schaufler
Cc: James Morris
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change migration_call(CPU_DEAD) to use direct spin_lock_irq() instead of
task_rq_lock(rq->idle), rq->idle can't change its task_rq().This makes the code a bit more symmetrical with migrate_dead_tasks()'s path
which uses spin_lock_irq/spin_unlock_irq.Signed-off-by: Oleg Nesterov
Cc: Cliff Wickman
Cc: Gautham R Shenoy
Cc: Ingo Molnar
Cc: Srivatsa Vaddagiri
Cc: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently move_task_off_dead_cpu() is called under
write_lock_irq(tasklist). This means it can't use task_lock() which is
needed to improve migrating to take task's ->cpuset into account.Change the code to call move_task_off_dead_cpu() with irqs enabled, and
change migrate_live_tasks() to use read_lock(tasklist).This all is a preparation for the futher changes proposed by Cliff Wickman, see
http://marc.info/?t=117327786100003Signed-off-by: Oleg Nesterov
Cc: Cliff Wickman
Cc: Gautham R Shenoy
Cc: Ingo Molnar
Cc: Srivatsa Vaddagiri
Cc: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
load_module() returns zero when mod_sysfs_init() fails, then the module
loading will succeed accidentally.This patch makes load_module() return error correctly in that case.
Acked-by: Greg Kroah-Hartman
Acked-by: Rusty Russell
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Compiling handle_percpu_irq only on uniprocessor generates an artificial
special case so a typical use like:set_irq_chip_and_handler(irq, &some_irq_type, handle_percpu_irq);
needs to be conditionally compiled only on SMP systems as well and an
alternative UP construct is usually needed - for no good reason.This fixes uniprocessor configurations for some MIPS SMP systems.
Signed-off-by: Ralf Baechle
Acked-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Right now futexfs and inotifyfs have one magic 0xBAD1DEA, that looks a
little bit confusing. Use 0xBAD1DEA as magic for futexfs and 0x2BAD1DEA as
magic for inotifyfs.Signed-off-by: Andrey Mirkin
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The blessed way for standard caches is to use it. Besides, this may give
this cache a better alignment.Signed-off-by: Pavel Emelyanov
Acked-by: Cedric Le Goater
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For those who deselect POSIX message queues.
Reduces SLAB size of user_struct from 64 to 32 bytes here, SLUB size -- from
40 bytes to 32 bytes.[akpm@linux-foundation.org: fix build]
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Save some space because uid_hash_find() has 3 callsites.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
.. in an effort to make read-only whatever can be made, so that
CONFIG_DEBUG_RODATA can catch as many issues as possible.Signed-off-by: Jan Beulich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
{,un}register_timer_hook() is the API that should be used.
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
kernel/sys_ni.c can't #include due to cond_syscall(),
but let's tell gcc to not warn with -Wmissing-prototypes.Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Kconfig.preempt is not included on some archs (for example, m68k). On those
archs, the Kconfig machinery complains that KVM selects an undefined symbol
PREEMPT_NOTIFIERS (which lives in Kconfig.preempt).So move the offending symbol into a Kconfig file which is included by
everyone.Cc: Roman Zippel
Cc: Geert Uytterhoeven
Signed-off-by: Avi Kivity
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
robust_list, compat_robust_list, pi_state_list, pi_state_cache are
really used if futexes are on.Signed-off-by: Alexey Dobriyan
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. Old vmcoreinfo macros
were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is
impossible to grep for them. So these names should be changed. This
discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.htmlSigned-off-by: Ken'ichi Ohmichi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
[2/3] Add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data.
The dump filetering command 'makedumpfile'(v1.1.6 or before) had assumed
the above values, and it was not good from the reliability viewpoint.
So makedumpfile v1.2.0 came to need these values and I created the patch
to let the kernel output them.
makedumpfile site:
https://sourceforge.net/projects/makedumpfile/Signed-off-by: Ken'ichi Ohmichi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
[1/3] Cleanup the coding style according to Andrew's comments:
http://lists.infradead.org/pipermail/kexec/2007-August/000522.html
- vmcoreinfo_append_str() should have suitable __attribute__s so that
the compiler can check its use.
- vmcoreinfo_max_size should have size_t.
- Use get_seconds() instead of xtime.tv_sec.
- Use init_uts_ns.name.release instead of UTS_RELEASE.Signed-off-by: Ken'ichi Ohmichi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch set frees the restriction that makedumpfile users should install a
vmlinux file (including the debugging information) into each system.makedumpfile command is the dump filtering feature for kdump. It creates a
small dumpfile by filtering unnecessary pages for the analysis. To
distinguish unnecessary pages, it needs a vmlinux file including the debugging
information. These days, the debugging package becomes a huge file, and it is
hard to install it into each system.To solve the problem, kdump developers discussed it at lkml and kexec-ml. As
the result, we reached the conclusion that necessary information for dump
filtering (called "vmcoreinfo") should be embedded into the first kernel file
and it should be accessed through /proc/vmcore during the second kernel.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)Dan Aloni created the patch set for the above implementation.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)And I updated it for multi architectures and memory models.
(http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)Signed-off-by: Dan Aloni
Signed-off-by: Ken'ichi Ohmichi
Signed-off-by: Bernhard Walle
Signed-off-by: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
do_sigaction() returns -ERESTARTNOINTR if signal_pending(). The comment says:
* If there might be a fatal signal pending on multiple
* threads, make sure we take it before changing the action.I think this is not needed. We should only worry about SIGNAL_GROUP_EXIT case,
bit it implies a pending SIGKILL which can't be cleared by do_sigaction.Kill this special case.
Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
de_thread() yields waiting for ->group_leader to be a zombie. This deadlocks
if an rt-prio execer shares the same cpu with ->group_leader. Change the code
to use ->group_exit_task/notify_count mechanics.This patch certainly uglifies the code, perhaps someone can suggest something
better.Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.
The locking used by get_random_bytes() can conflict with the
preempt_disable() and synchronize_sched() form of RCU. This patch changes
rcutorture's RNG to gather entropy from the new cpu_clock() interface
(relying on interrupts, preemption, daemons, and rcutorture's reader
thread's rock-bottom scheduling priority to provide useful entropy), and
also adds and EXPORT_SYMBOL_GPL() to make that interface available to GPLed
kernel modules such as rcutorture.Passes several hours of rcutorture.
[ego@in.ibm.com: Use raw_smp_processor_id() in rcu_random()]
Signed-off-by: Paul E. McKenney
Cc: Ingo Molnar
Signed-off-by: Gautham R Shenoy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
To avoid lock contention, we distribute the sched_timer calls across the
cpus so they do not trigger at the same instant. However, I used NR_CPUS,
which can cause needless grouping on small smp systems depending on your
kernel config. This patch converts to using num_possible_cpus() so we
spread it as evenly as possible on every machine.Briefly tested w/ NR_CPUS=255 and verified reduced contention.
Signed-off-by: John Stultz
Acked-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
- remove the no longer required __attribute__((weak)) of xtime_lock
- remove the following no longer used EXPORT_SYMBOL's:
- xtime
- xtime_lockSigned-off-by: Adrian Bunk
Cc: Thomas Gleixner
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The child was found on ->children list under tasklist_lock, it must have a
valid ->signal. __exit_signal() both removes the task from parent->children
and clears ->signal "atomically" under write_lock(tasklist).Remove unneeded checks.
Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cleanup. __group_complete_signal() wakes up ->group_exit_task twice. The
second wakeup's state includes TASK_UNINTERRUPTIBLE, which is not very
appropriate.Change the code to pass the "correct" argument to signal_wake_up() and kill
now unneeded wake_up_process().Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The "p->exit_signal == -1 && p->ptrace == 0" check and the comment are
bogus. We already did exactly the same check in eligible_child(), we did
not drop tasklist_lock since then, and both variables need
write_lock(tasklist) to be changed.Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Nowadays thread_group_empty() and next_thread() are simple list operations,
this optimization doesn't make sense: we are doing exactly same check one
line below.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
->siglock provides enough protection to iterate over the thread group.
Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Two threads, T1 and T2. T2 ptraces P, and P is not a child of ptracer's
thread group. P exits and goes to TASK_ZOMBIE.T1 does wait_task_zombie(P):
P->exit_state = TASK_DEAD;
...
read_unlock(&tasklist_lock);T2 does exit(), takes tasklist,
forget_original_parent() does
__ptrace_unlink(P) but doesn't
call do_notify_parent(P) because
p->exit_state == EXIT_DEAD.Now, P is not visible to our process: __ptrace_unlink() removed it from
->children. We should send notification to P->parent and release P if and
only if SIGCHLD is ignored.And we have 3 bugs:
1. P->parent does do_wait() and gets -ECHILD (P is on ->parent->children,
but its state is TASK_DEAD).2. // wait_task_zombie() continues
if (put_user(...)) {
// TODO: is this safe?
p->exit_state = EXIT_ZOMBIE;
return;
}we return without notification/release, task_struct leaked.
Solution: ignore -EFAULT and proceed. It is an application's bug if
we can't fill infop/stat_addr (in case of VM_FAULT_OOM we have much
more problems).3. // wait_task_zombie() continues
if (p->real_parent != p->parent) {
// Not taken, it was untraced'ed
...
}release_task(p);
we released the task which we shouldn't.
Solution: check ->real_parent != ->parent before, under tasklist_lock,
but use ptrace_unlink() instead of __ptrace_unlink() to check ->ptrace.This patch hopefully solves 2 and 3, the 1st bug will be fixed later, we need
some cleanups in forget_original_parent/reparent_thread.However, the first race is very unlikely and not critical, so I hope it makes
sense to fix 1 and 2 for now.4. Small cleanup: don't "restore" EXIT_ZOMBIE unless we know we are not going
to realease the child.Signed-off-by: Oleg Nesterov
Cc: Ingo Molnar
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A zombie must have a valid ->signal, we are going to release it and
__exit_signal() starts with BUG_ON(!sig).Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With or without this patch, multi-threaded init's are not fully supported,
but do_exit() is completely wrong. This becomes a real problem when we
support pid namespaces.1. do_exit() panics when the main thread of /sbin/init exits. It should not
until the whole thread group exits. Move the code below, under the
"if (group_dead)" check.Note: this means that forget_original_parent() can use an already dead
child_reaper()'s task_struct. This is OK for /sbin/init because- do_wait() from alive sub-thread still can reap a zombie, we iterate
over all sub-thread's ->children lists- do_notify_parent() will wakeup some alive sub-thread because it sends
the group-wide signalHowever, we should remove choose_new_parent()->BUG_ON(reaper->exit_state)
for this.2. We are playing games with ->nsproxy->pid_ns. This code is bogus today, and
it has to be changed anyway when we really support pid namespaces, just
remove it.Signed-off-by: Oleg Nesterov
Roland McGrath
Cc: "Eric W. Biederman"
Cc: Sukadev Bhattiprolu
Cc: Serge Hallyn
Cc: Cedric Le Goater
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With the recent changes, do_sigaction()->recalc_sigpending_and_wake() can
never clear TIF_SIGPENDING. Instead, it can set this flag and wake up the
thread without any reason. Harmless, but unneeded and wastes CPU.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It is a bit annoying that do_exit() takes ->pi_lock to set PF_EXITING. All
we need is to synchronize with lookup_pi_state() which saw this task
without PF_EXITING under ->pi_lock.Change do_exit() to use spin_unlock_wait().
Signed-off-by: Oleg Nesterov
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add two new functions for reading the kernel log buffer. The intention is for
them to be used by recovery/dump/debug code so the kernel log can be easily
retrieved/parsed in a crash scenario, but they are generic enough for other
people to dream up other fun uses.[akpm@linux-foundation.org: buncha fixes]
Signed-off-by: Mike Frysinger
Cc: Robin Getz
Cc: Greg Ungerer
Cc: Russell King
Cc: Paul Mundt
Acked-by: Tim Bird
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch contains the following cleanups:
- make the needlessly global variable rt_trace_on static
- remove the unused global function deadlock_trace_off()Signed-off-by: Adrian Bunk
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds the /sys/module//notes/ magic directory, which has a
file for each allocated SHT_NOTE section that appears in .ko. This
is the counterpart for each module of /sys/kernel/notes for vmlinux.
Reading this delivers the contents of the module's SHT_NOTE sections. This
lets userland easily glean any detailed information about that module's
build that was stored there at compile time (e.g. by ld --build-id).Signed-off-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Andy Gospodarek pointed out that because we return in the middle of the
free_irq() function, we never actually do call the IRQ handler that just
got deregistered. This should fix it, although I expect Andrew will want
to convert those 'return's to 'break'. That's a separate change though.Signed-off-by: David Woodhouse
Cc: Andy Gospodarek
Cc: Fernando Luis Vzquez Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds