25 Jul, 2008
12 commits
-
This patch is by far the most complex in the series. It adds a new syscall
paccept. This syscall differs from accept in that it adds (at the userlevel)
two additional parameters:- a signal mask
- a flags valueThe flags parameter can be used to set flag like SOCK_CLOEXEC. This is
imlpemented here as well. Some people argued that this is a property which
should be inherited from the file desriptor for the server but this is against
POSIX. Additionally, we really want the signal mask parameter as well
(similar to pselect, ppoll, etc). So an interface change in inevitable.The flag value is the same as for socket and socketpair. I think diverging
here will only create confusion. Similar to the filesystem interfaces where
the use of the O_* constants differs, it is acceptable here.The signal mask is handled as for pselect etc. The mask is temporarily
installed for the thread and removed before the call returns. I modeled the
code after pselect. If there is a problem it's likely also in pselect.For architectures which use socketcall I maintained this interface instead of
adding a system call. The symmetry shouldn't be broken.The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include
#include
#include
#include
#include
#include
#include
#include
#include#ifndef __NR_paccept
# ifdef __x86_64__
# define __NR_paccept 288
# elif defined __i386__
# define SYS_PACCEPT 18
# define USE_SOCKETCALL 1
# else
# error "need __NR_paccept"
# endif
#endif#ifdef USE_SOCKETCALL
# define paccept(fd, addr, addrlen, mask, flags) \
({ long args[6] = { \
(long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
syscall (__NR_socketcall, SYS_PACCEPT, args); })
#else
# define paccept(fd, addr, addrlen, mask, flags) \
syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
#endif#define PORT 57392
#define SOCK_CLOEXEC O_CLOEXEC
static pthread_barrier_t b;
static void *
tf (void *arg)
{
pthread_barrier_wait (&b);
int s = socket (AF_INET, SOCK_STREAM, 0);
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);pthread_barrier_wait (&b);
s = socket (AF_INET, SOCK_STREAM, 0);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);pthread_barrier_wait (&b);
sleep (2);
pthread_kill ((pthread_t) arg, SIGUSR1);return NULL;
}static void
handler (int s)
{
}int
main (void)
{
pthread_barrier_init (&b, NULL, 2);struct sockaddr_in sin;
pthread_t th;
if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0)
{
puts ("pthread_create failed");
return 1;
}int s = socket (AF_INET, SOCK_STREAM, 0);
int reuse = 1;
setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
bind (s, (struct sockaddr *) &sin, sizeof (sin));
listen (s, SOMAXCONN);pthread_barrier_wait (&b);
int s2 = paccept (s, NULL, 0, NULL, 0);
if (s2 < 0)
{
puts ("paccept(0) failed");
return 1;
}int coe = fcntl (s2, F_GETFD);
if (coe & FD_CLOEXEC)
{
puts ("paccept(0) set close-on-exec-flag");
return 1;
}
close (s2);pthread_barrier_wait (&b);
s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC);
if (s2 < 0)
{
puts ("paccept(SOCK_CLOEXEC) failed");
return 1;
}coe = fcntl (s2, F_GETFD);
if ((coe & FD_CLOEXEC) == 0)
{
puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (s2);pthread_barrier_wait (&b);
struct sigaction sa;
sa.sa_handler = handler;
sa.sa_flags = 0;
sigemptyset (&sa.sa_mask);
sigaction (SIGUSR1, &sa, NULL);sigset_t ss;
pthread_sigmask (SIG_SETMASK, NULL, &ss);
sigaddset (&ss, SIGUSR1);
pthread_sigmask (SIG_SETMASK, &ss, NULL);sigdelset (&ss, SIGUSR1);
alarm (4);
pthread_barrier_wait (&b);errno = 0 ;
s2 = paccept (s, NULL, 0, &ss, 0);
if (s2 != -1 || errno != EINTR)
{
puts ("paccept did not fail with EINTR");
return 1;
}close (s);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[akpm@linux-foundation.org: make it compile]
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Cc:
Cc: "David S. Miller"
Cc: Roland McGrath
Cc: Kyle McMartin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
set_type returns an int indicating success or failure, but up to now
setup_irq ignores that.In my case this resulted in a machine hang:
gpio-keys requested IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING, but
arm/ns9xxx can only trigger on one direction so set_type didn't touch
the configuration which happens do default to a level sensitiveness and
returned -EINVAL. setup_irq ignored that and unmasked the irq. This
resulted in an endless triggering of the gpio-key interrupt service
routine which effectively killed the machine.With this patch applied setup_irq propagates the error to the caller.
Note that before in the case
chip && !chip->set_type && !chip->name
a NULL pointer was feed to printk. This is fixed, too.
Signed-off-by: Uwe Kleine-König
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix try_to_freeze_tasks()'s use of do_div() on an s64 by making
elapsed_csecs64 a u64 instead and dividing that.Possibly this should be guarded lest the interval calculation turn up
negative, but the possible negativity of the result of the division is
cast away anyway.This was introduced by patch 438e2ce68dfd4af4cfcec2f873564fb921db4bb5.
Signed-off-by: David Howells
Acked-by: "Rafael J. Wysocki"
Acked-by: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
schedule sysrq poweroff on boot cpu.
sysrq poweroff needs to disable nonboot cpus, and we need to run this on boot
cpu to avoid any recursion. http://bugzilla.kernel.org/show_bug.cgi?id=10897[kosaki.motohiro@jp.fujitsu.com: build fix]
Signed-off-by: Zhang Rui
Tested-by: Rus
Signed-off-by: Rafael J. Wysocki
Acked-by: Pavel Machek
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This interface allows adding a job on a specific cpu.
Although a work struct on a cpu will be scheduled to other cpu if the cpu
dies, there is a recursion if a work task tries to offline the cpu it's
running on. we need to schedule the task to a specific cpu in this case.
http://bugzilla.kernel.org/show_bug.cgi?id=10897[oleg@tv-sign.ru: cleanups]
Signed-off-by: Zhang Rui
Tested-by: Rus
Signed-off-by: Rafael J. Wysocki
Acked-by: Pavel Machek
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch simplifies the memory bitmap manipulations.
- remove the member size in struct bm_block
It is not necessary for struct bm_block to have the number of bit chunks that
can be calculated by using end_pfn and start_pfn.- use find_next_bit() for memory_bm_next_pfn
No need to invent the bitmap library only for the memory bitmap.
Signed-off-by: Akinobu Mita
Signed-off-by: Rafael J. Wysocki
Acked-by: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Boot-time test for system suspend states (STR or standby). The generic
RTC framework triggers wakeup alarms, which are used to exit those states.- Measures some aspects of suspend time ... this uses "jiffies" until
someone converts it to use a timebase that works properly even while
timer IRQs are disabled.- Triggered by a command line parameter. By default nothing even
vaguely troublesome will happen, but "test_suspend=mem" will give
you a brief STR test during system boot. (Or you may need to use
"test_suspend=standby" instead, if your hardware needs that.)This isn't without problems. It fires early enough during boot that for
example both PCMCIA and MMC stacks have misbehaved. The workaround in
those cases was to boot without such media cards inserted.[matthltc@us.ibm.com: fix compile failure in boot time suspend selftest]
Signed-off-by: David Brownell
Cc: Ingo Molnar
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Signed-off-by: Matt Helsley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Tell the user about the no_console_suspend option, so that we don't have to
tell each bug reporter personally.[akpm@linux-foundation.org: clarify the text a little]
Signed-off-by: Pavel Machek
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
To date, we've tried hard to confine filesystem support for capabilities
to the security modules. This has left a lot of the code in
kernel/capability.c in a state where it looks like it supports something
that filesystem support for capabilities actually suppresses when the LSM
security/commmoncap.c code runs. What is left is a lot of code that uses
sub-optimal locking in the main kernelWith this change we refactor the main kernel code and make it explicit
which locks are needed and that the only remaining kernel races in this
area are associated with non-filesystem capability code.Signed-off-by: Andrew G. Morgan
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add basic support for more than one hstate in hugetlbfs. This is the key
to supporting multiple hugetlbfs page sizes at once.- Rather than a single hstate, we now have an array, with an iterator
- default_hstate continues to be the struct hstate which we use by default
- Add functions for architectures to register new hstates[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke
Acked-by: Nishanth Aravamudan
Signed-off-by: Andi Kleen
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in
a similar manner to the reservations taken for MAP_SHARED mappings. The
reserve count is accounted both globally and on a per-VMA basis for
private mappings. This guarantees that a process that successfully calls
mmap() will successfully fault all pages in the future unless fork() is
called.The characteristics of private mappings of hugetlbfs files behaviour after
this patch are;1. The process calling mmap() is guaranteed to succeed all future faults until
it forks().
2. On fork(), the parent may die due to SIGKILL on writes to the private
mapping if enough pages are not available for the COW. For reasonably
reliable behaviour in the face of a small huge page pool, children of
hugepage-aware processes should not reference the mappings; such as
might occur when fork()ing to exec().
3. On fork(), the child VMAs inherit no reserves. Reads on pages already
faulted by the parent will succeed. Successful writes will depend on enough
huge pages being free in the pool.
4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper
and at fault time otherwise.Before this patch, all reads or writes in the child potentially needs page
allocations that can later lead to the death of the parent. This applies
to reads and writes of uninstantiated pages as well as COW. After the
patch it is only a write to an instantiated page that causes problems.Signed-off-by: Mel Gorman
Acked-by: Adam Litke
Cc: Andy Whitcroft
Cc: William Lee Irwin III
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds proper extern declarations for five variables in
include/linux/vmstat.hSigned-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Jul, 2008
6 commits
-
* 'x86/auditsc' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
i386 syscall audit fast-path
x86_64 ia32 syscall audit fast-path
x86_64 syscall audit fast-path
x86_64: remove bogus optimization in sysret_signal -
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: hrtick_enabled() should use cpu_active()
sched, x86: clean up hrtick implementation
sched: fix build error, provide partition_sched_domains() unconditionally
sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
cpu hotplug: Make cpu_active_map synchronization dependency clear
cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
sched: rework of "prioritize non-migratable tasks over migratable ones"
sched: reduce stack size in isolated_cpu_setup()
Revert parts of "ftrace: do not trace scheduler functions"Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction). -
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
-
…ernel/git/tip/linux-2.6-tip
* 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softlockup: fix invalid proc_handler for softlockup_panic
softlockup: fix watchdog task wakeup frequency
softlockup: fix watchdog task wakeup frequency
softlockup: show irqtrace
softlockup: print a module list on being stuck
softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
softlockup: fix softlockup_thresh fix
softlockup: fix softlockup_thresh unaligned access and disable detection at runtime
softlockup: allow panic on lockup -
This adds a fast path for 64-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
This path does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.Signed-off-by: Roland McGrath
-
Since 15a647eba94c3da27ccc666bea72e7cca06b2d19 set_irq_wake returned -ENXIO
if another device had it already enabled. Zero is the right value to
return in this case. Moreover the change to desc->status was not reverted
if desc->chip->set_wake returned an error.Signed-off-by: Uwe Kleine-König
Acked-by: David Brownell
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Russell King
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
23 Jul, 2008
4 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
remove CONFIG_KMOD from core kernel code
remove CONFIG_KMOD from lib
remove CONFIG_KMOD from sparc64
rework try_then_request_module to do less in non-modular kernels
remove mention of CONFIG_KMOD from documentation
make CONFIG_KMOD invisible
modules: Take a shortcut for checking if an address is in a module
module: turn longs into ints for module sizes
Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module: reorder struct module to save space on 64 bit builds
module: generic each_symbol iterator function
module: don't use stop_machine for waiting rmmod -
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (79 commits)
arm: bus_id -> dev_name() and dev_set_name() conversions
sparc64: fix up bus_id changes in sparc core code
3c59x: handle pci_name() being const
MTD: handle pci_name() being const
HP iLO driver
sysdev: Convert the x86 mce tolerant sysdev attribute to generic attribute
sysdev: Add utility functions for simple int/ulong variable sysdev attributes
sysdev: Pass the attribute to the low level sysdev show/store function
driver core: Suppress sysfs warnings for device_rename().
kobject: Transmit return value of call_usermodehelper() to caller
sysfs-rules.txt: reword API stability statement
debugfs: Implement debugfs_remove_recursive()
HOWTO: change email addresses of James in HOWTO
always enable FW_LOADER unless EMBEDDED=y
uio-howto.tmpl: use unique output names
uio-howto.tmpl: use standard copyright/legal markings
sysfs: don't call notify_change
sysdev: fix debugging statements in registration code.
kobject: should use kobject_put() in kset-example
kobject: reorder kobject to save space on 64 bit builds
... -
Add missing cond_syscall() entry for compat_sys_epoll_pwait.
Signed-off-by: Atsushi Nemoto
Cc: Davide Libenzi
Cc: [2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix wrong domain attr updates, or we will always update the first sched
domain attr.Signed-off-by: Miao Xie
Cc: Hidetoshi Seto
Cc: Paul Jackson
Cc: Nick Piggin
Cc: Ingo Molnar
Cc: [2.6.26.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Jul, 2008
7 commits
-
Always compile request_module when the kernel allows modules.
Signed-off-by: Johannes Berg
Signed-off-by: Rusty Russell -
This patch keeps track of the boundaries of module allocation, in
order to speed up module_text_address().Inspired by Arjan's version, which required arch-specific defines:
Various pieces of the kernel (lockdep, latencytop, etc) tend
to store backtraces, sometimes at a relatively high
frequency. In itself this isn't a big performance deal (after
all you're using diagnostics features), but there have been
some complaints from people who have over 100 modules loaded
that this is a tad too slow.This is due to the new backtracer code which looks at every
slot on the stack to see if it's a kernel/module text address,
so that's 1024 slots. 1024 times 100 modules... that's a lot
of list walking.Signed-off-by: Rusty Russell
-
This shrinks module.o and each *.ko file.
And finally, structure members which hold length of module
code (four such members there) and count of symbols
are converted from longs to ints.We cannot possibly have a module where 32 bits won't
be enough to hold such counts.For one, module loading checks module size for sanity
before loading, so such insanely big module will fail
that test first.Signed-off-by: Denys Vlasenko
Signed-off-by: Rusty Russell -
module.c and module.h conatains code for finding
exported symbols which are declared with EXPORT_UNUSED_SYMBOL,
and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set
and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway
(because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then).This patch adds required #ifdefs.
Signed-off-by: Denys Vlasenko
Signed-off-by: Rusty Russell -
Introduce an each_symbol() iterator to avoid duplicating the knowledge
about the 5 different sections containing symbols. Currently only
used by find_symbol(), but will be used by symbol_put_addr() too.(Includes NULL ptr deref fix by Jiri Kosina )
Signed-off-by: Rusty Russell
Cc: Jiri Kosina -
rmmod has a little-used "-w" option, meaning that instead of failing if the
module is in use, it should block until the module becomes unused.In this case, we don't need to use stop_machine: Max Krasnyansky
indicated that would be useful for SystemTap which loads/unloads new
modules frequently.Cc: Max Krasnyansky
Signed-off-by: Rusty Russell -
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32Signed-off-by: Andi Kleen
Signed-off-by: Greg Kroah-Hartman
20 Jul, 2008
3 commits
-
Peter pointed out that hrtick_enabled() should use cpu_active().
Signed-off-by: Ingo Molnar
-
random uvesafb failures were reported against Gentoo:
http://bugs.gentoo.org/show_bug.cgi?id=222799
and Mihai Moldovan bisected it back to:
> 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
> commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> Author: Peter Zijlstra
> Date: Fri Jan 25 21:08:29 2008 +0100
>
> sched: high-res preemption tickLinus suspected it to be hrtick + vm86 interaction and observed:
> Btw, Peter, Ingo: I think that commit is doing bad things. They aren't
> _incorrect_ per se, but they are definitely bad.
>
> Why?
>
> Using random _TIF_WORK_MASK flags is really impolite for doing
> "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S
> special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of
> vm86 mode unnecessarily.
>
> See the "work_notifysig_v86" label, and how it does that
> "save_v86_state()" thing etc etc.Right, I never liked having to fiddle with those TIF flags. Initially I
needed it because the hrtimer base lock could not nest in the rq lock.
That however is fixed these days.Currently the only reason left to fiddle with the TIF flags is remote
wakeups. We cannot program a remote cpu's hrtimer. I've been thinking
about using the new and improved IPI function call stuff to implement
hrtimer_start_on().However that does require that smp_call_function_single(.wait=0) works
from interrupt context - /me looks at the latest series from Jens - Yes
that does seem to be supported, good.Here's a stab at cleaning this stuff up ...
Mihai reported test success as well.
Signed-off-by: Peter Zijlstra
Tested-by: Mihai Moldovan
Cc: Michal Januszewski
Cc: Antonino Daplas
Signed-off-by: Ingo Molnar
19 Jul, 2008
4 commits
-
* Optimize various places where a pointer to the cpumask_of_cpu value
will result in reducing stack pressure.Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar -
* This patch replaces the dangerous lvalue version of cpumask_of_cpu
with new cpumask_of_cpu_ptr macros. These are patterned after the
node_to_cpumask_ptr macros.In general terms, if there is a cpumask_of_cpu_map[] then a pointer to
the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map
is provided when there is a large NR_CPUS count, reducing
greatly the amount of code generated and stack space used for
cpumask_of_cpu(). The pointer to the cpumask_t value is needed for
calling set_cpus_allowed_ptr() to reduce the amount of stack space
needed to pass the cpumask_t value.If there isn't a cpumask_of_cpu_map[], then a temporary variable is
declared and filled in with value from cpumask_of_cpu(cpu) as well as
a pointer variable pointing to this temporary variable. Afterwards,
the pointer is used to reference the cpumask value. The compiler
will optimize out the extra dereference through the pointer as well
as the stack space used for the pointer, resulting in identical code.A good example of the orthogonal usages is in net/sunrpc/svc.c:
case SVC_POOL_PERCPU:
{
unsigned int cpu = m->pool_to[pidx];
cpumask_of_cpu_ptr(cpumask, cpu);*oldmask = current->cpus_allowed;
set_cpus_allowed_ptr(current, cpumask);
return 1;
}
case SVC_POOL_PERNODE:
{
unsigned int node = m->pool_to[pidx];
node_to_cpumask_ptr(nodecpumask, node);*oldmask = current->cpus_allowed;
set_cpus_allowed_ptr(current, nodecpumask);
return 1;
}Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar -
Conflicts:
drivers/acpi/processor_throttling.c
Signed-off-by: Ingo Molnar
-
The type of softlockup_panic is int, but the proc_handler is
proc_doulongvec_minmax(). This handler is for unsigned long.This handler should be proc_dointvec_minmax().
Signed-off-by: Hiroshi Shimamoto
Signed-off-by: Ingo Molnar
18 Jul, 2008
4 commits
-
Fix inc_rt_tasks() to not declare variable 'rq' if it's not needed. It is
declared if CONFIG_SMP or CONFIG_RT_GROUP_SCHED, but only used if CONFIG_SMP.This is a consequence of patch 1f11eb6a8bc92536d9e93ead48fa3ffbd1478571 plus
patch 1100ac91b6af02d8639d518fad5b434b1bf44ed6.Signed-off-by: David Howells
Signed-off-by: Ingo Molnar -
This goes on top of the cpu_active_map (take 2) patch.
Currently we depend on the stop_machine to provide nescessesary
synchronization for the cpu_active_map updates.
As Dmitry Adamushko pointed this is fragile and is not much clearer
than the previous scheme. In other words we do not want to depend on
the internal stop machine operation here.
So make the synchronization rules clear by doing synchronize_sched()
after clearing out cpu active bit.Tested on quad-Core2 with:
while true; do
for i in 1 2 3; do
echo 0 > /sys/devices/system/cpu/cpu$i/online
done
for i in 1 2 3; do
echo 1 > /sys/devices/system/cpu/cpu$i/online
done
done
and
stress -c 200No lockdep, preempt or other complaints.
Signed-off-by: Max Krasnyansky
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar -
This is based on Linus' idea of creating cpu_active_map that prevents
scheduler load balancer from migrating tasks to the cpu that is going
down.It allows us to simplify domain management code and avoid unecessary
domain rebuilds during cpu hotplug event handling.Please ignore the cpusets part for now. It needs some more work in order
to avoid crazy lock nesting. Although I did simplfy and unify domain
reinitialization logic. We now simply call partition_sched_domains() in
all the cases. This means that we're using exact same code paths as in
cpusets case and hence the test below cover cpusets too.
Cpuset changes to make rebuild_sched_domains() callable from various
contexts are in the separate patch (right next after this one).This not only boots but also easily handles
while true; do make clean; make -j 8; done
and
while true; do on-off-cpu 1; done
at the same time.
(on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
this on right now in gnome-terminal and things are moving just fine.Also this is running with most of the debug features enabled (lockdep,
mutex, etc) no BUG_ONs or lockdep complaints so far.I believe I addressed all of the Dmitry's comments for original Linus'
version. I changed both fair and rt balancer to mask out non-active cpus.
And replaced cpu_is_offline() with !cpu_active() in the main scheduler
code where it made sense (to me).Signed-off-by: Max Krasnyanskiy
Acked-by: Linus Torvalds
Acked-by: Peter Zijlstra
Acked-by: Gregory Haskins
Cc: dmitry.adamushko@gmail.com
Cc: pj@sgi.com
Signed-off-by: Ingo Molnar -
(1) handle in a generic way all cases when a newly woken-up task is
not migratable (not just a corner case when "rt_se->nr_cpus_allowed ==
1")(2) if current is to be preempted, then make sure "p" will be picked
up by pick_next_task_rt().
i.e. move task's group at the head of its list as well.currently, it's not a case for the group-scheduling case as described
here: http://www.ussg.iu.edu/hypermail/linux/kernel/0807.0/0134.htmlSigned-off-by: Dmitry Adamushko
Cc: Steven Rostedt
Cc: Gregory Haskins
Signed-off-by: Ingo Molnar