09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change the
generated code (from gcc's point of view we replaced unsigned int with
a typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

01 Oct, 2005

1 commit

  • include/asm/hw_irq.h:70: warning: `struct hw_interrupt_type' declared inside parameter list
    include/asm/hw_irq.h:70: warning: its scope is only this definition or declaration, which is probably not what you want

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

22 Sep, 2005

1 commit

Follow-up to 4732efbeb997189d9f9b04708dc26bf8613ed721 - uml must just reuse
the backing architecture's support as-is. A micro-fixup is needed for the
included file, which won't affect i386 behaviour at all.

I've not tested compilation on x86_64, only on x86, but the code is almost the
same except for the culprit test, so everything should be OK on x86_64 too.

    Cc: Jakub Jelinek
    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     

13 Sep, 2005

4 commits

  • As written in Documentation/feature-removal-schedule.txt, remove the
    io_remap_page_range() kernel API.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Original patch from Bertro Simul

    This is probably still not quite correct, but seems to be
    the best solution so far.

    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     
As noted by matz@suse.de

The problem is that on i386 the syscallN
macro is defined like so:

    long __res; \
    __asm__ volatile ("int $0x80" \
    : "=a" (__res) \
    : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
    "d" ((long)(arg3)),"S" ((long)(arg4)),"D" ((long)(arg5))); \

    If one of the arguments (in the _llseek syscall it's the arg4) is a pointer
    which the syscall is expected to write to (to the memory pointed to by this
    ptr), then this side-effect is not captured in the asm.

If anyone uses this macro to define their own version of the syscall
(sometimes necessary when not using glibc) and it's inlined, then GCC
doesn't know that this asm writes to "*dest", when called like so, for instance:

    out = 1;
    llseek (fd, bla, blubb, &out, trara)
    use (out);

    Here nobody tells GCC that "out" actually is written to (just a pointer to it
    is passed to the asm). Hence GCC might (and in the above bug did)
    copy-propagate "1" into the second use of "out".

    The easiest solution would be to add a "memory" clobber to the definition
    of this syscall macro. As this is a syscall, it shouldn't inhibit too many
    optimizations.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Since this is shared code I had to implement it for i386 too

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

11 Sep, 2005

2 commits

  • "extern inline" doesn't make much sense.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch (written by me and also containing many suggestions of Arjan van
    de Ven) does a major cleanup of the spinlock code. It does the following
    things:

    - consolidates and enhances the spinlock/rwlock debugging code

    - simplifies the asm/spinlock.h files

    - encapsulates the raw spinlock type and moves generic spinlock
    features (such as ->break_lock) into the generic code.

    - cleans up the spinlock code hierarchy to get rid of the spaghetti.

    Most notably there's now only a single variant of the debugging code,
    located in lib/spinlock_debug.c. (previously we had one SMP debugging
    variant per architecture, plus a separate generic one for UP builds)

Also, I've enhanced the rwlock debugging facility: it will now track
write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which works for both soft and hard
spin/rwlock lockups.

    The arch-level include files now only contain the minimally necessary
    subset of the spinlock code - all the rest that can be generalized now
    lives in the generic headers:

    include/asm-i386/spinlock_types.h | 16
    include/asm-x86_64/spinlock_types.h | 16

    I have also split up the various spinlock variants into separate files,
    making it easier to see which does what. The new layout is:

SMP                          | UP
-----------------------------|-----------------------------------
asm/spinlock_types_smp.h     | linux/spinlock_types_up.h
linux/spinlock_types.h       | linux/spinlock_types.h
asm/spinlock_smp.h           | linux/spinlock_up.h
linux/spinlock_api_smp.h     | linux/spinlock_api_up.h
linux/spinlock.h             | linux/spinlock.h

    /*
    * here's the role of the various spinlock/rwlock related include files:
    *
    * on SMP builds:
    *
    * asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
    * initializers
    *
    * linux/spinlock_types.h:
    * defines the generic type and initializers
    *
    * asm/spinlock.h: contains the __raw_spin_*()/etc. lowlevel
    * implementations, mostly inline assembly code
    *
    * (also included on UP-debug builds:)
    *
    * linux/spinlock_api_smp.h:
    * contains the prototypes for the _spin_*() APIs.
    *
    * linux/spinlock.h: builds the final spin_*() APIs.
    *
    * on UP builds:
    *
* linux/spinlock_types_up.h:
    * contains the generic, simplified UP spinlock type.
    * (which is an empty structure on non-debug builds)
    *
    * linux/spinlock_types.h:
    * defines the generic type and initializers
    *
    * linux/spinlock_up.h:
    * contains the __raw_spin_*()/etc. version of UP
    * builds. (which are NOPs on non-debug, non-preempt
    * builds)
    *
    * (included on UP-non-debug builds:)
    *
    * linux/spinlock_api_up.h:
    * builds the _spin_*() APIs.
    *
    * linux/spinlock.h: builds the final spin_*() APIs.
    */

    All SMP and UP architectures are converted by this patch.

arm, i386, ia64, ppc, ppc64, s390/s390x and x64 were build-tested via
cross-compilers. m32r, mips, sh and sparc have not been tested yet, but
should be mostly fine.

    From: Grant Grundler

    Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
    Builds 32-bit SMP kernel (not booted or tested). I did not try to build
    non-SMP kernels. That should be trivial to fix up later if necessary.

    I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids
    some ugly nesting of linux/*.h and asm/*.h files. Those particular locks
    are well tested and contained entirely inside arch specific code. I do NOT
    expect any new issues to arise with them.

    If someone does ever need to use debug/metrics with them, then they will
    need to unravel this hairball between spinlocks, atomic ops, and bit ops
    that exist only because parisc has exactly one atomic instruction: LDCW
    (load and clear word).

    From: "Luck, Tony"

    ia64 fix

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Grant Grundler
    Cc: Matthew Wilcox
    Signed-off-by: Hirokazu Takata
    Signed-off-by: Mikael Pettersson
    Signed-off-by: Benoit Boissinot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

10 Sep, 2005

4 commits

  • Linus Torvalds
     
  • I have a system (Biostar IDEQ210M mini-pc with a VIA chipset) which will
    not reboot unless a keyboard is plugged in to it. I have tried all
    combinations of the kernel "reboot=x,y" flags to no avail. Rebooting by
    any method will leave the system in a wedged state (at the "Restarting
    system" message).

    I finally tracked the problem down to the machine's refusal to fully reboot
    unless the keyboard controller status register had bit 2 set. This is the
    "System flag" which when set, indicates successful completion of the
    keyboard controller self-test (Basic Assurance Test, BAT).

    I suppose that something is trying to protect against sporadic reboots
    unless the keyboard controller is in a good state (a keyboard is present),
    but I need this machine to be headless.

    I found that setting the system flag (via the command byte) before giving
    the "pulse reset line" command will allow the reboot to proceed. The patch
    is simple, and I think it should be fine for everybody whether they have
    this type of machine or not. This affects the "hard" reboot (as done when
    the kernel boot flags "reboot=c,h" are used).

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Truxton Fulton
     
  • Fix a typo involving CONFIG_ACPI_SRAT.

    Signed-off-by: Magnus Damm
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Magnus Damm
     
Building asm-offsets.h has been moved to a separate Kbuild file
located in the top-level directory. This allows us to share the
functionality across the architectures.

The old rules in architecture-specific Makefiles will die
in subsequent patches.

Furthermore, the usual kbuild dependency tracking is now used
when deciding to rebuild asm-offsets.s, so we no longer risk
a failed rebuild caused by asm-offsets.c dependencies being touched.

With this common rule-set we now force the same name across
all architectures. Following patches will fix the rest.

    Signed-off-by: Sam Ravnborg

    Sam Ravnborg
     

08 Sep, 2005

10 commits

  • Len Brown
     
  • This patch gathers all the struct flock64 definitions (and the operations),
    puts them under !CONFIG_64BIT and cleans up the arch files.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • This patch just gathers together all the struct flock definitions except
    xtensa into asm-generic/fcntl.h.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • This patch puts the most popular of each fcntl operation/flag into
    asm-generic/fcntl.h and cleans up the arch files.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • This patch puts the most popular of each open flag into asm-generic/fcntl.h
    and cleans up the arch files.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • This set of patches creates asm-generic/fcntl.h and consolidates as much as
    possible from the asm-*/fcntl.h files into it.

    This patch just gathers all the identical bits of the asm-*/fcntl.h files into
    asm-generic/fcntl.h.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Yoichi Yuasa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • Remove the deprecated (and unused) verify_area() from various uaccess.h
    headers.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • unused and useless..

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
The size of the auxiliary vector is fixed at 42 in linux/sched.h, but it
isn't very obvious when looking at linux/elf.h. This patch adds
AT_VECTOR_SIZE so that we can change it if necessary when a new vector is
added.

    Because of include file ordering problems, doing this necessitated the
    extraction of the AT_* symbols into a standalone header file.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. J. Lu
     
  • ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter
    (which at least on UP usually means an immediate context switch to one of
    the waiter threads). This waiter wakes up and after a few instructions it
    attempts to acquire the cv internal lock, but that lock is still held by
    the thread calling pthread_cond_signal. So it goes to sleep and eventually
    the signalling thread is scheduled in, unlocks the internal lock and wakes
    the waiter again.

    Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
    to avoid this performance issue, but it was removed when locks were
    redesigned to the 3 state scheme (unlocked, locked uncontended, locked
    contended).

The following scenario shows why simply using FUTEX_REQUEUE in
pthread_cond_signal together with using lll_mutex_unlock_force in place of
lll_mutex_unlock is not enough, and probably why it was disabled at
that time:

    The number is value in cv->__data.__lock.
    thr1 thr2 thr3
    0 pthread_cond_wait
    1 lll_mutex_lock (cv->__data.__lock)
    0 lll_mutex_unlock (cv->__data.__lock)
    0 lll_futex_wait (&cv->__data.__futex, futexval)
    0 pthread_cond_signal
    1 lll_mutex_lock (cv->__data.__lock)
    1 pthread_cond_signal
    2 lll_mutex_lock (cv->__data.__lock)
    2 lll_futex_wait (&cv->__data.__lock, 2)
    2 lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
    # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
    2 lll_mutex_unlock_force (cv->__data.__lock)
    0 cv->__data.__lock = 0
    0 lll_futex_wake (&cv->__data.__lock, 1)
    1 lll_mutex_lock (cv->__data.__lock)
    0 lll_mutex_unlock (cv->__data.__lock)
    # Here, lll_mutex_unlock doesn't know there are threads waiting
    # on the internal cv's lock

    Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
    but it will cost us not one, but 2 extra syscalls and, what's worse, one of
    these extra syscalls will be done for every single waiting loop in
    pthread_cond_*wait.

    We would need to use lll_mutex_unlock_force in pthread_cond_signal after
    requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.

    Another alternative is to do the unlocking pthread_cond_signal needs to do
    (the lock can't be unlocked before lll_futex_wake, as that is racy) in the
    kernel.

    I have implemented both variants, futex-requeue-glibc.patch is the first
    one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel.
    The kernel interface allows userland to specify how exactly an unlocking
    operation should look like (some atomic arithmetic operation with optional
    constant argument and comparison of the previous futex value with another
    constant).

    It has been implemented just for ppc*, x86_64 and i?86, for other
    architectures I'm including just a stub header which can be used as a
    starting point by maintainers to write support for their arches and ATM
    will just return -ENOSYS for FUTEX_WAKE_OP. The requeue patch has been
    (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running
    32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL.

    With the following benchmark on UP x86-64 I get:

for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
for j in 1 2; do ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
    time elf/ld.so --library-path .:nptl-orig /tmp/bench
    real 0m0.655s user 0m0.253s sys 0m0.403s
    real 0m0.657s user 0m0.269s sys 0m0.388s
    time elf/ld.so --library-path .:nptl-requeue /tmp/bench
    real 0m0.496s user 0m0.225s sys 0m0.271s
    real 0m0.531s user 0m0.242s sys 0m0.288s
    time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
    real 0m0.380s user 0m0.176s sys 0m0.204s
    real 0m0.382s user 0m0.175s sys 0m0.207s

    The benchmark is at:
    http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt
    Older futex-requeue-glibc.patch version is at:
    http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt
    Older futex-wake_op-glibc.patch version is at:
    http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt
    Will post a new version (just x86-64 fixes so that the patch
    applies against pthread_cond_signal.S) to libc-hacker ml soon.

    Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
    testcase that will not test the atomicity of the operation, but at least
    check if the threads that should have been woken up are woken up and
    whether the arithmetic operation in the kernel gave the expected results.

    Acked-by: Ingo Molnar
    Cc: Ulrich Drepper
    Cc: Jamie Lokier
    Cc: Rusty Russell
    Signed-off-by: Yoichi Yuasa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jakub Jelinek
     

05 Sep, 2005

17 commits

Jeff Dike,
Paolo 'Blaisorblade' Giarrusso,
Bodo Stroesser

    Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL
    except that the kernel does not execute the requested syscall; this is useful
    to improve performance for virtual environments, like UML, which want to run
    the syscall on their own.

    In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry
    and on exit, and each time you also have two context switches; with SYSEMU you
    avoid the 2nd stop and so save two context switches per syscall.

    Also, some architectures don't have support in the host for changing the
    syscall number via ptrace(), which is currently needed to skip syscall
    execution (UML turns any syscall into getpid() to avoid it being executed on
    the host). Fixing that is hard, while SYSEMU is easier to implement.

    * This version of the patch includes some suggestions of Jeff Dike to avoid
    adding any instructions to the syscall fast path, plus some other little
    changes, by myself, to make it work even when the syscall is executed with
    SYSENTER (but I'm unsure about them). It has been widely tested for quite a
    lot of time.

* Various fixes were included to handle the switches between the
various states, i.e. when, for instance, a syscall entry is traced with one of
PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit.
    Basically, this is done by remembering which one of them was used even after
    the call to ptrace_notify().

    * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP
    to make do_syscall_trace() notice that the current syscall was started with
    SYSEMU on entry, so that no notification ought to be done in the exit path;
    this is a bit of a hack, so this problem is solved in another way in next
    patches.

    * Also, the effects of the patch:
    "Ptrace - i386: fix Syscall Audit interaction with singlestep"
    are cancelled; they are restored back in the last patch of this series.

    Detailed descriptions of the patches doing this kind of processing follow (but
    I've already summed everything up).

    * Fix behaviour when changing interception kind #1.

In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag
only after doing the debugger notification; but the debugger might have
changed the status of this flag because it continued execution with
PTRACE_SYSCALL, so this is wrong. This patch fixes it by saving the flag
status before calling ptrace_notify().

    * Fix behaviour when changing interception kind #2:
    avoid intercepting syscall on return when using SYSCALL again.

    A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL
    crashes.

The problem is in arch/i386/kernel/entry.S. The current SYSEMU patch
inhibits the syscall handler from being called, but does not prevent
do_syscall_trace() from being called after this for syscall completion
interception.

    The appended patch fixes this. It reuses the flag TIF_SYSCALL_EMU to
    remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since
    the flag is unused in the depicted situation.

    * Fix behaviour when changing interception kind #3:
    avoid intercepting syscall on return when using SINGLESTEP.

When testing 2.6.9 with the skas3.v6 patch and my latest patch, I had
problems with singlestepping on UML in SKAS with SYSEMU. It looped,
receiving SIGTRAPs without moving forward; the EIP of the traced process
was the same for all SIGTRAPs.

    What's missing is to handle switching from PTRACE_SYSCALL_EMU to
    PTRACE_SINGLESTEP in a way very similar to what is done for the change from
    PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE.

I.e., after calling ptrace(PTRACE_SYSEMU), on the return path the debugger is
notified and then wakes up the process; the syscall is executed (or skipped,
when do_syscall_trace() returns 0, i.e. when using PTRACE_SYSEMU), and
do_syscall_trace() is called again. Since we are on the return path of a
SYSEMU'd syscall, if the wake-up is performed through ptrace(PTRACE_SYSCALL),
we must still avoid notifying the parent of the syscall exit. Now this
behaviour is extended even to resuming with PTRACE_SINGLESTEP.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laurent Vivier
     
The timers lack .suspend/.resume methods. Because of this, jiffies got a
big compensation after an S3 resume, and then the softlockup watchdog
reported an oops. This occurred with HPET enabled, but it's also possible
with other timers.

    Signed-off-by: Shaohua Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • for_each_cpu walks through all processors in cpu_possible_map, which is
    defined as cpu_callout_map on i386 and isn't initialised until all
    processors have been booted. This breaks things which do for_each_cpu
    iterations early during boot. So, define cpu_possible_map as a bitmap with
NR_CPUS bits populated. This was triggered by a patch I'm working on which
does alloc_percpu before bringing up secondary processors.

    From: Alexander Nyberg

    i386-boottime-for_each_cpu-broken.patch
    i386-boottime-for_each_cpu-broken-fix.patch

    The SMP version of __alloc_percpu checks the cpu_possible_map before
    allocating memory for a certain cpu. With the above patches the BSP cpuid
    is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor
    machines (as soon as someone tries to dereference something allocated via
    __alloc_percpu, which in fact is never allocated since the cpu is not set
    in cpu_possible_map).

    Signed-off-by: Zwane Mwaikambo
    Signed-off-by: Alexander Nyberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zwane Mwaikambo
     
  • Add a clone operation for pgd updates.

    This helps complete the encapsulation of updates to page tables (or pages
    about to become page tables) into accessor functions rather than using
    memcpy() to duplicate them. This is both generally good for consistency
    and also necessary for running in a hypervisor which requires explicit
    updates to page table entries.

    The new function is:

    clone_pgd_range(pgd_t *dst, pgd_t *src, int count);

dst - pointer to a pgd range anywhere on a pgd page
src - ditto
count - the number of pgds to copy.

    dst and src can be on the same page, but the range must not overlap
    and must not cross a page boundary.

Note that I omitted using this call to copy pgd entries into the
software-suspend page root, since this is not technically a live paging
structure; rather, it is used on resume from suspend. CC'ing Pavel in case
he has any feedback on this.

    Thanks to Chris Wright for noticing that this could be more optimal in
    PAE compiles by eliminating the memset.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
This patch adds a notification to die_nmi that the system is about to
be taken down. If the notification is handled with a NOTIFY_STOP return, the
system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This gives debug code a chance to a) catch watchdog timeouts and b) possibly
allow the system to continue, recognizing that the timeout may be due to
debugger activities such as single-stepping, which is usually done with
the "other" cpus held.

    Signed-off-by: George Anzinger
    Cc: Keith Owens
    Signed-off-by: George Anzinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    George Anzinger
     
Introduce a write accessor for updating the current LDT. This is required
for hypervisors like Xen that do not allow LDT pages to be directly
written.

    Testing - here's a fun little LDT test that can be trivially modified to
    test limits as well.

/*
 * Copyright (c) 2005, Zachary Amsden (zach@vmware.com)
 * This is licensed under the GPL.
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <asm/ldt.h>
#define __KERNEL__
#include <asm/unistd.h>

void main(void)
{
    struct user_desc desc;
    char *code;
    unsigned long long tsc;

    code = (char *)mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    desc.entry_number = 0;
    desc.base_addr = (unsigned long)code;
    desc.limit = 1;
    desc.seg_32bit = 1;
    desc.contents = MODIFY_LDT_CONTENTS_CODE;
    desc.read_exec_only = 0;
    desc.limit_in_pages = 1;
    desc.seg_not_present = 0;
    desc.useable = 1;
    if (modify_ldt(1, &desc, sizeof(desc)) != 0) {
        perror("modify_ldt");
    }
    printf("code base is 0x%08x\n", (unsigned)code);
    code[0x0ffe] = 0x0f;  /* rdtsc */
    code[0x0fff] = 0x31;
    code[0x1000] = 0xcb;  /* lret */
    __asm__ __volatile__("lcall $7,$0xffe" : "=A" (tsc));
    printf("TSC is 0x%016llx\n", tsc);
}

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • The pushf/popf in switch_to are ONLY used to switch IOPL. Making this
    explicit in C code is more clear. This pushf/popf pair was added as a
    bugfix for leaking IOPL to unprivileged processes when using
    sysenter/sysexit based system calls (sysexit does not restore flags).

    When requesting an IOPL change in sys_iopl(), it is just as easy to change
    the current flags and the flags in the stack image (in case an IRET is
    required), but there is no reason to force an IRET if we came in from the
    SYSENTER path.

    This change is the minimal solution for supporting a paravirtualized Linux
    kernel that allows user processes to run with I/O privilege. Other
    solutions require radical rewrites of part of the low level fault / system
    call handling code, or do not fully support sysenter based system calls.

    Unfortunately, this added one field to the thread_struct. But as a bonus,
    on P4, the fastest time measured for switch_to() went from 312 to 260
    cycles, a win of about 17% in the fast case through this performance
    critical path.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • Privilege checking cleanup. Originally, these diffs were much greater, but
    recent cleanups in Linux have already done much of the cleanup. I added
    some explanatory comments in places where the reasoning behind certain
    tests is rather subtle.

Also, in traps.c, we can skip the user_mode check in handle_BUG(). The
reason is that there are only two call chains - one via die_if_kernel() and
one via do_page_fault(), both entering from die() - and both of these paths
already ensure that a kernel-mode failure has happened. Also, the original
check here, if (user_mode(regs)), was insufficient anyway, since it would
not rule out BUG faults from V8086 mode execution.

    Saving the %ss segment in show_regs() rather than assuming a fixed value
    also gives better information about the current kernel state in the
    register dump.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • Some more assembler cleanups I noticed along the way.

    Signed-off-by: Zachary Amsden
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • Noticed by Chuck Ebbert: the .ldt entry of the TSS was set up incorrectly.
    It never mattered since this was a leftover from old times, so remove it.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Also, setting PDPEs in PAE mode does not require atomic operations, since the
    PDPEs are cached by the processor, and only reloaded on an explicit or
    implicit reload of CR3.

    Since the four PDPEs must always be present in an active root, and the kernel
    PDPE is never updated, we are safe even from SMIs and interrupts / NMIs using
    task gates (which reload CR3). Actually, much of this is moot, since the user
    PDPEs are never updated either, and the only usage of task gates is by the
    doublefault handler. It appears the only place PGDs get updated in PAE mode
    is in init_low_mappings() / zap_low_mapping() for initial page table creation
    and recovery from ACPI sleep state, and these sites are safe by inspection.
    Getting rid of the cmpxchg8b saves code space and 720 cycles in pgd_alloc on
    P4.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • GCC can generate better code around descriptor update and access functions
    when there is not an explicit "eax" register constraint.

    Testing: You won't boot if this is messed up, since the TSS descriptor will be
    corrupted. Verified the assembler and booted.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • i386 inline assembler cleanup.

    This change encapsulates descriptor and task register management. Also,
    it is possible to improve assembler generation in two cases; savesegment
    may store the value in a register instead of a memory location, which
    allows GCC to optimize stack variables into registers, and MOV MEM, SEG
    is always a 16-bit write to memory, making the casting in math-emu
    unnecessary.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
i386 arch cleanup. Introduce the serialize macro to serialize processor
state. Why the microcode update needs it I am not quite sure, since wrmsr()
is already a serializing instruction, but it is a microcode update, so I will
keep the semantics the same, since this could be a timing workaround. As far
as I can tell, this has always been there since the original microcode update
source.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • i386 Inline asm cleanup. Use cr/dr accessor functions.

    Also, a potential bugfix. Also, some CR accessors really should be volatile.
    Reads from CR0 (numeric state may change in an exception handler), writes to
    CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction
    re-ordering. I did not add memory clobber to CR3 / CR4 / CR0 updates, as it
    was not there to begin with, and in no case should kernel memory be clobbered,
    except when doing a TLB flush, which already has memory clobber.

    I noticed that page invalidation does not have a memory clobber. I can't find
    a bug as a result, but there is definitely a potential for a bug here:

    #define __flush_tlb_single(addr) \
    __asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
This is a subarch update for ES7000. I've modified the platform check code and
removed unnecessary OEM table parsing for newer systems that don't use OEM
information during boot. Parsing the table in fact causes problems,
and the platform doesn't get recognized. The patch only affects the ES7000
subarch.

    Signed-off-by:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Natalie.Protasevich@unisys.com
     
The i386 generic subarchitecture requires explicit DMI strings or a
command-line option to enable bigsmp mode. The patch below removes that
restriction, and uses bigsmp as soon as it finds more than 8 logical CPUs,
Intel processors and xAPIC support.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venkatesh Pallipadi