Eric Lee / smarc-fsl-linux-kernel

09 May, 2007

13 commits

a436ed9c5 x86: create asm/cmpxchg.h

i386:

Rearrange the cmpxchg code to allow atomic.h to get it without needing to
include system.h. This kills warnings in the UML build from atomic.h about
implicit declarations of cmpxchg symbols. The i386 build presumably isn't
seeing this because a separate inclusion of system.h is covering it over.

The cmpxchg stuff is moved to asm-i386/cmpxchg.h, with an include left in
system.h for the benefit of generic code which expects cmpxchg there.

Meanwhile, atomic.h includes cmpxchg.h.

This causes no noticeable damage to the i386 build.

x86_64:

Move cmpxchg into its own header. atomic.h already included system.h, so
this is changed to include cmpxchg.h.

This is purely cleanup - it's not fixing any warnings - so if the x86_64
system.h isn't considered as cleanup-worthy as i386, then this can be
dropped.

It causes no noticeable damage to the x86_64 build.

uml:

The i386 and x86_64 cmpxchg patches require an asm-um/cmpxchg.h for the
UML build.

Signed-off-by: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Dike
2007-05-09 02:15:20 +0800
5dc12ddee Remove tas()

tas() has no users, so get rid of it.

Signed-off-by: Jeff Dike
Cc:
Cc: Paolo 'Blaisorblade' Giarrusso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Dike
2007-05-09 02:15:20 +0800
c343c14ae local_t: x86_64 extension

Signed-off-by: Mathieu Desnoyers
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2007-05-09 02:15:20 +0800
2856f5e31 atomic.h: atomic_add_unless as inline. Remove system.h atomic.h circular dependency

atomic_add_unless as inline. Remove system.h atomic.h circular dependency.
I agree (with Andi Kleen) this typeof is not needed and more error
prone. All the original atomic.h code that uses cmpxchg (which includes
the atomic_add_unless) uses defines instead of inline functions,
probably to circumvent a circular dependency between system.h and
atomic.h on powerpc (which my patch addresses). Therefore, it makes
sense to use inline functions that will provide type checking.

atomic_add_unless as inline. Remove system.h atomic.h circular dependency.
Digging into the FRV architecture shows me that it is also affected by
such a circular dependency. Here is the diff applying this against the
rest of my atomic.h patches.

It applies over the atomic.h standardization patches.
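
The inline form can be sketched in userspace as follows. This is only an illustration under stated assumptions: C11 atomics from <stdatomic.h> stand in for the kernel's atomic_t and cmpxchg(), and the name atomic_add_unless_sketch is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace sketch: add `a` to `*v` unless `*v` equals `u`.
 * Returns true if the addition happened. C11 atomics stand in for the
 * kernel's atomic_t and cmpxchg(); this is not the kernel implementation.
 */
static inline bool atomic_add_unless_sketch(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}
```

Because this is a function rather than a define, passing a pointer of the wrong type draws a compiler diagnostic, which is the type checking argued for above.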

Signed-off-by: Mathieu Desnoyers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2007-05-09 02:15:20 +0800
79d365a30 atomic.h: add atomic64 cmpxchg, xchg and add_unless to x86_64

Signed-off-by: Mathieu Desnoyers
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2007-05-09 02:15:19 +0800
1c710c896 utimensat implementation

Implement utimensat(2) which is an extension to futimesat(2) in that it

a) supports nanosecond resolution for the timestamps
b) allows selectively ignoring the atime/mtime value
c) allows selectively using the current time for either atime or mtime
d) supports changing the atime/mtime of a symlink itself along the lines
of the BSD lutimes(3) functions

For this change, the internally used do_utimes() function was changed to
accept a timespec time value and an additional flags parameter.

Additionally, the sys_utime function was changed to match compat_sys_utime,
which already uses do_utimes instead of duplicating the work.

Also, the completely missing futimensat() functionality is added. We have
such a function in glibc but we have to resort to using /proc/self/fd/* which
not everybody likes (chroot etc).

Test application (the syscall number will need per-arch editing):

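#define _GNU_SOURCE /* for stat64, st_atim, AT_FDCWD and syscall() */
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>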

#define __NR_utimensat 280

#define UTIME_NOW ((1l << 30) - 1l)
#define UTIME_OMIT ((1l << 30) - 2l)

int
main(void)
{
int status = 0;

int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
if (fd == -1)
error (1, errno, "failed to create test file \"ttt\"");

struct stat64 st1;
if (fstat64 (fd, &st1) != 0)
error (1, errno, "fstat failed");

struct timespec t[2];
t[0].tv_sec = 0;
t[0].tv_nsec = 0;
t[1].tv_sec = 0;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");

struct stat64 st2;
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");

if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;

t[0] = st1.st_atim;
t[1].tv_sec = 0;
t[1].tv_nsec = UTIME_OMIT;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");

if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");

if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
|| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
puts ("atim not set");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim changed from zero");
status = 1;
}
if (status != 0)
goto out;

t[0].tv_sec = 0;
t[0].tv_nsec = UTIME_OMIT;
t[1] = st1.st_mtim;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");

if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");

if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
|| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
puts ("atim changed from original time");
status = 1;
}
if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
|| st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
{
puts ("mtim not set");
status = 1;
}
if (status != 0)
goto out;

sleep (2);

t[0].tv_sec = 0;
t[0].tv_nsec = UTIME_NOW;
t[1].tv_sec = 0;
t[1].tv_nsec = UTIME_NOW;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");

if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");

struct timeval tv;
gettimeofday(&tv,NULL);

if (st2.st_atim.tv_sec < tv.tv_sec - 2 || st2.st_atim.tv_sec > tv.tv_sec)
{
puts ("atim not set to NOW");
status = 1;
}
if (st2.st_mtim.tv_sec < tv.tv_sec - 2 || st2.st_mtim.tv_sec > tv.tv_sec)
{
puts ("mtim not set to NOW");
status = 1;
}

if (symlink ("ttt", "tttsym") != 0)
error (1, errno, "cannot create symlink");

t[0].tv_sec = 0;
t[0].tv_nsec = 0;
t[1].tv_sec = 0;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
error (1, errno, "utimensat failed");

if (lstat64 ("tttsym", &st2) != 0)
error (1, errno, "lstat failed");

if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("symlink atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("symlink mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;

t[0].tv_sec = 1;
t[0].tv_nsec = 0;
t[1].tv_sec = 1;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
error (1, errno, "utimensat failed");

if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");

if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to one");
status = 1;
}
if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to one");
status = 1;
}

if (status == 0)
puts ("all OK");

out:
close (fd);
unlink ("ttt");
unlink ("tttsym");

return status;
}

[akpm@linux-foundation.org: add missing i386 syscall table entry]
Signed-off-by: Ulrich Drepper
Cc: Alexey Dobriyan
Cc: Michael Kerrisk
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2007-05-09 02:15:18 +0800
63f6564d3 x86_64: kill 19000+ sparse warnings

Eliminate 19439 (!!) sparse warnings like:
include/linux/mm.h:321:22: warning: constant 0xffff810000000000 is so big it is unsigned long

Eliminate 56 sparse warnings like:
arch/x86_64/kernel/setup.c:248:16: warning: constant 0xffffffff80000000 is so big it is unsigned long

Eliminate 5 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0xfffffffffff00000 is so big it is unsigned long

Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:551:37: warning: constant 0xffffc20000000000 is so big it is unsigned long

Eliminate 6 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0xffffffff88000000 is so big it is unsigned long

Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:552:6: warning: constant 0xffffe1ffffffffff is so big it is unsigned long

Eliminate 3 sparse warnings like:
arch/x86_64/kernel/e820.c:186:17: warning: constant 0x3fffffffffff is so big it is long

Signed-off-by: Randy Dunlap
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2007-05-09 02:15:14 +0800
6df95fd7a consolidate asm/const.h to linux/const.h

Make a global linux/const.h header file instead of having multiple,
per-arch files, and convert current users of asm/const.h to use
linux/const.h.

Built on x86_64 and sparc64.
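
A shared constant header of this kind can be sketched as below. The macro bodies are modelled on the usual kernel idiom and should be read as an assumption rather than a copy of the header. EXAMPLE_PAGE_OFFSET is a hypothetical constant added for illustration.

```c
/*
 * Sketch of a shared constant helper in the spirit of linux/const.h.
 * In C, _AC(X,Y) pastes a type suffix onto the literal; in assembly,
 * where such suffixes are not understood, it expands to the bare literal.
 */
#ifdef __ASSEMBLY__
#define _AC(X,Y)	X
#else
#define __AC(X,Y)	(X##Y)
#define _AC(X,Y)	__AC(X,Y)
#endif

/* Hypothetical example constant; the UL suffix makes its type explicit. */
#define EXAMPLE_PAGE_OFFSET	_AC(0xffff810000000000, UL)
```

Giving large constants an explicit suffix this way is also what keeps sparse from warning that a constant "is so big it is unsigned long".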

[akpm@linux-foundation.org: fix include/asm-x86_64/Kbuild]
Signed-off-by: Randy Dunlap
Signed-off-by: David S. Miller
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2007-05-09 02:15:13 +0800
0bb5e19d6 Clean up mostly unused IOSPACE macros

Most architectures defined three macros, MK_IOSPACE_PFN(), GET_IOSPACE()
and GET_PFN() in pgtable.h. However, the only callers of any of these
macros are in Sparc specific code, either in arch/sparc, arch/sparc64 or
drivers/sbus.

This patch removes the redundant macros from all architectures except
sparc and sparc64.

Signed-off-by: David Gibson
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Gibson
2007-05-09 02:15:13 +0800
6672f76a5 kdump/kexec: calculate note size at compile time

Currently the size of the per-cpu region reserved to save crash notes is
set by the per-architecture value MAX_NOTE_BYTES, which in turn is
currently set to 1024 on all supported architectures.

While testing ia64 I recently discovered that this value is in fact too
small. The particular setup I was using actually needs 1172 bytes. This
led to a very tedious failure mode where the tail of one elf note would
overwrite the head of another if they ended up being allocated sequentially
by kmalloc, which was often the case.

It seems to me that a far better approach is to calculate the size that
the area needs to be. This patch does just that.

If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is
needed then this should be as easy as making MAX_NOTE_BYTES larger in
arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I
think that the approach in this patch is a much more robust idea.
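
A compile-time calculation along these lines might look like the sketch below. It is a userspace approximation: the NOTE_* names are hypothetical, and it assumes glibc's <elf.h> and <sys/procfs.h> provide Elf64_Nhdr and struct elf_prstatus.

```c
#include <elf.h>
#include <stddef.h>
#include <sys/procfs.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define NOTE_ALIGN(x, a)	(((x) + (a) - 1) & ~((size_t)(a) - 1))

#define NOTE_NAME		"CORE"

/* Header, name and descriptor sizes, each padded to 4 bytes. */
#define NOTE_HEAD_BYTES		NOTE_ALIGN(sizeof(Elf64_Nhdr), 4)
#define NOTE_NAME_BYTES		NOTE_ALIGN(sizeof(NOTE_NAME), 4)
#define NOTE_DESC_BYTES		NOTE_ALIGN(sizeof(struct elf_prstatus), 4)

/* One prstatus note plus an empty terminating note header. */
#define NOTE_BYTES \
	(NOTE_HEAD_BYTES * 2 + NOTE_NAME_BYTES + NOTE_DESC_BYTES)

_Static_assert(NOTE_BYTES % 4 == 0, "note area must stay 4-byte aligned");
```

Since every term is a sizeof(), the result tracks the architecture's register set automatically instead of relying on a hand-picked constant.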

Acked-by: Vivek Goyal
Signed-off-by: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Simon Horman
2007-05-09 02:15:07 +0800
1eeb66a1b move die notifier handling to common code

This patch moves the die notifier handling to common code. Previously,
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally; this should be understood as an appeal to
the other architecture maintainers to implement support for it as well (aka
sprinkling a notify_die or two in the proper place).

arm had a notify_die that did something totally different. I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used in. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig
Cc:
Cc: Russell King
Signed-off-by: Bryan Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2007-05-09 02:15:04 +0800
6de02123b tty: i386/x86_64 arbitrary speed support

Adds the needed TCGETS2/TCSETS2 ioctl calls, structures, defines and the like.
Tested against the test suite and passes. Other platforms should need
roughly the same change.
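
From userspace, the new ioctls would typically be used as in the sketch below. The helper name set_custom_speed is hypothetical. The sketch assumes struct termios2, BOTHER and CBAUD come from <asm/termbits.h>, which this message does not name.

```c
#include <asm/termbits.h>	/* struct termios2, BOTHER, CBAUD */
#include <sys/ioctl.h>		/* ioctl(), TCGETS2, TCSETS2 */

/*
 * Sketch: request an arbitrary speed on an open tty.
 * Returns 0 on success, -1 on failure (errno is set by ioctl()).
 */
static int set_custom_speed(int fd, unsigned int baud)
{
	struct termios2 tio;

	if (ioctl(fd, TCGETS2, &tio) != 0)
		return -1;

	tio.c_cflag &= ~CBAUD;	/* drop the fixed Bnnn rate */
	tio.c_cflag |= BOTHER;	/* take the rate from c_ispeed/c_ospeed */
	tio.c_ispeed = baud;
	tio.c_ospeed = baud;

	return ioctl(fd, TCSETS2, &tio) != 0 ? -1 : 0;
}
```

Note that <asm/termbits.h> and <termios.h> both define struct termios, so the two headers should not be mixed in one file.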

Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alan Cox
2007-05-09 02:15:03 +0800
f64da958d ipmi: add new IPMI nmi watchdog handling

Convert over to the new NMI handling for getting IPMI watchdog timeouts via an
NMI. This adds config options to know if there is the ability to receive NMIs
and if it has an NMI post processing call. Then it modifies the IPMI watchdog
to take advantage of this so that it can know if an NMI comes in.

It also adds testing that the IPMI NMI watchdog works.

Signed-off-by: Corey Minyard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Corey Minyard
2007-05-09 02:14:58 +0800

07 May, 2007

1 commit

e3ebadd95 Revert "[PATCH] x86: __pa and __pa_symbol address space separation"

This was broken. It adds complexity, for no good reason. Rather than
separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(),
and preferably __pa() too - and just use "virt_to_phys()" instead, which
is more readable and has nicer semantics.

However, right now, just undo the separation, and make __pa_symbol() be
the exact same as __pa(). That fixes the bugs this patch introduced,
and we can do the fairly obvious cleanups later.

Do the new __phys_addr() function (which is now the actual workhorse for
the unified __pa()/__pa_symbol()) as a real external function, that way
all the potential issues with compile/link-time optimizations of
constant symbol addresses go away, and we can also, if we choose to, add
more sanity-checking of the argument.

Cc: Eric W. Biederman
Cc: Vivek Goyal
Cc: Andi Kleen
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-05-07 23:44:24 +0800

06 May, 2007

1 commit

ea62ccd00 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
[PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
[PATCH] i386: type may be unused
[PATCH] i386: Some additional chipset register values validation.
[PATCH] i386: Add missing !X86_PAE dependency to the 2G/2G split.
[PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
[PATCH] i386: white space fixes in i387.h
[PATCH] i386: Drop noisy e820 debugging printks
[PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
[PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
[PATCH] x86-64: Share identical video.S between i386 and x86-64
[PATCH] x86-64: Remove CONFIG_REORDER
[PATCH] x86-64: Print type and size correctly for unknown compat ioctls
[PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
[PATCH] i386: Little cleanups in smpboot.c
[PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
[PATCH] i386: Add X86_FEATURE_RDTSCP
[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
[PATCH] i386: Implement alternative_io for i386
...

Fix up trivial conflict in include/linux/highmem.h manually.

Signed-off-by: Linus Torvalds

Linus Torvalds
2007-05-06 05:55:20 +0800

05 May, 2007

2 commits

89661adaa Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (59 commits)
PCI: Free resource files in error path of pci_create_sysfs_dev_files()
pci-quirks: disable MSI on RS400-200 and RS480
PCI hotplug: Use menuconfig objects
PCI: ZT5550 CPCI Hotplug driver fix
PCI: rpaphp: Remove semaphores
PCI: rpaphp: Ensure more pcibios_add/pcibios_remove symmetry
PCI: rpaphp: Use pcibios_remove_pci_devices() symmetrically
PCI: rpaphp: Document is_php_dn()
PCI: rpaphp: Document find_php_slot()
PCI: rpaphp: Rename rpaphp_register_pci_slot() to rpaphp_enable_slot()
PCI: rpaphp: refactor tail call to rpaphp_register_slot()
PCI: rpaphp: remove rpaphp_set_attention_status()
PCI: rpaphp: remove print_slot_pci_funcs()
PCI: rpaphp: Remove setup_pci_slot()
PCI: rpaphp: remove a call that does nothing but a pointer lookup
PCI: rpaphp: Remove another wrappered function
PCI: rpaphp: Remove another call that is a wrapper
PCI: rpaphp: remove a function that does nothing but wrap debug printks
PCI: rpaphp: Remove un-needed goto
PCI: rpaphp: Fix a memleak; slot->location string was never freed
...

Linus Torvalds
2007-05-05 09:04:29 +0800
98b96173c Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart

* master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart:
[AGPGART] sworks-agp: Switch to PCI ref counting APIs
[AGPGART] Nvidia AGP: Use refcount aware PCI interfaces
[AGPGART] Fix sparse warning in sgi-agp.c
[AGPGART] Intel-agp adjustments
[AGPGART] Move [un]map_page_into_agp into asm/agp.h
[AGPGART] Add missing calls to global_flush_tlb() to ali-agp
[AGPGART] prevent probe collision of sis-agp and amd64_agp

Linus Torvalds
2007-05-05 08:38:16 +0800

03 May, 2007

23 commits

a9dfd281a PCI: scatterlist.h needs types.h

Most architectures' scatterlist.h use the type dma_addr_t, but omit to
include the types.h header which defines it. This could lead to build
failures, so let's add the missing includes.

Signed-off-by: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

Jean Delvare
2007-05-03 10:02:34 +0800
05cb007da [PATCH] x86-64: Use the 32bit wd_ops for 64bit too.

This mainly removes a lot of code, replacing it with calls into the new 32bit
perfctr-watchdog.c

Signed-off-by: Andi Kleen

Andi Kleen
2007-05-03 01:27:20 +0800
57a4f91ae [PATCH] x86-64: Auto compute __NR_syscall_max at compile time

No need to maintain it anymore
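
One way to derive such a maximum at compile time is sketched below. The EXAMPLE_* names and the table contents are hypothetical; the point is only that an array sized by designated initializers ends one past its largest index.

```c
/*
 * Sketch of deriving the highest table index at compile time.
 * Each hypothetical EXAMPLE_SYSCALL(nr) entry marks slot `nr`; the array
 * is sized by its largest designated index, so sizeof() yields max + 1.
 */
#define EXAMPLE_SYSCALL(nr)	[nr] = 1,

static const char example_syscalls[] = {
	EXAMPLE_SYSCALL(0)
	EXAMPLE_SYSCALL(1)
	EXAMPLE_SYSCALL(7)
	EXAMPLE_SYSCALL(3)
};

#define EXAMPLE_SYSCALL_MAX	(sizeof(example_syscalls) - 1)
```

Adding an entry with a higher number raises EXAMPLE_SYSCALL_MAX without any further edits, and the order of the entries does not matter.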

Signed-off-by: Andi Kleen

Andi Kleen
2007-05-03 01:27:18 +0800
70ae77f49 [PATCH] x86-64: Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64

Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
NMI_VECTOR to avoid potential hangups in the event of crash when kdump
tries to stop the other CPUs.

Signed-off-by: Fernando Luis Vazquez Cao
Signed-off-by: Andi Kleen

Fernando Luis Vázquez Cao
2007-05-03 01:27:18 +0800
9062d888a [PATCH] x86-64: __send_IPI_dest_field - x86_64

Implement __send_IPI_dest_field which can be used to send IPIs when the
"destination shorthand" field of the ICR is set to 00 (destination
field). Use it whenever possible.

Signed-off-by: Fernando Luis Vazquez Cao
Signed-off-by: Andi Kleen

Fernando Luis Vázquez Cao
2007-05-03 01:27:18 +0800
8339e9fba [PATCH] x86-64: safe_apic_wait_icr_idle - x86_64

apic_wait_icr_idle looks like this:

static __inline__ void apic_wait_icr_idle(void)
{
	while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
		cpu_relax();
}

The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. Kdump uses an IPI
to stop the other CPUs in the event of a crash, but when any of the
other CPUs are locked-up inside the NMI handler the CPU that sends the
IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.

Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:

"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.

A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-out”
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."

Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.

Create an apic_wait_icr_idle replacement that implements the time-out
mechanism and that can be used to solve the aforementioned problem.
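
The time-out idea can be sketched in userspace as follows. Everything here is an assumption for illustration: the register is simulated through a hypothetical accessor, and the EXAMPLE_* constants and the retry budget are not taken from the patch.

```c
#include <stdint.h>

#define EXAMPLE_ICR_BUSY	0x1000u	/* assumed delivery-status bit */
#define EXAMPLE_MAX_POLLS	1000	/* assumed retry budget */

/* Hypothetical register accessor; a real one would read the local APIC. */
typedef uint32_t (*icr_read_fn)(void);

/*
 * Poll until the busy bit clears or the retry budget runs out.
 * Returns 0 when idle, or the still-set busy bit on time-out, so the
 * caller can decide whether to abort or retry.
 */
static uint32_t wait_icr_idle_timeout(icr_read_fn read_icr)
{
	uint32_t busy;
	int polls = 0;

	do {
		busy = read_icr() & EXAMPLE_ICR_BUSY;
		if (!busy)
			break;
		/* A real implementation would delay briefly here. */
	} while (++polls < EXAMPLE_MAX_POLLS);

	return busy;
}

/* Simulated registers for demonstration only. */
static uint32_t icr_always_busy(void) { return EXAMPLE_ICR_BUSY; }
static uint32_t icr_idle(void) { return 0; }
```

Unlike the unbounded loop, the stuck case returns control to the caller with a non-zero status instead of hanging.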

AK: moved both functions out of line
AK: Added improved loop from Keith Owens

Signed-off-by: Fernando Luis Vazquez Cao
Signed-off-by: Andi Kleen

Fernando Luis Vazquez Cao
2007-05-03 01:27:17 +0800
2b1f6278d [PATCH] x86: Save the MTRRs of the BSP before booting an AP

Applied fix by Andrew Morton:
http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.

AMD and Intel x86 CPU manuals state that it is the responsibility of
system software to initialize and maintain MTRR consistency across
all processors in Multi-Processing Environments.

Quote from page 188 of the AMD64 System Programming manual (Volume 2):

7.6.5 MTRRs in Multi-Processing Environments

"In multi-processing environments, the MTRRs located in all processors must
characterize memory in the same way. Generally, this means that identical
values are written to the MTRRs used by the processors." (short omission here)
"Failure to do so may result in coherency violations or loss of atomicity.
Processor implementations do not check the MTRR settings in other processors
to ensure consistency. It is the responsibility of system software to
initialize and maintain MTRR consistency across all processors."

Current Linux MTRR code already implements the above in the case that the
BIOS does not properly initialize MTRRs on the secondary processors,
but the case where the fixed-range MTRRs of the boot processor are changed
after Linux started to boot, before the initialisation of a secondary
processor, is not handled yet.

In this case, secondary processors are currently initialized by Linux
with the MTRRs which the boot processor had very early, when mtrr_bp_init()
ran, but not with the MTRRs which the boot processor uses at the
time when that secondary processor is actually booted,
causing differing MTRR contents on the secondary processors.

Such a situation happens on Acer Ferrari 1000 and 5000 notebooks where the
BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
of the boot processor when it transitions the system into ACPI mode.
The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
code runs acpi_enable().

Other occasions where the SMI handler of the BIOS may change bits in
the MTRRs could occur as well. To initialize newly booted secondary
processors with the fixed-range MTRRs which the boot processor uses
at that time, this patch saves the fixed-range MTRRs of the boot
processor before new secondary processors are started. When the
secondary processors run their Linux initialisation code, their
fixed-range MTRRs will be updated with the saved fixed-range MTRRs.

If CONFIG_MTRR is not set, we define mtrr_save_state
as an empty statement because there is nothing to do.

Possible TODOs:

*) CPU-hotplugging outside of SMP suspend/resume is not yet tested
with this patch.

*) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
then the calls to mtrr_save_state() could be replaced by calls to
mtrr_save_fixed_ranges(NULL) and mtrr_save_state() would not be
needed.

That would need either verification of the CPU-hotplug code or
at least a test on a >2 CPU machine.

*) The MTRRs of other running processors are not yet checked at this
time but it might be interesting to synchronize the MTRRs of all
processors before booting. That would be an incremental patch,
but of rather low priority since there is no machine known so
far which would require this.

AK: moved prototypes on x86-64 around to fix warnings

Signed-off-by: Bernhard Kaindl
Signed-off-by: Andrew Morton
Signed-off-by: Andi Kleen
Cc: Andi Kleen
Cc: Dave Jones

Bernhard Kaindl
2007-05-03 01:27:17 +0800
2b3b4835c [PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches.

mtrr_save_fixed_ranges(), which is used by two later patches, accepts a
dummy void pointer. In the current implementation of one of these
patches it may be called from smp_call_function_single(), which requires
that the called function take a void pointer argument.

This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
which is the element of the static struct which stores our current
backup of the fixed-range MTRR values which all CPUs shall be
using.

Because mtrr_save_fixed_ranges calls get_fixed_ranges after
kernel initialisation time, __init needs to be removed from
the declaration of get_fixed_ranges().

If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
as an empty statement because there is nothing to do.
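
The "empty statement" stub follows a common pattern, sketched here with hypothetical names (EXAMPLE_CONFIG_MTRR and example_save_fixed_ranges stand in for the real option and function):

```c
/*
 * Sketch of the config-dependent stub pattern. EXAMPLE_CONFIG_MTRR and
 * the names below are hypothetical stand-ins, not the kernel's headers.
 */
#ifdef EXAMPLE_CONFIG_MTRR
void example_save_fixed_ranges(void *info);
#else
/* Expands to a single empty statement, safe inside if/else bodies. */
#define example_save_fixed_ranges(arg)	do { } while (0)
#endif
```

Callers can then invoke the function unconditionally, without #ifdef blocks at each call site.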

AK: Moved prototypes for x86-64 around to fix warnings

Signed-off-by: Bernhard Kaindl
Signed-off-by: Andi Kleen
Cc: Andrew Morton
Cc: Andi Kleen
Cc: Dave Jones

Bernhard Kaindl
2007-05-03 01:27:17 +0800
856f44ff4 [PATCH] x86-64: Move mtrr prototypes from proto.h to mtrr.h

Signed-off-by: Andi Kleen

Andi Kleen
2007-05-03 01:27:17 +0800
441d40dca [PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org>

The other symbols used to delineate the alt-instructions sections have the
form __foo/__foo_end. Rename parainstructions to match.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Andi Kleen
Cc: Andi Kleen
Cc: Rusty Russell
Signed-off-by: Andrew Morton

Jeremy Fitzhardinge
2007-05-03 01:27:16 +0800
d6dd61c83 [PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction

Add hooks to allow a paravirt implementation to track the lifetime of
an mm. Paravirtualization requires three hooks, but only two are
needed in common code. They are:

arch_dup_mmap, which is called when a new mmap is created at fork

arch_exit_mmap, which is called when the last process reference to an
mm is dropped, which typically happens on exit and exec.

The third hook is activate_mm, which is called from the arch-specific
activate_mm() macro/function, and so doesn't need stub versions for
other architectures. It's called when an mm is first used.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Andi Kleen
Cc: linux-arch@vger.kernel.org
Cc: James Bottomley
Acked-by: Ingo Molnar

Jeremy Fitzhardinge
2007-05-03 01:27:14 +0800
4bc5aa91f [PATCH] x86: Clean up x86 control register and MSR macros (corrected)

This patch is based on Rusty's recent cleanup of the EFLAGS-related
macros; it extends the same kind of cleanup to control registers and
MSRs.

It also unifies these between i386 and x86-64; at least with regards
to MSRs, the two had definitely gotten out of sync.

Signed-off-by: H. Peter Anvin
Signed-off-by: Andi Kleen

H. Peter Anvin
2007-05-03 01:27:12 +0800
f039b7547 [PATCH] x86: Don't use MWAIT on AMD Family 10

It doesn't put the CPU into deeper sleep states, so it's better to use the standard
idle loop to save power. But allow re-enabling it anyway for benchmarking.

I also removed the obsolete idle=halt on i386.

Cc: andreas.herrmann@amd.com

Signed-off-by: Andi Kleen

Andi Kleen
2007-05-03 01:27:12 +0800
c169859d6 [PATCH] x86-64: Clean up asm-x86_64/bugs.h

Most of asm-x86_64/bugs.h is code which should be in a C file, so put it there.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Andi Kleen
Cc: Andi Kleen
Cc: Linus Torvalds

Jeremy Fitzhardinge
2007-05-03 01:27:12 +0800
bbf30a165 [PATCH] x86-64: fix arithmetic in comment

The xmm space on x86_64 is 256 bytes.

Signed-off-by: Avi Kivity
Signed-off-by: Andi Kleen

Avi Kivity
2007-05-03 01:27:12 +0800
5d02d7ae7 [PATCH] x86-64: Use X86_EFLAGS_IF in x86-64/irqflags.h.

As per i386 patch: move X86_EFLAGS_IF et al out to a new header:
processor-flags.h, so we can include it from irqflags.h and use it in
raw_irqs_disabled_flags().

As a side-effect, we could now use these flags in .S files.
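
The check this enables can be sketched as below. The function name is hypothetical. 0x200 is bit 9 of EFLAGS, the interrupt flag.

```c
#define X86_EFLAGS_IF	0x00000200	/* Interrupt Flag, bit 9 of EFLAGS */

/* Sketch: interrupts are disabled when IF is clear in the saved flags. */
static inline int irqs_disabled_flags_sketch(unsigned long flags)
{
	return !(flags & X86_EFLAGS_IF);
}
```

Using the named constant instead of a bare `1 << 9` is what makes the same definition shareable with assembly files.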

Signed-off-by: Rusty Russell
Signed-off-by: Andi Kleen

Andi Kleen
2007-05-03 01:27:11 +0800
b00742d39 [PATCH] x86-64: Account for module percpu space separately from kernel percpu

Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
the sum of kernel_percpu + PERCPU_MODULE_RESERVE. This is now common
to all architectures; if an architecture wants to set
PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is
the only one which does).

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Andi Kleen
Cc: Rusty Russell
Cc: Eric W. Biederman
Cc: Andi Kleen

Jeremy Fitzhardinge
2007-05-03 01:27:11 +0800
ca906e423 [PATCH] x86: sys_ioperm() prototype cleanup

- there's no reason for duplicating the prototype from
include/linux/syscalls.h in include/asm-x86_64/unistd.h
- every file should #include the headers containing the prototypes for
its global functions

Signed-off-by: Adrian Bunk
Signed-off-by: Andi Kleen

Adrian Bunk
2007-05-03 01:27:10 +0800
2bff73830 [PATCH] x86-64: use lru instead of page->index and page->private for pgd lists management.

x86_64 currently simulates a list using the index and private fields of the
page struct. Seems that the code was inherited from i386. But x86_64 does
not use the slab to allocate pgds and pmds etc. So the lru field is not
used by the slab and therefore available.

This patch uses standard list operations on page->lru to realize pgd
tracking.

Signed-off-by: Christoph Lameter
Signed-off-by: Andi Kleen
Cc: Andi Kleen
Signed-off-by: Andrew Morton

Christoph Lameter
2007-05-03 01:27:10 +0800
eab0c72ae [PATCH] x86-64: Introduce load_TLS to the "for" loop.

GCC (4.1 at least) unrolls it anyway, but I can't believe this code
was ever justifiable. (I've also submitted a patch which cleans up
i386, which is even uglier).

Signed-off-by: Rusty Russell
Signed-off-by: Andi Kleen
Cc: Andi Kleen
Signed-off-by: Andrew Morton

Rusty Russell
2007-05-03 01:27:09 +0800
8b8ca80e1 [PATCH] x86-64: configurable fake numa node sizes

Extends the numa=fake x86_64 command-line option to allow for configurable
node sizes. These nodes can be used in conjunction with cpusets for coarse
memory resource management.

The old command-line option is still supported:
numa=fake=32 gives 32 fake NUMA nodes, ignoring the NUMA setup of the
actual machine.

But now you may configure your system for the node sizes of your choice:
numa=fake=2*512,1024,2*256
gives two 512M nodes, one 1024M node, two 256M nodes, and
the rest of system memory to a sixth node.

The existing hash function is maintained to support the various node sizes
that are possible with this implementation.

Each node of the same size receives roughly the same amount of available
pages, regardless of any reserved memory within its address range. The total
available pages on the system is calculated and divided by the number of equal
nodes to allocate. These nodes are then dynamically allocated and their
borders extended until such time as their number of available pages reaches
the required size.

Configurable node sizes are recommended when used in conjunction with cpusets
for memory control because it eliminates the overhead associated with scanning
the zonelists of many smaller full nodes on page_alloc().
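
The option syntax can be illustrated with a small userspace parser. This is a sketch only, not the kernel's parser: parse_fake_nodes is a hypothetical name, and the sketch does not model the final node that receives the rest of system memory.

```c
#include <stdlib.h>

/*
 * Sketch: expand a numa=fake style list such as "2*512,1024,2*256" into
 * per-node sizes in megabytes. Returns the number of nodes written, or
 * -1 on malformed input or when more than `max` nodes are requested.
 */
static int parse_fake_nodes(const char *s, unsigned long *sizes, int max)
{
	int n = 0;

	while (*s) {
		char *end;
		unsigned long count = 1;
		unsigned long val = strtoul(s, &end, 10);

		if (end == s)
			return -1;
		if (*end == '*') {		/* "count*size" form */
			count = val;
			s = end + 1;
			val = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		while (count--) {
			if (n >= max)
				return -1;
			sizes[n++] = val;
		}
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		s = end;
	}
	return n;
}
```

For the example string above this yields five explicit nodes of 512, 512, 1024, 256 and 256 megabytes.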

Cc: Andi Kleen
Signed-off-by: David Rientjes
Signed-off-by: Andi Kleen
Cc: Paul Jackson
Cc: Christoph Lameter
Signed-off-by: Andrew Morton

David Rientjes
2007-05-03 01:27:09 +0800
5a90cf205 [PATCH] x86: Log reason why TSC was marked unstable

Change mark_tsc_unstable() so it takes a string argument, which holds the
reason the TSC was marked unstable.

This is then displayed the first time mark_tsc_unstable is called.

This should help us better debug why the TSC was marked unstable on certain
systems and allow us to make sure we're not being overly paranoid when
throwing out this troublesome clocksource.
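
The report-once behaviour can be sketched in userspace as follows. The function name, return value and message text are illustrative assumptions.

```c
#include <stdio.h>

static int tsc_unstable;	/* set once the TSC has been distrusted */

/*
 * Sketch: record that the TSC is unstable and report why, but only on
 * the first call. Returns 1 if the reason was printed, 0 otherwise.
 */
static int mark_tsc_unstable_sketch(const char *reason)
{
	if (tsc_unstable)
		return 0;

	tsc_unstable = 1;
	printf("Marking TSC unstable due to: %s.\n", reason);
	return 1;
}
```

Only the first caller's reason is logged, so the log points at whatever tripped the check first.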

Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Andi Kleen

john stultz
2007-05-03 01:27:08 +0800
6a50a664c [PATCH] x86-64: build-time checking

o The x86_64 kernel should run from a 2MB-aligned address for two reasons.
- Performance.
- For relocatable kernels, page tables are updated based on the difference
between the compile-time address and the load-time physical address.
This difference should be a multiple of 2MB, as kernel text and data
are mapped using 2MB pages and the PMD should point to a 2MB-aligned
address. Life is simpler if both the compile-time and load-time
kernel addresses are 2MB aligned.

o Flag the error at compile time if one is trying to build a kernel which
does not meet the alignment restrictions.
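
Such a check can be expressed with the preprocessor, roughly as below. EXAMPLE_PHYSICAL_START is a hypothetical stand-in for the configured start address.

```c
/* Hypothetical stand-in for the configured physical start address. */
#define EXAMPLE_PHYSICAL_START	0x200000UL

#define EXAMPLE_ALIGN_2MB	0x200000UL

/* Fail the build, rather than the boot, on a misaligned start address. */
#if (EXAMPLE_PHYSICAL_START % EXAMPLE_ALIGN_2MB) != 0
#error "EXAMPLE_PHYSICAL_START must be a multiple of 2MB"
#endif
```

Changing the start address to, say, 0x100000 makes compilation stop at the #error line.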

Signed-off-by: Vivek Goyal
Signed-off-by: Andi Kleen
Cc: "Eric W. Biederman"
Cc: Andi Kleen
Signed-off-by: Andrew Morton

Vivek Goyal
2007-05-03 01:27:08 +0800