24 Jun, 2005

40 commits

  • Add a new `suid_dumpable' sysctl:

    This value can be used to query and set the core dump mode for setuid
    or otherwise protected/tainted binaries. The modes are

    0 - (default) - traditional behaviour. Any process which has changed
    privilege levels or is execute only will not be dumped

    1 - (debug) - all processes dump core when possible. The core dump is
    owned by the current user and no security is applied. This is intended
    for system debugging situations only. Ptrace is unchecked.

    2 - (suidsafe) - any binary which normally would not be dumped is dumped
    readable by root only. This allows the end user to remove such a dump but
    not access it directly. For security reasons core dumps in this mode will
    not overwrite one another or other files. This mode is appropriate when
    administrators are attempting to debug problems in a normal environment.
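
    A minimal userspace sketch of selecting the suidsafe mode, assuming the
    sysctl is exposed as /proc/sys/fs/suid_dumpable:

    #include <stdio.h>

    /* Switch protected binaries to the root-readable "suidsafe" dump mode. */
    int main(void)
    {
            FILE *f = fopen("/proc/sys/fs/suid_dumpable", "w");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            fprintf(f, "2\n");      /* 0 = traditional, 1 = debug, 2 = suidsafe */
            return fclose(f) ? 1 : 0;
    }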

    (akpm:

    > > +EXPORT_SYMBOL(suid_dumpable);
    >
    > EXPORT_SYMBOL_GPL?

    No problem to me.

    > > if (current->euid == current->uid && current->egid == current->gid)
    > > current->mm->dumpable = 1;
    >
    > Should this be SUID_DUMP_USER?

    Actually the feedback I had from last time was that the SUID_ defines
    should go because it's clearer to follow the numbers. They can go
    everywhere (and there are lots of places where dumpable is tested/used
    as a bool in untouched code)

    > Maybe this should be renamed to `dump_policy' or something. Doing that
    > would help us catch any code which isn't using the #defines, too.

    Fair comment. The patch was designed to be easy to maintain for Red Hat
    rather than for merging. Changing that field would create a gigantic
    diff because it is used all over the place.

    )

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
    In situations where a kprobes handler calls a routine which has a probe on
    it, kprobes_handler() disarms the new probe forever. This patch removes that
    limitation by temporarily disarming the new probe. When another probe hits
    while handling the old probe, kprobes_handler() saves the previous kprobes
    state and handles the new probe without calling the new probe's registered
    handlers. kprobe_post_handler() then restores the previous kprobes state and
    normal execution continues.

    However, on the x86_64 architecture, re-entrancy is provided only through
    pre_handler(). If a probed routine is called from post_handler(), then the
    probes on that routine are disarmed forever, since the exception stack gets
    changed after the processor single-steps the instruction of the new probe.

    This patch includes generic changes to support temporary disarming on
    reentrancy of probes.

    Signed-off-by: Prasanna S Panchamukhi

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prasanna S Panchamukhi
     
    When patching the original instruction with a break instruction, the current
    Kprobes code tries to retain the original qualifying predicate (qp). However,
    cmp.crel.ctype with ctype == unc is a special instruction that always needs
    to be executed irrespective of the qp. Hence, if the instruction we are
    patching is of this type, we should not copy the original qp to the break
    instruction; we always want the break fault to happen so that we can emulate
    the instruction.

    This patch is based on the feedback given by David Mosberger

    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anil S Keshavamurthy
     
  • A cleanup of the ia64 kprobes implementation such that all of the bundle
    manipulation logic is concentrated in arch_prepare_kprobe().

    With the current design for kprobes, the arch specific code only has a
    chance to return failure inside the arch_prepare_kprobe() function.

    This patch moves all of the work that was happening in arch_copy_kprobe()
    and most of the work that was happening in arch_arm_kprobe() into
    arch_prepare_kprobe(). By doing this we can add further robustness checks
    in arch_prepare_kprobe() and refuse to insert kprobes that will cause problems.

    Signed-off-by: Rusty Lynch
    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Lynch
     
  • This patch is required to support kprobe on branch/call instructions.

    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anil S Keshavamurthy
     
  • This patch adds IA64 architecture specific JProbes support on top of Kprobes

    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Rusty Lynch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anil S Keshavamurthy
     
  • This is an IA64 arch specific handling of Kprobes

    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Rusty Lynch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anil S Keshavamurthy
     
    As many of you know, kprobes exists in the mainline kernel for various
    architectures including i386, x86_64, ppc64 and sparc64. The patches
    following this mail are a port of Kprobes and Jprobes to IA64.

    I have tested these patches for kprobes and Jprobes and they seem to work
    fine. I have tested this patch by inserting kprobes on various slots and
    various templates, including various types of branch instructions.

    I have also tested this patch using the tool
    http://marc.theaimsgroup.com/?l=linux-kernel&m=111657358022586&w=2 and
    kprobes for IA64 works great.

    Here is a list of TODO items; patches for them will appear soon.

    1) Support kprobes on "mov r1=ip" type of instruction
    2) Support Kprobes and Jprobes to exist on the same address
    3) Support return probes
    4) Architecture-independent cleanup of kprobes

    This patch adds the kdebug die notification mechanism needed by Kprobes.

    For a break instruction on a branch-type slot, imm21 is ignored and the
    value zero is placed in the IIM register; hence we need to handle kprobes
    in the switch case for zero.

    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Rusty Lynch

    From: Rusty Lynch

    At the point in traps.c where we receive a break with a zero value, we
    cannot say if the break was a result of a kprobe or some other debug
    facility.

    This simple patch changes the informational string to a more correct "break
    0" value, and applies to the 2.6.12-rc2-mm2 tree with all the kprobes
    patches that were just recently included for the next mm cut.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anil S Keshavamurthy
     
  • This patch moves the lock/unlock of the arch specific kprobe_flush_task()
    to the non-arch specific kprobe_flush_task().

    Signed-off-by: Hien Nguyen
    Acked-by: Prasanna S Panchamukhi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hien Nguyen
     
    The architecture independent code of the current kprobes implementation
    arms and disarms kprobes at registration time. The problem is that the
    code assumes that arming and disarming is just a simple write of some
    magic value to an address. This is problematic for ia64 where our
    instructions look more like structures, and we cannot insert breakpoints
    by just doing something like:

    *p->addr = BREAKPOINT_INSTRUCTION;

    The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent
    functions:

    * void arch_arm_kprobe(struct kprobe *p)
    * void arch_disarm_kprobe(struct kprobe *p)

    and then adds the new functions for each of the architectures that already
    implement kprobes (sparc64/ppc64/i386/x86_64).
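
    For illustration, the i386 versions of the two hooks amount to little more
    than the following sketch (the saved p->opcode byte and the icache flush
    follow the existing i386 kprobes code):

    #include <linux/kprobes.h>
    #include <asm/cacheflush.h>

    void arch_arm_kprobe(struct kprobe *p)
    {
            /* patch the breakpoint opcode over the first byte */
            *p->addr = BREAKPOINT_INSTRUCTION;
            flush_icache_range((unsigned long) p->addr,
                               (unsigned long) p->addr + sizeof(kprobe_opcode_t));
    }

    void arch_disarm_kprobe(struct kprobe *p)
    {
            /* restore the original opcode saved at registration time */
            *p->addr = p->opcode;
            flush_icache_range((unsigned long) p->addr,
                               (unsigned long) p->addr + sizeof(kprobe_opcode_t));
    }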

    I thought arch_[dis]arm_kprobe was the most descriptive of what was really
    happening, but each of the architectures already had a disarm_kprobe()
    function that was really a "disarm and do some other clean-up items as
    needed when you stumble across a recursive kprobe." So... I took the
    liberty of changing the code that was calling disarm_kprobe() to call
    arch_disarm_kprobe(), and then do the cleanup in the block of code dealing
    with the recursive kprobe case.

    So far this patch has been tested on i386, x86_64, and ppc64, but still
    needs to be tested on sparc64.

    Signed-off-by: Rusty Lynch
    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Lynch
     
  • The following patch adds the x86_64 architecture specific implementation
    for function return probes.

    Function return probes is a mechanism built on top of kprobes that allows
    a caller to register a handler to be called when a given function exits.
    For example, to instrument the return path of sys_mkdir:

    static int sys_mkdir_exit(struct kretprobe_instance *i, struct pt_regs *regs)
    {
            printk("sys_mkdir exited\n");
            return 0;
    }

    static struct kretprobe return_probe = {
            .handler = sys_mkdir_exit,
    };

    return_probe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("sys_mkdir");
    if (register_kretprobe(&return_probe)) {
            printk(KERN_DEBUG "Unable to register return probe!\n");
            /* do error path */
    }

    unregister_kretprobe(&return_probe);

    The way this works is that:

    * At system initialization time, kernel/kprobes.c installs a kprobe
    on a function called kretprobe_trampoline() that is implemented in
    the arch/x86_64/kernel/kprobes.c (More on this later)

    * When a return probe is registered using register_kretprobe(),
    kernel/kprobes.c will install a kprobe on the first instruction of the
    targeted function with the pre handler set to arch_prepare_kretprobe()
    which is implemented in arch/x86_64/kernel/kprobes.c.

    * arch_prepare_kretprobe() will prepare a kretprobe instance that stores:
    - nodes for hanging this instance in an empty or free list
    - a pointer to the return probe
    - the original return address
    - a pointer to the stack address

    With all this stowed away, arch_prepare_kretprobe() then sets the return
    address for the targeted function to a special trampoline function called
    kretprobe_trampoline() implemented in arch/x86_64/kernel/kprobes.c

    * The kprobe completes as normal, with control passing back to the target
    function that executes as normal, and eventually returns to our trampoline
    function.

    * Since a kprobe was installed on kretprobe_trampoline() during system
    initialization, control passes back to kprobes via the architecture
    specific function trampoline_probe_handler() which will lookup the
    instance in an hlist maintained by kernel/kprobes.c, and then call
    the handler function.

    * When trampoline_probe_handler() is done, the kprobes infrastructure
    single steps the original instruction (in this case just a nop), and
    then calls trampoline_post_handler(). trampoline_post_handler() then
    looks up the instance again, puts the instance back on the free list,
    and then makes a long jump back to the original return instruction.

    So to recap, to instrument the exit path of a function this implementation
    will cause four interruptions:

    - A breakpoint at the very beginning of the function allowing us to
    switch out the return address
    - A single step interruption to execute the original instruction that
    we replaced with the break instruction (normal kprobe flow)
    - A breakpoint in the trampoline function where our instrumented function
    returned to
    - A single step interruption to execute the original instruction that
    we replaced with the break instruction (normal kprobe flow)

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Lynch
     
  • This patch adds function-return probes to kprobes for the i386
    architecture. This enables you to establish a handler to be run when a
    function returns.

    1. API

    Two new functions are added to kprobes:

    int register_kretprobe(struct kretprobe *rp);
    void unregister_kretprobe(struct kretprobe *rp);

    2. Registration and unregistration

    2.1 Register

    To register a function-return probe, the user populates the following
    fields in a kretprobe object and calls register_kretprobe() with the
    kretprobe address as an argument:

    kp.addr - the function's address

    handler - this function is run after the ret instruction executes, but
    before control returns to the return address in the caller.

    maxactive - The maximum number of instances of the probed function that
    can be active concurrently. For example, if the function is non-
    recursive and is called with a spinlock or mutex held, maxactive = 1
    should be enough. If the function is non-recursive and can never
    relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should
    be enough. maxactive is used to determine how many kretprobe_instance
    objects to allocate for this particular probed function. If maxactive <= 0,
    it is set to a default value.
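
    As a quick sketch of the registration side (sys_symlink and the handler body
    are only illustrative; maxactive = NR_CPUS allows one outstanding return per
    CPU):

    static int my_ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
    {
            printk("probed function returned\n");
            return 0;
    }

    static struct kretprobe my_kretprobe = {
            .handler   = my_ret_handler,
            .maxactive = NR_CPUS,
    };

    my_kretprobe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("sys_symlink");
    if (register_kretprobe(&my_kretprobe))
            printk(KERN_DEBUG "Unable to register return probe!\n");

    unregister_kretprobe(&my_kretprobe);
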
    Signed-off-by: Prasanna S Panchamukhi
    Signed-off-by: Frederik Deweerdt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hien Nguyen
     
  • Move some code duplicated in both callers into vfs_quota_on_mount

    Signed-off-by: Christoph Hellwig
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
    Patch to add checks to get_chrdev_list and get_blkdev_list to prevent reads
    of /proc/devices from spilling over the provided page if more than 4096
    bytes of string data are generated from all the registered character and
    block devices in a system.

    Signed-off-by: Neil Horman
    Cc: Christoph Hellwig
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     
  • The preempt_count member of struct thread_info is currently either defined
    as int, unsigned int or __s32 depending on arch. This patch makes the type
    of preempt_count an int on all archs.

    Having preempt_count be an unsigned type prevents the catching of
    preempt_count < 0 bugs, and using int on some archs and __s32 on others is
    not exactly "neat" - much nicer when it's just int all over.

    A previous version of this patch was already ACK'ed by Robert Love, and the
    only change in this version of the patch compared to the one he ACK'ed is
    that this one also makes sure the preempt_count member is consistently
    commented.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Looks like locking can be optimised quite a lot. Increase lock widths
    slightly so lo_lock is taken fewer times per request. Also it was quite
    trivial to cover lo_pending with that lock, and remove the atomic
    requirement. This also makes memory ordering explicitly correct, which is
    nice (not that I particularly saw any mem ordering bugs).

    Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K
    block size, 4K page size). System is 2 socket Xeon with HT (4 thread).

    intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh

    Before:
    0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k
    0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
    0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
    0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k
    0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k

    After:
    0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
    0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
    0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
    0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
    0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • This patch creates a new kstrdup library function and changes the "local"
    implementations in several places to use this function.
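
    For reference, the helper is essentially the following (a sketch; the exact
    type of the gfp argument has varied over time, so take the signature as
    approximate):

    #include <linux/slab.h>
    #include <linux/string.h>

    char *kstrdup(const char *s, unsigned int gfp)
    {
            size_t len;
            char *buf;

            if (!s)
                    return NULL;

            len = strlen(s) + 1;
            buf = kmalloc(len, gfp);
            if (buf)
                    memcpy(buf, s, len);    /* copy including the trailing NUL */
            return buf;
    }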

    Most of the changes come from the sound and net subsystems. The sound part
    had already been acknowledged by Takashi Iwai and the net part by David S.
    Miller.

    I left UML alone for now because I would need more time to read the code
    carefully before making changes there.

    Signed-off-by: Paulo Marques
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paulo Marques
     
  • Based on analysis and a patch from Russ Weight

    There is a race condition that can occur if an inode is allocated and then
    released (using iput) during the ->fill_super functions. The race
    condition is between kswapd and mount.

    For most filesystems this can only happen in an error path when kswapd is
    running concurrently. For isofs, however, the error can occur in a more
    common code path (which is how the bug was found).

    The logic here is "we want final iput() to free inode *now* instead of
    letting it sit in cache if fs is going down or had not quite come up". The
    problem is with kswapd seeing such inodes in the middle of being killed and
    happily taking over.

    The clean solution would be to tell kswapd to leave those inodes alone and
    let our final iput deal with them. I.e. add a new flag
    (I_FORCED_FREEING), set it before write_inode_now() there and make
    prune_icache() leave those alone.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Viro
     
    This patch splits del_timer_sync() into 2 functions. The new one,
    try_to_del_timer_sync(), returns -1 when it hits an executing timer.

    It can be used in interrupt context, or when the caller holds locks which
    can prevent completion of the timer's handler.

    NOTE: currently it can't be used in interrupt context in the UP case,
    because ->running_timer is used only with CONFIG_SMP.

    Should the need arise, it is possible to kill the #ifdef CONFIG_SMP in
    set_running_timer(); it is cheap.
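
    The intended usage pattern looks roughly like the following sketch
    (struct my_dev and its lock are illustrative stand-ins for a driver that
    cannot simply drop its lock and call del_timer_sync()):

    struct my_dev {                         /* illustrative driver state */
            spinlock_t lock;
            struct timer_list timer;
    };

    static void stop_my_timer(struct my_dev *dev)
    {
            spin_lock_bh(&dev->lock);
            while (try_to_del_timer_sync(&dev->timer) < 0) {
                    /* the handler is running and may want dev->lock: back off */
                    spin_unlock_bh(&dev->lock);
                    cpu_relax();
                    spin_lock_bh(&dev->lock);
            }
            spin_unlock_bh(&dev->lock);
    }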

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This patch tries to solve following problems:

    1. del_timer_sync() is racy. The timer can be fired again after
    del_timer_sync() has checked all CPUs and before it rechecks
    timer_pending().

    2. It has scalability problems. All cpus are scanned to determine
    if the timer is running on that cpu.

    With this patch del_timer_sync is O(1) and no slower than plain
    del_timer(pending_timer), unless it has to actually wait for
    completion of the currently running timer.

    The only restriction is that the recurring timer should not use
    add_timer_on().

    3. Timers are not serialized with respect to themselves.

    If CPU_0 does mod_timer(jiffies+1) while the timer is currently
    running on CPU 1, it is quite possible that local interrupt on
    CPU_0 will start that timer before it finished on CPU_1.

    4. The timer locking is suboptimal. __mod_timer() takes 3 locks
    at once and still requires wmb() in del_timer/run_timers.

    The new implementation takes 2 locks sequentially and does not
    need memory barriers.

    Currently ->base != NULL means that the timer is pending. In that case
    ->base.lock is used to lock the timer. __mod_timer also takes timer->lock
    because ->base can be == NULL.

    This patch uses timer->entry.next != NULL as indication that the timer is
    pending. So it does __list_del(), entry->next = NULL instead of list_del()
    when the timer is deleted.

    The ->base field is used for hashed locking only, it is initialized
    in init_timer() which sets ->base = per_cpu(tvec_bases). When the
    tvec_bases.lock is locked, it means that all timers which are tied
    to this base via timer->base are locked, and the base itself is locked
    too.

    So __run_timers/migrate_timers can safely modify all timers which could
    be found on ->tvX lists (pending timers).

    When the timer's base is locked, and the timer removed from ->entry list
    (which means that _run_timers/migrate_timers can't see this timer), it is
    possible to set timer->base = NULL and drop the lock: the timer remains
    locked.

    This patch adds lock_timer_base() helper, which waits for ->base != NULL,
    locks the ->base, and checks it is still the same.
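
    A sketch of that helper, following the description above (the base type and
    field names are those introduced by this patch, so treat the details as
    approximate):

    static inline timer_base_t *lock_timer_base(struct timer_list *timer,
                                                unsigned long *flags)
    {
            timer_base_t *base;

            for (;;) {
                    base = timer->base;
                    if (likely(base != NULL)) {
                            spin_lock_irqsave(&base->lock, *flags);
                            if (likely(base == timer->base))
                                    return base;
                            /* the timer migrated to another CPU's base */
                            spin_unlock_irqrestore(&base->lock, *flags);
                    }
                    cpu_relax();
            }
    }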

    __mod_timer() schedules the timer on the local CPU and changes its base.
    However, it does not lock both old and new bases at once. It locks the
    timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and
    unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base,
    and adds this timer. This simplifies the code, because AB-BA deadlock is not
    possible. __mod_timer() also ensures that the timer's base is not changed
    while the timer's handler is running on the old base.

    __run_timers(), del_timer() do not change ->base anymore, they only clear
    pending flag.

    So del_timer_sync() can test timer->base->running_timer == timer to detect
    whether it is running or not.

    We don't need timer_list->lock anymore, this patch kills it.

    We also don't need barriers. del_timer() and __run_timers() used smp_wmb()
    before clearing timer's pending flag. It was needed because __mod_timer()
    did not lock old_base if the timer is not pending, so __mod_timer()->list_add()
    could race with del_timer()->list_del(). With this patch these functions are
    serialized through base->lock.

    One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch
    adds global

    struct timer_base_s {
            spinlock_t lock;
            struct timer_list *running_timer;
    } __init_timer_base;

    which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s
    struct are replaced by struct timer_base_s t_base.

    It is indeed ugly. But this can't have scalability problems. The global
    __init_timer_base.lock is used only when __mod_timer() is called for the first
    time AND the timer was compile time initialized. After that the timer migrates
    to the local CPU.

    Signed-off-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Signed-off-by: Renaud Lienhart
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Replace BLK_TAGS_PER_LONG with BITS_PER_LONG and remove unused BLK_TAGS_MASK.

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
    blk_queue_tag->real_max_depth was used to optimize out unnecessary
    allocations/frees on tag resize. However, the whole thing was very broken:
    tag_map was never allocated to real_max_depth, resulting in access beyond
    the end of the map, and bits in [max_depth..real_max_depth] were set when
    initializing a map and copied when resizing, resulting in pre-occupied tags.

    As the gain of the optimization is very small, well, almost nil, remove the
    whole thing.

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
    Add 2 macros to set and get debugreg on x86_64. This is useful for Xen
    because it will only need to redefine each macro as a hypervisor call.
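
    The pair boils down to a thin asm wrapper, roughly as follows (a sketch
    matching the description; "register" here is the debug register number):

    #define get_debugreg(var, register)                         \
                    __asm__("movq %%db" #register ", %0"        \
                            : "=r" (var))
    #define set_debugreg(value, register)                       \
                    __asm__("movq %0, %%db" #register           \
                            : /* no output */                   \
                            : "r" (value))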

    Signed-off-by: Vincent Hanquez
    Cc: Ian Pratt
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vincent Hanquez
     
  • Rename user_mode to user_mode_vm and add a user_mode macro similar to the
    x86-64 one.

    This is useful for Xen because the linux xen kernel does not run at the same
    privilege level as a vanilla linux kernel, and with this we just need to
    redefine user_mode().
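
    On i386 the result looks roughly like this (sketch; user_mode_vm()
    additionally treats vm86 mode as user space, which the plain CS check
    would miss):

    #define user_mode(regs)    (3 & (regs)->xcs)
    #define user_mode_vm(regs) ((VM_MASK & (regs)->eflags) || user_mode(regs))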

    Signed-off-by: Vincent Hanquez
    Cc: Ian Pratt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vincent Hanquez
     
    Add 2 macros to set and get debugreg on x86. This is useful for Xen because
    it will only need to redefine each macro as a hypervisor call.

    Signed-off-by: Vincent Hanquez
    Cc: Ian Pratt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vincent Hanquez
     
  • Eliminate duplicate definition of rdpmc in x86-64's mtrr.h.

    Signed-off-by: Jan Beulich
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use
    unsigned long.

    So make cpu_khz unsigned int on x86 as well.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
    csum_and_copy_to_user is static inline and uses VERIFY_WRITE. The patch
    allows asm/uaccess.h to be removed from i386_ksyms.c without dependency
    surprises.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ.
    Reducing the timer frequency may have important performance benefits on
    large systems.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Make the timer frequency selectable. The timer interrupt may cause bus
    and memory contention in large NUMA systems since the interrupt occurs
    on each processor HZ times per second.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and
    use heuristics to skip the unique I/O APIC ID check on systems that don't
    need it. The patch disables the unique I/O APIC ID check for Xeon-based and
    other platforms that don't use the serial APIC bus for interrupt delivery.
    Andi stated that AMD systems don't need unique IO_APIC_IDs either.

    Signed-off-by: Natalie Protasevich
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Natalie Protasevich
     
    Patch to allocate the control structures for ide devices on the node of
    the device itself (for NUMA systems). The patch depends on the Slab API
    change patch by Manfred and me (in mm) and the pcidev_to_node patch that I
    posted today.

    Does some realignment too.

    Signed-off-by: Justin M. Forbes
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pravin Shelar
    Signed-off-by: Shobhit Dayal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Define pcibus_to_node to be able to figure out which NUMA node contains a
    given PCI device. This defines pcibus_to_node(bus) in
    include/linux/topology.h and adjusts the macros for i386 and x86_64 that
    already provided a way to determine the cpumask of a pci device.

    x86_64 was changed to not build an array of cpumasks anymore. Instead, an
    array of nodes is built which can be used to generate the cpumask via
    node_to_cpumask.
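
    Illustrative sketch of the resulting arrangement (the per-bus node array
    name is invented here; only the shape is taken from the description above):

    extern int bus_to_node[256];            /* hypothetical table filled during PCI probing */

    #define pcibus_to_node(bus)     (bus_to_node[(bus)->number])
    #define pcibus_to_cpumask(bus)  node_to_cpumask(pcibus_to_node(bus))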

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    asm-generic/topology.h must also be included if CONFIG_NUMA is set, in order
    to provide the fallback pcibus_to_node function.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Use asm-generic/topology.h to fix yet another pcibus_to_node() build error.

    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hirokazu Takata
     
  • Issue:
    The current tsc-based delay calibration can result in significant errors in
    the loops_per_jiffy count when platform events like SMIs (System Management
    Interrupts, which are non-maskable) are present. This could lead to a
    potential kernel panic(). The issue is becoming more visible with the 2.6
    kernel (as the default HZ is 1000) and on platforms with higher SMI handling
    latencies. During boot, SMIs are mostly used by the BIOS (for things like
    legacy keyboard emulation).

    Description:
    The pseudocode for the current delay calibration with tsc-based delay looks like:
    (0) Estimate a value for loops_per_jiffy
    (1) While (loops_per_jiffy estimate is accurate enough)
    (2) wait for jiffy transition (jiffy1)
    (3) Note down current tsc (tsc1)
    (4) loop until tsc becomes tsc1 + loops_per_jiffy
    (5) check whether jiffy changed since jiffy1 or not and refine
    loops_per_jiffy estimate

    Consider the following cases
    Case 1:
    If SMIs happen between (2) and (3) above, we can end up with a
    loops_per_jiffy value that is too low. This results in shorter delays and
    the kernel can panic() during boot (mostly at IOAPIC timer initialization,
    in timer_irq_works(), as we don't have enough timer interrupts in a
    specified interval).

    Case 2:
    If SMIs happen between (3) and (4) above, then we can end up with a
    loops_per_jiffy value that is too high. And with the current i386 code, too
    high an lpj value (greater than 17M) can result in an overflow in
    delay.c:__const_udelay(), again resulting in a shorter delay and panic().

    Solution:
    The patch below makes the calibration routine aware of asynchronous events
    like SMIs. We increase the delay calibration time and also identify any
    significant errors (greater than 12.5%) in the calibration and notify the
    user.

    Patch below changes both i386 and x86-64 architectures to use this
    new and improved calibrate_delay_direct() routine.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venkatesh Pallipadi
     
  • This patch adds in the necessary support for sparsemem such that x86-64
    kernels may use sparsemem as an alternative to discontigmem for NUMA
    kernels. Note that this does not preclude one from continuing to build NUMA
    kernels using discontigmem, but merely allows the option to build NUMA
    kernels with sparsemem.

    Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels
    results in reduced text size for otherwise equivalent kernels as shown in
    the example builds below:

       text    data     bss     dec    hex  filename
    2371036  765884 1237108 4374028 42be0c  vmlinux.discontig
    2366549  776484 1302772 4445805 43d66d  vmlinux.sparse

    Signed-off-by: Matt Tolentino
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Tolentino
     
    In order to use the alternative sparsemem implementation for NUMA kernels,
    we need to reorganize the config options. This patch effectively abstracts
    out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases. Thus,
    the discontigmem implementation may be employed as always, but the
    sparsemem implementation may be used alternatively.

    Signed-off-by: Matt Tolentino
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Tolentino
     
  • Provide the architecture specific implementation for SPARSEMEM for PPC64
    systems.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Mike Kravetz (in part)
    Signed-off-by: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Provide an implementation of early_pfn_to_nid for PPC64. This is used by
    memory models to determine the node from which to take allocations before the
    memory allocators are fully initialised.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft