20 Apr, 2006

6 commits

  • * 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
    [PATCH] block/elevator.c: remove unused exports
    [PATCH] splice: fix smaller sized splice reads
    [PATCH] Don't inherit ->splice_pipe across forks
    [patch] cleanup: use blk_queue_stopped
    [PATCH] Document online io scheduler switching

    Linus Torvalds
     
  • In cases where a struct kretprobe's *_handler fields are non-NULL, it is
    possible to cause a system crash, due to the possibility of calls ending up
    in zombie functions. Documentation clearly states that unused *_handlers
    should be set to NULL, but kprobe users sometimes fail to do so.

    Fix it by setting the non-relevant fields of the struct kretprobe to NULL.
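
    A sketch of the shape of the fix (presumably in register_kretprobe();
    details hedged): explicitly NULL the handler fields the kretprobe
    machinery does not own, so a stale user-supplied pointer can never be
    called:

        rp->kp.pre_handler = pre_handler_kretprobe;
        rp->kp.post_handler = NULL;
        rp->kp.fault_handler = NULL;
        rp->kp.break_handler = NULL;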

    Signed-off-by: Ananth N Mavinakayanahalli
    Acked-by: Jim Keniston
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ananth N Mavinakayanahalli
     
    It's really task private, so clear that field on fork after copying the
    task structure.
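
    A minimal sketch of the fix's shape (copy_process() in kernel/fork.c
    duplicates the parent's task_struct wholesale, so the private field
    must be cleared afterwards; exact placement is an assumption):

        p->splice_pipe = NULL;  /* task-private pipe cache; never inherited */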

    Signed-off-by: Jens Axboe

    Jens Axboe
     
    Those also break userland regs like the following.

    00000000 :
       0:   0f b7 44 24 0c          movzwl 0xc(%esp),%eax
       5:   83 ca ff                or     $0xffffffff,%edx
       8:   0f b7 4c 24 08          movzwl 0x8(%esp),%ecx
       d:   66 83 f8 ff             cmp    $0xffffffff,%ax
      11:   0f 44 c2                cmove  %edx,%eax
      14:   66 83 f9 ff             cmp    $0xffffffff,%cx
      18:   0f 45 d1                cmovne %ecx,%edx
      1b:   89 44 24 0c             mov    %eax,0xc(%esp)
      1f:   89 54 24 08             mov    %edx,0x8(%esp)
      23:   e9 fc ff ff ff          jmp    24

    where the tailcall at the end overwrites the incoming stack-frame.
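
    For reference, the workaround is an empty asm that ties the return
    value to a register, preventing gcc from compiling the call as a
    tail-jump that reuses the caller's incoming argument slots (a sketch
    of the i386 macro of that era; treat the exact definition as an
    assumption):

        #define prevent_tail_call(ret) __asm__ ("" : "=r" (ret) : "0" (ret))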

    Signed-off-by: OGAWA Hirofumi
    [ I would _really_ like to have a way to tell gcc about calling
    conventions. The "prevent_tail_call()" macro is pretty ugly ]
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • The function free_pagedir() used by swsusp for freeing its internal data
    structures clears the PG_nosave and PG_nosave_free flags for each page
    being freed.

    However, during resume PG_nosave_free set means that the page in
    question is "unsafe" (ie. it will be overwritten in the process of
    restoring the saved system state from the image), so it should not be
    used for the image data.

    Therefore free_pagedir() should not clear PG_nosave_free if it's called
    during resume (otherwise "unsafe" pages freed by it may be used for
    storing the image data and the data may get corrupted later on).
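
    A sketch of the shape of the fix (the flag name is an assumption; the
    real patch threads a similar argument through so callers can say
    whether they run during resume):

        static void free_pagedir(struct pbe *pblist, int clear_nosave_free)
        {
            struct pbe *pbe;

            while (pblist) {
                pbe = pblist->next;
                ClearPageNosave(virt_to_page(pblist));
                if (clear_nosave_free)      /* skipped during resume */
                    ClearPageNosaveFree(virt_to_page(pblist));
                free_page((unsigned long)pblist);
                pblist = pbe;
            }
        }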

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • While we can currently walk through thread groups, process groups, and
    sessions with just the rcu_read_lock, this opens the door to walking the
    entire task list.

    We already have all of the other RCU guarantees, so there is no cost in
    doing this; it should be enough so that proc can stop taking the tasklist
    lock during readdir.

    prev_task was killed because it had no users, and using it would miss new
    tasks when doing an rcu traversal.
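
    With that in place, a reader can cover every task under RCU alone,
    e.g. (the loop body is illustrative):

        struct task_struct *p;

        rcu_read_lock();
        for_each_process(p) {
            /* examine p; must not sleep, and must not use p after
             * rcu_read_unlock() without taking a reference */
        }
        rcu_read_unlock();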

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

15 Apr, 2006

2 commits

  • Somehow in the midst of dotting i's and crossing t's during
    the merge up to rc1 we wound up keeping __put_task_struct_cb
    when it should have been killed as it no longer has any users.
    Sorry I probably should have caught this while it was
    still in the -mm tree.

    Having the old code there gets confusing when reading
    through the code and trying to understand what is
    happening.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
    Since the last user was removed in -mm, we can now remove this long-deprecated
    function.

    Signed-off-by: Adrian Bunk
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Adrian Bunk
     

14 Apr, 2006

1 commit

  • This reverts most of commit 30e0fca6c1d7d26f3f2daa4dd2b12c51dadc778a.
    It broke the case of non-leader MT exec when ptraced.
    I think the bug it was intended to fix was already addressed by commit
    788e05a67c343fa22f2ae1d3ca264e7f15c25eaf.

    Signed-off-by: Roland McGrath
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

11 Apr, 2006

10 commits

  • Commit e56d090310d7625ecb43a1eeebd479f04affb48b

    [PATCH] RCU signal handling

    made this BUG_ON() unsafe. This code runs under ->siglock,
    while switch_exec_pids() takes tasklist_lock.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • * 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
    [PATCH] vfs: add splice_write and splice_read to documentation
    [PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
    [PATCH] splice: warning fix
    [PATCH] another round of fs/pipe.c cleanups
    [PATCH] splice: comment styles
    [PATCH] splice: add Ingo as addition copyright holder
    [PATCH] splice: unlikely() optimizations
    [PATCH] splice: speedups and optimizations
    [PATCH] pipe.c/fifo.c code cleanups
    [PATCH] get rid of the PIPE_*() macros
    [PATCH] splice: speedup __generic_file_splice_read
    [PATCH] splice: add direct fd fd splicing support
    [PATCH] splice: add optional input and output offsets
    [PATCH] introduce a "kernel-internal pipe object" abstraction
    [PATCH] splice: be smarter about calling do_page_cache_readahead()
    [PATCH] splice: optimize the splice buffer mapping
    [PATCH] splice: cleanup __generic_file_splice_read()
    [PATCH] splice: only call wake_up_interruptible() when we really have to
    [PATCH] splice: potential !page dereference
    [PATCH] splice: mark the io page as accessed

    Linus Torvalds
     
  • Add a cpu_relax() to the hand-coded spinwait in hrtimer_cancel().
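
    The fixed loop, roughly as it reads in kernel/hrtimer.c of that era:

        int hrtimer_cancel(struct hrtimer *timer)
        {
            for (;;) {
                int ret = hrtimer_try_to_cancel(timer);

                if (ret >= 0)
                    return ret;
                cpu_relax();    /* the fix: be polite to SMT siblings while spinning */
            }
        }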

    Signed-off-by: Joe Korty
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Korty
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Implement the scheduled unexport of panic_timeout.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • We need the boot CPU's tvec_bases[] entry to be initialised super-early in
    boot, for early_serial_setup(). That runs within setup_arch(), before even
    per-cpu areas are initialised.

    The patch changes tvec_bases to use compile-time initialisation, and adds a
    separate array `tvec_base_done' to keep track of which CPU has had its
    tvec_bases[] entry initialised (because we can no longer use the zeroness of
    that tvec_bases[] entry to determine whether it has been initialised).

    Thanks to Eugene Surovegin for diagnosing this.
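
    A sketch of the bookkeeping described above (the array name comes from
    the message; the helper below is hypothetical):

        static char __devinitdata tvec_base_done[NR_CPUS];

        static int __devinit init_timers_cpu(int cpu)
        {
            if (!tvec_base_done[cpu]) {
                init_tvec_base(cpu);        /* hypothetical one-time setup */
                tvec_base_done[cpu] = 1;
            }
            return 0;
        }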

    Cc: Eugene Surovegin
    Cc: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
    For some architectures, a few syscalls are not linked in noMMU mode. In
    that case, the MMU-dependent syscalls need to be defined as
    'cond_syscall'. For example, the ARM architecture selectively links
    sys_mlock depending on the mode configuration.

    In the case of FRV, this has been managed by #ifdef CONFIG_MMU macros in
    arch/frv/kernel/entry.S. However, these conditional macros become
    redundant once the syscalls are defined as cond_syscall. Compilation was
    tested with FRV toolchains for both MMU and noMMU modes.
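
    For illustration, cond_syscall() emits a weak alias to sys_ni_syscall()
    (which returns -ENOSYS), so a syscall left out of the link still
    resolves safely -- roughly:

        /* include/linux/linkage.h (shape of the era's definition) */
        #define cond_syscall(x) asm(".weak\t" #x "\n\t.set\t" #x ",sys_ni_syscall")

        /* kernel/sys_ni.c: MMU-only syscalls declared conditional */
        cond_syscall(sys_mlock);
        cond_syscall(sys_munlock);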

    Signed-off-by: Hyok S. Choi
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hyok S. Choi
     
  • RT tasks are being awakened on the expired array when expired_starving() is
    true, whereas they really should be excluded. Fix.
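
    Conceptually (not the literal diff), the expired-array placement gains
    an rt_task() exclusion:

        if (expired_starving(rq) && !rt_task(p))
            target = rq->expired;   /* RT tasks stay on the active array */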

    Signed-off-by: Mike Galbraith
    Acked-by: Ingo Molnar
    Cc: Con Kolivas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Galbraith
     
    Fix a starvation problem that occurs when a stream of highly interactive
    tasks delays an array switch for extended periods despite
    EXPIRED_STARVING(rq) being true. AFAICT, the only choice is to enqueue
    awakening tasks on the expired
    array in this case.

    Without this patch, it can be nearly impossible to remotely login to a busy
    server, and interactive shell commands can starve for minutes.

    Also, convert the EXPIRED_STARVING macro into an inline function which humans
    can understand.
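
    The converted helper, close to what landed (2.6.16-era types; details
    hedged):

        static inline int expired_starving(runqueue_t *rq)
        {
            if (rq->curr->static_prio > rq->best_expired_prio)
                return 1;
            if (!STARVATION_LIMIT || !rq->expired_timestamp)
                return 0;
            if (jiffies - rq->expired_timestamp >
                    STARVATION_LIMIT * rq->nr_running)
                return 1;
            return 0;
        }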

    Signed-off-by: Mike Galbraith
    Acked-by: Ingo Molnar
    Cc: Nick Piggin
    Acked-by: Con Kolivas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Galbraith
     
    It's more efficient for sendfile() emulation. Basically we cache an
    internal private pipe and just use that as the intermediate area for
    pages. Direct splicing is not available from sys_splice(); it is only
    meant to be used for sendfile() emulation.

    Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
    exit for the normal fast path.
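
    A conceptual sketch of the caching (the allocator name is an
    assumption):

        struct pipe_inode_info *pipe = current->splice_pipe;

        if (!pipe) {
            pipe = alloc_pipe_info(NULL);   /* assumed allocator */
            current->splice_pipe = pipe;    /* reused by later sendfile() calls */
        }
        /* splice input -> pipe, then pipe -> output, reusing 'pipe' */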

    Signed-off-by: Jens Axboe

    Jens Axboe
     

10 Apr, 2006

1 commit

  • If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
    This is due to the HPET timer not being initialized with the correct
    setting (still using PIT count).

    If HZ changes, this drift can become even more pronounced.

    This patch initializes tick_nsec with the correct setting for the HPET
    timer.

    Vojtech comments:

    "It's not entirely correct (it assumes the HPET ticks totally
    exactly), but it's significantly better than assuming the PIT error
    there."

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jordan Hargrave
     

01 Apr, 2006

17 commits

  • I was grepping through the code and some `grep ganularity -R .` didn't
    catch what I thought. Then looking closer I saw the term "granuality"
    used in only four places (in comments) and granularity in many more
    places describing the same idea. Some other facts:

    - dictionary.com does not know such a word
    - define:granuality on google is not found (and pages for granuality
      are mostly related to patches to the kernel)
    - it has not been discussed as a term on LKML, AFAICS (=Can Search)

    To be consistent, I think granularity should be used everywhere.

    Signed-off-by: Kalin KOZHUHAROV
    Signed-off-by: Adrian Bunk

    Kalin KOZHUHAROV
     
    This changes if () BUG(); constructs to BUG_ON(), which is cleaner,
    contains unlikely(), and can be better optimized away.
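
    The transformation, illustrated:

        if (!ptr)
            BUG();

        /* becomes */

        BUG_ON(!ptr);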

    Signed-off-by: Eric Sesterhenn
    Signed-off-by: Adrian Bunk

    Eric Sesterhenn
     
  • The note that SOFTWARE_SUSPEND doesn't need APM is helpful, but nowadays
    the information that it doesn't need ACPI, too, is even more helpful.

    Signed-off-by: Adrian Bunk

    Adrian Bunk
     
  • Wrong error path in dup_fd() - it should return NULL on error,
    not an address of already freed memory :/

    Triggered by OpenVZ stress test suite.

    What is interesting is that it was causing different oopses in RCU, like
    the one below:
    Call Trace:
    [] rcu_do_batch+0x2c/0x80
    [] rcu_process_callbacks+0x3d/0x70
    [] tasklet_action+0x73/0xe0
    [] __do_softirq+0x10a/0x130
    [] do_softirq+0x4f/0x60
    =======================
    [] smp_apic_timer_interrupt+0x77/0x110
    [] apic_timer_interrupt+0x1c/0x24
    Code: Bad EIP value.
    Kernel panic - not syncing: Fatal exception in interrupt
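
    The shape of the fix in dup_fd(), sketched with illustrative cleanup
    calls:

        out_release:
            /* free the partially copied fd table ... */
            kmem_cache_free(files_cachep, newf);
            return NULL;    /* not 'newf', which was just freed */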

    Signed-Off-By: Pavel Emelianov
    Signed-Off-By: Dmitry Mishin
    Signed-Off-By: Kirill Korotaev
    Signed-Off-By: Linus Torvalds

    Kirill Korotaev
     
  • Simplifies the code, reduces the need for 4 pid hash tables, and makes the
    code more capable.

    In the discussions I had with Oleg it was felt that to a large extent the
    cleanup itself justified the work. Having struct pid dynamically
    allocated means we can create the hash table entry when the pid is
    allocated and free the hash table entry when the pid is freed, instead of
    playing with the hash lists whenever a process attaches or detaches.

    For myself the fact that it gave what my previous task_ref patch gave for free
    with simpler code was a big win. The problem is that if you hold a reference
    to struct task_struct you lock in 10K of low memory. If you do that in a user
    controllable way like /proc does, with an unprivileged but hostile user space
    application with typical resource limits of 1000 fds and 100 processes, I can
    trigger the OOM killer by consuming all of low memory with task structs, on a
    machine with 1GB of low memory.

    If I instead hold a reference to struct pid which holds a pointer to my
    task_struct, I don't suffer from that problem because struct pid is 2 orders
    of magnitude smaller. In fact struct pid is small enough that most other
    kernel data structures dwarf it, so simply limiting the number of referring
    data structures is enough to prevent exhaustion of low memory.

    This splits the current struct pid into two structures, struct pid and struct
    pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
    struct pid_link is the per process linkage into the hash tables and lives in
    struct task_struct. struct pid is given an independent lifetime, and holds
    pointers to each of the pid types.

    The independent life of struct pid simplifies attach_pid and detach_pid,
    because we are always manipulating the list of pids and not the hash table.
    In addition, giving struct pid an independent life makes the concept much
    more powerful.

    Kernel data structures can now embed a struct pid * instead of a pid_t and
    not suffer from pid wrap around problems or from keeping unnecessarily
    large amounts of memory allocated.
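
    A sketch of the resulting layout (close to what landed in
    include/linux/pid.h; details hedged):

        struct pid
        {
            atomic_t count;
            int nr;                                 /* the numeric pid */
            struct hlist_node pid_chain;            /* the one remaining hash table */
            struct hlist_head tasks[PIDTYPE_MAX];   /* lists of attached tasks */
            struct rcu_head rcu;
        };

        struct pid_link
        {
            struct hlist_node node;                 /* entry in pid->tasks[type] */
            struct pid *pid;
        };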

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • A big problem with rcu protected data structures that are also reference
    counted is that you must jump through several hoops to increase the reference
    count. I think someone finally implemented atomic_inc_not_zero(&count) to
    automate the common case. Unfortunately this means you must special case the
    rcu access case.

    When data structures are only visible via rcu in a manner that is not
    determined by the reference count on the object (i.e. tasks are visible
    until their zombies are reaped), there is a much simpler technique we can
    employ: simply delay the decrement of the reference count until the rcu
    interval is over.

    What that means is that the proc code that looks up a task and later
    wants to sleep can now do:

    rcu_read_lock();
    task = find_task_by_pid(some_pid);
    if (task) {
            get_task_struct(task);
    }
    rcu_read_unlock();

    The effect on the rest of the kernel is that put_task_struct becomes
    cheaper and immediate, and in the case where the task has been reaped it
    frees the task immediately instead of unnecessarily waiting until the rcu
    interval is over.

    Cleanup of task_struct does not happen when its reference count drops to
    zero, instead cleanup happens when release_task is called. Tasks can only
    be looked up via rcu before release_task is called. All rcu protected
    members of task_struct are freed by release_task.

    Therefore we can move call_rcu from put_task_struct into release_task. And
    we can modify release_task to not immediately release the reference count
    but instead have it call put_task_struct from the function it gives to
    call_rcu.
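
    The resulting shape, roughly as it appears in kernel/exit.c:

        static void delayed_put_task_struct(struct rcu_head *rhp)
        {
            put_task_struct(container_of(rhp, struct task_struct, rcu));
        }

        /* ... and at the end of release_task(): */
        call_rcu(&p->rcu, delayed_put_task_struct);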

    The end result:

    - get_task_struct is safe in an rcu context where we have just looked
    up the task.

    - put_task_struct() simplifies into its old pre-rcu self.

    This reorganization also makes put_task_struct uncallable from modules,
    as it is not exported, but it does not appear to be called from any
    modules, so this should not be an issue and is trivially fixed.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This just got nuked in mainline. Bring it back because Eric's patches use it.

    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
    The core problem: setsid fails if it is called by init. The effect in
    2.6.16 and the earlier kernels that have this problem is that if you do a
    "ps -j 1" or "ps -ej 1" you will see that init and several of its
    children have process group and session == 0, instead of process group ==
    session == 1, despite init calling setsid.

    The reason it fails is that daemonize calls set_special_pids(1,1) on
    kernel threads that are launched before /sbin/init is called.

    The only remaining effect is that current->signal->leader == 0 for init
    instead of 1, and the setsid call fails. No one has noticed because
    /sbin/init does not check the return value of setsid.

    In 2.4, where we don't have the pidhash table and daemonize doesn't
    exist, setsid actually works for init.

    I care a lot about pid == 1 not being a special case that we leave broken,
    because of the container/jail work that I am doing.

    - Carefully allow init (pid == 1) to call setsid despite the kernel using
    its session.

    - Use find_task_by_pid instead of find_pid because find_pid taking a
    pidtype is going away.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The futex timeval is not checked for correctness. The change does not
    break existing applications as the timeval is supplied by glibc (and glibc
    always passes a correct value), but the glibc-internal tests for this
    functionality fail.
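
    A sketch of the kind of check added in sys_futex() (details are an
    assumption):

        if (utime && (op == FUTEX_WAIT)) {
            if (copy_from_user(&t, utime, sizeof(t)) != 0)
                return -EFAULT;
            if (!timespec_valid(&t))
                return -EINVAL;
            timeout = timespec_to_jiffies(&t) + 1;
        }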

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • To increase the strength of SCHED_BATCH as a scheduling hint we can
    activate batch tasks on the expired array since by definition they are
    latency insensitive tasks.
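
    The change amounts to a one-liner in the activation path -- roughly:

        static void __activate_task(task_t *p, runqueue_t *rq)
        {
            prio_array_t *target = rq->active;

            if (batch_task(p))          /* SCHED_BATCH: latency insensitive */
                target = rq->expired;
            enqueue_task(p, target);
            rq->nr_running++;
        }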

    Signed-off-by: Con Kolivas
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
     
    On-runqueue time is used to elevate priority in schedule().

    In the code it currently requeues tasks even if their priority is not
    elevated, which would end up placing them at the end of their runqueue
    array, effectively delaying them instead of improving their priority.

    Bug spotted by Mike Galbraith.

    This patch removes this requeueing.
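
    After the fix, schedule() only moves the task when its priority
    actually changed -- roughly:

        new_prio = recalc_task_prio(next, next->timestamp + delta);
        if (unlikely(next->prio != new_prio)) {
            dequeue_task(next, array);
            next->prio = new_prio;
            enqueue_task(next, array);
        }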

    Signed-off-by: Con Kolivas
    Acked-by: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
     
    Tasks waiting in SLEEP_NONINTERACTIVE state can now get to best priority,
    so they need to be included in the idle detection code.

    Signed-off-by: Con Kolivas
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
     
    We watch for tasks that sleep for extended periods and don't allow a
    single prolonged sleep period to elevate priority to maximum bonus, to
    prevent cpu-bound tasks from getting high priority with single long
    sleeps. There is a bug in the current code that also penalises tasks that
    already have high priority. Correct that bug.

    Signed-off-by: Con Kolivas
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
     
    Alterations to the pipe code in the kernel made it possible for relative
    starvation to occur, with tasks that slept waiting on a pipe getting
    unfair priority bonuses even if they were otherwise fully cpu bound, so
    the TASK_NONINTERACTIVE flag was introduced to prevent any change to
    sleep_avg while sleeping waiting on a pipe. This change also has the
    converse effect, though, preventing any priority boost from occurring in
    truly interactive tasks that wait on pipes.

    Convert the TASK_NONINTERACTIVE flag to set sleep_type to SLEEP_NONINTERACTIVE
    which will allow a linear bonus to priority based on sleep time thus allowing
    interactive tasks to get high priority if they sleep enough.

    Signed-off-by: Con Kolivas
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
     
  • The activated flag in task_struct is used to track different sleep types and
    its usage is somewhat obfuscated. Convert the variable to an enum with more
    descriptive names without altering the function.
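
    The resulting enum, close to what landed in include/linux/sched.h:

        enum sleep_type {
            SLEEP_NORMAL,
            SLEEP_NONINTERACTIVE,
            SLEEP_INTERACTIVE,
            SLEEP_INTERRUPTED,
        };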

    Signed-off-by: Con Kolivas
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
     
    Currently, count_active_tasks() calls both nr_running() &
    nr_uninterruptible(). Each of these functions does a "for_each_cpu" &
    reads values from the runqueue of each cpu. Although this is not a lot of
    instructions, each runqueue may be located on a different node. Depending
    on the architecture, a unique TLB entry may be required to access each
    runqueue.

    Since there may be more runqueues than cpu TLB entries, a scan of all
    runqueues can thrash the TLB. Each memory reference incurs a TLB miss &
    refill.

    In addition, the runqueue cacheline that contains nr_running &
    nr_uninterruptible may be evicted from the cache between the two passes.
    This causes unnecessary cache misses.

    Combining nr_running() & nr_uninterruptible() into a single function
    substantially reduces the TLB & cache misses on large systems. This
    should have no measurable effect on smaller systems.

    On a 128p IA64 system running a memory stress workload, the new function
    reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.
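
    The combined helper, close to what landed as nr_active() (details
    hedged):

        unsigned long nr_active(void)
        {
            unsigned long i, running = 0, uninterruptible = 0;

            for_each_online_cpu(i) {
                running += cpu_rq(i)->nr_running;
                uninterruptible += cpu_rq(i)->nr_uninterruptible;
            }

            /* per-cpu counters can be transiently negative in aggregate */
            if (unlikely((long)uninterruptible < 0))
                uninterruptible = 0;

            return running + uninterruptible;
        }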

    Signed-off-by: Jack Steiner
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jack Steiner
     
  • It seems that run_hrtimer_queue() is calling get_softirq_time() more
    often than it needs to.

    With this patch, it only calls get_softirq_time() if there's a
    pending timer.
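
    Conceptually, the change hoists an early return ahead of the time
    lookup (a sketch, not the literal diff):

        if (!base->first)
            return;     /* no pending timers: skip get_softirq_time() */

        if (base->get_softirq_time)
            base->softirq_time = base->get_softirq_time();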

    Signed-off-by: Dimitri Sivanich
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dimitri Sivanich