Eric Lee / linux-smarc-t335x-v3.2

19 May, 2007

1 commit

92080309d init/main: use __init_refok to fix section mismatch

Kill a special case in modpost by introducing the
__init_refok marker.

Signed-off-by: Sam Ravnborg

Sam Ravnborg
2007-05-19 15:11:58 +0800

17 May, 2007

2 commits

9fbf09a09 SLUB: Remove depends on EXPERIMENTAL and !ARCH_USES_SLAB_PAGE_STRUCT

No arch sets ARCH_USES_SLAB_PAGE_STRUCT anymore.

Remove the experimental dependency as well since we want to have it as
a real alternative to SLAB.

It all comes down to killing a single line from init/Kconfig.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-17 20:23:03 +0800
afc0cedbe slob: implement RCU freeing

The SLOB allocator should implement SLAB_DESTROY_BY_RCU correctly, because
even on UP, RCU freeing semantics are not equivalent to simply freeing
immediately. This also allows SLOB to be used on SMP.

Signed-off-by: Nick Piggin
Acked-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-05-17 20:23:02 +0800

11 May, 2007

6 commits

e1ad7468c signal/timer/event: eventfd core

This is a very simple and lightweight file descriptor that can be used for
event wait/dispatch by userspace (both wait and dispatch) and by the kernel
(dispatch only). It can be used instead of pipe(2) wherever a pipe would
only be used to signal events. Its kernel overhead is much lower than a
pipe's, and it does not consume two fds. When used in the kernel, it can
offer an fd-bridge that lets, for example, facilities like KAIO or
syslets/threadlets signal the completion of certain operations to an fd.
More generally, an eventfd can be used by the kernel to signal readiness,
in a POSIX poll/select way, of interfaces that would otherwise be incompatible
with it. The API is:

int eventfd(unsigned int count);

The eventfd API accepts an initial "count" parameter, and returns an eventfd
fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

The POLLIN flag is raised when the internal counter is greater than zero.

The POLLOUT flag is raised when at least a value of "1" can be written to the
internal counter.

The POLLERR flag is raised when an overflow in the counter value is detected.

The write(2) operation can never overflow the counter, since it blocks (unless
O_NONBLOCK is set, in which case -EAGAIN is returned).

The eventfd_signal() function, however, can overflow the counter, since it
must not sleep during its operation.

The read(2) function reads the __u64 counter value, and resets the internal
value to zero. If the value read is equal to (__u64) -1, an overflow happened
on the internal counter (due to 2^64 eventfd_signal() posts that have never
been retired; unlikely, but possible).

The write(2) call writes an __u64 count value, and adds it to the current
counter. The eventfd fd supports O_NONBLOCK also.

On the kernel side, we have:

struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);

The eventfd_fget() should be called to get a struct file* from an eventfd fd
(this is an fget() + check of f_op being an eventfd fops pointer).

The kernel can then call eventfd_signal() every time it wants to post an event
to userspace. The eventfd_signal() function can be called from any context.
An eventfd() simple test and bench is available here:

http://www.xmailserver.org/eventfd-bench.c

This is the eventfd-based version of pipetest-4 (pipe(2) based):

http://www.xmailserver.org/pipetest-4.c

Not that performance matters much in the eventfd case, but eventfd-bench
shows almost double the performance of pipetest-4.

[akpm@linux-foundation.org: fix i386 build]
[akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-11 23:29:36 +0800
b215e2839 signal/timer/event: timerfd core

This patch introduces a new system call for timer events delivered through
file descriptors. This allows timer events to be used with standard POSIX
poll(2), select(2) and read(2). As a consequence of supporting the Linux
f_op->poll subsystem, they can be used with epoll(2) too.

The system call is defined as:

int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
without going through the close/open cycle (same as signalfd). If "ufd" is -1,
a new file descriptor will be created, otherwise the existing "ufd" will be
re-programmed.

The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
specified in the "utmr->it_value" parameter is the expiry time for the timer.

If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
otherwise it's a relative time.

If the time specified in "utmr->it_interval" is not zero (zero meaning
tv_sec == 0 and tv_nsec == 0), this is the period at which the following
ticks should be generated.

The "utmr->it_interval" should be set to zero if only one tick is requested.
Setting the "utmr->it_value" to zero will disable the timer, or will create a
timerfd without the timer enabled.

The function returns the new (or same, in case "ufd" is a valid timerfd
descriptor) file descriptor, or -1 in case of error.

As stated before, the timerfd file descriptor supports poll(2), select(2) and
epoll(2). When a timer event has happened on the timerfd, a POLLIN mask will
be returned.

The read(2) call can be used, and it will return a u32 variable holding the
number of "ticks" that happened on the interface since the last call to
read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN will
be returned if no ticks happened.

A quick test program shows timerfd working correctly on my amd64 box:

http://www.xmailserver.org/timerfd-test.c

[akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-11 23:29:36 +0800
fba2afaae signal/timer/event: signalfd core

This patch series implements the new signalfd() system call.

I took part of the original Linus code (and you know how badly it can be
broken :), and I added even more breakage ;) Signals are fetched from the same
signal queue used by the process, so signalfd will compete with standard
kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
seems to be working fine on my Dual Opteron machine. I made a quick test
program for it:

http://www.xmailserver.org/signafd-test.c

The signalfd() system call implements signal delivery into a file descriptor
receiver. The signalfd file descriptor is created with the following API:

int signalfd(int ufd, const sigset_t *mask, size_t masksize);

The "ufd" parameter allows changing the sigmask of an existing signalfd
without going through a close/create cycle (Linus' idea). Use "ufd" == -1 if
you want a brand new signalfd file.

The "mask" parameter specifies the set of signals that we are interested
in. The "masksize" parameter is the size of "mask".

The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
will return POLLIN when signals are available to be dequeued. As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can be
used together with epoll(2) too.

The read(2) system call will return a "struct signalfd_siginfo" structure in
the userspace supplied buffer. The return value is the number of bytes copied
in the supplied buffer, or -1 in case of error. The read(2) call can also
return 0, in case the sighand structure to which the signalfd was attached,
has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
return -EAGAIN in case no signal is available.

If the size of the buffer passed to read(2) is lower than sizeof(struct
signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
return -ERESTARTSYS in case a signal hits the process. The format of
struct signalfd_siginfo is shown below; which fields are valid depends on the
(->code & __SI_MASK) value, in the same way as for a struct siginfo:

struct signalfd_siginfo {
	__u32 signo;	/* si_signo */
	__s32 err;	/* si_errno */
	__s32 code;	/* si_code */
	__u32 pid;	/* si_pid */
	__u32 uid;	/* si_uid */
	__s32 fd;	/* si_fd */
	__u32 tid;	/* si_tid */
	__u32 band;	/* si_band */
	__u32 overrun;	/* si_overrun */
	__u32 trapno;	/* si_trapno */
	__s32 status;	/* si_status */
	__s32 svint;	/* si_int */
	__u64 svptr;	/* si_ptr */
	__u64 utime;	/* si_utime */
	__u64 stime;	/* si_stime */
	__u64 addr;	/* si_addr */
};

[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-11 23:29:36 +0800
5dc8bf813 signal/timer/event fds: anonymous inode source

This patch adds an anonymous inode source, to be used for files that need
an inode only in order to create a file*. We do not care about having an
inode for each file, and we do not even care about having different names in
the associated dentries (dentry names will be the same for classes of file*).
This allows code reuse, and will be used by epoll, signalfd and timerfd
(and whatever else there'll be).

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-11 23:29:36 +0800
0e29b24aa Explicitly set pgid and sid of init process

Explicitly set pgid and sid of init process to 1.

Signed-off-by: Sukadev Bhattiprolu
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Serge Hallyn
Cc: Eric Biederman
Cc: Herbert Poetzl
Cc:
Acked-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2007-05-11 23:29:35 +0800
d4751a279 SLUB: SLUB_DEBUG must depend on SLUB

Otherwise people get asked about SLUB_DEBUG even if they have another
slab allocator enabled.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-11 00:26:53 +0800

10 May, 2007

5 commits

9a9136e27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial

* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
sound: convert "sound" subdirectory to UTF-8
MAINTAINERS: Add cxacru website/mailing list
include files: convert "include" subdirectory to UTF-8
general: convert "kernel" subdirectory to UTF-8
documentation: convert the Documentation directory to UTF-8
Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
remove broken URLs from net drivers' output
Magic number prefix consistency change to Documentation/magic-number.txt
trivial: s/i_sem /i_mutex/
fix file specification in comments
drivers/base/platform.c: fix small typo in doc
misc doc and kconfig typos
Remove obsolete fat_cvf help text
Fix occurrences of "the the "
Fix minor typoes in kernel/module.c
Kconfig: Remove reference to external mqueue library
Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Correct comments in genrtc.c to refer to correct /proc file.
Fix more "deprecated" spellos.
Fix "deprecated" typoes.
...

Fix trivial comment conflict in kernel/relay.c.

Linus Torvalds
2007-05-10 03:54:17 +0800
73c279927 kthread: don't depend on work queues

Currently there is a circular reference between work queue initialization
and kthread initialization. This prevents the kthread infrastructure from
initializing until after work queues have been initialized.

We want the properties of tasks created with kthread_create to be as close
as possible to the init_task and to not be contaminated by user processes.
The later we start our kthreadd that creates these tasks the harder it is
to avoid contamination from user processes and the more of a mess we have
to clean up because the defaults have changed on us.

So this patch modifies the kthread support to not use work queues but to
instead use a simple list of structures, and to have kthreadd start from
init_task immediately after our kernel thread that execs /sbin/init.

By being a true child of init_task we only have to change those process
settings that we want to have different from init_task, such as our process
name, the cpus that are allowed, blocking all signals and setting SIGCHLD
to SIG_IGN so that all of our children are reaped automatically.

By being a true child of init_task we also naturally get our ppid set to 0
and do not wind up as a child of PID == 1. This ensures that tasks generated
by kthread_create will not slow down the functioning of the wait family of
functions.

[akpm@linux-foundation.org: use interruptible sleeps]
Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-05-10 03:30:53 +0800
dd2a345f8 Display all possible partitions when the root filesystem failed to mount

Display all possible partitions when the root filesystem is not mounted.
This helps to track spell'o's and missing drivers.

Updated to work with newer kernels.

Example output:

VFS: Cannot open root device "foobar" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
0800    8388608 sda driver: sd
  0801     192748 sda1
  0802    8193150 sda2
0810    4194304 sdb driver: sd
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

[akpm@linux-foundation.org: cleanups, fix printk warnings]
Signed-off-by: Jan Engelhardt
Cc: Dave Gilbert
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Gilbert
2007-05-10 03:30:48 +0800
34013886e Fix spellings of slab allocator section in init/Kconfig

Fix some of the spelling issues. Fix sentences. Discourage SLOB use
since SLUB can pack objects more densely.

Signed-off-by: Christoph Lameter
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-10 03:30:46 +0800
41ecc55b8 SLUB: add CONFIG_SLUB_DEBUG

CONFIG_SLUB_DEBUG can be used to switch off the debugging and sysfs components
of SLUB. Thus SLUB will be able to replace SLOB. SLUB can arrange objects in
a denser way than SLOB and the code size should be minimal without debugging
and sysfs support.

Note that CONFIG_SLUB_DEBUG is materially different from CONFIG_SLAB_DEBUG.
CONFIG_SLAB_DEBUG is used to enable slab debugging in SLAB. SLUB enables
debugging via a boot parameter. SLUB debug code should always be present.

CONFIG_SLUB_DEBUG can be modified in the embedded config section.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-10 03:30:45 +0800

09 May, 2007

5 commits

b0e376504 Kconfig: Remove reference to external mqueue library

Remove the reference to an external mqueue library since that was
merged into glibc in 2004.

Signed-off-by: Robert P. J. Day
Signed-off-by: Adrian Bunk

Robert P. J. Day
2007-05-09 13:25:13 +0800
3dde6ad8f Fix trivial typos in Kconfig* files

Fix several typos in help text in Kconfig* files.

Signed-off-by: David Sterba
Signed-off-by: Adrian Bunk

David Sterba
2007-05-09 13:12:20 +0800
794543a23 Move LOG_BUF_SHIFT to a more sensible place

Several people have observed that perhaps LOG_BUF_SHIFT should be in a more
obvious place than under DEBUG_KERNEL. Under some circumstances (such as the
PARISC architecture), DEBUG_KERNEL can increase kernel size, which is an
undesirable trade off for something as trivial as increasing the kernel log
buffer size.

Instead, move LOG_BUF_SHIFT into "General Setup", so that people are more
likely to be able to change it when the default buffer size is
insufficient.

Signed-off-by: Alistair John Strachan
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alistair John Strachan
2007-05-09 02:15:14 +0800
8f0c45cdf enhance initcall_debug, measure latency

enhance the initcall_debug boot option:

- measure the time the initcall took to execute and report
it in units of milliseconds.

- show the return code of initcalls (useful to see failures and
to make sure that an initcall hung)

[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2007-05-09 02:15:07 +0800
46595390e init/do_mounts.c: proper prepare_namespace() prototype

Add a proper prototype for prepare_namespace() in include/linux/init.h.

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2007-05-09 02:15:00 +0800

08 May, 2007

4 commits

726162b5d freezer: remove PF_NOFREEZE from handle_initrd

Make handle_initrd() call try_to_freeze() in a suitable place instead of setting
PF_NOFREEZE for the current task.

Signed-off-by: Rafael J. Wysocki
Acked-by: Nigel Cunningham
Acked-by: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rafael J. Wysocki
2007-05-08 03:12:59 +0800
1394f0322 blackfin architecture

This adds support for the Analog Devices Blackfin processor architecture, and
currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561
(Dual Core) devices, with a variety of development platforms including those
available from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP,
BF561-EZKIT), and Bluetechnix! Tinyboards.

The Blackfin architecture was jointly developed by Intel and Analog Devices
Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced in
December of 2000. Since then ADI has put this core into its Blackfin
processor family of devices. The Blackfin core has the advantages of a clean,
orthogonal, RISC-like microprocessor instruction set. It combines a dual-MAC
(Multiply/Accumulate), state-of-the-art signal processing engine and
single-instruction, multiple-data (SIMD) multimedia capabilities into a single
instruction-set architecture.

The Blackfin architecture, including the instruction set, is described by the
ADSP-BF53x/BF56x Blackfin Processor Programming Reference
http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf

The Blackfin processor is already supported by major releases of gcc, and
there are binary and source rpms/tarballs for many architectures at:
http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete
documentation, including "getting started" guides available at:
http://docs.blackfin.uclinux.org/ which provides links to the sources and
patches you will need in order to set up a cross-compiling environment for
bfin-linux-uclibc

This patch, as well as the other patches (toolchain, distribution,
uClibc) are actively supported by Analog Devices Inc, at:
http://blackfin.uclinux.org/

We have tested this on LTP, and our test plan (including pass/fails) can
be found at:
http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel

[m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files]
Signed-off-by: Bryan Wu
Signed-off-by: Mariusz Kozlowski
Signed-off-by: Aubrey Li
Signed-off-by: Jie Zhang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bryan Wu
2007-05-08 03:12:58 +0800
81819f0fc SLUB core

This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.

A. Management of object queues

A particular concern was the complex management of the numerous object
queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
each allocating CPU and use objects from a slab directly instead of
queueing them up.

B. Storage overhead of object queues

SLAB object queues exist per node, per CPU. The alien cache queue even
has a queue array that contains a queue for each processor on each
node. For very large systems the number of queues and the number of
objects that may be caught in those queues grows exponentially. On our
systems with 1k nodes / processors we have several gigabytes just tied up
for storing references to objects for those queues. This does not include
the objects that could be on those queues. One fears that the whole
memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

SLAB has overhead at the beginning of each slab. This means that data
cannot be naturally aligned at the beginning of a slab block. SLUB keeps
all meta data in the corresponding struct page. Objects can be naturally
aligned in the slab. For example, a 128 byte object will be aligned at 128
byte boundaries and can fit tightly into a 4k page with no bytes left over.
SLAB cannot do this.

D. SLAB has a complex cache reaper

SLUB does not need a cache reaper for UP systems. On SMP systems
the per CPU slab may be pushed back into partial list but that
operation is simple and does not require an iteration over a list
of objects. SLAB expires per CPU, shared and alien object queues
during cache reaping which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

SLUB pushes NUMA policy handling into the page allocator. This means that
allocation is coarser (SLUB does interleave on a page level) but that
situation was also present before 2.6.13. SLAB's application of
policies to individual slab objects allocated in SLAB is
certainly a performance concern due to the frequent references to
memory policies which may lead a sequence of objects to come from
one node after another. SLUB will get a slab full of objects
from one node and then will switch to the next.

F. Reduction of the size of partial slab lists

SLAB has per node partial lists. This means that over time a large
number of partial slabs may accumulate on those lists. These can
only be reused if allocations occur on specific nodes. SLUB has a global
pool of partial slabs and will consume slabs from that pool to
decrease fragmentation.

G. Tunables

SLAB has sophisticated tuning abilities for each slab cache. One can
manipulate the queue sizes in detail. However, filling the queues still
requires use of the spin lock to check out slabs. SLUB has a global
parameter (min_slab_order) for tuning. Increasing the minimum slab
order can decrease the locking overhead. The bigger the slab order, the
fewer pages move between the per CPU and partial lists, and the
better SLUB will scale.

G. Slab merging

We often have slab caches with similar parameters. SLUB detects those
on boot up and merges them into the corresponding general caches. This
leads to more effective memory use. About 50% of all caches can
be eliminated through slab merging. This will also decrease
slab fragmentation because partially allocated slabs can be filled
up again. Slab merging can be switched off by specifying
slub_nomerge on boot up.

Note that merging can expose heretofore unknown bugs in the kernel
because corrupted objects may now be placed differently and corrupt
differing neighboring objects. Enable sanity checks to find those.

H. Diagnostics

The current slab diagnostics are difficult to use and require a
recompilation of the kernel. SLUB contains debugging code that
is always available (but is kept out of the hot code paths).
SLUB diagnostics can be enabled via the "slub_debug" option.
Parameters can be specified to select a single or a group of
slab caches for diagnostics. This means that the system is running
with the usual performance and it is much more likely that
race conditions can be reproduced.

I. Resiliency

If basic sanity checks are on then SLUB is capable of detecting
common error conditions and recover as best as possible to allow the
system to continue.

J. Tracing

Tracing can be enabled via the slub_debug=T option
during boot. SLUB will then log all actions on that slabcache
and dump the object contents on free.

K. On demand DMA cache creation.

Generally DMA caches are not needed. If a kmalloc is used with
__GFP_DMA then just create this single slabcache that is needed.
For systems that have no ZONE_DMA requirement the support is
completely eliminated.

L. Performance increase

Some benchmarks have shown speed improvements on kernbench in the
range of 5-10%. The locking overhead of slub is based on the
underlying base allocation size. If we can reliably allocate
larger order pages then it is possible to increase slub
performance much further. The anti-fragmentation patches may
enable further performance increases.

Tested on:
i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

SLUB Boot options

slub_nomerge                  Disable merging of slabs
slub_min_order=x              Require a minimum order for slab caches. This
                              increases the managed chunk size and therefore
                              reduces meta data and locking overhead.
slub_min_objects=x            Minimum objects per slab. Default is 8.
slub_max_order=x              Avoid generating slabs larger than order specified.
slub_debug                    Enable all diagnostics for all caches
slub_debug=<options>          Enable selective options for all caches
slub_debug=<options>,<cache>  Enable selective options for a certain set of
                              caches

Available Debug options
F    Double Free checking, sanity and resiliency
R    Red zoning
P    Object / padding poisoning
U    Track last free / alloc
T    Trace all allocs / frees (only use for individual slabs).

To use SLUB: Apply this patch and then select SLUB as the default slab
allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-08 03:12:53 +0800
476f35348 Safer nr_node_ids and nr_node_ids determination and initial values

The nr_cpu_ids value is currently only calculated in smp_init. However, it
may be needed before (SLUB needs it on kmem_cache_init!) and other kernel
components may also want to allocate dynamically sized per cpu arrays before
smp_init. So move the determination of possible cpus into sched_init()
where we already loop over all possible cpus early in boot.

Also initialize both nr_node_ids and nr_cpu_ids with the highest value they
could take. If we have accidental users before these values are determined
then the current value of 0 may cause too small per cpu and per node arrays
to be allocated. If it is set to the maximum possible then we only waste
some memory for early boot users.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-08 03:12:51 +0800

07 May, 2007

1 commit

15700770e Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild

* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (38 commits)
kconfig: fix mconf segmentation fault
kbuild: enable use of code from a different dir
kconfig: error out if recursive dependencies are found
kbuild: scripts/basic/fixdep segfault on pathological string-o-death
kconfig: correct minor typo in Kconfig warning message.
kconfig: fix path to modules.txt in Kconfig help
usr/Kconfig: fix typo
kernel-doc: alphabetically-sorted entries in index.html of 'htmldocs'
kbuild: be more explicit on missing .config file
kbuild: clarify the creation of the LOCALVERSION_AUTO string.
kbuild: propagate errors from find in scripts/gen_initramfs_list.sh
kconfig: refer to qt3 if we cannot find qt libraries
kbuild: handle compressed cpio initramfs-es
kbuild: ignore section mismatch warning for references from .paravirtprobe to .init.text
kbuild: remove stale comment in modpost.c
kbuild/mkuboot.sh: allow spaces in CROSS_COMPILE
kbuild: fix make mrproper for Documentation/DocBook/man
kbuild: remove kconfig binaries during make mrproper
kconfig/menuconfig: do not hardcode '.config'
kbuild: override build timestamp & version
...

Linus Torvalds
2007-05-07 04:21:57 +0800

03 May, 2007

3 commits

6e5a5420b kbuild: clarify the creation of the LOCALVERSION_AUTO string.

Clarify the creation of the LOCALVERSION_AUTO string during kernel
configuration, and fix a couple of typos while we're there.

Signed-off-by: Robert P. J. Day
Signed-off-by: Andrew Morton
Signed-off-by: Sam Ravnborg

Robert P. J. Day
2007-05-03 02:58:11 +0800
aae5f662a kbuild: whitelist section mismatch in init/main.c

In init/main.c we have a reference from rest_init() to .init.text
which is intentional.
Rename the function 'init' to 'kernel_init' to make it a
kernel wide unique symbol and whitelist the reference.

Signed-off-by: Sam Ravnborg

Sam Ravnborg
2007-05-03 02:58:07 +0800
b6e3590f8 [PATCH] x86: Allow percpu variables to be page-aligned

Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
Ingo suggested KVM as well).

Because larger alignments can use more room, we increase the max per-cpu
memory to 64k rather than 32k: it's getting a little tight.

Signed-off-by: Rusty Russell
Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Andi Kleen
Acked-by: Ingo Molnar
Cc: Andi Kleen
Signed-off-by: Andrew Morton

Jeremy Fitzhardinge
2007-05-03 01:27:12 +0800

07 Mar, 2007

1 commit

f991633de [PATCH] initramfs should not depend on CONFIG_BLOCK

initramfs ended up depending on BLOCK:

INITRAMFS_SOURCE
Cc: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dimitri Gorokhovik
2007-03-07 01:30:25 +0800

21 Feb, 2007

1 commit

53b8a315b [PATCH] Convert highest_possible_processor_id to nr_cpu_ids

We frequently need the maximum number of possible processors in order to
allocate arrays for all processors. So far this was done using
highest_possible_processor_id(). However, we need the number of
processors, not the highest id. Moreover, the number was so far dynamically
calculated on each invocation. The number of possible processors does not
change when the system is running. We can therefore calculate that number
once.

Signed-off-by: Christoph Lameter
Cc: Frederik Deweerdt
Cc: Neil Brown
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-02-21 09:10:13 +0800

20 Feb, 2007

1 commit

6168a702a [PATCH] Declare init_irq_proc before we use it.

powerpc gets:

init/main.c: In function `do_basic_setup':
init/main.c:714: warning: implicit declaration of function `init_irq_proc'

but we cannot include linux/irq.h in generic code.

Fix it by moving the declaration into linux/interrupt.h instead.

And make sure all code that defines init_irq_proc() is including
linux/interrupt.h.

And nuke an ifdef-in-C

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-02-20 06:21:50 +0800

17 Feb, 2007

1 commit

906568c9c [PATCH] tick-management: core functionality

With Ingo Molnar

The tick-management code is the first user of the clockevents layer. It takes
clock event devices from the clock events core and uses them to provide the
periodic tick.

Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: john stultz
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thomas Gleixner
2007-02-17 00:13:59 +0800

15 Feb, 2007

5 commits

414f827c4 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
[PATCH] x86-64: Remove mk_pte_phys()
[PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
[PATCH] i386: fix 32-bit ioctls on x64_32
[PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
[PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
[PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
[PATCH] i386: Move mce_disabled to asm/mce.h
[PATCH] i386: paravirt unhandled fallthrough
[PATCH] x86_64: Wire up compat epoll_pwait
[PATCH] x86: Don't require the vDSO for handling a.out signals
[PATCH] i386: Fix Cyrix MediaGX detection
[PATCH] i386: Fix warning in cpu initialization
[PATCH] i386: Fix warning in microcode.c
[PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
[PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
[PATCH] i386: Remove fastcall in paravirt.[ch]
[PATCH] x86-64: Fix wrong gcc check in bitops.h
[PATCH] x86-64: survive having no irq mapping for a vector
[PATCH] i386: geode configuration fixes
[PATCH] i386: add option to show more code in oops reports
...

Linus Torvalds
2007-02-15 01:46:06 +0800
77b14db50 [PATCH] sysctl: reimplement the sysctl proc support

With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.

For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.

The speed feels about the same, even though we can now cache the sysctl
dentries :(

We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files, making it possible to have /proc/sys vary
depending on the namespace you are in. The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible. Because
/proc/sys is now simply a cache, making different directories visible depending
on who you are is trivial to implement.

[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:10:00 +0800
a5494dcd8 [PATCH] sysctl: move SYSV IPC sysctls to their own file

This is just a simple cleanup to keep kernel/sysctl.c from getting too crowded
with special cases, and by keeping all of the ipc logic together it makes
the code a little more readable.

[gcoady.lk@gmail.com: build fix]
Signed-off-by: Eric W. Biederman
Cc: Serge E. Hallyn
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Signed-off-by: Grant Coady
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:59 +0800
b04c3afb2 [PATCH] sysctl: move init_irq_proc into init/main where it belongs

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:58 +0800
cd354f1ae [PATCH] remove many unneeded #includes of sched.h

After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau
Acked-by: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tim Schmielau
2007-02-15 00:09:54 +0800

13 Feb, 2007

2 commits

ee5bfa642 [PATCH] generic: Break init() in two parts to avoid MODPOST warnings

o init() is a non-__init function in the .text section but it calls many
functions which are in the .init.text section. Hence MODPOST generates lots
of cross reference warnings on i386 if compiled with CONFIG_RELOCATABLE=y:

WARNING: vmlinux - Section mismatch: reference to .init.text:smp_prepare_cpus from .text between 'init' (at offset 0xc0101049) and 'rest_init'
WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init from .text between 'init' (at offset 0xc010104e) and 'rest_init'
WARNING: vmlinux - Section mismatch: reference to .init.text:spawn_ksoftirqd from .text between 'init' (at offset 0xc0101053) and 'rest_init'

o This patch breaks init() into two parts: one part which can go
in the .init.text section and be freed, and another part which has to
be non-__init (init_post()). Now init() calls init_post(), and init_post()
does not call any functions present in .init sections, which gets
rid of the warnings.
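
The split can be pictured with a small userspace sketch. Here SKETCH_INIT is a hypothetical marker defined as empty so the example builds as an ordinary program; in the kernel the corresponding __init annotation places code in .init.text. The function names are illustrative and are not the ones in init/main.c:

```c
#include <assert.h>

/* Hypothetical marker: stands for "lives in .init.text, freed after boot".
 * Defined empty here so the sketch compiles anywhere. */
#define SKETCH_INIT

static int setup_calls;
static int post_calls;

/* Boot-only helpers: these would be discarded once booting is done. */
static SKETCH_INIT void prepare_cpus_sketch(void)  { setup_calls++; }
static SKETCH_INIT void start_helpers_sketch(void) { setup_calls++; }

/* Stays in regular text. It must not reference anything marked
 * SKETCH_INIT, which is the property that avoids the mismatch. */
static int init_post_sketch(void)
{
    assert(setup_calls == 2); /* all boot-only work is already done */
    post_calls++;
    return 0;
}

/* Boot-only part: free to call other boot-only code, then hands over. */
static SKETCH_INIT int init_sketch(void)
{
    prepare_cpus_sketch();
    start_helpers_sketch();
    return init_post_sketch();
}
```

The point of the layout is the direction of the references: boot-only code may call permanent code, but permanent code never calls boot-only code.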

Signed-off-by: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Andi Kleen

Vivek Goyal
2007-02-13 20:26:22 +0800
30d7e0d46 [PATCH] Dynamic kernel command-line: common

The current implementation stores a static command-line buffer allocated to
COMMAND_LINE_SIZE size. Most architectures store two copies of this buffer,
one for future reference and one for parameter parsing.

The current kernel command-line size for most architectures is much too small
for module parameters, video settings, initramfs parameters and much more. The
problem is that setting COMMAND_LINE_SIZE to a greater value allocates larger
static buffers.

In order to allow a greater command-line size, these buffers should be
dynamically allocated or marked as init disposable buffers, so unused memory
can be released.

This patch renames the static saved_command_line variable into
boot_command_line adding __initdata attribute, so that it can be disposed
after initialization. This rename is required so applications that use
saved_command_line will not be affected by this change.

It reintroduces saved_command_line as dynamically allocated buffer to match
the data in boot_command_line.

It also marks the secondary command-line buffer as __initdata, and copies it to
the dynamically allocated static_command_line buffer, since components may hold
references to it after initialization.

This patch is for linux-2.6.20-rc4-mm1 and is divided to target each
architecture. I could not check this in any architecture so please forgive me
if I got it wrong.

The per-architecture modification is very simple, use boot_command_line in
place of saved_command_line. The common code is the change into dynamic
command-line.

This patch:

1. Rename saved_command_line into boot_command_line, mark as init
disposable.

2. Add dynamically allocated saved_command_line.

3. Add dynamically allocated static_command_line.

4. During startup, copy boot_command_line into saved_command_line and the
arch command_line into static_command_line.

5. Parse static_command_line and not arch command_line, so arch
command_line may be freed.
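
The five steps above can be sketched in userspace C. This is an illustration under stated assumptions, not the patch itself: malloc() stands in for whatever boot-time allocator the kernel uses, SKETCH_COMMAND_LINE_SIZE is a made-up constant, and setup_command_line_sketch() is a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_COMMAND_LINE_SIZE 256 /* hypothetical size for the sketch */

/* Step 1: the fixed-size boot buffer; in the kernel it would be marked
 * as init-disposable data. */
static char boot_command_line[SKETCH_COMMAND_LINE_SIZE];

/* Steps 2 and 3: long-lived copies, sized to the actual string. */
static char *saved_command_line;
static char *static_command_line;

/* Step 4: copy both source buffers into right-sized allocations.
 * Returns 0 on success and -1 if an allocation fails. */
static int setup_command_line_sketch(const char *arch_command_line)
{
    assert(arch_command_line != NULL);

    saved_command_line = malloc(strlen(boot_command_line) + 1);
    static_command_line = malloc(strlen(arch_command_line) + 1);
    if (!saved_command_line || !static_command_line) {
        free(saved_command_line);
        free(static_command_line);
        saved_command_line = NULL;
        static_command_line = NULL;
        return -1;
    }

    strcpy(saved_command_line, boot_command_line);
    strcpy(static_command_line, arch_command_line);
    return 0;
}
```

Step 5 then follows naturally: parsing works on static_command_line, so the caller's arch buffer no longer has to outlive initialization.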

Signed-off-by: Alon Bar-Lev
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Russell King
Cc: Ian Molton
Cc: Mikael Starvik
Cc: David Howells
Cc: Yoshinori Sato
Cc: Ralf Baechle
Cc: Kyle McMartin
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Hirokazu Takata
Cc: Paul Mundt
Cc: Kazumoto Kojima
Cc: Richard Curnow
Cc: William Lee Irwin III
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Miles Bader
Cc: Chris Zankel
Cc: "Luck, Tony"
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Cc: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alon Bar-Lev
2007-02-13 01:48:37 +0800

12 Feb, 2007

2 commits

18f705f49 [PATCH] Move TASK_XACCT, TASK_IO_ACCOUNTING up in menus

Since they depend on TASKSTATS, it would be nice to move them closer to
the other options depending on TASKSTATS.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-02-12 03:18:07 +0800
842f968f3 [PATCH] Remove final reference to superfluous smp_commence()

Remove the last (and commented out) invocation of the obsolete
smp_commence() call.

Signed-off-by: Robert P. J. Day
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robert P. J. Day
2007-02-12 03:18:05 +0800