Doug / smarc-fsl-linux-kernel | Embedian Git Server

26 Feb, 2009

1 commit

a68260483 rcu: Teach RCU that idle task is not quiescent state at boot

This patch fixes a bug located by Vegard Nossum with the aid of
kmemcheck, updated based on review comments from Nick Piggin,
Ingo Molnar, and Andrew Morton. And cleans up the variable-name
and function-name language. ;-)

The boot CPU runs in the context of its idle thread during boot-up.
During this time, idle_cpu(0) will always return nonzero, which will
fool Classic and Hierarchical RCU into deciding that a large chunk of
the boot-up sequence is a big long quiescent state. This in turn causes
RCU to prematurely end grace periods during this time.

This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
function to ignore the idle task as a quiescent state until the
system has started up the scheduler in rest_init(), introducing a
new non-API function rcu_idle_now_means_idle() to inform RCU of this
transition. RCU maintains an internal rcu_idle_cpu_truthful variable
to track this state, which is then used by rcu_check_callbacks() to
determine if it should believe idle_cpu().
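
The gating described above can be pictured with a minimal userspace sketch. The names rcu_idle_cpu_truthful and rcu_idle_now_means_idle() come from this changelog; tick_sees_quiescent_state() and its parameters are hypothetical stand-ins for the checks done in rcu_check_callbacks(), not the kernel code itself.

```c
#include <stdbool.h>

/* Name taken from the changelog: false during early boot, true once the
 * scheduler is up. */
static bool rcu_idle_cpu_truthful;

/* Name taken from the changelog: called once the scheduler has started. */
static void rcu_idle_now_means_idle(void)
{
	rcu_idle_cpu_truthful = true;
}

/*
 * Hypothetical model of the decision made on each scheduler tick.
 * 'user' means the tick interrupted user-mode code; 'cpu_idle' stands in
 * for the result of idle_cpu(cpu).
 */
static bool tick_sees_quiescent_state(bool user, bool cpu_idle)
{
	if (user)
		return true;
	/* Believe idle_cpu() only after boot-time mode has ended. */
	return cpu_idle && rcu_idle_cpu_truthful;
}
```

In this model, an idle-looking boot CPU is not counted as quiescent until rcu_idle_now_means_idle() has been called.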

Because this patch has the effect of disallowing RCU grace periods
during long stretches of the boot-up sequence, this patch also introduces
Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
no-op if num_online_cpus() returns 1. This allows boot-time code that
calls synchronize_rcu() to proceed normally. Note, however, that RCU
callbacks registered by call_rcu() will likely queue up until later in
the boot sequence. Although rcuclassic and rcutree can also use this
same optimization after boot completes, rcupreempt must restrict its
use of this optimization to the portion of the boot sequence before the
scheduler starts up, given that an rcupreempt RCU read-side critical
section may be preempted.

In addition, this patch takes Nick Piggin's suggestion to make the
system_state global variable be __read_mostly.

Changes since v4:

o Changes the name of the introduced function and variable to
be less emotional. ;-)

Changes since v3:

o WARN_ON(nr_context_switches() > 0) to verify that RCU
switches out of boot-time mode before the first context
switch, as suggested by Nick Piggin.

Changes since v2:

o Created rcu_blocking_is_gp() internal-to-RCU API that
determines whether a call to synchronize_rcu() is itself
a grace period.

o The definition of rcu_blocking_is_gp() for rcuclassic and
rcutree checks to see if but a single CPU is online.

o The definition of rcu_blocking_is_gp() for rcupreempt
checks to see both if but a single CPU is online and if
the system is still in early boot.

This allows rcupreempt to again work correctly if running
on a single CPU after booting is complete.

o Added check to rcupreempt's synchronize_sched() for there
being but one online CPU.
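
The two flavors of rcu_blocking_is_gp() listed above might be modeled roughly as follows. This is a userspace sketch only: struct boot_state, its fields, and the helper names are hypothetical, and the real functions read kernel state directly rather than taking a parameter.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the two facts the checks depend on. */
struct boot_state {
	int online_cpus;        /* stands in for num_online_cpus() */
	bool scheduler_started; /* true once boot-time mode has ended */
};

/* rcuclassic/rcutree flavor: with one CPU online, blocking is itself a
 * grace period. */
static bool blocking_is_gp_nonpreempt(const struct boot_state *s)
{
	return s->online_cpus == 1;
}

/* rcupreempt flavor: readers can be preempted once the scheduler runs,
 * so the shortcut is valid only during early boot. */
static bool blocking_is_gp_preempt(const struct boot_state *s)
{
	return s->online_cpus == 1 && !s->scheduler_started;
}

/* Returns true if the caller may skip waiting for a grace period. */
static bool synchronize_rcu_is_noop(const struct boot_state *s,
				    bool preemptible_rcu)
{
	return preemptible_rcu ? blocking_is_gp_preempt(s)
			       : blocking_is_gp_nonpreempt(s);
}
```

The asymmetry is the point: after boot, a single-CPU rcupreempt system must still wait for a real grace period.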

Tested all three variants both SMP and !SMP, booted fine, passed a short
rcutorture test on both x86 and Power.

Located-by: Vegard Nossum
Tested-by: Vegard Nossum
Tested-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
Signed-off-by: Ingo Molnar

Paul E. McKenney
2009-02-26 11:08:14 +0800

08 Jan, 2009

3 commits

67acd8b4b Merge git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async

* git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async:
async: don't do the initcall stuff post boot
bootchart: improve output based on Dave Jones' feedback
async: make the final inode deletion an asynchronous event
fastboot: Make libata initialization even more async
fastboot: make the libata port scan asynchronous
fastboot: make scsi probes asynchronous
async: Asynchronous function calls to speed up kernel boot

Linus Torvalds
2009-01-08 07:35:47 +0800
57c44c5f6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...

Linus Torvalds
2009-01-08 03:31:52 +0800
22a9d6456 async: Asynchronous function calls to speed up kernel boot

Right now, most of the kernel boot is strictly synchronous, such that
various hardware delays are done sequentially.

In order to make the kernel boot faster, this patch introduces
infrastructure to allow doing some of the initialization steps
asynchronously, which will hide significant portions of the hardware delays
in practice.

In order to not change device order and other similar observables, this
patch does NOT do full parallel initialization.

Rather, it operates more in the way an out of order CPU does; the work may
be done out of order and asynchronous, but the observable effects
(instruction retiring for the CPU) are still done in the original sequence.
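
The out-of-order CPU analogy can be illustrated with a small model in which work items finish in any order but their effects are published strictly in submission order. This is a sketch of the idea only, not the kernel's async implementation; struct async_model and both functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

/* Hypothetical bookkeeping: each submitted item gets an increasing cookie. */
struct async_model {
	bool done[MAX_WORK]; /* done[c] is true once item c has finished */
	size_t submitted;    /* number of cookies handed out so far */
	size_t retired;      /* items 0..retired-1 have been published */
};

/* Hand out the next cookie; returns MAX_WORK if the table is full. */
static size_t model_submit(struct async_model *m)
{
	if (m->submitted == MAX_WORK)
		return MAX_WORK;
	m->done[m->submitted] = false;
	return m->submitted++;
}

/*
 * Mark an item finished (possibly out of order), then publish effects
 * strictly in cookie order. Returns how many items this call retired.
 */
static size_t model_complete(struct async_model *m, size_t cookie)
{
	size_t before = m->retired;

	if (cookie >= m->submitted)
		return 0;
	m->done[cookie] = true;
	while (m->retired < m->submitted && m->done[m->retired])
		m->retired++;
	return m->retired - before;
}
```

An item that finishes early simply waits until everything submitted before it has also finished, which keeps device ordering stable.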

Signed-off-by: Arjan van de Ven

Arjan van de Ven
2009-01-08 00:45:46 +0800

07 Jan, 2009

3 commits

d2e3192b6 init/main.c: mark late_time_init as __initdata

Signed-off-by: Jan Beulich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Beulich
2009-01-07 07:59:14 +0800
f1883f86d Remove remaining unwinder code

Signed-off-by: Alexey Dobriyan
Cc: Gabor Gombas
Cc: Jan Beulich
Cc: Andi Kleen
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-01-07 07:59:11 +0800
f99ebf0a8 init: properly placing noinline keyword

checkpatch warns about 'static void noinline'. It wants `static noinline
void'.

Both are permissible, but the kernel consistently uses `static inline' and
`static noinline', and consistency is good. Hence let's keep the
checkpatch warning and fix up this code site.

[akpm@linux-foundation.org: rewrote changelog]
Signed-off-by: Md.Rakib H. Mullick
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rakib Mullick
2009-01-07 07:59:10 +0800

06 Jan, 2009

1 commit

24d431d06 trivial: add missing printk loglevel in start_kernel

Add missing printk loglevel in start_kernel

Signed-off-by: Ron Lee
Signed-off-by: Jiri Kosina

Ron Lee
2009-01-06 18:28:05 +0800

04 Jan, 2009

1 commit

7d3b56ba3 Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...

Linus Torvalds
2009-01-04 04:04:39 +0800

03 Jan, 2009

1 commit

609e5b71d kbuild: Remove gcc 4.1.0 quirk from init/main.c

Impact: cleanup

We now have a cleaner check for gcc 4.1.0/4.1.1 trouble in
include/linux/compiler-gcc4.h, so remove the 4.1.0 quirk from
init/main.c.

Reported-by: Bartlomiej Zolnierkiewicz
Signed-off-by: Ingo Molnar
Acked-by: Sam Ravnborg
Signed-off-by: Linus Torvalds

Ingo Molnar
2009-01-03 02:09:27 +0800

01 Jan, 2009

3 commits

e0c0ba736 cpumask: Use find_last_bit()

Impact: cleanup

There's one obvious place to use it: to find the highest possible cpu.

Signed-off-by: Rusty Russell

Rusty Russell
2009-01-01 07:42:19 +0800
915441b60 cpumask: Use accessors code in core

Impact: use new API

cpu_*_map are going away in favour of cpu_*_mask, but const pointers.
So we have accessors where we really do want to frob them. Archs
will also need the (trivial) conversion before we can finally remove
cpu_*_map.

Signed-off-by: Rusty Russell
Signed-off-by: Mike Travis

Rusty Russell
2009-01-01 07:42:15 +0800
db200df0b Merge branch 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sparseirq: move __weak symbols into separate compilation unit
sparseirq: work around __weak alias bug
sparseirq: fix hang with !SPARSE_IRQ
sparseirq: set lock_class for legacy irq when sparse_irq is selected
sparseirq: work around compiler optimizing away __weak functions
sparseirq: fix desc->lock init
sparseirq: do not printk when migrating IRQ descriptors
sparseirq: remove duplicated arch_early_irq_init()
irq: simplify for_each_irq_desc() usage
proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c
irq: for_each_irq_desc() move to irqnr.h
hrtimer: remove #include <linux/irq.h>

Linus Torvalds
2009-01-01 01:00:59 +0800

31 Dec, 2008

1 commit

179475a3b Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, sparseirq: clean up Kconfig entry
x86: turn CONFIG_SPARSE_IRQ off by default
sparseirq: fix numa_migrate_irq_desc dependency and comments
sparseirq: add kernel-doc notation for new member in irq_desc, -v2
locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP
sparseirq, xen: make sure irq_desc is allocated for interrupts
sparseirq: fix !SMP building, #2
x86, sparseirq: move irq_desc according to smp_affinity, v7
proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
sparse irqs: add irqnr.h to the user headers list
sparse irqs: handle !GENIRQ platforms
sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
sparseirq: fix Alpha build failure
sparseirq: fix typo in !CONFIG_IO_APIC case
x86, MSI: pass irq_cfg and irq_desc
x86: MSI start irq numbering from nr_irqs_gsi
x86: use NR_IRQS_LEGACY
sparse irq_desc[] array: core kernel and x86 changes
genirq: record IRQ_LEVEL in irq_desc[]
irq.h: remove padding from irq_desc on 64bits

Linus Torvalds
2008-12-31 08:20:19 +0800

29 Dec, 2008

2 commits

43a256322 sparseirq: move __weak symbols into separate compilation unit

GCC has a bug with __weak alias functions: if the functions are in
the same compilation unit as their call site, GCC can decide to
inline them - and thus rob the linker of the opportunity to override
the weak alias with the real thing.

So move all the IRQ handling related __weak symbols to kernel/irq/chip.c.

Signed-off-by: Yinghai Lu
Signed-off-by: Ingo Molnar

Yinghai Lu
2008-12-29 19:15:49 +0800
b0f4b285d Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
sched, trace: update trace_sched_wakeup()
tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
Revert "x86: disable X86_PTRACE_BTS"
ring-buffer: prevent false positive warning
ring-buffer: fix dangling commit race
ftrace: enable format arguments checking
x86, bts: memory accounting
x86, bts: add fork and exit handling
ftrace: introduce tracing_reset_online_cpus() helper
tracing: fix warnings in kernel/trace/trace_sched_switch.c
tracing: fix warning in kernel/trace/trace.c
tracing/ring-buffer: remove unused ring_buffer size
trace: fix task state printout
ftrace: add not to regex on filtering functions
trace: better use of stack_trace_enabled for boot up code
trace: add a way to enable or disable the stack tracer
x86: entry_64 - introduce FTRACE_ frame macro v2
tracing/ftrace: add the printk-msg-only option
tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
x86, bts: correctly report invalid bts records
...

Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.

Linus Torvalds
2008-12-29 04:21:10 +0800

27 Dec, 2008

1 commit

13a0c3c26 sparseirq: work around compiler optimizing away __weak functions

Impact: fix panic on null pointer with sparseirq

Some GCC versions seem to inline the weak global function,
when that function is empty.

Work around it by making the functions return a (dummy) integer.

Signed-off-by: Yinghai
Signed-off-by: Ingo Molnar

Yinghai Lu
2008-12-27 20:24:00 +0800

08 Dec, 2008

1 commit

0b8f1efad sparse irq_desc[] array: core kernel and x86 changes

Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS values without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node() to get each irq_desc;
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.
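
The move from a static irq_desc[NR_IRQS] array to a table of pointers can be sketched in userspace as follows. All names ending in _model, and the NR_IRQS_MODEL size, are hypothetical; calloc() stands in for kzalloc_node(), and locking and NUMA placement are left out.

```c
#include <stdlib.h>

#define NR_IRQS_MODEL 1024 /* hypothetical table size for this sketch */

/* Cut-down stand-in for the kernel's struct irq_desc. */
struct irq_desc_model {
	unsigned int irq;
	void *chip_data; /* per-IRQ arch data hangs off the descriptor */
};

/* Table of pointers: an unused IRQ costs one pointer, not a descriptor. */
static struct irq_desc_model *irq_desc_ptrs[NR_IRQS_MODEL];

/* Look up a descriptor without allocating; NULL if never set up. */
static struct irq_desc_model *irq_to_desc_model(unsigned int irq)
{
	return irq < NR_IRQS_MODEL ? irq_desc_ptrs[irq] : NULL;
}

/* Look up a descriptor, allocating it zeroed on first use. */
static struct irq_desc_model *irq_to_desc_alloc_model(unsigned int irq)
{
	if (irq >= NR_IRQS_MODEL)
		return NULL;
	if (!irq_desc_ptrs[irq]) {
		struct irq_desc_model *desc = calloc(1, sizeof(*desc));

		if (!desc)
			return NULL;
		desc->irq = irq;
		irq_desc_ptrs[irq] = desc;
	}
	return irq_desc_ptrs[irq];
}
```

Only IRQs that are actually requested pay for a full descriptor, which is what lets the table scale without penalizing the common case.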

Signed-off-by: Yinghai Lu
Signed-off-by: Ingo Molnar

Yinghai Lu
2008-12-08 21:31:51 +0800

23 Nov, 2008

1 commit

1d926f275 init/main.c: use ktime accessor function in initcall_debug code

Impact: fix initcall debug output on non-scalar ktime platforms (32-bit embedded)

The initcall_debug code accesses the tv64 member of ktime. This won't work
correctly for large deltas on platforms that don't use the scalar ktime
implementation.
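
To see why an accessor is needed, consider a simplified model of a non-scalar ktime that keeps seconds and nanoseconds in separate fields. The struct layout and function names below are assumptions made for illustration; only the idea of converting through an accessor in the spirit of ktime_to_ns() comes from the commit.

```c
#include <stdint.h>

/* Hypothetical model of a non-scalar ktime: seconds and nanoseconds kept
 * in separate 32-bit fields rather than one 64-bit nanosecond count. */
struct ktime_model {
	int32_t sec;
	int32_t nsec;
};

/* Accessor in the spirit of ktime_to_ns(): always yields nanoseconds. */
static int64_t ktime_model_to_ns(struct ktime_model kt)
{
	return (int64_t)kt.sec * 1000000000LL + kt.nsec;
}

/* Difference of two timestamps in microseconds, via the accessor. */
static int64_t delta_usecs(struct ktime_model start, struct ktime_model end)
{
	return (ktime_model_to_ns(end) - ktime_model_to_ns(start)) / 1000;
}
```

Reading the raw 64-bit member of such a representation would mix the two fields together, so any delta that crosses a second boundary needs the conversion.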

Signed-off-by: Will Newton
Acked-by: Tim Bird
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Will Newton
2008-11-23 18:10:15 +0800

14 Nov, 2008

1 commit

d84f4f992 CRED: Inaugurate COW credentials

Inaugurate copy-on-write credentials management. This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.

A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().

With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:

struct cred *new = prepare_creds();
int ret = blah(new);
if (ret < 0) {
abort_creds(new);
return ret;
}
return commit_creds(new);

There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.

To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const. The purpose of this is compile-time
discouragement of altering credentials through those pointers. Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:

(1) Its reference count may be incremented and decremented.

(2) The keyrings to which it points may be modified, but not replaced.

The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).
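
The prepare/modify/commit discipline can be modeled in userspace roughly as follows. This is a sketch under heavy simplification: the _model names are hypothetical, the struct carries only a uid, and there is no RCU or locking, so it shows the copy-on-write shape only.

```c
#include <stdlib.h>

/* Cut-down, hypothetical stand-in for struct cred. */
struct cred_model {
	int usage; /* reference count */
	unsigned int uid;
};

/* The task's published credentials, visible only through a const pointer. */
static const struct cred_model *current_cred_model;

/* Copy the active credentials into a private, modifiable set. */
static struct cred_model *prepare_creds_model(void)
{
	struct cred_model *new;

	if (!current_cred_model)
		return NULL;
	new = malloc(sizeof(*new));
	if (!new)
		return NULL;
	*new = *current_cred_model;
	new->usage = 1;
	return new;
}

/* Drop a prepared set that will not be used. */
static void abort_creds_model(struct cred_model *new)
{
	free(new);
}

/* Publish the new set and release the old one; returns 0 on success. */
static int commit_creds_model(struct cred_model *new)
{
	/* Only the reference count of a published set is ever touched. */
	struct cred_model *old = (struct cred_model *)current_cred_model;

	current_cred_model = new;
	if (old && --old->usage == 0)
		free(old);
	return 0;
}
```

Holders of the old pointer keep seeing a consistent, unmodified set until they drop their reference.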

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

(1) execve().

This now prepares and commits credentials in various places in the
security code rather than altering the current creds directly.

(2) Temporary credential overrides.

do_coredump() and sys_faccessat() now prepare their own credentials and
temporarily override the ones currently on the acting thread, whilst
preventing interference from other threads by holding cred_replace_mutex
on the thread being dumped.

This will be replaced in a future patch by something that hands down the
credentials directly to the functions being called, rather than altering
the task's objective credentials.

(3) LSM interface.

A number of functions have been changed, added or removed:

(*) security_capset_check(), ->capset_check()
(*) security_capset_set(), ->capset_set()

Removed in favour of security_capset().

(*) security_capset(), ->capset()

New. This is passed a pointer to the new creds, a pointer to the old
creds and the proposed capability sets. It should fill in the new
creds or return an error. All pointers, barring the pointer to the
new creds, are now const.

(*) security_bprm_apply_creds(), ->bprm_apply_creds()

Changed; now returns a value, which will cause the process to be
killed if it's an error.

(*) security_task_alloc(), ->task_alloc_security()

Removed in favour of security_prepare_creds().

(*) security_cred_free(), ->cred_free()

New. Free security data attached to cred->security.

(*) security_prepare_creds(), ->cred_prepare()

New. Duplicate any security data attached to cred->security.

(*) security_commit_creds(), ->cred_commit()

New. Apply any security effects for the upcoming installation of new
security by commit_creds().

(*) security_task_post_setuid(), ->task_post_setuid()

Removed in favour of security_task_fix_setuid().

(*) security_task_fix_setuid(), ->task_fix_setuid()

Fix up the proposed new credentials for setuid(). This is used by
cap_set_fix_setuid() to implicitly adjust capabilities in line with
setuid() changes. Changes are made to the new credentials, rather
than the task itself as in security_task_post_setuid().

(*) security_task_reparent_to_init(), ->task_reparent_to_init()

Removed. Instead the task being reparented to init is referred
directly to init's credentials.

NOTE! This results in the loss of some state: SELinux's osid no
longer records the sid of the thread that forked it.

(*) security_key_alloc(), ->key_alloc()
(*) security_key_permission(), ->key_permission()

Changed. These now take cred pointers rather than task pointers to
refer to the security context.

(4) sys_capset().

This has been simplified and uses less locking. The LSM functions it
calls have been merged.

(5) reparent_to_kthreadd().

This gives the current thread the same credentials as init by simply using
commit_creds() to point that way.

(6) __sigqueue_alloc() and switch_uid()

__sigqueue_alloc() can't stop the target task from changing its creds
beneath it, so this function gets a reference to the currently applicable
user_struct which it then passes into the sigqueue struct it returns if
successful.

switch_uid() is now called from commit_creds(), and possibly should be
folded into that. commit_creds() should take care of protecting
__sigqueue_alloc().

(7) [sg]et[ug]id() and co and [sg]et_current_groups.

The set functions now all use prepare_creds(), commit_creds() and
abort_creds() to build and check a new set of credentials before applying
it.

security_task_set[ug]id() is called inside the prepared section. This
guarantees that nothing else will affect the creds until we've finished.

The calling of set_dumpable() has been moved into commit_creds().

Much of the functionality of set_user() has been moved into
commit_creds().

The get functions all simply access the data directly.

(8) security_task_prctl() and cap_task_prctl().

security_task_prctl() has been modified to return -ENOSYS if it doesn't
want to handle a function, or otherwise return the return value directly
rather than through an argument.

Additionally, cap_task_prctl() now prepares a new set of credentials, even
if it doesn't end up using it.

(9) Keyrings.

A number of changes have been made to the keyrings code:

(a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
all been dropped and built in to the credentials functions directly.
They may want separating out again later.

(b) key_alloc() and search_process_keyrings() now take a cred pointer
rather than a task pointer to specify the security context.

(c) copy_creds() gives a new thread within the same thread group a new
thread keyring if its parent had one, otherwise it discards the thread
keyring.

(d) The authorisation key now points directly to the credentials to extend
the search into rather than pointing to the task that carries them.

(e) Installing thread, process or session keyrings causes a new set of
credentials to be created, even though it's not strictly necessary for
process or session keyrings (they're shared).

(10) Usermode helper.

The usermode helper code now carries a cred struct pointer in its
subprocess_info struct instead of a new session keyring pointer. This set
of credentials is derived from init_cred and installed on the new process
after it has been cloned.

call_usermodehelper_setup() allocates the new credentials and
call_usermodehelper_freeinfo() discards them if they haven't been used. A
special cred function (prepare_usermodeinfo_creds()) is provided
specifically for call_usermodehelper_setup() to call.

call_usermodehelper_setkeys() adjusts the credentials to sport the
supplied keyring as the new session keyring.

(11) SELinux.

SELinux has a number of changes, in addition to those to support the LSM
interface changes mentioned above:

(a) selinux_setprocattr() no longer does its check for whether the
current ptracer can access processes with the new SID inside the lock
that covers getting the ptracer's SID. Whilst this lock ensures that
the check is done with the ptracer pinned, the result is only valid
until the lock is released, so there's no point doing it inside the
lock.

(12) is_single_threaded().

This function has been extracted from selinux_setprocattr() and put into
a file of its own in the lib/ directory as join_session_keyring() now
wants to use it too.

The code in SELinux just checked to see whether a task shared mm_structs
with other tasks (CLONE_VM), but that isn't good enough. We really want
to know if they're part of the same thread group (CLONE_THREAD).

(13) nfsd.

The NFS server daemon now has to use the COW credentials to set the
credentials it is going to use. It really needs to pass the credentials
down to the functions it calls, but it can't do that until other patches
in this series have been applied.
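
The -ENOSYS convention described in item (8) above might look roughly like this from the caller's side. The functions and the option numbers are hypothetical; only the rule that -ENOSYS means "not handled here" while any other value is returned directly comes from the changelog.

```c
#include <errno.h>

#define PR_MODEL_HANDLED_OPTION 1 /* hypothetical option the hook handles */

/* Hypothetical security hook: returns -ENOSYS for options it does not
 * handle, otherwise the final result of the call. */
static long security_prctl_model(int option, unsigned long arg)
{
	if (option != PR_MODEL_HANDLED_OPTION)
		return -ENOSYS;
	return arg ? 0 : -EINVAL;
}

/* Hypothetical generic handler used when the hook declines. */
static long generic_prctl_model(int option)
{
	return option == 2 ? 7 : -EINVAL;
}

/* Caller: fall through to the generic code only on -ENOSYS. */
static long prctl_model(int option, unsigned long arg)
{
	long ret = security_prctl_model(option, arg);

	if (ret != -ENOSYS)
		return ret;
	return generic_prctl_model(option);
}
```

Returning the result directly, rather than through an output argument, lets a single return value carry both the "handled" decision and the outcome.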

Signed-off-by: David Howells
Acked-by: James Morris
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:23 +0800

12 Nov, 2008

2 commits

742390728 tracing/fastboot: Use the ring-buffer timestamp for initcall entries

Impact: Split the boot tracer entries in two parts: call and return

Now that we are using the sched tracer from the boot tracer, we want
to use the same timestamp as the ring buffer to have consistent time
captures between sched events and initcall events.

So we get rid of the old time capture by the boot tracer and split the
initcall events in two parts: call and return. This way we have the
ring buffer timestamp of both.

An example trace:

[ 27.904149584] calling net_ns_init+0x0/0x1c0 @ 1
[ 27.904429624] initcall net_ns_init+0x0/0x1c0 returned 0 after 0 msecs
[ 27.904575926] calling reboot_init+0x0/0x20 @ 1
[ 27.904655399] initcall reboot_init+0x0/0x20 returned 0 after 0 msecs
[ 27.904800228] calling sysctl_init+0x0/0x30 @ 1
[ 27.905142914] initcall sysctl_init+0x0/0x30 returned 0 after 0 msecs
[ 27.905287211] calling ksysfs_init+0x0/0xb0 @ 1
##### CPU 0 buffer started ####
init-1 [000] 27.905395: 1:120:R + [001] 11:115:S
##### CPU 1 buffer started ####
-0 [001] 27.905425: 0:140:R ==> [001] 11:115:R
init-1 [000] 27.905426: 1:120:D ==> [000] 0:140:R
-0 [000] 27.905431: 0:140:R + [000] 4:115:S
-0 [000] 27.905451: 0:140:R ==> [000] 4:115:R
ksoftirqd/0-4 [000] 27.905456: 4:115:S ==> [000] 0:140:R
udevd-11 [001] 27.905458: 11:115:R + [001] 14:115:R
-0 [000] 27.905459: 0:140:R + [000] 4:115:S
-0 [000] 27.905462: 0:140:R ==> [000] 4:115:R
udevd-11 [001] 27.905462: 11:115:R ==> [001] 14:115:R
ksoftirqd/0-4 [000] 27.905467: 4:115:S ==> [000] 0:140:R
-0 [000] 27.905470: 0:140:R + [000] 4:115:S
-0 [000] 27.905473: 0:140:R ==> [000] 4:115:R
ksoftirqd/0-4 [000] 27.905476: 4:115:S ==> [000] 0:140:R
-0 [000] 27.905479: 0:140:R + [000] 4:115:S
-0 [000] 27.905482: 0:140:R ==> [000] 4:115:R
ksoftirqd/0-4 [000] 27.905486: 4:115:S ==> [000] 0:140:R
udevd-14 [001] 27.905499: 14:120:X ==> [001] 11:115:R
udevd-11 [001] 27.905506: 11:115:R + [000] 1:120:D
-0 [000] 27.905515: 0:140:R ==> [000] 1:120:R
udevd-11 [001] 27.905517: 11:115:S ==> [001] 0:140:R
[ 27.905557107] initcall ksysfs_init+0x0/0xb0 returned 0 after 3906 msecs
[ 27.905705736] calling init_jiffies_clocksource+0x0/0x10 @ 1
[ 27.905779239] initcall init_jiffies_clocksource+0x0/0x10 returned 0 after 0 msecs
[ 27.906769814] calling pm_init+0x0/0x30 @ 1
[ 27.906853627] initcall pm_init+0x0/0x30 returned 0 after 0 msecs
[ 27.906997803] calling pm_disk_init+0x0/0x20 @ 1
[ 27.907076946] initcall pm_disk_init+0x0/0x20 returned 0 after 0 msecs
[ 27.907222556] calling swsusp_header_init+0x0/0x30 @ 1
[ 27.907294325] initcall swsusp_header_init+0x0/0x30 returned 0 after 0 msecs
[ 27.907439620] calling stop_machine_init+0x0/0x50 @ 1
init-1 [000] 27.907485: 1:120:R + [000] 2:115:S
init-1 [000] 27.907490: 1:120:D ==> [000] 2:115:R
kthreadd-2 [000] 27.907507: 2:115:R + [001] 15:115:R
-0 [001] 27.907517: 0:140:R ==> [001] 15:115:R
kthreadd-2 [000] 27.907517: 2:115:D ==> [000] 0:140:R
-0 [000] 27.907521: 0:140:R + [000] 4:115:S
-0 [000] 27.907524: 0:140:R ==> [000] 4:115:R
udevd-15 [001] 27.907527: 15:115:D + [000] 2:115:D
ksoftirqd/0-4 [000] 27.907537: 4:115:S ==> [000] 2:115:R
udevd-15 [001] 27.907537: 15:115:D ==> [001] 0:140:R
kthreadd-2 [000] 27.907546: 2:115:R + [000] 1:120:D
kthreadd-2 [000] 27.907550: 2:115:S ==> [000] 1:120:R
init-1 [000] 27.907584: 1:120:R + [000] 15: 0:D
init-1 [000] 27.907589: 1:120:R + [000] 2:115:S
init-1 [000] 27.907593: 1:120:D ==> [000] 15: 0:R
udevd-15 [000] 27.907601: 15: 0:S ==> [000] 2:115:R
##### CPU 0 buffer started ####
kthreadd-2 [000] 27.907616: 2:115:R + [001] 16:115:R
##### CPU 1 buffer started ####
-0 [001] 27.907620: 0:140:R ==> [001] 16:115:R
kthreadd-2 [000] 27.907621: 2:115:D ==> [000] 0:140:R
udevd-16 [001] 27.907625: 16:115:D + [000] 2:115:D
-0 [000] 27.907628: 0:140:R + [000] 4:115:S
udevd-16 [001] 27.907629: 16:115:D ==> [001] 0:140:R
-0 [000] 27.907631: 0:140:R ==> [000] 4:115:R
ksoftirqd/0-4 [000] 27.907636: 4:115:S ==> [000] 2:115:R
kthreadd-2 [000] 27.907644: 2:115:R + [000] 1:120:D
kthreadd-2 [000] 27.907647: 2:115:S ==> [000] 1:120:R
init-1 [000] 27.907657: 1:120:R + [001] 16: 0:D
-0 [001] 27.907666: 0:140:R ==> [001] 16: 0:R
[ 27.907703862] initcall stop_machine_init+0x0/0x50 returned 0 after 0 msecs
[ 27.907850704] calling filelock_init+0x0/0x30 @ 1
[ 27.907926573] initcall filelock_init+0x0/0x30 returned 0 after 0 msecs
[ 27.908071327] calling init_script_binfmt+0x0/0x10 @ 1
[ 27.908165195] initcall init_script_binfmt+0x0/0x10 returned 0 after 0 msecs
[ 27.908309461] calling init_elf_binfmt+0x0/0x10 @ 1

Signed-off-by: Frederic Weisbecker
Acked-by: Steven Rostedt
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-11-12 17:17:19 +0800
3f5ec1369 tracing/fastboot: move boot tracer structs and funcs into their own header.

Impact: Cleanups on the boot tracer and ftrace

This patch brings some cleanups to the boot tracer headers. The
functions and structures of this tracer are unrelated to ftrace
and so should have their own header file.

Signed-off-by: Frederic Weisbecker
Acked-by: Steven Rostedt
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-11-12 17:17:18 +0800

05 Nov, 2008

1 commit

71566a0d1 tracing/fastboot: Enable boot tracing only during initcalls

Impact: modify boot tracer

We used to disable the initcall tracing at a specified time (i.e., the end
of builtin initcalls), but we don't need that anymore: it will be
stopped when initcalls are finished.

However we want two things:

_Start this tracing only after pre-smp initcalls are finished.

_Since we are planning to trace sched_switches at the same time, we
want to enable them only during the initcall execution.

For this purpose, this patch introduces two functions to enable/disable
the sched_switch tracing during boot.

Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-11-05 00:14:02 +0800

26 Oct, 2008

1 commit

4403b406d Revert "Call init_workqueues before pre smp initcalls."

This reverts commit a802dd0eb5fc97a50cf1abb1f788a8f6cc5db635 by moving
the call to init_workqueues() back where it belongs - after SMP has been
initialized.

It also moves stop_machine_init() - which needs workqueues - to a later
phase using a core_initcall() instead of early_initcall(). That should
satisfy all ordering requirements, and was apparently the reason why
init_workqueues() was moved to be too early.

Cc: Heiko Carstens
Cc: Rusty Russell
Signed-off-by: Linus Torvalds

Linus Torvalds
2008-10-26 10:53:38 +0800

24 Oct, 2008

2 commits

5ed487bc2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
[PATCH] fs: add a sanity check in d_free
[PATCH] i_version: remount support
[patch] vfs: make security_inode_setattr() calling consistent
[patch 1/3] FS_MBCACHE: don't needlessly make it built-in
[PATCH] move executable checking into ->permission()
[PATCH] fs/dcache.c: update comment of d_validate()
[RFC PATCH] touch_mnt_namespace when the mount flags change
[PATCH] reiserfs: add missing llseek method
[PATCH] fix ->llseek for more directories
[PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
[PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
[PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
[PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
[PATCH vfs-2.6 2/6] vfs: add d_ancestor()
[PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
[PATCH] get rid of on-stack dentry in udf
[PATCH 2/2] anondev: switch to IDA
[PATCH 1/2] anondev: init IDR statically
[JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
[PATCH] Optimise NFS readdir hack slightly.
...

Linus Torvalds
2008-10-24 01:22:40 +0800
a53448760 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
stop_machine: fix error code handling on multiple cpus
stop_machine: use workqueues instead of kernel threads
workqueue: introduce create_rt_workqueue
Call init_workqueues before pre smp initcalls.
Make panic= and panic_on_oops into core_params
Make initcall_debug a core_param
core_param() for genuinely core kernel parameters
param: Fix duplicate module prefixes
module: check kernel param length at compile time, not runtime
Remove stop_machine during module load v2
module: simplify load_module.

Linus Torvalds
2008-10-24 01:00:14 +0800

23 Oct, 2008

2 commits

94b6da5ab memcg: fix page_cgroup allocation ... Browse Code »

page_cgroup_init() is called from mem_cgroup_init(). But at this
point, we cannot call alloc_bootmem().
(and this caused panic at boot.)

This patch moves page_cgroup_init() to init/main.c.

The timetable is as follows:
==
parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
....
cgroup_init_early() # "early" init of cgroup.
....
setup_arch() # memmap is allocated.
...
page_cgroup_init();
mem_init(); # we cannot call alloc_bootmem after this.
....
cgroup_init() # mem_cgroup is initialized.
==

Before page_cgroup_init(), mem_map must be initialized. So,
I added page_cgroup_init() to init/main.c directly.

(*) maybe this is not very clean but
- cgroup_init_early() is too early
- in cgroup_init(), we have to use vmalloc instead of alloc_bootmem().
The vmalloc area is a scarce resource on x86-32 and we should avoid very large
vmalloc() calls there. So we want to use alloc_bootmem(), and added page_cgroup_init()
directly to init/main.c.

[akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
[akpm@linux-foundation.org: fix build]
Acked-by: Balbir Singh
Tested-by: Balbir Singh
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-23 23:55:02 +0800
6de24f0ed [PATCH 1/2] anondev: init IDR statically ... Browse Code »

Signed-off-by: Alexey Dobriyan

Alexey Dobriyan
2008-10-23 17:13:13 +0800

22 Oct, 2008

2 commits

a802dd0eb Call init_workqueues before pre smp initcalls. ... Browse Code »

This allows workqueues to be created from within the context of
a pre-SMP initcall (aka early_initcall).

Signed-off-by: Heiko Carstens
Signed-off-by: Rusty Russell

Heiko Carstens
2008-10-22 07:00:25 +0800
d0ea3d7d2 Make initcall_debug a core_param ... Browse Code »

This is the one I really wanted: now that it affects module loading, it
makes sense to be able to flip it after boot.

Signed-off-by: Rusty Russell
Acked-by: Arjan van de Ven

Rusty Russell
2008-10-22 07:00:24 +0800

21 Oct, 2008

1 commit

92b29b86f Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip ... Browse Code »

* 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
tracing/fastboot: improve help text
tracing/stacktrace: improve help text
tracing/fastboot: fix initcalls disposition in bootgraph.pl
tracing/fastboot: fix bootgraph.pl initcall name regexp
tracing/fastboot: fix issues and improve output of bootgraph.pl
tracepoints: synchronize unregister static inline
tracepoints: tracepoint_synchronize_unregister()
ftrace: make ftrace_test_p6nop disassembler-friendly
markers: fix synchronize marker unregister static inline
tracing/fastboot: add better resolution to initcall debug/tracing
trace: add build-time check to avoid overrunning hex buffer
ftrace: fix hex output mode of ftrace
tracing/fastboot: fix initcalls disposition in bootgraph.pl
tracing/fastboot: fix printk format typo in boot tracer
ftrace: return an error when setting a nonexistent tracer
ftrace: make some tracers reentrant
ring-buffer: make reentrant
ring-buffer: move page indexes into page headers
tracing/fastboot: only trace non-module initcalls
ftrace: move pc counter in irqtrace
...

Manually fix conflicts:
- init/main.c: initcall tracing
- kernel/module.c: verbose level vs tracepoints
- scripts/bootgraph.pl: fallout from cherry-picking commits.

Linus Torvalds
2008-10-21 04:35:07 +0800

20 Oct, 2008

1 commit

db64fe022 mm: rewrite vmap layer ... Browse Code »

Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).

The biggest problem with vmap is actually vunmap. Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache. This is all done under a global lock. As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
This gives terrible quadratic scalability characteristics.

Another problem is that the entire vmap subsystem works under a single
lock. It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.

This is a rewrite of vmap subsystem to solve those problems. The existing
vmalloc API is implemented on top of the rewritten subsystem.

The TLB flushing problem is solved by using lazy TLB unmapping. vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated. So the addresses aren't allocated again until
a subsequent TLB flush. A single TLB flush then can flush multiple
vunmaps from each CPU.
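
The batching idea can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: `struct toy_vmap`, `TOY_LAZY_MAX` and the helper names are hypothetical, and a "flush" is just a counter increment standing in for the expensive global TLB flush.

```c
#include <assert.h>

#define TOY_LAZY_MAX 4   /* hypothetical batch threshold */

struct toy_vmap {
    int lazy_pending;    /* unmapped ranges whose flush is still deferred */
    int tlb_flushes;     /* number of (expensive) global flushes issued */
};

/* Flush all deferred unmaps at once; a no-op if nothing is pending. */
static void toy_flush(struct toy_vmap *v)
{
    if (v->lazy_pending == 0)
        return;
    v->tlb_flushes++;
    v->lazy_pending = 0;
}

/* Old scheme in this model: every unmap pays for a global flush. */
static void toy_vunmap_eager(struct toy_vmap *v)
{
    v->tlb_flushes++;
}

/* Lazy scheme: defer the flush until enough unmaps have accumulated. */
static void toy_vunmap_lazy(struct toy_vmap *v)
{
    v->lazy_pending++;
    if (v->lazy_pending >= TOY_LAZY_MAX)
        toy_flush(v);
}

/* Perform n unmaps and return how many global flushes were needed. */
static int toy_run(int n, int lazy)
{
    struct toy_vmap v = { 0, 0 };

    for (int i = 0; i < n; i++) {
        if (lazy)
            toy_vunmap_lazy(&v);
        else
            toy_vunmap_eager(&v);
    }
    toy_flush(&v);   /* explicit flush of anything still deferred */
    return v.tlb_flushes;
}
```

With a threshold of 4, eight unmaps cost eight flushes in the eager model and two in the lazy one. The final `toy_flush()` call plays a role loosely comparable to an explicit request to flush deferred mappings.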

XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.

The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.

There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.

To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap. Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).

As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages. Different numbers
of tests were run in parallel on a 4 core, 2 socket Opteron. Results are
in nanoseconds per map+touch+unmap.

threads  vanilla  vmap rewrite
1        14700    2900
2        33600    3000
4        49500    2800
8        70631    2900

So with 8 cores, the rewritten version is already 25x faster.
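
As a quick check of that figure, the speedup is simply the ratio of the two columns in the table. The helper below is a hypothetical illustration, not part of the patch.

```c
#include <assert.h>

/* Ratio of vanilla to rewritten cost, both in ns per map+touch+unmap. */
static double toy_speedup(double vanilla_ns, double rewrite_ns)
{
    return vanilla_ns / rewrite_ns;
}
```

For the 8-thread row, `toy_speedup(70631, 2900)` comes out a little above 24, which the message rounds to 25x; for the single-thread row the ratio is roughly 5.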

In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram... along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system. I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now. vmap is pretty well blown off the
profiles.

Before:
1352059 total 0.1401
798784 _write_lock 8320.6667
Cc: Jeremy Fitzhardinge
Cc: Krzysztof Helt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2008-10-20 23:52:32 +0800

14 Oct, 2008

7 commits

ca538f6bb tracing/fastboot: add better resolution to initcall debug/tracing ... Browse Code »

Change the time resolution for initcall_debug to microseconds, from
milliseconds. This is handy to determine which initcalls you want to work
on for faster booting.

On one of my test machines, over 90% of the initcalls are less than a
millisecond and (without this patch) these are all reported as 0 msecs.
Working on the 900 us ones is more important than the 4 us ones.
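
The loss of information is easy to see with integer conversion. The sketch below assumes durations are measured in nanoseconds and uses plain division; the helper names are hypothetical and the kernel's actual conversion may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a nanosecond duration to whole milliseconds (truncating). */
static uint64_t toy_ns_to_msecs(uint64_t ns)
{
    return ns / 1000000ULL;
}

/* Convert a nanosecond duration to whole microseconds (truncating). */
static uint64_t toy_ns_to_usecs(uint64_t ns)
{
    return ns / 1000ULL;
}
```

A 900 us initcall and a 4 us initcall both report 0 at millisecond resolution, but 900 and 4 at microsecond resolution, which is what makes them distinguishable.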

With 'quiet' on the kernel command line, this adds no significant overhead
to kernel boot time.

Signed-off-by: Tim Bird
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Tim Bird
2008-10-14 16:39:27 +0800
097d036a2 tracing/fastboot: only trace non-module initcalls ... Browse Code »

At this time, only built-in initcalls interest us.
We can't really produce a relevant graph if we include
the module initcalls too.

I had good results after this patch (see svg in attachment).

Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-10-14 16:39:17 +0800
5601020fe tracing/fastboot: get the initcall name before it disappears ... Browse Code »

After some initcall traces, some initcall names may be inconsistent.
That's because these functions will disappear from the .init section,
and their names will disappear from the symbol table as well.

So we have to copy the name of the function into a large enough buffer
when appending the trace entry. This is not costly for the ring_buffer because
the number of initcall entries is usually not very large.
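
The difference between keeping a pointer to the name and copying it can be sketched in userspace. The struct, the buffer size and the helpers below are hypothetical; they only illustrate why a copy stays valid after the original storage is gone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_NAME_LEN 64   /* hypothetical buffer size */

/* Hypothetical trace entry that owns a copy of the function name. */
struct toy_boot_entry {
    char func[TOY_NAME_LEN];
    int result;
};

/* Copy the name into the entry; snprintf always NUL-terminates. */
static void toy_record(struct toy_boot_entry *e, const char *name, int result)
{
    snprintf(e->func, sizeof(e->func), "%s", name);
    e->result = result;
}

/* Returns 1 if the recorded name survives the source being wiped. */
static int toy_name_survives(void)
{
    char scratch[TOY_NAME_LEN] = "toy_initcall";
    struct toy_boot_entry e;

    toy_record(&e, scratch, 0);
    memset(scratch, 0, sizeof(scratch));   /* simulate the name going away */
    return strcmp(e.func, "toy_initcall") == 0;
}
```

Had the entry stored only the `scratch` pointer, the name would read back as empty after the wipe; with the copy it is still intact.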

Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-10-14 16:39:12 +0800
cb5ab7420 tracing/fastboot: change the printing of boot tracer according to bootgraph.pl ... Browse Code »

Change the boot tracer printing to make it parsable for
the scripts/bootgraph.pl script.

We now have to output two lines for each initcall, matching the
printk in do_one_initcall() in init/main.c.
We now need both the call's time and the return's time.

Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-10-14 16:39:11 +0800
3bf77af6e tracing/ftrace: launch boot tracing after pre-smp initcalls ... Browse Code »

Launch the boot tracing inside the initcall_debug area. The old printk
calls have not been removed, to keep the old way of initcall tracing for
backward compatibility.

[ mingo@elte.hu: resolved conflicts ]
Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frédéric Weisbecker
2008-10-14 16:38:50 +0800
aa5d9151f tracing/fastboot: add a script to visualize the kernel boot process / time ... Browse Code »

When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with the fastboot asynchronous
initcall level, it's very valuable to see which initcall gets run where
and when.

This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).

Signed-off-by: Arjan van de Ven

Arjan van de Ven
2008-10-14 16:38:46 +0800
68bf21aa1 ftrace: mcount call site on boot nops core ... Browse Code »

This is the infrastructure for converting the mcount call sites
recorded by the __mcount_loc section into nops on boot. It also allows
for using these sites to enable tracing as normal. When the __mcount_loc
section is used, the "ftraced" kernel thread is disabled.

This uses the current infrastructure to record the mcount call sites
as well as convert them to nops. The mcount function is kept as a stub
on boot up and not converted to the ftrace_record_ip function. We use
ftrace_record_ip only to record from the table.
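
The table-driven patching can be modelled in userspace on a plain byte buffer. This is a sketch under assumptions, not the ftrace code: the function names are hypothetical, the "text" is an ordinary array rather than executable memory, and the call sites are assumed to be 5-byte x86 `call rel32` instructions (opcode 0xe8) replaced by the 5-byte x86 NOP encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_INSN_LEN 5

/* x86 5-byte NOP: nopl 0x0(%rax,%rax,1) */
static const unsigned char toy_nop[TOY_INSN_LEN] = {
    0x0f, 0x1f, 0x44, 0x00, 0x00
};

/* Replace each recorded call site with a NOP. Sites that fall outside the
 * buffer or do not start with a call opcode are skipped.
 * Returns the number of sites patched. */
static size_t toy_nop_call_sites(unsigned char *text, size_t text_len,
                                 const size_t *sites, size_t nsites)
{
    size_t patched = 0;

    for (size_t i = 0; i < nsites; i++) {
        size_t off = sites[i];

        if (off > text_len || text_len - off < TOY_INSN_LEN)
            continue;               /* would run past the buffer */
        if (text[off] != 0xe8)
            continue;               /* not a call rel32 */
        memcpy(text + off, toy_nop, TOY_INSN_LEN);
        patched++;
    }
    return patched;
}

/* Build a 16-byte buffer with calls at offsets 0 and 8, then patch it
 * using a table that also holds one bogus offset. */
static size_t toy_demo(unsigned char *first_byte_out)
{
    unsigned char text[16];
    const size_t sites[] = { 0, 8, 14 };

    memset(text, 0x90, sizeof(text));   /* filler bytes */
    text[0] = 0xe8;
    text[8] = 0xe8;
    size_t n = toy_nop_call_sites(text, sizeof(text), sites, 3);
    *first_byte_out = text[0];
    return n;
}
```

In the demo, two of the three table entries are patched, and the entry at offset 14 is rejected because a 5-byte instruction would not fit in the buffer.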

This patch does not handle modules. That comes with a later patch.

Signed-off-by: Steven Rostedt
Signed-off-by: Ingo Molnar

Steven Rostedt
2008-10-14 16:34:44 +0800

12 Oct, 2008

1 commit

f9b9796ad Add a script to visualize the kernel boot process / time ... Browse Code »

When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with some of the initialization
going asynchronous soon, it's valuable to track/print which worker thread
is executing the initialization.

This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).

Signed-off-by: Arjan van de Ven

Arjan van de Ven
2008-10-12 23:07:20 +0800