Eric Lee / smarc-fsl-linux-kernel

02 Nov, 2008

1 commit

d3f15800d init/do_mounts_md.c: remove duplicated #include ... Browse Code »

Removed duplicated #include in init/do_mounts_md.c.

The same compile error ("error: implicit declaration of function
'msleep'") got fixed twice:

- f8b77d39397e1510b1a3bcfd385ebd1a45aae77f ("init/do_mounts_md.c:
msleep compile fix")

- 73b4a24f5ff09389ba6277c53a266b142f655ed2 ("init/do_mounts_md.c must
#include ")

by people adding the include in two slightly different
places. Andrew's quilt scripts happily ignore the fuzz, and will
re-apply the patch even though they had conflicts.

Signed-off-by: Huang Weiyi
Signed-off-by: Linus Torvalds

Huang Weiyi
2008-11-02 01:35:51 +0800

31 Oct, 2008

2 commits

84ad6d700 memcg: update menuconfig help text ... Browse Code »

page_cgroup is now allocated at boot and memmap doesn't includes pointer
for page_cgroup. Fix the menu help text.

Reviewed-by: Balbir Singh
Signed-off-by: KAMEZAWA hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-31 02:38:46 +0800
f8b77d393 init/do_mounts_md.c: msleep compile fix ... Browse Code »

init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-10-31 02:38:46 +0800

26 Oct, 2008

1 commit

4403b406d Revert "Call init_workqueues before pre smp initcalls." ... Browse Code »

This reverts commit a802dd0eb5fc97a50cf1abb1f788a8f6cc5db635 by moving
the call to init_workqueues() back where it belongs - after SMP has been
initialized.

It also moves stop_machine_init() - which needs workqueues - to a later
phase using a core_initcall() instead of early_initcall(). That should
satisfy all ordering requirements, and was apparently the reason why
init_workqueues() was moved to be too early.

Cc: Heiko Carstens
Cc: Rusty Russell
Signed-off-by: Linus Torvalds

Linus Torvalds
2008-10-26 10:53:38 +0800

24 Oct, 2008

3 commits

5ed487bc2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
[PATCH] fs: add a sanity check in d_free
[PATCH] i_version: remount support
[patch] vfs: make security_inode_setattr() calling consistent
[patch 1/3] FS_MBCACHE: don't needlessly make it built-in
[PATCH] move executable checking into ->permission()
[PATCH] fs/dcache.c: update comment of d_validate()
[RFC PATCH] touch_mnt_namespace when the mount flags change
[PATCH] reiserfs: add missing llseek method
[PATCH] fix ->llseek for more directories
[PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
[PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
[PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
[PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
[PATCH vfs-2.6 2/6] vfs: add d_ancestor()
[PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
[PATCH] get rid of on-stack dentry in udf
[PATCH 2/2] anondev: switch to IDA
[PATCH 1/2] anondev: init IDR statically
[JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
[PATCH] Optimise NFS readdir hack slightly.
...

Linus Torvalds
2008-10-24 01:22:40 +0800
a3415dc34 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 ... Browse Code »

* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (32 commits)
PCI hotplug: fix logic in Compaq hotplug controller bus speed setup
PCI: don't export linux/io.h from pci.h
PCI: PCI_QUIRKS depends on PCI
PCI hotplug: pciehp: poll data link layer link active
PCI hotplug: pciehp: fix possible memory leak in pcie_init
PCI: Workaround invalid P2P bridge bus numbers
PCI Hotplug: fakephp: add duplicate slot name debugging
PCI: Hotplug core: remove 'name'
PCI: shcphp: remove 'name' parameter
PCI: SGI Hotplug: stop managing bss_hotplug_slot->name
PCI: rpaphp: kmalloc/kfree slot->name directly
PCI: pciehp: remove 'name' parameter
PCI: ibmphp: stop managing hotplug_slot->name
PCI: fakephp: remove 'name' parameter
PCI, PCI Hotplug: introduce slot_name helpers
PCI: cpqphp: stop managing hotplug_slot->name
PCI: cpci_hotplug: stop managing hotplug_slot->name
PCI: acpiphp: remove 'name' parameter
PCI: prevent duplicate slot names
PCI Hotplug: serialize pci_hp_register and pci_hp_deregister
...

Linus Torvalds
2008-10-24 01:16:53 +0800
a53448760 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
stop_machine: fix error code handling on multiple cpus
stop_machine: use workqueues instead of kernel threads
workqueue: introduce create_rt_workqueue
Call init_workqueues before pre smp initcalls.
Make panic= and panic_on_oops into core_params
Make initcall_debug a core_param
core_param() for genuinely core kernel parameters
param: Fix duplicate module prefixes
module: check kernel param length at compile time, not runtime
Remove stop_machine during module load v2
module: simplify load_module.

Linus Torvalds
2008-10-24 01:00:14 +0800

23 Oct, 2008

3 commits

94b6da5ab memcg: fix page_cgroup allocation ... Browse Code »

page_cgroup_init() is called from mem_cgroup_init(). But at this
point, we cannot call alloc_bootmem().
(and this caused panic at boot.)

This patch moves page_cgroup_init() to init/main.c.

Time table is following:
==
parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
....
cgroup_init_early() # "early" init of cgroup.
....
setup_arch() # memmap is allocated.
...
page_cgroup_init();
mem_init(); # we cannot call alloc_bootmem after this.
....
cgroup_init() # mem_cgroup is initialized.
==

Before page_cgroup_init(), mem_map must be initialized. So,
I added page_cgroup_init() to init/main.c directly.

(*) maybe this is not very clean but
- cgroup_init_early() is too early
- in cgroup_init(), we have to use vmalloc instead of alloc_bootmem().
use of vmalloc area in x86-32 is important and we should avoid very large
vmalloc() in x86-32. So, we want to use alloc_bootmem() and added page_cgroup_init()
directly to init/main.c

[akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
[akpm@linux-foundation.org: fix build]
Acked-by: Balbir Singh
Tested-by: Balbir Singh
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-23 23:55:02 +0800
6de24f0ed [PATCH 1/2] anondev: init IDR statically ... Browse Code »

Signed-off-by: Alexey Dobriyan

Alexey Dobriyan
2008-10-23 17:13:13 +0800
61cfc7e44 PCI: PCI_QUIRKS depends on PCI ... Browse Code »

commit 3d137310245e4cdc3e8c8ba1bea2e145a87ae8e3 ("PCI: allow quirks to be
compiled out") introduced CONFIG_PCI_QUIRKS, which now shows up in each
and every .config. Fix this by making it depend on PCI.

Signed-off-by: Geert Uytterhoeven
Signed-off-by: Jesse Barnes

Geert Uytterhoeven
2008-10-23 07:42:46 +0800

22 Oct, 2008

2 commits

a802dd0eb Call init_workqueues before pre smp initcalls. ... Browse Code »

This allows to create workqueues from within the context of
a pre smp initcall (aka early_initcall).

Signed-off-by: Heiko Carstens
Signed-off-by: Rusty Russell

Heiko Carstens
2008-10-22 07:00:25 +0800
d0ea3d7d2 Make initcall_debug a core_param ... Browse Code »

This is the one I really wanted: now it effects module loading, it
makes sense to be able to flip it after boot.

Signed-off-by: Rusty Russell
Acked-by: Arjan van de Ven

Rusty Russell
2008-10-22 07:00:24 +0800

21 Oct, 2008

3 commits

a0bfb673d Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 ... Browse Code »

* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (41 commits)
PCI: fix pci_ioremap_bar() on s390
PCI: fix AER capability check
PCI: use pci_find_ext_capability everywhere
PCI: remove #ifdef DEBUG around dev_dbg call
PCI hotplug: fix get_##name return value problem
PCI: document the pcie_aspm kernel parameter
PCI: introduce an pci_ioremap(pdev, barnr) function
powerpc/PCI: Add legacy PCI access via sysfs
PCI: Add ability to mmap legacy_io on some platforms
PCI: probing debug message uniformization
PCI: support PCIe ARI capability
PCI: centralize the capabilities code in probe.c
PCI: centralize the capabilities code in pci-sysfs.c
PCI: fix 64-vbit prefetchable memory resource BARs
PCI: replace cfg space size (256/4096) by macros.
PCI: use resource_size() everywhere.
PCI: use same arg names in PCI_VDEVICE comment
PCI hotplug: rpaphp: make debug var unique
PCI: use %pF instead of print_fn_descriptor_symbol() in quirks.c
PCI: fix hotplug get_##name return value problem
...

Linus Torvalds
2008-10-21 04:40:47 +0800
92b29b86f Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
tracing/fastboot: improve help text
tracing/stacktrace: improve help text
tracing/fastboot: fix initcalls disposition in bootgraph.pl
tracing/fastboot: fix bootgraph.pl initcall name regexp
tracing/fastboot: fix issues and improve output of bootgraph.pl
tracepoints: synchronize unregister static inline
tracepoints: tracepoint_synchronize_unregister()
ftrace: make ftrace_test_p6nop disassembler-friendly
markers: fix synchronize marker unregister static inline
tracing/fastboot: add better resolution to initcall debug/tracing
trace: add build-time check to avoid overrunning hex buffer
ftrace: fix hex output mode of ftrace
tracing/fastboot: fix initcalls disposition in bootgraph.pl
tracing/fastboot: fix printk format typo in boot tracer
ftrace: return an error when setting a nonexistent tracer
ftrace: make some tracers reentrant
ring-buffer: make reentrant
ring-buffer: move page indexes into page headers
tracing/fastboot: only trace non-module initcalls
ftrace: move pc counter in irqtrace
...

Manually fix conflicts:
- init/main.c: initcall tracing
- kernel/module.c: verbose level vs tracepoints
- scripts/bootgraph.pl: fallout from cherry-picking commits.

Linus Torvalds
2008-10-21 04:35:07 +0800
3d1373102 PCI: allow quirks to be compiled out ... Browse Code »

This patch adds the CONFIG_PCI_QUIRKS option which allows to remove all
the PCI quirks, which are not necessarily used on embedded systems when
PCI is working properly. As this is a size-reduction option, it depends
on CONFIG_EMBEDDED. It allows to save almost 12 kilobytes of kernel
code:

text data bss dec hex filename
1287806 123596 212992 1624394 18c94a vmlinux.old
1275854 123596 212992 1612442 189a9a vmlinux
-11952 0 0 -11952 -2EB0 +/-

This patch has originally been written by Zwane Mwaikambo
and is part of the Linux Tiny project.

Signed-off-by: Thomas Petazzoni
Signed-off-by: Jesse Barnes

Thomas Petazzoni
2008-10-21 01:53:40 +0800

20 Oct, 2008

2 commits

dc52ddc0e container freezer: implement freezer cgroup subsystem ... Browse Code »

This patch implements a new freezer subsystem in the control groups
framework. It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.

The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks
in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup. Reading will return the current state.

* Examples of usage :

# mkdir /containers/freezer
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

# cat /containers/0/freezer.state
RUNNING

to freeze all tasks in the container :

# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
FROZEN

to unfreeze all tasks in the container :

# echo RUNNING > /containers/0/freezer.state
# cat /containers/0/freezer.state
RUNNING

This is the basic mechanism which should do the right thing for user space
task in a simple scenario.

It's important to note that freezing can be incomplete. In that case we
return EBUSY. This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time. After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read. The state will remain
"FREEZING" until one of these things happens:

1) Userspace cancels the freezing operation by writing "RUNNING" to
the freezer.state file
2) Userspace retries the freezing operation by writing "FROZEN" to
the freezer.state file (writing "FREEZING" is not legal
and returns EIO)
3) The tasks that blocked the cgroup from entering the "FROZEN"
state disappear from the cgroup's set of tasks.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater
Signed-off-by: Matt Helsley
Acked-by: Serge E. Hallyn
Tested-by: Matt Helsley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt Helsley
2008-10-20 23:52:34 +0800
db64fe022 mm: rewrite vmap layer ... Browse Code »
1

Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).

The biggest problem with vmap is actually vunmap. Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache. This is all done under a global lock. As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
This gives terrible quadratic scalability characteristics.

Another problem is that the entire vmap subsystem works under a single
lock. It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.

This is a rewrite of vmap subsystem to solve those problems. The existing
vmalloc API is implemented on top of the rewritten subsystem.

The TLB flushing problem is solved by using lazy TLB unmapping. vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated. So the addresses aren't allocated again until
a subsequent TLB flush. A single TLB flush then can flush multiple
vunmaps from each CPU.

XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.

The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.

There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.

To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap. Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).

As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages. Different numbers
of tests were run in parallel on an 4 core, 2 socket opteron. Results are
in nanoseconds per map+touch+unmap.

threads vanilla vmap rewrite
1 14700 2900
2 33600 3000
4 49500 2800
8 70631 2900

So with a 8 cores, the rewritten version is already 25x faster.

In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram... along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system. I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now. vmap is pretty well blown off the
profiles.

Before:
1352059 total 0.1401
798784 _write_lock 8320.6667
Cc: Jeremy Fitzhardinge
Cc: Krzysztof Helt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2008-10-20 23:52:32 +0800

17 Oct, 2008

4 commits

73b4a24f5 init/do_mounts_md.c must #include <linux/delay.h> ... Browse Code »

This patch fixes the following compile error caused by commit
589f800bb12c5cd6c9167bbf9bf3cb70cd8e422c ("fastboot: make the raid
autodetect code wait for all devices to init"):

CC init/do_mounts_md.o
init/do_mounts_md.c: In function 'autodetect_raid':
init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'
make[2]: *** [init/do_mounts_md.o] Error 1

Signed-off-by: Adrian Bunk
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-10-17 04:35:34 +0800
ebf3f09c6 Configure out AIO support ... Browse Code »

This patchs adds the CONFIG_AIO option which allows to remove support
for asynchronous I/O operations, that are not necessarly used by
applications, particularly on embedded devices. As this is a
size-reduction option, it depends on CONFIG_EMBEDDED. It allows to
save ~7 kilobytes of kernel code/data:

text data bss dec hex filename
1115067 119180 217088 1451335 162547 vmlinux
1108025 119048 217088 1444161 160941 vmlinux.new
-7042 -132 0 -7174 -1C06 +/-

This patch has been originally written by Matt Mackall
, and is part of the Linux Tiny project.

[randy.dunlap@oracle.com: build fix]
Signed-off-by: Thomas Petazzoni
Cc: Benjamin LaHaise
Cc: Zach Brown
Signed-off-by: Matt Mackall
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thomas Petazzoni
2008-10-17 02:21:51 +0800
889d51a10 initramfs: add option to preserve mtime from initramfs cpio images ... Browse Code »

When unpacking the cpio into the initramfs, mtimes are not preserved by
default. This patch adds an INITRAMFS_PRESERVE_MTIME option that allows
mtimes stored in the cpio image to be used when constructing the
initramfs.

For embedded applications that run exclusively out of the initramfs, this
is invaluable:

When building embedded application initramfs images, its nice to know when
the files were actually created during the build process - that makes it
easier to see what files were modified when so we can compare the files
that are being used on the image with the files used during the build
process. This might help (for example) to determine if the target system
has all the updated files you expect to see w/o having to check MD5s etc.

In our environment, the whole system runs off the initramfs partition, and
seeing the modified times of the shared libraries (for example) helps us
find bugs that may have been introduced by the build system incorrectly
propogating outdated shared libraries into the image.

Similarly, many of the initializion/configuration files in /etc might be
dynamically built by the build system, and knowing when they were modified
helps us sanity check whether the target system has the "latest" files
etc.

Finally, we might use last modified times to determine whether a hot fix
should be applied or not to the running ramfs.

Signed-off-by: Nye Liu
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nye Liu
2008-10-17 02:21:31 +0800
93fd85d00 identify_ramdisk_image(): correct typo about return value in comment ... Browse Code »

identify_ramdisk_image() returns 0 (not -1) if a gzipped ramdisk is found:

if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
printk(KERN_NOTICE
"RAMDISK: Compressed image found at block %d\n",
start_block);
nblocks = 0;
^^^^^^^^^^^
goto done;
}

...

done:
sys_lseek(fd, start_block * BLOCK_SIZE, 0);
kfree(buf);
return nblocks;
^^^^^^^^^^^^^^

Hence correct the typo in the comment, which has existed since the
addition of compressed ramdisk support in 1.3.48.

Signed-off-by: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Geert Uytterhoeven
2008-10-17 02:21:30 +0800

15 Oct, 2008

1 commit

2d51b7537 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot:
raid, fastboot: hide RAID autodetect option if MD is compiled as a module
raid: make RAID autodetect default a KConfig option
warning: fix init do_mounts_md c
fastboot: make the RAID autostart code print a message just before waiting
fastboot: make the raid autodetect code wait for all devices to init
fastboot: Fix bootgraph.pl initcall name regexp
fastboot: fix issues and improve output of bootgraph.pl
Add a script to visualize the kernel boot process / time

Linus Torvalds
2008-10-15 03:28:02 +0800

14 Oct, 2008

11 commits

ca538f6bb tracing/fastboot: add better resolution to initcall debug/tracing ... Browse Code »

Change the time resolution for initcall_debug to microseconds, from
milliseconds. This is handy to determine which initcalls you want to work
on for faster booting.

One one of my test machines, over 90% of the initcalls are less than a
millisecond and (without this patch) these are all reported as 0 msecs.
Working on the 900 us ones is more important than the 4 us ones.

With 'quiet' on the kernel command line, this adds no significant overhead
to kernel boot time.

Signed-off-by: Tim Bird
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Tim Bird
2008-10-14 16:39:27 +0800
097d036a2 tracing/fastboot: only trace non-module initcalls ... Browse Code »

At this time, only built-in initcalls interest us.
We can't really produce a relevant graph if we include
the modules initcall too.

I had good results after this patch (see svg in attachment).

Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-10-14 16:39:17 +0800
5601020fe tracing/fastboot: get the initcall name before it disappears ... Browse Code »

After some initcall traces, some initcall names may be inconsistent.
That's because these functions will disappear from the .init section
and also their name from the symbols table.

So we have to copy the name of the function in a buffer large enough
during the trace appending. It is not costly for the ring_buffer because
the number of initcall entries is commonly not really large.

Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-10-14 16:39:12 +0800
cb5ab7420 tracing/fastboot: change the printing of boot tracer according to bootgraph.pl ... Browse Code »

Change the boot tracer printing to make it parsable for
the scripts/bootgraph.pl script.

We have now to output two lines for each initcall, according to the
printk in do_one_initcall() in init/main.c
We need now the call's time and the return's time.

Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2008-10-14 16:39:11 +0800
3bf77af6e tracing/ftrace: launch boot tracing after pre-smp initcalls ... Browse Code »

Launch the boot tracing inside the initcall_debug area. Old printk
have not been removed to keep the old way of initcall tracing for
backward compatibility.

[ mingo@elte.hu: resolved conflicts ]
Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar

Frédéric Weisbecker
2008-10-14 16:38:50 +0800
aa5d9151f tracing/fastboot: add a script to visualize the kernel boot process / time ... Browse Code »

When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with the fastboot asynchronous
initcall level, it's very valuable to see which initcall gets run where
and when.

This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).

Signed-off-by: Arjan van de Ven

Arjan van de Ven
2008-10-14 16:38:46 +0800
68bf21aa1 ftrace: mcount call site on boot nops core ... Browse Code »

This is the infrastructure to the converting the mcount call sites
recorded by the __mcount_loc section into nops on boot. It also allows
for using these sites to enable tracing as normal. When the __mcount_loc
section is used, the "ftraced" kernel thread is disabled.

This uses the current infrastructure to record the mcount call sites
as well as convert them to nops. The mcount function is kept as a stub
on boot up and not converted to the ftrace_record_ip function. We use the
ftrace_record_ip to only record from the table.

This patch does not handle modules. That comes with a later patch.

Signed-off-by: Steven Rostedt
Signed-off-by: Ingo Molnar

Steven Rostedt
2008-10-14 16:34:44 +0800
5f87f1121 tracing: clean up tracepoints kconfig structure ... Browse Code »

do not expose users to CONFIG_TRACEPOINTS - tracers can select it
just fine.

update ftrace to select CONFIG_TRACEPOINTS.

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-10-14 16:33:32 +0800
fa340d9c0 tracing: disable tracepoints by default ... Browse Code »

while it's arguably low overhead, we dont enable new features by default.

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-10-14 16:32:56 +0800
97e1c18e8 tracing: Kernel Tracepoints ... Browse Code »

Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.

Taken from the documentation patch :

"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."

Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".

We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.

Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure not type mismatch happens due to
connexion of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.

Masami Hiramatsu :
Tested on x86-64.

Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).

About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.

Quoting Hideo Aoki about Markers :

I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.

While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.

I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.

I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.

The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8

Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.

Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.

* 2 CPUs

Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------

* 4 CPUs

Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------

* 8 CPUs

Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------

Signed-off-by: Mathieu Desnoyers
Acked-by: Masami Hiramatsu
Acked-by: 'Peter Zijlstra'
Signed-off-by: Ingo Molnar

Mathieu Desnoyers
2008-10-14 16:28:28 +0800
20272c899 Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc ... Browse Code »

* 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
proc: remove kernel.maps_protect
proc: remove now unneeded ADDBUF macro
[PATCH] proc: show personality via /proc/pid/personality
[PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
proc: make grab_header() static
proc: remove unused get_dma_list()
proc: remove dummy vmcore_open()
proc: proc_sys_root tweak
proc: fix return value of proc_reg_open() in "too late" case

Fixed up trivial conflict in removed file arch/sparc/include/asm/dma_32.h

Linus Torvalds
2008-10-14 01:04:04 +0800

12 Oct, 2008

5 commits

a364092a4 raid: make RAID autodetect default a KConfig option ... Browse Code »

RAID autodetect has the side effect of requiring synchronisation
of all device drivers, which can make the boot several seconds longer
(I've measured 7 on one of my laptops).... even for systems that don't
have RAID setup for the root filesystem (the only FS where this matters).

This patch makes the default for autodetect a config option; either way
the user can always override via the kernel command line.

Signed-off-by: Arjan van de Ven
Acked-by: NeilBrown

Arjan van de Ven
2008-10-12 23:25:02 +0800
82cbc11a4 warning: fix init do_mounts_md c ... Browse Code »

fix warning:

init/do_mounts_md.c: In function ‘md_run_setup’:
init/do_mounts_md.c:282: warning: ISO C90 forbids mixed declarations and code

also, use the opportunity to put the RAID autodetection code
into a separate function - this also solves a checkpatch style warning.

No code changed:

md5:
aa36a35faef371b05f1974ad583bdbbd do_mounts_md.o.before.asm
aa36a35faef371b05f1974ad583bdbbd do_mounts_md.o.after.asm

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-10-12 23:24:34 +0800
02c15def8 fastboot: make the RAID autostart code print a message just before waiting ... Browse Code »

As requested/suggested by Neil Brown: make the raid code print that it's
about to wait for probing to be done as well as give a suggestion on how
to disable the probing if the user doesn't use raid.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com

Arjan van de Ven
2008-10-12 23:23:53 +0800
589f800bb fastboot: make the raid autodetect code wait for all devices to init ... Browse Code »

The raid autodetect code really needs to have all devices probed before
it can detect raid arrays; not doing so would give rather messy situations
where arrays would get detected as degraded while they shouldn't be etc.

This is in preparation of removing the "wait for everything to init"
code that makes everyone pay, not just raid users.

Signed-off-by: Arjan van de Ven

Arjan van de Ven
2008-10-12 23:23:04 +0800
f9b9796ad Add a script to visualize the kernel boot process / time ... Browse Code »

When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with some of the initializing
going asynchronous soon, it's valuable to track/print which worker thread
is executing the initialization.

This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).

Signed-off-by: Arjan van de Ven

Arjan van de Ven
2008-10-12 23:07:20 +0800

10 Oct, 2008

1 commit

53167a3ef proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig ... Browse Code »

Signed-off-by: Alexey Dobriyan

Alexey Dobriyan
2008-10-10 08:18:57 +0800

09 Oct, 2008

1 commit

55dc7db70 init: DEBUG_BLOCK_EXT_DEVT requires explicit root= param ... Browse Code »

DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers and root
device number set using rdev become meaningless. Root devices should
be explicitly specified using textual names. Warn about it if root
can't be found and DEBUG_BLOCK_EXT_DEVT is enabled. Also, add warning
to the help text.

Signed-off-by: Tejun Heo
Cc: Bartlomiej Zolnierkiewicz
Signed-off-by: Jens Axboe

Tejun Heo
2008-10-09 14:56:11 +0800