13 Dec, 2006
1 commit
-
Signed-off-by: Jesper Juhl
Acked-By: Randy Dunlap
Signed-off-by: Adrian Bunk
12 Dec, 2006
2 commits
-
We should not initialize rootfs before all the core initializers have
run. So do it as a separate stage just before starting the regular
driver initializers.Signed-off-by: Linus Torvalds
-
As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary. It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649ba6d71416712e798276bf7c40b64e6e5:
"[PATCH] Fix linux banner utsname information").This just restores "linux_banner" as a static string, which should fix
the version finding. And /proc/version simply uses a different string.To avoid wasting even that miniscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.Cc: Andy Whitcroft
Acked-by: Herbert Poetzl
Cc: Andi Kleen
Cc: Andrew Morton
Cc: Steve Fox
Acked-by: Olaf Hering
Signed-off-by: Linus Torvalds
11 Dec, 2006
1 commit
-
The present per-task IO accounting isn't very useful. It simply counts the
number of bytes passed into read() and write(). So if a process reads 1MB
from an already-cached file, it is accused of having performed 1MB of I/O,
which is wrong.(David Wright had some comments on the applicability of the present logical IO accounting:
For billing purposes it is useless but for workload analysis it is very
usefulread_bytes/read_calls average read request size
write_bytes/write_calls average write request sizeread_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
write_bytes/write_blocks ie logical/physical guess since pdflush writes can
be missedI often look for logical larger than physical to see filesystem cache
problems. And the bytes/cpusec can help find applications that are
dominating the cache and causing slow interactive response from page cache
contention.I want to find the IO intensive applications and make sure they are doing
efficient IO. Thus the acctcms(sysV) or csacms command would give the high
IO commands).This patchset adds new accounting which tries to be more accurate. We account
for three things:reads:
attempt to count the number of bytes which this process really did cause
to be fetched from the storage layer. Done at the submit_bio() level, so it
is accurate for block-backed filesystems. I also attempt to wire up NFS and
CIFS.writes:
attempt to count the number of bytes which this process caused to be sent
to the storage layer. This is done at page-dirtying time.The big inaccuracy here is truncate. If a process writes 1MB to a file
and then deletes the file, it will in fact perform no writeout. But it will
have been accounted as having caused 1MB of write.So...
cancelled_writes:
account the number of bytes which this process caused to not happen, by
truncating pagecache.We _could_ just subtract this from the process's `write' accounting. But
that means that some processes would be reported to have done negative
amounts of write IO, which is silly.So we just report the raw number and punt this decision up to userspace.
Now, we _could_ account for writes at the physical I/O level. But
- This would require that we track memory-dirtying tasks at the per-page
level (would require a new pointer in struct page).- It would mean that IO statistics for a process are usually only available
long after that process has exitted. Which means that we probably cannot
communicate this info via taskstats.This patch:
Wire up the kernel-private data structures and the accessor functions to
manipulate them.Cc: Jay Lan
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Chris Sturtivant
Cc: Tony Ernst
Cc: Guillaume Thouvenin
Cc: David Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Dec, 2006
2 commits
-
Add a per pid_namespace child-reaper. This is needed so processes are reaped
within the same pid space and do not spill over to the parent pid space. Its
also needed so containers preserve existing semantic that pid == 1 would reap
orphaned children.This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285
Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
utsname information is shown in the linux banner, which also is used for
/proc/version (which can have different utsname values inside a uts
namespaces). this patch makes the varying data arguments and changes the
string to a format string, using those arguments.Signed-off-by: Herbert Poetzl
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
4 commits
-
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
[PATCH] x86-64: Export smp_call_function_single
[PATCH] i386: Clean up smp_tune_scheduling()
[PATCH] unwinder: move .eh_frame to RODATA
[PATCH] unwinder: fully support linker generated .eh_frame_hdr section
[PATCH] x86-64: don't use set_irq_regs()
[PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
[PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
[PATCH] i386: replace kmalloc+memset with kzalloc
[PATCH] x86-64: remove remaining pc98 code
[PATCH] x86-64: remove unused variable
[PATCH] x86-64: Fix constraints in atomic_add_return()
[PATCH] x86-64: fix asm constraints in i386 atomic_add_return
[PATCH] x86-64: Correct documentation for bzImage protocol v2.05
[PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
[PATCH] x86-64: Fix numaq build error
[PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
[PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
[PATCH] x86-64: Clarify error message in GART code
[PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
[PATCH] x86-64: Remove unwind stack pointer alignment forcing again
...Fixed conflict in include/linux/uaccess.h manually
Signed-off-by: Linus Torvalds
-
Keith says
Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux), wait_hpet_tick is
optimized away to a never ending loop and the kernel hangs on boot in timer
setup.0000001a :
1a: 55 push %ebp
1b: 89 e5 mov %esp,%ebp
1d: eb fe jmp 1dThis is not a problem with gcc 3.3.5. Adding barrier() calls to
wait_hpet_tick does not help, making the variables volatile does.And the consensus is that gcc-4.1.0 is busted. Suse went and shipped
gcc-4.1.0 so we cannot ban it. Add a warning.Cc: Keith Owens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It turns out that the "-c" option of cpio is highly unportable even between
distros let alone unix variants, and may actually make the wrong type of
cpio archive. I just wasted quite some time on this, and the kernel can
detect this and warn about it (it's __init memory so it gets thrown away
and thus there is no runtime overhead)Signed-off-by: Arjan van de Ven
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham
Cc: "Rafael J. Wysocki"
Cc: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Dec, 2006
1 commit
-
Both lhype and Xen want to call the core of the x86 cpu detect code before
calling start_kernel.(extracted from larger patch)
AK: folded in start_kernel header patch
Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Rusty Russell
Signed-off-by: Andi Kleen
Cc: Jeremy Fitzhardinge
Cc: Andi Kleen
Signed-off-by: Andrew Morton
02 Dec, 2006
1 commit
-
Provide a way to support older versions of udev that are shipped in
older distros. If this option is disabled, it will also turn off the
compatible symlinks in sysfs that older programs might rely on.When in doubt, or if running a distro older than 2006, say Yes here.
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
09 Nov, 2006
1 commit
-
The basic issue is that despite have been deprecated and warned about as a
very bad thing in the man pages since its inception there are a few real
users of sys_sysctl. It was my assumption that because sysctl had been
deprecated for all of 2.6 there would be no user space users by this point,
so I initially gave sys_sysctl a very short deprecation period.Now that I know there are a few real users the only sane way to proceed
with deprecation is to push the time limit out to a year or two work and
work with distributions that have big testing pools like fedora core to
find these last remaining users.Which means that the sys_sysctl interface needs to be maintained in the
meantime.Since I have provided a technical measure that allows us to add new sysctl
entries without reserving more binary numbers I believe that is enough to
fix the sys_sysctl binary interface maintenance problems, because there is
no longer a need to change the binary interface at all.Since the sys_sysctl implementation needs to stay around for a while and
the worst of the maintenance issues that caused us to occasionally break
the ABI have been addressed I don't see any advantage in continuing with
the removal of sys_sysctl.So instead of merely increasing the deprecation period this patch removes
the deprecation of sys_sysctl and modifies the kernel to compile the code
in by default.With committing to maintain sys_sysctl we get all of the advantages of a
fast interface for anything that needs it. Currently sys_sysctl is about
5x faster than /proc/sys, for the same string data.Signed-off-by: Eric W. Biederman
Acked-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Oct, 2006
1 commit
-
This changes the dwarf2 unwinder to do a binary search for CIEs
instead of a linear work. The linker is unfortunately not
able to build a proper lookup table at link time, instead it creates
one at runtime as soon as the bootmem allocator is usable (so you'll continue
using the linear lookup for the first [hopefully] few calls).
The code should be ready to utilize a build-time created table once
a fixed linker becomes available.Signed-off-by: Jan Beulich
Signed-off-by: Andi Kleen
21 Oct, 2006
1 commit
-
This should make sure that, for UML, host's configuration files are not
considered, which avoids various pains to the user. Our dependency are such
that the obtained Kconfig will be valid and will lead to successful
compilation - however they cannot prevent an user from disabling any boot
device, and if an option is not set in the read .config (say
/boot/config-XXX), with make menuconfig ARCH=um, it is not set. This always
disables UBD and all console I/O channels, which leads to non-working UML
kernels, so this bothers users - especially now, since it will happen on
almost every machine (/boot/config-`uname -r` exists almost on every machine).
It can be workarounded with make defconfig ARCH=um, but it is non-obvious and
can be avoided, so please _do_ merge this patch.Given the existence of options, it could be interesting to implement
(additionally) "option required" - with it, Kconfig will refuse reading a
.config file (from wherever it comes) if the given option is not set. With
this, one could mark with it the option characteristic of the given
architecture (it was an old proposal of Roman Zippel, when I pointed out our
problem):config UML
option required
default yHowever this should be further discussed:
*) for x86, it must support constructs like:==arch/i386/Kconfig==
config 64BIT
option required
default n
where Kconfig must require that CONFIG_64BIT is disabled or not present in the
read .config.*) do we want to do such checks only for the starting defconfig or also for
.config? Which leads to:
*) I may want to port a x86_64 .config to x86 and viceversa, or even among more
different archs. Should that be allowed, and in which measure (the user may
force skipping the check for a .config or it is only given a warning by
default)?Cc: Roman Zippel
Cc:
Signed-off-by: Paolo 'Blaisorblade' Giarrusso
Cc: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Oct, 2006
1 commit
-
kbuild explicitly includes this at build time.
Signed-off-by: Dave Jones
03 Oct, 2006
1 commit
-
Once upon a time we needed to fixed limit to the number of md devices,
probably because we preallocated some array. This need no longer exists, but
we still have an arbitrary limit.So remove MAX_MD_DEVS and allow as many devices as we can fit into the 'minor'
part of a device number.Also remove some useless noise at init time (which reports MAX_MD_DEVS) and
remove MD_THREAD_NAME_MAX which hasn't been used for a while.Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Oct, 2006
4 commits
-
There are a few places in the kernel where the init task is signaled. The
ctrl+alt+del sequence is one them. It kills a task, usually init, using a
cached pid (cad_pid).This patch replaces the pid_t by a struct pid to avoid pid wrap around
problem. The struct pid is initialized at boot time in init() and can be
modified through systctl with/proc/sys/kernel/cad_pid
[ I haven't found any distro using it ? ]
It also introduces a small helper routine kill_cad_pid() which is used
where it seemed ok to use cad_pid instead of pid 1.[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Cedric Le Goater
Cc: Eric W. Biederman
Cc: Martin Schwidefsky
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The use of execve() in the kernel is dubious, since it relies on the
__KERNEL_SYSCALLS__ mechanism that stores the result in a global errno
variable. As a first step of getting rid of this, change all users to a
global kernel_execve function that returns a proper error code.This function is a terrible hack, and a later patch removes it again after the
kernel syscalls are gone.Signed-off-by: Arnd Bergmann
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Russell King
Cc: Ian Molton
Cc: Mikael Starvik
Cc: David Howells
Cc: Yoshinori Sato
Cc: Hirokazu Takata
Cc: Ralf Baechle
Cc: Kyle McMartin
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: Kazumoto Kojima
Cc: Richard Curnow
Cc: William Lee Irwin III
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Miles Bader
Cc: Chris Zankel
Cc: "Luck, Tony"
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch set allows to unshare IPCs and have a private set of IPC objects
(sem, shm, msg) inside namespace. Basically, it is another building block of
containers functionality.This patch implements core IPC namespace changes:
- ipc_namespace structure
- new config option CONFIG_IPC_NS
- adds CLONE_NEWIPC flag
- unshare support[clg@fr.ibm.com: small fix for unshare of ipc namespace]
[akpm@osdl.org: build fix]
Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Signed-off-by: Cedric Le Goater
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch defines the uts namespace and some manipulators.
Adds the uts namespace to task_struct, and initializes a
system-wide init namespace.It leaves a #define for system_utsname so sysctl will compile.
This define will be removed in a separate patch.[akpm@osdl.org: build fix, cleanup]
Signed-off-by: Serge Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Oct, 2006
4 commits
-
Add extended system accounting handling over taskstats interface. A
CONFIG_TASK_XACCT flag is created to enable the extended accounting code.Signed-off-by: Jay Lan
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Jes Sorensen
Cc: Chris Sturtivant
Cc: Tony Ernst
Cc: Guillaume Thouvenin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SYSCTL should still depend on EMBEDDED. This unbreaks the EMBEDDED menu
(from the recent SYSCTL_SYCALL menu option patch).Fix typos in new SYSCTL_SYSCALL menu.
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The driver for /proc/config.gz consumes rather a lot of memory and it is in
fact possible to build it as a module.In some ways this is a bit risky, because the .config which is used for
compiling kernel/configs.c isn't necessarily the same as the .config which was
used to build vmlinux.But OTOH the potential memory savings are decent, and it'd be fairly dumb to
build your configs.o with a different .config.Signed-off-by: Andrew Morton
Cc: "Randy.Dunlap"
Cc: Sam Ravnborg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:(*) Block I/O tracing.
(*) Disk partition code.
(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.(*) MTD blockdev handling and FTL.
(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:(*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
(*) /proc/devices does not list any blockdevs.
(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.Signed-Off-By: David Howells
Signed-off-by: Jens Axboe
27 Sep, 2006
3 commits
-
Since sys_sysctl is deprecated start allow it to be compiled out. This
should catch any remaining user space code that cares, and paves the way
for further sysctl cleanups.[akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning]
Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Resetting the devices during driver initialization can be a costly
operation in terms of time (especially scsi devices). This option can be
used by drivers to know that user forcibly wants the devices to be reset
during initialization.This option can be useful while kernel is booting in unreliable
environment. For ex. during kdump boot where devices are in unknown
random state and BIOS execution has been skipped.Signed-off-by: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
26 Sep, 2006
3 commits
-
Needed for use of the unwinder in lockdep, because lockdep runs really
early too.Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen
-
We currently assume that boot parameters which are handled by
early_param() will not overlap boot parameters handled by __setup: if
they do, behaviour is dependent on link order, usually meaning __setup
will not get called.ACPI wants to use early_param("pci"), and pci uses __setup("pci="), so
we modify the core to let them coexist: "pci=noacpi" will now get
passed to both.Signed-off-by: Rusty Russell
Signed-off-by: Andi Kleen -
This adds the infrastructure for drivers to do a threaded probe, and
waits at init time for all currently outstanding probes to complete.A new kernel thread will be created when the probe() function for the
driver is called, if the multithread_probe bit is set in the driver
saying it can support this kind of operation.I have tested this with USB and PCI, and it works, and shaves off a lot
of time in the boot process, but there are issues with finding root boot
disks, and some USB drivers assume that this can never happen, so it is
currently not enabled for any bus type. Individual drivers can enable
this right now if they wish, and bus authors can selectivly turn it on
as well, once they determine that their subsystem will work properly
with it.Signed-off-by: Greg Kroah-Hartman
17 Sep, 2006
1 commit
-
Fix two problems with the CONFIG_EMBEDDED submenu:
(1) The menu was split in two by the rt_mutex patch, which moved
half the items into the "General setup" menu.(2) CONFIG_SYSCTL and CONFIG_UID16 were added to the main menu
instead of the submenu.Signed-off-by: Chuck Ebbert
Cc: Sam Ravnborg
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Jul, 2006
3 commits
-
Usage of taskstats interface by delay accounting.
Signed-off-by: Shailabh Nagar
Signed-off-by: Balbir Singh
Cc: Jes Sorensen
Cc: Peter Chubb
Cc: Erich Focht
Cc: Levent Serinol
Cc: Jay Lan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC
family), for getting statistics of tasks and thread groups during their
lifetime and when they exit. The interface is intended for use by multiple
accounting packages though it is being created in the context of delay
accounting.This patch creates the interface without populating the fields of the data
that is sent to the user in response to a command or upon the exit of a task.
Each accounting package interested in using taskstats has to provide an
additional patch to add its stats to the common structure.[akpm@osdl.org: cleanups, Kconfig fix]
Signed-off-by: Shailabh Nagar
Signed-off-by: Balbir Singh
Cc: Jes Sorensen
Cc: Peter Chubb
Cc: Erich Focht
Cc: Levent Serinol
Cc: Jay Lan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Initialization code related to collection of per-task "delay" statistics which
measure how long it had to wait for cpu, sync block io, swapping etc. The
collection of statistics and the interface are in other patches. This patch
sets up the data structures and allows the statistics collection to be
disabled through a kernel boot parameter.Signed-off-by: Shailabh Nagar
Signed-off-by: Balbir Singh
Cc: Jes Sorensen
Cc: Peter Chubb
Cc: Erich Focht
Cc: Levent Serinol
Cc: Jay Lan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Jul, 2006
5 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild:
kbuild: introduce utsrelease.h
kbuild: explicit turn off gcc stack-protector -
Teach special (recursive) locking code to the lock validator. Has no effect
on non-lockdep kernels.Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Do 'make oldconfig' and accept all the defaults for new config options -
reboot into the kernel and if everything goes well it should boot up fine and
you should have /proc/lockdep and /proc/lockdep_stats files.Typically if the lock validator finds some problem it will print out
voluminous debug output that begins with "BUG: ..." and which syslog output
can be used by kernel developers to figure out the precise locking scenario.What does the lock validator do? It "observes" and maps all locking rules as
they occur dynamically (as triggered by the kernel's natural use of spinlocks,
rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a
new locking scenario, it validates this new rule against the existing set of
rules. If this new rule is consistent with the existing set of rules then the
new rule is added transparently and the kernel continues as normal. If the
new rule could create a deadlock scenario then this condition is printed out.When determining validity of locking, all possible "deadlock scenarios" are
considered: assuming arbitrary number of CPUs, arbitrary irq context and task
context constellations, running arbitrary combinations of all the existing
locking scenarios. In a typical system this means millions of separate
scenarios. This is why we call it a "locking correctness" validator - for all
rules that are observed the lock validator proves it with mathematical
certainty that a deadlock could not occur (assuming that the lock validator
implementation itself is correct and its internal data structures are not
corrupted by some other kernel subsystem). [see more details and conditionals
of this statement in include/linux/lockdep.h and
Documentation/lockdep-design.txt]Furthermore, this "all possible scenarios" property of the validator also
enables the finding of complex, highly unlikely multi-CPU multi-context races
via single single-context rules, increasing the likelyhood of finding bugs
drastically. In practical terms: the lock validator already found a bug in
the upstream kernel that could only occur on systems with 3 or more CPUs, and
which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
That bug was found and reported on a single-CPU system (!). So in essence a
race will be found "piecemail-wise", triggering all the necessary components
for the race, without having to reproduce the race scenario itself! In its
short existence the lock validator found and reported many bugs before they
actually caused a real deadlock.To further increase the efficiency of the validator, the mapping is not per
"lock instance", but per "lock-class". For example, all struct inode objects
in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached,
then there are 10,000 lock objects. But ->inotify_mutex is a single "lock
type", and all locking activities that occur against ->inotify_mutex are
"unified" into this single lock-class. The advantage of the lock-class
approach is that all historical ->inotify_mutex uses are mapped into a single
(and as narrow as possible) set of locking rules - regardless of how many
different tasks or inode structures it took to build this set of rules. The
set of rules persist during the lifetime of the kernel.To see the rough magnitude of checking that the lock validator does, here's a
portion of /proc/lockdep_stats, fresh after bootup:lock-classes: 694 [max: 2048]
direct dependencies: 1598 [max: 8192]
indirect dependencies: 17896
all direct dependencies: 16206
dependency chains: 1910 [max: 8192]
in-hardirq chains: 17
in-softirq chains: 105
in-process chains: 1065
stack-trace entries: 38761 [max: 131072]
combined max dependencies: 2033928
hardirq-safe locks: 24
hardirq-unsafe locks: 176
softirq-safe locks: 53
softirq-unsafe locks: 137
irq-safe locks: 59
irq-unsafe locks: 176The lock validator has observed 1598 actual single-thread locking patterns,
and has validated all possible 2033928 distinct locking scenarios.More details about the design of the lock validator can be found in
Documentation/lockdep-design.txt, which can also found at:http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
[bunk@stusta.de: cleanups]
Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Generic lock debugging:
- generalized lock debugging framework. For example, a bug in one lock
subsystem turns off debugging in all lock subsystems.- got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
the mutex/rtmutex debugging code: it caused way too much prototype
hackery, and lockdep will give the same information anyway.- ability to do silent tests
- check lock freeing in vfree too.
- more finegrained debugging options, to allow distributions to
turn off more expensive debugging features.There's no separate 'held mutexes' list anymore - but there's a 'held locks'
stack within lockdep, which unifies deadlock detection across all lock
classes. (this is independent of the lockdep validation stuff - lockdep first
checks whether we are holding a lock already)Here are the current debugging options:
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=ywhich do:
config DEBUG_MUTEXES
bool "Mutex debugging, basic checks"config DEBUG_LOCK_ALLOC
bool "Detect incorrect freeing of live mutexes"Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
s390's console_init must enable interrupts, but early_boot_irqs_on() gets
called later. To avoid problems move console_init() after local_irq_enable().Signed-off-by: Heiko Carstens
Acked-by: Ingo Molnar
Cc: Martin Schwidefsky
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds