09 May, 2008
1 commit
-
Linus found a logic bug: we ignore the version number in a module's
vermagic string if we have CONFIG_MODVERSIONS set, but modversions
also lets through a module with no __versions section for modprobe
--force (with tainting, but still).
We should only ignore the start of the vermagic string if the module
actually *has* crcs to check, rather than (say) having an
entertaining hissy fit and creating a config option to work around the
buggy code.
Signed-off-by: Rusty Russell
Signed-off-by: Linus Torvalds
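The rule above can be illustrated with a user-space sketch. It assumes the kernel version is the first space-separated field of the vermagic string. The helper name `vermagic_matches` is hypothetical and is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: compare a module's vermagic with the kernel's.
 * Assumes the version number is everything before the first space.
 * The version is skipped only when the module carries CRCs (a
 * __versions section), because those are then checked per symbol. */
static bool vermagic_matches(const char *mod_magic, const char *kernel_magic,
                             bool has_crcs)
{
    assert(mod_magic && kernel_magic);
    if (has_crcs) {
        mod_magic += strcspn(mod_magic, " ");
        kernel_magic += strcspn(kernel_magic, " ");
    }
    return strcmp(mod_magic, kernel_magic) == 0;
}
```

With this rule, a 2.6.25 module that has no CRCs fails the check on a 2.6.26 kernel. The same module with CRCs passes this step and is then judged by its per-symbol CRCs.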
07 May, 2008
1 commit
-
Fix pcspkr dependencies: make the pcspkr platform
drivers depend on a platform device, and
not the other way around.
Signed-off-by: Stas Sergeev
Acked-by: Thomas Gleixner
Acked-by: Dmitry Torokhov
CC: Vojtech Pavlik
CC: Michael Opdenacker
[fixed for 2.6.26-rc1 by tiwai]
Signed-off-by: Takashi Iwai
06 May, 2008
3 commits
-
GROUP_SCHED is confirmed to cause unacceptable latencies, see:
http://lkml.org/lkml/2008/5/2/370.
Mark it EXPERIMENTAL and default to no for now.
Signed-off-by: Parag Warudkar
Signed-off-by: Ingo Molnar -
This replaces the rq->clock stuff (and possibly cpu_clock()).
- architectures that have an 'imperfect' hardware clock can set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
- the 'jiffy' window might be superfluous when we update tick_gtod
before the __update_sched_clock() call in sched_clock_tick()
- cpu_clock() might be implemented as:
sched_clock_cpu(smp_processor_id())
if the accuracy proves good enough - how far can TSC drift in a
single jiffy when considering the filtering and idle hooks?
[ mingo@elte.hu: various fixes and cleanups ]
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar -
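The following is a rough user-space sketch of the filtering idea. At each tick, a raw per-CPU clock is anchored to a stable time value. Between ticks, the clock is kept monotonic and is never allowed to run more than one jiffy ahead of that anchor. The struct, its field names and the 1 ms tick are assumptions made for illustration. This is not the kernel's sched_clock_cpu().

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NS 1000000ULL   /* assumed: one jiffy = 1 ms (HZ=1000) */

/* Hypothetical per-CPU state; names are illustrative only. */
struct cpu_clock_state {
    uint64_t tick_raw;   /* raw (e.g. TSC-derived) ns sampled at the last tick */
    uint64_t tick_gtod;  /* stable time-of-day ns sampled at the last tick */
    uint64_t clock;      /* last value handed out to callers */
};

/* Called from the periodic tick: re-anchor the raw clock. */
static void clock_tick(struct cpu_clock_state *s, uint64_t raw_now, uint64_t gtod_now)
{
    s->tick_raw = raw_now;
    s->tick_gtod = gtod_now;
}

/* Filter a raw reading: at most one tick past the last anchor, and never
 * behind the previous value (monotonicity wins if the bounds conflict). */
static uint64_t filtered_clock(struct cpu_clock_state *s, uint64_t raw_now)
{
    uint64_t delta = raw_now > s->tick_raw ? raw_now - s->tick_raw : 0;
    uint64_t clock = s->tick_gtod + delta;
    uint64_t max_clock = s->tick_gtod + TICK_NS;
    uint64_t min_clock = s->clock > s->tick_gtod ? s->clock : s->tick_gtod;

    if (clock > max_clock)
        clock = max_clock;
    if (clock < min_clock)
        clock = min_clock;
    s->clock = clock;
    return clock;
}
```

A raw clock that jumps backwards is held at the last value. A raw clock that races ahead is capped at the end of the current jiffy window.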
Add HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select.
The next change utilizes it.
Signed-off-by: Ingo Molnar
05 May, 2008
1 commit
-
The kernel module loader used to be much too happy to allow loading of
modules for the wrong kernel version by default. For example, if you
had MODVERSIONS enabled, but tried to load a module with no version
info, it would happily load it and taint the kernel - whether it was
likely to actually work or not!Generally, such forced module loading should be considered a really
really bad idea, so make it conditional on a new config option
(MODULE_FORCE_LOAD), and make it default to off.If somebody really wants to force module loads, that's their problem,
but we should not encourage it. Especially as it happened to me by
mistake (ie regular unversioned Fedora modules getting loaded) causing
lots of strange behavior.Signed-off-by: Linus Torvalds
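The policy change can be illustrated with a small user-space sketch. `try_force_load_sketch()` and the `kernel_tainted` flag are hypothetical stand-ins. The `force_load_enabled` parameter models the MODULE_FORCE_LOAD option, and refusing with -ENOEXEC is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool kernel_tainted;   /* hypothetical stand-in for the taint flags */

/* Hypothetical helper, called when a module lacks version information.
 * force_load_enabled models MODULE_FORCE_LOAD, which now defaults to off. */
static int try_force_load_sketch(const char *modname, const char *reason,
                                 bool force_load_enabled)
{
    if (!force_load_enabled)
        return -ENOEXEC;              /* new default: refuse the module */
    fprintf(stderr, "%s: %s: kernel tainted.\n", modname, reason);
    kernel_tainted = true;            /* load anyway, but remember it */
    return 0;
}
```

Only a kernel explicitly configured to allow forcing falls through to the old "load and taint" behaviour.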
02 May, 2008
1 commit
-
If we make SLUB_DEBUG depend on SYSFS then we can simplify some
#ifdefs and avoid others.
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg
30 Apr, 2008
3 commits
-
We can see an ever repeating problem pattern with objects of any kind in the
kernel:
1) freeing of active objects
2) reinitialization of active objects
Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore. One problem spot is
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.
While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause. This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.
The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.
Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.
The problems described above are not restricted to timers, but timers tend to
expose them, usually in a full system crash. Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.
Instead of creating specialized debugging code for the timer subsystem, a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.
The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations, and provides additional checks whenever kernel memory is
freed.
The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list
Each operation is sanity checked before the operation is executed, and the
subsystem specific code can provide a fixup function which allows it to prevent
the damage the operation would otherwise cause. When the sanity check triggers,
a warning message and a stack trace are printed.
The list of operations can be extended if the need arises. For now it's
limited to the requirements of the first user (timers).
The core code enqueues the objects into hash buckets. The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree. Each bucket has its own spinlock to avoid contention on a
global lock.
The debug code can be compiled in without being active. The runtime overhead
is minimal and could be optimized by asm alternatives. A kernel command line
option enables the debugging code.
Thanks to Ingo Molnar for review, suggestions and cleanup patches.
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: Greg KH
Cc: Randy Dunlap
Cc: Kay Sievers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
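The following is a minimal single-threaded user-space sketch of address-hashed object tracking. It assumes buckets are indexed by 4 kB address chunks, so that a free of a memory range only needs to walk the buckets that cover it. All names are hypothetical, and the per-bucket locks described above are omitted. This is not the debugobjects API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SHIFT 12   /* assumed: hash on 4 kB address chunks */
#define NR_BUCKETS  256  /* assumed table size */

enum obj_state { OBJ_INIT, OBJ_ACTIVE, OBJ_INACTIVE };

struct tracked_obj {
    const void *addr;
    enum obj_state state;
    struct tracked_obj *next;
};

/* One list per bucket; a real implementation pairs each with a lock. */
static struct tracked_obj *buckets[NR_BUCKETS];

static struct tracked_obj **bucket_of(uintptr_t addr)
{
    return &buckets[(addr >> CHUNK_SHIFT) % NR_BUCKETS];
}

static struct tracked_obj *lookup(const void *addr)
{
    for (struct tracked_obj *o = *bucket_of((uintptr_t)addr); o; o = o->next)
        if (o->addr == addr)
            return o;
    return NULL;
}

/* Checks return 0 if the operation is sane, -1 if it was caught. */
static int obj_init(const void *addr)
{
    struct tracked_obj *o = lookup(addr);

    if (o && o->state == OBJ_ACTIVE) {
        fprintf(stderr, "sketch: init of active object %p\n", addr);
        return -1;   /* a subsystem fixup hook could repair things here */
    }
    if (!o) {
        struct tracked_obj **b = bucket_of((uintptr_t)addr);
        o = malloc(sizeof(*o));
        if (!o)
            return -1;
        o->addr = addr;
        o->next = *b;
        *b = o;
    }
    o->state = OBJ_INIT;
    return 0;
}

/* Adding to (active=1) or deleting from (active=0) a subsystem list. */
static int obj_set_active(const void *addr, int active)
{
    struct tracked_obj *o = lookup(addr);

    if (!o || (o->state == OBJ_ACTIVE) == active)
        return -1;   /* untracked, double add or double delete */
    o->state = active ? OBJ_ACTIVE : OBJ_INACTIVE;
    return 0;
}

/* Free-path hook: count active objects in [addr, addr + size) and stop
 * tracking everything in that range. Only covering buckets are walked. */
static int check_no_obj_freed(const void *addr, size_t size)
{
    uintptr_t start = (uintptr_t)addr, end = start + size;
    int active = 0;

    assert(size > 0);
    for (uintptr_t c = start >> CHUNK_SHIFT; c <= (end - 1) >> CHUNK_SHIFT; c++) {
        struct tracked_obj **pp = &buckets[c % NR_BUCKETS];
        while (*pp) {
            struct tracked_obj *o = *pp;
            uintptr_t a = (uintptr_t)o->addr;
            if (a >= start && a < end) {
                active += (o->state == OBJ_ACTIVE);
                *pp = o->next;
                free(o);
            } else {
                pp = &o->next;
            }
        }
    }
    return active;
}
```

Both failure classes from the description show up as a caught operation rather than a later crash: reinitializing an active object, and freeing memory that still holds one.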
There are some places that are known to operate on tasks'
global pids only:
* the rest_init() call (called on boot)
* kgdb's getthread
* the create_kthread() (since the kthread is run in init ns)
So use find_task_by_pid_ns(..., &init_pid_ns) there
and schedule find_task_by_pid for removal.
[sukadev@us.ibm.com: Fix warning in kernel/pid.c]
Signed-off-by: Pavel Emelyanov
Cc: "Eric W. Biederman"
Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The global init has a lot of long standing problems with the unhandled fatal
signals.
- The "is_global_init(current)" check in get_signal_to_deliver()
protects only the main thread. A sub-thread can dequeue the fatal
signal and shut down the whole thread group except the main thread.
If it dequeues SIGSTOP, /sbin/init will be stopped; this is not
right either. Note that we can't use is_global_init(->group_leader):
this breaks exec and this can't solve other problems we have.
- Even if afterwards ignored, fatal signals set SIGNAL_GROUP_EXIT
on delivery. This breaks exec, has other bad implications, and this
is just wrong.
Introduce the new SIGNAL_UNKILLABLE flag to fix these problems. It also helps
to solve some other problems addressed by the subsequent patches.
Currently we use this flag for the global init only, but it could also be used
by kthreads and (perhaps) by the sub-namespace inits.
Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Apr, 2008
11 commits
-
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionally at boot time rather than creating it on demand when idr_init()
is called the first time.
This change also enables us to eliminate the check every time idr_init() is
called.
[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Disable sysctl_check.c for embedded targets. This saves about 11 kB
in .text and another 11 kB in .data on a PXA255 embedded platform.
Signed-off-by: Holger Schurig
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove the mem_cgroup member from mm_struct and instead add an owner.
This approach was suggested by Paul Menage. The advantage of this approach
is that, once the mm->owner is known, the cgroup can be determined using the
subsystem id. It also allows several control groups that are
virtually grouped by mm_struct to exist independently of the memory
controller, i.e., without adding a mem_cgroup-like member for each controller
to mm_struct.
A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.
This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes. The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.
I am indebted to Paul Menage for the several reviews of this patchset and
for helping me make it lighter and simpler.
This patch was tested on a powerpc box; it was compiled with the
MM_OWNER config both turned on and off.
After the thread group leader exits, it's moved to init_css_state by
cgroup_exit(), thus all future charges from running threads would be
redirected to the init_css_set's subsystem.
Signed-off-by: Balbir Singh
Cc: Pavel Emelianov
Cc: Hugh Dickins
Cc: Sudhir Kumar
Cc: YAMAMOTO Takashi
Cc: Hirokazu Takahashi
Cc: David Rientjes
Cc: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Pekka Enberg
Reviewed-by: Paul Menage
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
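The following sketch shows the indirection described above. mm_struct keeps a single owner pointer, and each controller's state is reached through that owner's css_set by subsystem id, so no per-controller pointer is needed. All struct and function names here are simplified, hypothetical stand-ins. The owner-change callback models only the ordering described in the commit (notify, then switch). Locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

enum { MEM_SUBSYS_ID, CPU_SUBSYS_ID, NR_SUBSYS };   /* illustrative ids */

struct subsys_state { const char *group_name; };
struct css_set { struct subsys_state *subsys[NR_SUBSYS]; };
struct task { struct css_set *cgroups; };
struct mm { struct task *owner; };   /* one pointer instead of one per controller */

typedef void (*owner_changed_fn)(struct mm *mm, struct task *old_owner,
                                 struct task *new_owner);

/* Any controller finds its state via owner -> css_set -> subsys[id]. */
static struct subsys_state *mm_subsys_state(const struct mm *mm, int subsys_id)
{
    assert(subsys_id >= 0 && subsys_id < NR_SUBSYS);
    return mm->owner ? mm->owner->cgroups->subsys[subsys_id] : NULL;
}

/* Notify before switching, mirroring the callback ordering in the text. */
static void mm_set_owner(struct mm *mm, struct task *new_owner, owner_changed_fn cb)
{
    if (cb)
        cb(mm, mm->owner, new_owner);
    mm->owner = new_owner;
}
```

If the owner changes, every controller's view of the mm moves with it, because nothing is cached per controller in the mm itself.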
Implement a cgroup to track and enforce open and mknod restrictions on device
files. A device cgroup associates a device access whitelist with each cgroup.
A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers. Major
and minor are either an integer or * for all. Access is a composition of r
(read), w (write), and m (mknod).
The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of
the parent. Admins can then remove devices from the whitelist or add new
entries. A child cgroup can never receive a device access which is denied to
its parent. However, when a device access is removed from a parent, it will not
also be removed from the child(ren).
An entry is added using devices.allow, and removed using
devices.deny. For instance,
echo 'c 1:3 mr' > /cgroups/1/devices.allow
allows cgroup 1 to read and mknod the device usually known as
/dev/null. Doing
echo a > /cgroups/1/devices.deny
will remove the default 'a *:* mrw' entry.
CAP_SYS_ADMIN is needed to change permissions or move another task to a new
cgroup. A cgroup may not be granted more permissions than the cgroup's parent
has. Any task can move itself between cgroups. This won't be sufficient, but
we can decide the best way to adequately restrict movement later.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
Signed-off-by: Serge E. Hallyn
Acked-by: James Morris
Looks-good-to: Pavel Emelyanov
Cc: Daniel Hokka Zakrisson
Cc: Li Zefan
Cc: Paul Menage
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
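The entry format and matching rule above can be sketched in user-space C. The structs, constants and function names are hypothetical, and only parsing and lookup are modelled (no parent/child propagation). The sketch assumes a request is granted only when a single entry matches the device and covers every requested access bit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ACC_READ  1u
#define ACC_WRITE 2u
#define ACC_MKNOD 4u
#define ANY       (-1)   /* "*" in a major or minor field */

struct dev_rule { char type; int major, minor; unsigned access; };

/* Parse "*" or a decimal number; return the position after it, or NULL. */
static const char *parse_num(const char *s, int *out)
{
    char *end;

    if (*s == '*') {
        *out = ANY;
        return s + 1;
    }
    if (!isdigit((unsigned char)*s))
        return NULL;
    *out = (int)strtol(s, &end, 10);
    return end;
}

/* Parse "a" or "<a|c|b> <major>:<minor> <access>", e.g. "c 1:3 mr". */
static bool parse_rule(const char *s, struct dev_rule *r)
{
    r->type = s[0];
    if (r->type == 'a' && (s[1] == '\0' || s[1] == '\n')) {
        r->major = r->minor = ANY;            /* same as "a *:* rwm" */
        r->access = ACC_READ | ACC_WRITE | ACC_MKNOD;
        return true;
    }
    if (!strchr("acb", r->type) || r->type == '\0' || s[1] != ' ')
        return false;
    s = parse_num(s + 2, &r->major);
    if (!s || *s != ':')
        return false;
    s = parse_num(s + 1, &r->minor);
    if (!s || *s != ' ')
        return false;
    r->access = 0;
    for (s++; *s && *s != '\n'; s++) {
        if (*s == 'r') r->access |= ACC_READ;
        else if (*s == 'w') r->access |= ACC_WRITE;
        else if (*s == 'm') r->access |= ACC_MKNOD;
        else return false;
    }
    return r->access != 0;
}

/* Grant only if one entry matches the device and covers all requested bits. */
static bool dev_allowed(const struct dev_rule *list, size_t n, char type,
                        int major, int minor, unsigned acc)
{
    for (size_t i = 0; i < n; i++) {
        const struct dev_rule *r = &list[i];

        if (r->type != 'a' && r->type != type)
            continue;
        if ((r->major != ANY && r->major != major) ||
            (r->minor != ANY && r->minor != minor))
            continue;
        if ((acc & ~r->access) == 0)
            return true;
    }
    return false;
}
```

With only the 'c 1:3 mr' entry from the example, reading /dev/null is allowed but opening it for writing is not, since no entry grants 'w'.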
The cgroup debug subsystem isn't generally useful for users. It should
default to "n".
Signed-off-by: Paul Menage
Cc: "Li Zefan"
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Cc: "YAMAMOTO Takashi"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Instead of using the malloc() and free() wrappers needed by the
lib/inflate.c code for allocations, simply use kmalloc() and kfree() in the
initramfs code. This is needed for a further lib/inflate.c-related cleanup
patch that will remove the malloc() and free() functions.
Take that opportunity to remove the useless kmalloc() return value
cast.
Based on work done by Matt Mackall.
Signed-off-by: Thomas Petazzoni
Signed-off-by: Matt Mackall
Cc: Jan Engelhardt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
print_fn_descriptor_symbol() prints the address if we don't have a symbol, so
there is no need to print both.
Also, combine printing the return value with the elapsed time. This changes:
Calling initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50()
initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50() returned 1.
initcall 0xc05b7a70 ran for 0 msecs: pci_mmcfg_late_insert_resources+0x0/0x50()
initcall at 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50(): returned with error code 1
to this:
calling pci_mmcfg_late_insert_resources+0x0/0x50()
initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned 1 after 0 msecs
initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned with error code 1
Signed-off-by: Bjorn Helgaas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
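The following user-space sketch times a call and emits the combined messages in the new format. The helper names, the function-pointer type, and the use of POSIX clock_gettime(CLOCK_MONOTONIC) are assumptions of the sketch. The kernel's boot-time code is not reproduced here.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef int (*initcall_fn)(void);   /* hypothetical initcall signature */

static long elapsed_msecs(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000L + (b->tv_nsec - a->tv_nsec) / 1000000L;
}

/* One line carrying both the return value and the duration. */
static int format_initcall_result(char *buf, size_t len, const char *name,
                                  int ret, long msecs)
{
    return snprintf(buf, len, "initcall %s() returned %d after %ld msecs",
                    name, ret, msecs);
}

static int run_initcall(const char *name, initcall_fn fn, char *buf, size_t len)
{
    struct timespec t0, t1;
    int ret;

    printf("calling %s()\n", name);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ret = fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    format_initcall_result(buf, len, name, ret, elapsed_msecs(&t0, &t1));
    puts(buf);
    if (ret)
        printf("initcall %s() returned with error code %d\n", name, ret);
    return ret;
}
```

Formatting into a caller-supplied buffer keeps the message testable independently of the timing.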
16 kB is often no longer enough for a normal boot of a UP system,
and even less so when people use suspend, for example.
A shift of 17 (a 2^17 byte, i.e. 128 kB, buffer) seems to be a more reasonable
default for current kernels on current hardware (it's just the default; anyone
who is memory limited can still lower it).
Signed-off-by: Adrian Bunk
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
init/do_mounts_rd.c:215:13: warning: Using plain integer as NULL pointer
init/do_mounts_md.c:136:45: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison
Signed-off-by: Linus Torvalds -
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: video drivers: add facility level
sparc: tcx.c make tcx_init and tcx_exit static
sparc: ffb.c make ffb_init and ffb_exit static
sparc: cg14.c make cg14_init and cg15_exit static
sparc: bw2.c fix bw2_exit
sparc64: Fix accidental syscall restart on child return from clone/fork/vfork.
sparc64: Clean up handling of pt_regs trap type encoding.
sparc: Remove old style signal frame support.
sparc64: Kill bogus RT_ALIGNEDSZ macro from signal.c
sparc: sunzilog.c remove unused argument
sparc: fix drivers/video/tcx.c warning
sparc64: Kill unused local ISA bus layer.
input: Rewrite sparcspkr device probing.
sparc64: Do not ignore 'pmu' device ranges.
sparc64: Kill ISA_FLOPPY_WORKS code.
sparc64: Kill CONFIG_SPARC32_COMPAT
sparc64: Cleanups and corrections for arch/sparc64/Kconfig
sparc64: Fix wedged irq regression. -
This option has been the default on a wide range of distributions
for a long time - time to make it non-experimental.
Signed-off-by: Ingo Molnar
Signed-off-by: Linus Torvalds
27 Apr, 2008
1 commit
-
It's completely superfluous, CONFIG_COMPAT is sufficient.
What this used to be is an umbrella for enabling code shared
by all 32-bit compat binary support types. But with the
removal of SunOS and Solaris support, the only one left is
Linux 32-bit ELF.
Update defconfig.
Signed-off-by: David S. Miller
24 Apr, 2008
2 commits
-
Use the __weak macro instead of the longer __attribute__ ((weak)) form
in one place in init/main.c.
Signed-off-by: Benjamin Herrenschmidt
Acked-by: Andrew Morton
---
init/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Paul Mackerras -
Some architectures need to maintain a kmem cache for thread info
structures. The next commit adds that to powerpc to fix an alignment
problem.
There is no good arch callback to use to initialize that cache
that I can find, so this adds a new one in the form of a weak
function whose default is empty.
Signed-off-by: Benjamin Herrenschmidt
Acked-by: Andrew Morton
Signed-off-by: Paul Mackerras
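The two commits above rely on weak symbols: generic code calls a hook unconditionally, an empty weak default satisfies the linker, and an architecture that needs real work supplies a strong definition that replaces it. The sketch below shows that pattern. It assumes GCC or Clang on an ELF target, and the hook and caller names are hypothetical.

```c
#include <assert.h>

/* The shorthand init/main.c switched to, spelled out (GCC/Clang, ELF). */
#define __weak __attribute__((weak))

static int boot_steps;

/* Hypothetical arch hook with an empty default. An architecture needing a
 * cache would define the same symbol without __weak in its own object file,
 * and the linker would pick that strong definition instead. */
void __weak arch_cache_init_hook(void);
void __weak arch_cache_init_hook(void)
{
}

/* Stand-in for generic boot code calling the hook unconditionally. */
static void generic_boot_sketch(void)
{
    boot_steps++;
    arch_cache_init_hook();
    boot_steps++;
}
```

Generic code needs neither an #ifdef nor a per-architecture stub.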
20 Apr, 2008
3 commits
-
Viktor was nice enough to enhance the document based on my replies to
his questions on the subject.
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar -
Move the setting of nr_cpu_ids from sched_init() to start_kernel()
so that it's available as early as possible.
Note that an arch has the option of setting it even earlier if need be,
but it should not result in a different value than the setup_nr_cpu_ids()
function.
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar -
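The sketch below assumes nr_cpu_ids is derived as one past the highest possible CPU number, rather than as a count of possible CPUs. If so, sparse CPU numbering leaves holes that are still counted. The function name and the 64-bit mask are simplifications for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in: one bit per possible CPU, at most 64 CPUs here. */
static unsigned int setup_nr_cpu_ids_sketch(unsigned long long cpu_possible_mask)
{
    unsigned int highest = 0;

    assert(cpu_possible_mask != 0);   /* at least the boot CPU is possible */
    for (unsigned int cpu = 0; cpu < sizeof(cpu_possible_mask) * CHAR_BIT; cpu++)
        if (cpu_possible_mask & (1ULL << cpu))
            highest = cpu;
    return highest + 1;               /* valid ids are 0..highest */
}
```

For example, possible CPUs {0, 5} give 6, not 2. This is why an early arch override has to produce exactly the same value.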
* Add a static cpumask_t variable "CPU_MASK_ALL_PTR" to use as
a pointer reference to CPU_MASK_ALL. This reduces where possible
the instances where CPU_MASK_ALL allocates and fills a large
array on the stack. Used only if NR_CPUS > BITS_PER_LONG.
* Change init/main.c to use new set_cpus_allowed_ptr().
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: H. Peter Anvin
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
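With a large NR_CPUS, a CPU mask is a sizeable array, so passing it by value builds a full copy on the stack. The sketch below shows the alternative: one shared read-only "all CPUs" mask, passed by pointer. NR_CPUS = 4096, the struct layout and `set_cpus_allowed_ptr_sketch()` are assumptions. The range initializer is a GNU C extension.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NR_CPUS 4096   /* assumed large configuration */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask { unsigned long bits[MASK_LONGS]; };   /* 512 bytes here */

/* One shared, read-only mask instead of a fresh stack copy per use. */
static const struct cpumask cpu_mask_all = {
    .bits = { [0 ... MASK_LONGS - 1] = ~0UL }        /* GNU C range initializer */
};
#define CPU_MASK_ALL_PTR (&cpu_mask_all)

struct task { struct cpumask allowed; };

/* Hypothetical pointer-taking setter: the caller passes a pointer, not 512 bytes. */
static int set_cpus_allowed_ptr_sketch(struct task *t, const struct cpumask *mask)
{
    memcpy(&t->allowed, mask, sizeof(*mask));
    return 0;
}
```

The copy still happens once, into the task, but no caller needs a temporary mask on its own stack.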
14 Apr, 2008
1 commit
-
The per node counters are used mainly for showing data through the sysfs API.
If that API is not compiled in then there is no point in keeping track of this
data. Disable counters for the number of slabs and the number of total slabs
if !SLUB_DEBUG. Incrementing the per node counters also accesses a
potentially contended cacheline, so this could actually be a performance
benefit to embedded systems.
SLABINFO support is also affected. It now must depend on SLUB_DEBUG (which
is on by default).
The patch also avoids a check for a NULL kmem_cache_node pointer in new_slab()
if the system is not compiled with NUMA support.
[penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG]
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg
16 Mar, 2008
1 commit
-
This essentially reverts commit 71fc47a9adf8ee89e5c96a47222915c5485ac437
("ACPI: basic initramfs DSDT override support"), because the code simply
isn't ready.
It did ugly things to the init sequence to populate the rootfs image
early, but that just ended up showing other problems with the whole
approach. The fact is, the VFS layer simply isn't initialized this
early, and the relevant ACPI code should either run much later, or this
shouldn't be done at all.
For 2.6.25, we'll just pick the latter option. We can revisit this
concept later if necessary.
Cc: Dave Hansen
Cc: Tilman Schmidt
Cc: Andrew Morton
Cc: Thomas Renninger
Cc: Eric Piel
Cc: Len Brown
Cc: Christoph Hellwig
Cc: Markus Gaugusch
Signed-off-by: Linus Torvalds
11 Mar, 2008
1 commit
-
The original preemptible-RCU patch put the choice between classic and
preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures
on machines not supporting CONFIG_PREEMPT. This choice was therefore moved to
init/Kconfig, which worked, but placed the choice between classic and
preemptible RCU at the top level, a very obtuse choice indeed.
This patch changes from the Kconfig "choice" mechanism to a pair of booleans,
only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in
kernel/Kconfig.preempt, where one would expect it to be. The other
(CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all
architectures, hopefully avoiding build breakage. Thanks to Roman Zippel for
suggesting this approach.
Signed-off-by: Paul E. McKenney
Cc: Ingo Molnar
Acked-by: Steven Rostedt
Cc: Dipankar Sarma
Cc: Josh Triplett
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Roman Zippel
Cc: Sam Ravnborg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Mar, 2008
5 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
debugfs: fix sparse warnings
Driver core: Fix cleanup when failing device_add().
driver core: Remove dpm_sysfs_remove() from error path of device_add()
PM: fix new mutex-locking bug in the PM core
PM: Do not acquire device semaphores upfront during suspend
kobject: properly initialize ksets
sysfs: CONFIG_SYSFS_DEPRECATED fix
driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED -
Rename Memory Controller to Memory Resource Controller. Reflect the same
changes in the CONFIG definition for the Memory Resource Controller. Group
together the config options for Resource Counters and Memory Resource
Controller.
Signed-off-by: Balbir Singh
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Keith Mannthey said:
The parameter hotadd_percent is set up correctly, but there is a "Malformed
early option 'numa'" message.
Rusty Russell said:
This happens when the function registered with early_param() returns
non-zero. __setup() functions return 1 if OK; module_param() and
early_param() functions return 0 or a negative error code.
For instance:
Linux version 2.6.25-rc3-t (raa@steel) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #22 SMP PREEMPT Tue Feb 26
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Malformed early option 'loglevel'
127MB HIGHMEM available.
896MB LOWMEM available.
Command line:
BOOT_IMAGE=2.6.25-t ro root=809 ro console=ttyS0,57600n8 console=tty0 loglevel=5
Acked-by: Yinghai Lu
Cc: Rusty Russell
Cc: Keith Mannthey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
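The mismatch in return conventions can be reproduced with a toy user-space dispatcher. The option table, the handler names and the command-line walker are hypothetical. Only the rule "non-zero from an early_param() handler means malformed" and the message text come from the log above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*early_fn)(char *val);
struct early_opt { const char *name; early_fn fn; };

static int loglevel_sketch = 7;   /* hypothetical global set by the option */

/* Correct for early_param(): 0 on success, negative on error. */
static int loglevel_good(char *val)
{
    if (!val)
        return -1;
    loglevel_sketch = atoi(val);
    return 0;
}

/* Bug being fixed: follows the __setup() convention (1 == handled). */
static int loglevel_buggy(char *val)
{
    loglevel_good(val);
    return 1;
}

/* Walk "name=value" words; any non-zero handler result is reported the
 * way the boot log shows. Returns how many options were reported. */
static int parse_early_options(const char *cmdline, const struct early_opt *opts,
                               size_t nopts)
{
    char buf[256];
    int malformed = 0;

    snprintf(buf, sizeof buf, "%s", cmdline);
    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
        char *val = strchr(tok, '=');
        if (val)
            *val++ = '\0';
        for (size_t i = 0; i < nopts; i++) {
            if (strcmp(tok, opts[i].name) == 0 && opts[i].fn(val) != 0) {
                printf("Malformed early option '%s'\n", tok);
                malformed++;
            }
        }
    }
    return malformed;
}
```

The buggy handler still applies the value, which matches the report that hotadd_percent was set up correctly even though the warning appeared.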
CONFIG_SYSFS_DEPRECATED=y changed its meaning recently and causes
regressions in working setups that had SYSFS_DEPRECATED disabled.
So rename it to SYSFS_DEPRECATED_V2 so that testers pick up the new
default via 'make oldconfig', even if their old .config files disabled
CONFIG_SYSFS_DEPRECATED ...
Signed-off-by: Ingo Molnar
Cc: Kay Sievers
Cc: Linus Torvalds
Cc: Andrew Morton
Signed-off-by: Greg Kroah-Hartman -
As things get moved into this config option, the hard date of 2006 does
not work anymore, so update the text to be more descriptive.
Cc: Kay Sievers
Cc: Jiri Slaby
Signed-off-by: Greg Kroah-Hartman
24 Feb, 2008
1 commit
-
Document huge memory/cache overhead of memory controller in Kconfig
I was a little surprised that 2.6.25-rc* increased struct page for the
memory controller. At least on many x86-64 machines it no longer fits into a
single cache line, and it also costs considerable amounts of RAM.
During earlier review I remember asking for an external data structure for
this.
It's also quite unobvious that an innocent-looking Kconfig option with a
single-line Kconfig description has such a negative effect.
This patch attempts to document these disadvantages, at least so that users
configuring their kernel can make an informed decision.
Signed-off-by: Andi Kleen
Cc: Balbir Singh
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
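Here is a back-of-the-envelope sketch of what "considerable amounts of RAM" can mean. It assumes (these are not figures from the patch) 4 kB pages and an extra 8 bytes per struct page on a 64-bit machine. Under those assumptions, 4 GB of RAM has 2^20 pages, so the growth alone costs 8 MB. Any separately allocated per-page tracking data would come on top of this.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed RAM cost of growing every struct page by extra_bytes.
 * Page size and per-page growth are inputs, not facts from the patch. */
static uint64_t struct_page_growth_bytes(uint64_t ram_bytes, uint64_t page_size,
                                         uint64_t extra_bytes)
{
    assert(page_size != 0);
    return (ram_bytes / page_size) * extra_bytes;
}
```

The cost scales linearly with installed memory, whether or not the controller is actually used.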
15 Feb, 2008
1 commit
-
* Use struct path in fs_struct.
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Jan Blunck
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Feb, 2008
1 commit
-
Make the rt group scheduler compile time configurable.
Keep it experimental for now.
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
12 Feb, 2008
1 commit
-
When make -s support was added to filechk, the
combination with make V=1 was not
covered.
Fix it by explicitly covering this case too.
Signed-off-by: Sam Ravnborg
Cc: Mike Frysinger
10 Feb, 2008
1 commit
-
DEBUG_PAGEALLOC must not be enabled before mem_init(). Before this
point there is nothing to allocate.
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar