08 Dec, 2006
40 commits
-
do_exit:
taskstats_exit_alloc()
...
taskstats_exit_send()
taskstats_exit_free()I think this is not good, let it be a single function exported to the core
kernel, taskstats_exit(), which does alloc + send + free itself.Signed-off-by: Oleg Nesterov
Cc: Balbir Singh
Cc: Shailabh Nagar
Cc: Jay Lan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If there are no listeners, every task does unneeded kmem_cache alloc/free on
exit. We don't need listeners->sem for 'if (!list_empty())' check. Yes, we may
have a false positive, but this doesn't differ from the case when the listener
is unregistered after we drop the semaphore. So we don't need to do allocation
beforehand.Signed-off-by: Oleg Nesterov
Cc: Balbir Singh
Acked-by: Shailabh Nagar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use put_pages_list() instead of opencoding it.
Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch makes module init return proper value instead of -1 (-EPERM).
Cc: Tim Waugh
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
probe_kernel_address() purports to be generic, only it forgot to select
KERNEL_DS, so it presently won't work right on all architectures.Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add support for the parallel port (implemented as separate PCI function) on
the Oxford Semiconductor OX16PCI952.Signed-off-by: Ryan Underwood
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make PRINTK_TIME depend on PRINTK. Only display/offer it if PRINTK is
enabled.Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Heiko Carstens
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The CLONE_CHILD_CLEARTID flag is used by NPTL to have its threads
communicate via memory/futex when they exit, so pthread_join can
synchronize using a simple futex wait. The word of user memory where NPTL
stores a thread's own TID is what it passes; this gets reset to zero at
thread exit.It is not desireable to touch this user memory when threads are dying due
to a fatal signal. A core dump is more usefully representative of the
dying program state if the threads live at the time of the crash have their
NPTL data structures unperturbed. The userland expectation of
CLONE_CHILD_CLEARTID has only ever been that it works for a thread making
an _exit system call.This problem was identified by Ernie Petrides .
Signed-off-by: Roland McGrath
Cc: Ernie Petrides
Cc: Jakub Jelinek
Acked-by: Ingo Molnar
Cc: Ulrich Drepper
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Function v9fs_get_idpool returns int, not u32. Actually it returns -1 on
errors, and these two callers check if the value is smaller than 0, which
was caught by gcc with extra warning flags. Compile tested only but should
be OK, as the value computed in v9fs_get_idpool() is also int.Signed-of-by: Mika Kukkonen
Cc: Eric Van Hensbergen
Cc: Ron Minnich
Cc: Latchesar Ionkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix sparse NULL warning;
drivers/misc/tifm_core.c:223:17: warning: Using plain integer as NULL pointerFix style while there.
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gzBasically it makes a filesystem, splats some random bits over it, then
tries to mount it and do some simple filesystem actions.At best, the filesystem catches the corruption gracefully. At worst,
things spin out of control.As you might guess, we found a couple places in ext4 where things spin out
of control :)First, we had a corrupted directory that was never checked for
consistency... it was corrupt, and pointed to another bad "entry" of
length 0. The for() loop looped forever, since the length of
ext4_next_entry(de) was 0, and we kept looking at the same pointer over and
over and over and over... I modeled this check and subsequent action on
what is done for other directory types in ext4_readdir...(adding this check adds some computational expense; I am testing a followup
patch to reduce the number of times we check and re-check these directory
entries, in all cases. Thanks for the idea, Andreas).Next we had a root directory inode which had a corrupted size, claimed to
be > 200M on a 4M filesystem. There was only really 1 block in the
directory, but because the size was so large, readdir kept coming back for
more, spewing thousands of printk's along the way.Per Andreas' suggestion, if we're in this read error condition and we're
trying to read an offset which is greater than i_blocks worth of bytes,
stop trying, and break out of the loop.With these two changes fsfuzz test survives quite well on ext4.
Signed-off-by: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gzBasically it makes a filesystem, splats some random bits over it, then
tries to mount it and do some simple filesystem actions.At best, the filesystem catches the corruption gracefully. At worst,
things spin out of control.As you might guess, we found a couple places in ext3 where things spin out
of control :)First, we had a corrupted directory that was never checked for
consistency... it was corrupt, and pointed to another bad "entry" of
length 0. The for() loop looped forever, since the length of
ext3_next_entry(de) was 0, and we kept looking at the same pointer over and
over and over and over... I modeled this check and subsequent action on
what is done for other directory types in ext3_readdir...(adding this check adds some computational expense; I am testing a followup
patch to reduce the number of times we check and re-check these directory
entries, in all cases. Thanks for the idea, Andreas).Next we had a root directory inode which had a corrupted size, claimed to
be > 200M on a 4M filesystem. There was only really 1 block in the
directory, but because the size was so large, readdir kept coming back for
more, spewing thousands of printk's along the way.Per Andreas' suggestion, if we're in this read error condition and we're
trying to read an offset which is greater than i_blocks worth of bytes,
stop trying, and break out of the loop.With these two changes fsfuzz test survives quite well on ext3.
Signed-off-by: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Randomizes -pie compiled binaries from 64k (0x10000) up to ELF_ET_DYN_BASE.
0 -> 64k is excluded to allow NULL ptr accesses to fail.
Signed-off-by: Marcus Meissner
Cc: Ingo Molnar
Cc: Dave Jones
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
- numeric string size replaced with constant in print_lock_name and
print_lockdep_cache,- return on null pointer in print_lock_dependencies,
- one more lockdep return with 0 with unlocking fix in mark_lock.
Signed-off-by: Jarek Poplawski
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Here are mainly some lockdep returns with 0 with unlocking fixes.
Signed-off-by: Jarek Poplawski
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar
Signed-off-by: Linus Torvalds -
paride_register() returns 1 on success, 0 on failure and module init
code looks likestatic int __init foo_init(void)
{
return paride_register(&foo) - 1;
}which is not what one get used to. Converted to usual 0/-E convention.
In case of kbic driver, unwind registration. It was just
return (paride_register(&k951)||paride_register(&k971))-1;
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We're about to change the semantics of pi_register()'s return value, so
rename it to something else first, so that any unconverted code reliaby
breaks.Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In order for spi_busnum_to_master to work spi master devices must be linked
into the spi_master_class.subsys.kset list. At the moment the default
class_obj_subsys.kset is used and we can't enumerate the master devices.Signed-off-by: Hans-Christian Egtvedt
Cc: David Brownell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Correct the following in driver/spi/spi.c in function spi_busnum_to_master:
* must allow bus_num 0, the if is really not needed.
* correct the name buffer which is too small for bus_num >= 10000. Itshould be 9 bytes big, not 8.
Signed-off-by: Hans-Christian Egtvedt
Cc: David Brownell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
lock_super() is unnecessary for setting super-block feature flags. Use the
provided *_SET_COMPAT_FEATURE() macros as well.Signed-off-by: Andreas Gruenbacher
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A couple of minor code simplifications to the kernel/cpuset.c code. No
functional change. Just a little less code and a little more readable.Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
linux/cdev.h uses struct kobject and other structs and should therefore
include them. Currently, a module either needs to add the missing includes
itself, or, in case a module includes other headers already, needs to put
last, which goes against a alphabetically-sorted include
list.Signed-off-by: Jan Engelhardt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
rmmod/3080 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(proc_subdir_lock){--..}, at: [] remove_proc_entry+0x40/0x191and this task is already holding:
(ide_lock){++..}, at: [] ide_unregister_subdriver+0x39/0xc8
which would create a new lock dependency:
(ide_lock){++..} -> (proc_subdir_lock){--..}but this new dependency connects a hard-irq-safe lock:
(ide_lock){++..}
... which became hard-irq-safe at:
[] lock_acquire+0x4b/0x6b
[] _spin_lock_irqsave+0x22/0x32
[] ide_intr+0x17/0x1a9
[] handle_IRQ_event+0x20/0x4d
[] __do_IRQ+0x94/0xef
[] do_IRQ+0x9e/0xbdto a hard-irq-unsafe lock:
(proc_subdir_lock){--..}
... which became hard-irq-unsafe at:
... [] lock_acquire+0x4b/0x6b
[] _spin_lock+0x19/0x28
[] xlate_proc_name+0x1b/0x99
[] proc_create+0x46/0xdf
[] create_proc_entry+0x62/0xa5
[] proc_misc_init+0x1c/0x1d2
[] proc_root_init+0x4c/0xe9
[] start_kernel+0x294/0x3b3Move ide_remove_proc_entries() out from under ide_lock; there is nothing
that indicates that this is needed.In specific, the call to ide_add_proc_entries() is unprotected, and there
is nothing else in the file using the respective ->proc fields. Also the
lock order around destroy_proc_ide_interface() suggests this.Alan sayeth:
proc_ide_write_settings walks the setting list under ide_setting_sem, read
ditto. remove_proc_entry is doing proc side housekeeping.Looks fine to me, although that old code is such a mess anything could be
going on.Signed-off-by: Peter Zijlstra
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Jeff noted that the via driver returned an error to an unsigned int in a
a case where errors are not permitted. Move the check down earlier so we
can handle it properly. Not as pretty but it works this way and avoids
hacking up ugly stuff in the legacy ide core.Signed-off-by: Alan Cox
Cc: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
One of our test team hit a reiserfs_panic while running fsstress tests on
2.6.19-rc1. The message looks like :REISERFS: panic(device Null superblock):
reiserfs[5676]: assertion !(p->path_length != 1 ) failed at
fs/reiserfs/stree.c:397:reiserfs_check_path: path not properly relsed.The backtrace looked :
kernel BUG in reiserfs_panic at fs/reiserfs/prints.c:361!
.reiserfs_check_path+0x58/0x74
.reiserfs_get_block+0x1444/0x1508
.__block_prepare_write+0x1c8/0x558
.block_prepare_write+0x34/0x64
.reiserfs_prepare_write+0x118/0x1d0
.generic_file_buffered_write+0x314/0x82c
.__generic_file_aio_write_nolock+0x350/0x3e0
.__generic_file_write_nolock+0x78/0xb0
.generic_file_write+0x60/0xf0
.reiserfs_file_write+0x198/0x2038
.vfs_write+0xd0/0x1b4
.sys_write+0x4c/0x8c
syscall_exit+0x0/0x4Upon debugging I found that the restart_transaction was not releasing
the path if the th->refcount was > 1./*static*/
int restart_transaction(struct reiserfs_transaction_handle *th,
struct inode *inode, struct path *path)
{
[...]/* we cannot restart while nested */
if (th->t_refcount > 1) { <<i_sb)->j_next_async_flush = 1;-->> retval = restart_transaction(th, inode, &path); <refcount is > 1, the path is still valid. And,
if (retval)
goto failure;
repeat =
_allocate_block(th, block, inode,
&allocated_block_nr, NULL, create);If the above allocate_block fails with NO_DISK_SPACE or QUOTA_EXCEEDED,
we would have path which is not released.if (repeat != NO_DISK_SPACE && repeat != QUOTA_EXCEEDED) {
goto research;
}
if (repeat == QUOTA_EXCEEDED)
retval = -EDQUOT;
else
retval = -ENOSPC;
goto failure;
[...]failure:
[...]
reiserfs_check_path(&path); << Panics here !Attached here is a patch which could fix the issue.
fix reiserfs/inode.c : restart_transaction() to release the path in all
cases.The restart_transaction() doesn't release the path when the the journal
handle has a refcount > 1. This would trigger a reiserfs_panic() if we
encounter an -ENOSPC / -EDQUOT in reiserfs_get_block().Signed-off-by: Suzuki K P
Cc: "Vladimir V. Saveliev"
Cc:
Cc: Jeff Mahoney
Acked-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The new shared APM emulation just like its ARM and MIPS predecessors uses
pm_suspend() which was only exported on SH. Move export to close to it's
definition where it really should be anyway.Signed-off-by: Ralf Baechle
Cc: Russell King
Cc: Paul Mundt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Documentation update, adding references to CFQ scheduler and to another
document about selecting IO Schedulers.Signed-off-by: Filipe Lautert
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
free_fdtable_rc() schedules timer to reschedule fddef->wq if
schedule_work() on it returns 0. However, schedule_work() guarantees that
the target work is executed at least once after the scheduling regardless
of its return value. 0 return simply means that the work was already
pending and thus no further action was required.Another problem is that it used contant '5' as @expires argument to
mod_timer().Kill unnecessary fddef->timer.
Signed-off-by: Tejun Heo
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add SysRq-X support: show blocked (TASK_UNINTERRUPTIBLE) tasks only.
Useful for debugging IO stalls.
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Randy Dunlap wote:
> Should FUSE depend on BLOCK? Without that and with BLOCK=n, I get:
>
> inode.c:(.text+0x3acc5): undefined reference to `sb_set_blocksize'
> inode.c:(.text+0x3a393): undefined reference to `get_sb_bdev'
> fs/built-in.o:(.data+0xd718): undefined reference to `kill_block_superMost fuse filesystems work fine without block device support, so I
think a better solution is to disable the 'fuseblk' filesystem type if
BLOCK=n.Signed-off-by: Miklos Szeredi
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a DESTROY operation for block device based filesystems. With the help of
this operation, such a filesystem can flush dirty data to the device
synchronously before the umount returns.This is needed in situations where the filesystem is assumed to be clean
immediately after unmount (e.g. ejecting removable media).Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add support for the BMAP operation for block device based filesystems. This
is needed to support swap-files and lilo.Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add 'blksize' option for block device based filesystems. During
initialization this is used to set the block size on the device and the super
block. The default block size is 512bytes.Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I never intended this, but people started using fuse to implement block device
based "real" filesystems (ntfs-3g, zfs).The following four patches add better support for these kinds of filesystems.
Unlike "normal" fuse filesystems, using this feature should require superuser
privileges (enforced by the fusermount utility).Thanks to Szabolcs Szakacsits for the input and testing.
This patch adds a 'fuseblk' filesystem type, which is only different from the
'fuse' filesystem type in how the 'dev_name' mount argument is interpreted.Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove unneeded code from fuse_dentry_revalidate(). This made some sense
while the validity time could wrap around, but now it's a very obvious no-op.Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a flag to the RELEASE message which specifies that a FLUSH operation
should be performed as well. This interface update is needed for the FreeBSD
port, and doesn't actually touch the Linux implementation at all.Also rename the unused 'flush_flags' in the FLUSH message to 'unused'.
Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds