Eric Lee / smarc-fsl-linux-kernel

08 May, 2007

2 commits

50953fe9e slab allocators: Remove SLAB_DEBUG_INITIAL flag ... Browse Code »

I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.

I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.

I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.

Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).

There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-08 03:12:57 +0800
f98393a64 mm: remove destroy_dirty_buffers from invalidate_bdev() ... Browse Code »

Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
been used in 6 years (so akpm says).

find * -name \*.[ch] | xargs grep -l invalidate_bdev |
while read file; do
quilt add $file;
sed -ie 's/invalidate_bdev($[^,]*$,[^)]*)/invalidate_bdev(\1)/g' $file;
done

Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2007-05-08 03:12:55 +0800

13 Feb, 2007

1 commit

ee9b6d61a [PATCH] Mark struct super_operations const ... Browse Code »

This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".

Compile tested with gcc & sparse.

Signed-off-by: Josef 'Jeff' Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef 'Jeff' Sipek
2007-02-13 01:48:47 +0800

12 Feb, 2007

2 commits

2e7842b88 [PATCH] fix umask when noACL kernel meets extN tuned for ACLs ... Browse Code »

Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or
ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a
kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.

This appears to have worked right until 2.6.11, when a fix to the default
mode on symlinks (always 0777) assumed VFS applies umask: which it does,
unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
s_flags according to s_mount_opt set according to def_mount_opts.

We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
but other filesystems only set MS_POSIXACL when ACLs are configured. We
could fix this at another level; but it seems most robust to avoid setting
the s_mount_opt flag in the first place (at the expense of more ifdefs).

Likewise don't set the XATTR_USER flag when built without XATTR support.

Signed-off-by: Hugh Dickins
Cc: Tigran Aivazian
Cc:
Cc: Andreas Gruenbacher
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2007-02-12 02:51:34 +0800
ea9a05a13 [PATCH] ext3: refuse ro to rw remount of fs with orphan inodes ... Browse Code »

In the rare case where we have skipped orphan inode processing due to a
readonly block device, and the block device subsequently changes back to
read-write, disallow a remount,rw transition of the filesystem when we have an
unprocessed orphan inodes as this would corrupt the list.

Ideally we should process the orphan inode list during the remount, but that's
trickier, and this plugs the hole for now.

Signed-off-by: Eric Sandeen
Cc: "Stephen C. Tweedie"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2007-02-12 02:51:34 +0800

09 Dec, 2006

1 commit

f0d1b0b30 [PATCH] LOG2: Implement a general integer log2 facility in the kernel ... Browse Code »

This facility provides three entry points:

ilog2() Log base 2 of unsigned long
ilog2_u32() Log base 2 of u32
ilog2_u64() Log base 2 of u64

These facilities can either be used inside functions on dynamic data:

int do_something(long q)
{
...;
y = ilog2(x)
...;
}

Or can be used to statically initialise global variables with constant values:

unsigned n = ilog2(27);

When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant. They treat negative numbers as
unsigned.

When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.

[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Herbert Xu
Cc: David Howells
Cc: Wojtek Kaniewski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-12-09 00:28:51 +0800

08 Dec, 2006

4 commits

a8f48a956 [PATCH] ext3/4: don't do orphan processing on readonly devices ... Browse Code »

If you do something like:

# touch foo
# tail -f foo &
# rm foo
#
#

you'll panic, because ext3/4 tries to do orphan list processing on the
readonly snapshot device, and:

kernel: journal commit I/O error
kernel: Assertion failure in journal_flush_Rsmp_e2f189ce() at journal.c:1356: "!journal->j_checkpoint_transactions"
kernel: Kernel panic: Fatal exception

for a truly readonly underlying device, it's reasonable and necessary
to just skip orphan list processing.

Signed-off-by: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2006-12-08 00:39:44 +0800
50ee0a32b [PATCH] ext3: fsid for statvfs ... Browse Code »

Update ext3_statfs to return an FSID that is a 64 bit XOR of the 128 bit
filesystem UUID as suggested by Andreas Dilger. See the following Bugzilla
entry for details:

http://bugzilla.kernel.org/show_bug.cgi?id=136

Cc: Andreas Dilger
Cc: Stephen Tweedie
Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pekka Enberg
2006-12-08 00:39:31 +0800
e18b890bb [PATCH] slab: remove kmem_cache_t ... Browse Code »

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:25 +0800
e6b4f8da3 [PATCH] slab: remove SLAB_NOFS ... Browse Code »

SLAB_NOFS is an alias of GFP_NOFS.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:23 +0800

12 Oct, 2006

1 commit

2245d7c21 [PATCH] ext3: errors behaviour fix ... Browse Code »

Current error behaviour for ext2 and ext3 filesystems does not fully
correspond to the documentation and should be fixed.

According to man 8 mount, ext2 and ext3 file systems allow to set one of 3
different on-errors behaviours:

---- start of quote man 8 mount ----

errors=continue / errors=remount-ro / errors=panic

Define the behaviour when an error is encountered. (Either ignore
errors and just mark the file system erroneous and continue, or remount
the file system read-only, or panic and halt the system.) The default is
set in the filesystem superblock, and can be changed using tune2fs(8).

---- end of quote ----

However EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
ERRORS_CONT is not saved on the sbi->s_mount_opt. It leads to the incorrect
handle of errors on ext3.

Then we've checked corresponding code in ext2 and discovered that it is buggy
as well:

- EXT2_ERRORS_CONTINUE is not read from the superblock (the same);

- parse_option() does not clean the alternative values and thus something
like (ERRORS_CONT|ERRORS_RO) can be set;

- if options are omitted, parse_option() does not set any of these options.

Therefore it is possible to set any combination of these options on the ext2:

- none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty mount
options;

- any of them may be set using mount options;

- 2 any options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
superblock and other value in mount options;

- and finally all three options may be set by adding third option in remount.

Currently ext2 uses these values only in ext2_error() and it is not leading to
any noticeable troubles. However somebody may be discouraged when he will try
to workaround EXT2_ERRORS_PANIC on the superblock by using errors=continue in
mount options.

This patch:

EXT3_ERRORS_CONTINUE should be taken from the superblock as default value for
error behaviour.

Signed-off-by: Dmitry Mishin
Acked-by: Vasily Averin
Acked-by: Kirill Korotaev
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dmitry Mishin
2006-10-12 02:14:21 +0800

27 Sep, 2006

8 commits

1a1d92c10 [PATCH] Really ignore kmem_cache_destroy return value ... Browse Code »

* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:

(void)kmem_cache_destroy(cache);

* Those who check it printk something, however, slab_error already printed
the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
low-level filesystem driver should make. Converted to ignore.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2006-09-27 23:26:10 +0800
f8314dc60 [PATCH] fs: Conversions from kmalloc+memset to k(z|c)alloc ... Browse Code »

Conversions from kmalloc+memset to kzalloc.

Signed-off-by: Panagiotis Issaris
Jffs2-bit-acked-by: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Panagiotis Issaris
2006-09-27 23:26:10 +0800
a4e4de36d [PATCH] ext3: Fix sparse warnings ... Browse Code »

Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.

Signed-off-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Kleikamp
2006-09-27 23:26:10 +0800
e9ad5620b [PATCH] ext3: More whitespace cleanups ... Browse Code »

More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.

Signed-off-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Kleikamp
2006-09-27 23:26:10 +0800
7543fc7b3 [PATCH] ext3: wrong error behavior ... Browse Code »

SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
behavior was broken in linux kernels since 2.5.x versions by the following
patch:

2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
Default mount options from superblock for ext2/3 filesystems
http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

In case ext3 file system is mounted with errors=continue
(EXT3_ERRORS_CONTINUE) errors should be ignored when possible. However at
present in case of any error kernel aborts journal and remounts filesystem
to read-only. Such behavior was hit number of times and noted to differ
from that of 2.4.x kernels.

This patch fixes this:
- do nothing in case of EXT3_ERRORS_CONTINUE,
- set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
- panic() should be called after ext3_commit_super() to save
sb marked as EXT3_ERROR_FS

Signed-off-by: Vasily Averin
Acked-by: Kirill Korotaev
Cc: Theodore Ts'o
Cc: "Stephen C. Tweedie"
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vasily Averin
2006-09-27 23:26:09 +0800
eee194e76 [PATCH] ext3: inode numbers are unsigned long ... Browse Code »

This is primarily format string fixes, with changes to ialloc.c where large
inode counts could overflow, and also pass around journal_inum as an
unsigned long, just to be pedantic about it....

Signed-off-by: Eric Sandeen
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2006-09-27 23:26:09 +0800
855565e81 [PATCH] fix ext3 mounts at 16T ... Browse Code »

I need to do some actual IO testing now, but this gets things mounting for
a 16T ext3 filesystem. (patched up e2fsprogs is needed too, I'll send that
off the kernel list)

This patch fixes these issues in the kernel:

o sbi->s_groups_count overflows in ext3_fill_super()

sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) +
EXT3_BLOCKS_PER_GROUP(sb) - 1) /
EXT3_BLOCKS_PER_GROUP(sb);

at 16T, s_blocks_count is already maxed out; adding
EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
Not really what we want, and causes a failed mount.

Feel free to check my math (actually, please do!), but changing it this
way should work & avoid the overflow:

(A + B - 1)/B changed to: ((A - 1)/B) + 1

o ext3_check_descriptors() overflows range checks

ext3_check_descriptors() iterates over all block groups making sure
that various bits are within the right block ranges... on the last pass
through, it is checking the error case

[item] >= block + EXT3_BLOCKS_PER_GROUP(sb)

where "block" is the first block in the last block group. The last
block in this group (and the last one that will fit in 32 bits) is block
+ EXT3_BLOCKS_PER_GROUP(sb)- 1. block + EXT3_BLOCKS_PER_GROUP(sb) wraps
back around to 0.

so, make things clearer with "first_block" and "last_block" where those
are first and last, inclusive, and use rather than =.

Finally, the last block group may be smaller than the rest, so account
for this on the last pass through: last_block = sb->s_blocks_count - 1;

(a similar patch could be done for ext2; does anyone in their right mind
use ext2 at 16T? I'll send an ext2 patch doing the same thing if that's
warranted)

Signed-off-by: Eric Sandeen
Cc: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2006-09-27 23:26:09 +0800
ae6ddcc5f [PATCH] ext3 and jbd cleanup: remove whitespace ... Browse Code »

Remove whitespace from ext3 and jbd, before we clone ext4.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-09-27 23:26:09 +0800

17 Sep, 2006

1 commit

fdb36673a [PATCH] knfsd: Make ext3 reject filehandles referring to invalid inode number ... Browse Code »

Inodes earlier than the 'first' inode (e.g. journal, resize) should be
rejected early - except the root inode. Also inode numbers that are too
big should be rejected early.

[akpm@osdl.org: cleanup]
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-09-17 03:54:31 +0800

04 Jul, 2006

1 commit

5c81a4197 [PATCH] lockdep: annotate the quota code ... Browse Code »

The quota code plays interesting games with the lock ordering; to quote Jan:

| i_mutex of inode containing quota file is acquired after all other
| quota locks. i_mutex of all other inodes is acquired before quota
| locks. Quota code makes sure (by resetting inode operations and
| setting special flag on inode) that noone tries to enter quota code
| while holding i_mutex on a quota file...

The good news is that all of this special case i_mutex grabbing happens in the
(per filesystem) low level quota write function. For this special case we
need a new I_MUTEX_* nesting level, since this just entirely outside any of
the regular VFS locking rules for i_mutex. I trust Jan on his blue eyes that
this is not ever going to deadlock; and based on that the patch below is what
it takes to inform lockdep of these very interesting new locking rules.

The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the
deepest possible level of nesting for i_mutex, and that this only should be
used in quota write (and possibly read) function of filesystems. This makes
the lock ordering of the I_MUTEX_* levels:

I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA

Has no effect on non-lockdep kernels.

Signed-off-by: Arjan van de Ven
Acked-by: Ingo Molnar
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-07-04 06:27:08 +0800

01 Jul, 2006

1 commit

6ab3d5624 Remove obsolete #include <linux/config.h> ... Browse Code »

Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk

Jörn Engel
2006-07-01 01:25:36 +0800

27 Jun, 2006

1 commit

ade1a29e1 [PATCH] ext3: Add "-o bh" option ... Browse Code »

This patch adds "-o bh" option to force use of buffer_heads. This option
is needed when we make "nobh" as default - and if we run into problems.

Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-06-27 00:58:20 +0800

26 Jun, 2006

4 commits

43d23f903 [PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversion ... Browse Code »

Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the
rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t,
and replace the printk format string respondingly.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-06-26 01:01:10 +0800
1c2bf374a [PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixes ... Browse Code »

Some of the in-kernel ext3 block variable type are treated as signed 4 bytes
int type, thus limited ext3 filesystem to 8TB (4kblock size based). While
trying to fix them, it seems quite confusing in the ext3 code where some
blocks are filesystem-wide blocks, some are group relative offsets that need
to be signed value (as -1 has special meaning). So it seem saner to define
two types of physical blocks: one is filesystem wide blocks, another is
group-relative blocks. The following patches clarify these two types of
blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3
filesystem limit to 8TB.

With this series of patches and the percpu counter data type changes in the mm
tree, we are able to extend exts filesystem limit to 16TB.

This work is also a pre-request for the recent >32 bit ext3 work, and makes
the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine
ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
ext3 filesystem block corresponding.

Two RFC with a series patches have been posted to ext2-devel list and have
been reviewed and discussed:
http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2

http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2

Patches are tested on both 32 bit machine and 64 bit machine, 8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39). Tests
includes overnight fsx, tiobench, dbench and fsstress.

This patch:

Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
filesystem wide blocks.

This patch classifies all block group relative blocks, and ext3_fsblk_t blocks
occurs in the same function where used to be confusing before. Also include
kernel bug fixes for filesystem wide in-kernel block variables. There are
some fileystem wide blocks are treated as int/unsigned int type in the kernel
currently, especially in ext3 block allocation and reservation code. This
patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned
long) type.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-06-26 01:01:10 +0800
d2e5b13c4 [PATCH] ext3: remove inconsistent space before exclamation point in mount code ... Browse Code »

This was reported as Debian bug #336604.

Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Theodore Ts'o
2006-06-26 01:01:07 +0800
fcd5df358 [PATCH] Avoid disk sector_t overflow for >2TB ext3 filesystem ... Browse Code »

If ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e.
CONFIG_LBD not defined in the kernel), the calculation of the disk sector
will overflow. Add check at ext3_fill_super() and ext3_group_extend() to
prevent mount/remount/resize >2TB ext3 filesystem if sector_t size is 4
bytes.

Verified this patch on a 32 bit platform without CONFIG_LBD defined
(sector_t is 32 bits long), mount refuse to mount a 10TB ext3.

Signed-off-by: Mingming Cao
Acked-by: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-06-26 01:01:07 +0800

23 Jun, 2006

4 commits

0216bfcff [PATCH] percpu counter data type changes to suppport more than 2**31 ext3 free blocks counter ... Browse Code »

The percpu counter data type are changed in this set of patches to support
more users like ext3 who need more than 32 bit to store the free blocks
total in the filesystem.

- Generic perpcu counters data type changes. The size of the global counter
and local counter were explictly specified using s64 and s32. The global
counter is changed from long to s64, while the local counter is changed from
long to s32, so we could avoid doing 64 bit update in most cases.

- Users of the percpu counters are updated to make use of the new
percpu_counter_init() routine now taking an additional parameter to allow
users to pass the initial value of the global counter.

Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2006-06-23 22:43:06 +0800
e6022603b [PATCH] ext3_clear_inode(): avoid kfree(NULL) ... Browse Code »

Steven Rostedt points out that `rsv' here is usually
NULL, so we should avoid calling kfree().

Also, fix up some nearby whitespace damage.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-06-23 22:43:05 +0800
726c33422 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry ... Browse Code »

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800
454e2398b [PATCH] VFS: Permit filesystem to override root dentry on mount ... Browse Code »

Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.

(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().

(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().

This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.

However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.

[*] Anonymous until discovered from another tree.

(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800

27 Mar, 2006

1 commit

a0e928523 [PATCH] ext3: "nobh" writeback support for filesystems blocksize < pagesize ... Browse Code »

There is no valid reason why we can't support "nobh" option for filesystems
with blocksize != PAGESIZE.

This patch lets them use "nobh" option for writeback mode for blocksize <
pagesize.

Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-03-27 00:57:02 +0800

24 Mar, 2006

3 commits

09fe316a7 [PATCH] fast ext3_statfs ... Browse Code »

Under I/O load it may take up to a dozen seconds to read all group
descriptors. This is what ext3_statfs() does. At the same time, we already
maintain global numbers of free inodes/blocks. Why don't we use them instead
of group reading and summing?

Cc: Ravikiran G Thirumalai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alex Tomas
2006-03-24 23:33:25 +0800
fffb60f93 [PATCH] cpuset memory spread: slab cache format ... Browse Code »

Rewrap the overly long source code lines resulting from the previous
patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch
contains only formatting changes, and no function change.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2006-03-24 23:33:23 +0800
4b6a9316f [PATCH] cpuset memory spread: slab cache filesystems ... Browse Code »

Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
memory spreading.

If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
from such a slab cache, the allocations are spread evenly over all the
memory nodes (task->mems_allowed) allowed to that task, instead of favoring
allocation on the node local to the current cpu.

The following inode and similar caches are marked SLAB_MEM_SPREAD:

file cache
==== =====
fs/adfs/super.c adfs_inode_cache
fs/affs/super.c affs_inode_cache
fs/befs/linuxvfs.c befs_inode_cache
fs/bfs/inode.c bfs_inode_cache
fs/block_dev.c bdev_cache
fs/cifs/cifsfs.c cifs_inode_cache
fs/coda/inode.c coda_inode_cache
fs/dquot.c dquot
fs/efs/super.c efs_inode_cache
fs/ext2/super.c ext2_inode_cache
fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr
fs/ext3/super.c ext3_inode_cache
fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr
fs/fat/cache.c fat_cache
fs/fat/inode.c fat_inode_cache
fs/freevxfs/vxfs_super.c vxfs_inode
fs/hpfs/super.c hpfs_inode_cache
fs/isofs/inode.c isofs_inode_cache
fs/jffs/inode-v23.c jffs_fm
fs/jffs2/super.c jffs2_i
fs/jfs/super.c jfs_ip
fs/minix/inode.c minix_inode_cache
fs/ncpfs/inode.c ncp_inode_cache
fs/nfs/direct.c nfs_direct_cache
fs/nfs/inode.c nfs_inode_cache
fs/ntfs/super.c ntfs_big_inode_cache_name
fs/ntfs/super.c ntfs_inode_cache
fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache
fs/ocfs2/super.c ocfs2_inode_cache
fs/proc/inode.c proc_inode_cache
fs/qnx4/inode.c qnx4_inode_cache
fs/reiserfs/super.c reiser_inode_cache
fs/romfs/inode.c romfs_inode_cache
fs/smbfs/inode.c smb_inode_cache
fs/sysv/inode.c sysv_inode_cache
fs/udf/super.c udf_inode_cache
fs/ufs/super.c ufs_inode_cache
net/socket.c sock_inode_cache
net/sunrpc/rpc_pipe.c rpc_inode_cache

The choice of which slab caches to so mark was quite simple. I marked
those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
inode_cache, and buffer_head, which were marked in a previous patch. Even
though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
potentially large file system i/o related slab caches as we need for memory
spreading.

Given that the rule now becomes "wherever you would have used a
SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
Future file system writers will just copy one of the existing file system
slab cache setups and tend to get it right without thinking.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2006-03-24 23:33:23 +0800

23 Mar, 2006

2 commits

974615186 [PATCH] convert ext3's truncate_sem to a mutex ... Browse Code »

ext3's truncate_sem is always released in the same function it's taken
and it otherwise is a mutex as well..

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-03-23 23:38:14 +0800
d3be915fc [PATCH] sem2mutex: quota ... Browse Code »

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-03-23 23:38:11 +0800

10 Jan, 2006

2 commits

7892f2f48 [PATCH] mutex subsystem, semaphore to mutex: VFS, sb->s_lock ... Browse Code »

This patch converts the superblock-lock semaphore to a mutex, affecting
lock_super()/unlock_super(). Tested on ext3 and XFS.

Signed-off-by: Ingo Molnar

Ingo Molnar
2006-01-10 07:59:25 +0800
1b1dcc1b5 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem ... Browse Code »

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar

(finished the conversion)

Signed-off-by: Jes Sorensen
Signed-off-by: Ingo Molnar

Jes Sorensen
2006-01-10 07:59:24 +0800

09 Jan, 2006

1 commit

71b962574 [PATCH] ext3: external journal device as a mount option ... Browse Code »

The patch below adds a new mount option to allow the external journal
device to be specified.

The syntax is as follows:
# mount -t ext3 -o journal_dev=0x0820 ...
where 0x0820 means major=8 and minor=32.

Signed-off-by: Johann Lombardi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johann Lombardi
2006-01-09 12:13:56 +0800