Doug / smarc-fsl-linux-kernel | Embedian Git Server

26 Feb, 2013

1 commit

ecf3d1f1a vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op

The following set of operations on an NFS client and server will cause
an ESTALE error:

server# mkdir a
client# cd a
server# mv a a.bak
client# sleep 30 # (or whatever the dir attrcache timeout is)
client# stat .
stat: cannot stat `.': Stale NFS file handle

Obviously, we should not be getting an ESTALE error back there since the
inode still exists on the server. The problem is that the lookup code
will call d_revalidate on the dentry that "." refers to, because NFS has
FS_REVAL_DOT set.

nfs_lookup_revalidate will see that the parent directory has changed and
will try to reverify the dentry by redoing a LOOKUP. That of course
fails, so the lookup code returns ESTALE.

The problem here is that d_revalidate is really a bad fit for this case.
What we really want to know at this point is whether the inode is still
good or not, but we don't really care what name it goes by or whether
the dcache is still valid.

Add a new d_op->d_weak_revalidate operation and have complete_walk call
that instead of d_revalidate. The intent there is to allow for a
"weaker" d_revalidate that just checks to see whether the inode is still
good. This also gives us an opportunity to kill off the FS_REVAL_DOT
special casing.
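
The difference between the two checks can be sketched in a small user-space
model. Everything below is a hypothetical stand-in (the model_ types and
functions are not kernel code, and the real method signatures differ); only
the operation names d_revalidate and d_weak_revalidate come from this commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct model_inode {
    bool exists_on_server;      /* is the inode itself still good? */
};

struct model_dentry;

struct model_dentry_ops {
    /* Full check: is the name -> inode binding still valid? */
    int (*d_revalidate)(struct model_dentry *);
    /* Weak check: is the inode still good, whatever name it goes by? */
    int (*d_weak_revalidate)(struct model_dentry *);
};

struct model_dentry {
    struct model_inode *inode;
    bool name_still_valid;      /* false once the server renames the directory */
    const struct model_dentry_ops *ops;
};

static int model_revalidate(struct model_dentry *d)
{
    return d->name_still_valid && d->inode->exists_on_server;
}

static int model_weak_revalidate(struct model_dentry *d)
{
    return d->inode->exists_on_server;
}

const struct model_dentry_ops model_ops = {
    .d_revalidate = model_revalidate,
    .d_weak_revalidate = model_weak_revalidate,
};

/*
 * Models the end of a walk that landed on a dentry reached without a name
 * lookup (such as "."): only the weak check is consulted.
 * Returns 1 if the dentry is usable, 0 if it is stale.
 */
int model_complete_walk(struct model_dentry *d)
{
    if (d->ops == NULL || d->ops->d_weak_revalidate == NULL)
        return 1;
    return d->ops->d_weak_revalidate(d);
}
```

In this model, a directory that was renamed on the server fails the full
check but passes the weak one, which is the behaviour the "stat ." example
above calls for.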

[AV: changed method name, added note in porting, fixed confusion re
having it possibly called from RCU mode (it won't be)]

Cc: NeilBrown
Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2013-02-26 15:46:09 +0800

21 Dec, 2012

1 commit

b9f61c3c0 documentation: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli
Signed-off-by: Al Viro

Marco Stornelli
2012-12-21 07:47:08 +0800

04 Aug, 2012

1 commit

34e5053fb Documentation: get rid of write_super

The '->write_super' superblock method is gone, and this patch removes all the
references to 'write_super' from various pieces of the kernel documentation.

Cc: Randy Dunlap
Signed-off-by: Artem Bityutskiy
Signed-off-by: Al Viro

Artem Bityutskiy
2012-08-04 05:25:20 +0800

14 Jul, 2012

4 commits

ebfc3b49a don't pass nameidata to ->create()

A boolean "does it have to be exclusive?" flag is passed instead;
local filesystems should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:47 +0800
00cd8dd3b stop passing nameidata to ->lookup()

Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:32 +0800
0b728e191 stop passing nameidata * to ->d_revalidate()

Just the lookup flags. Die, bastard, die...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:14 +0800
049b3c10e vfs: update documentation on ->i_dentry handling

we used to need to clean it in RCU callback freeing an inode;
in 3.2 that requirement went away. Unfortunately, it hadn't
been reflected in Documentation/filesystems/porting.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:32:51 +0800

06 May, 2012

1 commit

dbd5768f8 vfs: Rename end_writeback() to clear_inode()

After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which describes well what the function really does: set the I_CLEAR flag.

Signed-off-by: Jan Kara
Signed-off-by: Fengguang Wu

Jan Kara
2012-05-06 13:43:41 +0800

21 Mar, 2012

1 commit

32991ab30 vfs: d_alloc_root() gone

all callers converted to d_make_root() by now

Signed-off-by: Al Viro

Al Viro
2012-03-21 09:29:37 +0800

26 Jul, 2011

1 commit

4e34e719e fs: take the ACL checks to common code

Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss. This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2011-07-26 02:30:23 +0800

21 Jul, 2011

2 commits

02c24a821 fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers

Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness' sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara
Signed-off-by: Josef Bacik
Signed-off-by: Al Viro

Josef Bacik
2011-07-21 08:47:59 +0800
982d81658 fs: add SEEK_HOLE and SEEK_DATA flags

This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out
using fiemap in things like cp causes more problems than it solves, so let's try
and give userspace an interface that doesn't suck. We need to match Solaris
here, and the definitions are

*o* If /whence/ is SEEK_HOLE, the offset of the start of the
next hole greater than or equal to the supplied offset
is returned. The definition of a hole is provided near
the end of the DESCRIPTION.

*o* If /whence/ is SEEK_DATA, the file pointer is set to the
start of the next non-hole file region greater than or
equal to the supplied offset.

So in the generic case the entire file is data and there is a virtual hole at
the end. That means we will just return i_size for SEEK_HOLE and will return
the same offset for SEEK_DATA. This is how Solaris does it so we have to do it
the same way.
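
The generic case can be sketched as a pure user-space function. The names
below (model_generic_seek, enum model_whence) are hypothetical and this is
not the kernel implementation. Returning -ENXIO for an offset at or past end
of file follows the documented lseek behaviour for these whence values and is
an assumption here, since the message above does not state it.

```c
#include <errno.h>

/*
 * Hypothetical whence values for this sketch only; the real SEEK_DATA and
 * SEEK_HOLE constants come from the platform headers.
 */
enum model_whence { MODEL_SEEK_DATA, MODEL_SEEK_HOLE };

/*
 * Generic case: the whole file is data and there is one virtual hole
 * starting at i_size.
 * Returns the resulting offset, -EINVAL for a negative offset or unknown
 * whence, or -ENXIO when offset is at or beyond end of file.
 */
long long model_generic_seek(enum model_whence whence, long long offset,
                             long long i_size)
{
    if (offset < 0)
        return -EINVAL;
    if (offset >= i_size)
        return -ENXIO;

    switch (whence) {
    case MODEL_SEEK_DATA:
        return offset;      /* already inside data: offset is unchanged */
    case MODEL_SEEK_HOLE:
        return i_size;      /* the only hole is the virtual one at EOF */
    }
    return -EINVAL;
}
```

A filesystem that actually tracks holes would replace both cases with a walk
over its extent information; the sketch only covers the fallback described
above.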

Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Al Viro

Josef Bacik
2011-07-21 08:47:56 +0800

20 Jul, 2011

1 commit

76fe3276b ->permission() sanitizing: document API changes

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:43:36 +0800

25 Mar, 2011

1 commit

f283c86af fs: remove inode_lock from iput_final and prune_icache

Now that inode state changes are protected by the inode->i_lock and
the inode LRU manipulations by the inode_lru_lock, we can remove the
inode_lock from prune_icache and the initial part of iput_final().

Instead of using the inode_lock to protect the inode during
iput_final, use the inode->i_lock. This protects the inode
against new references being taken while we change the inode state
to I_FREEING, as well as preventing prune_icache from grabbing the
inode while we are manipulating it. Hence we no longer need the
inode_lock in iput_final prior to setting I_FREEING on the inode.

For prune_icache, we no longer need the inode_lock to protect the
LRU list, and the inodes themselves are protected against freeing
races by the inode->i_lock. Hence we can lift the inode_lock from
prune_icache as well.

Signed-off-by: Dave Chinner
Signed-off-by: Al Viro

Dave Chinner
2011-03-25 09:16:32 +0800

17 Mar, 2011

1 commit

1a102ff92 vfs: bury ->get_sb()

This is an ex-parrot.

Signed-off-by: Al Viro

Al Viro
2011-03-17 04:48:06 +0800

14 Jan, 2011

2 commits

db9effe99 Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin

* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
fs: fix do_last error case when need_reval_dot
nfs: add missing rcu-walk check
fs: hlist UP debug fixup
fs: fix dropping of rcu-walk from force_reval_path
fs: force_reval_path drop rcu-walk before d_invalidate
fs: small rcu-walk documentation fixes

Fixed up trivial conflicts in Documentation/filesystems/porting

Linus Torvalds
2011-01-14 12:14:13 +0800
a82416da8 fs: small rcu-walk documentation fixes

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-14 10:26:53 +0800

13 Jan, 2011

1 commit

924241575 fs: add documentation on fallocate hole punching

This patch simply adds documentation on how to handle the hole punching mode of
fallocate for any filesystem wishing to use it.

Signed-off-by: Josef Bacik
Signed-off-by: Al Viro

Josef Bacik
2011-01-13 09:19:03 +0800

07 Jan, 2011

7 commits

b74c79e99 fs: provide rcu-walk aware permission i_ops

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:29 +0800
34286d666 fs: rcu-walk aware d_revalidate method

Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.
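
The "simple push down" can be sketched in user space as follows. The names
are hypothetical (MODEL_LOOKUP_RCU is a made-up flag bit, not the kernel's
LOOKUP_RCU value, and the real method takes different arguments); only the
-ECHILD convention comes from the message above.

```c
#include <errno.h>

/* Hypothetical flag bit for this model; the kernel's value is not assumed. */
#define MODEL_LOOKUP_RCU 0x1u

/* Stand-in for revalidation work that may block, e.g. a network round trip. */
static int model_slow_revalidate(int name_is_valid)
{
    return name_is_valid ? 1 : 0;
}

/*
 * Refuse to run in rcu-walk mode so that the caller can retry the lookup
 * in a mode where blocking is allowed.
 * Returns -ECHILD in rcu-walk mode, otherwise 1 if valid and 0 if invalid.
 */
int model_d_revalidate(int name_is_valid, unsigned int flags)
{
    if (flags & MODEL_LOOKUP_RCU)
        return -ECHILD;
    return model_slow_revalidate(name_is_valid);
}
```

An implementation that can answer without blocking would handle the rcu-walk
case itself instead of bailing out.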

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:29 +0800
fa0d7e3de fs: icache RCU free inodes

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (i.e. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:26 +0800
b5c84bf6f fs: dcache remove dcache_lock

dcache_lock no longer protects anything. remove it.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:23 +0800
b1e6a015a fs: change d_hash for rcu-walk

Change d_hash so it may be called from lock-free RCU lookups. See similar
patch for d_compare for details.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:20 +0800
621e155a3 fs: change d_compare for rcu-walk

Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback,
however there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:19 +0800
fe15ce446 fs: change d_delete semantics

Change d_delete from a dentry deletion notification to a dentry caching
advice, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:18 +0800

10 Aug, 2010

2 commits

336fb3b97 update VFS documentation for method changes.

Signed-off-by: Al Viro

Al Viro
2010-08-10 04:48:40 +0800
1e2317350 update documentation for the new truncate sequence

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-08-10 04:47:40 +0800

28 Oct, 2009

1 commit

dc7a08166 nfs: new subdir Documentation/filesystems/nfs

We're adding enough nfs documentation that it may as well have its own
subdirectory.

Acked-by: Randy Dunlap
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2009-10-28 07:34:04 +0800

08 Feb, 2008

2 commits

12debc424 iget: remove iget() and the read_inode() super op as being obsolete

Remove the old iget() call and the read_inode() superblock operation it uses
as these are really obsolete, and the use of read_inode() does not produce
proper error handling (no distinction between ENOMEM and EIO when marking an
inode bad).

Furthermore, this removes the temptation to use iget() to find an inode by
number in a filesystem from code outside that filesystem.

iget_locked() should be used instead. A new function is added in an earlier
patch (iget_failed) that is to be called to mark an inode as bad, unlock it
and release it should the get routine fail. Mark iget() and read_inode() as
being obsolete and remove references to them from the documentation.

Typically a filesystem will be modified such that the read_inode function
becomes an internal iget function, for example the following:

	void thingyfs_read_inode(struct inode *inode)
	{
		...
	}

would be changed into something like:

	struct inode *thingyfs_iget(struct super_block *sb, unsigned long ino)
	{
		struct inode *inode;
		int ret;

		inode = iget_locked(sb, ino);
		if (!inode)
			return ERR_PTR(-ENOMEM);
		if (!(inode->i_state & I_NEW))
			return inode;

		...
		unlock_new_inode(inode);
		return inode;
	error:
		iget_failed(inode);
		return ERR_PTR(ret);
	}

and then thingyfs_iget() would be called rather than iget(), for example:

	ret = -EINVAL;
	inode = iget(sb, ino);
	if (!inode || is_bad_inode(inode))
		goto error;

becomes:

	inode = thingyfs_iget(sb, ino);
	if (IS_ERR(inode)) {
		ret = PTR_ERR(inode);
		goto error;
	}

Note that is_bad_inode() does not need to be called. The error returned by
thingyfs_iget() should render it unnecessary.
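
The caller-side pattern relies on an error code being carried inside the
returned pointer. A user-space model of that encoding might look like the
sketch below. All model_ names are hypothetical stand-ins for ERR_PTR(),
PTR_ERR() and IS_ERR(), and model_iget() is an invented example that treats
inode number 0 as invalid; none of this is the kernel code itself.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical user-space model; model_ names are not kernel identifiers. */
#define MODEL_MAX_ERRNO 4095

struct model_inode {
    unsigned long ino;
};

/* Encode a negative errno value as one of the top MODEL_MAX_ERRNO addresses. */
static inline void *model_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the negative errno value from an error pointer. */
static inline long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr lies in the address range reserved for error codes. */
static inline int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Stand-in for a filesystem's iget: inode number 0 is rejected here. */
struct model_inode *model_iget(unsigned long ino)
{
    static struct model_inode the_inode;

    if (ino == 0)
        return model_err_ptr(-EINVAL);
    the_inode.ino = ino;
    return &the_inode;
}
```

Because a failed call returns a distinct error pointer rather than a bad
inode, the caller can tell failures such as ENOMEM and EIO apart without a
separate is_bad_inode() check.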

Signed-off-by: David Howells
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2008-02-08 00:42:29 +0800
b46980fee iget: introduce a function to register iget failure

Introduce a function to register failure in an inode construction path. This
includes marking the inode under construction as bad, unlocking it and
releasing it.

Signed-off-by: David Howells
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2008-02-08 00:42:26 +0800

03 Feb, 2008

1 commit

3eb43f689 Documentation/filesystems/porting fixes

typo fix and whitespace cleanup

Signed-off-by: Oliver Pinter
Signed-off-by: Adrian Bunk

Oliver Pinter
2008-02-03 23:59:17 +0800

25 May, 2007

1 commit

c2b38989c Documentation: Fix up docs still talking about i_sem

.. it got changed to 'i_mutex' some time ago.

Signed-off-by: Josef 'Jeff' Sipek
Signed-off-by: Linus Torvalds

Josef 'Jeff' Sipek
2007-05-25 01:16:17 +0800

23 Jun, 2006

1 commit

454e2398b [PATCH] VFS: Permit filesystem to override root dentry on mount

Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.

(*) If one of the convenience functions is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().

(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().

This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.

However, with the way NFS superblock sharing is currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.

[*] Anonymous until discovered from another tree.

(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800