26 Feb, 2013

1 commit

  • The following set of operations on an NFS client and server will cause
    a spurious ESTALE error on the client:

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30 # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

    Obviously, we should not be getting an ESTALE error back there since the
    inode still exists on the server. The problem is that the lookup code
    will call d_revalidate on the dentry that "." refers to, because NFS has
    FS_REVAL_DOT set.

    nfs_lookup_revalidate will see that the parent directory has changed and
    will try to reverify the dentry by redoing a LOOKUP. That of course
    fails, so the lookup code returns ESTALE.

    The problem here is that d_revalidate is really a bad fit for this case.
    What we really want to know at this point is whether the inode is still
    good or not, but we don't really care what name it goes by or whether
    the dcache is still valid.

    Add a new d_op->d_weak_revalidate operation and have complete_walk call
    that instead of d_revalidate. The intent there is to allow for a
    "weaker" d_revalidate that just checks to see whether the inode is still
    good. This also gives us an opportunity to kill off the FS_REVAL_DOT
    special casing.

    [AV: changed method name, added note in porting, fixed confusion re
    having it possibly called from RCU mode (it won't be)]

    Cc: NeilBrown
    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
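
The difference between the two checks described above can be sketched in userspace C. This is only a toy model under stated assumptions: the `model_*` types and functions are hypothetical, the return convention (0 or -ESTALE) is simplified, and none of it is the kernel's actual `d_revalidate` or `d_weak_revalidate` code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; these are not kernel types. */
struct model_inode {
    unsigned long ino;
    bool exists_on_server;      /* the server still has this inode */
};

struct model_dentry {
    const char *name;           /* name the client has cached */
    const char *name_on_server; /* name the server uses now */
    struct model_inode *inode;
};

/* Strong check: redo the lookup by name, so a rename makes it fail. */
int model_revalidate(const struct model_dentry *d)
{
    assert(d && d->inode);
    if (!d->inode->exists_on_server)
        return -ESTALE;
    if (strcmp(d->name, d->name_on_server) != 0)
        return -ESTALE;
    return 0;
}

/* Weak check: only ask whether the inode itself is still good. */
int model_weak_revalidate(const struct model_dentry *d)
{
    assert(d && d->inode);
    return d->inode->exists_on_server ? 0 : -ESTALE;
}
```

In this model, a dentry cached as "a" whose directory was renamed to "a.bak" fails the strong check but passes the weak one, which mirrors the `stat .` scenario in the commit message.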
     

21 Dec, 2012

1 commit


04 Aug, 2012

1 commit


14 Jul, 2012

4 commits


06 May, 2012

1 commit

  • After we moved inode_sync_wait() out of end_writeback(), it doesn't make sense
    to call the function end_writeback() anymore. Rename it to clear_inode(),
    which better describes what the function really does: set the I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

21 Mar, 2012

1 commit


26 Jul, 2011

1 commit

  • Replace the ->check_acl method with a ->get_acl method that simply reads an
    ACL from disk after having a cache miss. This means we can replace the ACL
    checking boilerplate code with a single implementation in namei.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
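
The idea of a single generic ACL check backed by a per-filesystem `->get_acl` reader can be sketched as follows. This is a userspace toy under assumptions: all `acl_*` names are hypothetical, caching is done in the generic helper purely for illustration, and the -EAGAIN return for "no ACL, fall back to mode bits" is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's posix_acl or inode. */
struct acl_entry {
    unsigned int allowed_mask;  /* permission bits this ACL grants */
};

struct acl_inode {
    struct acl_entry *cached_acl;                          /* NULL = cache miss */
    struct acl_entry *(*get_acl)(struct acl_inode *inode); /* reads from "disk" */
    int disk_reads;                                        /* counts slow-path calls */
};

/* Single generic check: consult the cache, call ->get_acl only on a miss. */
int acl_check(struct acl_inode *inode, unsigned int mask)
{
    struct acl_entry *acl;

    assert(inode);
    acl = inode->cached_acl;
    if (!acl && inode->get_acl) {
        acl = inode->get_acl(inode);
        inode->cached_acl = acl;    /* remember the result for next time */
    }
    if (!acl)
        return -EAGAIN;             /* assumed: caller falls back to mode bits */
    return (mask & ~acl->allowed_mask) ? -EACCES : 0;
}

/* Example filesystem callback: returns a fixed ACL granting bit 0x4 only. */
static struct acl_entry acl_on_disk = { 0x4 };

struct acl_entry *acl_read_from_disk(struct acl_inode *inode)
{
    inode->disk_reads++;
    return &acl_on_disk;
}
```

The filesystem side shrinks to a reader function, while the permission logic lives in one place.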
     

21 Jul, 2011

2 commits

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems, like ext3 and ocfs2, can apparently drop taking the i_mutex
    altogether. For correctness' sake I just pushed everything down in all cases to
    make sure that we keep the current behavior the same for everybody, and then
    each individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     
  • This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out
    using fiemap in things like cp causes more problems than it solves, so let's try
    and give userspace an interface that doesn't suck. We need to match Solaris
    here, and the definitions are:

    o  If whence is SEEK_HOLE, the offset of the start of the
       next hole greater than or equal to the supplied offset
       is returned. The definition of a hole is provided near
       the end of the DESCRIPTION.

    o  If whence is SEEK_DATA, the file pointer is set to the
       start of the next non-hole file region greater than or
       equal to the supplied offset.

    So in the generic case the entire file is data and there is a virtual hole at
    the end. That means we will just return i_size for SEEK_HOLE and will return
    the same offset for SEEK_DATA. This is how Solaris does it so we have to do it
    the same way.

    Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
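
The generic behaviour described above (the whole file is data, with one virtual hole at the end) can be sketched as a pure function. This is a minimal userspace sketch: `model_generic_seek` and the `MODEL_SEEK_*` constants are hypothetical names, and the handling of negative offsets and of offsets at or past end of file is an assumption, since the commit message does not spell those cases out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the SEEK_DATA / SEEK_HOLE whence values. */
enum model_whence { MODEL_SEEK_DATA, MODEL_SEEK_HOLE };

/*
 * Generic case: everything in [0, i_size) is data and there is a single
 * virtual hole starting at i_size.
 */
long long model_generic_seek(enum model_whence whence, long long offset,
                             long long i_size)
{
    assert(i_size >= 0);
    if (offset < 0)
        return -EINVAL;         /* assumption: reject negative offsets */
    if (offset >= i_size)
        return -ENXIO;          /* assumption: nothing at or past EOF */

    switch (whence) {
    case MODEL_SEEK_DATA:
        return offset;          /* already inside data */
    case MODEL_SEEK_HOLE:
        return i_size;          /* the only hole is the virtual one at EOF */
    }
    return -EINVAL;
}
```

For a 100-byte file, seeking for data from offset 10 yields 10, and seeking for a hole from offset 10 yields 100.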
     

20 Jul, 2011

1 commit


25 Mar, 2011

1 commit

  • Now that inode state changes are protected by the inode->i_lock and
    the inode LRU manipulations by the inode_lru_lock, we can remove the
    inode_lock from prune_icache and the initial part of iput_final().

    Instead of using the inode_lock to protect the inode during
    iput_final, use the inode->i_lock instead. This protects the inode
    against new references being taken while we change the inode state
    to I_FREEING, as well as preventing prune_icache from grabbing the
    inode while we are manipulating it. Hence we no longer need the
    inode_lock in iput_final prior to setting I_FREEING on the inode.

    For prune_icache, we no longer need the inode_lock to protect the
    LRU list, and the inodes themselves are protected against freeing
    races by the inode->i_lock. Hence we can lift the inode_lock from
    prune_icache as well.

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
     

17 Mar, 2011

1 commit


14 Jan, 2011

2 commits


13 Jan, 2011

1 commit


07 Jan, 2011

7 commits

  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Require filesystems be aware of .d_revalidate being called in rcu-walk
    mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
    -ECHILD from all implementations.

    Signed-off-by: Nick Piggin

    Nick Piggin
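
The "simple push down" can be pictured with a small userspace model. The sketch assumes a stand-in flag value and hypothetical `model_*` names; it is not the kernel's `nameidata` or any filesystem's real `.d_revalidate`. The point it illustrates is only that an rcu-walk call bails out with -ECHILD before doing any work that might block.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_LOOKUP_RCU 0x1u   /* stand-in bit, not the kernel's value */

/* Hypothetical stand-in for the lookup state passed to the callback. */
struct model_nameidata {
    unsigned int flags;
};

/*
 * Returns 1 if the dentry is still valid, 0 if not, negative on error.
 * slow_check stands for the filesystem's existing (possibly blocking)
 * revalidation logic.
 */
int model_d_revalidate(const struct model_nameidata *nd, int (*slow_check)(void))
{
    assert(nd && slow_check);
    if (nd->flags & MODEL_LOOKUP_RCU)
        return -ECHILD;         /* cannot do this under rcu-walk */
    return slow_check();
}

/* Trivial slow path used for illustration. */
int model_always_valid(void)
{
    return 1;
}
```

A caller that sees -ECHILD is expected to retry the lookup outside rcu-walk mode, where the slow path may run.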
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dcache_lock no longer protects anything. Remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_hash so it may be called from lock-free RCU lookups. See similar
    patch for d_compare for details.

    For in-tree filesystems, this is just a mechanical change.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_compare so it may be called from lock-free RCU lookups. This
    does put significant restrictions on what may be done from the callback,
    however there don't seem to have been any problems with in-tree fses.
    If some strange use case pops up that _really_ cannot cope with the
    rcu-walk rules, we can just add new rcu-unaware callbacks, which would
    cause name lookup to drop out of rcu-walk mode.

    For in-tree filesystems, this is just a mechanical change.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_delete from a dentry deletion notification to dentry caching
    advice, more like ->drop_inode. Require it to be constant and idempotent,
    and not take d_lock. This is how all existing filesystems use the callback
    anyway.

    This makes fine grained dentry locking of dput and dentry lru scanning
    much simpler.

    Signed-off-by: Nick Piggin

    Nick Piggin
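
A callback that is "constant and idempotent" in the sense above is essentially a pure predicate. The following userspace sketch uses hypothetical names (`adv_dentry`, `adv_d_delete`) and assumes that a nonzero return means "do not keep this dentry cached".

```c
#include <assert.h>

/* Hypothetical stand-in for a dentry; not the kernel structure. */
struct adv_dentry {
    int dont_cache;     /* decided once when the dentry is set up */
};

/*
 * Caching advice: nonzero means "do not keep this dentry cached".
 * It takes no locks, changes no state, and gives the same answer
 * however many times it is called.
 */
int adv_d_delete(const struct adv_dentry *dentry)
{
    assert(dentry);
    return dentry->dont_cache != 0;
}
```

Because the function has no side effects, the caller is free to invoke it at any point in dput or LRU scanning without coordinating locks.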
     

10 Aug, 2010

2 commits


28 Oct, 2009

1 commit


08 Feb, 2008

2 commits

  • Remove the old iget() call and the read_inode() superblock operation it uses
    as these are really obsolete, and the use of read_inode() does not produce
    proper error handling (no distinction between ENOMEM and EIO when marking an
    inode bad).

    Furthermore, this removes the temptation to use iget() to find an inode by
    number in a filesystem from code outside that filesystem.

    iget_locked() should be used instead. A new function is added in an earlier
    patch (iget_failed) that is to be called to mark an inode as bad, unlock it
    and release it should the get routine fail. Mark iget() and read_inode() as
    being obsolete and remove references to them from the documentation.

    Typically a filesystem will be modified such that the read_inode function
    becomes an internal iget function, for example the following:

    void thingyfs_read_inode(struct inode *inode)
    {
            ...
    }

    would be changed into something like:

    struct inode *thingyfs_iget(struct super_block *sb, unsigned long ino)
    {
            struct inode *inode;
            int ret;

            inode = iget_locked(sb, ino);
            if (!inode)
                    return ERR_PTR(-ENOMEM);
            if (!(inode->i_state & I_NEW))
                    return inode;

            ...
            unlock_new_inode(inode);
            return inode;
    error:
            iget_failed(inode);
            return ERR_PTR(ret);
    }

    and then thingyfs_iget() would be called rather than iget(), for example:

    ret = -EINVAL;
    inode = iget(sb, ino);
    if (!inode || is_bad_inode(inode))
            goto error;

    becomes:

    inode = thingyfs_iget(sb, ino);
    if (IS_ERR(inode)) {
            ret = PTR_ERR(inode);
            goto error;
    }

    Note that is_bad_inode() does not need to be called. The error returned by
    thingyfs_iget() should render it unnecessary.

    Signed-off-by: David Howells
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
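
The calling convention above relies on encoding an errno value in the returned pointer. A userspace approximation of that convention is sketched below. The `model_*` helpers are hypothetical re-creations for illustration, not the kernel's `ERR_PTR`, `PTR_ERR` and `IS_ERR` macros, and the toy `model_thingyfs_iget` treats inode number 0 as invalid purely as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095    /* errors occupy the top 4095 addresses */

/* Encode a negative errno value as a pointer. */
static inline void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static inline long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for error codes. */
static inline int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical stand-in for an inode. */
struct model_inode {
    unsigned long ino;
};

/* Toy "iget": returns a usable inode or an error pointer, never NULL. */
struct model_inode *model_thingyfs_iget(unsigned long ino)
{
    struct model_inode *inode;

    if (ino == 0)
        return model_err_ptr(-EINVAL);  /* assumed invalid in this sketch */
    inode = calloc(1, sizeof(*inode));
    if (!inode)
        return model_err_ptr(-ENOMEM);
    inode->ino = ino;
    return inode;
}
```

Since the failure reason travels with the pointer, the caller can tell -ENOMEM from other errors without a separate bad-inode check.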
     
  • Introduce a function to register failure in an inode construction path. This
    includes marking the inode under construction as bad, unlocking it and
    releasing it.

    Signed-off-by: David Howells
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

03 Feb, 2008

1 commit


25 May, 2007

1 commit


23 Jun, 2006

1 commit

  • Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the mountpoint.

    The filesystem is then required to manually set the superblock and root dentry
    pointers. For most filesystems, this should be done with simple_set_mnt()
    which will set the superblock pointer and then set the root dentry to the
    superblock's s_root (as per the old default behaviour).

    The get_sb() op now returns an integer as there's now no need to return the
    superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several mount
    points, such as can be done with NFS to avoid potential inode aliasing. In
    such a case, simple_set_mnt() would not be called, and instead the mnt_root
    and mnt_sb would be set directly.

    The patch also makes the following changes:

    (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
    pointer argument and return an integer, so most filesystems have to change
    very little.

    (*) If one of the convenience functions is not used, then get_sb() should
    normally call simple_set_mnt() to instantiate the vfsmount. This will
    always return 0, and so can be tail-called from get_sb().

    (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
    dcache upon superblock destruction rather than shrink_dcache_anon().

    This is required because the superblock may now have multiple trees that
    aren't actually bound to s_root, but that still need to be cleaned up. The
    currently called functions assume that the whole tree is rooted at s_root,
    and that anonymous dentries are not the roots of trees, which results in
    dentries being left unculled.

    However, with the way NFS superblock sharing is currently set to be
    implemented, these assumptions are violated: the root of the filesystem is
    simply a dummy dentry and inode (the real inode for '/' may well be
    inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
    with child trees.

    [*] Anonymous until discovered from another tree.

    (*) The documentation has been adjusted, including the additional bit of
    changing ext2_* into foo_* in the documentation.

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
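
The default behaviour described for simple_set_mnt() can be modelled in a few lines. This is a userspace sketch with hypothetical `mnt_model_*` types; reference counting on the root dentry is deliberately left out, and the real function's details may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct mnt_model_dentry {
    const char *name;
};

struct mnt_model_super_block {
    struct mnt_model_dentry *s_root;    /* root dentry of the filesystem */
};

struct mnt_model_vfsmount {
    struct mnt_model_super_block *mnt_sb;
    struct mnt_model_dentry *mnt_root;
};

/* Point the mount at the superblock and at the superblock's own root. */
int mnt_model_simple_set_mnt(struct mnt_model_vfsmount *mnt,
                             struct mnt_model_super_block *sb)
{
    assert(mnt && sb);
    mnt->mnt_sb = sb;
    mnt->mnt_root = sb->s_root;
    return 0;                           /* always succeeds */
}

/* A get_sb()-style operation can simply tail-call the helper. */
int mnt_model_get_sb(struct mnt_model_vfsmount *mnt,
                     struct mnt_model_super_block *sb)
{
    return mnt_model_simple_set_mnt(mnt, sb);
}
```

A filesystem that shares one superblock across several mount points would skip the helper and set `mnt_root` to a different dentry directly, as the commit message notes for NFS.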
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds