06 Oct, 2016

1 commit

  • When nfsd calls fh_to_dentry, it expect ESTALE or ENOMEM as errors.
    In particular it can be tempting to return ENOENT, but this is not
    handled well by nfsd.

    Rather than requiring strict adherence to error code code filesystems,
    treat all unexpected error codes the same as ESTALE. This is safest.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

03 May, 2016

2 commits


23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

23 Feb, 2015

1 commit

  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided we the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my @callers;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
    @callers = ;
    close($fd);
    unless (@callers) {
    print "No matches\n";
    exit(0);
    }

    my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
    die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

04 Nov, 2014

1 commit


01 Nov, 2014

1 commit


05 Jun, 2014

1 commit


09 Nov, 2013

9 commits

  • Suppose we're given the filehandle for a directory whose closest
    ancestor in the dcache is its Nth ancestor.

    The main loop in reconnect_path searches for an IS_ROOT ancestor of
    target_dir, reconnects that ancestor to its parent, then recommences the
    search for an IS_ROOT ancestor from target_dir.

    This behavior is quadratic in N. And there's really no need to restart
    the search from target_dir each time: once a directory has been looked
    up, it won't become IS_ROOT again. So instead of starting from
    target_dir each time, we can continue where we left off.

    This simplifies the code and improves performance on very deep directory
    heirachies. (I can't think of any reason anyone should need heirarchies
    a hundred or more deep, but the performance improvement may be valuable
    if only to limit damage in case of abuse.)

    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Replace another unhelpful acronym.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Also replace 3 easily-confused three-letter acronyms by more helpful
    variable names.

    Just cleanup, no change in functionality, with one exception: the
    dentry_connected() check in the "out_reconnected" case will now only
    check the ancestors of the current dentry instead of checking all the
    way from target_dir. Since we've already verified connectivity up to
    this dentry, that should be sufficient.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Note this counter is now being set to 0 on every pass through the loop,
    so it no longer serves any useful purpose.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • There are two places here where we could race with a rename or remove:

    - We could find the parent, but then be removed or renamed away
    from that parent directory before finding our name in that
    directory.
    - We could find the parent, and find our name in that parent,
    but then be renamed or removed before we look ourselves up by
    that name in that parent.

    In both cases the concurrent rename or remove will take care of
    reconnecting the directory that we're currently examining. Our target
    directory should then also be connected. Check this and clear
    DISCONNECTED in these cases instead of looping around again.

    Note: we *do* need to check that this actually happened if we want to be
    robust in the face of corrupted filesystems: a corrupted filesystem
    could just return a completely wrong parent, and we want to fail with an
    error in that case before starting to clear DISCONNECTED on
    non-DISCONNECTED filesystems.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Once we've found any connected parent, we know all our parents are
    connected--that's true even if there's a concurrent rename. May as well
    clear them all at once and be done with it.

    Reviewed-by: Cristoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • This would indicate a nasty bug in the dcache and has never triggered in
    the past 10 years as far as I know.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
    32-bit NFS server exporting a very large XFS filesystem, when the
    server's cache is cold (so the inodes in question are not in cache).

    Reviewed-by: Christoph Hellwig
    Reported-by: Trevor Cordes
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     

08 Sep, 2013

1 commit


29 Jun, 2013

4 commits


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

21 Dec, 2012

1 commit

  • Pull nfsd update from Bruce Fields:
    "Included this time:

    - more nfsd containerization work from Stanislav Kinsbursky: we're
    not quite there yet, but should be by 3.9.

    - NFSv4.1 progress: implementation of basic backchannel security
    negotiation and the mandatory BACKCHANNEL_CTL operation. See

    http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

    for remaining TODO's

    - Fixes for some bugs that could be triggered by unusual compounds.
    Our xdr code wasn't designed with v4 compounds in mind, and it
    shows. A more thorough rewrite is still a todo.

    - If you've ever seen "RPC: multiple fragments per record not
    supported" logged while using some sort of odd userland NFS client,
    that should now be fixed.

    - Further work from Jeff Layton on our mechanism for storing
    information about NFSv4 clients across reboots.

    - Further work from Bryan Schumaker on his fault-injection mechanism
    (which allows us to discard selective NFSv4 state, to excercise
    rarely-taken recovery code paths in the client.)

    - The usual mix of miscellaneous bugs and cleanup.

    Thanks to everyone who tested or contributed this cycle."

    * 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
    nfsd4: don't leave freed stateid hashed
    nfsd4: free_stateid can use the current stateid
    nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
    nfsd: warn on odd reply state in nfsd_vfs_read
    nfsd4: fix oops on unusual readlike compound
    nfsd4: disable zero-copy on non-final read ops
    svcrpc: fix some printks
    NFSD: Correct the size calculation in fault_inject_write
    NFSD: Pass correct buffer size to rpc_ntop
    nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
    nfsd: simplify service shutdown
    nfsd: replace boolean nfsd_up flag by users counter
    nfsd: simplify NFSv4 state init and shutdown
    nfsd: introduce helpers for generic resources init and shutdown
    nfsd: make NFSd service structure allocated per net
    nfsd: make NFSd service boot time per-net
    nfsd: per-net NFSd up flag introduced
    nfsd: move per-net startup code to separated function
    nfsd: pass net to __write_ports() and down
    nfsd: pass net to nfsd_set_nrthreads()
    ...

    Linus Torvalds
     

18 Dec, 2012

2 commits

  • We will need this helper in the next patch to provide a file handle for
    inotify marks in /proc/pid/fdinfo output.

    The patch is rather providing the way to use inodes directly when dentry
    is not available (like in case of inotify system).

    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Alexey Dobriyan
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: "Aneesh Kumar K.V"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • This routine will be used to generate a file handle in fdinfo output for
    inotify subsystem, where if no s_export_op present the general
    export_encode_fh should be used. Thus add a test if s_export_op present
    inside exportfs_encode_fh itself.

    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Alexey Dobriyan
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: "Aneesh Kumar K.V"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     

08 Nov, 2012

1 commit


23 Jul, 2012

1 commit


14 Jul, 2012

1 commit


30 May, 2012

1 commit

  • pass inode + parent's inode or NULL instead of dentry + bool saying
    whether we want the parent or not.

    NOTE: that needs ceph fix folded in.

    Signed-off-by: Al Viro

    Al Viro
     

15 Mar, 2011

1 commit


14 Mar, 2011

1 commit

  • The exportfs encode handle function should return the minimum required
    handle size. This helps user to find out the handle size by passing 0
    handle size in the first step and then redoing to the call again with
    the returned handle size value.

    Acked-by: Serge Hallyn
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
     

07 Jan, 2011

4 commits

  • dcache_inode_lock can be replaced with per-inode locking. Use existing
    inode->i_lock for this. This is slightly non-trivial because we sometimes
    need to find the inode from the dentry, which requires d_inode to be
    stabilised (either with refcount or d_lock).

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dget_locked was a shortcut to avoid the lazy lru manipulation when we already
    held dcache_lock (lru manipulation was relatively cheap at that point).
    However, how that the lru lock is an innermost one, we never hold it at any
    caller, so the lock cost can now be avoided. We already have well working lazy
    dcache LRU, so it should be fine to defer LRU manipulations to scan time.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dcache_lock no longer protects anything. remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
    from concurrent modification. d_alias is also protected by d_lock.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

26 Oct, 2010

1 commit

  • Use dget_parent instead of opencoding it. This simplifies the code, but
    more importanly prepares for the more complicated locking for a parent
    dget in the dcache scale patch series.

    Signed-off-by: Al Viro

    Christoph Hellwig
     

28 Oct, 2009

1 commit


25 Dec, 2008

1 commit


09 Dec, 2008

1 commit

  • While 440037287c5 "[PATCH] switch all filesystems over to
    d_obtain_alias" removed some cases where fh_to_dentry() and
    fh_to_parent() could return NULL, there are still a few NULL returns
    left in individual filesystems. Thus it was a mistake for that commit
    to remove the handling of NULL returns in the callers.

    Revert those parts of 440037287c5 which removed the NULL handling.

    (We could, alternatively, modify all implementations to return -ESTALE
    instead of NULL, but that proves to require fixing a number of
    filesystems, and in some cases it's arguably more natural to return
    NULL.)

    Thanks to David for original patch and Linus, Christoph, and Hugh for
    review.

    Signed-off-by: J. Bruce Fields
    Cc: David Howells
    Cc: Christoph Hellwig
    Cc: Hugh Dickins
    Signed-off-by: Linus Torvalds

    J. Bruce Fields