Commit cc53ce53c86924bfe98a12ea20b7465038a08792

Authored by David Howells
Committed by Al Viro
1 parent 9875cf8064

Add a dentry op to allow processes to be held during pathwalk transit

Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct dentry *dentry, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 to let the process continue on its way; -EISDIR to prohibit
the caller from skipping to overmounted filesystems or automounting, and to
make it use this directory instead; or some other error code to be returned to
the user.
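
As a rough userspace illustration of the return-value convention above (not
kernel code; the enum and model_classify() are names made up for this sketch),
a caller of ->d_manage() might interpret the result like this:

```c
#include <errno.h>

/* Hypothetical names: what pathwalk does next after asking the filesystem. */
enum model_transit {
	MODEL_CONTINUE,		/* 0: carry on to mounts/automounts on this dir */
	MODEL_USE_THIS_DIR,	/* -EISDIR: treat as an ordinary directory */
	MODEL_FAIL,		/* anything else: hand the error to the user */
};

/*
 * Map a ->d_manage() return value onto the next step.  *err receives the
 * error to report, or 0 when the walk is not being aborted.
 */
static enum model_transit model_classify(int ret, int *err)
{
	*err = 0;
	if (ret >= 0)
		return MODEL_CONTINUE;
	if (ret == -EISDIR)
		return MODEL_USE_THIS_DIR;
	*err = ret;
	return MODEL_FAIL;
}
```

This mirrors the "ret == -EISDIR ? 0 : ret" test that the patch adds to
follow_managed() and follow_down(): -EISDIR stops the transit but is swallowed
rather than reported.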

->d_manage() is called with namespace_sem writelocked if mounting_here is true,
and with no other locks held, so it may sleep.  However, if mounting_here is
true, it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.
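
A filesystem-side policy obeying that rule could look roughly like the
userspace model below.  Everything in it is an assumption made for
illustration: struct model_dir, the is_daemon argument (real code would work
this out from the calling task), the choice of -EPERM, and the "wait" that
merely bumps a counter instead of sleeping.  It is not the autofs
implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-directory state kept by the filesystem. */
struct model_dir {
	bool pending;		/* a mount or expire is in flight */
	unsigned waits;		/* how many times the model "slept" */
};

/* Stand-in for blocking until the daemon is done; real code would sleep. */
static void model_wait_for_daemon(struct model_dir *dir)
{
	dir->waits++;
	dir->pending = false;
}

/*
 * Model of a ->d_manage() policy:
 *  - the daemon is always let through and never waits;
 *  - with mounting_here set, namespace_sem is held by the caller, so the
 *    model refuses instead of waiting on a mount or unmount;
 *  - ordinary clients are held until pending work has finished.
 */
static int model_d_manage(struct model_dir *dir, bool is_daemon,
			  bool mounting_here)
{
	if (is_daemon)
		return 0;
	if (mounting_here)
		return -EPERM;	/* assumed error code, not from the patch */
	if (dir->pending)
		model_wait_for_daemon(dir);
	return 0;
}
```

The point of the sketch is only the ordering of the checks: the
mounting_here test comes before anything that could block.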

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.
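
The shape of the new follow_down() loop can be modelled in userspace as
follows.  This is a simplified sketch, not the kernel function: struct
model_dentry, its covering_root pointer (standing in for lookup_mnt() plus the
dput()/mntput()/dget() reference juggling) and the three callbacks are invented
for the example; only the flag values and the control flow are taken from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as in the include/linux/dcache.h hunk of this patch. */
#define DCACHE_MOUNTED		0x10000
#define DCACHE_NEED_AUTOMOUNT	0x20000
#define DCACHE_MANAGE_TRANSIT	0x40000
#define DCACHE_MANAGED_DENTRY \
	(DCACHE_MOUNTED | DCACHE_NEED_AUTOMOUNT | DCACHE_MANAGE_TRANSIT)

/* Hypothetical stand-in for a dentry plus whatever is mounted on it. */
struct model_dentry {
	unsigned d_flags;
	int (*d_manage)(struct model_dentry *, bool);
	struct model_dentry *covering_root;	/* root of mount on top, or NULL */
};

/* Example callbacks (all hypothetical). */
static int model_allow(struct model_dentry *d, bool mounting_here)
{
	(void)d; (void)mounting_here;
	return 0;
}

static int model_stay(struct model_dentry *d, bool mounting_here)
{
	(void)d; (void)mounting_here;
	return -EISDIR;
}

static int model_deny_mounts(struct model_dentry *d, bool mounting_here)
{
	(void)d;
	return mounting_here ? -EACCES : 0;
}

/* Walk *pos to the top of the pile, consulting d_manage at each step. */
static int model_follow_down(struct model_dentry **pos, bool mounting_here)
{
	unsigned managed;
	int ret;

	while ((managed = (*pos)->d_flags) & DCACHE_MANAGED_DENTRY) {
		if (managed & DCACHE_MANAGE_TRANSIT) {
			assert((*pos)->d_manage != NULL); /* models BUG_ON() */
			ret = (*pos)->d_manage(*pos, mounting_here);
			if (ret < 0)
				return ret == -EISDIR ? 0 : ret;
		}
		if (managed & DCACHE_MOUNTED) {
			if (!(*pos)->covering_root)
				break;
			*pos = (*pos)->covering_root;
			continue;
		}
		break;	/* automount points are not handled here */
	}
	return 0;
}
```

With a three-deep pile, model_allow lets the walk reach the topmost root,
model_stay leaves the position on the managed directory with a return of 0,
and model_deny_mounts aborts only when mounting_here is true.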

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every transit from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.
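
Toggling the flag amounts to setting or clearing one bit in d_flags.  The
userspace sketch below shows the idea with the flag values from this patch;
struct model_dentry and both helpers are hypothetical, and it is assumed that
real kernel code would make the update under dentry->d_lock.

```c
#include <stdbool.h>

/* Flag values as in the include/linux/dcache.h hunk of this patch. */
#define DCACHE_MOUNTED		0x10000
#define DCACHE_NEED_AUTOMOUNT	0x20000
#define DCACHE_MANAGE_TRANSIT	0x40000
#define DCACHE_MANAGED_DENTRY \
	(DCACHE_MOUNTED | DCACHE_NEED_AUTOMOUNT | DCACHE_MANAGE_TRANSIT)

/* Hypothetical stand-in for the one dentry field the sketch touches. */
struct model_dentry {
	unsigned d_flags;
};

/*
 * Turn transit management on or off for a directory.  Assumption: in the
 * kernel this would be done with dentry->d_lock held.
 */
static void model_manage_transit(struct model_dentry *d, bool on)
{
	if (on)
		d->d_flags |= DCACHE_MANAGE_TRANSIT;
	else
		d->d_flags &= ~DCACHE_MANAGE_TRANSIT;
}

/* Same test as the d_managed() helper added by the patch. */
static bool model_d_managed(const struct model_dentry *d)
{
	return (d->d_flags & DCACHE_MANAGED_DENTRY) != 0;
}
```

Clearing the bit leaves DCACHE_MOUNTED and DCACHE_NEED_AUTOMOUNT untouched, so
a directory that is still a mountpoint remains "managed" but no longer incurs
a d_manage() call on each transit.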

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automounter point without
risking a deadlock, as almost no locks are held in d_manage() and none in
d_automount().

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Showing 15 changed files with 126 additions and 50 deletions

Documentation/filesystems/Locking
... ... @@ -20,6 +20,7 @@
20 20 void (*d_iput)(struct dentry *, struct inode *);
21 21 char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
22 22 struct vfsmount *(*d_automount)(struct path *path);
  23 + int (*d_manage)(struct dentry *, bool);
23 24  
24 25 locking rules:
25 26 rename_lock ->d_lock may block rcu-walk
... ... @@ -31,6 +32,7 @@
31 32 d_iput: no no yes no
32 33 d_dname: no no no no
33 34 d_automount: no no yes no
  35 +d_manage: no no yes no
34 36  
35 37 --------------------------- inode_operations ---------------------------
36 38 prototypes:
Documentation/filesystems/vfs.txt
... ... @@ -865,6 +865,7 @@
865 865 void (*d_iput)(struct dentry *, struct inode *);
866 866 char *(*d_dname)(struct dentry *, char *, int);
867 867 struct vfsmount *(*d_automount)(struct path *);
  868 + int (*d_manage)(struct dentry *, bool);
868 869 };
869 870  
870 871 d_revalidate: called when the VFS needs to revalidate a dentry. This
871 872  
... ... @@ -938,11 +939,29 @@
938 939 target and the parent VFS mount record to provide inheritable mount
939 940 parameters. NULL should be returned if someone else managed to make
940 941 the automount first. If the automount failed, then an error code
941   - should be returned.
  942 + should be returned. If -EISDIR is returned, then the directory will
  943 + be treated as an ordinary directory and returned to pathwalk to
  944 + continue walking.
942 945  
943 946 This function is only used if DCACHE_NEED_AUTOMOUNT is set on the
944 947 dentry. This is set by __d_instantiate() if S_AUTOMOUNT is set on the
945 948 inode being added.
  949 +
  950 + d_manage: called to allow the filesystem to manage the transition from a
  951 + dentry (optional). This allows autofs, for example, to hold up clients
  952 + waiting to explore behind a 'mountpoint' whilst letting the daemon go
  953 + past and construct the subtree there. 0 should be returned to let the
  954 + calling process continue. -EISDIR can be returned to tell pathwalk to
  955 + use this directory as an ordinary directory and to ignore anything
  956 + mounted on it and not to check the automount flag. Any other error
  957 + code will abort pathwalk completely.
  958 +
  959 + If the 'mounting_here' parameter is true, then namespace_sem is being
  960 + held by the caller and the function should not initiate any mounts or
  961 + unmounts that it will then wait for.
  962 +
  963 + This function is only used if DCACHE_MANAGE_TRANSIT is set on the
  964 + dentry being transited from.
946 965  
947 966 Example :
948 967  
drivers/staging/autofs/dirhash.c
... ... @@ -88,14 +88,13 @@
88 88 }
89 89 path.mnt = mnt;
90 90 path_get(&path);
91   - if (!follow_down(&path)) {
  91 + if (!follow_down_one(&path)) {
92 92 path_put(&path);
93 93 DPRINTK(("autofs: not expirable\
94 94 (not a mounted directory): %s\n", ent->name));
95 95 continue;
96 96 }
97   - while (d_mountpoint(path.dentry) && follow_down(&path))
98   - ;
  97 + follow_down(&path, false); // TODO: need to check error
99 98 umount_ok = may_umount(path.mnt);
100 99 path_put(&path);
101 100  
... ... @@ -273,10 +273,7 @@
273 273 break;
274 274 case -EBUSY:
275 275 /* someone else made a mount here whilst we were busy */
276   - while (d_mountpoint(nd->path.dentry) &&
277   - follow_down(&nd->path))
278   - ;
279   - err = 0;
  276 + err = follow_down(&nd->path, false);
280 277 default:
281 278 mntput(newmnt);
282 279 break;
fs/autofs4/autofs_i.h
... ... @@ -229,19 +229,6 @@
229 229 int autofs4_wait_release(struct autofs_sb_info *,autofs_wqt_t,int);
230 230 void autofs4_catatonic_mode(struct autofs_sb_info *);
231 231  
232   -static inline int autofs4_follow_mount(struct path *path)
233   -{
234   - int res = 0;
235   -
236   - while (d_mountpoint(path->dentry)) {
237   - int followed = follow_down(path);
238   - if (!followed)
239   - break;
240   - res = 1;
241   - }
242   - return res;
243   -}
244   -
245 232 static inline u32 autofs4_get_dev(struct autofs_sb_info *sbi)
246 233 {
247 234 return new_encode_dev(sbi->sb->s_dev);
fs/autofs4/dev-ioctl.c
... ... @@ -551,7 +551,7 @@
551 551  
552 552 err = have_submounts(path.dentry);
553 553  
554   - if (follow_down(&path))
  554 + if (follow_down_one(&path))
555 555 magic = path.mnt->mnt_sb->s_magic;
556 556 }
557 557  
... ... @@ -56,7 +56,7 @@
56 56  
57 57 path_get(&path);
58 58  
59   - if (!follow_down(&path))
  59 + if (!follow_down_one(&path))
60 60 goto done;
61 61  
62 62 if (is_autofs4_dentry(path.dentry)) {
... ... @@ -234,7 +234,7 @@
234 234 nd->flags);
235 235 /*
236 236 * For an expire of a covered direct or offset mount we need
237   - * to break out of follow_down() at the autofs mount trigger
  237 + * to break out of follow_down_one() at the autofs mount trigger
238 238 * (d_mounted--), so we can see the expiring flag, and manage
239 239 * the blocking and following here until the expire is completed.
240 240 */
... ... @@ -243,7 +243,7 @@
243 243 if (ino->flags & AUTOFS_INF_EXPIRING) {
244 244 spin_unlock(&sbi->fs_lock);
245 245 /* Follow down to our covering mount. */
246   - if (!follow_down(&nd->path))
  246 + if (!follow_down_one(&nd->path))
247 247 goto done;
248 248 goto follow;
249 249 }
250 250  
... ... @@ -292,11 +292,10 @@
292 292 * multi-mount with no root offset so we don't need
293 293 * to follow it.
294 294 */
295   - if (d_mountpoint(dentry)) {
296   - if (!autofs4_follow_mount(&nd->path)) {
297   - status = -ENOENT;
  295 + if (d_managed(dentry)) {
  296 + status = follow_down(&nd->path, false);
  297 + if (status < 0)
298 298 goto out_error;
299   - }
300 299 }
301 300  
302 301 done:
fs/cifs/cifs_dfs_ref.c
... ... @@ -273,10 +273,7 @@
273 273 break;
274 274 case -EBUSY:
275 275 /* someone else made a mount here whilst we were busy */
276   - while (d_mountpoint(nd->path.dentry) &&
277   - follow_down(&nd->path))
278   - ;
279   - err = 0;
  276 + err = follow_down(&nd->path, false);
280 277 default:
281 278 mntput(newmnt);
282 279 break;
... ... @@ -960,6 +960,7 @@
960 960  
961 961 /*
962 962 * Handle a dentry that is managed in some way.
  963 + * - Flagged for transit management (autofs)
963 964 * - Flagged as mountpoint
964 965 * - Flagged as automount point
965 966 *
... ... @@ -979,6 +980,16 @@
979 980 while (managed = ACCESS_ONCE(path->dentry->d_flags),
980 981 managed &= DCACHE_MANAGED_DENTRY,
981 982 unlikely(managed != 0)) {
  983 + /* Allow the filesystem to manage the transit without i_mutex
  984 + * being held. */
  985 + if (managed & DCACHE_MANAGE_TRANSIT) {
  986 + BUG_ON(!path->dentry->d_op);
  987 + BUG_ON(!path->dentry->d_op->d_manage);
  988 + ret = path->dentry->d_op->d_manage(path->dentry, false);
  989 + if (ret < 0)
  990 + return ret == -EISDIR ? 0 : ret;
  991 + }
  992 +
982 993 /* Transit to a mounted filesystem. */
983 994 if (managed & DCACHE_MOUNTED) {
984 995 struct vfsmount *mounted = lookup_mnt(path);
... ... @@ -1012,7 +1023,7 @@
1012 1023 return 0;
1013 1024 }
1014 1025  
1015   -int follow_down(struct path *path)
  1026 +int follow_down_one(struct path *path)
1016 1027 {
1017 1028 struct vfsmount *mounted;
1018 1029  
1019 1030  
1020 1031  
... ... @@ -1029,14 +1040,19 @@
1029 1040  
1030 1041 /*
1031 1042 * Skip to top of mountpoint pile in rcuwalk mode. We abort the rcu-walk if we
1032   - * meet an automount point and we're not walking to "..". True is returned to
  1043 + * meet a managed dentry and we're not walking to "..". True is returned to
1033 1044 * continue, false to abort.
1034 1045 */
1035 1046 static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
1036 1047 struct inode **inode, bool reverse_transit)
1037 1048 {
  1049 + unsigned abort_mask =
  1050 + reverse_transit ? 0 : DCACHE_MANAGE_TRANSIT;
  1051 +
1038 1052 while (d_mountpoint(path->dentry)) {
1039 1053 struct vfsmount *mounted;
  1054 + if (path->dentry->d_flags & abort_mask)
  1055 + return true;
1040 1056 mounted = __lookup_mnt(path->mnt, path->dentry, 1);
1041 1057 if (!mounted)
1042 1058 break;
... ... @@ -1087,6 +1103,57 @@
1087 1103 }
1088 1104  
1089 1105 /*
  1106 + * Follow down to the covering mount currently visible to userspace. At each
  1107 + * point, the filesystem owning that dentry may be queried as to whether the
  1108 + * caller is permitted to proceed or not.
  1109 + *
  1110 + * Care must be taken as namespace_sem may be held (indicated by mounting_here
  1111 + * being true).
  1112 + */
  1113 +int follow_down(struct path *path, bool mounting_here)
  1114 +{
  1115 + unsigned managed;
  1116 + int ret;
  1117 +
  1118 + while (managed = ACCESS_ONCE(path->dentry->d_flags),
  1119 + unlikely(managed & DCACHE_MANAGED_DENTRY)) {
  1120 + /* Allow the filesystem to manage the transit without i_mutex
  1121 + * being held.
  1122 + *
  1123 + * We indicate to the filesystem if someone is trying to mount
  1124 + * something here. This gives autofs the chance to deny anyone
  1125 + * other than its daemon the right to mount on its
  1126 + * superstructure.
  1127 + *
  1128 + * The filesystem may sleep at this point.
  1129 + */
  1130 + if (managed & DCACHE_MANAGE_TRANSIT) {
  1131 + BUG_ON(!path->dentry->d_op);
  1132 + BUG_ON(!path->dentry->d_op->d_manage);
  1133 + ret = path->dentry->d_op->d_manage(path->dentry, mounting_here);
  1134 + if (ret < 0)
  1135 + return ret == -EISDIR ? 0 : ret;
  1136 + }
  1137 +
  1138 + /* Transit to a mounted filesystem. */
  1139 + if (managed & DCACHE_MOUNTED) {
  1140 + struct vfsmount *mounted = lookup_mnt(path);
  1141 + if (!mounted)
  1142 + break;
  1143 + dput(path->dentry);
  1144 + mntput(path->mnt);
  1145 + path->mnt = mounted;
  1146 + path->dentry = dget(mounted->mnt_root);
  1147 + continue;
  1148 + }
  1149 +
  1150 + /* Don't handle automount points here */
  1151 + break;
  1152 + }
  1153 + return 0;
  1154 +}
  1155 +
  1156 +/*
1090 1157 * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
1091 1158 */
1092 1159 static void follow_mount(struct path *path)
... ... @@ -3530,6 +3597,7 @@
3530 3597 };
3531 3598  
3532 3599 EXPORT_SYMBOL(user_path_at);
  3600 +EXPORT_SYMBOL(follow_down_one);
3533 3601 EXPORT_SYMBOL(follow_down);
3534 3602 EXPORT_SYMBOL(follow_up);
3535 3603 EXPORT_SYMBOL(get_write_access); /* binfmt_aout */
... ... @@ -1844,9 +1844,10 @@
1844 1844 return err;
1845 1845  
1846 1846 down_write(&namespace_sem);
1847   - while (d_mountpoint(path->dentry) &&
1848   - follow_down(path))
1849   - ;
  1847 + err = follow_down(path, true);
  1848 + if (err < 0)
  1849 + goto out;
  1850 +
1850 1851 err = -EINVAL;
1851 1852 if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
1852 1853 goto out;
... ... @@ -1940,9 +1941,10 @@
1940 1941  
1941 1942 down_write(&namespace_sem);
1942 1943 /* Something was mounted here while we slept */
1943   - while (d_mountpoint(path->dentry) &&
1944   - follow_down(path))
1945   - ;
  1944 + err = follow_down(path, true);
  1945 + if (err < 0)
  1946 + goto unlock;
  1947 +
1946 1948 err = -EINVAL;
1947 1949 if (!(mnt_flags & MNT_SHRINKABLE) && !check_mnt(path->mnt))
1948 1950 goto unlock;
... ... @@ -176,10 +176,7 @@
176 176 path_put(&nd->path);
177 177 goto out;
178 178 out_follow:
179   - while (d_mountpoint(nd->path.dentry) &&
180   - follow_down(&nd->path))
181   - ;
182   - err = 0;
  179 + err = follow_down(&nd->path, false);
183 180 goto out;
184 181 }
185 182  
... ... @@ -88,8 +88,9 @@
88 88 .dentry = dget(dentry)};
89 89 int err = 0;
90 90  
91   - while (d_mountpoint(path.dentry) && follow_down(&path))
92   - ;
  91 + err = follow_down(&path, false);
  92 + if (err < 0)
  93 + goto out;
93 94  
94 95 exp2 = rqst_exp_get_by_name(rqstp, &path);
95 96 if (IS_ERR(exp2)) {
include/linux/dcache.h
... ... @@ -168,6 +168,7 @@
168 168 void (*d_iput)(struct dentry *, struct inode *);
169 169 char *(*d_dname)(struct dentry *, char *, int);
170 170 struct vfsmount *(*d_automount)(struct path *);
  171 + int (*d_manage)(struct dentry *, bool);
171 172 } ____cacheline_aligned;
172 173  
173 174 /*
174 175  
... ... @@ -214,8 +215,9 @@
214 215  
215 216 #define DCACHE_MOUNTED 0x10000 /* is a mountpoint */
216 217 #define DCACHE_NEED_AUTOMOUNT 0x20000 /* handle automount on this dir */
  218 +#define DCACHE_MANAGE_TRANSIT 0x40000 /* manage transit from this dirent */
217 219 #define DCACHE_MANAGED_DENTRY \
218   - (DCACHE_MOUNTED|DCACHE_NEED_AUTOMOUNT)
  220 + (DCACHE_MOUNTED|DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT)
219 221  
220 222 extern seqlock_t rename_lock;
221 223  
... ... @@ -404,7 +406,12 @@
404 406  
405 407 extern void dput(struct dentry *);
406 408  
407   -static inline int d_mountpoint(struct dentry *dentry)
  409 +static inline bool d_managed(struct dentry *dentry)
  410 +{
  411 + return dentry->d_flags & DCACHE_MANAGED_DENTRY;
  412 +}
  413 +
  414 +static inline bool d_mountpoint(struct dentry *dentry)
408 415 {
409 416 return dentry->d_flags & DCACHE_MOUNTED;
410 417 }
include/linux/namei.h
... ... @@ -79,7 +79,8 @@
79 79  
80 80 extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
81 81  
82   -extern int follow_down(struct path *);
  82 +extern int follow_down_one(struct path *);
  83 +extern int follow_down(struct path *, bool);
83 84 extern int follow_up(struct path *);
84 85  
85 86 extern struct dentry *lock_rename(struct dentry *, struct dentry *);