Eric Lee / smarc-fsl-linux-kernel

13 Apr, 2020

3 commits

40fc81027 afs: Fix afs_d_validate() to set the right directory version ... Browse Code »

If a dentry's version is somewhere between invalid_before and the current
directory version, we should be setting it forward to the current version,
not backwards to the invalid_before version. Note that we're only doing
this at all because dentry::d_fsdata isn't large enough on a 32-bit system.

Fix this by using a separate variable for invalid_before so that we don't
accidentally clobber the current dir version.

Fixes: a4ff7401fbfa ("afs: Keep track of invalid-before version for dentry coherency")
Signed-off-by: David Howells

David Howells
2020-04-13 22:09:01 +0800
2105c2820 afs: Fix race between post-modification dir edit and readdir/d_revalidate ... Browse Code »

AFS directories are retained locally as a structured file, with lookup
being effected by a local search of the file contents. When a modification
(such as mkdir) happens, the dir file content is modified locally rather
than redownloading the directory.

The directory contents are accessed in a number of ways, with a number of
different locks schemes:

(1) Download of contents - dvnode->validate_lock/write in afs_read_dir().

(2) Lookup and readdir - dvnode->validate_lock/read in afs_dir_iterate(),
downgrading from (1) if necessary.

(3) d_revalidate of child dentry - dvnode->validate_lock/read in
afs_do_lookup_one() downgrading from (1) if necessary.

(4) Edit of dir after modification - page locks on individual dir pages.

Unfortunately, because (4) uses different locking scheme to (1) - (3),
nothing protects against the page being scanned whilst the edit is
underway. Even download is not safe as it doesn't lock the pages - relying
instead on the validate_lock to serialise as a whole (the theory being that
directory contents are treated as a block and always downloaded as a
block).

Fix this by write-locking dvnode->validate_lock around the edits. Care
must be taken in the rename case as there may be two different dirs - but
they need not be locked at the same time. In any case, once the lock is
taken, the directory version must be rechecked, and the edit skipped if a
later version has been downloaded by revalidation (there can't have been
any local changes because the VFS holds the inode lock, but there can have
been remote changes).

Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...")
Signed-off-by: David Howells

David Howells
2020-04-13 22:09:01 +0800
b98f0ec91 afs: Fix rename operation status delivery ... Browse Code »

The afs_deliver_fs_rename() and yfs_deliver_fs_rename() functions both only
decode the second file status returned unless the parent directories are
different - unfortunately, this means that the xdr pointer isn't advanced
and the volsync record will be read incorrectly in such an instance.

Fix this by always decoding the second status into the second
status/callback block which wasn't being used if the dirs were the same.

The afs_update_dentry_version() calls that update the directory data
version numbers on the dentries can then unconditionally use the second
status record as this will always reflect the state of the destination dir
(the two records will be identical if the destination dir is the same as
the source dir)

Fixes: 260a980317da ("[AFS]: Add "directory write" support.")
Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells

David Howells
2020-04-13 22:09:01 +0800

15 Jan, 2020

2 commits

f52b83b0b afs: Fix afs_lookup() to not clobber the version on a new dentry ... Browse Code »

Fix afs_lookup() to not clobber the version set on a new dentry by
afs_do_lookup() - especially as it's using the wrong version of the
version (we need to use the one given to us by whatever op the dir
contents correspond to rather than what's in the afs_vnode).

Fixes: 9dd0b82ef530 ("afs: Fix missing dentry data version updating")
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2020-01-15 01:40:06 +0800
40a708bd6 afs: Fix use-after-loss-of-ref ... Browse Code »

afs_lookup() has a tracepoint to indicate the outcome of
d_splice_alias(), passing it the inode to retrieve the fid from.
However, the function gave up its ref on that inode when it called
d_splice_alias(), which may have failed and dropped the inode.

Fix this by caching the fid.

Fixes: 80548b03991f ("afs: Add more tracepoints")
Reported-by: Al Viro
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2020-01-15 01:40:06 +0800

16 Nov, 2019

1 commit

a28f239e2 afs: Fix race in commit bulk status fetch ... Browse Code »

When a lookup is done, the afs filesystem will perform a bulk status-fetch
operation on the requested vnode (file) plus the next 49 other vnodes from
the directory list (in AFS, directory contents are downloaded as blobs and
parsed locally). When the results are received, it will speculatively
populate the inode cache from the extra data.

However, if the lookup races with another lookup on the same directory, but
for a different file - one that's in the 49 extra fetches, then if the bulk
status-fetch operation finishes first, it will try and update the inode
from the other lookup.

If this other inode is still in the throes of being created, however, this
will cause an assertion failure in afs_apply_status():

BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags));

on or about fs/afs/inode.c:175 because it expects data to be there already
that it can compare to.

Fix this by skipping the update if the inode is being created as the
creator will presumably set up the inode with the same information.

Fixes: 39db9815da48 ("afs: Fix application of the results of a inline bulk status fetch")
Signed-off-by: David Howells
Reviewed-by: Marc Dionne
Signed-off-by: Linus Torvalds

David Howells
2019-11-16 02:28:02 +0800

02 Sep, 2019

1 commit

a0753c290 afs: Support RCU pathwalk ... Browse Code »

Make afs_permission() and afs_d_revalidate() do initial checks in RCU-mode
pathwalk to reduce latency in pathwalk elements that get done multiple
times. We don't need to query the server unless we've received a
notification from it that something has changed or the callback has
expired.

This requires that we can request a key and check permits under RCU
conditions if we need to.

Signed-off-by: David Howells

David Howells
2019-09-02 18:43:54 +0800

22 Aug, 2019

1 commit

c4c613ff0 afs: Fix possible oops in afs_lookup trace event ... Browse Code »

The afs_lookup trace event can cause the following:

[ 216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b
[ 216.576803] #PF: supervisor read access in kernel mode
[ 216.576813] #PF: error_code(0x0000) - not-present page
...
[ 216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs]

If the inode from afs_do_lookup() is an error other than ENOENT, or if it
is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will
try to dereference the error pointer as a valid pointer.

Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL.

Ideally the trace would include the error value, but for now just avoid
the oops.

Fixes: 80548b03991f ("afs: Add more tracepoints")
Signed-off-by: Marc Dionne
Signed-off-by: David Howells

Marc Dionne
2019-08-22 20:33:26 +0800

30 Jul, 2019

3 commits

9dd0b82ef afs: Fix missing dentry data version updating ... Browse Code »

In the in-kernel afs filesystem, the d_fsdata dentry field is used to hold
the data version of the parent directory when it was created or when
d_revalidate() last caused it to be updated. This is compared to the
->invalid_before field in the directory inode, rather than the actual data
version number, thereby allowing changes due to local edits to be ignored.
Only if the server data version gets bumped unexpectedly (eg. by a
competing client), do we need to revalidate stuff.

However, the d_fsdata field should also be updated if an rpc op is
performed that modifies that particular dentry. Such ops return the
revised data version of the directory(ies) involved, so we should use that.

This is particularly problematic for rename, since a dentry from one
directory may be moved directly into another directory (ie. mv a/x b/x).
It would then be sporting the wrong data version - and if this is in the
future, for the destination directory, revalidations would be missed,
leading to foreign renames and hard-link deletion being missed.

Fix this by the following means:

(1) Return the data version number from operations that read the directory
contents - if they issue the read. This starts in afs_dir_iterate()
and is used, ignored or passed back by its callers.

(2) In afs_lookup*(), set the dentry version to the version returned by
(1) before d_splice_alias() is called and the dentry published.

(3) In afs_d_revalidate(), set the dentry version to that returned from
(1) if an rpc call was issued. This means that if a parallel
procedure, such as mkdir(), modifies the directory, we won't
accidentally use the data version from that.

(4) In afs_{mkdir,create,link,symlink}(), set the new dentry's version to
the directory data version before d_instantiate() is called.

(5) In afs_{rmdir,unlink}, update the target dentry's version to the
directory data version as soon as we've updated the directory inode.

(6) In afs_rename(), we need to unhash the old dentry before we start so
that we don't get afs_d_revalidate() reverting the version change in
cross-directory renames.

We then need to set both the old and the new dentry versions the data
version of the new directory before we call d_move() as d_move() will
rehash them.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells

David Howells
2019-07-30 21:38:52 +0800
5dc84855b afs: Only update d_fsdata if different in afs_d_revalidate() ... Browse Code »

In the in-kernel afs filesystem, d_fsdata is set with the data version of
the parent directory. afs_d_revalidate() will update this to the current
directory version, but it shouldn't do this if it the value it read from
d_fsdata is the same as no lock is held and cmpxchg() is not used.

Fix the code to only change the value if it is different from the current
directory version.

Fixes: 260a980317da ("[AFS]: Add "directory write" support.")
Signed-off-by: David Howells

David Howells
2019-07-30 21:38:51 +0800
37c0bbb33 afs: Fix off-by-one in afs_rename() expected data version calculation ... Browse Code »

When afs_rename() calculates the expected data version of the target
directory in a cross-directory rename, it doesn't increment it as it
should, so it always thinks that the target inode is unexpectedly modified
on the server.

Fixes: a58823ac4589 ("afs: Fix application of status and callback to be under same lock")
Signed-off-by: David Howells

David Howells
2019-07-30 21:38:51 +0800

11 Jul, 2019

1 commit

8dda9957e Merge tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs ... Browse Code »

Pull afs updates from David Howells:
"A set of minor changes for AFS:

- Remove an unnecessary check in afs_unlink()

- Add a tracepoint for tracking callback management

- Add a tracepoint for afs_server object usage

- Use struct_size()

- Add mappings for AFS UAE abort codes to Linux error codes, using
symbolic names rather than hex numbers in the .c file"

* tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Add support for the UAE error table
fs/afs: use struct_size() in kzalloc()
afs: Trace afs_server usage
afs: Add some callback management tracepoints
afs: afs_unlink() doesn't need to check dentry->d_inode

Linus Torvalds
2019-07-11 11:55:33 +0800

21 Jun, 2019

3 commits

ee102584e fs/afs: use struct_size() in kzalloc() ... Browse Code »

As Gustavo said in other patches doing the same replace, we can now
use the new struct_size() helper to avoid leaving these open-coded and
prone to type mistake.

Signed-off-by: Zhengyuan Liu
Signed-off-by: David Howells

Zhengyuan Liu
2019-06-21 01:12:17 +0800
051d25250 afs: Add some callback management tracepoints ... Browse Code »

Add a couple of tracepoints to track callback management:

(1) afs_cb_miss - Logs when we were unable to apply a callback, either due
to the inode being discarded or due to a competing thread applying a
callback first.

(2) afs_cb_break - Logs when we attempted to clear the noted callback
promise, either due to the server explicitly breaking the callback,
the callback promise lapsing or a local event obsoleting it.

Signed-off-by: David Howells

David Howells
2019-06-21 01:12:16 +0800
fa59f52f5 afs: afs_unlink() doesn't need to check dentry->d_inode ... Browse Code »

Don't check that dentry->d_inode is valid in afs_unlink(). We should be
able to take that as given.

This caused Smatch to issue the following warning:

fs/afs/dir.c:1392 afs_unlink() error: we previously assumed 'vnode' could be null (see line 1375)

Reported-by: kbuild test robot
Reported-by: Dan Carpenter
Signed-off-by: David Howells

David Howells
2019-06-21 01:12:16 +0800

31 May, 2019

1 commit

2874c5fd2 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 ... Browse Code »

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-31 02:26:32 +0800

17 May, 2019

4 commits

39db9815d afs: Fix application of the results of a inline bulk status fetch ... Browse Code »

Fix afs_do_lookup() such that when it does an inline bulk status fetch op,
it will update inodes that are already extant (something that afs_iget()
doesn't do) and to cache permits for each inode created (thereby avoiding a
follow up FS.FetchStatus call to determine this).

Extant inodes need looking up in advance so that their cb_break counters
before and after the operation can be compared. To this end, the inode
pointers are cached so that they don't need looking up again after the op.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800
b83591532 afs: Pass pre-fetch server and volume break counts into afs_iget5_set() ... Browse Code »

Pass the server and volume break counts from before the status fetch
operation that queried the attributes of a file into afs_iget5_set() so
that the new vnode's break counters can be initialised appropriately.

This allows detection of a volume or server break that happened whilst we
were fetching the status or setting up the vnode.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800
a38a75581 afs: Fix unlink to handle YFS.RemoveFile2 better ... Browse Code »

Make use of the status update for the target file that the YFS.RemoveFile2
RPC op returns to correctly update the vnode as to whether the file was
actually deleted or just had nlink reduced.

Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800
f642404a0 afs: Make vnode->cb_interest RCU safe ... Browse Code »

Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
vnode->cb_interest. Use that change to allow afs_check_validity() to use
read_seqbegin_or_lock() instead of read_seqlock_excl().

This also requires the caller of afs_check_validity() to hold the RCU read
lock across the call.

Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800

16 May, 2019

3 commits

a58823ac4 afs: Fix application of status and callback to be under same lock ... Browse Code »

When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).

Fix this by:

(1) Allocating a joint {status,callback} record (afs_status_cb) before
calling the RPC function for each vnode for which the RPC reply
contains a status or a status plus a callback. A flag is set in the
record to indicate if a callback was actually received.

(2) These records are passed into the RPC functions to be filled in. The
afs_decode_status() and yfs_decode_status() functions are removed and
the cb_lock is no longer taken.

(3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
update the vnode.

(4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
the vnode.

(5) vnodes, expected data-version numbers and callback break counters
(cb_break) no longer need to be passed to the reply delivery
functions.

Note that, for the moment, the file locking functions still need
access to both the call and the vnode at the same time.

(6) afs_vnode_commit_status() is now given the cb_break value and the
expected data_version and the task of applying the status and the
callback to the vnode are now done here.

This is done under a single taking of vnode->cb_lock.

(7) afs_pages_written_back() is now called by afs_store_data() rather than
by the reply delivery function.

afs_pages_written_back() has been moved to before the call point and
is now given the first and last page numbers rather than a pointer to
the call.

(8) The indicator from YFS.RemoveFile2 as to whether the target file
actually got removed (status.abort_code == VNOVNODE) rather than
merely dropping a link is now checked in afs_unlink rather than in
xdr_decode_YFSFetchStatus().

Supplementary fixes:

(*) afs_cache_permit() now gets the caller_access mask from the
afs_status_cb object rather than picking it out of the vnode's status
record. afs_fetch_status() returns caller_access through its argument
list for this purpose also.

(*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
than a read lock and now sets the callback inside the same critical
section.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells

David Howells
2019-05-16 23:25:21 +0800
87182759c afs: Fix order-1 allocation in afs_do_lookup() ... Browse Code »

afs_do_lookup() will do an order-1 allocation to allocate status records if
there are more than 39 vnodes to stat.

Fix this by allocating an array of {status,callback} records for each vnode
we want to examine using vmalloc() if larger than a page.

This not only gets rid of the order-1 allocation, but makes it easier to
grow beyond 50 records for YFS servers. It also allows us to move to
{status,callback} tuples for other calls too and makes it easier to lock
across the application of the status and the callback to the vnode.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells

David Howells
2019-05-16 23:25:21 +0800
20b8391ff afs: Make some RPC operations non-interruptible ... Browse Code »

Make certain RPC operations non-interruptible, including:

(*) Set attributes
(*) Store data

We don't want to get interrupted during a flush on close, flush on
unlock, writeback or an inode update, leaving us in a state where we
still need to do the writeback or update.

(*) Extend lock
(*) Release lock

We don't want to get lock extension interrupted as the file locks on
the server are time-limited. Interruption during lock release is less
of an issue since the lock is time-limited, but it's better to
complete the release to avoid a several-minute wait to recover it.

*Setting* the lock isn't a problem if it's interrupted since we can
just return to the user and tell them they were interrupted - at
which point they can elect to retry.

(*) Silly unlink

We want to remove silly unlink files if we can, rather than leaving
them for the salvager to clear up.

Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.

Without this, the following:

kAFS: Unexpected error from FS.StoreData -512

appears in dmesg when a pending store data gets interrupted and some
processes may just hang.

Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list. The next op will check it again since the
expiration time on the old list has past.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings
Reported-by: Marc Dionne
Signed-off-by: David Howells

David Howells
2019-05-16 23:25:20 +0800

07 May, 2019

1 commit

b134d687d afs: Log more information for "kAFS: AFS vnode with undefined type\n" ... Browse Code »

Log more information when "kAFS: AFS vnode with undefined type\n" is
displayed due to a vnode record being retrieved from the server that
appears to have a duff file type (usually 0). This prints more information
to try and help pin down the problem.

Signed-off-by: David Howells

David Howells
2019-05-07 23:48:44 +0800

25 Apr, 2019

4 commits

80548b039 afs: Add more tracepoints ... Browse Code »

Add four more tracepoints:

(1) afs_make_fs_call1 - Split from afs_make_fs_call but takes a filename
to log also.

(2) afs_make_fs_call2 - Like the above but takes two filenames to log.

(3) afs_lookup - Log the result of doing a successful lookup, including a
negative result (fid 0:0).

(4) afs_get_tree - Log the set up of a volume for mounting.

It also extends the name buffer on the afs_edit_dir tracepoint to 24 chars
and puts quotes around the filename in the text representation.

Signed-off-by: David Howells

David Howells
2019-04-25 21:26:51 +0800
79ddbfa50 afs: Implement sillyrename for unlink and rename ... Browse Code »

Implement sillyrename for AFS unlink and rename, using the NFS variant
implementation as a basis.

Note that the asynchronous file locking extender/releaser has to be
notified with a state change to stop it complaining if there's a race
between that and the actual file deletion.

A tracepoint, afs_silly_rename, is also added to note the silly rename and
the cleanup. The afs_edit_dir tracepoint is given some extra reason
indicators and the afs_flock_ev tracepoint is given a silly-delete file
lock cancellation indicator.

Signed-off-by: David Howells

David Howells
2019-04-25 21:26:51 +0800
99987c560 afs: Add directory reload tracepoint ... Browse Code »

Add a tracepoint (afs_reload_dir) to indicate when a directory is being
reloaded.

Signed-off-by: David Howells

David Howells
2019-04-25 21:26:51 +0800
445b10289 afs: Improve dir check failure reports ... Browse Code »

Improve the content of directory check failure reports from:

kAFS: afs_dir_check_page(6d57): bad magic 1/2 is 0000

to dump more information about the individual blocks in a directory page.

Signed-off-by: David Howells

David Howells
2019-04-25 21:26:51 +0800

30 Nov, 2018

1 commit

73116df7b afs: Use d_instantiate() rather than d_add() and don't d_drop() ... Browse Code »

Use d_instantiate() rather than d_add() and don't d_drop() in
afs_vnode_new_inode(). The dentry shouldn't be removed as it's not
changing its name.

Reported-by: Al Viro
Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2018-11-30 10:08:14 +0800

24 Oct, 2018

5 commits

30062bd13 afs: Implement YFS support in the fs client ... Browse Code »

Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client. These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.

Signed-off-by: David Howells

David Howells
2018-10-24 07:41:08 +0800
f58db83fd afs: Get the target vnode in afs_rmdir() and get a callback on it ... Browse Code »

Get the target vnode in afs_rmdir() and validate it before we attempt the
deletion, The vnode pointer will be passed through to the delivery function
in a later patch so that the delivery function can mark it deleted.

Signed-off-by: David Howells

David Howells
2018-10-24 07:41:08 +0800
006719120 afs: Commit the status on a new file/dir/symlink ... Browse Code »

Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.

Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.

Signed-off-by: David Howells

David Howells
2018-10-24 07:41:08 +0800
3b6492df4 afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS ... Browse Code »

Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious. It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.

Signed-off-by: David Howells

David Howells
2018-10-24 07:41:08 +0800
f51375cd9 afs: Add a couple of tracepoints to log I/O errors ... Browse Code »

Add a couple of tracepoints to log the production of I/O errors within the AFS
filesystem.

Signed-off-by: David Howells

David Howells
2018-10-24 07:41:07 +0800

06 Aug, 2018

2 commits

1401a0fc2 afs_try_auto_mntpt(): return NULL instead of ERR_PTR(-ENOENT) ... Browse Code »

simpler logics in callers that way

Acked-by: David Howells
Signed-off-by: Al Viro

Al Viro
2018-08-06 03:50:59 +0800
34b2a88fb afs_lookup(): switch to d_splice_alias() ... Browse Code »

->lookup() methods can (and should) use d_splice_alias() instead of
d_add(). Even if they are not going to be hit by open_by_handle(),
code does get copied around; besides, d_splice_alias() has better
calling conventions for use in ->lookup(), so the code gets simpler.

Acked-by: David Howells
Signed-off-by: Al Viro

Al Viro
2018-08-06 03:44:14 +0800

14 May, 2018

2 commits

68251f0a6 afs: Fix whole-volume callback handling ... Browse Code »

It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken. This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells

David Howells
2018-05-14 22:15:18 +0800
b61f7dcf4 afs: Fix directory page locking ... Browse Code »

The afs directory loading code (primarily afs_read_dir()) locks all the
pages that hold a directory's content blob to defend against
getdents/getdents races and getdents/lookup races where the competitors
issue conflicting reads on the same data. As the reads will complete
consecutively, they may retrieve different versions of the data and
one may overwrite the data that the other is busy parsing.

Fix this by not locking the pages at all, but rather by turning the
validation lock into an rwsem and getting an exclusive lock on it whilst
reading the data or validating the attributes and a shared lock whilst
parsing the data. Sharing the attribute validation lock should be fine as
the data fetch will retrieve the attributes also.

The individual page locks aren't needed at all as the only place they're
being used is to serialise data loading.

Without this patch, the:

if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
...
}

part of afs_read_dir() may be skipped, leaving the pages unlocked when we
hit the success: clause - in which case we try to unlock the not-locked
pages, leading to the following oops:

page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
flags: 0xfffe000001004(referenced|private)
raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:ffff98156b27c000
------------[ cut here ]------------
kernel BUG at mm/filemap.c:1205!
...
RIP: 0010:unlock_page+0x43/0x50
...
Call Trace:
afs_dir_iterate+0x789/0x8f0 [kafs]
? _cond_resched+0x15/0x30
? kmem_cache_alloc_trace+0x166/0x1d0
? afs_do_lookup+0x69/0x490 [kafs]
? afs_do_lookup+0x101/0x490 [kafs]
? key_default_cmp+0x20/0x20
? request_key+0x3c/0x80
? afs_lookup+0xf1/0x340 [kafs]
? __lookup_slow+0x97/0x150
? lookup_slow+0x35/0x50
? walk_component+0x1bf/0x490
? path_lookupat.isra.52+0x75/0x200
? filename_lookup.part.66+0xa0/0x170
? afs_end_vnode_operation+0x41/0x60 [kafs]
? __check_object_size+0x9c/0x171
? strncpy_from_user+0x4a/0x170
? vfs_statx+0x73/0xe0
? __do_sys_newlstat+0x39/0x70
? __x64_sys_getdents+0xc9/0x140
? __x64_sys_getdents+0x140/0x140
? do_syscall_64+0x5b/0x160
? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: f3ddee8dc4e2 ("afs: Fix directory handling")
Reported-by: Marc Dionne
Signed-off-by: David Howells

David Howells
2018-05-14 20:17:35 +0800

10 Apr, 2018

2 commits

5a8132761 afs: Do better accretion of small writes on newly created content ... Browse Code »

Processes like ld that do lots of small writes that aren't necessarily
contiguous result in a lot of small StoreData operations to the server, the
idea being that if someone else changes the data on the server, we only
write our changes over that and not the space between. Further, we don't
want to write back empty space if we can avoid it to make it easier for the
server to do sparse files.

However, making lots of tiny RPC ops is a lot less efficient for the server
than one big one because each op requires allocation of resources and the
taking of locks, so we want to compromise a bit.

Reduce the load by the following:

(1) If a file is just created locally or has just been truncated with
O_TRUNC locally, allow subsequent writes to the file to be merged with
intervening space if that space doesn't cross an entire intervening
page.

(2) Don't flush the file on ->flush() but rather on ->release() if the
file was open for writing.

Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:

file-wr : n=441 nb=513581204

and after the patch:

file-wr : n=62 nb=513668555

there were 379 fewer StoreData RPC operations at the expense of an extra
87K being written.

Signed-off-by: David Howells

David Howells
2018-04-10 04:54:48 +0800
63a4681ff afs: Locally edit directory data for mkdir/create/unlink/... ... Browse Code »

Locally edit the contents of an AFS directory upon a successful inode
operation that modifies that directory (such as mkdir, create and unlink)
so that we can avoid the current practice of re-downloading the directory
after each change.

This is viable provided that the directory version number we get back from
the modifying RPC op is exactly incremented by 1 from what we had
previously. The data in the directory contents is in a defined format that
we have to parse locally to perform lookups and readdir, so modifying isn't
a problem.

If the edit fails, we just clear the VALID flag on the directory and it
will be reloaded next time it is needed.

Signed-off-by: David Howells

David Howells
2018-04-10 04:54:48 +0800