Eric Lee / smarc-fsl-linux-kernel

23 Nov, 2020

1 commit

a9e5c87ca afs: Fix speculative status fetch going out of order wrt to modifications ... Browse Code »

When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.

To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).

When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.

A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes. What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.

This results in something like the following being seen in dmesg:

kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus

showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification. This was causing the cache to be
invalidated for that vnode when it shouldn't have been. If it happens
on a data file, this might lead to local changes being lost.

Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.

Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway. It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2020-11-23 03:27:03 +0800

09 Oct, 2020

1 commit

ec0fa0b65 afs: Fix deadlock between writeback and truncate ... Browse Code »

The afs filesystem has a lock[*] that it uses to serialise I/O operations
going to the server (vnode->io_lock), as the server will only perform one
modification operation at a time on any given file or directory. This
prevents the the filesystem from filling up all the call slots to a server
with calls that aren't going to be executed in parallel anyway, thereby
allowing operations on other files to obtain slots.

[*] Note that is probably redundant for directories at least since
i_rwsem is used to serialise directory modifications and
lookup/reading vs modification. The server does allow parallel
non-modification ops, however.

When a file truncation op completes, we truncate the in-memory copy of the
file to match - but we do it whilst still holding the io_lock, the idea
being to prevent races with other operations.

However, if writeback starts in a worker thread simultaneously with
truncation (whilst notify_change() is called with i_rwsem locked, writeback
pays it no heed), it may manage to set PG_writeback bits on the pages that
will get truncated before afs_setattr_success() manages to call
truncate_pagecache(). Truncate will then wait for those pages - whilst
still inside io_lock:

# cat /proc/8837/stack
[] wait_on_page_bit_common+0x184/0x1e7
[] truncate_inode_pages_range+0x37f/0x3eb
[] truncate_pagecache+0x3c/0x53
[] afs_setattr_success+0x4d/0x6e
[] afs_wait_for_operation+0xd8/0x169
[] afs_do_sync_operation+0x16/0x1f
[] afs_setattr+0x1fb/0x25d
[] notify_change+0x2cf/0x3c4
[] do_truncate+0x7f/0xb2
[] do_sys_ftruncate+0xd1/0x104
[] do_syscall_64+0x2d/0x3a
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The writeback operation, however, stalls indefinitely because it needs to
get the io_lock to proceed:

# cat /proc/5940/stack
[] afs_get_io_locks+0x58/0x1ae
[] afs_begin_vnode_operation+0xc7/0xd1
[] afs_store_data+0x1b2/0x2a3
[] afs_write_back_from_locked_page+0x418/0x57c
[] afs_writepages_region+0x196/0x224
[] afs_writepages+0x74/0x156
[] do_writepages+0x2d/0x56
[] __writeback_single_inode+0x84/0x207
[] writeback_sb_inodes+0x238/0x3cf
[] __writeback_inodes_wb+0x68/0x9f
[] wb_writeback+0x145/0x26c
[] wb_do_writeback+0x16a/0x194
[] wb_workfn+0x74/0x177
[] process_one_work+0x174/0x264
[] worker_thread+0x117/0x1b9
[] kthread+0xec/0xf1
[] ret_from_fork+0x1f/0x30

and thus deadlock has occurred.

Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
that the caller is holding i_rwsem doesn't preclude more pages being
dirtied through an mmap'd region.

Fix this by:

(1) Use the vnode validate_lock to mediate access between afs_setattr()
and afs_writepages():

(a) Exclusively lock validate_lock in afs_setattr() around the whole
RPC operation.

(b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), trying to
shared-lock validate_lock and returning immediately if we couldn't
get it.

(c) If WB_SYNC_ALL is set, wait for the lock.

The validate_lock is also used to validate a file and to zap its cache
if the file was altered by a third party, so it's probably a good fit
for this.

(2) Move the truncation outside of the io_lock in setattr, using the same
hook as is used for local directory editing.

This requires the old i_size to be retained in the operation record as
we commit the revised status to the inode members inside the io_lock
still, but we still need to know if we reduced the file size.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2020-10-09 01:50:55 +0800

17 Jun, 2020

1 commit

b6489a49f afs: Fix silly rename ... Browse Code »

Fix AFS's silly rename by the following means:

(1) Set the destination directory in afs_do_silly_rename() so as to avoid
misbehaviour and indicate that the directory data version will
increment by 1 so as to avoid warnings about unexpected changes in the
DV. Also indicate that the ctime should be updated to avoid xfstest
grumbling.

(2) Note when the server indicates that a directory changed more than we
expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a
third party change, checking on successful completion of unlink and
rename.

The problem is that the FS.RemoveFile RPC op doesn't report the status
of the unlinked file, though YFS.RemoveFile2 does. This can be
mitigated by the assumption that if the directory DV cranked by
exactly 1, we can be sure we removed one link from the file; further,
ordinarily in AFS, files cannot be hardlinked across directories, so
if we reduce nlink to 0, the file is deleted.

However, if the directory DV jumps by more than 1, we cannot know if a
third party intervened by adding or removing a link on the file we
just removed a link from.

The same also goes for any vnode that is at the destination of the
FS.Rename RPC op.

(3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock
section along with the other attribute updates if ->op_unlinked is set
on the descriptor for the appropriate vnode.

(4) Issue a follow up status fetch to the unlinked file in the event of a
third party conflict that makes it impossible for us to know if we
actually deleted the file or not.

(5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to
the user about the nlink of a silly deleted file so that it appears as
0, not 1.

Found with the generic/035 and generic/084 xfstests.

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Reported-by: Marc Dionne
Signed-off-by: David Howells

David Howells
2020-06-17 05:00:28 +0800

16 Jun, 2020

2 commits

7c295eec1 afs: afs_vnode_commit_status() doesn't need to check the RPC error ... Browse Code »

afs_vnode_commit_status() is only ever called if op->error is 0, so remove
the op->error checks from the function.

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells

David Howells
2020-06-16 23:26:57 +0800
728279a5a afs: Fix use of afs_check_for_remote_deletion() ... Browse Code »

afs_check_for_remote_deletion() checks to see if error ENOENT is returned
by the server in response to an operation and, if so, marks the primary
vnode as having been deleted as the FID is no longer valid.

However, it's being called from the operation success functions, where no
abort has happened - and if an inline abort is recorded, it's handled by
afs_vnode_commit_status().

Fix this by actually calling the operation aborted method if provided and
having that point to afs_check_for_remote_deletion().

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells

David Howells
2020-06-16 23:26:57 +0800

15 Jun, 2020

3 commits

793fe82ee afs: Fix truncation issues and mmap writeback size ... Browse Code »

Fix the following issues:

(1) Fix writeback to reduce the size of a store operation to i_size,
effectively discarding the extra data.

The problem comes when afs_page_mkwrite() records that a page is about
to be modified by mmap(). It doesn't know what bits of the page are
going to be modified, so it records the whole page as being dirty
(this is stored in page->private as start and end offsets).

Without this, the marshalling for the store to the server extends the
size of the file to the end of the page (in afs_fs_store_data() and
yfs_fs_store_data()).

(2) Fix setattr to actually truncate the pagecache, thereby clearing
the discarded part of a file.

(3) Fix setattr to check that the new size is okay and to disable
ATTR_SIZE if i_size wouldn't change.

(4) Force i_size to be updated as the result of a truncate.

(5) Don't truncate if ATTR_SIZE is not set.

(6) Call pagecache_isize_extended() if the file was enlarged.

Note that truncate_set_size() isn't used because the setting of i_size is
done inside afs_vnode_commit_status() under the vnode->cb_lock.

Found with the generic/029 and generic/393 xfstests.

Fixes: 31143d5d515e ("AFS: implement basic file write support")
Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells

David Howells
2020-06-15 22:41:02 +0800
da8d07551 afs: Concoct ctimes ... Browse Code »

The in-kernel afs filesystem ignores ctime because the AFS fileserver
protocol doesn't support ctimes. This, however, causes various xfstests to
fail.

Work around this by:

(1) Setting ctime to attr->ia_ctime in afs_setattr().

(2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.

(3) Setting the ctime from the server mtime when on the target file when
creating a hard link to it.

(4) Setting the ctime on directories from their revised mtimes when
renaming/moving a file.

Found by the generic/221 and generic/309 xfstests.

Signed-off-by: David Howells

David Howells
2020-06-15 22:41:02 +0800
3f4aa9818 afs: Fix EOF corruption ... Browse Code »

When doing a partial writeback, afs_write_back_from_locked_page() may
generate an FS.StoreData RPC request that writes out part of a file when a
file has been constructed from pieces by doing seek, write, seek, write,
... as is done by ld.

The FS.StoreData RPC is given the current i_size as the file length, but
the server basically ignores it unless the data length is 0 (in which case
it's just a truncate operation). The revised file length returned in the
result of the RPC may then not reflect what we suggested - and this leads
to i_size getting moved backwards - which causes issues later.

Fix the client to take account of this by ignoring the returned file size
unless the data version number jumped unexpectedly - in which case we're
going to have to clear the pagecache and reload anyway.

This can be observed when doing a kernel build on an AFS mount. The
following pair of commands produce the issue:

ld -m elf_x86_64 -z max-page-size=0x200000 --emit-relocs \
-T arch/x86/realmode/rm/realmode.lds \
arch/x86/realmode/rm/header.o \
arch/x86/realmode/rm/trampoline_64.o \
arch/x86/realmode/rm/stack.o \
arch/x86/realmode/rm/reboot.o \
-o arch/x86/realmode/rm/realmode.elf
arch/x86/tools/relocs --realmode \
arch/x86/realmode/rm/realmode.elf \
>arch/x86/realmode/rm/realmode.relocs

This results in the latter giving:

Cannot read ELF section headers 0/18: Success

as the realmode.elf file got corrupted.

The sequence of events can also be driven with:

xfs_io -t -f \
-c "pwrite -S 0x58 0 0x58" \
-c "pwrite -S 0x59 10000 1000" \
-c "close" \
/afs/example.com/scratch/a

Fixes: 31143d5d515e ("AFS: implement basic file write support")
Signed-off-by: David Howells

David Howells
2020-06-15 22:41:02 +0800

10 Jun, 2020

1 commit

c68421bba afs: Make afs_zap_data() static ... Browse Code »

Make afs_zap_data() static as it's only used in the file in which it is
defined.

Signed-off-by: David Howells

David Howells
2020-06-10 01:17:14 +0800

04 Jun, 2020

2 commits

20325960f afs: Reorganise volume and server trees to be rooted on the cell ... Browse Code »

Reorganise afs_volume objects such that they're in a tree keyed on volume
ID, rooted at on an afs_cell object rather than being in multiple trees,
each of which is rooted on an afs_server object.

afs_server structs become per-cell and acquire a pointer to the cell.

The process of breaking a callback then starts with finding the server by
its network address, following that to the cell and then looking up each
volume ID in the volume tree.

This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
and allows those structs and the code for maintaining them to be simplified
or removed.

It does make a couple of things a bit more tricky, though:

(1) Operations now start with a volume, not a server, so there can be more
than one answer as to whether or not the server we'll end up using
supports the FS.InlineBulkStatus RPC.

(2) CB RPC operations that specify the server UUID. There's still a tree
of servers by UUID on the afs_net struct, but the UUIDs in it aren't
guaranteed unique.

Signed-off-by: David Howells

David Howells
2020-06-04 22:37:57 +0800
e49c7b2f6 afs: Build an abstraction around an "operation" concept ... Browse Code »

Turn the afs_operation struct into the main way that most fileserver
operations are managed. Various things are added to the struct, including
the following:

(1) All the parameters and results of the relevant operations are moved
into it, removing corresponding fields from the afs_call struct.
afs_call gets a pointer to the op.

(2) The target volume is made the main focus of the operation, rather than
the target vnode(s), and a bunch of op->vnode->volume are made
op->volume instead.

(3) Two vnode records are defined (op->file[]) for the vnode(s) involved
in most operations. The vnode record (struct afs_vnode_param)
contains:

- The vnode pointer.

- The fid of the vnode to be included in the parameters or that was
returned in the reply (eg. FS.MakeDir).

- The status and callback information that may be returned in the
reply about the vnode.

- Callback break and data version tracking for detecting
simultaneous third-parth changes.

(4) Pointers to dentries to be updated with new inodes.

(5) An operations table pointer. The table includes pointers to functions
for issuing AFS and YFS-variant RPCs, handling the success and abort
of an operation and handling post-I/O-lock local editing of a
directory.

To make this work, the following function restructuring is made:

(A) The rotation loop that issues calls to fileservers that can be found
in each function that wants to issue an RPC (such as afs_mkdir()) is
extracted out into common code, in a new file called fs_operation.c.

(B) The rotation loops, such as the one in afs_mkdir(), are replaced with
a much smaller piece of code that allocates an operation, sets the
parameters and then calls out to the common code to do the actual
work.

(C) The code for handling the success and failure of an operation are
moved into operation functions (as (5) above) and these are called
from the core code at appropriate times.

(D) The pseudo inode getting stuff used by the dynamic root code is moved
over into dynroot.c.

(E) struct afs_iget_data is absorbed into the operation struct and
afs_iget() expects to be given an op pointer and a vnode record.

(F) Point (E) doesn't work for the root dir of a volume, but we know the
FID in advance (it's always vnode 1, unique 1), so a separate inode
getter, afs_root_iget(), is provided to special-case that.

(G) The inode status init/update functions now also take an op and a vnode
record.

(H) The RPC marshalling functions now, for the most part, just take an
afs_operation struct as their only argument. All the data they need
is held there. The result delivery functions write their answers
there as well.

(I) The call is attached to the operation and then the operation core does
the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

(*) Overhauling the rotation (again).

(*) Support for asynchronous I/O, where the fileserver rotation must be
done asynchronously also.

Signed-off-by: David Howells

David Howells
2020-06-04 22:37:17 +0800

31 May, 2020

2 commits

a310082f6 afs: Rename struct afs_fs_cursor to afs_operation ... Browse Code »

As a prelude to implementing asynchronous fileserver operations in the afs
filesystem, rename struct afs_fs_cursor to afs_operation.

This struct is going to form the core of the operation management and is
going to acquire more members in later.

Signed-off-by: David Howells

David Howells
2020-05-31 22:19:52 +0800
7126ead91 afs: Remove the error argument from afs_protocol_error() ... Browse Code »

Remove the error argument from afs_protocol_error() as it's always
-EBADMSG.

Signed-off-by: David Howells

David Howells
2020-05-31 22:19:52 +0800

26 Nov, 2019

1 commit

436b2a803 Merge tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk ... Browse Code »

Pull printk updates from Petr Mladek:

- Allow to print symbolic error names via new %pe modifier.

- Use pr_warn() instead of the remaining pr_warning() calls. Fix
formatting of the related lines.

- Add VSPRINTF entry to MAINTAINERS.

* tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits)
checkpatch: don't warn about new vsprintf pointer extension '%pe'
MAINTAINERS: Add VSPRINTF
tools lib api: Renaming pr_warning to pr_warn
ASoC: samsung: Use pr_warn instead of pr_warning
lib: cpu_rmap: Use pr_warn instead of pr_warning
trace: Use pr_warn instead of pr_warning
dma-debug: Use pr_warn instead of pr_warning
vgacon: Use pr_warn instead of pr_warning
fs: afs: Use pr_warn instead of pr_warning
sh/intc: Use pr_warn instead of pr_warning
scsi: Use pr_warn instead of pr_warning
platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning
platform/x86: asus-laptop: Use pr_warn instead of pr_warning
platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning
oprofile: Use pr_warn instead of pr_warning
of: Use pr_warn instead of pr_warning
macintosh: Use pr_warn instead of pr_warning
idsn: Use pr_warn instead of pr_warning
ide: Use pr_warn instead of pr_warning
crypto: n2: Use pr_warn instead of pr_warning
...

Linus Torvalds
2019-11-26 11:40:40 +0800

18 Oct, 2019

1 commit

a4e530ae7 fs: afs: Use pr_warn instead of pr_warning ... Browse Code »

As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of
pr_warning"), removing pr_warning so all logging messages use a
consistent _warn style. Let's do it.

Link: http://lkml.kernel.org/r/20191018031850.48498-23-wangkefeng.wang@huawei.com
To: linux-kernel@vger.kernel.org
Cc: David Howells
Cc: linux-afs@lists.infradead.org
Signed-off-by: Kefeng Wang
Reviewed-by: Sergey Senozhatsky
Signed-off-by: Petr Mladek

Kefeng Wang
2019-10-18 21:01:55 +0800

16 Sep, 2019

1 commit

473ef57ad afs dynroot: switch to simple_dir_operations ... Browse Code »

no point reinventing it (with wrong ->read(), BTW).

Signed-off-by: Al Viro

Al Viro
2019-09-16 00:19:48 +0800

21 Jun, 2019

2 commits

051d25250 afs: Add some callback management tracepoints ... Browse Code »

Add a couple of tracepoints to track callback management:

(1) afs_cb_miss - Logs when we were unable to apply a callback, either due
to the inode being discarded or due to a competing thread applying a
callback first.

(2) afs_cb_break - Logs when we attempted to clear the noted callback
promise, either due to the server explicitly breaking the callback,
the callback promise lapsing or a local event obsoleting it.

Signed-off-by: David Howells

David Howells
2019-06-21 01:12:16 +0800
2cd42d19c afs: Fix setting of i_blocks ... Browse Code »

The setting of i_blocks, which is calculated from i_size, has got
accidentally misordered relative to the setting of i_size when initially
setting up an inode. Further, i_blocks isn't updated by afs_apply_status()
when the size is updated.

To fix this, break the i_size/i_blocks setting out into a helper function
and call it from both places.

Fixes: a58823ac4589 ("afs: Fix application of status and callback to be under same lock")
Signed-off-by: David Howells

David Howells
2019-06-21 01:12:02 +0800

20 Jun, 2019

1 commit

3647e42b5 afs: Fix over zealous "vnode modified" warnings ... Browse Code »

Occasionally, warnings like this:

vnode modified 2af7 on {10000b:1} [exp 2af2] YFS.FetchStatus(vnode)

are emitted into the kernel log. This indicates that when we were applying
the updated vnode (file) status retrieved from the server to an inode we
saw that the data version number wasn't what we were expecting (in this
case it's 0x2af7 rather than 0x2af2).

We've usually received a callback from the server prior to this point - or
the callback promise has lapsed - so the warning is merely informative and
the state is to be expected.

Fix this by only emitting the warning if the we still think that we have a
valid callback promise and haven't received a callback.

Also change the format slightly so so that the new data version doesn't
look like part of the text, the like is prefixed with "kAFS: " and the
message is ranked as a warning.

Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Ian Wienand
Signed-off-by: David Howells

David Howells
2019-06-20 23:49:34 +0800

17 May, 2019

6 commits

b83591532 afs: Pass pre-fetch server and volume break counts into afs_iget5_set() ... Browse Code »

Pass the server and volume break counts from before the status fetch
operation that queried the attributes of a file into afs_iget5_set() so
that the new vnode's break counters can be initialised appropriately.

This allows detection of a volume or server break that happened whilst we
were fetching the status or setting up the vnode.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800
a38a75581 afs: Fix unlink to handle YFS.RemoveFile2 better ... Browse Code »

Make use of the status update for the target file that the YFS.RemoveFile2
RPC op returns to correctly update the vnode as to whether the file was
actually deleted or just had nlink reduced.

Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800
61c347ba5 afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry ... Browse Code »

Fix afs_validate() to clear AFS_VNODE_CB_PROMISED on a vnode if we detect
any condition that causes the callback promise to be broken implicitly,
including server break (cb_s_break), volume break (cb_v_break) or callback
expiry.

Fixes: ae3b7361dc0e ("afs: Fix validation/callback interaction")
Reported-by: Marc Dionne
Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800
f642404a0 afs: Make vnode->cb_interest RCU safe ... Browse Code »

Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
vnode->cb_interest. Use that change to allow afs_check_validity() to use
read_seqbegin_or_lock() instead of read_seqlock_excl().

This also requires the caller of afs_check_validity() to hold the RCU read
lock across the call.

Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800
c925bd0ac afs: Split afs_validate() so first part can be used under LOOKUP_RCU ... Browse Code »

Split afs_validate() so that the part that decides if the vnode is still
valid can be used under LOOKUP_RCU conditions from afs_d_revalidate().

Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800
7c7124586 afs: Don't save callback version and type fields ... Browse Code »

Don't save callback version and type fields as the version is about the
format of the callback information and the type is relative to the
particular RPC call.

Signed-off-by: David Howells

David Howells
2019-05-17 05:23:21 +0800

16 May, 2019

3 commits

a58823ac4 afs: Fix application of status and callback to be under same lock ... Browse Code »

When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).

Fix this by:

(1) Allocating a joint {status,callback} record (afs_status_cb) before
calling the RPC function for each vnode for which the RPC reply
contains a status or a status plus a callback. A flag is set in the
record to indicate if a callback was actually received.

(2) These records are passed into the RPC functions to be filled in. The
afs_decode_status() and yfs_decode_status() functions are removed and
the cb_lock is no longer taken.

(3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
update the vnode.

(4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
the vnode.

(5) vnodes, expected data-version numbers and callback break counters
(cb_break) no longer need to be passed to the reply delivery
functions.

Note that, for the moment, the file locking functions still need
access to both the call and the vnode at the same time.

(6) afs_vnode_commit_status() is now given the cb_break value and the
expected data_version and the task of applying the status and the
callback to the vnode are now done here.

This is done under a single taking of vnode->cb_lock.

(7) afs_pages_written_back() is now called by afs_store_data() rather than
by the reply delivery function.

afs_pages_written_back() has been moved to before the call point and
is now given the first and last page numbers rather than a pointer to
the call.

(8) The indicator from YFS.RemoveFile2 as to whether the target file
actually got removed (status.abort_code == VNOVNODE) rather than
merely dropping a link is now checked in afs_unlink rather than in
xdr_decode_YFSFetchStatus().

Supplementary fixes:

(*) afs_cache_permit() now gets the caller_access mask from the
afs_status_cb object rather than picking it out of the vnode's status
record. afs_fetch_status() returns caller_access through its argument
list for this purpose also.

(*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
than a read lock and now sets the callback inside the same critical
section.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells

David Howells
2019-05-16 23:25:21 +0800
d9052dda8 afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set ... Browse Code »

Don't invalidate the callback promise on a directory if the
AFS_VNODE_DIR_VALID flag is not set (which indicates that the directory
contents are invalid, due to edit failure, callback break, page reclaim).

The directory will be reloaded next time the directory is accessed, so
clearing the callback flag at this point may race with a reload of the
directory and cancel it's recorded callback promise.

Fixes: f3ddee8dc4e2 ("afs: Fix directory handling")
Signed-off-by: David Howells

David Howells
2019-05-16 23:25:21 +0800
20b8391ff afs: Make some RPC operations non-interruptible ... Browse Code »

Make certain RPC operations non-interruptible, including:

(*) Set attributes
(*) Store data

We don't want to get interrupted during a flush on close, flush on
unlock, writeback or an inode update, leaving us in a state where we
still need to do the writeback or update.

(*) Extend lock
(*) Release lock

We don't want to get lock extension interrupted as the file locks on
the server are time-limited. Interruption during lock release is less
of an issue since the lock is time-limited, but it's better to
complete the release to avoid a several-minute wait to recover it.

*Setting* the lock isn't a problem if it's interrupted since we can
just return to the user and tell them they were interrupted - at
which point they can elect to retry.

(*) Silly unlink

We want to remove silly unlink files if we can, rather than leaving
them for the salvager to clear up.

Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.

Without this, the following:

kAFS: Unexpected error from FS.StoreData -512

appears in dmesg when a pending store data gets interrupted and some
processes may just hang.

Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list. The next op will check it again since the
expiration time on the old list has past.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings
Reported-by: Marc Dionne
Signed-off-by: David Howells

David Howells
2019-05-16 23:25:20 +0800

15 May, 2019

1 commit

a1b879eef afs: Fix key leak in afs_release() and afs_evict_inode() ... Browse Code »

Fix afs_release() to go through the cleanup part of the function if
FMODE_WRITE is set rather than exiting through vfs_fsync() (which skips the
cleanup). The cleanup involves discarding the refs on the key used for
file ops and the writeback key record.

Also fix afs_evict_inode() to clean up any left over wb keys attached to
the inode/vnode when it is removed.

Fixes: 5a8132761609 ("afs: Do better accretion of small writes on newly created content")
Signed-off-by: David Howells

David Howells
2019-05-15 19:32:34 +0800

07 May, 2019

2 commits

c0abbb579 afs: Calculate i_blocks based on file size ... Browse Code »

While it's not possible to give an accurate number for the blocks
used on the server, populate i_blocks based on the file size so
that 'du' can give a reasonable estimate.

The value is rounded up to 1K granularity, for consistency with
what other AFS clients report, and the servers' 1K usage quota
unit. Note that the value calculated by 'du' at the root of a
volume can still be slightly lower than the quota usage on the
server, as 0-length files are charged 1 quota block, but are
reported as occupying 0 blocks. Again, this is consistent with
other AFS clients.

Signed-off-by: Marc Dionne
Signed-off-by: David Howells

Marc Dionne
2019-05-07 23:48:44 +0800
b134d687d afs: Log more information for "kAFS: AFS vnode with undefined type\n" ... Browse Code »

Log more information when "kAFS: AFS vnode with undefined type\n" is
displayed due to a vnode record being retrieved from the server that
appears to have a duff file type (usually 0). This prints more information
to try and help pin down the problem.

Signed-off-by: David Howells

David Howells
2019-05-07 23:48:44 +0800

25 Apr, 2019

1 commit

79ddbfa50 afs: Implement sillyrename for unlink and rename ... Browse Code »

Implement sillyrename for AFS unlink and rename, using the NFS variant
implementation as a basis.

Note that the asynchronous file locking extender/releaser has to be
notified with a state change to stop it complaining if there's a race
between that and the actual file deletion.

A tracepoint, afs_silly_rename, is also added to note the silly rename and
the cleanup. The afs_edit_dir tracepoint is given some extra reason
indicators and the afs_flock_ev tracepoint is given a silly-delete file
lock cancellation indicator.

Signed-off-by: David Howells

David Howells
2019-04-25 21:26:51 +0800

13 Apr, 2019

1 commit

ba25b81e3 afs: avoid deprecated get_seconds() ... Browse Code »

get_seconds() has a limited range on 32-bit architectures and is
deprecated because of that. While AFS uses the same limits for
its inode timestamps on the wire protocol, let's just use the
simpler current_time() as we do for other file systems.

This will still zero out the 'tv_nsec' field of the timestamps
internally.

Signed-off-by: Arnd Bergmann
Signed-off-by: David Howells

Arnd Bergmann
2019-04-13 15:37:36 +0800

17 Jan, 2019

2 commits

59d49076a afs: Fix key refcounting in file locking code ... Browse Code »

Fix the refcounting of the authentication keys in the file locking code.
The vnode->lock_key member points to a key on which it expects to be
holding a ref, but it isn't always given an extra ref, however.

Fixes: 0fafdc9f888b ("afs: Fix file locking")
Signed-off-by: David Howells

David Howells
2019-01-17 23:17:28 +0800
4882a27ce afs: Don't set vnode->cb_s_break in afs_validate() ... Browse Code »

A cb_interest record is not necessarily attached to the vnode on entry to
afs_validate(), which can cause an oops when we try to bring the vnode's
cb_s_break up to date in the default case (ie. no current callback promise
and the vnode has not been deleted).

Fix this by simply removing the line, as vnode->cb_s_break will be set when
needed by afs_register_server_cb_interest() when we next get a callback
promise from RPC call.

The oops looks something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
...
RIP: 0010:afs_validate+0x66/0x250 [kafs]
...
Call Trace:
afs_d_revalidate+0x8d/0x340 [kafs]
? __d_lookup+0x61/0x150
lookup_dcache+0x44/0x70
? lookup_dcache+0x44/0x70
__lookup_hash+0x24/0xa0
do_unlinkat+0x11d/0x2c0
__x64_sys_unlink+0x23/0x30
do_syscall_64+0x4d/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: ae3b7361dc0e ("afs: Fix validation/callback interaction")
Signed-off-by: Marc Dionne
Signed-off-by: David Howells

Marc Dionne
2019-01-17 23:15:52 +0800

30 Nov, 2018

1 commit

ae3b7361d afs: Fix validation/callback interaction ... Browse Code »

When afs_validate() is called to validate a vnode (inode), there are two
unhandled cases in the fastpath at the top of the function:

(1) If the vnode is promised (AFS_VNODE_CB_PROMISED is set), the break
counters match and the data has expired, then there's an implicit case
in which the vnode needs revalidating.

This has no consequences since the default "valid = false" set at the
top of the function happens to do the right thing.

(2) If the vnode is not promised and it hasn't been deleted
(AFS_VNODE_DELETED is not set) then there's a default case we're not
handling in which the vnode is invalid. If the vnode is invalid, we
need to bring cb_s_break and cb_v_break up to date before we refetch
the status.

As a consequence, once the server loses track of the client
(ie. sufficient time has passed since we last sent it an operation),
it will send us a CB.InitCallBackState* operation when we next try to
talk to it. This calls afs_init_callback_state() which increments
afs_server::cb_s_break, but this then doesn't propagate to the
afs_vnode record.

The result being that every afs_validate() call thereafter sends a
status fetch operation to the server.

Clarify and fix this by:

(A) Setting valid in all the branches rather than initialising it at the
top so that the compiler catches where we've missed.

(B) Restructuring the logic in the 'promised' branch so that we set valid
to false if the callback is due to expire (or has expired) and so that
the final case is that the vnode is still valid.

(C) Adding an else-statement that ups cb_s_break and cb_v_break if the
promised and deleted cases don't match.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2018-11-30 10:08:14 +0800

24 Oct, 2018

3 commits

12d8e95a9 afs: Calc callback expiry in op reply delivery ... Browse Code »

Calculate the callback expiration time at the point of operation reply
delivery, using the reply time queried from AF_RXRPC on that call as a
base.

Signed-off-by: David Howells

David Howells
2018-10-24 07:41:08 +0800
3b6492df4 afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS ... Browse Code »

Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious. It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.

Signed-off-by: David Howells

David Howells
2018-10-24 07:41:08 +0800
160cb9574 afs: Better tracing of protocol errors ... Browse Code »

Include the site of detection of AFS protocol errors in trace lines to
better be able to determine what went wrong.

Signed-off-by: David Howells

David Howells
2018-10-24 07:41:07 +0800

14 May, 2018

1 commit

68251f0a6 afs: Fix whole-volume callback handling ... Browse Code »

It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken. This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells

David Howells
2018-05-14 22:15:18 +0800