23 Nov, 2020

1 commit

  • When doing a lookup in a directory, the afs filesystem uses a bulk
    status fetch to speculatively retrieve the statuses of up to 48 other
    vnodes found in the same directory and it will then either update extant
    inodes or create new ones - effectively doing 'lookup ahead'.

    To avoid the possibility of deadlocking itself, however, the filesystem
    doesn't lock all of those inodes; rather just the directory inode is
    locked (by the VFS).

    When the operation completes, afs_inode_init_from_status() or
    afs_apply_status() is called, depending on whether the inode already
    exists, to commit the new status.

    A case exists, however, where the speculative status fetch operation may
    straddle a modification operation on one of those vnodes. The speculative
    bulk status RPC retrieves the old status; whilst it is in flight, the
    modification completes and returns an updated status, which is committed;
    we then attempt to commit the stale speculative status on top of it.

    This results in something like the following being seen in dmesg:

    kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus

    showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
    say that the vnode had data version 8 when we'd already recorded version
    9 due to a local modification. This was causing the cache to be
    invalidated for that vnode when it shouldn't have been. If it happens
    on a data file, this might lead to local changes being lost.

    Fix this by ignoring speculative status updates if the data version
    doesn't match the expected value.
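
The check can be pictured with a small userspace model. This is a sketch under stated assumptions, not the kernel code: `vnode_sketch`, `should_commit_status()` and the idea of sampling the data version into `dv_before` before the RPC is issued are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of an inode that matters here. */
struct vnode_sketch {
	uint64_t data_version;	/* DV most recently committed locally */
};

/*
 * Decide whether a status reply should be committed to the vnode.
 * dv_before is the vnode's DV as sampled before the RPC was issued
 * (an assumption of this sketch).
 */
static bool should_commit_status(const struct vnode_sketch *vnode,
				 uint64_t dv_before, bool speculative)
{
	assert(vnode != NULL);

	/*
	 * A speculative reply is dropped if the vnode moved on while the
	 * bulk status fetch was in flight; committing it would appear to
	 * regress the data version.
	 */
	if (speculative && vnode->data_version != dv_before)
		return false;

	return true;
}
```

With the numbers from the log line above, a speculative fetch issued when the DV was 8 is ignored once a local modification has already recorded DV 9, whereas a non-speculative reply still goes through the normal commit path.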

    Note that it is possible to get a DV regression if a volume gets
    restored from a backup - but we should get a callback break in such a
    case that should trigger a recheck anyway. It might be worth checking
    the volume creation time in the volsync info and, if a change is
    observed in that (as would happen on a restore), invalidate all caches
    associated with the volume.

    Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

09 Oct, 2020

1 commit

  • The afs filesystem has a lock[*] that it uses to serialise I/O operations
    going to the server (vnode->io_lock), as the server will only perform one
    modification operation at a time on any given file or directory. This
    prevents the filesystem from filling up all the call slots to a server
    with calls that aren't going to be executed in parallel anyway, thereby
    allowing operations on other files to obtain slots.

    [*] Note that this is probably redundant for directories at least since
    i_rwsem is used to serialise directory modifications and
    lookup/reading vs modification. The server does allow parallel
    non-modification ops, however.

    When a file truncation op completes, we truncate the in-memory copy of the
    file to match - but we do it whilst still holding the io_lock, the idea
    being to prevent races with other operations.

    However, if writeback starts in a worker thread simultaneously with
    truncation (whilst notify_change() is called with i_rwsem locked, writeback
    pays it no heed), it may manage to set PG_writeback bits on the pages that
    will get truncated before afs_setattr_success() manages to call
    truncate_pagecache(). Truncate will then wait for those pages - whilst
    still inside io_lock:

    # cat /proc/8837/stack
    [] wait_on_page_bit_common+0x184/0x1e7
    [] truncate_inode_pages_range+0x37f/0x3eb
    [] truncate_pagecache+0x3c/0x53
    [] afs_setattr_success+0x4d/0x6e
    [] afs_wait_for_operation+0xd8/0x169
    [] afs_do_sync_operation+0x16/0x1f
    [] afs_setattr+0x1fb/0x25d
    [] notify_change+0x2cf/0x3c4
    [] do_truncate+0x7f/0xb2
    [] do_sys_ftruncate+0xd1/0x104
    [] do_syscall_64+0x2d/0x3a
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The writeback operation, however, stalls indefinitely because it needs to
    get the io_lock to proceed:

    # cat /proc/5940/stack
    [] afs_get_io_locks+0x58/0x1ae
    [] afs_begin_vnode_operation+0xc7/0xd1
    [] afs_store_data+0x1b2/0x2a3
    [] afs_write_back_from_locked_page+0x418/0x57c
    [] afs_writepages_region+0x196/0x224
    [] afs_writepages+0x74/0x156
    [] do_writepages+0x2d/0x56
    [] __writeback_single_inode+0x84/0x207
    [] writeback_sb_inodes+0x238/0x3cf
    [] __writeback_inodes_wb+0x68/0x9f
    [] wb_writeback+0x145/0x26c
    [] wb_do_writeback+0x16a/0x194
    [] wb_workfn+0x74/0x177
    [] process_one_work+0x174/0x264
    [] worker_thread+0x117/0x1b9
    [] kthread+0xec/0xf1
    [] ret_from_fork+0x1f/0x30

    and thus deadlock has occurred.

    Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
    that the caller is holding i_rwsem doesn't preclude more pages being
    dirtied through an mmap'd region.

    Fix this by:

    (1) Use the vnode validate_lock to mediate access between afs_setattr()
    and afs_writepages():

    (a) Exclusively lock validate_lock in afs_setattr() around the whole
    RPC operation.

    (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), try to
    shared-lock validate_lock and return immediately if we can't
    get it.

    (c) If WB_SYNC_ALL is set, wait for the lock.

    The validate_lock is also used to validate a file and to zap its cache
    if the file was altered by a third party, so it's probably a good fit
    for this.

    (2) Move the truncation outside of the io_lock in setattr, using the same
    hook as is used for local directory editing.

    This requires the old i_size to be retained in the operation record as
    we commit the revised status to the inode members inside the io_lock
    still, but we still need to know if we reduced the file size.
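
The locking policy in (1)(b) and (1)(c) can be modelled in a few lines. This is a single-threaded toy, not kernel code: `rwlock_sketch`, `try_lock_shared()` and `writepages_sketch()` are hypothetical, and a real rw_semaphore would sleep rather than return a "must wait" result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reader/writer lock state standing in for validate_lock. */
struct rwlock_sketch {
	int  readers;	/* number of shared holders */
	bool writer;	/* true while held exclusively (e.g. by setattr) */
};

enum wb_result {
	WB_PROCEED,	/* shared lock taken, writeback may go ahead */
	WB_SKIP,	/* lock busy, background writeback backs off */
	WB_MUST_WAIT,	/* lock busy, data-integrity writeback must wait */
};

/* Non-blocking shared acquisition: fails while a writer holds the lock. */
static bool try_lock_shared(struct rwlock_sketch *lock)
{
	if (lock->writer)
		return false;
	lock->readers++;
	return true;
}

/* sync_all models "WB_SYNC_ALL is set" in the text above. */
static enum wb_result writepages_sketch(struct rwlock_sketch *validate_lock,
					bool sync_all)
{
	assert(validate_lock != NULL);

	if (try_lock_shared(validate_lock))
		return WB_PROCEED;

	/* Only a caller that needs data integrity waits for setattr. */
	return sync_all ? WB_MUST_WAIT : WB_SKIP;
}
```

The point of the design is that background writeback never queues up behind a truncation that may itself be waiting on pages under writeback.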

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

17 Jun, 2020

1 commit

  • Fix AFS's silly rename by the following means:

    (1) Set the destination directory in afs_do_silly_rename() so as to avoid
    misbehaviour and indicate that the directory data version will
    increment by 1 so as to avoid warnings about unexpected changes in the
    DV. Also indicate that the ctime should be updated to avoid xfstest
    grumbling.

    (2) Note when the server indicates that a directory changed more than we
    expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a
    third party change, checking on successful completion of unlink and
    rename.

    The problem is that the FS.RemoveFile RPC op doesn't report the status
    of the unlinked file, though YFS.RemoveFile2 does. This can be
    mitigated by the assumption that if the directory DV cranked by
    exactly 1, we can be sure we removed one link from the file; further,
    ordinarily in AFS, files cannot be hardlinked across directories, so
    if we reduce nlink to 0, the file is deleted.

    However, if the directory DV jumps by more than 1, we cannot know if a
    third party intervened by adding or removing a link on the file we
    just removed a link from.

    The same also goes for any vnode that is at the destination of the
    FS.Rename RPC op.

    (3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock
    section along with the other attribute updates if ->op_unlinked is set
    on the descriptor for the appropriate vnode.

    (4) Issue a follow up status fetch to the unlinked file in the event of a
    third party conflict that makes it impossible for us to know if we
    actually deleted the file or not.

    (5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to
    the user about the nlink of a silly deleted file so that it appears as
    0, not 1.
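
The reasoning in (2) to (4) amounts to a small decision rule on the directory data version. The following is a minimal sketch with hypothetical names (`classify_unlink()`, `apply_unlink()`); it only models the inference, not the RPCs or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum unlink_outcome {
	UNLINK_CERTAIN,		/* directory DV advanced by exactly 1 */
	UNLINK_NEEDS_REFETCH,	/* third party may have intervened */
};

/* Compare the directory DV seen before and after the removal. */
static enum unlink_outcome classify_unlink(uint64_t dir_dv_before,
					   uint64_t dir_dv_after)
{
	if (dir_dv_after == dir_dv_before + 1)
		return UNLINK_CERTAIN;
	return UNLINK_NEEDS_REFETCH;
}

/*
 * Drop one link only when we can be sure we removed it; otherwise leave
 * nlink alone and ask the caller to fetch the file's status again.
 */
static unsigned int apply_unlink(unsigned int nlink,
				 enum unlink_outcome outcome, bool *refetch)
{
	assert(refetch != NULL);

	*refetch = (outcome == UNLINK_NEEDS_REFETCH);
	if (outcome == UNLINK_CERTAIN && nlink > 0)
		nlink--;
	return nlink;
}
```

If the resulting nlink is 0 in the certain case, the file can be treated as deleted; in the conflict case only a follow-up status fetch can tell.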

    Found with the generic/035 and generic/084 xfstests.

    Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
    Reported-by: Marc Dionne
    Signed-off-by: David Howells

    David Howells
     

16 Jun, 2020

2 commits

  • afs_vnode_commit_status() is only ever called if op->error is 0, so remove
    the op->error checks from the function.

    Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
    Signed-off-by: David Howells

    David Howells
     
  • afs_check_for_remote_deletion() checks to see if error ENOENT is returned
    by the server in response to an operation and, if so, marks the primary
    vnode as having been deleted as the FID is no longer valid.

    However, it's being called from the operation success functions, where no
    abort has happened - and if an inline abort is recorded, it's handled by
    afs_vnode_commit_status().

    Fix this by actually calling the operation aborted method if provided and
    having that point to afs_check_for_remote_deletion().

    Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
    Signed-off-by: David Howells

    David Howells
     

15 Jun, 2020

3 commits

  • Fix the following issues:

    (1) Fix writeback to reduce the size of a store operation to i_size,
    effectively discarding the extra data.

    The problem comes when afs_page_mkwrite() records that a page is about
    to be modified by mmap(). It doesn't know what bits of the page are
    going to be modified, so it records the whole page as being dirty
    (this is stored in page->private as start and end offsets).

    Without this, the marshalling for the store to the server extends the
    size of the file to the end of the page (in afs_fs_store_data() and
    yfs_fs_store_data()).

    (2) Fix setattr to actually truncate the pagecache, thereby clearing
    the discarded part of a file.

    (3) Fix setattr to check that the new size is okay and to disable
    ATTR_SIZE if i_size wouldn't change.

    (4) Force i_size to be updated as the result of a truncate.

    (5) Don't truncate if ATTR_SIZE is not set.

    (6) Call pagecache_isize_extended() if the file was enlarged.

    Note that truncate_set_size() isn't used because the setting of i_size is
    done inside afs_vnode_commit_status() under the vnode->cb_lock.
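
Item (1) is essentially a clamp of the dirty byte range against i_size. A minimal sketch of that arithmetic, assuming a half-open range `[start, end)` and a hypothetical helper name `store_length()`:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes to send when the dirty range [start, end) is clamped
 * to the current file size. A whole page marked dirty by mmap() may
 * extend past EOF; the excess must not be stored.
 */
static uint64_t store_length(uint64_t start, uint64_t end, uint64_t i_size)
{
	assert(start <= end);

	if (start >= i_size)
		return 0;	/* range lies entirely beyond EOF */
	if (end > i_size)
		end = i_size;	/* discard the part beyond EOF */
	return end - start;
}
```

For example, with a 5000-byte file and the second 4096-byte page fully dirty, only the 904 bytes up to EOF would be stored rather than the full page.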

    Found with the generic/029 and generic/393 xfstests.

    Fixes: 31143d5d515e ("AFS: implement basic file write support")
    Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record")
    Signed-off-by: David Howells

    David Howells
     
  • The in-kernel afs filesystem ignores ctime because the AFS fileserver
    protocol doesn't support ctimes. This, however, causes various xfstests to
    fail.

    Work around this by:

    (1) Setting ctime to attr->ia_ctime in afs_setattr().

    (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.

    (3) Setting the ctime from the server mtime on the target file when
    creating a hard link to it.

    (4) Setting the ctime on directories from their revised mtimes when
    renaming/moving a file.

    Found by the generic/221 and generic/309 xfstests.

    Signed-off-by: David Howells

    David Howells
     
  • When doing a partial writeback, afs_write_back_from_locked_page() may
    generate an FS.StoreData RPC request that writes out part of a file when a
    file has been constructed from pieces by doing seek, write, seek, write,
    ... as is done by ld.

    The FS.StoreData RPC is given the current i_size as the file length, but
    the server basically ignores it unless the data length is 0 (in which case
    it's just a truncate operation). The revised file length returned in the
    result of the RPC may then not reflect what we suggested - and this leads
    to i_size getting moved backwards - which causes issues later.

    Fix the client to take account of this by ignoring the returned file size
    unless the data version number jumped unexpectedly - in which case we're
    going to have to clear the pagecache and reload anyway.
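
The rule described in the paragraph above can be written as a one-line decision. This is a sketch with a hypothetical helper, `pick_i_size()`, and made-up parameter names; it is not the kernel function.

```c
#include <stdint.h>

/*
 * Choose the file size to keep after a store RPC replies.
 * expected_dv is the data version we predicted for our own change.
 */
static uint64_t pick_i_size(uint64_t local_i_size, uint64_t server_size,
			    uint64_t expected_dv, uint64_t returned_dv)
{
	/* Unexpected DV: someone else changed the file, trust the server. */
	if (returned_dv != expected_dv)
		return server_size;

	/* Otherwise the local i_size is authoritative and must not shrink. */
	return local_i_size;
}
```

With illustrative numbers: if the local i_size is 11000 and a partial store reply reports 88 bytes with the expected data version, the client keeps 11000.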

    This can be observed when doing a kernel build on an AFS mount. The
    following pair of commands produce the issue:

    ld -m elf_x86_64 -z max-page-size=0x200000 --emit-relocs \
    -T arch/x86/realmode/rm/realmode.lds \
    arch/x86/realmode/rm/header.o \
    arch/x86/realmode/rm/trampoline_64.o \
    arch/x86/realmode/rm/stack.o \
    arch/x86/realmode/rm/reboot.o \
    -o arch/x86/realmode/rm/realmode.elf
    arch/x86/tools/relocs --realmode \
    arch/x86/realmode/rm/realmode.elf \
    >arch/x86/realmode/rm/realmode.relocs

    This results in the latter giving:

    Cannot read ELF section headers 0/18: Success

    as the realmode.elf file got corrupted.

    The sequence of events can also be driven with:

    xfs_io -t -f \
    -c "pwrite -S 0x58 0 0x58" \
    -c "pwrite -S 0x59 10000 1000" \
    -c "close" \
    /afs/example.com/scratch/a

    Fixes: 31143d5d515e ("AFS: implement basic file write support")
    Signed-off-by: David Howells

    David Howells
     

10 Jun, 2020

1 commit


04 Jun, 2020

2 commits

  • Reorganise afs_volume objects such that they're in a tree keyed on volume
    ID, rooted on an afs_cell object rather than being in multiple trees,
    each of which is rooted on an afs_server object.

    afs_server structs become per-cell and acquire a pointer to the cell.

    The process of breaking a callback then starts with finding the server by
    its network address, following that to the cell and then looking up each
    volume ID in the volume tree.
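
The lookup path described above (server, then cell, then volume by ID) can be sketched as follows. All type and function names are hypothetical, and a sorted array searched with `bsearch()` stands in for the tree keyed on volume ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct volume_sketch {
	uint64_t vid;			/* volume ID, the lookup key */
};

struct cell_sketch {
	struct volume_sketch *volumes;	/* sorted by vid */
	size_t nr_volumes;
};

struct server_sketch {
	struct cell_sketch *cell;	/* servers are per-cell */
};

/* bsearch() comparator: key is a volume ID, elem is a volume record. */
static int cmp_vid(const void *key, const void *elem)
{
	uint64_t vid = *(const uint64_t *)key;
	const struct volume_sketch *vol = elem;

	return (vid > vol->vid) - (vid < vol->vid);
}

/* Follow server -> cell -> volume; returns NULL if the ID is unknown. */
static struct volume_sketch *lookup_volume(const struct server_sketch *server,
					   uint64_t vid)
{
	assert(server != NULL && server->cell != NULL);

	return bsearch(&vid, server->cell->volumes, server->cell->nr_volumes,
		       sizeof(struct volume_sketch), cmp_vid);
}
```

Finding the server from its network address is left out; the sketch starts from a server record that has already been found.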

    This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
    and allows those structs and the code for maintaining them to be simplified
    or removed.

    It does make a couple of things a bit more tricky, though:

    (1) Operations now start with a volume, not a server, so there can be more
    than one answer as to whether or not the server we'll end up using
    supports the FS.InlineBulkStatus RPC.

    (2) CB RPC operations that specify the server UUID. There's still a tree
    of servers by UUID on the afs_net struct, but the UUIDs in it aren't
    guaranteed unique.

    Signed-off-by: David Howells

    David Howells
     
  • Turn the afs_operation struct into the main way that most fileserver
    operations are managed. Various things are added to the struct, including
    the following:

    (1) All the parameters and results of the relevant operations are moved
    into it, removing corresponding fields from the afs_call struct.
    afs_call gets a pointer to the op.

    (2) The target volume is made the main focus of the operation, rather than
    the target vnode(s), and a bunch of op->vnode->volume are made
    op->volume instead.

    (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
    in most operations. The vnode record (struct afs_vnode_param)
    contains:

    - The vnode pointer.

    - The fid of the vnode to be included in the parameters or that was
    returned in the reply (eg. FS.MakeDir).

    - The status and callback information that may be returned in the
    reply about the vnode.

    - Callback break and data version tracking for detecting
    simultaneous third-party changes.

    (4) Pointers to dentries to be updated with new inodes.

    (5) An operations table pointer. The table includes pointers to functions
    for issuing AFS and YFS-variant RPCs, handling the success and abort
    of an operation and handling post-I/O-lock local editing of a
    directory.

    To make this work, the following function restructuring is made:

    (A) The rotation loop that issues calls to fileservers that can be found
    in each function that wants to issue an RPC (such as afs_mkdir()) is
    extracted out into common code, in a new file called fs_operation.c.

    (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
    a much smaller piece of code that allocates an operation, sets the
    parameters and then calls out to the common code to do the actual
    work.

    (C) The code for handling the success and failure of an operation is
    moved into operation functions (as (5) above) and these are called
    from the core code at appropriate times.

    (D) The pseudo inode getting stuff used by the dynamic root code is moved
    over into dynroot.c.

    (E) struct afs_iget_data is absorbed into the operation struct and
    afs_iget() expects to be given an op pointer and a vnode record.

    (F) Point (E) doesn't work for the root dir of a volume, but we know the
    FID in advance (it's always vnode 1, unique 1), so a separate inode
    getter, afs_root_iget(), is provided to special-case that.

    (G) The inode status init/update functions now also take an op and a vnode
    record.

    (H) The RPC marshalling functions now, for the most part, just take an
    afs_operation struct as their only argument. All the data they need
    is held there. The result delivery functions write their answers
    there as well.

    (I) The call is attached to the operation and then the operation core does
    the waiting.

    And then the new operation code is, for the moment, made to just initialise
    the operation, get the appropriate vnode I/O locks and do the same rotation
    loop as before.
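
The operations table from (5) and the common dispatch from (A) to (C) can be illustrated with a compact model. This is a sketch only: the struct layouts, the `demo_*` handlers and the `rpc_result` field are hypothetical, and the real structure carries far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct op_sketch;

/* Hypothetical operations table: one set of handlers per operation type. */
struct op_ops_sketch {
	void (*issue_afs_rpc)(struct op_sketch *op);
	void (*issue_yfs_rpc)(struct op_sketch *op);
	void (*success)(struct op_sketch *op);
	void (*aborted)(struct op_sketch *op);	/* optional, may be NULL */
};

struct op_sketch {
	const struct op_ops_sketch *ops;
	bool server_is_yfs;	/* which RPC variant the chosen server speaks */
	int  rpc_result;	/* demo only: what the fake RPC will report */
	int  error;
	bool used_yfs, ran_success, ran_aborted;	/* demo bookkeeping */
};

/* Common core: pick the RPC variant, then call the outcome handler. */
static void run_operation(struct op_sketch *op)
{
	assert(op != NULL && op->ops != NULL);

	if (op->server_is_yfs)
		op->ops->issue_yfs_rpc(op);
	else
		op->ops->issue_afs_rpc(op);

	if (op->error == 0)
		op->ops->success(op);
	else if (op->ops->aborted)
		op->ops->aborted(op);
}

/* Fake handlers that just record what happened. */
static void demo_issue_afs(struct op_sketch *op) { op->error = op->rpc_result; }
static void demo_issue_yfs(struct op_sketch *op)
{
	op->used_yfs = true;
	op->error = op->rpc_result;
}
static void demo_success(struct op_sketch *op) { op->ran_success = true; }
static void demo_aborted(struct op_sketch *op) { op->ran_aborted = true; }

static const struct op_ops_sketch demo_ops = {
	.issue_afs_rpc	= demo_issue_afs,
	.issue_yfs_rpc	= demo_issue_yfs,
	.success	= demo_success,
	.aborted	= demo_aborted,
};
```

A caller such as a mkdir implementation would then only fill in an operation, point it at its table and call the common code, instead of carrying its own rotation loop.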

    This lays the foundation for the following changes in the future:

    (*) Overhauling the rotation (again).

    (*) Support for asynchronous I/O, where the fileserver rotation must be
    done asynchronously also.

    Signed-off-by: David Howells

    David Howells
     

31 May, 2020

2 commits


26 Nov, 2019

1 commit

  • Pull printk updates from Petr Mladek:

    - Allow to print symbolic error names via new %pe modifier.

    - Use pr_warn() instead of the remaining pr_warning() calls. Fix
    formatting of the related lines.

    - Add VSPRINTF entry to MAINTAINERS.

    * tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits)
    checkpatch: don't warn about new vsprintf pointer extension '%pe'
    MAINTAINERS: Add VSPRINTF
    tools lib api: Renaming pr_warning to pr_warn
    ASoC: samsung: Use pr_warn instead of pr_warning
    lib: cpu_rmap: Use pr_warn instead of pr_warning
    trace: Use pr_warn instead of pr_warning
    dma-debug: Use pr_warn instead of pr_warning
    vgacon: Use pr_warn instead of pr_warning
    fs: afs: Use pr_warn instead of pr_warning
    sh/intc: Use pr_warn instead of pr_warning
    scsi: Use pr_warn instead of pr_warning
    platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning
    platform/x86: asus-laptop: Use pr_warn instead of pr_warning
    platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning
    oprofile: Use pr_warn instead of pr_warning
    of: Use pr_warn instead of pr_warning
    macintosh: Use pr_warn instead of pr_warning
    idsn: Use pr_warn instead of pr_warning
    ide: Use pr_warn instead of pr_warning
    crypto: n2: Use pr_warn instead of pr_warning
    ...

    Linus Torvalds
     

18 Oct, 2019

1 commit

  • As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of
    pr_warning"), removing pr_warning so all logging messages use a
    consistent _warn style. Let's do it.

    Link: http://lkml.kernel.org/r/20191018031850.48498-23-wangkefeng.wang@huawei.com
    To: linux-kernel@vger.kernel.org
    Cc: David Howells
    Cc: linux-afs@lists.infradead.org
    Signed-off-by: Kefeng Wang
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek

    Kefeng Wang
     

16 Sep, 2019

1 commit


21 Jun, 2019

2 commits

  • Add a couple of tracepoints to track callback management:

    (1) afs_cb_miss - Logs when we were unable to apply a callback, either due
    to the inode being discarded or due to a competing thread applying a
    callback first.

    (2) afs_cb_break - Logs when we attempted to clear the noted callback
    promise, either due to the server explicitly breaking the callback,
    the callback promise lapsing or a local event obsoleting it.

    Signed-off-by: David Howells

    David Howells
     
  • The setting of i_blocks, which is calculated from i_size, has got
    accidentally misordered relative to the setting of i_size when initially
    setting up an inode. Further, i_blocks isn't updated by afs_apply_status()
    when the size is updated.

    To fix this, break the i_size/i_blocks setting out into a helper function
    and call it from both places.

    Fixes: a58823ac4589 ("afs: Fix application of status and callback to be under same lock")
    Signed-off-by: David Howells

    David Howells
     

20 Jun, 2019

1 commit

  • Occasionally, warnings like this:

    vnode modified 2af7 on {10000b:1} [exp 2af2] YFS.FetchStatus(vnode)

    are emitted into the kernel log. This indicates that when we were applying
    the updated vnode (file) status retrieved from the server to an inode we
    saw that the data version number wasn't what we were expecting (in this
    case it's 0x2af7 rather than 0x2af2).

    We've usually received a callback from the server prior to this point - or
    the callback promise has lapsed - so the warning is merely informative and
    the state is to be expected.

    Fix this by only emitting the warning if we still think that we have a
    valid callback promise and haven't received a callback.

    Also change the format slightly so that the new data version doesn't
    look like part of the text, the line is prefixed with "kAFS: " and the
    message is ranked as a warning.
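
The condition for emitting the warning can be sketched as a predicate. The names here (`cb_state_sketch`, `should_warn_dv()`, `cb_break_at_start`) are hypothetical; the sketch assumes a counter that is bumped whenever a callback is broken and that is sampled when the operation starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cb_state_sketch {
	bool promised;		/* we believe we hold a callback promise */
	unsigned int cb_break;	/* bumped each time a callback is broken */
};

/*
 * Warn about an unexpected data version only if nothing told us to
 * expect a change: the promise is still held and no break arrived
 * since the operation began.
 */
static bool should_warn_dv(const struct cb_state_sketch *cb,
			   unsigned int cb_break_at_start,
			   uint64_t expected_dv, uint64_t got_dv)
{
	assert(cb != NULL);

	if (got_dv == expected_dv)
		return false;
	return cb->promised && cb->cb_break == cb_break_at_start;
}
```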

    Fixes: 31143d5d515e ("AFS: implement basic file write support")
    Reported-by: Ian Wienand
    Signed-off-by: David Howells

    David Howells
     

17 May, 2019

6 commits


16 May, 2019

3 commits

  • When applying the status and callback in the response of an operation,
    apply them in the same critical section so that there's no race between
    checking the callback state and checking status-dependent state (such as
    the data version).

    Fix this by:

    (1) Allocating a joint {status,callback} record (afs_status_cb) before
    calling the RPC function for each vnode for which the RPC reply
    contains a status or a status plus a callback. A flag is set in the
    record to indicate if a callback was actually received.

    (2) These records are passed into the RPC functions to be filled in. The
    afs_decode_status() and yfs_decode_status() functions are removed and
    the cb_lock is no longer taken.

    (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
    update the vnode.

    (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
    the vnode.

    (5) vnodes, expected data-version numbers and callback break counters
    (cb_break) no longer need to be passed to the reply delivery
    functions.

    Note that, for the moment, the file locking functions still need
    access to both the call and the vnode at the same time.

    (6) afs_vnode_commit_status() is now given the cb_break value and the
    expected data_version and the task of applying the status and the
    callback to the vnode are now done here.

    This is done under a single taking of vnode->cb_lock.

    (7) afs_pages_written_back() is now called by afs_store_data() rather than
    by the reply delivery function.

    afs_pages_written_back() has been moved to before the call point and
    is now given the first and last page numbers rather than a pointer to
    the call.

    (8) The indicator from YFS.RemoveFile2 as to whether the target file
    actually got removed (status.abort_code == VNOVNODE) rather than
    merely dropping a link is now checked in afs_unlink rather than in
    xdr_decode_YFSFetchStatus().

    Supplementary fixes:

    (*) afs_cache_permit() now gets the caller_access mask from the
    afs_status_cb object rather than picking it out of the vnode's status
    record. afs_fetch_status() returns caller_access through its argument
    list for this purpose also.

    (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
    than a read lock and now sets the callback inside the same critical
    section.

    Fixes: c435ee34551e ("afs: Overhaul the callback handling")
    Signed-off-by: David Howells

    David Howells
     
  • Don't invalidate the callback promise on a directory if the
    AFS_VNODE_DIR_VALID flag is not set (which indicates that the directory
    contents are invalid, due to edit failure, callback break, page reclaim).

    The directory will be reloaded next time the directory is accessed, so
    clearing the callback flag at this point may race with a reload of the
    directory and cancel its recorded callback promise.

    Fixes: f3ddee8dc4e2 ("afs: Fix directory handling")
    Signed-off-by: David Howells

    David Howells
     
  • Make certain RPC operations non-interruptible, including:

    (*) Set attributes
    (*) Store data

    We don't want to get interrupted during a flush on close, flush on
    unlock, writeback or an inode update, leaving us in a state where we
    still need to do the writeback or update.

    (*) Extend lock
    (*) Release lock

    We don't want to get lock extension interrupted as the file locks on
    the server are time-limited. Interruption during lock release is less
    of an issue since the lock is time-limited, but it's better to
    complete the release to avoid a several-minute wait to recover it.

    *Setting* the lock isn't a problem if it's interrupted since we can
    just return to the user and tell them they were interrupted - at
    which point they can elect to retry.

    (*) Silly unlink

    We want to remove silly unlink files if we can, rather than leaving
    them for the salvager to clear up.

    Note that whilst these calls are no longer interruptible, they do have
    timeouts on them, so if the server stops responding the call will fail with
    something like ETIME or ECONNRESET.

    Without this, the following:

    kAFS: Unexpected error from FS.StoreData -512

    appears in dmesg when a pending store data gets interrupted and some
    processes may just hang.

    Additionally, make the code that checks/updates the server record ignore
    failure due to interruption if the main call is uninterruptible and if the
    server has an address list. The next op will check it again since the
    expiration time on the old list has passed.

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Reported-by: Jonathan Billings
    Reported-by: Marc Dionne
    Signed-off-by: David Howells

    David Howells
     

15 May, 2019

1 commit

  • Fix afs_release() to go through the cleanup part of the function if
    FMODE_WRITE is set rather than exiting through vfs_fsync() (which skips the
    cleanup). The cleanup involves discarding the refs on the key used for
    file ops and the writeback key record.

    Also fix afs_evict_inode() to clean up any left over wb keys attached to
    the inode/vnode when it is removed.

    Fixes: 5a8132761609 ("afs: Do better accretion of small writes on newly created content")
    Signed-off-by: David Howells

    David Howells
     

07 May, 2019

2 commits

  • While it's not possible to give an accurate number for the blocks
    used on the server, populate i_blocks based on the file size so
    that 'du' can give a reasonable estimate.

    The value is rounded up to 1K granularity, for consistency with
    what other AFS clients report, and the servers' 1K usage quota
    unit. Note that the value calculated by 'du' at the root of a
    volume can still be slightly lower than the quota usage on the
    server, as 0-length files are charged 1 quota block, but are
    reported as occupying 0 blocks. Again, this is consistent with
    other AFS clients.
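
The rounding described above is simple arithmetic. A minimal sketch, assuming i_blocks is counted in 512-byte units (as st_blocks is) and using a hypothetical helper name, `blocks_from_size()`:

```c
#include <stdint.h>

/*
 * Estimate the block count for a file of the given size: round the size
 * up to whole 1K units, then express that in 512-byte blocks.
 * (Sizes within 1023 bytes of UINT64_MAX would wrap; ignored here.)
 */
static uint64_t blocks_from_size(uint64_t size)
{
	return ((size + 1023) >> 10) << 1;
}
```

A 1-byte file therefore reports 2 blocks (1K), a 1025-byte file reports 4 blocks (2K), and an empty file reports 0 blocks even though the server charges it 1 quota block.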

    Signed-off-by: Marc Dionne
    Signed-off-by: David Howells

    Marc Dionne
     
  • Log more information when "kAFS: AFS vnode with undefined type\n" is
    displayed due to a vnode record being retrieved from the server that
    appears to have a duff file type (usually 0). This prints more information
    to try and help pin down the problem.

    Signed-off-by: David Howells

    David Howells
     

25 Apr, 2019

1 commit

  • Implement sillyrename for AFS unlink and rename, using the NFS variant
    implementation as a basis.

    Note that the asynchronous file locking extender/releaser has to be
    notified with a state change to stop it complaining if there's a race
    between that and the actual file deletion.

    A tracepoint, afs_silly_rename, is also added to note the silly rename and
    the cleanup. The afs_edit_dir tracepoint is given some extra reason
    indicators and the afs_flock_ev tracepoint is given a silly-delete file
    lock cancellation indicator.

    Signed-off-by: David Howells

    David Howells
     

13 Apr, 2019

1 commit

  • get_seconds() has a limited range on 32-bit architectures and is
    deprecated because of that. While AFS uses the same limits for
    its inode timestamps on the wire protocol, let's just use the
    simpler current_time() as we do for other file systems.

    This will still zero out the 'tv_nsec' field of the timestamps
    internally.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David Howells

    Arnd Bergmann
     

17 Jan, 2019

2 commits

  • Fix the refcounting of the authentication keys in the file locking code.
    The vnode->lock_key member points to a key on which it expects to be
    holding a ref, but it isn't always given an extra ref, however.

    Fixes: 0fafdc9f888b ("afs: Fix file locking")
    Signed-off-by: David Howells

    David Howells
     
  • A cb_interest record is not necessarily attached to the vnode on entry to
    afs_validate(), which can cause an oops when we try to bring the vnode's
    cb_s_break up to date in the default case (ie. no current callback promise
    and the vnode has not been deleted).

    Fix this by simply removing the line, as vnode->cb_s_break will be set when
    needed by afs_register_server_cb_interest() when we next get a callback
    promise from an RPC call.

    The oops looks something like:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    ...
    RIP: 0010:afs_validate+0x66/0x250 [kafs]
    ...
    Call Trace:
    afs_d_revalidate+0x8d/0x340 [kafs]
    ? __d_lookup+0x61/0x150
    lookup_dcache+0x44/0x70
    ? lookup_dcache+0x44/0x70
    __lookup_hash+0x24/0xa0
    do_unlinkat+0x11d/0x2c0
    __x64_sys_unlink+0x23/0x30
    do_syscall_64+0x4d/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: ae3b7361dc0e ("afs: Fix validation/callback interaction")
    Signed-off-by: Marc Dionne
    Signed-off-by: David Howells

    Marc Dionne
     

30 Nov, 2018

1 commit

  • When afs_validate() is called to validate a vnode (inode), there are two
    unhandled cases in the fastpath at the top of the function:

    (1) If the vnode is promised (AFS_VNODE_CB_PROMISED is set), the break
    counters match and the data has expired, then there's an implicit case
    in which the vnode needs revalidating.

    This has no consequences since the default "valid = false" set at the
    top of the function happens to do the right thing.

    (2) If the vnode is not promised and it hasn't been deleted
    (AFS_VNODE_DELETED is not set) then there's a default case we're not
    handling in which the vnode is invalid. If the vnode is invalid, we
    need to bring cb_s_break and cb_v_break up to date before we refetch
    the status.

    As a consequence, once the server loses track of the client
    (ie. sufficient time has passed since we last sent it an operation),
    it will send us a CB.InitCallBackState* operation when we next try to
    talk to it. This calls afs_init_callback_state() which increments
    afs_server::cb_s_break, but this then doesn't propagate to the
    afs_vnode record.

    The result being that every afs_validate() call thereafter sends a
    status fetch operation to the server.

    Clarify and fix this by:

    (A) Setting valid in all the branches rather than initialising it at the
    top so that the compiler catches where we've missed.

    (B) Restructuring the logic in the 'promised' branch so that we set valid
    to false if the callback is due to expire (or has expired) and so that
    the final case is that the vnode is still valid.

    (C) Adding an else-statement that ups cb_s_break and cb_v_break if the
    promised and deleted cases don't match.
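
The restructured fastpath in (A) to (C) can be modelled as a function in which every branch assigns `valid`. This is a simplified sketch with hypothetical names; the inputs are collapsed into booleans, and treating a deleted vnode as needing no refetch is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct validate_in {
	bool promised;		/* a callback promise is recorded */
	bool deleted;		/* vnode is known to be deleted */
	bool breaks_match;	/* vnode break counters match the server's */
	bool expired;		/* promise has expired or is about to */
};

/*
 * Returns whether the vnode can be used without refetching its status.
 * *resync_breaks is set when the break counters must be brought up to
 * date before the refetch.
 */
static bool validate_fastpath(const struct validate_in *in,
			      bool *resync_breaks)
{
	bool valid;	/* no initialiser: each branch below must set it */

	assert(in != NULL && resync_breaks != NULL);
	*resync_breaks = false;

	if (in->promised) {
		if (!in->breaks_match || in->expired)
			valid = false;
		else
			valid = true;	/* final case: still valid */
	} else if (in->deleted) {
		valid = true;		/* assumption: nothing to refetch */
	} else {
		/* Neither promised nor deleted: resync, then refetch. */
		*resync_breaks = true;
		valid = false;
	}

	return valid;
}
```

Leaving `valid` without an initialiser is the point of (A): a compiler that warns about possibly-uninitialised variables will flag any branch that forgets to set it.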

    Fixes: c435ee34551e ("afs: Overhaul the callback handling")
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

24 Oct, 2018

3 commits


14 May, 2018

1 commit

  • It's possible for an AFS file server to issue a whole-volume notification
    that callbacks on all the vnodes in the volume have been broken. This is
    done for R/O and backup volumes (which don't have per-file callbacks) and
    for things like a volume being taken offline.

    Fix callback handling to detect whole-volume notifications, to track it
    across operations and to check it during inode validation.

    Fixes: c435ee34551e ("afs: Overhaul the callback handling")
    Signed-off-by: David Howells

    David Howells