02 May, 2020

1 commit

  • commit c4bfda16d1b40d1c5941c61b5aa336bdd2d9904a upstream.

    When an operation is meant to be done uninterruptibly (such as
    FS.StoreData), we should not be allowing volume and server record checking
    to be interrupted.

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

09 Jan, 2020

1 commit

  • [ Upstream commit 9bd0160d12370a076e44f8d1320cde9c83f2c647 ]

    afs_find_server tries to find a server that has an address that
    matches the transport address of an rxrpc peer. The code assumes
    that the transport address is always ipv6, with ipv4 represented
    as ipv4 mapped addresses, but that's not the case. If the transport
    family is AF_INET, srx->transport.sin6.sin6_addr.s6_addr32[] will
    be beyond the actual ipv4 address and will always be 0, and all
    ipv4 addresses will be seen as matching.

    As a result, the first ipv4 address seen on any server will be
    considered a match, and the server returned may be the wrong one.

    One of the consequences is that callbacks received over ipv4 will
    only be correctly applied for the server that happens to have the
    first ipv4 address on the fs_addresses4 list. Callbacks over ipv4
    from all other servers are dropped, causing the client to serve stale
    data.

    This is fixed by looking at the transport family, and comparing ipv4
    addresses based on a sockaddr_in structure rather than a sockaddr_in6.
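
    A minimal userspace sketch of the family-aware comparison described
    above. The union and the helper name addr_matches() are assumptions made
    for illustration; this is not the kernel's afs_find_server() code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a transport address that may be IPv4 or IPv6. */
union transport_addr {
	struct sockaddr     sa;
	struct sockaddr_in  sin;
	struct sockaddr_in6 sin6;
};

/* Compare two addresses using the structure that matches their family. */
static bool addr_matches(const union transport_addr *a,
			 const union transport_addr *b)
{
	assert(a && b);

	if (a->sa.sa_family != b->sa.sa_family)
		return false;

	switch (a->sa.sa_family) {
	case AF_INET:
		/* Read the IPv4 fields; never peek at sin6 for an AF_INET address. */
		return a->sin.sin_port == b->sin.sin_port &&
		       a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
	case AF_INET6:
		return a->sin6.sin6_port == b->sin6.sin6_port &&
		       memcmp(&a->sin6.sin6_addr, &b->sin6.sin6_addr,
			      sizeof(struct in6_addr)) == 0;
	default:
		return false;
	}
}
```

    With this shape, two distinct AF_INET addresses no longer compare equal
    merely because the bytes beyond the IPv4 address are zero.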

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: Marc Dionne
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin

    Marc Dionne
     

11 Jul, 2019

1 commit

  • Pull afs updates from David Howells:
    "A set of minor changes for AFS:

    - Remove an unnecessary check in afs_unlink()

    - Add a tracepoint for tracking callback management

    - Add a tracepoint for afs_server object usage

    - Use struct_size()

    - Add mappings for AFS UAE abort codes to Linux error codes, using
    symbolic names rather than hex numbers in the .c file"

    * tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    afs: Add support for the UAE error table
    fs/afs: use struct_size() in kzalloc()
    afs: Trace afs_server usage
    afs: Add some callback management tracepoints
    afs: afs_unlink() doesn't need to check dentry->d_inode

    Linus Torvalds
     

21 Jun, 2019

1 commit


31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

16 May, 2019

2 commits

  • Make certain RPC operations non-interruptible, including:

    (*) Set attributes
    (*) Store data

    We don't want to get interrupted during a flush on close, flush on
    unlock, writeback or an inode update, leaving us in a state where we
    still need to do the writeback or update.

    (*) Extend lock
    (*) Release lock

    We don't want to get lock extension interrupted as the file locks on
    the server are time-limited. Interruption during lock release is less
    of an issue since the lock is time-limited, but it's better to
    complete the release to avoid a several-minute wait to recover it.

    *Setting* the lock isn't a problem if it's interrupted since we can
    just return to the user and tell them they were interrupted - at
    which point they can elect to retry.

    (*) Silly unlink

    We want to remove silly unlink files if we can, rather than leaving
    them for the salvager to clear up.

    Note that whilst these calls are no longer interruptible, they do have
    timeouts on them, so if the server stops responding the call will fail with
    something like ETIME or ECONNRESET.

    Without this, the following:

    kAFS: Unexpected error from FS.StoreData -512

    appears in dmesg when a pending store data gets interrupted and some
    processes may just hang.

    Additionally, make the code that checks/updates the server record ignore
    failure due to interruption if the main call is uninterruptible and if the
    server has an address list. The next op will check it again since the
    expiration time on the old list has passed.

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Reported-by: Jonathan Billings
    Reported-by: Marc Dionne
    Signed-off-by: David Howells

    David Howells
     
  • afs_check/update_server_record() should be setting fc->error rather than
    fc->ac.error as they're called from within the cursor iteration function.

    afs_fs_cursor::error is where the error code of the attempt to call the
    operation on multiple servers is integrated and is the final result,
    whereas afs_addr_cursor::error is used to hold the error from individual
    iterations of the call loop. (Note there's also an afs_vl_cursor which
    also wraps afs_addr_cursor for accessing VL servers rather than file
    servers).

    Fix this by setting fc->error in the afs_check/update_server_record() so
    that any error incurred whilst talking to the VL server correctly
    propagates to the final result.

    This results in:

    kAFS: Unexpected error from FS.StoreData -512

    being seen, even though the store-data op is non-interruptible. The error
    is actually coming from the server record update getting interrupted.
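
    The distinction between the two error fields can be sketched as below.
    The structures are heavily simplified stand-ins and the helper
    check_server_record() is hypothetical; only the field names error and
    ac.error come from the description above:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in: holds the error from one iteration of the call loop. */
struct addr_cursor {
	int error;
};

/* Simplified stand-in: holds the final, integrated result of the operation. */
struct fs_cursor {
	struct addr_cursor ac;
	int error;
};

/*
 * Hypothetical record check called from within the cursor iteration.
 * vl_result models the outcome of talking to the VL server.
 * Returns true if iteration may continue, false if it must stop.
 */
static bool check_server_record(struct fs_cursor *fc, int vl_result)
{
	assert(fc);

	if (vl_result < 0) {
		fc->error = vl_result;	/* final result, not fc->ac.error */
		return false;
	}
	return true;
}
```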

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: David Howells

    David Howells
     

13 Apr, 2019

1 commit

  • The in-kernel afs filesystem client counts the number of server-level
    callback invalidation events (CB.InitCallBackState* RPC operations) that it
    receives from the server. This is stored in cb_s_break in various
    structures, including afs_server and afs_vnode.

    If an inode is examined by afs_validate(), say, the afs_server copy is
    compared, along with other break counters, to those in afs_vnode, and if
    one or more of the counters do not match, it is considered that the
    server's callback promise is broken. At points where this happens,
    AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be
    refetched from the server.

    afs_validate() issues an FS.FetchStatus operation to get updated metadata -
    and based on the updated data_version may invalidate the pagecache too.

    However, the break counters are also used to determine whether to note a
    new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag)
    and whether to cache the permit data included in the YFSFetchStatus record
    by the server.

    The problem comes when the server sends us a CB.InitCallBackState op. The
    first such instance doesn't cause cb_s_break to be incremented, but rather
    causes AFS_SERVER_FL_NEW to be cleared - but thereafter, say some hours
    after last use and all the volumes have been automatically unmounted and
    the server has forgotten about the client[*], this *will* likely cause an
    increment.

    [*] There are other circumstances too, such as the server restarting or
    needing to make space in its callback table.

    Note that the server won't send us a CB.InitCallBackState op until we talk
    to it again.

    So what happens is:

    (1) A mount for a new volume is attempted, an inode is created for the root
    vnode and vnode->cb_s_break and AFS_VNODE_CB_PROMISED aren't set
    immediately, as we don't have a nominated server to talk to yet - and
    we may iterate through a few to find one.

    (2) Before the operation happens, afs_fetch_status(), say, notes in the
    cursor (fc.cb_break) the break counter sum from the vnode, volume and
    server counters, but the server->cb_s_break is currently 0.

    (3) We send FS.FetchStatus to the server. The server sends us back
    CB.InitCallBackState. We increment server->cb_s_break.

    (4) Our FS.FetchStatus completes. The reply includes a callback record.

    (5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether
    the callback promise was broken by checking the break counter sum from
    step (2) against the current sum.

    This fails because of step (3), so we don't set the callback record
    and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode.

    This does not preclude the syscall from progressing, and we don't loop here
    rechecking the status, but rather assume it's good enough for one round
    only and will need to be rechecked next time.

    (6) afs_validate() is triggered on the vnode, probably called from
    d_revalidate() checking the parent directory.

    (7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't
    update vnode->cb_s_break and assumes the vnode to be invalid.

    (8) afs_validate() needs to call afs_fetch_status(). Go back to step (2)
    and repeat, every time the vnode is validated.

    This primarily affects volume root dir vnodes. Everything subsequent to
    those inherits an already incremented cb_s_break upon mounting.

    The issue is that we assume that the callback record and the cached permit
    information in a reply from the server can't be trusted after getting a
    server break - but this is wrong since the server makes sure things are
    done in the right order, holding up our ops if necessary[*].

    [*] There is an extremely unlikely scenario where a reply from before the
    CB.InitCallBackState could get its delivery deferred till after - at
    which point we think we have a promise when we don't. This, however,
    requires unlucky mass packet loss to one call.

    AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from
    a server we've never contacted before, but this should be unnecessary.
    It's also further insulated from the problem on an initial mount by
    querying the server first with FS.GetCapabilities, which triggers the
    CB.InitCallBackState.

    Fix this by

    (1) Remove AFS_SERVER_FL_NEW.

    (2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the
    calculation.

    (3) In afs_cb_is_broken(), don't include cb_s_break in the check.
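
    The effect of fixes (2) and (3) can be sketched as follows. The
    structures are simplified stand-ins, and the field name cb_v_break and
    the function signatures are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel records. */
struct server { unsigned int cb_s_break; };	/* server-level break count */
struct volume { unsigned int cb_v_break; };	/* assumed volume-level count */
struct vnode {
	unsigned int   cb_break;		/* vnode-level break count */
	struct volume *volume;
};

/* After the fix, the sum covers the vnode and volume counters only. */
static unsigned int calc_vnode_cb_break(const struct vnode *vnode)
{
	assert(vnode && vnode->volume);
	return vnode->cb_break + vnode->volume->cb_v_break;
}

/* A promise is broken if the sum changed since the pre-call snapshot. */
static bool cb_is_broken(unsigned int snapshot, const struct vnode *vnode)
{
	return snapshot != calc_vnode_cb_break(vnode);
}
```

    In this sketch, a CB.InitCallBackState arriving mid-call bumps only
    server.cb_s_break, so the snapshot still matches and the callback record
    in the reply is accepted.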

    Signed-off-by: David Howells

    David Howells
     

24 Oct, 2018

5 commits

  • Send probes to all the unprobed fileservers in a fileserver list on all
    addresses simultaneously in an attempt to find out the fastest route whilst
    not getting stuck for 20s on any server or address that we don't get a
    reply from.

    This alleviates the problem whereby attempting to access a new server can
    take a long time because the rotation algorithm ends up rotating through
    all servers and addresses until it finds one that responds.

    Signed-off-by: David Howells

    David Howells
     
  • Eliminate the address pointer from the address list cursor as it's
    redundant (ac->addrs[ac->index] can be used to find the same address) and
    address lists must be replaced rather than being rearranged, so is of
    limited value.

    Signed-off-by: David Howells

    David Howells
     
  • Implement support for talking to YFS-variant fileservers in the cache
    manager and the filesystem client. These implement upgraded services on
    the same port as their AFS services.

    YFS fileservers provide expanded capabilities over AFS.

    Signed-off-by: David Howells

    David Howells
     
  • Add a couple of tracepoints to log the production of I/O errors within the AFS
    filesystem.

    Signed-off-by: David Howells

    David Howells
     
  • Track VL servers as independent entities rather than lumping all their
    addresses together into one set and implement server-level rotation by:

    (1) Add the concept of a VL server list, where each server has its own
    separate address list. This code is similar to the FS server list.

    (2) Use the DNS resolver to retrieve a set of servers and their associated
    addresses, ports, preference and weight ratings.

    (3) In the case of a legacy DNS resolver or an address list given directly
    through /proc/net/afs/cells, create a list containing just a dummy
    server record and attach all the addresses to that.

    (4) Implement a simple rotation policy, for the moment ignoring the
    priorities and weights assigned to the servers.

    (5) Show the address list through /proc/net/afs//vlservers. This
    also displays the source and status of the data as indicated by the
    upcall.

    Signed-off-by: David Howells

    David Howells
     

15 Oct, 2018

1 commit

  • The recent patch to fix the afs_server struct leak didn't actually fix the
    bug, but rather fixed some of the symptoms. The problem is that an
    asynchronous call that holds a resource pointed to by call->reply[0] will
    find the pointer cleared in the call destructor, thereby preventing the
    resource from being cleaned up.

    In the case of the server record leak, the afs_fs_get_capabilities()
    function in devel code sets up a call with reply[0] pointing at the server
    record that should be altered when the result is obtained, but this was
    being cleared before the destructor was called, so the put in the
    destructor does nothing and the record is leaked.

    Commit f014ffb025c1 removed the additional ref obtained by
    afs_install_server(), but the removal of this ref is actually used by the
    garbage collector to mark a server record as being defunct after the record
    has expired through lack of use.

    The offending clearance of call->reply[0] upon completion in
    afs_process_async_call() has been there from the origin of the code, but
    none of the asynchronous calls actually use that pointer currently, so it
    should be safe to remove (note that synchronous calls don't involve this
    function).

    Fix this by the following means:

    (1) Revert commit f014ffb025c1.

    (2) Remove the clearance of reply[0] from afs_process_async_call().

    Without this, afs_manage_servers() will suffer an assertion failure if it
    sees a server record that didn't get used because the usage count is not 1.

    Fixes: f014ffb025c1 ("afs: Fix afs_server struct leak")
    Fixes: 08e0e7c82eea ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
    Signed-off-by: David Howells
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

12 Oct, 2018

1 commit

  • Fix a leak of afs_server structs. The routine that installs them in the
    various lookup lists and trees gets a ref on leaving the function, whether
    it added the server or a server already exists. It shouldn't increment
    the refcount if it added the server.

    The effect of this is that "rmmod kafs" will hang waiting for the leaked
    server to become unused.

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

15 Jun, 2018

1 commit

  • At the moment, afs_break_callbacks calls afs_break_one_callback() for each
    separate FID it was given, and the latter looks up the volume individually
    for each one.

    However, this is inefficient if two or more FIDs have the same vid as we
    could reuse the volume. This is complicated by cell aliasing whereby we
    may have multiple cells sharing a volume and can therefore have multiple
    callback interests for any particular volume ID.

    At the moment afs_break_one_callback() scans the entire list of volumes
    we're getting from a server and breaks the appropriate callback in every
    matching volume, regardless of cell. This scan is done for every FID.

    Optimise callback breaking by the following means:

    (1) Sort the FID list by vid so that all FIDs belonging to the same volume
    are clumped together.

    This is done through the use of an indirection table. We cannot do an
    insertion sort on the afs_callback_break array as we decode FIDs into
    it, because we subsequently also have to decode callback info into it
    that corresponds by array index only.

    We also don't really want to bubblesort afterwards if we can avoid it.

    (2) Sort the server->cb_interests array by vid so that all the matching
    volumes are grouped together. This permits the scan to stop after
    finding a record that has a higher vid.

    (3) When breaking FIDs, we try to keep server->cb_break_lock as long as
    possible, caching the start point in the array for that volume group
    as long as possible.

    It might make sense to add another layer in that list and have a
    refcounted volume ID anchor that has the matching interests attached
    to it rather than being in the list. This would allow the lock to be
    dropped without losing the cursor.
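
    The indirection-table idea from step (1) can be sketched in userspace as
    below. The struct layout and function names are hypothetical, and
    qsort() is used here for brevity; it is not what the kernel code does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified FID; only vid matters for the grouping. */
struct fid {
	unsigned int vid, vnode, unique;
};

/* qsort() has no context argument, so the array being indexed lives here. */
static const struct fid *sort_base;

/* Order two index entries by the vid of the FID each one points at. */
static int cmp_by_vid(const void *a, const void *b)
{
	const struct fid *fa = &sort_base[*(const unsigned int *)a];
	const struct fid *fb = &sort_base[*(const unsigned int *)b];

	return (fa->vid > fb->vid) - (fa->vid < fb->vid);
}

/* Fill idx[] with 0..n-1 and sort it; fids[] itself is left untouched. */
static void sort_fid_indices(const struct fid *fids, unsigned int *idx,
			     size_t n)
{
	assert(fids && idx);

	for (size_t i = 0; i < n; i++)
		idx[i] = (unsigned int)i;
	sort_base = fids;
	qsort(idx, n, sizeof(*idx), cmp_by_vid);
}
```

    Because only idx[] is permuted, data decoded later into the FID array
    still lines up with the FIDs by array index.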

    Signed-off-by: David Howells

    David Howells
     

14 May, 2018

2 commits

  • The code that looks up servers by addresses makes the assumption
    that the list of addresses for a server is sorted. It exits the
    loop if it finds that the target address is larger than the
    current candidate. As the list is not currently sorted, this
    can lead to a failure to find a matching server, which can cause
    callbacks from that server to be ignored.

    Remove the early exit case so that the complete list is searched.
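
    As a rough illustration (the function name and the use of plain 32-bit
    values are assumptions, not the kernel code), a search over an unsorted
    list has to visit every entry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of target in addrs[], or -1 if it is not present. */
static int find_addr(const uint32_t *addrs, size_t n, uint32_t target)
{
	assert(addrs || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (addrs[i] == target)
			return (int)i;
		/*
		 * No early exit when addrs[i] > target: that shortcut is
		 * only valid if the list is sorted, which it is not.
		 */
	}
	return -1;
}
```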

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: Marc Dionne
    Signed-off-by: David Howells

    Marc Dionne
     
  • When a server record is destroyed, we want to send a message to the server
    telling it that we're giving up all the callbacks it has promised us.

    Apply two fixes to this:

    (1) Only send the FS.GiveUpAllCallBacks message if we actually got a
    callback from that server. We assume this to be the case if we
    performed at least one successful FS operation on that server.

    (2) Send it to the address last used for that server rather than always
    picking the first address in the list (which might be unreachable).

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: David Howells

    David Howells
     

21 Apr, 2018

1 commit

  • AFS server records get removed from the net->fs_servers tree when
    they're deleted, but not from the net->fs_addresses{4,6} lists, which
    can lead to an oops in afs_find_server() when a server record has been
    removed, for instance during rmmod.

    Fix this by deleting the record from the by-address lists before posting
    it for RCU destruction.

    The reason this hasn't been noticed before is that the fileserver keeps
    probing the local cache manager, thereby keeping the service record
    alive, so the oops would only happen when a fileserver eventually gets
    bored and stops pinging or if the module gets rmmod'd and a call comes
    in from the fileserver during the window between the server records
    being destroyed and the socket being closed.

    The oops looks something like:

    BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
    ...
    Workqueue: kafsd afs_process_async_call [kafs]
    RIP: 0010:afs_find_server+0x271/0x36f [kafs]
    ...
    Call Trace:
    afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
    afs_deliver_to_call+0x1ee/0x5e8 [kafs]
    afs_process_async_call+0x5b/0xd0 [kafs]
    process_one_work+0x2c2/0x504
    worker_thread+0x1d4/0x2ac
    kthread+0x11f/0x127
    ret_from_fork+0x24/0x30

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

10 Apr, 2018

1 commit

  • Fix warnings raised by checker, including:

    (*) Warnings raised by unequal comparison for the purposes of sorting,
    where the endianness doesn't matter:

    fs/afs/addr_list.c:246:21: warning: restricted __be16 degrades to integer
    fs/afs/addr_list.c:246:30: warning: restricted __be16 degrades to integer
    fs/afs/addr_list.c:248:21: warning: restricted __be32 degrades to integer
    fs/afs/addr_list.c:248:49: warning: restricted __be32 degrades to integer
    fs/afs/addr_list.c:283:21: warning: restricted __be16 degrades to integer
    fs/afs/addr_list.c:283:30: warning: restricted __be16 degrades to integer

    (*) afs_set_cb_interest() is not actually used and can be removed.

    (*) afs_cell_gc_delay() should be provided with a sysctl.

    (*) afs_cell_destroy() needs to use rcu_access_pointer() to read
    cell->vl_addrs.

    (*) afs_init_fs_cursor() should be static.

    (*) struct afs_vnode::permit_cache needs to be marked __rcu.

    (*) afs_server_rcu() needs to use rcu_access_pointer().

    (*) afs_destroy_server() should use rcu_access_pointer() on
    server->addresses as the server object is no longer accessible.

    (*) afs_find_server() casts __be16/__be32 values to int in order to
    directly compare them for the purpose of finding a match in a list,
    but it should also annotate the cast with __force to avoid checker
    warnings.

    (*) afs_check_permit() accesses vnode->permit_cache outside of the RCU
    readlock, though it doesn't then access the value; the extraneous
    access is deleted.

    False positives:

    (*) Conditional locking around the code in xdr_decode_AFSFetchStatus. This
    can be dealt with in a separate patch.

    fs/afs/fsclient.c:148:9: warning: context imbalance in 'xdr_decode_AFSFetchStatus' - different lock contexts for basic block

    (*) Incorrect handling of seq-retry lock context balance:

    fs/afs/inode.c:455:38: warning: context imbalance in 'afs_getattr' - different lock contexts for basic block
    fs/afs/server.c:52:17: warning: context imbalance in 'afs_find_server' - different lock contexts for basic block
    fs/afs/server.c:128:17: warning: context imbalance in 'afs_find_server_by_uuid' - different lock contexts for basic block

    Errors:

    (*) afs_lookup_cell_rcu() needs to break out of the seq-retry loop, not go
    round again if it successfully found the workstation cell.

    (*) Fix UUID decode in afs_deliver_cb_probe_uuid().

    (*) afs_cache_permit() has a missing rcu_read_unlock() before one of the
    jumps to the someone_else_changed_it label. Move the unlock to after
    the label.

    (*) afs_vl_get_addrs_u() is using ntohl() rather than htonl() when
    encoding to XDR.

    (*) afs_deliver_yfsvl_get_endpoints() is using htonl() rather than ntohl()
    when decoding from XDR.
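
    The direction of the two conversions in the last two items can be
    sketched as below. The helper names are hypothetical; htonl() and
    ntohl() are the standard userspace functions:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Encoding to XDR: host order -> network (big-endian) order. */
static uint32_t *xdr_encode_u32(uint32_t *bp, uint32_t host_val)
{
	assert(bp);
	*bp++ = htonl(host_val);
	return bp;
}

/* Decoding from XDR: network (big-endian) order -> host order. */
static const uint32_t *xdr_decode_u32(const uint32_t *bp, uint32_t *host_val)
{
	assert(bp && host_val);
	*host_val = ntohl(*bp++);
	return bp;
}
```

    On any given host the two functions perform the same byte swap (or
    none), so swapping them gives the same result at run time; the
    distinction matters for the checker's endianness annotations and for
    readability.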

    Signed-off-by: David Howells

    David Howells
     

20 Mar, 2018

1 commit


13 Nov, 2017

9 commits

  • YFS VL servers offer an upgraded Volume Location service that can return
    IPv6 addresses to fileservers and volume servers in addition to IPv4
    addresses using the YFSVL.GetEndpoints operation which we should use if
    it's available.

    To this end:

    (1) Make rxrpc_kernel_recv_data() return the call's current service ID so
    that the caller can detect service upgrade and see what the service
    was upgraded to.

    (2) When we see a VL server address we haven't seen before, send a
    VL.GetCapabilities operation to it with the service upgrade bit set.

    If we get an upgrade to the YFS VL service, change the service ID in
    the address list for that address to use the upgraded service and set
    a flag to note that this appears to be a YFS-compatible server.

    (3) If, when a server's addresses are being looked up, we note that we
    previously detected a YFS-compatible server, then send the
    YFSVL.GetEndpoints operation rather than VL.GetAddrsU.

    (4) Build a fileserver address list from the reply of YFSVL.GetEndpoints,
    including both IPv4 and IPv6 addresses. Volume server addresses are
    discarded.

    (5) The address list is sorted by address and port now, instead of just
    address. This allows multiple servers on the same host sitting on
    different ports.
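
    The ordering in point (5) can be sketched with a comparator such as the
    one below. It is IPv4-only and converts to host order before comparing,
    both of which are simplifying assumptions; the function name is
    hypothetical:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>

/* Order by address first, then by port, so same-host entries stay distinct. */
static int cmp_addr_port(const void *a, const void *b)
{
	const struct sockaddr_in *x = a, *y = b;
	uint32_t xa, ya;
	uint16_t xp, yp;

	assert(x && y);
	xa = ntohl(x->sin_addr.s_addr);
	ya = ntohl(y->sin_addr.s_addr);
	xp = ntohs(x->sin_port);
	yp = ntohs(y->sin_port);

	if (xa != ya)
		return xa < ya ? -1 : 1;
	if (xp != yp)
		return xp < yp ? -1 : 1;
	return 0;
}
```

    Passing this to qsort() over an array of struct sockaddr_in keeps two
    servers on the same host but different ports as separate, adjacent
    entries.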

    Signed-off-by: David Howells

    David Howells
     
  • The current code assumes that volumes and servers are per-cell and are
    never shared, but this is not enforced, and, indeed, public cells do exist
    that are aliases of each other. Further, an organisation can, say, set up
    a public cell and a private cell with overlapping, but not identical, sets
    of servers. The difference is purely in the database attached to the VL
    servers.

    The current code will malfunction if it sees a server in two cells as it
    assumes global address -> server record mappings and that each server is in
    just one cell.

    Further, each server may have multiple addresses - and may have addresses
    of different families (IPv4 and IPv6, say).

    To this end, the following structural changes are made:

    (1) Server record management is overhauled:

    (a) Server records are made independent of cell. The namespace keeps
    track of them, volume records have lists of them and each vnode
    has a server on which its callback interest currently resides.

    (b) The cell record no longer keeps a list of servers known to be in
    that cell.

    (c) The server records are now kept in a flat list because there's no
    single address to sort on.

    (d) Server records are now keyed by their UUID within the namespace.

    (e) The addresses for a server are obtained with the VL.GetAddrsU
    rather than with VL.GetEntryByName, using the server's UUID as a
    parameter.

    (f) Cached server records are garbage collected after a period of
    non-use and are counted out of existence before purging is allowed
    to complete. This protects the work functions against rmmod.

    (g) The servers list is now in /proc/fs/afs/servers.

    (2) Volume record management is overhauled:

    (a) An RCU-replaceable server list is introduced. This tracks both
    servers and their corresponding callback interests.

    (b) The superblock is now keyed on cell record and numeric volume ID.

    (c) The volume record is now tied to the superblock which mounts it,
    and is activated when mounted and deactivated when unmounted.
    This makes it easier to handle the cache cookie without causing a
    double-use in fscache.

    (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
    to get the server UUID list.

    (e) The volume name is updated if it is seen to have changed when the
    volume is updated (the update is keyed on the volume ID).

    (3) The vlocation record is got rid of and VLDB records are no longer
    cached. Sufficient information is stored in the volume record, though
    an update to a volume record is now no longer shared between related
    volumes (volumes come in bundles of three: R/W, R/O and backup).

    and the following procedural changes are made:

    (1) The fileserver cursor introduced previously is now fleshed out and
    used to iterate over fileservers and their addresses.

    (2) Volume status is checked during iteration, and the server list is
    replaced if a change is detected.

    (3) Server status is checked during iteration, and the address list is
    replaced if a change is detected.

    (4) The abort code is saved into the address list cursor and -ECONNABORTED
    returned in afs_make_call() if a remote abort happened rather than
    translating the abort into an error message. This allows actions to
    be taken depending on the abort code more easily.

    (a) If a VMOVED abort is seen then this is handled by rechecking the
    volume and restarting the iteration.

    (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
    handled by sleeping for a short period and retrying and/or trying
    other servers that might serve that volume. A message is also
    displayed once until the condition has cleared.

    (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
    moment.

    (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
    see if it has been deleted; if not, the fileserver is probably
    indicating that the volume couldn't be attached and needs
    salvaging.

    (e) If statfs() sees one of these aborts, it does not sleep, but
    rather returns an error, so as not to block the umount program.

    (5) The fileserver iteration functions in vnode.c are now merged into
    their callers and more heavily macroised around the cursor. vnode.c
    is removed.

    (6) Operations on a particular vnode are serialised on that vnode because
    the server will lock that vnode whilst it operates on it, so a second
    op sent will just have to wait.

    (7) Fileservers are probed with FS.GetCapabilities before being used.
    This is where service upgrade will be done.

    (8) A callback interest on a fileserver is set up before an FS operation
    is performed and passed through to afs_make_call() so that it can be
    set on the vnode if the operation returns a callback. The callback
    interest is passed through to afs_iget() also so that it can be set
    there too.
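
    The abort handling in procedural change (4) might be sketched as a
    simple dispatch. The enum values are illustrative only and are not the
    on-the-wire AFS abort codes; the action names are hypothetical, and
    applying the statfs() exception to the busy-type aborts is an assumption
    about what "one of these aborts" covers:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; NOT the real AFS abort code numbers. */
enum abort_code {
	AB_VMOVED = 1,
	AB_VBUSY,
	AB_VRESTARTING,
	AB_VSALVAGING,
	AB_VOFFLINE,
	AB_VNOVOL,
	AB_OTHER,
};

/* Hypothetical actions the iteration loop could take. */
enum action {
	ACT_RECHECK_VOLUME_AND_RESTART,	/* (a) VMOVED */
	ACT_SLEEP_AND_RETRY,		/* (b), (c) busy-type aborts */
	ACT_RECHECK_VLDB,		/* (d) VNOVOL */
	ACT_FAIL,			/* (e) or anything else */
};

static enum action handle_abort(enum abort_code code, bool is_statfs)
{
	assert(code >= AB_VMOVED && code <= AB_OTHER);

	switch (code) {
	case AB_VMOVED:
		return ACT_RECHECK_VOLUME_AND_RESTART;
	case AB_VBUSY:
	case AB_VRESTARTING:
	case AB_VSALVAGING:
	case AB_VOFFLINE:	/* treated as VBUSY for the moment */
		/* statfs() must not sleep, so as not to block umount. */
		return is_statfs ? ACT_FAIL : ACT_SLEEP_AND_RETRY;
	case AB_VNOVOL:
		return ACT_RECHECK_VLDB;
	default:
		return ACT_FAIL;
	}
}
```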

    In general, record updating is done on an as-needed basis when we try to
    access servers, volumes or vnodes rather than offloading it to work items
    and special threads.

    Notes:

    (1) Pre AFS-3.4 servers are no longer supported, though this can be added
    back if necessary (AFS-3.4 was released in 1998).

    (2) VBUSY is retried forever for the moment at intervals of 1s.

    (3) /proc/fs/afs//servers no longer exists.

    Signed-off-by: David Howells

    David Howells
     
  • Add an RCU replaceable address list structure to hold a list of server
    addresses. The list also holds the

    To this end:

    (1) A cell's VL server address list can be loaded directly via insmod or
    echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB
    or SRV records.

    (2) Anyone wanting to use a cell's VL server address must wait until the
    cell record comes online and has tried to obtain some addresses.

    (3) An FS server's address list, for the moment, has a single entry that
    is the key to the server list. This will change in the future when a
    server is instead keyed on its UUID and the VL.GetAddrsU operation is
    used.

    (4) An 'address cursor' concept is introduced to handle iteration through
    the address list. This is passed to the afs_make_call() as, in the
    future, stuff (such as abort code) that doesn't outlast the call will
    be returned in it.

    In the future, we might want to annotate the list with information about
    how each address fares. We might then want to propagate such annotations
    over address list replacement.

    Whilst we're at it, we allow IPv6 addresses to be specified in
    colon-delimited lists by enclosing them in square brackets.
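
    A small userspace sketch of how such a list might be split. The function
    name next_addr() and its return convention are assumptions; it only
    separates the entries and does not validate them as addresses:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the next entry of a colon-delimited list into out and advance *pp.
 * An entry wrapped in square brackets may itself contain colons.
 * Returns 1 if an entry was extracted, 0 at end of list or on malformed input.
 */
static int next_addr(const char **pp, char *out, size_t outlen)
{
	const char *p, *end;
	size_t len;

	assert(pp && *pp && out);
	p = *pp;

	if (*p == '\0')
		return 0;

	if (*p == '[') {
		p++;
		end = strchr(p, ']');
		if (!end)
			return 0;		/* unterminated bracket */
		len = (size_t)(end - p);
		end++;				/* step over ']' */
		if (*end != ':' && *end != '\0')
			return 0;		/* junk after ']' */
	} else {
		len = strcspn(p, ":");
		end = p + len;
	}

	if (len == 0 || len >= outlen)
		return 0;
	memcpy(out, p, len);
	out[len] = '\0';

	if (*end == ':')
		end++;				/* step over the delimiter */
	*pp = end;
	return 1;
}
```

    Calling it repeatedly on "[2001:db8::1]:192.0.2.1" yields the IPv6
    entry first and then the IPv4 entry.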

    Signed-off-by: David Howells

    David Howells
     
  • Overhaul the AFS callback handling by the following means:

    (1) Don't give up callback promises on vnodes that we are no longer using,
    rather let them just expire on the server or let the server break
    them. This is actually more efficient for the server as the callback
    lookup is expensive if there are lots of extant callbacks.

    (2) Only give up the callback promises we have from a server when the
    server record is destroyed. Then we can just give up *all* the
    callback promises on it in one go.

    (3) Servers can end up being shared between cells if cells are aliased, so
    don't add all the vnodes being backed by a particular server into a
    big FID-indexed tree on that server as there may be duplicates.

    Instead have each volume instance (~= superblock) register an interest
    in a server as it starts to make use of it and use this to allow the
    processor for callbacks from the server to find the superblock and
    thence the inode corresponding to the FID being broken by means of
    ilookup_nowait().

    (4) Rather than iterating over the entire callback list when a mass-break
    comes in from the server, maintain a counter of mass-breaks in
    afs_server (cb_seq) and make afs_validate() check it against the copy
    in afs_vnode.

    It would be nice not to have to take a read_lock whilst doing this,
    but that's tricky without using RCU.

    (5) Save a ref on the fileserver we're using for a call in the afs_call
    struct so that we can access its cb_s_break during call decoding.

    (6) Write-lock around callback and status storage in a vnode and read-lock
    around getattr so that we don't see the status mid-update.

    This has the following consequences:

    (1) Data invalidation isn't seen until someone calls afs_validate() on a
    vnode. Unfortunately, we need to use a key to query the server, but
    getting one from a background thread is tricky without caching loads
    of keys all over the place.

    (2) Mass invalidation isn't seen until someone calls afs_validate().

    (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
    Could this be replaced with rcu_read_lock(), since inodes are destroyed
    under RCU conditions?
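
    The mass-break counter from point (4) of the first list can be sketched
    in userspace as follows. The struct layouts and helper names are
    hypothetical simplifications, and the locking mentioned in the commit
    message is omitted; only the counter comparison is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the server record. */
struct server {
    unsigned int cb_seq;        /* bumped once per mass callback break */
};

/* Hypothetical, simplified stand-in for the vnode record. */
struct vnode {
    const struct server *server;
    unsigned int cb_seq;        /* server counter when the promise was obtained */
    bool cb_promised;
};

/* Record a fresh callback promise, snapshotting the server's counter. */
static void vnode_got_callback(struct vnode *v, const struct server *s)
{
    assert(v && s);
    v->server = s;
    v->cb_seq = s->cb_seq;
    v->cb_promised = true;
}

/* A mass break touches only the server record, not every vnode. */
static void server_mass_break(struct server *s)
{
    s->cb_seq++;
}

/* Validation-time check: the promise holds only if the counters still match. */
static bool vnode_promise_valid(const struct vnode *v)
{
    return v->cb_promised && v->cb_seq == v->server->cb_seq;
}
```

    The cost of a mass break is then a single increment, and each vnode pays
    for the comparison only when it is next validated, which matches
    consequence (2) above.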

    Signed-off-by: David Howells

    David Howells
     
  • Allow VL server specifications to be given IPv6 addresses as well as IPv4
    addresses, for example as:

    echo add foo.org 1111:2222:3333:0:4444:5555:6666:7777 >/proc/fs/afs/cells

    Note that ':' is the expected delimiter between IPv4 addresses, but if a
    ',' is detected or no '.' is detected in the string, the delimiter is
    switched to ','.

    This also works with DNS AFSDB or SRV record strings fetched by upcall from
    userspace.
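
    The delimiter rule described above can be written down as a small
    helper. pick_delimiter() is a hypothetical name used for illustration
    and is not taken from the kernel source; it implements only the rule as
    stated in this message.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: choose the list delimiter for an address string.
 * ':' is the default, but a ',' anywhere in the string, or the absence of
 * any '.', switches the delimiter to ','.
 */
static char pick_delimiter(const char *text)
{
    assert(text);
    if (strchr(text, ',') || !strchr(text, '.'))
        return ',';
    return ':';
}
```

    Under this rule a bare IPv6 address contains no '.', so its colons are
    not mistaken for list separators.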

    Signed-off-by: David Howells

    David Howells
     
  • Keep and pass sockaddr_rxrpc addresses around rather than keeping and
    passing in_addr addresses to allow for the use of IPv6 and non-standard
    port numbers in future.

    This also allows the port and service_id fields to be removed from the
    afs_call struct.

    Signed-off-by: David Howells

    David Howells
     
  • Push the network namespace pointer to more places in AFS, including the
    afs_server structure (which doesn't hold a ref on the netns).

    In particular, afs_put_cell() now requires a netns parameter so that
    it can safely alter the netns after decrementing the cell usage count - the
    cell will be deallocated by a background thread after being cached for a
    period, which means that it's not safe to access it after reducing its
    usage count.
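
    The reason for passing the namespace separately can be illustrated with
    a userspace sketch. Everything here is hypothetical: net_state, its
    reaper_kicks counter and put_cell() are stand-ins for illustration, and
    the real function's behaviour is not reproduced. The only point shown is
    that nothing is read through the cell pointer after the decrement.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-namespace state. */
struct net_state {
    atomic_int reaper_kicks;    /* times the background reaper was poked */
};

/* Hypothetical cell record with a usage count. */
struct cell {
    atomic_int usage;
    struct net_state *net;      /* must not be read after the final put */
};

/*
 * Drop a reference. Once the count has been decremented, another thread
 * may free *cell, so the namespace is taken from the caller's argument
 * rather than from cell->net.
 */
static void put_cell(struct net_state *net, struct cell *cell)
{
    assert(net);
    if (!cell)
        return;
    if (atomic_fetch_sub(&cell->usage, 1) == 1)
        atomic_fetch_add(&net->reaper_kicks, 1);    /* touches net only */
}
```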

    Signed-off-by: David Howells

    David Howells
     
  • Fix server reaping and make sure it's all done before we start trying to
    purge cells, given that servers currently pin cells.

    Signed-off-by: David Howells

    David Howells
     
  • Lay the groundwork for supporting network namespaces (netns) in the AFS
    filesystem by moving various global features to a network-namespace struct
    (afs_net) and providing an instance of this as a temporary global variable
    that everything uses via accessor functions for the moment.

    The following changes have been made:

    (1) Store the netns in the superblock info. This will be obtained from
    the mounter's nsproxy on a manual mount and inherited from the parent
    superblock on an automount.

    (2) The cell list is made per-netns. It can be viewed through
    /proc/net/afs/cells and also be modified by writing commands to that
    file.

    (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
    This is unset by default.

    (4) The 'rootcell' module parameter, which sets a cell and VL server list,
    modifies the init net namespace, thereby allowing an AFS root fs to be
    theoretically used.

    (5) The volume location lists and the file lock manager are made
    per-netns.

    (6) The AF_RXRPC socket and associated I/O bits are made per-ns.

    The various workqueues remain global for the moment.

    Changes still to be made:

    (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
    from the old name.

    (2) A per-netns subsys needs to be registered for AFS into which it can
    store its per-netns data.

    (3) Rather than the AF_RXRPC socket being opened on module init, it needs
    to be opened on the creation of a superblock in that netns.

    (4) The socket needs to be closed when the last superblock using it is
    destroyed and all outstanding client calls on it have been completed.
    This prevents a reference loop on the namespace.

    (5) It is possible that several namespaces will want to use AFS, in which
    case each one will need its own UDP port. These can either be set
    through /proc/net/afs/cm_port or the kernel can pick one at random.
    The init_ns gets 7001 by default.

    Other issues that need resolving:

    (1) The DNS keyring needs net-namespacing.

    (2) Where do upcalls go (eg. DNS request-key upcall)?

    (3) Need something like open_socket_in_file_ns() syscall so that AFS
    command line tools attempting to operate on an AFS file/volume have
    their RPC calls go to the right place.

    Signed-off-by: David Howells

    David Howells
     

17 Mar, 2017

1 commit

  • get_seconds() returns real wall-clock seconds. On 32-bit systems
    this value will overflow in year 2038 and beyond. This patch changes
    afs's vlocation record to use ktime_get_real_seconds() instead, for the
    fields time_of_death and update_at.
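
    The overflow can be made concrete with a short check, assuming the
    affected fields behave like a signed 32-bit seconds count on 32-bit
    systems. The struct and helper below are hypothetical and only model the
    range problem; ktime_get_real_seconds() itself is a kernel function and
    is not called here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record using 64-bit seconds, as the patch intends. */
struct vlocation_times {
    int64_t time_of_death;
    int64_t update_at;
};

/* Can this wall-clock value be stored in a signed 32-bit seconds field? */
static int fits_in_32bit_seconds(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}
```

    INT32_MAX seconds after the epoch is 03:14:07 UTC on 19 January 2038;
    any later timestamp needs the wider type.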

    Signed-off-by: Tina Ruchandani
    Signed-off-by: David Howells

    Tina Ruchandani
     

30 Aug, 2016

1 commit

  • Provide a function so that kernel users, such as AFS, can ask for the peer
    address of a call:

    void rxrpc_kernel_get_peer(struct rxrpc_call *call,
                               struct sockaddr_rxrpc *_srx);

    In the future the kernel service won't get sk_buffs to look inside.
    Further, this allows us to hide any canonicalisation inside AF_RXRPC for
    when IPv6 support is added.

    Also propagate this through to afs_find_server() and issue a warning if we
    can't handle the address family yet.
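
    A server lookup that checks the peer's address family first might look
    like the userspace sketch below. server_rec, set_ipv4() and
    find_server() are hypothetical names, plain socket address types stand
    in for sockaddr_rxrpc, and treating only AF_INET as supported is an
    assumption made to mirror the "yet" in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical server record keyed on a single transport address. */
struct server_rec {
    const char *name;
    struct sockaddr_storage addr;
};

/* Fill in an IPv4 address from text; returns 1 on success (inet_pton). */
static int set_ipv4(struct sockaddr_storage *ss, const char *text)
{
    struct sockaddr_in *sin = (struct sockaddr_in *)ss;

    memset(ss, 0, sizeof(*ss));
    sin->sin_family = AF_INET;
    return inet_pton(AF_INET, text, &sin->sin_addr);
}

/* Find the server whose address matches the peer, or NULL if none does. */
static const struct server_rec *find_server(const struct server_rec *servers,
                                            size_t nr_servers,
                                            const struct sockaddr_storage *peer)
{
    const struct sockaddr_in *want = (const struct sockaddr_in *)peer;

    assert(servers && peer);
    if (peer->ss_family != AF_INET) {
        /* Warn rather than misinterpret an address we cannot handle. */
        fprintf(stderr, "find_server: unsupported address family %d\n",
                (int)peer->ss_family);
        return NULL;
    }
    for (size_t i = 0; i < nr_servers; i++) {
        const struct sockaddr_in *have =
            (const struct sockaddr_in *)&servers[i].addr;

        if (servers[i].addr.ss_family == AF_INET &&
            have->sin_addr.s_addr == want->sin_addr.s_addr)
            return &servers[i];
    }
    return NULL;
}
```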

    Signed-off-by: David Howells

    David Howells
     

14 Aug, 2012

1 commit

  • Convert delayed_work users doing cancel_delayed_work() followed by
    queue_delayed_work() to mod_delayed_work().

    Most conversions are straightforward. The ones worth mentioning are:

    * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
    use mod_delayed_work() and cancel loop in
    edac_mc_reset_delay_period() is dropped.

    * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
    watchdog is active or not. @fan_watchdog_active and related code
    dropped.

    * drivers/power/charger-manager.c: Seemingly a lot of
    delayed_work_pending() abuse going on here.
    [delayed_]work_pending() are unsynchronized and racy when used like
    this. I converted one instance in fullbatt_handler(). Please
    convert the rest so that it invokes workqueue APIs for the intended
    target state rather than trying to game work item pending state
    transitions. e.g. if timer should be modified - call
    mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().

    * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
    simplified. Note that round_jiffies() calls in this function are
    meaningless. round_jiffies() works on absolute jiffies, not the delta
    delay used by delayed_work.

    v2: Tomi pointed out that __cancel_delayed_work() users can't be
    safely converted to mod_delayed_work(). They could be calling it
    from irq context and if that happens while delayed_work_timer_fn()
    is running, it could deadlock. __cancel_delayed_work() users are
    dropped.

    Signed-off-by: Tejun Heo
    Acked-by: Henrique de Moraes Holschuh
    Acked-by: Dmitry Torokhov
    Acked-by: Anton Vorontsov
    Acked-by: David Howells
    Cc: Tomi Valkeinen
    Cc: Jens Axboe
    Cc: Jiri Kosina
    Cc: Doug Thompson
    Cc: David Airlie
    Cc: Roland Dreier
    Cc: "John W. Linville"
    Cc: Zhang Rui
    Cc: Len Brown
    Cc: "J. Bruce Fields"
    Cc: Johannes Berg

    Tejun Heo
     

15 Jan, 2011

1 commit

  • flush_scheduled_work() is going away. afs needs to make sure all the
    work items it has queued have finished before being unloaded, and there
    can be an arbitrary number of pending work items. Add afs_wq and use it
    as the flush domain instead of the system workqueue.

    Also, convert cancel_delayed_work() + flush_scheduled_work() to
    cancel_delayed_work_sync() in afs_mntpt_kill_timer().

    Signed-off-by: Tejun Heo
    Signed-off-by: David Howells
    Cc: linux-afs@lists.infradead.org
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

02 Jun, 2010

1 commit


31 Oct, 2008

1 commit


17 Oct, 2007

1 commit

  • This patch contains the following possible cleanups:
    - make the following needlessly global functions static:
      - rxrpc.c: afs_send_pages()
      - vlocation.c: afs_vlocation_queue_for_updates()
      - write.c: afs_writepages_region()
    - make the following needlessly global variables static:
      - mntpt.c: afs_mntpt_expiry_timeout
      - proc.c: afs_vlocation_states[]
      - server.c: afs_server_timeout
      - vlocation.c: afs_vlocation_timeout
      - vlocation.c: afs_vlocation_update_timeout
    - #if 0 the following unused function:
      - cell.c: afs_get_cell_maybe()
    - #if 0 the following unused variables:
      - callback.c: afs_vnode_update_timeout
      - cmservice.c: struct afs_cm_workqueue

    Signed-off-by: Adrian Bunk
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

10 May, 2007

1 commit

  • Make some miscellaneous changes to the AFS filesystem:

    (1) Assert RCU barriers on module exit to make sure RCU has finished with
    callbacks in this module.

    (2) Correctly handle the AFS server returning a zero-length read.

    (3) Split out data zapping calls into one function (afs_zap_data).

    (4) Rename some afs_file_*() functions to afs_*() where they apply to
    non-regular files too.

    (5) Be consistent about the presentation of volume ID:vnode ID in debugging
    output.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

27 Apr, 2007

2 commits

  • Add support for the create, link, symlink, unlink, mkdir, rmdir and
    rename VFS operations to the in-kernel AFS filesystem.

    Also:

    (1) Fix dentry and inode revalidation. d_revalidate should only look at
    the state of the dentry. Revalidation of the contents of an inode pointed
    to by a dentry is now separate.

    (2) Fix afs_lookup() to hash negative dentries as well as positive ones.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells