04 Jun, 2020

2 commits

  • Put in the first phase of cell alias detection. This part handles alias
    detection for cells that have root.cell volumes (which is expected to be
    likely).

    When a cell becomes newly active, it is probed for its root.cell volume,
    and if it has one, this volume is compared against other root.cell volumes
    to find out whether their lists of fileserver UUIDs have any in common -
    and, if so, whether the address lists of those fileservers have any
    addresses in common. If they do, the new cell is adjudged to be an alias
    of the old cell and the old cell is used instead.

    Comparing is aided by the server list in struct afs_server_list being
    sorted in UUID order and the addresses in the fileserver address lists
    being sorted in address order.
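
    As an illustration only (a Python sketch, not the kernel's C code), the
    comparison can be done as a merge-style walk over the two sorted lists.
    The data layout and the names sorted_lists_intersect() and
    cells_are_aliases() are assumptions made for this example:

```python
from typing import List, Tuple

ServerList = List[Tuple[str, List[str]]]  # (uuid, sorted addresses), sorted by uuid


def sorted_lists_intersect(a: List[str], b: List[str]) -> bool:
    """Walk two sorted lists in step; True if they share any element."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False


def cells_are_aliases(new_cell: ServerList, old_cell: ServerList) -> bool:
    """True if some fileserver UUID is in both lists AND shares an address."""
    i = j = 0
    while i < len(new_cell) and j < len(old_cell):
        new_uuid, new_addrs = new_cell[i]
        old_uuid, old_addrs = old_cell[j]
        if new_uuid == old_uuid:
            # A matching UUID alone is not enough: cloned cells may reuse UUIDs.
            if sorted_lists_intersect(new_addrs, old_addrs):
                return True
            i += 1
            j += 1
        elif new_uuid < old_uuid:
            i += 1
        else:
            j += 1
    return False
```

    Because both levels are sorted, each comparison is linear in the combined
    list length rather than quadratic.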

    The cell then retains the afs_volume object for the root.cell volume, even
    if it's not mounted, for future alias checking.

    This is necessary because:

    (1) Whilst fileservers have UUIDs that are meant to be globally unique, in
    practice they are not because cells get cloned without changing the
    UUIDs - so afs_server records need to be per cell.

    (2) Sometimes the DNS is used to make cell aliases - but if we don't know
    they're the same, we may end up with multiple superblocks and multiple
    afs_server records for the same thing, impairing our ability to
    deliver callback notifications of third party changes.

    (3) The fileserver RPC API doesn't contain the cell name, so it can't tell
    us which cell it's notifying and can't see that a change made to
    one cell should notify the same client that's also accessed as the
    other cell.

    Reported-by: Jeffrey Altman
    Signed-off-by: David Howells

    David Howells
     
  • Turn the afs_operation struct into the main way that most fileserver
    operations are managed. Various things are added to the struct, including
    the following:

    (1) All the parameters and results of the relevant operations are moved
    into it, removing corresponding fields from the afs_call struct.
    afs_call gets a pointer to the op.

    (2) The target volume is made the main focus of the operation, rather than
    the target vnode(s), and a bunch of op->vnode->volume are made
    op->volume instead.

    (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
    in most operations. The vnode record (struct afs_vnode_param)
    contains:

    - The vnode pointer.

    - The fid of the vnode to be included in the parameters or that was
    returned in the reply (eg. FS.MakeDir).

    - The status and callback information that may be returned in the
    reply about the vnode.

    - Callback break and data version tracking for detecting
    simultaneous third-party changes.

    (4) Pointers to dentries to be updated with new inodes.

    (5) An operations table pointer. The table includes pointers to functions
    for issuing AFS and YFS-variant RPCs, handling the success and abort
    of an operation and handling post-I/O-lock local editing of a
    directory.

    To make this work, the following function restructuring is made:

    (A) The rotation loop that issues calls to fileservers that can be found
    in each function that wants to issue an RPC (such as afs_mkdir()) is
    extracted out into common code, in a new file called fs_operation.c.

    (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
    a much smaller piece of code that allocates an operation, sets the
    parameters and then calls out to the common code to do the actual
    work.

    (C) The code for handling the success and failure of an operation is
    moved into operation functions (as (5) above) and these are called
    from the core code at appropriate times.

    (D) The pseudo inode getting stuff used by the dynamic root code is moved
    over into dynroot.c.

    (E) struct afs_iget_data is absorbed into the operation struct and
    afs_iget() expects to be given an op pointer and a vnode record.

    (F) Point (E) doesn't work for the root dir of a volume, but we know the
    FID in advance (it's always vnode 1, unique 1), so a separate inode
    getter, afs_root_iget(), is provided to special-case that.

    (G) The inode status init/update functions now also take an op and a vnode
    record.

    (H) The RPC marshalling functions now, for the most part, just take an
    afs_operation struct as their only argument. All the data they need
    is held there. The result delivery functions write their answers
    there as well.

    (I) The call is attached to the operation and then the operation core does
    the waiting.

    And then the new operation code is, for the moment, made to just initialise
    the operation, get the appropriate vnode I/O locks and do the same rotation
    loop as before.
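
    The overall shape can be sketched as follows. This is a Python
    illustration under stated assumptions, not the kernel's C structures: the
    class names, the fields and run_operation() are invented for the example,
    and the rotation is reduced to "try each server in turn":

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VnodeParam:
    """Per-vnode record: the fid sent or returned, plus any status in the reply."""
    fid: str = ""
    status: Optional[dict] = None


@dataclass
class OpsTable:
    """Operation-specific hooks called by the common code."""
    issue_rpc: Callable[["Operation", str], dict]
    success: Callable[["Operation"], None]
    aborted: Callable[["Operation"], None]


@dataclass
class Operation:
    volume: str                      # the volume is the focus of the operation
    servers: List[str]
    ops: OpsTable
    file: List[VnodeParam] = field(
        default_factory=lambda: [VnodeParam(), VnodeParam()])
    error: Optional[str] = None


def run_operation(op: Operation) -> bool:
    """Common rotation loop: try each server until one answers."""
    for server in op.servers:
        try:
            reply = op.ops.issue_rpc(op, server)
        except ConnectionError as exc:
            op.error = str(exc)      # remember the failure, move to next server
            continue
        op.file[0].status = reply    # results are delivered into the op
        op.error = None
        op.ops.success(op)
        return True
    op.ops.aborted(op)
    return False
```

    A caller then only allocates an Operation, fills in its parameters,
    supplies an OpsTable and calls run_operation().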

    This lays the foundation for the following changes in the future:

    (*) Overhauling the rotation (again).

    (*) Support for asynchronous I/O, where the fileserver rotation must be
    done asynchronously also.

    Signed-off-by: David Howells

    David Howells
     

03 Jun, 2019

1 commit

  • David Howells says:
    I'm told that there's not really any point populating the list.
    Current OpenAFS ignores it, as does AuriStor - and IBM AFS 3.6 will
    do the right thing.
    The list is actually useless as it's the client's view of the world,
    not the server's, so if there's any NAT in the way its contents are
    invalid. Further, it doesn't support IPv6 addresses.

    On that basis, feel free to make it an empty list and remove all the
    interface enumeration.

    V1 of this patch reworked the function to use a new helper for the
    ifa_list iteration to avoid sparse warnings once the proper __rcu
    annotations get added in struct in_device later.

    But, in light of the above, just remove afs_get_ipv4_interfaces.

    Compile tested only.

    Cc: David Howells
    Cc: linux-afs@lists.infradead.org
    Signed-off-by: Florian Westphal
    Tested-by: David Howells
    Signed-off-by: David S. Miller

    Florian Westphal
     

25 Apr, 2019

1 commit

  • Implement sillyrename for AFS unlink and rename, using the NFS variant
    implementation as a basis.

    Note that the asynchronous file locking extender/releaser has to be
    notified with a state change to stop it complaining if there's a race
    between that and the actual file deletion.

    A tracepoint, afs_silly_rename, is also added to note the silly rename and
    the cleanup. The afs_edit_dir tracepoint is given some extra reason
    indicators and the afs_flock_ev tracepoint is given a silly-delete file
    lock cancellation indicator.
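
    The general sillyrename idea can be illustrated with a small in-memory
    model in Python. This is a sketch only: the SillyDir class, its fields and
    the ".__silly%04X" name pattern are assumptions for the example and are
    not taken from the patch:

```python
import itertools
from typing import Dict, Optional


class SillyDir:
    """Toy directory: unlinking an open file renames it instead of deleting it."""

    def __init__(self) -> None:
        self.entries: Dict[str, int] = {}      # name -> file id
        self.open_counts: Dict[int, int] = {}  # file id -> number of opens
        self.silly: Dict[int, str] = {}        # file id -> temporary silly name
        self._counter = itertools.count(1)

    def unlink(self, name: str) -> Optional[str]:
        """Delete name, or silly-rename it if the file is still open."""
        file_id = self.entries.pop(name)
        if self.open_counts.get(file_id, 0) > 0:
            silly_name = ".__silly%04X" % next(self._counter)  # hypothetical
            self.entries[silly_name] = file_id
            self.silly[file_id] = silly_name
            return silly_name
        return None

    def close(self, file_id: int) -> None:
        """On last close, clean up a file that was silly-renamed earlier."""
        self.open_counts[file_id] -= 1
        if self.open_counts[file_id] == 0 and file_id in self.silly:
            del self.entries[self.silly.pop(file_id)]
```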

    Signed-off-by: David Howells

    David Howells
     

24 Oct, 2018

3 commits

  • Send probes to all the unprobed fileservers in a fileserver list on all
    addresses simultaneously in an attempt to find out the fastest route whilst
    not getting stuck for 20s on any server or address that we don't get a
    reply from.

    This alleviates the problem whereby attempting to access a new server can
    take a long time because the rotation algorithm ends up rotating through
    all servers and addresses until it finds one that responds.
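
    The idea of probing every address at once and taking the first reply can
    be sketched with Python's asyncio. This is an illustration only: the
    probes are simulated with sleeps, and probe() and fastest_address() are
    names invented for the example:

```python
import asyncio
from typing import Dict, Optional


async def probe(addr: str, rtt: Optional[float]) -> str:
    """Simulated probe: reply after rtt seconds, or never if rtt is None."""
    if rtt is None:
        await asyncio.sleep(3600)  # stands in for an address that never replies
    else:
        await asyncio.sleep(rtt)
    return addr


async def fastest_address(rtts: Dict[str, Optional[float]],
                          timeout: float) -> Optional[str]:
    """Probe all addresses concurrently; return the first to answer, if any."""
    tasks = [asyncio.create_task(probe(addr, rtt)) for addr, rtt in rtts.items()]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()              # stop waiting on slow or dead addresses
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result() if done else None
```

    The total wait is bounded by the single timeout rather than by one
    timeout per unresponsive address.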

    Signed-off-by: David Howells

    David Howells
     
  • Implement support for talking to YFS-variant fileservers in the cache
    manager and the filesystem client. These implement upgraded services on
    the same port as their AFS services.

    YFS fileservers provide expanded capabilities over AFS.

    Signed-off-by: David Howells

    David Howells
     
  • Track VL servers as independent entities rather than lumping all their
    addresses together into one set and implement server-level rotation by:

    (1) Add the concept of a VL server list, where each server has its own
    separate address list. This code is similar to the FS server list.

    (2) Use the DNS resolver to retrieve a set of servers and their associated
    addresses, ports, preference and weight ratings.

    (3) In the case of a legacy DNS resolver or an address list given directly
    through /proc/net/afs/cells, create a list containing just a dummy
    server record and attach all the addresses to that.

    (4) Implement a simple rotation policy, for the moment ignoring the
    priorities and weights assigned to the servers.

    (5) Show the address list through /proc/net/afs//vlservers. This
    also displays the source and status of the data as indicated by the
    upcall.
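
    A minimal Python sketch of points (1), (3) and (4), assuming invented
    names (VLServer, from_flat_address_list(), rotate()) and a plain
    round-robin order; it is not the kernel implementation:

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class VLServer:
    name: str
    addresses: List[str]
    priority: int = 0   # carried along but ignored by the simple policy
    weight: int = 0     # likewise ignored for the moment


def from_flat_address_list(addresses: List[str]) -> List[VLServer]:
    """Legacy case: wrap a bare address list in a single dummy server record."""
    return [VLServer(name="<dummy>", addresses=list(addresses))]


def rotate(servers: List[VLServer], start: int = 0) -> Iterator[Tuple[str, str]]:
    """Yield (server, address) pairs, server by server, beginning at start."""
    count = len(servers)
    for offset in range(count):
        server = servers[(start + offset) % count]
        for addr in server.addresses:
            yield server.name, addr
```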

    Signed-off-by: David Howells

    David Howells
     

15 Jun, 2018

1 commit

  • The AFS filesystem depends at the moment on /proc for configuration and
    also presents information that way - however, this causes a compilation
    failure if procfs is disabled.

    Fix it so that the procfs bits aren't compiled in if procfs is disabled.

    This means that you can't configure the AFS filesystem directly, but it is
    still usable provided that an up-to-date keyutils is installed to look up
    cells by SRV or AFSDB DNS records.

    Reported-by: Al Viro
    Signed-off-by: David Howells

    David Howells
     

10 Apr, 2018

2 commits

  • Locally edit the contents of an AFS directory upon a successful inode
    operation that modifies that directory (such as mkdir, create and unlink)
    so that we can avoid the current practice of re-downloading the directory
    after each change.

    This is viable provided that the directory version number we get back from
    the modifying RPC op is exactly incremented by 1 from what we had
    previously. The data in the directory contents is in a defined format that
    we have to parse locally to perform lookups and readdir, so modifying isn't
    a problem.

    If the edit fails, we just clear the VALID flag on the directory and it
    will be reloaded next time it is needed.
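
    The version check can be sketched in Python as below. The CachedDir class
    and apply_local_create() are hypothetical, and "name already present"
    merely stands in for an edit that fails:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CachedDir:
    data_version: int
    entries: Dict[str, str] = field(default_factory=dict)  # name -> fid
    valid: bool = True


def apply_local_create(d: CachedDir, name: str, fid: str,
                       new_data_version: int) -> bool:
    """Edit the cached directory in place; return True if it is still usable."""
    if new_data_version != d.data_version + 1:
        # Someone else changed the directory too: our copy can't be trusted.
        d.valid = False
        return False
    d.data_version = new_data_version
    if name in d.entries:
        # Stand-in for a failed edit: fall back to reloading next time.
        d.valid = False
        return False
    d.entries[name] = fid
    return True
```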

    Signed-off-by: David Howells

    David Howells
     
  • Split the AFS dynamic root stuff out of the main directory handling file
    and into its own file as they share little in common.

    The dynamic root code also gets its own dentry and inode ops tables.

    Signed-off-by: David Howells

    David Howells
     

13 Nov, 2017

3 commits

  • The current code assumes that volumes and servers are per-cell and are
    never shared, but this is not enforced, and, indeed, public cells do exist
    that are aliases of each other. Further, an organisation can, say, set up
    a public cell and a private cell with overlapping, but not identical, sets
    of servers. The difference is purely in the database attached to the VL
    servers.

    The current code will malfunction if it sees a server in two cells as it
    assumes global address -> server record mappings and that each server is in
    just one cell.

    Further, each server may have multiple addresses - and may have addresses
    of different families (IPv4 and IPv6, say).

    To this end, the following structural changes are made:

    (1) Server record management is overhauled:

    (a) Server records are made independent of cell. The namespace keeps
    track of them, volume records have lists of them and each vnode
    has a server on which its callback interest currently resides.

    (b) The cell record no longer keeps a list of servers known to be in
    that cell.

    (c) The server records are now kept in a flat list because there's no
    single address to sort on.

    (d) Server records are now keyed by their UUID within the namespace.

    (e) The addresses for a server are obtained with VL.GetAddrsU
    rather than with VL.GetEntryByName, using the server's UUID as a
    parameter.

    (f) Cached server records are garbage collected after a period of
    non-use and are counted out of existence before purging is allowed
    to complete. This protects the work functions against rmmod.

    (g) The servers list is now in /proc/fs/afs/servers.

    (2) Volume record management is overhauled:

    (a) An RCU-replaceable server list is introduced. This tracks both
    servers and their corresponding callback interests.

    (b) The superblock is now keyed on cell record and numeric volume ID.

    (c) The volume record is now tied to the superblock which mounts it,
    and is activated when mounted and deactivated when unmounted.
    This makes it easier to handle the cache cookie without causing a
    double-use in fscache.

    (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
    to get the server UUID list.

    (e) The volume name is updated if it is seen to have changed when the
    volume is updated (the update is keyed on the volume ID).

    (3) The vlocation record is got rid of and VLDB records are no longer
    cached. Sufficient information is stored in the volume record, though
    an update to a volume record is now no longer shared between related
    volumes (volumes come in bundles of three: R/W, R/O and backup).

    and the following procedural changes are made:

    (1) The fileserver cursor introduced previously is now fleshed out and
    used to iterate over fileservers and their addresses.

    (2) Volume status is checked during iteration, and the server list is
    replaced if a change is detected.

    (3) Server status is checked during iteration, and the address list is
    replaced if a change is detected.

    (4) The abort code is saved into the address list cursor and -ECONNABORTED
    returned in afs_make_call() if a remote abort happened rather than
    translating the abort into an error message. This allows actions to
    be taken depending on the abort code more easily.

    (a) If a VMOVED abort is seen then this is handled by rechecking the
    volume and restarting the iteration.

    (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
    handled by sleeping for a short period and retrying and/or trying
    other servers that might serve that volume. A message is also
    displayed once until the condition has cleared.

    (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
    moment.

    (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
    see if it has been deleted; if not, the fileserver is probably
    indicating that the volume couldn't be attached and needs
    salvaging.

    (e) If statfs() sees one of these aborts, it does not sleep, but
    rather returns an error, so as not to block the umount program.

    (5) The fileserver iteration functions in vnode.c are now merged into
    their callers and more heavily macroised around the cursor. vnode.c
    is removed.

    (6) Operations on a particular vnode are serialised on that vnode because
    the server will lock that vnode whilst it operates on it, so a second
    op sent will just have to wait.

    (7) Fileservers are probed with FS.GetCapabilities before being used.
    This is where service upgrade will be done.

    (8) A callback interest on a fileserver is set up before an FS operation
    is performed and passed through to afs_make_call() so that it can be
    set on the vnode if the operation returns a callback. The callback
    interest is passed through to afs_iget() also so that it can be set
    there too.
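
    The abort handling in (4) amounts to a dispatch on the abort code. A
    Python sketch, using the symbolic names only; the action strings and
    action_for_abort() are invented for the example, and it assumes that (e)
    applies to the aborts that would otherwise sleep:

```python
SLEEP_AND_RETRY = {"VBUSY", "VRESTARTING", "VSALVAGING", "VOFFLINE"}


def action_for_abort(abort: str, in_statfs: bool = False) -> str:
    """Map a remote abort code name to the action the iteration takes."""
    if abort == "VMOVED":
        return "recheck-volume-and-restart"
    if abort in SLEEP_AND_RETRY:
        # Assumption: statfs() returns an error rather than sleeping here,
        # so that the umount program is not blocked.
        return "return-error" if in_statfs else "sleep-and-retry"
    if abort == "VNOVOL":
        return "recheck-volume-in-vldb"
    return "return-error"
```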

    In general, record updating is done on an as-needed basis when we try to
    access servers, volumes or vnodes rather than offloading it to work items
    and special threads.

    Notes:

    (1) Pre AFS-3.4 servers are no longer supported, though this can be added
    back if necessary (AFS-3.4 was released in 1998).

    (2) VBUSY is retried forever for the moment at intervals of 1s.

    (3) /proc/fs/afs//servers no longer exists.

    Signed-off-by: David Howells

    David Howells
     
  • Move server rotation code into its own file.

    Signed-off-by: David Howells

    David Howells
     
  • Add an RCU replaceable address list structure to hold a list of server
    addresses. The list also holds the

    To this end:

    (1) A cell's VL server address list can be loaded directly via insmod or
    echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB
    or SRV records.

    (2) Anyone wanting to use a cell's VL server address must wait until the
    cell record comes online and has tried to obtain some addresses.

    (3) An FS server's address list, for the moment, has a single entry that
    is the key to the server list. This will change in the future when a
    server is instead keyed on its UUID and the VL.GetAddrsU operation is
    used.

    (4) An 'address cursor' concept is introduced to handle iteration through
    the address list. This is passed to the afs_make_call() as, in the
    future, stuff (such as abort code) that doesn't outlast the call will
    be returned in it.

    In the future, we might want to annotate the list with information about
    how each address fares. We might then want to propagate such annotations
    over address list replacement.

    Whilst we're at it, we allow IPv6 addresses to be specified in
    colon-delimited lists by enclosing them in square brackets.
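
    Bracketing is needed because IPv6 addresses themselves contain colons. A
    Python sketch of such a parser, for illustration only (the kernel's actual
    parser and the full syntax it accepts may differ;
    parse_address_list() is a name made up here):

```python
import ipaddress
from typing import List


def parse_address_list(text: str) -> List[str]:
    """Split a colon-delimited list in which IPv6 addresses are in [brackets]."""
    addrs: List[str] = []
    pos = 0
    while pos < len(text):
        if text[pos] == "[":
            end = text.index("]", pos)      # ValueError if the bracket is unclosed
            token = text[pos + 1:end]
            pos = end + 1
        else:
            end = text.find(":", pos)
            if end == -1:
                end = len(text)
            token = text[pos:end]
            pos = end
        # Validate and normalise; raises ValueError on a malformed address.
        addrs.append(str(ipaddress.ip_address(token)))
        if pos < len(text):
            if text[pos] != ":":
                raise ValueError("expected ':' at offset %d" % pos)
            pos += 1
    return addrs
```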

    Signed-off-by: David Howells

    David Howells
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

10 Jul, 2017

1 commit

  • Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
    available. The following xattrs are now available:

    - "afs.cell"

    The name of the cell in which the vnode's volume resides.

    - "afs.fid"

    The volume ID, vnode ID and vnode uniquifier of the file as three hex
    numbers separated by colons.

    - "afs.volume"

    The name of the volume in which the vnode resides.

    For example:

    # getfattr -d -m ".*" /mnt/scratch
    getfattr: Removing leading '/' from absolute path names
    # file: mnt/scratch
    afs.cell="mycell.myorg.org"
    afs.fid="10000b:1:1"
    afs.volume="scratch"
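
    From a program, the same values could be read with a generic xattr call
    and the fid split into its three hex fields. A Python sketch:
    os.getxattr() is the standard library call (Linux only), while
    parse_afs_fid() and read_afs_fid() are helper names invented here:

```python
import os
from typing import Tuple


def parse_afs_fid(value: str) -> Tuple[int, int, int]:
    """Split "volume:vnode:unique" (three hex numbers) into integers."""
    volume, vnode, unique = (int(part, 16) for part in value.split(":"))
    return volume, vnode, unique


def read_afs_fid(path: str) -> Tuple[int, int, int]:
    """Read the "afs.fid" xattr of a file on a mounted AFS volume."""
    raw = os.getxattr(path, "afs.fid")  # returns bytes
    return parse_afs_fid(raw.decode("ascii"))
```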

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

03 Apr, 2009

1 commit

  • The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
    through it any attached caches. The kAFS filesystem will use caching
    automatically if it's available.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     

17 Jul, 2007

1 commit


10 May, 2007

1 commit

  • Implement support for writing to regular AFS files, including:

    (1) write

    (2) truncate

    (3) fsync, fdatasync

    (4) chmod, chown, chgrp, utime.

    AFS writeback attempts to batch writes into chunks as large as it can manage
    up to the point that it writes back 65535 pages in one chunk or it meets a
    locked page.
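
    The batching rule can be sketched in Python as below. This is an
    illustration only: pages are modelled as integer indices, the requirement
    that a chunk be a contiguous run of dirty pages is an assumption of the
    sketch, and next_writeback_chunk() is a made-up name:

```python
from typing import List, Set

MAX_PAGES_PER_CHUNK = 65535


def next_writeback_chunk(dirty: Set[int], locked: Set[int], start: int,
                         limit: int = MAX_PAGES_PER_CHUNK) -> List[int]:
    """Collect dirty pages from start until the limit, a gap or a locked page."""
    chunk: List[int] = []
    page = start
    while len(chunk) < limit and page in dirty and page not in locked:
        chunk.append(page)
        page += 1
    return chunk
```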

    Furthermore, if a page has been written to using a particular key, then should
    another write to that page use some other key, the first write will be flushed
    before the second is allowed to take place. If the first write fails due to a
    security error, then the page will be scrapped and reread before the second
    write takes place.

    If a page is dirty and the callback on it is broken by the server, then the
    dirty data is not discarded (same behaviour as NFS).

    Shared-writable mappings are not supported by this patch.

    [akpm@linux-foundation.org: fix a bunch of warnings]
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

03 May, 2007

1 commit


27 Apr, 2007

4 commits

  • Add support for the CB.GetCapabilities operation with which the fileserver can
    ask the client for the following information:

    (1) The list of network interfaces it has available as IPv4 address + netmask
    plus the MTUs.

    (2) The client's UUID.

    (3) The extended capabilities of the client, for which the only current one
    is unified error mapping (abort code interpretation).

    To support this, the patch adds the following routines to AFS:

    (1) A function to iterate through all the network interfaces using RTNETLINK
    to extract IPv4 addresses and MTUs.

    (2) A function to iterate through all the network interfaces using RTNETLINK
    to pull out the MAC address of the lowest index interface to use in UUID
    construction.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Add security support to the AFS filesystem. Kerberos IV tickets are added as
    RxRPC keys to the session keyring with the klog program. open() and
    other VFS operations then find this ticket with request_key() and either use
    it immediately (eg: mkdir, unlink) or attach it to a file descriptor (open).

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Clean up the AFS sources.

    Also remove references to AFS keys. RxRPC keys are used instead.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds