12 Oct, 2018

1 commit

  • Access to the list of cells by /proc/net/afs/cells has a couple of
    problems:

    (1) It should be checking against SEQ_START_TOKEN to decide when to emit
    the header line.

    (2) It's only holding the RCU read lock, so it can't just walk over the
    list without following the proper RCU methods.

    Fix these by using an hlist instead of an ordinary list and using the
    appropriate accessor functions to follow it with RCU.

    Since the code that adds a cell to the list must also necessarily change,
    sort the list on insertion whilst we're at it.
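
    The sort-on-insertion step can be sketched in userspace C. This is only
    an illustration: struct cell and cell_insert_sorted() are hypothetical
    names, plain pointers stand in for the kernel's hlist, and no RCU
    publication or locking is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cell record; not the kernel's struct afs_cell. */
struct cell {
    const char *name;
    struct cell *next;
};

/*
 * Insert `c` into the singly linked list at *head so that names stay in
 * ascending strcmp() order.  A kernel version would publish the new node
 * with the RCU-aware hlist helpers instead of plain stores.
 */
static void cell_insert_sorted(struct cell **head, struct cell *c)
{
    struct cell **pp = head;

    assert(c != NULL && c->name != NULL);

    /* Walk to the first node whose name does not sort before the new one. */
    while (*pp != NULL && strcmp((*pp)->name, c->name) < 0)
        pp = &(*pp)->next;

    c->next = *pp;
    *pp = c;
}
```

    Because the list is always sorted, a reader walking it sees cells in a
    stable order without a separate sort pass.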

    Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

15 Jun, 2018

1 commit

  • Alter the dynroot mount so that cells created by manipulation of
    /proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
    cell as a module parameter will cause directories for those cells to be
    created in the dynamic root superblock for the network namespace[*].

    To this end:

    (1) Only one dynamic root superblock is now created per network namespace
    and this is shared between all attempts to mount it. This makes it
    easier to find the superblock to modify.

    (2) When a dynamic root superblock is created, the list of cells is walked
    and directories created for each cell already defined.

    (3) When a new cell is added, if a dynamic root superblock exists, a
    directory is created for it.

    (4) When a cell is destroyed, the directory is removed.

    (5) These directories are created by calling lookup_one_len() on the root
    dir which automatically creates them if they don't exist.

    [*] Inasmuch as network namespaces are currently supported here.

    Signed-off-by: David Howells

    David Howells
     

23 May, 2018

1 commit


10 Apr, 2018

1 commit

  • Implement the AFS feature by which @sys at the end of a pathname component
    may be substituted for one of a list of values, typically naming the
    operating system. Up to 16 alternatives may be specified and these are
    tried in turn until one works. Each network namespace has[*] a separate
    independent list.

    Upon creation of a new network namespace, the list of values is
    initialised[*] to a single OpenAFS-compatible string representing arch type
    plus "_linux26". For example, on x86_64, the sysname is "amd64_linux26".

    [*] Or will, once network namespace support is finalised in kAFS.

    The list may be set by:

    # for i in foo bar linux-x86_64; do echo $i; done >/proc/fs/afs/sysname

    for which separate writes to the same fd are amalgamated and applied on
    close. The LF character may be used as a separator to specify multiple
    items in the same write() call.

    The list may be cleared by:

    # echo >/proc/fs/afs/sysname

    and read by:

    # cat /proc/fs/afs/sysname
    foo
    bar
    linux-x86_64
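
    As a rough userspace sketch of the substitution described above (not
    the kAFS implementation), the helper below builds the candidate name
    for one entry of the sysname list. The function name, its parameters
    and the return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYSNAMES 16 /* the commit message allows up to 16 alternatives */

/*
 * Build the idx-th candidate for a pathname component that ends in "@sys".
 * Returns 1 if a candidate was written to `out`, or 0 if the component has
 * no trailing "@sys", idx is out of range, or the result does not fit.
 */
static int sysname_candidate(const char *comp, const char *const *subs,
                             size_t nsubs, size_t idx,
                             char *out, size_t outlen)
{
    size_t len;
    int n;

    assert(comp != NULL && subs != NULL && out != NULL);

    len = strlen(comp);
    if (len < 4 || strcmp(comp + len - 4, "@sys") != 0)
        return 0;
    if (nsubs > MAX_SYSNAMES || idx >= nsubs)
        return 0;

    /* Copy everything before "@sys", then append the substitution. */
    n = snprintf(out, outlen, "%.*s%s", (int)(len - 4), comp, subs[idx]);
    return n >= 0 && (size_t)n < outlen;
}
```

    A caller would try idx = 0, 1, 2, ... in turn and stop at the first
    candidate whose lookup succeeds.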

    Signed-off-by: David Howells

    David Howells
     

13 Nov, 2017

7 commits

  • The current code assumes that volumes and servers are per-cell and are
    never shared, but this is not enforced, and, indeed, public cells do exist
    that are aliases of each other. Further, an organisation can, say, set up
    a public cell and a private cell with overlapping, but not identical, sets
    of servers. The difference is purely in the database attached to the VL
    servers.

    The current code will malfunction if it sees a server in two cells as it
    assumes global address -> server record mappings and that each server is in
    just one cell.

    Further, each server may have multiple addresses - and may have addresses
    of different families (IPv4 and IPv6, say).

    To this end, the following structural changes are made:

    (1) Server record management is overhauled:

    (a) Server records are made independent of cell. The namespace keeps
    track of them, volume records have lists of them and each vnode
    has a server on which its callback interest currently resides.

    (b) The cell record no longer keeps a list of servers known to be in
    that cell.

    (c) The server records are now kept in a flat list because there's no
    single address to sort on.

    (d) Server records are now keyed by their UUID within the namespace.

    (e) The addresses for a server are obtained with the VL.GetAddrsU
    rather than with VL.GetEntryByName, using the server's UUID as a
    parameter.

    (f) Cached server records are garbage collected after a period of
    non-use and are counted out of existence before purging is allowed
    to complete. This protects the work functions against rmmod.

    (g) The servers list is now in /proc/fs/afs/servers.

    (2) Volume record management is overhauled:

    (a) An RCU-replaceable server list is introduced. This tracks both
    servers and their corresponding callback interests.

    (b) The superblock is now keyed on cell record and numeric volume ID.

    (c) The volume record is now tied to the superblock which mounts it,
    and is activated when mounted and deactivated when unmounted.
    This makes it easier to handle the cache cookie without causing a
    double-use in fscache.

    (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
    to get the server UUID list.

    (e) The volume name is updated if it is seen to have changed when the
    volume is updated (the update is keyed on the volume ID).

    (3) The vlocation record is got rid of and VLDB records are no longer
    cached. Sufficient information is stored in the volume record, though
    an update to a volume record is now no longer shared between related
    volumes (volumes come in bundles of three: R/W, R/O and backup).

    and the following procedural changes are made:

    (1) The fileserver cursor introduced previously is now fleshed out and
    used to iterate over fileservers and their addresses.

    (2) Volume status is checked during iteration, and the server list is
    replaced if a change is detected.

    (3) Server status is checked during iteration, and the address list is
    replaced if a change is detected.

    (4) The abort code is saved into the address list cursor and -ECONNABORTED
    returned in afs_make_call() if a remote abort happened rather than
    translating the abort into an error message. This allows actions to
    be taken depending on the abort code more easily.

    (a) If a VMOVED abort is seen then this is handled by rechecking the
    volume and restarting the iteration.

    (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
    handled by sleeping for a short period and retrying and/or trying
    other servers that might serve that volume. A message is also
    displayed once until the condition has cleared.

    (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
    moment.

    (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
    see if it has been deleted; if not, the fileserver is probably
    indicating that the volume couldn't be attached and needs
    salvaging.

    (e) If statfs() sees one of these aborts, it does not sleep, but
    rather returns an error, so as not to block the umount program.

    (5) The fileserver iteration functions in vnode.c are now merged into
    their callers and more heavily macroised around the cursor. vnode.c
    is removed.

    (6) Operations on a particular vnode are serialised on that vnode because
    the server will lock that vnode whilst it operates on it, so a second
    op sent will just have to wait.

    (7) Fileservers are probed with FS.GetCapabilities before being used.
    This is where service upgrade will be done.

    (8) A callback interest on a fileserver is set up before an FS operation
    is performed and passed through to afs_make_call() so that it can be
    set on the vnode if the operation returns a callback. The callback
    interest is passed through to afs_iget() also so that it can be set
    there too.
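
    The abort handling in item (4) can be pictured as a small dispatch
    table. The sketch below is illustrative only: the enum values are local
    placeholders rather than the protocol's numeric abort codes, the action
    names are invented, and the default case and the exact scope of the
    statfs() exception are assumptions.

```c
#include <assert.h>

/* Placeholder identifiers; these are NOT the on-the-wire abort codes. */
enum abort_code {
    AB_VMOVED, AB_VBUSY, AB_VRESTARTING, AB_VSALVAGING,
    AB_VOFFLINE, AB_VNOVOL, AB_OTHER
};

/* Hypothetical names for the actions listed in the commit message. */
enum action {
    ACT_RECHECK_VOLUME_AND_RESTART, /* (a) VMOVED */
    ACT_SLEEP_AND_RETRY,            /* (b), (c) busy-type aborts */
    ACT_RECHECK_VLDB,               /* (d) VNOVOL */
    ACT_RETURN_ERROR,               /* (e) statfs() must not sleep */
    ACT_FAIL                        /* assumed default for other codes */
};

static enum action abort_action(enum abort_code code, int in_statfs)
{
    switch (code) {
    case AB_VMOVED:
        return ACT_RECHECK_VOLUME_AND_RESTART;
    case AB_VBUSY:
    case AB_VRESTARTING:
    case AB_VSALVAGING:
    case AB_VOFFLINE: /* handled as VBUSY for the moment */
        return in_statfs ? ACT_RETURN_ERROR : ACT_SLEEP_AND_RETRY;
    case AB_VNOVOL:
        return ACT_RECHECK_VLDB;
    default:
        return ACT_FAIL;
    }
}
```

    Keeping the raw abort code available to the caller is what makes this
    kind of per-code decision possible, rather than collapsing every abort
    into a single errno up front.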

    In general, record updating is done on an as-needed basis when we try to
    access servers, volumes or vnodes rather than offloading it to work items
    and special threads.

    Notes:

    (1) Pre AFS-3.4 servers are no longer supported, though this can be added
    back if necessary (AFS-3.4 was released in 1998).

    (2) VBUSY is retried forever for the moment at intervals of 1s.

    (3) /proc/fs/afs//servers no longer exists.

    Signed-off-by: David Howells

    David Howells
     
  • Overhaul the way that the in-kernel AFS client keeps track of cells in the
    following manner:

    (1) Cells are now held in an rbtree to make walking them quicker and RCU
    managed (though this is probably overkill).

    (2) Cells now have a manager work item that:

    (A) Looks after fetching and refreshing the VL server list.

    (B) Manages cell record lifetime, including initialising and
    destruction.

    (C) Manages cell record caching whereby records are kept around for a
    certain time after last use and then destroyed.

    (D) Manages the FS-Cache index cookie for a cell. It is not permitted
    for a cookie to be in use twice, so we have to be careful to not
    allow a new cell record to exist at the same time as an old record
    of the same name.

    (3) Each AFS network namespace is given a manager work item that manages
    the cells within it, maintaining a single timer to prod cells into
    updating their DNS records.

    This uses the reduce_timer() facility to make the timer expire at the
    soonest timed event that needs happening.

    (4) When a module is being unloaded, cells and cell managers are now
    counted out using dec_after_work() to make sure the module text is
    pinned until after the data structures have been cleaned up.

    (5) Each cell's VL server list is now protected by a seqlock rather than a
    semaphore.
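
    The single-timer idea in item (3) amounts to always arming the timer
    for the earliest pending event. A minimal sketch, assuming expiry times
    are plain 64-bit tick counts; both function names are hypothetical and
    this does not reproduce any kernel timer API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Move an armed expiry earlier if the new event is sooner; never later. */
static uint64_t reduce_expiry(uint64_t armed, uint64_t candidate)
{
    return candidate < armed ? candidate : armed;
}

/*
 * Return the soonest expiry among `n` per-cell update times, or `no_event`
 * when there is nothing to wait for.
 */
static uint64_t soonest_event(const uint64_t *expiries, size_t n,
                              uint64_t no_event)
{
    uint64_t best = no_event;
    size_t i;

    assert(expiries != NULL || n == 0);
    for (i = 0; i < n; i++)
        best = reduce_expiry(best, expiries[i]);
    return best;
}
```

    For this to work, no_event has to be larger than any real expiry, for
    example UINT64_MAX.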

    Signed-off-by: David Howells

    David Howells
     
  • Overhaul permit caching in AFS by making it per-vnode and sharing permit
    lists where possible.

    When most of the fileserver operations are called, they return a status
    structure indicating the (revised) details of the vnode or vnodes involved
    in the operation. This includes the access mark derived from the ACL
    (named CallerAccess in the protocol definition file). This is cacheable
    and if the ACL changes, the server will tell us that it is breaking the
    callback promise, at which point we can discard the currently cached
    permits.

    With this patch, the afs_permits structure has, at the end, an array of
    { key, CallerAccess } elements, sorted by key pointer. This is then cached
    in a hash table so that it can be shared between vnodes with the same
    access permits.

    Permit lists can only be shared if they contain the exact same set of
    key->CallerAccess mappings.
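
    A userspace sketch of that layout follows. It assumes a permit entry is
    just a key pointer plus an access mask; struct permit and both helpers
    are illustrative names, not the afs_permits definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical { key, CallerAccess } element. */
struct permit {
    const void *key;
    uint32_t access;
};

/* Two lists may be shared only if every key->access pair matches exactly. */
static int permits_shareable(const struct permit *a, size_t na,
                             const struct permit *b, size_t nb)
{
    size_t i;

    if (na != nb)
        return 0;
    for (i = 0; i < na; i++)
        if (a[i].key != b[i].key || a[i].access != b[i].access)
            return 0;
    return 1;
}

/* Binary search on the key pointer; the array must be sorted by key. */
static int permit_lookup(const struct permit *p, size_t n,
                         const void *key, uint32_t *access)
{
    uintptr_t want = (uintptr_t)key;
    size_t lo = 0, hi = n;

    assert(access != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uintptr_t k = (uintptr_t)p[mid].key;

        if (k == want) {
            *access = p[mid].access;
            return 1;
        }
        if (k < want)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

    Sorting by key pointer gives every list a canonical order, so the
    element-by-element comparison is enough to detect identical sets.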

    Note that that table is global rather than being per-net_ns. If the keys
    in a permit list cross net_ns boundaries, there is no problem sharing the
    cached permits, since the permits are just integer masks.

    Since permit lists pin keys, the permit cache also makes it easier for a
    future patch to find all occurrences of a key and remove them by means of
    setting the afs_permits::invalidated flag and then clearing the appropriate
    key pointer. In such an event, memory barriers will need adding.

    Lastly, the permit caching is skipped if the server has sent either a
    vnode-specific or an entire-server callback since the start of the
    operation.

    Signed-off-by: David Howells

    David Howells
     
  • Overhaul the AFS callback handling by the following means:

    (1) Don't give up callback promises on vnodes that we are no longer using,
    rather let them just expire on the server or let the server break
    them. This is actually more efficient for the server as the callback
    lookup is expensive if there are lots of extant callbacks.

    (2) Only give up the callback promises we have from a server when the
    server record is destroyed. Then we can just give up *all* the
    callback promises on it in one go.

    (3) Servers can end up being shared between cells if cells are aliased, so
    don't add all the vnodes being backed by a particular server into a
    big FID-indexed tree on that server as there may be duplicates.

    Instead have each volume instance (~= superblock) register an interest
    in a server as it starts to make use of it and use this to allow the
    processor for callbacks from the server to find the superblock and
    thence the inode corresponding to the FID being broken by means of
    ilookup_nowait().

    (4) Rather than iterating over the entire callback list when a mass-break
    comes in from the server, maintain a counter of mass-breaks in
    afs_server (cb_seq) and make afs_validate() check it against the copy
    in afs_vnode.

    It would be nice not to have to take a read_lock whilst doing this,
    but that's tricky without using RCU.

    (5) Save a ref on the fileserver we're using for a call in the afs_call
    struct so that we can access its cb_s_break during call decoding.

    (6) Write-lock around callback and status storage in a vnode and read-lock
    around getattr so that we don't see the status mid-update.

    This has the following consequences:

    (1) Data invalidation isn't seen until someone calls afs_validate() on a
    vnode. Unfortunately, we need to use a key to query the server, but
    getting one from a background thread is tricky without caching loads
    of keys all over the place.

    (2) Mass invalidation isn't seen until someone calls afs_validate().

    (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
    Could this be replaced with rcu_read_lock(), since inodes are destroyed
    under RCU conditions?
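
    The mass-break counter from item (4) of the first list can be modelled
    as follows. This is a simplified sketch: the struct and field names are
    invented for illustration, and the locking that the commit message
    discusses is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical server record: counts whole-server callback breaks. */
struct server {
    unsigned int cb_break_count;
};

/* Hypothetical vnode: remembers the counter value seen when it last
 * stored a callback promise from that server. */
struct vnode {
    const struct server *server;
    unsigned int cb_break_seen;
    int cb_promised;
};

/* The server broke every callback: a single increment, no list walk. */
static void server_mass_break(struct server *s)
{
    s->cb_break_count++;
}

/* Record a fresh promise along with the current counter value. */
static void vnode_record_callback(struct vnode *v)
{
    assert(v->server != NULL);
    v->cb_break_seen = v->server->cb_break_count;
    v->cb_promised = 1;
}

/* Validation step: the promise holds only if no mass-break has happened
 * since it was recorded. */
static int vnode_callback_valid(const struct vnode *v)
{
    return v->cb_promised &&
           v->cb_break_seen == v->server->cb_break_count;
}
```

    The cost of a mass-break moves from the break itself to the next
    validation of each vnode, which matches consequence (2) above.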

    Signed-off-by: David Howells

    David Howells
     
  • Fix server reaping and make sure it's all done before we start trying to
    purge cells, given that servers currently pin cells.

    Signed-off-by: David Howells

    David Howells
     
  • Close the rxrpc socket only after we've purged the server records (and also
    cell and volume records which might refer to servers) so that we can give
    up the callbacks on each server.

    Signed-off-by: David Howells

    David Howells
     
  • Lay the groundwork for supporting network namespaces (netns) to the AFS
    filesystem by moving various global features to a network-namespace struct
    (afs_net) and providing an instance of this as a temporary global variable
    that everything uses via accessor functions for the moment.

    The following changes have been made:

    (1) Store the netns in the superblock info. This will be obtained from
    the mounter's nsproxy on a manual mount and inherited from the parent
    superblock on an automount.

    (2) The cell list is made per-netns. It can be viewed through
    /proc/net/afs/cells and also be modified by writing commands to that
    file.

    (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
    This is unset by default.

    (4) The 'rootcell' module parameter, which sets a cell and VL server list,
    modifies the init net namespace, thereby allowing an AFS root fs to be
    theoretically used.

    (5) The volume location lists and the file lock manager are made
    per-netns.

    (6) The AF_RXRPC socket and associated I/O bits are made per-ns.

    The various workqueues remain global for the moment.

    Changes still to be made:

    (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
    from the old name.

    (2) A per-netns subsys needs to be registered for AFS into which it can
    store its per-netns data.

    (3) Rather than the AF_RXRPC socket being opened on module init, it needs
    to be opened on the creation of a superblock in that netns.

    (4) The socket needs to be closed when the last superblock using it is
    destroyed and all outstanding client calls on it have been completed.
    This prevents a reference loop on the namespace.

    (5) It is possible that several namespaces will want to use AFS, in which
    case each one will need its own UDP port. These can either be set
    through /proc/net/afs/cm_port or the kernel can pick one at random.
    The init_ns gets 7001 by default.

    Other issues that need resolving:

    (1) The DNS keyring needs net-namespacing.

    (2) Where do upcalls go (eg. DNS request-key upcall)?

    (3) Need something like open_socket_in_file_ns() syscall so that AFS
    command line tools attempting to operate on an AFS file/volume have
    their RPC calls go to the right place.

    Signed-off-by: David Howells

    David Howells
     

05 Jun, 2017

1 commit

  • This essentially is a partial revert of commit ff548773
    ("afs: Move UUID struct to linux/uuid.h") and moves struct uuid_v1 back into
    fs/afs as struct afs_uuid. It does, however, keep it as a big-endian
    structure so that we can use the normal uuid generation helpers when
    casting to/from struct afs_uuid.

    The V1 uuid interpretation in struct form isn't really useful to the
    rest of the kernel, and not really compatible with it either, so move it
    back to AFS instead of polluting the global uuid.h.

    Signed-off-by: Christoph Hellwig
    Acked-by: David Howells

    Christoph Hellwig
     

11 Feb, 2017

2 commits

  • AFS uses a time based UUID to identify the host itself. This requires
    getting a timestamp which is currently done through the getnstimeofday()
    interface that we want to eventually get rid of.

    Instead of replacing it with a ktime-based interface, simply remove the
    entire function and use generate_random_uuid() instead, which generates a
    v4 ("completely random") UUID instead of the time-based one.
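
    For reference, what makes a UUID "v4" is only two bit fields laid over
    otherwise random octets, as defined by RFC 4122. The helper below is a
    userspace sketch with a hypothetical name; the caller is assumed to have
    filled the buffer with random bytes first.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stamp the RFC 4122 version-4 and variant bits onto 16 octets that the
 * caller has already filled with random data.
 */
static void uuid_mark_v4(uint8_t b[16])
{
    assert(b != 0);
    /* Version 4 goes in the high nibble of octet 6. */
    b[6] = (uint8_t)((b[6] & 0x0f) | 0x40);
    /* The RFC 4122 variant is binary 10 in the top two bits of octet 8. */
    b[8] = (uint8_t)((b[8] & 0x3f) | 0x80);
}
```

    Unlike the time-based form, this needs neither a timestamp nor a MAC
    address.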

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David Howells

    Arnd Bergmann
     
  • Move the afs_uuid struct to linux/uuid.h, rename it to uuid_v1 and change
    the u16/u32 fields to __be16/__be32 instead so that the structure can be
    cast to a 16-octet network-order buffer.
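
    The point of the big-endian field types can be shown with a userspace
    approximation. Here uint16_t/uint32_t plus htons()/htonl() stand in for
    the kernel's __be16/__be32, and the struct and function names are
    hypothetical; the field layout follows the usual version-1 UUID format.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a version-1 UUID whose multi-byte fields are stored in
 * network (big-endian) byte order. */
struct uuid_v1_sketch {
    uint32_t time_low;            /* big-endian */
    uint16_t time_mid;            /* big-endian */
    uint16_t time_hi_and_version; /* big-endian */
    uint8_t  clock_seq_hi_and_reserved;
    uint8_t  clock_seq_low;
    uint8_t  node[6];
};

/* No padding: the struct maps exactly onto a 16-octet buffer. */
_Static_assert(sizeof(struct uuid_v1_sketch) == 16,
               "UUID struct must be exactly 16 octets");

/* With the fields already in network order, conversion is a byte copy. */
static void uuid_v1_to_octets(const struct uuid_v1_sketch *u,
                              uint8_t out[16])
{
    assert(u != NULL && out != NULL);
    memcpy(out, u, sizeof(*u));
}
```

    Code that fills the struct converts on store, for example
    u.time_low = htonl(value), so no per-field swapping is needed when the
    UUID is sent.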

    Signed-off-by: David Howells
    Reviewed-by: Arnd Bergmann <arnd@arndb.de>

    David Howells
     

09 Jan, 2017

1 commit

  • Add three tracepoints to the AFS filesystem:

    (1) The afs_recv_data tracepoint logs data segments that are extracted
    from the data received from the peer through afs_extract_data().

    (2) The afs_notify_call tracepoint logs notification from AF_RXRPC of data
    coming in to an asynchronous call.

    (3) The afs_cb_call tracepoint logs incoming calls that have had their
    operation ID extracted and mapped into a supported cache manager
    service call.

    To make (3) work, the name strings in the afs_call_type struct objects have
    to be annotated with __tracepoint_string. This is done with the CM_NAME()
    macro.

    Further, the AFS call state enum needs a name so that it can be used to
    declare parameter types.

    Signed-off-by: David Howells

    David Howells
     

30 Aug, 2016

1 commit


30 Jul, 2014

1 commit


15 Jan, 2011

1 commit

  • flush_scheduled_work() is going away. afs needs to make sure all the
    work items it has queued have finished before being unloaded and there
    can be an arbitrary number of pending work items. Add afs_wq and use it
    as the flush domain instead of the system workqueue.

    Also, convert cancel_delayed_work() + flush_scheduled_work() to
    cancel_delayed_work_sync() in afs_mntpt_kill_timer().

    Signed-off-by: Tejun Heo
    Signed-off-by: David Howells
    Cc: linux-afs@lists.infradead.org
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

08 Aug, 2010

1 commit

  • Fix the module init error handling. There are a bunch of goto labels for
    aborting the init procedure at different points and just undoing what needs
    undoing - they aren't all in the right places, however.

    This can lead to an oops like the following:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: [] destroy_workqueue+0x17/0xc0
    ...
    Modules linked in: kafs(+) dns_resolver rxkad af_rxrpc fscache

    Pid: 2171, comm: insmod Not tainted 2.6.35-cachefs+ #319 DG965RY/
    ...
    Process insmod (pid: 2171, threadinfo ffff88003ca6a000, task ffff88003dcc3050)
    ...
    Call Trace:
    [] afs_callback_update_kill+0x10/0x12 [kafs]
    [] afs_init+0x190/0x1ce [kafs]
    [] ? afs_init+0x0/0x1ce [kafs]
    [] do_one_initcall+0x59/0x14e
    [] sys_init_module+0x9c/0x1de
    [] system_call_fastpath+0x16/0x1b
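
    The goto-label unwinding pattern at issue looks roughly like the sketch
    below. Everything in it is illustrative: the three steps, the label
    names and the log structure are invented, and a step "fails" only when
    the caller asks it to. The rule it demonstrates is that each label
    undoes exactly the steps that completed before the failing one, in
    reverse order.

```c
#include <assert.h>

/* Records which steps were undone, in order, so the unwinding is visible. */
struct init_log {
    char undone[8];
    int n;
};

/* Pretend initialisation step: fails only when its id equals fail_at. */
static int step(int id, int fail_at)
{
    return id == fail_at ? -1 : 0;
}

static void undo(struct init_log *log, char tag)
{
    assert(log->n < (int)sizeof(log->undone) - 1);
    log->undone[log->n++] = tag;
    log->undone[log->n] = '\0';
}

static int module_init_sketch(int fail_at, struct init_log *log)
{
    int ret;

    log->n = 0;
    log->undone[0] = '\0';

    ret = step(1, fail_at); /* set up resource A */
    if (ret < 0)
        goto error_a;       /* nothing to undo yet */
    ret = step(2, fail_at); /* set up resource B */
    if (ret < 0)
        goto error_b;       /* only A exists */
    ret = step(3, fail_at); /* set up resource C */
    if (ret < 0)
        goto error_c;       /* A and B exist */
    return 0;

error_c:
    undo(log, 'B');
error_b:
    undo(log, 'A');
error_a:
    return ret;
}
```

    Jumping to a label that sits too far up the chain tears down something
    that was never created, which is the kind of mistake that produces a
    NULL dereference in a teardown helper.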

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

03 Apr, 2009

1 commit

  • The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
    through it any attached caches. The kAFS filesystem will use caching
    automatically if it's available.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     

16 Apr, 2008

1 commit


17 Jul, 2007

1 commit


22 May, 2007

1 commit

  • The first thing mm.h does is include sched.h, solely for the
    can_do_mlock() inline function, which dereferences "current". By dealing
    with can_do_mlock(), mm.h can be detached from sched.h, which is good.
    See below for why.

    This patch
    a) removes unconditional inclusion of sched.h from mm.h
    b) makes can_do_mlock() normal function in mm/mlock.c
    c) exports can_do_mlock() to not break compilation
    d) adds sched.h inclusions back to files that were getting it indirectly.
    e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
    getting them indirectly

    Net result is:
    a) mm.h users would get less code to open, read, preprocess, parse, ... if
    they don't need sched.h
    b) sched.h stops being dependency for significant number of files:
    on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
    after patch it's only 3744 (-8.3%).

    Cross-compile tested on

    all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
    alpha alpha-up
    arm
    i386 i386-up i386-defconfig i386-allnoconfig
    ia64 ia64-up
    m68k
    mips
    parisc parisc-up
    powerpc powerpc-up
    s390 s390-up
    sparc sparc-up
    sparc64 sparc64-up
    um-x86_64
    x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

    as well as my two usual configs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

10 May, 2007

1 commit

  • Make some miscellaneous changes to the AFS filesystem:

    (1) Assert RCU barriers on module exit to make sure RCU has finished with
    callbacks in this module.

    (2) Correctly handle the AFS server returning a zero-length read.

    (3) Split out data zapping calls into one function (afs_zap_data).

    (4) Rename some afs_file_*() functions to afs_*() where they apply to
    non-regular files too.

    (5) Be consistent about the presentation of volume ID:vnode ID in debugging
    output.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

03 May, 2007

1 commit

  • Adjust the new netdevice scanning code provided by Patrick McHardy:

    (1) Restore the function banner comments that were dropped.

    (2) Rather than using an array size of 6 in some places and an array size of
    ETH_ALEN in others, pass a pointer instead and pass the array size
    through so that we can actually check it.

    (3) Do the buffer fill count check before checking the for_primary_ifa
    condition again. This permits us to skip that check should maxbufs be
    reached before we run out of interfaces.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

27 Apr, 2007

3 commits

  • Add support for the CB.GetCapabilities operation with which the fileserver can
    ask the client for the following information:

    (1) The list of network interfaces it has available as IPv4 address + netmask
    plus the MTUs.

    (2) The client's UUID.

    (3) The extended capabilities of the client, for which the only current one
    is unified error mapping (abort code interpretation).

    To support this, the patch adds the following routines to AFS:

    (1) A function to iterate through all the network interfaces using RTNETLINK
    to extract IPv4 addresses and MTUs.

    (2) A function to iterate through all the network interfaces using RTNETLINK
    to pull out the MAC address of the lowest index interface to use in UUID
    construction.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Clean up the AFS sources.

    Also remove references to AFS keys. RxRPC keys are used instead.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds