09 Dec, 2020

1 commit

  • There's a memory leak in afs_parse_source() whereby multiple source=
    parameters overwrite fc->source in the fs_context struct without freeing
    the previously recorded source.

    Fix this by only permitting a single source parameter and rejecting with
    an error all subsequent ones.

    This was caught by syzbot with the kernel memory leak detector, showing
    something like the following trace:

    unreferenced object 0xffff888114375440 (size 32):
    comm "repro", pid 5168, jiffies 4294923723 (age 569.948s)
    backtrace:
    slab_post_alloc_hook+0x42/0x79
    __kmalloc_track_caller+0x125/0x16a
    kmemdup_nul+0x24/0x3c
    vfs_parse_fs_string+0x5a/0xa1
    generic_parse_monolithic+0x9d/0xc5
    do_new_mount+0x10d/0x15a
    do_mount+0x5f/0x8e
    __do_sys_mount+0xff/0x127
    do_syscall_64+0x2d/0x3a
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 13fcc6837049 ("afs: Add fs_context support")
    Reported-by: syzbot+86dc6632faaca40133ab@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    cc: Randy Dunlap
    Signed-off-by: Linus Torvalds

    David Howells
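
The single-source rule described above can be sketched in userspace C. This is a minimal illustration, not the kernel patch: struct demo_fs_context, demo_parse_source() and demo_free_context() are hypothetical names, and the choice of -EINVAL for the rejection is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the one fs_context field that matters here. */
struct demo_fs_context {
    char *source;   /* owned copy of the source= value, or NULL */
};

/*
 * Record a source= value.  A second source= is rejected instead of
 * overwriting (and thereby leaking) the copy recorded earlier.
 * Returns 0 on success, -EINVAL on a duplicate, -ENOMEM if out of memory.
 */
static int demo_parse_source(struct demo_fs_context *fc, const char *value)
{
    char *copy;

    assert(fc != NULL && value != NULL);

    if (fc->source)
        return -EINVAL;         /* only one source= is permitted */

    copy = malloc(strlen(value) + 1);
    if (!copy)
        return -ENOMEM;
    strcpy(copy, value);
    fc->source = copy;
    return 0;
}

/* Release the recorded source, as context teardown would. */
static void demo_free_context(struct demo_fs_context *fc)
{
    free(fc->source);
    fc->source = NULL;
}
```

With this shape, the first call stores the string and every later call fails without touching fc->source, so there is exactly one allocation to free at teardown.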
     

17 Oct, 2020

1 commit

  • Pull afs updates from David Howells:
    "A collection of fixes to fix afs_cell struct refcounting, thereby
    fixing a slew of related syzbot bugs:

    - Fix the cell tree in the netns to use an rwsem rather than RCU.

    There seem to be some problems deriving from the use of RCU and a
    seqlock to walk the rbtree, but it's not entirely clear what since
    there are several different failures being seen.

    Changing things to use an rwsem instead makes it more robust. The
    extra performance derived from using RCU isn't necessary in this
    case since the only time we're looking up a cell is during mount or
    when cells are being manually added.

    - Fix the refcounting by splitting the usage counter into a memory
    refcount and an active users counter. The usage counter was doing
    double duty, keeping track of whether a cell is still in use and
    keeping track of when it needs to be destroyed - but this makes the
    clean up tricky. Separating these out simplifies the logic.

    - Fix purging a cell that has an alias. A cell alias pins the cell
    it's an alias of, but the alias is always later in the list. Trying
    to purge in a single pass causes rmmod to hang in such a case.

    - Fix cell removal. If a cell's manager is requeued whilst it's
    removing itself, the manager will run again and re-remove itself,
    causing problems in various places. Follow Hillf Danton's
    suggestion to insert a more terminal state that causes the manager
    to do nothing post-removal.

    In addition to the above, two other changes:

    - Add a tracepoint for the cell refcount and active users count. This
    helped with debugging the above and may be useful again in future.

    - Downgrade an assertion to a print when a still-active server is
    seen during purging. This was happening as a consequence of
    incomplete cell removal before the servers were cleaned up"

    * tag 'afs-fixes-20201016' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    afs: Don't assert on unpurgeable server records
    afs: Add tracing for cell refcount and active user count
    afs: Fix cell removal
    afs: Fix cell purging with aliases
    afs: Fix cell refcounting by splitting the usage counter
    afs: Fix rapid cell addition/removal by not using RCU on cells tree

    Linus Torvalds
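
The alias purging problem mentioned in the pull message (an alias pins its target, and the alias comes later in the list) can be modelled with a small sketch. The names below are hypothetical and the data layout is an assumption made for illustration; the kernel walks its own cell records rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cell record for purging purposes. */
struct demo_cell {
    int  pins;      /* number of aliases still pinning this cell */
    int  alias_of;  /* index of the cell this one aliases, or -1 */
    bool purged;
};

/*
 * One forward pass: purge every cell that nothing pins.  Purging an
 * alias drops its pin on the target, but the target sits earlier in
 * the list, so it can only be purged by a later pass.
 * Returns the number of cells purged in this pass.
 */
static size_t demo_purge_pass(struct demo_cell *cells, size_t n)
{
    size_t purged = 0;

    assert(cells != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct demo_cell *c = &cells[i];

        if (c->purged || c->pins > 0)
            continue;
        c->purged = true;
        purged++;
        if (c->alias_of >= 0) {
            assert(cells[c->alias_of].pins > 0);
            cells[c->alias_of].pins--;
        }
    }
    return purged;
}

/* Repeat passes until one makes no progress; returns the productive passes. */
static size_t demo_purge_all(struct demo_cell *cells, size_t n)
{
    size_t passes = 0;

    while (demo_purge_pass(cells, n) > 0)
        passes++;
    return passes;
}
```

A single call to demo_purge_pass() on a target followed by its alias leaves the target behind, which is the situation the pull message describes as hanging rmmod.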
     

16 Oct, 2020

3 commits

  • Add a tracepoint to log the cell refcount and active user count and pass in
    a reason code through various functions that manipulate these counters.

    Additionally, a helper function, afs_see_cell(), is provided to log
    interesting places that deal with a cell without actually doing any
    accounting directly.

    Signed-off-by: David Howells

    David Howells
     
  • Management of the lifetime of afs_cell struct has some problems due to the
    usage counter being used to determine whether objects of that type are in
    use in addition to whether anyone might be interested in the structure.

    This is made trickier by cell objects being cached for a period of time in
    case they're quickly reused as they hold the result of a setup process that
    may be slow (DNS lookups, AFS RPC ops).

    Problems include the cached root volume from alias resolution pinning its
    parent cell record, rmmod occasionally hanging and occasionally producing
    assertion failures.

    Fix this by splitting the count of active users from the struct reference
    count. Things then work as follows:

    (1) The cell cache keeps +1 on the cell's activity count and this has to
    be dropped before the cell can be removed. afs_manage_cell() tries to
    exchange the 1 to a 0 with the cells_lock write-locked, and if
    successful, the record is removed from the net->cells.

    (2) One struct ref is 'owned' by the activity count. That is put when the
    active count is reduced to 0 (final_destruction label).

    (3) A ref can be held on a cell whilst it is queued for management on a
    work queue without confusing the active count. afs_queue_cell() is
    added to wrap this.

    (4) The queue's ref is dropped at the end of the management. This is
    split out into a separate function, afs_manage_cell_work().

    (5) The root volume record is put after a cell is removed (at the
    final_destruction label) rather than in the RCU destruction routine.

    (6) Volumes hold struct refs, but aren't active users.

    (7) Both counts are displayed in /proc/net/afs/cells.

    There are some management function changes:

    (*) afs_put_cell() now just decrements the refcount and triggers the RCU
    destruction if it becomes 0. It no longer sets a timer to have the
    manager do this.

    (*) afs_use_cell() and afs_unuse_cell() are added to increase and decrease
    the active count. afs_unuse_cell() sets the management timer.

    (*) afs_queue_cell() is added to queue a cell with appropriate refs.

    There are also some other fixes:

    (*) Don't let /proc/net/afs/cells access a cell's vllist if it's NULL.

    (*) Make sure that candidate cells in lookups are properly destroyed
    rather than being simply kfree'd. This ensures the bits it points to
    are destroyed also.

    (*) afs_dec_cells_outstanding() is now called in cell destruction rather
    than at "final_destruction". This ensures that cell->net is still
    valid to the end of the destructor.

    (*) As a consequence of the previous two changes, move the increment of
    net->cells_outstanding that was at the point of insertion into the
    tree to the allocation routine to correctly balance things.

    Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
    Signed-off-by: David Howells

    David Howells
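
The split between the struct reference count and the active-user count, points (1) to (4) above, can be sketched as follows. This is a single-threaded model with hypothetical names and plain integers; the kernel code uses atomic counters, the cells_lock and a work queue, none of which are reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two counters on a cell record. */
struct demo_cell {
    int  ref;       /* memory refcount: keeps the struct allocated */
    int  active;    /* active users: keeps the cell in service */
    bool removed;   /* set once the cell has left the lookup tree */
    bool freed;     /* set where real code would free the struct */
};

static void demo_cell_init(struct demo_cell *c)
{
    c->ref = 1;         /* the one ref 'owned' by the activity count */
    c->active = 1;      /* the cache's +1 on the activity count */
    c->removed = false;
    c->freed = false;
}

static void demo_get_cell(struct demo_cell *c)
{
    assert(c->ref > 0);
    c->ref++;
}

static void demo_put_cell(struct demo_cell *c)
{
    assert(c->ref > 0);
    if (--c->ref == 0)
        c->freed = true;    /* destruction would be triggered here */
}

static void demo_use_cell(struct demo_cell *c)   { c->active++; }
static void demo_unuse_cell(struct demo_cell *c) { assert(c->active > 1); c->active--; }

/*
 * Manager step: try to exchange the cache's 1 for 0.  On success the
 * cell is removed and the ref owned by the activity count is put.
 */
static bool demo_manage_cell(struct demo_cell *c)
{
    if (c->active != 1)
        return false;       /* someone other than the cache is using it */
    c->active = 0;
    c->removed = true;
    demo_put_cell(c);
    return true;
}
```

A ref taken with demo_get_cell() (for example while queued for management) keeps the struct alive after removal without making the cell look active, which is the separation the commit message argues for.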
     
  • There are a number of problems that are being seen by rapidly mounting
    and unmounting an afs dynamic root with an explicit cell and volume
    specified (which should probably be rejected, but that's a separate issue):

    What the tests are doing is to look up/create a cell record for the name
    given and then tear it down again without actually using it to try to talk
    to a server. This is repeated endlessly, very fast, and the new cell
    collides with the old one if it's not quick enough to reuse it.

    It appears (as suggested by Hillf Danton) that the search through the RB
    tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
    that it's not blocking the write_seqlock(), despite taking two passes at
    it. He suggested that the code should take a ref on the cell it's
    attempting to look at - but this shouldn't be necessary until we've
    compared the cell names. It's possible that I'm missing a barrier
    somewhere.

    However, using an RCU search for this is overkill, really - we only need to
    access the cell name in a few places, and they're places where we may
    end up sleeping anyway.

    Fix this by switching to an R/W semaphore instead.

    Additionally, draw the down_read() call inside the function (renamed to
    afs_find_cell()) since all the callers were taking the RCU read lock (or
    should've been[*]).

    [*] afs_probe_cell_name() should have been, but that doesn't appear to be
    involved in the bug reports.

    The symptoms of this look like:

    general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
    KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
    ...
    RIP: 0010:strncasecmp lib/string.c:52 [inline]
    RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
    afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
    afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
    afs_parse_source fs/afs/super.c:290 [inline]
    ...

    Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
    Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    cc: Hillf Danton
    cc: syzkaller-bugs@googlegroups.com

    David Howells
     

25 Sep, 2020

1 commit

  • Set up a readahead size by default, as very few users have a good
    reason to change it. This means coda, ecryptfs, and orangefs now
    set up the values while they were previously missing it, while ubifs,
    mtd and vboxsf manually set it to 0 to avoid readahead.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Acked-by: David Sterba [btrfs]
    Acked-by: Richard Weinberger [ubifs, mtd]
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

04 Jun, 2020

5 commits

  • Fix afs_statfs() so that the values for f_bavail and f_bfree don't go
    "negative" if the number of blocks in use by a volume exceeds the max quota
    for that volume.

    Signed-off-by: David Howells

    David Howells
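
The "negative" values come from unsigned subtraction wrapping around when usage exceeds the quota. A minimal sketch of a clamped calculation, assuming 64-bit block counts and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free blocks under a quota.  If usage has reached or exceeded the
 * quota, report 0 instead of letting the unsigned subtraction wrap.
 */
static uint64_t demo_free_blocks(uint64_t max_quota, uint64_t blocks_in_use)
{
    uint64_t free_blocks = 0;

    if (max_quota > blocks_in_use)
        free_blocks = max_quota - blocks_in_use;

    assert(free_blocks <= max_quota);
    return free_blocks;
}
```

For a quota of 100 blocks with 150 in use, an unclamped max_quota - blocks_in_use would yield a value close to 2^64, which tools then display as an enormous or negative amount of free space.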
     
  • Reorganise afs_volume objects such that they're in a tree keyed on volume
    ID, rooted on an afs_cell object rather than being in multiple trees,
    each of which is rooted on an afs_server object.

    afs_server structs become per-cell and acquire a pointer to the cell.

    The process of breaking a callback then starts with finding the server by
    its network address, following that to the cell and then looking up each
    volume ID in the volume tree.

    This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
    and allows those structs and the code for maintaining them to be simplified
    or removed.

    It does make a couple of things a bit more tricky, though:

    (1) Operations now start with a volume, not a server, so there can be more
    than one answer as to whether or not the server we'll end up using
    supports the FS.InlineBulkStatus RPC.

    (2) CB RPC operations that specify the server UUID. There's still a tree
    of servers by UUID on the afs_net struct, but the UUIDs in it aren't
    guaranteed unique.

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to track the lifetime of the afs_volume struct.

    Signed-off-by: David Howells

    David Howells
     
  • Put in the first phase of cell alias detection. This part handles alias
    detection for cells that have root.cell volumes (which is expected to be
    likely).

    When a cell becomes newly active, it is probed for its root.cell volume,
    and if it has one, this volume is compared against other root.cell volumes
    to find out if the list of fileserver UUIDs have any in common - and if
    that's the case, do the address lists of those fileservers have any
    addresses in common. If they do, the new cell is adjudged to be an alias
    of the old cell and the old cell is used instead.

    Comparing is aided by the server list in struct afs_server_list being
    sorted in UUID order and the addresses in the fileserver address lists
    being sorted in address order.

    The cell then retains the afs_volume object for the root.cell volume, even
    if it's not mounted, for future alias checking.

    This is necessary because:

    (1) Whilst fileservers have UUIDs that are meant to be globally unique, in
    practice they are not because cells get cloned without changing the
    UUIDs - so afs_server records need to be per cell.

    (2) Sometimes the DNS is used to make cell aliases - but if we don't know
    they're the same, we may end up with multiple superblocks and multiple
    afs_server records for the same thing, impairing our ability to
    deliver callback notifications of third party changes.

    (3) The fileserver RPC API doesn't contain the cell name, so it can't tell
    us which cell it's notifying and can't see that a change made to
    one cell should notify the same client that's also accessed as the
    other cell.

    Reported-by: Jeffrey Altman
    Signed-off-by: David Howells

    David Howells
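
Because both server lists are sorted in UUID order, finding a UUID in common can be done with a single merge-style walk. The sketch below shows that idea only; struct demo_uuid and demo_lists_share_uuid() are hypothetical, UUIDs are treated as opaque 16-byte strings ordered by memcmp(), and the follow-up address comparison is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_UUID_LEN 16

struct demo_uuid {
    unsigned char b[DEMO_UUID_LEN];
};

/*
 * Return true if two UUID lists have an entry in common.  Both lists
 * must already be sorted in ascending memcmp() order, so each step can
 * discard the smaller head and the walk costs O(na + nb) comparisons.
 */
static bool demo_lists_share_uuid(const struct demo_uuid *a, size_t na,
                                  const struct demo_uuid *b, size_t nb)
{
    size_t i = 0, j = 0;

    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    while (i < na && j < nb) {
        int diff = memcmp(a[i].b, b[j].b, DEMO_UUID_LEN);

        if (diff == 0)
            return true;
        if (diff < 0)
            i++;
        else
            j++;
    }
    return false;
}
```

The same walk applies to the sorted address lists of two fileservers once a shared UUID has been found.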
     
  • Turn the afs_operation struct into the main way that most fileserver
    operations are managed. Various things are added to the struct, including
    the following:

    (1) All the parameters and results of the relevant operations are moved
    into it, removing corresponding fields from the afs_call struct.
    afs_call gets a pointer to the op.

    (2) The target volume is made the main focus of the operation, rather than
    the target vnode(s), and a bunch of op->vnode->volume are made
    op->volume instead.

    (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
    in most operations. The vnode record (struct afs_vnode_param)
    contains:

    - The vnode pointer.

    - The fid of the vnode to be included in the parameters or that was
    returned in the reply (eg. FS.MakeDir).

    - The status and callback information that may be returned in the
    reply about the vnode.

    - Callback break and data version tracking for detecting
    simultaneous third-party changes.

    (4) Pointers to dentries to be updated with new inodes.

    (5) An operations table pointer. The table includes pointers to functions
    for issuing AFS and YFS-variant RPCs, handling the success and abort
    of an operation and handling post-I/O-lock local editing of a
    directory.

    To make this work, the following function restructuring is made:

    (A) The rotation loop that issues calls to fileservers that can be found
    in each function that wants to issue an RPC (such as afs_mkdir()) is
    extracted out into common code, in a new file called fs_operation.c.

    (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
    a much smaller piece of code that allocates an operation, sets the
    parameters and then calls out to the common code to do the actual
    work.

    (C) The code for handling the success and failure of an operation is
    moved into operation functions (as (5) above) and these are called
    from the core code at appropriate times.

    (D) The pseudo inode getting stuff used by the dynamic root code is moved
    over into dynroot.c.

    (E) struct afs_iget_data is absorbed into the operation struct and
    afs_iget() expects to be given an op pointer and a vnode record.

    (F) Point (E) doesn't work for the root dir of a volume, but we know the
    FID in advance (it's always vnode 1, unique 1), so a separate inode
    getter, afs_root_iget(), is provided to special-case that.

    (G) The inode status init/update functions now also take an op and a vnode
    record.

    (H) The RPC marshalling functions now, for the most part, just take an
    afs_operation struct as their only argument. All the data they need
    is held there. The result delivery functions write their answers
    there as well.

    (I) The call is attached to the operation and then the operation core does
    the waiting.

    And then the new operation code is, for the moment, made to just initialise
    the operation, get the appropriate vnode I/O locks and do the same rotation
    loop as before.

    This lays the foundation for the following changes in the future:

    (*) Overhauling the rotation (again).

    (*) Support for asynchronous I/O, where the fileserver rotation must be
    done asynchronously also.

    Signed-off-by: David Howells

    David Howells
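
The shape described in (5) and (A) to (C) above, an operation struct carrying an operations table that common code drives, might look roughly like this. Everything here is a hypothetical reduction: the real afs_operation struct carries far more state, and the rotation over fileservers is much more involved than a loop over integer indices.

```c
#include <assert.h>
#include <stddef.h>

struct demo_operation;

/* Per-operation callbacks, in the spirit of the operations table. */
struct demo_operation_ops {
    int  (*issue_rpc)(struct demo_operation *op, int server); /* 0 on success */
    void (*success)(struct demo_operation *op);
    void (*aborted)(struct demo_operation *op);
};

struct demo_operation {
    const struct demo_operation_ops *ops;
    int nr_servers;     /* how many servers the rotation may try */
    int error;          /* result of the last RPC attempt */
    int result;         /* filled in by the success/aborted callbacks */
};

/* Common code: rotate over servers, then run the matching callback. */
static int demo_do_operation(struct demo_operation *op)
{
    assert(op != NULL && op->ops != NULL && op->ops->issue_rpc != NULL);

    op->error = -1;
    for (int server = 0; server < op->nr_servers; server++) {
        op->error = op->ops->issue_rpc(op, server);
        if (op->error == 0)
            break;
    }

    if (op->error == 0) {
        if (op->ops->success)
            op->ops->success(op);
    } else if (op->ops->aborted) {
        op->ops->aborted(op);
    }
    return op->error;
}

/* Example table: a pretend mkdir that only server index 2 accepts. */
static int demo_mkdir_issue(struct demo_operation *op, int server)
{
    (void)op;
    return server == 2 ? 0 : -1;
}
static void demo_mkdir_success(struct demo_operation *op) { op->result = 1; }
static void demo_mkdir_aborted(struct demo_operation *op) { op->result = -1; }

static const struct demo_operation_ops demo_mkdir_ops = {
    .issue_rpc = demo_mkdir_issue,
    .success   = demo_mkdir_success,
    .aborted   = demo_mkdir_aborted,
};
```

A caller then only allocates an operation, points it at a table such as demo_mkdir_ops, sets its parameters and calls demo_do_operation(), which is the "much smaller piece of code" that (B) refers to.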
     

31 May, 2020

1 commit


08 Feb, 2020

2 commits


07 Feb, 2020

2 commits


12 Dec, 2019

1 commit

  • Fix missing cell comparison in afs_test_super(). Without this, any pair of
    volumes that have the same volume ID will share a superblock, no matter the
    cell, unless they're in different network namespaces.

    Normally, most users will only deal with a single cell and so they won't
    see this. Even if they do look into a second cell, they won't see a
    problem unless they happen to hit a volume with the same ID as one they've
    already got mounted.

    Before the patch:

    # ls /afs/grand.central.org/archive
    linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/
    # ls /afs/kth.se/
    linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/
    # cat /proc/mounts | grep afs
    none /afs afs rw,relatime,dyn,autocell 0 0
    #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/kth.se afs ro,relatime 0 0

    After the patch:

    # ls /afs/grand.central.org/archive
    linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/
    # ls /afs/kth.se/
    admin/ common/ install/ OldFiles/ service/ system/
    bakrestores/ home/ misc/ pkg/ src/ wsadmin/
    # cat /proc/mounts | grep afs
    none /afs afs rw,relatime,dyn,autocell 0 0
    #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0
    #kth.se:root.cell /afs/kth.se afs ro,relatime 0 0

    Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: Carsten Jacobi
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne
    Tested-by: Jonathan Billings
    cc: Todd DeSantis

    David Howells
     

10 Dec, 2019

1 commit

  • Make the AFS dynamic root superblock R/W so that SELinux can set the
    security label on it. Without this, upgrades to, say, the Fedora
    filesystem-afs RPM fail if afs is mounted on it because the SELinux label
    can't be (re-)applied.

    It might be better to make it possible to bypass the R/O check for LSM
    label application through setxattr.

    Fixes: 4d673da14533 ("afs: Support the AFS dynamic root")
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne
    cc: selinux@vger.kernel.org
    cc: linux-security-module@vger.kernel.org

    David Howells
     

23 Nov, 2019

1 commit

  • By default s_maxbytes is set to MAX_NON_LFS, which limits the usable
    file size to 2GB, enforced by the vfs.

    Commit b9b1f8d5930a ("AFS: write support fixes") added support for the
    64-bit fetch and store server operations, but did not change this value.
    As a result, attempts to write past the 2G mark result in EFBIG errors:

    $ dd if=/dev/zero of=foo bs=1M count=1 seek=2048
    dd: error writing 'foo': File too large

    Set s_maxbytes to MAX_LFS_FILESIZE.

    Fixes: b9b1f8d5930a ("AFS: write support fixes")
    Signed-off-by: Marc Dionne
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Marc Dionne
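
The dd failure follows from simple arithmetic: seek=2048 with bs=1M starts the write at 2048 MiB, that is 2^31 bytes, which is one past a limit of 2^31 - 1. The sketch below checks only the starting offset against a limit; DEMO_MAX_NON_LFS reflects my understanding of the kernel's MAX_NON_LFS value and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the non-LFS limit: 2^31 - 1 bytes. */
#define DEMO_MAX_NON_LFS ((INT64_C(1) << 31) - 1)

/* Return 1 if a write starting at pos is beyond the size limit. */
static int demo_write_start_too_large(int64_t pos, int64_t limit)
{
    assert(pos >= 0 && limit >= 0);
    return pos >= limit;
}
```

Raising the limit to the large-file maximum makes the same starting offset acceptable, which corresponds to the change of s_maxbytes described above.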
     

17 May, 2019

3 commits

  • Pass the server and volume break counts from before the status fetch
    operation that queried the attributes of a file into afs_iget5_set() so
    that the new vnode's break counters can be initialised appropriately.

    This allows detection of a volume or server break that happened whilst we
    were fetching the status or setting up the vnode.

    Fixes: c435ee34551e ("afs: Overhaul the callback handling")
    Signed-off-by: David Howells

    David Howells
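
The reason for sampling the break counts before the status fetch can be seen in a small model. The names are hypothetical and only one counter is shown; the commit passes both server and volume counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_volume { unsigned int cb_break; };      /* bumped on each break */
struct demo_vnode  { unsigned int cb_break_seen; }; /* count the vnode was set up with */

static unsigned int demo_snapshot_break(const struct demo_volume *vol)
{
    return vol->cb_break;
}

/* A callback break arriving from the server. */
static void demo_volume_break(struct demo_volume *vol)
{
    vol->cb_break++;
}

/* Initialise a new vnode with a previously sampled break count. */
static void demo_vnode_init(struct demo_vnode *vnode, unsigned int snapshot)
{
    assert(vnode != NULL);
    vnode->cb_break_seen = snapshot;
}

/* The vnode's cached state is only trusted if no break has happened since. */
static bool demo_vnode_valid(const struct demo_vnode *vnode,
                             const struct demo_volume *vol)
{
    return vnode->cb_break_seen == vol->cb_break;
}
```

If the count were sampled only when the vnode is set up, a break that arrived during the fetch would already be included in the sample and would go unnoticed.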
     
  • Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
    vnode->cb_interest. Use that change to allow afs_check_validity() to use
    read_seqbegin_or_lock() instead of read_seqlock_excl().

    This also requires the caller of afs_check_validity() to hold the RCU read
    lock across the call.

    Signed-off-by: David Howells

    David Howells
     
  • Don't save callback version and type fields as the version is about the
    format of the callback information and the type is relative to the
    particular RPC call.

    Signed-off-by: David Howells

    David Howells
     

16 May, 2019

2 commits

  • When applying the status and callback in the response of an operation,
    apply them in the same critical section so that there's no race between
    checking the callback state and checking status-dependent state (such as
    the data version).

    Fix this by:

    (1) Allocating a joint {status,callback} record (afs_status_cb) before
    calling the RPC function for each vnode for which the RPC reply
    contains a status or a status plus a callback. A flag is set in the
    record to indicate if a callback was actually received.

    (2) These records are passed into the RPC functions to be filled in. The
    afs_decode_status() and yfs_decode_status() functions are removed and
    the cb_lock is no longer taken.

    (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
    update the vnode.

    (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
    the vnode.

    (5) vnodes, expected data-version numbers and callback break counters
    (cb_break) no longer need to be passed to the reply delivery
    functions.

    Note that, for the moment, the file locking functions still need
    access to both the call and the vnode at the same time.

    (6) afs_vnode_commit_status() is now given the cb_break value and the
    expected data_version and the task of applying the status and the
    callback to the vnode is now done here.

    This is done under a single taking of vnode->cb_lock.

    (7) afs_pages_written_back() is now called by afs_store_data() rather than
    by the reply delivery function.

    afs_pages_written_back() has been moved to before the call point and
    is now given the first and last page numbers rather than a pointer to
    the call.

    (8) The indicator from YFS.RemoveFile2 as to whether the target file
    actually got removed (status.abort_code == VNOVNODE) rather than
    merely dropping a link is now checked in afs_unlink rather than in
    xdr_decode_YFSFetchStatus().

    Supplementary fixes:

    (*) afs_cache_permit() now gets the caller_access mask from the
    afs_status_cb object rather than picking it out of the vnode's status
    record. afs_fetch_status() returns caller_access through its argument
    list for this purpose also.

    (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
    than a read lock and now sets the callback inside the same critical
    section.

    Fixes: c435ee34551e ("afs: Overhaul the callback handling")
    Signed-off-by: David Howells

    David Howells
     
  • Make certain RPC operations non-interruptible, including:

    (*) Set attributes
    (*) Store data

    We don't want to get interrupted during a flush on close, flush on
    unlock, writeback or an inode update, leaving us in a state where we
    still need to do the writeback or update.

    (*) Extend lock
    (*) Release lock

    We don't want to get lock extension interrupted as the file locks on
    the server are time-limited. Interruption during lock release is less
    of an issue since the lock is time-limited, but it's better to
    complete the release to avoid a several-minute wait to recover it.

    *Setting* the lock isn't a problem if it's interrupted since we can
    just return to the user and tell them they were interrupted - at
    which point they can elect to retry.

    (*) Silly unlink

    We want to remove silly unlink files if we can, rather than leaving
    them for the salvager to clear up.

    Note that whilst these calls are no longer interruptible, they do have
    timeouts on them, so if the server stops responding the call will fail with
    something like ETIME or ECONNRESET.

    Without this, the following:

    kAFS: Unexpected error from FS.StoreData -512

    appears in dmesg when a pending store data gets interrupted and some
    processes may just hang.

    Additionally, make the code that checks/updates the server record ignore
    failure due to interruption if the main call is uninterruptible and if the
    server has an address list. The next op will check it again since the
    expiration time on the old list has passed.

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Reported-by: Jonathan Billings
    Reported-by: Marc Dionne
    Signed-off-by: David Howells

    David Howells
     

08 May, 2019

1 commit

  • Pull AFS updates from David Howells:
    "A set of fix and development patches for AFS for 5.2.

    Summary:

    - Fix the AFS file locking so that sqlite can run on an AFS mount and
    also so that firefox and gnome can use a homedir that's mounted
    through AFS.

    This required emulation of fine-grained locking when the server
    will only support whole-file locks and no upgrade/downgrade. Four
    modes are provided, settable by mount parameter:

    "flock=local" - No reference to the server

    "flock=openafs" - Fine-grained locks are local-only, whole-file
    locks require sufficient server locks

    "flock=strict" - All locks require sufficient server locks

    "flock=write" - Always get an exclusive server lock

    If the volume is a read-only or backup volume, then flock=local for
    that volume.

    - Log extra information for a couple of cases where the client mucks
    up somehow: AFS vnode with undefined type and dir check failure -
    in both cases we seem to end up with unfilled data, but the issues
    happen infrequently and are difficult to reproduce at will.

    - Implement silly rename for unlink() and rename().

    - Set i_blocks so that du can get some information about usage.

    - Fix xattr handlers to return the right amount of data and to not
    overflow buffers.

    - Implement getting/setting raw AFS and YFS ACLs as xattrs"

    * tag 'afs-next-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    afs: Implement YFS ACL setting
    afs: Get YFS ACLs and information through xattrs
    afs: implement acl setting
    afs: Get an AFS3 ACL as an xattr
    afs: Fix getting the afs.fid xattr
    afs: Fix the afs.cell and afs.volume xattr handlers
    afs: Calculate i_blocks based on file size
    afs: Log more information for "kAFS: AFS vnode with undefined type\n"
    afs: Provide mount-time configurable byte-range file locking emulation
    afs: Add more tracepoints
    afs: Implement sillyrename for unlink and rename
    afs: Add directory reload tracepoint
    afs: Handle lock rpc ops failing on a file that got deleted
    afs: Improve dir check failure reports
    afs: Add file locking tracepoints
    afs: Further fix file locking
    afs: Fix AFS file locking to allow fine grained locks
    afs: Calculate lock extend timer from set/extend reply reception
    afs: Split wait from afs_make_call()

    Linus Torvalds
     

07 May, 2019

1 commit


02 May, 2019

1 commit

  • debugging printks left in ->destroy_inode() and so's the
    update of inode count; we could take the latter to RCU-delayed
    part (would take only moving the check on module exit past
    rcu_barrier() there), but debugging output ought to either
    stay where it is or go into ->evict_inode()

    Signed-off-by: Al Viro

    Al Viro
     

25 Apr, 2019

3 commits

  • Provide byte-range file locking emulation that can be configured at mount
    time to one of four modes:

    (1) flock=local. Locking is done locally only and no reference is made to
    the server.

    (2) flock=openafs. Byte-range locking is done locally only; whole-file
    locking is done with reference to the server. Whole-file locks cannot
    be upgraded unless the client holds an exclusive lock.

    (3) flock=strict. Byte-range and whole-file locking both require a
    sufficient whole-file lock on the server.

    (4) flock=write. As strict, but the client always gets an exclusive
    whole-file lock on the server.

    Signed-off-by: David Howells

    David Howells
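
Mapping the flock= value to one of the four modes is a small table lookup. The sketch below uses the four option strings named above; the enum, table and function names are hypothetical, and returning -EINVAL for an unknown value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum demo_flock_mode {
    DEMO_FLOCK_LOCAL,    /* no reference to the server */
    DEMO_FLOCK_OPENAFS,  /* byte-range local; whole-file via server */
    DEMO_FLOCK_STRICT,   /* all locks need a sufficient server lock */
    DEMO_FLOCK_WRITE,    /* as strict, but always exclusive on server */
};

static const struct {
    const char          *name;
    enum demo_flock_mode mode;
} demo_flock_table[] = {
    { "local",   DEMO_FLOCK_LOCAL },
    { "openafs", DEMO_FLOCK_OPENAFS },
    { "strict",  DEMO_FLOCK_STRICT },
    { "write",   DEMO_FLOCK_WRITE },
};

/* Parse the value of a flock= mount parameter; 0 on success, -EINVAL otherwise. */
static int demo_parse_flock(const char *value, enum demo_flock_mode *mode)
{
    size_t n = sizeof(demo_flock_table) / sizeof(demo_flock_table[0]);

    assert(value != NULL && mode != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(value, demo_flock_table[i].name) == 0) {
            *mode = demo_flock_table[i].mode;
            return 0;
        }
    }
    return -EINVAL;
}
```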
     
  • Add four more tracepoints:

    (1) afs_make_fs_call1 - Split from afs_make_fs_call but takes a filename
    to log also.

    (2) afs_make_fs_call2 - Like the above but takes two filenames to log.

    (3) afs_lookup - Log the result of doing a successful lookup, including a
    negative result (fid 0:0).

    (4) afs_get_tree - Log the set up of a volume for mounting.

    It also extends the name buffer on the afs_edit_dir tracepoint to 24 chars
    and puts quotes around the filename in the text representation.

    Signed-off-by: David Howells

    David Howells
     
  • Implement sillyrename for AFS unlink and rename, using the NFS variant
    implementation as a basis.

    Note that the asynchronous file locking extender/releaser has to be
    notified with a state change to stop it complaining if there's a race
    between that and the actual file deletion.

    A tracepoint, afs_silly_rename, is also added to note the silly rename and
    the cleanup. The afs_edit_dir tracepoint is given some extra reason
    indicators and the afs_flock_ev tracepoint is given a silly-delete file
    lock cancellation indicator.

    Signed-off-by: David Howells

    David Howells
     

13 Mar, 2019

2 commits

  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logics), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows to do followup work in independent branches, which
    should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows to
    shrink it considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     
  • All users of VM_MAX_READAHEAD actually convert it to kbytes and then to
    pages. Define the macro explicitly as (SZ_128K / PAGE_SIZE). This
    simplifies the expression in every filesystem. Also rename the macro to
    VM_READAHEAD_PAGES to properly convey its meaning. Finally, remove the
    unused VM_MIN_READAHEAD.
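
    The arithmetic behind this change can be sketched in user space. This is
    an illustration only: the sketch_* names are hypothetical, the page size
    is passed as a parameter instead of using the kernel's PAGE_SIZE, and the
    old value of 128 kbytes is assumed from the SZ_128K in the new definition.

```c
#include <assert.h>

/* Stand-in for the kernel's SZ_128K (128 * 1024 bytes). */
#define SKETCH_SZ_128K 0x00020000UL

/*
 * Old style (assumed): the limit is kept in kbytes and every caller
 * converts it to bytes and then to pages.
 */
static unsigned long sketch_old_readahead_pages(unsigned long max_kb,
                                                unsigned long page_size)
{
    assert(page_size > 0);
    return (max_kb * 1024) / page_size;
}

/* New style: the constant is defined directly as a page count. */
static unsigned long sketch_readahead_pages(unsigned long page_size)
{
    assert(page_size > 0);
    return SKETCH_SZ_128K / page_size;
}
```

    With 4096-byte pages both forms give 32 pages; the new form simply
    removes the conversion from each call site.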

    [akpm@linux-foundation.org: fix fs/io_uring.c, per Stephen]
    Link: http://lkml.kernel.org/r/20181221144053.24318-1-nborisov@suse.com
    Signed-off-by: Nikolay Borisov
    Reviewed-by: Matthew Wilcox
    Reviewed-by: David Hildenbrand
    Cc: Jens Axboe
    Cc: Eric Van Hensbergen
    Cc: Latchesar Ionkov
    Cc: Dominique Martinet
    Cc: David Howells
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: David Sterba
    Cc: Miklos Szeredi
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikolay Borisov
     

28 Feb, 2019

2 commits

  • Alter the AFS automounting code to create and modify an fs_context struct
    when parameterising a new mount triggered by an AFS mountpoint rather than
    constructing device name and option strings.

    Also remove the cell=, vol= and rwpath options as they are then redundant.
    They existed because the 'device name' may be derived literally from a
    mountpoint object in the filesystem, so default cell and parent-type
    information needed to be passed in by some other method from the
    automount routines. The vol= option didn't end up being used.

    Signed-off-by: David Howells
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     
  • Add fs_context support to the AFS filesystem, converting the parameter
    parsing to store options there.

    This will form the basis for namespace propagation over mountpoints within
    the AFS model, thereby allowing AFS to be used in containers more easily.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

24 Oct, 2018

1 commit

  • Increase the size of the volume ID to 64 bits and of the vnode ID (inode
    number equivalent) to 96 bits to allow the support of YFS.

    This requires the iget comparator to check the vnode->fid rather than i_ino
    and i_generation as i_ino is not sufficiently capacious. It also requires
    this data to be placed into the vnode cache key for fscache.

    For the moment, just discard the top 32 bits of the vnode ID when returning
    it through stat.
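
    A minimal user-space sketch of why the comparator has to look at the
    whole file ID follows. The struct layout and the sketch_* names are
    assumptions for illustration, not the kernel's definitions; the
    uniquifier field stands in for what i_generation used to carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a file ID with a 96-bit vnode ID. */
struct sketch_fid {
    uint64_t vid;       /* 64-bit volume ID */
    uint64_t vnode;     /* low 64 bits of the vnode ID */
    uint32_t vnode_hi;  /* high 32 bits of the vnode ID */
    uint32_t unique;    /* uniquifier (assumed counterpart of i_generation) */
};

/* Comparator: every field of the file ID must match. */
static bool sketch_fid_equal(const struct sketch_fid *a,
                             const struct sketch_fid *b)
{
    assert(a && b);
    return a->vid == b->vid &&
           a->vnode == b->vnode &&
           a->vnode_hi == b->vnode_hi &&
           a->unique == b->unique;
}

/* Value reported through stat: the top 32 bits are discarded. */
static uint64_t sketch_stat_ino(const struct sketch_fid *fid)
{
    assert(fid);
    return fid->vnode;
}
```

    Two file IDs that differ only in vnode_hi report the same inode number
    here, which is why a lookup keyed on the inode number alone could not
    tell them apart.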

    Signed-off-by: David Howells

    David Howells
     

15 Jun, 2018

1 commit

  • Alter the dynroot mount so that cells created by manipulation of
    /proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
    cell as a module parameter will cause directories for those cells to be
    created in the dynamic root superblock for the network namespace[*].

    To this end:

    (1) Only one dynamic root superblock is now created per network namespace
    and this is shared between all attempts to mount it. This makes it
    easier to find the superblock to modify.

    (2) When a dynamic root superblock is created, the list of cells is walked
    and directories created for each cell already defined.

    (3) When a new cell is added, if a dynamic root superblock exists, a
    directory is created for it.

    (4) When a cell is destroyed, the directory is removed.

    (5) These directories are created by calling lookup_one_len() on the root
    dir which automatically creates them if they don't exist.

    [*] Inasmuch as network namespaces are currently supported here.

    Signed-off-by: David Howells

    David Howells
     

03 Jun, 2018

1 commit


23 May, 2018

1 commit


14 May, 2018

2 commits

  • It's possible for an AFS file server to issue a whole-volume notification
    that callbacks on all the vnodes in the volume have been broken. This is
    done for R/O and backup volumes (which don't have per-file callbacks) and
    for things like a volume being taken offline.

    Fix callback handling to detect whole-volume notifications, to track it
    across operations and to check it during inode validation.
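
    One way to picture the tracking is a per-volume break counter that each
    vnode snapshots when it obtains a callback. This is a sketch under that
    assumption: the commit message does not spell out the mechanism, and all
    sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical volume record: counts whole-volume callback breaks. */
struct sketch_volume {
    unsigned int cb_v_break;
};

/* Hypothetical vnode record: remembers the counter seen at callback time. */
struct sketch_vnode {
    struct sketch_volume *volume;
    unsigned int cb_v_break_seen;
    bool has_callback;
};

/* The server announced that every callback in the volume is broken. */
static void sketch_break_volume(struct sketch_volume *vol)
{
    assert(vol);
    vol->cb_v_break++;
}

/* An operation returned a fresh callback: snapshot the volume counter. */
static void sketch_record_callback(struct sketch_vnode *vnode)
{
    assert(vnode && vnode->volume);
    vnode->cb_v_break_seen = vnode->volume->cb_v_break;
    vnode->has_callback = true;
}

/* Inode validation: the callback only counts if no break happened since. */
static bool sketch_vnode_valid(const struct sketch_vnode *vnode)
{
    assert(vnode && vnode->volume);
    return vnode->has_callback &&
           vnode->cb_v_break_seen == vnode->volume->cb_v_break;
}
```

    The appeal of a counter is that a single increment invalidates every
    vnode in the volume without walking them.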

    Fixes: c435ee34551e ("afs: Overhaul the callback handling")
    Signed-off-by: David Howells

    David Howells
     
  • The afs directory loading code (primarily afs_read_dir()) locks all the
    pages that hold a directory's content blob to defend against
    getdents/getdents races and getdents/lookup races where the competitors
    issue conflicting reads on the same data. As the reads will complete
    consecutively, they may retrieve different versions of the data and
    one may overwrite the data that the other is busy parsing.

    Fix this by not locking the pages at all, but rather by turning the
    validation lock into an rwsem and getting an exclusive lock on it whilst
    reading the data or validating the attributes and a shared lock whilst
    parsing the data. Sharing the attribute validation lock should be fine as
    the data fetch will retrieve the attributes also.

    The individual page locks aren't needed at all as the only place they're
    being used is to serialise data loading.

    Without this patch, the:

    if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
            ...
    }

    part of afs_read_dir() may be skipped, leaving the pages unlocked when we
    hit the success: clause - in which case we try to unlock the not-locked
    pages, leading to the following oops:

    page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
    flags: 0xfffe000001004(referenced|private)
    raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
    raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
    page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
    page->mem_cgroup:ffff98156b27c000
    ------------[ cut here ]------------
    kernel BUG at mm/filemap.c:1205!
    ...
    RIP: 0010:unlock_page+0x43/0x50
    ...
    Call Trace:
    afs_dir_iterate+0x789/0x8f0 [kafs]
    ? _cond_resched+0x15/0x30
    ? kmem_cache_alloc_trace+0x166/0x1d0
    ? afs_do_lookup+0x69/0x490 [kafs]
    ? afs_do_lookup+0x101/0x490 [kafs]
    ? key_default_cmp+0x20/0x20
    ? request_key+0x3c/0x80
    ? afs_lookup+0xf1/0x340 [kafs]
    ? __lookup_slow+0x97/0x150
    ? lookup_slow+0x35/0x50
    ? walk_component+0x1bf/0x490
    ? path_lookupat.isra.52+0x75/0x200
    ? filename_lookup.part.66+0xa0/0x170
    ? afs_end_vnode_operation+0x41/0x60 [kafs]
    ? __check_object_size+0x9c/0x171
    ? strncpy_from_user+0x4a/0x170
    ? vfs_statx+0x73/0xe0
    ? __do_sys_newlstat+0x39/0x70
    ? __x64_sys_getdents+0xc9/0x140
    ? __x64_sys_getdents+0x140/0x140
    ? do_syscall_64+0x5b/0x160
    ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: f3ddee8dc4e2 ("afs: Fix directory handling")
    Reported-by: Marc Dionne
    Signed-off-by: David Howells

    David Howells