15 Sep, 2018

1 commit


22 Aug, 2018

1 commit

  • If we knew that the file was empty, we wouldn't be asking for a layout.
    Any optimisation here is already done before calling pnfs_update_layout().
    As it stands, we sometimes end up doing an unnecessary inband read to
    the MDS even when holding a layout.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     

17 Aug, 2018

2 commits


09 Aug, 2018

3 commits


27 Jul, 2018

4 commits


12 Jun, 2018

1 commit

  • Currently, when I/O to the DS fails, the client returns the layout and
    retries against the MDS. However, on unmount (inode eviction) it then
    returns the layout again.

    This is because pnfs_return_layout() was changed in
    commit d78471d32bb6 ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR")
    to always set NFS_LAYOUT_RETURN_REQUESTED, so even if we have already
    returned the layout, it will be returned again. Instead, let's also check
    whether we have already marked the layout invalid.

    Signed-off-by: Olga Kornievskaia
    Signed-off-by: Trond Myklebust

    Olga Kornievskaia
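To illustrate the check, here is a minimal userspace C sketch that models the layout state as a bitmask. The flag names and `layout_needs_return()` are hypothetical stand-ins rather than the kernel's definitions; the sketch only shows the decision itself: a requested return is skipped when the layout has already been marked invalid.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; NOT the kernel's enum values. */
enum {
	LAYOUT_RETURN_REQUESTED = 1u << 0, /* a return has been asked for */
	LAYOUT_INVALID          = 1u << 1, /* layout already returned/invalidated */
};

/*
 * Return true only if a return was requested AND the layout has not
 * already been invalidated (e.g. returned after a DS I/O error).
 */
static bool layout_needs_return(unsigned int flags)
{
	if (!(flags & LAYOUT_RETURN_REQUESTED))
		return false;
	return !(flags & LAYOUT_INVALID);
}
```

With only the request bit set the function returns true; once the invalid bit is also set, eviction no longer triggers a second LAYOUTRETURN in this model.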
     

01 Jun, 2018

14 commits


09 Mar, 2018

1 commit


15 Jan, 2018

2 commits

  • Currently, when falling back to doing I/O through the MDS (via
    pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
    without releasing the reference taken on the dreq
    via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
    nfs_direct_pgio_init. It then takes another reference on the dreq via
    nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
    as a result the requester will become stuck in inode_dio_wait. Once
    that happens, other processes accessing the inode will become stuck as
    well.

    Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
    up correctly by calling hdr->completion_ops->completion() instead of
    calling hdr->release() directly.

    This can be reproduced (sometimes) by performing "storage failover
    takeover" commands on a NetApp filer while doing direct I/O from a client.

    This can also be reproduced using SystemTap to simulate a failure while
    doing direct I/O from a client (from Dave Wysochanski):

    stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'

    Suggested-by: Trond Myklebust
    Signed-off-by: Scott Mayhew
    Fixes: 1ca018d28d ("pNFS: Fix a memory leak when attempted pnfs fails")
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Scott Mayhew
     
  • pNFS block/SCSI layouts should gracefully handle cases where block devices
    are not available when a layout is retrieved, or the block devices are
    removed while the client holds a layout.

    While setting up a layout segment, keep a record of an unavailable or
    unparsable block device in the cache with a flag so that subsequent layouts
    do not spam the server with GETDEVINFO. We can reuse the current
    NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing
    the device, we will discard it and send a fresh GETDEVINFO after the
    timeout, since the lookup and validation of the device occurs within the
    GETDEVINFO response handling.

    A lookup of a layout segment that references an unavailable device will
    return a segment with the NFS_LSEG_UNAVAILABLE flag set. This will allow
    the pgio layer to mark the layout with the appropriate fail bit, which
    forces subsequent IO to the MDS and prevents spamming the server with
    LAYOUTGET and LAYOUTRETURN.

    Finally, when IO to a block device fails, look up the block device(s)
    referenced by the pgio header, and mark them as unavailable.

    Signed-off-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
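The reference imbalance described in the first commit above can be modelled in a few lines of userspace C. This is a toy sketch, not kernel code: `struct dreq_model`, `struct hdr_model`, `hdr_init()`, `hdr_release()` and `hdr_completion()` are hypothetical stand-ins for the dreq, nfs_pgheader_init(), hdr->release() and hdr->completion_ops->completion() respectively.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not nfs_direct_req/nfs_pgio_header. */
struct dreq_model {
	int io_count; /* outstanding I/O references; the waiter sleeps while > 0 */
};

struct hdr_model {
	struct dreq_model *dreq;
	bool released;
};

/* Analogue of header init for direct I/O: each header pins the dreq. */
static void hdr_init(struct hdr_model *hdr, struct dreq_model *dreq)
{
	hdr->dreq = dreq;
	hdr->released = false;
	dreq->io_count++;
}

/* Frees the header only; the dreq reference is NOT dropped. */
static void hdr_release(struct hdr_model *hdr)
{
	hdr->released = true;
}

/* Completion path: drop the dreq reference, then free the header. */
static void hdr_completion(struct hdr_model *hdr)
{
	hdr->dreq->io_count--;
	hdr_release(hdr);
}

/* The waiter (an inode_dio_wait analogue) can only proceed at zero. */
static bool dio_wait_would_finish(const struct dreq_model *dreq)
{
	return dreq->io_count == 0;
}
```

In the buggy sequence, the pNFS header is freed with hdr_release() before falling back to the MDS, so its reference is never dropped and io_count stays at 1 even after the MDS header completes. Routing both headers through hdr_completion() brings the count back to 0.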
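The device-caching policy in the block/SCSI layout commit above can be sketched as a small state check. In this userspace model, `struct devid_model`, `devid_lookup()`, `devid_mark_unavailable()` and the 120-second `DEVID_UNAVAIL_TIMEOUT` are assumptions for illustration, not the kernel's device-ID code. While an entry is marked unavailable and the timeout has not expired, the lookup tells the caller to send I/O to the MDS; after the timeout the entry is discarded and a fresh GETDEVINFO is sent, rather than the device being reused.

```c
#include <stdbool.h>

/* Hypothetical model of a cached device ID entry. */
struct devid_model {
	bool unavailable; /* device could not be found or parsed */
	long marked_at;   /* time (seconds) it was marked unavailable */
};

/* Assumed timeout for illustration only. */
#define DEVID_UNAVAIL_TIMEOUT 120L

enum devid_action {
	DEVID_USE,         /* device usable: build the layout segment */
	DEVID_FAIL_TO_MDS, /* still in timeout: segment flagged unavailable */
	DEVID_REFETCH,     /* timeout expired: discard entry, new GETDEVINFO */
};

/* Called on a failed device lookup/parse or a failed block-device I/O. */
static void devid_mark_unavailable(struct devid_model *d, long now)
{
	d->unavailable = true;
	d->marked_at = now;
}

static enum devid_action devid_lookup(const struct devid_model *d, long now)
{
	if (!d->unavailable)
		return DEVID_USE;
	if (now - d->marked_at < DEVID_UNAVAIL_TIMEOUT)
		return DEVID_FAIL_TO_MDS;
	return DEVID_REFETCH;
}
```

Discarding instead of reusing matters here because, as the commit notes, the device is looked up and validated during GETDEVINFO response handling, so only a fresh GETDEVINFO re-runs that validation.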
     

18 Nov, 2017

4 commits

  • If our layoutreturn on close operation returns an NFS4ERR_OLD_STATEID,
    then try to update the stateid and retry. We know that there should
    be no further LAYOUTGET requests being launched.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Bool initializations should use true and false. Bool tests don't need
    comparisons.

    Signed-off-by: Thomas Meyer
    Signed-off-by: Anna Schumaker

    Thomas Meyer
     
  • atomic_t variables are currently used to implement reference
    counters with the following properties:
    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further increments aren't allowed
    - counter schema uses basic atomic operations
      (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situations and be exploitable.

    The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    Suggested-by: Kees Cook
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Signed-off-by: Elena Reshetova
    Signed-off-by: Anna Schumaker

    Elena Reshetova
     
  • The refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Anna Schumaker

    Elena Reshetova
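The refcount_t properties listed in the commits above (no increment from zero, no wrap-around on overflow or underflow) can be approximated in userspace with C11 atomics. This is a hedged sketch: `refcount_model`, its functions and `REFCOUNT_MODEL_SATURATED` are hypothetical, and the saturation value and behaviour are simplified compared with the kernel's refcount_t, which also emits a warning when it saturates.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of refcount_t-style semantics; not the kernel code. */
typedef struct {
	atomic_uint refs;
} refcount_model;

#define REFCOUNT_MODEL_SATURATED UINT_MAX

static void refcount_model_set(refcount_model *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/* Increment unless the count is zero (object being freed) or saturated. */
static bool refcount_model_inc_not_zero(refcount_model *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false; /* refuse to resurrect a freed object */
		if (old == REFCOUNT_MODEL_SATURATED)
			return true;  /* stay pinned; never wrap around to 0 */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Decrement; return true only when this call dropped the last reference. */
static bool refcount_model_dec_and_test(refcount_model *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		/* Underflow or saturated: never report "free me". */
		if (old == 0 || old == REFCOUNT_MODEL_SATURATED)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

The compare-and-exchange loops make the zero and saturation checks part of the same atomic update, which is what a plain atomic_inc()/atomic_dec() cannot guarantee; a saturated object leaks rather than being freed early.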
     

12 Sep, 2017

1 commit


09 Sep, 2017

1 commit

  • The writeback code wants to send a commit after processing the pages,
    which is why we want to delay releasing the struct path until after
    that's done.

    Also, the layout code expects that we do not free the inode before
    we've put the layout segments in pnfs_writehdr_free() and
    pnfs_readhdr_free().

    Fixes: 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")
    Fixes: 4714fb51fd03 ("nfs: remove pgio_header refcount, related cleanup")
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

15 Aug, 2017

1 commit


24 May, 2017

1 commit

  • It's possible and acceptable for NFS to attempt to add requests beyond the
    range of the current pgio->pg_lseg, a case which should be caught and
    limited by the pg_test operation. However, the current handling of this
    case replaces pgio->pg_lseg with a new layout segment (after a WARN) within
    that pg_test operation. That will cause all the previously added requests
    to be submitted with this new layout segment, which may not be valid for
    those requests.

    Fix this problem by only returning zero for the number of bytes to coalesce
    from pg_test for this case, which allows any previously added requests to
    complete on the current layout segment. The check for requests starting
    out of range of the layout segment moves to pg_init, so that the
    replacement of pgio->pg_lseg will be done when the next request is added.

    Signed-off-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
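The pg_test side of this fix amounts to refusing to coalesce a request the current segment does not cover. A minimal sketch, assuming a hypothetical type (`struct lseg_range_model`) and helper (`lseg_coalesce_bytes()`) that do not appear in the kernel: it returns 0 when the request starts outside the segment, so coalescing stops and the requests already added are submitted on the current segment. This sketch also trims the size to the bytes remaining in the segment, and treats an offset+length overflow as extending to the end of the 64-bit range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified segment range; not pnfs_layout_range. */
struct lseg_range_model {
	uint64_t offset;
	uint64_t length; /* bytes covered by the layout segment */
};

/* Bytes of the request [req_off, req_off + req_len) that may be
 * coalesced onto the current segment; 0 means "stop coalescing". */
static size_t lseg_coalesce_bytes(const struct lseg_range_model *seg,
				  uint64_t req_off, size_t req_len)
{
	uint64_t seg_end;

	/* Clamp instead of wrapping when offset + length overflows. */
	if (seg->length > UINT64_MAX - seg->offset)
		seg_end = UINT64_MAX;
	else
		seg_end = seg->offset + seg->length;

	/* Request starts outside this segment: do not replace the segment
	 * here; leave that to the next pg_init. */
	if (req_off < seg->offset || req_off >= seg_end)
		return 0;

	/* Fewer bytes left in the segment than requested: trim. */
	if (seg_end - req_off < req_len)
		return (size_t)(seg_end - req_off);
	return req_len;
}
```

Returning 0 keeps the decision to pick a new segment out of pg_test, which matches the commit's point that swapping pgio->pg_lseg there would move already-queued requests onto a segment that may not cover them.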
     

03 May, 2017

2 commits

  • Consider the following deadlock:

    Process P1                  Process P2                        Process P3
    ==========                  ==========                        ==========
                                                                  lock_page(page)

                                lseg = pnfs_update_layout(inode)

    lo = NFS_I(inode)->layout
    pnfs_error_mark_layout_for_return(lo)

                                lock_page(page)

                                                                  lseg = pnfs_update_layout(inode)

    In this scenario,
    - P1 has declared the layout to be in error, but P2 holds a reference to
      a layout segment on that inode, so the layoutreturn is deferred.
    - P2 is waiting for a page lock held by P3.
    - P3 is asking for a new layout segment, but is blocked waiting
      for the layoutreturn.

    The fix is to ensure that pnfs_error_mark_layout_for_return() does
    not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow
    the latter to call LAYOUTGET so that it can make progress and unblock
    P2.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout
    return info if there are new segments queued for return due to, for
    instance, a race between a LAYOUTRETURN and a failed I/O attempt.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

29 Apr, 2017

1 commit