Eric Lee / smarc-fsl-linux-kernel

15 Sep, 2018

1 commit

d03360aaf pNFS: Ensure we return the error if someone kills a waiting layoutget

If someone interrupts a wait on one or more outstanding layoutgets in
pnfs_update_layout() then return the ERESTARTSYS/EINTR error.

Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2018-09-15 04:24:08 +0800

22 Aug, 2018

1 commit

0af4c8be9 pNFS: Remove unwanted optimisation of layoutget

If we knew that the file was empty, we wouldn't be asking for a layout.
Any optimisation here is already done before calling pnfs_update_layout().
As it stands, we sometimes end up doing an unnecessary inband read to
the MDS even when holding a layout.

Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2018-08-22 01:39:08 +0800

17 Aug, 2018

2 commits

ea51f94b4 pNFS: Treat RECALLCONFLICT like DELAY...

Yes, it is possible to get trapped in a loop, but the server should be
administratively revoking the recalled layout if it never gets returned.

Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2018-08-17 01:47:09 +0800
ecf840260 pNFS: When updating the stateid in layoutreturn, also update the recall range

When we update the layout stateid in nfs4_layoutreturn_refresh_stateid, we
should also update the range in order to let the server know we're actually
returning everything.
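
A minimal sketch of the idea, assuming that "returning everything" means a whole-file range (offset 0, maximum length). The struct and function names are hypothetical stand-ins, not the kernel's own:

```c
#include <stdint.h>

/* Stands in for NFS4_MAX_UINT64; assumed to mean "to end of file". */
#define EX_MAX_UINT64 UINT64_MAX

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct ex_stateid { uint32_t seqid; };
struct ex_range { uint64_t offset; uint64_t length; };

struct ex_layoutreturn_args {
    struct ex_stateid stateid;
    struct ex_range range;
};

/*
 * When the stateid is refreshed to the latest seqid, widen the range in
 * the same step so the retried LAYOUTRETURN covers the whole file.
 */
static void ex_refresh_layoutreturn(struct ex_layoutreturn_args *args,
                                    const struct ex_stateid *latest)
{
    args->stateid.seqid = latest->seqid;
    args->range.offset = 0;
    args->range.length = EX_MAX_UINT64;
}
```

Updating both fields together keeps the retried request self-consistent: a newer seqid paired with a stale, narrower range would describe less than the client is actually giving back.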

Fixes: 16c278dbfa63 ("pnfs: Fix handling of NFS4ERR_OLD_STATEID replies...")
Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2018-08-17 01:29:36 +0800

09 Aug, 2018

3 commits

10db5b7a2 pnfs: Use true and false for boolean values

Return statements in functions returning bool should use true or false
instead of an integer value.

This issue was detected with the help of Coccinelle.
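
As a small illustration of the pattern being cleaned up (the function and flag names are hypothetical and not taken from the patch):

```c
#include <stdbool.h>

/* Hypothetical flag bit, used only for this illustration. */
#define EXAMPLE_FLAG_VALID 0x1u

/* Before: a bool function that returns integer literals. */
static bool example_is_valid_old(unsigned int flags)
{
    if (flags & EXAMPLE_FLAG_VALID)
        return 1;
    return 0;
}

/* After: identical behaviour, but the return values are bool constants. */
static bool example_is_valid_new(unsigned int flags)
{
    if (flags & EXAMPLE_FLAG_VALID)
        return true;
    return false;
}
```

Both versions compile to the same thing; the change is purely about making the return type obvious to the reader.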

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: Anna Schumaker

Gustavo A. R. Silva
2018-08-09 04:50:03 +0800
2230ca0d2 pnfs: pnfs_find_lseg() should not check NFS_LSEG_LAYOUTRETURN

Layout segment validity is determined only by the NFS_LSEG_VALID flag. If
it is set, the layout segment is findable. As it is, when the flexfiles
driver sets NFS_LSEG_LAYOUTRETURN to indicate that we cannot discard
the layout segment, but that it must be returned, then this can result
in an unnecessary layoutget storm.
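
The rule can be sketched as a single flag test. The bit values below are illustrative only (the kernel defines its own), and the helper name is hypothetical:

```c
#include <stdbool.h>

/* Illustrative bit masks; the real definitions live in the kernel headers. */
enum {
    EX_LSEG_VALID        = 1u << 0,
    EX_LSEG_LAYOUTRETURN = 1u << 1,
};

/*
 * A segment is findable whenever VALID is set. LAYOUTRETURN is deliberately
 * not consulted: a segment that must eventually be returned is still usable
 * until it is actually invalidated.
 */
static bool ex_lseg_is_findable(unsigned int flags)
{
    return (flags & EX_LSEG_VALID) != 0;
}
```

If the lookup also rejected segments marked for return, every caller would miss the existing segment and ask the server for a new one, which is the layoutget storm described above.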

Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2018-08-09 04:50:03 +0800
c16467dc0 pnfs: Fix handling of NFS4ERR_OLD_STATEID replies to layoutreturn

If the server tells us that our layoutreturn raced with another layout
update, then we must ensure that the new layout segments are not in use
before we resend with an updated layout stateid.

Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2018-08-09 04:50:01 +0800

27 Jul, 2018

4 commits

af9b6d757 pNFS: Parse the results of layoutget on open even if permissions checks fail

Even if the results of the permissions checks failed, we should parse
the results of the layout on open call so that we can return the
layout if required.
Note that we also want to ignore the sequence counter for whether or not
a layout recall occurred. If the recall pertained to our OPEN, then the
callback will know, and will attempt to wait for us to finish processing
anyway.

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-07-27 04:25:25 +0800
411ae722d pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout()

If the old layout was recalled, and we returned NFS4ERR_NOMATCHINGLAYOUT
then we need to wait for all outstanding layoutget calls to complete
before we can send a new one.

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-07-27 04:25:25 +0800
f0b429819 pNFS: Ignore non-recalled layouts in pnfs_layout_need_return()

If a layout has been recalled, then we should fire off a layoutreturn as
soon as all the layout segments that match the recall have been retired.

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-07-27 04:25:25 +0800
e0b7d420f pNFS: Don't discard layout segments that are marked for return

If there are layout segments that are marked for return, then we need
to ensure that pnfs_mark_matching_lsegs_return() does not just
silently discard them, but it should tell the caller that there is a
layoutreturn scheduled.

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-07-27 04:25:24 +0800

12 Jun, 2018

1 commit

93b7f7ad2 skip LAYOUTRETURN if layout is invalid

Currently, when I/O to the DS fails, the client returns the layout and
retries against the MDS. However, on unmount (inode eviction) it then
returns the layout again.

This is because pnfs_return_layout() was changed in
commit d78471d32bb6 ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR")
to always set NFS_LAYOUT_RETURN_REQUESTED so even if we returned
the layout, it will be returned again. Instead, let's also check
if we have already marked the layout invalid.

Signed-off-by: Olga Kornievskaia
Signed-off-by: Trond Myklebust

Olga Kornievskaia
2018-06-12 20:48:04 +0800

01 Jun, 2018

14 commits

32f1c28f3 pnfs: Don't call commit on failed layoutget-on-open

If the layoutget on open call failed, we can't really commit the inode,
so don't bother calling it.

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-06-01 03:03:12 +0800
64294b08f pNFS: Don't send LAYOUTGET on OPEN for read, if we already have cached data

If we're only opening the file for reading, and the file is empty and/or
we already have cached data, then heuristically optimise away the
LAYOUTGET.

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-06-01 03:03:12 +0800
8dc96566c NFSv4/pnfs: Don't switch off layoutget-on-open for transient errors

Ensure that we only switch off the LAYOUTGET operation in the OPEN
compound when the server is truly broken, and/or it is complaining
that the compound is too large.
Currently, we end up turning off the functionality permanently,
even for transient errors such as EACCES or ENOSPC.
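
A rough sketch of the distinction, under stated assumptions: the two EX_ERR_* codes are placeholders for "server does not support this" and "compound too large" (their values are invented here), and the server struct is a hypothetical stand-in for the per-server capability flag:

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder error codes; the values are made up for this sketch. */
enum {
    EX_ERR_NOTSUPP          = 100001, /* server lacks the feature */
    EX_ERR_COMPOUND_TOO_BIG = 100002, /* request or reply too large */
};

/* Hypothetical stand-in for the per-server capability state. */
struct ex_server { bool lgopen_enabled; };

/*
 * Only permanent failures switch the feature off. Anything else, including
 * transient errors such as -EACCES or -ENOSPC, leaves it enabled so that
 * the next OPEN can try again.
 */
static void ex_handle_lgopen_status(struct ex_server *srv, int status)
{
    switch (status) {
    case -EX_ERR_NOTSUPP:
    case -EX_ERR_COMPOUND_TOO_BIG:
        srv->lgopen_enabled = false;
        break;
    default:
        break;
    }
}
```

Listing the permanent cases explicitly and defaulting to "keep trying" means a newly introduced or unexpected error cannot silently disable the optimisation for the lifetime of the mount.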

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-06-01 03:03:11 +0800
d49e0d5b9 NFSv4/pnfs: Ensure pnfs_parse_lgopen() won't try to parse uninitialised data

We need to ensure that pnfs_parse_lgopen() doesn't try to parse a
struct nfs4_layoutget_res that was not filled by a successful call
to decode_layoutget(). This can happen if we performed a cached open,
or if either the OP_ACCESS or OP_GETATTR operations preceding the
OP_LAYOUTGET in the compound returned an error.

By initialising the 'status' field to NFS4ERR_DELAY, we ensure that
pnfs_parse_lgopen() won't try to interpret the structure.
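
The guard can be sketched as follows. The struct is a hypothetical, trimmed-down stand-in for struct nfs4_layoutget_res, and storing the error negated is an assumption that follows the usual kernel convention:

```c
#include <stdbool.h>

/* NFS4ERR_DELAY as defined by the NFSv4 protocol. */
#define EX_NFS4ERR_DELAY 10008

/* Hypothetical, simplified stand-in for struct nfs4_layoutget_res. */
struct ex_layoutget_res {
    int status;              /* 0 only after a successful decode */
    unsigned int layout_len; /* meaningful only when status == 0 */
};

/* Called before the compound is sent: mark the result as "not filled in". */
static void ex_layoutget_res_init(struct ex_layoutget_res *res)
{
    res->status = -EX_NFS4ERR_DELAY;
    res->layout_len = 0;
}

/* The parser only looks at the payload when the decode succeeded. */
static bool ex_should_parse(const struct ex_layoutget_res *res)
{
    return res->status == 0;
}
```

If the decoder never runs, for instance because an earlier operation in the compound failed, the status keeps its initial value and the parser skips the structure.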

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-06-01 03:03:11 +0800
30ae2412e pnfs: Fix manipulation of NFS_LAYOUT_FIRST_LAYOUTGET

The flag was not always being cleared after LAYOUTGET on OPEN.

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800
c49b5209f pnfs: Add barrier to prevent lgopen using LAYOUTGET during recall

Since the LAYOUTGET on OPEN can be sent without prior inode information,
existing methods to prevent LAYOUTGET from being sent while processing
CB_LAYOUTRECALL don't work. Track if a recall occurred while LAYOUTGET
was being sent, and if so ignore the results.

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800
6e01260ce pnfs: Stop attempting LAYOUTGET on OPEN on failure

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800
78746a384 pnfs: Add LAYOUTGET to OPEN of an existing file

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800
29a8bfe52 pNFS: Refactor nfs4_layoutget_release()

Move the actual freeing of the struct nfs4_layoutget into fs/nfs/pnfs.c
where it can be reused by the layoutget on open code.

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-06-01 03:03:11 +0800
2409a976a pnfs: Add LAYOUTGET to OPEN of a new file

This triggers when we have no pre-existing inode to attach to.
The pre-existing case is saved for later.

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800
5e36e2a94 pnfs: Change pnfs_alloc_init_layoutget_args call signature

Don't send in a layout, instead use the (possibly NULL) inode.

This is needed for LAYOUTGET attached to an OPEN where the inode is not
yet set.

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800
1b146fcff pnfs: Move nfs4_opendata into nfs4_fs.h

It will be needed now by the pnfs code.

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800
dacb452db pnfs: move allocations out of nfs4_proc_layoutget

They work better in the new alloc_init function.

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800
587f03deb pnfs: refactor send_layoutget

Pull out the alloc/init part for eventual reuse by OPEN.

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust

Fred Isaman
2018-06-01 03:03:11 +0800

09 Mar, 2018

1 commit

9c6376ebd pNFS: Prevent the layout header refcount going to zero in pnfs_roc()

Ensure that we hold a reference to the layout header when processing
the pNFS return-on-close so that the refcount value does not inadvertently
go to zero.

Reported-by: Tigran Mkrtchyan
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org # v4.10+
Tested-by: Tigran Mkrtchyan

Trond Myklebust
2018-03-09 01:56:31 +0800

15 Jan, 2018

2 commits

ba4a76f70 nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds

Currently when falling back to doing I/O through the MDS (via
pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
without releasing the reference taken on the dreq
via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
nfs_direct_pgio_init. It then takes another reference on the dreq via
nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
as a result the requester will become stuck in inode_dio_wait. Once
that happens, other processes accessing the inode will become stuck as
well.

Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
up correctly by calling hdr->completion_ops->completion() instead of
calling hdr->release() directly.

This can be reproduced (sometimes) by performing "storage failover
takeover" commands on NetApp filer while doing direct I/O from a client.

This can also be reproduced using SystemTap to simulate a failure while
doing direct I/O from a client (from Dave Wysochanski):

stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'

Suggested-by: Trond Myklebust
Signed-off-by: Scott Mayhew
Fixes: 1ca018d28d ("pNFS: Fix a memory leak when attempted pnfs fails")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust

Scott Mayhew
2018-01-15 12:06:29 +0800
b3dce6a2f pnfs/blocklayout: handle transient devices

PNFS block/SCSI layouts should gracefully handle cases where block devices
are not available when a layout is retrieved, or the block devices are
removed while the client holds a layout.

While setting up a layout segment, keep a record of an unavailable or
un-parsable block device in cache with a flag so that subsequent layouts do
not spam the server with GETDEVINFO. We can reuse the current
NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing
the device, we will discard it and send a fresh GETDEVINFO after the
timeout, since the lookup and validation of the device occurs within the
GETDEVINFO response handling.

A lookup of a layout segment that references an unavailable device will
return a segment with the NFS_LSEG_UNAVAILABLE flag set. This will allow
the pgio layer to mark the layout with the appropriate fail bit, which
forces subsequent IO to the MDS, and prevents spamming the server with
LAYOUTGET, LAYOUTRETURN.

Finally, when IO to a block device fails, look up the block device(s)
referenced by the pgio header, and mark them as unavailable.

Signed-off-by: Benjamin Coddington
Signed-off-by: Trond Myklebust

Benjamin Coddington
2018-01-15 12:06:29 +0800

18 Nov, 2017

4 commits

7380020e7 pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close

If our layoutreturn on close operation returns an NFS4ERR_OLD_STATEID,
then try to update the stateid and retry. We know that there should
be no further LAYOUTGET requests being launched.

Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2017-11-18 05:43:47 +0800
6089dd0d7 NFS: Fix bool initialization/comparison

Bool initializations should use true and false. Bool tests don't need
comparisons.

Signed-off-by: Thomas Meyer
Signed-off-by: Anna Schumaker

Thomas Meyer
2017-11-18 05:43:43 +0800
2b28a7bee fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
  increments aren't allowed
- counter schema uses basic atomic operations
  (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API, which prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.

The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
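
To make the properties above concrete, here is a userspace approximation written with C11 atomics. It is not the kernel's refcount_t implementation; the ex_ names and the choice of UINT_MAX as the saturation value are assumptions made for this sketch:

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Value at which the counter is pinned once it would otherwise overflow. */
#define EX_REFCOUNT_SATURATED UINT_MAX

struct ex_refcount { atomic_uint refs; };

static void ex_refcount_set(struct ex_refcount *r, unsigned int n)
{
    atomic_store(&r->refs, n);
}

/* Take a reference unless the counter already dropped to zero. */
static bool ex_refcount_inc_not_zero(struct ex_refcount *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == 0)
            return false; /* object is being freed: no resurrection */
        if (old == EX_REFCOUNT_SATURATED)
            return true;  /* stay pinned instead of wrapping to zero */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
    return true;
}

/* Drop a reference; returns true when the caller must free the object. */
static bool ex_refcount_dec_and_test(struct ex_refcount *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == 0 || old == EX_REFCOUNT_SATURATED)
            return false; /* underflow or saturated: never free */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
    return old == 1;
}
```

A plain atomic counter would wrap on overflow and could be decremented below zero, either of which can lead to freeing an object that is still referenced. The checks in the compare-and-swap loops are what rule those cases out, at the cost of leaking a saturated object.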

Suggested-by: Kees Cook
Reviewed-by: David Windsor
Reviewed-by: Hans Liljestrand
Signed-off-by: Elena Reshetova
Signed-off-by: Anna Schumaker

Elena Reshetova
2017-11-18 02:47:59 +0800
eba6dd691 fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_t

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: Anna Schumaker

Elena Reshetova
2017-11-18 02:47:59 +0800

12 Sep, 2017

1 commit

70d2f7b1e pNFS: Use the standard I/O stateid when calling LAYOUTGET

Instead of having a private method for copying the open/delegation stateid,
use the same call that is used for standard I/O through the MDS.

Note that this means we transmit the stateid with a zero seqid, avoiding
issues with NFS4ERR_OLD_STATEID.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-09-12 10:19:00 +0800

09 Sep, 2017

1 commit

196639ebb NFS: Fix 2 use after free issues in the I/O code

The writeback code wants to send a commit after processing the pages,
which is why we want to delay releasing the struct path until after
that's done.

Also, the layout code expects that we do not free the inode before
we've put the layout segments in pnfs_writehdr_free() and
pnfs_readhdr_free().

Fixes: 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")
Fixes: 4714fb51fd03 ("nfs: remove pgio_header refcount, related cleanup")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust

Trond Myklebust
2017-09-09 10:07:52 +0800

15 Aug, 2017

1 commit

8205b9ce0 NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg()

Now that we no longer hold the inode->i_lock when manipulating the
commit lists, it is safe to call pnfs_put_lseg() again.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-08-15 23:54:48 +0800

24 May, 2017

1 commit

08cb5b0f0 pnfs: Fix the check for requests in range of layout segment

It's possible and acceptable for NFS to attempt to add requests beyond the
range of the current pgio->pg_lseg, a case which should be caught and
limited by the pg_test operation. However, the current handling of this
case replaces pgio->pg_lseg with a new layout segment (after a WARN) within
that pg_test operation. That will cause all the previously added requests
to be submitted with this new layout segment, which may not be valid for
those requests.

Fix this problem by only returning zero for the number of bytes to coalesce
from pg_test for this case which allows any previously added requests to
complete on the current layout segment. The check for requests starting
out of range of the layout segment moves to pg_init, so that the
replacement of pgio->pg_lseg will be done when the next request is added.
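
The pg_test side of this can be sketched as a pure range check. The names are hypothetical, and clamping the result to the end of the segment is an assumption of this sketch rather than something stated in the commit message:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of a layout segment's byte range. */
struct ex_range { uint64_t offset; uint64_t length; };

/*
 * Return how many bytes of a request starting at req_offset may be
 * coalesced onto the current segment. A request that starts outside the
 * segment yields 0, which stops coalescing without touching the segment;
 * requests already queued still complete on it.
 */
static size_t ex_pg_test(const struct ex_range *seg, uint64_t req_offset,
                         size_t req_len)
{
    uint64_t into_seg;
    uint64_t room;

    if (req_offset < seg->offset)
        return 0;
    into_seg = req_offset - seg->offset;
    if (into_seg >= seg->length)
        return 0;
    room = seg->length - into_seg;
    return room < req_len ? (size_t)room : req_len;
}
```

Working with the distance into the segment, rather than computing offset plus length, avoids overflow when a segment is described as extending to the maximum 64-bit offset.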

Signed-off-by: Benjamin Coddington
Signed-off-by: Trond Myklebust

Benjamin Coddington
2017-05-24 19:55:02 +0800

03 May, 2017

2 commits

61f454e30 pNFS: Fix a deadlock when coalescing writes and returning the layout

Consider the following deadlock:

Process P1              Process P2              Process P3
==========              ==========              ==========
                                                lock_page(page)

                        lseg = pnfs_update_layout(inode)

lo = NFS_I(inode)->layout
pnfs_error_mark_layout_for_return(lo)

                        lock_page(page)

                                                lseg = pnfs_update_layout(inode)

In this scenario,
- P1 has declared the layout to be in error, but P2 holds a reference to
a layout segment on that inode, so the layoutreturn is deferred.
- P2 is waiting for a page lock held by P3.
- P3 is asking for a new layout segment, but is blocked waiting
for the layoutreturn.

The fix is to ensure that pnfs_error_mark_layout_for_return() does
not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow
the latter to call LAYOUTGET so that it can make progress and unblock
P2.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-05-03 00:35:33 +0800
5466d2141 pNFS: Don't clear the layout return info if there are segments to return

In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout
return info if there are new segments queued for return due to, for
instance, a race between a LAYOUTRETURN and a failed I/O attempt.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-05-03 00:35:33 +0800

29 Apr, 2017

1 commit

1f18b82c3 pNFS: Ensure we commit the layout if it has been invalidated

If the layout is being invalidated on the server, then we must
invoke nfs_commit_inode() to ensure any commits to the DS get
cleared out.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-04-29 23:29:30 +0800