Eric Lee / smarc-fsl-linux-kernel

30 Nov, 2017

1 commit

57f3c05d0 NFS: Revert "NFS: Move the flock open mode check into nfs_flock()" ... Browse Code »

commit fcfa447062b2061e11f68b846d61cbfe60d0d604 upstream.

Commit e12937279c8b "NFS: Move the flock open mode check into nfs_flock()"
changed NFSv3 behavior for flock() such that the open mode must match the
lock type, however that requirement shouldn't be enforced for flock().

Signed-off-by: Benjamin Coddington
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman

Benjamin Coddington
2017-11-30 16:40:46 +0800

12 Sep, 2017

1 commit

bf4b49059 NFS: various changes relating to reporting IO errors. ... Browse Code »

1/ remove 'start' and 'end' args from nfs_file_fsync_commit().
They aren't used.

2/ Make nfs_context_set_write_error() a "static inline" in internal.h
so we can...

3/ Use nfs_context_set_write_error() instead of mapping_set_error()
if nfs_pageio_add_request() fails before sending any request.
NFS generally keeps errors in the open_context, not the mapping,
so this is more consistent.

4/ If filemap_write_and_write_range() reports any error, still
check ctx->error. The value in ctx->error is likely to be
more useful. As part of this, NFS_CONTEXT_ERROR_WRITE is
cleared slightly earlier, before nfs_file_fsync_commit() is called,
rather than at the start of that function.

Signed-off-by: NeilBrown
Signed-off-by: Trond Myklebust

NeilBrown
2017-09-12 10:28:56 +0800

07 Sep, 2017

2 commits

e973b1a59 NFS: Sync the correct byte range during synchronous writes ... Browse Code »

Since commit 18290650b1c8 ("NFS: Move buffered I/O locking into
nfs_file_write()") nfs_file_write() has not flushed the correct byte
range during synchronous writes. generic_write_sync() expects that
iocb->ki_pos points to the right edge of the range rather than the
left edge.

To replicate the problem, open a file with O_DSYNC, have the client
write at increasing offsets, and then print the successful offsets.
Block port 2049 partway through that sequence, and observe that the
client application indicates successful writes in advance of what the
server received.

Fixes: 18290650b1c8 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Jacob Strauss
Signed-off-by: Tarang Gupta
Tested-by: Tarang Gupta
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Trond Myklebust

tarangg@amazon.com
2017-09-07 23:07:13 +0800
779eafab0 NFS: flush data when locking a file to ensure cache coherence for mmap. ... Browse Code »

When a byte range lock (or flock) is taken out on an NFS file, the
validity of the cached data is checked and the inode is marked
NFS_INODE_INVALID_DATA. However the cached data isn't flushed from
the page cache.

This is sufficient for future read() requests or mmap() requests as
they call nfs_revalidate_mapping() which performs the flush if
necessary.

However an existing mapping is not affected. Accessing data through
that mapping will continue to return old data even though the inode is
marked NFS_INODE_INVALID_DATA.

This can easily be confirmed using the 'nfs' tool in
git://github.com/okirch/twopence-nfs.git
and running

nfs coherence FILENAME
on one client, and
nfs coherence -r FILENAME
on another client.

It appears that prior to Linux 2.6.0 this worked correctly.

However commit:

http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ca9268fe3ddd075714005adecd4afbd7f9ab87d0

removed the call to inode_invalidate_pages() from nfs_zap_caches(). I
haven't tested this code, but inspection suggests that prior to this
commit, file locking would invalidate all inode pages.

This patch adds a call to nfs_revalidate_mapping() after a
successful SETLK so that invalid data is flushed. With this patch the
above test passes. To minimize impact (and possibly avoid a GETATTR
call) this only happens if the mapping might be mapped into
userspace.

Cc: Olaf Kirch
Signed-off-by: NeilBrown
Signed-off-by: Trond Myklebust

NeilBrown
2017-09-07 00:31:15 +0800

27 Jul, 2017

2 commits

6ba80d434 NFS: Optimize fallocate by refreshing mapping when needed. ... Browse Code »

posix_fallocate() will allocate space in an NFS file by considering
the last byte of every 4K block. If it is before EOF, it will read
the byte and if it is zero, a zero is written out. If it is after EOF,
the zero is unconditionally written.

For the blocks beyond EOF, if NFS believes its cache is valid, it will
expand these writes to write full pages, and then will merge the pages.
This results if (typically) 1MB writes. If NFS believes its cache is
not valid (particularly if NFS_INO_INVALID_DATA or
NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
send the individual 1-byte writes. This results in (typically) 256 times
as many RPC requests, and can be substantially slower.

Currently nfs_revalidate_mapping() is only used when reading a file or
mmapping a file, as these are times when the content needs to be
up-to-date. Writes don't generally need the cache to be up-to-date, but
writes beyond EOF can benefit, particularly in the posix_fallocate()
case.

So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
i.e. when there is a gap between the end of the file and the start of
the write. If the cache is thought to be out of date (as happens after
taking a file lock), this will cause a GETATTR, and the two flags
mentioned above will be cleared. With this, posix_fallocate() on a
newly locked file does not generate excessive tiny writes.

Signed-off-by: NeilBrown
Signed-off-by: Anna Schumaker

NeilBrown
2017-07-27 23:22:42 +0800
442ce0499 NFS: invalidate file size when taking a lock. ... Browse Code »

Prior to commit ca0daa277aca ("NFS: Cache aggressively when file is open
for writing"), NFS would revalidate, or invalidate, the file size when
taking a lock. Since that commit it only invalidates the file content.

If the file size is changed on the server while wait for the lock, the
client will have an incorrect understanding of the file size and could
corrupt data. This particularly happens when writing beyond the
(supposed) end of file and can be easily be demonstrated with
posix_fallocate().

If an application opens an empty file, waits for a write lock, and then
calls posix_fallocate(), glibc will determine that the underlying
filesystem doesn't support fallocate (assuming version 4.1 or earlier)
and will write out a '0' byte at the end of each 4K page in the region
being fallocated that is after the end of the file.
NFS will (usually) detect that these writes are beyond EOF and will
expand them to cover the whole page, and then will merge the pages.
Consequently, NFS will write out large blocks of zeroes beyond where it
thought EOF was. If EOF had moved, the pre-existing part of the file
will be over-written. Locking should have protected against this,
but it doesn't.

This patch restores the use of nfs_zap_caches() which invalidated the
cached attributes. When posix_fallocate() asks for the file size, the
request will go to the server and get a correct answer.

cc: stable@vger.kernel.org (v4.8+)
Fixes: ca0daa277aca ("NFS: Cache aggressively when file is open for writing")
Signed-off-by: NeilBrown
Signed-off-by: Anna Schumaker

NeilBrown
2017-07-27 23:22:42 +0800

27 Apr, 2017

1 commit

c373fff7b NFSv4: Don't special case "launder" ... Browse Code »

If the client receives a fatal server error from nfs_pageio_add_request(),
then we should always truncate the page on which the error occurred.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-04-27 01:03:04 +0800

21 Apr, 2017

2 commits

f30cb757f NFS: Always wait for I/O completion before unlock ... Browse Code »

NFS attempts to wait for read and write completion before unlocking in
order to ensure that the data returned was protected by the lock. When
this waiting is interrupted by a signal, the unlock may be skipped, and
messages similar to the following are seen in the kernel ring buffer:

[20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
[20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183
[20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185

For NFSv3, the missing unlock will cause the server to refuse conflicting
locks indefinitely. For NFSv4, the leftover lock will be removed by the
server after the lease timeout.

This patch fixes this issue by skipping the usual wait in
nfs_iocounter_wait if the FL_CLOSE flag is set when signaled. Instead, the
wait happens in the unlock RPC task on the NFS UOC rpc_waitqueue.

For NFSv3, use lockd's new nlmclnt_operations along with
nfs_async_iocounter_wait to defer NLM's unlock task until the lock
context's iocounter reaches zero.

For NFSv4, call nfs_async_iocounter_wait() directly from unlock's
current rpc_call_prepare.

Signed-off-by: Benjamin Coddington
Reviewed-by: Jeff Layton
Signed-off-by: Trond Myklebust

Benjamin Coddington
2017-04-21 22:45:01 +0800
e12937279 NFS: Move the flock open mode check into nfs_flock() ... Browse Code »

We only need to check lock exclusive/shared types against open mode when
flock() is used on NFS, so move it into the flock-specific path instead of
checking it for all locks.

Signed-off-by: Benjamin Coddington
Reviewed-by: Christoph Hellwig
Reviewed-by: Jeff Layton
Signed-off-by: Trond Myklebust

Benjamin Coddington
2017-04-21 22:45:00 +0800

25 Feb, 2017

1 commit

11bac8000 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf ... Browse Code »

->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang
Signed-off-by: Arnd Bergmann
Reviewed-by: Ross Zwisler
Cc: Theodore Ts'o
Cc: Darrick J. Wong
Cc: Matthew Wilcox
Cc: Dave Hansen
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Jiang
2017-02-25 09:46:54 +0800

25 Dec, 2016

1 commit

7c0f6ba68 Replace <asm/uaccess.h> with <linux/uaccess.h> globally ... Browse Code »

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-25 03:46:01 +0800

22 Dec, 2016

1 commit

bc1ecd626 Merge tag 'nfs-for-4.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull more NFS client updates from Trond Myklebust:
"Highlights include:

- further attribute cache improvements to make revalidation more fine
grained

- NFSv4 locking improvements

Bugfixes:

- nfs4_fl_prepare_ds must be careful about reporting success in files
layout

- pNFS/flexfiles: Instead of marking a device inactive, remove it
from the cache"

* tag 'nfs-for-4.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCES
NFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCES
NFSv4: Place the GETATTR operation before the CLOSE
NFSv4: Also ask for attributes when downgrading to a READ-only state
NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked()
pNFS: Return RW layouts on OPEN_DOWNGRADE
NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE
NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
NFSv4: ensure __nfs4_find_lock_state returns consistent result.
NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
pNFS/flexfiles: delete deviceid, don't mark inactive
NFS: Clean up nfs_attribute_timeout()
NFS: Remove unused function nfs_revalidate_inode_rcu()
NFS: Fix and clean up the access cache validity checking
NFS: Only look at the change attribute cache state in nfs_weak_revalidate()
NFS: Clean up cache validity checking
NFS: Don't revalidate the file on close if we hold a delegation
NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN
NFSv4: Update the attribute cache info in update_changeattr

Linus Torvalds
2016-12-22 02:40:30 +0800

20 Dec, 2016

1 commit

61540bf6b NFS: Clean up cache validity checking ... Browse Code »

Consolidate the open-coded checking of NFS_I(inode)->cache_validity
into a couple of helper functions.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-12-20 06:29:35 +0800

18 Dec, 2016

1 commit

0110c350c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
"In this pile:

- autofs-namespace series
- dedupe stuff
- more struct path constification"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features
ocfs2: charge quota for reflinked blocks
ocfs2: fix bad pointer cast
ocfs2: always unlock when completing dio writes
ocfs2: don't eat io errors during _dio_end_io_write
ocfs2: budget for extent tree splits when adding refcount flag
ocfs2: prohibit refcounted swapfiles
ocfs2: add newlines to some error messages
ocfs2: convert inode refcount test to a helper
simple_write_end(): don't zero in short copy into uptodate
exofs: don't mess with simple_write_{begin,end}
9p: saner ->write_end() on failing copy into non-uptodate page
fix gfs2_stuffed_write_end() on short copies
fix ceph_write_end()
nfs_write_end(): fix handling of short copies
vfs: refactor clone/dedupe_file_range common functions
fs: try to clone files first in vfs_copy_file_range
vfs: misc struct path constification
namespace.c: constify struct path passed to a bunch of primitives
quota: constify struct path in quota_on
...

Linus Torvalds
2016-12-18 10:44:00 +0800

10 Dec, 2016

1 commit

c0cf3ef5e nfs_write_end(): fix handling of short copies ... Browse Code »

What matters when deciding if we should make a page uptodate is
not how much we _wanted_ to copy, but how much we actually have
copied. As it is, on architectures that do not zero tail on
short copy we can leave uninitialized data in page marked uptodate.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2016-12-10 11:41:47 +0800

05 Dec, 2016

1 commit

9310b224f NFS: Fix incorrect size revalidation when holding a delegation ... Browse Code »

We should only care about checking the attributes if the page cache
is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the
NFS_INO_REVAL_FORCED flag is set.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-12-05 07:08:40 +0800

14 Oct, 2016

1 commit

c4a86165d Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs ... Browse Code »

Pull NFS client updates from Anna Schumaker:
"Highlights include:

Stable bugfixes:
- sunrpc: fix writ espace race causing stalls
- NFS: Fix inode corruption in nfs_prime_dcache()
- NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
- NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
- NFSv4: Open state recovery must account for file permission changes
- NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

Features:
- Add support for tracking multiple layout types with an ordered list
- Add support for using multiple backchannel threads on the client
- Add support for pNFS file layout session trunking
- Delay xprtrdma use of DMA API (for device driver removal)
- Add support for xprtrdma remote invalidation
- Add support for larger xprtrdma inline thresholds
- Use a scatter/gather list for sending xprtrdma RPC calls
- Add support for the CB_NOTIFY_LOCK callback
- Improve hashing sunrpc auth_creds by using both uid and gid

Bugfixes:
- Fix xprtrdma use of DMA API
- Validate filenames before adding to the dcache
- Fix corruption of xdr->nwords in xdr_copy_to_scratch
- Fix setting buffer length in xdr_set_next_buffer()
- Don't deadlock the state manager on the SEQUENCE status flags
- Various delegation and stateid related fixes
- Retry operations if an interrupted slot receives EREMOTEIO
- Make nfs boot time y2038 safe"

* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
fs: nfs: Make nfs boot time y2038 safe
sunrpc: replace generic auth_cred hash with auth-specific function
sunrpc: add RPCSEC_GSS hash_cred() function
sunrpc: add auth_unix hash_cred() function
sunrpc: add generic_auth hash_cred() function
sunrpc: add hash_cred() function to rpc_authops struct
Retry operation on EREMOTEIO on an interrupted slot
pNFS: Fix atime updates on pNFS clients
sunrpc: queue work on system_power_efficient_wq
NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
NFSv4: If recovery failed for a specific open stateid, then don't retry
NFSv4: Fix retry issues with nfs41_test/free_stateid
NFSv4: Open state recovery must account for file permission changes
NFSv4: Mark the lock and open stateids as invalid after freeing them
NFSv4: Don't test open_stateid unless it is set
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
NFSv4: Fix a race when updating an open_stateid
NFSv4: Fix a race in nfs_inode_reclaim_delegation()
...

Linus Torvalds
2016-10-14 12:28:20 +0800

06 Oct, 2016

1 commit

82c156f85 switch generic_file_splice_read() to use of ->read_iter() ... Browse Code »

... and kill the ->splice_read() instances that can be switched to it

Signed-off-by: Al Viro

Al Viro
2016-10-06 06:23:56 +0800

23 Sep, 2016

1 commit

75575ddf2 nfs: eliminate pointless and confusing do_vfs_lock wrappers ... Browse Code »

Signed-off-by: Jeff Layton
Signed-off-by: Anna Schumaker

Jeff Layton
2016-09-23 01:56:04 +0800

20 Sep, 2016

1 commit

f844cd0d7 nfs: cover ->migratepage with CONFIG_MIGRATION ... Browse Code »

It will be more clean to use CONFIG_MIGRATION to cover nfs' private
.migratepage in nfs_file_aops like we do in other part of nfs
operations.

Signed-off-by: Chao Yu
Signed-off-by: Anna Schumaker

Chao Yu
2016-09-20 21:29:39 +0800

04 Sep, 2016

1 commit

c49edecd5 NFS: Fix error reporting in nfs_file_write() ... Browse Code »

When doing O_DSYNC writes, the actual write errors are reported through
generic_write_sync(), so we must test the result.

Reported-by: J. R. Okajima
Fixes: 18290650b1c8 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Trond Myklebust

Trond Myklebust
2016-09-04 00:10:36 +0800

25 Jul, 2016

1 commit

362745268 Merge branch 'writeback' Browse Code »

Trond Myklebust
2016-07-25 05:08:31 +0800

20 Jul, 2016

1 commit

ce52914eb sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags ... Browse Code »

A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag. A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred to be in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.

This can be reproduced as follows:

1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server. Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys

2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m

3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set. Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

5. Now do some I/O to the sec=sys mount. This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

6. Writes for that user will now be permanently 4K, FILE_SYNC for that
user, regardless of which mount is being written to, until you reboot
the client. Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect. Grabbing a new kerberos ticket at this
point will have no effect either.

Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags. Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too. Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.

Signed-off-by: Scott Mayhew
Signed-off-by: Trond Myklebust

Scott Mayhew
2016-07-20 04:23:24 +0800

06 Jul, 2016

5 commits

9a773e7c8 NFS nfs_vm_page_mkwrite: Don't freeze me, Bro... ... Browse Code »

Prevent filesystem freezes while handling the write page fault.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-07-06 07:11:08 +0800
f508d46ae NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() ... Browse Code »

We're now waiting immediately after taking the locks, so waiting
in fsync() and write_begin() is either redundant or potentially
subject to livelock (if not holding the lock).

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-07-06 07:11:05 +0800
a5864c999 NFS: Do not serialise O_DIRECT reads and writes ... Browse Code »

Allow dio requests to be scheduled in parallel, but ensuring that they
do not conflict with buffered I/O.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-07-06 07:11:04 +0800
18290650b NFS: Move buffered I/O locking into nfs_file_write() ... Browse Code »

Preparation for the patch that de-serialises O_DIRECT reads and
writes.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-07-06 07:11:03 +0800
89698b24d NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c ... Browse Code »

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-07-06 07:11:03 +0800

22 Jun, 2016

4 commits

4f52b6bb8 NFS: Don't call COMMIT in ->releasepage() ... Browse Code »

While COMMIT has the potential to free up a lot of memory that is being
taken by unstable writes, it isn't guaranteed to free up this particular
page. Also, calling fsync() on the server is expensive and so we want to
do it in a more controlled fashion, rather than have it triggered at
random by the VM.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-06-22 21:59:44 +0800
93761d986 NFS: Don't hold the inode lock across fsync() ... Browse Code »

Commits are no longer required to be serialised.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-06-22 21:59:43 +0800
6b56a8983 NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer ... Browse Code »

filemap_datawrite() and friends already deal just fine with livelock.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-06-22 21:59:42 +0800
ca0daa277 NFS: Cache aggressively when file is open for writing ... Browse Code »

Unless the user is using file locking, we must assume close-to-open
cache consistency when the file is open for writing. Adjust the
caching algorithm so that it does not clear the cache on out-of-order
writes and/or attribute revalidations.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-06-22 21:59:42 +0800

02 May, 2016

1 commit

c8b8e32d7 direct-io: eliminate the offset argument to ->direct_IO ... Browse Code »

Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually
work, so eliminate the superflous argument.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2016-05-02 07:58:39 +0800

05 Apr, 2016

1 commit

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

17 Mar, 2016

2 commits

95d9f6c3e nfs: remove nfs_inode_dio_wait ... Browse Code »

Just call inode_dio_wait directly instead of through a pointless wrapper.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2016-03-17 03:42:43 +0800
4ff79bc70 nfs: remove nfs4_file_fsync ... Browse Code »

The only difference to nfs_file_fsync is the call to pnfs_sync_inode. But
pnfs_sync_inode is just an inline that calls a pNFS layout driver method
if CONFIG_PNFS is designed, and thus can be called just fine from the core
NFS module.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2016-03-17 03:42:43 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

08 Jan, 2016

1 commit

210c7c175 NFS: Use wait_on_atomic_t() for unlock after readahead ... Browse Code »

The use of wait_on_atomic_t() for waiting on I/O to complete before
unlocking allows us to git rid of the NFS_IO_INPROGRESS flag, and thus the
nfs_iocounter's flags member, and finally the nfs_iocounter altogether.
The count of I/O is moved to the lock context, and the counter
increment/decrement functions become simple enough to open-code.

Signed-off-by: Benjamin Coddington
[Trond: Fix up conflict with existing function nfs_wait_atomic_killable()]
Signed-off-by: Trond Myklebust

Benjamin Coddington
2016-01-08 07:42:51 +0800

05 Jan, 2016

1 commit

942e3d72a Merge branch 'pnfs_generic' ... Browse Code »

* pnfs_generic:
NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
NFSv4.1/pNFS: Fix a race in initiate_file_draining()
NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
NFS: Relax requirements in nfs_flush_incompatible
NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
NFS: Allow multiple commit requests in flight per file
NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
NFSv4: List stateid information in the callback tracepoints
NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL
NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1
pNFS: If we have to delay the layout callback, mark the layout for return
NFSv4.1/pNFS: Add a helper to mark the layout as returned
pNFS: Ensure nfs4_layoutget_prepare returns the correct error

Trond Myklebust
2016-01-05 02:19:55 +0800

01 Jan, 2016

1 commit

af7cf0579 NFS: Allow multiple commit requests in flight per file ... Browse Code »

Allow synchronous RPC calls to wait for pending RPC calls to finish,
but also allow asynchronous ones to just fire off another commit.

With this patch, the xfstests generic/074 test completes in 226s
instead of 242s

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-01-01 02:53:48 +0800