Eric Lee / smarc-fsl-linux-kernel

05 Apr, 2016

1 commit

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

17 Mar, 2016

2 commits

95d9f6c3e nfs: remove nfs_inode_dio_wait ... Browse Code »

Just call inode_dio_wait directly instead of through a pointless wrapper.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2016-03-17 03:42:43 +0800
4ff79bc70 nfs: remove nfs4_file_fsync ... Browse Code »

The only difference to nfs_file_fsync is the call to pnfs_sync_inode. But
pnfs_sync_inode is just an inline that calls a pNFS layout driver method
if CONFIG_PNFS is designed, and thus can be called just fine from the core
NFS module.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2016-03-17 03:42:43 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

08 Jan, 2016

1 commit

210c7c175 NFS: Use wait_on_atomic_t() for unlock after readahead ... Browse Code »

The use of wait_on_atomic_t() for waiting on I/O to complete before
unlocking allows us to git rid of the NFS_IO_INPROGRESS flag, and thus the
nfs_iocounter's flags member, and finally the nfs_iocounter altogether.
The count of I/O is moved to the lock context, and the counter
increment/decrement functions become simple enough to open-code.

Signed-off-by: Benjamin Coddington
[Trond: Fix up conflict with existing function nfs_wait_atomic_killable()]
Signed-off-by: Trond Myklebust

Benjamin Coddington
2016-01-08 07:42:51 +0800

05 Jan, 2016

1 commit

942e3d72a Merge branch 'pnfs_generic' ... Browse Code »

* pnfs_generic:
NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
NFSv4.1/pNFS: Fix a race in initiate_file_draining()
NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
NFS: Relax requirements in nfs_flush_incompatible
NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
NFS: Allow multiple commit requests in flight per file
NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
NFSv4: List stateid information in the callback tracepoints
NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL
NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1
pNFS: If we have to delay the layout callback, mark the layout for return
NFSv4.1/pNFS: Add a helper to mark the layout as returned
pNFS: Ensure nfs4_layoutget_prepare returns the correct error

Trond Myklebust
2016-01-05 02:19:55 +0800

01 Jan, 2016

1 commit

af7cf0579 NFS: Allow multiple commit requests in flight per file ... Browse Code »

Allow synchronous RPC calls to wait for pending RPC calls to finish,
but also allow asynchronous ones to just fire off another commit.

With this patch, the xfstests generic/074 test completes in 226s
instead of 242s

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-01-01 02:53:48 +0800

29 Dec, 2015

1 commit

d6c843b96 nfs: only remove page from mapping if launder_page fails ... Browse Code »

Instead of dropping pages when write fails, only do it when
we get fatal failure in launder_page write back.

Signed-off-by: Peng Tao
Signed-off-by: Trond Myklebust

Peng Tao
2015-12-29 03:32:38 +0800

07 Nov, 2015

1 commit

d0164adc8 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep an… ... Browse Code »

…d avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Mel Gorman
2015-11-07 09:50:42 +0800

23 Oct, 2015

1 commit

4f6563677 Move locks API users to locks_lock_inode_wait() ... Browse Code »

Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait(). This
allows for some later cleanup.

Signed-off-by: Benjamin Coddington
Signed-off-by: Jeff Layton

Benjamin Coddington
2015-10-23 02:57:36 +0800

08 Sep, 2015

1 commit

5445b1fbd NFSv4: Respect the server imposed limit on how many changes we may cache ... Browse Code »

The NFSv4 delegation spec allows the server to tell a client to limit how
much data it cache after the file is closed. In return, the server
guarantees enough free space to avoid ENOSPC situations, etc.
Prior to this patch, we assumed we could always cache aggressively after
close. Unfortunately, this causes problems with servers that set the
limit to 0 and therefore do not offer any ENOSPC guarantees.

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-09-08 00:36:17 +0800

18 Aug, 2015

2 commits

7e94d6c4a NFS: Don't fsync twice for O_SYNC/IS_SYNC files ... Browse Code »

generic_file_write_iter() will already do an fsync on our behalf
if the file descriptor is O_SYNC or the file is marked as IS_SYNC.

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-08-18 05:55:18 +0800
aff8d8dc4 NFS: Remove nfs_release() ... Browse Code »

And call nfs_file_clear_open_context() directly. This makes it obvious
that nfs_file_release() will always return 0.

Signed-off-by: Anna Schumaker
Signed-off-by: Trond Myklebust

Anna Schumaker
2015-08-18 02:32:56 +0800

11 Jun, 2015

1 commit

3c87ef6ef sunrpc: keep a count of swapfiles associated with the rpc_clnt ... Browse Code »

Jerome reported seeing a warning pop when working with a swapfile on
NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
holding the rcu_read_lock and that function can sleep.

To fix that, we need to take a reference to the xprt while holding the
rcu_read_lock, set the socket up for swapping and then drop that
reference. But, xprt_put is not exported and having NFS deal with the
underlying xprt is a bit of layering violation anyway.

Fix this by adding a set of activate/deactivate functions that take a
rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
nfs_swap_deactivate call those.

Also, add a per-rpc_clnt atomic counter to keep track of the number of
active swapfiles associated with it. When the counter does a 0->1
transition, we enable swapping on the xprt, when we do a 1->0 transition
we disable swapping on it.

This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
flag. If non-swapper and swapper clnts are sharing a xprt, then we only
need to flag the tasks from the swapper clnt with that flag.

Acked-by: Mel Gorman
Reported-by: Jerome Marchand
Signed-off-by: Jeff Layton
Reviewed-by: Chuck Lever
Signed-off-by: Trond Myklebust

Jeff Layton
2015-06-11 06:26:14 +0800

27 Apr, 2015

1 commit

59953fba8 Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Another set of mainly bugfixes and a couple of cleanups. No new
functionality in this round.

Highlights include:

Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping

Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned"
mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()

Features:
- Various improvements to the RDMA transport code's handling of
memory registration
- Various code cleanups"

* tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
fs/nfs: fix new compiler warning about boolean in switch
nfs: Remove unneeded casts in nfs
NFS: Don't attempt to decode missing directory entries
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
NFS: Rename idmap.c to nfs4idmap.c
NFS: Move nfs_idmap.h into fs/nfs/
NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
NFS: Add a stub for GETDEVICELIST
nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
nfs: fix DIO good bytes calculation
nfs: Fetch MOUNTED_ON_FILEID when updating an inode
sunrpc: make debugfs file creation failure non-fatal
nfs: fix high load average due to callback thread sleeping
NFS: Reduce time spent holding the i_mutex during fallocate()
NFS: Don't zap caches on fallocate()
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Add "open" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
...

Linus Torvalds
2015-04-27 08:33:59 +0800

16 Apr, 2015

1 commit

65a4a1cad nfs: generic_write_checks() shouldn't be done on swapout... ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-04-16 03:04:27 +0800

12 Apr, 2015

3 commits

2ba48ce51 mirror O_APPEND and O_DIRECT into iocb->ki_flags ... Browse Code »

... avoiding write_iter/fcntl races.

Signed-off-by: Al Viro

Al Viro
2015-04-12 10:30:22 +0800
5d5d56897 make new_sync_{read,write}() static ... Browse Code »

All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.

Signed-off-by: Al Viro

Al Viro
2015-04-12 10:29:40 +0800
c0fec3a98 Merge branch 'iocb' into for-next Browse Code »

Al Viro
2015-04-12 10:24:41 +0800

28 Mar, 2015

2 commits

a0815d556 NFSv4.1/pnfs: Ensure that writes respect the O_SYNC flag when doing O_DIRECT ... Browse Code »

If the caller does not specify the O_SYNC flag, then it is legitimate
to return from O_DIRECT without doing a pNFS layoutcommit operation.
However if the file is opened O_DIRECT|O_SYNC then we'd better get it
right.

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-03-28 00:39:37 +0800
d9dabc1a0 NFS: File unlock needs to be a metadata synchronisation point ... Browse Code »

File unlock needs to update both data and metadata on the NFS server
in order to act as a synchronisation point for other clients.

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-03-28 00:39:37 +0800

26 Mar, 2015

1 commit

e2e40f2c1 fs: move struct kiocb to fs.h ... Browse Code »

struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2015-03-26 08:28:11 +0800

04 Mar, 2015

2 commits

ef070dcb3 NFS: Don't write enable new pages while an invalidation is proceeding ... Browse Code »

nfs_vm_page_mkwrite() should wait until the page cache invalidation
is finished. This is the second patch in a 2 patch series to deprecate
the NFS client's reliance on nfs_release_page() in the context of
nfs_invalidate_mapping().

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-03-04 02:58:08 +0800
874f94637 NFS: Fix a regression in the read() syscall ... Browse Code »

When invalidating the page cache for a regular file, we want to first
sync all dirty data to disk and then call invalidate_inode_pages2().
The latter relies on nfs_launder_page() and nfs_release_page() to deal
respectively with dirty pages, and unstable written pages.

When commit 9590544694bec ("NFS: avoid deadlocks with loop-back mounted
NFS filesystems.") changed the behaviour of nfs_release_page(), then it
made it possible for invalidate_inode_pages2() to fail with an EBUSY.
Unfortunately, that error is then propagated back to read().

Let's therefore work around the problem for now by protecting the call
to sync the data and invalidate_inode_pages2() so that they are atomic
w.r.t. the addition of new writes.
Later on, we can revisit whether or not we still need nfs_launder_page()
and nfs_release_page().

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-03-04 02:02:29 +0800

02 Mar, 2015

1 commit

aa5accea4 NFS: Ensure that buffered writes wait for O_DIRECT writes to complete ... Browse Code »

The O_DIRECT code will grab the inode->i_mutex and flush out buffered
writes, before scheduling a read or a write. However there is no
equivalent in the buffered write code to wait for O_DIRECT to complete.

Fixes a reported issue in xfstests generic/133, when first performing an
O_DIRECT write followed by a buffered write.

Signed-off-by: Trond Myklebust
Tested-by: Chuck Lever

Trond Myklebust
2015-03-02 12:22:40 +0800

11 Feb, 2015

1 commit

d83a08db5 mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub ... Browse Code »

Nobody uses it anymore.

[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2015-02-11 06:30:30 +0800

19 Oct, 2014

1 commit

d3dc366bb Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block layer changes from Jens Axboe:
"This is the core block IO pull request for 3.18. Apart from the new
and improved flush machinery for blk-mq, this is all mostly bug fixes
and cleanups.

- blk-mq timeout updates and fixes from Christoph.

- Removal of REQ_END, also from Christoph. We pass it through the
->queue_rq() hook for blk-mq instead, freeing up one of the request
bits. The space was overly tight on 32-bit, so Martin also killed
REQ_KERNEL since it's no longer used.

- blk integrity updates and fixes from Martin and Gu Zheng.

- Update to the flush machinery for blk-mq from Ming Lei. Now we
have a per hardware context flush request, which both cleans up the
code should scale better for flush intensive workloads on blk-mq.

- Improve the error printing, from Rob Elliott.

- Backing device improvements and cleanups from Tejun.

- Fixup of a misplaced rq_complete() tracepoint from Hannes.

- Make blk_get_request() return error pointers, fixing up issues
where we NULL deref when a device goes bad or missing. From Joe
Lawrence.

- Prep work for drastically reducing the memory consumption of dm
devices from Junichi Nomura. This allows creating clone bio sets
without preallocating a lot of memory.

- Fix a blk-mq hang on certain combinations of queue depths and
hardware queues from me.

- Limit memory consumption for blk-mq devices for crash dump
scenarios and drivers that use crazy high depths (certain SCSI
shared tag setups). We now just use a single queue and limited
depth for that"

* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
block: Remove REQ_KERNEL
blk-mq: allocate cpumask on the home node
bio-integrity: remove the needless fail handle of bip_slab creating
block: include func name in __get_request prints
block: make blk_update_request print prefix match ratelimited prefix
blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
block: fix alignment_offset math that assumes io_min is a power-of-2
blk-mq: Make bt_clear_tag() easier to read
blk-mq: fix potential hang if rolling wakeup depth is too high
block: add bioset_create_nobvec()
block: use bio_clone_fast() in blk_rq_prep_clone()
block: misplaced rq_complete tracepoint
sd: Honor block layer integrity handling flags
block: Replace strnicmp with strncasecmp
block: Add T10 Protection Information functions
block: Don't merge requests if integrity flags differ
block: Integrity checksum flag
block: Relocate bio integrity flags
block: Add a disk flag to block integrity profile
block: Add prefix to block integrity profile flags
...

Linus Torvalds
2014-10-19 02:53:51 +0800

14 Oct, 2014

1 commit

e19a8a0ad block: Remove REQ_KERNEL ... Browse Code »

REQ_KERNEL is no longer used. Remove it and drop the redundant uio
argument to nfs_file_direct_{read,write}.

Signed-off-by: Martin K. Petersen
Cc: Christoph Hellwig
Reported-by: Dan Carpenter
Signed-off-by: Jens Axboe

Martin K. Petersen
2014-10-14 23:00:44 +0800

12 Oct, 2014

1 commit

ef4a48c51 Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux ... Browse Code »

Pull file locking related changes from Jeff Layton:
"This release is a little more busy for file locking changes than the
last:

- a set of patches from Kinglong Mee to fix the lockowner handling in
knfsd
- a pile of cleanups to the internal file lease API. This should get
us a bit closer to allowing for setlease methods that can block.

There are some dependencies between mine and Bruce's trees this cycle,
and I based my tree on top of the requisite patches in Bruce's tree"

* tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
locks: set fl_owner for leases to filp instead of current->files
locks: give lm_break a return value
locks: __break_lease cleanup in preparation of allowing direct removal of leases
locks: remove i_have_this_lease check from __break_lease
locks: move freeing of leases outside of i_lock
locks: move i_lock acquisition into generic_*_lease handlers
locks: define a lm_setup handler for leases
locks: plumb a "priv" pointer into the setlease routines
nfsd: don't keep a pointer to the lease in nfs4_file
locks: clean up vfs_setlease kerneldoc comments
locks: generic_delete_lease doesn't need a file_lock at all
nfsd: fix potential lease memory leak in nfs4_setlease
locks: close potential race in lease_get_mtime
security: make security_file_set_fowner, f_setown and __f_setown void return
locks: consolidate "nolease" routines
locks: remove lock_may_read and lock_may_write
lockd: rip out deferred lock handling from testlock codepath
NFSD: Get reference of lockowner when coping file_lock
...

Linus Torvalds
2014-10-12 01:21:34 +0800

25 Sep, 2014

3 commits

1aff52562 NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() ... Browse Code »

Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
- it doesn't hurt for kswapd to block occasionally. If it doesn't
want to block it would clear __GFP_WAIT. The current_is_kswapd()
was only added to avoid deadlocks and we have a new approach for
that.
- memory allocation in the SUNRPC layer can very rarely try to
->releasepage() a page it is trying to handle. The deadlock
is removed as nfs_release_page() doesn't block indefinitely.

So we don't need to set PF_FSTRANS for sunrpc network operations any
more.

Signed-off-by: NeilBrown
Acked-by: Jeff Layton
Signed-off-by: Trond Myklebust

NeilBrown
2014-09-25 20:25:47 +0800
353db7966 NFS: avoid waiting at all in nfs_release_page when congested. ... Browse Code »

If nfs_release_page() is called on a sequence of pages which are all
in the same file which is blocked on COMMIT, each page could
contribute a 1 second delay which could be come excessive. I have
seen delays of as much as 208 seconds.

To keep the delay to one second, mark the bdi as write-congested
if the commit didn't finished. Once it does finish, the
write-congested flag will be cleared by nfs_commit_release_pages().

With this, the longest total delay in try_to_free_pages that I have
seen is under 3 seconds. With no waiting in nfs_release_page at all
I have seen delays of nearly 1.5 seconds.

Signed-off-by: NeilBrown
Acked-by: Jeff Layton
Signed-off-by: Trond Myklebust

NeilBrown
2014-09-25 20:25:38 +0800
959054469 NFS: avoid deadlocks with loop-back mounted NFS filesystems. ... Browse Code »

Support for loop-back mounted NFS filesystems is useful when NFS is
used to access shared storage in a high-availability cluster.

If the node running the NFS server fails, some other node can mount the
filesystem and start providing NFS service. If that node already had
the filesystem NFS mounted, it will now have it loop-back mounted.

nfsd can suffer a deadlock when allocating memory and entering direct
reclaim.
While direct reclaim does not write to the NFS filesystem it can send
and wait for a COMMIT through nfs_release_page().

This patch modifies nfs_release_page() to wait a limited time for the
commit to complete - one second. If the commit doesn't complete
in this time, nfs_release_page() will fail. This means it might now
fail in some cases where it wouldn't before. These cases are only
when 'gfp' includes '__GFP_WAIT'.

nfs_release_page() is only called by try_to_release_page(), and that
can only be called on an NFS page with required 'gfp' flags from
- page_cache_pipe_buf_steal() in splice.c
- shrink_page_list() in vmscan.c
- invalidate_inode_pages2_range() in truncate.c

The first two handle failure quite safely. The last is only called
after ->launder_page() has been called, and that will have waited
for the commit to finish already.

So aborting if the commit takes longer than 1 second is perfectly safe.

Signed-off-by: NeilBrown
Acked-by: Jeff Layton
Signed-off-by: Trond Myklebust

NeilBrown
2014-09-25 20:25:28 +0800

11 Sep, 2014

2 commits

dad2b015b nfs: fix RCU cl_xprt handling in nfs_swap_activate/deactivate ... Browse Code »

sparse says:

fs/nfs/file.c:543:60: warning: incorrect type in argument 1 (different address spaces)
fs/nfs/file.c:543:60: expected struct rpc_xprt *xprt
fs/nfs/file.c:543:60: got struct rpc_xprt [noderef] *cl_xprt
fs/nfs/file.c:548:53: warning: incorrect type in argument 1 (different address spaces)
fs/nfs/file.c:548:53: expected struct rpc_xprt *xprt
fs/nfs/file.c:548:53: got struct rpc_xprt [noderef] *cl_xprt

cl_xprt is RCU-managed, so we need to take care to dereference and use
it while holding the RCU read lock.

Cc: Mel Gorman
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2014-09-11 03:47:04 +0800
612aa983a pnfs: add flag to force read-modify-write in ->write_begin ... Browse Code »

Like all block based filesystems, the pNFS block layout driver can't read
or write at a byte granularity and thus has to perform read-modify-write
cycles on writes smaller than this granularity.

Add a flag so that the core NFS code always reads a whole page when
starting a smaller write, so that we can do it in the place where the VFS
expects it instead of doing in very deadlock prone way in the writeback
handler.

Note that in theory we could do less than page size reads here for disks
that have a smaller sector size which are served by a server with a smaller
pnfs block size. But so far that doesn't seem like a worthwhile
optimization.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-11 03:47:02 +0800

10 Sep, 2014

1 commit

1c994a090 locks: consolidate "nolease" routines ... Browse Code »

GFS2 and NFS have setlease routines that always just return -EINVAL.
Turn that into a generic routine that can live in fs/libfs.c.

Cc:
Cc: Steven Whitehouse
Cc:
Signed-off-by: Jeff Layton
Acked-by: Trond Myklebust
Reviewed-by: Christoph Hellwig

Jeff Layton
2014-09-10 04:01:36 +0800

16 Jul, 2014

1 commit

743162013 sched: Remove proliferation of wait_on_bit() action functions ... Browse Code »

The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.

Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.

All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown
Acked-by: David Howells (fscache, keys)
Acked-by: Steven Whitehouse (gfs2)
Acked-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Steve French
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar

NeilBrown
2014-07-16 21:10:39 +0800

13 Jun, 2014

1 commit

16b905780 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"This the bunch that sat in -next + lock_parent() fix. This is the
minimal set; there's more pending stuff.

In particular, I really hope to get acct.c fixes merged this cycle -
we need that to deal sanely with delayed-mntput stuff. In the next
pile, hopefully - that series is fairly short and localized
(kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
iov_iter work. Most of prereqs for ->splice_write with sane locking
order are there and Kent's dio rewrite would also fit nicely on top of
this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
lock_parent: don't step on stale ->d_parent of all-but-freed one
kill generic_file_splice_write()
ceph: switch to iter_file_splice_write()
shmem: switch to iter_file_splice_write()
nfs: switch to iter_splice_write_file()
fs/splice.c: remove unneeded exports
ocfs2: switch to iter_file_splice_write()
->splice_write() via ->write_iter()
bio_vec-backed iov_iter
optimize copy_page_{to,from}_iter()
bury generic_file_aio_{read,write}
lustre: get rid of messing with iovecs
ceph: switch to ->write_iter()
ceph_sync_direct_write: stop poking into iov_iter guts
ceph_sync_read: stop poking into iov_iter guts
new helper: copy_page_from_iter()
fuse: switch to ->write_iter()
btrfs: switch to ->write_iter()
ocfs2: switch to ->write_iter()
xfs: switch to ->write_iter()
...

Linus Torvalds
2014-06-13 01:30:18 +0800

12 Jun, 2014

1 commit

4da54c218 nfs: switch to iter_splice_write_file() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-06-12 12:21:11 +0800

02 Jun, 2014

1 commit

130d1f956 locks: ensure that fl_owner is always initialized properly in flock and lease codepaths ... Browse Code »

Currently, the fl_owner isn't set for flock locks. Some filesystems use
byte-range locks to simulate flock locks and there is a common idiom in
those that does:

fl->fl_owner = (fl_owner_t)filp;
fl->fl_start = 0;
fl->fl_end = OFFSET_MAX;

Since flock locks are generally "owned" by the open file description,
move this into the common flock lock setup code. The fl_start and fl_end
fields are already set appropriately, so remove the unneeded setting of
that in flock ops in those filesystems as well.

Finally, the lease code also sets the fl_owner as if they were owned by
the process and not the open file description. This is incorrect as
leases have the same ownership semantics as flock locks. Set them the
same way. The lease code doesn't actually use the fl_owner value for
anything, so this is more for consistency's sake than a bugfix.

Reported-by: Trond Myklebust
Signed-off-by: Jeff Layton
Acked-by: Greg Kroah-Hartman (Staging portion)
Acked-by: J. Bruce Fields

Jeff Layton
2014-06-02 20:09:29 +0800

07 May, 2014

1 commit

edaf43694 nfs: switch to ->write_iter() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-05-07 05:39:38 +0800