23 Jun, 2015
1 commit
-
This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that. For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid. This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.

There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters. In the long run we should probably aim to
cache full file structures used with special stateids instead.
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
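A minimal user-space sketch of the calling convention this message describes, not the nfsd code itself: the lookup always hands back a usable file, plus a flag telling the caller whether the file was freshly allocated for a special stateid. All names here (`struct file_handle`, `struct stateid`, `preprocess_stateid`, `release_file`, `tmp_file`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel struct file. */
struct file_handle {
    int readahead;   /* stand-in for read-ahead parameters */
};

/* Hypothetical stateid; 'special' marks the reserved special stateids. */
struct stateid {
    bool special;
    struct file_handle *open_file;   /* file tied to a regular stateid */
};

/*
 * Always return a usable file. For special stateids a temporary one is
 * allocated and *tmp_file is set, so the caller knows it owns the file
 * (and, in a read path, that read-ahead state has to be copied around).
 */
static struct file_handle *preprocess_stateid(const struct stateid *st,
                                              bool *tmp_file)
{
    assert(st != NULL && tmp_file != NULL);
    *tmp_file = false;
    if (!st->special)
        return st->open_file;

    struct file_handle *f = calloc(1, sizeof(*f));
    if (f != NULL)
        *tmp_file = true;
    return f;
}

/* Release only what preprocess_stateid() allocated. */
static void release_file(struct file_handle *f, bool tmp_file)
{
    if (tmp_file)
        free(f);
}
```

The extra out-parameter is the "wart": callers that only need a file can ignore everything but the flag-driven release, while the read path can branch on it.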
20 Jun, 2015
3 commits
-
Split out two self contained helpers to make the function more readable.
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
Refactor the raparam hash helpers to just deal with the raparms,
and keep opening/closing files separate from that.
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
Use kernel.h macro definition.
Thanks to Julia Lawall for Coccinelle scripting support.
Signed-off-by: Fabian Frederick
Signed-off-by: J. Bruce Fields
05 Jun, 2015
7 commits
-
Bi-directional RPC support means code in svcrdma.ko invokes a bit of
code in xprtrdma.ko, and vice versa. To avoid loader/linker loops,
merge the server and client side modules together into a single
module.

When backchannel capabilities are added, the combined module will
register all needed transport capabilities so that Upper Layer
consumers automatically have everything needed to create a
bi-directional transport connection.

Module aliases are added for backwards compatibility with user
space, which still may expect svcrdma.ko or xprtrdma.ko to be
present.

This commit reverts commit 2e8c12e1b765 ("xprtrdma: add separate
Kconfig options for NFSoRDMA client and server support") and
provides a single CONFIG option for enabling the new module.
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
The server and client maximum are architecturally independent.
Allow changing one without affecting the other.
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
At the 2015 LSF/MM, it was requested that memory allocation
call sites that request GFP_KERNEL allocations in a loop should be
annotated with __GFP_NOFAIL.
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
Fields in struct rpcrdma_msg are __be32. Don't byte-swap these
fields when decoding RPC calls and then swap them back for the
reply. For the most part, they can be left alone.
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
In send_write_chunks(), we have:

    for (xdr_off = rqstp->rq_res.head[0].iov_len, chunk_no = 0;
         xfer_len && chunk_no < arg_ary->wc_nchunks;
         chunk_no++) {
            . . .
    }

Note that arg_ary->wc_nchunks is in network byte-order. For the
comparison to work correctly, both have to be in native byte-order.

In send_reply_chunks, we have:

    write_len = min(xfer_len, htonl(ch->rs_length));

xfer_len is in native byte-order, and ch->rs_length is in
network byte-order. be32_to_cpu() is the correct byte swap
for ch->rs_length.

As an additional clean up, replace ntohl() with be32_to_cpu() in
a few other places.

This appears to address a problem with large rsize hangs while
using PHYSICAL memory registration. I suspect that is the only
registration mode that uses more than one chunk element.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
When testing pnfs layout, nfsd got error NFS4ERR_SEQ_MISORDERED.
It is caused by the NFS client returning NFS4ERR_DELAY before validate_seqid(),
so the client does not update the sequence id, but nfsd does update it.

According to RFC5661 20.9.3,
" If CB_SEQUENCE returns an error, then the state of the slot
(sequence ID, cached reply) MUST NOT change. "
Signed-off-by: Kinglong Mee
Signed-off-by: J. Bruce Fields -
nfsd enters an infinite loop and prints a message every 10 seconds:

May 31 18:33:52 test-server kernel: Error sending entire callback!
May 31 18:34:01 test-server kernel: Error sending entire callback!

This is caused by a cb_layoutreturn getting error -10008
(NFS4ERR_DELAY), the client crashing, and then nfsd entering the
infinite loop:

bc_sendto --> call_timeout --> nfsd4_cb_done --> nfsd4_cb_layout_done
with error -10008 --> rpc_delay(task, HZ/100) --> bc_sendto ...

Reproduced using xfstests 074 with nfs client's kdump on,
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set, and client's blkmapd down:

1. nfs client's write operation will get the layout of the file,
   and then send getdeviceinfo,
2. but the layout segment is not recorded by the client because blkmapd is down,
3. the client writes data by sending WRITE to the server,
4. the nfs server recalls the layout of the file before WRITE,
5. a network error causes the client to reset the session and return NFS4ERR_DELAY,
6. so the client's WRITE operation is waiting for the reply.
   If the task hangs for 120s, the client will crash.
7. After that, the next bc_sendto will fail with TIMEOUT,
   and cb_status is NFS4ERR_DELAY.
Signed-off-by: Kinglong Mee
Signed-off-by: J. Bruce Fields
04 Jun, 2015
2 commits
-
svc_rdma_xdr_decode_deferred_req() indexes an array with an
un-byte-swapped value off the wire. Fortunately this function
isn't used anywhere, so simply remove it.
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
Clean up.
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
01 Jun, 2015
1 commit
-
Add the ACL related protocol definitions which were added in the NFSv4.1
specification.

(But we're not using them yet.)
Signed-off-by: Andreas Gruenbacher
Signed-off-by: J. Bruce Fields
29 May, 2015
4 commits
-
Signed-off-by: Andreas Gruenbacher
Signed-off-by: J. Bruce Fields -
gcc-5.0 warns about a potential uninitialized variable use in nfsd:
fs/nfsd/nfs4state.c: In function 'nfsd4_process_open2':
fs/nfsd/nfs4state.c:3781:3: warning: 'old_deny_bmap' may be used uninitialized in this function [-Wmaybe-uninitialized]
reset_union_bmap_deny(old_deny_bmap, stp);
^
fs/nfsd/nfs4state.c:3760:16: note: 'old_deny_bmap' was declared here
unsigned char old_deny_bmap;
^

This is a false positive, the code path that is warned about cannot
actually be reached.

This adds an initialization for the variable to make the warning go
away.
Signed-off-by: Arnd Bergmann
Signed-off-by: J. Bruce Fields -
Whether or not a file system supports acls can be determined with
IS_POSIXACL(inode) and does not require trying to fetch any acls; the code for
computing the supported_attrs and aclsupport attributes can be simplified.
Signed-off-by: Andreas Gruenbacher
Signed-off-by: J. Bruce Fields -
NFSv2 can set the atime and/or mtime of a file to specific timestamps but not
to the server's current time. To implement the equivalent of utimes("file",
NULL), it uses a heuristic.

NFSv3 and later do support setting the atime and/or mtime to the server's
current time directly. The NFSv2 heuristic is still enabled, and causes
timestamps to be set wrong sometimes.

Fix this by moving the heuristic into the NFSv2 specific code. We can leave it
out of the create code path: the owner can always set timestamps arbitrarily,
and the workaround would never trigger.
Signed-off-by: Andreas Gruenbacher
Reviewed-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
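The message does not spell the heuristic out. One plausible shape, shown here purely as an assumption, is to treat a request whose atime and mtime are identical and close to the server's clock as "set to now". The 30-minute window and the names `MAX_TOUCH_ERROR`, `struct setattr_req` and `looks_like_touch_now()` are all illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

#define MAX_TOUCH_ERROR (30 * 60)   /* assumed tolerance, in seconds */

/* Hypothetical SETATTR request carrying explicit client timestamps. */
struct setattr_req {
    time_t atime;
    time_t mtime;
};

/* Guess whether the client meant "use the server's current time". */
static bool looks_like_touch_now(const struct setattr_req *req,
                                 time_t server_now)
{
    assert(req != NULL);
    if (req->atime != req->mtime)
        return false;

    long long delta = llabs((long long)req->atime - (long long)server_now);
    return delta < MAX_TOUCH_ERROR;
}
```

Under this assumed rule, a client that deliberately sets both timestamps to the same recent value would be misread as asking for the current time, which is consistent with the message's point that the guess only belongs in the NFSv2 path.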
07 May, 2015
1 commit
-
The NFSv3 READDIRPLUS gets some of the returned attributes from the
readdir, and some from an inode returned from a new lookup. The two
objects could be different thanks to intervening renames.

The attributes in READDIRPLUS are optional, so let's just skip them if
we notice this case.
Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
05 May, 2015
9 commits
-
The 'overloads-avoided' counter itself was removed several years ago by
commit 78c210e (Revert "knfsd: avoid overloading the CPU scheduler with
enormous load averages").
Signed-off-by: Scott Mayhew
Signed-off-by: J. Bruce Fields -
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
With sessions in v4.1 or later we don't need to manually probe the backchannel
connection, so we can declare it up instantly after setting up the RPC client.

Note that we really should split nfsd4_run_cb_work in the long run, this is
just the least intrusive fix for now.
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
Checking the rpc_client pointer is not a reliable way to detect
backchannel changes: cl_cb_client is changed only after shutting down
the rpc client, so the condition cl_cb_client == tk_client will always be
true.

Check the RPC_TASK_KILLED flag instead, and rewrite the code to avoid
the buggy cl_callbacks list and fix the lifetime rules due to double
calls of the ->prepare callback operations method for this retry case.
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
We must only increment the sequence id if the client has seen and responded
to a request. If we failed to deliver it to the client we must resend with
the same sequence id. So, just like the client, track errors at the transport
level differently from those returned in the XDR.
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
In an environment where the KDC is running Active Directory, the
exported composite name field returned in the context could be large
enough to span a page boundary. Attaching a scratch buffer to the
decoding xdr_stream helps deal with those cases.

The case where we saw this was actually due to behavior that's been
fixed in newer gss-proxy versions, but we're fixing it here too.
Signed-off-by: Scott Mayhew
Cc: stable@vger.kernel.org
Reviewed-by: Simo Sorce
Signed-off-by: J. Bruce Fields -
For the sake of forgetful clients, the server should return the layouts
to the file system on 'last close' of a file (assuming that there are no
delegations outstanding to that particular client) or on delegreturn
(assuming that there are no opens on a file from that particular
client).

In theory the information is all there in current data structures, but
it's not efficiently available; nfs4_file->fi_ref includes references on
the file across all clients, but we need a per-(client, file) count.
Walking through lots of stateid's to calculate this on each close or
delegreturn would be painful.

This patch introduces infrastructure to maintain per-client opens and
delegation counters on a per-file basis.

[hch: ported to the mainline pNFS support, merged various fixes from Jeff]
Signed-off-by: Sachin Bhamare
Signed-off-by: Jeff Layton
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
If we find a non-confirmed openowner we jump to exit the function, but do
not set an error value. Fix this by factoring out a helper to do the
check and properly set the error from nfsd4_validate_stateid.
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
Commit df52699e4fcef ("NFSv4.1: Don't cache deviceids that have no
notifications") causes the Linux NFS client to stop caching deviceid's
unless a server pretends to support deviceid notifications. While this
behavior is stupid and the language around this area in rfc5661 is a
mess clarified by an erratum that I submitted, Trond insists on this
behavior. Not caching deviceids degrades block layout performance
massively as a GETDEVICEINFO is fairly expensive.

So add this hack to make the Linux client happy again.
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
04 May, 2015
8 commits
-
Pull ext4 fixes from Ted Ts'o:
"Some miscellaneous bug fixes and some final on-disk and ABI changes
for ext4 encryption which provide better security and performance"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix growing of tiny filesystems
ext4: move check under lock scope to close a race.
ext4: fix data corruption caused by unwritten and delayed extents
ext4 crypto: remove duplicated encryption mode definitions
ext4 crypto: do not select from EXT4_FS_ENCRYPTION
ext4 crypto: add padding to filenames before encrypting
ext4 crypto: simplify and speed up filename encryption -
Pull drm fixes from Dave Airlie:
"One intel fix, one rockchip fix, and a bunch of radeon fixes for some
regressions from audio rework and vm stability"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915/chv: Implement WaDisableShadowRegForCpd
drm/radeon: fix userptr return value checking (v2)
drm/radeon: check new address before removing old one
drm/radeon: reset BOs address after clearing it.
drm/radeon: fix lockup when BOs aren't part of the VM on release
drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
drm/radeon: adjust pll when audio is not enabled
drm/radeon: only enable audio streams if the monitor supports it
drm/radeon: only mark audio as connected if the monitor supports it (v3)
drm/radeon/audio: don't enable packets until the end
drm/radeon: drop dce6_dp_enable
drm/radeon: fix ordering of AVI packet setup
drm/radeon: Use drm_calloc_ab for CS relocs
drm/rockchip: fix error check when getting irq
MAINTAINERS: add entry for Rockchip drm drivers -
Just a single intel fix
* tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel:
drm/i915/chv: Implement WaDisableShadowRegForCpd -
one fix and maintainers update
* 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip:
drm/rockchip: fix error check when getting irq
MAINTAINERS: add entry for Rockchip drm drivers -
Pull SCSI fixes from James Bottomley:
"This is three logical fixes (as 5 patches).

The 3ware class of drivers were causing an oops with multiqueue by
tearing down the command mappings after completing the command (where
the variables in the command used to tear down the mapping were
no-longer valid). There's also a fix for the qnap iscsi target which
was choking on us sending it commands that were too long and a fix for
the reworked aha1542 allocating GFP_KERNEL under a lock"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
3w-9xxx: fix command completion race
3w-xxxx: fix command completion race
3w-sas: fix command completion race
aha1542: Allocate memory before taking a lock
SCSI: add 1024 max sectors black list flag -
Pull slave dmaengine fixes from Vinod Koul:
"Here are the fixes in dmaengine subsystem for rc2:

- privatecnt fix for slave dma request API by Christopher
- warn fix for PM ifdef in usb-dmac by Geert
- fix hardware dependency for xgene by Jean"
* 'next' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: increment privatecnt when using dma_get_any_slave_channel
dmaengine: xgene: Set hardware dependency
dmaengine: usb-dmac: Protect PM-only functions to kill warning -
Pull powerpc fixes from Michael Ellerman:
- build fix for SMP=n in book3s_xics.c
- fix for Daniel's pci_controller_ops on powernv.
- revert the TM syscall abort patch for now.
- CPU affinity fix from Nathan.
- two EEH fixes from Gavin.
- fix for CR corruption from Sam.
- selftest build fix.

* tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/powernv: Restore non-volatile CRs after nap
powerpc/eeh: Delay probing EEH device during hotplug
powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
powerpc/pseries: Correct cpu affinity for dlpar added cpus
selftests/powerpc: Fix the pmu install rule
Revert "powerpc/tm: Abort syscalls in active transactions"
powerpc/powernv: Fix early pci_controller_ops loading.
powerpc/kvm: Fix SMP=n build error in book3s_xics.c
03 May, 2015
3 commits
-
The estimate of necessary transaction credits in ext4_flex_group_add()
is too pessimistic. It reserves credit for sb, resize inode, and resize
inode dindirect block for each group added in a flex group although they
are always the same block and thus it is enough to account them only
once. Also the number of modified GDT blocks is overestimated since we
fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.

Make the estimation more precise. That reduces the number of requested
credits enough that we can grow a 20 MB filesystem (which has a 1 MB
journal, 79 reserved GDT blocks, and flex group size 16 by default).
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
Reviewed-by: Eric Sandeen -
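A rough arithmetic sketch of the two estimates from the ext4_flex_group_add() commit above. The formulas are assumptions made for illustration and are not copied from the kernel: the pessimistic one charges four blocks per added group, and the tighter one counts the shared blocks once plus the descriptor blocks actually touched. The function names are hypothetical.

```c
#include <assert.h>

/* Round-up integer division. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/* Pessimistic estimate: four blocks charged for every added group. */
static unsigned int credits_per_group(unsigned int groups,
                                      unsigned int reserved_gdb)
{
    return groups * 4 + reserved_gdb;
}

/*
 * Tighter estimate: superblock, resize inode and its dindirect block are
 * counted once; descriptor blocks are counted by how many descriptors fit
 * in one block, plus one in case the range straddles a block boundary.
 */
static unsigned int credits_shared(unsigned int groups,
                                   unsigned int desc_per_block,
                                   unsigned int reserved_gdb)
{
    return 3 + 1 + div_round_up(groups, desc_per_block) + reserved_gdb;
}
```

With the figures from the message (flex group size 16, 79 reserved GDT blocks) and an assumed 32 descriptors per block, `credits_per_group(16, 79)` gives 143 while `credits_shared(16, 32, 79)` gives 84.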
fallocate() checks that the file is extent-based and returns
EOPNOTSUPP in case it is not. Other tasks can convert from and to
indirect and extent so it's safe to check only after grabbing
the inode mutex.
Signed-off-by: Davide Italiano
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org -
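The "check under the lock" idea from the fallocate() commit above, as a user-space sketch with a pthread mutex standing in for the inode mutex. `struct inode_like` and `fallocate_like()` are hypothetical, and the size update is only a placeholder for the real allocation work.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical inode: 'extent_based' may be flipped by another thread
 * that holds 'lock'. */
struct inode_like {
    pthread_mutex_t lock;
    bool extent_based;
    long size;
};

/* Check the format only after taking the lock, so that it cannot change
 * between the check and the allocation. */
static int fallocate_like(struct inode_like *ino, long new_size)
{
    int ret = 0;

    assert(ino != NULL);
    pthread_mutex_lock(&ino->lock);
    if (!ino->extent_based) {
        ret = -EOPNOTSUPP;
        goto out;
    }
    if (new_size > ino->size)
        ino->size = new_size;   /* placeholder for the real work */
out:
    pthread_mutex_unlock(&ino->lock);
    return ret;
}
```

Doing the check before `pthread_mutex_lock()` would leave a window in which another thread converts the file format after the check has passed.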
Currently it is possible to lose a whole file system block worth of data
when we hit a specific interaction between unwritten and delayed extents
in the extent status tree.

The problem is that when we insert a delayed extent into the extent status
tree, the only way to get rid of it is to write out the delayed buffer.
However, there is a limitation in the extent status tree implementation:
when inserting an unwritten extent, should there be even a single
delayed block, the whole unwritten extent is marked as delayed.

At this point, there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when we write
into said unwritten extent we will convert it to written, but it still
remains delayed.

When we try to write into that block later, ext4_da_map_blocks() will set
the buffer new and delayed and map it to an invalid block, which causes
the rest of the block to be zeroed, losing already written data.

For now we can fix this by simply not allowing the delayed status to be set on
a written extent in the extent status tree. Also add WARN_ON() to make
sure that we notice if this happens in the future.

This problem can be easily reproduced by running the following xfs_io commands:

    xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
              -c "falloc 0 131072" \
              -c "pwrite -S 0xbb 65536 2048" \
              -c "fsync" /mnt/test/fff

    echo 3 > /proc/sys/vm/drop_caches
    xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

In theory this can also be reproduced at random by running fsx,
but that is not very reliable, though on machines with a bigger page size
(like ppc) it can be seen more often (especially with xfstest generic/127).
Signed-off-by: Lukas Czerner
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
02 May, 2015
1 commit
-
This patch removes duplicated encryption modes which were already in
ext4.h. They were duplicated from commit 3edc18d and commit f542fb.
Cc: Theodore Ts'o
Cc: Michael Halcrow
Cc: Andreas Dilger
Signed-off-by: Chanho Park
Signed-off-by: Theodore Ts'o