05 Aug, 2016
5 commits
-
vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
returns EEXIST if the object already exists.
This check is therefore unnecessary.
(In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to
RFC 1094, our code seems to believe that a CREATE of an existing file
should succeed. I'm leaving that behavior alone.)
Signed-off-by: J. Bruce Fields
-
There's some odd logic in nfsd_create() that allows it to be called with
the parent directory either locked or unlocked. The only already-locked
caller is NFSv2's nfsd_proc_create(). It's less confusing to split out
the unlocked case into a separate function which the NFSv2 code can call
directly.
Also fix some comments while we're here.
Signed-off-by: J. Bruce Fields
-
Create and other nfsd ops generally assume we can call lookup_one_len on
inodes with S_IFDIR set. Al says that this assumption isn't true in
general, though it should be for the filesystem objects nfsd sees.
Add a check just to make sure our assumption isn't violated.
Remove a couple checks for i_op->lookup in create code.
Cc: Al Viro
Signed-off-by: J. Bruce Fields
-
lookup_one_len already has this check.
The only effect of this patch is to return access instead of perm in the
0-length-filename case. I actually prefer nfserr_perm (or _inval?), but
I doubt anyone cares.
The isdotent check seems redundant too, but I worry that some client
might actually care about that strange nfserr_exist error.
Signed-off-by: J. Bruce Fields
-
When doing a create (mkdir/mknod) on a name, it's worth
checking whether the name already exists before returning EACCES
because the directory is not writeable by the user.
This makes return values on the client more consistent,
regardless of whether the entry is cached in the local
cache or not.
Another positive side effect is that certain programs only expect
EEXIST in that case, even though POSIX allows any valid
error to be returned.
Signed-off-by: Oleg Drokin
Signed-off-by: J. Bruce Fields
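The effect of reordering the two checks can be pictured with a small userspace model. This is only a sketch: struct dir_model and both functions are hypothetical stand-ins, not the nfsd code, and the model reduces the directory to two booleans.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parent directory as the server sees it. */
struct dir_model {
	bool writable;     /* may the caller create entries here? */
	bool name_exists;  /* is the requested name already present? */
};

/* Permission check first: an existing name in a directory the caller
 * cannot write to is reported as -EACCES. */
static int create_perm_first(const struct dir_model *dir)
{
	assert(dir != NULL);
	if (!dir->writable)
		return -EACCES;
	if (dir->name_exists)
		return -EEXIST;
	return 0;
}

/* Existence check first: the same request is reported as -EEXIST,
 * whether or not the directory is writable. */
static int create_lookup_first(const struct dir_model *dir)
{
	assert(dir != NULL);
	if (dir->name_exists)
		return -EEXIST;
	if (!dir->writable)
		return -EACCES;
	return 0;
}
```

The two orderings only disagree when the name exists and the directory is not writable, which is the case the message above is concerned with.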
02 Aug, 2016
2 commits
-
This modification is useful for debugging issues that happen while
the socket is being initialised.
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
We're seeing traces of the following form:
[10952.396347] svc: transport ffff88042ba4a000 dequeued, inuse=2
[10952.396351] svc: tcp_accept ffff88042ba4a000 sock ffff88042a6e4c80
[10952.396362] nfsd: connect from 10.2.6.1, port=187
[10952.396364] svc: svc_setup_socket ffff8800b99bcf00
[10952.396368] setting up TCP socket for reading
[10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
[10952.396373] svc: transport ffff8803eb10a000 put into queue
[10952.396375] svc: transport ffff88042ba4a000 put into queue
[10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
[10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
[10952.396381] svc_recv: found XPT_CLOSE
[10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
[10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
[10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
[10952.396412] svc: svc_sock_free(ffff8803eb10a000)
i.e. an immediate close of the socket after initialisation.
The culprit appears to be the test at the end of svc_tcp_init, which
checks if the newly created socket is in the TCP_ESTABLISHED state,
and immediately closes it if not. The evidence appears to suggest that
the socket might still be in the SYN_RECV state at this time.
The fix is to check for both states, and then to add a check in
svc_tcp_state_change() to ensure we don't close the socket when
it transitions into TCP_ESTABLISHED.
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
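The two checks described above can be sketched as predicates over a socket state. This is a userspace model under stated assumptions: the enum and function names are hypothetical, and the real code tests the kernel's TCP state constants on the socket rather than these values.

```c
#include <stdbool.h>

/* Hypothetical model of the TCP states that matter here. */
enum model_tcp_state {
	MODEL_TCP_SYN_RECV,
	MODEL_TCP_ESTABLISHED,
	MODEL_TCP_CLOSE_WAIT,
};

/* Old init-time test: anything other than ESTABLISHED is marked for
 * close, so a socket still in SYN_RECV is torn down immediately. */
static bool init_marks_close_old(enum model_tcp_state s)
{
	return s != MODEL_TCP_ESTABLISHED;
}

/* Init-time test after the fix: both SYN_RECV and ESTABLISHED are
 * accepted as live states. */
static bool init_marks_close_new(enum model_tcp_state s)
{
	return s != MODEL_TCP_ESTABLISHED && s != MODEL_TCP_SYN_RECV;
}

/* State-change callback after the fix: a transition into ESTABLISHED
 * must not close the socket; other transitions still do. */
static bool state_change_marks_close(enum model_tcp_state new_state)
{
	return new_state != MODEL_TCP_ESTABLISHED;
}
```

Both halves are needed in this model: once SYN_RECV sockets survive initialisation, the later SYN_RECV to ESTABLISHED transition reaches the state-change callback and has to be treated as benign.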
16 Jul, 2016
4 commits
-
If the underlying filesystem supports multiple layout types, then there
is little reason not to advertise that fact to clients and let them
choose what type to use.
Turn the ex_layout_type field into a bitfield. For each supported
layout type, we set a bit in that field. When the client requests a
layout, ensure that the bit for that layout type is set. When the
client requests attributes, send back a list of supported types.
Signed-off-by: Jeff Layton
Reviewed-by: Weston Andros Adamson
Signed-off-by: J. Bruce Fields
-
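For the layout-type bitfield change just described, a minimal userspace sketch of the three operations (set a bit per supported type, test a requested type, list the supported types) might look like this. The enum names and numeric values are illustrative assumptions, and the helpers are hypothetical, not the nfsd functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout type numbers; treat the values as assumptions. */
enum model_layout_type {
	MODEL_LAYOUT_FILES = 1,
	MODEL_LAYOUT_BLOCK = 3,
	MODEL_LAYOUT_FLEX = 4,
};

/* Bit representing one layout type in the export's bitfield. */
static uint32_t layout_bit(unsigned int type)
{
	assert(type < 32);
	return UINT32_C(1) << type;
}

/* Layout request path: accept only types whose bit is set. */
static bool layout_supported(uint32_t mask, unsigned int type)
{
	return type < 32 && (mask & (UINT32_C(1) << type)) != 0;
}

/* Attribute path: write the supported types into out[] in ascending
 * order and return how many were written (at most cap). */
static size_t layout_list(uint32_t mask, unsigned int *out, size_t cap)
{
	size_t n = 0;

	assert(out != NULL);
	for (unsigned int t = 0; t < 32 && n < cap; t++)
		if (mask & (UINT32_C(1) << t))
			out[n++] = t;
	return n;
}
```

An export that supports two types would be built as layout_bit(MODEL_LAYOUT_BLOCK) | layout_bit(MODEL_LAYOUT_FLEX); a request for any other type then fails the layout_supported() test.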
nfsd4_release_lockowner finds a lock owner that has no lock state,
and drops cl_lock. Then release_lockowner picks up cl_lock and
unhashes the lock owner.
During the window where cl_lock is dropped, I don't see anything
preventing a concurrent nfsd4_lock from finding that same lock owner
and adding lock state to it.
Move release_lockowner() into nfsd4_release_lockowner and hang onto
the cl_lock until after the lock owner's state cannot be found
again.
Found by inspection; we don't currently have a reproducer.
Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ")
Reviewed-by: Jeff Layton
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
-
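The lock owner race described in the commit above comes down to doing the "has no lock state" check and the unhash under a single hold of the lock. A userspace sketch with a pthread mutex standing in for cl_lock follows; the struct and both functions are hypothetical models, not the nfsd4 code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a lock owner guarded by a per-client lock. */
struct model_lockowner {
	pthread_mutex_t *cl_lock;  /* stands in for the client's cl_lock */
	int nr_lock_states;        /* lock state attached to this owner */
	bool hashed;               /* can lookups still find this owner? */
};

/* Release path: check and unhash inside one critical section, so no
 * other thread can attach state between the two steps. */
static bool model_release_lockowner(struct model_lockowner *lo)
{
	bool released = false;

	assert(lo != NULL && lo->cl_lock != NULL);
	pthread_mutex_lock(lo->cl_lock);
	if (lo->hashed && lo->nr_lock_states == 0) {
		lo->hashed = false;
		released = true;
	}
	pthread_mutex_unlock(lo->cl_lock);
	return released;
}

/* Concurrent lock path: state may only be added to an owner that is
 * still findable while the lock is held. */
static bool model_add_lock_state(struct model_lockowner *lo)
{
	bool added = false;

	assert(lo != NULL && lo->cl_lock != NULL);
	pthread_mutex_lock(lo->cl_lock);
	if (lo->hashed) {
		lo->nr_lock_states++;
		added = true;
	}
	pthread_mutex_unlock(lo->cl_lock);
	return added;
}
```

If the release path dropped the mutex between its check and its unhash, model_add_lock_state() could succeed in that window and leave lock state attached to an owner that is about to disappear.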
These values are all multiples of 4 already, so there's no change in
behavior from this patch. But perhaps this will prevent mistakes in the
future.
Signed-off-by: Kinglong Mee
Signed-off-by: J. Bruce Fields
-
Instead of creeping pnfs layout configuration into filesystems, move the
definition of block-based export operations under a more abstract
configuration.
Signed-off-by: Benjamin Coddington
Reviewed-by: Christoph Hellwig
Acked-by: Dave Chinner
Signed-off-by: J. Bruce Fields
14 Jul, 2016
17 commits
-
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
The current server rpc tcp code attempts to predict how much writeable
socket space will be available to a given RPC call before accepting it
for processing. On a 40GigE network, we've found this throttles
individual clients long before the network or disk is saturated. The
server may handle more clients easily, but the bandwidth of individual
clients is still artificially limited.
Instead of trying (and failing) to predict how much writeable socket space
will be available to the RPC call, just fall back to the simple model of
deferring processing until the socket is uncongested.
This may increase the risk of fast clients starving slower clients; in
such cases, the previous patch allows setting a hard per-connection
limit.
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
Allow the user to limit the number of requests serviced through a single
connection, to help prevent faster clients from starving slower clients.
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
Don't call svc_xprt_enqueue() if the XPT_DATA flag is already set.
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
Rather than code up our own versions of the socket callbacks, just
call the defaults.
This also allows us to merge svc_udp_data_ready() and svc_tcp_data_ready().
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
Prevent callbacks from triggering while we're detaching the socket.
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
Dropping and/or deferring requests has an impact on performance. Let's
make sure we can trace those events.
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
Add a tracepoint to track when the processing of incoming RPC data gets
deferred due to out-of-space issues on the outgoing transport.
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
-
Silence a few smatch warnings about indentation.
Signed-off-by: Christophe JAILLET
Signed-off-by: J. Bruce Fields
-
Those are now defined in fs/nfsd/vfs.h
Signed-off-by: Oleg Drokin
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
-
Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2)
is also the ds (NFSv3). I.e., the metadata and the data file are
the exact same file.
This will allow testing of the flex file client.
Simply add the "pnfs" export option to your export
in /etc/exports and mount from a client that supports
flex files.
Signed-off-by: Tom Haynes
Reviewed-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
-
Signed-off-by: Tom Haynes
Reviewed-by: Christoph Hellwig
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
-
GSS-Proxy doesn't produce very much debug logging at all. Printing out
the gss minor status will aid in troubleshooting if the
GSS_Accept_sec_context upcall fails.
Signed-off-by: Scott Mayhew
Signed-off-by: J. Bruce Fields
-
This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.
The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.
Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.
Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.
Signed-off-by: Andrew Elble
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
-
Rename mach_creds_match() to nfsd4_mach_creds_match() and make it non-static.
Signed-off-by: Andrew Elble
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
-
So these may be used in nfsd as well
Signed-off-by: Andrew Elble
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
-
This field is not currently in use.
Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
02 Jul, 2016
1 commit
-
We're always tracing IPv4 or IPv6 addresses, so we can save a lot
of space on the ringbuffer by allocating the correct sockaddr size.
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org
Fixes: 83a712e0afef "sunrpc: add some tracepoints around ..."
Signed-off-by: J. Bruce Fields
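The space saving can be seen from userspace by comparing the per-family address sizes with the generic storage type. A minimal sketch follows; trace_addr_size() is a hypothetical helper for illustration, not the tracepoint code.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: bytes needed to record one peer address.
 * IPv4 and IPv6 get their exact struct size; anything else falls
 * back to the generic (and much larger) sockaddr_storage. */
static size_t trace_addr_size(const struct sockaddr *sa)
{
	assert(sa != NULL);
	switch (sa->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return sizeof(struct sockaddr_storage);
	}
}
```

Since struct sockaddr_storage is sized to hold any address family, every record that reserves it pays for the worst case even when only an IPv4 or IPv6 address is stored.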
01 Jul, 2016
2 commits
-
(Another one for the f_path debacle.)
ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.
The reason is that generic_add_lease() used filp->f_path.dentry->inode
while all the others use file_inode(). This makes a difference for files
opened on overlayfs since the former will point to the overlay inode, the
latter to the underlying inode.
So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode. When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.
Reported-by: Eryu Guan
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc:
Signed-off-by: Miklos Szeredi
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
-
If the lockd service fails to start up then we need to be sure that the
notifier blocks are not registered, otherwise a subsequent start of the
service could cause the same notifier to be registered twice, leading to
soft lockups.
Signed-off-by: Scott Mayhew
Cc: stable@vger.kernel.org
Fixes: 0751ddf77b6a "lockd: Register callbacks on the inetaddr_chain..."
Signed-off-by: J. Bruce Fields
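Why a doubly registered notifier is dangerous can be shown with a simplified singly linked chain. This is a userspace model under assumptions: the struct and functions are hypothetical, and the insert is a plain head insert rather than the kernel's priority-ordered one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry on a notifier chain. */
struct model_notifier {
	struct model_notifier *next;
};

/* Unguarded head insert. Registering the same entry while it is
 * already at the head makes it point at itself. */
static void chain_register(struct model_notifier **head,
			   struct model_notifier *nb)
{
	assert(head != NULL && nb != NULL);
	nb->next = *head;
	*head = nb;
}

/* Remove nb from the chain if it is present. */
static void chain_unregister(struct model_notifier **head,
			     struct model_notifier *nb)
{
	assert(head != NULL && nb != NULL);
	for (struct model_notifier **p = head; *p != NULL; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Service start-up that undoes its registration when a later step
 * fails, so that a retry starts from a clean chain. */
static int service_start(struct model_notifier **head,
			 struct model_notifier *nb, bool later_step_fails)
{
	chain_register(head, nb);
	if (later_step_fails) {
		chain_unregister(head, nb);
		return -1;
	}
	return 0;
}
```

In this model a second chain_register() of the same entry leaves nb->next == nb, so any walk of the chain never terminates, which is the userspace analogue of the soft lockup. Unregistering on the failure path keeps a later start from hitting that case.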
27 Jun, 2016
2 commits
-
Pull SCSI fixes from James Bottomley:
"Two straightforward fixes.
One is a concurrency issue only affecting SAS connected SATA drives,
but which could hang the storage subsystem if it triggers (because the
outstanding command count on error never goes back to zero) and the
other is a NO_TAG fallout from the switch to hostwide tags which
causes the system to crash on module insertion (we've checked
carefully and only the 53c700 family of drivers is vulnerable to this
issue)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
53c700: fix BUG on untagged commands
scsi: fix race between simultaneous decrements of ->host_failed
25 Jun, 2016
7 commits
-
Pull btrfs fixes part 2 from Chris Mason:
"This has one patch from Omar to bring iterate_shared back to btrfs.
We have a tree of work we queue up for directory items and it doesn't
lend itself well to shared access. While we're cleaning it up, Omar
has changed things to use an exclusive lock when there are delayed
items"
* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
-
Pull btrfs fixes from Chris Mason:
"I have a two part pull this time because one of the patches Dave
Sterba collected needed to be against v4.7-rc2 or higher (we used
rc4). I try to make my for-linus-xx branch testable on top of the
last major so we can hand fixes to people on the list more easily, so
I've split this pull in two.
This first part has some fixes and two performance improvements that
we've been testing for some time.
Josef's two performance fixes are most notable. The transid tracking
patch makes a big improvement on pretty much every workload"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: Force stripesize to the value of sectorsize
btrfs: fix disk_i_size update bug when fallocate() fails
Btrfs: fix error handling in map_private_extent_buffer
Btrfs: fix error return code in btrfs_init_test_fs()
Btrfs: don't do nocow check unless we have to
btrfs: fix deadlock in delayed_ref_async_start
Btrfs: track transid for delayed ref flushing
-
Pull sound fixes from Takashi Iwai:
"Again pretty calm weeks: we've had only a few trivial / stable
HD-audio fixes in addition to a possible race fix for snd-dummy driver
spotted by syzkaller"
* tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: dummy: Fix a use-after-free at closing
ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
ALSA: hda - Fix the headset mic jack detection on Dell machine
ALSA: hda/tegra: iomem fixups for sparse warnings
ALSA: hdac_regmap - fix the register access for runtime PM
-
Pull x86 kprobe fix from Thomas Gleixner:
"A single fix clearing the TF bit when a fault is single stepped"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Clear TF bit in fault on single-stepping
-
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:
- force watchdog reset while processing sysrq-w
- fix a deadlock when enabling trace events in the scheduler
- fixes to the throttled next buddy logic
- fixes for the average accounting (missing serialization and
  underflow handling)
- allow kernel threads for fallback to online but not active cpus"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Allow kthreads to fall back to online && !active cpus
sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
sched/fair: Initialize throttle_count for new task-groups lazily
sched/fair: Fix cfs_rq avg tracking underflow
kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
sched/debug: Fix deadlock when enabling sched events
sched/fair: Fix post_init_entity_util_avg() serialization
-
Commit fe742fd4f90f ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.
This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.
Tested with xfstests and by doing a parallel kernel build:
while make tinyconfig && make -j4 && git clean dqfx; do
:
done
along with a bunch of parallel finds in another shell:
while true; do
for ((i=0; i/dev/null &
done
wait
done
Signed-off-by: Omar Sandoval
Signed-off-by: David Sterba
Signed-off-by: Chris Mason
-
Pull locking fix from Thomas Gleixner:
"A single fix to address a race in the static key logic"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/static_key: Fix concurrent static_key_slow_inc()