Eric Lee / smarc-fsl-linux-kernel

30 Aug, 2017

1 commit

fd8235e72 nfsd: Limit end of page list when decoding NFSv4 WRITE ... Browse Code »

commit fc788f64f1f3eb31e87d4f53bcf1ab76590d5838 upstream.

When processing an NFSv4 WRITE operation, argp->end should never
point past the end of the data in the final page of the page list.
Otherwise, nfsd4_decode_compound can walk into uninitialized memory.

More critical, nfsd4_decode_write is failing to increment argp->pagelen
when it increments argp->pagelist. This can cause later xdr decoders
to assume more data is available than really is, which can cause server
crashes on malformed requests.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2017-08-30 16:21:50 +0800

14 Jun, 2017

3 commits

61604a262 fs: add i_blocksize() ... Browse Code »

commit 93407472a21b82f39c955ea7787e5bc7da100642 upstream.

Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
branch.

This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'

Thanks to Andrew Morton for suggesting more appropriate function instead
of macro.

[geliangtang@gmail.com: truncate: use i_blocksize()]
Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick
Signed-off-by: Geliang Tang
Cc: Alexander Viro
Cc: Ross Zwisler
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Fabian Frederick
2017-06-14 21:06:00 +0800
e273ed246 nfsd: Fix up the "supattr_exclcreat" attributes ... Browse Code »

commit b26b78cb726007533d81fdf90a62e915002ef5c8 upstream.

If an NFSv4 client asks us for the supattr_exclcreat, then we must
not return attributes that are unsupported by this minor version.

Signed-off-by: Trond Myklebust
Fixes: 75976de6556f ("NFSD: Return word2 bitmask if setting security..,")
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2017-06-14 21:05:55 +0800
bfeac8380 nfsd4: fix null dereference on replay ... Browse Code »

commit 9a307403d374b993061f5992a6e260c944920d0b upstream.

if we receive a compound such that:

- the sessionid, slot, and sequence number in the SEQUENCE op
match a cached succesful reply with N ops, and
- the Nth operation of the compound is a PUTFH, PUTPUBFH,
PUTROOTFH, or RESTOREFH,

then nfsd4_sequence will return 0 and set cstate->status to
nfserr_replay_cache. The current filehandle will not be set. This will
cause us to call check_nfsd_access with first argument NULL.

To nfsd4_compound it looks like we just succesfully executed an
operation that set a filehandle, but the current filehandle is not set.

Fix this by moving the nfserr_replay_cache earlier. There was never any
reason to have it after the encode_op label, since the only case where
he hit that is when opdesc->op_func sets it.

Note that there are two ways we could hit this case:

- a client is resending a previously sent compound that ended
with one of the four PUTFH-like operations, or
- a client is sending a *new* compound that (incorrectly) shares
sessionid, slot, and sequence number with a previously sent
compound, and the length of the previously sent compound
happens to match the position of a PUTFH-like operation in the
new compound.

The second is obviously incorrect client behavior. The first is also
very strange--the only purpose of a PUTFH-like operation is to set the
current filehandle to be used by the following operation, so there's no
point in having it as the last in a compound.

So it's likely this requires a buggy or malicious client to reproduce.

Reported-by: Scott Mayhew
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2017-06-14 21:05:55 +0800

25 May, 2017

2 commits

51d9c5152 nfsd: encoders mustn't use unitialized values in error cases ... Browse Code »

commit f961e3f2acae94b727380c0b74e2d3954d0edf79 upstream.

In error cases, lgp->lg_layout_type may be out of bounds; so we
shouldn't be using it until after the check of nfserr.

This was seen to crash nfsd threads when the server receives a LAYOUTGET
request with a large layout type.

GETDEVICEINFO has the same problem.

Reported-by: Ari Kauppi
Reviewed-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2017-05-25 21:44:49 +0800
ea465551a nfsd: fix undefined behavior in nfsd4_layout_verify ... Browse Code »

commit b550a32e60a4941994b437a8d662432a486235a5 upstream.

UBSAN: Undefined behaviour in fs/nfsd/nfs4proc.c:1262:34
shift exponent 128 is too large for 32-bit type 'int'

Depending on compiler+architecture, this may cause the check for
layout_type to succeed for overly large values (which seems to be the
case with amd64). The large value will be later used in de-referencing
nfsd4_layout_ops for function pointers.

Reported-by: Jani Tuovila
Signed-off-by: Ari Kauppi
[colin.king@canonical.com: use LAYOUT_TYPE_MAX instead of 32]
Reviewed-by: Dan Carpenter
Reviewed-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Ari Kauppi
2017-05-25 21:44:48 +0800

03 May, 2017

3 commits

d7809b9e9 nfsd: stricter decoding of write-like NFSv2/v3 ops ... Browse Code »

commit 13bf9fbff0e5e099e2b6f003a0ab8ae145436309 upstream.

The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer. This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative. Add
checks to catch these.

Reported-by: Tuomas Haanpää
Reported-by: Ari Kauppi
Reviewed-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2017-05-03 23:36:38 +0800
8ed079796 nfsd4: minor NFSv2/v3 write decoding cleanup ... Browse Code »

commit db44bac41bbfc0c0d9dd943092d8bded3c9db19b upstream.

Use a couple shortcuts that will simplify a following bugfix.

Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2017-05-03 23:36:38 +0800
fc6445df4 nfsd: check for oversized NFSv2/v3 arguments ... Browse Code »

commit e6838a29ecb484c97e4efef9429643b9851fba6e upstream.

A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

We may also consider rejecting calls that have any extra garbage
appended. That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.

Reported-by: Tuomas Haanpää
Reported-by: Ari Kauppi
Reviewed-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2017-05-03 23:36:38 +0800

08 Apr, 2017

1 commit

3967cf7e6 nfsd: map the ENOKEY to nfserr_perm for avoiding warning ... Browse Code »

commit c952cd4e949ab3d07287efc2e80246e03727d15d upstream.

Now that Ext4 and f2fs filesystems support encrypted directories and
files, attempts to access those files may return ENOKEY, resulting in
the following WARNING.

Map ENOKEY to nfserr_perm instead of nfserr_io.

[ 1295.411759] ------------[ cut here ]------------
[ 1295.411787] WARNING: CPU: 0 PID: 12786 at fs/nfsd/nfsproc.c:796 nfserrno+0x74/0x80 [nfsd]
[ 1295.411806] nfsd: non-standard errno: -126
[ 1295.411816] Modules linked in: nfsd nfs_acl auth_rpcgss nfsv4 nfs lockd fscache tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event coretemp crct10dif_pclmul crc32_generic crc32_pclmul snd_ens1371 gameport ghash_clmulni_intel snd_ac97_codec f2fs intel_rapl_perf ac97_bus snd_seq ppdev snd_pcm snd_rawmidi snd_timer vmw_balloon snd_seq_device snd joydev soundcore parport_pc parport nfit acpi_cpufreq tpm_tis vmw_vmci tpm_tis_core tpm shpchp i2c_piix4 grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs_acl]
[ 1295.412522] CPU: 0 PID: 12786 Comm: nfsd Tainted: G W 4.11.0-rc1+ #521
[ 1295.412959] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1295.413814] Call Trace:
[ 1295.414252] dump_stack+0x63/0x86
[ 1295.414666] __warn+0xcb/0xf0
[ 1295.415087] warn_slowpath_fmt+0x5f/0x80
[ 1295.415502] ? put_filp+0x42/0x50
[ 1295.415927] nfserrno+0x74/0x80 [nfsd]
[ 1295.416339] nfsd_open+0xd7/0x180 [nfsd]
[ 1295.416746] nfs4_get_vfs_file+0x367/0x3c0 [nfsd]
[ 1295.417182] ? security_inode_permission+0x41/0x60
[ 1295.417591] nfsd4_process_open2+0x9b2/0x1200 [nfsd]
[ 1295.418007] nfsd4_open+0x481/0x790 [nfsd]
[ 1295.418409] nfsd4_proc_compound+0x395/0x680 [nfsd]
[ 1295.418812] nfsd_dispatch+0xb8/0x1f0 [nfsd]
[ 1295.419233] svc_process_common+0x4d9/0x830 [sunrpc]
[ 1295.419631] svc_process+0xfe/0x1b0 [sunrpc]
[ 1295.420033] nfsd+0xe9/0x150 [nfsd]
[ 1295.420420] kthread+0x101/0x140
[ 1295.420802] ? nfsd_destroy+0x60/0x60 [nfsd]
[ 1295.421199] ? kthread_park+0x90/0x90
[ 1295.421598] ret_from_fork+0x2c/0x40
[ 1295.421996] ---[ end trace 0d5a969cd7852e1f ]---

Signed-off-by: Kinglong Mee
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Kinglong Mee
2017-04-08 15:30:33 +0800

12 Mar, 2017

2 commits

a3c6cbc4e nfsd: special case truncates some more ... Browse Code »

commit 783112f7401ff449d979530209b3f6c2594fdb4e upstream.

Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributes to set to set various file attributes including the
file size and the uid/gid.

The Linux syscalls never mix size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates don't update random other attributes, and many other file
systems handle the case but do not update the other attributes in the
same transaction. NFSD on the other hand passes the attributes it gets
on the wire more or less directly through to the VFS, leading to updates
the file systems don't expect. XFS at least has an assert on the
allowed attributes, which caught an unusual NFS client setting the size
and group at the same time.

To handle this issue properly this splits the notify_change call in
nfsd_setattr into two separate ones.

Signed-off-by: Christoph Hellwig
Tested-by: Chuck Lever
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-03-12 13:41:50 +0800
9bdd39c14 nfsd: minor nfsd_setattr cleanup ... Browse Code »

commit 758e99fefe1d9230111296956335cd35995c0eaf upstream.

Simplify exit paths, size_change use.

Signed-off-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-03-12 13:41:50 +0800

09 Feb, 2017

1 commit

743146d34 NFSD: Fix a null reference case in find_or_create_lock_stateid() ... Browse Code »

commit d19fb70dd68c4e960e2ac09b0b9c79dfdeefa726 upstream.

nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

If nfsd doesn't go through init_lock_stateid() and put stateid at end,
there is a NULL reference to .sc_free when calling nfs4_put_stid(ns).

This patch let the nfs4_stid.sc_free assignment to nfs4_alloc_stid().

Fixes: 356a95ece7aa "nfsd: clean up races in lock stateid searching..."
Signed-off-by: Kinglong Mee
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Kinglong Mee
2017-02-09 15:08:27 +0800

02 Nov, 2016

1 commit

f46c445b7 nfsd: Fix general protection fault in release_lock_stateid() ... Browse Code »

When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example),
I get this crash on the server:

Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
Oct 28 22:04:30 klimt kernel: RIP: 0010:[] [] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0 EFLAGS: 00010246
Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
Oct 28 22:04:30 klimt kernel: FS: 0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
Oct 28 22:04:30 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
Oct 28 22:04:30 klimt kernel: Stack:
Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
Oct 28 22:04:30 klimt kernel: Call Trace:
Oct 28 22:04:30 klimt kernel: [] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
Oct 28 22:04:30 klimt kernel: [] nfsd4_proc_compound+0x40d/0x690 [nfsd]
Oct 28 22:04:30 klimt kernel: [] nfsd_dispatch+0xd4/0x1d0 [nfsd]
Oct 28 22:04:30 klimt kernel: [] svc_process_common+0x3d9/0x700 [sunrpc]
Oct 28 22:04:30 klimt kernel: [] svc_process+0xf4/0x330 [sunrpc]
Oct 28 22:04:30 klimt kernel: [] nfsd+0xfa/0x160 [nfsd]
Oct 28 22:04:30 klimt kernel: [] ? nfsd_destroy+0x170/0x170 [nfsd]
Oct 28 22:04:30 klimt kernel: [] kthread+0x10b/0x120
Oct 28 22:04:30 klimt kernel: [] ? kthread_stop+0x280/0x280
Oct 28 22:04:30 klimt kernel: [] ret_from_fork+0x2a/0x40
Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
Oct 28 22:04:30 klimt kernel: RIP [] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP
Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---

Jeff Layton says:
> Hm...now that I look though, this is a little suspicious:
>
> struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
>
> I wonder if it's possible for the openstateid to have already been
> destroyed at this point.
>
> We might be better off doing something like this to get the client pointer:
>
> stp->st_stid.sc_client;
>
> ...which should be more direct and less dependent on other stateids
> staying valid.

With the suggested change, I am no longer able to reproduce the above oops.

v2: Fix unhash_lock_stateid() as well

Fix-suggested-by: Jeff Layton
Fixes: 42691398be08 ('nfsd: Fix race between FREE_STATEID and LOCK')
Signed-off-by: Chuck Lever
Reviewed-by: Jeff Layton
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-11-02 03:24:43 +0800

25 Oct, 2016

1 commit

0cc11a61b nfsd: move blocked lock handling under a dedicated spinlock ... Browse Code »

Bruce was hitting some lockdep warnings in testing, showing that we
could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a
rather complex situation involving four different spinlocks.

The crux of the matter is that we end up taking the nn->client_lock in
the lm_notify handler. The simplest fix is to just declare a new
per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-10-25 04:51:21 +0800

14 Oct, 2016

1 commit

277855647 Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:

- Anna Schumacker contributed a simple NFSv4.2 COPY implementation.
COPY is already supported on the client side, so a call to
copy_file_range() on a recent client should now result in a
server-side copy that doesn't require all the data to make a round
trip to the client and back.

- Jeff Layton implemented callbacks to notify clients when contended
locks become available, which should reduce latency on workloads
with contended locks"

* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
NFSD: Implement the COPY call
nfsd: handle EUCLEAN
nfsd: only WARN once on unmapped errors
exportfs: be careful to only return expected errors.
nfsd4: setclientid_confirm with unmatched verifier should fail
nfsd: randomize SETCLIENTID reply to help distinguish servers
nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
nfsd: add a LRU list for blocked locks
nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
nfsd: plumb in a CB_NOTIFY_LOCK operation
NFSD: fix corruption in notifier registration
svcrdma: support Remote Invalidation
svcrdma: Server-side support for rpcrdma_connect_private
rpcrdma: RDMA/CM private message data structure
svcrdma: Skip put_page() when send_reply() fails
svcrdma: Tail iovec leaves an orphaned DMA mapping
nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
nfsd: eliminate cb_minorversion field
nfsd: don't set a FL_LAYOUT lease for flexfiles layouts

Linus Torvalds
2016-10-14 12:04:42 +0800

11 Oct, 2016

2 commits

101105b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning

Linus Torvalds
2016-10-11 11:16:43 +0800
abb5a14fa Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs updates from Al Viro:
"Assorted misc bits and pieces.

There are several single-topic branches left after this (rename2
series from Miklos, current_time series from Deepa Dinamani, xattr
series from Andreas, uaccess stuff from from me) and I'd prefer to
send those separately"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
proc: switch auxv to use of __mem_open()
hpfs: support FIEMAP
cifs: get rid of unused arguments of CIFSSMBWrite()
posix_acl: uapi header split
posix_acl: xattr representation cleanups
fs/aio.c: eliminate redundant loads in put_aio_ring_file
fs/internal.h: add const to ns_dentry_operations declaration
compat: remove compat_printk()
fs/buffer.c: make __getblk_slow() static
proc: unsigned file descriptors
fs/file: more unsigned file descriptors
fs: compat: remove redundant check of nr_segs
cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
cifs: don't use memcpy() to copy struct iov_iter
get rid of separate multipage fault-in primitives
fs: Avoid premature clearing of capabilities
fs: Give dentry to inode_change_ok() instead of inode
fuse: Propagate dentry down to inode_change_ok()
ceph: Propagate dentry down to inode_change_ok()
xfs: Propagate dentry down to inode_change_ok()
...

Linus Torvalds
2016-10-11 04:04:49 +0800

08 Oct, 2016

5 commits

e55f1d1d1 Merge remote-tracking branch 'jk/vfs' into work.misc Browse Code »

Al Viro
2016-10-08 23:06:08 +0800
81243eacf cred: simpler, 1D supplementary groups ... Browse Code »

Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.

If number of gids is
Cc: Vasily Kulikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2016-10-08 09:46:30 +0800
29ae7f9dc NFSD: Implement the COPY call ... Browse Code »

I only implemented the sync version of this call, since it's the
easiest. I can simply call vfs_copy_range() and have the vfs do the
right thing for the filesystem being exported.

Signed-off-by: Anna Schumaker
Signed-off-by: J. Bruce Fields

Anna Schumaker
2016-10-08 02:54:25 +0800
42e616167 nfsd: handle EUCLEAN ... Browse Code »

Eric Sandeen reports that xfs can return this if filesystem corruption
prevented completing the operation.

Reported-by: Eric Sandeen
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2016-10-08 02:54:19 +0800
ff30f08c3 nfsd: only WARN once on unmapped errors ... Browse Code »

No need to spam the logs here.

The only drawback is losing information if we ever encounter two
different unmapped errors, but in practice we've rarely see even one.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2016-10-08 02:53:33 +0800

28 Sep, 2016

1 commit

c2050a454 fs: Replace current_fs_time() with current_time() ... Browse Code »

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:22 +0800

27 Sep, 2016

7 commits

7d22fc11c nfsd4: setclientid_confirm with unmatched verifier should fail ... Browse Code »

A setclientid_confirm with (clientid, verifier) both matching an
existing confirmed record is assumed to be a replay, but if the verifier
doesn't match, it shouldn't be.

This would be a very rare case, except that clients following
https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the
failure.

Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2016-09-27 03:20:38 +0800
ebd7c72c6 nfsd: randomize SETCLIENTID reply to help distinguish servers ... Browse Code »

NFSv4.1 has built-in trunking support that allows a client to determine
whether two connections to two different IP addresses are actually to
the same server. NFSv4.0 does not, but RFC 7931 attempts to provide
clients a means to do this, basically by performing a SETCLIENTID to one
address and confirming it with a SETCLIENTID_CONFIRM to the other.

Linux clients since 05f4c350ee02 "NFS: Discover NFSv4 server trunking
when mounting" implement a variation on this suggestion. It is possible
that other clients do too.

This depends on the clientid and verifier not being accepted by an
unrelated server. Since both are 64-bit values, that would be very
unlikely if they were random numbers. But they aren't:

knfsd generates the 64-bit clientid by concatenating the 32-bit boot
time (in seconds) and a counter. This makes collisions between
clientids generated by the same server extremely unlikely. But
collisions are very likely between clientids generated by servers that
boot at the same time, and it's quite common for multiple servers to
boot at the same time. The verifier is a concatenation of the
SETCLIENTID time (in seconds) and a counter, so again collisions between
different servers are likely if multiple SETCLIENTIDs are done at the
same time, which is a common case.

Therefore recent NFSv4.0 clients may decide two different servers are
really the same, and mount a filesystem from the wrong server.

Fortunately the Linux client, since 55b9df93ddd6 "nfsv4/v4.1: Verify the
client owner id during trunking detection", only does this when given
the non-default "migration" mount option.

The fault is really with RFC 7931, and needs a client fix, but in the
meantime we can mitigate the chance of these collisions by randomizing
the starting value of the counters used to generate clientids and
verifiers.

Reported-by: Frank Sorenson
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2016-09-27 03:20:38 +0800
19e4c3477 nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies ... Browse Code »

If we are using v4.1+, then we can send notification when contended
locks become free. Inform the client of that fact.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-09-27 03:20:37 +0800
7919d0a27 nfsd: add a LRU list for blocked locks ... Browse Code »

It's possible for a client to call in on a lock that is blocked for a
long time, but discontinue polling for it. A malicious client could
even set a lock on a file, and then spam the server with failing lock
requests from different lockowners that pile up in a DoS attack.

Add the blocked lock structures to a per-net namespace LRU when hashing
them, and timestamp them. If the lock request is not revisited after a
lease period, we'll drop it under the assumption that the client is no
longer interested.

This also gives us a mechanism to clean up these objects at server
shutdown time as well.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-09-27 03:20:36 +0800
76d348fad nfsd: have nfsd4_lock use blocking locks for v4.1+ locks ... Browse Code »

Create a new per-lockowner+per-inode structure that contains a
file_lock. Have nfsd4_lock add this structure to the lockowner's list
prior to setting the lock. Then call the vfs and request a blocking lock
(by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
back, then we dequeue the block structure and free it. When the next
lock request comes in, we'll look for an existing block for the same
filehandle and dequeue and reuse it if there is one.

When the lock comes free (a'la an lm_notify call), we dequeue it
from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
inform the client that it should retry the lock request.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-09-27 03:20:36 +0800
a188620eb nfsd: plumb in a CB_NOTIFY_LOCK operation ... Browse Code »

Add the encoding/decoding for CB_NOTIFY_LOCK operations.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-09-27 03:20:35 +0800
1eca45f8a NFSD: fix corruption in notifier registration ... Browse Code »

By design notifier can be registered once only, however nfsd registers
the same inetaddr notifiers per net-namespace. When this happen it
corrupts list of notifiers, as result some notifiers can be not called
on proper event, traverse on list can be cycled forever, and second
unregister can access already freed memory.

Cc: stable@vger.kernel.org
fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain")
Signed-off-by: Vasily Averin
Reviewed-by: Jeff Layton
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

Vasily Averin
2016-09-27 02:17:45 +0800

23 Sep, 2016

1 commit

bec782b4f nfsd: fix dprintk in nfsd4_encode_getdeviceinfo ... Browse Code »

nfserr is big-endian, so we should convert it to host-endian before
printing it.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-09-23 22:18:52 +0800

22 Sep, 2016

1 commit

31051c85b fs: Give dentry to inode_change_ok() instead of inode ... Browse Code »

inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara

Jan Kara
2016-09-22 16:56:19 +0800

17 Sep, 2016

2 commits

89dfdc964 nfsd: eliminate cb_minorversion field ... Browse Code »

We already have that info in the client pointer. No need to pass around
a copy.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-09-17 04:15:52 +0800
1983a66f5 nfsd: don't set a FL_LAYOUT lease for flexfiles layouts ... Browse Code »

We currently can hit a deadlock (of sorts) when trying to use flexfiles
layouts with XFS. XFS will call break_layout when something wants to
write to the file. In the case of the (super-simple) flexfiles layout
driver in knfsd, the MDS and DS are the same machine.

The client can get a layout and then issue a v3 write to do its I/O. XFS
will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to
be issued to the client. The client however can't return the layout
until the v3 WRITE completes, but XFS won't allow the write to proceed
until the layout is returned.

Christoph says:

XFS only cares about block-like layouts where the client has direct
access to the file blocks. I'd need to look how to propagate the
flag into break_layout, but in principle we don't need to do any
recalls on truncate ever for file and flexfile layouts.

If we're never going to recall the layout, then we don't even need to
set the lease at all. Just skip doing so on flexfiles layouts by
adding a new flag to struct nfsd4_layout_ops and skipping the lease
setting and removal when that flag is true.

Cc: Christoph Hellwig
Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-09-17 04:15:52 +0800

13 Aug, 2016

1 commit

dd257933f nfsd: don't return an unhashed lock stateid after taking mutex ... Browse Code »

nfsd4_lock will take the st_mutex before working with the stateid it
gets, but between the time when we drop the cl_lock and take the mutex,
the stateid could become unhashed (a'la FREE_STATEID). If that happens
the lock stateid returned to the client will be forgotten.

Fix this by first moving the st_mutex acquisition into
lookup_or_create_lock_state. Then, have it check to see if the lock
stateid is still hashed after taking the mutex. If it's not, then put
the stateid and try the find/create again.

Signed-off-by: Jeff Layton
Tested-by: Alexey Kodanev
Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-08-13 04:10:25 +0800

12 Aug, 2016

1 commit

42691398b nfsd: Fix race between FREE_STATEID and LOCK ... Browse Code »

When running LTP's nfslock01 test, the Linux client can send a LOCK
and a FREE_STATEID request at the same time. The outcome is:

Frame 324 R OPEN stateid [2,O]

Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID

In other words, the server returns stateid L in a successful LOCK
reply, but it has already released it. Subsequent uses of stateid L
fail.

To address this, protect the generation check in nfsd4_free_stateid
with the st_mutex. This should guarantee that only one of two
outcomes occurs: either LOCK returns a fresh valid stateid, or
FREE_STATEID returns NFS4ERR_LOCKS_HELD.

Reported-by: Alexey Kodanev
Fix-suggested-by: Jeff Layton
Signed-off-by: Chuck Lever
Tested-by: Alexey Kodanev
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-08-12 03:08:39 +0800

11 Aug, 2016

1 commit

502aa0a5b nfsd: fix dentry refcounting on create ... Browse Code »

b44061d0b9 introduced a dentry ref counting bug. Previously we were
grabbing one ref to dchild in nfsd_create(), but with the creation of
nfsd_create_locked() we have a ref for dchild from the lookup in
nfsd_create(), and then another ref in nfsd_create_locked(). The ref
from the lookup in nfsd_create() is never dropped and results in
dentries still in use at unmount.

Signed-off-by: Josef Bacik
Fixes: b44061d0b9 "nfsd: reorganize nfsd_create"
Reported-by: kernel test robot
Reviewed-by: Jeff Layton
Acked-by: Al Viro
Signed-off-by: J. Bruce Fields

Josef Bacik
2016-08-11 23:42:08 +0800

05 Aug, 2016

2 commits

a71e36045 Merge tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"Highlights:

- Trond made a change to the server's tcp logic that allows a fast
client to better take advantage of high bandwidth networks, but may
increase the risk that a single client could starve other clients;
a new sunrpc.svc_rpc_per_connection_limit parameter should help
mitigate this in the (hopefully unlikely) event this becomes a
problem in practice.

- Tom Haynes added a minimal flex-layout pnfs server, which is of no
use in production for now--don't build it unless you're doing
client testing or further server development"

* tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd: remove some dead code in nfsd_create_locked()
nfsd: drop unnecessary MAY_EXEC check from create
nfsd: clean up bad-type check in nfsd_create_locked
nfsd: remove unnecessary positive-dentry check
nfsd: reorganize nfsd_create
nfsd: check d_can_lookup in fh_verify of directories
nfsd: remove redundant zero-length check from create
nfsd: Make creates return EEXIST instead of EACCES
SUNRPC: Detect immediate closure of accepted sockets
SUNRPC: accept() may return sockets that are still in SYN_RECV
nfsd: allow nfsd to advertise multiple layout types
nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
nfsd/blocklayout: Make sure calculate signature/designator length aligned
xfs: abstract block export operations from nfsd layouts
SUNRPC: Remove unused callback xpo_adjust_wspace()
SUNRPC: Change TCP socket space reservation
SUNRPC: Add a server side per-connection limit
SUNRPC: Micro optimisation for svc_data_ready
SUNRPC: Call the default socket callbacks instead of open coding
SUNRPC: lock the socket while detaching it
...

Linus Torvalds
2016-08-05 07:59:06 +0800
2b1188592 nfsd: remove some dead code in nfsd_create_locked() ... Browse Code »

We changed this around in f135af1041f ('nfsd: reorganize nfsd_create')
so "dchild" can't be an error pointer any more. Also, dchild can't be
NULL here (and dput would already handle this even if it was).

Signed-off-by: Dan Carpenter
Signed-off-by: J. Bruce Fields

Dan Carpenter
2016-08-05 05:11:53 +0800