Eric Lee / smarc-fsl-linux-kernel

09 Feb, 2017

1 commit

a3d729526 svcrpc: fix oops in absence of krb5 module

commit 034dd34ff4916ec1f8f74e39ca3efb04eab2f791 upstream.

Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
(4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
to mount the server with krb5 where the server doesn't have the
rpcsec_gss_krb5 module built."

The problem is that rsci.cred is copied from a svc_cred structure that
gss_proxy didn't properly initialize. Fix that.
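
A minimal userspace sketch of this class of bug, not the actual patch. The names `demo_cred`, `demo_mech`, `demo_cred_init` and `demo_cred_release` are hypothetical stand-ins. It illustrates why a credential structure should be put into a known state before it is copied, so that a later release on an error path never follows a garbage pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a GSS mechanism and a credential. */
struct demo_mech { int refcount; };
struct demo_cred {
    unsigned int uid;
    struct demo_mech *mech;   /* may legitimately be NULL */
};

/* Put the credential into a known state before anything else uses it. */
void demo_cred_init(struct demo_cred *cred)
{
    memset(cred, 0, sizeof(*cred));
}

/* Drop the mechanism reference; safe on an initialized but empty cred. */
void demo_cred_release(struct demo_cred *cred)
{
    if (cred->mech != NULL) {
        cred->mech->refcount--;
        cred->mech = NULL;
    }
}
```

With the init step in place, a copy of the credential can be released on a failure path even when no mechanism was ever attached.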

[120408.542387] general protection fault: 0000 [#1] SMP
...
[120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16
[120408.567037] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
[120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
...
[120408.584946] ? rsc_free+0x55/0x90 [auth_rpcgss]
[120408.585901] gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
[120408.587017] svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
[120408.588257] ? __enqueue_entity+0x6c/0x70
[120408.589101] svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
[120408.590212] ? try_to_wake_up+0x4a/0x360
[120408.591036] ? wake_up_process+0x15/0x20
[120408.592093] ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
[120408.593177] svc_authenticate+0xe1/0x100 [sunrpc]
[120408.594168] svc_process_common+0x203/0x710 [sunrpc]
[120408.595220] svc_process+0x105/0x1c0 [sunrpc]
[120408.596278] nfsd+0xe9/0x160 [nfsd]
[120408.597060] kthread+0x101/0x140
[120408.597734] ? nfsd_destroy+0x60/0x60 [nfsd]
[120408.598626] ? kthread_park+0x90/0x90
[120408.599448] ret_from_fork+0x22/0x30

Fixes: 1d658336b05f "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
Cc: Simo Sorce
Reported-by: Olga Kornievskaia
Tested-by: Olga Kornievskaia
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2017-02-09 15:08:27 +0800

01 Feb, 2017

1 commit

cb1d48f55 SUNRPC: cleanup ida information when removing sunrpc module

commit c929ea0b910355e1876c64431f3d5802f95b3d75 upstream.

After removing the sunrpc module, I get many kmemleak reports such as:
unreferenced object 0xffff88003316b1e0 (size 544):
comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] kmem_cache_alloc+0x15e/0x1f0
[] ida_pre_get+0xaa/0x150
[] ida_simple_get+0xad/0x180
[] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
[] lockd+0x4d/0x270 [lockd]
[] param_set_timeout+0x55/0x100 [lockd]
[] svc_defer+0x114/0x3f0 [sunrpc]
[] svc_defer+0x2d7/0x3f0 [sunrpc]
[] rpc_show_info+0x8a/0x110 [sunrpc]
[] proc_reg_write+0x7f/0xc0
[] __vfs_write+0xdf/0x3c0
[] vfs_write+0xef/0x240
[] SyS_write+0xad/0x130
[] entry_SYSCALL_64_fastpath+0x1a/0xa9
[] 0xffffffffffffffff

I found that the ida information (dynamic memory) isn't cleaned up.
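
A rough userspace sketch of the leak pattern, assuming a toy allocator; `demo_ida` and its functions are hypothetical and are not the kernel IDA API (the kernel offers `ida_destroy()` for the teardown step). It shows that returning every ID does not free the allocator's own bookkeeping, so an explicit destroy is needed at teardown.

```c
#include <stdlib.h>

/* Hypothetical toy ID allocator; it only mimics the idea that an
 * allocator keeps dynamically allocated bookkeeping behind the scenes. */
struct demo_ida {
    unsigned char *used;   /* allocated lazily on first use */
    int capacity;
};

/* Return the lowest free ID, or -1 on failure. */
int demo_ida_get(struct demo_ida *ida)
{
    if (ida->used == NULL) {
        ida->capacity = 64;
        ida->used = calloc((size_t)ida->capacity, 1);
        if (ida->used == NULL)
            return -1;
    }
    for (int i = 0; i < ida->capacity; i++) {
        if (!ida->used[i]) {
            ida->used[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Give an ID back; the bookkeeping array stays allocated. */
void demo_ida_put(struct demo_ida *ida, int id)
{
    if (ida->used != NULL && id >= 0 && id < ida->capacity)
        ida->used[id] = 0;
}

/* Must run at teardown: releasing every ID does not free the bookkeeping. */
void demo_ida_destroy(struct demo_ida *ida)
{
    free(ida->used);
    ida->used = NULL;
    ida->capacity = 0;
}
```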

Signed-off-by: Kinglong Mee
Fixes: 2f048db4680a ("SUNRPC: Add an identifier for struct rpc_clnt")
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Kinglong Mee
2017-02-01 15:33:09 +0800

26 Jan, 2017

5 commits

d34b6684e xprtrdma: Squelch "max send, max recv" messages at connect time

commit 6d6bf72de914059b304f7b99530a7856e5c846aa upstream.

Clean up: This message was intended to be a dprintk, as it is on the
server-side.

Fixes: 87cfb9a0c85c ('xprtrdma: Client-side support for ...')
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2017-01-26 15:24:43 +0800
8ade1c2b4 xprtrdma: Make FRWR send queue entry accounting more accurate

commit 8d38de65644d900199f035277aa5f3da4aa9fc17 upstream.

Verbs providers may perform house-keeping on the Send Queue during
each signaled send completion. It is necessary therefore for a verbs
consumer (like xprtrdma) to occasionally force a signaled send
completion if it runs unsignaled most of the time.

xprtrdma does not require signaled completions for Send or FastReg
Work Requests, but does signal some LocalInv Work Requests. To
ensure that Send Queue house-keeping can run before the Send Queue
is more than half-consumed, xprtrdma forces a signaled completion
on occasion by counting the number of Send Queue Entries it
consumes. It currently does this by counting each ib_post_send as
one Entry.

Commit c9918ff56dfb ("xprtrdma: Add ro_unmap_sync method for FRWR")
introduced the ability for frwr_op_unmap_sync to post more than one
Work Request with a single post_send. Thus the underlying assumption
of one Send Queue Entry per ib_post_send is no longer true.

Also, FastReg Work Requests are currently never signaled. They
should be signaled once in a while, just as Send is, to keep the
accounting of consumed SQEs accurate.
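
The accounting idea can be sketched in plain C. This is an illustration under assumptions, not the xprtrdma code: `demo_sq_budget`, `demo_sq_init` and `demo_sq_charge` are hypothetical names, and the "half the queue" threshold is taken from the description above. The point is that each post is charged for the number of Work Requests it carries, not a flat one.

```c
#include <stdbool.h>

/* Hypothetical budget tracker: request a signaled completion before more
 * than half of the Send Queue has been consumed by unsignaled WRs. */
struct demo_sq_budget {
    int depth;       /* total Send Queue entries */
    int remaining;   /* entries left before a signal is forced */
};

void demo_sq_init(struct demo_sq_budget *b, int depth)
{
    b->depth = depth;
    b->remaining = depth / 2;
}

/* Charge nr_wrs entries for one post; return true if the post should
 * carry a signaled completion. */
bool demo_sq_charge(struct demo_sq_budget *b, int nr_wrs)
{
    b->remaining -= nr_wrs;
    if (b->remaining > 0)
        return false;
    b->remaining = b->depth / 2;   /* reset after forcing a signal */
    return true;
}
```

A post that chains three Work Requests would call `demo_sq_charge(&b, 3)`, so the budget runs out three times faster than with a single-WR post.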

While we're here, convert the CQCOUNT macros to the currently
preferred kernel coding style, which is inline functions.

Fixes: c9918ff56dfb ("xprtrdma: Add ro_unmap_sync method for FRWR")
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2017-01-26 15:24:43 +0800
73a2e2405 svcrdma: avoid duplicate dma unmapping during error recovery

commit ce1ca7d2d140a1f4aaffd297ac487f246963dd2f upstream.

In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path
invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes
svc_rdma_put_frmr() which in turn tries to unmap the same sg list through
ib_dma_unmap_sg() again. This second unmap is invalid and could lead to
problems when the iova being unmapped is subsequently reused. Remove
the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr()
handle it.
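
A small sketch of the ownership rule, with hypothetical names (`demo_frmr`, `demo_put_frmr`, `demo_read_chunk`). It is not the svcrdma code. The `mapped` flag is an extra guard added for illustration; the fix described above simply removes the duplicate call.

```c
#include <stdbool.h>

struct demo_frmr {
    bool mapped;
    int unmap_calls;   /* counts real unmap operations, for illustration */
};

/* The one place that unmaps. */
void demo_put_frmr(struct demo_frmr *frmr)
{
    if (frmr->mapped) {
        frmr->mapped = false;
        frmr->unmap_calls++;
    }
}

/* Error path of a failed post: no local unmap, just release. */
int demo_read_chunk(struct demo_frmr *frmr, bool post_fails)
{
    frmr->mapped = true;
    if (post_fails) {
        demo_put_frmr(frmr);
        return -1;
    }
    return 0;
}
```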

Fixes: 412a15c0fe53 ("svcrdma: Port to new memory registration API")
Signed-off-by: Sriharsha Basavapatna
Reviewed-by: Chuck Lever
Reviewed-by: Yuval Shaia
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Sriharsha Basavapatna
2017-01-26 15:24:40 +0800
f29f3616b svcrpc: don't leak contexts on PROC_DESTROY

commit 78794d1890708cf94e3961261e52dcec2cc34722 upstream.

Context expiry times are in units of seconds since boot, not unix time.

The use of get_seconds() here therefore sets the expiry time decades in
the future. This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.
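
To make the clock mix-up concrete, here is a hedged sketch with hypothetical names (`demo_ctx` and friends); the clock values are passed in as plain numbers so the example has no system dependencies. It contrasts expiring a context with the boot-relative clock against storing a wall-clock value by mistake.

```c
#include <stdbool.h>

/* Cache entries carry an expiry expressed in seconds since boot. */
struct demo_ctx { long long expiry_boot_secs; };

bool demo_ctx_expired(const struct demo_ctx *ctx, long long now_boot_secs)
{
    return now_boot_secs >= ctx->expiry_boot_secs;
}

/* Correct: expire "now" using the same clock the cache compares against. */
void demo_ctx_destroy(struct demo_ctx *ctx, long long now_boot_secs)
{
    ctx->expiry_boot_secs = now_boot_secs;
}

/* The mistake described above: a wall-clock value stored as the expiry. */
void demo_ctx_destroy_buggy(struct demo_ctx *ctx, long long unix_now_secs)
{
    ctx->expiry_boot_secs = unix_now_secs;
}
```

On a machine that has been up for an hour, a unix timestamp from 2017 (roughly 1.48 billion seconds) lies about 47 years ahead of the boot-relative clock, which matches the "decades in the future" remark.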

Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2017-01-26 15:24:37 +0800
a297ed84b sunrpc: don't call sleeping functions from the notifier block callbacks

commit 546125d1614264d26080817d0c8cddb9b25081fa upstream.

The inet6addr_chain is an atomic notifier chain, so we can't call
anything that might sleep (like lock_sock)... instead of closing the
socket from svc_age_temp_xprts_now (which is called by the notifier
function), just have the rpc service threads do it instead.
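
The deferral pattern can be sketched as follows. This is a single-threaded illustration with hypothetical names (`demo_xprt`, `demo_notifier_callback`, `demo_service_thread_pass`); real kernel code would use atomic flag operations and wake the service threads, which is omitted here.

```c
#include <stdbool.h>

struct demo_xprt {
    bool close_requested;   /* set from the atomic context */
    bool closed;            /* set by the service thread */
};

/* Notifier side: must not sleep, so only mark the transport. */
void demo_notifier_callback(struct demo_xprt *xprt)
{
    xprt->close_requested = true;
}

/* Service-thread side: may sleep, so it performs the real close.
 * Returns true if it closed the transport on this pass. */
bool demo_service_thread_pass(struct demo_xprt *xprt)
{
    if (!xprt->close_requested || xprt->closed)
        return false;
    xprt->closed = true;    /* stand-in for the sleeping close path */
    return true;
}
```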

Fixes: c3d4879e01be "sunrpc: Add a function to close..."
Signed-off-by: Scott Mayhew
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Scott Mayhew
2017-01-26 15:24:37 +0800

15 Jan, 2017

1 commit

bd99e7a60 svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm

commit 1b9f700b8cfc31089e2dfa5d0905c52fd4529b50 upstream.

Logic copied from xs_setup_bc_tcp().

Fixes: 39a9beab5acb ('rpc: share one xps between all backchannels')
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2017-01-15 20:42:56 +0800

09 Jan, 2017

1 commit

369b330c9 SUNRPC: fix refcounting problems with auth_gss messages.

commit 1cded9d2974fe4fe339fc0ccd6638b80d465ab2c upstream.

There are two problems with refcounting of auth_gss messages.

First, the reference on the pipe->pipe list (taken by a call
to rpc_queue_upcall()) is not counted. It seems to be
assumed that a message in pipe->pipe will always also be in
pipe->in_downcall, where it is correctly reference counted.

However there is no guarantee of this. I have a report of a
NULL dereference in rpc_pipe_read() which suggests a msg
that has been freed is still on the pipe->pipe list.

One way I imagine this might happen is:
- message is queued for uid=U and auth->service=S1
- rpc.gssd reads this message and starts processing.
  This removes the message from pipe->pipe
- message is queued for uid=U and auth->service=S2
- rpc.gssd replies to the first message. gss_pipe_downcall()
  calls __gss_find_upcall(pipe, U, NULL) and it finds the
  *second* message, as new messages are placed at the head
  of ->in_downcall, and the service type is not checked.
- This second message is removed from ->in_downcall and freed
  by gss_release_msg() (even though it is still on pipe->pipe)
- rpc.gssd tries to read another message, and dereferences a pointer
  to this message that has just been freed.

I fix this by incrementing the reference count before calling
rpc_queue_upcall(), and decrementing it if that fails, or normally in
gss_pipe_destroy_msg().

It seems strange that the reply doesn't target the message more
precisely, but I don't know all the details. In any case, I think the
reference counting irregularity became a measurable bug when the
extra arg was added to __gss_find_upcall(), hence the Fixes: line
below.

The second problem is that if rpc_queue_upcall() fails, the new
message is not freed. gss_alloc_msg() set the ->count to 1,
gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
then the pointer is discarded so the memory never gets freed.
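
Both fixes follow a common refcounting shape, sketched here in userspace C. The names (`demo_msg`, `demo_queue_upcall`, and so on) are hypothetical and the queue itself is simulated by a flag. The idea: take a reference on behalf of the list before queueing, and drop it again if queueing fails, so that every holder of a pointer owns exactly one count.

```c
#include <stdlib.h>

struct demo_msg { int count; };

/* A new message starts with one reference, owned by the caller. */
struct demo_msg *demo_msg_alloc(void)
{
    struct demo_msg *m = malloc(sizeof(*m));
    if (m != NULL)
        m->count = 1;
    return m;
}

void demo_msg_get(struct demo_msg *m)
{
    m->count++;
}

/* Drop one reference; returns 1 if this call freed the message. */
int demo_msg_put(struct demo_msg *m)
{
    if (--m->count > 0)
        return 0;
    free(m);
    return 1;
}

/* Queue a message: the list gets its own reference up front, and
 * that reference is returned if the queueing step fails. */
int demo_queue_upcall(struct demo_msg *m, int queue_fails)
{
    demo_msg_get(m);          /* reference owned by the pipe list */
    if (queue_fails) {
        demo_msg_put(m);      /* undo: the list never took the message */
        return -1;
    }
    return 0;
}
```

With this shape, freeing the message from one list cannot leave a dangling pointer on the other, because the second list still holds its own count.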

Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service")
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
Signed-off-by: NeilBrown
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

NeilBrown
2017-01-09 15:32:25 +0800

19 Nov, 2016

1 commit

aad931a30 Merge tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfix from Bruce Fields:
"Just one fix for an NFS/RDMA crash"

* tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux:
sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports

Linus Torvalds
2016-11-19 08:32:21 +0800

14 Nov, 2016

1 commit

ea08e3923 sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports

This fixes the following panic that can occur with NFSoRDMA.

general protection fault: 0000 [#1] SMP
Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
tg3 crct10dif_pclmul drm crct10dif_common
ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
dm_log dm_mod
CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
Workqueue: events check_lifetime
task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
RIP: 0010:[] []
_raw_spin_lock_bh+0x17/0x50
RSP: 0018:ffff88031f587ba8 EFLAGS: 00010206
RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff880322a40000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
Call Trace:
[] lock_sock_nested+0x20/0x50
[] sock_setsockopt+0x78/0x940
[] ? lock_timer_base.isra.33+0x2b/0x50
[] kernel_setsockopt+0x4d/0x50
[] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
[] nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
[] notifier_call_chain+0x4c/0x70
[] __blocking_notifier_call_chain+0x4d/0x70
[] blocking_notifier_call_chain+0x16/0x20
[] __inet_del_ifa+0x168/0x2d0
[] check_lifetime+0x25f/0x270
[] process_one_work+0x17b/0x470
[] worker_thread+0x126/0x410
[] ? rescuer_thread+0x460/0x460
[] kthread+0xcf/0xe0
[] ? kthread_create_on_node+0x140/0x140
[] ret_from_fork+0x58/0x90
[] ? kthread_create_on_node+0x140/0x140
Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 0f
c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
RIP [] _raw_spin_lock_bh+0x17/0x50
RSP

Signed-off-by: Scott Mayhew
Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
Reviewed-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Scott Mayhew
2016-11-14 23:30:58 +0800

12 Nov, 2016

1 commit

ef5beed99 Merge tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
"Most of these fix regressions in 4.9, and none are going to stable
this time around.

Bugfixes:
- Trim extra slashes in v4 nfs_paths to fix tools that use this
- Fix -Wmaybe-uninitialized warnings
- Fix suspicious RCU usages
- Fix Oops when mounting multiple servers at once
- Suppress a false-positive pNFS error
- Fix a DMAR failure in NFS over RDMA"

* tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect
fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()
NFS: Don't print a pNFS error if we aren't using pNFS
NFS: Ignore connections that have cl_rpcclient uninitialized
SUNRPC: Fix suspicious RCU usage
NFSv4.1: work around -Wmaybe-uninitialized warning
NFS: Trim extra slash in v4 nfs_path

Linus Torvalds
2016-11-12 01:15:30 +0800

11 Nov, 2016

1 commit

62bdf94a2 xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect

When a LOCALINV WR is flushed, the frmr is marked STALE, then
frwr_op_unmap_sync DMA-unmaps the frmr's SGL. These STALE frmrs
are then recovered when frwr_op_map hunts for an INVALID frmr to
use.

All other cases that need frmr recovery leave that SGL DMA-mapped.
The FRMR recovery path unconditionally DMA-unmaps the frmr's SGL.

To avoid DMA unmapping the SGL twice for flushed LOCAL_INV WRs,
alter the recovery logic (rather than the hot frwr_op_unmap_sync
path) to distinguish among these cases. This solution also takes
care of the case where multiple LOCAL_INV WRs are issued for the
same rpcrdma_req, some complete successfully, but some are flushed.

Reported-by: Vasco Steinmetz
Signed-off-by: Chuck Lever
Tested-by: Vasco Steinmetz
Signed-off-by: Anna Schumaker

Chuck Lever
2016-11-11 00:04:54 +0800

08 Nov, 2016

1 commit

bb29dd843 SUNRPC: Fix suspicious RCU usage

We need to hold the rcu_read_lock() when calling rcu_dereference(),
otherwise we can't guarantee that the object being dereferenced still
exists.

Fixes: 39e5d2df ("SUNRPC search xprt switch for sockaddr")
Signed-off-by: Anna Schumaker

Anna Schumaker
2016-11-08 03:35:59 +0800

02 Nov, 2016

1 commit

8d42629be svcrdma: backchannel cannot share a page for send and rcv buffers

The underlying transport releases the page pointed to by rq_buffer
during xprt_rdma_bc_send_request. When the backchannel reply arrives,
rq_rbuffer then points to freed memory.

Fixes: 68778945e46f ('SUNRPC: Separate buffer pointers for RPC ...')
Signed-off-by: Chuck Lever
Cc: Jeff Layton
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-11-02 03:23:58 +0800

29 Oct, 2016

1 commit

18e601d6a sunrpc: fix some missing rq_rbuffer assignments

We've been seeing some crashes in testing that look like this:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] memcpy_orig+0x29/0x110
PGD 212ca2067 PUD 212ca3067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev parport_pc i2c_piix4 sg parport i2c_core virtio_balloon pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi 8139too ata_piix libata 8139cp mii virtio_pci floppy virtio_ring serio_raw virtio
CPU: 1 PID: 1540 Comm: nfsd Not tainted 4.9.0-rc1 #39
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
task: ffff88020d7ed200 task.stack: ffff880211838000
RIP: 0010:[] [] memcpy_orig+0x29/0x110
RSP: 0018:ffff88021183bdd0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff88020d7fa000 RCX: 000000f400000000
RDX: 0000000000000014 RSI: ffff880212927020 RDI: 0000000000000000
RBP: ffff88021183be30 R08: 01000000ef896996 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880211704ca8
R13: ffff88021473f000 R14: 00000000ef896996 R15: ffff880211704800
FS: 0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000212ca1000 CR4: 00000000000006e0
Stack:
ffffffffa01ea087 ffffffff63400001 ffff880215145e00 ffff880211bacd00
ffff88021473f2b8 0000000000000004 00000000d0679d67 ffff880211bacd00
ffff88020d7fa000 ffff88021473f000 0000000000000000 ffff88020d7faa30
Call Trace:
[] ? svc_tcp_recvfrom+0x5a7/0x790 [sunrpc]
[] svc_recv+0xad8/0xbd0 [sunrpc]
[] nfsd+0xde/0x160 [nfsd]
[] ? nfsd_destroy+0x60/0x60 [nfsd]
[] kthread+0xd8/0xf0
[] ret_from_fork+0x1f/0x40
[] ? kthread_park+0x60/0x60
Code: 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4
RIP [] memcpy_orig+0x29/0x110
RSP
CR2: 0000000000000000

Both Bruce and Eryu ran a bisect here and found that the problematic
patch was 68778945e46 (SUNRPC: Separate buffer pointers for RPC Call and
Reply messages).

That patch changed rpc_xdr_encode to use a new rq_rbuffer pointer to
set up the receive buffer, but didn't change all of the necessary
codepaths to set it properly. In particular the backchannel setup was
missing.

We need to set rq_rbuffer whenever rq_buffer is set. Ensure that it is.
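
One way to enforce "set both or neither" is a single helper that assigns the two pointers together. The sketch below is an assumption-laden illustration: `demo_rqst` and `demo_set_buffers` are hypothetical, only the field names `rq_buffer` and `rq_rbuffer` come from the text, and placing the receive area directly after the send area is an assumed layout.

```c
#include <stddef.h>

struct demo_rqst {
    char *rq_buffer;    /* send (Call) buffer */
    char *rq_rbuffer;   /* receive (Reply) buffer */
};

/* Carve one allocation into send and receive areas, setting both
 * pointers in the same place so neither can be forgotten. */
void demo_set_buffers(struct demo_rqst *req, char *buf, size_t send_size)
{
    req->rq_buffer = buf;
    req->rq_rbuffer = buf + send_size;
}
```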

Reviewed-by: Chuck Lever
Tested-by: Chuck Lever
Reported-by: Eryu Guan
Tested-by: Eryu Guan
Fixes: 68778945e46 "SUNRPC: Separate buffer pointers..."
Reported-by: J. Bruce Fields
Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-10-29 04:57:33 +0800

27 Oct, 2016

1 commit

2876a3446 sunrpc: don't pass on-stack memory to sg_set_buf

As of ac4e97abce9b "scatterlist: sg_set_buf() argument must be in linear
mapping", sg_set_buf hits a BUG when make_checksum_v2->xdr_process_buf,
among other callers, passes it memory on the stack.

We only need a scatterlist to pass this to the crypto code, and it seems
like overkill to require kmalloc'd memory just to encrypt a few bytes,
but for now this seems the best fix.

Many of these callers are in the NFS write paths, so we allocate with
GFP_NOFS. It might be possible to do without allocations here entirely,
but that would probably be a bigger project.
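
The shape of the workaround, sketched in userspace C with hypothetical names (`demo_consume`, `demo_checksum_header`). `malloc` stands in for the kernel allocator, and the byte-summing consumer stands in for the crypto code that must not be handed stack memory.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in consumer that, like a scatterlist user, must not be given
 * stack memory. Here it just sums the bytes. */
unsigned int demo_consume(const unsigned char *buf, size_t len)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Copy a small header to heap memory, hand the heap copy to the
 * consumer, then free it. Returns 0 on success, -1 if allocation fails. */
int demo_checksum_header(const unsigned char *hdr, size_t len,
                         unsigned int *out)
{
    unsigned char *heap = malloc(len);
    if (heap == NULL)
        return -1;
    memcpy(heap, hdr, len);
    *out = demo_consume(heap, len);
    free(heap);
    return 0;
}
```

The caller may keep its header on the stack; only the heap copy ever reaches the consumer.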

Cc: Rusty Russell
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2016-10-27 03:49:48 +0800

14 Oct, 2016

2 commits

c4a86165d Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
"Highlights include:

Stable bugfixes:
- sunrpc: fix write space race causing stalls
- NFS: Fix inode corruption in nfs_prime_dcache()
- NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
- NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
- NFSv4: Open state recovery must account for file permission changes
- NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

Features:
- Add support for tracking multiple layout types with an ordered list
- Add support for using multiple backchannel threads on the client
- Add support for pNFS file layout session trunking
- Delay xprtrdma use of DMA API (for device driver removal)
- Add support for xprtrdma remote invalidation
- Add support for larger xprtrdma inline thresholds
- Use a scatter/gather list for sending xprtrdma RPC calls
- Add support for the CB_NOTIFY_LOCK callback
- Improve hashing sunrpc auth_creds by using both uid and gid

Bugfixes:
- Fix xprtrdma use of DMA API
- Validate filenames before adding to the dcache
- Fix corruption of xdr->nwords in xdr_copy_to_scratch
- Fix setting buffer length in xdr_set_next_buffer()
- Don't deadlock the state manager on the SEQUENCE status flags
- Various delegation and stateid related fixes
- Retry operations if an interrupted slot receives EREMOTEIO
- Make nfs boot time y2038 safe"

* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
fs: nfs: Make nfs boot time y2038 safe
sunrpc: replace generic auth_cred hash with auth-specific function
sunrpc: add RPCSEC_GSS hash_cred() function
sunrpc: add auth_unix hash_cred() function
sunrpc: add generic_auth hash_cred() function
sunrpc: add hash_cred() function to rpc_authops struct
Retry operation on EREMOTEIO on an interrupted slot
pNFS: Fix atime updates on pNFS clients
sunrpc: queue work on system_power_efficient_wq
NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
NFSv4: If recovery failed for a specific open stateid, then don't retry
NFSv4: Fix retry issues with nfs41_test/free_stateid
NFSv4: Open state recovery must account for file permission changes
NFSv4: Mark the lock and open stateids as invalid after freeing them
NFSv4: Don't test open_stateid unless it is set
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
NFSv4: Fix a race when updating an open_stateid
NFSv4: Fix a race in nfs_inode_reclaim_delegation()
...

Linus Torvalds
2016-10-14 12:28:20 +0800
277855647 Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
"Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:

- Anna Schumaker contributed a simple NFSv4.2 COPY implementation.
COPY is already supported on the client side, so a call to
copy_file_range() on a recent client should now result in a
server-side copy that doesn't require all the data to make a round
trip to the client and back.

- Jeff Layton implemented callbacks to notify clients when contended
locks become available, which should reduce latency on workloads
with contended locks"

* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
NFSD: Implement the COPY call
nfsd: handle EUCLEAN
nfsd: only WARN once on unmapped errors
exportfs: be careful to only return expected errors.
nfsd4: setclientid_confirm with unmatched verifier should fail
nfsd: randomize SETCLIENTID reply to help distinguish servers
nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
nfsd: add a LRU list for blocked locks
nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
nfsd: plumb in a CB_NOTIFY_LOCK operation
NFSD: fix corruption in notifier registration
svcrdma: support Remote Invalidation
svcrdma: Server-side support for rpcrdma_connect_private
rpcrdma: RDMA/CM private message data structure
svcrdma: Skip put_page() when send_reply() fails
svcrdma: Tail iovec leaves an orphaned DMA mapping
nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
nfsd: eliminate cb_minorversion field
nfsd: don't set a FL_LAYOUT lease for flexfiles layouts

Linus Torvalds
2016-10-14 12:04:42 +0800

11 Oct, 2016

2 commits

101105b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull more vfs updates from Al Viro:
"->rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning

Linus Torvalds
2016-10-11 11:16:43 +0800
3873691e5 Merge remote-tracking branch 'ovl/rename2' into for-linus

Al Viro
2016-10-11 11:02:51 +0800

10 Oct, 2016

1 commit

b9044ac82 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull main rdma updates from Doug Ledford:
"This is the main pull request for the rdma stack this release. The
code has been through 0day and I had it tagged for linux-next testing
for a couple days.

Summary:

- updates to mlx5

- updates to mlx4 (two conflicts, both minor and easily resolved)

- updates to iw_cxgb4 (one conflict, not so obvious to resolve,
proper resolution is to keep the code in cxgb4_main.c as it is in
Linus' tree as attach_uld was refactored and moved into
cxgb4_uld.c)

- improvements to uAPI (moved vendor specific API elements to uAPI
area)

- add hns-roce driver and hns and hns-roce ACPI reset support

- conversion of all rdma code away from deprecated
create_singlethread_workqueue

- security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
staging)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
staging/lustre: Disable InfiniBand support
iw_cxgb4: add fast-path for small REG_MR operations
cxgb4: advertise support for FR_NSMR_TPTE_WR
IB/core: correctly handle rdma_rw_init_mrs() failure
IB/srp: Fix infinite loop when FMR sg[0].offset != 0
IB/srp: Remove an unused argument
IB/core: Improve ib_map_mr_sg() documentation
IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
IB/mthca: Move user vendor structures
IB/nes: Move user vendor structures
IB/ocrdma: Move user vendor structures
IB/mlx4: Move user vendor structures
IB/cxgb4: Move user vendor structures
IB/cxgb3: Move user vendor structures
IB/mlx5: Move and decouple user vendor structures
IB/{core,hw}: Add constant for node_desc
ipoib: Make ipoib_warn ratelimited
IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
IB/ipoib: Remove deprecated create_singlethread_workqueue
...

Linus Torvalds
2016-10-10 08:04:33 +0800

08 Oct, 2016

1 commit

81243eacf cred: simpler, 1D supplementary groups

The current supplementary groups code can massively overallocate memory and
is implemented in such a way that access to an individual gid is done via a
2D array.

If number of gids is
Cc: Vasily Kulikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2016-10-08 09:46:30 +0800

01 Oct, 2016

4 commits

66cbd4ba8 sunrpc: replace generic auth_cred hash with auth-specific function

Replace the generic code to hash the auth_cred with the call to
the auth-specific hash function in the rpc_authops struct.
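
A hedged sketch of the dispatch idea: an ops table with a per-flavor `hash_cred` callback. All `demo_` names are hypothetical and the mixing function is an arbitrary multiplicative hash chosen for the example, not the kernel's. Per the related commits in this series, the unix-style flavor hashes both uid and gid while the GSS-style flavor hashes only the uid.

```c
struct demo_auth_cred { unsigned int uid; unsigned int gid; };

/* Per-flavor operations; only the hash callback is modeled here. */
struct demo_authops {
    const char *name;
    unsigned int (*hash_cred)(const struct demo_auth_cred *acred,
                              unsigned int hashbits);
};

/* Arbitrary multiplicative mix; hashbits must be between 1 and 31. */
unsigned int demo_mix(unsigned int v, unsigned int hashbits)
{
    return (v * 2654435761u) >> (32 - hashbits);
}

/* Unix-style flavor: both uid and gid feed the hash. */
unsigned int demo_unix_hash_cred(const struct demo_auth_cred *acred,
                                 unsigned int hashbits)
{
    return demo_mix(acred->uid * 31u + acred->gid, hashbits);
}

/* GSS-style flavor: only the uid feeds the hash. */
unsigned int demo_gss_hash_cred(const struct demo_auth_cred *acred,
                                unsigned int hashbits)
{
    return demo_mix(acred->uid, hashbits);
}

const struct demo_authops demo_unix_ops = { "unix", demo_unix_hash_cred };
const struct demo_authops demo_gss_ops = { "gss", demo_gss_hash_cred };

/* Generic lookup code no longer hashes itself; it asks the flavor. */
unsigned int demo_lookup_bucket(const struct demo_authops *ops,
                                const struct demo_auth_cred *acred,
                                unsigned int hashbits)
{
    return ops->hash_cred(acred, hashbits);
}
```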

Signed-off-by: Frank Sorenson
Signed-off-by: Anna Schumaker

Frank Sorenson
2016-10-01 03:47:47 +0800
a960f8d6d sunrpc: add RPCSEC_GSS hash_cred() function

Add a hash_cred() function for RPCSEC_GSS, using only the
uid from the auth_cred.

Signed-off-by: Frank Sorenson
Signed-off-by: Anna Schumaker

Frank Sorenson
2016-10-01 03:47:13 +0800
1e035d065 sunrpc: add auth_unix hash_cred() function

Add a hash_cred() function for auth_unix, using both the
uid and gid from the auth_cred.

Signed-off-by: Frank Sorenson
Signed-off-by: Anna Schumaker

Frank Sorenson
2016-10-01 03:45:21 +0800
18028c967 sunrpc: add generic_auth hash_cred() function

Add a hash_cred() function for generic_auth, using both the
uid and gid from the auth_cred.

Signed-off-by: Frank Sorenson
Signed-off-by: Anna Schumaker

Frank Sorenson
2016-10-01 03:33:36 +0800

28 Sep, 2016

2 commits

078cd8279 fs: Replace CURRENT_TIME with current_time() for inode timestamps

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
Acked-by: Felipe Balbi
Acked-by: Steven Whitehouse
Acked-by: Ryusuke Konishi
Acked-by: David Sterba
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:21 +0800
77b00bc03 sunrpc: queue work on system_power_efficient_wq

sunrpc uses a workqueue to clean its cache regularly. There is no real
dependency on executing the work on the cpu that queued it.

On an idle system, especially a heterogeneous one like big.LITTLE,
it was observed that the idle big cpu was woken up many times just to service
this work, which goes against the principle of power saving. It would be
better to schedule it on a cpu which the scheduler believes to be the most
appropriate one.

After applying this patch, system_wq is replaced by
system_power_efficient_wq for sunrpc. This functionality is enabled when
CONFIG_WQ_POWER_EFFICIENT is selected.

Signed-off-by: Ke Wang
Signed-off-by: Anna Schumaker

Ke Wang
2016-09-28 02:35:36 +0800

24 Sep, 2016

1 commit

ed082d36a IB/core: add support to create a unsafe global rkey to ib_create_pd

Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or
less unchecked, this moves the capability of creating a global rkey into
the RDMA core, where it can be easily audited. It also prints a warning
every time this feature is used.

Signed-off-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Reviewed-by: Jason Gunthorpe
Reviewed-by: Steve Wise
Signed-off-by: Doug Ledford

Christoph Hellwig
2016-09-24 01:47:44 +0800

23 Sep, 2016

7 commits

25d55296d svcrdma: support Remote Invalidation

Support Remote Invalidation. A private message is exchanged with
the client upon RDMA transport connect that indicates whether
Send With Invalidation may be used by the server to send RPC
replies. The invalidate_rkey is arbitrarily chosen from among
rkeys present in the RPC-over-RDMA header's chunk lists.

Send With Invalidate improves performance only when clients can
recognize, while processing an RPC reply, that an rkey has already
been invalidated. That has been submitted as a separate change.

In the future, the RPC-over-RDMA protocol might support Remote
Invalidation properly. The protocol needs to enable signaling
between peers to indicate when Remote Invalidation can be used
for each individual RPC.

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-09-23 22:18:54 +0800
cc9d83408 svcrdma: Server-side support for rpcrdma_connect_private

Prepare to receive an RDMA-CM private message when handling a new
connection attempt, and send a similar message as part of connection
acceptance.

Both sides can communicate their various implementation limits.
Implementations that don't support this sideband protocol ignore it.

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-09-23 22:18:54 +0800
9995237bb svcrdma: Skip put_page() when send_reply() fails

Message from syslogd@klimt at Aug 18 17:00:37 ...
kernel:page:ffffea0020639b00 count:0 mapcount:0 mapping: (null) index:0x0
Aug 18 17:00:37 klimt kernel: flags: 0x2fffff80000000()
Aug 18 17:00:37 klimt kernel: page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)

Aug 18 17:00:37 klimt kernel: kernel BUG at /home/cel/src/linux/linux-2.6/include/linux/mm.h:445!
Aug 18 17:00:37 klimt kernel: RIP: 0010:[] svc_rdma_sendto+0x641/0x820 [rpcrdma]

send_reply() assigns its page argument as the first page of ctxt. On
error, send_reply() already invokes svc_rdma_put_context(ctxt, 1);
which does a put_page() on that very page. No need to do that again
as svc_rdma_sendto exits.
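
The double release described above can be modelled in userspace. This is a minimal sketch, not the kernel code: `fake_page`, `fake_put_page()`, `fake_send_reply_fails()` and `sendto_error_path()` are hypothetical names, and the reference count is a plain integer rather than a real `struct page`.

```c
#include <assert.h>

/* Hypothetical page holding only a reference count; not struct page. */
struct fake_page {
    int refcount;
};

/* Drop one reference. Returns -1 if the count is already zero,
 * which is the condition the VM_BUG_ON_PAGE above reports. */
static int fake_put_page(struct fake_page *p)
{
    if (p->refcount == 0)
        return -1;
    p->refcount--;
    return 0;
}

/* Models a send_reply() that fails: its own error path has already
 * released the page (standing in for svc_rdma_put_context(ctxt, 1)). */
static int fake_send_reply_fails(struct fake_page *p)
{
    fake_put_page(p);
    return -1;
}

/* caller_also_puts = 1 models the old exit path, 0 the fixed one.
 * Returns 1 if a put on an already-released page was attempted. */
static int sendto_error_path(struct fake_page *p, int caller_also_puts)
{
    int double_put = 0;

    if (fake_send_reply_fails(p) < 0 && caller_also_puts)
        double_put = (fake_put_page(p) < 0);
    return double_put;
}
```

With a page that starts at one reference, the old path attempts a second put and the fixed path does not.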

Fixes: 3e1eeb980822 ("svcrdma: Close connection when a send error occurs")
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-09-23 22:18:53 +0800
cace564f8 svcrdma: Tail iovec leaves an orphaned DMA mapping

The ctxt's count field is overloaded to mean the number of pages in
the ctxt->page array and the number of SGEs in the ctxt->sge array.
Typically these two numbers are the same.

However, when an inline RPC reply is constructed from an xdr_buf
with a tail iovec, the head and tail often occupy the same page,
but each is DMA mapped independently. In that case, ->count equals
the number of pages, but it does not equal the number of SGEs.
There's one more SGE, for the tail iovec. Hence there is one more
DMA mapping than there are pages in the ctxt->page array.
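
A small userspace model may make the mismatch concrete. It is only a sketch under stated assumptions: `send_ctxt`, `build_reply()`, `unmap_by_count()` and `leaked_mappings()` are hypothetical names, a "mapping" is just a flag in an array, and teardown is assumed to walk `count` entries.

```c
#include <assert.h>

#define MAX_SGES 8

/* Hypothetical stand-in for the send context; not the kernel struct. */
struct send_ctxt {
    int count;             /* overloaded: number of pages AND of SGEs */
    int mapped[MAX_SGES];  /* 1 while SGE i holds a DMA mapping */
};

/* Map one SGE per page, plus one extra SGE when a tail iovec shares
 * the head's page. Returns the number of SGEs mapped, or -1 if the
 * request does not fit in the model. */
static int build_reply(struct send_ctxt *c, int pages, int has_tail)
{
    int sges = pages + (has_tail ? 1 : 0);
    int i;

    if (pages < 0 || sges > MAX_SGES)
        return -1;
    for (i = 0; i < MAX_SGES; i++)
        c->mapped[i] = (i < sges);
    c->count = pages;      /* only the page count is recorded */
    return sges;
}

/* Assumed teardown: release exactly ->count mappings. */
static void unmap_by_count(struct send_ctxt *c)
{
    int i;

    for (i = 0; i < c->count; i++)
        c->mapped[i] = 0;
}

/* Count the mappings that survived teardown. */
static int leaked_mappings(const struct send_ctxt *c)
{
    int i, n = 0;

    for (i = 0; i < MAX_SGES; i++)
        n += c->mapped[i];
    return n;
}
```

In this model a one-page reply with a tail leaves one mapping behind after teardown, while a reply without a tail leaves none.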

This isn't a real problem until the server's iommu is enabled. Then
each RPC reply that has content in that iovec orphans a DMA mapping
that consists of real resources.

krb5i and krb5p always populate that tail iovec. After a couple
million sent krb5i/p RPC replies, the NFS server starts behaving
erratically. Reboot is needed to clear the problem.

Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up")
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-09-23 22:18:52 +0800
5690a22d8 xprtrdma: use complete() instead complete_all()

There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().

The usage pattern of the completion is:

    waiter context                  waker context

    frwr_op_unmap_sync()
      reinit_completion()
      ib_post_send()
      wait_for_completion()

                                    frwr_wc_localinv_wake()
                                      complete()
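
The difference between the two calls can be sketched with a simplified, single-threaded model. This is not the kernel's implementation: the `fake_` names are hypothetical, there is no locking or sleeping, and `fake_try_wait()` stands in for a waiter that only checks whether it may proceed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical completion: a counter of pending wake-ups. */
struct fake_completion {
    unsigned int done;
};

static void fake_reinit(struct fake_completion *x)
{
    x->done = 0;
}

/* Release exactly one waiter. */
static void fake_complete(struct fake_completion *x)
{
    if (x->done != UINT_MAX)
        x->done++;
}

/* Release every current and future waiter until reinitialised. */
static void fake_complete_all(struct fake_completion *x)
{
    x->done = UINT_MAX;
}

/* Non-blocking wait: returns 1 if the waiter may proceed. A single
 * completion is consumed; the "all" state is left in place. */
static int fake_try_wait(struct fake_completion *x)
{
    if (x->done == 0)
        return 0;
    if (x->done != UINT_MAX)
        x->done--;
    return 1;
}
```

With a single waiter per reinit, as in the pattern above, one `fake_complete()` is enough; `fake_complete_all()` would also let later waits through until the next reinit.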

Signed-off-by: Daniel Wagner
Cc: Anna Schumaker
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Anna Schumaker

Daniel Wagner
2016-09-23 21:48:24 +0800
a6cebd41b SUNRPC: Fix setting of buffer length in xdr_set_next_buffer()

Use xdr->nwords to tell us how much buffer remains.

Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2016-09-23 05:17:47 +0800
ace0e14f4 SUNRPC: Fix corruption of xdr->nwords in xdr_copy_to_scratch

When we copy the first part of the data, we need to ensure that the value
of xdr->nwords is updated as well. Do so by calling __xdr_inline_decode().

Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker

Trond Myklebust
2016-09-23 05:12:31 +0800

20 Sep, 2016

3 commits

d48f9ce73 sunrpc: fix write space race causing stalls

Write space becoming available may race with putting the task to sleep
in xprt_wait_for_buffer_space(). The existing mechanism to avoid the
race does not work.

This (edited) partial trace illustrates the problem:

[1] rpc_task_run_action: task:43546@5 ... action=call_transmit
[2] xs_write_space snd_task (== 43546), but
this has not yet been queued and the wake up is lost.

[4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
which queues task 43546.

[5] The call to sk->sk_write_space() at the end of xs_nospace() (which
is supposed to handle the above race) does not call
xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
thus the task is not woken.

Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace()
so the second call to sk->sk_write_space() calls xprt_write_space().
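
The lost wake-up and the fix can be replayed in a small userspace model. It is a sketch only: `fake_xprt`, `fake_write_space()`, `fake_nospace()` and `run_race()` are hypothetical names, the bit is a plain integer, and the model assumes that the write-space callback clears the bit when it runs and that space is already available when the task is queued.

```c
#include <assert.h>

/* Hypothetical transport state; not the kernel structures. */
struct fake_xprt {
    int nospace_bit;  /* stands in for SOCKWQ_ASYNC_NOSPACE */
    int task_queued;  /* 1 once the task sleeps on the wait queue */
    int task_woken;   /* 1 once the task has been woken */
};

/* Write-space callback: does nothing unless the bit is set; otherwise
 * it clears the bit and wakes the task if one is queued. */
static void fake_write_space(struct fake_xprt *x)
{
    if (!x->nospace_bit)
        return;
    x->nospace_bit = 0;
    if (x->task_queued) {
        x->task_queued = 0;
        x->task_woken = 1;
    }
}

/* Models xs_nospace(): queue the task, optionally set the bit again
 * (the fix), then make the trailing write-space call. */
static void fake_nospace(struct fake_xprt *x, int reset_bit)
{
    x->task_queued = 1;
    if (reset_bit)
        x->nospace_bit = 1;
    fake_write_space(x);
}

/* Replays the race: space arrives before the task is queued.
 * Returns 1 if the task ends up woken. */
static int run_race(int reset_bit)
{
    struct fake_xprt x = { 1, 0, 0 };

    fake_write_space(&x);        /* early wake-up finds no task */
    fake_nospace(&x, reset_bit); /* task queued afterwards */
    return x.task_woken;
}
```

Without the reset the task stays asleep in this model; with it, the second callback wakes the task.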

Suggested-by: Trond Myklebust
Signed-off-by: David Vrabel
cc: stable@vger.kernel.org # 4.4
Signed-off-by: Anna Schumaker

David Vrabel
2016-09-20 01:21:36 +0800
496b77a5c xprtrdma: Eliminate rpcrdma_receive_worker()

Clean up: the extra layer of indirection doesn't add value.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2016-09-20 01:08:38 +0800
1519e9697 xprtrdma: Rename rpcrdma_receive_wc()

Clean up: When converting xprtrdma to use the new CQ API, I missed a
spot. The naming convention elsewhere is:

{svc_rdma,rpcrdma}_wc_{operation}

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2016-09-20 01:08:38 +0800