Eric Lee / smarc-fsl-linux-kernel

09 Dec, 2019

1 commit

a78f7cddd Merge tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Nine cifs/smb3 fixes:

- one fix for stable (oops during oplock break)

- two timestamp fixes including important one for updating mtime at
close to avoid stale metadata caching issue on dirty files (also
improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the
wire)

- two fixes for "modefromsid" mount option for file create (now
allows mode bits to be set more atomically and accurately on create
by adding "sd_context" on create when modefromsid specified on
mount)

- two fixes for multichannel found in testing this week against
different servers

- two small cleanup patches"

* tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb3: improve check for when we send the security descriptor context on create
smb3: fix mode passed in on create for modetosid mount option
cifs: fix possible uninitialized access and race on iface_list
cifs: Fix lookup of SMB connections on multichannel
smb3: query attributes on file close
smb3: remove unused flag passed into close functions
cifs: remove redundant assignment to pointer pneg_ctxt
fs: cifs: Fix atime update check vs mtime
CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks

Linus Torvalds
2019-12-09 04:12:18 +0800

08 Dec, 2019

1 commit

231e2a0ba smb3: improve check for when we send the security descriptor context on create

We had cases in the previous patch where we were sending the security
descriptor context on SMB3 open (file create) even when we hadn't
mounted with the "modefromsid" mount option.

Add a check for that mount flag before calling add_sd_context in
open init.

Signed-off-by: Steve French
Reviewed-by: Pavel Shilovsky

Steve French
2019-12-08 07:38:22 +0800

07 Dec, 2019

2 commits

fdef665ba smb3: fix mode passed in on create for modetosid mount option

When using the special SID to store the mode bits in an ACE (see
http://technet.microsoft.com/en-us/library/hh509017(v=ws.10).aspx),
which is enabled with the mount parameter "modefromsid", we were not
passing in the mode via SMB3 create (although chmod was enabled).
SMB3 create allows a security descriptor context to be passed
in (which is more atomic and thus preferable to setting the mode
bits after create via a setinfo).

This patch enables setting the mode bits on create when using
modefromsid mount option. In addition it fixes an endian
error in the definition of the Control field flags in the SMB3
security descriptor. It also makes the ACE type of the special
SID better match the documentation (and behavior of servers
which use this to store mode bits in SMB3 ACLs).

Signed-off-by: Steve French
Acked-by: Ronnie Sahlberg
Reviewed-by: Pavel Shilovsky

Steve French
2019-12-07 04:15:52 +0800
0aecba617 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
"Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
Basically, the problem is that negative pinned dentries require
careful treatment - unless ->d_lock is locked or parent is held at
least shared, another thread can make them positive right under us.

Most of the uses turned out to be safe - the main surprises as far as
filesystems are concerned were

- race in dget_parent() fastpath, that might end up with the caller
observing the returned dentry _negative_, due to insufficient
barriers. It is positive in memory, but we could end up seeing the
wrong value of ->d_inode in CPU cache. Fixed.

- manual checks that result of lookup_one_len_unlocked() is positive
(and rejection of negatives). Again, insufficient barriers (we
might end up with inconsistent observed values of ->d_inode and
->d_flags). Fixed by switching to a new primitive that does the
checks itself and returns ERR_PTR(-ENOENT) instead of a negative
dentry. That way we get rid of boilerplate converting negatives
into ERR_PTR(-ENOENT) in the callers and have a single place to
deal with the barrier-related mess - inside fs/namei.c rather than
in every caller out there.

The guts of pathname resolution *do* need to be careful - the race
found by Ritesh is real, as well as several similar races.
Fortunately, it turns out that we can take care of that with fairly
local changes in there.

The tree-wide audit had not been fun, and I hate the idea of repeating
it. I think the right approach would be to annotate the places where
we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
catch regressions. But I'm still not sure what would be the least
invasive way of doing that and it's clearly the next cycle fodder"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/namei.c: fix missing barriers when checking positivity
fix dget_parent() fastpath race
new helper: lookup_positive_unlocked()
fs/namei.c: pull positivity check into follow_managed()

Linus Torvalds
2019-12-07 01:06:58 +0800
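
The idea of returning ERR_PTR(-ENOENT) instead of a negative dentry can be illustrated in userspace. The following is only a sketch: the toy_* helpers are hypothetical stand-ins written for this example (they mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention), and the sketch deliberately ignores the memory-barrier side of the fix, which lives in fs/namei.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (assumption). */
#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical dentry: "negative" means no inode is attached. */
struct toy_dentry {
    const char *name;
    void *inode;
};

/* Raw lookup: may hand back a negative dentry, like the old callers saw. */
static struct toy_dentry *toy_lookup(struct toy_dentry *table, size_t n,
                                     const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return toy_err_ptr(-ENOENT);
}

/* Positive-only lookup: the negativity check happens in one place. */
static struct toy_dentry *toy_lookup_positive(struct toy_dentry *table,
                                              size_t n, const char *name)
{
    struct toy_dentry *d = toy_lookup(table, n, name);

    if (!toy_is_err(d) && d->inode == NULL)
        return toy_err_ptr(-ENOENT);
    return d;
}
```

With this shape a caller only needs one toy_is_err() test, rather than an error test followed by its own "is it negative?" check.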

05 Dec, 2019

2 commits

9a7d5a9e6 cifs: fix possible uninitialized access and race on iface_list

iface[0] was accessed regardless of the count value and without
locking.

* check count before accessing any ifaces
* make copy of iface list (it's a simple POD array) and use it without
locking.

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French
Reviewed-by: Paulo Alcantara (SUSE)

Aurelien Aptel
2019-12-05 01:51:18 +0800
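
The two bullet points of the iface_list fix (check the count first, then work on a private copy) can be sketched in userspace C. This is a minimal illustration under assumptions: struct toy_iface, struct toy_session and both functions are hypothetical names invented here, and a pthread mutex stands in for whatever lock protects the real list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface record: plain data, so a memcpy is a full copy. */
struct toy_iface {
    unsigned int speed;
    int rss_capable;
};

struct toy_session {
    pthread_mutex_t iface_lock;
    struct toy_iface *iface_list;
    size_t iface_count;
};

/*
 * Snapshot the list while holding the lock. Returns a malloc'ed copy that
 * the caller frees, with its length in *count. Returns NULL (and *count = 0)
 * when the list is empty or the allocation fails.
 */
static struct toy_iface *toy_snapshot_ifaces(struct toy_session *ses,
                                             size_t *count)
{
    struct toy_iface *copy = NULL;

    assert(ses != NULL && count != NULL);
    *count = 0;

    pthread_mutex_lock(&ses->iface_lock);
    if (ses->iface_count > 0) {
        copy = malloc(ses->iface_count * sizeof(*copy));
        if (copy) {
            memcpy(copy, ses->iface_list, ses->iface_count * sizeof(*copy));
            *count = ses->iface_count;
        }
    }
    pthread_mutex_unlock(&ses->iface_lock);
    return copy;
}

/* Index of the fastest interface in the snapshot, or -1 if there is none. */
static long toy_fastest_iface(struct toy_session *ses)
{
    size_t n, best = 0;
    struct toy_iface *ifaces = toy_snapshot_ifaces(ses, &n);

    if (n == 0) {               /* check the count before touching ifaces[0] */
        free(ifaces);
        return -1;
    }
    for (size_t i = 1; i < n; i++)
        if (ifaces[i].speed > ifaces[best].speed)
            best = i;
    free(ifaces);
    return (long)best;
}
```

The lock is held only for the duration of the copy; everything after that runs on the private snapshot.
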
3345bb44b cifs: Fix lookup of SMB connections on multichannel

With the addition of SMB session channels, we introduced new TCP
server pointers that have no sessions or tcons associated with them.

In this case, when we started looking for TCP connections, we might
end up picking a session channel rather than the master connection,
hence failing to get either a session or a tcon.

In order to fix that, this patch introduces a new "is_channel" field
to the TCP_Server_Info structure so we can skip session channels during
lookup of connections.

Signed-off-by: Paulo Alcantara (SUSE)
Reviewed-by: Aurelien Aptel
Signed-off-by: Steve French

Paulo Alcantara (SUSE)
2019-12-05 01:50:32 +0800
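
The effect of the "is_channel" field on connection lookup can be shown with a small sketch. Only the field name comes from the commit message; struct toy_server, its other members and toy_find_server() are hypothetical and far simpler than the real lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for a TCP server entry. */
struct toy_server {
    const char *hostname;
    bool is_channel;    /* true for extra session channels */
    int nr_sessions;    /* channels carry no sessions of their own */
};

/* Return the master connection for hostname, never a session channel. */
static struct toy_server *toy_find_server(struct toy_server *list, size_t n,
                                          const char *hostname)
{
    assert(hostname != NULL);
    for (size_t i = 0; i < n; i++) {
        if (list[i].is_channel)
            continue;           /* skip channels during lookup */
        if (strcmp(list[i].hostname, hostname) == 0)
            return &list[i];
    }
    return NULL;
}
```

Without the `continue`, a channel entry that happens to sit earlier in the list would be returned first, and the caller would find no session or tcon on it.
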

04 Dec, 2019

1 commit

43f8a6a74 smb3: query attributes on file close

Since timestamps on files on most servers can be updated at
close, and since cached timestamps on our dentries are valid for
one second by default, we can have stale timestamps in some common
cases (e.g. open, write, close, stat, wait one second, stat will
show a different mtime for the first and second stat).

The SMB2/SMB3 protocol allows querying timestamps at close,
so add the code to request timestamp and attr information
(which is cheap for the server to provide) to be returned
when a file is closed. It is not needed for the many
paths that call SMB2_close from compounded query infos
and close, nor is it needed for some of the cases where a
directory close immediately follows a directory open.

Signed-off-by: Steve French
Acked-by: Ronnie Sahlberg
Reviewed-by: Aurelien Aptel
Reviewed-by: Pavel Shilovsky

Steve French
2019-12-04 05:48:02 +0800
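
The "open, write, close, stat, wait, stat" sequence from the commit message above can be reproduced with a short userspace probe. This is a sketch, not a test from the patch: the function name and its return convention are made up for illustration, and on a local filesystem the two mtimes are expected to match.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create path, write one byte, close it, then stat it twice with a pause
 * in between. Returns 1 if the two mtimes differ, 0 if they match and
 * -1 on error. The probe file is removed before returning.
 */
static int toy_mtime_changes_after_close(const char *path,
                                         unsigned int wait_seconds)
{
    struct stat first, second;
    int fd;

    assert(path != NULL);

    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "x", 1) != 1) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(path);
        return -1;
    }

    if (stat(path, &first) != 0) {
        unlink(path);
        return -1;
    }
    sleep(wait_seconds);        /* let any cached attributes expire */
    if (stat(path, &second) != 0) {
        unlink(path);
        return -1;
    }
    unlink(path);

    return first.st_mtim.tv_sec != second.st_mtim.tv_sec ||
           first.st_mtim.tv_nsec != second.st_mtim.tv_nsec;
}
```

Pointing the probe at a path on an SMB3 mount, with a wait of a second or more, is how the stale-mtime case described above would show up as a return value of 1.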

03 Dec, 2019

5 commits

9e8fae259 smb3: remove unused flag passed into close functions

close was relayered to allow passing in an async flag which
is no longer needed in this path. Remove the unneeded parameter
"flags" passed in on close.

Signed-off-by: Steve French
Reviewed-by: Pavel Shilovsky
Reviewed-by: Ronnie Sahlberg

Steve French
2019-12-03 08:07:17 +0800
a9f76cf82 cifs: remove redundant assignment to pointer pneg_ctxt

The pointer pneg_ctxt is being initialized with a value that is never
read and it is being updated later with a new value. The assignment
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King
Signed-off-by: Steve French

Colin Ian King
2019-12-03 06:55:08 +0800
69738cfdf fs: cifs: Fix atime update check vs mtime

According to the comment in the code and commit log, some apps
expect atime >= mtime; but the introduced code results in
atime==mtime. Fix the comparison to guard against atime < mtime.
Cc: stfrench@microsoft.com
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French

Deepa Dinamani
2019-12-03 05:15:35 +0800
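
The intended comparison can be sketched in userspace, assuming the goal is that atime is raised to mtime only when it is strictly older. The helper names are hypothetical and struct timespec stands in for the kernel's timestamp type.

```c
#include <assert.h>
#include <time.h>

/* Three-way compare: <0 if a is older than b, 0 if equal, >0 if newer. */
static int toy_timespec_compare(const struct timespec *a,
                                const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/*
 * Pick the atime to store: keep the server's atime unless it is older
 * than mtime. Testing the compare result for "non-zero" instead of "< 0"
 * would overwrite a newer atime as well, giving atime == mtime.
 */
static struct timespec toy_effective_atime(struct timespec atime,
                                           struct timespec mtime)
{
    assert(atime.tv_nsec >= 0 && mtime.tv_nsec >= 0);
    if (toy_timespec_compare(&atime, &mtime) < 0)
        return mtime;
    return atime;
}
```
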
6f582b273 CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks

Currently when the client creates a cifsFileInfo structure for
a newly opened file, it allocates a list of byte-range locks
with a pointer to the new cfile and attaches this list to the
inode's lock list. The latter happens before initializing all
other fields, e.g. cfile->tlink. Thus a partially initialized
cifsFileInfo structure becomes available to other threads that
walk through the inode's lock list. One example of such a thread
may be an oplock break worker thread that tries to push all
cached byte-range locks. This causes a NULL-pointer dereference
in smb2_push_mandatory_locks() when accessing cfile->tlink:

[598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038
...
[598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs]
[598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs]
...
[598428.945834] Call Trace:
[598428.945870] ? cifs_revalidate_mapping+0x45/0x90 [cifs]
[598428.945901] cifs_oplock_break+0x13d/0x450 [cifs]
[598428.945909] process_one_work+0x1db/0x380
[598428.945914] worker_thread+0x4d/0x400
[598428.945921] kthread+0x104/0x140
[598428.945925] ? process_one_work+0x380/0x380
[598428.945931] ? kthread_park+0x80/0x80
[598428.945937] ret_from_fork+0x35/0x40

Fix this by reordering initialization steps of the cifsFileInfo
structure: initialize all the fields first and then add the new
byte-range lock list to the inode's lock list.

Cc: Stable
Signed-off-by: Pavel Shilovsky
Reviewed-by: Aurelien Aptel
Signed-off-by: Steve French

Pavel Shilovsky
2019-12-03 05:15:00 +0800
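
The "initialize everything first, publish last" ordering behind this fix is a general pattern and can be sketched in userspace. All names below are hypothetical; only the tlink field name echoes the commit message, and a pthread mutex stands in for the lock protecting the inode's list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical open-file record; tlink must be valid once it is visible. */
struct toy_file {
    void *tlink;
    struct toy_file *next;      /* linkage in the inode's list */
};

struct toy_inode {
    pthread_mutex_t lock;
    struct toy_file *files;     /* list that other threads walk */
};

static struct toy_file *toy_new_file(struct toy_inode *inode, void *tlink)
{
    struct toy_file *f;

    assert(inode != NULL && tlink != NULL);
    f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;

    /* Step 1: fill in every field while the object is still private. */
    f->tlink = tlink;

    /* Step 2: only now make it reachable from the shared list. */
    pthread_mutex_lock(&inode->lock);
    f->next = inode->files;
    inode->files = f;
    pthread_mutex_unlock(&inode->lock);
    return f;
}

/* A list walker (think of the oplock break worker) never sees a NULL tlink. */
static int toy_all_files_initialized(struct toy_inode *inode)
{
    int ok = 1;

    pthread_mutex_lock(&inode->lock);
    for (struct toy_file *f = inode->files; f; f = f->next)
        if (f->tlink == NULL)
            ok = 0;
    pthread_mutex_unlock(&inode->lock);
    return ok;
}
```

Swapping steps 1 and 2 recreates the bug: a walker that takes the lock between them finds an entry whose tlink is still NULL.
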
937d6eefc Merge tag 'docs-5.5a' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:
"Here are the main documentation changes for 5.5:

- Various kerneldoc script enhancements.

- More RST conversions; those are slowing down as we run out of
things to convert, but we're a ways from done still.

- Dan's "maintainer profile entry" work landed at last. Now we just
need to get maintainers to fill in the profiles...

- A reworking of the parallel build setup to work better with a
variety of systems (and to not take over huge systems entirely in
particular).

- The MAINTAINERS file is now converted to RST during the build.
Hopefully nobody ever tries to print this thing, or they will need
to load a lot of paper.

- A script and documentation making it easy for maintainers to add
Link: tags at commit time.

Also included is the removal of a bunch of spurious CR characters"

* tag 'docs-5.5a' of git://git.lwn.net/linux: (91 commits)
docs: remove a bunch of stray CRs
docs: fix up the maintainer profile document
libnvdimm, MAINTAINERS: Maintainer Entry Profile
Maintainer Handbook: Maintainer Entry Profile
MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile
docs, parallelism: Rearrange how jobserver reservations are made
docs, parallelism: Do not leak blocking mode to other readers
docs, parallelism: Fix failure path and add comment
Documentation: Remove bootmem_debug from kernel-parameters.txt
Documentation: security: core.rst: fix warnings
Documentation/process/howto/kokr: Update for 4.x -> 5.x versioning
Documentation/translation: Use Korean for Korean translation title
docs/memory-barriers.txt: Remove remaining references to mmiowb()
docs/memory-barriers.txt/kokr: Update I/O section to be clearer about CPU vs thread
docs/memory-barriers.txt/kokr: Fix style, spacing and grammar in I/O section
Documentation/kokr: Kill all references to mmiowb()
docs/memory-barriers.txt/kokr: Rewrite "KERNEL I/O BARRIER EFFECTS" section
docs: Add initial documentation for devfreq
Documentation: Document how to get links with git am
docs: Add request_irq() documentation
...

Linus Torvalds
2019-12-03 03:51:02 +0800

28 Nov, 2019

1 commit

68464b88c CIFS: fix a white space issue in cifs_get_inode_info()

We accidentally messed up the indenting on this if statement.

Fixes: 16c696a6c300 ("CIFS: refactor cifs_get_inode_info()")
Signed-off-by: Dan Carpenter
Reviewed-by: Aurelien Aptel
Signed-off-by: Steve French

Dan Carpenter via samba-technical
2019-11-28 01:31:49 +0800

26 Nov, 2019

1 commit

1656a07a8 cifs: update internal module version number

To 2.24

Signed-off-by: Steve French

Steve French
2019-11-26 00:00:02 +0800

25 Nov, 2019

26 commits

ff6b6f3f9 cifs: Always update signing key of first channel

Update the signing key of the first channel whenever generating the master
signing/encryption/decryption keys rather than only in cifs_mount().

This also fixes reconnect when re-establishing smb sessions to other
servers.

Signed-off-by: Paulo Alcantara (SUSE)
Reviewed-by: Aurelien Aptel
Signed-off-by: Steve French

Paulo Alcantara (SUSE)
2019-11-25 23:59:28 +0800
5bb30a4dd cifs: Fix retrieval of DFS referrals in cifs_mount()

Make sure that DFS referrals are sent to newly resolved root targets
as in a multi tier DFS setup.

Signed-off-by: Paulo Alcantara (SUSE)
Link: https://lkml.kernel.org/r/05aa2995-e85e-0ff4-d003-5bb08bd17a22@canonical.com
Cc: stable@vger.kernel.org
Tested-by: Matthew Ruffell
Signed-off-by: Steve French

Paulo Alcantara (SUSE)
2019-11-25 23:36:49 +0800
84a1f5b1c cifs: Fix potential softlockups while refreshing DFS cache

We used to skip reconnects on all SMB2_IOCTL commands due to SMB3+
FSCTL_VALIDATE_NEGOTIATE_INFO - which made sense since we're still
establishing a SMB session.

However, when refresh_cache_worker() calls smb2_get_dfs_refer() and
we're under reconnect, SMB2_ioctl() will not be able to get a proper
status error (e.g. -EHOSTDOWN in case we failed to reconnect) but an
-EAGAIN from cifs_send_recv() thus looping forever in
refresh_cache_worker().

Fixes: e99c63e4d86d ("SMB3: Fix deadlock in validate negotiate hits reconnect")
Signed-off-by: Paulo Alcantara (SUSE)
Suggested-by: Aurelien Aptel
Reviewed-by: Aurelien Aptel
Signed-off-by: Steve French

Paulo Alcantara (SUSE)
2019-11-25 23:33:04 +0800
df3df923b cifs: Fix lookup of root ses in DFS referral cache

We don't care about module aliasing validation in
cifs_compose_mount_options(..., is_smb3) when finding the root SMB
session of a DFS namespace in order to refresh the DFS referral cache.

The following issue has been observed when mounting with '-t smb3' and
then specifying 'vers=2.0':

...
Nov 08 15:27:08 tw kernel: address conversion returned 0 for FS0.WIN.LOCAL
Nov 08 15:27:08 tw kernel: [kworke] ==> dns_query((null),FS0.WIN.LOCAL,13,(null))
Nov 08 15:27:08 tw kernel: [kworke] call request_key(,FS0.WIN.LOCAL,)
Nov 08 15:27:08 tw kernel: [kworke] ==> dns_resolver_cmp(FS0.WIN.LOCAL,FS0.WIN.LOCAL)
Nov 08 15:27:08 tw kernel: [kworke] Nov 08 15:27:08 tw kernel: CIFS VFS: vers=2.0 not permitted when mounting with smb3
Nov 08 15:27:08 tw kernel: fs/cifs/dfs_cache.c: CIFS VFS: leaving refresh_tcon (xid = 26) rc = -22
...

Fixes: 5072010ccf05 ("cifs: Fix DFS cache refresher for DFS links")
Signed-off-by: Paulo Alcantara (SUSE)
Reviewed-by: Aurelien Aptel
Signed-off-by: Steve French

Paulo Alcantara (SUSE)
2019-11-25 23:25:32 +0800
8354d88ef cifs: Fix use-after-free bug in cifs_reconnect()

Ensure we grab an active reference to the cifs superblock while doing
failover, to prevent automounts (DFS links) from expiring and then
destroying the superblock pointer.

This patch fixes the following KASAN report:

[ 464.301462] BUG: KASAN: use-after-free in
cifs_reconnect+0x6ab/0x1350
[ 464.303052] Read of size 8 at addr ffff888155e580d0 by task
cifsd/1107

[ 464.304682] CPU: 3 PID: 1107 Comm: cifsd Not tainted 5.4.0-rc4+ #13
[ 464.305552] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.12.1-0-ga5cab58-rebuilt.opensuse.org 04/01/2014
[ 464.307146] Call Trace:
[ 464.307875] dump_stack+0x5b/0x90
[ 464.308631] print_address_description.constprop.0+0x16/0x200
[ 464.309478] ? cifs_reconnect+0x6ab/0x1350
[ 464.310253] ? cifs_reconnect+0x6ab/0x1350
[ 464.311040] __kasan_report.cold+0x1a/0x41
[ 464.311811] ? cifs_reconnect+0x6ab/0x1350
[ 464.312563] kasan_report+0xe/0x20
[ 464.313300] cifs_reconnect+0x6ab/0x1350
[ 464.314062] ? extract_hostname.part.0+0x90/0x90
[ 464.314829] ? printk+0xad/0xde
[ 464.315525] ? _raw_spin_lock+0x7c/0xd0
[ 464.316252] ? _raw_read_lock_irq+0x40/0x40
[ 464.316961] ? ___ratelimit+0xed/0x182
[ 464.317655] cifs_readv_from_socket+0x289/0x3b0
[ 464.318386] cifs_read_from_socket+0x98/0xd0
[ 464.319078] ? cifs_readv_from_socket+0x3b0/0x3b0
[ 464.319782] ? try_to_wake_up+0x43c/0xa90
[ 464.320463] ? cifs_small_buf_get+0x4b/0x60
[ 464.321173] ? allocate_buffers+0x98/0x1a0
[ 464.321856] cifs_demultiplex_thread+0x218/0x14a0
[ 464.322558] ? cifs_handle_standard+0x270/0x270
[ 464.323237] ? __switch_to_asm+0x40/0x70
[ 464.323893] ? __switch_to_asm+0x34/0x70
[ 464.324554] ? __switch_to_asm+0x40/0x70
[ 464.325226] ? __switch_to_asm+0x40/0x70
[ 464.325863] ? __switch_to_asm+0x34/0x70
[ 464.326505] ? __switch_to_asm+0x40/0x70
[ 464.327161] ? __switch_to_asm+0x34/0x70
[ 464.327784] ? finish_task_switch+0xa1/0x330
[ 464.328414] ? __switch_to+0x363/0x640
[ 464.329044] ? __schedule+0x575/0xaf0
[ 464.329655] ? _raw_spin_lock_irqsave+0x82/0xe0
[ 464.330301] kthread+0x1a3/0x1f0
[ 464.330884] ? cifs_handle_standard+0x270/0x270
[ 464.331624] ? kthread_create_on_node+0xd0/0xd0
[ 464.332347] ret_from_fork+0x35/0x40

[ 464.333577] Allocated by task 1110:
[ 464.334381] save_stack+0x1b/0x80
[ 464.335123] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 464.335848] cifs_smb3_do_mount+0xd4/0xb00
[ 464.336619] legacy_get_tree+0x6b/0xa0
[ 464.337235] vfs_get_tree+0x41/0x110
[ 464.337975] fc_mount+0xa/0x40
[ 464.338557] vfs_kern_mount.part.0+0x6c/0x80
[ 464.339227] cifs_dfs_d_automount+0x336/0xd29
[ 464.339846] follow_managed+0x1b1/0x450
[ 464.340449] lookup_fast+0x231/0x4a0
[ 464.341039] path_openat+0x240/0x1fd0
[ 464.341634] do_filp_open+0x126/0x1c0
[ 464.342277] do_sys_open+0x1eb/0x2c0
[ 464.342957] do_syscall_64+0x5e/0x190
[ 464.343555] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 464.344772] Freed by task 0:
[ 464.345347] save_stack+0x1b/0x80
[ 464.345966] __kasan_slab_free+0x12c/0x170
[ 464.346576] kfree+0xa6/0x270
[ 464.347211] rcu_core+0x39c/0xc80
[ 464.347800] __do_softirq+0x10d/0x3da

[ 464.348919] The buggy address belongs to the object at
ffff888155e58000
which belongs to the cache kmalloc-256 of size 256
[ 464.350222] The buggy address is located 208 bytes inside of
256-byte region [ffff888155e58000, ffff888155e58100)
[ 464.351575] The buggy address belongs to the page:
[ 464.352333] page:ffffea0005579600 refcount:1 mapcount:0
mapping:ffff88815a803400 index:0x0 compound_mapcount: 0
[ 464.353583] flags: 0x200000000010200(slab|head)
[ 464.354209] raw: 0200000000010200 ffffea0005576200 0000000400000004
ffff88815a803400
[ 464.355353] raw: 0000000000000000 0000000080100010 00000001ffffffff
0000000000000000
[ 464.356458] page dumped because: kasan: bad access detected

[ 464.367005] Memory state around the buggy address:
[ 464.367787] ffff888155e57f80: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 464.368877] ffff888155e58000: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 464.369967] >ffff888155e58080: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 464.371111] ^
[ 464.371775] ffff888155e58100: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 464.372893] ffff888155e58180: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 464.373983] ==================================================================

Signed-off-by: Paulo Alcantara (SUSE)
Reviewed-by: Aurelien Aptel
Signed-off-by: Steve French

Paulo Alcantara (SUSE)
2019-11-25 23:23:10 +0800
85150929a cifs: dump channel info in DebugData

* show server&TCP states for extra channels
* mention if an interface has a channel connected to it

In this version three of the patch, fixed minor printk format
issue pointed out by the kbuild robot.
Reported-by: kbuild test robot

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French

Aurelien Aptel
2019-11-25 15:17:12 +0800
1ae9a5a55 smb3: dump in_send and num_waiters stats counters by default

The number of requests in_send and the number of waiters on sendRecv
are useful counters in various cases, so move them out of
CONFIG_CIFS_STATS2 to be on by default, especially with multichannel.

Signed-off-by: Steve French
Acked-by: Ronnie Sahlberg

Steve French
2019-11-25 15:17:12 +0800
65a37a341 cifs: try harder to open new channels

Previously we would only loop over the iface list once.
This patch loops over it multiple times until all channels are
opened. It will also try to reuse RSS ifaces.

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French

Aurelien Aptel
2019-11-25 15:17:12 +0800
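
The "keep looping until all channels are opened" idea can be sketched as follows. This is an illustration under assumptions, not the patch's code: the struct, both functions, and in particular the rule that only an RSS-capable interface may carry more than one channel are hypothetical simplifications of "reuse RSS ifaces".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interface state. */
struct toy_iface {
    bool rss_capable;
    unsigned int channels;      /* channels currently bound to this iface */
};

/* Assumed rule: a non-RSS interface accepts a single channel only. */
static bool toy_try_open(struct toy_iface *iface)
{
    if (iface->channels > 0 && !iface->rss_capable)
        return false;
    iface->channels++;
    return true;
}

/*
 * Walk the interface list repeatedly until `wanted` channels are open.
 * Stops early if a full pass opens nothing, so it cannot spin forever.
 * Returns the number of channels opened.
 */
static size_t toy_open_channels(struct toy_iface *ifaces, size_t n,
                                size_t wanted)
{
    size_t opened = 0;

    assert(ifaces != NULL || n == 0);
    while (opened < wanted) {
        size_t before = opened;

        for (size_t i = 0; i < n && opened < wanted; i++)
            if (toy_try_open(&ifaces[i]))
                opened++;
        if (opened == before)   /* no progress in this pass */
            break;
    }
    return opened;
}
```

A single pass over one plain and one RSS-capable interface would stop at two channels; the repeated passes let the RSS-capable one absorb the rest.
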
9bd454083 CIFS: Properly process SMB3 lease breaks

Currently we don't assume that a server may break a lease
from RWH to RW, which causes us to set a wrong lease state
on a file and thus mistakenly flush data and byte-range
locks and purge cached data on the client. This leads to
performance degradation because subsequent IOs go directly
to the server.

Fix this by propagating new lease state and epoch values
to the oplock break handler through the cifsFileInfo structure
and removing the use of cifsInodeInfo flags for that. This
avoids some races between several lease/oplock breaks
using those flags in parallel.

Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French

Pavel Shilovsky
2019-11-25 15:17:12 +0800
32546a958 cifs: move cifsFileInfo_put logic into a work-queue

This patch moves the final part of the cifsFileInfo_put() logic where we
need a write lock on lock_sem to be processed in a separate thread that
holds no other locks.
This is to prevent deadlocks like the one below:

> there are 6 processes looping to while trying to down_write
> cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
> cifs_new_fileinfo
>
> and there are 5 other processes which are blocked, several of them
> waiting on either PG_writeback or PG_locked (which are both set), all
> for the same page of the file
>
> 2 inode_lock() (inode->i_rwsem) for the file
> 1 wait_on_page_writeback() for the page
> 1 down_read(inode->i_rwsem) for the inode of the directory
> 1 inode_lock()(inode->i_rwsem) for the inode of the directory
> 1 __lock_page
>
>
> so processes are blocked waiting on:
> page flags PG_locked and PG_writeback for one specific page
> inode->i_rwsem for the directory
> inode->i_rwsem for the file
> cifsInodeInflock_sem
>
>
>
> here are the more gory details (let me know if I need to provide
> anything more/better):
>
> [0 00:48:22.765] [UN] PID: 8863 TASK: ffff8c691547c5c0 CPU: 3
> COMMAND: "reopen_file"
> #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
> #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
> #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
> #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
> #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
> #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
> #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
> #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
> * (I think legitimize_path is bogus)
>
> in path_openat
> } else {
> const char *s = path_init(nd, flags);
> while (!(error = link_path_walk(s, nd)) &&
> (error = do_last(nd, file, op)) > 0) { <<<<
>
> do_last:
> if (open_flag & O_CREAT)
> inode_lock(dir->d_inode); <<<<
> else
> so it's trying to take inode->i_rwsem for the directory
>
> DENTRY INODE SUPERBLK TYPE PATH
> ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR /mnt/vm1_smb/
> inode.i_rwsem is ffff8c691158efc0
>
> :
> owner: (UN - 8856 -
> reopen_file), counter: 0x0000000000000003
> waitlist: 2
> 0xffff9965007e3c90 8863 reopen_file UN 0 1:29:22.926
> RWSEM_WAITING_FOR_WRITE
> 0xffff996500393e00 9802 ls UN 0 1:17:26.700
> RWSEM_WAITING_FOR_READ
>
>
> the owner of the inode.i_rwsem of the directory is:
>
> [0 00:00:00.109] [UN] PID: 8856 TASK: ffff8c6914275d00 CPU: 3
> COMMAND: "reopen_file"
> #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
> #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
> #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
> #3 [ffff99650065b940] msleep at ffffffff9af573a9
> #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
> #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
> #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
> #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
> #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
> #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
> #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
> #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
> #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
> #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
> #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
>
> cifs_launder_page is for page 0xffffd1e2c07d2480
>
> crash> page.index,mapping,flags 0xffffd1e2c07d2480
> index = 0x8
> mapping = 0xffff8c68f3cd0db0
> flags = 0xfffffc0008095
>
> PAGE-FLAG BIT VALUE
> PG_locked 0 0000001
> PG_uptodate 2 0000004
> PG_lru 4 0000010
> PG_waiters 7 0000080
> PG_writeback 15 0008000
>
>
> inode is ffff8c68f3cd0c40
> inode.i_rwsem is ffff8c68f3cd0ce0
> DENTRY INODE SUPERBLK TYPE PATH
> ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
> /mnt/vm1_smb/testfile.8853
>
>
> this process holds the inode->i_rwsem for the parent directory, is
> laundering a page attached to the inode of the file it's opening, and in
> _cifsFileInfo_put is trying to down_write the cifsInodeInflock_sem
> for the file itself.
>
>
> :
> owner: (UN - 8854 -
> reopen_file), counter: 0x0000000000000003
> waitlist: 1
> 0xffff9965005dfd80 8855 reopen_file UN 0 1:29:22.912
> RWSEM_WAITING_FOR_WRITE
>
> this is the inode.i_rwsem for the file
>
> the owner:
>
> [0 00:48:22.739] [UN] PID: 8854 TASK: ffff8c6914272e80 CPU: 2
> COMMAND: "reopen_file"
> #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
> #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
> #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
> #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
> #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
> #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
> #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
> #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
> #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
> #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
> #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
> #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
> #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
> #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
>
> the process holds the inode->i_rwsem for the file to which it's writing,
> and is trying to __lock_page for the same page as in the other processes
>
>
> the other tasks:
> [0 00:00:00.028] [UN] PID: 8859 TASK: ffff8c6915479740 CPU: 2
> COMMAND: "reopen_file"
> #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
> #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
> #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
> #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
> #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
> #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
> #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
> #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
> #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
> #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
> #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
>
> this is opening the file, and is trying to down_write cinode->lock_sem
>
>
> [0 00:00:00.041] [UN] PID: 8860 TASK: ffff8c691547ae80 CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.057] [UN] PID: 8861 TASK: ffff8c6915478000 CPU: 3
> COMMAND: "reopen_file"
> [0 00:00:00.059] [UN] PID: 8858 TASK: ffff8c6914271740 CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.109] [UN] PID: 8862 TASK: ffff8c691547dd00 CPU: 6
> COMMAND: "reopen_file"
> #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
> #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
> #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
> #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
> #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
> #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
> #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
> #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
> #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
> #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
>
> closing the file, and trying to down_write cifsi->lock_sem
>
>
> [0 00:48:22.839] [UN] PID: 8857 TASK: ffff8c6914270000 CPU: 7
> COMMAND: "reopen_file"
> #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
> #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
> #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
> #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
> #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
> #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
> #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
> #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
> #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
> #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
>
> in __filemap_fdatawait_range
> wait_on_page_writeback(page);
> for the same page of the file
>
>
>
> [0 00:48:22.718] [UN] PID: 8855 TASK: ffff8c69142745c0 CPU: 7
> COMMAND: "reopen_file"
> #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
> #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
> #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
> #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
> #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
> #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
> #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
> #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
>
> inode_lock(inode);
>
>
> and one 'ls' later on, to see whether the rest of the mount is available
> (the test file is in the root, so we get blocked up on the directory
> ->i_rwsem), so the entire mount is unavailable
>
> [0 00:36:26.473] [UN] PID: 9802 TASK: ffff8c691436ae80 CPU: 4
> COMMAND: "ls"
> #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
> #1 [ffff996500393db8] schedule at ffffffff9b6e64df
> #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
> #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
> #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
> #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
> #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
> #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
>
> in iterate_dir:
> if (shared)
> res = down_read_killable(&inode->i_rwsem); <<<<
> else
> res = down_write_killable(&inode->i_rwsem);
>

Reported-by: Frank Sorenson
Reviewed-by: Pavel Shilovsky
Signed-off-by: Ronnie Sahlberg
Signed-off-by: Steve French

Ronnie Sahlberg
2019-11-25 15:17:12 +0800
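
The approach of this commit, deferring the lock-heavy tail of a "put" to a context that holds no other locks, can be sketched in userspace. Everything here is hypothetical: the names are invented, a mutex-protected list stands in for a kernel work-queue, and setting a flag stands in for the real teardown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical open-file object with a reference count. */
struct toy_file {
    int refcount;
    bool released;                  /* set by the deferred teardown */
    struct toy_file *next_deferred;
};

/* Hypothetical deferred-work list; its lock also guards the refcounts. */
struct toy_deferred {
    pthread_mutex_t lock;
    struct toy_file *head;
};

/*
 * Drop one reference. The caller may be holding other locks, so the final
 * put only queues the object; it does not do the teardown itself.
 */
static void toy_file_put(struct toy_deferred *dq, struct toy_file *f)
{
    pthread_mutex_lock(&dq->lock);
    assert(f->refcount > 0);
    if (--f->refcount == 0) {
        f->next_deferred = dq->head;
        dq->head = f;
    }
    pthread_mutex_unlock(&dq->lock);
}

/*
 * Worker side: detach the whole batch under the lock, then tear each
 * object down with no locks held. Returns how many objects were handled.
 */
static size_t toy_run_deferred(struct toy_deferred *dq)
{
    struct toy_file *batch;
    size_t done = 0;

    pthread_mutex_lock(&dq->lock);
    batch = dq->head;
    dq->head = NULL;
    pthread_mutex_unlock(&dq->lock);

    while (batch) {
        struct toy_file *next = batch->next_deferred;

        batch->next_deferred = NULL;
        batch->released = true;     /* stand-in for the heavy teardown */
        batch = next;
        done++;
    }
    return done;
}
```

Because the teardown runs from the worker rather than from the caller, it can take whatever locks it needs without nesting inside locks the caller already holds, which is the kind of nesting the deadlock report above describes.
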
d70e9fa55 cifs: try opening channels after mounting

After doing mount() successfully we call cifs_try_adding_channels()
which will open as many channels as it can.

Channels are closed when the master session is closed.

The master connection becomes the first channel.

[diagram truncated: global cifs_tcp_ses_list]
Signed-off-by: Steve French

Aurelien Aptel
2019-11-25 15:16:30 +0800
b8f7442bc CIFS: refactor cifs_get_inode_info()

Make the logic of cifs_get_inode_info() much clearer by moving code to
sub-functions and adding comments.

Document the steps this function does.

cifs_get_inode_info() gets and updates a file's inode metadata from its
file path.

* If caller already has raw info data from server they can pass it.
* If inode already exists (just need to update) caller can pass it.

Step 1: get raw data from server if none was passed
Step 2: parse raw data into intermediate internal cifs_fattr struct
Step 3: set fattr uniqueid, which is later used for the inode number. This
can sometimes be done from raw data
Step 4: tweak fattr according to mount options (file_mode, acl to mode
bits, uid, gid, etc)
Step 5: update or create inode from final fattr struct

* add is_smb1_server() helper
* add is_inode_cache_good() helper
* move SMB1-backupcreds-getinfo-retry to separate func
cifs_backup_query_path_info().
* move set-uniqueid code to separate func cifs_set_fattr_ino()
* don't clobber uniqueid from backup cred retry
* fix some probable corner-case memory leaks

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French

Aurelien Aptel
2019-11-25 15:16:30 +0800
f6a6bf7c4 cifs: switch servers depending on binding state

Currently a lot of the code to initialize a connection & session uses
the cifs_ses as input. But depending on if we are opening a new session
or a new channel we need to use different server pointers.

Add a "binding" flag in cifs_ses and a helper function that returns
the server ptr a session should use (only in the sess establishment
code path).
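
A minimal sketch of such a helper, under stated assumptions: the struct layouts and the names demo_server, demo_ses, binding_server and demo_ses_server() are invented for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-connection server struct. */
struct demo_server {
	const char *hostname;
};

/* Hypothetical stand-in for the session struct. */
struct demo_ses {
	struct demo_server *server;         /* primary connection of the session */
	struct demo_server *binding_server; /* connection of the channel being bound */
	bool binding;                       /* true while a new channel is being set up */
};

/* Return the server pointer the session-establishment code should use. */
static struct demo_server *demo_ses_server(const struct demo_ses *ses)
{
	assert(ses != NULL);
	return ses->binding ? ses->binding_server : ses->server;
}
```

With a helper of this shape, the establishment code never needs to know whether it is opening a new session or a new channel; it only asks the session which server to use.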

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French

Aurelien Aptel
2019-11-25 15:16:30 +0800
f780bd3fe cifs: add server param

As we get down to the transport layer, plenty of functions are passed
the session pointer and assume the transport to use is ses->server.

Instead we modify those functions to pass (ses, server) so that we
can decouple the session from the server.

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French

Aurelien Aptel
2019-11-25 15:16:30 +0800
bcc888011 cifs: add multichannel mount options and data structs

adds:
- [no]multichannel to enable/disable multichannel
- max_channels=N to control how many channels to create

these options are then stored in the volume struct.

- store channels and max_channels in cifs_ses
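
As a rough illustration of how the three option spellings above could be parsed into a volume struct, here is a sketch. The struct demo_vol, the function demo_parse_option() and the DEMO_MAX_CHANNELS cap are assumptions made for this example; the kernel's own option parser is not shown in this commit message.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Upper bound chosen for this sketch only; not taken from the kernel. */
#define DEMO_MAX_CHANNELS 16

/* Hypothetical subset of the parsed mount options. */
struct demo_vol {
	bool multichannel;
	unsigned int max_channels;
};

/* Parse one option token. Returns 0 on success, -1 if unknown or invalid. */
static int demo_parse_option(struct demo_vol *vol, const char *opt)
{
	static const char key[] = "max_channels=";
	const size_t key_len = sizeof(key) - 1;

	assert(vol != NULL && opt != NULL);

	if (strcmp(opt, "multichannel") == 0) {
		vol->multichannel = true;
		return 0;
	}
	if (strcmp(opt, "nomultichannel") == 0) {
		vol->multichannel = false;
		return 0;
	}
	if (strncmp(opt, key, key_len) == 0) {
		const char *val = opt + key_len;
		char *end;
		unsigned long n;

		/* Reject empty values, signs and leading whitespace up front. */
		if (!isdigit((unsigned char)val[0]))
			return -1;
		n = strtoul(val, &end, 10);
		if (*end != '\0' || n == 0 || n > DEMO_MAX_CHANNELS)
			return -1;
		vol->max_channels = (unsigned int)n;
		return 0;
	}
	return -1;
}
```

Invalid values leave the struct untouched, so a failed parse cannot silently change the channel count.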

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French

Aurelien Aptel
2019-11-25 15:16:30 +0800
35adffed0 cifs: sort interface list by speed

New channels are going to be opened by walking the list sequentially,
so by sorting it we will connect to the fastest interfaces first.
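
The idea can be sketched in user-space C with qsort(). The struct demo_iface and both function names are hypothetical; the kernel's interface list and sort routine differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down description of one server network interface. */
struct demo_iface {
	const char *name;
	uint64_t speed; /* link speed in bits per second */
};

/* qsort() comparator: faster interfaces sort first. */
static int demo_iface_cmp(const void *a, const void *b)
{
	const struct demo_iface *ia = a;
	const struct demo_iface *ib = b;

	if (ia->speed > ib->speed)
		return -1;
	if (ia->speed < ib->speed)
		return 1;
	return 0;
}

/* Sort the list in place so a sequential walk tries the fastest links first. */
static void demo_sort_ifaces(struct demo_iface *list, size_t count)
{
	assert(list != NULL || count == 0);
	if (count > 1)
		qsort(list, count, sizeof(*list), demo_iface_cmp);
}
```

The comparator compares rather than subtracts the speeds, which avoids overflow when the 64-bit difference would not fit in an int.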

Signed-off-by: Aurelien Aptel
Signed-off-by: Steve French

Aurelien Aptel
2019-11-25 15:16:30 +0800
fa9c23624 CIFS: Fix SMB2 oplock break processing

Even when mounting with a modern protocol version, the server may be
configured without support for SMB2.1 leases, in which case the client
uses SMB2 oplocks to optimize IO performance through local caching.

However, there is a problem in oplock break handling that leads
to a missed break notification on the client that has the file
open. This later causes long latencies for other clients that
are trying to open the same file.

The problem reproduces when there are multiple shares from the
same server mounted on the client. The processing code tries to
match persistent and volatile file ids from the break notification
with an open file but it skips all shares besides the first one.
Fix this by looking up in all shares belonging to the server that
issued the oplock break.
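
The corrected lookup can be sketched as a nested search over every share of the server. All types and names below (demo_file, demo_share, demo_server, demo_find_oplock_target()) are invented for illustration, and the real code also takes locks that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical open-file record holding the two SMB2 file ids. */
struct demo_file {
	uint64_t persistent_fid;
	uint64_t volatile_fid;
};

/* Hypothetical share (tree connection) with its open files. */
struct demo_share {
	struct demo_file *files;
	size_t nfiles;
};

/* Hypothetical server with every share mounted from it. */
struct demo_server {
	struct demo_share *shares;
	size_t nshares;
};

/* Find the open file named by an oplock break, searching all shares. */
static struct demo_file *demo_find_oplock_target(struct demo_server *srv,
						 uint64_t pfid, uint64_t vfid)
{
	assert(srv != NULL);

	for (size_t s = 0; s < srv->nshares; s++) {
		struct demo_share *share = &srv->shares[s];

		for (size_t f = 0; f < share->nfiles; f++) {
			struct demo_file *file = &share->files[f];

			if (file->persistent_fid == pfid &&
			    file->volatile_fid == vfid)
				return file;
		}
		/* No match in this share: keep going instead of giving up. */
	}
	return NULL;
}
```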

Cc: Stable
Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French

Pavel Shilovsky
2019-11-25 15:16:30 +0800
3591bb83e cifs: don't use 'pre:' for MODULE_SOFTDEP

It can cause
to fail with
modprobe: FATAL: Module is builtin.

RHBZ: 1767094

Signed-off-by: Ronnie Sahlberg
Signed-off-by: Steve French

Ronnie Sahlberg
2019-11-25 15:16:30 +0800
4357d45f5 cifs: smbd: Return -EAGAIN when transport is reconnecting

During reconnection, the transport may already have been destroyed and be in
the process of being reconnected. In this case, return -EAGAIN so that the
I/O is retried rather than failed.
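
A user-space sketch of the pattern: the send path reports -EAGAIN while the transport is down, and the caller treats that value as "try again" rather than as a hard failure. The struct demo_transport, its tick counter and all function names are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport; it becomes usable once the counter reaches zero. */
struct demo_transport {
	int ticks_until_connected;
	int sends; /* number of successful sends */
};

/* Send path: report -EAGAIN instead of a hard error while reconnecting. */
static int demo_send(struct demo_transport *t)
{
	assert(t != NULL);
	if (t->ticks_until_connected > 0)
		return -EAGAIN;
	t->sends++;
	return 0;
}

/* Stand-in for waiting on the reconnect; here it just advances the counter. */
static void demo_wait_for_reconnect(struct demo_transport *t)
{
	if (t->ticks_until_connected > 0)
		t->ticks_until_connected--;
}

/* Caller side: retry on -EAGAIN, up to max_tries attempts. */
static int demo_send_with_retry(struct demo_transport *t, int max_tries)
{
	int rc = -EAGAIN;

	for (int i = 0; i < max_tries && rc == -EAGAIN; i++) {
		rc = demo_send(t);
		if (rc == -EAGAIN)
			demo_wait_for_reconnect(t);
	}
	return rc;
}
```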

Signed-off-by: Long Li
Cc: stable@vger.kernel.org
Signed-off-by: Steve French

Long Li
2019-11-25 15:16:30 +0800
c21ce58ea cifs: smbd: Only queue work for error recovery on memory registration

It's not necessary to queue an invalidated memory registration to the work
queue, as all we need to do is unmap the SG and make it usable again. This can
save CPU cycles in normal data paths, as memory registration errors are rare
and normally only happen during reconnection.

Signed-off-by: Long Li
Cc: stable@vger.kernel.org
Signed-off-by: Steve French

Long Li
2019-11-25 15:16:30 +0800
87bc2376f smb3: add debug messages for closing unmatched open

Helps distinguish between an interrupted close and a truly
unmatched open.

Signed-off-by: Ronnie Sahlberg
Signed-off-by: Steve French

Ronnie Sahlberg
2019-11-25 15:14:53 +0800
7b71843fa CIFS: Do not miss cancelled OPEN responses

When an OPEN command is cancelled we mark a mid as
cancelled and let the demultiplex thread process it
by closing an open handle. The problem is there is
a race between a system call thread and the demultiplex
thread and there may be a situation when the mid has
been already processed before it is set as cancelled.

Fix this by processing cancelled requests when mids
are being destroyed which means that there is only
one thread referencing a particular mid. Also set
mids as cancelled regardless of their state.

Cc: Stable
Tested-by: Frank Sorenson
Reviewed-by: Ronnie Sahlberg
Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French

Pavel Shilovsky
2019-11-25 15:14:53 +0800
86a7964be CIFS: Fix NULL pointer dereference in mid callback

There is a race between a system call processing thread
and the demultiplex thread when mid->resp_buf becomes NULL
and later is being accessed to get credits. It happens when
the 1st thread wakes up before a mid callback is called in
the 2nd one but the mid state has already been set to
MID_RESPONSE_RECEIVED. This causes NULL pointer dereference
in mid callback.

Fix this by saving credits from the response before we
update the mid state and then use this value in the mid
callback rather than accessing a response buffer.
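
The ordering can be sketched as follows. This is a single-threaded illustration with hypothetical types and names (demo_mid, demo_resp, credits_received); the real code runs on two threads and relies on the kernel's own synchronization, which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

enum demo_mid_state {
	DEMO_MID_SUBMITTED,
	DEMO_MID_RESPONSE_RECEIVED,
};

/* Hypothetical response buffer carrying the granted credits. */
struct demo_resp {
	unsigned int credits;
};

/* Hypothetical mid entry. */
struct demo_mid {
	enum demo_mid_state state;
	struct demo_resp *resp_buf;    /* may be cleared by the waiting thread */
	unsigned int credits_received; /* copy saved before the state change */
};

/* Demultiplex side: save the credits first, then publish the new state. */
static void demo_receive(struct demo_mid *mid, struct demo_resp *resp)
{
	assert(mid != NULL && resp != NULL);
	mid->resp_buf = resp;
	mid->credits_received = resp->credits;
	mid->state = DEMO_MID_RESPONSE_RECEIVED;
}

/* Callback: read the saved copy and never dereference resp_buf. */
static unsigned int demo_callback_credits(const struct demo_mid *mid)
{
	assert(mid != NULL);
	return mid->credits_received;
}
```

Because the callback reads only the saved copy, it stays correct even if the other thread has already set resp_buf to NULL.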

Cc: Stable
Fixes: ee258d79159afed5 ("CIFS: Move credit processing to mid callbacks for SMB3")
Tested-by: Frank Sorenson
Reviewed-by: Ronnie Sahlberg
Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French

Pavel Shilovsky
2019-11-25 15:14:53 +0800
9150c3adb CIFS: Close open handle after interrupted close

If a Close command is interrupted before the request is sent
to the server, the client ends up leaking an open file
handle. This wastes server resources and can potentially
block applications that try to remove the file or any
directory containing this file.

Fix this by putting the close command into a worker queue,
so another thread retries it later.

Cc: Stable
Tested-by: Frank Sorenson
Reviewed-by: Ronnie Sahlberg
Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French

Pavel Shilovsky
2019-11-25 15:14:53 +0800
44805b0e6 CIFS: Respect O_SYNC and O_DIRECT flags during reconnect

Currently the client translates O_SYNC and O_DIRECT flags
into corresponding SMB create options when opening a file.
The problem is that on reconnect, when the file is being
re-opened, the client doesn't set those flags, which causes
the server to reject re-open requests because the create options
don't match. This means that any subsequent system
call against that open file fails until the share is re-mounted.

Fix this by properly setting SMB create options when
re-opening files after reconnects.
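
One way to keep the two paths consistent is to derive the create options from the saved open flags through a single function that both open and re-open call. The sketch below assumes that approach; the DEMO_* flag values and all names are stand-ins for illustration and are not the kernel's or libc's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the open(2) flags of interest. */
#define DEMO_O_SYNC   0x1u
#define DEMO_O_DIRECT 0x2u

/* Illustrative stand-ins for the matching SMB create options. */
#define DEMO_CREATE_WRITE_THROUGH 0x2u
#define DEMO_CREATE_NO_BUFFER     0x8u

/* Single place that maps open flags to create options. */
static uint32_t demo_create_options(unsigned int open_flags)
{
	uint32_t opts = 0;

	if (open_flags & DEMO_O_SYNC)
		opts |= DEMO_CREATE_WRITE_THROUGH;
	if (open_flags & DEMO_O_DIRECT)
		opts |= DEMO_CREATE_NO_BUFFER;
	return opts;
}

/* Hypothetical open-file record; the flags are kept for later re-opens. */
struct demo_open_file {
	unsigned int open_flags;
	uint32_t create_options; /* options sent on the most recent open */
};

/* Initial open: remember the flags and derive the options from them. */
static void demo_open(struct demo_open_file *f, unsigned int open_flags)
{
	assert(f != NULL);
	f->open_flags = open_flags;
	f->create_options = demo_create_options(open_flags);
}

/* Re-open after reconnect: derive the options from the same saved flags. */
static void demo_reopen(struct demo_open_file *f)
{
	assert(f != NULL);
	f->create_options = demo_create_options(f->open_flags);
}
```

Since both paths go through demo_create_options(), a re-open cannot send options that differ from the original open.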

Fixes: 1013e760d10e6: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags")
Cc: Stable
Signed-off-by: Pavel Shilovsky
Signed-off-by: Steve French

Pavel Shilovsky
2019-11-25 15:14:53 +0800
037d05072 smb3: remove confusing dmesg when mounting with encryption ("seal")

The smb2/smb3 message checking code was logging to dmesg when mounting
with encryption ("seal") for compounded SMB3 requests. When encrypted,
the whole frame (including potentially multiple compounds) is read,
so the length field is longer than in the non-encrypted
case (where the length field will match the calculated length for
the particular SMB3 request in the compound being validated).

Avoids the warning on mount (with "seal"):

"srv rsp padded more than expected. Length 384 not ..."

Signed-off-by: Steve French

Steve French
2019-11-25 15:14:53 +0800