09 Dec, 2019

1 commit

  • Pull cifs fixes from Steve French:
    "Nine cifs/smb3 fixes:

    - one fix for stable (oops during oplock break)

    - two timestamp fixes including important one for updating mtime at
    close to avoid stale metadata caching issue on dirty files (also
    improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the
    wire)

    - two fixes for "modefromsid" mount option for file create (now
    allows mode bits to be set more atomically and accurately on create
    by adding "sd_context" on create when modefromsid specified on
    mount)

    - two fixes for multichannel found in testing this week against
    different servers

    - two small cleanup patches"

    * tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
    smb3: improve check for when we send the security descriptor context on create
    smb3: fix mode passed in on create for modetosid mount option
    cifs: fix possible uninitialized access and race on iface_list
    cifs: Fix lookup of SMB connections on multichannel
    smb3: query attributes on file close
    smb3: remove unused flag passed into close functions
    cifs: remove redundant assignment to pointer pneg_ctxt
    fs: cifs: Fix atime update check vs mtime
    CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks

    Linus Torvalds
     

08 Dec, 2019

1 commit


07 Dec, 2019

2 commits

  • When using the special SID to store the mode bits in an ACE (See
    http://technet.microsoft.com/en-us/library/hh509017(v=ws.10).aspx)
    which is enabled with mount parm "modefromsid" we were not
    passing in the mode via SMB3 create (although chmod was enabled).
    SMB3 create allows a security descriptor context to be passed
    in (which is more atomic and thus preferable to setting the mode
    bits after create via a setinfo).

    This patch enables setting the mode bits on create when using
    modefromsid mount option. In addition it fixes an endian
    error in the definition of the Control field flags in the SMB3
    security descriptor. It also makes the ACE type of the special
    SID better match the documentation (and behavior of servers
    which use this to store mode bits in SMB3 ACLs).

    Signed-off-by: Steve French
    Acked-by: Ronnie Sahlberg
    Reviewed-by: Pavel Shilovsky

    Steve French
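
A minimal userspace sketch of how mode bits map onto the special SID described above (S-1-5-88-3-<mode>, per the linked TechNet page). The struct layout, the helper names mode_to_sid(), sid_to_mode() and sid_to_string(), and the 07777 mask are illustrative assumptions, not the cifs implementation. The wire encoding and the create-time security descriptor context are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified in-memory SID for illustration only: the on-the-wire
 * encoding (byte order, packing) is not modeled here. */
struct mode_sid {
	uint8_t  revision;      /* SID revision, always 1 */
	uint8_t  num_subauth;   /* 3 for S-1-5-88-3-<mode> */
	uint8_t  authority[6];  /* identifier authority; NT authority is 5 */
	uint32_t sub_auth[3];   /* 88, 3, <mode> */
};

/* Encode the permission bits of a POSIX mode as S-1-5-88-3-<mode>. */
static struct mode_sid mode_to_sid(uint32_t mode)
{
	struct mode_sid sid = {
		.revision = 1,
		.num_subauth = 3,
		.authority = { 0, 0, 0, 0, 0, 5 },
		.sub_auth = { 88, 3, mode & 07777 },
	};
	return sid;
}

/* Return 1 and store the mode bits if sid is a mode SID, else 0. */
static int sid_to_mode(const struct mode_sid *sid, uint32_t *mode)
{
	assert(sid != NULL && mode != NULL);
	if (sid->revision != 1 || sid->num_subauth != 3 ||
	    sid->authority[5] != 5 ||
	    sid->sub_auth[0] != 88 || sid->sub_auth[1] != 3)
		return 0;
	*mode = sid->sub_auth[2] & 07777;
	return 1;
}

/* Format as "S-1-5-88-3-<mode>"; sub-authorities print in decimal. */
static int sid_to_string(const struct mode_sid *sid, char *buf, size_t len)
{
	return snprintf(buf, len, "S-%u-%u-%u-%u-%u",
			(unsigned)sid->revision, (unsigned)sid->authority[5],
			(unsigned)sid->sub_auth[0], (unsigned)sid->sub_auth[1],
			(unsigned)sid->sub_auth[2]);
}
```

For example, mode 0644 (decimal 420) encodes as S-1-5-88-3-420.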
     
  • Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
    "Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
    Basically, the problem is that negative pinned dentries require
    careful treatment - unless ->d_lock is locked or parent is held at
    least shared, another thread can make them positive right under us.

    Most of the uses turned out to be safe - the main surprises as far as
    filesystems are concerned were

    - race in dget_parent() fastpath, that might end up with the caller
    observing the returned dentry _negative_, due to insufficient
    barriers. It is positive in memory, but we could end up seeing the
    wrong value of ->d_inode in CPU cache. Fixed.

    - manual checks that result of lookup_one_len_unlocked() is positive
    (and rejection of negatives). Again, insufficient barriers (we
    might end up with inconsistent observed values of ->d_inode and
    ->d_flags). Fixed by switching to a new primitive that does the
    checks itself and returns ERR_PTR(-ENOENT) instead of a negative
    dentry. That way we get rid of boilerplate converting negatives
    into ERR_PTR(-ENOENT) in the callers and have a single place to
    deal with the barrier-related mess - inside fs/namei.c rather than
    in every caller out there.

    The guts of pathname resolution *do* need to be careful - the race
    found by Ritesh is real, as well as several similar races.
    Fortunately, it turns out that we can take care of that with fairly
    local changes in there.

    The tree-wide audit had not been fun, and I hate the idea of repeating
    it. I think the right approach would be to annotate the places where
    we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
    catch regressions. But I'm still not sure what would be the least
    invasive way of doing that and it's clearly the next cycle fodder"

    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/namei.c: fix missing barriers when checking positivity
    fix dget_parent() fastpath race
    new helper: lookup_positive_unlocked()
    fs/namei.c: pull positivity check into follow_managed()

    Linus Torvalds
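
The "new primitive" pattern above can be sketched in userspace C11: the writer publishes ->d_inode before setting a positive flag with release ordering, and the lookup helper acquire-loads the flag and returns ERR_PTR(-ENOENT) for a negative entry, so callers never handle negatives themselves. The struct layouts, the DCACHE_POSITIVE bit and the *_sketch helpers are hypothetical; the kernel uses its own barrier primitives, and lookup_positive_unlocked() has a different signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

#define DCACHE_POSITIVE 1u  /* hypothetical flag bit */

struct inode { int ino; };
struct dentry {
	struct inode *d_inode;
	_Atomic unsigned int d_flags;
};

/* Writer: store ->d_inode first, then publish the flag (release). */
static void d_instantiate_sketch(struct dentry *d, struct inode *inode)
{
	assert(inode != NULL);
	d->d_inode = inode;
	atomic_store_explicit(&d->d_flags, DCACHE_POSITIVE, memory_order_release);
}

/* Reader: acquire-load the flag; if it is set, ->d_inode is visible.
 * Negative entries become ERR_PTR(-ENOENT) in this one place. */
static struct dentry *lookup_positive_sketch(struct dentry *d)
{
	unsigned int flags = atomic_load_explicit(&d->d_flags, memory_order_acquire);

	if (!(flags & DCACHE_POSITIVE))
		return ERR_PTR(-ENOENT);
	return d;
}
```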
     

05 Dec, 2019

2 commits

  • iface[0] was accessed regardless of the count value and without
    locking.

    * check count before accessing any ifaces
    * make copy of iface list (it's a simple POD array) and use it without
    locking.

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French
    Reviewed-by: Paulo Alcantara (SUSE)

    Aurelien Aptel
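
A minimal sketch of the two fixes, assuming a pthread mutex in place of the kernel lock and a hypothetical iface_info layout: check the count before touching iface_list[0], and copy the plain-old-data array under the lock so the caller can use the copy without holding it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct iface_info {          /* plain old data: safe to memcpy */
	unsigned int speed;
	int rss_capable;
	char addr[46];
};

struct session_sketch {
	pthread_mutex_t iface_lock;
	size_t iface_count;
	struct iface_info *iface_list;
};

/* Copy the interface list under the lock. Returns the number of entries
 * copied (0 if the list is empty or allocation fails); caller frees *out. */
static size_t snapshot_ifaces(struct session_sketch *ses, struct iface_info **out)
{
	size_t count;

	assert(ses != NULL && out != NULL);
	*out = NULL;
	pthread_mutex_lock(&ses->iface_lock);
	count = ses->iface_count;
	if (count == 0) {            /* never touch iface_list[0] when empty */
		pthread_mutex_unlock(&ses->iface_lock);
		return 0;
	}
	*out = malloc(count * sizeof(**out));
	if (*out)
		memcpy(*out, ses->iface_list, count * sizeof(**out));
	else
		count = 0;
	pthread_mutex_unlock(&ses->iface_lock);
	return count;                /* *out is now private to the caller */
}
```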
     
  • With the addition of SMB session channels, we introduced new TCP
    server pointers that have no sessions or tcons associated with them.

    In this case, when looking up TCP connections, we might end up
    picking a session channel rather than the master connection, and
    hence fail to get either a session or a tcon.

    In order to fix that, this patch introduces a new "is_channel" field
    to TCP_Server_Info structure so we can skip session channels during
    lookup of connections.

    Signed-off-by: Paulo Alcantara (SUSE)
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
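
A sketch of connection lookup that skips channels, assuming a flat array in place of the global server list and a hypothetical hostname match; only the is_channel field comes from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct tcp_server_sketch {
	const char *hostname;
	bool is_channel;   /* extra channel: no sessions or tcons of its own */
};

/* Return the first non-channel connection to host, or NULL. */
static struct tcp_server_sketch *
find_master_server(struct tcp_server_sketch *list, size_t n, const char *host)
{
	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (list[i].is_channel)
			continue;  /* never hand out a session channel */
		if (strcmp(list[i].hostname, host) == 0)
			return &list[i];
	}
	return NULL;
}
```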
     

04 Dec, 2019

1 commit

  • Since timestamps on files on most servers can be updated at
    close, and since timestamps on our dentries default to one
    second we can have stale timestamps in some common cases
    (e.g. open, write, close, stat, wait one second, stat - will
    show different mtime for the first and second stat).

    The SMB2/SMB3 protocol allows querying timestamps at close
    so add the code to request timestamp and attr information
    (which is cheap for the server to provide) to be returned
    when a file is closed (it is not needed for the many paths
    that call SMB2_close as part of a compounded query info and
    close, nor for some of the cases where a directory close
    immediately follows a directory open).

    Signed-off-by: Steve French
    Acked-by: Ronnie Sahlberg
    Reviewed-by: Aurelien Aptel
    Reviewed-by: Pavel Shilovsky

    Steve French
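
A sketch of consuming the post-query attributes from an SMB2 CLOSE response, based on the CLOSE response fields in MS-SMB2 (SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB is 0x0001). The host-order struct, cached_attrs and the helper names are assumptions; wire decoding and the real inode update are omitted. Timestamps are Windows FILETIME values (100 ns units since 1601-01-01), hence the epoch offset.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB 0x0001

/* Host-order view of the SMB2 CLOSE response; decoding not shown. */
struct close_rsp_sketch {
	uint16_t flags;
	uint64_t creation_time, last_access_time, last_write_time, change_time;
	uint64_t allocation_size, end_of_file;
	uint32_t file_attributes;
};

struct cached_attrs { struct timespec mtime; int64_t size; };

/* 100 ns ticks between 1601-01-01 and 1970-01-01 */
#define NTFS_EPOCH_OFFSET 116444736000000000ULL

static struct timespec filetime_to_timespec(uint64_t ft)
{
	struct timespec ts;

	assert(ft >= NTFS_EPOCH_OFFSET);
	uint64_t since_unix = ft - NTFS_EPOCH_OFFSET;
	ts.tv_sec = (time_t)(since_unix / 10000000ULL);
	ts.tv_nsec = (long)(since_unix % 10000000ULL) * 100;
	return ts;
}

/* Refresh cached attributes at close; returns 1 if the response carried them. */
static int update_attrs_from_close(struct cached_attrs *c,
				   const struct close_rsp_sketch *rsp)
{
	if (!(rsp->flags & SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB))
		return 0;  /* attribute fields are expected to be zero */
	c->mtime = filetime_to_timespec(rsp->last_write_time);
	c->size = (int64_t)rsp->end_of_file;
	return 1;
}
```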
     

03 Dec, 2019

5 commits

  • close was relayered to allow passing in an async flag which
    is no longer needed in this path. Remove the unneeded parameter
    "flags" passed in on close.

    Signed-off-by: Steve French
    Reviewed-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg

    Steve French
     
  • The pointer pneg_ctxt is being initialized with a value that is never
    read and it is being updated later with a new value. The assignment
    is redundant and can be removed.

    Addresses-Coverity: ("Unused value")
    Signed-off-by: Colin Ian King
    Signed-off-by: Steve French

    Colin Ian King
     
  • According to the comment in the code and commit log, some apps
    expect atime >= mtime; but the introduced code results in
    atime == mtime. Fix the comparison to guard against atime < mtime.

    Cc: stfrench@microsoft.com
    Cc: linux-cifs@vger.kernel.org
    Signed-off-by: Steve French

    Deepa Dinamani
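
The intended comparison can be sketched as follows, assuming plain struct timespec values: keep atime unless it is older than mtime, in which case report mtime. The helper names are hypothetical.

```c
#include <assert.h>
#include <time.h>

static int timespec_cmp(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/* Only clamp when atime < mtime; a newer atime is reported unchanged. */
static struct timespec effective_atime(struct timespec atime, struct timespec mtime)
{
	assert(atime.tv_nsec >= 0 && mtime.tv_nsec >= 0);
	return timespec_cmp(&atime, &mtime) < 0 ? mtime : atime;
}
```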
     
  • Currently when the client creates a cifsFileInfo structure for
    a newly opened file, it allocates a list of byte-range locks
    with a pointer to the new cfile and attaches this list to the
    inode's lock list. The latter happens before initializing all
    other fields, e.g. cfile->tlink. Thus a partially initialized
    cifsFileInfo structure becomes available to other threads that
    walk through the inode's lock list. One example of such a thread
    may be an oplock break worker thread that tries to push all
    cached byte-range locks. This causes NULL-pointer dereference
    in smb2_push_mandatory_locks() when accessing cfile->tlink:

    [598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038
    ...
    [598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs]
    [598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs]
    ...
    [598428.945834] Call Trace:
    [598428.945870] ? cifs_revalidate_mapping+0x45/0x90 [cifs]
    [598428.945901] cifs_oplock_break+0x13d/0x450 [cifs]
    [598428.945909] process_one_work+0x1db/0x380
    [598428.945914] worker_thread+0x4d/0x400
    [598428.945921] kthread+0x104/0x140
    [598428.945925] ? process_one_work+0x380/0x380
    [598428.945931] ? kthread_park+0x80/0x80
    [598428.945937] ret_from_fork+0x35/0x40

    Fix this by reordering initialization steps of the cifsFileInfo
    structure: initialize all the fields first and then add the new
    byte-range lock list to the inode's lock list.

    Cc: Stable
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Steve French

    Pavel Shilovsky
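
A simplified sketch of the reordering, assuming the file info itself is linked on a per-inode list protected by a mutex (the real code links a separate lock list that points at the cfile). All names here are hypothetical; the point is that nothing becomes reachable from the shared list until every field is set.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct tlink { int id; };
struct file_info {
	struct tlink *tlink;
	int pid;
	struct file_info *next;       /* link on the inode's list */
};
struct inode_sketch {
	pthread_mutex_t lock;
	struct file_info *open_files; /* walked by e.g. an oplock break worker */
};

/* Initialize every field first, then publish under the lock. */
static struct file_info *new_fileinfo(struct inode_sketch *inode,
				      struct tlink *tl, int pid)
{
	struct file_info *cfile = calloc(1, sizeof(*cfile));

	if (!cfile)
		return NULL;
	cfile->tlink = tl;
	cfile->pid = pid;

	pthread_mutex_lock(&inode->lock);
	cfile->next = inode->open_files;
	inode->open_files = cfile;
	pthread_mutex_unlock(&inode->lock);
	return cfile;
}

/* Worker-side walk: every visible entry already has a valid tlink. */
static size_t count_pushable(struct inode_sketch *inode)
{
	size_t n = 0;

	pthread_mutex_lock(&inode->lock);
	for (struct file_info *f = inode->open_files; f; f = f->next) {
		assert(f->tlink != NULL);
		n++;
	}
	pthread_mutex_unlock(&inode->lock);
	return n;
}
```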
     
  • Pull Documentation updates from Jonathan Corbet:
    "Here are the main documentation changes for 5.5:

    - Various kerneldoc script enhancements.

    - More RST conversions; those are slowing down as we run out of
    things to convert, but we're a ways from done still.

    - Dan's "maintainer profile entry" work landed at last. Now we just
    need to get maintainers to fill in the profiles...

    - A reworking of the parallel build setup to work better with a
    variety of systems (and to not take over huge systems entirely in
    particular).

    - The MAINTAINERS file is now converted to RST during the build.
    Hopefully nobody ever tries to print this thing, or they will need
    to load a lot of paper.

    - A script and documentation making it easy for maintainers to add
    Link: tags at commit time.

    Also included is the removal of a bunch of spurious CR characters"

    * tag 'docs-5.5a' of git://git.lwn.net/linux: (91 commits)
    docs: remove a bunch of stray CRs
    docs: fix up the maintainer profile document
    libnvdimm, MAINTAINERS: Maintainer Entry Profile
    Maintainer Handbook: Maintainer Entry Profile
    MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile
    docs, parallelism: Rearrange how jobserver reservations are made
    docs, parallelism: Do not leak blocking mode to other readers
    docs, parallelism: Fix failure path and add comment
    Documentation: Remove bootmem_debug from kernel-parameters.txt
    Documentation: security: core.rst: fix warnings
    Documentation/process/howto/kokr: Update for 4.x -> 5.x versioning
    Documentation/translation: Use Korean for Korean translation title
    docs/memory-barriers.txt: Remove remaining references to mmiowb()
    docs/memory-barriers.txt/kokr: Update I/O section to be clearer about CPU vs thread
    docs/memory-barriers.txt/kokr: Fix style, spacing and grammar in I/O section
    Documentation/kokr: Kill all references to mmiowb()
    docs/memory-barriers.txt/kokr: Rewrite "KERNEL I/O BARRIER EFFECTS" section
    docs: Add initial documentation for devfreq
    Documentation: Document how to get links with git am
    docs: Add request_irq() documentation
    ...

    Linus Torvalds
     

28 Nov, 2019

1 commit


26 Nov, 2019

1 commit


25 Nov, 2019

26 commits

  • Update signing key of first channel whenever generating the master
    signing/encryption/decryption keys rather than only in cifs_mount().

    This also fixes reconnect when re-establishing smb sessions to other
    servers.

    Signed-off-by: Paulo Alcantara (SUSE)
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
     
  • Make sure that DFS referrals are sent to newly resolved root targets
    as in a multi tier DFS setup.

    Signed-off-by: Paulo Alcantara (SUSE)
    Link: https://lkml.kernel.org/r/05aa2995-e85e-0ff4-d003-5bb08bd17a22@canonical.com
    Cc: stable@vger.kernel.org
    Tested-by: Matthew Ruffell
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
     
  • We used to skip reconnects on all SMB2_IOCTL commands due to SMB3+
    FSCTL_VALIDATE_NEGOTIATE_INFO - which made sense since we're still
    establishing a SMB session.

    However, when refresh_cache_worker() calls smb2_get_dfs_refer() and
    we're under reconnect, SMB2_ioctl() will not be able to get a proper
    status error (e.g. -EHOSTDOWN in case we failed to reconnect) but an
    -EAGAIN from cifs_send_recv() thus looping forever in
    refresh_cache_worker().

    Fixes: e99c63e4d86d ("SMB3: Fix deadlock in validate negotiate hits reconnect")
    Signed-off-by: Paulo Alcantara (SUSE)
    Suggested-by: Aurelien Aptel
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
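
A sketch of the narrower skip condition, assuming the MS-SMB2 and MS-FSCC code values shown (SMB2 IOCTL command 0x000B, FSCTL_VALIDATE_NEGOTIATE_INFO 0x00140204, FSCTL_DFS_GET_REFERRALS 0x00060194). skip_reconnect() and send_while_disconnected() are hypothetical models, not functions in fs/cifs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SMB2_IOCTL_CMD                0x000B
#define FSCTL_VALIDATE_NEGOTIATE_INFO 0x00140204u
#define FSCTL_DFS_GET_REFERRALS       0x00060194u

/* Only the validate-negotiate ioctl, sent while the session is still
 * being established, bypasses reconnect; other ioctls reconnect normally. */
static bool skip_reconnect(uint16_t command, uint32_t ctl_code)
{
	return command == SMB2_IOCTL_CMD &&
	       ctl_code == FSCTL_VALIDATE_NEGOTIATE_INFO;
}

/* Model of a request issued while disconnected. */
static int send_while_disconnected(uint16_t command, uint32_t ctl_code,
				   int reconnect_rc)
{
	assert(reconnect_rc <= 0);
	if (skip_reconnect(command, ctl_code))
		return -EAGAIN;   /* no reconnect attempted */
	return reconnect_rc;      /* e.g. -EHOSTDOWN if reconnect failed */
}
```

With this shape, a DFS referral refresh gets a real error such as -EHOSTDOWN instead of an endless stream of -EAGAIN.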
     
  • We don't care about module aliasing validation in
    cifs_compose_mount_options(..., is_smb3) when finding the root SMB
    session of a DFS namespace in order to refresh the DFS referral cache.

    The following issue has been observed when mounting with '-t smb3' and
    then specifying 'vers=2.0':

    ...
    Nov 08 15:27:08 tw kernel: address conversion returned 0 for FS0.WIN.LOCAL
    Nov 08 15:27:08 tw kernel: [kworke] ==> dns_query((null),FS0.WIN.LOCAL,13,(null))
    Nov 08 15:27:08 tw kernel: [kworke] call request_key(,FS0.WIN.LOCAL,)
    Nov 08 15:27:08 tw kernel: [kworke] ==> dns_resolver_cmp(FS0.WIN.LOCAL,FS0.WIN.LOCAL)
    Nov 08 15:27:08 tw kernel: [kworke] Nov 08 15:27:08 tw kernel: CIFS VFS: vers=2.0 not permitted when mounting with smb3
    Nov 08 15:27:08 tw kernel: fs/cifs/dfs_cache.c: CIFS VFS: leaving refresh_tcon (xid = 26) rc = -22
    ...

    Fixes: 5072010ccf05 ("cifs: Fix DFS cache refresher for DFS links")
    Signed-off-by: Paulo Alcantara (SUSE)
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
     
  • Ensure we grab an active reference on the cifs superblock while doing
    failover to prevent automounts (DFS links) from expiring and then
    destroying the superblock pointer.

    This patch fixes the following KASAN report:

    [ 464.301462] BUG: KASAN: use-after-free in cifs_reconnect+0x6ab/0x1350
    [ 464.303052] Read of size 8 at addr ffff888155e580d0 by task cifsd/1107

    [ 464.304682] CPU: 3 PID: 1107 Comm: cifsd Not tainted 5.4.0-rc4+ #13
    [ 464.305552] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58-rebuilt.opensuse.org 04/01/2014
    [ 464.307146] Call Trace:
    [ 464.307875] dump_stack+0x5b/0x90
    [ 464.308631] print_address_description.constprop.0+0x16/0x200
    [ 464.309478] ? cifs_reconnect+0x6ab/0x1350
    [ 464.310253] ? cifs_reconnect+0x6ab/0x1350
    [ 464.311040] __kasan_report.cold+0x1a/0x41
    [ 464.311811] ? cifs_reconnect+0x6ab/0x1350
    [ 464.312563] kasan_report+0xe/0x20
    [ 464.313300] cifs_reconnect+0x6ab/0x1350
    [ 464.314062] ? extract_hostname.part.0+0x90/0x90
    [ 464.314829] ? printk+0xad/0xde
    [ 464.315525] ? _raw_spin_lock+0x7c/0xd0
    [ 464.316252] ? _raw_read_lock_irq+0x40/0x40
    [ 464.316961] ? ___ratelimit+0xed/0x182
    [ 464.317655] cifs_readv_from_socket+0x289/0x3b0
    [ 464.318386] cifs_read_from_socket+0x98/0xd0
    [ 464.319078] ? cifs_readv_from_socket+0x3b0/0x3b0
    [ 464.319782] ? try_to_wake_up+0x43c/0xa90
    [ 464.320463] ? cifs_small_buf_get+0x4b/0x60
    [ 464.321173] ? allocate_buffers+0x98/0x1a0
    [ 464.321856] cifs_demultiplex_thread+0x218/0x14a0
    [ 464.322558] ? cifs_handle_standard+0x270/0x270
    [ 464.323237] ? __switch_to_asm+0x40/0x70
    [ 464.323893] ? __switch_to_asm+0x34/0x70
    [ 464.324554] ? __switch_to_asm+0x40/0x70
    [ 464.325226] ? __switch_to_asm+0x40/0x70
    [ 464.325863] ? __switch_to_asm+0x34/0x70
    [ 464.326505] ? __switch_to_asm+0x40/0x70
    [ 464.327161] ? __switch_to_asm+0x34/0x70
    [ 464.327784] ? finish_task_switch+0xa1/0x330
    [ 464.328414] ? __switch_to+0x363/0x640
    [ 464.329044] ? __schedule+0x575/0xaf0
    [ 464.329655] ? _raw_spin_lock_irqsave+0x82/0xe0
    [ 464.330301] kthread+0x1a3/0x1f0
    [ 464.330884] ? cifs_handle_standard+0x270/0x270
    [ 464.331624] ? kthread_create_on_node+0xd0/0xd0
    [ 464.332347] ret_from_fork+0x35/0x40

    [ 464.333577] Allocated by task 1110:
    [ 464.334381] save_stack+0x1b/0x80
    [ 464.335123] __kasan_kmalloc.constprop.0+0xc2/0xd0
    [ 464.335848] cifs_smb3_do_mount+0xd4/0xb00
    [ 464.336619] legacy_get_tree+0x6b/0xa0
    [ 464.337235] vfs_get_tree+0x41/0x110
    [ 464.337975] fc_mount+0xa/0x40
    [ 464.338557] vfs_kern_mount.part.0+0x6c/0x80
    [ 464.339227] cifs_dfs_d_automount+0x336/0xd29
    [ 464.339846] follow_managed+0x1b1/0x450
    [ 464.340449] lookup_fast+0x231/0x4a0
    [ 464.341039] path_openat+0x240/0x1fd0
    [ 464.341634] do_filp_open+0x126/0x1c0
    [ 464.342277] do_sys_open+0x1eb/0x2c0
    [ 464.342957] do_syscall_64+0x5e/0x190
    [ 464.343555] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [ 464.344772] Freed by task 0:
    [ 464.345347] save_stack+0x1b/0x80
    [ 464.345966] __kasan_slab_free+0x12c/0x170
    [ 464.346576] kfree+0xa6/0x270
    [ 464.347211] rcu_core+0x39c/0xc80
    [ 464.347800] __do_softirq+0x10d/0x3da

    [ 464.348919] The buggy address belongs to the object at ffff888155e58000
    which belongs to the cache kmalloc-256 of size 256
    [ 464.350222] The buggy address is located 208 bytes inside of
    256-byte region [ffff888155e58000, ffff888155e58100)
    [ 464.351575] The buggy address belongs to the page:
    [ 464.352333] page:ffffea0005579600 refcount:1 mapcount:0 mapping:ffff88815a803400 index:0x0 compound_mapcount: 0
    [ 464.353583] flags: 0x200000000010200(slab|head)
    [ 464.354209] raw: 0200000000010200 ffffea0005576200 0000000400000004 ffff88815a803400
    [ 464.355353] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
    [ 464.356458] page dumped because: kasan: bad access detected

    [ 464.367005] Memory state around the buggy address:
    [ 464.367787] ffff888155e57f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 464.368877] ffff888155e58000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 464.369967] >ffff888155e58080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 464.371111] ^
    [ 464.371775] ffff888155e58100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 464.372893] ffff888155e58180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 464.373983] ==================================================================

    Signed-off-by: Paulo Alcantara (SUSE)
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
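
Taking an active reference only while the count is still non-zero is the usual "increment unless zero" pattern. A userspace sketch with C11 atomics; sb_sketch and the helper names are hypothetical stand-ins for the superblock's active count, and whether the patch is structured exactly this way is not stated above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sb_sketch { atomic_int s_active; };

/* Take an active reference only if the superblock is still alive. */
static bool sb_get_active(struct sb_sketch *sb)
{
	int old = atomic_load(&sb->s_active);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&sb->s_active, &old, old + 1))
			return true;
		/* a failed CAS reloaded old; retry */
	}
	return false;  /* already being torn down: do not touch it */
}

/* Drop a reference; returns true if this was the last one. */
static bool sb_put_active(struct sb_sketch *sb)
{
	int old = atomic_fetch_sub(&sb->s_active, 1);

	assert(old > 0);
	return old == 1;
}
```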
     
  • * show server&TCP states for extra channels
    * mention if an interface has a channel connected to it

    In version three of this patch, fixed a minor printk format
    issue pointed out by the kbuild robot.
    Reported-by: kbuild test robot

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • The number of requests in_send and the number of waiters on
    sendRecv are useful counters in various cases, especially with
    multichannel, so move them out of CONFIG_CIFS_STATS2 and enable
    them by default.

    Signed-off-by: Steve French
    Acked-by: Ronnie Sahlberg

    Steve French
     
  • Previously we would only loop over the iface list once.
    This patch tries to loop over multiple times until all channels are
    opened. It will also try to reuse RSS ifaces.

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
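
A sketch of the retry loop, assuming a hypothetical open_channel callback and an rss_capable flag per interface: keep passing over the list until max_channels is reached or a whole pass opens nothing, letting RSS-capable interfaces carry more than one channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct iface_sketch {
	bool rss_capable;   /* may host more than one channel */
	int  channels;      /* channels already opened on it */
};

typedef bool (*open_channel_fn)(struct iface_sketch *iface);

/* Illustrative stub: pretend every connect attempt succeeds. */
static bool open_channel_stub(struct iface_sketch *iface)
{
	(void)iface;
	return true;
}

/* Returns the total channel count, starting from `have` existing ones. */
static int add_channels(struct iface_sketch *ifaces, size_t n, int have,
			int max_channels, open_channel_fn open_channel)
{
	assert(have >= 1);  /* the master connection is the first channel */
	while (have < max_channels) {
		bool progress = false;

		for (size_t i = 0; i < n && have < max_channels; i++) {
			/* non-RSS interfaces get at most one channel */
			if (ifaces[i].channels > 0 && !ifaces[i].rss_capable)
				continue;
			if (open_channel(&ifaces[i])) {
				ifaces[i].channels++;
				have++;
				progress = true;
			}
		}
		if (!progress)
			break;      /* a full pass opened nothing: give up */
	}
	return have;
}
```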
     
  • Currently we don't expect that a server may break a lease
    from RWH to RW, which causes us to set a wrong lease state
    on a file and thus mistakenly flush data and byte-range
    locks and purge cached data on the client. This leads to
    performance degradation because subsequent IOs go directly
    to the server.

    Fix this by propagating the new lease state and epoch values
    to the oplock break handler through the cifsFileInfo structure
    and removing the use of cifsInodeInfo flags for that. This
    also avoids some races between several lease/oplock breaks
    using those flags in parallel.

    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
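
A simplified model of which work a lease break actually requires, using the MS-SMB2 lease state bits (R = 0x01, H = 0x02, W = 0x04). The break_actions struct and the mapping (flush and push byte-range locks when W is lost, purge when R is lost) are an illustrative assumption, not the exact cifs_oplock_break() logic; lease epochs are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMB2_LEASE_READ_CACHING   0x01
#define SMB2_LEASE_HANDLE_CACHING 0x02
#define SMB2_LEASE_WRITE_CACHING  0x04

struct break_actions {
	bool flush_dirty;   /* write back cached writes */
	bool push_locks;    /* send cached byte-range locks to the server */
	bool purge_cache;   /* drop cached pages */
};

/* Decide the work from the new lease state granted by the break. */
static struct break_actions lease_break_actions(uint8_t new_state)
{
	struct break_actions a = { false, false, false };

	assert((new_state & ~0x07) == 0);
	if (!(new_state & SMB2_LEASE_WRITE_CACHING)) {
		a.flush_dirty = true;
		a.push_locks = true;
	}
	if (!(new_state & SMB2_LEASE_READ_CACHING))
		a.purge_cache = true;
	return a;
}
```

Under this model an RWH to RW break only drops handle caching, so nothing needs to be flushed, pushed or purged.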
     
  • This patch moves the final part of the cifsFileInfo_put() logic where we
    need a write lock on lock_sem to be processed in a separate thread that
    holds no other locks.
    This is to prevent deadlocks like the one below:

    > there are 6 processes looping to while trying to down_write
    > cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
    > cifs_new_fileinfo
    >
    > and there are 5 other processes which are blocked, several of them
    > waiting on either PG_writeback or PG_locked (which are both set), all
    > for the same page of the file
    >
    > 2 inode_lock() (inode->i_rwsem) for the file
    > 1 wait_on_page_writeback() for the page
    > 1 down_read(inode->i_rwsem) for the inode of the directory
    > 1 inode_lock()(inode->i_rwsem) for the inode of the directory
    > 1 __lock_page
    >
    >
    > so processes are blocked waiting on:
    > page flags PG_locked and PG_writeback for one specific page
    > inode->i_rwsem for the directory
    > inode->i_rwsem for the file
    > cifsInodeInfo->lock_sem
    >
    >
    >
    > here are the more gory details (let me know if I need to provide
    > anything more/better):
    >
    > [0 00:48:22.765] [UN] PID: 8863 TASK: ffff8c691547c5c0 CPU: 3
    > COMMAND: "reopen_file"
    > #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
    > #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
    > #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
    > #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
    > #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
    > #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
    > #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
    > #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
    > * (I think legitimize_path is bogus)
    >
    > in path_openat
    > } else {
    > const char *s = path_init(nd, flags);
    > while (!(error = link_path_walk(s, nd)) &&
    > (error = do_last(nd, file, op)) > 0) { <<<<
    >
    > do_last:
    > if (open_flag & O_CREAT)
    > inode_lock(dir->d_inode); <<<<
    > else
    > so it's trying to take inode->i_rwsem for the directory
    >
    > DENTRY INODE SUPERBLK TYPE PATH
    > ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR /mnt/vm1_smb/
    > inode.i_rwsem is ffff8c691158efc0
    >
    > :
    > owner: (UN - 8856 -
    > reopen_file), counter: 0x0000000000000003
    > waitlist: 2
    > 0xffff9965007e3c90 8863 reopen_file UN 0 1:29:22.926
    > RWSEM_WAITING_FOR_WRITE
    > 0xffff996500393e00 9802 ls UN 0 1:17:26.700
    > RWSEM_WAITING_FOR_READ
    >
    >
    > the owner of the inode.i_rwsem of the directory is:
    >
    > [0 00:00:00.109] [UN] PID: 8856 TASK: ffff8c6914275d00 CPU: 3
    > COMMAND: "reopen_file"
    > #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
    > #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
    > #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
    > #3 [ffff99650065b940] msleep at ffffffff9af573a9
    > #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
    > #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
    > #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
    > #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
    > #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
    > #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
    > #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
    > #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
    > #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
    > #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
    > #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
    >
    > cifs_launder_page is for page 0xffffd1e2c07d2480
    >
    > crash> page.index,mapping,flags 0xffffd1e2c07d2480
    > index = 0x8
    > mapping = 0xffff8c68f3cd0db0
    > flags = 0xfffffc0008095
    >
    > PAGE-FLAG BIT VALUE
    > PG_locked 0 0000001
    > PG_uptodate 2 0000004
    > PG_lru 4 0000010
    > PG_waiters 7 0000080
    > PG_writeback 15 0008000
    >
    >
    > inode is ffff8c68f3cd0c40
    > inode.i_rwsem is ffff8c68f3cd0ce0
    > DENTRY INODE SUPERBLK TYPE PATH
    > ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
    > /mnt/vm1_smb/testfile.8853
    >
    >
    > this process holds the inode->i_rwsem for the parent directory, is
    > laundering a page attached to the inode of the file it's opening, and in
    > _cifsFileInfo_put is trying to down_write the cifsInodeInfo->lock_sem
    > for the file itself.
    >
    >
    > :
    > owner: (UN - 8854 -
    > reopen_file), counter: 0x0000000000000003
    > waitlist: 1
    > 0xffff9965005dfd80 8855 reopen_file UN 0 1:29:22.912
    > RWSEM_WAITING_FOR_WRITE
    >
    > this is the inode.i_rwsem for the file
    >
    > the owner:
    >
    > [0 00:48:22.739] [UN] PID: 8854 TASK: ffff8c6914272e80 CPU: 2
    > COMMAND: "reopen_file"
    > #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
    > #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
    > #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
    > #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
    > #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
    > #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
    > #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
    > #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
    > #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
    > #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
    > #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
    > #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
    > #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
    > #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
    >
    > the process holds the inode->i_rwsem for the file to which it's writing,
    > and is trying to __lock_page for the same page as in the other processes
    >
    >
    > the other tasks:
    > [0 00:00:00.028] [UN] PID: 8859 TASK: ffff8c6915479740 CPU: 2
    > COMMAND: "reopen_file"
    > #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
    > #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
    > #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
    > #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
    > #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
    > #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
    > #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
    > #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
    > #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
    > #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
    > #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
    >
    > this is opening the file, and is trying to down_write cinode->lock_sem
    >
    >
    > [0 00:00:00.041] [UN] PID: 8860 TASK: ffff8c691547ae80 CPU: 2
    > COMMAND: "reopen_file"
    > [0 00:00:00.057] [UN] PID: 8861 TASK: ffff8c6915478000 CPU: 3
    > COMMAND: "reopen_file"
    > [0 00:00:00.059] [UN] PID: 8858 TASK: ffff8c6914271740 CPU: 2
    > COMMAND: "reopen_file"
    > [0 00:00:00.109] [UN] PID: 8862 TASK: ffff8c691547dd00 CPU: 6
    > COMMAND: "reopen_file"
    > #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
    > #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
    > #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
    > #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
    > #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
    > #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
    > #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
    > #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
    > #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
    > #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
    >
    > closing the file, and trying to down_write cifsi->lock_sem
    >
    >
    > [0 00:48:22.839] [UN] PID: 8857 TASK: ffff8c6914270000 CPU: 7
    > COMMAND: "reopen_file"
    > #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
    > #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
    > #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
    > #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
    > #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
    > #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
    > #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
    > #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
    > #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
    > #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
    >
    > in __filemap_fdatawait_range
    > wait_on_page_writeback(page);
    > for the same page of the file
    >
    >
    >
    > [0 00:48:22.718] [UN] PID: 8855 TASK: ffff8c69142745c0 CPU: 7
    > COMMAND: "reopen_file"
    > #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
    > #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
    > #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
    > #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
    > #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
    > #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
    > #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
    > #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
    >
    > inode_lock(inode);
    >
    >
    > and one 'ls' later on, to see whether the rest of the mount is available
    > (the test file is in the root, so we get blocked up on the directory
    > ->i_rwsem), so the entire mount is unavailable
    >
    > [0 00:36:26.473] [UN] PID: 9802 TASK: ffff8c691436ae80 CPU: 4
    > COMMAND: "ls"
    > #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
    > #1 [ffff996500393db8] schedule at ffffffff9b6e64df
    > #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
    > #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
    > #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
    > #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
    > #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
    > #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
    >
    > in iterate_dir:
    > if (shared)
    > res = down_read_killable(&inode->i_rwsem); <<<<
    > else
    > res = down_write_killable(&inode->i_rwsem);
    >

    Reported-by: Frank Sorenson
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Ronnie Sahlberg
    Signed-off-by: Steve French

    Ronnie Sahlberg
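
A sketch of the deferral: the final put only queues the object, and a worker that holds no other locks does the part that needs lock_sem for write. The commit moves this work to a separate thread; the list-plus-drain-function shape and all names below are hypothetical userspace stand-ins.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct cfile_sketch {
	atomic_int count;
	struct cfile_sketch *next_deferred;
};

/* Deferred-release queue, drained by a worker holding no other locks. */
static pthread_mutex_t deferred_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cfile_sketch *deferred_head;

/* Callers may hold i_rwsem, page locks, ...: never take lock_sem here. */
static void cfile_put(struct cfile_sketch *cf)
{
	int old = atomic_fetch_sub(&cf->count, 1);

	assert(old > 0);
	if (old != 1)
		return;
	pthread_mutex_lock(&deferred_lock);
	cf->next_deferred = deferred_head;
	deferred_head = cf;
	pthread_mutex_unlock(&deferred_lock);
}

/* Worker body: returns how many objects were released. */
static int drain_deferred(void)
{
	struct cfile_sketch *list;
	int freed = 0;

	pthread_mutex_lock(&deferred_lock);
	list = deferred_head;
	deferred_head = NULL;
	pthread_mutex_unlock(&deferred_lock);

	while (list) {
		struct cfile_sketch *next = list->next_deferred;
		/* here it would be safe to take lock_sem for write */
		free(list);
		list = next;
		freed++;
	}
	return freed;
}
```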
     
  • After doing mount() successfully we call cifs_try_adding_channels()
    which will open as many channels as it can.

    Channels are closed when the master session is closed.

    The master connection becomes the first channel.

    ,-------------> global cifs_tcp_ses_list
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • Make logic of cifs_get_inode_info() much clearer by moving code to sub
    functions and adding comments.

    Document the steps this function does.

    cifs_get_inode_info() gets and updates a file inode metadata from its
    file path.

    * If caller already has raw info data from server they can pass it.
    * If inode already exists (just need to update) caller can pass it.

    Step 1: get raw data from server if none was passed
    Step 2: parse raw data into intermediate internal cifs_fattr struct
    Step 3: set the fattr uniqueid, which is later used as the inode number. This
    can sometimes be done from raw data
    Step 4: tweak fattr according to mount options (file_mode, acl to mode
    bits, uid, gid, etc)
    Step 5: update or create inode from final fattr struct
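
    The five steps can be illustrated with a small userspace pipeline. This is a hedged sketch under stated assumptions: every type and helper here (`struct raw_info`, `query_server()`, `set_fattr_ino()`, `get_inode_info()`) is hypothetical, the server query is a stub, and the FNV-1a path hash used when the server supplies no inode number is a stand-in chosen for the example, not what cifs does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified types. */
struct raw_info   { uint64_t server_ino; uint32_t attrs; bool has_ino; };
struct fattr      { uint64_t uniqueid; unsigned int mode; unsigned int uid; };
struct mount_opts { bool override_mode; unsigned int file_mode; unsigned int uid; };
struct inode      { uint64_t ino; unsigned int mode; unsigned int uid; };

/* Step 1 stand-in for the server round trip (no inode number returned). */
static struct raw_info query_server(const char *path)
{
    (void)path;
    struct raw_info r = { 0, 0, false };
    return r;
}

/* Step 2: raw data -> intermediate fattr (0x10 = directory attribute). */
static void parse_raw(const struct raw_info *raw, struct fattr *fa)
{
    fa->mode = (raw->attrs & 0x10) ? 0040755 : 0100644;
    fa->uid = 0;
    fa->uniqueid = 0;
}

/* Step 3: uniqueid from raw data if present, else an FNV-1a hash of
 * the path (the fallback is an assumption for this sketch). */
static void set_fattr_ino(const struct raw_info *raw, const char *path,
                          struct fattr *fa)
{
    if (raw->has_ino) {
        fa->uniqueid = raw->server_ino;
        return;
    }
    uint64_t h = 14695981039346656037ULL;
    for (const char *p = path; *p; p++)
        h = (h ^ (unsigned char)*p) * 1099511628211ULL;
    fa->uniqueid = h;
}

/* Step 4: apply mount options (keep file type bits, replace permissions). */
static void apply_mount_opts(const struct mount_opts *o, struct fattr *fa)
{
    if (o->override_mode)
        fa->mode = (fa->mode & 0170000) | (o->file_mode & 07777);
    fa->uid = o->uid;
}

/* raw may be NULL, in which case the server is queried first. */
int get_inode_info(struct inode *inode, const char *path,
                   const struct raw_info *raw, const struct mount_opts *o)
{
    struct raw_info fetched;
    struct fattr fa;

    if (raw == NULL) {               /* Step 1 */
        fetched = query_server(path);
        raw = &fetched;
    }
    parse_raw(raw, &fa);             /* Step 2 */
    set_fattr_ino(raw, path, &fa);   /* Step 3 */
    apply_mount_opts(o, &fa);        /* Step 4 */

    inode->ino = fa.uniqueid;        /* Step 5: copy final fattr to inode */
    inode->mode = fa.mode;
    inode->uid = fa.uid;
    return 0;
}
```

    Keeping the intermediate fattr separate from the inode is what lets step 4 tweak attributes without touching a live inode.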

    * add is_smb1_server() helper
    * add is_inode_cache_good() helper
    * move SMB1-backupcreds-getinfo-retry to separate func
    cifs_backup_query_path_info().
    * move set-uniqueid code to separate func cifs_set_fattr_ino()
    * don't clobber uniqueid from backup cred retry
    * fix some probable corner-case memory leaks

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • Currently a lot of the code to initialize a connection and session takes
    the cifs_ses as input. But depending on whether we are opening a new
    session or a new channel, we need to use different server pointers.

    Add a "binding" flag in cifs_ses and a helper function that returns
    the server pointer a session should use (only in the session
    establishment code path).
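
    A minimal sketch of such a helper, assuming hypothetical simplified structures (`struct session`, `ses_setup_server()` are illustrative names, not the cifs API): while a new channel is being bound, setup code targets the slot just past the established channels; otherwise it uses the master connection.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHANNELS 16   /* hypothetical upper bound */

struct server { int id; };

/* Hypothetical, simplified session. */
struct session {
    struct server *server;               /* master connection */
    struct server *chans[MAX_CHANNELS];  /* chans[0] is the master */
    size_t chan_count;                   /* established channels */
    bool binding;                        /* true while adding a channel */
};

/* Server that session-setup code should talk to. */
struct server *ses_setup_server(const struct session *ses)
{
    return ses->binding ? ses->chans[ses->chan_count] : ses->server;
}
```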

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • As we get down to the transport layer, plenty of functions are passed
    the session pointer and assume the transport to use is ses->server.

    Instead, we modify those functions to be passed (ses, server) so that
    we can decouple the session from the server.
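
    The signature change can be pictured with a before/after pair. This is an illustrative sketch with hypothetical names (`send_request_old()`, `send_request()`, and the byte counter standing in for an actual send); it only shows how an explicit `server` argument lets one session send over different channels.

```c
#include <stddef.h>

/* Hypothetical, simplified structures. */
struct server  { int id; unsigned long bytes_sent; };
struct session { struct server *server; };

/* Before: the transport is implied by the session. */
int send_request_old(struct session *ses, size_t len)
{
    ses->server->bytes_sent += len;   /* stand-in for a real send */
    return 0;
}

/* After: the caller names the transport explicitly. */
int send_request(struct session *ses, struct server *server, size_t len)
{
    (void)ses;   /* still available for session-level state */
    server->bytes_sent += len;
    return 0;
}
```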

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • Add:
    - [no]multichannel to enable/disable multichannel
    - max_channels=N to control how many channels to create

    These options are stored in the volume struct.

    - Store channels and max_channels in cifs_ses.
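
    Parsing these two options could look roughly like the sketch below. It is hypothetical: `struct vol_opts` and `parse_channel_opt()` are illustrative names, and the accepted range of 1 to 16 for `max_channels` is an assumption for the example, not a documented limit.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slice of the volume struct. */
struct vol_opts {
    bool multichannel;
    unsigned int max_channels;
};

/* Parse one comma-separated mount token.
 * Returns 0 on success, -1 on an unknown option or bad value. */
int parse_channel_opt(struct vol_opts *v, const char *tok)
{
    static const char key[] = "max_channels=";
    const size_t klen = sizeof(key) - 1;

    if (strcmp(tok, "multichannel") == 0) {
        v->multichannel = true;
    } else if (strcmp(tok, "nomultichannel") == 0) {
        v->multichannel = false;
    } else if (strncmp(tok, key, klen) == 0) {
        char *end;
        unsigned long n = strtoul(tok + klen, &end, 10);
        if (end == tok + klen || *end != '\0' || n < 1 || n > 16)
            return -1;   /* hypothetical bounds */
        v->max_channels = (unsigned int)n;
    } else {
        return -1;
    }
    return 0;
}
```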

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • New channels are going to be opened by walking the list sequentially,
    so by sorting it we will connect to the fastest interfaces first.
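
    Sorting fastest-first can be sketched with the C library's `qsort()`; the kernel has its own sort routine, so this is a userspace substitute, and `struct iface` is a hypothetical simplification of an interface entry.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified server interface entry. */
struct iface { const char *name; unsigned long long speed_bps; };

/* Comparator: faster interfaces first. Compares rather than
 * subtracting so large 64-bit speeds cannot overflow an int. */
static int iface_cmp_desc(const void *a, const void *b)
{
    const struct iface *x = a, *y = b;
    if (x->speed_bps > y->speed_bps) return -1;
    if (x->speed_bps < y->speed_bps) return 1;
    return 0;
}

void sort_ifaces(struct iface *list, size_t n)
{
    qsort(list, n, sizeof(*list), iface_cmp_desc);
}
```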

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • Even when mounting with a modern protocol version, the server may be
    configured without SMB2.1 lease support, in which case the client
    uses SMB2 oplocks to optimize I/O performance through local caching.

    However, there is a problem in oplock break handling that leads
    to missing a break notification on the client that has a file
    open. This later causes large latencies for other clients that
    are trying to open the same file.

    The problem reproduces when there are multiple shares from the
    same server mounted on the client. The processing code tries to
    match persistent and volatile file ids from the break notification
    with an open file, but it skips all shares except the first one.
    Fix this by looking up in all shares belonging to the server that
    issued the oplock break.
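
    The fixed lookup amounts to a nested search over every share on the issuing server. This sketch uses hypothetical flat arrays (`struct tcon`, `find_oplock_target()`) in place of the kernel's session and tree-connection lists, and marks the match instead of scheduling real break handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified structures. */
struct open_file { uint64_t persistent_fid, volatile_fid; bool break_pending; };
struct tcon      { struct open_file *files; size_t nfiles; };   /* one share */
struct server    { struct tcon *tcons; size_t ntcons; };

/* Search every share on the server that sent the break, not only the
 * first one. Returns the matched open file, or NULL if none matches. */
struct open_file *find_oplock_target(struct server *srv,
                                     uint64_t pfid, uint64_t vfid)
{
    for (size_t t = 0; t < srv->ntcons; t++) {
        struct tcon *tc = &srv->tcons[t];
        for (size_t f = 0; f < tc->nfiles; f++) {
            struct open_file *of = &tc->files[f];
            if (of->persistent_fid == pfid && of->volatile_fid == vfid) {
                of->break_pending = true;
                return of;
            }
        }
    }
    return NULL;
}
```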

    Cc: Stable
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • It can cause
    to fail with
    modprobe: FATAL: Module is builtin.

    RHBZ: 1767094

    Signed-off-by: Ronnie Sahlberg
    Signed-off-by: Steve French

    Ronnie Sahlberg
     
  • During reconnect, the transport may have already been destroyed and be in
    the process of being reconnected. In this case, return -EAGAIN so the I/O
    does not fail and is retried.
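
    The idea can be shown as an early transient-error return. This is a simplified sketch with hypothetical names (`struct rdma_transport`, `send_io()`); a plain NULL check does not by itself close the race with a concurrent reconnect, which in the kernel requires proper synchronization.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a transport pointer that reconnect may tear down. */
struct rdma_transport { bool connected; };
struct server { struct rdma_transport *transport; };

int send_io(struct server *srv, size_t len)
{
    struct rdma_transport *t = srv->transport;

    /* Destroyed or not yet re-established: report a transient
     * failure so the caller retries instead of failing the I/O. */
    if (t == NULL || !t->connected)
        return -EAGAIN;

    (void)len;   /* the actual send is omitted in this sketch */
    return 0;
}
```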

    Signed-off-by: Long Li
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French

    Long Li
     
  • It's not necessary to queue an invalidated memory registration to a work
    queue, as all we need to do is unmap the SG and make it usable again. This
    can save CPU cycles in normal data paths, as memory registration errors are
    rare and normally only happen during reconnection.
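
    A rough model of the change: an invalidated registration is recycled inline, and only an error state is deferred. The state names, `struct mr`, `put_mr()`, and the assumption that error recovery still goes through a work queue are all hypothetical; the counter merely stands in for queueing work.

```c
#include <stdbool.h>

/* Hypothetical memory-registration states. */
enum mr_state { MR_READY, MR_REGISTERED, MR_INVALIDATED, MR_ERROR };

struct mr {
    enum mr_state state;
    bool sg_mapped;
    int recovery_queued;   /* counts deferred recoveries (simulated) */
};

/* Return an MR to the pool after use. */
void put_mr(struct mr *mr)
{
    switch (mr->state) {
    case MR_INVALIDATED:
        mr->sg_mapped = false;   /* stand-in for unmapping the SG list */
        mr->state = MR_READY;    /* immediately reusable */
        break;
    case MR_ERROR:
        mr->recovery_queued++;   /* rare path: defer to a worker */
        break;
    default:
        break;
    }
}
```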

    Signed-off-by: Long Li
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French

    Long Li
     
  • Helps distinguish between an interrupted close and a truly
    unmatched open.

    Signed-off-by: Ronnie Sahlberg
    Signed-off-by: Steve French

    Ronnie Sahlberg
     
  • When an OPEN command is cancelled we mark a mid as
    cancelled and let the demultiplex thread process it
    by closing an open handle. The problem is there is
    a race between a system call thread and the demultiplex
    thread and there may be a situation when the mid has
    been already processed before it is set as cancelled.

    Fix this by processing cancelled requests when mids
    are being destroyed which means that there is only
    one thread referencing a particular mid. Also set
    mids as cancelled unconditionally, regardless of their state.
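
    The "handle it when the last reference goes away" pattern can be sketched with a plain reference count. This single-threaded model is hypothetical (`struct mid` fields, `cancel_mid()`, `release_mid()`); the kernel uses atomic reference counting, and the closed-handle counter only stands in for sending a close.

```c
#include <stdbool.h>

/* Hypothetical, simplified mid (multiplex id) entry. */
struct mid {
    int refcount;
    bool cancelled;
    bool response_received;
    int *handles_closed;   /* counts cleanup closes sent (simulated) */
};

/* Mark as cancelled regardless of the current state; cleanup is
 * decided only at final release. */
void cancel_mid(struct mid *m)
{
    m->cancelled = true;
}

/* Drop a reference. Only the last holder runs the cleanup, so no
 * other thread can still be looking at this mid. */
void release_mid(struct mid *m)
{
    if (--m->refcount > 0)
        return;
    if (m->cancelled && m->response_received)
        (*m->handles_closed)++;   /* stand-in: close the orphaned open */
}
```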

    Cc: Stable
    Tested-by: Frank Sorenson
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • There is a race between a system call processing thread
    and the demultiplex thread when mid->resp_buf becomes NULL
    and later is being accessed to get credits. It happens when
    the 1st thread wakes up before a mid callback is called in
    the 2nd one but the mid state has already been set to
    MID_RESPONSE_RECEIVED. This causes NULL pointer dereference
    in mid callback.

    Fix this by saving credits from the response before we
    update the mid state and then use this value in the mid
    callback rather than accessing a response buffer.
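
    The ordering fix is illustrated below: copy credits out of the response before publishing the new state, and have the callback read only the saved copy. Names such as `credit_rsp` and `mid_callback_credits()` are hypothetical, and in real concurrent code statement order alone is not a memory barrier; the kernel relies on its own locking for that.

```c
#include <stddef.h>

enum mid_state { MID_REQUEST_SUBMITTED, MID_RESPONSE_RECEIVED };

/* Hypothetical, simplified response header and mid. */
struct smb_hdr { unsigned short credit_rsp; };

struct mid {
    enum mid_state state;
    const struct smb_hdr *resp_buf;   /* may be detached and set to NULL */
    unsigned short credits_received;
};

/* Demultiplex thread: save the credits first, then publish the state,
 * since another thread may take resp_buf once it sees the new state. */
void mid_received(struct mid *m, const struct smb_hdr *rsp)
{
    m->resp_buf = rsp;
    m->credits_received = rsp->credit_rsp;
    m->state = MID_RESPONSE_RECEIVED;
}

/* Callback: uses the saved value and never dereferences resp_buf. */
unsigned short mid_callback_credits(const struct mid *m)
{
    return m->credits_received;
}
```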

    Cc: Stable
    Fixes: ee258d79159afed5 ("CIFS: Move credit processing to mid callbacks for SMB3")
    Tested-by: Frank Sorenson
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • If a Close command is interrupted before sending a request
    to the server the client ends up leaking an open file
    handle. This wastes server resources and can potentially
    block applications that try to remove the file or any
    directory containing this file.

    Fix this by putting the close command into a worker queue,
    so another thread retries it later.
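
    A deferred-close queue with a retry pass might look like this sketch. It is hypothetical throughout: `struct close_queue`, `queue_close()`, `retry_closes()`, and the `send_close()` stub (which simply succeeds while `connected` is true) stand in for the kernel's worker and SMB2 CLOSE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 32   /* hypothetical capacity */

/* Hypothetical queue of closes that could not be sent. */
struct close_queue {
    uint64_t fids[MAX_PENDING];
    size_t count;
};

/* Called when a close was interrupted before reaching the server. */
bool queue_close(struct close_queue *q, uint64_t fid)
{
    if (q->count == MAX_PENDING)
        return false;
    q->fids[q->count++] = fid;
    return true;
}

/* Stand-in for sending SMB2 CLOSE: succeeds only while connected. */
bool send_close(bool connected, uint64_t fid)
{
    (void)fid;
    return connected;
}

/* Worker body: retry every pending close, keep those that fail again,
 * and return how many were completed. */
size_t retry_closes(struct close_queue *q, bool connected)
{
    size_t kept = 0;
    for (size_t i = 0; i < q->count; i++)
        if (!send_close(connected, q->fids[i]))
            q->fids[kept++] = q->fids[i];
    size_t done = q->count - kept;
    q->count = kept;
    return done;
}
```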

    Cc: Stable
    Tested-by: Frank Sorenson
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • Currently the client translates O_SYNC and O_DIRECT flags
    into corresponding SMB create options when opening a file.
    The problem is that on reconnect, when the file is being
    re-opened, the client doesn't set those flags, which causes
    the server to reject re-open requests because the create options
    don't match. As a result, any subsequent system
    call against that open file fails until the share is re-mounted.

    Fix this by properly setting SMB create options when
    re-opening files after reconnects.
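
    One way to keep the open and reopen paths consistent is a single mapping helper used by both. The helper name is hypothetical; the option values are FILE_WRITE_THROUGH (0x2) and FILE_NO_INTERMEDIATE_BUFFERING (0x8) from the SMB2 CREATE CreateOptions field in MS-SMB2, under the names the cifs client uses. On Linux, O_SYNC includes the O_DSYNC bit, so O_DSYNC alone also maps to write-through here.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT on glibc */
#include <fcntl.h>

/* SMB2 CreateOptions bits (MS-SMB2, SMB2 CREATE request). */
#define CREATE_WRITE_THROUGH 0x00000002
#define CREATE_NO_BUFFER     0x00000008

/* Hypothetical helper: call it for both the initial open and the
 * reopen after reconnect so the two requests always agree. */
unsigned int flags_to_create_options(int oflags)
{
    unsigned int opts = 0;

    if (oflags & O_SYNC)
        opts |= CREATE_WRITE_THROUGH;
#ifdef O_DIRECT
    if (oflags & O_DIRECT)
        opts |= CREATE_NO_BUFFER;
#endif
    return opts;
}
```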

    Fixes: 1013e760d10e6: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags")
    Cc: Stable
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • The smb2/smb3 message checking code was logging to dmesg when mounting
    with encryption ("seal") for compounded SMB3 requests. When encrypted,
    the whole frame (potentially including multiple compounds) is read,
    so the length field is larger than in the non-encrypted
    case (where the length field matches the calculated length for
    the particular SMB3 request in the compound being validated).

    Avoids the warning on mount (with "seal"):

    "srv rsp padded more than expected. Length 384 not ..."
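
    The relaxed check can be modeled as a small predicate. This is a hypothetical simplification (`rsp_length_ok()` is not the cifs function), and the tolerance of under 8 bytes of padding for unencrypted responses is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* len:      length reported for the received frame
 * calc_len: length computed for this one SMB3 command
 * sealed:   frame was decrypted from an encrypted ("seal") transform */
bool rsp_length_ok(uint32_t len, uint32_t calc_len, bool sealed)
{
    if (len == calc_len)
        return true;
    /* A decrypted frame may hold several compounded responses, so it
     * can legitimately be longer than this one command. */
    if (sealed && len > calc_len)
        return true;
    /* Assumption: small alignment padding (< 8 bytes) is tolerated. */
    return len > calc_len && len - calc_len < 8;
}
```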

    Signed-off-by: Steve French

    Steve French