24 Oct, 2020

1 commit

  • This is needed so when mounting to Windows we do not
    misinterpret various special files created by Linux (WSL) as symlinks.
    An earlier patch addressed readdir. This patch fixes stat (getattr).

    With this patch:
      File: /mnt1/char
      Size: 0          Blocks: 0          IO Block: 16384  character special file
    Device: 34h/52d Inode: 844424930132069  Links: 1     Device type: 0,0
    Access: (0755/crwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2020-10-21 17:46:51.839458900 -0500
    Modify: 2020-10-21 17:46:51.839458900 -0500
    Change: 2020-10-21 18:30:39.797358800 -0500
     Birth: -
      File: /mnt1/fifo
      Size: 0          Blocks: 0          IO Block: 16384  fifo
    Device: 34h/52d Inode: 1125899906842722  Links: 1
    Access: (0755/prwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2020-10-21 16:21:37.259249700 -0500
    Modify: 2020-10-21 16:21:37.259249700 -0500
    Change: 2020-10-21 18:30:39.797358800 -0500
     Birth: -
      File: /mnt1/block
      Size: 0          Blocks: 0          IO Block: 16384  block special file
    Device: 34h/52d Inode: 844424930132068  Links: 1     Device type: 0,0
    Access: (0755/brwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2020-10-21 17:10:47.913103200 -0500
    Modify: 2020-10-21 17:10:47.913103200 -0500
    Change: 2020-10-21 18:30:39.796725500 -0500
     Birth: -

    Without the patch, all three show up incorrectly as symlinks, and stat also reports an annoying "Operation not supported" error:
      File: /mnt1/charstat: cannot read symbolic link '/mnt1/char': Operation not supported

      Size: 0          Blocks: 0          IO Block: 16384  symbolic link
    Device: 34h/52d Inode: 844424930132069  Links: 1
    Access: (0000/l---------)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2020-10-21 17:46:51.839458900 -0500
    Modify: 2020-10-21 17:46:51.839458900 -0500
    Change: 2020-10-21 18:30:39.797358800 -0500
     Birth: -
      File: /mnt1/fifostat: cannot read symbolic link '/mnt1/fifo': Operation not supported

      Size: 0          Blocks: 0          IO Block: 16384  symbolic link
    Device: 34h/52d Inode: 1125899906842722  Links: 1
    Access: (0000/l---------)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2020-10-21 16:21:37.259249700 -0500
    Modify: 2020-10-21 16:21:37.259249700 -0500
    Change: 2020-10-21 18:30:39.797358800 -0500
     Birth: -
      File: /mnt1/blockstat: cannot read symbolic link '/mnt1/block': Operation not supported

      Size: 0          Blocks: 0          IO Block: 16384  symbolic link
    Device: 34h/52d Inode: 844424930132068  Links: 1
    Access: (0000/l---------)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2020-10-21 17:10:47.913103200 -0500
    Modify: 2020-10-21 17:10:47.913103200 -0500
    Change: 2020-10-21 18:30:39.796725500 -0500

    Signed-off-by: Steve French
    Reviewed-by: Ronnie Sahlberg

    Steve French
     

23 Oct, 2020

1 commit

  • This and related patches, which move mount-related
    code to fs_context.c, have the advantage of
    shrinking the code in fs/cifs/connect.c (which had
    the second most lines of code of any of the files
    in cifs.ko and was getting harder to read due
    to its size) and will also make it easier to
    switch over to the new mount API in the future.

    Signed-off-by: Ronnie Sahlberg
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Steve French

    Ronnie Sahlberg
     

16 Oct, 2020

3 commits

  • Add new module load parameter enable_gcm_256. If set, then add
    AES-256-GCM (strongest encryption type) to the list of encryption
    types requested. Put it in the list as the second choice (since
    AES-128-GCM is faster and much more broadly supported by
    SMB3 servers). To make this stronger encryption type, GCM-256,
    required (the first and only choice), you would use the module
    parameter "require_gcm_256".

    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French

    Steve French
     
  • Add new module load parameter require_gcm_256. If set, then only
    request AES-256-GCM (strongest encryption type).
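
    Taken together with enable_gcm_256 from the entry above, the preference order could be sketched as follows. This is a rough illustration, not the module's code: build_cipher_list() is a hypothetical name, the cipher IDs are the ones defined for SMB2_ENCRYPTION_CAPABILITIES in MS-SMB2, and keeping AES-128-CCM as the last fallback is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cipher IDs from MS-SMB2 (SMB2_ENCRYPTION_CAPABILITIES). */
#define SMB2_ENCRYPTION_AES128_CCM 0x0001
#define SMB2_ENCRYPTION_AES128_GCM 0x0002
#define SMB2_ENCRYPTION_AES256_GCM 0x0004

/*
 * Hypothetical helper: fill `ciphers` (room for 3 entries) in preference
 * order and return how many entries were written.
 */
size_t build_cipher_list(bool enable_gcm_256, bool require_gcm_256,
                         uint16_t ciphers[3])
{
    size_t n = 0;

    if (require_gcm_256) {
        /* First and only choice. */
        ciphers[n++] = SMB2_ENCRYPTION_AES256_GCM;
        return n;
    }
    ciphers[n++] = SMB2_ENCRYPTION_AES128_GCM;      /* faster, widely supported */
    if (enable_gcm_256)
        ciphers[n++] = SMB2_ENCRYPTION_AES256_GCM;  /* second choice */
    ciphers[n++] = SMB2_ENCRYPTION_AES128_CCM;      /* assumed fallback */
    return n;
}
```

    With enable_gcm_256 set the list is GCM-128, GCM-256, CCM-128; with require_gcm_256 set it collapses to GCM-256 alone.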

    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French

    Steve French
     
  • Currently STATUS_IO_TIMEOUT is not treated as a retriable error.
    It is mapped to ETIMEDOUT and returned to userspace
    for most system calls. STATUS_IO_TIMEOUT is returned by the server
    in case of unavailability or throttling errors.

    This patch maps STATUS_IO_TIMEOUT to EAGAIN so that the request
    can be retried. It also adds a check that drops the connection, so
    as not to overload the server in case of ongoing unavailability.
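
    A minimal sketch of the remapping, assuming a simple switch-based translation table: map_status_to_errno() and is_retriable() are hypothetical names, and the catch-all -EIO is only a placeholder for the many other statuses the real table handles. The NTSTATUS value is the one listed in MS-ERREF.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUCCESS    0x00000000U
#define STATUS_IO_TIMEOUT 0xC00000B5U   /* NTSTATUS value from MS-ERREF */

/* Hypothetical translation from an SMB status to a negative errno. */
int map_status_to_errno(uint32_t status)
{
    switch (status) {
    case STATUS_SUCCESS:
        return 0;
    case STATUS_IO_TIMEOUT:
        return -EAGAIN;     /* previously -ETIMEDOUT, which callers did not retry */
    default:
        return -EIO;        /* placeholder for every other status */
    }
}

/* Callers can key their retry loop on this. */
bool is_retriable(int rc)
{
    return rc == -EAGAIN;
}
```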

    Signed-off-by: Rohith Surabattula
    Reviewed-by: Aurelien Aptel
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Rohith Surabattula
     

29 Aug, 2020

1 commit

  • For SMB1, the DFS flag should be checked against tcon->Flags rather
    than tcon->share_flags. While at it, add an is_tcon_dfs() helper to
    check for DFS capability in a more generic way.
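
    The dialect-dependent check might look roughly like this. The struct below is a simplified stand-in for the real tcon (only the two fields named above plus an assumed is_smb1 marker), and the flag values are the ones documented in MS-CIFS and MS-SMB2; treat it as a sketch of the idea, not the helper's actual body.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SMB_SHARE_IS_IN_DFS    0x0002      /* SMB1 tree connect response bit */
#define SHI1005_FLAGS_DFS      0x00000001  /* SMB2+ share flags */
#define SHI1005_FLAGS_DFS_ROOT 0x00000002

/* Simplified stand-in for the tree connection structure. */
struct tcon_sketch {
    bool     is_smb1;       /* assumed marker for the negotiated dialect */
    uint16_t Flags;         /* meaningful for SMB1 only */
    uint32_t share_flags;   /* meaningful for SMB2 and later */
};

bool is_tcon_dfs(const struct tcon_sketch *tcon)
{
    if (tcon == NULL)
        return false;
    if (tcon->is_smb1)
        return (tcon->Flags & SMB_SHARE_IS_IN_DFS) != 0;
    return (tcon->share_flags &
            (SHI1005_FLAGS_DFS | SHI1005_FLAGS_DFS_ROOT)) != 0;
}
```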

    Signed-off-by: Paulo Alcantara (SUSE)
    Signed-off-by: Steve French
    Reviewed-by: Shyam Prasad N

    Paulo Alcantara
     

07 Aug, 2020

1 commit

  • Pull cifs updates from Steve French:
    "16 cifs/smb3 fixes, about half DFS related, two fixes for stable.

    Still working on and testing an additional set of fixes (including
    updates to mount, and some fallocate scenario improvements) for later
    in the merge window"

    * tag '5.9-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: document and cleanup dfs mount
    cifs: only update prefix path of DFS links in cifs_tree_connect()
    cifs: fix double free error on share and prefix
    cifs: handle RESP_GET_DFS_REFERRAL.PathConsumed in reconnect
    cifs: handle empty list of targets in cifs_reconnect()
    cifs: rename reconn_inval_dfs_target()
    cifs: reduce number of referral requests in DFS link lookups
    cifs: merge __{cifs,smb2}_reconnect[_tcon]() into cifs_tree_connect()
    cifs: convert to use be32_add_cpu()
    cifs: delete duplicated words in header files
    cifs: Remove the superfluous break
    cifs: smb1: Try failing back to SetFileInfo if SetPathInfo fails
    cifs: handle ERRBaduid for SMB1
    cifs: remove unused variable 'server'
    smb3: warn on confusing error scenario with sec=krb5
    cifs: Fix leak when handling lease break for cached root fid

    Linus Torvalds
     

03 Aug, 2020

1 commit


06 Jul, 2020

1 commit

  • Rationale:
    Reduces attack surface on kernel devs opening the links for MITM
    as HTTPS traffic is much harder to manipulate.

    Deterministic algorithm:
    For each file:
      If not .svg:
        For each line:
          If doesn't contain `\bxmlns\b`:
            For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
              If both the HTTP and HTTPS versions
              return 200 OK and serve the same content:
                Replace HTTP with HTTPS.

    Signed-off-by: Alexander A. Klimov
    Reviewed-by: Aurelien Aptel
    Link: https://lore.kernel.org/r/20200627103125.71828-1-grandmaster@al2klimov.de
    Signed-off-by: Jonathan Corbet

    Alexander A. Klimov
     

09 Jun, 2020

1 commit


05 Jun, 2020

2 commits

  • Add a cifs_chan pointer in struct cifs_ses that points to the channel
    currently being bound if ses->binding is true.

    Previously it was always the channel past the established count.

    This will make reconnecting (and rebinding) a channel easier later on.

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • Move the channel (TCP_Server_Info*) selection from the transport
    layer to higher in the call stack so that:

    - credit handling is done with the server that will actually be used
    to send.
    * ->wait_mtu_credit
    * ->set_credits / set_credits
    * ->add_credits / add_credits
    * add_credits_and_wake_if

    - potential reconnection (smb2_reconnect) done when initializing a
    request is checked and done with the server that will actually be
    used to send.

    To do this:

    - remove the cifs_pick_channel() call out of compound_send_recv()

    - select channel and pass it down by adding a cifs_pick_channel(ses)
    call in:
    - smb311_posix_mkdir
    - SMB2_open
    - SMB2_ioctl
    - __SMB2_close
    - query_info
    - SMB2_change_notify
    - SMB2_flush
    - smb2_async_readv (if none provided in context param)
    - SMB2_read (if none provided in context param)
    - smb2_async_writev (if none provided in context param)
    - SMB2_write (if none provided in context param)
    - SMB2_query_directory
    - send_set_info
    - SMB2_oplock_break
    - SMB311_posix_qfs_info
    - SMB2_QFS_info
    - SMB2_QFS_attr
    - smb2_lockv
    - SMB2_lease_break
    - smb2_compound_op
    - smb2_set_ea
    - smb2_ioctl_query_info
    - smb2_query_dir_first
    - smb2_query_info_compound
    - smb2_query_symlink
    - cifs_writepages
    - cifs_write_from_iter
    - cifs_send_async_read
    - cifs_read
    - cifs_readpages

    - add TCP_Server_Info *server param argument to:
    - cifs_send_recv
    - compound_send_recv
    - SMB2_open_init
    - SMB2_query_info_init
    - SMB2_set_info_init
    - SMB2_close_init
    - SMB2_ioctl_init
    - smb2_ioctl_req_init
    - SMB2_query_directory_init
    - SMB2_notify_init
    - SMB2_flush_init
    - build_qfs_info_req
    - smb2_hdr_assemble
    - smb2_reconnect
    - fill_small_buf
    - smb2_plain_req_init
    - __smb2_plain_req_init

    The read/write codepath is different than the rest as it is using
    pages, io iterators and async calls. To deal with those we add a
    server pointer in the cifs_writedata/cifs_readdata/cifs_io_parms
    context struct and set it in:

    - cifs_writepages (wdata)
    - cifs_write_from_iter (wdata)
    - cifs_readpages (rdata)
    - cifs_send_async_read (rdata)

    The [rw]data->server pointer is eventually copied to
    cifs_io_parms->server to pass it down to SMB2_read/SMB2_write.
    If SMB2_read/SMB2_write is called from a different place that doesn't
    set the server field it will pick a channel.
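
    The "use the provided server, otherwise pick a channel" rule can be illustrated with a toy model. Everything here is hypothetical: the struct and function names are stand-ins, and the round-robin policy in pick_channel() is an assumption of the sketch, not a statement about how cifs_pick_channel() chooses.

```c
#include <stddef.h>

struct server_sketch { int id; };

struct session_sketch {
    struct server_sketch *chans[4];
    size_t chan_count;
    unsigned int chan_seq;          /* bumped on every pick */
};

struct io_parms_sketch {
    struct server_sketch *server;   /* NULL means "not chosen yet" */
};

/* Assumed policy: plain round-robin over the session's channels. */
struct server_sketch *pick_channel(struct session_sketch *ses)
{
    return ses->chans[ses->chan_seq++ % ses->chan_count];
}

/*
 * Stand-in for the top of a read/write request: resolve the server once,
 * so credit handling and the actual send use the same channel.
 */
struct server_sketch *resolve_server(struct session_sketch *ses,
                                     struct io_parms_sketch *parms)
{
    if (parms->server == NULL)
        parms->server = pick_channel(ses);
    return parms->server;
}
```

    Because the choice is stored in the parameters, every later step of the same request sees the same server instead of picking again.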

    Some places do not pick a channel and just use ses->server or
    cifs_ses_server(ses). All cifs_ses_server(ses) calls are in codepaths
    involving negprot/sess.setup.

    - SMB2_negotiate (binding channel)
    - SMB2_sess_alloc_buffer (binding channel)
    - SMB2_echo (uses provided one)
    - SMB2_logoff (uses master)
    - SMB2_tdis (uses master)

    (list not exhaustive)

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     

01 Jun, 2020

1 commit

  • In order to handle workloads where it is important to make sure that
    a buggy app did not delete content on the drive, the new mount option
    "nodelete" allows standard permission checks on the server to work,
    but prevents on the client any attempts to unlink a file or delete
    a directory on that mount point. This can be helpful when running
    a little understood app on a network mount that contains important
    content that should not be deleted.
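
    Conceptually the option is a client-side gate in front of unlink and rmdir. The sketch below is a guess at the shape of such a check: the struct, the function name, and the choice of -EACCES as the error code are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-mount state; only the field relevant here. */
struct mount_sketch {
    bool nodelete;
};

/*
 * Returns 0 when the delete may be sent to the server (which still applies
 * its own permission checks), or a negative errno when the client refuses
 * locally. The specific errno is an assumption.
 */
int client_may_delete(const struct mount_sketch *mnt)
{
    return mnt->nodelete ? -EACCES : 0;
}
```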

    Signed-off-by: Steve French
    CC: Stable
    Reviewed-by: Pavel Shilovsky

    Steve French
     

22 Apr, 2020

1 commit


11 Apr, 2020

1 commit

  • Add experimental support for allowing a swap file to be on an SMB3
    mount. There are use cases where swapping over a secure network
    filesystem is preferable. In some cases there are no local
    block devices large enough, and network block devices can be
    hard to setup and secure. And in some cases there are no
    local block devices at all (e.g. with the recent addition of
    remote boot over SMB3 mounts).

    There are various enhancements that can be added later e.g.:
    - doing a mandatory byte range lock over the swapfile (until
    the Linux VFS is modified to notify the file system that an open
    is for a swapfile, when the file can be opened "DENY_ALL" to prevent
    others from opening it).
    - pinning more buffers in the underlying transport to minimize memory
    allocations in the TCP stack under the fs
    - documenting how to create ACLs (on the server) to secure the
    swapfile (or adding additional tools to cifs-utils to make it easier)

    Signed-off-by: Steve French
    Acked-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg

    Steve French
     

08 Apr, 2020

1 commit

  • CIFS uses pre-allocated crypto structures to calculate signatures for both
    incoming and outgoing packets. In this way it doesn't need to allocate crypto
    structures for every packet, but it requires a lock to prevent concurrent
    access to crypto structures.

    Remove the lock by allocating crypto structures on the fly for
    incoming packets. At the same time, we can still use pre-allocated crypto
    structures for outgoing packets, as they are already protected by transport
    lock srv_mutex.

    Signed-off-by: Long Li
    Signed-off-by: Steve French

    Long Li
     

25 Feb, 2020

1 commit

  • To rename a file in SMB2 we open it with the DELETE access and do a
    special SetInfo on it. If the handle is missing the DELETE bit the
    server will fail the SetInfo with STATUS_ACCESS_DENIED.

    We currently try to reuse any existing opened handle we have with
    cifs_get_writable_path(). That function looks for handles with WRITE
    access but doesn't check for DELETE, making rename() fail if it finds
    a handle to reuse. Simple reproducer below.

    To select handles with the DELETE bit, this patch adds a flag argument
    to cifs_get_writable_path() and find_writable_file() and the existing
    'bool fsuid_only' argument is converted to a flag.

    The cifsFileInfo struct only stores the UNIX open mode but not the
    original SMB access flags. Since the DELETE bit is not mapped in that
    mode, this patch stores the access mask in cifs_fid on file open,
    which is accessible from cifsFileInfo.
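
    A simplified model of the flag-based lookup, assuming a flat array of open handles: the flag names, the struct, and find_writable() are illustrative and not the patch's identifiers. Only the DELETE access-mask bit (0x00010000) is taken from the protocol definition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DELETE_ACCESS 0x00010000U   /* DELETE bit of the SMB access mask */

/* Hypothetical lookup flags replacing the old bool argument. */
#define FIND_ANY         0x0
#define FIND_FSUID_ONLY  0x1
#define FIND_WITH_DELETE 0x2

struct handle_sketch {
    bool     writable;  /* derived from the UNIX open mode */
    uint32_t access;    /* SMB access mask saved at open time */
    unsigned uid;       /* owner of the open */
};

struct handle_sketch *find_writable(struct handle_sketch *h, size_t n,
                                    unsigned fsuid, int flags)
{
    for (size_t i = 0; i < n; i++) {
        if (!h[i].writable)
            continue;
        if ((flags & FIND_FSUID_ONLY) && h[i].uid != fsuid)
            continue;
        /* A rename needs a handle that was opened with DELETE. */
        if ((flags & FIND_WITH_DELETE) && !(h[i].access & DELETE_ACCESS))
            continue;
        return &h[i];
    }
    return NULL;
}
```

    When no suitable handle exists the caller gets NULL and can open a fresh one with the DELETE bit instead of reusing a write-only handle.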

    Simple reproducer:

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #define E(s) perror(s), exit(1)

    int main(int argc, char *argv[])
    {
        int fd, ret;
        if (argc != 3) {
            fprintf(stderr, "Usage: %s A B\n"
                    "create&open A in write mode, "
                    "rename A to B, close A\n", argv[0]);
            return 0;
        }

        fd = openat(AT_FDCWD, argv[1], O_WRONLY|O_CREAT|O_SYNC, 0666);
        if (fd == -1) E("openat()");

        ret = rename(argv[1], argv[2]);
        if (ret) E("rename()");

        ret = close(fd);
        if (ret) E("close()");

        return ret;
    }

    $ gcc -o bugrename bugrename.c
    $ ./bugrename /mnt/a /mnt/b
    rename(): Permission denied

    Fixes: 8de9e86c67ba ("cifs: create a helper to find a writeable handle by path name")
    CC: Stable
    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French
    Reviewed-by: Pavel Shilovsky
    Reviewed-by: Paulo Alcantara (SUSE)

    Aurelien Aptel
     

06 Feb, 2020

1 commit

  • A commonly used SMB3 feature is change notification, allowing an
    app to be notified about changes to a directory. The SMB3
    Notify request blocks until the server detects a change to that
    directory or its contents that matches the completion flags
    that were passed in and the "watch_tree" flag (which indicates
    whether subdirectories under this directory should be also
    included). See MS-SMB2 2.2.35 for additional detail.

    To use this simply pass in the following structure to ioctl:

    struct __attribute__((__packed__)) smb3_notify {
            uint32_t completion_filter;
            bool watch_tree;
    };

    using CIFS_IOC_NOTIFY 0x4005cf09
    or equivalently _IOW(CIFS_IOCTL_MAGIC, 9, struct smb3_notify)

    SMB3 change notification is supported by all major servers.
    The ioctl will block until the server detects a change to that
    directory or its subdirectories (if watch_tree is set).

    Signed-off-by: Steve French
    Reviewed-by: Aurelien Aptel
    Acked-by: Paulo Alcantara (SUSE)

    Steve French
     

04 Feb, 2020

1 commit

  • When "backup intent" is requested on the mount (e.g. backupuid or
    backupgid mount options), the corresponding flag was missing from
    some of the operations.

    Change all operations to use the macro cifs_create_options() to
    set the backup intent flag if needed.
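
    The idea behind routing every open through one helper can be sketched like this. The function and struct names are stand-ins chosen for the example; the option values are the FILE_NON_DIRECTORY_FILE and FILE_OPEN_FOR_BACKUP_INTENT create options from the SMB2 CREATE request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CREATE_NOT_DIR            0x00000040U  /* FILE_NON_DIRECTORY_FILE */
#define CREATE_OPEN_BACKUP_INTENT 0x00004000U  /* FILE_OPEN_FOR_BACKUP_INTENT */

/* Hypothetical per-superblock state: was backup intent requested? */
struct sb_sketch {
    bool backup_intent;
};

/* Add the backup intent bit in one place instead of at every call site. */
uint32_t create_options_sketch(const struct sb_sketch *sb, uint32_t options)
{
    if (sb != NULL && sb->backup_intent)
        return options | CREATE_OPEN_BACKUP_INTENT;
    return options;
}
```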

    Signed-off-by: Amir Goldstein
    Signed-off-by: Steve French

    Amir Goldstein
     

27 Jan, 2020

1 commit

  • The task which created the MID may be gone by the time cifsd attempts to
    call the callbacks on MIDs from cifs_reconnect().

    This leads to a use-after-free of the task struct in cifs_wake_up_task:

    ==================================================================
    BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
    Read of size 8 at addr ffff8880103e3a68 by task cifsd/630

    CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    Call Trace:
    dump_stack+0x8e/0xcb
    print_address_description.constprop.5+0x1d3/0x3c0
    ? __lock_acquire+0x31a0/0x3270
    __kasan_report+0x152/0x1aa
    ? __lock_acquire+0x31a0/0x3270
    ? __lock_acquire+0x31a0/0x3270
    kasan_report+0xe/0x20
    __lock_acquire+0x31a0/0x3270
    ? __wake_up_common+0x1dc/0x630
    ? _raw_spin_unlock_irqrestore+0x4c/0x60
    ? mark_held_locks+0xf0/0xf0
    ? _raw_spin_unlock_irqrestore+0x39/0x60
    ? __wake_up_common_lock+0xd5/0x130
    ? __wake_up_common+0x630/0x630
    lock_acquire+0x13f/0x330
    ? try_to_wake_up+0xa3/0x19e0
    _raw_spin_lock_irqsave+0x38/0x50
    ? try_to_wake_up+0xa3/0x19e0
    try_to_wake_up+0xa3/0x19e0
    ? cifs_compound_callback+0x178/0x210
    ? set_cpus_allowed_ptr+0x10/0x10
    cifs_reconnect+0xa1c/0x15d0
    ? generic_ip_connect+0x1860/0x1860
    ? rwlock_bug.part.0+0x90/0x90
    cifs_readv_from_socket+0x479/0x690
    cifs_read_from_socket+0x9d/0xe0
    ? cifs_readv_from_socket+0x690/0x690
    ? mempool_resize+0x690/0x690
    ? rwlock_bug.part.0+0x90/0x90
    ? memset+0x1f/0x40
    ? allocate_buffers+0xff/0x340
    cifs_demultiplex_thread+0x388/0x2a50
    ? cifs_handle_standard+0x610/0x610
    ? rcu_read_lock_held_common+0x120/0x120
    ? mark_lock+0x11b/0xc00
    ? __lock_acquire+0x14ed/0x3270
    ? __kthread_parkme+0x78/0x100
    ? lockdep_hardirqs_on+0x3e8/0x560
    ? lock_downgrade+0x6a0/0x6a0
    ? lockdep_hardirqs_on+0x3e8/0x560
    ? _raw_spin_unlock_irqrestore+0x39/0x60
    ? cifs_handle_standard+0x610/0x610
    kthread+0x2bb/0x3a0
    ? kthread_create_worker_on_cpu+0xc0/0xc0
    ret_from_fork+0x3a/0x50

    Allocated by task 649:
    save_stack+0x19/0x70
    __kasan_kmalloc.constprop.5+0xa6/0xf0
    kmem_cache_alloc+0x107/0x320
    copy_process+0x17bc/0x5370
    _do_fork+0x103/0xbf0
    __x64_sys_clone+0x168/0x1e0
    do_syscall_64+0x9b/0xec0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 0:
    save_stack+0x19/0x70
    __kasan_slab_free+0x11d/0x160
    kmem_cache_free+0xb5/0x3d0
    rcu_core+0x52f/0x1230
    __do_softirq+0x24d/0x962

    The buggy address belongs to the object at ffff8880103e32c0
    which belongs to the cache task_struct of size 6016
    The buggy address is located 1960 bytes inside of
    6016-byte region [ffff8880103e32c0, ffff8880103e4a40)
    The buggy address belongs to the page:
    page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0
    index:0xffff8880103e4c00 compound_mapcount: 0
    raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0
    raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================

    This can be reliably reproduced by adding the below delay to
    cifs_reconnect(), running find(1) on the mount, restarting the samba
    server while find is running, and killing find during the delay:

    spin_unlock(&GlobalMid_Lock);
    mutex_unlock(&server->srv_mutex);

    + msleep(10000);
    +
    cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
    list_for_each_safe(tmp, tmp2, &retry_list) {
    mid_entry = list_entry(tmp, struct mid_q_entry, qhead);

    Fix this by holding a reference to the task struct until the MID is
    freed.
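
    The lifetime rule can be shown with a toy reference count. This is a single-threaded userspace analogy with hypothetical names; in the kernel the corresponding primitives are get_task_struct() and put_task_struct(), which use atomic counters.

```c
#include <stddef.h>

struct task_sketch {
    int refs;                       /* 1 while the task itself is alive */
};

struct mid_sketch {
    struct task_sketch *creator;    /* task to wake from the callback */
};

void task_get(struct task_sketch *t)
{
    t->refs++;
}

/* Returns 1 when the last reference was dropped and the task may be freed. */
int task_put(struct task_sketch *t)
{
    return --t->refs == 0;
}

/* The MID pins its creator for as long as the MID exists. */
void mid_init(struct mid_sketch *mid, struct task_sketch *creator)
{
    task_get(creator);
    mid->creator = creator;
}

/* Returns 1 if releasing the MID dropped the final task reference. */
int mid_release(struct mid_sketch *mid)
{
    int last = task_put(mid->creator);
    mid->creator = NULL;
    return last;
}
```

    If the creating task exits while the MID is still queued, its own put does not reach zero, so a later wake-up from the reconnect path still touches valid memory.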

    Signed-off-by: Vincent Whitchurch
    Signed-off-by: Steve French
    CC: Stable
    Reviewed-by: Paulo Alcantara (SUSE)
    Reviewed-by: Pavel Shilovsky

    Vincent Whitchurch
     

23 Dec, 2019

1 commit

  • When listing a directory with thousands of files, most of them
    reparse points, we simply marked all those dentries for revalidation
    and then sent additional (compounded) create/getinfo/close requests
    for each of them.

    Instead, upon receiving a response from an SMB2_QUERY_DIRECTORY
    (FileIdFullDirectoryInformation) command, the directory entries that
    have a file attribute of FILE_ATTRIBUTE_REPARSE_POINT will contain an
    EaSize field with a reparse tag in it, so we parse it and mark the
    dentry for revalidation only if it is a DFS or a symlink.
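
    The decision described above can be sketched as a small predicate. It is deliberately simplified: only the DFS and symlink tags mentioned in the text are checked, the function name is hypothetical, and the constants are the values documented by Microsoft for the reparse-point attribute and tags.

```c
#include <stdbool.h>
#include <stdint.h>

#define FILE_ATTRIBUTE_REPARSE_POINT 0x00000400U
#define IO_REPARSE_TAG_DFS           0x8000000AU
#define IO_REPARSE_TAG_SYMLINK       0xA000000CU

/*
 * Hypothetical predicate: for a reparse point, the EaSize field of the
 * directory entry carries the reparse tag, so it is read as a tag here.
 */
bool entry_needs_revalidation(uint32_t file_attributes, uint32_t ea_size)
{
    if (!(file_attributes & FILE_ATTRIBUTE_REPARSE_POINT))
        return false;

    switch (ea_size) {
    case IO_REPARSE_TAG_DFS:
    case IO_REPARSE_TAG_SYMLINK:
        return true;
    default:
        return false;   /* other reparse points need no extra round trip */
    }
}
```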

    Signed-off-by: Paulo Alcantara (SUSE)
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
     

13 Dec, 2019

1 commit

  • SMB2_tdis() checks if a root handle is valid in order to decide
    whether it needs to close the handle or not. However, if another
    thread holds a reference to the handle, it may end up putting
    the reference twice. The extra reference that we want to put
    during the tree disconnect is the reference that has a directory
    lease. So, track the fact that we have a directory lease and
    close the handle only in that case.

    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French

    Pavel Shilovsky
     

05 Dec, 2019

1 commit

  • With the addition of SMB session channels, we introduced new TCP
    server pointers that have no sessions or tcons associated with them.

    In this case, when we started looking for TCP connections, we might
    end up picking a session channel rather than the master connection,
    hence failing to get either a session or a tcon.

    In order to fix that, this patch introduces a new "is_channel" field
    to TCP_Server_Info structure so we can skip session channels during
    lookup of connections.
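
    A toy version of the lookup, assuming connections are matched by hostname only: the struct, its fields other than is_channel, and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct server_sketch {
    const char *hostname;
    bool        is_channel;   /* secondary channel: no sessions or tcons */
};

/* Return the first matching master connection, skipping session channels. */
struct server_sketch *find_tcp_session(struct server_sketch *list, size_t n,
                                       const char *hostname)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].is_channel)
            continue;
        if (strcmp(list[i].hostname, hostname) == 0)
            return &list[i];
    }
    return NULL;
}
```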

    Signed-off-by: Paulo Alcantara (SUSE)
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
     

04 Dec, 2019

1 commit

  • Since timestamps on files on most servers can be updated at
    close, and since timestamps on our dentries default to one
    second we can have stale timestamps in some common cases
    (e.g. open, write, close, stat, wait one second, stat - will
    show different mtime for the first and second stat).

    The SMB2/SMB3 protocol allows querying timestamps at close
    so add the code to request timestamp and attr information
    (which is cheap for the server to provide) to be returned
    when a file is closed. It is not needed for the many
    paths that call SMB2_close from compounded
    query infos and close, nor is it needed for some of
    the cases where a directory close immediately follows a
    directory open.

    Signed-off-by: Steve French
    Acked-by: Ronnie Sahlberg
    Reviewed-by: Aurelien Aptel
    Reviewed-by: Pavel Shilovsky

    Steve French
     

25 Nov, 2019

10 commits

  • The number of requests in_send and the number of waiters on sendRecv
    are useful counters in various cases. Move them out from under
    CONFIG_CIFS_STATS2 so they are on by default, especially with multichannel.

    Signed-off-by: Steve French
    Acked-by: Ronnie Sahlberg

    Steve French
     
  • Currently we don't assume that a server may break a lease
    from RWH to RW, which causes us to set a wrong lease state
    on a file and thus mistakenly flush data and byte-range
    locks and purge cached data on the client. This leads to
    performance degradation because subsequent IOs go directly
    to the server.

    Fix this by propagating new lease state and epoch values
    to the oplock break handler through cifsFileInfo structure
    and removing the use of cifsInodeInfo flags for that. This
    also avoids some races between several lease/oplock breaks
    using those flags in parallel.
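
    Why an RWH to RW break should not trigger a flush or purge can be seen from the lease bits themselves. The sketch below is an illustration only: the function and struct are hypothetical, and the rule "act only on caching rights that were actually lost" is a simplification of what the handler does. The bit values are the lease state flags from MS-SMB2.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMB2_LEASE_READ_CACHING   0x01  /* R */
#define SMB2_LEASE_HANDLE_CACHING 0x02  /* H */
#define SMB2_LEASE_WRITE_CACHING  0x04  /* W */

struct break_actions {
    bool flush_dirty_data;   /* write caching was lost */
    bool purge_read_cache;   /* read caching was lost */
};

/* Decide what the client must do given the old and the new lease state. */
struct break_actions on_lease_break(uint8_t old_state, uint8_t new_state)
{
    struct break_actions a;

    a.flush_dirty_data = (old_state & SMB2_LEASE_WRITE_CACHING) &&
                         !(new_state & SMB2_LEASE_WRITE_CACHING);
    a.purge_read_cache = (old_state & SMB2_LEASE_READ_CACHING) &&
                         !(new_state & SMB2_LEASE_READ_CACHING);
    return a;
}
```

    For RWH to RW only the handle-caching bit goes away, so neither action fires and subsequent I/O can keep using the cache.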

    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • This patch moves the final part of the cifsFileInfo_put() logic where we
    need a write lock on lock_sem to be processed in a separate thread that
    holds no other locks.
    This is to prevent deadlocks like the one below:

    > there are 6 processes looping to while trying to down_write
    > cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
    > cifs_new_fileinfo
    >
    > and there are 5 other processes which are blocked, several of them
    > waiting on either PG_writeback or PG_locked (which are both set), all
    > for the same page of the file
    >
    > 2 inode_lock() (inode->i_rwsem) for the file
    > 1 wait_on_page_writeback() for the page
    > 1 down_read(inode->i_rwsem) for the inode of the directory
    > 1 inode_lock()(inode->i_rwsem) for the inode of the directory
    > 1 __lock_page
    >
    >
    > so processes are blocked waiting on:
    > page flags PG_locked and PG_writeback for one specific page
    > inode->i_rwsem for the directory
    > inode->i_rwsem for the file
    > cifsInodeInfo->lock_sem
    >
    >
    >
    > here are the more gory details (let me know if I need to provide
    > anything more/better):
    >
    > [0 00:48:22.765] [UN] PID: 8863 TASK: ffff8c691547c5c0 CPU: 3
    > COMMAND: "reopen_file"
    > #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
    > #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
    > #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
    > #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
    > #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
    > #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
    > #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
    > #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
    > * (I think legitimize_path is bogus)
    >
    > in path_openat
    > } else {
    > const char *s = path_init(nd, flags);
    > while (!(error = link_path_walk(s, nd)) &&
    > (error = do_last(nd, file, op)) > 0) { <<<<
    >
    > do_last:
    > if (open_flag & O_CREAT)
    > inode_lock(dir->d_inode); <<<<
    > else
    > so it's trying to take inode->i_rwsem for the directory
    >
    > DENTRY INODE SUPERBLK TYPE PATH
    > ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR /mnt/vm1_smb/
    > inode.i_rwsem is ffff8c691158efc0
    >
    > :
    > owner: (UN - 8856 -
    > reopen_file), counter: 0x0000000000000003
    > waitlist: 2
    > 0xffff9965007e3c90 8863 reopen_file UN 0 1:29:22.926
    > RWSEM_WAITING_FOR_WRITE
    > 0xffff996500393e00 9802 ls UN 0 1:17:26.700
    > RWSEM_WAITING_FOR_READ
    >
    >
    > the owner of the inode.i_rwsem of the directory is:
    >
    > [0 00:00:00.109] [UN] PID: 8856 TASK: ffff8c6914275d00 CPU: 3
    > COMMAND: "reopen_file"
    > #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
    > #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
    > #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
    > #3 [ffff99650065b940] msleep at ffffffff9af573a9
    > #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
    > #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
    > #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
    > #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
    > #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
    > #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
    > #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
    > #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
    > #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
    > #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
    > #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
    >
    > cifs_launder_page is for page 0xffffd1e2c07d2480
    >
    > crash> page.index,mapping,flags 0xffffd1e2c07d2480
    > index = 0x8
    > mapping = 0xffff8c68f3cd0db0
    > flags = 0xfffffc0008095
    >
    > PAGE-FLAG BIT VALUE
    > PG_locked 0 0000001
    > PG_uptodate 2 0000004
    > PG_lru 4 0000010
    > PG_waiters 7 0000080
    > PG_writeback 15 0008000
    >
    >
    > inode is ffff8c68f3cd0c40
    > inode.i_rwsem is ffff8c68f3cd0ce0
    > DENTRY INODE SUPERBLK TYPE PATH
    > ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
    > /mnt/vm1_smb/testfile.8853
    >
    >
    > this process holds the inode->i_rwsem for the parent directory, is
    > laundering a page attached to the inode of the file it's opening, and in
    > _cifsFileInfo_put is trying to down_write the cifsInodeInfo->lock_sem
    > for the file itself.
    >
    >
    > :
    > owner: (UN - 8854 -
    > reopen_file), counter: 0x0000000000000003
    > waitlist: 1
    > 0xffff9965005dfd80 8855 reopen_file UN 0 1:29:22.912
    > RWSEM_WAITING_FOR_WRITE
    >
    > this is the inode.i_rwsem for the file
    >
    > the owner:
    >
    > [0 00:48:22.739] [UN] PID: 8854 TASK: ffff8c6914272e80 CPU: 2
    > COMMAND: "reopen_file"
    > #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
    > #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
    > #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
    > #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
    > #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
    > #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
    > #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
    > #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
    > #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
    > #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
    > #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
    > #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
    > #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
    > #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
    >
    > the process holds the inode->i_rwsem for the file to which it's writing,
    > and is trying to __lock_page for the same page as in the other processes
    >
    >
    > the other tasks:
    > [0 00:00:00.028] [UN] PID: 8859 TASK: ffff8c6915479740 CPU: 2
    > COMMAND: "reopen_file"
    > #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
    > #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
    > #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
    > #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
    > #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
    > #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
    > #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
    > #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
    > #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
    > #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
    > #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
    >
    > this is opening the file, and is trying to down_write cinode->lock_sem
    >
    >
    > [0 00:00:00.041] [UN] PID: 8860 TASK: ffff8c691547ae80 CPU: 2
    > COMMAND: "reopen_file"
    > [0 00:00:00.057] [UN] PID: 8861 TASK: ffff8c6915478000 CPU: 3
    > COMMAND: "reopen_file"
    > [0 00:00:00.059] [UN] PID: 8858 TASK: ffff8c6914271740 CPU: 2
    > COMMAND: "reopen_file"
    > [0 00:00:00.109] [UN] PID: 8862 TASK: ffff8c691547dd00 CPU: 6
    > COMMAND: "reopen_file"
    > #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
    > #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
    > #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
    > #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
    > #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
    > #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
    > #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
    > #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
    > #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
    > #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
    >
    > closing the file, and trying to down_write cifsi->lock_sem
    >
    >
    > [0 00:48:22.839] [UN] PID: 8857 TASK: ffff8c6914270000 CPU: 7
    > COMMAND: "reopen_file"
    > #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
    > #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
    > #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
    > #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
    > #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
    > #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
    > #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
    > #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
    > #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
    > #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
    >
    > in __filemap_fdatawait_range
    > wait_on_page_writeback(page);
    > for the same page of the file
    >
    >
    >
    > [0 00:48:22.718] [UN] PID: 8855 TASK: ffff8c69142745c0 CPU: 7
    > COMMAND: "reopen_file"
    > #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
    > #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
    > #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
    > #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
    > #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
    > #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
    > #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
    > #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
    >
    > inode_lock(inode);
    >
    >
    > and one 'ls' later on, to see whether the rest of the mount is available
    > (the test file is in the root, so we get blocked up on the directory
    > ->i_rwsem), so the entire mount is unavailable
    >
    > [0 00:36:26.473] [UN] PID: 9802 TASK: ffff8c691436ae80 CPU: 4
    > COMMAND: "ls"
    > #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
    > #1 [ffff996500393db8] schedule at ffffffff9b6e64df
    > #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
    > #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
    > #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
    > #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
    > #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
    > #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
    >
    > in iterate_dir:
    > if (shared)
    > res = down_read_killable(&inode->i_rwsem); <<<<
    > else
    > res = down_write_killable(&inode->i_rwsem);
    >

    Reported-by: Frank Sorenson
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Ronnie Sahlberg
    Signed-off-by: Steve French

    Ronnie Sahlberg
     
  • After doing mount() successfully we call cifs_try_adding_channels()
    which will open as many channels as it can.

    Channels are closed when the master session is closed.

    The master connection becomes the first channel.

    [Diagram truncated; it pointed to the global cifs_tcp_ses_list.]

    Signed-off-by: Steve French

    Aurelien Aptel
     
  • Make the logic of cifs_get_inode_info() much clearer by moving code
    to sub-functions and adding comments.

    Document the steps this function does.

    cifs_get_inode_info() gets and updates a file's inode metadata from
    its file path.

    * If caller already has raw info data from server they can pass it.
    * If inode already exists (just need to update) caller can pass it.

    Step 1: get raw data from server if none was passed
    Step 2: parse raw data into intermediate internal cifs_fattr struct
    Step 3: set fattr uniqueid, which is later used for the inode number.
    This can sometimes be done from raw data
    Step 4: tweak fattr according to mount options (file_mode, acl to mode
    bits, uid, gid, etc)
    Step 5: update or create inode from final fattr struct
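
    The five steps above can be sketched as a small pipeline. This is only
    an illustration in userspace C, not the kernel code: every type and
    function name ending in _sketch or _stub is hypothetical, the server
    query is a stub, and step 5 only updates an inode supplied by the
    caller (creation is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for the real structures. */
struct raw_info_sketch { uint64_t server_file_id; };
struct fattr_sketch { uint64_t uniqueid; uint32_t mode; };
struct inode_sketch { uint64_t ino; uint32_t mode; };
struct mount_opts_sketch { bool override_mode; uint32_t file_mode; };

/* Stub standing in for a round trip to the server. */
static struct raw_info_sketch query_server_stub(void)
{
    struct raw_info_sketch raw = { 42 };
    return raw;
}

static void get_inode_info_sketch(const struct raw_info_sketch *passed_raw,
                                  const struct mount_opts_sketch *opts,
                                  struct inode_sketch *inode)
{
    struct raw_info_sketch raw;
    struct fattr_sketch fattr;

    assert(opts != NULL && inode != NULL);

    /* Step 1: get raw data from the server if none was passed. */
    raw = passed_raw ? *passed_raw : query_server_stub();

    /* Step 2: parse raw data into the intermediate fattr (default mode). */
    fattr.uniqueid = 0;
    fattr.mode = 0644;

    /* Step 3: set the uniqueid, here taken directly from the raw data. */
    fattr.uniqueid = raw.server_file_id;

    /* Step 4: tweak fattr according to mount options. */
    if (opts->override_mode)
        fattr.mode = opts->file_mode;

    /* Step 5: update the inode from the final fattr. */
    inode->ino = fattr.uniqueid;
    inode->mode = fattr.mode;
}
```

    Passing a non-NULL raw pointer skips the stubbed server query, which
    mirrors the "caller already has raw info" case described above.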

    * add is_smb1_server() helper
    * add is_inode_cache_good() helper
    * move SMB1-backupcreds-getinfo-retry to separate func
    cifs_backup_query_path_info().
    * move set-uniqueid code to separate func cifs_set_fattr_ino()
    * don't clobber uniqueid from backup cred retry
    * fix some probable corner-case memory leaks

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • Currently a lot of the code to initialize a connection & session uses
    the cifs_ses as input. But depending on whether we are opening a new
    session or a new channel, we need to use different server pointers.

    Add a "binding" flag in cifs_ses and a helper function that returns
    the server ptr a session should use (only in the sess establishment
    code path).
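
    A minimal sketch of such a helper, assuming a simplified session
    layout: the struct names, the chans array, and chan_count are
    illustrative guesses, not the kernel's definitions. While the binding
    flag is set, the helper returns the server of the channel being
    added; otherwise it returns the session's master server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHANNELS_SKETCH 4   /* arbitrary limit for this sketch */

struct server_sketch { int id; };

struct chan_sketch { struct server_sketch *server; };

struct ses_sketch {
    struct server_sketch *server;   /* master connection */
    bool binding;                   /* true while a new channel is being bound */
    size_t chan_count;              /* channels already established */
    struct chan_sketch chans[MAX_CHANNELS_SKETCH];
};

/* Return the server pointer the session-establishment code should use. */
static struct server_sketch *ses_server(struct ses_sketch *ses)
{
    assert(ses != NULL);
    if (ses->binding) {
        /* The channel being bound sits just past the established ones. */
        assert(ses->chan_count < MAX_CHANNELS_SKETCH);
        return ses->chans[ses->chan_count].server;
    }
    return ses->server;
}
```

    Keeping the decision in one helper means the callers in the session
    establishment path do not need to know whether they are setting up a
    new session or a new channel.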

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • As we get down to the transport layer, plenty of functions are passed
    the session pointer and assume the transport to use is ses->server.

    Instead we modify those functions to pass (ses, server) so that we
    can decouple the session from the server.

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • adds:
    - [no]multichannel to enable/disable multichannel
    - max_channels=N to control how many channels to create

    these options are then stored in the volume struct.

    - store channels and max_channels in cifs_ses
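
    How these options might be parsed into a volume struct can be
    sketched as follows. Only the option names come from the text above;
    the struct, the function, and the upper bound of 16 channels are
    assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the volume struct that stores the options. */
struct vol_sketch {
    bool multichannel;
    unsigned int max_channels;
};

/* Parse one option token; returns 0 on success, -1 if unknown or invalid. */
static int parse_channel_option(const char *opt, struct vol_sketch *vol)
{
    static const char key[] = "max_channels=";
    char *end;
    unsigned long n;

    assert(opt != NULL && vol != NULL);

    if (strcmp(opt, "multichannel") == 0) {
        vol->multichannel = true;
        return 0;
    }
    if (strcmp(opt, "nomultichannel") == 0) {
        vol->multichannel = false;
        return 0;
    }
    if (strncmp(opt, key, sizeof(key) - 1) == 0) {
        const char *val = opt + sizeof(key) - 1;

        /* Require a plain decimal number; reject signs and empty values. */
        if (*val < '0' || *val > '9')
            return -1;
        n = strtoul(val, &end, 10);
        if (*end != '\0' || n == 0 || n > 16)   /* 16 is an assumed cap */
            return -1;
        vol->max_channels = (unsigned int)n;
        return 0;
    }
    return -1;
}
```

    An invalid value leaves the previously stored max_channels untouched,
    so a caller can report the error without first restoring state.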

    Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French

    Aurelien Aptel
     
  • Helps distinguish between an interrupted close and a truly
    unmatched open.

    Signed-off-by: Ronnie Sahlberg
    Signed-off-by: Steve French

    Ronnie Sahlberg
     
  • There is a race between a system call processing thread
    and the demultiplex thread in which mid->resp_buf becomes NULL
    and is later accessed to get credits. It happens when
    the 1st thread wakes up before a mid callback is called in
    the 2nd one but the mid state has already been set to
    MID_RESPONSE_RECEIVED. This causes a NULL pointer dereference
    in the mid callback.

    Fix this by saving credits from the response before we
    update the mid state and then using this value in the mid
    callback rather than accessing a response buffer.
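
    The ordering this fix relies on can be sketched as below. The struct
    and function names are hypothetical and the example is
    single-threaded; real code would also need the memory ordering the
    kernel's locking provides. The point is only that the credits are
    copied out of the response before the state is published, so the
    callback never needs resp_buf.

```c
#include <assert.h>
#include <stddef.h>

enum mid_state { MID_REQUEST_SUBMITTED, MID_RESPONSE_RECEIVED };

/* Hypothetical, heavily simplified stand-in for a mid entry. */
struct mid_sketch {
    enum mid_state state;
    const unsigned char *resp_buf;  /* may be cleared by the waiting thread */
    unsigned int credits_received;  /* saved copy, safe to read later */
};

/* Demultiplex side: record the credits first, publish the state second. */
static void mid_response_received(struct mid_sketch *mid,
                                  unsigned int credits_in_response)
{
    assert(mid != NULL);
    mid->credits_received = credits_in_response;
    mid->state = MID_RESPONSE_RECEIVED;
}

/* Callback side: use the saved value and never touch resp_buf. */
static unsigned int mid_callback_credits(const struct mid_sketch *mid)
{
    assert(mid != NULL);
    return mid->credits_received;
}
```

    Even if the waiting thread clears resp_buf as soon as it sees
    MID_RESPONSE_RECEIVED, the callback still reads a valid credit count.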

    Cc: Stable
    Fixes: ee258d79159afed5 ("CIFS: Move credit processing to mid callbacks for SMB3")
    Tested-by: Frank Sorenson
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     

25 Oct, 2019

1 commit

  • A deadlock is possible, and is easily reproduced with a test in
    which multiple readers open, read, and close the same file while
    a disruption forces a reconnect. The deadlock is due to a reader
    thread inside cifs_strict_readv calling down_read and obtaining
    lock_sem, and then, after the reconnect, calling down_read a
    second time inside cifs_reopen_file. If a down_write arrives
    from another process between the two down_read calls, deadlock
    occurs.

            CPU0                            CPU1
            ----                            ----
    cifs_strict_readv()
     down_read(&cifsi->lock_sem);
                                    _cifsFileInfo_put
                                       OR
                                    cifs_new_fileinfo
                                     down_write(&cifsi->lock_sem);
    cifs_reopen_file()
     down_read(&cifsi->lock_sem);

    Fix the above by changing all down_write(lock_sem) calls to a
    down_write_trylock(lock_sem)/msleep() loop, which in turn
    makes the second down_read call benign since it will never
    block behind the writer while holding lock_sem.
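
    The trylock-and-sleep loop described above can be illustrated in
    userspace with a POSIX rwlock. This is an analogy rather than the
    kernel change itself: pthread_rwlock_trywrlock() stands in for
    down_write_trylock(), nanosleep() for msleep(), and the function
    name and the 10 ms back-off are assumptions of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's lock_sem. */
static pthread_rwlock_t lock_sem = PTHREAD_RWLOCK_INITIALIZER;

/*
 * Take the lock for writing without ever queueing as a waiter:
 * try once, and if that fails, sleep briefly and try again.
 * Returns the number of failed attempts (for illustration only).
 */
static int down_write_polling(pthread_rwlock_t *sem)
{
    const struct timespec backoff = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int retries = 0;

    assert(sem != NULL);
    while (pthread_rwlock_trywrlock(sem) != 0) {
        nanosleep(&backoff, NULL);
        retries++;
    }
    return retries;
}
```

    Because the writer never sits in the lock's wait queue, a thread that
    already holds the lock for reading can take it for reading again
    without being ordered behind that writer. The cost is that the writer
    polls and can be delayed while readers keep arriving.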

    Signed-off-by: Dave Wysochanski
    Suggested-by: Ronnie Sahlberg
    Reviewed-by: Ronnie Sahlberg
    Reviewed-by: Pavel Shilovsky

    Dave Wysochanski
     

09 Oct, 2019

1 commit


26 Sep, 2019

1 commit

  • We need to populate an ACL (security descriptor open context)
    correctly on files and directories. This patch passes in the
    mode. A follow-on patch will build the open context and the
    security descriptor (from the mode) that goes in the open
    context.

    Signed-off-by: Steve French
    Reviewed-by: Aurelien Aptel

    Steve French
     

17 Sep, 2019

3 commits

  • Introduce a new CONFIG_CIFS_ROOT option to handle root file systems
    over an SMB share.

    In order to mount the root file system during the init process, make
    cifs.ko perform non-blocking socket operations while mounting and
    accessing it.

    Cc: Steve French
    Reviewed-by: Aurelien Aptel
    Signed-off-by: Paulo Alcantara (SUSE)
    Signed-off-by: Steve French

    Paulo Alcantara (SUSE)
     
  • In some cases, to work around server bugs or performance
    problems, it can be helpful to disable requesting
    SMB2.1/SMB3 leases on a particular mount (rather than on all
    servers and all shares we are mounted to). Add a new mount
    parameter "nolease" which turns off requesting leases on
    directory or file opens. Currently the only way to disable
    leases is globally through a module load parameter. This is
    more granular.

    Suggested-by: Pavel Shilovsky
    Signed-off-by: Steve French
    Reviewed-by: Ronnie Sahlberg
    Reviewed-by: Pavel Shilovsky
    CC: Stable

    Steve French
     
  • Displayed in /proc/fs/cifs/Stats once for each
    socket we are connected to.

    This allows us to find out the maximum number of requests
    that have been in flight at any one time. Note that
    /proc/fs/cifs/Stats can be reset if you want to look for
    the maximum over a short period of time.

    Sample output (immediately after mount):

    Resources in use
    CIFS Session: 1
    Share (unique mount targets): 2
    SMB Request/Response Buffer: 1 Pool size: 5
    SMB Small Req/Resp Buffer: 1 Pool size: 30
    Operations (MIDs): 0

    0 session 0 share reconnects
    Total vfs operations: 5 maximum at one time: 2

    Max requests in flight: 2
    1) \\localhost\scratch
    SMBs: 18
    Bytes read: 0 Bytes written: 0
    ...
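
    A per-socket high-water mark like the "Max requests in flight" value
    above can be kept with two counters. The sketch below is an
    assumption about the general technique, not the kernel code: all
    names are hypothetical, there is no locking, and the reset restarts
    the maximum from the current in-flight count, which is a choice made
    for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-socket counters. */
struct server_stats_sketch {
    unsigned int in_flight;      /* requests sent, response not yet handled */
    unsigned int max_in_flight;  /* high-water mark since the last reset */
};

static void request_sent(struct server_stats_sketch *s)
{
    assert(s != NULL);
    s->in_flight++;
    if (s->in_flight > s->max_in_flight)
        s->max_in_flight = s->in_flight;
}

static void response_done(struct server_stats_sketch *s)
{
    assert(s != NULL && s->in_flight > 0);
    s->in_flight--;
}

/* Restart the high-water mark, e.g. when the stats are reset. */
static void stats_reset(struct server_stats_sketch *s)
{
    assert(s != NULL);
    s->max_in_flight = s->in_flight;
}
```

    Resetting before a workload and reading the maximum afterwards gives
    the peak concurrency for just that window.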

    Signed-off-by: Steve French
    Reviewed-by: Pavel Shilovsky

    Steve French