06 Apr, 2019

1 commit

  • [ Upstream commit bc31d0cdcfbadb6258b45db97e93b1c83822ba33 ]

    We have a customer reporting crashes in lock_get_status() with many
    "Leaked POSIX lock" messages preceding the crash.

    Leaked POSIX lock on dev=0x0:0x56 ...
    Leaked POSIX lock on dev=0x0:0x56 ...
    Leaked POSIX lock on dev=0x0:0x56 ...
    Leaked POSIX lock on dev=0x0:0x53 ...
    Leaked POSIX lock on dev=0x0:0x53 ...
    Leaked POSIX lock on dev=0x0:0x53 ...
    Leaked POSIX lock on dev=0x0:0x53 ...
    POSIX: fl_owner=ffff8900e7b79380 fl_flags=0x1 fl_type=0x1 fl_pid=20709
    Leaked POSIX lock on dev=0x0:0x4b ino...
    Leaked locks on dev=0x0:0x4b ino=0xf911400000029:
    POSIX: fl_owner=ffff89f41c870e00 fl_flags=0x1 fl_type=0x1 fl_pid=19592
    stack segment: 0000 [#1] SMP
    Modules linked in: binfmt_misc msr tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 arc4 ecb auth_rpcgss nfsv4 md4 nfs nls_utf8 lockd grace cifs sunrpc ccm dns_resolver fscache af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock xfs libcrc32c sb_edac edac_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng vmw_balloon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr vmxnet3 i2c_piix4 vmw_vmci shpchp fjes processor button ac btrfs xor raid6_pq sr_mod cdrom ata_generic sd_mod ata_piix vmwgfx crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw ahci libahci drm libata vmw_pvscsi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4

    Supported: Yes
    CPU: 6 PID: 28250 Comm: lsof Not tainted 4.4.156-94.64-default #1
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
    task: ffff88a345f28740 ti: ffff88c74005c000 task.ti: ffff88c74005c000
    RIP: 0010:[] [] lock_get_status+0x9b/0x3b0
    RSP: 0018:ffff88c74005fd90 EFLAGS: 00010202
    RAX: ffff89bde83e20ae RBX: ffff89e870003d18 RCX: 0000000049534f50
    RDX: ffffffff81a3541f RSI: ffffffff81a3544e RDI: ffff89bde83e20ae
    RBP: 0026252423222120 R08: 0000000020584953 R09: 000000000000ffff
    R10: 0000000000000000 R11: ffff88c74005fc70 R12: ffff89e5ca7b1340
    R13: 00000000000050e5 R14: ffff89e870003d30 R15: ffff89e5ca7b1340
    FS: 00007fafd64be800(0000) GS:ffff89f41fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000001c80018 CR3: 000000a522048000 CR4: 0000000000360670
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Stack:
    0000000000000208 ffffffff81a3d6b6 ffff89e870003d30 ffff89e870003d18
    ffff89e5ca7b1340 ffff89f41738d7c0 ffff89e870003d30 ffff89e5ca7b1340
    ffffffff8125e08f 0000000000000000 ffff89bc22b67d00 ffff88c74005ff28
    Call Trace:
    [] locks_show+0x2f/0x70
    [] seq_read+0x251/0x3a0
    [] proc_reg_read+0x3c/0x70
    [] __vfs_read+0x26/0x140
    [] vfs_read+0x7a/0x120
    [] SyS_read+0x42/0xa0
    [] entry_SYSCALL_64_fastpath+0x1e/0xb7

    When Linux closes an FD (close(), close-on-exec, dup2(), ...) it calls
    filp_close(), which also removes all POSIX locks.

    The lock struct is initialized like so in filp_close() and passed
    down to cifs:

    ...
    lock.fl_type = F_UNLCK;
    lock.fl_flags = FL_POSIX | FL_CLOSE;
    lock.fl_start = 0;
    lock.fl_end = OFFSET_MAX;
    ...

    Note the FL_CLOSE flag, which hints to the VFS code that this unlock
    is being done as part of closing the fd.

    filp_close()
      locks_remove_posix(filp, id);
        vfs_lock_file(filp, F_SETLK, &lock, NULL);
          return filp->f_op->lock(filp, cmd, fl) => cifs_lock()
            rc = cifs_setlk(file, flock, type, wait_flag, posix_lck, lock, unlock, xid);
              rc = server->ops->mand_unlock_range(cfile, flock, xid);
              if (flock->fl_flags & FL_POSIX && !rc)
                      rc = locks_lock_file_wait(file, flock)

    Notice that when rc != 0 we never call locks_lock_file_wait(), which
    does the generic VFS lock/unlock/wait work on the inode.

    If we are closing the handle, the SMB server is supposed to remove any
    locks associated with it. Similarly, cifs.ko frees and wakes up any
    lock and lock waiter when closing the file:

    cifs_close()
      cifsFileInfo_put(file->private_data)
        /*
         * Delete any outstanding lock records. We'll lose them when the file
         * is closed anyway.
         */
        down_write(&cifsi->lock_sem);
        list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) {
                list_del(&li->llist);
                cifs_del_lock_waiters(li);
                kfree(li);
        }
        list_del(&cifs_file->llist->llist);
        kfree(cifs_file->llist);
        up_write(&cifsi->lock_sem);

    So we can safely ignore unlocking failures in cifs_lock() if they
    happen with the FL_CLOSE flag hint set as both the server and the
    client take care of it during the actual closing.

    This is not a proper fix for the unlocking failure, but it is safe and
    seems to prevent the lock leaks and crashes the customer experiences.
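
    A minimal sketch of the idea at the end of cifs_setlk() (a hedged
    approximation of the upstream change, not a verbatim diff):

        if (flock->fl_flags & FL_POSIX) {
                /*
                 * If this is a request to remove all locks because we
                 * are closing the file, it doesn't matter if the
                 * unlocking failed as both cifs.ko and the SMB server
                 * remove the lock on file close anyway.
                 */
                if (rc) {
                        cifs_dbg(VFS, "%s failed rc=%d\n", __func__, rc);
                        if (!(flock->fl_flags & FL_CLOSE))
                                return rc;
                }
                rc = locks_lock_file_wait(file, flock);
        }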

    Signed-off-by: Aurelien Aptel
    Signed-off-by: NeilBrown
    Signed-off-by: Steve French
    Acked-by: Pavel Shilovsky
    Signed-off-by: Sasha Levin

    Aurelien Aptel
     

24 Mar, 2019

1 commit

  • commit 6dfbd84684700cb58b34e8602c01c12f3d2595c8 upstream.

    When we have a READ lease for a file and have just issued a write
    operation to the server we need to purge the cache and set oplock/lease
    level to NONE to avoid reading stale data. Currently we do that
    only if the write operation succeeded, thus not covering cases when
    a request was sent to the server but a negative error code was
    returned later for some other reason (e.g. -EIOCBQUEUED or -EINTR).
    Fix this by turning off caching regardless of the error code being
    returned.

    The patch fixes generic tests 075 and 112 from xfstests.
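
    A hedged sketch of the shape of the change in the strict write path
    (variable and helper names follow the surrounding cifs code only
    approximately):

        rc = generic_file_write_iter(iocb, from);
        /*
         * Purge the cache even on error: the request may already have
         * reached the server and changed the file there.
         */
        if (CIFS_CACHE_READ(cinode)) {
                cifs_zap_mapping(inode);
                cinode->oplock = 0;     /* drop the READ lease/oplock */
        }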

    Cc:
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     

20 Feb, 2019

1 commit

  • [ Upstream commit 92a8109e4d3a34fb6b115c9098b51767dc933444 ]

    The code tries to allocate a contiguous buffer with a size supplied by
    the server (maxBuf). This could fail if memory is fragmented since it
    results in high order allocations for commonly used server
    implementations. It is also wasteful since there are probably
    few locks in the usual case. Limit the buffer to be no larger than a
    page to avoid memory allocation failures due to fragmentation.
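
    A hedged sketch of the limiting logic (the exact constants and struct
    names here are assumptions based on the description above):

        max_buf = tcon->ses->server->maxBuf;
        if (max_buf < sizeof(struct smb_hdr) + sizeof(LOCKING_ANDX_RANGE))
                return -EINVAL;

        /* never ask for more than a page so the allocation stays order-0 */
        max_buf = min_t(unsigned int, max_buf, PAGE_SIZE);
        max_num = (max_buf - sizeof(struct smb_hdr)) /
                                                sizeof(LOCKING_ANDX_RANGE);
        buf = kcalloc(max_num, sizeof(LOCKING_ANDX_RANGE), GFP_KERNEL);
        if (!buf)
                return -ENOMEM;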

    Signed-off-by: Ross Lagerwall
    Signed-off-by: Steve French
    Signed-off-by: Sasha Levin

    Ross Lagerwall
     

17 Jan, 2019

1 commit


13 Jun, 2018

1 commit

  • The kzalloc() function has a 2-factor argument form, kcalloc(). This
    patch replaces cases of:

    kzalloc(a * b, gfp)

    with:
    kcalloc(a, b, gfp)

    as well as handling cases of:

    kzalloc(a * b * c, gfp)

    with:

    kzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kcalloc(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.
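
    For example (an illustrative conversion, not a hunk quoted from this
    patch):

        /* before: open-coded multiplication can overflow */
        locks = kzalloc(max_num * sizeof(LOCKING_ANDX_RANGE), GFP_KERNEL);

        /* after: the 2-factor form checks for overflow and zeroes the buffer */
        locks = kcalloc(max_num, sizeof(LOCKING_ANDX_RANGE), GFP_KERNEL);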

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kzalloc
    + kcalloc
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kzalloc(sizeof(THING) * C2, ...)
    |
    kzalloc(sizeof(TYPE) * C2, ...)
    |
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(C1 * C2, ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

03 Jun, 2018

2 commits

  • With the offset defined in rdata, transport functions need to honor this
    offset when reading data into the correct places in the pages.

    Signed-off-by: Long Li
    Signed-off-by: Steve French

    Long Li
     
  • Add a function to allocate rdata without allocating pages for data
    transfer. This gives the caller the option to pass a number of pages
    that point to the data buffer.

    rdata is still responsible for freeing those pages after it's done.
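
    A hedged sketch of what such an allocator could look like (the function
    name and initialization details are assumptions for illustration):

        struct cifs_readdata *
        cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
        {
                struct cifs_readdata *rdata;

                rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
                if (rdata) {
                        rdata->pages = pages;   /* caller-provided data buffer */
                        kref_init(&rdata->refcount);
                        INIT_LIST_HEAD(&rdata->list);
                        init_completion(&rdata->done);
                        INIT_WORK(&rdata->work, complete);
                }
                return rdata;
        }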

    Signed-off-by: Long Li
    Signed-off-by: Steve French

    Long Li
     

18 Apr, 2018

1 commit


12 Apr, 2018

1 commit

  • Remove the address_space ->tree_lock and use the xa_lock newly added to
    the radix_tree_root. Rename the address_space ->page_tree to ->i_pages,
    since we don't really care that it's a tree.

    [willy@infradead.org: fix nds32, fs/dax.c]
    Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.org
    Link: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jeff Layton
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Ryusuke Konishi
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

25 Jan, 2018

2 commits

  • If I/O size is larger than rdma_readwrite_threshold, use RDMA write for
    SMB read by specifying channel SMB2_CHANNEL_RDMA_V1 or
    SMB2_CHANNEL_RDMA_V1_INVALIDATE in the SMB packet, depending on SMB dialect
    used. Append a smbd_buffer_descriptor_v1 to the end of the SMB packet and fill
    in other values to indicate this SMB read uses RDMA write.

    There is no need to read from the transport for incoming payload. At the time
    SMB read response comes back, the data is already transferred and placed in the
    pages by RDMA hardware.

    When SMB read is finished, deregister the memory regions if RDMA write is used
    for this SMB read. smbd_deregister_mr may need to do local invalidation and
    sleep, if server remote invalidation is not used.

    There are situations where the MID may not be created on I/O failure,
    in which case the memory region is deregistered when the read data
    context is released.

    Signed-off-by: Long Li
    Signed-off-by: Steve French
    Reviewed-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg

    Long Li
     
  • If cifs_zap_mapping() returned an error, we would return without putting
    the xid that we got earlier. Restructure cifs_file_strict_mmap() and
    cifs_file_mmap() to be more similar to each other and have a single
    point of return that always puts the xid.
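
    A hedged sketch of the restructured cifs_file_mmap() (approximate; the
    strict variant would additionally zap the mapping when the read cache
    cannot be trusted):

        static int cifs_file_mmap(struct file *file, struct vm_area_struct *vma)
        {
                unsigned int xid = get_xid();
                int rc;

                rc = cifs_revalidate_file(file);
                if (rc)
                        cifs_dbg(FYI, "Validation prior to mmap failed, error=%d\n", rc);
                else
                        rc = generic_file_mmap(file, vma);
                if (!rc)
                        vma->vm_ops = &cifs_file_vm_ops;

                free_xid(xid);          /* single exit: the xid is always put */
                return rc;
        }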

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Steve French
    CC: Stable

    Matthew Wilcox
     

16 Nov, 2017

1 commit

  • wdata_alloc_and_fillpages() needlessly iterates calls to
    find_get_pages_tag(). Also, it wants only pages from a given range. Make
    it use find_get_pages_range_tag().
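
    A hedged before/after sketch of the call (argument names are
    illustrative):

        /* before: may keep returning dirty pages beyond the writeback range */
        nr_pages = find_get_pages_tag(mapping, &index, PAGECACHE_TAG_DIRTY,
                                      tofind, pages);

        /* after: bounded by 'end', so no extra iteration or filtering needed */
        nr_pages = find_get_pages_range_tag(mapping, &index, end,
                                            PAGECACHE_TAG_DIRTY, tofind, pages);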

    Link: http://lkml.kernel.org/r/20171009151359.31984-17-jack@suse.cz
    Signed-off-by: Jan Kara
    Suggested-by: Daniel Jordan
    Reviewed-by: Daniel Jordan
    Cc: Steve French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

23 Sep, 2017

1 commit


21 Sep, 2017

1 commit

  • Don't populate the read-only arrays types[] on the stack; instead make
    them both static const. This makes the object code smaller by over 200
    bytes:

    Before:
    text data bss dec hex filename
    111503 37696 448 149647 2488f fs/cifs/file.o

    After:
    text data bss dec hex filename
    111140 37856 448 149444 247c4 fs/cifs/file.o
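
    A hedged sketch of the conversion (the initializers are assumptions
    based on the lock types used elsewhere in this log):

        /* before: rebuilt on the stack on every call */
        int types[] = { LOCKING_ANDX_LARGE_FILES,
                        LOCKING_ANDX_SHARED_LOCK | LOCKING_ANDX_LARGE_FILES };

        /* after: emitted once as read-only data */
        static const int types[] = { LOCKING_ANDX_LARGE_FILES,
                        LOCKING_ANDX_SHARED_LOCK | LOCKING_ANDX_LARGE_FILES };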

    Signed-off-by: Colin Ian King
    Signed-off-by: Steve French
    Reviewed-by: Ronnie Sahlberg

    Colin Ian King
     

01 Aug, 2017

1 commit

  • This patch converts most of the in-kernel filesystems that do writeback
    out of the pagecache to report errors using the errseq_t-based
    infrastructure that was recently added. This allows them to report
    errors once for each open file description.

    Most filesystems have a fairly straightforward fsync operation. They
    call filemap_write_and_wait_range to write back all of the data and
    wait on it, and then (sometimes) sync out the metadata.

    For those filesystems this is a straightforward conversion from calling
    filemap_write_and_wait_range in their fsync operation to calling
    file_write_and_wait_range.
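
    A hedged sketch of the conversion in a typical fsync implementation:

        /* before: writeback errors can be consumed by another caller */
        rc = filemap_write_and_wait_range(inode->i_mapping, start, end);

        /* after: also checks and advances the errseq_t cursor stored in the
         * struct file, so each open file description sees the error once */
        rc = file_write_and_wait_range(file, start, end);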

    Acked-by: Jan Kara
    Acked-by: Dave Kleikamp
    Signed-off-by: Jeff Layton

    Jeff Layton
     

06 Jul, 2017

2 commits

  • When a CIFS filesystem is mounted with the forcemand option and the
    following command is run on it, lockdep warns about a circular locking
    dependency between CifsInodeInfo::lock_sem and the inode lock.

    while echo foo > hello; do :; done & while touch -c hello; do :; done

    cifs_writev() takes the locks in the wrong order, but note that we
    can't simply flip the order around, because it releases the inode lock
    before the call to generic_write_sync() while it holds the lock_sem
    across that call.

    But, AFAICS, there is no need to hold CifsInodeInfo::lock_sem across
    the generic_write_sync() call either, so we can release both locks
    before generic_write_sync() and change the order.
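
    A hedged sketch of the resulting ordering in cifs_writev() (simplified;
    error handling and the brlock conflict check are elided):

        inode_lock(inode);
        down_read(&cinode->lock_sem);   /* now taken after the inode lock */

        rc = __generic_file_write_iter(iocb, from);

        up_read(&cinode->lock_sem);
        inode_unlock(inode);            /* drop both locks first ... */

        if (rc > 0)
                rc = generic_write_sync(iocb, rc);      /* ... then sync */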

    ======================================================
    WARNING: possible circular locking dependency detected
    4.12.0-rc7+ #9 Not tainted
    ------------------------------------------------------
    touch/487 is trying to acquire lock:
    (&cifsi->lock_sem){++++..}, at: cifsFileInfo_put+0x88f/0x16a0

    but task is already holding lock:
    (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&sb->s_type->i_mutex_key#11){+.+.+.}:
    __lock_acquire+0x1f74/0x38f0
    lock_acquire+0x1cc/0x600
    down_write+0x74/0x110
    cifs_strict_writev+0x3cb/0x8c0
    __vfs_write+0x4c1/0x930
    vfs_write+0x14c/0x2d0
    SyS_write+0xf7/0x240
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    -> #0 (&cifsi->lock_sem){++++..}:
    check_prevs_add+0xfa0/0x1d10
    __lock_acquire+0x1f74/0x38f0
    lock_acquire+0x1cc/0x600
    down_write+0x74/0x110
    cifsFileInfo_put+0x88f/0x16a0
    cifs_setattr+0x992/0x1680
    notify_change+0x61a/0xa80
    utimes_common+0x3d4/0x870
    do_utimes+0x1c1/0x220
    SyS_utimensat+0x84/0x1a0
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    other info that might help us debug this:

    Possible unsafe locking scenario:

         CPU0                                    CPU1
         ----                                    ----
    lock(&sb->s_type->i_mutex_key#11);
                                            lock(&cifsi->lock_sem);
                                            lock(&sb->s_type->i_mutex_key#11);
    lock(&cifsi->lock_sem);

    *** DEADLOCK ***

    2 locks held by touch/487:
    #0: (sb_writers#10){.+.+.+}, at: mnt_want_write+0x41/0xb0
    #1: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870

    stack backtrace:
    CPU: 0 PID: 487 Comm: touch Not tainted 4.12.0-rc7+ #9
    Call Trace:
    dump_stack+0xdb/0x185
    print_circular_bug+0x45b/0x790
    __lock_acquire+0x1f74/0x38f0
    lock_acquire+0x1cc/0x600
    down_write+0x74/0x110
    cifsFileInfo_put+0x88f/0x16a0
    cifs_setattr+0x992/0x1680
    notify_change+0x61a/0xa80
    utimes_common+0x3d4/0x870
    do_utimes+0x1c1/0x220
    SyS_utimensat+0x84/0x1a0
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    Fixes: 19dfc1f5f2ef03a52 ("cifs: fix the race in cifs_writev()")
    Signed-off-by: Rabin Vincent
    Signed-off-by: Steve French
    Acked-by: Pavel Shilovsky

    Rabin Vincent
     
  • Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Steve French

    Jeff Layton
     

21 Jun, 2017

1 commit


10 May, 2017

1 commit

  • cifs_relock_file() can perform a down_write() on the inode's lock_sem even
    though it was already performed in cifs_strict_readv(). Lockdep complains
    about this. AFAICS, there is no problem here, and lockdep just needs to be
    told that this nesting is OK.
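
    A hedged sketch of the annotation in cifs_relock_file() (assuming the
    nested acquisition is the read-side one shown in the trace below):

        static int cifs_relock_file(struct cifsFileInfo *cfile)
        {
                struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry));
                int rc = 0;

                /* lock_sem is already held by cifs_strict_readv(); tell
                 * lockdep this nested acquisition is intentional */
                down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING);
                /* ... re-push the byte-range locks to the server ... */
                up_read(&cinode->lock_sem);
                return rc;
        }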

    =============================================
    [ INFO: possible recursive locking detected ]
    4.11.0+ #20 Not tainted
    ---------------------------------------------
    cat/701 is trying to acquire lock:
    (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00

    but task is already holding lock:
    (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&cifsi->lock_sem);
    lock(&cifsi->lock_sem);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    1 lock held by cat/701:
    #0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

    stack backtrace:
    CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
    Call Trace:
    dump_stack+0x85/0xc2
    __lock_acquire+0x17dd/0x2260
    ? trace_hardirqs_on_thunk+0x1a/0x1c
    ? preempt_schedule_irq+0x6b/0x80
    lock_acquire+0xcc/0x260
    ? lock_acquire+0xcc/0x260
    ? cifs_reopen_file+0x7a7/0xc00
    down_read+0x2d/0x70
    ? cifs_reopen_file+0x7a7/0xc00
    cifs_reopen_file+0x7a7/0xc00
    ? printk+0x43/0x4b
    cifs_readpage_worker+0x327/0x8a0
    cifs_readpage+0x8c/0x2a0
    generic_file_read_iter+0x692/0xd00
    cifs_strict_readv+0x29f/0x310
    generic_file_splice_read+0x11c/0x1c0
    do_splice_to+0xa5/0xc0
    splice_direct_to_actor+0xfa/0x350
    ? generic_pipe_buf_nosteal+0x10/0x10
    do_splice_direct+0xb5/0xe0
    do_sendfile+0x278/0x3a0
    SyS_sendfile64+0xc4/0xe0
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    Signed-off-by: Rabin Vincent
    Acked-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Rabin Vincent
     

03 May, 2017

2 commits

  • This patch adds support for processing write calls passed by io_submit()
    asynchronously. It is based on the previously introduced async context
    that allows processing i/o responses in a separate thread and
    returning to the caller immediately for asynchronous calls.

    This improves the write performance of single-threaded applications
    as the i/o queue depth increases.
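
    A hedged sketch of the overall shape in cifs_user_writev() (simplified;
    the context field names and the cifs_write_from_iter() helper are
    assumptions for illustration):

        struct cifs_aio_ctx *ctx = cifs_aio_ctx_alloc();
        if (!ctx)
                return -ENOMEM;

        if (!is_sync_kiocb(iocb))
                ctx->iocb = iocb;       /* completion path calls ki_complete() */

        rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, from, open_file,
                                  cifs_sb, &ctx->list, ctx);

        if (!is_sync_kiocb(iocb)) {
                /* don't wait: the response thread finishes the i/o */
                kref_put(&ctx->refcount, cifs_aio_ctx_release);
                return -EIOCBQUEUED;
        }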

    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • This patch adds support for processing read calls passed by io_submit()
    asynchronously. It is based on the previously introduced async context
    that allows processing i/o responses in a separate thread and
    returning to the caller immediately for asynchronous calls.

    This improves the read performance of single-threaded applications
    as the i/o queue depth increases.

    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     

11 Apr, 2017

1 commit

  • This fixes Continuous Availability when errors are encountered
    during file reopen.

    cifs_user_readv and cifs_user_writev would wait forever if the result
    of cifs_reopen_file is not stored for later inspection.

    In fact, the result is checked and, in case of errors, the chain of
    function calls that schedules reads and writes in a separate thread
    is skipped. Those threads wake up the corresponding waiters once the
    reads and writes are done.

    However, because the return value is not stored, when rc is checked
    for errors a previous value (always zero) is inspected instead.
    This leads to pending reads/writes being added to the list, making
    cifs_user_readv and cifs_user_writev wait forever.
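
    A hedged sketch of the fix inside the read/write retry loops (the
    surrounding loop is assumed):

        if (open_file->invalidHandle) {
                rc = cifs_reopen_file(open_file, true);
                if (rc == -EAGAIN)
                        continue;
                else if (rc)
                        break;  /* don't queue work the waiter will never see */
        }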

    Signed-off-by: Germano Percossi
    Reviewed-by: Pavel Shilovsky
    CC: Stable
    Signed-off-by: Steve French

    Germano Percossi
     

25 Feb, 2017

1 commit

  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.
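
    A hedged sketch of how a handler changes (using a simplified
    page_mkwrite handler as the example):

        /* before: the vma was passed separately */
        static int cifs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)

        /* after: the vma is reachable through vmf->vma when needed */
        static int cifs_page_mkwrite(struct vm_fault *vmf)
        {
                struct page *page = vmf->page;

                lock_page(page);
                return VM_FAULT_LOCKED;
        }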

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

02 Feb, 2017

2 commits

  • Since we have two different types of reads (pagecache and direct)
    we need to process such responses differently after decryption of
    a packet. The change allows specifying a callback that copies the read
    payload data into preallocated pages.

    Signed-off-by: Pavel Shilovsky

    Pavel Shilovsky
     
  • Currently we call copy_page_to_iter() for uncached reading into a pipe.
    This is wrong because it treats the pages as VFS cache pages and copies
    references rather than actual data. When we then try to read from the pipe
    we end up calling page_cache_pipe_buf_confirm() which returns -ENODATA.
    This error is translated into 0, which is returned to the user.

    This issue is reproduced by running the xfstests suite (generic test 249)
    against mount points with "cache=none". Fix it by mapping the pages
    manually and calling copy_to_iter(), which copies data into the pipe.
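
    A hedged sketch of the per-page copy (error handling simplified):

        void *addr = kmap(page);                    /* map the page ... */
        size_t n = copy_to_iter(addr, copy, iter);  /* ... and copy real bytes */
        kunmap(page);
        if (n < copy && iov_iter_count(iter))
                rc = -EFAULT;   /* short copy with space left: report a fault */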

    Cc: Stable
    Signed-off-by: Pavel Shilovsky

    Pavel Shilovsky
     

06 Dec, 2016

1 commit

  • With the current code it is possible to lock a mutex twice when
    subsequent reconnects are triggered. On the 1st reconnect we
    reconnect sessions and tcons and then persistent file handles.
    If the 2nd reconnect happens during the reconnecting of persistent
    file handles then the following sequence of calls is observed:

    cifs_reopen_file -> SMB2_open -> small_smb2_init -> smb2_reconnect
    -> cifs_reopen_persistent_file_handles -> cifs_reopen_file (again!).

    So, we are trying to acquire the same cfile->fh_mutex twice which
    is wrong. Fix this by moving reconnecting of persistent handles to
    the delayed work (smb2_reconnect_server) and submitting this work
    every time we reconnect tcon in SMB2 commands handling codepath.

    This can also lead to corruption of a temporary file list in
    cifs_reopen_persistent_file_handles() because we can recursively
    call this function twice.

    Cc: Stable # v4.9+
    Signed-off-by: Pavel Shilovsky

    Pavel Shilovsky
     

14 Oct, 2016

2 commits


13 Oct, 2016

2 commits

  • Continuous Availability features like persistent handles
    require that clients reconnect their open files, not
    just the sessions, soon after the network connection comes
    back up, otherwise the server will throw away the state
    (byte range locks, leases, deny modes) on those handles
    after a timeout.

    Add code to reconnect handles when use_persistent is set
    (e.g. on Continuous Availability shares) after tree reconnect.

    Signed-off-by: Aurelien Aptel
    Reviewed-by: Germano Percossi
    Signed-off-by: Steve French

    Steve French
     
  • Remove the global file_list_lock to simplify cifs/smb3 locking and
    have spinlocks that more closely match the information they are
    protecting.

    Add new tcon->open_file_lock and file->file_info_lock spinlocks.
    Locks continue to follow a hierarchy,
    cifs_socket --> cifs_ses --> cifs_tcon --> cifs_file
    where the global tcp_ses_lock still protects the socket and cifs_ses,
    while the newer locks protect the lower level structures' information
    (tcon and cifs_file respectively).

    CC: Stable
    Signed-off-by: Steve French
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Aurelien Aptel
    Reviewed-by: Germano Percossi

    Steve French
     

11 Oct, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    ">rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     

28 Sep, 2016

2 commits


27 Jul, 2016

1 commit

  • Vladimir has noticed that we might declare memcg oom even during
    readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
    restriction) while __do_page_cache_readahead uses
    page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
    OOMs. This gfp mask discrepancy is really unfortunate and easily
    fixable. Drop page_cache_alloc_readahead() which only has one user and
    outsource the gfp_mask logic into readahead_gfp_mask and propagate this
    mask from __do_page_cache_readahead down to read_pages.

    This alone would have only very limited impact as most filesystems are
    implementing ->readpages and the common implementation mpage_readpages
    does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
    use readahead_gfp_mask instead as this function is called only during
    readahead as well. The same applies to read_cache_pages.

    ext4 has its own ext4_mpage_readpages but the path which has pages !=
    NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
    doing a very similar pattern to mpage_readpages so the same can be
    applied to them as well.
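
    A hedged sketch of how a readpages path picks up the mask:

        /* allocate readahead pages with the non-blocking readahead mask
         * (the mapping's gfp mask plus __GFP_NORETRY | __GFP_NOWARN)
         * instead of plain GFP_KERNEL */
        gfp_t gfp = readahead_gfp_mask(mapping);
        struct page *page = __page_cache_alloc(gfp);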

    [akpm@linux-foundation.org: coding-style fixes]
    [mhocko@suse.com: restrict gfp mask in mpage_alloc]
    Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Chris Mason
    Cc: Steve French
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Cc: Mike Marshall
    Cc: Jaegeuk Kim
    Cc: Changman Lee
    Cc: Chao Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

24 Jun, 2016

1 commit

  • Right now, we send the tgid across the wire. What we really want to send
    though is a hashed fl_owner_t since samba treats this field as a generic
    lockowner.

    It turns out that because we enforce and release locks locally before
    they are ever sent to the server, this patch makes no difference in
    behavior. Still, setting OFD locks on the server using the process
    pid seems wrong, so I think this patch still makes sense.
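
    A hedged sketch of the idea (the helper and secret names here are
    assumptions for illustration):

        /* a boot-time random value so lockowner hashes aren't predictable */
        static u32 cifs_lock_secret;

        static inline u32 hash_lockowner(fl_owner_t owner)
        {
                return cifs_lock_secret ^ hash32_ptr((const void *)owner);
        }

        /* ... then send hash_lockowner(flock->fl_owner) where the tgid was
         * previously used as the lock "pid" */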

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French
    Acked-by: Pavel Shilovsky
    Acked-by: Sachin Prabhu

    Jeff Layton
     

19 May, 2016

2 commits

  • Pull cifs iovec cleanups from Al Viro.

    * 'sendmsg.cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    cifs: don't bother with kmap on read_pages side
    cifs_readv_receive: use cifs_read_from_socket()
    cifs: no need to wank with copying and advancing iovec on recvmsg side either
    cifs: quit playing games with draining iovecs
    cifs: merge the hash calculation helpers

    Linus Torvalds
     
  • Pull cifs updates from Steve French:
    "Various small CIFS and SMB3 fixes (including some for stable)"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    remove directory incorrectly tries to set delete on close on non-empty directories
    Update cifs.ko version to 2.09
    fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication
    fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication
    fs/cifs: correctly to anonymous authentication for the LANMAN authentication
    fs/cifs: correctly to anonymous authentication via NTLMSSP
    cifs: remove any preceding delimiter from prefix_path
    cifs: Use file_dentry()

    Linus Torvalds
     

18 May, 2016

1 commit

  • CIFS may be used as a lower layer of overlayfs, and accessing
    f_path.dentry can then lead to a crash.

    Fix by replacing direct access of file->f_path.dentry with the
    file_dentry() accessor, which will always return a native object.
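
    A hedged before/after sketch:

        /* before: may return an overlayfs dentry when cifs is the lower layer */
        struct dentry *dentry = file->f_path.dentry;

        /* after: always returns the dentry of the native (cifs) object */
        struct dentry *dentry = file_dentry(file);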

    Signed-off-by: Goldwyn Rodrigues
    Acked-by: Shirish Pargaonkar
    Signed-off-by: Steve French

    Goldwyn Rodrigues
     

02 May, 2016

2 commits

  • The kiocb already has the new position, so use that. The only interesting
    case is AIO, where we currently don't bother updating ki_pos. We're about
    to free the kiocb after we're done, so we might as well update it to make
    everyone's life simpler.

    While we're at it also return the bytes written argument passed in if
    we were successful so that the boilerplate error switch code in the
    callers can go away.
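
    A hedged sketch of what the callers' boilerplate reduces to:

        /* before: position and length passed explicitly, error re-merged */
        if (ret > 0) {
                ssize_t err = generic_write_sync(file, iocb->ki_pos - ret, ret);
                if (err < 0)
                        ret = err;
        }

        /* after: the kiocb carries the (already advanced) position */
        if (ret > 0)
                ret = generic_write_sync(iocb, ret);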

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • This will allow us to do per-I/O sync file writes, as required by a lot
    of fileservers or storage targets.

    XXX: Will need a few additional audits for O_DSYNC

    Signed-off-by: Al Viro

    Christoph Hellwig