26 Jan, 2017

5 commits

  • commit fe2ed42517533068ac03eed5630fffafff27eacf upstream.

    sparse says:

    fs/ceph/inode.c:308:36: warning: incorrect type in argument 1 (different base types)
    fs/ceph/inode.c:308:36: expected unsigned int [unsigned] [usertype] a
    fs/ceph/inode.c:308:36: got restricted __le32 [usertype] frag
    fs/ceph/inode.c:308:46: warning: incorrect type in argument 2 (different base types)
    fs/ceph/inode.c:308:46: expected unsigned int [unsigned] [usertype] b
    fs/ceph/inode.c:308:46: got restricted __le32 [usertype] frag

    We need to convert these values to host-endian before calling the
    comparator.

    Fixes: a407846ef7c6 ("ceph: don't assume frag tree splits in mds reply are sorted")
    Signed-off-by: Jeff Layton
    Reviewed-by: Sage Weil
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit 1097680d759918ce4a8705381c0ab2ed7bd60cf1 upstream.

    sparse says:

    fs/ceph/dir.c:1248:50: warning: incorrect type in assignment (different base types)
    fs/ceph/dir.c:1248:50: expected restricted __le32 [usertype] mask
    fs/ceph/dir.c:1248:50: got int [signed] [assigned] mask

    Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
    Signed-off-by: Jeff Layton
    Reviewed-by: Sage Weil
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit 6e09d0fb64402cec579f029ca4c7f39f5c48fc60 upstream.

    Commit 5c341ee32881 ("ceph: fix scheduler warning due to nested
    blocking") causes an infinite loop when the process is interrupted. Fix it.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Yan, Zheng
     
  • commit 5c341ee32881c554727ec14b71ec3e8832f01989 upstream.

    try_get_cap_refs can be used as a condition in wait_event* calls.
    This is all fine until it has to call __ceph_do_pending_vmtruncate,
    which in turn acquires the i_truncate_mutex. This leads to a situation
    in which a task's state is !TASK_RUNNING and at the same time it's
    trying to acquire a sleeping primitive. In essence, nested sleeping
    primitives are being used. This causes the following warning:

    WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0()
    do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_event+0x5d/0x110
    ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6
    CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G O 4.4.20-clouder2 #6
    Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
    0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0
    ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c
    0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0
    Call Trace:
    [] dump_stack+0x67/0x9e
    [] warn_slowpath_common+0x86/0xc0
    [] warn_slowpath_fmt+0x4c/0x50
    [] ? prepare_to_wait_event+0x5d/0x110
    [] ? prepare_to_wait_event+0x5d/0x110
    [] __might_sleep+0x9f/0xb0
    [] mutex_lock+0x20/0x40
    [] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph]
    [] try_get_cap_refs+0xa2/0x320 [ceph]
    [] ceph_get_caps+0x255/0x2b0 [ceph]
    [] ? wait_woken+0xb0/0xb0
    [] ceph_write_iter+0x2b1/0xde0 [ceph]
    [] ? schedule_timeout+0x202/0x260
    [] ? kmem_cache_free+0x1ea/0x200
    [] ? iput+0x9e/0x230
    [] ? __might_sleep+0x52/0xb0
    [] ? __might_fault+0x37/0x40
    [] ? cp_new_stat+0x153/0x170
    [] __vfs_write+0xaa/0xe0
    [] vfs_write+0xa9/0x190
    [] ? set_close_on_exec+0x31/0x70
    [] SyS_write+0x46/0xa0

    This happens because wait_event_interruptible can interfere with the
    mutex locking code, since they both fiddle with the task state.

    Fix the issue by using the newly-added nested blocking infrastructure
    in 61ada528dea0 ("sched/wait: Provide infrastructure to deal with
    nested blocking").

    Link: https://lwn.net/Articles/628628/
    Signed-off-by: Nikolay Borisov
    Signed-off-by: Yan, Zheng
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Borisov
     
  • commit 6df8c9d80a27cb587f61b4f06b57e248d8bc3f86 upstream.

    sparse says:

    fs/ceph/mds_client.c:291:23: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:293:28: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:294:28: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:296:28: warning: restricted __le32 degrades to integer

    The op value is __le32, so we need to convert it before comparing it.

    Signed-off-by: Jeff Layton
    Reviewed-by: Sage Weil
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
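All three sparse fixes in this batch apply one rule: a field declared __le32 has to go through le32_to_cpu() before it is compared, switched on, or passed to a comparator, and a host integer has to go through cpu_to_le32() before it is stored in such a field. The userspace sketch below illustrates that rule. The `le32` type, the `host_to_le32()`/`le32_to_host()` helpers, `struct frag_split`, the op values and the plain numeric ordering are hypothetical stand-ins, not ceph's actual definitions or frag ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for __le32: the four bytes are always stored
 * least-significant first, whatever the host byte order is. */
typedef struct { uint8_t b[4]; } le32;

/* Stand-in for cpu_to_le32(): encode a host value as little-endian. */
static le32 host_to_le32(uint32_t v)
{
	le32 out = { { (uint8_t)v, (uint8_t)(v >> 8),
		       (uint8_t)(v >> 16), (uint8_t)(v >> 24) } };
	return out;
}

/* Stand-in for le32_to_cpu(): decode little-endian bytes to host order. */
static uint32_t le32_to_host(le32 v)
{
	return (uint32_t)v.b[0] | (uint32_t)v.b[1] << 8 |
	       (uint32_t)v.b[2] << 16 | (uint32_t)v.b[3] << 24;
}

/* Illustrative wire record: a frag split as it might arrive in a reply. */
struct frag_split {
	le32 frag;
	le32 by;
};

/* qsort() comparator: convert to host order *before* comparing. */
static int frag_split_cmp(const void *l, const void *r)
{
	uint32_t a = le32_to_host(((const struct frag_split *)l)->frag);
	uint32_t b = le32_to_host(((const struct frag_split *)r)->frag);

	return (a > b) - (a < b);
}

/* Hypothetical op codes; the point is converting before the switch. */
enum { OP_LOOKUP = 0x100, OP_GETATTR = 0x101 };

static const char *op_name(le32 op)
{
	switch (le32_to_host(op)) {
	case OP_LOOKUP:
		return "lookup";
	case OP_GETATTR:
		return "getattr";
	default:
		return "unknown";
	}
}
```

Wrapping the bytes in a struct makes the compiler refuse a direct `a.frag < b.frag` comparison, a rough userspace analogue of what sparse reports for restricted __le32 types.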

08 Dec, 2016

1 commit

  • This function sets req->r_locked_dir which is supposed to indicate to
    ceph_fill_trace that the parent's i_rwsem is locked for write.
    Unfortunately, there is no guarantee that the dir will be locked when
    d_revalidate is called, so we really don't want ceph_fill_trace to do
    any dcache manipulation from this context. Clear req->r_locked_dir since
    it's clearly not safe to do that.

    What we really want to know with d_revalidate is whether the dentry
    still points to the same inode. ceph_fill_trace installs a pointer to
    the inode in req->r_target_inode, so we can just compare that to
    d_inode(dentry) to see if it's the same one after the lookup.

    Also, since we aren't generally interested in the parent here, we can
    switch to using a GETATTR to hint that to the MDS, which also means that
    we only need to reserve one cap.

    Finally, just remove the d_unhashed check. That's really outside the
    purview of a filesystem's d_revalidate. If the thing became unhashed
    while we're checking it, then that's up to the VFS to handle anyway.

    Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
    Link: http://tracker.ceph.com/issues/18041
    Reported-by: Donatas Abraitis
    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
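The revalidation check described above reduces to an identity comparison: once the GETATTR request has filled in its target inode, the dentry is still valid only if it points at that same inode object. A minimal sketch, assuming hypothetical simplified `struct inode`, `struct dentry` and `struct mds_request` types (the real code compares req->r_target_inode with d_inode(dentry)); negative dentries are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified types. */
struct inode { unsigned long ino; };
struct dentry { struct inode *d_inode; };
struct mds_request {
	int result;			/* 0 or a negative errno */
	struct inode *r_target_inode;	/* inode the lookup resolved to */
};

/* Valid only if the lookup succeeded and resolved to the very same
 * inode object that the dentry already points at. */
static bool dentry_still_valid(const struct dentry *dentry,
			       const struct mds_request *req)
{
	if (req->result < 0)
		return false;
	return req->r_target_inode == dentry->d_inode;
}
```

Comparing pointers rather than inode numbers is the stricter test: it answers "is this still the same in-core inode", which is what the dentry cache cares about.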

11 Nov, 2016

1 commit

  • Splice read/write implementation changed recently. When using
    generic_file_splice_read(), iov_iter with type == ITER_PIPE is
    passed to filesystem's read_iter callback. But ceph_sync_read()
    can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter
    expects pages from page cache).

    Fixing ceph_sync_read() would require a big patch, so use the default
    splice read callback for now.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     

18 Oct, 2016

3 commits

  • Fixes the following sparse warning:

    fs/ceph/xattr.c:19:28: warning:
    symbol 'ceph_other_xattr_handler' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Signed-off-by: Ilya Dryomov

    Wei Yongjun
     
  • fs/ceph/super.c: In function ‘ceph_real_mount’:
    fs/ceph/super.c:818: warning: ‘root’ may be used uninitialized in this function

    If s_root is already valid, the dentry pointer root is never
    initialized, yet it is returned by ceph_real_mount(). This will cause a
    crash later when the caller dereferences the pointer.

    Fixes: ce2728aaa82bbeba ("ceph: avoid accessing / when mounting a subpath")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Yan, Zheng

    Geert Uytterhoeven
     
    The following sequence of events triggers the race:

    - client readdir frag 0* -> got item 'A'
    - MDS merges frag 0* and frag 1*
    - client sends readdir request (frag 1*, offset 2, readdir_start 'A')
    - MDS replies with items (that are after item 'A') in frag *

    Link: http://tracker.ceph.com/issues/17286
    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
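The uninitialized-pointer warning above comes from a two-branch function in which only one branch assigns the dentry that is returned. A minimal sketch of a shape where every path assigns it, assuming hypothetical `struct dentry`, `struct super_block`, `dget()` and `real_mount()` definitions (simplified; this is not the actual ceph_real_mount() code). Reusing the existing root takes a reference so the caller can drop it the same way on both paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified dentry with a reference count. */
struct dentry {
	int refcount;
};

struct super_block {
	struct dentry *s_root;
};

/* Stand-in for dget(): take an extra reference. */
static struct dentry *dget(struct dentry *d)
{
	if (d)
		d->refcount++;
	return d;
}

/* Returns the root dentry with a reference held for the caller.
 * 'fresh' plays the role of the dentry a first mount would look up. */
static struct dentry *real_mount(struct super_block *sb, struct dentry *fresh)
{
	struct dentry *root;	/* no initializer: every path must assign */

	if (!sb->s_root) {
		/* First mount: the superblock keeps one ref, the caller gets one. */
		fresh->refcount = 1;
		sb->s_root = fresh;
		root = dget(fresh);
	} else {
		/* Already mounted: the buggy version left 'root' unset here. */
		root = dget(sb->s_root);
	}
	return root;
}
```

Leaving `root` without a dummy initializer is deliberate: if a future branch forgets to assign it, the compiler's may-be-uninitialized warning still fires.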

16 Oct, 2016

1 commit

    If __ceph_do_getattr returns an error and the retry_op in
    ceph_read_iter is not READ_INLINE, it's possible to invoke
    __free_page on a NULL page, which naturally leads to a crash.
    This can happen when, for example, a process waiting on an MDS reply
    receives SIGTERM.

    Fix this by explicitly checking whether the page is set or not.

    Cc: stable@vger.kernel.org # 3.19+
    Signed-off-by: Nikolay Borisov
    Reviewed-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Nikolay Borisov
     

11 Oct, 2016

4 commits

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Al Viro
     
  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds
     
  • Pull Ceph updates from Ilya Dryomov:
    "The big ticket item here is support for rbd exclusive-lock feature,
    with maintenance operations offloaded to userspace (Douglas Fuller,
    Mike Christie and myself). Another block device bullet is a series
    fixing up layering error paths (myself).

    On the filesystem side, we've got patches that improve our handling of
    buffered vs dio write races (Neil Brown) and a few assorted fixes from
    Zheng. Also included a couple of random cleanups and a minor CRUSH
    update"

    * tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits)
    crush: remove redundant local variable
    crush: don't normalize input of crush_ln iteratively
    libceph: ceph_build_auth() doesn't need ceph_auth_build_hello()
    libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello()
    ceph: fix description for rsize and rasize mount options
    rbd: use kmalloc_array() in rbd_header_from_disk()
    ceph: use list_move instead of list_del/list_add
    ceph: handle CEPH_SESSION_REJECT message
    ceph: avoid accessing / when mounting a subpath
    ceph: fix mandatory flock check
    ceph: remove warning when ceph_releasepage() is called on dirty page
    ceph: ignore error from invalidate_inode_pages2_range() in direct write
    ceph: fix error handling of start_read()
    rbd: add rbd_obj_request_error() helper
    rbd: img_data requests don't own their page array
    rbd: don't call rbd_osd_req_format_read() for !img_data requests
    rbd: rework rbd_img_obj_exists_submit() error paths
    rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback()
    rbd: move bumping img_request refcount into rbd_obj_request_submit()
    rbd: mark the original request as done if stat request fails
    ...

    Linus Torvalds
     

08 Oct, 2016

2 commits


03 Oct, 2016

7 commits

  • Use list_move() instead of list_del() + list_add().

    Signed-off-by: Wei Yongjun
    Signed-off-by: Ilya Dryomov

    Wei Yongjun
     
  • Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Accessing / causes a failure if the client has caps that restrict the path.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • If O_DIRECT writes are racing with buffered writes, then
    the call to invalidate_inode_pages2_range() can call ceph_releasepage()
    on dirty pages.

    Most filesystems hold inode_lock() across O_DIRECT writes so they do not
    suffer this race, but cephfs deliberately drops the lock, and opens a window
    for the race.

    This race can be triggered with the generic/036 test from the xfstests
    test suite. It doesn't happen every time, but it does happen often.

    As the possibility is expected, remove the warning, and instead include
    the PageDirty() status in the debug message.

    Signed-off-by: NeilBrown
    Reviewed-by: Jeff Layton
    Reviewed-by: Yan, Zheng

    NeilBrown
     
  • This call can fail if there are dirty pages. The preceding call to
    filemap_write_and_wait_range() will normally remove dirty pages, but
    as inode_lock() is not held over calls to ceph_direct_read_write(), it
    could race with non-direct writes and pages could be dirtied
    immediately after filemap_write_and_wait_range() returns.

    If there are dirty pages, they will be removed by the subsequent call
    to truncate_inode_pages_range(), so having them here is not a problem.

    If the 'ret' value is left holding an error, then in the async IO case
    (aio_req is not NULL) the loop that would normally call
    ceph_osdc_start_request() will see the error in 'ret' and abort all
    requests. This doesn't seem like correct behaviour.

    So use separate 'ret2' instead of overloading 'ret'.

    Signed-off-by: NeilBrown
    Reviewed-by: Jeff Layton
    Reviewed-by: Yan, Zheng

    NeilBrown
     
  • If start_read() fails to add a page to the page cache or fails to send
    the OSD request, it should call put_page() (instead of free_page()) for
    the relevant pages.

    Besides, start_read() needs to cancel the fscache readpage if it fails
    to send the OSD request.

    Signed-off-by: Yan, Zheng
    Reported-by: Zhi Zhang

    Yan, Zheng
     
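list_move() unlinks an entry from whatever list it is on and inserts it after a new head in a single call, which is exactly the list_del() + list_add() pair it replaces. A minimal userspace re-implementation of the relevant <linux/list.h> helpers, simplified for illustration (the kernel versions add debug checks, and list_del() also poisons the unlinked pointers):

```c
#include <assert.h>

/* Minimal circular doubly linked list modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link 'entry' between two nodes that are known to be adjacent. */
static void __list_add(struct list_head *entry,
		       struct list_head *prev, struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	__list_add(entry, head, head->next);
}

/* Unlink 'entry' by joining its two neighbours. */
static void __list_del_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Remove 'entry' from its current list and add it after 'head'. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	__list_del_entry(entry);
	list_add(entry, head);
}
```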

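The 'ret2' change in the 03 Oct batch keeps a tolerated error out of a variable that later code treats as fatal. A minimal sketch of that pattern, assuming hypothetical `invalidate_range()` and `submit_request()` helpers standing in for invalidate_inode_pages2_range() and ceph_osdc_start_request() (the real ceph_direct_read_write() loop is considerably more involved):

```c
#include <assert.h>
#include <errno.h>

static int submitted;	/* counts requests actually sent */

/* Hypothetical: fails with -EBUSY when a racing buffered write has
 * re-dirtied pages; the caller may safely ignore that. */
static int invalidate_range(int dirty_pages)
{
	return dirty_pages ? -EBUSY : 0;
}

/* Hypothetical request submission; always succeeds in this sketch. */
static int submit_request(void)
{
	submitted++;
	return 0;
}

static int direct_write(int nreqs, int dirty_pages)
{
	int ret = 0, ret2;
	int i;

	/* Keep the tolerated result in its own variable: leftover dirty
	 * pages are removed later by truncation. Storing it in 'ret'
	 * would make the loop below abort every request. */
	ret2 = invalidate_range(dirty_pages);
	(void)ret2;

	for (i = 0; i < nreqs; i++) {
		if (ret < 0)	/* stop only on a real submission error */
			break;
		ret = submit_request();
	}
	return ret;
}
```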
28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
     
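Besides reading the clock, current_time() at the time of this change truncated the result to the superblock's timestamp granularity (s_time_gran, in nanoseconds) via timespec_trunc(), which is one reason it needs the inode rather than a bare timestamp. A userspace sketch of that truncation step; `struct fs_time`, the simplified `struct inode`/`struct super_block`, and `current_time_at()` (which takes "now" as a parameter to stay deterministic) are hypothetical.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical timestamp type, so the sketch needs no POSIX headers. */
struct fs_time {
	long long tv_sec;
	long tv_nsec;
};

struct super_block { long s_time_gran; };	/* granularity in ns */
struct inode { struct super_block *i_sb; };

/* Truncate to 'gran' nanoseconds, following timespec_trunc()'s rules. */
static struct fs_time trunc_to_gran(struct fs_time t, long gran)
{
	if (gran == 1) {
		/* nanosecond resolution: nothing to do */
	} else if (gran == NSEC_PER_SEC) {
		t.tv_nsec = 0;			/* whole seconds only */
	} else if (gran > 1 && gran < NSEC_PER_SEC) {
		t.tv_nsec -= t.tv_nsec % gran;
	}
	/* other values are invalid: the kernel warns, this sketch ignores them */
	return t;
}

/* current_time(inode)-style helper: the inode gives access to i_sb. */
static struct fs_time current_time_at(const struct inode *inode,
				      struct fs_time now)
{
	return trunc_to_gran(now, inode->i_sb->s_time_gran);
}
```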

27 Sep, 2016

2 commits

  • Generated patch:

    sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
    sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • This is trivial to do:

    - add flags argument to foo_rename()
    - check if flags is zero
    - assign foo_rename() to .rename2 instead of .rename

    This doesn't mean it's impossible to support RENAME_NOREPLACE for these
    filesystems, but it is not as trivial as it is for local filesystems.
    RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
    for a file to be created on one host while it is overwritten by rename on
    another host).

    Filesystems converted:

    9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.

    After this, we can get rid of the duplicate interfaces for rename.

    Signed-off-by: Miklos Szeredi
    Acked-by: Greg Kroah-Hartman
    Acked-by: David Howells [AFS]
    Acked-by: Mike Marshall
    Cc: Eric Van Hensbergen
    Cc: Ilya Dryomov
    Cc: Jan Harkes
    Cc: Tyler Hicks
    Cc: Oleg Drokin
    Cc: Trond Myklebust
    Cc: Mark Fasheh

    Miklos Szeredi
     
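The three-step conversion listed above, sketched for a hypothetical filesystem `foo`. The opaque types and the empty rename body are placeholders; only the `flags` check reflects the described change, and -EINVAL is the error renameat2(2) documents for flags a filesystem does not support. RENAME_NOREPLACE is defined locally with its uapi value so the snippet compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)	/* value from include/uapi/linux/fs.h */
#endif

/* Opaque placeholders for the VFS types. */
struct inode;
struct dentry;

/* foo_rename() now takes the extra 'flags' argument that .rename2
 * receives, and refuses any flag it cannot honour. */
static int foo_rename(struct inode *old_dir, struct dentry *old_dentry,
		      struct inode *new_dir, struct dentry *new_dentry,
		      unsigned int flags)
{
	if (flags)
		return -EINVAL;

	/* ... the filesystem's existing rename logic would go here ... */
	(void)old_dir;
	(void)old_dentry;
	(void)new_dir;
	(void)new_dentry;
	return 0;
}
```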

22 Sep, 2016

3 commits

  • inode_change_ok() will be responsible for clearing capabilities and IMA
    extended attributes and as such will need the dentry. Give it as an
    argument to inode_change_ok() instead of an inode. Also rename
    inode_change_ok() to setattr_prepare() to better reflect that it also
    makes some modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • To avoid clearing of capabilities or security related extended
    attributes too early, inode_change_ok() will need to take dentry instead
    of inode. ceph_setattr() has the dentry easily available but
    __ceph_setattr() is also called from ceph_set_acl() where dentry is not
    easily available. Luckily that call path does not need inode_change_ok()
    to be called anyway. So reorganize functions a bit so that
    inode_change_ok() is called only from paths where dentry is available.

    Reviewed-by: Christoph Hellwig
    Acked-by: Jeff Layton
    Signed-off-by: Jan Kara

    Jan Kara
     
  • When file permissions are modified via chmod(2) and the user is not in
    the owning group or capable of CAP_FSETID, the setgid bit is cleared in
    inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
    permissions as well as the new ACL, but doesn't clear the setgid bit in
    a similar way; this allows bypassing the check in chmod(2). Fix that.

    References: CVE-2016-7097
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: Jan Kara
    Signed-off-by: Andreas Gruenbacher

    Jan Kara
     

05 Sep, 2016

1 commit

  • Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
    modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
    fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
    comparison operator with an assignment one.

    This looks like a typo which is reported by clang when building the
    kernel with some warning flags:

    fs/ceph/dir.c:600:22: error: using the result of an assignment as a
    condition without parentheses [-Werror,-Wparentheses]
    } else if (fi->frag |= fpos_frag(new_pos)) {
    ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
    fs/ceph/dir.c:600:22: note: place parentheses around the assignment
    to silence this warning
    } else if (fi->frag |= fpos_frag(new_pos)) {
    ^
    ( )
    fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
    assignment into an inequality comparison
    } else if (fi->frag |= fpos_frag(new_pos)) {
    ^~
    !=

    Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
    Signed-off-by: Nicolas Iooss
    Signed-off-by: Ilya Dryomov

    Nicolas Iooss
     
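The difference between the two operators is easy to show outside the kernel. In the sketch below, `frag_changed_buggy()` and `frag_changed_fixed()` are hypothetical helpers modelling the condition in need_reset_readdir(): the `|=` form overwrites the stored frag and only tests whether the OR-ed result is nonzero, while `!=` is the side-effect-free comparison that was intended.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy form: ORs new_frag into *frag (mutating it), then reports a
 * "change" whenever the result is nonzero. */
static bool frag_changed_buggy(unsigned int *frag, unsigned int new_frag)
{
	return (*frag |= new_frag) != 0;
}

/* Intended form: a pure comparison that leaves *frag untouched. */
static bool frag_changed_fixed(const unsigned int *frag, unsigned int new_frag)
{
	return *frag != new_frag;
}
```

With the buggy form, an unchanged nonzero frag still reports a change, and a genuinely different frag is silently merged into the old value instead of being compared with it.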

09 Aug, 2016

2 commits


03 Aug, 2016

1 commit

  • Pull Ceph updates from Ilya Dryomov:
    "The highlights are:

    - RADOS namespace support in libceph and CephFS (Zheng Yan and
    myself). The stopgaps added in 4.5 to deny access to inodes in
    namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
    bit is now fully supported

    - A large rework of the MDS cap flushing code (Zheng Yan)

    - Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
    overly pessimistic before, bailing at the first sight of LOOKUP_RCU

    On top of that we've got a few CephFS bug fixes, a couple of cleanups
    and Arnd's workaround for a weird genksyms issue"

    * tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
    ceph: fix symbol versioning for ceph_monc_do_statfs
    ceph: Correctly return NXIO errors from ceph_llseek
    ceph: Mark the file cache as unreclaimable
    ceph: optimize cap flush waiting
    ceph: cleanup ceph_flush_snaps()
    ceph: kick cap flushes before sending other cap message
    ceph: introduce an inode flag to indicates if snapflush is needed
    ceph: avoid sending duplicated cap flush message
    ceph: unify cap flush and snapcap flush
    ceph: use list instead of rbtree to track cap flushes
    ceph: update types of some local varibles
    ceph: include 'follows' of pending snapflush in cap reconnect message
    ceph: update cap reconnect message to version 3
    ceph: mount non-default filesystem by name
    libceph: fsmap.user subscription support
    ceph: handle LOOKUP_RCU in ceph_d_revalidate
    ceph: allow dentry_lease_is_valid to work under RCU walk
    ceph: clear d_fsinfo pointer under d_lock
    ceph: remove ceph_mdsc_lease_release
    ceph: don't use ->d_time
    ...

    Linus Torvalds
     

29 Jul, 2016

1 commit

  • This changes the vfs dentry hashing to mix in the parent pointer at the
    _beginning_ of the hash, rather than at the end.

    That actually improves both the hash and the code generation, because we
    can move more of the computation to the "static" part of the dcache
    setup, and do less at lookup runtime.

    It turns out that a lot of other hash users also really wanted to mix in
    a base pointer as a 'salt' for the hash, and so the slightly extended
    interface ends up working well for other cases too.

    Users that want a string hash that is purely about the string pass in a
    'salt' pointer of NULL.

    * merge branch 'salted-string-hash':
    fs/dcache.c: Save one 32-bit multiply in dcache lookup
    vfs: make the string hashes salt the hash

    Linus Torvalds
     
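The salting idea can be illustrated with any byte-wise hash whose state is seeded from the salt before the string is consumed, so the parent pointer contributes from the first step instead of being folded in at the end. The sketch uses 64-bit FNV-1a purely for illustration; it is not the kernel's full_name_hash() algorithm, and `salted_hash()` is a hypothetical name. A NULL salt yields a plain string hash, matching the convention described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Hash 'len' bytes of 'name', seeding the state with the salt pointer.
 * A NULL salt leaves the standard FNV-1a offset basis unchanged. */
static uint64_t salted_hash(const void *salt, const char *name, size_t len)
{
	/* The salt is mixed in first, before any byte of the name. */
	uint64_t h = FNV64_OFFSET ^ (uint64_t)(uintptr_t)salt;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= FNV64_PRIME;
	}
	return h;
}
```

Because each FNV-1a step is a bijection on the 64-bit state, two different salts always give different hashes for the same name in this sketch, so identical names under different parents land in different places.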

28 Jul, 2016

5 commits

  • ceph_llseek does not correctly return NXIO errors because the 'out' path
    always returns 'offset'.

    Fixes: 06222e491e66 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek")
    Signed-off-by: Phil Turnbull
    Signed-off-by: Yan, Zheng

    Phil Turnbull
     
  • Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so
    that it can satisfy its internal needs. Inspecting the code shows that
    most of the caches are indeed reclaimable since they are directly
    related to the generic inode/dentry shrinkers. However, one of the
    caches used to satisfy struct file is not reclaimable since its
    entries are freed only when the last reference to the file is
    dropped. If a heavily loaded node opens a lot of files it can
    introduce non-trivial discrepancies between memory shown as reclaimable
    and what is actually reclaimed when drop_caches is used.

    Fix this by removing the reclaimable flag for the file's cache.

    Signed-off-by: Nikolay Borisov
    Signed-off-by: Yan, Zheng

    Nikolay Borisov
     
  • Add a 'wake' flag to the ceph_cap_flush struct, which indicates whether
    someone is waiting for it to finish. When getting a flush ack message,
    we check the 'wake' flag in the corresponding ceph_cap_flush struct to
    decide whether we should wake up waiters. One corner case is that the
    acked cap flush has its 'wake' flag set, but it is not the first one
    on the flushing list. We do not wake up waiters in this case; instead,
    we set the 'wake' flag of the preceding ceph_cap_flush struct.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • This patch divides __ceph_flush_snaps() into two stages. In the first
    stage, __ceph_flush_snaps() assigns snapcap flush TIDs and adds them
    to the cap flush lists. __ceph_flush_snaps() keeps holding the
    i_ceph_lock in this stage, so the inode's auth cap cannot change. In
    the second stage, __ceph_flush_snaps() sends flushsnap cap messages.
    i_ceph_lock is unlocked before sending each cap message. If the auth
    cap changes in the middle, __ceph_flush_snaps() just stops. This is OK
    because kick_flushing_inode_caps() will re-send flushsnap cap messages
    to the inode's new auth MDS.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • If ceph_check_caps() wants to send a cap message to a recovering MDS,
    make sure it kicks cap flushes first.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
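The ceph_llseek() bug described above follows a familiar error-path pattern: the error is stored in one variable while the shared `out` label returns another. A simplified sketch of the fixed shape, assuming a hypothetical `seek_data()` helper that returns -ENXIO when the requested offset is at or past end of file (a rough model of SEEK_DATA; `my_llseek()` is not ceph_llseek()):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical SEEK_DATA-like lookup: -ENXIO at or past end of file. */
static long long seek_data(long long offset, long long size)
{
	return offset >= size ? -ENXIO : offset;
}

static long long my_llseek(long long offset, long long size)
{
	long long ret;

	ret = seek_data(offset, size);
	if (ret < 0)
		goto out;
	offset = ret;
	/* ... on success the real code would also update f_pos here ... */
out:
	/* The buggy shape was an unconditional 'return offset;', which
	 * turned -ENXIO into a seemingly valid position. */
	return ret < 0 ? ret : offset;
}
```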