03 Nov, 2015

1 commit


03 Jul, 2015

1 commit

  • Pull Ceph updates from Sage Weil:
    "We have a pile of bug fixes from Ilya, including a few patches that
    sync up the CRUSH code with the latest from userspace.

    There is also a long series from Zheng that fixes various issues with
    snapshots, inline data, and directory fsync, some simplification and
    improvement in the cap release code, and a rework of the caching of
    directory contents.

    To top it off there are a few small fixes and cleanups from Benoit and
    Hong"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
    rbd: use GFP_NOIO in rbd_obj_request_create()
    crush: fix a bug in tree bucket decode
    libceph: Fix ceph_tcp_sendpage()'s more boolean usage
    libceph: Remove spurious kunmap() of the zero page
    rbd: queue_depth map option
    rbd: store rbd_options in rbd_device
    rbd: terminate rbd_opts_tokens with Opt_err
    ceph: fix ceph_writepages_start()
    rbd: bump queue_max_segments
    ceph: rework dcache readdir
    crush: sync up with userspace
    crush: fix crash from invalid 'take' argument
    ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
    ceph: pre-allocate data structure that tracks caps flushing
    ceph: re-send flushing caps (which are revoked) in reconnect stage
    ceph: send TID of the oldest pending caps flush to MDS
    ceph: track pending caps flushing globally
    ceph: track pending caps flushing accurately
    libceph: fix wrong name "Ceph filesystem for Linux"
    ceph: fix directory fsync
    ...

    Linus Torvalds
     

25 Jun, 2015

6 commits

  • Previously our dcache readdir code relied on the child dentries in the
    directory dentry's d_subdirs list being sorted by offset in descending
    order. When adding dentries to the dcache, if a dentry already existed,
    our readdir code moved it to the head of the directory dentry's
    d_subdirs list. This design relies on dcache internals.
    Al Viro suggests using ncpfs's approach: keeping an array of pointers
    to dentries in the page cache of the directory inode. The validity of
    those pointers is represented by the directory inode's complete and
    ordered flags. When a dentry gets pruned, we clear the directory
    inode's complete flag in the d_prune() callback. Before moving a dentry
    to another directory, we clear the ordered flag for both the old and
    the new directory.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
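
The flag handling described in this commit message can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel implementation: `DirCache`, `fill_from_readdir`, `prune`, `cached_readdir`, and `move_dentry` are hypothetical names, and a plain Python list stands in for the array of dentry pointers kept in the directory inode's page cache.

```python
class DirCache:
    """Toy stand-in for a directory inode's cached readdir state."""

    def __init__(self):
        self.entries = []       # models the array of dentry pointers
        self.complete = False   # all children are known to be cached
        self.ordered = False    # cached entries are in readdir order

    def fill_from_readdir(self, names):
        # A full readdir reply populates the array and validates it.
        self.entries = list(names)
        self.complete = True
        self.ordered = True

    def prune(self, name):
        # Models the d_prune() callback: a cached dentry is going away,
        # so the array can no longer be trusted as complete.
        if name in self.entries:
            self.entries.remove(name)
        self.complete = False

    def cached_readdir(self):
        # The cached array is used only while both flags are set.
        if self.complete and self.ordered:
            return list(self.entries)
        return None  # caller has to fall back to asking the MDS


def move_dentry(old_dir, new_dir):
    # Before a dentry moves between directories, the ordering of both
    # directories is invalidated (hypothetical helper).
    old_dir.ordered = False
    new_dir.ordered = False
```

In this model a `None` result from `cached_readdir()` means the cached pointers are not valid and the listing has to be fetched again.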
     
  • Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Track pending caps flushing globally so that we know the TID of the
    oldest pending caps flush. A later patch will send this information to
    the MDS, so that the MDS can trim its list of completed caps flushes.

    Tracking pending caps flushing globally also simplifies the syncfs code.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
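
A rough user-space sketch of the idea, assuming a single client and monotonically increasing flush TIDs. `FlushTracker`, `sync_done`, and `trim_completed` are hypothetical names used only for illustration, and treating "no pending flush" as "trim nothing" is an assumption of this sketch rather than something stated in the commit message.

```python
class FlushTracker:
    """Toy model of client-wide tracking of pending cap flushes."""

    def __init__(self):
        self.last_tid = 0
        self.pending = set()    # TIDs of flushes not yet acknowledged

    def start_flush(self):
        self.last_tid += 1
        self.pending.add(self.last_tid)
        return self.last_tid

    def finish_flush(self, tid):
        self.pending.discard(tid)

    def oldest_pending_tid(self):
        return min(self.pending) if self.pending else None

    def sync_done(self, want_tid):
        # A syncfs-style wait can finish once no flush with a TID up to
        # want_tid is still pending.
        oldest = self.oldest_pending_tid()
        return oldest is None or oldest > want_tid


def trim_completed(completed_tids, oldest_pending_tid):
    # MDS-side view: completed flushes older than the client's oldest
    # pending flush will never be re-sent, so they can be forgotten.
    if oldest_pending_tid is None:
        return set(completed_tids)   # assumption: trim nothing
    return {t for t in completed_tids if t >= oldest_pending_tid}
```

With one global set of pending TIDs, a sync only needs to compare the oldest pending TID against the TID it is waiting for.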
     
  • Previously we did not track accurate TIDs for flushing caps. When the
    MDS fails over, we have no choice but to re-send all flushing caps
    with a new TID. This can cause problems because the MDS may have
    already flushed some caps and issued the same caps to another client.
    The re-sent cap flush has a new TID, which makes the MDS unable to
    detect whether it has already processed the cap flush.

    This patch adds code to track pending caps flushing accurately.
    When a cap flush needs to be re-sent, we use its original flush
    TID.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
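
The reason the original TID matters can be shown with a toy model of the receiving side. This is a sketch under assumptions: `ToyMds` and `handle_flush` are hypothetical, a single client is modeled, and the set of completed TIDs is assumed to survive the failover.

```python
class ToyMds:
    """Toy MDS that remembers which flush TIDs it has already processed."""

    def __init__(self):
        self.completed = set()   # TIDs of flushes already applied
        self.applied = []        # log of flushes that were actually applied

    def handle_flush(self, tid, caps):
        if tid in self.completed:
            # Same TID seen before: recognized as a re-send, not applied again.
            return "duplicate"
        self.applied.append((tid, caps))
        self.completed.add(tid)
        return "applied"
```

Re-sending a flush with its original TID is detected as a duplicate in this model, whereas re-sending it under a fresh TID would be applied a second time, which is the problem the commit message describes.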
     
  • When the ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps()
    accesses the snap realm's cached_context, so we need to take the read
    lock of snap_rwsem.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Signed-off-by: Yan, Zheng

    Yan, Zheng
     

11 May, 2015

1 commit


16 Apr, 2015

1 commit


20 Feb, 2015

1 commit

  • Pull Ceph changes from Sage Weil:
    "On the RBD side, there is a conversion to blk-mq from Christoph,
    several long-standing bug fixes from Ilya, and some cleanup from
    Rickard Strandqvist.

    On the CephFS side there is a long list of fixes from Zheng, including
    improved session handling, a few IO path fixes, some dcache management
    correctness fixes, and several blocking while !TASK_RUNNING fixes.

    The core code gets a few cleanups and Chaitanya has added support for
    TCP_NODELAY (which has been used on the server side for ages but we
    somehow missed on the kernel client).

    There is also an update to MAINTAINERS to fix up some email addresses
    and reflect that Ilya and Zheng are doing most of the maintenance for
    RBD and CephFS these days. Do not be surprised to see a pull request
    come from one of them in the future if I am unavailable for some
    reason"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
    MAINTAINERS: update Ceph and RBD maintainers
    libceph: kfree() in put_osd() shouldn't depend on authorizer
    libceph: fix double __remove_osd() problem
    rbd: convert to blk-mq
    ceph: return error for traceless reply race
    ceph: fix dentry leaks
    ceph: re-send requests when MDS enters reconnecting stage
    ceph: show nocephx_require_signatures and notcp_nodelay options
    libceph: tcp_nodelay support
    rbd: do not treat standalone as flatten
    ceph: fix atomic_open snapdir
    ceph: properly mark empty directory as complete
    client: include kernel version in client metadata
    ceph: provide seperate {inode,file}_operations for snapdir
    ceph: fix request time stamp encoding
    ceph: fix reading inline data when i_size > PAGE_SIZE
    ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
    ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
    ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
    rbd: fix error paths in rbd_dev_refresh()
    ...

    Linus Torvalds
     

19 Feb, 2015

3 commits


21 Jan, 2015

1 commit

  • Now that we never use the backing_dev_info pointer in struct address_space
    we can simply remove it and save 4 to 8 bytes in every inode.

    Signed-off-by: Christoph Hellwig
    Acked-by: Ryusuke Konishi
    Reviewed-by: Tejun Heo
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

18 Dec, 2014

4 commits

  • Pull ceph updates from Sage Weil:
    "The big item here is support for inline data for CephFS and for
    message signatures from Zheng. There are also several bug fixes,
    including interrupted flock request handling, 0-length xattrs, mksnap,
    cached readdir results, and a message version compat field. Finally
    there are several cleanups from Ilya, Dan, and Markus.

    Note that there is another series coming soon that fixes some bugs in
    the RBD 'lingering' requests, but it isn't quite ready yet"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
    ceph: fix setting empty extended attribute
    ceph: fix mksnap crash
    ceph: do_sync is never initialized
    libceph: fixup includes in pagelist.h
    ceph: support inline data feature
    ceph: flush inline version
    ceph: convert inline data to normal data before data write
    ceph: sync read inline data
    ceph: fetch inline data when getting Fcr cap refs
    ceph: use getattr request to fetch inline data
    ceph: add inline data to pagecache
    ceph: parse inline data in MClientReply and MClientCaps
    libceph: specify position of extent operation
    libceph: add CREATE osd operation support
    libceph: add SETXATTR/CMPXATTR osd operations support
    rbd: don't treat CEPH_OSD_OP_DELETE as extent op
    ceph: remove unused stringification macros
    libceph: require cephx message signature by default
    ceph: introduce global empty snap context
    ceph: message versioning fixes
    ...

    Linus Torvalds
     
  • Add a new parameter 'locked_page' to ceph_do_getattr(). If the getattr
    reply contains inline data, it will be copied to the page.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Request replies and cap messages can contain inline data. Add the inline
    data to the page cache if there is an Fc cap.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • After creating/deleting/renaming a file, the offsets of sibling dentries
    may change, so we cannot use cached dentries to satisfy readdir. But we
    can still use the cached dentries to conclude -ENOENT for lookup.

    This patch introduces a new inode flag indicating whether child dentries
    are ordered. The flag is set at the same time as marking a directory
    complete. After creating/deleting/renaming a file, we clear the flag on
    the directory inode. This prevents ceph_readdir() from using cached
    dentries to satisfy the readdir syscall.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
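
The split between the two flags can be illustrated with a minimal user-space sketch. `DirState`, `create`, `lookup`, and `can_serve_readdir_from_cache` are hypothetical names, and the return convention (0 for found, a negative errno for a known miss, `None` for "ask the MDS") is an assumption of this sketch.

```python
import errno


class DirState:
    """Toy model of a directory with 'complete' and 'ordered' flags."""

    def __init__(self, names):
        self.names = set(names)
        self.complete = True    # every child is known to be cached
        self.ordered = True     # cached children are in readdir order

    def create(self, name):
        # Creating a file may shift sibling offsets, so only the
        # ordering is invalidated; completeness is kept.
        self.names.add(name)
        self.ordered = False

    def lookup(self, name):
        if name in self.names:
            return 0
        if self.complete:
            return -errno.ENOENT   # negative answer without asking the MDS
        return None                # unknown: the MDS has to be asked

    def can_serve_readdir_from_cache(self):
        return self.complete and self.ordered
```

After `create()`, lookups for missing names can still be answered locally, while readdir can no longer be served from the cache.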
     

20 Nov, 2014

2 commits


04 Nov, 2014

1 commit


15 Oct, 2014

2 commits

  • Both ceph_update_writeable_page and ceph_setattr verify the file size
    against the maximum size ceph supports.
    There are two callers of ceph_update_writeable_page: ceph_write_begin and
    ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in
    generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, there is no
    chance to change the file size through mmap. Likewise, we have already verified
    the size in inode_change_ok when we call ceph_setattr.
    So let's remove the redundant code for max file size verification.

    Signed-off-by: Chao Yu
    Reviewed-by: Yan, Zheng

    Chao Yu
     
  • The following sequence of events can happen:
    - The client releases an inode and queues a cap release message.
    - A 'lookup' reply brings the same inode back, but the reply
      doesn't contain xattrs because the MDS didn't receive the cap release
      message and thought the client already had up-to-date xattrs.

    The fix is to force sending a getattr request to the MDS if xattrs_version
    is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so the MDS knows the
    client does not have xattrs.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
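
The fix can be sketched as follows. This is a simplified model, not the kernel code: `ToyInode`, `get_xattr`, and the `send_getattr` callback are hypothetical, and the string assigned to `CEPH_STAT_CAP_XATTR` is a placeholder rather than the real mask value.

```python
CEPH_STAT_CAP_XATTR = "xattr"   # placeholder, not the real mask value


class ToyInode:
    def __init__(self):
        self.xattrs_version = 0   # 0 means no xattrs were ever received
        self.xattrs = {}


def get_xattr(inode, name, send_getattr):
    # send_getattr(mask) is a caller-supplied stand-in for the MDS request;
    # it is assumed to return (version, xattrs).
    if inode.xattrs_version == 0:
        # The client has no xattrs at all, so it must ask explicitly.
        version, xattrs = send_getattr(CEPH_STAT_CAP_XATTR)
        inode.xattrs_version = version
        inode.xattrs = dict(xattrs)
    return inode.xattrs.get(name)
```

Once a non-zero version has been recorded, later calls in this model are answered from the cached xattrs without another request.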
     

13 Jun, 2014

1 commit

  • Pull Ceph updates from Sage Weil:
    "This has a mix of bug fixes and cleanups.

    Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT
    check when a second rbd image is mapped and a couple memory leaks.
    Zheng fixes several issues with fragmented directories and multiple
    MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's
    patches fix setting and unsetting RBD images read-only.

    Naturally there are several other cleanups mixed in for good measure"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
    rbd: only set disk to read-only once
    rbd: move calls that may sleep out of spin lock range
    rbd: add ioctl for rbd
    ceph: use truncate_pagecache() instead of truncate_inode_pages()
    ceph: include time stamp in every MDS request
    rbd: fix ida/idr memory leak
    rbd: use reference counts for image requests
    rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync()
    rbd: make sure we have latest osdmap on 'rbd map'
    libceph: add ceph_monc_wait_osdmap()
    libceph: mon_get_version request infrastructure
    libceph: recognize poolop requests in debugfs
    ceph: refactor readpage_nounlock() to make the logic clearer
    mds: check cap ID when handling cap export message
    ceph: remember subtree root dirfrag's auth MDS
    ceph: introduce ceph_fill_fragtree()
    ceph: handle cap import atomically
    ceph: pre-allocate ceph_cap struct for ceph_add_cap()
    ceph: update inode fields according to issued caps
    rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    ...

    Linus Torvalds
     

08 Jun, 2014

1 commit


07 Jun, 2014

1 commit


06 Jun, 2014

4 commits


29 Apr, 2014

1 commit

  • When creating a file, ceph_set_dentry_offset() puts the new dentry
    at the end of the directory's d_subdirs, then sets the dentry's offset
    based on the directory's max offset. The offset does not reflect the
    real position of the dentry in the directory. A later readdir reply from
    the MDS may change the dentry's position/offset. This inconsistency
    can cause missing/duplicate entries in the readdir result if readdir
    is partly satisfied by dcache_readdir().

    The fix is to clear the directory's completeness after creating/renaming
    a file. It prevents later readdir calls from using dcache_readdir().

    Fixes: http://tracker.ceph.com/issues/8025
    Signed-off-by: Yan, Zheng
    Reviewed-by: Sage Weil

    Yan, Zheng
     

05 Apr, 2014

3 commits


03 Apr, 2014

2 commits


30 Jan, 2014

1 commit

  • The merge of commit 7221fe4c2ed7 ("ceph: add acl for cephfs") raced with
    upstream changes in the generic POSIX ACL code (eg commit 2aeccbe957d0
    "fs: add generic xattr_acl handlers" and others).

    Some of the fallout was fixed in commit 4db658ea0ca ("ceph: Fix up after
    semantic merge conflict"), but it was incomplete: the set_acl
    inode_operation wasn't getting set, and the prototype needed to be
    adjusted a bit (it doesn't take a dentry anymore).

    Signed-off-by: Sage Weil
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Linus Torvalds

    Sage Weil
     

29 Jan, 2014

1 commit

  • The previous ceph-client merge resulted in ceph not even building,
    because there was a merge conflict that wasn't visible as an actual data
    conflict: commit 7221fe4c2ed7 ("ceph: add acl for cephfs") added support
    for POSIX ACL's into Ceph, but unluckily we also had the VFS tree change
    a lot of the POSIX ACL helper functions to be much more helpful to
    filesystems (see for example commits 2aeccbe957d0 "fs: add generic
    xattr_acl handlers", 5bf3258fd2ac "fs: make posix_acl_chmod more useful"
    and 37bc15392a23 "fs: make posix_acl_create more useful")

    The reason this conflict wasn't obvious was many-fold: because it was a
    semantic conflict rather than a data conflict, it wasn't visible in the
    git merge as a conflict. And because the VFS tree hadn't been in
    linux-next, people hadn't become aware of it that way. And because I
    was at jury duty this morning, I was using my laptop and as a result not
    doing constant "allmodconfig" builds.

    Anyway, this fixes the build and generally removes a fair chunk of the
    Ceph POSIX ACL support code, since the improved helpers seem to match
    really well for Ceph too. But I don't actually have any way to *test*
    the end result, and I was really hoping for some ACK's for this. Oh,
    well.

    Not compiling certainly doesn't make things easier to test, so I'm
    committing this without the acks after having waited for four hours...
    Plus it's what I would have done for the merge had I noticed the
    semantic conflict..

    Reported-by: Dave Jones
    Cc: Sage Weil
    Cc: Guangliang Zhao
    Cc: Li Wang
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Jan, 2014

1 commit

  • Version 3 of the cap export message includes information about the
    imported caps. It allows us to add the imported caps if the
    corresponding cap import message still hasn't been received.

    This allows us to handle the situation where the importer MDS crashes
    and the cap import message is missing.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
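
A minimal sketch of the behavior described above, assuming the client keeps one cap record per MDS for an inode. `handle_cap_export`, `handle_cap_import`, and the dictionary layout are hypothetical, and carrying the issued bits over from the exported cap is an assumption of this model.

```python
def handle_cap_export(caps, exporter, peer=None):
    """caps maps an MDS rank to a cap record for one inode.

    peer models the importer information carried by a version 3 export
    message; it is None for older messages.
    """
    old = caps.pop(exporter, None)
    if old is None or peer is None:
        return
    if peer["mds"] not in caps:
        # The import message has not arrived yet, so the cap is added
        # from the information in the export message.
        caps[peer["mds"]] = {"cap_id": peer["cap_id"], "issued": old["issued"]}


def handle_cap_import(caps, importer, cap_id, issued):
    # A later import message simply installs or refreshes the cap.
    caps[importer] = {"cap_id": cap_id, "issued": issued}
```

If the importer MDS crashes and its import message never arrives, the client in this model still ends up holding a cap record for that MDS.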