13 Jun, 2014

1 commit

  • Pull Ceph updates from Sage Weil:
    "This has a mix of bug fixes and cleanups.

    Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT
    check when a second rbd image is mapped and a couple of memory leaks.
    Zheng fixes several issues with fragmented directories and multiple
    MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's
    patches fix setting and unsetting RBD images read-only.

    Naturally, there are several other cleanups mixed in for good measure."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
    rbd: only set disk to read-only once
    rbd: move calls that may sleep out of spin lock range
    rbd: add ioctl for rbd
    ceph: use truncate_pagecache() instead of truncate_inode_pages()
    ceph: include time stamp in every MDS request
    rbd: fix ida/idr memory leak
    rbd: use reference counts for image requests
    rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync()
    rbd: make sure we have latest osdmap on 'rbd map'
    libceph: add ceph_monc_wait_osdmap()
    libceph: mon_get_version request infrastructure
    libceph: recognize poolop requests in debugfs
    ceph: refactor readpage_nounlock() to make the logic clearer
    mds: check cap ID when handling cap export message
    ceph: remember subtree root dirfrag's auth MDS
    ceph: introduce ceph_fill_fragtree()
    ceph: handle cap import atomically
    ceph: pre-allocate ceph_cap struct for ceph_add_cap()
    ceph: update inode fields according to issued caps
    rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    ...

    Linus Torvalds
     

06 Jun, 2014

3 commits

  • Add ceph_monc_wait_osdmap(), which will block until the osdmap with the
    specified epoch is received or timeout occurs.

    Export both of these as they are going to be needed by rbd.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
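A userspace sketch of the blocking behaviour described above, assuming a POSIX condition variable stands in for the kernel's own wait primitives (the kernel code does not use pthreads). struct map_client, map_client_got_map() and map_client_wait_epoch() are hypothetical names, not libceph API, and the millisecond timeout is chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical client state: the newest osdmap epoch received so far. */
struct map_client {
	pthread_mutex_t lock;
	pthread_cond_t  map_cond;   /* broadcast whenever a map arrives */
	unsigned int    epoch;
};

static void map_client_init(struct map_client *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->map_cond, NULL);
	c->epoch = 0;
}

/* Receive path: record a decoded map and wake any waiters. */
static void map_client_got_map(struct map_client *c, unsigned int epoch)
{
	pthread_mutex_lock(&c->lock);
	if (epoch > c->epoch)
		c->epoch = epoch;
	pthread_cond_broadcast(&c->map_cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Block until a map with at least the requested epoch has been seen,
 * or until timeout_ms elapses. Returns 0 or ETIMEDOUT.
 */
static int map_client_wait_epoch(struct map_client *c, unsigned int epoch,
				 long timeout_ms)
{
	struct timespec deadline;
	int ret = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop: wakeups can be spurious or bring a map that is still too old. */
	while (c->epoch < epoch && ret == 0)
		ret = pthread_cond_timedwait(&c->map_cond, &c->lock, &deadline);
	if (c->epoch >= epoch)
		ret = 0;
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```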
     
  • Add support for mon_get_version requests to libceph. This reuses much
    of the ceph_mon_generic_request infrastructure, with one exception.
    Older OSDs don't set mon_get_version reply hdr->tid even if the
    original request had a non-zero tid, which makes it impossible to
    look up ceph_mon_generic_request contexts by tid in get_generic_reply()
    for such replies. As a workaround, we allocate a reply message on the
    reply path. This can probably interfere with revoke, but I don't see
    a better way.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
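A minimal sketch of the tid-based lookup and the workaround, under stated assumptions: struct pending_req and get_reply_buf() are hypothetical, a plain array stands in for libceph's request tracking, and malloc() stands in for allocating a reply message. A reply carrying tid 0 cannot be matched to a request, so a buffer is allocated on the reply path instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pending request with a reply buffer allocated up front. */
struct pending_req {
	uint64_t tid;
	char    *reply_buf;
	size_t   reply_len;
};

/*
 * Choose a buffer for an incoming reply of len bytes. *allocated is set
 * when the buffer was allocated here and the caller must free() it.
 */
static char *get_reply_buf(struct pending_req *reqs, size_t nreqs,
			   uint64_t tid, size_t len, int *allocated)
{
	*allocated = 0;
	if (tid == 0) {
		/* Peer did not echo the tid: allocate on the reply path. */
		*allocated = 1;
		return malloc(len);
	}
	for (size_t i = 0; i < nreqs; i++) {
		if (reqs[i].tid == tid)
			/* Reuse the pre-allocated buffer only if it is big enough. */
			return reqs[i].reply_len >= len ? reqs[i].reply_buf : NULL;
	}
	return NULL;   /* unknown non-zero tid: caller drops the reply */
}
```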
     
  • Cap messages and request replies from a non-auth MDS may carry stale
    information (the corresponding locks are in LOCK states) even if they
    have the newest inode version, so the client should update inode
    fields according to the issued caps.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
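The rule "update inode fields according to issued caps" can be illustrated with a small sketch. It is a simplification under assumptions: the cap bits are hypothetical (not the real CEPH_CAP_* encoding), and struct attrs and apply_reply() are invented for illustration. A field group is taken from the reply only when the shared cap covering it is issued, regardless of the inode version in the reply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap bits, not the real CEPH_CAP_* values. */
#define CAP_AUTH_SHARED 0x1u   /* covers mode, uid, gid */
#define CAP_FILE_SHARED 0x2u   /* covers size, mtime */

struct attrs {
	uint32_t mode, uid, gid;
	uint64_t size, mtime;
};

/*
 * Copy attributes from an MDS reply only for the field groups whose
 * shared cap is issued; other groups may be stale even if the reply
 * carries the newest inode version.
 */
static void apply_reply(struct attrs *inode, const struct attrs *reply,
			uint32_t issued)
{
	if (issued & CAP_AUTH_SHARED) {
		inode->mode = reply->mode;
		inode->uid = reply->uid;
		inode->gid = reply->gid;
	}
	if (issued & CAP_FILE_SHARED) {
		inode->size = reply->size;
		inode->mtime = reply->mtime;
	}
}
```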
     

07 May, 2014

1 commit


05 Apr, 2014

12 commits


03 Apr, 2014

5 commits

  • Use the newly introduced LOOKUPNAME MDS request to connect child
    inode to its parent directory.

    Signed-off-by: Yan, Zheng
    Reviewed-by: Sage Weil

    Yan, Zheng
     
  • Our longest osd request now contains 3 ops: copyup+hint+write.

    Also, the CEPH_OSD_MAX_OP value in a BUG_ON in rbd_osd_req_callback() was
    hard-coded to 2. Fix it, and switch to rbd_assert while at it.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Reviewed-by: Alex Elder

    Ilya Dryomov
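A small sketch of sizing and bounds-checking an op array from one named constant rather than a literal. MAX_OPS, struct osd_req and req_add_op() are hypothetical stand-ins for CEPH_OSD_MAX_OP and the rbd request code, and standard assert() plays the role of rbd_assert().

```c
#include <assert.h>

/* Hypothetical op codes for the longest request: copyup+hint+write. */
enum op_code { OP_COPYUP, OP_SETALLOCHINT, OP_WRITE };

/* One constant sizes the array and bounds every check (3 ops here). */
#define MAX_OPS 3

struct osd_req {
	unsigned int num_ops;
	enum op_code ops[MAX_OPS];
};

static void req_add_op(struct osd_req *req, enum op_code op)
{
	/* Compare against the named constant, not a hard-coded 2. */
	assert(req->num_ops < MAX_OPS);
	req->ops[req->num_ops++] = op;
}
```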
     
  • This is primarily for rbd's benefit and is supposed to combat
    fragmentation:

    "... knowing that rbd images have a 4m size, librbd can pass a hint
    that will let the osd do the xfs allocation size ioctl on new files so
    that they are allocated in 1m or 4m chunks. We've seen cases where
    users with rbd workloads have very high levels of fragmentation in xfs
    and this would mitigate that and probably have a pretty nice
    performance benefit."

    SETALLOCHINT is considered advisory, so our backwards compatibility
    mechanism here is to set the FAILOK flag for all SETALLOCHINT ops.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Reviewed-by: Alex Elder

    Ilya Dryomov
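A sketch of the backward-compatibility idea, assuming an old OSD simply fails an op it does not understand: every SETALLOCHINT op is marked FAILOK, and a failure of a FAILOK op does not fail the whole request. The op codes, the flag value and both functions are hypothetical; request_result() only loosely models how the OSD side treats the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op codes and flag value. */
enum op_code { OP_SETALLOCHINT, OP_WRITE };
#define OP_FLAG_FAILOK 0x2u   /* failure of this op is tolerated */

struct op {
	enum op_code code;
	unsigned int flags;
	int result;   /* 0 on success, negative errno on failure */
};

/* Mark every advisory hint op so an OSD that rejects it is harmless. */
static void mark_hints_failok(struct op *ops, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ops[i].code == OP_SETALLOCHINT)
			ops[i].flags |= OP_FLAG_FAILOK;
}

/* Request result: the first failure of an op that is not FAILOK. */
static int request_result(const struct op *ops, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ops[i].result < 0 && !(ops[i].flags & OP_FLAG_FAILOK))
			return ops[i].result;
	return 0;
}
```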
     
  • Encode the ceph_osd_op::flags field so that it gets sent over the wire.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Reviewed-by: Alex Elder

    Ilya Dryomov
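Sending a field over the wire means writing it at a fixed offset in a fixed byte order. The sketch below assumes a simplified header consisting of a little-endian 16-bit op code followed by a 32-bit flags field (the leading members of struct ceph_osd_op are believed to have this shape); struct op, encode_op() and the put_le*() helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write little-endian integers into a wire buffer; return bytes written. */
static size_t put_le16(uint8_t *p, uint16_t v)
{
	p[0] = v & 0xff;
	p[1] = v >> 8;
	return 2;
}

static size_t put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (v >> (8 * i)) & 0xff;
	return 4;
}

/* Hypothetical in-memory op with a flags field that must reach the wire. */
struct op {
	uint16_t code;
	uint32_t flags;
};

static size_t encode_op(uint8_t *buf, const struct op *op)
{
	size_t off = 0;

	off += put_le16(buf + off, op->code);
	off += put_le32(buf + off, op->flags);   /* flags now encoded too */
	return off;
}
```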
     
  • With the addition of erasure coding support in the future, the scratch
    variable-length array in crush_do_rule_ary() is going to grow to at
    least 200 bytes on average, on top of another 128 bytes consumed by
    rawosd/osd arrays in the call chain. Replace it with a buffer inside
    struct osdmap and a mutex. This shouldn't result in any contention,
    because all osd requests were already serialized by request_mutex at
    that point; the only unlocked caller was ceph_ioctl_get_dataloc().

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
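A sketch of replacing an on-stack variable-length array with a buffer embedded in a long-lived object and guarded by a mutex. struct map, SCRATCH_INTS and do_rule() are hypothetical, pthread_mutex_t stands in for a kernel mutex, and the loop body is a placeholder for the real CRUSH mapping work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SCRATCH_INTS 64   /* hypothetical size of the shared scratch area */

/* Hypothetical map object that owns one scratch buffer. */
struct map {
	pthread_mutex_t scratch_mutex;   /* serializes users of scratch[] */
	int scratch[SCRATCH_INTS];
};

/* Fill result using the shared buffer instead of a VLA on the stack. */
static int do_rule(struct map *m, int x, int *result, int result_max)
{
	int n;

	if (result_max <= 0)
		return 0;
	n = result_max < SCRATCH_INTS ? result_max : SCRATCH_INTS;

	pthread_mutex_lock(&m->scratch_mutex);
	memset(m->scratch, 0, sizeof(m->scratch));
	for (int i = 0; i < n; i++)
		m->scratch[i] = x + i;   /* placeholder for the mapping work */
	memcpy(result, m->scratch, (size_t)n * sizeof(int));
	pthread_mutex_unlock(&m->scratch_mutex);
	return n;
}
```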
     

18 Feb, 2014

1 commit


31 Jan, 2014

1 commit

  • Pull core block IO changes from Jens Axboe:
    "The major piece in here is the immutable biovec series from Kent, the
    rest is fairly minor. It was supposed to go in last round, but
    various issues pushed it to this release instead. The pull request
    contains:

    - Various smaller blk-mq fixes from different folks. Nothing major
    here, just minor fixes and cleanups.

    - Fix for a memory leak in the error path in the block ioctl code
    from Christian Engelmayer.

    - Header export fix from CaiZhiyong.

    - Finally the immutable biovec changes from Kent Overstreet. This
    enables some nice future work on making arbitrarily sized bios
    possible and making splitting more efficient. Related fixes to immutable
    bio_vecs:

    - dm-cache immutable fixup from Mike Snitzer.
    - btrfs immutable fixup from Muthu Kumar.

    - bio-integrity fix from Nic Bellinger, which is also going to stable."

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
    xtensa: fixup simdisk driver to work with immutable bio_vecs
    block/blk-mq-cpu.c: use hotcpu_notifier()
    blk-mq: for_each_* macro correctness
    block: Fix memory leak in rw_copy_check_uvector() handling
    bio-integrity: Fix bio_integrity_verify segment start bug
    block: remove unrelated header files and export symbol
    blk-mq: uses page->list incorrectly
    blk-mq: use __smp_call_function_single directly
    btrfs: fix missing increment of bi_remaining
    Revert "block: Warn and free bio if bi_end_io is not set"
    block: Warn and free bio if bi_end_io is not set
    blk-mq: fix initializing request's start time
    block: blk-mq: don't export blk_mq_free_queue()
    block: blk-mq: make blk_sync_queue support mq
    block: blk-mq: support draining mq queue
    dm cache: increment bi_remaining when bi_end_io is restored
    block: fixup for generic bio chaining
    block: Really silence spurious compiler warnings
    block: Silence spurious compiler warnings
    block: Kill bio_pair_split()
    ...

    Linus Torvalds
     

29 Jan, 2014

1 commit

  • Pull ceph updates from Sage Weil:
    "This is a big batch. From Ilya we have:

    - rbd support for more than ~250 mapped devices (now uses same scheme
    that SCSI does for device major/minor numbering)
    - crush updates for new mapping behaviors (will be needed for coming
    erasure coding support, among other things)
    - preliminary support for tiered storage pools

    There is also a big series fixing a pile of cephfs bugs with clustered
    MDSs from Yan Zheng, ACL support for cephfs from Guangliang Zhao, ceph
    fscache improvements from Li Wang, improved behavior when we get
    ENOSPC from Josh Durgin, some readv/writev improvements from
    Majianpeng, and the usual mix of small cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (76 commits)
    ceph: cast PAGE_SIZE to size_t in ceph_sync_write()
    ceph: fix dout() compile warnings in ceph_filemap_fault()
    libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature
    libceph: follow redirect replies from osds
    libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
    libceph: follow {read,write}_tier fields on osd request submission
    libceph: add ceph_pg_pool_by_id()
    libceph: CEPH_OSD_FLAG_* enum update
    libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()
    libceph: introduce and start using oid abstraction
    libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN
    libceph: move ceph_file_layout helpers to ceph_fs.h
    libceph: start using oloc abstraction
    libceph: dout() is missing a newline
    libceph: add ceph_kv{malloc,free}() and switch to them
    libceph: support CEPH_FEATURE_EXPORT_PEER
    ceph: add imported caps when handling cap export message
    ceph: add open export target session helper
    ceph: remove exported caps when handling cap import message
    ceph: handle session flush message
    ...

    Linus Torvalds
     

28 Jan, 2014

11 commits


26 Jan, 2014

1 commit

  • Encapsulate kmalloc vs vmalloc memory allocation and freeing logic into
    two helpers, ceph_kvmalloc() and ceph_kvfree(), and switch to them.

    ceph_kvmalloc() kmalloc()'s a maximum of 8 pages; anything bigger is
    vmalloc()'ed with __GFP_HIGHMEM set. This changes the existing
    behaviour:

    - for buffers (ceph_buffer_new()), from trying to kmalloc() everything
    and using vmalloc() just as a fallback

    - for messages (ceph_msg_new()), from going to vmalloc() for anything
    bigger than a page

    - for messages (ceph_msg_new()), from disallowing vmalloc() to use high
    memory

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
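A userspace sketch of the size policy described above, assuming a 4096-byte page so that 8 pages is 32 KiB. malloc() stands in for both kmalloc() and a vmalloc() with __GFP_HIGHMEM, and the chosen path is returned to the caller so the free helper can dispatch on it; a kernel implementation would more likely check the address with is_vmalloc_addr(). All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE   4096u                    /* assumed page size */
#define SKETCH_KMALLOC_MAX (8 * SKETCH_PAGE_SIZE)   /* 8 pages */

enum alloc_path { PATH_KMALLOC, PATH_VMALLOC };

/* Stand-ins: in the kernel, kmalloc() and vmalloc() with __GFP_HIGHMEM. */
static void *fake_kmalloc(size_t size) { return malloc(size); }
static void *fake_vmalloc_highmem(size_t size) { return malloc(size); }

/* Policy: up to 8 pages via kmalloc(), anything bigger via vmalloc(). */
static enum alloc_path kv_path(size_t size)
{
	return size <= SKETCH_KMALLOC_MAX ? PATH_KMALLOC : PATH_VMALLOC;
}

static void *kvmalloc_sketch(size_t size, enum alloc_path *path)
{
	*path = kv_path(size);
	return *path == PATH_KMALLOC ? fake_kmalloc(size)
				     : fake_vmalloc_highmem(size);
}

/* Release through the matching path (both are free() in this sketch). */
static void kvfree_sketch(void *ptr, enum alloc_path path)
{
	assert(path == PATH_KMALLOC || path == PATH_VMALLOC);
	free(ptr);   /* kernel: kfree() or vfree() respectively */
}
```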
     

24 Jan, 2014

2 commits

  • Now that the definition is centralized in , the
    definitions of U32_MAX (and related) elsewhere in the kernel can be
    removed.

    Signed-off-by: Alex Elder
    Acked-by: Sage Weil
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • The symbol U32_MAX is defined in several spots. Change these
    definitions to be conditional. This is in preparation for the next
    patch, which centralizes the definition in .

    Signed-off-by: Alex Elder
    Cc: Sage Weil
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
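A sketch of the two-step cleanup: one central definition, plus a former local copy wrapped in #ifndef so that it is skipped once the central one is visible. The u32 typedef is a userspace stand-in for the kernel type; the ((u32)~0U) form matches the kernel's U32_MAX definition.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;   /* userspace stand-in for the kernel's u32 */

/* Central definition. */
#define U32_MAX ((u32)~0U)

/*
 * A former local copy, made conditional so it cannot clash with the
 * central one. Once the central definition exists everywhere, this
 * block never takes effect and can simply be deleted.
 */
#ifndef U32_MAX
#define U32_MAX ((u32)~0U)
#endif
```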
     

21 Jan, 2014

1 commit