28 Jul, 2016

4 commits


26 May, 2016

4 commits

  • This is a major sync up, up to ~Jewel. The highlights are:

    - per-session request trees (vs a global per-client tree)
    - per-session locking (vs a global per-client rwlock)
    - homeless OSD session
    - no ad-hoc global per-client lists
    - support for pool quotas
    - foundation for watch/notify v2 support
    - foundation for map check (pool deletion detection) support

    The switchover is incomplete: lingering requests can be set up and
    torn down but aren't ever reestablished. This functionality is
    restored with the introduction of the new lingering infrastructure
    (ceph_osd_linger_request, linger_work, etc) in a later commit.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
    emphasise that it returns acting primary.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise
    that the returned PG is raw, and return -ENOENT instead of -EIO if the
    pool doesn't exist.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Currently ceph_object_id can hold object names of up to 100
    (CEPH_MAX_OID_NAME_LEN) characters. This is enough for all use cases,
    except one - long rbd image names:

    - a format 1 header is named "<imgname>.rbd"
    - an object that points to a format 2 header is named "rbd_id.<imgname>"

    We operate on these potentially long-named objects during rbd map, and,
    for format 1 images, during header refresh. (A format 2 header name is
    a small system-generated string.)

    Lift this 100 character limit by making ceph_object_id be able to point
    to an externally-allocated string. Apart from being able to work with
    almost arbitrarily-long named objects, this allows us to reduce the
    size of ceph_object_id from >100 bytes to 64 bytes.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
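
The inline-or-external name idea described above can be sketched in user-space C. This is only an illustration, not the kernel's ceph_object_id: the struct object_id type, the INLINE_NAME_LEN capacity and the oid_* helpers are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_NAME_LEN 32 /* hypothetical inline capacity, not the kernel's value */

/* Hypothetical object id: short names live inline, long names on the heap. */
struct object_id {
    char *name;                        /* points at inline_name or a heap buffer */
    char inline_name[INLINE_NAME_LEN]; /* storage for short names */
    size_t name_len;
};

static void oid_init(struct object_id *oid)
{
    oid->name = oid->inline_name;
    oid->inline_name[0] = '\0';
    oid->name_len = 0;
}

/* Store a name of any length. Returns 0 on success, -1 if allocation fails. */
static int oid_set_name(struct object_id *oid, const char *name)
{
    size_t len = strlen(name);

    assert(oid->name == oid->inline_name); /* expects a freshly initialised id */

    if (len < INLINE_NAME_LEN) {
        memcpy(oid->inline_name, name, len + 1);
    } else {
        char *ext = malloc(len + 1); /* externally allocated string */
        if (!ext)
            return -1;
        memcpy(ext, name, len + 1);
        oid->name = ext;
    }
    oid->name_len = len;
    return 0;
}

/* Release any external buffer and return the id to its empty state. */
static void oid_destroy(struct object_id *oid)
{
    if (oid->name != oid->inline_name)
        free(oid->name);
    oid_init(oid);
}
```

Because the name pointer may refer to the struct's own inline buffer, a struct like this cannot safely be copied by plain assignment; a real implementation would need a dedicated copy helper.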
     

15 Oct, 2014

2 commits

  • The 'stripe_unit' field is 64 bits; casting it to 32 bits can result in zero.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
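
A minimal sketch of the truncation problem, assuming a 64-bit stripe unit whose low 32 bits happen to be zero. The helper names are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: narrowing a 64-bit stripe unit to 32 bits drops the high bits. */
static uint32_t stripe_unit_truncated(uint64_t stripe_unit)
{
    return (uint32_t)stripe_unit;
}

/*
 * Safer pattern: keep the arithmetic in 64 bits.
 * Returns 1 if object_size is a non-zero multiple of stripe_unit, else 0.
 */
static int is_multiple_64(uint64_t object_size, uint64_t stripe_unit)
{
    assert(object_size != 0);
    return stripe_unit != 0 && object_size % stripe_unit == 0;
}
```

With a stripe unit of 2^32, the truncated value is zero, so any later division or modulo by it would fault, while the 64-bit check still sees the real value.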
     
  • The following sequence of events can happen:
    - Client releases an inode and queues a cap release message.
    - A 'lookup' reply brings the same inode back, but the reply
    doesn't contain xattrs because the MDS didn't receive the cap release
    message and thought the client already had up-to-date xattrs.

    The fix is to force sending a getattr request to the MDS if xattrs_version
    is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so the MDS knows the
    client does not have xattrs.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     

06 May, 2014

1 commit

  • Pull Ceph fixes from Sage Weil:
    "First, there is a critical fix for the new primary-affinity function
    that went into -rc1.

    The second batch of patches from Zheng fix a range of problems with
    directory fragmentation, readdir, and a few odds and ends for cephfs"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: reserve caps for file layout/lock MDS requests
    ceph: avoid releasing caps that are being used
    ceph: clear directory's completeness when creating file
    libceph: fix non-default values check in apply_primary_affinity()
    ceph: use fpos_cmp() to compare dentry positions
    ceph: check directory's completeness before emitting directory entry

    Linus Torvalds
     

29 Apr, 2014

1 commit


13 Apr, 2014

1 commit

  • The vfs merge caused a latent bug to show up:

    In file included from fs/ceph/super.h:4:0,
    from fs/ceph/ioctl.c:3:
    include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    ^
    In file included from include/linux/kernel.h:13:0,
    from include/linux/uio.h:12,
    from include/linux/socket.h:7,
    from include/uapi/linux/in.h:22,
    from include/linux/in.h:23,
    from fs/ceph/ioctl.c:1:
    include/linux/printk.h:214:0: note: this is the location of the previous definition
    #define pr_fmt(fmt) fmt
    ^

    where the reason is that <linux/ceph/ceph_debug.h> is included much too
    late for the "pr_fmt()" define.

    The include of <linux/ceph/ceph_debug.h> needs to be the first include in
    the file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
    noticeable until some unrelated header file changes brought in an
    indirect earlier include of <linux/printk.h>.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
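
The ordering requirement can be illustrated in a single user-space file. This is a sketch only: the two macro definitions stand in for the module's debug header and for the generic default shown in the warning above, and format_msg is a hypothetical helper, not a kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stands in for the module's debug header: it must be seen first. */
#define pr_fmt(fmt) "ceph: " fmt

/* Stands in for the generic header: it only supplies a default if none is set. */
#ifndef pr_fmt
#define pr_fmt(fmt) fmt
#endif

/* Hypothetical helper that formats a message through pr_fmt(). */
#define format_msg(buf, size, fmt, ...) \
    snprintf(buf, size, pr_fmt(fmt), __VA_ARGS__)
```

If the two definitions were swapped, the generic default would already be in place when the module tried to define its own prefix, which is the redefinition the compiler warned about.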
     

03 Apr, 2014

1 commit


28 Jan, 2014

1 commit


10 Aug, 2013

2 commits


02 May, 2013

1 commit

  • The purpose of ceph_calc_object_layout() is to fill in the pool
    number and seed for a ceph_pg structure provided, based on a given
    osd map and target object id.

    Currently that function takes a file layout parameter, but the only
    thing used out of that is its pool number.

    Change the function so it takes a pool number rather than the full
    file layout structure. Only update the ceph_pg if the pool is found
    in the osd map. Get rid of a few useless lines of code from the
    function while there.

    Since the function now very clearly just fills in the ceph_pg
    structure it's provided, rename it ceph_calc_ceph_pg().

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     

01 Mar, 2013

1 commit

  • Pull Ceph updates from Sage Weil:
    "A few groups of patches here. Alex has been hard at work improving
    the RBD code, layout groundwork for understanding the new formats and
    doing layering. Most of the infrastructure is now in place for the
    final bits that will come with the next window.

    There are a few changes to the data layout. Jim Schutt's patch fixes
    some non-ideal CRUSH behavior, and a set of patches from me updates
    the client to speak a newer version of the protocol and implement an
    improved hashing strategy across storage nodes (when the server side
    supports it too).

    A pair of patches from Sam Lang fix the atomicity of open+create
    operations. Several patches from Yan, Zheng fix various mds/client
    issues that turned up during multi-mds torture tests.

    A final set of patches expose file layouts via virtual xattrs, and
    allow the policies to be set on directories via xattrs as well
    (avoiding the awkward ioctl interface and providing a consistent
    interface for both kernel mount and ceph-fuse users)."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
    libceph: add support for HASHPSPOOL pool flag
    libceph: update osd request/reply encoding
    libceph: calculate placement based on the internal data types
    ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
    ceph: update "ceph_features.h"
    libceph: decode into cpu-native ceph_pg type
    libceph: rename ceph_pg -> ceph_pg_v1
    rbd: pass length, not op for osd completions
    rbd: move rbd_osd_trivial_callback()
    libceph: use a do..while loop in con_work()
    libceph: use a flag to indicate a fault has occurred
    libceph: separate non-locked fault handling
    libceph: encapsulate connection backoff
    libceph: eliminate sparse warnings
    ceph: eliminate sparse warnings in fs code
    rbd: eliminate sparse warnings
    libceph: define connection flag helpers
    rbd: normalize dout() calls
    rbd: barriers are hard
    rbd: ignore zero-length requests
    ...

    Linus Torvalds
     

27 Feb, 2013

3 commits


23 Feb, 2013

1 commit


18 Jan, 2013

1 commit

  • ceph_calc_file_object_mapping() takes (among other things) a "file"
    offset and length, and based on the layout, determines the object
    number ("bno") backing the affected portion of the file's data and
    the offset into that object where the desired range begins. It also
    computes the size that should be used for the request--either the
    amount requested or something less if that would exceed the end of
    the object.

    This patch changes the input length parameter in this function so it
    is used only for input. That is, the argument will be passed by
    value rather than by address, so the value provided won't get
    updated by the function.

    The value would only get updated if the length would surpass the
    current object, and in that case the value it got updated to would
    be exactly that returned in *oxlen.

    Only one of the two callers is affected by this change. Update
    ceph_calc_raw_layout() so it records any updated value.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
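
The mapping described above can be sketched as follows, with the length passed by value so the caller's copy is never modified. This assumes the commonly described Ceph striping arithmetic (stripe unit, stripe count, object size) and clips the request to the remaining room in the current stripe unit; struct layout and the function name are hypothetical, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout description; field names are assumptions. */
struct layout {
    uint32_t stripe_unit;  /* bytes per stripe unit */
    uint32_t stripe_count; /* objects striped across */
    uint32_t object_size;  /* bytes per object, a multiple of stripe_unit */
};

/*
 * Map a file extent (off, len) to an object number, an offset inside that
 * object, and the length that fits before the current stripe unit ends.
 * len is an input only; the clipped length is reported through *oxlen.
 */
static void calc_file_object_mapping(const struct layout *l,
                                     uint64_t off, uint64_t len,
                                     uint64_t *ono, uint64_t *oxoff,
                                     uint64_t *oxlen)
{
    uint64_t su = l->stripe_unit;
    uint64_t sc = l->stripe_count;

    assert(su != 0 && sc != 0 && l->object_size >= su);

    uint64_t su_per_object = l->object_size / su;
    uint64_t bl = off / su;                       /* stripe unit index in the file */
    uint64_t stripeno = bl / sc;                  /* which stripe */
    uint64_t stripepos = bl % sc;                 /* which object within the set */
    uint64_t objsetno = stripeno / su_per_object; /* which object set */
    uint64_t su_offset = off % su;                /* offset inside the stripe unit */
    uint64_t room = su - su_offset;               /* bytes left in the stripe unit */

    *ono = objsetno * sc + stripepos;
    *oxoff = (stripeno % su_per_object) * su + su_offset;
    *oxlen = len < room ? len : room;
}
```

A caller that wants the clipped length simply reads it back from the oxlen output, which is the adjustment the commit describes for ceph_calc_raw_layout().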
     

03 Oct, 2012

1 commit


22 Aug, 2012

1 commit

  • If "l->stripe_unit" is zero the mod on the next line will cause a
    divide by zero bug. This comes from the copy_from_user() in
    ceph_ioctl_set_layout_policy(). Passing 0 is valid, though (it means
    "do not change") so avoid the % check in that case.

    Reported-by: Dan Carpenter
    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
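
A minimal sketch of the guarded check, assuming a stripe unit of 0 means "do not change" as the message states. The function name and its return convention are hypothetical; the real ioctl validates more fields than this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the object size / stripe unit pair is acceptable, 0 otherwise.
 * A stripe_unit of 0 means "do not change", so the modulo is skipped entirely
 * and no division by zero can occur.
 */
static int layout_sizes_ok(uint32_t object_size, uint32_t stripe_unit)
{
    assert(object_size != 0);

    if (stripe_unit == 0)
        return 1;

    return object_size % stripe_unit == 0;
}
```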
     

17 May, 2012

2 commits


08 May, 2012

2 commits


08 Dec, 2011

1 commit

  • We have been using i_lock to protect all kinds of data structures in the
    ceph_inode_info struct, including lists of inodes that we need to iterate
    over while avoiding races with inode destruction. That requires grabbing
    a reference to the inode with the list lock protected, but igrab() now
    takes i_lock to check the inode flags.

    Changing the list lock ordering would be a painful process.

    However, using a ceph-specific i_ceph_lock in the ceph inode instead of
    i_lock is a simple mechanical change and avoids the ordering constraints
    imposed by igrab().

    Reported-by: Amon Ott
    Signed-off-by: Sage Weil

    Sage Weil
     

26 Oct, 2011

1 commit

  • Previously we were validating the passed-in stripe unit, object size,
    and stripe count against each other (and not testing most other stuff).
    Instead, make sure that the composed previous layout and new values are valid,
    and only send the new values to the MDS. This lets users change the
    pool without setting the whole layout, for instance.

    Signed-off-by: Greg Farnum

    Greg Farnum
     

27 Jul, 2011

2 commits

  • d_parent is protected by d_lock: use it when looking up a dentry's parent
    directory inode. Also take a reference and drop it in the caller to avoid
    a use-after-free.

    Reported-by: Al Viro
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • This allows us to force IO through the sync path which you normally only
    get when multiple clients are reading/writing to the same file or by
    mounting with -o sync. Among other things, this lets test programs verify
    correctness with a single mount.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     

08 Jun, 2011

1 commit


21 Oct, 2010

2 commits

  • Signed-off-by: Sage Weil

    Greg Farnum
     
  • This factors out protocol and low-level storage parts of ceph into a
    separate libceph module living in net/ceph and include/linux/ceph. This
    is mostly a matter of moving files around. However, a few key pieces
    of the interface change as well:

    - ceph_client becomes ceph_fs_client and ceph_client, where the latter
    captures the mon and osd clients, and the fs_client gets the mds client
    and file system specific pieces.
    - Mount option parsing and debugfs setup is correspondingly broken into
    two pieces.
    - The mon client gets a generic handler callback for otherwise unknown
    messages (mds map, in this case).
    - The basic supported/required feature bits can be expanded (and are by
    ceph_fs_client).

    No functional change, aside from some subtle error handling cases that got
    cleaned up in the refactoring process.

    Signed-off-by: Sage Weil

    Yehuda Sadeh
     

02 Aug, 2010

1 commit


18 May, 2010

1 commit

  • ceph_sb_to_client() and ceph_client() are identical, so one of them
    should go. The function name ceph_client is easily confused with
    "struct ceph_client", whereas ceph_sb_to_client is clearer, so switch
    all callers to ceph_sb_to_client().

    -static inline struct ceph_client *ceph_client(struct super_block *sb)
    -{
    -	return sb->s_fs_info;
    -}

    Signed-off-by: Cheng Renquan
    Signed-off-by: Sage Weil

    Cheng Renquan
     

04 Dec, 2009

1 commit