30 Mar, 2020

1 commit

  • Add i_last_rd and i_last_wr to ceph_inode_info. These fields are
    used to track the last time the client acquired read/write caps for
    the inode.

    If there is no read/write on an inode for 'caps_wanted_delay_max'
    seconds, __ceph_caps_file_wanted() does not request caps for read/write
    even if there are open files.

    Call __ceph_touch_fmode() for dir operations. __ceph_caps_file_wanted()
    calculates dir's wanted caps according to last dir read/modification. If
    there is recent dir read, dir inode wants CEPH_CAP_ANY_SHARED caps. If
    there is recent dir modification, also wants CEPH_CAP_FILE_EXCL.

    Readdir is a special case. Dir inode wants CEPH_CAP_FILE_EXCL after
    readdir, as with that, modifications do not need to release
    CEPH_CAP_FILE_SHARED or invalidate all dentry leases issued by readdir.

    Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
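
The timeout rule in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `inode_model`, `file_caps_wanted()`, and the `WANT_RD`/`WANT_WR` bits are hypothetical stand-ins, not the kernel's `ceph_inode_info` or `__ceph_caps_file_wanted()`, and plain `time_t` values replace whatever time source the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cap bits for illustration; not the kernel's CEPH_CAP_* values. */
enum {
    WANT_RD = 1 << 0,
    WANT_WR = 1 << 1,
};

/* Simplified model of the per-inode state the commit describes. */
struct inode_model {
    int open_rd;     /* number of files open for read */
    int open_wr;     /* number of files open for write */
    time_t last_rd;  /* last time read caps were acquired */
    time_t last_wr;  /* last time write caps were acquired */
};

/*
 * Return the caps still worth requesting at time `now`.
 * An open file only counts if the matching activity happened within
 * the last `delay_max` seconds.
 */
static int file_caps_wanted(const struct inode_model *in, time_t now,
                            time_t delay_max)
{
    int wanted = 0;

    assert(in != NULL);
    if (in->open_rd > 0 && now - in->last_rd <= delay_max)
        wanted |= WANT_RD;
    if (in->open_wr > 0 && now - in->last_wr <= delay_max)
        wanted |= WANT_WR;
    return wanted;
}
```

In this model an inode with open files but no recent I/O returns 0, which corresponds to the commit's "does not request caps for read/write even if there are open files".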
     

02 Apr, 2018

2 commits


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

20 Feb, 2017

1 commit


28 Jul, 2016

4 commits


26 May, 2016

4 commits

  • This is a major sync up, up to ~Jewel. The highlights are:

    - per-session request trees (vs a global per-client tree)
    - per-session locking (vs a global per-client rwlock)
    - homeless OSD session
    - no ad-hoc global per-client lists
    - support for pool quotas
    - foundation for watch/notify v2 support
    - foundation for map check (pool deletion detection) support

    The switchover is incomplete: lingering requests can be set up and
    torn down but aren't ever reestablished. This functionality is
    restored with the introduction of the new lingering infrastructure
    (ceph_osd_linger_request, linger_work, etc) in a later commit.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
    emphasise that it returns acting primary.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise
    that returned is raw PG and return -ENOENT instead of -EIO if the pool
    doesn't exist.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
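
A minimal sketch of the error-handling change described above: the mapping fails with -ENOENT when the pool is missing and leaves the caller's PG untouched. Everything here is hypothetical: `pool_model`, `pg_model`, `object_to_raw_pg()`, the flat pool array, and the FNV-1a `placeholder_hash()` are illustrative stand-ins and do not reproduce the kernel's osdmap lookup or Ceph's actual object-name hash.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct pool_model { int64_t id; };
struct pg_model { int64_t pool; uint32_t seed; };

/* Placeholder hash (FNV-1a); stands in for the real object-name hash. */
static uint32_t placeholder_hash(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/*
 * Fill in *pg for object `name` in pool `pool_id`.
 * Returns 0 on success, or -ENOENT if the pool is not in the map,
 * in which case *pg is not modified.
 */
static int object_to_raw_pg(const struct pool_model *pools, size_t npools,
                            int64_t pool_id, const char *name,
                            struct pg_model *pg)
{
    size_t i;

    assert(name != NULL && pg != NULL);
    for (i = 0; i < npools; i++) {
        if (pools[i].id == pool_id) {
            pg->pool = pool_id;
            pg->seed = placeholder_hash(name);
            return 0;
        }
    }
    return -ENOENT;
}
```

Returning -ENOENT lets a caller tell "the pool was deleted" apart from a genuine I/O error.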
     
  • Currently ceph_object_id can hold object names of up to 100
    (CEPH_MAX_OID_NAME_LEN) characters. This is enough for all use cases,
    except one - long rbd image names:

    - a format 1 header is named "<imgname>.rbd"
    - an object that points to a format 2 header is named "rbd_id.<imgname>"

    We operate on these potentially long-named objects during rbd map, and,
    for format 1 images, during header refresh. (A format 2 header name is
    a small system-generated string.)

    Lift this 100 character limit by making ceph_object_id be able to point
    to an externally-allocated string. Apart from being able to work with
    almost arbitrarily-long named objects, this allows us to reduce the
    size of ceph_object_id from >100 bytes to 64 bytes.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
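
The idea of an object id that keeps short names inline and points to an externally-allocated string for long ones can be sketched as follows. This is a user-space illustration, not the kernel's `ceph_object_id`: the struct layout, the `oid_*` helper names, and the `OID_INLINE_LEN` value are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OID_INLINE_LEN 52 /* assumed inline capacity, for illustration */

struct oid_model {
    char *name;                       /* inline_name or a heap buffer */
    char inline_name[OID_INLINE_LEN];
    size_t name_len;
};

static void oid_init(struct oid_model *oid)
{
    assert(oid != NULL);
    oid->name = oid->inline_name;
    oid->inline_name[0] = '\0';
    oid->name_len = 0;
}

static int oid_is_external(const struct oid_model *oid)
{
    return oid->name != oid->inline_name;
}

/* Free any external buffer and reset to the empty inline state. */
static void oid_destroy(struct oid_model *oid)
{
    if (oid_is_external(oid))
        free(oid->name);
    oid_init(oid);
}

/* Copy `s` into the id, allocating only when it does not fit inline. */
static int oid_set(struct oid_model *oid, const char *s)
{
    size_t len;
    char *buf = NULL;

    assert(oid != NULL && s != NULL);
    len = strlen(s);
    if (len + 1 > sizeof(oid->inline_name)) {
        buf = malloc(len + 1);
        if (!buf)
            return -1;
    }
    oid_destroy(oid);
    if (buf)
        oid->name = buf;
    memcpy(oid->name, s, len + 1);
    oid->name_len = len;
    return 0;
}
```

Short names never touch the allocator, while a long image name costs one allocation instead of forcing every id to carry a 100-byte buffer.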
     

15 Oct, 2014

2 commits

  • The 'stripe_unit' field is 64 bits; casting it to 32 bits can result in zero.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
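
The arithmetic behind this fix is easy to reproduce in isolation. The sketch below is not the patched kernel code; `stripe_unit_demo()` is a hypothetical helper that only shows how a nonzero 64-bit value can become zero when narrowed to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Narrowing keeps only the low 32 bits of the value. */
static uint32_t stripe_unit_truncated(uint64_t stripe_unit)
{
    return (uint32_t)stripe_unit;
}

/* Self-check: 2^32 is nonzero at 64 bits but zero after the cast. */
static void stripe_unit_demo(void)
{
    uint64_t su = 1ULL << 32;

    assert(su != 0);
    assert(stripe_unit_truncated(su) == 0);
}
```

Any value that is a multiple of 2^32 behaves this way, so zero checks and divisions are safer done at the full 64-bit width.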
     
  • The following sequence of events can happen:
    - The client releases an inode and queues a cap release message.
    - A 'lookup' reply brings the same inode back, but the reply
    doesn't contain xattrs because the MDS didn't receive the cap release
    message and thought the client already had up-to-date xattrs.

    The fix is to force sending a getattr request to the MDS if xattrs_version
    is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so the MDS knows the
    client does not have xattrs.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     

06 May, 2014

1 commit

  • Pull Ceph fixes from Sage Weil:
    "First, there is a critical fix for the new primary-affinity function
    that went into -rc1.

    The second batch of patches from Zheng fix a range of problems with
    directory fragmentation, readdir, and a few odds and ends for cephfs"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: reserve caps for file layout/lock MDS requests
    ceph: avoid releasing caps that are being used
    ceph: clear directory's completeness when creating file
    libceph: fix non-default values check in apply_primary_affinity()
    ceph: use fpos_cmp() to compare dentry positions
    ceph: check directory's completeness before emitting directory entry

    Linus Torvalds
     

29 Apr, 2014

1 commit


13 Apr, 2014

1 commit

  • The vfs merge caused a latent bug to show up:

    In file included from fs/ceph/super.h:4:0,
    from fs/ceph/ioctl.c:3:
    include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    ^
    In file included from include/linux/kernel.h:13:0,
    from include/linux/uio.h:12,
    from include/linux/socket.h:7,
    from include/uapi/linux/in.h:22,
    from include/linux/in.h:23,
    from fs/ceph/ioctl.c:1:
    include/linux/printk.h:214:0: note: this is the location of the previous definition
    #define pr_fmt(fmt) fmt
    ^

    where the reason is that <linux/ceph/ceph_debug.h> is included much too late
    for the "pr_fmt()" define.

    The include of <linux/ceph/ceph_debug.h> needs to be the first include in the
    file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
    noticeable until some unrelated header file changes brought in an
    indirect earlier include of <linux/kernel.h>.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
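
The ordering requirement can be shown in a single file. In this sketch the `#ifndef` block stands in for a printk-style header that supplies a default `pr_fmt()` only when none exists; `format_msg()` and the "ceph: " prefix are hypothetical and use `snprintf()` in place of the kernel's printing helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Module prefix: must appear before the first header that touches pr_fmt. */
#define pr_fmt(fmt) "ceph: " fmt

/* Stand-in for a printk-style header, which only supplies a default. */
#ifndef pr_fmt
#define pr_fmt(fmt) fmt
#endif

/* Format a message through pr_fmt(), so the module prefix is prepended. */
static int format_msg(char *buf, size_t size, int ino)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, pr_fmt("bad ioctl on inode %d\n"), ino);
}
```

If the two definitions were swapped, the default would already be in place when the module's `#define` is reached, and the compiler would report the same kind of "pr_fmt" redefined warning quoted in the commit message.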
     

03 Apr, 2014

1 commit


28 Jan, 2014

1 commit


10 Aug, 2013

2 commits


02 May, 2013

1 commit

  • The purpose of ceph_calc_object_layout() is to fill in the pool
    number and seed for a ceph_pg structure provided, based on a given
    osd map and target object id.

    Currently that function takes a file layout parameter, but the only
    thing used out of that is its pool number.

    Change the function so it takes a pool number rather than the full
    file layout structure. Only update the ceph_pg if the pool is found
    in the osd map. Get rid of a few useless lines of code from the
    function while there.

    Since the function now very clearly just fills in the ceph_pg
    structure it's provided, rename it ceph_calc_ceph_pg().

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     

01 Mar, 2013

1 commit

  • Pull Ceph updates from Sage Weil:
    "A few groups of patches here. Alex has been hard at work improving
    the RBD code, laying groundwork for understanding the new formats and
    doing layering. Most of the infrastructure is now in place for the
    final bits that will come with the next window.

    There are a few changes to the data layout. Jim Schutt's patch fixes
    some non-ideal CRUSH behavior, and a set of patches from me updates
    the client to speak a newer version of the protocol and implement an
    improved hashing strategy across storage nodes (when the server side
    supports it too).

    A pair of patches from Sam Lang fix the atomicity of open+create
    operations. Several patches from Yan, Zheng fix various mds/client
    issues that turned up during multi-mds torture tests.

    A final set of patches expose file layouts via virtual xattrs, and
    allow the policies to be set on directories via xattrs as well
    (avoiding the awkward ioctl interface and providing a consistent
    interface for both kernel mount and ceph-fuse users)."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
    libceph: add support for HASHPSPOOL pool flag
    libceph: update osd request/reply encoding
    libceph: calculate placement based on the internal data types
    ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
    ceph: update "ceph_features.h"
    libceph: decode into cpu-native ceph_pg type
    libceph: rename ceph_pg -> ceph_pg_v1
    rbd: pass length, not op for osd completions
    rbd: move rbd_osd_trivial_callback()
    libceph: use a do..while loop in con_work()
    libceph: use a flag to indicate a fault has occurred
    libceph: separate non-locked fault handling
    libceph: encapsulate connection backoff
    libceph: eliminate sparse warnings
    ceph: eliminate sparse warnings in fs code
    rbd: eliminate sparse warnings
    libceph: define connection flag helpers
    rbd: normalize dout() calls
    rbd: barriers are hard
    rbd: ignore zero-length requests
    ...

    Linus Torvalds
     

27 Feb, 2013

3 commits


23 Feb, 2013

1 commit


18 Jan, 2013

1 commit

  • ceph_calc_file_object_mapping() takes (among other things) a "file"
    offset and length, and based on the layout, determines the object
    number ("bno") backing the affected portion of the file's data and
    the offset into that object where the desired range begins. It also
    computes the size that should be used for the request--either the
    amount requested or something less if that would exceed the end of
    the object.

    This patch changes the input length parameter in this function so it
    is used only for input. That is, the argument will be passed by
    value rather than by address, so the value provided won't get
    updated by the function.

    The value would only get updated if the length would surpass the
    current object, and in that case the value it got updated to would
    be exactly that returned in *oxlen.

    Only one of the two callers is affected by this change. Update
    ceph_calc_raw_layout() so it records any updated value.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
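
The mapping described above can be sketched in user space. This assumes the usual Ceph striping arithmetic (stripe unit, stripe count, object size) and uses hypothetical names (`layout_model`, `extent_model`, `map_file_extent()`); it is not the kernel's `ceph_calc_file_object_mapping()`. As in the patch, the length is passed by value and the possibly shorter result is returned separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct layout_model {
    uint64_t stripe_unit;   /* bytes per stripe unit */
    uint64_t stripe_count;  /* objects striped across */
    uint64_t object_size;   /* bytes per object */
};

struct extent_model {
    uint64_t objno;   /* object number ("bno") */
    uint64_t objoff;  /* offset within that object */
    uint64_t objlen;  /* length usable for this request */
};

/* Map a file (off, len) range onto the first object extent backing it. */
static int map_file_extent(const struct layout_model *l, uint64_t off,
                           uint64_t len, struct extent_model *out)
{
    uint64_t su, sc, su_per_object, blockno, su_off, stripeno, stripepos;
    uint64_t objsetno, room;

    assert(l != NULL && out != NULL);
    su = l->stripe_unit;
    sc = l->stripe_count;
    if (su == 0 || sc == 0 || l->object_size < su)
        return -1; /* invalid layout; also avoids dividing by zero */

    su_per_object = l->object_size / su;
    blockno = off / su;           /* which stripe unit in the file */
    su_off = off % su;            /* offset inside that stripe unit */
    stripeno = blockno / sc;      /* which stripe (row) */
    stripepos = blockno % sc;     /* which object within the object set */
    objsetno = stripeno / su_per_object;

    out->objno = objsetno * sc + stripepos;
    out->objoff = (stripeno % su_per_object) * su + su_off;

    room = su - su_off;           /* bytes left in this stripe unit */
    out->objlen = len < room ? len : room;
    return 0;
}
```

In this sketch the length is clipped at the end of the current stripe unit, which coincides with the end of the object when stripe_unit equals object_size and stripe_count is 1.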
     

03 Oct, 2012

1 commit


22 Aug, 2012

1 commit

  • If "l->stripe_unit" is zero the mod on the next line will cause a
    divide by zero bug. This comes from the copy_from_user() in
    ceph_ioctl_set_layout_policy(). Passing 0 is valid, though (it means
    "do not change") so avoid the % check in that case.

    Reported-by: Dan Carpenter
    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
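
A minimal sketch of the guard this commit describes: skip the modulo when the stripe unit is zero, since zero means "do not change". The `layout_req` struct and `check_stripe_unit()` are hypothetical, and the assumption that the check is object_size modulo stripe_unit is made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-supplied layout request; 0 means "do not change". */
struct layout_req {
    uint64_t object_size;
    uint64_t stripe_unit;
};

/* Return 0 if the request is acceptable, -EINVAL otherwise. */
static int check_stripe_unit(const struct layout_req *l)
{
    assert(l != NULL);
    if (l->stripe_unit == 0)
        return 0; /* nothing to validate, and no division by zero */
    if (l->object_size % l->stripe_unit != 0)
        return -EINVAL;
    return 0;
}
```

Testing for zero first matters because the value comes straight from user space, so nothing else guarantees a nonzero divisor.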
     

17 May, 2012

2 commits


08 May, 2012

2 commits


08 Dec, 2011

1 commit

  • We have been using i_lock to protect all kinds of data structures in the
    ceph_inode_info struct, including lists of inodes that we need to iterate
    over while avoiding races with inode destruction. That requires grabbing
    a reference to the inode with the list lock protected, but igrab() now
    takes i_lock to check the inode flags.

    Changing the list lock ordering would be a painful process.

    However, using a ceph-specific i_ceph_lock in the ceph inode instead of
    i_lock is a simple mechanical change and avoids the ordering constraints
    imposed by igrab().

    Reported-by: Amon Ott
    Signed-off-by: Sage Weil

    Sage Weil
     

26 Oct, 2011

1 commit

  • Previously we were validating the passed-in stripe unit, object size,
    and stripe count against each other (and not testing most other stuff).
    Instead, make sure that the composed previous layout and new values are valid,
    and only send the new values to the MDS. This lets users change the
    pool without setting the whole layout, for instance.

    Signed-off-by: Greg Farnum

    Greg Farnum
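
The "compose, then validate" approach can be sketched as below. All names are hypothetical (`layout_model`, `compose_layout()`, `MIN_STRIPE_UNIT`), treating zero as "unchanged" for every field is an assumption, and the validity rules shown are a simplified guess rather than the exact checks in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_STRIPE_UNIT 65536u /* assumed stripe unit granularity */

struct layout_model {
    uint64_t stripe_unit;
    uint64_t stripe_count;
    uint64_t object_size;
    uint64_t pool;
};

/* Simplified validity rules, assumed for this example. */
static int layout_is_valid(const struct layout_model *l)
{
    if (l->stripe_unit == 0 || l->stripe_unit % MIN_STRIPE_UNIT != 0)
        return 0;
    if (l->object_size == 0 || l->object_size % l->stripe_unit != 0)
        return 0;
    if (l->stripe_count == 0)
        return 0;
    return 1;
}

/*
 * Overlay the nonzero fields of `req` on `cur` and validate the result.
 * On success *out holds the composed layout; the caller would still send
 * only the fields the user actually set.
 */
static int compose_layout(const struct layout_model *cur,
                          const struct layout_model *req,
                          struct layout_model *out)
{
    assert(cur != NULL && req != NULL && out != NULL);
    out->stripe_unit = req->stripe_unit ? req->stripe_unit : cur->stripe_unit;
    out->stripe_count = req->stripe_count ? req->stripe_count : cur->stripe_count;
    out->object_size = req->object_size ? req->object_size : cur->object_size;
    out->pool = req->pool ? req->pool : cur->pool;
    return layout_is_valid(out) ? 0 : -EINVAL;
}
```

With this shape, a request that sets only the pool passes validation as long as the existing layout is valid, which is the use case the commit message calls out.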
     

27 Jul, 2011

2 commits

  • d_parent is protected by d_lock: use it when looking up a dentry's parent
    directory inode. Also take a reference and drop it in the caller to avoid
    a use-after-free.

    Reported-by: Al Viro
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • This allows us to force IO through the sync path which you normally only
    get when multiple clients are reading/writing to the same file or by
    mounting with -o sync. Among other things, this lets test programs verify
    correctness with a single mount.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     

08 Jun, 2011

1 commit