30 Mar, 2020
1 commit
-
Add i_last_rd and i_last_wr to ceph_inode_info. These fields are
used to track the last time the client acquired read/write caps for
the inode.If there is no read/write on an inode for 'caps_wanted_delay_max'
seconds, __ceph_caps_file_wanted() does not request caps for read/write
even there are open files.Call __ceph_touch_fmode() for dir operations. __ceph_caps_file_wanted()
calculates dir's wanted caps according to last dir read/modification. If
there is recent dir read, dir inode wants CEPH_CAP_ANY_SHARED caps. If
there is recent dir modification, also wants CEPH_CAP_FILE_EXCL.Readdir is a special case. Dir inode wants CEPH_CAP_FILE_EXCL after
readdir, as with that, modifications do not need to release
CEPH_CAP_FILE_SHARED or invalidate all dentry leases issued by readdir.Signed-off-by: "Yan, Zheng"
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
02 Apr, 2018
2 commits
-
ceph_calc_file_object_mapping() has nothing to do with osdmaps.
Signed-off-by: Ilya Dryomov
-
- make it void
- xlen (object extent length) out parameter should be u32 because only
a single stripe unit is mapped at a timeSigned-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
20 Feb, 2017
1 commit
-
sparse says:
fs/ceph/ioctl.c:100:28: warning: cast to restricted __le64
preferred_osd is a __s64 so we don't need to do any conversion. Also,
just remove the cast in ceph_ioctl_get_layout as it's not needed.Signed-off-by: Jeff Layton
Reviewed-by: Sage Weil
Signed-off-by: Ilya Dryomov
28 Jul, 2016
4 commits
-
Track usage count for individual fmode bit. This can reduce the
array size by half.Signed-off-by: Yan, Zheng
-
This patch adds codes that decode pool namespace information in
cap message and request reply. Pool namespace is saved in i_layout,
it will be passed to libceph when doing read/write.Signed-off-by: Yan, Zheng
-
Define new ceph_file_layout structure and rename old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding namespace
to ceph_file_layout structure.Signed-off-by: Yan, Zheng
-
An on-stack oid in ceph_ioctl_get_dataloc() is not initialized,
resulting in a WARN and a NULL pointer dereference later on. We will
have more of these on-stack in the future, so fix it with a convenience
macro.Fixes: d30291b985d1 ("libceph: variable-sized ceph_object_id")
Signed-off-by: Ilya Dryomov
26 May, 2016
4 commits
-
This is a major sync up, up to ~Jewel. The highlights are:
- per-session request trees (vs a global per-client tree)
- per-session locking (vs a global per-client rwlock)
- homeless OSD session
- no ad-hoc global per-client lists
- support for pool quotas
- foundation for watch/notify v2 support
- foundation for map check (pool deletion detection) supportThe switchover is incomplete: lingering requests can be setup and
teared down but aren't ever reestablished. This functionality is
restored with the introduction of the new lingering infrastructure
(ceph_osd_linger_request, linger_work, etc) in a later commit.Signed-off-by: Ilya Dryomov
-
Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
emphasise that it returns acting primary.Signed-off-by: Ilya Dryomov
-
Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise
that returned is raw PG and return -ENOENT instead of -EIO if the pool
doesn't exist.Signed-off-by: Ilya Dryomov
-
Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters. This is enough for all use cases,
expect one - long rbd image names:- a format 1 header is named ".rbd"
- an object that points to a format 2 header is named "rbd_id."We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh. (A format 2 header name is
a small system-generated string.)Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string. Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.Signed-off-by: Ilya Dryomov
15 Oct, 2014
2 commits
-
The 'stripe_unit' field is 64 bits, casting it to 32 bits can result zero.
Signed-off-by: Yan, Zheng
-
Following sequence of events can happen.
- Client releases an inode, queues cap release message.
- A 'lookup' reply brings the same inode back, but the reply
doesn't contain xattrs because MDS didn't receive the cap release
message and thought client already has up-to-data xattrs.The fix is force sending a getattr request to MDS if xattrs_version
is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client
does not have xattr.Signed-off-by: Yan, Zheng
06 May, 2014
1 commit
-
Pull Ceph fixes from Sage Weil:
"First, there is a critical fix for the new primary-affinity function
that went into -rc1.The second batch of patches from Zheng fix a range of problems with
directory fragmentation, readdir, and a few odds and ends for cephfs"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: reserve caps for file layout/lock MDS requests
ceph: avoid releasing caps that are being used
ceph: clear directory's completeness when creating file
libceph: fix non-default values check in apply_primary_affinity()
ceph: use fpos_cmp() to compare dentry positions
ceph: check directory's completeness before emitting directory entry
29 Apr, 2014
1 commit
-
Signed-off-by: Yan, Zheng
Reviewed-by: Sage Weil
13 Apr, 2014
1 commit
-
The vfs merge caused a latent bug to show up:
In file included from fs/ceph/super.h:4:0,
from fs/ceph/ioctl.c:3:
include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
^
In file included from include/linux/kernel.h:13:0,
from include/linux/uio.h:12,
from include/linux/socket.h:7,
from include/uapi/linux/in.h:22,
from include/linux/in.h:23,
from fs/ceph/ioctl.c:1:
include/linux/printk.h:214:0: note: this is the location of the previous definition
#define pr_fmt(fmt) fmt
^where the reason is that is included much too late
for the "pr_fmt()" define.The include of needs to be the first include in the
file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
noticeable until some unrelated header file changes brought in an
indirect earlier include of .Signed-off-by: Linus Torvalds
03 Apr, 2014
1 commit
-
The fsync(dirfd) only covers namespace operations, not inode updates.
We do not need to cover setattr variants or O_TRUNC.Reported-by: Al Viro
Signed-off-by: Sage Weil
Reviewed-by: Yan, Zheng
28 Jan, 2014
1 commit
-
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename
it to ceph_oloc_oid_to_pg() to make its purpose more clear.Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
10 Aug, 2013
2 commits
-
Func ceph_calc_ceph_pg maybe failed.So add check for returned value.
Signed-off-by: Jianpeng Ma
Reviewed-by: Sage Weil
Signed-off-by: Sage Weil -
CC: stable@vger.kernel.org
Signed-off-by: Jianpeng Ma
Reviewed-by: Sage Weil
02 May, 2013
1 commit
-
The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.Change the function so it takes a pool number rather than the full
file layout structure. Only update the ceph_pg if the pool is found
in the osd map. Get rid of few useless lines of code from the
function while there.Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().Signed-off-by: Alex Elder
Reviewed-by: Josh Durgin
01 Mar, 2013
1 commit
-
Pull Ceph updates from Sage Weil:
"A few groups of patches here. Alex has been hard at work improving
the RBD code, layout groundwork for understanding the new formats and
doing layering. Most of the infrastructure is now in place for the
final bits that will come with the next window.There are a few changes to the data layout. Jim Schutt's patch fixes
some non-ideal CRUSH behavior, and a set of patches from me updates
the client to speak a newer version of the protocol and implement an
improved hashing strategy across storage nodes (when the server side
supports it too).A pair of patches from Sam Lang fix the atomicity of open+create
operations. Several patches from Yan, Zheng fix various mds/client
issues that turned up during multi-mds torture tests.A final set of patches expose file layouts via virtual xattrs, and
allow the policies to be set on directories via xattrs as well
(avoiding the awkward ioctl interface and providing a consistent
interface for both kernel mount and ceph-fuse users)."* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
libceph: add support for HASHPSPOOL pool flag
libceph: update osd request/reply encoding
libceph: calculate placement based on the internal data types
ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
ceph: update "ceph_features.h"
libceph: decode into cpu-native ceph_pg type
libceph: rename ceph_pg -> ceph_pg_v1
rbd: pass length, not op for osd completions
rbd: move rbd_osd_trivial_callback()
libceph: use a do..while loop in con_work()
libceph: use a flag to indicate a fault has occurred
libceph: separate non-locked fault handling
libceph: encapsulate connection backoff
libceph: eliminate sparse warnings
ceph: eliminate sparse warnings in fs code
rbd: eliminate sparse warnings
libceph: define connection flag helpers
rbd: normalize dout() calls
rbd: barriers are hard
rbd: ignore zero-length requests
...
27 Feb, 2013
3 commits
-
Instead of using the old ceph_object_layout struct, update our internal
ceph_calc_object_layout method to use the ceph_pg type. This allows us to
pass the full 32-bit precision of the pgid.seed to the callers. It also
allows some callers to avoid reaching into the request structures for the
struct ceph_object_layout fields.Signed-off-by: Sage Weil
Reviewed-by: Alex Elder -
Always decode data into our cpu-native ceph_pg type that has the correct
field widths. Limit any remaining uses of ceph_pg_v1 to dealing with the
legacy protocol.Signed-off-by: Sage Weil
Reviewed-by: Alex Elder -
Rename the old version this type to distinguish it from the new version.
Signed-off-by: Sage Weil
Reviewed-by: Alex Elder
23 Feb, 2013
1 commit
-
Signed-off-by: Al Viro
18 Jan, 2013
1 commit
-
ceph_calc_file_object_mapping() takes (among other things) a "file"
offset and length, and based on the layout, determines the object
number ("bno") backing the affected portion of the file's data and
the offset into that object where the desired range begins. It also
computes the size that should be used for the request--either the
amount requested or something less if that would exceed the end of
the object.This patch changes the input length parameter in this function so it
is used only for input. That is, the argument will be passed by
value rather than by address, so the value provided won't get
updated by the function.The value would only get updated if the length would surpass the
current object, and in that case the value it got updated to would
be exactly that returned in *oxlen.Only one of the two callers is affected by this change. Update
ceph_calc_raw_layout() so it records any updated value.Signed-off-by: Alex Elder
Reviewed-by: Josh Durgin
03 Oct, 2012
1 commit
-
If the user calls GET_DATALOC on a file with an invalid (e.g.,
zeroed) layout, return EIO to userland.Signed-off-by: Sage Weil
Reviewed-by: Alex Elder
22 Aug, 2012
1 commit
-
If "l->stripe_unit" is zero the the mod on the next line will cause a
divide by zero bug. This comes from the copy_from_user() in
ceph_ioctl_set_layout_policy(). Passing 0 is valid, though (it means
"do not change") so avoid the % check in that case.Reported-by: Dan Carpenter
Signed-off-by: Sage Weil
Reviewed-by: Alex Elder
17 May, 2012
2 commits
-
Old users may not expect EINVAL, and there is no clear user-visibile
behavior change now that we ignore it.Signed-off-by: Sage Weil
Reviewed-by: Alex Elder -
When we are setting a new layout, fully initialize the structure:
- zero it out
- always set preferred_osd to -1Signed-off-by: Sage Weil
Reviewed-by: Alex Elder
08 May, 2012
2 commits
-
Both of these methods perform similar checks; move that code to a helper
so that we can ensure the checks are consistent.Reviewed-by: Alex Elder
Signed-off-by: Sage Weil -
This was an ill-conceived feature that has been removed from Ceph. Do
this gracefully:- reject attempts to specify a preferred_osd via the ioctl
- stop exposing this information via virtual xattrs
- always fill in -1 for requests, in case we talk to an older server
- don't calculate preferred_osd placements/pgidsReviewed-by: Alex Elder
Signed-off-by: Sage Weil
08 Dec, 2011
1 commit
-
We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction. That requires grabbing
a reference to the inode with the list lock protected, but igrab() now
takes i_lock to check the inode flags.Changing the list lock ordering would be a painful process.
However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().Reported-by: Amon Ott
Signed-off-by: Sage Weil
26 Oct, 2011
1 commit
-
Previously we were validating the passed-in stripe unit, object size,
and stripe count against each other (and not testing most other stuff).
Instead, make sure that the composed previous layout and new values are valid,
and only send the new values to the MDS. This lets users change the
pool without setting the whole layout, for instance.Signed-off-by: Greg Farnum
27 Jul, 2011
2 commits
-
d_parent is protected by d_lock: use it when looking up a dentry's parent
directory inode. Also take a reference and drop it in the caller to avoid
a use-after-free.Reported-by: Al Viro
Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil -
This allows us to force IO through the sync path which you normally only
get when multiple clients are reading/writing to the same file or by
mounting with -o sync. Among other things, this lets test programs verify
correctness with a single mount.Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil
08 Jun, 2011
1 commit
-
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock. This avoids adding new and unnecessary
locking dependencies.Signed-off-by: Sage Weil