26 Jan, 2017
1 commit
-
commit 1097680d759918ce4a8705381c0ab2ed7bd60cf1 upstream.
sparse says:
fs/ceph/dir.c:1248:50: warning: incorrect type in assignment (different base types)
fs/ceph/dir.c:1248:50: expected restricted __le32 [usertype] mask
fs/ceph/dir.c:1248:50: got int [signed] [assigned] maskFixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
Signed-off-by: Jeff Layton
Reviewed-by: Sage Weil
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman
08 Dec, 2016
1 commit
-
This function sets req->r_locked_dir which is supposed to indicate to
ceph_fill_trace that the parent's i_rwsem is locked for write.
Unfortunately, there is no guarantee that the dir will be locked when
d_revalidate is called, so we really don't want ceph_fill_trace to do
any dcache manipulation from this context. Clear req->r_locked_dir since
it's clearly not safe to do that.What we really want to know with d_revalidate is whether the dentry
still points to the same inode. ceph_fill_trace installs a pointer to
the inode in req->r_target_inode, so we can just compare that to
d_inode(dentry) to see if it's the same one after the lookup.Also, since we aren't generally interested in the parent here, we can
switch to using a GETATTR to hint that to the MDS, which also means that
we only need to reserve one cap.Finally, just remove the d_unhashed check. That's really outside the
purview of a filesystem's d_revalidate. If the thing became unhashed
while we're checking it, then that's up to the VFS to handle anyway.Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
Link: http://tracker.ceph.com/issues/18041
Reported-by: Donatas Abraitis
Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov
11 Oct, 2016
1 commit
-
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
08 Oct, 2016
1 commit
-
These inode operations are no longer used; remove them.
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro
27 Sep, 2016
2 commits
-
Generated patch:
sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`Signed-off-by: Miklos Szeredi
-
This is trivial to do:
- add flags argument to foo_rename()
- check if flags is zero
- assign foo_rename() to .rename2 instead of .renameThis doesn't mean it's impossible to support RENAME_NOREPLACE for these
filesystems, but it is not trivial, like for local filesystems.
RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
for a file to be created on one host while it is overwritten by rename on
another host).Filesystems converted:
9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.
After this, we can get rid of the duplicate interfaces for rename.
Signed-off-by: Miklos Szeredi
Acked-by: Greg Kroah-Hartman
Acked-by: David Howells [AFS]
Acked-by: Mike Marshall
Cc: Eric Van Hensbergen
Cc: Ilya Dryomov
Cc: Jan Harkes
Cc: Tyler Hicks
Cc: Oleg Drokin
Cc: Trond Myklebust
Cc: Mark Fasheh
05 Sep, 2016
1 commit
-
Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
comparison operator with an assignment one.This looks like a typo which is reported by clang when building the
kernel with some warning flags:fs/ceph/dir.c:600:22: error: using the result of an assignment as a
condition without parentheses [-Werror,-Wparentheses]
} else if (fi->frag |= fpos_frag(new_pos)) {
~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
fs/ceph/dir.c:600:22: note: place parentheses around the assignment
to silence this warning
} else if (fi->frag |= fpos_frag(new_pos)) {
^
( )
fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
assignment into an inequality comparison
} else if (fi->frag |= fpos_frag(new_pos)) {
^~
!=Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
Signed-off-by: Nicolas Iooss
Signed-off-by: Ilya Dryomov
28 Jul, 2016
4 commits
-
We can now handle the snapshot cases under RCU, as well as the
non-snapshot case when we don't need to queue up a lease renewal
allow LOOKUP_RCU walks to proceed under those conditions.Signed-off-by: Jeff Layton
Reviewed-by: Yan, Zheng -
Under rcuwalk, we need to take extra care when dereferencing d_parent.
We want to do that once and pass a pointer to dentry_lease_is_valid.Also, we must ensure that that function can handle the case where we're
racing with d_release. Check whether "di" is NULL under the d_lock, and
just return 0 if so.Finally, we still need to kick off a renewal job if the lease is getting
close to expiration. If that's the case, then just drop out of rcuwalk
mode since that could block.Signed-off-by: Jeff Layton
Reviewed-by: Yan, Zheng -
To check for a valid dentry lease, we need to get at the
ceph_dentry_info. Under rcuwalk though, we may end up with a dentry that
is on its way to destruction. Since we need to take the d_lock in
dentry_lease_is_valid already, we can just ensure that we clear the
d_fsinfo pointer out under the same lock before destroying it.Signed-off-by: Jeff Layton
Reviewed-by: Yan, Zheng -
Pretty simple: just use ceph_dentry_info.time instead (which was already
there, unused).Signed-off-by: Miklos Szeredi
27 May, 2016
1 commit
-
Pull Ceph updates from Sage Weil:
"This changeset has a few main parts:- Ilya has finished a huge refactoring effort to sync up the
client-side logic in libceph with the user-space client code, which
has evolved significantly over the last couple years, with lots of
additional behaviors (e.g., how requests are handled when cluster
is full and transitions from full to non-full).This structure of the code is more closely aligned with userspace
now such that it will be much easier to maintain going forward when
behavior changes take place. There are some locking improvements
bundled in as well.- Zheng adds multi-filesystem support (multiple namespaces within the
same Ceph cluster)- Zheng has changed the readdir offsets and directory enumeration so
that dentry offsets are hash-based and therefore stable across
directory fragmentation events on the MDS.- Zheng has a smorgasbord of bug fixes across fs/ceph"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
ceph: fix wake_up_session_cb()
ceph: don't use truncate_pagecache() to invalidate read cache
ceph: SetPageError() for writeback pages if writepages fails
ceph: handle interrupted ceph_writepage()
ceph: make ceph_update_writeable_page() uninterruptible
libceph: make ceph_osdc_wait_request() uninterruptible
ceph: handle -EAGAIN returned by ceph_update_writeable_page()
ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
ceph: block non-fatal signals for fault/page_mkwrite
ceph: make logical calculation functions return bool
ceph: tolerate bad i_size for symlink inode
ceph: improve fragtree change detection
ceph: keep leaf frag when updating fragtree
ceph: fix dir_auth check in ceph_fill_dirfrag()
ceph: don't assume frag tree splits in mds reply are sorted
ceph: fix inode reference leak
ceph: using hash value to compose dentry offset
ceph: don't forbid marking directory complete after forward seek
ceph: record 'offset' for each entry of readdir result
ceph: define 'end/complete' in readdir reply as bit flags
...
26 May, 2016
9 commits
-
This patch makes serverl logical caculation functions return bool to
improve readability due to these particular functions only using 0/1
as their return value.No functional change.
Signed-off-by: Zhang Zhuoyu
-
If MDS sorts dentries in dirfrag in hash order, we use hash value to
compose dentry offset. dentry offset is:(0xff << 52) | ((24 bits hash) << 28) |
(the nth entry hash hash collision)This offset is stable across directory fragmentation. This alos means
there is no need to reset readdir offset if directory get fragmented
in the middle of readdir.Signed-off-by: Yan, Zheng
-
Forward seek within same frag does not update fi->last_name, it will
not affect contents of later readdir reply. So there is no need to
forbid marking directory completeSigned-off-by: Yan, Zheng
-
This is preparation for using hash value as dentry 'offset'
Signed-off-by: Yan, Zheng
-
Set a flag in readdir request, which indicates that client interprets
'end/complete' as bit flags. So that mds can reply additional flags in
readdir reply.Signed-off-by: Yan, Zheng
-
This avoids defining multiple arrays for entries in readdir reply
Signed-off-by: Yan, Zheng
-
don't distinguish leftmost frag from other frags. always use 2 as
first entry's offset.Signed-off-by: Yan, Zheng
-
we never add snapdir and the hidden .ceph dir into readdir cache
Signed-off-by: Yan, Zheng
-
use binary search to find cache index that corresponds to readdir
postion.Signed-off-by: Yan, Zheng
19 May, 2016
1 commit
-
Pull remaining vfs xattr work from Al Viro:
"The rest of work.xattr (non-cifs conversions)"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
btrfs: Switch to generic xattr handlers
ubifs: Switch to generic xattr handlers
jfs: Switch to generic xattr handlers
jfs: Clean up xattr name mapping
gfs2: Switch to generic xattr handlers
ceph: kill __ceph_removexattr()
ceph: Switch to generic xattr handlers
ceph: Get rid of d_find_alias in ceph_set_acl
24 Apr, 2016
1 commit
-
Add a catch-all xattr handler at the end of ceph_xattr_handlers. Check
for valid attribute names there, and remove those checks from
__ceph_{get,set,remove}xattr instead. No "system.*" xattrs need to be
handled by the catch-all handler anymore.The set xattr handler is called with a NULL value to indicate that the
attribute should be removed; __ceph_setxattr already handles that case
correctly (ceph_set_acl could already calling __ceph_setxattr with a NULL
value).Move the check for snapshots from ceph_{set,remove}xattr into
__ceph_{set,remove}xattr. With that, ceph_{get,set,remove}xattr can be
replaced with the generic iops.Signed-off-by: Andreas Gruenbacher
Signed-off-by: "Yan, Zheng"
Signed-off-by: Al Viro
05 Apr, 2016
1 commit
-
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.Let's stop pretending that pages in page cache are special. They are
not.The changes are pretty straight-forward:
- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
26 Mar, 2016
4 commits
-
Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag GFP_ZERO.
Signed-off-by: Geliang Tang
Signed-off-by: Ilya Dryomov -
If dentry has no lease, ceph_d_revalidate() previously return 0.
This causes VFS to invalidate the dentry and create a new dentry
for later lookup. Invalidating a dentry also detach any underneath
mount points. So mount point inside cephfs can disapear mystically
(even the mount point is not modified by other hosts).The fix is using lookup request to revalidate dentry without lease.
This can partly solve the mount points disapear issue (as long as
the mount point is not modified by other hosts)Signed-off-by: Yan, Zheng
-
use vfs helper dget_parent() instead
Signed-off-by: Yan, Zheng
-
When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and waits for the reply. This makes MDS' dispatch sleep,
nobody handles later MDS replies.The fix is make sure lookup/atomic_open reply include xattrs and
corresponding caps. So getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.Signed-off-by: Yan, Zheng
23 Jan, 2016
1 commit
-
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.Signed-off-by: Al Viro
25 Jun, 2015
5 commits
-
Previously our dcache readdir code relies on that child dentries in
directory dentry's d_subdir list are sorted by dentry's offset in
descending order. When adding dentries to the dcache, if a dentry
already exists, our readdir code moves it to head of directory
dentry's d_subdir list. This design relies on dcache internals.
Al Viro suggests using ncpfs's approach: keeping array of pointers
to dentries in page cache of directory inode. the validity of those
pointers are presented by directory inode's complete and ordered
flags. When a dentry gets pruned, we clear directory inode's complete
flag in the d_prune() callback. Before moving a dentry to other
directory, we clear the ordered flag for both old and new directory.Signed-off-by: Yan, Zheng
-
GFP_NOFS memory allocation is required for page writeback path.
But there is no need to use GFP_NOFS in syscall path and readpage
pathSigned-off-by: Yan, Zheng
-
fsync() on directory should flush dirty caps and wait for any
uncommitted directory opertions to commit. But ceph_dir_fsync()
only waits for uncommitted directory opertions.Signed-off-by: Yan, Zheng
-
No need to bifurcate wait now that we've got ceph_timeout_jiffies().
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Reviewed-by: Yan, Zheng -
There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.There is also a bug in the way mount_timeout=0 is handled. It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
27 Apr, 2015
1 commit
-
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
22 Apr, 2015
1 commit
-
Signed-off-by: Yan, Zheng
20 Apr, 2015
3 commits
-
Currently, there is no check for the kstrdup() for r_path2,
r_path1 and snapdir_name as various locations as there is a
possibility of failure during memory pressure. Therefore,
returning ENOMEM where the checks have been missed.Signed-off-by: Sanidhya Kashyap
Signed-off-by: Yan, Zheng -
return type of wait_for_completion_timeout is unsigned long not int. An
appropriately named unsigned long is added and the assignment fixed up.Signed-off-by: Nicholas Mc Guire
Signed-off-by: Yan, Zheng -
Signed-off-by: Yan, Zheng
16 Apr, 2015
1 commit
-
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells
Signed-off-by: Al Viro