Eric Lee / smarc-fsl-linux-kernel

01 Oct, 2020

2 commits

3e4ca8bf5 ceph: fix potential race in ceph_check_caps

[ Upstream commit dc3da0461cc4b76f2d0c5b12247fcb3b520edbbf ]

Nothing ensures that session will still be valid by the time we
dereference the pointer. Take and put a reference.

In principle, we should always be able to get a reference here, but
throw a warning if that's ever not the case.

Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Jeff Layton
2020-10-01 19:18:08 +0800
379deeac5 ceph: ensure we have a new cap before continuing in fill_inode

[ Upstream commit 9a6bed4fe0c8bf57785cbc4db9f86086cb9b193d ]

If the caller passes in a NULL cap_reservation, and we can't allocate
one then ensure that we fail gracefully.

Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Jeff Layton
2020-10-01 19:17:29 +0800

10 Sep, 2020

1 commit

0a8dcad24 ceph: don't allow setlease on cephfs

[ Upstream commit 496ceaf12432b3d136dcdec48424312e71359ea7 ]

Leases don't currently work correctly on kcephfs, as they are not broken
when caps are revoked. They could eventually be implemented similarly to
how we did them in libcephfs, but for now don't allow them.

[ idryomov: no need for simple_nosetlease() in ceph_dir_fops and
ceph_snapdir_fops ]

Signed-off-by: Jeff Layton
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Jeff Layton
2020-09-10 01:12:21 +0800

03 Sep, 2020

2 commits

a002274db ceph: do not access the kiocb after aio requests

[ Upstream commit d1d9655052606fd9078e896668ec90191372d513 ]

In the aio case, if the completion arrives just before ceph_read_iter()
returns to fs/aio.c, the kiocb will be freed in the completion callback.
If ceph_read_iter() then accesses the kiocb again, we can hit a
use-after-free bug.

[ jlayton: initialize direct_lock early, and use it everywhere ]

URL: https://tracker.ceph.com/issues/45649
Signed-off-by: Xiubo Li
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Xiubo Li
2020-09-03 17:26:47 +0800
01540d5e7 ceph: fix potential mdsc use-after-free crash

[ Upstream commit fa9967734227b44acb1b6918033f9122dc7825b9 ]

Make sure the delayed work has stopped before releasing the resources.

cancel_delayed_work_sync() will only guarantee that the work finishes
executing if the work is already in the ->worklist. That means after
the cancel_delayed_work_sync() returns, it will leave the work requeued
if it was rearmed at the end. That can lead to a use after free once the
work struct is freed.

Fix it by flushing the delayed work instead of trying to cancel it, and
ensure that the work doesn't rearm if the mdsc is stopping.

URL: https://tracker.ceph.com/issues/46293
Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Xiubo Li
2020-09-03 17:26:47 +0800

26 Aug, 2020

1 commit

2bd8ba398 ceph: fix use-after-free for fsc->mdsc

[ Upstream commit a7caa88f8b72c136f9a401f498471b8a8e35370d ]

If ceph_mdsc_init() fails, it will already have freed the mdsc.

Reported-by: syzbot+b57f46d8d6ea51960b8c@syzkaller.appspotmail.com
Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Xiubo Li
2020-08-26 16:40:56 +0800

21 Aug, 2020

2 commits

37e3a1c08 ceph: handle zero-length feature mask in session messages

commit 02e37571f9e79022498fd0525c073b07e9d9ac69 upstream.

Most session messages contain a feature mask, but the MDS will
routinely send a REJECT message with one that is zero-length.

Commit 0fa8263367db ("ceph: fix endianness bug when handling MDS
session feature bits") fixed the decoding of the feature mask,
but failed to account for the MDS sending a zero-length feature
mask. This causes REJECT message decoding to fail.

Skip trying to decode a feature mask if the word count is zero.
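
The guard described above can be sketched in userspace C. This is only an illustration: the wire layout (a 32-bit little-endian length followed by that many mask bytes) and the names get_le() and decode_feature_mask() are assumptions made for the example, not the kernel's actual decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte little-endian value (n <= 8), independent of host order. */
static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/*
 * Hypothetical layout: u32 length (LE), then `length` bytes of feature mask.
 * Returns 0 on success and stores the first 64 feature bits in *features;
 * returns -1 if the buffer is too short. A zero-length mask is valid and
 * yields *features == 0.
 */
static int decode_feature_mask(const uint8_t *buf, size_t buflen,
                               uint64_t *features)
{
    if (buflen < 4)
        return -1;
    uint32_t len = (uint32_t)get_le(buf, 4);
    if (len > buflen - 4)
        return -1;

    *features = 0;
    if (len == 0)            /* e.g. a REJECT message: nothing to decode */
        return 0;
    if (len < 8)
        return -1;           /* mask present but shorter than one u64 word */
    *features = get_le(buf + 4, 8);
    return 0;
}
```

Decoding byte by byte also sidesteps the host-endianness issue that the earlier fix addressed.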

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46823
Fixes: 0fa8263367db ("ceph: fix endianness bug when handling MDS session feature bits")
Signed-off-by: Jeff Layton
Reviewed-by: Ilya Dryomov
Tested-by: Patrick Donnelly
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2020-08-21 19:05:28 +0800
8953e8cb0 ceph: set sec_context xattr on symlink creation

commit b748fc7a8763a5b3f8149f12c45711cd73ef8176 upstream.

Symlink inodes should have the security context set in their xattrs on
creation. We already set the context on creation, but we don't attach
the pagelist. The effect is that symlink inodes don't get an SELinux
context set on them at creation, so they end up unlabeled instead of
inheriting the proper context. Make it do so.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2020-08-21 19:05:28 +0800

24 Jun, 2020

1 commit

807460787 ceph: don't return -ESTALE if there's still an open file

[ Upstream commit 878dabb64117406abd40977b87544d05bb3031fc ]

Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting
a file handle to dentry"), this fixes another corner case with
name_to_handle_at/open_by_handle_at. The issue has been detected by
xfstest generic/467, when doing:

- name_to_handle_at("/cephfs/myfile")
- open("/cephfs/myfile")
- unlink("/cephfs/myfile")
- sync; sync;
- drop caches
- open_by_handle_at()

The call to open_by_handle_at should not fail because the file hasn't been
deleted yet (only unlinked) and we do have a valid handle to it. -ESTALE
shall be returned only if i_nlink is 0 *and* i_count is 1.

This patch also makes sure we have LINK caps before checking i_nlink.
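
The -ESTALE rule stated above boils down to a two-part condition. A minimal sketch, assuming a hypothetical struct inode_state that stands in for the inode's i_nlink and i_count fields (the struct and check_handle() are illustrative, not kernel code):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two inode fields the message refers to. */
struct inode_state {
    unsigned int nlink;   /* number of directory links (i_nlink) */
    int refcount;         /* in-memory references (i_count) */
};

/*
 * Return -ESTALE only when the inode has no links left AND the caller holds
 * the sole reference, i.e. nobody else still has the file open.
 */
static int check_handle(const struct inode_state *st)
{
    bool unlinked = st->nlink == 0;
    bool only_ours = st->refcount == 1;

    return (unlinked && only_ours) ? -ESTALE : 0;
}
```

With the sequence listed in the message, the open() keeps a second reference alive, so the handle remains usable after the unlink.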

Signed-off-by: Luis Henriques
Reviewed-by: Jeff Layton
Acked-by: Amir Goldstein
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Luis Henriques
2020-06-24 23:50:37 +0800

03 Jun, 2020

1 commit

6b292d780 ceph: flush release queue when handling caps for unknown inode

[ Upstream commit fb33c114d3ed5bdac230716f5b0a93b56b92a90d ]

It's possible for the VFS to completely forget about an inode, but for
it to still be sitting on the cap release queue. If the MDS sends the
client a cap message for such an inode, it just ignores it today, which
can lead to a stall of up to 5s until the cap release queue is flushed.

If we get a cap message for an inode that can't be located, then go
ahead and flush the cap release queue.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/45532
Fixes: 1e9c2eb6811e ("ceph: delete stale dentry when last reference is dropped")
Reported-and-Tested-by: Andrej Filipčič
Suggested-by: Yan, Zheng
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Jeff Layton
2020-06-03 14:21:25 +0800

27 May, 2020

1 commit

b68d27c5f ceph: fix double unlock in handle_cap_export()

[ Upstream commit 4d8e28ff3106b093d98bfd2eceb9b430c70a8758 ]

If ceph_mdsc_open_export_target_session() fails, we do a "goto retry",
but the session mutex has already been unlocked. Re-lock the mutex in
that case to ensure that we don't unlock it twice.

Signed-off-by: Wu Bo
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Wu Bo
2020-05-27 23:46:34 +0800

14 May, 2020

2 commits

53f453031 ceph: demote quotarealm lookup warning to a debug message

commit 12ae44a40a1be891bdc6463f8c7072b4ede746ef upstream.

A misconfigured cephx can easily result in having the kernel client
flooding the logs with:

ceph: Can't lookup inode 1 (err: -13)

Change this message to debug level.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/44546
Signed-off-by: Luis Henriques
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Luis Henriques
2020-05-14 13:58:27 +0800
3fd9f902c ceph: fix endianness bug when handling MDS session feature bits

commit 0fa8263367db9287aa0632f96c1a5f93cc478150 upstream.

Eduard reported a problem mounting cephfs on s390 arch. The feature
mask sent by the MDS is little-endian, so we need to convert it
before storing and testing against it.

Cc: stable@vger.kernel.org
Reported-and-Tested-by: Eduard Shishkin
Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2020-05-14 13:58:27 +0800

29 Apr, 2020

2 commits

b71ac8086 ceph: don't skip updating wanted caps when cap is stale

[ Upstream commit 0aa971b6fd3f92afef6afe24ef78d9bb14471519 ]

1. try_get_cap_refs() fails to get caps and finds that mds_wanted
does not include what it wants. It returns -ESTALE.
2. ceph_get_caps() calls ceph_renew_caps(). ceph_renew_caps() finds
that inode has cap, so it calls ceph_check_caps().
3. ceph_check_caps() finds that issued caps (without checking if it's
stale) already includes caps wanted by open file, so it skips
updating wanted caps.

The above events can cause an infinite loop inside ceph_get_caps().

Signed-off-by: "Yan, Zheng"
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Yan, Zheng
2020-04-29 22:32:58 +0800
acbfccc6a ceph: return ceph_mdsc_do_request() errors from __get_parent()

[ Upstream commit c6d50296032f0b97473eb2e274dc7cc5d0173847 ]

Return the error returned by ceph_mdsc_do_request(). Otherwise,
r_target_inode ends up being NULL and __get_parent() returns ENOENT
regardless of the actual error.

Signed-off-by: Qiujun Huang
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Qiujun Huang
2020-04-29 22:32:58 +0800

13 Apr, 2020

2 commits

193490dbe ceph: canonicalize server path in place

commit b27a939e8376a3f1ed09b9c33ef44d20f18ec3d0 upstream.

syzbot reported that 4fbc0c711b24 ("ceph: remove the extra slashes in
the server path") had caused a regression where an allocation could be
done under a spinlock -- compare_mount_options() is called by sget_fc()
with sb_lock held.

We don't really need the supplied server path, so canonicalize it
in place and compare it directly. To make this work, the leading
slash is kept around and the logic in ceph_real_mount() to skip it
is restored. CEPH_MSG_CLIENT_SESSION now reports the same (i.e.
canonicalized) path, with the leading slash of course.

Fixes: 4fbc0c711b24 ("ceph: remove the extra slashes in the server path")
Reported-by: syzbot+98704a51af8e3d9425a9@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton
Signed-off-by: Luis Henriques
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2020-04-13 16:48:11 +0800
56385788f ceph: remove the extra slashes in the server path

commit 4fbc0c711b2464ee1551850b85002faae0b775d5 upstream.

It's possible to pass the mount helper a server path that has more
than one contiguous slash character. For example:

$ mount -t ceph 192.168.195.165:40176:/// /mnt/cephfs/

On the MDS side, the extra slashes in the server path will be treated
as a snap dir, and we then get the following debug logs:

ceph: mount opening path //
ceph: open_root_inode opening '//'
ceph: fill_trace 0000000059b8a3bc is_dentry 0 is_target 1
ceph: alloc_inode 00000000dc4ca00b
ceph: get_inode created new inode 00000000dc4ca00b 1.ffffffffffffffff ino 1
ceph: get_inode on 1=1.ffffffffffffffff got 00000000dc4ca00b

And then when creating any new file or directory under the mount
point, we can hit the following BUG_ON in ceph_fill_trace():

BUG_ON(ceph_snap(dir) != dvino.snap);

Have the client ignore the extra slashes in the server path when
mounting. This will also canonicalize the path, so that identical mounts
can be consolidated.

1) "//mydir1///mydir//"
2) "/mydir1/mydir"
3) "/mydir1/mydir/"

Regardless of the internal treatment of these paths, the kernel still
stores the original string including the leading '/' for presentation
to userland.
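
The canonicalization described above can be sketched as an in-place rewrite. This is a userspace illustration that assumes the three example paths are meant to compare equal; canonicalize_path() is a hypothetical name, not the function used in the kernel.

```c
/*
 * Collapse runs of '/' into a single '/' and drop a trailing '/', rewriting
 * `path` in place. A leading '/' is preserved; the root path "/" stays "/".
 */
static void canonicalize_path(char *path)
{
    char *src = path, *dst = path;

    while (*src) {
        *dst++ = *src;
        if (*src == '/') {
            while (*src == '/')
                src++;          /* skip the rest of this run of slashes */
        } else {
            src++;
        }
    }
    /* strip one trailing slash unless the whole path is "/" */
    if (dst - path > 1 && dst[-1] == '/')
        dst--;
    *dst = '\0';
}
```

Because the output is never longer than the input, no allocation is needed, which matters when the comparison runs in a context that must not allocate.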

URL: https://tracker.ceph.com/issues/42771
Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Luis Henriques
Signed-off-by: Greg Kroah-Hartman

Xiubo Li
2020-04-13 16:48:11 +0800

01 Apr, 2020

2 commits

7cdaa5cd7 ceph: fix memory leak in ceph_cleanup_snapid_map()

commit c8d6ee01449cd0d2f30410681cccb616a88f50b1 upstream.

kmemleak reports the following memory leak:

unreferenced object 0xffff88821feac8a0 (size 96):
comm "kworker/1:0", pid 17, jiffies 4294896362 (age 20.512s)
hex dump (first 32 bytes):
a0 c8 ea 1f 82 88 ff ff 00 c9 ea 1f 82 88 ff ff ................
00 00 00 00 00 00 00 00 00 01 00 00 00 00 ad de ................
backtrace:
[] ceph_get_snapid_map+0x75/0x2a0
[] fill_inode+0xb26/0x1010
[] ceph_readdir_prepopulate+0x389/0xc40
[] dispatch+0x11ab/0x1521
[] ceph_con_workfn+0xf3d/0x3240
[] process_one_work+0x24d/0x590
[] worker_thread+0x4a/0x3d0
[] kthread+0xfb/0x130
[] ret_from_fork+0x3a/0x50

A kfree is missing while looping the 'to_free' list of ceph_snapid_map
objects.
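
The pattern at issue is draining a list of detached objects. A simplified userspace sketch, using a hypothetical singly linked struct snapid_map and free() in place of the kernel's list helpers and kfree():

```c
#include <stdlib.h>

struct snapid_map {              /* hypothetical, simplified node */
    struct snapid_map *next;
    unsigned long long snap;
};

/*
 * Detach and release every node on a singly linked 'to_free' list.
 * Returns the number of nodes freed. Unlinking a node without the free()
 * call is the kind of omission that leaks each object.
 */
static size_t free_snapid_list(struct snapid_map **to_free)
{
    size_t freed = 0;

    while (*to_free) {
        struct snapid_map *sm = *to_free;

        *to_free = sm->next;     /* unlink first, so freed memory is never read */
        free(sm);
        freed++;
    }
    return freed;
}
```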

Cc: stable@vger.kernel.org
Fixes: 75c9627efb72 ("ceph: map snapid to anonymous bdev ID")
Signed-off-by: Luis Henriques
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Luis Henriques
2020-04-01 17:01:59 +0800
ed24820d1 ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL

commit 7614209736fbc4927584d4387faade4f31444fce upstream.

CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
per-pool flags as well. Unfortunately the backwards compatibility here
is lacking:

- the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
was guarded by require_osd_release >= RELEASE_LUMINOUS
- it was subsequently backported to luminous in v12.2.2, but that makes
no difference to clients that only check OSDMAP_FULL/NEARFULL because
require_osd_release is not client-facing -- it is for OSDs

Since all kernels are affected, the best we can do here is just start
checking both map flags and pool flags and send that to stable.

These checks are best effort, so take osdc->lock and look up pool flags
just once. Remove the FIXME, since filesystem quotas are checked above
and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.
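
Checking both sets of flags can be sketched as follows. The EX_* bit values are placeholders chosen for the example; the real constants are defined in the Ceph headers and are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the real Ceph constants. */
#define EX_OSDMAP_FULL        (1u << 0)
#define EX_OSDMAP_NEARFULL    (1u << 1)
#define EX_POOL_FLAG_FULL     (1u << 0)
#define EX_POOL_FLAG_NEARFULL (1u << 1)

/* Full if either the map-wide flag or the per-pool flag says so. */
static bool target_is_full(uint32_t map_flags, uint32_t pool_flags)
{
    return (map_flags & EX_OSDMAP_FULL) || (pool_flags & EX_POOL_FLAG_FULL);
}

/* Same idea for the nearfull condition. */
static bool target_is_nearfull(uint32_t map_flags, uint32_t pool_flags)
{
    return (map_flags & EX_OSDMAP_NEARFULL) ||
           (pool_flags & EX_POOL_FLAG_NEARFULL);
}
```

A client that looks only at the map flags would report "not full" for a cluster that sets just the pool flag, which is the gap the commit closes.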

Cc: stable@vger.kernel.org
Reported-by: Yanhu Cao
Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton
Acked-by: Sage Weil
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2020-04-01 17:01:58 +0800

05 Mar, 2020

1 commit

b520f78ba ceph: do not execute direct write in parallel if O_APPEND is specified

[ Upstream commit 8e4473bb50a1796c9c32b244e5dbc5ee24ead937 ]

In O_APPEND & O_DIRECT mode, the data from different writers may
overlap since they all take the shared lock.

For example, both Writer1 and Writer2 are in O_APPEND and O_DIRECT
mode:

    Writer1                           Writer2

shared_lock()                       shared_lock()
getattr(CAP_SIZE)                   getattr(CAP_SIZE)
iocb->ki_pos = EOF                  iocb->ki_pos = EOF
write(data1)
                                    write(data2)
shared_unlock()                     shared_unlock()

data2 will then overwrite data1 starting at the same file offset, the
old EOF.

Switch to exclusive lock instead when O_APPEND is specified.
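
The lock choice can be sketched as a small decision function. The flag bits and write_lock_mode() below are hypothetical stand-ins for the kernel's per-request flags, and treating buffered writes as exclusive is an assumption of this sketch.

```c
/* Hypothetical flag bits standing in for the kernel's per-request flags. */
#define EX_DIRECT (1u << 0)
#define EX_APPEND (1u << 1)

enum lock_mode { LOCK_SHARED, LOCK_EXCLUSIVE };

/*
 * Direct writes may share the lock with each other, except when appending:
 * an appender must sample EOF and write inside one exclusive section.
 * Non-direct (buffered) writes are treated as exclusive in this sketch.
 */
static enum lock_mode write_lock_mode(unsigned int flags)
{
    if ((flags & (EX_DIRECT | EX_APPEND)) == EX_DIRECT)
        return LOCK_SHARED;
    return LOCK_EXCLUSIVE;
}
```

Under this rule the two appending writers in the timeline are serialized, so the second one observes the EOF left behind by the first.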

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Xiubo Li
2020-03-05 23:43:38 +0800

24 Feb, 2020

1 commit

bd4e18941 ceph: check availability of mds cluster on mount after wait timeout

[ Upstream commit 97820058fb2831a4b203981fa2566ceaaa396103 ]

If all the MDS daemons are down for some reason, then the first mount
attempt will fail with EIO after the mount request times out. A mount
attempt will also fail with EIO if all of the MDS's are laggy.

This patch changes the code to return -EHOSTUNREACH in these situations
and adds a pr_info error message to help the admin determine the cause.

URL: https://tracker.ceph.com/issues/4386
Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Xiubo Li
2020-02-24 15:36:59 +0800

29 Jan, 2020

1 commit

fdd0f3b0e ceph: hold extra reference to r_parent over life of request

commit 9c1c2b35f1d94de8325344c2777d7ee67492db3b upstream.

Currently, we just assume that it will stick around by virtue of the
submitter's reference, but later patches will allow the syscall to
return early and we can't rely on that reference at that point.

While I'm not aware of any reports of it, Xiubo pointed out that this
may fix a use-after-free. If the wait for a reply times out or is
canceled via signal, and then the reply comes in after the syscall
returns, the client can end up trying to access r_parent without a
reference.

Take an extra reference to the inode when setting r_parent and release
it when releasing the request.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2020-01-29 23:45:24 +0800

18 Dec, 2019

1 commit

c13f137cf ceph: fix compat_ioctl for ceph_dir_operations

commit 18bd6caaef4021803dd0d031dc37c2d001d18a5b upstream.

The ceph_ioctl function is used both for files and directories, but only
the files support doing that in 32-bit compat mode.

On the s390 architecture, there is also a problem with invalid 31-bit
pointers that need to be passed through compat_ptr().

Use the new compat_ptr_ioctl() to address both issues.

Note: When backporting this patch to stable kernels, "compat_ioctl:
add compat_ptr_ioctl()" is needed as well.

Reviewed-by: "Yan, Zheng"
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann
Signed-off-by: Greg Kroah-Hartman

Arnd Bergmann
2019-12-18 02:55:31 +0800

15 Nov, 2019

2 commits

6a81749eb ceph: increment/decrement dio counter on async requests

Ceph can in some cases issue an async DIO request, in which case we can
end up calling ceph_end_io_direct before the I/O is actually complete.
That may allow buffered operations to proceed while DIO requests are
still in flight.

Fix this by incrementing the i_dio_count when issuing an async DIO
request, and decrement it when tearing down the aio_req.

Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes")
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-11-15 01:44:51 +0800
a81bc3102 ceph: take the inode lock before acquiring cap refs

Most of the time, we (or the vfs layer) take the inode_lock and then
acquires caps, but ceph_read_iter does the opposite, and that can lead
to a deadlock.

When there are multiple clients treading over the same data, we can end
up in a situation where a reader takes caps and then tries to acquire
the inode_lock. Another task holds the inode_lock and issues a request
to the MDS which needs to revoke the caps, but that can't happen until
the inode_lock is unwedged.

Fix this by having ceph_read_iter take the inode_lock earlier, before
attempting to acquire caps.

Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes")
Link: https://tracker.ceph.com/issues/36348
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-11-15 01:44:51 +0800

08 Nov, 2019

1 commit

ff29fde84 ceph: return -EINVAL if given fsc mount option on kernel w/o support

If someone requests fscache on the mount, and the kernel doesn't
support it, it should fail the mount.

[ Drop ceph prefix -- it's provided by pr_err. ]

Signed-off-by: Jeff Layton
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-11-08 01:03:23 +0800

05 Nov, 2019

2 commits

a3a081938 ceph: don't allow copy_file_range when stripe_count != 1

copy_file_range tries to use the OSD 'copy-from' operation, which simply
performs a full object copy. Unfortunately, the implementation of this
system call assumes that stripe_count is always set to 1 and doesn't take
into account that the data may be striped across an object set. If the
file layout has stripe_count different from 1, then the destination file
data will be corrupted.

For example:

Consider an 8 MiB file with 4 MiB object size, stripe_count of 2 and
stripe_size of 2 MiB; the first half of the file will be filled with 'A's
and the second half will be filled with 'B's:

            0      4M     8M       Obj1     Obj2
            +------+------+       +----+   +----+
      file: | AAAA | BBBB |       | AA |   | AA |
            +------+------+       |----|   |----|
                                  | BB |   | BB |
                                  +----+   +----+

If we copy_file_range this file into a new file (which needs to have the
same file layout!), then it will start by copying the object starting at
file offset 0 (Obj1). And then it will copy the object starting at file
offset 4M -- which is Obj1 again.

Unfortunately, the solution for this is to not allow remote object copies
to be performed when the file layout stripe_count is not 1 and simply
fall back to the default (VFS) copy_file_range implementation.
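
The example can be checked with a small offset-to-object calculation. This is a sketch of the usual striping arithmetic, written for illustration; struct layout and offset_to_object() are hypothetical names, and object indices start at 0, so index 0 corresponds to Obj1 in the diagram.

```c
#include <stdint.h>

struct layout {            /* simplified file layout; all sizes in bytes */
    uint64_t stripe_unit;
    uint64_t stripe_count;
    uint64_t object_size;
};

/* Map a file offset to the index of the object that stores it. */
static uint64_t offset_to_object(const struct layout *l, uint64_t off)
{
    uint64_t units_per_object = l->object_size / l->stripe_unit;
    uint64_t blockno   = off / l->stripe_unit;        /* which stripe unit */
    uint64_t stripeno  = blockno / l->stripe_count;   /* which stripe (row) */
    uint64_t stripepos = blockno % l->stripe_count;   /* column within the row */
    uint64_t objsetno  = stripeno / units_per_object; /* which object set */

    return objsetno * l->stripe_count + stripepos;
}
```

For the layout in the example (stripe_unit 2 MiB, stripe_count 2, object_size 4 MiB), offsets 0 and 4 MiB both map to object index 0, which is why copying "the object at offset 4M" copies the first object a second time. With stripe_count 1 and a 4 MiB stripe unit, offset 4 MiB maps to object index 1 as the remote copy expects.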

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Luis Henriques
2019-11-05 22:42:58 +0800
5bb5e6ee6 ceph: don't try to handle hashed dentries in non-O_CREAT atomic_open

If ceph_atomic_open is handed a !d_in_lookup dentry, then that means
that it already passed d_revalidate so we *know* that it's negative (or
at least was very recently). Just return -ENOENT in that case.

This also addresses a subtle bug in dentry handling. Non-O_CREAT opens
call atomic_open with the parent's i_rwsem shared, but calling
d_splice_alias on a hashed dentry requires the exclusive lock.

If ceph_atomic_open receives a hashed, negative dentry on a non-O_CREAT
open, and another client were to race in and create the file before we
issue our OPEN, ceph_fill_trace could end up calling d_splice_alias on
the dentry with the new inode with insufficient locks.

Cc: stable@vger.kernel.org
Reported-by: Al Viro
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-11-05 22:42:44 +0800

30 Oct, 2019

3 commits

1f08529c8 ceph: add missing check in d_revalidate snapdir handling

We should not play with dcache without parent locked...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Al Viro
2019-10-30 05:29:55 +0800
aa8dd8167 ceph: fix RCU case handling in ceph_d_revalidate()

For RCU case ->d_revalidate() is called with rcu_read_lock() and
without pinning the dentry passed to it. Which means that it
can't rely upon ->d_inode remaining stable; that's the reason
for d_inode_rcu(), actually.

Make sure we don't reload ->d_inode there.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Al Viro
2019-10-30 05:29:54 +0800
ea60ed6fc ceph: fix use-after-free in __ceph_remove_cap()

KASAN reports a use-after-free when running xfstest generic/531, with the
following trace:

[ 293.903362] kasan_report+0xe/0x20
[ 293.903365] rb_erase+0x1f/0x790
[ 293.903370] __ceph_remove_cap+0x201/0x370
[ 293.903375] __ceph_remove_caps+0x4b/0x70
[ 293.903380] ceph_evict_inode+0x4e/0x360
[ 293.903386] evict+0x169/0x290
[ 293.903390] __dentry_kill+0x16f/0x250
[ 293.903394] dput+0x1c6/0x440
[ 293.903398] __fput+0x184/0x330
[ 293.903404] task_work_run+0xb9/0xe0
[ 293.903410] exit_to_usermode_loop+0xd3/0xe0
[ 293.903413] do_syscall_64+0x1a0/0x1c0
[ 293.903417] entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens because __ceph_remove_cap() may queue a cap release
(__ceph_queue_cap_release) which can be scheduled before that cap is
removed from the inode list with

rb_erase(&cap->ci_node, &ci->i_caps);

And, when this finally happens, the use-after-free will occur.

This can be fixed by removing the cap from the inode list before being
removed from the session list, and thus eliminating the risk of an UAF.

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Luis Henriques
2019-10-30 05:29:51 +0800

15 Oct, 2019

1 commit

1d3f87233 ceph: just skip unrecognized info in ceph_reply_info_extra

In the future, we're going to want to extend the ceph_reply_info_extra
for create replies. Currently though, the kernel code doesn't accept an
extra blob that is larger than the expected data.

Change the code to skip over any unrecognized fields at the end of the
extra blob, rather than returning -EIO.
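
The tolerant parsing described above can be sketched as follows. The blob layout (a single 8-byte field the parser understands) and the name parse_extra() are assumptions for the example; the actual reply format is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KNOWN_LEN 8   /* hypothetical: number of bytes this parser understands */

/*
 * Parse a blob whose first KNOWN_LEN bytes hold a field we recognize.
 * Returns 0 on success, -1 if the blob is too short. Bytes beyond KNOWN_LEN
 * are assumed to come from a newer sender and are skipped, not rejected.
 */
static int parse_extra(const uint8_t *blob, size_t len, uint64_t *value)
{
    if (len < KNOWN_LEN)
        return -1;                   /* truncated data is still an error */
    memcpy(value, blob, KNOWN_LEN);  /* host-order copy, for illustration only */
    /* anything after KNOWN_LEN is unrecognized: ignore it */
    return 0;
}
```

The asymmetry is deliberate: too little data cannot be interpreted, while extra data can safely be ignored by an older reader.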

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-10-15 23:43:10 +0800

26 Sep, 2019

1 commit

f41def397 Merge tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
"The highlights are:

- automatic recovery of a blacklisted filesystem session (Zheng Yan).
This is disabled by default and can be enabled by mounting with the
new "recover_session=clean" option.

- serialize buffered reads and O_DIRECT writes (Jeff Layton). Care is
taken to avoid serializing O_DIRECT reads and writes with each
other, this is based on the exclusion scheme from NFS.

- handle large osdmaps better in the face of fragmented memory
(myself)

- don't limit what security.* xattrs can be get or set (Jeff Layton).
We were overly restrictive here, unnecessarily preventing things
like file capability sets stored in security.capability from
working.

- allow copy_file_range() within the same inode and across different
filesystems within the same cluster (Luis Henriques)"

* tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client: (41 commits)
ceph: call ceph_mdsc_destroy from destroy_fs_client
libceph: use ceph_kvmalloc() for osdmap arrays
libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc()
ceph: allow object copies across different filesystems in the same cluster
ceph: include ceph_debug.h in cache.c
ceph: move static keyword to the front of declarations
rbd: pull rbd_img_request_create() dout out into the callers
ceph: reconnect connection if session hang in opening state
libceph: drop unused con parameter of calc_target()
ceph: use release_pages() directly
rbd: fix response length parameter for encoded strings
ceph: allow arbitrary security.* xattrs
ceph: only set CEPH_I_SEC_INITED if we got a MAC label
ceph: turn ceph_security_invalidate_secctx into static inline
ceph: add buffered/direct exclusionary locking for reads and writes
libceph: handle OSD op ceph_pagelist_append() errors
ceph: don't return a value from void function
ceph: don't freeze during write page faults
ceph: update the mtime when truncating up
ceph: fix indentation in __get_snap_name()
...

Linus Torvalds
2019-09-26 01:21:13 +0800

16 Sep, 2019

7 commits

3ee5a7015 ceph: call ceph_mdsc_destroy from destroy_fs_client

They're always called in succession.

Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-09-16 18:06:25 +0800
6fd4e6348 ceph: allow object copies across different filesystems in the same cluster

OSDs are able to perform object copies across different pools. Thus,
there's no need to prevent copy_file_range from doing remote copies if the
source and destination superblocks are different. Only return -EXDEV if
they have different fsid (the cluster ID).
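
The relaxed check can be sketched as a comparison of cluster IDs only. The 16-byte struct fsid and check_copy_allowed() are simplified, hypothetical stand-ins for the example.

```c
#include <errno.h>
#include <string.h>

struct fsid {                    /* cluster ID; assumed UUID-sized here */
    unsigned char bytes[16];
};

/*
 * Allow a remote object copy whenever source and destination belong to the
 * same cluster, even if they are different filesystems (superblocks).
 */
static int check_copy_allowed(const struct fsid *src, const struct fsid *dst)
{
    if (memcmp(src->bytes, dst->bytes, sizeof(src->bytes)) != 0)
        return -EXDEV;           /* different clusters: no remote copy */
    return 0;
}
```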

Signed-off-by: Luis Henriques
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Luis Henriques
2019-09-16 18:06:25 +0800
48f930ea6 ceph: include ceph_debug.h in cache.c

Any file that uses dout() should include ceph_debug.h at the top.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2019-09-16 18:06:25 +0800
536cc331a ceph: move static keyword to the front of declarations

Move the static keyword to the front of declarations of
snap_handle_length, handle_length and connected_handle_length,
and resolve the following compiler warnings that can be seen
when building with warnings enabled (W=1):

fs/ceph/export.c:38:2: warning:
‘static’ is not at beginning of declaration [-Wold-style-declaration]

fs/ceph/export.c:88:2: warning:
‘static’ is not at beginning of declaration [-Wold-style-declaration]

fs/ceph/export.c:90:2: warning:
‘static’ is not at beginning of declaration [-Wold-style-declaration]

Signed-off-by: Krzysztof Wilczynski
Signed-off-by: Ilya Dryomov

Krzysztof Wilczynski
2019-09-16 18:06:25 +0800
71a228bc8 ceph: reconnect connection if session hang in opening state

If a client's MDS session is evicted while in the CEPH_MDS_SESSION_OPENING
state, the MDS won't send a session message to the client, and delayed_work
skips sessions in that state, so the session hangs forever.

Allow ceph_con_keepalive to reconnect a session in OPENING to avoid
session hang. Also, ensure that we skip sessions in RESTARTING and
REJECTED states since those states can't be resurrected by issuing
a keepalive.

Link: https://tracker.ceph.com/issues/41551
Signed-off-by: Erqi Chen chenerqi@gmail.com
Reviewed-by: "Yan, Zheng"
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Erqi Chen
2019-09-16 18:06:25 +0800
96ac9158a ceph: use release_pages() directly

release_pages() has been available to modules since Oct, 2010,
when commit 0be8557bcd34 ("fuse: use release_pages()") added
EXPORT_SYMBOL(release_pages). However, this ceph code was still
using a workaround.

Remove the workaround, and call release_pages() directly.

Signed-off-by: John Hubbard
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

John Hubbard
2019-09-16 18:06:25 +0800
b8fe918b0 ceph: allow arbitrary security.* xattrs

Most filesystems don't limit what security.* xattrs can be set or
fetched. I see no reason that we need to limit that on cephfs either.

Drop the special xattr handler for "security." xattrs, and allow the
"other" xattr handler to handle security xattrs as well.

In addition to fixing xfstest generic/093, this allows us to support
per-file capabilities (a'la setcap(8)).

Link: https://tracker.ceph.com/issues/41135
Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-09-16 18:06:25 +0800