Eric Lee / smarc-fsl-linux-kernel

12 Feb, 2020

3 commits

3b20bc2fe ceph: noacl mount option is effectively ignored ... Browse Code »

For the old mount API, the module parameters parseing function will
be called in ceph_mount() and also just after the default posix acl
flag set, so we can control to enable/disable it via the mount option.

But for the new mount API, it will call the module parameters
parseing function before ceph_get_tree(), so the posix acl will always
be enabled.

Fixes: 82995cc6c5ae ("libceph, rbd, ceph: convert to use the new mount API")
Signed-off-by: Xiubo Li
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-02-12 00:04:40 +0800
b27a939e8 ceph: canonicalize server path in place ... Browse Code »

syzbot reported that 4fbc0c711b24 ("ceph: remove the extra slashes in
the server path") had caused a regression where an allocation could be
done under a spinlock -- compare_mount_options() is called by sget_fc()
with sb_lock held.

We don't really need the supplied server path, so canonicalize it
in place and compare it directly. To make this work, the leading
slash is kept around and the logic in ceph_real_mount() to skip it
is restored. CEPH_MSG_CLIENT_SESSION now reports the same (i.e.
canonicalized) path, with the leading slash of course.

Fixes: 4fbc0c711b24 ("ceph: remove the extra slashes in the server path")
Reported-by: syzbot+98704a51af8e3d9425a9@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton

Ilya Dryomov
2020-02-12 00:04:40 +0800
8e4473bb5 ceph: do not execute direct write in parallel if O_APPEND is specified ... Browse Code »

In O_APPEND & O_DIRECT mode, the data from different writers will
be possibly overlapping each other since they take the shared lock.

For example, both Writer1 and Writer2 are in O_APPEND and O_DIRECT
mode:

Writer1 Writer2

shared_lock() shared_lock()
getattr(CAP_SIZE) getattr(CAP_SIZE)
iocb->ki_pos = EOF iocb->ki_pos = EOF
write(data1)
write(data2)
shared_unlock() shared_unlock()

The data2 will overlap the data1 from the same file offset, the
old EOF.

Switch to exclusive lock instead when O_APPEND is specified.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-02-12 00:04:04 +0800

09 Feb, 2020

1 commit

c9d35ee04 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs file system parameter updates from Al Viro:
"Saner fs_parser.c guts and data structures. The system-wide registry
of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
the horror switch() in fs_parse() that would have to grow another case
every time something got added to that system-wide registry.

New syntax types can be added by filesystems easily now, and their
namespace is that of functions - not of system-wide enum members. IOW,
they can be shared or kept private and if some turn out to be widely
useful, we can make them common library helpers, etc., without having
to do anything whatsoever to fs_parse() itself.

And we already get that kind of requests - the thing that finally
pushed me into doing that was "oh, and let's add one for timeouts -
things like 15s or 2h". If some filesystem really wants that, let them
do it. Without somebody having to play gatekeeper for the variants
blessed by direct support in fs_parse(), TYVM.

Quite a bit of boilerplate is gone. And IMO the data structures make a
lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
tmpfs: switch to use of invalfc()
cgroup1: switch to use of errorfc() et.al.
procfs: switch to use of invalfc()
hugetlbfs: switch to use of invalfc()
cramfs: switch to use of errofc() et.al.
gfs2: switch to use of errorfc() et.al.
fuse: switch to use errorfc() et.al.
ceph: use errorfc() and friends instead of spelling the prefix out
prefix-handling analogues of errorf() and friends
turn fs_param_is_... into functions
fs_parse: handle optional arguments sanely
fs_parse: fold fs_parameter_desc/fs_parameter_spec
fs_parser: remove fs_parameter_description name field
add prefix to fs_context->log
ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
new primitive: __fs_parse()
switch rbd and libceph to p_log-based primitives
struct p_log, variants of warnf() et.al. taking that one instead
teach logfc() to handle prefices, give it saner calling conventions
get rid of cg_invalf()
...

Linus Torvalds
2020-02-09 05:26:41 +0800

08 Feb, 2020

6 commits

d53d0f746 ceph: use errorfc() and friends instead of spelling the prefix out ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2020-02-08 03:48:39 +0800
48ce73b1b fs_parse: handle optional arguments sanely ... Browse Code »

Don't bother with "mixed" options that would allow both the
form with and without argument (i.e. both -o foo and -o foo=bar).
Rather than trying to shove both into a single fs_parameter_spec,
allow having with-argument and no-argument specs with the same
name and teach fs_parse to handle that.

There are very few options of that sort, and they are actually
easier to handle that way - callers end up with less postprocessing.

Signed-off-by: Al Viro

Al Viro
2020-02-08 03:48:37 +0800
d7167b149 fs_parse: fold fs_parameter_desc/fs_parameter_spec ... Browse Code »

The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro

Al Viro
2020-02-08 03:48:37 +0800
96cafb9cc fs_parser: remove fs_parameter_description name field ... Browse Code »

Unused now.

Signed-off-by: Eric Sandeen
Acked-by: David Howells
Signed-off-by: Al Viro

Eric Sandeen
2020-02-08 03:48:36 +0800
cc3c0b533 add prefix to fs_context->log ... Browse Code »

... turning it into struct p_log embedded into fs_context. Initialize
the prefix with fs_type->name, turning fs_parse() into a trivial
inline wrapper for __fs_parse().

This makes fs_parameter_description->name completely unused.

Signed-off-by: Al Viro

Al Viro
2020-02-08 03:48:35 +0800
c80c98f0d ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log ... Browse Code »

... and now errorf() et.al. are never called with NULL fs_context,
so we can get rid of conditional in those.

Signed-off-by: Al Viro

Al Viro
2020-02-08 03:48:34 +0800

07 Feb, 2020

2 commits

5eede6252 fold struct fs_parameter_enum into struct constant_table ... Browse Code »

no real difference now

Signed-off-by: Al Viro

Al Viro
2020-02-07 13:12:50 +0800
2710c957a fs_parse: get rid of ->enums ... Browse Code »

Don't do a single array; attach them to fsparam_enum() entry
instead. And don't bother trying to embed the names into those -
it actually loses memory, with no real speedup worth mentioning.

Simplifies validation as well.

Signed-off-by: Al Viro

Al Viro
2020-02-07 13:12:50 +0800

06 Feb, 2020

1 commit

4c46bef2e Merge tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client ... Browse Code »

Pull ceph fixes from Ilya Dryomov:

- a set of patches that fixes various corner cases in mount and umount
code (Xiubo Li). This has to do with choosing an MDS, distinguishing
between laggy and down MDSes and parsing the server path.

- inode initialization fixes (Jeff Layton). The one included here
mostly concerns things like open_by_handle() and there is another one
that will come through Al.

- copy_file_range() now uses the new copy-from2 op (Luis Henriques).
The existing copy-from op turned out to be infeasible for generic
filesystem use; we disable the copy offload if OSDs don't support
copy-from2.

- a patch to link "rbd" and "block" devices together in sysfs (Hannes
Reinecke)

... and a smattering of cleanups from Xiubo, Jeff and Chengguang.

* tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client: (25 commits)
rbd: set the 'device' link in sysfs
ceph: move net/ceph/ceph_fs.c to fs/ceph/util.c
ceph: print name of xattr in __ceph_{get,set}xattr() douts
ceph: print r_direct_hash in hex in __choose_mds() dout
ceph: use copy-from2 op in copy_file_range
ceph: close holes in structs ceph_mds_session and ceph_mds_request
rbd: work around -Wuninitialized warning
ceph: allocate the correct amount of extra bytes for the session features
ceph: rename get_session and switch to use ceph_get_mds_session
ceph: remove the extra slashes in the server path
ceph: add possible_max_rank and make the code more readable
ceph: print dentry offset in hex and fix xattr_version type
ceph: only touch the caps which have the subset mask requested
ceph: don't clear I_NEW until inode metadata is fully populated
ceph: retry the same mds later after the new session is opened
ceph: check availability of mds cluster on mount after wait timeout
ceph: keep the session state until it is released
ceph: add __send_request helper
ceph: ensure we have a new cap before continuing in fill_inode
ceph: drop unused ttl_from parameter from fill_inode
...

Linus Torvalds
2020-02-06 20:21:01 +0800

05 Feb, 2020

1 commit

bddea11b1 Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs timestamp updates from Al Viro:
"More 64bit timestamp work"

* 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kernfs: don't bother with timestamp truncation
fs: Do not overload update_time
fs: Delete timespec64_trunc()
fs: ubifs: Eliminate timespec64_trunc() usage
fs: ceph: Delete timespec64_trunc() usage
fs: cifs: Delete usage of timespec64_trunc
fs: fat: Eliminate timespec64_trunc() usage
utimes: Clamp the timestamps in notify_change()

Linus Torvalds
2020-02-05 13:02:42 +0800

27 Jan, 2020

23 commits

24604f7e2 ceph: move net/ceph/ceph_fs.c to fs/ceph/util.c ... Browse Code »

All of these functions are only called from CephFS, so move them into
ceph.ko, and drop the exports.

Signed-off-by: Jeff Layton
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Jeff Layton
2020-01-27 23:53:40 +0800
d36e0b620 ceph: print name of xattr in __ceph_{get,set}xattr() douts ... Browse Code »

Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2020-01-27 23:53:40 +0800
3c802092d ceph: print r_direct_hash in hex in __choose_mds() dout ... Browse Code »

It's hard to read, especially when it is:

ceph: __choose_mds 00000000b7bc9c15 is_hash=1 (-271041095) mode 0

At the same time, switch to __func__ to get rid of the checkpatch
warning.

Signed-off-by: Xiubo Li
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:40 +0800
78beb0ff2 ceph: use copy-from2 op in copy_file_range ... Browse Code »

Instead of using the copy-from operation, switch copy_file_range to the
new copy-from2 operation, which allows to send the truncate_seq and
truncate_size parameters.

If an OSD does not support the copy-from2 operation it will return
-EOPNOTSUPP. In that case, the kernel client will stop trying to do
remote object copies for this fs client and will always use the generic
VFS copy_file_range.

Signed-off-by: Luis Henriques
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Luis Henriques
2020-01-27 23:53:40 +0800
045100cd7 ceph: close holes in structs ceph_mds_session and ceph_mds_request ... Browse Code »

Move s_ref up to plug a 4 byte hole, which plugs another.
Move r_kref to shave 8 bytes off per request on x86_64.

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2020-01-27 23:53:40 +0800
9ba1e2245 ceph: allocate the correct amount of extra bytes for the session features ... Browse Code »

The total bytes may potentially be larger than 8.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:40 +0800
5b3248c67 ceph: rename get_session and switch to use ceph_get_mds_session ... Browse Code »

Just in case the session's refcount reach 0 and is releasing, and
if we get the session without checking it, we may encounter kernel
crash.

Rename get_session to ceph_get_mds_session and make it global.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:40 +0800
4fbc0c711 ceph: remove the extra slashes in the server path ... Browse Code »

It's possible to pass the mount helper a server path that has more
than one contiguous slash character. For example:

$ mount -t ceph 192.168.195.165:40176:/// /mnt/cephfs/

In the MDS server side the extra slashes of the server path will be
treated as snap dir, and then we can get the following debug logs:

ceph: mount opening path //
ceph: open_root_inode opening '//'
ceph: fill_trace 0000000059b8a3bc is_dentry 0 is_target 1
ceph: alloc_inode 00000000dc4ca00b
ceph: get_inode created new inode 00000000dc4ca00b 1.ffffffffffffffff ino 1
ceph: get_inode on 1=1.ffffffffffffffff got 00000000dc4ca00b

And then when creating any new file or directory under the mount
point, we can hit the following BUG_ON in ceph_fill_trace():

BUG_ON(ceph_snap(dir) != dvino.snap);

Have the client ignore the extra slashes in the server path when
mounting. This will also canonicalize the path, so that identical mounts
can be consilidated.

1) "//mydir1///mydir//"
2) "/mydir1/mydir"
3) "/mydir1/mydir/"

Regardless of the internal treatment of these paths, the kernel still
stores the original string including the leading '/' for presentation
to userland.

URL: https://tracker.ceph.com/issues/42771
Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:40 +0800
b38c9eb47 ceph: add possible_max_rank and make the code more readable ... Browse Code »

The m_num_mds here is actually the number for MDSs which are in
up:active status, and it will be duplicated to m_num_active_mds,
so remove it.

Add possible_max_rank to the mdsmap struct and this will be
the correctly possible largest rank boundary.

Remove the special case for one mds in __mdsmap_get_random_mds(),
because the validate mds rank may not always be 0.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:40 +0800
0eb308531 ceph: print dentry offset in hex and fix xattr_version type ... Browse Code »

In the debug logs about the di->offset or ctx->pos it is in hex
format, but some others are using the dec format. It is a little
hard to read.

For the xattr version, it is u64 type, using a shorter type may
truncate it.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
9f8b72b3a ceph: only touch the caps which have the subset mask requested ... Browse Code »

For the caps having no any subset mask requested we shouldn't touch
them.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
893e456b2 ceph: don't clear I_NEW until inode metadata is fully populated ... Browse Code »

Currently, we could have an open-by-handle (or NFS server) call
into the filesystem and start working with an inode before it's
properly filled out.

Don't clear I_NEW until we have filled out the inode, and discard it
properly if that fails. Note that we occasionally take an extra
reference to the inode to ensure that we don't put the last reference in
discard_new_inode, but rather leave it for ceph_async_iput.

Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2020-01-27 23:53:39 +0800
c4853e977 ceph: retry the same mds later after the new session is opened ... Browse Code »

If max_mds > 1 and a request is submitted that chooses a random mds
rank, and the relating session is not opened yet, the request will wait
until the session has been opened and resend again.

Every time the request goes through __do_request, it will release the
req->session first and choose a random one again, which may be a
completely different rank than the one it just waited on.

In the worst case, it will open all the mds sessions one by one just
before the request can be successfully sent out.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
97820058f ceph: check availability of mds cluster on mount after wait timeout ... Browse Code »

If all the MDS daemons are down for some reason, then the first mount
attempt will fail with EIO after the mount request times out. A mount
attempt will also fail with EIO if all of the MDS's are laggy.

This patch changes the code to return -EHOSTUNREACH in these situations
and adds a pr_info error message to help the admin determine the cause.

URL: https://tracker.ceph.com/issues/4386
Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
4d681c2f9 ceph: keep the session state until it is released ... Browse Code »

When reconnecting the session but if it is denied by the MDS due
to client was in blacklist or something else, kclient will receive
a session close reply, and we will never see the important log:

"ceph: mds%d reconnect denied"

And with the confusing log:

"ceph: handle_session mds0 close 0000000085804730 state ??? seq 0"

Let's keep the session state until its memories is released.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
9cf54563b ceph: add __send_request helper ... Browse Code »

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
9a6bed4fe ceph: ensure we have a new cap before continuing in fill_inode ... Browse Code »

If the caller passes in a NULL cap_reservation, and we can't allocate
one then ensure that we fail gracefully.

Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2020-01-27 23:53:39 +0800
57c219948 ceph: drop unused ttl_from parameter from fill_inode ... Browse Code »

Signed-off-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Jeff Layton
2020-01-27 23:53:39 +0800
07edc0571 ceph: fix possible long time wait during umount ... Browse Code »

During umount, if there has no any unsafe request in the mdsc and
some requests still in-flight and not got reply yet, and if the
rest requets are all safe ones, after that even all of them in mdsc
are unregistered, the umount must wait until after mount_timeout
seconds anyway.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
5d47648fe ceph: only choose one MDS who is in up:active state without laggy ... Browse Code »

Even the MDS is in up:active state, but it also maybe laggy. Here
will skip the laggy MDSs.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
4d7ace02b ceph: fix mdsmap cluster available check based on laggy number ... Browse Code »

In case the max_mds > 1 in MDS cluster and there is no any standby
MDS and all the max_mds MDSs are in up:active state, if one of the
up:active MDSs is dead, the m->m_num_laggy in kclient will be 1.
Then the mount will fail without considering other healthy MDSs.

There manybe some MDSs still "in" the cluster but not in up:active
state, we will ignore them. Only when all the up:active MDSs in
the cluster are laggy will treat the cluster as not be available.

In case decreasing the max_mds, the cluster will not stop the extra
up:active MDSs immediately and there will be a latency. During it
the up:active MDS number will be larger than the max_mds, so later
the m_info memories will 100% be reallocated.

Here will pick out the up:active MDSs as the m_num_mds and allocate
the needed memories once.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2020-01-27 23:53:39 +0800
d80865bff ceph: remove unnecessary assignment in ceph_pre_init_acls() ... Browse Code »

ceph_pagelist_encode_string() will not fail in reserved case,
also, we do not check err code here, so remove unnecessary
assignment.

Signed-off-by: Chengguang Xu
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Chengguang Xu
2020-01-27 23:53:39 +0800
8f5ac172a ceph: delete redundant douts in con_get/put() ... Browse Code »

We print session's refcount in debug message inside
ceph_put_mds_session() and get_session(), so we don't have to
print it in con_get()/__ceph_lookup_mds_session()/con_put().

Signed-off-by: Chengguang Xu
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Chengguang Xu
2020-01-27 23:53:39 +0800

22 Jan, 2020

1 commit

9c1c2b35f ceph: hold extra reference to r_parent over life of request ... Browse Code »

Currently, we just assume that it will stick around by virtue of the
submitter's reference, but later patches will allow the syscall to
return early and we can't rely on that reference at that point.

While I'm not aware of any reports of it, Xiubo pointed out that this
may fix a use-after-free. If the wait for a reply times out or is
canceled via signal, and then the reply comes in after the syscall
returns, the client can end up trying to access r_parent without a
reference.

Take an extra reference to the inode when setting r_parent and release
it when releasing the request.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2020-01-22 02:02:37 +0800

10 Dec, 2019

2 commits

da08e1e1d ceph: add more debug info when decoding mdsmap ... Browse Code »

Show the laggy state.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2019-12-10 03:55:10 +0800
bd84fbcb3 ceph: switch to global cap helper ... Browse Code »

__ceph_is_any_caps is a duplicate helper.

Signed-off-by: Xiubo Li
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Xiubo Li
2019-12-10 03:55:10 +0800