Eric Lee / smarc-fsl-linux-kernel

26 Jan, 2017

5 commits

6e9fa67c5 ceph: fix endianness bug in frag_tree_split_cmp

commit fe2ed42517533068ac03eed5630fffafff27eacf upstream.

sparse says:

fs/ceph/inode.c:308:36: warning: incorrect type in argument 1 (different base types)
fs/ceph/inode.c:308:36: expected unsigned int [unsigned] [usertype] a
fs/ceph/inode.c:308:36: got restricted __le32 [usertype] frag
fs/ceph/inode.c:308:46: warning: incorrect type in argument 2 (different base types)
fs/ceph/inode.c:308:46: expected unsigned int [unsigned] [usertype] b
fs/ceph/inode.c:308:46: got restricted __le32 [usertype] frag

We need to convert these values to host-endian before calling the
comparator.
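
A user-space sketch of the idea, assuming a hypothetical struct split_entry whose frag field holds raw little-endian bytes as received from the wire. The helper names are illustrative, and a plain numeric comparison stands in for the kernel's frag comparator:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wire record: 'frag' is stored as little-endian bytes. */
struct split_entry {
	uint8_t frag[4];
};

/* Decode a little-endian 32-bit value into host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* qsort() comparator: convert both keys to host order before comparing. */
static int split_cmp(const void *l, const void *r)
{
	uint32_t a = get_le32(((const struct split_entry *)l)->frag);
	uint32_t b = get_le32(((const struct split_entry *)r)->frag);

	return (a > b) - (a < b);
}
```

Decoding byte by byte keeps the sketch independent of the host's byte order; comparing the raw bytes as a native integer would sort differently on a big-endian machine.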

Fixes: a407846ef7c6 ("ceph: don't assume frag tree splits in mds reply are sorted")
Signed-off-by: Jeff Layton
Reviewed-by: Sage Weil
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2017-01-26 15:24:43 +0800
2e4f2131b ceph: fix endianness of getattr mask in ceph_d_revalidate

commit 1097680d759918ce4a8705381c0ab2ed7bd60cf1 upstream.

sparse says:

fs/ceph/dir.c:1248:50: warning: incorrect type in assignment (different base types)
fs/ceph/dir.c:1248:50: expected restricted __le32 [usertype] mask
fs/ceph/dir.c:1248:50: got int [signed] [assigned] mask

Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
Signed-off-by: Jeff Layton
Reviewed-by: Sage Weil
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2017-01-26 15:24:43 +0800
8934e0696 ceph: fix ceph_get_caps() interruption

commit 6e09d0fb64402cec579f029ca4c7f39f5c48fc60 upstream.

Commit 5c341ee32881 ("ceph: fix scheduler warning due to nested
blocking") causes an infinite loop when the process is interrupted. Fix it.

Signed-off-by: Yan, Zheng
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Yan, Zheng
2017-01-26 15:24:43 +0800
48baa9241 ceph: fix scheduler warning due to nested blocking

commit 5c341ee32881c554727ec14b71ec3e8832f01989 upstream.

try_get_cap_refs can be used as a condition in wait_event* calls.
This is all fine until it has to call __ceph_do_pending_vmtruncate,
which in turn acquires the i_truncate_mutex. This leads to a situation
in which a task's state is !TASK_RUNNING and at the same time it's
trying to acquire a sleeping primitive. In essence, nested sleeping
primitives are being used. This causes the following warning:

WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_event+0x5d/0x110
ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6
CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G O 4.4.20-clouder2 #6
Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0
ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c
0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0
Call Trace:
[] dump_stack+0x67/0x9e
[] warn_slowpath_common+0x86/0xc0
[] warn_slowpath_fmt+0x4c/0x50
[] ? prepare_to_wait_event+0x5d/0x110
[] ? prepare_to_wait_event+0x5d/0x110
[] __might_sleep+0x9f/0xb0
[] mutex_lock+0x20/0x40
[] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph]
[] try_get_cap_refs+0xa2/0x320 [ceph]
[] ceph_get_caps+0x255/0x2b0 [ceph]
[] ? wait_woken+0xb0/0xb0
[] ceph_write_iter+0x2b1/0xde0 [ceph]
[] ? schedule_timeout+0x202/0x260
[] ? kmem_cache_free+0x1ea/0x200
[] ? iput+0x9e/0x230
[] ? __might_sleep+0x52/0xb0
[] ? __might_fault+0x37/0x40
[] ? cp_new_stat+0x153/0x170
[] __vfs_write+0xaa/0xe0
[] vfs_write+0xa9/0x190
[] ? set_close_on_exec+0x31/0x70
[] SyS_write+0x46/0xa0

This happens since wait_event_interruptible can interfere with the
mutex locking code, since they both fiddle with the task state.

Fix the issue by using the newly-added nested blocking infrastructure
in 61ada528dea0 ("sched/wait: Provide infrastructure to deal with
nested blocking")

Link: https://lwn.net/Articles/628628/
Signed-off-by: Nikolay Borisov
Signed-off-by: Yan, Zheng
Signed-off-by: Greg Kroah-Hartman

Nikolay Borisov
2017-01-26 15:24:43 +0800
1f75575ac ceph: fix bad endianness handling in parse_reply_info_extra

commit 6df8c9d80a27cb587f61b4f06b57e248d8bc3f86 upstream.

sparse says:

fs/ceph/mds_client.c:291:23: warning: restricted __le32 degrades to integer
fs/ceph/mds_client.c:293:28: warning: restricted __le32 degrades to integer
fs/ceph/mds_client.c:294:28: warning: restricted __le32 degrades to integer
fs/ceph/mds_client.c:296:28: warning: restricted __le32 degrades to integer

The op value is __le32, so we need to convert it before comparing it.

Signed-off-by: Jeff Layton
Reviewed-by: Sage Weil
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2017-01-26 15:24:40 +0800

08 Dec, 2016

1 commit

c3f4688a0 ceph: don't set req->r_locked_dir in ceph_d_revalidate

This function sets req->r_locked_dir which is supposed to indicate to
ceph_fill_trace that the parent's i_rwsem is locked for write.
Unfortunately, there is no guarantee that the dir will be locked when
d_revalidate is called, so we really don't want ceph_fill_trace to do
any dcache manipulation from this context. Clear req->r_locked_dir since
it's clearly not safe to do that.

What we really want to know with d_revalidate is whether the dentry
still points to the same inode. ceph_fill_trace installs a pointer to
the inode in req->r_target_inode, so we can just compare that to
d_inode(dentry) to see if it's the same one after the lookup.

Also, since we aren't generally interested in the parent here, we can
switch to using a GETATTR to hint that to the MDS, which also means that
we only need to reserve one cap.

Finally, just remove the d_unhashed check. That's really outside the
purview of a filesystem's d_revalidate. If the thing became unhashed
while we're checking it, then that's up to the VFS to handle anyway.

Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
Link: http://tracker.ceph.com/issues/18041
Reported-by: Donatas Abraitis
Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2016-12-08 21:32:16 +0800

11 Nov, 2016

1 commit

8a8d56176 ceph: use default file splice read callback

The splice read/write implementation changed recently. When using
generic_file_splice_read(), an iov_iter with type == ITER_PIPE is
passed to the filesystem's read_iter callback. But ceph_sync_read()
can't serve an ITER_PIPE iov_iter correctly (an ITER_PIPE iov_iter
expects pages from the page cache).

Fixing ceph_sync_read() requires a big patch, so use the default
splice read callback for now.

Signed-off-by: Yan, Zheng
Signed-off-by: Ilya Dryomov

Yan, Zheng
2016-11-11 03:13:04 +0800

18 Oct, 2016

3 commits

5130ccea7 ceph: fix non static symbol warning

Fixes the following sparse warning:

fs/ceph/xattr.c:19:28: warning:
symbol 'ceph_other_xattr_handler' was not declared. Should it be static?

Signed-off-by: Wei Yongjun
Signed-off-by: Ilya Dryomov

Wei Yongjun
2016-10-18 18:30:32 +0800
31ca58781 ceph: fix uninitialized dentry pointer in ceph_real_mount()

fs/ceph/super.c: In function ‘ceph_real_mount’:
fs/ceph/super.c:818: warning: ‘root’ may be used uninitialized in this function

If s_root is already valid, dentry pointer root is never initialized,
and returned by ceph_real_mount(). This will cause a crash later when
the caller dereferences the pointer.
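
The shape of the problem can be sketched in user space. The types and helpers below are hypothetical stand-ins, not the kernel's; the point is only that every path assigns the pointer before it is returned:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the superblock and dentry types. */
struct dentry { int refcount; };
struct super { struct dentry *s_root; };

/* Take a reference and hand the same dentry back. */
static struct dentry *dentry_get(struct dentry *d)
{
	if (d)
		d->refcount++;
	return d;
}

/*
 * Every path assigns 'root' before it is returned. The buggy shape
 * skipped the assignment when s_root was already set.
 */
static struct dentry *real_mount(struct super *sb, struct dentry *fresh)
{
	struct dentry *root;

	if (!sb->s_root) {
		sb->s_root = fresh;		/* first mount: install the new root */
		root = dentry_get(fresh);
	} else {
		root = dentry_get(sb->s_root);	/* already mounted: reuse it */
	}
	return root;
}
```

Calling real_mount() twice on the same struct super returns the same dentry both times, with one reference taken per call.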

Fixes: ce2728aaa82bbeba ("ceph: avoid accessing / when mounting a subpath")
Signed-off-by: Geert Uytterhoeven
Signed-off-by: Yan, Zheng

Geert Uytterhoeven
2016-10-18 18:10:59 +0800
f72f94555 ceph: fix readdir vs fragmentation race

The following sequence of events triggers the race:

- client readdir frag 0* -> gets item 'A'
- MDS merges frag 0* and frag 1*
- client sends readdir request (frag 1*, offset 2, readdir_start 'A')
- MDS replies with items (those after item 'A') in frag *

Link: http://tracker.ceph.com/issues/17286
Signed-off-by: Yan, Zheng

Yan, Zheng
2016-10-18 18:09:58 +0800

16 Oct, 2016

1 commit

0d7718f66 ceph: fix error handling in ceph_read_iter

In case __ceph_do_getattr returns an error and the retry_op in
ceph_read_iter is not READ_INLINE, then it's possible to invoke
__free_page on a page which is NULL, which naturally leads to a crash.
This can happen when, for example, a process waiting on an MDS reply
receives SIGTERM.

Fix this by explicitly checking whether the page is set or not.

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Nikolay Borisov
Reviewed-by: Yan, Zheng
Signed-off-by: Ilya Dryomov

Nikolay Borisov
2016-10-16 05:28:07 +0800

11 Oct, 2016

4 commits

101105b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull more vfs updates from Al Viro:
"->rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning

Linus Torvalds
2016-10-11 11:16:43 +0800
3873691e5 Merge remote-tracking branch 'ovl/rename2' into for-linus

Al Viro
2016-10-11 11:02:51 +0800
97d211670 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs xattr updates from Al Viro:
"xattr stuff from Andreas

This completes the switch to xattr_handler ->get()/->set() from
->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Remove {get,set,remove}xattr inode operations
xattr: Stop calling {get,set,remove}xattr inode operations
vfs: Check for the IOP_XATTR flag in listxattr
xattr: Add __vfs_{get,set,remove}xattr helpers
libfs: Use IOP_XATTR flag for empty directory handling
vfs: Use IOP_XATTR flag for bad-inode handling
vfs: Add IOP_XATTR inode operations flag
vfs: Move xattr_resolve_name to the front of fs/xattr.c
ecryptfs: Switch to generic xattr handlers
sockfs: Get rid of getxattr iop
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
kernfs: Switch to generic xattr handlers
hfs: Switch to generic xattr handlers
jffs2: Remove jffs2_{get,set,remove}xattr macros
xattr: Remove unnecessary NULL attribute name check

Linus Torvalds
2016-10-11 08:11:50 +0800
8dfb790b1 Merge tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client

Pull Ceph updates from Ilya Dryomov:
"The big ticket item here is support for rbd exclusive-lock feature,
with maintenance operations offloaded to userspace (Douglas Fuller,
Mike Christie and myself). Another block device bullet is a series
fixing up layering error paths (myself).

On the filesystem side, we've got patches that improve our handling of
buffered vs dio write races (Neil Brown) and a few assorted fixes from
Zheng. Also included a couple of random cleanups and a minor CRUSH
update"

* tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits)
crush: remove redundant local variable
crush: don't normalize input of crush_ln iteratively
libceph: ceph_build_auth() doesn't need ceph_auth_build_hello()
libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello()
ceph: fix description for rsize and rasize mount options
rbd: use kmalloc_array() in rbd_header_from_disk()
ceph: use list_move instead of list_del/list_add
ceph: handle CEPH_SESSION_REJECT message
ceph: avoid accessing / when mounting a subpath
ceph: fix mandatory flock check
ceph: remove warning when ceph_releasepage() is called on dirty page
ceph: ignore error from invalidate_inode_pages2_range() in direct write
ceph: fix error handling of start_read()
rbd: add rbd_obj_request_error() helper
rbd: img_data requests don't own their page array
rbd: don't call rbd_osd_req_format_read() for !img_data requests
rbd: rework rbd_img_obj_exists_submit() error paths
rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback()
rbd: move bumping img_request refcount into rbd_obj_request_submit()
rbd: mark the original request as done if stat request fails
...

Linus Torvalds
2016-10-11 04:52:05 +0800

08 Oct, 2016

2 commits

e55f1d1d1 Merge remote-tracking branch 'jk/vfs' into work.misc

Al Viro
2016-10-08 23:06:08 +0800
fd50ecadd vfs: Remove {get,set,remove}xattr inode operations

These inode operations are no longer used; remove them.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
2016-10-08 09:48:36 +0800

03 Oct, 2016

7 commits

8cdcc07dd ceph: use list_move instead of list_del/list_add

Using list_move() instead of list_del() + list_add().
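
For readers unfamiliar with the helper, here is a minimal user-space sketch of a circular doubly linked list in the style of the kernel's list_head. It is a simplified re-implementation for illustration, not the kernel header itself:

```c
/* Minimal circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list points at itself in both directions. */
static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink 'entry' from whatever list it is on. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* One call instead of the list_del() + list_add() pair. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	list_add(entry, head);
}
```

The behavior is the same as the two-call sequence; the single helper simply states the intent (move, not delete) at the call site.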

Signed-off-by: Wei Yongjun
Signed-off-by: Ilya Dryomov

Wei Yongjun
2016-10-03 22:13:50 +0800
fcff415c9 ceph: handle CEPH_SESSION_REJECT message

Signed-off-by: Yan, Zheng

Yan, Zheng
2016-10-03 22:13:50 +0800
ce2728aaa ceph: avoid accessing / when mounting a subpath

Accessing / causes a failure if the client has caps that restrict the path.

Signed-off-by: Yan, Zheng

Yan, Zheng
2016-10-03 22:13:50 +0800
db4a63aab ceph: fix mandatory flock check

Signed-off-by: Yan, Zheng

Yan, Zheng
2016-10-03 22:13:49 +0800
e55f1a187 ceph: remove warning when ceph_releasepage() is called on dirty page

If O_DIRECT writes are racing with buffered writes, then
the call to invalidate_inode_pages2_range() can call ceph_releasepage()
on dirty pages.

Most filesystems hold inode_lock() across O_DIRECT writes so they do not
suffer this race, but cephfs deliberately drops the lock, and opens a window
for the race.

This race can be triggered with the generic/036 test from the xfstests
test suite. It doesn't happen every time, but it does happen often.

As the possibility is expected, remove the warning, and instead include
the PageDirty() status in the debug message.

Signed-off-by: NeilBrown
Reviewed-by: Jeff Layton
Reviewed-by: Yan, Zheng

NeilBrown
2016-10-03 22:13:49 +0800
5d7eb1a32 ceph: ignore error from invalidate_inode_pages2_range() in direct write

This call can fail if there are dirty pages. The preceding call to
filemap_write_and_wait_range() will normally remove dirty pages, but
as inode_lock() is not held over calls to ceph_direct_read_write(), it
could race with non-direct writes and pages could be dirtied
immediately after filemap_write_and_wait_range() returns.

If there are dirty pages, they will be removed by the subsequent call
to truncate_inode_pages_range(), so having them here is not a problem.

If the 'ret' value is left holding an error, then in the async IO case
(aio_req is not NULL) the loop that would normally call
ceph_osdc_start_request() will see the error in 'ret' and abort all
requests. This doesn't seem like correct behaviour.

So use separate 'ret2' instead of overloading 'ret'.
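
A small user-space sketch of the pattern, with hypothetical helpers standing in for the invalidate and submit steps. The best-effort result lands in 'ret2', so the variable that the submit loop checks is not polluted by it:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical best-effort step: fails when dirty pages are present. */
static int invalidate_pages(int have_dirty_pages)
{
	return have_dirty_pages ? -EBUSY : 0;
}

/* Hypothetical submit step: fails only for request number 'fail_at'. */
static int submit_request(int i, int fail_at)
{
	return i == fail_at ? -EIO : 0;
}

/* Returns the number of requests started, or a negative errno. */
static int direct_write(int have_dirty_pages, int nr_requests, int fail_at)
{
	int ret = 0;
	int started = 0;
	int ret2 = invalidate_pages(have_dirty_pages);

	if (ret2 < 0)	/* logged only; 'ret' stays clean for the loop below */
		fprintf(stderr, "invalidate failed: %d\n", ret2);

	for (int i = 0; i < nr_requests; i++) {
		if (ret >= 0)
			ret = submit_request(i, fail_at);
		if (ret >= 0)
			started++;
	}
	return ret < 0 ? ret : started;
}
```

With this split, a failed invalidation still lets all requests start, while a genuine submit error continues to stop the loop.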

Signed-off-by: NeilBrown
Reviewed-by: Jeff Layton
Reviewed-by: Yan, Zheng

NeilBrown
2016-10-03 22:13:49 +0800
1afe47856 ceph: fix error handling of start_read()

If start_read() fails to add a page to the page cache or fails to send
the OSD request, it should call put_page() (instead of free_page()) for
the relevant pages.

Besides, start_read() needs to cancel the fscache readpage if it fails
to send the OSD request.

Signed-off-by: Yan, Zheng
Reported-by: Zhi Zhang

Yan, Zheng
2016-10-03 22:13:49 +0800

28 Sep, 2016

1 commit

c2050a454 fs: Replace current_fs_time() with current_time()

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:22 +0800

27 Sep, 2016

2 commits

2773bf00a fs: rename "rename2" i_op to "rename"

Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-09-27 17:03:58 +0800
1cd66c93b fs: make remaining filesystems use .rename2

This is trivial to do:

- add flags argument to foo_rename()
- check if flags is zero
- assign foo_rename() to .rename2 instead of .rename
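
The three steps above can be sketched in user space as follows. The operations table and the string-based signature are simplified, hypothetical stand-ins for the kernel's inode operations:

```c
#include <errno.h>

/* Hypothetical, simplified operations table with both rename slots. */
struct inode_ops {
	int (*rename)(const char *from, const char *to);
	int (*rename2)(const char *from, const char *to, unsigned int flags);
};

/* Step 1: the handler gains a 'flags' argument. */
static int foo_rename(const char *from, const char *to, unsigned int flags)
{
	/* Step 2: no flag is supported, so anything non-zero is rejected. */
	if (flags)
		return -EINVAL;

	(void)from;
	(void)to;
	return 0;	/* the pre-existing rename logic would run here */
}

/* Step 3: the handler is hooked up through .rename2, not .rename. */
static const struct inode_ops foo_ops = {
	.rename2 = foo_rename,
};
```

Rejecting every non-zero flag keeps the old behavior intact while letting the caller see a clear error when it asks for something like RENAME_NOREPLACE.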

This doesn't mean it's impossible to support RENAME_NOREPLACE for these
filesystems, but it is not trivial, like for local filesystems.
RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
for a file to be created on one host while it is overwritten by rename on
another host).

Filesystems converted:

9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.

After this, we can get rid of the duplicate interfaces for rename.

Signed-off-by: Miklos Szeredi
Acked-by: Greg Kroah-Hartman
Acked-by: David Howells [AFS]
Acked-by: Mike Marshall
Cc: Eric Van Hensbergen
Cc: Ilya Dryomov
Cc: Jan Harkes
Cc: Tyler Hicks
Cc: Oleg Drokin
Cc: Trond Myklebust
Cc: Mark Fasheh

Miklos Szeredi
2016-09-27 17:03:58 +0800

22 Sep, 2016

3 commits

31051c85b fs: Give dentry to inode_change_ok() instead of inode

inode_change_ok() will be responsible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better reflect that it also does some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara

Jan Kara
2016-09-22 16:56:19 +0800
fd5472ed4 ceph: Propagate dentry down to inode_change_ok()

To avoid clearing of capabilities or security related extended
attributes too early, inode_change_ok() will need to take dentry instead
of inode. ceph_setattr() has the dentry easily available but
__ceph_setattr() is also called from ceph_set_acl() where dentry is not
easily available. Luckily that call path does not need inode_change_ok()
to be called anyway. So reorganize functions a bit so that
inode_change_ok() is called only from paths where dentry is available.

Reviewed-by: Christoph Hellwig
Acked-by: Jeff Layton
Signed-off-by: Jan Kara

Jan Kara
2016-09-22 16:56:19 +0800
073931017 posix_acl: Clear SGID bit when setting file permissions

When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows bypassing the check in chmod(2). Fix that.
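
A user-space sketch of the rule described above. The two boolean inputs are hypothetical stand-ins for the kernel's group-membership and CAP_FSETID checks; the idea is that the chmod path and the ACL path both go through the same mode update:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Shared mode update for chmod-like and ACL-setting paths.
 * 'in_owning_group' and 'has_fsetid' are hypothetical inputs for
 * this sketch.
 */
static mode_t update_mode(mode_t new_mode, bool in_owning_group,
			  bool has_fsetid)
{
	if (!in_owning_group && !has_fsetid)
		new_mode &= ~(mode_t)S_ISGID;	/* drop setgid for unprivileged callers */
	return new_mode;
}
```

For example, a caller outside the owning group and without CAP_FSETID that requests mode 02755 ends up with 0755, regardless of which path requested it.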

References: CVE-2016-7097
Reviewed-by: Christoph Hellwig
Reviewed-by: Jeff Layton
Signed-off-by: Jan Kara
Signed-off-by: Andreas Gruenbacher

Jan Kara
2016-09-22 16:55:32 +0800

05 Sep, 2016

1 commit

0f5aa88a7 ceph: do not modify fi->frag in need_reset_readdir()

Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
comparison operator with an assignment one.

This looks like a typo which is reported by clang when building the
kernel with some warning flags:

fs/ceph/dir.c:600:22: error: using the result of an assignment as a
condition without parentheses [-Werror,-Wparentheses]
} else if (fi->frag |= fpos_frag(new_pos)) {
~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
fs/ceph/dir.c:600:22: note: place parentheses around the assignment
to silence this warning
} else if (fi->frag |= fpos_frag(new_pos)) {
^
( )
fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
assignment into an inequality comparison
} else if (fi->frag |= fpos_frag(new_pos)) {
^~
!=
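
The difference between the two operators can be shown with a short sketch. The function names and the bare uint32_t are illustrative and are not taken from fs/ceph/dir.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy shape: '|=' modifies *frag and tests the merged value. */
static bool need_reset_buggy(uint32_t *frag, uint32_t new_frag)
{
	return (*frag |= new_frag) != 0;
}

/* Intended shape: a pure comparison that leaves *frag untouched. */
static bool need_reset_fixed(const uint32_t *frag, uint32_t new_frag)
{
	return *frag != new_frag;
}
```

With the stored value 4 and a new value 4, the fixed version reports no reset, while the buggy version reports a reset because 4 | 4 is non-zero. With a new value 1, the buggy version also overwrites the stored value with 5.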

Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
Signed-off-by: Nicolas Iooss
Signed-off-by: Ilya Dryomov

Nicolas Iooss
2016-09-05 20:30:35 +0800

09 Aug, 2016

2 commits

4eacd4cb3 ceph: initialize pathbase in the !dentry case in encode_caps_cb()

pathbase is the base inode; set it to 0 if we've got no path.

Coverity-id: 146348
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder

Ilya Dryomov
2016-08-09 23:26:56 +0800
e4d2b16a4 ceph: fix null pointer dereference in ceph_flush_snaps()

Signed-off-by: Yan, Zheng

Yan, Zheng
2016-08-09 03:41:43 +0800

03 Aug, 2016

1 commit

72b5ac54d Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client

Pull Ceph updates from Ilya Dryomov:
"The highlights are:

- RADOS namespace support in libceph and CephFS (Zheng Yan and
myself). The stopgaps added in 4.5 to deny access to inodes in
namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
bit is now fully supported

- A large rework of the MDS cap flushing code (Zheng Yan)

- Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
overly pessimistic before, bailing at the first sight of LOOKUP_RCU

On top of that we've got a few CephFS bug fixes, a couple of cleanups
and Arnd's workaround for a weird genksyms issue"

* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
ceph: fix symbol versioning for ceph_monc_do_statfs
ceph: Correctly return NXIO errors from ceph_llseek
ceph: Mark the file cache as unreclaimable
ceph: optimize cap flush waiting
ceph: cleanup ceph_flush_snaps()
ceph: kick cap flushes before sending other cap message
ceph: introduce an inode flag to indicates if snapflush is needed
ceph: avoid sending duplicated cap flush message
ceph: unify cap flush and snapcap flush
ceph: use list instead of rbtree to track cap flushes
ceph: update types of some local varibles
ceph: include 'follows' of pending snapflush in cap reconnect message
ceph: update cap reconnect message to version 3
ceph: mount non-default filesystem by name
libceph: fsmap.user subscription support
ceph: handle LOOKUP_RCU in ceph_d_revalidate
ceph: allow dentry_lease_is_valid to work under RCU walk
ceph: clear d_fsinfo pointer under d_lock
ceph: remove ceph_mdsc_lease_release
ceph: don't use ->d_time
...

Linus Torvalds
2016-08-03 07:39:09 +0800

29 Jul, 2016

1 commit

554828ee0 Merge branch 'salted-string-hash'

This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.
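
To illustrate the idea of mixing the salt in first, here is a user-space sketch. It uses an FNV-1a style hash chosen for brevity, which is an assumption of this example and not the algorithm the kernel uses:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative FNV-1a style hash, not the kernel's algorithm. The salt
 * pointer is folded into the initial state, before any string bytes.
 */
static uint64_t salted_hash(const void *salt, const char *s, size_t len)
{
	uint64_t h = 14695981039346656037ULL ^ (uint64_t)(uintptr_t)salt;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 1099511628211ULL;
	}
	return h;
}
```

Because the salt only affects the starting state, a caller that hashes many names under one parent could compute that starting state once. Passing NULL as the salt yields a hash that depends on the string alone.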

* merge branch 'salted-string-hash':
fs/dcache.c: Save one 32-bit multiply in dcache lookup
vfs: make the string hashes salt the hash

Linus Torvalds
2016-07-29 03:26:31 +0800

28 Jul, 2016

5 commits

955818cd5 ceph: Correctly return NXIO errors from ceph_llseek

ceph_llseek does not correctly return NXIO errors because the 'out' path
always returns 'offset'.
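
A simplified user-space sketch of the pattern: the result travels in a dedicated 'ret' variable, so an error assigned on one branch is what the shared exit actually returns. The function name and the whence constants are hypothetical:

```c
#include <errno.h>

#define SK_SET  0	/* hypothetical whence values for this sketch */
#define SK_DATA 1

/* Returns the new position, or a negative errno on failure. */
static long long sketch_llseek(long long offset, int whence, long long size,
			       long long *pos)
{
	long long ret;

	switch (whence) {
	case SK_SET:
		if (offset < 0) {
			ret = -EINVAL;
			goto out;
		}
		break;
	case SK_DATA:
		if (offset < 0 || offset >= size) {
			ret = -ENXIO;	/* no data at or past end of file */
			goto out;
		}
		break;
	default:
		ret = -EINVAL;
		goto out;
	}
	*pos = offset;
	ret = offset;
out:
	return ret;
}
```

If the exit path returned 'offset' instead of 'ret', a SEEK_DATA request past the end of the file would hand back the requested offset rather than -ENXIO.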

Fixes: 06222e491e66 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek")
Signed-off-by: Phil Turnbull
Signed-off-by: Yan, Zheng

Phil Turnbull
2016-07-28 09:00:45 +0800
6b1a9a6c5 ceph: Mark the file cache as unreclaimable

Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so
that it can satisfy its internal needs. Inspecting the code shows that
most of the caches are indeed reclaimable since they are directly
related to the generic inode/dentry shrinkers. However, one of the
caches, the one used to satisfy struct file, is not reclaimable since its
entries are freed only when the last reference to the file is
dropped. If a heavily loaded node opens a lot of files it can
introduce non-trivial discrepancies between memory shown as reclaimable
and what is actually reclaimed when drop_caches is used.

Fix this by removing the reclaimable flag for the file's cache.

Signed-off-by: Nikolay Borisov
Signed-off-by: Yan, Zheng

Nikolay Borisov
2016-07-28 09:00:45 +0800
c8799fc46 ceph: optimize cap flush waiting

Add a 'wake' flag to ceph_cap_flush struct, which indicates if there
is someone waiting for it to finish. When getting flush ack message,
we check the 'wake' flag in corresponding ceph_cap_flush struct to
decide if we should wake up waiters. One corner case is that the
acked cap flush has its 'wake' flag set, but it is not the first one
on the flushing list. We do not wake up waiters in this case; we set
the 'wake' flag of the preceding ceph_cap_flush struct instead.
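
The corner case can be sketched with a small user-space list. The struct layout and function name are hypothetical, and a singly linked list (oldest first) stands in for the kernel's list handling:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending-flush record, oldest first in a singly linked list. */
struct cap_flush {
	bool wake;			/* someone is waiting on this flush */
	struct cap_flush *next;
};

/*
 * Remove the acked flush 'cf' from the list at *head. Returns true when
 * waiters should be woken now; if an older flush is still pending, the
 * 'wake' flag moves to the immediate predecessor instead.
 */
static bool finish_cap_flush(struct cap_flush **head, struct cap_flush *cf)
{
	struct cap_flush *prev = NULL;
	struct cap_flush *it = *head;
	bool wake = cf->wake;

	while (it && it != cf) {	/* find cf and its predecessor */
		prev = it;
		it = it->next;
	}
	if (!it)
		return false;		/* cf is not on the list */

	if (prev) {
		if (wake) {
			prev->wake = true;	/* defer the wakeup */
			wake = false;
		}
		prev->next = cf->next;
	} else {
		*head = cf->next;
	}
	cf->next = NULL;
	return wake;
}
```

Acking a flagged flush in the middle of the list returns false and flags its predecessor; the wakeup then happens once that older flush is acked as the first entry.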

Signed-off-by: Yan, Zheng

Yan, Zheng
2016-07-28 09:00:45 +0800
ed9b430c9 ceph: cleanup ceph_flush_snaps()

This patch divides __ceph_flush_snaps() into two stages. In the first
stage, __ceph_flush_snaps() assigns snapcaps flush TIDs and adds them
to cap flush lists. __ceph_flush_snaps() keeps holding the
i_ceph_lock in this stage, so the inode's auth cap cannot change. In
the second stage, __ceph_flush_snaps() sends flushsnap cap messages.
i_ceph_lock is unlocked before sending each cap message. If auth cap
changes in the middle, __ceph_flush_snaps() just stops. This is OK
because kick_flushing_inode_caps() will re-send flushsnap cap messages
to inode's new auth MDS.

Signed-off-by: Yan, Zheng

Yan, Zheng
2016-07-28 09:00:44 +0800
7bc00fddb ceph: kick cap flushes before sending other cap message

If ceph_check_caps() wants to send a cap message to a recovering MDS,
make sure it kicks cap flushes first.

Signed-off-by: Yan, Zheng

Yan, Zheng
2016-07-28 09:00:44 +0800