03 Oct, 2016

1 commit


28 Jul, 2016

6 commits


26 May, 2016

4 commits


05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion as to whether
    PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in the page cache are special. They
    are not.

    The changes are pretty straightforward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files. I've called spatch for them manually.

    The only adjustment after coccinelle is reverting the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds
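The shift rules in the semantic patch only make sense because the two macro families were already identical. A minimal userspace sketch (all macros reimplemented here purely for illustration, assuming a 4 KiB page) shows why rewriting `E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)` to plain `E` is behavior-preserving:

```c
#include <assert.h>

/* Hypothetical userspace reimplementation of the kernel macros involved,
 * for illustration only; values mirror a 4 KiB page. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

/* Since the page cache uses plain pages, PAGE_CACHE_SHIFT == PAGE_SHIFT,
 * so the shift amount in E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) is zero and
 * the expression is just E -- exactly what the coccinelle rules emit. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT
```

The same reasoning covers the `>>` rule, and the remaining rules are pure renames of equal-valued macros and refcounting helpers.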


05 Mar, 2016

1 commit


03 Nov, 2015

1 commit


09 Sep, 2015

1 commit


25 Jun, 2015

8 commits

  • Previously our dcache readdir code relied on child dentries in a
    directory dentry's d_subdir list being sorted by dentry offset in
    descending order. When adding dentries to the dcache, if a dentry
    already exists, our readdir code moves it to the head of the directory
    dentry's d_subdir list. This design relies on dcache internals.
    Al Viro suggested using ncpfs's approach: keep an array of pointers
    to dentries in the page cache of the directory inode. The validity of
    those pointers is indicated by the directory inode's complete and
    ordered flags. When a dentry gets pruned, we clear the directory
    inode's complete flag in the d_prune() callback. Before moving a
    dentry to another directory, we clear the ordered flag for both the
    old and new directories.

    Signed-off-by: Yan, Zheng
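A minimal userspace sketch of the flag scheme described above (struct layout and names are hypothetical, not the actual fs/ceph code): a directory caches an array of child pointers, and readdir may consult it only while both validity flags hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dentry { const char *name; };

/* Hypothetical directory state: a cached array of child pointers whose
 * validity is gated by two flags. */
struct dir_inode {
    struct dentry *cache[16];  /* in the real code, kept in the dir's page cache */
    size_t n;
    bool complete;             /* every child is present in the array */
    bool ordered;              /* array order matches readdir order */
};

/* d_prune() callback: a cached dentry is going away, so the array is no
 * longer a complete view of the directory. */
static void on_prune(struct dir_inode *dir)
{
    dir->complete = false;
}

/* Cross-directory rename: ordering can no longer be trusted in either dir. */
static void on_move(struct dir_inode *from, struct dir_inode *to)
{
    from->ordered = false;
    to->ordered = false;
}

/* Readdir may use the cached array only while both flags hold; otherwise it
 * falls back to asking the MDS. */
static bool can_use_cache(const struct dir_inode *dir)
{
    return dir->complete && dir->ordered;
}
```

The point of the design is that validity is tracked explicitly in the inode rather than inferred from d_subdir ordering, so it no longer depends on dcache internals.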

  • So we know the TID of the oldest pending cap flush. A later patch will
    send this information to the MDS, so that the MDS can trim its
    completed cap flush list.

    Tracking pending cap flushes globally also simplifies the syncfs code.

    Signed-off-by: Yan, Zheng

  • Previously we did not track an accurate TID for flushing caps. When
    the MDS fails over, we have no choice but to re-send all flushing caps
    with a new TID. This can cause problems because the MDS may have
    already flushed some caps and issued the same caps to another client.
    The re-sent cap flush has a new TID, which makes the MDS unable to
    detect whether it has already processed the cap flush.

    This patch adds code to track pending cap flushes accurately. When
    re-sending a cap flush is needed, we use its original flush TID.

    Signed-off-by: Yan, Zheng
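The idea in the two commits above can be sketched in a few lines of userspace C (all names and structures hypothetical): a TID is assigned once when a flush is first sent, a resend reuses that TID so the MDS can deduplicate, and the oldest pending TID is what gets reported to the MDS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FLUSHES 8

/* One pending cap flush, remembered with the TID it was first sent under. */
struct cap_flush { uint64_t tid; int caps; };

struct client {
    uint64_t next_tid;
    struct cap_flush pending[MAX_FLUSHES];  /* no bounds handling: sketch only */
    size_t n;
};

static uint64_t start_flush(struct client *c, int caps)
{
    struct cap_flush *f = &c->pending[c->n++];
    f->tid = ++c->next_tid;  /* TID assigned exactly once, at first send */
    f->caps = caps;
    return f->tid;
}

/* Re-send after MDS failover: the original TID is reused, so the MDS can
 * recognize a flush it has already processed. */
static uint64_t resend_flush(const struct cap_flush *f)
{
    return f->tid;
}

/* The oldest pending TID -- the value the later patch sends to the MDS so
 * it can trim its completed-flush list. */
static uint64_t oldest_pending_tid(const struct client *c)
{
    uint64_t oldest = 0;
    for (size_t i = 0; i < c->n; i++)
        if (oldest == 0 || c->pending[i].tid < oldest)
            oldest = c->pending[i].tid;
    return oldest;
}
```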

  • There are currently three libceph-level timeouts that the user can
    specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of
    these are in seconds and no checking is done on user input: negative
    values are accepted, we multiply them all by HZ which may or may not
    overflow, arbitrarily large jiffies then get added together, etc.

    There is also a bug in the way mount_timeout=0 is handled. It's
    supposed to mean "infinite timeout", but that's not how wait.h APIs
    treat it and so __ceph_open_session() for example will busy loop
    without much chance of being interrupted if none of ceph-mons are
    there.

    Fix all this by verifying user input, storing timeouts capped by
    msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
    helper for all user-specified waits to handle infinite timeouts
    correctly.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder
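The two fixes can be illustrated with a small userspace sketch. The helper names and HZ value here are illustrative only; the real code stores timeouts via msecs_to_jiffies() and uses the new ceph_timeout_jiffies() helper.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative stand-ins for kernel constants. */
#define HZ 250
#define MAX_JIFFY_OFFSET ((LONG_MAX >> 1) - 1)

/* Convert user-supplied seconds to jiffies, saturating instead of
 * overflowing -- the problem with the old "multiply by HZ" approach. */
static long secs_to_jiffies_capped(unsigned long secs)
{
    if (secs >= (unsigned long)(MAX_JIFFY_OFFSET / HZ))
        return MAX_JIFFY_OFFSET;
    return (long)(secs * HZ);
}

/* Mirrors the role of ceph_timeout_jiffies(): a stored 0 means "infinite",
 * expressed as the longest timeout the wait APIs accept, so waiters sleep
 * interruptibly instead of busy-looping. */
static long timeout_jiffies(long timeout)
{
    return timeout ? timeout : MAX_JIFFY_OFFSET;
}
```

Capping at conversion time means every later addition of jiffies values stays within range, and routing all waits through one helper gives mount_timeout=0 a single, correct meaning.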

  • setfilelock requests can block for a long time, which can prevent the
    client from advancing its oldest TID.

    Signed-off-by: Yan, Zheng

  • Previously we pre-allocated cap release messages for each cap. This
    wastes a lot of memory when there is a large number of caps. This
    patch makes the code not pre-allocate cap release messages. Instead,
    we add the corresponding ceph_cap struct to a list when releasing a
    cap. Later, when flushing cap releases is needed, we allocate the cap
    release messages dynamically.

    Signed-off-by: Yan, Zheng
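A userspace sketch of the deferred scheme described above (struct layout and function names are hypothetical): releasing a cap only links it onto a per-session list, and messages are built later, when the list is flushed.

```c
#include <assert.h>
#include <stddef.h>

struct ceph_cap { int id; struct ceph_cap *next; };

struct session {
    struct ceph_cap *release_list;  /* caps waiting to be released */
    int num_releases;
};

/* Releasing a cap allocates nothing: the cap struct itself is queued. */
static void queue_cap_release(struct session *s, struct ceph_cap *cap)
{
    cap->next = s->release_list;
    s->release_list = cap;
    s->num_releases++;
}

/* Flush time: drain the list, building release messages on demand.
 * Returns how many caps were drained. */
static int flush_cap_releases(struct session *s)
{
    int sent = 0;
    while (s->release_list) {
        struct ceph_cap *cap = s->release_list;
        s->release_list = cap->next;
        s->num_releases--;
        /* ...allocate/append to a release message here (omitted)... */
        sent++;
    }
    return sent;
}
```

The memory win is that idle caps cost nothing: allocation happens only for caps actually being released, and only at flush time.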

  • Signed-off-by: Yan, Zheng

  • Signed-off-by: Yan, Zheng


19 Feb, 2015

2 commits


18 Dec, 2014

3 commits


15 Oct, 2014

2 commits


06 Jun, 2014

1 commit

  • We recently modified the client/MDS protocol to include a timestamp in the
    client request. This allows ctime updates to follow the client's clock
    in most cases, which avoids subtle problems when clocks are out of sync
    and timestamps are updated sometimes by the MDS clock (for most requests)
    and sometimes by the client clock (for cap writeback).

    Signed-off-by: Sage Weil


05 Apr, 2014

1 commit


21 Jan, 2014

1 commit


24 Nov, 2013

1 commit


01 Mar, 2013

1 commit

  • Pull Ceph updates from Sage Weil:
    "A few groups of patches here. Alex has been hard at work improving
    the RBD code, laying groundwork for understanding the new formats and
    doing layering. Most of the infrastructure is now in place for the
    final bits that will come with the next window.

    There are a few changes to the data layout. Jim Schutt's patch fixes
    some non-ideal CRUSH behavior, and a set of patches from me updates
    the client to speak a newer version of the protocol and implement an
    improved hashing strategy across storage nodes (when the server side
    supports it too).

    A pair of patches from Sam Lang fix the atomicity of open+create
    operations. Several patches from Yan, Zheng fix various mds/client
    issues that turned up during multi-mds torture tests.

    A final set of patches expose file layouts via virtual xattrs, and
    allow the policies to be set on directories via xattrs as well
    (avoiding the awkward ioctl interface and providing a consistent
    interface for both kernel mount and ceph-fuse users)."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
    libceph: add support for HASHPSPOOL pool flag
    libceph: update osd request/reply encoding
    libceph: calculate placement based on the internal data types
    ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
    ceph: update "ceph_features.h"
    libceph: decode into cpu-native ceph_pg type
    libceph: rename ceph_pg -> ceph_pg_v1
    rbd: pass length, not op for osd completions
    rbd: move rbd_osd_trivial_callback()
    libceph: use a do..while loop in con_work()
    libceph: use a flag to indicate a fault has occurred
    libceph: separate non-locked fault handling
    libceph: encapsulate connection backoff
    libceph: eliminate sparse warnings
    ceph: eliminate sparse warnings in fs code
    rbd: eliminate sparse warnings
    libceph: define connection flag helpers
    rbd: normalize dout() calls
    rbd: barriers are hard
    rbd: ignore zero-length requests
    ...

    Linus Torvalds
     

12 Feb, 2013

1 commit


18 Jan, 2013

1 commit

  • The mds now sends back a created inode if the create request
    performed the create. If the file already existed, no inode is
    returned in the reply. This allows ceph to set the created flag
    in atomic_open so that permissions are properly checked in the case
    that the file wasn't created by the create call to the mds.

    To ensure compatibility with previous kernels, a feature for sending
    back the inode in the create reply was added, so that the mds will
    only send back the inode if the client indicates it supports the
    feature.

    Signed-off-by: Sam Lang
    Reviewed-by: Sage Weil
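The feature negotiation described above amounts to a simple gate on the server side. In this sketch the flag name and bit value are purely illustrative; the real client advertises a protocol feature bit telling the MDS it can decode an inode in the create reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit -- illustrative value only. */
#define FEATURE_REPLY_CREATE_INODE (1ULL << 27)

struct create_reply { bool has_inode; };

/* Include the created inode only when the create actually happened AND the
 * client advertised that it can decode it. */
static void fill_create_reply(uint64_t client_features, bool did_create,
                              struct create_reply *r)
{
    r->has_inode = did_create &&
                   (client_features & FEATURE_REPLY_CREATE_INODE) != 0;
}
```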


17 May, 2012

1 commit

  • The definitions for the ceph_mds_session and ceph_osd both contain
    five fields related only to "authorizers." Encapsulate those fields
    into their own struct type, allowing for better isolation in some
    upcoming patches.

    Fix the #includes in "linux/ceph/osd_client.h" to lay out their more
    complete canonical path.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil


03 Feb, 2012

1 commit

  • Lockdep was reporting a possible circular lock dependency in
    dentry_lease_is_valid(). That function needs to sample the
    session's s_cap_gen and s_cap_ttl fields coherently, but needs
    to do so while holding a dentry lock. The s_cap_lock field was
    being used to protect the two fields, but that can't be taken while
    holding a lock on a dentry within the session.

    In most cases, the s_cap_gen and s_cap_ttl fields only get operated
    on separately. But in three cases they need to be updated together.
    Implement a new lock to protect the spots where updating both fields
    atomically is required.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
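A userspace sketch of the fix (field and function names hypothetical, with a pthread mutex standing in for the kernel spinlock): one dedicated lock guards exactly the pair of fields that must be read and written coherently, and because it protects nothing else, it can safely nest inside a dentry lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct session {
    pthread_mutex_t gen_ttl_lock;  /* guards cap_gen + cap_ttl only */
    uint32_t cap_gen;
    unsigned long cap_ttl;
};

/* One of the few places that must update both fields together. */
static void renew_caps(struct session *s, unsigned long new_ttl)
{
    pthread_mutex_lock(&s->gen_ttl_lock);
    s->cap_gen++;
    s->cap_ttl = new_ttl;
    pthread_mutex_unlock(&s->gen_ttl_lock);
}

/* Safe to call while holding a dentry lock: the dedicated lock has no
 * other dependencies, so no circular ordering can arise. */
static void sample_gen_ttl(struct session *s, uint32_t *gen,
                           unsigned long *ttl)
{
    pthread_mutex_lock(&s->gen_ttl_lock);
    *gen = s->cap_gen;
    *ttl = s->cap_ttl;
    pthread_mutex_unlock(&s->gen_ttl_lock);
}
```

The design choice here is to shrink the lock's footprint rather than rework the wider lock ordering: a lock that covers only these two fields cannot participate in the cycle lockdep found.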


08 Dec, 2011

1 commit

  • We have been using i_lock to protect all kinds of data structures in the
    ceph_inode_info struct, including lists of inodes that we need to iterate
    over while avoiding races with inode destruction. That requires grabbing
    a reference to the inode with the list lock protected, but igrab() now
    takes i_lock to check the inode flags.

    Changing the list lock ordering would be a painful process.

    However, using a ceph-specific i_ceph_lock in the ceph inode instead of
    i_lock is a simple mechanical change and avoids the ordering constraints
    imposed by igrab().

    Reported-by: Amon Ott
    Signed-off-by: Sage Weil
