26 Jan, 2017

12 commits

  • commit 7af3ea189a9a13f090de51c97f676215dabc1205 upstream.

    This is useless and more importantly not allowed on the writeback path,
    because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
    can recurse back into the filesystem:

    kworker/9:3 D ffff92303f318180 0 20732 2 0x00000080
    Workqueue: ceph-msgr ceph_con_workfn [libceph]
    ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
    ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
    00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
    Call Trace:
    [] ? schedule+0x31/0x80
    [] ? schedule_preempt_disabled+0xa/0x10
    [] ? __mutex_lock_slowpath+0xb4/0x130
    [] ? mutex_lock+0x1b/0x30
    [] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
    [] ? move_active_pages_to_lru+0x125/0x270
    [] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
    [] ? __list_lru_walk_one.isra.3+0x33/0x120
    [] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
    [] ? super_cache_scan+0x17e/0x190
    [] ? shrink_slab.part.38+0x1e3/0x3d0
    [] ? shrink_node+0x10a/0x320
    [] ? do_try_to_free_pages+0xf4/0x350
    [] ? try_to_free_pages+0xea/0x1b0
    [] ? __alloc_pages_nodemask+0x61d/0xe60
    [] ? cache_grow_begin+0x9d/0x560
    [] ? fallback_alloc+0x148/0x1c0
    [] ? __crypto_alloc_tfm+0x37/0x130
    [] ? __kmalloc+0x1eb/0x580
    [] ? crush_choose_firstn+0x3eb/0x470 [libceph]
    [] ? __crypto_alloc_tfm+0x37/0x130
    [] ? crypto_spawn_tfm+0x39/0x60
    [] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
    [] ? __crypto_alloc_tfm+0xcc/0x130
    [] ? crypto_skcipher_init_tfm+0x113/0x180
    [] ? crypto_create_tfm+0x43/0xb0
    [] ? crypto_larval_lookup+0x150/0x150
    [] ? crypto_alloc_tfm+0x72/0x120
    [] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
    [] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
    [] ? release_sock+0x40/0x90
    [] ? tcp_recvmsg+0x4b4/0xae0
    [] ? ceph_encrypt2+0x54/0xc0 [libceph]
    [] ? ceph_x_encrypt+0x5d/0x90 [libceph]
    [] ? calcu_signature+0x5f/0x90 [libceph]
    [] ? ceph_x_sign_message+0x35/0x50 [libceph]
    [] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
    [] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
    [] ? queue_con_delay+0x33/0xd0 [libceph]
    [] ? __submit_request+0x20d/0x2f0 [libceph]
    [] ? ceph_osdc_start_request+0x28/0x30 [libceph]
    [] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
    [] ? process_one_work+0x160/0x410
    [] ? worker_thread+0x4d/0x480
    [] ? process_one_work+0x410/0x410
    [] ? kthread+0xcd/0xf0
    [] ? ret_from_fork+0x1f/0x40
    [] ? kthread_create_on_node+0x190/0x190

    Allocating the cipher along with the key fixes the issue - as long as
    the key doesn't change, a single cipher context can be used concurrently
    in multiple requests.

    We still can't take that GFP_KERNEL allocation though. Both
    ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
    GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.

    Reported-by: Lucas Stach
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 6db2304aabb070261ad34923bfd83c43dfb000e3 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 124f930b8cbc4ac11236e6eb1c5f008318864588 upstream.

    ... otherwise the crypto stack will align it for us with a GFP_ATOMIC
    allocation and a memcpy() -- see skcipher_walk_first().

    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
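    The alignment the crypto stack enforces is the usual round-up-to-mask
    arithmetic. A minimal userland sketch (align_up is a hypothetical helper,
    not a kernel function) shows why a misaligned caller buffer forces a
    bounce copy while an aligned one does not:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Round an address up to the cipher's alignment requirement; alignmask
     * is alignment - 1, e.g. 15 for a 16-byte alignment requirement. */
    static uintptr_t align_up(uintptr_t addr, uintptr_t alignmask)
    {
        return (addr + alignmask) & ~alignmask;
    }

    int main(void)
    {
        /* Already aligned: the buffer can be used as-is. */
        assert(align_up(0x1000, 15) == 0x1000);
        /* Misaligned: the stack would allocate and memcpy() to 0x1010. */
        assert(align_up(0x1001, 15) == 0x1010);
        return 0;
    }
    ```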
     
  • commit 2b1e1a7cd0a615d57455567a549f9965023321b5 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit e15fd0a11db00fc7f470a9fc804657ec3f6d04a5 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit d03857c63bb036edff0aa7a107276360173aca4e upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 4eb4517ce7c9c573b6c823de403aeccb40018cfc upstream.

    - replace an ad-hoc array with a struct
    - rename to calc_signature() for consistency

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 7882a26d2e2e520099e2961d5e2e870f8e4172dc upstream.

    It's going to be used as a temporary buffer for in-place en/decryption
    with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
    Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit a45f795c65b479b4ba107b6ccde29b896d51ee98 upstream.

    Starting with 4.9, kernel stacks may be vmalloced and therefore not
    guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
    option is enabled by default on x86. This makes it invalid to use
    on-stack buffers with the crypto scatterlist API, as sg_set_buf()
    expects a logical address and won't work with vmalloced addresses.

    There isn't a different (e.g. kvec-based) crypto API we could switch
    net/ceph/crypto.c to and the current scatterlist.h API isn't getting
    updated to accommodate this use case. Allocating a new header and
    padding for each operation is a non-starter, so do the en/decryption
    in-place on a single pre-assembled (header + data + padding) heap
    buffer. This is explicitly supported by the crypto API:

    "... the caller may provide the same scatter/gather list for the
    plaintext and cipher text. After the completion of the cipher
    operation, the plaintext data is replaced with the ciphertext data
    in case of an encryption and vice versa for a decryption."

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
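    The in-place scheme can be illustrated with a toy stand-in cipher (a
    simple XOR keystream, purely for demonstration - the real code uses the
    kernel skcipher API): header, data, and padding are assembled into one
    heap buffer, and the same buffer serves as both plaintext and ciphertext:

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Toy stand-in for a cipher that supports in-place operation. */
    static void toy_crypt(unsigned char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            buf[i] ^= 0x5a;
    }

    int main(void)
    {
        const char hdr[] = "HDR", data[] = "payload";
        size_t pad = 16 - (sizeof(hdr) + sizeof(data)) % 16;

        /* Pre-assemble header + data + padding into a single heap buffer. */
        size_t len = sizeof(hdr) + sizeof(data) + pad;
        unsigned char *buf = calloc(1, len);
        memcpy(buf, hdr, sizeof(hdr));
        memcpy(buf + sizeof(hdr), data, sizeof(data));

        /* Encrypt in place: plaintext is replaced by ciphertext... */
        toy_crypt(buf, len);
        assert(memcmp(buf, hdr, sizeof(hdr)) != 0);

        /* ...and decrypting in place restores the plaintext. */
        toy_crypt(buf, len);
        assert(memcmp(buf, hdr, sizeof(hdr)) == 0);
        free(buf);
        return 0;
    }
    ```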
     
  • commit 55d9cc834f933698fc864f0d36f3cca533d30a8d upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 462e650451c577d15eeb4d883d70fa9e4e529fad upstream.

    Since commit 0a990e709356 ("ceph: clean up service ticket decoding"),
    th->session_key isn't assigned until everything is decoded.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 36721ece1e84a25130c4befb930509b3f96de020 upstream.

    Pass what's going to be encrypted - that's msg_b, not ticket_blob.
    ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
    the maxlen calculation, but makes it a bit clearer.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     

09 Jan, 2017

1 commit

  • commit 5c056fdc5b474329037f2aa18401bd73033e0ce0 upstream.

    After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
    the client gets back a ceph_x_authorize_reply, which it is supposed to
    verify to ensure the authenticity and protect against replay attacks.
    The code for doing this is there (ceph_x_verify_authorizer_reply(),
    ceph_auth_verify_authorizer_reply() + plumbing), but it is never
    invoked by the messenger.

    AFAICT this goes back to 2009, when ceph authentication protocols
    support was added to the kernel client in 4e7a5dcd1bba ("ceph:
    negotiate authentication protocol; implement AUTH_NONE protocol").

    The second param of ceph_connection_operations::verify_authorizer_reply
    is unused all the way down. Pass 0 to facilitate backporting, and kill
    it in the next commit.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     

11 Nov, 2016

2 commits

  • osdc->last_linger_id is a counter for lreq->linger_id, which is used
    for watch cookies. Starting with a large integer should ease the task
    of telling apart kernel and userspace clients.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
    If your data pool was pool 0, ceph_file_layout_from_legacy()
    transformed that to -1 unconditionally, which broke upgrades.
    We only want to do that for a fully zeroed ceph_file_layout,
    so that it still maps to a file_layout_t. If any fields
    are set, though, we trust fl_pgpool to be a valid pool.

    Fixes: 7627151ea30bc ("libceph: define new ceph_file_layout structure")
    Link: http://tracker.ceph.com/issues/17825
    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
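    The rule - treat pool 0 as valid unless the whole legacy layout is
    zeroed - can be sketched as follows (struct and function names here are
    illustrative, not the actual kernel definitions):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the legacy wire layout. */
    struct legacy_layout {
        uint32_t stripe_unit;
        uint32_t stripe_count;
        uint32_t object_size;
        uint32_t fl_pgpool;
    };

    /* Map the pool to -1 only when the legacy struct is fully zeroed;
     * otherwise trust fl_pgpool, even when it is pool 0. */
    static int64_t pool_from_legacy(const struct legacy_layout *l)
    {
        if (!l->stripe_unit && !l->stripe_count &&
            !l->object_size && !l->fl_pgpool)
            return -1;                 /* unset layout -> no pool */
        return (int64_t)l->fl_pgpool;  /* pool 0 is a valid pool */
    }

    int main(void)
    {
        struct legacy_layout unset = { 0, 0, 0, 0 };
        struct legacy_layout pool0 = { 65536, 1, 65536, 0 };

        assert(pool_from_legacy(&unset) == -1);
        assert(pool_from_legacy(&pool0) == 0); /* upgrade case preserved */
        return 0;
    }
    ```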
     

06 Oct, 2016

2 commits

  • Remove the extra x1 variable; it's just a temporary placeholder that
    clutters the code unnecessarily.

    Reflects ceph.git commit 0d19408d91dd747340d70287b4ef9efd89e95c6b.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Use __builtin_clz() supported by GCC and Clang to figure out
    how many bits we should shift instead of shifting by a bit
    in a loop until the value gets normalized. Improves performance
    of this function by up to 3x in worst-case scenario and overall
    straw2 performance by ~10%.

    Reflects ceph.git commit 110de33ca497d94fc4737e5154d3fe781fa84a0a.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
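    The two approaches compute the same shift count. A minimal sketch
    (function names are illustrative, not the actual crush code):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Old approach: shift left one bit at a time until the value is
     * normalized, i.e. the top bit is set. */
    static int shift_count_loop(uint32_t v)
    {
        int count = 0;
        while (!(v & 0x80000000U)) {
            v <<= 1;
            count++;
        }
        return count;
    }

    /* New approach: count leading zeros directly; one instruction on
     * most CPUs (e.g. BSR/LZCNT on x86, CLZ on ARM). */
    static int shift_count_clz(uint32_t v)
    {
        return __builtin_clz(v);
    }

    int main(void)
    {
        uint32_t samples[] = { 1, 2, 0x1234, 0x7fffffffU, 0x80000000U };
        for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
            assert(shift_count_loop(samples[i]) == shift_count_clz(samples[i]));
        return 0;
    }
    ```

    Note that __builtin_clz() is undefined for 0, so the caller must know
    the input is nonzero - which a loop that terminates also requires.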
     

22 Jul, 2016

1 commit

  • Currently, osd_weight and osd_state fields are updated in the encoding
    order. This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

    Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down). After
    applying new_up_client, osd_state is changed to EXISTS | UP. Carrying
    on with the new_state update, we flip EXISTS and leave osd6 in a weird
    "!EXISTS but UP" state. A non-existent OSD is considered down by the
    mapping code:

    2087        for (i = 0; i < pg->pg_temp.len; i++) {
    2088                if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
    2089                        if (ceph_can_shift_osds(pi))
    2090                                continue;
    2091
    2092                        temp->osds[temp->size++] = CRUSH_ITEM_NONE;

    and so requests get directed to the second OSD in the set instead of
    the first, resulting in OSD-side errors like:

    [WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

    and hung rbds on the client:

    [ 493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
    [ 493.566805] rbd: rbd0: result -6 xferred 400000
    [ 493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

    The fix is to decouple application from the decoding and:
    - apply new_weight first
    - apply new_state before new_up_client
    - twiddle osd_state flags if marking in
    - clear out some of the state if osd is destroyed

    Fixes: http://tracker.ceph.com/issues/14901

    Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd
    Cc: stable@vger.kernel.org # 3.15+
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Josh Durgin

    Ilya Dryomov
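    The flag arithmetic behind the bug can be reproduced in isolation
    (a sketch; flag values are assumed from Ceph's rados.h, and the helper
    names are illustrative):

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define CEPH_OSD_EXISTS (1u << 0)
    #define CEPH_OSD_UP     (1u << 1)

    /* Buggy encoding order: new_up_client applied before new_state. */
    static uint32_t apply_in_encoding_order(uint32_t state)
    {
        state |= CEPH_OSD_EXISTS | CEPH_OSD_UP; /* new_up_client: osd=6 */
        state ^= CEPH_OSD_EXISTS;               /* new_state: xorstate=EXISTS */
        return state;
    }

    /* Fixed order: new_state applied before new_up_client. */
    static uint32_t apply_in_fixed_order(uint32_t state)
    {
        state ^= CEPH_OSD_EXISTS;               /* osd6 ceases to exist... */
        state |= CEPH_OSD_EXISTS | CEPH_OSD_UP; /* ...then comes back up */
        return state;
    }

    int main(void)
    {
        uint32_t down = CEPH_OSD_EXISTS; /* osd6 exists but is down */

        /* Encoding order leaves the bogus "!EXISTS but UP" state... */
        assert(apply_in_encoding_order(down) == CEPH_OSD_UP);
        /* ...while the fixed order yields an existing, up osd. */
        assert(apply_in_fixed_order(down) == (CEPH_OSD_EXISTS | CEPH_OSD_UP));
        return 0;
    }
    ```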