26 Jan, 2017

12 commits

  • commit 7af3ea189a9a13f090de51c97f676215dabc1205 upstream.

    This is useless and more importantly not allowed on the writeback path,
    because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
    can recurse back into the filesystem:

    kworker/9:3 D ffff92303f318180 0 20732 2 0x00000080
    Workqueue: ceph-msgr ceph_con_workfn [libceph]
    ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
    ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
    00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
    Call Trace:
    [] ? schedule+0x31/0x80
    [] ? schedule_preempt_disabled+0xa/0x10
    [] ? __mutex_lock_slowpath+0xb4/0x130
    [] ? mutex_lock+0x1b/0x30
    [] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
    [] ? move_active_pages_to_lru+0x125/0x270
    [] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
    [] ? __list_lru_walk_one.isra.3+0x33/0x120
    [] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
    [] ? super_cache_scan+0x17e/0x190
    [] ? shrink_slab.part.38+0x1e3/0x3d0
    [] ? shrink_node+0x10a/0x320
    [] ? do_try_to_free_pages+0xf4/0x350
    [] ? try_to_free_pages+0xea/0x1b0
    [] ? __alloc_pages_nodemask+0x61d/0xe60
    [] ? cache_grow_begin+0x9d/0x560
    [] ? fallback_alloc+0x148/0x1c0
    [] ? __crypto_alloc_tfm+0x37/0x130
    [] ? __kmalloc+0x1eb/0x580
    [] ? crush_choose_firstn+0x3eb/0x470 [libceph]
    [] ? __crypto_alloc_tfm+0x37/0x130
    [] ? crypto_spawn_tfm+0x39/0x60
    [] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
    [] ? __crypto_alloc_tfm+0xcc/0x130
    [] ? crypto_skcipher_init_tfm+0x113/0x180
    [] ? crypto_create_tfm+0x43/0xb0
    [] ? crypto_larval_lookup+0x150/0x150
    [] ? crypto_alloc_tfm+0x72/0x120
    [] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
    [] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
    [] ? release_sock+0x40/0x90
    [] ? tcp_recvmsg+0x4b4/0xae0
    [] ? ceph_encrypt2+0x54/0xc0 [libceph]
    [] ? ceph_x_encrypt+0x5d/0x90 [libceph]
    [] ? calcu_signature+0x5f/0x90 [libceph]
    [] ? ceph_x_sign_message+0x35/0x50 [libceph]
    [] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
    [] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
    [] ? queue_con_delay+0x33/0xd0 [libceph]
    [] ? __submit_request+0x20d/0x2f0 [libceph]
    [] ? ceph_osdc_start_request+0x28/0x30 [libceph]
    [] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
    [] ? process_one_work+0x160/0x410
    [] ? worker_thread+0x4d/0x480
    [] ? process_one_work+0x410/0x410
    [] ? kthread+0xcd/0xf0
    [] ? ret_from_fork+0x1f/0x40
    [] ? kthread_create_on_node+0x190/0x190

    Allocating the cipher along with the key fixes the issue - as long as
    the key doesn't change, a single cipher context can be used concurrently
    in multiple requests.

    We still can't take that GFP_KERNEL allocation though. Both
    ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
    GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.

    Reported-by: Lucas Stach
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 6db2304aabb070261ad34923bfd83c43dfb000e3 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 124f930b8cbc4ac11236e6eb1c5f008318864588 upstream.

    ... otherwise the crypto stack will align it for us with a GFP_ATOMIC
    allocation and a memcpy() -- see skcipher_walk_first().

    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
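    The alignment the crypto stack enforces is the usual round-up-to-mask
    arithmetic. A minimal userland sketch (align_up is a hypothetical helper,
    not a kernel function) shows why a misaligned caller buffer forces a
    bounce copy while an aligned one does not:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Round an address up to the cipher's alignment requirement; alignmask
     * is alignment - 1, e.g. 15 for a 16-byte alignment requirement. */
    static uintptr_t align_up(uintptr_t addr, uintptr_t alignmask)
    {
        return (addr + alignmask) & ~alignmask;
    }

    int main(void)
    {
        /* Already aligned: the buffer can be used as-is. */
        assert(align_up(0x1000, 15) == 0x1000);
        /* Misaligned: the stack would allocate and memcpy() to 0x1010. */
        assert(align_up(0x1001, 15) == 0x1010);
        return 0;
    }
    ```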
     
  • commit 2b1e1a7cd0a615d57455567a549f9965023321b5 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit e15fd0a11db00fc7f470a9fc804657ec3f6d04a5 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit d03857c63bb036edff0aa7a107276360173aca4e upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 4eb4517ce7c9c573b6c823de403aeccb40018cfc upstream.

    - replace an ad-hoc array with a struct
    - rename to calc_signature() for consistency

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 7882a26d2e2e520099e2961d5e2e870f8e4172dc upstream.

    It's going to be used as a temporary buffer for in-place en/decryption
    with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
    Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit a45f795c65b479b4ba107b6ccde29b896d51ee98 upstream.

    Starting with 4.9, kernel stacks may be vmalloced and therefore not
    guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
    option is enabled by default on x86. This makes it invalid to use
    on-stack buffers with the crypto scatterlist API, as sg_set_buf()
    expects a logical address and won't work with vmalloced addresses.

    There isn't a different (e.g. kvec-based) crypto API we could switch
    net/ceph/crypto.c to and the current scatterlist.h API isn't getting
    updated to accommodate this use case. Allocating a new header and
    padding for each operation is a non-starter, so do the en/decryption
    in-place on a single pre-assembled (header + data + padding) heap
    buffer. This is explicitly supported by the crypto API:

    "... the caller may provide the same scatter/gather list for the
    plaintext and cipher text. After the completion of the cipher
    operation, the plaintext data is replaced with the ciphertext data
    in case of an encryption and vice versa for a decryption."

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
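    The in-place scheme can be illustrated with a toy stand-in cipher (a
    simple XOR keystream, purely for demonstration - the real code uses the
    kernel skcipher API): header, data, and padding are assembled into one
    heap buffer, and the same buffer serves as both plaintext and ciphertext:

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Toy stand-in for a cipher that supports in-place operation. */
    static void toy_crypt(unsigned char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            buf[i] ^= 0x5a;
    }

    int main(void)
    {
        const char hdr[] = "HDR", data[] = "payload";
        size_t pad = 16 - (sizeof(hdr) + sizeof(data)) % 16;

        /* Pre-assemble header + data + padding into a single heap buffer. */
        size_t len = sizeof(hdr) + sizeof(data) + pad;
        unsigned char *buf = calloc(1, len);
        memcpy(buf, hdr, sizeof(hdr));
        memcpy(buf + sizeof(hdr), data, sizeof(data));

        /* Encrypt in place: plaintext is replaced by ciphertext... */
        toy_crypt(buf, len);
        assert(memcmp(buf, hdr, sizeof(hdr)) != 0);

        /* ...and decrypting in place restores the plaintext. */
        toy_crypt(buf, len);
        assert(memcmp(buf, hdr, sizeof(hdr)) == 0);
        free(buf);
        return 0;
    }
    ```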
     
  • commit 55d9cc834f933698fc864f0d36f3cca533d30a8d upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 462e650451c577d15eeb4d883d70fa9e4e529fad upstream.

    Since commit 0a990e709356 ("ceph: clean up service ticket decoding"),
    th->session_key isn't assigned until everything is decoded.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 36721ece1e84a25130c4befb930509b3f96de020 upstream.

    Pass what's going to be encrypted - that's msg_b, not ticket_blob.
    ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
    the maxlen calculation, but makes it a bit clearer.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     

09 Jan, 2017

1 commit

  • commit 5c056fdc5b474329037f2aa18401bd73033e0ce0 upstream.

    After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
    the client gets back a ceph_x_authorize_reply, which it is supposed to
    verify to ensure the authenticity and protect against replay attacks.
    The code for doing this is there (ceph_x_verify_authorizer_reply(),
    ceph_auth_verify_authorizer_reply() + plumbing), but it is never
    invoked by the messenger.

    AFAICT this goes back to 2009, when ceph authentication protocols
    support was added to the kernel client in 4e7a5dcd1bba ("ceph:
    negotiate authentication protocol; implement AUTH_NONE protocol").

    The second param of ceph_connection_operations::verify_authorizer_reply
    is unused all the way down. Pass 0 to facilitate backporting, and kill
    it in the next commit.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     

11 Nov, 2016

2 commits

  • osdc->last_linger_id is a counter for lreq->linger_id, which is used
    for watch cookies. Starting with a large integer should ease the task
    of telling apart kernel and userspace clients.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
    If your data pool was pool 0, ceph_file_layout_from_legacy()
    transformed that to -1 unconditionally, which broke upgrades.
    We only want to do that for a fully zeroed ceph_file_layout,
    so that it still maps to a file_layout_t. If any fields
    are set, though, we trust fl_pgpool to be a valid pool.

    Fixes: 7627151ea30bc ("libceph: define new ceph_file_layout structure")
    Link: http://tracker.ceph.com/issues/17825
    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
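    The rule - treat pool 0 as valid unless the whole legacy layout is
    zeroed - can be sketched as follows (struct and function names here are
    illustrative, not the actual kernel definitions):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the legacy wire layout. */
    struct legacy_layout {
        uint32_t stripe_unit;
        uint32_t stripe_count;
        uint32_t object_size;
        uint32_t fl_pgpool;
    };

    /* Map the pool to -1 only when the legacy struct is fully zeroed;
     * otherwise trust fl_pgpool, even when it is pool 0. */
    static int64_t pool_from_legacy(const struct legacy_layout *l)
    {
        if (!l->stripe_unit && !l->stripe_count &&
            !l->object_size && !l->fl_pgpool)
            return -1;                 /* unset layout -> no pool */
        return (int64_t)l->fl_pgpool;  /* pool 0 is a valid pool */
    }

    int main(void)
    {
        struct legacy_layout unset = { 0, 0, 0, 0 };
        struct legacy_layout pool0 = { 65536, 1, 65536, 0 };

        assert(pool_from_legacy(&unset) == -1);
        assert(pool_from_legacy(&pool0) == 0); /* upgrade case preserved */
        return 0;
    }
    ```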
     

06 Oct, 2016

2 commits

  • Remove the extra x1 variable; it's just a temporary placeholder that
    clutters the code unnecessarily.

    Reflects ceph.git commit 0d19408d91dd747340d70287b4ef9efd89e95c6b.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Use __builtin_clz() supported by GCC and Clang to figure out
    how many bits we should shift instead of shifting by a bit
    in a loop until the value gets normalized. Improves performance
    of this function by up to 3x in worst-case scenario and overall
    straw2 performance by ~10%.

    Reflects ceph.git commit 110de33ca497d94fc4737e5154d3fe781fa84a0a.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
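    The two approaches compute the same shift count. A minimal sketch
    (function names are illustrative, not the actual crush code):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Old approach: shift left one bit at a time until the value is
     * normalized, i.e. the top bit is set. */
    static int shift_count_loop(uint32_t v)
    {
        int count = 0;
        while (!(v & 0x80000000U)) {
            v <<= 1;
            count++;
        }
        return count;
    }

    /* New approach: count leading zeros directly; one instruction on
     * most CPUs (e.g. BSR/LZCNT on x86, CLZ on ARM). */
    static int shift_count_clz(uint32_t v)
    {
        return __builtin_clz(v);
    }

    int main(void)
    {
        uint32_t samples[] = { 1, 2, 0x1234, 0x7fffffffU, 0x80000000U };
        for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
            assert(shift_count_loop(samples[i]) == shift_count_clz(samples[i]));
        return 0;
    }
    ```

    Note that __builtin_clz() is undefined for 0, so the caller must know
    the input is nonzero - which a loop that terminates also requires.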
     

22 Jul, 2016

1 commit

  • Currently, osd_weight and osd_state fields are updated in the encoding
    order. This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

    Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down). After
    applying new_up_client, osd_state is changed to EXISTS | UP. Carrying
    on with the new_state update, we flip EXISTS and leave osd6 in a weird
    "!EXISTS but UP" state. A non-existent OSD is considered down by the
    mapping code:

    2087        for (i = 0; i < pg->pg_temp.len; i++) {
    2088                if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
    2089                        if (ceph_can_shift_osds(pi))
    2090                                continue;
    2091
    2092                        temp->osds[temp->size++] = CRUSH_ITEM_NONE;

    and so requests get directed to the second OSD in the set instead of
    the first, resulting in OSD-side errors like:

    [WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

    and hung rbds on the client:

    [ 493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
    [ 493.566805] rbd: rbd0: result -6 xferred 400000
    [ 493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

    The fix is to decouple application from the decoding and:
    - apply new_weight first
    - apply new_state before new_up_client
    - twiddle osd_state flags if marking in
    - clear out some of the state if osd is destroyed

    Fixes: http://tracker.ceph.com/issues/14901

    Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd
    Cc: stable@vger.kernel.org # 3.15+
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Josh Durgin

    Ilya Dryomov
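    The flag arithmetic behind the bug can be reproduced in isolation
    (a sketch; flag values are assumed from Ceph's rados.h, and the helper
    names are illustrative):

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define CEPH_OSD_EXISTS (1u << 0)
    #define CEPH_OSD_UP     (1u << 1)

    /* Buggy encoding order: new_up_client applied before new_state. */
    static uint32_t apply_in_encoding_order(uint32_t state)
    {
        state |= CEPH_OSD_EXISTS | CEPH_OSD_UP; /* new_up_client: osd=6 */
        state ^= CEPH_OSD_EXISTS;               /* new_state: xorstate=EXISTS */
        return state;
    }

    /* Fixed order: new_state applied before new_up_client. */
    static uint32_t apply_in_fixed_order(uint32_t state)
    {
        state ^= CEPH_OSD_EXISTS;               /* osd6 ceases to exist... */
        state |= CEPH_OSD_EXISTS | CEPH_OSD_UP; /* ...then comes back up */
        return state;
    }

    int main(void)
    {
        uint32_t down = CEPH_OSD_EXISTS; /* osd6 exists but is down */

        /* Encoding order leaves the bogus "!EXISTS but UP" state... */
        assert(apply_in_encoding_order(down) == CEPH_OSD_UP);
        /* ...while the fixed order yields an existing, up osd. */
        assert(apply_in_fixed_order(down) == (CEPH_OSD_EXISTS | CEPH_OSD_UP));
        return 0;
    }
    ```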