15 Dec, 2016

2 commits

  • Kill the wrapper and rename __finish_request() to finish_request().

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • r_safe_completion is currently, and has always been, signaled only if
    on-disk ack was requested. It's there for fsync and syncfs, which wait
    for in-flight writes to flush - all data write requests set ONDISK.

    However, the pool perm check code introduced in 4.2 sends a write
    request with only ACK set. An unfortunately timed syncfs can then hang
    forever: r_safe_completion won't be signaled because only an unsafe
    reply was requested.

    We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
    that is somewhat incomplete and yet another special case. Instead,
    rename this completion to r_done_completion and always signal it when
    the OSD client is done with the request, whether unsafe, safe, or
    error. This is a bit cleaner and helps with the cancellation code.

    Reported-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

13 Dec, 2016

16 commits

  • Include linux/crush/mapper.h in crush/mapper.c to get the prototypes of
    crush_find_rule and crush_do_rule which are defined there. This fixes
    the following GCC warnings when building with 'W=1':

    net/ceph/crush/mapper.c:40:5: warning: no previous prototype for ‘crush_find_rule’ [-Wmissing-prototypes]
    net/ceph/crush/mapper.c:793:5: warning: no previous prototype for ‘crush_do_rule’ [-Wmissing-prototypes]

    Signed-off-by: Tobias Klauser
    [idryomov@gmail.com: corresponding !__KERNEL__ include]
    Signed-off-by: Ilya Dryomov

    Tobias Klauser
     
  • ->get_authorizer(), ->verify_authorizer_reply(), ->sign_message() and
    ->check_message_signature() shouldn't be doing anything with or on the
    connection (like closing it or sending messages).

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • The length of the reply is protocol-dependent - for cephx it's
    ceph_x_authorize_reply. Nothing sensible can be passed from the
    messenger layer anyway.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
    the client gets back a ceph_x_authorize_reply, which it is supposed to
    verify to ensure the authenticity and protect against replay attacks.
    The code for doing this is there (ceph_x_verify_authorizer_reply(),
    ceph_auth_verify_authorizer_reply() + plumbing), but it is never
    invoked by the messenger.

    AFAICT this goes back to 2009, when ceph authentication protocols
    support was added to the kernel client in 4e7a5dcd1bba ("ceph:
    negotiate authentication protocol; implement AUTH_NONE protocol").

    The second param of ceph_connection_operations::verify_authorizer_reply
    is unused all the way down. Pass 0 to facilitate backporting, and kill
    it in the next commit.

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • It's called during initial setup, when everything should be allocated
    with GFP_KERNEL.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • This is useless and more importantly not allowed on the writeback path,
    because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
    can recurse back into the filesystem:

    kworker/9:3 D ffff92303f318180 0 20732 2 0x00000080
    Workqueue: ceph-msgr ceph_con_workfn [libceph]
    ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
    ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
    00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
    Call Trace:
    [] ? schedule+0x31/0x80
    [] ? schedule_preempt_disabled+0xa/0x10
    [] ? __mutex_lock_slowpath+0xb4/0x130
    [] ? mutex_lock+0x1b/0x30
    [] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
    [] ? move_active_pages_to_lru+0x125/0x270
    [] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
    [] ? __list_lru_walk_one.isra.3+0x33/0x120
    [] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
    [] ? super_cache_scan+0x17e/0x190
    [] ? shrink_slab.part.38+0x1e3/0x3d0
    [] ? shrink_node+0x10a/0x320
    [] ? do_try_to_free_pages+0xf4/0x350
    [] ? try_to_free_pages+0xea/0x1b0
    [] ? __alloc_pages_nodemask+0x61d/0xe60
    [] ? cache_grow_begin+0x9d/0x560
    [] ? fallback_alloc+0x148/0x1c0
    [] ? __crypto_alloc_tfm+0x37/0x130
    [] ? __kmalloc+0x1eb/0x580
    [] ? crush_choose_firstn+0x3eb/0x470 [libceph]
    [] ? __crypto_alloc_tfm+0x37/0x130
    [] ? crypto_spawn_tfm+0x39/0x60
    [] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
    [] ? __crypto_alloc_tfm+0xcc/0x130
    [] ? crypto_skcipher_init_tfm+0x113/0x180
    [] ? crypto_create_tfm+0x43/0xb0
    [] ? crypto_larval_lookup+0x150/0x150
    [] ? crypto_alloc_tfm+0x72/0x120
    [] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
    [] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
    [] ? release_sock+0x40/0x90
    [] ? tcp_recvmsg+0x4b4/0xae0
    [] ? ceph_encrypt2+0x54/0xc0 [libceph]
    [] ? ceph_x_encrypt+0x5d/0x90 [libceph]
    [] ? calcu_signature+0x5f/0x90 [libceph]
    [] ? ceph_x_sign_message+0x35/0x50 [libceph]
    [] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
    [] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
    [] ? queue_con_delay+0x33/0xd0 [libceph]
    [] ? __submit_request+0x20d/0x2f0 [libceph]
    [] ? ceph_osdc_start_request+0x28/0x30 [libceph]
    [] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
    [] ? process_one_work+0x160/0x410
    [] ? worker_thread+0x4d/0x480
    [] ? process_one_work+0x410/0x410
    [] ? kthread+0xcd/0xf0
    [] ? ret_from_fork+0x1f/0x40
    [] ? kthread_create_on_node+0x190/0x190

    Allocating the cipher along with the key fixes the issue - as long as the
    key doesn't change, a single cipher context can be used concurrently in
    multiple requests.

    We still can't take that GFP_KERNEL allocation though. Both
    ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
    GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.

    Reported-by: Lucas Stach
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • - replace an ad-hoc array with a struct
    - rename to calc_signature() for consistency

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • It's going to be used as a temporary buffer for in-place en/decryption
    with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
    Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Starting with 4.9, kernel stacks may be vmalloced and therefore not
    guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
    option is enabled by default on x86. This makes it invalid to use
    on-stack buffers with the crypto scatterlist API, as sg_set_buf()
    expects a logical address and won't work with vmalloced addresses.

    There isn't a different (e.g. kvec-based) crypto API we could switch
    net/ceph/crypto.c to and the current scatterlist.h API isn't getting
    updated to accommodate this use case. Allocating a new header and
    padding for each operation is a non-starter, so do the en/decryption
    in-place on a single pre-assembled (header + data + padding) heap
    buffer. This is explicitly supported by the crypto API:

    "... the caller may provide the same scatter/gather list for the
    plaintext and cipher text. After the completion of the cipher
    operation, the plaintext data is replaced with the ciphertext data
    in case of an encryption and vice versa for a decryption."

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Since commit 0a990e709356 ("ceph: clean up service ticket decoding"),
    th->session_key isn't assigned until everything is decoded.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Pass what's going to be encrypted - that's msg_b, not ticket_blob.
    ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
    the maxlen calculation, but makes it a bit clearer.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
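
The in-place en/decryption on a single pre-assembled (header + data + padding) buffer, and the upper-bound length calculation mentioned in the commits above, can be sketched in userspace as follows. Everything here is an assumption made for illustration: the header layout, the 16-byte block size, the helper names, and above all the XOR "cipher", which merely stands in for a real block cipher and is not what net/ceph/crypto.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16	/* assumed cipher block size */

/* Hypothetical header that is prepended to the payload before encryption. */
struct enc_header {
	uint8_t struct_v;
	uint64_t magic;
} __attribute__((packed));

/* Upper bound on the buffer size: header + data + one full block of padding. */
static size_t enc_buflen(size_t data_len)
{
	return sizeof(struct enc_header) + data_len + BLOCK_SIZE;
}

/* Toy stand-in for a cipher: transforms buf in place (XOR is its own inverse). */
static void toy_crypt_inplace(uint8_t *buf, size_t len, uint8_t key)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= key;
}

/*
 * Assemble header + data + padding in buf, then encrypt in place.
 * Returns the ciphertext length, which never exceeds enc_buflen(data_len).
 */
static size_t encrypt_inplace(uint8_t *buf, size_t buf_len,
			      const void *data, size_t data_len, uint8_t key)
{
	struct enc_header hdr = { .struct_v = 1, .magic = 0x1122334455667788ULL };
	size_t plain_len = sizeof(hdr) + data_len;
	size_t pad = BLOCK_SIZE - (plain_len & (BLOCK_SIZE - 1));	/* 1..16 */
	size_t total = plain_len + pad;

	assert(total <= buf_len);
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), data, data_len);
	memset(buf + plain_len, (int)pad, pad);	/* PKCS#7-style padding bytes */
	toy_crypt_inplace(buf, total, key);
	return total;
}
```

Because the same buffer holds the plaintext before the call and the ciphertext after it, no separate header or padding buffers are needed per operation.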
     

11 Nov, 2016

2 commits

  • osdc->last_linger_id is a counter for lreq->linger_id, which is used
    for watch cookies. Starting with a large integer should ease the task
    of telling apart kernel and userspace clients.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • If your data pool was pool 0, ceph_file_layout_from_legacy()
    transformed that to -1 unconditionally, which broke upgrades.
    We only want to do that for a fully zeroed ceph_file_layout,
    so that it still maps to a file_layout_t. If any fields
    are set, though, we trust the fl_pgpool to be a valid pool.

    Fixes: 7627151ea30bc ("libceph: define new ceph_file_layout structure")
    Link: http://tracker.ceph.com/issues/17825
    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     

19 Oct, 2016

1 commit


06 Oct, 2016

2 commits

  • Remove the extra x1 variable; it's just a temporary placeholder that
    clutters the code unnecessarily.

    Reflects ceph.git commit 0d19408d91dd747340d70287b4ef9efd89e95c6b.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Use __builtin_clz() supported by GCC and Clang to figure out
    how many bits we should shift instead of shifting by a bit
    in a loop until the value gets normalized. Improves performance
    of this function by up to 3x in worst-case scenario and overall
    straw2 performance by ~10%.

    Reflects ceph.git commit 110de33ca497d94fc4737e5154d3fe781fa84a0a.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
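
The __builtin_clz() normalization described in the commit above can be compared against the bit-by-bit loop with a sketch like the one below. The function names, the 0x18000 mask and the 17-bit input range are assumptions chosen for illustration, and the code assumes a 32-bit unsigned int; it is not a copy of the CRUSH source.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left one bit at a time until bit 15 or bit 16 is set. */
static uint32_t normalize_loop(uint32_t x, int *iexpon)
{
	assert(x != 0 && x <= 0x1FFFF);
	*iexpon = 15;
	while (!(x & 0x18000)) {
		x <<= 1;
		(*iexpon)--;
	}
	return x;
}

/* Same result, but the shift count comes from a single count-leading-zeros. */
static uint32_t normalize_clz(uint32_t x, int *iexpon)
{
	assert(x != 0 && x <= 0x1FFFF);
	*iexpon = 15;
	if (!(x & 0x18000)) {
		/* Highest set bit is below bit 15, so clz(x) is at least 17. */
		int bits = __builtin_clz(x) - 16;

		x <<= bits;
		*iexpon = 15 - bits;
	}
	return x;
}
```

Both helpers should agree for every nonzero input up to 0x1FFFF; the second one simply replaces up to 15 loop iterations with one instruction on most targets.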
     

03 Oct, 2016

2 commits


25 Aug, 2016

10 commits


09 Aug, 2016

3 commits


28 Jul, 2016

2 commits