18 Dec, 2014

7 commits


14 Nov, 2014

4 commits

  • No reason to use BUG_ON for osd request list assertions.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • kick_requests() can put linger requests on the notarget list. This
    means we need to clear the much-overloaded req->r_req_lru_item in
    __unregister_linger_request() as well, or we get an assertion failure
    in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

    AFAICT the assumption was that registered linger requests cannot be on
    any of req->r_req_lru_item lists, but that's clearly not the case.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Requests have to be unlinked from both osd->o_requests (normal
    requests) and osd->o_linger_requests (linger requests) lists when
    clearing req->r_osd. Otherwise __unregister_linger_request() gets
    confused and we trip over a !list_empty(&osd->o_linger_requests)
    assert in __remove_osd().

    MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map dne/dne # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Large (greater than 32k, the allocation size that corresponds to
    PAGE_ALLOC_COSTLY_ORDER with 4k pages) auth tickets will have their
    buffers vmalloc'ed, which leads to the following crash in crypto:

    [ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
    [ 28.686032] IP: [] scatterwalk_pagedone+0x22/0x80
    [ 28.686032] PGD 0
    [ 28.688088] Oops: 0000 [#1] PREEMPT SMP
    [ 28.688088] Modules linked in:
    [ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
    [ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [ 28.688088] Workqueue: ceph-msgr con_work
    [ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
    [ 28.688088] RIP: 0010:[] [] scatterwalk_pagedone+0x22/0x80
    [ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286
    [ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
    [ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
    [ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
    [ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
    [ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
    [ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
    [ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
    [ 28.688088] Stack:
    [ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
    [ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
    [ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
    [ 28.688088] Call Trace:
    [ 28.688088] [] scatterwalk_done+0x38/0x40
    [ 28.688088] [] scatterwalk_done+0x38/0x40
    [ 28.688088] [] blkcipher_walk_done+0x182/0x220
    [ 28.688088] [] crypto_cbc_encrypt+0x15f/0x180
    [ 28.688088] [] ? crypto_aes_set_key+0x30/0x30
    [ 28.688088] [] ceph_aes_encrypt2+0x29c/0x2e0
    [ 28.688088] [] ceph_encrypt2+0x93/0xb0
    [ 28.688088] [] ceph_x_encrypt+0x4a/0x60
    [ 28.688088] [] ? ceph_buffer_new+0x5d/0xf0
    [ 28.688088] [] ceph_x_build_authorizer.isra.6+0x297/0x360
    [ 28.688088] [] ? kmem_cache_alloc_trace+0x11b/0x1c0
    [ 28.688088] [] ? ceph_auth_create_authorizer+0x36/0x80
    [ 28.688088] [] ceph_x_create_authorizer+0x63/0xd0
    [ 28.688088] [] ceph_auth_create_authorizer+0x54/0x80
    [ 28.688088] [] get_authorizer+0x80/0xd0
    [ 28.688088] [] prepare_write_connect+0x18b/0x2b0
    [ 28.688088] [] try_read+0x1e59/0x1f10

    This is because we set up crypto scatterlists as if all buffers were
    kmalloc'ed. Fix it.

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
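
The list assertions discussed in the osd request commits of this group can be illustrated in userspace. The following is only a sketch: the `demo_*` names and the tiny list implementation are hypothetical stand-ins that mimic the behaviour of the kernel's list_del_init() and list_empty(); none of this is the actual libceph code. It shows why a list hook that is shared between several lists has to be unlinked and reinitialized before a release-time `list_empty()` check can pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace stand-in for a circular doubly linked list hook. */
struct demo_list {
    struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *h)
{
    h->next = h;
    h->prev = h;
}

/* A hook that points at itself is on no list. */
static bool demo_list_empty(const struct demo_list *h)
{
    return h->next == h;
}

static void demo_list_add_tail(struct demo_list *item, struct demo_list *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Unlink and reinitialize, so demo_list_empty(item) holds afterwards. */
static void demo_list_del_init(struct demo_list *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    demo_list_init(item);
}

/* Hypothetical request with one hook reused for several lists. */
struct demo_request {
    struct demo_list lru_item;
};

/* Unregistering must clear the shared hook, whichever list it is on. */
static void demo_unregister(struct demo_request *req)
{
    demo_list_del_init(&req->lru_item);
}

static bool demo_can_release(const struct demo_request *req)
{
    return demo_list_empty(&req->lru_item);
}

/* Mirrors a release-time assertion: the hook must be on no list. */
static void demo_release(struct demo_request *req)
{
    assert(demo_can_release(req));
}
```

A caller would link `req.lru_item` onto some list with `demo_list_add_tail()`, call `demo_unregister()`, and only then call `demo_release()`. Skipping the unregister step makes the assertion fire, which is the failure mode the commit messages describe.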
     

01 Nov, 2014

1 commit

  • Commit c27a3e4d667f ("libceph: do not hard code max auth ticket len"),
    while fixing a buffer overflow, tried to keep as much of the
    surrounding code unchanged as possible and introduced an unnecessary
    kmalloc() in the unencrypted ticket path. It is likely to fail on huge
    tickets, so get rid of it.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     

30 Oct, 2014

1 commit

  • This patch has ceph's lib code use the memalloc flags.

    If the VM layer needs to write data out to free up memory to handle new
    allocation requests, the block layer must be able to make forward progress.
    To handle that requirement we use structs like mempools to reserve memory for
    objects like bios and requests.

    The problem is that when we send/receive block layer requests over the
    network layer, net skb allocations can fail and the system can lock up.
    To solve this, the memalloc related flags were added. NBD, iSCSI
    and NFS use these flags to tell the network/vm layer that it should
    use memory reserves to fulfill allocation requests for structs like
    skbs.

    I am running ceph in a bunch of VMs on my laptop, so this patch was
    not tested very harshly.

    Signed-off-by: Mike Christie
    Reviewed-by: Ilya Dryomov

    Mike Christie
     

15 Oct, 2014

11 commits

  • Pull Ceph updates from Sage Weil:
    "There is the long-awaited discard support for RBD (Guangliang Zhao,
    Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's
    (Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and
    performance and debugging improvements (Yan, Zheng, John Spray), and a
    smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
    ceph: fix divide-by-zero in __validate_layout()
    rbd: rbd workqueues need a resque worker
    libceph: ceph-msgr workqueue needs a resque worker
    ceph: fix bool assignments
    libceph: separate multiple ops with commas in debugfs output
    libceph: sync osd op definitions in rados.h
    libceph: remove redundant declaration
    ceph: additional debugfs output
    ceph: export ceph_session_state_name function
    ceph: include the initial ACL in create/mkdir/mknod MDS requests
    ceph: use pagelist to present MDS request data
    libceph: reference counting pagelist
    ceph: fix llistxattr on symlink
    ceph: send client metadata to MDS
    ceph: remove redundant code for max file size verification
    ceph: remove redundant io_iter_advance()
    ceph: move ceph_find_inode() outside the s_mutex
    ceph: request xattrs if xattr_version is zero
    rbd: set the remaining discard properties to enable support
    rbd: use helpers to handle discard for layered images correctly
    ...

    Linus Torvalds
     
  • Commit f363e45fd118 ("net/ceph: make ceph_msgr_wq non-reentrant")
    effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is
    wrong - libceph is very much a memory reclaim path, so restore it.

    Cc: stable@vger.kernel.org # needs backporting for < 3.12
    Signed-off-by: Ilya Dryomov
    Tested-by: Micha Krause
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • For requests with multiple ops, separate ops with commas instead of \t,
    which is a field separator here.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Bring in missing osd ops and strings, use macros to eliminate multiple
    points of maintenance.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • This allows pagelist to present data that may be sent multiple times.

    Signed-off-by: Yan, Zheng
    Reviewed-by: Sage Weil

    Yan, Zheng
     
  • queue_work() doesn't "fail to queue", it returns false if work was
    already on a queue, which can't happen here since we allocate
    event_work right before we queue it. So don't bother at all.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Use the more common pr_warn.

    Other miscellanea:

    o Coalesce formats
    o Realign arguments

    Signed-off-by: Joe Perches
    Signed-off-by: Ilya Dryomov

    Joe Perches
     
  • If the state variable is krealloc'ed successfully, the old
    map->osd_state is freed. If one of the following two reallocations
    then fails, the function exits without updating map->osd_state, which
    becomes a wild pointer.

    Fix it by updating each pointer right after its krealloc succeeds.

    Signed-off-by: Li RongQing
    Signed-off-by: Ilya Dryomov

    Li RongQing
     
  • We want "cbc(aes)" algorithm, so select CRYPTO_CBC too, not just
    CRYPTO_AES. Otherwise on !CRYPTO_CBC kernels we fail rbd map/mount
    with

    libceph: error -2 building auth method x request

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Both not yet registered (r_linger && list_empty(&r_linger_item)) and
    registered linger requests should use the new tid on resend to avoid
    the dup op detection logic on the OSDs, yet we were doing this only for
    "registered" case. Factor out and simplify the "registered" logic and
    use the new helper for "not registered" case as well.

    Fixes: http://tracker.ceph.com/issues/8806

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Introduce __enqueue_request() and switch to it.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
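
The wild pointer described in the krealloc commit of this group has a direct userspace analogue with realloc(). The sketch below is an illustration under assumptions, not the kernel code: `struct demo_map`, `demo_map_set_max()` and the fault-injecting `demo_realloc()` wrapper are hypothetical names. It stores each new pointer back into the map as soon as its reallocation succeeds, so a later failure cannot leave the map pointing at freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical map with three parallel per-entry arrays. */
struct demo_map {
    size_t max;
    uint8_t *state;
    uint32_t *weight;
    uint32_t *addr;
};

/* Fault injection for the demo: -1 never fails, N fails after N calls. */
static int demo_fail_after = -1;

static void *demo_realloc(void *ptr, size_t size)
{
    if (demo_fail_after == 0)
        return NULL;            /* simulated allocation failure */
    if (demo_fail_after > 0)
        demo_fail_after--;
    return realloc(ptr, size);
}

/*
 * Grow all three arrays to hold max entries (max must be non-zero).
 * Each pointer is written back immediately after its realloc succeeds,
 * because a successful realloc may already have freed the old block.
 * Returns 0 on success, -1 if any allocation fails.
 */
static int demo_map_set_max(struct demo_map *map, size_t max)
{
    uint8_t *state;
    uint32_t *weight, *addr;

    assert(max > 0);

    state = demo_realloc(map->state, max * sizeof(*state));
    if (!state)
        return -1;
    map->state = state;

    weight = demo_realloc(map->weight, max * sizeof(*weight));
    if (!weight)
        return -1;
    map->weight = weight;

    addr = demo_realloc(map->addr, max * sizeof(*addr));
    if (!addr)
        return -1;
    map->addr = addr;

    map->max = max;
    return 0;
}
```

On a partial failure the arrays may end up with different capacities, but `map->max` keeps its old value and every pointer in the map still refers to live memory, so the caller can keep using or free the map safely.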
     

12 Oct, 2014

1 commit

  • Pull security subsystem updates from James Morris.

    Mostly ima, selinux, smack and key handling updates.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
    integrity: do zero padding of the key id
    KEYS: output last portion of fingerprint in /proc/keys
    KEYS: strip 'id:' from ca_keyid
    KEYS: use swapped SKID for performing partial matching
    KEYS: Restore partial ID matching functionality for asymmetric keys
    X.509: If available, use the raw subjKeyId to form the key description
    KEYS: handle error code encoded in pointer
    selinux: normalize audit log formatting
    selinux: cleanup error reporting in selinux_nlmsg_perm()
    KEYS: Check hex2bin()'s return when generating an asymmetric key ID
    ima: detect violations for mmaped files
    ima: fix race condition on ima_rdwr_violation_check and process_measurement
    ima: added ima_policy_flag variable
    ima: return an error code from ima_add_boot_aggregate()
    ima: provide 'ima_appraise=log' kernel option
    ima: move keyring initialization to ima_init()
    PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
    PKCS#7: Better handling of unsupported crypto
    KEYS: Overhaul key identification when searching for asymmetric keys
    KEYS: Implement binary asymmetric key ID handling
    ...

    Linus Torvalds
     

17 Sep, 2014

1 commit

  • A previous patch added a ->match_preparse() method to the key type. This is
    allowed to override the function called by the iteration algorithm.
    Therefore, we can just set a default that simply checks for an exact match of
    the key description with the original criterion data and allow match_preparse
    to override it as needed.

    The key_type::match op is then redundant and can be removed, as can the
    user_match() function.

    Signed-off-by: David Howells
    Acked-by: Vivek Goyal

    David Howells
     

11 Sep, 2014

3 commits

  • We hard code cephx auth ticket buffer size to 256 bytes. This isn't
    enough for any moderate setups and, in case tickets themselves are not
    encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
    ceph_decode_copy() doesn't - it's just a memcpy() wrapper). Since the
    buffer is allocated dynamically anyway, allocate it a bit later, at
    the point where we know how much is going to be needed.

    Fixes: http://tracker.ceph.com/issues/8979

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Add a helper for processing individual cephx auth tickets. Needed for
    the next commit, which deals with allocating ticket buffers. (Most of
    the diff here is whitespace - view with git diff -b).

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • We preallocate a few of the message types we get back from the mon. If we
    get a larger message than we are expecting, fall back to trying to allocate
    a new one instead of blindly using the one we have.

    CC: stable@vger.kernel.org
    Signed-off-by: Sage Weil
    Reviewed-by: Ilya Dryomov

    Sage Weil
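
The idea behind the ticket buffer commit in this group, allocating only once the required size is known instead of copying into a fixed 256-byte buffer, can be sketched in plain C. Everything here is an assumption made for illustration: the wire format (a 32-bit little-endian length followed by the payload) and the name `demo_decode_blob()` are hypothetical and do not reproduce the cephx decoding code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode a blob laid out as: 32-bit little-endian length, then payload.
 * The length is read first and checked against the remaining input, and
 * only then is a buffer of exactly that size allocated. A plain memcpy()
 * into a fixed-size buffer would overflow on large payloads.
 *
 * Returns a malloc'ed copy of the payload and stores its length in
 * *out_len, or returns NULL on truncated input or allocation failure.
 */
static uint8_t *demo_decode_blob(const uint8_t *p, const uint8_t *end,
                                 size_t *out_len)
{
    uint32_t len;
    uint8_t *buf;

    assert(p <= end);

    if ((size_t)(end - p) < sizeof(len))
        return NULL;            /* not even a full length field */
    len = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
          (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    p += sizeof(len);

    if ((size_t)(end - p) < len)
        return NULL;            /* claimed length exceeds the input */

    buf = malloc(len ? len : 1); /* avoid a zero-size allocation */
    if (!buf)
        return NULL;
    memcpy(buf, p, len);
    *out_len = len;
    return buf;
}
```

The caller owns the returned buffer and releases it with `free()`. A payload larger than any previously hard-coded limit is handled the same way as a small one, and a truncated message is rejected before any copy takes place.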
     

14 Aug, 2014

1 commit

  • Pull Ceph updates from Sage Weil:
    "There is a lot of refactoring and hardening of the libceph and rbd
    code here from Ilya that fix various smaller bugs, and a few more
    important fixes with clone overlap. The main fix is a critical change
    to the request_fn handling to not sleep that was exposed by the recent
    mutex changes (which will also go to the 3.16 stable series).

    Yan Zheng has several fixes in here for CephFS fixing ACL handling,
    time stamps, and request resends when the MDS restarts.

    Finally, there are a few cleanups from Himangi Saraogi based on
    Coccinelle"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits)
    libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly
    rbd: remove extra newlines from rbd_warn() messages
    rbd: allocate img_request with GFP_NOIO instead GFP_ATOMIC
    rbd: rework rbd_request_fn()
    ceph: fix kick_requests()
    ceph: fix append mode write
    ceph: fix sizeof(struct tYpO *) typo
    ceph: remove redundant memset(0)
    rbd: take snap_id into account when reading in parent info
    rbd: do not read in parent info before snap context
    rbd: update mapping size only on refresh
    rbd: harden rbd_dev_refresh() and callers a bit
    rbd: split rbd_dev_spec_update() into two functions
    rbd: remove unnecessary asserts in rbd_dev_image_probe()
    rbd: introduce rbd_dev_header_info()
    rbd: show the entire chain of parent images
    ceph: replace comma with a semicolon
    rbd: use rbd_segment_name_free() instead of kfree()
    ceph: check zero length in ceph_sync_read()
    ceph: reset r_resend_mds after receiving -ESTALE
    ...

    Linus Torvalds
     

09 Aug, 2014

1 commit

  • Determining ->last_piece based on the value of ->page_offset + length
    is incorrect because length here is the length of the entire message.
    ->last_piece set to false even if page array data item length is ...

    ... /dev/null
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    # rbd_resize calls librbd rbd_resize(), size is in bytes
    ./rbd_resize bar $(((4 << 20) + 512))
    rbd resize --size 10 bar
    BAR_DEV=$(rbd map bar)
    # trigger a 512-byte copyup -- 512-byte page array data item
    dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5

    The problem exists only in ceph_msg_data_pages_cursor_init(),
    ceph_msg_data_pages_advance() does the right thing. The size_t cast is
    unnecessary.

    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Reviewed-by: Alex Elder

    Ilya Dryomov
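
The distinction this commit draws, the length of one page array data item versus the length of the entire message, can be made concrete with a small sketch. The struct, the `demo_*` names, the 4096-byte page size and the exact comparison are assumptions chosen for illustration; this is not the code of ceph_msg_data_pages_cursor_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u    /* assumed page size for the demo */

/* Hypothetical cursor over one page array data item. */
struct demo_cursor {
    size_t resid;        /* bytes of this data item still to process */
    size_t page_offset;  /* offset of the data within its first page */
    bool last_piece;     /* true if the item ends within the current page */
};

static size_t demo_min(size_t a, size_t b)
{
    return a < b ? a : b;
}

/*
 * Initialize the cursor for a data item of item_len bytes that starts
 * at byte offset item_start, inside a message of msg_len bytes in total.
 * last_piece is derived from the item's own remaining length.
 */
static void demo_cursor_init(struct demo_cursor *c, size_t item_len,
                             size_t item_start, size_t msg_len)
{
    c->resid = demo_min(msg_len, item_len);
    c->page_offset = item_start % DEMO_PAGE_SIZE;
    c->last_piece = c->page_offset + c->resid <= DEMO_PAGE_SIZE;
}

/* The mistaken variant: uses the whole message length instead. */
static bool demo_last_piece_from_msg_len(size_t page_offset, size_t msg_len)
{
    return page_offset + msg_len <= DEMO_PAGE_SIZE;
}
```

For a 512-byte data item that is part of a message slightly larger than 4 MiB, `demo_cursor_init()` reports `last_piece` as true, while the message-length variant reports false even though the item fits in a single page.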
     

23 Jul, 2014

2 commits


08 Jul, 2014

7 commits