14 Jan, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: fix cleanup when trying to mount inexistent image
    net/ceph: make ceph_msgr_wq non-reentrant
    ceph: fsc->*_wq's aren't used in memory reclaim path
    ceph: Always free allocated memory in osdmap_decode()
    ceph: Makefile: Remove unnessary code
    ceph: associate requests with opening sessions
    ceph: drop redundant r_mds field
    ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
    ceph: add dir_layout to inode

    Linus Torvalds
     

13 Jan, 2011

3 commits

  • The ceph messenger code does a rather complex dance around the
    multithreaded workqueue to make sure the same work item isn't executed
    concurrently on different CPUs. A workqueue created with
    WQ_NON_REENTRANT provides this guarantee directly.

    Make ceph_msgr_wq a non-reentrant workqueue with the default concurrency
    level and remove the QUEUED/BUSY logic.

    * This removes backoff handling in con_work() but it couldn't reliably
    block execution of con_work() to begin with - queue_con() can be
    called after the work started but before BUSY is set. It seems that
    it was an optimization for a rather cold path and can be safely
    removed.

    * The number of concurrent work items is bounded by the number of
    connections, and connections are independent of each other. With
    the default concurrency level, different connections will be
    executed independently.

    Signed-off-by: Tejun Heo
    Cc: Sage Weil
    Cc: ceph-devel@vger.kernel.org
    Signed-off-by: Sage Weil

    Tejun Heo
     
  • Always free memory allocated to 'pi' in
    net/ceph/osdmap.c::osdmap_decode().

    Signed-off-by: Jesper Juhl
    Signed-off-by: Sage Weil

    Jesper Juhl
     
  • Add a ceph_dir_layout to the inode, and calculate dentry hash values based
    on the parent directory's specified dir_hash function. This is needed
    because the old default Linux dcache hash function is extremely weak and
    leads to a poor distribution of files among dir fragments.

    Signed-off-by: Sage Weil

    Sage Weil
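
    The per-directory hash selection described above can be sketched in
    user space as follows. This is only an illustration: the struct, enum
    and function names are hypothetical, the "weak" hash is a deliberately
    poor byte sum rather than the actual dcache hash, and 32-bit FNV-1a
    stands in for whatever stronger function a directory layout might
    specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hash selectors; names are illustrative, not the kernel's. */
enum dir_hash_type {
	DIR_HASH_WEAK = 1,   /* stand-in for a poorly mixing hash */
	DIR_HASH_STRONG = 2, /* stand-in for a better mixing hash */
};

/* Hypothetical per-directory layout carrying the hash selector. */
struct dir_layout {
	enum dir_hash_type dl_dir_hash;
};

/* Deliberately weak: a plain byte sum ignores character order. */
static uint32_t hash_weak(const char *name, size_t len)
{
	uint32_t h = 0;

	for (size_t i = 0; i < len; i++)
		h += (unsigned char)name[i];
	return h;
}

/* 32-bit FNV-1a, used here only as an example of a better mixing hash. */
static uint32_t hash_strong(const char *name, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash a dentry name with the function chosen by the parent's layout. */
static uint32_t dentry_hash(const struct dir_layout *parent, const char *name)
{
	size_t len = strlen(name);

	switch (parent->dl_dir_hash) {
	case DIR_HASH_STRONG:
		return hash_strong(name, len);
	case DIR_HASH_WEAK:
	default:
		return hash_weak(name, len);
	}
}
```

    With the weak stand-in, names that are permutations of each other
    ("ab" and "ba") collide, so they would land in the same directory
    fragment; the stronger stand-in separates them.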
     

27 Dec, 2010

1 commit


21 Dec, 2010

1 commit


18 Dec, 2010

2 commits


14 Dec, 2010

1 commit


09 Dec, 2010

1 commit


30 Nov, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
    af_unix: limit recursion level
    pch_gbe driver: The wrong of initializer entry
    pch_gbe dreiver: chang author
    ucc_geth: fix ucc halt problem in half duplex mode
    inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners
    ehea: Add some info messages and fix an issue
    hso: fix disable_net
    NET: wan/x25_asy, move lapb_unregister to x25_asy_close_tty
    cxgb4vf: fix setting unicast/multicast addresses ...
    net, ppp: Report correct error code if unit allocation failed
    DECnet: don't leak uninitialized stack byte
    au1000_eth: fix invalid address accessing the MAC enable register
    dccp: fix error in updating the GAR
    tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
    netns: Don't leak others' openreq-s in proc
    Net: ceph: Makefile: Remove unnessary code
    vhost/net: fix rcu check usage
    econet: fix CVE-2010-3848
    econet: fix CVE-2010-3850
    econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
    ...

    Linus Torvalds
     

28 Nov, 2010

1 commit


24 Nov, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    of/phylib: Use device tree properties to initialize Marvell PHYs.
    phylib: Add support for Marvell 88E1149R devices.
    phylib: Use common page register definition for Marvell PHYs.
    qlge: Fix incorrect usage of module parameters and netdev msg level
    ipv6: fix missing in6_ifa_put in addrconf
    SuperH IrDA: correct Baud rate error correction
    atl1c: Fix hardware type check for enabling OTP CLK
    net: allow GFP_HIGHMEM in __vmalloc()
    bonding: change list contact to netdev@vger.kernel.org
    e1000: fix screaming IRQ

    Linus Torvalds
     

23 Nov, 2010

1 commit


22 Nov, 2010

1 commit

  • We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.

    In ceph, add the missing flag.

    In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
    cleaner and allows using HIGHMEM pages as well.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Nov, 2010

3 commits

  • The alignment used for reading data into or out of pages used to be taken
    from the data_off field in the message header. This only worked as long
    as the page alignment matched the object offset, breaking direct io to
    non-page aligned offsets.

    Instead, explicitly specify the page alignment next to the page vector
    in the ceph_msg struct, and use that instead of the message header (which
    probably shouldn't be trusted). The alloc_msg callback is responsible for
    filling in this field properly when it sets up the page vector.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • We used to infer alignment of IOs within a page based on the file offset,
    which assumed they matched. This broke with direct IO that was not aligned
    to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
    specified in the OSD reply, which could have been adjusted by the server.

    Explicitly specify the page alignment when setting up OSD IO requests.

    Signed-off-by: Sage Weil

    Sage Weil
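
    A minimal user-space sketch of why inferring the alignment from the
    file offset breaks for direct IO, and what carrying an explicit
    alignment looks like. The 4096-byte page size, the io_request struct
    and the helper names are assumptions made for this illustration; they
    are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for the sketch */

/* Offset of an address (or file offset) within its page. */
static unsigned int page_alignment_of(uint64_t addr)
{
	return (unsigned int)(addr & (SKETCH_PAGE_SIZE - 1));
}

/* Pages touched by len bytes that start at the given in-page alignment. */
static unsigned int pages_for(unsigned int alignment, uint64_t len)
{
	return (unsigned int)((alignment + len + SKETCH_PAGE_SIZE - 1) /
			      SKETCH_PAGE_SIZE);
}

/* Hypothetical IO request that carries its page alignment explicitly. */
struct io_request {
	uint64_t file_off;           /* where the data lives in the file */
	uint64_t len;                /* number of bytes to transfer */
	unsigned int page_alignment; /* where the data starts in the first page */
};

/*
 * Build a request from the buffer address, not from the file offset,
 * so the two are free to differ (as they do for 512-byte aligned
 * direct IO into a page-aligned buffer).
 */
static struct io_request make_request(uint64_t file_off, uint64_t len,
				      uint64_t buf_addr)
{
	struct io_request req = { file_off, len, page_alignment_of(buf_addr) };

	return req;
}
```

    For a 4096-byte direct IO at file offset 512 into a page-aligned
    buffer, the explicit alignment is 0 and one page suffices, whereas
    the alignment inferred from the file offset would be 512 and would
    imply two pages.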
     
  • The offset/length arguments aren't used.

    Signed-off-by: Sage Weil

    Sage Weil
     

02 Nov, 2010

1 commit

  • If the client gets out of sync with the server message sequence number, we
    normally skip low seq messages (ones we already received). The skip code
    was also incrementing the expected seq, so that all subsequent messages
    also appeared old and got skipped, eventually causing a timeout on the
    osd connection. This resulted in some lagging requests and console
    messages like

    [233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017
    [233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018
    [233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019
    [233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020
    [233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021
    [233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022
    [233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023
    [233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024
    [233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025
    [233500.534614] ceph: tid 6023602 timed out on osd22, will reset osd

    Reported-by: Theodore Ts'o
    Signed-off-by: Sage Weil

    Sage Weil
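
    The failure mode can be reproduced with a small user-space model. The
    struct and function names below are hypothetical and the model is far
    simpler than the real messenger; it only tracks the expected sequence
    number and whether a skip advances it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection state: the next sequence number we expect. */
struct conn {
	uint64_t expected;
};

/*
 * Handle one incoming message. Returns 1 if delivered, 0 if skipped.
 * With buggy != 0 the skip path also advances the expected seq, which
 * is the behaviour the commit message describes as the bug.
 */
static int receive_msg(struct conn *c, uint64_t seq, int buggy)
{
	if (seq < c->expected) {
		if (buggy)
			c->expected++; /* bug: a skip must not advance */
		return 0;
	}
	c->expected = seq + 1;
	return 1;
}

/* Feed seqs first..last in order and count how many are delivered. */
static unsigned int deliver_range(struct conn *c, uint64_t first,
				  uint64_t last, int buggy)
{
	unsigned int delivered = 0;

	for (uint64_t seq = first; seq <= last; seq++)
		delivered += (unsigned int)receive_msg(c, seq, buggy);
	return delivered;
}
```

    Starting with an expected seq of 2017 and replaying seqs 2016 through
    2024, the buggy variant skips every message, matching the log above,
    while the fixed variant skips only the duplicate 2016.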
     

21 Oct, 2010

5 commits

  • Decrement the free page counter when removing a page from the free_list.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • This only happened when parse_extra_token was not passed
    to ceph_parse_option() (hence, only happened in rbd).

    Signed-off-by: Yehuda Sadeh

    Yehuda Sadeh
     
  • These facilitate preallocation of pages so that we can encode into the pagelist
    in an atomic context.

    Signed-off-by: Greg Farnum
    Signed-off-by: Sage Weil

    Greg Farnum
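
    A rough user-space sketch of the preallocation idea: pages are
    reserved ahead of time, where allocation may block, and later taken
    from a free list without allocating. The types and function names are
    hypothetical and malloc() stands in for page allocation. The take
    path also decrements the free counter, which is the bookkeeping the
    free_list fix listed above is about.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page on the free list. */
struct page_node {
	struct page_node *next;
};

/* Hypothetical pagelist holding only the reserve bookkeeping. */
struct pagelist {
	struct page_node *free_list;
	unsigned int num_pages_free;
};

/* Grow the reserve to at least want pages; call where blocking is fine. */
static int pagelist_reserve(struct pagelist *pl, unsigned int want)
{
	while (pl->num_pages_free < want) {
		struct page_node *p = malloc(sizeof(*p));

		if (!p)
			return -1;
		p->next = pl->free_list;
		pl->free_list = p;
		pl->num_pages_free++;
	}
	return 0;
}

/* Take a reserved page without allocating; safe for an atomic context. */
static struct page_node *pagelist_take(struct pagelist *pl)
{
	struct page_node *p = pl->free_list;

	if (!p)
		return NULL;
	pl->free_list = p->next;
	pl->num_pages_free--; /* keep the counter in step with the list */
	return p;
}

/* Release any pages still held in reserve. */
static void pagelist_free_reserve(struct pagelist *pl)
{
	struct page_node *p;

	while ((p = pagelist_take(pl)) != NULL)
		free(p);
}
```

    The caller reserves before entering the atomic section, takes pages
    inside it, and releases whatever remains afterwards.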
     
  • The rados block device (rbd), based on osdblk, creates a block device
    that is backed by objects stored in the Ceph distributed object storage
    cluster. Each device consists of a single metadata object and data
    striped over many data objects.

    The rbd driver supports read-only snapshots.

    Signed-off-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Yehuda Sadeh
     
  • This factors out protocol and low-level storage parts of ceph into a
    separate libceph module living in net/ceph and include/linux/ceph. This
    is mostly a matter of moving files around. However, a few key pieces
    of the interface change as well:

    - ceph_client becomes ceph_fs_client and ceph_client, where the latter
    captures the mon and osd clients, and the fs_client gets the mds client
    and file system specific pieces.
    - Mount option parsing and debugfs setup is correspondingly broken into
    two pieces.
    - The mon client gets a generic handler callback for otherwise unknown
    messages (mds map, in this case).
    - The basic supported/required feature bits can be expanded (and are by
    ceph_fs_client).

    No functional change, aside from some subtle error handling cases that got
    cleaned up in the refactoring process.

    Signed-off-by: Sage Weil

    Yehuda Sadeh