23 Dec, 2010

4 commits


21 Sep, 2010

1 commit


26 Jan, 2010

1 commit


23 Aug, 2008

1 commit


31 May, 2008

1 commit


18 Apr, 2008

2 commits

  • This patch exposes o2net information via debugfs. The information includes
    the list of sockets (sock_containers) as well as the list of outstanding
    messages (send_tracking). Useful for o2dlm debugging.

    (This patch is derived from an earlier one written by Zach Brown that
    exposed the same information via /proc.)

    [Mark: checkpatch fixes]

    Signed-off-by: Sunil Mushran
    Reviewed-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • Currently, o2net connects to a node on hb_up and disconnects on
    hb_down and net timeout.

    Disconnecting on net timeout is fine, but o2net should then attempt to
    reconnect. This is because sometimes nodes get overloaded
    enough that the network connection breaks but the disk hb does not.
    And if we get into that situation, we either fence (unnecessarily)
    or wait for its disk hb to die (and sometimes hang in the process).

    So in this updated scheme, when the network disconnects, we keep
    attempting to reconnect till we succeed or we get a disk hb down
    event.

    If the other node is really dead, then we will eventually get a
    node down event. If not, we should be able to connect again and
    continue.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
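
The reconnect scheme in the commit above can be read as a small state machine. The following is a minimal sketch under that reading, not the o2net implementation: the state and event names are invented for illustration, and connection attempts are modelled as plain events rather than real sockets.

```python
from enum import Enum, auto


class Event(Enum):
    HB_UP = auto()         # disk heartbeat says the peer is alive
    HB_DOWN = auto()       # disk heartbeat says the peer is dead
    NET_TIMEOUT = auto()   # network connection to the peer broke
    CONNECT_OK = auto()    # a connection attempt succeeded
    CONNECT_FAIL = auto()  # a connection attempt failed


class State(Enum):
    IDLE = auto()        # peer has no disk heartbeat; do not connect
    CONNECTING = auto()  # peer heartbeat is up; keep trying to connect
    CONNECTED = auto()


def next_state(state, event):
    """Hypothetical transition function for one peer."""
    # A disk heartbeat down event always ends the connection attempts.
    if event is Event.HB_DOWN:
        return State.IDLE
    if state is State.IDLE:
        return State.CONNECTING if event is Event.HB_UP else State.IDLE
    if state is State.CONNECTING:
        # Failed attempts leave us in CONNECTING, i.e. we retry.
        return State.CONNECTED if event is Event.CONNECT_OK else State.CONNECTING
    # state is CONNECTED: a net timeout now triggers reconnection
    # instead of a permanent disconnect.
    return State.CONNECTING if event is Event.NET_TIMEOUT else State.CONNECTED


def run(events, state=State.IDLE):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = next_state(state, event)
    return state
```

For example, `run([Event.HB_UP, Event.CONNECT_OK, Event.NET_TIMEOUT])` ends in `State.CONNECTING`, and only a later `Event.HB_DOWN` moves the peer back to `State.IDLE`.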
     

07 Feb, 2008

1 commit

  • Currently, when ocfs2 nodes connect via TCP, they advertise their
    compatibility level. If the versions do not match, two nodes cannot speak
    to each other and they disconnect. As a result, this provides no forward or
    backwards compatibility.

    This patch implements a simple protocol negotiation at the dlm level by
    introducing a major/minor version number scheme for entities that
    communicate. Specifically, o2dlm has a major/minor version for interaction
    with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
    interacting with the filesystem on other nodes.

    This will allow rolling upgrades of ocfs2 clusters when changes to the
    locking or network protocols can be done in a backwards compatible manner.
    In those cases, only the minor number is changed and the negotiated protocol
    minor is returned from dlm join. In the far less likely event that a
    required protocol change makes backwards compatibility impossible, we simply
    bump the major number.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
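
As an illustration of the major/minor scheme described above, here is a minimal sketch. It assumes that two nodes are compatible only when their major numbers match and that they then settle on the lower of the two minor numbers; the commit message does not spell out the exact rule, so treat both the rule and the names `ProtoVersion` and `negotiate` as assumptions.

```python
from typing import NamedTuple, Optional


class ProtoVersion(NamedTuple):
    major: int
    minor: int


def negotiate(local: ProtoVersion, remote: ProtoVersion) -> Optional[ProtoVersion]:
    """Return the version both sides can speak, or None if incompatible.

    Assumed rule: majors must match exactly; the negotiated minor is the
    lower of the two, so the newer node falls back to the older protocol.
    """
    if local.major != remote.major:
        return None
    return ProtoVersion(local.major, min(local.minor, remote.minor))
```

Under this rule a rolling upgrade works as long as only the minor number changes: an upgraded node at 1.2 joining a cluster at 1.0 would run at 1.0 until every node has been upgraded.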
     

26 Jan, 2008

2 commits

  • The meta lock now covers both meta data and data, so this just removes the
    now-redundant data lock.

    Combining the locks saves us a round of lock mastery per inode and leaves
    one less lock to ping between nodes during read/write.

    We don't lose much: since meta locks were always held before a data lock
    (and at the same level), ordered writeout mode (the default) ensured that
    flushing for the meta data lock also pushed out data anyway.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The node maps that are set/unset by these votes are no longer relevant, thus
    we can remove the mount and umount votes. Since those are the last two
    remaining votes, we can also remove the entire vote infrastructure.

    The vote thread has been renamed to the downconvert thread, and the small
    amount of functionality related to managing it has been moved into
    fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

27 Apr, 2007

1 commit

  • Ocfs2 currently does cluster-wide node messaging to check the open state of
    an inode during delete. This patch removes that mechanism in favor of an
    inode cluster lock which is taken at shared read when an inode is first read
    and dropped in clear_inode(). This allows a deleting node to test the
    liveness of an inode by attempting to take an exclusive lock.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
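
The liveness test described above can be sketched with a toy in-memory lock. This is an illustration only: `InodeOpenLock` and `safe_to_wipe` are hypothetical names, there is no real DLM behind them, and the sketch assumes the deleting node's own shared hold does not block its exclusive attempt.

```python
class InodeOpenLock:
    """Toy stand-in for a per-inode cluster lock (not the real DLM)."""

    def __init__(self):
        self._shared_holders = set()

    def take_shared(self, node):
        # Taken when a node first reads the inode.
        self._shared_holders.add(node)

    def drop_shared(self, node):
        # Dropped when the node clears the inode.
        self._shared_holders.discard(node)

    def try_exclusive(self, node):
        # Succeeds only if no *other* node still holds the lock shared.
        return not (self._shared_holders - {node})


def safe_to_wipe(lock, deleting_node):
    """The inode is unused elsewhere iff the exclusive attempt succeeds."""
    return lock.try_exclusive(deleting_node)
```

The point of the design is that no messages need to be broadcast at delete time: the lock state itself records which nodes still have the inode open.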
     

08 Feb, 2007

4 commits

  • When there is a lot of multithreaded I/O usage, two threads can collide
    while sending out a message to the other nodes. This is due to the lack of
    locking between threads while sending out the messages.

    When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux
    kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects
    itself by acquiring a lock at invocation by calling lock_sock().
    tcp_sendmsg() then loops over the buffers in the iovec, allocating
    associated sk_buff's and cache pages for use in the actual send. As it does
    so, it pushes the data out to tcp for actual transmission. However, if one
    of those allocations fails (because a large number of large sends is being
    processed, for example), it must wait for memory to become available. It
    does so by jumping to wait_for_sndbuf or wait_for_memory, both of which
    eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory()
    contains a code path that calls sk_wait_event(). Finally, sk_wait_event()
    contains the call to release_sock(), which drops the socket lock and lets
    another thread's send interleave with the partially sent message.

    The following patch adds a lock to the socket container in order to
    properly serialize outbound requests.

    From: Zhen Wei
    Acked-by: Jeff Mahoney
    Signed-off-by: Mark Fasheh

    Zhen Wei
     
  • There is a small window where a joining node may not see the node(s) that
    just died but are still part of the domain. To fix this, we must disallow
    join requests if the joining node has a different node map.

    A new field, node_map, is added to dlm_query_join_request to send the
    current node's nodemap along with the join request. On the receiving end,
    the nodes that are part of the cluster verify that the new node sees all the
    nodes that are still part of the cluster. They disallow the join if the maps
    mismatch.

    Signed-off-by: Srinivas Eeda
    Signed-off-by: Mark Fasheh

    Srinivas Eeda
     
    Currently o2net allows one handler function per message type. This
    patch adds the ability to register another function to be called after
    the handler has returned the message to the other node.

    Handlers are now given the option of returning a context (in the form of a
    void **) which will be passed back into the post message handler function.

    Signed-off-by: Kurt Hackel
    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Kurt Hackel
     
  • This was previously broken and migration of some locks had to be temporarily
    disabled. We use a new (and backward-incompatible) set of network messages
    to account for all references to a lock resource held across the cluster.
    Once these are all freed, the master node may then free the lock resource
    memory once its local references are dropped.

    Signed-off-by: Kurt Hackel
    Signed-off-by: Mark Fasheh

    Kurt Hackel
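
The first commit in this group adds a lock to the socket container so that outbound messages are serialized. The idea can be sketched in user space as follows. This is a toy model, not the kernel code: the `SocketContainer` class, its `wire` list, and the helper functions are all hypothetical, and `time.sleep(0)` merely stands in for the point where a real send could give up the socket lock part-way through a message.

```python
import threading
import time


class SocketContainer:
    """Toy socket container: 'wire' records chunks in the order sent."""

    def __init__(self):
        self.wire = []
        self.send_lock = threading.Lock()  # the per-container send lock

    def send_message(self, msg_id, nchunks):
        # Hold the lock for the whole message, not just for one chunk.
        with self.send_lock:
            for chunk in range(nchunks):
                self.wire.append((msg_id, chunk))
                time.sleep(0)  # yield, as a blocked send might


def messages_are_contiguous(wire):
    """True if no message's chunks are interleaved with another's."""
    finished = set()
    current = None
    for msg_id, _chunk in wire:
        if msg_id != current:
            if msg_id in finished:
                return False  # a message resumed after another one started
            if current is not None:
                finished.add(current)
            current = msg_id
    return True


def run_senders(nthreads, nchunks):
    """Send one multi-chunk message per thread over a shared container."""
    sc = SocketContainer()
    threads = [
        threading.Thread(target=sc.send_message, args=(i, nchunks))
        for i in range(nthreads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sc.wire
```

With the lock held across the whole message, `messages_are_contiguous(run_senders(4, 50))` holds regardless of how the threads are scheduled. Without it, chunks from different messages could appear interleaved on the wire.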
     

12 Dec, 2006

1 commit

  • Modify the OCFS2 handshake to ensure essential timeouts are configured
    identically on all nodes.

    Only allow changes when there are no connected peers.

    Improves the logic in o2net_advance_rx(), which broke now that
    sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg).

    Included is the field for userspace-heartbeat timeout to avoid the need for
    further protocol changes.

    Uses a global spinlock to ensure the decisions to update configfs entries
    are made on the correct value. The region covered by the spinlock when
    incrementing the counter is much larger as this is the more critical case.

    Small cleanup contributed by Adrian Bunk.

    Signed-off-by: Andrew Beekhof
    Signed-off-by: Mark Fasheh

    Andrew Beekhof
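
A handshake that carries timeouts lets each side refuse a peer whose settings differ. The sketch below illustrates that check. The field names are placeholders chosen for this example, since the commit message does not list the handshake fields, and the real handshake is a binary structure rather than a Python object.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Handshake:
    # Hypothetical field names; all values are in milliseconds.
    idle_timeout_ms: int
    keepalive_delay_ms: int
    reconnect_delay_ms: int
    heartbeat_timeout_ms: int


def mismatched_fields(local: Handshake, remote: Handshake):
    """Names of the timeouts on which the two nodes disagree."""
    return [
        f.name
        for f in fields(Handshake)
        if getattr(local, f.name) != getattr(remote, f.name)
    ]


def accept_peer(local: Handshake, remote: Handshake) -> bool:
    """Accept the connection only if every timeout is identical."""
    return not mismatched_fields(local, remote)
```

Returning the list of mismatched names, rather than a bare boolean, makes it easy to log which setting needs to be fixed on the rejected node.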
     

08 Dec, 2006

1 commit


22 Nov, 2006

1 commit


25 Sep, 2006

2 commits

  • OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
    Typically, i_generation is encoded in the lock name so that a deleted inode
    and a new one in the same block don't share the same lvb.

    Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
    is potentially thrown away as soon as the meta data lock is taken - we
    cannot encode the lock name without first knowing i_generation, which
    requires a disk read.

    This patch encodes i_generation in the inode meta data lvb, and removes the
    value from the inode meta data lock name. This way, the read can be covered
    by a lock, and at the same time we can distinguish between an up to date and
    a stale LVB.

    This will help cold-cache stat(2) performance in particular.

    Since this patch changes the protocol version, we take the opportunity to do
    a minor re-organization of two of the LVB fields.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Actually replace the vote calls with the new dentry operations. Make any
    necessary adjustments to get the scheme to work.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
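
The first commit in this group moves i_generation out of the lock name and into the LVB. A minimal sketch of what that enables is shown below. The `MetaLvb` layout, the lock name format, and the function names are assumptions made for illustration and do not reflect the on-wire format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaLvb:
    generation: int  # i_generation of the inode this LVB describes
    size: int        # an example of cached inode meta data


def lock_name(blkno: int) -> str:
    # Hypothetical format. The name depends only on the block number,
    # so the lock can be taken before the inode is read from disk.
    return f"M{blkno:016x}"


def lvb_is_up_to_date(lvb: Optional[MetaLvb], inode_generation: int) -> bool:
    """An LVB left behind by a deleted inode in the same block is stale."""
    return lvb is not None and lvb.generation == inode_generation
```

Because the generation check happens after the lock is held, the disk read is covered by the lock, and an LVB that belongs to a previous inode in the same block is detected instead of being trusted.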
     

04 Jan, 2006

1 commit