27 Feb, 2010

6 commits

  • Unlike ocfs2, dlmfs has no permanent storage. It can't store off a
    cluster stack it is supposed to be using. So it can't specify the stack
    name in ocfs2_cluster_connect().

    Instead, we create ocfs2_cluster_connect_agnostic(), which simply uses
    the stack that is currently enabled. This is fine for dlmfs, which will
    rely on the stack initialization.

    We add the "stackglue" capability to dlmfs's capability list. This lets
    userspace know dlmfs can be used with all cluster stacks.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • Inside the stackglue, the locking protocol structure is hanging off of
    the ocfs2_cluster_connection. This takes it one step further; the locking
    protocol is passed into ocfs2_cluster_connect(). Now different cluster
    connections can have different locking protocols with distinct asts.
    Note that all locking protocols have to keep their maximum protocol
    version in lock-step.

    With the protocol structure set in ocfs2_cluster_connect(), there is no
    need for the stackglue to have a static pointer to a specific protocol
    structure. We can change initialization to only pass in the maximum
    protocol version.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • With the full ocfs2_locking_protocol hanging off of the
    ocfs2_cluster_connection, ast wrappers can get the ast/bast pointers
    there. They don't need to get them from their plugin structure.

    The user plugin still needs the maximum locking protocol version,
    though. This changes the plugin structure so that it only holds the max
    version, not the entire ocfs2_locking_protocol pointer.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • With the ocfs2_cluster_connection hanging off of the ocfs2_dlm_lksb, we
    have access to it in the ast and bast wrapper functions. Attach the
    ocfs2_locking_protocol to the conn.

    Now, instead of referring to a static variable for ast/bast pointers, the
    wrappers can look at the connection. This means different connections
    can have different ast/bast pointers, and it reduces the need for the
    static pointer.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We're going to want it in the ast functions, so we convert union
    ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • The stackglue ast and bast functions tried to maintain the fiction that
    their arguments were void pointers. In reality, stack_user.c had to
    know that the argument was an ocfs2_lock_res in order to get the status
    off of the lksb. That's ugly.

    This changes stackglue to always pass the lksb as the argument to ast
    and bast functions. The caller can always use container_of() to get the
    ocfs2_lock_res or user_dlm_lock_res. The net effect to the caller is
    zero. They still get back the lockres in their ast. stackglue gets
    cleaner, and now can use the lksb itself.

    Signed-off-by: Joel Becker

    Joel Becker
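
    The container_of() pattern described in this commit can be sketched in
    userspace C. This is a minimal illustration, not the kernel code: the
    struct names, the fields, and sketch_ast() are hypothetical, and the
    macro is a simplified stand-in for the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down lock status block. */
struct sketch_ast_lksb {
	int status;
};

/* Hypothetical lock resource that embeds its lksb. */
struct sketch_ast_lock_res {
	int id;
	struct sketch_ast_lksb lksb;
	int ast_seen;
};

/* The ast is handed only the lksb and recovers the enclosing lockres. */
static void sketch_ast(struct sketch_ast_lksb *lksb)
{
	struct sketch_ast_lock_res *res;

	assert(lksb != NULL);
	res = container_of(lksb, struct sketch_ast_lock_res, lksb);
	res->ast_seen = 1;
}
```

    The caller still ends up with its lockres, while the glue layer only
    ever handles the lksb type.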
     

19 Nov, 2009

1 commit


12 Nov, 2009

1 commit


23 Jun, 2009

1 commit

  • The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
    the DLM cannot reconstruct its state. Clients of the DLM need to know
    this.

    ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
    track of the state. This is not a standard behavior, but ocfs2 has
    always relied on it. Thus, an o2dlm LVB is always "valid".

    ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When
    fs/dlm loses track of an LVB's state, it sets a flag
    (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of
    the LVB may be garbage or merely stale.

    ocfs2 doesn't want to try to guess at the validity of the stale LVB.
    Instead, it should be checking the VALNOTVALID flag. As this is the
    'standard' way of treating LVBs, we will promote this behavior.

    We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when
    the LVB is valid. o2dlm will always return valid, while fs/dlm will
    check VALNOTVALID.

    Signed-off-by: Joel Becker
    Acked-by: Mark Fasheh

    Joel Becker
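
    The validity check described above might look roughly like the
    following userspace sketch. Everything here is an assumption made for
    illustration: the enum, the struct layout, and the numeric value of
    SKETCH_SBF_VALNOTVALID are hypothetical and only mirror the behavior
    the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for fs/dlm's DLM_SBF_VALNOTVALID. */
#define SKETCH_SBF_VALNOTVALID 0x02u

/* Hypothetical tag for which lock manager backs the lksb. */
enum sketch_lvb_stack {
	SKETCH_LVB_O2DLM,
	SKETCH_LVB_FSDLM
};

struct sketch_lvb_lksb {
	enum sketch_lvb_stack stack;
	unsigned int sb_flags;	/* only meaningful in the fs/dlm case */
};

/* Returns non-zero when the LVB contents can be trusted. */
static int sketch_lvb_valid(const struct sketch_lvb_lksb *lksb)
{
	assert(lksb != NULL);

	/* o2dlm zeroes a lost LVB, so it always reports "valid". */
	if (lksb->stack == SKETCH_LVB_O2DLM)
		return 1;

	/* fs/dlm flags a lost LVB instead of zeroing it. */
	return !(lksb->sb_flags & SKETCH_SBF_VALNOTVALID);
}
```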
     

14 Oct, 2008

2 commits

  • ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or
    one value.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • This is actually pretty easy since fs/dlm already handles the bulk of the
    work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the
    underlying lock manager, so I only had to add the right calls.

    Cluster-aware POSIX locks ("plocks") can be turned off by the same means as
    UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume.
    Internally, the file system uses two sets of file_operations, depending on
    whether cluster-aware plocks are required. This turns out to be easier than
    implementing local-only versions of ->lock.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

25 Aug, 2008

1 commit


17 Jun, 2008

3 commits


18 Apr, 2008

16 commits

  • Add code to use fs/dlm.

    [ Modified to be part of the stack_user module -- Joel ]

    Signed-off-by: David Teigland
    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    David Teigland
     
  • Userspace can now query and specify the cluster stack in use via the
    /sys/fs/ocfs2/cluster_stack file. By default, it is 'o2cb', which is
    the classic stack. Thus, old tools that do not know how to modify this
    file will work just fine. The stack cannot be modified if there is a
    live filesystem.

    ocfs2_cluster_connect() now takes the expected cluster stack as an
    argument. This way, the filesystem and the stack glue ensure they are
    speaking to the same backend.

    If the stack is 'o2cb', the o2cb stack plugin is used. For any other
    value, the fsdlm stack plugin is selected.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
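
    The selection rule in the last paragraph of this commit can be
    sketched as a tiny function. This is an illustration only:
    sketch_pick_plugin() and the returned labels are hypothetical, and the
    real code resolves a plugin structure rather than a string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map the configured stack name to a plugin label.
 * "o2cb" selects the classic stack; any other value selects fsdlm.
 */
static const char *sketch_pick_plugin(const char *stack_name)
{
	assert(stack_name != NULL);

	if (strcmp(stack_name, "o2cb") == 0)
		return "o2cb";
	return "fsdlm";
}
```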
     
  • Introduce a set of sysfs files that describe the current stack glue
    state. The files live under /sys/fs/ocfs2. The locking_protocol file
    displays the version of ocfs2's locking code. The
    loaded_cluster_plugins file displays all of the currently loaded stack
    plugins. When filesystems are mounted, the active_cluster_plugin file
    will display the plugin in use.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We define the ocfs2_stack_plugin structure to represent a stack driver.
    The o2cb stack code is split into stack_o2cb.c. This becomes the
    ocfs2_stack_o2cb.ko module.

    The stackglue generic functions are similarly split into the
    ocfs2_stackglue.ko module. This module now provides an interface to
    register drivers. The ocfs2_stack_o2cb driver registers itself. As
    part of this interface, ocfs2_stackglue can load drivers on demand.
    This is accomplished in ocfs2_cluster_connect().

    ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
    If a hangup is pending, it will not release the driver module and will
    let _hangup() do that.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • Define the ocfs2_stack_operations structure. Build o2cb_stack_ops from
    all of the o2cb-specific stack functions. Change the generic stack glue
    functions to call the stack_ops instead of the o2cb functions directly.

    The o2cb functions are moved to stack_o2cb.c. The headers are cleaned up
    to where only needed headers are included.

    In this code, stackglue.c and stack_o2cb.c refer to some shared
    extern variables. When they become modules, that will change.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
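
    An operations vector of the kind described here can be sketched in
    plain C as follows. The struct members, the demo_* backend, and the
    sketch_* wrappers are all hypothetical; the real
    ocfs2_stack_operations has its own set of callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down operations vector for a cluster stack. */
struct sketch_stack_ops {
	int (*connect)(const char *group);
	int (*this_node)(unsigned int *node);
};

/* A stand-in backend that fills the vector. */
static int demo_connect(const char *group)
{
	return group ? 0 : -EINVAL;
}

static int demo_this_node(unsigned int *node)
{
	*node = 3;
	return 0;
}

static const struct sketch_stack_ops demo_ops = {
	.connect = demo_connect,
	.this_node = demo_this_node,
};

/* The generic layer only ever calls through the active vector. */
static const struct sketch_stack_ops *active_ops = &demo_ops;

static int sketch_cluster_connect(const char *group)
{
	assert(active_ops != NULL && active_ops->connect != NULL);
	return active_ops->connect(group);
}

static int sketch_cluster_this_node(unsigned int *node)
{
	assert(active_ops != NULL && active_ops->this_node != NULL);
	return active_ops->this_node(node);
}
```

    Swapping backends then means pointing active_ops at a different
    vector; the generic wrappers do not change.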
     
  • Split off the o2cb-specific functionality from the generic stack glue
    calls. This is a precursor to wrapping the o2cb functionality in an
    operations vector.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The stack glue initialization function needs a better name so that it can be
    used cleanly when stackglue becomes a module.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • dlmglue.c was still referencing a raw o2dlm lksb in one instance. Let's
    create a generic ocfs2_dlm_dump_lksb() function. This allows underlying
    DLMs to print whatever they want about their lock.

    We then move the o2dlm dump into stackglue.c where it belongs.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • o2dlm has the non-standard behavior of providing a cancel callback
    (unlock_ast) even when the cancel has failed (the locking operation
    succeeded without canceling). This is called CANCELGRANT after the
    status code sent to the callback. fs/dlm does not provide this
    callback, so dlmglue must be changed to live without it.
    o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.

    Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
    needs to check for it. ocfs2_locking_ast() must catch that a cancel was
    tried and clear the cancel state.

    Making these changes opens up a locking race. dlmglue uses the
    OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
    one time. But dlmglue must unlock the lockres before calling into the
    dlm. In the small window of time between unlocking the lockres and
    calling the dlm, the downconvert thread can try to cancel the lock. The
    downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
    know that ocfs2_dlm_lock() has not yet been called.

    Because ocfs2_dlm_lock() has not yet been called, the cancel operation
    will just be a no-op. There's nothing to cancel. With CANCELGRANT,
    dlmglue uses the CANCELGRANT callback to clear up the cancel state.
    When it comes around again, it will retry the cancel. Eventually, the
    first thread will have called into ocfs2_dlm_lock(), and either the
    lock or the cancel will succeed. The downconvert thread can then do its
    downconvert.

    Without CANCELGRANT, there is nothing to clean up the cancellation
    state. The downconvert thread does not know to retry its operations.
    More importantly, the original lock may be blocking on the other node
    that is trying to cancel us. With neither able to make progress, the
    ast is never called and the cancellation state is never cleaned up that
    way. dlmglue is deadlocked.

    The OCFS2_LOCK_PENDING flag is introduced to remedy this window. It is
    set at the same time OCFS2_LOCK_BUSY is. Thus, the downconvert thread
    can check whether the lock is cancelable. If not, it just loops around
    to try again. Once ocfs2_dlm_lock() is called, the thread then clears
    OCFS2_LOCK_PENDING and wakes the downconvert thread. Now, if the
    downconvert thread finds the lock BUSY, it can safely try to cancel it.
    Whether the cancel works or not, the state will be properly set and the
    lock processing can continue.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
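
    The flag protocol in the final paragraph can be modeled with a
    single-threaded sketch. It leaves out the spinlock and the wakeup of
    the downconvert thread, so it shows only the state transitions. The
    flag values, struct, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values standing in for the dlmglue flags. */
#define SKETCH_LOCK_BUSY	0x1u
#define SKETCH_LOCK_PENDING	0x2u

struct sketch_pending_res {
	unsigned int flags;
};

/* The locking thread marks the lock BUSY and PENDING together. */
static void sketch_prepare_dlm_call(struct sketch_pending_res *res)
{
	assert(res != NULL);
	res->flags |= SKETCH_LOCK_BUSY | SKETCH_LOCK_PENDING;
}

/* Once the dlm call has been issued, PENDING is cleared. */
static void sketch_dlm_call_issued(struct sketch_pending_res *res)
{
	assert(res != NULL);
	res->flags &= ~SKETCH_LOCK_PENDING;
}

/* The downconvert side may cancel only a BUSY lock that is not PENDING. */
static int sketch_may_cancel(const struct sketch_pending_res *res)
{
	assert(res != NULL);
	return (res->flags & SKETCH_LOCK_BUSY) &&
	       !(res->flags & SKETCH_LOCK_PENDING);
}
```

    While sketch_may_cancel() returns zero on a BUSY lock, the downconvert
    side would loop around and try again, as the commit message describes.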
     
  • The last bit of classic stack used directly in ocfs2 code is o2hb.
    Specifically, the check for heartbeat during mount and the call to
    ocfs2_hb_ctl during unmount.

    We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
    to ocfs2_hb_ctl. Other stacks will just leave hangup() empty.

    The check for heartbeat is moved into ocfs2_cluster_connect(). It will
    be matched by a similar check for other stacks.

    With this change, only stackglue.c includes cluster/ headers.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • ocfs2 asks the cluster stack for the local node's node number for two
    reasons: to fill the slot map and to print it. While the slot map isn't
    necessary for userspace cluster stacks, the printing is very nice for
    debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
    this value. It is anticipated that the slot map will not be used under a
    userspace cluster stack, so validity checks of the node num only need to
    exist in the slot map code. Otherwise, it just gets used and printed as an
    opaque value.

    [ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
    truly opaque. --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • This step introduces a cluster stack agnostic API for initializing and
    exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
    connect to the stack. It is all handled in stackglue.c.

    heartbeat.c no longer needs to know how it gets called.
    ocfs2_do_node_down() is now a clean recovery trigger.

    The big gotcha is the ordering of initializations and de-initializations done
    underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all
    o2dlm initialization in one block. Thus, the o2dlm functionality of
    ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(),
    however, did a few things between de-registration of the eviction
    callback and actually shutting down the domain. Now de-registration and
    shutdown of the domain are wrapped within the single
    ocfs2_cluster_disconnect() call. I've checked the code paths to make
    sure we can safely tear down things in ocfs2_dlm_shutdown() before
    calling ocfs2_cluster_disconnect(). The filesystem has already set
    itself to ignore the callback.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Wrap the lock status block (lksb) in a union. Later we will add a union
    element for the fs/dlm lksb. Create accessors for the status and lvb
    fields.

    Other than a debugging function, dlmglue.c does not directly reference
    the o2dlm locking path anymore.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
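
    A rough sketch of the union-plus-accessors idea, in userspace C. The
    member names, the LVB length, and the accessor names are hypothetical;
    the point is only that callers go through accessors rather than
    touching the o2dlm layout directly.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LVB_LEN 64	/* hypothetical LVB size for the sketch */

/* Stand-in for the o2dlm-specific lock status block. */
struct sketch_o2dlm_lksb {
	int status;
	char lvb[SKETCH_LVB_LEN];
};

/* The wrapper union; another stack's lksb could be added as a member. */
union sketch_union_lksb {
	struct sketch_o2dlm_lksb lksb_o2dlm;
};

/* Accessors hide which union member is in use. */
static int sketch_lock_status(const union sketch_union_lksb *lksb)
{
	assert(lksb != NULL);
	return lksb->lksb_o2dlm.status;
}

static void *sketch_lock_lvb(union sketch_union_lksb *lksb)
{
	assert(lksb != NULL);
	return lksb->lksb_o2dlm.lvb;
}
```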
     
  • Change the ocfs2_dlm_lock/unlock() functions to return -errno values.
    This is the first step towards eliminating dlm_status in
    fs/ocfs2/dlmglue.c. The change also passes -errno values to
    ->unlock_ast().

    [ Fix a return code in dlmglue.c and change the error translation table into
    an array of ints. --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
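
    An error translation table stored as an array of ints could look
    something like this. The status names and the particular errno values
    chosen are hypothetical and do not reproduce the real o2dlm mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of DLM status codes. */
enum sketch_dlm_status {
	SKETCH_DLM_NORMAL = 0,
	SKETCH_DLM_NOTQUEUED,
	SKETCH_DLM_BADARGS,
	SKETCH_DLM_MAXSTATS
};

/* Translation table indexed by status code (illustrative values). */
static const int sketch_status_to_errno[SKETCH_DLM_MAXSTATS] = {
	[SKETCH_DLM_NORMAL]	= 0,
	[SKETCH_DLM_NOTQUEUED]	= -EAGAIN,
	[SKETCH_DLM_BADARGS]	= -EINVAL,
};

/* Convert a status code to a -errno value, rejecting unknown codes. */
static int sketch_to_errno(int status)
{
	if (status < 0 || status >= SKETCH_DLM_MAXSTATS)
		return -EINVAL;
	return sketch_status_to_errno[status];
}
```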
     
  • The ocfs2 generic code should use the values in .
    stackglue.c will convert them to o2dlm values.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • This is the first in a series of patches to isolate ocfs2 from the
    underlying cluster stack. Here we wrap the dlm locking functions with
    ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
    callbacks, we can eliminate the callbacks from the filesystem visible
    functions.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker