20 Jan, 2017

1 commit

  • commit e7ee2c089e94067d68475990bdeed211c8852917 upstream.

    The crash happens rather often when we reset some cluster nodes while
    nodes contend fiercely to do truncate and append.

    The crash backtrace is below:

    dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
    dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
    ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
    ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
    ocfs2: Beginning quota recovery on device (253,18) for slot 2
    ocfs2: Finishing quota recovery on device (253,18) for slot 2
    (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
    (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
    ------------[ cut here ]------------
    kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
    Supported: No, Unsupported modules are loaded
    CPU: 1 PID: 30154 Comm: truncate Tainted: G OE N 4.4.21-69-default #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
    task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
    RIP: 0010:[] [] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
    RSP: 0018:ffff880074e6bd50 EFLAGS: 00010282
    RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
    R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
    R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
    FS: 00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
    Call Trace:
    ocfs2_setattr+0x698/0xa90 [ocfs2]
    notify_change+0x1ae/0x380
    do_truncate+0x5e/0x90
    do_sys_ftruncate.constprop.11+0x108/0x160
    entry_SYSCALL_64_fastpath+0x12/0x6d
    Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
    RIP [] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]

    This happens because ocfs2_inode_lock() gets us a stale LVB in which
    the i_size is not equal to the disk i_size. We mistakenly trust the LVB
    because the underlying fsdlm dlm_lock() doesn't set lkb_sbflags with
    DLM_SBF_VALNOTVALID properly for us. But, why?

    The current code tries to downconvert the lock without the
    DLM_LKF_VALBLK flag to tell o2cb not to update the RSB's LVB if it's a
    PR->NULL conversion, even if the lock resource type needs the LVB. This
    is not the right way for fsdlm.

    The fsdlm plugin behaves differently on DLM_LKF_VALBLK: it depends on
    DLM_LKF_VALBLK to decide whether we care about the LVB in the LKB. If
    DLM_LKF_VALBLK is not set, fsdlm will skip both recovering the RSB's
    LVB from this lkb and setting DLM_SBF_VALNOTVALID appropriately when a
    node failure happens.

    The following diagram briefly illustrates how this crash happens:

    RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;

    The 1st round:

    Node1                                         Node2
    RSB1: PR
                                                  RSB1(master): NULL->EX
    ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
      ocfs2_dlm_lock(no DLM_LKF_VALBLK)

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    dlm_lock(no DLM_LKF_VALBLK)
      convert_lock(overwrite lkb->lkb_exflags
                   with no DLM_LKF_VALBLK)

    RSB1: NULL                                    RSB1: EX
                                                  reset Node2
    dlm_recover_rsbs()
      recover_lvb()

    /* The LVB is not trustable if the node with EX fails and
     * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
     */

    if (!(lkb->lkb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance
        return;                                * to invalidate the LVB here.
                                               */

    The 2nd round:

    Node1                                         Node2
    RSB1(become master from recovery)

    ocfs2_setattr()
      ocfs2_inode_lock(NULL->EX)
        /* dlm_lock() returns the stale LVB without setting DLM_SBF_VALNOTVALID */
        ocfs2_meta_lvb_is_trustable() returns 1 /* so we don't refresh the inode from disk */
      ocfs2_truncate_file()
        mlog_bug_on_msg(disk isize != i_size_read(inode)) /* crash! */

    The fix is quite straightforward. We keep setting the DLM_LKF_VALBLK
    flag for dlm_lock() if the lock resource type needs the LVB and the
    fsdlm plugin is used.

    Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com
    Signed-off-by: Eric Ren
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Eric Ren
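
The flag decision described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel patch: the `lock_res` struct, its fields, the helper `downconvert_flags()`, and the `SKETCH_*` flag values are all hypothetical stand-ins for the real dlmglue and fs/dlm definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values, chosen for illustration only. */
#define SKETCH_LKF_CONVERT 0x4u
#define SKETCH_LKF_VALBLK  0x8u

/* Hypothetical, simplified view of a lock resource. */
struct lock_res {
    bool uses_lvb;        /* lock type carries a Lock Value Block */
    bool stack_is_fsdlm;  /* true when the fsdlm plugin is in use */
};

/*
 * Compute the flags for a downconvert request.
 * set_lvb says whether the caller wants the LVB written back.
 */
static uint32_t downconvert_flags(const struct lock_res *res, bool set_lvb)
{
    uint32_t flags = SKETCH_LKF_CONVERT;

    if (set_lvb)
        flags |= SKETCH_LKF_VALBLK;

    /*
     * With fsdlm, keep VALBLK for every LVB-carrying lock so that
     * recovery can still tell this lock cares about the LVB.
     */
    if (res->stack_is_fsdlm && res->uses_lvb)
        flags |= SKETCH_LKF_VALBLK;

    return flags;
}
```

With this shape, a PR->NULL downconvert on an LVB-carrying lock still carries the value-block flag under fsdlm, while the o2cb behaviour (no flag when `set_lvb` is false) is unchanged.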
     

27 Jul, 2016

1 commit

  • Obviously, memset() has zeroed the whole struct locking_max_version.
    So there is no need to zero its two fields individually.

    Link: http://lkml.kernel.org/r/1463970605-18354-1-git-send-email-zren@suse.com
    Signed-off-by: Eric Ren
    Reviewed-by: Joseph Qi
    Reviewed-by: Gang He
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Ren
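
The point of this cleanup can be shown with a small userspace sketch. The struct `version_sketch` and the helper `reset_version()` are hypothetical stand-ins for a two-field version structure; they are not the kernel definitions.

```c
#include <string.h>

/* Hypothetical stand-in for a two-field version struct. */
struct version_sketch {
    unsigned char major;
    unsigned char minor;
};

static void reset_version(struct version_sketch *v)
{
    /* One memset clears every byte, so per-field "= 0" lines are redundant. */
    memset(v, 0, sizeof(*v));
}
```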
     

23 Mar, 2016

1 commit

  • When there are errors in the ocfs2 filesystem, they are usually
    accompanied by the inode number which caused the error. This inode
    number would be the input to fixing the file. One of these options
    could be considered:

    A file in the sys filesystem which would accept inode numbers. This
    could be used to communicate back what has to be fixed or is fixed.
    You could write:

    $# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/check

    or

    $# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/fix

    Compared with the second version, I redesigned the filecheck sysfs
    interfaces: there are three sysfs files (check, fix and set) under the
    filecheck directory (see above), and sysfs will accept only one
    argument. Second, I adjusted some code in the
    ocfs2_filecheck_repair_inode_block() function according to upstream
    feedback: we cannot just add the VALID_FL flag back as an inode block
    fix, so we will not fix this field corruption until we have a complete
    solution. Compared with the first version, I use strncasecmp instead
    of two strncmp calls. Second, I updated the source file contribution
    vendor.

    This patch (of 4):

    Export ocfs2_kset object from ocfs2_stackglue kernel module, then online
    file check code will create the related sysfiles under ocfs2_kset
    object. We're exporting this because it's built in ocfs2_stackglue.ko.

    Signed-off-by: Gang He
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gang He
     

05 Jun, 2014

1 commit


07 Apr, 2014

1 commit

  • Pull module updates from Rusty Russell:
    "Nothing major: the stricter permissions checking for sysfs broke a
    staging driver; fix included. Greg KH said he'd take the patch but
    hadn't as the merge window opened, so it's included here to avoid
    breaking build"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    staging: fix up speakup kobject mode
    Use 'E' instead of 'X' for unsigned module taint flag.
    VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
    kallsyms: fix percpu vars on x86-64 with relocation.
    kallsyms: generalize address range checking
    module: LLVMLinux: Remove unused function warning from __param_check macro
    Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
    module: remove MODULE_GENERIC_TABLE
    module: allow multiple calls to MODULE_DEVICE_TABLE() per module
    module: use pr_cont

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • This is a part of the nocontrold feature which was incorporated some
    time back.

    This is required for backward compatibility of the tools, specifically
    the scenario where tools with recovery callbacks are used with a
    kernel not using the recovery callbacks (older kernel + newer tools).
    The tools look for this file to understand if the kernel supports DLM
    recovery callbacks.

    For kernels which support recovery callbacks but will miss this patch,
    ocfs2 will continue to use the older API and would still be able to
    mount the filesystem.

    [akpm@linux-foundation.org: simplify]
    [sfr@canb.auug.org.au: VERIFY_OCTAL_PERMISSIONS fix up]
    Signed-off-by: Goldwyn Rodrigues
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     

29 Mar, 2014

1 commit

  • Commit c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is
    trying to strlcpy a string which was explicitly passed as NULL in the
    very same patch, triggering a NULL ptr deref.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: strlcpy (lib/string.c:388 lib/string.c:151)
    CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W 3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
    RIP: strlcpy (lib/string.c:388 lib/string.c:151)
    Call Trace:
    ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
    ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
    user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
    dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
    vfs_mkdir (fs/namei.c:3467)
    SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
    tracesys (arch/x86/kernel/entry_64.S:749)

    akpm: this patch probably disables the feature. A temporary thing to
    avoid trivial oopses.

    Signed-off-by: Sasha Levin
    Cc: Goldwyn Rodrigues
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
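
The class of bug described above (copying from a pointer that may be NULL) can be illustrated with a small userspace sketch. This is not the kernel fix, which the commit message does not show; the helper `copy_cluster_name()` is hypothetical, and `snprintf()` stands in for the kernel's strlcpy().

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy src into dst with truncation, unless src is NULL.
 * Returns 1 if a name was copied, 0 otherwise; dst is left as an
 * empty string when nothing is copied.
 */
static int copy_cluster_name(char *dst, size_t dst_len, const char *src)
{
    if (dst_len == 0)
        return 0;

    dst[0] = '\0';
    if (src == NULL)          /* the guard the crashing path lacked */
        return 0;

    snprintf(dst, dst_len, "%s", src);
    return 1;
}
```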
     

24 Mar, 2014

1 commit

  • Summary of http://lkml.org/lkml/2014/3/14/363 :

    Ted: module_param(queue_depth, int, 444)
    Joe: 0444!
    Rusty: User perms >= group perms >= other perms?
    Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2?

    Side effect of stricter permissions means removing the unnecessary
    S_IFREG from several callers.

    Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair
    number of drivers fail this test, so that will be the debate for a
    future patch.

    Suggested-by: Joe Perches
    Acked-by: Bjorn Helgaas for drivers/pci/slot.c
    Acked-by: Greg Kroah-Hartman
    Cc: Miklos Szeredi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Rusty Russell

    Rusty Russell
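
The ordering rule summarized above ("user perms >= group perms >= other perms") can be approximated at run time as follows. This is a sketch under assumptions: the kernel macro is a compile-time check, and exactly which bits it compares is my reading of the summary, not something the message spells out. The function name `perms_look_sane()` is hypothetical. In line with the note about the removed `(perm) & 2` test, the sketch does not reject other-writable modes.

```c
#include <stdbool.h>

/* Runtime approximation of a "stricter sysfs permissions" check. */
static bool perms_look_sane(int perms)
{
    /* Only the nine rwx bits are accepted. */
    if (perms < 0 || perms > 0777)
        return false;

    int user_r  = (perms >> 6) & 4;
    int group_r = (perms >> 3) & 4;
    int other_r = perms & 4;
    int user_w  = (perms >> 6) & 2;
    int group_w = (perms >> 3) & 2;

    /* Readable: user >= group >= other. */
    if (user_r < group_r || group_r < other_r)
        return false;

    /* Writable: user >= group. */
    if (user_w < group_w)
        return false;

    return true;
}
```

For example, 0444 and 0644 pass, while 0464 (group-writable but not user-writable) and 0044 (group-readable but not user-readable) are rejected.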
     

22 Jan, 2014

2 commits

  • This is done to differentiate between using and not using controld and
    use the connection information accordingly.

    We need to be backward compatible. So, we use a new enum
    ocfs2_connection_type to identify when controld is used and when it is
    not.

    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     
  • This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM
    handling up to the times with respect to DLM (>=4.0.1) and corosync
    (2.3.x). AFAIK, cman also is being phased out for a unified corosync
    cluster stack.

    fs/dlm performs all the functions with respect to fencing and node
    management and provides the APIs to do so for ocfs2. For all future
    references, DLM stands for fs/dlm code.

    The advantages are:
    + No need to run an additional userspace daemon (ocfs2_controld)
    + No controld device handling and controld protocol
    + Shifting responsibilities of node management to DLM layer

    For backward compatibility, we are keeping the controld handling code.
    Once enough time has passed we can remove a significant portion of the
    code. This was tested by using the kernel with changes on older
    unmodified tools. The kernel used ocfs2_controld as expected, and
    displayed the appropriate warning message.

    This feature requires modification in the userspace ocfs2-tools. The
    changes can be found at: https://github.com/goldwynr/ocfs2-tools branch:
    nocontrold. Currently, not many checks are present in the userspace
    code, but that will change soon.

    This patch (of 6):

    Add clustername to cluster connection.

    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     

13 Nov, 2013

1 commit


27 Feb, 2010

6 commits

  • Unlike ocfs2, dlmfs has no permanent storage. It can't store off a
    cluster stack it is supposed to be using. So it can't specify the stack
    name in ocfs2_cluster_connect().

    Instead, we create ocfs2_cluster_connect_agnostic(), which simply uses
    the stack that is currently enabled. This is fine for dlmfs, which will
    rely on the stack initialization.

    We add the "stackglue" capability to dlmfs's capability list. This lets
    userspace know dlmfs can be used with all cluster stacks.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • Inside the stackglue, the locking protocol structure is hanging off of
    the ocfs2_cluster_connection. This takes it one further; the locking
    protocol is passed into ocfs2_cluster_connect(). Now different cluster
    connections can have different locking protocols with distinct asts.
    Note that all locking protocols have to keep their maximum protocol
    version in lock-step.

    With the protocol structure set in ocfs2_cluster_connect(), there is no
    need for the stackglue to have a static pointer to a specific protocol
    structure. We can change initialization to only pass in the maximum
    protocol version.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • With the full ocfs2_locking_protocol hanging off of the
    ocfs2_cluster_connection, ast wrappers can get the ast/bast pointers
    there. They don't need to get them from their plugin structure.

    The user plugin still needs the maximum locking protocol version,
    though. This changes the plugin structure so that it only holds the max
    version, not the entire ocfs2_locking_protocol pointer.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • With the ocfs2_cluster_connection hanging off of the ocfs2_dlm_lksb, we
    have access to it in the ast and bast wrapper functions. Attach the
    ocfs2_locking_protocol to the conn.

    Now, instead of referring to a static variable for ast/bast pointers, the
    wrappers can look at the connection. This means different connections
    can have different ast/bast pointers, and it reduces the need for the
    static pointer.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We're going to want it in the ast functions, so we convert union
    ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • The stackglue ast and bast functions tried to maintain the fiction that
    their arguments were void pointers. In reality, stack_user.c had to
    know that the argument was an ocfs2_lock_res in order to get the status
    off of the lksb. That's ugly.

    This changes stackglue to always pass the lksb as the argument to ast
    and bast functions. The caller can always use container_of() to get the
    ocfs2_lock_res or user_dlm_lock_res. The net effect to the caller is
    zero. They still get back the lockres in their ast. stackglue gets
    cleaner, and now can use the lksb itself.

    Signed-off-by: Joel Becker

    Joel Becker
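
The container_of() pattern mentioned above can be sketched in userspace C. The struct names and the macro `container_of_sketch` are hypothetical simplifications; the kernel's own container_of() adds type checking that is omitted here.

```c
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical lock status block. */
struct lksb_sketch {
    int status;
};

/* Hypothetical lock resource that embeds the lksb. */
struct lockres_sketch {
    int id;
    struct lksb_sketch lksb;
};

/* What an ast callback would do when it is handed only the lksb. */
static struct lockres_sketch *lockres_from_lksb(struct lksb_sketch *lksb)
{
    return container_of_sketch(lksb, struct lockres_sketch, lksb);
}
```

Because the callback receives the lksb itself, the glue layer can read the status directly, and each caller recovers its own enclosing type without the glue needing to know what that type is.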
     

19 Nov, 2009

1 commit


12 Nov, 2009

1 commit


23 Jun, 2009

1 commit

  • The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
    the DLM cannot reconstruct its state. Clients of the DLM need to know
    this.

    ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
    track of the state. This is not a standard behavior, but ocfs2 has
    always relied on it. Thus, an o2dlm LVB is always "valid".

    ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When
    fs/dlm loses track of an LVB's state, it sets a flag
    (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of
    the LVB may be garbage or merely stale.

    ocfs2 doesn't want to try to guess at the validity of the stale LVB.
    Instead, it should be checking the VALNOTVALID flag. As this is the
    'standard' way of treating LVBs, we will promote this behavior.

    We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when
    the LVB is valid. o2dlm will always return valid, while fs/dlm will
    check VALNOTVALID.

    Signed-off-by: Joel Becker
    Acked-by: Mark Fasheh

    Joel Becker
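
The validity rule described above can be sketched as follows. The types, the helper `lvb_valid()`, and the flag value `SKETCH_SBF_VALNOTVALID` are hypothetical stand-ins used only to show the two behaviours side by side; they are not the stack glue API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder value for the "value not valid" status flag. */
#define SKETCH_SBF_VALNOTVALID 0x02u

enum stack_kind { STACK_O2DLM, STACK_FSDLM };

/* Hypothetical view of a lock status block. */
struct lksb_view {
    enum stack_kind stack;
    uint8_t sb_flags;
};

static bool lvb_valid(const struct lksb_view *lksb)
{
    /* o2dlm zeroes a lost LVB itself, so its LVB always counts as valid. */
    if (lksb->stack == STACK_O2DLM)
        return true;

    /* fs/dlm reports a lost LVB through the status flag instead. */
    return !(lksb->sb_flags & SKETCH_SBF_VALNOTVALID);
}
```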
     

14 Oct, 2008

2 commits

  • ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or
    one value.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • This is actually pretty easy since fs/dlm already handles the bulk of the
    work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the
    underlying lock manager, so I only had to add the right calls.

    Cluster-aware POSIX locks ("plocks") can be turned off by the same means as
    UNIX locks: mount with 'noflocks', or create a local-only Ocfs2 volume.
    Internally, the file system uses two sets of file_operations, depending on
    whether cluster-aware plocks are required. This turns out to be easier than
    implementing local-only versions of ->lock.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
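
The "two sets of file_operations" idea can be sketched with a pair of operation tables. Everything here is hypothetical (`fops_sketch`, `cluster_plock()`, `pick_fops()`); it only shows the selection, assuming that a table with no lock hook means the generic local locking path is used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, one-hook version of a file operations table. */
struct fops_sketch {
    int (*lock)(int fd);
};

/* Stand-in for a cluster-aware POSIX lock implementation. */
static int cluster_plock(int fd)
{
    (void)fd;
    return 0;
}

/* Table used when cluster-aware plocks are required. */
static const struct fops_sketch fops_cluster = { .lock = cluster_plock };

/* Table without a lock hook: locking stays local to the node. */
static const struct fops_sketch fops_local = { .lock = NULL };

static const struct fops_sketch *pick_fops(bool local_only_volume,
                                           bool plocks_disabled_by_mount)
{
    bool cluster_plocks = !local_only_volume && !plocks_disabled_by_mount;

    return cluster_plocks ? &fops_cluster : &fops_local;
}
```

Choosing a table once avoids writing a second, local-only implementation of the lock hook.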
     

25 Aug, 2008

1 commit


17 Jun, 2008

3 commits


18 Apr, 2008

14 commits

  • Add code to use fs/dlm.

    [ Modified to be part of the stack_user module -- Joel ]

    Signed-off-by: David Teigland
    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    David Teigland
     
  • Userspace can now query and specify the cluster stack in use via the
    /sys/fs/ocfs2/cluster_stack file. By default, it is 'o2cb', which is
    the classic stack. Thus, old tools that do not know how to modify this
    file will work just fine. The stack cannot be modified if there is a
    live filesystem.

    ocfs2_cluster_connect() now takes the expected cluster stack as an
    argument. This way, the filesystem and the stack glue ensure they are
    speaking to the same backend.

    If the stack is 'o2cb', the o2cb stack plugin is used. For any other
    value, the fsdlm stack plugin is selected.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
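
The selection rule stated above ('o2cb' picks the o2cb plugin, any other value picks fsdlm) is simple enough to sketch directly. The enum and the helper `pick_plugin()` are hypothetical names, and "pcmk" in the usage note is just an example value.

```c
#include <string.h>

enum plugin_kind { PLUGIN_O2CB, PLUGIN_FSDLM };

/* Map a cluster stack name to the plugin that should serve it. */
static enum plugin_kind pick_plugin(const char *stack_name)
{
    return strcmp(stack_name, "o2cb") == 0 ? PLUGIN_O2CB : PLUGIN_FSDLM;
}
```

For instance, `pick_plugin("o2cb")` yields `PLUGIN_O2CB`, while a name such as "pcmk" yields `PLUGIN_FSDLM`.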
     
  • Introduce a set of sysfs files that describe the current stack glue
    state. The files live under /sys/fs/ocfs2. The locking_protocol file
    displays the version of ocfs2's locking code. The
    loaded_cluster_plugins file displays all of the currently loaded stack
    plugins. When filesystems are mounted, the active_cluster_plugin file
    will display the plugin in use.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We define the ocfs2_stack_plugin structure to represent a stack driver.
    The o2cb stack code is split into stack_o2cb.c. This becomes the
    ocfs2_stack_o2cb.ko module.

    The stackglue generic functions are similarly split into the
    ocfs2_stackglue.ko module. This module now provides an interface to
    register drivers. The ocfs2_stack_o2cb driver registers itself. As
    part of this interface, ocfs2_stackglue can load drivers on demand.
    This is accomplished in ocfs2_cluster_connect().

    ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
    If a hangup is pending, it will not release the driver module and will
    let _hangup() do that.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • Define the ocfs2_stack_operations structure. Build o2cb_stack_ops from
    all of the o2cb-specific stack functions. Change the generic stack glue
    functions to call the stack_ops instead of the o2cb functions directly.

    The o2cb functions are moved to stack_o2cb.c. The headers are cleaned up
    to where only needed headers are included.

    In this code, stackglue.c and stack_o2cb.c refer to some shared
    extern variables. When they become modules, that will change.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
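
The operations-vector arrangement described above can be sketched like this. The struct `stack_ops_sketch`, its two hooks, and the `o2cb_*_sketch` functions are hypothetical; the real ocfs2_stack_operations has a different and larger set of members.

```c
/* Hypothetical operations vector for a cluster stack. */
struct stack_ops_sketch {
    int (*connect)(const char *group);
    int (*this_node)(unsigned int *node);
};

/* Stand-ins for stack-specific implementations. */
static int o2cb_connect_sketch(const char *group)
{
    (void)group;
    return 0;
}

static int o2cb_this_node_sketch(unsigned int *node)
{
    *node = 7;  /* arbitrary node number for the example */
    return 0;
}

static const struct stack_ops_sketch o2cb_ops_sketch = {
    .connect   = o2cb_connect_sketch,
    .this_node = o2cb_this_node_sketch,
};

/* The generic glue only ever calls through the active vector. */
static const struct stack_ops_sketch *active_ops = &o2cb_ops_sketch;

static int cluster_connect(const char *group)
{
    return active_ops->connect(group);
}

static int cluster_this_node(unsigned int *node)
{
    return active_ops->this_node(node);
}
```

Swapping `active_ops` for another table changes the backend without touching the generic callers.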
     
  • Split off the o2cb-specific functionality from the generic stack glue
    calls. This is a precursor to wrapping the o2cb functionality in an
    operations vector.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The stack glue initialization function needs a better name so that it can be
    used cleanly when stackglue becomes a module.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • dlmglue.c was still referencing a raw o2dlm lksb in one instance. Let's
    create a generic ocfs2_dlm_dump_lksb() function. This allows underlying
    DLMs to print whatever they want about their lock.

    We then move the o2dlm dump into stackglue.c where it belongs.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • o2dlm has the non-standard behavior of providing a cancel callback
    (unlock_ast) even when the cancel has failed (the locking operation
    succeeded without canceling). This is called CANCELGRANT after the
    status code sent to the callback. fs/dlm does not provide this
    callback, so dlmglue must be changed to live without it.
    o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.

    Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
    needs to check for it. ocfs2_locking_ast() must catch that a cancel was
    tried and clear the cancel state.

    Making these changes opens up a locking race. dlmglue uses the
    OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
    one time. But dlmglue must unlock the lockres before calling into the
    dlm. In the small window of time between unlocking the lockres and
    calling the dlm, the downconvert thread can try to cancel the lock. The
    downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
    know that ocfs2_dlm_lock() has not yet been called.

    Because ocfs2_dlm_lock() has not yet been called, the cancel operation
    will just be a no-op. There's nothing to cancel. With CANCELGRANT,
    dlmglue uses the CANCELGRANT callback to clear up the cancel state.
    When it comes around again, it will retry the cancel. Eventually, the
    first thread will have called into ocfs2_dlm_lock(), and either the
    lock or the cancel will succeed. The downconvert thread can then do its
    downconvert.

    Without CANCELGRANT, there is nothing to clean up the cancellation
    state. The downconvert thread does not know to retry its operations.
    More importantly, the original lock may be blocking on the other node
    that is trying to cancel us. With neither able to make progress, the
    ast is never called and the cancellation state is never cleaned up that
    way. dlmglue is deadlocked.

    The OCFS2_LOCK_PENDING flag is introduced to remedy this window. It is
    set at the same time OCFS2_LOCK_BUSY is. Thus, the downconvert thread
    can check whether the lock is cancelable. If not, it just loops around
    to try again. Once ocfs2_dlm_lock() is called, the thread then clears
    OCFS2_LOCK_PENDING and wakes the downconvert thread. Now, if the
    downconvert thread finds the lock BUSY, it can safely try to cancel it.
    Whether the cancel works or not, the state will be properly set and the
    lock processing can continue.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
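
The role of the pending flag can be sketched as a small decision function. The flag values, the enum, and the helper names are hypothetical; real dlmglue does this under the lockres spinlock with wait queues, which the sketch leaves out.

```c
/* Placeholder flag bits for illustration. */
#define SKETCH_LOCK_BUSY    0x1u
#define SKETCH_LOCK_PENDING 0x2u

enum cancel_decision {
    CANCEL_NOT_NEEDED,   /* no request in flight */
    CANCEL_RETRY_LATER,  /* request not yet handed to the DLM */
    CANCEL_NOW           /* request is in the DLM, cancel is meaningful */
};

/* A locking thread marks the lockres before dropping the lock on it. */
static unsigned int begin_request(unsigned int flags)
{
    return flags | SKETCH_LOCK_BUSY | SKETCH_LOCK_PENDING;
}

/* Once the DLM call has been made, the request is no longer pending. */
static unsigned int request_submitted(unsigned int flags)
{
    return flags & ~SKETCH_LOCK_PENDING;
}

/* What the downconvert thread decides when it inspects the flags. */
static enum cancel_decision downconvert_check(unsigned int flags)
{
    if (!(flags & SKETCH_LOCK_BUSY))
        return CANCEL_NOT_NEEDED;
    if (flags & SKETCH_LOCK_PENDING)
        return CANCEL_RETRY_LATER;
    return CANCEL_NOW;
}
```

The pending bit covers exactly the window between marking the lockres busy and calling into the DLM, so a cancel is never issued when there is nothing to cancel.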
     
  • The last bit of classic stack used directly in ocfs2 code is o2hb.
    Specifically, the check for heartbeat during mount and the call to
    ocfs2_hb_ctl during unmount.

    We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
    to ocfs2_hb_ctl. Other stacks will just leave hangup() empty.

    The check for heartbeat is moved into ocfs2_cluster_connect(). It will
    be matched by a similar check for other stacks.

    With this change, only stackglue.c includes cluster/ headers.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • ocfs2 asks the cluster stack for the local node's node number for two
    reasons: to fill the slot map and to print it. While the slot map isn't
    necessary for userspace cluster stacks, the printing is very nice for
    debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
    this value. It is anticipated that the slot map will not be used under a
    userspace cluster stack, so validity checks of the node num only need to
    exist in the slot map code. Otherwise, it just gets used and printed as an
    opaque value.

    [ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
    truly opaque. --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • This step introduces a cluster stack agnostic API for initializing and
    exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
    connect to the stack. It is all handled in stackglue.c.

    heartbeat.c no longer needs to know how it gets called.
    ocfs2_do_node_down() is now a clean recovery trigger.

    The big gotcha is the ordering of initializations and de-initializations done
    underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all
    o2dlm initialization in one block. Thus, the o2dlm functionality of
    ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(),
    however, did a few things between de-registration of the eviction
    callback and actually shutting down the domain. Now de-registration and
    shutdown of the domain are wrapped within the single
    ocfs2_cluster_disconnect() call. I've checked the code paths to make
    sure we can safely tear down things in ocfs2_dlm_shutdown() before
    calling ocfs2_cluster_disconnect(). The filesystem has already set
    itself to ignore the callback.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Wrap the lock status block (lksb) in a union. Later we will add a union
    element for the fs/dlm lksb. Create accessors for the status and lvb
    fields.

    Other than a debugging function, dlmglue.c does not directly reference
    the o2dlm locking path anymore.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
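
The union-plus-accessors layout can be sketched as follows. All names and the LVB length are hypothetical; the point is only that callers go through accessors and never touch a DLM-specific member directly.

```c
#define SKETCH_LVB_LEN 64

/* Hypothetical lksb layout for one lock manager. */
struct o2dlm_lksb_sketch {
    int status;
    char lvb[SKETCH_LVB_LEN];
};

/* Wrapper union: another lock manager's lksb could become a second member. */
union lksb_sketch {
    struct o2dlm_lksb_sketch o2dlm;
};

/* Accessors hide which union member is in use. */
static int lksb_status(const union lksb_sketch *lksb)
{
    return lksb->o2dlm.status;
}

static void *lksb_lvb(union lksb_sketch *lksb)
{
    return lksb->o2dlm.lvb;
}
```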
     
  • Change the ocfs2_dlm_lock/unlock() functions to return -errno values.
    This is the first step towards eliminating dlm_status in
    fs/ocfs2/dlmglue.c. The change also passes -errno values to
    ->unlock_ast().

    [ Fix a return code in dlmglue.c and change the error translation table into
    an array of ints. --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
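
A translation table held as an array of ints, as mentioned in the bracketed note above, might look like the sketch below. The status enum and the particular status-to-errno pairs are invented for illustration; they are not the real dlm_status values or the mapping used by the kernel.

```c
#include <errno.h>

/* Hypothetical DLM status codes. */
enum status_sketch {
    ST_NORMAL,
    ST_NOTQUEUED,
    ST_DENIED,
    ST_MAX
};

/* Translation table: index by status, read out a -errno value. */
static const int status_to_errno[ST_MAX] = {
    [ST_NORMAL]    = 0,
    [ST_NOTQUEUED] = -EAGAIN,
    [ST_DENIED]    = -EACCES,
};

static int status_errno(int status)
{
    /* Anything outside the table is reported as an invalid argument. */
    if (status < 0 || status >= ST_MAX)
        return -EINVAL;
    return status_to_errno[status];
}
```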