01 Aug, 2008

1 commit

  • As the fs recovery is asynchronous, there is a small chance that another
    node can mount (and thus recover) the slot before the recovery thread
    gets to it.

    If this happens, the recovery thread will block indefinitely on the
    journal/slot lock as that lock will be held for the duration of the mount
    (by design) by the node assigned to that slot.

    The solution implemented is to keep track of the journal replays using
    a recovery generation in the journal inode, which will be incremented by the
    thread replaying that journal. The recovery thread, before attempting the
    blocking lock on the journal/slot lock, will compare the generation on disk
    with what it has cached and skip recovery if it does not match.

    This bug appears to have been inadvertently introduced during the mount/umount
    vote removal by mainline commit 34d024f84345807bf44163fac84e921513dde323. In the
    mount voting scheme, the messaging would indirectly indicate that the slot
    was being recovered.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
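The recovery-generation check described above can be modelled in a few lines of user-space C. This is a minimal sketch under stated assumptions, not the ocfs2 implementation: the structures, `MAX_SLOTS`, and the helpers `cache_generation()`, `replay_journal()` and `should_recover()` are hypothetical, and locking and disk I/O are left out. What it shows is the ordering: the recovery thread compares its cached generation with the on-disk value before it would take the blocking journal/slot lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 4 /* hypothetical slot count */

/* Hypothetical model of one slot's journal inode on disk. */
struct journal_inode {
	uint32_t recovery_generation; /* bumped by whoever replays this journal */
	bool dirty;                   /* true while the journal needs replay */
};

/* Hypothetical per-node cache of generations, one entry per slot. */
struct node_state {
	uint32_t cached_gen[MAX_SLOTS];
};

/* Sample the on-disk generation, e.g. when the slot is queued for recovery. */
static void cache_generation(struct node_state *ns, int slot,
			     const struct journal_inode *ji)
{
	assert(slot >= 0 && slot < MAX_SLOTS);
	ns->cached_gen[slot] = ji->recovery_generation;
}

/* Replaying a journal bumps its generation, so any node holding an
 * older cached value can tell the replay has already happened. */
static void replay_journal(struct journal_inode *ji)
{
	ji->dirty = false;
	ji->recovery_generation++;
}

/* Checked by the recovery thread *before* taking the blocking
 * journal/slot lock: a changed generation means another node already
 * replayed (and may now hold) this slot, so recovery is skipped. */
static bool should_recover(const struct node_state *ns, int slot,
			   const struct journal_inode *ji)
{
	assert(slot >= 0 && slot < MAX_SLOTS);
	return ns->cached_gen[slot] == ji->recovery_generation;
}
```

If another node mounts the slot first, its replay moves the generation on and `should_recover()` returns false, so the recovery thread never waits on a lock that is held for the lifetime of that node's mount.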
     

15 Jul, 2008

1 commit


18 Apr, 2008

5 commits

  • if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
    side-effects to allow a definition of BUG_ON that drops the code completely.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @ disable unlikely @ expression E,f; @@

    (
      if (<... f(...) ...>) { BUG(); }
    |
    - if (unlikely(E)) { BUG(); }
    + BUG_ON(E);
    )

    @@ expression E,f; @@

    (
      if (<... f(...) ...>) { BUG(); }
    |
    - if (E) { BUG(); }
    + BUG_ON(E);
    )
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: Andrew Morton
    Signed-off-by: Mark Fasheh

    Julia Lawall
     
  • The in-memory slot map uses the same magic as the on-disk one: a special
    value marks a slot as invalid, and that value relies on the size of
    certain types.

    Write a new in-memory map that keeps validity as a separate field. Outside
    of the I/O functions, OCFS2_INVALID_SLOT now means what it is supposed to,
    and it is no longer tied to the type size.

    This also means that only the I/O functions refer to 16-bit quantities.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The old recovery map was a bitmap of node numbers. This was sufficient
    for the maximum node number of 254. Going forward, we want node numbers
    to be UINT32. Thus, we need a new recovery map.

    Note that we can't keep track of slots here. We must write down the
    node number to recover *before* we get the locks needed to convert a
    node number into a slot number.

    The recovery map is now an array of unsigned ints, max_slots in size.
    It moves to journal.c with the rest of recovery.

    Because it needs to be initialized, we move all of recovery initialization
    into a new function, ocfs2_recovery_init(). This actually cleans up
    ocfs2_initialize_super() a little as well. Following on, recovery cleanup
    becomes part of ocfs2_recovery_exit().

    A number of node map functions are rendered obsolete and are removed.

    Finally, waiting on recovery is wrapped in a function rather than naked
    checks on the recovery_event. This is a cleanup from Mark.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Just use osb_lock around the ocfs2_slot_info data. This allows us to
    make the ocfs2_slot_info structure private to slot_map.c. All access
    is now via accessors.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • journal.c and dlmglue.c would refresh the slot map by hand. Instead, have
    the update and clear functions do the work inside slot_map.c. The eventual
    result is to define ocfs2_slot_info privately in slot_map.c.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Mark Fasheh
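The new recovery map from the entries above, an array of unsigned ints that is max_slots in size and holds node numbers rather than slots, can be sketched as follows. The type and the `recovery_map_*()` helpers are hypothetical names loosely echoing ocfs2_recovery_init()/ocfs2_recovery_exit(); this is an illustration of the data structure, not the code in journal.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical recovery map: node numbers still awaiting recovery.
 * Node numbers (not slots) are stored because they must be recorded
 * before the locks needed to map a node to its slot are held. */
struct recovery_map {
	unsigned int used;     /* number of valid entries */
	unsigned int max;      /* capacity, i.e. max_slots */
	unsigned int *entries; /* node numbers; any 32-bit value is allowed */
};

static int recovery_map_init(struct recovery_map *rm, unsigned int max_slots)
{
	rm->used = 0;
	rm->max = max_slots;
	rm->entries = calloc(max_slots, sizeof(*rm->entries));
	return rm->entries ? 0 : -1;
}

static void recovery_map_exit(struct recovery_map *rm)
{
	free(rm->entries);
	rm->entries = NULL;
	rm->used = rm->max = 0;
}

static bool recovery_map_test(const struct recovery_map *rm, unsigned int node)
{
	for (unsigned int i = 0; i < rm->used; i++)
		if (rm->entries[i] == node)
			return true;
	return false;
}

/* Record a node for recovery; duplicates are ignored. */
static void recovery_map_set(struct recovery_map *rm, unsigned int node)
{
	if (recovery_map_test(rm, node))
		return;
	/* The array is sized to max_slots, so it can never overflow. */
	assert(rm->used < rm->max);
	rm->entries[rm->used++] = node;
}

/* Drop a node once recovered; the last entry fills the gap. */
static void recovery_map_clear(struct recovery_map *rm, unsigned int node)
{
	for (unsigned int i = 0; i < rm->used; i++) {
		if (rm->entries[i] == node) {
			rm->entries[i] = rm->entries[--rm->used];
			return;
		}
	}
}
```

A linear scan is adequate here because the map never holds more than max_slots entries, and node numbers well above the old limit of 254 fit without any bitmap sizing concerns.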
     

26 Jan, 2008

4 commits

  • Create separate lockdep lock classes for system files' i_mutexes. They are
    used to guard allocations and similar things and thus rank differently
    than the i_mutex of a regular file or directory.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Mostly taken from ext3. This allows the user to set the jbd commit interval,
    in seconds. The default of 5 seconds stays the same, but now users can
    easily increase the commit interval. Typically, this would be increased in
    order to improve performance at the expense of data safety.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Call this the "inode_lock" now, since it covers both data and meta data.
    This patch makes no functional changes.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The node maps that are set/unset by these votes are no longer relevant, thus
    we can remove the mount and umount votes. Since those are the last two
    remaining votes, we can also remove the entire vote infrastructure.

    The vote thread has been renamed to the downconvert thread, and the small
    amount of functionality related to managing it has been moved into
    fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
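The commit-interval entry above can be illustrated with a small option parser. This sketch assumes a `commit=<seconds>` option in the style of ext3, and, as ext3 does, treats 0 as "use the default"; `TICKS_PER_SEC` is a hypothetical stand-in for the kernel's HZ and `parse_commit_option()` is a made-up helper, not the ocfs2 mount code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_COMMIT_SECS 5   /* default interval named in the entry */
#define TICKS_PER_SEC       100 /* hypothetical stand-in for HZ */

/* Parse one "commit=<seconds>" option (format assumed, as in ext3).
 * Returns the interval in ticks, or -1 for a malformed, negative or
 * overflowing value. A value of 0 selects the default interval. */
static long parse_commit_option(const char *opt)
{
	static const char prefix[] = "commit=";
	const size_t plen = sizeof(prefix) - 1;
	const char *num;
	char *end;
	long secs;

	if (strncmp(opt, prefix, plen) != 0)
		return -1;
	num = opt + plen;
	errno = 0;
	secs = strtol(num, &end, 10);
	if (errno != 0 || end == num || *end != '\0' || secs < 0)
		return -1;
	if (secs == 0)
		secs = DEFAULT_COMMIT_SECS;
	if (secs > LONG_MAX / TICKS_PER_SEC)
		return -1;
	return secs * TICKS_PER_SEC;
}
```

Storing the interval in ticks means the value can be handed straight to a timer, which is why the conversion happens once at parse time.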
     

18 Dec, 2007

3 commits


13 Oct, 2007

2 commits

  • ocfs2_queue_orphans() has an open coded readdir loop which can easily just
    use a directory accessor function.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
     
  • The code for adding, removing, deleting directory entries was splattered all
    over namei.c. I'd rather have this all centralized, so that it's easier to
    make changes for inline dir data, and eventually indexed directories.

    None of the code in any of the functions was changed. I only removed the
    static keyword from some prototypes so that they could be exported.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
     

11 Jul, 2007

1 commit


03 May, 2007

1 commit


27 Apr, 2007

5 commits

  • Older file systems which didn't support holes did a dumb calculation of
    i_blocks based on i_size. This is no longer accurate, so fix things up to
    take actual allocation into account.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Initially, we had wired things to return a hole size of 1. Cook up a
    small amount of code to find the next extent and calculate the number of
    clusters between the virtual offset and the next allocated extent.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Return an optional extent flags field from our lookup functions and wire up
    callers to treat unwritten regions as holes for the purpose of returning
    zeros to the user.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The code in extent_map.c is not prepared to deal with a subtree being
    rotated between lookups. This can happen when filling holes in sparse files.
    Instead of a lengthy patch to update the code (which would likely lose the
    benefit of caching subtree roots), we remove most of the algorithms and
    implement a simple path based lookup. A less ambitious extent caching scheme
    will be added in a later patch.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Ocfs2 currently does cluster-wide node messaging to check the open state of
    an inode during delete. This patch removes that mechanism in favor of an
    inode cluster lock which is taken at shared read when an inode is first read
    and dropped in clear_inode(). This allows a deleting node to test the
    liveness of an inode by attempting to take an exclusive lock.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
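Two of the entries above, sizing a hole as the distance to the next allocated extent and treating unwritten extents as holes when returning data, can be sketched over a flat, sorted extent list. This is a simplification under stated assumptions: ocfs2 walks an on-disk extent tree by path, whereas this sketch does a linear scan, and `struct extent`, `EXT_UNWRITTEN` (modelled on ocfs2's unwritten-extent flag), `end_cpos`, `lookup_cluster()` and `reads_as_zero()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_UNWRITTEN 0x01 /* hypothetical: allocated but never written */

/* Hypothetical flattened extent list entry, sorted by cpos. */
struct extent {
	uint32_t cpos;     /* first virtual cluster covered */
	uint32_t clusters; /* length in clusters */
	uint8_t flags;
};

/* Look up virtual cluster v. If v is allocated, return the clusters left
 * in its extent and that extent's flags. If v is in a hole, return the
 * clusters up to the next allocated extent (or up to end_cpos if none
 * follows) with flags = 0. *is_hole reports which case applied. */
static uint32_t lookup_cluster(const struct extent *ext, size_t n,
			       uint32_t v, uint32_t end_cpos,
			       int *is_hole, uint8_t *flags)
{
	for (size_t i = 0; i < n; i++) {
		if (v < ext[i].cpos) {
			/* v lies before this extent: the hole ends at its start */
			*is_hole = 1;
			*flags = 0;
			return ext[i].cpos - v;
		}
		if (v < ext[i].cpos + ext[i].clusters) {
			*is_hole = 0;
			*flags = ext[i].flags;
			return ext[i].cpos + ext[i].clusters - v;
		}
	}
	/* Past the last extent: the hole runs to end_cpos. */
	*is_hole = 1;
	*flags = 0;
	return end_cpos > v ? end_cpos - v : 0;
}

/* Readers return zeros for holes and unwritten extents alike. */
static int reads_as_zero(int is_hole, uint8_t flags)
{
	return is_hole || (flags & EXT_UNWRITTEN);
}
```

Returning the full hole length instead of 1 lets a caller skip an entire gap in one step rather than one cluster at a time.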
     

08 Dec, 2006

1 commit

  • This allows users to format an ocfs2 file system with a special flag,
    OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it
    will not use any cluster services, nor will it require a cluster
    configuration, thus acting like a 'local' file system.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
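The local-mount entry above comes down to testing an incompat feature bit in the superblock before deciding whether to start cluster services. A minimal sketch: `struct super_info`, `is_local_mount()` and `needs_cluster_stack()` are hypothetical, and the bit value given for OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT is an assumption that should be checked against the ocfs2 on-disk format header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value; verify against the ocfs2 on-disk format header. */
#define OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT 0x0008

/* Hypothetical subset of the superblock. */
struct super_info {
	uint32_t feature_incompat;
};

static bool is_local_mount(const struct super_info *si)
{
	return (si->feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT) != 0;
}

/* Mount would only bring up cluster services (and require a cluster
 * configuration) when the local-mount flag is absent. */
static bool needs_cluster_stack(const struct super_info *si)
{
	return !is_local_mount(si);
}
```

Making it an incompat flag means that code which does not understand the flag should refuse to mount the volume rather than handle it incorrectly.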
     

06 Dec, 2006

1 commit


02 Dec, 2006

11 commits


22 Nov, 2006

1 commit


25 Sep, 2006

1 commit

  • OCFS2 puts inode meta data in the "lock value block" (LVB) provided by the
    DLM. Typically, i_generation is encoded in the lock name so that a deleted
    inode and a new one in the same block don't share the same LVB.

    Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
    is potentially thrown away as soon as the meta data lock is taken - we
    cannot encode the lock name without first knowing i_generation, which
    requires a disk read.

    This patch encodes i_generation in the inode meta data lvb, and removes the
    value from the inode meta data lock name. This way, the read can be covered
    by a lock, and at the same time we can distinguish between an up to date and
    a stale LVB.

    This will help cold-cache stat(2) performance in particular.

    Since this patch changes the protocol version, we take the opportunity to do
    a minor re-organization of two of the LVB fields.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
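The LVB entry above relies on one idea: once i_generation is no longer part of the lock name, the LVB itself must carry the generation so that a reader can reject meta data left behind by a deleted inode that occupied the same block. The sketch below models that check. `struct meta_lvb`, `struct mem_inode`, `LVB_VERSION`, `lvb_store()` and `lvb_refresh()` are hypothetical and do not reflect the real ocfs2 LVB layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LVB_VERSION 2 /* hypothetical protocol version of this layout */

/* Hypothetical lock value block carrying inode meta data. */
struct meta_lvb {
	uint8_t version;
	uint32_t generation; /* i_generation of the inode that wrote it */
	uint64_t size;
	uint32_t mode;
};

/* Hypothetical in-memory inode; the lock name would be derived from
 * blkno alone, so two inodes reusing a block share one LVB. */
struct mem_inode {
	uint64_t blkno;
	uint32_t generation;
	uint64_t size;
	uint32_t mode;
};

/* A writer holding the exclusive lock publishes meta data in the LVB. */
static void lvb_store(struct meta_lvb *lvb, const struct mem_inode *inode)
{
	lvb->version = LVB_VERSION;
	lvb->generation = inode->generation;
	lvb->size = inode->size;
	lvb->mode = inode->mode;
}

/* A reader trusts the LVB only if version and generation match;
 * otherwise it returns false and the caller reads the inode from disk. */
static bool lvb_refresh(const struct meta_lvb *lvb, struct mem_inode *inode)
{
	if (lvb->version != LVB_VERSION || lvb->generation != inode->generation)
		return false;
	inode->size = lvb->size;
	inode->mode = lvb->mode;
	return true;
}
```

Because the lock name no longer depends on i_generation, the initial inode read can happen under the lock, and the generation it yields is what later LVB refreshes are compared against.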
     

30 Jun, 2006

1 commit

  • Get rid of osb->uuid, osb->proc_sub_dir, and osb->osb_id. Those fields were
    unused, or could easily be removed. As a result, we also no longer need
    MAX_OSB_ID or ocfs2_globals_lock.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

28 Jun, 2006

1 commit

  • locking init cleanups:

    - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
    - convert rwlocks in a similar manner

    This patch was generated automatically.

    Motivation:

    - cleanliness
    - lockdep needs control of lock initialization, which the open-coded
    variants do not give
    - it's also useful for -rt and for lock debugging in general

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar