25 Jul, 2011

2 commits


14 May, 2011

1 commit


31 Mar, 2011

1 commit


07 Mar, 2011

1 commit

  • mlog_exit is used to record the exit status of a function.
    But because it is added in so many functions, if we enable it,
    the system logs get filled up quickly and cause too much I/O.
    So actually no one can open it for a production system or even
    for a test.

    This patch just try to remove it or change it. So:
    1. if all the error paths already use mlog_errno, it is just removed.
    Otherwise, it will be replaced by mlog_errno.
    2. if it is used to print some return value, it is replaced with
    mlog(0,...).
    mlog_exit_ptr is changed to mlog(0.
    All those mlog(0,...) will be replaced with trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

24 Feb, 2011

1 commit


21 Feb, 2011

1 commit

  • ENTRY is used to record the entry of a function.
    But because it is added in so many functions, if we enable it,
    the system logs get filled up quickly and cause too much I/O.
    So actually no one can open it for a production system or even
    for a test.

    So for mlog_entry_void, we just remove it.
    for mlog_entry(...), we replace it with mlog(0,...), and they
    will be replace by trace event later.

    Signed-off-by: Tao Ma

    Tao Ma
     

10 Sep, 2010

3 commits


08 Aug, 2010

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: Adding error check after calling ext4_mb_regular_allocator()
    ext4: Fix dirtying of journalled buffers in data=journal mode
    ext4: re-inline ext4_rec_len_(to|from)_disk functions
    jbd2: Remove t_handle_lock from start_this_handle()
    jbd2: Change j_state_lock to be a rwlock_t
    jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
    ext4: Add mount options in superblock
    ext4: force block allocation on quota_off
    ext4: fix freeze deadlock under IO
    ext4: drop inode from orphan list if ext4_delete_inode() fails
    ext4: check to make make sure bd_dev is set before dereferencing it
    jbd2: Make barrier messages less scary
    ext4: don't print scary messages for allocation failures post-abort
    ext4: fix EFBIG edge case when writing to large non-extent file
    ext4: fix ext4_get_blocks references
    ext4: Always journal quota file modifications
    ext4: Fix potential memory leak in ext4_fill_super
    ext4: Don't error out the fs if the user tries to make a file too big
    ext4: allocate stripe-multiple IOs on stripe boundaries
    ext4: move aio completion after unwritten extent conversion
    ...

    Fix up conflicts in fs/ext4/inode.c as per Ted.

    Fix up xfs conflicts as per earlier xfs merge.

    Linus Torvalds
     

04 Aug, 2010

1 commit


16 Jul, 2010

1 commit

  • OCFS2 uses t_commit trigger to compute and store checksum of the just
    committed blocks. When a buffer has b_frozen_data, checksum is computed
    for it instead of b_data but this can result in an old checksum being
    written to the filesystem in the following scenario:

    1) transaction1 is opened
    2) handle1 is opened
    3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
    4) modify(bh)
    5) journal_dirty(handle1, bh)
    6) handle1 is closed
    7) start committing transaction1, opening transaction2
    8) handle2 is opened
    9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
    jh->b_next_transaction is set to transaction2.
    10) jbd2_journal_write_metadata() checksums b_frozen_data
    11) the journal correctly writes b_frozen_data to the disk journal
    12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
    any more journal operation
    13) Checkpointing finally happens, and it just spools the bh via normal buffer
    writeback. This will write b_data, which was never triggered on and thus
    contains a wrong (old) checksum.

    This patch fixes the problem by calling the trigger at the moment data is
    frozen for journal commit - i.e., either when b_frozen_data is created by
    do_get_write_access or just before we write a buffer to the log if
    b_frozen_data does not exist. We also rename the trigger to t_frozen as
    that better describes when it is called.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Jan Kara
     

16 Jun, 2010

1 commit

  • We used to let orphan scan work in the default work queue,
    but there is a corner case which will make the system deadlock.
    The scenario is like this:
    1. set heartbeat threadshold to 200. this will allow us to have a
    great chance to have a orphan scan work before our quorum decision.
    2. mount node 1.
    3. after 1~2 minutes, mount node 2(in order to make the bug easier
    to reproduce, better add maxcpus=1 to kernel command line).
    4. node 1 do orphan scan work.
    5. node 2 do orphan scan work.
    6. node 1 do orphan scan work. After this, node 1 hold the orphan scan
    lock while node 2 know node 1 is the master.
    7. ifdown eth2 in node 2(eth2 is what we do ocfs2 interconnection).

    Now when node 2 begins orphan scan, the system queue is blocked.

    The root cause is that both orphan scan work and quorum decision work
    will use the system event work queue. orphan scan has a chance of
    blocking the event work queue(in dlm_wait_for_node_death) so that there
    is no chance for quorum decision work to proceed.

    This patch resolve it by moving orphan scan work to ocfs2_wq.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     

06 May, 2010

2 commits

  • In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's
    blocks. But if jbd2_journal_extend() fails, it will only restart
    with the the new number of blocks. This tends to be awkward since
    in most cases we want additional reserved blocks. It makes our code
    harder to mantain since the caller can't be sure all the original
    blocks will not be accessed and dirtied again. There are 15 callers
    of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add
    h_buffer_credits before they call ocfs2_extend_trans(). This makes
    ocfs2_extend_trans() really extend atop the original block count.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     
  • jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
    since before the kernel moved to git. There is no point in checking
    this error.

    ocfs2_journal_dirty() has been faithfully returning the status since the
    beginning. All over ocfs2, we have blocks of code checking this can't
    fail status. In the past few years, we've tried to avoid adding these
    checks, because they are pointless. But anyone who looks at our code
    assumes they are needed.

    Finally, ocfs2_journal_dirty() is made a void function. All error
    checking is removed from other files. We'll BUG_ON() the status of
    jbd2_journal_dirty_metadata() just in case they change it someday. They
    won't.

    Signed-off-by: Joel Becker

    Joel Becker
     

26 Jan, 2010

1 commit


04 Dec, 2009

1 commit

  • That is "success", "unknown", "through", "performance", "[re|un]mapping"
    , "access", "default", "reasonable", "[con]currently", "temperature"
    , "channel", "[un]used", "application", "example","hierarchy", "therefore"
    , "[over|under]flow", "contiguous", "threshold", "enough" and others.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Jiri Kosina

    André Goddard Rosa
     

23 Sep, 2009

1 commit


05 Sep, 2009

2 commits

  • The next step in divorcing metadata I/O management from struct inode is
    to pass struct ocfs2_caching_info to the journal functions. Thus the
    journal locks a metadata cache with the cache io_lock function. It also
    can compare ci_last_trans and ci_created_trans directly.

    This is a large patch because of all the places we change
    ocfs2_journal_access..(handle, inode, ...) to
    ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We are really passing the inode into the ocfs2_read/write_blocks()
    functions to get at the metadata cache. This commit passes the cache
    directly into the metadata block functions, divorcing them from the
    inode.

    Signed-off-by: Joel Becker

    Joel Becker
     

09 Jul, 2009

1 commit

  • If the mount fails for any reason, ocfs2_dismount_volume calls
    ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init
    be called to setup the mutex and work queues, but that doesn't
    happen if the mount has failed and we oops accessing an uninitialized
    work queue.

    This patch splits the init and startup of the orphan scan, eliminating
    the oops.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Joel Becker

    Jeff Mahoney
     

23 Jun, 2009

3 commits


04 Jun, 2009

2 commits

  • Patch to track delayed orphan scan timer statistics.

    Modifies ocfs2_osb_dump to print the following:
    Orphan Scan=> Local: 10 Global: 21 Last Scan: 67 seconds ago

    Signed-off-by: Srinivas Eeda
    Signed-off-by: Joel Becker

    Srinivas Eeda
     
  • When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
    before moving the dentry to the orphan directory. Other nodes that have
    this dentry in cache have a PR on the same dentry lock. When the EX is
    requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
    during downconvert. The inode is finally deleted when the last node to iput
    the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

    A problem arises if a node is forced to free dentry locks because of memory
    pressure. If this happens, the node will no longer get downconvert
    notifications for the dentries that have been unlinked on another node.
    If it also happens that node is actively using the corresponding inode and
    happens to be the one performing the last iput on that inode, it will fail
    to delete the inode as it will not have the MAYBE_ORPHANED flag set.

    This patch fixes this shortcoming by introducing a periodic scan of the
    orphan directories to delete such inodes. Care has been taken to distribute
    the workload across the cluster so that no one node has to perform the task
    all the time.

    Signed-off-by: Srinivas Eeda
    Signed-off-by: Joel Becker

    Srinivas Eeda
     

04 Apr, 2009

3 commits

  • During recovery, a node recovers orphans in it's slot and the dead node(s). But
    if the dead nodes were holding orphans in offline slots, they will be left
    unrecovered.

    If the dead node is the last one to die and is holding orphans in other slots
    and is the first one to mount, then it only recovers it's own slot, which
    leaves orphans in offline slots.

    This patch queues complete_recovery to clean orphans for all offline slots
    during mount and node recovery.

    Signed-off-by: Srinivas Eeda
    Acked-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Srinivas Eeda
     
  • This patch makes use of Ocfs2's flexible btree code to add an additional
    tree to directory inodes. The new tree stores an array of small,
    fixed-length records in each leaf block. Each record stores a hash value,
    and pointer to a block in the traditional (unindexed) directory tree where a
    dirent with the given name hash resides. Lookup exclusively uses this tree
    to find dirents, thus providing us with constant time name lookups.

    Some of the hashing code was copied from ext3. Unfortunately, it has lots of
    unfixed checkpatch errors. I left that as-is so that tracking changes would
    be easier.

    Signed-off-by: Mark Fasheh
    Acked-by: Joel Becker

    Mark Fasheh
     
  • Move the definition of struct recovery_map from journal.c to journal.h. This
    is preparation for the next patch.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     

06 Jan, 2009

9 commits

  • Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the
    dirblocks.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
    commit triggers and allow us to compute metadata ecc right before the
    buffers are written out. This commit provides ecc for inodes, extent
    blocks, group descriptors, and quota blocks. It is not safe to use
    extened attributes and metaecc at the same time yet.

    The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
    the type of block at their root. Before, it didn't matter, but now the
    root block must use the appropriate ocfs2_journal_access_*() function.
    To keep this abstract, the structures now have a pointer to the matching
    journal_access function and a wrapper call to call it.

    A few places use naked ocfs2_write_block() calls instead of adding the
    blocks to the journal. We make sure to calculate their checksum and ecc
    before the write.

    Since we pass around the journal_access functions. Let's typedef them
    in ocfs2.h.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We create wrappers for ocfs2_journal_access() that are specific to the
    type of metadata block. This allows us to associate jbd2 commit
    triggers with the block. The triggers will compute metadata ecc in a
    future commit.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Enable quota usage tracking on mount and disable it on umount. Also
    add support for quota on and quota off quotactls and usrquota and
    grpquota mount options. Add quota features among supported ones.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Implement functions for recovery after a crash. Functions just
    read local quota file and sync info to global quota file.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • OCFS2 can easily support nested transactions. We just have to
    take care and not spoil statistics acquire semaphore unnecessarily.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • JBD2 is fully backwards compatible with JBD and it's been tested enough with
    Ocfs2 that we can clean this code up now.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Random places in the code would check a dinode bh to see if it was
    valid. Not only did they do different levels of validation, they
    handled errors in different ways.

    The previous commit unified inode block reads, validating all block
    reads in the same place. Thus, these haphazard checks are no longer
    necessary. Rather than eliminate them, however, we change them to
    BUG_ON() checks. This ensures the assumptions remain true. All of the
    code paths to these checks have been audited to ensure they come from a
    validated inode read.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The ocfs2 code currently reads inodes off disk with a simple
    ocfs2_read_block() call. Each place that does this has a different set
    of sanity checks it performs. Some check only the signature. A couple
    validate the block number (the block read vs di->i_blkno). A couple
    others check for VALID_FL. Only one place validates i_fs_generation. A
    couple check nothing. Even when an error is found, they don't all do
    the same thing.

    We wrap inode reading into ocfs2_read_inode_block(). This will validate
    all the above fields, going readonly if they are invalid (they never
    should be). ocfs2_read_inode_block_full() is provided for the places
    that want to pass read_block flags. Every caller is passing a struct
    inode with a valid ip_blkno, so we don't need a separate blkno argument
    either.

    We will remove the validation checks from the rest of the code in a
    later commit, as they are no longer necessary.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     

11 Nov, 2008

1 commit

  • Patch sets journal descriptor to NULL after the journal is shutdown.
    This ensures that jbd2_journal_release_jbd_inode(), which removes the
    jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode()
    even after the journal has been shutdown.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Sunil Mushran