02 Dec, 2011

1 commit

  • The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
    but not 64-bit aligned. The dqc_bitmap is accessed by ocfs2_set_bit(),
    ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit(). These
    are wrapper macros for ext2_*_bit() which need to take an unsigned long
    aligned address (though some architectures are able to handle unaligned
    address correctly)

    So some 64bit architectures may not be able to access the dqc_bitmap
    correctly.

    This avoids such unaligned access by using another wrapper functions for
    ext2_*_bit(). The code is taken from fs/ext4/mballoc.c which also need to
    handle unaligned bitmap access.

    Signed-off-by: Akinobu Mita
    Acked-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Joel Becker

    Akinobu Mita
     

25 Jul, 2011

1 commit


07 Mar, 2011

1 commit

  • mlog_exit is used to record the exit status of a function.
    But because it is added in so many functions, if we enable it,
    the system logs get filled up quickly and cause too much I/O.
    So actually no one can open it for a production system or even
    for a test.

    This patch just try to remove it or change it. So:
    1. if all the error paths already use mlog_errno, it is just removed.
    Otherwise, it will be replaced by mlog_errno.
    2. if it is used to print some return value, it is replaced with
    mlog(0,...).
    mlog_exit_ptr is changed to mlog(0.
    All those mlog(0,...) will be replaced with trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

23 Feb, 2011

1 commit


21 Feb, 2011

1 commit

  • ENTRY is used to record the entry of a function.
    But because it is added in so many functions, if we enable it,
    the system logs get filled up quickly and cause too much I/O.
    So actually no one can open it for a production system or even
    for a test.

    So for mlog_entry_void, we just remove it.
    for mlog_entry(...), we replace it with mlog(0,...), and they
    will be replace by trace event later.

    Signed-off-by: Tao Ma

    Tao Ma
     

09 Jul, 2010

1 commit

  • ocfs2's allocation unit is the cluster. This can be larger than a block
    or even a memory page. This means that a file may have many blocks in
    its last extent that are beyond the block containing i_size. There also
    may be more unwritten extents after that.

    When ocfs2 grows a file, it zeros the entire cluster in order to ensure
    future i_size growth will see cleared blocks. Unfortunately,
    block_write_full_page() drops the pages past i_size. This means that
    ocfs2 is actually leaking garbage data into the tail end of that last
    cluster. This is a bug.

    We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect
    when a write or truncate is past i_size. They will use
    ocfs2_zero_extend() to ensure the data is properly zeroed.

    Older versions of ocfs2_zero_extend() simply zeroed every block between
    i_size and the zeroing position. This presumes three things:

    1) There is allocation for all of these blocks.
    2) The extents are not unwritten.
    3) The extents are not refcounted.

    (1) and (2) hold true for non-sparse filesystems, which used to be the
    only users of ocfs2_zero_extend(). (3) is another bug.

    Since we're now using ocfs2_zero_extend() for sparse filesystems as
    well, we teach ocfs2_zero_extend() to check every extent between
    i_size and the zeroing position. If the extent is unwritten, it is
    ignored. If it is refcounted, it is CoWed. Then it is zeroed.

    Signed-off-by: Joel Becker
    Cc: stable@kernel.org

    Joel Becker
     

22 May, 2010

5 commits

  • We cannot cancel delayed work from ocfs2_local_free_info because that is called
    with dqonoff_mutex held and the work it cancels requires dqonoff_mutex to
    finish. Cancel the work before acquiring dqonoff_mutex.

    Acked-by: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     
  • commit_dqblk() can write quota info to global file. That is actually a bad
    thing to do because if we are just modifying local quota file, we are not
    prepared (do not hold proper locks, do not have transaction credits) to do
    a modification of the global quota file. So do not use commit_dqblk() and
    instead call our writing function directly.

    Acked-by: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     
  • OCFS2 had three issues with quota locking:
    a) When reading dquot from global quota file, we started a transaction while
    holding dqio_mutex which is prone to deadlocks because other paths do it
    the other way around
    b) During ocfs2_sync_dquot we were not protected against concurrent writers
    on the same node. Because we first copy data to local buffer, a race
    could happen resulting in old data being written to global quota file and
    thus causing quota inconsistency after a crash.
    c) ip_alloc_sem of quota files was acquired while a transaction is started
    in ocfs2_quota_write which can deadlock because we first get ip_alloc_sem
    and then start a transaction when extending quota files.

    We fix the problem a) by pulling all necessary code to ocfs2_acquire_dquot
    and ocfs2_release_dquot. Thus we no longer depend on generic dquot_acquire
    to do the locking and can force proper lock ordering.

    Problems b) and c) are fixed by locking i_mutex and ip_alloc_sem of
    global quota file in ocfs2_lock_global_qf and removing ip_alloc_sem from
    ocfs2_quota_read and ocfs2_quota_write.

    Acked-by: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     
  • The position of global quota file info does not change. So we do not have
    to do logical -> physical block translation every time we reread it from
    disk. Thus we can also avoid taking ip_alloc_sem.

    Acked-by: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     
  • There is no need to map offset of local dquot structure to on disk block
    in each quota write. It is enough to map it just once and store the physical
    block number in quota structure in memory. Moreover this simplifies locking
    as we do not have to take ip_alloc_sem from quota write path.

    Acked-by: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     

21 May, 2010

1 commit

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (47 commits)
    ocfs2: Silence a gcc warning.
    ocfs2: Don't retry xattr set in case value extension fails.
    ocfs2:dlm: avoid dlm->ast_lock lockres->spinlock dependency break
    ocfs2: Reset xattr value size after xa_cleanup_value_truncate().
    fs/ocfs2/dlm: Use kstrdup
    fs/ocfs2/dlm: Drop memory allocation cast
    Ocfs2: Optimize punching-hole code.
    Ocfs2: Make ocfs2_find_cpos_for_left_leaf() public.
    Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing.
    Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.
    ocfs2: Block signals for mkdir/link/symlink/O_CREAT.
    ocfs2: Wrap signal blocking in void functions.
    ocfs2/dlm: Increase o2dlm lockres hash size
    ocfs2: Make ocfs2_extend_trans() really extend.
    ocfs2/trivial: Code cleanup for allocation reservation.
    ocfs2: make ocfs2_adjust_resv_from_alloc simple.
    ocfs2: Make nointr a default mount option
    ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE
    o2net: log socket state changes
    ocfs2: print node # when tcp fails
    ...

    Linus Torvalds
     

06 May, 2010

1 commit

  • jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
    since before the kernel moved to git. There is no point in checking
    this error.

    ocfs2_journal_dirty() has been faithfully returning the status since the
    beginning. All over ocfs2, we have blocks of code checking this can't
    fail status. In the past few years, we've tried to avoid adding these
    checks, because they are pointless. But anyone who looks at our code
    assumes they are needed.

    Finally, ocfs2_journal_dirty() is made a void function. All error
    checking is removed from other files. We'll BUG_ON() the status of
    jbd2_journal_dirty_metadata() just in case they change it someday. They
    won't.

    Signed-off-by: Joel Becker

    Joel Becker
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

07 Mar, 2010

1 commit

  • Rename for_each_bit to for_each_set_bit in the kernel source tree. To
    permit for_each_clear_bit(), should that ever be added.

    The patch includes a macro to map the old for_each_bit() onto the new
    for_each_set_bit(). This is a (very) temporary thing to ease the migration.

    [akpm@linux-foundation.org: add temporary for_each_bit()]
    Suggested-by: Alexey Dobriyan
    Suggested-by: Andrew Morton
    Signed-off-by: Akinobu Mita
    Cc: "David S. Miller"
    Cc: Russell King
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

10 Dec, 2009

1 commit


05 Sep, 2009

2 commits

  • The next step in divorcing metadata I/O management from struct inode is
    to pass struct ocfs2_caching_info to the journal functions. Thus the
    journal locks a metadata cache with the cache io_lock function. It also
    can compare ci_last_trans and ci_created_trans directly.

    This is a large patch because of all the places we change
    ocfs2_journal_access..(handle, inode, ...) to
    ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We are really passing the inode into the ocfs2_read/write_blocks()
    functions to get at the metadata cache. This commit passes the cache
    directly into the metadata block functions, divorcing them from the
    inode.

    Signed-off-by: Joel Becker

    Joel Becker
     

24 Jul, 2009

3 commits


04 Jun, 2009

2 commits

  • In ocfs2_finish_quota_recovery() we acquired global quota file lock and started
    recovering local quota file. During this process we need to get quota
    structures, which calls ocfs2_dquot_acquire() which gets global quota file lock
    again. This second lock can block in case some other node has requested the
    quota file lock in the mean time. Fix the problem by moving quota file locking
    down into the function where it is really needed. Then dqget() or dqput()
    won't be called with the lock held.

    Signed-off-by: Jan Kara
    Signed-off-by: Joel Becker

    Jan Kara
     
  • This function is called with dqio_mutex held but it has to acquire lock
    from global quota file which ranks above this lock. This is not deadlockable
    lock inversion since this code path is take only during mount when noone
    else can race with us but let's clean this up to silence lockdep.

    We just drop the dqio_mutex in the beginning of the function and reacquire
    it in the end since we don't need it - noone can race with us at this moment.

    Signed-off-by: Jan Kara
    Signed-off-by: Joel Becker

    Jan Kara
     

06 Jan, 2009

7 commits

  • The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
    commit triggers and allow us to compute metadata ecc right before the
    buffers are written out. This commit provides ecc for inodes, extent
    blocks, group descriptors, and quota blocks. It is not safe to use
    extened attributes and metaecc at the same time yet.

    The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
    the type of block at their root. Before, it didn't matter, but now the
    root block must use the appropriate ocfs2_journal_access_*() function.
    To keep this abstract, the structures now have a pointer to the matching
    journal_access function and a wrapper call to call it.

    A few places use naked ocfs2_write_block() calls instead of adding the
    blocks to the journal. We make sure to calculate their checksum and ecc
    before the write.

    Since we pass around the journal_access functions. Let's typedef them
    in ocfs2.h.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Fix 2 minor things in quota. They are both found by sparse check.
    1. an endian bug in ocfs2_local_quota_add_chunk.
    2. change olq_alloc_dquot to static.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • fs/ocfs2/quota_local.c: In function 'olq_set_dquot':
    fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64'
    fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64'
    fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64'
    fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64'
    fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64'
    fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64'
    fs/ocfs2/quota_global.c: In function '__ocfs2_sync_dquot':
    fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64'
    fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64'
    fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64'
    fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64'
    fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64'
    fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64'

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to
    match ocfs2_read_blocks(). The quota code, converting from
    ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in
    ocfs2_read_quota_block(). Unfortunately, the prototype of
    ocfs2_read_quota_block() matches the old prototype of ocfs2_bread().

    The problem is that ocfs2_bread() returned the buffer head, and callers
    assumed that a NULL pointer was indicative of error. It wasn't. This
    is why ocfs2_bread() took an int*err argument as well.

    The new prototype of ocfs2_read_virt_blocks() avoids this error handling
    confusion. Let's change ocfs2_read_quota_block() to match.

    Signed-off-by: Joel Becker
    Acked-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Implement functions for recovery after a crash. Functions just
    read local quota file and sync info to global quota file.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • This patch creates a work queue for periodic syncing of locally cached quota
    information to the global quota files. We constantly queue a delayed work
    item, to get the periodic behavior.

    Signed-off-by: Mark Fasheh
    Acked-by: Jan Kara

    Mark Fasheh
     
  • For each quota type each node has local quota file. In this file it stores
    changes users have made to disk usage via this node. Once in a while this
    information is synced to global file (and thus with other nodes) so that
    limits enforcement at least aproximately works.

    Global quota files contain all the information about usage and limits. It's
    mostly handled by the generic VFS code (which implements a trie of structures
    inside a quota file). We only have to provide functions to convert structures
    from on-disk format to in-memory one. We also have to provide wrappers for
    various quota functions starting transactions and acquiring necessary cluster
    locks before the actual IO is really started.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara