04 Jan, 2012

1 commit


02 Dec, 2011

1 commit

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
    ocfs2: avoid unaligned access to dqc_bitmap
    ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
    ocfs2: honor O_(D)SYNC flag in fallocate
    ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
    ocfs2: send correct UUID to cleancache initialization
    ocfs2: Commit transactions in error cases -v2
    ocfs2: make direntry invalid when deleting it
    fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
    ocfs2: Avoid livelock in ocfs2_readpage()
    ocfs2: serialize unaligned aio
    ocfs2: Implement llseek()
    ocfs2: Fix ocfs2_page_mkwrite()
    ocfs2: Add comment about orphan scanning
    ocfs2: Clean up messages in the fs
    ocfs2/cluster: Cluster up now includes network connections too
    ocfs2/cluster: Add new function o2net_fill_node_map()
    ocfs2/cluster: Fix output in file elapsed_time_in_ms
    ocfs2/dlm: dlmlock_remote() needs to account for remastery
    ocfs2/dlm: Take inflight reference count for remotely mastered resources too
    ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
    ...

    Linus Torvalds
     

17 Nov, 2011

1 commit

  • Three error paths were found in which journal transactions are neither
    committed nor aborted. We should handle these cases by committing the
    transactions. Otherwise a journal handle is left behind, and the next
    ocfs2_start_trans() in the same process context gets the wrong credits.

    Signed-off-by: Wengang Wang
    Signed-off-by: Joel Becker

    Wengang Wang
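
The effect described in the commit above can be illustrated with a toy model. This is only a sketch: `toy_handle`, `toy_start_trans()`, `toy_commit_trans()` and `toy_op()` are hypothetical names, not ocfs2 or jbd2 functions, and the model assumes that a handle still open in the current context is reused by the next start call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a journal handle; not a real jbd2/ocfs2 type. */
struct toy_handle {
    int credits;   /* blocks this handle may dirty */
    int refs;      /* nesting count */
};

/* One "current handle" per process context (assumption of this model). */
static struct toy_handle *toy_current;
static struct toy_handle toy_storage;

/*
 * Start a transaction. If a handle is still open it is reused as-is,
 * so the caller silently gets the old credits, not the requested ones.
 */
static struct toy_handle *toy_start_trans(int credits)
{
    if (toy_current) {
        toy_current->refs++;
        return toy_current;
    }
    toy_storage.credits = credits;
    toy_storage.refs = 1;
    toy_current = &toy_storage;
    return toy_current;
}

/* Drop one reference; the handle is closed when the last one goes. */
static void toy_commit_trans(struct toy_handle *h)
{
    assert(h == toy_current && h->refs > 0);
    if (--h->refs == 0)
        toy_current = NULL;
}

/* An operation that may fail after it has started its transaction. */
static int toy_op(int credits, int fail, int commit_on_error)
{
    struct toy_handle *h = toy_start_trans(credits);

    if (fail) {
        if (commit_on_error)
            toy_commit_trans(h);   /* the fix: do not leave the handle open */
        return -1;
    }
    toy_commit_trans(h);
    return 0;
}
```

With `commit_on_error` set to 0, a failed `toy_op(2, ...)` leaves the handle open, and a later `toy_start_trans(10)` returns a handle that still carries 2 credits.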
     

19 Jul, 2011

1 commit

  • This patch changes the security_inode_init_security API by adding a
    filesystem specific callback to write security extended attributes.
    This change is in preparation for supporting the initialization of
    multiple LSM xattrs and the EVM xattr. Initially the callback function
    walks an array of xattrs, writing each xattr separately, but could be
    optimized to write multiple xattrs at once.

    For existing security_inode_init_security() calls, which have not yet
    been converted to use the new callback function, such as those in
    reiserfs and ocfs2, this patch defines security_old_inode_init_security().

    Signed-off-by: Mimi Zohar

    Mimi Zohar
     

31 Mar, 2011

1 commit


29 Mar, 2011

1 commit

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (39 commits)
    Treat writes as new when holes span across page boundaries
    fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS.
    ocfs2/dlm: Move kmalloc() outside the spinlock
    ocfs2: Make the left masklogs compat.
    ocfs2: Remove masklog ML_AIO.
    ocfs2: Remove masklog ML_UPTODATE.
    ocfs2: Remove masklog ML_BH_IO.
    ocfs2: Remove masklog ML_JOURNAL.
    ocfs2: Remove masklog ML_EXPORT.
    ocfs2: Remove masklog ML_DCACHE.
    ocfs2: Remove masklog ML_NAMEI.
    ocfs2: Remove mlog(0) from fs/ocfs2/dir.c
    ocfs2: remove NAMEI from symlink.c
    ocfs2: Remove masklog ML_QUOTA.
    ocfs2: Remove mlog(0) from quota_local.c.
    ocfs2: Remove masklog ML_RESERVATIONS.
    ocfs2: Remove masklog ML_XATTR.
    ocfs2: Remove masklog ML_SUPER.
    ocfs2: Remove mlog(0) from fs/ocfs2/heartbeat.c
    ocfs2: Remove mlog(0) from fs/ocfs2/slot_map.c
    ...

    Fix up trivial conflict in fs/ocfs2/super.c

    Linus Torvalds
     

07 Mar, 2011

1 commit

  • mlog_exit is used to record the exit status of a function.
    But because it is added to so many functions, enabling it
    fills the system logs quickly and causes too much I/O.
    So in practice no one can enable it on a production system,
    or even on a test system.

    This patch removes it or changes it:
    1. If all the error paths already use mlog_errno, it is just removed.
    Otherwise, it is replaced by mlog_errno.
    2. If it is used to print some return value, it is replaced with
    mlog(0,...).
    mlog_exit_ptr is changed to mlog(0,...).
    All those mlog(0,...) will be replaced with trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

23 Feb, 2011

1 commit


21 Feb, 2011

1 commit

  • ENTRY is used to record the entry of a function.
    But because it is added to so many functions, enabling it
    fills the system logs quickly and causes too much I/O.
    So in practice no one can enable it on a production system,
    or even on a test system.

    For mlog_entry_void, we just remove it.
    For mlog_entry(...), we replace it with mlog(0,...), which
    will be replaced by trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

02 Feb, 2011

1 commit

  • SELinux would like to implement a new labeling behavior of newly created
    inodes. We currently label new inodes based on the parent and the creating
    process. This new behavior would also take into account the name of the
    new object when deciding the new label. This is not the (supposed) full path,
    just the last component of the path.

    This is very useful because creating /etc/shadow is different than creating
    /etc/passwd but the kernel hooks are unable to differentiate these
    operations. We currently require that userspace realize it is doing some
    difficult operation like that and then userspace jumps through SELinux hoops
    to get things set up correctly. This patch does not implement new
    behavior, that is obviously contained in a separate SELinux patch, but it
    does pass the needed name down to the correct LSM hook. If no such name
    exists it is fine to pass NULL.

    Signed-off-by: Eric Paris

    Eric Paris
     

16 Oct, 2010

1 commit


11 Sep, 2010

1 commit

  • As the name shows, we shouldn't take any lock in
    ocfs2_xattr_get_nolock, so lift ip_xattr_sem to the caller.
    This should be safe for us since the only 2 callers are:
    1. ocfs2_xattr_get, which will lock the resources.
    2. ocfs2_mknod, which doesn't need this locking.

    And this also resolves the following lockdep warning.

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.35+ #5
    -------------------------------------------------------
    reflink/30027 is trying to acquire lock:
    (&oi->ip_alloc_sem){+.+.+.}, at: [] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2]

    but task is already holding lock:
    (&oi->ip_xattr_sem){++++..}, at: [] ocfs2_reflink_ioctl+0x68b/0x1226 [ocfs2]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #3 (&oi->ip_xattr_sem){++++..}:
    [] __lock_acquire+0x79a/0x7f1
    [] lock_acquire+0xc6/0xed
    [] down_read+0x34/0x47
    [] ocfs2_xattr_get_nolock+0xa0/0x4e6 [ocfs2]
    [] ocfs2_get_acl_nolock+0x5c/0x132 [ocfs2]
    [] ocfs2_init_acl+0x60/0x243 [ocfs2]
    [] ocfs2_mknod+0xae8/0xfea [ocfs2]
    [] ocfs2_create+0x9d/0x105 [ocfs2]
    [] vfs_create+0x9b/0xf4
    [] do_last+0x2fd/0x5be
    [] do_filp_open+0x1fb/0x572
    [] do_sys_open+0x5a/0xe7
    [] sys_open+0x1b/0x1d
    [] system_call_fastpath+0x16/0x1b

    -> #2 (jbd2_handle){+.+...}:
    [] __lock_acquire+0x79a/0x7f1
    [] lock_acquire+0xc6/0xed
    [] start_this_handle+0x4a3/0x4bc [jbd2]
    [] jbd2__journal_start+0xba/0xee [jbd2]
    [] jbd2_journal_start+0xe/0x10 [jbd2]
    [] ocfs2_start_trans+0xb7/0x19b [ocfs2]
    [] ocfs2_mknod+0x73e/0xfea [ocfs2]
    [] ocfs2_create+0x9d/0x105 [ocfs2]
    [] vfs_create+0x9b/0xf4
    [] do_last+0x2fd/0x5be
    [] do_filp_open+0x1fb/0x572
    [] do_sys_open+0x5a/0xe7
    [] sys_open+0x1b/0x1d
    [] system_call_fastpath+0x16/0x1b

    -> #1 (&journal->j_trans_barrier){.+.+..}:
    [] __lock_acquire+0x79a/0x7f1
    [] lock_release_non_nested+0x1e5/0x24b
    [] lock_release+0x158/0x17a
    [] __mutex_unlock_slowpath+0xbf/0x11b
    [] mutex_unlock+0x9/0xb
    [] ocfs2_free_ac_resource+0x31/0x67 [ocfs2]
    [] ocfs2_free_alloc_context+0x11/0x1d [ocfs2]
    [] ocfs2_write_begin_nolock+0x141e/0x159b [ocfs2]
    [] ocfs2_write_begin+0x11e/0x1e7 [ocfs2]
    [] generic_file_buffered_write+0x10c/0x210
    [] ocfs2_file_aio_write+0x4cc/0x6d3 [ocfs2]
    [] do_sync_write+0xc2/0x106
    [] vfs_write+0xae/0x131
    [] sys_write+0x47/0x6f
    [] system_call_fastpath+0x16/0x1b

    -> #0 (&oi->ip_alloc_sem){+.+.+.}:
    [] validate_chain+0x727/0xd68
    [] __lock_acquire+0x79a/0x7f1
    [] lock_acquire+0xc6/0xed
    [] down_write+0x31/0x52
    [] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2]
    [] ocfs2_ioctl+0x61a/0x656 [ocfs2]
    [] vfs_ioctl+0x2a/0x9d
    [] do_vfs_ioctl+0x45d/0x4ae
    [] sys_ioctl+0x57/0x7a
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     

13 Jul, 2010

2 commits

  • The new reservation code in local alloc has added the limitation
    that the caller must handle the case where the local alloc
    doesn't give us enough contiguous clusters. This breaks the old
    xattr reflink code.

    So this patch updates the xattr reflink code so that it can
    handle the case where local alloc gives us one cluster at a time.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     
  • The old ocfs2_xattr_extent_allocation is too optimistic about
    the clusters we can get. If the file system is too fragmented,
    ocfs2_add_clusters_in_btree will return EAGAIN and we need to
    allocate clusters once again.

    So this patch changes it to a while loop so that we can allocate
    clusters until we reach clusters_to_add.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker
    Cc: stable@kernel.org

    Tao Ma
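
The retry loop described in the commit above can be sketched as follows. This is a toy model, not the ocfs2 code: `toy_allocator`, `toy_add_clusters()` and `toy_extend_allocation()` are hypothetical, and the allocator is assumed to hand out at most `max_contig` clusters per call and to report a partial allocation with -EAGAIN.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical allocator state for a fragmented volume. */
struct toy_allocator {
    unsigned int free_clusters;   /* total clusters left */
    unsigned int max_contig;      /* largest run a single call can return */
};

/*
 * Try to add `want` clusters. Returns 0 if all were added, -EAGAIN if only
 * part was added and the caller should ask again, -ENOSPC if nothing can
 * be allocated. *got receives the number actually added.
 */
static int toy_add_clusters(struct toy_allocator *a, unsigned int want,
                            unsigned int *got)
{
    unsigned int n = want;

    *got = 0;
    if (a->free_clusters == 0 || a->max_contig == 0)
        return -ENOSPC;
    if (n > a->max_contig)
        n = a->max_contig;
    if (n > a->free_clusters)
        n = a->free_clusters;
    a->free_clusters -= n;
    *got = n;
    return n < want ? -EAGAIN : 0;
}

/* Keep allocating until clusters_to_add is satisfied or a real error hits. */
static int toy_extend_allocation(struct toy_allocator *a,
                                 unsigned int clusters_to_add,
                                 unsigned int *added)
{
    *added = 0;
    while (clusters_to_add > 0) {
        unsigned int got = 0;
        int ret = toy_add_clusters(a, clusters_to_add, &got);

        *added += got;
        clusters_to_add -= got;
        if (ret && ret != -EAGAIN)
            return ret;           /* e.g. -ENOSPC: give up */
    }
    return 0;
}
```

A single call that treats -EAGAIN as final would stop after the first short run; the loop keeps going as long as each call makes progress.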
     

22 May, 2010

1 commit


19 May, 2010

2 commits

  • In a normal xattr set, the set sequence is inode, xattr block
    and finally xattr bucket if we meet with an ENOSPC. But there
    is a corner case.
    Consider that we set an xattr whose value will be stored in
    a cluster, and there is no xattr block yet. So we
    reserve 1 xattr block and 1 cluster for setting it. Now if we
    fail in value extension (in case the volume is almost full and
    we can't allocate the cluster because of the check in
    ocfs2_test_bg_bit_allocatable), ENOSPC will be returned. So
    we will try to create a bucket (this time there is a chance that
    the reserved cluster will be used), and when we try value extension
    again, a kernel bug happens. We did meet with it. See the bug below.
    http://oss.oracle.com/bugzilla/show_bug.cgi?id=1251

    This patch tries to avoid this by adding a set_abort in
    ocfs2_xattr_set_ctxt, so in case ENOSPC happens in value extension,
    we check whether it is caused by a real ENOSPC or just by a
    full inode or xattr block. In the first case, we set set_abort
    so that we don't try any further. We are safe to exit directly here
    since it is really ENOSPC.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
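
The set_abort idea from the commit above can be sketched with a toy fallback chain. Everything here is hypothetical (`toy_set_ctxt`, `toy_result`, `toy_xattr_set()`); the sketch only assumes that each storage location can tell a full location apart from a full volume.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-set context carrying the abort flag. */
struct toy_set_ctxt {
    int set_abort;
};

/* What one storage location reports for a single attempt. */
enum toy_result { TOY_OK, TOY_LOCATION_FULL, TOY_VOLUME_FULL };

/* Both "full" cases return -ENOSPC; only a full volume sets set_abort. */
static int toy_try_location(enum toy_result r, struct toy_set_ctxt *ctxt)
{
    switch (r) {
    case TOY_OK:
        return 0;
    case TOY_VOLUME_FULL:
        ctxt->set_abort = 1;
        return -ENOSPC;
    default:
        return -ENOSPC;
    }
}

/*
 * Try inode, then xattr block, then bucket. Move on to the next location
 * only when -ENOSPC means "this location is full"; stop at once when the
 * volume itself is out of space. *tried counts the locations attempted.
 */
static int toy_xattr_set(const enum toy_result results[3], int *tried)
{
    struct toy_set_ctxt ctxt = { 0 };
    int ret = -ENOSPC;

    *tried = 0;
    for (int i = 0; i < 3; i++) {
        (*tried)++;
        ret = toy_try_location(results[i], &ctxt);
        if (ret != -ENOSPC || ctxt.set_abort)
            break;
    }
    return ret;
}
```

The flag is needed because the error code alone cannot tell the two cases apart.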
     
  • In ocfs2_prepare_xattr_entry, if we fail to grow an existing value,
    xa_cleanup_value_truncate() will leave the old entry in place. Thus, we
    reset its value size. However, if we were allocating a new value, we
    must not reset the value size or we will BUG(). This resolves
    oss.oracle.com bug 1247.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     

06 May, 2010

3 commits

  • They all take an ocfs2_alloc_context, which has the allocation inode.

    Signed-off-by: Joel Becker
    Signed-off-by: Tao Ma

    Joel Becker
     
  • In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's
    blocks. But if jbd2_journal_extend() fails, it will only restart
    with the new number of blocks. This tends to be awkward since
    in most cases we want additional reserved blocks. It makes our code
    harder to maintain since the caller can't be sure all the original
    blocks will not be accessed and dirtied again. There are 15 callers
    of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add
    h_buffer_credits before they call ocfs2_extend_trans(). This makes
    ocfs2_extend_trans() really extend atop the original block count.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
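
The difference between the old and new behaviour in the commit above can be sketched with a toy handle. The names (`toy_handle`, `toy_journal_extend()`, `toy_journal_restart()`, `toy_extend_trans_old()`, `toy_extend_trans_new()`) are hypothetical, and `room` is an assumed stand-in for the space left in the running transaction.

```c
#include <assert.h>

/* Toy journal handle; not the real jbd2 handle_t. */
struct toy_handle {
    int credits;    /* blocks the handle may still dirty */
    int restarts;   /* how often the handle was restarted */
};

/* Extend in place if the running transaction has room; nonzero on failure. */
static int toy_journal_extend(struct toy_handle *h, int nblocks, int room)
{
    if (nblocks > room)
        return 1;
    h->credits += nblocks;
    return 0;
}

/* Restart the handle with exactly nblocks credits. */
static void toy_journal_restart(struct toy_handle *h, int nblocks)
{
    h->credits = nblocks;
    h->restarts++;
}

/* Old behaviour: on failure, restart with only the new number of blocks. */
static void toy_extend_trans_old(struct toy_handle *h, int nblocks, int room)
{
    if (toy_journal_extend(h, nblocks, room))
        toy_journal_restart(h, nblocks);
}

/* New behaviour: on failure, restart with original credits plus nblocks. */
static void toy_extend_trans_new(struct toy_handle *h, int nblocks, int room)
{
    int old_credits = h->credits;

    if (toy_journal_extend(h, nblocks, room))
        toy_journal_restart(h, old_credits + nblocks);
}
```

With the new variant the caller ends up with the same credit count whether the extend succeeded or a restart was needed, so it no longer has to add the old credits by hand.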
     
  • jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
    since before the kernel moved to git. There is no point in checking
    this error.

    ocfs2_journal_dirty() has been faithfully returning the status since the
    beginning. All over ocfs2, we have blocks of code checking this can't
    fail status. In the past few years, we've tried to avoid adding these
    checks, because they are pointless. But anyone who looks at our code
    assumes they are needed.

    Finally, ocfs2_journal_dirty() is made a void function. All error
    checking is removed from other files. We'll BUG_ON() the status of
    jbd2_journal_dirty_metadata() just in case they change it someday. They
    won't.

    Signed-off-by: Joel Becker

    Joel Becker
     

26 Mar, 2010

1 commit


22 Mar, 2010

1 commit


20 Mar, 2010

2 commits

  • You can't store a pointer that you haven't filled in yet and expect it
    to work.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     
  • When replacing an xattr's value, in some cases we wipe its name/value
    first and then re-add it. The wipe is done by
    ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or
    block. We currently adjust name_offset for all the entries which have
    (offset < name_offset). This does not adjust the entry we're replacing.
    Since we are replacing the entry, we don't adjust the total entry count.
    When we calculate a new namevalue location, we trust the entry's
    now-wrong offset in ocfs2_xa_get_free_start(). The solution is to
    also adjust the name_offset for the replaced entry, allowing
    ocfs2_xa_get_free_start() to calculate the new namevalue location
    correctly.

    The following script can trigger a kernel panic easily.

    echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE
    mount -t ocfs2 $DEVICE $MNT_DIR
    FILE=$MNT_DIR/$RANDOM
    for((i=0;i
    Signed-off-by: Joel Becker

    Tao Ma
     

27 Feb, 2010

16 commits

  • ocfs2 can store extended attribute values as large as a single file. It
    does this using a standard ocfs2 btree for the large value. However,
    the previous code did not handle all error cases cleanly.

    There are several ways this can go wrong.

    1) We have trouble allocating space for a new xattr. This leaves us
    with an empty xattr.
    2) We overwrote an existing local xattr with a value root, and now we
    have an error allocating the storage. This leaves us an empty xattr
    where there used to be a value. The value is lost.
    3) We have trouble truncating a reused value. This leaves us with the
    original entry pointing to the truncated original value. The value
    is lost.
    4) We have trouble extending the storage on a reused value. This leaves
    us with the original value safely in place, but with more storage
    allocated than needed.

    This doesn't consider storing local xattrs (values that don't require a
    btree). Those only fail when the journal fails.

    Case (1) is easy. We just remove the xattr we added. We leak the
    storage because we can't safely remove it, but otherwise everything is
    happy. We'll print a warning about the leak.

    Case (4) is easy. We still have the original value in place. We can
    just leave the extra storage attached to this xattr. We return the
    error, but the old value is untouched. We print a warning about the
    storage.

    Case (2) and (3) are hard because we've lost the original values. In
    the old code, we ended up with values that could be partially read.
    That's not good. Instead, we just wipe the xattr entry and leak the
    storage. It stinks that the original value is lost, but now there isn't
    a partial value to be read. We'll print a big fat warning.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • ocfs2_xattr_ibody_set() is the only remaining user of
    ocfs2_xattr_set_entry(). ocfs2_xattr_set_entry() actually does two
    things: it calls ocfs2_xa_set(), and it initializes the inline xattrs.
    Initializing the inline space really belongs in its own call.

    We lift the initialization to ocfs2_xattr_ibody_init(), called from
    ocfs2_xattr_ibody_set() only when necessary. Now
    ocfs2_xattr_ibody_set() can call ocfs2_xa_set() directly.
    ocfs2_xattr_set_entry() goes away.

    Another nice fact is that ocfs2_init_dinode_xa_loc() can trust
    i_xattr_inline_size.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • ocfs2_xattr_block_set() calls into ocfs2_xattr_set_entry() with just the
    HAS_XATTR flag. Most of the machinery of ocfs2_xattr_set_entry() is
    skipped. All that really happens other than the call to ocfs2_xa_set()
    is making sure the HAS_XATTR flag is set on the inode.

    But HAS_XATTR should be set when we also set di->i_xattr_loc. And
    that's done in ocfs2_create_xattr_block(). So let's move it there, and
    then ocfs2_xattr_block_set() can just call ocfs2_xa_set().

    While we're there, ocfs2_create_xattr_block() can take the set_ctxt for
    a smaller argument list. It also learns to set HAS_XATTR_FL, because it
    knows for sure. ocfs2_create_empty_xattr_block() in the reflink path
    fakes a set_ctxt to call ocfs2_create_xattr_block().

    Signed-off-by: Joel Becker

    Joel Becker
     
  • ocfs2_xattr_set_in_bucket() doesn't need to do its own hacky space
    checking. Let's let ocfs2_xa_prepare_entry() (via ocfs2_xa_set()) do
    the more accurate work. Whenever it doesn't have space,
    ocfs2_xattr_set_in_bucket() can try to get more space.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • ocfs2_xa_set() wraps the ocfs2_xa_prepare_entry()/ocfs2_xa_store_value()
    logic. Both callers can now use the same routine. ocfs2_xa_remove()
    moves directly into ocfs2_xa_set().

    Signed-off-by: Joel Becker

    Joel Becker
     
  • ocfs2_xa_prepare_entry() gets all the logic to add, remove, or modify
    external value trees. Now, when it exits, the entry is ready to receive
    a value of any size.

    ocfs2_xa_remove() is added to handle the complete removal of an entry.
    It truncates the external value tree before calling
    ocfs2_xa_remove_entry().

    ocfs2_xa_store_inline_value() becomes ocfs2_xa_store_value(). It can
    store any value.

    ocfs2_xattr_set_entry() loses all the allocation logic and just uses
    these functions. ocfs2_xattr_set_value_outside() disappears.

    ocfs2_xattr_set_in_bucket() uses these functions and makes
    ocfs2_xattr_set_entry_in_bucket() obsolete. That goes away, as does
    ocfs2_xattr_bucket_set_value_outside() and
    ocfs2_xattr_bucket_value_truncate().

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We're going to want to make sure our buffers get accessed and dirtied
    correctly. So have the xa_loc do the work. This includes storing the
    inode on ocfs2_xa_loc.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We use the ocfs2_xattr_value_buf structure to manage external values.
    It lets the value tree code do its work regardless of the containing
    storage. ocfs2_xa_fill_value_buf() initializes a value buf from an
    ocfs2_xa_loc entry.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • Previously the xattr code would send in a fake value, containing a tree
    root, to the function that installed name+value pairs. Instead, we pass
    the real value to ocfs2_xa_set_inline_value(), and it notices that the
    value cannot fit. Thus, it installs a tree root.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We create two new functions on ocfs2_xa_loc, ocfs2_xa_prepare_entry()
    and ocfs2_xa_store_inline_value().

    ocfs2_xa_prepare_entry() makes sure that the xl_entry field of
    ocfs2_xa_loc is ready to receive an xattr. The entry will point to an
    appropriately sized name+value region in storage. If an existing entry
    can be reused, it will be. If no entry already exists, it will be
    allocated. If there isn't space to allocate it, -ENOSPC will be
    returned.

    ocfs2_xa_store_inline_value() stores the data that goes into the 'value'
    part of the name+value pair. For values that don't fit directly, this
    stores the value tree root.

    A number of operations are added to ocfs2_xa_loc_operations to support
    these functions. This reflects the disparate behaviors of xattr blocks
    and buckets.

    With these functions, the overlapping ocfs2_xattr_set_entry_local() and
    ocfs2_xattr_set_entry_normal() can be replaced with a single call
    scheme.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • An ocfs2 xattr entry stores the text name and value as a pair in the
    storage area. Obviously names and values can be variable-sized. If a
    value is too large for the entry storage, a tree root is stored instead.
    The name+value pair is also padded.

    Because of this, there are a million places in the code that do:

        if (needs_external_tree(value_size))
            namevalue_size = pad(name_size) + tree_root_size;
        else
            namevalue_size = pad(name_size) + pad(value_size);

    Let's create some convenience functions to make the code more readable.
    There are three forms. The first takes the raw sizes. The second takes
    an ocfs2_xattr_info structure. The third takes an existing
    ocfs2_xattr_entry.

    Signed-off-by: Joel Becker

    Joel Becker
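
The three forms mentioned in the commit above could look roughly like this. It is a sketch under assumptions: the `toy_` names, the struct layouts and the constants (4-byte padding, an 80-byte inline limit, a 32-byte tree root) are illustrative only; the real values and types live in the ocfs2 sources.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants, not taken from the ocfs2 headers. */
#define TOY_PAD         4u    /* alignment of names and values */
#define TOY_INLINE_MAX  80u   /* largest value stored next to its name */
#define TOY_ROOT_SIZE   32u   /* size of an external value tree root */

/* Hypothetical stand-ins for the xattr info and on-disk entry structures. */
struct toy_xattr_info {
    size_t name_len;
    size_t value_len;
};

struct toy_xattr_entry {
    unsigned char name_len;
    size_t value_size;
};

/* Round n up to the next multiple of TOY_PAD. */
static size_t toy_pad(size_t n)
{
    return (n + TOY_PAD - 1) & ~(size_t)(TOY_PAD - 1);
}

/* Form 1: raw sizes. */
static size_t toy_namevalue_size(size_t name_len, size_t value_len)
{
    if (value_len > TOY_INLINE_MAX)   /* needs an external tree */
        return toy_pad(name_len) + TOY_ROOT_SIZE;
    return toy_pad(name_len) + toy_pad(value_len);
}

/* Form 2: from a description of the xattr to be set. */
static size_t toy_namevalue_size_xi(const struct toy_xattr_info *xi)
{
    return toy_namevalue_size(xi->name_len, xi->value_len);
}

/* Form 3: from an existing entry. */
static size_t toy_namevalue_size_xe(const struct toy_xattr_entry *xe)
{
    return toy_namevalue_size(xe->name_len, xe->value_size);
}
```

Keeping the size rule in one function means the other two forms are thin wrappers and cannot drift apart.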
     
  • Rather than calculating strlen all over the place, let's store the
    name length directly on ocfs2_xattr_info.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • struct ocfs2_xattr_info is a useful structure describing an xattr
    you'd like to set. Let's put prefixes on the member fields so it's
    easier to read and use.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • Add ocfs2_xa_remove_entry(), which will remove an xattr entry from its
    storage via the ocfs2_xa_loc descriptor.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • The ocfs2 extended attribute (xattr) code is very flexible. It can
    store xattrs in the inode itself, in an external block, or in a tree of
    data structures. This allows the number of xattrs to be bounded by the
    filesystem size.

    However, the code that manages each possible storage location is
    different. Maintaining the ocfs2 xattr code requires changing each hunk
    separately.

    This patch is the start of a series introducing the ocfs2_xa_loc
    structure. This structure wraps the on-disk details of an xattr
    entry. The goal is that the generic xattr routines can use
    ocfs2_xa_loc without knowing the underlying storage location.

    This first pass merely implements the basic structure, initializing it,
    and wiping the name+value pair of the entry.

    Signed-off-by: Joel Becker

    Joel Becker
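
The idea of a location descriptor that hides the storage type, as described in the commit above, can be sketched with an operations table. This is a simplified model: `toy_xa_loc`, `toy_xa_loc_ops`, `toy_bucket` and the 16-byte block size are hypothetical, and the bucket backend assumes a name+value region never crosses a block boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_xa_loc;

/* Storage-specific operations (hypothetical, much smaller than the real set). */
struct toy_xa_loc_ops {
    /* Return a pointer to the byte at `offset` within this storage. */
    char *(*offset_pointer)(struct toy_xa_loc *loc, size_t offset);
};

/* Descriptor for one xattr entry, independent of where it is stored. */
struct toy_xa_loc {
    const struct toy_xa_loc_ops *ops;
    void *storage;      /* flat block or bucket, depending on ops */
    size_t nv_offset;   /* where the name+value pair starts */
    size_t nv_size;     /* size of the name+value pair */
};

/* Backend 1: a single flat block (inode body or xattr block). */
static char *toy_block_offset_pointer(struct toy_xa_loc *loc, size_t offset)
{
    return (char *)loc->storage + offset;
}

/* Backend 2: a bucket made of several fixed-size blocks. */
#define TOY_BLOCK_SIZE 16

struct toy_bucket {
    char *blocks[4];
};

static char *toy_bucket_offset_pointer(struct toy_xa_loc *loc, size_t offset)
{
    struct toy_bucket *b = loc->storage;

    return b->blocks[offset / TOY_BLOCK_SIZE] + offset % TOY_BLOCK_SIZE;
}

static const struct toy_xa_loc_ops toy_block_ops = {
    .offset_pointer = toy_block_offset_pointer,
};

static const struct toy_xa_loc_ops toy_bucket_ops = {
    .offset_pointer = toy_bucket_offset_pointer,
};

/* Generic routine: wipes the name+value pair without knowing the backend. */
static void toy_xa_wipe_namevalue(struct toy_xa_loc *loc)
{
    memset(loc->ops->offset_pointer(loc, loc->nv_offset), 0, loc->nv_size);
}
```

The generic routine is written once; supporting another storage location only means supplying another operations table.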
     
  • This patch adds an extent block (metadata) stealing mechanism for
    extent allocation. The mechanism is the same as inode stealing:
    if there is no room in the slot-specific extent_alloc, we try to
    allocate an extent block from the next slot.

    Signed-off-by: Tiger Yang
    Acked-by: Tao Ma
    Signed-off-by: Joel Becker

    Tiger Yang
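
The stealing fallback from the commit above can be sketched as a walk over per-slot free counts. This is a toy model with hypothetical names (`TOY_MAX_SLOTS`, `toy_alloc_extent_block()`); the commit text only mentions the next slot, so continuing through all remaining slots in turn is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_SLOTS 4

/*
 * Allocate one extent block, preferring the caller's own slot and then
 * trying the following slots in order. On success, *from_slot reports
 * which slot the block was taken from.
 */
static int toy_alloc_extent_block(unsigned int free_blocks[TOY_MAX_SLOTS],
                                  int my_slot, int *from_slot)
{
    for (int i = 0; i < TOY_MAX_SLOTS; i++) {
        int slot = (my_slot + i) % TOY_MAX_SLOTS;

        if (free_blocks[slot] > 0) {
            free_blocks[slot]--;
            *from_slot = slot;
            return 0;
        }
    }
    return -ENOSPC;   /* no slot has a free extent block */
}
```

The first iteration covers the normal case (the caller's own slot); stealing only happens when that slot is empty.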