04 Aug, 2012

1 commit


02 Aug, 2012

1 commit

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     

31 Jul, 2012

2 commits

  • We update gfs2_page_mkwrite() to use the new freeze protection and the transaction
    code to use freeze protection while the transaction is running. That is needed
    to stop iput() of an unlinked file from modifying the filesystem. The rest is
    handled by the generic code.

    CC: cluster-devel@redhat.com
    CC: Steven Whitehouse
    Acked-by: Steven Whitehouse
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • CC: Steven Whitehouse
    CC: cluster-devel@redhat.com
    Acked-by: Steven Whitehouse
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

25 Jul, 2012

1 commit

  • Pull GFS2 updates from Steven Whitehouse.

    * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
    GFS2: Eliminate 64-bit divides
    GFS2: Reduce file fragmentation
    GFS2: kernel panic with small gfs2 filesystems - 1 RG
    GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs
    GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve
    GFS2: Add kobject release method
    GFS2: Size seq_file buffer more carefully
    GFS2: Use seq_vprintf for glocks debugfs file
    seq_file: Add seq_vprintf function and export it
    GFS2: Use lvbs for storing rgrp information with mount option
    GFS2: Cache last hash bucket for glock seq_files
    GFS2: Increase buffer size for glocks and glstats debugfs files
    GFS2: Fix error handling when reading an invalid block from the journal
    GFS2: Add "top dir" flag support
    GFS2: Fold quota data into the reservations struct
    GFS2: Extend the life of the reservations

    Linus Torvalds
     

23 Jul, 2012

2 commits

  • Now that writes to quota files go through the block device page cache and
    space for quota structures is reserved at the moment they are first accessed, we
    have no reason to sync quota before inode writeback. In fact this order is now
    only harmful since quota information can easily change during inode writeback
    (either because of conversion of delayed-allocated extents or simply because of
    allocation of new blocks for simple filesystems not using page_mkwrite).

    So move syncing of quota information after writeback of inodes into ->sync_fs
    method. This way we do not have to use ->quota_sync callback which is primarily
    intended for use by quotactl syscall anyway and we get rid of calling
    ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
    proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
    also support legacy non-journalled quotas) and thus there are no dirty quota
    structures.

    CC: "Theodore Ts'o"
    CC: Joel Becker
    CC: reiserfs-devel@vger.kernel.org
    Acked-by: Steven Whitehouse
    Acked-by: Dave Kleikamp
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Split off part of dquot_quota_sync() which writes dquots into a quota file
    to a separate function. In the next patch we will use the function from
    filesystems and we do not want to abuse ->quota_sync quotactl callback more
    than necessary.

    Acked-by: Steven Whitehouse
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

21 Jul, 2012

1 commit

  • This patch removes the 64-bit divides introduced in the previous patch
    in favor of shifting, so that it will compile properly on 32-bit machines.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
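
The idea of replacing a divide with a shift can be sketched in userspace C as follows. This is only an illustration, assuming the divisor is a power of two; the helper names `units_by_shift`, `rem_by_mask` and `check_shift_equivalence` are hypothetical and do not come from the patch. On 32-bit kernel builds a plain 64-bit `/` typically fails to link, which is presumably the compile problem the commit message refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nbytes / (1 << shift) without a 64-bit divide. */
uint64_t units_by_shift(uint64_t nbytes, unsigned int shift)
{
	return nbytes >> shift;
}

/* Hypothetical helper: nbytes % (1 << shift), computed with a mask. */
uint64_t rem_by_mask(uint64_t nbytes, unsigned int shift)
{
	return nbytes & ((UINT64_C(1) << shift) - 1);
}

/* Userspace cross-check against the plain operators; shift must be < 64. */
void check_shift_equivalence(uint64_t nbytes, unsigned int shift)
{
	uint64_t unit = UINT64_C(1) << shift;

	assert(units_by_shift(nbytes, shift) == nbytes / unit);
	assert(rem_by_mask(nbytes, shift) == nbytes % unit);
}
```

The shift form only applies when the unit size is known to be a power of two, as block sizes usually are; for arbitrary divisors a different approach would be needed.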
     

19 Jul, 2012

1 commit

  • This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
    resulting improved on-disk layout greatly speeds up operations in cases
    which would have resulted in interlaced allocation of blocks previously.
    A typical example of this is 10 parallel dd processes, each writing to a
    file in a common directory.

    The implementation uses an rbtree of reservations attached to each
    resource group (and each inode).

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

18 Jul, 2012

1 commit


14 Jul, 2012

4 commits


28 Jun, 2012

1 commit

  • This patch fixes a buffer_head double free in the following code path:

    gfs2_block_map
    => gfs2_meta_inode_buffer
    => gfs2_meta_indirect_buffer
    => gfs2_meta_read
    => release_metapath

    gfs2_block_map calls gfs2_meta_inode_buffer with &mp.mp_bh[0]
    as an argument. mp.mp_bh is zero-filled at the beginning
    of gfs2_block_map.

    If gfs2_meta_inode_buffer returns a non-zero value, gfs2_block_map
    calls release_metapath to free the buffers chained to mp.mp_bh.
    release_metapath checks each slot mp.mp_bh[i] and frees it
    (with brelse) unless the slot is NULL.

    The slot &mp.mp_bh[0] passed to gfs2_meta_inode_buffer is filled in by
    gfs2_meta_read. gfs2_meta_read stores a buffer allocated with
    gfs2_getbuf there even if EIO occurs. When EIO occurs, the allocated
    buffer is brelse'ed, but the now-stale pointer to it is still
    passed back to the caller via the argument bhp.

    gfs2_meta_indirect_buffer, the caller, also passes the stale pointer
    to its own caller along with EIO. Finally gfs2_block_map gets both EIO
    and &mp.mp_bh[0] filled with the stale pointer, and release_metapath
    calls brelse again on that pointer.

    Signed-off-by: Masatake YAMATO
    Signed-off-by: Steven Whitehouse

    Masatake YAMATO
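
The failure mode described above can be reproduced in a small userspace model. This is a sketch under stated assumptions, not the kernel code: `struct toy_bh`, `toy_brelse`, `toy_read_buggy`, `toy_read_fixed` and `toy_block_map` are hypothetical stand-ins, and clearing the out-parameter on error is shown as one plausible way to avoid the second release rather than as the exact change made by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for struct buffer_head: only a reference count. */
struct toy_bh {
	int refs;
};

/* Toy brelse(): drop one reference; a NULL pointer is ignored. */
void toy_brelse(struct toy_bh *bh)
{
	if (bh)
		bh->refs--;
}

/* Buggy shape: on error the buffer is released, yet *bhp still points at it. */
int toy_read_buggy(struct toy_bh *bh, int io_error, struct toy_bh **bhp)
{
	bh->refs++;              /* models taking a reference on the buffer */
	*bhp = bh;
	if (io_error) {
		toy_brelse(bh);
		return -EIO;     /* caller is left holding a stale pointer */
	}
	return 0;
}

/* Safer shape: the out-parameter is cleared before the error is returned. */
int toy_read_fixed(struct toy_bh *bh, int io_error, struct toy_bh **bhp)
{
	bh->refs++;
	if (io_error) {
		toy_brelse(bh);
		*bhp = NULL;
		return -EIO;
	}
	*bhp = bh;
	return 0;
}

/*
 * Caller pattern from the commit message: on error, release whatever the
 * slot holds. Returns the final reference count of a buffer that started
 * with one legitimate holder.
 */
int toy_block_map(int (*reader)(struct toy_bh *, int, struct toy_bh **))
{
	struct toy_bh bh = { .refs = 1 };
	struct toy_bh *slot = NULL;

	if (reader(&bh, 1, &slot))
		toy_brelse(slot);    /* models release_metapath() */
	assert(bh.refs >= 0);
	return bh.refs;
}
```

With the buggy reader, `toy_block_map` ends with a count of 0 (one reference too many was dropped); with the fixed reader the count stays at 1.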
     

14 Jun, 2012

1 commit


13 Jun, 2012

1 commit

  • This patch adds a kobject release function that properly maintains
    the kobject use count, so that accesses to the sysfs files do not
    cause an access to freed kernel memory after an unmount.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
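
The role of a release method can be illustrated with a toy reference-counted object. This is a userspace sketch only; `toy_kobj`, `toy_sb` and the functions below are hypothetical names that mimic the general kobject pattern and are not the GFS2 or kobject API. The point it shows is that the memory is freed from the release callback when the last reference goes away, not directly at unmount time.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_kobj {
	int refs;
	void (*release)(struct toy_kobj *kobj);
};

struct toy_sb {
	struct toy_kobj kobj;  /* first member, so the two pointers are interchangeable */
	int *freed;            /* test hook: set to 1 when the object is freed */
};

void toy_kobj_get(struct toy_kobj *kobj)
{
	kobj->refs++;
}

/* Drop a reference; the release callback runs only for the last one. */
void toy_kobj_put(struct toy_kobj *kobj)
{
	if (--kobj->refs == 0)
		kobj->release(kobj);
}

void toy_sb_release(struct toy_kobj *kobj)
{
	struct toy_sb *sb = (struct toy_sb *)kobj;

	*sb->freed = 1;
	free(sb);
}

/* Returns 1 if the object outlived "unmount" while a reader still held it. */
int toy_unmount_demo(void)
{
	int freed = 0;
	int alive_after_unmount;
	struct toy_sb *sb = malloc(sizeof(*sb));

	assert(sb != NULL);
	sb->kobj.refs = 1;
	sb->kobj.release = toy_sb_release;
	sb->freed = &freed;

	toy_kobj_get(&sb->kobj);      /* a sysfs reader opens a file */
	toy_kobj_put(&sb->kobj);      /* unmount drops its reference */
	alive_after_unmount = !freed;
	toy_kobj_put(&sb->kobj);      /* reader finishes: release runs now */

	return alive_after_unmount && freed;
}
```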
     

11 Jun, 2012

2 commits


08 Jun, 2012

2 commits

  • Instead of reading in the resource groups when gfs2 is checking
    for free space to allocate from, gfs2 can store the necessary information
    in the resource group's lvb. Also, instead of searching for unlinked
    inodes in every resource group that's checked for free space, gfs2 can
    store the number of unlinked inodes in the lvb, and only check for
    unlinked inodes if it will find some.

    The first time a resource group is locked, the lvb must be initialized.
    Since this involves counting the unlinked inodes in the resource group,
    this takes a little extra time. But after that, if the resource group
    is locked with GL_SKIP, the buffer head won't be read in unless it's
    actually needed.

    Enabling the resource group lvbs is done via the rgrplvb mount option. If
    this option isn't set, the lvbs will still be set and updated, but they won't
    be verified or used by the filesystem. To safely turn on this option, all of
    the nodes mounting the filesystem must be running code with this patch, and
    the filesystem must have been completely unmounted since they were updated.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • For the glocks and glstats seq_files, which are exposed via debugfs,
    we should cache the most recent hash bucket, along with the offset
    into that bucket. This allows us to restart from that point, rather
    than having to begin at the beginning each time.

    This is an idea from Eric Dumazet, however I've slightly extended it
    so that if the position from which we are due to start is at any
    point beyond the last cached point, we start from the last cached
    point, plus whatever is the appropriate offset. I don't really expect
    people to be lseeking around these files, but if they did so with only
    positive offsets, then we'd still get some of the benefit of using a
    cached offset.

    With my simple test of around 200k entries in the file, I'm seeing
    an approx 10x speed up.

    Cc: Eric Dumazet
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
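
The caching scheme described above can be sketched with a toy hash table in which only the bucket lengths matter. This is an illustration under assumptions, not the debugfs code: `toy_table`, `toy_cursor`, `toy_seek` and `toy_demo_steps` are hypothetical names, and the `steps` counter exists only to make the saved work visible.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8

/* Toy table: bucket_len[i] entries live in bucket i. */
struct toy_table {
	size_t bucket_len[NBUCKETS];
};

/* Cursor remembering where the previous lookup ended. */
struct toy_cursor {
	size_t pos;     /* global position of the cached entry */
	size_t bucket;  /* bucket holding that entry */
	size_t offset;  /* index of that entry inside its bucket */
	size_t steps;   /* buckets skipped so far, for illustration only */
};

/* Locate entry number pos. Returns 0 and updates the cursor, or -1 past the end. */
int toy_seek(const struct toy_table *t, struct toy_cursor *c, size_t pos)
{
	size_t bucket = 0;
	size_t base = 0;   /* global position of the first entry in bucket */

	/* Seeking forward: restart from the cached bucket, not from bucket 0. */
	if (pos >= c->pos) {
		bucket = c->bucket;
		base = c->pos - c->offset;
	}
	while (bucket < NBUCKETS && pos - base >= t->bucket_len[bucket]) {
		base += t->bucket_len[bucket];
		bucket++;
		c->steps++;
	}
	if (bucket == NBUCKETS)
		return -1;
	c->pos = pos;
	c->bucket = bucket;
	c->offset = pos - base;
	return 0;
}

/* Seeks to positions 4 and then 6; returns the total buckets skipped. */
size_t toy_demo_steps(void)
{
	struct toy_table t = { { 3, 0, 2, 5, 0, 0, 1, 4 } };
	struct toy_cursor c = { 0, 0, 0, 0 };
	int r1, r2;

	r1 = toy_seek(&t, &c, 4);
	assert(r1 == 0 && c.bucket == 2 && c.offset == 1);
	r2 = toy_seek(&t, &c, 6);
	assert(r2 == 0 && c.bucket == 3 && c.offset == 1);
	return c.steps;
}
```

In this toy run the second seek skips one bucket instead of three, because it resumes from the cached bucket. A backward seek falls back to starting from bucket 0, which matches the note above that only positive offsets benefit.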
     

07 Jun, 2012

1 commit


06 Jun, 2012

4 commits

  • When we read an invalid block from the journal, we should not call
    withdraw, but simply print a message and return an error. It is
    up to the caller to then handle that error. In the case of mount
    that means a failed mount, rather than a withdraw (requiring a
    reboot). In the case of recovering another node's journal,
    we return an error via the uevent.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch adds support for the "top dir" flag. Currently this is unused
    but a subsequent patch is planned which will add support for the
    Orlov allocation policy when allocating subdirectories in a parent
    with this flag set.

    In order to ensure backward compatible behaviour, mkfs.gfs2 does
    not currently tag the root directory with this flag; it must always be
    set manually.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch moves the ancillary quota data structures into the
    block reservations structure. This saves GFS2 some time and
    effort in allocating and deallocating the qadata structure.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch lengthens the lifespan of the reservations structure for
    inodes. Before, they were allocated and deallocated for every write
    operation. With this patch, they are allocated when the first write
    occurs, and deallocated when the last process closes the file.
    It's more efficient to do it this way because it saves GFS2 a lot of
    unnecessary allocates and frees. It also gives us more flexibility
    for the future: (1) we can now fold the qadata structure back into
    the structure and save those alloc/frees, (2) we can use this for
    multi-block reservations.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

30 May, 2012

1 commit

  • pass inode + parent's inode or NULL instead of dentry + bool saying
    whether we want the parent or not.

    NOTE: that needs ceph fix folded in.

    Signed-off-by: Al Viro

    Al Viro
     

29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

23 May, 2012

1 commit

  • Pull dlm updates from David Teigland:
    "This set includes some minor fixes and improvements. The one large
    patch addresses the special "nodir" mode, which has been a long
    neglected proof of concept, but with these fixes seems to be quite
    usable. It allows the resource master to be assigned statically
    instead of dynamically, which can improve performance if there is
    little locality and most resources are shared."

    * tag 'dlm-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
    dlm: NULL dereference on failure in kmem_cache_create()
    gfs2: fix recovery during unmount
    dlm: fixes for nodir mode
    dlm: improve error and debug messages
    dlm: avoid unnecessary search in search_rsb
    dlm: limit rcom debug messages
    dlm: fix waiter recovery
    dlm: prevent connections during shutdown

    Linus Torvalds
     

22 May, 2012

1 commit

  • Pull GFS2 changes from Steven Whitehouse.

    * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits)
    GFS2: Fix quota adjustment return code
    GFS2: Add rgrp information to block_alloc trace point
    GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
    GFS2: Update glock doc to add new stats info
    GFS2: Update main gfs2 doc
    GFS2: Remove redundant metadata block type check
    GFS2: Fix sgid propagation when using ACLs
    GFS2: eliminate log elements and simplify
    GFS2: Eliminate vestigial sd_log_le_rg
    GFS2: Eliminate needless parameter from function gfs2_setbit
    GFS2: Log code fixes
    GFS2: Remove unused argument from gfs2_internal_read
    GFS2: Remove bd_list_tr
    GFS2: Remove duplicate log code
    GFS2: Clean up log write code path
    GFS2: Use variable rather than qa to determine if unstuff necessary
    GFS2: Change variable blk to biblk
    GFS2: Fix function parameter comments in rgrp.c
    GFS2: Eliminate offset parameter to gfs2_setbit
    GFS2: Use slab for block reservation memory
    ...

    Linus Torvalds
     

16 May, 2012

1 commit

  • This patch changes function gfs2_adjust_quota so that it properly
    returns a good (zero) return code on the normal path through the code.
    Without this, mounting GFS2 with -o quota=account periodically gave
    this error message: GFS2: fsid=cluster:fs: gfs2_quotad: sync error -5

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

11 May, 2012

3 commits

  • This is a second attempt at a patch that adds rgrp information to the
    block allocation trace point for GFS2. As suggested, the patch was
    modified to list the rgrp information _after_ the fields that exist today.

    Again, the reason for this patch is to allow us to trace and debug
    problems with the block reservations patch, which is still in the works.
    We can debug problems with reservations if we can see what block allocations
    result from the block reservations. It may also be handy in figuring out
    if there are problems in rgrp free space accounting. In other words,
    we can use it to track the rgrp and its free space alongside the allocations
    that are taking place.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • It turns out that the "new" parameter to function gfs2_meta_indirect_buffer
    was always being passed in as zero. Therefore, this patch eliminates it
    and simplifies the function.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This allows comparing hash and len in one operation on 64-bit
    architectures. Right now only __d_lookup_rcu() takes advantage of this,
    since that is the case we care most about.

    The use of anonymous struct/unions hides the alternate 64-bit approach
    from most users, the exception being a few cases where we initialize a
    'struct qstr' with a static initializer. This makes the problematic
    cases use a new QSTR_INIT() helper function for that (but initializing
    just the name pointer with a "{ .name = xyzzy }" initializer remains
    valid, as does just copying another qstr structure).

    Signed-off-by: Linus Torvalds

    Linus Torvalds
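
The single-compare trick can be sketched in userspace C (C11, for anonymous structs and unions). This is a simplified illustration: `toy_qstr`, `TOY_QSTR_INIT` and `toy_qstr_equal` are hypothetical names modelled on the description above, and the order of `hash` and `len` inside the pair is an assumption that does not affect an equality test.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct qstr. */
struct toy_qstr {
	union {
		struct {
			uint32_t hash;
			uint32_t len;
		};
		uint64_t hash_len;   /* both fields viewed as one 64-bit word */
	};
	const unsigned char *name;
};

/* Hypothetical static initializer: sets len and name, leaves hash at zero. */
#define TOY_QSTR_INIT(n, l) \
	{ { { .len = (l) } }, .name = (const unsigned char *)(n) }

/* Compare hash and len with a single 64-bit test, then the bytes. */
int toy_qstr_equal(const struct toy_qstr *a, const struct toy_qstr *b)
{
	if (a->hash_len != b->hash_len)
		return 0;
	return memcmp(a->name, b->name, a->len) == 0;
}

/* Returns 1 when equal names match and names of different length do not. */
int toy_qstr_demo(void)
{
	struct toy_qstr a = TOY_QSTR_INIT("gfs2", 4);
	struct toy_qstr b = TOY_QSTR_INIT("gfs2", 4);
	struct toy_qstr c = TOY_QSTR_INIT("gfs", 3);

	assert(a.hash == 0 && a.len == 4);
	return toy_qstr_equal(&a, &b) && !toy_qstr_equal(&a, &c);
}
```

Because the anonymous members keep `hash` and `len` accessible under their usual names, most code does not need to know about the 64-bit view; only positional static initializers have to account for the extra nesting, which is what an initializer macro hides.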
     

08 May, 2012

1 commit


06 May, 2012

1 commit

  • After we moved inode_sync_wait() out of end_writeback(), it no longer makes
    sense to call the function end_writeback(). Rename it to clear_inode(),
    which better describes what the function really does: set the I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

04 May, 2012

1 commit

  • This cleans up the mode setting code when creating inodes. The
    SGID bit was being reset by setattr_copy() when the user creating a
    subdirectory was not in the owning group. When ACLs are in use this
    SGID bit should have been propagated if the ACL allows creation of
    a subdirectory. GFS2's behaviour now matches that of the other ACL
    supporting filesystems in this regard.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 May, 2012

2 commits

  • Journal recovery from lock_dlm should not be ignored
    if there is an unmount in progress. Ignoring it will
    cause the recovery to get stuck. The recovery
    process will correctly handle an in-progress unmount.

    Signed-off-by: David Teigland

    David Teigland
     
  • The "nodir" mode (statically assign master nodes instead
    of using the resource directory) has always been highly
    experimental, and never seriously used. This commit
    fixes a number of problems, making nodir much more usable.

    - Major change to recovery: recover all locks and restart
    all in-progress operations after recovery. In some
    cases it's not possible to know which in-progress locks
    to recover, so recover all. (Most require recovery
    in nodir mode anyway since rehashing changes most
    master nodes.)

    - Change the way nodir mode is enabled, from a command
    line mount arg passed through gfs2, into a sysfs
    file managed by dlm_controld, consistent with the
    other config settings.

    - Allow recovering MSTCPY locks on an rsb that has not
    yet been turned into a master copy.

    - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
    from a previous, aborted recovery cycle. Base this
    on the local recovery status not being in the state
    where any nodes should be sending LOCK messages for the
    current recovery cycle.

    - Hold rsb lock around dlm_purge_mstcpy_locks() because it
    may run concurrently with dlm_recover_master_copy().

    - Maintain highbast on process-copy lkb's (in addition to
    the master as is usual), because the lkb can switch
    back and forth between being a master and being a
    process copy as the master node changes in recovery.

    - When recovering MSTCPY locks, flag rsb's that have
    non-empty convert or waiting queues for granting
    at the end of recovery. (Rename flag from LOCKS_PURGED
    to RECOVER_GRANT and similar for the recovery function,
    because it's not only resources with purged locks
    that need a grant attempt.)

    - Replace a couple of unnecessary assertion panics with
    error messages.

    Signed-off-by: David Teigland

    David Teigland
     

02 May, 2012

1 commit

  • This patch eliminates the gfs2_log_element data structure and
    rolls its two components into the gfs2_bufdata. This makes the code
    easier to understand and makes it easier to migrate to a rbtree
    to keep the list sorted.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson