13 Jan, 2011

1 commit


08 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
    hfsplus: %L-to-%ll, macro correction, and remove unneeded braces
    hfsplus: spaces/indentation clean-up
    hfsplus: C99 comments clean-up
    hfsplus: over 80 character lines clean-up
    hfsplus: fix an artifact in ioctl flag checking
    hfsplus: flush disk caches in sync and fsync
    hfsplus: optimize fsync
    hfsplus: split up inode flags
    hfsplus: write up fsync for directories
    hfsplus: simplify fsync
    hfsplus: avoid useless work in hfsplus_sync_fs
    hfsplus: make sure sync writes out all metadata
    hfsplus: use raw bio access for partition tables
    hfsplus: use raw bio access for the volume headers
    hfsplus: always use hfsplus_sync_fs to write the volume header
    hfsplus: silence a few debug printks
    hfsplus: fix option parsing during remount

    Fix up conflicts due to VFS changes in fs/hfsplus/{hfsplus_fs.h,unicode.c}

    Linus Torvalds
     

07 Jan, 2011

4 commits

  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (i.e. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_hash so it may be called from lock-free RCU lookups. See similar
    patch for d_compare for details.

    For in-tree filesystems, this is just a mechanical change.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_compare so it may be called from lock-free RCU lookups. This
    does put significant restrictions on what may be done from the callback,
    however there don't seem to have been any problems with in-tree fses.
    If some strange use case pops up that _really_ cannot cope with the
    rcu-walk rules, we can just add new rcu-unaware callbacks, which would
    cause name lookup to drop out of rcu-walk mode.

    For in-tree filesystems, this is just a mechanical change.

    Signed-off-by: Nick Piggin

    Nick Piggin
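The dentry-flags commit in this group (the one that rewrites `->d_op = ...` into `d_set_d_op(...)`) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `demo_*` types, the `DEMO_OP_*` bit names and the placeholder default hash are hypothetical and are not the kernel's definitions. It shows the idea described in the message: record which callbacks exist once, when the ops table is assigned, so the lookup path tests a flag instead of loading `d_op` and then the callback pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct demo_dentry;

struct demo_dentry_ops {
    int (*d_hash)(const struct demo_dentry *d, const char *name);
    int (*d_compare)(const struct demo_dentry *d, const char *a, const char *b);
};

/* Hypothetical flag bits: one per callback that lookup cares about. */
#define DEMO_OP_HASH    0x1u
#define DEMO_OP_COMPARE 0x2u

struct demo_dentry {
    unsigned int flags;
    const struct demo_dentry_ops *d_op;
};

/* Record once, at assignment time, which callbacks are present. */
void demo_set_d_op(struct demo_dentry *d, const struct demo_dentry_ops *op)
{
    assert(d != NULL);
    d->d_op = op;
    d->flags &= ~(DEMO_OP_HASH | DEMO_OP_COMPARE);
    if (op == NULL)
        return;
    if (op->d_hash)
        d->flags |= DEMO_OP_HASH;
    if (op->d_compare)
        d->flags |= DEMO_OP_COMPARE;
}

/* Hot path: a single flag test; d_op is only followed when the bit is set. */
int demo_hash_name(const struct demo_dentry *d, const char *name)
{
    if (d->flags & DEMO_OP_HASH)
        return d->d_op->d_hash(d, name);
    return 0; /* placeholder for the default hash */
}

/* Example filesystem-specific hash used to exercise the model. */
int demo_fs_hash(const struct demo_dentry *d, const char *name)
{
    (void)d;
    return name[0];
}
```

In this model a dentry without ops, or with ops that lack `d_hash`, never touches `d_op` on the hashing path, which is the saving the commit message describes.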
     

17 Dec, 2010

5 commits


23 Nov, 2010

11 commits

  • Flush the disk cache in fsync and sync to make sure data actually is
    on disk on completion of these system calls. There is a nobarrier
    mount option to disable this behaviour. It's slightly misnamed now
    that barriers are actually gone, but it matches the name used by all
    major filesystems.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • Avoid doing unnecessary work in fsync. Do nothing unless the inode
    was marked dirty, and only write the various metadata inodes out if
    they contain any dirty state from this inode. This is achieved by
    adding three new dirty bits to the hfsplus-specific inode which are
    set in the correct places.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • Split the flags field in the hfsplus inode into an extent_state
    flag that is locked by the extent_lock, and a new flags field
    that uses atomic bitops. The second will grow more flags in the
    next patch.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • fsync is supposed to not just work on regular files, but also on
    directories. Fortunately enough hfsplus_file_fsync works just fine
    for directories, so we can just wire it up.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • Remove lots of code we don't need from fsync. We just need to call
    ->write_inode on the inode if it's dirty, for which sync_inode_metadata
    is a lot more efficient than write_inode_now, and we need to write
    out the various metadata inodes, which we now do explicitly instead
    of by calling ->sync_fs.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • There is no reason to write out the metadata inodes or volume headers
    during a non-blocking sync, as we are almost guaranteed to dirty them
    again during the inode writeouts.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • hfsplus stores all metadata except for the volume headers in special
    inodes. While these are marked hashed and periodically written out
    by the flusher threads, we can't rely on that for sync. For the case
    of a data integrity sync the VM has livelock avoidance code that
    avoids writing inodes again that are redirtied during the sync,
    which is something that can happen easily for hfsplus. So make sure
    we explicitly write out the metadata inodes at the beginning of
    hfsplus_sync_fs.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • Switch the hfsplus partition table reading for cdroms to use our bio
    helpers. Again we don't rely on any caching in the buffer_heads, and
    this gets rid of the last buffer_head use in hfsplus.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • The hfsplus backup volume header is located two blocks from the end of
    the device. In case of device sizes that are not 4k aligned this means
    we can't access it using buffer_heads when using the default 4k block
    size.

    Switch to using raw bios to read/write all volume headers. We were not
    relying on any caching behaviour of the buffer heads anyway. Additionally
    always read in the backup volume header during mount to verify that we
    can actually read it.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • Remove opencoded writing of the volume header in hfsplus_fill_super
    and hfsplus_put_super and offload it to hfsplus_sync_fs. In the
    put_super case this means we only write the superblock once instead
    of twice.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • Turn a few noisy debug printks that show up during xfstests into
    compiled out debug print statements.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
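The volume-header commit in this group rests on a small piece of arithmetic: a cache that addresses whole 4k blocks cannot reach data that sits in a partial block at the end of the device. The sketch below works that out. It assumes 512-byte sectors and that "two blocks from the end" means the second-to-last 512-byte sector; both are assumptions for illustration, and the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512u /* assumed sector size */

/* Byte offset of the backup header: the second-to-last sector. */
uint64_t demo_backup_header_offset(uint64_t dev_bytes)
{
    assert(dev_bytes >= 2 * DEMO_SECTOR_SIZE);
    return dev_bytes - 2 * DEMO_SECTOR_SIZE;
}

/*
 * Can an interface that only addresses whole blocks of block_size
 * bytes reach the backup header? Any tail shorter than one block is
 * treated as unreachable.
 */
int demo_reachable_with_blocks(uint64_t dev_bytes, uint32_t block_size)
{
    uint64_t usable = (dev_bytes / block_size) * block_size;
    uint64_t off = demo_backup_header_offset(dev_bytes);

    return off + DEMO_SECTOR_SIZE <= usable;
}
```

For a device of 1 MiB plus three sectors (1050112 bytes), the header starts at byte 1049088, which lies past the last full 4k block boundary at 1048576, so `demo_reachable_with_blocks(1050112, 4096)` is 0 while the same call with a 512-byte block size is 1.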

08 Nov, 2010

1 commit

  • hfsplus only actually uses the force option during remount, but it uses
    the full option parser with a fake superblock to do so. This means remount
    will fail if any nls option is set (which happens frequently with older
    mount tools), even if it is the same.

    Fix this by adding a simpler version of the parser that only parses the force
    option for remount.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
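The remount fix above amounts to a parser that looks for a single option and nothing else. A minimal sketch of such a parser is shown here, assuming a comma-separated option string and assuming that every option other than `force` is simply skipped; the function name is hypothetical and this is not the code from the commit.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if "force" appears as a whole comma-separated token in
 * opts, 0 otherwise. All other options (for example nls=...) are
 * skipped without being interpreted.
 */
int demo_remount_wants_force(const char *opts)
{
    const char *p = opts;

    assert(opts != NULL);
    while (*p) {
        size_t len = strcspn(p, ","); /* length of the current token */

        if (len == 5 && strncmp(p, "force", 5) == 0)
            return 1;
        p += len;
        if (*p == ',')
            p++;
    }
    return 0;
}
```

Because unrelated options are never interpreted in this model, a string such as `nls=utf8,force` cannot cause a failure on account of the `nls` part.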
     

29 Oct, 2010

2 commits


27 Oct, 2010

1 commit


26 Oct, 2010

3 commits


15 Oct, 2010

1 commit


14 Oct, 2010

8 commits

  • Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • Make sure the initial insertion of the catalog entry already contains
    the device number by calling init_special_inode early and writing
    out the dev field of the on-disk permission structure. The latter is
    facilitated by sharing the almost identical hfsplus_set_perms helpers
    between initial catalog entry creation and ->write_inode.

    Unless we crashed just after mknod this bug was harmless as the inode
    is marked dirty at the end of hfsplus_mknod, and hfsplus_write_inode
    will update the catalog entry to contain the correct value.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • The rootflags field in hfsplus_inode_info only caches the immutable and
    append-only flags in the VFS inode, so we can easily get rid of it.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • HFS implements hard links by using indirect catalog entries that refer to a hidden
    directory. The link target is cached in the dev field in the HFS+ specific
    inode, which is also used for the device number for device files, and also
    for passing the nlink value of the indirect node from hfsplus_cat_write_inode
    to a helper function. Now if we happen to write out the indirect node while
    hfsplus_link is creating the catalog entry we'll get a link pointing to the
    linkid of the current nlink value. This can easily be reproduced by a large
    enough loop of local git-clone operations.

    Stop abusing the dev field in the HFS+ inode for short term storage by
    refactoring the way the permission structure in the catalog entry is
    set up, and rename the dev field to linkid to avoid any confusion.

    While we're at it also prevent creating hard links to special files, as
    the HFS+ dev and linkid share the same space in the on-disk structure.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • hfs seems prone to bad things when it encounters on-disk corruption. Many
    values are read from disk, and used as lengths to memcpy, as an example.
    This patch fixes up several of these problematic cases.

    o sanity check the on-disk maximum key lengths on mount
    (these are set to a defined value at mkfs time and shouldn't differ)
    o check on-disk node keylens against the maximum key length for each tree
    o fix hfs_btree_open so that going out via free_tree: doesn't wind
    up in hfs_releasepage, which wants to follow the very pointer
    we were trying to set up:
    HFS_SB(sb)->cat_tree = hfs_btree_open()
    .
    failure gets to hfs_releasepage and tries to follow HFS_SB(sb)->cat_tree

    Tested with the fsfuzzer; it survives more than it used to.

    [hch: port of commit cf0594625083111ae522496dc1c256f7476939c2 from hfs]
    [hch: added the fixes from 5581d018ed3493d226e7a4d645d9c8a5af6c36b]

    Signed-off-by: Eric Sandeen
    Signed-off-by: Christoph Hellwig

    Eric Sandeen
     
  • oops and fs corruption; the latter can happen even on valid fs in case of oom.

    [hch: port of commit 3d10a15d6919488204bdb264050d156ced20d9aa from hfs]

    Signed-off-by: Al Viro
    Signed-off-by: Christoph Hellwig

    Al Viro
     
  • A particular fsfuzzer run caused an hfs file system to crash on mount. This
    is due to a corrupted MDB extent record causing a miscalculation of
    HFSPLUS_I(inode)->first_blocks for the extent tree. If the extent records
    are zeroed out, then it won't trigger the first_blocks special case and
    instead falls through to the extent code, which we're in the middle
    of initializing.

    This patch catches the 0 size extent records, reports the corruption,
    and fails the mount.

    [hch: port of commit 47f365eb575735c6b2edf5d08e0d16d26a9c23bd from hfs]

    Reported-by: Ramon de Carvalho Valle
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Christoph Hellwig

    Jeff Mahoney
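The corruption-hardening commit in this group (the one that checks on-disk key lengths) describes validating a length read from disk before it is used as a memcpy size. The sketch below shows that check in isolation. It assumes a key that starts with a 2-byte big-endian length followed by that many bytes; the layout, the function name and the parameters are assumptions for illustration rather than the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy one key (length prefix included) out of a node buffer.
 * Returns 0 on success and -1 if the on-disk length is implausible:
 * larger than the tree's maximum, larger than what is left in the
 * node, or larger than the destination buffer.
 */
int demo_copy_key(uint8_t *dst, size_t dst_size,
                  const uint8_t *node, size_t node_size,
                  size_t off, uint16_t max_key_len)
{
    uint16_t keylen;

    assert(dst != NULL && node != NULL);
    if (node_size < 2 || off > node_size - 2)
        return -1; /* no room for the length prefix */
    keylen = (uint16_t)((node[off] << 8) | node[off + 1]);
    if (keylen > max_key_len)
        return -1; /* exceeds the per-tree maximum */
    if (keylen > node_size - off - 2)
        return -1; /* would read past the end of the node */
    if ((size_t)keylen + 2 > dst_size)
        return -1; /* would overflow the destination */
    memcpy(dst, node + off, (size_t)keylen + 2);
    return 0;
}
```

The point of the ordering is that every comparison happens before the copy, so a corrupted length can only produce an error return, never an out-of-bounds access.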
     

01 Oct, 2010

2 commits

  • When renaming over a directory we need to use hfsplus_rmdir instead of
    hfsplus_unlink to evict the victim. This makes sure we properly error out
    on non-empty directory as required by POSIX (BZ #16571), and it also makes
    sure we do the right thing in case i_nlink will ever be set correctly for
    directories on hfsplus.

    Reported-by: Vlado Plaga
    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • tree_lock is used as a mutex, so make it a mutex.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Christoph Hellwig

    Thomas Gleixner
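The rename fix in this group comes down to choosing the removal routine by the type of the victim, so that a non-empty directory is rejected. A small model of that decision is sketched below. The `demo_*` names and the in-memory node structure are hypothetical; only the use of rmdir for directories and unlink for everything else comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for a directory entry's target. */
struct demo_node {
    int is_dir;
    int n_children;
};

/* Removing a non-directory always succeeds in this model. */
int demo_unlink(struct demo_node *n)
{
    (void)n;
    return 0;
}

/* Removing a directory fails unless it is empty. */
int demo_rmdir(struct demo_node *n)
{
    if (n->n_children > 0)
        return -ENOTEMPTY;
    return 0;
}

/* Evict the existing target of a rename using the matching routine. */
int demo_remove_rename_victim(struct demo_node *victim)
{
    assert(victim != NULL);
    if (victim->is_dir)
        return demo_rmdir(victim);
    return demo_unlink(victim);
}
```

With the unlink path alone, the non-empty check in this model would never run for directories, which is the failure mode the commit message reports.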