08 Apr, 2014

1 commit

  • filemap_map_pages() is generic implementation of ->map_pages() for
    filesystems who uses page cache.

    It should be safe to use filemap_map_pages() for ->map_pages() if
    filesystem use filemap_fault() for ->fault().

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Alexander Viro
    Cc: Dave Chinner
    Cc: Ning Qu
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

05 Apr, 2014

2 commits

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: doucple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystm() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     
  • Pull GFS2 updates from Steven Whitehouse:
    "One of the main highlights this time, is not the patches themselves
    but instead the widening contributor base. It is good to see that
    interest is increasing in GFS2, and I'd like to thank all the
    contributors to this patch set.

    In addition to the usual set of bug fixes and clean ups, there are
    patches to improve inode creation performance when xattrs are required
    and some improvements to the transaction code which is intended to
    help improve scalability after further changes in due course.

    Journal extent mapping is also updated to make it more efficient and
    again, this is a foundation for future work in this area.

    The maximum number of ACLs has been increased to 300 (for a 4k block
    size) which means that even with a few additional xattrs from selinux,
    everything should fit within a single fs block.

    There is also a patch to bring GFS2's own copy of the writepages code
    up to the same level as the core VFS. Eventually we may be able to
    merge some of this code, since it is fairly similar.

    The other major change this time, is bringing consistency to the
    printing of messages via fs_, pr_ macros"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (29 commits)
    GFS2: Fix address space from page function
    GFS2: Fix uninitialized VFS inode in gfs2_create_inode
    GFS2: Fix return value in slot_get()
    GFS2: inline function gfs2_set_mode
    GFS2: Remove extraneous function gfs2_security_init
    GFS2: Increase the max number of ACLs
    GFS2: Re-add a call to log_flush_wait when flushing the journal
    GFS2: Ensure workqueue is scheduled after noexp request
    GFS2: check NULL return value in gfs2_ok_to_move
    GFS2: Convert gfs2_lm_withdraw to use fs_err
    GFS2: Use fs_ more often
    GFS2: Use pr_ more consistently
    GFS2: Move recovery variables to journal structure in memory
    GFS2: global conversion to pr_foo()
    GFS2: return -E2BIG if hit the maximum limits of ACLs
    GFS2: Clean up journal extent mapping
    GFS2: replace kmalloc - __vmalloc / memset 0
    GFS2: Remove extra "if" in gfs2_log_flush()
    fs: NULL dereference in posix_acl_to_xattr()
    GFS2: Move log buffer accounting to transaction
    ...

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

01 Apr, 2014

1 commit

  • Now that rgrps use the address space which is part of the super
    block, we need to update gfs2_mapping2sbd() to take account of
    that. The only way to do that easily is to use a different set
    of address_space_operations for rgrps.

    Reported-by: Abhi Das
    Tested-by: Abhi Das
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

31 Mar, 2014

2 commits

  • When gfs2_create_inode() fails due to quota violation, the VFS
    inode is not completely uninitialized. This can cause a list
    corruption error.

    This patch correctly uninitializes the VFS inode when a quota
    violation occurs in the gfs2_create_inode codepath.

    Resolves: rhbz#1059808
    Signed-off-by: Abhi Das
    Signed-off-by: Steven Whitehouse

    Abhi Das
     
  • ENOSPC was being returned in slot_get inspite of successful
    execution of the function. This patch fixes this return
    code.

    Signed-off-by: Abhi Das
    Signed-off-by: Steven Whitehouse

    Abhi Das
     

19 Mar, 2014

3 commits


13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o mount /dev/xxx" operation when the
    file system is already mounted read-write causes an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    documented or guaraunteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
     

12 Mar, 2014

3 commits

  • Upstream commit 34cc178 changed a line of code from calling function
    log_flush_commit to calling log_write_header. This had the effect of
    eliminating a call to function log_flush_wait. That causes the journal
    to skip over log headers, which results in multiple wrap points,
    which itself leads to infinite loops in journal replay, both in the
    kernel code and fsck.gfs2 code. This patch re-adds that call.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch closes a small timing window whereby a request to hold the
    transaction glock can get stuck. The problem is that after the DLM has
    granted the lock, it can get into a state whereby it doesn't transition
    the glock to a held state, due to not having requeued the glock state
    machine to finish the transition.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • gfs2_lookupi() can return NULL if the path to the root is broken by
    another rename/rmdir. In this case gfs2_ok_to_move() must check for
    this NULL pointer and return error.

    Resolves: rhbz#1060246
    Signed-off-by: Abhi Das
    Signed-off-by: Steven Whitehouse

    Abhi Das
     

07 Mar, 2014

5 commits


06 Mar, 2014

1 commit

  • Return -E2BIG rather than -EINVAL if hit the maximum size limits of
    ACLs, as the former errno is consistent with VFS xattr syscalls.

    This is pointed out by Dave Chinner in previous discussion thread:
    http://www.spinics.net/lists/linux-fsdevel/msg71125.html

    Signed-off-by: Jie Liu
    Signed-off-by: Steven Whitehouse

    Jie Liu
     

03 Mar, 2014

1 commit

  • This patch fixes a long standing issue in mapping the journal
    extents. Most journals will consist of only a single extent,
    and although the cache took account of that by merging extents,
    it did not actually map large extents, but instead was doing a
    block by block mapping. Since the journal was only being mapped
    on mount, this was not normally noticeable.

    With the updated code, it is now possible to use the same extent
    mapping system during journal recovery (which will be added in a
    later patch). This will allow checking of the integrity of the
    journal before any reply of the journal content is attempted. For
    this reason the code is moving to bmap.c, since it will be used
    more widely in due course.

    An exercise left for the reader is to compare the new function
    gfs2_map_journal_extents() with gfs2_write_alloc_required()

    Additionally, should there be a failure, the error reporting is
    also updated to show more detail about what went wrong.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Feb, 2014

1 commit


25 Feb, 2014

3 commits

  • By reordering some of the assignments in gfs2_log_flush() it
    is possible to remove one of the "if" statements as it can be
    merged with one higher up the function.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Now we have a master transaction into which other transactions
    are merged, the accounting can be done using this master
    transaction. We no longer require the superblock fields which
    were being used for this function.

    In addition, this allows for a clean up in calc_reserved()
    making it rather easier understand. Also, by reducing the
    number of variables used to track the buffers being added
    and removed from the journal, a number of error checks are
    now no longer required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Over time, we hope to be able to improve the concurrency available
    in the log code. This is one small step towards that, by moving
    the buffer lists from the super block, and into the transaction
    structure, so that each transaction builds its own buffer lists.

    At transaction commit time, the buffer lists are merged into
    the currently accumulating transaction. That transaction then
    is passed into the before and after commit functions at journal
    flush time. Thus there should be no change in overall behaviour
    yet.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 Feb, 2014

1 commit


17 Feb, 2014

1 commit


10 Feb, 2014

1 commit

  • Mark functions as static in gfs2/rgrp.c because they are not used
    outside this file.

    This eliminates the following warning in gfs2/rgrp.c:
    fs/gfs2/rgrp.c:1092:5: warning: no previous prototype for ‘gfs2_rgrp_bh_get’ [-Wmissing-prototypes]
    fs/gfs2/rgrp.c:1157:5: warning: no previous prototype for ‘update_rgrp_lvb’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: Steven Whitehouse

    Rashika Kheria
     

07 Feb, 2014

1 commit

  • The intent of this new field in the directory entry is to
    allow a subsequent lookup to know how many blocks, which
    are contiguous with the inode, contain metadata which relates
    to the inode. This will then allow the issuing of a single
    read to read these blocks, rather than reading the inode
    first, and then issuing a second read for the metadata.

    This only works under some fairly strict conditions, since
    we do not have back pointers from inodes to directory entries
    we must ensure that the blocks referenced in this way will
    always belong to the inode.

    This rules out being able to use this system for indirect
    blocks, as these can change as a result of truncate/rewrite.

    So the idea here is to restrict this to xattr blocks only
    for the time being. For most inodes, that means only a
    single block. Also, when using ACLs and/or SELinux or
    other LSMs, these will be added at inode creation time
    so that they will be contiguous with the inode on disk and
    also will almost always be needed when we read the inode in
    for permissions checks.

    Once an xattr block for an inode is allocated, it will never
    change until the inode is deallocated.

    This patch adds the new field, a further patch will add the
    readahead in due course.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

06 Feb, 2014

2 commits

  • This patch causes GFS2 to lock the i_mutex during fallocate. It
    also switches from using a dinode's inode glock to using a local
    holder like the other GFS2 i_operations.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • GFS2 has carried what is more or less a copy of the
    write_cache_pages() for some time. It seems that this
    copy has slipped behind the core code over time. This
    patch brings it back uptodate, and in addition adds the
    tracepoint which would otherwise be missing.

    We could go further, and eliminate some or all of the
    code duplication here. The issue is that if we do that,
    then the function we need to split out from the existing
    write_cache_pages(), which will look a lot like
    gfs2_jdata_write_pagevec(), would land up putting quite a
    lot of extra variables on the stack. I know that has been
    a problem in the past in the writeback code path, which
    is why I've hesitated to do it here.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

04 Feb, 2014

1 commit

  • This is another step towards improving the allocation of xattr
    blocks at inode allocation time. Here we take advantage of
    Christoph's recent work on ACLs to allocate a block for the
    xattrs early if we know that we will be adding ACLs to the
    inode later on. The advantage of that is that it is much
    more likely that we'll get a contiguous run of two blocks
    where the first is the inode and the second is the xattr block.

    We still have to fall back to the original system in case we
    don't get the requested two contiguous blocks, or in case the
    ACLs are too large to fit into the block.

    Future patches will move more of the ACL setting code further
    up the gfs2_inode_create() function. Also, I'd like to be
    able to do the same thing with the xattrs from LSMs in
    due course, too. That way we should be able to slowly reduce
    the number of independent transactions, at least in the
    most common cases.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Feb, 2014

1 commit

  • When we do a flush of the AIL list, we are writing out what is
    likely to be a lot of small I/Os, which are possibly in an order
    which is not ideal performance-wise. Since this is done by calling
    filemap_fdatatwrite for each individual inode's address space there
    is no overall plugging going on.

    In addition to that, we do not always wait for AIL i/o when we flush
    it, so that it is possible for things to get left behind on the queue.
    By adding explicit plugging here, we reduce the chances of this
    being an issues. A quick test using the AIL flush tracepoint shows a
    small, but measurable improvement.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

31 Jan, 2014

1 commit

  • Pull core block IO changes from Jens Axboe:
    "The major piece in here is the immutable bio_ve series from Kent, the
    rest is fairly minor. It was supposed to go in last round, but
    various issues pushed it to this release instead. The pull request
    contains:

    - Various smaller blk-mq fixes from different folks. Nothing major
    here, just minor fixes and cleanups.

    - Fix for a memory leak in the error path in the block ioctl code
    from Christian Engelmayer.

    - Header export fix from CaiZhiyong.

    - Finally the immutable biovec changes from Kent Overstreet. This
    enables some nice future work on making arbitrarily sized bios
    possible, and splitting more efficient. Related fixes to immutable
    bio_vecs:

    - dm-cache immutable fixup from Mike Snitzer.
    - btrfs immutable fixup from Muthu Kumar.

    - bio-integrity fix from Nic Bellinger, which is also going to stable"

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
    xtensa: fixup simdisk driver to work with immutable bio_vecs
    block/blk-mq-cpu.c: use hotcpu_notifier()
    blk-mq: for_each_* macro correctness
    block: Fix memory leak in rw_copy_check_uvector() handling
    bio-integrity: Fix bio_integrity_verify segment start bug
    block: remove unrelated header files and export symbol
    blk-mq: uses page->list incorrectly
    blk-mq: use __smp_call_function_single directly
    btrfs: fix missing increment of bi_remaining
    Revert "block: Warn and free bio if bi_end_io is not set"
    block: Warn and free bio if bi_end_io is not set
    blk-mq: fix initializing request's start time
    block: blk-mq: don't export blk_mq_free_queue()
    block: blk-mq: make blk_sync_queue support mq
    block: blk-mq: support draining mq queue
    dm cache: increment bi_remaining when bi_end_io is restored
    block: fixup for generic bio chaining
    block: Really silence spurious compiler warnings
    block: Silence spurious compiler warnings
    block: Kill bio_pair_split()
    ...

    Linus Torvalds
     

29 Jan, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
    assorted cleanups and fixes all over the place...

    There will be another pile later this week"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
    __dentry_path() fixes
    vfs: Remove second variable named error in __dentry_path
    vfs: Is mounted should be testing mnt_ns for NULL or error.
    Fix race when checking i_size on direct i/o read
    hfsplus: remove can_set_xattr
    nfsd: use get_acl and ->set_acl
    fs: remove generic_acl
    nfs: use generic posix ACL infrastructure for v3 Posix ACLs
    gfs2: use generic posix ACL infrastructure
    jfs: use generic posix ACL infrastructure
    xfs: use generic posix ACL infrastructure
    reiserfs: use generic posix ACL infrastructure
    ocfs2: use generic posix ACL infrastructure
    jffs2: use generic posix ACL infrastructure
    hfsplus: use generic posix ACL infrastructure
    f2fs: use generic posix ACL infrastructure
    ext2/3/4: use generic posix ACL infrastructure
    btrfs: use generic posix ACL infrastructure
    fs: make posix_acl_create more useful
    fs: make posix_acl_chmod more useful
    ...

    Linus Torvalds
     

26 Jan, 2014

3 commits


18 Jan, 2014

1 commit

  • 0d0d110720d7960b77c03c9f2597faaff4b484ae asserts that "d_splice_alias()
    can't return error unless it was given an IS_ERR(inode)".

    That was true of the implementation of d_splice_alias, but this is
    really a problem with d_splice_alias: at a minimum it should be able to
    return -ELOOP in the case where inserting the given dentry would cause a
    directory loop.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Steven Whitehouse

    J. Bruce Fields
     

16 Jan, 2014

1 commit

  • This is a small cleanup to function gfs2_rgrp_go_lock so that it
    uses rgd instead of its more complicated twin.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson