24 Jul, 2012

1 commit

  • Pull the big VFS changes from Al Viro:
    "This one is *big* and changes quite a few things around VFS. What's in there:

    - the first of two really major architecture changes - death to open
    intents.

    The former is finally there; it was a very long time in the making, but
    with Miklos getting through the really hard and messy final push in
    fs/namei.c, we finally have it. Unlike his variant, this one
    doesn't introduce struct opendata; what we have instead is
    ->atomic_open() taking preallocated struct file * and passing
    everything via its fields.

    Instead of returning struct file *, it returns -E... on error, 0
    on success and 1 in "deal with it yourself" case (e.g. symlink
    found on server, etc.).

    See comments before fs/namei.c:atomic_open(). That made a lot of
    goodies finally possible and quite a few are in that pile:
    ->lookup(), ->d_revalidate() and ->create() do not get struct
    nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
    flags instead, ->create() gets "do we want it exclusive" flag.

    With the introduction of new helper (kern_path_locked()) we are rid
    of all struct nameidata instances outside of fs/namei.c; it's still
    visible in namei.h, but not for long. Come the next cycle,
    declaration will move either to fs/internal.h or to fs/namei.c
    itself. [me, miklos, hch]

    - The second major change: behaviour of final fput(). Now we have
    __fput() done without any locks held by caller *and* not from deep
    in call stack.

    That obviously lifts a lot of constraints on the locking in there.
    Moreover, it's legal now to call fput() from atomic contexts (which
    has immediately simplified life for aio.c). We also don't need
    anti-recursion logic in __scm_destroy() anymore.

    There is a price, though - the damn thing has become partially
    asynchronous. For fput() from normal process we are guaranteed
    that pending __fput() will be done before the caller returns to
    userland, exits or gets stopped for ptrace.

    For kernel threads and atomic contexts it's done via
    schedule_work(), so theoretically we might need a way to make sure
    it's finished; so far only one such place had been found, but there
    might be more.

    There's flush_delayed_fput() (do all pending __fput()) and there's
    __fput_sync() (fput() analog doing __fput() immediately). I hope
    we won't need them often; see warnings in fs/file_table.c for
    details. [me, based on task_work series from Oleg merged last
    cycle]

    - sync series from Jan

    - large part of "death to sync_supers()" work from Artem; the only
    bits missing here are exofs and ext4 ones. As far as I understand,
    those are going via the exofs and ext4 trees resp.; once they are
    in, we can put ->write_super() to rest, along with the thread
    calling it.

    - preparatory bits from unionmount series (from dhowells).

    - assorted cleanups and fixes all over the place, as usual.

    This is not the last pile for this cycle; there's at least jlayton's
    ESTALE work and fsfreeze series (the latter - in dire need of fixes,
    so I'm not sure it'll make the cut this cycle). I'll probably throw
    symlink/hardlink restrictions stuff from Kees into the next pile, too.
    Plus there's a lot of misc patches I hadn't thrown into that one -
    it's large enough as it is..."

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
    ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
    btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
    switch dentry_open() to struct path, make it grab references itself
    spufs: shift dget/mntget towards dentry_open()
    zoran: don't bother with struct file * in zoran_map
    ecryptfs: don't reinvent the wheels, please - use struct completion
    don't expose I_NEW inodes via dentry->d_inode
    tidy up namei.c a bit
    unobfuscate follow_up() a bit
    ext3: pass custom EOF to generic_file_llseek_size()
    ext4: use core vfs llseek code for dir seeks
    vfs: allow custom EOF in generic_file_llseek code
    vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
    vfs: Remove unnecessary flushing of block devices
    vfs: Make sys_sync writeout also block device inodes
    vfs: Create function for iterating over block devices
    vfs: Reorder operations during sys_sync
    quota: Move quota syncing to ->sync_fs method
    quota: Split dquot_quota_sync() to writeback and cache flushing part
    vfs: Move noop_backing_dev_info check from sync into writeback
    ...

    Linus Torvalds
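
    A minimal userspace sketch of the deferred final fput() described in this pull, assuming a plain linked list in place of the kernel's task_work and schedule_work() machinery. struct fake_file, sketch_fput() and sketch_flush_delayed_fput() are hypothetical names, not the kernel API: dropping the last reference only queues the release, and a later flush step runs every pending release, roughly the role flush_delayed_fput() plays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; not the kernel's layout. */
struct fake_file {
    int refcount;
    bool released;            /* set once the "__fput()" work has run */
    struct fake_file *next;   /* link in the pending-release list */
};

/* Files whose last reference is gone but whose teardown has not run yet. */
static struct fake_file *delayed_fput_list;

/* Stand-in for __fput(): the real teardown, run with no caller locks held. */
static void sketch_real_fput(struct fake_file *f)
{
    f->released = true;
}

/* Dropping the last reference only queues the file, so no teardown happens here. */
static void sketch_fput(struct fake_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0) {
        f->next = delayed_fput_list;
        delayed_fput_list = f;
    }
}

/* Analogue of flush_delayed_fput(): run every pending release now. */
static void sketch_flush_delayed_fput(void)
{
    while (delayed_fput_list != NULL) {
        struct fake_file *f = delayed_fput_list;
        delayed_fput_list = f->next;
        sketch_real_fput(f);
    }
}
```

    In this model the caller of sketch_fput() never runs the teardown itself, which is what lets the real fput() be called from atomic contexts; the cost is that the release is no longer synchronous.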
     

23 Jul, 2012

1 commit


14 Jul, 2012

6 commits

  • boolean "does it have to be exclusive?" flag is passed instead;
    Local filesystem should just ignore it - the object is guaranteed
    not to be there yet.

    Signed-off-by: Al Viro

    Al Viro
     
  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     
  • xfs_bdstrat_cb only adds a check for a shutdown filesystem over
    xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
    filesystem a little earlier. In addition the shutdown handling in
    xfs_bdstrat_cb is not very suitable for this caller.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • If the b_iodone handler is run in calling context in xfs_buf_iorequest we
    can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
    into xfs_buf_iorequest because an I/O error happened, which keeps calling
    back into xfs_buf_iorequest. This chain will usually not take long
    because the filesystem gets shut down because of log I/O errors, but even
    over a short time it can cause stack overflows if run on the same context.

    As a short term workaround make sure we always call the iodone handler in
    workqueue context.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
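
    The recursion described in this commit, and the workqueue workaround, can be modelled in userspace. This is a sketch under stated assumptions: struct fake_buf, sketch_iorequest(), sketch_iodone() and the fixed-size queue are hypothetical stand-ins, and a simulated I/O error makes the completion handler resubmit the buffer. Because submission only queues the completion and a flat loop runs it, the handler never nests, however many retries occur.

```c
#include <assert.h>

/* Hypothetical buffer: each completion "fails" until retries_left drops below 1. */
struct fake_buf {
    int retries_left;
    int completions;
};

/* Nesting depth of completion handlers, and the maximum ever observed. */
static int depth, max_depth;

/* Small ring standing in for the completion workqueue. */
#define QLEN 16
static struct fake_buf *queue[QLEN];
static int q_head, q_tail;

static void enqueue(struct fake_buf *bp)
{
    assert(q_tail - q_head < QLEN);
    queue[q_tail++ % QLEN] = bp;
}

static void sketch_iorequest(struct fake_buf *bp);

/* Completion handler: on a simulated I/O error it resubmits the buffer. */
static void sketch_iodone(struct fake_buf *bp)
{
    bp->completions++;
    if (bp->retries_left-- > 0)
        sketch_iorequest(bp);
}

/* Submission never calls the handler in the caller's context; it only queues it. */
static void sketch_iorequest(struct fake_buf *bp)
{
    enqueue(bp);
}

/* The "workqueue" runs queued completions one at a time from a flat loop. */
static void sketch_run_workqueue(void)
{
    while (q_head != q_tail) {
        struct fake_buf *bp = queue[q_head++ % QLEN];
        depth++;
        if (depth > max_depth)
            max_depth = depth;
        sketch_iodone(bp);
        depth--;
    }
}
```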
     
  • Almost all metadata allocations come from shallow stack usage
    situations. Avoid the overhead of switching the allocation to a
    workqueue as we are not in danger of running out of stack when
    making these allocations. Metadata allocations are already marked
    through the args that are passed down, so this is trivial to do.

    Signed-off-by: Dave Chinner
    Reported-by: Mel Gorman
    Tested-by: Mel Gorman
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The current cursor is reallocated when retrying the allocation, so
    the existing cursor needs to be destroyed in both the restart and
    the failure cases.

    Signed-off-by: Dave Chinner
    Tested-by: Mike Snitzer
    Signed-off-by: Ben Myers

    Dave Chinner
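
    The cleanup rule in this commit (each pass allocates a new cursor, so the old one must be destroyed on restart as well as on failure) is sketched below with a hypothetical cursor type and a live-cursor counter that exposes leaks. None of these names are XFS functions; sketch_alloc_extent() only mimics the restart/failure control flow.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical btree cursor; only its allocation lifetime matters here. */
struct fake_cursor {
    int dummy;
};

static int live_cursors;  /* cursors allocated but not yet freed */

static struct fake_cursor *cursor_init(void)
{
    struct fake_cursor *cur = malloc(sizeof(*cur));
    if (cur)
        live_cursors++;
    return cur;
}

static void cursor_del(struct fake_cursor *cur)
{
    free(cur);
    live_cursors--;
}

/*
 * Simulated allocation: succeeds on attempt 'succeed_on' (0 = never),
 * restarting at most until 'max_tries' attempts have been made.
 */
static int sketch_alloc_extent(int max_tries, int succeed_on)
{
    int attempt = 0;
    struct fake_cursor *cur;

restart:
    attempt++;
    cur = cursor_init();      /* a new cursor is allocated on every pass */
    if (!cur)
        return -1;

    if (attempt == succeed_on) {
        cursor_del(cur);      /* success */
        return 0;
    }
    if (attempt < max_tries) {
        cursor_del(cur);      /* restart: destroy before reallocating */
        goto restart;
    }
    cursor_del(cur);          /* failure: destroy before giving up */
    return -1;
}
```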
     

22 Jun, 2012

5 commits

  • Rename the XFS log structure to xlog to help the crash utility distinguish it
    from the other logs in Linux.

    Signed-off-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Mark Tinguely
     
  • Revert commit 1307bbd, which uses the s_umount semaphore to provide
    exclusion between xfs_sync_worker and unmount, in favor of shutting down
    the sync worker before freeing the log in xfs_log_unmount. This is a
    cleaner way of resolving the race between xfs_sync_worker and unmount
    than using s_umount.

    Signed-off-by: Ben Myers
    Reviewed-by: Mark Tinguely
    Reviewed-by: Dave Chinner

    Ben Myers
     
  • Commit de1cbee which removed b_file_offset in favor of b_bn introduced a bug
    causing xfs_buf_allocate_memory() to overestimate the number of necessary
    pages. The problem is that xfs_buf_alloc() sets b_bn to -1 and thus effectively
    every buffer is straddling a page boundary which causes
    xfs_buf_allocate_memory() to allocate two pages and use vmalloc() for access
    which is unnecessary.

    Dave says xfs_buf_alloc() doesn't need to set b_bn to -1 anymore since the
    buffer is inserted into the cache only after being fully initialized now.
    So just make xfs_buf_alloc() fill in proper block number from the beginning.

    CC: David Chinner
    Signed-off-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Jan Kara
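
    The arithmetic behind the overestimate can be checked with a short sketch. It assumes 4 KiB pages and 512-byte basic blocks, and pages_spanned() is a hypothetical helper rather than the code in xfs_buf_allocate_memory(): a block number of -1 places the buffer's first byte 512 bytes before a page boundary, so even a one-page buffer appears to straddle two pages.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB pages, 512-byte basic blocks. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_BBSHIFT    9

/*
 * Hypothetical helper: pages needed for 'len' bytes when the first byte
 * sits at the in-page offset implied by block number 'bn'.
 */
static unsigned pages_spanned(uint64_t bn, uint64_t len)
{
    uint64_t byte_off = bn << SKETCH_BBSHIFT;                 /* blocks -> bytes */
    uint64_t in_page = byte_off & (SKETCH_PAGE_SIZE - 1);     /* offset in first page */

    return (unsigned)((in_page + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
}
```

    For example, pages_spanned(8, 4096) is 1, while pages_spanned((uint64_t)-1, 4096) is 2 because the start lands at offset 3584 within its page.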
     
  • When we fail to find a matching extent near the requested extent
    specification during a left-right distance search in
    xfs_alloc_ag_vextent_near, we fail to free the original cursor that
    we used to look up the XFS_BTNUM_CNT tree and hence leak it.

    Reported-by: Chris J Arges
    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • An inode in the AIL can be flush locked and marked stale if
    a cluster free transaction occurs at the right time. The
    inode item is then marked as flushing, which causes xfsaild
    to spin and leaves the filesystem stalled. This is
    reproduced by running xfstests 273 in a loop for an
    extended period of time.

    Check for stale inodes before the flush lock. This marks
    the inode as pinned, leads to a log flush and allows the
    filesystem to proceed.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Brian Foster
     

21 Jun, 2012

2 commits

  • Fengguang reports:

    [ 780.529603] XFS (vdd): Ending clean mount
    [ 781.454590] ODEBUG: object is on stack, but not annotated
    [ 781.455433] ------------[ cut here ]------------
    [ 781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1()
    [ 781.455433] Hardware name: Bochs
    [ 781.455433] Modules linked in:
    [ 781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51
    [ 781.455433] Call Trace:
    [ 781.455433] [] warn_slowpath_common+0x83/0x9b
    [ 781.455433] [] warn_slowpath_null+0x1a/0x1c
    [ 781.455433] [] __debug_object_init+0x173/0x1f1
    [ 781.455433] [] debug_object_init+0x14/0x16
    [ 781.455433] [] __init_work+0x20/0x22
    [ 781.455433] [] xfs_alloc_vextent+0x6c/0xd5

    Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK.

    Reported-by: Wu Fengguang
    Signed-off-by: Jie Liu
    Signed-off-by: Ben Myers

    Jeff Liu
     
  • On filesystems with a block size smaller than PAGE_SIZE we currently have
    a problem with unwritten extents. If we have a multi-block page for
    which an unwritten extent has been allocated, and only some of the
    buffers have been written to, and they are not contiguous, we can expose
    stale data from disk in the blocks between the writes after extent
    conversion.

    Example of a page with unwritten and real data:
    buffer  content  b_state
    0       empty    0
    1       DATA     0x1023 Uptodate,Dirty,Mapped,Unwritten
    2       DATA     0x1023 Uptodate,Dirty,Mapped,Unwritten
    3       empty    0
    4       empty    0
    5       DATA     0x1023 Uptodate,Dirty,Mapped,Unwritten
    6       DATA     0x1023 Uptodate,Dirty,Mapped,Unwritten
    7       empty    0

    Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7
    empty. Currently buffers 1, 2, 5, and 6 are added to a single ioend,
    and when IO has completed, extent conversion creates a real extent from
    block 1 through block 6, leaving 0 and 7 unwritten. However buffers 3
    and 4 were not written to disk, so stale data is exposed from those
    blocks on a subsequent read.

    Fix this by setting iomap_valid = 0 when we find a buffer that is not
    Uptodate. This ensures that buffers 5 and 6 are not added to the same
    ioend as buffers 1 and 2. Later these blocks will be converted into two
    separate real extents, leaving the blocks in between unwritten.

    Signed-off-by: Alain Renaud
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Alain Renaud
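
    The effect of the fix on the example page above can be sketched as a grouping step: runs of uptodate buffers become separate ranges, and any buffer that is not uptodate ends the current range (the role iomap_valid = 0 plays in the fix). build_ranges() and struct range are hypothetical; the real code builds ioends in the XFS writeback path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inclusive range of buffer indices within one page. */
struct range {
    int first, last;
};

/*
 * Group consecutive uptodate buffers into ranges. A buffer that is not
 * uptodate closes the current range, so data on either side of a gap
 * never ends up in the same range. Returns the number of ranges.
 */
static int build_ranges(const bool *uptodate, int nbufs,
                        struct range *out, int max_out)
{
    int n = 0;
    bool open = false;

    for (int i = 0; i < nbufs; i++) {
        if (!uptodate[i]) {
            open = false;            /* gap: the next data starts a new range */
            continue;
        }
        if (open) {
            out[n - 1].last = i;     /* extend the current range */
        } else {
            assert(n < max_out);
            out[n].first = out[n].last = i;
            n++;
            open = true;
        }
    }
    return n;
}
```

    For the page in the example, buffers 1-2 and 5-6 come out as two separate ranges, so blocks 3 and 4 stay unwritten.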
     

02 Jun, 2012

1 commit

  • Btrfs has to make sure we have space to allocate new blocks in order to modify
    the inode, so updating time can fail. We've gotten around this by having our
    own file_update_time but this is kind of a pain, and Christoph has indicated he
    would like to make xfs do something different with atime updates. So introduce
    ->update_time, where we will deal with i_version and a/m/c time updates and
    indicate which changes need to be made. The normal version just does what it
    has always done, updates the time and marks the inode dirty, and then
    filesystems can choose to do something different.

    I've gone through all of the users of file_update_time and made them check for
    errors with the exception of the fault code since it's complicated and I wasn't
    quite sure what to do there, also Jan is going to be pushing the file time
    updates into page_mkwrite for those who have it so that should satisfy btrfs and
    make it not a big deal to check the file_update_time() return code in the
    generic fault path. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
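
    The dispatch pattern this commit describes (an optional ->update_time method with a generic fallback, and callers that check the result) is sketched below. All names are hypothetical and the timestamps are simplified to plain integers; the real inode_operations method and its flags are not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flags naming which fields an update should touch. */
enum { SK_ATIME = 1, SK_MTIME = 2, SK_CTIME = 4, SK_VERSION = 8 };

struct sk_inode;

/* Hypothetical ops table in which update_time is optional. */
struct sk_inode_ops {
    int (*update_time)(struct sk_inode *inode, long long now, int flags);
};

struct sk_inode {
    const struct sk_inode_ops *ops;
    long long atime, mtime, ctime;
    unsigned long version;
    int dirty;
};

/* Default behaviour: update the requested fields and mark the inode dirty. */
static int sk_generic_update_time(struct sk_inode *inode, long long now, int flags)
{
    if (flags & SK_ATIME)
        inode->atime = now;
    if (flags & SK_MTIME)
        inode->mtime = now;
    if (flags & SK_CTIME)
        inode->ctime = now;
    if (flags & SK_VERSION)
        inode->version++;
    inode->dirty = 1;
    return 0;
}

/* Use the filesystem's method if it has one, otherwise the default. */
static int sk_update_time(struct sk_inode *inode, long long now, int flags)
{
    if (inode->ops && inode->ops->update_time)
        return inode->ops->update_time(inode, now, flags);
    return sk_generic_update_time(inode, now, flags);
}

/* Example override: a filesystem that cannot reserve space for the update. */
static int sk_nospace_update_time(struct sk_inode *inode, long long now, int flags)
{
    (void)inode;
    (void)now;
    (void)flags;
    return -ENOSPC;
}

static const struct sk_inode_ops sk_nospace_ops = {
    .update_time = sk_nospace_update_time,
};
```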
     

30 May, 2012

2 commits


29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

21 May, 2012

3 commits

  • To enable easy tracing of the location of log forces and the
    frequency of them via perf, add a pair of trace points to the log
    force functions. This will help debug where excessive log forces
    are being issued from by simple perf commands like:

    # ~/perf/perf top -e xfs:xfs_log_force -G -U

    Which gives this sort of output:

    Events: 141 xfs:xfs_log_force
    - 100.00% [kernel] [k] xfs_log_force
    - xfs_log_force
    87.04% xfsaild
    kthread
    kernel_thread_helper
    - 12.87% xfs_buf_lock
    _xfs_buf_find
    xfs_buf_get
    xfs_trans_get_buf
    xfs_da_do_buf
    xfs_da_get_buf
    xfs_dir2_data_init
    xfs_dir2_leaf_addname
    xfs_dir_createname
    xfs_create
    xfs_vn_mknod
    xfs_vn_create
    vfs_create
    do_last.isra.41
    path_openat
    do_filp_open
    do_sys_open
    sys_open
    system_call_fastpath

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Note xfs_iget can be called while holding a locked agi buffer. If
    it goes into memory reclaim then inode teardown may try to lock the
    same buffer. Prevent the deadlock by calling radix_tree_preload
    with GFP_NOFS.

    Signed-off-by: Peter Watkins
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Peter Watkins
     
  • xfstest 270 was causing quota reservations way beyond what was sane
    (ten to hundreds of TB) for a 4GB filesystem. There's a sign problem
    in the error handling path of xfs_bmapi_reserve_delalloc() because
    xfs_trans_unreserve_quota_nblks() simply negates the value passed -
    which doesn't work for an unsigned variable. This causes
    reservations of close to 2^32 blocks instead of removing a
    reservation of a handful of blocks.

    Fix the same problem in the other xfs_trans_unreserve_quota_nblks()
    callers where unsigned integer variables are used, too.

    Signed-off-by: Dave Chinner
    Reviewed-by: Eric Sandeen
    Signed-off-by: Ben Myers

    Dave Chinner
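
    The sign problem is easy to reproduce in plain C. In this sketch, reserve_quota_nblks() and the unreserve_quota_nblks() macro are hypothetical models of a helper that applies a signed delta and a wrapper that simply negates its argument: negating a 32-bit unsigned value wraps around before it is widened, which yields a reservation of close to 2^32 blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical running total of reserved blocks. */
static int64_t reserved_blocks;

/* Applies a signed delta: positive reserves, negative unreserves. */
static void reserve_quota_nblks(int64_t nblks)
{
    reserved_blocks += nblks;
}

/* Modelled on a wrapper that just negates its argument. */
#define unreserve_quota_nblks(n) reserve_quota_nblks(-(n))
```

    With uint32_t n = 5, unreserve_quota_nblks(n) adds 4294967291 to the counter because -(n) is computed in unsigned 32-bit arithmetic, whereas unreserve_quota_nblks((int64_t)n) subtracts 5 as intended. Widening to a signed type before the negation is one way to express the fix.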
     

16 May, 2012

1 commit

  • xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing
    work during mount and unmount. This flag can be cleared by unmount
    after the xfs_sync_worker checks it but before the work is completed.
    This has caused crashes in the completion handler for the dummy
    transaction committed by xfs_sync_worker:

    PID: 27544 TASK: ffff88013544e040 CPU: 3 COMMAND: "kworker/3:0"
    #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
    #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
    #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
    #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
    #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
    #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
    #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
    #7 [ffff88016fdffc60] page_fault at ffffffff813ac635
    [exception RIP: xlog_get_lowest_lsn+0x30]
    RIP: ffffffffa04a9910 RSP: ffff88016fdffd10 RFLAGS: 00010246
    RAX: ffffc90014e48000 RBX: ffff88014d879980 RCX: ffff88014d879980
    RDX: ffff8802214ee4c0 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff88016fdffd10 R8: ffff88014d879a80 R9: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802214ee400
    R13: ffff88014d879980 R14: 0000000000000000 R15: ffff88022fd96605
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
    #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]

    Protect xfs_sync_worker by using the s_umount semaphore at the read
    level to provide exclusion with unmount while work is progressing.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Ben Myers
     

15 May, 2012

17 commits

  • This patch adds lseek(2) SEEK_DATA/SEEK_HOLE functionality to xfs.

    Signed-off-by: Jie Liu
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Jeff Liu
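
    From userspace, this support is reached through lseek(2). The sketch below walks a file's data segments by alternating SEEK_DATA and SEEK_HOLE; list_data_segments() is a hypothetical helper. On filesystems without native support, the kernel's generic fallback reports the whole file as one data segment with the hole at EOF, so the loop still terminates.

```c
#define _GNU_SOURCE          /* exposes SEEK_DATA / SEEK_HOLE in glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Print each data segment of an open file as [start, end).
 * Returns the number of segments found, or -1 on error.
 */
static int list_data_segments(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0;
    int segments = 0;

    if (end < 0)
        return -1;
    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? segments : -1;  /* ENXIO: no data past pos */
        off_t hole = lseek(fd, data, SEEK_HOLE);    /* end of this data segment */
        if (hole < 0)
            return -1;
        printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
        segments++;
        pos = hole;
    }
    return segments;
}
```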
     
  • Commit e459df5, 'xfs: move busy extent handling to it's own file'
    moved some code from xfs_alloc.c into xfs_extent_busy.c for
    convenience in userspace code merges. One of the functions moved is
    xfs_extent_busy_trim (formerly xfs_alloc_busy_trim) which is defined
    STATIC. Unfortunately this function is still used in xfs_alloc.c, and
    this results in an undefined symbol in xfs.ko.

    Make xfs_extent_busy_trim not static and add its prototype to
    xfs_extent_busy.h.

    Signed-off-by: Ben Myers
    Reviewed-by: Mark Tinguely

    Ben Myers
     
  • Rather than specifying XBF_MAPPED for almost all buffers, introduce
    XBF_UNMAPPED for the couple of users that use unmapped buffers.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • When we fail to mount the log in xfs_mountfs(), we tear down all the
    infrastructure we have already allocated. However, the process of
    mounting the log may have progressed to the point of reading,
    caching and modifying buffers in memory. Hence before we can free
    all the infrastructure, we have to flush and remove all the buffers
    from memory.

    Problem first reported by Eric Sandeen, later a different incarnation
    was reported by Ben Myers.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Recent event tracing during a debugging session showed that flags
    that define the IO type for a buffer are leaking into the flags on
    the buffer incorrectly. Fix the flag exclusion mask in
    xfs_buf_alloc() to avoid problems that may be caused by such
    leakage.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
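
    The idea of a flag exclusion mask can be sketched as follows. The flag names and bit values are hypothetical, not the real XBF_* definitions; the point is only that per-request I/O-type bits are cleared before the caller's flags are stored on the buffer.

```c
#include <assert.h>

/* Hypothetical flag bits; the real XBF_* values and names differ. */
#define SK_READ     (1u << 0)   /* I/O type: meaningful for one request only */
#define SK_WRITE    (1u << 1)   /* I/O type: meaningful for one request only */
#define SK_ASYNC    (1u << 4)   /* I/O type: meaningful for one request only */
#define SK_UNMAPPED (1u << 8)   /* persistent property of the buffer */

#define SK_IO_FLAGS (SK_READ | SK_WRITE | SK_ASYNC)

/* Keep only the flags that describe the buffer, not the I/O request. */
static unsigned buf_alloc_flags(unsigned caller_flags)
{
    return caller_flags & ~SK_IO_FLAGS;
}
```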
     
  • With the removal of xfs_rw.h and other changes over time, xfs_bit.h
    is being included in many files that don't actually need it. Clean
    up the includes as necessary.

    Also move the only-used-once xfs_ialloc_find_free() static inline
    function out of a header file that is widely included to reduce
    the number of needless dependencies on xfs_bit.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_do_force_shutdown now is the only thing in xfs_rw.c. There is no
    need to keep it in its own file anymore, so move it to xfs_fsops.c
    next to xfs_fs_goingdown() and kill xfs_rw.c.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The only thing left in xfs_rw.h is a function prototype for an inode
    function. Move that to xfs_inode.h, and kill xfs_rw.h.

    Also move the function implementing the prototype from xfs_rw.c to
    xfs_inode.c so we only have one function left in xfs_rw.c

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • This is the only remaining useful function in xfs_rw.h, so move it
    to a header file responsible for block mapping functions that the
    callers already include. Soon we can get rid of xfs_rw.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Now that the busy extent tracking has been moved out of the
    allocation files, clean up the namespace it uses to
    "xfs_extent_busy" rather than a mix of "xfs_busy" and
    "xfs_alloc_busy".

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • To make it easier to handle userspace code merges, move all the busy
    extent handling out of the allocation code and into its own file.
    The userspace code does not need the busy extent code, so this
    simplifies the merging of the kernel code into the userspace
    xfsprogs library.

    Because the busy extent code has been almost completely rewritten
    over the past couple of years, also update the copyright on this new
    file to include the authors that made all those changes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Untangle the header file includes a bit by moving the definition of
    xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
    xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
    xfs_ag.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
    fsstress has a particularly effective way of stopping debug XFS
    kernels. We keep seeing assert failures due to finding delayed
    allocation extents where there should be none. This shows up when
    extracting extent maps while we are holding all the locks we should
    be holding to prevent races, so these errors really make no sense.

    After checking that fsstress does not use mmap, it occurred to me
    that fsstress uses something that no sane application uses - the
    XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces
    do allocation of blocks beyond EOF without using preallocation, and
    then call setattr to extend and zero the allocated blocks.

    The problem here is that this is a buffered write, and hence the
    allocation is a delayed allocation. Unlike the buffered IO path, the
    allocation and zeroing are not serialised using the IOLOCK. Hence
    the ALLOCSP operation can race with operations holding the iolock to
    prevent buffered IO operations from occurring.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
    This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
    avoid recursion through memory reclaim back into the filesystem.

    All the blocking get calls in growfs occur inside a transaction, even though
    they are not part of the transaction, so all allocation will be GFP_NOFS due to
    the task flag PF_TRANS being set. The blocking read calls occur during log
    recovery, so they will probably be unaffected by converting to GFP_NOFS
    allocations.

    Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_read_buf() is effectively the same as xfs_trans_read_buf() when called
    outside a transaction context. The error handling is slightly different in that
    xfs_read_buf stales the errored buffer it gets back, but there is probably good
    reason for xfs_trans_read_buf() to do this as well.

    Hence update xfs_trans_read_buf() to the same error handling as xfs_read_buf(),
    and convert all the callers of xfs_read_buf() to use the former function. We can
    then remove xfs_read_buf().

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Buffers are always returned locked from the lookup routines. Hence
    we don't need to tell the lookup routines to return locked buffers,
    or to try to lock them. Remove XBF_LOCK from all the callers and
    from internal buffer cache usage.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_buf_btoc and friends are simple macros that do basic block
    to page index conversion and vice versa. These aren't widely used,
    and we use open coded masking and shifting everywhere else. Hence
    remove the macros and open code the work they do.

    Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now
    incorrect - we are using pages directly and not the page cache, so
    use PAGE_{SIZE|MASK|SHIFT} instead.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
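
    The open-coded forms that replace such macros are plain shifts and masks. This sketch assumes 4 KiB pages and uses hypothetical names; SK_PAGE_MASK follows the kernel's PAGE_MASK convention of ~(PAGE_SIZE - 1).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages (PAGE_SHIFT = 12). */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))   /* same convention as PAGE_MASK */

/* Byte offset -> index of the page containing it. */
static uint64_t page_index(uint64_t off)      { return off >> SK_PAGE_SHIFT; }

/* Byte offset -> offset within its page. */
static uint64_t offset_in_page(uint64_t off)  { return off & ~SK_PAGE_MASK; }

/* Length in bytes -> number of pages needed, rounded up. */
static uint64_t pages_needed(uint64_t len)    { return (len + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT; }

/* Byte offset -> start of its page. */
static uint64_t page_align_down(uint64_t off) { return off & SK_PAGE_MASK; }
```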