31 Jul, 2012

2 commits

  • There are several entry points which dirty pages in a filesystem: mmap
    (handled by block_page_mkwrite()), buffered write (handled by
    __generic_file_aio_write()), splice write (generic_file_splice_write),
    truncate, and fallocate (the last two can dirty the last partial page and
    are handled inside each filesystem separately). Protect these places with
    sb_start_write() and sb_end_write().

    ->page_mkwrite() calls are particularly complex since they are called with
    mmap_sem held and thus we cannot use standard sb_start_write() due to lock
    ordering constraints. We solve the problem by using a special freeze protection
    sb_start_pagefault() which ranks below mmap_sem.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

13 Jul, 2012

1 commit

  • Commit 080399aaaf35 ("block: don't mark buffers beyond end of disk as
    mapped") exposed a bug in __getblk_slow that causes mount to hang as it
    loops infinitely waiting for a buffer that lies beyond the end of the
    disk to become uptodate.

    The problem was initially reported by Torsten Hilbrich here:

    https://lkml.org/lkml/2012/6/18/54

    and also reported independently here:

    http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511

    and then Richard W.M. Jones and Marcos Mello noted a few separate
    bugzillas also associated with the same issue. This patch has been
    confirmed to fix:

    https://bugzilla.redhat.com/show_bug.cgi?id=835019

    The main problem is here, in __getblk_slow:

    for (;;) {
            struct buffer_head * bh;
            int ret;

            bh = __find_get_block(bdev, block, size);
            if (bh)
                    return bh;

            ret = grow_buffers(bdev, block, size);
            if (ret < 0)
                    return NULL;
            if (ret == 0)
                    free_more_memory();
    }

    __find_get_block does not find the block, since it will not be marked as
    mapped, and so grow_buffers is called to fill in the buffers for the
    associated page. I believe the for (;;) loop is there primarily to
    retry in the case of memory pressure keeping grow_buffers from
    succeeding. However, we also continue to loop for other cases, like the
    block lying beyond the end of the disk. So, the fix I came up with is to
    only loop when grow_buffers fails due to memory allocation issues
    (return value of 0).
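
    The retry rule described above can be illustrated with a small userspace
    model. This is only a sketch: the model_* names and the struct are
    hypothetical stand-ins, not kernel code, and the model collapses the
    __find_get_block() lookup into the success return of the grow step.

```c
#include <assert.h>

/* Toy stand-in for a block device; every name here is hypothetical. */
struct model_dev {
    long nr_blocks;      /* device size in blocks */
    int  alloc_failures; /* allocations that fail before one succeeds */
    int  retries;        /* loop iterations taken, for inspection */
};

/*
 * Mimics the return convention described in the commit message:
 * < 0 hard error (block beyond end of device), 0 allocation failed,
 * > 0 success.
 */
static int model_grow_buffers(struct model_dev *dev, long block)
{
    assert(dev->nr_blocks >= 0);
    if (block >= dev->nr_blocks)
        return -1;
    if (dev->alloc_failures > 0) {
        dev->alloc_failures--;
        return 0;
    }
    return 1;
}

/* Returns 1 if the block became available, 0 (standing in for NULL) if not. */
static int model_getblk_slow(struct model_dev *dev, long block)
{
    for (;;) {
        int ret = model_grow_buffers(dev, block);

        dev->retries++;
        if (ret < 0)
            return 0;   /* hard failure: give up instead of looping forever */
        if (ret > 0)
            return 1;
        /* ret == 0: transient memory pressure, so go around again */
    }
}
```

    With two simulated allocation failures the model loops three times and
    succeeds; for a block at or past nr_blocks it returns after one pass.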

    The attached patch was tested by myself, Torsten, and Rich, and was
    found to resolve the problem in all cases.

    Signed-off-by: Jeff Moyer
    Reported-and-Tested-by: Torsten Hilbrich
    Tested-by: Richard W.M. Jones
    Reviewed-by: Josh Boyer
    Cc: Stable # 3.0+
    [ Jens is on vacation, taking this directly - Linus ]
    --
    Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     

31 May, 2012

1 commit


11 May, 2012

1 commit

  • Hi,

    We have a bug report open where a squashfs image mounted on ppc64 would
    exhibit errors due to trying to read beyond the end of the disk. It can
    easily be reproduced by doing the following:

    [root@ibm-p750e-02-lp3 ~]# ls -l install.img
    -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img
    [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test
    [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null
    dd: reading `/dev/loop0': Input/output error
    277376+0 records in
    277376+0 records out
    142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s

    In dmesg, you'll find the following:

    squashfs: version 4.0 (2009/01/31) Phillip Lougher
    [ 43.106012] attempt to access beyond end of device
    [ 43.106029] loop0: rw=0, want=277410, limit=277408
    [ 43.106039] Buffer I/O error on device loop0, logical block 138704
    [ 43.106053] attempt to access beyond end of device
    [ 43.106057] loop0: rw=0, want=277412, limit=277408
    [ 43.106061] Buffer I/O error on device loop0, logical block 138705
    [ 43.106066] attempt to access beyond end of device
    [ 43.106070] loop0: rw=0, want=277414, limit=277408
    [ 43.106073] Buffer I/O error on device loop0, logical block 138706
    [ 43.106078] attempt to access beyond end of device
    [ 43.106081] loop0: rw=0, want=277416, limit=277408
    [ 43.106085] Buffer I/O error on device loop0, logical block 138707
    [ 43.106089] attempt to access beyond end of device
    [ 43.106093] loop0: rw=0, want=277418, limit=277408
    [ 43.106096] Buffer I/O error on device loop0, logical block 138708
    [ 43.106101] attempt to access beyond end of device
    [ 43.106104] loop0: rw=0, want=277420, limit=277408
    [ 43.106108] Buffer I/O error on device loop0, logical block 138709
    [ 43.106112] attempt to access beyond end of device
    [ 43.106116] loop0: rw=0, want=277422, limit=277408
    [ 43.106120] Buffer I/O error on device loop0, logical block 138710
    [ 43.106124] attempt to access beyond end of device
    [ 43.106128] loop0: rw=0, want=277424, limit=277408
    [ 43.106131] Buffer I/O error on device loop0, logical block 138711
    [ 43.106135] attempt to access beyond end of device
    [ 43.106139] loop0: rw=0, want=277426, limit=277408
    [ 43.106143] Buffer I/O error on device loop0, logical block 138712
    [ 43.106147] attempt to access beyond end of device
    [ 43.106151] loop0: rw=0, want=277428, limit=277408
    [ 43.106154] Buffer I/O error on device loop0, logical block 138713
    [ 43.106158] attempt to access beyond end of device
    [ 43.106162] loop0: rw=0, want=277430, limit=277408
    [ 43.106166] attempt to access beyond end of device
    [ 43.106169] loop0: rw=0, want=277432, limit=277408
    ...
    [ 43.106307] attempt to access beyond end of device
    [ 43.106311] loop0: rw=0, want=277470, limit=2774

    Squashfs manages to read in the end block(s) of the disk during the
    mount operation. Then, when dd reads the block device, it leads to
    block_read_full_page being called with buffers that are beyond end of
    disk, but are marked as mapped. Thus, it would end up submitting read
    I/O against them, resulting in the errors mentioned above. I fixed the
    problem by modifying init_page_buffers to only set the buffer mapped if
    it fell inside of i_size.
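
    The numbers in the log above can be checked with a small userspace
    model of the "only map buffers inside the device" rule. It is a sketch
    under assumptions: a 1024-byte block size (inferred from logical block
    138704 lining up with the 277408-sector limit) and a 64 KiB page size
    (assumed for this ppc64 machine). All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_BUFS 64

/* Hypothetical stand-in for the per-page buffer state. */
struct model_page {
    int  nr_bufs;
    bool mapped[MODEL_MAX_BUFS];
};

/* Number of whole blocks on a device of size_bytes (the "max_block" idea). */
static long model_max_block(long long size_bytes, int block_size)
{
    assert(block_size > 0);
    return (long)(size_bytes / block_size);
}

/*
 * Mark a buffer as mapped only if its block number lies inside the device.
 * Returns how many of the page's buffers ended up mapped.
 */
static int model_init_page_buffers(struct model_page *page, long first_block,
                                   int nr_bufs, long max_block)
{
    int mapped = 0;

    assert(nr_bufs <= MODEL_MAX_BUFS);
    page->nr_bufs = nr_bufs;
    for (int i = 0; i < nr_bufs; i++) {
        page->mapped[i] = (first_block + i) < max_block;
        if (page->mapped[i])
            mapped++;
    }
    return mapped;
}
```

    For the 142032896-byte image, max_block is 138704. The last page starts
    at byte 142016512 (block 138688, which is also where dd stopped), so
    only its first 16 of 64 buffers lie inside the device, and block 138704
    is the first one left unmapped.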

    Cheers,
    Jeff

    Signed-off-by: Jeff Moyer
    Acked-by: Nick Piggin

    --

    Changes from v1->v2: re-used max_block, as suggested by Nick Piggin.
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

26 Apr, 2012

1 commit


29 Mar, 2012

1 commit

  • In several code paths, such as when unmounting a file system (but not
    only) we send an IPI to ask each cpu to invalidate its local LRU BHs.

    On multi-core systems with many cpus, a cpu may not have any LRU BH
    because it is idle or because it has not performed any file system
    accesses since the last invalidation (e.g. cpus crunching on high
    performance computing nodes that write results to shared memory, or that
    only use filesystems which do not use the bh layer). Interrupting such
    cpus can lead to loss of performance each time someone switches the KVM
    (the virtual keyboard and screen type, not the hypervisor) if it has a
    USB storage device plugged in.

    This patch attempts to only send an IPI to cpus that have LRU BH.
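
    The selection step can be pictured with the following userspace sketch,
    which builds a mask of cpus whose per-cpu LRU holds at least one entry.
    The model_* names, the 8-cpu limit and the 8-slot LRU are assumptions
    made for illustration; the real per-cpu data and IPI machinery are not
    shown in this log.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS  8
#define MODEL_LRU_SIZE 8

/* Hypothetical per-cpu LRU of buffer heads, modelled as opaque pointers. */
struct model_bh_lru {
    const void *bhs[MODEL_LRU_SIZE];
};

/* True if this cpu's LRU holds at least one buffer head. */
static bool model_has_bh_in_lru(const struct model_bh_lru *lru)
{
    for (int i = 0; i < MODEL_LRU_SIZE; i++)
        if (lru->bhs[i])
            return true;
    return false;
}

/*
 * Build a bitmask of the cpus that actually need the invalidation.
 * Returns the number of cpus selected; idle cpus are left alone.
 */
static int model_cpus_to_interrupt(const struct model_bh_lru *lrus,
                                   int nr_cpus, unsigned int *mask)
{
    int count = 0;

    assert(nr_cpus <= MODEL_NR_CPUS);
    *mask = 0;
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        if (model_has_bh_in_lru(&lrus[cpu])) {
            *mask |= 1u << cpu;
            count++;
        }
    }
    return count;
}
```

    If only cpus 2 and 5 hold entries, the mask has exactly bits 2 and 5
    set and the other six cpus would not be interrupted.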

    Signed-off-by: Gilad Ben-Yossef
    Acked-by: Peter Zijlstra
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gilad Ben-Yossef
     

29 Feb, 2012

1 commit


04 Jan, 2012

1 commit

  • Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
    kill_bdev as well, so brd doesn't have to open code it. Reduce
    buffer_head.h requirement accordingly.

    Removed a rather large comment from invalidate_bdev, as it looked a bit
    obsolete to bother moving. The small comment replacing it says enough.

    Signed-off-by: Nick Piggin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Al Viro
     

07 Nov, 2011

1 commit

  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Add a 'reason' to wb_writeback_work
    writeback: send work item to queue_io, move_expired_inodes
    writeback: trace event balance_dirty_pages
    writeback: trace event bdi_dirty_ratelimit
    writeback: fix ppc compile warnings on do_div(long long, unsigned long)
    writeback: per-bdi background threshold
    writeback: dirty position control - bdi reserve area
    writeback: control dirty pause time
    writeback: limit max dirty pause time
    writeback: IO-less balance_dirty_pages()
    writeback: per task dirty rate limit
    writeback: stabilize bdi->dirty_ratelimit
    writeback: dirty rate control
    writeback: add bg_threshold parameter to __bdi_update_bandwidth()
    writeback: dirty position control
    writeback: account per-bdi accumulated dirtied pages

    Linus Torvalds
     

01 Nov, 2011

1 commit

  • On the ext4 mailing list[1], we got some reports about errors in
    __find_get_block_slow(), but the information is very limited.

    If the device information is given, we can know the name of the sick
    volume. Furthermore, we can get the corresponding status of that
    block (group, inode block, etc.) by analyzing the disk layout.

    [1] http://marc.info/?l=linux-ext4&m=131379831421147&w=2

    Signed-off-by: Tao Ma
    Cc: Al Viro
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tao Ma
     

31 Oct, 2011

1 commit

  • This creates a new 'reason' field in a wb_writeback_work
    structure, which unambiguously identifies who initiates
    writeback activity. A 'wb_reason' enumeration has been
    added to writeback.h, to enumerate the possible reasons.

    The 'writeback_work_class' tracepoint event class and the
    'writeback_queue_io' tracepoints are updated to include the
    symbolic 'reason' in all trace events.

    And the 'writeback_inodes_sbXXX' family of routines has had
    a wb_stats parameter added to them, so callers can specify
    why writeback is being started.
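
    The idea of carrying a symbolic reason with each work item can be
    sketched in userspace as follows. The enumerators, struct and function
    names below are illustrative assumptions and are not taken from
    writeback.h.

```c
#include <assert.h>

/* Illustrative reasons only; the real enumeration is not shown in this log. */
enum model_wb_reason {
    MODEL_WB_REASON_BACKGROUND,
    MODEL_WB_REASON_SYNC,
    MODEL_WB_REASON_PERIODIC,
    MODEL_WB_REASON_MAX
};

/* Hypothetical work item: what to write back, and who asked for it. */
struct model_wb_work {
    long                 nr_pages;
    enum model_wb_reason reason;
};

/* Map a reason to the symbolic name a trace event could print. */
static const char *model_wb_reason_name(enum model_wb_reason reason)
{
    static const char *const names[MODEL_WB_REASON_MAX] = {
        [MODEL_WB_REASON_BACKGROUND] = "background",
        [MODEL_WB_REASON_SYNC]       = "sync",
        [MODEL_WB_REASON_PERIODIC]   = "periodic",
    };

    if ((unsigned int)reason >= MODEL_WB_REASON_MAX)
        return "unknown";
    return names[reason];
}

/* Callers state why they start writeback when they build the work item. */
static struct model_wb_work model_make_work(long nr_pages,
                                            enum model_wb_reason reason)
{
    struct model_wb_work work = { nr_pages, reason };

    assert(nr_pages >= 0);
    return work;
}
```

    A caller would build a work item with, say, MODEL_WB_REASON_SYNC, and a
    trace hook could then print model_wb_reason_name(work.reason).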

    Acked-by: Jan Kara
    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: Wu Fengguang

    Curt Wohlgemuth
     

28 Oct, 2011

1 commit


16 Jun, 2011

1 commit

  • I've got a report of a file corruption from fsxlinux on ext3. The important
    operations to the page were:
    mapwrite to a hole
    partial write to the page
    read - found the page zeroed from the end of the normal write

    The culprit seems to be that if get_block() fails in __block_write_begin()
    (e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
    Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
    of the page is needed and overwrites old data. In fact, I don't see why we
    should ever need to zero the uptodate bit here - either the page was uptodate
    when we entered __block_write_begin() and it should stay so when we leave it,
    or it was not uptodate and no one had the right to set it uptodate during
    __block_write_begin() so it remains !uptodate when we leave as well. So just
    remove clearing of the bit.
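
    The failure mode can be reproduced in a deliberately simplified
    userspace model. This is a sketch, not the kernel logic: the 8-byte
    "page", the model_* names and the flag that switches the old behaviour
    on and off are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 8

/* Hypothetical page: an uptodate flag plus a tiny data area. */
struct model_page {
    bool uptodate;
    char data[MODEL_PAGE_SIZE];
};

/*
 * Toy write_begin for a page backed by a hole. If the page is not
 * uptodate, everything outside [from, to) is zeroed before the write.
 * If block allocation fails, the old behaviour also cleared uptodate.
 */
static int model_write_begin(struct model_page *page, int from, int to,
                             bool alloc_fails, bool clear_uptodate_on_error)
{
    assert(0 <= from && from <= to && to <= MODEL_PAGE_SIZE);

    if (alloc_fails) {
        if (clear_uptodate_on_error)
            page->uptodate = false;  /* the step the patch removes */
        return -ENOSPC;
    }
    if (!page->uptodate) {
        memset(page->data, 0, (size_t)from);
        memset(page->data + to, 0, (size_t)(MODEL_PAGE_SIZE - to));
    }
    return 0;
}
```

    Starting from an uptodate page filled through mmap, a failed attempt
    followed by a retry zeroes the tail of the page when the flag is
    cleared on error, and leaves it intact when the flag is left alone.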

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

28 May, 2011

1 commit


27 May, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
    xen: cleancache shim to Xen Transcendent Memory
    ocfs2: add cleancache support
    ext4: add cleancache support
    btrfs: add cleancache support
    ext3: add cleancache support
    mm/fs: add hooks to support cleancache
    mm: cleancache core ops functions and config
    fs: add field to superblock to support cleancache
    mm/fs: cleancache documentation

    Fix up trivial conflict in fs/btrfs/extent_io.c due to includes

    Linus Torvalds
     
  • This fourth patch of eight in this cleancache series provides the
    core hooks in VFS for: initializing cleancache per filesystem;
    capturing clean pages reclaimed by page cache; attempting to get
    pages from cleancache before filesystem read; and ensuring coherency
    between pagecache, disk, and cleancache. Note that the placement
    of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
    change was required due to a patchset in 2.6.39.

    All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
    a check of a boolean global if CONFIG_CLEANCACHE is set but no
    cleancache "backend" has claimed cleancache_ops.

    Details and a FAQ can be found in Documentation/vm/cleancache.txt

    [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
    Signed-off-by: Chris Mason
    Signed-off-by: Dan Magenheimer
    Reviewed-by: Jeremy Fitzhardinge
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Rik Van Riel
    Cc: Jan Beulich
    Cc: Andreas Dilger
    Cc: Ted Ts'o
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Nitin Gupta

    Dan Magenheimer
     

26 May, 2011

2 commits

  • We should not allow file modification via mmap while the filesystem is
    frozen. So block in block_page_mkwrite() while the filesystem is frozen.
    We cannot do the blocking wait in __block_page_mkwrite() since e.g. ext4
    will want to call that function with transaction started in some cases
    and that would deadlock. But we can at least do the non-blocking reliable
    check in __block_page_mkwrite() which is the hardest part anyway.

    We have to check for frozen filesystem with the page marked dirty and under
    page lock with which we then return from ->page_mkwrite(). Only that way we
    cannot race with writeback done by freezing code - either we mark the page
    dirty after the writeback has started, see freezing in progress and block, or
    writeback will wait for our page lock which is released only when the fault is
    done and then writeback will writeout and writeprotect the page again.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Create a __block_page_mkwrite() helper which does everything that
    block_page_mkwrite() does, except that it passes back errors from the
    __block_write_begin / block_commit_write calls.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

25 Mar, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    fs: simplify iget & friends
    fs: pull inode->i_lock up out of writeback_single_inode
    fs: rename inode_lock to inode_hash_lock
    fs: move i_wb_list out from under inode_lock
    fs: move i_sb_list out from under inode_lock
    fs: remove inode_lock from iput_final and prune_icache
    fs: Lock the inode LRU list separately
    fs: factor inode disposal
    fs: protect inode->i_state with inode->i_lock
    autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()
    autofs4 - remove autofs4_lock
    autofs4 - fix d_manage() return on rcu-walk
    autofs4 - fix autofs4_expire_indirect() traversal
    autofs4 - fix dentry leak in autofs4_expire_direct()
    autofs4 - reinstate last used update on access
    vfs - check non-mountpoint dentry might block in __follow_mount_rcu()

    Linus Torvalds
     
  • Protect inode state transitions and validity checks with the
    inode->i_lock. This enables us to make inode state transitions
    independently of the inode_lock and is the first step to peeling
    away the inode_lock from the code.

    This requires that __iget() is done atomically with i_state checks
    during list traversals so that we don't race with another thread
    marking the inode I_FREEING between the state check and grabbing the
    reference.

    Also remove the unlock_new_inode() memory barrier optimisation
    required to avoid taking the inode_lock when clearing I_NEW.
    Simplify the code by simply taking the inode->i_lock around the
    state change and wakeup. Because the wakeup is no longer tricky,
    remove the wake_up_inode() function and open code the wakeup where
    necessary.

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
     

17 Mar, 2011

1 commit


10 Mar, 2011

2 commits

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.
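
    Submitter-controlled plugging can be pictured with a toy userspace
    model: requests submitted while a plug is active are held back and
    dispatched in one batch when the submitter finishes the plug. The
    model_* names and structs are hypothetical; in the kernel the
    corresponding calls are blk_start_plug() and blk_finish_plug().

```c
#include <assert.h>

/* Hypothetical device queue: counts requests and dispatch batches. */
struct model_queue {
    int dispatched;
    int dispatch_calls;
};

/* Hypothetical on-stack plug: requests held back by the submitter. */
struct model_plug {
    int pending;
};

static void model_dispatch(struct model_queue *q, int nr)
{
    q->dispatched += nr;
    q->dispatch_calls++;
}

static void model_start_plug(struct model_plug *plug)
{
    plug->pending = 0;
}

/* With a plug the request is only queued; without one it goes out at once. */
static void model_submit(struct model_queue *q, struct model_plug *plug)
{
    if (plug)
        plug->pending++;
    else
        model_dispatch(q, 1);
}

/* The submitter unplugs on its own: everything pending goes out as a batch. */
static void model_finish_plug(struct model_queue *q, struct model_plug *plug)
{
    assert(plug->pending >= 0);
    if (plug->pending > 0)
        model_dispatch(q, plug->pending);
    plug->pending = 0;
}
```

    Three submissions under a plug reach the queue as a single batch once
    the plug is finished, with no unplug hint passed along the way.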

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     

17 Dec, 2010

2 commits

  • __this_cpu_inc can create a single instruction with the same effect
    as the __get_cpu_var(..)++ construct in buffer.c.

    Cc: Wu Fengguang
    Cc: Christoph Hellwig
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     
  • Optimize various per cpu area operations through these new percpu
    operations. These operations avoid address calculations through the
    use of segment prefixes and multiple memory references through RMW
    instructions etc.

    Reduces code size:

    Before:

    christoph@linux-2.6$ size fs/buffer.o
       text    data     bss     dec     hex filename
      19169      80      28   19277    4b4d fs/buffer.o

    After:

    christoph@linux-2.6$ size fs/buffer.o
       text    data     bss     dec     hex filename
      19138      80      28   19246    4b2e fs/buffer.o

    V3->V4:
    - Move the use of this_cpu_inc_return into a later patch so that
      this one can go in without percpu infrastructure changes.

    Cc: Wu Fengguang
    Cc: Christoph Hellwig
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

27 Oct, 2010

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds
     
  • bh->b_private is initialized within init_buffer(), thus this assignment is
    redundant.

    Signed-off-by: Namhyung Kim
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • This removes more dead code that was somehow missed by commit 0d99519efef
    (writeback: remove unused nonblocking and congestion checks). There is
    no behavior change except for the removal of two entries from one of the
    ext4 tracing interfaces.

    The nonblocking checks in ->writepages are no longer used because the
    flusher now prefers to block on get_request_wait() rather than skip
    inodes on IO congestion. The latter would lead to more seeky IO.

    The nonblocking checks in ->writepage are no longer used because they
    are redundant with the WB_SYNC_NONE check.

    We no longer set ->nonblocking in VM page out and page migration, because
    a) it's effectively redundant with WB_SYNC_NONE in current code
    b) its old semantic of "Don't get stuck on request queues" is misbehavior:
    that would skip some dirty inodes on congestion and page out others, which
    is unfair in terms of LRU age.

    Inspired by Christoph Hellwig. Thanks!

    Signed-off-by: Wu Fengguang
    Cc: Theodore Ts'o
    Cc: David Howells
    Cc: Sage Weil
    Cc: Steve French
    Cc: Chris Mason
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

26 Oct, 2010

3 commits


10 Sep, 2010

1 commit


18 Aug, 2010

2 commits

  • These flags aren't real I/O types, but tell ll_rw_block to always
    lock the buffer instead of giving up on a failed trylock.

    Instead add a new write_dirty_buffer helper that implements this semantic
    and use it from the existing SWRITE* callers. Note that the ll_rw_block
    code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
    this patch fixes.

    In the ufs code clean up the helper that used to call ll_rw_block
    to mirror sync_dirty_buffer, which is the function it implements for
    compound buffers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Instead of abusing a buffer_head flag just add a variant of
    sync_dirty_buffer which allows passing the exact type of write
    flag required.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

10 Aug, 2010

4 commits

  • Move the call to vmtruncate to get rid of excessive blocks to the callers
    in preparation of the new truncate sequence and rename the non-truncating
    version to block_write_begin.

    While we're at it also remove several unused arguments to block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Split up the block_write_begin implementation - __block_write_begin is a new
    trivial wrapper for block_prepare_write that always takes an already
    allocated page and can be either called from block_write_begin or filesystem
    code that already has a page allocated. Remove the handling of already
    allocated pages from block_write_begin after switching all callers that
    do it to __block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of excessive blocks to the callers
    in preparation of the new truncate sequence and rename the non-truncating
    version to cont_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of excessive blocks to the only
    remaining caller and rename the non-truncating version to nobh_write_begin.

    Get rid of the superfluous file argument to it while we're at it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

28 May, 2010

1 commit

  • Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
    setattr > vmtruncate > truncate, have filesystems call their truncate sequence
    from ->setattr if filesystem specific operations are required. vmtruncate is
    deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
    previously should be used.

    simple_setattr is introduced for simple in-ram filesystems to implement
    the new truncate sequence. Eventually all filesystems should be converted
    to implement a setattr, and the default code in notify_change should go
    away.

    simple_setsize is also introduced to perform just the ATTR_SIZE portion
    of simple_setattr (ie. changing i_size and trimming pagecache).

    To implement the new truncate sequence:
    - filesystem specific manipulations (eg freeing blocks) must be done in
      the setattr method rather than ->truncate.
    - vmtruncate can not be used by core code to trim blocks past i_size in
      the event of write failure after allocation, so this must be performed
      in the fs code.
    - convert usage of helpers block_write_begin, nobh_write_begin,
      cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
      variants. These avoid calling vmtruncate to trim blocks (see previous).
    - inode_setattr should not be used. generic_setattr is a new function
      to be used to copy simple attributes into the generic inode.
    - make use of the better opportunity to handle errors with the new sequence.

    Big problem with the previous calling sequence: the filesystem is not called
    until i_size has already changed. This means it is not allowed to fail the
    call, and also it does not know what the previous i_size was. Also, generic
    code calling vmtruncate to truncate allocated blocks in case of error had
    no good way to return a meaningful error (or, for example, atomically handle
    block deallocation).
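
    The benefit of letting the filesystem act before i_size changes can be
    shown with a small userspace model. It is a sketch of one possible
    ordering only: the model_* names, the max_size limit and the error codes
    chosen are assumptions, not the behaviour of any particular filesystem.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical inode: current size, a size limit, and a fault switch. */
struct model_inode {
    long long size;
    long long max_size;
    int       fs_op_fails;  /* nonzero: the fs-specific step reports an error */
};

/*
 * Filesystem-specific step (e.g. freeing blocks). It still sees the old
 * size, and it is allowed to fail.
 */
static int model_fs_trim_blocks(struct model_inode *inode,
                                long long oldsize, long long newsize)
{
    (void)oldsize;
    (void)newsize;
    return inode->fs_op_fails ? -EIO : 0;
}

/* The size-changing part of a setattr implementation in this model. */
static int model_setattr_size(struct model_inode *inode, long long newsize)
{
    long long oldsize = inode->size;
    int err;

    assert(inode->max_size >= 0);
    if (newsize < 0)
        return -EINVAL;
    if (newsize > inode->max_size)
        return -EFBIG;

    err = model_fs_trim_blocks(inode, oldsize, newsize);
    if (err)
        return err;         /* size untouched, the caller sees the error */

    inode->size = newsize;  /* page cache trimming would follow here */
    return 0;
}
```

    When the filesystem step fails, the caller gets a meaningful error and
    the inode size is still the old one, which the previous calling
    sequence could not offer.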

    Cc: Christoph Hellwig
    Acked-by: Jan Kara
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de