27 May, 2013

2 commits

  • Pull NFS client bugfixes from Trond Myklebust:

    - Stable fix to prevent an rpc_task wakeup race
    - Fix a NFSv4.1 session drain deadlock
    - Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
    - Ensure auth_gss pipe detection works in namespaces
    - Fix SETCLIENTID fallback if rpcsec_gss is not available

    * tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFS: Fix SETCLIENTID fallback if GSS is not available
    SUNRPC: Prevent an rpc_task wakeup race
    NFSv4.1 Fix a pNFS session draining deadlock
    SUNRPC: Convert auth_gss pipe detection to work in namespaces
    SUNRPC: Faster detection if gssd is actually running
    SUNRPC: Fix a bug in gss_create_upcall

    Linus Torvalds
     
  • Pull xfs fixes from Ben Myers:
    "Here are fixes for corruption on 512 byte filesystems, a rounding
    error, a use-after-free, some flags to fix lockdep reports, and
    several fixes related to CRCs. We have a somewhat larger post -rc1
    queue than usual due to fixes related to the CRC feature we merged for
    3.10:

    - Fix for corruption with FSX on 512 byte blocksize filesystems
    - Fix rounding error in xfs_free_file_space
    - Fix use-after-free with extent free intents
    - Add several missing KM_NOFS flags to fix lockdep reports
    - Several fixes for CRC related code"

    * tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs:
    xfs: remote attribute lookups require the value length
    xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
    xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
    xfs: fix missing KM_NOFS tags to keep lockdep happy
    xfs: Don't reference the EFI after it is freed
    xfs: fix rounding in xfs_free_file_space
    xfs: fix sub-page blocksize data integrity writes

    Linus Torvalds
     

25 May, 2013

15 commits

  • The recent changes overhauling fs/aio.c introduced a bug that results in
    the kioctx not being freed when outstanding kiocbs are cancelled at
    exit_aio() time. Specifically, a kiocb that is cancelled has its
    completion events discarded by batch_complete_aio(), which then fails to
    wake up the process stuck in free_ioctx(). Fix this by modifying the
    wait_event() condition in free_ioctx() appropriately.

    This patch was tested with the cancel operation in the thread based code
    posted yesterday.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Kent Overstreet
    Cc: Kent Overstreet
    Cc: Josh Boyer
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     
  • Last time we found a lock/unlock bug in ocfs2_file_aio_write, and
    then we did a thorough search of all lock resources in
    ocfs2_inode_info, including the rw, inode and open lockres, and found
    this bug. My kernel version is 3.0.13, and the bug is also in the latest
    version, 3.9. In ocfs2_fiemap, once ocfs2_get_clusters_nocache fails, it
    should goto out_unlock instead of out, because we need to release the
    buffer head, up the read alloc sem and unlock the inode.

    Signed-off-by: Joseph Qi
    Reviewed-by: Jie Liu
    Cc: Mark Fasheh
    Cc: Joel Becker
    Acked-by: Sunil Mushran
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary

    DESCRIPTION:
    There are use cases in which a NILFS2 file system (formatted with a block
    size smaller than 4 KB) can be remounted in RO mode after hitting the
    "broken bmap" issue.

    The issue was reported by Anthony Doggett:
    "The machine I've been trialling nilfs on is running Debian Testing,
    Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
    version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
    also reproduced it (identically) with Debian Unstable amd64 and Debian
    Experimental (using the 3.8-trunk kernel). The problematic partitions
    were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."

    SYMPTOMS:
    (1) System log contains error messages like these:

    [63102.496756] nilfs_direct_assign: invalid pointer: 0
    [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
    [63102.496798]
    [63102.524403] Remounting filesystem read-only

    (2) The NILFS2 file system is remounted in RO mode.

    REPRODUCING PATH:
    (1) Create a volume group named "unencrypted" with the vgcreate utility.
    (2) Run the script (prepared by Anthony Doggett):

    ----------------[BEGIN SCRIPT]--------------------

    VG=unencrypted
    lvcreate --size 2G --name ntest $VG
    mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
    mkdir /var/tmp/n
    mkdir /var/tmp/n/ntest
    mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
    mkdir /var/tmp/n/ntest/thedir
    cd /var/tmp/n/ntest/thedir
    sleep 2
    date
    darcs init
    sleep 2
    dmesg|tail -n 5
    date
    darcs whatsnew || true
    date
    sleep 2
    dmesg|tail -n 5
    ----------------[END SCRIPT]--------------------

    REPRODUCIBILITY: 100%

    INVESTIGATION:
    The issue occurs during segment construction after the following
    sequence of user-space operations:

    open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
    fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    ftruncate(7, 60)

    The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
    bmap (inode number=28)" appears because of an attempt to get the block
    number for the third block of the file with logical offset #3072 bytes.
    As the output above shows, the whole file is 60 bytes long, so a single
    block (1 KB in size) is enough for the whole file. The attempt to
    operate on several blocks instead of one happens because
    nilfs_segctor_scan_file() finds several dirty buffers for this file.

    The root cause of this issue is in nilfs_set_page_dirty function which
    is called just before writing to an mmapped page.

    When nilfs_page_mkwrite function handles a page at EOF boundary, it
    fills hole blocks only inside EOF through __block_page_mkwrite().

    The __block_page_mkwrite() function calls set_page_dirty() after filling
    hole blocks, thus nilfs_set_page_dirty function (=
    a_ops->set_page_dirty) is called. However, the current implementation
    of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
    at EOF boundary.

    As a result, buffers outside EOF are inconsistently marked dirty and
    queued for write even though they are not mapped with nilfs_get_block
    function.

    FIX:
    This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.

    Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
    for this issue.

    Signed-off-by: Ryusuke Konishi
    Reported-by: Anthony Doggett
    Reported-by: Vyacheslav Dubeyko
    Cc: Vyacheslav Dubeyko
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • In reviewing man pages, I noticed that io_getevents is documented to
    update the timeout that gets passed into the library call. This doesn't
    happen in kernel space or in the library (even though it's documented to
    do so in both places). Unless there is objection, I'd like to fix the
    comments/docs to match the code (I will also update the man page upon
    consensus).

    Signed-off-by: Jeff Moyer
    Signed-off-by: Benjamin LaHaise
    Acked-by: Cyril Hrubis
    Acked-by: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     
  • Commit 634725a92938 ("hfs: cleanup HFS+ prints") removed the BUG_ON in
    hfs_bnode_create in hfsplus. This patch removes it from the hfs version
    and avoids an fsfuzzer crash.

    Signed-off-by: Jeff Mahoney
    Acked-by: Jeff Mahoney
    Signed-off-by: Jiri Slaby
    Cc: Vyacheslav Dubeyko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • In ocfs2_file_aio_write(), it does ocfs2_rw_lock() first and then
    ocfs2_inode_lock().

    But if ocfs2_inode_lock() fails, it goes to out_sems without unlocking
    the rw lock. This will cause a bug in ocfs2_lock_res_free() when testing
    res->l_ex_holders, which is increased in __ocfs2_cluster_lock() and
    decreased in __ocfs2_cluster_unlock().

    Signed-off-by: Joseph Qi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: Li Zefan
    Cc: "Duyongfeng (B)"
    Acked-by: Sunil Mushran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • The intermediate value of fat_clusters can overflow on 32-bit architectures.

    Reported-by: Krzysztof Strasburger
    Signed-off-by: OGAWA Hirofumi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • When reading a remote attribute, to correctly calculate the length
    of the data buffer for CRC-enabled filesystems, we need to know the
    length of the attribute data. We get this information when we look
    up the attribute, but we don't store it in the args structure along
    with the other remote attr information we get from the lookup. Add
    this information to the args structure so we can use it
    appropriately.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit e461fcb194172b3f709e0b478d2ac1bdac7ab9a3)

    Dave Chinner
     
  • xfstests generic/117 fails with:

    XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC)

    indicating a function that does not handle the attr3 format
    correctly. Fix it.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers
    (cherry picked from commit b38958d715316031fe9ea0cc6c22043072a55f49)

    Dave Chinner
     
  • Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit 72916fb8cbcf0c2928f56cdc2fbe8c7bf5517758)

    Dave Chinner
     
  • There are several places where we use KM_SLEEP allocation contexts
    and use the fact that they are called from transaction context to
    add KM_NOFS where appropriate. Unfortunately, there are several
    places where the code makes this assumption but can be called from
    outside transaction context but with filesystem locks held. These
    places need explicit KM_NOFS annotations to avoid lockdep
    complaining about reclaim contexts.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit ac14876cf9255175bf3bdad645bf8aa2b8fb2d7c)

    Dave Chinner
     
  • Checking the EFI for whether it is being released from recovery
    after we've already released the known active reference is a mistake
    worthy of a brown paper bag. Fix the (now) obvious use after free
    that it can cause.

    Reported-by: Dave Jones
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Ben Myers

    (cherry picked from commit 52c24ad39ff02d7bd73c92eb0c926fb44984a41d)

    Dave Chinner
     
  • The offset passed into xfs_free_file_space() needs to be rounded
    down to a certain size, but the rounding mask is built by a 32 bit
    variable. Hence the mask will always mask off the upper 32 bits of
    the offset and lead to incorrect writeback and invalidation ranges.

    This is not actually exposed as a bug because we writeback and
    invalidate from the rounded offset to the end of the file, and hence
    the offset we are actually punching a hole out of will always be
    covered by the code. This needs fixing, however, if we ever want to
    use exact ranges for writeback/invalidation here...

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Ben Myers

    (cherry picked from commit 28ca489c63e9aceed8801d2f82d731b3c9aa50f5)

    Dave Chinner
     
  • FSX on 512 byte block size filesystems has been failing for some
    time with corrupted data. The fault dates back to the change in
    the writeback data integrity algorithm that uses a mark-and-sweep
    approach to avoid data writeback livelocks.

    Unfortunately, a side effect of this mark-and-sweep approach is that
    each page will only be written once for a data integrity sync, and
    there is a condition in writeback in XFS where a page may require
    two writeback attempts to be fully written. As a result of the high
    level change, we now only get a partial page writeback during the
    integrity sync because the first pass through writeback clears the
    mark left on the page index to tell writeback that the page needs
    writeback....

    The cause is writing a partial page in the clustering code. This can
    happen when a mapping boundary falls in the middle of a page - we
    end up writing back the first part of the page that the mapping
    covers, but then never revisit the page to have the remainder mapped
    and written.

    The fix is simple - if the mapping boundary falls inside a page,
    then simply abort clustering without touching the page. This means
    that the next ->writepage entry that write_cache_pages() will make
    is the page we aborted on, and xfs_vm_writepage() will map all
    sections of the page correctly. This behaviour is also optimal for
    non-data integrity writes, as it results in contiguous sequential
    writeback of the file rather than missing small holes and having to
    write them as "random" writes in a future pass.

    With this fix, all the fsx tests in xfstests now pass on a 512 byte
    block size filesystem on a 4k page machine.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Ben Myers

    (cherry picked from commit 49b137cbbcc836ef231866c137d24f42c42bb483)

    Dave Chinner
     
  • Pull CIFS fix from Steve French:
    "One cifs fix to merge now - fixes possible DFS oops (I expect to
    request a merge of 4 additional cifs fixes next week)"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: only set ops for inodes in I_NEW state

    Linus Torvalds
     

24 May, 2013

5 commits

  • There was a missing _all in this loop iterator

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Fix build errors by correcting DLM dependencies in GFS2.
    Build errors happen when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m:

    fs/built-in.o: In function `gfs2_lock':
    file.c:(.text+0xc7abd): undefined reference to `dlm_posix_get'
    file.c:(.text+0xc7ad0): undefined reference to `dlm_posix_unlock'
    file.c:(.text+0xc7ad9): undefined reference to `dlm_posix_lock'
    fs/built-in.o: In function `gdlm_unmount':
    lock_dlm.c:(.text+0xd6e5b): undefined reference to `dlm_release_lockspace'
    fs/built-in.o: In function `sync_unlock':
    lock_dlm.c:(.text+0xd6e9e): undefined reference to `dlm_unlock'
    fs/built-in.o: In function `sync_lock':
    lock_dlm.c:(.text+0xd6fb6): undefined reference to `dlm_lock'
    fs/built-in.o: In function `gdlm_put_lock':
    lock_dlm.c:(.text+0xd7238): undefined reference to `dlm_unlock'
    fs/built-in.o: In function `gdlm_mount':
    lock_dlm.c:(.text+0xd753e): undefined reference to `dlm_new_lockspace'
    lock_dlm.c:(.text+0xd79d3): undefined reference to `dlm_release_lockspace'
    fs/built-in.o: In function `gdlm_lock':
    lock_dlm.c:(.text+0xd8179): undefined reference to `dlm_lock'
    fs/built-in.o: In function `gdlm_cancel':
    lock_dlm.c:(.text+0xd6b22): undefined reference to `dlm_unlock'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Steven Whitehouse

    Randy Dunlap
     
  • This patch changes the multi-block allocation code, such that
    directory inodes only get a single block reserved in the bitmap.
    That way, the bitmaps are more tightly packed together, and there
    are fewer spans of free blocks for in-use block reservations.
    This means it takes less time to find a free span of blocks in the
    bitmap, which speeds things up. This increases the performance of
    some workloads by almost 2X. In Nate's mockup.py script (which does
    (1) create dir, (2) create dir in dir, (3) create file in that dir)
    the test executes in 23 steps rather than 43 steps, a 47%
    performance improvement.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch fixes two regression problems that Abhi found in the
    GFS2 quota code.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Commit 79d852bf "NFS: Retry SETCLIENTID with AUTH_SYS instead of
    AUTH_NONE" did not take into account commit 23631227 "NFSv4: Fix the
    fallback to AUTH_NULL if krb5i is not available".

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

21 May, 2013

1 commit

  • On a CB_RECALL the callback service thread flushes the inode using
    filemap_flush prior to scheduling the state manager thread to return the
    delegation. When pNFS is used and I/O has not yet gone to the data server
    servicing the inode, a LAYOUTGET can precede the I/O. Unlike the async
    filemap_flush call, the LAYOUTGET must proceed to completion.

    If the state manager starts to recover data while the inode flush is sending
    the LAYOUTGET, a deadlock occurs as the callback service thread holds the
    single callback session slot until the flushing is done which blocks the state
    manager thread, and the state manager thread has set the session draining bit
    which puts the inode flush LAYOUTGET RPC to sleep on the forechannel slot
    table waitq.

    Separate the draining of the back channel from the draining of the fore channel
    by moving the NFS4_SESSION_DRAINING bit from session scope into the fore
    and back slot tables. Drain the back channel first allowing the LAYOUTGET
    call to proceed (and fail) so the callback service thread frees the callback
    slot. Then proceed with draining the forechannel.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

19 May, 2013

1 commit

  • Pull btrfs fixes from Chris Mason:
    "Miao Xie has been very busy, fixing races and enospc problems and many
    other small but important pieces.

    Alexandre Oliva discovered some problems with how our error handling
    was interacting with the block layer and for now has disabled our
    partial handling of sub-page writes. The real sub-page work is in a
    series of patches from IBM that we still need to integrate and test.
    The code Alexandre has turned off was really incomplete.

    Josef has more error handling fixes and an important fix for the new
    skinny extent format.

    This also has my fix for the tracepoint crash from late in 3.9. It's
    the first stage in a larger clean up to get rid of btrfs_bio and make
    a proper bioset for all the items we need to tack into the bio. For
    now the bioset only holds our mirror_num and stripe_index, but for the
    next merge window I'll shuffle more in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
    Btrfs: use a btrfs bioset instead of abusing bio internals
    Btrfs: make sure roots are assigned before freeing their nodes
    Btrfs: explicitly use global_block_rsv for quota_tree
    btrfs: do away with non-whole_page extent I/O
    Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock context
    Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix()
    Btrfs: pause the space balance when remounting to R/O
    Btrfs: fix unprotected root node of the subvolume's inode rb-tree
    Btrfs: fix accessing a freed tree root
    Btrfs: return errno if possible when we fail to allocate memory
    Btrfs: update the global reserve if it is empty
    Btrfs: don't steal the reserved space from the global reserve if their space type is different
    Btrfs: optimize the error handle of use_block_rsv()
    Btrfs: don't use global block reservation for inode cache truncation
    Btrfs: don't abort the current transaction if there is no enough space for inode cache
    Correct allowed raid levels on balance.
    Btrfs: fix possible memory leak in replace_path()
    Btrfs: fix possible memory leak in the find_parent_nodes()
    Btrfs: don't allow device replace on RAID5/RAID6
    Btrfs: handle running extent ops with skinny metadata
    ...

    Linus Torvalds
     

18 May, 2013

16 commits

  • Chris Mason
     
  • Btrfs has been pointer tagging bi_private and using bi_bdev
    to store the stripe index and mirror number of failed IOs.

    As bios bubble back up through the call chain, we use these
    to decide if and how to retry our IOs. They are also used
    to count IO failures on a per device basis.

    Recently a bio tracepoint was added, which led to crashes because
    we were abusing bi_bdev.

    This commit adds a btrfs bioset, and creates explicit fields
    for the mirror number and stripe index. The plan is to
    extend this structure for all of the fields currently in
    struct btrfs_bio, which will mean one less kmalloc in
    our IO path.

    Signed-off-by: Chris Mason
    Reported-by: Tejun Heo

    Chris Mason
     
  • If we fail to load the chunk tree we'll call free_root_pointers, except we may
    not have assigned the roots for the dev_root/extent_root/csum_root yet, so we
    could NULL pointer deref at this point. Just add checks to make sure these
    roots are set to keep us from panicking. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • The quota_tree was previously set up to use the empty_block_rsv,
    which would be problematic when the filesystem is filled up
    and ENOSPC happens during internal operations while the quota
    tree is updated and COWed (when the btrfs_qgroup_info_item
    items are written). In fact, use_block_rsv(), which is used
    in btrfs_cow_block(), falls back to the global_block_rsv in
    this case. But just to make it clearer what is
    happening, change it to explicitly use the global_block_rsv.

    Signed-off-by: Stefan Behrens
    Signed-off-by: Josef Bacik

    Stefan Behrens
     
  • end_bio_extent_readpage computes whole_page based on bv_offset and
    bv_len, without taking into account that blk_update_request may modify
    them when some of the blocks to be read into a page produce a read
    error. This would cause the read to unlock only part of the file
    range associated with the page, which would in turn leave the entire
    page locked, which would not only keep the process blocked instead of
    returning -EIO to it, but also prevent any further access to the file.

    It turns out that btrfs always issues whole-page reads and writes.
    The special handling of non-whole_page appears to be a mistake or a
    left-over from a time when this wasn't the case. Indeed,
    end_bio_extent_writepage distinguished between whole_page and
    non-whole_page writes but behaved identically in both cases!

    I've replaced the whole_page computations with warnings, just to be
    sure that we're not issuing partial page reads or writes. The
    warnings should probably just go away some time.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Josef Bacik

    Alexandre Oliva
     
  • btrfs_invalidate_inodes() may sleep, so we should not invoke it in the
    spin lock context. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • We have checked if ->node is NULL or not, so it is unnecessary to
    use BUG_ON() to check again. Remove it.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • The root node of the rb-tree may be changed, so we should get it under
    the lock. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • inode_tree_del() will move the tree root into the dead root list, and
    then the tree will be destroyed by the cleaner. So if we remove the
    delayed node which is cached in the inode after inode_tree_del(),
    we may access a freed tree root. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • We need to set return value explicitly, otherwise we'll lose the error
    value.

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik

    Liu Bo
     
  • Before this patch, if we found the global reserve empty, we reserved
    space for it by the minimum unit. That was unreasonable and
    inefficient, because a depleted global reserve implies
    that the size of the global reserve is too small. In this case, we should
    update the global reserve and fill it.

    Cc: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • If the type of the space we need is different from that of the global
    reserve, we cannot steal the space from the global reserve, because we
    cannot allocate the space from the free space cache that the global
    reserve points to.

    Cc: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • cc: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • It is very likely that there are lots of subvolumes/snapshots in the filesystem,
    so if we use global block reservation to do inode cache truncation, we may hog
    all the free space that is reserved in global rsv. So it is better that we do
    the free space reservation for inode cache truncation by ourselves.

    Cc: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • The filesystem with inode cache was forced to be read-only when we umounted it.

    Steps to reproduce:
    # mkfs.btrfs -f ${DEV}
    # mount -o inode_cache ${DEV} ${MNT}
    # dd if=/dev/zero of=${MNT}/file1 bs=1M count=8192
    # btrfs fi syn ${MNT}
    # dd if=${MNT}/file1 of=/dev/null bs=1M
    # rm -f ${MNT}/file1
    # btrfs fi syn ${MNT}
    # umount ${MNT}

    This happened because there was not enough space to do inode cache truncation,
    and so we aborted the current transaction.

    But an out-of-space error is not a serious problem when we write out the inode
    cache, and it is safe to just skip this step if we hit it. So we need
    not abort the current transaction.

    Reported-by: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Tested-by: Tsutomu Itoh
    Signed-off-by: Josef Bacik

    Miao Xie