21 Sep, 2011

2 commits


18 Sep, 2011

7 commits

  • We can race with readdir and the RCU path walking stuff. This is because we
    clear the need lookup flag before actually instantiating the inode. This will
    lead the RCU path walk stuff to find a dentry it thinks is valid without a
    d_inode attached. So instead unhash the dentry when we first start the lookup,
    and then clear the flag after we've instantiated the dentry so we're guaranteed
    to either try the slow lookup, or have the d_inode set properly.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The recent reworking of btrfs' lseek led to incorrect
    values being returned. This adds checks for seeking
    beyond EOF in SEEK_HOLE and makes sure the error
    values come back correct.

    Andi Kleen also sent in similar patches.

    Signed-off-by: Jie Liu
    Reported-by: Andi Kleen
    Signed-off-by: Chris Mason

    Jeff Liu
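
A minimal user-space sketch of the behaviour this fix is about, assuming a Linux system where Python exposes os.SEEK_HOLE. Seeking with SEEK_HOLE from an offset beyond end of file is expected to fail with ENXIO rather than return a bogus offset. The helper names seek_hole and demo are illustrative only and are not part of the patch.

```python
import errno
import os
import tempfile


def seek_hole(fd, offset):
    """Return the next hole offset at or after `offset`, or None on ENXIO."""
    try:
        return os.lseek(fd, offset, os.SEEK_HOLE)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return None  # offset is at or beyond end of file
        raise


def demo():
    """Write 4096 bytes, then probe for holes inside and beyond the file."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 4096)
        f.flush()
        fd = f.fileno()
        size = os.fstat(fd).st_size
        inside = seek_hole(fd, 0)             # some offset within [0, size]
        beyond = seek_hole(fd, size + 4096)   # expected: None (ENXIO)
        return size, inside, beyond
```

Calling demo() on a filesystem with correct SEEK_HOLE handling should give a hole offset no larger than the file size for the first probe and None for the second.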
     
  • Chris Mason
     
  • The dst file will have the same inode flags as the src file after
    file clone, and I think it's unexpected.

    For example, the dst file will suddenly become immutable after
    getting some share of data with src file, if the src is immutable.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • To reproduce the bug:

    # mount /dev/sda7 /mnt
    # dd if=/dev/zero of=/mnt/src bs=4K count=1
    # umount /mnt

    # mount -o nodatasum /dev/sda7 /mnt
    # dd if=/dev/zero of=/mnt/dst bs=4K count=1
    # clone_range -s 4K -l 4K /mnt/src /mnt/dst

    # echo 3 > /proc/sys/vm/drop_caches
    # cat /mnt/dst
    # dmesg
    ...
    btrfs no csum found for inode 258 start 0
    btrfs csum failed ino 258 off 0 csum 2566472073 private 0

    It's because part of the file is checksummed and the other part is not,
    and then btrfs will complain checksum is not found when we read the file.

    Disallow file clone if src and dst file have different checksum flag,
    so we ensure a file is completely checksummed or unchecksummed.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
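
The check described above can be sketched as a small model. This is not the kernel code: the flag value, the function name check_clone_allowed and the choice of EINVAL as the error are assumptions made for illustration.

```python
import errno

# Illustrative flag bit standing in for the inode's "no data checksum" flag.
NODATASUM = 1 << 0


def check_clone_allowed(src_flags, dst_flags):
    """Return 0 if the clone may proceed, or a negative errno if the
    checksum flags of the two files differ."""
    if (src_flags & NODATASUM) != (dst_flags & NODATASUM):
        return -errno.EINVAL  # assumed error code for this sketch
    return 0
```

With this rule a file is either fully checksummed or fully unchecksummed, so a later read never finds a range whose checksum is missing.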
     
  • It's a bug in commit f81c9cdc567cd3160ff9e64868d9a1a7ee226480
    (Btrfs: truncate pages from clone ioctl target range)

    We should pass the dest range to the truncate function, but not the
    src range.

    Also move the function before locking extent state.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • Since the d_off in the first dirent for "." (that originates from
    the 4th argument "offset" of filldir() for the 2nd dirent for "..")
    is wrongly assigned in btrfs_real_readdir(), telldir returns same
    offset for different locations.

    | # mkfs.btrfs /dev/sdb1
    | # mount /dev/sdb1 fs0
    | # cd fs0
    | # touch file0 file1
    | # ../test
    | telldir: 0
    | readdir: d_off = 2, d_name = "."
    | telldir: 2
    | readdir: d_off = 2, d_name = ".."
    | telldir: 2
    | readdir: d_off = 3, d_name = "file0"
    | telldir: 3
    | readdir: d_off = 2147483647, d_name = "file1"
    | telldir: 2147483647

    To fix this problem, pass filp->f_pos (which is loff_t) instead.

    | # ../test
    | telldir: 0
    | readdir: d_off = 1, d_name = "."
    | telldir: 1
    | readdir: d_off = 2, d_name = ".."
    | telldir: 2
    | readdir: d_off = 3, d_name = "file0"
    :

    At the moment the "offset" for "." is unused because there is no
    preceding dirent; however, it is better to pass filp->f_pos to keep
    the usage consistent.

    Signed-off-by: Hidetoshi Seto
    Signed-off-by: Chris Mason

    Hidetoshi Seto
     

13 Sep, 2011

1 commit

  • * 'for-linus' of git://github.com/chrismason/linux:
    Btrfs: add dummy extent if dst offset excceeds file end in
    Btrfs: calc file extent num_bytes correctly in file clone
    btrfs: xattr: fix attribute removal
    Btrfs: fix wrong nbytes information of the inode
    Btrfs: fix the file extent gap when doing direct IO
    Btrfs: fix unclosed transaction handle in btrfs_cont_expand
    Btrfs: fix misuse of trans block rsv
    Btrfs: reset to appropriate block rsv after orphan operations
    Btrfs: skip locking if searching the commit root in csum lookup
    btrfs: fix warning in iput for bad-inode
    Btrfs: fix an oops when deleting snapshots

    Linus Torvalds
     

11 Sep, 2011

11 commits

  • You can see there's no file extent with range [0, 4096]. Check this by
    btrfsck:

    # btrfsck /dev/sda7
    root 5 inode 258 errors 100
    ...

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • num_bytes should be 4096 not 12288.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • An attribute is not removed by 'setfattr -x attr file' and remains
    visible in attr list. This makes xfstests/062 pass again.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • If we write data into a hole in a file (with no preallocation for the
    hole), Btrfs allocates disk space and updates the inode's nbytes, but the
    other field, disk_i_size, does not need to change. In this case we must
    still update the inode metadata even though disk_i_size is unchanged
    (btrfs_ordered_update_i_size() returns 1).

    # mkfs.btrfs /dev/sdb1
    # mount /dev/sdb1 /mnt
    # touch /mnt/a
    # truncate -s 856002 /mnt/a
    # dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc
    # umount /mnt
    # btrfsck /dev/sdb1
    root 5 inode 257 errors 400
    found 32768 bytes used err is 1

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
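
A rough user-space analogue of the reproducer, as a sketch: it creates a sparse file, writes one block into the hole without truncating, and reports size and allocated blocks before and after. Treating st_blocks as the user-visible counterpart of the inode's nbytes is an assumption made for illustration, and write_into_hole is a hypothetical helper name.

```python
import os
import tempfile


def write_into_hole(size=856002, block=4096):
    """Return ((size, blocks) before, (size, blocks) after) a write into a hole."""
    with tempfile.NamedTemporaryFile() as f:
        os.truncate(f.name, size)            # like: truncate -s 856002
        before = os.stat(f.name)
        fd = os.open(f.name, os.O_WRONLY)    # no O_CREAT, no O_TRUNC
        try:
            os.write(fd, b"\0" * block)      # like: dd bs=4K count=1 conv=notrunc
            os.fsync(fd)
        finally:
            os.close(fd)
        after = os.stat(f.name)
        return ((before.st_size, before.st_blocks),
                (after.st_size, after.st_blocks))
```

The file size stays the same across the write while the block count may grow, which is the situation where the inode still has to be updated even though its size did not change.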
     
  • When we write data beyond the end of the file in direct I/O mode, a data
    hole is created, and Btrfs should insert a file extent item that points to
    this hole into the fs tree. Unfortunately, Btrfs forgets to do so.

    The following is a simple way to reproduce it:
    # mkfs.btrfs /dev/sdc2
    # mount /dev/sdc2 /test4
    # touch /test4/a
    # dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc
    # umount /test4
    # btrfsck /dev/sdc2
    root 5 inode 257 errors 100

    Reported-by: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Tested-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Miao Xie
     
  • btrfs_cont_expand() forgot to close the transaction handle before
    jumping out of the while loop. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • At the beginning of create_pending_snapshot, trans->block_rsv is set
    to pending->block_rsv and is used for snapshot things; however, when
    that is done, we do not restore it.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     
  • While truncating the free space cache, we forget to change trans->block_rsv
    back to the original one and leave it set to orphan_block_rsv. With the
    inode_cache option enabled, this leads to countless warnings from
    btrfs_alloc_free_block and btrfs_orphan_commit_root:

    WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]()
    ...
    WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     
  • It's not enough to just search the commit root, since we could be cow'ing the
    very block we need to search through, which would mean that it's locked and we'll
    still deadlock. So use path->skip_locking as well. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • iput() shouldn't be called for inodes in I_NEW state.
    We need to mark inode as constructed first.

    WARNING: at fs/inode.c:1309 iput+0x20b/0x210()
    Call Trace:
    [] warn_slowpath_common+0x7a/0xb0
    [] warn_slowpath_null+0x15/0x20
    [] iput+0x20b/0x210
    [] btrfs_iget+0x1eb/0x4a0
    [] btrfs_run_defrag_inodes+0x136/0x210
    [] cleaner_kthread+0x17f/0x1a0
    [] ? sub_preempt_count+0x9d/0xd0
    [] ? transaction_kthread+0x280/0x280
    [] kthread+0x96/0xa0
    [] kernel_thread_helper+0x4/0x10
    [] ? kthread_worker_fn+0x190/0x190
    [] ? gs_change+0xb/0xb

    Signed-off-by: Sergei Trofimovich
    CC: Konstantin Khlebnikov
    Tested-by: David Sterba
    CC: Josef Bacik
    CC: Chris Mason
    Signed-off-by: Chris Mason

    Sergei Trofimovich
     
  • We can reproduce this oops via the following steps:

    $ mkfs.btrfs /dev/sdb7
    $ mount /dev/sdb7 /mnt/btrfs
    $ for ((i=0; i [...]

    [...] i_ino
    to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID,
    while the snapshot's location.objectid remains unchanged.

    However, btrfs_ino() does not take this into account, and returns a wrong ino,
    and causes the oops.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     

21 Aug, 2011

1 commit

  • This fixes a regression introduced by commit cdcb725c05fe ("Btrfs: check
    if there is enough space for balancing smarter"). We can't do 64-bit
    divides on 32-bit architectures.

    In cases where we need to multiply/divide by 2 we should just left/right
    shift respectively, and in cases where there are N devices use
    do_div. Also make the counters u64 to match up with rw_devices.
    Thanks,

    Signed-off-by: Josef Bacik
    Acked-and-tested-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Josef Bacik
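
As a sketch of the arithmetic involved, the model below replaces multiplication and division by 2 with shifts and uses a do_div-style helper for division by a device count. In the kernel, do_div() updates its 64-bit first argument in place to the quotient and returns the remainder; the Python helper here returns both values instead, and the names halve and double are illustrative.

```python
U64_MASK = (1 << 64) - 1


def do_div(n, base):
    """Model of do_div(): (quotient, remainder) for a u64 `n` and a u32 `base`."""
    if not 0 <= n <= U64_MASK:
        raise ValueError("n must fit in u64")
    if not 0 < base < (1 << 32):
        raise ValueError("base must be a non-zero u32")
    return divmod(n, base)


def halve(n):
    """Divide by 2 with a right shift."""
    return n >> 1


def double(n):
    """Multiply by 2 with a left shift, truncated to 64 bits."""
    return (n << 1) & U64_MASK
```

For example, do_div(10, 3) gives a quotient of 3 and a remainder of 1, halve(7) gives 3, and double(5) gives 10.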
     

18 Aug, 2011

4 commits


17 Aug, 2011

10 commits

  • We need to truncate page cache pages for the clone ioctl target range or
    else we'll confuse ourselves to no end. If the old data was cached, we
    used to still see it (until remount). If the page was partially updated
    we used to get a mix of old and new data.

    Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
     
  • sync_pending is used before it is initialized; fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • Btrfs subtracted the size of the allocated space twice when it allocated
    space from the bitmap in the cluster. This corrupted the free space
    information and eventually led to an oops.

    This patch also fixes a bug where ctl->free_space was decremented
    without holding the lock.

    Reported-by: Liu Bo
    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
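
The accounting rule can be sketched with a toy counter: each allocation subtracts its size exactly once, and only while holding the lock. The class FreeSpaceCtl and its alloc method are hypothetical stand-ins, not the kernel's structures.

```python
import threading


class FreeSpaceCtl:
    """Toy stand-in for a free-space counter guarded by a lock."""

    def __init__(self, free_space):
        self.free_space = free_space
        self._lock = threading.Lock()

    def alloc(self, nbytes):
        """Subtract `nbytes` exactly once, under the lock; True on success."""
        with self._lock:
            if nbytes > self.free_space:
                return False
            self.free_space -= nbytes
            return True
```

Keeping the subtraction in a single place makes it harder for a second code path to account for the same allocation again.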
     
  • We don't use the defrag struct on this path.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Chris Mason

    Dan Carpenter
     
  • We've stopped using highmem for extent buffers.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • When an error is detected in btrfs_drop_snapshot(), the filesystem now
    turns read-only instead of returning the error to the caller. Because
    the caller doesn't check the error, the function's return type is
    changed to 'void'.

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Tsutomu Itoh
     
  • When checking whether there is enough space to balance a block group,
    we do not take RAID types into consideration, so we do not account for
    the correct amount of space needed. This makes us do some extra
    work before we get ENOSPC.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    liubo
     
  • When balancing, we first try to shrink devices to free some space.
    On a full multi-disk partition with RAID protection, however, we may
    hit a bug: while shrinking, total_bytes may become less than bytes_used,
    and btrfs may allocate a dev extent that extends beyond the device's
    bounds.

    We are then unable to read or write data stored at the end of the
    device, and get the following:

    device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5
    Btrfs detected SSD devices, enabling SSD mode
    btrfs: relocating block group 476315648 flags 9
    btrfs: found 4 extents
    attempt to access beyond end of device
    sdb5: rw=145, want=546176, limit=546147
    attempt to access beyond end of device
    sdb5: rw=145, want=546304, limit=546147
    attempt to access beyond end of device
    sdb5: rw=145, want=546432, limit=546147
    attempt to access beyond end of device
    sdb5: rw=145, want=546560, limit=546147
    attempt to access beyond end of device

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    liubo
     
  • When btrfs recovers from a crash, it may hit the oops below:

    ------------[ cut here ]------------
    kernel BUG at fs/btrfs/inode.c:4580!
    [...]
    RIP: 0010:[] [] btrfs_add_link+0x161/0x1c0 [btrfs]
    [...]
    Call Trace:
    [] ? btrfs_inode_ref_index+0x31/0x80 [btrfs]
    [] add_inode_ref+0x319/0x3f0 [btrfs]
    [] replay_one_buffer+0x2c7/0x390 [btrfs]
    [] walk_down_log_tree+0x32a/0x480 [btrfs]
    [] walk_log_tree+0xf5/0x240 [btrfs]
    [] btrfs_recover_log_trees+0x250/0x350 [btrfs]
    [] ? btrfs_recover_log_trees+0x350/0x350 [btrfs]
    [] open_ctree+0x1442/0x17d0 [btrfs]
    [...]

    This happens because, while replaying an inode ref item, we forget to
    check for old conflicting DIR_ITEM and DIR_INDEX items in the fs/file tree,
    and then run into conflicts that trigger the BUG_ON().

    Signed-off-by: Liu Bo
    Tested-by: Andy Lutomirski
    Signed-off-by: Chris Mason

    liubo
     
  • We have a problem where, if a user specifies discard but the device doesn't
    actually support it, we return EOPNOTSUPP from btrfs_discard_extent. This is
    a problem because this gets called (in a fashion) from the tree log recovery
    code, which has a nice little BUG_ON(ret) after it, which causes us to fail
    the tree log replay. So instead detect whether our devices support discard
    when we're adding them and then don't issue discards if we know that the
    device doesn't support it. And just for good measure set ret = 0 in
    btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw
    anybody up like this again. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
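
The two-step approach above (remember at device-add time whether discard is supported, then swallow EOPNOTSUPP as a fallback) can be sketched as follows. The Device class, its fields and issue_discard are illustrative names for a toy model, not the kernel's interfaces.

```python
import errno


class Device:
    """Toy block device that may or may not support discard."""

    def __init__(self, name, supports_discard):
        self.name = name
        self.supports_discard = supports_discard
        self.discards = []  # (start, length) pairs actually discarded

    def discard(self, start, length):
        """Pretend block-layer call: negative errno when unsupported."""
        if not self.supports_discard:
            return -errno.EOPNOTSUPP
        self.discards.append((start, length))
        return 0


def issue_discard(dev, can_discard, start, length):
    """Skip devices known not to support discard, and never pass
    EOPNOTSUPP up to the caller."""
    if not can_discard:
        return 0
    ret = dev.discard(start, length)
    if ret == -errno.EOPNOTSUPP:
        ret = 0  # for good measure: unsupported discard is not a failure
    return ret
```

Here can_discard plays the role of the capability recorded when the device is added. Even if that record is wrong, the caller still sees 0, so a check such as BUG_ON(ret) in a recovery path would not fire.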
     

06 Aug, 2011

1 commit

  • Btrfs does bio submissions from a worker thread, and each device
    has a list of high priority bios and regular priority bios.

    Synchronous writes go to the high priority list while async writes
    go to the regular list. This commit brings back an explicit unplug
    any time we switch from high to regular priority, which makes it
    easier for the block layer to give us low latencies.

    Signed-off-by: Chris Mason

    Chris Mason
     

03 Aug, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits)
    Btrfs: don't call writepages from within write_full_page
    Btrfs: Remove unused variable 'last_index' in file.c
    Btrfs: clean up for find_first_extent_bit()
    Btrfs: clean up for wait_extent_bit()
    Btrfs: clean up for insert_state()
    Btrfs: remove unused members from struct extent_state
    Btrfs: clean up code for merging extent maps
    Btrfs: clean up code for extent_map lookup
    Btrfs: clean up search_extent_mapping()
    Btrfs: remove redundant code for dir item lookup
    Btrfs: make acl functions really no-op if acl is not enabled
    Btrfs: remove remaining ref-cache code
    Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
    Btrfs: use wait_event()
    Btrfs: check the nodatasum flag when writing compressed files
    Btrfs: copy string correctly in INO_LOOKUP ioctl
    Btrfs: don't print the leaf if we had an error
    btrfs: make btrfs_set_root_node void
    Btrfs: fix oops while writing data to SSD partitions
    Btrfs: Protect the readonly flag of block group
    ...

    Fix up trivial conflicts (due to acl and writeback cleanups) in
    - fs/btrfs/acl.c
    - fs/btrfs/ctree.h
    - fs/btrfs/extent_io.c

    Linus Torvalds
     

02 Aug, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set
    VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock
    VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree()
    VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree()
    switch posix_acl_chmod() to umode_t
    switch posix_acl_from_mode() to umode_t
    switch posix_acl_equiv_mode() to umode_t *
    switch posix_acl_create() to umode_t *
    block: initialise bd_super in bdget()
    vfs: avoid call to inode_lru_list_del() if possible
    vfs: avoid taking inode_hash_lock on pipes and sockets
    vfs: conditionally call inode_wb_list_del()
    VFS: Fix automount for negative autofs dentries
    Btrfs: load the key from the dir item in readdir into a fake dentry
    devtmpfs: missing initialialization in never-hit case
    hppfs: missing include

    Linus Torvalds
     
  • When doing a writepage we call writepages to try and write out any other dirty
    pages in the area. This could cause problems where we commit a transaction and
    then have somebody else dirtying metadata in the area as we could end up writing
    out a lot more than we care about, which could cause latency on anybody who is
    waiting for the transaction to completely finish committing. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik