09 Jan, 2012

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
    Kconfig: acpi: Fix typo in comment.
    misc latin1 to utf8 conversions
    devres: Fix a typo in devm_kfree comment
    btrfs: free-space-cache.c: remove extra semicolon.
    fat: Spelling s/obsolate/obsolete/g
    SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
    tools/power turbostat: update fields in manpage
    mac80211: drop spelling fix
    types.h: fix comment spelling for 'architectures'
    typo fixes: aera -> area, exntension -> extension
    devices.txt: Fix typo of 'VMware'.
    sis900: Fix enum typo 'sis900_rx_bufer_status'
    decompress_bunzip2: remove invalid vi modeline
    treewide: Fix comment and string typo 'bufer'
    hyper-v: Update MAINTAINERS
    treewide: Fix typos in various parts of the kernel, and fix some comments.
    clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
    gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
    leds: Kconfig: Fix typo 'D2NET_V2'
    sound: Kconfig: drop unknown symbol ARCH_CLPS7500
    ...

    Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
    kconfig additions, close to removed commented-out old ones)

    Linus Torvalds
     

04 Jan, 2012

5 commits


24 Dec, 2011

1 commit


23 Dec, 2011

1 commit


17 Dec, 2011

1 commit

  • …inux/kernel/git/mason/linux-btrfs

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: unplug every once and a while
    Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
    Btrfs: only set cache_generation if we setup the block group
    Btrfs: don't panic if orphan item already exists
    Btrfs: fix leaked space in truncate
    Btrfs: fix how we do delalloc reservations and how we free reservations on error
    Btrfs: deal with enospc from dirtying inodes properly
    Btrfs: fix num_workers_starting bug and other bugs in async thread
    BTRFS: Establish i_ops before calling d_instantiate
    Btrfs: add a cond_resched() into the worker loop
    Btrfs: fix ctime update of on-disk inode
    btrfs: keep orphans for subvolume deletion
    Btrfs: fix inaccurate available space on raid0 profile
    Btrfs: fix wrong disk space information of the files
    Btrfs: fix wrong i_size when truncating a file to a larger size
    Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror

    * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: lower the dirty balance poll interval

    Linus Torvalds
     

16 Dec, 2011

5 commits

  • …/btrfs-work into integration

    Conflicts:
    fs/btrfs/inode.c

    Signed-off-by: Chris Mason <chris.mason@oracle.com>

    Chris Mason
     
  • I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
    a loop. This is because we will add an orphan item, do the truncate, the
    truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
    left with an orphan item still in the fs. Then we come back later to do another
    truncate and it blows up because we already have an orphan item. This is ok so
    just fix the BUG_ON() to only BUG() if ret is not EEXIST. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
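
The change described above can be sketched as a small Python model rather than kernel code. The names `insert_orphan_item` and `orphan_add` are hypothetical stand-ins, and the orphan list is modeled as a plain set; the only point illustrated is that an already-existing orphan item is tolerated while any other failure stays fatal.

```python
import errno


def insert_orphan_item(orphans, ino):
    """Hypothetical stand-in for the insert: returns 0 or -EEXIST."""
    if ino in orphans:
        return -errno.EEXIST
    orphans.add(ino)
    return 0


def orphan_add(orphans, ino):
    """Add an orphan item, tolerating one left behind by a failed truncate."""
    ret = insert_orphan_item(orphans, ino)
    # Only failures other than -EEXIST are treated as fatal here.
    if ret and ret != -errno.EEXIST:
        raise RuntimeError(f"unexpected orphan insert failure: {ret}")
    return 0
```

Calling `orphan_add` twice for the same inode number succeeds both times in this model, which mirrors the "truncate failed, then truncate again" sequence from the message.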
     
  • We were occasionally leaking space when running xfstest 269. This is because if
    we failed to start the transaction in the truncate loop we'd just goto out, but
    we need to break so that the inode is removed from the orphan list and the space
    is properly freed. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Running xfstests 269 with some tracing my scripts kept spitting out errors about
    releasing bytes that we didn't actually have reserved. This took me down a huge
    rabbit hole and it turns out the way we deal with reserved_extents is wrong,
    we need to only be setting it if the reservation succeeds, otherwise the free()
    method will come in and unreserve space that isn't actually reserved yet, which
    can lead to other warnings and such. The math was all working out right in the
    end, but it caused all sorts of other issues in addition to making my scripts
    yell and scream and generally make it impossible for me to track down the
    original issue I was looking for. The other problem is with our error handling
    in the reservation code. There are two cases that we need to deal with

    1) We raced with free. In this case free won't free anything because csum_bytes
    is modified before we drop the lock in our reservation path, so free rightly
    doesn't release any space because the reservation code may be depending on that
    reservation. However if we fail, we need the reservation side to do the free at
    that point since that space is no longer in use. So as it stands the code was
    doing this fine and it worked out, except in case #2

    2) We don't race with free. Nobody comes in and changes anything, and our
    reservation fails. In this case we didn't reserve anything anyway and we just
    need to clean up csum_bytes but not free anything. So we keep track of
    csum_bytes before we drop the lock and if it hasn't changed we know we can just
    decrement csum_bytes and carry on.

    Because of the case where we can race with free()'s since we have to drop our
    spin_lock to do the reservation, I'm going to serialize all reservations with
    the i_mutex. We already get this for free in the heavy use paths, truncate and
    file write all hold the i_mutex, just needed to add it to page_mkwrite and
    various ioctl/balance things. With this patch my space leak scripts no longer
    scream bloody murder. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
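
The two failure cases above can be illustrated with a toy Python model. This is a sketch under assumptions, not the kernel logic: the `Inode` class, `reserve_delalloc`, and the `try_reserve` callback are hypothetical, and the amount that must be freed in the racing case is reduced to a boolean.

```python
import threading


class Inode:
    """Toy inode holding only the fields this sketch needs."""

    def __init__(self):
        self.lock = threading.Lock()
        self.csum_bytes = 0


def reserve_delalloc(inode, num_bytes, try_reserve):
    """Return (ok, must_free) for one reservation attempt."""
    with inode.lock:
        inode.csum_bytes += num_bytes
        snapshot = inode.csum_bytes  # remembered before dropping the lock
    ok = try_reserve(num_bytes)      # lock not held here, so free() may run
    if ok:
        return True, False
    with inode.lock:
        raced = inode.csum_bytes != snapshot
        inode.csum_bytes -= num_bytes
    # Case 1 (raced): free() skipped its release, so this side must free.
    # Case 2 (no race): nothing was reserved, so there is nothing to free.
    return False, raced
```

The snapshot comparison is what lets the failure path tell the two cases apart without holding the lock across the reservation itself.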
     
  • Now that we're properly keeping track of delayed inode space we've been getting
    a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
    because a bunch of people call mark_inode_dirty, which is void so we can't
    return ENOSPC. This needs to be fixed in a few areas

    1) file_update_time - this updates the mtime and such when writing to a file,
    which will call mark_inode_dirty. So copy file_update_time into btrfs so we can
    call btrfs_dirty_inode directly and return an error if we get one appropriately.

    2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
    setting ->setattr for symlinks, even though we should have been. This catches
    one of the cases where we were getting errors in mark_inode_dirty.

    3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
    instead of mark_inode_dirty. This lets us return errors properly for truncate
    and chown/anything related to setattr.

    4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
    print an error if we have one. The only remaining user we can't control for
    this is touch_atime(), but we don't really want to keep people from walking
    down the tree if we don't have space to save the atime update, so just complain
    but don't worry about it.

    With this patch xfstests 83 complains a handful of times instead of hundreds of
    times. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

15 Dec, 2011

4 commits

  • The Smack LSM hook for security_d_instantiate checks
    the inode's i_op->getxattr value to determine if the
    containing filesystem supports extended attributes.
    The BTRFS filesystem sets the inode's i_op value only
    after it has instantiated the inode. This results in
    Smack incorrectly giving new BTRFS inodes attributes
    from the filesystem defaults on the assumption that
    values can't be stored on the filesystem. This patch
    moves the assignment of inode operation vectors ahead
    of the calls to d_instantiate, letting Smack know that
    the filesystem supports extended attributes. There
    should be no impact on the performance or behavior of
    BTRFS.

    Signed-off-by: Casey Schaufler
    Signed-off-by: Chris Mason

    Casey Schaufler
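
The ordering problem described above can be shown with a minimal Python model. Everything here is a hypothetical stand-in: the hook, the `d_instantiate` function, and the dictionary used as an operation vector only imitate the idea that the hook inspects the inode at instantiation time.

```python
class Inode:
    def __init__(self):
        self.i_op = None  # operation vector, unset at creation


def hook_supports_xattrs(inode):
    """Hypothetical hook: xattrs count as supported only if i_op has getxattr."""
    return inode.i_op is not None and "getxattr" in inode.i_op


def d_instantiate(inode, observed):
    """Stand-in for d_instantiate: the hook sees the inode as it is right now."""
    observed.append(hook_supports_xattrs(inode))


def create_old_order(observed):
    inode = Inode()
    d_instantiate(inode, observed)   # hook runs before i_op is set
    inode.i_op = {"getxattr": True}
    return inode


def create_new_order(observed):
    inode = Inode()
    inode.i_op = {"getxattr": True}  # operations assigned first
    d_instantiate(inode, observed)
    return inode
```

With the old order the hook concludes that extended attributes are unsupported; with the new order it sees them, and the final state of the inode is the same in both cases.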
     
  • Since we have the free space caches, btrfs_orphan_cleanup also runs for
    the tree_root. Unfortunately this also cleans up the orphans used to mark
    subvol deletions in progress.

    Currently if a subvol deletion gets interrupted twice by umount/mount, the
    deletion will not be continued and the space permanently lost, though it
    would be possible to write a tool to recover those lost subvol deletions.
    This patch checks if the orphan belongs to a subvol (dead root) and skips
    the deletion.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
  • Btrfsck reports errors after the 83rd case of xfstests is run. The error
    number is 400, which means the used disk space of the file is wrong.

    The reason for this bug is:
    File truncation may fail when the file system does not have enough space,
    leaving some file extents whose offsets are beyond the end of the file.
    When we want to expand those files, we drop those file extents and
    put in dummy file extents, and then we should update the i-node. But btrfs
    forgets to do it.

    This patch adds the forgotten i-node update.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • Btrfsck reports error 100 after the 83rd case of xfstests is run, which means
    the i_size of the file is wrong.

    The reason for this bug is:
    Btrfs increased the i_size of the file at the beginning, but it failed to expand
    the file, and then failed to restore the old i_size because there was not
    enough space in the file system, so we found a wrong i_size.

    This patch fixes this bug by updating the i_size only after the file
    expansion succeeds and there is enough space to update the i-node.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
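
A minimal Python sketch of the ordering this fix relies on: the size field is written only after the expansion step has succeeded. The `File` class, the `NoSpace` exception (standing in for ENOSPC), and the `do_expand` callback are hypothetical and exist only for illustration.

```python
class NoSpace(Exception):
    """Stands in for ENOSPC."""


class File:
    def __init__(self, i_size):
        self.i_size = i_size


def expand(file, new_size, do_expand):
    """Grow a file; i_size changes only once the expansion has succeeded."""
    do_expand(file, new_size)  # may raise NoSpace; i_size is still the old value
    file.i_size = new_size     # reached only on success
```

Because the assignment comes last, a failed expansion leaves nothing to roll back, so there is no second step that could itself fail for lack of space.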
     

02 Dec, 2011

2 commits

  • The below patch fixes some typos in various parts of the kernel, as well as fixes some comments.
    Please let me know if I missed anything, and I will try to get it changed and resent.

    Signed-off-by: Justin P. Mattock
    Acked-by: Randy Dunlap
    Signed-off-by: Jiri Kosina

    Justin P. Mattock
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix meta data raid-repair merge problem
    Btrfs: skip allocation attempt from empty cluster
    Btrfs: skip block groups without enough space for a cluster
    Btrfs: start search for new cluster at the beginning
    Btrfs: reset cluster's max_size when creating bitmap
    Btrfs: initialize new bitmaps' list
    Btrfs: fix oops when calling statfs on readonly device
    Btrfs: Don't error on resizing FS to same size
    Btrfs: fix deadlock on metadata reservation when evicting a inode
    Fix URL of btrfs-progs git repository in docs
    btrfs scrub: handle -ENOMEM from init_ipath()

    Linus Torvalds
     

01 Dec, 2011

1 commit


23 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: remove free-space-cache.c WARN during log replay
    Btrfs: sectorsize align offsets in fiemap
    Btrfs: clear pages dirty for io and set them extent mapped
    Btrfs: wait on caching if we're loading the free space cache
    Btrfs: prefix resize related printks with btrfs:
    btrfs: fix stat blocks accounting
    Btrfs: avoid unnecessary bitmap search for cluster setup
    Btrfs: fix to search one more bitmap for cluster setup
    btrfs: mirror_num should be int, not u64
    btrfs: Fix up 32/64-bit compatibility for new ioctls
    Btrfs: fix barrier flushes
    Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush

    Linus Torvalds
     

20 Nov, 2011

1 commit

  • Round inode bytes and delalloc bytes up to real blocksize before
    converting to sector size. Otherwise, e.g., files smaller than 512 bytes
    are reported with zero blocks due to incorrect rounding.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
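
The rounding issue can be worked through with a small Python calculation. The 4096-byte block size and the exact form of the old computation are assumptions made for the example; the function names are hypothetical.

```python
SECTOR_SHIFT = 9  # block counts are reported in 512-byte units


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def stat_blocks_unrounded(inode_bytes, delalloc_bytes):
    """Assumed old behaviour: convert to sectors without rounding first."""
    return (inode_bytes + delalloc_bytes) >> SECTOR_SHIFT


def stat_blocks_rounded(inode_bytes, delalloc_bytes, blocksize=4096):
    """Round each byte count up to the block size, then convert to sectors."""
    total = align_up(inode_bytes, blocksize) + align_up(delalloc_bytes, blocksize)
    return total >> SECTOR_SHIFT
```

For a 100-byte file with no delalloc bytes, the unrounded form yields 0 blocks, while the rounded form yields 8 sectors, i.e. one full 4096-byte block.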
     

12 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: rename the option to nospace_cache
    Btrfs: handle bio_add_page failure gracefully in scrub
    Btrfs: fix deadlock caused by the race between relocation
    Btrfs: only map pages if we know we need them when reading the space cache
    Btrfs: fix orphan backref nodes
    Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}
    Btrfs: fix unreleased path in btrfs_orphan_cleanup()
    Btrfs: fix no reserved space for writing out inode cache
    Btrfs: fix nocow when deleting the item
    Btrfs: tweak the delayed inode reservations again
    Btrfs: rework error handling in btrfs_mount()
    Btrfs: close devices on all error paths in open_ctree()
    Btrfs: avoid null dereference and leaks when bailing from open_ctree()
    Btrfs: fix subvol_name leak on error in btrfs_mount()
    Btrfs: fix memory leak in btrfs_parse_early_options()
    Btrfs: fix our reservations for updating an inode when completing io
    Btrfs: fix oops on NULL trans handle in btrfs_truncate
    btrfs: fix double-free 'tree_root' in 'btrfs_mount()'

    Linus Torvalds
     

11 Nov, 2011

2 commits

  • When we stress tested space relocation, a deadlock happened.
    While debugging, we found that we forgot to unlock the read lock of the
    extent buffers in btrfs_orphan_cleanup() before ending the transaction
    handle. The transaction commit task waited for the task that called
    btrfs_orphan_cleanup() to unlock the extent buffer, while that task waited
    for the commit task to finish the transaction commit, and so the deadlock
    happened. Fix it.

    Signed-off-by: Miao Xie

    Signed-off-by: Chris Mason

    Miao Xie
     
  • Josef sent along an incremental to the inode reservation
    code to make sure we try and fall back to directly updating
    the inode item if things go horribly wrong.

    This reworks that patch slightly, adding a fallback function
    that will always try to update the inode item directly without
    going through the delayed_inode code.

    Signed-off-by: Chris Mason

    Chris Mason
     

09 Nov, 2011

2 commits

  • People have been reporting ENOSPC crashes in finish_ordered_io. This is because
    we try to steal from the delalloc block rsv to satisfy a reservation to update
    the inode. The problem with this is we don't explicitly save space for updating
    the inode when doing delalloc. This is kind of a problem and we've gotten away
    with this because way back when we just stole from the delalloc reserve without
    any questions, and this worked out fine because generally speaking the leaf had
    been modified either by the mtime update when we did the original write or
    because we just updated the leaf when we inserted the file extent item, only on
    rare occasions had the leaf not actually been modified, and that was still ok
    because we'd just use a block or two out of the over-reservation that is
    delalloc.

    Then came the delayed inode stuff. This is amazing, except it wants a full
    reservation for updating the inode since it may do it at some point down the
    road after we've written the blocks and we have to recow everything again. This
    worked out because the delayed inode stuff just stole from the global reserve,
    that is until recently when I changed that because it caused other problems.

    So here we are, we're doing everything right and being screwed for it. So take
    an extra reservation for the inode at delalloc reservation time and carry it
    through the life of the delalloc reservation. If we need it we can steal it in
    the delayed inode stuff. If we have already stolen it try and do a normal
    metadata reservation. If that fails try to steal from the delalloc reservation.
    If _that_ fails we'll get a WARN_ON() so I can start thinking of a better way to
    solve this and in the meantime we'll steal from the global reserve.

    With this patch I ran xfstests 13 in a loop for a couple of hours and didn't see
    any problems.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
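
The fallback order described in the third paragraph can be sketched as a simple chain. This is a Python model with hypothetical names; the callbacks stand in for the real reservation attempts, and a Python warning stands in for WARN_ON().

```python
import warnings


def reserve_for_inode_update(extra_held, try_metadata, try_delalloc):
    """Return the name of the source that pays for the inode update."""
    if extra_held:
        return "delalloc-extra"  # the reservation taken at delalloc time
    if try_metadata():
        return "metadata"        # a normal metadata reservation
    if try_delalloc():
        return "delalloc"        # steal from the delalloc reservation
    warnings.warn("inode update fell back to the global reserve")
    return "global"
```

Each step is tried only if the previous one was unavailable, so the global reserve is touched only in the case the message flags as needing a better solution.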
     
  • If we fail to reserve space in the transaction during truncate, we can
    error out with a NULL trans handle. The cleanup code needs an extra
    check to make sure we aren't trying to use the bad handle.

    Signed-off-by: Chris Mason

    Chris Mason
     

07 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits)
    Btrfs: check for a null fs root when writing to the backup root log
    Btrfs: fix race during transaction joins
    Btrfs: fix a potential btrfs_bio leak on scrub fixups
    Btrfs: rename btrfs_bio multi -> bbio for consistency
    Btrfs: stop leaking btrfs_bios on readahead
    Btrfs: stop the readahead threads on failed mount
    Btrfs: fix extent_buffer leak in the metadata IO error handling
    Btrfs: fix the new inspection ioctls for 32 bit compat
    Btrfs: fix delayed insertion reservation
    Btrfs: ClearPageError during writepage and clean_tree_block
    Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
    Btrfs: make a delayed_block_rsv for the delayed item insertion
    Btrfs: add a log of past tree roots
    btrfs: separate superblock items out of fs_info
    Btrfs: use the global reserve when truncating the free space cache inode
    Btrfs: release metadata from global reserve if we have to fallback for unlink
    Btrfs: make sure to flush queued bios if write_cache_pages waits
    Btrfs: fix extent pinning bugs in the tree log
    Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
    Btrfs: don't wait as long for more batches during SSD log commit
    ...

    Linus Torvalds
     

06 Nov, 2011

3 commits

  • Conflicts:
    fs/btrfs/Makefile
    fs/btrfs/extent_io.c
    fs/btrfs/extent_io.h
    fs/btrfs/scrub.c

    Signed-off-by: Chris Mason

    Chris Mason
     
  • fs_info is now ~9kb, more than fits into one page. This will cause
    mount failure when memory is too fragmented. Top space consumers are
    super block structures super_copy and super_for_commit, ~2.8kb each.
    Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

    Add a wrapper for freeing fs_info and all of its dynamically allocated
    members.

    Signed-off-by: David Sterba

    David Sterba
     
  • I fixed a problem where we weren't reserving space for an orphan item when we
    had to fallback to using the global reserve for an unlink, but I introduced
    another problem. I was migrating the bytes from the transaction reserve to the
    global reserve and then releasing from the global reserve in
    btrfs_end_transaction(). The problem with this is that a migrate will jack up
    the size for the destination, but leave the size alone for the source, with the
    idea that you can do a release normally on the source and it all washes out, and
    then you can do a release again on the destination and it works out right. My
    way was skipping the release on the trans_block_rsv which still had the jacked
    up size from our original reservation. So instead release manually from the
    global reserve if this transaction was using it, and then set the
    trans->block_rsv back to the trans_block_rsv so that btrfs_end_transaction
    cleans everything up properly. With this patch xfstest 83 doesn't emit warnings
    about leaking space. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

02 Nov, 2011

1 commit


21 Oct, 2011

2 commits

  • To reproduce the bug:

    # mount -o nodatacow /dev/sda7 /mnt/
    # dd if=/dev/zero of=/mnt/tmp bs=4K count=1
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB) copied, 0.000136115 s, 30.1 MB/s
    # dd if=/dev/zero of=/mnt/tmp bs=4K count=1 conv=notrunc oflag=direct
    dd: writing `/mnt/tmp': Input/output error
    1+0 records in
    0+0 records out

    btrfs_ordered_update_i_size() may return 1, but btrfs_endio_direct_write()
    mistakenly takes it as an error.

    Signed-off-by: Li Zefan

    Li Zefan
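
The return-value mix-up can be illustrated with a short Python model. It assumes, for the sake of the example, that a return of 1 means no on-disk size update is needed, that 0 means the inode should be updated, and that negative values are real errors; the function and callback names are hypothetical.

```python
def endio_direct_write(ordered_update_i_size, update_inode):
    """Return 0 on success or a negative errno-style value on failure."""
    ret = ordered_update_i_size()
    if ret < 0:
        return ret      # a real error
    if ret == 0:
        update_inode()  # assumption: 0 means the size changed
    return 0            # a positive ret is informational, not an error
```

Treating any non-zero value as a failure is what would turn the harmless `1` into the I/O error seen in the `dd` output above.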
     
  • It's not a big deal if we fail to allocate the array, and instead of
    panic we can just give up compressing.

    Signed-off-by: Li Zefan

    Li Zefan
     

20 Oct, 2011

5 commits

  • Currently btrfs_block_rsv_check does 2 things, it will either refill a block
    reserve like in the truncate or refill case, or it will check to see if there is
    enough space in the global reserve and possibly refill it. However because of
    overcommit we could be well overcommitting ourselves just to try and refill the
    global reserve, when really we should just be committing the transaction. So
    break this out into btrfs_block_rsv_refill and btrfs_block_rsv_check. Refill
    will try to reserve more metadata if it can and btrfs_block_rsv_check will not,
    it will only tell you if the factor of the total space is still reserved.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
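
The split between the two operations can be sketched in Python. The `BlockRsv` class, the fractional threshold, and the `try_reserve` callback are assumptions for illustration; only the division of labour (check reports, refill reserves) comes from the message.

```python
class BlockRsv:
    def __init__(self, size, reserved):
        self.size = size
        self.reserved = reserved


def block_rsv_check(rsv, min_fraction):
    """Report only: is at least min_fraction of the size still reserved?"""
    return rsv.reserved >= rsv.size * min_fraction


def block_rsv_refill(rsv, min_reserved, try_reserve):
    """Try to top the reserve up to min_reserved; may reserve more metadata."""
    if rsv.reserved >= min_reserved:
        return True
    needed = min_reserved - rsv.reserved
    if try_reserve(needed):
        rsv.reserved += needed
        return True
    return False
```

A caller that gets `False` from the check can then decide to commit the transaction instead of reserving more, which is the behaviour the message argues for.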
     
  • In __unlink_start_trans() if we don't have enough room for a reservation we will
    check to see if the unlink will free up space. If it does that's great, but we
    could still add an orphan item, so we need to reserve enough space to add
    the orphan item. Do this and migrate the space to the global reserve so it all
    works out right. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Our unlink reservations were a bit much, we were reserving 10 and I only count 8
    possible items we're touching, so comment what we're reserving for and fix the
    count value. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Yeah yeah I know this is how we used to do it and then I changed it, but damnit
    I'm changing it back. The fact is that writing out checksums will modify
    metadata, which could cause us to dirty a block group we've already written out,
    so we have to truncate it and all of its checksums and re-write it which will
    write new checksums which could dirty a block group that has already been
    written and you see where I'm going with this? This can cause unmount or really
    anything that depends on a transaction to commit to take its sweet damned time
    to happen. So go back to the way it was, only this time we're specifically
    setting NODATACOW because we can't go through the COW pathway anyway and we're
    doing our own built-in cow'ing by truncating the free space cache. The other
    new thing is once we truncate the old cache and preallocate the new space, we
    don't need to do that song and dance at all for the rest of the transaction, we
    can just overwrite the existing space with the new cache if the block group
    changes for whatever reason, and the NODATACOW will let us do this fine. So
    keep track of which transaction we last cleared our cache in and if we cleared
    it in this transaction just say we're all setup and carry on. This survives
    xfstests and stress.sh.

    The inode cache will continue to use the normal csum infrastructure since it
    only gets written once and there will be no more modifications to the fs tree in
    a transaction commit.

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • I noticed while running xfstests 83 that if we didn't have enough space to
    delete our inode the orphan cleanup would just loop. This is because it keeps
    finding the same orphan item and keeps trying to kill it but can't because we
    don't get an error back from iput for deleting the inode. So keep track of the
    last guy we tried to kill, if it's the same as the one we're trying to kill
    currently we know we are having problems and can just error out. I don't have a
    way to test this so look hard and make sure it's right. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
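
The loop guard described above can be sketched as follows. This is a Python model with hypothetical callbacks: `find_next_orphan` returns the next orphan's inode number or `None`, and `try_delete` reports no error to its caller, like the iput() case in the message.

```python
def orphan_cleanup(find_next_orphan, try_delete):
    """Return 0 when no orphans remain, or -1 if the same orphan comes back."""
    last = None
    while True:
        ino = find_next_orphan()
        if ino is None:
            return 0
        if ino == last:
            return -1    # same orphan again: deletion is not making progress
        last = ino
        try_delete(ino)  # gives no error back, so progress is checked above
```

Remembering only the most recent inode number is enough here because the search keeps returning the first remaining orphan, so a failed deletion shows up on the very next iteration.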