13 Jan, 2012

2 commits

  • This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
    mode that avoids writing back pages to backing storage. Async compaction
    maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
    For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
    used.

    This avoids sync compaction stalling for an excessive length of time,
    particularly when copying files to a USB stick where there might be a
    large number of dirty pages backed by a filesystem that does not support
    ->writepages.

    [aarcange@redhat.com: This patch is heavily based on Andrea's work]
    [akpm@linux-foundation.org: fix fs/nfs/write.c build]
    [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Asynchronous compaction is used when allocating transparent hugepages to
    avoid blocking for long periods of time. Due to reports of stalling,
    there was a debate on disabling synchronous compaction but this severely
    impacted allocation success rates. Part of the reason was that many dirty
    pages are skipped in asynchronous compaction by the following check:

    if (PageDirty(page) && !sync &&
        mapping->a_ops->migratepage != migrate_page)
            rc = -EBUSY;

    This skips over every mapping whose aops use buffer_migrate_page() even
    though it is possible to migrate some of these pages without blocking. This
    patch adds a "sync" parameter to the ->migratepage callback. It is the
    responsibility of the callback to fail gracefully if migration would
    block.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
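
The two commits above split migration into three modes and make each ->migratepage callback responsible for refusing work that would block. What follows is a user-space model of those decisions, not kernel code: the enum reuses the three mode names from the commit message, while `struct model_page`, `model_compaction_mode()` and `model_migratepage()` are hypothetical. The model shows a buffer-backed page being migrated without writeback even in async mode, and a plain dirty page being written back only in MIGRATE_SYNC.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The three modes named in the MIGRATE_SYNC_LIGHT commit message. */
enum migrate_mode {
	MIGRATE_ASYNC,      /* must not block at all */
	MIGRATE_SYNC_LIGHT, /* may block on locks, never writes pages back */
	MIGRATE_SYNC,       /* may block and may write dirty pages back */
};

/* Hypothetical stand-in for a page cache page. */
struct model_page {
	bool dirty;          /* page holds data not yet on disk */
	bool has_buffers;    /* mapping migrates via buffer_migrate_page() */
	bool buffers_locked; /* another task currently holds the buffer locks */
};

/* Async compaction -> MIGRATE_ASYNC, sync compaction -> MIGRATE_SYNC_LIGHT. */
static enum migrate_mode model_compaction_mode(bool sync_compaction)
{
	return sync_compaction ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC;
}

/*
 * Hypothetical ->migratepage callback: it fails with -EBUSY rather than
 * blocking (or writing back) when the mode forbids it.
 */
static int model_migratepage(struct model_page *page, enum migrate_mode mode)
{
	assert(mode == MIGRATE_ASYNC || mode == MIGRATE_SYNC_LIGHT ||
	       mode == MIGRATE_SYNC);

	if (page->has_buffers) {
		/* Async migration may only try-lock the buffers. */
		if (page->buffers_locked && mode == MIGRATE_ASYNC)
			return -EBUSY;
		/* Sync modes would sleep here until the lock is released. */
		page->buffers_locked = false;
		return 0; /* buffers move with the page, no writeback needed */
	}

	if (page->dirty) {
		/* Only full synchronous migration writes dirty pages back. */
		if (mode != MIGRATE_SYNC)
			return -EBUSY;
		page->dirty = false; /* pretend the writeout completed */
	}
	return 0;
}
```

In this model a dirty, buffer-backed page whose buffers are free migrates in MIGRATE_ASYNC, which is the case the old `PageDirty(page) && !sync` check used to skip.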
     

22 Dec, 2011

1 commit

  • * master: (848 commits)
    SELinux: Fix RCU deref check warning in sel_netport_insert()
    binary_sysctl(): fix memory leak
    mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
    ipmi_watchdog: restore settings when BMC reset
    oom: fix integer overflow of points in oom_badness
    memcg: keep root group unchanged if creation fails
    nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
    nilfs2: unbreak compat ioctl
    cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
    evm: prevent racing during tfm allocation
    evm: key must be set once during initialization
    mmc: vub300: fix type of firmware_rom_wait_states module parameter
    Revert "mmc: enable runtime PM by default"
    mmc: sdhci: remove "state" argument from sdhci_suspend_host
    x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
    IB/qib: Correct sense on freectxts increment and decrement
    RDMA/cma: Verify private data length
    cgroups: fix a css_set not found bug in cgroup_attach_proc
    oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
    Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
    ...

    Conflicts:
    kernel/cgroup_freezer.c

    Rafael J. Wysocki
     

17 Dec, 2011

1 commit

  • …inux/kernel/git/mason/linux-btrfs

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: unplug every once and a while
    Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
    Btrfs: only set cache_generation if we setup the block group
    Btrfs: don't panic if orphan item already exists
    Btrfs: fix leaked space in truncate
    Btrfs: fix how we do delalloc reservations and how we free reservations on error
    Btrfs: deal with enospc from dirtying inodes properly
    Btrfs: fix num_workers_starting bug and other bugs in async thread
    BTRFS: Establish i_ops before calling d_instantiate
    Btrfs: add a cond_resched() into the worker loop
    Btrfs: fix ctime update of on-disk inode
    btrfs: keep orphans for subvolume deletion
    Btrfs: fix inaccurate available space on raid0 profile
    Btrfs: fix wrong disk space information of the files
    Btrfs: fix wrong i_size when truncating a file to a larger size
    Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror

    * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: lower the dirty balance poll interval

    Linus Torvalds
     

16 Dec, 2011

1 commit

  • Al pointed out we have some random problems with the way we account for
    num_workers_starting in the async thread stuff. First of all we need to make
    sure to decrement num_workers_starting if we fail to start the worker, so make
    __btrfs_start_workers do this. Also fix __btrfs_start_workers so that it
    doesn't call btrfs_stop_workers(); there is no point in stopping everybody if
    we failed to create a worker. Also check_pending_worker_creates needs to call
    __btrfs_start_work in its work function since it already increments
    num_workers_starting.

    People only start one worker at a time, so get rid of the num_workers argument
    everywhere, and make btrfs_queue_worker return void since it will always
    succeed.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
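
The accounting rule in the first paragraph is that whoever increments num_workers_starting must see it decremented again, whether or not the worker starts. The sketch below illustrates that rule under stated assumptions. It is a single-threaded model, so a real worker pool would also need a lock around these counters. The struct and both functions are hypothetical, and the `spawn_succeeds` flag only simulates the outcome of thread creation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for the worker pool counters. */
struct model_workers {
	int num_workers;          /* workers that are up and running */
	int num_workers_starting; /* creations that are still in flight */
};

/*
 * Start exactly one worker. The caller has already incremented
 * num_workers_starting, so both the success and the failure path give
 * that count back, and a failure does not tear down the other workers.
 */
static int model_start_one_worker(struct model_workers *w, bool spawn_succeeds)
{
	w->num_workers_starting--;
	assert(w->num_workers_starting >= 0);
	if (!spawn_succeeds)
		return -ENOMEM;
	w->num_workers++;
	return 0;
}

/* Account for the new worker first, then try to create it. */
static int model_start_worker(struct model_workers *w, bool spawn_succeeds)
{
	w->num_workers_starting++;
	return model_start_one_worker(w, spawn_succeeds);
}
```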
     

24 Nov, 2011

1 commit

  • * 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: (24 commits)
    freezer: fix wait_event_freezable/__thaw_task races
    freezer: kill unused set_freezable_with_signal()
    dmatest: don't use set_freezable_with_signal()
    usb_storage: don't use set_freezable_with_signal()
    freezer: remove unused @sig_only from freeze_task()
    freezer: use lock_task_sighand() in fake_signal_wake_up()
    freezer: restructure __refrigerator()
    freezer: fix set_freezable[_with_signal]() race
    freezer: remove should_send_signal() and update frozen()
    freezer: remove now unused TIF_FREEZE
    freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
    cgroup_freezer: prepare for removal of TIF_FREEZE
    freezer: clean up freeze_processes() failure path
    freezer: kill PF_FREEZING
    freezer: test freezable conditions while holding freezer_lock
    freezer: make freezing indicate freeze condition in effect
    freezer: use dedicated lock instead of task_lock() + memory barrier
    freezer: don't distinguish nosig tasks on thaw
    freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks
    freezer: rename thaw_process() to __thaw_task() and simplify the implementation
    ...

    Rafael J. Wysocki
     

23 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: remove free-space-cache.c WARN during log replay
    Btrfs: sectorsize align offsets in fiemap
    Btrfs: clear pages dirty for io and set them extent mapped
    Btrfs: wait on caching if we're loading the free space cache
    Btrfs: prefix resize related printks with btrfs:
    btrfs: fix stat blocks accounting
    Btrfs: avoid unnecessary bitmap search for cluster setup
    Btrfs: fix to search one more bitmap for cluster setup
    btrfs: mirror_num should be int, not u64
    btrfs: Fix up 32/64-bit compatibility for new ioctls
    Btrfs: fix barrier flushes
    Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush

    Linus Torvalds
     

22 Nov, 2011

1 commit

  • There is no reason to export two functions for entering the
    refrigerator. Calling refrigerator() instead of try_to_freeze()
    doesn't save anything noticeable or remove any race condition.

    * Rename refrigerator() to __refrigerator() and make it return bool
    indicating whether it scheduled out for freezing.

    * Update try_to_freeze() to return bool and relay the return value of
    __refrigerator() if freezing().

    * Convert all refrigerator() users to try_to_freeze().

    * Update documentation accordingly.

    * While at it, add might_sleep() to try_to_freeze().

    Signed-off-by: Tejun Heo
    Cc: Samuel Ortiz
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: KONISHI Ryusuke
    Cc: Christoph Hellwig

    Tejun Heo
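
The resulting call shape is a single entry point, try_to_freeze(), that returns bool and relays whether __refrigerator() actually scheduled out. A user-space model of that shape follows. `struct model_task` and the `model_` functions are hypothetical, and "thawing" is modelled simply as clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state for the model. */
struct model_task {
	bool freezing;    /* stands in for the freezing() condition */
	int times_frozen; /* how many times the task entered the refrigerator */
};

/*
 * Stand-in for __refrigerator(): returns true if the task actually
 * scheduled out to be frozen.
 */
static bool model_refrigerator(struct model_task *t)
{
	if (!t->freezing)
		return false;
	t->times_frozen++;   /* the real task would sleep here until thawed */
	t->freezing = false; /* model: the thaw arrives */
	return true;
}

/*
 * Stand-in for try_to_freeze(): relays the refrigerator's result.
 * (The commit also adds might_sleep() to the real function.)
 */
static bool model_try_to_freeze(struct model_task *t)
{
	assert(t != 0);
	if (!t->freezing)
		return false;
	return model_refrigerator(t);
}
```

A kthread loop could call model_try_to_freeze() once per iteration and use a true result as a cue to re-check its stop condition before doing more work.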
     

20 Nov, 2011

2 commits

  • My previous patch introduced some u64 for failed_mirror variables, this one
    makes it consistent again.

    Signed-off-by: Jan Schmidt
    Signed-off-by: Chris Mason

    Jan Schmidt
     
  • When btrfs is writing the super blocks, it sends barrier flushes to make
    sure drives with writeback caching get all the metadata on disk in the
    right order.

    But we have two bugs in the way these are sent down. When doing
    full commits (not via the tree log), we are sending the barrier down
    before the last super when it should be going down before the first.

    In multi-device setups, we should be waiting for the barriers to
    complete on all devices before writing any of the supers.

    Both of these bugs can cause corruptions on power failures. We fix it
    with some new code to send down empty barriers to all devices before
    writing the first super.

    Alexandre Oliva found the multi-device bug. Arne Jansen did the async
    barrier loop.

    Signed-off-by: Chris Mason
    Reported-by: Alexandre Oliva

    Chris Mason
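
The fixed ordering in the barrier commit has three phases: submit an empty flush to every device, wait for all of them, and only then write the first super. A sketch of that ordering, checked against an event trace, is shown below. The trace type and every `model_` name are hypothetical, and the model says nothing about the actual bio flags btrfs uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_EVENTS 64

/* Hypothetical event trace so the I/O ordering can be checked. */
enum model_event { EV_FLUSH_SUBMIT, EV_FLUSH_WAIT, EV_WRITE_SUPER };

struct model_trace {
	enum model_event ev[MODEL_MAX_EVENTS];
	int dev[MODEL_MAX_EVENTS];
	size_t n;
};

static void model_trace_event(struct model_trace *trace, enum model_event e, int dev)
{
	assert(trace->n < MODEL_MAX_EVENTS);
	trace->ev[trace->n] = e;
	trace->dev[trace->n] = dev;
	trace->n++;
}

/* Flush every device, wait for all flushes, then write the supers. */
static void model_write_all_supers(struct model_trace *trace, int ndevs)
{
	for (int i = 0; i < ndevs; i++)
		model_trace_event(trace, EV_FLUSH_SUBMIT, i); /* in flight in parallel */
	for (int i = 0; i < ndevs; i++)
		model_trace_event(trace, EV_FLUSH_WAIT, i);   /* flush acknowledged */
	for (int i = 0; i < ndevs; i++)
		model_trace_event(trace, EV_WRITE_SUPER, i);
}

/* True if no super was written before every device's flush completed. */
static bool model_order_ok(const struct model_trace *trace, int ndevs)
{
	int completed = 0;
	for (size_t i = 0; i < trace->n; i++) {
		if (trace->ev[i] == EV_FLUSH_WAIT)
			completed++;
		else if (trace->ev[i] == EV_WRITE_SUPER && completed < ndevs)
			return false;
	}
	return true;
}
```

Submitting all flushes before waiting on any of them lets the devices drain their caches in parallel instead of one after another.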
     

12 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: rename the option to nospace_cache
    Btrfs: handle bio_add_page failure gracefully in scrub
    Btrfs: fix deadlock caused by the race between relocation
    Btrfs: only map pages if we know we need them when reading the space cache
    Btrfs: fix orphan backref nodes
    Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}
    Btrfs: fix unreleased path in btrfs_orphan_cleanup()
    Btrfs: fix no reserved space for writing out inode cache
    Btrfs: fix nocow when deleting the item
    Btrfs: tweak the delayed inode reservations again
    Btrfs: rework error handling in btrfs_mount()
    Btrfs: close devices on all error paths in open_ctree()
    Btrfs: avoid null dereference and leaks when bailing from open_ctree()
    Btrfs: fix subvol_name leak on error in btrfs_mount()
    Btrfs: fix memory leak in btrfs_parse_early_options()
    Btrfs: fix our reservations for updating an inode when completing io
    Btrfs: fix oops on NULL trans handle in btrfs_truncate
    btrfs: fix double-free 'tree_root' in 'btrfs_mount()'

    Linus Torvalds
     

10 Nov, 2011

2 commits

  • Fix a bug introduced by 7e662854 where we would leave devices busy on
    certain error paths in open_ctree(). fs_info is guaranteed to be
    non-NULL now so it's safe to dereference it on all error paths.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Fix bugs introduced by 6c41761f. Firstly, after failing to allocate any
    of the tree roots (first 'goto fail' in open_ctree()) we would
    dereference a NULL fs_info pointer in free_fs_info(). Secondly, after
    failures from init_srcu_struct(), setup_bdi() and new_inode() we would
    leak all earlier allocated roots: fs_info fields haven't been
    initialized yet so free_fs_info() is rendered useless.

    Fix this by initializing fs_info pointer and fs_info fields before any
    allocations happen.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
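
The second fix follows a general C error-handling pattern: initialize the owning structure before any other allocation and store each resource in it immediately, so that one cleanup routine is safe on every error path. A minimal sketch of that pattern is given below. `struct model_fs_info`, its `roots` array, `model_open_ctree()` and the `fail_at` knob are all hypothetical; `fail_at` only simulates which allocation fails.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NROOTS 3

/* Hypothetical miniature of fs_info that owns a few tree roots. */
struct model_fs_info {
	void *roots[MODEL_NROOTS]; /* e.g. tree, extent and chunk roots */
};

/* Safe on a partially built fs_info: free(NULL) is a no-op. */
static void model_free_fs_info(struct model_fs_info *fs_info)
{
	if (!fs_info)
		return;
	for (int i = 0; i < MODEL_NROOTS; i++)
		free(fs_info->roots[i]);
	free(fs_info);
}

/*
 * fail_at simulates an allocation failure of root number fail_at
 * (1-based); 0 means every allocation succeeds.
 */
static struct model_fs_info *model_open_ctree(int fail_at)
{
	assert(fail_at >= 0 && fail_at <= MODEL_NROOTS);

	struct model_fs_info *fs_info = malloc(sizeof(*fs_info));
	if (!fs_info)
		return NULL;
	for (int i = 0; i < MODEL_NROOTS; i++)
		fs_info->roots[i] = NULL; /* initialized before any root is allocated */

	for (int i = 0; i < MODEL_NROOTS; i++) {
		fs_info->roots[i] = (i + 1 == fail_at) ? NULL : malloc(64);
		if (!fs_info->roots[i])
			goto fail; /* earlier roots are already reachable from fs_info */
	}
	return fs_info;
fail:
	model_free_fs_info(fs_info);
	return NULL;
}
```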
     

07 Nov, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits)
    Btrfs: check for a null fs root when writing to the backup root log
    Btrfs: fix race during transaction joins
    Btrfs: fix a potential btrfs_bio leak on scrub fixups
    Btrfs: rename btrfs_bio multi -> bbio for consistency
    Btrfs: stop leaking btrfs_bios on readahead
    Btrfs: stop the readahead threads on failed mount
    Btrfs: fix extent_buffer leak in the metadata IO error handling
    Btrfs: fix the new inspection ioctls for 32 bit compat
    Btrfs: fix delayed insertion reservation
    Btrfs: ClearPageError during writepage and clean_tree_block
    Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
    Btrfs: make a delayed_block_rsv for the delayed item insertion
    Btrfs: add a log of past tree roots
    btrfs: separate superblock items out of fs_info
    Btrfs: use the global reserve when truncating the free space cache inode
    Btrfs: release metadata from global reserve if we have to fallback for unlink
    Btrfs: make sure to flush queued bios if write_cache_pages waits
    Btrfs: fix extent pinning bugs in the tree log
    Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
    Btrfs: don't wait as long for more batches during SSD log commit
    ...

    Linus Torvalds
     
  • During log replay, we can commit the transaction before the fs_root
    pointers are set up, so we have to make sure they are not NULL before
    trying to use them.

    Signed-off-by: Chris Mason

    Chris Mason
     

06 Nov, 2011

8 commits


02 Nov, 2011

1 commit


20 Oct, 2011

3 commits

  • One of the things that kills us is the fact that our ENOSPC reservations are
    horribly over the top in most normal cases. There isn't too much that can be
    done about this because when we are completely full we really need them to work
    like this so we don't under-reserve. However, if there are plenty of unallocated
    chunks on the disk we can use that to gauge how much we can overcommit. So this
    patch adds chunk free space accounting so we always know how much unallocated
    space we have. Then if we fail to make a reservation within our allocated
    space, check to see if we can overcommit. In the normal flushing case (like
    with delalloc metadata reservations) we'll take the free space and divide it by
    2 if our metadata profile is set up for DUP or another mirrored profile (RAID1,
    RAID10), and then divide it by 8 to make sure we don't overcommit too much. Then
    if we're in a non-flushing case (we really need this reservation now!) we only
    limit ourselves to half of the free space. This makes this fio test

    [torrent]
    filename=torrent-test
    rw=randwrite
    size=4g
    ioengine=sync
    directory=/mnt/btrfs-test

    go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
    file system. This doesn't seem to break my other enospc tests, but could really
    use some more testing as this is a super scary change. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • In moving some enospc stuff around I noticed that when we unmount we are often
    evicting the free space cache inodes before we do our last commit. This isn't
    bad, but it makes us constantly have to read the inodes back in. So instead
    don't evict the cache until after we do our last commit; this will make things
    a little less crappy and make a future enospc change work properly. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • This is confusing code and isn't used by anything anymore, so delete it.

    Signed-off-by: Josef Bacik

    Josef Bacik
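
The overcommit arithmetic from the first commit in this group can be written as a pure function of the free chunk space. All names below are hypothetical. One detail is an assumption: the commit message describes the DUP halving only for the flushing case, and this sketch applies it to the non-flushing case as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bytes of unallocated chunk space a metadata reservation may borrow.
 * mirrored: the metadata profile keeps two copies (DUP and similar).
 * can_flush: the caller is in the normal flushing case.
 */
static uint64_t model_overcommit_limit(uint64_t free_chunk_space,
				       bool mirrored, bool can_flush)
{
	uint64_t avail = free_chunk_space;

	if (mirrored)
		avail >>= 1; /* two copies: half the raw space is usable */
	if (can_flush)
		avail >>= 3; /* flushing callers: at most 1/8 of that */
	else
		avail >>= 1; /* non-flushing callers: at most half */
	assert(avail <= free_chunk_space);
	return avail;
}

/* May a reservation of `bytes` succeed by overcommitting? */
static bool model_can_overcommit(uint64_t reserved, uint64_t allocated,
				 uint64_t bytes, uint64_t free_chunk_space,
				 bool mirrored, bool can_flush)
{
	return reserved + bytes <=
	       allocated + model_overcommit_limit(free_chunk_space, mirrored, can_flush);
}
```

Under this model, a nearly empty 3 TiB file system with DUP metadata would let a flushing caller overcommit about 3 TiB / 2 / 8 = 192 GiB.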
     

02 Oct, 2011

4 commits

  • This adds the hooks needed for readahead. In the readpage_end_io_hook,
    the extent state is checked for the EXTENT_READAHEAD flag. Only in this
    case is the readahead hook called, to keep the impact on non-readahead
    reads as low as possible.
    Additionally, a hook for a failed IO is added, otherwise readahead would
    wait indefinitely for the extent to finish.

    Changes for v2:
    - eliminate race condition

    Signed-off-by: Arne Jansen

    Arne Jansen
     
  • Add state information for readahead to btrfs_fs_info and btrfs_device

    Changes v2:
    - don't wait in radix_trees
    - add own set of workers for readahead

    Reviewed-by: Josef Bacik
    Signed-off-by: Arne Jansen

    Arne Jansen
     
  • Add a READAHEAD extent buffer flag.
    Add a function to trigger a read with this flag set.

    Changes v2:
    - use extent buffer flags instead of extent state flags

    Changes v5:
    - adapt to changed read_extent_buffer_pages interface
    - don't return eb from reada_tree_block_flagged if it has CORRUPT flag set

    Signed-off-by: Arne Jansen

    Arne Jansen
     
  • read_extent_buffer_pages currently has two modes: either trigger a read
    without waiting for anything, or wait for the I/O to finish. The former
    also bails when it's unable to lock the page. This patch adds an
    additional parameter that allows it to block on the page lock but not
    wait for completion.

    Changes v5:
    - merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and
    WAIT_PAGE_LOCK

    Changes v6:
    - fix bug introduced in v5

    Signed-off-by: Arne Jansen

    Arne Jansen
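
The three wait modes in the read_extent_buffer_pages commit differ only in whether the page lock may block and whether the caller waits for the read. A user-space model of that split follows. The mode names come from the v5 changelog, but their numeric values, `struct model_page` and `model_read_eb_pages()` are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mode names from the v5 changelog; the numeric values are assumed. */
#define WAIT_NONE      0 /* try-lock pages, start the read, don't wait */
#define WAIT_COMPLETE  1 /* lock pages, start the read, wait for the I/O */
#define WAIT_PAGE_LOCK 2 /* lock pages (may block), start the read, don't wait */

/* Hypothetical page model. */
struct model_page {
	bool locked_by_other; /* someone else holds the page lock */
	bool uptodate;        /* read I/O has completed */
};

/*
 * Returns 0 if the read was started (and, for WAIT_COMPLETE, finished),
 * or -EAGAIN if WAIT_NONE found a page it could not lock immediately.
 */
static int model_read_eb_pages(struct model_page *pages, int n, int wait)
{
	assert(wait == WAIT_NONE || wait == WAIT_COMPLETE || wait == WAIT_PAGE_LOCK);

	for (int i = 0; i < n; i++) {
		if (!pages[i].locked_by_other)
			continue;
		if (wait == WAIT_NONE)
			return -EAGAIN; /* a real version would unlock pages it took */
		pages[i].locked_by_other = false; /* blocked until the lock was free */
	}

	/* The read is submitted here; only WAIT_COMPLETE waits for it. */
	if (wait == WAIT_COMPLETE)
		for (int i = 0; i < n; i++)
			pages[i].uptodate = true;
	return 0;
}
```

WAIT_PAGE_LOCK suits readahead: the caller is willing to sleep for the lock so the read is not silently skipped, but does not need the data yet.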
     

29 Sep, 2011

1 commit


28 Jul, 2011

4 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors
    Btrfs: use the commit_root for reading free_space_inode crcs
    Btrfs: reduce extent_state lock contention for metadata
    Btrfs: remove lockdep magic from btrfs_next_leaf
    Btrfs: make a lockdep class for each root
    Btrfs: switch the btrfs tree locks to reader/writer
    Btrfs: fix deadlock when throttling transactions
    Btrfs: stop using highmem for extent_buffers
    Btrfs: fix BUG_ON() caused by ENOSPC when relocating space
    Btrfs: tag pages for writeback in sync
    Btrfs: fix enospc problems with delalloc
    Btrfs: don't flush delalloc arbitrarily
    Btrfs: use find_or_create_page instead of grab_cache_page
    Btrfs: use a worker thread to do caching
    Btrfs: fix how we merge extent states and deal with cached states
    Btrfs: use the normal checksumming infrastructure for free space cache
    Btrfs: serialize flushers in reserve_metadata_bytes
    Btrfs: do transaction space reservation before joining the transaction
    Btrfs: try to only do one btrfs_search_slot in do_setxattr

    Linus Torvalds
     
  • This patch was originally from Tejun Heo. lockdep complains about the btrfs
    locking because we sometimes take btree locks from two different trees at the
    same time. The current classes are based only on level in the btree, which
    isn't enough information for lockdep to figure out if the lock is safe.

    This patch makes a class for each type of tree, and lumps all the FS trees that
    actually have files and directories into the same class.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The extent_buffers have a very complex interface where
    we use HIGHMEM for metadata and try to cache a kmap mapping
    to access the memory.

    The next commit adds reader/writer locks, and concurrent use
    of this kmap cache would make it even more complex.

    This commit drops the ability to use HIGHMEM with extent buffers,
    and rips out all of the related code.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • A user reported a deadlock when copying a bunch of files. This is because they
    were low on memory and kthreadd got hung up trying to migrate pages for an
    allocation when starting the caching kthread. The page was locked by the task
    starting the caching kthread. To fix this we just need to use the async thread
    stuff so that the threads are already created and we don't have to worry about
    deadlocks. Thanks,

    Reported-by: Roman Mamedov
    Signed-off-by: Josef Bacik

    Josef Bacik
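
The lockdep commit in this group keys lock classes by tree type as well as by btree level. That idea can be sketched with a lookup table. The objectids, name stems and MODEL_MAX_LEVEL below are illustrative assumptions, not a copy of the btrfs table. The final catch-all entry stands for the single shared class that the commit says all file-and-directory trees are lumped into.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_LEVEL 8 /* hypothetical maximum btree height */

/* Hypothetical table: one set of lock classes per special tree. */
struct model_keyset {
	uint64_t id;           /* root objectid; 0 marks the catch-all entry */
	const char *name_stem; /* name the lock classes would be reported under */
};

static const struct model_keyset model_keysets[] = {
	{ 1, "root" },
	{ 2, "extent" },
	{ 3, "chunk" },
	{ 4, "dev" },
	{ 7, "csum" },
	{ 0, "tree" }, /* every tree that holds files and directories */
};

/* Find the keyset for a tree; unknown ids fall through to "tree". */
static const struct model_keyset *model_keyset_for(uint64_t objectid)
{
	size_t i = 0;
	while (model_keysets[i].id != 0 && model_keysets[i].id != objectid)
		i++;
	return &model_keysets[i];
}

/* A lock class is (tree type, level), not just the level. */
static int model_lock_class(uint64_t objectid, int level)
{
	assert(level >= 0 && level < MODEL_MAX_LEVEL);
	size_t set = (size_t)(model_keyset_for(objectid) - model_keysets);
	return (int)set * MODEL_MAX_LEVEL + level;
}
```

With level-only classes, taking level-0 locks in the extent tree and in a file tree at the same time would look to lockdep like recursive locking of a single class. In this model, those two locks get different classes.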
     

20 Jul, 2011

1 commit


18 Jun, 2011

3 commits

  • When allocation fails in btrfs_read_fs_root_no_name, ret is returned without
    having been set, so the caller gets a garbage value.

    Signed-off-by: David Sterba
    Reviewed-by: Li Zefan
    Signed-off-by: Chris Mason

    David Sterba
     
  • Removes code no longer used. The sysfs file itself is kept, because the
    btrfs developers expressed interest in putting new entries into sysfs.

    Signed-off-by: Maarten Lankhorst
    Signed-off-by: Chris Mason

    Maarten Lankhorst
     
  • The recent commit to get rid of our trans_mutex introduced
    some races with block group relocation. The problem is that relocation
    needs to do some record keeping about each root, and it was relying
    on the transaction mutex to coordinate things in subtle ways.

    This fix adds a mutex just for the relocation code and makes sure
    it doesn't have a big impact on normal operations. The race is
    really fixed in btrfs_record_root_in_trans, which is where we
    step back and wait for the relocation code to finish accounting
    setup.

    Signed-off-by: Chris Mason

    Chris Mason
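
The first commit in this group fixes a common C bug pattern: an error path jumps to a shared exit label and returns a local `ret` that nothing assigned. The function below is hypothetical, and it returns an int rather than whatever btrfs_read_fs_root_no_name actually returns. It only shows why ret must be set on the allocation-failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical root object. */
struct model_root {
	int refs;
};

/*
 * Every path that reaches `fail` must assign ret first; otherwise the
 * caller receives whatever happened to be on the stack.
 */
static int model_read_root(struct model_root **out, int simulate_oom)
{
	int ret;
	struct model_root *root;

	assert(out != NULL);
	root = simulate_oom ? NULL : malloc(sizeof(*root));
	if (!root) {
		ret = -ENOMEM; /* the fix: previously left unset on this path */
		goto fail;
	}
	root->refs = 1;
	*out = root;
	return 0;
fail:
	*out = NULL;
	return ret;
}
```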