18 Dec, 2009

26 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: make sure fallocate properly starts a transaction
    Btrfs: make metadata chunks smaller
    Btrfs: Show discard option in /proc/mounts
    Btrfs: deny sys_link across subvolumes.
    Btrfs: fail mount on bad mount options
    Btrfs: don't add extent 0 to the free space cache v2
    Btrfs: Fix per root used space accounting
    Btrfs: Fix btrfs_drop_extent_cache for skip pinned case
    Btrfs: Add delayed iput
    Btrfs: Pass transaction handle to security and ACL initialization functions
    Btrfs: Make truncate(2) more ENOSPC friendly
    Btrfs: Make fallocate(2) more ENOSPC friendly
    Btrfs: Avoid orphan inodes cleanup during committing transaction
    Btrfs: Avoid orphan inodes cleanup while replaying log
    Btrfs: Fix disk_i_size update corner case
    Btrfs: Rewrite btrfs_drop_extents
    Btrfs: Add btrfs_duplicate_item
    Btrfs: Avoid superfluous tree-log writeout

    Linus Torvalds
     
  • Introduce coredump parameter data structure (struct coredump_params) to
    simplify binfmt->core_dump() arguments.

    Signed-off-by: Masami Hiramatsu
    Suggested-by: Ingo Molnar
    Cc: Hidehiro Kawai
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
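
The idea of bundling the dump arguments into one structure can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the field names, the stand-in types and `example_core_dump()` are assumptions made for the example; only the name `struct coredump_params` comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; only pointers to them are used. */
struct pt_regs;
struct file;

/* Illustrative parameter bundle; the field names are assumptions. */
struct coredump_params {
    long signr;            /* signal that triggered the dump */
    struct pt_regs *regs;  /* register state at the time of the fault */
    struct file *file;     /* destination for the core image */
    unsigned long limit;   /* maximum core size in bytes */
};

/* A dump handler takes one pointer instead of several separate arguments. */
static int example_core_dump(const struct coredump_params *cprm)
{
    assert(cprm != NULL);
    /* Nothing to write when the size limit is zero. */
    if (cprm->limit == 0)
        return 0;
    /* A real handler would write headers and memory to cprm->file here. */
    return 1;
}
```

Adding a further parameter later then means adding a field, rather than changing the signature of every handler.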
     
  • Thanks to Roland who pointed out de_thread() issues.

    Currently we add sub-threads to ->real_parent->children list. This buys
    nothing but slows down do_wait().

    With this patch ->children contains only main threads (group leaders).
    The only complication is that forget_original_parent() should iterate over
    sub-threads by hand, and de_thread() needs another list_replace() when it
    changes ->group_leader.

    Henceforth do_wait_thread() can never see task_detached() && !EXIT_DEAD
    tasks, so we can remove this check (and we can unify do_wait_thread() and
    ptrace_do_wait()).

    This change can confuse the optimistic search in mm_update_next_owner(),
    but this is fixable and minor.

    Perhaps badness() and oom_kill_process() should be updated, but they
    should be fixed in any case.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Ingo Molnar
    Cc: Ratan Nalumasu
    Cc: Vitaly Mayatskikh
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Signed-off-by: Mike Frysinger
    Cc: David Howells
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • It can happen that a write does not use all the blocks allocated in
    write_begin, either because of some filesystem error (like ENOSPC) or
    because the page with the data to write has been removed from memory. We
    truncate these blocks so that we don't have dangling blocks beyond i_size.

    Cc: Jeff Mahoney
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • This reverts commit e4c570c4cb7a95dbfafa3d016d2739bf3fdfe319, as
    requested by Alexey:

    "I think I gave a good enough arguments to not merge it.
    To iterate:
    * patch makes impossible to start using ext3 on EXT3_FS=n kernels
    without reboot.
    * this is done only for one pointer on task_struct

    None of config options which define task_struct are tristate directly
    or effectively."

    Requested-by: Alexey Dobriyan
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • …btrfs-unstable into for-linus

    Chris Mason
     
  • This reverts commit e9496ff46a20a8592fdc7bdaaf41b45eb808d310. Quoth Al:

    "it's dependent on a lot of other stuff not currently in mainline
    and badly broken with current fs/namespace.c. Sorry, badly
    out-of-order cherry-pick from old queue.

    PS: there's a large pending series reworking the refcounting and
    lifetime rules for vfsmounts that will, among other things, allow to
    rip a subtree away _without_ dissolving connections in it, to be
    garbage-collected when all active references are gone. It's
    considerably saner wrt "is the subtree busy" logics, but it's nowhere
    near being ready for merge at the moment; this changeset is one of the
    things becoming possible with that sucker, but it certainly shouldn't
    have been picked during this cycle. My apologies..."

    Noticed-by: Eric Paris
    Requested-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The recent patch to make fallocate enospc friendly would send
    down a NULL trans handle to the allocator. This moves the
    transaction start to properly fix things.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Conflicts:
    fs/btrfs/acl.c

    Chris Mason
     
  • This patch makes us a bit less zealous about making sure we have enough free
    metadata space by paring down the size of new metadata chunks to 256MB instead
    of 1GB. Also, we used to try to allocate metadata chunks when allocating data,
    but that is done elsewhere now, so we can just remove it. With my
    -ENOSPC test I used to have 3GB reserved for metadata out of 75GB; now I have
    1.7GB.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Christoph's patch e244a0aeb6a599c19a7c802cda6e2d67c847b154 doesn't display
    the discard option in /proc/mounts, leading to some confusion for me.
    Here's the missing bit.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Chris Mason

    Matthew Wilcox
     
  • I rebased Christian Parpart's patch to deny hard links across
    subvolumes. The original patch also modifies btrfs_rename, but
    I excluded that part because we can now move files across subvolumes
    without problems.
    -----------------

    Hard links across subvolumes should not be allowed in Btrfs.
    btrfs_link checks that the root of the 'to' directory is the same as
    the root of the 'from' file. If not, btrfs_link returns -EPERM.

    Signed-off-by: TARUISI Hiroaki
    Signed-off-by: Chris Mason

    TARUISI Hiroaki
     
  • We shouldn't silently ignore unrecognized options.

    Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
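
A strict option parser of the kind this commit asks for might look like the sketch below. The option table and the function names are assumptions for illustration only; the real btrfs parser has its own option table and handling.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical option table for illustration only. */
static const char *const known_opts[] = { "discard", "ssd", "nodatacow" };

/* Return 1 if the token of length len matches a known option exactly. */
static int is_known(const char *tok, size_t len)
{
    for (size_t i = 0; i < sizeof(known_opts) / sizeof(known_opts[0]); i++)
        if (strlen(known_opts[i]) == len &&
            memcmp(known_opts[i], tok, len) == 0)
            return 1;
    return 0;
}

/* Walk a comma-separated option string; fail on the first unknown token. */
static int sketch_parse_options(const char *opts)
{
    assert(opts != NULL);
    while (*opts) {
        size_t len = strcspn(opts, ",");
        if (len > 0 && !is_known(opts, len))
            return -EINVAL;   /* do not silently ignore it */
        opts += len;
        if (*opts == ',')
            opts++;
    }
    return 0;
}
```

Returning an error instead of skipping the token means a mistyped option makes the mount fail visibly rather than proceed with unintended settings.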
     
  • If block group 0 is completely free, btrfs_read_block_groups will
    add extent [0, BTRFS_SUPER_INFO_OFFSET) to the free space cache.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
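
One way to keep the range in front of the superblock out of a free space cache is to trim each candidate range before adding it. The sketch below is only an illustration of that idea, not the actual patch: the helper and type names are hypothetical, and the offset is assumed here to be 64 KiB.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offset of the primary superblock (64 KiB) for this sketch. */
#define SKETCH_SUPER_INFO_OFFSET (64ULL * 1024)

struct sketch_range {
    uint64_t start;
    uint64_t len;
};

/* Trim a free range so it never covers [0, SKETCH_SUPER_INFO_OFFSET). */
static struct sketch_range sketch_trim_free_range(uint64_t start, uint64_t len)
{
    struct sketch_range r = { start, len };

    assert(start + len >= start);   /* reject overflowing ranges */
    if (start < SKETCH_SUPER_INFO_OFFSET) {
        uint64_t skip = SKETCH_SUPER_INFO_OFFSET - start;

        if (skip >= len) {
            /* The whole range lies in front of the superblock: nothing left. */
            r.start = start + len;
            r.len = 0;
        } else {
            r.start = SKETCH_SUPER_INFO_OFFSET;
            r.len = len - skip;
        }
    }
    return r;
}
```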
     
  • The bytes_used field in root item was originally planned to
    trace the amount of used data and tree blocks. But it never
    worked right since we can't trace freeing of data accurately.
    This patch changes it to only trace the amount of tree blocks.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • The check for the skip pinned case is wrong; it may break out of the
    while loop too soon.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • iput() can trigger new transactions if we are dropping the
    final reference, so calling it in btrfs_commit_transaction
    may end up in a deadlock. This patch adds delayed iput to avoid
    the issue.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
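
The deferral pattern behind a delayed iput can be sketched with a simple queue: instead of dropping a reference in a context where that is unsafe, the inode is queued and the drop happens later. All names below are hypothetical, there is no locking, and the sketch assumes an inode is queued at most once at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode with a bare reference count. */
struct sketch_inode {
    int refcount;
    struct sketch_inode *next_delayed;  /* link in the delayed-iput queue */
};

struct sketch_fs {
    struct sketch_inode *delayed_iputs; /* inodes whose put is pending */
};

/* Called where dropping the reference directly would be unsafe. */
static void sketch_add_delayed_iput(struct sketch_fs *fs,
                                    struct sketch_inode *inode)
{
    assert(inode->refcount > 0);
    inode->next_delayed = fs->delayed_iputs;
    fs->delayed_iputs = inode;
}

/* Called later from a safe context; returns how many puts were performed. */
static int sketch_run_delayed_iputs(struct sketch_fs *fs)
{
    int dropped = 0;

    while (fs->delayed_iputs) {
        struct sketch_inode *inode = fs->delayed_iputs;

        fs->delayed_iputs = inode->next_delayed;
        inode->next_delayed = NULL;
        inode->refcount--;   /* stands in for the real iput() */
        dropped++;
    }
    return dropped;
}
```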
     
  • Pass transaction handle down to security and ACL initialization
    functions, so we can avoid starting nested transactions

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • Truncating and deleting regular files are unbound operations,
    so it's not good to do them in a single transaction. This
    patch makes btrfs_truncate and btrfs_delete_inode start a
    new transaction after all items in a tree leaf are deleted.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
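
Splitting an unbound operation into bounded steps, one transaction per leaf, can be sketched as a loop. The counters below are hypothetical stand-ins for the tree and the transaction API; the sketch only shows the control flow.

```c
#include <assert.h>

/* Hypothetical counters standing in for a tree and a transaction API. */
struct sketch_state {
    int items_left;     /* items still to delete */
    int transactions;   /* how many transactions were started */
};

/* Delete at most one leaf's worth of items inside a single transaction. */
static void sketch_delete_one_leaf(struct sketch_state *s, int items_per_leaf)
{
    int n = s->items_left < items_per_leaf ? s->items_left : items_per_leaf;

    s->transactions++;      /* start a transaction */
    s->items_left -= n;     /* delete the items in this leaf */
    /* end of transaction: space reserved for it can be released here */
}

/* Truncate by repeating the bounded step until nothing remains. */
static void sketch_truncate(struct sketch_state *s, int items_per_leaf)
{
    assert(items_per_leaf > 0);
    while (s->items_left > 0)
        sketch_delete_one_leaf(s, items_per_leaf);
}
```

Because each step touches a bounded number of items, the space that must be reserved per transaction is bounded as well, which is what makes the operation friendlier to ENOSPC handling.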
     
  • fallocate(2) may allocate a large number of file extents, so it's not
    good to do it in a single transaction. This patch makes fallocate(2)
    start a new transaction for each file extent it allocates.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • btrfs_lookup_dentry may trigger orphan cleanup, so it's not good
    to call it while committing a transaction.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • We do log replay in a single transaction, so it's not good to do unbound
    operations. This patch performs orphan inode cleanup after replaying
    the log. It also avoids doing other unbound operations, such as truncating
    a file, while replaying the log. These unbound operations are postponed to
    the orphan inode cleanup stage.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • There are some cases where file extents are inserted without involving
    an ordered struct. In these cases, we update disk_i_size directly,
    without checking for pending ordered extents and the DELALLOC bit. This
    patch extends btrfs_ordered_update_i_size() to handle these cases.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • After I_SYNC was split from I_LOCK, the leftover is always used together with
    I_NEW and is thus superfluous.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • We recently got rid of all callers of do_sync_file_range as they're better
    served with vfs_fsync or filemap_write_and_wait. Now that
    do_sync_file_range is down to a single caller, fold it into that caller so
    that people don't start using it again accidentally. While at it, also switch
    it from using __filemap_fdatawrite_range(..., WB_SYNC_ALL) to the clearer
    filemap_fdatawrite_range().

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

17 Dec, 2009

14 commits

  • Copy the inode size and blocks from one inode to another correctly on 32-bit
    systems with CONFIG_SMP, CONFIG_PREEMPT, or CONFIG_LBDAF. Use proper inode
    spinlocks only when i_size/i_blocks cannot fit in one 32-bit word.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Erez Zadok
    Signed-off-by: Al Viro

    Erez Zadok
     
  • This get_nlinks parameter was never used by the only mainline user,
    ecryptfs; and it has never been used by unionfs or wrapfs either.

    Acked-by: Dustin Kirkland
    Acked-by: Tyler Hicks
    Signed-off-by: Erez Zadok
    Signed-off-by: Al Viro

    Erez Zadok
     
  • We can't get to this point unless it's a valid pointer.

    Signed-off-by: Jeff Layton
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Filesystems outside the regular namespace do not have to clear DCACHE_UNHASHED
    in order to have a working /proc/$pid/fd/XXX. Nothing in proc prevents the
    fd link from being used if its dentry is not in the hash.

    Also, it does not get put into the dcache hash if DCACHE_UNHASHED is clear;
    that depends on the filesystem calling d_add or d_rehash.

    So delete the misleading comments and needless code.

    Acked-by: Miklos Szeredi
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • Add a d_dname method for the anon_inodes filesystem, the same way the pipefs
    and sockfs pseudo filesystems do. This allows us to remove the DCACHE_UNHASHED
    hack from anon_inodes.c (see next patch).

    [AV: inumber is useless here, dropped from anon_inodefs_dname()]

    Signed-off-by: Nick Piggin
    Cc: Miklos Szeredi
    Cc: Davide Libenzi
    Cc: "David S. Miller"
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Nick Piggin
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    XFS: Free buffer pages array unconditionally
    xfs: kill xfs_bmbt_rec_32/64 types
    xfs: improve metadata I/O merging in the elevator
    xfs: check for not fully initialized inodes in xfs_ireclaim

    Linus Torvalds
     
  • Commit 3d1e4631 ("get rid of init_file()") removed the export of
    alloc_file() -- possibly inadvertently, since that commit mainly
    consisted of deleting the lines between the end of alloc_file() and
    the start of the code in init_file().

    There is in fact one modular use of alloc_file() in the tree, in
    drivers/infiniband/core/uverbs_main.c, so re-add the export to fix:

    ERROR: "alloc_file" [drivers/infiniband/core/ib_uverbs.ko] undefined!

    when CONFIG_INFINIBAND_USER_ACCESS=m.

    Cc: Al Viro
    Signed-off-by: Roland Dreier
    Signed-off-by: Linus Torvalds

    Roland Dreier
     
  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (34 commits)
    HWPOISON: Remove stray phrase in a comment
    HWPOISON: Try to allocate migration page on the same node
    HWPOISON: Don't do early filtering if filter is disabled
    HWPOISON: Add a madvise() injector for soft page offlining
    HWPOISON: Add soft page offline support
    HWPOISON: Undefine short-hand macros after use to avoid namespace conflict
    HWPOISON: Use new shake_page in memory_failure
    HWPOISON: Use correct name for MADV_HWPOISON in documentation
    HWPOISON: mention HWPoison in Kconfig entry
    HWPOISON: Use get_user_page_fast in hwpoison madvise
    HWPOISON: add an interface to switch off/on all the page filters
    HWPOISON: add memory cgroup filter
    memcg: add accessor to mem_cgroup.css
    memcg: rename and export try_get_mem_cgroup_from_page()
    HWPOISON: add page flags filter
    mm: export stable page flags
    HWPOISON: limit hwpoison injector to known page types
    HWPOISON: add fs/device filters
    HWPOISON: return 0 to indicate success reliably
    HWPOISON: make semantics of IGNORED/DELAYED clear
    ...

    Linus Torvalds
     
  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
    direct I/O fallback sync simplification
    ocfs: stop using do_sync_mapping_range
    cleanup blockdev_direct_IO locking
    make generic_acl slightly more generic
    sanitize xattr handler prototypes
    libfs: move EXPORT_SYMBOL for d_alloc_name
    vfs: force reval of target when following LAST_BIND symlinks (try #7)
    ima: limit imbalance msg
    Untangling ima mess, part 3: kill dead code in ima
    Untangling ima mess, part 2: deal with counters
    Untangling ima mess, part 1: alloc_file()
    O_TRUNC open shouldn't fail after file truncation
    ima: call ima_inode_free ima_inode_free
    IMA: clean up the IMA counts updating code
    ima: only insert at inode creation time
    ima: valid return code from ima_inode_alloc
    fs: move get_empty_filp() deffinition to internal.h
    Sanitize exec_permission_lite()
    Kill cached_lookup() and real_lookup()
    Kill path_lookup_open()
    ...

    Trivial conflicts in fs/direct-io.c

    Linus Torvalds
     
  • The code in xfs_free_buf() only attempts to free the b_pages array if the
    buffer is a page cache backed or page allocated buffer. The extra log buffer
    that is used when the log wraps uses pages that are allocated to a different
    log buffer, but it still has a b_pages array allocated when those pages
    are associated with the extra buffer in xfs_buf_associate_memory.

    Hence we need to always attempt to free the b_pages array when tearing
    down a buffer, not just on buffers that are explicitly marked as page bearing
    buffers. This fixes a leak detected by the kernel memory leak code.

    Signed-off-by: Dave Chinner
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • For a long time we've always stored bmap btree records in the 64bit format,
    so kill off the dead 32bit type, and make sure the 64bit type is named just
    xfs_bmbt_rec everywhere, without any size postfix.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Eric Sandeen
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Change all async metadata buffers to use [READ|WRITE]_META I/O types
    so that the I/O doesn't get issued immediately. This allows merging of
    adjacent metadata requests but still prioritises them over bulk data.
    This shows a 10-15% improvement in sequential create speed of small
    files.

    Don't include the log buffers in this classification - leave them as
    sync types so they are issued immediately.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Add an assert for inodes not added to the inode cache in xfs_ireclaim,
    to make sure we're not going to introduce something like the
    famous nfsd inode cache bug again.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig