05 Apr, 2014

2 commits

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: doucple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystm() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     
  • Pull renameat2 system call from Miklos Szeredi:
    "This adds a new syscall, renameat2(), which is the same as renameat()
    but with a flags argument.

    The purpose of extending rename is to add cross-rename, a symmetric
    variant of rename, which exchanges the two files. This allows
    interesting things, which were not possible before, for example
    atomically replacing a directory tree with a symlink, etc... This
    also allows overlayfs and friends to operate on whiteouts atomically.

    Andy Lutomirski also suggested a "noreplace" flag, which disables the
    overwriting behavior of rename.

    These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
    implemented for ext4 as an example and for testing"

    * 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ext4: add cross rename support
    ext4: rename: split out helper functions
    ext4: rename: move EMLINK check up
    ext4: rename: create ext4_renament structure for local vars
    vfs: add cross-rename
    vfs: lock_two_nondirectories: allow directory args
    security: add flags to rename hooks
    vfs: add RENAME_NOREPLACE flag
    vfs: add renameat2 syscall
    vfs: rename: use common code for dir and non-dir
    vfs: rename: move d_move() up
    vfs: add d_is_dir()

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

01 Apr, 2014

1 commit


25 Mar, 2014

1 commit

  • Use cmpxchg() to atomically set i_flags instead of clearing out the
    S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
    EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
    where an immutable file has the immutable flag cleared for a brief
    window of time.

    Reported-by: John Sullivan
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
     

09 Nov, 2013

5 commits


11 Sep, 2013

5 commits

  • Now that the shrinker is passing a node in the scan control structure, we
    can pass this to the the generic LRU list code to isolate reclaim to the
    lists on matching nodes.

    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Chinner
     
  • When removing an element from the lru, this will be done today after the lock
    is released. This is a clear mistake, although we are not sure if the bugs we
    are seeing are related to this. All list manipulations are done inside the
    lock, and so should this one.

    Signed-off-by: Glauber Costa
    Tested-by: Michal Hocko
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     
  • [glommer@openvz.org: adapted for new LRU return codes]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     
  • Convert superblock shrinker to use the new count/scan API, and propagate
    the API changes through to the filesystem callouts. The filesystem
    callouts already use a count/scan API, so it's just changing counters to
    longs to match the VM API.

    This requires the dentry and inode shrinker callouts to be converted to
    the count/scan API. This is mainly a mechanical change.

    [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     
  • This series reworks our current object cache shrinking infrastructure in
    two main ways:

    * Noticing that a lot of users copy and paste their own version of LRU
    lists for objects, we put some effort in providing a generic version.
    It is modeled after the filesystem users: dentries, inodes, and xfs
    (for various tasks), but we expect that other users could benefit in
    the near future with little or no modification. Let us know if you
    have any issues.

    * The underlying list_lru being proposed automatically and
    transparently keeps the elements in per-node lists, and is able to
    manipulate the node lists individually. Given this infrastructure, we
    are able to modify the up-to-now hammer called shrink_slab to proceed
    with node-reclaim instead of always searching memory from all over like
    it has been doing.

    Per-node lru lists are also expected to lead to less contention in the lru
    locks on multi-node scans, since we are now no longer fighting for a
    global lock. The locks usually disappear from the profilers with this
    change.

    Although we have no official benchmarks for this version - be our guest to
    independently evaluate this - earlier versions of this series were
    performance tested (details at
    http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
    visible performance regressions while yielding a better qualitative
    behavior in NUMA machines.

    With this infrastructure in place, we can use the list_lru entry point to
    provide memcg isolation and per-memcg targeted reclaim. Historically,
    those two pieces of work have been posted together. This version presents
    only the infrastructure work, deferring the memcg work for a later time,
    so we can focus on getting this part tested. You can see more about the
    history of such work at http://lwn.net/Articles/552769/

    Dave Chinner (18):
    dcache: convert dentry_stat.nr_unused to per-cpu counters
    dentry: move to per-sb LRU locks
    dcache: remove dentries from LRU before putting on dispose list
    mm: new shrinker API
    shrinker: convert superblock shrinkers to new API
    list: add a new LRU list type
    inode: convert inode lru list to generic lru list code.
    dcache: convert to use new lru list infrastructure
    list_lru: per-node list infrastructure
    shrinker: add node awareness
    fs: convert inode and dentry shrinking to be node aware
    xfs: convert buftarg LRU to generic code
    xfs: rework buffer dispose list tracking
    xfs: convert dquot cache lru to list_lru
    fs: convert fs shrinkers to new scan/count API
    drivers: convert shrinkers to new count/scan API
    shrinker: convert remaining shrinkers to count/scan API
    shrinker: Kill old ->shrink API.

    Glauber Costa (7):
    fs: bump inode and dentry counters to long
    super: fix calculation of shrinkable objects for small numbers
    list_lru: per-node API
    vmscan: per-node deferred work
    i915: bail out earlier when shrinker cannot acquire mutex
    hugepage: convert huge zero page shrinker to new shrinker API
    list_lru: dynamically adjust node arrays

    This patch:

    There are situations in very large machines in which we can have a large
    quantity of dirty inodes, unused dentries, etc. This is particularly true
    when umounting a filesystem, where eventually since every live object will
    eventually be discarded.

    Dave Chinner reported a problem with this while experimenting with the
    shrinker revamp patchset. So we believe it is time for a change. This
    patch just moves int to longs. Machines where it matters should have a
    big long anyway.

    Signed-off-by: Glauber Costa
    Cc: Dave Chinner
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: Dave Chinner
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     

04 Sep, 2013

1 commit


29 Jun, 2013

1 commit


02 May, 2013

1 commit

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

14 Apr, 2013

1 commit

  • Revert commit 62a3ddef6181 ("vfs: fix spinning prevention in prune_icache_sb").

    This commit doesn't look right: since we are looking at the tail of the
    list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
    it back at the head of the list instead of the tail, otherwise we will
    keep spinning on it.

    Discovered when investigating why prune_icache_sb came top in perf
    reports of a swapping load.

    Signed-off-by: Suleiman Souhlal
    Signed-off-by: Hugh Dickins
    Cc: stable@vger.kernel.org # v3.2+
    Signed-off-by: Linus Torvalds

    Suleiman Souhlal
     

10 Apr, 2013

1 commit


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

23 Feb, 2013

1 commit


12 Dec, 2012

1 commit

  • Overhaul struct address_space.assoc_mapping renaming it to
    address_space.private_data and its type is redefined to void*. By this
    approach we consistently name the .private_* elements from struct
    address_space as well as allow extended usage for address_space
    association with other data structures through ->private_data.

    Also, all users of old ->assoc_mapping element are converted to reflect
    its new name and type change (->private_data).

    Signed-off-by: Rafael Aquini
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Andi Kleen
    Cc: Konrad Rzeszutek Wilk
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

27 Nov, 2012

1 commit

  • Commit 169ebd90131b ("writeback: Avoid iput() from flusher thread")
    removed iget-iput pair from inode writeback. As a side effect, inodes
    that are dirty during iput_final() call won't be ever added to inode LRU
    (iput_final() doesn't add dirty inodes to LRU and later when the inode
    is cleaned there's noone to add the inode there). Thus inodes are
    effectively unreclaimable until someone looks them up again.

    The practical effect of this bug is limited by the fact that inodes are
    pinned by a dentry for long enough that the inode gets cleaned. But
    still the bug can have nasty consequences leading up to OOM conditions
    under certain circumstances. Following can easily reproduce the
    problem:

    for (( i = 0; i < 1000; i++ )); do
    mkdir $i
    for (( j = 0; j < 1000; j++ )); do
    touch $i/$j
    echo 2 > /proc/sys/vm/drop_caches
    done
    done

    then one needs to run 'sync; ls -lR' to make inodes reclaimable again.

    We fix the issue by inserting unused clean inodes into the LRU after
    writeback finishes in inode_sync_complete().

    Signed-off-by: Jan Kara
    Reported-by: OGAWA Hirofumi
    Cc: Al Viro
    Cc: OGAWA Hirofumi
    Cc: Wu Fengguang
    Cc: Dave Chinner
    Cc: [3.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

09 Oct, 2012

1 commit

  • Implement an interval tree as a replacement for the VMA prio_tree. The
    algorithms are similar to lib/interval_tree.c; however that code can't be
    directly reused as the interval endpoints are not explicitly stored in the
    VMA. So instead, the common algorithm is moved into a template and the
    details (node type, how to get interval endpoints from the node, etc) are
    filled in using the C preprocessor.

    Once the interval tree functions are available, using them as a
    replacement to the VMA prio tree is a relatively simple, mechanical job.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

02 Aug, 2012

1 commit

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     

31 Jul, 2012

2 commits

  • It is unexpected to block reading of frozen filesystem because of atime update.
    Also handling blocking on frozen filesystem because of atime update would make
    locking more complex than it already is. So just skip atime update when
    filesystem is frozen like we skip it when filesystem is remounted read-only.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Most of places where we want freeze protection coincides with the places where
    we also have remount-ro protection. So make mnt_want_write() and
    mnt_drop_write() (and their _file alternative) prevent freezing as well.
    For the few cases that are really interested only in remount-ro protection
    provide new function variants.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

27 Jul, 2012

1 commit

  • Pull large btrfs update from Chris Mason:
    "This pull request is very large, and the two main features in here
    have been under testing/devel for quite a while.

    We have subvolume quotas from the strato developers. This enables
    full tracking of how many blocks are allocated to each subvolume (and
    all snapshots) and you can set limits on a per-subvolume basis. You
    can also create quota groups and toss multiple subvolumes into a big
    group. It's everything you need to be a web hosting company and give
    each user their own subvolume.

    The userland side of the quotas is being refreshed, they'll send out
    details on where to grab it soon.

    Next is the kernel side of btrfs send/receive from Alexander Block.
    This leverages the same infrastructure as the quota code to figure out
    relationships between blocks and their owners. It can then compute
    the difference between two snapshots and sends the diffs in a neutral
    format into userland.

    The basic model:

    create a snapshot
    send that snapshot as the initial backup
    make changes
    create a second snapshot
    send the incremental as a backup
    delete the first snapshot
    (use the second snapshot for the next incremental)

    The receive portion is all in userland, and in the 'next' branch of my
    btrfs-progs repo.

    There's still some work to do in terms of optimizing the send side
    from kernel to userland. The really important part is figuring out
    how two snapshots are different, and this is where we are
    concentrating right now. The initial send of a dataset is a little
    slower than tar, but the incremental sends are dramatically faster
    than what rsync can do.

    On top of all of that, we have a nice queue of fixes, cleanups and
    optimizations."

    Fix up trivial modify/del conflict in fs/btrfs/ioctl.c

    Also fix up semantic conflict in fs/btrfs/send.c: the interface to
    dentry_open() changed in commit 765927b2d508 ("switch dentry_open() to
    struct path, make it grab references itself"), and since it now grabs
    whatever references it needs, we should no longer do the mntget() on the
    mnt (and we need to dput() the dentry reference we took).

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (65 commits)
    Btrfs: uninit variable fixes in send/receive
    Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive
    Btrfs: add btrfs_compare_trees function
    Btrfs: introduce subvol uuids and times
    Btrfs: make iref_to_path non static
    Btrfs: add a barrier before a waitqueue_active check
    Btrfs: call the ordered free operation without any locks held
    Btrfs: Check INCOMPAT flags on remount and add helper function
    Btrfs: add helper for tree enumeration
    btrfs: allow cross-subvolume file clone
    Btrfs: improve multi-thread buffer read
    Btrfs: make btrfs's allocation smoothly with preallocation
    Btrfs: lock the transition from dirty to writeback for an eb
    Btrfs: fix potential race in extent buffer freeing
    Btrfs: don't return true in releasepage unless we actually freed the eb
    Btrfs: suppress printk() if all device I/O stats are zero
    Btrfs: remove unwanted printk() for btrfs device I/O stats
    Btrfs: rewrite BTRFS_SETGET_FUNCS
    Btrfs: zero unused bytes in inode item
    Btrfs: kill free_space pointer from inode structure
    ...

    Conflicts:
    fs/btrfs/ioctl.c

    Linus Torvalds
     

24 Jul, 2012

1 commit

  • Before the update_time inode operation was indroduced, it was
    not possible to prevent updates of atime on RO subvolumes. VFS
    was only able to check for RO on the mount, but did not know
    anything about btrfs subvolumes.

    btrfs_update_time does now check if the root is RO and skip
    updating of times.

    Signed-off-by: Alexander Block

    Alexander Block
     

14 Jul, 2012

1 commit


02 Jun, 2012

2 commits

  • Pull vfs changes from Al Viro.
    "A lot of misc stuff. The obvious groups:
    * Miklos' atomic_open series; kills the damn abuse of
    ->d_revalidate() by NFS, which was the major stumbling block for
    all work in that area.
    * ripping security_file_mmap() and dealing with deadlocks in the
    area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
    general.
    * ->encode_fh() switched to saner API; insane fake dentry in
    mm/cleancache.c gone.
    * assorted annotations in fs (endianness, __user)
    * parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
    * ->update_time() work from Josef.
    * other bits and pieces all over the place.

    Normally it would've been in two or three pull requests, but
    signal.git stuff had eaten a lot of time during this cycle ;-/"

    Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
    'truncate_range' inode method was removed by the VM changes, the VFS
    update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
    to sparse fix added twice, with other changes nearby).

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
    nfs: don't open in ->d_revalidate
    vfs: retry last component if opening stale dentry
    vfs: nameidata_to_filp(): don't throw away file on error
    vfs: nameidata_to_filp(): inline __dentry_open()
    vfs: do_dentry_open(): don't put filp
    vfs: split __dentry_open()
    vfs: do_last() common post lookup
    vfs: do_last(): add audit_inode before open
    vfs: do_last(): only return EISDIR for O_CREAT
    vfs: do_last(): check LOOKUP_DIRECTORY
    vfs: do_last(): make ENOENT exit RCU safe
    vfs: make follow_link check RCU safe
    vfs: do_last(): use inode variable
    vfs: do_last(): inline walk_component()
    vfs: do_last(): make exit RCU safe
    vfs: split do_lookup()
    Btrfs: move over to use ->update_time
    fs: introduce inode operation ->update_time
    reiserfs: get rid of resierfs_sync_super
    reiserfs: mark the superblock as dirty a bit later
    ...

    Linus Torvalds
     
  • Btrfs has to make sure we have space to allocate new blocks in order to modify
    the inode, so updating time can fail. We've gotten around this by having our
    own file_update_time but this is kind of a pain, and Christoph has indicated he
    would like to make xfs do something different with atime updates. So introduce
    ->update_time, where we will deal with i_version an a/m/c time updates and
    indicate which changes need to be made. The normal version just does what it
    has always done, updates the time and marks the inode dirty, and then
    filesystems can choose to do something different.

    I've gone through all of the users of file_update_time and made them check for
    errors with the exception of the fault code since it's complicated and I wasn't
    quite sure what to do there, also Jan is going to be pushing the file time
    updates into page_mkwrite for those who have it so that should satisfy btrfs and
    make it not a big deal to check the file_update_time() return code in the
    generic fault path. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

01 Jun, 2012

1 commit


31 May, 2012

1 commit


30 May, 2012

1 commit

  • Fix kernel-doc warnings in fs/inode.c:

    Warning(fs/inode.c:1493): No description found for parameter 'path'
    Warning(fs/inode.c:1493): Excess function parameter 'mnt' description in 'touch_atime'
    Warning(fs/inode.c:1493): Excess function parameter 'dentry' description in 'touch_atime'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Al Viro

    Randy Dunlap
     

29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

25 May, 2012

1 commit

  • Pull more networking updates from David Miller:
    "Ok, everything from here on out will be bug fixes."

    1) One final sync of wireless and bluetooth stuff from John Linville.
    These changes have all been in his tree for more than a week, and
    therefore have had the necessary -next exposure. John was just away
    on a trip and didn't have a change to send the pull request until a
    day or two ago.

    2) Put back some defines in user exposed header file areas that were
    removed during the tokenring purge. From Stephen Hemminger and Paul
    Gortmaker.

    3) A bug fix for UDP hash table allocation got lost in the pile due to
    one of those "you got it.. no I've got it.." situations. :-)

    From Tim Bird.

    4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll
    try to coalesce overlapping frags and crash. Fix from Eric Dumazet.

    5) RCU routing table lookups can race with free_fib_info(), causing
    crashes when we deref the device pointers in the route. Fix by
    releasing the net device in the RCU callback. From Yanmin Zhang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits)
    tcp: take care of overlaps in tcp_try_coalesce()
    ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
    mm: add a low limit to alloc_large_system_hash
    ipx: restore token ring define to include/linux/ipx.h
    if: restore token ring ARP type to header
    xen: do not disable netfront in dom0
    phy/micrel: Fix ID of KSZ9021
    mISDN: Add X-Tensions USB ISDN TA XC-525
    gianfar:don't add FCB length to hard_header_len
    Bluetooth: Report proper error number in disconnection
    Bluetooth: Create flags for bt_sk()
    Bluetooth: report the right security level in getsockopt
    Bluetooth: Lock the L2CAP channel when sending
    Bluetooth: Restore locking semantics when looking up L2CAP channels
    Bluetooth: Fix a redundant and problematic incoming MTU check
    Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
    Bluetooth: Fix EIR data generation for mgmt_device_found
    Bluetooth: Fix Inquiry with RSSI event mask
    Bluetooth: improve readability of l2cap_seq_list code
    Bluetooth: Fix skb length calculation
    ...

    Linus Torvalds
     

24 May, 2012

1 commit

  • UDP stack needs a minimum hash size value for proper operation and also
    uses alloc_large_system_hash() for proper NUMA distribution of its hash
    tables and automatic sizing depending on available system memory.

    On some low memory situations, udp_table_init() must ignore the
    alloc_large_system_hash() result and reallocs a bigger memory area.

    As we cannot easily free old hash table, we leak it and kmemleak can
    issue a warning.

    This patch adds a low limit parameter to alloc_large_system_hash() to
    solve this problem.

    We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
    allocation.

    Reported-by: Mark Asselstine
    Reported-by: Tim Bird
    Signed-off-by: Eric Dumazet
    Cc: Paul Gortmaker
    Signed-off-by: David S. Miller

    Tim Bird
     

06 May, 2012

1 commit

  • Doing iput() from flusher thread (writeback_sb_inodes()) can create problems
    because iput() can do a lot of work - for example truncate the inode if it's
    the last iput on unlinked file. Some filesystems depend on flusher thread
    progressing (e.g. because they need to flush delay allocated blocks to reduce
    allocation uncertainty) and so flusher thread doing truncate creates
    interesting dependencies and possibilities for deadlocks.

    We get rid of iput() in flusher thread by using the fact that I_SYNC inode
    flag effectively pins the inode in memory. So if we take care to either hold
    i_lock or have I_SYNC set, we can get away without taking inode reference
    in writeback_sb_inodes().

    As a side effect of these changes, we also fix possible use-after-free in
    wb_writeback() because inode_wait_for_writeback() call could try to reacquire
    i_lock on the inode that was already free.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara