11 Oct, 2016

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     

08 Oct, 2016

1 commit


28 Sep, 2016

1 commit


19 Sep, 2016

1 commit


16 Sep, 2016

1 commit

  • On overlayfs relatime_need_update() needs inode times to be correct on
    overlay inode. But i_mtime and i_ctime are updated by filesystem code on
    underlying inode only, so they will be out-of-date on the overlay inode.

    This patch copies the times from the underlying inode if needed. This
    can't be done if called from RCU lookup (link following) but link m/ctime
    are not updated by fs, so this is all right.

    This patch doesn't change functionality for anything but overlayfs.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

07 Aug, 2016

1 commit

  • Pull binfmt_misc update from James Bottomley:
    "This update is to allow architecture emulation containers to function
    such that the emulation binary can be housed outside the container
    itself. The container and fs parts both have acks from relevant
    experts.

    To use the new feature you have to add an F option to your binfmt_misc
    configuration"

    From the docs:
    "The usual behaviour of binfmt_misc is to spawn the binary lazily when
    the misc format file is invoked. However, this doesn't work very well
    in the face of mount namespaces and changeroots, so the F mode opens
    the binary as soon as the emulation is installed and uses the opened
    image to spawn the emulator, meaning it is always available once
    installed, regardless of how the environment changes"

    * tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
    binfmt_misc: add F option description to documentation
    binfmt_misc: add persistent opened binary handler for containers
    fs: add filp_clone_open API

    Linus Torvalds
     

04 Aug, 2016

1 commit


03 Aug, 2016

1 commit


28 Jul, 2016

1 commit

  • Pull xfs updates from Dave Chinner:
    "The major addition is the new iomap based block mapping
    infrastructure. We've been kicking this about locally for years, but
    there are other filesystems want to use it too (e.g. gfs2). Now it
    is fully working, reviewed and ready for merge and be used by other
    filesystems.

    There are a lot of other fixes and cleanups in the tree, but those are
    XFS internal things and none are of the scale or visibility of the
    iomap changes. See below for details.

    I am likely to send another pull request next week - we're just about
    ready to merge some new functionality (on disk block->owner reverse
    mapping infrastructure), but that's a huge chunk of code (74 files
    changed, 7283 insertions(+), 1114 deletions(-)) so I'm keeping that
    separate to all the "normal" pull request changes so they don't get
    lost in the noise.

    Summary of changes in this update:
    - generic iomap based IO path infrastructure
    - generic iomap based fiemap implementation
    - xfs iomap based Io path implementation
    - buffer error handling fixes
    - tracking of in flight buffer IO for unmount serialisation
    - direct IO and DAX io path separation and simplification
    - shortform directory format definition changes for wider platform
    compatibility
    - various buffer cache fixes
    - cleanups in preparation for rmap merge
    - error injection cleanups and fixes
    - log item format buffer memory allocation restructuring to prevent
    rare OOM reclaim deadlocks
    - sparse inode chunks are now fully supported"

    * tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (53 commits)
    xfs: remove EXPERIMENTAL tag from sparse inode feature
    xfs: bufferhead chains are invalid after end_page_writeback
    xfs: allocate log vector buffers outside CIL context lock
    libxfs: directory node splitting does not have an extra block
    xfs: remove dax code from object file when disabled
    xfs: skip dirty pages in ->releasepage()
    xfs: remove __arch_pack
    xfs: kill xfs_dir2_inou_t
    xfs: kill xfs_dir2_sf_off_t
    xfs: split direct I/O and DAX path
    xfs: direct calls in the direct I/O path
    xfs: stop using generic_file_read_iter for direct I/O
    xfs: split xfs_file_read_iter into buffered and direct I/O helpers
    xfs: remove s_maxbytes enforcement in xfs_file_read_iter
    xfs: kill ioflags
    xfs: don't pass ioflags around in the ioctl path
    xfs: track and serialize in-flight async buffers against unmount
    xfs: exclude never-released buffers from buftarg I/O accounting
    xfs: don't reset b_retries to 0 on every failure
    xfs: remove extraneous buffer flag changes
    ...

    Linus Torvalds
     

21 Jun, 2016

1 commit

  • Add infrastructure for multipage buffered writes. This is implemented
    using an main iterator that applies an actor function to a range that
    can be written.

    This infrastucture is used to implement a buffered write helper, one
    to zero file ranges and one to implement the ->page_mkwrite VM
    operations. All of them borrow a fair amount of code from fs/buffers.
    for now by using an internal version of __block_write_begin that
    gets passed an iomap and builds the corresponding buffer head.

    The file system is gets a set of paired ->iomap_begin and ->iomap_end
    calls which allow it to map/reserve a range and get a notification
    once the write code is finished with it.

    Based on earlier code from Dave Chinner.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bob Peterson
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

10 Jun, 2016

1 commit

  • d_walk() relies upon the tree not getting rearranged under it without
    rename_lock being touched. And we do grab rename_lock around the
    places that change the tree topology. Unfortunately, branch reordering
    is just as bad from d_walk() POV and we have two places that do it
    without touching rename_lock - one in handling of cursors (for ramfs-style
    directories) and another in autofs. autofs one is a separate story; this
    commit deals with the cursors.
    * mark cursor dentries explicitly at allocation time
    * make __dentry_kill() leave ->d_child.next pointing to the next
    non-cursor sibling, making sure that it won't be moved around unnoticed
    before the parent is relocked on ascend-to-parent path in d_walk().
    * make d_walk() skip cursors explicitly; strictly speaking it's
    not necessary (all callbacks we pass to d_walk() are no-ops on cursors),
    but it makes analysis easier.

    Signed-off-by: Al Viro

    Al Viro
     

31 Mar, 2016

1 commit

  • I need an API that allows me to obtain a clone of the current file
    pointer to pass in to an exec handler. I've labelled this as an
    internal API because I can't see how it would be useful outside of the
    fs subsystem. The use case will be a persistent binfmt_misc handler.

    Signed-off-by: James Bottomley
    Acked-by: Serge Hallyn
    Acked-by: Jan Kara

    James Bottomley
     

09 Jan, 2016

2 commits


04 Jan, 2016

1 commit


18 Aug, 2015

2 commits

  • There's a small consistency problem between the inode and writeback
    naming. Writeback calls the "for IO" inode queues b_io and
    b_more_io, but the inode calls these the "writeback list" or
    i_wb_list. This makes it hard to an new "under writeback" list to
    the inode, or call it an "under IO" list on the bdi because either
    way we'll have writeback on IO and IO on writeback and it'll just be
    confusing. I'm getting confused just writing this!

    So, rename the inode "for IO" list variable to i_io_list so we can
    add a new "writeback list" in a subsequent patch.

    Signed-off-by: Dave Chinner
    Signed-off-by: Josef Bacik
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Tested-by: Dave Chinner

    Dave Chinner
     
  • The process of reducing contention on per-superblock inode lists
    starts with moving the locking to match the per-superblock inode
    list. This takes the global lock out of the picture and reduces the
    contention problems to within a single filesystem. This doesn't get
    rid of contention as the locks still have global CPU scope, but it
    does isolate operations on different superblocks form each other.

    Signed-off-by: Dave Chinner
    Signed-off-by: Josef Bacik
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Tested-by: Dave Chinner

    Dave Chinner
     

19 Jun, 2015

1 commit

  • Make file->f_path always point to the overlay dentry so that the path in
    /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
    overlay as well as the underlay (path-based LSMs probably don't need it).

    Using my union testsuite to set things up, before the patch I see:

    [root@andromeda union-testsuite]# bash 5 /a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...

    After the patch:

    [root@andromeda union-testsuite]# bash 5 /mnt/a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...

    Note the change in where /proc/$$/fd/5 points to in the ls command. It was
    pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
    (which is correct).

    The inode accessed, however, is the lower layer. The union layer is on device
    25h/37d and the upper layer on 24h/36d.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

23 Feb, 2015

1 commit

  • I've noticed significant locking contention in memory reclaimer around
    sb_lock inside grab_super_passive(). Grab_super_passive() is called from
    two places: in icache/dcache shrinkers (function super_cache_scan) and
    from writeback (function __writeback_inodes_wb). Both are required for
    progress in memory allocator.

    Grab_super_passive() acquires sb_lock to increment sb->s_count and check
    sb->s_instances. It seems sb->s_umount locked for read is enough here:
    super-block deactivation always runs under sb->s_umount locked for write.
    Protecting super-block itself isn't a problem: in super_cache_scan() sb
    is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
    are still active. Inside writeback super-block comes from inode from bdi
    writeback list under wb->list_lock.

    This patch removes locking sb_lock and checks s_instances under s_umount:
    generic_shutdown_super() unlinks it under sb->s_umount locked for write.
    New variant is called trylock_super() and since it only locks semaphore,
    callers must call up_read(&sb->s_umount) instead of drop_super(sb) when
    they're done.

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Al Viro

    Konstantin Khlebnikov
     

18 Feb, 2015

1 commit

  • Pull misc VFS updates from Al Viro:
    "This cycle a lot of stuff sits on topical branches, so I'll be sending
    more or less one pull request per branch.

    This is the first pile; more to follow in a few. In this one are
    several misc commits from early in the cycle (before I went for
    separate branches), plus the rework of mntput/dput ordering on umount,
    switching to use of fs_pin instead of convoluted games in
    namespace_unlock()"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch the IO-triggering parts of umount to fs_pin
    new fs_pin killing logics
    allow attaching fs_pin to a group not associated with some superblock
    get rid of the second argument of acct_kill()
    take count and rcu_head out of fs_pin
    dcache: let the dentry count go down to zero without taking d_lock
    pull bumping refcount into ->kill()
    kill pin_put()
    mode_t whack-a-mole: chelsio
    file->f_path.dentry is pinned down for as long as the file is open...
    get rid of lustre_dump_dentry()
    gut proc_register() a bit
    kill d_validate()
    ncpfs: get rid of d_validate() nonsense
    selinuxfs: don't open-code d_genocide()

    Linus Torvalds
     

13 Feb, 2015

1 commit

  • Kmem accounting of memcg is unusable now, because it lacks slab shrinker
    support. That means when we hit the limit we will get ENOMEM w/o any
    chance to recover. What we should do then is to call shrink_slab, which
    would reclaim old inode/dentry caches from this cgroup. This is what
    this patch set is intended to do.

    Basically, it does two things. First, it introduces the notion of
    per-memcg slab shrinker. A shrinker that wants to reclaim objects per
    cgroup should mark itself as SHRINKER_MEMCG_AWARE. Then it will be
    passed the memory cgroup to scan from in shrink_control->memcg. For
    such shrinkers shrink_slab iterates over the whole cgroup subtree under
    the target cgroup and calls the shrinker for each kmem-active memory
    cgroup.

    Secondly, this patch set makes the list_lru structure per-memcg. It's
    done transparently to list_lru users - everything they have to do is to
    tell list_lru_init that they want memcg-aware list_lru. Then the
    list_lru will automatically distribute objects among per-memcg lists
    basing on which cgroup the object is accounted to. This way to make FS
    shrinkers (icache, dcache) memcg-aware we only need to make them use
    memcg-aware list_lru, and this is what this patch set does.

    As before, this patch set only enables per-memcg kmem reclaim when the
    pressure goes from memory.limit, not from memory.kmem.limit. Handling
    memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
    it is still unclear whether we will have this knob in the unified
    hierarchy.

    This patch (of 9):

    NUMA aware slab shrinkers use the list_lru structure to distribute
    objects coming from different NUMA nodes to different lists. Whenever
    such a shrinker needs to count or scan objects from a particular node,
    it issues commands like this:

    count = list_lru_count_node(lru, sc->nid);
    freed = list_lru_walk_node(lru, sc->nid, isolate_func,
    isolate_arg, &sc->nr_to_scan);

    where sc is an instance of the shrink_control structure passed to it
    from vmscan.

    To simplify this, let's add special list_lru functions to be used by
    shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
    consolidate the nid and nr_to_scan arguments in the shrink_control
    structure.

    This will also allow us to avoid patching shrinkers that use list_lru
    when we make shrink_slab() per-memcg - all we will have to do is extend
    the shrink_control structure to include the target memcg and make
    list_lru_shrink_{count,walk} handle this appropriately.

    Signed-off-by: Vladimir Davydov
    Suggested-by: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

26 Jan, 2015

1 commit


11 Dec, 2014

1 commit

  • New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
    It's not mountable (not even registered, so it's not in /proc/filesystems,
    etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

    This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
    get_proc_ns() is a macro now (it's simply returning ->i_private; would
    have been an inline, if not for header ordering headache).
    proc_ns_inode() is an ex-parrot. The interface used in procfs is
    ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

    Dentries and inodes are never hashed; a non-counting reference to dentry
    is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
    if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
    of that mechanism.

    As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
    it does nd_jump_link() on a consistent pair it gets
    from ns_get_path().

    Signed-off-by: Al Viro

    Al Viro
     

24 Oct, 2014

2 commits


13 Oct, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "The big thing in this pile is Eric's unmount-on-rmdir series; we
    finally have everything we need for that. The final piece of prereqs
    is delayed mntput() - now filesystem shutdown always happens on
    shallow stack.

    Other than that, we have several new primitives for iov_iter (Matt
    Wilcox, culled from his XIP-related series) pushing the conversion to
    ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
    cleanups and fixes (including the external name refcounting, which
    gives consistent behaviour of d_move() wrt procfs symlinks for long
    and short names alike) and assorted cleanups and fixes all over the
    place.

    This is just the first pile; there's a lot of stuff from various
    people that ought to go in this window. Starting with
    unionmount/overlayfs mess... ;-/"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
    fs/file_table.c: Update alloc_file() comment
    vfs: Deduplicate code shared by xattr system calls operating on paths
    reiserfs: remove pointless forward declaration of struct nameidata
    don't need that forward declaration of struct nameidata in dcache.h anymore
    take dname_external() into fs/dcache.c
    let path_init() failures treated the same way as subsequent link_path_walk()
    fix misuses of f_count() in ppp and netlink
    ncpfs: use list_for_each_entry() for d_subdirs walk
    vfs: move getname() from callers to do_mount()
    gfs2_atomic_open(): skip lookups on hashed dentry
    [infiniband] remove pointless assignments
    gadgetfs: saner API for gadgetfs_create_file()
    f_fs: saner API for ffs_sb_create_file()
    jfs: don't hash direct inode
    [s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
    ecryptfs: ->f_op is never NULL
    android: ->f_op is never NULL
    nouveau: __iomem misannotations
    missing annotation in fs/file.c
    fs: namespace: suppress 'may be used uninitialized' warnings
    ...

    Linus Torvalds
     

10 Oct, 2014

1 commit

  • Add guard_bio_eod() check for mpage code in order to allow us to do IO
    even on the odd last sectors of a device, even if the block size is some
    multiple of the physical sector size.

    Using mpage_readpages() for block device requires this guard check.

    Signed-off-by: Akinobu Mita
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

09 Oct, 2014

1 commit

  • The gcc version 4.9.1 compiler complains Even though it isn't possible for
    these variables to not get initialized before they are used.

    fs/namespace.c: In function ‘SyS_mount’:
    fs/namespace.c:2720:8: warning: ‘kernel_dev’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
    ^
    fs/namespace.c:2699:8: note: ‘kernel_dev’ was declared here
    char *kernel_dev;
    ^
    fs/namespace.c:2720:8: warning: ‘kernel_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
    ^
    fs/namespace.c:2697:8: note: ‘kernel_type’ was declared here
    char *kernel_type;
    ^

    Fix the warnings by simplifying copy_mount_string() as suggested by Al Viro.

    Cc: Alexander Viro
    Signed-off-by: Tim Gardner
    Signed-off-by: Al Viro

    Tim Gardner
     

08 Aug, 2014

2 commits


09 Nov, 2013

1 commit


25 Oct, 2013

1 commit


11 Sep, 2013

3 commits

  • Now that the shrinker is passing a node in the scan control structure, we
    can pass this to the the generic LRU list code to isolate reclaim to the
    lists on matching nodes.

    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Chinner
     
  • Convert superblock shrinker to use the new count/scan API, and propagate
    the API changes through to the filesystem callouts. The filesystem
    callouts already use a count/scan API, so it's just changing counters to
    longs to match the VM API.

    This requires the dentry and inode shrinker callouts to be converted to
    the count/scan API. This is mainly a mechanical change.

    [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     
  • This series reworks our current object cache shrinking infrastructure in
    two main ways:

    * Noticing that a lot of users copy and paste their own version of LRU
    lists for objects, we put some effort in providing a generic version.
    It is modeled after the filesystem users: dentries, inodes, and xfs
    (for various tasks), but we expect that other users could benefit in
    the near future with little or no modification. Let us know if you
    have any issues.

    * The underlying list_lru being proposed automatically and
    transparently keeps the elements in per-node lists, and is able to
    manipulate the node lists individually. Given this infrastructure, we
    are able to modify the up-to-now hammer called shrink_slab to proceed
    with node-reclaim instead of always searching memory from all over like
    it has been doing.

    Per-node lru lists are also expected to lead to less contention in the lru
    locks on multi-node scans, since we are now no longer fighting for a
    global lock. The locks usually disappear from the profilers with this
    change.

    Although we have no official benchmarks for this version - be our guest to
    independently evaluate this - earlier versions of this series were
    performance tested (details at
    http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
    visible performance regressions while yielding a better qualitative
    behavior in NUMA machines.

    With this infrastructure in place, we can use the list_lru entry point to
    provide memcg isolation and per-memcg targeted reclaim. Historically,
    those two pieces of work have been posted together. This version presents
    only the infrastructure work, deferring the memcg work for a later time,
    so we can focus on getting this part tested. You can see more about the
    history of such work at http://lwn.net/Articles/552769/

    Dave Chinner (18):
    dcache: convert dentry_stat.nr_unused to per-cpu counters
    dentry: move to per-sb LRU locks
    dcache: remove dentries from LRU before putting on dispose list
    mm: new shrinker API
    shrinker: convert superblock shrinkers to new API
    list: add a new LRU list type
    inode: convert inode lru list to generic lru list code.
    dcache: convert to use new lru list infrastructure
    list_lru: per-node list infrastructure
    shrinker: add node awareness
    fs: convert inode and dentry shrinking to be node aware
    xfs: convert buftarg LRU to generic code
    xfs: rework buffer dispose list tracking
    xfs: convert dquot cache lru to list_lru
    fs: convert fs shrinkers to new scan/count API
    drivers: convert shrinkers to new count/scan API
    shrinker: convert remaining shrinkers to count/scan API
    shrinker: Kill old ->shrink API.

    Glauber Costa (7):
    fs: bump inode and dentry counters to long
    super: fix calculation of shrinkable objects for small numbers
    list_lru: per-node API
    vmscan: per-node deferred work
    i915: bail out earlier when shrinker cannot acquire mutex
    hugepage: convert huge zero page shrinker to new shrinker API
    list_lru: dynamically adjust node arrays

    This patch:

    There are situations in very large machines in which we can have a large
    quantity of dirty inodes, unused dentries, etc. This is particularly true
    when umounting a filesystem, where eventually since every live object will
    eventually be discarded.

    Dave Chinner reported a problem with this while experimenting with the
    shrinker revamp patchset. So we believe it is time for a change. This
    patch just moves int to longs. Machines where it matters should have a
    big long anyway.

    Signed-off-by: Glauber Costa
    Cc: Dave Chinner
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: Dave Chinner
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     

09 Sep, 2013

1 commit


06 Sep, 2013

1 commit

  • We check submounts before doing d_drop() on a non-empty directory dentry in
    NFS (have_submounts()), but we do not exclude a racing mount. Nor do we
    prevent mounts to be added to the disconnected subtree using relative paths
    after the d_drop().

    This patch fixes these issues by checking for unlinked (unhashed, non-root)
    ancestors before proceeding with the mount. This is done with rename
    seqlock taken for write and with ->d_lock grabbed on each ancestor in turn,
    including our dentry itself. This ensures that the only one of
    check_submounts_and_drop() or has_unlinked_ancestor() can succeed.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     

29 Jun, 2013

2 commits


20 Jun, 2013

1 commit