17 Apr, 2014

1 commit

  • Commit 9e30cc9595303b27b48 removed an internal mount. This
    has the side-effect that rootfs now has FSID 0. Many
    userspace utilities assume that st_dev in struct stat
    is never 0, so this change breaks a number of tools in
    early userspace.

    Since we don't know how many userspace programs are affected,
    make sure that FSID is at least 1.

    References: http://article.gmane.org/gmane.linux.kernel/1666905
    References: http://permalink.gmane.org/gmane.linux.utilities.util-linux-ng/8557
    Cc: 3.14
    Signed-off-by: Thomas Bächler
    Acked-by: Tejun Heo
    Acked-by: H. Peter Anvin
    Tested-by: Alexandre Demers
    Signed-off-by: Greg Kroah-Hartman

    Thomas Bächler
     

13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o mount /dev/xxx" operation when the
    file system is already mounted read-write causes an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    documented or guaraunteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
     

01 Feb, 2014

1 commit

  • Move sync_filesystem() after sb_prepare_remount_readonly(). If writers
    sneak in anywhere from sync_filesystem() to sb_prepare_remount_readonly()
    it can cause inodes to be dirtied and writeback to occur well after
    sys_mount() has completely successfully.

    This was spotted by corrupted ubifs filesystems on reboot, but appears
    that it can cause issues with any filesystem using writeback.

    Cc: Artem Bityutskiy
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    CC: Richard Weinberger
    Co-authored-by: Richard Weinberger
    Signed-off-by: Andrew Ruder
    Signed-off-by: Al Viro

    Andrew Ruder
     

22 Jan, 2014

1 commit

  • On fail path alloc_super() calls destroy_super(), which issues a warning
    if the sb's s_mounts list is not empty, in particular if it has not been
    initialized. That said s_mounts must be initialized in alloc_super()
    before any possible failure, but currently it is initialized close to
    the end of the function leading to a useless warning dumped to log if
    either percpu_counter_init() or list_lru_init() fails. Let's fix this.

    Signed-off-by: Vladimir Davydov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

09 Nov, 2013

1 commit


25 Oct, 2013

2 commits


02 Oct, 2013

1 commit

  • Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
    the right place is destroy_super(). As it is, we leak them if sget()
    decides that new superblock it has allocated (and never shown to
    anybody) isn't needed and should be freed.

    Signed-off-by: Al Viro

    Al Viro
     

11 Sep, 2013

8 commits

  • This patch adds the missing call to list_lru_destroy (spotted by Li Zhong)
    and moves the deletion to after the shrinker is unregistered, as correctly
    spotted by Dave

    Signed-off-by: Glauber Costa
    Cc: Michal Hocko
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     
  • We currently use a compile-time constant to size the node array for the
    list_lru structure. Due to this, we don't need to allocate any memory at
    initialization time. But as a consequence, the structures that contain
    embedded list_lru lists can become way too big (the superblock for
    instance contains two of them).

    This patch aims at ameliorating this situation by dynamically allocating
    the node arrays with the firmware provided nr_node_ids.

    Signed-off-by: Glauber Costa
    Cc: Dave Chinner
    Cc: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     
  • Now that the shrinker is passing a node in the scan control structure, we
    can pass this to the the generic LRU list code to isolate reclaim to the
    lists on matching nodes.

    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Chinner
     
  • [glommer@openvz.org: don't reintroduce double decrement of nr_unused_dentries, adapted for new LRU return codes]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Chinner
     
  • [glommer@openvz.org: adapted for new LRU return codes]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     
  • Convert superblock shrinker to use the new count/scan API, and propagate
    the API changes through to the filesystem callouts. The filesystem
    callouts already use a count/scan API, so it's just changing counters to
    longs to match the VM API.

    This requires the dentry and inode shrinker callouts to be converted to
    the count/scan API. This is mainly a mechanical change.

    [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     
  • With the dentry LRUs being per-sb structures, there is no real need for
    a global dentry_lru_lock. The locking can be made more fine-grained by
    moving to a per-sb LRU lock, isolating the LRU operations of different
    filesytsems completely from each other. The need for this is independent
    of any performance consideration that may arise: in the interest of
    abstracting the lru operations away, it is mandatory that each lru works
    around its own lock instead of a global lock for all of them.

    [glommer@openvz.org: updated changelog ]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Reviewed-by: Christoph Hellwig
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     
  • The sysctl knob sysctl_vfs_cache_pressure is used to determine which
    percentage of the shrinkable objects in our cache we should actively try
    to shrink.

    It works great in situations in which we have many objects (at least more
    than 100), because the aproximation errors will be negligible. But if
    this is not the case, specially when total_objects < 100, we may end up
    concluding that we have no objects at all (total / 100 = 0, if total <
    100).

    This is certainly not the biggest killer in the world, but may matter in
    very low kernel memory situations.

    Signed-off-by: Glauber Costa
    Reviewed-by: Carlos Maiolino
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Mel Gorman
    Cc: Dave Chinner
    Cc: Al Viro
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     

08 Sep, 2013

1 commit


04 Sep, 2013

1 commit

  • Add support to the core direct-io code to defer AIO completions to user
    context using a workqueue. This replaces opencoded and less efficient
    code in XFS and ext4 (we save a memory allocation for each direct IO)
    and will be needed to properly support O_(D)SYNC for AIO.

    The communication between the filesystem and the direct I/O code requires
    a new buffer head flag, which is a bit ugly but not avoidable until the
    direct I/O code stops abusing the buffer_head structure for communicating
    with the filesystems.

    Currently this creates a per-superblock unbound workqueue for these
    completions, which is taken from an earlier patch by Jan Kara. I'm
    not really convinced about this use and would prefer a "normal" global
    workqueue with a high concurrency limit, but this needs further discussion.

    JK: Fixed ext4 part, dynamic allocation of the workqueue.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Christoph Hellwig
     

20 Jul, 2013

1 commit

  • Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
    to fail. The superblock is on ->fs_supers, ->s_umount is held exclusive,
    ->s_active is 1. Along comes two more processes, trying to mount the same
    thing; sget() in each is picking that superblock, bumping ->s_count and
    trying to grab ->s_umount. ->s_active is 3 now. Original mount(2)
    finally gets to deactivate_locked_super() on failure; ->s_active is 2,
    superblock is still ->fs_supers because shutdown will *not* happen until
    ->s_active hits 0. ->s_umount is dropped and now we have two processes
    chasing each other:
    s_active = 2, A acquired ->s_umount, B blocked
    A sees that the damn thing is stillborn, does deactivate_locked_super()
    s_active = 1, A drops ->s_umount, B gets it
    A restarts the search and finds the same superblock. And bumps it ->s_active.
    s_active = 2, B holds ->s_umount, A blocked on trying to get it
    ... and we are in the earlier situation with A and B switched places.

    The root cause, of course, is that ->s_active should not grow until we'd
    got MS_BORN. Then failing ->mount() will have deactivate_locked_super()
    shut the damn thing down. Fortunately, it's easy to do - the key point
    is that grab_super() is called only for superblocks currently on ->fs_supers,
    so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
    bump ->s_active; we must never increment ->s_count for superblocks past
    ->kill_sb(), but grab_super() is never called for those.

    The bug is pretty old; we would've caught it by now, if not for accidental
    exclusion between sget() for block filesystems; the things like cgroup or
    e.g. mtd-based filesystems don't have anything of that sort, so they get
    bitten. The right way to deal with that is obviously to fix sget()...

    Signed-off-by: Al Viro

    Al Viro
     

28 Feb, 2013

2 commits

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • MAX_IDR_MASK is another weirdness in the idr interface. As idr covers
    whole positive integer range, it's defined as 0x7fffffff or INT_MAX.

    Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
    They basically mask off the sign bit and operate on the rest, so if
    the caller, by accident, passes in a negative number, the sign bit
    will be masked off and the remaining part will be used as if that was
    the input, which is worse than crashing.

    The constant is visible in idr.h and there are several users in the
    kernel.

    * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()

    Basically used to test if adap->nr is a negative number which isn't
    -1 and returns -EINVAL if so. idr_alloc() already has negative
    @start checking (w/ WARN_ON_ONCE), so this can go away.

    * drivers/infiniband/core/cm.c:cm_alloc_id()
    drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()

    Used to wrap cyclic @start. Can be replaced with max(next, 0).
    Note that this type of cyclic allocation using idr is buggy. These
    are prone to spurious -ENOSPC failure after the first wraparound.

    * fs/super.c:get_anon_bdev()

    The ID allocated from ida is masked off before being tested whether
    it's inside valid range. ida allocated ID can never be a negative
    number and the masking is unnecessary.

    Update idr_*() functions to fail with -EINVAL when negative @id is
    specified and update other MAX_IDR_MASK users as described above.

    This leaves MAX_IDR_MASK without any user, remove it and relocate
    other MAX_IDR_* constants to lib/idr.c.

    Signed-off-by: Tejun Heo
    Cc: Jean Delvare
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hal Rosenstock
    Cc: "Marciniszyn, Mike"
    Cc: Jack Morgenstein
    Cc: Or Gerlitz
    Cc: Al Viro
    Acked-by: Wolfram Sang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

10 Oct, 2012

1 commit


06 Oct, 2012

1 commit

  • To avoid name conflicts:

    drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

    While at it, also make the other names more consistent and add
    parentheses.

    [akpm@linux-foundation.org: repair fallout]
    [sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
    Signed-off-by: Fengguang Wu
    Cc: Bernd Petrovitsch
    Cc: walter harms
    Cc: Glauber Costa
    Signed-off-by: Stephen Rothwell
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

03 Oct, 2012

1 commit

  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of a last process in IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     

04 Aug, 2012

1 commit

  • Finally we can kill the 'sync_supers' kernel thread along with the
    '->write_super()' superblock operation because all the users are gone.
    Now every file-system is supposed to self-manage own superblock and
    its dirty state.

    The nice thing about killing this thread is that it improves power management.
    Indeed, 'sync_supers' is a source of monotonic system wake-ups - it woke up
    every 5 seconds no matter what - even if there were no dirty superblocks and
    even if there were no file-systems using this service (e.g., btrfs and
    journalled ext4 do not need it). So it was wasting power most of the time. And
    because the thread was in the core of the kernel, all systems had to have it.
    So I am quite happy to make it go away.

    Interestingly, this thread is a left-over from the pdflush kernel thread which
    was a self-forking kernel thread responsible for all the write-back in old
    Linux kernels. It was turned into per-block device BDI threads, and
    'sync_supers' was a left-over. Thus, R.I.P, pdflush as well.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     

02 Aug, 2012

1 commit

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     

01 Aug, 2012

1 commit

  • 09f363c7 ("vmscan: fix shrinker callback bug in fs/super.c") fixed a
    shrinker callback which was returning -1 when nr_to_scan is zero, which
    caused excessive slab scanning. But 635697c6 ("vmscan: fix initial
    shrinker size handling") fixed the problem, again so we can freely return
    -1 although nr_to_scan is zero. So let's revert 09f363c7 because the
    comment added in 09f363c7 made an unnecessary rule.

    Signed-off-by: Minchan Kim
    Cc: Al Viro
    Cc: Mikulas Patocka
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

31 Jul, 2012

3 commits

  • Now that all users are converted, we can remove functions, variables, and
    constants defined by the old freezing mechanism.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • vfs_check_frozen() tests are racy since the filesystem can be frozen just after
    the test is performed. Thus in write paths we can end up marking some pages or
    inodes dirty even though the file system is already frozen. This creates
    problems with flusher thread hanging on frozen filesystem.

    Another problem is that exclusion between ->page_mkwrite() and filesystem
    freezing has been handled by setting page dirty and then verifying s_frozen.
    This guaranteed that either the freezing code sees the faulted page, writes it,
    and writeprotects it again or we see s_frozen set and bail out of page fault.
    This works to protect from page being marked writeable while filesystem
    freezing is running but has an unpleasant artefact of leaving dirty (although
    unmodified and writeprotected) pages on frozen filesystem resulting in similar
    problems with flusher thread as the first problem.

    This patch aims at providing exclusion between write paths and filesystem
    freezing. We implement a writer-freeze read-write semaphore in the superblock.
    Actually, there are three such semaphores because of lock ranking reasons - one
    for page fault handlers (->page_mkwrite), one for all other writers, and one of
    internal filesystem purposes (used e.g. to track running transactions). Write
    paths which should block freezing (e.g. directory operations, ->aio_write(),
    ->page_mkwrite) hold reader side of the semaphore. Code freezing the filesystem
    takes the writer side.

    Only that we don't really want to bounce cachelines of the semaphores between
    CPUs for each write happening. So we implement the reader side of the semaphore
    as a per-cpu counter and the writer side is implemented using s_writers.frozen
    superblock field.

    [AV: microoptimize sb_start_write(); we want it fast in normal case]

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Pull writeback updates from Wu Fengguang:
    "Use time based periods to age the writeback proportions, which can
    adapt equally well to fast/slow devices."

    Fix up trivial conflict in comment in fs/sync.c

    * tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Fix some comment errors
    block: Convert BDI proportion calculations to flexible proportions
    lib: Fix possible deadlock in flexible proportion code
    lib: Proportions with flexible period

    Linus Torvalds
     

14 Jul, 2012

1 commit

  • Pass mount flags to sget() so that it can use them in initialising a new
    superblock before the set function is called. They could also be passed to the
    compare function.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

09 Jun, 2012

1 commit


25 Mar, 2012

1 commit

  • Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
    "Fix up files in fs/ and lib/ dirs to only use module.h if they really
    need it.

    These are trivial in scope vs the work done previously. We now have
    things where any few remaining cleanups can be farmed out to arch or
    subsystem maintainers, and I have done so when possible. What is
    remaining here represents the bits that don't clearly lie within a
    single arch/subsystem boundary, like the fs dir and the lib dir.

    Some duplicate includes arising from overlapping fixes from
    independent subsystem maintainer submissions are also quashed."

    Fix up trivial conflicts due to clashes with other include file cleanups
    (including some due to the previous bug.h cleanup pull).

    * tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    lib: reduce the use of module.h wherever possible
    fs: reduce the use of module.h wherever possible
    includecheck: delete any duplicate instances of module.h

    Linus Torvalds
     

23 Mar, 2012

1 commit

  • Pull cleancache changes from Konrad Rzeszutek Wilk:
    "This has some patches for the cleancache API that should have been
    submitted a _long_ time ago. They are basically cleanups:

    - rename of flush to invalidate

    - moving reporting of statistics into debugfs

    - use __read_mostly as necessary.

    Oh, and also the MAINTAINERS file change. The files (except the
    MAINTAINERS file) have been in #linux-next for months now. The late
    addition of MAINTAINERS file is a brain-fart on my side - didn't
    realize I needed that just until I was typing this up - and I based
    that patch on v3.3 - so the tree is on top of v3.3."

    * tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
    MAINTAINERS: Adding cleancache API to the list.
    mm: cleancache: Use __read_mostly as appropiate.
    mm: cleancache: report statistics via debugfs instead of sysfs.
    mm: zcache/tmem/cleancache: s/flush/invalidate/
    mm: cleancache: s/flush/invalidate/

    Linus Torvalds
     

22 Mar, 2012

1 commit

  • Pull security subsystem updates for 3.4 from James Morris:
    "The main addition here is the new Yama security module from Kees Cook,
    which was discussed at the Linux Security Summit last year. Its
    purpose is to collect miscellaneous DAC security enhancements in one
    place. This also marks a departure in policy for LSM modules, which
    were previously limited to being standalone access control systems.
    Chromium OS is using Yama, and I believe there are plans for Ubuntu,
    at least.

    This patchset also includes maintenance updates for AppArmor, TOMOYO
    and others."

    Fix trivial conflict in due to the jumo_label->static_key
    rename.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
    AppArmor: Fix location of const qualifier on generated string tables
    TOMOYO: Return error if fails to delete a domain
    AppArmor: add const qualifiers to string arrays
    AppArmor: Add ability to load extended policy
    TOMOYO: Return appropriate value to poll().
    AppArmor: Move path failure information into aa_get_name and rename
    AppArmor: Update dfa matching routines.
    AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
    AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
    AppArmor: Add const qualifiers to generated string tables
    AppArmor: Fix oops in policy unpack auditing
    AppArmor: Fix error returned when a path lookup is disconnected
    KEYS: testing wrong bit for KEY_FLAG_REVOKED
    TOMOYO: Fix mount flags checking order.
    security: fix ima kconfig warning
    AppArmor: Fix the error case for chroot relative path name lookup
    AppArmor: fix mapping of META_READ to audit and quiet flags
    AppArmor: Fix underflow in xindex calculation
    AppArmor: Fix dropping of allowed operations that are force audited
    AppArmor: Add mising end of structure test to caps unpacking
    ...

    Linus Torvalds
     

20 Mar, 2012

1 commit


29 Feb, 2012

1 commit


14 Feb, 2012

2 commits


24 Jan, 2012

1 commit

  • Per akpm suggestions alter the use of the term flush to be
    invalidate. The next patch will do this across all MM.

    This change is completely cosmetic.

    [v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]

    Signed-off-by: Dan Magenheimer
    Cc: Kamezawa Hiroyuki
    Cc: Jan Beulich
    Reviewed-by: Seth Jennings
    Cc: Jeremy Fitzhardinge
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Nitin Gupta
    Cc: Matthew Wilcox
    Cc: Chris Mason
    Cc: Rik Riel
    Cc: Andrew Morton
    [v10: Fixed fs: move code out of buffer.c conflict change]
    Signed-off-by: Konrad Rzeszutek Wilk

    Dan Magenheimer