22 Sep, 2015

1 commit

  • commit 4b75de8615050c1b0dd8d7794838c42f74ed36ba upstream.

    Before the make_empty_dir_inode calls were introduced into proc, sysfs,
    and sysctl, stat() on those directories reported an i_size of 0.
    make_empty_dir_inode started reporting an i_size of 2, and at least one
    userspace application depended on stat() returning an i_size of 0. So
    modify make_empty_dir_inode to cause an i_size of 0 to be reported for
    these directories.

    Reported-by: Tejun Heo
    Acked-by: Tejun Heo
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

22 Jul, 2015

1 commit

  • commit fbabfd0f4ee2e8847bf56edf481249ad1bb8c44d upstream.

    To ensure it is safe to mount proc and sysfs, I need to check whether
    filesystems that are mounted on top of them are mounted on truly empty
    directories. Given that some directories can gain entries over time,
    knowing that a directory is empty right now is insufficient.

    Therefore add supporting infrastructure for permanently empty
    directories, which proc and sysfs can use when they create mount points
    for filesystems and which fs_fully_visible can use to test for
    permanently empty directories, to ensure that nothing will be gained by
    mounting a fresh copy of proc or sysfs.
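    A rough userspace model of the idea, with hypothetical names throughout
    (`struct model_inode`, `model_make_empty_dir()`, `model_is_empty_dir()`)
    and no claim to match the kernel code: permanence is established by which
    operations table a directory uses, not by what it happens to contain right
    now. In this model, lookups always fail with -ENOENT and creation always
    fails with -EPERM, so one pointer comparison shows the directory can never
    gain entries.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_inode;

/* Hypothetical directory operations table. */
struct model_dir_ops {
    /* Returns 0 if `name` exists, -ENOENT otherwise. */
    int (*lookup)(struct model_inode *dir, const char *name);
    /* Returns 0 on success or a negative errno. */
    int (*create)(struct model_inode *dir, const char *name);
};

struct model_inode {
    const struct model_dir_ops *ops;
    long long size;   /* what stat() would report as st_size */
    unsigned nlink;
};

/* Nothing can ever be found in a permanently empty directory... */
static int empty_lookup(struct model_inode *dir, const char *name)
{
    (void)dir;
    (void)name;
    return -ENOENT;
}

/* ...and nothing can ever be added to it. */
static int empty_create(struct model_inode *dir, const char *name)
{
    (void)dir;
    (void)name;
    return -EPERM;
}

static const struct model_dir_ops empty_dir_ops = {
    .lookup = empty_lookup,
    .create = empty_create,
};

/* Turn an inode into a permanently empty directory. */
static void model_make_empty_dir(struct model_inode *inode)
{
    inode->ops = &empty_dir_ops;
    inode->nlink = 2;   /* its own "." plus the entry in its parent */
    inode->size = 0;    /* report an st_size of 0 */
}

/* Permanence is proven by the identity of the ops table, not by
 * inspecting the current (possibly transient) contents. */
static bool model_is_empty_dir(const struct model_inode *inode)
{
    return inode->ops == &empty_dir_ops;
}
```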

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

16 Apr, 2015

1 commit


23 Feb, 2015

1 commit

  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode->i_mode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode->i_mode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode->i_mode) to d_is_dir(dentry). This is actually
    more complicated than it appears, as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (i.e. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.
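    The d_is_dir()/d_can_lookup() split can be pictured with a small
    userspace model in which each dentry carries its own cached type, so
    callers never need to dereference d_inode. The enum and function names
    are hypothetical stand-ins, not the kernel's definitions; they only mirror
    the behaviour described above, where both a real directory and an
    automount-only directory count as directories but only the former
    supports lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the dentry type classes described above. */
enum model_dentry_type {
    MODEL_MISS,        /* negative dentry */
    MODEL_WHITEOUT,    /* whiteout (e.g. a 0,0 chardev in a union) */
    MODEL_DIRECTORY,   /* real directory with a ->lookup op */
    MODEL_AUTODIR,     /* "fake" directory with only ->d_automount */
    MODEL_REGULAR,
    MODEL_SPECIAL,
    MODEL_SYMLINK,
};

struct model_dentry {
    enum model_dentry_type type;
    void *inode;   /* may be NULL even when type != MODEL_MISS in a union */
};

/* True only for directories that names can actually be looked up in. */
static bool model_d_can_lookup(const struct model_dentry *d)
{
    return d->type == MODEL_DIRECTORY;
}

/* True for anything that looks like a directory, automount points included. */
static bool model_d_is_dir(const struct model_dentry *d)
{
    return d->type == MODEL_DIRECTORY || d->type == MODEL_AUTODIR;
}

static bool model_d_is_reg(const struct model_dentry *d)
{
    return d->type == MODEL_REGULAR;
}

static bool model_d_is_symlink(const struct model_dentry *d)
{
    return d->type == MODEL_SYMLINK;
}
```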

    The following perl+coccinelle script was used:

    use strict;

    my @callers;
    my $fd;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
        die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
        print "No matches\n";
        exit(0);
    }

    my @cocci = (
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISLNK(E->d_inode->i_mode)',
        '+ d_is_symlink(E)',
        '',
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISDIR(E->d_inode->i_mode)',
        '+ d_is_dir(E)',
        '',
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISREG(E->d_inode->i_mode)',
        '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
        chomp $file;
        print "Processing ", $file, "\n";
        system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
            die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

05 Feb, 2015

1 commit

  • Add a new mount option which enables a new "lazytime" mode. This mode
    causes atime, mtime, and ctime updates to only be made to the
    in-memory version of the inode. The on-disk times will only get
    updated (a) if the inode needs to be updated for some non-time-related
    change, (b) if userspace calls fsync(), syncfs() or sync(), or
    (c) just before an undeleted inode is evicted from memory.

    This is OK according to POSIX because there are no guarantees after a
    crash unless userspace explicitly requests them via an fsync(2) call.

    For workloads which feature a large number of random writes to a
    preallocated file, the lazytime mount option significantly reduces
    writes to the inode table. Otherwise, the repeated 4k writes to a single
    block put undesirable stress on flash devices and SMR disk drives. Even
    on conventional HDDs, the repeated writes to the inode table block will
    trigger Adjacent Track Interference (ATI) remediation latencies, which
    badly hurt long-tail latencies; that is a very big deal for web serving
    tiers, for example.
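    A toy userspace model of the lazytime policy, assuming a single timestamp
    and counting inode-table writes (all names hypothetical; this is not the
    VFS or ext4 implementation). A thousand timestamp updates cost nothing on
    disk until one of the three flush conditions above occurs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-memory inode with lazily persisted timestamps. */
struct lazy_inode {
    long mtime;            /* in-memory timestamp */
    long disk_mtime;       /* copy last written to the inode table */
    bool dirty_time;       /* in-memory time differs from disk */
    unsigned disk_writes;  /* number of inode-table writes performed */
};

static void write_inode(struct lazy_inode *ino)
{
    ino->disk_mtime = ino->mtime;
    ino->dirty_time = false;
    ino->disk_writes++;
}

/* Under lazytime a data write only bumps the in-memory time. */
static void touch_mtime(struct lazy_inode *ino, long now)
{
    ino->mtime = now;
    ino->dirty_time = true;
}

/* (a) A non-time change forces the inode out, timestamps included. */
static void change_non_time_field(struct lazy_inode *ino)
{
    write_inode(ino);
}

/* (b) fsync()/syncfs()/sync() and (c) eviction flush pending times. */
static void flush_times(struct lazy_inode *ino)
{
    if (ino->dirty_time)
        write_inode(ino);
}
```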

    Google-Bug-Id: 18297052

    Signed-off-by: Theodore Ts'o
    Signed-off-by: Al Viro

    Theodore Ts'o
     

04 Nov, 2014

1 commit


08 Oct, 2014

1 commit

  • In later patches, we're going to add a new lock_manager_operation to
    finish setting up the lease while still holding the i_lock. To do
    this, we'll need to pass a little bit of info in the fcntl setlease
    case (primarily an fasync structure). Plumb the extra pointer into
    there in advance of that.

    We declare this pointer as a void ** to make it clear that this is
    private info, and that the caller isn't required to set this unless
    the lm_setup operation specifically requires it.
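    One plausible way such a `void **` slot gets used, sketched in userspace
    with hypothetical names (the later lm_setup patches may differ in detail):
    the caller allocates the private object up front, the callee takes
    ownership by clearing the pointer, and the caller frees whatever is left.
    Callers with nothing to pass simply hand in NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the private object (e.g. a fasync entry). */
struct model_priv { int fd; };

struct model_lease { struct model_priv *attached; };

/* Callee: if it keeps the object, it takes ownership by clearing *priv. */
static int model_setlease(struct model_lease *lease, void **priv)
{
    if (priv && *priv) {
        lease->attached = *priv;
        *priv = NULL;          /* ownership transferred to the lease */
    }
    return 0;
}

/* Caller: allocate in advance, then free whatever the callee did not take. */
static int model_fcntl_setlease(struct model_lease *lease, int fd)
{
    void *p = malloc(sizeof(struct model_priv));
    if (!p)
        return -1;
    ((struct model_priv *)p)->fd = fd;

    int ret = model_setlease(lease, &p);
    free(p);                   /* free(NULL) is a no-op if it was consumed */
    return ret;
}
```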

    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig

    Jeff Layton
     

10 Sep, 2014

1 commit

  • GFS2 and NFS have setlease routines that always just return -EINVAL.
    Turn that into a generic routine that can live in fs/libfs.c.
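    The consolidation amounts to sharing one stub through each filesystem's
    operations table; in mainline the helper is simple_nosetlease(). The
    userspace sketch here uses hypothetical names and a simplified signature,
    and only mimics the behaviour of always refusing with -EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical slice of a file-operations table. */
struct model_file_ops {
    int (*setlease)(void *file, long arg);
};

/* One shared helper instead of a private copy in every filesystem. */
static int model_nosetlease(void *file, long arg)
{
    (void)file;
    (void)arg;
    return -EINVAL;   /* leases are not supported */
}

/* Two filesystems pointing at the same stub. */
static const struct model_file_ops gfs2_like_ops = { .setlease = model_nosetlease };
static const struct model_file_ops nfs_like_ops  = { .setlease = model_nosetlease };
```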

    Cc:
    Cc: Steven Whitehouse
    Cc:
    Signed-off-by: Jeff Layton
    Acked-by: Trond Myklebust
    Reviewed-by: Christoph Hellwig

    Jeff Layton
     

05 Jun, 2014

1 commit

  • Description by Jan Kara:
    "A lot of older filesystems don't properly flush volatile disk caches
    on fsync(2) which can lead to loss of fsynced data after power failure.

    This patch makes generic_file_fsync() issue proper cache flush to fix the
    problem. Sysadmin can use /sys/devices/.../cache_type to tell the system
    it should not send the cache flush."
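    Nothing changes from userspace's point of view: applications still depend
    on fsync(2), and the fix only makes the kernel back it with a cache flush
    on the affected filesystems. A minimal POSIX sketch of that
    application-side pattern follows; the helper name write_durably() is
    hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `buf` to `path` and ask the kernel to make it
 * durable. Returns 0 on success, -1 on failure (a short write counts as
 * failure in this sketch). */
static int write_durably(const char *path, const char *buf)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(buf);
    ssize_t n = write(fd, buf, len);

    /* fsync() is the call that, after this change, also makes
     * generic_file_fsync()-based filesystems flush the disk's
     * volatile write cache. */
    if (n < 0 || (size_t)n != len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

    For a newly created file, the parent directory also has to be fsync()ed
    before the new directory entry itself can be considered durable.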

    [akpm@linux-foundation.org: nuke ifdef]
    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Fabian Frederick
    Suggested-by: Jan Kara
    Suggested-by: Christoph Hellwig
    Cc: Jan Kara
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

16 Nov, 2013

1 commit


09 Nov, 2013

1 commit


25 Oct, 2013

2 commits


14 Jul, 2013

1 commit


29 Jun, 2013

1 commit


21 Dec, 2012

1 commit


18 Dec, 2012

1 commit


05 Sep, 2012

1 commit


14 Jul, 2012

2 commits

  • Pass mount flags to sget() so that it can use them in initialising a new
    superblock before the set function is called. They could also be passed to the
    compare function.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     

11 May, 2012

1 commit

  • This allows comparing hash and len in one operation on 64-bit
    architectures. Right now only __d_lookup_rcu() takes advantage of this,
    since that is the case we care most about.

    The use of anonymous struct/unions hides the alternate 64-bit approach
    from most users, the exception being a few cases where we initialize a
    'struct qstr' with a static initializer. This makes the problematic
    cases use a new QSTR_INIT() helper function for that (but initializing
    just the name pointer with a "{ .name = xyzzy }" initializer remains
    valid, as does just copying another qstr structure).
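    A userspace sketch of the layout trick, with hypothetical names
    (`struct model_qstr`, `MODEL_QSTR_INIT`). The kernel additionally cares
    about the order of the two 32-bit fields on big- and little-endian
    machines; this model ignores that because it only ever compares the
    combined word for equality.

```c
#include <assert.h>
#include <stdint.h>

/* Hash and length share one 64-bit word via an anonymous union (C11). */
struct model_qstr {
    union {
        struct {
            uint32_t hash;
            uint32_t len;
        };
        uint64_t hash_len;
    };
    const unsigned char *name;
};

/* Static initializer that spells out the nested braces of the anonymous
 * union, in the spirit of QSTR_INIT(); the hash is zero-initialized. */
#define MODEL_QSTR_INIT(n, l) { { { .len = (l) } }, .name = (n) }

/* One 64-bit comparison checks hash and length together. */
static int model_qstr_same_hash_len(const struct model_qstr *a,
                                    const struct model_qstr *b)
{
    return a->hash_len == b->hash_len;
}
```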

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

09 Apr, 2012

1 commit

  • d_genocide() does _not_ evict dentries; it just drops the extra ref
    pinning each of them. Normally it's followed by shrinking the
    tree (it's done just before generic_shutdown_super() by kill_litter_super()),
    but in the case of simple_fill_super() nothing of that kind will follow.
    Just do shrink_dcache_parent() manually.
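    A userspace analogy for why the extra call is needed, using hypothetical
    names: dropping the pinning reference on every node (the d_genocide()
    step) leaves all of them allocated, and only a separate shrink pass (the
    shrink_dcache_parent() step) actually frees the unreferenced ones.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dentry-like node; children form a singly linked list. */
struct node {
    int refcount;
    struct node *child;    /* first child */
    struct node *sibling;  /* next sibling */
};

static unsigned live_nodes;

static struct node *node_new(struct node *parent)
{
    struct node *n = calloc(1, sizeof(*n));
    if (!n)
        abort();
    n->refcount = 1;       /* the "pin" held by whoever built the tree */
    if (parent) {
        n->sibling = parent->child;
        parent->child = n;
    }
    live_nodes++;
    return n;
}

/* Like d_genocide(): drop the pin on every descendant, free nothing. */
static void genocide(struct node *root)
{
    for (struct node *c = root->child; c; c = c->sibling) {
        genocide(c);
        c->refcount--;
    }
}

/* Like shrink_dcache_parent(): free descendants nobody references. */
static void shrink(struct node *root)
{
    struct node **link = &root->child;
    while (*link) {
        struct node *c = *link;
        shrink(c);
        if (c->refcount == 0 && c->child == NULL) {
            *link = c->sibling;   /* unlink, then free */
            free(c);
            live_nodes--;
        } else {
            link = &c->sibling;
        }
    }
}
```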

    Signed-off-by: Al Viro

    Al Viro
     

06 Apr, 2012

1 commit

  • debugfs and a few other drivers use an open-coded version of
    simple_open() to pass a pointer from the file to the read/write file
    ops. Add support for this simple case to libfs so that we can remove
    the many duplicate copies of this simple function.
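    In mainline, simple_open() amounts to copying inode->i_private into
    file->private_data when it is set and returning 0. This sketch mocks the
    two structures in userspace (all names hypothetical) to show how a read
    handler can then reach its data through the file alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal inode/file pair. */
struct model_inode { void *i_private; };
struct model_file  { void *private_data; };

/* Hand the pointer stashed in the inode at creation time to the open
 * file, where the read/write handlers can find it. */
static int model_simple_open(struct model_inode *inode, struct model_file *file)
{
    if (inode->i_private)
        file->private_data = inode->i_private;
    return 0;
}

/* A read handler that only needs file->private_data. */
static int model_read_value(const struct model_file *file)
{
    return *(const int *)file->private_data;
}
```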

    Signed-off-by: Stephen Boyd
    Cc: Al Viro
    Cc: Julia Lawall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     

25 Mar, 2012

1 commit

  • Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
    "Fix up files in fs/ and lib/ dirs to only use module.h if they really
    need it.

    These are trivial in scope vs the work done previously. We now have
    things where any few remaining cleanups can be farmed out to arch or
    subsystem maintainers, and I have done so when possible. What is
    remaining here represents the bits that don't clearly lie within a
    single arch/subsystem boundary, like the fs dir and the lib dir.

    Some duplicate includes arising from overlapping fixes from
    independent subsystem maintainer submissions are also quashed."

    Fix up trivial conflicts due to clashes with other include file cleanups
    (including some due to the previous bug.h cleanup pull).

    * tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    lib: reduce the use of module.h wherever possible
    fs: reduce the use of module.h wherever possible
    includecheck: delete any duplicate instances of module.h

    Linus Torvalds
     

21 Mar, 2012

2 commits


29 Feb, 2012

1 commit


04 Jan, 2012

1 commit

  • Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
    kill_bdev as well, so brd doesn't have to open code it. Reduce
    buffer_head.h requirement accordingly.

    Removed a rather large comment from invalidate_bdev, as it looked a bit
    obsolete to bother moving. The small comment replacing it says enough.

    Signed-off-by: Nick Piggin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Al Viro
     

02 Nov, 2011

2 commits


23 Jul, 2011

1 commit


21 Jul, 2011

1 commit

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems, such as ext3 and ocfs2, can apparently drop taking the i_mutex
    altogether. For correctness' sake I just pushed everything down in all cases to
    make sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

20 Jul, 2011

2 commits

  • New helper (non-exported, fs/internal.h-only): __d_alloc(sb, name).
    Allocates dentry, sets its ->d_sb to given superblock and sets
    ->d_op accordingly. Old d_alloc(NULL, name) callers are converted
    to that (all of them know what superblock they want). d_alloc()
    itself is left only for the parent != NULL case; it uses __d_alloc() and
    inserts the result into the list of the parent's children.

    Note that now ->d_sb is assign-once and never NULL *and*
    ->d_parent is never NULL either.
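    A userspace model of the resulting invariants, with hypothetical names:
    the superblock-only allocator always sets d_sb and makes a parentless
    dentry its own parent, so neither pointer is ever NULL, while the
    parent-taking allocator inherits the parent's superblock and links the
    child in.

```c
#include <assert.h>
#include <stdlib.h>

struct model_sb { int id; };

/* Hypothetical dentry model: a parent pointer plus a list of children. */
struct model_dentry {
    struct model_sb *d_sb;
    struct model_dentry *d_parent;
    struct model_dentry *first_child;
    struct model_dentry *next_sibling;
};

/* Like __d_alloc(): always bound to a superblock; with no parent yet,
 * the dentry is its own parent. */
static struct model_dentry *model_d_alloc_sb(struct model_sb *sb)
{
    struct model_dentry *d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;
    d->d_sb = sb;
    d->d_parent = d;
    return d;
}

/* Like d_alloc(): the parent must be non-NULL; inherit its superblock
 * and insert the new dentry into the parent's children. */
static struct model_dentry *model_d_alloc(struct model_dentry *parent)
{
    assert(parent != NULL);
    struct model_dentry *d = model_d_alloc_sb(parent->d_sb);
    if (!d)
        return NULL;
    d->d_parent = parent;
    d->next_sibling = parent->first_child;
    parent->first_child = d;
    return d;
}
```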

    Signed-off-by: Al Viro

    Al Viro
     
  • Assume that /sys/kernel/debug/dummy64 is debugfs file created by
    debugfs_create_x64().

    # cd /sys/kernel/debug
    # echo 0x1234567812345678 > dummy64
    # cat dummy64
    0x0000000012345678

    # echo 0x80000000 > dummy64
    # cat dummy64
    0xffffffff80000000

    A value larger than INT_MAX cannot be written to the debugfs file created
    by debugfs_create_u64 or debugfs_create_x64 on a 32-bit machine, because
    simple_attr_write() uses simple_strtol() for the conversion.

    To fix this, use simple_strtoll() instead.
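    The two outputs in the log are exactly what squeezing the value through a
    32-bit signed long produces. A userspace reproduction, using the C
    library's strtoull()/strtoll() as stand-ins for the kernel's
    simple_strto*() helpers (both function names below are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Emulate the old bug: parse, squeeze through a 32-bit signed "long" as
 * on a 32-bit machine, then widen back to u64 for the attribute. The
 * narrowing conversion is implementation-defined in C; GCC and Clang
 * wrap modulo 2^32, which matches the log output. */
static uint64_t old_parse_32bit_long(const char *s)
{
    int32_t as_long = (int32_t)strtoull(s, NULL, 0);
    return (uint64_t)(int64_t)as_long;
}

/* The fix: parse into a 64-bit long long, so nothing is truncated. */
static uint64_t new_parse(const char *s)
{
    return (uint64_t)strtoll(s, NULL, 0);
}
```

    Userspace strtoll() saturates at LLONG_MAX for larger inputs, so this
    model only covers values that fit in a signed 64-bit integer.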

    Signed-off-by: Akinobu Mita
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

13 Jan, 2011

1 commit


07 Jan, 2011

5 commits

  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -l -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
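    A userspace sketch of the flag-caching idea, with hypothetical names and
    only two hooks: when the operations table is attached, the presence of
    each hook is recorded as a bit in d_flags, so the hot path tests a field
    it already has instead of loading d_op and then the hook pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operations table with optional hooks. */
struct model_dentry_ops {
    unsigned (*d_hash)(const char *name);
    int (*d_compare)(const char *a, const char *b);
};

#define MODEL_OP_HASH    0x1u
#define MODEL_OP_COMPARE 0x2u

struct model_dentry {
    const struct model_dentry_ops *d_op;
    unsigned d_flags;
};

/* Example hook: hash a name by its length (illustration only). */
static unsigned length_hash(const char *name)
{
    return (unsigned)strlen(name);
}

/* Like d_set_d_op(): record which hooks exist as flag bits at setup time. */
static void model_set_d_op(struct model_dentry *d, const struct model_dentry_ops *op)
{
    d->d_op = op;
    d->d_flags &= ~(MODEL_OP_HASH | MODEL_OP_COMPARE);
    if (!op)
        return;
    if (op->d_hash)
        d->d_flags |= MODEL_OP_HASH;
    if (op->d_compare)
        d->d_flags |= MODEL_OP_COMPARE;
}

/* Hot path: test a flag instead of chasing d_op->d_hash. */
static unsigned model_hash(const struct model_dentry *d, const char *name,
                           unsigned default_hash)
{
    if (d->d_flags & MODEL_OP_HASH)
        return d->d_op->d_hash(name);
    return default_hash;
}
```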

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dcache_lock no longer protects anything. Remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
    using dcache_lock for these anyway (e.g. using i_mutex).

    Note: if we change the locking rule in future so that ->d_child protection is
    provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
    But it would be an exception to an otherwise regular locking scheme, so we'd
    have to see some good results. Probably not worthwhile.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Protect the d_unhashed(dentry) condition with d_lock. This means keeping
    the DCACHE_UNHASHED bit in sync with hash manipulations.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_delete from a dentry deletion notification to dentry caching
    advice, more like ->drop_inode. Require it to be constant and idempotent,
    and not take d_lock. This is how all existing filesystems use the callback
    anyway.

    This makes fine grained dentry locking of dput and dentry lru scanning
    much simpler.
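    A userspace sketch of the new contract, with hypothetical names:
    ->d_delete becomes a side-effect-free yes/no hint, and the caller (a
    dput()-like function here) consults it on the last reference and decides
    for itself whether to keep the dentry cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dentry;

/* Hypothetical ->d_delete: a pure predicate that takes no locks, does not
 * modify the dentry, and gives the same answer every time. */
typedef bool (*model_d_delete_fn)(const struct model_dentry *d);

struct model_dentry {
    int refcount;
    bool unhashed;               /* already removed from the hash */
    bool cached;                 /* kept around after the last reference */
    model_d_delete_fn d_delete;
};

/* Advice: "don't bother caching me once unused". */
static bool always_delete(const struct model_dentry *d)
{
    (void)d;
    return true;
}

/* Like dput(): on the last reference, consult the advice to choose
 * between keeping the dentry cached and discarding it. */
static void model_dput(struct model_dentry *d)
{
    if (--d->refcount > 0)
        return;
    bool drop = d->unhashed || (d->d_delete && d->d_delete(d));
    d->cached = !drop;
}
```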

    Signed-off-by: Nick Piggin

    Nick Piggin