Eric Lee / smarc-fsl-linux-kernel

11 Oct, 2016

3 commits

101105b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning

Linus Torvalds
9 years ago
3873691e5 Merge remote-tracking branch 'ovl/rename2' into for-linus Browse Code »

Al Viro
9 years ago
97d211670 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs xattr updates from Al Viro:
"xattr stuff from Andreas

This completes the switch to xattr_handler ->get()/->set() from
->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Remove {get,set,remove}xattr inode operations
xattr: Stop calling {get,set,remove}xattr inode operations
vfs: Check for the IOP_XATTR flag in listxattr
xattr: Add __vfs_{get,set,remove}xattr helpers
libfs: Use IOP_XATTR flag for empty directory handling
vfs: Use IOP_XATTR flag for bad-inode handling
vfs: Add IOP_XATTR inode operations flag
vfs: Move xattr_resolve_name to the front of fs/xattr.c
ecryptfs: Switch to generic xattr handlers
sockfs: Get rid of getxattr iop
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
kernfs: Switch to generic xattr handlers
hfs: Switch to generic xattr handlers
jffs2: Remove jffs2_{get,set,remove}xattr macros
xattr: Remove unnecessary NULL attribute name check

Linus Torvalds
9 years ago

08 Oct, 2016

1 commit

f5c244383 libfs: Use IOP_XATTR flag for empty directory handling ... Browse Code »

Instead of special xattr inode operations, use the IOP_XATTR inode
operations flag for the special libfs empty directories.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
9 years ago

07 Oct, 2016

1 commit

bba0bd31b sockfs: Get rid of getxattr iop ... Browse Code »

If we allow pseudo-filesystems created with mount_pseudo to have xattr
handlers, we can replace sockfs_getxattr with a sockfs_xattr_get handler
to use the xattr handler name parsing.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
9 years ago

28 Sep, 2016

1 commit

078cd8279 fs: Replace CURRENT_TIME with current_time() for inode timestamps ... Browse Code »

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
Acked-by: Felipe Balbi
Acked-by: Steven Whitehouse
Acked-by: Ryusuke Konishi
Acked-by: David Sterba
Signed-off-by: Al Viro

Deepa Dinamani
9 years ago

27 Sep, 2016

1 commit

e0e0be8a8 libfs: support RENAME_NOREPLACE in simple_rename() ... Browse Code »

This is trivial to do:

- add flags argument to simple_rename()
- check if flags doesn't have any other than RENAME_NOREPLACE
- assign simple_rename() to .rename2 instead of .rename

Filesystems converted:

hugetlbfs, ramfs, bpf.

Debugfs uses simple_rename() to implement debugfs_rename(), which is for
debugfs instances to rename files internally, not for userspace filesystem
access. For this case pass zero flags to simple_rename().

Signed-off-by: Miklos Szeredi
Acked-by: Greg Kroah-Hartman
Cc: Alexei Starovoitov

Miklos Szeredi
9 years ago

22 Sep, 2016

1 commit

31051c85b fs: Give dentry to inode_change_ok() instead of inode ... Browse Code »

inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara

Jan Kara
9 years ago

21 Jun, 2016

3 commits

ebaaa80e8 lockless next_positive() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
9 years ago
4f42c1b5b libfs.c: new helper - next_positive() ... Browse Code »

Return nth positive child after given or NULL if there's
less than n left. dcache_readdir() and dcache_dir_lseek()
switched to it.

Signed-off-by: Al Viro

Al Viro
9 years ago
274f5b041 dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock ... Browse Code »

Make sure that directory is locked shared in dcache_dir_lseek();
for dcache_readdir() it's already tru, and that's enough to make
simple_positive() stable.

Signed-off-by: Al Viro

Al Viro
9 years ago

10 Jun, 2016

1 commit

ba65dc5ef much milder d_walk() race ... Browse Code »

d_walk() relies upon the tree not getting rearranged under it without
rename_lock being touched. And we do grab rename_lock around the
places that change the tree topology. Unfortunately, branch reordering
is just as bad from d_walk() POV and we have two places that do it
without touching rename_lock - one in handling of cursors (for ramfs-style
directories) and another in autofs. autofs one is a separate story; this
commit deals with the cursors.
* mark cursor dentries explicitly at allocation time
* make __dentry_kill() leave ->d_child.next pointing to the next
non-cursor sibling, making sure that it won't be moved around unnoticed
before the parent is relocked on ascend-to-parent path in d_walk().
* make d_walk() skip cursors explicitly; strictly speaking it's
not necessary (all callbacks we pass to d_walk() are no-ops on cursors),
but it makes analysis easier.

Signed-off-by: Al Viro

Al Viro
9 years ago

28 May, 2016

1 commit

3767e255b switch ->setxattr() to passing dentry and inode separately ... Browse Code »

smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
we'd hashed the new dentry and attached it to inode, we need ->setxattr()
instances getting the inode as an explicit argument rather than obtaining
it from dentry.

Similar change for ->getxattr() had been done in commit ce23e64. Unlike
->getxattr() (which is used by both selinux and smack instances of
->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
it got missed back then.

Reported-by: Seung-Woo Kim
Tested-by: Casey Schaufler
Signed-off-by: Al Viro

Al Viro
9 years ago

09 May, 2016

1 commit

c51da20c4 more trivial ->iterate_shared conversions ... Browse Code »

Signed-off-by: Al Viro

Al Viro
9 years ago

03 May, 2016

2 commits

4e82901cd dcache_{readdir,dir_lseek}() users: switch to ->iterate_shared ... Browse Code »

no need to lock directory in dcache_dir_lseek(), while we are
at it - per-struct file exclusion is enough.

Signed-off-by: Al Viro

Al Viro
9 years ago
84695ffee Merge getxattr prototype change into work.lookups ... Browse Code »

The rest of work.xattr stuff isn't needed for this branch

Al Viro
9 years ago

11 Apr, 2016

1 commit

ce23e6401 ->getxattr(): pass dentry and inode as separate arguments ... Browse Code »

Signed-off-by: Al Viro

Al Viro
9 years ago

05 Apr, 2016

1 commit

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
9 years ago

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
10 years ago

31 Dec, 2015

1 commit

fceef393a switch ->get_link() to delayed_call, kill ->put_link() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
10 years ago

30 Dec, 2015

1 commit

cd3417c8f kill free_page_put_link() ... Browse Code »

all callers are better off with kfree_put_link()

Signed-off-by: Al Viro

Al Viro
10 years ago

09 Dec, 2015

1 commit

6b2553918 replace ->follow_link() with new method that could stay in RCU mode ... Browse Code »

new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.

Signed-off-by: Al Viro

Al Viro
10 years ago

13 Aug, 2015

1 commit

4b75de861 fs: Set the size of empty dirs to 0. ... Browse Code »

Before the make_empty_dir_inode calls were introduce into proc, sysfs,
and sysctl those directories when stated reported an i_size of 0.
make_empty_dir_inode started reporting an i_size of 2. At least one
userspace application depended on stat returning i_size of 0. So
modify make_empty_dir_inode to cause an i_size of 0 to be reported for
these directories.

Cc: stable@vger.kernel.org
Reported-by: Tejun Heo
Acked-by: Tejun Heo
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
10 years ago

05 Jul, 2015

1 commit

1dc51b828 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...

Linus Torvalds
10 years ago

04 Jul, 2015

1 commit

0cbee9926 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull user namespace updates from Eric Biederman:
"Long ago and far away when user namespaces where young it was realized
that allowing fresh mounts of proc and sysfs with only user namespace
permissions could violate the basic rule that only root gets to decide
if proc or sysfs should be mounted at all.

Some hacks were put in place to reduce the worst of the damage could
be done, and the common sense rule was adopted that fresh mounts of
proc and sysfs should allow no more than bind mounts of proc and
sysfs. Unfortunately that rule has not been fully enforced.

There are two kinds of gaps in that enforcement. Only filesystems
mounted on empty directories of proc and sysfs should be ignored but
the test for empty directories was insufficient. So in my tree
directories on proc, sysctl and sysfs that will always be empty are
created specially. Every other technique is imperfect as an ordinary
directory can have entries added even after a readdir returns and
shows that the directory is empty. Special creation of directories
for mount points makes the code in the kernel a smidge clearer about
it's purpose. I asked container developers from the various container
projects to help test this and no holes were found in the set of mount
points on proc and sysfs that are created specially.

This set of changes also starts enforcing the mount flags of fresh
mounts of proc and sysfs are consistent with the existing mount of
proc and sysfs. I expected this to be the boring part of the work but
unfortunately unprivileged userspace winds up mounting fresh copies of
proc and sysfs with noexec and nosuid clear when root set those flags
on the previous mount of proc and sysfs. So for now only the atime,
read-only and nodev attributes which userspace happens to keep
consistent are enforced. Dealing with the noexec and nosuid
attributes remains for another time.

This set of changes also addresses an issue with how open file
descriptors from /proc//ns/* are displayed. Recently readlink of
/proc//fd has been triggering a WARN_ON that has not been
meaningful since it was added (as all of the code in the kernel was
converted) and is not now actively wrong.

There is also a short list of issues that have not been fixed yet that
I will mention briefly.

It is possible to rename a directory from below to above a bind mount.
At which point any directory pointers below the renamed directory can
be walked up to the root directory of the filesystem. With user
namespaces enabled a bind mount of the bind mount can be created
allowing the user to pick a directory whose children they can rename
to outside of the bind mount. This is challenging to fix and doubly
so because all obvious solutions must touch code that is in the
performance part of pathname resolution.

As mentioned above there is also a question of how to ensure that
developers by accident or with purpose do not introduce exectuable
files on sysfs and proc and in doing so introduce security regressions
in the current userspace that will not be immediately obvious and as
such are likely to require breaking userspace in painful ways once
they are recognized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Remove incorrect debugging WARN in prepend_path
mnt: Update fs_fully_visible to test for permanently empty directories
sysfs: Create mountpoints with sysfs_create_mount_point
sysfs: Add support for permanently empty directories to serve as mount points.
kernfs: Add support for always empty directories.
proc: Allow creating permanently empty directories that serve as mount points
sysctl: Allow creating permanently empty directories that serve as mountpoints.
fs: Add helper functions for permanently empty directories.
vfs: Ignore unlocked mounts in fs_fully_visible
mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
mnt: Refactor the logic for mounting sysfs and proc in a user namespace

Linus Torvalds
10 years ago

01 Jul, 2015

1 commit

fbabfd0f4 fs: Add helper functions for permanently empty directories. ... Browse Code »

To ensure it is safe to mount proc and sysfs I need to check if
filesystems that are mounted on top of them are mounted on truly empty
directories. Given that some directories can gain entries over time,
knowing that a directory is empty right now is insufficient.

Therefore add supporting infrastructure for permantently empty
directories that proc and sysfs can use when they create mount points
for filesystems and fs_fully_visible can use to test for permanently
empty directories to ensure that nothing will be gained by mounting a
fresh copy of proc or sysfs.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
10 years ago

24 Jun, 2015

1 commit

dc3f4198e make simple_positive() public ... Browse Code »

Signed-off-by: Al Viro

Al Viro
10 years ago

11 May, 2015

5 commits

ecc087ff1 new helper: free_page_put_link() ... Browse Code »

similar to kfree_put_link()

Signed-off-by: Al Viro

Al Viro
10 years ago
5f2c4179e switch ->put_link() from dentry to inode ... Browse Code »

only one instance looks at that argument at all; that sole
exception wants inode rather than dentry.

Signed-off-by: Al Viro

Al Viro
10 years ago
6e77137b3 don't pass nameidata to ->follow_link() ... Browse Code »

its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidata

Signed-off-by: Al Viro

Al Viro
10 years ago
680baacbc new ->follow_link() and ->put_link() calling conventions ... Browse Code »

a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (into void * passed by address by caller) and returns
the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and pointer to symlink body for normal symlinks. Stored pointer
is ignored in all cases except the last one.

Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().

b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is. In the cases when we used the symlink body
to free stuff, ->follow_link() now should store it as opaque pointer in addition
to returning it.

Signed-off-by: Al Viro

Al Viro
10 years ago
61ba64fc0 libfs: simple_follow_link() ... Browse Code »

let "fast" symlinks store the pointer to the body into ->i_link and
use simple_follow_link for ->follow_link()

Reviewed-by: Jan Kara
Signed-off-by: Al Viro

Al Viro
10 years ago

16 Apr, 2015

1 commit

dea655c28 VFS: fs library helpers: d_inode() annotations ... Browse Code »

library helpers called by filesystem drivers on their own inodes

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
10 years ago

23 Feb, 2015

1 commit

e36cb0b89 VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) ... Browse Code »

Convert the following where appropriate:

(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.

However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = ;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}

my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}

[AV: overlayfs parts skipped]

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
10 years ago

05 Feb, 2015

1 commit

0ae45f63d vfs: add support for a lazytime mount option ... Browse Code »

Add a new mount option which enables a new "lazytime" mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.

This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.

For workloads which feature a large number of random write to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table. The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives. Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).

Google-Bug-Id: 18297052

Signed-off-by: Theodore Ts'o
Signed-off-by: Al Viro

Theodore Ts'o
10 years ago

04 Nov, 2014

1 commit

946e51f2b move d_rcu from overlapping d_child to overlapping d_alias ... Browse Code »

Signed-off-by: Al Viro

Al Viro
11 years ago

08 Oct, 2014

1 commit

e6f5c7893 locks: plumb a "priv" pointer into the setlease routines ... Browse Code »

In later patches, we're going to add a new lock_manager_operation to
finish setting up the lease while still holding the i_lock. To do
this, we'll need to pass a little bit of info in the fcntl setlease
case (primarily an fasync structure). Plumb the extra pointer into
there in advance of that.

We declare this pointer as a void ** to make it clear that this is
private info, and that the caller isn't required to set this unless
the lm_setup specifically requires it.

Signed-off-by: Jeff Layton
Reviewed-by: Christoph Hellwig

Jeff Layton
11 years ago

10 Sep, 2014

1 commit

1c994a090 locks: consolidate "nolease" routines ... Browse Code »

GFS2 and NFS have setlease routines that always just return -EINVAL.
Turn that into a generic routine that can live in fs/libfs.c.

Cc:
Cc: Steven Whitehouse
Cc:
Signed-off-by: Jeff Layton
Acked-by: Trond Myklebust
Reviewed-by: Christoph Hellwig

Jeff Layton
11 years ago

05 Jun, 2014

1 commit

ac13a829f fs/libfs.c: add generic data flush to fsync ... Browse Code »

Description by Jan Kara:
"A lot of older filesystems don't properly flush volatile disk caches
on fsync(2) which can lead to loss of fsynced data after power failure.

This patch makes generic_file_fsync() issue proper cache flush to fix the
problem. Sysadmin can use /sys/devices/.../cache_type to tell the system
it should not send the cache flush."

[akpm@linux-foundation.org: nuke ifdef]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Fabian Frederick
Suggested-by: Jan Kara
Suggested-by: Christoph Hellwig
Cc: Jan Kara
Cc: Christoph Hellwig
Cc: Alexander Viro
Cc: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fabian Frederick
11 years ago

16 Nov, 2013

1 commit

b26d4cd38 consolidate simple ->d_delete() instances ... Browse Code »

Rename simple_delete_dentry() to always_delete_dentry() and export it.
Export simple_dentry_operations, while we are at it, and get rid of
their duplicates

Signed-off-by: Al Viro

Al Viro
12 years ago