23 Feb, 2015

2 commits

  • Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
    thereof) in cachefiles:

    (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
    it doesn't want to deal with automounts in its cache.

    (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
    cachefiles.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.
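
    As a purely illustrative sketch of what the conversion looks like at a call
    site (both helpers below are hypothetical stand-ins for typical callers):

    #include <linux/fs.h>
    #include <linux/dcache.h>

    /* Hypothetical caller, before the conversion. */
    static bool victim_is_dir_old(struct dentry *dentry)
    {
            return dentry->d_inode && S_ISDIR(dentry->d_inode->i_mode);
    }

    /* After: the type flag also covers the negative-dentry check.  Use
     * d_can_lookup() instead where an automount point must not count. */
    static bool victim_is_dir_new(struct dentry *dentry)
    {
            return d_is_dir(dentry);
    }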

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my $fd;
    my @callers;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
        die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
        print "No matches\n";
        exit(0);
    }

    my @cocci = (
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISLNK(E->d_inode->i_mode)',
        '+ d_is_symlink(E)',
        '',
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISDIR(E->d_inode->i_mode)',
        '+ d_is_dir(E)',
        '',
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISREG(E->d_inode->i_mode)',
        '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
        chomp $file;
        print "Processing ", $file, "\n";
        system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
            die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     


14 Oct, 2014

2 commits

  • …git/dhowells/linux-fs

    Pull fs-cache fixes from David Howells:
    "Two fixes for bugs in CacheFiles and a cleanup in FS-Cache"

    * tag 'fscache-fixes-20141013' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    fs/fscache/object-list.c: use __seq_open_private()
    CacheFiles: Fix incorrect test for in-memory object collision
    CacheFiles: Handle object being killed before being set up

    Linus Torvalds
     
  • When CacheFiles cache objects are in use, they have in-memory representations,
    as defined by the cachefiles_object struct. These are kept in a tree rooted in
    the cache and indexed by dentry pointer (since there's a unique mapping between
    object index key and dentry).

    Collisions can occur between a representation already in the tree and a new
    representation being set up because it takes time to dispose of an old
    representation - particularly if it must be unlinked or renamed.

    When such a collision occurs, cachefiles_mark_object_active() is meant to check
    to see if the old, already-present representation is in the process of being
    discarded (ie. FSCACHE_OBJECT_IS_LIVE is not set on it) - and, if so, wait for
    the representation to be removed (ie. CACHEFILES_OBJECT_ACTIVE is then
    cleared).

    However, the test for whether the old representation is still live is checking
    the new object - which always will be live at this point. This leads to an
    oops looking like:

    CacheFiles: Error: Unexpected object collision
    object: OBJ1b354
    objstate=LOOK_UP_OBJECT fl=8 wbusy=2 ev=0[0]
    ops=0 inp=0 exc=0
    parent=ffff88053f5417c0
    cookie=ffff880538f202a0 [pr=ffff8805381b7160 nd=ffff880509c6eb78 fl=27]
    key=[8] '2490000000000000'
    xobject: OBJ1a600
    xobjstate=DROP_OBJECT fl=70 wbusy=2 ev=0[0]
    xops=0 inp=0 exc=0
    xparent=ffff88053f5417c0
    xcookie=ffff88050f4cbf70 [pr=ffff8805381b7160 nd= (null) fl=12]
    ------------[ cut here ]------------
    kernel BUG at fs/cachefiles/namei.c:200!
    ...
    Workqueue: fscache_object fscache_object_work_func [fscache]
    ...
    RIP: ... cachefiles_walk_to_object+0x7ea/0x860 [cachefiles]
    ...
    Call Trace:
    [] ? cachefiles_lookup_object+0x58/0x100 [cachefiles]
    [] ? fscache_look_up_object+0xb9/0x1d0 [fscache]
    [] ? fscache_parent_ready+0x2d/0x80 [fscache]
    [] ? fscache_object_work_func+0x92/0x1f0 [fscache]
    [] ? process_one_work+0x16b/0x400
    [] ? worker_thread+0x116/0x380
    [] ? manage_workers.isra.21+0x290/0x290
    [] ? kthread+0xbc/0xe0
    [] ? flush_kthread_worker+0x80/0x80
    [] ? ret_from_fork+0x7c/0xb0
    [] ? flush_kthread_worker+0x80/0x80
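
    In outline, the fix is to perform the liveness test on the old xobject rather
    than on the new object - a minimal sketch, assuming the fscache_object_is_live()
    helper that tests FSCACHE_OBJECT_IS_LIVE:

    #include <linux/fscache-cache.h>

    /* Sketch only (not the exact upstream diff): the object whose liveness
     * matters is the old, colliding representation ("xobject"); the new
     * object is always live when the collision is detected. */
    static bool collision_is_unexpected(struct fscache_object *xobject)
    {
            /* Previously this was (wrongly) evaluated on the new object. */
            return fscache_object_is_live(xobject);
    }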

    Reported-by: Manuel Schölling
    Signed-off-by: David Howells
    Acked-by: Steve Dickson

    David Howells
     


30 Sep, 2014

1 commit

  • If a cache object gets killed whilst in the process of being set up - for
    instance if the netfs relinquishes the cookie that the object is associated
    with - then the object's state machine will transit to the DROP_OBJECT state
    without necessarily going through the LOOKUP_OBJECT or CREATE_OBJECT states.

    This is a problem for CacheFiles because cachefiles_drop_object() assumes that
    object->dentry will be set upon reaching the DROP_OBJECT state and has an
    ASSERT() to that effect (see the oops below) - but object->dentry doesn't get
    set until the LOOKUP_OBJECT or CREATE_OBJECT states (and not always then if
    they fail).

    To fix this, just make the dentry cleanup in cachefiles_drop_object()
    conditional on the dentry actually being set and remove the assertion.
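
    In outline, the conditional cleanup looks something like this (a sketch of the
    shape of the fix, not the exact upstream code; the helper name is made up):

    #include <linux/dcache.h>

    /* Clean up the backing dentry only if lookup/creation got far enough to
     * set it, instead of asserting that it must be set. */
    static void drop_object_dentry_sketch(struct dentry **pdentry)
    {
            if (*pdentry) {                 /* previously: ASSERT(*pdentry) */
                    dput(*pdentry);         /* plus deletion if the object was retired */
                    *pdentry = NULL;
            }
    }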

    CacheFiles: Assertion failed
    ------------[ cut here ]------------
    kernel BUG at .../fs/cachefiles/namei.c:425!
    ...
    Workqueue: fscache_object fscache_object_work_func [fscache]
    ...
    RIP: ... cachefiles_delete_object+0xcd/0x110 [cachefiles]
    ...
    Call Trace:
    [] ? cachefiles_drop_object+0xff/0x130 [cachefiles]
    [] ? fscache_drop_object+0xd1/0x1d0 [fscache]
    [] ? fscache_object_work_func+0x87/0x210 [fscache]
    [] ? process_one_work+0x155/0x450
    [] ? worker_thread+0x114/0x370
    [] ? manage_workers.isra.21+0x2c0/0x2c0
    [] ? kthread+0xbc/0xe0
    [] ? flush_kthread_worker+0xa0/0xa0
    [] ? ret_from_fork+0x7c/0xb0
    [] ? flush_kthread_worker+0xa0/0xa0

    Reported-by: Manuel Schölling
    Signed-off-by: David Howells
    Acked-by: Steve Dickson

    David Howells
     


18 Sep, 2014

2 commits

  • Not all filesystems now provide the rename i_op - ext4, for one, provides the
    rename2 i_op instead. CacheFiles checks that the filesystem has rename and so
    will now reject ext4 with EPERM:

    CacheFiles: Failed to register: -1

    Fix this by checking for rename2 as an alternative. The call to vfs_rename()
    actually handles selection of the appropriate function, so we needn't worry
    about that.
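
    A sketch of the relaxed capability check (the helper below is illustrative;
    the actual test in cachefiles checks more operations than just rename):

    #include <linux/fs.h>
    #include <linux/dcache.h>

    /* A backing directory is usable if it supports either rename op;
     * vfs_rename() picks the appropriate one at call time. */
    static bool backing_dir_can_rename(struct dentry *dir)
    {
            const struct inode_operations *iops = dir->d_inode->i_op;

            return iops->rename || iops->rename2;
    }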

    Turning on debugging shows:

    [cachef] ==> cachefiles_get_directory(,,cache)
    [cachef] subdir -> ffff88000b22b778 positive
    [cachef]

    David Howells
     
  • These two have been unused since

    commit c4d6d8dbf335c7fa47341654a37c53a512b519bb
    CacheFiles: Fix the marking of cached pages

    in 3.8.

    Signed-off-by: NeilBrown
    Signed-off-by: David Howells

    NeilBrown
     


13 Apr, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencies between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffer_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     

05 Apr, 2014

1 commit

  • Pull renameat2 system call from Miklos Szeredi:
    "This adds a new syscall, renameat2(), which is the same as renameat()
    but with a flags argument.

    The purpose of extending rename is to add cross-rename, a symmetric
    variant of rename, which exchanges the two files. This allows
    interesting things, which were not possible before, for example
    atomically replacing a directory tree with a symlink, etc... This
    also allows overlayfs and friends to operate on whiteouts atomically.

    Andy Lutomirski also suggested a "noreplace" flag, which disables the
    overwriting behavior of rename.

    These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
    implemented for ext4 as an example and for testing"

    * 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ext4: add cross rename support
    ext4: rename: split out helper functions
    ext4: rename: move EMLINK check up
    ext4: rename: create ext4_renament structure for local vars
    vfs: add cross-rename
    vfs: lock_two_nondirectories: allow directory args
    security: add flags to rename hooks
    vfs: add RENAME_NOREPLACE flag
    vfs: add renameat2 syscall
    vfs: rename: use common code for dir and non-dir
    vfs: rename: move d_move() up
    vfs: add d_is_dir()

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • This code used to have its own lru cache pagevec up until a0b8cab3 ("mm:
    remove lru parameter from __pagevec_lru_add and remove parts of pagevec
    API"). Now it's just add_to_page_cache() followed by lru_cache_add(),
    might as well use add_to_page_cache_lru() directly.
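
    Roughly, the substitution looks like this (a sketch; both wrappers are
    hypothetical and error handling is elided):

    #include <linux/pagemap.h>
    #include <linux/swap.h>

    /* Before: two calls that add_to_page_cache_lru() now combines. */
    static int readpage_add_old(struct page *page, struct address_space *mapping,
                                pgoff_t index, gfp_t gfp)
    {
            int ret = add_to_page_cache(page, mapping, index, gfp);

            if (ret == 0)
                    lru_cache_add(page);
            return ret;
    }

    /* After: one call inserts the page and puts it on the LRU. */
    static int readpage_add_new(struct page *page, struct address_space *mapping,
                                pgoff_t index, gfp_t gfp)
    {
            return add_to_page_cache_lru(page, mapping, index, gfp);
    }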

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     



13 Nov, 2013

1 commit

  • Pull vfs updates from Al Viro:
    "All kinds of stuff this time around; some more notable parts:

    - RCU'd vfsmounts handling
    - new primitives for coredump handling
    - files_lock is gone
    - Bruce's delegations handling series
    - exportfs fixes

    plus misc stuff all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
    ecryptfs: ->f_op is never NULL
    locks: break delegations on any attribute modification
    locks: break delegations on link
    locks: break delegations on rename
    locks: helper functions for delegation breaking
    locks: break delegations on unlink
    namei: minor vfs_unlink cleanup
    locks: implement delegations
    locks: introduce new FL_DELEG lock flag
    vfs: take i_mutex on renamed file
    vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
    vfs: don't use PARENT/CHILD lock classes for non-directories
    vfs: pull ext4's double-i_mutex-locking into common code
    exportfs: fix quadratic behavior in filehandle lookup
    exportfs: better variable name
    exportfs: move most of reconnect_path to helper function
    exportfs: eliminate unused "noprogress" counter
    exportfs: stop retrying once we race with rename/remove
    exportfs: clear DISCONNECTED on all parents sooner
    exportfs: more detailed comment for path_reconnect
    ...

    Linus Torvalds
     

09 Nov, 2013

3 commits

  • NFSv4 uses leases to guarantee that clients can cache metadata as well
    as data.

    Cc: Mikulas Patocka
    Cc: David Howells
    Cc: Tyler Hicks
    Cc: Dustin Kirkland
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Cc: David Howells
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • We need to break delegations on any operation that changes the set of
    links pointing to an inode. Start with unlink.

    Such operations also hold the i_mutex on a parent directory. Breaking a
    delegation may require waiting for a timeout (by default 90 seconds) in
    the case of an unresponsive NFS client. To avoid blocking all directory
    operations, we therefore drop locks before waiting for the delegation.
    The logic then looks like:

    acquire locks
    ...
    test for delegation; if found:
        take reference on inode
        release locks
        wait for delegation break
        drop reference on inode
        retry

    It is possible this could never terminate. (Even if we take precautions
    to prevent another delegation being acquired on the same inode, we could
    get a different inode on each retry.) But this seems very unlikely.

    The initial test for a delegation happens after the lock on the target
    inode is acquired, but the directory inode may have been acquired
    further up the call stack. We therefore add a "struct inode **"
    argument to any intervening functions, which we use to pass the inode
    back up to the caller in the case it needs a delegation synchronously
    broken.
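
    A rough sketch of what a caller ends up looking like under this scheme
    (modelled on the unlink path; names follow the series, but the code is
    simplified - a real caller redoes the lookup on each pass rather than
    reusing the dentry):

    #include <linux/fs.h>
    #include <linux/mutex.h>

    static int unlink_with_deleg_sketch(struct inode *dir, struct dentry *dentry)
    {
            struct inode *delegated_inode = NULL;
            int error;

    retry:
            mutex_lock_nested(&dir->i_mutex, I_MUTEX_PARENT);
            error = vfs_unlink(dir, dentry, &delegated_inode);
            mutex_unlock(&dir->i_mutex);

            if (delegated_inode) {
                    /* vfs_unlink() took a reference and asked us to break the
                     * delegation outside the lock; wait, then retry. */
                    error = break_deleg_wait(&delegated_inode);
                    if (!error)
                            goto retry;
            }
            return error;
    }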

    Cc: David Howells
    Cc: Tyler Hicks
    Cc: Dustin Kirkland
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     

28 Sep, 2013

1 commit

  • Provide the ability to enable and disable fscache cookies. A disabled cookie
    will reject or ignore further requests to:

    Acquire a child cookie
    Invalidate and update backing objects
    Check the consistency of a backing object
    Allocate storage for backing page
    Read backing pages
    Write to backing pages

    but still allows:

    Checks/waits on the completion of already in-progress objects
    Uncaching of pages
    Relinquishment of cookies

    Two new operations are provided:

    (1) Disable a cookie:

    void fscache_disable_cookie(struct fscache_cookie *cookie,
                                bool invalidate);

    If the cookie is not already disabled, this locks the cookie against other
    dis/enablement ops, marks the cookie as being disabled, discards or
    invalidates any backing objects and waits for cessation of activity on any
    associated object.

    This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
    but it reinitialises the cookie such that it can be reenabled.

    All possible failures are handled internally. The caller should consider
    calling fscache_uncache_all_inode_pages() afterwards to make sure all page
    markings are cleared up.

    (2) Enable a cookie:

    void fscache_enable_cookie(struct fscache_cookie *cookie,
                               bool (*can_enable)(void *data),
                               void *data)

    If the cookie is not already enabled, this locks the cookie against other
    dis/enablement ops, invokes can_enable() and, if the cookie is not an
    index cookie, will begin the procedure of acquiring backing objects.

    The optional can_enable() function is passed the data argument and returns
    a ruling as to whether or not enablement should actually be permitted to
    begin.

    All possible failures are handled internally. The cookie will only be
    marked as enabled if provisional backing objects are allocated.
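
    For illustration, a netfs might use the pair roughly like this (the
    surrounding open routine and the i_writecount-based gate are hypothetical):

    #include <linux/fs.h>
    #include <linux/fscache.h>

    /* Hypothetical gate: only enable caching while nobody has the file open
     * for writing. */
    static bool my_netfs_can_enable(void *data)
    {
            struct inode *inode = data;

            return atomic_read(&inode->i_writecount) <= 0;
    }

    static void my_netfs_open(struct inode *inode, struct fscache_cookie *cookie,
                              bool open_for_write)
    {
            if (open_for_write)
                    fscache_disable_cookie(cookie, true);   /* invalidate on disable */
            else
                    fscache_enable_cookie(cookie, my_netfs_can_enable, inode);
    }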

    A later patch will introduce these to NFS. Cookie enablement during nfs_open()
    is then contingent on i_writecount <= 0.

    Signed-off-by: David Howells

    David Howells
     

21 Sep, 2013

2 commits

  • Don't try to dump the index key that distinguishes an object if netfs
    data in the cookie the object refers to has been cleared (ie. the
    cookie has passed most of the way through
    __fscache_relinquish_cookie()).

    Since the netfs holds the index key, we can't get at it once the ->def
    and ->netfs_data pointers have been cleared - and a NULL pointer
    exception will ensue, usually just after a:

    CacheFiles: Error: Unexpected object collision

    error is reported.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • In cachefiles_check_auxdata(), we allocate auxbuf but fail to free it if
    we determine there's an error or that the data is stale.

    Further, assigning the output of vfs_getxattr() to auxbuf->len gives
    problems with checking for errors as auxbuf->len is a u16. We don't
    actually need to set auxbuf->len, so keep the length in a variable for
    now. We shouldn't need to check the upper limit of the buffer as an
    overflow there should be indicated by -ERANGE.

    While we're at it, fscache_check_aux() returns an enum value, not an
    int, so assign it to an appropriately typed variable rather than to ret.
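
    In outline, the corrected pattern looks something like the following sketch
    (buffer size, xattr name and helper name are illustrative, not the actual
    cachefiles code):

    #include <linux/xattr.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    static int check_auxdata_sketch(struct dentry *dentry)
    {
            void *auxbuf;
            ssize_t xlen;           /* keep the signed length, not a u16 */
            int ret;

            auxbuf = kmalloc(512, GFP_KERNEL);
            if (!auxbuf)
                    return -ENOMEM;

            xlen = vfs_getxattr(dentry, "CacheFiles.cache", auxbuf, 512);
            if (xlen < 0) {
                    ret = -ESTALE;  /* covers -ERANGE if the value overflowed */
                    goto out;
            }

            /* ... hand (auxbuf, xlen) to fscache_check_aux(), remembering that
             * it returns an enum fscache_checkaux rather than an int ... */
            ret = 0;
    out:
            kfree(auxbuf);          /* freed on every path, unlike before */
            return ret;
    }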

    Signed-off-by: Josh Boyer
    Signed-off-by: David Howells
    cc: Hongyi Jia
    cc: Milosz Tanski
    Signed-off-by: Linus Torvalds

    Josh Boyer
     


04 Jul, 2013

1 commit

  • Now that the LRU to add a page to is decided at LRU-add time, remove the
    misleading lru parameter from __pagevec_lru_add. A consequence of this
    is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar
    helpers are misleading as the caller no longer has direct control over
    what LRU the page is added to. Unused helpers are removed by this patch
    and existing users of pagevec_lru_add_file() are converted to use
    lru_cache_add_file() directly and use the per-cpu pagevecs instead of
    creating their own pagevec.
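
    For a converted caller, the change is roughly this (sketch; both wrappers
    are hypothetical):

    #include <linux/pagevec.h>
    #include <linux/swap.h>

    /* Before: a private pagevec, flushed to the file LRU by the caller. */
    static void add_readpage_old(struct pagevec *pvec, struct page *page)
    {
            if (!pagevec_add(pvec, page))
                    pagevec_lru_add_file(pvec);     /* helper removed by this patch */
    }

    /* After: the per-cpu pagevecs behind lru_cache_add_file() do the work. */
    static void add_readpage_new(struct page *page)
    {
            lru_cache_add_file(page);
    }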

    Signed-off-by: Mel Gorman
    Reviewed-by: Jan Kara
    Reviewed-by: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

19 Jun, 2013

5 commits

  • Signed-off-by: Haicheng Li
    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    Haicheng Li
     
  • Simplify the way fscache cache objects retain their cookie. The way I
    implemented the cookie storage handling made synchronisation a pain (ie. the
    object state machine can't rely on the cookie actually still being there).

    Instead of the object being detached from the cookie and the cookie being
    freed in __fscache_relinquish_cookie(), we defer both operations:

    (*) The detachment of the object from the list in the cookie now takes place
    in fscache_drop_object() and is thus governed by the object state machine
    (fscache_detach_from_cookie() has been removed).

    (*) The release of the cookie is now in fscache_object_destroy() - which is
    called by the cache backend just before it frees the object.

    This means that the fscache_cookie struct is now available to the cache all the
    way through from ->alloc_object() to ->drop_object() and ->put_object() -
    meaning that it's no longer necessary to take object->lock to guarantee access.

    However, __fscache_relinquish_cookie() doesn't wait for the object to go all
    the way through to destruction before letting the netfs proceed. That would
    massively slow down the netfs. Since __fscache_relinquish_cookie() leaves the
    cookie around, it must therefore break all attachments to the netfs - which
    includes ->def, ->netfs_data and any outstanding page read/writes.

    To handle this, struct fscache_cookie now has an n_active counter:

    (1) This starts off initialised to 1.

    (2) Any time the cache needs to get at the netfs data, it calls
    fscache_use_cookie() to increment it - if it is not zero. If it was zero,
    then access is not permitted.

    (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
    to decrement it. This does a wake-up on it if it reaches 0.

    (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
    reach 0. The initialisation to 1 in step (1) ensures that we only get
    wake ups when we're trying to get rid of the cookie.

    This leaves __fscache_relinquish_cookie() a lot simpler.
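
    A minimal sketch of that counter pattern (the real fscache_use_cookie() and
    fscache_unuse_cookie() helpers also deal with flags and queueing):

    #include <linux/fscache.h>
    #include <linux/atomic.h>
    #include <linux/wait.h>

    /* (2) Only grant access to the netfs data while n_active is non-zero. */
    static bool use_cookie_sketch(struct fscache_cookie *cookie)
    {
            return atomic_inc_not_zero(&cookie->n_active) != 0;
    }

    /* (3) Drop a use; the relinquisher waiting for zero gets woken. */
    static void unuse_cookie_sketch(struct fscache_cookie *cookie)
    {
            if (atomic_dec_and_test(&cookie->n_active))
                    wake_up_atomic_t(&cookie->n_active);
    }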

    ***
    This fixes a problem in the current code whereby if fscache_invalidate() is
    followed sufficiently quickly by fscache_relinquish_cookie() then it is
    possible for __fscache_relinquish_cookie() to have detached the cookie from the
    object and cleared the pointer before a thread is dispatched to process the
    invalidation state in the object state machine.

    Since the pending write clearance was deferred to the invalidation state to
    make it asynchronous, we need to either wait in relinquishment for the stores
    tree to be cleared in the invalidation state or we need to handle the clearance
    in relinquishment.

    Further, if the relinquishment code does clear the tree, then the invalidation
    state needs to make the clearance contingent on still having the cookie to hand
    (since that's where the tree is rooted) and we have to prevent the cookie from
    disappearing for the duration.

    This can lead to an oops like the following:

    BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
    ...
    RIP: 0010:[] _spin_lock+0xe/0x30
    ...
    CR2: 000000000000000c ...
    ...
    Process kslowd002 (...)
    ....
    Call Trace:
    [] fscache_invalidate_writes+0x38/0xd0 [fscache]
    [] ? __switch_to+0xd0/0x320
    [] ? find_busiest_queue+0x69/0x150
    [] ? slow_work_enqueue+0x104/0x180
    [] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
    [] ? bit_waitqueue+0x17/0xd0
    [] slow_work_execute+0x233/0x310
    [] slow_work_thread+0x205/0x360
    [] ? autoremove_wake_function+0x0/0x40
    [] ? slow_work_thread+0x0/0x360
    [] kthread+0x96/0xa0
    [] child_rip+0xa/0x20
    [] ? kthread+0x0/0xa0
    [] ? child_rip+0x0/0x20

    The parameter to fscache_invalidate_writes() was object->cookie which is NULL.

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Fix object state machine to have separate work and wait states as that makes
    it easier to envision.

    There are now three kinds of state:

    (1) Work state. This is an execution state. No event processing is performed
    by a work state. The function attached to a work state returns a pointer
    indicating the next state to which the OSM should transition. Returning
    NO_TRANSIT repeats the current state, but goes back to the scheduler
    first.

    (2) Wait state. This is an event processing state. No execution is
    performed by a wait state. Wait states are just tables of "if event X
    occurs, clear it and transition to state Y". The dispatcher returns to
    the scheduler if none of the events in which the wait state has an
    interest are currently pending.

    (3) Out-of-band state. This is a special work state. Transitions to normal
    states can be overridden when an unexpected event occurs (eg. I/O error).
    Instead the dispatcher disables and clears the OOB event and transits to
    the specified work state. This then acts as an ordinary work state,
    though object->state points to the overridden destination. Returning
    NO_TRANSIT resumes the overridden transition.
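
    Schematically, the three kinds of state can be pictured like this (a
    simplified sketch, not the actual fscache definitions; all names below are
    illustrative):

    struct my_object;
    struct osm_state;

    typedef const struct osm_state *(*osm_work_func_t)(struct my_object *object,
                                                        int event);

    struct osm_state {
            const char              *name;          /* states carry their own names  */
            osm_work_func_t         work;           /* NULL => this is a wait state  */
            struct osm_transition {
                    unsigned long           events; /* "if event X occurs ..."       */
                    const struct osm_state  *to;    /* "... transition to state Y"   */
            } transits[4];                          /* wait-state transition table   */
    };

    /* A work-state function returns the next state; returning NO_TRANSIT repeats
     * the current state after going back through the scheduler. */
    #define NO_TRANSIT ((const struct osm_state *)NULL)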

    In addition, the states have names in their definitions, so there's no need for
    tables of state names. Further, the EV_REQUEUE event is no longer necessary as
    that is automatic for work states.

    Since the states are now separate structs rather than values in an enum, it's
    not possible to use comparisons other than (non-)equality between them, so use
    some object->flags to indicate what phase an object is in.

    The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
    (EV_KILL). An object flag now carries the information about retirement.

    Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
    into a KILL_OBJECT state and additional states have been added for handling
    waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).

    A state has also been added for synchronising with parent object initialisation
    (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Wrap checks on object state (mostly outside of fs/fscache/object.c) with
    inline functions so that the mechanism can be replaced.

    Some of the state checks within object.c are left as-is as they will be
    replaced.

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Just some cleanup.

    (And note the caller of this function may, for example, call vfs_unlink
    on a child, so the "1" (I_MUTEX_PARENT) really was what was intended
    here.)

    Signed-off-by: J. Bruce Fields
    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    J. Bruce Fields
     


21 Dec, 2012

5 commits

  • Mark as cancelled an operation that is in progress rather than pending at the
    time it is cancelled, and call fscache_complete_op() to cancel an operation so
    that blocked ops can be started.

    Signed-off-by: David Howells

    David Howells
     
  • Don't mask off the object event mask when printing it. That way it can be seen
    if there are bits set that shouldn't be.

    Signed-off-by: David Howells

    David Howells
     
  • CacheFiles is missing some calls to fscache_retrieval_complete() in the error
    handling/collision paths of its reader functions.

    This can be seen by the following assertion tripping in fscache_put_operation()
    whereby the operation being destroyed is still in the in-progress state and has
    not been cancelled or completed:

    FS-Cache: Assertion failed
    3 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/operation.c:408!
    invalid opcode: 0000 [#1] SMP
    CPU 2
    Modules linked in: xfs ioatdma dca loop joydev evdev
    psmouse dcdbas pcspkr serio_raw i5000_edac edac_core i5k_amb shpchp
    pci_hotplug sg sr_mod]

    Pid: 8062, comm: httpd Not tainted 3.1.0-rc8 #1 Dell Inc. PowerEdge 1950/0DT097
    RIP: 0010:[] [] fscache_put_operation+0x304/0x330
    RSP: 0018:ffff880062f739d8 EFLAGS: 00010296
    RAX: 0000000000000025 RBX: ffff8800c5122e84 RCX: ffffffff81ddf040
    RDX: 00000000ffffffff RSI: 0000000000000082 RDI: ffffffff81ddef30
    RBP: ffff880062f739f8 R08: 0000000000000005 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000003 R12: ffff8800c5122e40
    R13: ffff880037a2cd20 R14: ffff880087c7a058 R15: ffff880087c7a000
    FS: 00007f63dcf636e0(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f0c0a91f000 CR3: 0000000062ec2000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process httpd (pid: 8062, threadinfo ffff880062f72000, task ffff880087e58000)
    Stack:
    ffff880062f73bf8 0000000000000000 ffff880062f73bf8 ffff880037a2cd20
    ffff880062f73a68 ffffffff8119aa7e ffff88006540e000 ffff880062f73ad4
    ffff88008e9a4308 ffff880037a2cd20 ffff880062f73a48 ffff8800c5122e40
    Call Trace:
    [] __fscache_read_or_alloc_pages+0x1fe/0x530
    [] __nfs_readpages_from_fscache+0x70/0x1c0
    [] nfs_readpages+0xca/0x1e0
    [] ? rpc_do_put_task+0x36/0x50
    [] ? alloc_nfs_open_context+0x4b/0x110
    [] ? rpc_call_sync+0x5a/0x70
    [] __do_page_cache_readahead+0x1ca/0x270
    [] ra_submit+0x21/0x30
    [] ondemand_readahead+0x11d/0x250
    [] page_cache_sync_readahead+0x36/0x60
    [] generic_file_aio_read+0x454/0x770
    [] nfs_file_read+0xe1/0x130
    [] do_sync_read+0xd9/0x120
    [] ? mntput+0x1f/0x40
    [] ? fput+0x1cb/0x260
    [] vfs_read+0xc8/0x180
    [] sys_read+0x55/0x90

    Reported-by: Mark Moseley
    Signed-off-by: David Howells

    David Howells
     
  • Implement invalidation for CacheFiles. This is in two parts:

    (1) Provide an invalidation method (which just truncates the backing file).

    (2) Abort attempts to copy anything read from the backing file whilst
    invalidation is in progress.

    Question: CacheFiles uses truncation in a couple of places. It has been using
    notify_change() rather than sys_truncate() or something similar. This means
    it bypasses a bunch of checks and suchlike that it possibly should be making
    (security, file locking, lease breaking, vfsmount write). Should it be using
    vfs_truncate() as added by a preceding patch or should it use notify_change()
    and assume that anyone poking around in the cache files on disk gets
    everything they deserve?

    Signed-off-by: David Howells

    David Howells
     
  • Fix the state management of internal fscache operations and the accounting of
    what operations are in what states.

    This is done by:

    (1) Give struct fscache_operation an enum variable that directly represents the
    state it's currently in, rather than spreading this knowledge over a bunch
    of flags, who's processing the operation at the moment and whether it is
    queued or not.

    This makes it easier to write assertions to check the state at various
    points and to prevent invalid state transitions.

    (2) Add an 'operation complete' state and supply a function to indicate the
    completion of an operation (fscache_op_complete()) and make things call
    it. The final call to fscache_put_operation() can then check that an op is
    in the appropriate state (complete or cancelled).

    (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
    govern the state of an object:

    (a) The ->n_ops is now the number of extant operations on the object
    and is now decremented by fscache_put_operation() only.

    (b) The ->n_in_progress is simply the number of ops that have been
    taken off of the object's pending queue for the purposes of being
    run. This is decremented by fscache_op_complete() only.

    (c) The ->n_exclusive is the number of exclusive ops that have been
    submitted and queued or are in progress. It is decremented by
    fscache_op_complete() and by fscache_cancel_op().

    fscache_put_operation() and fscache_operation_gc() now no longer try to
    clean up ->n_exclusive and ->n_in_progress. That was leading to double
    decrements against fscache_cancel_op().

    fscache_cancel_op() now no longer decrements ->n_ops. That was leading to
    double decrements against fscache_put_operation().

    fscache_submit_exclusive_op() now decides whether it has to queue an op
    based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
    will persist in being true even after all preceding operations have been
    cancelled or completed. Furthermore, if an object is active and there are
    runnable ops against it, there must be at least one op running.

    (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
    provide a function to record completion of the pages as they complete.

    When n_pages reaches 0, the operation is deemed to be complete and
    fscache_op_complete() is called.

    Add calls to fscache_retrieval_complete() anywhere we've finished with a
    page we've been given to read or allocate for. This includes places where
    we just return pages to the netfs for reading from the server and where
    accessing the cache fails and we discard the proposed netfs page.
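
    The remaining-pages accounting in (4) then looks roughly like this (a toy
    sketch; in fscache the counter lives in struct fscache_retrieval and
    completion goes through fscache_op_complete()):

    #include <linux/atomic.h>

    struct retrieval_sketch {
            atomic_t n_pages;       /* pages still to be read, allocated or discarded */
    };

    static void retrieval_pages_done(struct retrieval_sketch *op, int n_pages)
    {
            atomic_sub(n_pages, &op->n_pages);
            if (atomic_read(&op->n_pages) <= 0) {
                    /* all pages accounted for: this is the point at which the
                     * whole operation is marked complete */
            }
    }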

    The bugs in the unfixed state management manifest themselves as oopses like the
    following where the operation completion gets out of sync with return of the
    cookie by the netfs. This is possible because the cache unlocks and returns
    all the netfs pages before recording its completion - which means that there's
    nothing to stop the netfs discarding them and returning the cookie.

    FS-Cache: Cookie 'NFS.fh' still has outstanding reads
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/cookie.c:519!
    invalid opcode: 0000 [#1] SMP
    CPU 1
    Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

    Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY
    RIP: 0010:[] [] __fscache_relinquish_cookie+0x170/0x343 [fscache]
    RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282
    RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
    RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
    RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
    R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
    R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
    FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
    Stack:
    ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
    ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
    ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
    Call Trace:
    [] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
    [] nfs_clear_inode+0x3c/0x41 [nfs]
    [] nfs4_evict_inode+0x2f/0x33 [nfs]
    [] evict+0xa1/0x15c
    [] dispose_list+0x2c/0x38
    [] prune_icache_sb+0x28c/0x29b
    [] prune_super+0xd5/0x140
    [] shrink_slab+0x102/0x1ab
    [] balance_pgdat+0x2f2/0x595
    [] ? process_timeout+0xb/0xb
    [] kswapd+0x270/0x289
    [] ? __init_waitqueue_head+0x46/0x46
    [] ? balance_pgdat+0x595/0x595
    [] kthread+0x7f/0x87
    [] kernel_thread_helper+0x4/0x10
    [] ? finish_task_switch+0x45/0xc0
    [] ? retint_restore_args+0xe/0xe
    [] ? __init_kthread_worker+0x53/0x53
    [] ? gs_change+0xb/0xb

    Signed-off-by: David Howells

    David Howells