30 May, 2018

1 commit

  • [ Upstream commit 2c98425720233ae3e135add0c7e869b32913502f ]

    If the fscache asynchronous write operation elects to discard a page that's
    pending storage to the cache because the page would be over the store limit,
    then it needs to wake the page, as someone may be waiting on completion of
    the write.

    The problem is that the store limit may be updated by a different
    asynchronous operation - and so may miss the write - and that the store
    limit may not even get updated until later by the netfs.

    Fix the kernel hang by making fscache_write_op() mark as written any pages
    that are over the limit.
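    A minimal sketch of the resulting flow (simplified; locking, statistics
    and tracing are omitted, and the shape is inferred from the message rather
    than the diff):

    /* in fscache_write_op(), after a pending page has been looked up: */
    if (page->index >= fscache_op->store_limit) {
            /* Cannot store the page, but someone may be sleeping in
             * fscache_wait_on_page_write(), so mark the page written
             * to wake them rather than skipping it silently. */
            fscache_end_page_write(object, page);
            goto again;
    }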

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

07 Sep, 2017

2 commits

    All users of pagevec_lookup() and pagevec_lookup_range() now pass
    PAGEVEC_SIZE as the desired number of pages.

    Just drop the argument.
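    The resulting prototype looks roughly like this (a sketch; the pointer
    form of 'start' comes from the sibling patch described in the next entry):

    /* before: every caller passed PAGEVEC_SIZE here */
    unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
                            pgoff_t *start, unsigned nr_pages);

    /* after: PAGEVEC_SIZE is implied */
    unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
                            pgoff_t *start);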

    Link: http://lkml.kernel.org/r/20170726114704.7626-11-jack@suse.cz
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
    Make pagevec_lookup() (and the underlying find_get_pages()) update the
    index to the next page at which iteration should continue. Most callers
    want this, and pagevec_lookup_tag() already behaves this way.
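    A sketch of the caller pattern this enables (hypothetical caller; the
    nr_pages argument is already gone per the entry above):

    pgoff_t index = 0;
    struct pagevec pvec;

    pagevec_init(&pvec, 0);
    while (pagevec_lookup(&pvec, mapping, &index)) {
            /* process pvec.pages[0 .. pagevec_count(&pvec) - 1];
             * 'index' now points past the last page found */
            pagevec_release(&pvec);
    }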

    Link: http://lkml.kernel.org/r/20170726114704.7626-3-jack@suse.cz
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

05 Apr, 2016

1 commit

    The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
    time ago with the promise that one day it would be possible to implement
    the page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether the
    PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in the page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
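    For illustration, a typical conversion looks like this (hypothetical
    snippet, not taken from the patch itself):

    /* before */
    offset = pos & (PAGE_CACHE_SIZE - 1);
    page_cache_get(page);
    page_cache_release(page);

    /* after */
    offset = pos & (PAGE_SIZE - 1);
    get_page(page);
    put_page(page);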

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is reverting the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

11 Nov, 2015

1 commit

  • Handle a write being requested to the page immediately beyond the EOF
    marker on a cache object. Currently this gets an assertion failure in
    CacheFiles because the EOF marker is used there to encode information about
    a partial page at the EOF - which could lead to an unknown blank spot in
    the file if we extend the file over it.

    The problem is actually in fscache where we check the index of the page
    being written against store_limit. store_limit is set to the number of
    pages that we're allowed to store by fscache_set_store_limit() - which
    means it's one more than the index of the last page we're allowed to store.
    The problem is that we permit writing to a page with an index _equal_ to
    the store limit - when we should reject that case.

    Whilst we're at it, change the triggered assertion in CacheFiles to just
    return -ENOBUFS instead.
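    The crux of the fix is the comparison operator (a sketch; the surrounding
    code is omitted and the label name is illustrative):

    /* store_limit is one past the last storable page index, so an index
     * equal to it must be rejected as well */
    if (page->index >= object->store_limit)         /* was: > */
            goto superseded;                        /* -ENOBUFS, no assert */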

    The assertion failure looks something like this:

    CacheFiles: Assertion failed
    1000 < 7b1 is false
    ------------[ cut here ]------------
    kernel BUG at fs/cachefiles/rdwr.c:962!
    ...
    RIP: 0010:[] [] cachefiles_write_page+0x273/0x2d0 [cachefiles]

    Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754f (at least)
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

07 Nov, 2015

1 commit

  • mm, page_alloc: distinguish between being unable to sleep, unwilling to
    sleep and avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min", which can be
    referred to as the "atomic reserve". __GFP_HIGH users get access to the
    first lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT, leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of call sites:

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking whether they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible (see the sketch after
    this list). This is because checking for __GFP_WAIT as was done
    historically can now trigger false positives. Some exceptions like
    dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM
    is used instead of the helper due to flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.
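    A sketch of the helper and the caller pattern mentioned above (the
    helper's shape matches my understanding of include/linux/gfp.h; the two
    alloc functions are hypothetical):

    static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
    {
            return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
    }

    /* replacing the historical 'if (gfp_mask & __GFP_WAIT)' test: */
    if (gfpflags_allow_blocking(gfp_mask))
            do_blocking_alloc();    /* may sleep and enter direct reclaim */
    else
            do_atomic_alloc();      /* must not block */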

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

02 Apr, 2015

5 commits

  • Now that the retrieval operation may be disposed of by fscache_put_operation()
    before we actually set the context, the retrieval-specific cleanup operation
    can produce a NULL-pointer dereference when it tries to unconditionally clean
    up the netfs context.

    Given that it is expected that we'll get at least as far as the place where we
    currently set the context pointer and it is unlikely we'll go through the
    error handling paths prior to that point, retain the context right from the
    point that the retrieval op is allocated.

    Concomitant to this, we need to retain the cookie pointer in the retrieval op
    also so that we can call the netfs to release its context in the release
    method.

    In addition, we might now get into fscache_release_retrieval_op() with the op
    only initialised. To this end, set the operation to DEAD only after the
    release method has been called and skip the n_pages test upon cleanup if the
    op is still in the INITIALISED state.

    Without these changes, the following oops might be seen:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
    ...
    RIP: 0010:[] fscache_release_retrieval_op+0xae/0x100
    ...
    Call Trace:
    [] fscache_put_operation+0x117/0x2e0
    [] __fscache_read_or_alloc_pages+0x351/0x3ac
    [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
    [] nfs_readpages+0x10c/0x185 [nfs]
    [] ? alloc_pages_current+0x119/0x13e
    [] ? __page_cache_alloc+0xfb/0x10a
    [] __do_page_cache_readahead+0x188/0x22c
    [] ondemand_readahead+0x29e/0x2af
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_read_iter+0x1a2/0x55a
    [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
    [] nfs_file_read+0x49/0x70 [nfs]
    [] new_sync_read+0x78/0x9c
    [] __vfs_read+0x13/0x38
    [] vfs_read+0x95/0x121
    [] SyS_read+0x4c/0x8a
    [] system_call_fastpath+0x12/0x17

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Any time an incomplete operation is cancelled, the operation cancellation
    function needs to be called to clean up. This is currently being passed
    directly to some of the functions that might want to call it, but not all.

    Instead, pass the cancellation method pointer to the fscache_operation_init()
    and have that cache it in the operation struct. Further, plug in a dummy
    cancellation handler if the caller declines to set one as this allows us to
    call the function unconditionally (the extra overhead isn't worth bothering
    about as we don't expect to be calling this typically).

    The cancellation method must thence be called everywhere the CANCELLED state
    is set. Note that we call it *before* setting the CANCELLED state such that
    the method can use the old state value to guide its operation.

    fscache_do_cancel_retrieval() needs moving higher up in the sources so that
    the init function can use it now.
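    After this change, initialisation plausibly looks like this (a sketch;
    the exact parameter order is an assumption):

    fscache_operation_init(&op->op, fscache_retrieval_work,
                           fscache_do_cancel_retrieval,     /* cancel */
                           fscache_release_retrieval_op);   /* release */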

    Without this, the following oops may be seen:

    FS-Cache: Assertion failed
    FS-Cache: 3 == 0 is false
    ------------[ cut here ]------------
    kernel BUG at ../fs/fscache/page.c:261!
    ...
    RIP: 0010:[] fscache_release_retrieval_op+0x77/0x100
    [] fscache_put_operation+0x114/0x2da
    [] __fscache_read_or_alloc_pages+0x358/0x3b3
    [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
    [] nfs_readpages+0x10c/0x185 [nfs]
    [] ? alloc_pages_current+0x119/0x13e
    [] ? __page_cache_alloc+0xfb/0x10a
    [] __do_page_cache_readahead+0x188/0x22c
    [] ondemand_readahead+0x29e/0x2af
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_read_iter+0x1a2/0x55a
    [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
    [] nfs_file_read+0x49/0x70 [nfs]
    [] new_sync_read+0x78/0x9c
    [] __vfs_read+0x13/0x38
    [] vfs_read+0x95/0x121
    [] SyS_read+0x4c/0x8a
    [] system_call_fastpath+0x12/0x17

    The assertion is showing that the remaining number of pages (n_pages) is not 0
    when the operation is being released.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Call fscache_put_operation() or a wrapper on any op that has gone through
    fscache_operation_init() so that the accounting shown in /proc is done
    correctly, specifically fscache_n_op_release.

    fscache_put_operation() therefore now allows an op in the INITIALISED state as
    well as in the CANCELLED and COMPLETE states.

    Note that this means that an operation can get put that doesn't have its
    ->object pointer filled in, so anything that depends on the object needs to be
    conditional in fscache_put_operation().

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Currently, fscache_cancel_op() only cancels pending operations - attempts to
    cancel in-progress operations are ignored. This leads to a problem in
    fscache_wait_for_operation_activation() whereby the wait is terminated, but
    the object has been killed.

    The check at the end of the function now triggers because it's no longer
    contingent on the cache having produced an I/O error since the commit that
    fixed the logic error in fscache_object_is_dead().

    The result of the check is that it tries to cancel the operation - but
    since the operation may no longer be pending by this point, the
    cancellation request may be ignored - with the result that the object is
    just put by the caller and fscache_put_operation() has an assertion
    failure because the operation isn't in either the COMPLETE or the
    CANCELLED states.

    To fix this, we permit in-progress ops to be cancelled under some
    circumstances.

    The bug results in an oops that looks something like this:

    FS-Cache: fscache_wait_for_operation_activation() = -ENOBUFS [obj dead 3]
    FS-Cache:
    FS-Cache: Assertion failed
    FS-Cache: 3 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at ../fs/fscache/operation.c:432!
    ...
    RIP: 0010:[] fscache_put_operation+0xf2/0x2cd
    Call Trace:
    [] __fscache_read_or_alloc_pages+0x2ec/0x3b3
    [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
    [] nfs_readpages+0x10c/0x185 [nfs]
    [] ? alloc_pages_current+0x119/0x13e
    [] ? __page_cache_alloc+0xfb/0x10a
    [] __do_page_cache_readahead+0x188/0x22c
    [] ondemand_readahead+0x29e/0x2af
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_read_iter+0x1a2/0x55a
    [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
    [] nfs_file_read+0x49/0x70 [nfs]
    [] new_sync_read+0x78/0x9c
    [] __vfs_read+0x13/0x38
    [] vfs_read+0x95/0x121
    [] SyS_read+0x4c/0x8a
    [] system_call_fastpath+0x12/0x17

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • fscache_object_is_dead() returns true only if the object is marked dead and
    the cache got an I/O error. This should be a logical OR instead. Since two
    of the callers got split up into handling for separate subcases, expand the
    other callers and kill the function. This is probably the right thing to do
    anyway since one of the subcases isn't about the object at all, but rather
    about the cache.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     

18 Sep, 2014

1 commit

    In rare cases under heavy VMA pressure the ref count for an fscache cookie
    becomes corrupt: we decrement the ref count even if we failed before
    incrementing it.

    FS-Cache: Assertion failed bnode-eca5f9c6/syslog
    0 > 0 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/cookie.c:519!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
    [] __fscache_relinquish_cookie+0x50/0x220 [fscache]
    [] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
    [] ceph_destroy_inode+0x33/0x200 [ceph]
    [] ? __fsnotify_inode_delete+0xe/0x10
    [] destroy_inode+0x3c/0x70
    [] evict+0x111/0x180
    [] iput+0x103/0x190
    [] __dentry_kill+0x1c8/0x220
    [] shrink_dentry_list+0xf1/0x250
    [] prune_dcache_sb+0x4c/0x60
    [] super_cache_scan+0xff/0x170
    [] shrink_slab_node+0x140/0x2c0
    [] shrink_slab+0x8a/0x130
    [] balance_pgdat+0x3e2/0x5d0
    [] kswapd+0x16a/0x4a0
    [] ? __wake_up_sync+0x20/0x20
    [] ? balance_pgdat+0x5d0/0x5d0
    [] kthread+0xc9/0xe0
    [] ? ftrace_raw_event_xen_mmu_release_ptpage+0x70/0x90
    [] ? flush_kthread_worker+0xb0/0xb0
    [] ret_from_fork+0x7c/0xb0
    [] ? flush_kthread_worker+0xb0/0xb0
    RIP [] __fscache_disable_cookie+0x1db/0x210 [fscache]
    RSP
    ---[ end trace 254d0d7c74a01f25 ]---

    Signed-off-by: Milosz Tanski
    Signed-off-by: David Howells

    Milosz Tanski
     

27 Aug, 2014

1 commit

    This is meant to avoid a recursive hang caused by the underlying
    filesystem trying to grab a free page and causing a write-out.

    INFO: task kworker/u30:7:28375 blocked for more than 120 seconds.
    Not tainted 3.15.0-virtual #74
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u30:7 D 0000000000000000 0 28375 2 0x00000000
    Workqueue: fscache_operation fscache_op_work_func [fscache]
    ffff88000b147148 0000000000000046 0000000000000000 ffff88000b1471c8
    ffff8807aa031820 0000000000014040 ffff88000b147fd8 0000000000014040
    ffff880f0c50c860 ffff8807aa031820 ffff88000b147158 ffff88007be59cd0
    Call Trace:
    [] schedule+0x29/0x70
    [] __fscache_wait_on_page_write+0x55/0x90 [fscache]
    [] ? __wake_up_sync+0x20/0x20
    [] __fscache_maybe_release_page+0x65/0x1e0 [fscache]
    [] ceph_releasepage+0x83/0x100 [ceph]
    [] ? anon_vma_fork+0x130/0x130
    [] try_to_release_page+0x32/0x50
    [] shrink_page_list+0x7e6/0x9d0
    [] ? isolate_lru_pages.isra.73+0x78/0x1e0
    [] shrink_inactive_list+0x252/0x4c0
    [] shrink_lruvec+0x3e1/0x670
    [] shrink_zone+0x3f/0x110
    [] do_try_to_free_pages+0x1d6/0x450
    [] ? zone_statistics+0x99/0xc0
    [] try_to_free_pages+0xc4/0x180
    [] __alloc_pages_nodemask+0x6b2/0xa60
    [] ? __find_get_block+0xbe/0x250
    [] ? wake_up_bit+0x2e/0x40
    [] alloc_pages_current+0xb3/0x180
    [] __page_cache_alloc+0xb7/0xd0
    [] grab_cache_page_write_begin+0x7c/0xe0
    [] ? ext4_mark_inode_dirty+0x82/0x220
    [] ext4_da_write_begin+0x89/0x2d0
    [] generic_perform_write+0xbe/0x1d0
    [] ? update_time+0x81/0xc0
    [] ? mnt_clone_write+0x12/0x30
    [] __generic_file_aio_write+0x1ce/0x3f0
    [] generic_file_aio_write+0x5e/0xe0
    [] ext4_file_write+0x9f/0x410
    [] ? ext4_file_open+0x66/0x180
    [] do_sync_write+0x5a/0x90
    [] cachefiles_write_page+0x149/0x430 [cachefiles]
    [] ? radix_tree_gang_lookup_tag+0x89/0xd0
    [] fscache_write_op+0x222/0x3b0 [fscache]
    [] fscache_op_work_func+0x3a/0x100 [fscache]
    [] process_one_work+0x179/0x4a0
    [] worker_thread+0x11b/0x370
    [] ? manage_workers.isra.21+0x2e0/0x2e0
    [] kthread+0xc9/0xe0
    [] ? ftrace_raw_event_xen_mmu_release_ptpage+0x70/0x90
    [] ? flush_kthread_worker+0xb0/0xb0
    [] ret_from_fork+0x7c/0xb0
    [] ? flush_kthread_worker+0xb0/0xb0

    Signed-off-by: Milosz Tanski
    Signed-off-by: David Howells

    Milosz Tanski
     

16 Jul, 2014

1 commit

  • The current "wait_on_bit" interface requires an 'action'
    function to be provided which does the actual waiting.
    There are over 20 such functions, many of them identical.
    Most cases can be satisfied by one of just two functions, one
    which uses io_schedule() and one which just uses schedule().

    So:
    Rename wait_on_bit and wait_on_bit_lock to
    wait_on_bit_action and wait_on_bit_lock_action
    to make it explicit that they need an action function.

    Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
    which are *not* given an action function but implicitly use
    a standard one.
    The decision to error-out if a signal is pending is now made
    based on the 'mode' argument rather than being encoded in the action
    function.

    All instances of the old wait_on_bit and wait_on_bit_lock which
    can use the new version have been changed accordingly and their
    action functions have been discarded.
    wait_on_bit{_lock} does not return any specific error code in the
    event of a signal so the caller must check for non-zero and
    interpolate their own error code as appropriate.
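    A sketch of a typical conversion (the bit name and the old action
    function are illustrative):

    /* before: the caller supplied the action function */
    err = wait_on_bit(&flags, MY_BIT, my_wait_action, TASK_UNINTERRUPTIBLE);

    /* after: standard schedule()-based wait ... */
    err = wait_on_bit(&flags, MY_BIT, TASK_UNINTERRUPTIBLE);

    /* ... or io_schedule()-based wait for I/O completion bits */
    err = wait_on_bit_io(&flags, MY_BIT, TASK_UNINTERRUPTIBLE);

    /* signals now depend on the mode; the return is merely non-zero */
    if (wait_on_bit(&flags, MY_BIT, TASK_INTERRUPTIBLE))
            return -ERESTARTSYS;    /* caller picks its own error code */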

    The wait_on_bit() call in __fscache_wait_on_invalidate() was
    ambiguous as it specified TASK_UNINTERRUPTIBLE but used
    fscache_wait_bit_interruptible as an action function.
    David Howells confirms this should be uniformly
    "uninterruptible"

    The main remaining user of wait_on_bit{,_lock}_action is NFS
    which needs to use a freezer-aware schedule() call.

    A comment in fs/gfs2/glock.c notes that having multiple 'action'
    functions is useful as they display differently in the 'wchan'
    field of 'ps' (and /proc/$PID/wchan).
    As the new bit_wait{,_io} functions are tagged "__sched", they
    will not show up at all, but something higher in the stack will. So
    the distinction will still be visible, only with different
    function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
    gfs2/glock.c case).

    Since the first version of this patch (against 3.15), two new action
    functions have appeared, one in NFS and one in CIFS. CIFS also now
    uses an action function that makes the same freezer-aware
    schedule call as NFS.

    Signed-off-by: NeilBrown
    Acked-by: David Howells (fscache, keys)
    Acked-by: Steven Whitehouse (gfs2)
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steve French
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
    Signed-off-by: Ingo Molnar

    NeilBrown
     

28 Sep, 2013

2 commits

  • Provide the ability to enable and disable fscache cookies. A disabled cookie
    will reject or ignore further requests to:

    Acquire a child cookie
    Invalidate and update backing objects
    Check the consistency of a backing object
    Allocate storage for backing page
    Read backing pages
    Write to backing pages

    but still allows:

    Checks/waits on the completion of already in-progress objects
    Uncaching of pages
    Relinquishment of cookies

    Two new operations are provided:

    (1) Disable a cookie:

    void fscache_disable_cookie(struct fscache_cookie *cookie,
    bool invalidate);

    If the cookie is not already disabled, this locks the cookie against other
    dis/enablement ops, marks the cookie as being disabled, discards or
    invalidates any backing objects and waits for cessation of activity on any
    associated object.

    This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
    but it reinitialises the cookie such that it can be reenabled.

    All possible failures are handled internally. The caller should consider
    calling fscache_uncache_all_inode_pages() afterwards to make sure all page
    markings are cleared up.

    (2) Enable a cookie:

    void fscache_enable_cookie(struct fscache_cookie *cookie,
    bool (*can_enable)(void *data),
    void *data)

    If the cookie is not already enabled, this locks the cookie against other
    dis/enablement ops, invokes can_enable() and, if the cookie is not an
    index cookie, will begin the procedure of acquiring backing objects.

    The optional can_enable() function is passed the data argument and returns
    a ruling as to whether or not enablement should actually be permitted to
    begin.

    All possible failures are handled internally. The cookie will only be
    marked as enabled if provisional backing objects are allocated.

    A later patch will introduce these to NFS. Cookie enablement during
    nfs_open() is then contingent on i_writecount <= 0.

    Signed-off-by: David Howells
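    A sketch of that contingency from the netfs side (hypothetical code; the
    real NFS patch may differ):

    static bool nfs_fscache_can_enable(void *data)
    {
            struct inode *inode = data;

            /* only enable caching while there are no writers */
            return atomic_read(&inode->i_writecount) <= 0;
    }

    fscache_enable_cookie(cookie, nfs_fscache_can_enable, inode);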

    David Howells
     
  • Add wrapper functions for dealing with cookie->n_active:

    (*) __fscache_use_cookie() to increment it.

    (*) __fscache_unuse_cookie() to decrement and test against zero.

    (*) __fscache_wake_unused_cookie() to wake up anyone waiting for it to reach
    zero.

    The second and third are split so that the third can be done after cookie->lock
    has been released in case the waiter wakes up whilst we're still holding it and
    tries to get it.
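    The intended calling pattern, roughly (the atomic_* notes are assumptions
    about the implementation):

    __fscache_use_cookie(cookie);           /* atomic_inc(&cookie->n_active) */
    ...
    if (__fscache_unuse_cookie(cookie)) {   /* atomic_dec_and_test() */
            /* drop cookie->lock first, then: */
            __fscache_wake_unused_cookie(cookie);
    }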

    We will need to wake-on-zero once the cookie disablement patch is applied
    because it will then be possible to see n_active become zero without the cookie
    being relinquished.

    Also move the cookie 'use' accounting out of fscache_attr_changed_op() and
    into fscache_attr_changed() and the operation struct so that cookie
    disablement will be able to track it.

    Whilst we're at it, only increment n_active if we're about to do
    fscache_submit_op() so that we don't have to deal with undoing it if anything
    earlier fails. Possibly this should be moved into fscache_submit_op() which
    could look at FSCACHE_OP_UNUSE_COOKIE.

    Signed-off-by: David Howells

    David Howells
     

12 Sep, 2013

1 commit

  • With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
    one such possible user), the following race can happen:

    radix_tree_preload()
    ...
    radix_tree_insert()
      radix_tree_node_alloc()
        if (rtp->nr) {
          ret = rtp->nodes[rtp->nr - 1];
          <interrupt>
          ...
          radix_tree_preload()
          ...
          radix_tree_insert()
            radix_tree_node_alloc()
              if (rtp->nr) {
                ret = rtp->nodes[rtp->nr - 1];

    And we give out one radix tree node twice. That clearly results in radix
    tree corruption with different results (usually OOPS) depending on which
    two users of radix tree race.

    We fix the problem by making radix_tree_node_alloc() always allocate fresh
    radix tree nodes when in interrupt. Using preloading when in interrupt
    doesn't make sense since all the allocations have to be atomic anyway and
    we cannot steal nodes from process-context users because some users rely
    on radix_tree_insert() succeeding after radix_tree_preload().
    The in_interrupt() check is somewhat ugly but we cannot simply key off the
    passed gfp_mask as that is acquired from root_gfp_mask() and is thus the
    same for all preload users.

    Another part of the fix is to avoid node preallocation in
    radix_tree_preload() when the passed gfp_mask doesn't allow waiting.
    Again, preallocation in such a case doesn't make sense, and when
    preallocation would happen in interrupt we could possibly leak some
    allocated nodes. However, some users of radix_tree_preload() require the
    following radix_tree_insert() to succeed. To avoid unexpected effects for
    these users, radix_tree_preload() only warns if the passed gfp mask
    doesn't allow waiting, and we provide a new function
    radix_tree_maybe_preload() for those users which get different gfp masks
    from different call sites and which are prepared to handle
    radix_tree_insert() failure.
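    A sketch of a caller that is prepared for insertion failure (illustrative
    wrapper, not taken from the patch):

    static int add_item(struct radix_tree_root *root, unsigned long index,
                        void *item, gfp_t gfp)
    {
            int err;

            /* preloads only if gfp allows waiting; otherwise it just
             * disables preemption and returns 0 */
            err = radix_tree_maybe_preload(gfp);
            if (err)
                    return err;
            err = radix_tree_insert(root, index, item); /* may be -ENOMEM */
            radix_tree_preload_end();
            return err;
    }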

    Signed-off-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

06 Sep, 2013

2 commits

    Currently the fscache code expects the netfs to call
    fscache_read_or_alloc_pages() inside the aops readpages callback. It
    marks all the pages in the list provided by readahead with PG_private_2.
    In cases where the netfs fails to read all the pages (which is legal) it
    ends up returning to the readahead code and triggering a BUG. This
    happens because the page list still contains marked pages.

    This patch implements a simple fscache_readpages_cancel function that the netfs
    should call before returning from readpages. It will revoke the pages from the
    underlying cache backend and unmark them.

    The problem was originally worked out in the Ceph devel tree, but it also
    occurs in CIFS. It appears that NFS, AFS and 9P are okay as read_cache_pages()
    will clean up the unprocessed pages in the case of an error.
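    A sketch of the intended use in a netfs ->readpages() error path (the
    surrounding names are illustrative):

    ret = fscache_read_or_alloc_pages(cookie, mapping, pages, &nr_pages,
                                      my_read_complete, ctx, gfp);
    ...
    if (read_failed) {
            /* revoke and unmark (PG_private_2) the unread pages */
            fscache_readpages_cancel(cookie, pages);
            return ret;
    }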

    This can be used to address the following oops:

    [12410647.597278] BUG: Bad page state in process petabucket pfn:3d504e
    [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
    (null) index:0x0
    [12410647.597298] page flags: 0x200000000001000(private_2)

    ...

    [12410647.597334] Call Trace:
    [12410647.597345] [] dump_stack+0x19/0x1b
    [12410647.597356] [] bad_page+0xc7/0x120
    [12410647.597359] [] free_pages_prepare+0x10e/0x120
    [12410647.597361] [] free_hot_cold_page+0x40/0x170
    [12410647.597363] [] __put_single_page+0x27/0x30
    [12410647.597365] [] put_page+0x25/0x40
    [12410647.597376] [] ceph_readpages+0x2e9/0x6e0 [ceph]
    [12410647.597379] [] __do_page_cache_readahead+0x1af/0x260
    [12410647.597382] [] ra_submit+0x21/0x30
    [12410647.597384] [] filemap_fault+0x254/0x490
    [12410647.597387] [] __do_fault+0x6f/0x4e0
    [12410647.597391] [] ? __switch_to+0x16d/0x4a0
    [12410647.597395] [] ? finish_task_switch+0x5a/0xc0
    [12410647.597398] [] handle_pte_fault+0xf6/0x930
    [12410647.597401] [] ? pte_mfn_to_pfn+0x93/0x110
    [12410647.597403] [] ? xen_pmd_val+0xe/0x10
    [12410647.597405] [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
    [12410647.597407] [] handle_mm_fault+0x251/0x370
    [12410647.597411] [] ? call_rwsem_down_read_failed+0x14/0x30
    [12410647.597414] [] __do_page_fault+0x1aa/0x550
    [12410647.597418] [] ? up_write+0x1d/0x20
    [12410647.597422] [] ? vm_mmap_pgoff+0xbc/0xe0
    [12410647.597425] [] ? SyS_mmap_pgoff+0xd8/0x240
    [12410647.597427] [] do_page_fault+0xe/0x10
    [12410647.597431] [] page_fault+0x28/0x30

    Signed-off-by: Milosz Tanski
    Signed-off-by: David Howells

    Milosz Tanski
     
  • Extend the fscache netfs API so that the netfs can ask as to whether a cache
    object is up to date with respect to its corresponding netfs object:

    int fscache_check_consistency(struct fscache_cookie *cookie)

    This will call back to the netfs to check whether the auxiliary data associated
    with a cookie is correct. It returns 0 if it is and -ESTALE if it isn't; it
    may also return -ENOMEM and -ERESTARTSYS.

    The backends now have to implement a mandatory operation pointer:

    int (*check_consistency)(struct fscache_object *object)

    that corresponds to the above API call. FS-Cache takes care of pinning the
    object and the cookie in memory and managing this call with respect to the
    object state.
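    A sketch of how a netfs might use it (the response shown is one
    possibility, not mandated by the API):

    ret = fscache_check_consistency(cookie);
    if (ret == -ESTALE)
            fscache_invalidate(cookie);     /* cache copy is out of date */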

    Original-author: Hongyi Jia
    Signed-off-by: David Howells
    cc: Hongyi Jia
    cc: Milosz Tanski

    David Howells
     

19 Jun, 2013

5 commits

  • struct fscache_retrieval contains a count of the number of pages that still
    need some processing (n_pages). This is decremented as the pages are
    processed.

    However, this needs to be atomic as fscache_retrieval_complete() (I think) just
    occasionally may be called from cachefiles_read_backing_file() and
    cachefiles_read_copier() simultaneously.

    This happens when an fscache_read_or_alloc_pages() request containing a lot of
    pages (say a couple of hundred) is being processed. The read on each backing
    page is dispatched individually because we need to insert a monitor into the
    waitqueue to catch when the read completes. However, under low-memory
    conditions, we might be forced to wait in the allocator - and this gives the
    I/O on the backing page a chance to complete first.

    When the I/O completes, fscache_enqueue_retrieval() chucks the retrieval onto
    the workqueue without waiting for the operation to finish the initial I/O
    dispatch (we want to release any pages we can as soon as we can), thus both can
    end up running simultaneously and potentially attempting to partially complete
    the retrieval simultaneously (ENOMEM may occur, backing pages may already be in
    the page cache).

    This was demonstrated by parallelling the non-atomic counter with an atomic
    counter and printing both of them when the assertion fails. At this point, the
    atomic counter has reached zero, but the non-atomic counter has not.

    To fix this, make the counter an atomic_t.
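    A sketch of the change (field and helper simplified; the 'true' argument
    is an assumption about fscache_op_complete()):

    struct fscache_retrieval {
            ...
            atomic_t n_pages;               /* was: int n_pages */
    };

    static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
                                                  int n_pages)
    {
            atomic_sub(n_pages, &op->n_pages);
            if (atomic_read(&op->n_pages) <= 0)
                    fscache_op_complete(&op->op, true);
    }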

    This results in the following bug appearing:

    FS-Cache: Assertion failed
    3 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/operation.c:421!

    or

    FS-Cache: Assertion failed
    3 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/operation.c:414!

    With a backtrace like the following:

    RIP: 0010:[] fscache_put_operation+0x1ad/0x240 [fscache]
    Call Trace:
    [] fscache_retrieval_work+0x55/0x270 [fscache]
    [] ? fscache_retrieval_work+0x0/0x270 [fscache]
    [] worker_thread+0x170/0x2a0
    [] ? autoremove_wake_function+0x0/0x40
    [] ? worker_thread+0x0/0x2a0
    [] kthread+0x96/0xa0
    [] child_rip+0xa/0x20
    [] ? kthread+0x0/0xa0
    [] ? child_rip+0x0/0x20

    Signed-off-by: David Howells
    Reviewed-and-tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Simplify the way fscache cache objects retain their cookie. The way I
    implemented the cookie storage handling made synchronisation a pain (ie. the
    object state machine can't rely on the cookie actually still being there).

    Instead of the object being detached from the cookie and the cookie being
    freed in __fscache_relinquish_cookie(), we defer both operations:

    (*) The detachment of the object from the list in the cookie now takes place
    in fscache_drop_object() and is thus governed by the object state machine
    (fscache_detach_from_cookie() has been removed).

    (*) The release of the cookie is now in fscache_object_destroy() - which is
    called by the cache backend just before it frees the object.

    This means that the fscache_cookie struct is now available to the cache all the
    way through from ->alloc_object() to ->drop_object() and ->put_object() -
    meaning that it's no longer necessary to take object->lock to guarantee access.

    However, __fscache_relinquish_cookie() doesn't wait for the object to go all
    the way through to destruction before letting the netfs proceed. That would
    massively slow down the netfs. Since __fscache_relinquish_cookie() leaves the
    cookie around, it must therefore break all attachments to the netfs - which
    includes ->def, ->netfs_data and any outstanding page read/writes.

    To handle this, struct fscache_cookie now has an n_active counter:

    (1) This starts off initialised to 1.

    (2) Any time the cache needs to get at the netfs data, it calls
    fscache_use_cookie() to increment it - if it is not zero. If it was zero,
    then access is not permitted.

    (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
    to decrement it. This does a wake-up on it if it reaches 0.

    (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
    reach 0. The initialisation to 1 in step (1) ensures that we only get
    wake ups when we're trying to get rid of the cookie.

    This leaves __fscache_relinquish_cookie() a lot simpler.

    ***
    This fixes a problem in the current code whereby if fscache_invalidate() is
    followed sufficiently quickly by fscache_relinquish_cookie() then it is
    possible for __fscache_relinquish_cookie() to have detached the cookie from the
    object and cleared the pointer before a thread is dispatched to process the
    invalidation state in the object state machine.

    Since the pending write clearance was deferred to the invalidation state to
    make it asynchronous, we need to either wait in relinquishment for the stores
    tree to be cleared in the invalidation state or we need to handle the clearance
    in relinquishment.

    Further, if the relinquishment code does clear the tree, then the invalidation
    state needs to make the clearance contingent on still having the cookie to
    hand (since that's where the tree is rooted) and we have to prevent the
    cookie from disappearing for the duration.

    This can lead to an oops like the following:

    BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
    ...
    RIP: 0010:[] _spin_lock+0xe/0x30
    ...
    CR2: 000000000000000c ...
    ...
    Process kslowd002 (...)
    ....
    Call Trace:
    [] fscache_invalidate_writes+0x38/0xd0 [fscache]
    [] ? __switch_to+0xd0/0x320
    [] ? find_busiest_queue+0x69/0x150
    [] ? slow_work_enqueue+0x104/0x180
    [] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
    [] ? bit_waitqueue+0x17/0xd0
    [] slow_work_execute+0x233/0x310
    [] slow_work_thread+0x205/0x360
    [] ? autoremove_wake_function+0x0/0x40
    [] ? slow_work_thread+0x0/0x360
    [] kthread+0x96/0xa0
    [] child_rip+0xa/0x20
    [] ? kthread+0x0/0xa0
    [] ? child_rip+0x0/0x20

    The parameter to fscache_invalidate_writes() was object->cookie which is NULL.

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Fix object state machine to have separate work and wait states as that makes
    it easier to envision.

    There are now three kinds of state:

    (1) Work state. This is an execution state. No event processing is performed
    by a work state. The function attached to a work state returns a pointer
    indicating the next state to which the OSM should transition. Returning
    NO_TRANSIT repeats the current state, but goes back to the scheduler
    first.

    (2) Wait state. This is an event processing state. No execution is
    performed by a wait state. Wait states are just tables of "if event X
    occurs, clear it and transition to state Y". The dispatcher returns to
    the scheduler if none of the events in which the wait state has an
    interest are currently pending.

    (3) Out-of-band state. This is a special work state. Transitions to normal
    states can be overridden when an unexpected event occurs (eg. I/O error).
    Instead the dispatcher disables and clears the OOB event and transits to
    the specified work state. This then acts as an ordinary work state,
    though object->state points to the overridden destination. Returning
    NO_TRANSIT resumes the overridden transition.

    In addition, the states have names in their definitions, so there's no need for
    tables of state names. Further, the EV_REQUEUE event is no longer necessary as
    that is automatic for work states.

    Since the states are now separate structs rather than values in an enum, it's
    not possible to use comparisons other than (non-)equality between them, so use
    some object->flags to indicate what phase an object is in.
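    Illustratively, a state is now a small struct along these lines (shape
    inferred from the description above, not the exact definition):

    struct fscache_state {
            char name[24];          /* states carry their own names */
            bool work;              /* work state (or OOB) vs wait state */
            const struct fscache_state *(*work_fn)(struct fscache_object *,
                                                   int event);
    };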

    The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
    (EV_KILL). An object flag now carries the information about retirement.

    Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
    into a KILL_OBJECT state, and additional states have been added for handling
    waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).

    A state has also been added for synchronising with parent object initialisation
    (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Don't sleep in __fscache_maybe_release_page() if __GFP_FS is not set. This
    goes some way towards mitigating fscache deadlocking against ext4 by way of
    the allocator, eg:

    INFO: task flush-8:0:24427 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    flush-8:0 D ffff88003e2b9fd8 0 24427 2 0x00000000
    ffff88003e2b9138 0000000000000046 ffff880012e3a040 ffff88003e2b9fd8
    0000000000011c80 ffff88003e2b9fd8 ffffffff81a10400 ffff880012e3a040
    0000000000000002 ffff880012e3a040 ffff88003e2b9098 ffffffff8106dcf5
    Call Trace:
    [] ? __lock_is_held+0x31/0x53
    [] ? radix_tree_lookup_element+0xf4/0x12a
    [] schedule+0x60/0x62
    [] __fscache_wait_on_page_write+0x8b/0xa5 [fscache]
    [] ? __init_waitqueue_head+0x4d/0x4d
    [] __fscache_maybe_release_page+0x30c/0x324 [fscache]
    [] ? __fscache_maybe_release_page+0x6c/0x324 [fscache]
    [] ? trace_hardirqs_on_caller+0x114/0x170
    [] nfs_fscache_release_page+0x68/0x94 [nfs]
    [] nfs_release_page+0x7e/0x86 [nfs]
    [] try_to_release_page+0x32/0x3b
    [] shrink_page_list+0x535/0x71a
    [] ? trace_hardirqs_on_caller+0x114/0x170
    [] shrink_inactive_list+0x20a/0x2dd
    [] ? mark_held_locks+0xbe/0xea
    [] shrink_lruvec+0x34c/0x3eb
    [] do_try_to_free_pages+0xcf/0x355
    [] try_to_free_pages+0x9a/0xa1
    [] __alloc_pages_nodemask+0x494/0x6f7
    [] kmem_getpages+0x58/0x155
    [] fallback_alloc+0x120/0x1f3
    [] ? trace_hardirqs_off+0xd/0xf
    [] ____cache_alloc_node+0x177/0x186
    [] ? ext4_init_io_end+0x1c/0x37
    [] kmem_cache_alloc+0xf1/0x176
    [] ? test_set_page_writeback+0x101/0x113
    [] ext4_init_io_end+0x1c/0x37
    [] ext4_bio_write_page+0x20f/0x3af
    [] mpage_da_submit_io+0x26e/0x2f6
    [] ? __find_get_block_slow+0x38/0x133
    [] mpage_da_map_and_submit+0x3a7/0x3bd
    [] ext4_da_writepages+0x30d/0x426
    [] do_writepages+0x1c/0x2a
    [] __writeback_single_inode+0x3e/0xe5
    [] writeback_sb_inodes+0x1bd/0x2f4
    [] __writeback_inodes_wb+0x6f/0xb4
    [] wb_writeback+0x101/0x195
    [] ? trace_hardirqs_on_caller+0x114/0x170
    [] ? wb_do_writeback+0xaa/0x173
    [] wb_do_writeback+0x4a/0x173
    [] ? trace_hardirqs_on+0xd/0xf
    [] ? del_timer+0x4b/0x5b
    [] bdi_writeback_thread+0x6d/0x147
    [] ? wb_do_writeback+0x173/0x173
    [] kthread+0xd0/0xd8
    [] ? _raw_spin_unlock_irq+0x29/0x3e
    [] ? __init_kthread_worker+0x55/0x55
    [] ret_from_fork+0x7c/0xb0
    [] ? __init_kthread_worker+0x55/0x55
    2 locks held by flush-8:0/24427:
    #0: (&type->s_umount_key#41){.+.+..}, at: [] grab_super_passive+0x4c/0x76
    #1: (jbd2_handle){+.+...}, at: [] start_this_handle+0x475/0x4ea

    The problem here is that another thread, which is attempting to write the
    to-be-stored NFS page to the on-ext4 cache file is waiting for the journal
    lock, eg:

    INFO: task kworker/u:2:24437 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u:2 D ffff880039589768 0 24437 2 0x00000000
    ffff8800395896d8 0000000000000046 ffff8800283bf040 ffff880039589fd8
    0000000000011c80 ffff880039589fd8 ffff880039f0b040 ffff8800283bf040
    0000000000000006 ffff8800283bf6b8 ffff880039589658 ffffffff81071a13
    Call Trace:
    [] ? mark_held_locks+0xbe/0xea
    [] ? _raw_spin_unlock_irqrestore+0x3a/0x50
    [] ? trace_hardirqs_on_caller+0x114/0x170
    [] ? trace_hardirqs_on+0xd/0xf
    [] schedule+0x60/0x62
    [] start_this_handle+0x317/0x4ea
    [] ? __init_waitqueue_head+0x4d/0x4d
    [] jbd2__journal_start+0xb3/0x12e
    [] __ext4_journal_start_sb+0xb2/0xc6
    [] ext4_da_write_begin+0x109/0x233
    [] generic_file_buffered_write+0x11a/0x264
    [] ? __mark_inode_dirty+0x2d/0x1ee
    [] __generic_file_aio_write+0x2a5/0x2d5
    [] generic_file_aio_write+0x6f/0xd0
    [] ext4_file_write+0x38c/0x3c4
    [] do_sync_write+0x91/0xd1
    [] cachefiles_write_page+0x26f/0x310 [cachefiles]
    [] fscache_write_op+0x21e/0x37a [fscache]
    [] ? _raw_spin_unlock_irq+0x29/0x3e
    [] fscache_op_work_func+0x78/0xd7 [fscache]
    [] process_one_work+0x232/0x3a8
    [] ? process_one_work+0x1d7/0x3a8
    [] worker_thread+0x214/0x303
    [] ? manage_workers+0x245/0x245
    [] kthread+0xd0/0xd8
    [] ? _raw_spin_unlock_irq+0x29/0x3e
    [] ? __init_kthread_worker+0x55/0x55
    [] ret_from_fork+0x7c/0xb0
    [] ? __init_kthread_worker+0x55/0x55
    4 locks held by kworker/u:2/24437:
    #0: (fscache_operation){.+.+.+}, at: [] process_one_work+0x1d7/0x3a8
    #1: ((&op->work)){+.+.+.}, at: [] process_one_work+0x1d7/0x3a8
    #2: (sb_writers#14){.+.+.+}, at: [] generic_file_aio_write+0x51/0xd0
    #3: (&sb->s_type->i_mutex_key#19){+.+.+.}, at: [] generic_file_aio_write+0x5b/0x

    fscache already tries to cancel pending stores, but it can't cancel a write
    for which I/O is already in progress.

    An alternative would be to accept writing garbage to the cache under extreme
    circumstances and to kill the afflicted cache object if we have to do this.
    However, we really need to know how strapped the allocator is before deciding
    to do that.
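    The mitigation itself is small (a sketch; the real function does more):

    /* in __fscache_maybe_release_page(), before sleeping: */
    if (!(gfp & __GFP_FS))
            return false;   /* report the page as busy rather than sleep
                             * while fs locks (eg. the ext4 journal) may
                             * be held further up the call chain */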

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
    The spin_lock() within the condition in while() will cause a compile error
    if it is not a function. This is not a problem on mainline, but it does not
    look pretty and there is no reason to do it that way.
    This patch writes it a little differently and avoids the double condition.
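    The pattern and the rewrite look roughly like this (illustrative names,
    not the actual diff):

    /* before: only compiles if spin_lock() is usable as an expression */
    while (spin_lock(&lock), n = find_work(), n > 0) {
            process();
            spin_unlock(&lock);
    }
    spin_unlock(&lock);     /* drop the lock taken by the final test */

    /* after: plain statements, a single exit test */
    for (;;) {
            spin_lock(&lock);
            n = find_work();
            if (n == 0) {
                    spin_unlock(&lock);
                    break;
            }
            process();
            spin_unlock(&lock);
    }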

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    Sebastian Andrzej Siewior
     

21 Dec, 2012

10 commits

  • Provide fscache_cancel_op() with a pointer to a function it should invoke under
    lock if it cancels an operation.

    Use this to clear the remaining page count upon cancellation of a pending
    retrieval operation so that fscache_release_retrieval_op() doesn't get an
    assertion failure (see below). This can happen when a signal occurs, say from
    CTRL-C being pressed during data retrieval.
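    Sketchily, the cancellation path gains a callback invoked under lock (the
    callback name and body are illustrative):

    static void fscache_do_cancel_retrieval(struct fscache_operation *_op)
    {
            struct fscache_retrieval *op =
                    container_of(_op, struct fscache_retrieval, op);

            op->n_pages = 0;        /* the pages will never be processed */
    }

    /* called if fscache_cancel_op() does cancel the operation */
    fscache_cancel_op(&op->op, fscache_do_cancel_retrieval);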

    FS-Cache: Assertion failed
    3 == 0 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/page.c:237!
    invalid opcode: 0000 [#641] SMP
    Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
    CPU 0
    Pid: 6075, comm: slurp-q Tainted: GF D 3.7.0-rc8-fsdevel+ #411 /DG965RY
    RIP: 0010:[] [] fscache_release_retrieval_op+0x75/0xff [fscache]
    RSP: 0000:ffff88001c6d7988 EFLAGS: 00010296
    RAX: 000000000000000f RBX: ffff880014cdfe00 RCX: ffffffff6c102000
    RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
    RBP: ffff88001c6d7998 R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffe00
    R13: ffff88001c6d7ab4 R14: ffff88001a8638a0 R15: ffff88001552b190
    FS: 00007f877aaf0700(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007fff11378fd2 CR3: 000000001c6c6000 CR4: 00000000000007f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process slurp-q (pid: 6075, threadinfo ffff88001c6d6000, task ffff88001c6c4080)
    Stack:
    ffffffffa007ec07 ffff880014cdfe00 ffff88001c6d79c8 ffffffffa007db4d
    ffffffffa007ec07 ffff880014cdfe00 00000000fffffe00 ffff88001c6d7ab4
    ffff88001c6d7a38 ffffffffa008116d 0000000000000000 ffff88001c6c4080
    Call Trace:
    [] ? fscache_cancel_op+0x194/0x1cf [fscache]
    [] fscache_put_operation+0x135/0x2ed [fscache]
    [] ? fscache_cancel_op+0x194/0x1cf [fscache]
    [] __fscache_read_or_alloc_pages+0x413/0x4bc [fscache]
    [] ? __alloc_pages_nodemask+0x195/0x75c
    [] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
    [] nfs_readpages+0x186/0x1bd [nfs]
    [] ? alloc_pages_current+0xc7/0xe4
    [] ? __page_cache_alloc+0x84/0x91
    [] ? __do_page_cache_readahead+0xa6/0x2e0
    [] __do_page_cache_readahead+0x237/0x2e0
    [] ? __do_page_cache_readahead+0xa6/0x2e0
    [] ra_submit+0x1c/0x20
    [] ondemand_readahead+0x359/0x382
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_aio_read+0x26b/0x637
    [] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
    [] nfs_file_read+0xaa/0xcf [nfs]
    [] do_sync_read+0x91/0xd1
    [] vfs_read+0x9b/0x144
    [] sys_read+0x44/0x75
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: David Howells

    David Howells
     
  • Mark as cancelled an operation that is in progress rather than pending at the
    time it is cancelled, and call fscache_complete_op() to cancel an operation so
    that blocked ops can be started.

    Signed-off-by: David Howells

    David Howells
     
    In fscache_write_op(), if the object is determined to have become inactive
    or to have lost its cookie, we don't move the operation state from
    in-progress, and so an assertion in fscache_put_operation() fails (see
    below).

    Instrumenting fscache_op_work_func() indicates that it called
    fscache_write_op() before calling fscache_put_operation() - where the assertion
    failed. The assertion at line 433 indicates that the operation state is
    IN_PROGRESS rather than being COMPLETE or CANCELLED.

    Instrumenting fscache_write_op() showed that it was being called on an object
    that had had its cookie removed and that this was due to relinquishment of the
    cookie by the netfs. At this point fscache no longer has access to the pages
    of netfs data that were requested to be written, and so simply cancelling the
    operation is the thing to do.

    FS-Cache: Assertion failed
    3 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/operation.c:433!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
    CPU 0
    Pid: 1035, comm: kworker/u:3 Tainted: GF 3.7.0-rc8-fsdevel+ #411 /DG965RY
    RIP: 0010:[] [] fscache_put_operation+0x11a/0x2ed [fscache]
    RSP: 0018:ffff88003e32bcf8 EFLAGS: 00010296
    RAX: 000000000000000f RBX: ffff88001818eb78 RCX: ffffffff6c102000
    RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
    RBP: ffff88003e32bd18 R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa00811da
    R13: 0000000000000001 R14: 0000000100625d26 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007fff7dd31c68 CR3: 000000003d730000 CR4: 00000000000007f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kworker/u:3 (pid: 1035, threadinfo ffff88003e32a000, task ffff88003bb38080)
    Stack:
    ffffffff8102d1ad ffff88001818eb78 ffffffffa00811da 0000000000000001
    ffff88003e32bd48 ffffffffa007f0ad ffff88001818eb78 ffffffff819583c0
    ffff88003df24e00 ffff88003882c3e0 ffff88003e32bde8 ffffffff81042de0
    Call Trace:
    [] ? vprintk_emit+0x3c6/0x41a
    [] ? __fscache_read_or_alloc_pages+0x4bc/0x4bc [fscache]
    [] fscache_op_work_func+0xec/0x123 [fscache]
    [] process_one_work+0x21c/0x3b0
    [] ? process_one_work+0x1be/0x3b0
    [] ? fscache_operation_gc+0x23e/0x23e [fscache]
    [] worker_thread+0x202/0x2df
    [] ? rescuer_thread+0x18e/0x18e
    [] kthread+0xd0/0xd8
    [] ? _raw_spin_unlock_irq+0x29/0x3e
    [] ? __init_kthread_worker+0x55/0x55
    [] ret_from_fork+0x7c/0xb0
    [] ? __init_kthread_worker+0x55/0x55

    Signed-off-by: David Howells

    David Howells
     
  • wait_on_bit() with TASK_INTERRUPTIBLE returns 1 rather than a negative error
    code, so change what we check for. This means that the signal handling in
    fscache_wait_for_retrieval_activation() should now work properly.
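    The fix is in what the return value is compared against (sketch):

    /* before: never true - a signal yields 1, not a negative error */
    if (wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
                    fscache_wait_bit_interruptible, TASK_INTERRUPTIBLE) < 0)

    /* after */
    if (wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
                    fscache_wait_bit_interruptible, TASK_INTERRUPTIBLE) != 0)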

    Without this, the following bug can be seen if CTRL-C is pressed during
    fscache read operation:

    FS-Cache: Assertion failed
    2 == 3 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/page.c:347!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
    CPU 1
    Pid: 15006, comm: slurp-q Tainted: GF 3.7.0-rc8-fsdevel+ #411 /DG965RY
    RIP: 0010:[] [] fscache_wait_for_retrieval_activation+0x167/0x177 [fscache]
    RSP: 0018:ffff88002a4c39a8 EFLAGS: 00010292
    RAX: 000000000000001a RBX: ffff88002d3dc158 RCX: 0000000000008685
    RDX: ffffffff8102ccd6 RSI: 0000000000000001 RDI: ffffffff8102d1d6
    RBP: ffff88002a4c39c8 R08: 0000000000000002 R09: 0000000000000000
    R10: ffffffff8163afa0 R11: ffff88003bd11900 R12: ffffffffa00868c8
    R13: ffff880028306458 R14: ffff88002d3dc1b0 R15: ffff88001372e538
    FS: 00007f17426a0700(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f1742494a44 CR3: 0000000031bd7000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process slurp-q (pid: 15006, threadinfo ffff88002a4c2000, task ffff880023de3040)
    Stack:
    ffff88002d3dc158 ffff88001372e538 ffff88002a4c3ab4 ffff8800283064e0
    ffff88002a4c3a38 ffffffffa0080f6d 0000000000000000 ffff880023de3040
    ffff88002a4c3ac8 ffffffff810ac8ae ffff880028306458 ffff88002a4c3bc8
    Call Trace:
    [] __fscache_read_or_alloc_pages+0x24f/0x4bc [fscache]
    [] ? __alloc_pages_nodemask+0x195/0x75c
    [] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
    [] nfs_readpages+0x186/0x1bd [nfs]
    [] ? alloc_pages_current+0xc7/0xe4
    [] ? __page_cache_alloc+0x84/0x91
    [] ? __do_page_cache_readahead+0xa6/0x2e0
    [] __do_page_cache_readahead+0x237/0x2e0
    [] ? __do_page_cache_readahead+0xa6/0x2e0
    [] ra_submit+0x1c/0x20
    [] ondemand_readahead+0x359/0x382
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_aio_read+0x26b/0x637
    [] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
    [] nfs_file_read+0xaa/0xcf [nfs]
    [] do_sync_read+0x91/0xd1
    [] vfs_read+0x9b/0x144
    [] sys_read+0x44/0x75
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: David Howells

    David Howells
     
    nfs_migrate_page() does not wait for FS-Cache to finish with a page,
    probably leading to the following bad-page-state error:

    BUG: Bad page state in process python-bin pfn:17d39b
    page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
    index:38686 (Tainted: G B ---------------- )
    Pid: 31053, comm: python-bin Tainted: G B ----------------
    2.6.32-71.24.1.el6.x86_64 #1
    Call Trace:
    [] bad_page+0x107/0x160
    [] free_hot_cold_page+0x1c9/0x220
    [] __pagevec_free+0x59/0xb0
    [] ? flush_tlb_others_ipi+0x128/0x130
    [] release_pages+0x21c/0x250
    [] ? remove_migration_pte+0x28a/0x2b0
    [] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
    [] ____pagevec_lru_add+0x167/0x180
    [] __lru_cache_add+0x58/0x70
    [] lru_cache_add_lru+0x21/0x40
    [] putback_lru_page+0x69/0x100
    [] migrate_pages+0x13d/0x5d0
    [] ? ____pagevec_lru_add+0x167/0x180
    [] ? compaction_alloc+0x0/0x370
    [] compact_zone+0x4cc/0x600
    [] ? get_page_from_freelist+0x15c/0x820
    [] ? check_preempt_wakeup+0x1c4/0x3c0
    [] compact_zone_order+0x7e/0xb0
    [] try_to_compact_pages+0x109/0x170
    [] __alloc_pages_nodemask+0x5ed/0x850
    [] ? thread_return+0x4e/0x778
    [] alloc_pages_vma+0x93/0x150
    [] do_huge_pmd_anonymous_page+0x135/0x340
    [] ? rwsem_down_read_failed+0x26/0x30
    [] handle_mm_fault+0x245/0x2b0
    [] do_page_fault+0x123/0x3a0
    [] page_fault+0x25/0x30

    nfs_migrate_page() calls nfs_fscache_release_page(), which doesn't actually
    wait - even if __GFP_WAIT is set. The reason it doesn't wait is that
    fscache_maybe_release_page() might deadlock the allocator as the work threads
    writing to the cache may all end up sleeping on memory allocation.

    However, I wonder if that is actually a problem. There are a number of things
    I can do to deal with this:

    (1) Make nfs_migrate_page() wait.

    (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

    (3) Set a timeout around the wait.

    (4) Make nfs_migrate_page() return an error if the page is still busy.

    For the moment, I'll select (2) and (4).
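    A sketch of option (4) (simplified; the real nfs_migrate_page() has other
    checks as well):

    int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
                         struct page *page, enum migrate_mode mode)
    {
            /* with (2) in place, this honours __GFP_WAIT and may wait */
            if (!nfs_fscache_release_page(page, GFP_KERNEL))
                    return -EBUSY;          /* fscache still owns the page */
            return migrate_page(mapping, newpage, page, mode);
    }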

    Signed-off-by: David Howells
    Acked-by: Jeff Layton

    David Howells
     
  • CacheFiles is missing some calls to fscache_retrieval_complete() in the error
    handling/collision paths of its reader functions.

    This can be seen by the following assertion tripping in
    fscache_put_operation(), where the operation being destroyed is still in the
    in-progress state and has not been cancelled or completed:

    FS-Cache: Assertion failed
    3 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/operation.c:408!
    invalid opcode: 0000 [#1] SMP
    CPU 2
    Modules linked in: xfs ioatdma dca loop joydev evdev
    psmouse dcdbas pcspkr serio_raw i5000_edac edac_core i5k_amb shpchp
    pci_hotplug sg sr_mod]

    Pid: 8062, comm: httpd Not tainted 3.1.0-rc8 #1 Dell Inc. PowerEdge 1950/0DT097
    RIP: 0010:[] [] fscache_put_operation+0x304/0x330
    RSP: 0018:ffff880062f739d8 EFLAGS: 00010296
    RAX: 0000000000000025 RBX: ffff8800c5122e84 RCX: ffffffff81ddf040
    RDX: 00000000ffffffff RSI: 0000000000000082 RDI: ffffffff81ddef30
    RBP: ffff880062f739f8 R08: 0000000000000005 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000003 R12: ffff8800c5122e40
    R13: ffff880037a2cd20 R14: ffff880087c7a058 R15: ffff880087c7a000
    FS: 00007f63dcf636e0(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f0c0a91f000 CR3: 0000000062ec2000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process httpd (pid: 8062, threadinfo ffff880062f72000, task ffff880087e58000)
    Stack:
    ffff880062f73bf8 0000000000000000 ffff880062f73bf8 ffff880037a2cd20
    ffff880062f73a68 ffffffff8119aa7e ffff88006540e000 ffff880062f73ad4
    ffff88008e9a4308 ffff880037a2cd20 ffff880062f73a48 ffff8800c5122e40
    Call Trace:
    [] __fscache_read_or_alloc_pages+0x1fe/0x530
    [] __nfs_readpages_from_fscache+0x70/0x1c0
    [] nfs_readpages+0xca/0x1e0
    [] ? rpc_do_put_task+0x36/0x50
    [] ? alloc_nfs_open_context+0x4b/0x110
    [] ? rpc_call_sync+0x5a/0x70
    [] __do_page_cache_readahead+0x1ca/0x270
    [] ra_submit+0x21/0x30
    [] ondemand_readahead+0x11d/0x250
    [] page_cache_sync_readahead+0x36/0x60
    [] generic_file_aio_read+0x454/0x770
    [] nfs_file_read+0xe1/0x130
    [] do_sync_read+0xd9/0x120
    [] ? mntput+0x1f/0x40
    [] ? fput+0x1cb/0x260
    [] vfs_read+0xc8/0x180
    [] sys_read+0x55/0x90
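
    A hedged sketch of the shape of the fix (cachefiles_read_backing_page() is a
    hypothetical stand-in for the real reader functions): every path that
    finishes with a page it was given, including error and collision paths, must
    report it, or the op never leaves the in-progress state:

    static int cachefiles_read_one_page(struct fscache_retrieval *op,
                                        struct page *netfs_page)
    {
            int ret;

            ret = cachefiles_read_backing_page(op, netfs_page); /* hypothetical */
            if (ret < 0)
                    fscache_retrieval_complete(op, 1); /* the missing call */
            return ret;
    }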

    Reported-by: Mark Moseley
    Signed-off-by: David Howells

    David Howells
     
  • Provide a proper invalidation method rather than relying on the netfs
    retiring the cookie it has and getting a new one. The problem with that
    approach is that it isn't easy for the netfs to make sure it has completed or
    cancelled all its outstanding storage and retrieval operations on the cookie
    it is retiring.

    Instead, have the cache provide an invalidation method that will cancel or wait
    for all currently outstanding operations before invalidating the cache, and
    will cause new operations to queue up behind that. Whilst invalidation is in
    progress, some requests will be rejected until the cache can stack a barrier on
    the operation queue to cause new operations to be deferred behind it.
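
    A hedged sketch of how a netfs might use such a method, assuming an
    fscache_invalidate()/fscache_wait_on_invalidate() pair along the lines the
    description implies (netfs_inode and its fscache field are illustrative
    names):

    static void netfs_data_changed(struct netfs_inode *ni)
    {
            /* Queued as an exclusive operation: it runs only once all
             * outstanding reads/writes on the cookie have completed or been
             * cancelled, and later operations queue up behind it. */
            fscache_invalidate(ni->fscache);

            /* Optionally block until the cache object has been wiped. */
            fscache_wait_on_invalidate(ni->fscache);
    }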

    Signed-off-by: David Howells

    David Howells
     
  • Fix the state management of internal fscache operations and the accounting of
    what operations are in what states.

    This is done by:

    (1) Give struct fscache_operation an enum variable that directly represents
    the state it's currently in, rather than spreading this knowledge across a
    bunch of flags recording who's processing the operation at the moment and
    whether it is queued.

    This makes it easier to write assertions to check the state at various
    points and to prevent invalid state transitions.

    (2) Add an 'operation complete' state and supply a function to indicate the
    completion of an operation (fscache_op_complete()) and make things call it.
    The final call to fscache_put_operation() can then check that the op is in
    the appropriate state (complete or cancelled).

    (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
    govern the state of an object:

    (a) The ->n_ops is now the number of extant operations on the object
    and is now decremented by fscache_put_operation() only.

    (b) The ->n_in_progress is simply the number of operations that have been
    taken off of the object's pending queue for the purposes of being run.
    This is decremented by fscache_op_complete() only.

    (c) The ->n_exclusive is the number of exclusive ops that have been
    submitted and queued or are in progress. It is decremented by
    fscache_op_complete() and by fscache_cancel_op().

    fscache_put_operation() and fscache_operation_gc() now no longer try to
    clean up ->n_exclusive and ->n_in_progress. That was leading to double
    decrements against fscache_cancel_op().

    fscache_cancel_op() now no longer decrements ->n_ops. That was leading to
    double decrements against fscache_put_operation().

    fscache_submit_exclusive_op() now decides whether it has to queue an op
    based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
    will persist in being true even after all preceding operations have been
    cancelled or completed. Furthermore, if an object is active and there are
    runnable ops against it, there must be at least one op running.

    (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
    provide a function to record completion of the pages as they complete (see
    the sketch after this list).

    When n_pages reaches 0, the operation is deemed to be complete and
    fscache_op_complete() is called.

    Add calls to fscache_retrieval_complete() anywhere we've finished with a
    page we've been given to read or allocate for. This includes places where
    we just return pages to the netfs for reading from the server and where
    accessing the cache fails and we discard the proposed netfs page.
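
    A hedged, illustrative sketch of points (1) and (4); names and fields
    approximate the description above rather than copying the kernel source:

    enum fscache_operation_state {
            FSCACHE_OP_ST_PENDING,          /* queued, waiting to be run */
            FSCACHE_OP_ST_IN_PROGRESS,      /* taken off the pending queue */
            FSCACHE_OP_ST_COMPLETE,         /* fscache_op_complete() called */
            FSCACHE_OP_ST_CANCELLED,        /* fscache_cancel_op() called */
    };

    static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
                                                  int n_pages)
    {
            /* Each page handed back to the netfs (or discarded on error) is
             * accounted here; when the last one is recorded, the retrieval
             * as a whole moves to the complete state. */
            if (atomic_sub_return(n_pages, &op->n_pages) <= 0)
                    fscache_op_complete(&op->op);
    }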

    The bugs in the unfixed state management manifest themselves as oopses like
    the following, where operation completion gets out of sync with the netfs's
    return of the cookie. This is possible because the cache unlocks and returns
    all the netfs pages before recording its completion - which means that
    there's nothing to stop the netfs discarding them and returning the cookie.

    FS-Cache: Cookie 'NFS.fh' still has outstanding reads
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/cookie.c:519!
    invalid opcode: 0000 [#1] SMP
    CPU 1
    Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

    Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY
    RIP: 0010:[] [] __fscache_relinquish_cookie+0x170/0x343 [fscache]
    RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282
    RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
    RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
    RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
    R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
    R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
    FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
    Stack:
    ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
    ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
    ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
    Call Trace:
    [] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
    [] nfs_clear_inode+0x3c/0x41 [nfs]
    [] nfs4_evict_inode+0x2f/0x33 [nfs]
    [] evict+0xa1/0x15c
    [] dispose_list+0x2c/0x38
    [] prune_icache_sb+0x28c/0x29b
    [] prune_super+0xd5/0x140
    [] shrink_slab+0x102/0x1ab
    [] balance_pgdat+0x2f2/0x595
    [] ? process_timeout+0xb/0xb
    [] kswapd+0x270/0x289
    [] ? __init_waitqueue_head+0x46/0x46
    [] ? balance_pgdat+0x595/0x595
    [] kthread+0x7f/0x87
    [] kernel_thread_helper+0x4/0x10
    [] ? finish_task_switch+0x45/0xc0
    [] ? retint_restore_args+0xe/0xe
    [] ? __init_kthread_worker+0x53/0x53
    [] ? gs_change+0xb/0xb

    Signed-off-by: David Howells

    David Howells
     
  • Downgrade the requirements passed to the allocator in the gfp flags parameter.
    FS-Cache/CacheFiles can handle OOM conditions simply by aborting the attempt to
    store an object or a page in the cache.
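
    As a hedged illustration, the downgrade amounts to allocating with a mask
    along these lines instead of plain GFP_KERNEL (the exact flag choice here is
    an assumption): the cache may sleep, but never retries hard or dips into
    emergency reserves, because a failed cache store is harmless:

    /* May wait, but gives up early - losing a cache store is acceptable. */
    #define cachefiles_gfp (__GFP_WAIT | __GFP_NORETRY | __GFP_NOMEMALLOC)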

    Signed-off-by: David Howells

    David Howells
     
  • Under some circumstances CacheFiles defers the marking of pages with
    PG_fscache so that it can take advantage of pagevecs, reducing the number of
    calls to fscache_mark_pages_cached() and to the netfs's hook that keeps
    track of this.

    There are, however, two problems with this:

    (1) It can lead to the PG_fscache mark being applied _after_ the page is set
    PG_uptodate and unlocked (by the call to fscache_end_io()).

    (2) CacheFiles's ref on the page is dropped immediately following
    fscache_end_io() - and so may not still be held when the mark is applied.
    This can lead to the page being passed back to the allocator before the
    mark is applied.

    Fix this by, where appropriate, marking the page before calling
    fscache_end_io() and releasing the page (see the sketch below). This means
    that we can't take advantage of pagevecs and have to make a separate call to
    the marking routines for each page.

    The symptoms of this are Bad Page state errors cropping up under memory
    pressure, for example:

    BUG: Bad page state in process tar pfn:002da
    page:ffffea0000009fb0 count:0 mapcount:0 mapping: (null) index:0x1447
    page flags: 0x1000(private_2)
    Pid: 4574, comm: tar Tainted: G W 3.1.0-rc4-fsdevel+ #1064
    Call Trace:
    [] ? dump_page+0xb9/0xbe
    [] bad_page+0xd5/0xea
    [] get_page_from_freelist+0x35b/0x46a
    [] __alloc_pages_nodemask+0x362/0x662
    [] __do_page_cache_readahead+0x13a/0x267
    [] ? __do_page_cache_readahead+0xa2/0x267
    [] ra_submit+0x1c/0x20
    [] ondemand_readahead+0x28b/0x29a
    [] ? ondemand_readahead+0x163/0x29a
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_aio_read+0x2ab/0x67e
    [] nfs_file_read+0xa4/0xc9 [nfs]
    [] do_sync_read+0xba/0xfa
    [] ? security_file_permission+0x7b/0x84
    [] ? rw_verify_area+0xab/0xc8
    [] vfs_read+0xaa/0x13a
    [] sys_read+0x45/0x6c
    [] system_call_fastpath+0x16/0x1b

    As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.

    Instrumenting fscache_mark_pages_cached() to verify whether page->mapping
    was set appropriately showed that sometimes it wasn't. This led to the
    discovery that sometimes the page had apparently been reclaimed by the time
    the marker got to see it.
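
    A hedged sketch of the fixed ordering (cachefiles_read_done() is a
    hypothetical wrapper; fscache_mark_page_cached() is the per-page variant of
    the marking routine): mark the page while our reference is still held, then
    complete the I/O, then drop the reference:

    static void cachefiles_read_done(struct fscache_retrieval *op,
                                     struct page *page, int error)
    {
            if (error == 0)
                    fscache_mark_page_cached(op, page); /* sets PG_fscache */
            fscache_end_io(op, page, error);    /* may unlock the page */
            put_page(page);                     /* drop our ref only now */
    }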

    Reported-by: M. Stevens
    Signed-off-by: David Howells
    Reviewed-by: Jeff Layton

    David Howells
     


08 Jul, 2011

1 commit

  • Add an FS-Cache helper to bulk uncache pages on an inode. This will
    only work for the circumstance where the pages in the cache correspond
    1:1 with the pages attached to an inode's page cache.

    This is required for CIFS and NFS: when disabling an inode cookie, we were
    returning the cookie and setting cifsi->fscache to NULL but failed to
    invalidate any previously mapped pages. This resulted in "Bad page state"
    errors and manifested as other kinds of errors when running fsstress. Fix it
    by uncaching mapped pages when we disable the inode cookie.
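
    A hedged sketch of the caller's side using the new helper (the cifsInodeInfo
    field layout here is illustrative):

    static void cifs_fscache_disable_inode_cookie(struct inode *inode)
    {
            struct cifsInodeInfo *cifsi = CIFS_I(inode);

            if (cifsi->fscache) {
                    /* Strip PG_fscache from every page still attached to the
                     * inode's page cache before giving the cookie back. */
                    fscache_uncache_all_inode_pages(cifsi->fscache, inode);
                    fscache_relinquish_cookie(cifsi->fscache, 1);
                    cifsi->fscache = NULL;
            }
    }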

    This patch should fix the following oops and "Bad page state" errors
    seen during fsstress testing.

    ------------[ cut here ]------------
    kernel BUG at fs/cachefiles/namei.c:201!
    invalid opcode: 0000 [#1] SMP
    Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
    RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
    RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282
    RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
    RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
    R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
    R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
    FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
    Stack:
    0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
    ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
    ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
    Call Trace:
    cachefiles_lookup_object+0x78/0xd4 [cachefiles]
    fscache_lookup_object+0x131/0x16d [fscache]
    fscache_object_work_func+0x1bc/0x669 [fscache]
    process_one_work+0x186/0x298
    worker_thread+0xda/0x15d
    kthread+0x84/0x8c
    kernel_thread_helper+0x4/0x10
    RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles]
    ---[ end trace 1d481c9af1804caa ]---

    I tested the uncaching by the following means:

    (1) Create a big file on my NFS server (104857600 bytes).

    (2) Read the file into the cache with md5sum on the NFS client. Look in
    /proc/fs/fscache/stats:

    Pages : mrk=25601 unc=0

    (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc
    again:

    Pages : mrk=25601 unc=25601

    Reported-by: Jeff Layton
    Signed-off-by: David Howells
    Reviewed-and-Tested-by: Suresh Jayaraman
    cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    David Howells
     


23 Jul, 2010

1 commit

  • Make fscache operations use only a workqueue instead of a combination of
    workqueue and slow-work. FSCACHE_OP_SLOW is dropped and FSCACHE_OP_FAST is
    renamed to FSCACHE_OP_ASYNC, which uses the newly added fscache_op_wq
    workqueue to execute op->processor(). fscache_operation_init_slow() is
    dropped and fscache_operation_init() now takes a @processor argument
    directly.

    * An unbound workqueue is used.

    * fscache_retrieval_work() is no longer necessary, as OP_ASYNC now does the
    equivalent thing.

    * A sysctl, fscache.operation_max_active, is added to control concurrency.
    The default value is nr_cpus clamped between 2 and WQ_UNBOUND_MAX_ACTIVE.

    * debugfs support is dropped for now; a tracing-API-based debug facility is
    planned to be added.
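
    A hedged sketch of the resulting dispatch, using the flag and workqueue
    names from the description above (simplified, not the verbatim kernel code):

    struct workqueue_struct *fscache_op_wq;    /* allocated WQ_UNBOUND */

    void fscache_enqueue_operation(struct fscache_operation *op)
    {
            switch (op->flags & FSCACHE_OP_TYPE) {
            case FSCACHE_OP_ASYNC:
                    /* Formerly FSCACHE_OP_FAST: run op->processor() from
                     * the unbound fscache_op_wq. */
                    queue_work(fscache_op_wq, &op->work);
                    break;
            case FSCACHE_OP_MYTHREAD:
                    /* Left for the thread that submitted it to process. */
                    break;
            }
    }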

    Signed-off-by: Tejun Heo
    Acked-by: David Howells

    Tejun Heo