Eric Lee / linux-smarc-t335x-v3.2

29 Mar, 2011

1 commit

0444d76ae fs: don't use igrab() while holding i_lock

Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph.

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock now that the
inode_lock is gone and igrab() uses the i_lock itself.
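
The recursion described above can be modeled in userspace. This is a minimal sketch, not kernel code: `struct model_inode`, `model_lock()`, `model_igrab()` and `model_ihold()` are hypothetical stand-ins, and the non-recursive i_lock is modeled as a flag that reports -EDEADLK where a real spinlock would spin forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct inode: a non-recursive lock and a refcount. */
struct model_inode {
    bool i_lock_held;
    int i_count;
};

/* Take the modeled lock; a second acquisition reports -EDEADLK
 * instead of hanging, so the recursion is observable. */
static int model_lock(struct model_inode *inode)
{
    if (inode->i_lock_held)
        return -EDEADLK;
    inode->i_lock_held = true;
    return 0;
}

static void model_unlock(struct model_inode *inode)
{
    inode->i_lock_held = false;
}

/* Models igrab() taking i_lock itself before bumping the count. */
static int model_igrab(struct model_inode *inode)
{
    int err = model_lock(inode);

    if (err)
        return err;
    inode->i_count++;
    model_unlock(inode);
    return 0;
}

/* Models ihold(): the caller already owns a reference, so no lock
 * is taken here and the count is simply incremented. */
static void model_ihold(struct model_inode *inode)
{
    assert(inode->i_count > 0);
    inode->i_count++;
}
```

In this model, calling `model_igrab()` while the lock is held fails, whereas `model_ihold()` succeeds, which is the substitution the commit makes.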

Signed-off-by: Dave Chinner
Cc: Al Viro
Cc: linux-fsdevel@vger.kernel.org
Cc: Ryan Mallon
Signed-off-by: Linus Torvalds

Dave Chinner
2011-03-29 22:50:34 +0800

20 Nov, 2010

1 commit

76db8ac45 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix readdir EOVERFLOW on 32-bit archs
ceph: fix frag offset for non-leftmost frags
ceph: fix dangling pointer
ceph: explicitly specify page alignment in network messages
ceph: make page alignment explicit in osd interface
ceph: fix comment, remove extraneous args
ceph: fix update of ctime from MDS
ceph: fix version check on racing inode updates
ceph: fix uid/gid on resent mds requests
ceph: fix rdcache_gen usage and invalidate
ceph: re-request max_size if cap auth changes
ceph: only let auth caps update max_size
ceph: fix open for write on clustered mds
ceph: fix bad pointer dereference in ceph_fill_trace
ceph: fix small seq message skipping
Revert "ceph: update issue_seq on cap grant"

Linus Torvalds
2010-11-20 07:32:22 +0800

10 Nov, 2010

1 commit

b7495fc2f ceph: make page alignment explicit in osd interface

We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.
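
The difference between inferred and explicit alignment can be sketched as follows. This is an illustrative model only: `MODEL_PAGE_SIZE`, `struct model_io_request` and both helpers are hypothetical names, and the real OSD request layout is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

static_assert((MODEL_PAGE_SIZE & (MODEL_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* Old approach: assume the data starts at the same offset within the
 * page as it does within the file. */
static unsigned int alignment_from_file_offset(uint64_t file_offset)
{
    return (unsigned int)(file_offset & (MODEL_PAGE_SIZE - 1));
}

/* New approach: the request carries the alignment explicitly. */
struct model_io_request {
    uint64_t file_offset;
    uint64_t length;
    unsigned int page_alignment; /* offset of the data in the first page */
};

/* Derive the alignment from where the buffer really starts. */
static struct model_io_request make_request(uint64_t file_offset,
                                            uint64_t length,
                                            uintptr_t buffer_addr)
{
    struct model_io_request req;

    req.file_offset = file_offset;
    req.length = length;
    req.page_alignment = (unsigned int)(buffer_addr & (MODEL_PAGE_SIZE - 1));
    return req;
}
```

For a 512-byte direct IO at file offset 512 from a page-aligned buffer, the inferred alignment is 512 while the explicit one is 0, which is the mismatch the commit message describes.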

Signed-off-by: Sage Weil

Sage Weil
2010-11-10 04:43:12 +0800

27 Oct, 2010

1 commit

1b430beee writeback: remove nonblocking/encountered_congestion references

This removes more dead code that was somehow missed by commit 0d99519efef
(writeback: remove unused nonblocking and congestion checks). There is
no behavior change except for the removal of two entries from one of the
ext4 tracing interfaces.

The nonblocking checks in ->writepages are no longer used because the
flusher now prefers to block on get_request_wait() rather than skip inodes
on IO congestion. The latter would lead to more seeky IO.

The nonblocking checks in ->writepage are no longer used because they are
redundant with the WB_SYNC_NONE check.

We no longer set ->nonblocking in VM page out and page migration, because
a) it's effectively redundant with WB_SYNC_NONE in current code
b) its old semantic of "Don't get stuck on request queues" is misbehavior:
that would skip some dirty inodes on congestion and page out others, which
is unfair in terms of LRU age.

Inspired by Christoph Hellwig. Thanks!

Signed-off-by: Wu Fengguang
Cc: Theodore Ts'o
Cc: David Howells
Cc: Sage Weil
Cc: Steve French
Cc: Chris Mason
Cc: Jens Axboe
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-10-27 07:52:05 +0800

21 Oct, 2010

1 commit

3d14c5d2b ceph: factor out libceph from Ceph file system

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph. This
is mostly a matter of moving files around. However, a few key pieces
of the interface change as well:

- ceph_client becomes ceph_fs_client and ceph_client, where the latter
captures the mon and osd clients, and the fs_client gets the mds client
and file system specific pieces.
- Mount option parsing and debugfs setup is correspondingly broken into
two pieces.
- The mon client gets a generic handler callback for otherwise unknown
messages (mds map, in this case).
- The basic supported/required feature bits can be expanded (and are by
ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil

Yehuda Sadeh
2010-10-21 06:37:28 +0800

17 Sep, 2010

1 commit

ae00d4f37 ceph: fix cap_snap and realm split

The cap_snap creation/queueing relies on both the current i_head_snapc
_and_ the i_snap_realm pointers being correct, so that the new cap_snap
can properly reference the old context and the new i_head_snapc can be
updated to reference the new snaprealm's context. To fix this, we:

- move inodes completely to the new (split) realm so that i_snap_realm
is correct, and
- generate the new snapc's _before_ queueing the cap_snaps in
ceph_update_snap_trace().

Signed-off-by: Sage Weil

Sage Weil
2010-09-17 07:26:51 +0800

12 Sep, 2010

1 commit

a77d9f7dc ceph: fix file offset wrapping at 4GB on 32-bit archs

Cast the value before shifting so that we don't run out of bits with a
32-bit unsigned long. This fixes wrapping of high file offsets into the
low 4GB of a file on disk, and the subsequent data corruption for large
files.
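
The wrap can be reproduced in portable C by using `uint32_t` to stand in for a 32-bit unsigned long. This is a sketch under that assumption: `MODEL_PAGE_SHIFT` and both helper names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

static_assert(MODEL_PAGE_SHIFT < 32, "shift must fit in a 32-bit value");

/* Buggy pattern: the shift is done in 32 bits, so the high bits are
 * lost before the result is widened to 64 bits. */
static uint64_t offset_shift_then_widen(uint32_t page_index)
{
    uint32_t narrow = page_index << MODEL_PAGE_SHIFT;

    return narrow;
}

/* Fixed pattern: cast first, so the shift is done in 64 bits. */
static uint64_t offset_widen_then_shift(uint32_t page_index)
{
    return (uint64_t)page_index << MODEL_PAGE_SHIFT;
}
```

With these definitions, page index 0x100000 (the page at the 4GB boundary) maps to offset 0 in the first helper and to 0x100000000 in the second.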

Signed-off-by: Sage Weil

Sage Weil
2010-09-12 01:55:25 +0800

25 Aug, 2010

1 commit

7d8cb26d7 ceph: maintain i_head_snapc when any caps are dirty, not just for data

We used to use i_head_snapc to keep track of which snapc the current epoch
of dirty data was dirtied under. It is used by queue_cap_snap to set up
the cap_snap. However, since we queue cap snaps for any dirty caps, not
just for dirty file data, we need to keep a valid i_head_snapc anytime
we have dirty|flushing caps. This fixes a NULL pointer deref in
queue_cap_snap when writing back dirty caps without data (e.g.,
snaptest-authwb.sh).

Signed-off-by: Sage Weil

Sage Weil
2010-08-25 07:24:18 +0800

23 Aug, 2010

1 commit

679ceace8 mm: exporting account_page_dirty

This allows code outside of the mm core to safely manipulate page state
and not worry about the other accounting. Not using these routines means
that some code will lose track of the accounting and we get bugs. This
has happened once already.

Signed-off-by: Michael Rubin
Signed-off-by: Sage Weil

Michael Rubin
2010-08-23 06:16:51 +0800

04 Aug, 2010

1 commit

213c99ee0 ceph: whitespace cleanup

Signed-off-by: Sage Weil

Sage Weil
2010-08-04 01:25:11 +0800

02 Aug, 2010

1 commit

2962507ca ceph: perform lazy reads when file mode and caps permit

If the file mode is marked as "lazy," perform cached/buffered reads when
the caps permit it. Adjust the rdcache_gen and invalidation logic
accordingly so that we manage our cache based on the FILE_CACHE -or-
FILE_LAZYIO cap bits.
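
One way to read the rule above is as a bitmask test. The sketch below is an interpretation, not the patch itself: the `MODEL_CAP_*` bit values and `may_read_from_cache()` are hypothetical, and it assumes the lazy cap only counts when the file mode is lazy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the real cap encoding is not shown here. */
#define MODEL_CAP_FILE_CACHE  (1u << 0)
#define MODEL_CAP_FILE_LAZYIO (1u << 1)

static_assert((MODEL_CAP_FILE_CACHE & MODEL_CAP_FILE_LAZYIO) == 0,
              "cap bits must be distinct");

/* Cached reads are allowed with the cache cap, or with the lazy cap
 * when the file was opened in lazy mode. */
static bool may_read_from_cache(unsigned int issued_caps, bool file_mode_lazy)
{
    if (issued_caps & MODEL_CAP_FILE_CACHE)
        return true;
    return file_mode_lazy && (issued_caps & MODEL_CAP_FILE_LAZYIO) != 0;
}
```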

Signed-off-by: Sage Weil

Sage Weil
2010-08-02 11:11:39 +0800

18 May, 2010

2 commits

640ef79d2 ceph: use ceph_sb_to_client instead of ceph_client

ceph_sb_to_client and ceph_client are identical, so one of them should
go. The function name ceph_client is easily confused with "struct
ceph_client", whereas ceph_sb_to_client is clearer, so switch all
callers to ceph_sb_to_client.

-static inline struct ceph_client *ceph_client(struct super_block *sb)
-{
- return sb->s_fs_info;
-}

Signed-off-by: Cheng Renquan
Signed-off-by: Sage Weil

Cheng Renquan
2010-05-18 06:25:17 +0800

31459fe4b ceph: use __page_cache_alloc and add_to_page_cache_lru

Following Nick Piggin's patches in btrfs, pagecache pages should be
allocated with __page_cache_alloc, so they obey pagecache memory
policies.

Also use add_to_page_cache_lru instead of a private
pagevec where applicable.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-05-18 06:25:12 +0800

06 May, 2010

1 commit

54ad023ba ceph: don't use writeback_control in writepages completion

The ->writepages writeback_control is no longer valid in the writepages
completion. We were touching it solely to adjust pages_skipped when there
was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials),
causing an oops in the writeback code shortly thereafter. Updating
pages_skipped on error isn't correct anyway, so let's just rip out this
(clearly broken) code to pass the wbc to the completion.

Signed-off-by: Sage Weil

Sage Weil
2010-05-06 12:31:40 +0800

04 May, 2010

1 commit

7ff899da0 ceph: fix lockless caps check

The __ variant requires caller to hold i_lock.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:25 +0800

15 Apr, 2010

1 commit

96e35b40c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: use separate class for ceph sockets' sk_lock
ceph: reserve one more caps space when doing readdir
ceph: queue_cap_snap should always queue dirty context
ceph: fix dentry reference leak in dcache readdir
ceph: decode v5 of osdmap (pool names) [protocol change]
ceph: fix ack counter reset on connection reset
ceph: fix leaked inode ref due to snap metadata writeback race
ceph: fix snap context reference leaks
ceph: allow writeback of snapped pages older than 'oldest' snapc
ceph: fix dentry rehashing on virtual .snap dir

Linus Torvalds
2010-04-15 09:45:31 +0800

02 Apr, 2010

2 commits

6298a3375 ceph: fix snap context reference leaks

The get_oldest_context() helper takes a reference to the returned snap
context, but most callers weren't dropping that reference. Fix them.
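
The get/put pairing that the callers were missing can be illustrated with a toy reference count. This is a sketch with hypothetical names (`struct model_snapc`, `model_get_oldest_context()`, `model_put_context()`, `model_is_oldest()`); it shows the balanced pattern, not the actual Ceph helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy snap context with a reference count. */
struct model_snapc {
    int nref;
};

struct model_inode_snap {
    struct model_snapc *oldest;
};

/* Returns the oldest context with an extra reference taken. */
static struct model_snapc *model_get_oldest_context(struct model_inode_snap *ci)
{
    if (ci->oldest)
        ci->oldest->nref++;
    return ci->oldest;
}

/* Drops one reference; NULL is accepted for convenience. */
static void model_put_context(struct model_snapc *sc)
{
    if (!sc)
        return;
    assert(sc->nref > 0);
    sc->nref--;
}

/* A balanced caller: every get is paired with a put. */
static int model_is_oldest(struct model_inode_snap *ci,
                           const struct model_snapc *candidate)
{
    struct model_snapc *oldest = model_get_oldest_context(ci);
    int same = (oldest == candidate);

    model_put_context(oldest); /* the step leaky callers would omit */
    return same;
}
```

Omitting the `model_put_context()` call would leave `nref` one higher after every lookup, which is the kind of leak the commit fixes.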

Also drop the unused locked __get_oldest_context() variant.

Signed-off-by: Sage Weil

Sage Weil
2010-04-02 00:34:37 +0800

80e755fed ceph: allow writeback of snapped pages older than 'oldest' snapc

On snap deletion, we don't regenerate ceph_cap_snaps for inodes with dirty
pages because deletion does not affect metadata writeback. However, we
did run into problems when we went to write back the pages because the
'oldest' snapc is determined by the oldest cap_snap, and that may be the
newer snapc that reflects the deletion. This caused confusion and an
infinite loop in ceph_update_writeable_page().

Change the snapc checks to allow writeback of any snapc that is equal to
OR older than the 'oldest' snapc.
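
The relaxed check can be sketched as a comparison of sequence numbers. This is an assumption-laden model: it supposes that snap contexts are ordered by a `seq` field where a larger value means newer, and `struct model_snapc` and both predicates are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy snap context; assume a larger seq means a newer context. */
struct model_snapc {
    uint64_t seq;
};

/* Strict rule: only the 'oldest' context itself may be written back. */
static bool writeback_allowed_strict(const struct model_snapc *page_snapc,
                                     const struct model_snapc *oldest)
{
    assert(page_snapc != NULL && oldest != NULL);
    return page_snapc->seq == oldest->seq;
}

/* Relaxed rule: anything equal to or older than 'oldest' may be
 * written back. */
static bool writeback_allowed_relaxed(const struct model_snapc *page_snapc,
                                      const struct model_snapc *oldest)
{
    assert(page_snapc != NULL && oldest != NULL);
    return page_snapc->seq <= oldest->seq;
}
```

A page dirtied under an older context is rejected by the strict rule forever, which models the infinite loop; the relaxed rule lets it proceed.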

When there are no cap_snaps, we were also using the realm's latest snapc
for writeback, which complicates ceph_put_wrbuffer_cap_refs(). Instead,
use i_head_snapc, the snapc used for the most recent ('head') data.
This makes the writeback snapc (ceph_osd_request.r_snapc) _always_ match a
capsnap or i_head_snapc.

Also, in writepages_finish(), drop the snapc referenced by the _page_
and do not assume it matches the request snapc (it may not anymore).

Signed-off-by: Sage Weil

Sage Weil
2010-04-02 00:34:36 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
gfp.h; if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

23 Mar, 2010

1 commit

8f883c24d ceph: make write_begin wait propagate ERESTARTSYS

Currently, if the wait_event_interruptible is interrupted, we
return EAGAIN unconditionally and loop, such that we aren't, in
fact, interruptible. So, propagate ERESTARTSYS if we get it.
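
The effect on the retry loop can be modeled in a few lines. This is a sketch under stated assumptions: `MODEL_ERESTARTSYS` mirrors the kernel-internal value 512 (it is not in userspace errno.h), and `after_wait_old()`, `after_wait_new()` and `run_retry_loop()` are hypothetical helpers that only imitate the control flow described above.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal errno value, defined locally for this model. */
#define MODEL_ERESTARTSYS 512

/* Old behaviour: the wait result is discarded and the caller is
 * always told to retry. */
static int after_wait_old(int wait_ret)
{
    (void)wait_ret;
    return -EAGAIN;
}

/* New behaviour: an interrupted wait is passed up to the caller. */
static int after_wait_new(int wait_ret)
{
    if (wait_ret == -MODEL_ERESTARTSYS)
        return wait_ret;
    return -EAGAIN;
}

/* Retry while the step asks for it; returns the number of attempts.
 * max_attempts bounds the model so that it always terminates. */
static int run_retry_loop(int (*after_wait)(int), int wait_ret,
                          int max_attempts, int *result)
{
    int attempts = 0;

    assert(max_attempts > 0);
    do {
        *result = after_wait(wait_ret);
        attempts++;
    } while (*result == -EAGAIN && attempts < max_attempts);
    return attempts;
}
```

With an interrupted wait, the old mapping keeps looping until the bound is hit, while the new mapping leaves the loop on the first attempt.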

Signed-off-by: Sage Weil

Sage Weil
2010-03-23 22:47:03 +0800

24 Feb, 2010

1 commit

4ce1e9ada ceph: move dereference after NULL test

Signed-off-by: Alexander Beregalov
Signed-off-by: Sage Weil

Alexander Beregalov
2010-02-24 06:26:34 +0800

20 Feb, 2010

1 commit

e63dc5c78 ceph: remove page upon writeback completion if lost cache cap

This page should have been removed earlier when the cache cap was
revoked, but a writeback was in flight, so it was skipped. We truncate
it here just as the writeback finishes, while it's still locked.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-02-20 06:34:18 +0800

12 Feb, 2010

3 commits

3c6f6b79a ceph: cleanup async writeback, truncation, invalidate helpers

Grab inode ref in helper. Make work functions static, with consistent
naming.

Signed-off-by: Sage Weil

Sage Weil
2010-02-12 03:48:54 +0800

4af6b2257 ceph: refactor ceph_write_begin, fix ceph_page_mkwrite

Originally ceph_page_mkwrite called ceph_write_begin, hoping that
the returned locked page would be the page that it was requested
to mkwrite. Factored out relevant part of ceph_page_mkwrite and
we lock the right page anyway.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-02-12 03:48:50 +0800

b056c8769 ceph: remove unused variable

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-02-12 03:48:48 +0800

03 Feb, 2010

1 commit

79788c698 ceph: release all pages after successful osd write response

We release all the pages, even if the osd response was
different than the number of pages written. This could only
happen due to truncation that arrives at the osd in a
different order, for which we want the pages released anyway.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2010-02-03 08:34:04 +0800

26 Jan, 2010

1 commit

ec7384ec2 ceph: remove duplicate variable initialization

The variable client is initialized twice to the same (side effect-free)
expression. Drop one initialization.

A simplified version of the semantic match that finds this problem is:
(http://coccinelle.lip6.fr/)

//
@forall@
idexpression *x;
identifier f!=ERR_PTR;
@@

x = f(...)
... when != x
(
x = f(...,,...)
|
* x = f(...)
)
//

Signed-off-by: Julia Lawall
Signed-off-by: Sage Weil

Julia Lawall
2010-01-26 03:33:35 +0800

22 Dec, 2009

2 commits

2baba2501 ceph: writeback congestion control

Set bdi congestion bit when amount of write data in flight exceeds adjustable
threshold.
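
The thresholding can be sketched with a simple counter. This is an illustrative model only: `struct model_bdi` and both helpers are hypothetical, and clearing the bit as soon as the in-flight amount drops back to the threshold is an assumption, since the commit message only describes when the bit is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy backing-device state. */
struct model_bdi {
    long writeback_in_flight;  /* amount of write data in flight */
    long congestion_threshold; /* adjustable limit */
    bool congested;            /* models the bdi congestion bit */
};

/* Account for newly submitted writeback and set the bit once the
 * in-flight amount exceeds the threshold. */
static void model_writeback_start(struct model_bdi *bdi, long units)
{
    assert(units >= 0);
    bdi->writeback_in_flight += units;
    if (bdi->writeback_in_flight > bdi->congestion_threshold)
        bdi->congested = true;
}

/* Account for completed writeback; clearing here is an assumed policy. */
static void model_writeback_done(struct model_bdi *bdi, long units)
{
    assert(units >= 0 && units <= bdi->writeback_in_flight);
    bdi->writeback_in_flight -= units;
    if (bdi->writeback_in_flight <= bdi->congestion_threshold)
        bdi->congested = false;
}
```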

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2009-12-22 08:39:56 +0800

dbd646a85 ceph: writepage grabs and releases inode

Fixes a deadlock that is triggered due to kswapd,
while the page was locked and the iput couldn't tear
down the address space.

Signed-off-by: Yehuda Sadeh

Yehuda Sadeh
2009-12-22 08:39:56 +0800

28 Oct, 2009

1 commit

6b8051855 ceph: allocate and parse mount args before client instance

This simplifies much of the error handling during mount. It also means
that we have the mount args before client creation, and we can initialize
based on those options.

Signed-off-by: Sage Weil

Sage Weil
2009-10-28 02:57:03 +0800

07 Oct, 2009

1 commit

1d3576fd1 ceph: address space operations

The ceph address space methods are concerned primarily with managing
the dirty page accounting in the inode, which (among other things)
must keep track of which snapshot context each page was dirtied in,
and ensure that dirty data is written out to the OSDs in snapshot
order.

A writepage() on a page that is not currently writeable due to
snapshot writeback ordering constraints is ignored (it was presumably
called from kswapd).

Signed-off-by: Sage Weil

Sage Weil
2009-10-07 02:31:09 +0800