Eric Lee / smarc-fsl-linux-kernel

05 May, 2011

2 commits

bd355f8ae Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: do not call __mark_dirty_inode under i_lock
libceph: fix ceph_osdc_alloc_request error checks
ceph: handle ceph_osdc_new_request failure in ceph_writepages_start
libceph: fix ceph_msg_new error path
ceph: use ihold() when i_lock is held

Linus Torvalds
2011-05-05 05:22:20 +0800
fca65b4ad ceph: do not call __mark_dirty_inode under i_lock

The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the
one ceph caller that held i_lock (__ceph_mark_dirty_caps) to return the
flags value so that the callers can mark the inode dirty outside of i_lock.
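
The pattern described above can be sketched in user-space C. This is only an illustration, not the ceph code: a pthread mutex stands in for i_lock, and every name (struct demo_inode, demo_mark_dirty_caps, demo_mark_inode_dirty, demo_set_caps, DEMO_I_DIRTY_SYNC) is hypothetical.

```c
#include <pthread.h>

#define DEMO_I_DIRTY_SYNC 0x1u

/* Hypothetical stand-in for an inode protected by i_lock. */
struct demo_inode {
    pthread_mutex_t i_lock;
    unsigned int dirty_caps;
    unsigned int i_state;
};

/* Takes i_lock itself, so it must never be called with i_lock held. */
static void demo_mark_inode_dirty(struct demo_inode *inode, unsigned int flags)
{
    pthread_mutex_lock(&inode->i_lock);
    inode->i_state |= flags;
    pthread_mutex_unlock(&inode->i_lock);
}

/*
 * Called with i_lock held. Records the caps and returns the dirty flags
 * that the caller should apply once it has dropped the lock.
 */
static unsigned int demo_mark_dirty_caps(struct demo_inode *inode,
                                         unsigned int mask)
{
    unsigned int was_dirty = inode->dirty_caps;

    inode->dirty_caps |= mask;
    return was_dirty ? 0 : DEMO_I_DIRTY_SYNC;
}

static void demo_set_caps(struct demo_inode *inode, unsigned int mask)
{
    unsigned int dirty;

    pthread_mutex_lock(&inode->i_lock);
    dirty = demo_mark_dirty_caps(inode, mask);
    pthread_mutex_unlock(&inode->i_lock);

    /* The helper that takes i_lock runs only after the unlock. */
    if (dirty)
        demo_mark_inode_dirty(inode, dirty);
}
```

Returning the flags instead of acting on them keeps the locked section free of any call that would try to take the same lock again.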

Signed-off-by: Sage Weil

Sage Weil
2011-05-05 03:56:45 +0800

04 May, 2011

3 commits

cce2c56e7 logfs: initialize superblock entries earlier

In particular, s_freeing_list needs to be initialized early, since it is
used on some of the error paths when mounts fail. The mapping inode,
for example, would be initialized and then free'd on an error path
before s_freeing_list was initialized, but the inode drop operation
needs the s_freeing_list to be set up.

Normally you'd never see this, because not only is logfs fairly rare,
but a successful mount will never have any issues.

Reported-by: werner
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-05-04 07:10:25 +0800
8c71897be ceph: handle ceph_osdc_new_request failure in ceph_writepages_start

We should unlock the page and return -ENOMEM if ceph_osdc_new_request
failed.

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2011-05-04 00:28:12 +0800
3772d26d8 ceph: use ihold() when i_lock is held

See 0444d76ae64fffc7851797fc1b6ebdbb44ac504a.

Signed-off-by: Sage Weil

Sage Weil
2011-05-04 00:28:08 +0800

03 May, 2011

3 commits

adadfe48d Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6

* 'for-linus' of git://git.infradead.org/ubifs-2.6:
UBIFS: seek journal heads to the latest bud in replay
UBIFS: do not free write-buffers when in R/O mode

Linus Torvalds
2011-05-03 03:17:29 +0800
52c6e6f99 UBIFS: seek journal heads to the latest bud in replay

This is the second fix of the following symptom:

UBIFS error (pid 34456): could not find an empty LEB

which sometimes happens after power cuts when we mount the file-system - UBIFS
refuses it with the above error message which comes from the
'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test
with the UBIFS power cut emulation enabled.

Analysis of the problem.

Currently UBIFS replay seeks the journal heads to the last _replayed_ bud.
But the buds are replayed out-of-order, so the replay basically seeks journal
heads to the "random" bud belonging to this head, and not to the _last_ one.

The result of this is that the GC head may be seeked to a full LEB with no free
space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a
fully or mostly dirty LEB to match the current GC head (because we need to
garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum).
So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and
also fail. As a result - recovery fails and mounting fails.

This patch teaches the replay to initialize the GC heads exactly to the latest
buds, i.e. the buds which have the largest sequence number in corresponding
log reference nodes.
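
As a rough illustration of "latest bud", the sketch below picks, for one journal head, the bud with the largest sequence number regardless of the order in which buds were replayed. The struct and function names (demo_bud, demo_latest_bud) are hypothetical and do not come from the UBIFS sources.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a replayed bud. */
struct demo_bud {
    int jhead;                 /* journal head the bud belongs to */
    int lnum;                  /* LEB number of the bud */
    unsigned long long sqnum;  /* sequence number of its log reference node */
};

/*
 * Return the bud with the largest sequence number for the given journal
 * head, or NULL if the head has no buds. The array may be in any order.
 */
static const struct demo_bud *demo_latest_bud(const struct demo_bud *buds,
                                              size_t n, int jhead)
{
    const struct demo_bud *latest = NULL;

    for (size_t i = 0; i < n; i++) {
        if (buds[i].jhead != jhead)
            continue;
        if (latest == NULL || buds[i].sqnum > latest->sqnum)
            latest = &buds[i];
    }
    return latest;
}
```

Seeking to the last bud in array order would give a different answer whenever replay order and sequence order disagree, which is the failure the commit message describes.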

Signed-off-by: Artem Bityutskiy
Cc: stable@kernel.org

Artem Bityutskiy
2011-05-03 00:23:48 +0800
b50b9f408 UBIFS: do not free write-buffers when in R/O mode

Currently UBIFS has a small optimization - it frees write-buffers when it is
re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it
does not allocate write-buffers either.

This optimization is nice but it leads to subtle problems and complications
in recovery, which I can reproduce using the integck test. The symptoms are
that after a power cut the file-system cannot be mounted if we first mount
it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints:

UBIFS error (pid 34456): could not find an empty LEB

Analysis of the problem.

When mounting R/W, the replay process sets journal heads to buds [1], but
when mounting R/O - it does not do this, because the write-buffers are not
allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the
same file-system but for the following 2 cases:

1. mounting R/W after a power cut and recovering
2. mounting R/O after a power cut, re-mounting R/W and running deferred recovery

In the former case, we have journal heads seeked to a bud, in the latter
case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not
try to recover the GC LEB by garbage-collecting to the GC head, but we just
try to find an empty LEB, and there may be no empty LEBs, so we just fail.
On the other hand, in the former case (mount R/W), we are able to make a GC LEB
(@c->gc_lnum) by garbage-collecting.

Thus, let's remove this small nice optimization and always allocate
write-buffers. This should not make too big a difference - we have only 3
of them, each of max. write unit size, which is usually 2KiB. So this is
about 6KiB of RAM for the typical case, and only when mounted R/O.

[1]: Note, currently the replay process is setting (seeking) the journal heads
to _some_ buds, not necessarily to the buds which had been the journal heads
before the power cut happened. This will be fixed separately.

Signed-off-by: Artem Bityutskiy
Cc: stable@kernel.org

Artem Bityutskiy
2011-05-03 00:23:36 +0800

29 Apr, 2011

2 commits

9cab1ba42 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs: don't lose MS_SYNCHRONOUS on remount of noac mount
NFS: Return meaningful status from decode_secinfo()
NFSv4: Ensure we request the ordinary fileid when doing readdirplus
NFSv4: Ensure that clientid and session establishment can time out
SUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIO
NFSv4.1: Don't loop forever in nfs4_proc_create_session
NFSv4: Handle NFS4ERR_WRONGSEC outside of nfs4_handle_exception()
NFSv4.1: Don't update sequence number if rpc_task is not sent
NFSv4.1: Ensure state manager thread dies on last umount
SUNRPC: Fix the SUNRPC Kerberos V RPCSEC_GSS module dependencies
NFS: Use correct variable for page bounds checking
NFS: don't negotiate when user specifies sec flavor
NFS: Attempt mount with default sec flavor first
NFS: flav_array honors NFS_MAX_SECFLAVORS
NFS: Fix infinite loop in gss_create_upcall()
Don't mark_inode_dirty_sync() while holding lock
NFS: Get rid of pointless test in nfs_commit_done
NFS: Remove unused argument from nfs_find_best_sec()
NFS: Eliminate duplicate call to nfs_mark_request_dirty
NFS: Remove dead code from nfs_fs_mount()

Linus Torvalds
2011-04-29 04:13:07 +0800
6d4831c28 vfs: avoid large kmalloc()s for the fdtable

Azurit reports large increases in system time after 2.6.36 when running
Apache. It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
to allocate fdmem if possible").

That patch caused the vfs to use kmalloc() for very large allocations and
this is causing excessive work (and presumably excessive reclaim) within
the page allocator.

Fix it by falling back to vmalloc() earlier - when the allocation attempt
would have been considered "costly" by reclaim.
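
The size test can be sketched as follows. This is an assumption-laden illustration rather than the patch itself: the 4 KiB page size and the costly order of 3 are assumed values, and demo_pick_allocator is a hypothetical helper that only reports which allocator would be tried first.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE    4096UL  /* assumed page size */
#define DEMO_COSTLY_ORDER 3       /* assumed "costly" allocation order */

enum demo_allocator { DEMO_KMALLOC, DEMO_VMALLOC };

/*
 * Allocations no larger than (page size << costly order) try the
 * physically contiguous allocator first; anything larger goes straight
 * to the virtually contiguous one.
 */
static enum demo_allocator demo_pick_allocator(size_t size)
{
    if (size <= (DEMO_PAGE_SIZE << DEMO_COSTLY_ORDER))
        return DEMO_KMALLOC;
    return DEMO_VMALLOC;
}
```

With these assumed values the cut-over point is 32 KiB, so a very large fd table no longer forces the page allocator to hunt for a big contiguous region.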

Reported-by: azurIt
Tested-by: azurIt
Acked-by: Changli Gao
Cc: Americo Wang
Cc: Jiri Slaby
Acked-by: Eric Dumazet
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2011-04-29 02:28:20 +0800

28 Apr, 2011

3 commits

26c4c1707 nfs: don't lose MS_SYNCHRONOUS on remount of noac mount

On a remount, the VFS layer will clear the MS_SYNCHRONOUS bit on the
assumption that the flags on the mount syscall will have it set if the
remounted fs is supposed to keep it.

In the case of "noac" though, MS_SYNCHRONOUS is implied. A remount of
such a mount will lose the MS_SYNCHRONOUS flag since "sync" isn't part
of the mount options.
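
One way to picture the fix is a remount helper that puts the flag back whenever "noac" is in effect. The sketch below is hypothetical (demo_remount_flags is not an NFS function); the only borrowed fact is the numeric value of MS_SYNCHRONOUS from <sys/mount.h>.

```c
#include <stdbool.h>

#define DEMO_MS_SYNCHRONOUS 16UL  /* same value as MS_SYNCHRONOUS */

/*
 * The flags arriving on remount have already lost MS_SYNCHRONOUS unless
 * the caller passed "sync". Since "noac" implies synchronous writes,
 * restore the bit when noac is set.
 */
static unsigned long demo_remount_flags(unsigned long requested, bool noac)
{
    if (noac)
        requested |= DEMO_MS_SYNCHRONOUS;
    return requested;
}
```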

Reported-by: Max Matveev
Signed-off-by: Jeff Layton
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust

Jeff Layton
2011-04-28 04:20:01 +0800
613e901e1 NFS: Return meaningful status from decode_secinfo()

When compiling, I was getting this warning:
fs/nfs/nfs4xdr.c: In function ‘decode_secinfo’:
fs/nfs/nfs4xdr.c:4839:6: warning: variable ‘status’ set but not used
[-Wunused-but-set-variable]

We were unconditionally returning 0 as long as there wasn't an error
coming out of xdr_inline_decode(). We probably want to check the error
status coming out of decode_op_hdr() and decode_secinfo_gss(), rather
than assuming that everything is OK all the time.

Signed-off-by: Bryan Schumaker
Signed-off-by: Trond Myklebust

Bryan Schumaker
2011-04-28 04:17:29 +0800
28331a46d NFSv4: Ensure we request the ordinary fileid when doing readdirplus

When readdir() returns a directory entry for the root of a mounted
filesystem, Linux follows the old convention of returning the inode
number of the covered directory (despite newer versions of POSIX declaring
that this is a bug).
To ensure this continues to work, the NFSv4 readdir implementation requests
the 'mounted-on-fileid' from the server.

However, readdirplus also needs to instantiate an inode for this entry, and
for that, we also need to request the real fileid as per this patch.

Signed-off-by: Trond Myklebust

Trond Myklebust
2011-04-28 03:57:16 +0800

27 Apr, 2011

1 commit

e9c549998 Revert wrong fixes for common misspellings

These changes were incorrectly fixed by codespell. They have now been
manually corrected.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-04-27 14:31:11 +0800

26 Apr, 2011

18 commits

019793b75 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: cleanup error handling in inode.c
Btrfs: put the right bio if we have an error
Btrfs: free bitmaps properly when evicting the cache
Btrfs: Free free_space item properly in btrfs_trim_block_group()
btrfs: add missing spin_unlock to a rare exit path
Btrfs: check return value of kmalloc()
btrfs: fix wrong allocating flag when reading page
Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()

Linus Torvalds
2011-04-26 23:26:58 +0800
cb49f5778 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: do some plugging in the submit_bio threads

Linus Torvalds
2011-04-26 23:25:16 +0800
f727a938c Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
CIFS: Fix memory over bound bug in cifs_parse_mount_options

Linus Torvalds
2011-04-26 11:38:50 +0800
cd2e49e90 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Flush dirty pages in setattr
eCryptfs: Handle failed metadata read in lookup
eCryptfs: Add reference counting to lower files
eCryptfs: dput dentries returned from dget_parent
eCryptfs: Remove extra d_delete in ecryptfs_rmdir

Linus Torvalds
2011-04-26 10:01:12 +0800
1879fd6a2 add hlist_bl_lock/unlock helpers

Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all, allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.
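
The idea of keeping the lock inside the list head can be sketched in portable C11, using bit 0 of the head pointer as the lock bit. This is an illustration with hypothetical names (demo_bl_head, demo_bl_lock, demo_bl_unlock, demo_bl_is_locked) and C11 atomics, not the kernel's bit spinlock implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Head of a list whose lowest pointer bit doubles as a lock bit. */
struct demo_bl_head {
    _Atomic uintptr_t first;
};

/* Spin until this caller is the one that flips bit 0 from 0 to 1. */
static void demo_bl_lock(struct demo_bl_head *h)
{
    while (atomic_fetch_or_explicit(&h->first, (uintptr_t)1,
                                    memory_order_acquire) & 1)
        ;
}

/* Clear bit 0 and leave the pointer bits untouched. */
static void demo_bl_unlock(struct demo_bl_head *h)
{
    atomic_fetch_and_explicit(&h->first, ~(uintptr_t)1,
                              memory_order_release);
}

static int demo_bl_is_locked(struct demo_bl_head *h)
{
    return (int)(atomic_load(&h->first) & 1);
}
```

Callers then write demo_bl_lock(head) and demo_bl_unlock(head) instead of open-coding the bit manipulation at every site.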

Signed-off-by: Christoph Hellwig
Signed-off-by: Linus Torvalds

Christoph Hellwig
2011-04-26 09:14:10 +0800
5be79de2e eCryptfs: Flush dirty pages in setattr

After 57db4e8d73ef2b5e94a3f412108dff2576670a8a changed eCryptfs to
write-back caching, eCryptfs page writeback updates the lower inode
times due to the use of vfs_write() on the lower file.

To preserve inode metadata changes, such as 'cp -p' does with
utimensat(), we need to flush all dirty pages early in
ecryptfs_setattr() so that the user-updated lower inode metadata isn't
clobbered later in writeback.

https://bugzilla.kernel.org/show_bug.cgi?id=33372

Reported-by: Rocko
Signed-off-by: Tyler Hicks

Tyler Hicks
2011-04-26 07:49:46 +0800
3aeb86ea4 eCryptfs: Handle failed metadata read in lookup

When failing to read the lower file's crypto metadata during a lookup,
eCryptfs must continue on without throwing an error. For example, there
may be a plaintext file in the lower mount point that the user wants to
delete through the eCryptfs mount.

If an error is encountered while reading the metadata in lookup(), the
eCryptfs inode's size could be incorrect. We must be sure to reread the
plaintext inode size from the metadata when performing an open() or
setattr(). The metadata is already being read in those paths, so this
adds minimal performance overhead.

This patch introduces a flag which will track whether or not the
plaintext inode size has been read so that an incorrect i_size can be
fixed in the open() or setattr() paths.

https://bugs.launchpad.net/bugs/509180

Signed-off-by: Tyler Hicks

Tyler Hicks
2011-04-26 07:45:06 +0800
7cf96da3e Btrfs: cleanup error handling in inode.c

Error handling in several places is changed, for example so that the
error number is set only when an error actually occurs.

Signed-off-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Tsutomu Itoh
2011-04-26 07:43:53 +0800
64728bbbf Btrfs: put the right bio if we have an error

In btrfs_submit_direct_hook if the first btrfs_map_block fails we need to put
the orig_bio, not bio.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-04-26 07:43:52 +0800
a4f0162fd Btrfs: free bitmaps properly when evicting the cache

If our space cache is wrong, we do the right thing and free up everything that
we loaded, however we don't reset the total_bitmaps counter or the thresholds or
anything. So in btrfs_remove_free_space_cache make sure to call free_bitmap()
if it's a bitmap, this will keep us from panicking when we check to make sure we
don't have too many bitmaps. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-04-26 07:43:52 +0800
f789b684b Btrfs: Free free_space item properly in btrfs_trim_block_group()

Since commit dc89e9824464e91fa0b06267864ceabe3186fd8b, we've changed
to use a specific slab for allocation of free_space items.

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-04-26 07:43:52 +0800
cfece4db1 btrfs: add missing spin_unlock to a rare exit path

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-04-26 07:43:52 +0800
8d413713c Btrfs: check return value of kmalloc()

Checks on the return value of kmalloc() are added in several places.

Signed-off-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Tsutomu Itoh
2011-04-26 07:43:52 +0800
43e817a1f btrfs: fix wrong allocating flag when reading page

The space cache uses extent_readpages() to read free space information,
so we cannot use the GFP_KERNEL flag to allocate memory, or it may lead
to deadlock.

Signed-off-by: Itaru Kitayama
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Itaru Kitayama
2011-04-26 07:43:51 +0800
a62f44a5f Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()

It is necessary to unlock the mutex before returning an error when
btrfs_alloc_path() fails.
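
The usual shape of such a fix is a single exit label that always drops the lock. The sketch below shows that shape in user-space C; demo_log_mutex and demo_del_entries are hypothetical, and a path_size of 0 is used purely to simulate the allocation failure.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t demo_log_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 on success or -ENOMEM; the mutex is released on both paths. */
static int demo_del_entries(size_t path_size)
{
    int ret = 0;
    void *path;

    pthread_mutex_lock(&demo_log_mutex);

    /* A size of 0 stands in for a failed path allocation. */
    path = path_size ? malloc(path_size) : NULL;
    if (path == NULL) {
        ret = -ENOMEM;
        goto out_unlock;  /* do not return with the mutex held */
    }

    /* ... work on the log would happen here ... */
    free(path);

out_unlock:
    pthread_mutex_unlock(&demo_log_mutex);
    return ret;
}
```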

Signed-off-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Tsutomu Itoh
2011-04-26 07:43:51 +0800
332ab16f8 eCryptfs: Add reference counting to lower files

For any given lower inode, eCryptfs keeps only one lower file open and
multiplexes all eCryptfs file operations through that lower file. The
lower file was considered "persistent" and stayed open from the first
lookup through the lifetime of the inode.

This patch keeps the notion of a single, per-inode lower file, but adds
reference counting around the lower file so that it is closed when not
currently in use. If the reference count is at 0 when an operation (such
as open, create, etc.) needs to use the lower file, a new lower file is
opened. Since the file is no longer persistent, all references to the
term persistent file are changed to lower file.

Locking is added around the sections of code that open the lower file
and assign the pointer in the inode info, as well as the code that fputs
the lower file when all eCryptfs users are done with it.

This patch is needed to fix issues, when mounted on top of the NFSv3
client, where the lower file is left silly renamed until the eCryptfs
inode is destroyed.
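
The get/put scheme described above can be sketched as follows. All names (demo_inode_info, demo_get_lower_file, demo_put_lower_file) are hypothetical, a pthread mutex stands in for the locking the patch adds, and an integer flag stands in for the lower file pointer.

```c
#include <pthread.h>

/* Hypothetical per-inode state for the single shared lower file. */
struct demo_inode_info {
    pthread_mutex_t lower_file_mutex;
    int lower_file_count;  /* current users of the lower file */
    int lower_file_open;   /* 1 while the lower file is open */
    int opens;             /* how many times it has been opened */
};

/* Take a reference, opening the lower file for the first user. */
static void demo_get_lower_file(struct demo_inode_info *info)
{
    pthread_mutex_lock(&info->lower_file_mutex);
    if (info->lower_file_count++ == 0) {
        info->lower_file_open = 1;
        info->opens++;
    }
    pthread_mutex_unlock(&info->lower_file_mutex);
}

/* Drop a reference, closing the lower file when the last user is done. */
static void demo_put_lower_file(struct demo_inode_info *info)
{
    pthread_mutex_lock(&info->lower_file_mutex);
    if (--info->lower_file_count == 0)
        info->lower_file_open = 0;
    pthread_mutex_unlock(&info->lower_file_mutex);
}
```

Because the file is closed as soon as the count reaches zero, a lower filesystem is not left holding a file open for the whole lifetime of the inode.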

Signed-off-by: Tyler Hicks

Tyler Hicks
2011-04-26 07:32:37 +0800
dd55c8985 eCryptfs: dput dentries returned from dget_parent

Call dput on the dentries previously returned by dget_parent() in
ecryptfs_rename(). This is needed for supporting eCryptfs mounts on top
of the NFSv3 client.

Signed-off-by: Tyler Hicks

Tyler Hicks
2011-04-26 07:32:36 +0800
35ffa948b eCryptfs: Remove extra d_delete in ecryptfs_rmdir

vfs_rmdir() already calls d_delete() on the lower dentry. That was being
duplicated in ecryptfs_rmdir() and caused a NULL pointer dereference
when NFSv3 was the lower filesystem.

Signed-off-by: Tyler Hicks

Tyler Hicks
2011-04-26 07:32:35 +0800

25 Apr, 2011

2 commits

1bd714f2a NFSv4: Ensure that clientid and session establishment can time out

The following patch ensures that we do not get permanently trapped in
the RPC layer when trying to establish a new client id or session.
This again ensures that the state manager can finish in a timely
fashion when the last filesystem to reference the nfs_client exits.

Signed-off-by: Trond Myklebust

Trond Myklebust
2011-04-25 02:29:33 +0800
fd954ae12 NFSv4.1: Don't loop forever in nfs4_proc_create_session

If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end
up looping forever inside nfs4_proc_create_session, and so the usual
mechanisms for detecting if the nfs_client is dead don't work.

Fix this by ensuring that we loop inside the nfs4_state_manager thread
instead.

Signed-off-by: Trond Myklebust

Trond Myklebust
2011-04-25 02:28:18 +0800

24 Apr, 2011

4 commits

5dd12af05 Merge branch 'dcache-cleanup'

* dcache-cleanup:
vfs: get rid of insane dentry hashing rules

Linus Torvalds
2011-04-24 23:51:15 +0800
1f91f48b6 Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6

* 'for-linus' of git://git.infradead.org/ubifs-2.6:
UBIFS: fix master node recovery
UBIFS: fix false assertion warning in case of I/O failures
UBIFS: fix false space checking failure

Linus Torvalds
2011-04-24 23:42:15 +0800
dea3667bc vfs: get rid of insane dentry hashing rules

The dentry hashing rules have been really quite complicated for a long
while, in odd ways. That made functions like __d_drop() very fragile
and non-obvious.

In particular, whether a dentry was hashed or not was indicated with an
explicit DCACHE_UNHASHED bit. That's despite the fact that the hash
abstraction that the dentries use actually has a 'is this entry hashed
or not' model (which is a simple test of the 'pprev' pointer).

The reason that was done is because we used the normal 'is this entry
unhashed' model to mark whether the dentry had _ever_ been hashed in the
dentry hash tables, and that logic goes back many years (commit
b3423415fbc2: "dcache: avoid RCU for never-hashed dentries").

That, in turn, meant that __d_drop had totally different unhashing logic
for the dentry hash table case and for the anonymous dcache case,
because in order to use the "is this dentry hashed" logic as a flag for
whether it had ever been on the RCU hash table, we had to unhash such a
dentry differently so that we'd never think that it wasn't 'unhashed'
and wouldn't be free'd correctly.

That's just insane. It made the logic really hard to follow, when there
were two different kinds of "unhashed" states, and one of them (the one
that used "hlist_bl_unhashed()") really had nothing at all to do with
being unhashed per se, but with a very subtle lifetime rule instead.

So turn all of it around, and make it logical.

Instead of having a DCACHE_UNHASHED bit in d_flags to indicate whether
the dentry is on the hash chains or not, use the hash chain unhashed
logic for that. Suddenly "d_unhashed()" just uses "hlist_bl_unhashed()",
and everything makes sense.

And for the lifetime rule, just use an explicit DCACHE_RCUACCESS bit.
If we ever insert the dentry into the dentry hash table so that it is
visible to RCU lookup, we mark it DCACHE_RCUACCESS to show that it now
needs the RCU lifetime rules. Now suddenly that test at dentry free
time makes sense too.

And because unhashing now is sane and doesn't depend on where the dentry
got unhashed from (because the dentry hash chain details don't have
some subtle side effects), we can re-unify the __d_drop() logic and use
common code for the unhashing.
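
The two separated concerns can be sketched with a plain pprev-style list: "is it hashed" comes from the list node alone, and "does it need RCU lifetime rules" comes from a sticky flag. Everything here (demo_node, demo_dentry, DEMO_RCUACCESS and the helper functions) is hypothetical and much simpler than the dcache code.

```c
#include <stddef.h>

#define DEMO_RCUACCESS 0x1u  /* set once the dentry has ever been hashed */

struct demo_node {
    struct demo_node *next, **pprev;
};

struct demo_dentry {
    struct demo_node d_hash;
    unsigned int d_flags;
};

/* Hashed state is read from the list node alone. */
static int demo_unhashed(const struct demo_dentry *d)
{
    return d->d_hash.pprev == NULL;
}

/* Insert at the head of a chain and record that lookups may now see it. */
static void demo_hash(struct demo_dentry *d, struct demo_node **head)
{
    d->d_hash.next = *head;
    if (*head)
        (*head)->pprev = &d->d_hash.next;
    d->d_hash.pprev = head;
    *head = &d->d_hash;
    d->d_flags |= DEMO_RCUACCESS;
}

/* One unhashing routine, independent of which chain the dentry is on. */
static void demo_drop(struct demo_dentry *d)
{
    if (demo_unhashed(d))
        return;
    *d->d_hash.pprev = d->d_hash.next;
    if (d->d_hash.next)
        d->d_hash.next->pprev = d->d_hash.pprev;
    d->d_hash.next = NULL;
    d->d_hash.pprev = NULL;
}

/* The lifetime rule survives unhashing because the flag is never cleared. */
static int demo_needs_rcu_free(const struct demo_dentry *d)
{
    return (d->d_flags & DEMO_RCUACCESS) != 0;
}
```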

Also fix one more open-coded hash chain bit_spin_lock() that I missed in
the previous chain locking cleanup commit.

Signed-off-by: Linus Torvalds

Linus Torvalds
2011-04-24 22:58:46 +0800
b07ad9967 vfs: get rid of 'struct dcache_hash_bucket' abstraction

It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
help anything - quite the reverse. All the users end up having to know
about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
etc. So it just makes the code look confusing.

And the cost of it is extra '&b->head' syntactic noise, but more
importantly it spuriously makes the hash table dentry list look
different from the per-superblock DCACHE_DISCONNECTED dentry list.

As a result, the code ended up using ad-hoc locking for one case and
special helper functions for what is really another totally identical
case in the very same function.

Make it all look and work the same.

Signed-off-by: Linus Torvalds

Linus Torvalds
2011-04-24 13:32:03 +0800

22 Apr, 2011

2 commits

4906e50b3 CIFS: Fix memory over bound bug in cifs_parse_mount_options

During password processing we can run past the end of the options array
if the character immediately after the array is a delimiter. The patch
adds a check for reaching the end of the array.
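
A bounded scan of this kind might look like the sketch below. It assumes, for illustration only, that a doubled delimiter inside the password is an escaped literal; demo_password_span is a hypothetical helper, not the CIFS parser.

```c
#include <stddef.h>

/*
 * Return the number of bytes of buf[0..len) that belong to the password.
 * A doubled delimiter is treated as an escaped literal (an assumption of
 * this sketch). The byte after a delimiter is inspected only when it is
 * still inside the buffer.
 */
static size_t demo_password_span(const char *buf, size_t len, char delim)
{
    size_t i = 0;

    while (i < len) {
        if (buf[i] == delim) {
            /* Bounds check before peeking at the next byte. */
            if (i + 1 < len && buf[i + 1] == delim) {
                i += 2;
                continue;
            }
            break;
        }
        i++;
    }
    return i;
}
```

Without the `i + 1 < len` test, a delimiter in the last position would cause a read of whatever byte happens to follow the array.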

Signed-off-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Steve French

Pavel Shilovsky
2011-04-22 01:22:43 +0800
37fc67c9f Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix duplicate message output

Linus Torvalds
2011-04-22 01:01:26 +0800