Eric Lee / smarc-fsl-linux-kernel

22 Apr, 2010

1 commit

989a29792 fasync: RCU and fine grained locking

kill_fasync() uses a central rwlock, candidate for RCU conversion, to
avoid cache line ping pongs on SMP.

fasync_remove_entry() and fasync_add_entry() can disable IRQs for a short
section instead of during the whole list scan.

Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and
fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync()
doesn't need its own implementation and can use fasync_helper(), to
reduce code size and complexity.

We can remove __kill_fasync() direct use in net/socket.c, and rename it
to kill_fasync_rcu().

Signed-off-by: Eric Dumazet
Cc: Paul E. McKenney
Cc: Lai Jiangshan
Signed-off-by: David S. Miller

Eric Dumazet
2010-04-22 07:19:29 +0800

21 Apr, 2010

2 commits

05ce7bfe5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Convert __DQUOT_PARANOIA symbol to standard config option

Linus Torvalds
2010-04-21 00:39:40 +0800
62af9b520 quota: Convert __DQUOT_PARANOIA symbol to standard config option

Make __DQUOT_PARANOIA define from the old days a standard config option
and turn it off by default.

This gets rid of a quota warning about writes before quota is turned on
for systems with ext4 root filesystem. Currently there's no way to legally
solve this because /etc/mtab has to be written before quota is turned on
on most systems.

Signed-off-by: Jan Kara

Jan Kara
2010-04-21 00:25:25 +0800

20 Apr, 2010

6 commits

9b030e200 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Turn lower lookup error messages into debug messages
eCryptfs: Copy lower directory inode times and size on link
ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode
ecryptfs: fix error code for missing xattrs in lower fs
eCryptfs: Decrypt symlink target for stat size
eCryptfs: Strip metadata in xattr flag in encrypted view
eCryptfs: Clear buffer before reading in metadata xattr
eCryptfs: Rename ecryptfs_crypt_stat.num_header_bytes_at_front
eCryptfs: Fix metadata in xattr feature regression

Linus Torvalds
2010-04-20 05:20:32 +0800
9f37622f8 eCryptfs: Turn lower lookup error messages into debug messages

Vague warnings about ENAMETOOLONG errors when looking up an encrypted
file name have caused many users to become concerned about their data.
Since this is a rather harmless condition, I'm moving this warning to
only be printed when the ecryptfs_verbosity module param is 1.

Signed-off-by: Tyler Hicks

Tyler Hicks
2010-04-20 03:42:18 +0800
3a8380c07 eCryptfs: Copy lower directory inode times and size on link

The timestamps and size of a lower inode involved in a link() call were
being copied to the upper parent inode. Instead, we should be
copying the lower parent inode's timestamps and size to the upper parent
inode. I discovered this bug using the POSIX test suite at Tuxera.

Signed-off-by: Tyler Hicks

Tyler Hicks
2010-04-20 03:42:15 +0800
133b8f9d6 ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode

Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.

When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
2008 and several months later commit caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.

The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.

As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().

This patch removes the d_drop call and fixes both issues.

This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887

Reported-by: Árpád Bíró
Signed-off-by: Jeff Mahoney
Cc: Dustin Kirkland
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks

Jeff Mahoney
2010-04-20 03:42:13 +0800
cfce08c6b ecryptfs: fix error code for missing xattrs in lower fs

If the lower file system driver has extended attributes disabled,
ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
This breaks execution of programs in the ecryptfs mount, since the
kernel expects the latter error when checking for security
capabilities in xattrs.
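
A minimal userspace sketch of the convention described above, not the eCryptfs code itself: a stacking layer reports -EOPNOTSUPP when the lower layer provides no xattr handler. The `lower_ops` structure and the `stacked_getxattr()` name are hypothetical and exist only for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the lower filesystem's operations table. */
struct lower_ops {
    long (*getxattr)(const char *name, void *buf, size_t size);
};

/* Forward to the lower handler, or report "operation not supported". */
static long stacked_getxattr(const struct lower_ops *ops, const char *name,
                             void *buf, size_t size)
{
    if (ops->getxattr == NULL)
        return -EOPNOTSUPP; /* not -ENOSYS: callers probing xattrs expect this */
    return ops->getxattr(name, buf, size);
}
```

A caller that probes for an attribute can then treat -EOPNOTSUPP as "this filesystem has no xattrs" and carry on, which is the behaviour the commit message says the kernel relies on.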

Signed-off-by: Christian Pulvermacher
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks

Christian Pulvermacher
2010-04-20 03:42:09 +0800
3a60a1686 eCryptfs: Decrypt symlink target for stat size

Create a getattr handler for eCryptfs symlinks that is capable of
reading the lower target and decrypting its path. Prior to this patch,
a stat's st_size field would represent the strlen of the encrypted path,
while readlink() would return the strlen of the decrypted path. This
could lead to confusion in some userspace applications, since the two
values should be equal.

https://bugs.launchpad.net/bugs/524919

Reported-by: Loïc Minier
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks

Tyler Hicks
2010-04-20 03:41:51 +0800

17 Apr, 2010

2 commits

f1d486a36 xfs: don't warn on EAGAIN in inode reclaim

Any inode reclaim flush that returns EAGAIN will result in the inode
reclaim being attempted again later. There is no need to issue a
warning into the logs about this situation.

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Signed-off-by: Alex Elder

Dave Chinner
2010-04-17 02:51:44 +0800
b6f8dd49d xfs: ensure that sync updates the log tail correctly

Updates to the VFS layer removed an extra ->sync_fs call into the
filesystem during the sync process (from the quota code).
Unfortunately the sync code was unknowingly relying on this call to
make sure metadata buffers were flushed via a xfs_buftarg_flush()
call to move the tail of the log forward in memory before the final
transactions of the sync process were issued.

As a result, the old code would write a very recent log tail value
to the log by the end of the sync process, and so a subsequent crash
would leave nothing for log recovery to do. Hence in qa test 182,
log recovery only replayed a small handful of inode fsync
transactions in this case.

However, with the removal of the extra ->sync_fs call, the log tail
was now not moved forward with the inode fsync transactions near the
end of the sync process: the first (and only) buftarg flush occurred
after these transactions went to disk. The result is that log
recovery now sees a large number of transactions for metadata that
is already on disk.

This usually isn't a problem, but when the transactions include
inode chunk allocation, the inode create transactions and all
subsequent changes are replayed, as we cannot rely on what is on disk
being valid. As a result, if the inode was written and contains
unlogged changes, the unlogged changes are lost, thereby violating
sync semantics.

The fix is to always issue a transaction after the buftarg flush
occurs if the log is not idle or covered. This results in a dummy
transaction being written that contains the up-to-date log tail
value, which will be very recent. Indeed, it will be at least as
recent as the old code would have left on disk, so log recovery
will behave exactly as it used to in this situation.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-04-17 02:51:23 +0800

15 Apr, 2010

1 commit

96e35b40c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: use separate class for ceph sockets' sk_lock
ceph: reserve one more caps space when doing readdir
ceph: queue_cap_snap should always queue dirty context
ceph: fix dentry reference leak in dcache readdir
ceph: decode v5 of osdmap (pool names) [protocol change]
ceph: fix ack counter reset on connection reset
ceph: fix leaked inode ref due to snap metadata writeback race
ceph: fix snap context reference leaks
ceph: allow writeback of snapped pages older than 'oldest' snapc
ceph: fix dentry rehashing on virtual .snap dir

Linus Torvalds
2010-04-15 09:45:31 +0800

14 Apr, 2010

4 commits

0fdfe5ad2 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFSv4: fix delegated locking
NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible
NFS: Fix a race with the new commit code
NFS: Ensure that writeback_single_inode() calls write_inode() when syncing
NFS: Fix the mode calculation in nfs_find_open_context
NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR

Linus Torvalds
2010-04-14 06:10:16 +0800
a6a5349d1 ceph: use separate class for ceph sockets' sk_lock

Use a separate class for ceph sockets to prevent lockdep confusion.
Because ceph sockets only get passed kernel pointers, there is no
dependency from sk_lock -> mmap_sem. If we share the same class as other
sockets, lockdep detects a circular dependency from

mmap_sem (page fault) -> fs mutex -> sk_lock -> mmap_sem

because dependencies are noted from both ceph and user contexts. Using
a separate class prevents the sk_lock(ceph) -> mmap_sem dependency and
makes lockdep happy.

Signed-off-by: Sage Weil

Sage Weil
2010-04-14 05:07:07 +0800
e1e4dd0ca ceph: reserve one more caps space when doing readdir

We were missing space for the directory cap. The result was a BUG at
fs/ceph/caps.c:2178.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-04-14 03:28:54 +0800
fc837c8f0 ceph: queue_cap_snap should always queue dirty context

This simplifies the calling convention, and fixes a bug where we queue a
capsnap with a context other than i_head_snapc (the one that matches the
dirty pages). The result was a BUG at fs/ceph/caps.c:2178 on writeback
completion when a capsnap matching the writeback snapc could not be found.

Signed-off-by: Sage Weil

Sage Weil
2010-04-14 03:28:31 +0800

13 Apr, 2010

9 commits

d6cf853d4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: make sure the chunk allocator doesn't create zero length chunks
Btrfs: fix data enospc check overflow

Linus Torvalds
2010-04-13 09:37:04 +0800
6a945f38b Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Fix possible dq_flags corruption
quota: Hide warnings about writes to the filesystem before quota was turned on
ext3: symlink must be handled via filesystem specific operation
ext2: symlink must be handled via filesystem specific operation

Linus Torvalds
2010-04-13 09:36:49 +0800
50fc88cb0 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: add speciffic ->setattr callback
udf: potential integer overflow

Linus Torvalds
2010-04-13 09:36:34 +0800
44fa2b4be Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix typo "numer" -> "number" in alloc.c
nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v()
nilfs2: fix a wrong type conversion in nilfs_ioctl()

Linus Torvalds
2010-04-13 09:34:25 +0800
f5b066287 ceph: fix dentry reference leak in dcache readdir

When filldir returned an error (e.g. buffer full for a large directory),
we would leak a dentry reference, causing an oops on umount.

Signed-off-by: Sage Weil

Sage Weil
2010-04-13 05:25:51 +0800
08261673c quota: Fix possible dq_flags corruption

dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and
atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a
change done by an atomic operation can be overwritten by a change done by a
non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk.
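
The difference between the two kinds of update can be sketched in userspace with C11 atomics. This is an analogy under stated assumptions, not the kernel's bitops: the `flag_set()`, `flag_clear()` and `flag_test()` helpers are hypothetical names. A plain read-modify-write of the whole word (the analogue of `__set_bit`) could overwrite a bit that another thread changed in between; the atomic fetch-or and fetch-and below cannot.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set one bit without disturbing concurrent updates to other bits. */
static void flag_set(atomic_ulong *flags, unsigned int bit)
{
    atomic_fetch_or(flags, 1UL << bit);
}

/* Clear one bit, again as a single atomic read-modify-write. */
static void flag_clear(atomic_ulong *flags, unsigned int bit)
{
    atomic_fetch_and(flags, ~(1UL << bit));
}

/* Read the current state of one bit. */
static bool flag_test(atomic_ulong *flags, unsigned int bit)
{
    return (atomic_load(flags) & (1UL << bit)) != 0;
}
```

As long as every writer of the word goes through helpers like these, no update is lost; mixing them with a non-atomic writer reintroduces the race the commit describes.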

Signed-off-by: Andrew Perepechko
Signed-off-by: Jan Kara

Andrew Perepechko
2010-04-13 03:12:36 +0800
4c5e6c0e7 quota: Hide warnings about writes to the filesystem before quota was turned on

For a root filesystem, writes to the filesystem before quota is turned on happen
regularly and there's no way around it because of writes to syslog, /etc/mtab,
and similar. So the warning is rather pointless for ordinary users. It's
still useful during development so we just hide the warning behind the
__DQUOT_PARANOIA config option.

Signed-off-by: Jan Kara

Jan Kara
2010-04-13 03:12:19 +0800
774f03fb2 ext3: symlink must be handled via filesystem specific operation

The generic setattr implementation is no longer responsible for
quota transfer, so symlinks must be handled via ext3_setattr.

Signed-off-by: Dmitry Monakhov
Signed-off-by: Jan Kara

Dmitry Monakhov
2010-04-13 03:11:39 +0800
fc7683a3c ext2: symlink must be handled via filesystem specific operation

The generic setattr implementation is no longer responsible for
quota transfer, so symlinks must be handled via ext2_setattr.

Signed-off-by: Dmitry Monakhov
Signed-off-by: Jan Kara

Dmitry Monakhov
2010-04-13 03:11:25 +0800

12 Apr, 2010

2 commits

0df5dd4aa NFSv4: fix delegated locking

Arnaud Giersch reports that NFSv4 locking is broken when we hold a
delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4:
Don't allow posix locking against servers that don't support it).

According to Arnaud, the lock succeeds the first time he opens the file
(since we cannot do a delegated open) but then fails after we start using
delegated opens.

The following patch fixes it by ensuring that locking behaviour is
governed by a per-filesystem capability flag that is initially set, but
gets cleared if the server ever returns an OPEN without the
NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.
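
The "initially set, cleared on the first contrary OPEN reply" behaviour can be sketched as follows. This is an illustrative model, not the NFS client code: `struct server_caps`, `CAP_POSIX_LOCK`, the helper names and the numeric flag values are all assumptions made for the example.

```c
#include <stdbool.h>

/* Illustrative values only; the real definitions live in the NFS headers. */
#define OPEN_RESULT_LOCKTYPE_POSIX 0x4u /* stands in for NFS4_OPEN_RESULT_LOCKTYPE_POSIX */
#define CAP_POSIX_LOCK             0x1u /* hypothetical per-filesystem capability bit */

struct server_caps {
    unsigned int caps;
};

/* Start out optimistic: assume the server supports POSIX locking. */
static void server_caps_init(struct server_caps *s)
{
    s->caps = CAP_POSIX_LOCK;
}

/* Called for every OPEN reply; one reply without the flag clears the capability. */
static void note_open_result(struct server_caps *s, unsigned int rflags)
{
    if (!(rflags & OPEN_RESULT_LOCKTYPE_POSIX))
        s->caps &= ~CAP_POSIX_LOCK;
}

/* Lock requests consult the capability, not the state of any single open. */
static bool posix_locks_allowed(const struct server_caps *s)
{
    return (s->caps & CAP_POSIX_LOCK) != 0;
}
```

Because the decision hangs off the filesystem rather than an individual open, a delegated open, which never receives result flags from the server, no longer changes the answer.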

Reported-by: Arnaud Giersch
Signed-off-by: Trond Myklebust
Cc: stable@kernel.org

Trond Myklebust
2010-04-12 19:55:15 +0800
be3bd2223 nilfs2: fix typo "numer" -> "number" in alloc.c

Fixes the typo found in a warning message of a persistent object
allocator function.

Signed-off-by: Ryusuke Konishi

Ryusuke Konishi
2010-04-12 00:51:03 +0800

10 Apr, 2010

7 commits

2c61be0a9 NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible

We always want to ensure that WRITE and COMMIT complete, whether or not
the user presses ^C. Do this by making the call asynchronous, and allowing
the user to do an interruptible wait for rpc_task completion.

Signed-off-by: Trond Myklebust

Trond Myklebust
2010-04-10 07:54:50 +0800
a6305ddb0 NFS: Fix a race with the new commit code

This patch fixes a race which occurs due to the fact that we release the
PG_writeback flag while still holding the nfs_page locked.

Signed-off-by: Trond Myklebust

Trond Myklebust
2010-04-10 07:08:17 +0800
b80c3cb62 NFS: Ensure that writeback_single_inode() calls write_inode() when syncing

Since writeback_single_inode() checks the inode->i_state flags _before_ it
flushes out the data, we need to ensure that the I_DIRTY_DATASYNC flag is
already set. Otherwise we risk not seeing a call to write_inode(), which
again means that we break fsync() et al...

Signed-off-by: Trond Myklebust

Trond Myklebust
2010-04-10 07:08:17 +0800
1544fa0f7 NFS: Fix the mode calculation in nfs_find_open_context

Signed-off-by: Trond Myklebust

Trond Myklebust
2010-04-10 07:08:16 +0800
80e60639f NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR

Signed-off-by: Trond Myklebust
Cc: stable@kernel.org

Trond Myklebust
2010-04-10 07:08:16 +0800
2844a76a2 ceph: decode v5 of osdmap (pool names) [protocol change]

Teach the client to decode an updated format for the osdmap. The new
format includes pool names, which will be useful shortly. Get this change
in earlier rather than later.

Signed-off-by: Sage Weil

Sage Weil
2010-04-10 06:50:58 +0800
2f4084209 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
loop: Update mtime when writing using aops
block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
backing-dev: Handle class_create() failure
Block: Fix block/elevator.c elevator_get() off-by-one error
drbd: lc_element_by_index() never returns NULL
cciss: unlock on error path
cfq-iosched: Do not merge queues of BE and IDLE classes
cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
i2o: Remove the dangerous kobj_to_i2o_device macro
block: remove 16 bytes of padding from struct request on 64bits
cfq-iosched: fix a kbuild regression
block: make CONFIG_BLK_CGROUP visible
Remove GENHD_FL_DRIVERFS
block: Export max number of segments and max segment size in sysfs
block: Finalize conversion of block limits functions
block: Fix overrun in lcm() and move it to lib
vfs: improve writeback_inodes_wb()
paride: fix off-by-one test
drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
...

Linus Torvalds
2010-04-10 02:50:29 +0800

09 Apr, 2010

1 commit

9ddd3a31a Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
not overwriting file_lock structure after GET_LK
cifs: Fix a kernel BUG with remote OS/2 server (try #3)
[CIFS] initialize nbytes at the beginning of CIFSSMBWrite()
[CIFS] Add mmap for direct, nobrl cifs mount types

Linus Torvalds
2010-04-09 02:58:14 +0800

08 Apr, 2010

3 commits

c15d0fc0f udf: add speciffic ->setattr callback

The generic setattr is no longer responsible for quota transfer.
Use udf_setattr for all udf inodes.

Signed-off-by: Dmitry Monakhov
Signed-off-by: Jan Kara

Dmitry Monakhov
2010-04-08 21:35:20 +0800
69ecbbeda udf: potential integer overflow

bloc->logicalBlockNum is unsigned so it's never less than zero.

When I saw that, it made me worry that "bloc->logicalBlockNum + count"
could overflow. That's why I changed the check for less than zero
to an overflow check. (The test works because "count" is also
unsigned.)
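
A small sketch of the kind of check described above, assuming 32-bit unsigned block numbers. The `block_range_invalid()` helper and its `part_len` parameter are hypothetical and are not taken from the udf source. Unsigned addition wraps modulo 2^32, so a wrapped sum is always smaller than either operand, which is what the first test detects.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the range [start, start + count) is not usable:
 * either the sum wrapped around, or it runs past part_len blocks.
 */
static bool block_range_invalid(uint32_t start, uint32_t count, uint32_t part_len)
{
    uint32_t end = start + count; /* wraps modulo 2^32 on overflow */

    if (end < start)              /* overflow: the sum came out smaller */
        return true;
    return end > part_len;        /* ordinary out-of-range check */
}
```

A "less than zero" test on the same values would never fire, because an unsigned quantity cannot be negative; the wrap-around comparison is the check that actually catches a bad range.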

Signed-off-by: Dan Carpenter
Signed-off-by: Jan Kara

Dan Carpenter
2010-04-08 21:35:20 +0800
04287f975 Have nfs ->d_revalidate() report errors properly

If the nfs atomic open implementation ends up doing an open request from the
->d_revalidate() codepath and gets an error from the server, return that error
to the caller explicitly and don't bother with lookup_instantiate_filp() at all.
->d_revalidate() can return an error itself just fine...

See
http://bugzilla.kernel.org/show_bug.cgi?id=15674
http://marc.info/?l=linux-kernel&m=126988782722711&w=2

for original report.

Reported-by: Daniel J Blueman
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2010-04-08 07:10:16 +0800

07 Apr, 2010

2 commits

cc4fc29e5 fs-cache: order the debugfs stats correctly

Order the debugfs statistics correctly. The values displayed through a
seq_printf() statement should be in the same order as the names in the
format string.

In the 'Lookups' line, objects created ('crt=') and lookups timed out
('tmo=') have their values transposed.
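
The class of bug is easy to reproduce with any printf-style call. The sketch below uses `snprintf()` in userspace rather than `seq_printf()`; only the `crt=` and `tmo=` labels come from the commit message, while the `lookup_stats` structure, the other field names and the exact format string are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical counters; only crt and tmo are named in the commit message. */
struct lookup_stats {
    unsigned int n, neg, pos, crt, tmo;
};

/* Arguments are listed in exactly the order the labels appear in the format. */
static int format_lookups(char *buf, size_t len, const struct lookup_stats *s)
{
    return snprintf(buf, len, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u",
                    s->n, s->neg, s->pos, s->crt, s->tmo);
}
```

Swapping the last two arguments would still compile without a warning, since both are the same type, which is why this kind of transposition tends to be found by reading the output rather than by the compiler.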

Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2010-04-07 23:38:05 +0800
116354d17 pagemap: fix pfn calculation for hugepage

When we look into pagemap using page-types with option -p, the value of
pfn for hugepages looks wrong (see below.) This is because pte was
evaluated only once for one vma although it should be updated for each
hugepage. This patch fixes it.

$ page-types -p 3277 -Nl -b huge
voffset offset len flags
7f21e8a00 11e400 1 ___U___________H_G________________
7f21e8a01 11e401 1ff ________________TG________________
^^^
7f21e8c00 11e400 1 ___U___________H_G________________
7f21e8c01 11e401 1ff ________________TG________________
^^^

One hugepage contains 1 head page and 511 tail pages on x86_64, and each
pair of lines represents one hugepage. Voffset and offset are the virtual
address and physical address in page units, respectively. Different
hugepages should not have the same offset value.

With this patch applied:

$ page-types -p 3386 -Nl -b huge
voffset offset len flags
7fec7a600 112c00 1 ___UD__________H_G________________
7fec7a601 112c01 1ff ________________TG________________
^^^
7fec7a800 113200 1 ___UD__________H_G________________
7fec7a801 113201 1ff ________________TG________________
^^^
OK

More info:

- This patch modifies walk_page_range()'s hugepage walker. But the
change only affects pagemap_read(), which is the only caller of hugepage
callback.

- Without this patch, hugetlb_entry() callback is called per vma, that
doesn't match the natural expectation from its name.

- With this patch, hugetlb_entry() is called per hugepte entry and the
callback can become much simpler.
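
The per-page arithmetic behind the corrected output can be sketched as follows. This is not the pagemap code: the `hugepage_page_pfn()` helper is hypothetical, and the shift values assume x86_64 with 4 KiB base pages and 2 MiB hugepages (512 pages per hugepage, matching the 1 head plus 511 tail pages mentioned above). The head pfn has to be looked up again for each hugepage; within one hugepage, each page's pfn is the head pfn plus the page's index.

```c
#include <stdint.h>

#define PAGE_SHIFT  12 /* assumed: 4 KiB base pages */
#define HPAGE_SHIFT 21 /* assumed: 2 MiB hugepages */

/*
 * Given the pfn of a hugepage's head page and a virtual address inside
 * that hugepage, return the pfn of the base page backing the address.
 */
static uint64_t hugepage_page_pfn(uint64_t head_pfn, uint64_t vaddr)
{
    uint64_t hmask = ~((UINT64_C(1) << HPAGE_SHIFT) - 1); /* hugepage-aligned part */
    uint64_t index = (vaddr & ~hmask) >> PAGE_SHIFT;      /* page index, 0..511 */

    return head_pfn + index;
}
```

With the values from the fixed output, voffset 7fec7a601 lies one page into the hugepage whose head pfn is 112c00, so the helper yields 112c01; the next hugepage at voffset 7fec7a800 starts again from its own head pfn, 113200.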

Signed-off-by: Naoya Horiguchi
Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2010-04-07 23:38:04 +0800