Eric Lee / linux-smarc-t335x-v3.2

07 Oct, 2010

2 commits

089eed29b Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
writeback: always use sb->s_bdi for writeback purposes

Linus Torvalds
2010-10-07 02:11:18 +0800
8fe9793af Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: Initialize total_len in fuse_retrieve()

Linus Torvalds
2010-10-07 00:50:41 +0800

04 Oct, 2010

2 commits

aaead25b9 writeback: always use sb->s_bdi for writeback purposes ... Browse Code »

We currently use struct backing_dev_info for various different purposes.
Originally it was introduced to describe a backing device which includes
an unplug and congestion function and various bits of readahead information
and VM-relevant flags. We're also using for tracking dirty inodes for
writeback.

To make writeback properly find all inodes we need to only access the
per-filesystem backing_device pointed to by the superblock in ->s_bdi
inside the writeback code, and not the instances pointeded to by
inode->i_mapping->backing_dev which can be overriden by special devices
or might not be set at all by some filesystems.

Long term we should split out the writeback-relevant bits of struct
backing_device_info (which includes more than the current bdi_writeback)
and only point to it from the superblock while leaving the traditional
backing device as a separate structure that can be overriden by devices.

The one exception for now is the block device filesystem which really
wants different writeback contexts for it's different (internal) inodes
to handle the writeout more efficiently. For now we do this with
a hack in fs-writeback.c because we're so late in the cycle, but in
the future I plan to replace this with a superblock method that allows
for multiple writeback contexts per filesystem.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-10-04 20:25:33 +0800
0157443c5 fuse: Initialize total_len in fuse_retrieve() ... Browse Code »

fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this
function

Initialize total_len to zero, else its value will be undefined.

Signed-off-by: Geert Uytterhoeven
Signed-off-by: Miklos Szeredi

Geert Uytterhoeven
2010-10-04 16:45:32 +0800

02 Oct, 2010

5 commits

c6ea21e35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: prevent infinite recursion in cifs_reconnect_tcon
cifs: set backing_dev_info on new S_ISREG inodes

Linus Torvalds
2010-10-02 06:03:37 +0800
9d8117e72 reiserfs: fix unwanted reiserfs lock recursion ... Browse Code »

Prevent from recursively locking the reiserfs lock in reiserfs_unpack()
because we may call journal_begin() that requires the lock to be taken
only once, otherwise it won't be able to release the lock while taking
other mutexes, ending up in inverted dependencies between the journal
mutex and the reiserfs lock for example.

This fixes:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.35.4.4a #3
-------------------------------------------------------
lilo/1620 is trying to acquire lock:
(&journal->j_mutex){+.+...}, at: [] do_journal_begin_r+0x7f/0x340 [reiserfs]

but task is already holding lock:
(&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] reiserfs_write_lock+0x28/0x40 [reiserfs]
[] do_journal_begin_r+0x86/0x340 [reiserfs]
[] journal_begin+0x77/0x140 [reiserfs]
[] reiserfs_remount+0x224/0x530 [reiserfs]
[] do_remount_sb+0x60/0x110
[] do_mount+0x625/0x790
[] sys_mount+0x84/0xb0
[] syscall_call+0x7/0xb

-> #0 (&journal->j_mutex){+.+...}:
[] __lock_acquire+0x1026/0x1180
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] do_journal_begin_r+0x7f/0x340 [reiserfs]
[] journal_begin+0x77/0x140 [reiserfs]
[] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
[] reiserfs_get_block+0x22c/0x1530 [reiserfs]
[] __block_prepare_write+0x1bb/0x3a0
[] block_prepare_write+0x26/0x40
[] reiserfs_prepare_write+0x88/0x170 [reiserfs]
[] reiserfs_unpack+0xe6/0x120 [reiserfs]
[] reiserfs_ioctl+0x272/0x320 [reiserfs]
[] vfs_ioctl+0x28/0xa0
[] do_vfs_ioctl+0x32d/0x5c0
[] sys_ioctl+0x63/0x70
[] syscall_call+0x7/0xb

other info that might help us debug this:

2 locks held by lilo/1620:
#0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [] reiserfs_unpack+0x6a/0x120 [reiserfs]
#1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

stack backtrace:
Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3
Call Trace:
[] __lock_acquire+0x1026/0x1180
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] do_journal_begin_r+0x7f/0x340 [reiserfs]
[] journal_begin+0x77/0x140 [reiserfs]
[] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
[] reiserfs_get_block+0x22c/0x1530 [reiserfs]
[] __block_prepare_write+0x1bb/0x3a0
[] block_prepare_write+0x26/0x40
[] reiserfs_prepare_write+0x88/0x170 [reiserfs]
[] reiserfs_unpack+0xe6/0x120 [reiserfs]
[] reiserfs_ioctl+0x272/0x320 [reiserfs]
[] vfs_ioctl+0x28/0xa0
[] do_vfs_ioctl+0x32d/0x5c0
[] sys_ioctl+0x63/0x70
[] syscall_call+0x7/0xb

Reported-by: Jarek Poplawski
Tested-by: Jarek Poplawski
Signed-off-by: Frederic Weisbecker
Cc: Jeff Mahoney
Cc: All since 2.6.32
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Frederic Weisbecker
2010-10-02 01:50:59 +0800
3f259d092 reiserfs: fix dependency inversion between inode and reiserfs mutexes ... Browse Code »

The reiserfs mutex already depends on the inode mutex, so we can't lock
the inode mutex in reiserfs_unpack() without using the safe locking API,
because reiserfs_unpack() is always called with the reiserfs mutex locked.

This fixes:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.35c #13
-------------------------------------------------------
lilo/1606 is trying to acquire lock:
(&sb->s_type->i_mutex_key#8){+.+.+.}, at: [] reiserfs_unpack+0x60/0x110 [reiserfs]

but task is already holding lock:
(&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] reiserfs_write_lock+0x28/0x40 [reiserfs]
[] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs]
[] reiserfs_fill_super+0x941/0xe60 [reiserfs]
[] get_sb_bdev+0x117/0x170
[] get_super_block+0x21/0x30 [reiserfs]
[] vfs_kern_mount+0x6a/0x1b0
[] do_kern_mount+0x39/0xe0
[] do_mount+0x340/0x790
[] sys_mount+0x84/0xb0
[] syscall_call+0x7/0xb

-> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
[] __lock_acquire+0x1026/0x1180
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] reiserfs_unpack+0x60/0x110 [reiserfs]
[] reiserfs_ioctl+0x272/0x320 [reiserfs]
[] vfs_ioctl+0x28/0xa0
[] do_vfs_ioctl+0x32d/0x5c0
[] sys_ioctl+0x63/0x70
[] syscall_call+0x7/0xb

other info that might help us debug this:

1 lock held by lilo/1606:
#0: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

stack backtrace:
Pid: 1606, comm: lilo Not tainted 2.6.35c #13
Call Trace:
[] __lock_acquire+0x1026/0x1180
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] reiserfs_unpack+0x60/0x110 [reiserfs]
[] reiserfs_ioctl+0x272/0x320 [reiserfs]
[] vfs_ioctl+0x28/0xa0
[] do_vfs_ioctl+0x32d/0x5c0
[] sys_ioctl+0x63/0x70
[] syscall_call+0x7/0xb

Reported-by: Jarek Poplawski
Tested-by: Jarek Poplawski
Signed-off-by: Frederic Weisbecker
Cc: Jeff Mahoney
Cc: [2.6.32 and later]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Frederic Weisbecker
2010-10-02 01:50:59 +0800
3036e7b49 proc: make /proc/pid/limits world readable ... Browse Code »

Having the limits file world readable will ease the task of system
management on systems where root privileges might be restricted.

Having admin restricted with root priviledges, he/she could not check
other users process' limits.

Also it'd align with most of the /proc stat files.

Signed-off-by: Jiri Olsa
Acked-by: Neil Horman
Cc: Eugene Teo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Olsa
2010-10-02 01:50:59 +0800
f569599ae cifs: prevent infinite recursion in cifs_reconnect_tcon ... Browse Code »

cifs_reconnect_tcon is called from smb_init. After a successful
reconnect, cifs_reconnect_tcon will call reset_cifs_unix_caps. That
function will, in turn call CIFSSMBQFSUnixInfo and CIFSSMBSetFSUnixInfo.
Those functions also call smb_init.

It's possible for the session and tcon reconnect to succeed, and then
for another cifs_reconnect to occur before CIFSSMBQFSUnixInfo or
CIFSSMBSetFSUnixInfo to be called. That'll cause those functions to call
smb_init and cifs_reconnect_tcon again, ad infinitum...

Break the infinite recursion by having those functions use a new
smb_init variant that doesn't attempt to perform a reconnect.

Reported-and-Tested-by: Michal Suchanek
Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2010-10-02 01:50:08 +0800

30 Sep, 2010

3 commits

0d4911081 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 ... Browse Code »

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Don't walk off the end of fast symlinks.

Linus Torvalds
2010-09-30 11:38:07 +0800
1fc8a1178 ocfs2: Don't walk off the end of fast symlinks. ... Browse Code »

ocfs2 fast symlinks are NUL terminated strings stored inline in the
inode data area. However, disk corruption or a local attacker could, in
theory, remove that NUL. Because we're using strlen() (my fault,
introduced in a731d1 when removing vfs_follow_link()), we could walk off
the end of that string.

Signed-off-by: Joel Becker
Cc: stable@kernel.org

Joel Becker
2010-09-30 08:33:05 +0800
522440ed5 cifs: set backing_dev_info on new S_ISREG inodes ... Browse Code »

Testing on very recent kernel (2.6.36-rc6) made this warning pop:

WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x65/0x70()
Hardware name:
Dirtiable inode bdi default != sb bdi cifs

...the following patch fixes it and seems to be the obviously correct
thing to do for cifs.

Cc: stable@kernel.org
Acked-by: Dave Kleikamp
Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2010-09-30 03:23:23 +0800

29 Sep, 2010

1 commit

80168676e xfs: force background CIL push under sustained load ... Browse Code »

I have been seeing occasional pauses in transaction throughput up to
30s long under heavy parallel workloads. The only notable thing was
that the xfsaild was trying to be active during the pauses, but
making no progress. It was running exactly 20 times a second (on the
50ms no-progress backoff), and the number of pushbuf events was
constant across this time as well. IOWs, the xfsaild appeared to be
stuck on buffers that it could not push out.

Further investigation indicated that it was trying to push out inode
buffers that were pinned and/or locked. The xfsbufd was also getting
woken at the same frequency (by the xfsaild, no doubt) to push out
delayed write buffers. The xfsbufd was not making any progress
because all the buffers in the delwri queue were pinned. This scan-
and-make-no-progress dance went one in the trace for some seconds,
before the xfssyncd came along an issued a log force, and then
things started going again.

However, I noticed something strange about the log force - there
were way too many IO's issued. 516 log buffers were written, to be
exact. That added up to 129MB of log IO, which got me very
interested because it's almost exactly 25% of the size of the log.
He delayed logging code is suppose to aggregate the minimum of 25%
of the log or 8MB worth of changes before flushing. That's what
really puzzled me - why did a log force write 129MB instead of only
8MB?

Essentially what has happened is that no CIL pushes had occurred
since the previous tail push which cleared out 25% of the log space.
That caused all the new transactions to block because there wasn't
log space for them, but they kick the xfsaild to push the tail.
However, the xfsaild was not making progress because there were
buffers it could not lock and flush, and the xfsbufd could not flush
them because they were pinned. As a result, both the xfsaild and the
xfsbufd could not move the tail of the log forward without the CIL
first committing.

The cause of the problem was that the background CIL push, which
should happen when 8MB of aggregated changes have been committed, is
being held off by the concurrent transaction commit load. The
background push does a down_write_trylock() which will fail if there
is a concurrent transaction commit holding the push lock in read
mode. With 8 CPUs all doing transactions as fast as they can, there
was enough concurrent transaction commits to hold off the background
push until tail-pushing could no longer free log space, and the halt
would occur.

It should be noted that there is no reason why it would halt at 25%
of log space used by a single CIL checkpoint. This bug could
definitely violate the "no transaction should be larger than half
the log" requirement and hence result in corruption if the system
crashed under heavy load. This sort of bug is exactly the reason why
delayed logging was tagged as experimental....

The fix is to start blocking background pushes once the threshold
has been exceeded. Rework the threshold calculations to keep the
amount of log space a CIL checkpoint can use to below that of the
AIL push threshold to avoid the problem completely.

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-09-29 20:51:03 +0800

25 Sep, 2010

1 commit

d1f3e68ef Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 ... Browse Code »

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
o2dlm: force free mles during dlm exit
ocfs2: Sync inode flags with ext2.
ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits.
ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent.
ocfs2: update ctime when changing the file's permission by setfacl
ocfs2/net: fix uninitialized ret in o2net_send_message_vec()
Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c
Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes.
ocfs2: Fix lockdep warning in reflink.
ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.

Linus Torvalds
2010-09-25 05:08:15 +0800

24 Sep, 2010

5 commits

5dad6c39d o2dlm: force free mles during dlm exit ... Browse Code »

While umounting, a block mle doesn't get freed if dlm is shutdown after
master request is received but before assert master. This results in unclean
shutdown of dlm domain.

This patch frees all mles that lie around after other nodes were notified about
exiting the dlm and marking dlm state as leaving. Only block mles are expected
to be around, so we log ERROR for other mles but still free them.

Signed-off-by: Srinivas Eeda
Signed-off-by: Joel Becker

Srinivas Eeda
2010-09-24 05:16:53 +0800
0000b8620 ocfs2: Sync inode flags with ext2. ... Browse Code »

We sync our inode flags with ext2 and define them by hex
values. But actually in commit 3669567(4 years ago), all
these values are moved to include/linux/fs.h. So we'd
better also use them as what ext2 did. So sync our inode
flags with ext2 by using FS_*.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-24 05:16:49 +0800
4a452de4f ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits. ... Browse Code »

The first time I read the function ocfs2_resmap_resv_bits, I consider
about what 'wanted' will be used and consider about the comments.
Then I find it is only used if the reservation is empty. ;)

So we'd better move it to the parens so that it make the code more
readable, what's more, ocfs2_resmap_resv_bits is used so frequently
and we should save some cpus.

Acked-by: Mark Fasheh
Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-24 05:16:47 +0800
47dea4237 ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent. ... Browse Code »

e_leaf_clusters is a le16, so use cpu_to_le16 instead
of cpu_to_le32.

What's more, we change 'clusters' to unsigned int to
signify that the size of 'clusters' isn't important here.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-24 05:16:34 +0800
12828061c ocfs2: update ctime when changing the file's permission by setfacl ... Browse Code »

In commit 30e2bab, ext3 fixed it. So change it accordingly in ocfs2.

Steps to reproduce:
# touch aaa
# stat -c %Z aaa
1283760364
# setfacl -m 'u::x,g::x,o::x' aaa
# stat -c %Z aaa
1283760364

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-24 05:16:21 +0800

23 Sep, 2010

5 commits

1c2499ae8 /proc/pid/smaps: fix dirty pages accounting ... Browse Code »

Currently, /proc//smaps has wrong dirty pages accounting.
Shared_Dirty and Private_Dirty output only pte dirty pages and ignore
PG_dirty page flag. It is difference against documentation, but also
inconsistent against Referenced field. (Referenced checks both pte and
page flags)

This patch fixes it.

Test program:

large-array.c
---------------------------------------------------
#include
#include
#include
#include

char array[1*1024*1024*1024L];

int main(void)
{
memset(array, 1, sizeof(array));
pause();

return 0;
}
---------------------------------------------------

Test case:
1. run ./large-array
2. cat /proc/`pidof large-array`/smaps
3. swapoff -a
4. cat /proc/`pidof large-array`/smaps again

Test result:

00601000-40601000 rw-p 00000000 00:00 0
Size: 1048576 kB
Rss: 1048576 kB
Pss: 1048576 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 218992 kB

00601000-40601000 rw-p 00000000 00:00 0
Size: 1048576 kB
Rss: 1048576 kB
Pss: 1048576 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 1048576 kB
Acked-by: Hugh Dickins
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-09-23 08:22:39 +0800
a0c42bac7 aio: do not return ERESTARTSYS as a result of AIO ... Browse Code »

OCFS2 can return ERESTARTSYS from its write function when the process is
signalled while waiting for a cluster lock (and the filesystem is mounted
with intr mount option). Generally, it seems reasonable to allow
filesystems to return this error code from its IO functions. As we must
not leak ERESTARTSYS (and similar error codes) to userspace as a result of
an AIO operation, we have to properly convert it to EINTR inside AIO code
(restarting the syscall isn't really an option because other AIO could
have been already submitted by the same io_submit syscall).

Signed-off-by: Jan Kara
Reviewed-by: Jeff Moyer
Cc: Christoph Hellwig
Cc: Zach Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2010-09-23 08:22:39 +0800
c227e6902 /proc/vmcore: fix seeking ... Browse Code »

Commit 73296bc611 ("procfs: Use generic_file_llseek in /proc/vmcore")
broke seeking on /proc/vmcore. This changes it back to use default_llseek
in order to restore the original behaviour.

The problem with generic_file_llseek is that it only allows seeks up to
inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual
file systems. We should merge generic_file_llseek and default_llseek some
day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore
is the safer solution.

Signed-off-by: Arnd Bergmann
Cc: Frederic Weisbecker
Reported-by: CAI Qian
Tested-by: CAI Qian
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arnd Bergmann
2010-09-23 08:22:38 +0800
767b68e96 Prevent freeing uninitialized pointer in compat_do_readv_writev ... Browse Code »

In 32-bit compatibility mode, the error handling for
compat_do_readv_writev() may free an uninitialized pointer, potentially
leading to all sorts of ugly memory corruption. This is reliably
triggerable by unprivileged users by invoking the readv()/writev()
syscalls with an invalid iovec pointer. The below patch fixes this to
emulate the non-compat version.

Introduced by commit b83733639a49 ("compat: factor out
compat_rw_copy_check_uvector from compat_do_readv_writev")

Signed-off-by: Dan Rosenberg
Cc: stable@kernel.org (2.6.35)
Cc: Al Viro
Signed-off-by: Linus Torvalds

Dan Rosenberg
2010-09-23 08:22:38 +0800
b68e9d458 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends
char: Mark /dev/zero and /dev/kmem as not capable of writeback
bdi: Initialize noop_backing_dev_info properly
cfq-iosched: fix a kernel OOPs when usb key is inserted
block: fix blk_rq_map_kern bio direction flag
cciss: freeing uninitialized data on error path

Linus Torvalds
2010-09-23 00:12:37 +0800

22 Sep, 2010

3 commits

692ebd17c bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends ... Browse Code »

Inodes of devices such as /dev/zero can get dirty for example via
utime(2) syscall or due to atime update. Backing device of such inodes
(zero_bdi, etc.) is however unable to handle dirty inodes and thus
__mark_inode_dirty complains. In fact, inode should be rather dirtied
against backing device of the filesystem holding it. This is generally a
good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes
in these pseudofilesystems are referenced from ordinary filesystem
inodes and carry mapping with real data of the device. Thus for these
inodes we have to use inode->i_mapping->backing_dev_info as we did so
far. We distinguish these filesystems by checking whether sb->s_bdi
points to a non-trivial backing device or not.

Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /.
There's a device inode A described by a path "/dev/sdb" on this
filesystem. This inode will be dirtied against backing device "8:0"
after this patch. bdev filesystem contains block device inode B coupled
with our inode A. When someone modifies a page of /dev/sdb, it's B that
gets dirtied and the dirtying happens against the backing device "8:16".
Thus both inodes get filed to a correct bdi list.

Cc: stable@kernel.org
Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe

Jan Kara
2010-09-22 15:48:47 +0800
371d217ee char: Mark /dev/zero and /dev/kmem as not capable of writeback ... Browse Code »

These devices don't do any writeback but their device inodes still can get
dirty so mark bdi appropriately so that bdi code does the right thing and files
inodes to lists of bdi carrying the device inodes.

Cc: stable@kernel.org
Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe

Jan Kara
2010-09-22 15:48:47 +0800
19746cad0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: select CRYPTO
ceph: check mapping to determine if FILE_CACHE cap is used
ceph: only send one flushsnap per cap_snap per mds session
ceph: fix cap_snap and realm split
ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap
ceph: correctly set 'follows' in flushsnap messages
ceph: fix dn offset during readdir_prepopulate
ceph: fix file offset wrapping at 4GB on 32-bit archs
ceph: fix reconnect encoding for old servers
ceph: fix pagelist kunmap tail
ceph: fix null pointer deref on anon root dentry release

Linus Torvalds
2010-09-22 02:20:10 +0800

20 Sep, 2010

1 commit

112d421df Coda: mount hangs because of missed REQ_WRITE rename ... Browse Code »

Coda's REQ_* defines were renamed to avoid clashes with the block layer
(commit 4aeefdc69f7b: "coda: fixup clash with block layer REQ_*
defines").

However one was missed and response messages are no longer matched with
requests and waiting threads are no longer woken up. This patch fixes
this.

Signed-off-by: Jan Harkes
[ Also fixed up whitespace while at it -Linus ]
Signed-off-by: Linus Torvalds

Jan Harkes
2010-09-20 02:03:09 +0800

18 Sep, 2010

3 commits

50aff0403 ocfs2/net: fix uninitialized ret in o2net_send_message_vec() ... Browse Code »

mmotm/fs/ocfs2/cluster/tcp.c: In function ‘o2net_send_message_vec’:
mmotm/fs/ocfs2/cluster/tcp.c:980:6: warning: ‘ret’ may be used uninitialized in this function

It seems a real bug introduced by commit 9af0b38ff3 (ocfs2/net:
Use wait_event() in o2net_send_message_vec()).

cc: Sunil Mushran
Signed-off-by: Wu Fengguang
Signed-off-by: Joel Becker

Wu Fengguang
2010-09-18 23:48:54 +0800
be4f104df ceph: select CRYPTO ... Browse Code »

We select CRYPTO_AES, but not CRYPTO.

Signed-off-by: Sage Weil

Sage Weil
2010-09-18 03:30:31 +0800
a43fb7310 ceph: check mapping to determine if FILE_CACHE cap is used ... Browse Code »

See if the i_data mapping has any pages to determine if the FILE_CACHE
capability is currently in use, instead of assuming it is any time the
rdcache_gen value is set (i.e., issued -> used).

This allows the MDS RECALL_STATE process work for inodes that have cached
pages.

Signed-off-by: Sage Weil

Sage Weil
2010-09-18 00:54:31 +0800

17 Sep, 2010

4 commits

e835124c2 ceph: only send one flushsnap per cap_snap per mds session ... Browse Code »

Sending multiple flushsnap messages is problematic because we ignore
the response if the tid doesn't match, and the server may only respond to
each one once. It's also a waste.

So, skip cap_snaps that are already on the flushing list, unless the caller
tells us to resend (because we are reconnecting).

Signed-off-by: Sage Weil

Sage Weil
2010-09-17 23:03:08 +0800
5f4874903 GFS2: gfs2_logd should be using interruptible waits ... Browse Code »

Looks like this crept in, in a recent update.

Reported-by: Krzysztof Urbaniak
Signed-off-by: Steven Whitehouse

Steven Whitehouse
2010-09-17 21:00:10 +0800
ae00d4f37 ceph: fix cap_snap and realm split ... Browse Code »

The cap_snap creation/queueing relies on both the current i_head_snapc
_and_ the i_snap_realm pointers being correct, so that the new cap_snap
can properly reference the old context and the new i_head_snapc can be
updated to reference the new snaprealm's context. To fix this, we:

- move inodes completely to the new (split) realm so that i_snap_realm
is correct, and
- generate the new snapc's _before_ queueing the cap_snaps in
ceph_update_snap_trace().

Signed-off-by: Sage Weil

Sage Weil
2010-09-17 07:26:51 +0800
03a7ab083 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: fix potential double put of TCP session reference

Linus Torvalds
2010-09-17 03:59:11 +0800

15 Sep, 2010

5 commits

de8d4f5d7 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 ... Browse Code »

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies
statfs() gives ESTALE error
NFS: Fix a typo in nfs_sockaddr_match_ipaddr6
sunrpc: increase MAX_HASHTABLE_BITS to 14
gss:spkm3 miss returning error to caller when import security context
gss:krb5 miss returning error to caller when import security context
Remove incorrect do_vfs_lock message
SUNRPC: cleanup state-machine ordering
SUNRPC: Fix a race in rpc_info_open
SUNRPC: Fix race corrupting rpc upcall
Fix null dereference in call_allocate

Linus Torvalds
2010-09-15 08:04:48 +0800
75e1c70fc aio: check for multiplication overflow in do_io_submit ... Browse Code »

Tavis Ormandy pointed out that do_io_submit does not do proper bounds
checking on the passed-in iocb array:

if (unlikely(nr < 0))
return -EINVAL;

if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))
return -EFAULT; ^^^^^^^^^^^^^^^^^^

The attached patch checks for overflow, and if it is detected, the
number of iocbs submitted is scaled down to a number that will fit in
the long. This is an ok thing to do, as sys_io_submit is documented as
returning the number of iocbs submitted, so callers should handle a
return value of less than the 'nr' argument passed in.

Reported-by: Tavis Ormandy
Signed-off-by: Jeff Moyer
Signed-off-by: Linus Torvalds

Jeff Moyer
2010-09-15 08:02:37 +0800
460cf3411 cifs: fix potential double put of TCP session reference ... Browse Code »

cifs_get_smb_ses must be called on a server pointer on which it holds an
active reference. It first does a search for an existing SMB session. If
it finds one, it'll put the server reference and then try to ensure that
the negprot is done, etc.

If it encounters an error at that point then it'll return an error.
There's a potential problem here though. When cifs_get_smb_ses returns
an error, the caller will also put the TCP server reference leading to a
double-put.

Fix this by having cifs_get_smb_ses only put the server reference if
it found an existing session that it could use and isn't returning an
error.

Cc: stable@kernel.org
Reviewed-by: Suresh Jayaraman
Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2010-09-15 07:21:03 +0800
cfc0bf664 ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap ... Browse Code »

Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages
or is still writing. We'll send the newer capsnaps only after the older
ones complete.

Signed-off-by: Sage Weil

Sage Weil
2010-09-15 06:50:59 +0800
8bef9239e ceph: correctly set 'follows' in flushsnap messages ... Browse Code »

The 'follows' should match the seq for the snap context for the given snap
cap, which is the context under which we have been dirtying and writing
data and metadata. The snapshot that _contains_ those updates thus
_follows_ that context's seq #.

Signed-off-by: Sage Weil

Sage Weil
2010-09-15 06:45:44 +0800