Eric Lee / smarc-fsl-linux-kernel

15 Oct, 2010

3 commits

8fd01d6cf Export dump_{write,seek} to binary loader modules

If you build aout support as a module, you'll want these exported.

Reported-by: Tetsuo Handa
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-10-15 10:15:28 +0800
3aa0ce825 Un-inline the core-dump helper functions

Tony Luck reports that the addition of the access_ok() check in commit
0eead9ab41da ("Don't dump task struct in a.out core-dumps") broke the
ia64 compile due to missing the necessary header file includes.

Rather than add yet another include () to make everything
happy, just uninline the silly core dump helper functions and move the
bodies to fs/exec.c where they make a lot more sense.

dump_seek() in particular was too big to be an inline function anyway,
and none of them are in any way performance-critical. And we really
don't need to mess up our include file headers more than they already
are.
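
As a rough user-space analogue of such a seek helper (not the code moved into fs/exec.c), the sketch below skips forward with lseek() when the descriptor supports it and otherwise writes zeros in chunks. The names demo_dump_write and demo_dump_seek and the 4096-byte chunk size are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_CHUNK 4096  /* assumed chunk size for the zero-fill fallback */

/* Write exactly len bytes; returns 1 on success, 0 on failure. */
int demo_dump_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;       /* interrupted, retry */
            return 0;
        }
        p += n;
        len -= (size_t)n;
    }
    return 1;
}

/* Skip off bytes forward: seek when possible, else emit zeros. */
int demo_dump_seek(int fd, off_t off)
{
    static const char zeros[DEMO_CHUNK];

    assert(off >= 0);
    if (lseek(fd, off, SEEK_CUR) != (off_t)-1)
        return 1;
    if (errno != ESPIPE)        /* only fall back for unseekable files */
        return 0;
    while (off > 0) {
        size_t n = off > DEMO_CHUNK ? DEMO_CHUNK : (size_t)off;
        if (!demo_dump_write(fd, zeros, n))
            return 0;
        off -= (off_t)n;
    }
    return 1;
}
```

A helper with a loop and a fallback path like this is a poor candidate for inlining, which is the point the message makes about dump_seek().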

Reported-and-tested-by: Tony Luck
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-10-15 05:32:06 +0800
0eead9ab4 Don't dump task struct in a.out core-dumps

akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code). Just remove it.

Also do the access_ok() check on dump_write(). It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...

[ I suspect that we should possibly do "vfs_write()" instead of
calling ->write directly. That also does the whole fsnotify and write
statistics thing, which may or may not be a good idea. ]

And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)

Reported-by: akiphie
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-10-15 01:57:40 +0800

14 Oct, 2010

2 commits

8c35bf368 Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux:
nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink

Linus Torvalds
2010-10-14 07:51:29 +0800
b1e86db1d nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink

As of commit 43a9aa64a2f4330a9cb59aaf5c5636566bce067c "NFSD:
Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR", we sometimes call
fh_unlock on a filehandle that isn't fully initialized.

We should fix up the callers, but as a quick fix it is also sufficient
just to remove this assertion.

Reported-by: Marius Tolzmann
Cc: Chuck Lever
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-10-14 03:48:55 +0800

12 Oct, 2010

1 commit

7c5347733 fanotify: disable fanotify syscalls

This patch disables the fanotify syscalls by just not building them and
letting the cond_syscall() statements in kernel/sys_ni.c redirect them
to sys_ni_syscall().
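
The redirection mechanism can be illustrated in user space with weak aliases. This is a simplified sketch that assumes GCC or Clang on an ELF target; DEMO_COND_SYSCALL, demo_ni_syscall and the demo_fanotify_* names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <errno.h>

/* Fallback used for every syscall that was not built in. */
long demo_ni_syscall(void)
{
    return -ENOSYS;
}

/*
 * Simplified stand-in for a cond_syscall()-style macro: declare name as
 * a weak alias of the fallback. A strong definition of name elsewhere
 * in the link would take precedence over the alias.
 */
#define DEMO_COND_SYSCALL(name) \
    long name(void) __attribute__((weak, alias("demo_ni_syscall")))

/* Hypothetical entry points whose real implementation is not built. */
DEMO_COND_SYSCALL(demo_fanotify_init);
DEMO_COND_SYSCALL(demo_fanotify_mark);
```

Because no strong definition exists in this sketch, calling demo_fanotify_init() lands in the fallback and returns -ENOSYS.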

It was pointed out by Tvrtko Ursulin that the fanotify interface did not
include an explicit prioritization between groups. This is necessary
for fanotify to be usable for hierarchical storage management software,
as they must get first access to the file, before inotify-like notifiers
see the file.

This feature can be added in an ABI compatible way in the next release
(by using a number of bits in the flags field to carry the info) but it
was suggested by Alan that maybe we should just hold off and do it in
the next cycle, likely with an (new) explicit argument to the syscall.
I don't like this approach best as I know people are already starting to
use the current interface, but Alan is all wise and no one on list backed
me up with just using what we have. I feel this is needlessly ripping
the rug out from under people at the last minute, but if others think it
needs to be a new argument it might be the best way forward.

Three choices:
Go with what we got (and implement the new feature next cycle). Add a
new field right now (and implement the new feature next cycle). Wait
till next cycle to release the ABI (and implement the new feature next
cycle). This is number 3.

Signed-off-by: Eric Paris
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-12 09:15:28 +0800

10 Oct, 2010

2 commits

8dc54e49c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: update issue_seq on cap grant
ceph: send cap release message early on failed revoke.
ceph: Update max_len with minimum required size
ceph: Fix return value of encode_fh function
ceph: avoid null deref in osd request error path
ceph: fix list_add usage on unsafe_writes list

Linus Torvalds
2010-10-10 03:03:46 +0800
267aeb6c1 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: Fix double page_unlock BUG in write_begin/end

Linus Torvalds
2010-10-10 03:03:23 +0800

08 Oct, 2010

2 commits

f17b1f9f1 exofs: Fix double page_unlock BUG in write_begin/end

This BUG has been there since the first submission of the code, but was only
triggered in the latest kernel. It is timing related, due to the asynchronous
object-creation behaviour of exofs (which should be investigated further).

The bug is obvious, hence the fix.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-10-08 23:26:54 +0800
5710c2b27 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: properly account for reclaimed inodes

Linus Torvalds
2010-10-08 04:45:26 +0800

07 Oct, 2010

9 commits

d91f2438d ceph: update issue_seq on cap grant

We need to update the issue_seq on any grant operation, be it via an MDS
reply or a separate grant message. The update in the grant path was
missing. This broke cap release for inodes in which the MDS sent an
explicit grant message that was not soon after followed by a successful
MDS reply on the same inode.

Also fix the signedness on seq locals.

Signed-off-by: Sage Weil

Sage Weil
2010-10-07 23:01:50 +0800
21b559de5 ceph: send cap release message early on failed revoke.

If an MDS tries to revoke caps that we don't have, we want to send
releases early since they probably contain the caps message the MDS
is looking for.

Previously, we only sent the messages if we didn't have the inode either. But
in a multi-mds system we can retain the inode after dropping all caps for
a single MDS.

Signed-off-by: Greg Farnum
Signed-off-by: Sage Weil

Greg Farnum
2010-10-07 23:00:24 +0800
bba0cd0e3 ceph: Update max_len with minimum required size

On error, encode_fh should update max_len with the minimum required
size, so that the caller can redo the call with a reallocated buffer.
This is required by the open-by-handle patch series.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Sage Weil

Aneesh Kumar K.V
2010-10-07 23:00:24 +0800
92923dcbf ceph: Fix return value of encode_fh function

The encode_fh function should return 255 on error, as other file
systems do, to indicate EOVERFLOW. Also, max_len is in sizeof(u32) units
and not in bytes.
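
The contract described by the two encode_fh fixes above (return 255 when the buffer is too small, report the required size through max_len, and count that size in 32-bit words) can be sketched as follows. The handle layout, the demo_ names and the handle type value 1 are hypothetical; this is not the ceph implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical file handle payload, for illustration only. */
struct demo_fh {
    uint64_t ino;
    uint32_t gen;
};

/* Handle size in 32-bit words (not bytes), rounded up. */
#define DEMO_FH_WORDS \
    ((int)((sizeof(struct demo_fh) + sizeof(uint32_t) - 1) / sizeof(uint32_t)))

#define DEMO_FH_TYPE  1     /* hypothetical handle type */
#define DEMO_FH_ERROR 255   /* buffer too small */

int demo_encode_fh(uint32_t *buf, int *max_len, uint64_t ino, uint32_t gen)
{
    struct demo_fh fh;

    if (*max_len < DEMO_FH_WORDS) {
        /* Tell the caller how many words to allocate before retrying. */
        *max_len = DEMO_FH_WORDS;
        return DEMO_FH_ERROR;
    }
    memset(&fh, 0, sizeof(fh));     /* also clears any padding */
    fh.ino = ino;
    fh.gen = gen;
    memcpy(buf, &fh, sizeof(fh));
    *max_len = DEMO_FH_WORDS;       /* words actually used */
    return DEMO_FH_TYPE;
}
```

A caller that gets 255 back can read the updated max_len, allocate that many 32-bit words, and call again.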

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Sage Weil

Aneesh Kumar K.V
2010-10-07 23:00:23 +0800
6bc18876b ceph: avoid null deref in osd request error path

If we interrupt an osd request, we call __cancel_request, but it wasn't
verifying that req->r_osd was non-NULL before dereferencing it. This could
cause a crash if osds were flapping and we aborted a request on said osd.

Reported-by: Henry C Chang
Signed-off-by: Sage Weil

Sage Weil
2010-10-07 23:00:23 +0800
936aeb5c4 ceph: fix list_add usage on unsafe_writes list

Fix argument order.
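
For readers unfamiliar with the convention, the kernel's list_add() takes the new entry first and the list head second. The sketch below is a minimal re-implementation with demo_ names (an illustration, not the kernel's list.h) that shows why the order matters.

```c
/* Minimal circular doubly linked list, modelled on the kernel's idea. */
struct demo_list_head {
    struct demo_list_head *next, *prev;
};

void demo_list_init(struct demo_list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head: (new entry, list head), in that order. */
void demo_list_add(struct demo_list_head *entry, struct demo_list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Count the entries on the list, excluding the head itself. */
int demo_list_len(const struct demo_list_head *head)
{
    int n = 0;

    for (const struct demo_list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

With the arguments swapped, the function would read the link fields of the new entry, which may not be initialized at all, and splice the list head in after it.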

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2010-10-07 23:00:23 +0800
081003fff xfs: properly account for reclaimed inodes

When marking an inode reclaimable, a per-AG counter is increased, the
inode is tagged reclaimable in its per-AG tree, and, when this is the
first reclaimable inode in the AG, the AG entry in the per-mount tree
is also tagged.

When an inode is finally reclaimed, however, it is only deleted from
the per-AG tree. Neither the counter is decreased, nor is the parent
tree's AG entry untagged properly.

Since the tags in the per-mount tree are not cleared, the inode
shrinker iterates over all AGs that have had reclaimable inodes at one
point in time.

The counters on the other hand signal an increasing amount of slab
objects to reclaim. Since "70e60ce xfs: convert inode shrinker to
per-filesystem context" this is not a real issue anymore because the
shrinker bails out after one iteration.

But the problem was observable on a machine running v2.6.34, where the
reclaimable work increased and each process going into direct reclaim
eventually got stuck on the xfs inode shrinking path, trying to scan
several million objects.

Fix this by properly unwinding the reclaimable-state tracking of an
inode when it is reclaimed.
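
The symmetric bookkeeping described above can be modelled with a counter per AG and one tag bit per AG. This is only a sketch under simplifying assumptions: a plain boolean array stands in for the per-mount tree tags, and all demo_ names and the AG count are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_AG_COUNT 4   /* hypothetical number of allocation groups */

struct demo_ag {
    int reclaimable;      /* per-AG count of reclaimable inodes */
};

struct demo_mount {
    struct demo_ag ag[DEMO_AG_COUNT];
    bool ag_tagged[DEMO_AG_COUNT];  /* stand-in for per-mount tree tags */
};

void demo_mark_reclaimable(struct demo_mount *mp, int agno)
{
    assert(agno >= 0 && agno < DEMO_AG_COUNT);
    if (mp->ag[agno].reclaimable++ == 0)
        mp->ag_tagged[agno] = true;   /* first reclaimable inode: tag the AG */
}

/* The unwinding step: undo both the counter and, if needed, the tag. */
void demo_reclaim_done(struct demo_mount *mp, int agno)
{
    assert(agno >= 0 && agno < DEMO_AG_COUNT);
    assert(mp->ag[agno].reclaimable > 0);
    if (--mp->ag[agno].reclaimable == 0)
        mp->ag_tagged[agno] = false;  /* last one gone: untag the AG */
}

/* Number of AGs a shrinker walk would still have to visit. */
int demo_tagged_ags(const struct demo_mount *mp)
{
    int n = 0;

    for (int i = 0; i < DEMO_AG_COUNT; i++)
        n += mp->ag_tagged[i] ? 1 : 0;
    return n;
}
```

Without the decrement and untag in demo_reclaim_done(), both the counter and the set of tagged AGs would only ever grow, which matches the symptom described for v2.6.34.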

Signed-off-by: Johannes Weiner
Cc: stable@kernel.org
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Johannes Weiner
2010-10-07 11:35:48 +0800
089eed29b Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
writeback: always use sb->s_bdi for writeback purposes

Linus Torvalds
2010-10-07 02:11:18 +0800
8fe9793af Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: Initialize total_len in fuse_retrieve()

Linus Torvalds
2010-10-07 00:50:41 +0800

04 Oct, 2010

2 commits

aaead25b9 writeback: always use sb->s_bdi for writeback purposes

We currently use struct backing_dev_info for various different purposes.
Originally it was introduced to describe a backing device which includes
an unplug and congestion function and various bits of readahead information
and VM-relevant flags. We're also using it for tracking dirty inodes for
writeback.

To make writeback properly find all inodes we need to only access the
per-filesystem backing_device pointed to by the superblock in ->s_bdi
inside the writeback code, and not the instances pointed to by
inode->i_mapping->backing_dev which can be overridden by special devices
or might not be set at all by some filesystems.

Long term we should split out the writeback-relevant bits of struct
backing_device_info (which includes more than the current bdi_writeback)
and only point to it from the superblock while leaving the traditional
backing device as a separate structure that can be overridden by devices.

The one exception for now is the block device filesystem which really
wants different writeback contexts for its different (internal) inodes
to handle the writeout more efficiently. For now we do this with
a hack in fs-writeback.c because we're so late in the cycle, but in
the future I plan to replace this with a superblock method that allows
for multiple writeback contexts per filesystem.
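
The lookup rule described above (always go through the superblock, with one exception for the block device filesystem) might look roughly like the sketch below. The structures are heavily simplified stand-ins, and detecting the exception by comparing a filesystem name against "bdev" is an assumption made for illustration, not a statement about the actual hack in fs-writeback.c.

```c
#include <string.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct demo_bdi {
    const char *name;
};

struct demo_super_block {
    const char *fs_name;        /* filesystem type name */
    struct demo_bdi *s_bdi;     /* per-filesystem writeback context */
};

struct demo_inode {
    struct demo_super_block *i_sb;
    struct demo_bdi *mapping_bdi;   /* what i_mapping would point to */
};

struct demo_bdi *demo_inode_to_bdi(const struct demo_inode *inode)
{
    const struct demo_super_block *sb = inode->i_sb;

    /* Exception: block device inodes keep their own writeback context. */
    if (strcmp(sb->fs_name, "bdev") == 0)
        return inode->mapping_bdi;

    /* Everyone else is found through the superblock. */
    return sb->s_bdi;
}
```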

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-10-04 20:25:33 +0800
0157443c5 fuse: Initialize total_len in fuse_retrieve()

fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this
function

Initialize total_len to zero, else its value will be undefined.

Signed-off-by: Geert Uytterhoeven
Signed-off-by: Miklos Szeredi

Geert Uytterhoeven
2010-10-04 16:45:32 +0800

02 Oct, 2010

5 commits

c6ea21e35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: prevent infinite recursion in cifs_reconnect_tcon
cifs: set backing_dev_info on new S_ISREG inodes

Linus Torvalds
2010-10-02 06:03:37 +0800
9d8117e72 reiserfs: fix unwanted reiserfs lock recursion

Prevent recursively locking the reiserfs lock in reiserfs_unpack()
because we may call journal_begin() that requires the lock to be taken
only once, otherwise it won't be able to release the lock while taking
other mutexes, ending up in inverted dependencies between the journal
mutex and the reiserfs lock for example.

This fixes:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.35.4.4a #3
-------------------------------------------------------
lilo/1620 is trying to acquire lock:
(&journal->j_mutex){+.+...}, at: [] do_journal_begin_r+0x7f/0x340 [reiserfs]

but task is already holding lock:
(&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] reiserfs_write_lock+0x28/0x40 [reiserfs]
[] do_journal_begin_r+0x86/0x340 [reiserfs]
[] journal_begin+0x77/0x140 [reiserfs]
[] reiserfs_remount+0x224/0x530 [reiserfs]
[] do_remount_sb+0x60/0x110
[] do_mount+0x625/0x790
[] sys_mount+0x84/0xb0
[] syscall_call+0x7/0xb

-> #0 (&journal->j_mutex){+.+...}:
[] __lock_acquire+0x1026/0x1180
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] do_journal_begin_r+0x7f/0x340 [reiserfs]
[] journal_begin+0x77/0x140 [reiserfs]
[] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
[] reiserfs_get_block+0x22c/0x1530 [reiserfs]
[] __block_prepare_write+0x1bb/0x3a0
[] block_prepare_write+0x26/0x40
[] reiserfs_prepare_write+0x88/0x170 [reiserfs]
[] reiserfs_unpack+0xe6/0x120 [reiserfs]
[] reiserfs_ioctl+0x272/0x320 [reiserfs]
[] vfs_ioctl+0x28/0xa0
[] do_vfs_ioctl+0x32d/0x5c0
[] sys_ioctl+0x63/0x70
[] syscall_call+0x7/0xb

other info that might help us debug this:

2 locks held by lilo/1620:
#0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [] reiserfs_unpack+0x6a/0x120 [reiserfs]
#1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

stack backtrace:
Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3
Call Trace:
[] __lock_acquire+0x1026/0x1180
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] do_journal_begin_r+0x7f/0x340 [reiserfs]
[] journal_begin+0x77/0x140 [reiserfs]
[] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
[] reiserfs_get_block+0x22c/0x1530 [reiserfs]
[] __block_prepare_write+0x1bb/0x3a0
[] block_prepare_write+0x26/0x40
[] reiserfs_prepare_write+0x88/0x170 [reiserfs]
[] reiserfs_unpack+0xe6/0x120 [reiserfs]
[] reiserfs_ioctl+0x272/0x320 [reiserfs]
[] vfs_ioctl+0x28/0xa0
[] do_vfs_ioctl+0x32d/0x5c0
[] sys_ioctl+0x63/0x70
[] syscall_call+0x7/0xb

Reported-by: Jarek Poplawski
Tested-by: Jarek Poplawski
Signed-off-by: Frederic Weisbecker
Cc: Jeff Mahoney
Cc: All since 2.6.32
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Frederic Weisbecker
2010-10-02 01:50:59 +0800
3f259d092 reiserfs: fix dependency inversion between inode and reiserfs mutexes

The reiserfs mutex already depends on the inode mutex, so we can't lock
the inode mutex in reiserfs_unpack() without using the safe locking API,
because reiserfs_unpack() is always called with the reiserfs mutex locked.

This fixes:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.35c #13
-------------------------------------------------------
lilo/1606 is trying to acquire lock:
(&sb->s_type->i_mutex_key#8){+.+.+.}, at: [] reiserfs_unpack+0x60/0x110 [reiserfs]

but task is already holding lock:
(&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] reiserfs_write_lock+0x28/0x40 [reiserfs]
[] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs]
[] reiserfs_fill_super+0x941/0xe60 [reiserfs]
[] get_sb_bdev+0x117/0x170
[] get_super_block+0x21/0x30 [reiserfs]
[] vfs_kern_mount+0x6a/0x1b0
[] do_kern_mount+0x39/0xe0
[] do_mount+0x340/0x790
[] sys_mount+0x84/0xb0
[] syscall_call+0x7/0xb

-> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
[] __lock_acquire+0x1026/0x1180
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] reiserfs_unpack+0x60/0x110 [reiserfs]
[] reiserfs_ioctl+0x272/0x320 [reiserfs]
[] vfs_ioctl+0x28/0xa0
[] do_vfs_ioctl+0x32d/0x5c0
[] sys_ioctl+0x63/0x70
[] syscall_call+0x7/0xb

other info that might help us debug this:

1 lock held by lilo/1606:
#0: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

stack backtrace:
Pid: 1606, comm: lilo Not tainted 2.6.35c #13
Call Trace:
[] __lock_acquire+0x1026/0x1180
[] lock_acquire+0x67/0x80
[] __mutex_lock_common+0x4d/0x410
[] mutex_lock_nested+0x18/0x20
[] reiserfs_unpack+0x60/0x110 [reiserfs]
[] reiserfs_ioctl+0x272/0x320 [reiserfs]
[] vfs_ioctl+0x28/0xa0
[] do_vfs_ioctl+0x32d/0x5c0
[] sys_ioctl+0x63/0x70
[] syscall_call+0x7/0xb

Reported-by: Jarek Poplawski
Tested-by: Jarek Poplawski
Signed-off-by: Frederic Weisbecker
Cc: Jeff Mahoney
Cc: [2.6.32 and later]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Frederic Weisbecker
2010-10-02 01:50:59 +0800
3036e7b49 proc: make /proc/pid/limits world readable

Having the limits file world readable will ease the task of system
management on systems where root privileges might be restricted.

An admin with restricted root privileges could not otherwise check
other users' process limits.

Also it'd align with most of the /proc stat files.

Signed-off-by: Jiri Olsa
Acked-by: Neil Horman
Cc: Eugene Teo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Olsa
2010-10-02 01:50:59 +0800
f569599ae cifs: prevent infinite recursion in cifs_reconnect_tcon

cifs_reconnect_tcon is called from smb_init. After a successful
reconnect, cifs_reconnect_tcon will call reset_cifs_unix_caps. That
function will, in turn, call CIFSSMBQFSUnixInfo and CIFSSMBSetFSUnixInfo.
Those functions also call smb_init.

It's possible for the session and tcon reconnect to succeed, and then
for another cifs_reconnect to occur before CIFSSMBQFSUnixInfo or
CIFSSMBSetFSUnixInfo is called. That'll cause those functions to call
smb_init and cifs_reconnect_tcon again, ad infinitum...

Break the infinite recursion by having those functions use a new
smb_init variant that doesn't attempt to perform a reconnect.
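
The shape of the fix can be shown with a toy model: the normal entry point may trigger a reconnect, while the code that runs as part of the reconnect only uses a variant that never does. All demo_ names and the global flags are hypothetical; the real functions take request and connection arguments that are omitted here.

```c
int demo_need_reconnect = 1;  /* pretend the connection was lost */
int demo_reconnects;          /* how many reconnects were performed */

/* Variant that only sets up the request and never reconnects. */
int demo_smb_init_no_reconnect(void)
{
    return 0;
}

/* Stand-in for the capability reset that runs after a reconnect. */
static int demo_reset_unix_caps(void)
{
    /* Using the normal entry point here would recurse without end. */
    return demo_smb_init_no_reconnect();
}

static int demo_reconnect_tcon(void)
{
    if (!demo_need_reconnect)
        return 0;
    demo_reconnects++;
    /*
     * Simulate the race from the message above: the connection drops
     * again before the capability calls get to run.
     */
    demo_need_reconnect = 1;
    return demo_reset_unix_caps();
}

/* Normal entry point: reconnect if needed, then set up the request. */
int demo_smb_init(void)
{
    int rc = demo_reconnect_tcon();

    if (rc)
        return rc;
    return demo_smb_init_no_reconnect();
}
```

Even though the simulated connection is always marked as needing a reconnect, one call to demo_smb_init() performs exactly one reconnect and returns.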

Reported-and-Tested-by: Michal Suchanek
Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2010-10-02 01:50:08 +0800

30 Sep, 2010

3 commits

0d4911081 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Don't walk off the end of fast symlinks.

Linus Torvalds
2010-09-30 11:38:07 +0800
1fc8a1178 ocfs2: Don't walk off the end of fast symlinks.

ocfs2 fast symlinks are NUL terminated strings stored inline in the
inode data area. However, disk corruption or a local attacker could, in
theory, remove that NUL. Because we're using strlen() (my fault,
introduced in a731d1 when removing vfs_follow_link()), we could walk off
the end of that string.
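
The general remedy for this class of bug is to bound the scan by the size of the storage area rather than trusting the terminator. A minimal sketch, using standard memchr() and a hypothetical helper name (the actual patch may use a different bounded-length call):

```c
#include <stddef.h>
#include <string.h>

/*
 * Length of the string stored in data, never looking past size bytes.
 * Returns size if no NUL terminator is found within the area.
 */
size_t demo_bounded_len(const char *data, size_t size)
{
    const char *nul = memchr(data, '\0', size);

    return nul ? (size_t)(nul - data) : size;
}
```

Unlike strlen(), this stops at the end of the inline area even when the terminating NUL has been overwritten.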

Signed-off-by: Joel Becker
Cc: stable@kernel.org

Joel Becker
2010-09-30 08:33:05 +0800
522440ed5 cifs: set backing_dev_info on new S_ISREG inodes

Testing on a very recent kernel (2.6.36-rc6) made this warning pop:

WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x65/0x70()
Hardware name:
Dirtiable inode bdi default != sb bdi cifs

...the following patch fixes it and seems to be the obviously correct
thing to do for cifs.

Cc: stable@kernel.org
Acked-by: Dave Kleikamp
Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2010-09-30 03:23:23 +0800

29 Sep, 2010

1 commit

80168676e xfs: force background CIL push under sustained load

I have been seeing occasional pauses in transaction throughput up to
30s long under heavy parallel workloads. The only notable thing was
that the xfsaild was trying to be active during the pauses, but
making no progress. It was running exactly 20 times a second (on the
50ms no-progress backoff), and the number of pushbuf events was
constant across this time as well. IOWs, the xfsaild appeared to be
stuck on buffers that it could not push out.

Further investigation indicated that it was trying to push out inode
buffers that were pinned and/or locked. The xfsbufd was also getting
woken at the same frequency (by the xfsaild, no doubt) to push out
delayed write buffers. The xfsbufd was not making any progress
because all the buffers in the delwri queue were pinned. This
scan-and-make-no-progress dance went on in the trace for some seconds,
before the xfssyncd came along and issued a log force, and then
things started going again.

However, I noticed something strange about the log force - there
were way too many IO's issued. 516 log buffers were written, to be
exact. That added up to 129MB of log IO, which got me very
interested because it's almost exactly 25% of the size of the log.
The delayed logging code is supposed to aggregate the minimum of 25%
of the log or 8MB worth of changes before flushing. That's what
really puzzled me - why did a log force write 129MB instead of only
8MB?

Essentially what has happened is that no CIL pushes had occurred
since the previous tail push which cleared out 25% of the log space.
That caused all the new transactions to block because there wasn't
log space for them, but they kick the xfsaild to push the tail.
However, the xfsaild was not making progress because there were
buffers it could not lock and flush, and the xfsbufd could not flush
them because they were pinned. As a result, both the xfsaild and the
xfsbufd could not move the tail of the log forward without the CIL
first committing.

The cause of the problem was that the background CIL push, which
should happen when 8MB of aggregated changes have been committed, is
being held off by the concurrent transaction commit load. The
background push does a down_write_trylock() which will fail if there
is a concurrent transaction commit holding the push lock in read
mode. With 8 CPUs all doing transactions as fast as they can, there
were enough concurrent transaction commits to hold off the background
push until tail-pushing could no longer free log space, and the halt
would occur.

It should be noted that there is no reason why it would halt at 25%
of log space used by a single CIL checkpoint. This bug could
definitely violate the "no transaction should be larger than half
the log" requirement and hence result in corruption if the system
crashed under heavy load. This sort of bug is exactly the reason why
delayed logging was tagged as experimental....

The fix is to start blocking background pushes once the threshold
has been exceeded. Rework the threshold calculations to keep the
amount of log space a CIL checkpoint can use to below that of the
AIL push threshold to avoid the problem completely.
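
One way to picture the policy is as a pure decision function: below the soft threshold nothing happens, above it a push is attempted opportunistically, and past a second limit the push stops yielding to concurrent commits. The soft threshold follows the "minimum of 25% of the log or 8MB" rule quoted earlier; the separate hard limit, its value, and all demo_ names are assumptions made for this sketch and do not come from the patch.

```c
#include <stddef.h>

enum demo_push_mode {
    DEMO_PUSH_NONE,       /* not enough aggregated changes yet */
    DEMO_PUSH_TRYLOCK,    /* push only if no commit holds the lock */
    DEMO_PUSH_BLOCKING    /* wait for the lock; commits must yield */
};

#define DEMO_MB (1024u * 1024u)

/* Soft threshold: the smaller of 25% of the log and 8MB. */
size_t demo_cil_soft_limit(size_t log_bytes)
{
    size_t quarter = log_bytes / 4;
    size_t cap = 8 * (size_t)DEMO_MB;

    return quarter < cap ? quarter : cap;
}

/* hard_limit is a hypothetical second threshold supplied by the caller. */
enum demo_push_mode demo_cil_push_mode(size_t cil_bytes, size_t log_bytes,
                                       size_t hard_limit)
{
    if (cil_bytes < demo_cil_soft_limit(log_bytes))
        return DEMO_PUSH_NONE;
    if (cil_bytes < hard_limit)
        return DEMO_PUSH_TRYLOCK;
    return DEMO_PUSH_BLOCKING;
}
```

The failure described above corresponds to staying in the try-lock mode forever: under constant commit load the try-lock never succeeds, so the aggregated changes grow without bound.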

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-09-29 20:51:03 +0800

25 Sep, 2010

1 commit

d1f3e68ef Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
o2dlm: force free mles during dlm exit
ocfs2: Sync inode flags with ext2.
ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits.
ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent.
ocfs2: update ctime when changing the file's permission by setfacl
ocfs2/net: fix uninitialized ret in o2net_send_message_vec()
Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c
Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes.
ocfs2: Fix lockdep warning in reflink.
ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.

Linus Torvalds
2010-09-25 05:08:15 +0800

24 Sep, 2010

5 commits

5dad6c39d o2dlm: force free mles during dlm exit

While umounting, a block mle doesn't get freed if dlm is shutdown after
master request is received but before assert master. This results in unclean
shutdown of dlm domain.

This patch frees all mles that lie around after other nodes were notified about
exiting the dlm and marking dlm state as leaving. Only block mles are expected
to be around, so we log ERROR for other mles but still free them.

Signed-off-by: Srinivas Eeda
Signed-off-by: Joel Becker

Srinivas Eeda
2010-09-24 05:16:53 +0800
0000b8620 ocfs2: Sync inode flags with ext2.

We keep our inode flags in sync with ext2 and define them as hex
values. But commit 3669567 (4 years ago) moved all of these
values to include/linux/fs.h, so we should use them from
there, as ext2 does. Sync our inode flags with ext2 by
using FS_*.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-24 05:16:49 +0800
4a452de4f ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits.

The first time I read the function ocfs2_resmap_resv_bits, I wondered
what 'wanted' was used for and puzzled over the comments.
Then I found it is only used if the reservation is empty. ;)

So we'd better move it into the parens, which makes the code more
readable. What's more, ocfs2_resmap_resv_bits is called so frequently
that we should save some CPU cycles.

Acked-by: Mark Fasheh
Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-24 05:16:47 +0800
47dea4237 ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent.

e_leaf_clusters is a le16, so use cpu_to_le16 instead
of cpu_to_le32.
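
Why the width matters: on a big-endian CPU the cpu_to_le* helpers are byte swaps, so converting with the 32-bit variant and then storing into a 16-bit field keeps the wrong half of the swapped value. The sketch below emulates the swaps with portable helpers (hypothetical demo_ names) so that the effect is visible on any host.

```c
#include <stdint.h>

/* What a 16-bit CPU-to-little-endian conversion does on big-endian. */
uint16_t demo_swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* What a 32-bit CPU-to-little-endian conversion does on big-endian. */
uint32_t demo_swab32(uint32_t v)
{
    return (v << 24) | ((v & 0xff00u) << 8) |
           ((v >> 8) & 0xff00u) | (v >> 24);
}

/* Storing a 32-bit swapped value into a 16-bit field truncates it. */
uint16_t demo_store_wrong(uint32_t clusters)
{
    return (uint16_t)demo_swab32(clusters);
}

/* Storing a 16-bit swapped value keeps all the significant bytes. */
uint16_t demo_store_right(uint16_t clusters)
{
    return demo_swab16(clusters);
}
```

For a cluster count of 4, the wrong variant stores 0 while the right one stores 0x0400, which reads back as 4 in little-endian byte order. On little-endian hosts both conversions are no-ops, which is why such a bug can go unnoticed there.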

What's more, we change 'clusters' to unsigned int to
signify that the size of 'clusters' isn't important here.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-24 05:16:34 +0800
12828061c ocfs2: update ctime when changing the file's permission by setfacl

In commit 30e2bab, ext3 fixed it. So change it accordingly in ocfs2.

Steps to reproduce:
# touch aaa
# stat -c %Z aaa
1283760364
# setfacl -m 'u::x,g::x,o::x' aaa
# stat -c %Z aaa
1283760364

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-24 05:16:21 +0800

23 Sep, 2010

4 commits

1c2499ae8 /proc/pid/smaps: fix dirty pages accounting

Currently, /proc/<pid>/smaps has wrong dirty pages accounting.
Shared_Dirty and Private_Dirty count only pte-dirty pages and ignore
the PG_dirty page flag. This differs from the documentation, and is also
inconsistent with the Referenced field (Referenced checks both pte and
page flags).

This patch fixes it.

Test program:

large-array.c
---------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char array[1*1024*1024*1024L];

int main(void)
{
memset(array, 1, sizeof(array));
pause();

return 0;
}
---------------------------------------------------

Test case:
1. run ./large-array
2. cat /proc/`pidof large-array`/smaps
3. swapoff -a
4. cat /proc/`pidof large-array`/smaps again

Test result:

00601000-40601000 rw-p 00000000 00:00 0
Size: 1048576 kB
Rss: 1048576 kB
Pss: 1048576 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 218992 kB

00601000-40601000 rw-p 00000000 00:00 0
Size: 1048576 kB
Rss: 1048576 kB
Pss: 1048576 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 1048576 kB

Acked-by: Hugh Dickins
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-09-23 08:22:39 +0800
a0c42bac7 aio: do not return ERESTARTSYS as a result of AIO

OCFS2 can return ERESTARTSYS from its write function when the process is
signalled while waiting for a cluster lock (and the filesystem is mounted
with intr mount option). Generally, it seems reasonable to allow
filesystems to return this error code from its IO functions. As we must
not leak ERESTARTSYS (and similar error codes) to userspace as a result of
an AIO operation, we have to properly convert it to EINTR inside AIO code
(restarting the syscall isn't really an option because other AIO could
have been already submitted by the same io_submit syscall).
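
The conversion amounts to a small filter applied to the result of each AIO operation before it is reported. In the sketch below the restart codes are redefined locally because user-space <errno.h> does not expose them; the numeric values are meant to mirror the kernel-internal ones, and demo_aio_fixup_result is a hypothetical name.

```c
#include <errno.h>

/* Kernel-internal restart codes, redefined here for illustration. */
#define DEMO_ERESTARTSYS            512
#define DEMO_ERESTARTNOINTR         513
#define DEMO_ERESTARTNOHAND         514
#define DEMO_ERESTART_RESTARTBLOCK  516

/* Map restart codes to -EINTR; pass everything else through. */
long demo_aio_fixup_result(long ret)
{
    switch (ret) {
    case -DEMO_ERESTARTSYS:
    case -DEMO_ERESTARTNOINTR:
    case -DEMO_ERESTARTNOHAND:
    case -DEMO_ERESTART_RESTARTBLOCK:
        /* The operation cannot be restarted, so report an interrupt. */
        return -EINTR;
    default:
        return ret;     /* byte counts and ordinary errors are unchanged */
    }
}
```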

Signed-off-by: Jan Kara
Reviewed-by: Jeff Moyer
Cc: Christoph Hellwig
Cc: Zach Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2010-09-23 08:22:39 +0800
c227e6902 /proc/vmcore: fix seeking

Commit 73296bc611 ("procfs: Use generic_file_llseek in /proc/vmcore")
broke seeking on /proc/vmcore. This changes it back to use default_llseek
in order to restore the original behaviour.

The problem with generic_file_llseek is that it only allows seeks up to
inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual
file systems. We should merge generic_file_llseek and default_llseek some
day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore
is the safer solution.
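
The difference between the two behaviours can be reduced to a single bounds check. The sketch below models only absolute seeks and uses hypothetical function names; it is not the kernel code for either helper.

```c
#include <errno.h>
#include <stdint.h>

/* Bounded variant: refuses offsets beyond the filesystem's maximum. */
int64_t demo_bounded_llseek(int64_t offset, int64_t maxbytes)
{
    if (offset < 0 || offset > maxbytes)
        return -EINVAL;
    return offset;
}

/* Unbounded variant: any non-negative offset is accepted. */
int64_t demo_unbounded_llseek(int64_t offset)
{
    if (offset < 0)
        return -EINVAL;
    return offset;
}
```

With a maximum of zero, as described for procfs, the bounded variant rejects every seek except to offset 0, which is what broke readers of /proc/vmcore.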

Signed-off-by: Arnd Bergmann
Cc: Frederic Weisbecker
Reported-by: CAI Qian
Tested-by: CAI Qian
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arnd Bergmann
2010-09-23 08:22:38 +0800
767b68e96 Prevent freeing uninitialized pointer in compat_do_readv_writev

In 32-bit compatibility mode, the error handling for
compat_do_readv_writev() may free an uninitialized pointer, potentially
leading to all sorts of ugly memory corruption. This is reliably
triggerable by unprivileged users by invoking the readv()/writev()
syscalls with an invalid iovec pointer. The below patch fixes this to
emulate the non-compat version.
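
The usual pattern for avoiding this bug is to point the working pointer at the on-stack array before any error path can be taken, and to free it only when it no longer points there. The following user-space sketch illustrates that pattern; the function name, the limits and the NULL check standing in for a bad user pointer are assumptions, not the kernel code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define DEMO_FAST_IOV 8     /* assumed size of the on-stack fast path */
#define DEMO_MAX_IOV  1024  /* assumed upper bound on vector length */

/*
 * Copy n iovec entries from src and return their total length, or a
 * negative errno. src == NULL models an invalid user pointer.
 */
ssize_t demo_total_len(const struct iovec *src, size_t n)
{
    struct iovec iovstack[DEMO_FAST_IOV];
    struct iovec *iov = iovstack;   /* initialized before any error path */
    ssize_t ret;

    if (src == NULL) {
        ret = -EFAULT;
        goto out;
    }
    if (n > DEMO_MAX_IOV) {
        ret = -EINVAL;
        goto out;
    }
    if (n > DEMO_FAST_IOV) {
        iov = malloc(n * sizeof(*iov));
        if (iov == NULL) {
            ret = -ENOMEM;
            goto out;               /* free(NULL) below is harmless */
        }
    }
    memcpy(iov, src, n * sizeof(*iov));
    ret = 0;
    for (size_t i = 0; i < n; i++)
        ret += (ssize_t)iov[i].iov_len;
out:
    if (iov != iovstack)            /* only free what was allocated */
        free(iov);
    return ret;
}
```

If iov were left uninitialized, the early -EFAULT exit would pass a garbage pointer to free(), which is the user-space equivalent of the problem described above.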

Introduced by commit b83733639a49 ("compat: factor out
compat_rw_copy_check_uvector from compat_do_readv_writev")

Signed-off-by: Dan Rosenberg
Cc: stable@kernel.org (2.6.35)
Cc: Al Viro
Signed-off-by: Linus Torvalds

Dan Rosenberg
2010-09-23 08:22:38 +0800