02 Aug, 2010

1 commit

  • nfs_commit_inode() needs to be defined irrespective of whether or not
    we are supporting NFSv3 and NFSv4.

    Allow the compiler to optimise away code in the NFSv2-only case by
    converting it into an inlined stub function.

    Reported-and-tested-by: Ingo Molnar
    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
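
    A minimal sketch of the stub approach described in this entry, assuming the
    usual NFSv3/NFSv4 config symbols; the declaration placement and the zero
    return value are illustrative rather than the exact in-tree change:

        /* real commit support is only built for NFSv3/NFSv4 */
        #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
        extern int nfs_commit_inode(struct inode *inode, int how);
        #else
        /* NFSv2-only build: an inline stub the compiler can optimise away */
        static inline int nfs_commit_inode(struct inode *inode, int how)
        {
                return 0;       /* nothing to commit over NFSv2 */
        }
        #endif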
     

31 Jul, 2010

5 commits


30 Jul, 2010

1 commit

  • It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
    credentials by incrementing their usage count after their replacement by the
    task being accessed.

    What happens is that get_task_cred() can race with commit_creds():

    TASK_1:      -->get_task_cred(TASK_2)
    TASK_1:      rcu_read_lock()
    TASK_1:      __cred = __task_cred(TASK_2)
    TASK_2:      -->commit_creds()
    TASK_2:      old_cred = TASK_2->real_cred
    TASK_2:      TASK_2->real_cred = ...
    TASK_2:      put_cred(old_cred)
    TASK_2:        call_rcu(old_cred)
                 [__cred->usage == 0]
    TASK_1:      get_cred(__cred)
    TASK_1:      [__cred->usage == 1]
    TASK_1:      rcu_read_unlock()
    RCU_CLEANER: -->put_cred_rcu()
    RCU_CLEANER: [__cred->usage == 1]
    RCU_CLEANER: panic()

    However, since a task's credentials are generally not changed very often, we
    can reasonably make use of a loop that reads the creds pointer and uses
    atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.

    If successful, we can safely return the credentials in the knowledge that, even
    if the task we're accessing has released them, they haven't gone to the RCU
    cleanup code.

    We then change task_state() in procfs to use get_task_cred() rather than
    calling get_cred() on the result of __task_cred(), as that suffers from the
    same problem.

    Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
    tripped when it is noticed that the usage count is not zero as it ought to be,
    for example:

    kernel BUG at kernel/cred.c:168!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/kernel/mm/ksm/run
    CPU 0
    Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex
    745
    RIP: 0010:[] [] __put_cred+0xc/0x45
    RSP: 0018:ffff88019e7e9eb8 EFLAGS: 00010202
    RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
    RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
    RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
    R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
    FS: 00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
    Stack:
    ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
    ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
    ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
    Call Trace:
    [] put_cred+0x13/0x15
    [] commit_creds+0x16b/0x175
    [] set_current_groups+0x47/0x4e
    [] sys_setgroups+0xf6/0x105
    [] system_call_fastpath+0x16/0x1b
    Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
    48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 0b eb fe 65 48 8b
    04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
    RIP [] __put_cred+0xc/0x45
    RSP
    ---[ end trace df391256a100ebdd ]---

    Signed-off-by: David Howells
    Acked-by: Jiri Olsa
    Signed-off-by: Linus Torvalds

    David Howells
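
    A minimal sketch of the retry loop described in this entry; the cast is
    needed because __task_cred() returns a const pointer, and the in-tree
    helper may differ in details such as error checking:

        #include <linux/cred.h>
        #include <linux/sched.h>

        const struct cred *get_task_cred(struct task_struct *task)
        {
                const struct cred *cred;

                rcu_read_lock();
                do {
                        cred = __task_cred(task);
                        /* retry if the cred was replaced and its usage
                         * count already dropped to zero under us */
                } while (!atomic_inc_not_zero(&((struct cred *)cred)->usage));
                rcu_read_unlock();

                return cred;
        }

    Because the increment only succeeds while the count is still non-zero, the
    credentials can never be handed back after they have been queued for RCU
    cleanup.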
     

29 Jul, 2010

3 commits

  • The function ecryptfs_uid_hash wrongly assumes that the
    second parameter to hash_long() is the number of hash
    buckets instead of the number of hash bits.
    This patch fixes that and renames the variable
    ecryptfs_hash_buckets to ecryptfs_hash_bits to make it
    clearer.

    Fixes: CVE-2010-2492

    Signed-off-by: Andre Osterhues
    Signed-off-by: Tyler Hicks
    Signed-off-by: Linus Torvalds

    Andre Osterhues
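
    A minimal sketch of the corrected hash helper, assuming hash_long() from
    <linux/hash.h>; the surrounding table setup is omitted:

        #include <linux/hash.h>

        static int ecryptfs_hash_bits;  /* was: ecryptfs_hash_buckets */

        /* hash_long() takes the number of hash *bits*, not buckets;
         * the bucket count is 1 << ecryptfs_hash_bits */
        #define ecryptfs_uid_hash(uid) \
                hash_long((unsigned long)(uid), ecryptfs_hash_bits)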
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: use complete_all and wake_up_all
    ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES"
    ceph: fix dentry lease release
    ceph: fix leak of dentry in ceph_init_dentry() error path
    ceph: fix pg_mapping leak on pg_temp updates
    ceph: fix d_release dop for snapdir, snapped dentries
    ceph: avoid dcache readdir for snapdir

    Linus Torvalds
     
  • If we don't need a huge amount of memory in ->readdir() then
    we can use kmalloc rather than vmalloc to allocate it. This
    should cut down on the greater overheads associated with
    vmalloc for smaller directories.

    We may be able to eliminate vmalloc entirely at some stage,
    but this is easy to do right away.

    Also use GFP_NOFS to avoid any issues with deleting inodes while under a
    glock, and, as Linus suggested, factor out the alloc/dealloc into a pair
    of helpers (sketched below).

    I've given this a test with a variety of different sized
    directories and it seems to work ok.

    Cc: Andrew Morton
    Cc: Nick Piggin
    Cc: Prarit Bhargava
    Signed-off-by: Steven Whitehouse
    Signed-off-by: Linus Torvalds

    Steven Whitehouse
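
    A minimal sketch of the factored-out helpers, assuming the era's
    three-argument __vmalloc(); the helper names and the KMALLOC_MAX_SIZE
    cutoff are illustrative:

        #include <linux/slab.h>
        #include <linux/vmalloc.h>

        static void *gfs2_alloc_sort_buffer(unsigned int size)
        {
                void *ptr = NULL;

                /* small directories: kmalloc, with GFP_NOFS so we never
                 * recurse into the filesystem while holding a glock */
                if (size < KMALLOC_MAX_SIZE)
                        ptr = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
                /* large directories: fall back to vmalloc */
                if (!ptr)
                        ptr = __vmalloc(size, GFP_NOFS, PAGE_KERNEL);
                return ptr;
        }

        static void gfs2_free_sort_buffer(void *ptr)
        {
                if (is_vmalloc_addr(ptr))
                        vfree(ptr);
                else
                        kfree(ptr);
        }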
     

28 Jul, 2010

2 commits


27 Jul, 2010

3 commits

  • Supporting symlinks from untagged to tagged directories is reasonable,
    and needed to support CONFIG_SYSFS_DEPRECATED. So don't fail a priori;
    allow that case to work.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • This happens for network devices when SYSFS_DEPRECATED is enabled.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Recently my tagged sysfs support revealed a flaw in the device core, which
    a few rare drivers run into: we don't always put network devices in a
    class subdirectory named net/.

    Since we are not creating the class directory the network devices wind
    up in a non-tagged directory, but the symlinks to the network devices
    from /sys/class/net are in a tagged directory. All of which works
    until we go to remove or rename the symlink. When we remove or rename
    a symlink we look in the namespace of the target of the symlink.
    Since the target of the symlink is in a non-tagged sysfs directory we
    don't have a namespace to look in, and we fail to remove the symlink.

    Detect this problem up front and simply don't create symlinks we won't
    be able to remove later. This prevents symlink leakage and fails in
    a much clearer and more understandable way.

    Signed-off-by: Eric W. Biederman
    Cc: Andrew Morton
    Cc: Rafael J. Wysocki
    Cc: Maciej W. Rozycki
    Cc: Kay Sievers
    Cc: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

25 Jul, 2010

1 commit


24 Jul, 2010

4 commits


23 Jul, 2010

2 commits

  • We should always go to the MDS for readdir on the hidden snapdir. The
    set of snapshots can change at any time; the client can't trust its cache
    for that.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • Fix the security problem in the CIFS filesystem DNS lookup code in which a
    malicious redirect could be installed by a random user by simply adding a
    result record into one of their keyrings with add_key() and then invoking a
    CIFS DFS lookup [CVE-2010-2524].

    This is done by creating an internal keyring specifically for the caching of
    DNS lookups. To enforce the use of this keyring, the module init routine
    creates a set of override credentials with the keyring installed as the thread
    keyring and instructs request_key() to only install lookup result keys in that
    keyring.

    The override is then applied around the call to request_key().

    This has some additional benefits when a kernel service uses this module to
    request a key:

    (1) The result keys are owned by root, not the user that caused the lookup.

    (2) The result keys don't pop up in the user's keyrings.

    (3) The result keys don't come out of the quota of the user that caused the
    lookup.

    The keyring can be viewed as root by doing cat /proc/keys:

    2a0ca6c3 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: 1/4

    It can then be listed with 'keyctl list' by root.

    # keyctl list 0x2a0ca6c3
    1 key in keyring:
    726766307: --alswrv 0 0 dns_resolver: foo.bar.com

    Signed-off-by: David Howells
    Reviewed-and-Tested-by: Jeff Layton
    Acked-by: Steve French
    Signed-off-by: Linus Torvalds

    David Howells
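
    A minimal sketch of the override pattern described in this entry;
    dns_resolver_cache stands for the override credentials set up by the
    module init routine, and error handling is trimmed:

        #include <linux/cred.h>
        #include <linux/key.h>

        /* override credentials carrying the private caching keyring as
         * their thread keyring, created in the module init routine */
        static const struct cred *dns_resolver_cache;

        static struct key *dns_lookup_cached(const char *hostname)
        {
                const struct cred *saved_cred;
                struct key *rkey;

                /* run request_key() under the override credentials so the
                 * result key lands in (and is searched in) the private
                 * keyring, never in the calling user's keyrings */
                saved_cred = override_creds(dns_resolver_cache);
                rkey = request_key(&key_type_dns_resolver, hostname, "");
                revert_creds(saved_cred);

                return rkey;
        }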
     

22 Jul, 2010

1 commit


21 Jul, 2010

1 commit


20 Jul, 2010

7 commits

  • * 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev:
    xfs: track AGs with reclaimable inodes in per-ag radix tree
    xfs: convert inode shrinker to per-filesystem contexts
    mm: add context argument to shrinker callback

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE
    Btrfs: fix CLONE ioctl destination file size expansion to block boundary
    Btrfs: fix split_leaf double split corner case

    Linus Torvalds
     
  • https://bugzilla.kernel.org/show_bug.cgi?id=16348

    When the filesystem grows to a large number of allocation groups,
    the summing of reclaimable inodes gets expensive. In many cases,
    most AGs won't have any reclaimable inodes and so we are wasting CPU
    time aggregating over these AGs. This is particularly important for
    the inode shrinker that gets called frequently under memory
    pressure.

    To avoid the overhead, track AGs with reclaimable inodes in the
    per-ag radix tree so that we can find all the AGs with reclaimable
    inodes via a simple gang tag lookup. This involves setting the tag
    when the first reclaimable inode is tracked in the AG, and removing
    the tag when the last reclaimable inode is removed from the tree.
    Then the summation process becomes a loop walking the radix tree
    summing AGs with the reclaim tag set.

    This significantly reduces the overhead of scanning - a 6400 AG
    filesystem now only uses about 25% of a cpu in kswapd while slab
    reclaim progresses instead of being permanently stuck at 100% CPU
    and making little progress. Clean filesystems will see no overhead
    and the overhead only increases linearly with the number of dirty
    AGs.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
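
    A minimal sketch of the tag-based tracking and summation; the tag index
    and the per-AG field names follow the description above and are partly
    assumptions (locking around the lookup is omitted):

        #define XFS_ICI_RECLAIM_TAG     1       /* illustrative tag index */

        /* called when the first reclaimable inode is tracked in an AG */
        static void xfs_perag_set_reclaim_tag(struct xfs_mount *mp,
                                              xfs_agnumber_t agno)
        {
                spin_lock(&mp->m_perag_lock);
                radix_tree_tag_set(&mp->m_perag_tree, agno,
                                   XFS_ICI_RECLAIM_TAG);
                spin_unlock(&mp->m_perag_lock);
        }

        /* sum reclaimable inodes by walking only the tagged AGs */
        static int xfs_reclaimable_inodes(struct xfs_mount *mp)
        {
                struct xfs_perag *pag;
                xfs_agnumber_t agno = 0;
                int count = 0;

                while (radix_tree_gang_lookup_tag(&mp->m_perag_tree,
                                (void **)&pag, agno, 1,
                                XFS_ICI_RECLAIM_TAG) == 1) {
                        count += pag->pag_ici_reclaimable;
                        agno = pag->pag_agno + 1;
                }
                return count;
        }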
     
  • Now that the shrinker passes us a context, wire up a shrinker context per
    filesystem. This allows us to remove the global mount list and the
    locking problems it introduced. It also means that a shrinker call
    does not need to traverse clean filesystems before finding a
    filesystem with reclaimable inodes. This significantly reduces
    scanning overhead when lots of filesystems are present.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
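
    A minimal sketch of the per-filesystem wiring, using the 2.6.35-era
    shrinker callback signature; the field and function names are
    illustrative and the actual scan logic is elided:

        struct xfs_mount {
                /* ... existing mount fields ... */
                struct shrinker         m_inode_shrink;
        };

        static int xfs_reclaim_inode_shrink(struct shrinker *shrink,
                                            int nr_to_scan, gfp_t gfp_mask)
        {
                /* recover this filesystem's mount from the embedded
                 * shrinker - no global mount list needed */
                struct xfs_mount *mp = container_of(shrink,
                                struct xfs_mount, m_inode_shrink);

                /* ... scan only mp's reclaimable inodes here ... */
                return 0;
        }

        void xfs_inode_shrinker_register(struct xfs_mount *mp)
        {
                mp->m_inode_shrink.shrink = xfs_reclaim_inode_shrink;
                mp->m_inode_shrink.seeks = DEFAULT_SEEKS;
                register_shrinker(&mp->m_inode_shrink);
        }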
     
  • 1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check
    whether the donor file is append-only before writing to it.

    2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer
    overflow that allows a user to specify an out-of-bounds range to copy
    from the source file (if off + len wraps around). I haven't been able
    to successfully exploit this, but I'd imagine that a clever attacker
    could use this to read things he shouldn't. Even if it's not
    exploitable, it couldn't hurt to be safe.

    Signed-off-by: Dan Rosenberg
    cc: stable@kernel.org
    Signed-off-by: Chris Mason

    Dan Rosenberg
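
    A minimal sketch condensing the two checks into a hypothetical helper
    (the in-tree checks sit inline in the ioctl handler):

        static int clone_range_checks(struct file *dst_file,
                                      struct inode *src, u64 off, u64 len)
        {
                /* (1) the file being written to must not be append-only */
                if (dst_file->f_flags & O_APPEND)
                        return -EINVAL;

                /* (2) reject ranges where off + len wraps around or runs
                 *     past the end of the source file */
                if (off + len < off || off + len > i_size_read(src))
                        return -EINVAL;

                return 0;
        }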
     
  • The CLONE and CLONE_RANGE ioctls round up the range of extents being
    cloned to the block size when the range to clone extends to the end of file
    (this is always the case with CLONE). It was then using that offset when
    extending the destination file's i_size. Fix this by not setting i_size
    beyond the originally requested ending offset.

    This bug was introduced by a22285a6 (2.6.35-rc1).

    Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
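
    A minimal sketch of the fix, as a hypothetical helper; olen is the
    caller's requested length before block-size rounding:

        static void clone_update_isize(struct inode *inode, u64 destoff,
                                       u64 olen)
        {
                u64 endoff = destoff + olen;    /* requested end, pre-rounding */

                if (endoff > i_size_read(inode))
                        btrfs_i_size_write(inode, endoff);
        }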
     
  • split_leaf was not properly balancing leaves when it was forced to
    split a leaf twice. This commit adds an extra push left and right
    before forcing the double split in hopes of getting the slot where
    we want to insert at either the start or end of the leaf.

    If the extra pushes do work, then we are able to avoid splitting twice
    and we keep the tree properly balanced.

    Signed-off-by: Chris Mason

    Chris Mason
     

19 Jul, 2010

3 commits

  • Partition boundary calculation fails for DASD FBA disks under the
    following conditions:
    - disk is formatted with CMS FORMAT with a blocksize of more than
    512 bytes
    - all of the disk is reserved to a single CMS file using CMS RESERVE
    - the disk is accessed using the DIAG mode of the DASD driver

    Under these circumstances, the partition detection code tries to
    read the CMS label block containing partition-relevant information
    from logical block offset 1, while it is in fact located at physical
    block offset 1.

    Fix this problem by using the correct CMS label block location
    depending on the device type as determined by the DASD SENSE ID
    information.

    Signed-off-by: Peter Oberparleiter
    Signed-off-by: Martin Schwidefsky

    Peter Oberparleiter
     
  • The current shrinker implementation requires the registered callback
    to have global state to work from. This makes it difficult to shrink
    caches that are not global (e.g. per-filesystem caches). Pass the shrinker
    structure to the callback so that users can embed the shrinker structure
    in the context the shrinker needs to operate on and get back to it in the
    callback via container_of().

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
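
    A sketch of the changed interface as it looked in this era (internal
    bookkeeping fields omitted), with a hypothetical embedded user showing
    the container_of() pattern:

        struct shrinker {
                /* the callback now receives the shrinker itself */
                int (*shrink)(struct shrinker *, int nr_to_scan,
                              gfp_t gfp_mask);
                int seeks;              /* seeks to recreate an object */
                /* internal list/counter fields omitted */
        };

        /* a non-global cache embeds the shrinker in its own state */
        struct my_cache {
                struct shrinker shrinker;
                unsigned long   nr_objects;
        };

        static int my_cache_shrink(struct shrinker *shrink, int nr_to_scan,
                                   gfp_t gfp_mask)
        {
                struct my_cache *c = container_of(shrink, struct my_cache,
                                                  shrinker);
                /* ... free up to nr_to_scan objects from c ... */
                return c->nr_objects;   /* remaining object count */
        }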
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
    ocfs2: Silence gcc warning in ocfs2_write_zero_page().
    jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions
    ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node
    ocfs2: Don't duplicate pages past i_size during CoW.
    ocfs2: tighten up strlen() checking
    ocfs2: Make xattr reflink work with new local alloc reservation.
    ocfs2: make xattr extension work with new local alloc reservation.
    ocfs2: Remove the redundant cpu_to_le64.
    ocfs2/dlm: don't access beyond bitmap size
    ocfs2: No need to zero pages past i_size.
    ocfs2: Zero the tail cluster when extending past i_size.
    ocfs2: When zero extending, do it by page.
    ocfs2: Limit default local alloc size within bitmap range.
    ocfs2: Move orphan scan work to ocfs2_wq.
    fs/ocfs2/dlm: Add missing spin_unlock

    Linus Torvalds
     

17 Jul, 2010

3 commits

  • ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc
    doesn't know that. Set ret=0 just to make gcc happy.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • Strip the cap and dentry releases from replayed messages. They can
    cause the shared state to get out of sync because they were generated
    (with the request message) earlier, and no longer reflect the current
    client state.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • Replayed rename operations (after an mds failure/recovery) were broken
    because the request paths were regenerated from the dentry names, which
    get mangled when d_move() is called.

    Instead, resend the previous request message when replaying completed
    operations. Just make sure the REPLAY flag is set and the target ino is
    filled in.

    This fixes problems with workloads doing renames when the MDS restarts,
    where the rename operation appears to succeed, but on mds restart then
    fails (leading to client confusion, app breakage, etc.).

    Signed-off-by: Sage Weil

    Sage Weil
     

16 Jul, 2010

3 commits

  • OCFS2 uses the t_commit trigger to compute and store checksums of the
    just-committed blocks. When a buffer has b_frozen_data, the checksum is
    computed for it instead of b_data, but this can result in an old checksum
    being written to the filesystem in the following scenario:

    1) transaction1 is opened
    2) handle1 is opened
    3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
    4) modify(bh)
    5) journal_dirty(handle1, bh)
    6) handle1 is closed
    7) start committing transaction1, opening transaction2
    8) handle2 is opened
    9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
    jh->b_next_transaction is set to transaction2.
    10) jbd2_journal_write_metadata() checksums b_frozen_data
    11) the journal correctly writes b_frozen_data to the disk journal
    12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
    any more journal operation
    13) Checkpointing finally happens, and it just spools the bh via normal buffer
    writeback. This will write b_data, which was never triggered on and thus
    contains a wrong (old) checksum.

    This patch fixes the problem by calling the trigger at the moment data is
    frozen for journal commit - i.e., either when b_frozen_data is created by
    do_get_write_access or just before we write a buffer to the log if
    b_frozen_data does not exist. We also rename the trigger to t_frozen as
    that better describes when it is called.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Jan Kara
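
    A minimal sketch of the renamed trigger hook; the function shape follows
    the description above and the details are assumptions:

        static void jbd2_buffer_frozen_trigger(struct journal_head *jh,
                        void *mapped_data,
                        struct jbd2_buffer_trigger_type *triggers)
        {
                struct buffer_head *bh = jh2bh(jh);

                if (!triggers || !triggers->t_frozen)
                        return;

                /* checksum the copy that will actually reach the journal,
                 * so the journalled data and its checksum stay in sync */
                triggers->t_frozen(triggers, bh, mapped_data, bh->b_size);
        }

    In this sketch, do_get_write_access() would call the hook right after
    copying b_data into b_frozen_data, and the commit path would trigger on
    b_data only when no frozen copy exists.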
     
  • For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set
    before sending DLM_MIG_LOCKRES_MSG message to the target. We are using
    dlm_migration_can_proceed() for that purpose. However, if the node is
    down, dlm_migration_can_proceed() will also return "go ahead". In this
    rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove
    the BUG_ON() that trips over this condition.

    Signed-off-by: Wengang Wang
    Signed-off-by: Joel Becker

    Wengang Wang
     
  • During CoW, the pages after i_size don't contain valid data, so there's
    no need to read and duplicate them.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma