Doug / smarc-fsl-linux-kernel | Embedian Git Server

19 Feb, 2011

1 commit

bc3adfc67 Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
workqueue: wake up a worker when a rescuer is leaving a gcwq
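
A minimal sketch of the "at least 2 jiffies" clamp named in the first commit title above. It assumes the initial timeout is meant to be about 10 ms, so that with HZ=100 it would round to a single jiffy, which can expire almost immediately if the current tick is nearly over. The helper name and the 10 ms figure are assumptions for illustration, not the kernel's definition.

```c
/* Hypothetical helper, not the kernel's definition: a 10 ms initial
 * timeout expressed in jiffies, clamped to a minimum of 2. */
static unsigned long mayday_initial_timeout(unsigned long hz)
{
	unsigned long t = hz / 100;	/* 10 ms worth of ticks at this HZ */

	return t >= 2 ? t : 2;
}
```

With hz = 1000 the helper returns the unclamped value; the clamp only takes effect at low tick rates.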

Linus Torvalds
2011-02-19 04:36:06 +0800

18 Feb, 2011

1 commit

fa7ea87a0 fs/partitions: Validate map_count in Mac partition tables

Validate number of blocks in map and remove redundant variable.

Signed-off-by: Timo Warns
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Timo Warns
2011-02-18 09:50:51 +0800

17 Feb, 2011

5 commits

ee7150870 Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
nfsd: correctly handle return value from nfsd_map_name_to_*

Linus Torvalds
2011-02-17 13:53:41 +0800

e51900f7d block: revert block_dev read-only check

This reverts commit 75f1dc0d076d ("block: check bdev_read_only() from
blkdev_get()"). That commit added stricter checking to make sure
devices that were being used read-only were actually opened in that
mode.

It turns out that the change breaks a bunch of kernel code that opens
block devices. Affected systems include dm, md, and the loop device.
Because strict checking for read-only opens of block devices was not
done before this, the code that opens the devices was opening them
read-write even if they were being used read-only. Auditing all that
code will take time, and new userspace packages for dm, mdadm, etc.
will also be required.

Signed-off-by: Chuck Ebbert
Signed-off-by: Linus Torvalds

Chuck Ebbert
2011-02-17 08:48:13 +0800

47c85291d nfsd: correctly handle return value from nfsd_map_name_to_*

These functions return an nfs status, not a host_err. So don't
try to convert before returning.

This is a regression introduced by
3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers,
but missed these two.

Cc: stable@kernel.org
Reported-by: Herbert Poetzl
Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields

NeilBrown
2011-02-17 07:31:05 +0800

3abb17e82 vfs: fix BUG_ON() in fs/namei.c:1461

When Al moved the nameidata_dentry_drop_rcu_maybe() call into the
do_follow_link function in commit 844a391799c2 ("nothing in
do_follow_link() is going to see RCU"), he mistakenly left the

BUG_ON(inode != path->dentry->d_inode);

behind. Which would otherwise be ok, but that BUG_ON() really needs to
be _after_ dropping RCU, since the dentry isn't necessarily stable
otherwise.

So complete the code movement in that commit, and move the BUG_ON() into
do_follow_link() too. This means that we need to pass in 'inode' as an
argument (just for this one use), but that's a small thing. And
eventually we may be confident enough in our path lookup that we can
just remove the BUG_ON() and the unnecessary inode argument.

Reported-and-tested-by: Eric Dumazet
Acked-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-02-17 00:56:55 +0800

58a69cb47 workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'

There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'. The former is the more prominent one. The latter is
mostly used by workqueue and in a few other odd places. Unify the
spelling to 'freezable'.

Signed-off-by: Tejun Heo
Reported-by: Alan Stern
Acked-by: "Rafael J. Wysocki"
Acked-by: Greg Kroah-Hartman
Acked-by: Dmitry Torokhov
Cc: David Woodhouse
Cc: Alex Dubov
Cc: "David S. Miller"
Cc: Steven Whitehouse

Tejun Heo
2011-02-17 00:48:59 +0800

16 Feb, 2011

3 commits

f60c153d5 Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
nfsd: break lease on unlink due to rename
nfsd4: acquire only one lease per file
nfsd4: modify fi_delegations under recall_lock
nfsd4: remove unused deleg dprintk's.
nfsd4: split lease setting into separate function
nfsd4: fix leak on allocation error
nfsd4: add helper function for lease setup
nfsd4: split up nfsd_break_deleg_cb
NFSD: memory corruption due to writing beyond the stat array
NFSD: use nfserr for status after decode_cb_op_status
nfsd: don't leak dentry count on mnt_want_write failure

Linus Torvalds
2011-02-16 04:06:38 +0800

055d21944 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu()
drop out of RCU in return_reval
split do_revalidate() into RCU and non-RCU cases
in do_lookup() split RCU and non-RCU cases of need_revalidate
nothing in do_follow_link() is going to see RCU

Linus Torvalds
2011-02-16 00:06:36 +0800

007a14af2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: check return value of alloc_extent_map()
Btrfs - Fix memory leak in btrfs_init_new_device()
btrfs: prevent heap corruption in btrfs_ioctl_space_info()
Btrfs: Fix balance panic
Btrfs: don't release pages when we can't clear the uptodate bits
Btrfs: fix page->private races

Linus Torvalds
2011-02-16 00:00:35 +0800

15 Feb, 2011

12 commits

261cd298a s390: remove task_show_regs

task_show_regs used to be a debugging aid in the early bringup days
of Linux on s390. /proc/<pid>/status is a world-readable file, so it
is not a good idea to show the registers of a process. The only
correct fix is to remove task_show_regs.

Reported-by: Al Viro
Signed-off-by: Martin Schwidefsky
Signed-off-by: Linus Torvalds

Martin Schwidefsky
2011-02-15 23:34:16 +0800

4e924a4f5 get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu()

can't happen anymore and didn't work right anyway

Signed-off-by: Al Viro

Al Viro
2011-02-15 15:26:54 +0800

f60aef7ec drop out of RCU in return_reval

... thus killing the need to handle drop-from-RCU in d_revalidate()

Signed-off-by: Al Viro

Al Viro
2011-02-15 15:26:54 +0800

f5e1c1c1a split do_revalidate() into RCU and non-RCU cases

fixing oopsen in lookup_one_len()

Signed-off-by: Al Viro

Al Viro
2011-02-15 15:26:54 +0800

24643087e in do_lookup() split RCU and non-RCU cases of need_revalidate

and use unlikely() instead of gotos, for fsck sake...

Signed-off-by: Al Viro

Al Viro
2011-02-15 15:26:54 +0800

844a39179 nothing in do_follow_link() is going to see RCU

Signed-off-by: Al Viro

Al Viro
2011-02-15 15:26:53 +0800

c26a92037 Btrfs: check return value of alloc_extent_map()

I add the check on the return value of alloc_extent_map() to several places.
In addition, alloc_extent_map() returns only the address or NULL.
Therefore, check by IS_ERR() is unnecessary. So, I remove IS_ERR() checking.
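
To illustrate why an IS_ERR() check cannot catch a NULL return, here is a userspace sketch. The is_err() helper imitates the kernel macro (true only for pointers in the top MAX_ERRNO addresses); alloc_stub() and use_map() are hypothetical stand-ins, not Btrfs code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitation of IS_ERR(): true only for "error pointers",
 * i.e. addresses in the last MAX_ERRNO bytes of the address space. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

struct extent_map_stub {
	int refs;
};

/* Hypothetical allocator that, like the one described above, returns
 * either a valid address or NULL. The flag simulates a failure. */
static struct extent_map_stub *alloc_stub(bool fail)
{
	if (fail)
		return NULL;
	return calloc(1, sizeof(struct extent_map_stub));
}

/* The caller must test for NULL; is_err(NULL) is false. */
static int use_map(bool fail)
{
	struct extent_map_stub *em = alloc_stub(fail);

	if (!em)
		return -ENOMEM;
	em->refs = 1;
	free(em);
	return 0;
}
```

Because NULL is address 0, it falls far outside the error-pointer range, so only an explicit NULL test detects the failed allocation.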

Signed-off-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Tsutomu Itoh
2011-02-15 05:21:37 +0800

67100f255 Btrfs - Fix memory leak in btrfs_init_new_device()

Memory allocated by calling kstrdup() should be freed.

Signed-off-by: Ilya Dryomov
Signed-off-by: Chris Mason

Ilya Dryomov
2011-02-15 05:21:31 +0800

51788b1bd btrfs: prevent heap corruption in btrfs_ioctl_space_info()

Commit bf5fc093c5b625e4259203f1cee7ca73488a5620 refactored
btrfs_ioctl_space_info() and introduced several security issues.

space_args.space_slots is an unsigned 64-bit type controlled by a
possibly unprivileged caller. The comparison as a signed int type
allows providing values that are treated as negative and cause the
subsequent allocation size calculation to wrap, or be truncated to 0.
By providing a size that's truncated to 0, kmalloc() will return
ZERO_SIZE_PTR. It's also possible to provide a value smaller than the
slot count. The subsequent loop ignores the allocation size when
copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR.

The fix changes the slot count type and comparison typecast to u64,
which prevents truncation or signedness errors, and also ensures that we
don't copy more data than we've allocated in the subsequent loop. Note
that zero-size allocations are no longer possible since there is already
an explicit check for space_args.space_slots being 0 and truncation of
this value is no longer an issue.
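
The signedness and truncation problems described above can be sketched in userspace as follows. All function names and the slot size are hypothetical, and the narrowing casts assume the usual two's-complement wraparound; this is not the btrfs_ioctl_space_info() code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOT_SIZE 24u	/* hypothetical size of one slot record */

/* Buggy pattern: the caller's 64-bit count is compared as a signed
 * 32-bit value, so large counts look negative and pass the check. */
static bool buggy_exceeds(uint64_t requested, int32_t available)
{
	return (int32_t)requested > available;
}

/* Fixed pattern: compare in the full unsigned 64-bit width. */
static bool fixed_exceeds(uint64_t requested, uint64_t available)
{
	return requested > available;
}

/* Buggy pattern: the byte count is truncated to 32 bits and can
 * become 0 for a huge slot count. */
static uint32_t truncated_alloc_size(uint64_t slots)
{
	return (uint32_t)(slots * SLOT_SIZE);
}

/* Fixed pattern: never process more slots than actually exist, so
 * the copy loop cannot run past the allocation. */
static uint64_t clamp_slots(uint64_t requested, uint64_t available)
{
	return requested < available ? requested : available;
}
```

A request of 0xFFFFFFFF slots slips past the buggy comparison because it reads as -1, while the 64-bit comparison rejects it.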

Signed-off-by: Dan Rosenberg
Signed-off-by: Josef Bacik
Reviewed-by: Josef Bacik
Signed-off-by: Chris Mason

Dan Rosenberg
2011-02-15 05:04:23 +0800

6848ad646 Btrfs: Fix balance panic

Mark the cloned backref_node as checked in clone_backref_node()

Signed-off-by: Yan, Zheng
Signed-off-by: Chris Mason

Yan, Zheng
2011-02-15 05:00:03 +0800

e3f24cc52 Btrfs: don't release pages when we can't clear the uptodate bits

Btrfs tracks uptodate state in an rbtree as well as in the
page bits. This is supposed to enable us to use block sizes other than
the page size, but there are a few parts still missing before that
completely works.

But, our readpage routine trusts this additional range based tracking
of uptodateness, much in the same way the buffer head up to date bits
are trusted for the other filesystems.

The problem is that sometimes we need to allocate memory in order to
split records in the rbtree, even when we are just clearing bits. This
can be difficult when our clearing function is called with GFP_ATOMIC, which
can happen in the releasepage path.

So, what happens today looks like this:

releasepage called with GFP_ATOMIC
btrfs_releasepage calls clear_extent_bit
clear_extent_bit fails to allocate ram, leaving the up to date bit set
btrfs_releasepage returns success

The end result is the page being gone, but btrfs thinking the range is
up to date. Later on if someone tries to read that same page, the
btrfs readpage code will return immediately thinking the page is already
up to date.

This commit fixes things to fail the releasepage when we can't clear the
extent state bits. It covers both data pages and metadata tree blocks.
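
The before and after behaviour can be modelled in a few lines. This is a simplified sketch with hypothetical names (range_state, clear_uptodate, release_page_old, release_page_fixed); the alloc_ok flag stands in for an atomic allocation that may fail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the range-based uptodate tracking. */
struct range_state {
	bool uptodate;
};

/* Clearing may need memory; alloc_ok simulates whether the atomic
 * allocation succeeds. On failure the bit is left set. */
static int clear_uptodate(struct range_state *rs, bool alloc_ok)
{
	if (!alloc_ok)
		return -1;
	rs->uptodate = false;
	return 0;
}

/* Old behaviour: ignore the failure and report the page releasable.
 * Returns 1 when the page may be released. */
static int release_page_old(struct range_state *rs, bool alloc_ok)
{
	clear_uptodate(rs, alloc_ok);
	return 1;
}

/* Fixed behaviour: refuse to release the page when the state bits
 * could not be cleared. Returns 0 in that case. */
static int release_page_fixed(struct range_state *rs, bool alloc_ok)
{
	if (clear_uptodate(rs, alloc_ok) < 0)
		return 0;
	return 1;
}
```

In the old variant a failed clear leaves the range marked up to date while the page is released, which is the stale state described above.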

Signed-off-by: Chris Mason

Chris Mason
2011-02-15 02:04:01 +0800

eb14ab8ed Btrfs: fix page->private races

There is a race where btrfs_releasepage can drop the
page->private contents just as alloc_extent_buffer is setting
up pages for metadata. Because of how the Btrfs page flags work,
this results in us skipping the crc on the page during IO.

This patch solves the race by waiting until after the extent buffer
is inserted into the radix tree before it sets page private.

Signed-off-by: Chris Mason

Chris Mason
2011-02-15 02:03:52 +0800

14 Feb, 2011

11 commits

83f6b0c18 nfsd: break lease on unlink due to rename

4795bb37effb7b8fe77e2d2034545d062d3788a8 "nfsd: break lease on unlink,
link, and rename", only broke the lease on the file that was being
renamed, and didn't handle the case where the target path refers to an
already-existing file that will be unlinked by a rename--in that case
the target file should have any leases broken as well.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:35:19 +0800

acfdf5c38 nfsd4: acquire only one lease per file

Instead of acquiring one lease each time another client opens a file,
nfsd can acquire just one lease to represent all of them, and reference
count it to determine when to release it.

This fixes a regression introduced by
c45821d263a8a5109d69a9e8942b8d65bcd5f31a "locks: eliminate fl_mylease
callback": after that patch, only the struct file * is used to determine
who owns a given lease. But since we recently converted the server to
share a single struct file per open, if we acquire multiple leases on
the same file from nfsd, it then becomes impossible on unlocking a lease
to determine which of those leases (all of whom share the same struct
file *) we meant to remove.
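
A rough sketch of the "one lease, reference counted" idea, using hypothetical names (file_stub, lease_get, lease_put) rather than the nfsd data structures. The lease is taken on the first reference and dropped on the last, so there is never more than one lease per file to disambiguate.

```c
/* Hypothetical per-file state: one lease shared by all holders. */
struct file_stub {
	int lease_held;		/* 1 while the single lease exists */
	int refs;		/* number of holders sharing it */
	int acquire_calls;	/* how often a lease was really taken */
	int release_calls;	/* how often it was really dropped */
};

/* Take a reference; acquire the lease only for the first holder. */
static void lease_get(struct file_stub *f)
{
	if (f->refs++ == 0) {
		f->lease_held = 1;
		f->acquire_calls++;
	}
}

/* Drop a reference; release the lease only when the last holder
 * goes away. */
static void lease_put(struct file_stub *f)
{
	if (--f->refs == 0) {
		f->lease_held = 0;
		f->release_calls++;
	}
}
```

This sketch omits locking; a real implementation would have to protect the counter against concurrent updates.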

Thanks to Takashi Iwai for catching a bug in a previous
version of this patch.

Tested-by: Takashi Iwai
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:35:19 +0800

5d926e8c2 nfsd4: modify fi_delegations under recall_lock

Modify fi_delegations only under the recall_lock, allowing us to use
that list on lease breaks.

Also some trivial cleanup to simplify later changes.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:35:19 +0800

65bc58f51 nfsd4: remove unused deleg dprintk's.

These aren't all that useful, and get in the way of the next steps.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:35:19 +0800

edab9782b nfsd4: split lease setting into separate function

Splitting some code into a separate function which we'll be adding some
more to.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:35:18 +0800

dd239cc05 nfsd4: fix leak on allocation error

Also share some common exit code.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:35:18 +0800

22d38c4c1 nfsd4: add helper function for lease setup

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:35:18 +0800

6b57d9c86 nfsd4: split up nfsd_break_deleg_cb

We'll be adding some more code here soon.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:35:18 +0800

3aa6e0aa8 NFSD: memory corruption due to writing beyond the stat array

If nfsd fails to find a file exported via NFS in the readahead cache, it
should increment the corresponding nfsdstats counter (ra_depth[10]), but due to a
bug it may instead write to ra_depth[11], corrupting the following field.

In a kernel with NFSDv4 compiled in the corruption takes the form of an
increment of a counter of the number of NFSv4 operation 0's received; since
there is no operation 0, this is harmless.

In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the
memory beyond nfsdstats.
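
The layout issue can be pictured with a small model: an 11-entry counter array (valid indices 0 to 10) followed directly by another field, which is what an index of 11 would hit. The struct and the bounds-checked stat_inc() helper are hypothetical and only illustrate the off-by-one; they are not the nfsd fix.

```c
#include <stddef.h>

/* Hypothetical stats layout: 11 buckets, then a neighbouring field
 * that an out-of-range index would corrupt. */
struct stats_stub {
	unsigned int ra_depth[11];
	unsigned int next_field;
};

/* Increment a bucket only if the index is inside the array.
 * Returns 0 on success, -1 if idx is out of range. */
static int stat_inc(struct stats_stub *s, size_t idx)
{
	size_t n = sizeof(s->ra_depth) / sizeof(s->ra_depth[0]);

	if (idx >= n)
		return -1;
	s->ra_depth[idx]++;
	return 0;
}
```

Index 10 is the last valid bucket; index 11 is rejected instead of silently incrementing whatever follows the array.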

Signed-off-by: Konstantin Khorenko
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields

Konstantin Khorenko
2011-02-14 23:35:18 +0800

0af3f814c NFSD: use nfserr for status after decode_cb_op_status

Bugs introduced in 85a56480191ca9f08fc775c129b9eb5c8c1f2c05
"NFSD: Update XDR decoders in NFSv4 callback client"

Cc: Chuck Lever
Signed-off-by: Benny Halevy
Signed-off-by: J. Bruce Fields

Benny Halevy
2011-02-14 23:35:18 +0800

541ce98c1 nfsd: don't leak dentry count on mnt_want_write failure

The exit cleanup isn't quite right here.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-02-14 23:31:08 +0800

13 Feb, 2011

1 commit

c8e0b00ed Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: call __jbd2_log_start_commit with j_state_lock write locked
ext4: serialize unaligned asynchronous DIO
ext4: make grpinfo slab cache names static
ext4: Fix data corruption with multi-block writepages support
ext4: fix up ext4 error handling
ext4: unregister features interface on module unload
ext4: fix panic on module unload when stopping lazyinit thread

Linus Torvalds
2011-02-13 01:10:24 +0800

12 Feb, 2011

6 commits

e44718318 jbd2: call __jbd2_log_start_commit with j_state_lock write locked

On an SMP ARM system running ext4, I've received a report that the
first J_ASSERT in jbd2_journal_commit_transaction has been triggering:

J_ASSERT(journal->j_running_transaction != NULL);

While investigating possible causes for this problem, I noticed that
__jbd2_log_start_commit() is getting called with j_state_lock only
read-locked, in spite of the fact that it's possible for it to modify
j_commit_request. Fix this by grabbing the necessary information so
we can test to see if we need to start a new transaction before
dropping the read lock, and then calling jbd2_log_start_commit() which
will grab the write lock.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2011-02-12 21:18:24 +0800

e9e3bcecf ext4: serialize unaligned asynchronous DIO

ext4 has a data corruption case when doing non-block-aligned
asynchronous direct IO into a sparse file, as demonstrated
by xfstest 240.

The root cause is that while ext4 preallocates space in the
hole, mappings of that space still look "new" and
dio_zero_block() will zero out the unwritten portions. When
more than one AIO thread is going, they both find this "new"
block and race to zero out their portion; this is uncoordinated
and causes data corruption.

Dave Chinner fixed this for xfs by simply serializing all
unaligned asynchronous direct IO. I've done the same here.
The difference is that we only wait on conversions, not all IO.
This is a very big hammer, and I'm not very pleased with
stuffing this into ext4_file_write(). But since ext4 is
DIO_LOCKING, we need to serialize it at this high level.

I tried to move this into ext4_ext_direct_IO, but by then
we have the i_mutex already, and we will wait on the
work queue to do conversions - which must also take the
i_mutex. So that won't work.

This was originally exposed by qemu-kvm installing to
a raw disk image with a normal sector-63 alignment. I've
tested a backport of this patch with qemu, and it does
avoid the corruption. It is also quite a lot slower
(14 min for package installs, vs. 8 min for well-aligned)
but I'll take slow correctness over fast corruption any day.

Mingming suggested that we can track outstanding
conversions, and wait on those so that non-sparse
files won't be affected, and I've implemented that here;
unaligned AIO to nonsparse files won't take a perf hit.

[tytso@mit.edu: Keep the mutex as a hashed array instead
of bloating the ext4 inode]

[tytso@mit.edu: Fix up namespace issues so that global
variables are protected with an "ext4_" prefix.]
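
Two ingredients of the approach above can be sketched in isolation: deciding whether a request is block-aligned, and picking a lock from a small hashed array rather than storing one per inode. The names, the array size, and the hash are assumptions for illustration and do not reflect the actual ext4 code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AIO_MUTEX_COUNT 8u	/* hypothetical size of the lock array */

/* A request is unaligned if its start or its length is not a
 * multiple of the block size (blocksize must be a power of two). */
static bool is_unaligned(uint64_t offset, size_t len, unsigned int blocksize)
{
	uint64_t mask = (uint64_t)blocksize - 1;

	return (offset & mask) != 0 || ((uint64_t)len & mask) != 0;
}

/* Map an inode address to a slot in the shared lock array. Several
 * inodes may share a slot, trading some contention for memory. */
static unsigned int mutex_slot(const void *inode)
{
	return (unsigned int)(((uintptr_t)inode >> 4) % AIO_MUTEX_COUNT);
}
```

A write starting at sector 63 (byte offset 63 * 512) on a 4096-byte block size counts as unaligned, matching the qemu-kvm case mentioned above.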

Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"

Eric Sandeen
2011-02-12 21:17:34 +0800

2892c15dd ext4: make grpinfo slab cache names static

In 2.6.37 I was running into oopses with repeated module
loads & unloads. I tracked this down to:

fb1813f4 ext4: use dedicated slab caches for group_info structures

(this was in addition to the features advert unload problem)

The kstrdup & subsequent kfree of the cache name was causing
a double free. In slub, at least, if I read it right it allocates
& frees the name itself, slab seems to do something different...
so in slub I think we were leaking -our- cachep->name, and double
freeing the one allocated by slub.

After getting lost in slab/slub/slob a bit, I just looked at other
sized-caches that get allocated. jbd2, biovec, sgpool all do it
more or less the way jbd2 does. Below patch follows the jbd2
method of dynamically allocating a cache at mount time from
a list of static names.
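
The pattern of creating caches on demand from a table of static names might look roughly like this. The cache names, the table size, and the cache_stub type are hypothetical; the point is only that the name strings are never allocated or freed, so a double free of the name cannot occur.

```c
#include <stddef.h>

#define NR_GROUPINFO_CACHES 4	/* hypothetical number of size classes */

/* Static name table: these strings live for the program's lifetime. */
static const char *const cache_names[NR_GROUPINFO_CACHES] = {
	"groupinfo_1k", "groupinfo_2k", "groupinfo_4k", "groupinfo_8k",
};

/* Hypothetical stand-in for a slab cache. */
struct cache_stub {
	const char *name;
	int created;
};

static struct cache_stub caches[NR_GROUPINFO_CACHES];

/* Return the cache for a size class, creating it on first use.
 * Returns NULL for an out-of-range index. */
static struct cache_stub *get_cache(size_t idx)
{
	if (idx >= NR_GROUPINFO_CACHES)
		return NULL;
	if (!caches[idx].created) {
		caches[idx].name = cache_names[idx];
		caches[idx].created = 1;
	}
	return &caches[idx];
}
```

This sketch is not thread-safe; concurrent first use would need a lock around the creation step.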

(This might also possibly fix a race creating the caches with
parallel mounts running).

[Folded in a fix from Dan Carpenter which fixed an off-by-one error in
the original patch]

Cc: stable@kernel.org
Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"

Eric Sandeen
2011-02-12 21:12:18 +0800

d40b0c348 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
dlm: use single thread workqueues

Linus Torvalds
2011-02-12 08:29:57 +0800

3aec46c1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: don't always drop malformed replies on the floor (try #3)
cifs: clean up checks in cifs_echo_request
[CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate

Linus Torvalds
2011-02-12 08:29:50 +0800

d863b50ab vfs: call rcu_barrier after ->kill_sb()

In commit fa0d7e3de6d6 ("fs: icache RCU free inodes"), we use rcu free
inode instead of freeing the inode directly. It causes a crash when we
rmmod immediately after we umount the volume[1].

So we need to call rcu_barrier after we kill_sb so that the inode is
freed before we do rmmod. The idea is inspired by Aneesh Kumar.
rcu_barrier will wait for all callbacks to end before proceeding. The
original patch was done by Tao Ma, but synchronize_rcu() is not enough
here.

1. http://marc.info/?l=linux-fsdevel&m=129680863330185&w=2

Tested-by: Tao Ma
Signed-off-by: Boaz Harrosh
Cc: Nick Piggin
Cc: Al Viro
Cc: Chris Mason
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Boaz Harrosh
2011-02-12 08:12:19 +0800