Eric Lee / smarc-fsl-linux-kernel

01 Mar, 2012

5 commits

8ed3fe820 NFSv4: fix server_scope memory leak ... Browse Code »

commit abe9a6d57b4544ac208401f9c0a4262814db2be4 upstream.

server_scope would never be freed if nfs4_check_cl_exchange_flags() returned
non-zero

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Weston Andros Adamson
2012-03-01 08:30:56 +0800
0cea513e3 NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEID ... Browse Code »

commit b9f9a03150969e4bd9967c20bce67c4de769058f upstream.

To ensure that we don't just reuse the bad delegation when we attempt to
recover the nfs4_state that received the bad stateid error.

Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2012-03-01 08:30:55 +0800
4a818b428 NFSv4: Fix an Oops in the NFSv4 getacl code ... Browse Code »

commit 331818f1c468a24e581aedcbe52af799366a9dfe upstream.

Commit bf118a342f10dafe44b14451a1392c3254629a1f (NFSv4: include bitmap
in nfsv4 get acl data) introduces the 'acl_scratch' page for the case
where we may need to decode multi-page data. However it fails to take
into account the fact that the variable may be NULL (for the case where
we're not doing multi-page decode), and it also attaches it to the
encoding xdr_stream rather than the decoding one.

The immediate result is an Oops in nfs4_xdr_enc_getacl due to the
call to page_address() with a NULL page pointer.

Signed-off-by: Trond Myklebust
Cc: Andy Adamson
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2012-03-01 08:30:55 +0800
3c40e5e08 vfs: fix d_inode_lookup() dentry ref leak ... Browse Code »

commit e188dc02d3a9c911be56eca5aa114fe7e9822d53 upstream.

d_inode_lookup() leaks a dentry reference on IS_DEADDIR().

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman

Miklos Szeredi
2012-03-01 08:30:53 +0800
6b6cd603f eCryptfs: Copy up lower inode attrs after setting lower xattr ... Browse Code »

commit 545d680938be1e86a6c5250701ce9abaf360c495 upstream.

After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.

One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.

https://launchpad.net/bugs/926292

Signed-off-by: Tyler Hicks
Reported-by: Sebastien Bacher
Cc: John Johansen
Signed-off-by: Greg Kroah-Hartman

Tyler Hicks
2012-03-01 08:30:52 +0800

21 Feb, 2012

3 commits

23cfecf97 cifs: don't return error from standard_receive3 after marking response malformed ... Browse Code »

commit ff4fa4a25a33f92b5653bb43add0c63bea98d464 upstream.

standard_receive3 will check the validity of the response from the
server (via checkSMB). It'll pass the result of that check to handle_mid
which will dequeue it and mark it with a status of
MID_RESPONSE_MALFORMED if checkSMB returned an error. At that point,
standard_receive3 will also return an error, which will make the
demultiplex thread skip doing the callback for the mid.

This is wrong -- if we were able to identify the request and the
response is marked malformed, then we want the demultiplex thread to do
the callback. Fix this by making standard_receive3 return 0 in this
situation.

Reported-and-Tested-by: Mark Moseley
Signed-off-by: Jeff Layton
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2012-02-21 04:46:18 +0800
77d04b76d cifs: request oplock when doing open on lookup ... Browse Code »

commit 8b0192a5f478da1c1ae906bf3ffff53f26204f56 upstream.

Currently, it's always set to 0 (no oplock requested).

Signed-off-by: Jeff Layton
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2012-02-21 04:46:17 +0800
aec14f459 writeback: fix NULL bdi->dev in trace writeback_single_inode ... Browse Code »

commit 15eb77a07c714ac80201abd0a9568888bcee6276 upstream.

bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when the
tearing down the original bdi. Fix trace_writeback_single_inode to
use sb->s_bdi=default_backing_dev_info rather than bdi->dev=NULL for a
teared down bdi.

Reported-by: Rabin Vincent
Tested-by: Rabin Vincent
Signed-off-by: Wu Fengguang
Signed-off-by: Greg Kroah-Hartman

Wu Fengguang
2012-02-21 04:46:16 +0800

14 Feb, 2012

6 commits

85f2f3e05 cifs: Fix oops in session setup code for null user mounts ... Browse Code »

commit de47a4176c532ef5961b8a46a2d541a3517412d3 upstream.

For null user mounts, do not invoke string length function
during session setup.

Reported-and-Tested-by: Chris Clayton
Acked-by: Jeff Layton
Signed-off-by: Shirish Pargaonkar
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman

Shirish Pargaonkar
2012-02-14 03:16:58 +0800
3d1b7976d eCryptfs: Infinite loop due to overflow in ecryptfs_write() ... Browse Code »

commit 684a3ff7e69acc7c678d1a1394fe9e757993fd34 upstream.

ecryptfs_write() can enter an infinite loop when truncating a file to a
size larger than 4G. This only happens on architectures where size_t is
represented by 32 bits.

This was caused by a size_t overflow due to it incorrectly being used to
store the result of a calculation which uses potentially large values of
type loff_t.

[tyhicks@canonical.com: rewrite subject and commit message]
Signed-off-by: Li Wang
Signed-off-by: Yunchuan Wen
Reviewed-by: Cong Wang
Signed-off-by: Tyler Hicks
Signed-off-by: Greg Kroah-Hartman

Li Wang
2012-02-14 03:16:58 +0800
43f4a516b udf: Mark LVID buffer as uptodate before marking it dirty ... Browse Code »

commit 853a0c25baf96b028de1654bea1e0c8857eadf3d upstream.

When we hit EIO while writing LVID, the buffer uptodate bit is cleared.
This then results in an anoying warning from mark_buffer_dirty() when we
write the buffer again. So just set uptodate flag unconditionally.

Reviewed-by: Namjae Jeon
Signed-off-by: Jan Kara
Cc: Dave Jones
Signed-off-by: Greg Kroah-Hartman

Jan Kara
2012-02-14 03:16:56 +0800
43904e95b proc: make sure mem_open() doesn't pin the target's memory ... Browse Code »

commit 6d08f2c7139790c268820a2e590795cb8333181a upstream.

Once /proc/pid/mem is opened, the memory can't be released until
mem_release() even if its owner exits.

Change mem_open() to do atomic_inc(mm_count) + mmput(), this only
pins mm_struct. Change mem_rw() to do atomic_inc_not_zero(mm_count)
before access_remote_vm(), this verifies that this mm is still alive.

I am not sure what should mem_rw() return if atomic_inc_not_zero()
fails. With this patch it returns zero to match the "mm == NULL" case,
may be it should return -EINVAL like it did before e268337d.

Perhaps it makes sense to add the additional fatal_signal_pending()
check into the main loop, to ensure we do not hold this memory if
the target task was oom-killed.

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Oleg Nesterov
2012-02-14 03:16:53 +0800
034089b6f proc: unify mem_read() and mem_write() ... Browse Code »

commit 572d34b946bae070debd42db1143034d9687e13f upstream.

No functional changes, cleanup and preparation.

mem_read() and mem_write() are very similar. Move this code into the
new common helper, mem_rw(), which takes the additional "int write"
argument.

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Oleg Nesterov
2012-02-14 03:16:53 +0800
3a196fbe2 proc: mem_release() should check mm != NULL ... Browse Code »

commit 71879d3cb3dd8f2dfdefb252775c1b3ea04a3dd4 upstream.

mem_release() can hit mm == NULL, add the necessary check.

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Oleg Nesterov
2012-02-14 03:16:53 +0800

04 Feb, 2012

8 commits

bfd534c72 sysfs: Complain bitterly about attempts to remove files from nonexistent directories. ... Browse Code »

commit ce597919361dcec97341151690e780eade2a9cf4 upstream.

Recently an OOPS was observed from the usb serial io_ti driver when it tried to remove
sysfs directories. Upon investigation it turns out this driver was always buggy
and that a recent sysfs change had stopped guarding itself against removing attributes
from sysfs directories that had already been removed. :(

Historically we have been silent about attempting to files from nonexistent sysfs
directories and have politely returned error codes. That has resulted in people writing
broken code that ignores the error codes.

Issue a kernel WARNING and a stack backtrace to make it clear in no uncertain
terms that abusing sysfs is not ok, and the callers need to fix their code.

This change transforms the io_ti OOPS into a more comprehensible error message
and stack backtrace.

Signed-off-by: Eric W. Biederman
Reported-by: Wolfgang Frisch
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2012-02-04 01:21:47 +0800
fde1c2621 jbd: Issue cache flush after checkpointing ... Browse Code »

commit 353b67d8ced4dc53281c88150ad295e24bc4b4c5 upstream.

When we reach cleanup_journal_tail(), there is no guarantee that
checkpointed buffers are on a stable storage - especially if buffers were
written out by log_do_checkpoint(), they are likely to be only in disk's
caches. Thus when we update journal superblock, effectively removing old
transaction from journal, this write of superblock can get to stable storage
before those checkpointed buffers which can result in filesystem corruption
after a crash.

A similar problem can happen if we replay the journal and wipe it before
flushing disk's caches.

Thus we must unconditionally issue a cache flush before we update journal
superblock in these cases. The fix is slightly complicated by the fact that we
have to get log tail before we issue cache flush but we can store it in the
journal superblock only after the cache flush. Otherwise we risk races where
new tail is written before appropriate cache flush is finished.

I managed to reproduce the corruption using somewhat tweaked Chris Mason's
barrier-test scheduler. Also this should fix occasional reports of 'Bit already
freed' filesystem errors which are totally unreproducible but inspection of
several fs images I've gathered over time points to a problem like this.

Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman

Jan Kara
2012-02-04 01:21:32 +0800
358d01392 xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink() ... Browse Code »

commit 9b025eb3a89e041bab6698e3858706be2385d692 upstream.

Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted
symplink and bailed out. Fix it by jumping to 'out' instead of doing return.

CC: Carlos Maiolino
Signed-off-by: Jan Kara
Reviewed-by: Alex Elder
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers
Signed-off-by: Greg Kroah-Hartman

Jan Kara
2012-02-04 01:21:27 +0800
1924fe587 eCryptfs: Fix oops when printing debug info in extent crypto functions ... Browse Code »

commit 58ded24f0fcb85bddb665baba75892f6ad0f4b8a upstream.

If pages passed to the eCryptfs extent-based crypto functions are not
mapped and the module parameter ecryptfs_verbosity=1 was specified at
loading time, a NULL pointer dereference will occur.

Note that this wouldn't happen on a production system, as you wouldn't
pass ecryptfs_verbosity=1 on a production system. It leaks private
information to the system logs and is for debugging only.

The debugging info printed in these messages is no longer very useful
and rather than doing a kmap() in these debugging paths, it will be
better to simply remove the debugging paths completely.

https://launchpad.net/bugs/913651

Signed-off-by: Tyler Hicks
Signed-off-by: Greg Kroah-Hartman

Tyler Hicks
2012-02-04 01:21:24 +0800
963f50802 eCryptfs: Check inode changes in setattr ... Browse Code »

commit a261a03904849c3df50bd0300efb7fb3f865137d upstream.

Most filesystems call inode_change_ok() very early in ->setattr(), but
eCryptfs didn't call it at all. It allowed the lower filesystem to make
the call in its ->setattr() function. Then, eCryptfs would copy the
appropriate inode attributes from the lower inode to the eCryptfs inode.

This patch changes that and actually calls inode_change_ok() on the
eCryptfs inode, fairly early in ecryptfs_setattr(). Ideally, the call
would happen earlier in ecryptfs_setattr(), but there are some possible
inode initialization steps that must happen first.

Since the call was already being made on the lower inode, the change in
functionality should be minimal, except for the case of a file extending
truncate call. In that case, inode_newsize_ok() was never being
called on the eCryptfs inode. Rather than inode_newsize_ok() catching
maximum file size errors early on, eCryptfs would encrypt zeroed pages
and write them to the lower filesystem until the lower filesystem's
write path caught the error in generic_write_checks(). This patch
introduces a new function, called ecryptfs_inode_newsize_ok(), which
checks if the new lower file size is within the appropriate limits when
the truncate operation will be growing the lower file.

In summary this change prevents eCryptfs truncate operations (and the
resulting page encryptions), which would exceed the lower filesystem
limits or FSIZE rlimits, from ever starting.

Signed-off-by: Tyler Hicks
Reviewed-by: Li Wang
Signed-off-by: Greg Kroah-Hartman

Tyler Hicks
2012-02-04 01:21:24 +0800
ccc10d459 eCryptfs: Make truncate path killable ... Browse Code »

commit 5e6f0d769017cc49207ef56996e42363ec26c1f0 upstream.

ecryptfs_write() handles the truncation of eCryptfs inodes. It grabs a
page, zeroes out the appropriate portions, and then encrypts the page
before writing it to the lower filesystem. It was unkillable and due to
the lack of sparse file support could result in tying up a large portion
of system resources, while encrypting pages of zeros, with no way for
the truncate operation to be stopped from userspace.

This patch adds the ability for ecryptfs_write() to detect a pending
fatal signal and return as gracefully as possible. The intent is to
leave the lower file in a useable state, while still allowing a user to
break out of the encryption loop. If a pending fatal signal is detected,
the eCryptfs inode size is updated to reflect the modified inode size
and then -EINTR is returned.

Signed-off-by: Tyler Hicks
Signed-off-by: Greg Kroah-Hartman

Tyler Hicks
2012-02-04 01:21:23 +0800
75d26d309 ecryptfs: Improve metadata read failure logging ... Browse Code »

commit 30373dc0c87ffef68d5628e77d56ffb1fa22e1ee upstream.

Print inode on metadata read failure. The only real
way of dealing with metadata read failures is to delete
the underlying file system file. Having the inode
allows one to 'find . -inum INODE`.

[tyhicks@canonical.com: Removed some minor not-for-stable parts]
Signed-off-by: Tim Gardner
Reviewed-by: Kees Cook
Signed-off-by: Tyler Hicks
Signed-off-by: Greg Kroah-Hartman

Tim Gardner
2012-02-04 01:21:23 +0800
2f46e90c6 eCryptfs: Sanitize write counts of /dev/ecryptfs ... Browse Code »

commit db10e556518eb9d21ee92ff944530d84349684f4 upstream.

A malicious count value specified when writing to /dev/ecryptfs may
result in a a very large kernel memory allocation.

This patch peeks at the specified packet payload size, adds that to the
size of the packet headers and compares the result with the write count
value. The resulting maximum memory allocation size is approximately 532
bytes.

Signed-off-by: Tyler Hicks
Reported-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Tyler Hicks
2012-02-04 01:21:22 +0800

26 Jan, 2012

18 commits

bd1dc8b10 proc: clear_refs: do not clear reserved pages ... Browse Code »

commit 85e72aa5384b1a614563ad63257ded0e91d1a620 upstream.

/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
pages and corresponding page table entries of the task with PID pid, which
includes any special mappings inserted into the page tables in order to
provide things like vDSOs and user helper functions.

On ARM this causes a problem because the vectors page is mapped as a
global mapping and since ec706dab ("ARM: add a vma entry for the user
accessible vector page"), a VMA is also inserted into each task for this
page to aid unwinding through signals and syscall restarts. Since the
vectors page is required for handling faults, clearing the YOUNG bit (and
subsequently writing a faulting pte) means that we lose the vectors page
*globally* and cannot fault it back in. This results in a system deadlock
on the next exception.

To see this problem in action, just run:

$ echo 1 > /proc/self/clear_refs

on an ARM platform (as any user) and watch your system hang. I think this
has been the case since 2.6.37

This patch avoids clearing the aforementioned bits for reserved pages,
therefore leaving the vectors page intact on ARM. Since reserved pages
are not candidates for swap, this change should not have any impact on the
usefulness of clear_refs.

Signed-off-by: Will Deacon
Reported-by: Moussa Ba
Acked-by: Hugh Dickins
Cc: David Rientjes
Cc: Russell King
Acked-by: Nicolas Pitre
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Will Deacon
2012-01-26 08:13:58 +0800
bdac3a105 cifs: lower default wsize when unix extensions are not used ... Browse Code »

commit ce91acb3acae26f4163c5a6f1f695d1a1e8d9009 upstream.

We've had some reports of servers (namely, the Solaris in-kernel CIFS
server) that don't deal properly with writes that are "too large" even
though they set CAP_LARGE_WRITE_ANDX. Change the default to better
mirror what windows clients do.

Cc: Pavel Shilovsky
Reported-by: Nick Davis
Signed-off-by: Jeff Layton
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2012-01-26 08:13:57 +0800
afa2f5f83 xfs: fix endian conversion issue in discard code ... Browse Code »

commit b1c770c273a4787069306fc82aab245e9ac72e9d upstream

When finding the longest extent in an AG, we read the value directly
out of the AGF buffer without endian conversion. This will give an
incorrect length, resulting in FITRIM operations potentially not
trimming everything that it should.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Ben Myers
Signed-off-by: Greg Kroah-Hartman

Dave Chinner
2012-01-26 08:13:55 +0800
4d96a0c10 proc: clean up and fix /proc/<pid>/mem handling ... Browse Code »

commit e268337dfe26dfc7efd422a804dbb27977a3cccc upstream.

Jüri Aedla reported that the /proc//mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.

This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open. That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.

That is different from our previous behavior, but much simpler. If
somebody actually finds a load where this matters, we'll need to revert
this commit.

I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve. So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.

Reported-by: Jüri Aedla
Cc: Al Viro
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Linus Torvalds
2012-01-26 08:13:43 +0800
ca4a55730 fix cputime overflow in uptime_proc_show ... Browse Code »

commit c3e0ef9a298e028a82ada28101ccd5cf64d209ee upstream.

For 32-bit architectures using standard jiffies the idletime calculation
in uptime_proc_show will quickly overflow. It takes (2^32 / HZ) seconds
of idle-time, or e.g. 12.45 days with no load on a quad-core with HZ=1000.
Switch to 64-bit calculations.

Cc: Michael Abbott
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman

Martin Schwidefsky
2012-01-26 08:13:41 +0800
b4e8ff299 pnfsblock: limit bio page count ... Browse Code »

commit 74a6eeb44ca6174d9cc93b9b8b4d58211c57bc80 upstream.

One bio can have at most BIO_MAX_PAGES pages. We should limit it bec otherwise
bio_alloc will fail when there are many pages in one read/write_pagelist.

Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Peng Tao
2012-01-26 08:13:37 +0800
7e0f9f47b pnfsblock: don't spinlock when freeing block_dev ... Browse Code »

commit 93a3844ee0f843b05a1df4b52e1a19ff26b98d24 upstream.

bl_free_block_dev() may sleep. We can not call it with spinlock held.
Besides, there is no need to take bm_lock as we are last user freeing bm_devlist.

Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Peng Tao
2012-01-26 08:13:36 +0800
ae6481331 pnfsblock: acquire im_lock in _preload_range ... Browse Code »

commit 39e567ae36fe03c2b446e1b83ee3d39bea08f90b upstream.

When calling _add_entry, we should take the im_lock to protect
agains other modifiers.

Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Peng Tao
2012-01-26 08:13:36 +0800
1f489b2d8 fix shrink_dcache_parent() livelock ... Browse Code »

commit eaf5f9073533cde21c7121c136f1c3f072d9cf59 upstream.

Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may
cause shrink_dcache_parent() to loop forever.

Here's what appears to happen:

1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1

2 - CPU1: select_parent(P) locks P->d_lock

3 - CPU0: shrink_dentry_list() locks C->d_lock
dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock

4 - CPU1: select_parent(P) locks C->d_lock,
moves C from dispose list being processed on CPU0 to the new
dispose list, returns 1

5 - CPU0: shrink_dentry_list() finds dispose list empty, returns

6 - Goto 2 with CPU0 and CPU1 switched

Basically select_parent() steals the dentry from shrink_dentry_list() and thinks
it found a new one, causing shrink_dentry_list() to think it's making progress
and loop over and over.

One way to trigger this is to make udev calls stat() on the sysfs file while it
is going away.

Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick:

ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true"

Then execute the following loop:

while true; do
echo -bond0 > /sys/class/net/bonding_masters
echo +bond0 > /sys/class/net/bonding_masters
echo -bond1 > /sys/class/net/bonding_masters
echo +bond1 > /sys/class/net/bonding_masters
done

One fix would be to check all callers and prevent concurrent calls to
shrink_dcache_parent(). But I think a better solution is to stop the
stealing behavior.

This patch adds a new dentry flag that is set when the dentry is added to the
dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a
new reference just before being pruned.

If the dentry has this flag, select_parent() will skip it and let
shrink_dentry_list() retry pruning it. With select_parent() skipping those
dentries there will not be the appearance of progress (new dentries found) when
there is none, hence shrink_dcache_parent() will not loop forever.

Set the flag is also set in prune_dcache_sb() for consistency as suggested by
Linus.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman

Miklos Szeredi
2012-01-26 08:13:35 +0800
9ba5dc56a dcache: use a dispose list in select_parent ... Browse Code »

commit b48f03b319ba78f3abf9a7044d1f436d8d90f4f9 upstream.

select_parent currently abuses the dentry cache LRU to provide
cleanup features for child dentries that need to be freed. It moves
them to the tail of the LRU, then tells shrink_dcache_parent() to
calls __shrink_dcache_sb to unconditionally move them to a dispose
list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
relock the dentries to move them off the LRU onto the dispose list,
but otherwise does not touch the dentries that select_parent() moved
to the tail of the LRU. It then passses the dispose list to
shrink_dentry_list() which tries to free the dentries.

IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
exactly the same list of dentries for disposal directly in
select_parent() and call shrink_dentry_list() instead of calling
__shrink_dcache_sb() to do that. This means that we avoid long holds
on the lru lock walking the LRU moving dentries to the dispose list
We also avoid the need to relock each dentry just to move it off the
LRU, reducing the numebr of times we lock each dentry to dispose of
them in shrink_dcache_parent() from 3 to 2 times.

Further, we remove one of the two callers of __shrink_dcache_sb().
This also means that __shrink_dcache_sb can be moved into back into
prune_dcache_sb() and we no longer have to handle referenced
dentries conditionally, simplifying the code.

Signed-off-by: Dave Chinner
Signed-off-by: Linus Torvalds
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman

Dave Chinner
2012-01-26 08:13:35 +0800
a8b1c0add fsnotify: don't BUG in fsnotify_destroy_mark() ... Browse Code »

commit fed474857efbed79cd390d0aee224231ca718f63 upstream.

Removing the parent of a watched file results in "kernel BUG at
fs/notify/mark.c:139".

To reproduce

add "-w /tmp/audit/dir/watched_file" to audit.rules
rm -rf /tmp/audit/dir

This is caused by fsnotify_destroy_mark() being called without an
extra reference taken by the caller.

Reported by Francesco Cosoleto here:

https://bugzilla.novell.com/show_bug.cgi?id=689860

Fix by removing the BUG_ON and adding a comment about not accessing mark after
the iput.

Signed-off-by: Miklos Szeredi
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Miklos Szeredi
2012-01-26 08:13:33 +0800
6437ff72d nfsd: Fix oops when parsing a 0 length export ... Browse Code »

commit b2ea70afade7080360ac55c4e64ff7a5fafdb67b upstream.

expkey_parse() oopses when handling a 0 length export. This is easily
triggerable from usermode by writing 0 bytes into
'/proc/[proc id]/net/rpc/nfsd.fh/channel'.

Below is the log:

[ 1402.286893] BUG: unable to handle kernel paging request at ffff880077c49fff
[ 1402.287632] IP: [] expkey_parse+0x28/0x2e1
[ 1402.287632] PGD 2206063 PUD 1fdfd067 PMD 1ffbc067 PTE 8000000077c49160
[ 1402.287632] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1402.287632] CPU 1
[ 1402.287632] Pid: 20198, comm: trinity Not tainted 3.2.0-rc2-sasha-00058-gc65cd37 #6
[ 1402.287632] RIP: 0010:[] [] expkey_parse+0x28/0x2e1
[ 1402.287632] RSP: 0018:ffff880077f0fd68 EFLAGS: 00010292
[ 1402.287632] RAX: ffff880077c49fff RBX: 00000000ffffffea RCX: 0000000001043400
[ 1402.287632] RDX: 0000000000000000 RSI: ffff880077c4a000 RDI: ffffffff82283de0
[ 1402.287632] RBP: ffff880077f0fe18 R08: 0000000000000001 R09: ffff880000000000
[ 1402.287632] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880077c4a000
[ 1402.287632] R13: ffffffff82283de0 R14: 0000000001043400 R15: ffffffff82283de0
[ 1402.287632] FS: 00007f25fec3f700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
[ 1402.287632] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1402.287632] CR2: ffff880077c49fff CR3: 0000000077e1d000 CR4: 00000000000406e0
[ 1402.287632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1402.287632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1402.287632] Process trinity (pid: 20198, threadinfo ffff880077f0e000, task ffff880077db17b0)
[ 1402.287632] Stack:
[ 1402.287632] ffff880077db17b0 ffff880077c4a000 ffff880077f0fdb8 ffffffff810b411e
[ 1402.287632] ffff880000000000 ffff880077db17b0 ffff880077c4a000 ffffffff82283de0
[ 1402.287632] 0000000001043400 ffffffff82283de0 ffff880077f0fde8 ffffffff81111f63
[ 1402.287632] Call Trace:
[ 1402.287632] [] ? lock_release+0x1af/0x1bc
[ 1402.287632] [] ? might_fault+0x97/0x9e
[ 1402.287632] [] ? might_fault+0x4e/0x9e
[ 1402.287632] [] cache_do_downcall+0x3e/0x4f
[ 1402.287632] [] cache_write.clone.16+0xbb/0x130
[ 1402.287632] [] ? cache_write_pipefs+0x1a/0x1a
[ 1402.287632] [] cache_write_procfs+0x19/0x1b
[ 1402.287632] [] proc_reg_write+0x8e/0xad
[ 1402.287632] [] vfs_write+0xaa/0xfd
[ 1402.287632] [] ? fget_light+0x35/0x9e
[ 1402.287632] [] sys_write+0x48/0x6f
[ 1402.287632] [] system_call_fastpath+0x16/0x1b
[ 1402.287632] Code: c0 c9 c3 55 48 63 d2 48 89 e5 48 8d 44 32 ff 41 57 41 56 41 55 41 54 53 bb ea ff ff ff 48 81 ec 88 00 00 00 48 89 b5 58 ff ff ff
[ 1402.287632] 38 0a 0f 85 89 02 00 00 c6 00 00 48 8b 3d 44 4a e5 01 48 85
[ 1402.287632] RIP [] expkey_parse+0x28/0x2e1
[ 1402.287632] RSP
[ 1402.287632] CR2: ffff880077c49fff
[ 1402.287632] ---[ end trace 368ef53ff773a5e3 ]---

Cc: "J. Bruce Fields"
Cc: Neil Brown
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Sasha Levin
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Sasha Levin
2012-01-26 08:13:33 +0800
4c5ecfd2e nfsd4: fix lockowner matching ... Browse Code »

commit b93d87c19821ba7d3ee11557403d782e541071ad upstream.

Lockowners are looked up by file as well as by owner, but we were
forgetting to do a comparison on the file. This could cause an
incorrect result from lockt.

(Note looking up the inode from the lockowner is pretty awkward here.
The data structures need fixing.)

Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2012-01-26 08:13:32 +0800
802f43594 Unused iocbs in a batch should not be accounted as active. ... Browse Code »

commit 69e4747ee9727d660b88d7e1efe0f4afcb35db1b upstream.

Since commit 080d676de095 ("aio: allocate kiocbs in batches") iocbs are
allocated in a batch during processing of first iocbs. All iocbs in a
batch are automatically added to ctx->active_reqs list and accounted in
ctx->reqs_active.

If one (not the last one) of iocbs submitted by an user fails, further
iocbs are not processed, but they are still present in ctx->active_reqs
and accounted in ctx->reqs_active. This causes process to stuck in a D
state in wait_for_all_aios() on exit since ctx->reqs_active will never
go down to zero. Furthermore since kiocb_batch_free() frees iocb
without removing it from active_reqs list the list become corrupted
which may cause oops.

Fix this by removing iocb from ctx->active_reqs and updating
ctx->reqs_active in kiocb_batch_free().

Signed-off-by: Gleb Natapov
Reviewed-by: Jeff Moyer
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Gleb Natapov
2012-01-26 08:13:29 +0800
172b291a1 UBIFS: make debugging messages light again ... Browse Code »

commit 1f5d78dc4823a85f112aaa2d0f17624f8c2a6c52 upstream.

We switch to dynamic debugging in commit
56e46742e846e4de167dde0e1e1071ace1c882a5 but did not take into account that
now we do not control anymore whether a specific message is enabled or not.
So now we lock the "dbg_lock" and release it in every debugging macro, which
make them not so light-weight.

This commit removes the "dbg_lock" protection from the debugging macros to
fix the issue.

The downside is that now our DBGKEY() stuff is broken, but this is not
critical at all and will be fixed later.

Signed-off-by: Artem Bityutskiy
Signed-off-by: Greg Kroah-Hartman

Artem Bityutskiy
2012-01-26 08:13:26 +0800
6aeb366d6 UBIFS: fix debugging messages ... Browse Code »

commit d34315da9146253351146140ea4b277193ee5e5f upstream.

Patch 56e46742e846e4de167dde0e1e1071ace1c882a5 broke UBIFS debugging messages:
before that commit when UBIFS debugging was enabled, users saw few useful
debugging messages after mount. However, that patch turned 'dbg_msg()' into
'pr_debug()', so to enable the debugging messages users have to enable them
first via /sys/kernel/debug/dynamic_debug/control, which is very impractical.

This commit makes 'dbg_msg()' to use 'printk()' instead of 'pr_debug()', just
as it was before the breakage.

Signed-off-by: Artem Bityutskiy
Signed-off-by: Greg Kroah-Hartman

Artem Bityutskiy
2012-01-26 08:13:25 +0800
7462f4ae1 nfs: fix regression in handling of context= option in NFSv4 ... Browse Code »

commit 8a0d551a59ac92d8ff048d6cb29d3a02073e81e8 upstream.

Setting the security context of a NFSv4 mount via the context= mount
option is currently broken. The NFSv4 codepath allocates a parsed
options struct, and then parses the mount options to fill it. It
eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
That clobbers the lsm_opts struct that was populated earlier. This bug
also looks like it causes a small memory leak on each v4 mount where
context= is used.

Fix this by moving the initialization of the lsm_opts into
nfs_alloc_parsed_mount_data. Also, add a destructor for
nfs_parsed_mount_data to make it easier to free all of the allocations
hanging off of it, and to ensure that the security_free_mnt_opts is
called whenever security_init_mnt_opts is.

I believe this regression was introduced quite some time ago, probably
by commit c02d7adf.

Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Jeff Layton
2012-01-26 08:13:16 +0800
628fc192a NFSv4: include bitmap in nfsv4 get acl data ... Browse Code »

commit bf118a342f10dafe44b14451a1392c3254629a1f upstream.

The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.

This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.

Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Andy Adamson
2012-01-26 08:13:15 +0800