10 Dec, 2011

1 commit

  • * git://git.samba.org/sfrench/cifs-2.6:
    cifs: check for NULL last_entry before calling cifs_save_resume_key
    cifs: attempt to freeze while looping on a receive attempt
    cifs: Fix sparse warning when calling cifs_strtoUCS
    CIFS: Add descriptions to the brlock cache functions

    Linus Torvalds
     

09 Dec, 2011

7 commits

  • Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
    iowait times") we are reporting idle/io_wait time also while a CPU is
    tickless. We rely on get_{idle,iowait}_time functions to retrieve
    proper data.

    These functions, however, use usecs_to_cputime to translate
    microseconds to cputime64_t. This is just an alias to usecs_to_jiffies,
    which reduces the data type from u64 to unsigned int and also checks
    whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET),
    returning MAX_JIFFY_OFFSET in that case.

    Where the overflow happens depends on CONFIG_HZ, but for CONFIG_HZ_300
    in particular the threshold is quite low (1431649781), so we keep
    returning MAX_JIFFY_OFFSET for more than 3000s until the unsigned int
    itself wraps around. Just for reference, CONFIG_HZ_100 has an overflow
    window of around 20s, CONFIG_HZ_250 ~8s and CONFIG_HZ_1000 ~2s.

    This showed up as a bug where people saw [h]top going mad, reporting
    100% CPU usage even though there was basically no CPU load. The reason
    was simply that /proc/stat stopped reporting idle/io_wait changes (it
    reported MAX_JIFFY_OFFSET instead), so the only values still changing
    were user and system time.

    Let's use nsecs_to_jiffies64 instead, which doesn't reduce the
    precision to a 32-bit type and is much more appropriate for cumulative
    time values (unlike usecs_to_jiffies, which is intended for timeout
    calculations).
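
    A sketch of what the conversion looks like after the fix (the real
    patch routes this through a per-architecture usecs_to_cputime64()
    helper; abbreviated here):

        static cputime64_t get_idle_time(int cpu)
        {
                u64 idle_time_us = get_cpu_idle_time_us(cpu, NULL);

                if (idle_time_us == -1ULL)
                        /* !NO_HZ: fall back to the tick-based counter */
                        return kstat_cpu(cpu).cpustat.idle;

                /* was: usecs_to_cputime(idle_time_us), which truncates
                 * to unsigned int and clamps at MAX_JIFFY_OFFSET */
                return nsecs_to_jiffies64(idle_time_us * NSEC_PER_USEC);
        }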

    Signed-off-by: Michal Hocko
    Tested-by: Artem S. Tashkinov
    Cc: Dave Jones
    Cc: Arnd Bergmann
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
    Fix the build error "directives may not be used inside a macro
    argument" which appears when the kernel is compiled for the cris
    architecture.
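
    For context, gcc rejects preprocessor directives that appear inside a
    macro's argument list. A hypothetical reduction of the pattern (the
    macro and config symbol are made up for illustration):

        #define SETUP(x) do_setup(x)

        SETUP(
        #ifdef CONFIG_ETRAX_FOO   /* error: directives may not be used
                                   * inside a macro argument */
              foo_params
        #else
              default_params
        #endif
        );

        /* the usual fix: hoist the conditional around the invocation */
        #ifdef CONFIG_ETRAX_FOO
        SETUP(foo_params);
        #else
        SETUP(default_params);
        #endif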

    Signed-off-by: Claudio Scordino
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Claudio Scordino
     
  • Prior to commit eaf35b1, cifs_save_resume_key had some NULL pointer
    checks at the top. It turns out that at least one of those NULL
    pointer checks is needed after all.

    When the LastNameOffset in a FIND reply appears to be beyond the end of
    the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry
    to NULL. Since eaf35b1, the code will now oops in this situation.

    Fix this by having the callers check for a NULL last entry pointer
    before calling cifs_save_resume_key. No change is needed for the
    call site in cifs_readdir as it's not reachable with a NULL
    current_entry pointer.
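
    A minimal sketch of the caller-side guard (against the 3.2-era cifs
    structures; not the verbatim patch):

        /* only stash a resume key if the server gave us a usable
         * last_entry pointer; it is NULL when LastNameOffset is bogus */
        if (cfile->srch_inf.last_entry)
                cifs_save_resume_key(cfile->srch_inf.last_entry, cfile);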

    This should fix:

    https://bugzilla.redhat.com/show_bug.cgi?id=750247

    Cc: stable@vger.kernel.org
    Cc: Christoph Hellwig
    Reported-by: Adam G. Metzler
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • In the recent overhaul of the demultiplex thread receive path, I
    neglected to ensure that we attempt to freeze on each pass through the
    receive loop.
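
    The fix boils down to a try_to_freeze() at the top of each pass; a
    sketch (the receive helper name is illustrative):

        #include <linux/freezer.h>

        /* let the freezer catch us on every iteration, so suspend is
         * not blocked while we wait for data from the server */
        while (total_read < to_read) {
                try_to_freeze();
                length = read_from_socket(server, buf + total_read,
                                          to_read - total_read);
                if (length <= 0)
                        break;
                total_read += length;
        }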

    Reported-and-Tested-by: Woody Suwalski
    Reported-and-Tested-by: Adam Williamson
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Fix sparse endian check warning while calling cifs_strtoUCS

    CHECK   fs/cifs/smbencrypt.c
    fs/cifs/smbencrypt.c:216:37: warning: incorrect type in argument 1 (different base types)
    fs/cifs/smbencrypt.c:216:37:    expected restricted __le16 [usertype] *
    fs/cifs/smbencrypt.c:216:37:    got unsigned short *
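
    The warning disappears once the buffer carries the endian-annotated
    type that cifs_strtoUCS() expects; a sketch (names approximate):

        /* declare the UTF-16LE buffer with its annotated type instead
         * of casting a plain unsigned short pointer */
        __le16 wpwd[129];    /* was: unsigned short wpwd[129] */

        cifs_strtoUCS(wpwd, password, 128, codepage);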

    Signed-off-by: Steve French
    Acked-by: Shirish Pargaonkar

    Steve French
     
  • Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: drop spin lock when memory alloc fails
    Btrfs: check if the to-be-added device is writable
    Btrfs: try cluster but don't advance in search list
    Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE

    Linus Torvalds
     

07 Dec, 2011

3 commits

    The __d_path() API is asking for trouble, and in the case of apparmor,
    d_namespace_path() got just that. The root cause is that when __d_path()
    misses the root it had been told to look for, it stores the location of
    the most remote ancestor in *root, without grabbing references. Sure, at
    the moment of the call it had been pinned down by what we have in *path.
    But if we raced with umount -l, we could very well have stopped at a
    vfsmount/dentry that got freed as soon as prepend_path() dropped
    vfsmount_lock.

    It is safe to compare these pointers with pre-existing (and known to be
    still alive) vfsmounts and dentries, as long as all we are asking is "is
    it the same address?". Dereferencing is not safe, and apparmor ended up
    stepping into exactly that. d_namespace_path() really wants to examine
    the place where we stopped, even if it's not connected to our namespace,
    and as a result it looked at ->d_sb->s_magic of a dentry that might have
    already been freed by that point. All other callers had been careful
    enough to avoid that, but it's really a bad interface - it invites that
    kind of trouble.

    The fix is fairly straightforward, even though it's bigger than I'd
    like:
    * prepend_path()'s root argument becomes const.
    * __d_path() is never called with a NULL/NULL root. That was a kludge
    to start with. Instead, we have an explicit function - d_absolute_path()
    (usage sketch below). Same as __d_path(), except that it doesn't get
    root passed and stops where it stops. apparmor and tomoyo are using it.
    * __d_path() returns NULL on a path outside of root. The main
    caller is show_mountinfo(), and that's precisely what we pass root for -
    to skip those outside the chroot jail. Those who don't want that can
    (and do) use d_path().
    * __d_path()'s root argument becomes const. Everyone agrees, I hope.
    * apparmor does *NOT* try to use __d_path() or any of its variants
    when it sees that path->mnt is an internal vfsmount. In that case it's
    definitely not mounted anywhere and dentry_path() is exactly what we
    want there. Handling of sysctl()-triggered weirdness is moved to that
    place.
    * if apparmor is asked to do a pathname relative to the chroot jail
    and __d_path() tells it the path is not in that jail, the sucker just
    calls d_absolute_path() instead. That's the other remaining caller of
    __d_path(), BTW.
    * seq_path_root() does _NOT_ return -ENAMETOOLONG (which was stupid
    anyway - the normal seq_file logic will take care of growing the buffer
    and redoing the call of ->show() just fine). However, if it gets a path
    not reachable from root, it returns SEQ_SKIP. The only caller has been
    adjusted (i.e. it stopped ignoring the return value as it used to do).
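
    For reference, a usage sketch of the new d_absolute_path() helper
    (handle_error() is a placeholder):

        /* resolve a path even when it is not reachable from the
         * caller's root; d_absolute_path() stops where it stops */
        char *name, *page = (char *)__get_free_page(GFP_KERNEL);

        if (page) {
                name = d_absolute_path(&path, page, PAGE_SIZE);
                if (IS_ERR(name))   /* -EINVAL or -ENAMETOOLONG */
                        handle_error(PTR_ERR(name));
                free_page((unsigned long)page);
        }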

    Reviewed-by: John Johansen
    Acked-by: John Johansen
    Signed-off-by: Al Viro
    Cc: stable@vger.kernel.org

    Al Viro
     
    Apply the scheme used in log_regrant_write_log_space, where we wake up
    any other threads waiting for log space before the newly added one, to
    log_grant_log_space as well, and factor the code into readable helpers.
    For each of the two queues we add two helpers (see the sketch below):

    - one to try to wake up all waiting threads. This helper will also be
    usable by xfs_log_move_tail once we remove the current opportunistic
    wakeups in it.
    - one to sleep on t_wait until enough log space is available, loosely
    modelled after Linux waitqueues.

    These are then used to reimplement the guts of log_grant_log_space and
    log_regrant_write_log_space. The two functions now use one and the same
    algorithm for waiting on log space instead of the subtly different ones
    they had before, with an option to completely unify them in the near
    future.
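
    A loose sketch of the wake-side helper (names and fields are
    illustrative, not the actual XFS code):

        /* satisfy waiters strictly in queue order; stop at the first
         * ticket that no longer fits into the remaining free space */
        static bool grant_queue_wake(struct list_head *waiters,
                                     int *free_bytes)
        {
                struct xlog_ticket *tic;

                list_for_each_entry(tic, waiters, t_queue) {
                        if (*free_bytes < tic->t_unit_res)
                                return false;  /* keep sleeping */
                        *free_bytes -= tic->t_unit_res;
                        wake_up_process(tic->t_task);
                }
                return true;
        }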

    Also move the filesystem shutdown handling to the common caller given
    that we had to touch it anyway.

    Based on hard debugging and an earlier patch from
    Chandra Seetharaman.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Chandra Seetharaman
    Tested-by: Chandra Seetharaman
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
    The i_ino field in the VFS inode is of type unsigned long and thus
    can't hold the full 64-bit inode number on 32-bit kernels. We have the
    full inode number in the XFS inode, so use that one for NFS exports.
    Note that I've also switched the 32-bit file handle types to it, just
    to make the code more consistent and copy & paste errors less likely.
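
    The gist of the change in the file handle encoding path (a sketch,
    not the verbatim patch):

        struct xfs_inode *ip = XFS_I(inode);

        /* the XFS inode carries the full 64-bit number (xfs_ino_t);
         * the VFS i_ino is an unsigned long, 32 bits on 32-bit kernels */
        fid->ino = ip->i_ino;        /* was: fid->ino = inode->i_ino; */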

    Reported-by: Guoquan Yang
    Reported-by: Hank Peng
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

03 Dec, 2011

2 commits

  • When testing the new xfstests --large-fs option that does very large
    file preallocations, this assert was tripped deep in
    xfs_alloc_vextent():

    XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239

    The allocation was trying to allocate a zero length extent because
    the lower 32 bits of the allocation length were zero. The remaining
    length of the allocation to be done was an exact multiple of 2^32 -
    the first case I saw was at 496TB remaining to be allocated.

    This turns out to be an overflow when converting the allocation
    length (a 64 bit quantity) into the extent length to allocate (a 32
    bit quantity), and it requires the length to be allocated an exact
    multiple of 2^32 blocks to trip the assert.

    Fix it by limiting the extent length to allocate to MAXEXTLEN.
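
    The one-line shape of the fix, in the allocation length setup (a
    sketch; variable names approximate):

        /* clamp the 64-bit remaining length to MAXEXTLEN before it is
         * narrowed to a 32-bit extent length, so an exact multiple of
         * 2^32 can no longer truncate to a zero-length allocation */
        alen = XFS_FILBLKS_MIN(len, MAXEXTLEN);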

    Signed-off-by: Dave Chinner
    Signed-off-by: Ben Myers
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: fix attr2 vs large data fork assert
    xfs: force buffer writeback before blocking on the ilock in inode reclaim
    xfs: validate acl count

    Linus Torvalds
     

02 Dec, 2011

3 commits

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
    ocfs2: avoid unaligned access to dqc_bitmap
    ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
    ocfs2: honor O_(D)SYNC flag in fallocate
    ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
    ocfs2: send correct UUID to cleancache initialization
    ocfs2: Commit transactions in error cases -v2
    ocfs2: make direntry invalid when deleting it
    fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
    ocfs2: Avoid livelock in ocfs2_readpage()
    ocfs2: serialize unaligned aio
    ocfs2: Implement llseek()
    ocfs2: Fix ocfs2_page_mkwrite()
    ocfs2: Add comment about orphan scanning
    ocfs2: Clean up messages in the fs
    ocfs2/cluster: Cluster up now includes network connections too
    ocfs2/cluster: Add new function o2net_fill_node_map()
    ocfs2/cluster: Fix output in file elapsed_time_in_ms
    ocfs2/dlm: dlmlock_remote() needs to account for remastery
    ocfs2/dlm: Take inflight reference count for remotely mastered resources too
    ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
    ...

    Linus Torvalds
     
    The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit
    aligned, but not 64-bit aligned. The dqc_bitmap is accessed by
    ocfs2_set_bit(), ocfs2_clear_bit(), ocfs2_test_bit(), and
    ocfs2_find_next_zero_bit(). These are wrapper macros for ext2_*_bit(),
    which need to take an unsigned-long-aligned address (though some
    architectures are able to handle unaligned accesses correctly).

    So some 64-bit architectures may not be able to access the dqc_bitmap
    correctly.

    This avoids such unaligned access by using another set of wrapper
    functions for ext2_*_bit(). The code is taken from fs/ext4/mballoc.c,
    which also needs to handle unaligned bitmap access.
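
    The borrowed ext4 helper aligns the pointer down to a long boundary
    and folds the dropped bytes into the bit offset; a sketch of the
    pattern (after mb_correct_addr_and_bit() in fs/ext4/mballoc.c):

        static inline void *correct_addr_and_bit(int *bit, void *addr)
        {
        #if BITS_PER_LONG == 64
                *bit += ((unsigned long)addr & 7UL) << 3;
                addr = (void *)((unsigned long)addr & ~7UL);
        #elif BITS_PER_LONG == 32
                *bit += ((unsigned long)addr & 3UL) << 3;
                addr = (void *)((unsigned long)addr & ~3UL);
        #endif
                return addr;
        }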

    Signed-off-by: Akinobu Mita
    Acked-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Joel Becker

    Akinobu Mita
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix meta data raid-repair merge problem
    Btrfs: skip allocation attempt from empty cluster
    Btrfs: skip block groups without enough space for a cluster
    Btrfs: start search for new cluster at the beginning
    Btrfs: reset cluster's max_size when creating bitmap
    Btrfs: initialize new bitmaps' list
    Btrfs: fix oops when calling statfs on readonly device
    Btrfs: Don't error on resizing FS to same size
    Btrfs: fix deadlock on metadata reservation when evicting a inode
    Fix URL of btrfs-progs git repository in docs
    btrfs scrub: handle -ENOMEM from init_ipath()

    Linus Torvalds
     

01 Dec, 2011

10 commits

  • Commit 4a54c8c16 introduced raid-repair, killing the individual
    readpage_io_failed_hook entries from inode.c and disk-io.c. Commit
    4bb31e92 introduced new readahead code, adding a readpage_io_failed_hook to
    disk-io.c.

    The raid-repair commit had logic to disable raid-repair if a
    readpage_io_failed_hook is set. Thus, the readahead commit effectively
    disabled raid-repair for metadata.

    This commit changes the logic to always attempt raid-repair when
    needed and to call the readpage_io_failed_hook only if raid-repair
    fails. This is much more straightforward and should have been like
    that from the beginning.
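
    In control-flow terms the change looks roughly like this (helper
    names and signatures are simplified, not the exact btrfs code):

        /* always try raid-repair first, for data and metadata alike;
         * fall back to the per-tree hook only when repair fails */
        if (read_failed) {
                if (try_repair_from_other_mirror(tree, page,
                                                 start, end) == 0)
                        return;   /* repair submitted a re-read */
                if (tree->ops && tree->ops->readpage_io_failed_hook)
                        tree->ops->readpage_io_failed_hook(page,
                                                           mirror_num);
        }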

    Signed-off-by: Jan Schmidt
    Reported-by: Stefan Behrens
    Signed-off-by: Chris Mason

    Jan Schmidt
     
  • If we don't have a cluster, don't bother trying to allocate from it,
    jumping right away to the attempt to allocate a new cluster.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Chris Mason

    Alexandre Oliva
     
  • We test whether a block group has enough free space to hold the
    requested block, but when we're doing clustered allocation, we can
    save some cycles by testing whether it has enough room for the cluster
    upfront, otherwise we end up attempting to set up a cluster and
    failing. Only in the NO_EMPTY_SIZE loop do we attempt an unclustered
    allocation, and by then we'll have zeroed the cluster size, so this
    patch won't stop us from using the block group as a last resort.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Chris Mason

    Alexandre Oliva
     
    Instead of starting at zero (offset is always zero), request a cluster
    starting at search_start, which denotes the beginning of the current
    block group.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Chris Mason

    Alexandre Oliva
     
  • The field that indicates the size of the largest contiguous chunk of
    free space in the cluster is not initialized when setting up bitmaps,
    it's only increased when we find a larger contiguous chunk. We end up
    retaining a larger value than appropriate for highly-fragmented
    clusters, which may cause pointless searches for large contiguous
    groups, and even cause clusters that do not meet the density
    requirements to be set up.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Chris Mason

    Alexandre Oliva
     
  • We're failing to create clusters with bitmaps because
    setup_cluster_no_bitmap checks that the list is empty before inserting
    the bitmap entry in the list for setup_cluster_bitmap, but the list
    field is only initialized when it is restored from the on-disk free
    space cache, or when it is written out to disk.

    Besides a potential race condition due to the multiple use of the list
    field, filesystem performance severely degrades over time: as we use
    up all non-bitmap free extents, the try-to-set-up-cluster dance is
    done at every metadata block allocation. For every block group, we
    fail to set up a cluster, and after failing on them all up to twice,
    we fall back to the much slower unclustered allocation.

    To make matters worse, before the unclustered allocation, we try to
    create new block groups until we reach the 1% threshold, which
    introduces additional bitmaps and thus block groups that we'll iterate
    over at each metadata block request.
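
    The fix itself is presumably a one-liner at bitmap creation time; a
    sketch (surrounding function abbreviated from the 3.2-era free-space
    cache code):

        static void add_new_bitmap(struct btrfs_free_space_ctl *ctl,
                                   struct btrfs_free_space *info,
                                   u64 offset)
        {
                info->offset = offset_to_bitmap(ctl, offset);
                info->bytes = 0;
                INIT_LIST_HEAD(&info->list);  /* the missing init */
                link_free_space(ctl, info);
                ctl->total_bitmaps++;
                ctl->op->recalc_thresholds(ctl);
        }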

    Alexandre Oliva
     
  • To reproduce this bug:

    # dd if=/dev/zero of=img bs=1M count=256
    # mkfs.btrfs img
    # losetup -r /dev/loop1 img
    # mount /dev/loop1 /mnt
    OOPS!!

    It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space().

    To fix this, instead of checking write-only devices, we check all open
    devices:

    # df -h /dev/loop1
    Filesystem Size Used Avail Use% Mounted on
    /dev/loop1 250M 28K 238M 1% /mnt

    Signed-off-by: Li Zefan

    Li Zefan
     
  • It seems overly harsh to fail a resize of a btrfs file system to the
    same size when a shrink or grow would succeed. User app GParted trips
    over this error. Allow it by bypassing the shrink or grow operation.

    Signed-off-by: Mike Fleetwood

    Mike Fleetwood
     
    When I ran the xfstests, I found the test tasks were blocked on
    meta-data reservation.

    By debugging, I found the cause of this bug, a reservation cycle:

        start transaction
                |
                v
        reserve meta-data space
                |
                v
        flush delay allocation -> iput inode -> evict inode
                ^                                   |
                |                                   v
        wait for delay allocation flush <- reserve meta-data space

    Eviction reserves meta-data space again and then waits for the very
    delalloc flush it was invoked from, so the task deadlocks.

    Miao Xie
     
  • init_ipath() can return an ERR_PTR(-ENOMEM).
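
    So the call site needs an IS_ERR() check; a sketch:

        /* init_ipath() returns ERR_PTR(-ENOMEM) on allocation failure,
         * so check before using the result */
        ipath = init_ipath(4096, root, path);
        if (IS_ERR(ipath))
                return PTR_ERR(ipath);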

    Signed-off-by: Dan Carpenter

    Dan Carpenter
     

30 Nov, 2011

3 commits

    With Dmitry's fsstress updates I've seen very reproducible crashes in
    xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims
    that the attributes would not fit inline into the inode after removing
    an attribute. It turns out that we were operating on an inode with lots
    of delalloc extents, and thus an if_bytes value for the data fork that
    is larger than the biggest possible on-disk storage for it, which
    utterly confuses the code near the end of xfs_attr_shortform_bytesfit.

    Fix this by always allowing the current attribute fork, like we
    already do for the attr1 format, given that delalloc conversion will
    take care of moving either the data or attribute area out of line if
    it doesn't fit at that point - or will make the point moot by merging
    extents.

    Also document the function better, and clean up some loose bits.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
    If we are doing synchronous inode reclaim we block the VM from making
    progress in memory reclaim. So if we encounter a flush-locked inode,
    promote it in the delwri list and wake up xfsbufd to write it out now.
    Without this we can see hangs of up to 30 seconds during workloads
    hitting synchronous inode reclaim.

    The scheme is copied from what we do for dquot reclaims.

    Reported-by: Simon Kirby
    Signed-off-by: Christoph Hellwig
    Tested-by: Simon Kirby
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • * 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix racy use-after-free in ext4_end_io_dio()

    Linus Torvalds
     

25 Nov, 2011

1 commit

    ext4_end_io_dio() queues io_end->work and then clears iocb->private;
    however, io_end->work calls aio_complete(), which frees the iocb
    object. If that slab object gets reallocated, ext4_end_io_dio() can
    end up clearing someone else's iocb->private; this use-after-free can
    also cause a leak of a struct ext4_io_end_t.

    Detected and tested with slab poisoning.
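
    The fix is to detach the iocb before the work item can possibly run;
    a sketch of the reordering (not the verbatim patch):

        /* clear iocb->private *before* queueing the work: once queued,
         * the work may call aio_complete() and free the iocb, so any
         * later access to iocb is a use-after-free */
        iocb->private = NULL;
        io_end->iocb = iocb;
        io_end->result = ret;
        queue_work(wq, &io_end->work);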

    [ Note: Can also reproduce using 12 fio's against 12 file systems with the
    following configuration file:

    [global]
    direct=1
    ioengine=libaio
    iodepth=1
    bs=4k
    ba=4k
    size=128m

    [create]
    filename=${TESTDIR}
    rw=write

    -- tytso ]

    Google-Bug-Id: 5354697
    Signed-off-by: Tejun Heo
    Signed-off-by: "Theodore Ts'o"
    Reported-by: Kent Overstreet
    Tested-by: Kent Overstreet
    Cc: stable@kernel.org

    Tejun Heo
     

24 Nov, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: Extend array bounds for all filename chars
    eCryptfs: Flush file in vma close
    eCryptfs: Prevent file create race condition

    Linus Torvalds
     
  • From mhalcrow's original commit message:

    Characters with ASCII values greater than the size of
    filename_rev_map[] are valid filename characters.
    ecryptfs_decode_from_filename() will access kernel memory beyond
    that array, and ecryptfs_parse_tag_70_packet() will then decrypt
    those characters. The attacker, using the FNEK of the crafted file,
    can then re-encrypt the characters to reveal the kernel memory past
    the end of the filename_rev_map[] array. I expect low security
    impact since this array is statically allocated in the text area,
    and the amount of memory past the array that is accessible is
    limited by the largest possible ASCII filename character.

    This patch solves the issue reported by mhalcrow but with an
    implementation suggested by Linus to simply extend the length of
    filename_rev_map[] to 256. Characters greater than 0x7A are mapped to
    0x00, which is how invalid characters less than 0x7A were previously
    being handled.
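
    The shape of the fix (table contents elided; in C, trailing entries
    of a partially initialized static array default to zero):

        /* cover all 256 possible byte values so decoding can never
         * index past the end; bytes above 0x7A stay 0x00 (invalid) */
        static const unsigned char filename_rev_map[256] = {
                0x00, 0x00, 0x00, 0x00, /* ... 0x00-0x7A as before ... */
                /* 0x7B..0xFF are implicitly 0x00 */
        };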

    Signed-off-by: Tyler Hicks
    Reported-by: Michael Halcrow
    Cc: stable@kernel.org

    Tyler Hicks