29 Jan, 2012
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix reservations in btrfs_page_mkwrite
Btrfs: advance window_start if we're using a bitmap
btrfs: mask out gfp flags in releasepage
Btrfs: fix enospc error caused by wrong checks of the chunk
Btrfs: do not defrag a file partially
Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
Btrfs: use cluster->window_start when allocating from a cluster bitmap
Btrfs: Check for NULL page in extent_range_uptodate
btrfs: Fix busyloops in transaction waiting code
Btrfs: make sure a bitmap has enough bytes
Btrfs: fix uninit warning in backref.c
27 Jan, 2012
11 commits
-
Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error. But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.This makes sure we only release if we were able to reserve.
Signed-off-by: Chris Mason
-
If we span a long area in a bitmap we could end up taking a lot of time
searching to the next free area if we're searching from the original
window_start, so advance window_start in order to make sure we don't do any
superficial searching. Thanks,Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason -
btree_releasepage is a callback and can be passed unknown gfp flags and then
they may end up in kmem_cache_alloc called from alloc_extent_state, slab
allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state3399 if ((mask & GFP_NOFS) == GFP_NOFS)
3400 mask = GFP_NOFS;will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963[] cache_alloc_refill+0x3b4/0x5c8
[] kmem_cache_alloc+0x204/0x294
[] mempool_alloc+0x52/0x170
[] alloc_extent_state+0x40/0xd4 [btrfs]
[] __clear_extent_bit+0x38a/0x4cc [btrfs]
[] try_release_extent_state+0x9c/0xd4 [btrfs]
[] btree_releasepage+0x7e/0xd0 [btrfs]
[] shrink_page_list+0x6a0/0x724
[] shrink_inactive_list+0x230/0x578
[] shrink_list+0x6c/0x120
[] shrink_zone+0x1e2/0x228
[] shrink_zones+0x90/0x254
[] do_try_to_free_pages+0xac/0x420
[] try_to_free_pages+0x13c/0x1b0
[] __alloc_pages_nodemask+0x5b4/0x9a8
[] grab_cache_page_write_begin+0x7e/0xe8Signed-off-by: David Sterba
Signed-off-by: Chris Mason -
When we did sysbench test for inline files, enospc error happened easily though
there was lots of free disk space which could be allocated for new chunks.Reproduce steps:
# mkfs.btrfs -b $((2 * 1024 * 1024 * 1024))
# mount /mnt
# ulimit -n 102400
# cd /mnt
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr prepare
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr run
The reason of this bug is:
Now, we can reserve space which is larger than the free space in the chunks if
we have enough free disk space which can be used for new chunks. By this way,
the space allocator should allocate a new chunk by force if there is no free
space in the free space cache. But there are two wrong checks which break this
operation.One is
if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk
even we fail to allocate free space by minimum allocable size.The other is
if (space_info->force_alloc)
force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone
sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.Fix these two wrong checks. Especially the second one, we fix it by changing
the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make
CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
higher priority. And if the value which is passed in by the caller is greater
than ->force_alloc, use the passed value.Signed-off-by: Miao Xie
Signed-off-by: Chris Mason -
xfstests 218 complains that btrfs defrags a file partially:
After: 1
Write backwards sync, but contiguous - should defrag to 1 extent
Before: 10
-After: 1
+After: 2To fix this, we need to set max_to_defrag count properly.
Signed-off-by: Liu Bo
Signed-off-by: Chris Mason -
There have been 4 warnings on 32-bit build, they are herewith fixed.
Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason -
We specifically set window_start in the cluster struct to indicate where the
cluster starts in a bitmap, but we've been using min_start to indicate where
we're searching from. This is usually the start of the blockgroup, so
essentially means we're constantly searching from the start of any bitmap we
find, which completely negates all the trouble we go to in order to setup a
cluster. So start using window_start to make sure we actually use the area we
found. Thanks,Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason -
A user has encountered a NULL pointer kernel oops in btrfs when
encountering media errors. The problem has been identified
as an unhandled NULL pointer returned from find_get_page().
This modification simply checks for a NULL page, and returns
with an error if found (the extent_range_uptodate() function
returns 1 on errors).After testing this patch, the user reported that the error with
the NULL pointer oops was solved. However, there is still a
remaining problem with a thread becoming stuck in
wait_on_page_locked(page) in the read_extent_buffer_pages(...)
function in extent_io.cfor (i = start_i; i < num_pages; i++) {
page = extent_buffer_page(eb, i);
wait_on_page_locked(page);
if (!PageUptodate(page))
ret = -EIO;
}This patch leaves the issue with the locked page yet to be resolved.
Signed-off-by: Mitch Harder
Signed-off-by: Chris Mason -
wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...Signed-off-by: Jan Kara
Signed-off-by: Chris Mason -
We have only been checking for min_bytes available in bitmap entries, but we
won't successfully setup a bitmap cluster unless it has at least bytes in the
bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so
if there are a bunch of bitmap entries with less than 2mb's in them, we'll
search all them anyway, which is suboptimal. Fix this check. Thanks,Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason -
Added initialization with the declaration of ret. It isn't set later on the
switch-default branch (which should never be taken).Signed-off-by: Jan Schmidt
Signed-off-by: Chris Mason
26 Jan, 2012
14 commits
-
Quoth Ben Myers:
"Please pull in the following bugfix for xfs. We forgot to drop a lock on
error in xfs_readlink. It hasn't been through -next yet, but there is no
-next tree tomorrow. The fix is clear so I'm sending this request today."* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink() -
The data encryption was moved from ecryptfs_write_end into
ecryptfs_writepage, this patch moves the corresponding function
comments to be consistent with the modification.Signed-off-by: Li Wang
Signed-off-by: Linus Torvalds -
Says Tyler:
"Tim's logging message update will be really helpful to users when
they're trying to locate a problematic file in the lower filesystem
with filename encryption enabled.You'll recognize the fix from Li, as you commented on that.
You should also be familiar with my setattr/truncate improvements,
since you were the one that pointed them out to us (thanks again!).
Andrew noted the /dev/ecryptfs write count sanitization needed to be
improved, so I've got a fix in there for that along with some other
less important cleanups of the /dev/ecryptfs read/write code."* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Fix oops when printing debug info in extent crypto functions
eCryptfs: Remove unused ecryptfs_read()
eCryptfs: Check inode changes in setattr
eCryptfs: Make truncate path killable
eCryptfs: Infinite loop due to overflow in ecryptfs_write()
eCryptfs: Replace miscdev read/write magic numbers
eCryptfs: Report errors in writes to /dev/ecryptfs
eCryptfs: Sanitize write counts of /dev/ecryptfs
ecryptfs: Remove unnecessary variable initialization
ecryptfs: Improve metadata read failure logging
MAINTAINERS: Update eCryptfs maintainer address -
If pages passed to the eCryptfs extent-based crypto functions are not
mapped and the module parameter ecryptfs_verbosity=1 was specified at
loading time, a NULL pointer dereference will occur.Note that this wouldn't happen on a production system, as you wouldn't
pass ecryptfs_verbosity=1 on a production system. It leaks private
information to the system logs and is for debugging only.The debugging info printed in these messages is no longer very useful
and rather than doing a kmap() in these debugging paths, it will be
better to simply remove the debugging paths completely.https://launchpad.net/bugs/913651
Signed-off-by: Tyler Hicks
Reported-by: Daniel DeFreez
Cc: -
ecryptfs_read() has been ifdef'ed out for years now and it was
apparently unused before then. It is time to get rid of it for good.Signed-off-by: Tyler Hicks
-
Most filesystems call inode_change_ok() very early in ->setattr(), but
eCryptfs didn't call it at all. It allowed the lower filesystem to make
the call in its ->setattr() function. Then, eCryptfs would copy the
appropriate inode attributes from the lower inode to the eCryptfs inode.This patch changes that and actually calls inode_change_ok() on the
eCryptfs inode, fairly early in ecryptfs_setattr(). Ideally, the call
would happen earlier in ecryptfs_setattr(), but there are some possible
inode initialization steps that must happen first.Since the call was already being made on the lower inode, the change in
functionality should be minimal, except for the case of a file extending
truncate call. In that case, inode_newsize_ok() was never being
called on the eCryptfs inode. Rather than inode_newsize_ok() catching
maximum file size errors early on, eCryptfs would encrypt zeroed pages
and write them to the lower filesystem until the lower filesystem's
write path caught the error in generic_write_checks(). This patch
introduces a new function, called ecryptfs_inode_newsize_ok(), which
checks if the new lower file size is within the appropriate limits when
the truncate operation will be growing the lower file.In summary this change prevents eCryptfs truncate operations (and the
resulting page encryptions), which would exceed the lower filesystem
limits or FSIZE rlimits, from ever starting.Signed-off-by: Tyler Hicks
Reviewed-by: Li Wang
Cc: -
ecryptfs_write() handles the truncation of eCryptfs inodes. It grabs a
page, zeroes out the appropriate portions, and then encrypts the page
before writing it to the lower filesystem. It was unkillable and due to
the lack of sparse file support could result in tying up a large portion
of system resources, while encrypting pages of zeros, with no way for
the truncate operation to be stopped from userspace.This patch adds the ability for ecryptfs_write() to detect a pending
fatal signal and return as gracefully as possible. The intent is to
leave the lower file in a useable state, while still allowing a user to
break out of the encryption loop. If a pending fatal signal is detected,
the eCryptfs inode size is updated to reflect the modified inode size
and then -EINTR is returned.Signed-off-by: Tyler Hicks
Cc: -
ecryptfs_write() can enter an infinite loop when truncating a file to a
size larger than 4G. This only happens on architectures where size_t is
represented by 32 bits.This was caused by a size_t overflow due to it incorrectly being used to
store the result of a calculation which uses potentially large values of
type loff_t.[tyhicks@canonical.com: rewrite subject and commit message]
Signed-off-by: Li Wang
Signed-off-by: Yunchuan Wen
Reviewed-by: Cong Wang
Cc:
Signed-off-by: Tyler Hicks -
ecryptfs_miscdev_read() and ecryptfs_miscdev_write() contained many
magic numbers for specifying packet header field sizes and offsets. This
patch defines those values and replaces the magic values.Signed-off-by: Tyler Hicks
-
Errors in writes to /dev/ecryptfs were being incorrectly reported by
returning 0 or the value of the original write count.This patch clears up the return code assignment in error paths.
Signed-off-by: Tyler Hicks
-
A malicious count value specified when writing to /dev/ecryptfs may
result in a a very large kernel memory allocation.This patch peeks at the specified packet payload size, adds that to the
size of the packet headers and compares the result with the write count
value. The resulting maximum memory allocation size is approximately 532
bytes.Signed-off-by: Tyler Hicks
Reported-by: Sasha Levin
Cc: -
Removes unneeded variable initialization in ecryptfs_read_metadata(). Also adds
a small comment to help explain metadata reading logic.[tyhicks@canonical.com: Pulled out of for-stable patch and wrote commit msg]
Signed-off-by: Tim Gardner
Signed-off-by: Tyler Hicks -
Print inode on metadata read failure. The only real
way of dealing with metadata read failures is to delete
the underlying file system file. Having the inode
allows one to 'find . -inum INODE`.[tyhicks@canonical.com: Removed some minor not-for-stable parts]
Signed-off-by: Tim Gardner
Reviewed-by: Kees Cook
Cc: stable@vger.kernel.org
Signed-off-by: Tyler Hicks -
Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted
symplink and bailed out. Fix it by jumping to 'out' instead of doing return.CC: stable@kernel.org
CC: Carlos Maiolino
Signed-off-by: Jan Kara
Reviewed-by: Alex Elder
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers
25 Jan, 2012
1 commit
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Pass information that quota is stored in system file to userspace
ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls
jbd: Issue cache flush after checkpointing
24 Jan, 2012
5 commits
-
The usual kernel-doc fixups from Randy. Some of them David acked as
merged in his tree, this is the random left-overs.* kernel-doc:
docbook: fix sched source file names in device-drivers book
docbook: change iomap source filename in deviceiobook
docbook: don't use serial_core.h in device-drivers book
kernel-doc: fix kernel-doc warnings in sched
kernel-doc: fix new warnings in cfg80211.h
kernel-doc: fix new warning in usb.h
kernel-doc: fix new warnings in device.h
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warning in regulator core
kernel-doc: fix new warnings in pci
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in auditsc.c
scripts/kernel-doc: fix fatal error caused by cfg80211.h -
Quoth Andrew:
"Random fixes. And a simple new LED driver which I'm trying to sneak
in while you're not looking."Sneaking successful.
* akpm:
score: fix off-by-one index into syscall table
mm: fix rss count leakage during migration
SHM_UNLOCK: fix Unevictable pages stranded after swap
SHM_UNLOCK: fix long unpreemptible section
kdump: define KEXEC_NOTE_BYTES arch specific for s390x
mm/hugetlb.c: undo change to page mapcount in fault handler
mm: memcg: update the correct soft limit tree during migration
proc: clear_refs: do not clear reserved pages
drivers/video/backlight/l4f00242t03.c: return proper error in l4f00242t03_probe if regulator_get() fails
drivers/video/backlight/adp88x0_bl.c: fix bit testing logic
kprobes: initialize before using a hlist
ipc/mqueue: simplify reading msgqueue limit
leds: add led driver for Bachmann's ot200
mm: __count_immobile_pages(): make sure the node is online
mm: fix NULL ptr dereference in __count_immobile_pages
mm: fix warnings regarding enum migrate_mode -
* git://git.samba.org/sfrench/cifs-2.6:
CIFS: Rename *UCS* functions to *UTF16*
[CIFS] ACL and FSCACHE support no longer EXPERIMENTAL
[CIFS] Fix build break with multiuser patch when LANMAN disabled
cifs: warn about impending deprecation of legacy MultiuserMount code
cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts
cifs: sanitize username handling
keys: add a "logon" key type
cifs: lower default wsize when unix extensions are not used
cifs: better instrumentation for coalesce_t2
cifs: integer overflow in parse_dacl()
cifs: Fix sparse warning when calling cifs_strtoUCS
CIFS: Add descriptions to the brlock cache functions -
Fix new kernel-doc warnings:
Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs'
Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32'Signed-off-by: Randy Dunlap
Cc: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds -
/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
pages and corresponding page table entries of the task with PID pid, which
includes any special mappings inserted into the page tables in order to
provide things like vDSOs and user helper functions.On ARM this causes a problem because the vectors page is mapped as a
global mapping and since ec706dab ("ARM: add a vma entry for the user
accessible vector page"), a VMA is also inserted into each task for this
page to aid unwinding through signals and syscall restarts. Since the
vectors page is required for handling faults, clearing the YOUNG bit (and
subsequently writing a faulting pte) means that we lose the vectors page
*globally* and cannot fault it back in. This results in a system deadlock
on the next exception.To see this problem in action, just run:
$ echo 1 > /proc/self/clear_refs
on an ARM platform (as any user) and watch your system hang. I think this
has been the case since 2.6.37This patch avoids clearing the aforementioned bits for reserved pages,
therefore leaving the vectors page intact on ARM. Since reserved pages
are not candidates for swap, this change should not have any impact on the
usefulness of clear_refs.Signed-off-by: Will Deacon
Reported-by: Moussa Ba
Acked-by: Hugh Dickins
Cc: David Rientjes
Cc: Russell King
Acked-by: Nicolas Pitre
Cc: Matt Mackall
Cc: [2.6.37+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jan, 2012
5 commits
-
…-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/accounting, proc: Fix /proc/stat interrupts sum* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT
x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore
x86/kprobes: Fix typo transferred from Intel manual* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits
x86, tsc: Fix SMI induced variation in quick_pit_calibrate()
x86, opcode: ANDN and Group 17 in x86-opcode-map.txt
x86/kconfig: Move the ZONE_DMA entry under a menu
x86/UV2: Add accounting for BAU strong nacks
x86/UV2: Ack BAU interrupt earlier
x86/UV2: Remove stale no-resources test for UV2 BAU
x86/UV2: Work around BAU bug
x86/UV2: Fix BAU destination timeout initialization
x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode
x86: Get rid of dubious one-bit signed bitfield -
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qnx4: don't leak ->BitMap on late failure exits
qnx4: reduce the insane nesting in qnx4_checkroot()
qnx4: di_fname is an array, for crying out loud...
vfs: remove printk from set_nlink()
wake up s_wait_unfrozen when ->freeze_fs fails -
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
(struct qnx4_inode_entry *)(bh->b_data + some_offset)->di_fname
is not going to be NULL, TYVM...Signed-off-by: Al Viro
19 Jan, 2012
3 commits
-
to reflect the unicode encoding used by CIFS protocol.
Signed-off-by: Pavel Shilovsky
Acked-by: Jeff Layton
Reviewed-by: Shirish Pargaonkar -
CIFS ACL support and FSCACHE support have been in long enough
to be no longer considered experimental. Remove obsolete Kconfig
dependency.Signed-off-by: Steve French
Acked-by: Jeff Layton -
CC: Jeff Layton
Signed-off-by: Steve French