Eric Lee / smarc-fsl-linux-kernel

08 Feb, 2012

1 commit

84f8bf38b Merge git://git.samba.org/sfrench/cifs-2.6

* git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix oops in session setup code for null user mounts
[CIFS] Update cifs Kconfig title to match removal of experimental dependency
cifs: fix printk format warnings
cifs: check offset in decode_ntlmssp_challenge()
cifs: NULL dereference on allocation failure

Linus Torvalds
2012-02-08 06:07:20 +0800

07 Feb, 2012

1 commit

96e02d158 exec: fix use-after-free bug in setup_new_exec()

Setting the task name is done within setup_new_exec() by accessing
bprm->filename. However, this happens after flush_old_exec().
This may result in a use-after-free bug: flush_old_exec() may
"complete" vfork_done, which wakes up the parent, which in turn
may free the passed-in filename.
To fix this, add a new tcomm field to struct linux_binprm which
holds the task name, now generated early, until it is used.

Fixes this bug on s390:

Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000
Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818)
Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374)
Call Trace:
([] setup_new_exec+0x38/0x374)
[] load_elf_binary+0x402/0x1bf4
[] search_binary_handler+0x38e/0x5bc
[] do_execve_common+0x410/0x514
[] do_execve+0x46/0x58
[] kernel_execve+0x28/0x70
[] ____call_usermodehelper+0x102/0x140
[] kernel_thread_starter+0x6/0xc
[] kernel_thread_starter+0x0/0xc
Last Breaking-Event-Address:
[] setup_new_exec+0x2fc/0x374

Kernel panic - not syncing: Fatal exception: panic_on_oops

Reported-by: Sebastian Ott
Signed-off-by: Heiko Carstens
Signed-off-by: Linus Torvalds

Heiko Carstens
2012-02-07 07:15:20 +0800
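
The idea behind the tcomm field in the commit above can be illustrated outside the kernel. The following is only a user-space sketch, not the kernel patch: struct binprm_sketch and save_task_name() are hypothetical names, and the 16-byte buffer mirrors the kernel's TASK_COMM_LEN. The point is that the name is copied into storage owned by the structure while the borrowed filename pointer is still valid.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16  /* same width as the kernel's task comm buffer */

/* Hypothetical stand-in for struct linux_binprm (illustration only). */
struct binprm_sketch {
    const char *filename;       /* borrowed; its owner may free it later */
    char tcomm[TASK_COMM_LEN];  /* private copy of the task name */
};

/*
 * Copy the last path component of filename into tcomm. This must run
 * while filename is still guaranteed to be valid; afterwards only
 * tcomm is consulted, so a later free of filename is harmless.
 */
void save_task_name(struct binprm_sketch *bprm)
{
    const char *base = strrchr(bprm->filename, '/');

    base = base ? base + 1 : bprm->filename;
    /* snprintf truncates to TASK_COMM_LEN - 1 bytes and NUL-terminates. */
    snprintf(bprm->tcomm, sizeof(bprm->tcomm), "%s", base);
}
```

After save_task_name() returns, a caller can read bprm->tcomm without touching bprm->filename again.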

04 Feb, 2012

1 commit

71b1b20b8 Merge tag 'for-linus-3.3' of git://git.infradead.org/~dwmw2/mtd-3.3

- Fix a regression in 16-bit Atmel NAND flash which was introduced in 3.1
- Fix breakage with MTD suspend caused by the API rework
- Fix a problem with resetting the MX28 BCH module
- A couple of other trivial fixes

* tag 'for-linus-3.3-20120204' of git://git.infradead.org/~dwmw2/mtd-3.3:
Revert "mtd: atmel_nand: optimize read/write buffer functions"
mtd: fix MTD suspend
jffs2: do not initialize variable unnecessarily
mtd: gpmi-nand bugfix: reset the BCH module when it is not MX23
mtd: nand: fix typo in comment

Linus Torvalds
2012-02-04 23:17:47 +0800

03 Feb, 2012

6 commits

6c073a7ee Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix safety of rbd_put_client()
rbd: fix a memory leak in rbd_get_client()
ceph: create a new session lock to avoid lock inversion
ceph: fix length validation in parse_reply_info()
ceph: initialize client debugfs outside of monc->mutex
ceph: change "ceph.layout" xattr to be "ceph.file.layout"

Linus Torvalds
2012-02-03 07:47:33 +0800
de47a4176 cifs: Fix oops in session setup code for null user mounts

For null user mounts, do not invoke string length function
during session setup.

Acked-by: Jeff Layton
Signed-off-by: Shirish Pargaonkar
Signed-off-by: Steve French

Shirish Pargaonkar
2012-02-03 06:59:09 +0800
8cdb878dc Fix race in process_vm_rw_core

This fixes the race in process_vm_rw_core() found by Oleg (see

http://article.gmane.org/gmane.linux.kernel/1235667/

for details).

This has been updated since I last sent it as the creation of the new
mm_access() function did almost exactly the same thing as parts of the
previous version of this patch did.

In order to use mm_access() even when /proc isn't enabled, we move it to
kernel/fork.c where other related process mm access functions already
are.

Signed-off-by: Chris Yeoh
Signed-off-by: Linus Torvalds

Christopher Yeoh
2012-02-03 04:55:17 +0800
d8fb02abd ceph: create a new session lock to avoid lock inversion

Lockdep was reporting a possible circular lock dependency in
dentry_lease_is_valid(). That function needs to sample the
session's s_cap_gen and s_cap_ttl fields coherently, but needs
to do so while holding a dentry lock. The s_cap_lock field was
being used to protect the two fields, but that can't be taken while
holding a lock on a dentry within the session.

In most cases, the s_cap_gen and s_cap_ttl fields only get operated
on separately. But in three cases they need to be updated together.
Implement a new lock to protect the spots where updating both fields
atomically is required.

Signed-off-by: Alex Elder
Reviewed-by: Sage Weil

Alex Elder
2012-02-03 04:49:19 +0800
32852a81b ceph: fix length validation in parse_reply_info()

"len" is read from network and thus needs validation. Otherwise, given
a bogus "len" value, p+len could be an out-of-bounds pointer, which is
used in further parsing.

Signed-off-by: Xi Wang
Signed-off-by: Sage Weil

Xi Wang
2012-02-03 04:49:11 +0800
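
The validation described in the commit above follows a common pattern for untrusted lengths. Below is a minimal sketch of that pattern, not the ceph code itself: take_bytes() is a hypothetical helper, and the 32-bit width of len is an assumption. It compares len against the bytes remaining instead of computing p + len first, so a bogus value can never produce an out-of-bounds pointer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): consume len bytes from the
 * cursor *p, which must stay within [start, end].
 * Returns 0 and stores the payload start in *out on success.
 * Returns -1 and leaves *p untouched if len does not fit.
 */
int take_bytes(const uint8_t **p, const uint8_t *end, uint32_t len,
               const uint8_t **out)
{
    /* Reject a cursor that is already past the end. */
    if (*p > end)
        return -1;
    /* Compare against the remaining size; never form *p + len first. */
    if (len > (size_t)(end - *p))
        return -1;

    *out = *p;
    *p += len;
    return 0;
}
```

A parser built this way advances its cursor only after each length has been checked against the end of the received buffer.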
114fc4749 ceph: change "ceph.layout" xattr to be "ceph.file.layout"

The virtual extended attribute named "ceph.layout" is meaningful
only for regular files. Change its name to be "ceph.file.layout" to
more directly reflect that in the ceph xattr namespace. Preserve
the old "ceph.layout" name for the time being (until we decide it's
safe to get rid of it entirely).

Add a missing initializer for "readonly" in the terminating entry.

Signed-off-by: Alex Elder
Reviewed-by: Sage Weil

Alex Elder
2012-02-03 04:48:52 +0800

02 Feb, 2012

4 commits

6d08f2c71 proc: make sure mem_open() doesn't pin the target's memory

Once /proc/pid/mem is opened, the memory can't be released until
mem_release() even if its owner exits.

Change mem_open() to do atomic_inc(mm_count) + mmput(); this pins only
the mm_struct. Change mem_rw() to do atomic_inc_not_zero(mm_count)
before access_remote_vm(); this verifies that this mm is still alive.

I am not sure what mem_rw() should return if atomic_inc_not_zero()
fails. With this patch it returns zero to match the "mm == NULL" case;
maybe it should return -EINVAL like it did before e268337d.

Perhaps it makes sense to add the additional fatal_signal_pending()
check into the main loop, to ensure we do not hold this memory if
the target task was oom-killed.

Cc: stable@kernel.org
Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-02-02 06:39:01 +0800
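
The commit above relies on "increment unless zero" semantics: a reference is taken only if the counter has not already dropped to zero. The following is a user-space sketch of that primitive using C11 atomics. It is not the kernel's atomic_inc_not_zero() implementation, and ref_inc_not_zero() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still alive (count != 0).
 * Returns true if the count was incremented, false if it was zero.
 */
bool ref_inc_not_zero(atomic_int *count)
{
    int old = atomic_load(count);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets false must treat the object as gone and must not touch the memory it guarded.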
572d34b94 proc: unify mem_read() and mem_write()

No functional changes, cleanup and preparation.

mem_read() and mem_write() are very similar. Move this code into the
new common helper, mem_rw(), which takes the additional "int write"
argument.

Cc: stable@kernel.org
Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-02-02 06:39:01 +0800
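
The refactoring in the commit above, two near-identical functions folded into one helper with an "int write" argument, can be sketched in user space as follows. This is only an illustration of the pattern: region_rw(), region_read() and region_write() are hypothetical names operating on a plain buffer, not on another process's memory.

```c
#include <stddef.h>
#include <string.h>

/*
 * Shared helper: copy between a bounded region and a caller buffer.
 * write != 0 copies buf -> region; write == 0 copies region -> buf.
 * Returns the number of bytes actually transferred.
 */
size_t region_rw(char *region, size_t region_len, size_t pos,
                 char *buf, size_t count, int write)
{
    if (pos >= region_len)
        return 0;
    /* Clamp the transfer to what is left in the region. */
    if (count > region_len - pos)
        count = region_len - pos;

    if (write)
        memcpy(region + pos, buf, count);
    else
        memcpy(buf, region + pos, count);
    return count;
}

/* Thin wrappers keep the original read and write entry points. */
size_t region_read(const char *region, size_t region_len, size_t pos,
                   char *buf, size_t count)
{
    /* The cast is safe: the read path never stores through region. */
    return region_rw((char *)region, region_len, pos, buf, count, 0);
}

size_t region_write(char *region, size_t region_len, size_t pos,
                    const char *buf, size_t count)
{
    /* The cast is safe: the write path never stores through buf. */
    return region_rw(region, region_len, pos, (char *)buf, count, 1);
}
```

Bounds handling now lives in one place, so a later fix to it covers both directions.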
71879d3cb proc: mem_release() should check mm != NULL

mem_release() can hit mm == NULL, add the necessary check.

Cc: stable@kernel.org
Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-02-02 06:39:01 +0800
7d7310192 mtd: fix merge conflict resolution breakage

This patch fixes merge conflict resolution breakage introduced by merge
d3712b9dfcf4 ("Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream").

The commit changed 'mtd_can_have_bb()' function and made it always
return zero, which is incorrect. Instead, we need it to return whether
the underlying flash device can have bad eraseblocks or not. UBI needs
this information because it affects how it handles the underlying flash.
E.g., if the underlying flash is NOR, it cannot have bad blocks and any
write or erase error is fatal, and all we can do is to switch to R/O
mode. We do not need to reserve a pool of good eraseblocks for bad
eraseblocks handling, and so on.

This patch also removes 'mtd_can_have_bb()' invocations from Logfs to
ensure correct Logfs behavior.

I've tested that with this patch UBI works on top of NOR and NAND
flashes emulated by mtdram and nandsim, respectively.

This patch is based on a patch from Linus Torvalds.

Signed-off-by: Artem Bityutskiy
Acked-by: Jörn Engel
Acked-by: Prasad Joshi
Acked-by: Brian Norris
Signed-off-by: Linus Torvalds

Artem Bityutskiy
2012-02-02 03:10:24 +0800
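
The semantics discussed in the commit above can be sketched with a small optional-callback structure. This is a hedged illustration rather than the MTD code: struct flash_dev, dev_can_have_bb(), dev_block_isbad() and demo_isbad() are hypothetical names. A device "can have bad blocks" exactly when it provides a bad-block callback, and a missing callback is read as "all blocks are good".

```c
#include <stddef.h>

/* Hypothetical device descriptor with an optional bad-block callback. */
struct flash_dev {
    int (*block_isbad)(struct flash_dev *dev, long long ofs);
};

/* A device can have bad blocks only if it knows how to report them. */
int dev_can_have_bb(const struct flash_dev *dev)
{
    return dev->block_isbad != NULL;
}

/* A NULL callback means "assume every block is good". */
int dev_block_isbad(struct flash_dev *dev, long long ofs)
{
    if (!dev->block_isbad)
        return 0;
    return dev->block_isbad(dev, ofs);
}

/* Demo callback: pretends the block at offset 0x20000 is bad. */
int demo_isbad(struct flash_dev *dev, long long ofs)
{
    (void)dev;
    return ofs == 0x20000;
}
```

With this split, a NOR-like device (no callback) and a NAND-like device (callback set) give different answers from dev_can_have_bb(), which is the distinction the commit says UBI depends on.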

01 Feb, 2012

2 commits

2a73ca820 [CIFS] Update cifs Kconfig title to match removal of experimental dependency

Removed the dependency on CONFIG_EXPERIMENTAL but forgot to update
the text description to be consistent.

Signed-off-by: Steve French

Steve French
2012-02-01 02:51:24 +0800
d3712b9df Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream

There are a few important bug fixes for LogFS.

* tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
Logfs: Allow NULL block_isbad() methods
logfs: Grow inode in delete path
logfs: Free areas before calling generic_shutdown_super()
logfs: remove useless BUG_ON
MAINTAINERS: Add Prasad Joshi in LogFS maintiners
logfs: Propagate page parameter to __logfs_write_inode
logfs: set superblock shutdown flag after generic sb shutdown
logfs: take write mutex lock during fsync and sync
logfs: Prevent memory corruption
logfs: update page reference count for pined pages

Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what
"mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL
block_isbad() methods" clashing with the abstraction changes in the
commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and
d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly".

This resolution takes the semantics from commit f2933e86ad93, and just
makes mtd_block_isbad() return zero (false) if the 'block_isbad'
function is NULL. But that also means that now "mtd_can_have_bb()"
always returns 0.

Now, "mtd_block_markbad()" will obviously return an error if the
low-level driver doesn't support bad blocks, so this is somewhat
non-symmetric, but it actually makes sense if a NULL "block_isbad"
function is considered to mean "I assume that all my blocks are always
good".

Linus Torvalds
2012-02-01 01:23:59 +0800

31 Jan, 2012

2 commits

000f9bb83 cifs: fix printk format warnings

Fix printk format warnings for ssize_t variables:

fs/cifs/connect.c:2145:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
fs/cifs/connect.c:2152:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
fs/cifs/connect.c:2160:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
fs/cifs/connect.c:2170:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'

Signed-off-by: Randy Dunlap
Acked-by: Jeff Layton
Cc: linux-cifs@vger.kernel.org

Randy Dunlap
2012-01-31 21:42:08 +0800
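
The warnings quoted in the commit above come from printing an ssize_t with '%ld'. In standard C the 'z' length modifier with 'd' names the signed counterpart of size_t, which matches ssize_t on common platforms. The sketch below shows this with snprintf() rather than printk(); format_len() is a hypothetical helper.

```c
#include <stdio.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/*
 * Format a signed size with %zd instead of %ld, so the conversion
 * matches the argument type on both 32-bit and 64-bit builds.
 * Returns the number of characters snprintf would have written.
 */
int format_len(char *out, size_t outsz, ssize_t len)
{
    return snprintf(out, outsz, "len=%zd", len);
}
```

Using the matching length modifier avoids the casts to long that would otherwise be needed to silence the warning.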
4991a5faa cifs: check offset in decode_ntlmssp_challenge()

We should check that we're not copying memory from beyond the end of the
blob.

Signed-off-by: Dan Carpenter
Reviewed-by: Jeff Layton

Dan Carpenter
2012-01-31 21:42:06 +0800
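
A check of the kind described in the commit above can be sketched as follows. This is an assumption-laden illustration, not the cifs code: blob_range_ok() is a hypothetical name, and the 32-bit offset and 16-bit length are assumed field widths. The test is arranged so that offset + len is never computed, which keeps it free of wraparound.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the range [offset, offset + len) lies inside a blob of
 * blob_len bytes, 0 otherwise. Both values come from the wire and are
 * therefore untrusted.
 */
int blob_range_ok(size_t blob_len, uint32_t offset, uint16_t len)
{
    if (offset > blob_len)
        return 0;
    /* blob_len - offset cannot underflow after the check above. */
    return len <= blob_len - offset;
}
```

Only when this returns 1 is it safe to copy len bytes starting at blob + offset.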

29 Jan, 2012

2 commits

0a9626575 Merge tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Here are some patches for the 3.3-rc1 tree.

They contain the removal of the sysdev code, now that all users of it are
gone, as well as some sysfs bugfixes that have been reported by users.
There are also some documentation updates.

* tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: Complain bitterly about attempts to remove files from nonexistent directories.
stable: update documentation to ask for kernel version
base/core.c:fix typo in comment in function device_add
Documentation: devres: add allocation functions to list of supported calls
Documentation update for the driver model core
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warnings in device.h
driver core: remove drivers/base/sys.c and include/linux/sysdev.h

Linus Torvalds
2012-01-29 10:20:48 +0800
67d2433ee Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix reservations in btrfs_page_mkwrite
Btrfs: advance window_start if we're using a bitmap
btrfs: mask out gfp flags in releasepage
Btrfs: fix enospc error caused by wrong checks of the chunk
Btrfs: do not defrag a file partially
Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
Btrfs: use cluster->window_start when allocating from a cluster bitmap
Btrfs: Check for NULL page in extent_range_uptodate
btrfs: Fix busyloops in transaction waiting code
Btrfs: make sure a bitmap has enough bytes
Btrfs: fix uninit warning in backref.c

Linus Torvalds
2012-01-29 09:00:19 +0800

28 Jan, 2012

9 commits

f2933e86a Logfs: Allow NULL block_isbad() methods

Not all mtd drivers define block_isbad(). Let's assume no bad blocks
instead of refusing to mount.

Signed-off-by: Joern Engel

Joern Engel
2012-01-28 14:13:40 +0800
bbe013871 logfs: Grow inode in delete path

Can be necessary if an inode gets deleted (through -ENOSPC) before being
written. Might be better to move this into logfs_write_rec(), but for
now go with the stupid&safe patch.

Signed-off-by: Joern Engel

Joern Engel
2012-01-28 14:13:07 +0800
1bcceaff8 logfs: Free areas before calling generic_shutdown_super()

Or hit an assertion in map_invalidatepage() instead.

Signed-off-by: Joern Engel

Joern Engel
2012-01-28 14:12:39 +0800
6c69494f6 logfs: remove useless BUG_ON

It prevents write sizes >4k.

Signed-off-by: Joern Engel

Joern Engel
2012-01-28 14:11:56 +0800
0bd90387e logfs: Propagate page parameter to __logfs_write_inode

During GC, LogFS has to rewrite each valid block to a separate segment.
The rewrite operation reads data from an old segment and writes it to a
newly allocated segment. Since every write operation changes the data
block pointers maintained in the inode, the inode should also be rewritten.

In the GC path, to avoid an AB-BA deadlock, LogFS marks a page with
PG_pre_locked in addition to locking the page (PG_locked). The page
lock is ignored iff the page is pre-locked.

LogFS uses a special file called the segment file. The segment file
maintains an 8-byte entry for every segment. It keeps track of erase
count, level, etc. for every segment.

Bad things happen when a segment belonging to the segment file is GCed:

------------[ cut here ]------------
kernel BUG at /home/prasad/logfs/readwrite.c:297!
invalid opcode: 0000 [#1] SMP
Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4
serio_raw [last unloaded: logfs]
Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH
VirtualBox
EIP: 0060:[] EFLAGS: 00010292 CPU: 0
EIP is at logfs_lock_write_page+0x6a/0x70 [logfs]
EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094
ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000)
Stack:
f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4
00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5
00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900
Call Trace:
[] logfs_get_write_page.clone.16+0xdd/0x100 [logfs]
[] logfs_mod_segment_entry+0x55/0x110 [logfs]
[] logfs_get_segment_entry+0x1d/0x20 [logfs]
[] ? logfs_cleanup_journal+0x50/0x50 [logfs]
[] ostore_get_erase_count+0x1b/0x40 [logfs]
[] logfs_open_area+0xc8/0x150 [logfs]
[] ? kmemleak_alloc+0x2c/0x60
[] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs]
[] ? mempool_kmalloc+0x13/0x20
[] ? mempool_kmalloc+0x13/0x20
[] logfs_segment_write+0x17f/0x1d0 [logfs]
[] logfs_write_i0+0x11c/0x180 [logfs]
[] logfs_write_direct+0x45/0x90 [logfs]
[] __logfs_write_buf+0xbd/0xf0 [logfs]
[] ? kmap_atomic_prot+0x4e/0xe0
[] logfs_write_buf+0x3b/0x60 [logfs]
[] __logfs_write_inode+0xa9/0x110 [logfs]
[] logfs_rewrite_block+0xc0/0x110 [logfs]
[] ? get_mapping_page+0x10/0x60 [logfs]
[] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs]
[] logfs_gc_segment+0x2ad/0x310 [logfs]
[] __logfs_gc_once+0x4a/0x80 [logfs]
[] logfs_gc_pass+0x683/0x6a0 [logfs]
[] logfs_mount+0x5a9/0x680 [logfs]
[] mount_fs+0x21/0xd0
[] ? __alloc_percpu+0xf/0x20
[] ? alloc_vfsmnt+0xb1/0x130
[] vfs_kern_mount+0x4b/0xa0
[] do_kern_mount+0x3e/0xe0
[] do_mount+0x34d/0x670
[] ? strndup_user+0x49/0x70
[] sys_mount+0x6b/0xa0
[] syscall_call+0x7/0xb
Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09
EIP: [] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18
---[ end trace 96e67d5b3aa3d6ca ]---

The patch passes the locked page to __logfs_write_inode. It calls the
function logfs_get_wblocks() to pre-lock the page. This ensures that any
further attempts to lock the page are ignored (especially from
get_erase_count).

Acked-by: Joern Engel
Signed-off-by: Prasad Joshi

Prasad Joshi
2012-01-28 14:08:25 +0800
ecfd89099 logfs: set superblock shutdown flag after generic sb shutdown

While unmounting the file system, LogFS calls generic_shutdown_super().
The function does the file-system-independent superblock shutdown.
However, it might result in a call to file-system-specific inode eviction.

LogFS marks the FS as shutting down by setting the LOGFS_SB_FLAG_SHUTDOWN
bit in super->s_flags. Since inode eviction might call truncate on the
inode, the following BUG is observed when the file system is unmounted:

------------[ cut here ]------------
kernel BUG at /home/prasad/logfs/segment.c:362!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU 3
Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp
parport psmouse floppy virtio_pci serio_raw virtio_ring virtio

Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs
RIP: 0010:[] []
logfs_segment_write+0x211/0x230 [logfs]
RSP: 0018:ffff880062d7b9e8 EFLAGS: 00010202
RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000
RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000
RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000
R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40
R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000
FS: 00007f25d50ea760(0000) GS:ffff88007fd80000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount (pid: 1933, threadinfo ffff880062d7a000,
task ffff880070b44500)
Stack:
ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000
8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460
ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000
Call Trace:
[] logfs_write_i0+0x12e/0x190 [logfs]
[] __logfs_write_rec+0x140/0x220 [logfs]
[] __logfs_write_rec+0xf2/0x220 [logfs]
[] logfs_write_rec+0x64/0xd0 [logfs]
[] __logfs_write_buf+0x106/0x110 [logfs]
[] logfs_write_buf+0x4e/0x80 [logfs]
[] __logfs_write_inode+0x98/0x110 [logfs]
[] logfs_truncate+0x54/0x290 [logfs]
[] logfs_evict_inode+0xdc/0x190 [logfs]
[] evict+0x85/0x170
[] iput+0xe6/0x1b0
[] shrink_dcache_for_umount_subtree+0x218/0x280
[] shrink_dcache_for_umount+0x51/0x90
[] generic_shutdown_super+0x2c/0x100
[] logfs_kill_sb+0x57/0xf0 [logfs]
[] deactivate_locked_super+0x45/0x70
[] deactivate_super+0x4a/0x70
[] mntput_no_expire+0xa4/0xf0
[] sys_umount+0x6f/0x380
[] system_call_fastpath+0x16/0x1b
Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55
b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff
0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66
RIP [] logfs_segment_write+0x211/0x230 [logfs]
RSP
---[ end trace fe6b040cea952290 ]---

Therefore, move the super->s_flags setting after the fs-independent work
has been finished.

Reviewed-by: Joern Engel
Signed-off-by: Prasad Joshi

Prasad Joshi
2012-01-28 14:07:47 +0800
13ced29cb logfs: take write mutex lock during fsync and sync

LogFS uses super->s_write_mutex while writing data to disk. Taking the
same mutex lock in sync and fsync code path solves the following BUG:

------------[ cut here ]------------
kernel BUG at /home/prasad/logfs/dev_bdev.c:134!

Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs
RIP: 0010:[] []
bdev_writeseg+0x25d/0x270 [logfs]
Call Trace:
[] logfs_open_area+0x91/0x150 [logfs]
[] ? find_level.clone.9+0x62/0x100
[] __logfs_segment_write.clone.20+0x5c/0x190 [logfs]
[] ? mempool_kmalloc+0x15/0x20
[] ? mempool_alloc+0x53/0x130
[] logfs_segment_write+0x1d4/0x230 [logfs]
[] logfs_write_i0+0x12e/0x190 [logfs]
[] __logfs_write_rec+0x140/0x220 [logfs]
[] logfs_write_rec+0x64/0xd0 [logfs]
[] __logfs_write_buf+0x106/0x110 [logfs]
[] logfs_write_buf+0x4e/0x80 [logfs]
[] __logfs_writepage+0x23/0x80 [logfs]
[] logfs_writepage+0xdc/0x110 [logfs]
[] __writepage+0x17/0x40
[] write_cache_pages+0x208/0x4f0
[] ? set_page_dirty+0x70/0x70
[] generic_writepages+0x4a/0x70
[] do_writepages+0x21/0x40
[] writeback_single_inode+0x101/0x250
[] writeback_sb_inodes+0xed/0x1c0
[] writeback_inodes_wb+0x7b/0x1e0
[] wb_writeback+0x4c3/0x530
[] ? sub_preempt_count+0x9d/0xd0
[] wb_do_writeback+0xdb/0x290
[] ? sub_preempt_count+0x9d/0xd0
[] ? _raw_spin_unlock_irqrestore+0x18/0x40
[] ? del_timer+0x8a/0x120
[] bdi_writeback_thread+0x8c/0x2e0
[] ? wb_do_writeback+0x290/0x290
[] kthread+0x96/0xa0
[] kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x190/0x190
[] ? gs_change+0xb/0xb
RIP [] bdev_writeseg+0x25d/0x270 [logfs]
---[ end trace 0211ad60a57657c4 ]---

Reviewed-by: Joern Engel
Signed-off-by: Prasad Joshi

Prasad Joshi
2012-01-28 14:06:06 +0800
934eed395 logfs: Prevent memory corruption

This is a bad one. I wonder whether we were so far protected by
no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS.

Found by Dan Carpenter using smatch.

Signed-off-by: Joern Engel
Signed-off-by: Prasad Joshi

Joern Engel
2012-01-28 13:54:21 +0800
96150606e logfs: update page reference count for pined pages

LogFS sets the PG_private flag to indicate a pinned page. We assumed that
marking a page as private is enough to ensure its existence. But
instead it is necessary to hold a reference count on the page.

The change resolves the following BUG:

BUG: Bad page state in process flush-253:16 pfn:6a6d0
page flags: 0x100000000000808(uptodate|private)

Suggested-and-Acked-by: Joern Engel
Signed-off-by: Prasad Joshi

Prasad Joshi
2012-01-28 13:53:10 +0800

27 Jan, 2012

11 commits

9998eb703 Btrfs: fix reservations in btrfs_page_mkwrite

Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error. But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.

This makes sure we only release if we were able to reserve.

Signed-off-by: Chris Mason

Chris Mason
2012-01-27 23:44:44 +0800
9b2306284 Btrfs: advance window_start if we're using a bitmap

If we span a long area in a bitmap we could end up taking a lot of time
searching to the next free area if we're searching from the original
window_start, so advance window_start in order to make sure we don't do any
superfluous searching. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:12 +0800
0c4e538bc btrfs: mask out gfp flags in releasepage

btree_releasepage is a callback and can be passed unknown gfp flags, which
may then end up in kmem_cache_alloc called from alloc_extent_state. The slab
allocator will BUG_ON when the HIGHMEM or DMA32 flag is set.

This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state

    if ((mask & GFP_NOFS) == GFP_NOFS)
        mask = GFP_NOFS;

will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963

[] cache_alloc_refill+0x3b4/0x5c8
[] kmem_cache_alloc+0x204/0x294
[] mempool_alloc+0x52/0x170
[] alloc_extent_state+0x40/0xd4 [btrfs]
[] __clear_extent_bit+0x38a/0x4cc [btrfs]
[] try_release_extent_state+0x9c/0xd4 [btrfs]
[] btree_releasepage+0x7e/0xd0 [btrfs]
[] shrink_page_list+0x6a0/0x724
[] shrink_inactive_list+0x230/0x578
[] shrink_list+0x6c/0x120
[] shrink_zone+0x1e2/0x228
[] shrink_zones+0x90/0x254
[] do_try_to_free_pages+0xac/0x420
[] try_to_free_pages+0x13c/0x1b0
[] __alloc_pages_nodemask+0x5b4/0x9a8
[] grab_cache_page_write_begin+0x7e/0xe8

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2012-01-27 04:01:12 +0800
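
The failure mode in the commit above can be reproduced with toy flag values. In the sketch below every DEMO_* constant is a made-up bit, not a real GFP value, and DEMO_NOFS being WAIT|IO is an assumption made for the illustration; old_style_filter() and strip_zone_bits() are hypothetical names. When the IO bit has been cleared, the "equals NOFS" test does not fire, so zone bits pass through unless they are masked out explicitly.

```c
/* Made-up flag bits for illustration; these are NOT real GFP values. */
#define DEMO_WAIT    0x01u
#define DEMO_IO      0x02u
#define DEMO_FS      0x04u
#define DEMO_HIGHMEM 0x10u
#define DEMO_DMA32   0x20u
#define DEMO_NOFS    (DEMO_WAIT | DEMO_IO)  /* assumed composition */

/* Mirrors the quoted check: normalize only if all NOFS bits are set. */
unsigned old_style_filter(unsigned mask)
{
    if ((mask & DEMO_NOFS) == DEMO_NOFS)
        mask = DEMO_NOFS;
    return mask;
}

/* Unconditionally drop the zone bits the allocator must never see. */
unsigned strip_zone_bits(unsigned mask)
{
    return mask & ~(DEMO_HIGHMEM | DEMO_DMA32);
}
```

For a mask such as DEMO_WAIT | DEMO_FS | DEMO_HIGHMEM (IO cleared), old_style_filter() returns the mask unchanged, while strip_zone_bits() removes the HIGHMEM bit regardless of the other flags.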
9e622d6be Btrfs: fix enospc error caused by wrong checks of the chunk

When we ran a sysbench test for inline files, the enospc error happened easily
even though there was lots of free disk space which could be allocated for new
chunks.

Reproduce steps:
# mkfs.btrfs -b $((2 * 1024 * 1024 * 1024))
# mount /mnt
# ulimit -n 102400
# cd /mnt
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr prepare
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr run

The reason for this bug is:
We can now reserve more space than is free in the existing chunks, provided
there is enough free disk space which can be used for new chunks. In that case,
the space allocator should force the allocation of a new chunk if there is no
free space in the free space cache. But there are two wrong checks which break
this operation.

One is
    if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(). It is wrong: we should try to allocate a new chunk
even if we fail to allocate free space of the minimum allocatable size.

The other is
    if (space_info->force_alloc)
        force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone
sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.

Fix these two wrong checks. The second one is fixed by changing
the values of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE so that
CHUNK_ALLOC_FORCE is greater than CHUNK_ALLOC_LIMITED, since CHUNK_ALLOC_FORCE
has higher priority. If the value passed in by the caller is greater
than ->force_alloc, use the passed value.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2012-01-27 04:01:12 +0800
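
The second fix in the commit above amounts to ordering the constants by priority and keeping the larger of the two requests. The sketch below illustrates that idea only: the SK_* names, their numeric values, and the existence of a zero "no force" level are assumptions, and old_pick() and new_pick() are hypothetical helpers rather than the btrfs functions.

```c
/* Assumed priority ordering: a larger value means a stronger request. */
enum chunk_alloc_sketch {
    SK_ALLOC_NO_FORCE = 0,
    SK_ALLOC_LIMITED  = 1,
    SK_ALLOC_FORCE    = 2,
};

/* Old behaviour: any pending value overrides the caller's request. */
int old_pick(int requested, int pending)
{
    if (pending)
        requested = pending;
    return requested;
}

/* Fixed behaviour: the stronger of the two requests wins. */
int new_pick(int requested, int pending)
{
    if (requested < pending)
        requested = pending;
    return requested;
}
```

With a pending SK_ALLOC_LIMITED and a caller asking for SK_ALLOC_FORCE, old_pick() downgrades the request to LIMITED, while new_pick() keeps FORCE.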
7ec31b548 Btrfs: do not defrag a file partially

xfstests 218 complains that btrfs defrags a file partially:
After: 1
Write backwards sync, but contiguous - should defrag to 1 extent
Before: 10
-After: 1
+After: 2

To fix this, we need to set max_to_defrag count properly.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-01-27 04:01:12 +0800
0b485143d Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c

There were 4 warnings on the 32-bit build; they are fixed by this patch.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-01-27 04:01:11 +0800
0b4a9d248 Btrfs: use cluster->window_start when allocating from a cluster bitmap

We specifically set window_start in the cluster struct to indicate where the
cluster starts in a bitmap, but we've been using min_start to indicate where
we're searching from. This is usually the start of the blockgroup, so
essentially means we're constantly searching from the start of any bitmap we
find, which completely negates all the trouble we go to in order to setup a
cluster. So start using window_start to make sure we actually use the area we
found. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:11 +0800
8bedd51b6 Btrfs: Check for NULL page in extent_range_uptodate

A user has encountered a NULL pointer kernel oops in btrfs when
encountering media errors. The problem has been identified
as an unhandled NULL pointer returned from find_get_page().
This modification simply checks for a NULL page, and returns
with an error if found (the extent_range_uptodate() function
returns 1 on errors).

After testing this patch, the user reported that the error with
the NULL pointer oops was solved. However, there is still a
remaining problem with a thread becoming stuck in
wait_on_page_locked(page) in the read_extent_buffer_pages(...)
function in extent_io.c

    for (i = start_i; i < num_pages; i++) {
        page = extent_buffer_page(eb, i);
        wait_on_page_locked(page);
        if (!PageUptodate(page))
            ret = -EIO;
    }

This patch leaves the issue with the locked page yet to be resolved.

Signed-off-by: Mitch Harder
Signed-off-by: Chris Mason

Mitch Harder
2012-01-27 04:01:11 +0800
6dd70ce4e btrfs: Fix busyloops in transaction waiting code

wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...

Signed-off-by: Jan Kara
Signed-off-by: Chris Mason

Jan Kara
2012-01-27 04:01:11 +0800
357b9784b Btrfs: make sure a bitmap has enough bytes

We have only been checking for min_bytes available in bitmap entries, but we
won't successfully setup a bitmap cluster unless it has at least bytes in the
bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so
if there are a bunch of bitmap entries with less than 2mb's in them, we'll
search all them anyway, which is suboptimal. Fix this check. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:11 +0800
b1375d64c Btrfs: fix uninit warning in backref.c

Added initialization with the declaration of ret. It isn't set later on the
switch-default branch (which should never be taken).

Signed-off-by: Jan Schmidt
Signed-off-by: Chris Mason

Jan Schmidt
2012-01-27 04:01:11 +0800

26 Jan, 2012

1 commit

aaad641ea Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

Quoth Ben Myers:
"Please pull in the following bugfix for xfs. We forgot to drop a lock on
error in xfs_readlink. It hasn't been through -next yet, but there is no
-next tree tomorrow. The fix is clear so I'm sending this request today."

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()

Linus Torvalds
2012-01-26 07:36:44 +0800