Eric Lee / smarc-fsl-linux-kernel

11 Jan, 2012

2 commits

37c69b98d reiserfs: don't lock journal_init()

journal_init() doesn't need the lock since no operation on the filesystem
is involved there. journal_read() and get_list_bitmap() have yet to be
reviewed carefully though before removing the lock there. Just keep
it around these two calls for safety.

Signed-off-by: Frederic Weisbecker
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Jeff Mahoney
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Frederic Weisbecker
2012-01-11 08:30:53 +0800
b18c1c6e0 reiserfs: delete comments referring to the BKL

Signed-off-by: Davidlohr Bueso
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2012-01-11 08:30:53 +0800

15 Sep, 2011

1 commit

558feb081 fs: Convert vmalloc/memset to vzalloc

Signed-off-by: Joe Perches
Acked-by: Alex Elder
Signed-off-by: Jiri Kosina

Joe Perches
2011-09-15 19:56:28 +0800

12 Jul, 2011

1 commit

4aede84b3 fixlet: Remove fs_excl from struct task.

fs_excl is a poor man's priority inheritance for filesystems to hint to
the block layer that an operation is important. It was never clearly
specified, not widely adopted, and will not prevent starvation in many
cases (like across cgroups).

fs_excl was introduced with the time sliced CFQ IO scheduler, to
indicate when a process held FS exclusive resources and thus needed
a boost.

It doesn't cover all file systems, and it was never fully complete.
Let's kill it.

Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe

Justin TerAvest
2011-07-12 14:35:10 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

01 Feb, 2011

1 commit

28aadf516 reiserfs: make commit_wq use the default concurrency level

The maximum number of concurrent work items queued on commit_wq is
bound by the number of active journals. Convert to alloc_workqueue()
and use the default concurrency level so that they can be processed in
parallel.

Signed-off-by: Tejun Heo
Cc: reiserfs-devel@vger.kernel.org

Tejun Heo
2011-02-01 18:42:42 +0800

14 Jan, 2011

1 commit

275220f0f Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
block: ensure that completion error gets properly traced
blktrace: add missing probe argument to block_bio_complete
block cfq: don't use atomic_t for cfq_group
block cfq: don't use atomic_t for cfq_queue
block: trace event block fix unassigned field
block: add internal hd part table references
block: fix accounting bug on cross partition merges
kref: add kref_test_and_get
bio-integrity: mark kintegrityd_wq highpri and CPU intensive
block: make kblockd_workqueue smarter
Revert "sd: implement sd_check_events()"
block: Clean up exit_io_context() source code.
Fix compile warnings due to missing removal of a 'ret' variable
fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
cfq-iosched: don't check cfqg in choose_service_tree()
fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
cdrom: export cdrom_check_events()
sd: implement sd_check_events()
sr: implement sr_check_events()
...

Linus Torvalds
2011-01-14 02:45:01 +0800

18 Nov, 2010

1 commit

451a3c24b BKL: remove extraneous #include <smp_lock.h>

The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann
Signed-off-by: Linus Torvalds

Arnd Bergmann
2010-11-18 00:59:32 +0800

13 Nov, 2010

2 commits

d4d776299 block: clean up blkdev_get() wrappers and their users

After recent blkdev_get() modifications, open_by_devnum() and
open_bdev_exclusive() are simple wrappers around blkdev_get().
Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

blkdev_get_by_dev() is identical to open_by_devnum().
blkdev_get_by_path() is slightly different in that it doesn't
automatically add %FMODE_EXCL to @mode.

All users are converted. Most conversions are mechanical and don't
introduce any behavior difference. There are several exceptions.

* btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
reason to OR it explicitly on blkdev_put().

* gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
sb->s_mode.

* With the above changes, sb->s_mode should now always contain
FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect
errors.

The new blkdev_get_*() functions come with proper docbook comments.
While at it, add function description to blkdev_get() too.

Signed-off-by: Tejun Heo
Cc: Philipp Reisner
Cc: Neil Brown
Cc: Mike Snitzer
Cc: Joern Engel
Cc: Chris Mason
Cc: Jan Kara
Cc: "Theodore Ts'o"
Cc: KONISHI Ryusuke
Cc: reiserfs-devel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: Alexander Viro

Tejun Heo
2010-11-13 18:55:18 +0800
e525fd89d block: make blkdev_get/put() handle exclusive access

Over time, the block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.

* blkdev_get/put() are the primary open and close functions.

* bd_claim/release() deal with exclusive open.

* open/close_bdev_exclusive() are combinations of open and claim, and
of release and close, respectively.

* bd_link/unlink_disk_holder() to create and remove holder/slave
symlinks.

* open_by_devnum() wraps bdget() + blkdev_get().

The interface is a bit confusing, and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access, as an
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows that the current open is for another
exclusive access. Reorganize the interface such that:

* blkdev_get() is extended to include exclusive access management.
An @holder argument is added and, if @FMODE_EXCL is specified, it will
gain exclusive access atomically w.r.t. other exclusive accesses.

* blkdev_put() is similarly extended. It now takes a @mode argument and
if @FMODE_EXCL is set, it releases an exclusive access. Also, when
the last exclusive claim is released, the holder/slave symlinks are
removed automatically.

* bd_claim/release() and close_bdev_exclusive() are no longer
necessary and either made static or removed.

* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
is no longer necessary and removed.

* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
and blkdev_get(). It also has an unexpected extra bdev_read_only()
test which probably should be moved into blkdev_get().

* open_by_devnum() is modified to take @holder argument and pass it to
blkdev_get().
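A minimal userspace sketch of why folding the claim into the open matters: the exclusivity check and the claim happen under one lock, so no other exclusive opener can slip in between them. Everything here is hypothetical and only models the behaviour described above: `struct fake_bdev`, `fake_blkdev_get()`, `fake_blkdev_put()` and `MODE_EXCL` stand in for the kernel's block device, blkdev_get()/blkdev_put() and FMODE_EXCL, and POSIX threads are assumed for the lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MODE_EXCL 0x1 /* hypothetical stand-in for FMODE_EXCL */

/* Hypothetical model of a block device, not struct block_device. */
struct fake_bdev {
	pthread_mutex_t lock; /* protects every field below */
	int openers;          /* total number of opens */
	void *holder;         /* current exclusive holder, NULL if none */
	int holders;          /* exclusive claims taken by @holder */
};

/*
 * Open the device; with MODE_EXCL, also claim it for @holder.
 * Check and claim happen under the same lock, so they are atomic
 * w.r.t. other exclusive openers.
 */
int fake_blkdev_get(struct fake_bdev *b, int mode, void *holder)
{
	int ret = 0;

	pthread_mutex_lock(&b->lock);
	if (mode & MODE_EXCL) {
		assert(holder != NULL);
		if (b->holder != NULL && b->holder != holder) {
			ret = -EBUSY; /* another holder owns it exclusively */
			goto out;
		}
		b->holder = holder;
		b->holders++;
	}
	b->openers++;
out:
	pthread_mutex_unlock(&b->lock);
	return ret;
}

/* Close the device; with MODE_EXCL, also drop one exclusive claim. */
void fake_blkdev_put(struct fake_bdev *b, int mode)
{
	pthread_mutex_lock(&b->lock);
	if (mode & MODE_EXCL) {
		assert(b->holders > 0);
		if (--b->holders == 0)
			b->holder = NULL; /* last exclusive claim released */
	}
	assert(b->openers > 0);
	b->openers--;
	pthread_mutex_unlock(&b->lock);
}
```

In this model a non-exclusive open still succeeds while another holder owns the device; exclusivity is only enforced against other exclusive accesses.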

Most bdev open/close operations are unified into blkdev_get/put(),
and most exclusive accesses are tested atomically at open time (as
they should be). This cleans up the code and removes some corner cases
which, valid or invalid, were unnecessary all the same.

open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features. Well, let's leave them for another day.

Most conversions are straightforward. drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.

Signed-off-by: Tejun Heo
Acked-by: Neil Brown
Acked-by: Ryusuke Konishi
Acked-by: Mike Snitzer
Acked-by: Philipp Reisner
Cc: Peter Osterlund
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Jan Kara
Cc: Andrew Morton
Cc: Andreas Dilger
Cc: "Theodore Ts'o"
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Alex Elder
Cc: Christoph Hellwig
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen
Cc: Scott Branden
Cc: Chris Mason
Cc: Steven Whitehouse
Cc: Dave Kleikamp
Cc: Joern Engel
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro

Tejun Heo
2010-11-13 18:55:17 +0800

10 Sep, 2010

1 commit

7cd33ad23 reiserfs: replace barriers with explicit flush / FUA usage

Switch to the WRITE_FLUSH_FUA flag for log writes and remove the EOPNOTSUPP
detection for barriers. Note that reiserfs had a fairly different code
path for barriers before, as it was the only filesystem actually making use
of them. The new code always uses the old non-barrier codepath and just
sets the WRITE_FLUSH_FUA explicitly for the journal commits.

Signed-off-by: Christoph Hellwig
Acked-by: Jan Kara
Acked-by: Chris Mason
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-09-10 18:35:39 +0800

18 Aug, 2010

1 commit

9cb569d60 remove SWRITE* I/O types

These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.

Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers. Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.

In the ufs code, clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-08-18 13:09:01 +0800

11 Aug, 2010

1 commit

b3397ad54 reiserfs: remove unused local `wait'

Signed-off-by: Changli Gao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Changli Gao
2010-08-11 23:59:12 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

The percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
gfp.h; if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

25 Mar, 2010

1 commit

3f8b5ee33 reiserfs: properly honor read-only devices

The reiserfs journal behaves inconsistently when determining whether to
allow a mount of a read-only device.

This is due to the use of the continue_replay variable to short circuit
the journal scanning. If it's set, it's assumed that there are
transactions to replay, but there may not be. If it's unset, it's assumed
that there aren't any, and that may not be the case either.

I've observed two failure cases:
1) Where a clean file system on a read-only device refuses to mount
2) Where a clean file system on a read-only device passes the
optimization and then tries writing the journal header to update
the latest mount id.

The former is easily observable by using a freshly created file system on
a read-only loopback device.

This patch moves the check into journal_read_transaction, where it can
bail out before it's about to replay a transaction. That way it can go
through and skip transactions where appropriate, yet still refuse to mount
a file system with outstanding transactions.
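The idea of moving the read-only check to the point where a transaction is about to be replayed can be sketched as below. This is a simplified model under stated assumptions, not reiserfs code: `struct txn`, its `needs_replay` flag and `replay_journal()` are hypothetical, and the real journal scan (on-disk headers, mount ids, transaction validation) is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical journal transaction descriptor. */
struct txn {
	bool needs_replay; /* true if the transaction was never flushed */
};

/*
 * Walk the journal, skipping transactions that need no work, and only
 * refuse (-EROFS) when a replay would actually write to a read-only
 * device. Returns the number of replayed transactions on success.
 */
int replay_journal(const struct txn *txns, size_t n, bool read_only)
{
	int replayed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!txns[i].needs_replay)
			continue;          /* nothing to write: skip it */
		if (read_only)
			return -EROFS;     /* checked right before replaying */
		replayed++;                /* a real journal would write blocks here */
	}
	return replayed;
}
```

With this ordering, a clean journal on a read-only device is accepted, while a journal with outstanding transactions is still refused.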

Signed-off-by: Jeff Mahoney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Mahoney
2010-03-25 07:31:21 +0800

28 Jan, 2010

1 commit

bbec91915 reiserfs: Fix vmalloc call under reiserfs lock

Vmalloc is called to allocate journal->j_cnode_free_list but
we hold the reiserfs lock at this time, which raises a
{RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} lock inversion.

Just drop the reiserfs lock at this time, as it's not even
needed but kept for paranoid reasons.

This fixes:

[ INFO: inconsistent lock state ]
2.6.33-rc5 #1
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/313 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&REISERFS_SB(s)->lock){+.+.?.}, at: []
reiserfs_write_lock_once+0x28/0x50
{RECLAIM_FS-ON-W} state was registered at:
[] mark_held_locks+0x62/0x90
[] lockdep_trace_alloc+0x9a/0xc0
[] kmem_cache_alloc+0x26/0xf0
[] __get_vm_area_node+0x6c/0xf0
[] __vmalloc_node+0x7e/0xa0
[] vmalloc+0x2b/0x30
[] journal_init+0x6cb/0xa10
[] reiserfs_fill_super+0x342/0xb80
[] get_sb_bdev+0x145/0x180
[] get_super_block+0x21/0x30
[] vfs_kern_mount+0x40/0xd0
[] do_kern_mount+0x39/0xd0
[] do_mount+0x2c7/0x6d0
[] sys_mount+0x66/0xa0
[] mount_block_root+0xc4/0x245
[] mount_root+0x59/0x5f
[] prepare_namespace+0x111/0x14b
[] kernel_init+0xcf/0xdb
[] kernel_thread_helper+0x6/0x1c
irq event stamp: 63236801
hardirqs last enabled at (63236801): []
__mutex_unlock_slowpath+0x9a/0x120
hardirqs last disabled at (63236800): []
__mutex_unlock_slowpath+0x39/0x120
softirqs last enabled at (63218800): [] __do_softirq+0xc1/0x110
softirqs last disabled at (63218789): [] do_softirq+0x4d/0x60

other info that might help us debug this:
2 locks held by kswapd0/313:
#0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x24/0x170
#1: (&type->s_umount_key#19){++++..}, at: []
shrink_dcache_memory+0xfd/0x1a0

stack backtrace:
Pid: 313, comm: kswapd0 Not tainted 2.6.33-rc5 #1
Call Trace:
[] ? printk+0x18/0x1c
[] print_usage_bug+0x15f/0x1a0
[] mark_lock+0x39f/0x5a0
[] ? trace_hardirqs_off+0xb/0x10
[] ? check_usage_forwards+0x0/0xf0
[] __lock_acquire+0x214/0xa70
[] ? sched_clock_cpu+0x95/0x110
[] lock_acquire+0x7a/0xa0
[] ? reiserfs_write_lock_once+0x28/0x50
[] mutex_lock_nested+0x5f/0x2b0
[] ? reiserfs_write_lock_once+0x28/0x50
[] ? reiserfs_write_lock_once+0x28/0x50
[] reiserfs_write_lock_once+0x28/0x50
[] reiserfs_delete_inode+0x50/0x140
[] ? generic_delete_inode+0x5f/0x150
[] ? reiserfs_delete_inode+0x0/0x140
[] generic_delete_inode+0x9c/0x150
[] generic_drop_inode+0x3d/0x60
[] iput+0x47/0x50
[] dentry_iput+0x6f/0xf0
[] d_kill+0x24/0x50
[] __shrink_dcache_sb+0x21d/0x2b0
[] shrink_dcache_memory+0x12f/0x1a0
[] shrink_slab+0x10e/0x170
[] kswapd+0x477/0x6a0
[] ? isolate_pages_global+0x0/0x1b0
[] ? autoremove_wake_function+0x0/0x40
[] ? kswapd+0x0/0x6a0
[] kthread+0x6c/0x80
[] ? kthread+0x0/0x80
[] kernel_thread_helper+0x6/0x1c

Reported-by: Alexander Beregalov
Signed-off-by: Frederic Weisbecker
Cc: Christian Kujau
Cc: Chris Mason

Frederic Weisbecker
2010-01-28 20:43:50 +0800

02 Jan, 2010

1 commit

0523676d3 reiserfs: Relax reiserfs lock while freeing the journal

Keeping the reiserfs lock while freeing the journal on
umount path triggers a lock inversion between bdev->bd_mutex
and the reiserfs lock.

We don't need the reiserfs lock at this stage. The filesystem
is not usable anymore, and there are no more pending commits:
everything got flushed (even though this operation was done in parallel
and didn't require the reiserfs lock from the current process).

This fixes the following lockdep report:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-atom #172
-------------------------------------------------------
umount/3904 is trying to acquire lock:
(&bdev->bd_mutex){+.+.+.}, at: [] __blkdev_put+0x22/0x160

but task is already holding lock:
(&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x29/0x40

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&REISERFS_SB(s)->lock){+.+.+.}:
[] __lock_acquire+0x11ff/0x19e0
[] lock_acquire+0x68/0x90
[] mutex_lock_nested+0x5b/0x340
[] reiserfs_write_lock_once+0x29/0x50
[] reiserfs_get_block+0x85/0x1620
[] do_mpage_readpage+0x1f0/0x6d0
[] mpage_readpages+0xc0/0x100
[] reiserfs_readpages+0x19/0x20
[] __do_page_cache_readahead+0x1bc/0x260
[] ra_submit+0x28/0x40
[] filemap_fault+0x40e/0x420
[] __do_fault+0x3d/0x430
[] handle_mm_fault+0x12e/0x790
[] do_page_fault+0x135/0x330
[] error_code+0x6b/0x70
[] load_elf_binary+0x82a/0x1a10
[] search_binary_handler+0x90/0x1d0
[] do_execve+0x1df/0x250
[] sys_execve+0x46/0x70
[] syscall_call+0x7/0xb

-> #2 (&mm->mmap_sem){++++++}:
[] __lock_acquire+0x11ff/0x19e0
[] lock_acquire+0x68/0x90
[] might_fault+0x8b/0xb0
[] copy_to_user+0x32/0x70
[] filldir64+0xa4/0xf0
[] sysfs_readdir+0x116/0x210
[] vfs_readdir+0x8d/0xb0
[] sys_getdents64+0x69/0xb0
[] sysenter_do_call+0x12/0x32

-> #1 (sysfs_mutex){+.+.+.}:
[] __lock_acquire+0x11ff/0x19e0
[] lock_acquire+0x68/0x90
[] mutex_lock_nested+0x5b/0x340
[] sysfs_addrm_start+0x2c/0xb0
[] create_dir+0x40/0x90
[] sysfs_create_dir+0x2b/0x50
[] kobject_add_internal+0xc2/0x1b0
[] kobject_add_varg+0x31/0x50
[] kobject_add+0x2c/0x60
[] device_add+0x94/0x560
[] add_partition+0x18a/0x2a0
[] rescan_partitions+0x33a/0x450
[] __blkdev_get+0x12f/0x2d0
[] blkdev_get+0xa/0x10
[] register_disk+0x108/0x130
[] add_disk+0xd9/0x130
[] sd_probe_async+0x105/0x1d0
[] async_thread+0xcf/0x230
[] kthread+0x74/0x80
[] kernel_thread_helper+0x7/0x3c

-> #0 (&bdev->bd_mutex){+.+.+.}:
[] __lock_acquire+0x18f6/0x19e0
[] lock_acquire+0x68/0x90
[] mutex_lock_nested+0x5b/0x340
[] __blkdev_put+0x22/0x160
[] blkdev_put+0xa/0x10
[] free_journal_ram+0xd2/0x130
[] do_journal_release+0x98/0x190
[] journal_release+0xa/0x10
[] reiserfs_put_super+0x36/0x130
[] generic_shutdown_super+0x4f/0xe0
[] kill_block_super+0x25/0x40
[] reiserfs_kill_sb+0x7f/0x90
[] deactivate_super+0x7a/0x90
[] mntput_no_expire+0x98/0xd0
[] sys_umount+0x4c/0x310
[] sys_oldumount+0x19/0x20
[] sysenter_do_call+0x12/0x32

other info that might help us debug this:

2 locks held by umount/3904:
#0: (&type->s_umount_key#30){+++++.}, at: [] deactivate_super+0x75/0x90
#1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x29/0x40

stack backtrace:
Pid: 3904, comm: umount Not tainted 2.6.32-atom #172
Call Trace:
[] ? printk+0x18/0x1a
[] print_circular_bug+0xca/0xd0
[] __lock_acquire+0x18f6/0x19e0
[] ? free_pcppages_bulk+0x1f/0x250
[] lock_acquire+0x68/0x90
[] ? __blkdev_put+0x22/0x160
[] ? __blkdev_put+0x22/0x160
[] mutex_lock_nested+0x5b/0x340
[] ? __blkdev_put+0x22/0x160
[] ? mark_held_locks+0x62/0x80
[] ? kfree+0x92/0xd0
[] __blkdev_put+0x22/0x160
[] ? trace_hardirqs_on+0xb/0x10
[] blkdev_put+0xa/0x10
[] free_journal_ram+0xd2/0x130
[] do_journal_release+0x98/0x190
[] journal_release+0xa/0x10
[] reiserfs_put_super+0x36/0x130
[] ? up_write+0x16/0x30
[] generic_shutdown_super+0x4f/0xe0
[] kill_block_super+0x25/0x40
[] ? vfs_quota_off+0x0/0x20
[] reiserfs_kill_sb+0x7f/0x90
[] deactivate_super+0x7a/0x90
[] mntput_no_expire+0x98/0xd0
[] sys_umount+0x4c/0x310
[] sys_oldumount+0x19/0x20
[] sysenter_do_call+0x12/0x32

Signed-off-by: Frederic Weisbecker
Cc: Alexander Beregalov
Cc: Chris Mason
Cc: Ingo Molnar

Frederic Weisbecker
2010-01-02 08:56:54 +0800

30 Dec, 2009

1 commit

98ea3f50b reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion

Commit 500f5a0bf5f0624dae34307010e240ec090e4cde
(reiserfs: Fix possible recursive lock) fixed a vmalloc under the reiserfs
lock that triggered a lockdep warning because of an
IN-RECLAIM-FS <-> RECLAIM-FS-ON locking dependency inversion.

But that patch omitted another vmalloc call in the same path
that allocates the journal. Relax the lock for this one too.

Reported-by: Alexander Beregalov
Signed-off-by: Frederic Weisbecker
Cc: Chris Mason
Cc: Ingo Molnar

Frederic Weisbecker
2009-12-30 05:34:59 +0800

05 Oct, 2009

1 commit

48f6ba5e6 kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency

While creating the reiserfs workqueue during the journal
initialization, we are holding the reiserfs lock, but
create_workqueue() also takes the cpu_add_remove_lock, which creates
the following dependency:

- reiserfs lock -> cpu_add_remove_lock

But we also have the following existing dependencies:

- mm->mmap_sem -> reiserfs lock
- cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex

The merged dependency chain then becomes:

- mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock ->
cpu_hotplug.lock -> slub_lock -> sysfs_mutex

But when we fill a dir entry in sysfs_readdir(), we are holding the
sysfs_mutex and we also might fault while copying the directory entry
to the user, leading to the following dependency:

- sysfs_mutex -> mm->mmap_sem
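How these orderings combine into a cycle can be modeled as a directed graph in which an edge A -> B means "B was acquired while holding A". The sketch below is a toy depth-first cycle check over the locks named above, not lockdep's actual implementation; the enum values and helper names are hypothetical. Passing `true` to `record_reported_chain()` drops the reiserfs lock -> cpu_add_remove_lock edge, which models relaxing the lock around workqueue creation.

```c
#include <assert.h>
#include <stdbool.h>

enum lock { MMAP_SEM, REISERFS, CPU_ADD_REMOVE, CPU_HOTPLUG, SLUB, SYSFS, NLOCKS };

/* dep[a][b] is true if lock b has been taken while holding lock a. */
static bool dep[NLOCKS][NLOCKS];

static void add_dep(enum lock held, enum lock taken)
{
	dep[held][taken] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on the stack, 2 = done. */
static bool visit(int u, int *state)
{
	state[u] = 1;
	for (int v = 0; v < NLOCKS; v++) {
		if (!dep[u][v])
			continue;
		if (state[v] == 1)
			return true; /* back edge: circular dependency */
		if (state[v] == 0 && visit(v, state))
			return true;
	}
	state[u] = 2;
	return false;
}

bool has_lock_cycle(void)
{
	int state[NLOCKS] = { 0 };

	for (int u = 0; u < NLOCKS; u++)
		if (state[u] == 0 && visit(u, state))
			return true;
	return false;
}

/* Record the dependencies described in the commit message. */
void record_reported_chain(bool relax_reiserfs_lock)
{
	for (int a = 0; a < NLOCKS; a++)
		for (int b = 0; b < NLOCKS; b++)
			dep[a][b] = false;

	add_dep(MMAP_SEM, REISERFS);
	if (!relax_reiserfs_lock)
		add_dep(REISERFS, CPU_ADD_REMOVE); /* create_workqueue() under the lock */
	add_dep(CPU_ADD_REMOVE, CPU_HOTPLUG);
	add_dep(CPU_HOTPLUG, SLUB);
	add_dep(SLUB, SYSFS);
	add_dep(SYSFS, MMAP_SEM);                  /* fault in sysfs_readdir() */
}
```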

The end result is then a lock inversion between sysfs_mutex and
mm->mmap_sem, as reported in the following lockdep warning:

[ INFO: possible circular locking dependency detected ]
2.6.31-07095-g25a3912 #4
-------------------------------------------------------
udevadm/790 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [] might_fault+0x72/0xc0

but task is already holding lock:
(sysfs_mutex){+.+.+.}, at: [] sysfs_readdir+0x7c/0x260

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (sysfs_mutex){+.+.+.}:
[...]

-> #4 (slub_lock){+++++.}:
[...]

-> #3 (cpu_hotplug.lock){+.+.+.}:
[...]

-> #2 (cpu_add_remove_lock){+.+.+.}:
[...]

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[...]

-> #0 (&mm->mmap_sem){++++++}:
[...]

This can be fixed by relaxing the reiserfs lock while creating the
workqueue.
It is fine to relax the lock here; we just keep it around to pass
through reiserfs lock checks and for paranoid reasons.

Reported-by: Alexander Beregalov
Tested-by: Alexander Beregalov
Signed-off-by: Frederic Weisbecker
Cc: Jeff Mahoney
Cc: Chris Mason
Cc: Ingo Molnar
Cc: Alexander Beregalov
Cc: Laurent Riffard

Frederic Weisbecker
2009-10-05 22:31:37 +0800

17 Sep, 2009

1 commit

193be0ee1 kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency

Alexander Beregalov reported the following warning:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.31-03149-gdcc030a #1
-------------------------------------------------------
udevadm/716 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [] might_fault+0x4a/0xa0

but task is already holding lock:
(sysfs_mutex){+.+.+.}, at: [] sysfs_readdir+0x5a/0x200

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (sysfs_mutex){+.+.+.}:
[...]

-> #2 (&bdev->bd_mutex){+.+.+.}:
[...]

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[...]

-> #0 (&mm->mmap_sem){++++++}:
[...]

On the reiserfs mount path, we take the reiserfs lock and, while
initializing the journal, we open the device, taking
bdev->bd_mutex. Then rescan_partitions() may signal the change
to sysfs.

We have then the following dependency:

reiserfs_lock -> bd_mutex -> sysfs_mutex

Later, while entering reiserfs_readpage() after a pagefault in an
mmapped reiserfs file, we are holding the mm->mmap_sem, and we are going
to take the reiserfs lock too.
We have then the following dependency:

mm->mmap_sem -> reiserfs_lock

which, expanded with the previous dependency gives us:

mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex

Now while entering the sysfs readdir path, we are holding the
sysfs_mutex. And when we copy a directory entry to the user buffer, we
might fault and then take the mm->mmap_sem lock, which leads to the
circular locking dependency reported.

We can fix that by relaxing the reiserfs lock during the call to
journal_init_dev(), which is the place where we open the mounted
device.

It is fine to relax the lock here because we are at the beginning of
the reiserfs mount path and there is nothing to protect at this time:
the journal is not initialized.
We just keep this lock around for paranoid reasons.

Reported-by: Alexander Beregalov
Tested-by: Alexander Beregalov
Signed-off-by: Frederic Weisbecker
Cc: Jeff Mahoney
Cc: Chris Mason
Cc: Ingo Molnar
Cc: Alexander Beregalov
Cc: Laurent Riffard

Frederic Weisbecker
2009-09-17 11:31:37 +0800

14 Sep, 2009

6 commits

c72e05756 kill-the-bkl/reiserfs: acquire the inode mutex safely

While searching a pathname, an inode mutex can be acquired
in do_lookup() which calls reiserfs_lookup() which in turn
acquires the write lock.

On the other side reiserfs_fill_super() can acquire the write_lock
and then call reiserfs_lookup_privroot() which can acquire an
inode mutex (the root of the mount point).

So we theoretically risk an AB - BA lock inversion that could lead
to a deadlock.

As with other lock dependencies found since the bkl to mutex
conversion, the fix is to use reiserfs_mutex_lock_safe(), which
drops the lock dependency on the write lock.

[ Impact: fix a possible deadlock with reiserfs ]

Cc: Jeff Mahoney
Cc: Chris Mason
Cc: Ingo Molnar
Cc: Alexander Beregalov
Signed-off-by: Frederic Weisbecker

Frederic Weisbecker
2009-09-14 13:18:24 +0800
c63e3c0b2 kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe

reiserfs_mutex_lock_safe() is a hack to avoid any dependency between
an internal reiserfs mutex and the write lock; it was proposed
to follow the old bkl logic.

The code does the following:

while (!mutex_trylock(m)) {
    reiserfs_write_unlock(s);
    schedule();
    reiserfs_write_lock(s);
}

It thus imitates the implicit behaviour of the lock when it was
the Bkl and had no such dependency:

mutex_lock(m) {
    if (fastpath)
        let's go
    else {
        wait_for_mutex() {
            schedule() {
                unlock_kernel()
                reacquire_lock_kernel()
            }
        }
    }
}

The problem is that by using such an explicit schedule(), we don't
benefit from adaptive mutex spinning on the owner.

The logic in use now is:

reiserfs_write_unlock(s);
mutex_lock(m); // -> possible adaptive spinning
reiserfs_write_lock(s);
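A userspace sketch of the new logic, assuming a plain POSIX mutex can stand in for both locks: `struct sb_lock`, `sb_write_lock()`, `sb_write_unlock()` and `mutex_lock_safe()` are hypothetical stand-ins for the reiserfs write lock and reiserfs_mutex_lock_safe(). Whether pthread_mutex_lock() spins adaptively depends on the C library and the mutex type, so this only shows the lock ordering, not the performance benefit the commit is about.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-superblock reiserfs write lock. */
struct sb_lock {
	pthread_mutex_t mutex;
	int held; /* 1 while held; toy bookkeeping for the sketch */
};

void sb_write_lock(struct sb_lock *w)
{
	pthread_mutex_lock(&w->mutex);
	w->held = 1;
}

void sb_write_unlock(struct sb_lock *w)
{
	assert(w->held);
	w->held = 0;
	pthread_mutex_unlock(&w->mutex);
}

/*
 * Acquire @m while the caller holds @w, without ever blocking on @m
 * with @w held: drop @w, block on @m (the mutex implementation may
 * spin on the owner instead of sleeping), then retake @w.
 * State protected by @w may change across this call.
 */
void mutex_lock_safe(pthread_mutex_t *m, struct sb_lock *w)
{
	sb_write_unlock(w);
	pthread_mutex_lock(m);
	sb_write_lock(w);
}
```

On return the caller holds both locks, but it never waited on @m while holding the write lock, so no write lock -> @m dependency is created.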

[ Impact: restore the use of adaptive spinning mutexes in reiserfs ]

Cc: Jeff Mahoney
Cc: Chris Mason
Cc: Ingo Molnar
Cc: Alexander Beregalov
Signed-off-by: Frederic Weisbecker

Frederic Weisbecker
2009-09-14 13:18:21 +0800
6e3647acb kill-the-BKL/reiserfs: release the write lock on flush_commit_list()

flush_commit_list() uses ll_rw_block() to commit the pending log blocks.
ll_rw_block() might sleep, and the bkl was released at this point, so
we can also relax the write lock here.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney
Cc: Chris Mason
Cc: Alexander Beregalov
Signed-off-by: Frederic Weisbecker

Frederic Weisbecker
2009-09-14 13:18:13 +0800
e6950a4da kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end()

When do_journal_end() copies data to the journal block buffers in memory,
it reschedules if needed between each block copied and dirtied.

We can also release the write lock at this rescheduling stage,
as the bkl did implicitly.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney
Cc: Chris Mason
Cc: Alexander Beregalov
Signed-off-by: Frederic Weisbecker

Frederic Weisbecker
2009-09-14 13:18:08 +0800
a412f9efd reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock

Impact: fix a deadlock

The j_flush_mutex is acquired safely in journal.c:
if we can't take it, we free the reiserfs per superblock lock
and wait a bit.

But there is a remaining place in kupdate_transactions() where
j_flush_mutex is still acquired traditionally. Thus the following
scenario (warned by lockdep) can happen:

A                                     B

mutex_lock(&write_lock)
    mutex_lock(&j_flush_mutex)
    mutex_unlock(&write_lock)
    sleep...
                                      mutex_lock(&write_lock)
                                          mutex_lock(&j_flush_mutex) //block
    mutex_lock(&write_lock) //deadlock

Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Jeff Mahoney
LKML-Reference:
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2009-09-14 13:18:01 +0800
8ebc42323 reiserfs: kill-the-BKL

This patch is an attempt to remove the Bkl based locking scheme from
reiserfs and is intended.

It is partly inspired by an old attempt by Peter Zijlstra:

http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html

The bkl is heavily used in this filesystem to prevent
concurrent write accesses to the filesystem.

Reiserfs makes deep use of the specific properties of the Bkl:

- It can be acquired recursively by the same task
- It is released on the schedule() calls and reacquired when schedule() returns

The two properties above are a roadmap for the reiserfs write locking so it's
very hard to simply replace it with a common mutex.

- We need a recursive lock unless we want to restructure several blocks
of the code.
- We need to identify the sites where the bkl was implicitly relaxed
(schedule, wait, sync, etc.) so that we can in turn release and
reacquire our new lock explicitly.
Such implicit releases of the lock are often required to let other
resource producers/consumers do their job, otherwise we can suffer unexpected
starvation or deadlocks.

So the new lock that replaces the bkl here is a per-superblock mutex with a
specific property: like the bkl, it can be acquired recursively by the same
task.

For this purpose, we add a lock owner field and a lock depth field to the
superblock information structure.
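The recursive write lock described above can be sketched in user space. This is a minimal sketch, not the reiserfs code: the struct and the functions sb_write_lock_init(), sb_write_lock_acquire() and sb_write_lock_release() are hypothetical, a pthread mutex stands in for the kernel mutex, and the address of a thread-local variable stands in for the owning task.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Each thread's copy of this variable has a distinct address, used here
 * as a hypothetical "current task" token. */
static _Thread_local char self_token;

/* Hypothetical per-superblock write lock that its owner may re-acquire,
 * like the BKL. */
struct sb_write_lock {
    pthread_mutex_t mutex;        /* underlying, non-recursive mutex */
    _Atomic(const void *) owner;  /* owner's token, or NULL if unlocked */
    int depth;                    /* nesting depth; only the owner touches it */
};

void sb_write_lock_init(struct sb_write_lock *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    atomic_init(&l->owner, NULL);
    l->depth = 0;
}

void sb_write_lock_acquire(struct sb_write_lock *l)
{
    /* Only a thread can store its own token, so if we see it here we
     * already own the mutex and just nest one level deeper. */
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) != &self_token) {
        pthread_mutex_lock(&l->mutex);
        atomic_store_explicit(&l->owner, &self_token, memory_order_relaxed);
    }
    l->depth++;
}

void sb_write_lock_release(struct sb_write_lock *l)
{
    /* Releasing a lock we do not own would corrupt the lock state. */
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == &self_token);
    if (--l->depth == 0) {
        atomic_store_explicit(&l->owner, NULL, memory_order_relaxed);
        pthread_mutex_unlock(&l->mutex);
    }
}
```

Only the owner field is read by other threads, so only it is atomic; the depth counter is private to the owner and needs no extra protection.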

The first axis of this patch is to turn the reiserfs_write_(un)lock() functions
into wrappers that manage this mutex. Also, some explicit calls to
lock_kernel() have been converted to the reiserfs_write_lock() helpers.

The second axis is to find the important blocking sites (schedule...(),
wait_on_buffer(), sync_dirty_buffer(), etc.) and explicitly release the
write lock at these locations before blocking. Then we can
safely wait for those who can give us resources or those who need some.
Typically this is a fight between the current writer, the reiserfs workqueue
(aka the async committer) and the pdflush threads.
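A minimal user-space sketch of releasing the write lock around a blocking site, assuming pthreads. The names write_lock, wait_for_io(), completer() and run_writer() are hypothetical stand-ins: wait_for_io() plays the role of a call such as wait_on_buffer(), and completer() that of a thread (a flusher, say) that needs the write lock before it can finish the I/O. If wait_for_io() slept while still holding write_lock, the two threads would deadlock.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins: write_lock models the per-superblock write lock; io_done,
 * io_lock and io_cond model a buffer whose I/O another thread completes. */
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t io_cond = PTHREAD_COND_INITIALIZER;
static int io_done;

/* Must take the write lock before it can complete the I/O. */
static void *completer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&write_lock);
    pthread_mutex_lock(&io_lock);
    io_done = 1;
    pthread_cond_signal(&io_cond);
    pthread_mutex_unlock(&io_lock);
    pthread_mutex_unlock(&write_lock);
    return NULL;
}

/* Blocking wait, called with write_lock held: release it explicitly
 * before sleeping and take it back after waking up. */
static void wait_for_io(void)
{
    pthread_mutex_unlock(&write_lock);
    pthread_mutex_lock(&io_lock);
    while (!io_done)
        pthread_cond_wait(&io_cond, &io_lock);
    pthread_mutex_unlock(&io_lock);
    pthread_mutex_lock(&write_lock);
}

/* Returns 1 once the writer has observed the completed I/O. */
int run_writer(void)
{
    pthread_t t;
    int done;

    pthread_mutex_lock(&write_lock);
    io_done = 0;
    if (pthread_create(&t, NULL, completer, NULL) != 0) {
        pthread_mutex_unlock(&write_lock);
        return -1;
    }
    wait_for_io();
    done = io_done;  /* completer set it under write_lock, which we hold again */
    assert(done == 1);
    pthread_mutex_unlock(&write_lock);
    pthread_join(t, NULL);
    return done;
}
```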

The third axis is a consequence of the second. The write lock is usually
at the top of a lock dependency chain which can include the journal lock, the
flush lock or the commit lock. So it's dangerous to release the write lock and
try to reacquire it while we still hold other locks.

This is fine with the bkl:

T1                                  T2

lock_kernel()
mutex_lock(A)
unlock_kernel()
// do something
                                    lock_kernel()
                                    mutex_lock(A) -> already locked by T1
                                    schedule() (and then unlock_kernel())
lock_kernel()
mutex_unlock(A)
....

This is not fine with a mutex:

T1                                  T2

mutex_lock(write)
mutex_lock(A)
mutex_unlock(write)
// do something
                                    mutex_lock(write)
                                    mutex_lock(A) -> already locked by T1
                                    schedule()
mutex_lock(write) -> already locked by T2
deadlock

The solution in this patch is to provide a helper which releases the write
lock and sleeps a bit if it can't lock a mutex that depends on it. This is
another emulation of the bkl behaviour.
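The helper described above can be sketched with pthreads. The names mutex_lock_safe(), lock_a, t1(), t2() and run_scenario() are hypothetical, and the write lock is a plain non-recursive mutex here. The two threads replay the T1/T2 mutex diagram, except that T2 takes A through the helper instead of a plain lock, so both threads run to completion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in write lock */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;     /* "A" in the diagram */
static int finished;                                           /* protected by write_lock */

static void nap_ms(long ms)
{
    struct timespec ts = { 0, ms * 1000000L };
    nanosleep(&ts, NULL);
}

/* Take m while the caller holds write_lock. If m is busy, drop
 * write_lock, sleep a bit and retry, so that a thread holding m and
 * waiting for write_lock can make progress. */
static void mutex_lock_safe(pthread_mutex_t *m)
{
    int rc;

    while ((rc = pthread_mutex_trylock(m)) != 0) {
        assert(rc == EBUSY);
        pthread_mutex_unlock(&write_lock);
        nap_ms(1);
        pthread_mutex_lock(&write_lock);
    }
}

/* T1: holds A across a window where it does not hold write_lock. */
static void *t1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&write_lock);
    pthread_mutex_lock(&lock_a);
    pthread_mutex_unlock(&write_lock);
    nap_ms(10);                      /* "do something"; T2 may grab write_lock */
    pthread_mutex_lock(&write_lock); /* would deadlock if T2 blocked on A */
    pthread_mutex_unlock(&lock_a);
    finished++;
    pthread_mutex_unlock(&write_lock);
    return NULL;
}

/* T2: needs A while holding write_lock. */
static void *t2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&write_lock);
    mutex_lock_safe(&lock_a);        /* a plain pthread_mutex_lock() could deadlock */
    finished++;
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&write_lock);
    return NULL;
}

/* Runs both threads and returns how many of them finished. */
int run_scenario(void)
{
    pthread_t a, b;

    finished = 0;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return finished;
}
```

The 1 ms retry interval is an arbitrary choice for the sketch; any short sleep gives the other thread a window to take the write lock.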

The last axis is to locate the fs callbacks that are called with the bkl held,
according to Documentation/filesystems/Locking.

Those are:

- reiserfs_remount
- reiserfs_fill_super
- reiserfs_put_super

Reiserfs didn't need to lock explicitly in these callbacks because their callers
already held the bkl. But now we must take the new lock explicitly.

After this patch, reiserfs suffers from a slight performance regression (for now).
On UP, a high volume write with dd reports an average of 27 MB/s instead
of 30 MB/s without the patch applied.

Signed-off-by: Frederic Weisbecker
Reviewed-by: Ingo Molnar
Cc: Jeff Mahoney
Cc: Peter Zijlstra
Cc: Bron Gondwana
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Alexander Viro
LKML-Reference:
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2009-09-14 13:17:59 +0800

11 Jul, 2009

1 commit

8aa7e847d Fix congestion_wait() sync/async vs read/write confusion

Commit 1faa16d22877f4839bd433547d770c676d1d964c accidentally broke
the bdi congestion wait queue logic, causing us to wait on congestion
for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.
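To make the mix-up concrete, here is a minimal sketch that models one congestion wait queue per sync/async class. BLK_RW_ASYNC == 0 and WRITE == 1 come from the commit message, and BLK_RW_SYNC == 1 matches the kernel headers of that era; the waiters[] array, congestion_wait_model() and waiters_on() are hypothetical. Passing WRITE where a sync/async selector is expected silently picks the sync queue.

```c
#include <assert.h>

/* Selector values as described above. */
enum { BLK_RW_ASYNC = 0, BLK_RW_SYNC = 1 };
#define WRITE 1

/* Hypothetical model: number of tasks waiting on each congestion queue. */
static int waiters[2];

/* Hypothetical stand-in for congestion_wait(): 'sync' selects the queue. */
void congestion_wait_model(int sync)
{
    assert(sync == BLK_RW_ASYNC || sync == BLK_RW_SYNC);
    waiters[sync]++;
}

int waiters_on(int sync)
{
    return waiters[sync];
}
```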

Signed-off-by: Jens Axboe

Jens Axboe
2009-07-11 02:31:53 +0800

31 Mar, 2009

6 commits

a9dd36435 reiserfs: rename p_s_sb to sb

This patch is a simple s/p_s_sb/sb/g to the reiserfs code. This is the
first in a series of patches to rip out some of the awful variable
naming in reiserfs.

Signed-off-by: Jeff Mahoney
Signed-off-by: Linus Torvalds

Jeff Mahoney
2009-03-31 03:16:39 +0800
0222e6571 reiserfs: strip trailing whitespace

This patch strips trailing whitespace from the reiserfs code.

Signed-off-by: Jeff Mahoney
Signed-off-by: Linus Torvalds

Jeff Mahoney
2009-03-31 03:16:39 +0800
32e8b1062 reiserfs: rearrange journal abort

This patch kills off reiserfs_journal_abort as it is never called, and
combines __reiserfs_journal_abort_{soft,hard} into one function called
reiserfs_abort_journal, which performs the same work. Unlike the old
version, it is silent, since the message was always issued after a
regular 'abort' message.

Signed-off-by: Jeff Mahoney
Signed-off-by: Linus Torvalds

Jeff Mahoney
2009-03-31 03:16:36 +0800
c3a9c2109 reiserfs: rework reiserfs_panic

ReiserFS panics can be somewhat inconsistent.
In some cases:
* a unique identifier may be associated with it
* the function name may be included
* the device may be printed separately

This patch aims to make warnings more consistent. reiserfs_warning() prints
the device name, so printing it a second time is not required. The function
name for a warning is always helpful in debugging, so it is now automatically
inserted into the output. Hans has stated that every warning should have
a unique identifier. Some cases lack them, others really shouldn't have them.
reiserfs_warning() now expects an id associated with each message. In the
rare case where one isn't needed, "" will suffice.

Signed-off-by: Jeff Mahoney
Signed-off-by: Linus Torvalds

Jeff Mahoney
2009-03-31 03:16:36 +0800
45b03d5e8 reiserfs: rework reiserfs_warning

ReiserFS warnings can be somewhat inconsistent.
In some cases:
* a unique identifier may be associated with it
* the function name may be included
* the device may be printed separately

This patch aims to make warnings more consistent. reiserfs_warning() prints
the device name, so printing it a second time is not required. The function
name for a warning is always helpful in debugging, so it is now automatically
inserted into the output. Hans has stated that every warning should have
a unique identifier. Some cases lack them, others really shouldn't have them.
reiserfs_warning() now expects an id associated with each message. In the
rare case where one isn't needed, "" will suffice.
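The automatic insertion of the function name can be sketched with a variadic macro that captures __func__ at the call site. The names format_warning(), warn_to_buf() and demo_caller(), and the exact output format, are hypothetical; the sketch writes into a buffer rather than to the kernel log so the result is easy to check.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: "REISERFS warning (device DEV): [ID ]FUNC: MSG". */
int format_warning(char *buf, size_t len, const char *dev, const char *id,
                   const char *func, const char *fmt, ...)
{
    va_list ap;
    int n, m;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "REISERFS warning (device %s): %s%s%s: ",
                 dev, id, id[0] != '\0' ? " " : "", func);
    if (n < 0 || (size_t)n >= len)
        return n;               /* error, or the prefix alone was truncated */

    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}

/* __func__ expands in the caller, so the function name is inserted
 * automatically; callers pass only an id ("" if none) and a message. */
#define warn_to_buf(buf, len, dev, id, ...) \
    format_warning((buf), (len), (dev), (id), __func__, __VA_ARGS__)

void demo_caller(char *buf, size_t len, const char *id)
{
    warn_to_buf(buf, len, "sda1", id, "bad block %d", 42);
}
```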

Signed-off-by: Jeff Mahoney
Signed-off-by: Linus Torvalds

Jeff Mahoney
2009-03-31 03:16:36 +0800
600ed4167 reiserfs: audit transaction ids to always be unsigned ints

This patch fixes up the reiserfs code such that transaction ids are
always unsigned ints. In places they can currently be signed ints or
unsigned longs.

The former just causes an annoying clm-2200 warning and may join a
transaction when it should wait.

The latter is just for correctness since the disk format uses a 32-bit
transaction id. There aren't any runtime problems that result from it
not wrapping at the correct location since the value is truncated
correctly even on big endian systems. The 0 value might make it to
disk, but the mount-time checks will bump it to 10 itself.
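The wrap-around argument can be checked with a few lines of C. This sketch assumes a 32-bit unsigned int (enforced with static_assert); next_trans_id() and to_disk() are hypothetical helpers, not reiserfs functions. It shows that an unsigned int id wraps exactly where the 32-bit on-disk field does, and that truncating a wider id keeps the correct low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* The sketch assumes unsigned int is 32 bits, like the on-disk id. */
static_assert(sizeof(unsigned int) == sizeof(uint32_t),
              "sketch assumes a 32-bit unsigned int");

/* Advance a transaction id. Unsigned overflow is well defined in C,
 * so 0xFFFFFFFF + 1 wraps to 0. */
unsigned int next_trans_id(unsigned int id)
{
    return id + 1;
}

/* Store a possibly wider id in the 32-bit disk field: only the low
 * 32 bits survive. */
uint32_t to_disk(unsigned long id)
{
    return (uint32_t)id;
}
```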

Signed-off-by: Jeff Mahoney
Signed-off-by: Linus Torvalds

Jeff Mahoney
2009-03-31 03:16:35 +0800

21 Oct, 2008

4 commits

e5eb8caa8 [PATCH] remember mode of reiserfs journal

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:49:04 +0800
30c40d2c0 [PATCH] propagate mode through open_bdev_excl/close_bdev_excl

Replace open_bdev_excl/close_bdev_excl with variants taking fmode_t.
The superblock stores the mode used to mount it in sb->s_mode.

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:49:00 +0800
9a1c35427 [PATCH] pass fmode_t to blkdev_put()

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:48:58 +0800
aeb5d7270 [PATCH] introduce fmode_t, do annotations

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:47:06 +0800

05 Aug, 2008

2 commits

ca5de404f fs: rename buffer trylock

Like the page lock change, this also requires a name change, so convert the
raw test_and_set bitop to a trylock.
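A trylock built on a test-and-set bitop can be sketched with C11 atomics. The names trylock_bit(), unlock_bit() and the bit number BH_LOCK_BIT are hypothetical, and the kernel uses its own bitops rather than <stdatomic.h>; the point is only that "trylock" names the operation "atomically set the bit and succeed if it was previously clear".

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BH_LOCK_BIT 2u  /* hypothetical bit number for the lock flag */

/* Atomic test-and-set of one bit: succeeds only if the bit was clear
 * before we set it. */
bool trylock_bit(atomic_ulong *state, unsigned int bit)
{
    unsigned long mask;

    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    mask = 1UL << bit;
    return !(atomic_fetch_or_explicit(state, mask, memory_order_acquire) & mask);
}

/* Clear the bit again, publishing the critical section's writes. */
void unlock_bit(atomic_ulong *state, unsigned int bit)
{
    atomic_fetch_and_explicit(state, ~(1UL << bit), memory_order_release);
}
```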

Signed-off-by: Nick Piggin
Signed-off-by: Linus Torvalds

Nick Piggin
2008-08-05 12:56:09 +0800
529ae9aaa mm: rename page trylock

Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

This also makes it easier to add lockdep annotations to the page lock.

Signed-off-by: Nick Piggin
Acked-by: KOSAKI Motohiro
Acked-by: Peter Zijlstra
Acked-by: Andrew Morton
Acked-by: Benjamin Herrenschmidt
Signed-off-by: Linus Torvalds

Nick Piggin
2008-08-05 12:31:34 +0800

26 Jul, 2008

1 commit

90415deac reiserfs: convert j_commit_lock to mutex

j_commit_lock is a semaphore, but the code uses it as if it were a mutex. This
patch converts it to a mutex.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jeff Mahoney
Cc: Matthew Wilcox
Cc: Chris Mason
Cc: Edward Shishkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Mahoney
2008-07-26 01:53:33 +0800