18 Dec, 2009

2 commits

  • It can happen that a write does not use all the blocks allocated in
    write_begin, either because of a filesystem error (like ENOSPC) or
    because the page with the data to write has been removed from memory.
    We truncate these blocks so that we don't leave dangling blocks
    beyond i_size.
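
    As a rough illustration of the cleanup pattern (the helper name is
    hypothetical; vmtruncate() was the era's interface for trimming
    blocks allocated past i_size):

        #include <linux/fs.h>
        #include <linux/mm.h>

        /* Sketch: write_begin may have allocated blocks up to 'to'; if
         * the copy fell short, trim everything beyond the current
         * i_size so no dangling blocks remain. */
        static void example_write_failed(struct inode *inode, loff_t to)
        {
                if (to > inode->i_size)
                        vmtruncate(inode, inode->i_size);
        }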

    Cc: Jeff Mahoney
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • This reverts commit e4c570c4cb7a95dbfafa3d016d2739bf3fdfe319, as
    requested by Alexey:

    "I think I gave a good enough arguments to not merge it.
    To iterate:
    * patch makes impossible to start using ext3 on EXT3_FS=n kernels
    without reboot.
    * this is done only for one pointer on task_struct"

    None of config options which define task_struct are tristate directly
    or effectively."

    Requested-by: Alexey Dobriyan
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

17 Dec, 2009

2 commits

  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
    direct I/O fallback sync simplification
    ocfs: stop using do_sync_mapping_range
    cleanup blockdev_direct_IO locking
    make generic_acl slightly more generic
    sanitize xattr handler prototypes
    libfs: move EXPORT_SYMBOL for d_alloc_name
    vfs: force reval of target when following LAST_BIND symlinks (try #7)
    ima: limit imbalance msg
    Untangling ima mess, part 3: kill dead code in ima
    Untangling ima mess, part 2: deal with counters
    Untangling ima mess, part 1: alloc_file()
    O_TRUNC open shouldn't fail after file truncation
    ima: call ima_inode_free ima_inode_free
    IMA: clean up the IMA counts updating code
    ima: only insert at inode creation time
    ima: valid return code from ima_inode_alloc
    fs: move get_empty_filp() deffinition to internal.h
    Sanitize exec_permission_lite()
    Kill cached_lookup() and real_lookup()
    Kill path_lookup_open()
    ...

    Trivial conflicts in fs/direct-io.c

    Linus Torvalds
     
  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry, to allow
    using the handler mechanism for filesystems that require it later,
    e.g. cifs.
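
    Roughly, the reworked handler interface looks like this after the
    patch (the comments are added here for illustration):

        struct xattr_handler {
                const char *prefix;
                int flags;      /* fs-private flags passed back to the methods */
                size_t (*list)(struct dentry *dentry, char *list,
                               size_t list_size, const char *name,
                               size_t name_len, int handler_flags);
                int (*get)(struct dentry *dentry, const char *name,
                           void *buffer, size_t size, int handler_flags);
                int (*set)(struct dentry *dentry, const char *name,
                           const void *buffer, size_t size, int flags,
                           int handler_flags);
        };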

    [with GFS2 bits updated by Steven Whitehouse ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     

10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     

09 Dec, 2009

1 commit

  • …t/frederic/random-tracing

    * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: (31 commits)
    kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block()
    kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0()
    kill-the-bkl/reiserfs: definitely drop the bkl from reiserfs_ioctl()
    kill-the-bkl/reiserfs: always lock the ioctl path
    kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency
    kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency
    kill-the-bkl/reiserfs: panic in case of lock imbalance
    kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write()
    kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir()
    kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency
    kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock
    kill-the-bkl/reiserfs: acquire the inode mutex safely
    kill-the-bkl/reiserfs: unlock only when needed in search_by_key
    kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe
    kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end()
    kill-the-bkl/reiserfs: reduce number of contentions in search_by_key()
    kill-the-bkl/reiserfs: don't hold the write recursively in reiserfs_lookup()
    kill-the-bkl/reiserfs: lock only once on reiserfs_get_block()
    kill-the-bkl/reiserfs: conditionaly release the write lock on fs_changed()
    kill-the-BKL/reiserfs: add reiserfs_cond_resched()
    ...

    Linus Torvalds
     

15 Oct, 2009

3 commits

  • We had a watchdog in _get_block_create_0() that jumped to a fixup retry
    path in case the bkl got relaxed while calling kmap().
    This is not necessary anymore since we now have a reiserfs lock that is
    not implicitly relaxed while sleeping.

    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • The reiserfs ioctl path doesn't need the big kernel lock anymore, now
    that the filesystem synchronizes through its own lock.

    We can then turn reiserfs_ioctl() into an unlocked_ioctl callback.
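
    Sketched, the wiring looks something like this (fops name and field
    list abbreviated; only the .unlocked_ioctl hookup is the point):

        #include <linux/fs.h>

        /* unlocked_ioctl prototype: the VFS takes no BKL around it */
        long reiserfs_ioctl(struct file *filp, unsigned int cmd,
                            unsigned long arg);

        static const struct file_operations example_reiserfs_fops = {
                .unlocked_ioctl = reiserfs_ioctl,
                /* ... read, write, mmap, etc. ... */
        };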

    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Reiserfs uses the ioctl callback for its file operations, which means
    that its ioctl path is still locked by the bkl; this used to
    synchronize with the rest of the filesystem operations. We have
    changed that by locking it with the new reiserfs lock, but we do that
    only from the compat_ioctl callback.

    Fix that by locking reiserfs_ioctl() every time.

    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

05 Oct, 2009

1 commit

  • While creating the reiserfs workqueue during journal initialization,
    we are holding the reiserfs lock, but create_workqueue() also takes
    the cpu_add_remove_lock, creating the following dependency:

    - reiserfs lock -> cpu_add_remove_lock

    But we also have the following existing dependencies:

    - mm->mmap_sem -> reiserfs lock
    - cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex

    The merged dependency chain then becomes:

    - mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock ->
    cpu_hotplug.lock -> slub_lock -> sysfs_mutex

    But when we fill a dir entry in sysfs_readir(), we are holding the
    sysfs_mutex and we also might fault while copying the directory entry
    to the user, leading to the following dependency:

    - sysfs_mutex -> mm->mmap_sem

    The end result is then a lock inversion between sysfs_mutex and
    mm->mmap_sem, as reported in the following lockdep warning:

    [ INFO: possible circular locking dependency detected ]
    2.6.31-07095-g25a3912 #4
    -------------------------------------------------------
    udevadm/790 is trying to acquire lock:
    (&mm->mmap_sem){++++++}, at: [] might_fault+0x72/0xc0

    but task is already holding lock:
    (sysfs_mutex){+.+.+.}, at: [] sysfs_readdir+0x7c/0x260

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #5 (sysfs_mutex){+.+.+.}:
    [...]

    -> #4 (slub_lock){+++++.}:
    [...]

    -> #3 (cpu_hotplug.lock){+.+.+.}:
    [...]

    -> #2 (cpu_add_remove_lock){+.+.+.}:
    [...]

    -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
    [...]

    -> #0 (&mm->mmap_sem){++++++}:
    [...]

    This can be fixed by relaxing the reiserfs lock while creating the
    workqueue. It is fine to relax the lock here: we just keep it around
    to satisfy the reiserfs lock checks and for paranoid reasons.
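
    A sketch of the fix, assuming a journal-init-style context where the
    per-superblock write lock is held (function name hypothetical):

        #include <linux/fs.h>
        #include <linux/workqueue.h>
        #include <linux/reiserfs_fs.h>  /* reiserfs_write_(un)lock() */

        static struct workqueue_struct *example_init_commit_wq(struct super_block *sb)
        {
                struct workqueue_struct *wq;

                /* Never hold the reiserfs lock across create_workqueue(),
                 * which takes cpu_add_remove_lock. */
                reiserfs_write_unlock(sb);
                wq = create_workqueue("reiserfs");
                reiserfs_write_lock(sb);
                return wq;
        }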

    Reported-by: Alexander Beregalov
    Tested-by: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard

    Frederic Weisbecker
     

17 Sep, 2009

1 commit

  • Alexander Beregalov reported the following warning:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.31-03149-gdcc030a #1
    -------------------------------------------------------
    udevadm/716 is trying to acquire lock:
    (&mm->mmap_sem){++++++}, at: [] might_fault+0x4a/0xa0

    but task is already holding lock:
    (sysfs_mutex){+.+.+.}, at: [] sysfs_readdir+0x5a/0x200

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #3 (sysfs_mutex){+.+.+.}:
    [...]

    -> #2 (&bdev->bd_mutex){+.+.+.}:
    [...]

    -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
    [...]

    -> #0 (&mm->mmap_sem){++++++}:
    [...]

    On the reiserfs mount path, we take the reiserfs lock and, while
    initializing the journal, we open the device, taking the
    bdev->bd_mutex. Then rescan_partition() may signal the change
    to sysfs.

    We then have the following dependency:

    reiserfs_lock -> bd_mutex -> sysfs_mutex

    Later, when entering reiserfs_readpage() after a pagefault in an
    mmapped reiserfs file, we are holding mm->mmap_sem, and we are going
    to take the reiserfs lock too.
    We then have the following dependency:

    mm->mmap_sem -> reiserfs_lock

    which, expanded with the previous dependency gives us:

    mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex

    Now when entering the sysfs readdir path, we are holding the
    sysfs_mutex. And when we copy a directory entry to the user buffer,
    we might fault and then take mm->mmap_sem, which leads to the
    circular locking dependency reported.

    We can fix that by relaxing the reiserfs lock during the call to
    journal_init_dev(), which is the place where we open the mounted
    device.

    It is fine to relax the lock here because we are at the beginning of
    the reiserfs mount path and there is nothing to protect at this
    time: the journal is not initialized yet.
    We just keep this lock around for paranoid reasons.

    Reported-by: Alexander Beregalov
    Tested-by: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard

    Frederic Weisbecker
     

14 Sep, 2009

22 commits

  • Until now, trying to unlock the reiserfs write lock when the current
    task doesn't hold it led to a simple warning.
    We should actually warn and panic in this case to prevent user data
    from reaching an unstable state.

    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard

    Frederic Weisbecker
     
  • reiserfs_commit_write() is always called with the write lock held.
    Thus the current calls to reiserfs_write_lock() in this function
    acquire the lock recursively.
    We can safely drop them.

    This also removes later false assumptions that this lock is really
    released by a call to reiserfs_write_unlock().

    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard

    Frederic Weisbecker
     
  • reiserfs_mkdir() acquires the reiserfs lock, assuming it has been
    called from the dir inode callbacks without the lock held.

    But it can also be called from other internal sites, such as
    reiserfs_xattr_init(), which already holds the lock. This recursive
    locking leads to further wrong assumptions. For example, later calls
    to reiserfs_mutex_lock_safe() won't actually release the reiserfs
    lock while acquiring a given mutex, creating unexpected lock
    inversions.

    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard

    Frederic Weisbecker
     
  • reiserfs_xattr_init() is called with the reiserfs write lock held,
    but if the ".reiserfs_priv" entry is not yet created, we take the
    superblock root directory inode mutex until .reiserfs_priv is
    created.

    This creates a lock dependency inversion against other sites, such
    as reiserfs_file_release(), which takes an inode mutex first and the
    reiserfs lock after.

    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Cc: Laurent Riffard

    Frederic Weisbecker
     
  • When do_balance() balances the tree, a trick is performed to
    provide the ability for other tree writers/readers to check whether
    do_balance() is executing concurrently (requires CONFIG_REISERFS_CHECK).

    This is done to protect concurrent accesses to the tree. The trick
    is the following:

    When do_balance is called, a unique global variable called cur_tb
    takes a pointer to the current tree to be rebalanced.
    Once do_balance finishes its work, cur_tb takes the NULL value.

    Concurrent tree readers/writers then just have to check the value
    of cur_tb to ensure do_balance() isn't executing concurrently.
    If it is, this proves that schedule() occurred inside do_balance(),
    which then relaxed the bkl that protected the tree.

    Now that the bkl has been turned into a mutex, this check is still
    fine even though do_balance() becomes preemptible: the write lock
    will not be automatically released on schedule(), so the tree is
    still protected.

    But this is only fine if we have a single reiserfs mount point.
    Indeed, because the bkl is a global lock, it didn't allow
    concurrent executions between a tree reader/writer in one mount
    point and a do_balance() on another tree from another mount point.

    So, assuming none of these readers/writers was supposed to be
    reentrant, the check now sometimes detects false positives with
    the current per-superblock mutex, which allows this reentrancy.

    This patch keeps the concurrent tree access check but makes it
    per-superblock, so that only trees from the same mount point are
    checked for concurrent access.

    [ Impact: fix spurious panic while running several reiserfs mount-points ]
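
    A minimal sketch of what the relocated debug check looks like
    (CONFIG_REISERFS_CHECK only; the helper name, field placement in
    reiserfs_sb_info and the panic message are illustrative):

        #ifdef CONFIG_REISERFS_CHECK
        /* do_balance() publishes its tree_balance pointer in the
         * superblock info; any concurrent access to the same tree must
         * find it NULL, otherwise that tree is being rebalanced under
         * us. */
        static inline void check_tree_not_balancing(struct super_block *sb)
        {
                if (REISERFS_SB(sb)->cur_tb)
                        reiserfs_panic(sb, "PAP-example",
                                       "tree is being balanced concurrently");
        }
        #endif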

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • While searching a pathname, an inode mutex can be acquired
    in do_lookup() which calls reiserfs_lookup() which in turn
    acquires the write lock.

    On the other side reiserfs_fill_super() can acquire the write_lock
    and then call reiserfs_lookup_privroot() which can acquire an
    inode mutex (the root of the mount point).

    So we theoretically risk an AB-BA lock inversion that could lead
    to a deadlock.

    As with other lock dependencies found since the bkl-to-mutex
    conversion, the fix is to use reiserfs_mutex_lock_safe(), which
    drops the dependency on the write lock.

    [ Impact: fix a possible deadlock with reiserfs ]

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • search_by_key() is the site that most requires the lock.
    This is mostly because it is a very central function and also
    because it releases/reacquires the write lock at least once each
    time it is called.

    Such a release/reacquire creates a lot of contention here and
    also widens the window during which another thread can change the
    tree. When that happens, the current path searching over the tree
    must be retried from the beginning (the root), which is a wasteful
    and time-consuming recovery.

    This patch factorizes two release/reacquire sequences:

    - reading leaf node blocks
    - reading the current block

    The latter immediately follows the former.

    The whole sequence is safe as a single unlocked section because
    we check just afterwards whether the tree has changed during these
    operations.
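
    A sketch of the pattern (simplified; get_generation(), fs_changed()
    and the lock helpers are the real reiserfs ones, the function around
    them is illustrative):

        #include <linux/buffer_head.h>

        static struct buffer_head *example_read_tree_block(struct super_block *sb,
                                                           b_blocknr_t block)
        {
                struct buffer_head *bh;
                int fs_gen;

        research:
                fs_gen = get_generation(sb);

                /* single unlocked section covering the block read */
                reiserfs_write_unlock(sb);
                bh = sb_bread(sb, block);       /* may sleep on I/O */
                reiserfs_write_lock(sb);

                /* safe because we verify the tree right afterwards */
                if (bh && fs_changed(fs_gen, sb)) {
                        brelse(bh);
                        goto research;  /* the real code retries from the root */
                }
                return bh;
        }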

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • reiserfs_mutex_lock_safe() is a hack to avoid any dependency between
    an internal reiserfs mutex and the write lock; it was proposed
    to follow the old bkl logic.

    The code does the following:

        while (!mutex_trylock(m)) {
                reiserfs_write_unlock(s);
                schedule();
                reiserfs_write_lock(s);
        }

    It thereby imitates the implicit behaviour of the lock when it was
    the Bkl and didn't have such a dependency:

        mutex_lock(m) {
                if (fastpath)
                        let's go
                else {
                        wait_for_mutex() {
                                schedule() {
                                        unlock_kernel()
                                        reacquire_lock_kernel()
                                }
                        }
                }
        }

    The problem is that by using such an explicit schedule(), we don't
    benefit from the adaptive spinning on the mutex owner.

    The logic in use now is:

        reiserfs_write_unlock(s);
        mutex_lock(m);  /* -> possible adaptive spinning */
        reiserfs_write_lock(s);

    [ Impact: restore the use of adaptive spinning mutexes in reiserfs ]
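
    Wrapped up as a helper, this is essentially the following (a sketch,
    modulo details of the real fs/reiserfs code):

        static inline void
        reiserfs_mutex_lock_safe(struct mutex *m, struct super_block *s)
        {
                reiserfs_write_unlock(s);
                mutex_lock(m);  /* eligible for adaptive owner-spinning */
                reiserfs_write_lock(s);
        }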

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • reiserfs_write_end() is a hot path in reiserfs.
    It contains two wasteful write lock acquire/release sequences that
    can be merged without changing the code logic.

    This patch factorizes them into a single protected section, reducing
    the number of contentions inside.

    [ Impact: reduce lock contention in a reiserfs hotpath ]

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • search_by_key() is a central function in reiserfs which searches
    the path in the fs tree from the root to a node, given its key.

    It is the function that most contends for the write lock,
    because it is a very frequently used path.

    Also, we forgot to release the lock while reading the next tree
    node, making us hold the lock in a wasteful way.

    We now release the lock while reading the current node and its
    children, all in one. This should be safe because we hold a
    reference to these blocks, and even if we read a block that is
    concurrently changed, the fs_changed check done later will make us
    retry the path from the root.

    [ Impact: release the write lock while it is unused in a hot path ]

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • The write lock can be acquired recursively in reiserfs_lookup(). But we may
    want to *really* release the lock before possible rescheduling from a
    reiserfs_lookup() callee.

    Hence we want to acquire the lock only once (i.e. not recursively).

    [ Impact: prevent the write lock from being wrongly left held while sleeping ]

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • reiserfs_get_block() is one of these sites where the write lock might
    be acquired recursively.

    It's a particular problem because this function is called very
    often. It's a hot spot which needs to reschedule periodically while
    converting direct items to indirect ones, because that can take some
    time.

    So if we apply the write lock release/reacquire pattern on schedule()
    here, it may not produce the desired effect, since we may have taken
    the lock at more than one depth.

    The solution is to use reiserfs_write_lock_once() which won't try
    to reacquire the lock recursively. Then the lock will be *really*
    released before schedule().

    Also, we only release the lock if TIF_NEED_RESCHED is set, to avoid
    creating numerous wasteful contentions.

    [ Impact: fix a case where the lock was held too long in reiserfs_get_block() ]
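
    A sketch of the pattern, assuming a reiserfs_get_block()-like body
    (the *_once helpers are the ones introduced in this series; the
    function around them is illustrative):

        static int example_get_block(struct inode *inode)
        {
                struct super_block *sb = inode->i_sb;
                int depth;

                /* never recursive: returns -1 if we locked here */
                depth = reiserfs_write_lock_once(sb);

                /* ... long direct-to-indirect conversion loop ... */
                if (need_resched()) {
                        /* really drop the lock: exactly one depth held */
                        reiserfs_write_unlock_once(sb, depth);
                        schedule();
                        depth = reiserfs_write_lock_once(sb);
                }

                reiserfs_write_unlock_once(sb, depth);
                return 0;
        }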

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • flush_commit_list() uses ll_rw_block() to commit the pending log
    blocks. ll_rw_block() might sleep, and the bkl used to be released
    at this point. We can then also relax the write lock here.

    [ Impact: release the reiserfs write lock when it is not needed ]
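
    The recurring shape of these changes, sketched (a minimal example;
    the surrounding function is illustrative):

        #include <linux/buffer_head.h>

        static void example_submit_log_block(struct super_block *s,
                                             struct buffer_head *bh)
        {
                /* drop the write lock across the sleeping call, as the
                 * bkl would have been dropped implicitly */
                reiserfs_write_unlock(s);
                ll_rw_block(WRITE, 1, &bh);
                reiserfs_write_lock(s);
        }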

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • reiserfs_read_bitmap_block() uses sb_bread() to read the bitmap
    block. This helper might sleep.

    When the bkl was used, it was released at this point. We can then
    relax the write lock here too.

    [ Impact: release the reiserfs write lock when it is not needed ]

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • get_neighbors() is used to get the left and/or right blocks
    adjacent to a given one in order to balance a tree.

    sb_bread() is used to read the buffers of these neighbor blocks,
    and while it waits for this operation it might sleep.

    The bkl was released at this point, so we can also release
    the write lock before calling sb_bread().

    This is safe because, if the filesystem is changed after this
    lock release, the function returns REPEAT_SEARCH (aka
    SCHEDULE_OCCURRED in the function's header comments) in order to
    repeat the neighbor search.

    [ Impact: release the reiserfs write lock when it is not needed ]

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • prepare_for_delete_or_cut() can process several types of items,
    including indirect items, i.e. items which contain no file data but
    pointers to the unformatted nodes holding the file's data.

    In this case it has to zero out these pointers to the block numbers
    of unformatted nodes and release those blocks in the bitmap.

    This can take some time, so a reschedule is performed between each
    block processed. We can safely release the write lock while
    rescheduling, like the bkl did, because the code checks just
    afterwards whether the item has moved during the sleep.

    [ Impact: release the reiserfs write lock when it is not needed ]

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • When do_journal_end() copies data to the journal block buffers in
    memory, it reschedules if needed between each block copied and
    dirtied.

    We can also release the write lock at this rescheduling point,
    as the bkl did implicitly.

    [ Impact: release the reiserfs write lock when it is not needed ]

    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Alexander Beregalov
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • Impact: fix a deadlock

    reiserfs_dirty_inode() is the super_operations::dirty_inode()
    callback of reiserfs. It can be called from different contexts where
    the write lock may already be held.

    But this function also grabs the write lock (possibly recursively).
    A subsequent release of the lock before sleeping will then not
    actually release it if the caller of mark_inode_dirty() (which in
    turn calls reiserfs_dirty_inode()) already owns the lock.

    A typical case:

        reiserfs_write_end() {
           acquire_write_lock()
           mark_inode_dirty() {
              reiserfs_dirty_inode() {
                 reacquire_write_lock() {
                    journal_begin() {
                       do_journal_begin_r() {
                          /*
                           * fail to release, still
                           * one depth of lock
                           */
                          release_write_lock()
                          reiserfs_wait_on_write_block() {
                             wait_event()

    The event is usually provided by something which itself needs the
    write lock, but that lock hasn't been released.

    We use reiserfs_write_lock_once() here to ensure we only grab the
    write lock at a single depth.

    Signed-off-by: Frederic Weisbecker
    Cc: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Jeff Mahoney
    Cc: Chris Mason
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: fix a deadlock

    reiserfs_truncate_file() can be called from multiple contexts where
    the write lock may or may not already be held.

    This function also acquires (possibly recursively) the write
    lock. Subsequent releases before sleeping will then not actually
    release the lock because we may be at more than one lock depth.

    A typical case is:

        reiserfs_file_release() {
           acquire_the_lock()
           reiserfs_truncate_file() {
              reacquire_the_lock()
              journal_begin() {
                 do_journal_begin_r() {
                    reiserfs_wait_on_write_block() {
                       /*
                        * Not released because still one
                        * depth owned
                        */
                       release_lock()
                       wait_for_event()

    At this stage the event never happens, because whatever provides
    it needs the write lock.

    We use reiserfs_write_lock_once() here to ensure that we don't acquire the
    write lock recursively.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Jeff Mahoney
    Cc: Alexander Beregalov
    Cc: Chris Mason
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Sometimes we don't want to recursively hold the per superblock write
    lock because we want to be sure it is actually released when we come
    to sleep.

    This patch introduces the necessary tools for that.

    reiserfs_write_lock_once() does the same job as reiserfs_write_lock(),
    except that it won't try to acquire the lock recursively if the
    current task already owns it. It also returns the lock depth as it
    was before the call.

    reiserfs_write_unlock_once() unlocks only if reiserfs_write_lock_once()
    returned a depth equal to -1, i.e. only if it actually locked.
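
    A usage sketch (the helpers are the ones this patch introduces; the
    function around them is illustrative):

        static void example_locked_section(struct super_block *sb)
        {
                int lock_depth;

                /* returns the pre-call depth: -1 means we locked here */
                lock_depth = reiserfs_write_lock_once(sb);

                /* ... work that must be able to *really* release the
                 * lock before sleeping ... */

                /* releases only if this frame took the lock (depth == -1) */
                reiserfs_write_unlock_once(sb, lock_depth);
        }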

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Jeff Mahoney
    Cc: Alexander Beregalov
    Cc: Chris Mason
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: fix a deadlock

    The j_flush_mutex is acquired safely in journal.c:
    if we can't take it, we release the reiserfs per-superblock lock
    and wait a bit.

    But there is a remaining place in kupdate_transactions() where
    j_flush_mutex is still acquired traditionally. Thus the following
    scenario (warned about by lockdep) can happen:

        A                               B

        mutex_lock(&write_lock)
                                        mutex_lock(&write_lock)
        mutex_lock(&j_flush_mutex)
                                        mutex_lock(&j_flush_mutex) //block
        mutex_unlock(&write_lock)
        sleep...
        mutex_lock(&write_lock) //deadlock

    Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Jeff Mahoney
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This patch is an attempt to remove the Bkl-based locking scheme from
    reiserfs.

    It is somewhat inspired by an old attempt from Peter Zijlstra:

    http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html

    The bkl is heavily used in this filesystem to prevent concurrent
    write accesses on the filesystem.

    Reiserfs makes deep use of the specific properties of the Bkl:

    - It can be acquired recursively by the same task
    - It is released on schedule() calls and reacquired when schedule()
    returns

    These two properties are a roadmap of the reiserfs write locking, so
    it's very hard to simply replace the Bkl with a common mutex.

    - We need a lock that can be acquired recursively, unless we want to
    restructure several blocks of the code.
    - We need to identify the sites where the bkl was implicitly relaxed
    (schedule, wait, sync, etc...) so that we can in turn release and
    reacquire our new lock explicitly.
    Such implicit releases of the lock are often required to let other
    resource producers/consumers do their job, or we can suffer
    unexpected starvations or deadlocks.

    So the new lock that replaces the bkl here is a per-superblock mutex
    with a specific property: it can be acquired recursively by the same
    task, like the bkl.

    For this purpose, we add a lock owner and a lock depth field to the
    superblock information structure.
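
    In simplified form, the recursive wrapper described here looks like
    this (a sketch of the lock/unlock pair, modulo details of the real
    fs/reiserfs code):

        void reiserfs_write_lock(struct super_block *s)
        {
                struct reiserfs_sb_info *sb_i = REISERFS_SB(s);

                if (sb_i->lock_owner != current) {
                        mutex_lock(&sb_i->lock);
                        sb_i->lock_owner = current;
                }
                /* only the owner touches the depth, no protection needed */
                sb_i->lock_depth++;
        }

        void reiserfs_write_unlock(struct super_block *s)
        {
                struct reiserfs_sb_info *sb_i = REISERFS_SB(s);

                /* drop the mutex only when the outermost level unlocks */
                if (--sb_i->lock_depth == -1) {
                        sb_i->lock_owner = NULL;
                        mutex_unlock(&sb_i->lock);
                }
        }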

    The first axis of this patch is to turn the reiserfs_write_(un)lock()
    functions into wrappers that manage this mutex. Also, some explicit
    calls to lock_kernel() have been converted to reiserfs_write_lock()
    helpers.

    The second axis is to find the important blocking sites (schedule(),
    wait_on_buffer(), sync_dirty_buffer(), etc...) and apply an explicit
    release of the write lock at these locations before blocking. Then we
    can safely wait for those who can give us resources or those who need
    some. Typically this is a fight between the current writer, the
    reiserfs workqueue (aka the async committer) and the pdflush threads.

    The third axis is a consequence of the second. The write lock is
    usually at the top of a lock dependency chain which can include the
    journal lock, the flush lock or the commit lock, so it's dangerous to
    release the write lock and try to reacquire it while we still hold
    other locks.

    This is fine with the bkl:

        T1                              T2

        lock_kernel()
        mutex_lock(A)
        unlock_kernel()
        // do something
                                        lock_kernel()
                                        mutex_lock(A) -> already locked by T1
                                        schedule() (and then unlock_kernel())
        lock_kernel()
        mutex_unlock(A)
        ....

    This is not fine with a mutex:

        T1                              T2

        mutex_lock(write)
        mutex_lock(A)
        mutex_unlock(write)
        // do something
                                        mutex_lock(write)
                                        mutex_lock(A) -> already locked by T1
                                        schedule()
        mutex_lock(write) -> already locked by T2
        deadlock

    The solution in this patch is to provide a helper which releases the
    write lock and sleeps a bit if we can't lock a mutex that depends on
    it. It's another simulation of the bkl behaviour.

    The last axis is to locate the fs callbacks that are called with the
    bkl held, according to Documentation/filesystems/Locking.

    Those are:

    - reiserfs_remount
    - reiserfs_fill_super
    - reiserfs_put_super

    Reiserfs didn't need to lock explicitly there because of the context
    of these callbacks. But now we must take care of that with the new
    locking.

    After this patch, reiserfs suffers from a slight performance
    regression (for now). On UP, a high-volume write with dd reports an
    average of 27 MB/s instead of 30 MB/s without the patch applied.

    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Ingo Molnar
    Cc: Jeff Mahoney
    Cc: Peter Zijlstra
    Cc: Bron Gondwana
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Alexander Viro
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker