Eric Lee / smarc-fsl-linux-kernel

11 Jun, 2010

3 commits

3e6c05052 block: remove duplicate BUG_ON() in bd_finish_claiming()

We do the same BUG_ON() just a line later when calling into
__bd_abort_claiming().

Reported-by: Tejun Heo
Signed-off-by: Jens Axboe

Jens Axboe
2010-06-11 01:08:34 +0800
b0018361c block: bd_start_claiming cleanup

I don't like the subtle multi-context code in bd_claim (ie. detects where it
has been called based on bd_claiming). It seems clearer to just require a new
function to finish a 2-part claim.

Also improve commentary in bd_start_claiming as to how it should
be used.

Signed-off-by: Nick Piggin
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Nick Piggin
2010-06-11 01:08:34 +0800
cf3425707 block: bd_start_claiming fix module refcount

bd_start_claiming has an unbalanced module_put introduced in 6b4517a79.

Signed-off-by: Nick Piggin
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Nick Piggin
2010-06-11 01:08:34 +0800

28 May, 2010

2 commits

3322e79a3 fs: convert simple fs to new truncate

Convert simple filesystems: ramfs, configfs, sysfs, block_dev to new truncate
sequence.

Cc: Christoph Hellwig
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

Nick Piggin
2010-05-28 10:15:47 +0800
7ea808591 drop unused dentry argument to ->fsync

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-05-28 10:05:02 +0800

22 May, 2010

3 commits

e8bebe2f7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits)
fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
get rid of home-grown mutex in cris eeprom.c
switch ecryptfs_write() to struct inode *, kill on-stack fake files
switch ecryptfs_get_locked_page() to struct inode *
simplify access to ecryptfs inodes in ->readpage() and friends
AFS: Don't put struct file on the stack
Ban ecryptfs over ecryptfs
logfs: replace inode uid,gid,mode initialization with helper function
ufs: replace inode uid,gid,mode initialization with helper function
udf: replace inode uid,gid,mode init with helper
ubifs: replace inode uid,gid,mode initialization with helper function
sysv: replace inode uid,gid,mode initialization with helper function
reiserfs: replace inode uid,gid,mode initialization with helper function
ramfs: replace inode uid,gid,mode initialization with helper function
omfs: replace inode uid,gid,mode initialization with helper function
bfs: replace inode uid,gid,mode initialization with helper function
ocfs2: replace inode uid,gid,mode initialization with helper function
nilfs2: replace inode uid,gid,mode initialization with helper function
minix: replace inode uid,gid,mode init with helper
ext4: replace inode uid,gid,mode init with helper
...

Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)

Linus Torvalds
2010-05-22 10:37:45 +0800
18e9e5104 Introduce freeze_super and thaw_super for the fsfreeze ioctl

Currently the way we do freezing is by passing sb->s_bdev to freeze_bdev and then
letting it do all the work. But freezing is more of an fs thing, and doesn't
really have much to do with the bdev at all; all the work gets done with the
super. In btrfs we do not populate s_bdev, since we can have multiple bdevs
for one fs and setting s_bdev makes removing devices from a pool kind of tricky.
This means that freezing a btrfs filesystem fails, which causes corruption
with things like tux-on-ice which use the fsfreeze mechanism. So instead of
populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs
freezing stuff into freeze_super and thaw_super. These just take the
super_block that we're freezing and do the appropriate work. It's basically
just copied and pasted from freeze_bdev. I've then converted freeze_bdev over to
use the new super helpers. I've tested this with ext4 and btrfs and verified
everything continues to work the same as before.

The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if
the fs is already frozen. I thought this was a better solution than adding a
freeze counter to the super_block, but if everybody hates this idea I'm open to
suggestions. Thanks,
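
The EBUSY behaviour described above can be pictured with a small userspace model. This is only a sketch: struct fake_super and both functions are hypothetical stand-ins rather than the kernel's freeze_super()/thaw_super(), and the -EINVAL return for thawing a filesystem that is not frozen is an assumption that this message does not state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the frozen state kept in a super_block. */
struct fake_super {
    bool frozen;
};

/* Freeze once; a second freeze without a thaw in between reports -EBUSY. */
int fake_freeze_super(struct fake_super *sb)
{
    assert(sb != NULL);
    if (sb->frozen)
        return -EBUSY;
    sb->frozen = true;
    return 0;
}

/* Thaw a frozen filesystem; -EINVAL here is an assumption of this sketch. */
int fake_thaw_super(struct fake_super *sb)
{
    assert(sb != NULL);
    if (!sb->frozen)
        return -EINVAL;
    sb->frozen = false;
    return 0;
}
```

The alternative mentioned above, a freeze counter, would let nested freezes succeed and only thaw on the last matching call; the boolean keeps the state simpler at the cost of the EBUSY return.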

Signed-off-by: Josef Bacik
Signed-off-by: Al Viro

Josef Bacik
2010-05-22 06:31:18 +0800
d3f214730 Move grabbing s_umount to callers of grab_super()

Signed-off-by: Al Viro

Al Viro
2010-05-22 06:31:17 +0800

29 Apr, 2010

2 commits

7407cf355 Merge branch 'master' into for-2.6.35

Conflicts:
fs/block_dev.c

Signed-off-by: Jens Axboe

Jens Axboe
2010-04-29 15:36:24 +0800
fbd9b09a1 blkdev: generalize flags for blkdev_issue_fn functions

This patch just converts all blkdev_issue_xxx functions to a common
set of flags. Wait/allocation semantics are preserved.

Signed-off-by: Dmitry Monakhov
Signed-off-by: Jens Axboe

Dmitry Monakhov
2010-04-29 01:47:36 +0800

27 Apr, 2010

2 commits

6b4517a79 block: implement bd_claiming and claiming block

Currently, device claiming for exclusive open is done after low level
open - disk->fops->open() - has completed successfully. This means
that exclusive open attempts while a device is already exclusively
open will fail only after disk->fops->open() is called.

The cdrom driver issues commands during open(), which means that an O_EXCL
open attempt can unintentionally inject commands into the in-progress
command stream for burning, thus disturbing the burning process. In most
cases, this doesn't cause problems because the first command to be
issued is TUR, which most devices can process in the middle of burning.
However, depending on how a device replies to TUR during burning, the
cdrom driver may end up issuing further commands.

This can't be resolved trivially by moving bd_claim() before doing the
actual open() because that means an open attempt which will end up
failing could interfere with other legit O_EXCL open attempts,
i.e. unconfirmed open attempts can fail others.

This patch resolves the problem by introducing claiming block which is
started by bd_start_claiming() and terminated either by bd_claim() or
bd_abort_claiming(). bd_claim() from inside a claiming block is
guaranteed to succeed and once a claiming block is started, other
bd_start_claiming() or bd_claim() attempts block till the current
claiming block is terminated.

bd_claim() can still be used standalone although now it always
synchronizes against claiming blocks, so the existing users will keep
working without any change.

blkdev_open() and open_bdev_exclusive() are converted to use claiming
blocks so that exclusive open attempts from these functions don't
interfere with the existing exclusive open.
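
The claiming block protocol can be modelled in userspace with a mutex and a condition variable. The sketch below is an analogy only, not the kernel code: struct fake_bdev and the fake_* functions are hypothetical names, and fake_finish_claiming() plays the role that bd_claim() has when called from inside a claiming block.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the claim state of a block device. */
struct fake_bdev {
    pthread_mutex_t lock;
    pthread_cond_t wait;
    const void *holder;   /* current exclusive holder, NULL if none */
    const void *claiming; /* owner of an open claiming block, NULL if none */
};

#define FAKE_BDEV_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL, NULL }

/* Begin a claiming block: wait out any other claimer, then check the holder. */
int fake_start_claiming(struct fake_bdev *b, const void *who)
{
    int ret = 0;

    assert(b != NULL && who != NULL);
    pthread_mutex_lock(&b->lock);
    while (b->claiming != NULL)
        pthread_cond_wait(&b->wait, &b->lock);
    if (b->holder != NULL && b->holder != who)
        ret = -EBUSY;          /* already exclusively held by someone else */
    else
        b->claiming = who;     /* the claiming block is now open */
    pthread_mutex_unlock(&b->lock);
    return ret;
}

/* Finish the claiming block: at this point the claim cannot fail. */
void fake_finish_claiming(struct fake_bdev *b, const void *who)
{
    pthread_mutex_lock(&b->lock);
    assert(b->claiming == who);
    b->holder = who;
    b->claiming = NULL;
    pthread_cond_broadcast(&b->wait);
    pthread_mutex_unlock(&b->lock);
}

/* Abort the claiming block, e.g. because the low level open failed. */
void fake_abort_claiming(struct fake_bdev *b, const void *who)
{
    pthread_mutex_lock(&b->lock);
    assert(b->claiming == who);
    b->claiming = NULL;
    pthread_cond_broadcast(&b->wait);
    pthread_mutex_unlock(&b->lock);
}
```

In this model a caller would run fake_start_claiming(), perform the low level open, and then call either fake_finish_claiming() or fake_abort_claiming() depending on the result, so a failed open never disturbs an existing holder.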

This problem was discovered while investigating bko#15403.

https://bugzilla.kernel.org/show_bug.cgi?id=15403

The burning problem itself can be resolved by updating userspace
probing tools to always open w/ O_EXCL.

Signed-off-by: Tejun Heo
Reported-by: Matthias-Christian Ott
Cc: Kay Sievers
Signed-off-by: Jens Axboe

Tejun Heo
2010-04-27 16:57:54 +0800
1a3cbbc5a block: factor out bd_may_claim()

Factor out bd_may_claim() from bd_claim(), add comments and apply a
couple of cosmetic edits. This is to prepare for further updates to
claim path.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2010-04-27 16:57:54 +0800

25 Apr, 2010

1 commit

b8af67e26 fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes to block devices

We are seeing a large regression in database performance on recent
kernels. The database opens a block device with O_DIRECT|O_SYNC and a
number of threads write to different regions of the file at the same time.

A simple test case is below. I haven't defined DEVICE since getting it
wrong will destroy your data :) On a 3 disk LVM with a 64k chunk size we
see about 17MB/sec and only a few threads in IO wait:

procs  -----io---- -system-- -----cpu------
 r  b    bi    bo   in   cs us sy id wa st
 0  3     0 16170  656 2259  0  0 86 14  0
 0  2     0 16704  695 2408  0  0 92  8  0
 0  2     0 17308  744 2653  0  0 86 14  0
 0  2     0 17933  759 2777  0  0 89 10  0

Most threads are blocking in vfs_fsync_range, which has:

    mutex_lock(&mapping->host->i_mutex);
    err = fop->fsync(file, dentry, datasync);
    if (!ret)
        ret = err;
    mutex_unlock(&mapping->host->i_mutex);

commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers
some explanation of what is going on:

Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.

Thanks Jan for such a good commit message! As well as dropping i_mutex,
Christoph suggests we should remove the call to sync_blockdev():

> sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
> the block device inode, which is exactly what we did just before calling
> into ->fsync

The patch below incorporates both suggestions. With it the testcase improves
from 17MB/sec to 68MB/sec:

procs  -----io---- -system-- -----cpu------
 r  b    bi    bo   in   cs us sy id wa st
 0  7     0 65536 1000 3878  0  0 70 30  0
 0 34     0 69632 1016 3921  0  1 46 53  0
 0 57     0 69632 1000 3921  0  0 55 45  0
 0 53     0 69640  754 4111  0  0 81 19  0

Testcase:

#define _GNU_SOURCE
/* Headers required by the calls below. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/stat.h>

#define NR_THREADS 64
#define BUFSIZE (64 * 1024)

#define DEVICE "/dev/mapper/XXXXXX"

#define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))

static int fd;

/* Each thread rewrites its own BUFSIZE region of the device forever. */
static void *doit(void *arg)
{
	unsigned long offset = (long)arg;
	char *b, *buf;

	/* Over-allocate so the buffer can be aligned for O_DIRECT. */
	b = malloc(BUFSIZE + 1024);
	buf = (char *)ALIGN((unsigned long)b, 1024);
	memset(buf, 0, BUFSIZE);

	while (1)
		pwrite(fd, buf, BUFSIZE, offset);
}

int main(int argc, char *argv[])
{
	int flags = O_RDWR|O_DIRECT;
	int i;
	unsigned long offset = 0;

	/* Pass "O_SYNC" as the first argument to add O_SYNC to the open flags. */
	if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
		flags |= O_SYNC;

	fd = open(DEVICE, flags);
	if (fd == -1) {
		perror("open");
		exit(1);
	}

	for (i = 0; i < NR_THREADS-1; i++) {
		pthread_t tid;
		pthread_create(&tid, NULL, doit, (void *)offset);
		offset += BUFSIZE;
	}
	doit((void *)offset);

	return 0;
}

Signed-off-by: Anton Blanchard
Acked-by: Jan Kara
Cc: Christoph Hellwig
Cc: Alexander Viro
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anton Blanchard
2010-04-25 02:31:26 +0800

07 Apr, 2010

2 commits

b1dd3b284 vfs: rename block_fsync() to blkdev_fsync()

Requested by hch, for consistency now it is exported.

Cc: Alexander Viro
Cc: Anton Blanchard
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: Jeff Moyer
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2010-04-07 23:38:04 +0800
55ab3a1ff raw: fsync method is now required

Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke
the raw driver.

We now call through generic_file_aio_write -> generic_write_sync ->
vfs_fsync_range. vfs_fsync_range has:

    if (!fop || !fop->fsync) {
        ret = -EINVAL;
        goto out;
    }

But drivers/char/raw.c doesn't set an fsync method.
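
The failure mode can be reproduced in miniature with a table of function pointers. This is a hedged userspace sketch: struct fake_file_ops, fake_fsync_range() and fake_raw_fsync() are hypothetical names, and only the NULL check is taken from the snippet quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of a file_operations table with only ->fsync. */
struct fake_file_ops {
    int (*fsync)(void *file, int datasync);
};

/* Same check as quoted above: no ops table or no ->fsync gives -EINVAL. */
int fake_fsync_range(const struct fake_file_ops *fop, void *file, int datasync)
{
    if (!fop || !fop->fsync)
        return -EINVAL;
    return fop->fsync(file, datasync);
}

/* A trivial fsync method, standing in for what the raw driver was missing. */
int fake_raw_fsync(void *file, int datasync)
{
    assert(file != NULL);
    (void)datasync;
    return 0;
}
```

A driver whose table leaves fsync unset therefore turns every synchronous write into -EINVAL, even when the write itself succeeded.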

We have two options: fix it or remove the raw driver completely. I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.

The patch below adds an fsync method to the raw driver. My knowledge of
the block layer is pretty sketchy so this could do with a once over.

If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.

Signed-off-by: Anton Blanchard
Reviewed-by: Jan Kara
Cc: Christoph Hellwig
Cc: Alexander Viro
Cc: Jens Axboe
Reviewed-by: Jeff Moyer
Tested-by: Jeff Moyer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anton Blanchard
2010-04-07 23:38:04 +0800

07 Feb, 2010

1 commit

4b06e5b9a freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb

Thanks Thomas and Christoph for testing and review.
I removed 'smp_wmb()' before up_write from the previous patch,
since up_write() should have necessary ordering constraints.
(I.e. the change of s_frozen is visible to others after up_write)
I'm quite sure the change is harmless but if you are uncomfortable
with Tested-by/Reviewed-by on the modified patch, please remove them.

If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of
deactivate_locked_super().
Also, keep sb->s_frozen consistent so that remount can check the frozen state.

Otherwise a crash reported here can happen:
http://lkml.org/lkml/2010/1/16/37
http://lkml.org/lkml/2010/1/28/53

This patch should be applied for 2.6.32 stable series, too.

Reviewed-by: Christoph Hellwig
Tested-by: Thomas Backlund
Signed-off-by: Jun'ichi Nomura
Cc: stable@kernel.org
Signed-off-by: Al Viro

Jun'ichi Nomura
2010-02-07 16:06:21 +0800

04 Nov, 2009

1 commit

2058297d2 Merge branch 'for-linus' into for-2.6.33

Conflicts:
block/cfq-iosched.c

Signed-off-by: Jens Axboe

Jens Axboe
2009-11-04 04:14:39 +0800

29 Oct, 2009

1 commit

ab0a9735e blkdev: flush disk cache on ->fsync

Currently there is no barrier support in the block device code. That
means we cannot guarantee any sort of data integrity when using the
block device node with disk write caches enabled. Using the raw block
device node is a typical use case for virtualization (and I assume
databases, too). This patch changes block_fsync to issue a cache flush
and thus make fsync on block device nodes actually useful.
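
From userspace, the pattern that benefits is an ordinary write followed by fsync(). The helper below is a minimal sketch; write_and_sync() is a hypothetical name, and O_CREAT is only there so the helper can be tried on a regular file instead of a device node.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write len bytes to path and fsync() the descriptor.
 * Returns 0 on success, -1 on failure.
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd, ret = -1;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd == -1)
        return -1;
    /* Only report success if the full write and the fsync both worked. */
    if (write(fd, buf, len) == (ssize_t)len && fsync(fd) == 0)
        ret = 0;
    if (close(fd) == -1)
        ret = -1;
    return ret;
}
```

On a block device node, it is the fsync() call in this pattern that the patch turns into a disk cache flush.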

Note that in mainline we would also need to add such code to the
->aio_write method for O_SYNC handling, but assuming that Jan's patch
series for the O_SYNC rewrite goes in it will also call into ->fsync
for 2.6.32.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2009-10-29 21:14:04 +0800

26 Oct, 2009

1 commit

960cc0f4f block: use after free bug in __blkdev_get

commit 0762b8bde9729f10f8e6249809660ff2ec3ad735
(from 14 months ago) introduced a use-after-free bug which has just
recently started manifesting in my md testing.
I tried git bisect to find out what caused the bug to start
manifesting, and it could have been the recent change to
blk_unregister_queue (48c0d4d4c04) but the results were inconclusive.

This patch certainly fixes my symptoms and looks correct as the two
calls are now in the same order as elsewhere in that function.

Signed-off-by: NeilBrown
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Neil Brown
2009-10-26 22:27:11 +0800

24 Sep, 2009

2 commits

4504230a7 freeze_bdev: grab active reference to frozen superblocks

Currently we hold s_umount while a filesystem is frozen, even though we
might return to userspace and unlock it from a different process. Instead
grab an active reference to keep the file system busy and add an explicit
check for frozen filesystems in remount and reject the remount instead
of blocking on s_umount.

Add a new get_active_super helper to super.c for use by freeze_bdev that
grabs an active reference to a superblock from a given block device.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-09-24 19:47:41 +0800
4fadd7bb2 freeze_bdev: kill bd_mount_sem

Now that we have the freeze count there is not much reason for bd_mount_sem
anymore. The actual freeze/thaw operations are serialized using the
bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is
get_sb_bdev which tries to prevent mounting a filesystem while the block
device is frozen. Instead, add a check for bd_fsfreeze_count and
return -EBUSY if a filesystem is frozen. While that is a change in user
visible behaviour, a failing mount is much better for this case
than having the mount process stuck uninterruptible for a long time.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-09-24 19:47:39 +0800

22 Sep, 2009

1 commit

83d5cde47 const: make block_device_operations const

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-22 22:17:25 +0800

16 Sep, 2009

1 commit

2c96ce9f2 fs: remove bdev->bd_inode_backing_dev_info

It has been unused since it was introduced in:

commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac
Author: Andrew Morton
Date: Fri May 21 00:46:17 2004 -0700

[PATCH] block device layer: separate backing_dev_info infrastructure

So let's just kill it.

Acked-by: Jan Kara
Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 21:16:18 +0800

14 Sep, 2009

1 commit

eef993806 vfs: Rename generic_file_aio_write_nolock

generic_file_aio_write_nolock() is now used only by block devices and the raw
character device. Filesystems should use __generic_file_aio_write() in case
generic_file_aio_write() doesn't suit them. So rename the function to
blkdev_aio_write() and move it to fs/block_dev.c.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara

Christoph Hellwig
2009-09-14 23:08:15 +0800

30 Jul, 2009

1 commit

dddac6a7b PM / Hibernate: Replace bdget call with simple atomic_inc of i_count

Create bdgrab(). This function copies an existing reference to a
block_device. It is safe to call from any context.

Hibernation code wishes to copy a reference to the active swap device.
Right now it calls bdget() under a spinlock, but this is wrong because
bdget() can sleep. It doesn't need a full bdget() because we already
hold a reference to active swap devices (and the spinlock protects
against swapoff).
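
The difference between looking an object up and copying a reference that is already held can be sketched with C11 atomics. The names struct fake_obj, fake_grab() and fake_put() are hypothetical; the only point carried over from the patch is that copying a held reference is a single atomic increment.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical refcounted object; the counter plays the role of i_count. */
struct fake_obj {
    atomic_int count;
};

/*
 * Copy a reference the caller already holds. This is a single atomic
 * increment: no lookup, no allocation, nothing that can sleep.
 */
struct fake_obj *fake_grab(struct fake_obj *obj)
{
    int old = atomic_fetch_add(&obj->count, 1);

    assert(old > 0); /* the caller must already hold a reference */
    return obj;
}

/* Drop one reference; returns 1 when the last reference went away. */
int fake_put(struct fake_obj *obj)
{
    return atomic_fetch_sub(&obj->count, 1) == 1;
}
```

Because nothing in fake_grab() can block, the same shape is safe under a spinlock, which is the property the hibernation code needs.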

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13827

Signed-off-by: Alan Jenkins
Signed-off-by: Rafael J. Wysocki

Alan Jenkins
2009-07-30 03:07:55 +0800

12 Jun, 2009

5 commits

60b0680fa vfs: Rename fsync_super() to sync_filesystem() (version 4)

Rename the function so that it better describes what it really does. Also
remove the unnecessary include of buffer_head.h.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:04 +0800
5cee5815d vfs: Make sys_sync() use fsync_super() (version 4)

It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.

A nice bonus is that we get complete livelock avoidance and write_supers()
is now only used for periodic writeback of superblocks.

sync_blockdevs() introduced a couple of patches ago is gone now.

[build fixes folded]

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:03 +0800
429479f03 vfs: Make __fsync_super() a static function (version 4)

__fsync_super() does the same thing as fsync_super(). So change the only
caller to use fsync_super() and make __fsync_super() static. This removes
an unnecessarily duplicated call to sync_blockdev() and prepares the ground
for the changes to __fsync_super() in the following patches.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:03 +0800
512626a04 Merge branch 'for-linus' of git://linux-arm.org/linux-2.6

* 'for-linus' of git://linux-arm.org/linux-2.6:
kmemleak: Add the corresponding MAINTAINERS entry
kmemleak: Simple testing module for kmemleak
kmemleak: Enable the building of the memory leak detector
kmemleak: Remove some of the kmemleak false positives
kmemleak: Add modules support
kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
kmemleak: Add the vmalloc memory allocation/freeing hooks
kmemleak: Add the slub memory allocation/freeing hooks
kmemleak: Add the slob memory allocation/freeing hooks
kmemleak: Add the slab memory allocation/freeing hooks
kmemleak: Add documentation on the memory leak detector
kmemleak: Add the base support

Manual conflict resolution (with the slab/earlyboot changes) in:
drivers/char/vt.c
init/main.c
mm/slab.c

Linus Torvalds
2009-06-12 05:15:57 +0800
2e1483c99 kmemleak: Remove some of the kmemleak false positives

There are allocations for which the main pointer cannot be found but
they are not memory leaks. This patch fixes some of them. For more
information on false positives, see Documentation/kmemleak.txt.

Signed-off-by: Catalin Marinas

Catalin Marinas
2009-06-12 00:04:18 +0800

05 Jun, 2009

1 commit

172124e22 Revert "block: implement blkdev_readpages"

This reverts commit db2dbb12dc47a50c7a4c5678f526014063e486f6.

It apparently causes problems with partition table read-ahead
on archs with large page sizes. Until that problem is diagnosed
further, just drop the readpages support on block devices.

Signed-off-by: Jens Axboe

Jens Axboe
2009-06-05 04:34:44 +0800

23 May, 2009

1 commit

e1defc4ff block: Do away with the notion of hardsect_size

Until now we have had a 1:1 mapping between storage device physical
block size and the logical block size used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case. The
sector size will be 4KB but the logical block size will remain
512-bytes. Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.
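
The relationship between the two sizes comes down to simple arithmetic. The helpers below are an illustrative sketch with hypothetical names, using the 512-byte logical and 4KB physical sizes from the message as the example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of logical blocks that make up one physical block. */
unsigned int logical_per_physical(unsigned int logical_size,
                                  unsigned int physical_size)
{
    assert(logical_size > 0 && physical_size % logical_size == 0);
    return physical_size / logical_size;
}

/*
 * True when a request starting at logical block lba and spanning
 * nr_blocks logical blocks covers whole physical blocks only.
 */
bool covers_whole_physical_blocks(uint64_t lba, uint64_t nr_blocks,
                                  unsigned int logical_size,
                                  unsigned int physical_size)
{
    unsigned int ratio = logical_per_physical(logical_size, physical_size);

    return lba % ratio == 0 && nr_blocks % ratio == 0;
}
```

With 512-byte logical blocks on a 4KB drive the ratio is 8, so a request starting at logical block 1 touches only part of a physical block even though it is perfectly valid in logical terms.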

Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe

Martin K. Petersen
2009-05-23 05:22:54 +0800

28 Apr, 2009

1 commit

db2dbb12d block: implement blkdev_readpages

Doing a proper block dev ->readpages() speeds up the crazy dump(8)
approach of using interleaved process IO.

Signed-off-by: Jeff Moyer
Signed-off-by: Jens Axboe

Jeff Moyer
2009-04-28 13:37:33 +0800

01 Apr, 2009

1 commit

47e4491b4 Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225

fsync_bdev() export and a bunch of stubs for !CONFIG_BLOCK case had
been left behind

Signed-off-by: Al Viro

Al Viro
2009-04-01 19:07:16 +0800

28 Mar, 2009

1 commit

585d3bc06 fs: move bdev code out of buffer.c

Move some block device related code out from buffer.c and put it in
block_dev.c. I'm trying to move non-buffer_head code out of buffer.c

Signed-off-by: Al Viro

Nick Piggin
2009-03-28 02:44:03 +0800

10 Jan, 2009

1 commit

fcccf5025 filesystem freeze: implement generic freeze feature

The ioctls for the generic freeze feature are below.
o Freeze the filesystem
  int ioctl(int fd, int FIFREEZE, arg)
    fd: The file descriptor of the mountpoint
    FIFREEZE: request code for the freeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1

o Unfreeze the filesystem
  int ioctl(int fd, int FITHAW, arg)
    fd: The file descriptor of the mountpoint
    FITHAW: request code for unfreeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1
    Error number: If the filesystem has already been unfrozen,
                  errno is set to EINVAL.
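
A caller might use the two ioctls roughly as follows. This is a sketch under assumptions: freeze_then_thaw() is a hypothetical helper, the request codes are taken from <linux/fs.h>, and the call needs sufficient privileges on a filesystem that supports freezing.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Freeze the filesystem mounted at mountpoint, then thaw it again.
 * Returns 0 on success, -1 on failure.
 */
int freeze_then_thaw(const char *mountpoint)
{
    int fd, ret = -1;

    assert(mountpoint != NULL);
    fd = open(mountpoint, O_RDONLY);
    if (fd == -1)
        return -1;
    if (ioctl(fd, FIFREEZE, 0) == 0) {
        /* ... a snapshot or backup would be taken here ... */
        ret = ioctl(fd, FITHAW, 0);
    }
    close(fd);
    return ret;
}
```

The third ioctl argument is passed as 0 because, as listed above, it is ignored.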

[akpm@linux-foundation.org: fix CONFIG_BLOCK=n]
Signed-off-by: Takashi Sato
Signed-off-by: Masayuki Hamaguchi
Cc: Christoph Hellwig
Cc: Dave Kleikamp
Cc: Dave Chinner
Cc: Alasdair G Kergon
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Takashi Sato
2009-01-10 08:54:42 +0800

09 Jan, 2009

2 commits

2150edc6c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
ext4: Remove "extents" mount option
block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
ext4: Make printk's consistently prefixed with "EXT4-fs: "
ext4: Add sanity checks for the superblock before mounting the filesystem
ext4: Add mount option to set kjournald's I/O priority
jbd2: Submit writes to the journal using WRITE_SYNC
jbd2: Add pid and journal device name to the "kjournald2 starting" message
ext4: Add markers for better debuggability
ext4: Remove code to create the journal inode
ext4: provide function to release metadata pages under memory pressure
ext3: provide function to release metadata pages under memory pressure
add releasepage hooks to block devices which can be used by file systems
ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
ext4: Init the complete page while building buddy cache
ext4: Don't allow new groups to be added during block allocation
ext4: mark the blocks/inode bitmap beyond end of group as used
ext4: Use new buffer_head flag to check uninit group bitmaps initialization
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
ext4: code cleanup
...

Linus Torvalds
2009-01-09 09:14:59 +0800
d3374825c md: make devices disappear when they are no longer needed.

Currently md devices, once created, never disappear until the module
is unloaded. This is essentially because the gendisk holds a
reference to the mddev, and the mddev holds a reference to the
gendisk, thus a circular reference.

If we drop the reference from mddev to gendisk, then we need to ensure
that the mddev is destroyed when the gendisk is destroyed. However it
is not possible to hook into the gendisk destruction process to enable
this.

So we drop the reference from the gendisk to the mddev and destroy the
gendisk when the mddev gets destroyed. However this has a
complication.
Between the call
   __blkdev_get->get_gendisk->kobj_lookup->md_probe
and the call
   __blkdev_get->md_open

there is no obvious way to hold a reference on the mddev any more, so
unless something is done, it will disappear and gendisk will be
destroyed prematurely.

Also, once we decide to destroy the mddev, there will be an unlockable
moment before the gendisk is unlinked (blk_unregister_region) during
which a new reference to the gendisk can be created. We need to
ensure that this reference can not be used. i.e. the ->open must
fail.

So:
1/ In md_probe we set a flag in the mddev (hold_active) which
   indicates that the array should be treated as active, even
   though there are no references, and no appearance of activity.
   This is cleared by md_release when the device is closed if it
   is no longer needed.
   This ensures that the gendisk will survive between md_probe and
   md_open.

2/ In md_open we check if the mddev we expect to open matches
   the gendisk that we did open.
   If there is a mismatch we return -ERESTARTSYS and modify
   __blkdev_get to retry from the top in that case.
   In the -ERESTARTSYS case we make sure to wait until
   the old gendisk (that we succeeded in opening) is really gone so
   we loop at most once.
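
The retry in step 2 can be sketched as a loop around the open callback. Everything here is hypothetical and in userspace: fake_blkdev_get() and fake_md_open() are illustrative names, and FAKE_ERESTARTSYS stands in for the kernel-internal -ERESTARTSYS code.

```c
#include <assert.h>
#include <stddef.h>

/* ERESTARTSYS is kernel-internal; this constant exists only for the sketch. */
#define FAKE_ERESTARTSYS 512

/* Hypothetical open callback: returns 0 or a negative error. */
typedef int (*fake_open_fn)(void *ctx);

/* Retry "from the top" whenever the open reports a stale gendisk. */
int fake_blkdev_get(fake_open_fn open_fn, void *ctx)
{
    int ret;

    assert(open_fn != NULL);
    do {
        /* the gendisk lookup would be redone here on every pass */
        ret = open_fn(ctx);
    } while (ret == -FAKE_ERESTARTSYS);
    return ret;
}

/* Example callback: asks for a restart exactly once, then succeeds. */
int fake_md_open(void *ctx)
{
    int *calls = ctx;

    return (*calls)++ == 0 ? -FAKE_ERESTARTSYS : 0;
}
```

The loop terminates after one retry only because the callback stops asking for a restart; in the patch that guarantee comes from waiting until the old gendisk is really gone.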

Some udev configurations will always open an md device when it first
appears. If we allow an md device that was just created by an open
to disappear on an immediate close, then this can race with such udev
configurations and result in an infinite loop of the device being opened
and closed, then re-opened due to the 'ADD' event from the first open,
and then closed and so on.
So we make sure an md device, once created by an open, remains active
at least until some md 'ioctl' has been made on it. This means that
all normal usage of md devices will allow them to disappear promptly
when not needed, but the worst that an incorrect usage will do is
cause an inactive md device to be left in existence (it can easily be
removed).

As an array can be stopped by writing to a sysfs attribute
   echo clear > /sys/block/mdXXX/md/array_state
we need to use scheduled work for deleting the gendisk and other
kobjects. This allows us to wait for any pending gendisk deletion to
complete by simply calling flush_scheduled_work().

Signed-off-by: NeilBrown

NeilBrown
2009-01-09 05:31:10 +0800

07 Jan, 2009

1 commit

94e2959e7 fs: fix function param name in kernel-doc

Fix function parameter name in kernel-doc:

Warning(linux-2.6.28-git5//fs/block_dev.c:1272): No description found for parameter 'pathname'
Warning(linux-2.6.28-git5//fs/block_dev.c:1272): Excess function parameter 'path' description in 'lookup_bdev'

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2009-01-07 07:59:14 +0800

03 Jan, 2009

1 commit

87d8fe1ee add releasepage hooks to block devices which can be used by file systems

Implement blkdev_releasepage() to release the buffer_heads and pages
after we release private data belonging to a mounted filesystem.

Cc: Toshiyuki Okajima
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2009-01-03 22:47:09 +0800