24 Sep, 2009
1 commit
-
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...
23 Sep, 2009
1 commit
-
Unlike on most other architectures ino_t is an unsigned int on s390. So
add an explicit cast to avoid this compile warning:

fs/ext2/namei.c: In function 'ext2_lookup':
fs/ext2/namei.c:73: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'

Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
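
The cast described in the commit above can be illustrated in user space. This is only a sketch: format_inode() is a hypothetical helper, not the code in fs/ext2/namei.c. It shows the general idea of casting an ino_t to match a "%lu" conversion.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not kernel code): formats an inode number.
 * ino_t is not guaranteed to be unsigned long on every architecture,
 * so cast explicitly to match the "%lu" conversion. */
int format_inode(char *buf, size_t len, ino_t ino)
{
    return snprintf(buf, len, "inode %lu", (unsigned long)ino);
}
```

Without the cast, the compiler is free to warn wherever ino_t is narrower than unsigned long, which is the situation the commit message reports for s390.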
22 Sep, 2009
1 commit
-
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Sep, 2009
1 commit
-
Enable removing of corrupted pages through truncation
for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
These should cover most server needs.

I chose the set of migration aware file systems for this
for now, assuming they have been especially audited.
But in general it should be safe for all file systems
on the data area that support read/write and truncate.

Caveat: the hardware error handler does not take i_mutex
for now before calling the truncate function. Is that ok?

Cc: tytso@mit.edu
Cc: hch@infradead.org
Cc: mfasheh@suse.com
Cc: aia21@cantab.net
Cc: hugh.dickins@tiscali.co.uk
Cc: swhiteho@redhat.com
Signed-off-by: Andi Kleen
14 Sep, 2009
1 commit
-
We rely on generic_write_sync() now.
CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara
09 Sep, 2009
1 commit
-
Don't implement per-filesystem 'extX_permission()' functions that have
to be called for every path component operation, and instead just expose
the actual ACL checking so that the VFS layer can now do it for us.

Reviewed-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: Linus Torvalds
06 Sep, 2009
1 commit
-
In ext2_rename(), dir_page is acquired through ext2_dotdot(). It is
then released through ext2_set_link() but only if old_dir != new_dir.
Failing that, the pkmap reference count is never decremented and the
page remains pinned forever. Repeat that a couple of times with highmem
pages and all pkmap slots get exhausted, and every further kmap() call
ends up stalling on the pkmap_map_wait queue, at which point the whole
system comes to a halt.

Signed-off-by: Nicolas Pitre
Acked-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
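
The leak described above is a pin taken on one path and released only on one branch. A toy model, assuming nothing about the real ext2_rename() beyond what the message says: struct toy_page and the toy_* functions are hypothetical, and 'pins' merely stands in for the pkmap reference count.

```c
/* Toy model: 'pins' stands in for the pkmap reference count. */
struct toy_page { int pins; };

static void toy_pin(struct toy_page *p)   { p->pins++; }
static void toy_unpin(struct toy_page *p) { p->pins--; }

/* Hypothetical rename step. The ".." page is pinned up front and must be
 * released on both paths, not only when the parent directory changes. */
int toy_rename(struct toy_page *dotdot, int same_dir)
{
    toy_pin(dotdot);          /* models acquiring dir_page */
    if (!same_dir) {
        /* cross-directory move: ".." is rewritten, then the page released */
        toy_unpin(dotdot);
    } else {
        /* same-directory move: nothing to rewrite, but still release */
        toy_unpin(dotdot);
    }
    return dotdot->pins;      /* 0 means nothing stays pinned */
}
```

The buggy behaviour corresponds to dropping the else branch: every same-directory rename would then leave one pin behind.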
13 Jul, 2009
1 commit
-
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
01 Jul, 2009
1 commit
-
ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly. However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode. This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.

The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.

This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
from ext2_iget(), as ext2 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.

Signed-off-by: Bryan Donlan
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
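
The error translation described above is small enough to sketch. lookup_error() is a hypothetical stand-in for the check in the lookup path. It omits the ext*_error reporting that the commit also adds.

```c
#include <errno.h>

/* Hypothetical helper modelling the lookup-side translation: a deleted
 * inode reached through a directory entry indicates on-disk corruption,
 * so report -EIO instead of passing -ESTALE on to userspace. */
int lookup_error(int iget_err)
{
    if (iget_err == -ESTALE)
        return -EIO;
    return iget_err;   /* other errors (and success) pass through */
}
```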
24 Jun, 2009
2 commits
-
helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl),
forget_cached_acl(inode, type).

ubifs/xattr.c needed includes reordered, the rest is a plain switchover.
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
19 Jun, 2009
1 commit
-
One of our users is complaining that his backup tool is upset on ext2
(while it's happy on ext3, xfs, ...) because of the mtime change.

The problem is:

    mkdir foo
    mkdir bar
    mkdir foo/a

Now under ext2:

    mv foo/a foo/b

changes mtime of 'foo/a' (foo/b after the move). That does not really
make sense and it does not happen under any other filesystem I've seen.

More complicated is:

    mv foo/a bar/a

This changes mtime of foo/a (bar/a after the move) and it makes some
sense since we had to update parent directory pointer of foo/a. But
again, no other filesystem does this. So after some thoughts I'd vote
for consistency and change ext2 to behave the same as other filesystems.

Do not update mtime of a moved directory. Specs don't say anything
about it (neither that it should, nor that it should not be updated) and
other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do
it. So let's become more consistent.

Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel.
Reported-by:
Cc:
Cc: Jörn Engel
Signed-off-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Jun, 2009
1 commit
-
Signed-off-by: Ali Gholami Rudi
Signed-off-by: Jiri Kosina
12 Jun, 2009
5 commits
-
Add a ->sync_fs method for data integrity syncs, and reimplement
->write_super on top of it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
kill ext2_sync_file() (along with ext2/fsync.c), get rid of
ext2_update_inode() - it's an alias of ext2_write_inode().

Signed-off-by: Al Viro
-
[xfs, btrfs, capifs, shmem don't need BKL, exempt]
Signed-off-by: Alessio Igor Bogani
Signed-off-by: Al Viro
-
Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.

[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
We just did a full fs writeout using sync_filesystem before, and if
that's not enough for the filesystem it can perform its own writeout
in ->put_super, which many filesystems already do.

Move a call to foofs_write_super into every foofs_put_super for now to
guarantee identical behaviour until it's cleaned up by the individual
filesystem maintainers.

Exceptions:
- affs already has identical copy & pasted code at the beginning of
  affs_put_super so no need to do it twice.
- xfs does the right thing without it and I have changes pending for
  the xfs tree touching this area so I don't really need conflicts
  here..

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
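
The pattern described in the last commit above, a filesystem doing its own superblock writeout from its put_super, can be modelled as follows. The toy_* names are hypothetical, and gating the write on a dirty flag is an assumption of this sketch. The message itself only says the write_super call moves into put_super.

```c
/* Toy superblock: 'dirty' marks pending changes, 'writes' counts writeouts. */
struct toy_sb { int dirty; int writes; };

void toy_write_super(struct toy_sb *sb)
{
    sb->writes++;
    sb->dirty = 0;
}

/* Hypothetical put_super: flush the superblock itself (assumed here to
 * happen only when dirty) before tearing anything down. */
void toy_put_super(struct toy_sb *sb)
{
    if (sb->dirty)
        toy_write_super(sb);
    /* filesystem-specific teardown would follow here */
}
```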
18 May, 2009
1 commit
-
Signed-off-by: Manish Katiyar
Signed-off-by: "Theodore Ts'o"
27 Apr, 2009
1 commit
-
The inode->i_mutex should be unlocked.
Found by smatch (http://repo.or.cz/w/smatch.git). Compile tested.
Signed-off-by: Dan Carpenter
Signed-off-by: Jan Kara
14 Apr, 2009
1 commit
-
If two writers allocating blocks to file race with each other (e.g.
because writepages races with ordinary write or two writepages race with
each other), ext2_getblock() can be called on the same inode in parallel.
Before we are going to allocate new blocks, we have to recheck the block
chain we have obtained so far without holding truncate_mutex. Otherwise
we could overwrite the indirect block pointer set by the other writer
leading to data loss.

The below test program by Ying is able to reproduce the data loss with ext2
on BRD in a few minutes if the machine is under memory pressure:

/* headers added so the program compiles standalone */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

long kMemSize = 50 << 20;
int kPageSize = 4096;

int main(int argc, char **argv) {
	int status;
	int count = 0;
	int i;
	char *fname = "/mnt/test.mmap";
	char *mem;
	unlink(fname);
	int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
	status = ftruncate(fd, kMemSize);
	mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	// Fill the memory with 1s.
	memset(mem, 1, kMemSize);
	sleep(2);
	// Count pages whose first byte was lost (read back as 0).
	for (i = 0; i < kMemSize; i++) {
		int byte_good = mem[i] != 0;
		if (!byte_good && ((i % kPageSize) == 0)) {
			//printf("%d ", i / kPageSize);
			count++;
		}
	}
	munmap(mem, kMemSize);
	close(fd);
	unlink(fname);

	if (count > 0) {
		printf("Running %d bad page\n", count);
		return 1;
	}
	return 0;
}

Cc: Ying Han
Cc: Nick Piggin
Signed-off-by: Jan Kara
Cc: Mingming Cao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
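
The recheck described above follows a common pattern: take a snapshot without the lock, then confirm it is still valid after acquiring the lock and before writing. A single-threaded toy model: toy_alloc_if_unset() is hypothetical and the lock itself is left out, so the caller is assumed to hold it.

```c
/* Toy model of "recheck before allocating". 'snapshot' is the value of
 * *slot seen before the lock was taken; the caller is assumed to hold the
 * lock now. Returns 1 if new_block was installed, 0 if another writer got
 * there first, in which case that writer's pointer is kept. */
int toy_alloc_if_unset(unsigned *slot, unsigned snapshot, unsigned new_block)
{
    if (*slot != snapshot)
        return 0;          /* chain changed since the unlocked look */
    *slot = new_block;
    return 1;
}
```

Skipping the comparison and storing new_block unconditionally is the overwrite that the commit message identifies as the source of data loss.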
01 Apr, 2009
1 commit
-
current->fs->umask is what most of fs_struct users are doing.
Put that into a helper function.

Signed-off-by: Al Viro
26 Mar, 2009
2 commits
-
ext2_quota_read() doesn't initialize tmp_bh.b_size before calling
ext2_get_block() where we access it. Since it is a local variable it
might contain some garbage. Make sure it is filled with a reasonable
value before passing it.

Signed-off-by: Manish Katiyar
Signed-off-by: Jan Kara
-
Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara
CC: linux-ext4@vger.kernel.org
12 Feb, 2009
1 commit
-
For a reason that I was unable to understand in three months of debugging,
mount ext2 -o remount stopped working properly when remounting from
regular operation to xip, or the other way around. According to a git
bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
rework in the vm:

commit 70688e4dd1647f0ceb502bbd5964fa344c5eb411
Author: Nick Piggin
Date: Mon Apr 28 02:13:02 2008 -0700

    xip: support non-struct page backed memory

In the failing scenario, the filesystem is mounted read only via root=
kernel parameter on s390x. During remount (in rc.sysinit), the inodes of
the bash binary and its libraries are busy and cannot be invalidated (the
bash which is running rc.sysinit resides on subject filesystem).
Afterwards, another bash process (running ifup-eth) recurses into a
subshell, runs dup_mm (via fork). Some of the mappings in this bash
process were created from inodes that could not be invalidated during
remount.

Both parent and child process crash some time later due to inconsistencies
in their address spaces. The issue seems to be timing sensitive, various
attempts to recreate it have failed.

This patch refuses to change the xip flag during remount in case some
inodes cannot be invalidated. This patch keeps users from running into
that issue.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Carsten Otte
Cc: Nick Piggin
Cc: Jared Hulbert
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
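
The remount rule described above, refusing to flip the xip flag while inodes are still busy, can be sketched as a pure function. TOY_MOUNT_XIP and toy_remount_opts() are hypothetical, and the bit value is illustrative rather than the kernel's.

```c
#define TOY_MOUNT_XIP 0x1u   /* illustrative bit, not the kernel's value */

/* Returns the mount options to keep after a remount request.
 * inodes_busy is nonzero when some inodes could not be invalidated. */
unsigned toy_remount_opts(unsigned old_opts, unsigned new_opts, int inodes_busy)
{
    unsigned xip_changes = (old_opts ^ new_opts) & TOY_MOUNT_XIP;

    if (xip_changes && inodes_busy) {
        /* refuse the xip change: restore the old bit, keep other options */
        new_opts &= ~TOY_MOUNT_XIP;
        new_opts |= old_opts & TOY_MOUNT_XIP;
    }
    return new_opts;
}
```

Only the xip bit is rolled back in this sketch. Any other option changes in the same remount request still take effect.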
16 Jan, 2009
1 commit
-
We used to just write changed page for IS_DIRSYNC inodes. But we also
have to update the directory inode itself just for the case that we've
allocated a new block and changed i_size.

[akpm@linux-foundation.org: still sync the data page]
Signed-off-by: Jan Kara
Tested-by: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Jan, 2009
4 commits
-
At the moment there are few restrictions on which flags may be set on
which inodes. Specifically DIRSYNC may only be set on directories and
IMMUTABLE and APPEND may not be set on links. Tighten that to disallow
TOPDIR being set on non-directories and only NODUMP and NOATIME to be set
on non-regular file, non-directories.

Introduces a flags masking function which masks flags based on mode and
use it during inode creation and when flags are set via the ioctl to
facilitate future consistency.

Signed-off-by: Duane Griffin
Acked-by: Andreas Dilger
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
At present BTREE/INDEX is the only flag that new ext2 inodes do NOT
inherit from their parent. In addition prevent the flags DIRTY, ECOMPR,
INDEX, IMAGIC and TOPDIR from being inherited. List inheritable flags
explicitly to prevent future flags from accidentally being inherited.

This fixes the TOPDIR flag inheritance bug reported at
http://bugzilla.kernel.org/show_bug.cgi?id=9866.

Signed-off-by: Duane Griffin
Acked-by: Andreas Dilger
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes on 64-bit
which makes it a very bad fit for SLAB allocators. The culprit of the
wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when
NR_CPUS >= 32.

To fix that, allocate ->s_blockgroup_lock, which fits nicely in an order 2
page in the worst case, separately. This shrinks down struct ext2_sb_info
enough to fit a 1 KB slab cache so now we allocate 16 KB + 1 KB instead of
32 KB saving 15 KB of memory.

Acked-by: Andreas Dilger
Signed-off-by: Pekka Enberg
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There is no argument named @chain in ext2_splice_branch, remove references
to it.

Signed-off-by: Qinghuang Feng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
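
The flag-masking rule from the first of the 09 Jan, 2009 commits above (directories keep everything, regular files lose DIRSYNC and TOPDIR, everything else keeps only NODUMP and NOATIME) can be sketched as below. The enum, the toy_mask_flags() name and the numeric flag values are assumptions made for illustration. They are not taken from the source.

```c
/* Illustrative flag bits; treat the numeric values as assumptions. */
#define TOY_NODUMP_FL   0x00000040u
#define TOY_NOATIME_FL  0x00000080u
#define TOY_DIRSYNC_FL  0x00010000u
#define TOY_TOPDIR_FL   0x00020000u

enum toy_type { TOY_DIR, TOY_REG, TOY_OTHER };

/* Hypothetical masking function: keep only the flags that make sense
 * for the given kind of inode. */
unsigned toy_mask_flags(enum toy_type type, unsigned flags)
{
    if (type == TOY_DIR)
        return flags;                                     /* all allowed */
    if (type == TOY_REG)
        return flags & ~(TOY_DIRSYNC_FL | TOY_TOPDIR_FL); /* no dir-only */
    return flags & (TOY_NODUMP_FL | TOY_NOATIME_FL);      /* minimal set */
}
```

Routing both inode creation and the ioctl path through one function like this keeps the two from drifting apart, which is the consistency argument the commit message makes.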
01 Jan, 2009
2 commits
-
* make ext2_new_inode() put the inode into icache in locked state
* do not unlock until the inode is fully set up; otherwise nfsd
might pick it in half-baked state.
* make sure that ext2_new_inode() does *not* lead to two inodes with the
same inumber hashed at the same time; otherwise a bogus fhandle coming
  from nfsd might race with inode creation:

nfsd: iget_locked() creates inode
nfsd: try to read from disk, block on that.
ext2_new_inode(): allocate inode with that inumber
ext2_new_inode(): insert it into icache, set it up and dirty
ext2_write_inode(): get the relevant part of inode table in cache,
    set the entry for our inode (and start writing to disk)
nfsd: get CPU again, look into inode table, see nice and sane on-disk
    inode, set the in-core inode from it

oops - we have two in-core inodes with the same inumber live in icache,
both used for IO. Welcome to fs corruption...

Signed-off-by: Al Viro
-
Ensure fast symlink targets are NUL-terminated, even if corrupted
on-disk.

Cc: Andrew Morton
Signed-off-by: Duane Griffin
Signed-off-by: Al Viro
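
Forcing termination regardless of what the on-disk length claims can be sketched with a clamped index. toy_terminate_link() is a hypothetical user-space helper, not the kernel routine used by the patch.

```c
#include <stddef.h>

/* Writes a terminator at 'len', clamped to 'maxlen', so that a corrupted
 * length can never leave the buffer unterminated.
 * buf must provide at least maxlen + 1 bytes. */
void toy_terminate_link(char *buf, size_t len, size_t maxlen)
{
    buf[len < maxlen ? len : maxlen] = '\0';
}
```

With an 8-byte buffer and maxlen 7, a bogus length of 1000 still ends up writing the terminator at index 7.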
14 Nov, 2008
1 commit
-
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: linux-ext4@vger.kernel.org
Signed-off-by: James Morris
24 Oct, 2008
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
[PATCH] kill the rest of struct file propagation in block ioctls
[PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
[PATCH] get rid of blkdev_locked_ioctl()
[PATCH] get rid of blkdev_driver_ioctl()
[PATCH] sanitize blkdev_get() and friends
[PATCH] remember mode of reiserfs journal
[PATCH] propagate mode through swsusp_close()
[PATCH] propagate mode through open_bdev_excl/close_bdev_excl
[PATCH] pass fmode_t to blkdev_put()
[PATCH] kill the unused bsize on the send side of /dev/loop
[PATCH] trim file propagation in block/compat_ioctl.c
[PATCH] end of methods switch: remove the old ones
[PATCH] switch sr
[PATCH] switch sd
[PATCH] switch ide-scsi
[PATCH] switch tape_block
[PATCH] switch dcssblk
[PATCH] switch dasd
[PATCH] switch mtd_blkdevs
[PATCH] switch mmc
...
23 Oct, 2008
2 commits
-
Signed-off-by: Al Viro
-
Switch all users of d_alloc_anon to d_obtain_alias.
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
21 Oct, 2008
2 commits
-
Signed-off-by: Al Viro
-
Use fs/*/Kconfig more, which is good because everything related to one
filesystem is in one place and fs/Kconfig is quite fat.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
17 Oct, 2008
2 commits
-
A very large directory with many read failures (either due to storage
problems, or due to invalid size & blocks from corruption) will generate a
printk storm as the filesystem continues to try to read all the blocks.
This flood of messages can tie up the box until it is complete - which may
be a very long time, especially for very large corrupted values.

This is fixed by only reporting the corruption once each time we try to
read the directory.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Cc: Eugene Teo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We could run into ENOSPC error on ext2, even when there are free blocks on
the filesystem.

The problem is triggered in the case the goal block group has 0 free
blocks, and the rest of the block groups are skipped due to the check of
"free_blocks < windowsz/2". Current code could fall back to non
reservation allocation to prevent early ENOSPC after examining all the block
groups with reservation on, but this code was bypassed if the reservation
window is turned off already, which is true in this case.

This patch fixes two issues:

1) We don't need to turn off block reservation if the goal block group has
0 free blocks left; we should continue searching the rest of the block groups.

The intention of the current code is to turn off the block reservation if the
goal allocation group has a few (some) free blocks left (not enough to
make the desired reservation window), to try to allocate in the goal
block group, to get better locality. But if the goal block group has 0 free
blocks, it should leave the block reservation on and continue searching
the next block groups, rather than turn off block reservation completely.

2) We don't need to check the window size if the block reservation is off.

The problem was originally found and fixed in ext4.
Signed-off-by: Mingming Cao
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
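
The two points of the ENOSPC fix above can be read as two small predicates. This is a sketch under stated assumptions: the toy_* names are hypothetical, has_rsv stands for "a reservation window is in use", and the comparison mirrors the "free_blocks < windowsz/2" check quoted in the message rather than the actual allocator code.

```c
/* Point 1: drop the reservation only when the goal group has some free
 * blocks but fewer than a full window. With zero free blocks, keep the
 * reservation and move on to the next group. */
int toy_drop_reservation(int has_rsv, unsigned free_blocks, unsigned windowsz)
{
    return has_rsv && free_blocks > 0 && free_blocks < windowsz;
}

/* Point 2: the window-size check applies only while a reservation is on.
 * A group with no free blocks is skipped either way. */
int toy_skip_group(int has_rsv, unsigned free_blocks, unsigned windowsz)
{
    if (free_blocks == 0)
        return 1;
    return has_rsv && free_blocks < windowsz / 2;
}
```

With the reservation off, a group holding even a single free block is no longer skipped, which is how this sketch avoids the early ENOSPC described in the message.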