13 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

06 Mar, 2010

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     
  • This gives the filesystem more information about the writeback that
    is happening. Trond requested this for the NFS unstable write handling,
    and other filesystems might benefit from this too by beeing able to
    distinguish between the different callers in more detail.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

05 Mar, 2010

7 commits

  • Get rid of the initialize dquot operation - it is now always called from
    the filesystem and if a filesystem really needs it's own (which none
    currently does) it can just call into it's own routine directly.

    Rename the now static low-level dquot_initialize helper to __dquot_initialize
    and vfs_dq_init to dquot_initialize to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently various places in the VFS call vfs_dq_init directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the initialization. For most metadata operations
    this is a straight forward move into the methods, but for truncate and
    open it's a bit more complicated.

    For truncate we currently only call vfs_dq_init for the sys_truncate case
    because open already takes care of it for ftruncate and open(O_TRUNC) - the
    new code causes an additional vfs_dq_init for those which is harmless.

    For open the initialization is moved from do_filp_open into the open method,
    which means it happens slightly earlier now, and only for regular files.
    The latter is fine because we don't need to initialize it for operations
    on special files, and we already do it as part of the namespace operations
    for directories.

    Add a dquot_file_open helper that filesystems that support generic quotas
    can use to fill in ->open.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the drop dquot operation - it is now always called from
    the filesystem and if a filesystem really needs it's own (which none
    currently does) it can just call into it's own routine directly.

    Rename the now static low-level dquot_drop helper to __dquot_drop
    and vfs_dq_drop to dquot_drop to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently clear_inode calls vfs_dq_drop directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the drop inside the ->clear_inode
    superblock operation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the transfer dquot operation - it is now always called from
    the filesystem and if a filesystem really needs it's own (which none
    currently does) it can just call into it's own routine directly.

    Rename the now static low-level dquot_transfer helper to __dquot_transfer
    and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
    and make the new dquot_transfer return a normal negative errno value
    which all callers expect.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the alloc_inode and free_inode dquot operations - they are
    always called from the filesystem and if a filesystem really needs
    their own (which none currently does) it can just call into it's
    own routine directly.

    Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
    call the lowlevel dquot_alloc_inode / dqout_free_inode routines
    directly, which now lose the number argument which is always 1.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the alloc_space, free_space, reserve_space, claim_space and
    release_rsv dquot operations - they are always called from the filesystem
    and if a filesystem really needs their own (which none currently does)
    it can just call into it's own routine directly.

    Move shared logic into the common __dquot_alloc_space,
    dquot_claim_space_nodirty and __dquot_free_space low-level methods,
    and rationalize the wrappers around it to move as much as possible
    code into the common block for CONFIG_QUOTA vs not. Also rename
    all these helpers to be named dquot_* instead of vfs_dq_*.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

17 Dec, 2009

2 commits

  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
    direct I/O fallback sync simplification
    ocfs: stop using do_sync_mapping_range
    cleanup blockdev_direct_IO locking
    make generic_acl slightly more generic
    sanitize xattr handler prototypes
    libfs: move EXPORT_SYMBOL for d_alloc_name
    vfs: force reval of target when following LAST_BIND symlinks (try #7)
    ima: limit imbalance msg
    Untangling ima mess, part 3: kill dead code in ima
    Untangling ima mess, part 2: deal with counters
    Untangling ima mess, part 1: alloc_file()
    O_TRUNC open shouldn't fail after file truncation
    ima: call ima_inode_free ima_inode_free
    IMA: clean up the IMA counts updating code
    ima: only insert at inode creation time
    ima: valid return code from ima_inode_alloc
    fs: move get_empty_filp() deffinition to internal.h
    Sanitize exec_permission_lite()
    Kill cached_lookup() and real_lookup()
    Kill path_lookup_open()
    ...

    Trivial conflicts in fs/direct-io.c

    Linus Torvalds
     
  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry to allow
    using the handlers mechnism for filesystems that require it later,
    e.g. cifs.

    [with GFS2 bits updated by Steven Whitehouse ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     

16 Dec, 2009

2 commits

  • When an IO error happens while writing metadata buffers, we should better
    report it and call ext2_error since the filesystem is probably no longer
    consistent. Sometimes such IO errors happen while flushing thread does
    background writeback, the buffer gets later evicted from memory, and thus
    the only trace of the error remains as AS_EIO bit set in blockdevice's
    mapping. So we check this bit in ext2_fsync and report the error although
    we cannot be really sure which buffer we failed to write.

    Signed-off-by: Jan Kara
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • This fixes a common warning reported by kerneloops.org

    [Kernel summit hacking hour]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     

10 Dec, 2009

3 commits

  • Signed-off-by: Jérémy Cochoy
    Signed-off-by: Jan Kara

    Jérémy Cochoy
     
  • This fixes a WARN backtrace in mark_buffer_dirty() that occurs during
    unmount when a USB or floppy device is removed. I reported this a kernel
    regression, but looks like it might have been there for longer
    than that.

    The super block update from a previous operation has marked the buffer
    as in error, and the flag has to be cleared before doing the update.
    (Similar code already exists in ext4).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Jan Kara

    Stephen Hemminger
     
  • make messages produced by ext2 more unified. It should be
    easy to parse.

    dmesg before patch:
    [ 4893.684892] reservations ON
    [ 4893.684896] xip option not supported
    [ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2
    [ 4893.684964] EXT2-fs warning: maximal mount count reached, running
    e2fsck is recommended
    [ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2,
    bpg=8192, ipg=1280, mo=80010]

    dmesg after patch:
    [ 4893.684892] EXT2-fs (loop0): reservations ON
    [ 4893.684896] EXT2-fs (loop0): xip option not supported
    [ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as
    ext2
    [ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached,
    running e2fsck is recommended
    [ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2,
    bpg=8192, ipg=1280, mo=80010]

    Signed-off-by: Alexey Fisher
    Reviewed-by: Andreas Dilger
    Signed-off-by: Jan Kara

    Alexey Fisher
     

24 Sep, 2009

1 commit

  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
    HWPOISON: Enable error_remove_page on btrfs
    HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
    HWPOISON: Add madvise() based injector for hardware poisoned pages v4
    HWPOISON: Enable error_remove_page for NFS
    HWPOISON: Enable .remove_error_page for migration aware file systems
    HWPOISON: The high level memory error handler in the VM v7
    HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
    HWPOISON: shmem: call set_page_dirty() with locked page
    HWPOISON: Define a new error_remove_page address space op for async truncation
    HWPOISON: Add invalidate_inode_page
    HWPOISON: Refactor truncate to allow direct truncating of page v2
    HWPOISON: check and isolate corrupted free pages v2
    HWPOISON: Handle hardware poisoned pages in try_to_unmap
    HWPOISON: Use bitmask/action code for try_to_unmap behaviour
    HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
    HWPOISON: Add poison check to page fault handling
    HWPOISON: Add basic support for poisoned pages in fault handler v3
    HWPOISON: Add new SIGBUS error codes for hardware poison signals
    HWPOISON: Add support for poison swap entries v2
    HWPOISON: Export some rmap vma locking to outside world
    ...

    Linus Torvalds
     

23 Sep, 2009

1 commit

  • Unlike on most other architectures ino_t is an unsigned int on s390. So
    add an explicit cast to avoid this compile warning:

    fs/ext2/namei.c: In function 'ext2_lookup':
    fs/ext2/namei.c:73: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'

    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

22 Sep, 2009

1 commit


16 Sep, 2009

1 commit

  • Enable removing of corrupted pages through truncation
    for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
    These should cover most server needs.

    I chose the set of migration aware file systems for this
    for now, assuming they have been especially audited.
    But in general it should be safe for all file systems
    on the data area that support read/write and truncate.

    Caveat: the hardware error handler does not take i_mutex
    for now before calling the truncate function. Is that ok?

    Cc: tytso@mit.edu
    Cc: hch@infradead.org
    Cc: mfasheh@suse.com
    Cc: aia21@cantab.net
    Cc: hugh.dickins@tiscali.co.uk
    Cc: swhiteho@redhat.com
    Signed-off-by: Andi Kleen

    Andi Kleen
     

14 Sep, 2009

1 commit


09 Sep, 2009

1 commit


06 Sep, 2009

1 commit

  • In ext2_rename(), dir_page is acquired through ext2_dotdot(). It is
    then released through ext2_set_link() but only if old_dir != new_dir.
    Failing that, the pkmap reference count is never decremented and the
    page remains pinned forever. Repeat that a couple times with highmem
    pages and all pkmap slots get exhausted, and every further kmap() calls
    end up stalling on the pkmap_map_wait queue at which point the whole
    system comes to a halt.

    Signed-off-by: Nicolas Pitre
    Acked-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Nicolas Pitre
     

13 Jul, 2009

1 commit

  • * Remove smp_lock.h from files which don't need it (including some headers!)
    * Add smp_lock.h to files which do need it
    * Make smp_lock.h include conditional in hardirq.h
    It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

    This will make hardirq.h inclusion cheaper for every PREEMPT=n config
    (which includes allmodconfig/allyesconfig, BTW)

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

01 Jul, 2009

1 commit

  • ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
    report errors to NFS properly. However, in ext[234]_lookup(), this
    -ESTALE can be propagated to userspace if the filesystem is corrupted such
    that a directory entry references a deleted inode. This leads to a
    misleading error message - "Stale NFS file handle" - and confusion on the
    part of the admin.

    The bug can be easily reproduced by creating a new filesystem, making a
    link to an unused inode using debugfs, then mounting and attempting to ls
    -l said link.

    This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
    from ext2_iget(), as ext2 does for other filesystem metadata corruption;
    and also invokes the appropriate ext*_error functions when this case is
    detected.

    Signed-off-by: Bryan Donlan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryan Donlan
     

24 Jun, 2009

2 commits


19 Jun, 2009

1 commit

  • One of our users is complaining that his backup tool is upset on ext2
    (while it's happy on ext3, xfs, ...) because of the mtime change.

    The problem is:

    mkdir foo
    mkdir bar
    mkdir foo/a

    Now under ext2:
    mv foo/a foo/b

    changes mtime of 'foo/a' (foo/b after the move). That does not really
    make sense and it does not happen under any other filesystem I've seen.

    More complicated is:
    mv foo/a bar/a

    This changes mtime of foo/a (bar/a after the move) and it makes some
    sense since we had to update parent directory pointer of foo/a. But
    again, no other filesystem does this. So after some thoughts I'd vote
    for consistency and change ext2 to behave the same as other filesystems.

    Do not update mtime of a moved directory. Specs don't say anything
    about it (neither that it should, nor that it should not be updated) and
    other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do
    it. So let's become more consistent.

    Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel.

    Reported-by:
    Cc:
    Cc: Jörn Engel
    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

13 Jun, 2009

1 commit


12 Jun, 2009

5 commits

  • Add a ->sync_fs method for data integrity syncs, and reimplement
    ->write_super ontop of it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • kill ext2_sync_file() (along with ext2/fsync.c), get rid of
    ext2_update_inode() - it's an alias of ext2_write_inode().

    Signed-off-by: Al Viro

    Al Viro
     
  • [xfs, btrfs, capifs, shmem don't need BKL, exempt]

    Signed-off-by: Alessio Igor Bogani
    Signed-off-by: Al Viro

    Alessio Igor Bogani
     
  • Move BKL into ->put_super from the only caller. A couple of
    filesystems had trivial enough ->put_super (only kfree and NULLing of
    s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
    hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
    of them probably don't need it, but I'd rather sort that out individually.
    Preferably after all the other BKL pushdowns in that area.

    [AV: original used to move lock_super() down as well; these changes are
    removed since we don't do lock_super() at all in generic_shutdown_super()
    now]
    [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • We just did a full fs writeout using sync_filesystem before, and if
    that's not enough for the filesystem it can perform it's own writeout
    in ->put_super, which many filesystems already do.

    Move a call to foofs_write_super into every foofs_put_super for now to
    guarantee identical behaviour until it's cleaned up by the individual
    filesystem maintainers.

    Exceptions:

    - affs already has identical copy & pasted code at the beginning of
    affs_put_super so no need to do it twice.
    - xfs does the right thing without it and I have changes pending for
    the xfs tree touching this are so I don't really need conflicts
    here..

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

18 May, 2009

1 commit


27 Apr, 2009

1 commit


14 Apr, 2009

1 commit

  • If two writers allocating blocks to file race with each other (e.g.
    because writepages races with ordinary write or two writepages race with
    each other), ext2_getblock() can be called on the same inode in parallel.
    Before we are going to allocate new blocks, we have to recheck the block
    chain we have obtained so far without holding truncate_mutex. Otherwise
    we could overwrite the indirect block pointer set by the other writer
    leading to data loss.

    The below test program by Ying is able to reproduce the data loss with ext2
    on in BRD in a few minutes if the machine is under memory pressure:

    long kMemSize = 50 << 20;
    int kPageSize = 4096;

    int main(int argc, char **argv) {
    int status;
    int count = 0;
    int i;
    char *fname = "/mnt/test.mmap";
    char *mem;
    unlink(fname);
    int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
    status = ftruncate(fd, kMemSize);
    mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    // Fill the memory with 1s.
    memset(mem, 1, kMemSize);
    sleep(2);
    for (i = 0; i < kMemSize; i++) {
    int byte_good = mem[i] != 0;
    if (!byte_good && ((i % kPageSize) == 0)) {
    //printf("%d ", i / kPageSize);
    count++;
    }
    }
    munmap(mem, kMemSize);
    close(fd);
    unlink(fname);

    if (count > 0) {
    printf("Running %d bad page\n", count);
    return 1;
    }
    return 0;
    }

    Cc: Ying Han
    Cc: Nick Piggin
    Signed-off-by: Jan Kara
    Cc: Mingming Cao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

01 Apr, 2009

1 commit