11 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

5 commits

  • The mbcache code was written to support a variable number of indexes,
    but all the existing users use exactly one index. Simplify to code to
    support only that case.

    There are also no users of the cache entry free operation, and none of
    the users keep extra data in cache entries. Remove those features as
    well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Replace inode_setattr with opencoded variants of it in all callers. This
    moves the remaining call to vmtruncate into the filesystem methods where it
    can be replaced with the proper truncate sequence.

    In a few cases it was obvious that we would never end up calling vmtruncate
    so it was left out in the opencoded variant:

    spufs: explicitly checks for ATTR_SIZE earlier
    btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
    ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

    In addition to that ncpfs called inode_setattr with handcrafted iattrs,
    which allowed to trim down the opencoded variant.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Split up the block_write_begin implementation - __block_write_begin is a new
    trivial wrapper for block_prepare_write that always takes an already
    allocated page and can be either called from block_write_begin or filesystem
    code that already has a page allocated. Remove the handling of already
    allocated pages from block_write_begin after switching all callers that
    do it to __block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of accessive blocks to the callers
    in prepearation of the new truncate calling sequence. This was only done
    for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
    was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it as just opencoding the two additional
    paramters is shorted than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

06 Aug, 2010

1 commit

  • In data=journal mode, we still use block_write_begin() to prepare page for
    writing. This function can occasionally mark buffer dirty which violates
    journalling assumptions - when a buffer is part of a transaction, it should be
    dirty and a buffer can be already part of a forget list of some transaction
    when block_write_begin() gets called. This violation of journalling assumptions
    then results in "JBD: Spotted dirty metadata buffer..." warnings.

    In fact, temporary dirtying the buffer while the page is still locked does not
    really cause problems to the journalling because we won't write the buffer
    until the page gets unlocked. So we just have to make sure to clear dirty bits
    before unlocking the page.

    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Jan Kara

    Jan Kara
     

23 Jul, 2010

1 commit

  • data=writeback mode is dangerous as it leads to higher data loss and stale data
    exposure when systems crash. It should not be the default, especially when all
    major distros ensure their ext3 filesystems default to ordered mode. Change the
    default mode to the safer data=ordered mode, because we should be caring far
    more about avoiding stale data exposure than performance.

    CC: linux-ext4@vger.kernel.org
    Signed-off-by: Dave Chinner
    Acked-by: Eric Sandeen
    Signed-off-by: Jan Kara

    Dave Chinner
     

21 Jul, 2010

3 commits

  • It can happen that ext3_free_branches calls ext3_forget() for an indirect block
    in an earlier transaction than a transaction in which we clear pointer to this
    indirect block. Thus if we crash before a transaction clearing the block
    pointer is committed, we will see indirect block pointing to already freed
    blocks and complain during orphan list cleanup.

    The fix is simple: Make sure ext3_forget() is called in the transaction
    doing block pointer clearing.

    This is a backport of an ext4 fix by Amir G.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • The nobh option was only supported for writeback mode, but given that all
    write paths (except mmapped writed) actually create buffer heads, it
    effectively was a no-op already.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • [tytso@mit.edu: Fix compilation with CONFIG_JBD_DEBUG enabled]

    Acked-by: tytso@mit.edu
    cc: linux-ext4@vger.kernel.org
    Signed-off-by: Andi Kleen
    Signed-off-by: Jan Kara

    Andi Kleen
     

25 Jun, 2010

1 commit


31 May, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    quota: Convert quota statistics to generic percpu_counter
    ext3 uses rb_node = NULL; to zero rb_root.
    quota: Fixup dquot_transfer
    reiserfs: Fix resuming of quotas on remount read-write
    pohmelfs: Remove dead quota code
    ufs: Remove dead quota code
    udf: Remove dead quota code
    quota: rename default quotactl methods to dquot_
    quota: explicitly set ->dq_op and ->s_qcop
    quota: drop remount argument to ->quota_on and ->quota_off
    quota: move unmount handling into the filesystem
    quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
    quota: move remount handling into the filesystem
    ocfs2: Fix use after free on remount read-only

    Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c

    Linus Torvalds
     

28 May, 2010

1 commit


27 May, 2010

1 commit

  • The problem with this is that 17d9ddc72fb8bba0d4f678 ("rbtree: Add support
    for augmented rbtrees") in the linux-next tree adds a new field to that
    struct which needs to be NULLas well. This patch uses RB_ROOT as the
    intializer so all of the relevant fields will be NULL'd.

    Signed-off-by: Venkatesh Pallipadi
    Cc: Eric Paris
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Jan Kara

    Venkatesh Pallipadi
     

24 May, 2010

5 commits

  • Follow the dquot_* style used elsewhere in dquot.c.

    [Jan Kara: Fixed up missing conversion of ext2]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Remount handling has fully moved into the filesystem, so all this is
    superflous now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently the VFS calls into the quotactl interface for unmounting
    filesystems. This means filesystems with their own quota handling
    can't easily distinguish between user-space originating quotaoff
    and an unount. Instead move the responsibily of the unmount handling
    into the filesystem to be consistent with all other dquot handling.

    Note that we do call dquot_disable a lot later now, e.g. after
    a sync_filesystem. But this is fine as the quota code does all its
    writes via blockdev's mapping and that is synced even later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Instead of having wrappers in the VFS namespace export the dquot_suspend
    and dquot_resume helpers directly. Also rename vfs_quota_disable to
    dquot_disable while we're at it.

    [Jan Kara: Moved dquot_suspend to quotaops.h and made it inline]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently do_remount_sb calls into the dquot code to tell it about going
    from rw to ro and ro to rw. Move this code into the filesystem to
    not depend on the dquot code in the VFS - note ocfs2 already ignores
    these calls and handles remount by itself. This gets rid of overloading
    the quotactl calls and allows to unify the VFS and XFS codepaths in
    that area later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

22 May, 2010

9 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits)
    fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
    get rid of home-grown mutex in cris eeprom.c
    switch ecryptfs_write() to struct inode *, kill on-stack fake files
    switch ecryptfs_get_locked_page() to struct inode *
    simplify access to ecryptfs inodes in ->readpage() and friends
    AFS: Don't put struct file on the stack
    Ban ecryptfs over ecryptfs
    logfs: replace inode uid,gid,mode initialization with helper function
    ufs: replace inode uid,gid,mode initialization with helper function
    udf: replace inode uid,gid,mode init with helper
    ubifs: replace inode uid,gid,mode initialization with helper function
    sysv: replace inode uid,gid,mode initialization with helper function
    reiserfs: replace inode uid,gid,mode initialization with helper function
    ramfs: replace inode uid,gid,mode initialization with helper function
    omfs: replace inode uid,gid,mode initialization with helper function
    bfs: replace inode uid,gid,mode initialization with helper function
    ocfs2: replace inode uid,gid,mode initialization with helper function
    nilfs2: replace inode uid,gid,mode initialization with helper function
    minix: replace inode uid,gid,mode init with helper
    ext4: replace inode uid,gid,mode init with helper
    ...

    Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)

    Linus Torvalds
     
  • Acked-by: Jan Kara
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Al Viro

    Dmitry Monakhov
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: Al Viro

    Stephen Hemminger
     
  • Conflicts:
    fs/ext3/fsync.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Quota must being initialized if size or uid/git changes requested.
    But initialization performed in two different places:
    in case of i_size file system is responsible for dquot init
    , but in case of uid/gid init will be called internally in
    dquot_transfer().
    This ambiguity makes code harder to understand.
    Let's move this logic to one common helper function.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     
  • ext4 was updated to accept barrier/nobarrier mount options
    in addition to the older barrier=0/1. The barrier story
    is complex enough, we should help people by making the options
    the same at least, even if the defaults are different.

    This patch allows the barrier/nobarrier mount options for ext3,
    while keeping nobarrier the default.

    It also unconditionally displays barrier status in show_options,
    and prints a message at mount time if barriers are not enabled,
    just as ext4 does.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Jan Kara

    Eric Sandeen
     
  • log_start_commit() returns 1 only when it started a transaction
    commit. Thus in case transaction commit is already running, we
    fail to wait for the commit to finish. Fix the issue by always
    waiting for the commit regardless of the log_start_commit return
    value.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Currently block/inode/dir counters are initialized before journal was
    recovered. In fact after journal recovery this info will probably
    change which results in incorrect numbers returned from statfs(2).
    BUG:#15768

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     
  • There is no point in loading bitmap for groups which are completely full.
    This causes noticeable performance problems (and memory pressure) on small
    systems with large full filesystem
    (http://marc.info/?l=linux-ext4&m=126843108314310&w=2).

    Jan Kara: Added a comment and changed check to use cpu-endian value.

    Signed-off-by: "Frans van de Wiel"
    Signed-off-by: Jan Kara

    Frans van de Wiel
     

29 Apr, 2010

1 commit


13 Apr, 2010

1 commit


30 Mar, 2010

2 commits

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     
  • In commit 9df93939b735 ("ext3: Use bitops to read/modify
    EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to
    use bitops for its state handling. However, unline the same ext4
    change, it didn't actually change the name of the field when it changed
    the semantics of it.

    As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that
    initialized the field to EXT3_STATE_NEW. And that does not work
    _at_all_ when we're now working with individually named bits rather than
    values that get masked. So the code tried to mark the state to be new,
    but in actual fact set the field to EXT3_STATE_JDATA. Which makes no
    sense at all, and screws up all the code that checks whether the inode
    was newly allocated.

    In particular, it made the xattr code unhappy, and caused various random
    behavior, like apparently

    https://bugzilla.redhat.com/show_bug.cgi?id=577911

    So fix the initialization, and rename the field to match ext4 so that we
    don't have this happen again.

    Cc: James Morris
    Cc: Stephen Smalley
    Cc: Daniel J Walsh
    Cc: Eric Paris
    Cc: Jan Kara
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Mar, 2010

1 commit


06 Mar, 2010

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     
  • This gives the filesystem more information about the writeback that
    is happening. Trond requested this for the NFS unstable write handling,
    and other filesystems might benefit from this too by beeing able to
    distinguish between the different callers in more detail.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

05 Mar, 2010

4 commits

  • Get rid of the initialize dquot operation - it is now always called from
    the filesystem and if a filesystem really needs it's own (which none
    currently does) it can just call into it's own routine directly.

    Rename the now static low-level dquot_initialize helper to __dquot_initialize
    and vfs_dq_init to dquot_initialize to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently various places in the VFS call vfs_dq_init directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the initialization. For most metadata operations
    this is a straight forward move into the methods, but for truncate and
    open it's a bit more complicated.

    For truncate we currently only call vfs_dq_init for the sys_truncate case
    because open already takes care of it for ftruncate and open(O_TRUNC) - the
    new code causes an additional vfs_dq_init for those which is harmless.

    For open the initialization is moved from do_filp_open into the open method,
    which means it happens slightly earlier now, and only for regular files.
    The latter is fine because we don't need to initialize it for operations
    on special files, and we already do it as part of the namespace operations
    for directories.

    Add a dquot_file_open helper that filesystems that support generic quotas
    can use to fill in ->open.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the drop dquot operation - it is now always called from
    the filesystem and if a filesystem really needs it's own (which none
    currently does) it can just call into it's own routine directly.

    Rename the now static low-level dquot_drop helper to __dquot_drop
    and vfs_dq_drop to dquot_drop to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently clear_inode calls vfs_dq_drop directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the drop inside the ->clear_inode
    superblock operation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig