05 Jun, 2010

1 commit


03 Jun, 2010

1 commit


31 May, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    quota: Convert quota statistics to generic percpu_counter
    ext3 uses rb_node = NULL; to zero rb_root.
    quota: Fixup dquot_transfer
    reiserfs: Fix resuming of quotas on remount read-write
    pohmelfs: Remove dead quota code
    ufs: Remove dead quota code
    udf: Remove dead quota code
    quota: rename default quotactl methods to dquot_
    quota: explicitly set ->dq_op and ->s_qcop
    quota: drop remount argument to ->quota_on and ->quota_off
    quota: move unmount handling into the filesystem
    quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
    quota: move remount handling into the filesystem
    ocfs2: Fix use after free on remount read-only

    Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c

    Linus Torvalds
     

28 May, 2010

3 commits

  • We don't name our generic fsync implementations very well currently.
    The no-op implementation for in-memory filesystems currently is called
    simple_sync_file which doesn't make too much sense to start with,
    and the generic one for simple filesystems is called simple_fsync
    which can lead to some confusion.

    This patch renames the generic file fsync method to generic_file_fsync
    to match the other generic_file_* routines it is supposed to be used
    with, and the no-op implementation to noop_fsync to make it obvious
    what to expect. In addition add some documentation for both methods.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: Make fsync sync new parent directories in no-journal mode
    ext4: Drop whitespace at end of lines
    ext4: Fix compat EXT4_IOC_ADD_GROUP
    ext4: Conditionally define compat ioctl numbers
    tracing: Convert more ext4 events to DEFINE_EVENT
    ext4: Add new tracepoints to track mballoc's buddy bitmap loads
    ext4: Add a missing trace hook
    ext4: restart ext4_ext_remove_space() after transaction restart
    ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
    ext4: Avoid crashing on NULL ptr dereference on a filesystem error
    ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
    ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
    ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
    ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
    ext4: Use our own write_cache_pages()
    ext4: Show journal_checksum option
    ext4: Fix for ext4_mb_collect_stats()
    ext4: check for a good block group before loading buddy pages
    ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
    ext4: Remove extraneous newlines in ext4_msg() calls
    ...

    Fixed up trivial conflict in fs/ext4/fsync.c

    Linus Torvalds
     

24 May, 2010

5 commits

  • Follow the dquot_* style used elsewhere in dquot.c.

    [Jan Kara: Fixed up missing conversion of ext2]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Remount handling has fully moved into the filesystem, so all this is
    superfluous now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently the VFS calls into the quotactl interface for unmounting
    filesystems. This means filesystems with their own quota handling
    can't easily distinguish between user-space originating quotaoff
    and an unmount. Instead move the responsibility of the unmount handling
    into the filesystem to be consistent with all other dquot handling.

    Note that we do call dquot_disable a lot later now, e.g. after
    a sync_filesystem. But this is fine as the quota code does all its
    writes via blockdev's mapping and that is synced even later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Instead of having wrappers in the VFS namespace, export the dquot_suspend
    and dquot_resume helpers directly. Also rename vfs_quota_disable to
    dquot_disable while we're at it.

    [Jan Kara: Moved dquot_suspend to quotaops.h and made it inline]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently do_remount_sb calls into the dquot code to tell it about going
    from rw to ro and ro to rw. Move this code into the filesystem to
    not depend on the dquot code in the VFS - note ocfs2 already ignores
    these calls and handles remount by itself. This gets rid of overloading
    the quotactl calls and allows us to unify the VFS and XFS codepaths in
    that area later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

22 May, 2010

5 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits)
    fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
    get rid of home-grown mutex in cris eeprom.c
    switch ecryptfs_write() to struct inode *, kill on-stack fake files
    switch ecryptfs_get_locked_page() to struct inode *
    simplify access to ecryptfs inodes in ->readpage() and friends
    AFS: Don't put struct file on the stack
    Ban ecryptfs over ecryptfs
    logfs: replace inode uid,gid,mode initialization with helper function
    ufs: replace inode uid,gid,mode initialization with helper function
    udf: replace inode uid,gid,mode init with helper
    ubifs: replace inode uid,gid,mode initialization with helper function
    sysv: replace inode uid,gid,mode initialization with helper function
    reiserfs: replace inode uid,gid,mode initialization with helper function
    ramfs: replace inode uid,gid,mode initialization with helper function
    omfs: replace inode uid,gid,mode initialization with helper function
    bfs: replace inode uid,gid,mode initialization with helper function
    ocfs2: replace inode uid,gid,mode initialization with helper function
    nilfs2: replace inode uid,gid,mode initialization with helper function
    minix: replace inode uid,gid,mode init with helper
    ext4: replace inode uid,gid,mode init with helper
    ...

    Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)

    Linus Torvalds
     
  • Signed-off-by: Dmitry Monakhov
    Signed-off-by: Al Viro

    Dmitry Monakhov
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: Al Viro

    Stephen Hemminger
     
  • Conflicts:
    fs/ext3/fsync.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Quota must be initialized if a size or uid/gid change is requested.
    But the initialization is performed in two different places:
    for an i_size change the file system is responsible for dquot init,
    but for a uid/gid change the init is called internally in
    dquot_transfer().
    This ambiguity makes the code harder to understand.
    Let's move this logic into one common helper function.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     

17 May, 2010

20 commits

  • Add a new ext4 state to tell us when a file has been newly created; use
    that state in ext4_sync_file in no-journal mode to tell us when we need
    to sync the parent directory as well as the inode and data itself. This
    fixes a problem in which a panic or power failure may lose the entire
    file even when using fsync, since the parent directory entry is lost.

    Addresses-Google-Bug: #2480057

    Signed-off-by: Frank Mayhar
    Signed-off-by: "Theodore Ts'o"

    Frank Mayhar
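
The failure mode described above (a file's data is synced, but its directory entry is not) can be illustrated from userspace. The following is a minimal sketch, assuming a Linux/POSIX system where a directory can be opened and fsync'ed; `create_durably` is a hypothetical helper name, and it shows the application-side workaround, not the kernel change made by this commit.

```python
import os
import tempfile


def create_durably(directory: str, name: str, data: bytes) -> str:
    """Create a file and sync both its contents and its parent directory."""
    path = os.path.join(directory, name)

    # Write the file, then flush its data and inode to stable storage.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the parent directory so the new directory entry is also on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_durably(tmp, "example.txt", b"hello\n"))
```

With the change described in this commit, ext4 in no-journal mode performs the parent-directory sync itself when a newly created file is fsync'ed.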
     
  • This patch was generated using:

    #!/usr/bin/perl -i
    while (<>) {
    s/[ ]+$//;
    print;
    }

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • struct ext4_new_group_input needs to be converted because u64 has
    only 32-bit alignment on some 32-bit architectures, notably i386.

    Signed-off-by: Ben Hutchings
    Signed-off-by: "Theodore Ts'o"

    Ben Hutchings
     
  • It is unnecessary, and in general impossible, to define the compat
    ioctl numbers except when building the filesystem with CONFIG_COMPAT
    defined.

    Signed-off-by: Ben Hutchings
    Signed-off-by: "Theodore Ts'o"

    Ben Hutchings
     
  • Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Commit f8ec9d6837241865cf99bed97bb99f4399fd5a03 added a
    trace event ext4_da_release_space, but didn't add some
    corresponding trace hook.

    Signed-off-by: Li Zefan
    Signed-off-by: "Theodore Ts'o"

    Li Zefan
     
  • If i_data_sem was internally dropped due to transaction restart, it is
    necessary to restart the path look-up because the extent tree may have
    been modified by ext4_get_block().

    https://bugzilla.kernel.org/show_bug.cgi?id=15827

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"
    Acked-by: Jan Kara

    Dmitry Monakhov
     
  • Dmitry Monakhov discovered an edge case where the
    EXT4_EOFBLOCKS_FL flag could get cleared unnecessarily. This is true;
    I have a test case that can be exercised by downloading and
    decompressing the file:

    wget ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-testcases/eofblocks-fl-test-case.img.bz2
    bunzip2 eofblocks-fl-test-case.img.bz2
    dd if=/dev/zero of=eofblocks-fl-test-case.img bs=1k seek=17925 count=1 conv=notrunc

    However, triggering it in real life is highly unlikely since it
    requires an extremely fragmented sparse file with a hole in exactly
    the right place in the extent tree. (It actually took quite a bit of
    work to generate this test case.) Still, it's nice to get even
    extreme corner cases to be correct, so this patch makes sure that we
    don't clear the EXT4_EOFBLOCKS_FL incorrectly even in this corner
    case.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • If the EXT4_EOFBLOCKS_FL flag is set when it should not be and the inode is
    zero length, then eh_entries is zero, and ex is NULL, so dereferencing
    ex to print ex->ee_block causes a kernel OOPS in
    ext4_ext_map_blocks().

    On top of that, the error message which is printed isn't very helpful.
    So we fix this by printing something more explanatory which doesn't
    involve trying to print ex->ee_block.

    Addresses-Google-Bug: #2655740

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • At several places we modify EXT4_I(inode)->i_flags without holding
    i_mutex (ext4_do_update_inode, ...). These modifications are racy and
    we can lose updates to i_flags. So convert handling of i_flags to use
    bitops which are atomic.

    https://bugzilla.kernel.org/show_bug.cgi?id=15792

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • EXT4_ERROR_INODE() tends to provide better error information and in a
    more consistent format. Some errors were not even identifying the inode
    or directory which was corrupted, which made them not very useful.

    Addresses-Google-Bug: #2507977

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • This saves a huge amount of stack space by avoiding the allocation of
    unnecessary struct buffer_heads on the stack.

    In addition, to make the code easier to understand, collapse and
    refactor ext4_get_block(), ext4_get_block_write(),
    noalloc_get_block_write(), into a single function.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Jack up ext4_get_blocks() and add a new function, ext4_map_blocks(),
    which uses a much smaller structure, struct ext4_map_blocks, which is
    20 bytes, as opposed to a struct buffer_head, which is nearly 5 times
    bigger on an x86_64 machine. By switching callers to
    ext4_map_blocks(), we can save stack space since we can avoid
    allocating a struct buffer_head on the stack.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Make a copy of write_cache_pages() for the benefit of
    ext4_da_writepages(). This allows us to simplify the code some, and
    will allow us to further customize the code in future patches.

    There are some nasty hacks in write_cache_pages(), which Linus has
    (correctly) characterized as vile. I've just copied it into
    write_cache_pages_da(), without trying to clean those bits up lest I
    break something in the ext4's delalloc implementation, which is a bit
    fragile right now. This will allow Dave Chinner to clean up
    write_cache_pages() in mm/page-writeback.c, without worrying about
    breaking ext4. Eventually write_cache_pages_da() will go away when I
    rewrite ext4's delayed allocation and create a general
    ext4_writepages() which is used for all of ext4's writeback. Until
    then, this is the lowest-risk way to clean up the core
    write_cache_pages() function.

    Signed-off-by: "Theodore Ts'o"
    Cc: Dave Chinner

    Theodore Ts'o
     
  • We failed to show journal_checksum option in /proc/mounts. Fix it.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • Fix ext4_mb_collect_stats() to use the correct test for s_bal_success; it
    should be testing "best-extent.fe_len >= orig-extent.fe_len" , not
    "orig-extent.fe_len >= goal-extent.fe_len" .

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth
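
The difference between the two tests quoted above can be shown with a small model. This is a sketch only: `Extent`, `old_success_test` and `new_success_test` are hypothetical names, and the extent lengths are illustrative values, not data from the commit.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    fe_len: int  # extent length in blocks


def old_success_test(orig: Extent, goal: Extent) -> bool:
    # The test described as wrong: it never looks at what was actually found.
    return orig.fe_len >= goal.fe_len


def new_success_test(best: Extent, orig: Extent) -> bool:
    # The corrected test: the best extent found covers the original request.
    return best.fe_len >= orig.fe_len


if __name__ == "__main__":
    # Illustrative case: 8 blocks requested, only 4 found.
    orig, goal, best = Extent(8), Extent(8), Extent(4)
    print(old_success_test(orig, goal), new_success_test(best, orig))
```

In this model the old test reports success for a request that was only half satisfied, because it compares two inputs of the allocation rather than its result.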
     
  • This adds a new field in ext4_group_info to cache the largest available
    block range in a block group, and avoids loading the buddy pages until
    *after* we've done a sanity check on the block group.

    With large allocation requests (e.g., fallocate(), 8MiB) and relatively full
    partitions, it's easy to have no block groups with a block extent large
    enough to satisfy the input request length. This currently causes the loop
    during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages
    for EVERY block group. That can be a lot of pages. The patch below allows
    us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we
    have to check again after we lock the block group).

    Addresses-Google-Bug: #2578108
    Addresses-Google-Bug: #2704453

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth
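
The idea of checking a cached value before doing the expensive load can be modeled as follows. This is a simplified sketch under stated assumptions: `GroupInfo`, `largest_free`, `load_buddy` and `find_group` are hypothetical names, and the `loads` counter stands in for reading buddy bitmap pages; it does not reproduce ext4_mb_regular_allocator().

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupInfo:
    largest_free: int  # cached size of the largest free range, in blocks
    loads: int = 0     # how often the (expensive) buddy data was loaded

    def load_buddy(self) -> None:
        # Stand-in for loading the buddy bitmap pages of this group.
        self.loads += 1


def find_group(groups: List[GroupInfo], request_len: int) -> Optional[int]:
    """Return the index of the first group that can satisfy the request."""
    for index, group in enumerate(groups):
        # Cheap check against cached state, before any buddy data is touched.
        if group.largest_free < request_len:
            continue
        group.load_buddy()
        # Check again after loading, since the cheap check ran without the lock.
        if group.largest_free >= request_len:
            return index
    return None


if __name__ == "__main__":
    groups = [GroupInfo(4), GroupInfo(16), GroupInfo(2048)]
    print(find_group(groups, 2048), [g.loads for g in groups])
```

When no group has a large enough free range, the loop in this model finishes without a single load, which is the saving the commit message describes for large requests on nearly full partitions.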
     
  • Currently using posix_fallocate one can bypass an RLIMIT_FSIZE limit
    and create a file larger than the limit. Add a check for that.

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Amit Arora
    Signed-off-by: "Theodore Ts'o"

    Nikanth Karthikesan
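
The kind of check being added can be sketched as a bounds test on the requested range. This is a minimal model, not the kernel code: `check_fallocate_size` and the `RLIM_INFINITY` stand-in are hypothetical, and reporting the failure as EFBIG is an assumption of this sketch.

```python
import errno

RLIM_INFINITY = -1  # stand-in for "no limit" in this sketch


def check_fallocate_size(offset: int, length: int, fsize_limit: int) -> None:
    """Reject a preallocation that would extend the file past the size limit."""
    if fsize_limit == RLIM_INFINITY:
        return
    if offset + length > fsize_limit:
        raise OSError(errno.EFBIG, "File too large")


if __name__ == "__main__":
    check_fallocate_size(0, 4096, 8192)  # within the limit, no error
    try:
        check_fallocate_size(4096, 8192, 8192)
    except OSError as exc:
        print("rejected:", exc.strerror)
```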
     
  • Addresses-Google-Bug: #2562325

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth
     
  • This adds a "re-mounted" message to ext4_remount(), and both it and
    the mount message in ext4_fill_super() now have the original mount
    options data string.

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth
     

16 May, 2010

4 commits

  • Because we can badly over-reserve metadata when we
    calculate worst-case, it complicates things for quota, since
    we must reserve and then claim later, retry on EDQUOT, etc.
    Quota is also a generally smaller pool than fs free blocks,
    so this over-reservation hurts more, and more often.

    I'm of the opinion that it's not the worst thing to allow
    metadata to push a user slightly over quota. This simplifies
    the code and avoids the false quota rejections that result
    from worst-case speculation.

    This patch stops the speculative quota-charging for
    worst-case metadata requirements, and just charges quota
    when the blocks are allocated at writeout. It also is
    able to remove the try-again loop on EDQUOT.

    This patch has been tested indirectly by running the xfstests
    suite with a hack to mount & enable quota prior to the test.

    I also did a more specific test of fragmenting freespace
    and then doing a large delalloc write under quota; quota
    stopped me at the right amount of file IO, and then the
    writeout generated enough metadata (due to the fragmentation)
    that it put me slightly over quota, as expected.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • Currently the block/inode/dir counters are initialized before the
    journal is recovered. After journal recovery this info will probably
    change, and freeblocks is critical for correct delalloc mode
    accounting.

    https://bugzilla.kernel.org/show_bug.cgi?id=15768

    Signed-off-by: Dmitry Monakhov
    Acked-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • - Reorganize the locking scheme to batch two atomic operations into one.
    This also allows us to state that a healthy group must obey the following rule:
    ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM);
    - Fix a possible undefined pointer dereference.
    - Even if group descriptor stats aren't accessible we have to update
    the inode bitmaps.
    - Move the update of non-group members out of group_lock.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
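
The consistency rule stated in the first item can be expressed as a small check. This is a sketch, assuming that a clear bit in the inode bitmap means "free" and that the count runs over a given number of bits; `count_free` and `group_is_healthy` are hypothetical helpers, not the kernel functions named above.

```python
def count_free(bitmap: bytes, nbits: int) -> int:
    """Count clear (free) bits among the first nbits of the bitmap."""
    free = 0
    for i in range(nbits):
        if not (bitmap[i // 8] >> (i % 8)) & 1:
            free += 1
    return free


def group_is_healthy(free_inodes_count: int, inode_bitmap: bytes,
                     inodes_per_group: int) -> bool:
    """The descriptor's free count must match what the bitmap says."""
    return free_inodes_count == count_free(inode_bitmap, inodes_per_group)


if __name__ == "__main__":
    bitmap = bytes([0b00001111, 0b00000000])  # 4 inodes in use out of 16
    print(group_is_healthy(12, bitmap, 16))
```

Updating the bitmap and the descriptor count under one lock is what makes such a rule checkable: with two separate atomic steps, a reader could observe the state between them.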
     
  • The extents code will sometimes zero out blocks and mark them as
    initialized instead of splitting an extent into several smaller ones.
    This optimization, however, causes problems if the extent is beyond
    i_size because fsck will complain if there are uninitialized blocks
    after i_size as this cannot be distinguished from an inode that has
    an incorrect i_size field.

    https://bugzilla.kernel.org/show_bug.cgi?id=15742

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov