29 Dec, 2008

1 commit

  • We have two separate config entries for large devices/files. One
    is CONFIG_LBD, which guards just the devices; the other is CONFIG_LSF,
    which handles large files. This doesn't make a lot of sense, since you
    typically want both or neither. So get rid of CONFIG_LSF and change the
    CONFIG_LBD wording to indicate that it covers both.

    Acked-by: Jean Delvare
    Signed-off-by: Jens Axboe

    Jens Axboe
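A rough user-space sketch of what one switch covering both cases means: a single macro selects 64-bit types for both device sector numbers and file block counts. TOY_CONFIG_LBD, toy_sector_t and toy_blkcnt_t are hypothetical stand-ins for CONFIG_LBD and the kernel's sector_t/blkcnt_t; this is not the kernel header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_LBD. With one option, large devices
 * and large files are enabled or disabled together. */
#define TOY_CONFIG_LBD

#ifdef TOY_CONFIG_LBD
typedef uint64_t toy_sector_t;        /* sector numbers on large devices */
typedef uint64_t toy_blkcnt_t;        /* block counts of large files */
#else
typedef unsigned long toy_sector_t;   /* native word size only */
typedef unsigned long toy_blkcnt_t;
#endif

/* Whichever branch is taken, both types share one size setting. */
static_assert(sizeof(toy_sector_t) == sizeof(toy_blkcnt_t),
              "devices and files are sized by the same option");
```

Removing the #define switches both types back to unsigned long at the same time, which is the point of folding the two options into one.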
     

25 Dec, 2008

1 commit


11 Dec, 2008

2 commits

  • Revert

    commit e8ced39d5e8911c662d4d69a342b9d053eaaac4e
    Author: Mingming Cao
    Date: Fri Jul 11 19:27:31 2008 -0400

    percpu_counter: new function percpu_counter_sum_and_set

    As described in

    revert "percpu counter: clean up percpu_counter_sum_and_set()"

    the new percpu_counter_sum_and_set() is racy against updates to the
    cpu-local accumulators on other CPUs. Revert that change.

    This means that ext4 will be slow again. But correct.

    Reported-by: Eric Dumazet
    Cc: "David S. Miller"
    Cc: Peter Zijlstra
    Cc: Mingming Cao
    Cc:
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert

    commit 1f7c14c62ce63805f9574664a6c6de3633d4a354
    Author: Mingming Cao
    Date: Thu Oct 9 12:50:59 2008 -0400

    percpu counter: clean up percpu_counter_sum_and_set()

    Before this patch we had the following:

    percpu_counter_sum(): return the percpu_counter's value

    percpu_counter_sum_and_set(): return the percpu_counter's value, copying
    that value into the central value and zeroing the per-cpu counters before
    returning.

    After this patch, percpu_counter_sum_and_set() has gone, and
    percpu_counter_sum() gets the old percpu_counter_sum_and_set()
    functionality.

    Problem is, as Eric points out, the old percpu_counter_sum_and_set()
    functionality was racy and wrong. It zeroes out counters on "other" CPUs
    without holding any locks that would prevent races against updates from
    those other CPUs.

    This patch reverts 1f7c14c62ce63805f9574664a6c6de3633d4a354. This means
    that percpu_counter_sum_and_set() still has the race, but
    percpu_counter_sum() does not.

    Note that this is not a simple revert - ext4 has since started using
    percpu_counter_sum() for its dirty_blocks counter as well.

    Note that this revert patch changes percpu_counter_sum() semantics.

    Before the patch, a call to percpu_counter_sum() will bring the counter's
    central counter mostly up-to-date, so a following percpu_counter_read()
    will return a close value.

    After this patch, a call to percpu_counter_sum() will leave the counter's
    central accumulator unaltered, so a subsequent call to
    percpu_counter_read() can now return a significantly inaccurate result.

    If there is any code in the tree which was introduced after
    e8ced39d5e8911c662d4d69a342b9d053eaaac4e was merged, and which depends
    upon the new percpu_counter_sum() semantics, that code will break.

    Reported-by: Eric Dumazet
    Cc: "David S. Miller"
    Cc: Peter Zijlstra
    Cc: Mingming Cao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
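A single-threaded user-space model may help with the semantics above. It keeps a central count plus per-CPU deltas, and toy_replay_race() replays one bad interleaving by hand: CPU 1 loads its delta without the lock, another CPU runs the sum-and-set fold, and CPU 1 then stores its stale value back, so a delta that was already folded in is counted twice. All toy_ names are hypothetical; this is not the kernel's percpu_counter implementation.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Hypothetical model: central value plus per-CPU deltas not yet folded in. */
struct toy_percpu_counter {
    int64_t count;
    int32_t local[TOY_NR_CPUS];
};

/* Cheap read: central value only, possibly stale. */
static int64_t toy_read(const struct toy_percpu_counter *c)
{
    return c->count;
}

/* Accurate sum that modifies nothing (the post-revert semantics). */
static int64_t toy_sum(const struct toy_percpu_counter *c)
{
    int64_t sum = c->count;
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}

/* Sum that also folds every delta into the central value and zeroes
 * the per-CPU slots, i.e. writes to other CPUs' state. */
static int64_t toy_sum_and_set(struct toy_percpu_counter *c)
{
    int64_t sum = toy_sum(c);
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        c->local[cpu] = 0;
    c->count = sum;
    return sum;
}

/* Deterministic replay of one racy interleaving, returning the
 * resulting (wrong) total. */
static int64_t toy_replay_race(struct toy_percpu_counter *c, int32_t amount)
{
    int32_t loaded = c->local[1];      /* CPU 1: unlocked load of its slot */
    toy_sum_and_set(c);                /* CPU 0: folds and zeroes slot 1 */
    c->local[1] = loaded + amount;     /* CPU 1: stale store resurrects delta */
    return toy_sum(c);
}
```

With count = 100 and a delta of 5 on CPU 1, toy_sum() returns 105 while toy_read() still returns 100, matching the post-revert behaviour described above; replaying the race with amount = 1 gives 111 instead of the correct 106.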
     

14 Nov, 2008

2 commits

  • Conflicts:
    security/keys/internal.h
    security/keys/process_keys.c
    security/keys/request_key.c

    Fixed conflicts above by using the non 'tsk' versions.

    Signed-off-by: James Morris

    James Morris
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Stephen Tweedie
    Cc: Andrew Morton
    Cc: adilger@sun.com
    Cc: linux-ext4@vger.kernel.org
    Signed-off-by: James Morris

    David Howells
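The point of the wrappers is that callers stop dereferencing credential fields directly. The sketch below assumes the eventual layout, where credentials live in a separate structure referenced from the task; at the stage of this patch they were still in task_struct. toy_cred, toy_task, toy_current and the accessor names are hypothetical analogues of current_uid() and task_euid(), not kernel code.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical credential record, kept apart from the task so that it
 * could later be shared or replaced copy-on-write. */
struct toy_cred {
    uid_t uid, euid;
    gid_t gid, egid;
};

struct toy_task {
    const struct toy_cred *cred;
};

/* Hypothetical stand-in for "current". */
static struct toy_task *toy_current;

/* Call sites use these accessors instead of touching fields, so the
 * storage behind them can move without editing every caller. */
static uid_t toy_current_uid(void)  { return toy_current->cred->uid; }
static uid_t toy_current_euid(void) { return toy_current->cred->euid; }
static uid_t toy_task_euid(const struct toy_task *t) { return t->cred->euid; }
```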
     

07 Nov, 2008

3 commits

  • When initializing an uninitialized block group in ext4_new_inode(),
    its block group checksum must be re-calculated. This fixes a race
    when several threads try to allocate a new inode in an UNINIT'd group.

    There is some question whether we need to be initializing the block
    bitmap in ext4_new_inode() at all, but for now, if we are going to
    init the block group, let's eliminate the race.

    Signed-off-by: Frederic Bohe
    Signed-off-by: "Theodore Ts'o"

    Frederic Bohe
     
  • We need to make sure we mark the buffer_heads as dirty and uptodate
    so that block_write_full_page writes them correctly.

    This fixes mmap corruptions that can occur in low memory situations.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • This fixes a 2.6.27 regression which was introduced in commit a02908f1.

    We weren't passing the chunk parameter down to the two subsections,
    ext4_indirect_trans_blocks() and ext4_ext_index_trans_blocks(), with
    the result that we massively overestimated the number of credits needed
    by ext4_da_writepages, especially in the non-extents case. This causes
    failures especially on /boot partitions, which tend to be small and
    not to use extents, since GRUB doesn't handle extents.

    This patch fixes the bug reported by Joseph Fannin at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11964

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
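A minimal sketch of the invariant behind the first fix in this group, assuming a group descriptor whose checksum covers its flags and free-block count: code that clears the UNINIT flag and fills in the group must also refresh the checksum. The flag value, field names and checksum function are hypothetical (ext4 itself uses a CRC16 over the descriptor), and the sketch does not model the concurrent threads involved in the race.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BG_BLOCK_UNINIT 0x1   /* hypothetical flag value */

struct toy_group_desc {
    uint16_t flags;
    uint16_t free_blocks;
    uint16_t checksum;            /* covers flags and free_blocks */
};

/* Toy checksum, not ext4's CRC16: a simple mix of the covered fields. */
static uint16_t toy_gd_csum(const struct toy_group_desc *gd)
{
    return (uint16_t)(gd->flags * 31u + gd->free_blocks * 7u + 0x5a5au);
}

static int toy_gd_valid(const struct toy_group_desc *gd)
{
    return gd->checksum == toy_gd_csum(gd);
}

/* Initialize an UNINIT group, then refresh the checksum so the
 * descriptor is never left with a stale one. */
static void toy_init_group(struct toy_group_desc *gd, uint16_t free_blocks)
{
    gd->flags &= (uint16_t)~TOY_BG_BLOCK_UNINIT;
    gd->free_blocks = free_blocks;
    gd->checksum = toy_gd_csum(gd);
}
```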
     

04 Nov, 2008

3 commits

  • In ext4_sync_fs, we only wait for a commit to finish if we started it,
    but there may be one already in progress which will not be synced.

    In the case of a data=ordered umount with pending long symlinks which
    are delayed due to a long list of other I/O on the backing block
    device, this causes the buffers associated with the long symlinks to
    not be moved to the inode dirty list in the second phase of
    fsync_super. Then, before they can be dirtied again, kjournald sees
    the UMOUNT flag and exits, so the dirty pages are never written to the
    backing block device, causing long symlink corruption and exposing new
    or previously freed block data to userspace.

    To ensure all commits are synced, we flush all journal commits now
    when sync_fs'ing ext4.

    Signed-off-by: Arthur Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"
    Cc: Eric Sandeen
    Cc:

    Theodore Ts'o
     
  • Use le16_to_cpu to read the s_reserved_gdt_blocks value
    from the superblock.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • If we try to free a block which is already freed, the code was
    returning without first unlocking the group.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
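The double-free fix above is the classic early-return lock leak. A minimal sketch of the safe shape, with a flag standing in for the group lock (ext4_lock_group() in ext4) and asserts that fire on an unbalanced lock or unlock; the toy_ names and the 64-block bitmap are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block group; "locked" stands in for the group lock. */
struct toy_group {
    int locked;
    uint64_t bitmap;               /* bit set = block in use */
};

static void toy_lock_group(struct toy_group *g)   { assert(!g->locked); g->locked = 1; }
static void toy_unlock_group(struct toy_group *g) { assert(g->locked);  g->locked = 0; }

/* Free block blk (0..63). Returns 0, or -1 if it was already free.
 * Every path after taking the lock goes through the single unlock. */
static int toy_free_block(struct toy_group *g, unsigned int blk)
{
    uint64_t bit;
    int ret = 0;

    assert(blk < 64);
    bit = UINT64_C(1) << blk;
    toy_lock_group(g);
    if (!(g->bitmap & bit)) {
        ret = -1;                  /* already free: report it, but still unlock */
        goto out;
    }
    g->bitmap &= ~bit;
out:
    toy_unlock_group(g);
    return ret;
}
```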
     

28 Oct, 2008

3 commits

  • As reported by Eric Paris, the capable() check in ext4_has_free_blocks()
    sometimes causes SELinux denials.

    We can rearrange the logic so that we only try to use the root-reserved
    blocks when necessary, and even then we can move the capable() test
    to last, to avoid the check most of the time.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • Mingming pointed out that ext4_claim_free_blocks & ext4_has_free_blocks
    are largely cut & pasted; they can be collapsed/merged as follows.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • Vegard Nossum reported a bug which accesses freed memory (found via
    kmemcheck). When the journal has been aborted, ext4_put_super() calls
    ext4_abort() after freeing the journal_t object, and ext4_abort() then
    accesses it. This patch fixes it.

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Hidehiro Kawai
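A simplified sketch of the reordering in the first entry above: the cheap arithmetic checks run first, and the capability check runs only when a request can be satisfied solely by dipping into root-reserved blocks. toy_capable() is a hypothetical counting stub for capable(), and the sketch omits the rest of the real function's accounting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stub for capable(); counts calls so the effect of the
 * ordering is observable. */
static int toy_capable_calls;
static int toy_caller_is_privileged;

static int toy_capable(void)
{
    toy_capable_calls++;
    return toy_caller_is_privileged;
}

/* Can nblocks be allocated? */
static int toy_has_free_blocks(int64_t free_blocks, int64_t root_reserved,
                               int64_t nblocks)
{
    if (free_blocks - root_reserved >= nblocks)
        return 1;                  /* common case: no capability check */
    if (free_blocks < nblocks)
        return 0;                  /* not enough even with the reserve */
    return toy_capable();          /* last: only for the reserved blocks */
}
```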
     

26 Oct, 2008

1 commit

  • Fix a regression caused by commit d0156417, "ext4: fix ext4_dx_readdir
    hash collision handling", where deleting files in a large directory
    (one requiring more than one getdents system call) results in some
    filenames being returned twice. This was caused by a failure to
    update info->curr_hash and info->curr_minor_hash, so that if the
    directory had gotten modified since the last getdents() system call
    (as would be the case if the user is running "rm -r" or "git clean"),
    a directory entry would get returned twice to the userspace.

    This patch fixes the bug reported by Markus Trippelsdorf at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11844

    Signed-off-by: "Theodore Ts'o"
    Tested-by: Markus Trippelsdorf

    Theodore Ts'o
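A simplified model of why the saved hash position matters: entries are visited in (hash, minor_hash) order, and each call resumes strictly after the position saved by the previous one. If that position is not advanced after emitting entries, a later call, especially after the directory has changed, starts too early and returns names again. The structures and toy_readdir() are hypothetical and leave out the htree and filldir machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry, kept sorted by (hash, minor_hash). */
struct toy_dirent {
    uint32_t hash, minor_hash;
    int deleted;
};

/* Resume position saved between getdents()-style calls. */
struct toy_cursor {
    uint32_t curr_hash, curr_minor_hash;
    int started;
};

/* Is e strictly past the saved position? */
static int toy_after(const struct toy_dirent *e, const struct toy_cursor *c)
{
    if (!c->started)
        return 1;
    if (e->hash != c->curr_hash)
        return e->hash > c->curr_hash;
    return e->minor_hash > c->curr_minor_hash;
}

/* Store indices of up to max live entries past the cursor in out[] and
 * advance the cursor to the last one emitted. */
static size_t toy_readdir(const struct toy_dirent *d, size_t n,
                          struct toy_cursor *c, size_t *out, size_t max)
{
    size_t got = 0;

    for (size_t i = 0; i < n && got < max; i++) {
        if (d[i].deleted || !toy_after(&d[i], c))
            continue;
        out[got++] = i;
        c->curr_hash = d[i].hash;              /* the update that must not */
        c->curr_minor_hash = d[i].minor_hash;  /* be skipped */
        c->started = 1;
    }
    return got;
}
```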
     

24 Oct, 2008

2 commits

  • Signed-off-by: Christoph Hellwig
    [ All users removed in "switch all filesystems over to d_obtain_alias",
    aka commit 440037287c5ebb07033ab927ca16bb68c291d309 ]
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
    [PATCH] kill the rest of struct file propagation in block ioctls
    [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
    [PATCH] get rid of blkdev_locked_ioctl()
    [PATCH] get rid of blkdev_driver_ioctl()
    [PATCH] sanitize blkdev_get() and friends
    [PATCH] remember mode of reiserfs journal
    [PATCH] propagate mode through swsusp_close()
    [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
    [PATCH] pass fmode_t to blkdev_put()
    [PATCH] kill the unused bsize on the send side of /dev/loop
    [PATCH] trim file propagation in block/compat_ioctl.c
    [PATCH] end of methods switch: remove the old ones
    [PATCH] switch sr
    [PATCH] switch sd
    [PATCH] switch ide-scsi
    [PATCH] switch tape_block
    [PATCH] switch dcssblk
    [PATCH] switch dasd
    [PATCH] switch mtd_blkdevs
    [PATCH] switch mmc
    ...

    Linus Torvalds
     

23 Oct, 2008

2 commits


21 Oct, 2008

2 commits


18 Oct, 2008

1 commit


17 Oct, 2008

4 commits


16 Oct, 2008

3 commits


14 Oct, 2008

3 commits


13 Oct, 2008

1 commit

  • fs/ext4/super.c: In function 'ext4_fill_super':
    fs/ext4/super.c:2226: error: 'ext4_ui_proc_fops' undeclared (first use
    in this function)
    fs/ext4/super.c:2226: error: (Each undeclared identifier is reported
    only once
    fs/ext4/super.c:2226: error: for each function it appears in.)

    Signed-off-by: Alexander Beregalov
    Signed-off-by: Theodore Ts'o

    Alexander Beregalov
     

11 Oct, 2008

5 commits

  • We need to make sure we don't reuse the data blocks released
    during the transaction until the transaction commits. We force
    this mode only for ordered and journalled modes. Writeback mode
    already doesn't provide data consistency.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Theodore Ts'o

    Aneesh Kumar K.V
     
  • During filesystem recovery we may be doing a truncate
    which expects some of the mballoc data structures to
    be initialized. So do ext4_mb_init before recovery.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Theodore Ts'o

    Aneesh Kumar K.V
     
  • If the journal doesn't abort when it gets an I/O error in file data
    blocks, the file data corruption will spread silently. Because
    most applications and commands do buffered writes without fsync(),
    they don't notice the I/O error. That's scary for mission-critical
    systems. On the other hand, if the journal aborts whenever it gets
    an I/O error in file data blocks, the system will easily become
    inoperable. So this patch introduces a filesystem option to
    determine whether it aborts the journal or just calls printk() when
    it gets an I/O error in file data.

    If you mount an ext4 fs with the data_err=abort option, it aborts on a
    file data write error. If you mount it with data_err=ignore, it doesn't
    abort; it just calls printk(). data_err=ignore is the default.

    Here is the corresponding patch of the ext3 version:
    http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai
     
  • If the journal has aborted due to a checkpointing failure, we
    have to keep the contents of the journal space. Otherwise, the
    filesystem will lose uncheckpointed metadata completely and
    become inconsistent. To avoid this, we need to keep the needs_recovery
    flag if the checkpoint has failed.

    With this patch, ext4_put_super() detects a checkpointing failure
    from the return value of journal_destroy(), then it invokes
    ext4_abort() to make the filesystem read-only and keep the
    needs_recovery flag. Errors from jbd2_journal_flush() are also
    handled by this patch in some places.

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai
     
  • The ext4 filesystem is getting stable enough that it's time to drop
    the "dev" prefix. Also remove the requirement for the TEST_FILESYS
    flag.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
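A minimal sketch of the first fix in this group, assuming a 64-block bitmap allocator: blocks freed during a transaction stay unavailable until toy_commit() runs, unless the filesystem is in writeback mode, where they are reusable at once. The structure and function names are hypothetical; in ext4 this is handled by mballoc around journal commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocator; bit set = block unavailable. */
struct toy_alloc {
    uint64_t in_use;
    uint64_t freed_this_txn;   /* freed, but not reusable until commit */
    int defer_reuse;           /* 1 for ordered/journal mode, 0 for writeback */
};

static void toy_free(struct toy_alloc *a, unsigned int blk)
{
    uint64_t bit;

    assert(blk < 64);
    bit = UINT64_C(1) << blk;
    assert(a->in_use & bit);
    if (a->defer_reuse)
        a->freed_this_txn |= bit;   /* keep it unavailable for now */
    else
        a->in_use &= ~bit;          /* writeback: reusable immediately */
}

/* Allocate the lowest available block, or return -1 if none. */
static int toy_alloc_block(struct toy_alloc *a)
{
    for (unsigned int blk = 0; blk < 64; blk++) {
        uint64_t bit = UINT64_C(1) << blk;
        if (!(a->in_use & bit)) {
            a->in_use |= bit;
            return (int)blk;
        }
    }
    return -1;
}

/* Transaction commit: blocks freed during it become reusable. */
static void toy_commit(struct toy_alloc *a)
{
    a->in_use &= ~a->freed_this_txn;
    a->freed_this_txn = 0;
}
```

Starting with blocks 0 to 2 in use and freeing block 1 in ordered mode, the next allocation returns block 3; only after toy_commit() does block 1 come back. In writeback mode block 1 is handed out again immediately.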
     

07 Oct, 2008

1 commit