18 Aug, 2010

2 commits

  • These flags aren't real I/O types, but tell ll_rw_block to always
    lock the buffer instead of giving up on a failed trylock.

    Instead add a new write_dirty_buffer helper that implements this semantic
    and use it from the existing SWRITE* callers. Note that the ll_rw_block
    code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
    this patch fixes.

    In the ufs code clean up the helper that used to call ll_rw_block
    to mirror sync_dirty_buffer, which is the function it implements for
    compound buffers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
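
    The difference between the trylock behaviour and the always-lock
    semantic described above can be sketched in userspace. This is only
    a model under stated assumptions: struct model_buffer and the model_*
    functions are hypothetical stand-ins, not the kernel's buffer_head,
    ll_rw_block or write_dirty_buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a buffer_head; not the kernel structure. */
struct model_buffer {
    atomic_flag lock;   /* set while some writer owns the buffer */
    bool dirty;         /* true when the contents need writing */
    int writes;         /* number of simulated write submissions */
};

/* Clear the dirty bit and "submit" the write; caller holds the lock. */
static void model_submit_locked(struct model_buffer *b)
{
    assert(b);
    if (b->dirty) {
        b->dirty = false;
        b->writes++;
    }
    atomic_flag_clear(&b->lock);
}

/* Trylock semantic: give up if the buffer is already locked. */
static bool model_write_trylock(struct model_buffer *b)
{
    if (atomic_flag_test_and_set(&b->lock))
        return false;               /* lock busy, buffer skipped */
    model_submit_locked(b);
    return true;
}

/* Always-lock semantic: wait for the lock, then write if still dirty. */
static void model_write_dirty(struct model_buffer *b)
{
    while (atomic_flag_test_and_set(&b->lock))
        ;                           /* spin; the kernel would sleep */
    model_submit_locked(b);
}
```

    With the trylock variant a dirty buffer that happens to be locked is
    silently left dirty; the always-lock variant guarantees a write is
    issued once the lock is available.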
     
  • Instead of abusing a buffer_head flag just add a variant of
    sync_dirty_buffer which allows passing the exact type of write
    flag required.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

10 Aug, 2010

1 commit


08 Aug, 2010

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: Adding error check after calling ext4_mb_regular_allocator()
    ext4: Fix dirtying of journalled buffers in data=journal mode
    ext4: re-inline ext4_rec_len_(to|from)_disk functions
    jbd2: Remove t_handle_lock from start_this_handle()
    jbd2: Change j_state_lock to be a rwlock_t
    jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
    ext4: Add mount options in superblock
    ext4: force block allocation on quota_off
    ext4: fix freeze deadlock under IO
    ext4: drop inode from orphan list if ext4_delete_inode() fails
    ext4: check to make make sure bd_dev is set before dereferencing it
    jbd2: Make barrier messages less scary
    ext4: don't print scary messages for allocation failures post-abort
    ext4: fix EFBIG edge case when writing to large non-extent file
    ext4: fix ext4_get_blocks references
    ext4: Always journal quota file modifications
    ext4: Fix potential memory leak in ext4_fill_super
    ext4: Don't error out the fs if the user tries to make a file too big
    ext4: allocate stripe-multiple IOs on stripe boundaries
    ext4: move aio completion after unwritten extent conversion
    ...

    Fix up conflicts in fs/ext4/inode.c as per Ted.

    Fix up xfs conflicts as per earlier xfs merge.

    Linus Torvalds
     

04 Aug, 2010

2 commits


02 Aug, 2010

1 commit


27 Jul, 2010

2 commits

  • Saying things like "sync failed" when a device does
    not support barriers makes users slightly more worried than
    they need to be; rather than talking about sync failures,
    let's just state the barrier-based facts.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • __GFP_NOFAIL is going away, so add our own retry loop. Also add
    jbd2__journal_start() and jbd2__journal_restart() which take a gfp
    mask, so that file systems can optionally (re)start transaction
    handles using GFP_KERNEL. If they do this, then they need to be
    prepared to handle receiving an ERR_PTR(-ENOMEM) error, and be ready
    to reflect that error up to userspace.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
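
    A rough userspace sketch of the two behaviours described above: a
    caller that accepts failure gets an error pointer carrying -ENOMEM,
    while a caller that does not is served by a retry loop. All model_*
    names are hypothetical, and the helpers only imitate the kernel's
    error-pointer convention; they are not the jbd2 implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static void *model_err_ptr(long err) { return (void *)(intptr_t)err; }
static long model_ptr_err(const void *p) { return (long)(intptr_t)p; }
static bool model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

struct model_handle { int credits; };

/* Hypothetical allocator that fails a fixed number of times first. */
static int model_failures_left;
static struct model_handle model_storage;

static struct model_handle *model_alloc_handle(void)
{
    if (model_failures_left > 0) {
        model_failures_left--;
        return NULL;
    }
    return &model_storage;
}

/*
 * may_fail = true models a caller that passed a mask such as GFP_KERNEL
 * and accepts -ENOMEM; may_fail = false models the retry loop that
 * takes the place of __GFP_NOFAIL.
 */
static struct model_handle *model_journal_start(int credits, bool may_fail)
{
    struct model_handle *h;

    assert(credits > 0);
    while ((h = model_alloc_handle()) == NULL) {
        if (may_fail)
            return model_err_ptr(-ENOMEM);
        /* the kernel would wait here before retrying */
    }
    h->credits = credits;
    return h;
}
```

    A caller using the fallible path has to test the result with the
    error check before dereferencing it and pass the error code upward.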
     

16 Jul, 2010

1 commit

  • OCFS2 uses t_commit trigger to compute and store checksum of the just
    committed blocks. When a buffer has b_frozen_data, checksum is computed
    for it instead of b_data but this can result in an old checksum being
    written to the filesystem in the following scenario:

    1) transaction1 is opened
    2) handle1 is opened
    3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
    4) modify(bh)
    5) journal_dirty(handle1, bh)
    6) handle1 is closed
    7) start committing transaction1, opening transaction2
    8) handle2 is opened
    9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
    jh->b_next_transaction is set to transaction2.
    10) jbd2_journal_write_metadata() checksums b_frozen_data
    11) the journal correctly writes b_frozen_data to the disk journal
    12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
    any more journal operation
    13) Checkpointing finally happens, and it just spools the bh via normal buffer
    writeback. This will write b_data, which was never triggered on and thus
    contains a wrong (old) checksum.

    This patch fixes the problem by calling the trigger at the moment data is
    frozen for journal commit - i.e., either when b_frozen_data is created by
    do_get_write_access or just before we write a buffer to the log if
    b_frozen_data does not exist. We also rename the trigger to t_frozen as
    that better describes when it is called.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Jan Kara
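
    The idea of firing the trigger at freeze time can be modelled in
    userspace as below. This is a sketch under assumptions: the block
    layout, the toy checksum and every model_* name are invented for
    illustration, and the trigger is assumed to run on the live data
    just before the frozen copy is taken, so both copies carry the
    fresh checksum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAYLOAD 8

/* Block layout assumed for this sketch: payload followed by its checksum. */
struct model_block {
    uint8_t payload[MODEL_PAYLOAD];
    uint32_t csum;
};

struct model_jh {
    struct model_block b_data;         /* live buffer contents */
    struct model_block b_frozen_data;  /* copy owned by the committing transaction */
    int has_frozen;
};

/* Toy checksum standing in for whatever the filesystem computes. */
static uint32_t model_csum(const struct model_block *blk)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < MODEL_PAYLOAD; i++)
        sum = sum * 31 + blk->payload[i];
    return sum;
}

/* The "frozen" trigger: stamp the checksum into the block it is given. */
static void model_frozen_trigger(struct model_block *blk)
{
    blk->csum = model_csum(blk);
}

/* Freeze for commit: fire the trigger on the live data, then copy it. */
static void model_freeze(struct model_jh *jh)
{
    assert(!jh->has_frozen);
    model_frozen_trigger(&jh->b_data);
    memcpy(&jh->b_frozen_data, &jh->b_data, sizeof(jh->b_data));
    jh->has_frozen = 1;
}
```

    Because the live buffer is stamped before the copy, a later
    checkpoint that writes b_data no longer carries a stale checksum.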
     

15 Jun, 2010

1 commit


28 May, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: Make fsync sync new parent directories in no-journal mode
    ext4: Drop whitespace at end of lines
    ext4: Fix compat EXT4_IOC_ADD_GROUP
    ext4: Conditionally define compat ioctl numbers
    tracing: Convert more ext4 events to DEFINE_EVENT
    ext4: Add new tracepoints to track mballoc's buddy bitmap loads
    ext4: Add a missing trace hook
    ext4: restart ext4_ext_remove_space() after transaction restart
    ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
    ext4: Avoid crashing on NULL ptr dereference on a filesystem error
    ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
    ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
    ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
    ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
    ext4: Use our own write_cache_pages()
    ext4: Show journal_checksum option
    ext4: Fix for ext4_mb_collect_stats()
    ext4: check for a good block group before loading buddy pages
    ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
    ext4: Remove extraneous newlines in ext4_msg() calls
    ...

    Fixed up trivial conflict in fs/ext4/fsync.c

    Linus Torvalds
     

22 May, 2010

1 commit


16 May, 2010

1 commit

  • One of the most contended locks in the jbd2 layer is j_state_lock when
    running dbench. This is especially true if using the real-time kernel
    with its "sleeping spinlocks" patch that replaces spinlocks with
    priority inheriting mutexes --- but it also shows up on large SMP
    benchmarks.

    Thanks to John Stultz for pointing this out.

    Reviewed by Mingming Cao and Jan Kara.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

11 May, 2010

1 commit


29 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
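
    The first rule the script applies (include gfp.h if only gfp is
    used, slab.h if slab is used) can be illustrated with a small C
    sketch. The original tool is a Python script; the identifier lists
    and model_* names below are illustrative guesses, not its real
    tables, and plain substring matching is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Identifier lists are illustrative guesses, not the script's real tables. */
static const char *const model_slab_users[] = { "kmalloc(", "kzalloc(", "kfree(" };
static const char *const model_gfp_users[]  = { "GFP_KERNEL", "GFP_ATOMIC" };

/* Return 1 if any of the n names occurs in the source text. */
static int model_uses_any(const char *src, const char *const *names, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(src, names[i]) != NULL)
            return 1;
    return 0;
}

/*
 * Return the header a file should include directly, or NULL if neither
 * facility is used. slab.h wins because it already includes gfp.h.
 */
static const char *model_needed_include(const char *src)
{
    assert(src != NULL);
    if (model_uses_any(src, model_slab_users, 3))
        return "linux/slab.h";
    if (model_uses_any(src, model_gfp_users, 2))
        return "linux/gfp.h";
    return NULL;
}
```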
     

25 Feb, 2010

1 commit


16 Feb, 2010

1 commit

  • Delay discarding buffers in journal_unmap_buffer until
    we know that the "add to orphan" operation has definitely been
    committed. Otherwise the log space of the committing transaction
    may be freed and reused before the truncate gets committed, and
    updates may get lost if a crash happens.

    Signed-off-by: dingdinghua
    Signed-off-by: "Theodore Ts'o"

    dingdinghua
     

23 Dec, 2009

4 commits

  • Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • It triggers the warning in get_page_from_freelist(), and it isn't
    appropriate to use __GFP_NOFAIL here anyway.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14843

    Reported-by: Christian Casteyde
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"

    Andrew Morton
     
  • This is a bit complicated because we are trying to optimize when we
    send barriers to the fs data disk. We could just throw in an extra
    barrier to the data disk whenever we send a barrier to the journal
    disk, but that's not always strictly necessary.

    We only need to send a barrier during a commit when there are data
    blocks which must be written out due to an inode written in
    ordered mode, or if fsync() depends on the commit to force data blocks
    to disk. Finally, before we drop transactions from the beginning of
    the journal during a checkpoint operation, we need to guarantee that
    any blocks that were flushed out to the data disk are firmly on the
    rust platter before we drop the transaction from the journal.

    Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • jbd-debug and jbd2-debug are currently read-only (S_IRUGO), which is not
    correct. Make them writable so that we can start debugging.

    Signed-off-by: Yin Kangkai
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Andrew Morton
    Signed-off-by: Jan Kara

    Yin Kangkai
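
    In terms of permission bits, the change amounts to adding a write
    bit to a read-only mode. The sketch below uses the userspace
    <sys/stat.h> constants; MODEL_S_IRUGO mirrors the kernel's S_IRUGO,
    and the choice of owner-write as the added bit is an assumption made
    for illustration.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Userspace spelling of S_IRUGO: readable by user, group and other. */
#define MODEL_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Mode for a debug knob that its owner can also write (assumed fix). */
static mode_t model_debug_mode(void)
{
    mode_t mode = MODEL_S_IRUGO | S_IWUSR;

    assert((mode & S_IWUSR) != 0);
    return mode;
}
```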
     

18 Dec, 2009

1 commit

  • This reverts commit e4c570c4cb7a95dbfafa3d016d2739bf3fdfe319, as
    requested by Alexey:

    "I think I gave a good enough arguments to not merge it.
    To iterate:
    * patch makes impossible to start using ext3 on EXT3_FS=n kernels
    without reboot.
    * this is done only for one pointer on task_struct

    None of config options which define task_struct are tristate directly
    or effectively."

    Requested-by: Alexey Dobriyan
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

16 Dec, 2009

1 commit

  • journal_info in task_struct is used in journaling file system only. So
    introduce CONFIG_FS_JOURNAL_INFO and make it conditional.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: KONISHI Ryusuke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hiroshi Shimamoto
     

12 Dec, 2009

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits)
    ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
    ext3: Fix data / filesystem corruption when write fails to copy data
    ext4: Support for 64-bit quota format
    ext3: Support for vfsv1 quota format
    quota: Implement quota format with 64-bit space and inode limits
    quota: Move definition of QFMT_OCFS2 to linux/quota.h
    ext2: fix comment in ext2_find_entry about return values
    ext3: Unify log messages in ext3
    ext2: clear uptodate flag on super block I/O error
    ext2: Unify log messages in ext2
    ext3: make "norecovery" an alias for "noload"
    ext3: Don't update the superblock in ext3_statfs()
    ext3: journal all modifications in ext3_xattr_set_handle
    ext2: Explicitly assign values to on-disk enum of filetypes
    quota: Fix WARN_ON in lookup_one_len
    const: struct quota_format_ops
    ubifs: remove manual O_SYNC handling
    afs: remove manual O_SYNC handling
    kill wait_on_page_writeback_range
    vfs: Implement proper O_SYNC semantics
    ...

    Linus Torvalds
     

10 Dec, 2009

2 commits


07 Dec, 2009

1 commit

  • Now that the SLUB seems to be fixed so that it respects the requested
    alignment, use kmem_cache_alloc() to allocate if the block size of
    the buffer heads to be allocated is less than the page size.
    Previously, we were using a 16k page on a Power system for each buffer,
    even when the file system was using a 1k or 4k block size.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
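
    The allocation rule and the memory it saves can be sketched as
    follows. The model_* names are hypothetical; only the rule itself
    (slab for sub-page block sizes) and the 16k page and 1k/4k block
    sizes come from the message above.

```c
#include <assert.h>
#include <stddef.h>

enum model_alloc_kind { MODEL_FROM_SLAB, MODEL_FROM_PAGES };

/* Sub-page buffers come from a slab cache; larger ones take whole pages. */
static enum model_alloc_kind model_pick_allocator(size_t blocksize, size_t page_size)
{
    assert(blocksize > 0 && page_size > 0);
    return blocksize < page_size ? MODEL_FROM_SLAB : MODEL_FROM_PAGES;
}

/* Bytes left unused when a whole page backs a single buffer. */
static size_t model_page_waste(size_t blocksize, size_t page_size)
{
    return blocksize < page_size ? page_size - blocksize : 0;
}
```

    With a 16384-byte page, a 1024-byte block wasted 15360 bytes per
    buffer under the old scheme and a 4096-byte block wasted 12288.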
     

01 Dec, 2009

1 commit


16 Nov, 2009

1 commit

  • If there is a failed journal checksum, don't reset the journal. This
    allows for userspace programs to decide how to recover from this
    situation. It may be that ignoring the journal checksum failure might
    be a better way of recovering the file system. Once we add per-block
    checksums, we can definitely do better. Until then, a system
    administrator can try backing up the file system image (or taking a
    snapshot) and trying to determine experimentally whether ignoring
    the checksum failure or aborting the journal replay results in less
    data loss.

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
     

11 Nov, 2009

1 commit


02 Oct, 2009

1 commit


30 Sep, 2009

2 commits

  • The /proc/fs/jbd2/<dev>/history was maintained manually; by using
    tracepoints, we can get all of the existing functionality of the /proc
    file plus extra capabilities thanks to the ftrace infrastructure. We
    save memory as a bonus.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • There are a number of kernel printk's which are printed when an ext4
    filesystem is mounted and unmounted. Disable them to economize space
    in the system logs. In addition, disabling the mballoc stats by
    default saves a number of unneeded atomic operations for every block
    allocation or deallocation.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

23 Sep, 2009

1 commit

  • Make all seq_operations structs const, to help mitigate against
    revectoring user-triggerable function pointers.

    This is derived from the grsecurity patch, although generated from scratch
    because it's simpler than extracting the changes from there.

    Signed-off-by: James Morris
    Acked-by: Serge Hallyn
    Acked-by: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Morris
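
    What a const operations table looks like can be shown with a
    simplified stand-in. struct model_seq_ops is not the kernel's
    struct seq_operations; it only illustrates the pattern of a
    read-only table of function pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an operations table. */
struct model_seq_ops {
    int (*start)(int pos);
    int (*next)(int pos);
};

static int model_start(int pos) { return pos; }
static int model_next(int pos)  { return pos + 1; }

/*
 * const lets the toolchain place the table in read-only data, so a
 * stray or malicious write cannot redirect the function pointers.
 */
static const struct model_seq_ops model_ops = {
    .start = model_start,
    .next  = model_next,
};

/* Walk `steps` positions through the table's callbacks. */
static int model_walk(const struct model_seq_ops *ops, int steps)
{
    assert(ops != NULL && steps >= 0);
    int pos = ops->start(0);
    for (int i = 0; i < steps; i++)
        pos = ops->next(pos);
    return pos;
}
```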
     

11 Sep, 2009

1 commit

  • Previously the journal_async_commit mount option was equivalent to
    using barrier=0 (and just as unsafe). This patch fixes it so that we
    eliminate the barrier before the commit block (by not using ordered
    mode), and explicitly issuing an empty barrier bio after writing the
    commit block. Because of the journal checksum, it is safe to do this;
    if the journal blocks are not all written before a power failure, the
    checksum in the commit block will prevent the last transaction from
    being replayed.

    Using the fs_mark benchmark, using journal_async_commit shows a 50%
    improvement:

    FSUse% Count Size Files/sec App Overhead
    8 1000 10240 30.5 28242

    vs.

    FSUse% Count Size Files/sec App Overhead
    8 1000 10240 45.8 28620

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
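
    The 50% figure follows from the two Files/sec values quoted above.
    A minimal check, with a hypothetical helper name:

```c
#include <assert.h>

/* Relative improvement of `after` over `before`, in percent. */
static double model_improvement_pct(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

    For 30.5 and 45.8 files/sec this gives (45.8 - 30.5) / 30.5, or
    roughly 50.2%.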
     

18 Aug, 2009

1 commit


11 Aug, 2009

1 commit

  • Fix jiffy rounding in the jbd commit timer setup code. Rounding down
    could cause the timer to fire before the corresponding transaction
    has expired. That transaction can then stay uncommitted forever if no
    new transaction is created and no explicit sync/umount happens.

    Signed-off-by: Alex Zhuravlev (Tomas)
    Signed-off-by: Andreas Dilger
    Signed-off-by: "Theodore Ts'o"

    Andreas Dilger
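
    Why the rounding direction matters can be seen in a small model.
    This is a sketch only: MODEL_HZ and the model_* helpers are
    hypothetical, rounding is to whole seconds, and counter wraparound
    is ignored.

```c
#include <assert.h>

#define MODEL_HZ 1000UL   /* ticks per second assumed for this sketch */

/* Round a tick count down to a whole second. */
static unsigned long model_round_down(unsigned long j)
{
    return j - (j % MODEL_HZ);
}

/* Round a tick count up to a whole second. */
static unsigned long model_round_up(unsigned long j)
{
    unsigned long rem = j % MODEL_HZ;

    return rem ? j + (MODEL_HZ - rem) : j;
}

/* A commit timer is safe only if it cannot fire before the expiry tick. */
static int model_timer_is_safe(unsigned long fires_at, unsigned long t_expires)
{
    return fires_at >= t_expires;
}
```

    For an expiry at tick 5250, rounding down arms the timer at 5000,
    before the transaction has expired, while rounding up arms it at
    6000.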
     

17 Jul, 2009

1 commit