12 Feb, 2011

1 commit

  • On an SMP ARM system running ext4, I've received a report that the
    first J_ASSERT in jbd2_journal_commit_transaction has been triggering:

    J_ASSERT(journal->j_running_transaction != NULL);

    While investigating possible causes for this problem, I noticed that
    __jbd2_log_start_commit() is getting called with j_state_lock only
    read-locked, in spite of the fact that it might modify
    j_commit_request. Fix this by grabbing the necessary information so
    we can test to see if we need to start a new transaction before
    dropping the read lock, and then calling jbd2_log_start_commit() which
    will grab the write lock.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

11 Jan, 2011

1 commit


23 Dec, 2010

1 commit


19 Dec, 2010

5 commits


17 Dec, 2010

1 commit


10 Dec, 2010

1 commit


18 Nov, 2010

1 commit


30 Oct, 2010

1 commit


28 Oct, 2010

6 commits

  • * 'upstream-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
    ext4,jbd2: convert tracepoints to use major/minor numbers
    ext4: optimize orphan_list handling for ext4_setattr
    ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new
    ext4: fix compile error in ext4_fallocate()
    ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static
    ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()
    ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static
    ext4: rename {ext,idx}_pblock and inline small extent functions
    ext4: make various ext4 functions be static
    ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()
    ext4: fix kernel oops if the journal superblock has a non-zero j_errno
    ext4: update writeback_index based on last page scanned
    ext4: implement writeback livelock avoidance using page tagging
    ext4: tidy up a void argument in inode.c
    ext4: add batched_discard into ext4 feature list
    ext4: Add batched discard support for ext4
    fs: Add FITRIM ioctl
    ext4: Use return value from sb_issue_discard()
    ext4: Check return value of sb_getblk() and friends
    ext4: use bio layer instead of buffer layer in mpage_da_submit_io
    ...

    Linus Torvalds
     
  • Conflicts:
    fs/ext4/inode.c
    fs/ext4/mballoc.c
    include/trace/events/ext4.h

    Theodore Ts'o
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (24 commits)
    quota: Fix possible oops in __dquot_initialize()
    ext3: Update kernel-doc comments
    jbd/2: fixed typos
    ext2: fixed typo.
    ext3: Fix debug messages in ext3_group_extend()
    jbd: Convert atomic_inc() to get_bh()
    ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate()
    jbd: Fix debug message in do_get_write_access()
    jbd: Check return value of __getblk()
    ext3: Use DIV_ROUND_UP() on group desc block counting
    ext3: Return proper error code on ext3_fill_super()
    ext3: Remove unnecessary casts on bh->b_data
    ext3: Cleanup ext3_setup_super()
    quota: Fix issuing of warnings from dquot_transfer
    quota: fix dquot_disable vs dquot_transfer race v2
    jbd: Convert bitops to buffer fns
    ext3/jbd: Avoid WARN() messages when failing to write the superblock
    jbd: Use offset_in_page() instead of manual calculation
    jbd: Remove unnecessary goto statement
    jbd: Use printk_ratelimited() in journal_alloc_journal_head()
    ...

    Linus Torvalds
     
  • An attempt to modify the file system during the call to
    jbd2_destroy_journal() can lead to a system lockup. So add some
    checking to make it much more obvious when this happens and to
    determine where the offending code is located.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • This fixes a hang seen in jbd2_journal_release_jbd_inode
    on a lot of Power 6 systems running with ext4. When we get
    in the hung state, all I/O to the disk in question gets blocked
    where we stay indefinitely. Looking at the task list, I can see
    we are stuck in jbd2_journal_release_jbd_inode waiting on a
    wake up. I added some debug code to detect this scenario and
    dump additional data if we were stuck in jbd2_journal_release_jbd_inode
    for longer than 30 minutes. When it hit, I was able to see that
    i_flags was 0, suggesting we missed the wake up.

    This patch changes i_flags to be an unsigned long, uses bit operators
    to access it, and adds barriers around the accesses. Prior to applying
    this patch, we were regularly hitting this hang on numerous systems
    in our test environment. After applying the patch, the hangs no longer
    occur.

    Signed-off-by: Brian King
    Signed-off-by: "Theodore Ts'o"

    Brian King
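
As a rough userspace illustration of the idea in this patch (a flags word that is only touched through atomic bit operations with ordering guarantees), the sketch below uses C11 atomics. The names `struct toy_inode`, `TOY_COMMIT_RUNNING_BIT` and the `toy_*` helpers are hypothetical, and C11 sequentially consistent atomics are swapped in here for whatever bit operations and barriers the kernel patch actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_COMMIT_RUNNING_BIT 0  /* hypothetical bit number */

/* Hypothetical stand-in for the per-inode journal state. */
struct toy_inode {
    atomic_ulong flags;  /* an unsigned long wide flags word */
};

/* Atomically set one bit; seq_cst ordering acts as a full barrier. */
void toy_set_flag(struct toy_inode *ji, int bit)
{
    atomic_fetch_or_explicit(&ji->flags, 1UL << bit, memory_order_seq_cst);
}

/* Atomically clear one bit. A real implementation would wake any
 * waiters after this point; the ordering ensures they observe the
 * cleared bit once woken. */
void toy_clear_flag(struct toy_inode *ji, int bit)
{
    atomic_fetch_and_explicit(&ji->flags, ~(1UL << bit), memory_order_seq_cst);
}

/* Read one bit with the same ordering as the writers. */
bool toy_test_flag(struct toy_inode *ji, int bit)
{
    unsigned long v = atomic_load_explicit(&ji->flags, memory_order_seq_cst);
    return (v >> bit) & 1UL;
}
```

With a plain, non-atomic flags field, a waiter on another CPU can test the flag and go to sleep without ever seeing the update, which matches the missed wake-up described above.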
     
  • "wakup"

    Signed-off-by: Andrea Gelmini
    Signed-off-by: Jan Kara

    Andrea Gelmini
     

23 Oct, 2010

2 commits

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     
  • * 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
    cfq-iosched: Fix a gcc 4.5 warning and put some comments
    block: Turn bvec_k{un,}map_irq() into static inline functions
    block: fix accounting bug on cross partition merges
    block: Make the integrity mapped property a bio flag
    block: Fix double free in blk_integrity_unregister
    block: Ensure physical block size is unsigned int
    blkio-throttle: Fix possible multiplication overflow in iops calculations
    blkio-throttle: limit max iops value to UINT_MAX
    blkio-throttle: There is no need to convert jiffies to milli seconds
    blkio-throttle: Fix link failure failure on i386
    blkio: Recalculate the throttled bio dispatch time upon throttle limit change
    blkio: Add root group to td->tg_list
    blkio: deletion of a cgroup was causes oops
    blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
    block: set the bounce_pfn to the actual DMA limit rather than to max memory
    block: revert bad fix for memory hotplug causing bounces
    Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
    block: set the bounce_pfn to the actual DMA limit rather than to max memory
    block: Prevent hang_check firing during long I/O
    cfq: improve fsync performance for small files
    ...

    Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h

    Linus Torvalds
     

20 Sep, 2010

1 commit

  • Fsync performance for small files achieved by cfq on high-end disks is
    lower than what deadline can achieve, due to idling introduced between
    the sync write happening in process context and the journal commit.

    Moreover, when competing with a sequential reader, a process writing
    small files and fsync-ing them is starved.

    This patch fixes the two problems by:
    - marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE
    flag set,
    - forcing all queues that have REQ_NOIDLE requests to be put in the noidle
    tree.

    Having the queue associated to the fsync-ing process and the one associated
    to journal commits in the noidle tree allows:
    - switching between them without idling,
    - fairness vs. competing idling queues, since they will be serviced only
    after the noidle tree expires its slice.

    Acked-by: Vivek Goyal
    Reviewed-by: Jeff Moyer
    Tested-by: Jeff Moyer
    Signed-off-by: Corrado Zoccolo
    Signed-off-by: Jens Axboe

    Corrado Zoccolo
     

17 Sep, 2010

1 commit

  • All the blkdev_issue_* helpers can only sanely be used by synchronous
    callers. To issue cache flushes or barriers asynchronously the caller needs
    to set up a bio by itself with a completion callback to move the asynchronous
    state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
    specified when calling blkdev_issue_* and also remove the now unused flags
    argument to blkdev_issue_flush and blkdev_issue_zeroout. For
    blkdev_issue_discard we need to keep it for the secure discard flag, which
    gains a more descriptive name and loses the bitops vs flag confusion.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

10 Sep, 2010

3 commits

  • Before we start accessing a huge (> 16 TiB) OCFS2 volume, we need to
    confirm that its journal supports 64-bit offsets. In particular, we
    need to check the journal's feature bits before recovering the journal.

    This is not possible with JBD2 at present, because the journal
    superblock (where the feature bits reside) is not loaded from disk until
    the journal is recovered.

    This patch loads the journal superblock in
    jbd2_journal_check_used_features() if it has not already been loaded,
    allowing us to check the feature bits before journal recovery.

    Signed-off-by: Patrick LoPresti
    Cc: linux-ext4@vger.kernel.org
    Acked-by: "Theodore Ts'o"
    Signed-off-by: Joel Becker

    Patrick J. LoPresti
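
The "load the superblock on demand before checking feature bits" behaviour described above might look roughly like the following userspace sketch. Everything here is an assumption made for illustration: `struct toy_journal`, `toy_load_superblock()`, `toy_check_used_features()` and the value of `TOY_FEATURE_INCOMPAT_64BIT` are hypothetical and simplified, and the "disk" is just a field in the struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FEATURE_INCOMPAT_64BIT 0x2u  /* illustrative value only */

/* Hypothetical journal with an in-memory copy of the superblock bits. */
struct toy_journal {
    bool sb_loaded;            /* has the superblock been read yet? */
    uint32_t incompat;         /* in-memory feature bits */
    uint32_t ondisk_incompat;  /* what a real read would fetch from disk */
    int loads;                 /* counts simulated disk reads */
};

/* Simulated superblock read; returns 0 on success. */
int toy_load_superblock(struct toy_journal *j)
{
    j->incompat = j->ondisk_incompat;
    j->sb_loaded = true;
    j->loads++;
    return 0;
}

/* Returns true only if every requested incompat bit is in use.
 * Loads the superblock first if that has not happened yet, so the
 * check can run before journal recovery. */
bool toy_check_used_features(struct toy_journal *j, uint32_t incompat)
{
    if (!j->sb_loaded && toy_load_superblock(j) != 0)
        return false;
    return (j->incompat & incompat) == incompat;
}
```

A caller in the position described by the commit message would run the feature check first and refuse to proceed with a huge volume if it returns false.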
     
  • Switch to the WRITE_FLUSH_FUA flag for journal commits and remove the
    EOPNOTSUPP detection for barriers.

    Signed-off-by: Christoph Hellwig
    Acked-by: Jan Kara
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently JBD2 relies on blkdev_issue_flush() draining the queue when the ASYNC_COMMIT
    feature is set. This property is going away so make JBD2 wait for buffers it
    needs on its own before submitting the cache flush.

    Signed-off-by: Jan Kara
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Jan Kara
     

18 Aug, 2010

2 commits

  • These flags aren't real I/O types, but tell ll_rw_block to always
    lock the buffer instead of giving up on a failed trylock.

    Instead add a new write_dirty_buffer helper that implements this semantic
    and use it from the existing SWRITE* callers. Note that the ll_rw_block
    code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
    this patch fixes.

    In the ufs code clean up the helper that used to call ll_rw_block
    to mirror sync_dirty_buffer, which is the function it implements for
    compound buffers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
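
The difference between "give up on a failed trylock" and "always lock the buffer" can be shown with a small userspace sketch. It is a loose analogy only: `struct toy_buffer`, `toy_trylock_write()` and `toy_write_dirty_buffer()` are hypothetical names, a pthread mutex stands in for the buffer lock, and a counter stands in for submitting I/O.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer with a lock and a dirty bit. */
struct toy_buffer {
    pthread_mutex_t lock;
    bool dirty;
    int writes;  /* counts simulated write submissions */
};

/* Trylock semantic: skip the buffer entirely if someone holds the lock.
 * Returns true if the buffer was examined, false if it was skipped. */
bool toy_trylock_write(struct toy_buffer *b)
{
    if (pthread_mutex_trylock(&b->lock) != 0)
        return false;
    if (b->dirty) {
        b->dirty = false;
        b->writes++;
    }
    pthread_mutex_unlock(&b->lock);
    return true;
}

/* Always-lock semantic: wait for the lock, then write if still dirty. */
void toy_write_dirty_buffer(struct toy_buffer *b)
{
    pthread_mutex_lock(&b->lock);
    if (b->dirty) {
        b->dirty = false;
        b->writes++;
    }
    pthread_mutex_unlock(&b->lock);
}
```

Making the blocking variant its own helper keeps the locking policy out of the I/O type flags, which is the cleanup this commit message describes.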
     
  • Instead of abusing a buffer_head flag just add a variant of
    sync_dirty_buffer which allows passing the exact type of write
    flag required.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

10 Aug, 2010

1 commit


08 Aug, 2010

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: Adding error check after calling ext4_mb_regular_allocator()
    ext4: Fix dirtying of journalled buffers in data=journal mode
    ext4: re-inline ext4_rec_len_(to|from)_disk functions
    jbd2: Remove t_handle_lock from start_this_handle()
    jbd2: Change j_state_lock to be a rwlock_t
    jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
    ext4: Add mount options in superblock
    ext4: force block allocation on quota_off
    ext4: fix freeze deadlock under IO
    ext4: drop inode from orphan list if ext4_delete_inode() fails
    ext4: check to make make sure bd_dev is set before dereferencing it
    jbd2: Make barrier messages less scary
    ext4: don't print scary messages for allocation failures post-abort
    ext4: fix EFBIG edge case when writing to large non-extent file
    ext4: fix ext4_get_blocks references
    ext4: Always journal quota file modifications
    ext4: Fix potential memory leak in ext4_fill_super
    ext4: Don't error out the fs if the user tries to make a file too big
    ext4: allocate stripe-multiple IOs on stripe boundaries
    ext4: move aio completion after unwritten extent conversion
    ...

    Fix up conflicts in fs/ext4/inode.c as per Ted.

    Fix up xfs conflicts as per earlier xfs merge.

    Linus Torvalds
     

04 Aug, 2010

2 commits


02 Aug, 2010

1 commit


27 Jul, 2010

2 commits

  • Saying things like "sync failed" when a device does
    not support barriers makes users slightly more worried than
    they need to be; rather than talking about sync failures,
    let's just state the barrier-based facts.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • __GFP_NOFAIL is going away, so add our own retry loop. Also add
    jbd2__journal_start() and jbd2__journal_restart() which take a gfp
    mask, so that file systems can optionally (re)start transaction
    handles using GFP_KERNEL. If they do this, then they need to be
    prepared to handle receiving an ERR_PTR(-ENOMEM) error, and be ready
    to reflect that error up to userspace.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
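
For readers unfamiliar with error pointers, here is a minimal userspace sketch of what "be prepared to handle -ENOMEM from a start call" can look like. The helpers `toy_err_ptr()`, `toy_ptr_err()` and `toy_is_err()` are hypothetical re-creations of the kernel's error-pointer idiom, and `toy_journal_start()` with its `simulate_oom` parameter is invented purely for this example; it is not the jbd2 API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }

/* Decode the errno back out of an error pointer. */
long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Error pointers occupy the top TOY_MAX_ERRNO addresses. */
bool toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical transaction handle. */
struct toy_handle {
    int credits;
};

/* Hypothetical start call that may fail instead of retrying forever. */
struct toy_handle *toy_journal_start(int nblocks, bool simulate_oom)
{
    struct toy_handle *h = simulate_oom ? NULL : malloc(sizeof(*h));

    if (!h)
        return toy_err_ptr(-ENOMEM);
    h->credits = nblocks;
    return h;
}

/* Caller that reflects the failure upward instead of assuming success. */
long toy_do_update(int nblocks, bool simulate_oom)
{
    struct toy_handle *h = toy_journal_start(nblocks, simulate_oom);

    if (toy_is_err(h))
        return toy_ptr_err(h);  /* propagate -ENOMEM to the caller */
    free(h);
    return 0;
}
```

The caller checks the returned pointer before using it and passes the negative errno up the call chain, which is the obligation the commit message places on file systems that opt in to GFP_KERNEL.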
     

16 Jul, 2010

1 commit

  • OCFS2 uses the t_commit trigger to compute and store the checksum of the just
    committed blocks. When a buffer has b_frozen_data, checksum is computed
    for it instead of b_data but this can result in an old checksum being
    written to the filesystem in the following scenario:

    1) transaction1 is opened
    2) handle1 is opened
    3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
    4) modify(bh)
    5) journal_dirty(handle1, bh)
    6) handle1 is closed
    7) start committing transaction1, opening transaction2
    8) handle2 is opened
    9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
    jh->b_next_transaction is set to transaction2.
    10) jbd2_journal_write_metadata() checksums b_frozen_data
    11) the journal correctly writes b_frozen_data to the disk journal
    12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
    any more journal operation
    13) Checkpointing finally happens, and it just spools the bh via normal buffer
    writeback. This will write b_data, which was never triggered on and thus
    contains a wrong (old) checksum.

    This patch fixes the problem by calling the trigger at the moment data is
    frozen for journal commit - i.e., either when b_frozen_data is created by
    do_get_write_access or just before we write a buffer to the log if
    b_frozen_data does not exist. We also rename the trigger to t_frozen as
    that better describes when it is called.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Jan Kara
     

15 Jun, 2010

1 commit


28 May, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: Make fsync sync new parent directories in no-journal mode
    ext4: Drop whitespace at end of lines
    ext4: Fix compat EXT4_IOC_ADD_GROUP
    ext4: Conditionally define compat ioctl numbers
    tracing: Convert more ext4 events to DEFINE_EVENT
    ext4: Add new tracepoints to track mballoc's buddy bitmap loads
    ext4: Add a missing trace hook
    ext4: restart ext4_ext_remove_space() after transaction restart
    ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
    ext4: Avoid crashing on NULL ptr dereference on a filesystem error
    ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
    ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
    ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
    ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
    ext4: Use our own write_cache_pages()
    ext4: Show journal_checksum option
    ext4: Fix for ext4_mb_collect_stats()
    ext4: check for a good block group before loading buddy pages
    ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
    ext4: Remove extraneous newlines in ext4_msg() calls
    ...

    Fixed up trivial conflict in fs/ext4/fsync.c

    Linus Torvalds
     

22 May, 2010

1 commit


16 May, 2010

1 commit

  • One of the most contended locks in the jbd2 layer is j_state_lock when
    running dbench. This is especially true if using the real-time kernel
    with its "sleeping spinlocks" patch that replaces spinlocks with
    priority inheriting mutexes --- but it also shows up on large SMP
    benchmarks.

    Thanks to John Stultz for pointing this out.

    Reviewed by Mingming Cao and Jan Kara.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o