03 Jan, 2013

1 commit

  • Pull ext4 bug fixes from Ted Ts'o:
    "Various bug fixes for ext4. Perhaps the most serious bug fixed is one
    which could cause file system corruptions when performing file punch
    operations."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: avoid hang when mounting non-journal filesystems with orphan list
    ext4: lock i_mutex when truncating orphan inodes
    ext4: do not try to write superblock on ro remount w/o journal
    ext4: include journal blocks in df overhead calcs
    ext4: remove unaligned AIO warning printk
    ext4: fix an incorrect comment about i_mutex
    ext4: fix deadlock in journal_unmap_buffer()
    ext4: split off ext4_journalled_invalidatepage()
    jbd2: fix assertion failure in jbd2_journal_flush()
    ext4: check dioread_nolock on remount
    ext4: fix extent tree corruption caused by hole punch

    Linus Torvalds
     

26 Dec, 2012

1 commit

  • We cannot wait for transaction commit in journal_unmap_buffer()
    because we hold the page lock, which ranks below transaction start. We
    solve the issue by bailing out of journal_unmap_buffer() and
    jbd2_journal_invalidatepage() with -EBUSY. The caller is then responsible
    for waiting for the transaction commit to finish and trying the
    invalidation again. Since the issue can happen only for a page straddling
    i_size, it is simple enough to manually call jbd2_journal_invalidatepage()
    for such a page from ext4_setattr(), check the return value, and wait if
    necessary.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
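
The bail-out-and-retry pattern from the commit message above can be sketched in userspace C. This is only an illustration, not the kernel code: `struct retry_journal`, `fake_invalidatepage()`, `fake_wait_for_commit()` and `invalidate_with_retry()` are hypothetical stand-ins, and the points where the page lock would be taken and dropped are assumptions marked in comments.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for journal state; not a kernel structure. */
struct retry_journal {
    bool commit_in_progress; /* a committing transaction still owns the buffer */
    int  waits;              /* how many times the caller had to wait */
};

/*
 * Stand-in for jbd2_journal_invalidatepage(): it must not sleep waiting
 * for the commit because the caller holds the page lock, so it bails out
 * with -EBUSY instead.
 */
int fake_invalidatepage(struct retry_journal *j)
{
    if (j->commit_in_progress)
        return -EBUSY;
    return 0;
}

/* Stand-in for waiting until the transaction commit has finished. */
void fake_wait_for_commit(struct retry_journal *j)
{
    j->commit_in_progress = false;
    j->waits++;
}

/*
 * Caller side (the role ext4_setattr() plays in the commit message):
 * try the invalidation, and on -EBUSY wait for the commit and retry.
 */
int invalidate_with_retry(struct retry_journal *j)
{
    int ret;

    for (;;) {
        /* assumption: the page lock would be taken here */
        ret = fake_invalidatepage(j);
        /* assumption: the page lock would be dropped here */
        if (ret != -EBUSY)
            return ret;
        fake_wait_for_commit(j); /* waiting happens without the page lock */
    }
}
```

The point of the design is that the function that runs under the page lock never blocks; only the caller, which can release the lock first, waits.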
     

21 Dec, 2012

1 commit

  • The following race is possible between start_this_handle() and someone
    calling jbd2_journal_flush().

    Process A                               Process B
    start_this_handle().
      if (journal->j_barrier_count) # false
      if (!journal->j_running_transaction) { #true
        read_unlock(&journal->j_state_lock);
                                            jbd2_journal_lock_updates()
                                            jbd2_journal_flush()
                                              write_lock(&journal->j_state_lock);
                                              if (journal->j_running_transaction) {
                                                # false
                                              ... wait for committing trans ...
                                              write_unlock(&journal->j_state_lock);
        ...
        write_lock(&journal->j_state_lock);
        if (!journal->j_running_transaction) { # true
          jbd2_get_transaction(journal, new_transaction);
        write_unlock(&journal->j_state_lock);
        goto repeat; # eventually blocks on j_barrier_count > 0
                                              ...
                                              J_ASSERT(!journal->j_running_transaction);
                                                # fails

    We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
    in exclusive mode.

    Reported-by: yjwsignal@empal.com
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Jan Kara
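
The fix described above is a recheck of a condition after a lock has been dropped and retaken. A minimal userspace sketch follows, assuming a simplified model: `struct barrier_journal` and `try_install_transaction()` are hypothetical names, and the lock itself is only indicated in comments rather than implemented.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the journal fields named above. */
struct barrier_journal {
    int  j_barrier_count;       /* > 0 while updates are locked out */
    bool j_running_transaction; /* true once a transaction is installed */
};

/*
 * Models the step in start_this_handle() that runs after j_state_lock has
 * been retaken in exclusive (write) mode. j_barrier_count was checked
 * earlier under the shared lock, but the lock was dropped in between, so
 * that earlier answer may be stale and has to be checked again.
 *
 * Returns true if a new running transaction was installed.
 */
bool try_install_transaction(struct barrier_journal *j)
{
    bool installed = false;

    /* write_lock(&journal->j_state_lock) is assumed to be held here */
    if (!j->j_running_transaction && j->j_barrier_count == 0) {
        j->j_running_transaction = true; /* stands in for jbd2_get_transaction() */
        installed = true;
    }
    /* write_unlock(); the caller then goes back to its "repeat" label */
    return installed;
}
```

With the recheck in place, a flusher that raised the barrier while the lock was dropped never sees a running transaction appear behind its back.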
     

17 Dec, 2012

1 commit

  • Pull ext4 update from Ted Ts'o:
    "There are two major features for this merge window. The first is
    inline data, which allows small files or directories to be stored in
    the in-inode extended attribute area. (This requires that the file
    system use inodes which are at least 256 bytes or larger; 128 byte
    inodes do not have any room for in-inode xattrs.)

    The second new feature is SEEK_HOLE/SEEK_DATA support. This is
    enabled by the extent status tree patches, and this infrastructure
    will be used to further optimize ext4 in the future.

    Beyond that, we have the usual collection of code cleanups and bug
    fixes."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
    ext4: zero out inline data using memset() instead of empty_zero_page
    ext4: ensure Inode flags consistency are checked at build time
    ext4: Remove CONFIG_EXT4_FS_XATTR
    ext4: remove unused variable from ext4_ext_in_cache()
    ext4: remove redundant initialization in ext4_fill_super()
    ext4: remove redundant code in ext4_alloc_inode()
    ext4: use sync_inode_metadata() when syncing inode metadata
    ext4: enable ext4 inline support
    ext4: let fallocate handle inline data correctly
    ext4: let ext4_truncate handle inline data correctly
    ext4: evict inline data out if we need to strore xattr in inode
    ext4: let fiemap work with inline data
    ext4: let ext4_rename handle inline dir
    ext4: let empty_dir handle inline dir
    ext4: let ext4_delete_entry() handle inline data
    ext4: make ext4_delete_entry generic
    ext4: let ext4_find_entry handle inline data
    ext4: create a new function search_dir
    ext4: let ext4_readdir handle inline data
    ext4: let add_dir_entry handle inline data properly
    ...

    Linus Torvalds
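
The inode-size requirement in the pull message above comes down to simple arithmetic. The sketch below is a rough upper bound only: `in_inode_xattr_room()` is a hypothetical helper, the `extra_isize` values used with it are assumptions, and xattr header and entry overhead is ignored. The 128-byte figure is the classic inode size mentioned in the message.

```c
/*
 * Rough upper bound on the bytes available for in-inode extended
 * attributes: whatever is left after the classic 128-byte inode body
 * and the extra fixed fields (extra_isize) that follow it.
 */
unsigned int in_inode_xattr_room(unsigned int inode_size,
                                 unsigned int extra_isize)
{
    const unsigned int good_old_inode_size = 128;

    if (inode_size <= good_old_inode_size)
        return 0; /* 128-byte inodes have no room at all */
    if (extra_isize > inode_size - good_old_inode_size)
        return 0; /* inconsistent input: treat as no room */
    return inode_size - good_old_inode_size - extra_isize;
}
```

For example, a 256-byte inode with an assumed 32 bytes of extra fixed fields leaves at most 96 bytes, which is why only small files and directories fit.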
     

19 Nov, 2012

1 commit


09 Nov, 2012

1 commit

  • ext4_handle_release_buffer() was intended to remove journal
    write access from a buffer, but it doesn't actually do anything
    at all other than add a BUFFER_TRACE point, but it's not reliably
    used for that either. Remove all the associated dead code.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Carlos Maiolino

    Eric Sandeen
     

08 Oct, 2012

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "The big new feature added this time is supporting online resizing
    using the meta_bg feature. This allows us to resize file systems
    which are greater than 16TB. In addition, the speed of online
    resizing has been improved in general.

    We also fix a number of races, some of which could lead to deadlocks,
    in ext4's Asynchronous I/O and online defrag support, thanks to good
    work by Dmitry Monakhov.

    There are also a large number of more minor bug fixes and cleanups
    from a number of other ext4 contributors, quite a few of whom have
    submitted fixes for the first time."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
    ext4: fix ext4_flush_completed_IO wait semantics
    ext4: fix mtime update in nodelalloc mode
    ext4: fix ext_remove_space for punch_hole case
    ext4: punch_hole should wait for DIO writers
    ext4: serialize truncate with owerwrite DIO workers
    ext4: endless truncate due to nonlocked dio readers
    ext4: serialize unlocked dio reads with truncate
    ext4: serialize dio nonlocked reads with defrag workers
    ext4: completed_io locking cleanup
    ext4: fix unwritten counter leakage
    ext4: give i_aiodio_unwritten a more appropriate name
    ext4: ext4_inode_info diet
    ext4: convert to use leXX_add_cpu()
    ext4: ext4_bread usage audit
    fs: reserve fallocate flag codepoint
    ext4: remove redundant offset check in mext_check_arguments()
    ext4: don't clear orphan list on ro mount with errors
    jbd2: fix assertion failure in commit code due to lacking transaction credits
    ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
    ext4: enable FITRIM ioctl on bigalloc file system
    ...

    Linus Torvalds
     

27 Sep, 2012

1 commit

  • ext4 users of data=journal mode with blocksize < pagesize were
    occasionally hitting an assertion failure in
    jbd2_journal_commit_transaction() that checks whether the transaction has
    at least as many credits reserved as buffers attached. The core of the
    problem is that when a file gets truncated, buffers that still need
    checkpointing or that are attached to the committing transaction are
    left with buffer_mapped set. When this happens to buffers beyond i_size
    attached to a page straddling i_size, a subsequent write extending the
    file will see these buffers, and because they are mapped (although the
    underlying blocks were freed), things go awry from there.

    The assertion failure just coincidentally (and in this case luckily as
    we would start corrupting filesystem) triggers due to journal_head not
    being properly cleaned up as well.

    We fix the problem by unmapping buffers if possible (in lots of cases we
    just need a buffer attached to a transaction as a place holder but it
    must not be written out anyway). And in one case, we just have to bite
    the bullet and wait for transaction commit to finish.

    CC: Josef Bacik
    Signed-off-by: Jan Kara

    Jan Kara
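
To make "buffers beyond i_size attached to a page straddling i_size" concrete, here is a small arithmetic sketch. It is not taken from the patch: `buffers_beyond_isize()` is a hypothetical helper that counts the block-sized buffers of the straddling page that lie entirely past i_size.

```c
/*
 * Count the buffers in the page containing i_size that start at or after
 * i_size rounded up to a block boundary. These are the buffers that lie
 * completely beyond the end of the file after a truncate.
 */
unsigned int buffers_beyond_isize(unsigned long long i_size,
                                  unsigned int page_size,
                                  unsigned int block_size)
{
    unsigned long long page_start = (i_size / page_size) * page_size;
    unsigned long long page_end = page_start + page_size;
    /* first block boundary at or after i_size */
    unsigned long long first_beyond =
        ((i_size + block_size - 1) / block_size) * block_size;

    if (i_size % page_size == 0)
        return 0; /* i_size sits on a page boundary: no straddling page */
    return (unsigned int)((page_end - first_beyond) / block_size);
}
```

With 4096-byte pages, 1024-byte blocks and i_size = 5000, the straddling page covers bytes 4096 to 8191 and three of its four buffers (starting at 5120, 6144 and 7168) are beyond i_size. When the block size equals the page size the count is always zero, which matches the commit's note that only blocksize < pagesize is affected.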
     

19 Aug, 2012

1 commit

  • This sequence:

    # truncate --size=1g fsfile
    # mkfs.ext4 -F fsfile
    # mount -o loop,ro fsfile /mnt
    # umount /mnt
    # dmesg | tail

    results in an IO error when unmounting the RO filesystem:

    [ 318.020828] Buffer I/O error on device loop1, logical block 196608
    [ 318.027024] lost page write due to I/O error on loop1
    [ 318.032088] JBD2: Error -5 detected when updating journal superblock for loop1-8.

    This was a regression introduced by commit 24bcc89c7e7c: "jbd2: split
    updating of journal superblock and marking journal empty".

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Eric Sandeen
     

17 Aug, 2012

2 commits

  • Pull ext4 bug fixes from Ted Ts'o:
    "The following are all bug fixes and regressions. The most notable are
    the ones which cause problems for ext4 on RAID --- a performance
    problem when mounting very large filesystems, and a kernel OOPS when
    doing an rm -rf on large directory hierarchies on fast devices."

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix kernel BUG on large-scale rm -rf commands
    ext4: fix long mount times on very big file systems
    ext4: don't call ext4_error while block group is locked
    ext4: avoid kmemcheck complaint from reading uninitialized memory
    ext4: make sure the journal sb is written in ext4_clear_journal_err()

    Linus Torvalds
     
  • blkdev_issue_flush() can fail; make sure the error gets properly
    propagated.

    This is a port of the equivalent jbd patch from commit 349ecd6a3c0e.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
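
A minimal sketch of what propagating a flush error can look like, written as userspace C. Everything here is hypothetical: `flush_fn` and the `flush_*` stubs stand in for blkdev_issue_flush(), and having two flush points in one commit is an assumption made for illustration.

```c
#include <errno.h>

/* Hypothetical callback type standing in for blkdev_issue_flush(). */
typedef int (*flush_fn)(void);

int flush_ok(void)       { return 0; }
int flush_fail_io(void)  { return -EIO; }
int flush_fail_mem(void) { return -ENOMEM; }

/*
 * Run two flushes (assumed here: one for the filesystem device and one
 * for the journal device) and return the first error instead of silently
 * dropping it. Later steps still run so that cleanup is not skipped.
 */
int commit_with_flush(flush_fn flush_fs_dev, flush_fn flush_journal_dev)
{
    int err = 0;
    int ret;

    ret = flush_fs_dev();
    if (ret && !err)
        err = ret;

    ret = flush_journal_dev();
    if (ret && !err)
        err = ret;

    return err; /* 0 only if every flush succeeded */
}
```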
     

06 Aug, 2012

1 commit

  • After we set the EXT4_ERROR_FS bit in the file system
    superblock, it's not enough to call jbd2_journal_clear_err() to clear
    the error indication from journal superblock --- we need to call
    jbd2_journal_update_sb_errno() as well. Otherwise, when the root file
    system is mounted read-only, the journal is replayed, and the error
    indicator is transferred to the superblock --- but the s_errno field
    in the jbd2 superblock is left set (since although we cleared it in
    memory, we never flushed it out to disk).

    This can end up confusing e2fsck. We should make e2fsck more robust
    in this case, but the kernel shouldn't be leaving things in this
    confused state, either.

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
     

04 Aug, 2012

1 commit


23 Jul, 2012

1 commit


01 Jun, 2012

1 commit


27 May, 2012

7 commits


23 May, 2012

1 commit


24 Apr, 2012

1 commit

  • The flush request is issued in the transaction commit code path, so using
    GFP_KERNEL to allocate memory for the flush request bio looks like it falls
    into the classic deadlock issue. I saw btrfs and dm get it right, but ext4,
    xfs and md are using GFP.

    Signed-off-by: Shaohua Li
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org

    Shaohua Li
     

29 Mar, 2012

3 commits

  • …m/linux/kernel/git/dhowells/linux-asm_system

    Pull "Disintegrate and delete asm/system.h" from David Howells:
    "Here are a bunch of patches to disintegrate asm/system.h into a set of
    separate bits to relieve the problem of circular inclusion
    dependencies.

    I've built all the working defconfigs from all the arches that I can
    and made sure that they don't break.

    The reason for these patches is that I recently encountered a circular
    dependency problem that came about when I produced some patches to
    optimise get_order() by rewriting it to use ilog2().

    This uses bitops - and on the SH arch asm/bitops.h drags in
    asm-generic/get_order.h by a circuitous route involving asm/system.h.

    The main difficulty seems to be asm/system.h. It holds a number of
    low level bits with no/few dependencies that are commonly used (eg.
    memory barriers) and a number of bits with more dependencies that
    aren't used in many places (eg. switch_to()).

    These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

    Move memory barriers here. This is already done for MIPS and Alpha.

    (2) asm/switch_to.h

    Move switch_to() and related stuff here.

    (3) asm/exec.h

    Move arch_align_stack() here. Other process execution related bits
    could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

    Move xchg() and cmpxchg() here as they're full word atomic ops and
    frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

    Move die() and related bits.

    (6) asm/auxvec.h

    Move AT_VECTOR_SIZE_ARCH here.

    Other arch headers are created as needed on a per-arch basis."

    Fixed up some conflicts from other header file cleanups and moving code
    around that has happened in the meantime, so David's testing is somewhat
    weakened by that. We'll find out anything that got broken and fix it.

    * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
    Delete all instances of asm/system.h
    Remove all #inclusions of asm/system.h
    Add #includes needed to permit the removal of asm/system.h
    Move all declarations of free_initmem() to linux/mm.h
    Disintegrate asm/system.h for OpenRISC
    Split arch_align_stack() out from asm-generic/system.h
    Split the switch_to() wrapper out of asm-generic/system.h
    Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
    Create asm-generic/barrier.h
    Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
    Disintegrate asm/system.h for Xtensa
    Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
    Disintegrate asm/system.h for Tile
    Disintegrate asm/system.h for Sparc
    Disintegrate asm/system.h for SH
    Disintegrate asm/system.h for Score
    Disintegrate asm/system.h for S390
    Disintegrate asm/system.h for PowerPC
    Disintegrate asm/system.h for PA-RISC
    Disintegrate asm/system.h for MN10300
    ...

    Linus Torvalds
     
  • Remove all #inclusions of asm/system.h preparatory to splitting and killing
    it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

    Signed-off-by: David Howells

    David Howells
     
  • Pull ext4 updates for 3.4 from Ted Ts'o:
    "Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes

    The changes to export dirty_writeback_interval are from Artem's s_dirt
    cleanup patch series. The same is true of the change to remove the
    s_dirt helper functions which never got used by anyone in-tree. I've
    run these changes by Al Viro, and am carrying them so that Artem can
    more easily fix up the rest of the file systems during the next merge
    window. (Originally we had hoped to remove the use of s_dirt from
    ext4 during this merge window, but his patches had some bugs, so I
    ultimately ended up dropping them from the ext4 tree.)"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits)
    vfs: remove unused superblock helpers
    mm: export dirty_writeback_interval
    ext4: remove useless s_dirt assignment
    ext4: write superblock only once on unmount
    ext4: do not mark superblock as dirty unnecessarily
    ext4: correct ext4_punch_hole return codes
    ext4: remove restrictive checks for EOFBLOCKS_FL
    ext4: always set then trimmed blocks count into len
    ext4: fix trimmed block count accunting
    ext4: fix start and len arguments handling in ext4_trim_fs()
    ext4: update s_free_{inodes,blocks}_count during online resize
    ext4: change some printk() calls to use ext4_msg() instead
    ext4: avoid output message interleaving in ext4_error_()
    ext4: remove trailing newlines from ext4_msg() and ext4_error() messages
    ext4: add no_printk argument validation, fix fallout
    ext4: remove redundant "EXT4-fs: " from uses of ext4_msg
    ext4: give more helpful error message in ext4_ext_rm_leaf()
    ext4: remove unused code from ext4_ext_map_blocks()
    ext4: rewrite punch hole to use ext4_ext_remove_space()
    jbd2: cleanup journal tail after transaction commit
    ...

    Linus Torvalds
     

22 Mar, 2012

1 commit

  • Pull power management updates for 3.4 from Rafael Wysocki:
    "Assorted extensions and fixes including:

    * Introduction of early/late suspend/hibernation device callbacks.
    * Generic PM domains extensions and fixes.
    * devfreq updates from Axel Lin and MyungJoo Ham.
    * Device PM QoS updates.
    * Fixes of concurrency problems with wakeup sources.
    * System suspend and hibernation fixes."

    * tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (43 commits)
    PM / Domains: Check domain status during hibernation restore of devices
    PM / devfreq: add relation of recommended frequency.
    PM / shmobile: Make MTU2 driver use pm_genpd_dev_always_on()
    PM / shmobile: Make CMT driver use pm_genpd_dev_always_on()
    PM / shmobile: Make TMU driver use pm_genpd_dev_always_on()
    PM / Domains: Introduce "always on" device flag
    PM / Domains: Fix hibernation restore of devices, v2
    PM / Domains: Fix handling of wakeup devices during system resume
    sh_mmcif / PM: Use PM QoS latency constraint
    tmio_mmc / PM: Use PM QoS latency constraint
    PM / QoS: Make it possible to expose PM QoS latency constraints
    PM / Sleep: JBD and JBD2 missing set_freezable()
    PM / Domains: Fix include for PM_GENERIC_DOMAINS=n case
    PM / Freezer: Remove references to TIF_FREEZE in comments
    PM / Sleep: Add more wakeup source initialization routines
    PM / Hibernate: Enable usermodehelpers in hibernate() error path
    PM / Sleep: Make __pm_stay_awake() delete wakeup source timers
    PM / Sleep: Fix race conditions related to wakeup source timer function
    PM / Sleep: Fix possible infinite loop during wakeup source destruction
    PM / Hibernate: print physical addresses consistently with other parts of kernel
    ...

    Linus Torvalds
     

20 Mar, 2012

1 commit


14 Mar, 2012

9 commits

  • Normally, we have to issue a cache flush before we can update journal tail in
    journal superblock, effectively wiping out old transactions from the journal.
    So use the fact that during transaction commit we issue cache flush anyway and
    opportunistically push journal tail as far as we can. Since update of journal
    superblock is still costly (we have to use WRITE_FUA), we update log tail only
    if we can free significant amount of space.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
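
The "only if we can free a significant amount of space" rule above can be sketched as follows. This is a simplified userspace model: `struct log_geom`, `blocks_freed()` and `worth_updating_tail()` are hypothetical names, and the one-quarter threshold is an assumption chosen for the example, not a value stated in the commit message.

```c
/* Hypothetical, simplified journal geometry, in block numbers. */
struct log_geom {
    unsigned long j_first; /* first usable block of the log */
    unsigned long j_last;  /* one past the last usable block */
    unsigned long j_tail;  /* log tail currently recorded on disk */
};

/* Blocks released by moving the tail to new_tail; the log is circular. */
unsigned long blocks_freed(const struct log_geom *j, unsigned long new_tail)
{
    unsigned long log_size = j->j_last - j->j_first;

    if (new_tail >= j->j_tail)
        return new_tail - j->j_tail;
    return new_tail + log_size - j->j_tail; /* tail wrapped around */
}

/*
 * Assumed policy: the superblock write is costly, so only do it when at
 * least a quarter of the log would be freed.
 */
int worth_updating_tail(const struct log_geom *j, unsigned long new_tail)
{
    unsigned long log_size = j->j_last - j->j_first;

    return blocks_freed(j, new_tail) >= log_size / 4;
}
```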
     
  • All accesses to checkpointing entries in journal_head are protected
    by j_list_lock. Thus __jbd2_journal_remove_checkpoint() doesn't really
    need bh_state lock.

    Also the only part of journal head that the rest of checkpointing code
    needs to check is jh->b_transaction which is safe to read under
    j_list_lock.

    So we can safely remove bh_state lock from all of checkpointing code which
    makes it considerably prettier.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • The check b_jlist == BJ_None in __journal_try_to_free_buffer() is
    always true (__jbd2_journal_temp_unlink_buffer() also checks this in
    an assertion) so just remove it.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • BH_JWrite bit should be set when buffer is written to the journal. So
    checkpointing shouldn't set this bit when writing out buffer. This didn't
    cause any observable bug since BH_JWrite bit is used only for debugging
    purposes but it's good to have this consistent.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
    checkpointed buffers are on a stable storage - especially if buffers were
    written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
    caches. Thus when we update journal superblock effectively removing old
    transaction from journal, this write of superblock can get to stable storage
    before those checkpointed buffers which can result in filesystem corruption
    after a crash. Thus we must unconditionally issue a cache flush before we
    update journal superblock in these cases.

    A similar problem can also occur if journal superblock is written only in
    disk's caches, other transaction starts reusing space of the transaction
    cleaned from the log and power failure happens. Subsequent journal replay would
    still try to replay the old transaction but some of its blocks may be already
    overwritten by the new transaction. For this reason we must use WRITE_FUA when
    updating log tail and we must first write new log tail to disk and update
    in-memory information only after that.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
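
The ordering argued for above (flush the cache, write the new tail durably, and only then update in-memory state) can be sketched as a tiny model that records the order of steps. All names here (`struct tail_update`, `enum tail_step`, `update_log_tail_ordered()`) are hypothetical, and the real I/O is replaced by assignments.

```c
#define MAX_STEPS 8

enum tail_step { STEP_FLUSH_CACHE, STEP_WRITE_SB_FUA, STEP_UPDATE_MEMORY };

/* Hypothetical model: records what happened and in which order. */
struct tail_update {
    enum tail_step steps[MAX_STEPS];
    int n;
    unsigned long disk_tail;   /* tail recorded in the on-disk superblock */
    unsigned long memory_tail; /* tail the running journal works with */
};

static void record_step(struct tail_update *t, enum tail_step s)
{
    if (t->n < MAX_STEPS)
        t->steps[t->n++] = s;
}

void update_log_tail_ordered(struct tail_update *t, unsigned long new_tail)
{
    /* 1. Checkpointed buffers must reach stable storage first. */
    record_step(t, STEP_FLUSH_CACHE);

    /* 2. The new tail goes to disk with a forced-unit-access write. */
    t->disk_tail = new_tail;
    record_step(t, STEP_WRITE_SB_FUA);

    /* 3. Only now may the freed log space be reused in memory. */
    t->memory_tail = new_tail;
    record_step(t, STEP_UPDATE_MEMORY);
}
```

Swapping steps 1 and 2 corresponds to the first problem in the commit message, and swapping steps 2 and 3 corresponds to the second.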
     
  • With the latest and greatest changes to the freezer, I started seeing
    panics that were caused by jbd2 running post-process freezing and
    hitting the canary BUG_ON for non-TuxOnIce I/O submission. I've traced
    this back to a lack of set_freezable calls in both jbd and jbd2. Since
    they're clearly meant to be frozen (there are tests for freezing()), I
    submit the following patch to add the missing calls.

    Signed-off-by: Nigel Cunningham
    Acked-by: Jan Kara
    Signed-off-by: Rafael J. Wysocki

    Nigel Cunningham
     
  • There are some log tail updates that are not protected by j_checkpoint_mutex.
    Some of these are harmless because they happen during startup or shutdown but
    updates in jbd2_journal_commit_transaction() and jbd2_journal_flush() can
    really race with other log tail updates (e.g. someone doing
    jbd2_journal_flush() with someone running jbd2_cleanup_journal_tail()). So
    protect all log tail updates with j_checkpoint_mutex.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • There are three cases of updating journal superblock. In the first case, we want
    to mark journal as empty (setting s_sequence to 0), in the second case we want
    to update log tail, in the third case we want to update s_errno. Split these
    cases into separate functions. It makes the code slightly more straightforward
    and later patches will make the distinction even more important.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

21 Feb, 2012

2 commits