13 Oct, 2016

1 commit

  • When 'jh->b_transaction == transaction' (asserted by below)

    J_ASSERT_JH(jh, (jh->b_transaction == transaction || ...

    'journal->j_list_lock' will be incorrectly unlocked, since
    the lock is acquired only at the end of if / else-if
    statements (missing the else case).

    Signed-off-by: Taesoo Kim
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Fixes: 6e4862a5bb9d12be87e4ea5d9a60836ebed71d28
    Cc: stable@vger.kernel.org # 3.14+

    Taesoo Kim
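
The lock imbalance described above can be sketched in user space. This is a toy model, not the jbd2 code: `struct toy_lock`, `enum toy_owner` and both `file_buffer_*` functions are hypothetical names, and the "fixed" variant shows only one possible way to restore the pairing.

```c
#include <assert.h>

/* Toy stand-in for a spinlock: counts holds and records any unlock
 * that was never paired with a lock. Not the kernel's spinlock_t. */
struct toy_lock {
    int held;
    int bad_unlocks;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    l->held++;
}

static void toy_lock_release(struct toy_lock *l)
{
    if (l->held == 0)
        l->bad_unlocks++;   /* unlock without a matching lock */
    else
        l->held--;
}

/* Hypothetical buffer states: two cases handled by if / else-if,
 * plus the "already in this transaction" case with no branch. */
enum toy_owner { OWNER_NONE, OWNER_COMMITTING, OWNER_THIS_TRANSACTION };

/* Buggy shape: lock taken inside if / else-if, unlock unconditional. */
static void file_buffer_buggy(struct toy_lock *l, enum toy_owner owner)
{
    if (owner == OWNER_NONE) {
        toy_lock_acquire(l);
    } else if (owner == OWNER_COMMITTING) {
        toy_lock_acquire(l);
    }
    /* OWNER_THIS_TRANSACTION reaches here without holding the lock. */
    toy_lock_release(l);
}

/* One possible fix: release only in the branches that took the lock. */
static void file_buffer_fixed(struct toy_lock *l, enum toy_owner owner)
{
    if (owner == OWNER_NONE) {
        toy_lock_acquire(l);
        toy_lock_release(l);
    } else if (owner == OWNER_COMMITTING) {
        toy_lock_acquire(l);
        toy_lock_release(l);
    }
}
```

Calling `file_buffer_buggy()` with `OWNER_THIS_TRANSACTION` records one unpaired unlock, while `file_buffer_fixed()` leaves the counters untouched.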
     

12 Oct, 2016

1 commit

  • The mapping_set_error() helper sets the correct AS_ flag for the mapping
    so there is no reason to open code it. Use the helper directly.

    [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
    Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

22 Sep, 2016

1 commit

  • Thomas has reported a lockdep splat hitting in
    add_transaction_credits(). The problem is that that function calls
    jbd2_might_wait_for_commit() while holding j_state_lock which is wrong
    (we do not really wait for transaction commit while holding that lock).

    Fix the problem by moving jbd2_might_wait_for_commit() into places where
    we are ready to wait for transaction commit and thus j_state_lock is
    unlocked.

    Cc: stable@vger.kernel.org
    Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae
    Reported-by: Thomas Gleixner
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

16 Sep, 2016

1 commit

  • There is some repetitive code in jbd2_journal_init_dev() and
    jbd2_journal_init_inode(). This patch moves the common code into a
    journal_init_common() helper to simplify it, and also fixes the coding
    style warnings reported by checkpatch.pl.

    Signed-off-by: Geliang Tang
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Geliang Tang
     

27 Jul, 2016

2 commits

  • Pull ext4 updates from Ted Ts'o:
    "The major change this cycle is deleting ext4's copy of the file system
    encryption code and switching things over to using the copies in
    fs/crypto. I've updated the MAINTAINERS file to add an entry for
    fs/crypto listing Jaegeuk Kim and myself as the maintainers.

    There are also a number of bug fixes, most notably for some problems
    found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also
    fixed is a writeback deadlock detected by generic/130, and some
    potential races in the metadata checksum code"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
    ext4: verify extent header depth
    ext4: short-cut orphan cleanup on error
    ext4: fix reference counting bug on block allocation error
    MAINTAINRES: fs-crypto maintainers update
    ext4 crypto: migrate into vfs's crypto engine
    ext2: fix filesystem deadlock while reading corrupted xattr block
    ext4: fix project quota accounting without quota limits enabled
    ext4: validate s_reserved_gdt_blocks on mount
    ext4: remove unused page_idx
    ext4: don't call ext4_should_journal_data() on the journal inode
    ext4: Fix WARN_ON_ONCE in ext4_commit_super()
    ext4: fix deadlock during page writeback
    ext4: correct error value of function verifying dx checksum
    ext4: avoid modifying checksum fields directly during checksum verification
    ext4: check for extents that wrap around
    jbd2: make journal y2038 safe
    jbd2: track more dependencies on transaction commit
    jbd2: move lockdep tracking to journal_s
    jbd2: move lockdep instrumentation for jbd2 handles
    ext4: respect the nobarrier mount option in nojournal mode
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     

30 Jun, 2016

4 commits

  • The jbd2 journal stores the commit time in 64-bit seconds and 32-bit
    nanoseconds, which avoids an overflow in 2038, but it gets the numbers
    from current_kernel_time(), which uses 'long' seconds on 32-bit
    architectures.

    This simply changes the code to call current_kernel_time64() so
    we use 64-bit seconds consistently.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org

    Arnd Bergmann
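
As a small illustration of why 'long' seconds are a problem on 32-bit architectures: a signed 32-bit counter tops out at 2147483647 seconds after the epoch, which falls on 19 January 2038. The helper below is a hypothetical user-space check, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a seconds-since-epoch value fits in a signed 32-bit
 * counter (the width of 'long' on typical 32-bit architectures),
 * and 0 otherwise. */
static int fits_in_32bit_seconds(int64_t seconds)
{
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}
```

A 64-bit seconds value one past `INT32_MAX` no longer fits, so carrying the commit time in 64-bit seconds end to end avoids the limit.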
     
  • So far we were tracking only dependency on transaction commit due to
    starting a new handle (which may require commit to start a new
    transaction). Now add tracking also for other cases where we wait for
    transaction commit. This way lockdep can catch deadlocks, e.g. because we
    call jbd2_journal_stop() for a synchronous handle with some locks held
    which rank below transaction start.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Currently lockdep map is tracked in each journal handle. To be able to
    expand lockdep support to cover also other cases where we depend on
    transaction commit and where handle is not available, move lockdep map
    into struct journal_s. Since this makes the lockdep map shared for all
    handles, we have to use rwsem_acquire_read() for acquisitions now.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • The transaction the handle references is free to commit once we've
    decremented t_updates counter. Move the lockdep instrumentation to that
    place. Currently it was a bit later which did not really matter but
    subsequent improvements to lockdep instrumentation would cause false
    positives with it.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

25 Jun, 2016

1 commit

  • jbd2_alloc is explicit about its allocation preferences wrt. the
    allocation size. Sub page allocations go to the slab allocator and
    larger are using either the page allocator or vmalloc. This is all good
    but the logic is unnecessarily complex.

    1) as per Ted, the vmalloc fallback is a left-over:

    : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so
    : the code path that calls vmalloc() should never get called. When we
    : converted jbd2_alloc() to support sub-page size allocations in commit
    : d2eecb039368, there was an assumption that it could be called with a size
    : greater than PAGE_SIZE, but that's certainly not true today.

    Moreover vmalloc allocation might even lead to a deadlock because the
    callers expect GFP_NOFS context while vmalloc is GFP_KERNEL.

    2) __GFP_REPEAT for requests
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

08 Jun, 2016

3 commits


25 May, 2016

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Fix a number of bugs, most notably a potential stale data exposure
    after a crash and a potential BUG_ON crash if a file has the data
    journalling flag enabled while it has dirty delayed allocation blocks
    that haven't been written yet. Also fix a potential crash in the new
    project quota code and a maliciously corrupted file system.

    In addition, fix some DAX-specific bugs, including when there is a
    transient ENOSPC situation and races between writes via direct I/O and
    an mmap'ed segment that could lead to lost I/O.

    Finally the usual set of miscellaneous cleanups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
    ext4: pre-zero allocated blocks for DAX IO
    ext4: refactor direct IO code
    ext4: fix race in transient ENOSPC detection
    ext4: handle transient ENOSPC properly for DAX
    dax: call get_blocks() with create == 1 for write faults to unwritten extents
    ext4: remove unmeetable inconsisteny check from ext4_find_extent()
    jbd2: remove excess descriptions for handle_s
    ext4: remove unnecessary bio get/put
    ext4: silence UBSAN in ext4_mb_init()
    ext4: address UBSAN warning in mb_find_order_for_block()
    ext4: fix oops on corrupted filesystem
    ext4: fix check of dqget() return value in ext4_ioctl_setproject()
    ext4: clean up error handling when orphan list is corrupted
    ext4: fix hang when processing corrupted orphaned inode list
    ext4: remove trailing \n from ext4_warning/ext4_error calls
    ext4: fix races between changing inode journal mode and ext4_writepages
    ext4: handle unwritten or delalloc buffers before enabling data journaling
    ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
    ext4: do not ask jbd2 to write data for delalloc buffers
    jbd2: add support for avoiding data writes during transaction commits
    ...

    Linus Torvalds
     

24 Apr, 2016

1 commit

  • Currently when filesystem needs to make sure data is on permanent
    storage before committing a transaction it adds inode to transaction's
    inode list. During transaction commit, jbd2 writes back all dirty
    buffers that have allocated underlying blocks and waits for the IO to
    finish. However when doing writeback for delayed allocated data, we
    allocate blocks and immediately submit the data. Thus asking jbd2 to
    write dirty pages just unnecessarily adds more work to jbd2 possibly
    writing back other redirtied blocks.

    Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
    outstanding data writes before committing a transaction and thus avoid
    unnecessary writes.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

18 Apr, 2016

2 commits


05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
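
The first two rewrite rules are safe because the shift amount is zero whenever the two constants are equal. A minimal user-space sketch, assuming stand-in macros with a `TOY_` prefix and an arbitrary shift of 12 (neither is taken from the kernel headers):

```c
#include <assert.h>

/* User-space stand-ins; the TOY_ names and the value 12 are
 * assumptions made for illustration only. */
#define TOY_PAGE_SHIFT        12
#define TOY_PAGE_CACHE_SHIFT  TOY_PAGE_SHIFT
#define TOY_PAGE_SIZE         (1UL << TOY_PAGE_SHIFT)
#define TOY_PAGE_CACHE_SIZE   (1UL << TOY_PAGE_CACHE_SHIFT)

/* Old style: scale an index by the difference of the two shifts. */
static unsigned long index_old(unsigned long index)
{
    return index << (TOY_PAGE_CACHE_SHIFT - TOY_PAGE_SHIFT);
}

/* New style after the conversion: the shift is simply dropped. */
static unsigned long index_new(unsigned long index)
{
    return index;
}
```

With equal shifts, `index_old()` and `index_new()` return the same value for every input, which is what lets the semantic patch replace the shifted expression with the bare expression.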
     

14 Mar, 2016

1 commit

  • Journal transaction might fail prematurely because the frozen_buffer
    is allocated by GFP_NOFS request:
    [ 72.440013] do_get_write_access: OOM for frozen_buffer
    [ 72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
    [ 72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
    (...snipped....)
    [ 72.495559] do_get_write_access: OOM for frozen_buffer
    [ 72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
    [ 72.496839] do_get_write_access: OOM for frozen_buffer
    [ 72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
    [ 72.505766] Aborting journal on device sda1-8.
    [ 72.505851] EXT4-fs (sda1): Remounting filesystem read-only

    This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
    allocations upon OOM" because small GFP_NOFS allocations never failed.
    This allocation seems essential for the journal and GFP_NOFS is too
    restrictive to the memory allocator so let's use __GFP_NOFAIL here to
    emulate the previous behavior.

    Signed-off-by: Michal Hocko
    Signed-off-by: Theodore Ts'o

    Michal Hocko
     

10 Mar, 2016

1 commit

  • On the umount path, jbd2_journal_destroy() writes the latest transaction
    ID (->j_tail_sequence) to be used at the next mount.

    The bug is that ->j_tail_sequence does not hold the latest transaction
    ID in some cases. So, at the next mount, there is a chance of conflict
    with remaining (not yet overwritten) transactions.

    mount (id=10)
    write transaction (id=11)
    write transaction (id=12)
    umount (id=10) j_tail_sequence is not updated.
    (And another case is, __jbd2_journal_clean_checkpoint_list() is called
    with empty transaction.)

    So in the above cases, ->j_tail_sequence does not point to the latest
    transaction ID on the umount path. In addition, REQ_FLUSH for the
    checkpoint is not issued either.

    So, to fix this problem with minimal changes, this patch updates
    ->j_tail_sequence and issues REQ_FLUSH. (With more complex changes,
    some optimizations would be possible, for example to avoid an
    unnecessary REQ_FLUSH.)

    BTW,

    journal->j_tail_sequence =
    ++journal->j_transaction_sequence;

    Increment of ->j_transaction_sequence seems to be unnecessary, but
    ext3 does this.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    OGAWA Hirofumi
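
The mount / write / umount sequence in the message can be replayed with a toy model. The struct and function names below are hypothetical; only the final assignment mirrors the one quoted in the message, and treating `transaction_sequence` as "the next ID to hand out" is an assumption of this sketch.

```c
#include <assert.h>

/* Toy model of the two sequence fields named in the message. */
struct toy_journal {
    unsigned int tail_sequence;        /* value written out at umount */
    unsigned int transaction_sequence; /* next transaction ID to hand out */
};

/* Commit one transaction and return the ID it used. */
static unsigned int toy_commit(struct toy_journal *j)
{
    return j->transaction_sequence++;
}

/* Buggy umount: records whatever tail_sequence happens to hold. */
static unsigned int toy_umount_buggy(const struct toy_journal *j)
{
    return j->tail_sequence;
}

/* Fixed umount: refresh tail_sequence from the running counter first. */
static unsigned int toy_umount_fixed(struct toy_journal *j)
{
    j->tail_sequence = ++j->transaction_sequence;
    return j->tail_sequence;
}
```

Starting from a mount at ID 10 and committing transactions 11 and 12, the buggy path still records 10, so the next mount could reuse IDs that are already in the journal. The fixed path records an ID beyond every transaction written so far.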
     

23 Feb, 2016

4 commits


07 Jan, 2016

1 commit


08 Dec, 2015

1 commit

  • Pull ext4 fixes from Ted Ts'o:
    "Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings,
    some endian conversion problems with ext4 encryption, potential memory
    leaks after truncate in data=journal mode, and an ocfs2 regression
    caused by a jbd2 performance improvement"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd2: fix null committed data return in undo_access
    ext4: add "static" to ext4_seq_##name##_fops struct
    ext4: fix an endianness bug in ext4_encrypted_follow_link()
    ext4: fix an endianness bug in ext4_encrypted_zeroout()
    jbd2: Fix unreclaimed pages after truncate in data=journal mode
    ext4: Fix handling of extended tv_sec

    Linus Torvalds
     

05 Dec, 2015

1 commit

  • Commit de92c8caf16c ("jbd2: speedup jbd2_journal_get_[write|undo]_access()")
    introduced jbd2_write_access_granted() to improve write|undo_access
    speed, but failed to check the status of b_committed_data, which caused
    a kernel panic on ocfs2.

    [ 6538.405938] ------------[ cut here ]------------
    [ 6538.406686] kernel BUG at fs/ocfs2/suballoc.c:2400!
    [ 6538.406686] invalid opcode: 0000 [#1] SMP
    [ 6538.406686] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront xen_fbfront parport_pc parport pcspkr i2c_piix4 acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix cirrus ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod
    [ 6538.406686] CPU: 1 PID: 16265 Comm: mmap_truncate Not tainted 4.3.0 #1
    [ 6538.406686] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
    [ 6538.406686] task: ffff88007c2bab00 ti: ffff880075b78000 task.ti: ffff880075b78000
    [ 6538.406686] RIP: 0010:[] [] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
    [ 6538.406686] RSP: 0018:ffff880075b7b7f8 EFLAGS: 00010246
    [ 6538.406686] RAX: ffff8800760c5b40 RBX: ffff88006c06a000 RCX: ffffffffa06e6df0
    [ 6538.406686] RDX: 0000000000000000 RSI: ffff88007a6f6ea0 RDI: ffff88007a760430
    [ 6538.406686] RBP: ffff880075b7b878 R08: 0000000000000002 R09: 0000000000000001
    [ 6538.406686] R10: ffffffffa06769be R11: 0000000000000000 R12: 0000000000000001
    [ 6538.406686] R13: ffffffffa06a1750 R14: 0000000000000001 R15: ffff88007a6f6ea0
    [ 6538.406686] FS: 00007f17fde30720(0000) GS:ffff88007f040000(0000) knlGS:0000000000000000
    [ 6538.406686] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 6538.406686] CR2: 0000000000601730 CR3: 000000007aea0000 CR4: 00000000000406e0
    [ 6538.406686] Stack:
    [ 6538.406686] ffff88007c2bb5b0 ffff880075b7b8e0 ffff88007a7604b0 ffff88006c640800
    [ 6538.406686] ffff88007a7604b0 ffff880075d77390 0000000075b7b878 ffffffffa06a309d
    [ 6538.406686] ffff880075d752d8 ffff880075b7b990 ffff880075b7b898 0000000000000000
    [ 6538.406686] Call Trace:
    [ 6538.406686] [] ? ocfs2_read_group_descriptor+0x6d/0xa0 [ocfs2]
    [ 6538.406686] [] _ocfs2_free_suballoc_bits+0xe4/0x320 [ocfs2]
    [ 6538.406686] [] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
    [ 6538.406686] [] _ocfs2_free_clusters+0xee/0x210 [ocfs2]
    [ 6538.406686] [] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
    [ 6538.406686] [] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
    [ 6538.406686] [] ? ocfs2_extend_trans+0x50/0x1a0 [ocfs2]
    [ 6538.406686] [] ocfs2_free_clusters+0x15/0x20 [ocfs2]
    [ 6538.406686] [] ocfs2_replay_truncate_records+0xfc/0x290 [ocfs2]
    [ 6538.406686] [] ? ocfs2_start_trans+0xec/0x1d0 [ocfs2]
    [ 6538.406686] [] __ocfs2_flush_truncate_log+0x140/0x2d0 [ocfs2]
    [ 6538.406686] [] ? ocfs2_reserve_blocks_for_rec_trunc.clone.0+0x44/0x170 [ocfs2]
    [ 6538.406686] [] ocfs2_remove_btree_range+0x374/0x630 [ocfs2]
    [ 6538.406686] [] ? jbd2_journal_stop+0x25b/0x470 [jbd2]
    [ 6538.406686] [] ocfs2_commit_truncate+0x305/0x670 [ocfs2]
    [ 6538.406686] [] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2]
    [ 6538.406686] [] ocfs2_truncate_file+0x297/0x380 [ocfs2]
    [ 6538.406686] [] ? jbd2_journal_begin_ordered_truncate+0x64/0xc0 [jbd2]
    [ 6538.406686] [] ocfs2_setattr+0x572/0x860 [ocfs2]
    [ 6538.406686] [] ? current_fs_time+0x3f/0x50
    [ 6538.406686] [] notify_change+0x1d7/0x340
    [ 6538.406686] [] ? generic_getxattr+0x79/0x80
    [ 6538.406686] [] do_truncate+0x66/0x90
    [ 6538.406686] [] ? __audit_syscall_entry+0xb0/0x110
    [ 6538.406686] [] do_sys_ftruncate.clone.0+0xf3/0x120
    [ 6538.406686] [] SyS_ftruncate+0xe/0x10
    [ 6538.406686] [] entry_SYSCALL_64_fastpath+0x12/0x71
    [ 6538.406686] Code: 28 48 81 ee b0 04 00 00 48 8b 92 50 fb ff ff 48 8b 80 b0 03 00 00 48 39 90 88 00 00 00 0f 84 30 fe ff ff 0f 0b eb fe 0f 0b eb fe 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
    [ 6538.406686] RIP [] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
    [ 6538.406686] RSP
    [ 6538.691128] ---[ end trace 31cd7011d6770d7e ]---
    [ 6538.694492] Kernel panic - not syncing: Fatal exception
    [ 6538.695484] Kernel Offset: disabled

    Fixes: de92c8caf16c("jbd2: speedup jbd2_journal_get_[write|undo]_access()")
    Cc:
    Signed-off-by: Junxiao Bi
    Signed-off-by: Theodore Ts'o

    Junxiao Bi
     

25 Nov, 2015

1 commit

  • Ted and Namjae have reported that truncated pages don't get timely
    reclaimed after being truncated in data=journal mode. The following test
    triggers the issue easily:

    for (i = 0; i < 1000; i++) {
            pwrite(fd, buf, 1024*1024, 0);
            fsync(fd);
            fsync(fd);
            ftruncate(fd, 0);
    }

    The reason is that journal_unmap_buffer() finds that truncated buffers
    are not journalled (jh->b_transaction == NULL), they are part of
    checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have
    been already written out (!buffer_dirty(bh)). We clean such buffers but
    we leave them in the checkpoint list. Since checkpoint transaction holds
    a reference to the journal head, these buffers cannot be released until
    the checkpoint transaction is cleaned up. And at that point we don't
    call release_buffer_page() anymore so pages detached from mapping are
    lingering in the system waiting for reclaim to find them and free them.

    Fix the problem by removing buffers from transaction checkpoint lists
    when journal_unmap_buffer() finds out they don't have to be there
    anymore.

    Reported-and-tested-by: Namjae Jeon
    Fixes: de1b794130b130e77ffa975bb58cb843744f9ae5
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Jan Kara
     

08 Nov, 2015

1 commit

  • Merge second patch-bomb from Andrew Morton:

    - most of the rest of MM

    - procfs

    - lib/ updates

    - printk updates

    - bitops infrastructure tweaks

    - checkpatch updates

    - nilfs2 update

    - signals

    - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
    dma-debug, dma-mapping, ...

    * emailed patches from Andrew Morton : (102 commits)
    ipc,msg: drop dst nil validation in copy_msg
    include/linux/zutil.h: fix usage example of zlib_adler32()
    panic: release stale console lock to always get the logbuf printed out
    dma-debug: check nents in dma_sync_sg*
    dma-mapping: tidy up dma_parms default handling
    pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
    kexec: use file name as the output message prefix
    fs, seqfile: always allow oom killer
    seq_file: reuse string_escape_str()
    fs/seq_file: use seq_* helpers in seq_hex_dump()
    coredump: change zap_threads() and zap_process() to use for_each_thread()
    coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
    signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
    signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
    signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
    signals: kill block_all_signals() and unblock_all_signals()
    nilfs2: fix gcc uninitialized-variable warnings in powerpc build
    nilfs2: fix gcc unused-but-set-variable warnings
    MAINTAINERS: nilfs2: add header file for tracing
    nilfs2: add tracepoints for analyzing reading and writing metadata files
    ...

    Linus Torvalds
     

07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
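
The false positive mentioned for the historical __GFP_WAIT check can be shown with plain bit masks. This is a sketch under stated assumptions: the `TOY_` names and bit values are placeholders rather than the kernel's definitions, and `toy_allow_blocking()` only imitates what a helper such as gfpflags_allow_blocking() is described as doing.

```c
#include <assert.h>

/* Bit values are arbitrary placeholders, not the kernel's. */
#define TOY_GFP_HIGH            0x01u
#define TOY_GFP_ATOMIC          0x02u
#define TOY_GFP_DIRECT_RECLAIM  0x04u
#define TOY_GFP_KSWAPD_RECLAIM  0x08u

/* Redefined "wait": willing to enter direct reclaim and wake kswapd. */
#define TOY_GFP_WAIT (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)

/* A truly atomic caller: cannot sleep, but still wakes kswapd. */
#define TOY_ATOMIC_CALLER \
    (TOY_GFP_HIGH | TOY_GFP_ATOMIC | TOY_GFP_KSWAPD_RECLAIM)

/* A mempool-backed caller: clears direct reclaim, keeps kswapd wakeups. */
#define TOY_MEMPOOL_CALLER (TOY_GFP_WAIT & ~TOY_GFP_DIRECT_RECLAIM)

/* New-style test: only direct reclaim means the caller may block. */
static int toy_allow_blocking(unsigned int flags)
{
    return (flags & TOY_GFP_DIRECT_RECLAIM) != 0;
}

/* Historical test: any bit of the old flag was read as "may block". */
static int toy_old_wait_check(unsigned int flags)
{
    return (flags & TOY_GFP_WAIT) != 0;
}
```

For `TOY_MEMPOOL_CALLER` the historical test reports "may block" even though direct reclaim was cleared, whereas the new-style test correctly reports that the caller cannot block.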
     

19 Oct, 2015

1 commit

  • If an EXT4 filesystem uses JBD2 journaling and an error occurs, the
    journal is aborted first, the error number is recorded in the JBD2
    superblock and, finally, the system panics if the "errors=panic"
    option is set. In rare cases, however, this sequence is reordered as
    shown in the figure below, and the system panics (which means a
    system reset in a mobile environment) before the error has been
    recorded in the journal superblock. In this case, e2fsck cannot
    recognize that a filesystem failure occurred in the previous run, and
    the corruption is not fixed.

    Task A                              Task B
    ext4_handle_error()
    -> jbd2_journal_abort()
      -> __journal_abort_soft()
        -> __jbd2_journal_abort_hard()
        | -> journal->j_flags |= JBD2_ABORT;
        |
        |                               __ext4_abort()
        |                               -> jbd2_journal_abort()
        |                               | -> __journal_abort_soft()
        |                               |   -> if (journal->j_flags & JBD2_ABORT)
        |                               |           return;
        |                               -> panic()
        |
        -> jbd2_journal_update_sb_errno()

    Tested-by: Hobin Woo
    Signed-off-by: Daeho Jeong
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Daeho Jeong
     

18 Oct, 2015

3 commits

  • Contrary to the comments and the expectations of callers,
    journal_clean_one_cp_list() returned 1 not only if it freed the
    transaction but also if it freed some buffers in the transaction. That
    could make
    __jbd2_journal_clean_checkpoint_list() skip processing
    t_checkpoint_io_list and continue with processing the next transaction.
    This is mostly a cosmetic issue since the only result is we can
    sometimes free less memory than we could. But it's still worth fixing.
    Fix journal_clean_one_cp_list() to return 1 only if the transaction was
    really freed.

    Fixes: 50849db32a9f529235a84bcc84a6b8e631b1d0ec
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Jan Kara
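
The return-value contract can be illustrated with a toy checkpoint list. Everything here is hypothetical (the array of busy flags, the function names, and the rule that "all buffers freed" stands in for "transaction freed"); it only contrasts the two return conventions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy checkpoint list: busy[i] != 0 means buffer i cannot be freed yet.
 * The number of buffers that could be freed is stored in *freed. */

/* Old behaviour as described: returns 1 if anything at all was freed. */
static int toy_clean_list_old(const int *busy, size_t n, size_t *freed)
{
    *freed = 0;
    for (size_t i = 0; i < n; i++)
        if (!busy[i])
            (*freed)++;
    return *freed > 0;
}

/* Fixed behaviour: returns 1 only if every buffer went away, which in
 * this toy model stands in for the transaction itself being freed. */
static int toy_clean_list_fixed(const int *busy, size_t n, size_t *freed)
{
    *freed = 0;
    for (size_t i = 0; i < n; i++)
        if (!busy[i])
            (*freed)++;
    return n > 0 && *freed == n;
}
```

With one busy buffer among three, the old convention reports 1 and a caller would move on to the next transaction, while the fixed convention reports 0 so the caller keeps processing the remaining list of the same transaction.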
     
  • Create separate predicate functions to test/set/clear feature flags,
    thereby replacing the wordy old macros. Furthermore, clean out the
    places where we open-coded feature tests.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o

    Darrick J. Wong
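
One way to get such predicate functions is to generate them from a single macro per feature class. The sketch below assumes a hypothetical `struct toy_sb` and feature bit, and it ignores on-disk endianness, so it shows the shape of the approach rather than the actual jbd2 helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical superblock and feature bit, for illustration only. */
struct toy_sb {
    uint32_t feature_compat;
};
#define TOY_FEATURE_COMPAT_CHECKSUM 0x0001u

/* Generate has/set/clear predicates for one compat feature flag. */
#define TOY_FEATURE_COMPAT_FUNCS(name, flag)                          \
static inline int toy_has_feature_##name(const struct toy_sb *sb)     \
{                                                                     \
    return (sb->feature_compat & (flag)) != 0;                        \
}                                                                     \
static inline void toy_set_feature_##name(struct toy_sb *sb)          \
{                                                                     \
    sb->feature_compat |= (flag);                                     \
}                                                                     \
static inline void toy_clear_feature_##name(struct toy_sb *sb)        \
{                                                                     \
    sb->feature_compat &= ~(uint32_t)(flag);                          \
}

TOY_FEATURE_COMPAT_FUNCS(checksum, TOY_FEATURE_COMPAT_CHECKSUM)
```

A call site then reads `toy_has_feature_checksum(sb)` instead of spelling out the field and the mask each time, which is the readability gain the message refers to.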
     
  • Instead of overloading EIO for CRC errors and corrupt structures,
    return the same error codes that XFS returns for the same issues.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o

    Darrick J. Wong
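
A minimal sketch of distinguishing these failure classes by error code. It assumes the XFS convention of mapping checksum failures to EBADMSG and corrupt structures to EUCLEAN; the `TOY_` names, the enum and the helper are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* EUCLEAN is Linux-specific; this fallback only lets the sketch build
 * elsewhere and is an assumption, not a portable value. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/* Assumed to mirror the XFS convention. */
#define TOY_EFSBADCRC    EBADMSG   /* bad checksum */
#define TOY_EFSCORRUPTED EUCLEAN   /* corrupt structure */

enum toy_block_state {
    BLOCK_OK,
    BLOCK_BAD_CRC,
    BLOCK_CORRUPT,
    BLOCK_IO_FAILED
};

/* Map a verification result to a negative errno-style return value. */
static int toy_block_error(enum toy_block_state state)
{
    switch (state) {
    case BLOCK_BAD_CRC:
        return -TOY_EFSBADCRC;
    case BLOCK_CORRUPT:
        return -TOY_EFSCORRUPTED;
    case BLOCK_IO_FAILED:
        return -EIO;
    default:
        return 0;
    }
}
```

Callers can then tell a genuine I/O failure (-EIO) apart from a checksum mismatch or a structurally corrupt block.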
     

15 Oct, 2015

1 commit

  • Change the journal's checksum functions to gate on whether or not the
    crc32c driver is loaded, and gate the loading on the superblock bits.
    This prevents a journal crash if someone loads a journal in no-csum
    mode and then randomizes the superblock, thus flipping on the feature
    bits.

    Tested-By: Nikolay Borisov
    Reported-by: Nikolay Borisov
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o

    Darrick J. Wong
     

04 Aug, 2015

1 commit

  • Currently there is no limitation on number of reserved credits we can
    ask for. If we ask for more reserved credits than 1/2 of maximum
    transaction size, or if total number of credits exceeds the maximum
    transaction size per operation (which is currently only possible with
    the former) we will spin forever in start_this_handle().

    Fix this by adding this limitation at the start of start_this_handle().

    This patch also removes the credit limitation 1/2 of maximum transaction
    size, since we really only want to limit the number of reserved credits.
    There is not much point to limit the credits if there is still space in
    the journal.

    This accidentally also fixes the online resize, where due to the
    limitation of the journal credits we're unable to grow file systems with
    1k block size and size between 16M and 32M. It has been partially fixed
    by 2c869b262a10ca99cb866d04087d75311587a30c, but not entirely.

    Thanks to Jan Kara for helping me get the correct fix.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Lukas Czerner
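
The up-front limit can be sketched as a single check performed before any waiting starts. The function name, parameter names and the -ENOSPC return value are assumptions for illustration; only the two conditions come from the description above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical up-front check: reject requests that could never be
 * satisfied, instead of retrying them forever.
 *   blocks                   regular credits requested
 *   rsv_blocks               reserved credits requested
 *   max_transaction_buffers  maximum transaction size */
static int toy_check_credits(int blocks, int rsv_blocks,
                             int max_transaction_buffers)
{
    if (rsv_blocks > max_transaction_buffers / 2 ||
        rsv_blocks + blocks > max_transaction_buffers)
        return -ENOSPC;
    return 0;
}
```

With a maximum of 1000, a request for 600 reserved credits is rejected outright, as is a request whose total exceeds 1000, while 400 regular plus 500 reserved credits passes the check.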
     

29 Jul, 2015

1 commit

  • Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal
    superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
    when the journal is aborted. That makes logic in
    jbd2_log_do_checkpoint() bail out which is fine, except that
    jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
    progress in cleaning the journal. Without it jbd2_journal_destroy()
    just loops forever.

    Fix jbd2_journal_destroy() to clean up journal checkpoint lists if
    jbd2_log_do_checkpoint() fails with an error.

    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

23 Jul, 2015

1 commit

  • When an error condition is detected, an error status should be recorded in
    the superblocks of EXT4 or JBD2. However, the write request is currently
    submitted without the REQ_FUA flag, even in "barrier=1" mode, and is
    followed by a call to panic() in "errors=panic" mode. On mobile devices,
    which reset the whole system as soon as a kernel panic occurs, this write
    request containing the error flag can be lost from the storage cache
    without ever being written to the physical cells. As a result, the error
    flag is not visible in either superblock on the next boot, or ever, and
    e2fsck cannot fix the filesystem problems automatically unless it is run
    in forced checking mode.

    [ Changed use test_opt(sb, BARRIER) of checking the journal flags -- TYT ]

    Signed-off-by: Daeho Jeong
    Signed-off-by: Theodore Ts'o

    Daeho Jeong
     

13 Jul, 2015

1 commit

  • It is often the case that we mark buffer as having dirty metadata when
    the buffer is already in that state (frequent for bitmaps, inode table
    blocks, superblock). Thus it is unnecessary to contend on grabbing
    journal head reference and bh_state lock. Avoid that by checking whether
    any modification to the buffer is needed before grabbing any locks or
    references.

    [ Note: this is a fixed version of commit 2143c1965a761, which was
    reverted in ebeaa8ddb3663b5 due to a false positive triggering of an
    assertion check. -- Ted ]

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara