06 Jan, 2017

2 commits

  • commit 30a9d7afe70ed6bd9191d3000e2ef1a34fb58493 upstream.

    The number of 'counters' elements needed in 'struct sg' is
    super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
    elements in the array. This is insufficient for block sizes >= 32k. In
    such cases the memcpy operation performed in ext4_mb_seq_groups_show()
    would cause stack memory corruption.
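
    As a rough standalone illustration of the arithmetic (a toy sketch, not the
    kernel code; the array and sizes below are made up for the demo):

        /* toy demo: a 16-element array vs. the s_blocksize_bits + 2 elements
         * that the copy needs once the block size reaches 32k */
        #include <stdio.h>

        int main(void)
        {
            unsigned int counters[16];                   /* old fixed-size array */
            unsigned int s_blocksize_bits = 15;          /* 32k block size       */
            unsigned int needed = s_blocksize_bits + 2;  /* elements copied      */

            printf("array holds %zu elements, copy needs %u\n",
                   sizeof(counters) / sizeof(counters[0]), needed);
            /* 17 > 16: copying 'needed' elements would run past the on-stack
             * array, i.e. the stack memory corruption described above. */
            return 0;
        }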

    Fixes: c9de560ded61f
    Signed-off-by: Chandan Rajendra
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Chandan Rajendra
     
  • commit 69e43e8cc971a79dd1ee5d4343d8e63f82725123 upstream.

    'border' variable is set to a value of 2 times the block size of the
    underlying filesystem. With 64k block size, the resulting value won't
    fit into a 16-bit variable. Hence this commit changes the data type of
    'border' to 'unsigned int'.
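
    A minimal sketch of the overflow (illustrative variable names, not the ext4
    code):

        /* toy demo: with a 64k block size, 2 * block size needs more than 16 bits */
        #include <stdio.h>

        int main(void)
        {
            unsigned int   blocksize = 65536;          /* 64k block size           */
            unsigned short border16  = 2 * blocksize;  /* truncates: 131072 -> 0   */
            unsigned int   border32  = 2 * blocksize;  /* holds 131072 as intended */

            printf("16-bit border: %u, unsigned int border: %u\n", border16, border32);
            return 0;
        }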

    Fixes: c9de560ded61f
    Signed-off-by: Chandan Rajendra
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Signed-off-by: Greg Kroah-Hartman

    Chandan Rajendra
     

29 Jul, 2016

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    Probably the most interesting part long-term is ->d_init() - that will
    have a bunch of followups in (at least) ceph and lustre, but we'll
    need to sort the barrier-related rules before it can get used for
    really non-trivial stuff.

    Another fun thing is the merge of ->d_iput() callers (dentry_iput()
    and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
    except the one in __d_lookup_lru())"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    fs/dcache.c: avoid soft-lockup in dput()
    vfs: new d_init method
    vfs: Update lookup_dcache() comment
    bdev: get rid of ->bd_inodes
    Remove last traces of ->sync_page
    new helper: d_same_name()
    dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
    vfs: clean up documentation
    vfs: document ->d_real()
    vfs: merge .d_select_inode() into .d_real()
    unify dentry_iput() and dentry_unlink_inode()
    binfmt_misc: ->s_root is not going anywhere
    drop redundant ->owner initializations
    ufs: get rid of redundant checks
    orangefs: constify inode_operations
    missed comment updates from ->direct_IO() prototype change
    file_inode(f)->i_mapping is f->f_mapping
    trim fsnotify hooks a bit
    9p: new helper - v9fs_parent_fid()
    debugfs: ->d_parent is never NULL or negative
    ...

    Linus Torvalds
     

15 Jul, 2016

1 commit

  • If we hit this error when mounted with errors=continue or
    errors=remount-ro:

    EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata

    then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
    continue. However, ext4_mb_release_context() is the wrong thing to call
    here since we are still actually using the allocation context.

    Instead, just error out. We could retry the allocation, but there is a
    possibility of getting stuck in an infinite loop instead, so this seems
    safer.

    [ Fixed up so we don't return EAGAIN to userspace. --tytso ]
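
    A hedged toy sketch of the control flow being described (the struct, helper
    names, and error value below are invented for illustration; per the note
    above, the real fix avoids returning EAGAIN to userspace):

        #include <errno.h>
        #include <stdio.h>

        struct toy_ctx { unsigned long start, len; };

        static int overlaps_fs_metadata(const struct toy_ctx *c)
        {
            return c->start < 64;   /* pretend the first 64 blocks hold metadata */
        }

        static int toy_alloc_blocks(const struct toy_ctx *c)
        {
            if (overlaps_fs_metadata(c))
                return -EIO;        /* error out: don't release a context that is
                                     * still in use, and don't retry in a loop
                                     * that may never terminate */
            return 0;               /* normal allocation path */
        }

        int main(void)
        {
            struct toy_ctx c = { .start = 10, .len = 8 };
            int err = toy_alloc_blocks(&c);

            if (err)
                fprintf(stderr, "allocation failed: %d\n", err);
            return err ? 1 : 0;
        }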

    Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation")
    Signed-off-by: Vegard Nossum
    Signed-off-by: Theodore Ts'o
    Cc: Aneesh Kumar K.V
    Cc: stable@vger.kernel.org

    Vegard Nossum
     

27 Jun, 2016

1 commit


30 May, 2016

1 commit


06 May, 2016

2 commits

  • Currently, in ext4_mb_init(), there's a loop like the following:

    do {
        ...
        offset += 1 << (sb->s_blocksize_bits - i);
        i++;
    } while (i <= sb->s_blocksize_bits + 1);

    Note that the updated offset is used in the loop's next iteration only.

    However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
    the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
    and UBSAN reports

    UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15
    shift exponent 4294967295 is too large for 32-bit type 'int'
    [...]
    Call Trace:
    [] dump_stack+0xbc/0x117
    [] ? _atomic_dec_and_lock+0x169/0x169
    [] ubsan_epilogue+0xd/0x4e
    [] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
    [] ? __ubsan_handle_load_invalid_value+0x158/0x158
    [] ? kmem_cache_alloc+0x101/0x390
    [] ? ext4_mb_init+0x13b/0xfd0
    [] ? create_cache+0x57/0x1f0
    [] ? create_cache+0x11a/0x1f0
    [] ? mutex_lock+0x38/0x60
    [] ? mutex_unlock+0x1b/0x50
    [] ? put_online_mems+0x5b/0xc0
    [] ? kmem_cache_create+0x117/0x2c0
    [] ext4_mb_init+0xc49/0xfd0
    [...]

    Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.

    Unless compilers start to do some fancy transformations (which at least
    GCC 6.0.0 doesn't currently do), the issue is of a cosmetic nature only:
    the value of offset calculated in that final iteration is never used again.

    Silence UBSAN by introducing another variable, offset_incr, holding the
    next increment to apply to offset and adjust that one by right shifting it
    by one position per loop iteration.
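
    The idea can be sketched in isolation like this (illustrative names, not the
    actual ext4 code): keep the increment in its own variable and halve it each
    iteration, so no shift count ever goes out of range.

        #include <stdio.h>

        int main(void)
        {
            unsigned int blocksize_bits = 10;    /* 1k blocks, for the demo */
            unsigned int offset = 0;
            unsigned int offset_incr = 1U << (blocksize_bits - 1);
            unsigned int i = 1;

            do {
                printf("i=%u offset=%u\n", i, offset);
                offset += offset_incr;
                offset_incr >>= 1;               /* halve instead of computing
                                                  * 1 << (blocksize_bits - i) */
                i++;
            } while (i <= blocksize_bits + 1);
            return 0;
        }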

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161

    Cc: stable@vger.kernel.org
    Signed-off-by: Nicolai Stange
    Signed-off-by: Theodore Ts'o

    Nicolai Stange
     
  • Currently, in mb_find_order_for_block(), there's a loop like the following:

    while (order <= e4b->bd_blkbits + 1) {
        ...
        bb += 1 << (e4b->bd_blkbits - order);
    }

    Note that the updated bb is used in the loop's next iteration only.

    However, at the last iteration, that is at order == e4b->bd_blkbits + 1,
    the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports

    UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11
    shift exponent -1 is negative
    [...]
    Call Trace:
    [] dump_stack+0xbc/0x117
    [] ? _atomic_dec_and_lock+0x169/0x169
    [] ubsan_epilogue+0xd/0x4e
    [] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
    [] ? __ubsan_handle_load_invalid_value+0x158/0x158
    [] ? ext4_mb_generate_from_pa+0x590/0x590
    [] ? ext4_read_block_bitmap_nowait+0x598/0xe80
    [] mb_find_order_for_block+0x1ce/0x240
    [...]

    Unless compilers start to do some fancy transformations (which at least
    GCC 6.0.0 doesn't currently do), the issue is of a cosmetic nature only:
    the value of bb calculated in that final iteration is never used again.

    Silence UBSAN by introducing another variable, bb_incr, holding the next
    increment to apply to bb and adjust that one by right shifting it by one
    position per loop iteration.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161

    Cc: stable@vger.kernel.org
    Signed-off-by: Nicolai Stange
    Signed-off-by: Theodore Ts'o

    Nicolai Stange
     

27 Apr, 2016

1 commit


05 Apr, 2016

2 commits

  • Mostly direct substitution with occasional adjustment or removing
    outdated comments.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
    PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straightforward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files. I've called spatch for them manually.

    The only adjustment after coccinelle is reverting the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

14 Mar, 2016

1 commit

    This might be unexpected, but pages allocated for sbi->s_buddy_cache are
    charged to the current memory cgroup. So, a GFP_NOFS allocation could
    fail if the current task has been killed by the OOM killer or if the
    current memory cgroup has no free memory left. The block allocator
    cannot handle such failures here yet.

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Theodore Ts'o

    Konstantin Khlebnikov
     

10 Mar, 2016

1 commit


22 Feb, 2016

1 commit

    Currently, ext4_free_blocks() doesn't revoke data blocks of a per-file
    data-journalled inode, and this can cause file data inconsistency
    problems. Even though data blocks of a per-file data-journalled inode
    are already forgotten by jbd2_journal_invalidatepage() in advance of
    invoking ext4_free_blocks(), we still need to revoke the data blocks
    here. Moreover, some of the metadata blocks, which are not found by
    sb_find_get_block(), still need to be revoked, but this is also
    missing here.

    Signed-off-by: Daeho Jeong
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Daeho Jeong
     

12 Feb, 2016

1 commit


10 Nov, 2015

1 commit

  • Switch everything to the new and more capable implementation of abs().
    Mainly to give the new abs() a bit of a workout.

    Cc: Michal Nazarewicz
    Cc: John Stultz
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

19 Oct, 2015

1 commit


18 Oct, 2015

2 commits

    When you repeatedly execute xfstest generic/269 with the bigalloc_1k
    option enabled using the command below:

    "./kvm-xfstests -c bigalloc_1k -m nodelalloc -C 1000 generic/269"

    you can easily see the below bug message.

    "JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh);"

    This means that an already revoked buffer is erroneously revoked again,
    caused by revoking the buffer at the wrong position in
    ext4_free_blocks(). We need to move the revoke of an unspecified buffer
    to after the cluster boundary check for the bigalloc option. Otherwise,
    part of the cluster can be revoked twice.

    Signed-off-by: Daeho Jeong

    Daeho Jeong
     
    Make the bitmap reading routines return real error codes (EIO,
    EFSCORRUPTED, EFSBADCRC) which can then be reflected back to
    userspace for more precise diagnosis work.

    In particular, this means that mballoc no longer claims that we're out
    of memory if the block bitmaps become corrupt.
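
    A hedged sketch of the calling convention this enables, assuming the bitmap
    reader now returns an ERR_PTR-encoded buffer head (the wrapper function is
    invented for illustration and is not a literal ext4 hunk):

        #include <linux/err.h>
        #include <linux/buffer_head.h>
        #include "ext4.h"

        static int toy_check_bitmap(struct super_block *sb, ext4_group_t group)
        {
                struct buffer_head *bh;

                bh = ext4_read_block_bitmap(sb, group);
                if (IS_ERR(bh))
                        return PTR_ERR(bh);     /* -EIO, -EFSCORRUPTED, -EFSBADCRC, ...
                                                 * reported as-is, not as -ENOMEM */
                brelse(bh);
                return 0;
        }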

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o

    Darrick J. Wong
     

24 Sep, 2015

1 commit


06 Jul, 2015

2 commits

  • Pull ext4 bugfixes from Ted Ts'o:
    "Bug fixes (all for stable kernels) for ext4:

    - address corner cases for indirect blocks->extent migration

    - fix reserved block accounting invalidate_page when
    page_size != block_size (i.e., ppc or 1k block size file systems)

    - fix deadlocks when a memcg is under heavy memory pressure

    - fix fencepost error in lazytime optimization"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: replace open coded nofail allocation in ext4_free_blocks()
    ext4: correctly migrate a file with a hole at the beginning
    ext4: be more strict when migrating to non-extent based file
    ext4: fix reservation release on invalidatepage for delalloc fs
    ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp
    bufferhead: Add _gfp version for sb_getblk()
    ext4: fix fencepost error in lazytime optimization

    Linus Torvalds
     
    ext4_free_blocks() is looping around the allocation request and mimics
    __GFP_NOFAIL behavior without any allocation fallback strategy. Let's
    remove the open-coded loop and replace it with __GFP_NOFAIL. Without the
    flag, the allocator has no way to find out about the never-fail
    requirement and cannot help in any way.
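
    A hedged kernel-context sketch of the before/after shape (the cache and
    helper below are invented for illustration):

        #include <linux/slab.h>

        static struct kmem_cache *toy_cachep;   /* illustrative cache, created elsewhere */

        static void *toy_alloc_entry(void)
        {
                /*
                 * Before: an open-coded never-fail loop the allocator cannot
                 * see through:
                 *
                 *     do {
                 *             p = kmem_cache_alloc(toy_cachep, GFP_NOFS);
                 *     } while (!p);
                 *
                 * After: state the never-fail requirement explicitly so the
                 * allocator can retry and reclaim on our behalf.
                 */
                return kmem_cache_alloc(toy_cachep, GFP_NOFS | __GFP_NOFAIL);
        }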

    Signed-off-by: Michal Hocko
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Michal Hocko
     

26 Jun, 2015

1 commit

  • Pull cgroup writeback support from Jens Axboe:
    "This is the big pull request for adding cgroup writeback support.

    This code has been in development for a long time, and it has been
    simmering in for-next for a good chunk of this cycle too. This is one
    of those problems that has been talked about for at least half a
    decade, finally there's a solution and code to go with it.

    Also see last week's writeup on LWN:

    http://lwn.net/Articles/648292/"

    * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
    writeback, blkio: add documentation for cgroup writeback support
    vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
    writeback: do foreign inode detection iff cgroup writeback is enabled
    v9fs: fix error handling in v9fs_session_init()
    bdi: fix wrong error return value in cgwb_create()
    buffer: remove unusued 'ret' variable
    writeback: disassociate inodes from dying bdi_writebacks
    writeback: implement foreign cgroup inode bdi_writeback switching
    writeback: add lockdep annotation to inode_to_wb()
    writeback: use unlocked_inode_to_wb transaction in inode_congested()
    writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
    writeback: implement [locked_]inode_to_wb_and_lock_list()
    writeback: implement foreign cgroup inode detection
    writeback: make writeback_control track the inode being written back
    writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
    mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
    writeback: implement memcg writeback domain based throttling
    writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
    writeback: implement memcg wb_domain
    writeback: update wb_over_bg_thresh() to use wb_domain aware operations
    ...

    Linus Torvalds
     

15 Jun, 2015

1 commit


08 Jun, 2015

2 commits

    Currently, ext4_mb_good_group() only returns 0 or 1 depending on whether
    the allocation group is suitable for use or not. However, we might get
    various errors and fail while initializing a new group, including -EIO,
    which would never get propagated up the call chain. This might lead to
    an endless loop at writeback when we're trying to find a good group to
    allocate from and we fail to initialize a new group (a read error, for
    example).

    Fix this by returning a proper error code from ext4_mb_good_group() and
    using it in ext4_mb_regular_allocator(). In ext4_mb_regular_allocator()
    we will always return only the first error that occurred in
    ext4_mb_good_group(), and we only propagate it back to the caller if we
    do not get any other errors and we fail to allocate any blocks.

    Note that with modes other than errors=continue, we will fail
    immediately in ext4_mb_good_group() in case of error; however with
    errors=continue we should try to continue using the file system, which
    is why we do not fail immediately when we see an error from
    ext4_mb_good_group(), but rather when we fail to find a suitable block
    group to allocate from due to a problem in group initialization.
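
    The "remember the first error, report it only if nothing can be allocated"
    logic can be sketched in isolation like this (toy code, not the mballoc
    loop):

        #include <errno.h>
        #include <stdio.h>

        /* toy stand-in: group 1 fails with -EIO, group 3 is usable */
        static int check_group(int g)
        {
            if (g == 1)
                return -EIO;
            return (g == 3) ? 1 : 0;
        }

        int main(void)
        {
            int g, ret, first_err = 0, found = -1;

            for (g = 0; g < 5; g++) {
                ret = check_group(g);
                if (ret < 0) {
                    if (!first_err)
                        first_err = ret;    /* remember only the first error */
                    continue;               /* keep scanning other groups    */
                }
                if (ret > 0) {
                    found = g;
                    break;
                }
            }

            if (found >= 0)
                printf("allocating from group %d\n", found);
            else
                printf("allocation failed: %d\n", first_err ? first_err : -ENOSPC);
            return 0;
        }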

    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Darrick J. Wong

    Lukas Czerner
     
    Currently, on machines with page size > block size, when initializing
    the block group buddy cache we initialize it for all the block group
    bitmaps in the page. However, in the case of a read error, a checksum
    error, or a single bitmap that is in any way corrupted, we would fail to
    initialize all of the bitmaps. This is problematic because we will not
    have access to the other allocation groups even though those might be
    perfectly fine and usable.

    Fix this by reading all the bitmaps instead of erroring out on the first
    problem, and simply skip the bitmaps which were either not read properly
    or are not valid.

    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o

    Lukas Czerner
     

02 Jun, 2015

1 commit

    With the planned cgroup writeback support, backing-dev related
    declarations will be more widely used across block and cgroup;
    unfortunately, including backing-dev.h from include/linux/blkdev.h
    makes a cyclic include dependency quite likely.

    This patch separates out backing-dev-defs.h, which contains only the
    essential definitions, and updates blkdev.h to include it. C files
    which need access to more backing-dev details now include
    backing-dev.h directly. This takes backing-dev.h off the common
    include dependency chain, making it a lot easier to use it across
    block and cgroup.

    v2: fs/fat build failure fixed.

    Signed-off-by: Tejun Heo
    Reviewed-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Tejun Heo
     

26 Nov, 2014

2 commits

  • The iput() function tests whether its argument is NULL and then
    returns immediately. Thus the test around the call is not needed.

    This issue was detected by using the Coccinelle software.
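
    The simplification looks like this (shown as a before/after fragment):

        /* before: redundant NULL test around the call */
        if (inode)
                iput(inode);

        /* after: iput() itself returns immediately for a NULL inode */
        iput(inode);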

    Signed-off-by: Markus Elfring
    Signed-off-by: Theodore Ts'o

    Markus Elfring
     
    We must use GFP_NOFS instead of GFP_KERNEL inside ext4_mb_add_groupinfo()
    and ext4_calculate_overhead() because they are called from inside a
    journal transaction. Call trace:

    ioctl
    ->ext4_group_add
    ->journal_start
    ->ext4_setup_new_descs
    ->ext4_mb_add_groupinfo -> GFP_KERNEL
    ->ext4_flex_group_add
    ->ext4_update_super
    ->ext4_calculate_overhead -> GFP_KERNEL
    ->journal_stop
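
    The rule being applied, as a hedged sketch (the helper is invented; only the
    flag choice is the point):

        #include <linux/slab.h>

        static void *toy_alloc_under_transaction(size_t size)
        {
                /*
                 * While a journal handle is held, memory reclaim must not
                 * re-enter the filesystem (that could require committing the
                 * very transaction we are inside), so GFP_KERNEL is replaced
                 * with GFP_NOFS.
                 */
                return kmalloc(size, GFP_NOFS);
        }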

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o

    Dmitry Monakhov
     

21 Nov, 2014

1 commit


21 Oct, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "A large number of cleanups and bug fixes, with some (minor) journal
    optimizations"

    [ This got sent to me before -rc1, but was stuck in my spam folder. - Linus ]

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits)
    ext4: check s_chksum_driver when looking for bg csum presence
    ext4: move error report out of atomic context in ext4_init_block_bitmap()
    ext4: Replace open coded mdata csum feature to helper function
    ext4: delete useless comments about ext4_move_extents
    ext4: fix reservation overflow in ext4_da_write_begin
    ext4: add ext4_iget_normal() which is to be used for dir tree lookups
    ext4: don't orphan or truncate the boot loader inode
    ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
    ext4: optimize block allocation on grow indepth
    ext4: get rid of code duplication
    ext4: fix over-defensive complaint after journal abort
    ext4: fix return value of ext4_do_update_inode
    ext4: fix mmap data corruption when blocksize < pagesize
    vfs: fix data corruption when blocksize < pagesize for mmaped data
    ext4: fold ext4_nojournal_sops into ext4_sops
    ext4: support freezing ext2 (nojournal) file systems
    ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs()
    ext4: don't check quota format when there are no quota files
    jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list
    jbd2: avoid pointless scanning of checkpoint lists
    ...

    Linus Torvalds
     

15 Oct, 2014

1 commit

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the now duplicate two sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
     

02 Oct, 2014

1 commit


05 Sep, 2014

2 commits

  • Having done a full regression test, we can now drop the
    DELALLOC_RESERVED state flag.

    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Theodore Ts'o
     
  • The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented
    because it was too hard to make sure the mballoc and get_block flags
    could be reliably passed down through all of the codepaths that end up
    calling ext4_mb_new_blocks().

    Since then, we have mb_flags passed down through most of the code
    paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky
    as it used to be.

    This commit plumbs in the last of what is required, and then adds a
    WARN_ON check to make sure we haven't missed anything. If this passes
    a full regression test run, we can then drop
    EXT4_STATE_DELALLOC_RESERVED.

    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Theodore Ts'o
     

27 Aug, 2014

1 commit


24 Aug, 2014

1 commit

  • If we suffer a block allocation failure (for example due to a memory
    allocation failure), it's possible that we will call
    ext4_discard_allocated_blocks() before we've actually allocated any
    blocks. In that case, fe_len and fe_start in ac->ac_f_ex will still
    be zero, and this will result in mb_free_blocks(inode, e4b, 0, 0)
    triggering the BUG_ON in mb_free_blocks():

    BUG_ON(last >= (sb->s_blocksize << 3));

    Fix this by bailing out of ext4_discard_allocated_blocks() if fe_len
    is zero.
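
    The bail-out described above, sketched as a fragment (field names taken
    from this message; not necessarily the literal patch):

        /* nothing was actually allocated, so there is nothing to discard */
        if (ac->ac_f_ex.fe_len == 0)
                return;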

    Also fix a missing ext4_mb_unload_buddy() call in
    ext4_discard_allocated_blocks().

    Google-Bug-Id: 16844242

    Fixes: 86f0afd463215fc3e58020493482faa4ac3a4d69
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

31 Jul, 2014

1 commit

  • If there is a failure while allocating the preallocation structure, a
    number of blocks can end up getting marked in the in-memory buddy
    bitmap, and then not getting released. This can result in the
    following corruption getting reported by the kernel:

    EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126,
    12793 clusters in bitmap, 12729 in gd

    In that case, we need to release the blocks using mb_free_blocks().

    Tested: fs smoke test; also demonstrated that with injected errors,
    the file system is no longer getting corrupted

    Google-Bug-Id: 16657874

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

28 Jul, 2014

1 commit


15 Jul, 2014

1 commit

  • Commit 27dd43854227b ("ext4: introduce reserved space") reserves 2% of
    the file system space to make sure metadata allocations will always
    succeed. Given that, tracking the reservation of metadata blocks is
    no longer necessary.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o