04 Jul, 2013

2 commits

  • Pull pstore update from Tony Luck:
    "Fixes for pstore for 3.11 merge window"

    * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
    efivars: If pstore_register fails, free unneeded pstore buffer
    acpi: Eliminate console msg if pstore.backend excludes ERST
    pstore: Return unique error if backend registration excluded by kernel param
    pstore: Fail to unlink if a driver has not defined pstore_erase
    pstore/ram: remove the power of buffer size limitation
    pstore/ram: avoid atomic accesses for ioremapped regions
    efi, pstore: Cocci spatch "memdup.spatch"

    Linus Torvalds
     
  • Pull second set of VFS changes from Al Viro:
    "Assorted f_pos race fixes, making do_splice_direct() safe to call with
    i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
    ->d_hash/->d_compare calling conventions changes from Linus, misc
    stuff all over the place."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    Document ->tmpfile()
    ext4: ->tmpfile() support
    vfs: export lseek_execute() to modules
    lseek_execute() doesn't need an inode passed to it
    block_dev: switch to fixed_size_llseek()
    cpqphp_sysfs: switch to fixed_size_llseek()
    tile-srom: switch to fixed_size_llseek()
    proc_powerpc: switch to fixed_size_llseek()
    ubi/cdev: switch to fixed_size_llseek()
    pci/proc: switch to fixed_size_llseek()
    isapnp: switch to fixed_size_llseek()
    lpfc: switch to fixed_size_llseek()
    locks: give the blocked_hash its own spinlock
    locks: add a new "lm_owner_key" lock operation
    locks: turn the blocked_list into a hashtable
    locks: convert fl_link to a hlist_node
    locks: avoid taking global lock if possible when waking up blocked waiters
    locks: protect most of the file_lock handling with i_lock
    locks: encapsulate the fl_link list handling
    locks: make "added" in __posix_lock_file a bool
    ...

    Linus Torvalds
     

03 Jul, 2013

10 commits

  • very similar to ext3 counterpart...

    Signed-off-by: Al Viro

    Al Viro
     
  • For file systems (btrfs/ext4/ocfs2/tmpfs) that support
    SEEK_DATA/SEEK_HOLE, we end up duplicating the logic in
    lseek_execute() that updates the current file offset to the
    desired offset if it is valid; ceph does similar things in
    ceph_llseek().

    To reduce the duplication, this patch makes lseek_execute()
    publicly accessible so that it can be called directly from the
    underlying file systems.
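
    As a rough illustration of the kind of helper being shared, here is a
    userspace sketch, not the kernel code. The struct file_model type and
    the setpos_model() name are hypothetical, and the real helper has
    extra handling (for example for files that allow unsigned offsets)
    that is left out here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's struct file: only the two
 * fields this sketch touches. */
struct file_model {
    long long f_pos;
    unsigned long long f_version;
};

/* Validate `offset` and, if acceptable, make it the current position.
 * Returns the new position, or -EINVAL when the offset is out of range. */
long long setpos_model(struct file_model *file, long long offset,
                       long long maxsize)
{
    assert(file != 0);

    if (offset < 0 || offset > maxsize)
        return -EINVAL;

    if (offset != file->f_pos) {
        file->f_pos = offset;
        file->f_version = 0;    /* position changed: reset the version */
    }
    return offset;
}
```

    A file system's llseek method would compute the candidate offset
    (for example the next hole or data region) and then pass it to a
    helper of this shape instead of repeating the checks.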

    Thanks to Dave Chinner for this suggestion.

    [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]

    v1->v2:
    - Add kernel-doc comments for lseek_execute()
    - Call lseek_execute() in ceph->llseek()

    Signed-off-by: Jie Liu
    Cc: Dave Chinner
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: Ben Myers
    Cc: Ted Tso
    Cc: Hugh Dickins
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Sage Weil
    Signed-off-by: Al Viro

    Jie Liu
     
  • Pull driver core updates from Greg KH:
    "Here's the big driver core merge for 3.11-rc1

    Lots of little things, and larger firmware subsystem updates, all
    described in the shortlog. Nice thing here is that we finally get rid
    of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rothwell (it had
    been always on for a number of kernel releases, now it's just
    removed)"

    * tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits)
    driver core: device.h: fix doc compilation warnings
    firmware loader: fix another compile warning with PM_SLEEP unset
    build some drivers only when compile-testing
    firmware loader: fix compile warning with PM_SLEEP set
    kobject: sanitize argument for format string
    sysfs_notify is only possible on file attributes
    firmware loader: simplify holding module for request_firmware
    firmware loader: don't export cache_firmware and uncache_firmware
    drivers/base: Use attribute groups to create sysfs memory files
    firmware loader: fix compile warning
    firmware loader: fix build failure with !CONFIG_FW_LOADER_USER_HELPER
    Documentation: Updated broken link in HOWTO
    Finally eradicate CONFIG_HOTPLUG
    driver core: firmware loader: kill FW_ACTION_NOHOTPLUG requests before suspend
    driver core: firmware loader: don't cache FW_ACTION_NOHOTPLUG firmware
    Documentation: Tidy up some drivers/base/core.c kerneldoc content.
    platform_device: use a macro instead of platform_driver_register
    firmware: move EXPORT_SYMBOL annotations
    firmware: Avoid deadlock of usermodehelper lock at shutdown
    dell_rbu: Select CONFIG_FW_LOADER_USER_HELPER explicitly
    ...

    Linus Torvalds
     
  • Pull FS-Cache updates from David Howells:
    "This contains a number of fixes for various FS-Cache issues plus some
    cleanups. The commits are, in order:

    1) Provide a system wait_on_atomic_t() and wake_up_atomic_t() sharing
    the bit-wait table (enhancement for #8).

    2) Don't put spin_lock() in a while-condition as spin_lock() may have
    a do {} while(0) wrapper (cleanup).

    3) Symbolically name i_mutex lock classes rather than using numbers
    in CacheFiles (cleanup).

    4) Don't sleep in page release if __GFP_FS is not set (deadlock vs
    ext4).

    5) Uninline fscache_object_init() (cleanup for #7).

    6) Wrap checks on object state (cleanup for #7).

    7) Simplify the object state machine by separating work states from
    wait states.

    8) Simplify cookie retention by objects (NULL pointer deref fix).

    9) Remove unused list_to_page() macro (cleanup).

    10) Make the remaining-pages counter in the retrieval op atomic
    (assertion failure fix).

    11) Don't use spin_is_locked() in assertions (assertion failure fix)"
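
    Item 2 in the list above can be illustrated with a small userspace
    sketch. The toy_spin_lock()/toy_spin_unlock() macros and
    drain_pending() are hypothetical names made up for this example; they
    only imitate a lock primitive that is wrapped in do { } while (0).

```c
#include <assert.h>

struct toy_lock { int held; };

/* Statement-style wrappers: usable as statements, not as expressions. */
#define toy_spin_lock(l)   do { assert(!(l)->held); (l)->held = 1; } while (0)
#define toy_spin_unlock(l) do { assert((l)->held);  (l)->held = 0; } while (0)

/*
 * This form would not compile with the macros above, because a
 * do { } while (0) statement cannot appear inside an expression:
 *
 *     while (toy_spin_lock(lk), *pending > 0) { ... }
 */

/* Take the lock as a statement at the top of each iteration instead. */
int drain_pending(struct toy_lock *lk, int *pending)
{
    int handled = 0;

    for (;;) {
        toy_spin_lock(lk);
        if (*pending == 0)
            break;              /* leave the loop with the lock held */
        (*pending)--;
        toy_spin_unlock(lk);
        handled++;
    }
    toy_spin_unlock(lk);
    return handled;
}
```

    The restructured loop behaves the same whether the lock primitive is
    a function or a statement macro.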

    * tag 'fscache-20130702' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    FS-Cache: Don't use spin_is_locked() in assertions
    FS-Cache: The retrieval remaining-pages counter needs to be atomic_t
    cachefiles: remove unused macro list_to_page()
    FS-Cache: Simplify cookie retention for fscache_objects, fixing oops
    FS-Cache: Fix object state machine to have separate work and wait states
    FS-Cache: Wrap checks on object state
    FS-Cache: Uninline fscache_object_init()
    FS-Cache: Don't sleep in page release if __GFP_FS is not set
    CacheFiles: name i_mutex lock class explicitly
    fs/fscache: remove spin_lock() from the condition in while()
    Add wait_on_atomic_t() and wake_up_atomic_t()

    Linus Torvalds
     
  • Pull dlm updates from David Teigland:
    "This set includes a number of SCTP related fixes in the dlm, and a few
    other minor fixes and changes."

    * tag 'dlm-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
    dlm: Avoid LVB truncation
    dlm: log an error for unmanaged lockspaces
    dlm: config: using strlcpy instead of strncpy
    dlm: remove duplicated include from lowcomms.c
    dlm: disable nagle for SCTP
    dlm: retry failed SCTP sends
    dlm: try other IPs when sctp init assoc fails
    dlm: clear correct bit during sctp init failure handling
    dlm: set sctp assoc id during setup
    dlm: clear correct init bit during sctp setup

    Linus Torvalds
     
  • Pull f2fs updates from Jaegeuk Kim:
    "This patch-set includes the following major enhancement patches:
    - remount_fs callback function
    - restore parent inode number to enhance the fsync performance
    - xattr security labels
    - reduce the number of redundant lock/unlock data pages
    - avoid frequent write_inode calls

    The other minor bug fixes are as follows.
    - endian conversion bugs
    - various bugs in the roll-forward recovery routine"

    * tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (56 commits)
    f2fs: fix to recover i_size from roll-forward
    f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes()
    f2fs: remove reusing any prefree segments
    f2fs: code cleanup and simplify in func {find/add}_gc_inode
    f2fs: optimize the init_dirty_segmap function
    f2fs: fix an endian conversion bug detected by sparse
    f2fs: fix crc endian conversion
    f2fs: add remount_fs callback support
    f2fs: recover wrong pino after checkpoint during fsync
    f2fs: optimize do_write_data_page()
    f2fs: make locate_dirty_segment() as static
    f2fs: remove unnecessary parameter "offset" from __add_sum_entry()
    f2fs: avoid freqeunt write_inode calls
    f2fs: optimise the truncate_data_blocks_range() range
    f2fs: use the F2FS specific flags in f2fs_ioctl()
    f2fs: sync dir->i_size with its block allocation
    f2fs: fix i_blocks translation on various types of files
    f2fs: set sb->s_fs_info before calling parse_options()
    f2fs: support xattr security labels
    f2fs: fix iget/iput of dir during recovery
    ...

    Linus Torvalds
     
  • Pull GFS2 updates from Steven Whitehouse:
    "There are a few bug fixes for various, mostly very minor corner cases,
    plus some interesting new features.

    The new features include atomic_open whose main benefit will be the
    reduction in locking overhead in case of combined lookup/create and
    open operations, sorting the log buffer lists by block number to
    improve the efficiency of AIL writeback, and aggressively issuing
    revokes in gfs2_log_flush to reduce overhead when dropping glocks."

    * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
    GFS2: Reserve journal space for quota change in do_grow
    GFS2: Fix fstrim boundary conditions
    GFS2: fix warning message
    GFS2: aggressively issue revokes in gfs2_log_flush
    GFS2: fix regression in dir_double_exhash
    GFS2: Add atomic_open support
    GFS2: Only do one directory search on create
    GFS2: fix error propagation in init_threads()
    GFS2: Remove no-op wrapper function
    GFS2: Cocci spatch "ptr_ret.spatch"
    GFS2: Eliminate gfs2_rg_lops
    GFS2: Sort buffer lists by inplace block number

    Linus Torvalds
     
  • Pull ext4 update from Ted Ts'o:
    "Lots of bug fixes, cleanups and optimizations. In the bug fixes
    category, of note is a fix for on-line resizing file systems where the
    block size is smaller than the page size (i.e., file systems with 1k blocks
    on x86, or more interestingly file systems with 4k blocks on Power or
    ia64 systems.)

    In the cleanup category, the ext4's punch hole implementation was
    significantly improved by Lukas Czerner, and now supports bigalloc
    file systems. In addition, Jan Kara significantly cleaned up the
    write submission code path. We also improved error checking and added
    a few sanity checks.

    In the optimizations category, two major optimizations deserve
    mention. The first is that ext4_writepages() is now used for
    nodelalloc and ext3 compatibility mode. This allows writes to be
    submitted much more efficiently as a single bio request, instead of
    being sent as individual 4k writes into the block layer (which then
    relied on the elevator code to coalesce the requests in the block
    queue). Secondly, the extent cache shrink mechanism, which was
    introduced in 3.9, no longer has a scalability bottleneck caused by the
    i_es_lru spinlock. Other optimizations include some changes to reduce
    CPU usage and to avoid issuing empty commits unnecessarily."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
    ext4: optimize starting extent in ext4_ext_rm_leaf()
    jbd2: invalidate handle if jbd2_journal_restart() fails
    ext4: translate flag bits to strings in tracepoints
    ext4: fix up error handling for mpage_map_and_submit_extent()
    jbd2: fix theoretical race in jbd2__journal_restart
    ext4: only zero partial blocks in ext4_zero_partial_blocks()
    ext4: check error return from ext4_write_inline_data_end()
    ext4: delete unnecessary C statements
    ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
    jbd2: move superblock checksum calculation to jbd2_write_superblock()
    ext4: pass inode pointer instead of file pointer to punch hole
    ext4: improve free space calculation for inline_data
    ext4: reduce object size when !CONFIG_PRINTK
    ext4: improve extent cache shrink mechanism to avoid to burn CPU time
    ext4: implement error handling of ext4_mb_new_preallocation()
    ext4: fix corruption when online resizing a fs with 1K block size
    ext4: delete unused variables
    ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
    jbd2: remove debug dependency on debug_fs and update Kconfig help text
    jbd2: use a single printk for jbd_debug()
    ...

    Linus Torvalds
     
  • Pull VFS patches (part 1) from Al Viro:
    "The major change in this pile is ->readdir() replacement with
    ->iterate(), dealing with ->f_pos races in ->readdir() instances for
    good.

    There's a lot more, but I'd prefer to split the pull request into
    several stages and this is the first obvious cutoff point."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits)
    [readdir] constify ->actor
    [readdir] ->readdir() is gone
    [readdir] convert ecryptfs
    [readdir] convert coda
    [readdir] convert ocfs2
    [readdir] convert fatfs
    [readdir] convert xfs
    [readdir] convert btrfs
    [readdir] convert hostfs
    [readdir] convert afs
    [readdir] convert ncpfs
    [readdir] convert hfsplus
    [readdir] convert hfs
    [readdir] convert befs
    [readdir] convert cifs
    [readdir] convert freevxfs
    [readdir] convert fuse
    [readdir] convert hpfs
    reiserfs: switch reiserfs_readdir_dentry to inode
    reiserfs: is_privroot_deh() needs only directory inode, actually
    ...

    Linus Torvalds
     
  • When sync does its WB_SYNC_ALL writeback, it issues data IO and
    then immediately waits for IO completion. This is done in the
    context of the flusher thread, and hence completely ties up the
    flusher thread for the backing device until all the dirty inodes
    have been synced. On filesystems that are dirtying inodes constantly
    and quickly, this means the flusher thread can be tied up for
    minutes per sync call and hence badly affect system level write IO
    performance as the page cache cannot be cleaned quickly.

    We already have a wait loop for IO completion for sync(2), so cut
    this out of the flusher thread and delegate it to wait_sb_inodes().
    Hence we can do rapid IO submission, and then wait for it all to
    complete.

    Effect of sync on fsmark before the patch:

    FSUse%        Count         Size    Files/sec     App Overhead
    .....
         0       640000         4096      35154.6          1026984
         0       720000         4096      36740.3          1023844
         0       800000         4096      36184.6           916599
         0       880000         4096       1282.7          1054367
         0       960000         4096       3951.3           918773
         0      1040000         4096      40646.2           996448
         0      1120000         4096      43610.1           895647
         0      1200000         4096      40333.1           921048

    And a single sync pass took:

    real 0m52.407s
    user 0m0.000s
    sys 0m0.090s

    After the patch, there is no impact on fsmark results, and each
    individual sync(2) operation run concurrently with the same fsmark
    workload takes roughly 7s:

    real 0m6.930s
    user 0m0.000s
    sys 0m0.039s

    IOWs, sync is 7-8x faster on a busy filesystem and does not have an
    adverse impact on ongoing async data write operations.
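
    The 7-8x figure follows from the two wall-clock times quoted above.
    A minimal check of that arithmetic, with speedup() being a name
    chosen for this example:

```c
#include <assert.h>

/* Ratio of two wall-clock times, both given in seconds. */
double speedup(double before_s, double after_s)
{
    assert(after_s > 0.0);
    return before_s / after_s;
}
```

    Calling speedup(52.407, 6.930) gives a ratio between 7 and 8.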

    Signed-off-by: Dave Chinner
    Reviewed-by: Jan Kara
    Signed-off-by: Linus Torvalds

    Dave Chinner
     

02 Jul, 2013

7 commits

  • If a user issues many data writes together with fsync, the last updated
    i_size should be stored to the inode block consistently.

    However, the previous write_end just marks the inode as dirty and doesn't
    update its metadata in its inode block.
    After that, fsync just writes the inode block with the newly updated data
    index, excluding the inode metadata updates.

    So, this patch introduces a write_end that also updates the inode block
    when the i_size is changed.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • destroy_fsync_dnodes() is a simple list-cleanup function, so delete its
    unused and unrelated f2fs_sb_info argument.

    Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • This patch removes check_prefree_segments, which was initially designed to
    enhance performance by narrowing the range of LBA usage across the whole
    block device.

    When allocating a new segment, f2fs previously tried to find suitable
    prefree segments and, if it found one, reused that segment for further
    data or node block allocation.

    However, I found that this was a totally wrong approach, since the prefree
    segments have several data or node blocks that will be used by the
    roll-forward mechanism run after a sudden power-off.

    Let's assume the following scenario.

    /* write 8MB with fsync */
    for (i = 0; i < 2048; i++) {
            offset = i * 4096;
            write(fd, offset, 4KB);
            fsync(fd);
    }

    In this case, naive segment allocation sequence will be like:
    data segment: x, x+1, x+2, x+3
    node segment: y, y+1, y+2, y+3.

    But, if we can reuse prefree segments, the sequence can be like:
    data segment: x, x+1, y, y+1
    node segment: y, y+1, y+2, y+3.
    This is because y, y+1, and y+2 became prefree segments one by one, and
    those are reused by data allocation.

    After conducting this workload, we should consider how to recover the latest
    inode with its data.
    If we reuse prefree segments such as y or y+1, we lose the old node blocks,
    so f2fs cannot even start roll-forward recovery.

    Therefore, I suggest that we should remove reusing prefree segments.
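
    For reference, the write-with-fsync workload from the scenario above
    could be written as a runnable userspace program roughly like this.
    It is only a sketch: write_blocks_with_fsync() and BLK_BYTES are names
    invented here, and the target path is supplied by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK_BYTES 4096

/* Write `nblocks` 4KB blocks to `path`, calling fsync() after each one.
 * Returns the number of bytes written, or -1 on the first failure. */
long write_blocks_with_fsync(const char *path, int nblocks)
{
    char buf[BLK_BYTES];
    long total = 0;
    int fd, i;

    assert(nblocks >= 0);
    memset(buf, 0xab, sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    for (i = 0; i < nblocks; i++) {
        off_t offset = (off_t)i * BLK_BYTES;

        /* A short write is treated as a failure in this sketch. */
        if (pwrite(fd, buf, sizeof(buf), offset) != (ssize_t)sizeof(buf) ||
            fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        total += BLK_BYTES;
    }
    return close(fd) == 0 ? total : -1;
}
```

    Passing 2048 as nblocks reproduces the 8MB case described in the
    scenario.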

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch simplifies list operations in find_gc_inode and add_gc_inode.
    Just simple code cleanup.

    Signed-off-by: Gu Zheng
    [Jaegeuk Kim: add description]
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • Optimize the while loop condition

    This condition is always true, and the while loop is
    terminated by the following check in the code:

        if (segno >= TOTAL_SEGS(sbi))
                break;

    Hence we can replace the while loop condition with while(1)
    instead of always checking whether segno is less than the total
    number of segments.

    Also, we do not need to evaluate TOTAL_SEGS() every time. We can store
    this value in a local variable since it is constant.
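
    The resulting loop shape can be sketched in userspace as follows.
    This is a simplified model rather than the f2fs code: struct sb_model,
    next_in_use(), count_dirty_segments() and the BLOCKS_PER_SEG value are
    all assumptions made for the example.

```c
#include <assert.h>

#define BLOCKS_PER_SEG 512u   /* assumed segment size for this sketch */

/* Hypothetical, heavily reduced stand-in for f2fs_sb_info. */
struct sb_model {
    unsigned int total_segs;
    const unsigned char *in_use;       /* 1 if the segment is allocated */
    const unsigned int *valid_blocks;  /* valid block count per segment */
};

/* Return the first in-use segment at or after `start`, or `total` if none. */
static unsigned int next_in_use(const struct sb_model *sb, unsigned int total,
                                unsigned int start)
{
    while (start < total && !sb->in_use[start])
        start++;
    return start;
}

/* Count in-use segments that are partially valid (neither empty nor full). */
unsigned int count_dirty_segments(const struct sb_model *sb)
{
    const unsigned int total = sb->total_segs;  /* read once, reuse below */
    unsigned int offset = 0, dirty = 0;

    assert(sb->in_use && sb->valid_blocks);

    while (1) {
        unsigned int segno = next_in_use(sb, total, offset);

        if (segno >= total)
            break;              /* the only exit from the loop */
        offset = segno + 1;

        if (sb->valid_blocks[segno] == 0 ||
            sb->valid_blocks[segno] >= BLOCKS_PER_SEG)
            continue;
        dirty++;
    }
    return dirty;
}
```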

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • This patch should fix the following bug reported by kbuild test robot.

    fs/f2fs/recovery.c:233:33: sparse: incorrect type in assignment
    (different base types)

    sparse warnings: (new ones prefixed by >>)

    >> recovery.c:233: sparse: incorrect type in assignment (different base types)
    recovery.c:233: expected unsigned int [unsigned] [assigned] ofs_in_node
    recovery.c:233: got restricted __le16 [assigned] [usertype] ofs_in_node
    >> recovery.c:238: sparse: incorrect type in assignment (different base types)
    recovery.c:238: expected unsigned int [unsigned] ofs_in_node
    recovery.c:238: got restricted __le16 [assigned] [usertype] ofs_in_node

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • While calculating CRC for the checkpoint block, we use __u32, but when storing
    the crc value to the disk, we use __le32.

    Let's fix the inconsistency.
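
    The distinction between a CPU-order value and its on-disk
    little-endian form can be shown with a portable userspace sketch.
    put_le32() and get_le32() are illustrative helpers written for this
    example; the kernel has its own conversion helpers for this.

```c
#include <assert.h>
#include <stdint.h>

/* Store a CPU-order 32-bit value as four little-endian bytes. */
void put_le32(unsigned char out[4], uint32_t v)
{
    assert(out != 0);
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Read four little-endian bytes back into a CPU-order value. */
uint32_t get_le32(const unsigned char in[4])
{
    assert(in != 0);
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

    Computing a checksum in CPU order and converting only at the point
    where it is stored or loaded keeps the result identical on big-endian
    and little-endian machines.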

    Reported-and-Tested-by: Oded Gabbay
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

01 Jul, 2013

16 commits

  • Both hole punch and truncate use ext4_ext_rm_leaf() for removing
    blocks. Currently we choose the last extent as the starting
    point for removing blocks:

    ex = EXT_LAST_EXTENT(eh);

    This is OK for truncate but for hole punch we can optimize the extent
    selection as the path is already initialized. We could use this
    information to select proper starting extent. The code change in this
    patch will not affect truncate as for truncate path[depth].p_ext will
    always be NULL.

    Signed-off-by: Ashish Sangwan
    Signed-off-by: Namjae Jeon
    Signed-off-by: "Theodore Ts'o"

    Ashish Sangwan
     
  • If jbd2_journal_restart() fails, the handle will have been disconnected
    from the current transaction. In this situation, the handle must not
    be used for any jbd2 function other than jbd2_journal_stop().
    Enforce this by treating a handle which has a NULL transaction
    pointer as an aborted handle, and issue a kernel warning if
    jbd2_journal_extend(), jbd2_journal_get_write_access(),
    jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.

    This commit also fixes a bug where jbd2_journal_stop() would trip over
    a kernel jbd2 assertion check when trying to free an invalid handle.

    Also move the responsibility of setting current->journal_info to
    start_this_handle(), simplifying the three users of this function.
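
    The rule that a handle with a NULL transaction pointer counts as
    aborted can be modelled in a few lines. This is a sketch under
    assumptions: the struct and function names are hypothetical, and the
    -EROFS return value is picked for the example rather than taken from
    the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct txn_model { int tid; };

/* Hypothetical, reduced handle: just what the check needs. */
struct handle_model {
    struct txn_model *transaction;  /* NULL once disconnected */
    int aborted;
};

/* A handle is unusable if it was aborted or has no transaction. */
int handle_is_aborted(const struct handle_model *h)
{
    assert(h != NULL);
    return h->aborted || h->transaction == NULL;
}

/* Stand-in for operations such as extending a handle or dirtying
 * metadata: refuse to run on an invalid handle. */
int handle_use(struct handle_model *h)
{
    if (handle_is_aborted(h))
        return -EROFS;          /* error code assumed for this sketch */
    return 0;
}
```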

    Signed-off-by: "Theodore Ts'o"
    Reported-by: Younger Liu
    Cc: Jan Kara

    Theodore Ts'o
     
  • Translate the bitfields used in various flags argument to strings to
    make the tracepoint output more human-readable.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • The function mpage_released_unused_page() must only be called once;
    otherwise the kernel will BUG() when the second call to
    mpage_released_unused_page() tries to unlock the pages which had been
    unlocked by the first call.

    Also restructure the error handling so that we only give up on writing
    the dirty pages in the case of ENOSPC where retrying the allocation
    won't help. Otherwise, a transient failure, such as a kmalloc()
    failure in calling ext4_map_blocks() might cause us to give up on
    those pages, leading to a scary message in /var/log/messages plus data
    loss.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Jan Kara

    Theodore Ts'o
     
  • Once we decrement transaction->t_updates, if this is the last handle
    holding the transaction from closing, and once we release the
    t_handle_lock spinlock, it's possible for the transaction to commit
    and be released. In practice with normal kernels, this probably won't
    happen, since the commit happens in a separate kernel thread and it's
    unlikely this could all happen within the space of a few CPU cycles.

    On the other hand, with a real-time kernel, this could potentially
    happen, so save the tid found in transaction->t_tid before we release
    t_handle_lock. It would require an insane configuration, such as one
    where the jbd2 thread was set to a very high real-time priority,
    perhaps because a high priority real-time thread is trying to read or
    write to a file system. But some people who use real-time kernels
    have been known to do insane things, including controlling
    laser-wielding industrial robots. :-)

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • Currently, if we pass a range that covers an entire block into
    ext4_zero_partial_blocks(), we attempt to zero it even though we should
    only zero the unaligned part of a block.

    Fix this by checking whether the range covers the whole block and
    skipping zeroing if so.
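
    The intended behaviour can be sketched as a small userspace
    calculation. This is an illustration only: struct zero_plan and
    plan_partial_zero() are made-up names, and the block size is assumed
    to be a power of two.

```c
#include <assert.h>

struct zero_plan {
    unsigned long long head_off, head_len;  /* partial block at the start */
    unsigned long long tail_off, tail_len;  /* partial block at the end */
};

/* Work out which unaligned pieces of [start, start + len) need zeroing.
 * Blocks that the range covers completely are never included. */
struct zero_plan plan_partial_zero(unsigned long long start,
                                   unsigned long long len,
                                   unsigned long long bs)
{
    struct zero_plan p = { 0, 0, 0, 0 };
    unsigned long long end, head, tail;

    assert(len > 0 && bs > 0 && (bs & (bs - 1)) == 0);

    end = start + len - 1;      /* last byte in the range */
    head = start & (bs - 1);    /* offset of `start` within its block */
    tail = end & (bs - 1);      /* offset of `end` within its block */

    if (start / bs == end / bs) {
        /* One block only: zero the range unless it is the whole block. */
        if (head != 0 || tail != bs - 1) {
            p.head_off = start;
            p.head_len = len;
        }
        return p;
    }
    if (head != 0) {
        p.head_off = start;
        p.head_len = bs - head;
    }
    if (tail != bs - 1) {
        p.tail_off = end - tail;
        p.tail_len = tail + 1;
    }
    return p;
}
```

    With a 4096-byte block size, a range of exactly one aligned block
    yields an empty plan, while a range starting at byte 100 yields a
    head piece that stops at the block boundary.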

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     
  • The function ext4_write_inline_data_end() can return an error. So we
    need to assign it to a signed integer variable to check for an error
    return (since copied is an unsigned int).
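
    The pattern can be shown with a short userspace sketch.
    copy_inline_model() and write_end_model() are hypothetical stand-ins,
    not the ext4 functions; the point is only that the result is checked
    while it is still a signed int.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callee: returns bytes copied, or a negative errno. */
int copy_inline_model(int fail, unsigned int want)
{
    return fail ? -ENOSPC : (int)want;
}

/* Capture the result in a signed int first, test it, then convert. */
int write_end_model(int fail, unsigned int want, unsigned int *copied)
{
    int ret = copy_inline_model(fail, want);

    assert(copied != 0);
    if (ret < 0)
        return ret;             /* error is still visible here */

    *copied = (unsigned int)ret;
    return 0;
}
```

    Had the negative return value been stored directly in an unsigned
    variable, it would have become a large positive number and no
    "less than zero" test could have caught it.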

    Signed-off-by: "Theodore Ts'o"
    Cc: Zheng Liu
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • Checking whether an unsigned variable is less than 0 always returns false.
    err = 0 is duplicated and unnecessary.

    [ tytso: Also cleaned up error handling in ext4_block_zero_page_range() ]

    Signed-off-by: "Jon Ernst"
    Signed-off-by: "Theodore Ts'o"

    jon ernst
     
  • In both ext3 and ext4, htree_dirblock_to_tree() is just filling the
    in-core rbtree for use by call_filldir(). All updates of ->f_pos are
    done by the latter; bumping it here (on error) is obviously wrong - we
    might very well have it nowhere near the block we'd found an error in.

    Signed-off-by: Al Viro
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Al Viro
     
  • Some of the functions which modify the jbd2 superblock were not
    updating the checksum before calling jbd2_write_superblock(). Move
    the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
    that the checksum is calculated consistently.

    Signed-off-by: "Theodore Ts'o"
    Cc: Darrick J. Wong
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • No need to pass file pointer when we can directly pass inode pointer.

    Signed-off-by: Ashish Sangwan
    Signed-off-by: Namjae Jeon
    Signed-off-by: "Theodore Ts'o"

    Ashish Sangwan
     
  • The ext4 inline_data feature uses the xattr space to store the
    inline data in the inode. When we account for the inline data as an
    xattr, we add the pad, but get_max_inline_xattr_value_size() counts
    the free space without the pad. This causes some contents to be moved
    to a block even if they could be stored in the inode.

    Signed-off-by: liulei
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Tao Ma

    boxi liu
     
  • Reducing the object size by ~10% could be useful for embedded systems.

    Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and
    arguments, passing " " to functions when !CONFIG_PRINTK and still
    verifying format and arguments with no_printk.

    $ size fs/ext4/built-in.o*
       text    data     bss     dec     hex filename
     239375     610     888  240873   3ace9 fs/ext4/built-in.o.new
     264167     738     888  265793   40e41 fs/ext4/built-in.o.old

    $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config
    # CONFIG_PRINTK is not set
    CONFIG_EXT4_FS=y
    CONFIG_EXT4_USE_FOR_EXT23=y
    CONFIG_EXT4_FS_POSIX_ACL=y
    # CONFIG_EXT4_FS_SECURITY is not set
    # CONFIG_EXT4_DEBUG is not set

    Signed-off-by: Joe Perches
    Signed-off-by: "Theodore Ts'o"

    Joe Perches
     
  • Now we maintain a proper in-order LRU list in ext4 to reclaim entries
    from extent status tree when we are under heavy memory pressure. For
    keeping this order, a spin lock is used to protect this list. But this
    lock burns a lot of CPU time. We can use the following steps to trigger
    it.

    % cd /dev/shm
    % dd if=/dev/zero of=ext4-img bs=1M count=2k
    % mkfs.ext4 ext4-img
    % mount -t ext4 -o loop ext4-img /mnt
    % cd /mnt
    % for ((i=0;i

    Zheng Liu
     
  • If memory allocation in ext4_mb_new_group_pa() fails,
    it returns an error code and ext4_mb_new_preallocation() propagates it,
    but ext4_mb_new_blocks() ignores it.

    An observed result was:

    - the allocation failure means ext4_mb_new_group_pa() does not update
    ext4_allocation_context;

    - ext4_mb_new_blocks() sets ext4_allocation_request->len (ar->len =
    ac->ac_b_ex.fe_len;) to number of blocks preallocated (512) instead
    of number of blocks requested (1);

    - that activates the update cycle in ext4_splice_branch():
      for (i = 1; i < blks; i++)
              *(where->p + i) = cpu_to_le32(current_block++);

    - it iterates 511 times and corrupts a chunk of memory including inode
    structure;

    - page fault happens at EXT4_SB(inode->i_sb) in ext4_mark_inode_dirty();

    - system hangs with 'scheduling while atomic' BUG.

    The patch implements a check for ext4_mb_new_preallocation() error
    code and handles its failure as if ext4_mb_regular_allocator() fails.

    Found by Linux File System Verification project (linuxtesting.org).

    [ Patch restructured by tytso to make the flow of control easier to follow. ]

    Signed-off-by: Alexey Khoroshilov
    Signed-off-by: "Theodore Ts'o"

    Alexey Khoroshilov
     
  • Subtracting the number of the first data block places the superblock
    backups one block too early, corrupting the file system. When the block
    size is larger than 1K, the first data block is 0, so the subtraction
    has no effect and no corruption occurs.
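
    The off-by-one can be checked with a little arithmetic. The sketch
    below is illustrative: group_first_block() is a name chosen here, and
    the blocks-per-group values in the note that follows (8192 for 1K
    blocks, 32768 for 4K blocks) are assumed typical defaults rather than
    figures from the patch.

```c
#include <assert.h>

/* First block of block group `group`, counted from the start of the
 * file system. */
unsigned long long group_first_block(unsigned long long group,
                                     unsigned long long blocks_per_group,
                                     unsigned long long first_data_block)
{
    assert(blocks_per_group > 0);
    return group * blocks_per_group + first_data_block;
}
```

    With 1K blocks the first data block is 1, so group 1 starts at block
    8193; subtracting the first data block would give 8192, one block too
    early. With 4K blocks the first data block is 0, so group 1 starts at
    block 32768 either way.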

    Signed-off-by: Maarten ter Huurne
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Jan Kara
    CC: stable@vger.kernel.org

    Maarten ter Huurne
     

30 Jun, 2013

2 commits


29 Jun, 2013

3 commits