07 Jan, 2013

3 commits


03 Jan, 2013

1 commit

  • Pull ext4 bug fixes from Ted Ts'o:
    "Various bug fixes for ext4. Perhaps the most serious bug fixed is one
    which could cause file system corruptions when performing file punch
    operations."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: avoid hang when mounting non-journal filesystems with orphan list
    ext4: lock i_mutex when truncating orphan inodes
    ext4: do not try to write superblock on ro remount w/o journal
    ext4: include journal blocks in df overhead calcs
    ext4: remove unaligned AIO warning printk
    ext4: fix an incorrect comment about i_mutex
    ext4: fix deadlock in journal_unmap_buffer()
    ext4: split off ext4_journalled_invalidatepage()
    jbd2: fix assertion failure in jbd2_journal_flush()
    ext4: check dioread_nolock on remount
    ext4: fix extent tree corruption caused by hole punch

    Linus Torvalds
     

27 Dec, 2012

2 commits

  • When trying to mount a file system which does not contain a journal,
    but which does have an orphan list containing an inode which needs to
    be truncated, the mount call will hang forever in
    ext4_orphan_cleanup() because ext4_orphan_del() will return
    immediately without removing the inode from the orphan list, leading
    to an uninterruptible loop in kernel code which will busy out one of
    the CPUs on the system.

    This can be trivially reproduced by trying to mount the file system
    found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
    source tree. If a malicious user were to put this on a USB stick, and
    mount it on a Linux desktop which has automatic mounts enabled, this
    could be considered a potential denial of service attack. (Not a big
    deal in practice, but professional paranoids worry about such things,
    and have even been known to allocate CVE numbers for such problems.)

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu
    Cc: stable@vger.kernel.org

    Theodore Ts'o
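
The hang described above can be modelled in user space. The following is a minimal sketch, not the kernel code: the `toy_*` types, both `orphan_del_*` helpers, and the iteration cap are hypothetical, and the "fixed" variant simply always unlinks the inode, which may differ from what the actual patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real ext4 types. */
struct toy_inode { struct toy_inode *next_orphan; };
struct toy_sb { struct toy_inode *orphans; bool has_journal; };

/* Buggy variant: returns early without a journal, so the inode stays listed. */
static void orphan_del_buggy(struct toy_sb *sb, struct toy_inode *inode)
{
    if (!sb->has_journal)
        return;
    sb->orphans = inode->next_orphan;
    inode->next_orphan = NULL;
}

/* Illustrative fix: always unlink the inode from the in-memory list. */
static void orphan_del_fixed(struct toy_sb *sb, struct toy_inode *inode)
{
    sb->orphans = inode->next_orphan;
    inode->next_orphan = NULL;
}

/*
 * Model of the cleanup loop: process the head of the orphan list until the
 * list is empty. Returns the number of inodes processed, or -1 if max_iter
 * is reached (the real loop has no such cap and would spin forever).
 */
static int orphan_cleanup(struct toy_sb *sb,
                          void (*del)(struct toy_sb *, struct toy_inode *),
                          int max_iter)
{
    int processed = 0;

    assert(max_iter > 0);
    while (sb->orphans != NULL) {
        if (processed == max_iter)
            return -1;
        /* The truncate of sb->orphans would happen here. */
        del(sb, sb->orphans);
        processed++;
    }
    return processed;
}
```

With a two-entry orphan list and no journal, the buggy helper never shrinks the list and the loop only stops because of the cap; the always-unlink helper drains the list in two iterations.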
     
  • Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
    called without i_mutex being taken. It had previously not been taken
    during orphan cleanup since races weren't possible at that point in
    the mount process, but as a result of c278531d39, we will now see
    a kernel WARN_ON in this case. Take the i_mutex in
    ext4_orphan_cleanup() to suppress this warning.

    Reported-by: Alexander Beregalov
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

26 Dec, 2012

6 commits

  • When a journal-less ext4 filesystem is mounted on a read-only block
    device (blockdev --setro will do), each remount (for other, unrelated,
    flags, like suid=>nosuid etc) results in a series of scary messages
    from the kernel reporting I/O errors on the device.

    This is because of the following code in ext4_remount():

        if (sbi->s_journal == NULL)
                ext4_commit_super(sb, 1);

    at the end of the remount procedure, which forces writing (flushing) of
    the superblock regardless of whether it is dirty, whether the filesystem
    is read-only, and whether the device itself is read-only.

    We only need to call ext4_commit_super() when the file system had been
    previously mounted read/write.

    Thanks to Eric Sandeen for help in diagnosing this issue.

    Signed-off-by: Michael Tokarev
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Michael Tokarev
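
As a rough illustration of the rule stated above (write the superblock on remount only if the file system was previously mounted read/write), here is a small user-space sketch. The `toy_sb` structure, its counter, and `toy_remount()` are hypothetical; the real change is in ext4_remount() and may be expressed differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant superblock state. */
struct toy_sb {
    bool has_journal;
    bool read_only;
    int super_writes;   /* counts simulated superblock writes */
};

static void toy_commit_super(struct toy_sb *sb)
{
    sb->super_writes++;
}

/* Remount with a new read-only setting; other flags are ignored here. */
static void toy_remount(struct toy_sb *sb, bool new_read_only)
{
    bool was_read_only = sb->read_only;

    sb->read_only = new_read_only;

    /* Only flush when there is no journal AND we were writable before. */
    if (!sb->has_journal && !was_read_only)
        toy_commit_super(sb);
}
```

In this model a read-only to read-only remount (for example, changing only suid to nosuid) performs no superblock write, so a read-only block device would see no I/O.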
     
  • To more accurately calculate overhead for "bsd" style
    df reporting, we should count the journal blocks as
    overhead as well.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Tested-by: Eric Whitney

    Eric Sandeen
     
  • Although I put this in, I now think it was a bad decision. For most
    users, there is very little to be done in this case. They get the
    message, once per day, with no real context or proposed action. TBH,
    it generates support calls when it probably does not need to; the
    message sounds more dire than the situation really is.

    Just nuke it. Normal investigation via blktrace or whatnot can
    reveal poor IO patterns if bad performance is encountered.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • i_mutex is not held when ->sync_file is called.

    Reviewed-by: Jan Kara
    Signed-off-by: Andy Lutomirski
    Signed-off-by: "Theodore Ts'o"

    Andy Lutomirski
     
  • We cannot wait for transaction commit in journal_unmap_buffer()
    because we hold the page lock, which ranks below transaction start. We
    solve the issue by bailing out of journal_unmap_buffer() and
    jbd2_journal_invalidatepage() with -EBUSY. The caller is then responsible
    for waiting for the transaction commit to finish and trying invalidation
    again. Since the issue can happen only for a page straddling i_size, it
    is simple enough to manually call jbd2_journal_invalidatepage() for
    such a page from ext4_setattr(), check the return value and wait if
    necessary.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
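
The bail-out-and-retry pattern described above can be sketched as follows. This is a user-space model under stated assumptions: `toy_journal`, `toy_invalidate_page()`, and `toy_wait_for_commit()` are hypothetical names, and locking is omitted entirely; only the control flow (return -EBUSY, wait outside the lock, retry) is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical journal state: number of commits still in flight. */
struct toy_journal { int commits_pending; };

/* Callee: refuses to wait; reports -EBUSY while a commit is in flight. */
static int toy_invalidate_page(const struct toy_journal *j)
{
    return j->commits_pending > 0 ? -EBUSY : 0;
}

/* Stand-in for waiting until one transaction commit has finished. */
static void toy_wait_for_commit(struct toy_journal *j)
{
    assert(j->commits_pending > 0);
    j->commits_pending--;
}

/* Caller: retries invalidation, waiting between attempts. Returns retries. */
static int toy_invalidate_with_retry(struct toy_journal *j)
{
    int retries = 0;

    while (toy_invalidate_page(j) == -EBUSY) {
        toy_wait_for_commit(j);   /* done without holding the page lock */
        retries++;
    }
    return retries;
}
```

The point of the design is that the waiting moves to the caller, which is in a position to drop the lock that made waiting in the callee unsafe.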
     
  • In data=journal mode we don't need delalloc or DIO handling in invalidatepage
    and similarly in other modes we don't need the journal handling. So split
    invalidatepage implementations.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

20 Dec, 2012

1 commit

  • Currently we allow enabling dioread_nolock mount option on remount for
    filesystems where blocksize < PAGE_CACHE_SIZE. This isn't really
    supported so fix the bug by moving the check for blocksize !=
    PAGE_CACHE_SIZE into parse_options(). Change the original PAGE_SIZE to
    PAGE_CACHE_SIZE along the way because that's what we are really
    interested in.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Eric Sandeen
    Cc: stable@vger.kernel.org

    Jan Kara
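
A minimal sketch of why moving the check into the shared option parser closes the gap: if both the mount and remount paths go through the same parsing routine, neither can enable the option on an unsupported block size. All names below (`toy_opts`, `toy_parse_options()`, and the two callers) are hypothetical, and the page cache size is passed in as a parameter rather than taken from a kernel constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical parsed mount options. */
struct toy_opts { bool dioread_nolock; };

/* Shared parser: rejects dioread_nolock unless blocksize equals page size. */
static bool toy_parse_options(const struct toy_opts *opts,
                              unsigned blocksize, unsigned page_cache_size)
{
    assert(blocksize > 0 && page_cache_size > 0);
    if (opts->dioread_nolock && blocksize != page_cache_size)
        return false;
    return true;
}

/* Both entry points call the shared parser, so both get the check. */
static bool toy_mount(const struct toy_opts *opts, unsigned bs, unsigned ps)
{
    return toy_parse_options(opts, bs, ps);
}

static bool toy_remount(const struct toy_opts *opts, unsigned bs, unsigned ps)
{
    return toy_parse_options(opts, bs, ps);
}
```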
     

18 Dec, 2012

1 commit


17 Dec, 2012

2 commits

  • When depth of extent tree is greater than 1, logical start value of
    interior node is not correctly updated in ext4_ext_rm_idx.

    Signed-off-by: Forrest Liu
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Ashish Sangwan
    Cc: stable@vger.kernel.org

    Forrest Liu
     
  • Pull ext4 update from Ted Ts'o:
    "There are two major features for this merge window. The first is
    inline data, which allows small files or directories to be stored in
    the in-inode extended attribute area. (This requires that the file
    system use inodes which are at least 256 bytes or larger; 128 byte
    inodes do not have any room for in-inode xattrs.)

    The second new feature is SEEK_HOLE/SEEK_DATA support. This is
    enabled by the extent status tree patches, and this infrastructure
    will be used to further optimize ext4 in the future.

    Beyond that, we have the usual collection of code cleanups and bug
    fixes."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
    ext4: zero out inline data using memset() instead of empty_zero_page
    ext4: ensure Inode flags consistency are checked at build time
    ext4: Remove CONFIG_EXT4_FS_XATTR
    ext4: remove unused variable from ext4_ext_in_cache()
    ext4: remove redundant initialization in ext4_fill_super()
    ext4: remove redundant code in ext4_alloc_inode()
    ext4: use sync_inode_metadata() when syncing inode metadata
    ext4: enable ext4 inline support
    ext4: let fallocate handle inline data correctly
    ext4: let ext4_truncate handle inline data correctly
    ext4: evict inline data out if we need to strore xattr in inode
    ext4: let fiemap work with inline data
    ext4: let ext4_rename handle inline dir
    ext4: let empty_dir handle inline dir
    ext4: let ext4_delete_entry() handle inline data
    ext4: make ext4_delete_entry generic
    ext4: let ext4_find_entry handle inline data
    ext4: create a new function search_dir
    ext4: let ext4_readdir handle inline data
    ext4: let add_dir_entry handle inline data properly
    ...

    Linus Torvalds
     

14 Dec, 2012

1 commit

  • Pull trivial branch from Jiri Kosina:
    "Usual stuff -- comment/printk typo fixes, documentation updates, dead
    code elimination."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    HOWTO: fix double words typo
    x86 mtrr: fix comment typo in mtrr_bp_init
    propagate name change to comments in kernel source
    doc: Update the name of profiling based on sysfs
    treewide: Fix typos in various drivers
    treewide: Fix typos in various Kconfig
    wireless: mwifiex: Fix typo in wireless/mwifiex driver
    messages: i2o: Fix typo in messages/i2o
    scripts/kernel-doc: check that non-void fcts describe their return value
    Kernel-doc: Convention: Use a "Return" section to describe return values
    radeon: Fix typo and copy/paste error in comments
    doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
    various: Fix spelling of "asynchronous" in comments.
    Fix misspellings of "whether" in comments.
    eisa: Fix spelling of "asynchronous".
    various: Fix spelling of "registered" in comments.
    doc: fix quite a few typos within Documentation
    target: iscsi: fix comment typos in target/iscsi drivers
    treewide: fix typo of "suport" in various comments and Kconfig
    treewide: fix typo of "suppport" in various comments
    ...

    Linus Torvalds
     

11 Dec, 2012

23 commits

  • Not all architectures (in particular, sparc64) have empty_zero_page.
    So instead of copying from empty_zero_page, use memset to clear the
    inline data by signalling to ext4_xattr_set_entry() via a magic
    pointer value, EXT4_ZERO_ATTR_VALUE, which is defined by casting -1 to
    a pointer.

    This fixes a build failure on sparc64, and the memset() should be more
    efficient than using memcpy() anyway.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
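
The sentinel-pointer technique described above can be illustrated in plain C. This is a sketch, not the ext4 code: `TOY_ZERO_VALUE` and `toy_set_value()` are hypothetical names modelled on the EXT4_ZERO_ATTR_VALUE idea from the commit message, and the real ext4_xattr_set_entry() does considerably more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sentinel: -1 cast to a pointer, meaning "fill the value with zeroes". */
#define TOY_ZERO_VALUE ((void *)-1)

/*
 * Store `size` bytes at dst. If the caller passes the sentinel instead of a
 * real source buffer, clear the destination with memset() rather than
 * copying from a page of zeroes.
 */
static void toy_set_value(unsigned char *dst, const void *value, size_t size)
{
    assert(dst != NULL);
    if (value == TOY_ZERO_VALUE)
        memset(dst, 0, size);
    else
        memcpy(dst, value, size);
}
```

Because the sentinel is never dereferenced, no architecture-specific zero page is needed, which is what removes the dependency on empty_zero_page.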
     
  • Flags being used by atomic operations on inode flags (e.g.
    ext4_test_inode_flag()) should be consistent with those actually stored
    in inodes, i.e. EXT4_XXX_FL.

    This patch ensures that this consistency is checked at build time, not
    at run time.

    Currently, flag consistency is checked at run time, but there is no
    real reason not to do a build-time check instead. The code compares
    macro-defined values with enum values; both are constants, so there
    is no problem in comparing them at build time.

    enum values are treated as constants by the C compiler, according
    to the C99 specs (see www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
    sec. 6.2.5, item 16), so there is no real problem in comparing an
    enumeration type at build time.

    Signed-off-by: Carlos Maiolino
    Signed-off-by: "Theodore Ts'o"

    Carlos Maiolino
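
A build-time consistency check of this kind can be sketched with C11 `_Static_assert`. This is an illustration only: the `TOY_*` names are hypothetical, only two flags are shown, and the kernel would use its own build-time assertion macro rather than `_Static_assert`.

```c
#include <assert.h>

/* Mask-style flags, as stored in the inode (hypothetical names). */
#define TOY_SYNC_FL   0x00000008
#define TOY_APPEND_FL 0x00000020

/* Bit numbers used by the atomic bit operations (hypothetical names). */
enum {
    TOY_INODE_SYNC   = 3,
    TOY_INODE_APPEND = 5
};

/* Fails the build if a mask and its bit number ever disagree. */
#define TOY_CHECK_FLAG_VALUE(FLAG) \
    _Static_assert(TOY_##FLAG##_FL == (1 << TOY_INODE_##FLAG), \
                   "inode flag mask and bit number disagree")

TOY_CHECK_FLAG_VALUE(SYNC);
TOY_CHECK_FLAG_VALUE(APPEND);

/* Run-time helper, only so the relationship can also be exercised in tests. */
static unsigned toy_flag_mask(int bit)
{
    assert(bit >= 0 && bit < 32);
    return 1u << bit;
}
```

Changing `TOY_INODE_SYNC` to 4 without updating `TOY_SYNC_FL` would stop the file from compiling, so the mismatch is caught before the code ever runs.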
     
  • Ted has sent out an RFC about removing this feature. Eric and Jan
    confirmed that both RedHat and SUSE enable this feature in all their
    products. David also said that "As far as I know, it's enabled in all
    Android kernels that use ext4." So it seems OK for us.

    What's more, inline data depends on xattr for its implementation,
    and to be frank, I have not run any tests against inline data enabled
    with xattr disabled. So I think we should add inline data and remove
    this config option in the same release.

    [ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which
    isn't much in the grand scheme of things. Since no one seems to be
    testing this configuration except for some automated compile farms, on
    balance we are better off removing this config option, so that it is
    effectively always enabled. -- tytso ]

    Cc: David Brown
    Cc: Eric Sandeen
    Reviewed-by: Jan Kara
    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Zhi Yong Wu
    Reviewed-by: Zheng Liu

    Zhi Yong Wu
     
  • We use kzalloc() to allocate sbi, so there is no need to zero its fields.

    Signed-off-by: Guo Chao
    Signed-off-by: "Theodore Ts'o"

    Guo Chao
     
  • inode_init_always() will initialize inode->i_data.writeback_index
    anyway, no need to do this in ext4_alloc_inode().

    Signed-off-by: Guo Chao
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Lukas Czerner

    Guo Chao
     
  • We have a dedicated interface to sync inode metadata. Use it to
    simplify ext4's code some.

    Signed-off-by: Guo Chao
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Lukas Czerner

    Guo Chao
     
  • Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • If we are punching a hole in a file, we will return ENOTSUPP.
    For fallocate of extents, we will first convert the
    inline data to a normal extent-based file.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • Signed-off-by: Robin Dong
    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • Now that we store data in the inode, in case we need to store some
    xattrs and the inode doesn't have enough space, Andreas suggested that
    we should keep the xattr (metadata) in the inode and push the data out.
    So this patch does the work.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • fiemap is used to find the disk layout of a file. For inline data,
    let us just report it as a file with a single extent.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • In case we rename a directory, ext4_rename has to read the dir block
    and change its dotdot's information. The old ext4_rename encapsulated
    the dir_block read into itself. So this patch adds a new function
    ext4_get_first_dir_block() which gets the dir buffer information so
    the ext4_rename can handle it properly. As it will also change the
    parent inode number, we return the parent_de so that ext4_rename() can
    handle it more easily.

    ext4_find_entry is also changed so that the caller (rename) can tell
    whether the found entry is an inlined one or not and journal the
    corresponding buffer head.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • empty_dir is used when deleting a dir. So it should handle inline dir
    properly.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • Currently ext4_delete_entry() is used only for removing a dir entry
    from a dir block. So let us create a new function
    ext4_generic_delete_entry, which takes an entry_buf and a
    buf_size so that it can be used for inline data.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • Create a new function ext4_find_inline_entry() to handle the case of
    inline data.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • search_dirblock is used to search a dir block, but the code is almost
    the same for searching an inline dir.

    So create a new function search_dir and let search_dirblock call it.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • For "." and "..", we just call filldir by ourselves
    instead of iterating the real dir entry.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • This patch lets add_dir_entry handle the inline data case. The
    dir is initialized as an inline dir first and then we can try to add
    some files to it. When the inline space can't hold all the entries,
    a dir block will be created and the dir entries will be moved to it.

    Also for an inlined dir, "." and ".." are removed and we only use
    4 bytes to store the parent inode number. These 2 entries will be
    added when we convert an inline dir to a block-based one.

    [ Folded in patch from Dan Carpenter to remove an unused variable. ]

    Signed-off-by: Tao Ma
    Signed-off-by: Dan Carpenter
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • The old add_dirent_to_buf handles all the work of adding a dir
    entry to a dir block. Now we have inline data, so create 2 new
    functions, __ext4_find_dest_de and __ext4_insert_dentry,
    that do the real work and let add_dirent_to_buf call them.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     
  • The __ext4_check_dir_entry() function is used to check whether the
    de is over the block boundary. Now with inline data, it could be
    within the block boundary while exceeding the inode size. So change
    this function to check the overflow more precisely.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
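
To make the boundary problem above concrete, here is a hedged sketch of a size-aware entry check. `toy_check_dir_entry()` is a hypothetical simplification: it validates an entry against the size of whatever buffer it lives in (a full block or a smaller inline area) and is not the actual __ext4_check_dir_entry() logic. The 8-byte header and 4-byte rounding follow the ext4 on-disk directory entry layout.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest possible record: 8-byte header plus a 1-byte name, rounded to 4. */
#define TOY_MIN_REC_LEN 12

/*
 * Returns 1 if an entry at `offset` with the given record and name lengths
 * fits inside a buffer of `buf_size` bytes, 0 otherwise.
 */
static int toy_check_dir_entry(size_t offset, size_t rec_len,
                               size_t name_len, size_t buf_size)
{
    assert(buf_size > 0);
    if (rec_len < TOY_MIN_REC_LEN)
        return 0;                       /* too small to be a valid record */
    if (rec_len % 4 != 0)
        return 0;                       /* records are 4-byte aligned */
    if (rec_len < ((name_len + 8 + 3) & ~(size_t)3))
        return 0;                       /* record cannot hold its own name */
    if (offset + rec_len > buf_size)
        return 0;                       /* runs past the end of the buffer */
    return 1;
}
```

An entry at offset 40 with a 24-byte record passes when checked against a 4096-byte block, but fails against a 60-byte inline area, which is the case a block-size-only check would miss.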
     
  • Currently, the initialization of dot and dotdot is encapsulated in
    ext4_mkdir and also bound to dir_block. So create a new function
    named ext4_init_new_dir, and move the initialization to
    ext4_init_dot_dotdot. Now it will be called either in the normal
    non-inline case (rec_len of ".." will cover the whole block) or when we
    are converting an inline dir to a block (rec_len of ".." will be the
    real length). The start of the next entry is also returned for inline
    dir usage.

    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
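
The two rec_len cases described above can be worked through numerically. The sketch below is illustrative only: `toy_init_dot_dotdot()` and its result structure are hypothetical and compute lengths without touching any buffer. The record-length formula (8-byte header plus name, rounded up to a multiple of 4) follows the ext4 on-disk directory entry layout.

```c
#include <assert.h>

/* Record length for a name of name_len bytes: header + name, rounded to 4. */
#define TOY_DIR_REC_LEN(name_len) (((name_len) + 8 + 3) & ~3)

struct toy_dot_layout {
    unsigned dot_rec_len;      /* record length of "."  */
    unsigned dotdot_rec_len;   /* record length of ".." */
    unsigned next_offset;      /* where the next entry would start */
};

/*
 * dotdot_real == 0: normal mkdir case, ".." covers the rest of the block.
 * dotdot_real != 0: inline-to-block conversion, ".." gets its real length
 *                   so that existing entries can follow it.
 */
static struct toy_dot_layout toy_init_dot_dotdot(unsigned blocksize,
                                                 int dotdot_real)
{
    struct toy_dot_layout l;

    assert(blocksize >= 2 * TOY_DIR_REC_LEN(2));
    l.dot_rec_len = TOY_DIR_REC_LEN(1);
    if (dotdot_real)
        l.dotdot_rec_len = TOY_DIR_REC_LEN(2);
    else
        l.dotdot_rec_len = blocksize - l.dot_rec_len;
    l.next_offset = l.dot_rec_len + l.dotdot_rec_len;
    return l;
}
```

For a 4096-byte block, the normal case gives "." 12 bytes and ".." the remaining 4084 bytes; in the conversion case both records are 12 bytes and the next entry starts at offset 24.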