06 Sep, 2016

1 commit


27 Jul, 2016

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "The major change this cycle is deleting ext4's copy of the file system
    encryption code and switching things over to using the copies in
    fs/crypto. I've updated the MAINTAINERS file to add an entry for
    fs/crypto listing Jaegeuk Kim and myself as the maintainers.

    There are also a number of bug fixes, most notably for some problems
    found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also
    fixed is a writeback deadlock detected by generic/130, and some
    potential races in the metadata checksum code"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
    ext4: verify extent header depth
    ext4: short-cut orphan cleanup on error
    ext4: fix reference counting bug on block allocation error
    MAINTAINERS: fs-crypto maintainers update
    ext4 crypto: migrate into vfs's crypto engine
    ext2: fix filesystem deadlock while reading corrupted xattr block
    ext4: fix project quota accounting without quota limits enabled
    ext4: validate s_reserved_gdt_blocks on mount
    ext4: remove unused page_idx
    ext4: don't call ext4_should_journal_data() on the journal inode
    ext4: Fix WARN_ON_ONCE in ext4_commit_super()
    ext4: fix deadlock during page writeback
    ext4: correct error value of function verifying dx checksum
    ext4: avoid modifying checksum fields directly during checksum verification
    ext4: check for extents that wrap around
    jbd2: make journal y2038 safe
    jbd2: track more dependencies on transaction commit
    jbd2: move lockdep tracking to journal_s
    jbd2: move lockdep instrumentation for jbd2 handles
    ext4: respect the nobarrier mount option in nojournal mode
    ...

    Linus Torvalds
     

11 Jul, 2016

1 commit


08 Jun, 2016

1 commit


30 Apr, 2016

2 commits

  • Instead of just printing warning messages, if the orphan list is
    corrupted, declare the file system corrupted. If there are any
    reserved inodes in the orphaned inode list, declare the file system
    corrupted and stop right away to avoid doing more potential damage
    to the file system. (A sketch of the reserved-inode check follows
    this entry.)

    Cc: stable@vger.kernel.org
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
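    A minimal userspace sketch of the idea described above (not the actual
    kernel patch; the constant and function names are illustrative): ext4
    reserves inode numbers below the superblock's s_first_ino (normally 11)
    for internal use, so any such inode on the orphan list means the
    on-disk data is corrupted.

    /* Illustrative sketch, not the kernel patch: refuse to process
     * reserved inodes found on the orphan list. */
    #include <stdio.h>
    #include <stdbool.h>

    #define FIRST_REGULAR_INO 11UL          /* typical s_first_ino value */

    static bool orphan_ino_is_reserved(unsigned long ino)
    {
            return ino < FIRST_REGULAR_INO;
    }

    int main(void)
    {
            unsigned long orphans[] = { 12, 5, 42 };  /* made-up orphan list */

            for (unsigned int i = 0; i < sizeof(orphans) / sizeof(orphans[0]); i++) {
                    if (orphan_ino_is_reserved(orphans[i])) {
                            printf("inode %lu: reserved inode on orphan list, "
                                   "file system is corrupted -- stopping\n",
                                   orphans[i]);
                            return 1;  /* stop right away, as the fix does */
                    }
                    printf("inode %lu: ok to process\n", orphans[i]);
            }
            return 0;
    }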
     
  • If the orphaned inode list contains inode #5, ext4_iget() returns a
    bad inode (since the bootloader inode should never be referenced
    directly). Because of the bad inode, we end up processing the inode
    repeatedly and this hangs the machine.

    This can be reproduced via:

    mke2fs -t ext4 /tmp/foo.img 100
    debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
    mount -o loop /tmp/foo.img /mnt

    (But don't do this on an unpatched kernel if you care about the
    system staying functional. :-)

    This bug was found by the port of American Fuzzy Lop into the kernel
    to find file system problems[1]. (Since it *only* happens if inode #5
    shows up on the orphan list --- 3, 7, 8, etc. won't do it --- it's not
    surprising that AFL needed two hours before it found it.)

    [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf

    Cc: stable@vger.kernel.org
    Reported-by: Vegard Nossum
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

10 Mar, 2016

1 commit


12 Feb, 2016

1 commit

  • When the block group checksum is wrong, we call ext4_error() while
    holding the group spinlock from ext4_init_block_bitmap() or
    ext4_init_inode_bitmap(), which results in scheduling while atomic.
    Fix the issue by calling ext4_error() later, after dropping the
    spinlock. (A sketch of the pattern follows this entry.)

    CC: stable@vger.kernel.org
    Reported-by: Dmitry Vyukov
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Darrick J. Wong

    Jan Kara
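    A minimal userspace sketch of the locking pattern behind the fix (a
    pthread spinlock stands in for the group spinlock, and fprintf stands
    in for ext4_error(), which may sleep): note the problem while the lock
    is held, and report it only after the lock has been dropped.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_spinlock_t group_lock;   /* stand-in for the group spinlock */

    static void report_corruption(int group)
    {
            /* Stand-in for ext4_error(): the real function can sleep, so
             * it must never run while a spinlock is held. */
            fprintf(stderr, "group %d: block group checksum is invalid\n", group);
    }

    static void init_group_bitmap(int group, bool csum_ok)
    {
            bool corrupt = false;

            pthread_spin_lock(&group_lock);
            if (!csum_ok)
                    corrupt = true;            /* only note the problem here */
            /* ... initialize the bitmap while the lock is held ... */
            pthread_spin_unlock(&group_lock);

            if (corrupt)
                    report_corruption(group);  /* safe: the lock is dropped */
    }

    int main(void)
    {
            pthread_spin_init(&group_lock, PTHREAD_PROCESS_PRIVATE);
            init_group_bitmap(0, false);
            pthread_spin_destroy(&group_lock);
            return 0;
    }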
     

09 Jan, 2016

1 commit


18 Oct, 2015

3 commits


24 Jul, 2015

1 commit


01 Jun, 2015

1 commit

  • Factor out calls to ext4_inherit_context() and move them to
    __ext4_new_inode(); this fixes a problem where ext4_tmpfile() wasn't
    calling ext4_inherit_context(), so the temporary file wasn't
    getting protected. Since the blocks for the tmpfile could end up on
    disk, they really should be protected if the tmpfile is created within
    the context of an encrypted directory.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

19 May, 2015

1 commit

  • The superblock fields s_file_encryption_mode and s_dir_encryption_mode
    are vestigial, so remove them as a cleanup. While we're at it, allow
    file systems with both encryption and inline_data enabled at the same
    time to work correctly. We can't have encrypted inodes with inline
    data, but there's no reason to prohibit unencrypted inodes from using
    the inline data feature.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

16 Apr, 2015

2 commits


12 Apr, 2015

2 commits


03 Apr, 2015

1 commit


30 Oct, 2014

1 commit

  • When we fail to load the block bitmap in __ext4_new_inode(), we will
    dereference a NULL pointer in ext4_journal_get_write_access(). So
    check for an error from ext4_read_block_bitmap().

    Coverity-id: 989065
    Cc: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

13 Oct, 2014

1 commit

  • Besides the fact that this replacement improves code readability,
    it also protects against errors caused by direct EXT4_SB(sb)->s_es
    manipulation, which may result in an attempt to use uninitialized
    checksum machinery.

    #Testcase_BEGIN
    IMG=/dev/ram0
    MNT=/mnt
    mkfs.ext4 $IMG
    mount $IMG $MNT
    #Enable feature directly on disk, on mounted fs
    tune2fs -O metadata_csum $IMG
    # Provoke a metadata update, which will likely result in an OOPS
    touch $MNT/test
    umount $MNT
    #Testcase_END

    # Replacement script
    @@
    expression E;
    @@
    - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
    + ext4_has_metadata_csum(E)

    https://bugzilla.kernel.org/show_bug.cgi?id=82201

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Dmitry Monakhov
     

13 Jul, 2014

1 commit


06 Jul, 2014

1 commit

  • The first time that we allocate from an uninitialized inode allocation
    bitmap, if the block allocation bitmap is also uninitialized, we need
    to get write access to the block group descriptor before we start
    modifying the block group descriptor flags and updating the free block
    count, etc. Otherwise, there is the potential of a bad journal
    checksum (if journal checksums are enabled), and of the file system
    becoming inconsistent if we crash at exactly the wrong time.

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

26 Jun, 2014

1 commit

  • We should decrement the free clusters counter when the block bitmap
    is marked as corrupt, and the free inodes counter when the inode
    allocation bitmap is marked as corrupt, so that the available size
    reported by statfs is not misleading. The user then gets an ENOSPC
    error immediately at write_begin time, instead of only once
    writepages is reached. (A toy illustration follows this entry.)

    Cc: Darrick J. Wong
    Reported-by: Amit Sahrawat
    Signed-off-by: Namjae Jeon
    Signed-off-by: Ashish Sangwan

    Namjae Jeon
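    A toy illustration of the accounting above (standalone C with made-up
    numbers, not kernel code): once a corrupt group's free clusters are
    subtracted from the global counter, a statfs-style query stops
    advertising space that the allocator will refuse to hand out.

    #include <stdio.h>

    int main(void)
    {
            unsigned long long free_clusters = 1000000; /* global counter (made up)   */
            unsigned long long group_free    =  250000; /* free clusters in bad group */
            unsigned long long want          =  900000; /* clusters a writer asks for */

            printf("before fix: %llu free, write of %llu appears to fit: %s\n",
                   free_clusters, want, want <= free_clusters ? "yes" : "no");

            /* The fix: when the group's bitmap is marked corrupt, subtract
             * its free clusters so the global counter matches reality. */
            free_clusters -= group_free;

            printf("after fix:  %llu free, write of %llu appears to fit: %s\n",
                   free_clusters, want, want <= free_clusters ? "yes" : "no");
            /* The writer now sees ENOSPC up front instead of failing later
             * in writeback. */
            return 0;
    }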
     

08 Nov, 2013

1 commit


29 Aug, 2013

2 commits

  • If the group descriptor fails validation, mark the whole blockgroup
    corrupt so that the inode/block allocators skip this group. The
    previous approach takes the risk of writing to a damaged group
    descriptor; hopefully it was never the case that the [ib]bitmap fields
    pointed to another valid block and got dirtied, since the memset would
    fill the page with 1s.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"

    Darrick J. Wong
     
  • If we detect either a discrepancy between the inode bitmap and the
    inode counts or the inode bitmap fails to pass validation checks, mark
    the block group corrupt and refuse to allocate or deallocate inodes
    from the group.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"

    Darrick J. Wong
     

17 Aug, 2013

1 commit

  • In no journal mode, if an inode has recently been deleted, we
    shouldn't reuse it right away. Otherwise it's possible, after an
    unclean shutdown, to hit a situation where a recently deleted inode
    gets reused for some other purpose before the inode table block has
    been written to disk. However, if the directory entry has been
    updated, then the directory entry will be pointing at the old inode
    contents.

    E2fsck will make sure the file system is consistent after the
    unclean shutdown. However, if the recently deleted inode is a
    character mode device, or an inode with the immutable bit set, even
    after the file system has been fixed up by e2fsck, it can be
    possible for a *.pyc file to be pointing at a character mode
    device, and when python tries to open the *.pyc file, Hilarity
    Ensues. We could change all of userspace to be very suspicious
    about stat'ing files before opening them, and clearing the
    immutable flag if necessary --- or we can just avoid reusing an
    inode number if it has been recently deleted.

    Google-Bug-Id: 10017573

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

27 Jul, 2013

1 commit

  • When we try to allocate an inode, and there is a race between two
    CPU's trying to grab the same inode, _and_ this inode is the last free
    inode in the block group, make sure the group number is bumped before
    we continue searching the rest of the block groups. Otherwise, we end
    up searching the current block group twice, and we end up skipping
    searching the last block group. So in the unlikely situation where
    almost all of the inodes are allocated, it's possible that we will
    return ENOSPC even though there might be free inodes in that last
    block group.

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

05 Jun, 2013

1 commit


21 Apr, 2013

1 commit


20 Apr, 2013

1 commit

  • The inode allocation transaction is pretty heavy (246 credits with
    quotas and extents before the previous patch, still around 200 after
    it). This is mostly due to the credits required for allocation of
    quota structures (the credits there are heavily overestimated, but
    it's difficult to make better estimates if we don't want to wire
    non-trivial assumptions about the quota format into the filesystem).

    So move quota initialization out of the allocation transaction. That
    way the transaction for quota structure allocation will be started
    only if we need to look up the quota structure on disk (rare), and
    furthermore it will be started for each quota type separately, not
    for all of them at once. This reduces the maximum transaction size
    to 34 in most cases and to 73 in the worst case.

    [ Modified by tytso to clean up the cleanup paths for error handling.
    Also use a separate call to ext4_std_error() for each failure so it
    is easier for someone who is debugging a problem in this function to
    determine which function call failed. ]

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

10 Apr, 2013

1 commit


12 Mar, 2013

1 commit

  • A user with an 8TB+ file system and a very large flexbg size
    (> 65536) could cause the atomic_t used in struct flex_groups to
    overflow. This was detected by the PaX security patchset:

    http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551

    This bug was introduced in commit 9f24e4208f7e, so it's been around
    since 2.6.30. :-(

    Fix this by using an atomic64_t for struct orlov_stats's
    free_clusters. (A worked overflow calculation follows this entry.)

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Lukas Czerner
    Cc: stable@vger.kernel.org

    Theodore Ts'o
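    A worked example of the overflow (standalone C, not kernel code; it
    assumes 4 KiB clusters and, hypothetically, all of the file system's
    free clusters being counted in a single flex group): an 8 TiB file
    system has 2^31 clusters, one more than a signed 32-bit counter can
    represent.

    #include <stdio.h>
    #include <stdint.h>
    #include <limits.h>

    int main(void)
    {
            uint64_t fs_bytes     = 8ULL << 40;  /* 8 TiB file system      */
            uint64_t cluster_size = 4096;        /* assumed 4 KiB clusters */
            uint64_t clusters     = fs_bytes / cluster_size;

            printf("clusters            = %llu\n", (unsigned long long)clusters);
            printf("INT_MAX (atomic_t)  = %d\n", INT_MAX);
            printf("fits in 32 bits?      %s\n",
                   clusters <= (uint64_t)INT_MAX ? "yes" : "no");
            printf("fits in 64 bits?      %s\n",
                   clusters <= (uint64_t)INT64_MAX ? "yes" : "no");
            return 0;
    }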
     

15 Feb, 2013

1 commit


10 Feb, 2013

1 commit

  • In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
    until the inode has been successfully allocated. In order to do this,
    we need to start the handle in ext4_new_inode(). So create a new
    variant of this function, ext4_new_inode_start_handle(), so the handle
    can be created at the last possible minute, before we need to modify
    the inode allocation bitmap block.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

09 Feb, 2013

1 commit

  • So that we can better understand which parts of ext4 are responsible
    for long-running jbd2 handles, use jbd2__journal_start() to pass
    context information for logging purposes.

    The recommended way for finding the longer-running handles is:

    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable

    ./run-my-fs-benchmark

    cat $T/trace > /tmp/problem-handles

    This will list handles that were active for longer than 20ms (the
    interval filter is in jiffies; see the conversion sketch after this
    entry). Having
    longer-running handles is bad, because a commit started at the wrong
    time could stall for those 20+ milliseconds, which could delay an
    fsync() or an O_SYNC operation. Here is an example line from the
    trace file describing a handle which lived on for 311 jiffies, or over
    1.2 seconds:

    postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
    tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
    dirtied_blocks 0

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
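    A quick conversion of the jiffies values quoted above (standalone C,
    not kernel code; HZ=250 is an assumption, but it is the value that
    makes the numbers in the message line up):

    #include <stdio.h>

    int main(void)
    {
            const int hz = 250;             /* assumed CONFIG_HZ               */
            const int filter_jiffies = 5;   /* the "interval > 5" trace filter */
            const int handle_jiffies = 311; /* interval from the example line  */

            printf("filter threshold: %d jiffies = %d ms\n",
                   filter_jiffies, filter_jiffies * 1000 / hz);
            printf("example handle:   %d jiffies = %.3f s\n",
                   handle_jiffies, (double)handle_jiffies / hz);
            return 0;
    }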
     

11 Dec, 2012

1 commit