08 Oct, 2016

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Lots of bug fixes and cleanups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
    ext4: remove unused variable
    ext4: use journal inode to determine journal overhead
    ext4: create function to read journal inode
    ext4: unmap metadata when zeroing blocks
    ext4: remove plugging from ext4_file_write_iter()
    ext4: allow unlocked direct IO when pages are cached
    ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY
    fscrypto: use standard macros to compute length of fname ciphertext
    ext4: do not unnecessarily null-terminate encrypted symlink data
    ext4: release bh in make_indexed_dir
    ext4: Allow parallel DIO reads
    ext4: allow DAX writeback for hole punch
    jbd2: fix lockdep annotation in add_transaction_credits()
    blockgroup_lock.h: simplify definition of NR_BG_LOCKS
    blockgroup_lock.h: remove debris from bgl_lock_ptr() conversion
    fscrypto: make filename crypto functions return 0 on success
    fscrypto: rename completion callbacks to reflect usage
    fscrypto: remove unnecessary includes
    fscrypto: improved validation when loading inode encryption metadata
    ext4: fix memory leak when symlink decryption fails
    ...

    Linus Torvalds
     

30 Sep, 2016

1 commit

  • ...otherwise a user can enable encryption for certain files even
    when the filesystem is unable to support it.
    Such a case would be a filesystem created by mkfs.ext4's default
    settings, 1KiB block size. Ext4 supports encryption only when block size
    is equal to PAGE_SIZE.
    But this constraint is only checked when the encryption feature flag
    is set.

    Signed-off-by: Richard Weinberger
    Signed-off-by: Theodore Ts'o

    Richard Weinberger
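
The gap described in this commit message can be modelled in userspace. The sketch below is not the kernel code: `struct fake_sb`, `FEATURE_ENCRYPT`, `mount_allowed()` and `set_encryption_policy()` are hypothetical names invented for illustration, and the use of -EOPNOTSUPP as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the encryption feature bit; not a kernel macro. */
#define FEATURE_ENCRYPT 0x1u

/* Hypothetical model of the superblock state relevant to the check. */
struct fake_sb {
    unsigned int features;  /* feature flags recorded in the superblock */
    size_t block_size;      /* filesystem block size in bytes */
    size_t page_size;       /* PAGE_SIZE of the running system */
};

/*
 * Mount-time rule from the commit message: the block size constraint is
 * checked only when the encryption feature flag is set.
 */
bool mount_allowed(const struct fake_sb *sb)
{
    assert(sb != NULL);
    if (sb->features & FEATURE_ENCRYPT)
        return sb->block_size == sb->page_size;
    return true;
}

/*
 * Ioctl-time rule: refuse to set a policy unless the feature flag is
 * present, so the mount-time check cannot be bypassed.
 */
int set_encryption_policy(const struct fake_sb *sb)
{
    assert(sb != NULL);
    if (!(sb->features & FEATURE_ENCRYPT))
        return -EOPNOTSUPP;  /* assumed error code */
    return 0;                /* the policy would be written here */
}
```

With a 1KiB block size and no feature flag, `mount_allowed()` succeeds while `set_encryption_policy()` is rejected, which is the behaviour the commit asks for.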
     

15 Sep, 2016

1 commit


10 Sep, 2016

1 commit

  • Since setting an encryption policy requires writing metadata to the
    filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
    Otherwise, a user could cause a write to a frozen or readonly
    filesystem. This was handled correctly by f2fs but not by ext4. Make
    fscrypt_process_policy() handle it rather than relying on the filesystem
    to get it right.

    Signed-off-by: Eric Biggers
    Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
    Signed-off-by: Theodore Ts'o
    Acked-by: Jaegeuk Kim

    Eric Biggers
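
The guard pattern named in this commit message can be sketched as a small userspace model. Everything here is hypothetical (`struct fake_mount`, `fake_want_write()`, `fake_drop_write()`, `process_policy()`); it only models the read-only case, whereas the real mnt_want_write() also deals with frozen filesystems.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a mount; the kernel's structures differ. */
struct fake_mount {
    bool read_only;       /* mounted read-only */
    int writers;          /* write sections currently open */
    int metadata_writes;  /* counts policy writes that reached the "disk" */
};

/* Models mnt_want_write(): open a write section or fail on a read-only mount. */
int fake_want_write(struct fake_mount *m)
{
    assert(m != NULL);
    if (m->read_only)
        return -EROFS;
    m->writers++;
    return 0;
}

/* Models mnt_drop_write(): close the write section opened above. */
void fake_drop_write(struct fake_mount *m)
{
    assert(m != NULL && m->writers > 0);
    m->writers--;
}

/*
 * Models the shared policy helper after the fix: it takes write access
 * itself instead of trusting each filesystem's ioctl handler to do so.
 */
int process_policy(struct fake_mount *m)
{
    int ret = fake_want_write(m);

    if (ret)
        return ret;          /* nothing is written to a read-only mount */
    m->metadata_writes++;    /* stands in for storing the policy */
    fake_drop_write(m);
    return 0;
}
```

Placing the guard in the shared helper means a caller cannot forget it, which is the design choice the commit message argues for.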
     

06 Sep, 2016

1 commit


11 Jul, 2016

1 commit


06 Jul, 2016

1 commit


25 May, 2016

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Fix a number of bugs, most notably a potential stale data exposure
    after a crash and a potential BUG_ON crash if a file has the data
    journalling flag enabled while it has dirty delayed allocation blocks
    that haven't been written yet. Also fix a potential crash in the new
    project quota code and a maliciously corrupted file system.

    In addition, fix some DAX-specific bugs, including when there is a
    transient ENOSPC situation and races between writes via direct I/O and
    an mmap'ed segment that could lead to lost I/O.

    Finally the usual set of miscellaneous cleanups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
    ext4: pre-zero allocated blocks for DAX IO
    ext4: refactor direct IO code
    ext4: fix race in transient ENOSPC detection
    ext4: handle transient ENOSPC properly for DAX
    dax: call get_blocks() with create == 1 for write faults to unwritten extents
    ext4: remove unmeetable inconsisteny check from ext4_find_extent()
    jbd2: remove excess descriptions for handle_s
    ext4: remove unnecessary bio get/put
    ext4: silence UBSAN in ext4_mb_init()
    ext4: address UBSAN warning in mb_find_order_for_block()
    ext4: fix oops on corrupted filesystem
    ext4: fix check of dqget() return value in ext4_ioctl_setproject()
    ext4: clean up error handling when orphan list is corrupted
    ext4: fix hang when processing corrupted orphaned inode list
    ext4: remove trailing \n from ext4_warning/ext4_error calls
    ext4: fix races between changing inode journal mode and ext4_writepages
    ext4: handle unwritten or delalloc buffers before enabling data journaling
    ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
    ext4: do not ask jbd2 to write data for delalloc buffers
    jbd2: add support for avoiding data writes during transaction commits
    ...

    Linus Torvalds
     

21 May, 2016

1 commit

  • Let's gather the UUID related functions under one hood.

    Signed-off-by: Andy Shevchenko
    Reviewed-by: Matt Fleming
    Cc: Dmitry Kasatkin
    Cc: Mimi Zohar
    Cc: Rasmus Villemoes
    Cc: Arnd Bergmann
    Cc: "Theodore Ts'o"
    Cc: Al Viro
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

05 May, 2016

1 commit


28 Feb, 2016

1 commit

  • Online defrag operations for ext4 are hard coded to use the page cache.
    See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()

    When combined with DAX I/O, which circumvents the page cache, this can
    result in data corruption. This was observed with xfstests ext4/307 and
    ext4/308.

    Fix this by only allowing online defrag for non-DAX files.

    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Theodore Ts'o
    Cc: Al Viro
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

12 Feb, 2016

1 commit

  • The ext4_ioctl_setflags() function which is used in the ioctls
    EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR may return the positive value
    EPERM instead of -EPERM in case of error. This bug was introduced by a
    recent commit 9b7365fc.

    The following program can be used to illustrate the wrong behavior:

    #include <err.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    #define FS_IOC_GETFLAGS _IOR('f', 1, long)
    #define FS_IOC_SETFLAGS _IOW('f', 2, long)
    #define FS_IMMUTABLE_FL 0x00000010

    int main(void)
    {
        int fd;
        long flags;

        fd = open("file", O_RDWR|O_CREAT, 0600);
        if (fd < 0)
            err(1, "open");

        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
            err(1, "ioctl: FS_IOC_GETFLAGS");

        flags |= FS_IMMUTABLE_FL;

        /* Fails with EPERM for an unprivileged user on a fixed kernel. */
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
            err(1, "ioctl: FS_IOC_SETFLAGS");

        warnx("ioctl returned no error");

        return 0;
    }

    Running it gives the following result:

    $ strace -e ioctl ./test
    ioctl(3, FS_IOC_GETFLAGS, 0x7ffdbd8bfd38) = 0
    ioctl(3, FS_IOC_SETFLAGS, 0x7ffdbd8bfd38) = 1
    test: ioctl returned no error
    +++ exited with 0 +++

    Running the program on a kernel with the bug fixed gives the proper result:

    $ strace -e ioctl ./test
    ioctl(3, FS_IOC_GETFLAGS, 0x7ffdd2768258) = 0
    ioctl(3, FS_IOC_SETFLAGS, 0x7ffdd2768258) = -1 EPERM (Operation not permitted)
    test: ioctl: FS_IOC_SETFLAGS: Operation not permitted
    +++ exited with 1 +++

    Signed-off-by: Anton Protopopov
    Signed-off-by: Theodore Ts'o

    Anton Protopopov
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

09 Jan, 2016

1 commit


18 Oct, 2015

1 commit


10 Jul, 2015

1 commit

  • The FITRIM ioctl has the same arguments on 32-bit and 64-bit
    architectures, so we can add it to the list of compatible ioctls and
    drop it from compat_ioctl method of various filesystems.

    Signed-off-by: Mikulas Patocka
    Cc: Al Viro
    Cc: Ted Ts'o
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
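
A short userspace sketch of why no compat translation is needed: `struct fstrim_range` consists only of fixed-width 64-bit fields, so its layout does not depend on the word size. The helper name `trim_all()` is hypothetical; `FITRIM` and `struct fstrim_range` come from `<linux/fs.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>  /* FITRIM, struct fstrim_range */

/* Three 64-bit fields (start, len, minlen): same size on 32-bit and 64-bit. */
_Static_assert(sizeof(struct fstrim_range) == 3 * sizeof(uint64_t),
               "fstrim_range layout is independent of the word size");

/*
 * Ask the filesystem behind fd to discard every free range of at least
 * minlen bytes. Returns the result of ioctl(): 0 on success, -1 with
 * errno set on failure (for example when the filesystem lacks support).
 */
int trim_all(int fd, uint64_t minlen)
{
    struct fstrim_range range = {
        .start = 0,
        .len = UINT64_MAX,  /* cover the whole filesystem */
        .minlen = minlen,
    };

    assert(fd >= -1);
    return ioctl(fd, FITRIM, &range);
}
```

A caller would typically open the mount point directory and pass that descriptor; the call usually requires elevated privileges.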
     

13 Jun, 2015

1 commit


09 Jun, 2015

1 commit


11 Apr, 2015

1 commit


03 Apr, 2015

1 commit


26 Nov, 2014

2 commits

  • Currently callers adding extents to extent status tree were responsible
    for adding the inode to the list of inodes with freeable extents. This
    is error prone and puts list handling in unnecessarily many places.

    Just add inode to the list automatically when the first non-delay extent
    is added to the tree and remove inode from the list when the last
    non-delay extent is removed.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
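
The bookkeeping rule in this commit message (join the list on the first non-delay extent, leave it when the last one goes) can be sketched as follows. This is a userspace model with hypothetical names (`struct fake_inode`, `es_insert()`, `es_remove()`), not the extent status tree code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode state; field names are illustrative only. */
struct fake_inode {
    unsigned int reclaimable;  /* cached non-delay extents */
    bool on_list;              /* is the inode on the shrinker's list? */
};

/* Add one extent to the inode's tree; list membership follows automatically. */
void es_insert(struct fake_inode *inode, bool delayed)
{
    assert(inode != NULL);
    if (delayed)
        return;                 /* delay extents are not counted */
    if (inode->reclaimable++ == 0)
        inode->on_list = true;  /* first non-delay extent: join the list */
}

/* Remove one extent; the inode leaves the list with its last non-delay extent. */
void es_remove(struct fake_inode *inode, bool delayed)
{
    assert(inode != NULL);
    if (delayed)
        return;
    assert(inode->reclaimable > 0);
    if (--inode->reclaimable == 0)
        inode->on_list = false;
}
```

Because the transition is handled in one place, callers no longer need to touch the list at all.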
     
  • In this commit we discard the lru algorithm for inodes with extent
    status tree because it takes significant effort to maintain a lru list
    in extent status tree shrinker and the shrinker can take a long time to
    scan this lru list in order to reclaim some objects.

    We replace the lru ordering with a simple round-robin. After that we
    never need to keep a lru list. That means that the list needn't be
    sorted if the shrinker can not reclaim any objects in the first round.

    Cc: Andreas Dilger
    Signed-off-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Zheng Liu
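
A minimal sketch of round-robin reclaim, assuming a fixed ring of inodes and a cursor that remembers where the previous scan stopped. The names (`struct fake_shrinker`, `shrink()`, `NR_INODES`) are hypothetical, and the kernel works on a linked list rather than an array.

```c
#include <assert.h>
#include <stddef.h>

#define NR_INODES 4  /* hypothetical fixed ring size for the sketch */

struct fake_shrinker {
    unsigned int cached[NR_INODES];  /* reclaimable extents per inode */
    size_t next;                     /* where the next scan starts */
};

/*
 * Reclaim up to nr_to_scan extents, visiting inodes in round-robin order
 * from the saved cursor. No ordering by recency is kept, so nothing has
 * to be sorted. Returns the number of extents reclaimed.
 */
unsigned int shrink(struct fake_shrinker *s, unsigned int nr_to_scan)
{
    unsigned int freed = 0;
    size_t drained = 0;

    assert(s != NULL);
    while (freed < nr_to_scan && drained < NR_INODES) {
        unsigned int *count = &s->cached[s->next];
        unsigned int want = nr_to_scan - freed;
        unsigned int take = *count < want ? *count : want;

        *count -= take;
        freed += take;
        if (*count == 0) {
            /* This inode is empty: advance the cursor to the next one. */
            s->next = (s->next + 1) % NR_INODES;
            drained++;
        }
    }
    return freed;
}
```

Each call resumes at the cursor, so every inode gets its turn without any list maintenance between calls.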
     

13 Oct, 2014

1 commit

  • Besides the fact that this replacement improves code readability,
    it also protects from errors caused by direct EXT4_SB(sb)->s_es
    manipulation, which may result in an attempt to use uninitialized
    csum machinery.

    #Testcase_BEGIN
    IMG=/dev/ram0
    MNT=/mnt
    mkfs.ext4 $IMG
    mount $IMG $MNT
    #Enable feature directly on disk, on mounted fs
    tune2fs -O metadata_csum $IMG
    # Provoke metadata update, likely resulting in an OOPS
    touch $MNT/test
    umount $MNT
    #Testcase_END

    # Replacement script
    @@
    expression E;
    @@
    - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
    + ext4_has_metadata_csum(E)

    https://bugzilla.kernel.org/show_bug.cgi?id=82201

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Dmitry Monakhov
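
The difference between testing the on-disk flag and testing the in-memory state can be modelled as below. This is a sketch under assumptions: the commit message does not show the body of ext4_has_metadata_csum(), so the idea that it tests whether the checksum driver was set up at mount time is an assumption, and `struct fake_sb_info`, `has_csum_flag()` and `has_metadata_csum()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative feature bit; stands in for the metadata_csum flag. */
#define RO_COMPAT_METADATA_CSUM 0x0400u

/* Hypothetical model of the in-memory superblock info. */
struct fake_sb_info {
    unsigned int ro_compat;     /* mirrors on-disk flags; may change while mounted */
    const void *chksum_driver;  /* set at mount only if checksums were enabled then */
};

/* Old-style test: trusts the flag, even if it was flipped after mount. */
bool has_csum_flag(const struct fake_sb_info *sbi)
{
    assert(sbi != NULL);
    return (sbi->ro_compat & RO_COMPAT_METADATA_CSUM) != 0;
}

/* New-style test (assumed): only true when the machinery is initialized. */
bool has_metadata_csum(const struct fake_sb_info *sbi)
{
    assert(sbi != NULL);
    return sbi->chksum_driver != NULL;
}
```

In the test case above, running tune2fs on a mounted filesystem corresponds to setting `ro_compat` while `chksum_driver` is still NULL: the old test says yes, the new one says no.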
     

04 Oct, 2014

1 commit

  • Otherwise this provokes a complaint like the following:
    WARNING: CPU: 12 PID: 5795 at fs/ext4/ext4_jbd2.c:48 ext4_journal_check_start+0x4e/0xa0()
    Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
    CPU: 12 PID: 5795 Comm: python Not tainted 3.17.0-rc2-00175-gae5344f #158
    Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
    0000000000000030 ffff8808116cfd28 ffffffff815c7dfc 0000000000000030
    0000000000000000 ffff8808116cfd68 ffffffff8106ce8c ffff8808116cfdc8
    ffff880813b16000 ffff880806ad6ae8 ffffffff81202008 0000000000000000
    Call Trace:
    dump_stack+0x51/0x6d
    warn_slowpath_common+0x8c/0xc0
    ? ext4_ioctl+0x9e8/0xeb0
    warn_slowpath_null+0x1a/0x20
    ext4_journal_check_start+0x4e/0xa0
    __ext4_journal_start_sb+0x90/0x110
    ext4_ioctl+0x9e8/0xeb0
    ? ptrace_stop+0x24d/0x2f0
    ? alloc_pid+0x480/0x480
    ? ptrace_do_notify+0x92/0xb0
    do_vfs_ioctl+0x4e5/0x550
    ? _raw_spin_unlock_irq+0x2b/0x40
    SyS_ioctl+0x53/0x80
    tracesys+0xd0/0xd5

    Reviewed-by: Jan Kara
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Dmitry Monakhov
     

18 Feb, 2014

1 commit


13 Feb, 2014

1 commit

  • In swap_inode_boot_loader() we forgot to release ->i_mutex and resume
    unlocked dio for inode and inode_bl if there is an error starting the
    journal handle. This commit fixes this issue.

    Reported-by: Ahmed Tamrawi
    Cc: Andreas Dilger
    Cc: Dr. Tilmann Bubeck
    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org # v3.10+

    Zheng Liu
     

12 Jan, 2014

1 commit


09 Nov, 2013

1 commit


29 Aug, 2013

1 commit

  • After commit 4a092d73 was applied, the number of source files that
    need to #include ext4_extents.h was reduced. But we can do better.

    This commit defines ext4_zeroout_es() in extents.c and moves
    EXT_MAX_BLOCKS into ext4.h so that indirect.c and ioctl.c no longer
    need to include ext4_extents.h. Meanwhile, extent_status.c only needs
    to include this file when ES_AGGRESSIVE_TEST is defined. In addition,
    this commit removes a duplicated declaration in trace/events/ext4.h.

    After this patch is applied, only {super,migrate,move_extents,extents}.c
    need to include ext4_extents.h, which makes it easier for us to
    define a new extent disk layout.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"

    Zheng Liu
     

17 Aug, 2013

1 commit

  • Add a new fiemap flag which forces all of the extents in an inode
    to be cached in the extent_status tree. This is critically important
    when using AIO to a preallocated file, since if we need to read in
    blocks from the extent tree, the io_submit(2) system call becomes
    synchronous, and the AIO is no longer "A", which is bad.

    In addition, for most files which have an external leaf tree block,
    the cost of caching the information in the extent status tree will be
    less than caching the entire 4k block in the buffer cache. So it is
    generally a win to keep the extent information cached.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
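
From userspace, a caller might request the pre-caching described above before submitting AIO. The sketch below assumes the flag is the one exported as `FIEMAP_FLAG_CACHE` in `<linux/fiemap.h>`; the helper names `init_precache_request()` and `precache_extents()` are hypothetical, and filesystems that do not implement the flag may reject the request.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_CACHE */

/* Build a request that covers the whole file and asks only for caching. */
void init_precache_request(struct fiemap *fm)
{
    assert(fm != NULL);
    memset(fm, 0, sizeof(*fm));
    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;  /* up to the end of the file */
    fm->fm_flags = FIEMAP_FLAG_CACHE;   /* request caching of the extents */
    fm->fm_extent_count = 0;            /* no extent records wanted back */
}

/*
 * Issue the request on an open file. Returns the result of ioctl():
 * 0 on success, -1 with errno set on failure.
 */
int precache_extents(int fd)
{
    struct fiemap fm;

    init_precache_request(&fm);
    return ioctl(fd, FS_IOC_FIEMAP, &fm);
}
```

Calling `precache_extents(fd)` once, before the first io_submit(2), is the intended usage in this sketch, so that later submissions do not have to read extent tree blocks synchronously.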
     

12 Aug, 2013

1 commit


10 Apr, 2013

1 commit


09 Apr, 2013

1 commit

  • Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
    associated attributes (like i_blocks, i_size, i_flags, ...) from the
    specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
    typically used to store a boot loader in a secure part of the
    filesystem, where it can't be changed by a normal user by accident.
    The data blocks of the previous boot loader will be associated with
    the given inode.

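    This userspace program is a simple example of the usage:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Defined in fs/ext4/ext4.h; repeated here for userspace builds. */
    #ifndef EXT4_IOC_SWAP_BOOT
    #define EXT4_IOC_SWAP_BOOT _IO('f', 17)
    #endif

    int main(int argc, char *argv[])
    {
        int fd;
        int err;

        if (argc != 2) {
            printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
            exit(1);
        }

        fd = open(argv[1], O_WRONLY);
        if (fd < 0) {
            perror("open");
            exit(1);
        }

        /* Swap the file's blocks with those of the boot loader inode. */
        err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
        if (err < 0) {
            perror("ioctl");
            exit(1);
        }

        close(fd);
        exit(0);
    }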

    [ Modified by Theodore Ts'o to fix a number of bugs in the original code.]

    Signed-off-by: Dr. Tilmann Bubeck
    Signed-off-by: "Theodore Ts'o"

    Dr. Tilmann Bubeck
     

04 Apr, 2013

1 commit

  • In order to make it simpler to test the code which supports
    i_blocks/indirect-mapped inodes, support the conversion of inodes
    which are less than 12 blocks and which are contained in no more than
    a single extent.

    The primary intended use of this code is converting freshly created
    zero-length files and empty directories.

    Note that the version of chattr in e2fsprogs 1.42.7 and earlier has a
    check that prevents the clearing of the extent flag. A simple patch
    which allows "chattr -e " to work will be checked into the
    e2fsprogs git repository.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


09 Feb, 2013

1 commit

  • So we can better understand what bits of ext4 are responsible for
    long-running jbd2 handles, use jbd2__journal_start() so we can pass
    context information for logging purposes.

    The recommended way for finding the longer-running handles is:

    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable

    ./run-my-fs-benchmark

    cat $T/trace > /tmp/problem-handles

    This will list handles that were active for longer than 20ms. Having
    longer-running handles is bad, because a commit started at the wrong
    time could stall for those 20+ milliseconds, which could delay an
    fsync() or an O_SYNC operation. Here is an example line from the
    trace file describing a handle which lived on for 311 jiffies, or over
    1.2 seconds:

    postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
    tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
    dirtied_blocks 0

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
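
The figures quoted in this commit message (a filter of 5 matching "20ms", and 311 jiffies matching "over 1.2 seconds") are consistent with a tick rate of HZ=250, that is 4 ms per jiffy. The message does not state HZ, so that value is an inference. The helper below is a hypothetical userspace function, not the kernel's own conversion routine.

```c
#include <assert.h>

/*
 * Convert a jiffies count to whole milliseconds for a given tick rate.
 * hz is the number of jiffies per second (assumed to be 250 in the
 * example from the commit message).
 */
unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    return jiffies * 1000UL / hz;
}
```

Under that assumption, `jiffies_to_ms(5, 250)` gives 20 and `jiffies_to_ms(311, 250)` gives 1244, matching the thresholds in the text. On a kernel built with a different HZ the filter value would need to be scaled accordingly.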
     

13 Jan, 2013

1 commit


08 Oct, 2012

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "The big new feature added this time is supporting online resizing
    using the meta_bg feature. This allows us to resize file systems
    which are greater than 16TB. In addition, the speed of online
    resizing has been improved in general.

    We also fix a number of races, some of which could lead to deadlocks,
    in ext4's Asynchronous I/O and online defrag support, thanks to good
    work by Dmitry Monakhov.

    There are also a large number of more minor bug fixes and cleanups
    from a number of other ext4 contributors, quite a few of whom have
    submitted fixes for the first time."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
    ext4: fix ext4_flush_completed_IO wait semantics
    ext4: fix mtime update in nodelalloc mode
    ext4: fix ext_remove_space for punch_hole case
    ext4: punch_hole should wait for DIO writers
    ext4: serialize truncate with owerwrite DIO workers
    ext4: endless truncate due to nonlocked dio readers
    ext4: serialize unlocked dio reads with truncate
    ext4: serialize dio nonlocked reads with defrag workers
    ext4: completed_io locking cleanup
    ext4: fix unwritten counter leakage
    ext4: give i_aiodio_unwritten a more appropriate name
    ext4: ext4_inode_info diet
    ext4: convert to use leXX_add_cpu()
    ext4: ext4_bread usage audit
    fs: reserve fallocate flag codepoint
    ext4: remove redundant offset check in mext_check_arguments()
    ext4: don't clear orphan list on ro mount with errors
    jbd2: fix assertion failure in commit code due to lacking transaction credits
    ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
    ext4: enable FITRIM ioctl on bigalloc file system
    ...

    Linus Torvalds
     

27 Sep, 2012

1 commit