13 Dec, 2016

1 commit


12 Dec, 2016

1 commit

  • Multiple bugs were recently fixed in the "set encryption policy" ioctl.
    To make it clear that fscrypt_process_policy() and fscrypt_get_policy()
    implement ioctls and therefore their implementations must take standard
    security and correctness precautions, rename them to
    fscrypt_ioctl_set_policy() and fscrypt_ioctl_get_policy(). Make the
    latter take in a struct file * to make it consistent with the former.

    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     

02 Dec, 2016

1 commit

  • ext4_sb_has_crypto() just called through to ext4_has_feature_encrypt(),
    and all callers except one were already using the latter. So remove it
    and switch its one caller to ext4_has_feature_encrypt().

    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     

30 Nov, 2016

2 commits

  • Currently we just silently ignore flags that we don't understand (or
    that cannot be manipulated) through EXT4_IOC_SETFLAGS and
    EXT4_IOC_FSSETXATTR ioctls. This makes it problematic for the unused
    flags to be used in future (some app may be inadvertedly setting them
    and we won't notice until the flag gets used). Also this is inconsistent
    with other filesystems like XFS or BTRFS which return EOPNOTSUPP when
    they see a flag they cannot set.

    ext4 has the additional problem that there are flags which are returned
    by EXT4_IOC_GETFLAGS ioctl but which cannot be modified via
    EXT4_IOC_SETFLAGS. So we have to be careful to ignore value of these
    flags and not fail the ioctl when they are set (as e.g. chattr(1) passes
    flags returned from EXT4_IOC_GETFLAGS to EXT4_IOC_SETFLAGS without any
    masking and thus we'd break this utility).

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to EXT4_FL_USER_MODIFIABLE
    to recognize that they are modifiable by userspace. So far we got away
    without having them there because ext4_ioctl_setflags() treats them in a
    special way. But it was really confusing like that.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

22 Nov, 2016

1 commit


21 Nov, 2016

2 commits


19 Nov, 2016

1 commit

  • If the block size or cluster size is insane, reject the mount. This
    is important for security reasons (although we shouldn't be just
    depending on this check).

    Ref: http://www.securityfocus.com/archive/1/539661
    Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506
    Reported-by: Borislav Petkov
    Reported-by: Nikolay Borisov
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

15 Nov, 2016

1 commit

  • CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe.
    current_time() will be transitioned to be y2038 safe
    along with vfs.

    current_time() returns timestamps according to the
    granularities set in the super_block.
    The granularity check in ext4_current_time() to call
    current_time() or CURRENT_TIME_SEC is not required.
    Use current_time() directly to obtain timestamps
    unconditionally, and remove ext4_current_time().

    Quota files are assumed to be on the same filesystem.
    Hence, use current_time() for these files as well.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Arnd Bergmann

    Deepa Dinamani
     

14 Nov, 2016

2 commits


15 Sep, 2016

1 commit


06 Sep, 2016

2 commits

  • Use the ext4_{has,set,clear}_feature_* helpers to replace the old
    feature helpers.

    Signed-off-by: Kaho Ng
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Reviewed-by: Darrick J. Wong

    Kaho Ng
     
  • When quota information is stored in quota files, we enable only quota
    accounting on mount and enforcement is enabled only in response to
    Q_QUOTAON quotactl. To make ext4 behavior consistent with XFS, we add a
    possibility to enable quota enforcement on mount by specifying
    corresponding quota mount option (usrquota, grpquota, prjquota).

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

11 Jul, 2016

1 commit


27 Jun, 2016

1 commit


13 May, 2016

2 commits

  • Currently ext4 treats DAX IO the same way as direct IO. I.e., it
    allocates unwritten extents before IO is done and converts unwritten
    extents afterwards. However this way DAX IO can race with page fault to
    the same area:

    ext4_ext_direct_IO() dax_fault()
    dax_io()
    get_block() - allocates unwritten extent
    copy_from_iter_pmem()
    get_block() - converts
    unwritten block to
    written and zeroes it
    out
    ext4_convert_unwritten_extents()

    So data written with DAX IO gets lost. Similarly dax_new_buf() called
    from dax_io() can overwrite data that has been already written to the
    block via mmap.

    Fix the problem by using pre-zeroed blocks for DAX IO the same way as we
    use them for DAX mmap. The downside of this solution is that every
    allocating write writes each block twice (once zeros, once data). Fixing
    the race with locking is possible as well however we would need to
    lock-out faults for the whole range written to by DAX IO. And that is
    not easy to do without locking-out faults for the whole file which seems
    too aggressive.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Currently ext4 direct IO handling is split between ext4_ext_direct_IO()
    and ext4_ind_direct_IO(). However the extent based function calls into
    the indirect based one for some cases and for example it is not able to
    handle file extending. Previously it was not also properly handling
    retries in case of ENOSPC errors. With DAX things would get even more
    contrieved so just refactor the direct IO code and instead of indirect /
    extent split do the split to read vs writes.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

26 Apr, 2016

1 commit

  • In ext4, there is a race condition between changing inode journal mode
    and ext4_writepages(). While ext4_writepages() is executed on a
    non-journalled mode inode, the inode's journal mode could be enabled
    by ioctl() and then, some pages dirtied after switching the journal
    mode will be still exposed to ext4_writepages() in non-journaled mode.
    To resolve this problem, we use fs-wide per-cpu rw semaphore by Jan
    Kara's suggestion because we don't want to waste ext4_inode_info's
    space for this extra rare case.

    Signed-off-by: Daeho Jeong
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Daeho Jeong
     

24 Apr, 2016

2 commits

  • Currently we ask jbd2 to write all dirty allocated buffers before
    committing a transaction when doing writeback of delay allocated blocks.
    However this is unnecessary since we move all pages to writeback state
    before dropping a transaction handle and then submit all the necessary
    IO. We still need the transaction commit to wait for all the outstanding
    writeback before flushing disk caches during transaction commit to avoid
    data exposure issues though. Use the new jbd2 capability and ask it to
    only wait for outstanding writeback during transaction commit when
    writing back data in ext4_writepages().

    Tested-by: "HUANG Weller (CM/ESW12-CN)"
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • This flag is just duplicating what ext4_should_order_data() tells you
    and is used in a single place. Furthermore it doesn't reflect changes to
    inode data journalling flag so it may be possibly misleading. Just
    remove it.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

08 Apr, 2016

1 commit

  • Pull ext4 bugfixes from Ted Ts'o:
    "These changes contains a fix for overlayfs interacting with some
    (badly behaved) dentry code in various file systems. These have been
    reviewed by Al and the respective file system mtinainers and are going
    through the ext4 tree for convenience.

    This also has a few ext4 encryption bug fixes that were discovered in
    Android testing (yes, we will need to get these sync'ed up with the
    fs/crypto code; I'll take care of that). It also has some bug fixes
    and a change to ignore the legacy quota options to allow for xfstests
    regression testing of ext4's internal quota feature and to be more
    consistent with how xfs handles this case"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: ignore quota mount options if the quota feature is enabled
    ext4 crypto: fix some error handling
    ext4: avoid calling dquot_get_next_id() if quota is not enabled
    ext4: retry block allocation for failed DIO and DAX writes
    ext4: add lockdep annotations for i_data_sem
    ext4: allow readdir()'s of large empty directories to be interrupted
    btrfs: fix crash/invalid memory access on fsync when using overlayfs
    ext4 crypto: use dget_parent() in ext4_d_revalidate()
    ext4: use file_dentry()
    ext4: use dget_parent() in ext4_file_open()
    nfs: use file_dentry()
    fs: add file_dentry()
    ext4 crypto: don't let data integrity writebacks fail with ENOMEM
    ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()

    Linus Torvalds
     

05 Apr, 2016

1 commit


01 Apr, 2016

1 commit

  • With the internal Quota feature, mke2fs creates empty quota inodes and
    quota usage tracking is enabled as soon as the file system is mounted.
    Since quotacheck is no longer preallocating all of the blocks in the
    quota inode that are likely needed to be written to, we are now seeing
    a lockdep false positive caused by needing to allocate a quota block
    from inside ext4_map_blocks(), while holding i_data_sem for a data
    inode. This results in this complaint:

    Possible unsafe locking scenario:

    CPU0 CPU1
    ---- ----
    lock(&ei->i_data_sem);
    lock(&s->s_dquot.dqio_mutex);
    lock(&ei->i_data_sem);
    lock(&s->s_dquot.dqio_mutex);

    Google-Bug-Id: 27907753

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

27 Mar, 2016

1 commit

  • We don't want the writeback triggered from the journal commit (in
    data=writeback mode) to cause the journal to abort due to
    generic_writepages() returning an ENOMEM error. In addition, if
    fsync() fails with ENOMEM, most applications will probably not do the
    right thing.

    So if we are doing a data integrity sync, and ext4_encrypt() returns
    ENOMEM, we will submit any queued I/O to date, and then retry the
    allocation using GFP_NOFAIL.

    Google-Bug-Id: 27641567

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

22 Mar, 2016

1 commit

  • Pull xfs updates from Dave Chinner:
    "There's quite a lot in this request, and there's some cross-over with
    ext4, dax and quota code due to the nature of the changes being made.

    As for the rest of the XFS changes, there are lots of little things
    all over the place, which add up to a lot of changes in the end.

    The major changes are that we've reduced the size of the struct
    xfs_inode by ~100 bytes (gives an inode cache footprint reduction of
    >10%), the writepage code now only does a single set of mapping tree
    lockups so uses less CPU, delayed allocation reservations won't
    overrun under random write loads anymore, and we added compile time
    verification for on-disk structure sizes so we find out when a commit
    or platform/compiler change breaks the on disk structure as early as
    possible.

    Change summary:

    - error propagation for direct IO failures fixes for both XFS and
    ext4
    - new quota interfaces and XFS implementation for iterating all the
    quota IDs in the filesystem
    - locking fixes for real-time device extent allocation
    - reduction of duplicate information in the xfs and vfs inode, saving
    roughly 100 bytes of memory per cached inode.
    - buffer flag cleanup
    - rework of the writepage code to use the generic write clustering
    mechanisms
    - several fixes for inode flag based DAX enablement
    - rework of remount option parsing
    - compile time verification of on-disk format structure sizes
    - delayed allocation reservation overrun fixes
    - lots of little error handling fixes
    - small memory leak fixes
    - enable xfsaild freezing again"

    * tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (66 commits)
    xfs: always set rvalp in xfs_dir2_node_trim_free
    xfs: ensure committed is initialized in xfs_trans_roll
    xfs: borrow indirect blocks from freed extent when available
    xfs: refactor delalloc indlen reservation split into helper
    xfs: update freeblocks counter after extent deletion
    xfs: debug mode forced buffered write failure
    xfs: remove impossible condition
    xfs: check sizes of XFS on-disk structures at compile time
    xfs: ioends require logically contiguous file offsets
    xfs: use named array initializers for log item dumping
    xfs: fix computation of inode btree maxlevels
    xfs: reinitialise per-AG structures if geometry changes during recovery
    xfs: remove xfs_trans_get_block_res
    xfs: fix up inode32/64 (re)mount handling
    xfs: fix format specifier , should be %llx and not %llu
    xfs: sanitize remount options
    xfs: convert mount option parsing to tokens
    xfs: fix two memory leaks in xfs_attr_list.c error paths
    xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE
    xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared
    ...

    Linus Torvalds
     

14 Mar, 2016

1 commit

  • the error is:
    fs/ext4/mballoc.c:475:43: error: 'struct ext4_group_info' has
    no member named 'bb_bitmap'.
    so, the definition of macro DOUBLE_CHECK should before
    'struct ext4_group_info', I fixed it, and I moved the macro
    AGGRESSIVE_CHECK together, because I think they shoule be together.

    Signed-off-by: Aihua Zhang
    Signed-off-by: Theodore Ts'o

    Aihua Zhang
     

10 Mar, 2016

1 commit

  • Using SEEK_DATA in a huge sparse file can easily lead to sotflockups as
    ext4_seek_data() iterates hole block-by-block. Fix the problem by using
    returned hole size from ext4_map_blocks() and thus skip the hole in one
    go.

    Update also SEEK_HOLE implementation to follow the same pattern as
    SEEK_DATA to make future maintenance easier.

    Furthermore we add cond_resched() to both ext4_seek_data() and
    ext4_seek_hole() to avoid softlockups in case evil user creates huge
    fragmented file and we have to go through lots of extents.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

09 Mar, 2016

5 commits

  • Remove counter of pending io ends as it is unused.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • When mapping blocks for direct IO, we allocate io_end structure before
    mapping blocks and store pointer to it in the inode. This creates a
    requirement that any AIO DIO using io_end must be protected by i_mutex.
    This created problems in the past with dioread_nolock mode which was
    corrupting io_end pointers. Also io_end is allocated unnecessarily in
    case where we don't need to convert any extents (which is a common case
    for example when overwriting file).

    We fix the problem by allocating io_end only once we return unwritten
    extent from block mapping function for AIO DIO (so we can save some
    pointless io_end allocations) and we pass pointer to it in bh->b_private
    which generic DIO code later passes to our end IO callback. That way we
    remove any need for global pointer to io_end structure and thus fix the
    races.

    The downside of this change is that the checking for unwritten IO in
    flight in ext4_extents_can_be_merged() is more racy since we now
    increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
    i_data_sem. However the check has been racy already before because
    ext4_writepages() already increment i_unwritten after dropping
    i_data_sem and reserved blocks save us from hitting ENOSPC in the worst
    case.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better
    describe what it does. Also split out get blocks functions for direct
    IO. Later we move functionality from _ext4_get_blocks() there. There's no
    functional change in this patch.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
    However the code cleanups that happened after 2011 when the lock was
    introduced made aio_mutex acquired at almost the same places where we
    already have exclusion using i_mutex. So just use i_mutex for the
    exclusion of unaligned AIO DIO.

    The change moves waiting for pending unwritten extent conversion under
    i_mutex. That makes special handling of O_APPEND writes unnecessary and
    also avoids possible livelocking of unaligned AIO DIO with aligned one
    (nothing was preventing contiguous stream of aligned AIO DIOs to let
    unaligned AIO DIO wait forever).

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • On 64-bit architectures we have two 4-byte holes in struct ext4_io_end.
    Order entries better to avoid this and thus make the structure occupy
    64 instead of 72 bytes for 64-bit architectures.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

29 Feb, 2016

1 commit

  • When AIO DIO fails e.g. due to IO error, we must not convert unwritten
    extents as that will expose uninitialized data. Handle this case
    by clearing unwritten flag from io_end in case of error and thus
    preventing extent conversion.

    Signed-off-by: Jan Kara
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Jan Kara
     

23 Feb, 2016

2 commits

  • Since old mbcache code is gone, let's rename new code to mbcache since
    number 2 is now meaningless. This is just a mechanical replacement.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • The conversion is generally straightforward. The only tricky part is
    that xattr block corresponding to found mbcache entry can get freed
    before we get buffer lock for that block. So we have to check whether
    the entry is still valid after getting buffer lock.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

08 Feb, 2016

1 commit


23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

09 Jan, 2016

1 commit